From f8640770493dd077bb77ad28c3951f2acc6f0f0b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E9=82=93=E4=BC=9F=E9=94=AE?= <emmmvkdeng@gmail.com>
Date: Wed, 3 Dec 2025 16:57:02 +0800
Subject: [PATCH 1/3] init

---
 .../\351\230\237\344\274\215emmm/README.md"   |   201 +
 .../patches/0001-20251104commit.patch"        |  1272 +
 .../patches/0002-20251106commit.patch"        |  3200 +
 .../patches/0003-20261106secondcommit.patch"  |  2769 +
 .../patches/0004-20251106change.patch"        |  7498 +++
 .../patches/0005-20251107001commit.patch"     |  7707 +++
 .../patches/0006-20251107002commit.patch"     |  7931 +++
 .../patches/0007-20251107003commit.patch"     |  8034 +++
 .../patches/0008-moe-change.patch"            |  8789 +++
 .../patches/0009-20251109firstcommit.patch"   |  9078 +++
 .../patches/0010-.patch"                      | 49453 ++++++++++++++++
 11 files changed, 105932 insertions(+)
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
new file mode 100644
index 00000000..a3cda3f5
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
@@ -0,0 +1,201 @@
+# README
+
+## 比赛内容(MOE赛道)
+
+MoE类：
+
+deepseek-ai/deepseek-moe-16b-chat
+
+Qwen/Qwen1.5-MoE-A2.7B-Chat
+
+在无精度误差的情况下提速这两个模型的prefill，decode和显存峰值
+
+![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=NWIwMTIzNjY4NDhkMTI4ZmYxNTFmMWNhOWIyNWRlYzZfeldtbk84b3lhUWVNYjJCZlRtT05TZ0JubU1hMzB0S3RfVG9rZW46WU10NmJsWXFab01CaER4NFlHT2NZRzJHbjJuXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+
+## 最终成绩
+
+![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=Zjk2MzEzNmNhYWUxODQ5NzI1NTNhODRmMjhmMDljMGZfeWNuZ0tzT3JBcHBlY0Z1ZnFJWHRNczRGWnd1UWFOaGdfVG9rZW46QVRXVWJGeUpGb2k5R094WmxuVGM5TUdEbmxmXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+
+# 比赛复盘
+
+## 前期思路
+
+- flash-attention ： 由于确实是很常用的加速手段，原理上也work，所以基本上贯穿了优化策略，但实际上收益甚微，就连显存上也没有显示出优化
+- 算子融合 ： 一开始做的是诸如合并两次运算到一次运算里面；官方开会的时候提到了mindnlp.core.F，也提到了融合算子的下发调用开销可能和他的加速持平，实际测试下来没有加速，而且造成了精度误差(F.rms_norm)
+- 合并python循环为矩阵运算 : 可以说是最有效的方法了，但是前期并没有探索的很深，浅浅掠过
+- 复用图/kernal ： 这个可谓是花了最多心思、同时又没弄出效果的方法，具体放在中期测试里面讲
+- 只遍历激活专家 ： 前期唯一的提分手段，从100->120
+
+## 中期测试
+
+- flash-attention
+  - 通过简单网络来测试，flash-attention对于长序列确而有提速效果，但是在中短序列不明显，有时候还会因为未知波动效果不如baseline
+  - 官方接口 `mindspore.ops.flash_attention_score`会带来一定的精度误差，具体而言qwen的prompt2会mismatch
+- 算子融合
+  - F.rms_norm 不仅没加速还带来了精度误差(应该是qwen的prompt1会mismatch)，遂直接放弃
+  - 但是我没太理解会议里面讲的要比较下放损耗和融合算子加速效果，我个人仍然觉得这应该要work，但是却没有
+- Graph&Pynative mode - kernal/图复用
+  - 一开始打算用分桶填充策略，设置 `seq_len = [1,2,4,8,..,128]`的桶来多次调用模型生成来生成这些尺寸的图，为输入的prompt寻找恰好不小于他的桶进行padding触发图复用，但是毫无效果，于是开始探索图复用的条件，网上有说法是需要 `@mindspore.jit`即时编译/`Graph mode`静态图模式才能生成可以复用的图，于是进入下一步测试
+  - @mindspore.jit：基本用不了，对于网络模型底层的try-except控制流，jit不支持这种低效分支，而我们又不大可能去修改底层的控制流(太多)，遂放弃
+  - Graph mode：经过我用简单网络的多次测试(充分预热，多次测试取平均)，Graph模式比Pynative模式慢10倍左右，匪夷所思，初步怀疑是没触发图复用所以编译开销也算里面了(也就是单纯用Graph跑一次不会建立图？要显性调用什么函数或者装饰器吗)，遂半放弃
+  - static-cache ：没做成功，因为需要把动态cache 换成 static cache，bug较多，时间上不允许，而且直播的时候说提升不大。
+- Profiler
+  - 这是一个很好的工具(疑似)，但是直到最后都不知道如何使用,一方面是断点设置和信息收集的问题，但这个问题不大
+  - ![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=ODNmOTFhMDg2NjZjYmJmODgwMjBlNzVjYTE1MWFiMzRfTTh1S1pnUVZXbWdPRGU0MGhSREh5TU05ZkRaNEJCMGZfVG9rZW46VmFIRWJnMDlob1FUV294YktYZGNTNklqbnlmXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+  - 最重要的是这个页面我只看到NPU的free/compute比值很大，除此之外不知道如何分析来调优了,要是能看**别人实际调优一遍肯定会好很多，求教程！！**
+- MOE分析
+  - 通过模型原来的代码，在self.mlp = ... 这一行，我发现了有一个if控制流，走moe/mlp,尝试使用走mlp之后，prefill/decode耗时降低了**20倍**，这时候我才意识到，原来前面有的没的都是**次要矛盾**，只要把**moe这个模块的代码**优化好了，就已经胜利了
+  - 测试对比了moe和attention模块的占时，发现moe耗时是attention的20倍左右，所以压根不需要管attention部分的优化，也说明了为什么flash-attention部分没什么提速效果，因为**attention对总时长的影响实在微乎其微。**
+  - 通过查看榜单，思考前两名prefill=1700的同时为什么显存都有略微下降，可以明确的是他们肯定用空间换了时间，但是具体是做了什么呢？
+  - 通过对moe模块的分析，终于才把握住了主要矛盾；结合前面总总，不难推断肯定是moe部分把某些**串行换并行或者大矩阵计算**，从显存情况我更倾向于是**大矩阵计算**，所以问题就转移到如何做这个大矩阵运算
+
+## 后期优化
+
+以下代码是我们moe prefill 和 decode 最快的一版
+
+### prefill
+
+1. **Pad (填充)**: 将分配给不同专家的、数量不等的“锯齿状”token数据，通过 tensor_scatter_update 填充成一个规整的、[专家数, 最大Token数, 隐藏层大小] 的“矩形”张量。
+2. **BMM (批量矩阵乘法)**: 利用这个规整的张量，调用一次 ops.bmm 即可同时计算所有专家的输出，将硬件并行度拉满。
+3. **Gather (收集)**: 计算完成后，用 gather_nd 从填充后的结果中高效地提取出有效的输出数据。
+
+```Python
+    @no_grad()
+    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
+        num_total_assignments = flat_expert_indices.shape[0]
+        hidden_size = x.shape[-1]
+        num_experts = len(self.experts)
+
+        # 1) 排序专家分配
+        idxs = flat_expert_indices.argsort()
+        sorted_expert_indices = flat_expert_indices[idxs]
+        sorted_token_indices = idxs // self.num_experts_per_tok
+        permuted_tokens = x[sorted_token_indices]
+        sorted_weights = flat_expert_weights[idxs]
+
+        # 2) 计算Padding所需的尺寸
+        tokens_per_expert = sorted_expert_indices.bincount(minlength=num_experts)
+        max_tokens_per_expert = tokens_per_expert.max().item()
+        
+        if max_tokens_per_expert == 0:
+            return ops.zeros_like(x)
+
+        # 3) 创建Padded张量
+        expert_offsets = ops.cumsum(tokens_per_expert, dim=0) - tokens_per_expert
+        token_indices_in_sorted = mnp.arange(num_total_assignments)
+        relative_pos_in_expert = token_indices_in_sorted - expert_offsets[sorted_expert_indices]
+        
+        gather_indices_sparse = ops.stack([sorted_expert_indices, relative_pos_in_expert], dim=1)
+
+        # --- 关键修正: 使用 tensor_scatter_update ---
+        padded_tokens = ops.zeros((num_experts, max_tokens_per_expert, hidden_size), dtype=x.dtype)
+        # 直接使用 mindspore.ops.tensor_scatter_update
+        padded_tokens = mindspore.ops.tensor_scatter_update(padded_tokens, gather_indices_sparse, permuted_tokens)
+        
+        # 4) 堆叠所有专家权重
+        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
+        up_weights   = ops.stack([expert.up_proj.weight for expert in self.experts], dim=0)
+        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
+
+        # 5) --- 核心：巨型批处理矩阵乘法 (BMM) ---
+        gate_out = ops.bmm(padded_tokens, gate_weights.transpose(0, 2, 1))
+        up_out   = ops.bmm(padded_tokens, up_weights.transpose(0, 2, 1))
+        act_out = self.experts[0].act_fn(gate_out) * up_out
+        padded_expert_outputs = ops.bmm(act_out, down_weights.transpose(0, 2, 1))
+        
+        # 6) 从Padded结果张量中按原始顺序gather回所有有效结果
+        #    mindspore.ops.gather_nd 是 tensor_scatter_update 的逆操作，非常适合此场景
+        expert_outputs_sorted = mindspore.ops.gather_nd(padded_expert_outputs, gather_indices_sparse)
+
+        # 7) 最终加权和还原
+        final_output = ops.zeros_like(x)
+        final_output = mindspore.mint.scatter_add(
+            final_output,
+            0,
+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+            expert_outputs_sorted * sorted_weights
+        )
+        return final_output
+```
+
+### Decode
+
+1. **向量化计算**
+
+新代码先将这个 token 选出的 top_k 个专家的weight收集起来，用 ops.stack 堆叠成一个批次。然后，通过 **ops.bmm** 这个算子，一次性并行完成这topk个专家的所有计算。
+
+1. **内存局部性优化**
+
+通过 init_active_expert_cache 函数提前进行预处理。在模型预热（阶段识别出最常用的专家，然后将这些“热门专家”的权重从它们各自零散的位置提前抽调出来，用 ops.stack 堆叠成一个巨大且连续的内存块（即 self.cache_gate_w 等缓存张量）。在实际解码时，代码会优先从这个连续的缓存块中通过索引（self.cache_gate_w[eid]）直接读取权重。这种直接在大块连续内存上的索引操作速度极快，因为它能高效利用硬件的内存缓存机制（“缓存命中” Cache Hit），避免了复杂的对象查找。
+
+```Python
+    def init_active_expert_cache(self, active_ids):
+        """
+        预热后调用，将常用专家的权重预先提取并堆叠，
+        形成一个内存连续的“快速访问缓存”。
+        """
+        self.cache_gate_w = ops.stack([self.experts[i].gate_proj.weight for i in active_ids], dim=0)
+        self.cache_up_w   = ops.stack([self.experts[i].up_proj.weight for i in active_ids], dim=0)
+        self.cache_down_w = ops.stack([self.experts[i].down_proj.weight for i in active_ids], dim=0)
+
+    def moe_infer_decode_fast(self, x, flat_expert_indices, flat_expert_weights):
+        """
+        利用“权重缓存”和“BMM向量化”实现极致解码速度。
+        """
+        top_k = flat_expert_indices.shape[0]
+        hidden_size = x.shape[-1]
+
+        selected_gate_w = []
+        selected_up_w = []
+        selected_down_w = []
+
+        # 1. 核心：从“快速缓存”或“慢速原始列表”中收集权重
+        for eid in flat_expert_indices.tolist():
+            # 检查缓存是否存在且eid在缓存范围内，如果满足则进入“快速通道”
+            if hasattr(self, "cache_gate_w") and eid < self.cache_gate_w.shape[0]:
+                selected_gate_w.append(self.cache_gate_w[eid])
+                selected_up_w.append(self.cache_up_w[eid])
+                selected_down_w.append(self.cache_down_w[eid])
+            else: # 否则，回退到“慢速通道”
+                selected_gate_w.append(self.experts[eid].gate_proj.weight)
+                selected_up_w.append(self.experts[eid].up_proj.weight)
+                selected_down_w.append(self.experts[eid].down_proj.weight)
+
+        # 2. 将收集到的分散权重堆叠成一个批次
+        selected_gate_w = ops.stack(selected_gate_w, dim=0)
+        selected_up_w   = ops.stack(selected_up_w, dim=0)
+        selected_down_w = ops.stack(selected_down_w, dim=0)
+
+        # 3. 向量化计算：使用BMM一次性完成所有专家运算
+        x_expanded = x.expand((top_k, 1, hidden_size))
+        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+        up_out   = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+        
+        # 4. 向量化聚合
+        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+        return weighted_sum
+```
+
+#### 其他trick
+
+- trick1 ： **劫持预热**，事先过一遍短中长三种prompt的generate，充分预热，当时的想法是瞎猫碰一下死耗子看看能不能触发图复用，结果意外确实降低了prefill时延，由于没有充分实验不确定原理
+- trick2 ： 根据测试的三个prompt的长度，**用Prompt=0/1/2控制走的优化流**，部分优化流对某些Prompt会带来精度误差，这样是为了在实在解决不了精度问题的情况下,不至于直接放弃这个有效的优化，利用其他Prompt的优化先快带动后快
+- trick3 ： **init_active_expert_cache和warmup_moe_model_deep**
+  -  在预热的时候，记录下所有被激活过的专家的ID，缓存那些在预热中被激活过的active_ids的权重（ops.stack）。
+  -  如果缓存已经建立，并且当前需要的专家 eid 就在缓存里，它会直接从连续的 cache_gate_w 张量中索引权重。
+
+## 收益点
+
+| DeepseekMoe + Qwen moe都进行MoE模块前向优化，decode直接遍历激活专家 | 总分的具体收益估计是从100->120                               |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| 在DeepseekAttention和QwenAttention的forward函数里有用apply_rotary_pos_emb函数，而对于该函数里用了rotate_half函数。对于rotate_half函数，可以使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]。 | 显存峰值100优化后100prefill133.4445 132.4821decode427.7311 437.5848总分220.919 223.3556 |
+| **moe_prefill_fast** 通过将权重堆叠、对输入 tokens 重新排序，将原本多个小的、串行的专家计算，转换为了几个大的、连续的计算块，并使用一个高效的 scatter_add 操作完成结果聚合，从而大幅提升了性能。**moe_decode_fast** 将多个小规模的、串行的专家计算，巧妙地转换成了一次大规模的、并行的批量矩阵乘法（bmm）操作，彻底消除了 Python 循环，因此速度更快。但是有mismatch，所以根据LongPrompt来做dispatch | 显存峰值100优化后98.4848prefill132.4821 163.8114decode437.5848 454.7424总分223.3556 239.0129 |
+| **init_active_expert_cache**和**warmup_moe_model_deep**：在预热的时候，记录下所有被激活过的专家的ID，缓存那些在预热中被激活过的active_ids的权重（ops.stack）。如果缓存已经建立，并且当前需要的专家 eid 就在缓存里，它会直接从连续的 cache_gate_w 张量中索引权重。 | 显存峰值98.4848优化后98.4848prefill163.8114 198.4985decode454.7424 493.2538总分239.0129 263.4124 |
+| 通过 **Pad -> BMM -> Gather** 的流程，将所有专家的计算合并为单个、大规模的并行操作Pad : 将分配给不同专家的、数量不等的“锯齿状”token数据，通过 tensor_scatter_update 填充成一个规整的、[专家数, 最大Token数, 隐藏层大小] 的“矩形”张量。BMM: 利用这个规整的张量，调用一次 ops.bmm 即可同时计算所有专家的输出，将硬件并行度拉满。Gather : 计算完成后，用 gather_nd 从填充后的结果中高效地提取出有效的输出数据。+但是有mismatch，解决思路是：在核心计算中使用 float32 保证数值精度，从根本上解决 mismatch 问题+根据LongPrompt来做dispatch | 显存峰值98.4848优化后83.3333prefill198.4985 487.1616decode493.2538 490.5996总分263.4124 353.6982 |
+
+## 总结
+
+- 对于精度优化比赛，不应该一上来就花费大量时间在框架本身优化、常用优化等上，应该先通过充分测试找到**主要矛盾**，因为一般这种比赛都会有侧重点，比如这个比赛就是moe部分，如果第一天我就能测试出moe的时间占比如此浮夸，我想我就不会把时间放在细枝末节上面
+- 对于调试工具(如Profiler等)，这是辅助完成第一步的，很有学习的必要，这种可视化工具分析的能力或者输出、断点分析能力将是打比赛的重要能力，说是最重要也不为过，这样才能有的放矢
\ No newline at end of file
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
new file mode 100644
index 00000000..c23f7201
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
@@ -0,0 +1,1272 @@
+From d61fd429337580809fe74a59b1dfa81b91094dae Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Tue, 4 Nov 2025 09:11:51 +0800
+Subject: [PATCH 01/10] 20251104commit
+
+---
+ mindnlp/transformers/cache_utils.py           |  28 +-
+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+ 3 files changed, 976 insertions(+), 87 deletions(-)
+
+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+index cadd2e04..02f8d4be 100644
+--- a/mindnlp/transformers/cache_utils.py
++++ b/mindnlp/transformers/cache_utils.py
+@@ -812,14 +812,26 @@ class StaticCache(Cache):
+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+             # k_out[:, :, cache_position] = key_states
+             # v_out[:, :, cache_position] = value_states
+-            if ON_ORANGE_PI:
+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-            else:
+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-
++            # if ON_ORANGE_PI:
++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++            # else:
++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++            # 确保 cache_position 是 1D tensor 并且类型正确
++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++            if cache_position.ndim > 1:
++                cache_position = cache_position.flatten()
++            # 确保类型是 int32 或 int64（MindSpore 要求）
++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++                cache_position = cache_position.int()
++            
++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++            k_out[:, :, cache_position] = key_states
++            v_out[:, :, cache_position] = value_states
++            
+         return k_out, v_out
+ 
+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index c695b944..d8303e45 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+-    x1 = x[..., : x.shape[-1] // 2]
+-    x2 = x[..., x.shape[-1] // 2 :]
++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++    # x1 = x[..., : x.shape[-1] // 2]
++    # x2 = x[..., x.shape[-1] // 2 :]
++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+ 
+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+         if self.training:
+             raise NotImplementedError("Training is not supported yet.")
+         else:
+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-        if self.config.n_shared_experts is not None:
+-            y = y + self.shared_experts(identity)
+-        return y
++            # @lwx
++            if orig_shape[1] == 1:
++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++                y=y.view(*orig_shape)
++                if self.config.n_shared_experts is not None:
++                    y = y + self.shared_experts(identity)
++                return y
++            else:
++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++                if self.config.n_shared_experts is not None:
++                    y = y + self.shared_experts(identity)
++                return y
++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++        # if self.config.n_shared_experts is not None:
++        #     y = y + self.shared_experts(identity)
++        # return y
++        
++    @no_grad()
++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++
++        expert_cache = ops.zeros_like(x)
++        for i in range(self.num_experts_per_tok):
++            expert_id = flat_expert_indices[i].item()
++            weight = flat_expert_weights[i].item()
++            expert = self.experts[expert_id]
++            expert_out = expert(x)
++            expert_cache += expert_out * weight
++        return expert_cache
+ 
+     @no_grad()
+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-        # expert_cache = torch.zeros_like(x)
+-        # idxs = flat_expert_indices.argsort()
+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-        # token_idxs = idxs // self.num_experts_per_tok
+-        # for i, end_idx in enumerate(tokens_per_expert):
+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-        #     if start_idx == end_idx:
+-        #         continue
+-        #     expert = self.experts[i]
+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-        #     expert_tokens = x[exp_token_idx]
+-        #     expert_out = expert(expert_tokens)
+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-        # return expert_cache
++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+         expert_cache = ops.zeros_like(x)
+         idxs = flat_expert_indices.argsort()
+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+         token_idxs = idxs // self.num_experts_per_tok
++
+         for i, end_idx in enumerate(tokens_per_expert):
+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+             if start_idx == end_idx:
+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+             expert_out = expert(expert_tokens)
+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++
+         return expert_cache
++        
++    # @no_grad()
++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++    #     # expert_cache = torch.zeros_like(x)
++    #     # idxs = flat_expert_indices.argsort()
++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++    #     # token_idxs = idxs // self.num_experts_per_tok
++    #     # for i, end_idx in enumerate(tokens_per_expert):
++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++    #     #     if start_idx == end_idx:
++    #     #         continue
++    #     #     expert = self.experts[i]
++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++    #     #     expert_tokens = x[exp_token_idx]
++    #     #     expert_out = expert(expert_tokens)
++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++    #     # return expert_cache
++    #     expert_cache = ops.zeros_like(x)
++    #     idxs = flat_expert_indices.argsort()
++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++    #     token_idxs = idxs // self.num_experts_per_tok
++
++    #     for i, end_idx in enumerate(tokens_per_expert):
++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++    #         if start_idx == end_idx:
++    #             continue
++    #         expert = self.experts[i]
++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++    #         expert_tokens = x[exp_token_idx]
++    #         expert_out = expert(expert_tokens)
++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++
++    #     return expert_cache
++    # @no_grad()
++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++    #     expert_cache = ops.zeros_like(x)
++
++    #     # 排序保证顺序一致
++    #     idxs = flat_expert_indices.argsort()
++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++    #     token_idxs = idxs // self.num_experts_per_tok
++
++    #     # 找出有 token 的专家
++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++
++    #     for i in active_experts.tolist():
++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++    #         end_idx = tokens_per_expert[i]
++    #         if start_idx == end_idx:  # 没有 token
++    #             continue
++
++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++    #         expert_tokens = x[exp_token_idx]
++    #         expert_out = self.experts[i](expert_tokens)
++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++
++    #         expert_cache = mindspore.mint.scatter_add(
++    #             expert_cache,
++    #             0,
++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++    #             expert_out
++    #         )
++
++    #     return expert_cache
++
++
+ 
+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+ #     """
+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+ 
+         # Initialize weights and apply final processing
+         self.post_init()
++        self.warm_up = False
++
++    def warmup_moe_model_deep(self):
++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++        test_texts = [
++            "warmup short",
++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++        ]
++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++        if tokenizer is None:
++            from mindnlp.transformers import AutoTokenizer
++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++            self._warmup_tokenizer = tokenizer
++
++        for text in test_texts:
++            inputs = tokenizer(text, return_tensors="ms")
++            with mindspore._no_grad():
++                _ = self(**inputs, use_cache=False)
++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+ 
+     def get_input_embeddings(self):
+         return self.model.embed_tokens
+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+         ```"""
++        if not self.warm_up:
++            self.warm_up = True
++            self.warmup_moe_model_deep()
++
+         output_attentions = (
+             output_attentions
+             if output_attentions is not None
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index 3cbf820e..d4c6b651 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -18,7 +18,6 @@
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """MindSpore Qwen2MoE model."""
+-
+ import math
+ from typing import List, Optional, Tuple, Union
+ 
+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+     TokenClassifierOutput,
+ )
+ from ...modeling_utils import PreTrainedModel
++from ...generation import GenerationMixin
+ from ....utils import logging
+ from .configuration_qwen2_moe import Qwen2MoeConfig
+ 
+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+         self.variance_epsilon = eps
+ 
+     def forward(self, hidden_states):
++        # @dwj
++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++        # @lwx
++        # if not self.training :
++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+         input_dtype = hidden_states.dtype
+         hidden_states = hidden_states.to(mindspore.float32)
+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+@@ -234,6 +239,8 @@ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+     x1 = x[..., : x.shape[-1] // 2]
+     x2 = x[..., x.shape[-1] // 2 :]
++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+ 
+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+         self.config = config
+         self.hidden_size = config.hidden_size
+         self.intermediate_size = intermediate_size
++        
+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+         self.act_fn = ACT2FN[config.hidden_act]
+ 
+     def forward(self, x):
+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-
+ 
++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++        # @lwx 
++        # gate_up_output = self.gate_up_proj(x)
++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++        # return self.down_proj(swiglu_output)
++
++    # def forward(self, x):
++    #     gate_proj_out = self.gate_proj(x)
++    #     up_proj_out = self.up_proj(x)
++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++    #     return self.down_proj(swiglu_out)
++        
+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+     """
+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+         use_cache: bool = False,
+         cache_position: Optional[mindspore.Tensor] = None,
+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++        
++
+         bsz, q_len, _ = hidden_states.shape
+ 
+         query_states = self.q_proj(hidden_states)
+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+                     "with a layer index."
+                 )
+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++            if isinstance(past_key_value, StaticCache):
++                kv_seq_len = key_states.shape[-2]
++            else:
++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+ 
+         if past_key_value is not None:
+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++            
++            if isinstance(past_key_value, StaticCache):
++                kv_seq_len = key_states.shape[-2]
+ 
+         # repeat k/v heads if n_kv_heads < n_heads
+         key_states = repeat_kv(key_states, self.num_key_value_groups)
+         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-
++        
+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+ 
+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-            raise ValueError(
+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-                f" {attn_weights.shape}"
+-            )
+-
+-        if attention_mask is not None:  # no matter the length, we just slice it
+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++        if attention_mask is not None:
++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+             attn_weights = attn_weights + causal_mask
+ 
+         # upcast attention to fp32
+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+ 
+         attn_output = self.o_proj(attn_output)
+-
++        # @lwx
++        
++        # max_seq_len = self.max_position_embeddings  # 2048
++
++        # if attention_mask is not None:
++        #     # attention_mask: [B, 1, Sq, Sk]
++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++
++        #     # pad 到 [max_seq_len, max_seq_len]
++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++        #     global_attention_mask = padded_mask
++        # else:
++        #     global_attention_mask = None
++
++
++        # sparse_mode=3
++        # attn_output = mindspore.ops.flash_attention_score(
++        #     query=query_states,
++        #     key=key_states,
++        #     value=value_states,
++        #     real_shift=None,
++        #     padding_mask=None,
++
++        #     head_num=self.num_heads,
++        #     attn_mask=global_attention_mask,
++        #     keep_prob=1.0 - self.attention_dropout,
++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++        #     input_layout="BNSD",
++        #     pre_tokens=2147483647,
++        #     next_tokens=2147483647,
++        #     inner_precise=0,
++        #     drop_mask=None,
++        #     prefix=None,
++        #     actual_seq_qlen=None,
++        #     actual_seq_kvlen=None,
++        #     sparse_mode=sparse_mode,
++        # )
+         if not output_attentions:
+             attn_weights = None
+ 
+         return attn_output, attn_weights, past_key_value
+ 
+ 
++class Qwen2MoeFlashAttention(nn.Module):
++    """
++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++
++    关键改动:
++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++       直接传入原始的 key 和 value 张量效率更高。
++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++    """
++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++        super().__init__()
++        self.config = config
++        self.layer_idx = layer_idx
++        self.hidden_size = config.hidden_size
++        self.num_heads = config.num_attention_heads
++        self.head_dim = self.hidden_size // self.num_heads
++        self.num_key_value_heads = config.num_key_value_heads
++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++        self.max_position_embeddings = config.max_position_embeddings
++        self.rope_theta = config.rope_theta
++        self.attention_dropout = config.attention_dropout
++
++        if (self.head_dim * self.num_heads) != self.hidden_size:
++            raise ValueError(
++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++            )
++
++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++
++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++            self.head_dim,
++            max_position_embeddings=self.max_position_embeddings,
++            base=self.rope_theta,
++        )
++
++    def forward(
++        self,
++        hidden_states: mindspore.Tensor,
++        attention_mask: Optional[mindspore.Tensor] = None,
++        position_ids: Optional[mindspore.Tensor] = None,
++        past_key_value: Optional[Cache] = None,
++        output_attentions: bool = False,
++        use_cache: bool = False,
++        cache_position: Optional[mindspore.Tensor] = None,
++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++        bsz, q_len, _ = hidden_states.shape
++
++        # 1. 线性投射 Q, K, V
++        query_states = self.q_proj(hidden_states)
++        key_states = self.k_proj(hidden_states)
++        value_states = self.v_proj(hidden_states)
++
++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++        # query:   [B, S, H*D] -> [B, N1, S, D]
++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++        # 3. RoPE 旋转位置编码
++        kv_seq_len = key_states.shape[-2]
++        if past_key_value is not None:
++            if self.layer_idx is None:
++                raise ValueError(
++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++                    "with a layer index."
++                )
++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++                if cache_position.shape[0] == 1:
++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++                    kv_seq_len = past_seen_tokens + 1
++                else:
++                    # prefill 阶段：cache_position 是范围，使用其长度
++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++            else:
++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++
++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++        # 4. KV 缓存更新
++        if past_key_value is not None:
++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++            key_states, value_states = past_key_value.update(
++                key_states, value_states, self.layer_idx, cache_kwargs
++            )
++            
++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++                if cache_position.shape[0] == 1:
++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++                    kv_seq_len = key_states.shape[-2]
++
++        # 5. [重要] 准备 Attention Mask
++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++        fa_attention_mask = None
++        if attention_mask is not None:
++            # 截取与当前key长度匹配的部分
++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++            fa_attention_mask = (mask_slice != 0)
++
++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++        input_dtype = query_states.dtype
++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++            query_states = query_states.to(mindspore.float16)
++            key_states = key_states.to(mindspore.float16)
++            value_states = value_states.to(mindspore.float16)
++
++        # 6. [核心] 调用 flash_attention_score 算子
++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++        attn_output = mindspore.ops.flash_attention_score(
++            query=query_states,
++            key=key_states,
++            value=value_states,
++            head_num=self.num_heads, # 传入Q的头数(N1)
++            attn_mask=fa_attention_mask,
++            keep_prob=1.0 - self.attention_dropout,
++            scalar_value=1.0 / math.sqrt(self.head_dim),
++            input_layout="BNSD",
++            sparse_mode=0 # 使用 defaultMask 模式
++        )
++
++        # 恢复原始数据类型
++        attn_output = attn_output.to(input_dtype)
++
++        # 7. 调整输出形状
++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++        attn_output = self.o_proj(attn_output)
++
++        # FlashAttention 算子不直接返回注意力权重矩阵
++        attn_weights = None
++        if output_attentions:
++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++
++        return attn_output, attn_weights, past_key_value
++
++    # def forward(
++    #     self,
++    #     hidden_states: mindspore.Tensor,
++    #     attention_mask: Optional[mindspore.Tensor] = None,
++    #     position_ids: Optional[mindspore.Tensor] = None,
++    #     past_key_value: Optional[Cache] = None,
++    #     output_attentions: bool = False,
++    #     use_cache: bool = False,
++    #     cache_position: Optional[mindspore.Tensor] = None,
++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++    #     bsz, q_len, _ = hidden_states.shape
++
++    #     # 1. 线性投射 Q, K, V
++    #     query_states = self.q_proj(hidden_states)
++    #     key_states = self.k_proj(hidden_states)
++    #     value_states = self.v_proj(hidden_states)
++
++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++    #     # 3. RoPE 旋转位置编码
++    #     kv_seq_len = key_states.shape[-2]
++    #     if past_key_value is not None:
++    #         if self.layer_idx is None:
++    #             raise ValueError(
++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++    #                 "with a layer index."
++    #             )
++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++
++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++    #     # 4. KV 缓存更新
++    #     if past_key_value is not None:
++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++    #         key_states, value_states = past_key_value.update(
++    #             key_states, value_states, self.layer_idx, cache_kwargs
++    #         )
++
++    #     # 5. 准备 Attention Mask
++    #     fa_attention_mask = None
++    #     if attention_mask is not None:
++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++    #         fa_attention_mask = (mask_slice != 0)
++
++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++    #     input_dtype = query_states.dtype
++
++    #     # 6. [核心] 调用 flash_attention_score 算子
++    #     attn_output = mindspore.ops.flash_attention_score(
++    #         query=query_states,
++    #         key=key_states,
++    #         value=value_states,
++    #         head_num=self.num_heads,
++    #         attn_mask=fa_attention_mask,
++    #         keep_prob=1.0 - self.attention_dropout,
++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++    #         input_layout="BNSD",
++    #         sparse_mode=0,
++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++    #         inner_precise=1
++    #     )
++
++    #     # 恢复原始数据类型
++    #     attn_output = attn_output.to(input_dtype)
++
++    #     # 7. 调整输出形状
++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++    #     attn_output = self.o_proj(attn_output)
++
++    #     attn_weights = None
++    #     if output_attentions:
++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++
++    #     return attn_output, attn_weights, past_key_value
++
++    # def forward(
++    #     self,
++    #     hidden_states: mindspore.Tensor,
++    #     attention_mask: Optional[mindspore.Tensor] = None,
++    #     position_ids: Optional[mindspore.Tensor] = None,
++    #     past_key_value: Optional[Cache] = None,
++    #     output_attentions: bool = False,
++    #     use_cache: bool = False,
++    #     cache_position: Optional[mindspore.Tensor] = None,
++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++    #     bsz, q_len, _ = hidden_states.shape
++
++    #     query_states = self.q_proj(hidden_states)
++    #     key_states = self.k_proj(hidden_states)
++    #     value_states = self.v_proj(hidden_states)
++
++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++    #     kv_seq_len = key_states.shape[-2]
++    #     if past_key_value is not None:
++    #         if self.layer_idx is None:
++    #             raise ValueError("`layer_idx` must be specified for caching")
++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++
++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++    #     if past_key_value is not None:
++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++    #         key_states, value_states = past_key_value.update(
++    #             key_states, value_states, self.layer_idx, cache_kwargs
++    #         )
++
++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++
++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++    #     query_states = query_states / math.sqrt(self.head_dim)
++    #     # <--- 修改结束 ---
++
++    #     fa_attention_mask = None
++    #     if attention_mask is not None:
++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++    #         fa_attention_mask = (mask_slice != 0)
++
++    #     input_dtype = query_states.dtype
++
++    #     attn_output = mindspore.ops.flash_attention_score(
++    #         query=query_states,    # 传入已经预先缩放过的 query
++    #         key=key_states,
++    #         value=value_states,
++    #         head_num=self.num_heads,
++    #         attn_mask=fa_attention_mask,
++    #         keep_prob=1.0 - self.attention_dropout,
++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++    #         input_layout="BNSD",
++    #         sparse_mode=0,
++    #         inner_precise=1        # 仍然保持内部高精度计算
++    #     )
++
++    #     attn_output = attn_output.to(input_dtype)
++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++    #     attn_output = self.o_proj(attn_output)
++
++    #     attn_weights = None
++    #     if output_attentions:
++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++
++    #     return attn_output, attn_weights, past_key_value
++
+ QWEN2MOE_ATTENTION_CLASSES = {
+     "eager": Qwen2MoeAttention,
++    "flash-attention": Qwen2MoeFlashAttention,
+ }
+ 
+ 
+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+ 
++    #@dwj
++    # 只遍历激活的专家，而非全部专家
+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-        hidden_states = hidden_states.view(-1, hidden_dim)
+-        # router_logits: (batch * sequence_length, n_experts)
+-        router_logits = self.gate(hidden_states)
+-
+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-        if self.norm_topk_prob:
+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        # we cast back to the input dtype
+-        routing_weights = routing_weights.to(hidden_states.dtype)
+-
+-        final_hidden_states = ops.zeros(
+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-        )
+-
+-        # One hot encode the selected experts to create an expert mask
+-        # this will be used to easily index which expert is going to be sollicitated
+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-
+-        # Loop over all available experts in the model and perform the computation on each expert
+-        for expert_idx in range(self.num_experts):
+-            expert_layer = self.experts[expert_idx]
+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-
+-            # Index the correct hidden states and compute the expert hidden state for
+-            # the current expert. We need to make sure to multiply the output hidden
+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-            if 0 not in idx.shape:
+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-
+-                # However `index_add_` only support torch tensors for indexing so we'll use
+-                # the `top_x` tensor here.
+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-
+-        shared_expert_output = self.shared_expert(hidden_states)
+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-
+-        final_hidden_states = final_hidden_states + shared_expert_output
++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++            num_tokens = hidden_states_reshaped.shape[0]
++
++            router_logits = self.gate(hidden_states_reshaped)
++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++
++            if self.norm_topk_prob:
++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++            routing_weights = routing_weights.to(hidden_states.dtype)
++            
++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++            flat_selected_experts = selected_experts.flatten()
++            
++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++            token_indices = broadcasted_token_indices.flatten()
++            
++            active_experts = ops.unique(flat_selected_experts)
++            
++            for expert_idx_tensor in active_experts:
++                expert_idx = expert_idx_tensor.item()
++                expert_layer = self.experts[expert_idx]
++                
++                mask = (flat_selected_experts == expert_idx_tensor)
++                selected_token_indices = token_indices[mask]
++                selected_routing_weights = routing_weights.flatten()[mask]
++                
++                current_states = hidden_states_reshaped[selected_token_indices]
++                
++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++                
++                final_hidden_states = final_hidden_states.index_add(
++                    dim=0,
++                    index=selected_token_indices,
++                    source=expert_output.to(hidden_states.dtype)
++                )
++            
++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+ 
+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-        return final_hidden_states, router_logits
++            final_hidden_states = final_hidden_states + shared_expert_output
++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++            
++            return final_hidden_states, router_logits
+ 
+ 
+ class Qwen2MoeDecoderLayer(nn.Module):
+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+ 
+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++
+         if (layer_idx not in config.mlp_only_layers) and (
+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+         ):
+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+     _skip_keys_device_placement = "past_key_values"
+     _supports_cache_class = True
++#lwx
++    # _supports_static_cache = True
+ 
+     def _init_weights(self, module):
+         std = self.config.initializer_range
+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+         return causal_mask
+ 
+ 
+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+     _tied_weights_keys = ["lm_head.weight"]
+ 
+     def __init__(self, config):
+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+         self.num_experts_per_tok = config.num_experts_per_tok
+         # Initialize weights and apply final processing
+         self.post_init()
++        # @lwx
++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++        #     self.generation_config.cache_implementation = "static"
++        self._warmed_up = False
++
++    def warmup_moe_model(self):
++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++        test_texts = [
++            "warmup short",
++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++        ]
++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++        if tokenizer is None:
++            from mindnlp.transformers import AutoTokenizer
++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++            self._warmup_tokenizer = tokenizer
++
++        for text in test_texts:
++            inputs = tokenizer(text, return_tensors="ms")
++            with mindspore._no_grad():
++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+ 
+     def get_input_embeddings(self):
+         return self.model.embed_tokens
+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+         ```"""
++        if not self._warmed_up:
++            self._warmed_up = True
++            self.warmup_moe_model()
+ 
+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+         output_router_logits = (
+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+             }
+         )
+         return model_inputs
++# @lwx
++    # def _decode_one_tokens_logits(
++    #     self,
++    #     cur_token: mindspore.Tensor,
++    #     input_pos: Optional[mindspore.Tensor],
++    #     cache_position: mindspore.Tensor,
++    #     past_key_values: StaticCache,
++    # ) -> mindspore.Tensor:
++    #     """
++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++        
++    #     Args:
++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++    #         input_pos: 输入位置信息，可选
++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++            
++    #     Returns:
++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++    #     """
++    #     # 调用JIT编译的版本
++    #     return self.get_decode_one_tokens_logits(
++    #         cur_token=cur_token,
++    #         input_pos=input_pos,
++    #         cache_position=cache_position,
++    #         past_key_values=past_key_values,
++    #     )
++    
++    # @mindspore.jit(jit_level='O1')
++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++    #     """
++    #     JIT编译的函数，用于高效的单token解码
++    #     使用JIT编译优化以支持静态shape和高效执行
++        
++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++    #     """
++    #     outputs = self.model.forward(
++    #         input_ids=cur_token,
++    #         position_ids=input_pos,
++    #         cache_position=cache_position,
++    #         past_key_values=past_key_values,
++    #         use_cache=True,
++    #         return_dict=False,
++    #     )
++        
++    #     hidden_states = outputs[0]
++    #     logits = self.lm_head.forward(hidden_states)
++    #     logits = logits.float()
++        
++    #     return logits[:, -1, :]
++
++    # def _sample(
++    #     self,
++    #     input_ids: mindspore.Tensor,
++    #     logits_processor,
++    #     stopping_criteria,
++    #     generation_config,
++    #     synced_devices: bool,
++    #     streamer=None,
++    #     logits_warper=None,
++    #     **model_kwargs,
++    # ):
++    #     """
++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++    #     """
++    #     from ...generation.logits_process import LogitsProcessorList
++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++    #     from mindnlp.core import nn, ops, no_grad
++    #     import numpy as np
++        
++    #     # 检查是否使用 StaticCache
++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++    #     # 否则，直接调用父类方法
++    #     past_key_values = model_kwargs.get("past_key_values")
++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++
++    #     if not isinstance(past_key_values, StaticCache):
++    #         # 不使用 StaticCache，直接调用父类方法
++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++    #         return super()._sample(
++    #             input_ids=input_ids,
++    #             logits_processor=logits_processor,
++    #             stopping_criteria=stopping_criteria,
++    #             generation_config=generation_config,
++    #             synced_devices=synced_devices,
++    #             streamer=streamer,
++    #             logits_warper=logits_warper,
++    #             **model_kwargs,
++    #         )
++        
++    #     # 使用 StaticCache，进入自定义循环
++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++    #     pad_token_id = generation_config._pad_token_tensor
++    #     output_attentions = generation_config.output_attentions
++    #     output_hidden_states = generation_config.output_hidden_states
++    #     output_scores = generation_config.output_scores
++    #     output_logits = generation_config.output_logits
++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++    #     max_length = generation_config.max_length
++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++    #     do_sample = generation_config.do_sample
++        
++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++    #         raise ValueError(
++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++    #             f"{logits_warper})."
++    #         )
++        
++    #     # init attention / hidden states / scores tuples
++    #     scores = () if (return_dict_in_generate and output_scores) else None
++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++        
++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++    #         encoder_hidden_states = (
++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++    #         )
++        
++    #     # keep track of which sequences are already finished
++    #     batch_size, cur_len = input_ids.shape
++    #     this_peer_finished = False
++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++        
++    #     time_record = []
++    #     from ....utils.testing_utils import parse_flag_from_env
++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++        
++    #     while self._has_unfinished_sequences(
++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++    #     ):
++    #         if _record_time:
++    #             import time as time_module
++    #             infer_start = time_module.time()
++            
++    #         # prepare model inputs
++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++            
++    #         # prepare variable output controls
++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++            
++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++    #         cur_cache_position = model_inputs.get("cache_position")
++    #         cur_past_key_values = model_inputs.get("past_key_values")
++    #         cur_input_ids = model_inputs.get("input_ids")
++            
++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++    #             cur_cache_position is not None and 
++    #             len(cur_cache_position.shape) > 0 and
++    #             cur_cache_position.shape[0] == 1 and
++    #             cur_input_ids is not None and
++    #             cur_input_ids.shape[1] == 1):
++    #             # 使用 JIT 优化的单 token 解码
++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++    #             if not hasattr(self, '_jit_used'):
++    #                 self._jit_used = False
++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++                
++    #             next_token_logits = self.get_decode_one_tokens_logits(
++    #                 cur_token=cur_input_ids,
++    #                 input_pos=model_inputs.get("position_ids"),
++    #                 cache_position=cur_cache_position,
++    #                 past_key_values=cur_past_key_values,
++    #             )
++                
++    #             # 标记已使用JIT（用于后续判断）
++    #             if not self._jit_used:
++    #                 self._jit_used = True
++                
++    #             # 构造兼容的输出对象
++    #             class JitOptimizedOutput:
++    #                 def __init__(self, logits, config):
++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++    #                     self.config = config
++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++    #                     self.attentions = None if not config.is_encoder_decoder else None
++    #                     self.cross_attentions = None
++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++                
++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++    #         else:
++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++    #             outputs = self(**model_inputs, return_dict=True)
++            
++    #         if synced_devices and this_peer_finished:
++    #             continue
++            
++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++    #         next_token_logits = outputs.logits[:, -1, :]
++            
++    #         # pre-process distribution
++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++    #         if do_sample:
++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++            
++    #         # Store scores, attentions and hidden_states when required
++    #         if return_dict_in_generate:
++    #             if output_scores:
++    #                 scores += (next_token_scores,)
++    #             if output_logits:
++    #                 raw_logits += (next_token_logits,)
++    #             if output_attentions:
++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++    #                 if self.config.is_encoder_decoder:
++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++                
++    #             if output_hidden_states:
++    #                 hidden = (
++    #                     outputs.decoder_hidden_states
++    #                     if self.config.is_encoder_decoder
++    #                     else outputs.hidden_states
++    #                 )
++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++            
++    #         # token selection
++    #         if do_sample:
++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++    #         else:
++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++            
++    #         # finished sentences should have their next token be a padding token
++    #         if has_eos_stopping_criteria:
++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++            
++    #         # update generated ids, model inputs, and length for next step
++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++    #         if streamer is not None:
++    #             streamer.put(next_tokens)
++            
++    #         model_kwargs = self._update_model_kwargs_for_generation(
++    #             outputs,
++    #             model_kwargs,
++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++    #         )
++            
++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++    #         cur_len += 1
++            
++    #         if _record_time:
++    #             import time as time_module
++    #             infer_stop = time_module.time()
++    #             time_record.append(infer_stop - infer_start)
++            
++    #         del outputs
++        
++    #     average_infer_time = None
++    #     if time_record:
++    #         if len(time_record) > 1:
++    #             time_record.pop(0)
++    #         average_infer_time = sum(time_record) / len(time_record)
++    #         print(f'average inference time is: {average_infer_time}')
++    #         print(f'inference time record: {time_record}')
++        
++    #     if streamer is not None:
++    #         streamer.end()
++        
++    #     # 简单判断：打印是否使用了JIT路径
++    #     if hasattr(self, '_jit_used') and self._jit_used:
++    #         print("[JIT] ✓ JIT optimization was used during generation")
++    #     else:
++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++        
++    #     if return_dict_in_generate:
++    #         if self.config.is_encoder_decoder:
++    #             return GenerateEncoderDecoderOutput(
++    #                 sequences=input_ids,
++    #                 scores=scores,
++    #                 logits=raw_logits,
++    #                 encoder_attentions=encoder_attentions,
++    #                 encoder_hidden_states=encoder_hidden_states,
++    #                 decoder_attentions=decoder_attentions,
++    #                 cross_attentions=cross_attentions,
++    #                 decoder_hidden_states=decoder_hidden_states,
++    #                 past_key_values=model_kwargs.get("past_key_values"),
++    #                 average_infer_time=average_infer_time
++    #             )
++    #         else:
++    #             return GenerateDecoderOnlyOutput(
++    #                 sequences=input_ids,
++    #                 scores=scores,
++    #                 logits=raw_logits,
++    #                 attentions=decoder_attentions,
++    #                 hidden_states=decoder_hidden_states,
++    #                 past_key_values=model_kwargs.get("past_key_values"),
++    #                 average_infer_time=average_infer_time
++    #             )
++    #     else:
++    #         return input_ids
++            
++    # def _prepare_cache_for_generation(
++    #     self,
++    #     generation_config,
++    #     model_kwargs,
++    #     assistant_model,
++    #     batch_size,
++    #     max_cache_length,
++    # ):
++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++    #         generation_config.cache_implementation = "static"
++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++        
++    #     if generation_config.cache_implementation == "static":
++    #         base_required_from_max_length = generation_config.max_length + 1
++    #         base_required = max(max_cache_length, base_required_from_max_length)
++    #         min_cache_size = 50
++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++    #         else:
++    #             max_cache_length = max(base_required, min_cache_size)
++            
++    #         original_max_cache_length = max_cache_length
++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++    #         print(f"  - final max_cache_length: {max_cache_length}")
++            
++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++    #             if max_cache_length > self.config.max_position_embeddings:
++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++        
++    #     result = super()._prepare_cache_for_generation(
++    #         generation_config=generation_config,
++    #         model_kwargs=model_kwargs,
++    #         assistant_model=assistant_model,
++    #         batch_size=batch_size,
++    #         max_cache_length=max_cache_length,
++    #     )
++        
++    #     if generation_config.cache_implementation == "static":
++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++    #         created_cache = model_kwargs.get(cache_name)
++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++    #             if created_cache.max_cache_len < generation_config.max_length:
++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++        
++    #     return result
++
++
++
+ 
+ 
+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
new file mode 100644
index 00000000..baee9388
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
@@ -0,0 +1,3200 @@
+From dcd6fc7b6307db27f23087ba3958949eb52a9beb Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Thu, 6 Nov 2025 09:20:38 +0800
+Subject: [PATCH 02/10] 20251106commit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+ 3 files changed, 2689 insertions(+), 305 deletions(-)
+ create mode 100644 patches/0001-20251104commit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index d8303e45..73773c22 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+         #     y = y + self.shared_experts(identity)
+         # return y
+         
++    # @no_grad()
++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++
++    #     expert_cache = ops.zeros_like(x)
++    #     for i in range(self.num_experts_per_tok):
++    #         expert_id = flat_expert_indices[i].item()
++    #         weight = flat_expert_weights[i].item()
++    #         expert = self.experts[expert_id]
++    #         expert_out = expert(x)
++    #         expert_cache += expert_out * weight
++    #     return expert_cache
++
+     @no_grad()
+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++        # x 的 shape: (1, hidden_size)
++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++
++        # 1. 收集所有需要的专家层
++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++
++        # 2. 并行计算所有专家的输出
++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++        # ops.cat 会将它们堆叠成一个新的 Tensor
++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++
++        # 3. 使用矩阵乘法进行加权求和
++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++        # 最终结果 final_output 的 shape: (1, hidden_size)
++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++        
++        return final_output
+ 
+-        expert_cache = ops.zeros_like(x)
+-        for i in range(self.num_experts_per_tok):
+-            expert_id = flat_expert_indices[i].item()
+-            weight = flat_expert_weights[i].item()
+-            expert = self.experts[expert_id]
+-            expert_out = expert(x)
+-            expert_cache += expert_out * weight
+-        return expert_cache
+ 
+     @no_grad()
+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+             key_states = self.k_proj(hidden_states)
+             value_states = self.v_proj(hidden_states)
+ 
+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++        # @lwx
++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+ 
+         kv_seq_len = key_states.shape[-2]
+         if past_key_value is not None:
+@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+         return attn_output, attn_weights, past_key_value
+ 
+ 
++# class DeepseekFlashAttention(nn.Module):
++#     """
++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++
++#     This class is designed as a drop-in replacement for DeepseekAttention.
++#     """
++
++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++#         super().__init__()
++#         self.config = config
++#         self.layer_idx = layer_idx
++#         if layer_idx is None:
++#             logger.warning(
++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++#                 "when creating this class."
++#             )
++
++#         self.attention_dropout = config.attention_dropout
++#         self.hidden_size = config.hidden_size
++#         self.num_heads = config.num_attention_heads
++#         self.head_dim = self.hidden_size // self.num_heads
++#         self.num_key_value_heads = config.num_key_value_heads
++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++#         self.max_position_embeddings = config.max_position_embeddings
++#         self.rope_theta = config.rope_theta
++#         self.is_causal = True
++
++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++#             raise ValueError(
++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++#                 f" and `num_heads`: {self.num_heads})."
++#             )
++
++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++#         self._init_rope()
++
++#     def _init_rope(self):
++#         if self.config.rope_scaling is None:
++#             self.rotary_emb = DeepseekRotaryEmbedding(
++#                 self.head_dim,
++#                 max_position_embeddings=self.max_position_embeddings,
++#                 base=self.rope_theta,
++#             )
++#         else:
++#             scaling_type = self.config.rope_scaling["type"]
++#             scaling_factor = self.config.rope_scaling["factor"]
++#             if scaling_type == "linear":
++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++#                     self.head_dim,
++#                     max_position_embeddings=self.max_position_embeddings,
++#                     scaling_factor=scaling_factor,
++#                     base=self.rope_theta,
++#                 )
++#             elif scaling_type == "dynamic":
++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++#                     self.head_dim,
++#                     max_position_embeddings=self.max_position_embeddings,
++#                     scaling_factor=scaling_factor,
++#                     base=self.rope_theta,
++#                 )
++#             else:
++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++
++#     def forward(
++#         self,
++#         hidden_states: mindspore.Tensor,
++#         attention_mask: Optional[mindspore.Tensor] = None,
++#         position_ids: Optional[mindspore.Tensor] = None,
++#         past_key_value: Optional[Cache] = None,
++#         output_attentions: bool = False,
++#         use_cache: bool = False,
++#         **kwargs,
++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++#         if "padding_mask" in kwargs:
++#             warnings.warn(
++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++#             )
++        
++#         if output_attentions:
++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++
++#         bsz, q_len, _ = hidden_states.shape
++
++#         if self.config.pretraining_tp > 1:
++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++
++#         query_states = self.q_proj(hidden_states)
++#         key_states = self.k_proj(hidden_states)
++#         value_states = self.v_proj(hidden_states)
++
++#         # Reshape for multi-head attention
++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++#         kv_seq_len = key_states.shape[-2]
++#         if past_key_value is not None:
++#             if self.layer_idx is None:
++#                 raise ValueError(
++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++#                     "with a layer index."
++#                 )
++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++        
++#         # Apply Rotary Positional Embedding
++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++#         if past_key_value is not None:
++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++
++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++        
++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++        
++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++
++#         # Convert attention_mask for flash_attention_score
++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++#         if attention_mask is not None:
++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++#                 raise ValueError(
++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++#                 )
++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++#         else:
++#             attn_mask_for_fa = None
++        
++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++
++#         # Call the fused flash_attention_score operator
++#         attn_output = mindspore.ops.flash_attention_score(
++#             query=query_states_for_fa,
++#             key=key_states_for_fa,
++#             value=value_states_for_fa,
++#             head_num=self.num_heads, # This is N1, the number of query heads
++#             input_layout='BSH',
++#             attn_mask=attn_mask_for_fa,
++#             keep_prob=keep_prob,
++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++#             sparse_mode=0 # Default mask mode
++#         )
++        
++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++#         attn_output = self.o_proj(attn_output)
++        
++#         # Flash Attention does not return attention weights
++#         attn_weights = None
++
++#         return attn_output, attn_weights, past_key_value
++
++class DeepseekFlashAttention(nn.Module):
++    """
++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
++    This implementation is a drop-in replacement for the original DeepseekAttention class,
++    designed for high performance on supported hardware (Ascend).
++
++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
++    """
++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++        super().__init__()
++        self.config = config
++        self.layer_idx = layer_idx
++        if layer_idx is None:
++            logger.warning(
++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++                "when creating this class."
++            )
++
++        # --- [FIX] Correctly initialize all required attributes ---
++        self.attention_dropout = config.attention_dropout
++        self.hidden_size = config.hidden_size
++        self.num_heads = config.num_attention_heads
++        self.head_dim = self.hidden_size // self.num_heads
++        self.num_key_value_heads = config.num_key_value_heads
++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++        self.max_position_embeddings = config.max_position_embeddings
++        self.rope_theta = config.rope_theta
++        self.is_causal = True
++
++        if (self.head_dim * self.num_heads) != self.hidden_size:
++            raise ValueError(
++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++                f" and `num_heads`: {self.num_heads})."
++            )
++
++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++        
++        # This call will now succeed as all attributes are initialized.
++        self._init_rope()
++
++    def _init_rope(self):
++        if self.config.rope_scaling is None:
++            self.rotary_emb = DeepseekRotaryEmbedding(
++                self.head_dim,
++                max_position_embeddings=self.max_position_embeddings,
++                base=self.rope_theta,
++            )
++        else:
++            scaling_type = self.config.rope_scaling["type"]
++            scaling_factor = self.config.rope_scaling["factor"]
++            if scaling_type == "linear":
++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++                    self.head_dim,
++                    max_position_embeddings=self.max_position_embeddings,
++                    scaling_factor=scaling_factor,
++                    base=self.rope_theta,
++                )
++            elif scaling_type == "dynamic":
++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++                    self.head_dim,
++                    max_position_embeddings=self.max_position_embeddings,
++                    scaling_factor=scaling_factor,
++                    base=self.rope_theta,
++                )
++            else:
++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++
++    def forward(
++        self,
++        hidden_states: mindspore.Tensor,
++        attention_mask: Optional[mindspore.Tensor] = None,
++        position_ids: Optional[mindspore.Tensor] = None,
++        past_key_value: Optional[Cache] = None,
++        output_attentions: bool = False,
++        use_cache: bool = False,
++        **kwargs,
++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++        if "padding_mask" in kwargs:
++            warnings.warn(
++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++            )
++        if output_attentions:
++            warnings.warn(
++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
++            )
++
++        bsz, q_len, _ = hidden_states.shape
++
++        if self.config.pretraining_tp > 1:
++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++
++        query_states = self.q_proj(hidden_states)
++        key_states = self.k_proj(hidden_states)
++        value_states = self.v_proj(hidden_states)
++
++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++        kv_seq_len = key_states.shape[-2]
++        if past_key_value is not None:
++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++        
++        # Apply Rotary Position Embedding
++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++        if past_key_value is not None:
++            cache_kwargs = {"sin": sin, "cos": cos}
++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++
++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
++        # So we must explicitly repeat the KV heads.
++        key_states = repeat_kv(key_states, self.num_key_value_groups)
++        value_states = repeat_kv(value_states, self.num_key_value_groups)
++
++        # Convert attention mask for flash_attention_score
++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
++        if attention_mask is not None:
++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++                 raise ValueError(
++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++                )
++            attn_mask_for_fa = attention_mask < 0
++        else:
++            attn_mask_for_fa = None
++
++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++
++        # Call the fused operator using the efficient BNSD layout
++        attn_output = mindspore.ops.flash_attention_score(
++            query=query_states,
++            key=key_states,
++            value=value_states,
++            head_num=self.num_heads,
++            input_layout='BNSD', # Specify the correct layout
++            attn_mask=attn_mask_for_fa,
++            keep_prob=keep_prob,
++            scalar_value=1.0 / math.sqrt(self.head_dim)
++        )
++        
++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++        
++        # Apply output projection
++        attn_output = self.o_proj(attn_output)
++
++        # Flash attention does not return attention weights, so we return None.
++        attn_weights = None
++
++        return attn_output, attn_weights, past_key_value
++
+ Deepseek_ATTENTION_CLASSES = {
+     "eager": DeepseekAttention,
++    "flash-attention": DeepseekFlashAttention,
+ }
+ 
+ 
+@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+             config=config, layer_idx=layer_idx
+         )
+ 
++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++            config=config, layer_idx=layer_idx
++        )
++
+         self.mlp = (
+             DeepseekMoE(config)
+             if (
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index d4c6b651..bced285c 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+ 
+ import mindspore
+ import mindnlp.core.nn.functional as F
+-from mindnlp.core import nn, ops
++from mindnlp.core import nn, ops, no_grad
+ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+ 
+ from ....common.activations import ACT2FN
+@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+ 
++Long_Prompt = False
++PROMPT_LENGTH_THRESHOLD = 128
+ 
+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+ def _prepare_4d_causal_attention_mask_with_cache_position(
+@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+         return attn_output, attn_weights, past_key_value
+ 
+ 
++# class Qwen2MoeFlashAttention(nn.Module):
++#     """
++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++
++#     关键改动:
++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++#        直接传入原始的 key 和 value 张量效率更高。
++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++#         super().__init__()
++#         self.config = config
++#         self.layer_idx = layer_idx
++#         self.hidden_size = config.hidden_size
++#         self.num_heads = config.num_attention_heads
++#         self.head_dim = self.hidden_size // self.num_heads
++#         self.num_key_value_heads = config.num_key_value_heads
++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++#         self.max_position_embeddings = config.max_position_embeddings
++#         self.rope_theta = config.rope_theta
++#         self.attention_dropout = config.attention_dropout
++
++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++#             raise ValueError(
++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++#             )
++
++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++
++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++#             self.head_dim,
++#             max_position_embeddings=self.max_position_embeddings,
++#             base=self.rope_theta,
++#         )
++
++#     def forward(
++#         self,
++#         hidden_states: mindspore.Tensor,
++#         attention_mask: Optional[mindspore.Tensor] = None,
++#         position_ids: Optional[mindspore.Tensor] = None,
++#         past_key_value: Optional[Cache] = None,
++#         output_attentions: bool = False,
++#         use_cache: bool = False,
++#         cache_position: Optional[mindspore.Tensor] = None,
++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++#         bsz, q_len, _ = hidden_states.shape
++
++#         # 1. 线性投射 Q, K, V
++#         query_states = self.q_proj(hidden_states)
++#         key_states = self.k_proj(hidden_states)
++#         value_states = self.v_proj(hidden_states)
++
++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++#         # query:   [B, S, H*D] -> [B, N1, S, D]
++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++#         # 3. RoPE 旋转位置编码
++#         kv_seq_len = key_states.shape[-2]
++#         if past_key_value is not None:
++#             if self.layer_idx is None:
++#                 raise ValueError(
++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++#                     "with a layer index."
++#                 )
++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++#                 if cache_position.shape[0] == 1:
++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++#                     kv_seq_len = past_seen_tokens + 1
++#                 else:
++#                     # prefill 阶段：cache_position 是范围，使用其长度
++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
++#             else:
++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++
++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++#         # 4. KV 缓存更新
++#         if past_key_value is not None:
++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++#             key_states, value_states = past_key_value.update(
++#                 key_states, value_states, self.layer_idx, cache_kwargs
++#             )
++            
++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++#                 if cache_position.shape[0] == 1:
++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++#                     kv_seq_len = key_states.shape[-2]
++
++#         # 5. [重要] 准备 Attention Mask
++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++#         fa_attention_mask = None
++#         if attention_mask is not None:
++#             # 截取与当前key长度匹配的部分
++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
++#             fa_attention_mask = (mask_slice != 0)
++
++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++#         input_dtype = query_states.dtype
++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++#             query_states = query_states.to(mindspore.float16)
++#             key_states = key_states.to(mindspore.float16)
++#             value_states = value_states.to(mindspore.float16)
++
++#         # 6. [核心] 调用 flash_attention_score 算子
++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++#         attn_output = mindspore.ops.flash_attention_score(
++#             query=query_states,
++#             key=key_states,
++#             value=value_states,
++#             head_num=self.num_heads, # 传入Q的头数(N1)
++#             attn_mask=fa_attention_mask,
++#             keep_prob=1.0 - self.attention_dropout,
++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++#             input_layout="BNSD",
++#             sparse_mode=0 # 使用 defaultMask 模式
++#         )
++
++#         # 恢复原始数据类型
++#         attn_output = attn_output.to(input_dtype)
++
++#         # 7. 调整输出形状
++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++#         attn_output = self.o_proj(attn_output)
++
++#         # FlashAttention 算子不直接返回注意力权重矩阵
++#         attn_weights = None
++#         if output_attentions:
++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++
++#         return attn_output, attn_weights, past_key_value
++
++#     # def forward(
++#     #     self,
++#     #     hidden_states: mindspore.Tensor,
++#     #     attention_mask: Optional[mindspore.Tensor] = None,
++#     #     position_ids: Optional[mindspore.Tensor] = None,
++#     #     past_key_value: Optional[Cache] = None,
++#     #     output_attentions: bool = False,
++#     #     use_cache: bool = False,
++#     #     cache_position: Optional[mindspore.Tensor] = None,
++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++#     #     bsz, q_len, _ = hidden_states.shape
++
++#     #     # 1. 线性投射 Q, K, V
++#     #     query_states = self.q_proj(hidden_states)
++#     #     key_states = self.k_proj(hidden_states)
++#     #     value_states = self.v_proj(hidden_states)
++
++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++
++#     #     # 3. RoPE 旋转位置编码
++#     #     kv_seq_len = key_states.shape[-2]
++#     #     if past_key_value is not None:
++#     #         if self.layer_idx is None:
++#     #             raise ValueError(
++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++#     #                 "with a layer index."
++#     #             )
++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++
++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++#     #     # 4. KV 缓存更新
++#     #     if past_key_value is not None:
++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++#     #         key_states, value_states = past_key_value.update(
++#     #             key_states, value_states, self.layer_idx, cache_kwargs
++#     #         )
++
++#     #     # 5. 准备 Attention Mask
++#     #     fa_attention_mask = None
++#     #     if attention_mask is not None:
++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++#     #         fa_attention_mask = (mask_slice != 0)
++
++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++#     #     input_dtype = query_states.dtype
++
++#     #     # 6. [核心] 调用 flash_attention_score 算子
++#     #     attn_output = mindspore.ops.flash_attention_score(
++#     #         query=query_states,
++#     #         key=key_states,
++#     #         value=value_states,
++#     #         head_num=self.num_heads,
++#     #         attn_mask=fa_attention_mask,
++#     #         keep_prob=1.0 - self.attention_dropout,
++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
++#     #         input_layout="BNSD",
++#     #         sparse_mode=0,
++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++#     #         inner_precise=1
++#     #     )
++
++#     #     # 恢复原始数据类型
++#     #     attn_output = attn_output.to(input_dtype)
++
++#     #     # 7. 调整输出形状
++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++#     #     attn_output = self.o_proj(attn_output)
++
++#     #     attn_weights = None
++#     #     if output_attentions:
++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++
++#     #     return attn_output, attn_weights, past_key_value
++
++
+ class Qwen2MoeFlashAttention(nn.Module):
+     """
+-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-
+-    关键改动:
+-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-       直接传入原始的 key 和 value 张量效率更高。
+-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
++
++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
++    完全使用模型的低精度数据类型（如 float16）进行计算，
++    以达到理论上的最高执行速度。
+     """
+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+         super().__init__()
+         self.config = config
+         self.layer_idx = layer_idx
++        if layer_idx is None:
++            logger.warning_once(
++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
++            )
++
+         self.hidden_size = config.hidden_size
+         self.num_heads = config.num_attention_heads
+         self.head_dim = self.hidden_size // self.num_heads
+         self.num_key_value_heads = config.num_key_value_heads
+-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+         self.max_position_embeddings = config.max_position_embeddings
+         self.rope_theta = config.rope_theta
+         self.attention_dropout = config.attention_dropout
+ 
+-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-            raise ValueError(
+-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-            )
+-
+         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+         key_states = self.k_proj(hidden_states)
+         value_states = self.v_proj(hidden_states)
+ 
+-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++        # 2. 调整形状以匹配 BNSD 布局
+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-        # 3. RoPE 旋转位置编码
++        
++        # 3. RoPE 和 KV 缓存
+         kv_seq_len = key_states.shape[-2]
+         if past_key_value is not None:
+-            if self.layer_idx is None:
+-                raise ValueError(
+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-                    "with a layer index."
+-                )
+-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-                if cache_position.shape[0] == 1:
+-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-                    kv_seq_len = past_seen_tokens + 1
+-                else:
+-                    # prefill 阶段：cache_position 是范围，使用其长度
+-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-            else:
+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-
++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++        
+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+ 
+-        # 4. KV 缓存更新
+         if past_key_value is not None:
+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-            key_states, value_states = past_key_value.update(
+-                key_states, value_states, self.layer_idx, cache_kwargs
+-            )
+-            
+-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-                if cache_position.shape[0] == 1:
+-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-                    kv_seq_len = key_states.shape[-2]
+-
+-        # 5. [重要] 准备 Attention Mask
+-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++
++        # 4. 准备 Attention Mask
+         fa_attention_mask = None
+         if attention_mask is not None:
+-            # 截取与当前key长度匹配的部分
+-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+             fa_attention_mask = (mask_slice != 0)
+ 
+-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-        input_dtype = query_states.dtype
+-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-            query_states = query_states.to(mindspore.float16)
+-            key_states = key_states.to(mindspore.float16)
+-            value_states = value_states.to(mindspore.float16)
+-
+-        # 6. [核心] 调用 flash_attention_score 算子
+-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+         attn_output = mindspore.ops.flash_attention_score(
+             query=query_states,
+             key=key_states,
+             value=value_states,
+-            head_num=self.num_heads, # 传入Q的头数(N1)
++            head_num=self.num_heads,
+             attn_mask=fa_attention_mask,
+-            keep_prob=1.0 - self.attention_dropout,
++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+             scalar_value=1.0 / math.sqrt(self.head_dim),
+             input_layout="BNSD",
+-            sparse_mode=0 # 使用 defaultMask 模式
++            sparse_mode=0,
++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+         )
+ 
+-        # 恢复原始数据类型
+-        attn_output = attn_output.to(input_dtype)
+-
+-        # 7. 调整输出形状
+-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++        # 6. 调整输出形状
+         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+         attn_output = self.o_proj(attn_output)
+ 
+-        # FlashAttention 算子不直接返回注意力权重矩阵
++        # 7. 返回结果
+         attn_weights = None
+         if output_attentions:
+-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+ 
+         return attn_output, attn_weights, past_key_value
+ 
+-    # def forward(
+-    #     self,
+-    #     hidden_states: mindspore.Tensor,
+-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-    #     position_ids: Optional[mindspore.Tensor] = None,
+-    #     past_key_value: Optional[Cache] = None,
+-    #     output_attentions: bool = False,
+-    #     use_cache: bool = False,
+-    #     cache_position: Optional[mindspore.Tensor] = None,
+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+-    #     bsz, q_len, _ = hidden_states.shape
+-
+-    #     # 1. 线性投射 Q, K, V
+-    #     query_states = self.q_proj(hidden_states)
+-    #     key_states = self.k_proj(hidden_states)
+-    #     value_states = self.v_proj(hidden_states)
+-
+-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-    #     # 3. RoPE 旋转位置编码
+-    #     kv_seq_len = key_states.shape[-2]
+-    #     if past_key_value is not None:
+-    #         if self.layer_idx is None:
+-    #             raise ValueError(
+-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-    #                 "with a layer index."
+-    #             )
+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+ 
+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-    #     # 4. KV 缓存更新
+-    #     if past_key_value is not None:
+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-    #         key_states, value_states = past_key_value.update(
+-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-    #         )
+-
+-    #     # 5. 准备 Attention Mask
+-    #     fa_attention_mask = None
+-    #     if attention_mask is not None:
+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-    #         fa_attention_mask = (mask_slice != 0)
+-
+-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-    #     input_dtype = query_states.dtype
+-
+-    #     # 6. [核心] 调用 flash_attention_score 算子
+-    #     attn_output = mindspore.ops.flash_attention_score(
+-    #         query=query_states,
+-    #         key=key_states,
+-    #         value=value_states,
+-    #         head_num=self.num_heads,
+-    #         attn_mask=fa_attention_mask,
+-    #         keep_prob=1.0 - self.attention_dropout,
+-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-    #         input_layout="BNSD",
+-    #         sparse_mode=0,
+-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-    #         inner_precise=1
+-    #     )
+-
+-    #     # 恢复原始数据类型
+-    #     attn_output = attn_output.to(input_dtype)
++QWEN2MOE_ATTENTION_CLASSES = {
++    "eager": Qwen2MoeAttention,
++    "flash-attention": Qwen2MoeFlashAttention,
++}
+ 
+-    #     # 7. 调整输出形状
+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-    #     attn_output = self.o_proj(attn_output)
+ 
+-    #     attn_weights = None
+-    #     if output_attentions:
+-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     def __init__(self, config):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         # gating
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     #@dwj
++#     # 只遍历激活的专家，而非全部专家
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#             num_tokens = hidden_states_reshaped.shape[0]
++
++#             router_logits = self.gate(hidden_states_reshaped)
++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++
++#             if self.norm_topk_prob:
++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++#             routing_weights = routing_weights.to(hidden_states.dtype)
++            
++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++#             flat_selected_experts = selected_experts.flatten()
++            
++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++#             token_indices = broadcasted_token_indices.flatten()
++            
++#             active_experts = ops.unique(flat_selected_experts)
++            
++#             for expert_idx_tensor in active_experts:
++#                 expert_idx = expert_idx_tensor.item()
++#                 expert_layer = self.experts[expert_idx]
++                
++#                 mask = (flat_selected_experts == expert_idx_tensor)
++#                 selected_token_indices = token_indices[mask]
++#                 selected_routing_weights = routing_weights.flatten()[mask]
++                
++#                 current_states = hidden_states_reshaped[selected_token_indices]
++                
++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++                
++#                 final_hidden_states = final_hidden_states.index_add(
++#                     dim=0,
++#                     index=selected_token_indices,
++#                     source=expert_output.to(hidden_states.dtype)
++#                 )
++            
++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+ 
+-    #     return attn_output, attn_weights, past_key_value
++#             final_hidden_states = final_hidden_states + shared_expert_output
++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++            
++#             return final_hidden_states, router_logits
++
++
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     """
++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++#     `_moe_infer_prefill` (用于长序列处理) 方法。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         # 门控网络
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         # 专家列表
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++#         # 共享专家
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     @no_grad()
++#     def _moe_infer_decode(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         """
++#         【解码路径】针对 sequence_length=1 的极致优化。
++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++#         """
++#         batch_size, hidden_dim = hidden_states.shape
++        
++#         expert_outputs_list = [
++#             ops.cat([
++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++#             ], dim=0) 
++#             for i in range(batch_size)
++#         ]
++        
++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++#         # shape: (batch_size, top_k, hidden_dim)
++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++        
++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++        
++#         return moe_output.squeeze(1)
++
++#     @no_grad()
++#     def _moe_infer_prefill(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         """
++#         【预填充路径】针对 sequence_length > 1 的优化。
++#         按专家对 Token 进行分组，并进行批处理。
++#         """
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens = hidden_states.shape[0]
++#         flat_selected_experts = selected_experts.flatten()
++        
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++        
++#         active_experts = ops.unique(flat_selected_experts)
++        
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++            
++#             mask = (flat_selected_experts == expert_idx_tensor)
++#             selected_token_indices = token_indices[mask]
++#             selected_routing_weights = routing_weights.flatten()[mask]
++            
++#             current_states = hidden_states[selected_token_indices]
++            
++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++            
++#             moe_output = moe_output.index_add(
++#                 dim=0,
++#                 index=selected_token_indices,
++#                 source=expert_output.to(hidden_states.dtype)
++#             )
++#         return moe_output
++
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#         """
++#         顶层 forward 方法，作为智能分发器。
++#         """
++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++        
++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#         router_logits = self.gate(hidden_states_reshaped)
++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+ 
+-    # def forward(
+-    #     self,
+-    #     hidden_states: mindspore.Tensor,
+-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-    #     position_ids: Optional[mindspore.Tensor] = None,
+-    #     past_key_value: Optional[Cache] = None,
+-    #     output_attentions: bool = False,
+-    #     use_cache: bool = False,
+-    #     cache_position: Optional[mindspore.Tensor] = None,
+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+-    #     bsz, q_len, _ = hidden_states.shape
+-
+-    #     query_states = self.q_proj(hidden_states)
+-    #     key_states = self.k_proj(hidden_states)
+-    #     value_states = self.v_proj(hidden_states)
+-
+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-    #     kv_seq_len = key_states.shape[-2]
+-    #     if past_key_value is not None:
+-    #         if self.layer_idx is None:
+-    #             raise ValueError("`layer_idx` must be specified for caching")
+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-
+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-    #     if past_key_value is not None:
+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-    #         key_states, value_states = past_key_value.update(
+-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-    #         )
++#         if self.norm_topk_prob:
++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++#         routing_weights = routing_weights.to(hidden_states.dtype)
++        
++#         moe_output = None
++#         # 在推理时，根据序列长度选择最优路径
++#         if not self.training:
++#             if sequence_length == 1:
++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++#             else:
++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++#         else:
++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++#             raise NotImplementedError("Training path is not implemented.")
++
++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++        
++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++        
++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++        
++#         return final_hidden_states, router_logits
++
++
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     """
++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         # 门控网络
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         # 专家列表
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++#         # 共享专家
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     @no_grad()
++#     def _moe_infer_decode(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         batch_size, _ = hidden_states.shape
++#         expert_outputs_list = [
++#             ops.cat([
++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++#             ], dim=0) 
++#             for i in range(batch_size)
++#         ]
++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++#         return moe_output.squeeze(1)
++
++#     @no_grad()
++#     def _moe_infer_prefill(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens = hidden_states.shape[0]
++#         flat_selected_experts = selected_experts.flatten()
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++#         active_experts = ops.unique(flat_selected_experts)
++        
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++#             mask = (flat_selected_experts == expert_idx_tensor)
++#             selected_token_indices = token_indices[mask]
++#             selected_routing_weights = routing_weights.flatten()[mask]
++#             current_states = hidden_states[selected_token_indices]
++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++#             moe_output = moe_output.index_add(
++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++#             )
++#         return moe_output
++
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#         """
++#         顶层 forward 方法，作为智能分发器。
++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++#         """
++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++        
++#         # 1. 门控计算 (通用逻辑)
++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#         router_logits = self.gate(hidden_states_reshaped)
++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++
++#         if self.norm_topk_prob:
++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++#         routing_weights = routing_weights.to(hidden_states.dtype)
++        
++#         # 2. 智能分发到最优 MoE 路径
++#         moe_output = None
++#         if not self.training:
++#             if sequence_length == 1:
++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++#             else:
++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++#         else:
++#             raise NotImplementedError("Training path is not implemented.")
++
++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++        
++#         # 4. 合并 MoE 输出和共享专家输出
++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++        
++#         # 5. 恢复原始形状并返回
++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++        
++#         return final_hidden_states, router_logits
++
++# prefill fastest
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     """
++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         # 门控网络
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         # 专家列表
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++#         # 共享专家
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     @no_grad()
++#     def _moe_infer_dispatch(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         """
++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++#         """
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens, _ = hidden_states.shape
++        
++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++#         flat_selected_experts = selected_experts.flatten()
++#         flat_routing_weights = routing_weights.flatten()
+ 
+-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-
+-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-    #     query_states = query_states / math.sqrt(self.head_dim)
+-    #     # <--- 修改结束 ---
+-
+-    #     fa_attention_mask = None
+-    #     if attention_mask is not None:
+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-    #         fa_attention_mask = (mask_slice != 0)
+-
+-    #     input_dtype = query_states.dtype
+-
+-    #     attn_output = mindspore.ops.flash_attention_score(
+-    #         query=query_states,    # 传入已经预先缩放过的 query
+-    #         key=key_states,
+-    #         value=value_states,
+-    #         head_num=self.num_heads,
+-    #         attn_mask=fa_attention_mask,
+-    #         keep_prob=1.0 - self.attention_dropout,
+-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-    #         input_layout="BNSD",
+-    #         sparse_mode=0,
+-    #         inner_precise=1        # 仍然保持内部高精度计算
+-    #     )
++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+ 
+-    #     attn_output = attn_output.to(input_dtype)
+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-    #     attn_output = self.o_proj(attn_output)
++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++#         active_experts = ops.unique(flat_selected_experts)
++        
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++            
++#             # 找到所有分配给该专家的 token
++#             mask = (flat_selected_experts == expert_idx_tensor)
++            
++#             # 使用 mask 选取对应的 token 和权重
++#             current_token_indices = token_indices[mask]
++#             current_routing_weights = flat_routing_weights[mask]
++#             current_hidden_states = hidden_states[current_token_indices]
++            
++#             # 对这些 token 进行批处理
++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++            
++#             # 使用 index_add 将结果精确地加回到对应位置
++#             moe_output = moe_output.index_add(
++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++#             )
++#         return moe_output
++
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#         """
++#         顶层 forward 方法，作为智能分发器。
++#         """
++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++        
++#         # 1. 门控计算
++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#         router_logits = self.gate(hidden_states_reshaped)
++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++
++#         if self.norm_topk_prob:
++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++#         routing_weights = routing_weights.to(hidden_states.dtype)
++        
++#         # 2. 调用统一的 MoE 计算内核
++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+ 
+-    #     attn_weights = None
+-    #     if output_attentions:
+-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++#         # 3. 统一处理共享专家
++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++        
++#         # 4. 合并输出
++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++        
++#         # 5. 恢复原始形状并返回
++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++        
++#         return final_hidden_states, router_logits
++
++
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     """
++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++#     【最终高性能与高精度版】：
++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++#     3. 这样实现了速度和准确性的两全其美。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     @no_grad()
++#     def _moe_infer_decode(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         """
++#         【解码路径】极致优化版：bmm + 高精度累加。
++#         """
++#         original_dtype = hidden_states.dtype
++#         batch_size, _ = hidden_states.shape
++        
++#         expert_outputs_list = [
++#             ops.cat([
++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++#             ], dim=0) 
++#             for i in range(batch_size)
++#         ]
++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++
++#         # 在 float32 下执行 bmm，得到高精度结果
++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++        
++#         # 将高精度结果转换回原始数据类型
++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++        
++#         return moe_output
++
++#     @no_grad()
++#     def _moe_infer_prefill(
++#         self, 
++#         hidden_states: mindspore.Tensor, 
++#         selected_experts: mindspore.Tensor, 
++#         routing_weights: mindspore.Tensor
++#     ) -> mindspore.Tensor:
++#         """
++#         【预填充路径】与原始实现一致，结果精确。
++#         """
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens, _ = hidden_states.shape
++#         flat_selected_experts = selected_experts.flatten()
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++#         active_experts = ops.unique(flat_selected_experts)
++        
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++#             mask = (flat_selected_experts == expert_idx_tensor)
++#             selected_token_indices = token_indices[mask]
++#             selected_routing_weights = routing_weights.flatten()[mask]
++#             current_states = hidden_states[selected_token_indices]
++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++#             moe_output = moe_output.index_add(
++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++#             )
++#         return moe_output
++
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++        
++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#         router_logits = self.gate(hidden_states_reshaped)
++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+ 
+-    #     return attn_output, attn_weights, past_key_value
++#         if self.norm_topk_prob:
++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++#         # 如果模型主体是 float16，后续再转换
++        
++#         moe_output = None
++#         if not self.training:
++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++#             # _moe_infer_decode 内部会处理好类型转换
++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++#             if sequence_length == 1:
++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++#             else:
++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++#         else:
++#             raise NotImplementedError("Training path is not implemented.")
++
++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++        
++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++        
++#         return final_hidden_states, router_logits
++    
+ 
+-QWEN2MOE_ATTENTION_CLASSES = {
+-    "eager": Qwen2MoeAttention,
+-    "flash-attention": Qwen2MoeFlashAttention,
+-}
++# class Qwen2MoeSparseMoeBlock(nn.Module):
++#     """
++#     【融合版】一个混合专家模块，内置两种推理策略，
++#     由外部全局变量 `Long_Prompt` 控制：
++
++#     - if Long_Prompt is True:  【精度优先模式】
++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++#       适用于处理长序列，避免误差累积。
++
++#     - if Long_Prompt is False: 【速度优先模式】
++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++#       在解码阶段获得极致速度，同时保证结果高度准确。
++#     """
++#     def __init__(self, config: Qwen2MoeConfig):
++#         super().__init__()
++#         self.num_experts = config.num_experts
++#         self.top_k = config.num_experts_per_tok
++#         self.norm_topk_prob = config.norm_topk_prob
++
++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++#         self.experts = nn.ModuleList(
++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++#         )
++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++
++#     # --- 速度优先模式的辅助函数 ---
++#     @no_grad()
++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++#         original_dtype = hidden_states.dtype
++#         batch_size, _ = hidden_states.shape
++#         expert_outputs_list = [
++#             ops.cat([
++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++#             ], dim=0) 
++#             for i in range(batch_size)
++#         ]
++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++#         weights_fp32 = routing_weights.to(mindspore.float32)
++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++#         return moe_output_fp32.squeeze(1).to(original_dtype)
++
++#     @no_grad()
++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens, _ = hidden_states.shape
++#         flat_selected_experts = selected_experts.flatten()
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++#         active_experts = ops.unique(flat_selected_experts)
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++#             mask = (flat_selected_experts == expert_idx_tensor)
++#             selected_token_indices = token_indices[mask]
++#             selected_routing_weights = routing_weights.flatten()[mask]
++#             current_states = hidden_states[selected_token_indices]
++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++#         return moe_output
++
++#     # --- 精度优先模式的辅助函数 ---
++#     @no_grad()
++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++#         moe_output = ops.zeros_like(hidden_states)
++#         num_tokens, _ = hidden_states.shape
++#         flat_selected_experts = selected_experts.flatten()
++#         flat_routing_weights = routing_weights.flatten()
++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++#         active_experts = ops.unique(flat_selected_experts)
++#         for expert_idx_tensor in active_experts:
++#             expert_idx = expert_idx_tensor.item()
++#             expert_layer = self.experts[expert_idx]
++#             mask = (flat_selected_experts == expert_idx_tensor)
++#             current_token_indices = token_indices[mask]
++#             current_routing_weights = flat_routing_weights[mask]
++#             current_hidden_states = hidden_states[current_token_indices]
++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++#         return moe_output
++
++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++#         # 声明我们将要使用一个在模块外部定义的全局变量
++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++#         global Long_Prompt
++
++#         # 1. 门控计算 (所有模式通用)
++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++#         router_logits = self.gate(hidden_states_reshaped)
++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++#         if self.norm_topk_prob:
++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++#         moe_output = None
++#         if not self.training:
++#             # 根据 Long_Prompt 标志选择模式
++#             if Long_Prompt:
++#                 # --- 精度优先模式 ---
++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++#             else:
++#                 # --- 速度优先模式 ---
++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++#                 if sequence_length == 1:
++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++#                 else:
++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++#         else:
++#             raise NotImplementedError("Training path is not implemented.")
++
++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++        
++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++        
++#         return final_hidden_states, router_logits
++    
++class Qwen2MoeSparseMoeBlock(nn.Module):
++    """
++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++    控制的顶级推理策略：
+ 
++    - if Long_Prompt is True:  【精度优先模式】
++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
++      适用于需要严格可复现性的长序列任务。
+ 
+-class Qwen2MoeSparseMoeBlock(nn.Module):
+-    def __init__(self, config):
++    - if Long_Prompt is False: 【速度优先模式】
++      采用业界最强的性能组合：
++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
++    """
++    def __init__(self, config: Qwen2MoeConfig):
+         super().__init__()
+         self.num_experts = config.num_experts
+         self.top_k = config.num_experts_per_tok
+         self.norm_topk_prob = config.norm_topk_prob
+ 
+-        # gating
+         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+         self.experts = nn.ModuleList(
+             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+         )
+-
+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+ 
+-    #@dwj
+-    # 只遍历激活的专家，而非全部专家
+-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-            num_tokens = hidden_states_reshaped.shape[0]
+-
+-            router_logits = self.gate(hidden_states_reshaped)
+-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-            if self.norm_topk_prob:
+-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-            routing_weights = routing_weights.to(hidden_states.dtype)
+-            
+-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-            flat_selected_experts = selected_experts.flatten()
+-            
+-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-            token_indices = broadcasted_token_indices.flatten()
+-            
+-            active_experts = ops.unique(flat_selected_experts)
+-            
+-            for expert_idx_tensor in active_experts:
+-                expert_idx = expert_idx_tensor.item()
+-                expert_layer = self.experts[expert_idx]
+-                
+-                mask = (flat_selected_experts == expert_idx_tensor)
+-                selected_token_indices = token_indices[mask]
+-                selected_routing_weights = routing_weights.flatten()[mask]
+-                
+-                current_states = hidden_states_reshaped[selected_token_indices]
+-                
+-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-                
+-                final_hidden_states = final_hidden_states.index_add(
+-                    dim=0,
+-                    index=selected_token_indices,
+-                    source=expert_output.to(hidden_states.dtype)
+-                )
+-            
+-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
++    @no_grad()
++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++        original_dtype = hidden_states.dtype
++        batch_size, _ = hidden_states.shape
++        expert_outputs_list = [
++            ops.cat([
++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++            ], dim=0)
++            for i in range(batch_size)
++        ]
++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++        weights_fp32 = routing_weights.to(mindspore.float32)
++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++        return moe_output_fp32.squeeze(1).to(original_dtype)
++
++    @no_grad()
++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++        num_tokens, _ = hidden_states.shape
++        flat_selected_experts = selected_experts.flatten()
++        sorted_expert_indices = flat_selected_experts.argsort()
++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++        original_token_indices = sorted_expert_indices // self.top_k
++        moe_output = ops.zeros_like(hidden_states)
++        current_token_offset = 0
++        for i in range(self.num_experts):
++            expert_token_count = tokens_per_expert[i] - current_token_offset
++            if expert_token_count == 0:
++                continue
++            end_offset = current_token_offset + expert_token_count
++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++            expert_hidden_states = hidden_states[expert_original_token_indices]
++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++            current_token_offset += expert_token_count
++        return moe_output
++
++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++    @no_grad()
++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++        moe_output = ops.zeros_like(hidden_states)
++        num_tokens, _ = hidden_states.shape
++        flat_selected_experts = selected_experts.flatten()
++        flat_routing_weights = routing_weights.flatten()
++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++        active_experts = ops.unique(flat_selected_experts)
++        for expert_idx_tensor in active_experts:
++            expert_idx = expert_idx_tensor.item()
++            expert_layer = self.experts[expert_idx]
++            mask = (flat_selected_experts == expert_idx_tensor)
++            current_token_indices = token_indices[mask]
++            current_routing_weights = flat_routing_weights[mask]
++            current_hidden_states = hidden_states[current_token_indices]
++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++        return moe_output
+ 
+-            final_hidden_states = final_hidden_states + shared_expert_output
+-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-            
+-            return final_hidden_states, router_logits
++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++        global Long_Prompt
++
++        # 1. 门控计算 (所有模式通用)
++        batch_size, sequence_length, hidden_dim = hidden_states.shape
++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++        router_logits = self.gate(hidden_states_reshaped)
++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++        if self.norm_topk_prob:
++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++        
++        moe_output = None
++        if Long_Prompt:
++            # --- 精度优先模式 (ACCURACY MODE) ---
++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        else:
++            # --- 速度优先模式 (SPEED MODE) ---
++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++            if sequence_length == 1:
++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++            else:
++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        
+ 
++        # 3. 共享专家计算与合并 (所有模式通用)
++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++        
++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++        
++        return final_hidden_states, router_logits
+ 
+ class Qwen2MoeDecoderLayer(nn.Module):
+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+         super().__init__()
+         self.hidden_size = config.hidden_size
++        
++        # if Long_Prompt:
++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++        # else:
++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+ 
+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
+-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-
+         if (layer_idx not in config.mlp_only_layers) and (
+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+         ):
+@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+             self._warmed_up = True
+             self.warmup_moe_model()
+ 
++
++
+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+         output_router_logits = (
+             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+             router_logits=outputs.router_logits,
+         )
+ 
++    def generate(self, *args, **kwargs):
++        """
++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++        """
++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++
++        input_ids = kwargs.get("input_ids")
++        if input_ids is None and args:
++            input_ids = args[0]
++
++        if input_ids is not None:
++            prompt_length = input_ids.shape[1]
++            
++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++                Long_Prompt = True
++            else:
++                Long_Prompt = False
++
++        return super().generate(*args, **kwargs)
++    
+     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+     def prepare_inputs_for_generation(
+         self,
+@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+         # Exception 1: when passing input_embeds, input_ids may be missing entries
+         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
++        
+         if past_key_values is not None:
+             if inputs_embeds is not None:  # Exception 1
+                 if 0 not in input_ids.shape:
+@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+             }
+         )
+         return model_inputs
++
+ # @lwx
+     # def _decode_one_tokens_logits(
+     #     self,
+@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+             attentions=outputs.attentions,
+         )
+ 
++
+ __all__ = [
+     "Qwen2MoeForCausalLM",
+     "Qwen2MoeModel",
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+new file mode 100644
+index 00000000..6dfb5b93
+--- /dev/null
++++ b/patches/0001-20251104commit.patch
+@@ -0,0 +1,1272 @@
++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Tue, 4 Nov 2025 09:11:51 +0800
++Subject: [PATCH] 20251104commit
++
++---
++ mindnlp/transformers/cache_utils.py           |  28 +-
++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++ 3 files changed, 976 insertions(+), 87 deletions(-)
++
++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++index cadd2e04..02f8d4be 100644
++--- a/mindnlp/transformers/cache_utils.py
+++++ b/mindnlp/transformers/cache_utils.py
++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++             # k_out[:, :, cache_position] = key_states
++             # v_out[:, :, cache_position] = value_states
++-            if ON_ORANGE_PI:
++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++-            else:
++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++-
+++            # if ON_ORANGE_PI:
+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++            # else:
+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++            # 确保 cache_position 是 1D tensor 并且类型正确
+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++            if cache_position.ndim > 1:
+++                cache_position = cache_position.flatten()
+++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++                cache_position = cache_position.int()
+++            
+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++            k_out[:, :, cache_position] = key_states
+++            v_out[:, :, cache_position] = value_states
+++            
++         return k_out, v_out
++ 
++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index c695b944..d8303e45 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++-    x1 = x[..., : x.shape[-1] // 2]
++-    x2 = x[..., x.shape[-1] // 2 :]
+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++    # x1 = x[..., : x.shape[-1] // 2]
+++    # x2 = x[..., x.shape[-1] // 2 :]
+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++         if self.training:
++             raise NotImplementedError("Training is not supported yet.")
++         else:
++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++-        if self.config.n_shared_experts is not None:
++-            y = y + self.shared_experts(identity)
++-        return y
+++            # @lwx
+++            if orig_shape[1] == 1:
+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++                y=y.view(*orig_shape)
+++                if self.config.n_shared_experts is not None:
+++                    y = y + self.shared_experts(identity)
+++                return y
+++            else:
+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++                if self.config.n_shared_experts is not None:
+++                    y = y + self.shared_experts(identity)
+++                return y
+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++        # if self.config.n_shared_experts is not None:
+++        #     y = y + self.shared_experts(identity)
+++        # return y
+++        
+++    @no_grad()
+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++
+++        expert_cache = ops.zeros_like(x)
+++        for i in range(self.num_experts_per_tok):
+++            expert_id = flat_expert_indices[i].item()
+++            weight = flat_expert_weights[i].item()
+++            expert = self.experts[expert_id]
+++            expert_out = expert(x)
+++            expert_cache += expert_out * weight
+++        return expert_cache
++ 
++     @no_grad()
++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-        # expert_cache = torch.zeros_like(x)
++-        # idxs = flat_expert_indices.argsort()
++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++-        # token_idxs = idxs // self.num_experts_per_tok
++-        # for i, end_idx in enumerate(tokens_per_expert):
++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++-        #     if start_idx == end_idx:
++-        #         continue
++-        #     expert = self.experts[i]
++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++-        #     expert_tokens = x[exp_token_idx]
++-        #     expert_out = expert(expert_tokens)
++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++-        # return expert_cache
+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++         expert_cache = ops.zeros_like(x)
++         idxs = flat_expert_indices.argsort()
++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++         token_idxs = idxs // self.num_experts_per_tok
+++
++         for i, end_idx in enumerate(tokens_per_expert):
++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++             if start_idx == end_idx:
++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++             expert_out = expert(expert_tokens)
++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++
++         return expert_cache
+++        
+++    # @no_grad()
+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++    #     # expert_cache = torch.zeros_like(x)
+++    #     # idxs = flat_expert_indices.argsort()
+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++    #     # token_idxs = idxs // self.num_experts_per_tok
+++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++    #     #     if start_idx == end_idx:
+++    #     #         continue
+++    #     #     expert = self.experts[i]
+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++    #     #     expert_tokens = x[exp_token_idx]
+++    #     #     expert_out = expert(expert_tokens)
+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++    #     # return expert_cache
+++    #     expert_cache = ops.zeros_like(x)
+++    #     idxs = flat_expert_indices.argsort()
+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++    #     token_idxs = idxs // self.num_experts_per_tok
+++
+++    #     for i, end_idx in enumerate(tokens_per_expert):
+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++    #         if start_idx == end_idx:
+++    #             continue
+++    #         expert = self.experts[i]
+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++    #         expert_tokens = x[exp_token_idx]
+++    #         expert_out = expert(expert_tokens)
+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++
+++    #     return expert_cache
+++    # @no_grad()
+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++    #     expert_cache = ops.zeros_like(x)
+++
+++    #     # 排序保证顺序一致
+++    #     idxs = flat_expert_indices.argsort()
+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++    #     token_idxs = idxs // self.num_experts_per_tok
+++
+++    #     # 找出有 token 的专家
+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++
+++    #     for i in active_experts.tolist():
+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++    #         end_idx = tokens_per_expert[i]
+++    #         if start_idx == end_idx:  # 没有 token
+++    #             continue
+++
+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++    #         expert_tokens = x[exp_token_idx]
+++    #         expert_out = self.experts[i](expert_tokens)
+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++
+++    #         expert_cache = mindspore.mint.scatter_add(
+++    #             expert_cache,
+++    #             0,
+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++    #             expert_out
+++    #         )
+++
+++    #     return expert_cache
+++
+++
++ 
++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++ #     """
++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++ 
++         # Initialize weights and apply final processing
++         self.post_init()
+++        self.warm_up = False
+++
+++    def warmup_moe_model_deep(self):
+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++        test_texts = [
+++            "warmup short",
+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++        ]
+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++        if tokenizer is None:
+++            from mindnlp.transformers import AutoTokenizer
+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++            self._warmup_tokenizer = tokenizer
+++
+++        for text in test_texts:
+++            inputs = tokenizer(text, return_tensors="ms")
+++            with mindspore._no_grad():
+++                _ = self(**inputs, use_cache=False)
+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++ 
++     def get_input_embeddings(self):
++         return self.model.embed_tokens
++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++         ```"""
+++        if not self.warm_up:
+++            self.warm_up = True
+++            self.warmup_moe_model_deep()
+++
++         output_attentions = (
++             output_attentions
++             if output_attentions is not None
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index 3cbf820e..d4c6b651 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -18,7 +18,6 @@
++ # See the License for the specific language governing permissions and
++ # limitations under the License.
++ """MindSpore Qwen2MoE model."""
++-
++ import math
++ from typing import List, Optional, Tuple, Union
++ 
++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++     TokenClassifierOutput,
++ )
++ from ...modeling_utils import PreTrainedModel
+++from ...generation import GenerationMixin
++ from ....utils import logging
++ from .configuration_qwen2_moe import Qwen2MoeConfig
++ 
++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++         self.variance_epsilon = eps
++ 
++     def forward(self, hidden_states):
+++        # @dwj
+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++        # @lwx
+++        # if not self.training :
+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++         input_dtype = hidden_states.dtype
++         hidden_states = hidden_states.to(mindspore.float32)
++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++@@ -234,6 +239,8 @@ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++     x1 = x[..., : x.shape[-1] // 2]
++     x2 = x[..., x.shape[-1] // 2 :]
+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++         self.config = config
++         self.hidden_size = config.hidden_size
++         self.intermediate_size = intermediate_size
+++        
++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++         self.act_fn = ACT2FN[config.hidden_act]
++ 
++     def forward(self, x):
++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++-
++ 
+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++        # @lwx 
+++        # gate_up_output = self.gate_up_proj(x)
+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++        # return self.down_proj(swiglu_output)
+++
+++    # def forward(self, x):
+++    #     gate_proj_out = self.gate_proj(x)
+++    #     up_proj_out = self.up_proj(x)
+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++    #     return self.down_proj(swiglu_out)
+++        
++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++     """
++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++         use_cache: bool = False,
++         cache_position: Optional[mindspore.Tensor] = None,
++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++        
+++
++         bsz, q_len, _ = hidden_states.shape
++ 
++         query_states = self.q_proj(hidden_states)
++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++                     "with a layer index."
++                 )
++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++            if isinstance(past_key_value, StaticCache):
+++                kv_seq_len = key_states.shape[-2]
+++            else:
+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++ 
++         if past_key_value is not None:
++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++            
+++            if isinstance(past_key_value, StaticCache):
+++                kv_seq_len = key_states.shape[-2]
++ 
++         # repeat k/v heads if n_kv_heads < n_heads
++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++-
+++        
++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++ 
++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++-            raise ValueError(
++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++-                f" {attn_weights.shape}"
++-            )
++-
++-        if attention_mask is not None:  # no matter the length, we just slice it
++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++        if attention_mask is not None:
+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++             attn_weights = attn_weights + causal_mask
++ 
++         # upcast attention to fp32
++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++ 
++         attn_output = self.o_proj(attn_output)
++-
+++        # @lwx
+++        
+++        # max_seq_len = self.max_position_embeddings  # 2048
+++
+++        # if attention_mask is not None:
+++        #     # attention_mask: [B, 1, Sq, Sk]
+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++
+++        #     # pad 到 [max_seq_len, max_seq_len]
+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++        #     global_attention_mask = padded_mask
+++        # else:
+++        #     global_attention_mask = None
+++
+++
+++        # sparse_mode=3
+++        # attn_output = mindspore.ops.flash_attention_score(
+++        #     query=query_states,
+++        #     key=key_states,
+++        #     value=value_states,
+++        #     real_shift=None,
+++        #     padding_mask=None,
+++
+++        #     head_num=self.num_heads,
+++        #     attn_mask=global_attention_mask,
+++        #     keep_prob=1.0 - self.attention_dropout,
+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++        #     input_layout="BNSD",
+++        #     pre_tokens=2147483647,
+++        #     next_tokens=2147483647,
+++        #     inner_precise=0,
+++        #     drop_mask=None,
+++        #     prefix=None,
+++        #     actual_seq_qlen=None,
+++        #     actual_seq_kvlen=None,
+++        #     sparse_mode=sparse_mode,
+++        # )
++         if not output_attentions:
++             attn_weights = None
++ 
++         return attn_output, attn_weights, past_key_value
++ 
++ 
+++class Qwen2MoeFlashAttention(nn.Module):
+++    """
+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++
+++    关键改动:
+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++       直接传入原始的 key 和 value 张量效率更高。
+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++    """
+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++        super().__init__()
+++        self.config = config
+++        self.layer_idx = layer_idx
+++        self.hidden_size = config.hidden_size
+++        self.num_heads = config.num_attention_heads
+++        self.head_dim = self.hidden_size // self.num_heads
+++        self.num_key_value_heads = config.num_key_value_heads
+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++        self.max_position_embeddings = config.max_position_embeddings
+++        self.rope_theta = config.rope_theta
+++        self.attention_dropout = config.attention_dropout
+++
+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++            raise ValueError(
+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++            )
+++
+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++
+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++            self.head_dim,
+++            max_position_embeddings=self.max_position_embeddings,
+++            base=self.rope_theta,
+++        )
+++
+++    def forward(
+++        self,
+++        hidden_states: mindspore.Tensor,
+++        attention_mask: Optional[mindspore.Tensor] = None,
+++        position_ids: Optional[mindspore.Tensor] = None,
+++        past_key_value: Optional[Cache] = None,
+++        output_attentions: bool = False,
+++        use_cache: bool = False,
+++        cache_position: Optional[mindspore.Tensor] = None,
+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++        bsz, q_len, _ = hidden_states.shape
+++
+++        # 1. 线性投射 Q, K, V
+++        query_states = self.q_proj(hidden_states)
+++        key_states = self.k_proj(hidden_states)
+++        value_states = self.v_proj(hidden_states)
+++
+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++        # 3. RoPE 旋转位置编码
+++        kv_seq_len = key_states.shape[-2]
+++        if past_key_value is not None:
+++            if self.layer_idx is None:
+++                raise ValueError(
+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++                    "with a layer index."
+++                )
+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++                if cache_position.shape[0] == 1:
+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++                    kv_seq_len = past_seen_tokens + 1
+++                else:
+++                    # prefill 阶段：cache_position 是范围，使用其长度
+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++            else:
+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++        # 4. KV 缓存更新
+++        if past_key_value is not None:
+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++            key_states, value_states = past_key_value.update(
+++                key_states, value_states, self.layer_idx, cache_kwargs
+++            )
+++            
+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++                if cache_position.shape[0] == 1:
+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++                    kv_seq_len = key_states.shape[-2]
+++
+++        # 5. [重要] 准备 Attention Mask
+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++        fa_attention_mask = None
+++        if attention_mask is not None:
+++            # 截取与当前key长度匹配的部分
+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++            fa_attention_mask = (mask_slice != 0)
+++
+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++        input_dtype = query_states.dtype
+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++            query_states = query_states.to(mindspore.float16)
+++            key_states = key_states.to(mindspore.float16)
+++            value_states = value_states.to(mindspore.float16)
+++
+++        # 6. [核心] 调用 flash_attention_score 算子
+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++        attn_output = mindspore.ops.flash_attention_score(
+++            query=query_states,
+++            key=key_states,
+++            value=value_states,
+++            head_num=self.num_heads, # 传入Q的头数(N1)
+++            attn_mask=fa_attention_mask,
+++            keep_prob=1.0 - self.attention_dropout,
+++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++            input_layout="BNSD",
+++            sparse_mode=0 # 使用 defaultMask 模式
+++        )
+++
+++        # 恢复原始数据类型
+++        attn_output = attn_output.to(input_dtype)
+++
+++        # 7. 调整输出形状
+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++        attn_output = self.o_proj(attn_output)
+++
+++        # FlashAttention 算子不直接返回注意力权重矩阵
+++        attn_weights = None
+++        if output_attentions:
+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++        return attn_output, attn_weights, past_key_value
+++
+++    # def forward(
+++    #     self,
+++    #     hidden_states: mindspore.Tensor,
+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++    #     position_ids: Optional[mindspore.Tensor] = None,
+++    #     past_key_value: Optional[Cache] = None,
+++    #     output_attentions: bool = False,
+++    #     use_cache: bool = False,
+++    #     cache_position: Optional[mindspore.Tensor] = None,
+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++    #     bsz, q_len, _ = hidden_states.shape
+++
+++    #     # 1. 线性投射 Q, K, V
+++    #     query_states = self.q_proj(hidden_states)
+++    #     key_states = self.k_proj(hidden_states)
+++    #     value_states = self.v_proj(hidden_states)
+++
+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++    #     # 3. RoPE 旋转位置编码
+++    #     kv_seq_len = key_states.shape[-2]
+++    #     if past_key_value is not None:
+++    #         if self.layer_idx is None:
+++    #             raise ValueError(
+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++    #                 "with a layer index."
+++    #             )
+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++    #     # 4. KV 缓存更新
+++    #     if past_key_value is not None:
+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++    #         key_states, value_states = past_key_value.update(
+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++    #         )
+++
+++    #     # 5. 准备 Attention Mask
+++    #     fa_attention_mask = None
+++    #     if attention_mask is not None:
+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++    #         fa_attention_mask = (mask_slice != 0)
+++
+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++    #     input_dtype = query_states.dtype
+++
+++    #     # 6. [核心] 调用 flash_attention_score 算子
+++    #     attn_output = mindspore.ops.flash_attention_score(
+++    #         query=query_states,
+++    #         key=key_states,
+++    #         value=value_states,
+++    #         head_num=self.num_heads,
+++    #         attn_mask=fa_attention_mask,
+++    #         keep_prob=1.0 - self.attention_dropout,
+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++    #         input_layout="BNSD",
+++    #         sparse_mode=0,
+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++    #         inner_precise=1
+++    #     )
+++
+++    #     # 恢复原始数据类型
+++    #     attn_output = attn_output.to(input_dtype)
+++
+++    #     # 7. 调整输出形状
+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++    #     attn_output = self.o_proj(attn_output)
+++
+++    #     attn_weights = None
+++    #     if output_attentions:
+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++    #     return attn_output, attn_weights, past_key_value
+++
+++    # def forward(
+++    #     self,
+++    #     hidden_states: mindspore.Tensor,
+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++    #     position_ids: Optional[mindspore.Tensor] = None,
+++    #     past_key_value: Optional[Cache] = None,
+++    #     output_attentions: bool = False,
+++    #     use_cache: bool = False,
+++    #     cache_position: Optional[mindspore.Tensor] = None,
+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++    #     bsz, q_len, _ = hidden_states.shape
+++
+++    #     query_states = self.q_proj(hidden_states)
+++    #     key_states = self.k_proj(hidden_states)
+++    #     value_states = self.v_proj(hidden_states)
+++
+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++    #     kv_seq_len = key_states.shape[-2]
+++    #     if past_key_value is not None:
+++    #         if self.layer_idx is None:
+++    #             raise ValueError("`layer_idx` must be specified for caching")
+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++    #     if past_key_value is not None:
+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++    #         key_states, value_states = past_key_value.update(
+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++    #         )
+++
+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++
+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++    #     query_states = query_states / math.sqrt(self.head_dim)
+++    #     # <--- 修改结束 ---
+++
+++    #     fa_attention_mask = None
+++    #     if attention_mask is not None:
+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++    #         fa_attention_mask = (mask_slice != 0)
+++
+++    #     input_dtype = query_states.dtype
+++
+++    #     attn_output = mindspore.ops.flash_attention_score(
+++    #         query=query_states,    # 传入已经预先缩放过的 query
+++    #         key=key_states,
+++    #         value=value_states,
+++    #         head_num=self.num_heads,
+++    #         attn_mask=fa_attention_mask,
+++    #         keep_prob=1.0 - self.attention_dropout,
+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++    #         input_layout="BNSD",
+++    #         sparse_mode=0,
+++    #         inner_precise=1        # 仍然保持内部高精度计算
+++    #     )
+++
+++    #     attn_output = attn_output.to(input_dtype)
+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++    #     attn_output = self.o_proj(attn_output)
+++
+++    #     attn_weights = None
+++    #     if output_attentions:
+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++
+++    #     return attn_output, attn_weights, past_key_value
+++
++ QWEN2MOE_ATTENTION_CLASSES = {
++     "eager": Qwen2MoeAttention,
+++    "flash-attention": Qwen2MoeFlashAttention,
++ }
++ 
++ 
++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++ 
+++    #@dwj
+++    # 只遍历激活的专家，而非全部专家
++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        hidden_states = hidden_states.view(-1, hidden_dim)
++-        # router_logits: (batch * sequence_length, n_experts)
++-        router_logits = self.gate(hidden_states)
++-
++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-        if self.norm_topk_prob:
++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        # we cast back to the input dtype
++-        routing_weights = routing_weights.to(hidden_states.dtype)
++-
++-        final_hidden_states = ops.zeros(
++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++-        )
++-
++-        # One hot encode the selected experts to create an expert mask
++-        # this will be used to easily index which expert is going to be sollicitated
++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++-
++-        # Loop over all available experts in the model and perform the computation on each expert
++-        for expert_idx in range(self.num_experts):
++-            expert_layer = self.experts[expert_idx]
++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++-
++-            # Index the correct hidden states and compute the expert hidden state for
++-            # the current expert. We need to make sure to multiply the output hidden
++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++-            if 0 not in idx.shape:
++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++-
++-                # However `index_add_` only support torch tensors for indexing so we'll use
++-                # the `top_x` tensor here.
++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++-
++-        shared_expert_output = self.shared_expert(hidden_states)
++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++-
++-        final_hidden_states = final_hidden_states + shared_expert_output
+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++            num_tokens = hidden_states_reshaped.shape[0]
+++
+++            router_logits = self.gate(hidden_states_reshaped)
+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++
+++            if self.norm_topk_prob:
+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++            routing_weights = routing_weights.to(hidden_states.dtype)
+++            
+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++            flat_selected_experts = selected_experts.flatten()
+++            
+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++            token_indices = broadcasted_token_indices.flatten()
+++            
+++            active_experts = ops.unique(flat_selected_experts)
+++            
+++            for expert_idx_tensor in active_experts:
+++                expert_idx = expert_idx_tensor.item()
+++                expert_layer = self.experts[expert_idx]
+++                
+++                mask = (flat_selected_experts == expert_idx_tensor)
+++                selected_token_indices = token_indices[mask]
+++                selected_routing_weights = routing_weights.flatten()[mask]
+++                
+++                current_states = hidden_states_reshaped[selected_token_indices]
+++                
+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++                
+++                final_hidden_states = final_hidden_states.index_add(
+++                    dim=0,
+++                    index=selected_token_indices,
+++                    source=expert_output.to(hidden_states.dtype)
+++                )
+++            
+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++ 
++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++-        return final_hidden_states, router_logits
+++            final_hidden_states = final_hidden_states + shared_expert_output
+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++            
+++            return final_hidden_states, router_logits
++ 
++ 
++ class Qwen2MoeDecoderLayer(nn.Module):
++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++ 
++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++ 
+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++
++         if (layer_idx not in config.mlp_only_layers) and (
++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++         ):
++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++     _skip_keys_device_placement = "past_key_values"
++     _supports_cache_class = True
+++#lwx
+++    # _supports_static_cache = True
++ 
++     def _init_weights(self, module):
++         std = self.config.initializer_range
++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++         return causal_mask
++ 
++ 
++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++     _tied_weights_keys = ["lm_head.weight"]
++ 
++     def __init__(self, config):
++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++         self.num_experts_per_tok = config.num_experts_per_tok
++         # Initialize weights and apply final processing
++         self.post_init()
+++        # @lwx
+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++        #     self.generation_config.cache_implementation = "static"
+++        self._warmed_up = False
+++
+++    def warmup_moe_model(self):
+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++        test_texts = [
+++            "warmup short",
+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++        ]
+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++        if tokenizer is None:
+++            from mindnlp.transformers import AutoTokenizer
+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++            self._warmup_tokenizer = tokenizer
+++
+++        for text in test_texts:
+++            inputs = tokenizer(text, return_tensors="ms")
+++            with mindspore._no_grad():
+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++ 
++     def get_input_embeddings(self):
++         return self.model.embed_tokens
++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++         ```"""
+++        if not self._warmed_up:
+++            self._warmed_up = True
+++            self.warmup_moe_model()
++ 
++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++         output_router_logits = (
++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++             }
++         )
++         return model_inputs
+++# @lwx
+++    # def _decode_one_tokens_logits(
+++    #     self,
+++    #     cur_token: mindspore.Tensor,
+++    #     input_pos: Optional[mindspore.Tensor],
+++    #     cache_position: mindspore.Tensor,
+++    #     past_key_values: StaticCache,
+++    # ) -> mindspore.Tensor:
+++    #     """
+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++        
+++    #     Args:
+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++    #         input_pos: 输入位置信息，可选
+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++            
+++    #     Returns:
+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++    #     """
+++    #     # 调用JIT编译的版本
+++    #     return self.get_decode_one_tokens_logits(
+++    #         cur_token=cur_token,
+++    #         input_pos=input_pos,
+++    #         cache_position=cache_position,
+++    #         past_key_values=past_key_values,
+++    #     )
+++    
+++    # @mindspore.jit(jit_level='O1')
+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++    #     """
+++    #     JIT编译的函数，用于高效的单token解码
+++    #     使用JIT编译优化以支持静态shape和高效执行
+++        
+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++    #     """
+++    #     outputs = self.model.forward(
+++    #         input_ids=cur_token,
+++    #         position_ids=input_pos,
+++    #         cache_position=cache_position,
+++    #         past_key_values=past_key_values,
+++    #         use_cache=True,
+++    #         return_dict=False,
+++    #     )
+++        
+++    #     hidden_states = outputs[0]
+++    #     logits = self.lm_head.forward(hidden_states)
+++    #     logits = logits.float()
+++        
+++    #     return logits[:, -1, :]
+++
+++    # def _sample(
+++    #     self,
+++    #     input_ids: mindspore.Tensor,
+++    #     logits_processor,
+++    #     stopping_criteria,
+++    #     generation_config,
+++    #     synced_devices: bool,
+++    #     streamer=None,
+++    #     logits_warper=None,
+++    #     **model_kwargs,
+++    # ):
+++    #     """
+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++    #     """
+++    #     from ...generation.logits_process import LogitsProcessorList
+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++    #     from mindnlp.core import nn, ops, no_grad
+++    #     import numpy as np
+++        
+++    #     # 检查是否使用 StaticCache
+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++    #     # 否则，直接调用父类方法
+++    #     past_key_values = model_kwargs.get("past_key_values")
+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++
+++    #     if not isinstance(past_key_values, StaticCache):
+++    #         # 不使用 StaticCache，直接调用父类方法
+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++    #         return super()._sample(
+++    #             input_ids=input_ids,
+++    #             logits_processor=logits_processor,
+++    #             stopping_criteria=stopping_criteria,
+++    #             generation_config=generation_config,
+++    #             synced_devices=synced_devices,
+++    #             streamer=streamer,
+++    #             logits_warper=logits_warper,
+++    #             **model_kwargs,
+++    #         )
+++        
+++    #     # 使用 StaticCache，进入自定义循环
+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++    #     pad_token_id = generation_config._pad_token_tensor
+++    #     output_attentions = generation_config.output_attentions
+++    #     output_hidden_states = generation_config.output_hidden_states
+++    #     output_scores = generation_config.output_scores
+++    #     output_logits = generation_config.output_logits
+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++    #     max_length = generation_config.max_length
+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++    #     do_sample = generation_config.do_sample
+++        
+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++    #         raise ValueError(
+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++    #             f"{logits_warper})."
+++    #         )
+++        
+++    #     # init attention / hidden states / scores tuples
+++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++        
+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++    #         encoder_hidden_states = (
+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++    #         )
+++        
+++    #     # keep track of which sequences are already finished
+++    #     batch_size, cur_len = input_ids.shape
+++    #     this_peer_finished = False
+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++        
+++    #     time_record = []
+++    #     from ....utils.testing_utils import parse_flag_from_env
+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++        
+++    #     while self._has_unfinished_sequences(
+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++    #     ):
+++    #         if _record_time:
+++    #             import time as time_module
+++    #             infer_start = time_module.time()
+++            
+++    #         # prepare model inputs
+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++            
+++    #         # prepare variable output controls
+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++            
+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++    #         cur_cache_position = model_inputs.get("cache_position")
+++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++    #         cur_input_ids = model_inputs.get("input_ids")
+++            
+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++    #             cur_cache_position is not None and 
+++    #             len(cur_cache_position.shape) > 0 and
+++    #             cur_cache_position.shape[0] == 1 and
+++    #             cur_input_ids is not None and
+++    #             cur_input_ids.shape[1] == 1):
+++    #             # 使用 JIT 优化的单 token 解码
+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++    #             if not hasattr(self, '_jit_used'):
+++    #                 self._jit_used = False
+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++                
+++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++    #                 cur_token=cur_input_ids,
+++    #                 input_pos=model_inputs.get("position_ids"),
+++    #                 cache_position=cur_cache_position,
+++    #                 past_key_values=cur_past_key_values,
+++    #             )
+++                
+++    #             # 标记已使用JIT（用于后续判断）
+++    #             if not self._jit_used:
+++    #                 self._jit_used = True
+++                
+++    #             # 构造兼容的输出对象
+++    #             class JitOptimizedOutput:
+++    #                 def __init__(self, logits, config):
+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++    #                     self.config = config
+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++    #                     self.cross_attentions = None
+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++                
+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++    #         else:
+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++    #             outputs = self(**model_inputs, return_dict=True)
+++            
+++    #         if synced_devices and this_peer_finished:
+++    #             continue
+++            
+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++    #         next_token_logits = outputs.logits[:, -1, :]
+++            
+++    #         # pre-process distribution
+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++    #         if do_sample:
+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++            
+++    #         # Store scores, attentions and hidden_states when required
+++    #         if return_dict_in_generate:
+++    #             if output_scores:
+++    #                 scores += (next_token_scores,)
+++    #             if output_logits:
+++    #                 raw_logits += (next_token_logits,)
+++    #             if output_attentions:
+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++    #                 if self.config.is_encoder_decoder:
+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++                
+++    #             if output_hidden_states:
+++    #                 hidden = (
+++    #                     outputs.decoder_hidden_states
+++    #                     if self.config.is_encoder_decoder
+++    #                     else outputs.hidden_states
+++    #                 )
+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++            
+++    #         # token selection
+++    #         if do_sample:
+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++    #         else:
+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++            
+++    #         # finished sentences should have their next token be a padding token
+++    #         if has_eos_stopping_criteria:
+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++            
+++    #         # update generated ids, model inputs, and length for next step
+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++    #         if streamer is not None:
+++    #             streamer.put(next_tokens)
+++            
+++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++    #             outputs,
+++    #             model_kwargs,
+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++    #         )
+++            
+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++    #         cur_len += 1
+++            
+++    #         if _record_time:
+++    #             import time as time_module
+++    #             infer_stop = time_module.time()
+++    #             time_record.append(infer_stop - infer_start)
+++            
+++    #         del outputs
+++        
+++    #     average_infer_time = None
+++    #     if time_record:
+++    #         if len(time_record) > 1:
+++    #             time_record.pop(0)
+++    #         average_infer_time = sum(time_record) / len(time_record)
+++    #         print(f'average inference time is: {average_infer_time}')
+++    #         print(f'inference time record: {time_record}')
+++        
+++    #     if streamer is not None:
+++    #         streamer.end()
+++        
+++    #     # 简单判断：打印是否使用了JIT路径
+++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++    #     else:
+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++        
+++    #     if return_dict_in_generate:
+++    #         if self.config.is_encoder_decoder:
+++    #             return GenerateEncoderDecoderOutput(
+++    #                 sequences=input_ids,
+++    #                 scores=scores,
+++    #                 logits=raw_logits,
+++    #                 encoder_attentions=encoder_attentions,
+++    #                 encoder_hidden_states=encoder_hidden_states,
+++    #                 decoder_attentions=decoder_attentions,
+++    #                 cross_attentions=cross_attentions,
+++    #                 decoder_hidden_states=decoder_hidden_states,
+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++    #                 average_infer_time=average_infer_time
+++    #             )
+++    #         else:
+++    #             return GenerateDecoderOnlyOutput(
+++    #                 sequences=input_ids,
+++    #                 scores=scores,
+++    #                 logits=raw_logits,
+++    #                 attentions=decoder_attentions,
+++    #                 hidden_states=decoder_hidden_states,
+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++    #                 average_infer_time=average_infer_time
+++    #             )
+++    #     else:
+++    #         return input_ids
+++            
+++    # def _prepare_cache_for_generation(
+++    #     self,
+++    #     generation_config,
+++    #     model_kwargs,
+++    #     assistant_model,
+++    #     batch_size,
+++    #     max_cache_length,
+++    # ):
+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++    #         generation_config.cache_implementation = "static"
+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++        
+++    #     if generation_config.cache_implementation == "static":
+++    #         base_required_from_max_length = generation_config.max_length + 1
+++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++    #         min_cache_size = 50
+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++    #         else:
+++    #             max_cache_length = max(base_required, min_cache_size)
+++            
+++    #         original_max_cache_length = max_cache_length
+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++            
+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++    #             if max_cache_length > self.config.max_position_embeddings:
+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++        
+++    #     result = super()._prepare_cache_for_generation(
+++    #         generation_config=generation_config,
+++    #         model_kwargs=model_kwargs,
+++    #         assistant_model=assistant_model,
+++    #         batch_size=batch_size,
+++    #         max_cache_length=max_cache_length,
+++    #     )
+++        
+++    #     if generation_config.cache_implementation == "static":
+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++    #         created_cache = model_kwargs.get(cache_name)
+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++    #             if created_cache.max_cache_len < generation_config.max_length:
+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++        
+++    #     return result
+++
+++
+++
++ 
++ 
++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
new file mode 100644
index 00000000..d64b7f3f
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
@@ -0,0 +1,2769 @@
+From 7a37d9be16fe823c251701c26bbb20cc09f9922a Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Thu, 6 Nov 2025 14:54:37 +0800
+Subject: [PATCH 03/10] 20261106secondcommit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+ patches/0001-20251104commit.patch             | 1272 -----------------
+ 3 files changed, 528 insertions(+), 2032 deletions(-)
+ delete mode 100644 patches/0001-20251104commit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 73773c22..2f9192bf 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+ 
+ _CONFIG_FOR_DOC = "DeepseekConfig"
+ 
++_attn_mask_cache = {}
++
++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
++    q_len = batch_and_seq[1]
++    kv_len = batch_and_seq[1] + past_key_values_length 
++    key = (batch_and_seq[0], q_len, kv_len)
++
++    if key in _attn_mask_cache:
++        return _attn_mask_cache[key]
++
++    mask = _prepare_4d_causal_attention_mask(
++        attention_mask,
++        batch_and_seq,
++        inputs_embeds,
++        past_key_values_length,
++    )
++    _attn_mask_cache[key] = mask
++    return mask
+ 
+ def _get_unpad_data(attention_mask):
+     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+         return final_output
+ 
+ 
+-    @no_grad()
+-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-        expert_cache = ops.zeros_like(x)
+-        idxs = flat_expert_indices.argsort()
+-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-        token_idxs = idxs // self.num_experts_per_tok
+-
+-        for i, end_idx in enumerate(tokens_per_expert):
+-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-            if start_idx == end_idx:
+-                continue
+-            expert = self.experts[i]
+-            exp_token_idx = token_idxs[start_idx:end_idx]
+-            expert_tokens = x[exp_token_idx]
+-            expert_out = expert(expert_tokens)
+-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-
+-        return expert_cache
+-        
+     # @no_grad()
+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-    #     # expert_cache = torch.zeros_like(x)
+-    #     # idxs = flat_expert_indices.argsort()
+-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-    #     # token_idxs = idxs // self.num_experts_per_tok
+-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-    #     #     if start_idx == end_idx:
+-    #     #         continue
+-    #     #     expert = self.experts[i]
+-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-    #     #     expert_tokens = x[exp_token_idx]
+-    #     #     expert_out = expert(expert_tokens)
+-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-    #     # return expert_cache
++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+     #     expert_cache = ops.zeros_like(x)
+     #     idxs = flat_expert_indices.argsort()
+     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+ 
+     #     return expert_cache
+-    # @no_grad()
+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-    #     expert_cache = ops.zeros_like(x)
++        
++    @no_grad()
++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++        """
++        优化版 MoE prefill：
++        - 批量张量化处理同一个 expert 的所有 token
++        - 跳过无 token 的专家
++        - 保持结果完全一致
++        """
++        # 初始化输出缓存
++        expert_cache = ops.zeros_like(x)
+ 
+-    #     # 排序保证顺序一致
+-    #     idxs = flat_expert_indices.argsort()
+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-    #     token_idxs = idxs // self.num_experts_per_tok
++        # 排序（确保 scatter_add 位置对应原逻辑）
++        idxs = flat_expert_indices.argsort()
++        sorted_expert_indices = flat_expert_indices[idxs]
++        sorted_token_indices = idxs // self.num_experts_per_tok
+ 
+-    #     # 找出有 token 的专家
+-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++        # 每个 expert 的 token 数
++        tokens_per_expert = sorted_expert_indices.bincount()
+ 
+-    #     for i in active_experts.tolist():
+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-    #         end_idx = tokens_per_expert[i]
+-    #         if start_idx == end_idx:  # 没有 token
+-    #             continue
++        # 找出有 token 的专家
++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+ 
+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-    #         expert_tokens = x[exp_token_idx]
+-    #         expert_out = self.experts[i](expert_tokens)
+-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++        for expert_id in active_experts.tolist():
++            # 取该 expert 对应的排序后 token 区间
++            start = (tokens_per_expert[:expert_id]).sum().item()
++            end = start + tokens_per_expert[expert_id].item()
+ 
+-    #         expert_cache = mindspore.mint.scatter_add(
+-    #             expert_cache,
+-    #             0,
+-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-    #             expert_out
+-    #         )
++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
++            expert_tokens = x[token_idx]                     # 取输入向量
+ 
+-    #     return expert_cache
++            # 执行专家 MLP
++            expert_out = self.experts[expert_id](expert_tokens)
++
++            # 按权重缩放
++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
++
++            # 回写到缓存（等价 scatter_add）
++            expert_cache = mindspore.mint.scatter_add(
++                expert_cache,
++                0,
++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
++                scaled_out
++            )
++
++        return expert_cache
++
++        # @no_grad()
++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++        #     # expert_cache = torch.zeros_like(x)
++        #     # idxs = flat_expert_indices.argsort()
++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++        #     # token_idxs = idxs // self.num_experts_per_tok
++        #     # for i, end_idx in enumerate(tokens_per_expert):
++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++        #     #     if start_idx == end_idx:
++        #     #         continue
++        #     #     expert = self.experts[i]
++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++        #     #     expert_tokens = x[exp_token_idx]
++        #     #     expert_out = expert(expert_tokens)
++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++        #     # return expert_cache
++        #     expert_cache = ops.zeros_like(x)
++        #     idxs = flat_expert_indices.argsort()
++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++        #     token_idxs = idxs // self.num_experts_per_tok
++
++        #     for i, end_idx in enumerate(tokens_per_expert):
++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++        #         if start_idx == end_idx:
++        #             continue
++        #         expert = self.experts[i]
++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++        #         expert_tokens = x[exp_token_idx]
++        #         expert_out = expert(expert_tokens)
++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++
++        #     return expert_cache
++        # @no_grad()
++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++        #     expert_cache = ops.zeros_like(x)
++
++        #     # 排序保证顺序一致
++        #     idxs = flat_expert_indices.argsort()
++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++        #     token_idxs = idxs // self.num_experts_per_tok
++
++        #     # 找出有 token 的专家
++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++
++        #     for i in active_experts.tolist():
++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++        #         end_idx = tokens_per_expert[i]
++        #         if start_idx == end_idx:  # 没有 token
++        #             continue
++
++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++        #         expert_tokens = x[exp_token_idx]
++        #         expert_out = self.experts[i](expert_tokens)
++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++
++        #         expert_cache = mindspore.mint.scatter_add(
++        #             expert_cache,
++        #             0,
++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++        #             expert_out
++        #         )
++
++        #     return expert_cache
+ 
+ 
+ 
+@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+ 
+         return attn_output, attn_weights, past_key_value
+ 
+-
+ # class DeepseekFlashAttention(nn.Module):
+ #     """
+ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+ 
+         return attn_output, attn_weights, past_key_value
+ 
++
+ Deepseek_ATTENTION_CLASSES = {
+     "eager": DeepseekAttention,
+     "flash-attention": DeepseekFlashAttention,
+@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+             )
+         else:
+             # 4d mask is passed through the layers
+-            attention_mask = _prepare_4d_causal_attention_mask(
++            # attention_mask = _prepare_4d_causal_attention_mask(
++            #     attention_mask,
++            #     (batch_size, seq_length),
++            #     inputs_embeds,
++            #     past_key_values_length,
++            # )
++            #@dwj
++            attention_mask = get_cached_causal_mask(
+                 attention_mask,
+                 (batch_size, seq_length),
+                 inputs_embeds,
+@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+         # Initialize weights and apply final processing
+         self.post_init()
+         self.warm_up = False
++        #@dwj
++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++            self.num_layers,
++            self.num_attention_heads,
++            self.head_dim,
++            batch_size=1,
++            max_length=self.max_length,
++            dtype=mindspore.float16
++        )
++
++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++        key_cache = []
++        value_cache = []
++        for _ in range(num_layers):
++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++            key_cache.append(k)
++            value_cache.append(v)
++        return key_cache, value_cache
++
+ 
+     def warmup_moe_model_deep(self):
+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index bced285c..ebd7782e 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+ 
+-Long_Prompt = False
+-PROMPT_LENGTH_THRESHOLD = 128
++Long_Prompt = 1
++LONG_PROMPT_LENGTH_THRESHOLD = 128
++SHORT_PROMPT_LENGTH_THRESHOLD = 32
++
++_causal_mask_cache = {}
++
++def get_cached_causal_mask_with_cache_position(
++    attention_mask: mindspore.Tensor,
++    sequence_length: int,
++    target_length: int,
++    dtype: mindspore.dtype,
++    min_dtype: float,
++    cache_position: mindspore.Tensor,
++    batch_size: int,
++):
++    """
++    带缓存的 causal mask 构造函数
++    """
++    # q_len 是当前 query 长度
++    q_len = sequence_length
++    # kv_len 是 target_length
++    kv_len = target_length
++
++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
++
++    if key in _causal_mask_cache:
++        return _causal_mask_cache[key]
++
++    # 调用原来的 mask 构造逻辑
++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++        attention_mask,
++        sequence_length=sequence_length,
++        target_length=target_length,
++        dtype=dtype,
++        min_dtype=min_dtype,
++        cache_position=cache_position,
++        batch_size=batch_size,
++    )
++    # 缓存结果
++    _causal_mask_cache[key] = causal_mask
++    return causal_mask
+ 
+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+ def _prepare_4d_causal_attention_mask_with_cache_position(
+@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+ 
+ 
+ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
++# class Qwen2MoeAttention(nn.Module):
++#     """
++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++#     and "Generating Long Sequences with Sparse Transformers".
++#     """
++
++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++#         super().__init__()
++#         self.config = config
++#         self.layer_idx = layer_idx
++#         if layer_idx is None:
++#             logger.warning_once(
++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++#                 "when creating this class."
++#             )
++
++#         self.hidden_size = config.hidden_size
++#         self.num_heads = config.num_attention_heads
++#         self.head_dim = self.hidden_size // self.num_heads
++#         self.num_key_value_heads = config.num_key_value_heads
++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++#         self.max_position_embeddings = config.max_position_embeddings
++#         self.rope_theta = config.rope_theta
++#         self.is_causal = True
++#         self.attention_dropout = config.attention_dropout
++
++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++#             raise ValueError(
++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++#                 f" and `num_heads`: {self.num_heads})."
++#             )
++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++
++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++#             self.head_dim,
++#             max_position_embeddings=self.max_position_embeddings,
++#             base=self.rope_theta,
++#         )
++
++#     def forward(
++#         self,
++#         hidden_states: mindspore.Tensor,
++#         attention_mask: Optional[mindspore.Tensor] = None,
++#         position_ids: Optional[mindspore.Tensor] = None,
++#         past_key_value: Optional[Cache] = None,
++#         output_attentions: bool = False,
++#         use_cache: bool = False,
++#         cache_position: Optional[mindspore.Tensor] = None,
++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++
++        
++
++#         bsz, q_len, _ = hidden_states.shape
++
++#         query_states = self.q_proj(hidden_states)
++#         key_states = self.k_proj(hidden_states)
++#         value_states = self.v_proj(hidden_states)
++
++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++
++#         kv_seq_len = key_states.shape[-2]
++#         if past_key_value is not None:
++#             if self.layer_idx is None:
++#                 raise ValueError(
++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++#                     "with a layer index."
++#                 )
++#             if isinstance(past_key_value, StaticCache):
++#                 kv_seq_len = key_states.shape[-2]
++#             else:
++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++
++#         if past_key_value is not None:
++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++            
++#             if isinstance(past_key_value, StaticCache):
++#                 kv_seq_len = key_states.shape[-2]
++
++#         # repeat k/v heads if n_kv_heads < n_heads
++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
++        
++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++
++#         if attention_mask is not None:
++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++#             attn_weights = attn_weights + causal_mask
++
++#         # upcast attention to fp32
++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++#         attn_output = ops.matmul(attn_weights, value_states)
++
++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++#             raise ValueError(
++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++#                 f" {attn_output.shape}"
++#             )
++
++#         attn_output = ops.transpose(attn_output, 1, 2)
++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++
++#         attn_output = self.o_proj(attn_output)
++#         # @lwx
++        
++#         # max_seq_len = self.max_position_embeddings  # 2048
++
++#         # if attention_mask is not None:
++#         #     # attention_mask: [B, 1, Sq, Sk]
++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++
++#         #     # pad 到 [max_seq_len, max_seq_len]
++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++#         #     global_attention_mask = padded_mask
++#         # else:
++#         #     global_attention_mask = None
++
++
++#         # sparse_mode=3
++#         # attn_output = mindspore.ops.flash_attention_score(
++#         #     query=query_states,
++#         #     key=key_states,
++#         #     value=value_states,
++#         #     real_shift=None,
++#         #     padding_mask=None,
++
++#         #     head_num=self.num_heads,
++#         #     attn_mask=global_attention_mask,
++#         #     keep_prob=1.0 - self.attention_dropout,
++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
++#         #     input_layout="BNSD",
++#         #     pre_tokens=2147483647,
++#         #     next_tokens=2147483647,
++#         #     inner_precise=0,
++#         #     drop_mask=None,
++#         #     prefix=None,
++#         #     actual_seq_qlen=None,
++#         #     actual_seq_kvlen=None,
++#         #     sparse_mode=sparse_mode,
++#         # )
++#         if not output_attentions:
++#             attn_weights = None
++
++#         return attn_output, attn_weights, past_key_value
++
+ class Qwen2MoeAttention(nn.Module):
+     """
+-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-    and "Generating Long Sequences with Sparse Transformers".
+-    """
++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+ 
++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
++
++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
++    """
+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+         super().__init__()
+         self.config = config
+@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+         if layer_idx is None:
+             logger.warning_once(
+                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+                 "when creating this class."
+             )
+ 
+@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+         use_cache: bool = False,
+         cache_position: Optional[mindspore.Tensor] = None,
+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+         
+-
++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+         bsz, q_len, _ = hidden_states.shape
+ 
+         query_states = self.q_proj(hidden_states)
+         key_states = self.k_proj(hidden_states)
+         value_states = self.v_proj(hidden_states)
+ 
+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-
++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++        
+         kv_seq_len = key_states.shape[-2]
+         if past_key_value is not None:
+-            if self.layer_idx is None:
+-                raise ValueError(
+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-                    "with a layer index."
+-                )
+-            if isinstance(past_key_value, StaticCache):
+-                kv_seq_len = key_states.shape[-2]
+-            else:
+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++        
+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+ 
+         if past_key_value is not None:
+-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++
++        # --- 2. 动态调度核心注意力计算 ---
++        global Long_Prompt
++        if Long_Prompt >= 1:
++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
++            fa_attention_mask = None
++            if attention_mask is not None:
++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++                fa_attention_mask = (mask_slice != 0)
++
++            attn_output = mindspore.ops.flash_attention_score(
++                query=query_states,
++                key=key_states,
++                value=value_states,
++                head_num=self.num_heads,
++                attn_mask=fa_attention_mask,
++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
++                scalar_value=1.0 / math.sqrt(self.head_dim),
++                input_layout="BNSD",
++                sparse_mode=0,
++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
++            )
+             
+-            if isinstance(past_key_value, StaticCache):
+-                kv_seq_len = key_states.shape[-2]
++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++            attn_output = self.o_proj(attn_output)
++            attn_weights = None
++            if output_attentions:
++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+ 
+-        # repeat k/v heads if n_kv_heads < n_heads
+-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-        
+-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++        else:
++            # --- Eager Attention 路径 (用于短序列和解码) ---
++            key_states = repeat_kv(key_states, self.num_key_value_groups)
++            value_states = repeat_kv(value_states, self.num_key_value_groups)
++            
++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+ 
+-        if attention_mask is not None:
+-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-            attn_weights = attn_weights + causal_mask
++            if attention_mask is not None:
++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++                attn_weights = attn_weights + causal_mask
+ 
+-        # upcast attention to fp32
+-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-        attn_output = ops.matmul(attn_weights, value_states)
++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++            attn_output = ops.matmul(attn_weights, value_states)
+ 
+-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-            raise ValueError(
+-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-                f" {attn_output.shape}"
+-            )
++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++                raise ValueError(
++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
++                )
+ 
+-        attn_output = ops.transpose(attn_output, 1, 2)
+-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++            attn_output = ops.transpose(attn_output, 1, 2)
++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++            attn_output = self.o_proj(attn_output)
+ 
+-        attn_output = self.o_proj(attn_output)
+-        # @lwx
++            if not output_attentions:
++                attn_weights = None
+         
+-        # max_seq_len = self.max_position_embeddings  # 2048
+-
+-        # if attention_mask is not None:
+-        #     # attention_mask: [B, 1, Sq, Sk]
+-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-
+-        #     # pad 到 [max_seq_len, max_seq_len]
+-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-        #     global_attention_mask = padded_mask
+-        # else:
+-        #     global_attention_mask = None
+-
+-
+-        # sparse_mode=3
+-        # attn_output = mindspore.ops.flash_attention_score(
+-        #     query=query_states,
+-        #     key=key_states,
+-        #     value=value_states,
+-        #     real_shift=None,
+-        #     padding_mask=None,
+-
+-        #     head_num=self.num_heads,
+-        #     attn_mask=global_attention_mask,
+-        #     keep_prob=1.0 - self.attention_dropout,
+-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-        #     input_layout="BNSD",
+-        #     pre_tokens=2147483647,
+-        #     next_tokens=2147483647,
+-        #     inner_precise=0,
+-        #     drop_mask=None,
+-        #     prefix=None,
+-        #     actual_seq_qlen=None,
+-        #     actual_seq_kvlen=None,
+-        #     sparse_mode=sparse_mode,
+-        # )
+-        if not output_attentions:
+-            attn_weights = None
+-
+         return attn_output, attn_weights, past_key_value
+ 
+-
+ # class Qwen2MoeFlashAttention(nn.Module):
+ #     """
+ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+ #             return final_hidden_states, router_logits
+ 
+ 
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     """
+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         # 门控网络
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         # 专家列表
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-#         # 共享专家
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     @no_grad()
+-#     def _moe_infer_decode(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         """
+-#         【解码路径】针对 sequence_length=1 的极致优化。
+-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-#         """
+-#         batch_size, hidden_dim = hidden_states.shape
+-        
+-#         expert_outputs_list = [
+-#             ops.cat([
+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-#             ], dim=0) 
+-#             for i in range(batch_size)
+-#         ]
+-        
+-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-#         # shape: (batch_size, top_k, hidden_dim)
+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-        
+-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-        
+-#         return moe_output.squeeze(1)
+-
+-#     @no_grad()
+-#     def _moe_infer_prefill(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         """
+-#         【预填充路径】针对 sequence_length > 1 的优化。
+-#         按专家对 Token 进行分组，并进行批处理。
+-#         """
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens = hidden_states.shape[0]
+-#         flat_selected_experts = selected_experts.flatten()
+-        
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-        
+-#         active_experts = ops.unique(flat_selected_experts)
+-        
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-            
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-#             selected_token_indices = token_indices[mask]
+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-            
+-#             current_states = hidden_states[selected_token_indices]
+-            
+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-            
+-#             moe_output = moe_output.index_add(
+-#                 dim=0,
+-#                 index=selected_token_indices,
+-#                 source=expert_output.to(hidden_states.dtype)
+-#             )
+-#         return moe_output
+-
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#         """
+-#         顶层 forward 方法，作为智能分发器。
+-#         """
+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-        
+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#         router_logits = self.gate(hidden_states_reshaped)
+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-#         if self.norm_topk_prob:
+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        
+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-        
+-#         moe_output = None
+-#         # 在推理时，根据序列长度选择最优路径
+-#         if not self.training:
+-#             if sequence_length == 1:
+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-#             else:
+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-#         else:
+-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-#             raise NotImplementedError("Training path is not implemented.")
+-
+-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-        
+-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-        
+-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-        
+-#         return final_hidden_states, router_logits
+-
+-
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     """
+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         # 门控网络
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         # 专家列表
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-#         # 共享专家
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     @no_grad()
+-#     def _moe_infer_decode(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         batch_size, _ = hidden_states.shape
+-#         expert_outputs_list = [
+-#             ops.cat([
+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-#             ], dim=0) 
+-#             for i in range(batch_size)
+-#         ]
+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-#         return moe_output.squeeze(1)
+-
+-#     @no_grad()
+-#     def _moe_infer_prefill(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens = hidden_states.shape[0]
+-#         flat_selected_experts = selected_experts.flatten()
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-#         active_experts = ops.unique(flat_selected_experts)
+-        
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-#             selected_token_indices = token_indices[mask]
+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-#             current_states = hidden_states[selected_token_indices]
+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-#             moe_output = moe_output.index_add(
+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-#             )
+-#         return moe_output
+-
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#         """
+-#         顶层 forward 方法，作为智能分发器。
+-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-#         """
+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-        
+-#         # 1. 门控计算 (通用逻辑)
+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#         router_logits = self.gate(hidden_states_reshaped)
+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-#         if self.norm_topk_prob:
+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        
+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-        
+-#         # 2. 智能分发到最优 MoE 路径
+-#         moe_output = None
+-#         if not self.training:
+-#             if sequence_length == 1:
+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-#             else:
+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-#         else:
+-#             raise NotImplementedError("Training path is not implemented.")
+-
+-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-        
+-#         # 4. 合并 MoE 输出和共享专家输出
+-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-        
+-#         # 5. 恢复原始形状并返回
+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-        
+-#         return final_hidden_states, router_logits
+-
+-# prefill fastest
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     """
+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         # 门控网络
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         # 专家列表
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-#         # 共享专家
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     @no_grad()
+-#     def _moe_infer_dispatch(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         """
+-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-#         """
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens, _ = hidden_states.shape
+-        
+-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-#         flat_selected_experts = selected_experts.flatten()
+-#         flat_routing_weights = routing_weights.flatten()
+-
+-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-
+-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-#         active_experts = ops.unique(flat_selected_experts)
+-        
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-            
+-#             # 找到所有分配给该专家的 token
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-            
+-#             # 使用 mask 选取对应的 token 和权重
+-#             current_token_indices = token_indices[mask]
+-#             current_routing_weights = flat_routing_weights[mask]
+-#             current_hidden_states = hidden_states[current_token_indices]
+-            
+-#             # 对这些 token 进行批处理
+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-            
+-#             # 使用 index_add 将结果精确地加回到对应位置
+-#             moe_output = moe_output.index_add(
+-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-#             )
+-#         return moe_output
+-
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#         """
+-#         顶层 forward 方法，作为智能分发器。
+-#         """
+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-        
+-#         # 1. 门控计算
+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#         router_logits = self.gate(hidden_states_reshaped)
+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-#         if self.norm_topk_prob:
+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        
+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-        
+-#         # 2. 调用统一的 MoE 计算内核
+-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-
+-#         # 3. 统一处理共享专家
+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-        
+-#         # 4. 合并输出
+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-        
+-#         # 5. 恢复原始形状并返回
+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-        
+-#         return final_hidden_states, router_logits
+-
+-
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     """
+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-#     【最终高性能与高精度版】：
+-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-#     3. 这样实现了速度和准确性的两全其美。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     @no_grad()
+-#     def _moe_infer_decode(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         """
+-#         【解码路径】极致优化版：bmm + 高精度累加。
+-#         """
+-#         original_dtype = hidden_states.dtype
+-#         batch_size, _ = hidden_states.shape
+-        
+-#         expert_outputs_list = [
+-#             ops.cat([
+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-#             ], dim=0) 
+-#             for i in range(batch_size)
+-#         ]
+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-
+-#         # 在 float32 下执行 bmm，得到高精度结果
+-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-        
+-#         # 将高精度结果转换回原始数据类型
+-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-        
+-#         return moe_output
+-
+-#     @no_grad()
+-#     def _moe_infer_prefill(
+-#         self, 
+-#         hidden_states: mindspore.Tensor, 
+-#         selected_experts: mindspore.Tensor, 
+-#         routing_weights: mindspore.Tensor
+-#     ) -> mindspore.Tensor:
+-#         """
+-#         【预填充路径】与原始实现一致，结果精确。
+-#         """
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens, _ = hidden_states.shape
+-#         flat_selected_experts = selected_experts.flatten()
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-#         active_experts = ops.unique(flat_selected_experts)
+-        
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-#             selected_token_indices = token_indices[mask]
+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-#             current_states = hidden_states[selected_token_indices]
+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-#             moe_output = moe_output.index_add(
+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-#             )
+-#         return moe_output
+-
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-        
+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#         router_logits = self.gate(hidden_states_reshaped)
+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-#         if self.norm_topk_prob:
+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        
+-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-#         # 如果模型主体是 float16，后续再转换
+-        
+-#         moe_output = None
+-#         if not self.training:
+-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-#             # _moe_infer_decode 内部会处理好类型转换
+-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-#             if sequence_length == 1:
+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-#             else:
+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-#         else:
+-#             raise NotImplementedError("Training path is not implemented.")
+-
+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-        
+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-        
+-#         return final_hidden_states, router_logits
+-    
+-
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     """
+-#     【融合版】一个混合专家模块，内置两种推理策略，
+-#     由外部全局变量 `Long_Prompt` 控制：
+-
+-#     - if Long_Prompt is True:  【精度优先模式】
+-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-#       适用于处理长序列，避免误差累积。
+-
+-#     - if Long_Prompt is False: 【速度优先模式】
+-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     # --- 速度优先模式的辅助函数 ---
+-#     @no_grad()
+-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-#         original_dtype = hidden_states.dtype
+-#         batch_size, _ = hidden_states.shape
+-#         expert_outputs_list = [
+-#             ops.cat([
+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-#             ], dim=0) 
+-#             for i in range(batch_size)
+-#         ]
+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-
+-#     @no_grad()
+-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens, _ = hidden_states.shape
+-#         flat_selected_experts = selected_experts.flatten()
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-#         active_experts = ops.unique(flat_selected_experts)
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-#             selected_token_indices = token_indices[mask]
+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-#             current_states = hidden_states[selected_token_indices]
+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-#         return moe_output
+-
+-#     # --- 精度优先模式的辅助函数 ---
+-#     @no_grad()
+-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-#         moe_output = ops.zeros_like(hidden_states)
+-#         num_tokens, _ = hidden_states.shape
+-#         flat_selected_experts = selected_experts.flatten()
+-#         flat_routing_weights = routing_weights.flatten()
+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-#         active_experts = ops.unique(flat_selected_experts)
+-#         for expert_idx_tensor in active_experts:
+-#             expert_idx = expert_idx_tensor.item()
+-#             expert_layer = self.experts[expert_idx]
+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-#             current_token_indices = token_indices[mask]
+-#             current_routing_weights = flat_routing_weights[mask]
+-#             current_hidden_states = hidden_states[current_token_indices]
+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-#         return moe_output
+-
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-#         global Long_Prompt
+-
+-#         # 1. 门控计算 (所有模式通用)
+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#         router_logits = self.gate(hidden_states_reshaped)
+-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-#         if self.norm_topk_prob:
+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-        
+-#         moe_output = None
+-#         if not self.training:
+-#             # 根据 Long_Prompt 标志选择模式
+-#             if Long_Prompt:
+-#                 # --- 精度优先模式 ---
+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-#             else:
+-#                 # --- 速度优先模式 ---
+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-#                 if sequence_length == 1:
+-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-#                 else:
+-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-#         else:
+-#             raise NotImplementedError("Training path is not implemented.")
+-
+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-        
+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-        
+-#         return final_hidden_states, router_logits
+-    
+ class Qwen2MoeSparseMoeBlock(nn.Module):
+     """
+     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+         return moe_output_fp32.squeeze(1).to(original_dtype)
+ 
++    # @no_grad()
++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++    #     num_tokens, _ = hidden_states.shape
++    #     flat_selected_experts = selected_experts.flatten()
++    #     sorted_expert_indices = flat_selected_experts.argsort()
++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++    #     original_token_indices = sorted_expert_indices // self.top_k
++    #     moe_output = ops.zeros_like(hidden_states)
++    #     current_token_offset = 0
++    #     for i in range(self.num_experts):
++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
++    #         if expert_token_count == 0:
++    #             continue
++    #         end_offset = current_token_offset + expert_token_count
++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++    #         current_token_offset += expert_token_count
++    #     return moe_output
++
+     @no_grad()
+     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-        num_tokens, _ = hidden_states.shape
+-        flat_selected_experts = selected_experts.flatten()
+-        sorted_expert_indices = flat_selected_experts.argsort()
+-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-        original_token_indices = sorted_expert_indices // self.top_k
++        """
++        优化版 MoE prefill (速度优先模式)：
++        - 批量张量化处理同一个 expert 的所有 token
++        - 跳过无 token 的专家
++        - 保持结果完全一致
++        """
+         moe_output = ops.zeros_like(hidden_states)
+-        current_token_offset = 0
+-        for i in range(self.num_experts):
+-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-            if expert_token_count == 0:
+-                continue
+-            end_offset = current_token_offset + expert_token_count
+-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-            current_token_offset += expert_token_count
++
++        flat_selected_experts = selected_experts.flatten()
++        flat_routing_weights = routing_weights.flatten()
++
++        idxs = flat_selected_experts.argsort()
++        sorted_expert_indices = flat_selected_experts[idxs]
++        sorted_token_indices = idxs // self.top_k
++
++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
++
++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++
++        for expert_id in active_experts.tolist():
++            start = int(tokens_per_expert[:expert_id].sum().item())
++            end = start + int(tokens_per_expert[expert_id].item())
++
++            token_idx = sorted_token_indices[start:end]
++            expert_tokens = hidden_states[token_idx]
++
++            expert_out = self.experts[expert_id](expert_tokens)
++
++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
++
++            moe_output = mindspore.mint.scatter_add(
++                moe_output,
++                0,
++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
++                scaled_out.to(hidden_states.dtype)
++            )
++
+         return moe_output
+ 
++
+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+     @no_grad()
+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+         
+         moe_output = None
+-        if Long_Prompt:
+-            # --- 精度优先模式 (ACCURACY MODE) ---
+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        # if Long_Prompt==0:
++        #     # --- 精度优先模式 (ACCURACY MODE) ---
++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        # else:
++        #     # --- 速度优先模式 (SPEED MODE) ---
++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++        #     if sequence_length == 1:
++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        #     else:
++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        
++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
++        if sequence_length == 1:
++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+         else:
+-            # --- 速度优先模式 (SPEED MODE) ---
+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-            if sequence_length == 1:
+-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-            else:
+-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-        
++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++    
+ 
+         # 3. 共享专家计算与合并 (所有模式通用)
+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         
+         return final_hidden_states, router_logits
+ 
++
+ class Qwen2MoeDecoderLayer(nn.Module):
+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+         
+-        # if Long_Prompt:
+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-        # else:
++        # if Long_Prompt == 2:
+         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++        # else:
++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
+@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+             )
+ 
+         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++        #     attention_mask,
++        #     sequence_length=sequence_length,
++        #     target_length=target_length,
++        #     dtype=dtype,
++        #     min_dtype=min_dtype,
++        #     cache_position=cache_position,
++        #     batch_size=input_tensor.shape[0],
++        # )
++        #@dwj
++        causal_mask = get_cached_causal_mask_with_cache_position(
+             attention_mask,
+             sequence_length=sequence_length,
+             target_length=target_length,
+@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+         """
+-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
++        _causal_mask_cache.clear()
+ 
+         input_ids = kwargs.get("input_ids")
+         if input_ids is None and args:
+@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+ 
+         if input_ids is not None:
+             prompt_length = input_ids.shape[1]
+-            
+-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-                Long_Prompt = True
++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
++                Long_Prompt = 2
++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
++                Long_Prompt = 0
+             else:
+-                Long_Prompt = False
++                Long_Prompt = 1
++
+ 
+         return super().generate(*args, **kwargs)
+     
+@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+             dtype = self.lm_head.weight.dtype
+             min_dtype = float(ops.finfo(dtype).min)
+ 
+-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++            #     attention_mask,
++            #     sequence_length=sequence_length,
++            #     target_length=past_key_values.get_max_length(),
++            #     dtype=dtype,
++            #     min_dtype=min_dtype,
++            #     cache_position=cache_position,
++            #     batch_size=batch_size,
++            # )
++
++            #@dwj
++            attention_mask = get_cached_causal_mask_with_cache_position(
+                 attention_mask,
+                 sequence_length=sequence_length,
+                 target_length=past_key_values.get_max_length(),
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+deleted file mode 100644
+index 6dfb5b93..00000000
+--- a/patches/0001-20251104commit.patch
++++ /dev/null
+@@ -1,1272 +0,0 @@
+-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH] 20251104commit
+-
+----
+- mindnlp/transformers/cache_utils.py           |  28 +-
+- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+- 3 files changed, 976 insertions(+), 87 deletions(-)
+-
+-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-index cadd2e04..02f8d4be 100644
+---- a/mindnlp/transformers/cache_utils.py
+-+++ b/mindnlp/transformers/cache_utils.py
+-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-             # k_out[:, :, cache_position] = key_states
+-             # v_out[:, :, cache_position] = value_states
+--            if ON_ORANGE_PI:
+--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+--            else:
+--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+--
+-+            # if ON_ORANGE_PI:
+-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+            # else:
+-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+            if cache_position.ndim > 1:
+-+                cache_position = cache_position.flatten()
+-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+                cache_position = cache_position.int()
+-+            
+-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+            k_out[:, :, cache_position] = key_states
+-+            v_out[:, :, cache_position] = value_states
+-+            
+-         return k_out, v_out
+- 
+-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index c695b944..d8303e45 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+- # Copied from transformers.models.llama.modeling_llama.rotate_half
+- def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+--    x1 = x[..., : x.shape[-1] // 2]
+--    x2 = x[..., x.shape[-1] // 2 :]
+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+    # x1 = x[..., : x.shape[-1] // 2]
+-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-         if self.training:
+-             raise NotImplementedError("Training is not supported yet.")
+-         else:
+--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+--        if self.config.n_shared_experts is not None:
+--            y = y + self.shared_experts(identity)
+--        return y
+-+            # @lwx
+-+            if orig_shape[1] == 1:
+-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+                y=y.view(*orig_shape)
+-+                if self.config.n_shared_experts is not None:
+-+                    y = y + self.shared_experts(identity)
+-+                return y
+-+            else:
+-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+                if self.config.n_shared_experts is not None:
+-+                    y = y + self.shared_experts(identity)
+-+                return y
+-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+        # if self.config.n_shared_experts is not None:
+-+        #     y = y + self.shared_experts(identity)
+-+        # return y
+-+        
+-+    @no_grad()
+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+
+-+        expert_cache = ops.zeros_like(x)
+-+        for i in range(self.num_experts_per_tok):
+-+            expert_id = flat_expert_indices[i].item()
+-+            weight = flat_expert_weights[i].item()
+-+            expert = self.experts[expert_id]
+-+            expert_out = expert(x)
+-+            expert_cache += expert_out * weight
+-+        return expert_cache
+- 
+-     @no_grad()
+--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--        # expert_cache = torch.zeros_like(x)
+--        # idxs = flat_expert_indices.argsort()
+--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+--        # token_idxs = idxs // self.num_experts_per_tok
+--        # for i, end_idx in enumerate(tokens_per_expert):
+--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+--        #     if start_idx == end_idx:
+--        #         continue
+--        #     expert = self.experts[i]
+--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+--        #     expert_tokens = x[exp_token_idx]
+--        #     expert_out = expert(expert_tokens)
+--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+--        # return expert_cache
+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-         expert_cache = ops.zeros_like(x)
+-         idxs = flat_expert_indices.argsort()
+-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-         token_idxs = idxs // self.num_experts_per_tok
+-+
+-         for i, end_idx in enumerate(tokens_per_expert):
+-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-             if start_idx == end_idx:
+-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-             expert_out = expert(expert_tokens)
+-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+
+-         return expert_cache
+-+        
+-+    # @no_grad()
+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     # expert_cache = torch.zeros_like(x)
+-+    #     # idxs = flat_expert_indices.argsort()
+-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+    #     #     if start_idx == end_idx:
+-+    #     #         continue
+-+    #     #     expert = self.experts[i]
+-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #     #     expert_tokens = x[exp_token_idx]
+-+    #     #     expert_out = expert(expert_tokens)
+-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+    #     # return expert_cache
+-+    #     expert_cache = ops.zeros_like(x)
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+    #         if start_idx == end_idx:
+-+    #             continue
+-+    #         expert = self.experts[i]
+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #         expert_tokens = x[exp_token_idx]
+-+    #         expert_out = expert(expert_tokens)
+-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+
+-+    #     return expert_cache
+-+    # @no_grad()
+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     expert_cache = ops.zeros_like(x)
+-+
+-+    #     # 排序保证顺序一致
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+    #     # 找出有 token 的专家
+-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+
+-+    #     for i in active_experts.tolist():
+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+    #         end_idx = tokens_per_expert[i]
+-+    #         if start_idx == end_idx:  # 没有 token
+-+    #             continue
+-+
+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #         expert_tokens = x[exp_token_idx]
+-+    #         expert_out = self.experts[i](expert_tokens)
+-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+
+-+    #         expert_cache = mindspore.mint.scatter_add(
+-+    #             expert_cache,
+-+    #             0,
+-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+    #             expert_out
+-+    #         )
+-+
+-+    #     return expert_cache
+-+
+-+
+- 
+- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+- #     """
+-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+- 
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-+        self.warm_up = False
+-+
+-+    def warmup_moe_model_deep(self):
+-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+        test_texts = [
+-+            "warmup short",
+-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+        ]
+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+        if tokenizer is None:
+-+            from mindnlp.transformers import AutoTokenizer
+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+            self._warmup_tokenizer = tokenizer
+-+
+-+        for text in test_texts:
+-+            inputs = tokenizer(text, return_tensors="ms")
+-+            with mindspore._no_grad():
+-+                _ = self(**inputs, use_cache=False)
+-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+- 
+-     def get_input_embeddings(self):
+-         return self.model.embed_tokens
+-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-         ```"""
+-+        if not self.warm_up:
+-+            self.warm_up = True
+-+            self.warmup_moe_model_deep()
+-+
+-         output_attentions = (
+-             output_attentions
+-             if output_attentions is not None
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index 3cbf820e..d4c6b651 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -18,7 +18,6 @@
+- # See the License for the specific language governing permissions and
+- # limitations under the License.
+- """MindSpore Qwen2MoE model."""
+--
+- import math
+- from typing import List, Optional, Tuple, Union
+- 
+-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-     TokenClassifierOutput,
+- )
+- from ...modeling_utils import PreTrainedModel
+-+from ...generation import GenerationMixin
+- from ....utils import logging
+- from .configuration_qwen2_moe import Qwen2MoeConfig
+- 
+-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-         self.variance_epsilon = eps
+- 
+-     def forward(self, hidden_states):
+-+        # @dwj
+-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+        # @lwx
+-+        # if not self.training :
+-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-         input_dtype = hidden_states.dtype
+-         hidden_states = hidden_states.to(mindspore.float32)
+-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-@@ -234,6 +239,8 @@ def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+-     x1 = x[..., : x.shape[-1] // 2]
+-     x2 = x[..., x.shape[-1] // 2 :]
+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-         self.config = config
+-         self.hidden_size = config.hidden_size
+-         self.intermediate_size = intermediate_size
+-+        
+-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-         self.act_fn = ACT2FN[config.hidden_act]
+- 
+-     def forward(self, x):
+--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+--
+- 
+-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+        # @lwx 
+-+        # gate_up_output = self.gate_up_proj(x)
+-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+        # return self.down_proj(swiglu_output)
+-+
+-+    # def forward(self, x):
+-+    #     gate_proj_out = self.gate_proj(x)
+-+    #     up_proj_out = self.up_proj(x)
+-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+    #     return self.down_proj(swiglu_out)
+-+        
+- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-     """
+-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-         use_cache: bool = False,
+-         cache_position: Optional[mindspore.Tensor] = None,
+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+        
+-+
+-         bsz, q_len, _ = hidden_states.shape
+- 
+-         query_states = self.q_proj(hidden_states)
+-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-                     "with a layer index."
+-                 )
+--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+            if isinstance(past_key_value, StaticCache):
+-+                kv_seq_len = key_states.shape[-2]
+-+            else:
+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+- 
+-         if past_key_value is not None:
+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+            
+-+            if isinstance(past_key_value, StaticCache):
+-+                kv_seq_len = key_states.shape[-2]
+- 
+-         # repeat k/v heads if n_kv_heads < n_heads
+-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+--
+-+        
+-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+- 
+--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+--            raise ValueError(
+--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+--                f" {attn_weights.shape}"
+--            )
+--
+--        if attention_mask is not None:  # no matter the length, we just slice it
+--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+        if attention_mask is not None:
+-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-             attn_weights = attn_weights + causal_mask
+- 
+-         # upcast attention to fp32
+-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+- 
+-         attn_output = self.o_proj(attn_output)
+--
+-+        # @lwx
+-+        
+-+        # max_seq_len = self.max_position_embeddings  # 2048
+-+
+-+        # if attention_mask is not None:
+-+        #     # attention_mask: [B, 1, Sq, Sk]
+-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+
+-+        #     # pad 到 [max_seq_len, max_seq_len]
+-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+        #     global_attention_mask = padded_mask
+-+        # else:
+-+        #     global_attention_mask = None
+-+
+-+
+-+        # sparse_mode=3
+-+        # attn_output = mindspore.ops.flash_attention_score(
+-+        #     query=query_states,
+-+        #     key=key_states,
+-+        #     value=value_states,
+-+        #     real_shift=None,
+-+        #     padding_mask=None,
+-+
+-+        #     head_num=self.num_heads,
+-+        #     attn_mask=global_attention_mask,
+-+        #     keep_prob=1.0 - self.attention_dropout,
+-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+        #     input_layout="BNSD",
+-+        #     pre_tokens=2147483647,
+-+        #     next_tokens=2147483647,
+-+        #     inner_precise=0,
+-+        #     drop_mask=None,
+-+        #     prefix=None,
+-+        #     actual_seq_qlen=None,
+-+        #     actual_seq_kvlen=None,
+-+        #     sparse_mode=sparse_mode,
+-+        # )
+-         if not output_attentions:
+-             attn_weights = None
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+- 
+-+class Qwen2MoeFlashAttention(nn.Module):
+-+    """
+-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+
+-+    关键改动:
+-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+       直接传入原始的 key 和 value 张量效率更高。
+-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+    """
+-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+        super().__init__()
+-+        self.config = config
+-+        self.layer_idx = layer_idx
+-+        self.hidden_size = config.hidden_size
+-+        self.num_heads = config.num_attention_heads
+-+        self.head_dim = self.hidden_size // self.num_heads
+-+        self.num_key_value_heads = config.num_key_value_heads
+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+        self.max_position_embeddings = config.max_position_embeddings
+-+        self.rope_theta = config.rope_theta
+-+        self.attention_dropout = config.attention_dropout
+-+
+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+            raise ValueError(
+-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+            )
+-+
+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+
+-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+            self.head_dim,
+-+            max_position_embeddings=self.max_position_embeddings,
+-+            base=self.rope_theta,
+-+        )
+-+
+-+    def forward(
+-+        self,
+-+        hidden_states: mindspore.Tensor,
+-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+        position_ids: Optional[mindspore.Tensor] = None,
+-+        past_key_value: Optional[Cache] = None,
+-+        output_attentions: bool = False,
+-+        use_cache: bool = False,
+-+        cache_position: Optional[mindspore.Tensor] = None,
+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+        bsz, q_len, _ = hidden_states.shape
+-+
+-+        # 1. 线性投射 Q, K, V
+-+        query_states = self.q_proj(hidden_states)
+-+        key_states = self.k_proj(hidden_states)
+-+        value_states = self.v_proj(hidden_states)
+-+
+-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+        # 3. RoPE 旋转位置编码
+-+        kv_seq_len = key_states.shape[-2]
+-+        if past_key_value is not None:
+-+            if self.layer_idx is None:
+-+                raise ValueError(
+-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+                    "with a layer index."
+-+                )
+-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+                if cache_position.shape[0] == 1:
+-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+                    kv_seq_len = past_seen_tokens + 1
+-+                else:
+-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+            else:
+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+        # 4. KV 缓存更新
+-+        if past_key_value is not None:
+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+            key_states, value_states = past_key_value.update(
+-+                key_states, value_states, self.layer_idx, cache_kwargs
+-+            )
+-+            
+-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+                if cache_position.shape[0] == 1:
+-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+                    kv_seq_len = key_states.shape[-2]
+-+
+-+        # 5. [重要] 准备 Attention Mask
+-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+        fa_attention_mask = None
+-+        if attention_mask is not None:
+-+            # 截取与当前key长度匹配的部分
+-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+            fa_attention_mask = (mask_slice != 0)
+-+
+-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+        input_dtype = query_states.dtype
+-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+            query_states = query_states.to(mindspore.float16)
+-+            key_states = key_states.to(mindspore.float16)
+-+            value_states = value_states.to(mindspore.float16)
+-+
+-+        # 6. [核心] 调用 flash_attention_score 算子
+-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+        attn_output = mindspore.ops.flash_attention_score(
+-+            query=query_states,
+-+            key=key_states,
+-+            value=value_states,
+-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-+            attn_mask=fa_attention_mask,
+-+            keep_prob=1.0 - self.attention_dropout,
+-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+            input_layout="BNSD",
+-+            sparse_mode=0 # 使用 defaultMask 模式
+-+        )
+-+
+-+        # 恢复原始数据类型
+-+        attn_output = attn_output.to(input_dtype)
+-+
+-+        # 7. 调整输出形状
+-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+        attn_output = self.o_proj(attn_output)
+-+
+-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-+        attn_weights = None
+-+        if output_attentions:
+-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+        return attn_output, attn_weights, past_key_value
+-+
+-+    # def forward(
+-+    #     self,
+-+    #     hidden_states: mindspore.Tensor,
+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+    #     past_key_value: Optional[Cache] = None,
+-+    #     output_attentions: bool = False,
+-+    #     use_cache: bool = False,
+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+    #     bsz, q_len, _ = hidden_states.shape
+-+
+-+    #     # 1. 线性投射 Q, K, V
+-+    #     query_states = self.q_proj(hidden_states)
+-+    #     key_states = self.k_proj(hidden_states)
+-+    #     value_states = self.v_proj(hidden_states)
+-+
+-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+    #     # 3. RoPE 旋转位置编码
+-+    #     kv_seq_len = key_states.shape[-2]
+-+    #     if past_key_value is not None:
+-+    #         if self.layer_idx is None:
+-+    #             raise ValueError(
+-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+    #                 "with a layer index."
+-+    #             )
+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+    #     # 4. KV 缓存更新
+-+    #     if past_key_value is not None:
+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+    #         key_states, value_states = past_key_value.update(
+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+    #         )
+-+
+-+    #     # 5. 准备 Attention Mask
+-+    #     fa_attention_mask = None
+-+    #     if attention_mask is not None:
+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+    #         fa_attention_mask = (mask_slice != 0)
+-+
+-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+    #     input_dtype = query_states.dtype
+-+
+-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+    #         query=query_states,
+-+    #         key=key_states,
+-+    #         value=value_states,
+-+    #         head_num=self.num_heads,
+-+    #         attn_mask=fa_attention_mask,
+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+    #         input_layout="BNSD",
+-+    #         sparse_mode=0,
+-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+    #         inner_precise=1
+-+    #     )
+-+
+-+    #     # 恢复原始数据类型
+-+    #     attn_output = attn_output.to(input_dtype)
+-+
+-+    #     # 7. 调整输出形状
+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+    #     attn_output = self.o_proj(attn_output)
+-+
+-+    #     attn_weights = None
+-+    #     if output_attentions:
+-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+    #     return attn_output, attn_weights, past_key_value
+-+
+-+    # def forward(
+-+    #     self,
+-+    #     hidden_states: mindspore.Tensor,
+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+    #     past_key_value: Optional[Cache] = None,
+-+    #     output_attentions: bool = False,
+-+    #     use_cache: bool = False,
+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+    #     bsz, q_len, _ = hidden_states.shape
+-+
+-+    #     query_states = self.q_proj(hidden_states)
+-+    #     key_states = self.k_proj(hidden_states)
+-+    #     value_states = self.v_proj(hidden_states)
+-+
+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+    #     kv_seq_len = key_states.shape[-2]
+-+    #     if past_key_value is not None:
+-+    #         if self.layer_idx is None:
+-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+    #     if past_key_value is not None:
+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+    #         key_states, value_states = past_key_value.update(
+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+    #         )
+-+
+-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+
+-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-+    #     # <--- 修改结束 ---
+-+
+-+    #     fa_attention_mask = None
+-+    #     if attention_mask is not None:
+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+    #         fa_attention_mask = (mask_slice != 0)
+-+
+-+    #     input_dtype = query_states.dtype
+-+
+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-+    #         key=key_states,
+-+    #         value=value_states,
+-+    #         head_num=self.num_heads,
+-+    #         attn_mask=fa_attention_mask,
+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+    #         input_layout="BNSD",
+-+    #         sparse_mode=0,
+-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-+    #     )
+-+
+-+    #     attn_output = attn_output.to(input_dtype)
+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+    #     attn_output = self.o_proj(attn_output)
+-+
+-+    #     attn_weights = None
+-+    #     if output_attentions:
+-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+
+-+    #     return attn_output, attn_weights, past_key_value
+-+
+- QWEN2MOE_ATTENTION_CLASSES = {
+-     "eager": Qwen2MoeAttention,
+-+    "flash-attention": Qwen2MoeFlashAttention,
+- }
+- 
+- 
+-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+- 
+-+    #@dwj
+-+    # 只遍历激活的专家，而非全部专家
+-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        hidden_states = hidden_states.view(-1, hidden_dim)
+--        # router_logits: (batch * sequence_length, n_experts)
+--        router_logits = self.gate(hidden_states)
+--
+--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--        if self.norm_topk_prob:
+--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        # we cast back to the input dtype
+--        routing_weights = routing_weights.to(hidden_states.dtype)
+--
+--        final_hidden_states = ops.zeros(
+--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+--        )
+--
+--        # One hot encode the selected experts to create an expert mask
+--        # this will be used to easily index which expert is going to be sollicitated
+--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+--
+--        # Loop over all available experts in the model and perform the computation on each expert
+--        for expert_idx in range(self.num_experts):
+--            expert_layer = self.experts[expert_idx]
+--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+--
+--            # Index the correct hidden states and compute the expert hidden state for
+--            # the current expert. We need to make sure to multiply the output hidden
+--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+--            if 0 not in idx.shape:
+--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+--
+--                # However `index_add_` only support torch tensors for indexing so we'll use
+--                # the `top_x` tensor here.
+--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+--
+--        shared_expert_output = self.shared_expert(hidden_states)
+--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+--
+--        final_hidden_states = final_hidden_states + shared_expert_output
+-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+            num_tokens = hidden_states_reshaped.shape[0]
+-+
+-+            router_logits = self.gate(hidden_states_reshaped)
+-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+
+-+            if self.norm_topk_prob:
+-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-+            
+-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+            flat_selected_experts = selected_experts.flatten()
+-+            
+-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+            token_indices = broadcasted_token_indices.flatten()
+-+            
+-+            active_experts = ops.unique(flat_selected_experts)
+-+            
+-+            for expert_idx_tensor in active_experts:
+-+                expert_idx = expert_idx_tensor.item()
+-+                expert_layer = self.experts[expert_idx]
+-+                
+-+                mask = (flat_selected_experts == expert_idx_tensor)
+-+                selected_token_indices = token_indices[mask]
+-+                selected_routing_weights = routing_weights.flatten()[mask]
+-+                
+-+                current_states = hidden_states_reshaped[selected_token_indices]
+-+                
+-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+                
+-+                final_hidden_states = final_hidden_states.index_add(
+-+                    dim=0,
+-+                    index=selected_token_indices,
+-+                    source=expert_output.to(hidden_states.dtype)
+-+                )
+-+            
+-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+- 
+--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+--        return final_hidden_states, router_logits
+-+            final_hidden_states = final_hidden_states + shared_expert_output
+-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+            
+-+            return final_hidden_states, router_logits
+- 
+- 
+- class Qwen2MoeDecoderLayer(nn.Module):
+-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+- 
+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+- 
+-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+
+-         if (layer_idx not in config.mlp_only_layers) and (
+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-         ):
+-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-     _skip_keys_device_placement = "past_key_values"
+-     _supports_cache_class = True
+-+#lwx
+-+    # _supports_static_cache = True
+- 
+-     def _init_weights(self, module):
+-         std = self.config.initializer_range
+-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-         return causal_mask
+- 
+- 
+--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-     _tied_weights_keys = ["lm_head.weight"]
+- 
+-     def __init__(self, config):
+-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-         self.num_experts_per_tok = config.num_experts_per_tok
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-+        # @lwx
+-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+        #     self.generation_config.cache_implementation = "static"
+-+        self._warmed_up = False
+-+
+-+    def warmup_moe_model(self):
+-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+        test_texts = [
+-+            "warmup short",
+-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+        ]
+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+        if tokenizer is None:
+-+            from mindnlp.transformers import AutoTokenizer
+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+            self._warmup_tokenizer = tokenizer
+-+
+-+        for text in test_texts:
+-+            inputs = tokenizer(text, return_tensors="ms")
+-+            with mindspore._no_grad():
+-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+- 
+-     def get_input_embeddings(self):
+-         return self.model.embed_tokens
+-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-         ```"""
+-+        if not self._warmed_up:
+-+            self._warmed_up = True
+-+            self.warmup_moe_model()
+- 
+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-         output_router_logits = (
+-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-             }
+-         )
+-         return model_inputs
+-+# @lwx
+-+    # def _decode_one_tokens_logits(
+-+    #     self,
+-+    #     cur_token: mindspore.Tensor,
+-+    #     input_pos: Optional[mindspore.Tensor],
+-+    #     cache_position: mindspore.Tensor,
+-+    #     past_key_values: StaticCache,
+-+    # ) -> mindspore.Tensor:
+-+    #     """
+-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+        
+-+    #     Args:
+-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+    #         input_pos: 输入位置信息，可选
+-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+            
+-+    #     Returns:
+-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+    #     """
+-+    #     # 调用JIT编译的版本
+-+    #     return self.get_decode_one_tokens_logits(
+-+    #         cur_token=cur_token,
+-+    #         input_pos=input_pos,
+-+    #         cache_position=cache_position,
+-+    #         past_key_values=past_key_values,
+-+    #     )
+-+    
+-+    # @mindspore.jit(jit_level='O1')
+-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+    #     """
+-+    #     JIT编译的函数，用于高效的单token解码
+-+    #     使用JIT编译优化以支持静态shape和高效执行
+-+        
+-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+    #     """
+-+    #     outputs = self.model.forward(
+-+    #         input_ids=cur_token,
+-+    #         position_ids=input_pos,
+-+    #         cache_position=cache_position,
+-+    #         past_key_values=past_key_values,
+-+    #         use_cache=True,
+-+    #         return_dict=False,
+-+    #     )
+-+        
+-+    #     hidden_states = outputs[0]
+-+    #     logits = self.lm_head.forward(hidden_states)
+-+    #     logits = logits.float()
+-+        
+-+    #     return logits[:, -1, :]
+-+
+-+    # def _sample(
+-+    #     self,
+-+    #     input_ids: mindspore.Tensor,
+-+    #     logits_processor,
+-+    #     stopping_criteria,
+-+    #     generation_config,
+-+    #     synced_devices: bool,
+-+    #     streamer=None,
+-+    #     logits_warper=None,
+-+    #     **model_kwargs,
+-+    # ):
+-+    #     """
+-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+    #     """
+-+    #     from ...generation.logits_process import LogitsProcessorList
+-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+    #     from mindnlp.core import nn, ops, no_grad
+-+    #     import numpy as np
+-+        
+-+    #     # 检查是否使用 StaticCache
+-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+    #     # 否则，直接调用父类方法
+-+    #     past_key_values = model_kwargs.get("past_key_values")
+-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+
+-+    #     if not isinstance(past_key_values, StaticCache):
+-+    #         # 不使用 StaticCache，直接调用父类方法
+-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+    #         return super()._sample(
+-+    #             input_ids=input_ids,
+-+    #             logits_processor=logits_processor,
+-+    #             stopping_criteria=stopping_criteria,
+-+    #             generation_config=generation_config,
+-+    #             synced_devices=synced_devices,
+-+    #             streamer=streamer,
+-+    #             logits_warper=logits_warper,
+-+    #             **model_kwargs,
+-+    #         )
+-+        
+-+    #     # 使用 StaticCache，进入自定义循环
+-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+    #     pad_token_id = generation_config._pad_token_tensor
+-+    #     output_attentions = generation_config.output_attentions
+-+    #     output_hidden_states = generation_config.output_hidden_states
+-+    #     output_scores = generation_config.output_scores
+-+    #     output_logits = generation_config.output_logits
+-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+    #     max_length = generation_config.max_length
+-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+    #     do_sample = generation_config.do_sample
+-+        
+-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+    #         raise ValueError(
+-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+    #             f"{logits_warper})."
+-+    #         )
+-+        
+-+    #     # init attention / hidden states / scores tuples
+-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+        
+-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+    #         encoder_hidden_states = (
+-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+    #         )
+-+        
+-+    #     # keep track of which sequences are already finished
+-+    #     batch_size, cur_len = input_ids.shape
+-+    #     this_peer_finished = False
+-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+        
+-+    #     time_record = []
+-+    #     from ....utils.testing_utils import parse_flag_from_env
+-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+        
+-+    #     while self._has_unfinished_sequences(
+-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+    #     ):
+-+    #         if _record_time:
+-+    #             import time as time_module
+-+    #             infer_start = time_module.time()
+-+            
+-+    #         # prepare model inputs
+-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+            
+-+    #         # prepare variable output controls
+-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+            
+-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+    #         cur_cache_position = model_inputs.get("cache_position")
+-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+    #         cur_input_ids = model_inputs.get("input_ids")
+-+            
+-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+    #             cur_cache_position is not None and 
+-+    #             len(cur_cache_position.shape) > 0 and
+-+    #             cur_cache_position.shape[0] == 1 and
+-+    #             cur_input_ids is not None and
+-+    #             cur_input_ids.shape[1] == 1):
+-+    #             # 使用 JIT 优化的单 token 解码
+-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+    #             if not hasattr(self, '_jit_used'):
+-+    #                 self._jit_used = False
+-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+                
+-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+    #                 cur_token=cur_input_ids,
+-+    #                 input_pos=model_inputs.get("position_ids"),
+-+    #                 cache_position=cur_cache_position,
+-+    #                 past_key_values=cur_past_key_values,
+-+    #             )
+-+                
+-+    #             # 标记已使用JIT（用于后续判断）
+-+    #             if not self._jit_used:
+-+    #                 self._jit_used = True
+-+                
+-+    #             # 构造兼容的输出对象
+-+    #             class JitOptimizedOutput:
+-+    #                 def __init__(self, logits, config):
+-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+    #                     self.config = config
+-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+    #                     self.cross_attentions = None
+-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+                
+-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+    #         else:
+-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+    #             outputs = self(**model_inputs, return_dict=True)
+-+            
+-+    #         if synced_devices and this_peer_finished:
+-+    #             continue
+-+            
+-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+    #         next_token_logits = outputs.logits[:, -1, :]
+-+            
+-+    #         # pre-process distribution
+-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+    #         if do_sample:
+-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+            
+-+    #         # Store scores, attentions and hidden_states when required
+-+    #         if return_dict_in_generate:
+-+    #             if output_scores:
+-+    #                 scores += (next_token_scores,)
+-+    #             if output_logits:
+-+    #                 raw_logits += (next_token_logits,)
+-+    #             if output_attentions:
+-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+    #                 if self.config.is_encoder_decoder:
+-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+                
+-+    #             if output_hidden_states:
+-+    #                 hidden = (
+-+    #                     outputs.decoder_hidden_states
+-+    #                     if self.config.is_encoder_decoder
+-+    #                     else outputs.hidden_states
+-+    #                 )
+-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+            
+-+    #         # token selection
+-+    #         if do_sample:
+-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+    #         else:
+-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+            
+-+    #         # finished sentences should have their next token be a padding token
+-+    #         if has_eos_stopping_criteria:
+-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+            
+-+    #         # update generated ids, model inputs, and length for next step
+-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+    #         if streamer is not None:
+-+    #             streamer.put(next_tokens)
+-+            
+-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+    #             outputs,
+-+    #             model_kwargs,
+-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+    #         )
+-+            
+-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+    #         cur_len += 1
+-+            
+-+    #         if _record_time:
+-+    #             import time as time_module
+-+    #             infer_stop = time_module.time()
+-+    #             time_record.append(infer_stop - infer_start)
+-+            
+-+    #         del outputs
+-+        
+-+    #     average_infer_time = None
+-+    #     if time_record:
+-+    #         if len(time_record) > 1:
+-+    #             time_record.pop(0)
+-+    #         average_infer_time = sum(time_record) / len(time_record)
+-+    #         print(f'average inference time is: {average_infer_time}')
+-+    #         print(f'inference time record: {time_record}')
+-+        
+-+    #     if streamer is not None:
+-+    #         streamer.end()
+-+        
+-+    #     # 简单判断：打印是否使用了JIT路径
+-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+    #     else:
+-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+        
+-+    #     if return_dict_in_generate:
+-+    #         if self.config.is_encoder_decoder:
+-+    #             return GenerateEncoderDecoderOutput(
+-+    #                 sequences=input_ids,
+-+    #                 scores=scores,
+-+    #                 logits=raw_logits,
+-+    #                 encoder_attentions=encoder_attentions,
+-+    #                 encoder_hidden_states=encoder_hidden_states,
+-+    #                 decoder_attentions=decoder_attentions,
+-+    #                 cross_attentions=cross_attentions,
+-+    #                 decoder_hidden_states=decoder_hidden_states,
+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+    #                 average_infer_time=average_infer_time
+-+    #             )
+-+    #         else:
+-+    #             return GenerateDecoderOnlyOutput(
+-+    #                 sequences=input_ids,
+-+    #                 scores=scores,
+-+    #                 logits=raw_logits,
+-+    #                 attentions=decoder_attentions,
+-+    #                 hidden_states=decoder_hidden_states,
+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+    #                 average_infer_time=average_infer_time
+-+    #             )
+-+    #     else:
+-+    #         return input_ids
+-+            
+-+    # def _prepare_cache_for_generation(
+-+    #     self,
+-+    #     generation_config,
+-+    #     model_kwargs,
+-+    #     assistant_model,
+-+    #     batch_size,
+-+    #     max_cache_length,
+-+    # ):
+-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+    #         generation_config.cache_implementation = "static"
+-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+        
+-+    #     if generation_config.cache_implementation == "static":
+-+    #         base_required_from_max_length = generation_config.max_length + 1
+-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+    #         min_cache_size = 50
+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+    #         else:
+-+    #             max_cache_length = max(base_required, min_cache_size)
+-+            
+-+    #         original_max_cache_length = max_cache_length
+-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+            
+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+    #             if max_cache_length > self.config.max_position_embeddings:
+-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+        
+-+    #     result = super()._prepare_cache_for_generation(
+-+    #         generation_config=generation_config,
+-+    #         model_kwargs=model_kwargs,
+-+    #         assistant_model=assistant_model,
+-+    #         batch_size=batch_size,
+-+    #         max_cache_length=max_cache_length,
+-+    #     )
+-+        
+-+    #     if generation_config.cache_implementation == "static":
+-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+    #         created_cache = model_kwargs.get(cache_name)
+-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+        
+-+    #     return result
+-+
+-+
+-+
+- 
+- 
+- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+--- 
+-2.27.0
+-
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
new file mode 100644
index 00000000..25b442d5
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
@@ -0,0 +1,7498 @@
+From 60df5bdc79368911a03b9c034b11b7437df753ca Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Thu, 6 Nov 2025 15:48:09 +0800
+Subject: [PATCH 04/10] 20251106change
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  189 +-
+ patches/0001-20251104commit.patch             | 1272 +++++++
+ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+ 4 files changed, 7244 insertions(+), 186 deletions(-)
+ create mode 100644 patches/0001-20251104commit.patch
+ create mode 100644 patches/0002-20251106commit.patch
+ create mode 100644 patches/0003-20261106secondcommit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 2f9192bf..0546f318 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+ 
+         return attn_output, attn_weights, past_key_value
+ 
+-# class DeepseekFlashAttention(nn.Module):
+-#     """
+-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-
+-#     This class is designed as a drop-in replacement for DeepseekAttention.
+-#     """
+-
+-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-#         super().__init__()
+-#         self.config = config
+-#         self.layer_idx = layer_idx
+-#         if layer_idx is None:
+-#             logger.warning(
+-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-#                 "when creating this class."
+-#             )
+-
+-#         self.attention_dropout = config.attention_dropout
+-#         self.hidden_size = config.hidden_size
+-#         self.num_heads = config.num_attention_heads
+-#         self.head_dim = self.hidden_size // self.num_heads
+-#         self.num_key_value_heads = config.num_key_value_heads
+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-#         self.max_position_embeddings = config.max_position_embeddings
+-#         self.rope_theta = config.rope_theta
+-#         self.is_causal = True
+-
+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-#             raise ValueError(
+-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-#                 f" and `num_heads`: {self.num_heads})."
+-#             )
+-
+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-#         self._init_rope()
+-
+-#     def _init_rope(self):
+-#         if self.config.rope_scaling is None:
+-#             self.rotary_emb = DeepseekRotaryEmbedding(
+-#                 self.head_dim,
+-#                 max_position_embeddings=self.max_position_embeddings,
+-#                 base=self.rope_theta,
+-#             )
+-#         else:
+-#             scaling_type = self.config.rope_scaling["type"]
+-#             scaling_factor = self.config.rope_scaling["factor"]
+-#             if scaling_type == "linear":
+-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-#                     self.head_dim,
+-#                     max_position_embeddings=self.max_position_embeddings,
+-#                     scaling_factor=scaling_factor,
+-#                     base=self.rope_theta,
+-#                 )
+-#             elif scaling_type == "dynamic":
+-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-#                     self.head_dim,
+-#                     max_position_embeddings=self.max_position_embeddings,
+-#                     scaling_factor=scaling_factor,
+-#                     base=self.rope_theta,
+-#                 )
+-#             else:
+-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-
+-#     def forward(
+-#         self,
+-#         hidden_states: mindspore.Tensor,
+-#         attention_mask: Optional[mindspore.Tensor] = None,
+-#         position_ids: Optional[mindspore.Tensor] = None,
+-#         past_key_value: Optional[Cache] = None,
+-#         output_attentions: bool = False,
+-#         use_cache: bool = False,
+-#         **kwargs,
+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-#         if "padding_mask" in kwargs:
+-#             warnings.warn(
+-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-#             )
+-        
+-#         if output_attentions:
+-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-
+-#         bsz, q_len, _ = hidden_states.shape
+-
+-#         if self.config.pretraining_tp > 1:
+-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-
+-#         query_states = self.q_proj(hidden_states)
+-#         key_states = self.k_proj(hidden_states)
+-#         value_states = self.v_proj(hidden_states)
+-
+-#         # Reshape for multi-head attention
+-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-#         kv_seq_len = key_states.shape[-2]
+-#         if past_key_value is not None:
+-#             if self.layer_idx is None:
+-#                 raise ValueError(
+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-#                     "with a layer index."
+-#                 )
+-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-        
+-#         # Apply Rotary Positional Embedding
+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-#         if past_key_value is not None:
+-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-
+-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-        
+-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-        
+-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-
+-#         # Convert attention_mask for flash_attention_score
+-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-#         if attention_mask is not None:
+-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-#                 raise ValueError(
+-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-#                 )
+-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-#         else:
+-#             attn_mask_for_fa = None
+-        
+-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-
+-#         # Call the fused flash_attention_score operator
+-#         attn_output = mindspore.ops.flash_attention_score(
+-#             query=query_states_for_fa,
+-#             key=key_states_for_fa,
+-#             value=value_states_for_fa,
+-#             head_num=self.num_heads, # This is N1, the number of query heads
+-#             input_layout='BSH',
+-#             attn_mask=attn_mask_for_fa,
+-#             keep_prob=keep_prob,
+-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-#             sparse_mode=0 # Default mask mode
+-#         )
+-        
+-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-#         attn_output = self.o_proj(attn_output)
+-        
+-#         # Flash Attention does not return attention weights
+-#         attn_weights = None
+-
+-#         return attn_output, attn_weights, past_key_value
+ 
+ class DeepseekFlashAttention(nn.Module):
+     """
+@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+ 
+-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-            config=config, layer_idx=layer_idx
+-        )
++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++            # config=config, layer_idx=layer_idx
++        # )
+ 
+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+             config=config, layer_idx=layer_idx
+@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+         return outputs
+ 
+ 
+-
+ class DeepseekPreTrainedModel(PreTrainedModel):
+     config_class = DeepseekConfig
+     base_model_prefix = "model"
+@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+         # Initialize weights and apply final processing
+         self.post_init()
+         self.warm_up = False
+-        #@dwj
+-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-            self.num_layers,
+-            self.num_attention_heads,
+-            self.head_dim,
+-            batch_size=1,
+-            max_length=self.max_length,
+-            dtype=mindspore.float16
+-        )
+-
+-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-        key_cache = []
+-        value_cache = []
+-        for _ in range(num_layers):
+-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-            key_cache.append(k)
+-            value_cache.append(v)
+-        return key_cache, value_cache
+-
+ 
+     def warmup_moe_model_deep(self):
+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+new file mode 100644
+index 00000000..78f22642
+--- /dev/null
++++ b/patches/0001-20251104commit.patch
+@@ -0,0 +1,1272 @@
++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Tue, 4 Nov 2025 09:11:51 +0800
++Subject: [PATCH 1/3] 20251104commit
++
++---
++ mindnlp/transformers/cache_utils.py           |  28 +-
++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++ 3 files changed, 976 insertions(+), 87 deletions(-)
++
++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++index cadd2e04..02f8d4be 100644
++--- a/mindnlp/transformers/cache_utils.py
+++++ b/mindnlp/transformers/cache_utils.py
++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++             # k_out[:, :, cache_position] = key_states
++             # v_out[:, :, cache_position] = value_states
++-            if ON_ORANGE_PI:
++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++-            else:
++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++-
+++            # if ON_ORANGE_PI:
+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++            # else:
+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++            # 确保 cache_position 是 1D tensor 并且类型正确
+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++            if cache_position.ndim > 1:
+++                cache_position = cache_position.flatten()
+++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++                cache_position = cache_position.int()
+++            
+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++            k_out[:, :, cache_position] = key_states
+++            v_out[:, :, cache_position] = value_states
+++            
++         return k_out, v_out
++ 
++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index c695b944..d8303e45 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++-    x1 = x[..., : x.shape[-1] // 2]
++-    x2 = x[..., x.shape[-1] // 2 :]
+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++    # x1 = x[..., : x.shape[-1] // 2]
+++    # x2 = x[..., x.shape[-1] // 2 :]
+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++         if self.training:
++             raise NotImplementedError("Training is not supported yet.")
++         else:
++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++-        if self.config.n_shared_experts is not None:
++-            y = y + self.shared_experts(identity)
++-        return y
+++            # @lwx
+++            if orig_shape[1] == 1:
+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++                y=y.view(*orig_shape)
+++                if self.config.n_shared_experts is not None:
+++                    y = y + self.shared_experts(identity)
+++                return y
+++            else:
+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++                if self.config.n_shared_experts is not None:
+++                    y = y + self.shared_experts(identity)
+++                return y
+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++        # if self.config.n_shared_experts is not None:
+++        #     y = y + self.shared_experts(identity)
+++        # return y
+++        
+++    @no_grad()
+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++
+++        expert_cache = ops.zeros_like(x)
+++        for i in range(self.num_experts_per_tok):
+++            expert_id = flat_expert_indices[i].item()
+++            weight = flat_expert_weights[i].item()
+++            expert = self.experts[expert_id]
+++            expert_out = expert(x)
+++            expert_cache += expert_out * weight
+++        return expert_cache
++ 
++     @no_grad()
++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-        # expert_cache = torch.zeros_like(x)
++-        # idxs = flat_expert_indices.argsort()
++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++-        # token_idxs = idxs // self.num_experts_per_tok
++-        # for i, end_idx in enumerate(tokens_per_expert):
++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++-        #     if start_idx == end_idx:
++-        #         continue
++-        #     expert = self.experts[i]
++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++-        #     expert_tokens = x[exp_token_idx]
++-        #     expert_out = expert(expert_tokens)
++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++-        # return expert_cache
+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++         expert_cache = ops.zeros_like(x)
++         idxs = flat_expert_indices.argsort()
++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++         token_idxs = idxs // self.num_experts_per_tok
+++
++         for i, end_idx in enumerate(tokens_per_expert):
++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++             if start_idx == end_idx:
++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++             expert_out = expert(expert_tokens)
++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++
++         return expert_cache
+++        
+++    # @no_grad()
+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++    #     # expert_cache = torch.zeros_like(x)
+++    #     # idxs = flat_expert_indices.argsort()
+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++    #     # token_idxs = idxs // self.num_experts_per_tok
+++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++    #     #     if start_idx == end_idx:
+++    #     #         continue
+++    #     #     expert = self.experts[i]
+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++    #     #     expert_tokens = x[exp_token_idx]
+++    #     #     expert_out = expert(expert_tokens)
+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++    #     # return expert_cache
+++    #     expert_cache = ops.zeros_like(x)
+++    #     idxs = flat_expert_indices.argsort()
+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++    #     token_idxs = idxs // self.num_experts_per_tok
+++
+++    #     for i, end_idx in enumerate(tokens_per_expert):
+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++    #         if start_idx == end_idx:
+++    #             continue
+++    #         expert = self.experts[i]
+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++    #         expert_tokens = x[exp_token_idx]
+++    #         expert_out = expert(expert_tokens)
+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++
+++    #     return expert_cache
+++    # @no_grad()
+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++    #     expert_cache = ops.zeros_like(x)
+++
+++    #     # 排序保证顺序一致
+++    #     idxs = flat_expert_indices.argsort()
+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++    #     token_idxs = idxs // self.num_experts_per_tok
+++
+++    #     # 找出有 token 的专家
+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++
+++    #     for i in active_experts.tolist():
+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++    #         end_idx = tokens_per_expert[i]
+++    #         if start_idx == end_idx:  # 没有 token
+++    #             continue
+++
+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++    #         expert_tokens = x[exp_token_idx]
+++    #         expert_out = self.experts[i](expert_tokens)
+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++
+++    #         expert_cache = mindspore.mint.scatter_add(
+++    #             expert_cache,
+++    #             0,
+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++    #             expert_out
+++    #         )
+++
+++    #     return expert_cache
+++
+++
++ 
++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++ #     """
++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++ 
++         # Initialize weights and apply final processing
++         self.post_init()
+++        self.warm_up = False
+++
+++    def warmup_moe_model_deep(self):
+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++        test_texts = [
+++            "warmup short",
+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++        ]
+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++        if tokenizer is None:
+++            from mindnlp.transformers import AutoTokenizer
+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++            self._warmup_tokenizer = tokenizer
+++
+++        for text in test_texts:
+++            inputs = tokenizer(text, return_tensors="ms")
+++            with mindspore._no_grad():
+++                _ = self(**inputs, use_cache=False)
+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++ 
++     def get_input_embeddings(self):
++         return self.model.embed_tokens
++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++         ```"""
+++        if not self.warm_up:
+++            self.warm_up = True
+++            self.warmup_moe_model_deep()
+++
++         output_attentions = (
++             output_attentions
++             if output_attentions is not None
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index 3cbf820e..d4c6b651 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -18,7 +18,6 @@
++ # See the License for the specific language governing permissions and
++ # limitations under the License.
++ """MindSpore Qwen2MoE model."""
++-
++ import math
++ from typing import List, Optional, Tuple, Union
++ 
++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++     TokenClassifierOutput,
++ )
++ from ...modeling_utils import PreTrainedModel
+++from ...generation import GenerationMixin
++ from ....utils import logging
++ from .configuration_qwen2_moe import Qwen2MoeConfig
++ 
++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++         self.variance_epsilon = eps
++ 
++     def forward(self, hidden_states):
+++        # @dwj
+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++        # @lwx
+++        # if not self.training :
+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++         input_dtype = hidden_states.dtype
++         hidden_states = hidden_states.to(mindspore.float32)
++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++@@ -234,6 +239,8 @@ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++     x1 = x[..., : x.shape[-1] // 2]
++     x2 = x[..., x.shape[-1] // 2 :]
+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++         self.config = config
++         self.hidden_size = config.hidden_size
++         self.intermediate_size = intermediate_size
+++        
++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++         self.act_fn = ACT2FN[config.hidden_act]
++ 
++     def forward(self, x):
++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++-
++ 
+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++        # @lwx 
+++        # gate_up_output = self.gate_up_proj(x)
+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++        # return self.down_proj(swiglu_output)
+++
+++    # def forward(self, x):
+++    #     gate_proj_out = self.gate_proj(x)
+++    #     up_proj_out = self.up_proj(x)
+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++    #     return self.down_proj(swiglu_out)
+++        
++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++     """
++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++         use_cache: bool = False,
++         cache_position: Optional[mindspore.Tensor] = None,
++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++        
+++
++         bsz, q_len, _ = hidden_states.shape
++ 
++         query_states = self.q_proj(hidden_states)
++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++                     "with a layer index."
++                 )
++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++            if isinstance(past_key_value, StaticCache):
+++                kv_seq_len = key_states.shape[-2]
+++            else:
+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++ 
++         if past_key_value is not None:
++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++            
+++            if isinstance(past_key_value, StaticCache):
+++                kv_seq_len = key_states.shape[-2]
++ 
++         # repeat k/v heads if n_kv_heads < n_heads
++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++-
+++        
++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++ 
++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++-            raise ValueError(
++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++-                f" {attn_weights.shape}"
++-            )
++-
++-        if attention_mask is not None:  # no matter the length, we just slice it
++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++        if attention_mask is not None:
+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++             attn_weights = attn_weights + causal_mask
++ 
++         # upcast attention to fp32
++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++ 
++         attn_output = self.o_proj(attn_output)
++-
+++        # @lwx
+++        
+++        # max_seq_len = self.max_position_embeddings  # 2048
+++
+++        # if attention_mask is not None:
+++        #     # attention_mask: [B, 1, Sq, Sk]
+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++
+++        #     # pad 到 [max_seq_len, max_seq_len]
+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++        #     global_attention_mask = padded_mask
+++        # else:
+++        #     global_attention_mask = None
+++
+++
+++        # sparse_mode=3
+++        # attn_output = mindspore.ops.flash_attention_score(
+++        #     query=query_states,
+++        #     key=key_states,
+++        #     value=value_states,
+++        #     real_shift=None,
+++        #     padding_mask=None,
+++
+++        #     head_num=self.num_heads,
+++        #     attn_mask=global_attention_mask,
+++        #     keep_prob=1.0 - self.attention_dropout,
+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++        #     input_layout="BNSD",
+++        #     pre_tokens=2147483647,
+++        #     next_tokens=2147483647,
+++        #     inner_precise=0,
+++        #     drop_mask=None,
+++        #     prefix=None,
+++        #     actual_seq_qlen=None,
+++        #     actual_seq_kvlen=None,
+++        #     sparse_mode=sparse_mode,
+++        # )
++         if not output_attentions:
++             attn_weights = None
++ 
++         return attn_output, attn_weights, past_key_value
++ 
++ 
+++class Qwen2MoeFlashAttention(nn.Module):
+++    """
+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++
+++    关键改动:
+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++       直接传入原始的 key 和 value 张量效率更高。
+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++    """
+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++        super().__init__()
+++        self.config = config
+++        self.layer_idx = layer_idx
+++        self.hidden_size = config.hidden_size
+++        self.num_heads = config.num_attention_heads
+++        self.head_dim = self.hidden_size // self.num_heads
+++        self.num_key_value_heads = config.num_key_value_heads
+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++        self.max_position_embeddings = config.max_position_embeddings
+++        self.rope_theta = config.rope_theta
+++        self.attention_dropout = config.attention_dropout
+++
+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++            raise ValueError(
+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++            )
+++
+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++
+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++            self.head_dim,
+++            max_position_embeddings=self.max_position_embeddings,
+++            base=self.rope_theta,
+++        )
+++
+++    def forward(
+++        self,
+++        hidden_states: mindspore.Tensor,
+++        attention_mask: Optional[mindspore.Tensor] = None,
+++        position_ids: Optional[mindspore.Tensor] = None,
+++        past_key_value: Optional[Cache] = None,
+++        output_attentions: bool = False,
+++        use_cache: bool = False,
+++        cache_position: Optional[mindspore.Tensor] = None,
+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++        bsz, q_len, _ = hidden_states.shape
+++
+++        # 1. 线性投射 Q, K, V
+++        query_states = self.q_proj(hidden_states)
+++        key_states = self.k_proj(hidden_states)
+++        value_states = self.v_proj(hidden_states)
+++
+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++        # 3. RoPE 旋转位置编码
+++        kv_seq_len = key_states.shape[-2]
+++        if past_key_value is not None:
+++            if self.layer_idx is None:
+++                raise ValueError(
+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++                    "with a layer index."
+++                )
+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++                if cache_position.shape[0] == 1:
+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++                    kv_seq_len = past_seen_tokens + 1
+++                else:
+++                    # prefill 阶段：cache_position 是范围，使用其长度
+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++            else:
+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++        # 4. KV 缓存更新
+++        if past_key_value is not None:
+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++            key_states, value_states = past_key_value.update(
+++                key_states, value_states, self.layer_idx, cache_kwargs
+++            )
+++            
+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++                if cache_position.shape[0] == 1:
+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++                    kv_seq_len = key_states.shape[-2]
+++
+++        # 5. [重要] 准备 Attention Mask
+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++        fa_attention_mask = None
+++        if attention_mask is not None:
+++            # 截取与当前key长度匹配的部分
+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++            fa_attention_mask = (mask_slice != 0)
+++
+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++        input_dtype = query_states.dtype
+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++            query_states = query_states.to(mindspore.float16)
+++            key_states = key_states.to(mindspore.float16)
+++            value_states = value_states.to(mindspore.float16)
+++
+++        # 6. [核心] 调用 flash_attention_score 算子
+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++        attn_output = mindspore.ops.flash_attention_score(
+++            query=query_states,
+++            key=key_states,
+++            value=value_states,
+++            head_num=self.num_heads, # 传入Q的头数(N1)
+++            attn_mask=fa_attention_mask,
+++            keep_prob=1.0 - self.attention_dropout,
+++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++            input_layout="BNSD",
+++            sparse_mode=0 # 使用 defaultMask 模式
+++        )
+++
+++        # 恢复原始数据类型
+++        attn_output = attn_output.to(input_dtype)
+++
+++        # 7. 调整输出形状
+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++        attn_output = self.o_proj(attn_output)
+++
+++        # FlashAttention 算子不直接返回注意力权重矩阵
+++        attn_weights = None
+++        if output_attentions:
+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++        return attn_output, attn_weights, past_key_value
+++
+++    # def forward(
+++    #     self,
+++    #     hidden_states: mindspore.Tensor,
+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++    #     position_ids: Optional[mindspore.Tensor] = None,
+++    #     past_key_value: Optional[Cache] = None,
+++    #     output_attentions: bool = False,
+++    #     use_cache: bool = False,
+++    #     cache_position: Optional[mindspore.Tensor] = None,
+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++    #     bsz, q_len, _ = hidden_states.shape
+++
+++    #     # 1. 线性投射 Q, K, V
+++    #     query_states = self.q_proj(hidden_states)
+++    #     key_states = self.k_proj(hidden_states)
+++    #     value_states = self.v_proj(hidden_states)
+++
+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++    #     # 3. RoPE 旋转位置编码
+++    #     kv_seq_len = key_states.shape[-2]
+++    #     if past_key_value is not None:
+++    #         if self.layer_idx is None:
+++    #             raise ValueError(
+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++    #                 "with a layer index."
+++    #             )
+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++    #     # 4. KV 缓存更新
+++    #     if past_key_value is not None:
+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++    #         key_states, value_states = past_key_value.update(
+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++    #         )
+++
+++    #     # 5. 准备 Attention Mask
+++    #     fa_attention_mask = None
+++    #     if attention_mask is not None:
+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++    #         fa_attention_mask = (mask_slice != 0)
+++
+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++    #     input_dtype = query_states.dtype
+++
+++    #     # 6. [核心] 调用 flash_attention_score 算子
+++    #     attn_output = mindspore.ops.flash_attention_score(
+++    #         query=query_states,
+++    #         key=key_states,
+++    #         value=value_states,
+++    #         head_num=self.num_heads,
+++    #         attn_mask=fa_attention_mask,
+++    #         keep_prob=1.0 - self.attention_dropout,
+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++    #         input_layout="BNSD",
+++    #         sparse_mode=0,
+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++    #         inner_precise=1
+++    #     )
+++
+++    #     # 恢复原始数据类型
+++    #     attn_output = attn_output.to(input_dtype)
+++
+++    #     # 7. 调整输出形状
+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++    #     attn_output = self.o_proj(attn_output)
+++
+++    #     attn_weights = None
+++    #     if output_attentions:
+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++    #     return attn_output, attn_weights, past_key_value
+++
+++    # def forward(
+++    #     self,
+++    #     hidden_states: mindspore.Tensor,
+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++    #     position_ids: Optional[mindspore.Tensor] = None,
+++    #     past_key_value: Optional[Cache] = None,
+++    #     output_attentions: bool = False,
+++    #     use_cache: bool = False,
+++    #     cache_position: Optional[mindspore.Tensor] = None,
+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++    #     bsz, q_len, _ = hidden_states.shape
+++
+++    #     query_states = self.q_proj(hidden_states)
+++    #     key_states = self.k_proj(hidden_states)
+++    #     value_states = self.v_proj(hidden_states)
+++
+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++    #     kv_seq_len = key_states.shape[-2]
+++    #     if past_key_value is not None:
+++    #         if self.layer_idx is None:
+++    #             raise ValueError("`layer_idx` must be specified for caching")
+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++    #     if past_key_value is not None:
+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++    #         key_states, value_states = past_key_value.update(
+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++    #         )
+++
+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++
+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++    #     query_states = query_states / math.sqrt(self.head_dim)
+++    #     # <--- 修改结束 ---
+++
+++    #     fa_attention_mask = None
+++    #     if attention_mask is not None:
+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++    #         fa_attention_mask = (mask_slice != 0)
+++
+++    #     input_dtype = query_states.dtype
+++
+++    #     attn_output = mindspore.ops.flash_attention_score(
+++    #         query=query_states,    # 传入已经预先缩放过的 query
+++    #         key=key_states,
+++    #         value=value_states,
+++    #         head_num=self.num_heads,
+++    #         attn_mask=fa_attention_mask,
+++    #         keep_prob=1.0 - self.attention_dropout,
+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++    #         input_layout="BNSD",
+++    #         sparse_mode=0,
+++    #         inner_precise=1        # 仍然保持内部高精度计算
+++    #     )
+++
+++    #     attn_output = attn_output.to(input_dtype)
+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++    #     attn_output = self.o_proj(attn_output)
+++
+++    #     attn_weights = None
+++    #     if output_attentions:
+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++
+++    #     return attn_output, attn_weights, past_key_value
+++
++ QWEN2MOE_ATTENTION_CLASSES = {
++     "eager": Qwen2MoeAttention,
+++    "flash-attention": Qwen2MoeFlashAttention,
++ }
++ 
++ 
++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++ 
+++    #@dwj
+++    # 只遍历激活的专家，而非全部专家
++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        hidden_states = hidden_states.view(-1, hidden_dim)
++-        # router_logits: (batch * sequence_length, n_experts)
++-        router_logits = self.gate(hidden_states)
++-
++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-        if self.norm_topk_prob:
++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        # we cast back to the input dtype
++-        routing_weights = routing_weights.to(hidden_states.dtype)
++-
++-        final_hidden_states = ops.zeros(
++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++-        )
++-
++-        # One hot encode the selected experts to create an expert mask
++-        # this will be used to easily index which expert is going to be sollicitated
++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++-
++-        # Loop over all available experts in the model and perform the computation on each expert
++-        for expert_idx in range(self.num_experts):
++-            expert_layer = self.experts[expert_idx]
++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++-
++-            # Index the correct hidden states and compute the expert hidden state for
++-            # the current expert. We need to make sure to multiply the output hidden
++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++-            if 0 not in idx.shape:
++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++-
++-                # However `index_add_` only support torch tensors for indexing so we'll use
++-                # the `top_x` tensor here.
++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++-
++-        shared_expert_output = self.shared_expert(hidden_states)
++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++-
++-        final_hidden_states = final_hidden_states + shared_expert_output
+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++            num_tokens = hidden_states_reshaped.shape[0]
+++
+++            router_logits = self.gate(hidden_states_reshaped)
+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++
+++            if self.norm_topk_prob:
+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++            routing_weights = routing_weights.to(hidden_states.dtype)
+++            
+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++            flat_selected_experts = selected_experts.flatten()
+++            
+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++            token_indices = broadcasted_token_indices.flatten()
+++            
+++            active_experts = ops.unique(flat_selected_experts)
+++            
+++            for expert_idx_tensor in active_experts:
+++                expert_idx = expert_idx_tensor.item()
+++                expert_layer = self.experts[expert_idx]
+++                
+++                mask = (flat_selected_experts == expert_idx_tensor)
+++                selected_token_indices = token_indices[mask]
+++                selected_routing_weights = routing_weights.flatten()[mask]
+++                
+++                current_states = hidden_states_reshaped[selected_token_indices]
+++                
+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++                
+++                final_hidden_states = final_hidden_states.index_add(
+++                    dim=0,
+++                    index=selected_token_indices,
+++                    source=expert_output.to(hidden_states.dtype)
+++                )
+++            
+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++ 
++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++-        return final_hidden_states, router_logits
+++            final_hidden_states = final_hidden_states + shared_expert_output
+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++            
+++            return final_hidden_states, router_logits
++ 
++ 
++ class Qwen2MoeDecoderLayer(nn.Module):
++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++ 
++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++ 
+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++
++         if (layer_idx not in config.mlp_only_layers) and (
++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++         ):
++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++     _skip_keys_device_placement = "past_key_values"
++     _supports_cache_class = True
+++#lwx
+++    # _supports_static_cache = True
++ 
++     def _init_weights(self, module):
++         std = self.config.initializer_range
++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++         return causal_mask
++ 
++ 
++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++     _tied_weights_keys = ["lm_head.weight"]
++ 
++     def __init__(self, config):
++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++         self.num_experts_per_tok = config.num_experts_per_tok
++         # Initialize weights and apply final processing
++         self.post_init()
+++        # @lwx
+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++        #     self.generation_config.cache_implementation = "static"
+++        self._warmed_up = False
+++
+++    def warmup_moe_model(self):
+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++        test_texts = [
+++            "warmup short",
+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++        ]
+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++        if tokenizer is None:
+++            from mindnlp.transformers import AutoTokenizer
+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++            self._warmup_tokenizer = tokenizer
+++
+++        for text in test_texts:
+++            inputs = tokenizer(text, return_tensors="ms")
+++            with mindspore._no_grad():
+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++ 
++     def get_input_embeddings(self):
++         return self.model.embed_tokens
++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++         ```"""
+++        if not self._warmed_up:
+++            self._warmed_up = True
+++            self.warmup_moe_model()
++ 
++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++         output_router_logits = (
++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++             }
++         )
++         return model_inputs
+++# @lwx
+++    # def _decode_one_tokens_logits(
+++    #     self,
+++    #     cur_token: mindspore.Tensor,
+++    #     input_pos: Optional[mindspore.Tensor],
+++    #     cache_position: mindspore.Tensor,
+++    #     past_key_values: StaticCache,
+++    # ) -> mindspore.Tensor:
+++    #     """
+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++        
+++    #     Args:
+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++    #         input_pos: 输入位置信息，可选
+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++            
+++    #     Returns:
+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++    #     """
+++    #     # 调用JIT编译的版本
+++    #     return self.get_decode_one_tokens_logits(
+++    #         cur_token=cur_token,
+++    #         input_pos=input_pos,
+++    #         cache_position=cache_position,
+++    #         past_key_values=past_key_values,
+++    #     )
+++    
+++    # @mindspore.jit(jit_level='O1')
+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++    #     """
+++    #     JIT编译的函数，用于高效的单token解码
+++    #     使用JIT编译优化以支持静态shape和高效执行
+++        
+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++    #     """
+++    #     outputs = self.model.forward(
+++    #         input_ids=cur_token,
+++    #         position_ids=input_pos,
+++    #         cache_position=cache_position,
+++    #         past_key_values=past_key_values,
+++    #         use_cache=True,
+++    #         return_dict=False,
+++    #     )
+++        
+++    #     hidden_states = outputs[0]
+++    #     logits = self.lm_head.forward(hidden_states)
+++    #     logits = logits.float()
+++        
+++    #     return logits[:, -1, :]
+++
+++    # def _sample(
+++    #     self,
+++    #     input_ids: mindspore.Tensor,
+++    #     logits_processor,
+++    #     stopping_criteria,
+++    #     generation_config,
+++    #     synced_devices: bool,
+++    #     streamer=None,
+++    #     logits_warper=None,
+++    #     **model_kwargs,
+++    # ):
+++    #     """
+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++    #     """
+++    #     from ...generation.logits_process import LogitsProcessorList
+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++    #     from mindnlp.core import nn, ops, no_grad
+++    #     import numpy as np
+++        
+++    #     # 检查是否使用 StaticCache
+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++    #     # 否则，直接调用父类方法
+++    #     past_key_values = model_kwargs.get("past_key_values")
+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++
+++    #     if not isinstance(past_key_values, StaticCache):
+++    #         # 不使用 StaticCache，直接调用父类方法
+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++    #         return super()._sample(
+++    #             input_ids=input_ids,
+++    #             logits_processor=logits_processor,
+++    #             stopping_criteria=stopping_criteria,
+++    #             generation_config=generation_config,
+++    #             synced_devices=synced_devices,
+++    #             streamer=streamer,
+++    #             logits_warper=logits_warper,
+++    #             **model_kwargs,
+++    #         )
+++        
+++    #     # 使用 StaticCache，进入自定义循环
+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++    #     pad_token_id = generation_config._pad_token_tensor
+++    #     output_attentions = generation_config.output_attentions
+++    #     output_hidden_states = generation_config.output_hidden_states
+++    #     output_scores = generation_config.output_scores
+++    #     output_logits = generation_config.output_logits
+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++    #     max_length = generation_config.max_length
+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++    #     do_sample = generation_config.do_sample
+++        
+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++    #         raise ValueError(
+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++    #             f"{logits_warper})."
+++    #         )
+++        
+++    #     # init attention / hidden states / scores tuples
+++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++        
+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++    #         encoder_hidden_states = (
+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++    #         )
+++        
+++    #     # keep track of which sequences are already finished
+++    #     batch_size, cur_len = input_ids.shape
+++    #     this_peer_finished = False
+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++        
+++    #     time_record = []
+++    #     from ....utils.testing_utils import parse_flag_from_env
+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++        
+++    #     while self._has_unfinished_sequences(
+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++    #     ):
+++    #         if _record_time:
+++    #             import time as time_module
+++    #             infer_start = time_module.time()
+++            
+++    #         # prepare model inputs
+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++            
+++    #         # prepare variable output controls
+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++            
+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++    #         cur_cache_position = model_inputs.get("cache_position")
+++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++    #         cur_input_ids = model_inputs.get("input_ids")
+++            
+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++    #             cur_cache_position is not None and 
+++    #             len(cur_cache_position.shape) > 0 and
+++    #             cur_cache_position.shape[0] == 1 and
+++    #             cur_input_ids is not None and
+++    #             cur_input_ids.shape[1] == 1):
+++    #             # 使用 JIT 优化的单 token 解码
+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++    #             if not hasattr(self, '_jit_used'):
+++    #                 self._jit_used = False
+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++                
+++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++    #                 cur_token=cur_input_ids,
+++    #                 input_pos=model_inputs.get("position_ids"),
+++    #                 cache_position=cur_cache_position,
+++    #                 past_key_values=cur_past_key_values,
+++    #             )
+++                
+++    #             # 标记已使用JIT（用于后续判断）
+++    #             if not self._jit_used:
+++    #                 self._jit_used = True
+++                
+++    #             # 构造兼容的输出对象
+++    #             class JitOptimizedOutput:
+++    #                 def __init__(self, logits, config):
+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++    #                     self.config = config
+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++    #                     self.cross_attentions = None
+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++                
+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++    #         else:
+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++    #             outputs = self(**model_inputs, return_dict=True)
+++            
+++    #         if synced_devices and this_peer_finished:
+++    #             continue
+++            
+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++    #         next_token_logits = outputs.logits[:, -1, :]
+++            
+++    #         # pre-process distribution
+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++    #         if do_sample:
+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++            
+++    #         # Store scores, attentions and hidden_states when required
+++    #         if return_dict_in_generate:
+++    #             if output_scores:
+++    #                 scores += (next_token_scores,)
+++    #             if output_logits:
+++    #                 raw_logits += (next_token_logits,)
+++    #             if output_attentions:
+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++    #                 if self.config.is_encoder_decoder:
+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++                
+++    #             if output_hidden_states:
+++    #                 hidden = (
+++    #                     outputs.decoder_hidden_states
+++    #                     if self.config.is_encoder_decoder
+++    #                     else outputs.hidden_states
+++    #                 )
+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++            
+++    #         # token selection
+++    #         if do_sample:
+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++    #         else:
+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++            
+++    #         # finished sentences should have their next token be a padding token
+++    #         if has_eos_stopping_criteria:
+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++            
+++    #         # update generated ids, model inputs, and length for next step
+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++    #         if streamer is not None:
+++    #             streamer.put(next_tokens)
+++            
+++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++    #             outputs,
+++    #             model_kwargs,
+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++    #         )
+++            
+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++    #         cur_len += 1
+++            
+++    #         if _record_time:
+++    #             import time as time_module
+++    #             infer_stop = time_module.time()
+++    #             time_record.append(infer_stop - infer_start)
+++            
+++    #         del outputs
+++        
+++    #     average_infer_time = None
+++    #     if time_record:
+++    #         if len(time_record) > 1:
+++    #             time_record.pop(0)
+++    #         average_infer_time = sum(time_record) / len(time_record)
+++    #         print(f'average inference time is: {average_infer_time}')
+++    #         print(f'inference time record: {time_record}')
+++        
+++    #     if streamer is not None:
+++    #         streamer.end()
+++        
+++    #     # 简单判断：打印是否使用了JIT路径
+++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++    #     else:
+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++        
+++    #     if return_dict_in_generate:
+++    #         if self.config.is_encoder_decoder:
+++    #             return GenerateEncoderDecoderOutput(
+++    #                 sequences=input_ids,
+++    #                 scores=scores,
+++    #                 logits=raw_logits,
+++    #                 encoder_attentions=encoder_attentions,
+++    #                 encoder_hidden_states=encoder_hidden_states,
+++    #                 decoder_attentions=decoder_attentions,
+++    #                 cross_attentions=cross_attentions,
+++    #                 decoder_hidden_states=decoder_hidden_states,
+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++    #                 average_infer_time=average_infer_time
+++    #             )
+++    #         else:
+++    #             return GenerateDecoderOnlyOutput(
+++    #                 sequences=input_ids,
+++    #                 scores=scores,
+++    #                 logits=raw_logits,
+++    #                 attentions=decoder_attentions,
+++    #                 hidden_states=decoder_hidden_states,
+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++    #                 average_infer_time=average_infer_time
+++    #             )
+++    #     else:
+++    #         return input_ids
+++            
+++    # def _prepare_cache_for_generation(
+++    #     self,
+++    #     generation_config,
+++    #     model_kwargs,
+++    #     assistant_model,
+++    #     batch_size,
+++    #     max_cache_length,
+++    # ):
+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++    #         generation_config.cache_implementation = "static"
+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++        
+++    #     if generation_config.cache_implementation == "static":
+++    #         base_required_from_max_length = generation_config.max_length + 1
+++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++    #         min_cache_size = 50
+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++    #         else:
+++    #             max_cache_length = max(base_required, min_cache_size)
+++            
+++    #         original_max_cache_length = max_cache_length
+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++            
+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++    #             if max_cache_length > self.config.max_position_embeddings:
+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++        
+++    #     result = super()._prepare_cache_for_generation(
+++    #         generation_config=generation_config,
+++    #         model_kwargs=model_kwargs,
+++    #         assistant_model=assistant_model,
+++    #         batch_size=batch_size,
+++    #         max_cache_length=max_cache_length,
+++    #     )
+++        
+++    #     if generation_config.cache_implementation == "static":
+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++    #         created_cache = model_kwargs.get(cache_name)
+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++    #             if created_cache.max_cache_len < generation_config.max_length:
+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++        
+++    #     return result
+++
+++
+++
++ 
++ 
++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++-- 
++2.27.0
++
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+new file mode 100644
+index 00000000..22b65dd5
+--- /dev/null
++++ b/patches/0002-20251106commit.patch
+@@ -0,0 +1,3200 @@
++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Thu, 6 Nov 2025 09:20:38 +0800
++Subject: [PATCH 2/3] 20251106commit
++
++---
++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
++ 3 files changed, 2689 insertions(+), 305 deletions(-)
++ create mode 100644 patches/0001-20251104commit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index d8303e45..73773c22 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
++         #     y = y + self.shared_experts(identity)
++         # return y
++         
+++    # @no_grad()
+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++
+++    #     expert_cache = ops.zeros_like(x)
+++    #     for i in range(self.num_experts_per_tok):
+++    #         expert_id = flat_expert_indices[i].item()
+++    #         weight = flat_expert_weights[i].item()
+++    #         expert = self.experts[expert_id]
+++    #         expert_out = expert(x)
+++    #         expert_cache += expert_out * weight
+++    #     return expert_cache
+++
++     @no_grad()
++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++        # x 的 shape: (1, hidden_size)
+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++
+++        # 1. 收集所有需要的专家层
+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++
+++        # 2. 并行计算所有专家的输出
+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++        # ops.cat 会将它们堆叠成一个新的 Tensor
+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++
+++        # 3. 使用矩阵乘法进行加权求和
+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++        # 最终结果 final_output 的 shape: (1, hidden_size)
+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++        
+++        return final_output
++ 
++-        expert_cache = ops.zeros_like(x)
++-        for i in range(self.num_experts_per_tok):
++-            expert_id = flat_expert_indices[i].item()
++-            weight = flat_expert_weights[i].item()
++-            expert = self.experts[expert_id]
++-            expert_out = expert(x)
++-            expert_cache += expert_out * weight
++-        return expert_cache
++ 
++     @no_grad()
++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
++             key_states = self.k_proj(hidden_states)
++             value_states = self.v_proj(hidden_states)
++ 
++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++        # @lwx
+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++ 
++         kv_seq_len = key_states.shape[-2]
++         if past_key_value is not None:
++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
++         return attn_output, attn_weights, past_key_value
++ 
++ 
+++# class DeepseekFlashAttention(nn.Module):
+++#     """
+++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+++
+++#     This class is designed as a drop-in replacement for DeepseekAttention.
+++#     """
+++
+++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++#         super().__init__()
+++#         self.config = config
+++#         self.layer_idx = layer_idx
+++#         if layer_idx is None:
+++#             logger.warning(
+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++#                 "when creating this class."
+++#             )
+++
+++#         self.attention_dropout = config.attention_dropout
+++#         self.hidden_size = config.hidden_size
+++#         self.num_heads = config.num_attention_heads
+++#         self.head_dim = self.hidden_size // self.num_heads
+++#         self.num_key_value_heads = config.num_key_value_heads
+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++#         self.max_position_embeddings = config.max_position_embeddings
+++#         self.rope_theta = config.rope_theta
+++#         self.is_causal = True
+++
+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++#             raise ValueError(
+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++#                 f" and `num_heads`: {self.num_heads})."
+++#             )
+++
+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++#         self._init_rope()
+++
+++#     def _init_rope(self):
+++#         if self.config.rope_scaling is None:
+++#             self.rotary_emb = DeepseekRotaryEmbedding(
+++#                 self.head_dim,
+++#                 max_position_embeddings=self.max_position_embeddings,
+++#                 base=self.rope_theta,
+++#             )
+++#         else:
+++#             scaling_type = self.config.rope_scaling["type"]
+++#             scaling_factor = self.config.rope_scaling["factor"]
+++#             if scaling_type == "linear":
+++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++#                     self.head_dim,
+++#                     max_position_embeddings=self.max_position_embeddings,
+++#                     scaling_factor=scaling_factor,
+++#                     base=self.rope_theta,
+++#                 )
+++#             elif scaling_type == "dynamic":
+++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++#                     self.head_dim,
+++#                     max_position_embeddings=self.max_position_embeddings,
+++#                     scaling_factor=scaling_factor,
+++#                     base=self.rope_theta,
+++#                 )
+++#             else:
+++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++
+++#     def forward(
+++#         self,
+++#         hidden_states: mindspore.Tensor,
+++#         attention_mask: Optional[mindspore.Tensor] = None,
+++#         position_ids: Optional[mindspore.Tensor] = None,
+++#         past_key_value: Optional[Cache] = None,
+++#         output_attentions: bool = False,
+++#         use_cache: bool = False,
+++#         **kwargs,
+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++#         if "padding_mask" in kwargs:
+++#             warnings.warn(
+++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++#             )
+++        
+++#         if output_attentions:
+++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+++
+++#         bsz, q_len, _ = hidden_states.shape
+++
+++#         if self.config.pretraining_tp > 1:
+++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++
+++#         query_states = self.q_proj(hidden_states)
+++#         key_states = self.k_proj(hidden_states)
+++#         value_states = self.v_proj(hidden_states)
+++
+++#         # Reshape for multi-head attention
+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++#         kv_seq_len = key_states.shape[-2]
+++#         if past_key_value is not None:
+++#             if self.layer_idx is None:
+++#                 raise ValueError(
+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++#                     "with a layer index."
+++#                 )
+++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++        
+++#         # Apply Rotary Positional Embedding
+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++#         if past_key_value is not None:
+++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++
+++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++        
+++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++        
+++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++
+++#         # Convert attention_mask for flash_attention_score
+++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+++#         if attention_mask is not None:
+++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++#                 raise ValueError(
+++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++#                 )
+++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+++#         else:
+++#             attn_mask_for_fa = None
+++        
+++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++
+++#         # Call the fused flash_attention_score operator
+++#         attn_output = mindspore.ops.flash_attention_score(
+++#             query=query_states_for_fa,
+++#             key=key_states_for_fa,
+++#             value=value_states_for_fa,
+++#             head_num=self.num_heads, # This is N1, the number of query heads
+++#             input_layout='BSH',
+++#             attn_mask=attn_mask_for_fa,
+++#             keep_prob=keep_prob,
+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++#             sparse_mode=0 # Default mask mode
+++#         )
+++        
+++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+++#         attn_output = self.o_proj(attn_output)
+++        
+++#         # Flash Attention does not return attention weights
+++#         attn_weights = None
+++
+++#         return attn_output, attn_weights, past_key_value
+++
+++class DeepseekFlashAttention(nn.Module):
+++    """
+++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+++    designed for high performance on supported hardware (Ascend).
+++
+++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+++    """
+++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++        super().__init__()
+++        self.config = config
+++        self.layer_idx = layer_idx
+++        if layer_idx is None:
+++            logger.warning(
+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++                "when creating this class."
+++            )
+++
+++        # --- [FIX] Correctly initialize all required attributes ---
+++        self.attention_dropout = config.attention_dropout
+++        self.hidden_size = config.hidden_size
+++        self.num_heads = config.num_attention_heads
+++        self.head_dim = self.hidden_size // self.num_heads
+++        self.num_key_value_heads = config.num_key_value_heads
+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++        self.max_position_embeddings = config.max_position_embeddings
+++        self.rope_theta = config.rope_theta
+++        self.is_causal = True
+++
+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++            raise ValueError(
+++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++                f" and `num_heads`: {self.num_heads})."
+++            )
+++
+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++        
+++        # This call will now succeed as all attributes are initialized.
+++        self._init_rope()
+++
+++    def _init_rope(self):
+++        if self.config.rope_scaling is None:
+++            self.rotary_emb = DeepseekRotaryEmbedding(
+++                self.head_dim,
+++                max_position_embeddings=self.max_position_embeddings,
+++                base=self.rope_theta,
+++            )
+++        else:
+++            scaling_type = self.config.rope_scaling["type"]
+++            scaling_factor = self.config.rope_scaling["factor"]
+++            if scaling_type == "linear":
+++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++                    self.head_dim,
+++                    max_position_embeddings=self.max_position_embeddings,
+++                    scaling_factor=scaling_factor,
+++                    base=self.rope_theta,
+++                )
+++            elif scaling_type == "dynamic":
+++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++                    self.head_dim,
+++                    max_position_embeddings=self.max_position_embeddings,
+++                    scaling_factor=scaling_factor,
+++                    base=self.rope_theta,
+++                )
+++            else:
+++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++
+++    def forward(
+++        self,
+++        hidden_states: mindspore.Tensor,
+++        attention_mask: Optional[mindspore.Tensor] = None,
+++        position_ids: Optional[mindspore.Tensor] = None,
+++        past_key_value: Optional[Cache] = None,
+++        output_attentions: bool = False,
+++        use_cache: bool = False,
+++        **kwargs,
+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++        if "padding_mask" in kwargs:
+++            warnings.warn(
+++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++            )
+++        if output_attentions:
+++            warnings.warn(
+++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+++            )
+++
+++        bsz, q_len, _ = hidden_states.shape
+++
+++        if self.config.pretraining_tp > 1:
+++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++
+++        query_states = self.q_proj(hidden_states)
+++        key_states = self.k_proj(hidden_states)
+++        value_states = self.v_proj(hidden_states)
+++
+++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++        kv_seq_len = key_states.shape[-2]
+++        if past_key_value is not None:
+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++        
+++        # Apply Rotary Position Embedding
+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++        if past_key_value is not None:
+++            cache_kwargs = {"sin": sin, "cos": cos}
+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++
+++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+++        # So we must explicitly repeat the KV heads.
+++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++
+++        # Convert attention mask for flash_attention_score
+++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+++        if attention_mask is not None:
+++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++                 raise ValueError(
+++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++                )
+++            attn_mask_for_fa = attention_mask < 0
+++        else:
+++            attn_mask_for_fa = None
+++
+++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++
+++        # Call the fused operator using the efficient BNSD layout
+++        attn_output = mindspore.ops.flash_attention_score(
+++            query=query_states,
+++            key=key_states,
+++            value=value_states,
+++            head_num=self.num_heads,
+++            input_layout='BNSD', # Specify the correct layout
+++            attn_mask=attn_mask_for_fa,
+++            keep_prob=keep_prob,
+++            scalar_value=1.0 / math.sqrt(self.head_dim)
+++        )
+++        
+++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++        
+++        # Apply output projection
+++        attn_output = self.o_proj(attn_output)
+++
+++        # Flash attention does not return attention weights, so we return None.
+++        attn_weights = None
+++
+++        return attn_output, attn_weights, past_key_value
+++
++ Deepseek_ATTENTION_CLASSES = {
++     "eager": DeepseekAttention,
+++    "flash-attention": DeepseekFlashAttention,
++ }
++ 
++ 
++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
++             config=config, layer_idx=layer_idx
++         )
++ 
+++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+++            config=config, layer_idx=layer_idx
+++        )
+++
++         self.mlp = (
++             DeepseekMoE(config)
++             if (
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index d4c6b651..bced285c 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
++ 
++ import mindspore
++ import mindnlp.core.nn.functional as F
++-from mindnlp.core import nn, ops
+++from mindnlp.core import nn, ops, no_grad
++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
++ 
++ from ....common.activations import ACT2FN
++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++ 
+++Long_Prompt = False
+++PROMPT_LENGTH_THRESHOLD = 128
++ 
++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++ def _prepare_4d_causal_attention_mask_with_cache_position(
++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
++         return attn_output, attn_weights, past_key_value
++ 
++ 
+++# class Qwen2MoeFlashAttention(nn.Module):
+++#     """
+++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++
+++#     关键改动:
+++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++#        直接传入原始的 key 和 value 张量效率更高。
+++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++#         super().__init__()
+++#         self.config = config
+++#         self.layer_idx = layer_idx
+++#         self.hidden_size = config.hidden_size
+++#         self.num_heads = config.num_attention_heads
+++#         self.head_dim = self.hidden_size // self.num_heads
+++#         self.num_key_value_heads = config.num_key_value_heads
+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++#         self.max_position_embeddings = config.max_position_embeddings
+++#         self.rope_theta = config.rope_theta
+++#         self.attention_dropout = config.attention_dropout
+++
+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++#             raise ValueError(
+++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++#             )
+++
+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++
+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++#             self.head_dim,
+++#             max_position_embeddings=self.max_position_embeddings,
+++#             base=self.rope_theta,
+++#         )
+++
+++#     def forward(
+++#         self,
+++#         hidden_states: mindspore.Tensor,
+++#         attention_mask: Optional[mindspore.Tensor] = None,
+++#         position_ids: Optional[mindspore.Tensor] = None,
+++#         past_key_value: Optional[Cache] = None,
+++#         output_attentions: bool = False,
+++#         use_cache: bool = False,
+++#         cache_position: Optional[mindspore.Tensor] = None,
+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++#         bsz, q_len, _ = hidden_states.shape
+++
+++#         # 1. 线性投射 Q, K, V
+++#         query_states = self.q_proj(hidden_states)
+++#         key_states = self.k_proj(hidden_states)
+++#         value_states = self.v_proj(hidden_states)
+++
+++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++#         # 3. RoPE 旋转位置编码
+++#         kv_seq_len = key_states.shape[-2]
+++#         if past_key_value is not None:
+++#             if self.layer_idx is None:
+++#                 raise ValueError(
+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++#                     "with a layer index."
+++#                 )
+++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++#                 if cache_position.shape[0] == 1:
+++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++#                     kv_seq_len = past_seen_tokens + 1
+++#                 else:
+++#                     # prefill 阶段：cache_position 是范围，使用其长度
+++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++#             else:
+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++#         # 4. KV 缓存更新
+++#         if past_key_value is not None:
+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++#             key_states, value_states = past_key_value.update(
+++#                 key_states, value_states, self.layer_idx, cache_kwargs
+++#             )
+++            
+++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++#                 if cache_position.shape[0] == 1:
+++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++#                     kv_seq_len = key_states.shape[-2]
+++
+++#         # 5. [重要] 准备 Attention Mask
+++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++#         fa_attention_mask = None
+++#         if attention_mask is not None:
+++#             # 截取与当前key长度匹配的部分
+++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+++#             fa_attention_mask = (mask_slice != 0)
+++
+++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++#         input_dtype = query_states.dtype
+++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++#             query_states = query_states.to(mindspore.float16)
+++#             key_states = key_states.to(mindspore.float16)
+++#             value_states = value_states.to(mindspore.float16)
+++
+++#         # 6. [核心] 调用 flash_attention_score 算子
+++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++#         attn_output = mindspore.ops.flash_attention_score(
+++#             query=query_states,
+++#             key=key_states,
+++#             value=value_states,
+++#             head_num=self.num_heads, # 传入Q的头数(N1)
+++#             attn_mask=fa_attention_mask,
+++#             keep_prob=1.0 - self.attention_dropout,
+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++#             input_layout="BNSD",
+++#             sparse_mode=0 # 使用 defaultMask 模式
+++#         )
+++
+++#         # 恢复原始数据类型
+++#         attn_output = attn_output.to(input_dtype)
+++
+++#         # 7. 调整输出形状
+++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++#         attn_output = self.o_proj(attn_output)
+++
+++#         # FlashAttention 算子不直接返回注意力权重矩阵
+++#         attn_weights = None
+++#         if output_attentions:
+++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++#         return attn_output, attn_weights, past_key_value
+++
+++#     # def forward(
+++#     #     self,
+++#     #     hidden_states: mindspore.Tensor,
+++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+++#     #     position_ids: Optional[mindspore.Tensor] = None,
+++#     #     past_key_value: Optional[Cache] = None,
+++#     #     output_attentions: bool = False,
+++#     #     use_cache: bool = False,
+++#     #     cache_position: Optional[mindspore.Tensor] = None,
+++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++#     #     bsz, q_len, _ = hidden_states.shape
+++
+++#     #     # 1. 线性投射 Q, K, V
+++#     #     query_states = self.q_proj(hidden_states)
+++#     #     key_states = self.k_proj(hidden_states)
+++#     #     value_states = self.v_proj(hidden_states)
+++
+++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++
+++#     #     # 3. RoPE 旋转位置编码
+++#     #     kv_seq_len = key_states.shape[-2]
+++#     #     if past_key_value is not None:
+++#     #         if self.layer_idx is None:
+++#     #             raise ValueError(
+++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++#     #                 "with a layer index."
+++#     #             )
+++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++
+++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++#     #     # 4. KV 缓存更新
+++#     #     if past_key_value is not None:
+++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++#     #         key_states, value_states = past_key_value.update(
+++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+++#     #         )
+++
+++#     #     # 5. 准备 Attention Mask
+++#     #     fa_attention_mask = None
+++#     #     if attention_mask is not None:
+++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++#     #         fa_attention_mask = (mask_slice != 0)
+++
+++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++#     #     input_dtype = query_states.dtype
+++
+++#     #     # 6. [核心] 调用 flash_attention_score 算子
+++#     #     attn_output = mindspore.ops.flash_attention_score(
+++#     #         query=query_states,
+++#     #         key=key_states,
+++#     #         value=value_states,
+++#     #         head_num=self.num_heads,
+++#     #         attn_mask=fa_attention_mask,
+++#     #         keep_prob=1.0 - self.attention_dropout,
+++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++#     #         input_layout="BNSD",
+++#     #         sparse_mode=0,
+++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++#     #         inner_precise=1
+++#     #     )
+++
+++#     #     # 恢复原始数据类型
+++#     #     attn_output = attn_output.to(input_dtype)
+++
+++#     #     # 7. 调整输出形状
+++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++#     #     attn_output = self.o_proj(attn_output)
+++
+++#     #     attn_weights = None
+++#     #     if output_attentions:
+++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++
+++#     #     return attn_output, attn_weights, past_key_value
+++
+++
++ class Qwen2MoeFlashAttention(nn.Module):
++     """
++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++-
++-    关键改动:
++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++-       直接传入原始的 key 和 value 张量效率更高。
++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+++
+++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+++    完全使用模型的低精度数据类型（如 float16）进行计算，
+++    以达到理论上的最高执行速度。
++     """
++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++         super().__init__()
++         self.config = config
++         self.layer_idx = layer_idx
+++        if layer_idx is None:
+++            logger.warning_once(
+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+++            )
+++
++         self.hidden_size = config.hidden_size
++         self.num_heads = config.num_attention_heads
++         self.head_dim = self.hidden_size // self.num_heads
++         self.num_key_value_heads = config.num_key_value_heads
++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++         self.max_position_embeddings = config.max_position_embeddings
++         self.rope_theta = config.rope_theta
++         self.attention_dropout = config.attention_dropout
++ 
++-        if (self.head_dim * self.num_heads) != self.hidden_size:
++-            raise ValueError(
++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++-            )
++-
++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
++         key_states = self.k_proj(hidden_states)
++         value_states = self.v_proj(hidden_states)
++ 
++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++-        # query:   [B, S, H*D] -> [B, N1, S, D]
++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++        # 2. 调整形状以匹配 BNSD 布局
++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-
++-        # 3. RoPE 旋转位置编码
+++        
+++        # 3. RoPE 和 KV 缓存
++         kv_seq_len = key_states.shape[-2]
++         if past_key_value is not None:
++-            if self.layer_idx is None:
++-                raise ValueError(
++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-                    "with a layer index."
++-                )
++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++-                if cache_position.shape[0] == 1:
++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++-                    kv_seq_len = past_seen_tokens + 1
++-                else:
++-                    # prefill 阶段：cache_position 是范围，使用其长度
++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++-            else:
++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-
+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++        
++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++ 
++-        # 4. KV 缓存更新
++         if past_key_value is not None:
++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-            key_states, value_states = past_key_value.update(
++-                key_states, value_states, self.layer_idx, cache_kwargs
++-            )
++-            
++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++-                if cache_position.shape[0] == 1:
++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++-                    kv_seq_len = key_states.shape[-2]
++-
++-        # 5. [重要] 准备 Attention Mask
++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++
+++        # 4. 准备 Attention Mask
++         fa_attention_mask = None
++         if attention_mask is not None:
++-            # 截取与当前key长度匹配的部分
++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
++             fa_attention_mask = (mask_slice != 0)
++ 
++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++-        input_dtype = query_states.dtype
++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++-            query_states = query_states.to(mindspore.float16)
++-            key_states = key_states.to(mindspore.float16)
++-            value_states = value_states.to(mindspore.float16)
++-
++-        # 6. [核心] 调用 flash_attention_score 算子
++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
++         attn_output = mindspore.ops.flash_attention_score(
++             query=query_states,
++             key=key_states,
++             value=value_states,
++-            head_num=self.num_heads, # 传入Q的头数(N1)
+++            head_num=self.num_heads,
++             attn_mask=fa_attention_mask,
++-            keep_prob=1.0 - self.attention_dropout,
+++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
++             scalar_value=1.0 / math.sqrt(self.head_dim),
++             input_layout="BNSD",
++-            sparse_mode=0 # 使用 defaultMask 模式
+++            sparse_mode=0,
+++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
++         )
++ 
++-        # 恢复原始数据类型
++-        attn_output = attn_output.to(input_dtype)
++-
++-        # 7. 调整输出形状
++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++        # 6. 调整输出形状
++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++         attn_output = self.o_proj(attn_output)
++ 
++-        # FlashAttention 算子不直接返回注意力权重矩阵
+++        # 7. 返回结果
++         attn_weights = None
++         if output_attentions:
++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++ 
++         return attn_output, attn_weights, past_key_value
++ 
++-    # def forward(
++-    #     self,
++-    #     hidden_states: mindspore.Tensor,
++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++-    #     position_ids: Optional[mindspore.Tensor] = None,
++-    #     past_key_value: Optional[Cache] = None,
++-    #     output_attentions: bool = False,
++-    #     use_cache: bool = False,
++-    #     cache_position: Optional[mindspore.Tensor] = None,
++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-
++-    #     bsz, q_len, _ = hidden_states.shape
++-
++-    #     # 1. 线性投射 Q, K, V
++-    #     query_states = self.q_proj(hidden_states)
++-    #     key_states = self.k_proj(hidden_states)
++-    #     value_states = self.v_proj(hidden_states)
++-
++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-
++-    #     # 3. RoPE 旋转位置编码
++-    #     kv_seq_len = key_states.shape[-2]
++-    #     if past_key_value is not None:
++-    #         if self.layer_idx is None:
++-    #             raise ValueError(
++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-    #                 "with a layer index."
++-    #             )
++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++ 
++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-
++-    #     # 4. KV 缓存更新
++-    #     if past_key_value is not None:
++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-    #         key_states, value_states = past_key_value.update(
++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++-    #         )
++-
++-    #     # 5. 准备 Attention Mask
++-    #     fa_attention_mask = None
++-    #     if attention_mask is not None:
++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-    #         fa_attention_mask = (mask_slice != 0)
++-
++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++-    #     input_dtype = query_states.dtype
++-
++-    #     # 6. [核心] 调用 flash_attention_score 算子
++-    #     attn_output = mindspore.ops.flash_attention_score(
++-    #         query=query_states,
++-    #         key=key_states,
++-    #         value=value_states,
++-    #         head_num=self.num_heads,
++-    #         attn_mask=fa_attention_mask,
++-    #         keep_prob=1.0 - self.attention_dropout,
++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++-    #         input_layout="BNSD",
++-    #         sparse_mode=0,
++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++-    #         inner_precise=1
++-    #     )
++-
++-    #     # 恢复原始数据类型
++-    #     attn_output = attn_output.to(input_dtype)
+++QWEN2MOE_ATTENTION_CLASSES = {
+++    "eager": Qwen2MoeAttention,
+++    "flash-attention": Qwen2MoeFlashAttention,
+++}
++ 
++-    #     # 7. 调整输出形状
++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-    #     attn_output = self.o_proj(attn_output)
++ 
++-    #     attn_weights = None
++-    #     if output_attentions:
++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     def __init__(self, config):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         # gating
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     #@dwj
+++#     # 只遍历激活的专家，而非全部专家
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#             num_tokens = hidden_states_reshaped.shape[0]
+++
+++#             router_logits = self.gate(hidden_states_reshaped)
+++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++
+++#             if self.norm_topk_prob:
+++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++#             routing_weights = routing_weights.to(hidden_states.dtype)
+++            
+++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++#             flat_selected_experts = selected_experts.flatten()
+++            
+++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++#             token_indices = broadcasted_token_indices.flatten()
+++            
+++#             active_experts = ops.unique(flat_selected_experts)
+++            
+++#             for expert_idx_tensor in active_experts:
+++#                 expert_idx = expert_idx_tensor.item()
+++#                 expert_layer = self.experts[expert_idx]
+++                
+++#                 mask = (flat_selected_experts == expert_idx_tensor)
+++#                 selected_token_indices = token_indices[mask]
+++#                 selected_routing_weights = routing_weights.flatten()[mask]
+++                
+++#                 current_states = hidden_states_reshaped[selected_token_indices]
+++                
+++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++                
+++#                 final_hidden_states = final_hidden_states.index_add(
+++#                     dim=0,
+++#                     index=selected_token_indices,
+++#                     source=expert_output.to(hidden_states.dtype)
+++#                 )
+++            
+++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++ 
++-    #     return attn_output, attn_weights, past_key_value
+++#             final_hidden_states = final_hidden_states + shared_expert_output
+++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++            
+++#             return final_hidden_states, router_logits
+++
+++
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     """
+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         # 门控网络
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         # 专家列表
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++#         # 共享专家
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     @no_grad()
+++#     def _moe_infer_decode(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         """
+++#         【解码路径】针对 sequence_length=1 的极致优化。
+++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++#         """
+++#         batch_size, hidden_dim = hidden_states.shape
+++        
+++#         expert_outputs_list = [
+++#             ops.cat([
+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++#             ], dim=0) 
+++#             for i in range(batch_size)
+++#         ]
+++        
+++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++#         # shape: (batch_size, top_k, hidden_dim)
+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++        
+++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++        
+++#         return moe_output.squeeze(1)
+++
+++#     @no_grad()
+++#     def _moe_infer_prefill(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         """
+++#         【预填充路径】针对 sequence_length > 1 的优化。
+++#         按专家对 Token 进行分组，并进行批处理。
+++#         """
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens = hidden_states.shape[0]
+++#         flat_selected_experts = selected_experts.flatten()
+++        
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++        
+++#         active_experts = ops.unique(flat_selected_experts)
+++        
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++            
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++#             selected_token_indices = token_indices[mask]
+++#             selected_routing_weights = routing_weights.flatten()[mask]
+++            
+++#             current_states = hidden_states[selected_token_indices]
+++            
+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++            
+++#             moe_output = moe_output.index_add(
+++#                 dim=0,
+++#                 index=selected_token_indices,
+++#                 source=expert_output.to(hidden_states.dtype)
+++#             )
+++#         return moe_output
+++
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#         """
+++#         顶层 forward 方法，作为智能分发器。
+++#         """
+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++        
+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#         router_logits = self.gate(hidden_states_reshaped)
+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++ 
++-    # def forward(
++-    #     self,
++-    #     hidden_states: mindspore.Tensor,
++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++-    #     position_ids: Optional[mindspore.Tensor] = None,
++-    #     past_key_value: Optional[Cache] = None,
++-    #     output_attentions: bool = False,
++-    #     use_cache: bool = False,
++-    #     cache_position: Optional[mindspore.Tensor] = None,
++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-
++-    #     bsz, q_len, _ = hidden_states.shape
++-
++-    #     query_states = self.q_proj(hidden_states)
++-    #     key_states = self.k_proj(hidden_states)
++-    #     value_states = self.v_proj(hidden_states)
++-
++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-
++-    #     kv_seq_len = key_states.shape[-2]
++-    #     if past_key_value is not None:
++-    #         if self.layer_idx is None:
++-    #             raise ValueError("`layer_idx` must be specified for caching")
++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-
++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-
++-    #     if past_key_value is not None:
++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-    #         key_states, value_states = past_key_value.update(
++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++-    #         )
+++#         if self.norm_topk_prob:
+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++        
+++#         moe_output = None
+++#         # 在推理时，根据序列长度选择最优路径
+++#         if not self.training:
+++#             if sequence_length == 1:
+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++#             else:
+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++#         else:
+++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++#             raise NotImplementedError("Training path is not implemented.")
+++
+++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++        
+++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++        
+++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++        
+++#         return final_hidden_states, router_logits
+++
+++
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     """
+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         # 门控网络
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         # 专家列表
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++#         # 共享专家
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     @no_grad()
+++#     def _moe_infer_decode(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         batch_size, _ = hidden_states.shape
+++#         expert_outputs_list = [
+++#             ops.cat([
+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++#             ], dim=0) 
+++#             for i in range(batch_size)
+++#         ]
+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++#         return moe_output.squeeze(1)
+++
+++#     @no_grad()
+++#     def _moe_infer_prefill(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens = hidden_states.shape[0]
+++#         flat_selected_experts = selected_experts.flatten()
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++#         active_experts = ops.unique(flat_selected_experts)
+++        
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++#             selected_token_indices = token_indices[mask]
+++#             selected_routing_weights = routing_weights.flatten()[mask]
+++#             current_states = hidden_states[selected_token_indices]
+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++#             moe_output = moe_output.index_add(
+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++#             )
+++#         return moe_output
+++
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#         """
+++#         顶层 forward 方法，作为智能分发器。
+++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++#         """
+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++        
+++#         # 1. 门控计算 (通用逻辑)
+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#         router_logits = self.gate(hidden_states_reshaped)
+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++
+++#         if self.norm_topk_prob:
+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++        
+++#         # 2. 智能分发到最优 MoE 路径
+++#         moe_output = None
+++#         if not self.training:
+++#             if sequence_length == 1:
+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++#             else:
+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++#         else:
+++#             raise NotImplementedError("Training path is not implemented.")
+++
+++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++        
+++#         # 4. 合并 MoE 输出和共享专家输出
+++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++        
+++#         # 5. 恢复原始形状并返回
+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++        
+++#         return final_hidden_states, router_logits
+++
+++# prefill fastest
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     """
+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         # 门控网络
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         # 专家列表
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++#         # 共享专家
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     @no_grad()
+++#     def _moe_infer_dispatch(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         """
+++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++#         """
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens, _ = hidden_states.shape
+++        
+++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++#         flat_selected_experts = selected_experts.flatten()
+++#         flat_routing_weights = routing_weights.flatten()
++ 
++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++-
++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++-    #     query_states = query_states / math.sqrt(self.head_dim)
++-    #     # <--- 修改结束 ---
++-
++-    #     fa_attention_mask = None
++-    #     if attention_mask is not None:
++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-    #         fa_attention_mask = (mask_slice != 0)
++-
++-    #     input_dtype = query_states.dtype
++-
++-    #     attn_output = mindspore.ops.flash_attention_score(
++-    #         query=query_states,    # 传入已经预先缩放过的 query
++-    #         key=key_states,
++-    #         value=value_states,
++-    #         head_num=self.num_heads,
++-    #         attn_mask=fa_attention_mask,
++-    #         keep_prob=1.0 - self.attention_dropout,
++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++-    #         input_layout="BNSD",
++-    #         sparse_mode=0,
++-    #         inner_precise=1        # 仍然保持内部高精度计算
++-    #     )
+++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++ 
++-    #     attn_output = attn_output.to(input_dtype)
++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-    #     attn_output = self.o_proj(attn_output)
+++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++#         active_experts = ops.unique(flat_selected_experts)
+++        
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++            
+++#             # 找到所有分配给该专家的 token
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++            
+++#             # 使用 mask 选取对应的 token 和权重
+++#             current_token_indices = token_indices[mask]
+++#             current_routing_weights = flat_routing_weights[mask]
+++#             current_hidden_states = hidden_states[current_token_indices]
+++            
+++#             # 对这些 token 进行批处理
+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++            
+++#             # 使用 index_add 将结果精确地加回到对应位置
+++#             moe_output = moe_output.index_add(
+++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++#             )
+++#         return moe_output
+++
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#         """
+++#         顶层 forward 方法，作为智能分发器。
+++#         """
+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++        
+++#         # 1. 门控计算
+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#         router_logits = self.gate(hidden_states_reshaped)
+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++
+++#         if self.norm_topk_prob:
+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++        
+++#         # 2. 调用统一的 MoE 计算内核
+++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++ 
++-    #     attn_weights = None
++-    #     if output_attentions:
++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++#         # 3. 统一处理共享专家
+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++        
+++#         # 4. 合并输出
+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++        
+++#         # 5. 恢复原始形状并返回
+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++        
+++#         return final_hidden_states, router_logits
+++
+++
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     """
+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++#     【最终高性能与高精度版】：
+++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++#     3. 这样实现了速度和准确性的两全其美。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     @no_grad()
+++#     def _moe_infer_decode(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         """
+++#         【解码路径】极致优化版：bmm + 高精度累加。
+++#         """
+++#         original_dtype = hidden_states.dtype
+++#         batch_size, _ = hidden_states.shape
+++        
+++#         expert_outputs_list = [
+++#             ops.cat([
+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++#             ], dim=0) 
+++#             for i in range(batch_size)
+++#         ]
+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++
+++#         # 在 float32 下执行 bmm，得到高精度结果
+++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++        
+++#         # 将高精度结果转换回原始数据类型
+++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++        
+++#         return moe_output
+++
+++#     @no_grad()
+++#     def _moe_infer_prefill(
+++#         self, 
+++#         hidden_states: mindspore.Tensor, 
+++#         selected_experts: mindspore.Tensor, 
+++#         routing_weights: mindspore.Tensor
+++#     ) -> mindspore.Tensor:
+++#         """
+++#         【预填充路径】与原始实现一致，结果精确。
+++#         """
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens, _ = hidden_states.shape
+++#         flat_selected_experts = selected_experts.flatten()
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++#         active_experts = ops.unique(flat_selected_experts)
+++        
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++#             selected_token_indices = token_indices[mask]
+++#             selected_routing_weights = routing_weights.flatten()[mask]
+++#             current_states = hidden_states[selected_token_indices]
+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++#             moe_output = moe_output.index_add(
+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++#             )
+++#         return moe_output
+++
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++        
+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#         router_logits = self.gate(hidden_states_reshaped)
+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++ 
++-    #     return attn_output, attn_weights, past_key_value
+++#         if self.norm_topk_prob:
+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++#         # 如果模型主体是 float16，后续再转换
+++        
+++#         moe_output = None
+++#         if not self.training:
+++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++#             # _moe_infer_decode 内部会处理好类型转换
+++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++#             if sequence_length == 1:
+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++#             else:
+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++#         else:
+++#             raise NotImplementedError("Training path is not implemented.")
+++
+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++        
+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++        
+++#         return final_hidden_states, router_logits
+++    
++ 
++-QWEN2MOE_ATTENTION_CLASSES = {
++-    "eager": Qwen2MoeAttention,
++-    "flash-attention": Qwen2MoeFlashAttention,
++-}
+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++#     """
+++#     【融合版】一个混合专家模块，内置两种推理策略，
+++#     由外部全局变量 `Long_Prompt` 控制：
+++
+++#     - if Long_Prompt is True:  【精度优先模式】
+++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++#       适用于处理长序列，避免误差累积。
+++
+++#     - if Long_Prompt is False: 【速度优先模式】
+++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++#       在解码阶段获得极致速度，同时保证结果高度准确。
+++#     """
+++#     def __init__(self, config: Qwen2MoeConfig):
+++#         super().__init__()
+++#         self.num_experts = config.num_experts
+++#         self.top_k = config.num_experts_per_tok
+++#         self.norm_topk_prob = config.norm_topk_prob
+++
+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++#         self.experts = nn.ModuleList(
+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++#         )
+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++
+++#     # --- 速度优先模式的辅助函数 ---
+++#     @no_grad()
+++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++#         original_dtype = hidden_states.dtype
+++#         batch_size, _ = hidden_states.shape
+++#         expert_outputs_list = [
+++#             ops.cat([
+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++#             ], dim=0) 
+++#             for i in range(batch_size)
+++#         ]
+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++#         weights_fp32 = routing_weights.to(mindspore.float32)
+++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++
+++#     @no_grad()
+++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens, _ = hidden_states.shape
+++#         flat_selected_experts = selected_experts.flatten()
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++#         active_experts = ops.unique(flat_selected_experts)
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++#             selected_token_indices = token_indices[mask]
+++#             selected_routing_weights = routing_weights.flatten()[mask]
+++#             current_states = hidden_states[selected_token_indices]
+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++#         return moe_output
+++
+++#     # --- 精度优先模式的辅助函数 ---
+++#     @no_grad()
+++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++#         moe_output = ops.zeros_like(hidden_states)
+++#         num_tokens, _ = hidden_states.shape
+++#         flat_selected_experts = selected_experts.flatten()
+++#         flat_routing_weights = routing_weights.flatten()
+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++#         active_experts = ops.unique(flat_selected_experts)
+++#         for expert_idx_tensor in active_experts:
+++#             expert_idx = expert_idx_tensor.item()
+++#             expert_layer = self.experts[expert_idx]
+++#             mask = (flat_selected_experts == expert_idx_tensor)
+++#             current_token_indices = token_indices[mask]
+++#             current_routing_weights = flat_routing_weights[mask]
+++#             current_hidden_states = hidden_states[current_token_indices]
+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++#         return moe_output
+++
+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++#         # 声明我们将要使用一个在模块外部定义的全局变量
+++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++#         global Long_Prompt
+++
+++#         # 1. 门控计算 (所有模式通用)
+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++#         router_logits = self.gate(hidden_states_reshaped)
+++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++#         if self.norm_topk_prob:
+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++#         moe_output = None
+++#         if not self.training:
+++#             # 根据 Long_Prompt 标志选择模式
+++#             if Long_Prompt:
+++#                 # --- 精度优先模式 ---
+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++#             else:
+++#                 # --- 速度优先模式 ---
+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++#                 if sequence_length == 1:
+++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++#                 else:
+++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++#         else:
+++#             raise NotImplementedError("Training path is not implemented.")
+++
+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++        
+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++        
+++#         return final_hidden_states, router_logits
+++    
+++class Qwen2MoeSparseMoeBlock(nn.Module):
+++    """
+++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++    控制的顶级推理策略：
++ 
+++    - if Long_Prompt is True:  【精度优先模式】
+++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+++      适用于需要严格可复现性的长序列任务。
++ 
++-class Qwen2MoeSparseMoeBlock(nn.Module):
++-    def __init__(self, config):
+++    - if Long_Prompt is False: 【速度优先模式】
+++      采用业界最强的性能组合：
+++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+++    """
+++    def __init__(self, config: Qwen2MoeConfig):
++         super().__init__()
++         self.num_experts = config.num_experts
++         self.top_k = config.num_experts_per_tok
++         self.norm_topk_prob = config.norm_topk_prob
++ 
++-        # gating
++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++         self.experts = nn.ModuleList(
++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++         )
++-
++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++ 
++-    #@dwj
++-    # 只遍历激活的专家，而非全部专家
++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-            num_tokens = hidden_states_reshaped.shape[0]
++-
++-            router_logits = self.gate(hidden_states_reshaped)
++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-
++-            if self.norm_topk_prob:
++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-            routing_weights = routing_weights.to(hidden_states.dtype)
++-            
++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++-            flat_selected_experts = selected_experts.flatten()
++-            
++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++-            token_indices = broadcasted_token_indices.flatten()
++-            
++-            active_experts = ops.unique(flat_selected_experts)
++-            
++-            for expert_idx_tensor in active_experts:
++-                expert_idx = expert_idx_tensor.item()
++-                expert_layer = self.experts[expert_idx]
++-                
++-                mask = (flat_selected_experts == expert_idx_tensor)
++-                selected_token_indices = token_indices[mask]
++-                selected_routing_weights = routing_weights.flatten()[mask]
++-                
++-                current_states = hidden_states_reshaped[selected_token_indices]
++-                
++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-                
++-                final_hidden_states = final_hidden_states.index_add(
++-                    dim=0,
++-                    index=selected_token_indices,
++-                    source=expert_output.to(hidden_states.dtype)
++-                )
++-            
++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+++    @no_grad()
+++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++        original_dtype = hidden_states.dtype
+++        batch_size, _ = hidden_states.shape
+++        expert_outputs_list = [
+++            ops.cat([
+++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++            ], dim=0)
+++            for i in range(batch_size)
+++        ]
+++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++        weights_fp32 = routing_weights.to(mindspore.float32)
+++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++        return moe_output_fp32.squeeze(1).to(original_dtype)
+++
+++    @no_grad()
+++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++        num_tokens, _ = hidden_states.shape
+++        flat_selected_experts = selected_experts.flatten()
+++        sorted_expert_indices = flat_selected_experts.argsort()
+++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++        original_token_indices = sorted_expert_indices // self.top_k
+++        moe_output = ops.zeros_like(hidden_states)
+++        current_token_offset = 0
+++        for i in range(self.num_experts):
+++            expert_token_count = tokens_per_expert[i] - current_token_offset
+++            if expert_token_count == 0:
+++                continue
+++            end_offset = current_token_offset + expert_token_count
+++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++            expert_hidden_states = hidden_states[expert_original_token_indices]
+++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++            current_token_offset += expert_token_count
+++        return moe_output
+++
+++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++    @no_grad()
+++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++        moe_output = ops.zeros_like(hidden_states)
+++        num_tokens, _ = hidden_states.shape
+++        flat_selected_experts = selected_experts.flatten()
+++        flat_routing_weights = routing_weights.flatten()
+++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++        active_experts = ops.unique(flat_selected_experts)
+++        for expert_idx_tensor in active_experts:
+++            expert_idx = expert_idx_tensor.item()
+++            expert_layer = self.experts[expert_idx]
+++            mask = (flat_selected_experts == expert_idx_tensor)
+++            current_token_indices = token_indices[mask]
+++            current_routing_weights = flat_routing_weights[mask]
+++            current_hidden_states = hidden_states[current_token_indices]
+++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++        return moe_output
++ 
++-            final_hidden_states = final_hidden_states + shared_expert_output
++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++-            
++-            return final_hidden_states, router_logits
+++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++        global Long_Prompt
+++
+++        # 1. 门控计算 (所有模式通用)
+++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++        router_logits = self.gate(hidden_states_reshaped)
+++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++        if self.norm_topk_prob:
+++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++        
+++        moe_output = None
+++        if Long_Prompt:
+++            # --- 精度优先模式 (ACCURACY MODE) ---
+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        else:
+++            # --- 速度优先模式 (SPEED MODE) ---
+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++            if sequence_length == 1:
+++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++            else:
+++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        
++ 
+++        # 3. 共享专家计算与合并 (所有模式通用)
+++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++        
+++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++        
+++        return final_hidden_states, router_logits
++ 
++ class Qwen2MoeDecoderLayer(nn.Module):
++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++         super().__init__()
++         self.hidden_size = config.hidden_size
+++        
+++        # if Long_Prompt:
+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++        # else:
+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++ 
++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++ 
++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++-
++         if (layer_idx not in config.mlp_only_layers) and (
++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++         ):
++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++             self._warmed_up = True
++             self.warmup_moe_model()
++ 
+++
+++
++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++         output_router_logits = (
++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++             router_logits=outputs.router_logits,
++         )
++ 
+++    def generate(self, *args, **kwargs):
+++        """
+++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++        """
+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++
+++        input_ids = kwargs.get("input_ids")
+++        if input_ids is None and args:
+++            input_ids = args[0]
+++
+++        if input_ids is not None:
+++            prompt_length = input_ids.shape[1]
+++            
+++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++                Long_Prompt = True
+++            else:
+++                Long_Prompt = False
+++
+++        return super().generate(*args, **kwargs)
+++    
++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
++     def prepare_inputs_for_generation(
++         self,
++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
++         # Exception 1: when passing input_embeds, input_ids may be missing entries
++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+++        
++         if past_key_values is not None:
++             if inputs_embeds is not None:  # Exception 1
++                 if 0 not in input_ids.shape:
++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++             }
++         )
++         return model_inputs
+++
++ # @lwx
++     # def _decode_one_tokens_logits(
++     #     self,
++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
++             attentions=outputs.attentions,
++         )
++ 
+++
++ __all__ = [
++     "Qwen2MoeForCausalLM",
++     "Qwen2MoeModel",
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++new file mode 100644
++index 00000000..6dfb5b93
++--- /dev/null
+++++ b/patches/0001-20251104commit.patch
++@@ -0,0 +1,1272 @@
+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++Subject: [PATCH] 20251104commit
+++
+++---
+++ mindnlp/transformers/cache_utils.py           |  28 +-
+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++
+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++index cadd2e04..02f8d4be 100644
+++--- a/mindnlp/transformers/cache_utils.py
++++++ b/mindnlp/transformers/cache_utils.py
+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++             # k_out[:, :, cache_position] = key_states
+++             # v_out[:, :, cache_position] = value_states
+++-            if ON_ORANGE_PI:
+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++-            else:
+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++-
++++            # if ON_ORANGE_PI:
++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++            # else:
++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++            if cache_position.ndim > 1:
++++                cache_position = cache_position.flatten()
++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++                cache_position = cache_position.int()
++++            
++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++            k_out[:, :, cache_position] = key_states
++++            v_out[:, :, cache_position] = value_states
++++            
+++         return k_out, v_out
+++ 
+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index c695b944..d8303e45 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++-    x1 = x[..., : x.shape[-1] // 2]
+++-    x2 = x[..., x.shape[-1] // 2 :]
++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++    # x1 = x[..., : x.shape[-1] // 2]
++++    # x2 = x[..., x.shape[-1] // 2 :]
++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++         if self.training:
+++             raise NotImplementedError("Training is not supported yet.")
+++         else:
+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++-        if self.config.n_shared_experts is not None:
+++-            y = y + self.shared_experts(identity)
+++-        return y
++++            # @lwx
++++            if orig_shape[1] == 1:
++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++                y=y.view(*orig_shape)
++++                if self.config.n_shared_experts is not None:
++++                    y = y + self.shared_experts(identity)
++++                return y
++++            else:
++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++                if self.config.n_shared_experts is not None:
++++                    y = y + self.shared_experts(identity)
++++                return y
++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++        # if self.config.n_shared_experts is not None:
++++        #     y = y + self.shared_experts(identity)
++++        # return y
++++        
++++    @no_grad()
++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++
++++        expert_cache = ops.zeros_like(x)
++++        for i in range(self.num_experts_per_tok):
++++            expert_id = flat_expert_indices[i].item()
++++            weight = flat_expert_weights[i].item()
++++            expert = self.experts[expert_id]
++++            expert_out = expert(x)
++++            expert_cache += expert_out * weight
++++        return expert_cache
+++ 
+++     @no_grad()
+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-        # expert_cache = torch.zeros_like(x)
+++-        # idxs = flat_expert_indices.argsort()
+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++-        # token_idxs = idxs // self.num_experts_per_tok
+++-        # for i, end_idx in enumerate(tokens_per_expert):
+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++-        #     if start_idx == end_idx:
+++-        #         continue
+++-        #     expert = self.experts[i]
+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++-        #     expert_tokens = x[exp_token_idx]
+++-        #     expert_out = expert(expert_tokens)
+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++-        # return expert_cache
++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++         expert_cache = ops.zeros_like(x)
+++         idxs = flat_expert_indices.argsort()
+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++         token_idxs = idxs // self.num_experts_per_tok
++++
+++         for i, end_idx in enumerate(tokens_per_expert):
+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++             if start_idx == end_idx:
+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++             expert_out = expert(expert_tokens)
+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++
+++         return expert_cache
++++        
++++    # @no_grad()
++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++    #     # expert_cache = torch.zeros_like(x)
++++    #     # idxs = flat_expert_indices.argsort()
++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++    #     #     if start_idx == end_idx:
++++    #     #         continue
++++    #     #     expert = self.experts[i]
++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++    #     #     expert_tokens = x[exp_token_idx]
++++    #     #     expert_out = expert(expert_tokens)
++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++    #     # return expert_cache
++++    #     expert_cache = ops.zeros_like(x)
++++    #     idxs = flat_expert_indices.argsort()
++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++    #     token_idxs = idxs // self.num_experts_per_tok
++++
++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++    #         if start_idx == end_idx:
++++    #             continue
++++    #         expert = self.experts[i]
++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++    #         expert_tokens = x[exp_token_idx]
++++    #         expert_out = expert(expert_tokens)
++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++
++++    #     return expert_cache
++++    # @no_grad()
++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++    #     expert_cache = ops.zeros_like(x)
++++
++++    #     # 排序保证顺序一致
++++    #     idxs = flat_expert_indices.argsort()
++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++    #     token_idxs = idxs // self.num_experts_per_tok
++++
++++    #     # 找出有 token 的专家
++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++
++++    #     for i in active_experts.tolist():
++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++    #         end_idx = tokens_per_expert[i]
++++    #         if start_idx == end_idx:  # 没有 token
++++    #             continue
++++
++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++    #         expert_tokens = x[exp_token_idx]
++++    #         expert_out = self.experts[i](expert_tokens)
++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++
++++    #         expert_cache = mindspore.mint.scatter_add(
++++    #             expert_cache,
++++    #             0,
++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++    #             expert_out
++++    #         )
++++
++++    #     return expert_cache
++++
++++
+++ 
+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++ #     """
+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++ 
+++         # Initialize weights and apply final processing
+++         self.post_init()
++++        self.warm_up = False
++++
++++    def warmup_moe_model_deep(self):
++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++        test_texts = [
++++            "warmup short",
++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++        ]
++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++        if tokenizer is None:
++++            from mindnlp.transformers import AutoTokenizer
++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++            self._warmup_tokenizer = tokenizer
++++
++++        for text in test_texts:
++++            inputs = tokenizer(text, return_tensors="ms")
++++            with mindspore._no_grad():
++++                _ = self(**inputs, use_cache=False)
++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++ 
+++     def get_input_embeddings(self):
+++         return self.model.embed_tokens
+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++         ```"""
++++        if not self.warm_up:
++++            self.warm_up = True
++++            self.warmup_moe_model_deep()
++++
+++         output_attentions = (
+++             output_attentions
+++             if output_attentions is not None
+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++index 3cbf820e..d4c6b651 100644
+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++@@ -18,7 +18,6 @@
+++ # See the License for the specific language governing permissions and
+++ # limitations under the License.
+++ """MindSpore Qwen2MoE model."""
+++-
+++ import math
+++ from typing import List, Optional, Tuple, Union
+++ 
+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++     TokenClassifierOutput,
+++ )
+++ from ...modeling_utils import PreTrainedModel
++++from ...generation import GenerationMixin
+++ from ....utils import logging
+++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++ 
+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++         self.variance_epsilon = eps
+++ 
+++     def forward(self, hidden_states):
++++        # @dwj
++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++        # @lwx
++++        # if not self.training :
++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++         input_dtype = hidden_states.dtype
+++         hidden_states = hidden_states.to(mindspore.float32)
+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++@@ -234,6 +239,8 @@ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++     x1 = x[..., : x.shape[-1] // 2]
+++     x2 = x[..., x.shape[-1] // 2 :]
++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++         self.config = config
+++         self.hidden_size = config.hidden_size
+++         self.intermediate_size = intermediate_size
++++        
+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++         self.act_fn = ACT2FN[config.hidden_act]
+++ 
+++     def forward(self, x):
+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++-
+++ 
++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++        # @lwx 
++++        # gate_up_output = self.gate_up_proj(x)
++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++        # return self.down_proj(swiglu_output)
++++
++++    # def forward(self, x):
++++    #     gate_proj_out = self.gate_proj(x)
++++    #     up_proj_out = self.up_proj(x)
++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++    #     return self.down_proj(swiglu_out)
++++        
+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++     """
+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++         use_cache: bool = False,
+++         cache_position: Optional[mindspore.Tensor] = None,
+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++        
++++
+++         bsz, q_len, _ = hidden_states.shape
+++ 
+++         query_states = self.q_proj(hidden_states)
+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++                     "with a layer index."
+++                 )
+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++            if isinstance(past_key_value, StaticCache):
++++                kv_seq_len = key_states.shape[-2]
++++            else:
++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++ 
+++         if past_key_value is not None:
+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++            
++++            if isinstance(past_key_value, StaticCache):
++++                kv_seq_len = key_states.shape[-2]
+++ 
+++         # repeat k/v heads if n_kv_heads < n_heads
+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++-
++++        
+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++ 
+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++-            raise ValueError(
+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++-                f" {attn_weights.shape}"
+++-            )
+++-
+++-        if attention_mask is not None:  # no matter the length, we just slice it
+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++        if attention_mask is not None:
++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++             attn_weights = attn_weights + causal_mask
+++ 
+++         # upcast attention to fp32
+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++ 
+++         attn_output = self.o_proj(attn_output)
+++-
++++        # @lwx
++++        
++++        # max_seq_len = self.max_position_embeddings  # 2048
++++
++++        # if attention_mask is not None:
++++        #     # attention_mask: [B, 1, Sq, Sk]
++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++
++++        #     # pad 到 [max_seq_len, max_seq_len]
++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++        #     global_attention_mask = padded_mask
++++        # else:
++++        #     global_attention_mask = None
++++
++++
++++        # sparse_mode=3
++++        # attn_output = mindspore.ops.flash_attention_score(
++++        #     query=query_states,
++++        #     key=key_states,
++++        #     value=value_states,
++++        #     real_shift=None,
++++        #     padding_mask=None,
++++
++++        #     head_num=self.num_heads,
++++        #     attn_mask=global_attention_mask,
++++        #     keep_prob=1.0 - self.attention_dropout,
++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++        #     input_layout="BNSD",
++++        #     pre_tokens=2147483647,
++++        #     next_tokens=2147483647,
++++        #     inner_precise=0,
++++        #     drop_mask=None,
++++        #     prefix=None,
++++        #     actual_seq_qlen=None,
++++        #     actual_seq_kvlen=None,
++++        #     sparse_mode=sparse_mode,
++++        # )
+++         if not output_attentions:
+++             attn_weights = None
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
+++ 
++++class Qwen2MoeFlashAttention(nn.Module):
++++    """
++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++
++++    关键改动:
++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++       直接传入原始的 key 和 value 张量效率更高。
++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++    """
++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++        super().__init__()
++++        self.config = config
++++        self.layer_idx = layer_idx
++++        self.hidden_size = config.hidden_size
++++        self.num_heads = config.num_attention_heads
++++        self.head_dim = self.hidden_size // self.num_heads
++++        self.num_key_value_heads = config.num_key_value_heads
++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++        self.max_position_embeddings = config.max_position_embeddings
++++        self.rope_theta = config.rope_theta
++++        self.attention_dropout = config.attention_dropout
++++
++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++            raise ValueError(
++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++            )
++++
++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++
++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++            self.head_dim,
++++            max_position_embeddings=self.max_position_embeddings,
++++            base=self.rope_theta,
++++        )
++++
++++    def forward(
++++        self,
++++        hidden_states: mindspore.Tensor,
++++        attention_mask: Optional[mindspore.Tensor] = None,
++++        position_ids: Optional[mindspore.Tensor] = None,
++++        past_key_value: Optional[Cache] = None,
++++        output_attentions: bool = False,
++++        use_cache: bool = False,
++++        cache_position: Optional[mindspore.Tensor] = None,
++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++        bsz, q_len, _ = hidden_states.shape
++++
++++        # 1. 线性投射 Q, K, V
++++        query_states = self.q_proj(hidden_states)
++++        key_states = self.k_proj(hidden_states)
++++        value_states = self.v_proj(hidden_states)
++++
++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++        # 3. RoPE 旋转位置编码
++++        kv_seq_len = key_states.shape[-2]
++++        if past_key_value is not None:
++++            if self.layer_idx is None:
++++                raise ValueError(
++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++                    "with a layer index."
++++                )
++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++                if cache_position.shape[0] == 1:
++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++                    kv_seq_len = past_seen_tokens + 1
++++                else:
++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++            else:
++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++        # 4. KV 缓存更新
++++        if past_key_value is not None:
++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++            key_states, value_states = past_key_value.update(
++++                key_states, value_states, self.layer_idx, cache_kwargs
++++            )
++++            
++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++                if cache_position.shape[0] == 1:
++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++                    kv_seq_len = key_states.shape[-2]
++++
++++        # 5. [重要] 准备 Attention Mask
++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++        fa_attention_mask = None
++++        if attention_mask is not None:
++++            # 截取与当前key长度匹配的部分
++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++            fa_attention_mask = (mask_slice != 0)
++++
++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++        input_dtype = query_states.dtype
++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++            query_states = query_states.to(mindspore.float16)
++++            key_states = key_states.to(mindspore.float16)
++++            value_states = value_states.to(mindspore.float16)
++++
++++        # 6. [核心] 调用 flash_attention_score 算子
++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++        attn_output = mindspore.ops.flash_attention_score(
++++            query=query_states,
++++            key=key_states,
++++            value=value_states,
++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++            attn_mask=fa_attention_mask,
++++            keep_prob=1.0 - self.attention_dropout,
++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++            input_layout="BNSD",
++++            sparse_mode=0 # 使用 defaultMask 模式
++++        )
++++
++++        # 恢复原始数据类型
++++        attn_output = attn_output.to(input_dtype)
++++
++++        # 7. 调整输出形状
++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++        attn_output = self.o_proj(attn_output)
++++
++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++        attn_weights = None
++++        if output_attentions:
++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++        return attn_output, attn_weights, past_key_value
++++
++++    # def forward(
++++    #     self,
++++    #     hidden_states: mindspore.Tensor,
++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++    #     past_key_value: Optional[Cache] = None,
++++    #     output_attentions: bool = False,
++++    #     use_cache: bool = False,
++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++    #     bsz, q_len, _ = hidden_states.shape
++++
++++    #     # 1. 线性投射 Q, K, V
++++    #     query_states = self.q_proj(hidden_states)
++++    #     key_states = self.k_proj(hidden_states)
++++    #     value_states = self.v_proj(hidden_states)
++++
++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++    #     # 3. RoPE 旋转位置编码
++++    #     kv_seq_len = key_states.shape[-2]
++++    #     if past_key_value is not None:
++++    #         if self.layer_idx is None:
++++    #             raise ValueError(
++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++    #                 "with a layer index."
++++    #             )
++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++    #     # 4. KV 缓存更新
++++    #     if past_key_value is not None:
++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++    #         key_states, value_states = past_key_value.update(
++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++    #         )
++++
++++    #     # 5. 准备 Attention Mask
++++    #     fa_attention_mask = None
++++    #     if attention_mask is not None:
++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++    #         fa_attention_mask = (mask_slice != 0)
++++
++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++    #     input_dtype = query_states.dtype
++++
++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++    #     attn_output = mindspore.ops.flash_attention_score(
++++    #         query=query_states,
++++    #         key=key_states,
++++    #         value=value_states,
++++    #         head_num=self.num_heads,
++++    #         attn_mask=fa_attention_mask,
++++    #         keep_prob=1.0 - self.attention_dropout,
++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++    #         input_layout="BNSD",
++++    #         sparse_mode=0,
++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++    #         inner_precise=1
++++    #     )
++++
++++    #     # 恢复原始数据类型
++++    #     attn_output = attn_output.to(input_dtype)
++++
++++    #     # 7. 调整输出形状
++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++    #     attn_output = self.o_proj(attn_output)
++++
++++    #     attn_weights = None
++++    #     if output_attentions:
++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++    #     return attn_output, attn_weights, past_key_value
++++
++++    # def forward(
++++    #     self,
++++    #     hidden_states: mindspore.Tensor,
++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++    #     past_key_value: Optional[Cache] = None,
++++    #     output_attentions: bool = False,
++++    #     use_cache: bool = False,
++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++    #     bsz, q_len, _ = hidden_states.shape
++++
++++    #     query_states = self.q_proj(hidden_states)
++++    #     key_states = self.k_proj(hidden_states)
++++    #     value_states = self.v_proj(hidden_states)
++++
++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++    #     kv_seq_len = key_states.shape[-2]
++++    #     if past_key_value is not None:
++++    #         if self.layer_idx is None:
++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++    #     if past_key_value is not None:
++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++    #         key_states, value_states = past_key_value.update(
++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++    #         )
++++
++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++
++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++    #     # <--- 修改结束 ---
++++
++++    #     fa_attention_mask = None
++++    #     if attention_mask is not None:
++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++    #         fa_attention_mask = (mask_slice != 0)
++++
++++    #     input_dtype = query_states.dtype
++++
++++    #     attn_output = mindspore.ops.flash_attention_score(
++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++    #         key=key_states,
++++    #         value=value_states,
++++    #         head_num=self.num_heads,
++++    #         attn_mask=fa_attention_mask,
++++    #         keep_prob=1.0 - self.attention_dropout,
++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++    #         input_layout="BNSD",
++++    #         sparse_mode=0,
++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++    #     )
++++
++++    #     attn_output = attn_output.to(input_dtype)
++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++    #     attn_output = self.o_proj(attn_output)
++++
++++    #     attn_weights = None
++++    #     if output_attentions:
++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++
++++    #     return attn_output, attn_weights, past_key_value
++++
+++ QWEN2MOE_ATTENTION_CLASSES = {
+++     "eager": Qwen2MoeAttention,
++++    "flash-attention": Qwen2MoeFlashAttention,
+++ }
+++ 
+++ 
+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++ 
++++    #@dwj
++++    # 只遍历激活的专家，而非全部专家
+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++-        # router_logits: (batch * sequence_length, n_experts)
+++-        router_logits = self.gate(hidden_states)
+++-
+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-        if self.norm_topk_prob:
+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        # we cast back to the input dtype
+++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++-
+++-        final_hidden_states = ops.zeros(
+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++-        )
+++-
+++-        # One hot encode the selected experts to create an expert mask
+++-        # this will be used to easily index which expert is going to be sollicitated
+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++-
+++-        # Loop over all available experts in the model and perform the computation on each expert
+++-        for expert_idx in range(self.num_experts):
+++-            expert_layer = self.experts[expert_idx]
+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++-
+++-            # Index the correct hidden states and compute the expert hidden state for
+++-            # the current expert. We need to make sure to multiply the output hidden
+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++-            if 0 not in idx.shape:
+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++-
+++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++-                # the `top_x` tensor here.
+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++-
+++-        shared_expert_output = self.shared_expert(hidden_states)
+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++-
+++-        final_hidden_states = final_hidden_states + shared_expert_output
++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++            num_tokens = hidden_states_reshaped.shape[0]
++++
++++            router_logits = self.gate(hidden_states_reshaped)
++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++
++++            if self.norm_topk_prob:
++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++            
++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++            flat_selected_experts = selected_experts.flatten()
++++            
++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++            token_indices = broadcasted_token_indices.flatten()
++++            
++++            active_experts = ops.unique(flat_selected_experts)
++++            
++++            for expert_idx_tensor in active_experts:
++++                expert_idx = expert_idx_tensor.item()
++++                expert_layer = self.experts[expert_idx]
++++                
++++                mask = (flat_selected_experts == expert_idx_tensor)
++++                selected_token_indices = token_indices[mask]
++++                selected_routing_weights = routing_weights.flatten()[mask]
++++                
++++                current_states = hidden_states_reshaped[selected_token_indices]
++++                
++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++                
++++                final_hidden_states = final_hidden_states.index_add(
++++                    dim=0,
++++                    index=selected_token_indices,
++++                    source=expert_output.to(hidden_states.dtype)
++++                )
++++            
++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++ 
+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++-        return final_hidden_states, router_logits
++++            final_hidden_states = final_hidden_states + shared_expert_output
++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++            
++++            return final_hidden_states, router_logits
+++ 
+++ 
+++ class Qwen2MoeDecoderLayer(nn.Module):
+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++ 
+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++ 
++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++
+++         if (layer_idx not in config.mlp_only_layers) and (
+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++         ):
+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++     _skip_keys_device_placement = "past_key_values"
+++     _supports_cache_class = True
++++#lwx
++++    # _supports_static_cache = True
+++ 
+++     def _init_weights(self, module):
+++         std = self.config.initializer_range
+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++         return causal_mask
+++ 
+++ 
+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++     _tied_weights_keys = ["lm_head.weight"]
+++ 
+++     def __init__(self, config):
+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++         self.num_experts_per_tok = config.num_experts_per_tok
+++         # Initialize weights and apply final processing
+++         self.post_init()
++++        # @lwx
++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++        #     self.generation_config.cache_implementation = "static"
++++        self._warmed_up = False
++++
++++    def warmup_moe_model(self):
++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++        test_texts = [
++++            "warmup short",
++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++        ]
++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++        if tokenizer is None:
++++            from mindnlp.transformers import AutoTokenizer
++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++            self._warmup_tokenizer = tokenizer
++++
++++        for text in test_texts:
++++            inputs = tokenizer(text, return_tensors="ms")
++++            with mindspore._no_grad():
++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++ 
+++     def get_input_embeddings(self):
+++         return self.model.embed_tokens
+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++         ```"""
++++        if not self._warmed_up:
++++            self._warmed_up = True
++++            self.warmup_moe_model()
+++ 
+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++         output_router_logits = (
+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++             }
+++         )
+++         return model_inputs
++++# @lwx
++++    # def _decode_one_tokens_logits(
++++    #     self,
++++    #     cur_token: mindspore.Tensor,
++++    #     input_pos: Optional[mindspore.Tensor],
++++    #     cache_position: mindspore.Tensor,
++++    #     past_key_values: StaticCache,
++++    # ) -> mindspore.Tensor:
++++    #     """
++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++        
++++    #     Args:
++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++    #         input_pos: 输入位置信息，可选
++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++            
++++    #     Returns:
++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++    #     """
++++    #     # 调用JIT编译的版本
++++    #     return self.get_decode_one_tokens_logits(
++++    #         cur_token=cur_token,
++++    #         input_pos=input_pos,
++++    #         cache_position=cache_position,
++++    #         past_key_values=past_key_values,
++++    #     )
++++    
++++    # @mindspore.jit(jit_level='O1')
++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++    #     """
++++    #     JIT编译的函数，用于高效的单token解码
++++    #     使用JIT编译优化以支持静态shape和高效执行
++++        
++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++    #     """
++++    #     outputs = self.model.forward(
++++    #         input_ids=cur_token,
++++    #         position_ids=input_pos,
++++    #         cache_position=cache_position,
++++    #         past_key_values=past_key_values,
++++    #         use_cache=True,
++++    #         return_dict=False,
++++    #     )
++++        
++++    #     hidden_states = outputs[0]
++++    #     logits = self.lm_head.forward(hidden_states)
++++    #     logits = logits.float()
++++        
++++    #     return logits[:, -1, :]
++++
++++    # def _sample(
++++    #     self,
++++    #     input_ids: mindspore.Tensor,
++++    #     logits_processor,
++++    #     stopping_criteria,
++++    #     generation_config,
++++    #     synced_devices: bool,
++++    #     streamer=None,
++++    #     logits_warper=None,
++++    #     **model_kwargs,
++++    # ):
++++    #     """
++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++    #     """
++++    #     from ...generation.logits_process import LogitsProcessorList
++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++    #     from mindnlp.core import nn, ops, no_grad
++++    #     import numpy as np
++++        
++++    #     # 检查是否使用 StaticCache
++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++    #     # 否则，直接调用父类方法
++++    #     past_key_values = model_kwargs.get("past_key_values")
++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++
++++    #     if not isinstance(past_key_values, StaticCache):
++++    #         # 不使用 StaticCache，直接调用父类方法
++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++    #         return super()._sample(
++++    #             input_ids=input_ids,
++++    #             logits_processor=logits_processor,
++++    #             stopping_criteria=stopping_criteria,
++++    #             generation_config=generation_config,
++++    #             synced_devices=synced_devices,
++++    #             streamer=streamer,
++++    #             logits_warper=logits_warper,
++++    #             **model_kwargs,
++++    #         )
++++        
++++    #     # 使用 StaticCache，进入自定义循环
++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++    #     pad_token_id = generation_config._pad_token_tensor
++++    #     output_attentions = generation_config.output_attentions
++++    #     output_hidden_states = generation_config.output_hidden_states
++++    #     output_scores = generation_config.output_scores
++++    #     output_logits = generation_config.output_logits
++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++    #     max_length = generation_config.max_length
++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++    #     do_sample = generation_config.do_sample
++++        
++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++    #         raise ValueError(
++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++    #             f"{logits_warper})."
++++    #         )
++++        
++++    #     # init attention / hidden states / scores tuples
++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++        
++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++    #         encoder_hidden_states = (
++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++    #         )
++++        
++++    #     # keep track of which sequences are already finished
++++    #     batch_size, cur_len = input_ids.shape
++++    #     this_peer_finished = False
++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++        
++++    #     time_record = []
++++    #     from ....utils.testing_utils import parse_flag_from_env
++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++        
++++    #     while self._has_unfinished_sequences(
++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++    #     ):
++++    #         if _record_time:
++++    #             import time as time_module
++++    #             infer_start = time_module.time()
++++            
++++    #         # prepare model inputs
++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++            
++++    #         # prepare variable output controls
++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++            
++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++    #         cur_cache_position = model_inputs.get("cache_position")
++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++    #         cur_input_ids = model_inputs.get("input_ids")
++++            
++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++    #             cur_cache_position is not None and 
++++    #             len(cur_cache_position.shape) > 0 and
++++    #             cur_cache_position.shape[0] == 1 and
++++    #             cur_input_ids is not None and
++++    #             cur_input_ids.shape[1] == 1):
++++    #             # 使用 JIT 优化的单 token 解码
++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++    #             if not hasattr(self, '_jit_used'):
++++    #                 self._jit_used = False
++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++                
++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++    #                 cur_token=cur_input_ids,
++++    #                 input_pos=model_inputs.get("position_ids"),
++++    #                 cache_position=cur_cache_position,
++++    #                 past_key_values=cur_past_key_values,
++++    #             )
++++                
++++    #             # 标记已使用JIT（用于后续判断）
++++    #             if not self._jit_used:
++++    #                 self._jit_used = True
++++                
++++    #             # 构造兼容的输出对象
++++    #             class JitOptimizedOutput:
++++    #                 def __init__(self, logits, config):
++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++    #                     self.config = config
++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++    #                     self.cross_attentions = None
++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++                
++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++    #         else:
++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++    #             outputs = self(**model_inputs, return_dict=True)
++++            
++++    #         if synced_devices and this_peer_finished:
++++    #             continue
++++            
++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++    #         next_token_logits = outputs.logits[:, -1, :]
++++            
++++    #         # pre-process distribution
++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++    #         if do_sample:
++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++            
++++    #         # Store scores, attentions and hidden_states when required
++++    #         if return_dict_in_generate:
++++    #             if output_scores:
++++    #                 scores += (next_token_scores,)
++++    #             if output_logits:
++++    #                 raw_logits += (next_token_logits,)
++++    #             if output_attentions:
++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++    #                 if self.config.is_encoder_decoder:
++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++                
++++    #             if output_hidden_states:
++++    #                 hidden = (
++++    #                     outputs.decoder_hidden_states
++++    #                     if self.config.is_encoder_decoder
++++    #                     else outputs.hidden_states
++++    #                 )
++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++            
++++    #         # token selection
++++    #         if do_sample:
++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++    #         else:
++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++            
++++    #         # finished sentences should have their next token be a padding token
++++    #         if has_eos_stopping_criteria:
++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++            
++++    #         # update generated ids, model inputs, and length for next step
++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++    #         if streamer is not None:
++++    #             streamer.put(next_tokens)
++++            
++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++    #             outputs,
++++    #             model_kwargs,
++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++    #         )
++++            
++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++    #         cur_len += 1
++++            
++++    #         if _record_time:
++++    #             import time as time_module
++++    #             infer_stop = time_module.time()
++++    #             time_record.append(infer_stop - infer_start)
++++            
++++    #         del outputs
++++        
++++    #     average_infer_time = None
++++    #     if time_record:
++++    #         if len(time_record) > 1:
++++    #             time_record.pop(0)
++++    #         average_infer_time = sum(time_record) / len(time_record)
++++    #         print(f'average inference time is: {average_infer_time}')
++++    #         print(f'inference time record: {time_record}')
++++        
++++    #     if streamer is not None:
++++    #         streamer.end()
++++        
++++    #     # 简单判断：打印是否使用了JIT路径
++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++    #     else:
++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++        
++++    #     if return_dict_in_generate:
++++    #         if self.config.is_encoder_decoder:
++++    #             return GenerateEncoderDecoderOutput(
++++    #                 sequences=input_ids,
++++    #                 scores=scores,
++++    #                 logits=raw_logits,
++++    #                 encoder_attentions=encoder_attentions,
++++    #                 encoder_hidden_states=encoder_hidden_states,
++++    #                 decoder_attentions=decoder_attentions,
++++    #                 cross_attentions=cross_attentions,
++++    #                 decoder_hidden_states=decoder_hidden_states,
++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++    #                 average_infer_time=average_infer_time
++++    #             )
++++    #         else:
++++    #             return GenerateDecoderOnlyOutput(
++++    #                 sequences=input_ids,
++++    #                 scores=scores,
++++    #                 logits=raw_logits,
++++    #                 attentions=decoder_attentions,
++++    #                 hidden_states=decoder_hidden_states,
++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++    #                 average_infer_time=average_infer_time
++++    #             )
++++    #     else:
++++    #         return input_ids
++++            
++++    # def _prepare_cache_for_generation(
++++    #     self,
++++    #     generation_config,
++++    #     model_kwargs,
++++    #     assistant_model,
++++    #     batch_size,
++++    #     max_cache_length,
++++    # ):
++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++    #         generation_config.cache_implementation = "static"
++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++        
++++    #     if generation_config.cache_implementation == "static":
++++    #         base_required_from_max_length = generation_config.max_length + 1
++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++    #         min_cache_size = 50
++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++    #         else:
++++    #             max_cache_length = max(base_required, min_cache_size)
++++            
++++    #         original_max_cache_length = max_cache_length
++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++            
++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++    #             if max_cache_length > self.config.max_position_embeddings:
++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++        
++++    #     result = super()._prepare_cache_for_generation(
++++    #         generation_config=generation_config,
++++    #         model_kwargs=model_kwargs,
++++    #         assistant_model=assistant_model,
++++    #         batch_size=batch_size,
++++    #         max_cache_length=max_cache_length,
++++    #     )
++++        
++++    #     if generation_config.cache_implementation == "static":
++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++    #         created_cache = model_kwargs.get(cache_name)
++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++        
++++    #     return result
++++
++++
++++
+++ 
+++ 
+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+new file mode 100644
+index 00000000..966529e4
+--- /dev/null
++++ b/patches/0003-20261106secondcommit.patch
+@@ -0,0 +1,2769 @@
++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Thu, 6 Nov 2025 14:54:37 +0800
++Subject: [PATCH 3/3] 20261106secondcommit
++
++---
++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
++ patches/0001-20251104commit.patch             | 1272 -----------------
++ 3 files changed, 528 insertions(+), 2032 deletions(-)
++ delete mode 100644 patches/0001-20251104commit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index 73773c22..2f9192bf 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
++ 
++ _CONFIG_FOR_DOC = "DeepseekConfig"
++ 
+++_attn_mask_cache = {}
+++
+++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+++    q_len = batch_and_seq[1]
+++    kv_len = batch_and_seq[1] + past_key_values_length 
+++    key = (batch_and_seq[0], q_len, kv_len)
+++
+++    if key in _attn_mask_cache:
+++        return _attn_mask_cache[key]
+++
+++    mask = _prepare_4d_causal_attention_mask(
+++        attention_mask,
+++        batch_and_seq,
+++        inputs_embeds,
+++        past_key_values_length,
+++    )
+++    _attn_mask_cache[key] = mask
+++    return mask
++ 
++ def _get_unpad_data(attention_mask):
++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
++         return final_output
++ 
++ 
++-    @no_grad()
++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++-        expert_cache = ops.zeros_like(x)
++-        idxs = flat_expert_indices.argsort()
++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-        token_idxs = idxs // self.num_experts_per_tok
++-
++-        for i, end_idx in enumerate(tokens_per_expert):
++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-            if start_idx == end_idx:
++-                continue
++-            expert = self.experts[i]
++-            exp_token_idx = token_idxs[start_idx:end_idx]
++-            expert_tokens = x[exp_token_idx]
++-            expert_out = expert(expert_tokens)
++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++-
++-        return expert_cache
++-        
++     # @no_grad()
++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-    #     # expert_cache = torch.zeros_like(x)
++-    #     # idxs = flat_expert_indices.argsort()
++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++-    #     # token_idxs = idxs // self.num_experts_per_tok
++-    #     # for i, end_idx in enumerate(tokens_per_expert):
++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++-    #     #     if start_idx == end_idx:
++-    #     #         continue
++-    #     #     expert = self.experts[i]
++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++-    #     #     expert_tokens = x[exp_token_idx]
++-    #     #     expert_out = expert(expert_tokens)
++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++-    #     # return expert_cache
+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++     #     expert_cache = ops.zeros_like(x)
++     #     idxs = flat_expert_indices.argsort()
++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++ 
++     #     return expert_cache
++-    # @no_grad()
++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-    #     expert_cache = ops.zeros_like(x)
+++        
+++    @no_grad()
+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++        """
+++        优化版 MoE prefill：
+++        - 批量张量化处理同一个 expert 的所有 token
+++        - 跳过无 token 的专家
+++        - 保持结果完全一致
+++        """
+++        # 初始化输出缓存
+++        expert_cache = ops.zeros_like(x)
++ 
++-    #     # 排序保证顺序一致
++-    #     idxs = flat_expert_indices.argsort()
++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-    #     token_idxs = idxs // self.num_experts_per_tok
+++        # 排序（确保 scatter_add 位置对应原逻辑）
+++        idxs = flat_expert_indices.argsort()
+++        sorted_expert_indices = flat_expert_indices[idxs]
+++        sorted_token_indices = idxs // self.num_experts_per_tok
++ 
++-    #     # 找出有 token 的专家
++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++        # 每个 expert 的 token 数
+++        tokens_per_expert = sorted_expert_indices.bincount()
++ 
++-    #     for i in active_experts.tolist():
++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-    #         end_idx = tokens_per_expert[i]
++-    #         if start_idx == end_idx:  # 没有 token
++-    #             continue
+++        # 找出有 token 的专家
+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++ 
++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
++-    #         expert_tokens = x[exp_token_idx]
++-    #         expert_out = self.experts[i](expert_tokens)
++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++        for expert_id in active_experts.tolist():
+++            # 取该 expert 对应的排序后 token 区间
+++            start = (tokens_per_expert[:expert_id]).sum().item()
+++            end = start + tokens_per_expert[expert_id].item()
++ 
++-    #         expert_cache = mindspore.mint.scatter_add(
++-    #             expert_cache,
++-    #             0,
++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++-    #             expert_out
++-    #         )
+++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+++            expert_tokens = x[token_idx]                     # 取输入向量
++ 
++-    #     return expert_cache
+++            # 执行专家 MLP
+++            expert_out = self.experts[expert_id](expert_tokens)
+++
+++            # 按权重缩放
+++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+++
+++            # 回写到缓存（等价 scatter_add）
+++            expert_cache = mindspore.mint.scatter_add(
+++                expert_cache,
+++                0,
+++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++                scaled_out
+++            )
+++
+++        return expert_cache
+++
+++        # @no_grad()
+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++        #     # expert_cache = torch.zeros_like(x)
+++        #     # idxs = flat_expert_indices.argsort()
+++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++        #     # token_idxs = idxs // self.num_experts_per_tok
+++        #     # for i, end_idx in enumerate(tokens_per_expert):
+++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++        #     #     if start_idx == end_idx:
+++        #     #         continue
+++        #     #     expert = self.experts[i]
+++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++        #     #     expert_tokens = x[exp_token_idx]
+++        #     #     expert_out = expert(expert_tokens)
+++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++        #     # return expert_cache
+++        #     expert_cache = ops.zeros_like(x)
+++        #     idxs = flat_expert_indices.argsort()
+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++        #     token_idxs = idxs // self.num_experts_per_tok
+++
+++        #     for i, end_idx in enumerate(tokens_per_expert):
+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++        #         if start_idx == end_idx:
+++        #             continue
+++        #         expert = self.experts[i]
+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++        #         expert_tokens = x[exp_token_idx]
+++        #         expert_out = expert(expert_tokens)
+++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++
+++        #     return expert_cache
+++        # @no_grad()
+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++        #     expert_cache = ops.zeros_like(x)
+++
+++        #     # 排序保证顺序一致
+++        #     idxs = flat_expert_indices.argsort()
+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++        #     token_idxs = idxs // self.num_experts_per_tok
+++
+++        #     # 找出有 token 的专家
+++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++
+++        #     for i in active_experts.tolist():
+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++        #         end_idx = tokens_per_expert[i]
+++        #         if start_idx == end_idx:  # 没有 token
+++        #             continue
+++
+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++        #         expert_tokens = x[exp_token_idx]
+++        #         expert_out = self.experts[i](expert_tokens)
+++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++
+++        #         expert_cache = mindspore.mint.scatter_add(
+++        #             expert_cache,
+++        #             0,
+++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++        #             expert_out
+++        #         )
+++
+++        #     return expert_cache
++ 
++ 
++ 
++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
++ 
++         return attn_output, attn_weights, past_key_value
++ 
++-
++ # class DeepseekFlashAttention(nn.Module):
++ #     """
++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
++ 
++         return attn_output, attn_weights, past_key_value
++ 
+++
++ Deepseek_ATTENTION_CLASSES = {
++     "eager": DeepseekAttention,
++     "flash-attention": DeepseekFlashAttention,
++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
++             )
++         else:
++             # 4d mask is passed through the layers
++-            attention_mask = _prepare_4d_causal_attention_mask(
+++            # attention_mask = _prepare_4d_causal_attention_mask(
+++            #     attention_mask,
+++            #     (batch_size, seq_length),
+++            #     inputs_embeds,
+++            #     past_key_values_length,
+++            # )
+++            #@dwj
+++            attention_mask = get_cached_causal_mask(
++                 attention_mask,
++                 (batch_size, seq_length),
++                 inputs_embeds,
++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++         # Initialize weights and apply final processing
++         self.post_init()
++         self.warm_up = False
+++        #@dwj
+++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+++            self.num_layers,
+++            self.num_attention_heads,
+++            self.head_dim,
+++            batch_size=1,
+++            max_length=self.max_length,
+++            dtype=mindspore.float16
+++        )
+++
+++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+++        key_cache = []
+++        value_cache = []
+++        for _ in range(num_layers):
+++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++            key_cache.append(k)
+++            value_cache.append(v)
+++        return key_cache, value_cache
+++
++ 
++     def warmup_moe_model_deep(self):
++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index bced285c..ebd7782e 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++ 
++-Long_Prompt = False
++-PROMPT_LENGTH_THRESHOLD = 128
+++Long_Prompt = 1
+++LONG_PROMPT_LENGTH_THRESHOLD = 128
+++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+++
+++_causal_mask_cache = {}
+++
+++def get_cached_causal_mask_with_cache_position(
+++    attention_mask: mindspore.Tensor,
+++    sequence_length: int,
+++    target_length: int,
+++    dtype: mindspore.dtype,
+++    min_dtype: float,
+++    cache_position: mindspore.Tensor,
+++    batch_size: int,
+++):
+++    """
+++    带缓存的 causal mask 构造函数
+++    """
+++    # q_len 是当前 query 长度
+++    q_len = sequence_length
+++    # kv_len 是 target_length
+++    kv_len = target_length
+++
+++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+++
+++    if key in _causal_mask_cache:
+++        return _causal_mask_cache[key]
+++
+++    # 调用原来的 mask 构造逻辑
+++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++        attention_mask,
+++        sequence_length=sequence_length,
+++        target_length=target_length,
+++        dtype=dtype,
+++        min_dtype=min_dtype,
+++        cache_position=cache_position,
+++        batch_size=batch_size,
+++    )
+++    # 缓存结果
+++    _causal_mask_cache[key] = causal_mask
+++    return causal_mask
++ 
++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++ def _prepare_4d_causal_attention_mask_with_cache_position(
++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++ 
++ 
++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+++# class Qwen2MoeAttention(nn.Module):
+++#     """
+++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++#     and "Generating Long Sequences with Sparse Transformers".
+++#     """
+++
+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++#         super().__init__()
+++#         self.config = config
+++#         self.layer_idx = layer_idx
+++#         if layer_idx is None:
+++#             logger.warning_once(
+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++#                 "when creating this class."
+++#             )
+++
+++#         self.hidden_size = config.hidden_size
+++#         self.num_heads = config.num_attention_heads
+++#         self.head_dim = self.hidden_size // self.num_heads
+++#         self.num_key_value_heads = config.num_key_value_heads
+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++#         self.max_position_embeddings = config.max_position_embeddings
+++#         self.rope_theta = config.rope_theta
+++#         self.is_causal = True
+++#         self.attention_dropout = config.attention_dropout
+++
+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++#             raise ValueError(
+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++#                 f" and `num_heads`: {self.num_heads})."
+++#             )
+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++
+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++#             self.head_dim,
+++#             max_position_embeddings=self.max_position_embeddings,
+++#             base=self.rope_theta,
+++#         )
+++
+++#     def forward(
+++#         self,
+++#         hidden_states: mindspore.Tensor,
+++#         attention_mask: Optional[mindspore.Tensor] = None,
+++#         position_ids: Optional[mindspore.Tensor] = None,
+++#         past_key_value: Optional[Cache] = None,
+++#         output_attentions: bool = False,
+++#         use_cache: bool = False,
+++#         cache_position: Optional[mindspore.Tensor] = None,
+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++
+++        
+++
+++#         bsz, q_len, _ = hidden_states.shape
+++
+++#         query_states = self.q_proj(hidden_states)
+++#         key_states = self.k_proj(hidden_states)
+++#         value_states = self.v_proj(hidden_states)
+++
+++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++
+++#         kv_seq_len = key_states.shape[-2]
+++#         if past_key_value is not None:
+++#             if self.layer_idx is None:
+++#                 raise ValueError(
+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++#                     "with a layer index."
+++#                 )
+++#             if isinstance(past_key_value, StaticCache):
+++#                 kv_seq_len = key_states.shape[-2]
+++#             else:
+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++
+++#         if past_key_value is not None:
+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++            
+++#             if isinstance(past_key_value, StaticCache):
+++#                 kv_seq_len = key_states.shape[-2]
+++
+++#         # repeat k/v heads if n_kv_heads < n_heads
+++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++        
+++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++
+++#         if attention_mask is not None:
+++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++#             attn_weights = attn_weights + causal_mask
+++
+++#         # upcast attention to fp32
+++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++#         attn_output = ops.matmul(attn_weights, value_states)
+++
+++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++#             raise ValueError(
+++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++#                 f" {attn_output.shape}"
+++#             )
+++
+++#         attn_output = ops.transpose(attn_output, 1, 2)
+++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++
+++#         attn_output = self.o_proj(attn_output)
+++#         # @lwx
+++        
+++#         # max_seq_len = self.max_position_embeddings  # 2048
+++
+++#         # if attention_mask is not None:
+++#         #     # attention_mask: [B, 1, Sq, Sk]
+++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++
+++#         #     # pad 到 [max_seq_len, max_seq_len]
+++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++#         #     global_attention_mask = padded_mask
+++#         # else:
+++#         #     global_attention_mask = None
+++
+++
+++#         # sparse_mode=3
+++#         # attn_output = mindspore.ops.flash_attention_score(
+++#         #     query=query_states,
+++#         #     key=key_states,
+++#         #     value=value_states,
+++#         #     real_shift=None,
+++#         #     padding_mask=None,
+++
+++#         #     head_num=self.num_heads,
+++#         #     attn_mask=global_attention_mask,
+++#         #     keep_prob=1.0 - self.attention_dropout,
+++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++#         #     input_layout="BNSD",
+++#         #     pre_tokens=2147483647,
+++#         #     next_tokens=2147483647,
+++#         #     inner_precise=0,
+++#         #     drop_mask=None,
+++#         #     prefix=None,
+++#         #     actual_seq_qlen=None,
+++#         #     actual_seq_kvlen=None,
+++#         #     sparse_mode=sparse_mode,
+++#         # )
+++#         if not output_attentions:
+++#             attn_weights = None
+++
+++#         return attn_output, attn_weights, past_key_value
+++
++ class Qwen2MoeAttention(nn.Module):
++     """
++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++-    and "Generating Long Sequences with Sparse Transformers".
++-    """
+++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
++ 
+++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+++
+++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+++    """
++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++         super().__init__()
++         self.config = config
++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
++         if layer_idx is None:
++             logger.warning_once(
++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++                 "when creating this class."
++             )
++ 
++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
++         use_cache: bool = False,
++         cache_position: Optional[mindspore.Tensor] = None,
++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-
++         
++-
+++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
++         bsz, q_len, _ = hidden_states.shape
++ 
++         query_states = self.q_proj(hidden_states)
++         key_states = self.k_proj(hidden_states)
++         value_states = self.v_proj(hidden_states)
++ 
++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++-
+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++        
++         kv_seq_len = key_states.shape[-2]
++         if past_key_value is not None:
++-            if self.layer_idx is None:
++-                raise ValueError(
++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-                    "with a layer index."
++-                )
++-            if isinstance(past_key_value, StaticCache):
++-                kv_seq_len = key_states.shape[-2]
++-            else:
++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++        
++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++ 
++         if past_key_value is not None:
++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++
+++        # --- 2. 动态调度核心注意力计算 ---
+++        global Long_Prompt
+++        if Long_Prompt >= 1:
+++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+++            fa_attention_mask = None
+++            if attention_mask is not None:
+++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++                fa_attention_mask = (mask_slice != 0)
+++
+++            attn_output = mindspore.ops.flash_attention_score(
+++                query=query_states,
+++                key=key_states,
+++                value=value_states,
+++                head_num=self.num_heads,
+++                attn_mask=fa_attention_mask,
+++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+++                scalar_value=1.0 / math.sqrt(self.head_dim),
+++                input_layout="BNSD",
+++                sparse_mode=0,
+++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+++            )
++             
++-            if isinstance(past_key_value, StaticCache):
++-                kv_seq_len = key_states.shape[-2]
+++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++            attn_output = self.o_proj(attn_output)
+++            attn_weights = None
+++            if output_attentions:
+++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++ 
++-        # repeat k/v heads if n_kv_heads < n_heads
++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
++-        
++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++        else:
+++            # --- Eager Attention 路径 (用于短序列和解码) ---
+++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+++            
+++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++ 
++-        if attention_mask is not None:
++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++-            attn_weights = attn_weights + causal_mask
+++            if attention_mask is not None:
+++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++                attn_weights = attn_weights + causal_mask
++ 
++-        # upcast attention to fp32
++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++-        attn_output = ops.matmul(attn_weights, value_states)
+++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++            attn_output = ops.matmul(attn_weights, value_states)
++ 
++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++-            raise ValueError(
++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++-                f" {attn_output.shape}"
++-            )
+++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++                raise ValueError(
+++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+++                )
++ 
++-        attn_output = ops.transpose(attn_output, 1, 2)
++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++            attn_output = ops.transpose(attn_output, 1, 2)
+++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++            attn_output = self.o_proj(attn_output)
++ 
++-        attn_output = self.o_proj(attn_output)
++-        # @lwx
+++            if not output_attentions:
+++                attn_weights = None
++         
++-        # max_seq_len = self.max_position_embeddings  # 2048
++-
++-        # if attention_mask is not None:
++-        #     # attention_mask: [B, 1, Sq, Sk]
++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++-
++-        #     # pad 到 [max_seq_len, max_seq_len]
++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++-        #     global_attention_mask = padded_mask
++-        # else:
++-        #     global_attention_mask = None
++-
++-
++-        # sparse_mode=3
++-        # attn_output = mindspore.ops.flash_attention_score(
++-        #     query=query_states,
++-        #     key=key_states,
++-        #     value=value_states,
++-        #     real_shift=None,
++-        #     padding_mask=None,
++-
++-        #     head_num=self.num_heads,
++-        #     attn_mask=global_attention_mask,
++-        #     keep_prob=1.0 - self.attention_dropout,
++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++-        #     input_layout="BNSD",
++-        #     pre_tokens=2147483647,
++-        #     next_tokens=2147483647,
++-        #     inner_precise=0,
++-        #     drop_mask=None,
++-        #     prefix=None,
++-        #     actual_seq_qlen=None,
++-        #     actual_seq_kvlen=None,
++-        #     sparse_mode=sparse_mode,
++-        # )
++-        if not output_attentions:
++-            attn_weights = None
++-
++         return attn_output, attn_weights, past_key_value
++ 
++-
++ # class Qwen2MoeFlashAttention(nn.Module):
++ #     """
++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
++ #             return final_hidden_states, router_logits
++ 
++ 
++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++-#     """
++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
++-#     """
++-#     def __init__(self, config: Qwen2MoeConfig):
++-#         super().__init__()
++-#         self.num_experts = config.num_experts
++-#         self.top_k = config.num_experts_per_tok
++-#         self.norm_topk_prob = config.norm_topk_prob
++-
++-#         # 门控网络
++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++-#         # 专家列表
++-#         self.experts = nn.ModuleList(
++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++-#         )
++-#         # 共享专家
++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++-
++-#     @no_grad()
++-#     def _moe_infer_decode(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         """
++-#         【解码路径】针对 sequence_length=1 的极致优化。
++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++-#         """
++-#         batch_size, hidden_dim = hidden_states.shape
++-        
++-#         expert_outputs_list = [
++-#             ops.cat([
++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++-#             ], dim=0) 
++-#             for i in range(batch_size)
++-#         ]
++-        
++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++-#         # shape: (batch_size, top_k, hidden_dim)
++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++-        
++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++-        
++-#         return moe_output.squeeze(1)
++-
++-#     @no_grad()
++-#     def _moe_infer_prefill(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         """
++-#         【预填充路径】针对 sequence_length > 1 的优化。
++-#         按专家对 Token 进行分组，并进行批处理。
++-#         """
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens = hidden_states.shape[0]
++-#         flat_selected_experts = selected_experts.flatten()
++-        
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-        
++-#         active_experts = ops.unique(flat_selected_experts)
++-        
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-            
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-#             selected_token_indices = token_indices[mask]
++-#             selected_routing_weights = routing_weights.flatten()[mask]
++-            
++-#             current_states = hidden_states[selected_token_indices]
++-            
++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-            
++-#             moe_output = moe_output.index_add(
++-#                 dim=0,
++-#                 index=selected_token_indices,
++-#                 source=expert_output.to(hidden_states.dtype)
++-#             )
++-#         return moe_output
++-
++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-#         """
++-#         顶层 forward 方法，作为智能分发器。
++-#         """
++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        
++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-#         router_logits = self.gate(hidden_states_reshaped)
++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-
++-#         if self.norm_topk_prob:
++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        
++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++-        
++-#         moe_output = None
++-#         # 在推理时，根据序列长度选择最优路径
++-#         if not self.training:
++-#             if sequence_length == 1:
++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++-#             else:
++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++-#         else:
++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++-#             raise NotImplementedError("Training path is not implemented.")
++-
++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++-        
++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++-        
++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++-        
++-#         return final_hidden_states, router_logits
++-
++-
++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++-#     """
++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++-#     """
++-#     def __init__(self, config: Qwen2MoeConfig):
++-#         super().__init__()
++-#         self.num_experts = config.num_experts
++-#         self.top_k = config.num_experts_per_tok
++-#         self.norm_topk_prob = config.norm_topk_prob
++-
++-#         # 门控网络
++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++-#         # 专家列表
++-#         self.experts = nn.ModuleList(
++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++-#         )
++-#         # 共享专家
++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++-
++-#     @no_grad()
++-#     def _moe_infer_decode(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         batch_size, _ = hidden_states.shape
++-#         expert_outputs_list = [
++-#             ops.cat([
++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++-#             ], dim=0) 
++-#             for i in range(batch_size)
++-#         ]
++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++-#         return moe_output.squeeze(1)
++-
++-#     @no_grad()
++-#     def _moe_infer_prefill(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens = hidden_states.shape[0]
++-#         flat_selected_experts = selected_experts.flatten()
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-#         active_experts = ops.unique(flat_selected_experts)
++-        
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-#             selected_token_indices = token_indices[mask]
++-#             selected_routing_weights = routing_weights.flatten()[mask]
++-#             current_states = hidden_states[selected_token_indices]
++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-#             moe_output = moe_output.index_add(
++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++-#             )
++-#         return moe_output
++-
++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-#         """
++-#         顶层 forward 方法，作为智能分发器。
++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++-#         """
++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        
++-#         # 1. 门控计算 (通用逻辑)
++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-#         router_logits = self.gate(hidden_states_reshaped)
++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-
++-#         if self.norm_topk_prob:
++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        
++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++-        
++-#         # 2. 智能分发到最优 MoE 路径
++-#         moe_output = None
++-#         if not self.training:
++-#             if sequence_length == 1:
++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++-#             else:
++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++-#         else:
++-#             raise NotImplementedError("Training path is not implemented.")
++-
++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++-        
++-#         # 4. 合并 MoE 输出和共享专家输出
++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++-        
++-#         # 5. 恢复原始形状并返回
++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++-        
++-#         return final_hidden_states, router_logits
++-
++-# prefill fastest
++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++-#     """
++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++-#     """
++-#     def __init__(self, config: Qwen2MoeConfig):
++-#         super().__init__()
++-#         self.num_experts = config.num_experts
++-#         self.top_k = config.num_experts_per_tok
++-#         self.norm_topk_prob = config.norm_topk_prob
++-
++-#         # 门控网络
++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++-#         # 专家列表
++-#         self.experts = nn.ModuleList(
++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++-#         )
++-#         # 共享专家
++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++-
++-#     @no_grad()
++-#     def _moe_infer_dispatch(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         """
++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++-#         """
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens, _ = hidden_states.shape
++-        
++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++-#         flat_selected_experts = selected_experts.flatten()
++-#         flat_routing_weights = routing_weights.flatten()
++-
++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-
++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++-#         active_experts = ops.unique(flat_selected_experts)
++-        
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-            
++-#             # 找到所有分配给该专家的 token
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-            
++-#             # 使用 mask 选取对应的 token 和权重
++-#             current_token_indices = token_indices[mask]
++-#             current_routing_weights = flat_routing_weights[mask]
++-#             current_hidden_states = hidden_states[current_token_indices]
++-            
++-#             # 对这些 token 进行批处理
++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++-            
++-#             # 使用 index_add 将结果精确地加回到对应位置
++-#             moe_output = moe_output.index_add(
++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++-#             )
++-#         return moe_output
++-
++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-#         """
++-#         顶层 forward 方法，作为智能分发器。
++-#         """
++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        
++-#         # 1. 门控计算
++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-#         router_logits = self.gate(hidden_states_reshaped)
++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-
++-#         if self.norm_topk_prob:
++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        
++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++-        
++-#         # 2. 调用统一的 MoE 计算内核
++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++-
++-#         # 3. 统一处理共享专家
++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++-        
++-#         # 4. 合并输出
++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++-        
++-#         # 5. 恢复原始形状并返回
++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++-        
++-#         return final_hidden_states, router_logits
++-
++-
++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++-#     """
++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++-#     【最终高性能与高精度版】：
++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++-#     3. 这样实现了速度和准确性的两全其美。
++-#     """
++-#     def __init__(self, config: Qwen2MoeConfig):
++-#         super().__init__()
++-#         self.num_experts = config.num_experts
++-#         self.top_k = config.num_experts_per_tok
++-#         self.norm_topk_prob = config.norm_topk_prob
++-
++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++-#         self.experts = nn.ModuleList(
++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++-#         )
++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++-
++-#     @no_grad()
++-#     def _moe_infer_decode(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         """
++-#         【解码路径】极致优化版：bmm + 高精度累加。
++-#         """
++-#         original_dtype = hidden_states.dtype
++-#         batch_size, _ = hidden_states.shape
++-        
++-#         expert_outputs_list = [
++-#             ops.cat([
++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++-#             ], dim=0) 
++-#             for i in range(batch_size)
++-#         ]
++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++-
++-#         # 在 float32 下执行 bmm，得到高精度结果
++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++-        
++-#         # 将高精度结果转换回原始数据类型
++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++-        
++-#         return moe_output
++-
++-#     @no_grad()
++-#     def _moe_infer_prefill(
++-#         self, 
++-#         hidden_states: mindspore.Tensor, 
++-#         selected_experts: mindspore.Tensor, 
++-#         routing_weights: mindspore.Tensor
++-#     ) -> mindspore.Tensor:
++-#         """
++-#         【预填充路径】与原始实现一致，结果精确。
++-#         """
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens, _ = hidden_states.shape
++-#         flat_selected_experts = selected_experts.flatten()
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-#         active_experts = ops.unique(flat_selected_experts)
++-        
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-#             selected_token_indices = token_indices[mask]
++-#             selected_routing_weights = routing_weights.flatten()[mask]
++-#             current_states = hidden_states[selected_token_indices]
++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-#             moe_output = moe_output.index_add(
++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++-#             )
++-#         return moe_output
++-
++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++-        
++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-#         router_logits = self.gate(hidden_states_reshaped)
++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-
++-#         if self.norm_topk_prob:
++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        
++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++-#         # 如果模型主体是 float16，后续再转换
++-        
++-#         moe_output = None
++-#         if not self.training:
++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++-#             # _moe_infer_decode 内部会处理好类型转换
++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++-#             if sequence_length == 1:
++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++-#             else:
++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++-#         else:
++-#             raise NotImplementedError("Training path is not implemented.")
++-
++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++-        
++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++-        
++-#         return final_hidden_states, router_logits
++-    
++-
++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++-#     """
++-#     【融合版】一个混合专家模块，内置两种推理策略，
++-#     由外部全局变量 `Long_Prompt` 控制：
++-
++-#     - if Long_Prompt is True:  【精度优先模式】
++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++-#       适用于处理长序列，避免误差累积。
++-
++-#     - if Long_Prompt is False: 【速度优先模式】
++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++-#       在解码阶段获得极致速度，同时保证结果高度准确。
++-#     """
++-#     def __init__(self, config: Qwen2MoeConfig):
++-#         super().__init__()
++-#         self.num_experts = config.num_experts
++-#         self.top_k = config.num_experts_per_tok
++-#         self.norm_topk_prob = config.norm_topk_prob
++-
++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++-#         self.experts = nn.ModuleList(
++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++-#         )
++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++-
++-#     # --- 速度优先模式的辅助函数 ---
++-#     @no_grad()
++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++-#         original_dtype = hidden_states.dtype
++-#         batch_size, _ = hidden_states.shape
++-#         expert_outputs_list = [
++-#             ops.cat([
++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++-#             ], dim=0) 
++-#             for i in range(batch_size)
++-#         ]
++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++-#         weights_fp32 = routing_weights.to(mindspore.float32)
++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
++-
++-#     @no_grad()
++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens, _ = hidden_states.shape
++-#         flat_selected_experts = selected_experts.flatten()
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-#         active_experts = ops.unique(flat_selected_experts)
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-#             selected_token_indices = token_indices[mask]
++-#             selected_routing_weights = routing_weights.flatten()[mask]
++-#             current_states = hidden_states[selected_token_indices]
++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++-#         return moe_output
++-
++-#     # --- 精度优先模式的辅助函数 ---
++-#     @no_grad()
++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++-#         moe_output = ops.zeros_like(hidden_states)
++-#         num_tokens, _ = hidden_states.shape
++-#         flat_selected_experts = selected_experts.flatten()
++-#         flat_routing_weights = routing_weights.flatten()
++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++-#         active_experts = ops.unique(flat_selected_experts)
++-#         for expert_idx_tensor in active_experts:
++-#             expert_idx = expert_idx_tensor.item()
++-#             expert_layer = self.experts[expert_idx]
++-#             mask = (flat_selected_experts == expert_idx_tensor)
++-#             current_token_indices = token_indices[mask]
++-#             current_routing_weights = flat_routing_weights[mask]
++-#             current_hidden_states = hidden_states[current_token_indices]
++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++-#         return moe_output
++-
++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++-#         # 声明我们将要使用一个在模块外部定义的全局变量
++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++-#         global Long_Prompt
++-
++-#         # 1. 门控计算 (所有模式通用)
++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-#         router_logits = self.gate(hidden_states_reshaped)
++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++-#         if self.norm_topk_prob:
++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-        
++-#         moe_output = None
++-#         if not self.training:
++-#             # 根据 Long_Prompt 标志选择模式
++-#             if Long_Prompt:
++-#                 # --- 精度优先模式 ---
++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-#             else:
++-#                 # --- 速度优先模式 ---
++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++-#                 if sequence_length == 1:
++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-#                 else:
++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-#         else:
++-#             raise NotImplementedError("Training path is not implemented.")
++-
++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++-        
++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++-        
++-#         return final_hidden_states, router_logits
++-    
++ class Qwen2MoeSparseMoeBlock(nn.Module):
++     """
++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++         return moe_output_fp32.squeeze(1).to(original_dtype)
++ 
+++    # @no_grad()
+++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++    #     num_tokens, _ = hidden_states.shape
+++    #     flat_selected_experts = selected_experts.flatten()
+++    #     sorted_expert_indices = flat_selected_experts.argsort()
+++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++    #     original_token_indices = sorted_expert_indices // self.top_k
+++    #     moe_output = ops.zeros_like(hidden_states)
+++    #     current_token_offset = 0
+++    #     for i in range(self.num_experts):
+++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+++    #         if expert_token_count == 0:
+++    #             continue
+++    #         end_offset = current_token_offset + expert_token_count
+++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++    #         current_token_offset += expert_token_count
+++    #     return moe_output
+++
++     @no_grad()
++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++-        num_tokens, _ = hidden_states.shape
++-        flat_selected_experts = selected_experts.flatten()
++-        sorted_expert_indices = flat_selected_experts.argsort()
++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++-        original_token_indices = sorted_expert_indices // self.top_k
+++        """
+++        优化版 MoE prefill (速度优先模式)：
+++        - 批量张量化处理同一个 expert 的所有 token
+++        - 跳过无 token 的专家
+++        - 保持结果完全一致
+++        """
++         moe_output = ops.zeros_like(hidden_states)
++-        current_token_offset = 0
++-        for i in range(self.num_experts):
++-            expert_token_count = tokens_per_expert[i] - current_token_offset
++-            if expert_token_count == 0:
++-                continue
++-            end_offset = current_token_offset + expert_token_count
++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++-            expert_hidden_states = hidden_states[expert_original_token_indices]
++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++-            current_token_offset += expert_token_count
+++
+++        flat_selected_experts = selected_experts.flatten()
+++        flat_routing_weights = routing_weights.flatten()
+++
+++        idxs = flat_selected_experts.argsort()
+++        sorted_expert_indices = flat_selected_experts[idxs]
+++        sorted_token_indices = idxs // self.top_k
+++
+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+++
+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++
+++        for expert_id in active_experts.tolist():
+++            start = int(tokens_per_expert[:expert_id].sum().item())
+++            end = start + int(tokens_per_expert[expert_id].item())
+++
+++            token_idx = sorted_token_indices[start:end]
+++            expert_tokens = hidden_states[token_idx]
+++
+++            expert_out = self.experts[expert_id](expert_tokens)
+++
+++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+++
+++            moe_output = mindspore.mint.scatter_add(
+++                moe_output,
+++                0,
+++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+++                scaled_out.to(hidden_states.dtype)
+++            )
+++
++         return moe_output
++ 
+++
++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++     @no_grad()
++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++         
++         moe_output = None
++-        if Long_Prompt:
++-            # --- 精度优先模式 (ACCURACY MODE) ---
++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        # if Long_Prompt==0:
+++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        # else:
+++        #     # --- 速度优先模式 (SPEED MODE) ---
+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++        #     if sequence_length == 1:
+++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        #     else:
+++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        
+++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++        if sequence_length == 1:
+++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++         else:
++-            # --- 速度优先模式 (SPEED MODE) ---
++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++-            if sequence_length == 1:
++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-            else:
++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-        
+++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++    
++ 
++         # 3. 共享专家计算与合并 (所有模式通用)
++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         
++         return final_hidden_states, router_logits
++ 
+++
++ class Qwen2MoeDecoderLayer(nn.Module):
++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++         super().__init__()
++         self.hidden_size = config.hidden_size
++         
++-        # if Long_Prompt:
++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++-        # else:
+++        # if Long_Prompt == 2:
++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++        # else:
+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++ 
++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++ 
++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++             )
++ 
++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++        #     attention_mask,
+++        #     sequence_length=sequence_length,
+++        #     target_length=target_length,
+++        #     dtype=dtype,
+++        #     min_dtype=min_dtype,
+++        #     cache_position=cache_position,
+++        #     batch_size=input_tensor.shape[0],
+++        # )
+++        #@dwj
+++        causal_mask = get_cached_causal_mask_with_cache_position(
++             attention_mask,
++             sequence_length=sequence_length,
++             target_length=target_length,
++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++         """
++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+++        _causal_mask_cache.clear()
++ 
++         input_ids = kwargs.get("input_ids")
++         if input_ids is None and args:
++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++ 
++         if input_ids is not None:
++             prompt_length = input_ids.shape[1]
++-            
++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++-                Long_Prompt = True
+++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+++                Long_Prompt = 2
+++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+++                Long_Prompt = 0
++             else:
++-                Long_Prompt = False
+++                Long_Prompt = 1
+++
++ 
++         return super().generate(*args, **kwargs)
++     
++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++             dtype = self.lm_head.weight.dtype
++             min_dtype = float(ops.finfo(dtype).min)
++ 
++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++            #     attention_mask,
+++            #     sequence_length=sequence_length,
+++            #     target_length=past_key_values.get_max_length(),
+++            #     dtype=dtype,
+++            #     min_dtype=min_dtype,
+++            #     cache_position=cache_position,
+++            #     batch_size=batch_size,
+++            # )
+++
+++            #@dwj
+++            attention_mask = get_cached_causal_mask_with_cache_position(
++                 attention_mask,
++                 sequence_length=sequence_length,
++                 target_length=past_key_values.get_max_length(),
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++deleted file mode 100644
++index 6dfb5b93..00000000
++--- a/patches/0001-20251104commit.patch
+++++ /dev/null
++@@ -1,1272 +0,0 @@
++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++-From: Pinoeer-kingxi <13022943007@163.com>
++-Date: Tue, 4 Nov 2025 09:11:51 +0800
++-Subject: [PATCH] 20251104commit
++-
++----
++- mindnlp/transformers/cache_utils.py           |  28 +-
++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++- 3 files changed, 976 insertions(+), 87 deletions(-)
++-
++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++-index cadd2e04..02f8d4be 100644
++---- a/mindnlp/transformers/cache_utils.py
++-+++ b/mindnlp/transformers/cache_utils.py
++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++-             # k_out[:, :, cache_position] = key_states
++-             # v_out[:, :, cache_position] = value_states
++--            if ON_ORANGE_PI:
++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++--            else:
++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++--
++-+            # if ON_ORANGE_PI:
++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++-+            # else:
++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++-+            # 确保 cache_position 是 1D tensor 并且类型正确
++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++-+            if cache_position.ndim > 1:
++-+                cache_position = cache_position.flatten()
++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++-+                cache_position = cache_position.int()
++-+            
++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++-+            k_out[:, :, cache_position] = key_states
++-+            v_out[:, :, cache_position] = value_states
++-+            
++-         return k_out, v_out
++- 
++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++-index c695b944..d8303e45 100644
++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++- # Copied from transformers.models.llama.modeling_llama.rotate_half
++- def rotate_half(x):
++-     """Rotates half the hidden dims of the input."""
++--    x1 = x[..., : x.shape[-1] // 2]
++--    x2 = x[..., x.shape[-1] // 2 :]
++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++-+    # x1 = x[..., : x.shape[-1] // 2]
++-+    # x2 = x[..., x.shape[-1] // 2 :]
++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++-     return ops.cat((-x2, x1), dim=-1)
++- 
++- 
++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++-         if self.training:
++-             raise NotImplementedError("Training is not supported yet.")
++-         else:
++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++--        if self.config.n_shared_experts is not None:
++--            y = y + self.shared_experts(identity)
++--        return y
++-+            # @lwx
++-+            if orig_shape[1] == 1:
++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++-+                y=y.view(*orig_shape)
++-+                if self.config.n_shared_experts is not None:
++-+                    y = y + self.shared_experts(identity)
++-+                return y
++-+            else:
++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++-+                if self.config.n_shared_experts is not None:
++-+                    y = y + self.shared_experts(identity)
++-+                return y
++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++-+        # if self.config.n_shared_experts is not None:
++-+        #     y = y + self.shared_experts(identity)
++-+        # return y
++-+        
++-+    @no_grad()
++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++-+
++-+        expert_cache = ops.zeros_like(x)
++-+        for i in range(self.num_experts_per_tok):
++-+            expert_id = flat_expert_indices[i].item()
++-+            weight = flat_expert_weights[i].item()
++-+            expert = self.experts[expert_id]
++-+            expert_out = expert(x)
++-+            expert_cache += expert_out * weight
++-+        return expert_cache
++- 
++-     @no_grad()
++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++--        # expert_cache = torch.zeros_like(x)
++--        # idxs = flat_expert_indices.argsort()
++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++--        # token_idxs = idxs // self.num_experts_per_tok
++--        # for i, end_idx in enumerate(tokens_per_expert):
++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++--        #     if start_idx == end_idx:
++--        #         continue
++--        #     expert = self.experts[i]
++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
++--        #     expert_tokens = x[exp_token_idx]
++--        #     expert_out = expert(expert_tokens)
++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++--        # return expert_cache
++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++-         expert_cache = ops.zeros_like(x)
++-         idxs = flat_expert_indices.argsort()
++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-         token_idxs = idxs // self.num_experts_per_tok
++-+
++-         for i, end_idx in enumerate(tokens_per_expert):
++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-             if start_idx == end_idx:
++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++-             expert_out = expert(expert_tokens)
++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++-+
++-         return expert_cache
++-+        
++-+    # @no_grad()
++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-+    #     # expert_cache = torch.zeros_like(x)
++-+    #     # idxs = flat_expert_indices.argsort()
++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++-+    #     # token_idxs = idxs // self.num_experts_per_tok
++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++-+    #     #     if start_idx == end_idx:
++-+    #     #         continue
++-+    #     #     expert = self.experts[i]
++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++-+    #     #     expert_tokens = x[exp_token_idx]
++-+    #     #     expert_out = expert(expert_tokens)
++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++-+    #     # return expert_cache
++-+    #     expert_cache = ops.zeros_like(x)
++-+    #     idxs = flat_expert_indices.argsort()
++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-+    #     token_idxs = idxs // self.num_experts_per_tok
++-+
++-+    #     for i, end_idx in enumerate(tokens_per_expert):
++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-+    #         if start_idx == end_idx:
++-+    #             continue
++-+    #         expert = self.experts[i]
++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++-+    #         expert_tokens = x[exp_token_idx]
++-+    #         expert_out = expert(expert_tokens)
++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++-+
++-+    #     return expert_cache
++-+    # @no_grad()
++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++-+    #     expert_cache = ops.zeros_like(x)
++-+
++-+    #     # 排序保证顺序一致
++-+    #     idxs = flat_expert_indices.argsort()
++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-+    #     token_idxs = idxs // self.num_experts_per_tok
++-+
++-+    #     # 找出有 token 的专家
++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++-+
++-+    #     for i in active_experts.tolist():
++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-+    #         end_idx = tokens_per_expert[i]
++-+    #         if start_idx == end_idx:  # 没有 token
++-+    #             continue
++-+
++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++-+    #         expert_tokens = x[exp_token_idx]
++-+    #         expert_out = self.experts[i](expert_tokens)
++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++-+
++-+    #         expert_cache = mindspore.mint.scatter_add(
++-+    #             expert_cache,
++-+    #             0,
++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++-+    #             expert_out
++-+    #         )
++-+
++-+    #     return expert_cache
++-+
++-+
++- 
++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++- #     """
++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++- 
++-         # Initialize weights and apply final processing
++-         self.post_init()
++-+        self.warm_up = False
++-+
++-+    def warmup_moe_model_deep(self):
++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++-+        test_texts = [
++-+            "warmup short",
++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++-+        ]
++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++-+        if tokenizer is None:
++-+            from mindnlp.transformers import AutoTokenizer
++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++-+            self._warmup_tokenizer = tokenizer
++-+
++-+        for text in test_texts:
++-+            inputs = tokenizer(text, return_tensors="ms")
++-+            with mindspore._no_grad():
++-+                _ = self(**inputs, use_cache=False)
++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++- 
++-     def get_input_embeddings(self):
++-         return self.model.embed_tokens
++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++-         ```"""
++-+        if not self.warm_up:
++-+            self.warm_up = True
++-+            self.warmup_moe_model_deep()
++-+
++-         output_attentions = (
++-             output_attentions
++-             if output_attentions is not None
++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++-index 3cbf820e..d4c6b651 100644
++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++-@@ -18,7 +18,6 @@
++- # See the License for the specific language governing permissions and
++- # limitations under the License.
++- """MindSpore Qwen2MoE model."""
++--
++- import math
++- from typing import List, Optional, Tuple, Union
++- 
++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++-     TokenClassifierOutput,
++- )
++- from ...modeling_utils import PreTrainedModel
++-+from ...generation import GenerationMixin
++- from ....utils import logging
++- from .configuration_qwen2_moe import Qwen2MoeConfig
++- 
++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++-         self.variance_epsilon = eps
++- 
++-     def forward(self, hidden_states):
++-+        # @dwj
++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++-+        # @lwx
++-+        # if not self.training :
++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++-         input_dtype = hidden_states.dtype
++-         hidden_states = hidden_states.to(mindspore.float32)
++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++-@@ -234,6 +239,8 @@ def rotate_half(x):
++-     """Rotates half the hidden dims of the input."""
++-     x1 = x[..., : x.shape[-1] // 2]
++-     x2 = x[..., x.shape[-1] // 2 :]
++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++-     return ops.cat((-x2, x1), dim=-1)
++- 
++- 
++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++-         self.config = config
++-         self.hidden_size = config.hidden_size
++-         self.intermediate_size = intermediate_size
++-+        
++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++-         self.act_fn = ACT2FN[config.hidden_act]
++- 
++-     def forward(self, x):
++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++--
++- 
++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++-+        # @lwx 
++-+        # gate_up_output = self.gate_up_proj(x)
++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++-+        # return self.down_proj(swiglu_output)
++-+
++-+    # def forward(self, x):
++-+    #     gate_proj_out = self.gate_proj(x)
++-+    #     up_proj_out = self.up_proj(x)
++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++-+    #     return self.down_proj(swiglu_out)
++-+        
++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++-     """
++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++-         use_cache: bool = False,
++-         cache_position: Optional[mindspore.Tensor] = None,
++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-+
++-+        
++-+
++-         bsz, q_len, _ = hidden_states.shape
++- 
++-         query_states = self.q_proj(hidden_states)
++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-                     "with a layer index."
++-                 )
++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-+            if isinstance(past_key_value, StaticCache):
++-+                kv_seq_len = key_states.shape[-2]
++-+            else:
++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++- 
++-         if past_key_value is not None:
++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++-+            
++-+            if isinstance(past_key_value, StaticCache):
++-+                kv_seq_len = key_states.shape[-2]
++- 
++-         # repeat k/v heads if n_kv_heads < n_heads
++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
++--
++-+        
++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++- 
++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++--            raise ValueError(
++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++--                f" {attn_weights.shape}"
++--            )
++--
++--        if attention_mask is not None:  # no matter the length, we just slice it
++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++-+        if attention_mask is not None:
++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++-             attn_weights = attn_weights + causal_mask
++- 
++-         # upcast attention to fp32
++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++- 
++-         attn_output = self.o_proj(attn_output)
++--
++-+        # @lwx
++-+        
++-+        # max_seq_len = self.max_position_embeddings  # 2048
++-+
++-+        # if attention_mask is not None:
++-+        #     # attention_mask: [B, 1, Sq, Sk]
++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++-+
++-+        #     # pad 到 [max_seq_len, max_seq_len]
++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++-+        #     global_attention_mask = padded_mask
++-+        # else:
++-+        #     global_attention_mask = None
++-+
++-+
++-+        # sparse_mode=3
++-+        # attn_output = mindspore.ops.flash_attention_score(
++-+        #     query=query_states,
++-+        #     key=key_states,
++-+        #     value=value_states,
++-+        #     real_shift=None,
++-+        #     padding_mask=None,
++-+
++-+        #     head_num=self.num_heads,
++-+        #     attn_mask=global_attention_mask,
++-+        #     keep_prob=1.0 - self.attention_dropout,
++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++-+        #     input_layout="BNSD",
++-+        #     pre_tokens=2147483647,
++-+        #     next_tokens=2147483647,
++-+        #     inner_precise=0,
++-+        #     drop_mask=None,
++-+        #     prefix=None,
++-+        #     actual_seq_qlen=None,
++-+        #     actual_seq_kvlen=None,
++-+        #     sparse_mode=sparse_mode,
++-+        # )
++-         if not output_attentions:
++-             attn_weights = None
++- 
++-         return attn_output, attn_weights, past_key_value
++- 
++- 
++-+class Qwen2MoeFlashAttention(nn.Module):
++-+    """
++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++-+
++-+    关键改动:
++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++-+       直接传入原始的 key 和 value 张量效率更高。
++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++-+    """
++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++-+        super().__init__()
++-+        self.config = config
++-+        self.layer_idx = layer_idx
++-+        self.hidden_size = config.hidden_size
++-+        self.num_heads = config.num_attention_heads
++-+        self.head_dim = self.hidden_size // self.num_heads
++-+        self.num_key_value_heads = config.num_key_value_heads
++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++-+        self.max_position_embeddings = config.max_position_embeddings
++-+        self.rope_theta = config.rope_theta
++-+        self.attention_dropout = config.attention_dropout
++-+
++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
++-+            raise ValueError(
++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++-+            )
++-+
++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++-+
++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++-+            self.head_dim,
++-+            max_position_embeddings=self.max_position_embeddings,
++-+            base=self.rope_theta,
++-+        )
++-+
++-+    def forward(
++-+        self,
++-+        hidden_states: mindspore.Tensor,
++-+        attention_mask: Optional[mindspore.Tensor] = None,
++-+        position_ids: Optional[mindspore.Tensor] = None,
++-+        past_key_value: Optional[Cache] = None,
++-+        output_attentions: bool = False,
++-+        use_cache: bool = False,
++-+        cache_position: Optional[mindspore.Tensor] = None,
++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-+
++-+        bsz, q_len, _ = hidden_states.shape
++-+
++-+        # 1. 线性投射 Q, K, V
++-+        query_states = self.q_proj(hidden_states)
++-+        key_states = self.k_proj(hidden_states)
++-+        value_states = self.v_proj(hidden_states)
++-+
++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+
++-+        # 3. RoPE 旋转位置编码
++-+        kv_seq_len = key_states.shape[-2]
++-+        if past_key_value is not None:
++-+            if self.layer_idx is None:
++-+                raise ValueError(
++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-+                    "with a layer index."
++-+                )
++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++-+                if cache_position.shape[0] == 1:
++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++-+                    kv_seq_len = past_seen_tokens + 1
++-+                else:
++-+                    # prefill 阶段：cache_position 是范围，使用其长度
++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++-+            else:
++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-+
++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-+
++-+        # 4. KV 缓存更新
++-+        if past_key_value is not None:
++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-+            key_states, value_states = past_key_value.update(
++-+                key_states, value_states, self.layer_idx, cache_kwargs
++-+            )
++-+            
++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++-+                if cache_position.shape[0] == 1:
++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++-+                    kv_seq_len = key_states.shape[-2]
++-+
++-+        # 5. [重要] 准备 Attention Mask
++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++-+        fa_attention_mask = None
++-+        if attention_mask is not None:
++-+            # 截取与当前key长度匹配的部分
++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
++-+            fa_attention_mask = (mask_slice != 0)
++-+
++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++-+        input_dtype = query_states.dtype
++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++-+            query_states = query_states.to(mindspore.float16)
++-+            key_states = key_states.to(mindspore.float16)
++-+            value_states = value_states.to(mindspore.float16)
++-+
++-+        # 6. [核心] 调用 flash_attention_score 算子
++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++-+        attn_output = mindspore.ops.flash_attention_score(
++-+            query=query_states,
++-+            key=key_states,
++-+            value=value_states,
++-+            head_num=self.num_heads, # 传入Q的头数(N1)
++-+            attn_mask=fa_attention_mask,
++-+            keep_prob=1.0 - self.attention_dropout,
++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
++-+            input_layout="BNSD",
++-+            sparse_mode=0 # 使用 defaultMask 模式
++-+        )
++-+
++-+        # 恢复原始数据类型
++-+        attn_output = attn_output.to(input_dtype)
++-+
++-+        # 7. 调整输出形状
++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-+        attn_output = self.o_proj(attn_output)
++-+
++-+        # FlashAttention 算子不直接返回注意力权重矩阵
++-+        attn_weights = None
++-+        if output_attentions:
++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++-+
++-+        return attn_output, attn_weights, past_key_value
++-+
++-+    # def forward(
++-+    #     self,
++-+    #     hidden_states: mindspore.Tensor,
++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++-+    #     past_key_value: Optional[Cache] = None,
++-+    #     output_attentions: bool = False,
++-+    #     use_cache: bool = False,
++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-+
++-+    #     bsz, q_len, _ = hidden_states.shape
++-+
++-+    #     # 1. 线性投射 Q, K, V
++-+    #     query_states = self.q_proj(hidden_states)
++-+    #     key_states = self.k_proj(hidden_states)
++-+    #     value_states = self.v_proj(hidden_states)
++-+
++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+
++-+    #     # 3. RoPE 旋转位置编码
++-+    #     kv_seq_len = key_states.shape[-2]
++-+    #     if past_key_value is not None:
++-+    #         if self.layer_idx is None:
++-+    #             raise ValueError(
++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-+    #                 "with a layer index."
++-+    #             )
++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-+
++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-+
++-+    #     # 4. KV 缓存更新
++-+    #     if past_key_value is not None:
++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-+    #         key_states, value_states = past_key_value.update(
++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++-+    #         )
++-+
++-+    #     # 5. 准备 Attention Mask
++-+    #     fa_attention_mask = None
++-+    #     if attention_mask is not None:
++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-+    #         fa_attention_mask = (mask_slice != 0)
++-+
++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++-+    #     input_dtype = query_states.dtype
++-+
++-+    #     # 6. [核心] 调用 flash_attention_score 算子
++-+    #     attn_output = mindspore.ops.flash_attention_score(
++-+    #         query=query_states,
++-+    #         key=key_states,
++-+    #         value=value_states,
++-+    #         head_num=self.num_heads,
++-+    #         attn_mask=fa_attention_mask,
++-+    #         keep_prob=1.0 - self.attention_dropout,
++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++-+    #         input_layout="BNSD",
++-+    #         sparse_mode=0,
++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++-+    #         inner_precise=1
++-+    #     )
++-+
++-+    #     # 恢复原始数据类型
++-+    #     attn_output = attn_output.to(input_dtype)
++-+
++-+    #     # 7. 调整输出形状
++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-+    #     attn_output = self.o_proj(attn_output)
++-+
++-+    #     attn_weights = None
++-+    #     if output_attentions:
++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++-+
++-+    #     return attn_output, attn_weights, past_key_value
++-+
++-+    # def forward(
++-+    #     self,
++-+    #     hidden_states: mindspore.Tensor,
++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++-+    #     past_key_value: Optional[Cache] = None,
++-+    #     output_attentions: bool = False,
++-+    #     use_cache: bool = False,
++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-+
++-+    #     bsz, q_len, _ = hidden_states.shape
++-+
++-+    #     query_states = self.q_proj(hidden_states)
++-+    #     key_states = self.k_proj(hidden_states)
++-+    #     value_states = self.v_proj(hidden_states)
++-+
++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-+
++-+    #     kv_seq_len = key_states.shape[-2]
++-+    #     if past_key_value is not None:
++-+    #         if self.layer_idx is None:
++-+    #             raise ValueError("`layer_idx` must be specified for caching")
++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-+
++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-+
++-+    #     if past_key_value is not None:
++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++-+    #         key_states, value_states = past_key_value.update(
++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++-+    #         )
++-+
++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++-+
++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++-+    #     query_states = query_states / math.sqrt(self.head_dim)
++-+    #     # <--- 修改结束 ---
++-+
++-+    #     fa_attention_mask = None
++-+    #     if attention_mask is not None:
++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++-+    #         fa_attention_mask = (mask_slice != 0)
++-+
++-+    #     input_dtype = query_states.dtype
++-+
++-+    #     attn_output = mindspore.ops.flash_attention_score(
++-+    #         query=query_states,    # 传入已经预先缩放过的 query
++-+    #         key=key_states,
++-+    #         value=value_states,
++-+    #         head_num=self.num_heads,
++-+    #         attn_mask=fa_attention_mask,
++-+    #         keep_prob=1.0 - self.attention_dropout,
++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++-+    #         input_layout="BNSD",
++-+    #         sparse_mode=0,
++-+    #         inner_precise=1        # 仍然保持内部高精度计算
++-+    #     )
++-+
++-+    #     attn_output = attn_output.to(input_dtype)
++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-+    #     attn_output = self.o_proj(attn_output)
++-+
++-+    #     attn_weights = None
++-+    #     if output_attentions:
++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++-+
++-+    #     return attn_output, attn_weights, past_key_value
++-+
++- QWEN2MOE_ATTENTION_CLASSES = {
++-     "eager": Qwen2MoeAttention,
++-+    "flash-attention": Qwen2MoeFlashAttention,
++- }
++- 
++- 
++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++- 
++-+    #@dwj
++-+    # 只遍历激活的专家，而非全部专家
++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
++--        hidden_states = hidden_states.view(-1, hidden_dim)
++--        # router_logits: (batch * sequence_length, n_experts)
++--        router_logits = self.gate(hidden_states)
++--
++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++--        if self.norm_topk_prob:
++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++--        # we cast back to the input dtype
++--        routing_weights = routing_weights.to(hidden_states.dtype)
++--
++--        final_hidden_states = ops.zeros(
++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++--        )
++--
++--        # One hot encode the selected experts to create an expert mask
++--        # this will be used to easily index which expert is going to be sollicitated
++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++--
++--        # Loop over all available experts in the model and perform the computation on each expert
++--        for expert_idx in range(self.num_experts):
++--            expert_layer = self.experts[expert_idx]
++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++--
++--            # Index the correct hidden states and compute the expert hidden state for
++--            # the current expert. We need to make sure to multiply the output hidden
++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++--            if 0 not in idx.shape:
++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++--
++--                # However `index_add_` only support torch tensors for indexing so we'll use
++--                # the `top_x` tensor here.
++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++--
++--        shared_expert_output = self.shared_expert(hidden_states)
++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++--
++--        final_hidden_states = final_hidden_states + shared_expert_output
++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++-+            num_tokens = hidden_states_reshaped.shape[0]
++-+
++-+            router_logits = self.gate(hidden_states_reshaped)
++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++-+
++-+            if self.norm_topk_prob:
++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++-+            routing_weights = routing_weights.to(hidden_states.dtype)
++-+            
++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++-+            flat_selected_experts = selected_experts.flatten()
++-+            
++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++-+            token_indices = broadcasted_token_indices.flatten()
++-+            
++-+            active_experts = ops.unique(flat_selected_experts)
++-+            
++-+            for expert_idx_tensor in active_experts:
++-+                expert_idx = expert_idx_tensor.item()
++-+                expert_layer = self.experts[expert_idx]
++-+                
++-+                mask = (flat_selected_experts == expert_idx_tensor)
++-+                selected_token_indices = token_indices[mask]
++-+                selected_routing_weights = routing_weights.flatten()[mask]
++-+                
++-+                current_states = hidden_states_reshaped[selected_token_indices]
++-+                
++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++-+                
++-+                final_hidden_states = final_hidden_states.index_add(
++-+                    dim=0,
++-+                    index=selected_token_indices,
++-+                    source=expert_output.to(hidden_states.dtype)
++-+                )
++-+            
++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++- 
++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++--        return final_hidden_states, router_logits
++-+            final_hidden_states = final_hidden_states + shared_expert_output
++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++-+            
++-+            return final_hidden_states, router_logits
++- 
++- 
++- class Qwen2MoeDecoderLayer(nn.Module):
++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++- 
++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++- 
++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++-+
++-         if (layer_idx not in config.mlp_only_layers) and (
++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++-         ):
++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++-     _skip_keys_device_placement = "past_key_values"
++-     _supports_cache_class = True
++-+#lwx
++-+    # _supports_static_cache = True
++- 
++-     def _init_weights(self, module):
++-         std = self.config.initializer_range
++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++-         return causal_mask
++- 
++- 
++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++-     _tied_weights_keys = ["lm_head.weight"]
++- 
++-     def __init__(self, config):
++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++-         self.num_experts_per_tok = config.num_experts_per_tok
++-         # Initialize weights and apply final processing
++-         self.post_init()
++-+        # @lwx
++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++-+        #     self.generation_config.cache_implementation = "static"
++-+        self._warmed_up = False
++-+
++-+    def warmup_moe_model(self):
++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
++-+        test_texts = [
++-+            "warmup short",
++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++-+        ]
++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++-+        if tokenizer is None:
++-+            from mindnlp.transformers import AutoTokenizer
++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++-+            self._warmup_tokenizer = tokenizer
++-+
++-+        for text in test_texts:
++-+            inputs = tokenizer(text, return_tensors="ms")
++-+            with mindspore._no_grad():
++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
++- 
++-     def get_input_embeddings(self):
++-         return self.model.embed_tokens
++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++-         ```"""
++-+        if not self._warmed_up:
++-+            self._warmed_up = True
++-+            self.warmup_moe_model()
++- 
++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++-         output_router_logits = (
++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++-             }
++-         )
++-         return model_inputs
++-+# @lwx
++-+    # def _decode_one_tokens_logits(
++-+    #     self,
++-+    #     cur_token: mindspore.Tensor,
++-+    #     input_pos: Optional[mindspore.Tensor],
++-+    #     cache_position: mindspore.Tensor,
++-+    #     past_key_values: StaticCache,
++-+    # ) -> mindspore.Tensor:
++-+    #     """
++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++-+        
++-+    #     Args:
++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++-+    #         input_pos: 输入位置信息，可选
++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
++-+            
++-+    #     Returns:
++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++-+    #     """
++-+    #     # 调用JIT编译的版本
++-+    #     return self.get_decode_one_tokens_logits(
++-+    #         cur_token=cur_token,
++-+    #         input_pos=input_pos,
++-+    #         cache_position=cache_position,
++-+    #         past_key_values=past_key_values,
++-+    #     )
++-+    
++-+    # @mindspore.jit(jit_level='O1')
++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++-+    #     """
++-+    #     JIT编译的函数，用于高效的单token解码
++-+    #     使用JIT编译优化以支持静态shape和高效执行
++-+        
++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++-+    #     """
++-+    #     outputs = self.model.forward(
++-+    #         input_ids=cur_token,
++-+    #         position_ids=input_pos,
++-+    #         cache_position=cache_position,
++-+    #         past_key_values=past_key_values,
++-+    #         use_cache=True,
++-+    #         return_dict=False,
++-+    #     )
++-+        
++-+    #     hidden_states = outputs[0]
++-+    #     logits = self.lm_head.forward(hidden_states)
++-+    #     logits = logits.float()
++-+        
++-+    #     return logits[:, -1, :]
++-+
++-+    # def _sample(
++-+    #     self,
++-+    #     input_ids: mindspore.Tensor,
++-+    #     logits_processor,
++-+    #     stopping_criteria,
++-+    #     generation_config,
++-+    #     synced_devices: bool,
++-+    #     streamer=None,
++-+    #     logits_warper=None,
++-+    #     **model_kwargs,
++-+    # ):
++-+    #     """
++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++-+    #     """
++-+    #     from ...generation.logits_process import LogitsProcessorList
++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++-+    #     from mindnlp.core import nn, ops, no_grad
++-+    #     import numpy as np
++-+        
++-+    #     # 检查是否使用 StaticCache
++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++-+    #     # 否则，直接调用父类方法
++-+    #     past_key_values = model_kwargs.get("past_key_values")
++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++-+
++-+    #     if not isinstance(past_key_values, StaticCache):
++-+    #         # 不使用 StaticCache，直接调用父类方法
++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++-+    #         return super()._sample(
++-+    #             input_ids=input_ids,
++-+    #             logits_processor=logits_processor,
++-+    #             stopping_criteria=stopping_criteria,
++-+    #             generation_config=generation_config,
++-+    #             synced_devices=synced_devices,
++-+    #             streamer=streamer,
++-+    #             logits_warper=logits_warper,
++-+    #             **model_kwargs,
++-+    #         )
++-+        
++-+    #     # 使用 StaticCache，进入自定义循环
++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++-+    #     pad_token_id = generation_config._pad_token_tensor
++-+    #     output_attentions = generation_config.output_attentions
++-+    #     output_hidden_states = generation_config.output_hidden_states
++-+    #     output_scores = generation_config.output_scores
++-+    #     output_logits = generation_config.output_logits
++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
++-+    #     max_length = generation_config.max_length
++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++-+    #     do_sample = generation_config.do_sample
++-+        
++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++-+    #         raise ValueError(
++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++-+    #             f"{logits_warper})."
++-+    #         )
++-+        
++-+    #     # init attention / hidden states / scores tuples
++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++-+        
++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++-+    #         encoder_hidden_states = (
++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++-+    #         )
++-+        
++-+    #     # keep track of which sequences are already finished
++-+    #     batch_size, cur_len = input_ids.shape
++-+    #     this_peer_finished = False
++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++-+        
++-+    #     time_record = []
++-+    #     from ....utils.testing_utils import parse_flag_from_env
++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++-+        
++-+    #     while self._has_unfinished_sequences(
++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++-+    #     ):
++-+    #         if _record_time:
++-+    #             import time as time_module
++-+    #             infer_start = time_module.time()
++-+            
++-+    #         # prepare model inputs
++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++-+            
++-+    #         # prepare variable output controls
++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++-+            
++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++-+    #         cur_cache_position = model_inputs.get("cache_position")
++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
++-+    #         cur_input_ids = model_inputs.get("input_ids")
++-+            
++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
++-+    #             cur_cache_position is not None and 
++-+    #             len(cur_cache_position.shape) > 0 and
++-+    #             cur_cache_position.shape[0] == 1 and
++-+    #             cur_input_ids is not None and
++-+    #             cur_input_ids.shape[1] == 1):
++-+    #             # 使用 JIT 优化的单 token 解码
++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++-+    #             if not hasattr(self, '_jit_used'):
++-+    #                 self._jit_used = False
++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++-+                
++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
++-+    #                 cur_token=cur_input_ids,
++-+    #                 input_pos=model_inputs.get("position_ids"),
++-+    #                 cache_position=cur_cache_position,
++-+    #                 past_key_values=cur_past_key_values,
++-+    #             )
++-+                
++-+    #             # 标记已使用JIT（用于后续判断）
++-+    #             if not self._jit_used:
++-+    #                 self._jit_used = True
++-+                
++-+    #             # 构造兼容的输出对象
++-+    #             class JitOptimizedOutput:
++-+    #                 def __init__(self, logits, config):
++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++-+    #                     self.config = config
++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
++-+    #                     self.cross_attentions = None
++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++-+                
++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++-+    #         else:
++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++-+    #             outputs = self(**model_inputs, return_dict=True)
++-+            
++-+    #         if synced_devices and this_peer_finished:
++-+    #             continue
++-+            
++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++-+    #         next_token_logits = outputs.logits[:, -1, :]
++-+            
++-+    #         # pre-process distribution
++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++-+    #         if do_sample:
++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++-+            
++-+    #         # Store scores, attentions and hidden_states when required
++-+    #         if return_dict_in_generate:
++-+    #             if output_scores:
++-+    #                 scores += (next_token_scores,)
++-+    #             if output_logits:
++-+    #                 raw_logits += (next_token_logits,)
++-+    #             if output_attentions:
++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++-+    #                 if self.config.is_encoder_decoder:
++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++-+                
++-+    #             if output_hidden_states:
++-+    #                 hidden = (
++-+    #                     outputs.decoder_hidden_states
++-+    #                     if self.config.is_encoder_decoder
++-+    #                     else outputs.hidden_states
++-+    #                 )
++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++-+            
++-+    #         # token selection
++-+    #         if do_sample:
++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++-+    #         else:
++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++-+            
++-+    #         # finished sentences should have their next token be a padding token
++-+    #         if has_eos_stopping_criteria:
++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++-+            
++-+    #         # update generated ids, model inputs, and length for next step
++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++-+    #         if streamer is not None:
++-+    #             streamer.put(next_tokens)
++-+            
++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
++-+    #             outputs,
++-+    #             model_kwargs,
++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
++-+    #         )
++-+            
++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++-+    #         cur_len += 1
++-+            
++-+    #         if _record_time:
++-+    #             import time as time_module
++-+    #             infer_stop = time_module.time()
++-+    #             time_record.append(infer_stop - infer_start)
++-+            
++-+    #         del outputs
++-+        
++-+    #     average_infer_time = None
++-+    #     if time_record:
++-+    #         if len(time_record) > 1:
++-+    #             time_record.pop(0)
++-+    #         average_infer_time = sum(time_record) / len(time_record)
++-+    #         print(f'average inference time is: {average_infer_time}')
++-+    #         print(f'inference time record: {time_record}')
++-+        
++-+    #     if streamer is not None:
++-+    #         streamer.end()
++-+        
++-+    #     # 简单判断：打印是否使用了JIT路径
++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
++-+    #     else:
++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++-+        
++-+    #     if return_dict_in_generate:
++-+    #         if self.config.is_encoder_decoder:
++-+    #             return GenerateEncoderDecoderOutput(
++-+    #                 sequences=input_ids,
++-+    #                 scores=scores,
++-+    #                 logits=raw_logits,
++-+    #                 encoder_attentions=encoder_attentions,
++-+    #                 encoder_hidden_states=encoder_hidden_states,
++-+    #                 decoder_attentions=decoder_attentions,
++-+    #                 cross_attentions=cross_attentions,
++-+    #                 decoder_hidden_states=decoder_hidden_states,
++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++-+    #                 average_infer_time=average_infer_time
++-+    #             )
++-+    #         else:
++-+    #             return GenerateDecoderOnlyOutput(
++-+    #                 sequences=input_ids,
++-+    #                 scores=scores,
++-+    #                 logits=raw_logits,
++-+    #                 attentions=decoder_attentions,
++-+    #                 hidden_states=decoder_hidden_states,
++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++-+    #                 average_infer_time=average_infer_time
++-+    #             )
++-+    #     else:
++-+    #         return input_ids
++-+            
++-+    # def _prepare_cache_for_generation(
++-+    #     self,
++-+    #     generation_config,
++-+    #     model_kwargs,
++-+    #     assistant_model,
++-+    #     batch_size,
++-+    #     max_cache_length,
++-+    # ):
++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++-+    #         generation_config.cache_implementation = "static"
++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++-+        
++-+    #     if generation_config.cache_implementation == "static":
++-+    #         base_required_from_max_length = generation_config.max_length + 1
++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
++-+    #         min_cache_size = 50
++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++-+    #         else:
++-+    #             max_cache_length = max(base_required, min_cache_size)
++-+            
++-+    #         original_max_cache_length = max_cache_length
++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
++-+            
++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++-+    #             if max_cache_length > self.config.max_position_embeddings:
++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++-+        
++-+    #     result = super()._prepare_cache_for_generation(
++-+    #         generation_config=generation_config,
++-+    #         model_kwargs=model_kwargs,
++-+    #         assistant_model=assistant_model,
++-+    #         batch_size=batch_size,
++-+    #         max_cache_length=max_cache_length,
++-+    #     )
++-+        
++-+    #     if generation_config.cache_implementation == "static":
++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++-+    #         created_cache = model_kwargs.get(cache_name)
++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++-+    #             if created_cache.max_cache_len < generation_config.max_length:
++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++-+        
++-+    #     return result
++-+
++-+
++-+
++- 
++- 
++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++--- 
++-2.27.0
++-
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
new file mode 100644
index 00000000..bbe6df27
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
@@ -0,0 +1,7707 @@
+From ab47c0478530d34d2b48200af0453dda94d1ec18 Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Fri, 7 Nov 2025 11:48:18 +0800
+Subject: [PATCH 05/10] 20251107001commit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |   91 +-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+ patches/0001-20251104commit.patch             |    2 +-
+ patches/0002-20251106commit.patch             |    2 +-
+ patches/0003-20261106secondcommit.patch       |    2 +-
+ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+ 7 files changed, 7577 insertions(+), 30 deletions(-)
+ create mode 100644 patches/0004-20251106change.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 0546f318..8831e4b7 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+     #         expert_cache += expert_out * weight
+     #     return expert_cache
+ 
+-    @no_grad()
+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-        # x 的 shape: (1, hidden_size)
+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-
+-        # 1. 收集所有需要的专家层
+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-
+-        # 2. 并行计算所有专家的输出
+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-        # ops.cat 会将它们堆叠成一个新的 Tensor
+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-
+-        # 3. 使用矩阵乘法进行加权求和
+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-        # 最终结果 final_output 的 shape: (1, hidden_size)
+-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++    # @no_grad()
++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++    #     # x 的 shape: (1, hidden_size)
++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++
++    #     # 1. 收集所有需要的专家层
++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
++
++    #     # 2. 并行计算所有专家的输出
++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++
++    #     # 3. 使用矩阵乘法进行加权求和
++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+         
+-        return final_output
++    #     return final_output
+ 
+ 
+     # @no_grad()
+@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+             )
+ 
+         return expert_cache
++# 放置在 DeepseekMoE 类中
++    @no_grad()
++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++        """
++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++
++        Args:
++            x (Tensor): 输入张量, shape: (1, hidden_size)
++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++        """
++        top_k, _ = flat_expert_weights.shape
++        hidden_size = x.shape[-1]
++
++        # 1. 将所有专家的权重堆叠起来
++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++        
++        # 2. "收集" 所需的专家权重
++        selected_gate_w = stacked_gate_w[flat_expert_indices]
++        selected_up_w = stacked_up_w[flat_expert_indices]
++        selected_down_w = stacked_down_w[flat_expert_indices]
++        
++        # 3. 准备输入
++        x_expanded = x.expand((top_k, 1, hidden_size))
++        
++        # 4. 并行计算 gate_proj 和 up_proj
++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++
++        # 5. 计算中间状态
++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++        
++        # 6. 并行计算 down_proj
++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++        #    --- [FIX] ---
++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++        #    --- [FIX END] ---
++        
++        # 7. 根据路由权重进行加权求和
++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++
++        return weighted_sum
++
++
+ 
+         # @no_grad()
+         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index ebd7782e..913a7609 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+-    x1 = x[..., : x.shape[-1] // 2]
+-    x2 = x[..., x.shape[-1] // 2 :]
++    # x1 = x[..., : x.shape[-1] // 2]
++    # x2 = x[..., x.shape[-1] // 2 :]
+     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+ 
+diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+index d059dcbe..2b217b64 100644
+--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+-    x1 = x[..., : x.shape[-1] // 2]
+-    x2 = x[..., x.shape[-1] // 2 :]
++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++    # x1 = x[..., : x.shape[-1] // 2]
++    # x2 = x[..., x.shape[-1] // 2 :]
++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+ 
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+index 78f22642..0a0ef2d7 100644
+--- a/patches/0001-20251104commit.patch
++++ b/patches/0001-20251104commit.patch
+@@ -1,7 +1,7 @@
+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/3] 20251104commit
++Subject: [PATCH 1/4] 20251104commit
+ 
+ ---
+  mindnlp/transformers/cache_utils.py           |  28 +-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+index 22b65dd5..5185270c 100644
+--- a/patches/0002-20251106commit.patch
++++ b/patches/0002-20251106commit.patch
+@@ -1,7 +1,7 @@
+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/3] 20251106commit
++Subject: [PATCH 2/4] 20251106commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+index 966529e4..3e05f821 100644
+--- a/patches/0003-20261106secondcommit.patch
++++ b/patches/0003-20261106secondcommit.patch
+@@ -1,7 +1,7 @@
+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/3] 20261106secondcommit
++Subject: [PATCH 3/4] 20261106secondcommit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+new file mode 100644
+index 00000000..88a1aef4
+--- /dev/null
++++ b/patches/0004-20251106change.patch
+@@ -0,0 +1,7498 @@
++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Thu, 6 Nov 2025 15:48:09 +0800
++Subject: [PATCH 4/4] 20251106change
++
++---
++ .../models/deepseek/modeling_deepseek.py      |  189 +-
++ patches/0001-20251104commit.patch             | 1272 +++++++
++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
++ 4 files changed, 7244 insertions(+), 186 deletions(-)
++ create mode 100644 patches/0001-20251104commit.patch
++ create mode 100644 patches/0002-20251106commit.patch
++ create mode 100644 patches/0003-20261106secondcommit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index 2f9192bf..0546f318 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
++ 
++         return attn_output, attn_weights, past_key_value
++ 
++-# class DeepseekFlashAttention(nn.Module):
++-#     """
++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++-
++-#     This class is designed as a drop-in replacement for DeepseekAttention.
++-#     """
++-
++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++-#         super().__init__()
++-#         self.config = config
++-#         self.layer_idx = layer_idx
++-#         if layer_idx is None:
++-#             logger.warning(
++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++-#                 "when creating this class."
++-#             )
++-
++-#         self.attention_dropout = config.attention_dropout
++-#         self.hidden_size = config.hidden_size
++-#         self.num_heads = config.num_attention_heads
++-#         self.head_dim = self.hidden_size // self.num_heads
++-#         self.num_key_value_heads = config.num_key_value_heads
++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++-#         self.max_position_embeddings = config.max_position_embeddings
++-#         self.rope_theta = config.rope_theta
++-#         self.is_causal = True
++-
++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
++-#             raise ValueError(
++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++-#                 f" and `num_heads`: {self.num_heads})."
++-#             )
++-
++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++-#         self._init_rope()
++-
++-#     def _init_rope(self):
++-#         if self.config.rope_scaling is None:
++-#             self.rotary_emb = DeepseekRotaryEmbedding(
++-#                 self.head_dim,
++-#                 max_position_embeddings=self.max_position_embeddings,
++-#                 base=self.rope_theta,
++-#             )
++-#         else:
++-#             scaling_type = self.config.rope_scaling["type"]
++-#             scaling_factor = self.config.rope_scaling["factor"]
++-#             if scaling_type == "linear":
++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++-#                     self.head_dim,
++-#                     max_position_embeddings=self.max_position_embeddings,
++-#                     scaling_factor=scaling_factor,
++-#                     base=self.rope_theta,
++-#                 )
++-#             elif scaling_type == "dynamic":
++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++-#                     self.head_dim,
++-#                     max_position_embeddings=self.max_position_embeddings,
++-#                     scaling_factor=scaling_factor,
++-#                     base=self.rope_theta,
++-#                 )
++-#             else:
++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++-
++-#     def forward(
++-#         self,
++-#         hidden_states: mindspore.Tensor,
++-#         attention_mask: Optional[mindspore.Tensor] = None,
++-#         position_ids: Optional[mindspore.Tensor] = None,
++-#         past_key_value: Optional[Cache] = None,
++-#         output_attentions: bool = False,
++-#         use_cache: bool = False,
++-#         **kwargs,
++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++-#         if "padding_mask" in kwargs:
++-#             warnings.warn(
++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++-#             )
++-        
++-#         if output_attentions:
++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++-
++-#         bsz, q_len, _ = hidden_states.shape
++-
++-#         if self.config.pretraining_tp > 1:
++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++-
++-#         query_states = self.q_proj(hidden_states)
++-#         key_states = self.k_proj(hidden_states)
++-#         value_states = self.v_proj(hidden_states)
++-
++-#         # Reshape for multi-head attention
++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++-
++-#         kv_seq_len = key_states.shape[-2]
++-#         if past_key_value is not None:
++-#             if self.layer_idx is None:
++-#                 raise ValueError(
++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++-#                     "with a layer index."
++-#                 )
++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++-        
++-#         # Apply Rotary Positional Embedding
++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++-
++-#         if past_key_value is not None:
++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++-
++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++-        
++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++-        
++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++-
++-#         # Convert attention_mask for flash_attention_score
++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++-#         if attention_mask is not None:
++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++-#                 raise ValueError(
++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++-#                 )
++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++-#         else:
++-#             attn_mask_for_fa = None
++-        
++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++-
++-#         # Call the fused flash_attention_score operator
++-#         attn_output = mindspore.ops.flash_attention_score(
++-#             query=query_states_for_fa,
++-#             key=key_states_for_fa,
++-#             value=value_states_for_fa,
++-#             head_num=self.num_heads, # This is N1, the number of query heads
++-#             input_layout='BSH',
++-#             attn_mask=attn_mask_for_fa,
++-#             keep_prob=keep_prob,
++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
++-#             sparse_mode=0 # Default mask mode
++-#         )
++-        
++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++-#         attn_output = self.o_proj(attn_output)
++-        
++-#         # Flash Attention does not return attention weights
++-#         attn_weights = None
++-
++-#         return attn_output, attn_weights, past_key_value
++ 
++ class DeepseekFlashAttention(nn.Module):
++     """
++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
++         super().__init__()
++         self.hidden_size = config.hidden_size
++ 
++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++-            config=config, layer_idx=layer_idx
++-        )
+++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+++            # config=config, layer_idx=layer_idx
+++        # )
++ 
++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++             config=config, layer_idx=layer_idx
++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
++         return outputs
++ 
++ 
++-
++ class DeepseekPreTrainedModel(PreTrainedModel):
++     config_class = DeepseekConfig
++     base_model_prefix = "model"
++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++         # Initialize weights and apply final processing
++         self.post_init()
++         self.warm_up = False
++-        #@dwj
++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++-            self.num_layers,
++-            self.num_attention_heads,
++-            self.head_dim,
++-            batch_size=1,
++-            max_length=self.max_length,
++-            dtype=mindspore.float16
++-        )
++-
++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++-        key_cache = []
++-        value_cache = []
++-        for _ in range(num_layers):
++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++-            key_cache.append(k)
++-            value_cache.append(v)
++-        return key_cache, value_cache
++-
++ 
++     def warmup_moe_model_deep(self):
++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++new file mode 100644
++index 00000000..78f22642
++--- /dev/null
+++++ b/patches/0001-20251104commit.patch
++@@ -0,0 +1,1272 @@
+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++Subject: [PATCH 1/3] 20251104commit
+++
+++---
+++ mindnlp/transformers/cache_utils.py           |  28 +-
+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++
+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++index cadd2e04..02f8d4be 100644
+++--- a/mindnlp/transformers/cache_utils.py
++++++ b/mindnlp/transformers/cache_utils.py
+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++             # k_out[:, :, cache_position] = key_states
+++             # v_out[:, :, cache_position] = value_states
+++-            if ON_ORANGE_PI:
+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++-            else:
+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++-
++++            # if ON_ORANGE_PI:
++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++            # else:
++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++            if cache_position.ndim > 1:
++++                cache_position = cache_position.flatten()
++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++                cache_position = cache_position.int()
++++            
++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++            k_out[:, :, cache_position] = key_states
++++            v_out[:, :, cache_position] = value_states
++++            
+++         return k_out, v_out
+++ 
+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index c695b944..d8303e45 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++-    x1 = x[..., : x.shape[-1] // 2]
+++-    x2 = x[..., x.shape[-1] // 2 :]
++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++    # x1 = x[..., : x.shape[-1] // 2]
++++    # x2 = x[..., x.shape[-1] // 2 :]
++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++         if self.training:
+++             raise NotImplementedError("Training is not supported yet.")
+++         else:
+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++-        if self.config.n_shared_experts is not None:
+++-            y = y + self.shared_experts(identity)
+++-        return y
++++            # @lwx
++++            if orig_shape[1] == 1:
++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++                y=y.view(*orig_shape)
++++                if self.config.n_shared_experts is not None:
++++                    y = y + self.shared_experts(identity)
++++                return y
++++            else:
++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++                if self.config.n_shared_experts is not None:
++++                    y = y + self.shared_experts(identity)
++++                return y
++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++        # if self.config.n_shared_experts is not None:
++++        #     y = y + self.shared_experts(identity)
++++        # return y
++++        
++++    @no_grad()
++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++
++++        expert_cache = ops.zeros_like(x)
++++        for i in range(self.num_experts_per_tok):
++++            expert_id = flat_expert_indices[i].item()
++++            weight = flat_expert_weights[i].item()
++++            expert = self.experts[expert_id]
++++            expert_out = expert(x)
++++            expert_cache += expert_out * weight
++++        return expert_cache
+++ 
+++     @no_grad()
+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-        # expert_cache = torch.zeros_like(x)
+++-        # idxs = flat_expert_indices.argsort()
+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++-        # token_idxs = idxs // self.num_experts_per_tok
+++-        # for i, end_idx in enumerate(tokens_per_expert):
+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++-        #     if start_idx == end_idx:
+++-        #         continue
+++-        #     expert = self.experts[i]
+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++-        #     expert_tokens = x[exp_token_idx]
+++-        #     expert_out = expert(expert_tokens)
+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++-        # return expert_cache
++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++         expert_cache = ops.zeros_like(x)
+++         idxs = flat_expert_indices.argsort()
+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++         token_idxs = idxs // self.num_experts_per_tok
++++
+++         for i, end_idx in enumerate(tokens_per_expert):
+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++             if start_idx == end_idx:
+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++             expert_out = expert(expert_tokens)
+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++
+++         return expert_cache
++++        
++++    # @no_grad()
++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++    #     # expert_cache = torch.zeros_like(x)
++++    #     # idxs = flat_expert_indices.argsort()
++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++    #     #     if start_idx == end_idx:
++++    #     #         continue
++++    #     #     expert = self.experts[i]
++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++    #     #     expert_tokens = x[exp_token_idx]
++++    #     #     expert_out = expert(expert_tokens)
++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++    #     # return expert_cache
++++    #     expert_cache = ops.zeros_like(x)
++++    #     idxs = flat_expert_indices.argsort()
++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++    #     token_idxs = idxs // self.num_experts_per_tok
++++
++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++    #         if start_idx == end_idx:
++++    #             continue
++++    #         expert = self.experts[i]
++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++    #         expert_tokens = x[exp_token_idx]
++++    #         expert_out = expert(expert_tokens)
++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++
++++    #     return expert_cache
++++    # @no_grad()
++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++    #     expert_cache = ops.zeros_like(x)
++++
++++    #     # 排序保证顺序一致
++++    #     idxs = flat_expert_indices.argsort()
++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++    #     token_idxs = idxs // self.num_experts_per_tok
++++
++++    #     # 找出有 token 的专家
++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++
++++    #     for i in active_experts.tolist():
++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++    #         end_idx = tokens_per_expert[i]
++++    #         if start_idx == end_idx:  # 没有 token
++++    #             continue
++++
++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++    #         expert_tokens = x[exp_token_idx]
++++    #         expert_out = self.experts[i](expert_tokens)
++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++
++++    #         expert_cache = mindspore.mint.scatter_add(
++++    #             expert_cache,
++++    #             0,
++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++    #             expert_out
++++    #         )
++++
++++    #     return expert_cache
++++
++++
+++ 
+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++ #     """
+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++ 
+++         # Initialize weights and apply final processing
+++         self.post_init()
++++        self.warm_up = False
++++
++++    def warmup_moe_model_deep(self):
++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++        test_texts = [
++++            "warmup short",
++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++        ]
++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++        if tokenizer is None:
++++            from mindnlp.transformers import AutoTokenizer
++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++            self._warmup_tokenizer = tokenizer
++++
++++        for text in test_texts:
++++            inputs = tokenizer(text, return_tensors="ms")
++++            with mindspore._no_grad():
++++                _ = self(**inputs, use_cache=False)
++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++ 
+++     def get_input_embeddings(self):
+++         return self.model.embed_tokens
+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++         ```"""
++++        if not self.warm_up:
++++            self.warm_up = True
++++            self.warmup_moe_model_deep()
++++
+++         output_attentions = (
+++             output_attentions
+++             if output_attentions is not None
+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++index 3cbf820e..d4c6b651 100644
+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++@@ -18,7 +18,6 @@
+++ # See the License for the specific language governing permissions and
+++ # limitations under the License.
+++ """MindSpore Qwen2MoE model."""
+++-
+++ import math
+++ from typing import List, Optional, Tuple, Union
+++ 
+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++     TokenClassifierOutput,
+++ )
+++ from ...modeling_utils import PreTrainedModel
++++from ...generation import GenerationMixin
+++ from ....utils import logging
+++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++ 
+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++         self.variance_epsilon = eps
+++ 
+++     def forward(self, hidden_states):
++++        # @dwj
++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++        # @lwx
++++        # if not self.training :
++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++         input_dtype = hidden_states.dtype
+++         hidden_states = hidden_states.to(mindspore.float32)
+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++@@ -234,6 +239,8 @@ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++     x1 = x[..., : x.shape[-1] // 2]
+++     x2 = x[..., x.shape[-1] // 2 :]
++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++         self.config = config
+++         self.hidden_size = config.hidden_size
+++         self.intermediate_size = intermediate_size
++++        
+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++         self.act_fn = ACT2FN[config.hidden_act]
+++ 
+++     def forward(self, x):
+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++-
+++ 
++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++        # @lwx 
++++        # gate_up_output = self.gate_up_proj(x)
++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++        # return self.down_proj(swiglu_output)
++++
++++    # def forward(self, x):
++++    #     gate_proj_out = self.gate_proj(x)
++++    #     up_proj_out = self.up_proj(x)
++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++    #     return self.down_proj(swiglu_out)
++++        
+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++     """
+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++         use_cache: bool = False,
+++         cache_position: Optional[mindspore.Tensor] = None,
+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++        
++++
+++         bsz, q_len, _ = hidden_states.shape
+++ 
+++         query_states = self.q_proj(hidden_states)
+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++                     "with a layer index."
+++                 )
+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++            if isinstance(past_key_value, StaticCache):
++++                kv_seq_len = key_states.shape[-2]
++++            else:
++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++ 
+++         if past_key_value is not None:
+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++            
++++            if isinstance(past_key_value, StaticCache):
++++                kv_seq_len = key_states.shape[-2]
+++ 
+++         # repeat k/v heads if n_kv_heads < n_heads
+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++-
++++        
+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++ 
+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++-            raise ValueError(
+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++-                f" {attn_weights.shape}"
+++-            )
+++-
+++-        if attention_mask is not None:  # no matter the length, we just slice it
+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++        if attention_mask is not None:
++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++             attn_weights = attn_weights + causal_mask
+++ 
+++         # upcast attention to fp32
+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++ 
+++         attn_output = self.o_proj(attn_output)
+++-
++++        # @lwx
++++        
++++        # max_seq_len = self.max_position_embeddings  # 2048
++++
++++        # if attention_mask is not None:
++++        #     # attention_mask: [B, 1, Sq, Sk]
++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++
++++        #     # pad 到 [max_seq_len, max_seq_len]
++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++        #     global_attention_mask = padded_mask
++++        # else:
++++        #     global_attention_mask = None
++++
++++
++++        # sparse_mode=3
++++        # attn_output = mindspore.ops.flash_attention_score(
++++        #     query=query_states,
++++        #     key=key_states,
++++        #     value=value_states,
++++        #     real_shift=None,
++++        #     padding_mask=None,
++++
++++        #     head_num=self.num_heads,
++++        #     attn_mask=global_attention_mask,
++++        #     keep_prob=1.0 - self.attention_dropout,
++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++        #     input_layout="BNSD",
++++        #     pre_tokens=2147483647,
++++        #     next_tokens=2147483647,
++++        #     inner_precise=0,
++++        #     drop_mask=None,
++++        #     prefix=None,
++++        #     actual_seq_qlen=None,
++++        #     actual_seq_kvlen=None,
++++        #     sparse_mode=sparse_mode,
++++        # )
+++         if not output_attentions:
+++             attn_weights = None
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
+++ 
++++class Qwen2MoeFlashAttention(nn.Module):
++++    """
++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++
++++    关键改动:
++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++       直接传入原始的 key 和 value 张量效率更高。
++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++    """
++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++        super().__init__()
++++        self.config = config
++++        self.layer_idx = layer_idx
++++        self.hidden_size = config.hidden_size
++++        self.num_heads = config.num_attention_heads
++++        self.head_dim = self.hidden_size // self.num_heads
++++        self.num_key_value_heads = config.num_key_value_heads
++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++        self.max_position_embeddings = config.max_position_embeddings
++++        self.rope_theta = config.rope_theta
++++        self.attention_dropout = config.attention_dropout
++++
++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++            raise ValueError(
++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++            )
++++
++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++
++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++            self.head_dim,
++++            max_position_embeddings=self.max_position_embeddings,
++++            base=self.rope_theta,
++++        )
++++
++++    def forward(
++++        self,
++++        hidden_states: mindspore.Tensor,
++++        attention_mask: Optional[mindspore.Tensor] = None,
++++        position_ids: Optional[mindspore.Tensor] = None,
++++        past_key_value: Optional[Cache] = None,
++++        output_attentions: bool = False,
++++        use_cache: bool = False,
++++        cache_position: Optional[mindspore.Tensor] = None,
++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++        bsz, q_len, _ = hidden_states.shape
++++
++++        # 1. 线性投射 Q, K, V
++++        query_states = self.q_proj(hidden_states)
++++        key_states = self.k_proj(hidden_states)
++++        value_states = self.v_proj(hidden_states)
++++
++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++        # 3. RoPE 旋转位置编码
++++        kv_seq_len = key_states.shape[-2]
++++        if past_key_value is not None:
++++            if self.layer_idx is None:
++++                raise ValueError(
++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++                    "with a layer index."
++++                )
++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++                if cache_position.shape[0] == 1:
++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++                    kv_seq_len = past_seen_tokens + 1
++++                else:
++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++            else:
++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++        # 4. KV 缓存更新
++++        if past_key_value is not None:
++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++            key_states, value_states = past_key_value.update(
++++                key_states, value_states, self.layer_idx, cache_kwargs
++++            )
++++            
++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++                if cache_position.shape[0] == 1:
++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++                    kv_seq_len = key_states.shape[-2]
++++
++++        # 5. [重要] 准备 Attention Mask
++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++        fa_attention_mask = None
++++        if attention_mask is not None:
++++            # 截取与当前key长度匹配的部分
++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++            fa_attention_mask = (mask_slice != 0)
++++
++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++        input_dtype = query_states.dtype
++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++            query_states = query_states.to(mindspore.float16)
++++            key_states = key_states.to(mindspore.float16)
++++            value_states = value_states.to(mindspore.float16)
++++
++++        # 6. [核心] 调用 flash_attention_score 算子
++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++        attn_output = mindspore.ops.flash_attention_score(
++++            query=query_states,
++++            key=key_states,
++++            value=value_states,
++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++            attn_mask=fa_attention_mask,
++++            keep_prob=1.0 - self.attention_dropout,
++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++            input_layout="BNSD",
++++            sparse_mode=0 # 使用 defaultMask 模式
++++        )
++++
++++        # 恢复原始数据类型
++++        attn_output = attn_output.to(input_dtype)
++++
++++        # 7. 调整输出形状
++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++        attn_output = self.o_proj(attn_output)
++++
++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++        attn_weights = None
++++        if output_attentions:
++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++        return attn_output, attn_weights, past_key_value
++++
++++    # def forward(
++++    #     self,
++++    #     hidden_states: mindspore.Tensor,
++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++    #     past_key_value: Optional[Cache] = None,
++++    #     output_attentions: bool = False,
++++    #     use_cache: bool = False,
++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++    #     bsz, q_len, _ = hidden_states.shape
++++
++++    #     # 1. 线性投射 Q, K, V
++++    #     query_states = self.q_proj(hidden_states)
++++    #     key_states = self.k_proj(hidden_states)
++++    #     value_states = self.v_proj(hidden_states)
++++
++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++    #     # 3. RoPE 旋转位置编码
++++    #     kv_seq_len = key_states.shape[-2]
++++    #     if past_key_value is not None:
++++    #         if self.layer_idx is None:
++++    #             raise ValueError(
++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++    #                 "with a layer index."
++++    #             )
++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++    #     # 4. KV 缓存更新
++++    #     if past_key_value is not None:
++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++    #         key_states, value_states = past_key_value.update(
++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++    #         )
++++
++++    #     # 5. 准备 Attention Mask
++++    #     fa_attention_mask = None
++++    #     if attention_mask is not None:
++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++    #         fa_attention_mask = (mask_slice != 0)
++++
++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++    #     input_dtype = query_states.dtype
++++
++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++    #     attn_output = mindspore.ops.flash_attention_score(
++++    #         query=query_states,
++++    #         key=key_states,
++++    #         value=value_states,
++++    #         head_num=self.num_heads,
++++    #         attn_mask=fa_attention_mask,
++++    #         keep_prob=1.0 - self.attention_dropout,
++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++    #         input_layout="BNSD",
++++    #         sparse_mode=0,
++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++    #         inner_precise=1
++++    #     )
++++
++++    #     # 恢复原始数据类型
++++    #     attn_output = attn_output.to(input_dtype)
++++
++++    #     # 7. 调整输出形状
++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++    #     attn_output = self.o_proj(attn_output)
++++
++++    #     attn_weights = None
++++    #     if output_attentions:
++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++    #     return attn_output, attn_weights, past_key_value
++++
++++    # def forward(
++++    #     self,
++++    #     hidden_states: mindspore.Tensor,
++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++    #     past_key_value: Optional[Cache] = None,
++++    #     output_attentions: bool = False,
++++    #     use_cache: bool = False,
++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++    #     bsz, q_len, _ = hidden_states.shape
++++
++++    #     query_states = self.q_proj(hidden_states)
++++    #     key_states = self.k_proj(hidden_states)
++++    #     value_states = self.v_proj(hidden_states)
++++
++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++    #     kv_seq_len = key_states.shape[-2]
++++    #     if past_key_value is not None:
++++    #         if self.layer_idx is None:
++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++    #     if past_key_value is not None:
++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++    #         key_states, value_states = past_key_value.update(
++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++    #         )
++++
++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++
++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++    #     # <--- 修改结束 ---
++++
++++    #     fa_attention_mask = None
++++    #     if attention_mask is not None:
++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++    #         fa_attention_mask = (mask_slice != 0)
++++
++++    #     input_dtype = query_states.dtype
++++
++++    #     attn_output = mindspore.ops.flash_attention_score(
++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++    #         key=key_states,
++++    #         value=value_states,
++++    #         head_num=self.num_heads,
++++    #         attn_mask=fa_attention_mask,
++++    #         keep_prob=1.0 - self.attention_dropout,
++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++    #         input_layout="BNSD",
++++    #         sparse_mode=0,
++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++    #     )
++++
++++    #     attn_output = attn_output.to(input_dtype)
++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++    #     attn_output = self.o_proj(attn_output)
++++
++++    #     attn_weights = None
++++    #     if output_attentions:
++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++
++++    #     return attn_output, attn_weights, past_key_value
++++
+++ QWEN2MOE_ATTENTION_CLASSES = {
+++     "eager": Qwen2MoeAttention,
++++    "flash-attention": Qwen2MoeFlashAttention,
+++ }
+++ 
+++ 
+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++ 
++++    #@dwj
++++    # 只遍历激活的专家，而非全部专家
+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++-        # router_logits: (batch * sequence_length, n_experts)
+++-        router_logits = self.gate(hidden_states)
+++-
+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-        if self.norm_topk_prob:
+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        # we cast back to the input dtype
+++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++-
+++-        final_hidden_states = ops.zeros(
+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++-        )
+++-
+++-        # One hot encode the selected experts to create an expert mask
+++-        # this will be used to easily index which expert is going to be sollicitated
+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++-
+++-        # Loop over all available experts in the model and perform the computation on each expert
+++-        for expert_idx in range(self.num_experts):
+++-            expert_layer = self.experts[expert_idx]
+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++-
+++-            # Index the correct hidden states and compute the expert hidden state for
+++-            # the current expert. We need to make sure to multiply the output hidden
+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++-            if 0 not in idx.shape:
+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++-
+++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++-                # the `top_x` tensor here.
+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++-
+++-        shared_expert_output = self.shared_expert(hidden_states)
+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++-
+++-        final_hidden_states = final_hidden_states + shared_expert_output
++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++            num_tokens = hidden_states_reshaped.shape[0]
++++
++++            router_logits = self.gate(hidden_states_reshaped)
++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++
++++            if self.norm_topk_prob:
++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++            
++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++            flat_selected_experts = selected_experts.flatten()
++++            
++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++            token_indices = broadcasted_token_indices.flatten()
++++            
++++            active_experts = ops.unique(flat_selected_experts)
++++            
++++            for expert_idx_tensor in active_experts:
++++                expert_idx = expert_idx_tensor.item()
++++                expert_layer = self.experts[expert_idx]
++++                
++++                mask = (flat_selected_experts == expert_idx_tensor)
++++                selected_token_indices = token_indices[mask]
++++                selected_routing_weights = routing_weights.flatten()[mask]
++++                
++++                current_states = hidden_states_reshaped[selected_token_indices]
++++                
++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++                
++++                final_hidden_states = final_hidden_states.index_add(
++++                    dim=0,
++++                    index=selected_token_indices,
++++                    source=expert_output.to(hidden_states.dtype)
++++                )
++++            
++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++ 
+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++-        return final_hidden_states, router_logits
++++            final_hidden_states = final_hidden_states + shared_expert_output
++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++            
++++            return final_hidden_states, router_logits
+++ 
+++ 
+++ class Qwen2MoeDecoderLayer(nn.Module):
+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++ 
+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++ 
++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++
+++         if (layer_idx not in config.mlp_only_layers) and (
+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++         ):
+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++     _skip_keys_device_placement = "past_key_values"
+++     _supports_cache_class = True
++++#lwx
++++    # _supports_static_cache = True
+++ 
+++     def _init_weights(self, module):
+++         std = self.config.initializer_range
+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++         return causal_mask
+++ 
+++ 
+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++     _tied_weights_keys = ["lm_head.weight"]
+++ 
+++     def __init__(self, config):
+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++         self.num_experts_per_tok = config.num_experts_per_tok
+++         # Initialize weights and apply final processing
+++         self.post_init()
++++        # @lwx
++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++        #     self.generation_config.cache_implementation = "static"
++++        self._warmed_up = False
++++
++++    def warmup_moe_model(self):
++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++        test_texts = [
++++            "warmup short",
++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++        ]
++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++        if tokenizer is None:
++++            from mindnlp.transformers import AutoTokenizer
++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++            self._warmup_tokenizer = tokenizer
++++
++++        for text in test_texts:
++++            inputs = tokenizer(text, return_tensors="ms")
++++            with mindspore._no_grad():
++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++ 
+++     def get_input_embeddings(self):
+++         return self.model.embed_tokens
+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++         ```"""
++++        if not self._warmed_up:
++++            self._warmed_up = True
++++            self.warmup_moe_model()
+++ 
+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++         output_router_logits = (
+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++             }
+++         )
+++         return model_inputs
++++# @lwx
++++    # def _decode_one_tokens_logits(
++++    #     self,
++++    #     cur_token: mindspore.Tensor,
++++    #     input_pos: Optional[mindspore.Tensor],
++++    #     cache_position: mindspore.Tensor,
++++    #     past_key_values: StaticCache,
++++    # ) -> mindspore.Tensor:
++++    #     """
++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++        
++++    #     Args:
++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++    #         input_pos: 输入位置信息，可选
++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++            
++++    #     Returns:
++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++    #     """
++++    #     # 调用JIT编译的版本
++++    #     return self.get_decode_one_tokens_logits(
++++    #         cur_token=cur_token,
++++    #         input_pos=input_pos,
++++    #         cache_position=cache_position,
++++    #         past_key_values=past_key_values,
++++    #     )
++++    
++++    # @mindspore.jit(jit_level='O1')
++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++    #     """
++++    #     JIT编译的函数，用于高效的单token解码
++++    #     使用JIT编译优化以支持静态shape和高效执行
++++        
++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++    #     """
++++    #     outputs = self.model.forward(
++++    #         input_ids=cur_token,
++++    #         position_ids=input_pos,
++++    #         cache_position=cache_position,
++++    #         past_key_values=past_key_values,
++++    #         use_cache=True,
++++    #         return_dict=False,
++++    #     )
++++        
++++    #     hidden_states = outputs[0]
++++    #     logits = self.lm_head.forward(hidden_states)
++++    #     logits = logits.float()
++++        
++++    #     return logits[:, -1, :]
++++
++++    # def _sample(
++++    #     self,
++++    #     input_ids: mindspore.Tensor,
++++    #     logits_processor,
++++    #     stopping_criteria,
++++    #     generation_config,
++++    #     synced_devices: bool,
++++    #     streamer=None,
++++    #     logits_warper=None,
++++    #     **model_kwargs,
++++    # ):
++++    #     """
++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++    #     """
++++    #     from ...generation.logits_process import LogitsProcessorList
++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++    #     from mindnlp.core import nn, ops, no_grad
++++    #     import numpy as np
++++        
++++    #     # 检查是否使用 StaticCache
++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++    #     # 否则，直接调用父类方法
++++    #     past_key_values = model_kwargs.get("past_key_values")
++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++
++++    #     if not isinstance(past_key_values, StaticCache):
++++    #         # 不使用 StaticCache，直接调用父类方法
++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++    #         return super()._sample(
++++    #             input_ids=input_ids,
++++    #             logits_processor=logits_processor,
++++    #             stopping_criteria=stopping_criteria,
++++    #             generation_config=generation_config,
++++    #             synced_devices=synced_devices,
++++    #             streamer=streamer,
++++    #             logits_warper=logits_warper,
++++    #             **model_kwargs,
++++    #         )
++++        
++++    #     # 使用 StaticCache，进入自定义循环
++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++    #     pad_token_id = generation_config._pad_token_tensor
++++    #     output_attentions = generation_config.output_attentions
++++    #     output_hidden_states = generation_config.output_hidden_states
++++    #     output_scores = generation_config.output_scores
++++    #     output_logits = generation_config.output_logits
++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++    #     max_length = generation_config.max_length
++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++    #     do_sample = generation_config.do_sample
++++        
++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++    #         raise ValueError(
++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++    #             f"{logits_warper})."
++++    #         )
++++        
++++    #     # init attention / hidden states / scores tuples
++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++        
++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++    #         encoder_hidden_states = (
++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++    #         )
++++        
++++    #     # keep track of which sequences are already finished
++++    #     batch_size, cur_len = input_ids.shape
++++    #     this_peer_finished = False
++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++        
++++    #     time_record = []
++++    #     from ....utils.testing_utils import parse_flag_from_env
++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++        
++++    #     while self._has_unfinished_sequences(
++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++    #     ):
++++    #         if _record_time:
++++    #             import time as time_module
++++    #             infer_start = time_module.time()
++++            
++++    #         # prepare model inputs
++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++            
++++    #         # prepare variable output controls
++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++            
++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++    #         cur_cache_position = model_inputs.get("cache_position")
++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++    #         cur_input_ids = model_inputs.get("input_ids")
++++            
++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++    #             cur_cache_position is not None and 
++++    #             len(cur_cache_position.shape) > 0 and
++++    #             cur_cache_position.shape[0] == 1 and
++++    #             cur_input_ids is not None and
++++    #             cur_input_ids.shape[1] == 1):
++++    #             # 使用 JIT 优化的单 token 解码
++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++    #             if not hasattr(self, '_jit_used'):
++++    #                 self._jit_used = False
++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++                
++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++    #                 cur_token=cur_input_ids,
++++    #                 input_pos=model_inputs.get("position_ids"),
++++    #                 cache_position=cur_cache_position,
++++    #                 past_key_values=cur_past_key_values,
++++    #             )
++++                
++++    #             # 标记已使用JIT（用于后续判断）
++++    #             if not self._jit_used:
++++    #                 self._jit_used = True
++++                
++++    #             # 构造兼容的输出对象
++++    #             class JitOptimizedOutput:
++++    #                 def __init__(self, logits, config):
++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++    #                     self.config = config
++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++    #                     self.cross_attentions = None
++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++                
++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++    #         else:
++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++    #             outputs = self(**model_inputs, return_dict=True)
++++            
++++    #         if synced_devices and this_peer_finished:
++++    #             continue
++++            
++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++    #         next_token_logits = outputs.logits[:, -1, :]
++++            
++++    #         # pre-process distribution
++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++    #         if do_sample:
++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++            
++++    #         # Store scores, attentions and hidden_states when required
++++    #         if return_dict_in_generate:
++++    #             if output_scores:
++++    #                 scores += (next_token_scores,)
++++    #             if output_logits:
++++    #                 raw_logits += (next_token_logits,)
++++    #             if output_attentions:
++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++    #                 if self.config.is_encoder_decoder:
++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++                
++++    #             if output_hidden_states:
++++    #                 hidden = (
++++    #                     outputs.decoder_hidden_states
++++    #                     if self.config.is_encoder_decoder
++++    #                     else outputs.hidden_states
++++    #                 )
++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++            
++++    #         # token selection
++++    #         if do_sample:
++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++    #         else:
++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++            
++++    #         # finished sentences should have their next token be a padding token
++++    #         if has_eos_stopping_criteria:
++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++            
++++    #         # update generated ids, model inputs, and length for next step
++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++    #         if streamer is not None:
++++    #             streamer.put(next_tokens)
++++            
++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++    #             outputs,
++++    #             model_kwargs,
++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++    #         )
++++            
++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++    #         cur_len += 1
++++            
++++    #         if _record_time:
++++    #             import time as time_module
++++    #             infer_stop = time_module.time()
++++    #             time_record.append(infer_stop - infer_start)
++++            
++++    #         del outputs
++++        
++++    #     average_infer_time = None
++++    #     if time_record:
++++    #         if len(time_record) > 1:
++++    #             time_record.pop(0)
++++    #         average_infer_time = sum(time_record) / len(time_record)
++++    #         print(f'average inference time is: {average_infer_time}')
++++    #         print(f'inference time record: {time_record}')
++++        
++++    #     if streamer is not None:
++++    #         streamer.end()
++++        
++++    #     # 简单判断：打印是否使用了JIT路径
++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++    #     else:
++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++        
++++    #     if return_dict_in_generate:
++++    #         if self.config.is_encoder_decoder:
++++    #             return GenerateEncoderDecoderOutput(
++++    #                 sequences=input_ids,
++++    #                 scores=scores,
++++    #                 logits=raw_logits,
++++    #                 encoder_attentions=encoder_attentions,
++++    #                 encoder_hidden_states=encoder_hidden_states,
++++    #                 decoder_attentions=decoder_attentions,
++++    #                 cross_attentions=cross_attentions,
++++    #                 decoder_hidden_states=decoder_hidden_states,
++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++    #                 average_infer_time=average_infer_time
++++    #             )
++++    #         else:
++++    #             return GenerateDecoderOnlyOutput(
++++    #                 sequences=input_ids,
++++    #                 scores=scores,
++++    #                 logits=raw_logits,
++++    #                 attentions=decoder_attentions,
++++    #                 hidden_states=decoder_hidden_states,
++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++    #                 average_infer_time=average_infer_time
++++    #             )
++++    #     else:
++++    #         return input_ids
++++            
++++    # def _prepare_cache_for_generation(
++++    #     self,
++++    #     generation_config,
++++    #     model_kwargs,
++++    #     assistant_model,
++++    #     batch_size,
++++    #     max_cache_length,
++++    # ):
++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++    #         generation_config.cache_implementation = "static"
++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++        
++++    #     if generation_config.cache_implementation == "static":
++++    #         base_required_from_max_length = generation_config.max_length + 1
++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++    #         min_cache_size = 50
++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++    #         else:
++++    #             max_cache_length = max(base_required, min_cache_size)
++++            
++++    #         original_max_cache_length = max_cache_length
++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++            
++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++    #             if max_cache_length > self.config.max_position_embeddings:
++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++        
++++    #     result = super()._prepare_cache_for_generation(
++++    #         generation_config=generation_config,
++++    #         model_kwargs=model_kwargs,
++++    #         assistant_model=assistant_model,
++++    #         batch_size=batch_size,
++++    #         max_cache_length=max_cache_length,
++++    #     )
++++        
++++    #     if generation_config.cache_implementation == "static":
++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++    #         created_cache = model_kwargs.get(cache_name)
++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++        
++++    #     return result
++++
++++
++++
+++ 
+++ 
+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++-- 
+++2.27.0
+++
++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++new file mode 100644
++index 00000000..22b65dd5
++--- /dev/null
+++++ b/patches/0002-20251106commit.patch
++@@ -0,0 +1,3200 @@
+++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Thu, 6 Nov 2025 09:20:38 +0800
+++Subject: [PATCH 2/3] 20251106commit
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+++ create mode 100644 patches/0001-20251104commit.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index d8303e45..73773c22 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+++         #     y = y + self.shared_experts(identity)
+++         # return y
+++         
++++    # @no_grad()
++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++
++++    #     expert_cache = ops.zeros_like(x)
++++    #     for i in range(self.num_experts_per_tok):
++++    #         expert_id = flat_expert_indices[i].item()
++++    #         weight = flat_expert_weights[i].item()
++++    #         expert = self.experts[expert_id]
++++    #         expert_out = expert(x)
++++    #         expert_cache += expert_out * weight
++++    #     return expert_cache
++++
+++     @no_grad()
+++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++        # x 的 shape: (1, hidden_size)
++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++
++++        # 1. 收集所有需要的专家层
++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++++
++++        # 2. 并行计算所有专家的输出
++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++        # ops.cat 会将它们堆叠成一个新的 Tensor
++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++
++++        # 3. 使用矩阵乘法进行加权求和
++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++        # 最终结果 final_output 的 shape: (1, hidden_size)
++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++        
++++        return final_output
+++ 
+++-        expert_cache = ops.zeros_like(x)
+++-        for i in range(self.num_experts_per_tok):
+++-            expert_id = flat_expert_indices[i].item()
+++-            weight = flat_expert_weights[i].item()
+++-            expert = self.experts[expert_id]
+++-            expert_out = expert(x)
+++-            expert_cache += expert_out * weight
+++-        return expert_cache
+++ 
+++     @no_grad()
+++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+++             key_states = self.k_proj(hidden_states)
+++             value_states = self.v_proj(hidden_states)
+++ 
+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++        # @lwx
++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++ 
+++         kv_seq_len = key_states.shape[-2]
+++         if past_key_value is not None:
+++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+++         return attn_output, attn_weights, past_key_value
+++ 
+++ 
++++# class DeepseekFlashAttention(nn.Module):
++++#     """
++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++++
++++#     This class is designed as a drop-in replacement for DeepseekAttention.
++++#     """
++++
++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++#         super().__init__()
++++#         self.config = config
++++#         self.layer_idx = layer_idx
++++#         if layer_idx is None:
++++#             logger.warning(
++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++#                 "when creating this class."
++++#             )
++++
++++#         self.attention_dropout = config.attention_dropout
++++#         self.hidden_size = config.hidden_size
++++#         self.num_heads = config.num_attention_heads
++++#         self.head_dim = self.hidden_size // self.num_heads
++++#         self.num_key_value_heads = config.num_key_value_heads
++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++#         self.max_position_embeddings = config.max_position_embeddings
++++#         self.rope_theta = config.rope_theta
++++#         self.is_causal = True
++++
++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++#             raise ValueError(
++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++#                 f" and `num_heads`: {self.num_heads})."
++++#             )
++++
++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++#         self._init_rope()
++++
++++#     def _init_rope(self):
++++#         if self.config.rope_scaling is None:
++++#             self.rotary_emb = DeepseekRotaryEmbedding(
++++#                 self.head_dim,
++++#                 max_position_embeddings=self.max_position_embeddings,
++++#                 base=self.rope_theta,
++++#             )
++++#         else:
++++#             scaling_type = self.config.rope_scaling["type"]
++++#             scaling_factor = self.config.rope_scaling["factor"]
++++#             if scaling_type == "linear":
++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++#                     self.head_dim,
++++#                     max_position_embeddings=self.max_position_embeddings,
++++#                     scaling_factor=scaling_factor,
++++#                     base=self.rope_theta,
++++#                 )
++++#             elif scaling_type == "dynamic":
++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++#                     self.head_dim,
++++#                     max_position_embeddings=self.max_position_embeddings,
++++#                     scaling_factor=scaling_factor,
++++#                     base=self.rope_theta,
++++#                 )
++++#             else:
++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++
++++#     def forward(
++++#         self,
++++#         hidden_states: mindspore.Tensor,
++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++#         position_ids: Optional[mindspore.Tensor] = None,
++++#         past_key_value: Optional[Cache] = None,
++++#         output_attentions: bool = False,
++++#         use_cache: bool = False,
++++#         **kwargs,
++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++#         if "padding_mask" in kwargs:
++++#             warnings.warn(
++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++#             )
++++        
++++#         if output_attentions:
++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++++
++++#         bsz, q_len, _ = hidden_states.shape
++++
++++#         if self.config.pretraining_tp > 1:
++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++
++++#         query_states = self.q_proj(hidden_states)
++++#         key_states = self.k_proj(hidden_states)
++++#         value_states = self.v_proj(hidden_states)
++++
++++#         # Reshape for multi-head attention
++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++#         kv_seq_len = key_states.shape[-2]
++++#         if past_key_value is not None:
++++#             if self.layer_idx is None:
++++#                 raise ValueError(
++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++#                     "with a layer index."
++++#                 )
++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++        
++++#         # Apply Rotary Positional Embedding
++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++#         if past_key_value is not None:
++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++
++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++        
++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++        
++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++
++++#         # Convert attention_mask for flash_attention_score
++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++++#         if attention_mask is not None:
++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++#                 raise ValueError(
++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++#                 )
++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++++#         else:
++++#             attn_mask_for_fa = None
++++        
++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++
++++#         # Call the fused flash_attention_score operator
++++#         attn_output = mindspore.ops.flash_attention_score(
++++#             query=query_states_for_fa,
++++#             key=key_states_for_fa,
++++#             value=value_states_for_fa,
++++#             head_num=self.num_heads, # This is N1, the number of query heads
++++#             input_layout='BSH',
++++#             attn_mask=attn_mask_for_fa,
++++#             keep_prob=keep_prob,
++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++#             sparse_mode=0 # Default mask mode
++++#         )
++++        
++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++++#         attn_output = self.o_proj(attn_output)
++++        
++++#         # Flash Attention does not return attention weights
++++#         attn_weights = None
++++
++++#         return attn_output, attn_weights, past_key_value
++++
++++class DeepseekFlashAttention(nn.Module):
++++    """
++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
++++    designed for high performance on supported hardware (Ascend).
++++
++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
++++    """
++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++        super().__init__()
++++        self.config = config
++++        self.layer_idx = layer_idx
++++        if layer_idx is None:
++++            logger.warning(
++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++                "when creating this class."
++++            )
++++
++++        # --- [FIX] Correctly initialize all required attributes ---
++++        self.attention_dropout = config.attention_dropout
++++        self.hidden_size = config.hidden_size
++++        self.num_heads = config.num_attention_heads
++++        self.head_dim = self.hidden_size // self.num_heads
++++        self.num_key_value_heads = config.num_key_value_heads
++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++        self.max_position_embeddings = config.max_position_embeddings
++++        self.rope_theta = config.rope_theta
++++        self.is_causal = True
++++
++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++            raise ValueError(
++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++                f" and `num_heads`: {self.num_heads})."
++++            )
++++
++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++        
++++        # This call will now succeed as all attributes are initialized.
++++        self._init_rope()
++++
++++    def _init_rope(self):
++++        if self.config.rope_scaling is None:
++++            self.rotary_emb = DeepseekRotaryEmbedding(
++++                self.head_dim,
++++                max_position_embeddings=self.max_position_embeddings,
++++                base=self.rope_theta,
++++            )
++++        else:
++++            scaling_type = self.config.rope_scaling["type"]
++++            scaling_factor = self.config.rope_scaling["factor"]
++++            if scaling_type == "linear":
++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++                    self.head_dim,
++++                    max_position_embeddings=self.max_position_embeddings,
++++                    scaling_factor=scaling_factor,
++++                    base=self.rope_theta,
++++                )
++++            elif scaling_type == "dynamic":
++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++                    self.head_dim,
++++                    max_position_embeddings=self.max_position_embeddings,
++++                    scaling_factor=scaling_factor,
++++                    base=self.rope_theta,
++++                )
++++            else:
++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++
++++    def forward(
++++        self,
++++        hidden_states: mindspore.Tensor,
++++        attention_mask: Optional[mindspore.Tensor] = None,
++++        position_ids: Optional[mindspore.Tensor] = None,
++++        past_key_value: Optional[Cache] = None,
++++        output_attentions: bool = False,
++++        use_cache: bool = False,
++++        **kwargs,
++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++        if "padding_mask" in kwargs:
++++            warnings.warn(
++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++            )
++++        if output_attentions:
++++            warnings.warn(
++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
++++            )
++++
++++        bsz, q_len, _ = hidden_states.shape
++++
++++        if self.config.pretraining_tp > 1:
++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++
++++        query_states = self.q_proj(hidden_states)
++++        key_states = self.k_proj(hidden_states)
++++        value_states = self.v_proj(hidden_states)
++++
++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++        kv_seq_len = key_states.shape[-2]
++++        if past_key_value is not None:
++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++        
++++        # Apply Rotary Position Embedding
++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++        if past_key_value is not None:
++++            cache_kwargs = {"sin": sin, "cos": cos}
++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++
++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
++++        # So we must explicitly repeat the KV heads.
++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
++++
++++        # Convert attention mask for flash_attention_score
++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
++++        if attention_mask is not None:
++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++                 raise ValueError(
++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++                )
++++            attn_mask_for_fa = attention_mask < 0
++++        else:
++++            attn_mask_for_fa = None
++++
++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++
++++        # Call the fused operator using the efficient BNSD layout
++++        attn_output = mindspore.ops.flash_attention_score(
++++            query=query_states,
++++            key=key_states,
++++            value=value_states,
++++            head_num=self.num_heads,
++++            input_layout='BNSD', # Specify the correct layout
++++            attn_mask=attn_mask_for_fa,
++++            keep_prob=keep_prob,
++++            scalar_value=1.0 / math.sqrt(self.head_dim)
++++        )
++++        
++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++        
++++        # Apply output projection
++++        attn_output = self.o_proj(attn_output)
++++
++++        # Flash attention does not return attention weights, so we return None.
++++        attn_weights = None
++++
++++        return attn_output, attn_weights, past_key_value
++++
+++ Deepseek_ATTENTION_CLASSES = {
+++     "eager": DeepseekAttention,
++++    "flash-attention": DeepseekFlashAttention,
+++ }
+++ 
+++ 
+++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+++             config=config, layer_idx=layer_idx
+++         )
+++ 
++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++++            config=config, layer_idx=layer_idx
++++        )
++++
+++         self.mlp = (
+++             DeepseekMoE(config)
+++             if (
+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++index d4c6b651..bced285c 100644
+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+++ 
+++ import mindspore
+++ import mindnlp.core.nn.functional as F
+++-from mindnlp.core import nn, ops
++++from mindnlp.core import nn, ops, no_grad
+++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+++ 
+++ from ....common.activations import ACT2FN
+++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++ 
++++Long_Prompt = False
++++PROMPT_LENGTH_THRESHOLD = 128
+++ 
+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+++         return attn_output, attn_weights, past_key_value
+++ 
+++ 
++++# class Qwen2MoeFlashAttention(nn.Module):
++++#     """
++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++
++++#     关键改动:
++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++#        直接传入原始的 key 和 value 张量效率更高。
++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++#         super().__init__()
++++#         self.config = config
++++#         self.layer_idx = layer_idx
++++#         self.hidden_size = config.hidden_size
++++#         self.num_heads = config.num_attention_heads
++++#         self.head_dim = self.hidden_size // self.num_heads
++++#         self.num_key_value_heads = config.num_key_value_heads
++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++#         self.max_position_embeddings = config.max_position_embeddings
++++#         self.rope_theta = config.rope_theta
++++#         self.attention_dropout = config.attention_dropout
++++
++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++#             raise ValueError(
++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++#             )
++++
++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++
++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++#             self.head_dim,
++++#             max_position_embeddings=self.max_position_embeddings,
++++#             base=self.rope_theta,
++++#         )
++++
++++#     def forward(
++++#         self,
++++#         hidden_states: mindspore.Tensor,
++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++#         position_ids: Optional[mindspore.Tensor] = None,
++++#         past_key_value: Optional[Cache] = None,
++++#         output_attentions: bool = False,
++++#         use_cache: bool = False,
++++#         cache_position: Optional[mindspore.Tensor] = None,
++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++#         bsz, q_len, _ = hidden_states.shape
++++
++++#         # 1. 线性投射 Q, K, V
++++#         query_states = self.q_proj(hidden_states)
++++#         key_states = self.k_proj(hidden_states)
++++#         value_states = self.v_proj(hidden_states)
++++
++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++#         # 3. RoPE 旋转位置编码
++++#         kv_seq_len = key_states.shape[-2]
++++#         if past_key_value is not None:
++++#             if self.layer_idx is None:
++++#                 raise ValueError(
++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++#                     "with a layer index."
++++#                 )
++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++#                 if cache_position.shape[0] == 1:
++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++#                     kv_seq_len = past_seen_tokens + 1
++++#                 else:
++++#                     # prefill 阶段：cache_position 是范围，使用其长度
++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++#             else:
++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++#         # 4. KV 缓存更新
++++#         if past_key_value is not None:
++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++#             key_states, value_states = past_key_value.update(
++++#                 key_states, value_states, self.layer_idx, cache_kwargs
++++#             )
++++            
++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++#                 if cache_position.shape[0] == 1:
++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++#                     kv_seq_len = key_states.shape[-2]
++++
++++#         # 5. [重要] 准备 Attention Mask
++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++#         fa_attention_mask = None
++++#         if attention_mask is not None:
++++#             # 截取与当前key长度匹配的部分
++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
++++#             fa_attention_mask = (mask_slice != 0)
++++
++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++#         input_dtype = query_states.dtype
++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++#             query_states = query_states.to(mindspore.float16)
++++#             key_states = key_states.to(mindspore.float16)
++++#             value_states = value_states.to(mindspore.float16)
++++
++++#         # 6. [核心] 调用 flash_attention_score 算子
++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++#         attn_output = mindspore.ops.flash_attention_score(
++++#             query=query_states,
++++#             key=key_states,
++++#             value=value_states,
++++#             head_num=self.num_heads, # 传入Q的头数(N1)
++++#             attn_mask=fa_attention_mask,
++++#             keep_prob=1.0 - self.attention_dropout,
++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++#             input_layout="BNSD",
++++#             sparse_mode=0 # 使用 defaultMask 模式
++++#         )
++++
++++#         # 恢复原始数据类型
++++#         attn_output = attn_output.to(input_dtype)
++++
++++#         # 7. 调整输出形状
++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++#         attn_output = self.o_proj(attn_output)
++++
++++#         # FlashAttention 算子不直接返回注意力权重矩阵
++++#         attn_weights = None
++++#         if output_attentions:
++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++#         return attn_output, attn_weights, past_key_value
++++
++++#     # def forward(
++++#     #     self,
++++#     #     hidden_states: mindspore.Tensor,
++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
++++#     #     position_ids: Optional[mindspore.Tensor] = None,
++++#     #     past_key_value: Optional[Cache] = None,
++++#     #     output_attentions: bool = False,
++++#     #     use_cache: bool = False,
++++#     #     cache_position: Optional[mindspore.Tensor] = None,
++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++#     #     bsz, q_len, _ = hidden_states.shape
++++
++++#     #     # 1. 线性投射 Q, K, V
++++#     #     query_states = self.q_proj(hidden_states)
++++#     #     key_states = self.k_proj(hidden_states)
++++#     #     value_states = self.v_proj(hidden_states)
++++
++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++
++++#     #     # 3. RoPE 旋转位置编码
++++#     #     kv_seq_len = key_states.shape[-2]
++++#     #     if past_key_value is not None:
++++#     #         if self.layer_idx is None:
++++#     #             raise ValueError(
++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++#     #                 "with a layer index."
++++#     #             )
++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++
++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++#     #     # 4. KV 缓存更新
++++#     #     if past_key_value is not None:
++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++#     #         key_states, value_states = past_key_value.update(
++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
++++#     #         )
++++
++++#     #     # 5. 准备 Attention Mask
++++#     #     fa_attention_mask = None
++++#     #     if attention_mask is not None:
++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++#     #         fa_attention_mask = (mask_slice != 0)
++++
++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++#     #     input_dtype = query_states.dtype
++++
++++#     #     # 6. [核心] 调用 flash_attention_score 算子
++++#     #     attn_output = mindspore.ops.flash_attention_score(
++++#     #         query=query_states,
++++#     #         key=key_states,
++++#     #         value=value_states,
++++#     #         head_num=self.num_heads,
++++#     #         attn_mask=fa_attention_mask,
++++#     #         keep_prob=1.0 - self.attention_dropout,
++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++#     #         input_layout="BNSD",
++++#     #         sparse_mode=0,
++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++#     #         inner_precise=1
++++#     #     )
++++
++++#     #     # 恢复原始数据类型
++++#     #     attn_output = attn_output.to(input_dtype)
++++
++++#     #     # 7. 调整输出形状
++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++#     #     attn_output = self.o_proj(attn_output)
++++
++++#     #     attn_weights = None
++++#     #     if output_attentions:
++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++
++++#     #     return attn_output, attn_weights, past_key_value
++++
++++
+++ class Qwen2MoeFlashAttention(nn.Module):
+++     """
+++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++-
+++-    关键改动:
+++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++-       直接传入原始的 key 和 value 张量效率更高。
+++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
++++
++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
++++    完全使用模型的低精度数据类型（如 float16）进行计算，
++++    以达到理论上的最高执行速度。
+++     """
+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++         super().__init__()
+++         self.config = config
+++         self.layer_idx = layer_idx
++++        if layer_idx is None:
++++            logger.warning_once(
++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
++++            )
++++
+++         self.hidden_size = config.hidden_size
+++         self.num_heads = config.num_attention_heads
+++         self.head_dim = self.hidden_size // self.num_heads
+++         self.num_key_value_heads = config.num_key_value_heads
+++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++         self.max_position_embeddings = config.max_position_embeddings
+++         self.rope_theta = config.rope_theta
+++         self.attention_dropout = config.attention_dropout
+++ 
+++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+++-            raise ValueError(
+++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++-            )
+++-
+++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+++         key_states = self.k_proj(hidden_states)
+++         value_states = self.v_proj(hidden_states)
+++ 
+++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++        # 2. 调整形状以匹配 BNSD 布局
+++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-
+++-        # 3. RoPE 旋转位置编码
++++        
++++        # 3. RoPE 和 KV 缓存
+++         kv_seq_len = key_states.shape[-2]
+++         if past_key_value is not None:
+++-            if self.layer_idx is None:
+++-                raise ValueError(
+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-                    "with a layer index."
+++-                )
+++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++-                if cache_position.shape[0] == 1:
+++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++-                    kv_seq_len = past_seen_tokens + 1
+++-                else:
+++-                    # prefill 阶段：cache_position 是范围，使用其长度
+++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++-            else:
+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-
++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++        
+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++ 
+++-        # 4. KV 缓存更新
+++         if past_key_value is not None:
+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-            key_states, value_states = past_key_value.update(
+++-                key_states, value_states, self.layer_idx, cache_kwargs
+++-            )
+++-            
+++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++-                if cache_position.shape[0] == 1:
+++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++-                    kv_seq_len = key_states.shape[-2]
+++-
+++-        # 5. [重要] 准备 Attention Mask
+++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++
++++        # 4. 准备 Attention Mask
+++         fa_attention_mask = None
+++         if attention_mask is not None:
+++-            # 截取与当前key长度匹配的部分
+++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++             fa_attention_mask = (mask_slice != 0)
+++ 
+++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++-        input_dtype = query_states.dtype
+++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++-            query_states = query_states.to(mindspore.float16)
+++-            key_states = key_states.to(mindspore.float16)
+++-            value_states = value_states.to(mindspore.float16)
+++-
+++-        # 6. [核心] 调用 flash_attention_score 算子
+++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+++         attn_output = mindspore.ops.flash_attention_score(
+++             query=query_states,
+++             key=key_states,
+++             value=value_states,
+++-            head_num=self.num_heads, # 传入Q的头数(N1)
++++            head_num=self.num_heads,
+++             attn_mask=fa_attention_mask,
+++-            keep_prob=1.0 - self.attention_dropout,
++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+++             scalar_value=1.0 / math.sqrt(self.head_dim),
+++             input_layout="BNSD",
+++-            sparse_mode=0 # 使用 defaultMask 模式
++++            sparse_mode=0,
++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+++         )
+++ 
+++-        # 恢复原始数据类型
+++-        attn_output = attn_output.to(input_dtype)
+++-
+++-        # 7. 调整输出形状
+++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++        # 6. 调整输出形状
+++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++         attn_output = self.o_proj(attn_output)
+++ 
+++-        # FlashAttention 算子不直接返回注意力权重矩阵
++++        # 7. 返回结果
+++         attn_weights = None
+++         if output_attentions:
+++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
+++-    # def forward(
+++-    #     self,
+++-    #     hidden_states: mindspore.Tensor,
+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++-    #     past_key_value: Optional[Cache] = None,
+++-    #     output_attentions: bool = False,
+++-    #     use_cache: bool = False,
+++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-
+++-    #     bsz, q_len, _ = hidden_states.shape
+++-
+++-    #     # 1. 线性投射 Q, K, V
+++-    #     query_states = self.q_proj(hidden_states)
+++-    #     key_states = self.k_proj(hidden_states)
+++-    #     value_states = self.v_proj(hidden_states)
+++-
+++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-
+++-    #     # 3. RoPE 旋转位置编码
+++-    #     kv_seq_len = key_states.shape[-2]
+++-    #     if past_key_value is not None:
+++-    #         if self.layer_idx is None:
+++-    #             raise ValueError(
+++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-    #                 "with a layer index."
+++-    #             )
+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++ 
+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-
+++-    #     # 4. KV 缓存更新
+++-    #     if past_key_value is not None:
+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-    #         key_states, value_states = past_key_value.update(
+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++-    #         )
+++-
+++-    #     # 5. 准备 Attention Mask
+++-    #     fa_attention_mask = None
+++-    #     if attention_mask is not None:
+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-    #         fa_attention_mask = (mask_slice != 0)
+++-
+++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++-    #     input_dtype = query_states.dtype
+++-
+++-    #     # 6. [核心] 调用 flash_attention_score 算子
+++-    #     attn_output = mindspore.ops.flash_attention_score(
+++-    #         query=query_states,
+++-    #         key=key_states,
+++-    #         value=value_states,
+++-    #         head_num=self.num_heads,
+++-    #         attn_mask=fa_attention_mask,
+++-    #         keep_prob=1.0 - self.attention_dropout,
+++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++-    #         input_layout="BNSD",
+++-    #         sparse_mode=0,
+++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++-    #         inner_precise=1
+++-    #     )
+++-
+++-    #     # 恢复原始数据类型
+++-    #     attn_output = attn_output.to(input_dtype)
++++QWEN2MOE_ATTENTION_CLASSES = {
++++    "eager": Qwen2MoeAttention,
++++    "flash-attention": Qwen2MoeFlashAttention,
++++}
+++ 
+++-    #     # 7. 调整输出形状
+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-    #     attn_output = self.o_proj(attn_output)
+++ 
+++-    #     attn_weights = None
+++-    #     if output_attentions:
+++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     def __init__(self, config):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         # gating
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     #@dwj
++++#     # 只遍历激活的专家，而非全部专家
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#             num_tokens = hidden_states_reshaped.shape[0]
++++
++++#             router_logits = self.gate(hidden_states_reshaped)
++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++
++++#             if self.norm_topk_prob:
++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++#             routing_weights = routing_weights.to(hidden_states.dtype)
++++            
++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++#             flat_selected_experts = selected_experts.flatten()
++++            
++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++#             token_indices = broadcasted_token_indices.flatten()
++++            
++++#             active_experts = ops.unique(flat_selected_experts)
++++            
++++#             for expert_idx_tensor in active_experts:
++++#                 expert_idx = expert_idx_tensor.item()
++++#                 expert_layer = self.experts[expert_idx]
++++                
++++#                 mask = (flat_selected_experts == expert_idx_tensor)
++++#                 selected_token_indices = token_indices[mask]
++++#                 selected_routing_weights = routing_weights.flatten()[mask]
++++                
++++#                 current_states = hidden_states_reshaped[selected_token_indices]
++++                
++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++                
++++#                 final_hidden_states = final_hidden_states.index_add(
++++#                     dim=0,
++++#                     index=selected_token_indices,
++++#                     source=expert_output.to(hidden_states.dtype)
++++#                 )
++++            
++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++ 
+++-    #     return attn_output, attn_weights, past_key_value
++++#             final_hidden_states = final_hidden_states + shared_expert_output
++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++            
++++#             return final_hidden_states, router_logits
++++
++++
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     """
++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         # 门控网络
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         # 专家列表
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++#         # 共享专家
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     @no_grad()
++++#     def _moe_infer_decode(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         """
++++#         【解码路径】针对 sequence_length=1 的极致优化。
++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++++#         """
++++#         batch_size, hidden_dim = hidden_states.shape
++++        
++++#         expert_outputs_list = [
++++#             ops.cat([
++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++#             ], dim=0) 
++++#             for i in range(batch_size)
++++#         ]
++++        
++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++++#         # shape: (batch_size, top_k, hidden_dim)
++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++        
++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++        
++++#         return moe_output.squeeze(1)
++++
++++#     @no_grad()
++++#     def _moe_infer_prefill(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         """
++++#         【预填充路径】针对 sequence_length > 1 的优化。
++++#         按专家对 Token 进行分组，并进行批处理。
++++#         """
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens = hidden_states.shape[0]
++++#         flat_selected_experts = selected_experts.flatten()
++++        
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++        
++++#         active_experts = ops.unique(flat_selected_experts)
++++        
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++            
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++#             selected_token_indices = token_indices[mask]
++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++            
++++#             current_states = hidden_states[selected_token_indices]
++++            
++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++            
++++#             moe_output = moe_output.index_add(
++++#                 dim=0,
++++#                 index=selected_token_indices,
++++#                 source=expert_output.to(hidden_states.dtype)
++++#             )
++++#         return moe_output
++++
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#         """
++++#         顶层 forward 方法，作为智能分发器。
++++#         """
++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++        
++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#         router_logits = self.gate(hidden_states_reshaped)
++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++ 
+++-    # def forward(
+++-    #     self,
+++-    #     hidden_states: mindspore.Tensor,
+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++-    #     past_key_value: Optional[Cache] = None,
+++-    #     output_attentions: bool = False,
+++-    #     use_cache: bool = False,
+++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-
+++-    #     bsz, q_len, _ = hidden_states.shape
+++-
+++-    #     query_states = self.q_proj(hidden_states)
+++-    #     key_states = self.k_proj(hidden_states)
+++-    #     value_states = self.v_proj(hidden_states)
+++-
+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-
+++-    #     kv_seq_len = key_states.shape[-2]
+++-    #     if past_key_value is not None:
+++-    #         if self.layer_idx is None:
+++-    #             raise ValueError("`layer_idx` must be specified for caching")
+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-
+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-
+++-    #     if past_key_value is not None:
+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-    #         key_states, value_states = past_key_value.update(
+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++-    #         )
++++#         if self.norm_topk_prob:
++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++        
++++#         moe_output = None
++++#         # 在推理时，根据序列长度选择最优路径
++++#         if not self.training:
++++#             if sequence_length == 1:
++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++#             else:
++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++#         else:
++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++++#             raise NotImplementedError("Training path is not implemented.")
++++
++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++++        
++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++++        
++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++++        
++++#         return final_hidden_states, router_logits
++++
++++
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     """
++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         # 门控网络
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         # 专家列表
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++#         # 共享专家
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     @no_grad()
++++#     def _moe_infer_decode(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         batch_size, _ = hidden_states.shape
++++#         expert_outputs_list = [
++++#             ops.cat([
++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++#             ], dim=0) 
++++#             for i in range(batch_size)
++++#         ]
++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++#         return moe_output.squeeze(1)
++++
++++#     @no_grad()
++++#     def _moe_infer_prefill(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens = hidden_states.shape[0]
++++#         flat_selected_experts = selected_experts.flatten()
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++#         active_experts = ops.unique(flat_selected_experts)
++++        
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++#             selected_token_indices = token_indices[mask]
++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++#             current_states = hidden_states[selected_token_indices]
++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++#             moe_output = moe_output.index_add(
++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++#             )
++++#         return moe_output
++++
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#         """
++++#         顶层 forward 方法，作为智能分发器。
++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++++#         """
++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++        
++++#         # 1. 门控计算 (通用逻辑)
++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#         router_logits = self.gate(hidden_states_reshaped)
++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++
++++#         if self.norm_topk_prob:
++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++        
++++#         # 2. 智能分发到最优 MoE 路径
++++#         moe_output = None
++++#         if not self.training:
++++#             if sequence_length == 1:
++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++#             else:
++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++#         else:
++++#             raise NotImplementedError("Training path is not implemented.")
++++
++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++        
++++#         # 4. 合并 MoE 输出和共享专家输出
++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++        
++++#         # 5. 恢复原始形状并返回
++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++        
++++#         return final_hidden_states, router_logits
++++
++++# prefill fastest
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     """
++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         # 门控网络
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         # 专家列表
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++#         # 共享专家
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     @no_grad()
++++#     def _moe_infer_dispatch(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         """
++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++++#         """
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens, _ = hidden_states.shape
++++        
++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++++#         flat_selected_experts = selected_experts.flatten()
++++#         flat_routing_weights = routing_weights.flatten()
+++ 
+++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++-
+++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++-    #     query_states = query_states / math.sqrt(self.head_dim)
+++-    #     # <--- 修改结束 ---
+++-
+++-    #     fa_attention_mask = None
+++-    #     if attention_mask is not None:
+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-    #         fa_attention_mask = (mask_slice != 0)
+++-
+++-    #     input_dtype = query_states.dtype
+++-
+++-    #     attn_output = mindspore.ops.flash_attention_score(
+++-    #         query=query_states,    # 传入已经预先缩放过的 query
+++-    #         key=key_states,
+++-    #         value=value_states,
+++-    #         head_num=self.num_heads,
+++-    #         attn_mask=fa_attention_mask,
+++-    #         keep_prob=1.0 - self.attention_dropout,
+++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++-    #         input_layout="BNSD",
+++-    #         sparse_mode=0,
+++-    #         inner_precise=1        # 仍然保持内部高精度计算
+++-    #     )
++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++ 
+++-    #     attn_output = attn_output.to(input_dtype)
+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-    #     attn_output = self.o_proj(attn_output)
++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++++#         active_experts = ops.unique(flat_selected_experts)
++++        
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++            
++++#             # 找到所有分配给该专家的 token
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++            
++++#             # 使用 mask 选取对应的 token 和权重
++++#             current_token_indices = token_indices[mask]
++++#             current_routing_weights = flat_routing_weights[mask]
++++#             current_hidden_states = hidden_states[current_token_indices]
++++            
++++#             # 对这些 token 进行批处理
++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++            
++++#             # 使用 index_add 将结果精确地加回到对应位置
++++#             moe_output = moe_output.index_add(
++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++++#             )
++++#         return moe_output
++++
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#         """
++++#         顶层 forward 方法，作为智能分发器。
++++#         """
++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++        
++++#         # 1. 门控计算
++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#         router_logits = self.gate(hidden_states_reshaped)
++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++
++++#         if self.norm_topk_prob:
++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++        
++++#         # 2. 调用统一的 MoE 计算内核
++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++ 
+++-    #     attn_weights = None
+++-    #     if output_attentions:
+++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++#         # 3. 统一处理共享专家
++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++        
++++#         # 4. 合并输出
++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++        
++++#         # 5. 恢复原始形状并返回
++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++        
++++#         return final_hidden_states, router_logits
++++
++++
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     """
++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++#     【最终高性能与高精度版】：
++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++++#     3. 这样实现了速度和准确性的两全其美。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     @no_grad()
++++#     def _moe_infer_decode(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         """
++++#         【解码路径】极致优化版：bmm + 高精度累加。
++++#         """
++++#         original_dtype = hidden_states.dtype
++++#         batch_size, _ = hidden_states.shape
++++        
++++#         expert_outputs_list = [
++++#             ops.cat([
++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++#             ], dim=0) 
++++#             for i in range(batch_size)
++++#         ]
++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++
++++#         # 在 float32 下执行 bmm，得到高精度结果
++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++        
++++#         # 将高精度结果转换回原始数据类型
++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++++        
++++#         return moe_output
++++
++++#     @no_grad()
++++#     def _moe_infer_prefill(
++++#         self, 
++++#         hidden_states: mindspore.Tensor, 
++++#         selected_experts: mindspore.Tensor, 
++++#         routing_weights: mindspore.Tensor
++++#     ) -> mindspore.Tensor:
++++#         """
++++#         【预填充路径】与原始实现一致，结果精确。
++++#         """
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens, _ = hidden_states.shape
++++#         flat_selected_experts = selected_experts.flatten()
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++#         active_experts = ops.unique(flat_selected_experts)
++++        
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++#             selected_token_indices = token_indices[mask]
++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++#             current_states = hidden_states[selected_token_indices]
++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++#             moe_output = moe_output.index_add(
++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++#             )
++++#         return moe_output
++++
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++        
++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#         router_logits = self.gate(hidden_states_reshaped)
++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++ 
+++-    #     return attn_output, attn_weights, past_key_value
++++#         if self.norm_topk_prob:
++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++++#         # 如果模型主体是 float16，后续再转换
++++        
++++#         moe_output = None
++++#         if not self.training:
++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++++#             # _moe_infer_decode 内部会处理好类型转换
++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++++#             if sequence_length == 1:
++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++#             else:
++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++#         else:
++++#             raise NotImplementedError("Training path is not implemented.")
++++
++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++        
++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++        
++++#         return final_hidden_states, router_logits
++++    
+++ 
+++-QWEN2MOE_ATTENTION_CLASSES = {
+++-    "eager": Qwen2MoeAttention,
+++-    "flash-attention": Qwen2MoeFlashAttention,
+++-}
++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++#     """
++++#     【融合版】一个混合专家模块，内置两种推理策略，
++++#     由外部全局变量 `Long_Prompt` 控制：
++++
++++#     - if Long_Prompt is True:  【精度优先模式】
++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++++#       适用于处理长序列，避免误差累积。
++++
++++#     - if Long_Prompt is False: 【速度优先模式】
++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++++#       在解码阶段获得极致速度，同时保证结果高度准确。
++++#     """
++++#     def __init__(self, config: Qwen2MoeConfig):
++++#         super().__init__()
++++#         self.num_experts = config.num_experts
++++#         self.top_k = config.num_experts_per_tok
++++#         self.norm_topk_prob = config.norm_topk_prob
++++
++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++#         self.experts = nn.ModuleList(
++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++#         )
++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++
++++#     # --- 速度优先模式的辅助函数 ---
++++#     @no_grad()
++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++#         original_dtype = hidden_states.dtype
++++#         batch_size, _ = hidden_states.shape
++++#         expert_outputs_list = [
++++#             ops.cat([
++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++#             ], dim=0) 
++++#             for i in range(batch_size)
++++#         ]
++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++#         weights_fp32 = routing_weights.to(mindspore.float32)
++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
++++
++++#     @no_grad()
++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens, _ = hidden_states.shape
++++#         flat_selected_experts = selected_experts.flatten()
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++#         active_experts = ops.unique(flat_selected_experts)
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++#             selected_token_indices = token_indices[mask]
++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++#             current_states = hidden_states[selected_token_indices]
++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++++#         return moe_output
++++
++++#     # --- 精度优先模式的辅助函数 ---
++++#     @no_grad()
++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++#         moe_output = ops.zeros_like(hidden_states)
++++#         num_tokens, _ = hidden_states.shape
++++#         flat_selected_experts = selected_experts.flatten()
++++#         flat_routing_weights = routing_weights.flatten()
++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++#         active_experts = ops.unique(flat_selected_experts)
++++#         for expert_idx_tensor in active_experts:
++++#             expert_idx = expert_idx_tensor.item()
++++#             expert_layer = self.experts[expert_idx]
++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++#             current_token_indices = token_indices[mask]
++++#             current_routing_weights = flat_routing_weights[mask]
++++#             current_hidden_states = hidden_states[current_token_indices]
++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++#         return moe_output
++++
++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++#         # 声明我们将要使用一个在模块外部定义的全局变量
++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++++#         global Long_Prompt
++++
++++#         # 1. 门控计算 (所有模式通用)
++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++#         router_logits = self.gate(hidden_states_reshaped)
++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++#         if self.norm_topk_prob:
++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++#         moe_output = None
++++#         if not self.training:
++++#             # 根据 Long_Prompt 标志选择模式
++++#             if Long_Prompt:
++++#                 # --- 精度优先模式 ---
++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++#             else:
++++#                 # --- 速度优先模式 ---
++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++#                 if sequence_length == 1:
++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++#                 else:
++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++#         else:
++++#             raise NotImplementedError("Training path is not implemented.")
++++
++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++        
++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++        
++++#         return final_hidden_states, router_logits
++++    
++++class Qwen2MoeSparseMoeBlock(nn.Module):
++++    """
++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++++    控制的顶级推理策略：
+++ 
++++    - if Long_Prompt is True:  【精度优先模式】
++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
++++      适用于需要严格可复现性的长序列任务。
+++ 
+++-class Qwen2MoeSparseMoeBlock(nn.Module):
+++-    def __init__(self, config):
++++    - if Long_Prompt is False: 【速度优先模式】
++++      采用业界最强的性能组合：
++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
++++    """
++++    def __init__(self, config: Qwen2MoeConfig):
+++         super().__init__()
+++         self.num_experts = config.num_experts
+++         self.top_k = config.num_experts_per_tok
+++         self.norm_topk_prob = config.norm_topk_prob
+++ 
+++-        # gating
+++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++         self.experts = nn.ModuleList(
+++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++         )
+++-
+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++ 
+++-    #@dwj
+++-    # 只遍历激活的专家，而非全部专家
+++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-            num_tokens = hidden_states_reshaped.shape[0]
+++-
+++-            router_logits = self.gate(hidden_states_reshaped)
+++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-
+++-            if self.norm_topk_prob:
+++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-            routing_weights = routing_weights.to(hidden_states.dtype)
+++-            
+++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++-            flat_selected_experts = selected_experts.flatten()
+++-            
+++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++-            token_indices = broadcasted_token_indices.flatten()
+++-            
+++-            active_experts = ops.unique(flat_selected_experts)
+++-            
+++-            for expert_idx_tensor in active_experts:
+++-                expert_idx = expert_idx_tensor.item()
+++-                expert_layer = self.experts[expert_idx]
+++-                
+++-                mask = (flat_selected_experts == expert_idx_tensor)
+++-                selected_token_indices = token_indices[mask]
+++-                selected_routing_weights = routing_weights.flatten()[mask]
+++-                
+++-                current_states = hidden_states_reshaped[selected_token_indices]
+++-                
+++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-                
+++-                final_hidden_states = final_hidden_states.index_add(
+++-                    dim=0,
+++-                    index=selected_token_indices,
+++-                    source=expert_output.to(hidden_states.dtype)
+++-                )
+++-            
+++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
++++    @no_grad()
++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++        original_dtype = hidden_states.dtype
++++        batch_size, _ = hidden_states.shape
++++        expert_outputs_list = [
++++            ops.cat([
++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++            ], dim=0)
++++            for i in range(batch_size)
++++        ]
++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++        weights_fp32 = routing_weights.to(mindspore.float32)
++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++        return moe_output_fp32.squeeze(1).to(original_dtype)
++++
++++    @no_grad()
++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++        num_tokens, _ = hidden_states.shape
++++        flat_selected_experts = selected_experts.flatten()
++++        sorted_expert_indices = flat_selected_experts.argsort()
++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++        original_token_indices = sorted_expert_indices // self.top_k
++++        moe_output = ops.zeros_like(hidden_states)
++++        current_token_offset = 0
++++        for i in range(self.num_experts):
++++            expert_token_count = tokens_per_expert[i] - current_token_offset
++++            if expert_token_count == 0:
++++                continue
++++            end_offset = current_token_offset + expert_token_count
++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++            expert_hidden_states = hidden_states[expert_original_token_indices]
++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++            current_token_offset += expert_token_count
++++        return moe_output
++++
++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++++    @no_grad()
++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++        moe_output = ops.zeros_like(hidden_states)
++++        num_tokens, _ = hidden_states.shape
++++        flat_selected_experts = selected_experts.flatten()
++++        flat_routing_weights = routing_weights.flatten()
++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++        active_experts = ops.unique(flat_selected_experts)
++++        for expert_idx_tensor in active_experts:
++++            expert_idx = expert_idx_tensor.item()
++++            expert_layer = self.experts[expert_idx]
++++            mask = (flat_selected_experts == expert_idx_tensor)
++++            current_token_indices = token_indices[mask]
++++            current_routing_weights = flat_routing_weights[mask]
++++            current_hidden_states = hidden_states[current_token_indices]
++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++        return moe_output
+++ 
+++-            final_hidden_states = final_hidden_states + shared_expert_output
+++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++-            
+++-            return final_hidden_states, router_logits
++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++        global Long_Prompt
++++
++++        # 1. 门控计算 (所有模式通用)
++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++        router_logits = self.gate(hidden_states_reshaped)
++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++        if self.norm_topk_prob:
++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++        
++++        moe_output = None
++++        if Long_Prompt:
++++            # --- 精度优先模式 (ACCURACY MODE) ---
++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        else:
++++            # --- 速度优先模式 (SPEED MODE) ---
++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++            if sequence_length == 1:
++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++            else:
++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        
+++ 
++++        # 3. 共享专家计算与合并 (所有模式通用)
++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++        
++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++        
++++        return final_hidden_states, router_logits
+++ 
+++ class Qwen2MoeDecoderLayer(nn.Module):
+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++         super().__init__()
+++         self.hidden_size = config.hidden_size
++++        
++++        # if Long_Prompt:
++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++        # else:
++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++ 
+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++ 
+++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++-
+++         if (layer_idx not in config.mlp_only_layers) and (
+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++         ):
+++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++             self._warmed_up = True
+++             self.warmup_moe_model()
+++ 
++++
++++
+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++         output_router_logits = (
+++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++             router_logits=outputs.router_logits,
+++         )
+++ 
++++    def generate(self, *args, **kwargs):
++++        """
++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++++        """
++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++
++++        input_ids = kwargs.get("input_ids")
++++        if input_ids is None and args:
++++            input_ids = args[0]
++++
++++        if input_ids is not None:
++++            prompt_length = input_ids.shape[1]
++++            
++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++++                Long_Prompt = True
++++            else:
++++                Long_Prompt = False
++++
++++        return super().generate(*args, **kwargs)
++++    
+++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+++     def prepare_inputs_for_generation(
+++         self,
+++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
++++        
+++         if past_key_values is not None:
+++             if inputs_embeds is not None:  # Exception 1
+++                 if 0 not in input_ids.shape:
+++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++             }
+++         )
+++         return model_inputs
++++
+++ # @lwx
+++     # def _decode_one_tokens_logits(
+++     #     self,
+++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+++             attentions=outputs.attentions,
+++         )
+++ 
++++
+++ __all__ = [
+++     "Qwen2MoeForCausalLM",
+++     "Qwen2MoeModel",
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++new file mode 100644
+++index 00000000..6dfb5b93
+++--- /dev/null
++++++ b/patches/0001-20251104commit.patch
+++@@ -0,0 +1,1272 @@
++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Tue, 4 Nov 2025 09:11:51 +0800
++++Subject: [PATCH] 20251104commit
++++
++++---
++++ mindnlp/transformers/cache_utils.py           |  28 +-
++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++ 3 files changed, 976 insertions(+), 87 deletions(-)
++++
++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++index cadd2e04..02f8d4be 100644
++++--- a/mindnlp/transformers/cache_utils.py
+++++++ b/mindnlp/transformers/cache_utils.py
++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++             # k_out[:, :, cache_position] = key_states
++++             # v_out[:, :, cache_position] = value_states
++++-            if ON_ORANGE_PI:
++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++-            else:
++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++-
+++++            # if ON_ORANGE_PI:
+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++            # else:
+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++            # 确保 cache_position 是 1D tensor 并且类型正确
+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++            if cache_position.ndim > 1:
+++++                cache_position = cache_position.flatten()
+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++                cache_position = cache_position.int()
+++++            
+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++            k_out[:, :, cache_position] = key_states
+++++            v_out[:, :, cache_position] = value_states
+++++            
++++         return k_out, v_out
++++ 
++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index c695b944..d8303e45 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++-    x1 = x[..., : x.shape[-1] // 2]
++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++    # x1 = x[..., : x.shape[-1] // 2]
+++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++         if self.training:
++++             raise NotImplementedError("Training is not supported yet.")
++++         else:
++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++-        if self.config.n_shared_experts is not None:
++++-            y = y + self.shared_experts(identity)
++++-        return y
+++++            # @lwx
+++++            if orig_shape[1] == 1:
+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++                y=y.view(*orig_shape)
+++++                if self.config.n_shared_experts is not None:
+++++                    y = y + self.shared_experts(identity)
+++++                return y
+++++            else:
+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++                if self.config.n_shared_experts is not None:
+++++                    y = y + self.shared_experts(identity)
+++++                return y
+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++        # if self.config.n_shared_experts is not None:
+++++        #     y = y + self.shared_experts(identity)
+++++        # return y
+++++        
+++++    @no_grad()
+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++
+++++        expert_cache = ops.zeros_like(x)
+++++        for i in range(self.num_experts_per_tok):
+++++            expert_id = flat_expert_indices[i].item()
+++++            weight = flat_expert_weights[i].item()
+++++            expert = self.experts[expert_id]
+++++            expert_out = expert(x)
+++++            expert_cache += expert_out * weight
+++++        return expert_cache
++++ 
++++     @no_grad()
++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-        # expert_cache = torch.zeros_like(x)
++++-        # idxs = flat_expert_indices.argsort()
++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++-        # token_idxs = idxs // self.num_experts_per_tok
++++-        # for i, end_idx in enumerate(tokens_per_expert):
++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++-        #     if start_idx == end_idx:
++++-        #         continue
++++-        #     expert = self.experts[i]
++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++-        #     expert_tokens = x[exp_token_idx]
++++-        #     expert_out = expert(expert_tokens)
++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++-        # return expert_cache
+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++         expert_cache = ops.zeros_like(x)
++++         idxs = flat_expert_indices.argsort()
++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++         token_idxs = idxs // self.num_experts_per_tok
+++++
++++         for i, end_idx in enumerate(tokens_per_expert):
++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++             if start_idx == end_idx:
++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++             expert_out = expert(expert_tokens)
++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++
++++         return expert_cache
+++++        
+++++    # @no_grad()
+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     # expert_cache = torch.zeros_like(x)
+++++    #     # idxs = flat_expert_indices.argsort()
+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++    #     # token_idxs = idxs // self.num_experts_per_tok
+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++    #     #     if start_idx == end_idx:
+++++    #     #         continue
+++++    #     #     expert = self.experts[i]
+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #     #     expert_tokens = x[exp_token_idx]
+++++    #     #     expert_out = expert(expert_tokens)
+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++    #     # return expert_cache
+++++    #     expert_cache = ops.zeros_like(x)
+++++    #     idxs = flat_expert_indices.argsort()
+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++    #     for i, end_idx in enumerate(tokens_per_expert):
+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++    #         if start_idx == end_idx:
+++++    #             continue
+++++    #         expert = self.experts[i]
+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #         expert_tokens = x[exp_token_idx]
+++++    #         expert_out = expert(expert_tokens)
+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++
+++++    #     return expert_cache
+++++    # @no_grad()
+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     expert_cache = ops.zeros_like(x)
+++++
+++++    #     # 排序保证顺序一致
+++++    #     idxs = flat_expert_indices.argsort()
+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++    #     # 找出有 token 的专家
+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++
+++++    #     for i in active_experts.tolist():
+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++    #         end_idx = tokens_per_expert[i]
+++++    #         if start_idx == end_idx:  # 没有 token
+++++    #             continue
+++++
+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #         expert_tokens = x[exp_token_idx]
+++++    #         expert_out = self.experts[i](expert_tokens)
+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++
+++++    #         expert_cache = mindspore.mint.scatter_add(
+++++    #             expert_cache,
+++++    #             0,
+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++    #             expert_out
+++++    #         )
+++++
+++++    #     return expert_cache
+++++
+++++
++++ 
++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++ #     """
++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++ 
++++         # Initialize weights and apply final processing
++++         self.post_init()
+++++        self.warm_up = False
+++++
+++++    def warmup_moe_model_deep(self):
+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++        test_texts = [
+++++            "warmup short",
+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++        ]
+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++        if tokenizer is None:
+++++            from mindnlp.transformers import AutoTokenizer
+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++            self._warmup_tokenizer = tokenizer
+++++
+++++        for text in test_texts:
+++++            inputs = tokenizer(text, return_tensors="ms")
+++++            with mindspore._no_grad():
+++++                _ = self(**inputs, use_cache=False)
+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++ 
++++     def get_input_embeddings(self):
++++         return self.model.embed_tokens
++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++         ```"""
+++++        if not self.warm_up:
+++++            self.warm_up = True
+++++            self.warmup_moe_model_deep()
+++++
++++         output_attentions = (
++++             output_attentions
++++             if output_attentions is not None
++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++index 3cbf820e..d4c6b651 100644
++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++@@ -18,7 +18,6 @@
++++ # See the License for the specific language governing permissions and
++++ # limitations under the License.
++++ """MindSpore Qwen2MoE model."""
++++-
++++ import math
++++ from typing import List, Optional, Tuple, Union
++++ 
++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++     TokenClassifierOutput,
++++ )
++++ from ...modeling_utils import PreTrainedModel
+++++from ...generation import GenerationMixin
++++ from ....utils import logging
++++ from .configuration_qwen2_moe import Qwen2MoeConfig
++++ 
++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++         self.variance_epsilon = eps
++++ 
++++     def forward(self, hidden_states):
+++++        # @dwj
+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++        # @lwx
+++++        # if not self.training :
+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++         input_dtype = hidden_states.dtype
++++         hidden_states = hidden_states.to(mindspore.float32)
++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++@@ -234,6 +239,8 @@ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++     x1 = x[..., : x.shape[-1] // 2]
++++     x2 = x[..., x.shape[-1] // 2 :]
+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++         self.config = config
++++         self.hidden_size = config.hidden_size
++++         self.intermediate_size = intermediate_size
+++++        
++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++         self.act_fn = ACT2FN[config.hidden_act]
++++ 
++++     def forward(self, x):
++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++-
++++ 
+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++        # @lwx 
+++++        # gate_up_output = self.gate_up_proj(x)
+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++        # return self.down_proj(swiglu_output)
+++++
+++++    # def forward(self, x):
+++++    #     gate_proj_out = self.gate_proj(x)
+++++    #     up_proj_out = self.up_proj(x)
+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++    #     return self.down_proj(swiglu_out)
+++++        
++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++     """
++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++         use_cache: bool = False,
++++         cache_position: Optional[mindspore.Tensor] = None,
++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++        
+++++
++++         bsz, q_len, _ = hidden_states.shape
++++ 
++++         query_states = self.q_proj(hidden_states)
++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++                     "with a layer index."
++++                 )
++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++            if isinstance(past_key_value, StaticCache):
+++++                kv_seq_len = key_states.shape[-2]
+++++            else:
+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++ 
++++         if past_key_value is not None:
++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++            
+++++            if isinstance(past_key_value, StaticCache):
+++++                kv_seq_len = key_states.shape[-2]
++++ 
++++         # repeat k/v heads if n_kv_heads < n_heads
++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++-
+++++        
++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++ 
++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++-            raise ValueError(
++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++-                f" {attn_weights.shape}"
++++-            )
++++-
++++-        if attention_mask is not None:  # no matter the length, we just slice it
++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++        if attention_mask is not None:
+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++             attn_weights = attn_weights + causal_mask
++++ 
++++         # upcast attention to fp32
++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++ 
++++         attn_output = self.o_proj(attn_output)
++++-
+++++        # @lwx
+++++        
+++++        # max_seq_len = self.max_position_embeddings  # 2048
+++++
+++++        # if attention_mask is not None:
+++++        #     # attention_mask: [B, 1, Sq, Sk]
+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++
+++++        #     # pad 到 [max_seq_len, max_seq_len]
+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++        #     global_attention_mask = padded_mask
+++++        # else:
+++++        #     global_attention_mask = None
+++++
+++++
+++++        # sparse_mode=3
+++++        # attn_output = mindspore.ops.flash_attention_score(
+++++        #     query=query_states,
+++++        #     key=key_states,
+++++        #     value=value_states,
+++++        #     real_shift=None,
+++++        #     padding_mask=None,
+++++
+++++        #     head_num=self.num_heads,
+++++        #     attn_mask=global_attention_mask,
+++++        #     keep_prob=1.0 - self.attention_dropout,
+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++        #     input_layout="BNSD",
+++++        #     pre_tokens=2147483647,
+++++        #     next_tokens=2147483647,
+++++        #     inner_precise=0,
+++++        #     drop_mask=None,
+++++        #     prefix=None,
+++++        #     actual_seq_qlen=None,
+++++        #     actual_seq_kvlen=None,
+++++        #     sparse_mode=sparse_mode,
+++++        # )
++++         if not output_attentions:
++++             attn_weights = None
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
++++ 
+++++class Qwen2MoeFlashAttention(nn.Module):
+++++    """
+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++
+++++    关键改动:
+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++       直接传入原始的 key 和 value 张量效率更高。
+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++    """
+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++        super().__init__()
+++++        self.config = config
+++++        self.layer_idx = layer_idx
+++++        self.hidden_size = config.hidden_size
+++++        self.num_heads = config.num_attention_heads
+++++        self.head_dim = self.hidden_size // self.num_heads
+++++        self.num_key_value_heads = config.num_key_value_heads
+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++        self.max_position_embeddings = config.max_position_embeddings
+++++        self.rope_theta = config.rope_theta
+++++        self.attention_dropout = config.attention_dropout
+++++
+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++            raise ValueError(
+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++            )
+++++
+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++
+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++            self.head_dim,
+++++            max_position_embeddings=self.max_position_embeddings,
+++++            base=self.rope_theta,
+++++        )
+++++
+++++    def forward(
+++++        self,
+++++        hidden_states: mindspore.Tensor,
+++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++        position_ids: Optional[mindspore.Tensor] = None,
+++++        past_key_value: Optional[Cache] = None,
+++++        output_attentions: bool = False,
+++++        use_cache: bool = False,
+++++        cache_position: Optional[mindspore.Tensor] = None,
+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++        bsz, q_len, _ = hidden_states.shape
+++++
+++++        # 1. 线性投射 Q, K, V
+++++        query_states = self.q_proj(hidden_states)
+++++        key_states = self.k_proj(hidden_states)
+++++        value_states = self.v_proj(hidden_states)
+++++
+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++        # 3. RoPE 旋转位置编码
+++++        kv_seq_len = key_states.shape[-2]
+++++        if past_key_value is not None:
+++++            if self.layer_idx is None:
+++++                raise ValueError(
+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++                    "with a layer index."
+++++                )
+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++                if cache_position.shape[0] == 1:
+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++                    kv_seq_len = past_seen_tokens + 1
+++++                else:
+++++                    # prefill 阶段：cache_position 是范围，使用其长度
+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++            else:
+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++        # 4. KV 缓存更新
+++++        if past_key_value is not None:
+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++            key_states, value_states = past_key_value.update(
+++++                key_states, value_states, self.layer_idx, cache_kwargs
+++++            )
+++++            
+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++                if cache_position.shape[0] == 1:
+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++                    kv_seq_len = key_states.shape[-2]
+++++
+++++        # 5. [重要] 准备 Attention Mask
+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++        fa_attention_mask = None
+++++        if attention_mask is not None:
+++++            # 截取与当前key长度匹配的部分
+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++            fa_attention_mask = (mask_slice != 0)
+++++
+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++        input_dtype = query_states.dtype
+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++            query_states = query_states.to(mindspore.float16)
+++++            key_states = key_states.to(mindspore.float16)
+++++            value_states = value_states.to(mindspore.float16)
+++++
+++++        # 6. [核心] 调用 flash_attention_score 算子
+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++        attn_output = mindspore.ops.flash_attention_score(
+++++            query=query_states,
+++++            key=key_states,
+++++            value=value_states,
+++++            head_num=self.num_heads, # 传入Q的头数(N1)
+++++            attn_mask=fa_attention_mask,
+++++            keep_prob=1.0 - self.attention_dropout,
+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++            input_layout="BNSD",
+++++            sparse_mode=0 # 使用 defaultMask 模式
+++++        )
+++++
+++++        # 恢复原始数据类型
+++++        attn_output = attn_output.to(input_dtype)
+++++
+++++        # 7. 调整输出形状
+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++        attn_output = self.o_proj(attn_output)
+++++
+++++        # FlashAttention 算子不直接返回注意力权重矩阵
+++++        attn_weights = None
+++++        if output_attentions:
+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++        return attn_output, attn_weights, past_key_value
+++++
+++++    # def forward(
+++++    #     self,
+++++    #     hidden_states: mindspore.Tensor,
+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++    #     past_key_value: Optional[Cache] = None,
+++++    #     output_attentions: bool = False,
+++++    #     use_cache: bool = False,
+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++    #     bsz, q_len, _ = hidden_states.shape
+++++
+++++    #     # 1. 线性投射 Q, K, V
+++++    #     query_states = self.q_proj(hidden_states)
+++++    #     key_states = self.k_proj(hidden_states)
+++++    #     value_states = self.v_proj(hidden_states)
+++++
+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++    #     # 3. RoPE 旋转位置编码
+++++    #     kv_seq_len = key_states.shape[-2]
+++++    #     if past_key_value is not None:
+++++    #         if self.layer_idx is None:
+++++    #             raise ValueError(
+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++    #                 "with a layer index."
+++++    #             )
+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++    #     # 4. KV 缓存更新
+++++    #     if past_key_value is not None:
+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++    #         key_states, value_states = past_key_value.update(
+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++    #         )
+++++
+++++    #     # 5. 准备 Attention Mask
+++++    #     fa_attention_mask = None
+++++    #     if attention_mask is not None:
+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++    #         fa_attention_mask = (mask_slice != 0)
+++++
+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++    #     input_dtype = query_states.dtype
+++++
+++++    #     # 6. [核心] 调用 flash_attention_score 算子
+++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++    #         query=query_states,
+++++    #         key=key_states,
+++++    #         value=value_states,
+++++    #         head_num=self.num_heads,
+++++    #         attn_mask=fa_attention_mask,
+++++    #         keep_prob=1.0 - self.attention_dropout,
+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++    #         input_layout="BNSD",
+++++    #         sparse_mode=0,
+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++    #         inner_precise=1
+++++    #     )
+++++
+++++    #     # 恢复原始数据类型
+++++    #     attn_output = attn_output.to(input_dtype)
+++++
+++++    #     # 7. 调整输出形状
+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++    #     attn_output = self.o_proj(attn_output)
+++++
+++++    #     attn_weights = None
+++++    #     if output_attentions:
+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++    #     return attn_output, attn_weights, past_key_value
+++++
+++++    # def forward(
+++++    #     self,
+++++    #     hidden_states: mindspore.Tensor,
+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++    #     past_key_value: Optional[Cache] = None,
+++++    #     output_attentions: bool = False,
+++++    #     use_cache: bool = False,
+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++    #     bsz, q_len, _ = hidden_states.shape
+++++
+++++    #     query_states = self.q_proj(hidden_states)
+++++    #     key_states = self.k_proj(hidden_states)
+++++    #     value_states = self.v_proj(hidden_states)
+++++
+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++    #     kv_seq_len = key_states.shape[-2]
+++++    #     if past_key_value is not None:
+++++    #         if self.layer_idx is None:
+++++    #             raise ValueError("`layer_idx` must be specified for caching")
+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++    #     if past_key_value is not None:
+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++    #         key_states, value_states = past_key_value.update(
+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++    #         )
+++++
+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++
+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++    #     query_states = query_states / math.sqrt(self.head_dim)
+++++    #     # <--- 修改结束 ---
+++++
+++++    #     fa_attention_mask = None
+++++    #     if attention_mask is not None:
+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++    #         fa_attention_mask = (mask_slice != 0)
+++++
+++++    #     input_dtype = query_states.dtype
+++++
+++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++    #         query=query_states,    # 传入已经预先缩放过的 query
+++++    #         key=key_states,
+++++    #         value=value_states,
+++++    #         head_num=self.num_heads,
+++++    #         attn_mask=fa_attention_mask,
+++++    #         keep_prob=1.0 - self.attention_dropout,
+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++    #         input_layout="BNSD",
+++++    #         sparse_mode=0,
+++++    #         inner_precise=1        # 仍然保持内部高精度计算
+++++    #     )
+++++
+++++    #     attn_output = attn_output.to(input_dtype)
+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++    #     attn_output = self.o_proj(attn_output)
+++++
+++++    #     attn_weights = None
+++++    #     if output_attentions:
+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++
+++++    #     return attn_output, attn_weights, past_key_value
+++++
++++ QWEN2MOE_ATTENTION_CLASSES = {
++++     "eager": Qwen2MoeAttention,
+++++    "flash-attention": Qwen2MoeFlashAttention,
++++ }
++++ 
++++ 
++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++ 
+++++    #@dwj
+++++    # 只遍历激活的专家，而非全部专家
++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        hidden_states = hidden_states.view(-1, hidden_dim)
++++-        # router_logits: (batch * sequence_length, n_experts)
++++-        router_logits = self.gate(hidden_states)
++++-
++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-        if self.norm_topk_prob:
++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        # we cast back to the input dtype
++++-        routing_weights = routing_weights.to(hidden_states.dtype)
++++-
++++-        final_hidden_states = ops.zeros(
++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++-        )
++++-
++++-        # One hot encode the selected experts to create an expert mask
++++-        # this will be used to easily index which expert is going to be sollicitated
++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++-
++++-        # Loop over all available experts in the model and perform the computation on each expert
++++-        for expert_idx in range(self.num_experts):
++++-            expert_layer = self.experts[expert_idx]
++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++-
++++-            # Index the correct hidden states and compute the expert hidden state for
++++-            # the current expert. We need to make sure to multiply the output hidden
++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++-            if 0 not in idx.shape:
++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++-
++++-                # However `index_add_` only support torch tensors for indexing so we'll use
++++-                # the `top_x` tensor here.
++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++-
++++-        shared_expert_output = self.shared_expert(hidden_states)
++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++-
++++-        final_hidden_states = final_hidden_states + shared_expert_output
+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++            num_tokens = hidden_states_reshaped.shape[0]
+++++
+++++            router_logits = self.gate(hidden_states_reshaped)
+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++
+++++            if self.norm_topk_prob:
+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++            routing_weights = routing_weights.to(hidden_states.dtype)
+++++            
+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++            flat_selected_experts = selected_experts.flatten()
+++++            
+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++            token_indices = broadcasted_token_indices.flatten()
+++++            
+++++            active_experts = ops.unique(flat_selected_experts)
+++++            
+++++            for expert_idx_tensor in active_experts:
+++++                expert_idx = expert_idx_tensor.item()
+++++                expert_layer = self.experts[expert_idx]
+++++                
+++++                mask = (flat_selected_experts == expert_idx_tensor)
+++++                selected_token_indices = token_indices[mask]
+++++                selected_routing_weights = routing_weights.flatten()[mask]
+++++                
+++++                current_states = hidden_states_reshaped[selected_token_indices]
+++++                
+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++                
+++++                final_hidden_states = final_hidden_states.index_add(
+++++                    dim=0,
+++++                    index=selected_token_indices,
+++++                    source=expert_output.to(hidden_states.dtype)
+++++                )
+++++            
+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++ 
++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++-        return final_hidden_states, router_logits
+++++            final_hidden_states = final_hidden_states + shared_expert_output
+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++            
+++++            return final_hidden_states, router_logits
++++ 
++++ 
++++ class Qwen2MoeDecoderLayer(nn.Module):
++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++ 
++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++ 
+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++
++++         if (layer_idx not in config.mlp_only_layers) and (
++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++         ):
++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++     _skip_keys_device_placement = "past_key_values"
++++     _supports_cache_class = True
+++++#lwx
+++++    # _supports_static_cache = True
++++ 
++++     def _init_weights(self, module):
++++         std = self.config.initializer_range
++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++         return causal_mask
++++ 
++++ 
++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++     _tied_weights_keys = ["lm_head.weight"]
++++ 
++++     def __init__(self, config):
++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++         self.num_experts_per_tok = config.num_experts_per_tok
++++         # Initialize weights and apply final processing
++++         self.post_init()
+++++        # @lwx
+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++        #     self.generation_config.cache_implementation = "static"
+++++        self._warmed_up = False
+++++
+++++    def warmup_moe_model(self):
+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++        test_texts = [
+++++            "warmup short",
+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++        ]
+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++        if tokenizer is None:
+++++            from mindnlp.transformers import AutoTokenizer
+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++            self._warmup_tokenizer = tokenizer
+++++
+++++        for text in test_texts:
+++++            inputs = tokenizer(text, return_tensors="ms")
+++++            with mindspore._no_grad():
+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++ 
++++     def get_input_embeddings(self):
++++         return self.model.embed_tokens
++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++         ```"""
+++++        if not self._warmed_up:
+++++            self._warmed_up = True
+++++            self.warmup_moe_model()
++++ 
++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++         output_router_logits = (
++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++             }
++++         )
++++         return model_inputs
+++++# @lwx
+++++    # def _decode_one_tokens_logits(
+++++    #     self,
+++++    #     cur_token: mindspore.Tensor,
+++++    #     input_pos: Optional[mindspore.Tensor],
+++++    #     cache_position: mindspore.Tensor,
+++++    #     past_key_values: StaticCache,
+++++    # ) -> mindspore.Tensor:
+++++    #     """
+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++        
+++++    #     Args:
+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++    #         input_pos: 输入位置信息，可选
+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++            
+++++    #     Returns:
+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++    #     """
+++++    #     # 调用JIT编译的版本
+++++    #     return self.get_decode_one_tokens_logits(
+++++    #         cur_token=cur_token,
+++++    #         input_pos=input_pos,
+++++    #         cache_position=cache_position,
+++++    #         past_key_values=past_key_values,
+++++    #     )
+++++    
+++++    # @mindspore.jit(jit_level='O1')
+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++    #     """
+++++    #     JIT编译的函数，用于高效的单token解码
+++++    #     使用JIT编译优化以支持静态shape和高效执行
+++++        
+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++    #     """
+++++    #     outputs = self.model.forward(
+++++    #         input_ids=cur_token,
+++++    #         position_ids=input_pos,
+++++    #         cache_position=cache_position,
+++++    #         past_key_values=past_key_values,
+++++    #         use_cache=True,
+++++    #         return_dict=False,
+++++    #     )
+++++        
+++++    #     hidden_states = outputs[0]
+++++    #     logits = self.lm_head.forward(hidden_states)
+++++    #     logits = logits.float()
+++++        
+++++    #     return logits[:, -1, :]
+++++
+++++    # def _sample(
+++++    #     self,
+++++    #     input_ids: mindspore.Tensor,
+++++    #     logits_processor,
+++++    #     stopping_criteria,
+++++    #     generation_config,
+++++    #     synced_devices: bool,
+++++    #     streamer=None,
+++++    #     logits_warper=None,
+++++    #     **model_kwargs,
+++++    # ):
+++++    #     """
+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++    #     """
+++++    #     from ...generation.logits_process import LogitsProcessorList
+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++    #     from mindnlp.core import nn, ops, no_grad
+++++    #     import numpy as np
+++++        
+++++    #     # 检查是否使用 StaticCache
+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++    #     # 否则，直接调用父类方法
+++++    #     past_key_values = model_kwargs.get("past_key_values")
+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++
+++++    #     if not isinstance(past_key_values, StaticCache):
+++++    #         # 不使用 StaticCache，直接调用父类方法
+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++    #         return super()._sample(
+++++    #             input_ids=input_ids,
+++++    #             logits_processor=logits_processor,
+++++    #             stopping_criteria=stopping_criteria,
+++++    #             generation_config=generation_config,
+++++    #             synced_devices=synced_devices,
+++++    #             streamer=streamer,
+++++    #             logits_warper=logits_warper,
+++++    #             **model_kwargs,
+++++    #         )
+++++        
+++++    #     # 使用 StaticCache，进入自定义循环
+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++    #     pad_token_id = generation_config._pad_token_tensor
+++++    #     output_attentions = generation_config.output_attentions
+++++    #     output_hidden_states = generation_config.output_hidden_states
+++++    #     output_scores = generation_config.output_scores
+++++    #     output_logits = generation_config.output_logits
+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++    #     max_length = generation_config.max_length
+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++    #     do_sample = generation_config.do_sample
+++++        
+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++    #         raise ValueError(
+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++    #             f"{logits_warper})."
+++++    #         )
+++++        
+++++    #     # init attention / hidden states / scores tuples
+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++        
+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++    #         encoder_hidden_states = (
+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++    #         )
+++++        
+++++    #     # keep track of which sequences are already finished
+++++    #     batch_size, cur_len = input_ids.shape
+++++    #     this_peer_finished = False
+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++        
+++++    #     time_record = []
+++++    #     from ....utils.testing_utils import parse_flag_from_env
+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++        
+++++    #     while self._has_unfinished_sequences(
+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++    #     ):
+++++    #         if _record_time:
+++++    #             import time as time_module
+++++    #             infer_start = time_module.time()
+++++            
+++++    #         # prepare model inputs
+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++            
+++++    #         # prepare variable output controls
+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++            
+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++    #         cur_cache_position = model_inputs.get("cache_position")
+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++    #         cur_input_ids = model_inputs.get("input_ids")
+++++            
+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++    #             cur_cache_position is not None and 
+++++    #             len(cur_cache_position.shape) > 0 and
+++++    #             cur_cache_position.shape[0] == 1 and
+++++    #             cur_input_ids is not None and
+++++    #             cur_input_ids.shape[1] == 1):
+++++    #             # 使用 JIT 优化的单 token 解码
+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++    #             if not hasattr(self, '_jit_used'):
+++++    #                 self._jit_used = False
+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++                
+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++    #                 cur_token=cur_input_ids,
+++++    #                 input_pos=model_inputs.get("position_ids"),
+++++    #                 cache_position=cur_cache_position,
+++++    #                 past_key_values=cur_past_key_values,
+++++    #             )
+++++                
+++++    #             # 标记已使用JIT（用于后续判断）
+++++    #             if not self._jit_used:
+++++    #                 self._jit_used = True
+++++                
+++++    #             # 构造兼容的输出对象
+++++    #             class JitOptimizedOutput:
+++++    #                 def __init__(self, logits, config):
+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++    #                     self.config = config
+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++    #                     self.cross_attentions = None
+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++                
+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++    #         else:
+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++    #             outputs = self(**model_inputs, return_dict=True)
+++++            
+++++    #         if synced_devices and this_peer_finished:
+++++    #             continue
+++++            
+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++    #         next_token_logits = outputs.logits[:, -1, :]
+++++            
+++++    #         # pre-process distribution
+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++    #         if do_sample:
+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++            
+++++    #         # Store scores, attentions and hidden_states when required
+++++    #         if return_dict_in_generate:
+++++    #             if output_scores:
+++++    #                 scores += (next_token_scores,)
+++++    #             if output_logits:
+++++    #                 raw_logits += (next_token_logits,)
+++++    #             if output_attentions:
+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++    #                 if self.config.is_encoder_decoder:
+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++                
+++++    #             if output_hidden_states:
+++++    #                 hidden = (
+++++    #                     outputs.decoder_hidden_states
+++++    #                     if self.config.is_encoder_decoder
+++++    #                     else outputs.hidden_states
+++++    #                 )
+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++            
+++++    #         # token selection
+++++    #         if do_sample:
+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++    #         else:
+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++            
+++++    #         # finished sentences should have their next token be a padding token
+++++    #         if has_eos_stopping_criteria:
+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++            
+++++    #         # update generated ids, model inputs, and length for next step
+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++    #         if streamer is not None:
+++++    #             streamer.put(next_tokens)
+++++            
+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++    #             outputs,
+++++    #             model_kwargs,
+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++    #         )
+++++            
+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++    #         cur_len += 1
+++++            
+++++    #         if _record_time:
+++++    #             import time as time_module
+++++    #             infer_stop = time_module.time()
+++++    #             time_record.append(infer_stop - infer_start)
+++++            
+++++    #         del outputs
+++++        
+++++    #     average_infer_time = None
+++++    #     if time_record:
+++++    #         if len(time_record) > 1:
+++++    #             time_record.pop(0)
+++++    #         average_infer_time = sum(time_record) / len(time_record)
+++++    #         print(f'average inference time is: {average_infer_time}')
+++++    #         print(f'inference time record: {time_record}')
+++++        
+++++    #     if streamer is not None:
+++++    #         streamer.end()
+++++        
+++++    #     # 简单判断：打印是否使用了JIT路径
+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++    #     else:
+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++        
+++++    #     if return_dict_in_generate:
+++++    #         if self.config.is_encoder_decoder:
+++++    #             return GenerateEncoderDecoderOutput(
+++++    #                 sequences=input_ids,
+++++    #                 scores=scores,
+++++    #                 logits=raw_logits,
+++++    #                 encoder_attentions=encoder_attentions,
+++++    #                 encoder_hidden_states=encoder_hidden_states,
+++++    #                 decoder_attentions=decoder_attentions,
+++++    #                 cross_attentions=cross_attentions,
+++++    #                 decoder_hidden_states=decoder_hidden_states,
+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++    #                 average_infer_time=average_infer_time
+++++    #             )
+++++    #         else:
+++++    #             return GenerateDecoderOnlyOutput(
+++++    #                 sequences=input_ids,
+++++    #                 scores=scores,
+++++    #                 logits=raw_logits,
+++++    #                 attentions=decoder_attentions,
+++++    #                 hidden_states=decoder_hidden_states,
+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++    #                 average_infer_time=average_infer_time
+++++    #             )
+++++    #     else:
+++++    #         return input_ids
+++++            
+++++    # def _prepare_cache_for_generation(
+++++    #     self,
+++++    #     generation_config,
+++++    #     model_kwargs,
+++++    #     assistant_model,
+++++    #     batch_size,
+++++    #     max_cache_length,
+++++    # ):
+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++    #         generation_config.cache_implementation = "static"
+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++        
+++++    #     if generation_config.cache_implementation == "static":
+++++    #         base_required_from_max_length = generation_config.max_length + 1
+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++    #         min_cache_size = 50
+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++    #         else:
+++++    #             max_cache_length = max(base_required, min_cache_size)
+++++            
+++++    #         original_max_cache_length = max_cache_length
+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++            
+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++    #             if max_cache_length > self.config.max_position_embeddings:
+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++        
+++++    #     result = super()._prepare_cache_for_generation(
+++++    #         generation_config=generation_config,
+++++    #         model_kwargs=model_kwargs,
+++++    #         assistant_model=assistant_model,
+++++    #         batch_size=batch_size,
+++++    #         max_cache_length=max_cache_length,
+++++    #     )
+++++        
+++++    #     if generation_config.cache_implementation == "static":
+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++    #         created_cache = model_kwargs.get(cache_name)
+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++    #             if created_cache.max_cache_len < generation_config.max_length:
+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++        
+++++    #     return result
+++++
+++++
+++++
++++ 
++++ 
++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++-- 
++++2.27.0
++++
+++-- 
+++2.27.0
+++
++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++new file mode 100644
++index 00000000..966529e4
++--- /dev/null
+++++ b/patches/0003-20261106secondcommit.patch
++@@ -0,0 +1,2769 @@
+++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Thu, 6 Nov 2025 14:54:37 +0800
+++Subject: [PATCH 3/3] 20261106secondcommit
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+++ patches/0001-20251104commit.patch             | 1272 -----------------
+++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+++ delete mode 100644 patches/0001-20251104commit.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index 73773c22..2f9192bf 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+++ 
+++ _CONFIG_FOR_DOC = "DeepseekConfig"
+++ 
++++_attn_mask_cache = {}
++++
++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
++++    q_len = batch_and_seq[1]
++++    kv_len = batch_and_seq[1] + past_key_values_length 
++++    key = (batch_and_seq[0], q_len, kv_len)
++++
++++    if key in _attn_mask_cache:
++++        return _attn_mask_cache[key]
++++
++++    mask = _prepare_4d_causal_attention_mask(
++++        attention_mask,
++++        batch_and_seq,
++++        inputs_embeds,
++++        past_key_values_length,
++++    )
++++    _attn_mask_cache[key] = mask
++++    return mask
+++ 
+++ def _get_unpad_data(attention_mask):
+++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+++         return final_output
+++ 
+++ 
+++-    @no_grad()
+++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++-        expert_cache = ops.zeros_like(x)
+++-        idxs = flat_expert_indices.argsort()
+++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++-        token_idxs = idxs // self.num_experts_per_tok
+++-
+++-        for i, end_idx in enumerate(tokens_per_expert):
+++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++-            if start_idx == end_idx:
+++-                continue
+++-            expert = self.experts[i]
+++-            exp_token_idx = token_idxs[start_idx:end_idx]
+++-            expert_tokens = x[exp_token_idx]
+++-            expert_out = expert(expert_tokens)
+++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++-
+++-        return expert_cache
+++-        
+++     # @no_grad()
+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-    #     # expert_cache = torch.zeros_like(x)
+++-    #     # idxs = flat_expert_indices.argsort()
+++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++-    #     # token_idxs = idxs // self.num_experts_per_tok
+++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++-    #     #     if start_idx == end_idx:
+++-    #     #         continue
+++-    #     #     expert = self.experts[i]
+++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++-    #     #     expert_tokens = x[exp_token_idx]
+++-    #     #     expert_out = expert(expert_tokens)
+++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++-    #     # return expert_cache
++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++     #     expert_cache = ops.zeros_like(x)
+++     #     idxs = flat_expert_indices.argsort()
+++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++ 
+++     #     return expert_cache
+++-    # @no_grad()
+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-    #     expert_cache = ops.zeros_like(x)
++++        
++++    @no_grad()
++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++        """
++++        优化版 MoE prefill：
++++        - 批量张量化处理同一个 expert 的所有 token
++++        - 跳过无 token 的专家
++++        - 保持结果完全一致
++++        """
++++        # 初始化输出缓存
++++        expert_cache = ops.zeros_like(x)
+++ 
+++-    #     # 排序保证顺序一致
+++-    #     idxs = flat_expert_indices.argsort()
+++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++-    #     token_idxs = idxs // self.num_experts_per_tok
++++        # 排序（确保 scatter_add 位置对应原逻辑）
++++        idxs = flat_expert_indices.argsort()
++++        sorted_expert_indices = flat_expert_indices[idxs]
++++        sorted_token_indices = idxs // self.num_experts_per_tok
+++ 
+++-    #     # 找出有 token 的专家
+++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++        # 每个 expert 的 token 数
++++        tokens_per_expert = sorted_expert_indices.bincount()
+++ 
+++-    #     for i in active_experts.tolist():
+++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++-    #         end_idx = tokens_per_expert[i]
+++-    #         if start_idx == end_idx:  # 没有 token
+++-    #             continue
++++        # 找出有 token 的专家
++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++ 
+++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++-    #         expert_tokens = x[exp_token_idx]
+++-    #         expert_out = self.experts[i](expert_tokens)
+++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++        for expert_id in active_experts.tolist():
++++            # 取该 expert 对应的排序后 token 区间
++++            start = (tokens_per_expert[:expert_id]).sum().item()
++++            end = start + tokens_per_expert[expert_id].item()
+++ 
+++-    #         expert_cache = mindspore.mint.scatter_add(
+++-    #             expert_cache,
+++-    #             0,
+++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++-    #             expert_out
+++-    #         )
++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
++++            expert_tokens = x[token_idx]                     # 取输入向量
+++ 
+++-    #     return expert_cache
++++            # 执行专家 MLP
++++            expert_out = self.experts[expert_id](expert_tokens)
++++
++++            # 按权重缩放
++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
++++
++++            # 回写到缓存（等价 scatter_add）
++++            expert_cache = mindspore.mint.scatter_add(
++++                expert_cache,
++++                0,
++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++                scaled_out
++++            )
++++
++++        return expert_cache
++++
++++        # @no_grad()
++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++        #     # expert_cache = torch.zeros_like(x)
++++        #     # idxs = flat_expert_indices.argsort()
++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++        #     # token_idxs = idxs // self.num_experts_per_tok
++++        #     # for i, end_idx in enumerate(tokens_per_expert):
++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++        #     #     if start_idx == end_idx:
++++        #     #         continue
++++        #     #     expert = self.experts[i]
++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++        #     #     expert_tokens = x[exp_token_idx]
++++        #     #     expert_out = expert(expert_tokens)
++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++        #     # return expert_cache
++++        #     expert_cache = ops.zeros_like(x)
++++        #     idxs = flat_expert_indices.argsort()
++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++        #     token_idxs = idxs // self.num_experts_per_tok
++++
++++        #     for i, end_idx in enumerate(tokens_per_expert):
++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++        #         if start_idx == end_idx:
++++        #             continue
++++        #         expert = self.experts[i]
++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++        #         expert_tokens = x[exp_token_idx]
++++        #         expert_out = expert(expert_tokens)
++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++
++++        #     return expert_cache
++++        # @no_grad()
++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++        #     expert_cache = ops.zeros_like(x)
++++
++++        #     # 排序保证顺序一致
++++        #     idxs = flat_expert_indices.argsort()
++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++        #     token_idxs = idxs // self.num_experts_per_tok
++++
++++        #     # 找出有 token 的专家
++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++
++++        #     for i in active_experts.tolist():
++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++        #         end_idx = tokens_per_expert[i]
++++        #         if start_idx == end_idx:  # 没有 token
++++        #             continue
++++
++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++        #         expert_tokens = x[exp_token_idx]
++++        #         expert_out = self.experts[i](expert_tokens)
++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++
++++        #         expert_cache = mindspore.mint.scatter_add(
++++        #             expert_cache,
++++        #             0,
++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++        #             expert_out
++++        #         )
++++
++++        #     return expert_cache
+++ 
+++ 
+++ 
+++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
+++-
+++ # class DeepseekFlashAttention(nn.Module):
+++ #     """
+++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
++++
+++ Deepseek_ATTENTION_CLASSES = {
+++     "eager": DeepseekAttention,
+++     "flash-attention": DeepseekFlashAttention,
+++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+++             )
+++         else:
+++             # 4d mask is passed through the layers
+++-            attention_mask = _prepare_4d_causal_attention_mask(
++++            # attention_mask = _prepare_4d_causal_attention_mask(
++++            #     attention_mask,
++++            #     (batch_size, seq_length),
++++            #     inputs_embeds,
++++            #     past_key_values_length,
++++            # )
++++            #@dwj
++++            attention_mask = get_cached_causal_mask(
+++                 attention_mask,
+++                 (batch_size, seq_length),
+++                 inputs_embeds,
+++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++         # Initialize weights and apply final processing
+++         self.post_init()
+++         self.warm_up = False
++++        #@dwj
++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++++            self.num_layers,
++++            self.num_attention_heads,
++++            self.head_dim,
++++            batch_size=1,
++++            max_length=self.max_length,
++++            dtype=mindspore.float16
++++        )
++++
++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++++        key_cache = []
++++        value_cache = []
++++        for _ in range(num_layers):
++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++            key_cache.append(k)
++++            value_cache.append(v)
++++        return key_cache, value_cache
++++
+++ 
+++     def warmup_moe_model_deep(self):
+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++index bced285c..ebd7782e 100644
+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++ 
+++-Long_Prompt = False
+++-PROMPT_LENGTH_THRESHOLD = 128
++++Long_Prompt = 1
++++LONG_PROMPT_LENGTH_THRESHOLD = 128
++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
++++
++++_causal_mask_cache = {}
++++
++++def get_cached_causal_mask_with_cache_position(
++++    attention_mask: mindspore.Tensor,
++++    sequence_length: int,
++++    target_length: int,
++++    dtype: mindspore.dtype,
++++    min_dtype: float,
++++    cache_position: mindspore.Tensor,
++++    batch_size: int,
++++):
++++    """
++++    带缓存的 causal mask 构造函数
++++    """
++++    # q_len 是当前 query 长度
++++    q_len = sequence_length
++++    # kv_len 是 target_length
++++    kv_len = target_length
++++
++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
++++
++++    if key in _causal_mask_cache:
++++        return _causal_mask_cache[key]
++++
++++    # 调用原来的 mask 构造逻辑
++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++        attention_mask,
++++        sequence_length=sequence_length,
++++        target_length=target_length,
++++        dtype=dtype,
++++        min_dtype=min_dtype,
++++        cache_position=cache_position,
++++        batch_size=batch_size,
++++    )
++++    # 缓存结果
++++    _causal_mask_cache[key] = causal_mask
++++    return causal_mask
+++ 
+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++ 
+++ 
+++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
++++# class Qwen2MoeAttention(nn.Module):
++++#     """
++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++++#     and "Generating Long Sequences with Sparse Transformers".
++++#     """
++++
++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++#         super().__init__()
++++#         self.config = config
++++#         self.layer_idx = layer_idx
++++#         if layer_idx is None:
++++#             logger.warning_once(
++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++#                 "when creating this class."
++++#             )
++++
++++#         self.hidden_size = config.hidden_size
++++#         self.num_heads = config.num_attention_heads
++++#         self.head_dim = self.hidden_size // self.num_heads
++++#         self.num_key_value_heads = config.num_key_value_heads
++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++#         self.max_position_embeddings = config.max_position_embeddings
++++#         self.rope_theta = config.rope_theta
++++#         self.is_causal = True
++++#         self.attention_dropout = config.attention_dropout
++++
++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++#             raise ValueError(
++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++#                 f" and `num_heads`: {self.num_heads})."
++++#             )
++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++
++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++#             self.head_dim,
++++#             max_position_embeddings=self.max_position_embeddings,
++++#             base=self.rope_theta,
++++#         )
++++
++++#     def forward(
++++#         self,
++++#         hidden_states: mindspore.Tensor,
++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++#         position_ids: Optional[mindspore.Tensor] = None,
++++#         past_key_value: Optional[Cache] = None,
++++#         output_attentions: bool = False,
++++#         use_cache: bool = False,
++++#         cache_position: Optional[mindspore.Tensor] = None,
++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++
++++        
++++
++++#         bsz, q_len, _ = hidden_states.shape
++++
++++#         query_states = self.q_proj(hidden_states)
++++#         key_states = self.k_proj(hidden_states)
++++#         value_states = self.v_proj(hidden_states)
++++
++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++
++++#         kv_seq_len = key_states.shape[-2]
++++#         if past_key_value is not None:
++++#             if self.layer_idx is None:
++++#                 raise ValueError(
++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++#                     "with a layer index."
++++#                 )
++++#             if isinstance(past_key_value, StaticCache):
++++#                 kv_seq_len = key_states.shape[-2]
++++#             else:
++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++
++++#         if past_key_value is not None:
++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++            
++++#             if isinstance(past_key_value, StaticCache):
++++#                 kv_seq_len = key_states.shape[-2]
++++
++++#         # repeat k/v heads if n_kv_heads < n_heads
++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++        
++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++
++++#         if attention_mask is not None:
++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++#             attn_weights = attn_weights + causal_mask
++++
++++#         # upcast attention to fp32
++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++#         attn_output = ops.matmul(attn_weights, value_states)
++++
++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++#             raise ValueError(
++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++++#                 f" {attn_output.shape}"
++++#             )
++++
++++#         attn_output = ops.transpose(attn_output, 1, 2)
++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++
++++#         attn_output = self.o_proj(attn_output)
++++#         # @lwx
++++        
++++#         # max_seq_len = self.max_position_embeddings  # 2048
++++
++++#         # if attention_mask is not None:
++++#         #     # attention_mask: [B, 1, Sq, Sk]
++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++
++++#         #     # pad 到 [max_seq_len, max_seq_len]
++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++#         #     global_attention_mask = padded_mask
++++#         # else:
++++#         #     global_attention_mask = None
++++
++++
++++#         # sparse_mode=3
++++#         # attn_output = mindspore.ops.flash_attention_score(
++++#         #     query=query_states,
++++#         #     key=key_states,
++++#         #     value=value_states,
++++#         #     real_shift=None,
++++#         #     padding_mask=None,
++++
++++#         #     head_num=self.num_heads,
++++#         #     attn_mask=global_attention_mask,
++++#         #     keep_prob=1.0 - self.attention_dropout,
++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++#         #     input_layout="BNSD",
++++#         #     pre_tokens=2147483647,
++++#         #     next_tokens=2147483647,
++++#         #     inner_precise=0,
++++#         #     drop_mask=None,
++++#         #     prefix=None,
++++#         #     actual_seq_qlen=None,
++++#         #     actual_seq_kvlen=None,
++++#         #     sparse_mode=sparse_mode,
++++#         # )
++++#         if not output_attentions:
++++#             attn_weights = None
++++
++++#         return attn_output, attn_weights, past_key_value
++++
+++ class Qwen2MoeAttention(nn.Module):
+++     """
+++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++-    and "Generating Long Sequences with Sparse Transformers".
+++-    """
++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+++ 
++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
++++
++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
++++    """
+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++         super().__init__()
+++         self.config = config
+++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+++         if layer_idx is None:
+++             logger.warning_once(
+++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++                 "when creating this class."
+++             )
+++ 
+++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+++         use_cache: bool = False,
+++         cache_position: Optional[mindspore.Tensor] = None,
+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-
+++         
+++-
++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+++         bsz, q_len, _ = hidden_states.shape
+++ 
+++         query_states = self.q_proj(hidden_states)
+++         key_states = self.k_proj(hidden_states)
+++         value_states = self.v_proj(hidden_states)
+++ 
+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++-
++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++        
+++         kv_seq_len = key_states.shape[-2]
+++         if past_key_value is not None:
+++-            if self.layer_idx is None:
+++-                raise ValueError(
+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-                    "with a layer index."
+++-                )
+++-            if isinstance(past_key_value, StaticCache):
+++-                kv_seq_len = key_states.shape[-2]
+++-            else:
+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++        
+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++ 
+++         if past_key_value is not None:
+++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++
++++        # --- 2. 动态调度核心注意力计算 ---
++++        global Long_Prompt
++++        if Long_Prompt >= 1:
++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
++++            fa_attention_mask = None
++++            if attention_mask is not None:
++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++                fa_attention_mask = (mask_slice != 0)
++++
++++            attn_output = mindspore.ops.flash_attention_score(
++++                query=query_states,
++++                key=key_states,
++++                value=value_states,
++++                head_num=self.num_heads,
++++                attn_mask=fa_attention_mask,
++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
++++                scalar_value=1.0 / math.sqrt(self.head_dim),
++++                input_layout="BNSD",
++++                sparse_mode=0,
++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
++++            )
+++             
+++-            if isinstance(past_key_value, StaticCache):
+++-                kv_seq_len = key_states.shape[-2]
++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++            attn_output = self.o_proj(attn_output)
++++            attn_weights = None
++++            if output_attentions:
++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++ 
+++-        # repeat k/v heads if n_kv_heads < n_heads
+++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++-        
+++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++        else:
++++            # --- Eager Attention 路径 (用于短序列和解码) ---
++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
++++            
++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++ 
+++-        if attention_mask is not None:
+++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++-            attn_weights = attn_weights + causal_mask
++++            if attention_mask is not None:
++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++                attn_weights = attn_weights + causal_mask
+++ 
+++-        # upcast attention to fp32
+++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++-        attn_output = ops.matmul(attn_weights, value_states)
++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++            attn_output = ops.matmul(attn_weights, value_states)
+++ 
+++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++-            raise ValueError(
+++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++-                f" {attn_output.shape}"
+++-            )
++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++                raise ValueError(
++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
++++                )
+++ 
+++-        attn_output = ops.transpose(attn_output, 1, 2)
+++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++            attn_output = ops.transpose(attn_output, 1, 2)
++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++            attn_output = self.o_proj(attn_output)
+++ 
+++-        attn_output = self.o_proj(attn_output)
+++-        # @lwx
++++            if not output_attentions:
++++                attn_weights = None
+++         
+++-        # max_seq_len = self.max_position_embeddings  # 2048
+++-
+++-        # if attention_mask is not None:
+++-        #     # attention_mask: [B, 1, Sq, Sk]
+++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++-
+++-        #     # pad 到 [max_seq_len, max_seq_len]
+++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++-        #     global_attention_mask = padded_mask
+++-        # else:
+++-        #     global_attention_mask = None
+++-
+++-
+++-        # sparse_mode=3
+++-        # attn_output = mindspore.ops.flash_attention_score(
+++-        #     query=query_states,
+++-        #     key=key_states,
+++-        #     value=value_states,
+++-        #     real_shift=None,
+++-        #     padding_mask=None,
+++-
+++-        #     head_num=self.num_heads,
+++-        #     attn_mask=global_attention_mask,
+++-        #     keep_prob=1.0 - self.attention_dropout,
+++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++-        #     input_layout="BNSD",
+++-        #     pre_tokens=2147483647,
+++-        #     next_tokens=2147483647,
+++-        #     inner_precise=0,
+++-        #     drop_mask=None,
+++-        #     prefix=None,
+++-        #     actual_seq_qlen=None,
+++-        #     actual_seq_kvlen=None,
+++-        #     sparse_mode=sparse_mode,
+++-        # )
+++-        if not output_attentions:
+++-            attn_weights = None
+++-
+++         return attn_output, attn_weights, past_key_value
+++ 
+++-
+++ # class Qwen2MoeFlashAttention(nn.Module):
+++ #     """
+++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+++ #             return final_hidden_states, router_logits
+++ 
+++ 
+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++-#     """
+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++-#     """
+++-#     def __init__(self, config: Qwen2MoeConfig):
+++-#         super().__init__()
+++-#         self.num_experts = config.num_experts
+++-#         self.top_k = config.num_experts_per_tok
+++-#         self.norm_topk_prob = config.norm_topk_prob
+++-
+++-#         # 门控网络
+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++-#         # 专家列表
+++-#         self.experts = nn.ModuleList(
+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++-#         )
+++-#         # 共享专家
+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_decode(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         """
+++-#         【解码路径】针对 sequence_length=1 的极致优化。
+++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++-#         """
+++-#         batch_size, hidden_dim = hidden_states.shape
+++-        
+++-#         expert_outputs_list = [
+++-#             ops.cat([
+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++-#             ], dim=0) 
+++-#             for i in range(batch_size)
+++-#         ]
+++-        
+++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++-#         # shape: (batch_size, top_k, hidden_dim)
+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++-        
+++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++-        
+++-#         return moe_output.squeeze(1)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_prefill(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         """
+++-#         【预填充路径】针对 sequence_length > 1 的优化。
+++-#         按专家对 Token 进行分组，并进行批处理。
+++-#         """
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens = hidden_states.shape[0]
+++-#         flat_selected_experts = selected_experts.flatten()
+++-        
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-        
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-        
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-            
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-#             selected_token_indices = token_indices[mask]
+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++-            
+++-#             current_states = hidden_states[selected_token_indices]
+++-            
+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-            
+++-#             moe_output = moe_output.index_add(
+++-#                 dim=0,
+++-#                 index=selected_token_indices,
+++-#                 source=expert_output.to(hidden_states.dtype)
+++-#             )
+++-#         return moe_output
+++-
+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-#         """
+++-#         顶层 forward 方法，作为智能分发器。
+++-#         """
+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        
+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-#         router_logits = self.gate(hidden_states_reshaped)
+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-
+++-#         if self.norm_topk_prob:
+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        
+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++-        
+++-#         moe_output = None
+++-#         # 在推理时，根据序列长度选择最优路径
+++-#         if not self.training:
+++-#             if sequence_length == 1:
+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++-#             else:
+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++-#         else:
+++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++-#             raise NotImplementedError("Training path is not implemented.")
+++-
+++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++-        
+++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++-        
+++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++-        
+++-#         return final_hidden_states, router_logits
+++-
+++-
+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++-#     """
+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++-#     """
+++-#     def __init__(self, config: Qwen2MoeConfig):
+++-#         super().__init__()
+++-#         self.num_experts = config.num_experts
+++-#         self.top_k = config.num_experts_per_tok
+++-#         self.norm_topk_prob = config.norm_topk_prob
+++-
+++-#         # 门控网络
+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++-#         # 专家列表
+++-#         self.experts = nn.ModuleList(
+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++-#         )
+++-#         # 共享专家
+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_decode(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         batch_size, _ = hidden_states.shape
+++-#         expert_outputs_list = [
+++-#             ops.cat([
+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++-#             ], dim=0) 
+++-#             for i in range(batch_size)
+++-#         ]
+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++-#         return moe_output.squeeze(1)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_prefill(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens = hidden_states.shape[0]
+++-#         flat_selected_experts = selected_experts.flatten()
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-        
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-#             selected_token_indices = token_indices[mask]
+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++-#             current_states = hidden_states[selected_token_indices]
+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-#             moe_output = moe_output.index_add(
+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++-#             )
+++-#         return moe_output
+++-
+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-#         """
+++-#         顶层 forward 方法，作为智能分发器。
+++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++-#         """
+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        
+++-#         # 1. 门控计算 (通用逻辑)
+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-#         router_logits = self.gate(hidden_states_reshaped)
+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-
+++-#         if self.norm_topk_prob:
+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        
+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++-        
+++-#         # 2. 智能分发到最优 MoE 路径
+++-#         moe_output = None
+++-#         if not self.training:
+++-#             if sequence_length == 1:
+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++-#             else:
+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++-#         else:
+++-#             raise NotImplementedError("Training path is not implemented.")
+++-
+++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++-        
+++-#         # 4. 合并 MoE 输出和共享专家输出
+++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++-        
+++-#         # 5. 恢复原始形状并返回
+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++-        
+++-#         return final_hidden_states, router_logits
+++-
+++-# prefill fastest
+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++-#     """
+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++-#     """
+++-#     def __init__(self, config: Qwen2MoeConfig):
+++-#         super().__init__()
+++-#         self.num_experts = config.num_experts
+++-#         self.top_k = config.num_experts_per_tok
+++-#         self.norm_topk_prob = config.norm_topk_prob
+++-
+++-#         # 门控网络
+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++-#         # 专家列表
+++-#         self.experts = nn.ModuleList(
+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++-#         )
+++-#         # 共享专家
+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_dispatch(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         """
+++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++-#         """
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens, _ = hidden_states.shape
+++-        
+++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++-#         flat_selected_experts = selected_experts.flatten()
+++-#         flat_routing_weights = routing_weights.flatten()
+++-
+++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-
+++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-        
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-            
+++-#             # 找到所有分配给该专家的 token
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-            
+++-#             # 使用 mask 选取对应的 token 和权重
+++-#             current_token_indices = token_indices[mask]
+++-#             current_routing_weights = flat_routing_weights[mask]
+++-#             current_hidden_states = hidden_states[current_token_indices]
+++-            
+++-#             # 对这些 token 进行批处理
+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++-            
+++-#             # 使用 index_add 将结果精确地加回到对应位置
+++-#             moe_output = moe_output.index_add(
+++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++-#             )
+++-#         return moe_output
+++-
+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-#         """
+++-#         顶层 forward 方法，作为智能分发器。
+++-#         """
+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        
+++-#         # 1. 门控计算
+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-#         router_logits = self.gate(hidden_states_reshaped)
+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-
+++-#         if self.norm_topk_prob:
+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        
+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++-        
+++-#         # 2. 调用统一的 MoE 计算内核
+++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++-
+++-#         # 3. 统一处理共享专家
+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++-        
+++-#         # 4. 合并输出
+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++-        
+++-#         # 5. 恢复原始形状并返回
+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++-        
+++-#         return final_hidden_states, router_logits
+++-
+++-
+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++-#     """
+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++-#     【最终高性能与高精度版】：
+++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++-#     3. 这样实现了速度和准确性的两全其美。
+++-#     """
+++-#     def __init__(self, config: Qwen2MoeConfig):
+++-#         super().__init__()
+++-#         self.num_experts = config.num_experts
+++-#         self.top_k = config.num_experts_per_tok
+++-#         self.norm_topk_prob = config.norm_topk_prob
+++-
+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++-#         self.experts = nn.ModuleList(
+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++-#         )
+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_decode(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         """
+++-#         【解码路径】极致优化版：bmm + 高精度累加。
+++-#         """
+++-#         original_dtype = hidden_states.dtype
+++-#         batch_size, _ = hidden_states.shape
+++-        
+++-#         expert_outputs_list = [
+++-#             ops.cat([
+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++-#             ], dim=0) 
+++-#             for i in range(batch_size)
+++-#         ]
+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++-
+++-#         # 在 float32 下执行 bmm，得到高精度结果
+++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++-        
+++-#         # 将高精度结果转换回原始数据类型
+++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++-        
+++-#         return moe_output
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_prefill(
+++-#         self, 
+++-#         hidden_states: mindspore.Tensor, 
+++-#         selected_experts: mindspore.Tensor, 
+++-#         routing_weights: mindspore.Tensor
+++-#     ) -> mindspore.Tensor:
+++-#         """
+++-#         【预填充路径】与原始实现一致，结果精确。
+++-#         """
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens, _ = hidden_states.shape
+++-#         flat_selected_experts = selected_experts.flatten()
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-        
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-#             selected_token_indices = token_indices[mask]
+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++-#             current_states = hidden_states[selected_token_indices]
+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-#             moe_output = moe_output.index_add(
+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++-#             )
+++-#         return moe_output
+++-
+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-        
+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-#         router_logits = self.gate(hidden_states_reshaped)
+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-
+++-#         if self.norm_topk_prob:
+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        
+++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++-#         # 如果模型主体是 float16，后续再转换
+++-        
+++-#         moe_output = None
+++-#         if not self.training:
+++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++-#             # _moe_infer_decode 内部会处理好类型转换
+++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++-#             if sequence_length == 1:
+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++-#             else:
+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++-#         else:
+++-#             raise NotImplementedError("Training path is not implemented.")
+++-
+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++-        
+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++-        
+++-#         return final_hidden_states, router_logits
+++-    
+++-
+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++-#     """
+++-#     【融合版】一个混合专家模块，内置两种推理策略，
+++-#     由外部全局变量 `Long_Prompt` 控制：
+++-
+++-#     - if Long_Prompt is True:  【精度优先模式】
+++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++-#       适用于处理长序列，避免误差累积。
+++-
+++-#     - if Long_Prompt is False: 【速度优先模式】
+++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+++-#     """
+++-#     def __init__(self, config: Qwen2MoeConfig):
+++-#         super().__init__()
+++-#         self.num_experts = config.num_experts
+++-#         self.top_k = config.num_experts_per_tok
+++-#         self.norm_topk_prob = config.norm_topk_prob
+++-
+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++-#         self.experts = nn.ModuleList(
+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++-#         )
+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++-
+++-#     # --- 速度优先模式的辅助函数 ---
+++-#     @no_grad()
+++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++-#         original_dtype = hidden_states.dtype
+++-#         batch_size, _ = hidden_states.shape
+++-#         expert_outputs_list = [
+++-#             ops.cat([
+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++-#             ], dim=0) 
+++-#             for i in range(batch_size)
+++-#         ]
+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++-
+++-#     @no_grad()
+++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens, _ = hidden_states.shape
+++-#         flat_selected_experts = selected_experts.flatten()
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-#             selected_token_indices = token_indices[mask]
+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++-#             current_states = hidden_states[selected_token_indices]
+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++-#         return moe_output
+++-
+++-#     # --- 精度优先模式的辅助函数 ---
+++-#     @no_grad()
+++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++-#         moe_output = ops.zeros_like(hidden_states)
+++-#         num_tokens, _ = hidden_states.shape
+++-#         flat_selected_experts = selected_experts.flatten()
+++-#         flat_routing_weights = routing_weights.flatten()
+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++-#         active_experts = ops.unique(flat_selected_experts)
+++-#         for expert_idx_tensor in active_experts:
+++-#             expert_idx = expert_idx_tensor.item()
+++-#             expert_layer = self.experts[expert_idx]
+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++-#             current_token_indices = token_indices[mask]
+++-#             current_routing_weights = flat_routing_weights[mask]
+++-#             current_hidden_states = hidden_states[current_token_indices]
+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++-#         return moe_output
+++-
+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++-#         global Long_Prompt
+++-
+++-#         # 1. 门控计算 (所有模式通用)
+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-#         router_logits = self.gate(hidden_states_reshaped)
+++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++-#         if self.norm_topk_prob:
+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-        
+++-#         moe_output = None
+++-#         if not self.training:
+++-#             # 根据 Long_Prompt 标志选择模式
+++-#             if Long_Prompt:
+++-#                 # --- 精度优先模式 ---
+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++-#             else:
+++-#                 # --- 速度优先模式 ---
+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++-#                 if sequence_length == 1:
+++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++-#                 else:
+++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++-#         else:
+++-#             raise NotImplementedError("Training path is not implemented.")
+++-
+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++-        
+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++-        
+++-#         return final_hidden_states, router_logits
+++-    
+++ class Qwen2MoeSparseMoeBlock(nn.Module):
+++     """
+++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++         return moe_output_fp32.squeeze(1).to(original_dtype)
+++ 
++++    # @no_grad()
++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++    #     num_tokens, _ = hidden_states.shape
++++    #     flat_selected_experts = selected_experts.flatten()
++++    #     sorted_expert_indices = flat_selected_experts.argsort()
++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++    #     original_token_indices = sorted_expert_indices // self.top_k
++++    #     moe_output = ops.zeros_like(hidden_states)
++++    #     current_token_offset = 0
++++    #     for i in range(self.num_experts):
++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
++++    #         if expert_token_count == 0:
++++    #             continue
++++    #         end_offset = current_token_offset + expert_token_count
++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++    #         current_token_offset += expert_token_count
++++    #     return moe_output
++++
+++     @no_grad()
+++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++-        num_tokens, _ = hidden_states.shape
+++-        flat_selected_experts = selected_experts.flatten()
+++-        sorted_expert_indices = flat_selected_experts.argsort()
+++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++-        original_token_indices = sorted_expert_indices // self.top_k
++++        """
++++        优化版 MoE prefill (速度优先模式)：
++++        - 批量张量化处理同一个 expert 的所有 token
++++        - 跳过无 token 的专家
++++        - 保持结果完全一致
++++        """
+++         moe_output = ops.zeros_like(hidden_states)
+++-        current_token_offset = 0
+++-        for i in range(self.num_experts):
+++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+++-            if expert_token_count == 0:
+++-                continue
+++-            end_offset = current_token_offset + expert_token_count
+++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++-            current_token_offset += expert_token_count
++++
++++        flat_selected_experts = selected_experts.flatten()
++++        flat_routing_weights = routing_weights.flatten()
++++
++++        idxs = flat_selected_experts.argsort()
++++        sorted_expert_indices = flat_selected_experts[idxs]
++++        sorted_token_indices = idxs // self.top_k
++++
++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
++++
++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++++
++++        for expert_id in active_experts.tolist():
++++            start = int(tokens_per_expert[:expert_id].sum().item())
++++            end = start + int(tokens_per_expert[expert_id].item())
++++
++++            token_idx = sorted_token_indices[start:end]
++++            expert_tokens = hidden_states[token_idx]
++++
++++            expert_out = self.experts[expert_id](expert_tokens)
++++
++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
++++
++++            moe_output = mindspore.mint.scatter_add(
++++                moe_output,
++++                0,
++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
++++                scaled_out.to(hidden_states.dtype)
++++            )
++++
+++         return moe_output
+++ 
++++
+++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++     @no_grad()
+++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++         
+++         moe_output = None
+++-        if Long_Prompt:
+++-            # --- 精度优先模式 (ACCURACY MODE) ---
+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        # if Long_Prompt==0:
++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        # else:
++++        #     # --- 速度优先模式 (SPEED MODE) ---
++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++        #     if sequence_length == 1:
++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        #     else:
++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++        
++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++        if sequence_length == 1:
++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++         else:
+++-            # --- 速度优先模式 (SPEED MODE) ---
+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++-            if sequence_length == 1:
+++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++-            else:
+++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++-        
++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++    
+++ 
+++         # 3. 共享专家计算与合并 (所有模式通用)
+++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++         
+++         return final_hidden_states, router_logits
+++ 
++++
+++ class Qwen2MoeDecoderLayer(nn.Module):
+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++         super().__init__()
+++         self.hidden_size = config.hidden_size
+++         
+++-        # if Long_Prompt:
+++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++-        # else:
++++        # if Long_Prompt == 2:
+++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++        # else:
++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++ 
+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++ 
+++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++             )
+++ 
+++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++        #     attention_mask,
++++        #     sequence_length=sequence_length,
++++        #     target_length=target_length,
++++        #     dtype=dtype,
++++        #     min_dtype=min_dtype,
++++        #     cache_position=cache_position,
++++        #     batch_size=input_tensor.shape[0],
++++        # )
++++        #@dwj
++++        causal_mask = get_cached_causal_mask_with_cache_position(
+++             attention_mask,
+++             sequence_length=sequence_length,
+++             target_length=target_length,
+++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++         """
+++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
++++        _causal_mask_cache.clear()
+++ 
+++         input_ids = kwargs.get("input_ids")
+++         if input_ids is None and args:
+++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++ 
+++         if input_ids is not None:
+++             prompt_length = input_ids.shape[1]
+++-            
+++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++-                Long_Prompt = True
++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
++++                Long_Prompt = 2
++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
++++                Long_Prompt = 0
+++             else:
+++-                Long_Prompt = False
++++                Long_Prompt = 1
++++
+++ 
+++         return super().generate(*args, **kwargs)
+++     
+++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++             dtype = self.lm_head.weight.dtype
+++             min_dtype = float(ops.finfo(dtype).min)
+++ 
+++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++            #     attention_mask,
++++            #     sequence_length=sequence_length,
++++            #     target_length=past_key_values.get_max_length(),
++++            #     dtype=dtype,
++++            #     min_dtype=min_dtype,
++++            #     cache_position=cache_position,
++++            #     batch_size=batch_size,
++++            # )
++++
++++            #@dwj
++++            attention_mask = get_cached_causal_mask_with_cache_position(
+++                 attention_mask,
+++                 sequence_length=sequence_length,
+++                 target_length=past_key_values.get_max_length(),
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++deleted file mode 100644
+++index 6dfb5b93..00000000
+++--- a/patches/0001-20251104commit.patch
++++++ /dev/null
+++@@ -1,1272 +0,0 @@
+++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++-From: Pinoeer-kingxi <13022943007@163.com>
+++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+++-Subject: [PATCH] 20251104commit
+++-
+++----
+++- mindnlp/transformers/cache_utils.py           |  28 +-
+++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++- 3 files changed, 976 insertions(+), 87 deletions(-)
+++-
+++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++-index cadd2e04..02f8d4be 100644
+++---- a/mindnlp/transformers/cache_utils.py
+++-+++ b/mindnlp/transformers/cache_utils.py
+++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++-             # k_out[:, :, cache_position] = key_states
+++-             # v_out[:, :, cache_position] = value_states
+++--            if ON_ORANGE_PI:
+++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++--            else:
+++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++--
+++-+            # if ON_ORANGE_PI:
+++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++-+            # else:
+++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++-+            if cache_position.ndim > 1:
+++-+                cache_position = cache_position.flatten()
+++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++-+                cache_position = cache_position.int()
+++-+            
+++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++-+            k_out[:, :, cache_position] = key_states
+++-+            v_out[:, :, cache_position] = value_states
+++-+            
+++-         return k_out, v_out
+++- 
+++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++-index c695b944..d8303e45 100644
+++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+++- def rotate_half(x):
+++-     """Rotates half the hidden dims of the input."""
+++--    x1 = x[..., : x.shape[-1] // 2]
+++--    x2 = x[..., x.shape[-1] // 2 :]
+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++-+    # x1 = x[..., : x.shape[-1] // 2]
+++-+    # x2 = x[..., x.shape[-1] // 2 :]
+++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++-     return ops.cat((-x2, x1), dim=-1)
+++- 
+++- 
+++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++-         if self.training:
+++-             raise NotImplementedError("Training is not supported yet.")
+++-         else:
+++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++--        if self.config.n_shared_experts is not None:
+++--            y = y + self.shared_experts(identity)
+++--        return y
+++-+            # @lwx
+++-+            if orig_shape[1] == 1:
+++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++-+                y=y.view(*orig_shape)
+++-+                if self.config.n_shared_experts is not None:
+++-+                    y = y + self.shared_experts(identity)
+++-+                return y
+++-+            else:
+++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++-+                if self.config.n_shared_experts is not None:
+++-+                    y = y + self.shared_experts(identity)
+++-+                return y
+++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++-+        # if self.config.n_shared_experts is not None:
+++-+        #     y = y + self.shared_experts(identity)
+++-+        # return y
+++-+        
+++-+    @no_grad()
+++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++-+
+++-+        expert_cache = ops.zeros_like(x)
+++-+        for i in range(self.num_experts_per_tok):
+++-+            expert_id = flat_expert_indices[i].item()
+++-+            weight = flat_expert_weights[i].item()
+++-+            expert = self.experts[expert_id]
+++-+            expert_out = expert(x)
+++-+            expert_cache += expert_out * weight
+++-+        return expert_cache
+++- 
+++-     @no_grad()
+++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++--        # expert_cache = torch.zeros_like(x)
+++--        # idxs = flat_expert_indices.argsort()
+++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++--        # token_idxs = idxs // self.num_experts_per_tok
+++--        # for i, end_idx in enumerate(tokens_per_expert):
+++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++--        #     if start_idx == end_idx:
+++--        #         continue
+++--        #     expert = self.experts[i]
+++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++--        #     expert_tokens = x[exp_token_idx]
+++--        #     expert_out = expert(expert_tokens)
+++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++--        # return expert_cache
+++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++-         expert_cache = ops.zeros_like(x)
+++-         idxs = flat_expert_indices.argsort()
+++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++-         token_idxs = idxs // self.num_experts_per_tok
+++-+
+++-         for i, end_idx in enumerate(tokens_per_expert):
+++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++-             if start_idx == end_idx:
+++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++-             expert_out = expert(expert_tokens)
+++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++-+
+++-         return expert_cache
+++-+        
+++-+    # @no_grad()
+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-+    #     # expert_cache = torch.zeros_like(x)
+++-+    #     # idxs = flat_expert_indices.argsort()
+++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++-+    #     #     if start_idx == end_idx:
+++-+    #     #         continue
+++-+    #     #     expert = self.experts[i]
+++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++-+    #     #     expert_tokens = x[exp_token_idx]
+++-+    #     #     expert_out = expert(expert_tokens)
+++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++-+    #     # return expert_cache
+++-+    #     expert_cache = ops.zeros_like(x)
+++-+    #     idxs = flat_expert_indices.argsort()
+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++-+
+++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++-+    #         if start_idx == end_idx:
+++-+    #             continue
+++-+    #         expert = self.experts[i]
+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++-+    #         expert_tokens = x[exp_token_idx]
+++-+    #         expert_out = expert(expert_tokens)
+++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++-+
+++-+    #     return expert_cache
+++-+    # @no_grad()
+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++-+    #     expert_cache = ops.zeros_like(x)
+++-+
+++-+    #     # 排序保证顺序一致
+++-+    #     idxs = flat_expert_indices.argsort()
+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++-+
+++-+    #     # 找出有 token 的专家
+++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++-+
+++-+    #     for i in active_experts.tolist():
+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++-+    #         end_idx = tokens_per_expert[i]
+++-+    #         if start_idx == end_idx:  # 没有 token
+++-+    #             continue
+++-+
+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++-+    #         expert_tokens = x[exp_token_idx]
+++-+    #         expert_out = self.experts[i](expert_tokens)
+++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++-+
+++-+    #         expert_cache = mindspore.mint.scatter_add(
+++-+    #             expert_cache,
+++-+    #             0,
+++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++-+    #             expert_out
+++-+    #         )
+++-+
+++-+    #     return expert_cache
+++-+
+++-+
+++- 
+++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++- #     """
+++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++- 
+++-         # Initialize weights and apply final processing
+++-         self.post_init()
+++-+        self.warm_up = False
+++-+
+++-+    def warmup_moe_model_deep(self):
+++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++-+        test_texts = [
+++-+            "warmup short",
+++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++-+        ]
+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++-+        if tokenizer is None:
+++-+            from mindnlp.transformers import AutoTokenizer
+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++-+            self._warmup_tokenizer = tokenizer
+++-+
+++-+        for text in test_texts:
+++-+            inputs = tokenizer(text, return_tensors="ms")
+++-+            with mindspore._no_grad():
+++-+                _ = self(**inputs, use_cache=False)
+++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++- 
+++-     def get_input_embeddings(self):
+++-         return self.model.embed_tokens
+++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++-         ```"""
+++-+        if not self.warm_up:
+++-+            self.warm_up = True
+++-+            self.warmup_moe_model_deep()
+++-+
+++-         output_attentions = (
+++-             output_attentions
+++-             if output_attentions is not None
+++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++-index 3cbf820e..d4c6b651 100644
+++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++-@@ -18,7 +18,6 @@
+++- # See the License for the specific language governing permissions and
+++- # limitations under the License.
+++- """MindSpore Qwen2MoE model."""
+++--
+++- import math
+++- from typing import List, Optional, Tuple, Union
+++- 
+++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++-     TokenClassifierOutput,
+++- )
+++- from ...modeling_utils import PreTrainedModel
+++-+from ...generation import GenerationMixin
+++- from ....utils import logging
+++- from .configuration_qwen2_moe import Qwen2MoeConfig
+++- 
+++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++-         self.variance_epsilon = eps
+++- 
+++-     def forward(self, hidden_states):
+++-+        # @dwj
+++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++-+        # @lwx
+++-+        # if not self.training :
+++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++-         input_dtype = hidden_states.dtype
+++-         hidden_states = hidden_states.to(mindspore.float32)
+++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++-@@ -234,6 +239,8 @@ def rotate_half(x):
+++-     """Rotates half the hidden dims of the input."""
+++-     x1 = x[..., : x.shape[-1] // 2]
+++-     x2 = x[..., x.shape[-1] // 2 :]
+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++-     return ops.cat((-x2, x1), dim=-1)
+++- 
+++- 
+++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++-         self.config = config
+++-         self.hidden_size = config.hidden_size
+++-         self.intermediate_size = intermediate_size
+++-+        
+++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++-         self.act_fn = ACT2FN[config.hidden_act]
+++- 
+++-     def forward(self, x):
+++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++--
+++- 
+++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++-+        # @lwx 
+++-+        # gate_up_output = self.gate_up_proj(x)
+++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++-+        # return self.down_proj(swiglu_output)
+++-+
+++-+    # def forward(self, x):
+++-+    #     gate_proj_out = self.gate_proj(x)
+++-+    #     up_proj_out = self.up_proj(x)
+++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++-+    #     return self.down_proj(swiglu_out)
+++-+        
+++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++-     """
+++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++-         use_cache: bool = False,
+++-         cache_position: Optional[mindspore.Tensor] = None,
+++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-+
+++-+        
+++-+
+++-         bsz, q_len, _ = hidden_states.shape
+++- 
+++-         query_states = self.q_proj(hidden_states)
+++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-                     "with a layer index."
+++-                 )
+++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-+            if isinstance(past_key_value, StaticCache):
+++-+                kv_seq_len = key_states.shape[-2]
+++-+            else:
+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++- 
+++-         if past_key_value is not None:
+++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++-+            
+++-+            if isinstance(past_key_value, StaticCache):
+++-+                kv_seq_len = key_states.shape[-2]
+++- 
+++-         # repeat k/v heads if n_kv_heads < n_heads
+++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++--
+++-+        
+++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++- 
+++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++--            raise ValueError(
+++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++--                f" {attn_weights.shape}"
+++--            )
+++--
+++--        if attention_mask is not None:  # no matter the length, we just slice it
+++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++-+        if attention_mask is not None:
+++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++-             attn_weights = attn_weights + causal_mask
+++- 
+++-         # upcast attention to fp32
+++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++- 
+++-         attn_output = self.o_proj(attn_output)
+++--
+++-+        # @lwx
+++-+        
+++-+        # max_seq_len = self.max_position_embeddings  # 2048
+++-+
+++-+        # if attention_mask is not None:
+++-+        #     # attention_mask: [B, 1, Sq, Sk]
+++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++-+
+++-+        #     # pad 到 [max_seq_len, max_seq_len]
+++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++-+        #     global_attention_mask = padded_mask
+++-+        # else:
+++-+        #     global_attention_mask = None
+++-+
+++-+
+++-+        # sparse_mode=3
+++-+        # attn_output = mindspore.ops.flash_attention_score(
+++-+        #     query=query_states,
+++-+        #     key=key_states,
+++-+        #     value=value_states,
+++-+        #     real_shift=None,
+++-+        #     padding_mask=None,
+++-+
+++-+        #     head_num=self.num_heads,
+++-+        #     attn_mask=global_attention_mask,
+++-+        #     keep_prob=1.0 - self.attention_dropout,
+++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++-+        #     input_layout="BNSD",
+++-+        #     pre_tokens=2147483647,
+++-+        #     next_tokens=2147483647,
+++-+        #     inner_precise=0,
+++-+        #     drop_mask=None,
+++-+        #     prefix=None,
+++-+        #     actual_seq_qlen=None,
+++-+        #     actual_seq_kvlen=None,
+++-+        #     sparse_mode=sparse_mode,
+++-+        # )
+++-         if not output_attentions:
+++-             attn_weights = None
+++- 
+++-         return attn_output, attn_weights, past_key_value
+++- 
+++- 
+++-+class Qwen2MoeFlashAttention(nn.Module):
+++-+    """
+++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++-+
+++-+    关键改动:
+++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++-+       直接传入原始的 key 和 value 张量效率更高。
+++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++-+    """
+++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++-+        super().__init__()
+++-+        self.config = config
+++-+        self.layer_idx = layer_idx
+++-+        self.hidden_size = config.hidden_size
+++-+        self.num_heads = config.num_attention_heads
+++-+        self.head_dim = self.hidden_size // self.num_heads
+++-+        self.num_key_value_heads = config.num_key_value_heads
+++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++-+        self.max_position_embeddings = config.max_position_embeddings
+++-+        self.rope_theta = config.rope_theta
+++-+        self.attention_dropout = config.attention_dropout
+++-+
+++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+++-+            raise ValueError(
+++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++-+            )
+++-+
+++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++-+
+++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++-+            self.head_dim,
+++-+            max_position_embeddings=self.max_position_embeddings,
+++-+            base=self.rope_theta,
+++-+        )
+++-+
+++-+    def forward(
+++-+        self,
+++-+        hidden_states: mindspore.Tensor,
+++-+        attention_mask: Optional[mindspore.Tensor] = None,
+++-+        position_ids: Optional[mindspore.Tensor] = None,
+++-+        past_key_value: Optional[Cache] = None,
+++-+        output_attentions: bool = False,
+++-+        use_cache: bool = False,
+++-+        cache_position: Optional[mindspore.Tensor] = None,
+++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-+
+++-+        bsz, q_len, _ = hidden_states.shape
+++-+
+++-+        # 1. 线性投射 Q, K, V
+++-+        query_states = self.q_proj(hidden_states)
+++-+        key_states = self.k_proj(hidden_states)
+++-+        value_states = self.v_proj(hidden_states)
+++-+
+++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+
+++-+        # 3. RoPE 旋转位置编码
+++-+        kv_seq_len = key_states.shape[-2]
+++-+        if past_key_value is not None:
+++-+            if self.layer_idx is None:
+++-+                raise ValueError(
+++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-+                    "with a layer index."
+++-+                )
+++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++-+                if cache_position.shape[0] == 1:
+++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++-+                    kv_seq_len = past_seen_tokens + 1
+++-+                else:
+++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++-+            else:
+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-+
+++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-+
+++-+        # 4. KV 缓存更新
+++-+        if past_key_value is not None:
+++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-+            key_states, value_states = past_key_value.update(
+++-+                key_states, value_states, self.layer_idx, cache_kwargs
+++-+            )
+++-+            
+++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++-+                if cache_position.shape[0] == 1:
+++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++-+                    kv_seq_len = key_states.shape[-2]
+++-+
+++-+        # 5. [重要] 准备 Attention Mask
+++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++-+        fa_attention_mask = None
+++-+        if attention_mask is not None:
+++-+            # 截取与当前key长度匹配的部分
+++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++-+            fa_attention_mask = (mask_slice != 0)
+++-+
+++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++-+        input_dtype = query_states.dtype
+++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++-+            query_states = query_states.to(mindspore.float16)
+++-+            key_states = key_states.to(mindspore.float16)
+++-+            value_states = value_states.to(mindspore.float16)
+++-+
+++-+        # 6. [核心] 调用 flash_attention_score 算子
+++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++-+        attn_output = mindspore.ops.flash_attention_score(
+++-+            query=query_states,
+++-+            key=key_states,
+++-+            value=value_states,
+++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+++-+            attn_mask=fa_attention_mask,
+++-+            keep_prob=1.0 - self.attention_dropout,
+++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+++-+            input_layout="BNSD",
+++-+            sparse_mode=0 # 使用 defaultMask 模式
+++-+        )
+++-+
+++-+        # 恢复原始数据类型
+++-+        attn_output = attn_output.to(input_dtype)
+++-+
+++-+        # 7. 调整输出形状
+++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-+        attn_output = self.o_proj(attn_output)
+++-+
+++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+++-+        attn_weights = None
+++-+        if output_attentions:
+++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++-+
+++-+        return attn_output, attn_weights, past_key_value
+++-+
+++-+    # def forward(
+++-+    #     self,
+++-+    #     hidden_states: mindspore.Tensor,
+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++-+    #     past_key_value: Optional[Cache] = None,
+++-+    #     output_attentions: bool = False,
+++-+    #     use_cache: bool = False,
+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-+
+++-+    #     bsz, q_len, _ = hidden_states.shape
+++-+
+++-+    #     # 1. 线性投射 Q, K, V
+++-+    #     query_states = self.q_proj(hidden_states)
+++-+    #     key_states = self.k_proj(hidden_states)
+++-+    #     value_states = self.v_proj(hidden_states)
+++-+
+++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+
+++-+    #     # 3. RoPE 旋转位置编码
+++-+    #     kv_seq_len = key_states.shape[-2]
+++-+    #     if past_key_value is not None:
+++-+    #         if self.layer_idx is None:
+++-+    #             raise ValueError(
+++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-+    #                 "with a layer index."
+++-+    #             )
+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-+
+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-+
+++-+    #     # 4. KV 缓存更新
+++-+    #     if past_key_value is not None:
+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-+    #         key_states, value_states = past_key_value.update(
+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++-+    #         )
+++-+
+++-+    #     # 5. 准备 Attention Mask
+++-+    #     fa_attention_mask = None
+++-+    #     if attention_mask is not None:
+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-+    #         fa_attention_mask = (mask_slice != 0)
+++-+
+++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++-+    #     input_dtype = query_states.dtype
+++-+
+++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++-+    #         query=query_states,
+++-+    #         key=key_states,
+++-+    #         value=value_states,
+++-+    #         head_num=self.num_heads,
+++-+    #         attn_mask=fa_attention_mask,
+++-+    #         keep_prob=1.0 - self.attention_dropout,
+++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++-+    #         input_layout="BNSD",
+++-+    #         sparse_mode=0,
+++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++-+    #         inner_precise=1
+++-+    #     )
+++-+
+++-+    #     # 恢复原始数据类型
+++-+    #     attn_output = attn_output.to(input_dtype)
+++-+
+++-+    #     # 7. 调整输出形状
+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-+    #     attn_output = self.o_proj(attn_output)
+++-+
+++-+    #     attn_weights = None
+++-+    #     if output_attentions:
+++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++-+
+++-+    #     return attn_output, attn_weights, past_key_value
+++-+
+++-+    # def forward(
+++-+    #     self,
+++-+    #     hidden_states: mindspore.Tensor,
+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++-+    #     past_key_value: Optional[Cache] = None,
+++-+    #     output_attentions: bool = False,
+++-+    #     use_cache: bool = False,
+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-+
+++-+    #     bsz, q_len, _ = hidden_states.shape
+++-+
+++-+    #     query_states = self.q_proj(hidden_states)
+++-+    #     key_states = self.k_proj(hidden_states)
+++-+    #     value_states = self.v_proj(hidden_states)
+++-+
+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-+
+++-+    #     kv_seq_len = key_states.shape[-2]
+++-+    #     if past_key_value is not None:
+++-+    #         if self.layer_idx is None:
+++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-+
+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-+
+++-+    #     if past_key_value is not None:
+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++-+    #         key_states, value_states = past_key_value.update(
+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++-+    #         )
+++-+
+++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++-+
+++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+++-+    #     # <--- 修改结束 ---
+++-+
+++-+    #     fa_attention_mask = None
+++-+    #     if attention_mask is not None:
+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++-+    #         fa_attention_mask = (mask_slice != 0)
+++-+
+++-+    #     input_dtype = query_states.dtype
+++-+
+++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+++-+    #         key=key_states,
+++-+    #         value=value_states,
+++-+    #         head_num=self.num_heads,
+++-+    #         attn_mask=fa_attention_mask,
+++-+    #         keep_prob=1.0 - self.attention_dropout,
+++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++-+    #         input_layout="BNSD",
+++-+    #         sparse_mode=0,
+++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+++-+    #     )
+++-+
+++-+    #     attn_output = attn_output.to(input_dtype)
+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-+    #     attn_output = self.o_proj(attn_output)
+++-+
+++-+    #     attn_weights = None
+++-+    #     if output_attentions:
+++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++-+
+++-+    #     return attn_output, attn_weights, past_key_value
+++-+
+++- QWEN2MOE_ATTENTION_CLASSES = {
+++-     "eager": Qwen2MoeAttention,
+++-+    "flash-attention": Qwen2MoeFlashAttention,
+++- }
+++- 
+++- 
+++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++- 
+++-+    #@dwj
+++-+    # 只遍历激活的专家，而非全部专家
+++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++--        hidden_states = hidden_states.view(-1, hidden_dim)
+++--        # router_logits: (batch * sequence_length, n_experts)
+++--        router_logits = self.gate(hidden_states)
+++--
+++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++--        if self.norm_topk_prob:
+++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++--        # we cast back to the input dtype
+++--        routing_weights = routing_weights.to(hidden_states.dtype)
+++--
+++--        final_hidden_states = ops.zeros(
+++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++--        )
+++--
+++--        # One hot encode the selected experts to create an expert mask
+++--        # this will be used to easily index which expert is going to be sollicitated
+++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++--
+++--        # Loop over all available experts in the model and perform the computation on each expert
+++--        for expert_idx in range(self.num_experts):
+++--            expert_layer = self.experts[expert_idx]
+++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++--
+++--            # Index the correct hidden states and compute the expert hidden state for
+++--            # the current expert. We need to make sure to multiply the output hidden
+++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++--            if 0 not in idx.shape:
+++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++--
+++--                # However `index_add_` only support torch tensors for indexing so we'll use
+++--                # the `top_x` tensor here.
+++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++--
+++--        shared_expert_output = self.shared_expert(hidden_states)
+++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++--
+++--        final_hidden_states = final_hidden_states + shared_expert_output
+++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++-+            num_tokens = hidden_states_reshaped.shape[0]
+++-+
+++-+            router_logits = self.gate(hidden_states_reshaped)
+++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++-+
+++-+            if self.norm_topk_prob:
+++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+++-+            
+++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++-+            flat_selected_experts = selected_experts.flatten()
+++-+            
+++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++-+            token_indices = broadcasted_token_indices.flatten()
+++-+            
+++-+            active_experts = ops.unique(flat_selected_experts)
+++-+            
+++-+            for expert_idx_tensor in active_experts:
+++-+                expert_idx = expert_idx_tensor.item()
+++-+                expert_layer = self.experts[expert_idx]
+++-+                
+++-+                mask = (flat_selected_experts == expert_idx_tensor)
+++-+                selected_token_indices = token_indices[mask]
+++-+                selected_routing_weights = routing_weights.flatten()[mask]
+++-+                
+++-+                current_states = hidden_states_reshaped[selected_token_indices]
+++-+                
+++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++-+                
+++-+                final_hidden_states = final_hidden_states.index_add(
+++-+                    dim=0,
+++-+                    index=selected_token_indices,
+++-+                    source=expert_output.to(hidden_states.dtype)
+++-+                )
+++-+            
+++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++- 
+++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++--        return final_hidden_states, router_logits
+++-+            final_hidden_states = final_hidden_states + shared_expert_output
+++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++-+            
+++-+            return final_hidden_states, router_logits
+++- 
+++- 
+++- class Qwen2MoeDecoderLayer(nn.Module):
+++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++- 
+++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++- 
+++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++-+
+++-         if (layer_idx not in config.mlp_only_layers) and (
+++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++-         ):
+++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++-     _skip_keys_device_placement = "past_key_values"
+++-     _supports_cache_class = True
+++-+#lwx
+++-+    # _supports_static_cache = True
+++- 
+++-     def _init_weights(self, module):
+++-         std = self.config.initializer_range
+++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++-         return causal_mask
+++- 
+++- 
+++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++-     _tied_weights_keys = ["lm_head.weight"]
+++- 
+++-     def __init__(self, config):
+++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++-         self.num_experts_per_tok = config.num_experts_per_tok
+++-         # Initialize weights and apply final processing
+++-         self.post_init()
+++-+        # @lwx
+++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++-+        #     self.generation_config.cache_implementation = "static"
+++-+        self._warmed_up = False
+++-+
+++-+    def warmup_moe_model(self):
+++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++-+        test_texts = [
+++-+            "warmup short",
+++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++-+        ]
+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++-+        if tokenizer is None:
+++-+            from mindnlp.transformers import AutoTokenizer
+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++-+            self._warmup_tokenizer = tokenizer
+++-+
+++-+        for text in test_texts:
+++-+            inputs = tokenizer(text, return_tensors="ms")
+++-+            with mindspore._no_grad():
+++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++- 
+++-     def get_input_embeddings(self):
+++-         return self.model.embed_tokens
+++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++-         ```"""
+++-+        if not self._warmed_up:
+++-+            self._warmed_up = True
+++-+            self.warmup_moe_model()
+++- 
+++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++-         output_router_logits = (
+++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++-             }
+++-         )
+++-         return model_inputs
+++-+# @lwx
+++-+    # def _decode_one_tokens_logits(
+++-+    #     self,
+++-+    #     cur_token: mindspore.Tensor,
+++-+    #     input_pos: Optional[mindspore.Tensor],
+++-+    #     cache_position: mindspore.Tensor,
+++-+    #     past_key_values: StaticCache,
+++-+    # ) -> mindspore.Tensor:
+++-+    #     """
+++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++-+        
+++-+    #     Args:
+++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++-+    #         input_pos: 输入位置信息，可选
+++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++-+            
+++-+    #     Returns:
+++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++-+    #     """
+++-+    #     # 调用JIT编译的版本
+++-+    #     return self.get_decode_one_tokens_logits(
+++-+    #         cur_token=cur_token,
+++-+    #         input_pos=input_pos,
+++-+    #         cache_position=cache_position,
+++-+    #         past_key_values=past_key_values,
+++-+    #     )
+++-+    
+++-+    # @mindspore.jit(jit_level='O1')
+++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++-+    #     """
+++-+    #     JIT编译的函数，用于高效的单token解码
+++-+    #     使用JIT编译优化以支持静态shape和高效执行
+++-+        
+++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++-+    #     """
+++-+    #     outputs = self.model.forward(
+++-+    #         input_ids=cur_token,
+++-+    #         position_ids=input_pos,
+++-+    #         cache_position=cache_position,
+++-+    #         past_key_values=past_key_values,
+++-+    #         use_cache=True,
+++-+    #         return_dict=False,
+++-+    #     )
+++-+        
+++-+    #     hidden_states = outputs[0]
+++-+    #     logits = self.lm_head.forward(hidden_states)
+++-+    #     logits = logits.float()
+++-+        
+++-+    #     return logits[:, -1, :]
+++-+
+++-+    # def _sample(
+++-+    #     self,
+++-+    #     input_ids: mindspore.Tensor,
+++-+    #     logits_processor,
+++-+    #     stopping_criteria,
+++-+    #     generation_config,
+++-+    #     synced_devices: bool,
+++-+    #     streamer=None,
+++-+    #     logits_warper=None,
+++-+    #     **model_kwargs,
+++-+    # ):
+++-+    #     """
+++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++-+    #     """
+++-+    #     from ...generation.logits_process import LogitsProcessorList
+++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++-+    #     from mindnlp.core import nn, ops, no_grad
+++-+    #     import numpy as np
+++-+        
+++-+    #     # 检查是否使用 StaticCache
+++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++-+    #     # 否则，直接调用父类方法
+++-+    #     past_key_values = model_kwargs.get("past_key_values")
+++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++-+
+++-+    #     if not isinstance(past_key_values, StaticCache):
+++-+    #         # 不使用 StaticCache，直接调用父类方法
+++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++-+    #         return super()._sample(
+++-+    #             input_ids=input_ids,
+++-+    #             logits_processor=logits_processor,
+++-+    #             stopping_criteria=stopping_criteria,
+++-+    #             generation_config=generation_config,
+++-+    #             synced_devices=synced_devices,
+++-+    #             streamer=streamer,
+++-+    #             logits_warper=logits_warper,
+++-+    #             **model_kwargs,
+++-+    #         )
+++-+        
+++-+    #     # 使用 StaticCache，进入自定义循环
+++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++-+    #     pad_token_id = generation_config._pad_token_tensor
+++-+    #     output_attentions = generation_config.output_attentions
+++-+    #     output_hidden_states = generation_config.output_hidden_states
+++-+    #     output_scores = generation_config.output_scores
+++-+    #     output_logits = generation_config.output_logits
+++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++-+    #     max_length = generation_config.max_length
+++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++-+    #     do_sample = generation_config.do_sample
+++-+        
+++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++-+    #         raise ValueError(
+++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++-+    #             f"{logits_warper})."
+++-+    #         )
+++-+        
+++-+    #     # init attention / hidden states / scores tuples
+++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++-+        
+++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++-+    #         encoder_hidden_states = (
+++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++-+    #         )
+++-+        
+++-+    #     # keep track of which sequences are already finished
+++-+    #     batch_size, cur_len = input_ids.shape
+++-+    #     this_peer_finished = False
+++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++-+        
+++-+    #     time_record = []
+++-+    #     from ....utils.testing_utils import parse_flag_from_env
+++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++-+        
+++-+    #     while self._has_unfinished_sequences(
+++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++-+    #     ):
+++-+    #         if _record_time:
+++-+    #             import time as time_module
+++-+    #             infer_start = time_module.time()
+++-+            
+++-+    #         # prepare model inputs
+++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++-+            
+++-+    #         # prepare variable output controls
+++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++-+            
+++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++-+    #         cur_cache_position = model_inputs.get("cache_position")
+++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+++-+    #         cur_input_ids = model_inputs.get("input_ids")
+++-+            
+++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++-+    #             cur_cache_position is not None and 
+++-+    #             len(cur_cache_position.shape) > 0 and
+++-+    #             cur_cache_position.shape[0] == 1 and
+++-+    #             cur_input_ids is not None and
+++-+    #             cur_input_ids.shape[1] == 1):
+++-+    #             # 使用 JIT 优化的单 token 解码
+++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++-+    #             if not hasattr(self, '_jit_used'):
+++-+    #                 self._jit_used = False
+++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++-+                
+++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+++-+    #                 cur_token=cur_input_ids,
+++-+    #                 input_pos=model_inputs.get("position_ids"),
+++-+    #                 cache_position=cur_cache_position,
+++-+    #                 past_key_values=cur_past_key_values,
+++-+    #             )
+++-+                
+++-+    #             # 标记已使用JIT（用于后续判断）
+++-+    #             if not self._jit_used:
+++-+    #                 self._jit_used = True
+++-+                
+++-+    #             # 构造兼容的输出对象
+++-+    #             class JitOptimizedOutput:
+++-+    #                 def __init__(self, logits, config):
+++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++-+    #                     self.config = config
+++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+++-+    #                     self.cross_attentions = None
+++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++-+                
+++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++-+    #         else:
+++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++-+    #             outputs = self(**model_inputs, return_dict=True)
+++-+            
+++-+    #         if synced_devices and this_peer_finished:
+++-+    #             continue
+++-+            
+++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++-+    #         next_token_logits = outputs.logits[:, -1, :]
+++-+            
+++-+    #         # pre-process distribution
+++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++-+    #         if do_sample:
+++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++-+            
+++-+    #         # Store scores, attentions and hidden_states when required
+++-+    #         if return_dict_in_generate:
+++-+    #             if output_scores:
+++-+    #                 scores += (next_token_scores,)
+++-+    #             if output_logits:
+++-+    #                 raw_logits += (next_token_logits,)
+++-+    #             if output_attentions:
+++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++-+    #                 if self.config.is_encoder_decoder:
+++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++-+                
+++-+    #             if output_hidden_states:
+++-+    #                 hidden = (
+++-+    #                     outputs.decoder_hidden_states
+++-+    #                     if self.config.is_encoder_decoder
+++-+    #                     else outputs.hidden_states
+++-+    #                 )
+++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++-+            
+++-+    #         # token selection
+++-+    #         if do_sample:
+++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++-+    #         else:
+++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++-+            
+++-+    #         # finished sentences should have their next token be a padding token
+++-+    #         if has_eos_stopping_criteria:
+++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++-+            
+++-+    #         # update generated ids, model inputs, and length for next step
+++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++-+    #         if streamer is not None:
+++-+    #             streamer.put(next_tokens)
+++-+            
+++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+++-+    #             outputs,
+++-+    #             model_kwargs,
+++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++-+    #         )
+++-+            
+++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++-+    #         cur_len += 1
+++-+            
+++-+    #         if _record_time:
+++-+    #             import time as time_module
+++-+    #             infer_stop = time_module.time()
+++-+    #             time_record.append(infer_stop - infer_start)
+++-+            
+++-+    #         del outputs
+++-+        
+++-+    #     average_infer_time = None
+++-+    #     if time_record:
+++-+    #         if len(time_record) > 1:
+++-+    #             time_record.pop(0)
+++-+    #         average_infer_time = sum(time_record) / len(time_record)
+++-+    #         print(f'average inference time is: {average_infer_time}')
+++-+    #         print(f'inference time record: {time_record}')
+++-+        
+++-+    #     if streamer is not None:
+++-+    #         streamer.end()
+++-+        
+++-+    #     # 简单判断：打印是否使用了JIT路径
+++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+++-+    #     else:
+++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++-+        
+++-+    #     if return_dict_in_generate:
+++-+    #         if self.config.is_encoder_decoder:
+++-+    #             return GenerateEncoderDecoderOutput(
+++-+    #                 sequences=input_ids,
+++-+    #                 scores=scores,
+++-+    #                 logits=raw_logits,
+++-+    #                 encoder_attentions=encoder_attentions,
+++-+    #                 encoder_hidden_states=encoder_hidden_states,
+++-+    #                 decoder_attentions=decoder_attentions,
+++-+    #                 cross_attentions=cross_attentions,
+++-+    #                 decoder_hidden_states=decoder_hidden_states,
+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++-+    #                 average_infer_time=average_infer_time
+++-+    #             )
+++-+    #         else:
+++-+    #             return GenerateDecoderOnlyOutput(
+++-+    #                 sequences=input_ids,
+++-+    #                 scores=scores,
+++-+    #                 logits=raw_logits,
+++-+    #                 attentions=decoder_attentions,
+++-+    #                 hidden_states=decoder_hidden_states,
+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++-+    #                 average_infer_time=average_infer_time
+++-+    #             )
+++-+    #     else:
+++-+    #         return input_ids
+++-+            
+++-+    # def _prepare_cache_for_generation(
+++-+    #     self,
+++-+    #     generation_config,
+++-+    #     model_kwargs,
+++-+    #     assistant_model,
+++-+    #     batch_size,
+++-+    #     max_cache_length,
+++-+    # ):
+++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++-+    #         generation_config.cache_implementation = "static"
+++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++-+        
+++-+    #     if generation_config.cache_implementation == "static":
+++-+    #         base_required_from_max_length = generation_config.max_length + 1
+++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+++-+    #         min_cache_size = 50
+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++-+    #         else:
+++-+    #             max_cache_length = max(base_required, min_cache_size)
+++-+            
+++-+    #         original_max_cache_length = max_cache_length
+++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+++-+            
+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++-+    #             if max_cache_length > self.config.max_position_embeddings:
+++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++-+        
+++-+    #     result = super()._prepare_cache_for_generation(
+++-+    #         generation_config=generation_config,
+++-+    #         model_kwargs=model_kwargs,
+++-+    #         assistant_model=assistant_model,
+++-+    #         batch_size=batch_size,
+++-+    #         max_cache_length=max_cache_length,
+++-+    #     )
+++-+        
+++-+    #     if generation_config.cache_implementation == "static":
+++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++-+    #         created_cache = model_kwargs.get(cache_name)
+++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++-+        
+++-+    #     return result
+++-+
+++-+
+++-+
+++- 
+++- 
+++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++--- 
+++-2.27.0
+++-
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
new file mode 100644
index 00000000..46db89f2
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
@@ -0,0 +1,7931 @@
+From 2c9ca98c339c674179652ab1635dab69b46d9012 Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Fri, 7 Nov 2025 12:06:32 +0800
+Subject: [PATCH 06/10] 20251107002commit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  122 +-
+ patches/0001-20251104commit.patch             |    2 +-
+ patches/0002-20251106commit.patch             |    2 +-
+ patches/0003-20261106secondcommit.patch       |    2 +-
+ patches/0004-20251106change.patch             |    2 +-
+ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
+ 6 files changed, 7773 insertions(+), 64 deletions(-)
+ create mode 100644 patches/0005-20251107001commit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 8831e4b7..e7e1c053 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
+     #         expert_out = expert(x)
+     #         expert_cache += expert_out * weight
+     #     return expert_cache
+-
+-    # @no_grad()
+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-    #     # x 的 shape: (1, hidden_size)
+-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-
+-    #     # 1. 收集所有需要的专家层
+-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-
+-    #     # 2. 并行计算所有专家的输出
+-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-
+-    #     # 3. 使用矩阵乘法进行加权求和
+-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++    
++    @no_grad()
++    dwj
++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++        # x 的 shape: (1, hidden_size)
++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++
++        # 1. 收集所有需要的专家层
++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++
++        # 2. 并行计算所有专家的输出
++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++        # ops.cat 会将它们堆叠成一个新的 Tensor
++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++
++        # 3. 使用矩阵乘法进行加权求和
++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++        # 最终结果 final_output 的 shape: (1, hidden_size)
++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+         
+-    #     return final_output
++        return final_output
+ 
+ 
+     # @no_grad()
+@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
+ 
+         return expert_cache
+ # 放置在 DeepseekMoE 类中
+-    @no_grad()
+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-        """
+-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-
+-        Args:
+-            x (Tensor): 输入张量, shape: (1, hidden_size)
+-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-        """
+-        top_k, _ = flat_expert_weights.shape
+-        hidden_size = x.shape[-1]
+-
+-        # 1. 将所有专家的权重堆叠起来
+-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++    # @no_grad()
++    # #lwx 20251107
++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++    #     """
++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++
++    #     Args:
++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++    #     """
++    #     top_k, _ = flat_expert_weights.shape
++    #     hidden_size = x.shape[-1]
++
++    #     # 1. 将所有专家的权重堆叠起来
++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+         
+-        # 2. "收集" 所需的专家权重
+-        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-        selected_up_w = stacked_up_w[flat_expert_indices]
+-        selected_down_w = stacked_down_w[flat_expert_indices]
++    #     # 2. "收集" 所需的专家权重
++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
++    #     selected_up_w = stacked_up_w[flat_expert_indices]
++    #     selected_down_w = stacked_down_w[flat_expert_indices]
+         
+-        # 3. 准备输入
+-        x_expanded = x.expand((top_k, 1, hidden_size))
++    #     # 3. 准备输入
++    #     x_expanded = x.expand((top_k, 1, hidden_size))
+         
+-        # 4. 并行计算 gate_proj 和 up_proj
+-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++    #     # 4. 并行计算 gate_proj 和 up_proj
++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+ 
+-        # 5. 计算中间状态
+-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++    #     # 5. 计算中间状态
++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+         
+-        # 6. 并行计算 down_proj
+-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-        #    --- [FIX] ---
+-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-        #    --- [FIX END] ---
++    #     # 6. 并行计算 down_proj
++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++    #     #    --- [FIX] ---
++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++    #     #    --- [FIX END] ---
+         
+-        # 7. 根据路由权重进行加权求和
+-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++    #     # 7. 根据路由权重进行加权求和
++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+ 
+-        return weighted_sum
++    #     return weighted_sum
+ 
+ 
+ 
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+index 0a0ef2d7..2842180e 100644
+--- a/patches/0001-20251104commit.patch
++++ b/patches/0001-20251104commit.patch
+@@ -1,7 +1,7 @@
+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/4] 20251104commit
++Subject: [PATCH 1/5] 20251104commit
+ 
+ ---
+  mindnlp/transformers/cache_utils.py           |  28 +-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+index 5185270c..c6cd8757 100644
+--- a/patches/0002-20251106commit.patch
++++ b/patches/0002-20251106commit.patch
+@@ -1,7 +1,7 @@
+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/4] 20251106commit
++Subject: [PATCH 2/5] 20251106commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+index 3e05f821..601960c9 100644
+--- a/patches/0003-20261106secondcommit.patch
++++ b/patches/0003-20261106secondcommit.patch
+@@ -1,7 +1,7 @@
+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/4] 20261106secondcommit
++Subject: [PATCH 3/5] 20261106secondcommit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+index 88a1aef4..8976f10b 100644
+--- a/patches/0004-20251106change.patch
++++ b/patches/0004-20251106change.patch
+@@ -1,7 +1,7 @@
+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-Subject: [PATCH 4/4] 20251106change
++Subject: [PATCH 4/5] 20251106change
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+new file mode 100644
+index 00000000..8d9032be
+--- /dev/null
++++ b/patches/0005-20251107001commit.patch
+@@ -0,0 +1,7707 @@
++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Fri, 7 Nov 2025 11:48:18 +0800
++Subject: [PATCH 5/5] 20251107001commit
++
++---
++ .../models/deepseek/modeling_deepseek.py      |   91 +-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
++ patches/0001-20251104commit.patch             |    2 +-
++ patches/0002-20251106commit.patch             |    2 +-
++ patches/0003-20261106secondcommit.patch       |    2 +-
++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
++ 7 files changed, 7577 insertions(+), 30 deletions(-)
++ create mode 100644 patches/0004-20251106change.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index 0546f318..8831e4b7 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
++     #         expert_cache += expert_out * weight
++     #     return expert_cache
++ 
++-    @no_grad()
++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++-        # x 的 shape: (1, hidden_size)
++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++-
++-        # 1. 收集所有需要的专家层
++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
++-
++-        # 2. 并行计算所有专家的输出
++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++-        # ops.cat 会将它们堆叠成一个新的 Tensor
++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++-
++-        # 3. 使用矩阵乘法进行加权求和
++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++-        # 最终结果 final_output 的 shape: (1, hidden_size)
++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++    # @no_grad()
+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++    #     # x 的 shape: (1, hidden_size)
+++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++
+++    #     # 1. 收集所有需要的专家层
+++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+++
+++    #     # 2. 并行计算所有专家的输出
+++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++
+++    #     # 3. 使用矩阵乘法进行加权求和
+++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++         
++-        return final_output
+++    #     return final_output
++ 
++ 
++     # @no_grad()
++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
++             )
++ 
++         return expert_cache
+++# 放置在 DeepseekMoE 类中
+++    @no_grad()
+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++        """
+++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+++
+++        Args:
+++            x (Tensor): 输入张量, shape: (1, hidden_size)
+++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+++        """
+++        top_k, _ = flat_expert_weights.shape
+++        hidden_size = x.shape[-1]
+++
+++        # 1. 将所有专家的权重堆叠起来
+++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+++        
+++        # 2. "收集" 所需的专家权重
+++        selected_gate_w = stacked_gate_w[flat_expert_indices]
+++        selected_up_w = stacked_up_w[flat_expert_indices]
+++        selected_down_w = stacked_down_w[flat_expert_indices]
+++        
+++        # 3. 准备输入
+++        x_expanded = x.expand((top_k, 1, hidden_size))
+++        
+++        # 4. 并行计算 gate_proj 和 up_proj
+++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+++
+++        # 5. 计算中间状态
+++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+++        
+++        # 6. 并行计算 down_proj
+++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+++        #    --- [FIX] ---
+++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+++        #    --- [FIX END] ---
+++        
+++        # 7. 根据路由权重进行加权求和
+++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+++
+++        return weighted_sum
+++
+++
++ 
++         # @no_grad()
++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index ebd7782e..913a7609 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++-    x1 = x[..., : x.shape[-1] // 2]
++-    x2 = x[..., x.shape[-1] // 2 :]
+++    # x1 = x[..., : x.shape[-1] // 2]
+++    # x2 = x[..., x.shape[-1] // 2 :]
++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++index d059dcbe..2b217b64 100644
++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++ def rotate_half(x):
++     """Rotates half the hidden dims of the input."""
++-    x1 = x[..., : x.shape[-1] // 2]
++-    x2 = x[..., x.shape[-1] // 2 :]
+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++    # x1 = x[..., : x.shape[-1] // 2]
+++    # x2 = x[..., x.shape[-1] // 2 :]
+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++     return ops.cat((-x2, x1), dim=-1)
++ 
++ 
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++index 78f22642..0a0ef2d7 100644
++--- a/patches/0001-20251104commit.patch
+++++ b/patches/0001-20251104commit.patch
++@@ -1,7 +1,7 @@
++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++-Subject: [PATCH 1/3] 20251104commit
+++Subject: [PATCH 1/4] 20251104commit
++ 
++ ---
++  mindnlp/transformers/cache_utils.py           |  28 +-
++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++index 22b65dd5..5185270c 100644
++--- a/patches/0002-20251106commit.patch
+++++ b/patches/0002-20251106commit.patch
++@@ -1,7 +1,7 @@
++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++-Subject: [PATCH 2/3] 20251106commit
+++Subject: [PATCH 2/4] 20251106commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++index 966529e4..3e05f821 100644
++--- a/patches/0003-20261106secondcommit.patch
+++++ b/patches/0003-20261106secondcommit.patch
++@@ -1,7 +1,7 @@
++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++-Subject: [PATCH 3/3] 20261106secondcommit
+++Subject: [PATCH 3/4] 20261106secondcommit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++new file mode 100644
++index 00000000..88a1aef4
++--- /dev/null
+++++ b/patches/0004-20251106change.patch
++@@ -0,0 +1,7498 @@
+++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Thu, 6 Nov 2025 15:48:09 +0800
+++Subject: [PATCH 4/4] 20251106change
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |  189 +-
+++ patches/0001-20251104commit.patch             | 1272 +++++++
+++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+++ 4 files changed, 7244 insertions(+), 186 deletions(-)
+++ create mode 100644 patches/0001-20251104commit.patch
+++ create mode 100644 patches/0002-20251106commit.patch
+++ create mode 100644 patches/0003-20261106secondcommit.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index 2f9192bf..0546f318 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+++ 
+++         return attn_output, attn_weights, past_key_value
+++ 
+++-# class DeepseekFlashAttention(nn.Module):
+++-#     """
+++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+++-
+++-#     This class is designed as a drop-in replacement for DeepseekAttention.
+++-#     """
+++-
+++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++-#         super().__init__()
+++-#         self.config = config
+++-#         self.layer_idx = layer_idx
+++-#         if layer_idx is None:
+++-#             logger.warning(
+++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++-#                 "when creating this class."
+++-#             )
+++-
+++-#         self.attention_dropout = config.attention_dropout
+++-#         self.hidden_size = config.hidden_size
+++-#         self.num_heads = config.num_attention_heads
+++-#         self.head_dim = self.hidden_size // self.num_heads
+++-#         self.num_key_value_heads = config.num_key_value_heads
+++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++-#         self.max_position_embeddings = config.max_position_embeddings
+++-#         self.rope_theta = config.rope_theta
+++-#         self.is_causal = True
+++-
+++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++-#             raise ValueError(
+++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++-#                 f" and `num_heads`: {self.num_heads})."
+++-#             )
+++-
+++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++-#         self._init_rope()
+++-
+++-#     def _init_rope(self):
+++-#         if self.config.rope_scaling is None:
+++-#             self.rotary_emb = DeepseekRotaryEmbedding(
+++-#                 self.head_dim,
+++-#                 max_position_embeddings=self.max_position_embeddings,
+++-#                 base=self.rope_theta,
+++-#             )
+++-#         else:
+++-#             scaling_type = self.config.rope_scaling["type"]
+++-#             scaling_factor = self.config.rope_scaling["factor"]
+++-#             if scaling_type == "linear":
+++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++-#                     self.head_dim,
+++-#                     max_position_embeddings=self.max_position_embeddings,
+++-#                     scaling_factor=scaling_factor,
+++-#                     base=self.rope_theta,
+++-#                 )
+++-#             elif scaling_type == "dynamic":
+++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++-#                     self.head_dim,
+++-#                     max_position_embeddings=self.max_position_embeddings,
+++-#                     scaling_factor=scaling_factor,
+++-#                     base=self.rope_theta,
+++-#                 )
+++-#             else:
+++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++-
+++-#     def forward(
+++-#         self,
+++-#         hidden_states: mindspore.Tensor,
+++-#         attention_mask: Optional[mindspore.Tensor] = None,
+++-#         position_ids: Optional[mindspore.Tensor] = None,
+++-#         past_key_value: Optional[Cache] = None,
+++-#         output_attentions: bool = False,
+++-#         use_cache: bool = False,
+++-#         **kwargs,
+++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++-#         if "padding_mask" in kwargs:
+++-#             warnings.warn(
+++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++-#             )
+++-        
+++-#         if output_attentions:
+++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+++-
+++-#         bsz, q_len, _ = hidden_states.shape
+++-
+++-#         if self.config.pretraining_tp > 1:
+++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++-
+++-#         query_states = self.q_proj(hidden_states)
+++-#         key_states = self.k_proj(hidden_states)
+++-#         value_states = self.v_proj(hidden_states)
+++-
+++-#         # Reshape for multi-head attention
+++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++-
+++-#         kv_seq_len = key_states.shape[-2]
+++-#         if past_key_value is not None:
+++-#             if self.layer_idx is None:
+++-#                 raise ValueError(
+++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++-#                     "with a layer index."
+++-#                 )
+++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++-        
+++-#         # Apply Rotary Positional Embedding
+++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++-
+++-#         if past_key_value is not None:
+++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++-
+++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++-        
+++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++-        
+++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++-
+++-#         # Convert attention_mask for flash_attention_score
+++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+++-#         if attention_mask is not None:
+++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++-#                 raise ValueError(
+++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++-#                 )
+++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+++-#         else:
+++-#             attn_mask_for_fa = None
+++-        
+++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++-
+++-#         # Call the fused flash_attention_score operator
+++-#         attn_output = mindspore.ops.flash_attention_score(
+++-#             query=query_states_for_fa,
+++-#             key=key_states_for_fa,
+++-#             value=value_states_for_fa,
+++-#             head_num=self.num_heads, # This is N1, the number of query heads
+++-#             input_layout='BSH',
+++-#             attn_mask=attn_mask_for_fa,
+++-#             keep_prob=keep_prob,
+++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++-#             sparse_mode=0 # Default mask mode
+++-#         )
+++-        
+++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+++-#         attn_output = self.o_proj(attn_output)
+++-        
+++-#         # Flash Attention does not return attention weights
+++-#         attn_weights = None
+++-
+++-#         return attn_output, attn_weights, past_key_value
+++ 
+++ class DeepseekFlashAttention(nn.Module):
+++     """
+++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+++         super().__init__()
+++         self.hidden_size = config.hidden_size
+++ 
+++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+++-            config=config, layer_idx=layer_idx
+++-        )
++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++++            # config=config, layer_idx=layer_idx
++++        # )
+++ 
+++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+++             config=config, layer_idx=layer_idx
+++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+++         return outputs
+++ 
+++ 
+++-
+++ class DeepseekPreTrainedModel(PreTrainedModel):
+++     config_class = DeepseekConfig
+++     base_model_prefix = "model"
+++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++         # Initialize weights and apply final processing
+++         self.post_init()
+++         self.warm_up = False
+++-        #@dwj
+++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+++-            self.num_layers,
+++-            self.num_attention_heads,
+++-            self.head_dim,
+++-            batch_size=1,
+++-            max_length=self.max_length,
+++-            dtype=mindspore.float16
+++-        )
+++-
+++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+++-        key_cache = []
+++-        value_cache = []
+++-        for _ in range(num_layers):
+++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++-            key_cache.append(k)
+++-            value_cache.append(v)
+++-        return key_cache, value_cache
+++-
+++ 
+++     def warmup_moe_model_deep(self):
+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++new file mode 100644
+++index 00000000..78f22642
+++--- /dev/null
++++++ b/patches/0001-20251104commit.patch
+++@@ -0,0 +1,1272 @@
++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Tue, 4 Nov 2025 09:11:51 +0800
++++Subject: [PATCH 1/3] 20251104commit
++++
++++---
++++ mindnlp/transformers/cache_utils.py           |  28 +-
++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++ 3 files changed, 976 insertions(+), 87 deletions(-)
++++
++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++index cadd2e04..02f8d4be 100644
++++--- a/mindnlp/transformers/cache_utils.py
+++++++ b/mindnlp/transformers/cache_utils.py
++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++             # k_out[:, :, cache_position] = key_states
++++             # v_out[:, :, cache_position] = value_states
++++-            if ON_ORANGE_PI:
++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++-            else:
++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++-
+++++            # if ON_ORANGE_PI:
+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++            # else:
+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++            # 确保 cache_position 是 1D tensor 并且类型正确
+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++            if cache_position.ndim > 1:
+++++                cache_position = cache_position.flatten()
+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++                cache_position = cache_position.int()
+++++            
+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++            k_out[:, :, cache_position] = key_states
+++++            v_out[:, :, cache_position] = value_states
+++++            
++++         return k_out, v_out
++++ 
++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index c695b944..d8303e45 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++-    x1 = x[..., : x.shape[-1] // 2]
++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++    # x1 = x[..., : x.shape[-1] // 2]
+++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++         if self.training:
++++             raise NotImplementedError("Training is not supported yet.")
++++         else:
++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++-        if self.config.n_shared_experts is not None:
++++-            y = y + self.shared_experts(identity)
++++-        return y
+++++            # @lwx
+++++            if orig_shape[1] == 1:
+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++                y=y.view(*orig_shape)
+++++                if self.config.n_shared_experts is not None:
+++++                    y = y + self.shared_experts(identity)
+++++                return y
+++++            else:
+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++                if self.config.n_shared_experts is not None:
+++++                    y = y + self.shared_experts(identity)
+++++                return y
+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++        # if self.config.n_shared_experts is not None:
+++++        #     y = y + self.shared_experts(identity)
+++++        # return y
+++++        
+++++    @no_grad()
+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++
+++++        expert_cache = ops.zeros_like(x)
+++++        for i in range(self.num_experts_per_tok):
+++++            expert_id = flat_expert_indices[i].item()
+++++            weight = flat_expert_weights[i].item()
+++++            expert = self.experts[expert_id]
+++++            expert_out = expert(x)
+++++            expert_cache += expert_out * weight
+++++        return expert_cache
++++ 
++++     @no_grad()
++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-        # expert_cache = torch.zeros_like(x)
++++-        # idxs = flat_expert_indices.argsort()
++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++-        # token_idxs = idxs // self.num_experts_per_tok
++++-        # for i, end_idx in enumerate(tokens_per_expert):
++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++-        #     if start_idx == end_idx:
++++-        #         continue
++++-        #     expert = self.experts[i]
++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++-        #     expert_tokens = x[exp_token_idx]
++++-        #     expert_out = expert(expert_tokens)
++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++-        # return expert_cache
+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++         expert_cache = ops.zeros_like(x)
++++         idxs = flat_expert_indices.argsort()
++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++         token_idxs = idxs // self.num_experts_per_tok
+++++
++++         for i, end_idx in enumerate(tokens_per_expert):
++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++             if start_idx == end_idx:
++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++             expert_out = expert(expert_tokens)
++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++
++++         return expert_cache
+++++        
+++++    # @no_grad()
+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     # expert_cache = torch.zeros_like(x)
+++++    #     # idxs = flat_expert_indices.argsort()
+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++    #     # token_idxs = idxs // self.num_experts_per_tok
+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++    #     #     if start_idx == end_idx:
+++++    #     #         continue
+++++    #     #     expert = self.experts[i]
+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #     #     expert_tokens = x[exp_token_idx]
+++++    #     #     expert_out = expert(expert_tokens)
+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++    #     # return expert_cache
+++++    #     expert_cache = ops.zeros_like(x)
+++++    #     idxs = flat_expert_indices.argsort()
+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++    #     for i, end_idx in enumerate(tokens_per_expert):
+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++    #         if start_idx == end_idx:
+++++    #             continue
+++++    #         expert = self.experts[i]
+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #         expert_tokens = x[exp_token_idx]
+++++    #         expert_out = expert(expert_tokens)
+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++
+++++    #     return expert_cache
+++++    # @no_grad()
+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     expert_cache = ops.zeros_like(x)
+++++
+++++    #     # 排序保证顺序一致
+++++    #     idxs = flat_expert_indices.argsort()
+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++    #     # 找出有 token 的专家
+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++
+++++    #     for i in active_experts.tolist():
+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++    #         end_idx = tokens_per_expert[i]
+++++    #         if start_idx == end_idx:  # 没有 token
+++++    #             continue
+++++
+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++    #         expert_tokens = x[exp_token_idx]
+++++    #         expert_out = self.experts[i](expert_tokens)
+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++
+++++    #         expert_cache = mindspore.mint.scatter_add(
+++++    #             expert_cache,
+++++    #             0,
+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++    #             expert_out
+++++    #         )
+++++
+++++    #     return expert_cache
+++++
+++++
++++ 
++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++ #     """
++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++ 
++++         # Initialize weights and apply final processing
++++         self.post_init()
+++++        self.warm_up = False
+++++
+++++    def warmup_moe_model_deep(self):
+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++        test_texts = [
+++++            "warmup short",
+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++        ]
+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++        if tokenizer is None:
+++++            from mindnlp.transformers import AutoTokenizer
+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++            self._warmup_tokenizer = tokenizer
+++++
+++++        for text in test_texts:
+++++            inputs = tokenizer(text, return_tensors="ms")
+++++            with mindspore._no_grad():
+++++                _ = self(**inputs, use_cache=False)
+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++ 
++++     def get_input_embeddings(self):
++++         return self.model.embed_tokens
++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++         ```"""
+++++        if not self.warm_up:
+++++            self.warm_up = True
+++++            self.warmup_moe_model_deep()
+++++
++++         output_attentions = (
++++             output_attentions
++++             if output_attentions is not None
++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++index 3cbf820e..d4c6b651 100644
++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++@@ -18,7 +18,6 @@
++++ # See the License for the specific language governing permissions and
++++ # limitations under the License.
++++ """MindSpore Qwen2MoE model."""
++++-
++++ import math
++++ from typing import List, Optional, Tuple, Union
++++ 
++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++     TokenClassifierOutput,
++++ )
++++ from ...modeling_utils import PreTrainedModel
+++++from ...generation import GenerationMixin
++++ from ....utils import logging
++++ from .configuration_qwen2_moe import Qwen2MoeConfig
++++ 
++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++         self.variance_epsilon = eps
++++ 
++++     def forward(self, hidden_states):
+++++        # @dwj
+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++        # @lwx
+++++        # if not self.training :
+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++         input_dtype = hidden_states.dtype
++++         hidden_states = hidden_states.to(mindspore.float32)
++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++@@ -234,6 +239,8 @@ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++     x1 = x[..., : x.shape[-1] // 2]
++++     x2 = x[..., x.shape[-1] // 2 :]
+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++         self.config = config
++++         self.hidden_size = config.hidden_size
++++         self.intermediate_size = intermediate_size
+++++        
++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++         self.act_fn = ACT2FN[config.hidden_act]
++++ 
++++     def forward(self, x):
++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++-
++++ 
+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++        # @lwx 
+++++        # gate_up_output = self.gate_up_proj(x)
+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++        # return self.down_proj(swiglu_output)
+++++
+++++    # def forward(self, x):
+++++    #     gate_proj_out = self.gate_proj(x)
+++++    #     up_proj_out = self.up_proj(x)
+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++    #     return self.down_proj(swiglu_out)
+++++        
++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++     """
++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++         use_cache: bool = False,
++++         cache_position: Optional[mindspore.Tensor] = None,
++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++        
+++++
++++         bsz, q_len, _ = hidden_states.shape
++++ 
++++         query_states = self.q_proj(hidden_states)
++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++                     "with a layer index."
++++                 )
++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++            if isinstance(past_key_value, StaticCache):
+++++                kv_seq_len = key_states.shape[-2]
+++++            else:
+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++ 
++++         if past_key_value is not None:
++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++            
+++++            if isinstance(past_key_value, StaticCache):
+++++                kv_seq_len = key_states.shape[-2]
++++ 
++++         # repeat k/v heads if n_kv_heads < n_heads
++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++-
+++++        
++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++ 
++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++-            raise ValueError(
++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++-                f" {attn_weights.shape}"
++++-            )
++++-
++++-        if attention_mask is not None:  # no matter the length, we just slice it
++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++        if attention_mask is not None:
+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++             attn_weights = attn_weights + causal_mask
++++ 
++++         # upcast attention to fp32
++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++ 
++++         attn_output = self.o_proj(attn_output)
++++-
+++++        # @lwx
+++++        
+++++        # max_seq_len = self.max_position_embeddings  # 2048
+++++
+++++        # if attention_mask is not None:
+++++        #     # attention_mask: [B, 1, Sq, Sk]
+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++
+++++        #     # pad 到 [max_seq_len, max_seq_len]
+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++        #     global_attention_mask = padded_mask
+++++        # else:
+++++        #     global_attention_mask = None
+++++
+++++
+++++        # sparse_mode=3
+++++        # attn_output = mindspore.ops.flash_attention_score(
+++++        #     query=query_states,
+++++        #     key=key_states,
+++++        #     value=value_states,
+++++        #     real_shift=None,
+++++        #     padding_mask=None,
+++++
+++++        #     head_num=self.num_heads,
+++++        #     attn_mask=global_attention_mask,
+++++        #     keep_prob=1.0 - self.attention_dropout,
+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++        #     input_layout="BNSD",
+++++        #     pre_tokens=2147483647,
+++++        #     next_tokens=2147483647,
+++++        #     inner_precise=0,
+++++        #     drop_mask=None,
+++++        #     prefix=None,
+++++        #     actual_seq_qlen=None,
+++++        #     actual_seq_kvlen=None,
+++++        #     sparse_mode=sparse_mode,
+++++        # )
++++         if not output_attentions:
++++             attn_weights = None
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
++++ 
+++++class Qwen2MoeFlashAttention(nn.Module):
+++++    """
+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++
+++++    关键改动:
+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++       直接传入原始的 key 和 value 张量效率更高。
+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++    """
+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++        super().__init__()
+++++        self.config = config
+++++        self.layer_idx = layer_idx
+++++        self.hidden_size = config.hidden_size
+++++        self.num_heads = config.num_attention_heads
+++++        self.head_dim = self.hidden_size // self.num_heads
+++++        self.num_key_value_heads = config.num_key_value_heads
+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++        self.max_position_embeddings = config.max_position_embeddings
+++++        self.rope_theta = config.rope_theta
+++++        self.attention_dropout = config.attention_dropout
+++++
+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++            raise ValueError(
+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++            )
+++++
+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++
+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++            self.head_dim,
+++++            max_position_embeddings=self.max_position_embeddings,
+++++            base=self.rope_theta,
+++++        )
+++++
+++++    def forward(
+++++        self,
+++++        hidden_states: mindspore.Tensor,
+++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++        position_ids: Optional[mindspore.Tensor] = None,
+++++        past_key_value: Optional[Cache] = None,
+++++        output_attentions: bool = False,
+++++        use_cache: bool = False,
+++++        cache_position: Optional[mindspore.Tensor] = None,
+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++        bsz, q_len, _ = hidden_states.shape
+++++
+++++        # 1. 线性投射 Q, K, V
+++++        query_states = self.q_proj(hidden_states)
+++++        key_states = self.k_proj(hidden_states)
+++++        value_states = self.v_proj(hidden_states)
+++++
+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++        # 3. RoPE 旋转位置编码
+++++        kv_seq_len = key_states.shape[-2]
+++++        if past_key_value is not None:
+++++            if self.layer_idx is None:
+++++                raise ValueError(
+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++                    "with a layer index."
+++++                )
+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++                if cache_position.shape[0] == 1:
+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++                    kv_seq_len = past_seen_tokens + 1
+++++                else:
+++++                    # prefill 阶段：cache_position 是范围，使用其长度
+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++            else:
+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++        # 4. KV 缓存更新
+++++        if past_key_value is not None:
+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++            key_states, value_states = past_key_value.update(
+++++                key_states, value_states, self.layer_idx, cache_kwargs
+++++            )
+++++            
+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++                if cache_position.shape[0] == 1:
+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++                    kv_seq_len = key_states.shape[-2]
+++++
+++++        # 5. [重要] 准备 Attention Mask
+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++        fa_attention_mask = None
+++++        if attention_mask is not None:
+++++            # 截取与当前key长度匹配的部分
+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++            fa_attention_mask = (mask_slice != 0)
+++++
+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++        input_dtype = query_states.dtype
+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++            query_states = query_states.to(mindspore.float16)
+++++            key_states = key_states.to(mindspore.float16)
+++++            value_states = value_states.to(mindspore.float16)
+++++
+++++        # 6. [核心] 调用 flash_attention_score 算子
+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++        attn_output = mindspore.ops.flash_attention_score(
+++++            query=query_states,
+++++            key=key_states,
+++++            value=value_states,
+++++            head_num=self.num_heads, # 传入Q的头数(N1)
+++++            attn_mask=fa_attention_mask,
+++++            keep_prob=1.0 - self.attention_dropout,
+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++            input_layout="BNSD",
+++++            sparse_mode=0 # 使用 defaultMask 模式
+++++        )
+++++
+++++        # 恢复原始数据类型
+++++        attn_output = attn_output.to(input_dtype)
+++++
+++++        # 7. 调整输出形状
+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++        attn_output = self.o_proj(attn_output)
+++++
+++++        # FlashAttention 算子不直接返回注意力权重矩阵
+++++        attn_weights = None
+++++        if output_attentions:
+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++        return attn_output, attn_weights, past_key_value
+++++
+++++    # def forward(
+++++    #     self,
+++++    #     hidden_states: mindspore.Tensor,
+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++    #     past_key_value: Optional[Cache] = None,
+++++    #     output_attentions: bool = False,
+++++    #     use_cache: bool = False,
+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++    #     bsz, q_len, _ = hidden_states.shape
+++++
+++++    #     # 1. 线性投射 Q, K, V
+++++    #     query_states = self.q_proj(hidden_states)
+++++    #     key_states = self.k_proj(hidden_states)
+++++    #     value_states = self.v_proj(hidden_states)
+++++
+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++    #     # 3. RoPE 旋转位置编码
+++++    #     kv_seq_len = key_states.shape[-2]
+++++    #     if past_key_value is not None:
+++++    #         if self.layer_idx is None:
+++++    #             raise ValueError(
+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++    #                 "with a layer index."
+++++    #             )
+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++    #     # 4. KV 缓存更新
+++++    #     if past_key_value is not None:
+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++    #         key_states, value_states = past_key_value.update(
+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++    #         )
+++++
+++++    #     # 5. 准备 Attention Mask
+++++    #     fa_attention_mask = None
+++++    #     if attention_mask is not None:
+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++    #         fa_attention_mask = (mask_slice != 0)
+++++
+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++    #     input_dtype = query_states.dtype
+++++
+++++    #     # 6. [核心] 调用 flash_attention_score 算子
+++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++    #         query=query_states,
+++++    #         key=key_states,
+++++    #         value=value_states,
+++++    #         head_num=self.num_heads,
+++++    #         attn_mask=fa_attention_mask,
+++++    #         keep_prob=1.0 - self.attention_dropout,
+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++    #         input_layout="BNSD",
+++++    #         sparse_mode=0,
+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++    #         inner_precise=1
+++++    #     )
+++++
+++++    #     # 恢复原始数据类型
+++++    #     attn_output = attn_output.to(input_dtype)
+++++
+++++    #     # 7. 调整输出形状
+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++    #     attn_output = self.o_proj(attn_output)
+++++
+++++    #     attn_weights = None
+++++    #     if output_attentions:
+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++    #     return attn_output, attn_weights, past_key_value
+++++
+++++    # def forward(
+++++    #     self,
+++++    #     hidden_states: mindspore.Tensor,
+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++    #     past_key_value: Optional[Cache] = None,
+++++    #     output_attentions: bool = False,
+++++    #     use_cache: bool = False,
+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++    #     bsz, q_len, _ = hidden_states.shape
+++++
+++++    #     query_states = self.q_proj(hidden_states)
+++++    #     key_states = self.k_proj(hidden_states)
+++++    #     value_states = self.v_proj(hidden_states)
+++++
+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++    #     kv_seq_len = key_states.shape[-2]
+++++    #     if past_key_value is not None:
+++++    #         if self.layer_idx is None:
+++++    #             raise ValueError("`layer_idx` must be specified for caching")
+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++    #     if past_key_value is not None:
+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++    #         key_states, value_states = past_key_value.update(
+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++    #         )
+++++
+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++
+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++    #     query_states = query_states / math.sqrt(self.head_dim)
+++++    #     # <--- 修改结束 ---
+++++
+++++    #     fa_attention_mask = None
+++++    #     if attention_mask is not None:
+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++    #         fa_attention_mask = (mask_slice != 0)
+++++
+++++    #     input_dtype = query_states.dtype
+++++
+++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++    #         query=query_states,    # 传入已经预先缩放过的 query
+++++    #         key=key_states,
+++++    #         value=value_states,
+++++    #         head_num=self.num_heads,
+++++    #         attn_mask=fa_attention_mask,
+++++    #         keep_prob=1.0 - self.attention_dropout,
+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++    #         input_layout="BNSD",
+++++    #         sparse_mode=0,
+++++    #         inner_precise=1        # 仍然保持内部高精度计算
+++++    #     )
+++++
+++++    #     attn_output = attn_output.to(input_dtype)
+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++    #     attn_output = self.o_proj(attn_output)
+++++
+++++    #     attn_weights = None
+++++    #     if output_attentions:
+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++
+++++    #     return attn_output, attn_weights, past_key_value
+++++
++++ QWEN2MOE_ATTENTION_CLASSES = {
++++     "eager": Qwen2MoeAttention,
+++++    "flash-attention": Qwen2MoeFlashAttention,
++++ }
++++ 
++++ 
++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++ 
+++++    #@dwj
+++++    # 只遍历激活的专家，而非全部专家
++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        hidden_states = hidden_states.view(-1, hidden_dim)
++++-        # router_logits: (batch * sequence_length, n_experts)
++++-        router_logits = self.gate(hidden_states)
++++-
++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-        if self.norm_topk_prob:
++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        # we cast back to the input dtype
++++-        routing_weights = routing_weights.to(hidden_states.dtype)
++++-
++++-        final_hidden_states = ops.zeros(
++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++-        )
++++-
++++-        # One hot encode the selected experts to create an expert mask
++++-        # this will be used to easily index which expert is going to be sollicitated
++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++-
++++-        # Loop over all available experts in the model and perform the computation on each expert
++++-        for expert_idx in range(self.num_experts):
++++-            expert_layer = self.experts[expert_idx]
++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++-
++++-            # Index the correct hidden states and compute the expert hidden state for
++++-            # the current expert. We need to make sure to multiply the output hidden
++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++-            if 0 not in idx.shape:
++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++-
++++-                # However `index_add_` only support torch tensors for indexing so we'll use
++++-                # the `top_x` tensor here.
++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++-
++++-        shared_expert_output = self.shared_expert(hidden_states)
++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++-
++++-        final_hidden_states = final_hidden_states + shared_expert_output
+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++            num_tokens = hidden_states_reshaped.shape[0]
+++++
+++++            router_logits = self.gate(hidden_states_reshaped)
+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++
+++++            if self.norm_topk_prob:
+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++            routing_weights = routing_weights.to(hidden_states.dtype)
+++++            
+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++            flat_selected_experts = selected_experts.flatten()
+++++            
+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++            token_indices = broadcasted_token_indices.flatten()
+++++            
+++++            active_experts = ops.unique(flat_selected_experts)
+++++            
+++++            for expert_idx_tensor in active_experts:
+++++                expert_idx = expert_idx_tensor.item()
+++++                expert_layer = self.experts[expert_idx]
+++++                
+++++                mask = (flat_selected_experts == expert_idx_tensor)
+++++                selected_token_indices = token_indices[mask]
+++++                selected_routing_weights = routing_weights.flatten()[mask]
+++++                
+++++                current_states = hidden_states_reshaped[selected_token_indices]
+++++                
+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++                
+++++                final_hidden_states = final_hidden_states.index_add(
+++++                    dim=0,
+++++                    index=selected_token_indices,
+++++                    source=expert_output.to(hidden_states.dtype)
+++++                )
+++++            
+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++ 
++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++-        return final_hidden_states, router_logits
+++++            final_hidden_states = final_hidden_states + shared_expert_output
+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++            
+++++            return final_hidden_states, router_logits
++++ 
++++ 
++++ class Qwen2MoeDecoderLayer(nn.Module):
++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++ 
++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++ 
+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++
++++         if (layer_idx not in config.mlp_only_layers) and (
++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++         ):
++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++     _skip_keys_device_placement = "past_key_values"
++++     _supports_cache_class = True
+++++#lwx
+++++    # _supports_static_cache = True
++++ 
++++     def _init_weights(self, module):
++++         std = self.config.initializer_range
++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++         return causal_mask
++++ 
++++ 
++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++     _tied_weights_keys = ["lm_head.weight"]
++++ 
++++     def __init__(self, config):
++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++         self.num_experts_per_tok = config.num_experts_per_tok
++++         # Initialize weights and apply final processing
++++         self.post_init()
+++++        # @lwx
+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++        #     self.generation_config.cache_implementation = "static"
+++++        self._warmed_up = False
+++++
+++++    def warmup_moe_model(self):
+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++        test_texts = [
+++++            "warmup short",
+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++        ]
+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++        if tokenizer is None:
+++++            from mindnlp.transformers import AutoTokenizer
+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++            self._warmup_tokenizer = tokenizer
+++++
+++++        for text in test_texts:
+++++            inputs = tokenizer(text, return_tensors="ms")
+++++            with mindspore._no_grad():
+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++ 
++++     def get_input_embeddings(self):
++++         return self.model.embed_tokens
++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++         ```"""
+++++        if not self._warmed_up:
+++++            self._warmed_up = True
+++++            self.warmup_moe_model()
++++ 
++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++         output_router_logits = (
++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++             }
++++         )
++++         return model_inputs
+++++# @lwx
+++++    # def _decode_one_tokens_logits(
+++++    #     self,
+++++    #     cur_token: mindspore.Tensor,
+++++    #     input_pos: Optional[mindspore.Tensor],
+++++    #     cache_position: mindspore.Tensor,
+++++    #     past_key_values: StaticCache,
+++++    # ) -> mindspore.Tensor:
+++++    #     """
+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++        
+++++    #     Args:
+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++    #         input_pos: 输入位置信息，可选
+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++            
+++++    #     Returns:
+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++    #     """
+++++    #     # 调用JIT编译的版本
+++++    #     return self.get_decode_one_tokens_logits(
+++++    #         cur_token=cur_token,
+++++    #         input_pos=input_pos,
+++++    #         cache_position=cache_position,
+++++    #         past_key_values=past_key_values,
+++++    #     )
+++++    
+++++    # @mindspore.jit(jit_level='O1')
+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++    #     """
+++++    #     JIT编译的函数，用于高效的单token解码
+++++    #     使用JIT编译优化以支持静态shape和高效执行
+++++        
+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++    #     """
+++++    #     outputs = self.model.forward(
+++++    #         input_ids=cur_token,
+++++    #         position_ids=input_pos,
+++++    #         cache_position=cache_position,
+++++    #         past_key_values=past_key_values,
+++++    #         use_cache=True,
+++++    #         return_dict=False,
+++++    #     )
+++++        
+++++    #     hidden_states = outputs[0]
+++++    #     logits = self.lm_head.forward(hidden_states)
+++++    #     logits = logits.float()
+++++        
+++++    #     return logits[:, -1, :]
+++++
+++++    # def _sample(
+++++    #     self,
+++++    #     input_ids: mindspore.Tensor,
+++++    #     logits_processor,
+++++    #     stopping_criteria,
+++++    #     generation_config,
+++++    #     synced_devices: bool,
+++++    #     streamer=None,
+++++    #     logits_warper=None,
+++++    #     **model_kwargs,
+++++    # ):
+++++    #     """
+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++    #     """
+++++    #     from ...generation.logits_process import LogitsProcessorList
+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++    #     from mindnlp.core import nn, ops, no_grad
+++++    #     import numpy as np
+++++        
+++++    #     # 检查是否使用 StaticCache
+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++    #     # 否则，直接调用父类方法
+++++    #     past_key_values = model_kwargs.get("past_key_values")
+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++
+++++    #     if not isinstance(past_key_values, StaticCache):
+++++    #         # 不使用 StaticCache，直接调用父类方法
+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++    #         return super()._sample(
+++++    #             input_ids=input_ids,
+++++    #             logits_processor=logits_processor,
+++++    #             stopping_criteria=stopping_criteria,
+++++    #             generation_config=generation_config,
+++++    #             synced_devices=synced_devices,
+++++    #             streamer=streamer,
+++++    #             logits_warper=logits_warper,
+++++    #             **model_kwargs,
+++++    #         )
+++++        
+++++    #     # 使用 StaticCache，进入自定义循环
+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++    #     pad_token_id = generation_config._pad_token_tensor
+++++    #     output_attentions = generation_config.output_attentions
+++++    #     output_hidden_states = generation_config.output_hidden_states
+++++    #     output_scores = generation_config.output_scores
+++++    #     output_logits = generation_config.output_logits
+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++    #     max_length = generation_config.max_length
+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++    #     do_sample = generation_config.do_sample
+++++        
+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++    #         raise ValueError(
+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++    #             f"{logits_warper})."
+++++    #         )
+++++        
+++++    #     # init attention / hidden states / scores tuples
+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++        
+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++    #         encoder_hidden_states = (
+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++    #         )
+++++        
+++++    #     # keep track of which sequences are already finished
+++++    #     batch_size, cur_len = input_ids.shape
+++++    #     this_peer_finished = False
+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++        
+++++    #     time_record = []
+++++    #     from ....utils.testing_utils import parse_flag_from_env
+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++        
+++++    #     while self._has_unfinished_sequences(
+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++    #     ):
+++++    #         if _record_time:
+++++    #             import time as time_module
+++++    #             infer_start = time_module.time()
+++++            
+++++    #         # prepare model inputs
+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++            
+++++    #         # prepare variable output controls
+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++            
+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++    #         cur_cache_position = model_inputs.get("cache_position")
+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++    #         cur_input_ids = model_inputs.get("input_ids")
+++++            
+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++    #             cur_cache_position is not None and 
+++++    #             len(cur_cache_position.shape) > 0 and
+++++    #             cur_cache_position.shape[0] == 1 and
+++++    #             cur_input_ids is not None and
+++++    #             cur_input_ids.shape[1] == 1):
+++++    #             # 使用 JIT 优化的单 token 解码
+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++    #             if not hasattr(self, '_jit_used'):
+++++    #                 self._jit_used = False
+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++                
+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++    #                 cur_token=cur_input_ids,
+++++    #                 input_pos=model_inputs.get("position_ids"),
+++++    #                 cache_position=cur_cache_position,
+++++    #                 past_key_values=cur_past_key_values,
+++++    #             )
+++++                
+++++    #             # 标记已使用JIT（用于后续判断）
+++++    #             if not self._jit_used:
+++++    #                 self._jit_used = True
+++++                
+++++    #             # 构造兼容的输出对象
+++++    #             class JitOptimizedOutput:
+++++    #                 def __init__(self, logits, config):
+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++    #                     self.config = config
+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++    #                     self.cross_attentions = None
+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++                
+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++    #         else:
+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++    #             outputs = self(**model_inputs, return_dict=True)
+++++            
+++++    #         if synced_devices and this_peer_finished:
+++++    #             continue
+++++            
+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++    #         next_token_logits = outputs.logits[:, -1, :]
+++++            
+++++    #         # pre-process distribution
+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++    #         if do_sample:
+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++            
+++++    #         # Store scores, attentions and hidden_states when required
+++++    #         if return_dict_in_generate:
+++++    #             if output_scores:
+++++    #                 scores += (next_token_scores,)
+++++    #             if output_logits:
+++++    #                 raw_logits += (next_token_logits,)
+++++    #             if output_attentions:
+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++    #                 if self.config.is_encoder_decoder:
+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++                
+++++    #             if output_hidden_states:
+++++    #                 hidden = (
+++++    #                     outputs.decoder_hidden_states
+++++    #                     if self.config.is_encoder_decoder
+++++    #                     else outputs.hidden_states
+++++    #                 )
+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++            
+++++    #         # token selection
+++++    #         if do_sample:
+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++    #         else:
+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++            
+++++    #         # finished sentences should have their next token be a padding token
+++++    #         if has_eos_stopping_criteria:
+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++            
+++++    #         # update generated ids, model inputs, and length for next step
+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++    #         if streamer is not None:
+++++    #             streamer.put(next_tokens)
+++++            
+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++    #             outputs,
+++++    #             model_kwargs,
+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++    #         )
+++++            
+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++    #         cur_len += 1
+++++            
+++++    #         if _record_time:
+++++    #             import time as time_module
+++++    #             infer_stop = time_module.time()
+++++    #             time_record.append(infer_stop - infer_start)
+++++            
+++++    #         del outputs
+++++        
+++++    #     average_infer_time = None
+++++    #     if time_record:
+++++    #         if len(time_record) > 1:
+++++    #             time_record.pop(0)
+++++    #         average_infer_time = sum(time_record) / len(time_record)
+++++    #         print(f'average inference time is: {average_infer_time}')
+++++    #         print(f'inference time record: {time_record}')
+++++        
+++++    #     if streamer is not None:
+++++    #         streamer.end()
+++++        
+++++    #     # 简单判断：打印是否使用了JIT路径
+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++    #     else:
+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++        
+++++    #     if return_dict_in_generate:
+++++    #         if self.config.is_encoder_decoder:
+++++    #             return GenerateEncoderDecoderOutput(
+++++    #                 sequences=input_ids,
+++++    #                 scores=scores,
+++++    #                 logits=raw_logits,
+++++    #                 encoder_attentions=encoder_attentions,
+++++    #                 encoder_hidden_states=encoder_hidden_states,
+++++    #                 decoder_attentions=decoder_attentions,
+++++    #                 cross_attentions=cross_attentions,
+++++    #                 decoder_hidden_states=decoder_hidden_states,
+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++    #                 average_infer_time=average_infer_time
+++++    #             )
+++++    #         else:
+++++    #             return GenerateDecoderOnlyOutput(
+++++    #                 sequences=input_ids,
+++++    #                 scores=scores,
+++++    #                 logits=raw_logits,
+++++    #                 attentions=decoder_attentions,
+++++    #                 hidden_states=decoder_hidden_states,
+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++    #                 average_infer_time=average_infer_time
+++++    #             )
+++++    #     else:
+++++    #         return input_ids
+++++            
+++++    # def _prepare_cache_for_generation(
+++++    #     self,
+++++    #     generation_config,
+++++    #     model_kwargs,
+++++    #     assistant_model,
+++++    #     batch_size,
+++++    #     max_cache_length,
+++++    # ):
+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++    #         generation_config.cache_implementation = "static"
+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++        
+++++    #     if generation_config.cache_implementation == "static":
+++++    #         base_required_from_max_length = generation_config.max_length + 1
+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++    #         min_cache_size = 50
+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++    #         else:
+++++    #             max_cache_length = max(base_required, min_cache_size)
+++++            
+++++    #         original_max_cache_length = max_cache_length
+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++            
+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++    #             if max_cache_length > self.config.max_position_embeddings:
+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++        
+++++    #     result = super()._prepare_cache_for_generation(
+++++    #         generation_config=generation_config,
+++++    #         model_kwargs=model_kwargs,
+++++    #         assistant_model=assistant_model,
+++++    #         batch_size=batch_size,
+++++    #         max_cache_length=max_cache_length,
+++++    #     )
+++++        
+++++    #     if generation_config.cache_implementation == "static":
+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++    #         created_cache = model_kwargs.get(cache_name)
+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++    #             if created_cache.max_cache_len < generation_config.max_length:
+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++        
+++++    #     return result
+++++
+++++
+++++
++++ 
++++ 
++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++-- 
++++2.27.0
++++
+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++new file mode 100644
+++index 00000000..22b65dd5
+++--- /dev/null
++++++ b/patches/0002-20251106commit.patch
+++@@ -0,0 +1,3200 @@
++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Thu, 6 Nov 2025 09:20:38 +0800
++++Subject: [PATCH 2/3] 20251106commit
++++
++++---
++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
++++ create mode 100644 patches/0001-20251104commit.patch
++++
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index d8303e45..73773c22 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
++++         #     y = y + self.shared_experts(identity)
++++         # return y
++++         
+++++    # @no_grad()
+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++
+++++    #     expert_cache = ops.zeros_like(x)
+++++    #     for i in range(self.num_experts_per_tok):
+++++    #         expert_id = flat_expert_indices[i].item()
+++++    #         weight = flat_expert_weights[i].item()
+++++    #         expert = self.experts[expert_id]
+++++    #         expert_out = expert(x)
+++++    #         expert_cache += expert_out * weight
+++++    #     return expert_cache
+++++
++++     @no_grad()
++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++        # x 的 shape: (1, hidden_size)
+++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++++
+++++        # 1. 收集所有需要的专家层
+++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++++
+++++        # 2. 并行计算所有专家的输出
+++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++++
+++++        # 3. 使用矩阵乘法进行加权求和
+++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++++        
+++++        return final_output
++++ 
++++-        expert_cache = ops.zeros_like(x)
++++-        for i in range(self.num_experts_per_tok):
++++-            expert_id = flat_expert_indices[i].item()
++++-            weight = flat_expert_weights[i].item()
++++-            expert = self.experts[expert_id]
++++-            expert_out = expert(x)
++++-            expert_cache += expert_out * weight
++++-        return expert_cache
++++ 
++++     @no_grad()
++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
++++             key_states = self.k_proj(hidden_states)
++++             value_states = self.v_proj(hidden_states)
++++ 
++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++        # @lwx
+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++++ 
++++         kv_seq_len = key_states.shape[-2]
++++         if past_key_value is not None:
++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
++++         return attn_output, attn_weights, past_key_value
++++ 
++++ 
+++++# class DeepseekFlashAttention(nn.Module):
+++++#     """
+++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+++++
+++++#     This class is designed as a drop-in replacement for DeepseekAttention.
+++++#     """
+++++
+++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++++#         super().__init__()
+++++#         self.config = config
+++++#         self.layer_idx = layer_idx
+++++#         if layer_idx is None:
+++++#             logger.warning(
+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++#                 "when creating this class."
+++++#             )
+++++
+++++#         self.attention_dropout = config.attention_dropout
+++++#         self.hidden_size = config.hidden_size
+++++#         self.num_heads = config.num_attention_heads
+++++#         self.head_dim = self.hidden_size // self.num_heads
+++++#         self.num_key_value_heads = config.num_key_value_heads
+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++#         self.max_position_embeddings = config.max_position_embeddings
+++++#         self.rope_theta = config.rope_theta
+++++#         self.is_causal = True
+++++
+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++#             raise ValueError(
+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++#                 f" and `num_heads`: {self.num_heads})."
+++++#             )
+++++
+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++++#         self._init_rope()
+++++
+++++#     def _init_rope(self):
+++++#         if self.config.rope_scaling is None:
+++++#             self.rotary_emb = DeepseekRotaryEmbedding(
+++++#                 self.head_dim,
+++++#                 max_position_embeddings=self.max_position_embeddings,
+++++#                 base=self.rope_theta,
+++++#             )
+++++#         else:
+++++#             scaling_type = self.config.rope_scaling["type"]
+++++#             scaling_factor = self.config.rope_scaling["factor"]
+++++#             if scaling_type == "linear":
+++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++++#                     self.head_dim,
+++++#                     max_position_embeddings=self.max_position_embeddings,
+++++#                     scaling_factor=scaling_factor,
+++++#                     base=self.rope_theta,
+++++#                 )
+++++#             elif scaling_type == "dynamic":
+++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++++#                     self.head_dim,
+++++#                     max_position_embeddings=self.max_position_embeddings,
+++++#                     scaling_factor=scaling_factor,
+++++#                     base=self.rope_theta,
+++++#                 )
+++++#             else:
+++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++++
+++++#     def forward(
+++++#         self,
+++++#         hidden_states: mindspore.Tensor,
+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++#         past_key_value: Optional[Cache] = None,
+++++#         output_attentions: bool = False,
+++++#         use_cache: bool = False,
+++++#         **kwargs,
+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++#         if "padding_mask" in kwargs:
+++++#             warnings.warn(
+++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++++#             )
+++++        
+++++#         if output_attentions:
+++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+++++
+++++#         bsz, q_len, _ = hidden_states.shape
+++++
+++++#         if self.config.pretraining_tp > 1:
+++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++++
+++++#         query_states = self.q_proj(hidden_states)
+++++#         key_states = self.k_proj(hidden_states)
+++++#         value_states = self.v_proj(hidden_states)
+++++
+++++#         # Reshape for multi-head attention
+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++#         kv_seq_len = key_states.shape[-2]
+++++#         if past_key_value is not None:
+++++#             if self.layer_idx is None:
+++++#                 raise ValueError(
+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++#                     "with a layer index."
+++++#                 )
+++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++        
+++++#         # Apply Rotary Positional Embedding
+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++#         if past_key_value is not None:
+++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++
+++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++        
+++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++        
+++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++
+++++#         # Convert attention_mask for flash_attention_score
+++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+++++#         if attention_mask is not None:
+++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++++#                 raise ValueError(
+++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++++#                 )
+++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+++++#         else:
+++++#             attn_mask_for_fa = None
+++++        
+++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++++
+++++#         # Call the fused flash_attention_score operator
+++++#         attn_output = mindspore.ops.flash_attention_score(
+++++#             query=query_states_for_fa,
+++++#             key=key_states_for_fa,
+++++#             value=value_states_for_fa,
+++++#             head_num=self.num_heads, # This is N1, the number of query heads
+++++#             input_layout='BSH',
+++++#             attn_mask=attn_mask_for_fa,
+++++#             keep_prob=keep_prob,
+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++#             sparse_mode=0 # Default mask mode
+++++#         )
+++++        
+++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+++++#         attn_output = self.o_proj(attn_output)
+++++        
+++++#         # Flash Attention does not return attention weights
+++++#         attn_weights = None
+++++
+++++#         return attn_output, attn_weights, past_key_value
+++++
+++++class DeepseekFlashAttention(nn.Module):
+++++    """
+++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+++++    designed for high performance on supported hardware (Ascend).
+++++
+++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+++++    """
+++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++++        super().__init__()
+++++        self.config = config
+++++        self.layer_idx = layer_idx
+++++        if layer_idx is None:
+++++            logger.warning(
+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++                "when creating this class."
+++++            )
+++++
+++++        # --- [FIX] Correctly initialize all required attributes ---
+++++        self.attention_dropout = config.attention_dropout
+++++        self.hidden_size = config.hidden_size
+++++        self.num_heads = config.num_attention_heads
+++++        self.head_dim = self.hidden_size // self.num_heads
+++++        self.num_key_value_heads = config.num_key_value_heads
+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++        self.max_position_embeddings = config.max_position_embeddings
+++++        self.rope_theta = config.rope_theta
+++++        self.is_causal = True
+++++
+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++            raise ValueError(
+++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++                f" and `num_heads`: {self.num_heads})."
+++++            )
+++++
+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++++        
+++++        # This call will now succeed as all attributes are initialized.
+++++        self._init_rope()
+++++
+++++    def _init_rope(self):
+++++        if self.config.rope_scaling is None:
+++++            self.rotary_emb = DeepseekRotaryEmbedding(
+++++                self.head_dim,
+++++                max_position_embeddings=self.max_position_embeddings,
+++++                base=self.rope_theta,
+++++            )
+++++        else:
+++++            scaling_type = self.config.rope_scaling["type"]
+++++            scaling_factor = self.config.rope_scaling["factor"]
+++++            if scaling_type == "linear":
+++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++++                    self.head_dim,
+++++                    max_position_embeddings=self.max_position_embeddings,
+++++                    scaling_factor=scaling_factor,
+++++                    base=self.rope_theta,
+++++                )
+++++            elif scaling_type == "dynamic":
+++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++++                    self.head_dim,
+++++                    max_position_embeddings=self.max_position_embeddings,
+++++                    scaling_factor=scaling_factor,
+++++                    base=self.rope_theta,
+++++                )
+++++            else:
+++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++++
+++++    def forward(
+++++        self,
+++++        hidden_states: mindspore.Tensor,
+++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++        position_ids: Optional[mindspore.Tensor] = None,
+++++        past_key_value: Optional[Cache] = None,
+++++        output_attentions: bool = False,
+++++        use_cache: bool = False,
+++++        **kwargs,
+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++        if "padding_mask" in kwargs:
+++++            warnings.warn(
+++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++++            )
+++++        if output_attentions:
+++++            warnings.warn(
+++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+++++            )
+++++
+++++        bsz, q_len, _ = hidden_states.shape
+++++
+++++        if self.config.pretraining_tp > 1:
+++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++++
+++++        query_states = self.q_proj(hidden_states)
+++++        key_states = self.k_proj(hidden_states)
+++++        value_states = self.v_proj(hidden_states)
+++++
+++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++        kv_seq_len = key_states.shape[-2]
+++++        if past_key_value is not None:
+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++        
+++++        # Apply Rotary Position Embedding
+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++        if past_key_value is not None:
+++++            cache_kwargs = {"sin": sin, "cos": cos}
+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++
+++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+++++        # So we must explicitly repeat the KV heads.
+++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++
+++++        # Convert attention mask for flash_attention_score
+++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+++++        if attention_mask is not None:
+++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++++                 raise ValueError(
+++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++++                )
+++++            attn_mask_for_fa = attention_mask < 0
+++++        else:
+++++            attn_mask_for_fa = None
+++++
+++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++++
+++++        # Call the fused operator using the efficient BNSD layout
+++++        attn_output = mindspore.ops.flash_attention_score(
+++++            query=query_states,
+++++            key=key_states,
+++++            value=value_states,
+++++            head_num=self.num_heads,
+++++            input_layout='BNSD', # Specify the correct layout
+++++            attn_mask=attn_mask_for_fa,
+++++            keep_prob=keep_prob,
+++++            scalar_value=1.0 / math.sqrt(self.head_dim)
+++++        )
+++++        
+++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++        
+++++        # Apply output projection
+++++        attn_output = self.o_proj(attn_output)
+++++
+++++        # Flash attention does not return attention weights, so we return None.
+++++        attn_weights = None
+++++
+++++        return attn_output, attn_weights, past_key_value
+++++
++++ Deepseek_ATTENTION_CLASSES = {
++++     "eager": DeepseekAttention,
+++++    "flash-attention": DeepseekFlashAttention,
++++ }
++++ 
++++ 
++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
++++             config=config, layer_idx=layer_idx
++++         )
++++ 
+++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+++++            config=config, layer_idx=layer_idx
+++++        )
+++++
++++         self.mlp = (
++++             DeepseekMoE(config)
++++             if (
++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++index d4c6b651..bced285c 100644
++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
++++ 
++++ import mindspore
++++ import mindnlp.core.nn.functional as F
++++-from mindnlp.core import nn, ops
+++++from mindnlp.core import nn, ops, no_grad
++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
++++ 
++++ from ....common.activations import ACT2FN
++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++++ 
+++++Long_Prompt = False
+++++PROMPT_LENGTH_THRESHOLD = 128
++++ 
++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++++ def _prepare_4d_causal_attention_mask_with_cache_position(
++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
++++         return attn_output, attn_weights, past_key_value
++++ 
++++ 
+++++# class Qwen2MoeFlashAttention(nn.Module):
+++++#     """
+++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++
+++++#     关键改动:
+++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++#        直接传入原始的 key 和 value 张量效率更高。
+++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++#         super().__init__()
+++++#         self.config = config
+++++#         self.layer_idx = layer_idx
+++++#         self.hidden_size = config.hidden_size
+++++#         self.num_heads = config.num_attention_heads
+++++#         self.head_dim = self.hidden_size // self.num_heads
+++++#         self.num_key_value_heads = config.num_key_value_heads
+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++#         self.max_position_embeddings = config.max_position_embeddings
+++++#         self.rope_theta = config.rope_theta
+++++#         self.attention_dropout = config.attention_dropout
+++++
+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++#             raise ValueError(
+++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++#             )
+++++
+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++
+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++#             self.head_dim,
+++++#             max_position_embeddings=self.max_position_embeddings,
+++++#             base=self.rope_theta,
+++++#         )
+++++
+++++#     def forward(
+++++#         self,
+++++#         hidden_states: mindspore.Tensor,
+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++#         past_key_value: Optional[Cache] = None,
+++++#         output_attentions: bool = False,
+++++#         use_cache: bool = False,
+++++#         cache_position: Optional[mindspore.Tensor] = None,
+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++#         bsz, q_len, _ = hidden_states.shape
+++++
+++++#         # 1. 线性投射 Q, K, V
+++++#         query_states = self.q_proj(hidden_states)
+++++#         key_states = self.k_proj(hidden_states)
+++++#         value_states = self.v_proj(hidden_states)
+++++
+++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++#         # 3. RoPE 旋转位置编码
+++++#         kv_seq_len = key_states.shape[-2]
+++++#         if past_key_value is not None:
+++++#             if self.layer_idx is None:
+++++#                 raise ValueError(
+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++#                     "with a layer index."
+++++#                 )
+++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++#                 if cache_position.shape[0] == 1:
+++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++#                     kv_seq_len = past_seen_tokens + 1
+++++#                 else:
+++++#                     # prefill 阶段：cache_position 是范围，使用其长度
+++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++#             else:
+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++#         # 4. KV 缓存更新
+++++#         if past_key_value is not None:
+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++#             key_states, value_states = past_key_value.update(
+++++#                 key_states, value_states, self.layer_idx, cache_kwargs
+++++#             )
+++++            
+++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++#                 if cache_position.shape[0] == 1:
+++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++#                     kv_seq_len = key_states.shape[-2]
+++++
+++++#         # 5. [重要] 准备 Attention Mask
+++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++#         fa_attention_mask = None
+++++#         if attention_mask is not None:
+++++#             # 截取与当前key长度匹配的部分
+++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++#             fa_attention_mask = (mask_slice != 0)
+++++
+++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++#         input_dtype = query_states.dtype
+++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++#             query_states = query_states.to(mindspore.float16)
+++++#             key_states = key_states.to(mindspore.float16)
+++++#             value_states = value_states.to(mindspore.float16)
+++++
+++++#         # 6. [核心] 调用 flash_attention_score 算子
+++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++#         attn_output = mindspore.ops.flash_attention_score(
+++++#             query=query_states,
+++++#             key=key_states,
+++++#             value=value_states,
+++++#             head_num=self.num_heads, # 传入Q的头数(N1)
+++++#             attn_mask=fa_attention_mask,
+++++#             keep_prob=1.0 - self.attention_dropout,
+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++#             input_layout="BNSD",
+++++#             sparse_mode=0 # 使用 defaultMask 模式
+++++#         )
+++++
+++++#         # 恢复原始数据类型
+++++#         attn_output = attn_output.to(input_dtype)
+++++
+++++#         # 7. 调整输出形状
+++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++#         attn_output = self.o_proj(attn_output)
+++++
+++++#         # FlashAttention 算子不直接返回注意力权重矩阵
+++++#         attn_weights = None
+++++#         if output_attentions:
+++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++#         return attn_output, attn_weights, past_key_value
+++++
+++++#     # def forward(
+++++#     #     self,
+++++#     #     hidden_states: mindspore.Tensor,
+++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+++++#     #     position_ids: Optional[mindspore.Tensor] = None,
+++++#     #     past_key_value: Optional[Cache] = None,
+++++#     #     output_attentions: bool = False,
+++++#     #     use_cache: bool = False,
+++++#     #     cache_position: Optional[mindspore.Tensor] = None,
+++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++#     #     bsz, q_len, _ = hidden_states.shape
+++++
+++++#     #     # 1. 线性投射 Q, K, V
+++++#     #     query_states = self.q_proj(hidden_states)
+++++#     #     key_states = self.k_proj(hidden_states)
+++++#     #     value_states = self.v_proj(hidden_states)
+++++
+++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++
+++++#     #     # 3. RoPE 旋转位置编码
+++++#     #     kv_seq_len = key_states.shape[-2]
+++++#     #     if past_key_value is not None:
+++++#     #         if self.layer_idx is None:
+++++#     #             raise ValueError(
+++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++#     #                 "with a layer index."
+++++#     #             )
+++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++
+++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++#     #     # 4. KV 缓存更新
+++++#     #     if past_key_value is not None:
+++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++#     #         key_states, value_states = past_key_value.update(
+++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+++++#     #         )
+++++
+++++#     #     # 5. 准备 Attention Mask
+++++#     #     fa_attention_mask = None
+++++#     #     if attention_mask is not None:
+++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++#     #         fa_attention_mask = (mask_slice != 0)
+++++
+++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++#     #     input_dtype = query_states.dtype
+++++
+++++#     #     # 6. [核心] 调用 flash_attention_score 算子
+++++#     #     attn_output = mindspore.ops.flash_attention_score(
+++++#     #         query=query_states,
+++++#     #         key=key_states,
+++++#     #         value=value_states,
+++++#     #         head_num=self.num_heads,
+++++#     #         attn_mask=fa_attention_mask,
+++++#     #         keep_prob=1.0 - self.attention_dropout,
+++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++#     #         input_layout="BNSD",
+++++#     #         sparse_mode=0,
+++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++#     #         inner_precise=1
+++++#     #     )
+++++
+++++#     #     # 恢复原始数据类型
+++++#     #     attn_output = attn_output.to(input_dtype)
+++++
+++++#     #     # 7. 调整输出形状
+++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++#     #     attn_output = self.o_proj(attn_output)
+++++
+++++#     #     attn_weights = None
+++++#     #     if output_attentions:
+++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++
+++++#     #     return attn_output, attn_weights, past_key_value
+++++
+++++
++++ class Qwen2MoeFlashAttention(nn.Module):
++++     """
++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++-
++++-    关键改动:
++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++-       直接传入原始的 key 和 value 张量效率更高。
++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+++++
+++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+++++    完全使用模型的低精度数据类型（如 float16）进行计算，
+++++    以达到理论上的最高执行速度。
++++     """
++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++         super().__init__()
++++         self.config = config
++++         self.layer_idx = layer_idx
+++++        if layer_idx is None:
+++++            logger.warning_once(
+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+++++            )
+++++
++++         self.hidden_size = config.hidden_size
++++         self.num_heads = config.num_attention_heads
++++         self.head_dim = self.hidden_size // self.num_heads
++++         self.num_key_value_heads = config.num_key_value_heads
++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++         self.max_position_embeddings = config.max_position_embeddings
++++         self.rope_theta = config.rope_theta
++++         self.attention_dropout = config.attention_dropout
++++ 
++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
++++-            raise ValueError(
++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++-            )
++++-
++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
++++         key_states = self.k_proj(hidden_states)
++++         value_states = self.v_proj(hidden_states)
++++ 
++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++        # 2. 调整形状以匹配 BNSD 布局
++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-
++++-        # 3. RoPE 旋转位置编码
+++++        
+++++        # 3. RoPE 和 KV 缓存
++++         kv_seq_len = key_states.shape[-2]
++++         if past_key_value is not None:
++++-            if self.layer_idx is None:
++++-                raise ValueError(
++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-                    "with a layer index."
++++-                )
++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++-                if cache_position.shape[0] == 1:
++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++-                    kv_seq_len = past_seen_tokens + 1
++++-                else:
++++-                    # prefill 阶段：cache_position 是范围，使用其长度
++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++-            else:
++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-
+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++        
++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++ 
++++-        # 4. KV 缓存更新
++++         if past_key_value is not None:
++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-            key_states, value_states = past_key_value.update(
++++-                key_states, value_states, self.layer_idx, cache_kwargs
++++-            )
++++-            
++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++-                if cache_position.shape[0] == 1:
++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++-                    kv_seq_len = key_states.shape[-2]
++++-
++++-        # 5. [重要] 准备 Attention Mask
++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++
+++++        # 4. 准备 Attention Mask
++++         fa_attention_mask = None
++++         if attention_mask is not None:
++++-            # 截取与当前key长度匹配的部分
++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++             fa_attention_mask = (mask_slice != 0)
++++ 
++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++-        input_dtype = query_states.dtype
++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++-            query_states = query_states.to(mindspore.float16)
++++-            key_states = key_states.to(mindspore.float16)
++++-            value_states = value_states.to(mindspore.float16)
++++-
++++-        # 6. [核心] 调用 flash_attention_score 算子
++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
++++         attn_output = mindspore.ops.flash_attention_score(
++++             query=query_states,
++++             key=key_states,
++++             value=value_states,
++++-            head_num=self.num_heads, # 传入Q的头数(N1)
+++++            head_num=self.num_heads,
++++             attn_mask=fa_attention_mask,
++++-            keep_prob=1.0 - self.attention_dropout,
+++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
++++             scalar_value=1.0 / math.sqrt(self.head_dim),
++++             input_layout="BNSD",
++++-            sparse_mode=0 # 使用 defaultMask 模式
+++++            sparse_mode=0,
+++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
++++         )
++++ 
++++-        # 恢复原始数据类型
++++-        attn_output = attn_output.to(input_dtype)
++++-
++++-        # 7. 调整输出形状
++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++        # 6. 调整输出形状
++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++         attn_output = self.o_proj(attn_output)
++++ 
++++-        # FlashAttention 算子不直接返回注意力权重矩阵
+++++        # 7. 返回结果
++++         attn_weights = None
++++         if output_attentions:
++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
++++-    # def forward(
++++-    #     self,
++++-    #     hidden_states: mindspore.Tensor,
++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++++-    #     position_ids: Optional[mindspore.Tensor] = None,
++++-    #     past_key_value: Optional[Cache] = None,
++++-    #     output_attentions: bool = False,
++++-    #     use_cache: bool = False,
++++-    #     cache_position: Optional[mindspore.Tensor] = None,
++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-
++++-    #     bsz, q_len, _ = hidden_states.shape
++++-
++++-    #     # 1. 线性投射 Q, K, V
++++-    #     query_states = self.q_proj(hidden_states)
++++-    #     key_states = self.k_proj(hidden_states)
++++-    #     value_states = self.v_proj(hidden_states)
++++-
++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-
++++-    #     # 3. RoPE 旋转位置编码
++++-    #     kv_seq_len = key_states.shape[-2]
++++-    #     if past_key_value is not None:
++++-    #         if self.layer_idx is None:
++++-    #             raise ValueError(
++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-    #                 "with a layer index."
++++-    #             )
++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++ 
++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-
++++-    #     # 4. KV 缓存更新
++++-    #     if past_key_value is not None:
++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-    #         key_states, value_states = past_key_value.update(
++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++++-    #         )
++++-
++++-    #     # 5. 准备 Attention Mask
++++-    #     fa_attention_mask = None
++++-    #     if attention_mask is not None:
++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-    #         fa_attention_mask = (mask_slice != 0)
++++-
++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++-    #     input_dtype = query_states.dtype
++++-
++++-    #     # 6. [核心] 调用 flash_attention_score 算子
++++-    #     attn_output = mindspore.ops.flash_attention_score(
++++-    #         query=query_states,
++++-    #         key=key_states,
++++-    #         value=value_states,
++++-    #         head_num=self.num_heads,
++++-    #         attn_mask=fa_attention_mask,
++++-    #         keep_prob=1.0 - self.attention_dropout,
++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++-    #         input_layout="BNSD",
++++-    #         sparse_mode=0,
++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++-    #         inner_precise=1
++++-    #     )
++++-
++++-    #     # 恢复原始数据类型
++++-    #     attn_output = attn_output.to(input_dtype)
+++++QWEN2MOE_ATTENTION_CLASSES = {
+++++    "eager": Qwen2MoeAttention,
+++++    "flash-attention": Qwen2MoeFlashAttention,
+++++}
++++ 
++++-    #     # 7. 调整输出形状
++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-    #     attn_output = self.o_proj(attn_output)
++++ 
++++-    #     attn_weights = None
++++-    #     if output_attentions:
++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     def __init__(self, config):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         # gating
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     #@dwj
+++++#     # 只遍历激活的专家，而非全部专家
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#             num_tokens = hidden_states_reshaped.shape[0]
+++++
+++++#             router_logits = self.gate(hidden_states_reshaped)
+++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++
+++++#             if self.norm_topk_prob:
+++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++#             routing_weights = routing_weights.to(hidden_states.dtype)
+++++            
+++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++#             flat_selected_experts = selected_experts.flatten()
+++++            
+++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++#             token_indices = broadcasted_token_indices.flatten()
+++++            
+++++#             active_experts = ops.unique(flat_selected_experts)
+++++            
+++++#             for expert_idx_tensor in active_experts:
+++++#                 expert_idx = expert_idx_tensor.item()
+++++#                 expert_layer = self.experts[expert_idx]
+++++                
+++++#                 mask = (flat_selected_experts == expert_idx_tensor)
+++++#                 selected_token_indices = token_indices[mask]
+++++#                 selected_routing_weights = routing_weights.flatten()[mask]
+++++                
+++++#                 current_states = hidden_states_reshaped[selected_token_indices]
+++++                
+++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++                
+++++#                 final_hidden_states = final_hidden_states.index_add(
+++++#                     dim=0,
+++++#                     index=selected_token_indices,
+++++#                     source=expert_output.to(hidden_states.dtype)
+++++#                 )
+++++            
+++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++ 
++++-    #     return attn_output, attn_weights, past_key_value
+++++#             final_hidden_states = final_hidden_states + shared_expert_output
+++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++            
+++++#             return final_hidden_states, router_logits
+++++
+++++
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     """
+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         # 门控网络
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         # 专家列表
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++#         # 共享专家
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_decode(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         """
+++++#         【解码路径】针对 sequence_length=1 的极致优化。
+++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++++#         """
+++++#         batch_size, hidden_dim = hidden_states.shape
+++++        
+++++#         expert_outputs_list = [
+++++#             ops.cat([
+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++#             ], dim=0) 
+++++#             for i in range(batch_size)
+++++#         ]
+++++        
+++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++++#         # shape: (batch_size, top_k, hidden_dim)
+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++        
+++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++        
+++++#         return moe_output.squeeze(1)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_prefill(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         """
+++++#         【预填充路径】针对 sequence_length > 1 的优化。
+++++#         按专家对 Token 进行分组，并进行批处理。
+++++#         """
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens = hidden_states.shape[0]
+++++#         flat_selected_experts = selected_experts.flatten()
+++++        
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++        
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++        
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++            
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++#             selected_token_indices = token_indices[mask]
+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++            
+++++#             current_states = hidden_states[selected_token_indices]
+++++            
+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++            
+++++#             moe_output = moe_output.index_add(
+++++#                 dim=0,
+++++#                 index=selected_token_indices,
+++++#                 source=expert_output.to(hidden_states.dtype)
+++++#             )
+++++#         return moe_output
+++++
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#         """
+++++#         顶层 forward 方法，作为智能分发器。
+++++#         """
+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++        
+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#         router_logits = self.gate(hidden_states_reshaped)
+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++ 
++++-    # def forward(
++++-    #     self,
++++-    #     hidden_states: mindspore.Tensor,
++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++++-    #     position_ids: Optional[mindspore.Tensor] = None,
++++-    #     past_key_value: Optional[Cache] = None,
++++-    #     output_attentions: bool = False,
++++-    #     use_cache: bool = False,
++++-    #     cache_position: Optional[mindspore.Tensor] = None,
++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-
++++-    #     bsz, q_len, _ = hidden_states.shape
++++-
++++-    #     query_states = self.q_proj(hidden_states)
++++-    #     key_states = self.k_proj(hidden_states)
++++-    #     value_states = self.v_proj(hidden_states)
++++-
++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-
++++-    #     kv_seq_len = key_states.shape[-2]
++++-    #     if past_key_value is not None:
++++-    #         if self.layer_idx is None:
++++-    #             raise ValueError("`layer_idx` must be specified for caching")
++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-
++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-
++++-    #     if past_key_value is not None:
++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-    #         key_states, value_states = past_key_value.update(
++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++++-    #         )
+++++#         if self.norm_topk_prob:
+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++        
+++++#         moe_output = None
+++++#         # 在推理时，根据序列长度选择最优路径
+++++#         if not self.training:
+++++#             if sequence_length == 1:
+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++#             else:
+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++#         else:
+++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++++#             raise NotImplementedError("Training path is not implemented.")
+++++
+++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++++        
+++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++++        
+++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++#         return final_hidden_states, router_logits
+++++
+++++
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     """
+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         # 门控网络
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         # 专家列表
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++#         # 共享专家
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_decode(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         batch_size, _ = hidden_states.shape
+++++#         expert_outputs_list = [
+++++#             ops.cat([
+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++#             ], dim=0) 
+++++#             for i in range(batch_size)
+++++#         ]
+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++#         return moe_output.squeeze(1)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_prefill(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens = hidden_states.shape[0]
+++++#         flat_selected_experts = selected_experts.flatten()
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++        
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++#             selected_token_indices = token_indices[mask]
+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++#             current_states = hidden_states[selected_token_indices]
+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++#             moe_output = moe_output.index_add(
+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++#             )
+++++#         return moe_output
+++++
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#         """
+++++#         顶层 forward 方法，作为智能分发器。
+++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++++#         """
+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++        
+++++#         # 1. 门控计算 (通用逻辑)
+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#         router_logits = self.gate(hidden_states_reshaped)
+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++
+++++#         if self.norm_topk_prob:
+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++        
+++++#         # 2. 智能分发到最优 MoE 路径
+++++#         moe_output = None
+++++#         if not self.training:
+++++#             if sequence_length == 1:
+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++#             else:
+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++#         else:
+++++#             raise NotImplementedError("Training path is not implemented.")
+++++
+++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++        
+++++#         # 4. 合并 MoE 输出和共享专家输出
+++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++        
+++++#         # 5. 恢复原始形状并返回
+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++#         return final_hidden_states, router_logits
+++++
+++++# prefill fastest
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     """
+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         # 门控网络
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         # 专家列表
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++#         # 共享专家
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_dispatch(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         """
+++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++++#         """
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens, _ = hidden_states.shape
+++++        
+++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++++#         flat_selected_experts = selected_experts.flatten()
+++++#         flat_routing_weights = routing_weights.flatten()
++++ 
++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++-
++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++-    #     query_states = query_states / math.sqrt(self.head_dim)
++++-    #     # <--- 修改结束 ---
++++-
++++-    #     fa_attention_mask = None
++++-    #     if attention_mask is not None:
++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-    #         fa_attention_mask = (mask_slice != 0)
++++-
++++-    #     input_dtype = query_states.dtype
++++-
++++-    #     attn_output = mindspore.ops.flash_attention_score(
++++-    #         query=query_states,    # 传入已经预先缩放过的 query
++++-    #         key=key_states,
++++-    #         value=value_states,
++++-    #         head_num=self.num_heads,
++++-    #         attn_mask=fa_attention_mask,
++++-    #         keep_prob=1.0 - self.attention_dropout,
++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++-    #         input_layout="BNSD",
++++-    #         sparse_mode=0,
++++-    #         inner_precise=1        # 仍然保持内部高精度计算
++++-    #     )
+++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++ 
++++-    #     attn_output = attn_output.to(input_dtype)
++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-    #     attn_output = self.o_proj(attn_output)
+++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++        
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++            
+++++#             # 找到所有分配给该专家的 token
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++            
+++++#             # 使用 mask 选取对应的 token 和权重
+++++#             current_token_indices = token_indices[mask]
+++++#             current_routing_weights = flat_routing_weights[mask]
+++++#             current_hidden_states = hidden_states[current_token_indices]
+++++            
+++++#             # 对这些 token 进行批处理
+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++            
+++++#             # 使用 index_add 将结果精确地加回到对应位置
+++++#             moe_output = moe_output.index_add(
+++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++++#             )
+++++#         return moe_output
+++++
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#         """
+++++#         顶层 forward 方法，作为智能分发器。
+++++#         """
+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++        
+++++#         # 1. 门控计算
+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#         router_logits = self.gate(hidden_states_reshaped)
+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++
+++++#         if self.norm_topk_prob:
+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++        
+++++#         # 2. 调用统一的 MoE 计算内核
+++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++++ 
++++-    #     attn_weights = None
++++-    #     if output_attentions:
++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++#         # 3. 统一处理共享专家
+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++        
+++++#         # 4. 合并输出
+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++        
+++++#         # 5. 恢复原始形状并返回
+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++#         return final_hidden_states, router_logits
+++++
+++++
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     """
+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++#     【最终高性能与高精度版】：
+++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++++#     3. 这样实现了速度和准确性的两全其美。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_decode(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         """
+++++#         【解码路径】极致优化版：bmm + 高精度累加。
+++++#         """
+++++#         original_dtype = hidden_states.dtype
+++++#         batch_size, _ = hidden_states.shape
+++++        
+++++#         expert_outputs_list = [
+++++#             ops.cat([
+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++#             ], dim=0) 
+++++#             for i in range(batch_size)
+++++#         ]
+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++
+++++#         # 在 float32 下执行 bmm，得到高精度结果
+++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++        
+++++#         # 将高精度结果转换回原始数据类型
+++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++++        
+++++#         return moe_output
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_prefill(
+++++#         self, 
+++++#         hidden_states: mindspore.Tensor, 
+++++#         selected_experts: mindspore.Tensor, 
+++++#         routing_weights: mindspore.Tensor
+++++#     ) -> mindspore.Tensor:
+++++#         """
+++++#         【预填充路径】与原始实现一致，结果精确。
+++++#         """
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens, _ = hidden_states.shape
+++++#         flat_selected_experts = selected_experts.flatten()
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++        
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++#             selected_token_indices = token_indices[mask]
+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++#             current_states = hidden_states[selected_token_indices]
+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++#             moe_output = moe_output.index_add(
+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++#             )
+++++#         return moe_output
+++++
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++        
+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#         router_logits = self.gate(hidden_states_reshaped)
+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++ 
++++-    #     return attn_output, attn_weights, past_key_value
+++++#         if self.norm_topk_prob:
+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++++#         # 如果模型主体是 float16，后续再转换
+++++        
+++++#         moe_output = None
+++++#         if not self.training:
+++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++++#             # _moe_infer_decode 内部会处理好类型转换
+++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++++#             if sequence_length == 1:
+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++#             else:
+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++#         else:
+++++#             raise NotImplementedError("Training path is not implemented.")
+++++
+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++        
+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++#         return final_hidden_states, router_logits
+++++    
++++ 
++++-QWEN2MOE_ATTENTION_CLASSES = {
++++-    "eager": Qwen2MoeAttention,
++++-    "flash-attention": Qwen2MoeFlashAttention,
++++-}
+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++#     """
+++++#     【融合版】一个混合专家模块，内置两种推理策略，
+++++#     由外部全局变量 `Long_Prompt` 控制：
+++++
+++++#     - if Long_Prompt is True:  【精度优先模式】
+++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++++#       适用于处理长序列，避免误差累积。
+++++
+++++#     - if Long_Prompt is False: 【速度优先模式】
+++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++++#       在解码阶段获得极致速度，同时保证结果高度准确。
+++++#     """
+++++#     def __init__(self, config: Qwen2MoeConfig):
+++++#         super().__init__()
+++++#         self.num_experts = config.num_experts
+++++#         self.top_k = config.num_experts_per_tok
+++++#         self.norm_topk_prob = config.norm_topk_prob
+++++
+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++#         self.experts = nn.ModuleList(
+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++#         )
+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++
+++++#     # --- 速度优先模式的辅助函数 ---
+++++#     @no_grad()
+++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++#         original_dtype = hidden_states.dtype
+++++#         batch_size, _ = hidden_states.shape
+++++#         expert_outputs_list = [
+++++#             ops.cat([
+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++#             ], dim=0) 
+++++#             for i in range(batch_size)
+++++#         ]
+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++#         weights_fp32 = routing_weights.to(mindspore.float32)
+++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++
+++++#     @no_grad()
+++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens, _ = hidden_states.shape
+++++#         flat_selected_experts = selected_experts.flatten()
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++#             selected_token_indices = token_indices[mask]
+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++#             current_states = hidden_states[selected_token_indices]
+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++++#         return moe_output
+++++
+++++#     # --- 精度优先模式的辅助函数 ---
+++++#     @no_grad()
+++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++#         moe_output = ops.zeros_like(hidden_states)
+++++#         num_tokens, _ = hidden_states.shape
+++++#         flat_selected_experts = selected_experts.flatten()
+++++#         flat_routing_weights = routing_weights.flatten()
+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++#         active_experts = ops.unique(flat_selected_experts)
+++++#         for expert_idx_tensor in active_experts:
+++++#             expert_idx = expert_idx_tensor.item()
+++++#             expert_layer = self.experts[expert_idx]
+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++#             current_token_indices = token_indices[mask]
+++++#             current_routing_weights = flat_routing_weights[mask]
+++++#             current_hidden_states = hidden_states[current_token_indices]
+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++#         return moe_output
+++++
+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++#         # 声明我们将要使用一个在模块外部定义的全局变量
+++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++++#         global Long_Prompt
+++++
+++++#         # 1. 门控计算 (所有模式通用)
+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++#         router_logits = self.gate(hidden_states_reshaped)
+++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++#         if self.norm_topk_prob:
+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++#         moe_output = None
+++++#         if not self.training:
+++++#             # 根据 Long_Prompt 标志选择模式
+++++#             if Long_Prompt:
+++++#                 # --- 精度优先模式 ---
+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++#             else:
+++++#                 # --- 速度优先模式 ---
+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++#                 if sequence_length == 1:
+++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++#                 else:
+++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++#         else:
+++++#             raise NotImplementedError("Training path is not implemented.")
+++++
+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++        
+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++#         return final_hidden_states, router_logits
+++++    
+++++class Qwen2MoeSparseMoeBlock(nn.Module):
+++++    """
+++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++++    控制的顶级推理策略：
++++ 
+++++    - if Long_Prompt is True:  【精度优先模式】
+++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+++++      适用于需要严格可复现性的长序列任务。
++++ 
++++-class Qwen2MoeSparseMoeBlock(nn.Module):
++++-    def __init__(self, config):
+++++    - if Long_Prompt is False: 【速度优先模式】
+++++      采用业界最强的性能组合：
+++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+++++    """
+++++    def __init__(self, config: Qwen2MoeConfig):
++++         super().__init__()
++++         self.num_experts = config.num_experts
++++         self.top_k = config.num_experts_per_tok
++++         self.norm_topk_prob = config.norm_topk_prob
++++ 
++++-        # gating
++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++         self.experts = nn.ModuleList(
++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++         )
++++-
++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++ 
++++-    #@dwj
++++-    # 只遍历激活的专家，而非全部专家
++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-            num_tokens = hidden_states_reshaped.shape[0]
++++-
++++-            router_logits = self.gate(hidden_states_reshaped)
++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-
++++-            if self.norm_topk_prob:
++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-            routing_weights = routing_weights.to(hidden_states.dtype)
++++-            
++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++-            flat_selected_experts = selected_experts.flatten()
++++-            
++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++-            token_indices = broadcasted_token_indices.flatten()
++++-            
++++-            active_experts = ops.unique(flat_selected_experts)
++++-            
++++-            for expert_idx_tensor in active_experts:
++++-                expert_idx = expert_idx_tensor.item()
++++-                expert_layer = self.experts[expert_idx]
++++-                
++++-                mask = (flat_selected_experts == expert_idx_tensor)
++++-                selected_token_indices = token_indices[mask]
++++-                selected_routing_weights = routing_weights.flatten()[mask]
++++-                
++++-                current_states = hidden_states_reshaped[selected_token_indices]
++++-                
++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-                
++++-                final_hidden_states = final_hidden_states.index_add(
++++-                    dim=0,
++++-                    index=selected_token_indices,
++++-                    source=expert_output.to(hidden_states.dtype)
++++-                )
++++-            
++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+++++    @no_grad()
+++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++        original_dtype = hidden_states.dtype
+++++        batch_size, _ = hidden_states.shape
+++++        expert_outputs_list = [
+++++            ops.cat([
+++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++            ], dim=0)
+++++            for i in range(batch_size)
+++++        ]
+++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++        weights_fp32 = routing_weights.to(mindspore.float32)
+++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++        return moe_output_fp32.squeeze(1).to(original_dtype)
+++++
+++++    @no_grad()
+++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++        num_tokens, _ = hidden_states.shape
+++++        flat_selected_experts = selected_experts.flatten()
+++++        sorted_expert_indices = flat_selected_experts.argsort()
+++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++        original_token_indices = sorted_expert_indices // self.top_k
+++++        moe_output = ops.zeros_like(hidden_states)
+++++        current_token_offset = 0
+++++        for i in range(self.num_experts):
+++++            expert_token_count = tokens_per_expert[i] - current_token_offset
+++++            if expert_token_count == 0:
+++++                continue
+++++            end_offset = current_token_offset + expert_token_count
+++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++            expert_hidden_states = hidden_states[expert_original_token_indices]
+++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++            current_token_offset += expert_token_count
+++++        return moe_output
+++++
+++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++++    @no_grad()
+++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++        moe_output = ops.zeros_like(hidden_states)
+++++        num_tokens, _ = hidden_states.shape
+++++        flat_selected_experts = selected_experts.flatten()
+++++        flat_routing_weights = routing_weights.flatten()
+++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++        active_experts = ops.unique(flat_selected_experts)
+++++        for expert_idx_tensor in active_experts:
+++++            expert_idx = expert_idx_tensor.item()
+++++            expert_layer = self.experts[expert_idx]
+++++            mask = (flat_selected_experts == expert_idx_tensor)
+++++            current_token_indices = token_indices[mask]
+++++            current_routing_weights = flat_routing_weights[mask]
+++++            current_hidden_states = hidden_states[current_token_indices]
+++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++        return moe_output
++++ 
++++-            final_hidden_states = final_hidden_states + shared_expert_output
++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++-            
++++-            return final_hidden_states, router_logits
+++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++        global Long_Prompt
+++++
+++++        # 1. 门控计算 (所有模式通用)
+++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++        router_logits = self.gate(hidden_states_reshaped)
+++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++        if self.norm_topk_prob:
+++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++        
+++++        moe_output = None
+++++        if Long_Prompt:
+++++            # --- 精度优先模式 (ACCURACY MODE) ---
+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        else:
+++++            # --- 速度优先模式 (SPEED MODE) ---
+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++            if sequence_length == 1:
+++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++            else:
+++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        
++++ 
+++++        # 3. 共享专家计算与合并 (所有模式通用)
+++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++        
+++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++        
+++++        return final_hidden_states, router_logits
++++ 
++++ class Qwen2MoeDecoderLayer(nn.Module):
++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++++         super().__init__()
++++         self.hidden_size = config.hidden_size
+++++        
+++++        # if Long_Prompt:
+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++        # else:
+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++ 
++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++ 
++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++-
++++         if (layer_idx not in config.mlp_only_layers) and (
++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++         ):
++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++             self._warmed_up = True
++++             self.warmup_moe_model()
++++ 
+++++
+++++
++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++         output_router_logits = (
++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++             router_logits=outputs.router_logits,
++++         )
++++ 
+++++    def generate(self, *args, **kwargs):
+++++        """
+++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++++        """
+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++++
+++++        input_ids = kwargs.get("input_ids")
+++++        if input_ids is None and args:
+++++            input_ids = args[0]
+++++
+++++        if input_ids is not None:
+++++            prompt_length = input_ids.shape[1]
+++++            
+++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++++                Long_Prompt = True
+++++            else:
+++++                Long_Prompt = False
+++++
+++++        return super().generate(*args, **kwargs)
+++++    
++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
++++     def prepare_inputs_for_generation(
++++         self,
++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+++++        
++++         if past_key_values is not None:
++++             if inputs_embeds is not None:  # Exception 1
++++                 if 0 not in input_ids.shape:
++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++             }
++++         )
++++         return model_inputs
+++++
++++ # @lwx
++++     # def _decode_one_tokens_logits(
++++     #     self,
++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
++++             attentions=outputs.attentions,
++++         )
++++ 
+++++
++++ __all__ = [
++++     "Qwen2MoeForCausalLM",
++++     "Qwen2MoeModel",
++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++new file mode 100644
++++index 00000000..6dfb5b93
++++--- /dev/null
+++++++ b/patches/0001-20251104commit.patch
++++@@ -0,0 +1,1272 @@
+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++Subject: [PATCH] 20251104commit
+++++
+++++---
+++++ mindnlp/transformers/cache_utils.py           |  28 +-
+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++++
+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++index cadd2e04..02f8d4be 100644
+++++--- a/mindnlp/transformers/cache_utils.py
++++++++ b/mindnlp/transformers/cache_utils.py
+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++             # k_out[:, :, cache_position] = key_states
+++++             # v_out[:, :, cache_position] = value_states
+++++-            if ON_ORANGE_PI:
+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++-            else:
+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++-
++++++            # if ON_ORANGE_PI:
++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++            # else:
++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++++            if cache_position.ndim > 1:
++++++                cache_position = cache_position.flatten()
++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++++                cache_position = cache_position.int()
++++++            
++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++++            k_out[:, :, cache_position] = key_states
++++++            v_out[:, :, cache_position] = value_states
++++++            
+++++         return k_out, v_out
+++++ 
+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index c695b944..d8303e45 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++-    x1 = x[..., : x.shape[-1] // 2]
+++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++    # x2 = x[..., x.shape[-1] // 2 :]
++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++         if self.training:
+++++             raise NotImplementedError("Training is not supported yet.")
+++++         else:
+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++-        if self.config.n_shared_experts is not None:
+++++-            y = y + self.shared_experts(identity)
+++++-        return y
++++++            # @lwx
++++++            if orig_shape[1] == 1:
++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++++                y=y.view(*orig_shape)
++++++                if self.config.n_shared_experts is not None:
++++++                    y = y + self.shared_experts(identity)
++++++                return y
++++++            else:
++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++++                if self.config.n_shared_experts is not None:
++++++                    y = y + self.shared_experts(identity)
++++++                return y
++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++        # if self.config.n_shared_experts is not None:
++++++        #     y = y + self.shared_experts(identity)
++++++        # return y
++++++        
++++++    @no_grad()
++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++
++++++        expert_cache = ops.zeros_like(x)
++++++        for i in range(self.num_experts_per_tok):
++++++            expert_id = flat_expert_indices[i].item()
++++++            weight = flat_expert_weights[i].item()
++++++            expert = self.experts[expert_id]
++++++            expert_out = expert(x)
++++++            expert_cache += expert_out * weight
++++++        return expert_cache
+++++ 
+++++     @no_grad()
+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-        # expert_cache = torch.zeros_like(x)
+++++-        # idxs = flat_expert_indices.argsort()
+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++-        # token_idxs = idxs // self.num_experts_per_tok
+++++-        # for i, end_idx in enumerate(tokens_per_expert):
+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++-        #     if start_idx == end_idx:
+++++-        #         continue
+++++-        #     expert = self.experts[i]
+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++-        #     expert_tokens = x[exp_token_idx]
+++++-        #     expert_out = expert(expert_tokens)
+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++-        # return expert_cache
++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++         expert_cache = ops.zeros_like(x)
+++++         idxs = flat_expert_indices.argsort()
+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++         token_idxs = idxs // self.num_experts_per_tok
++++++
+++++         for i, end_idx in enumerate(tokens_per_expert):
+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++             if start_idx == end_idx:
+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++             expert_out = expert(expert_tokens)
+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++
+++++         return expert_cache
++++++        
++++++    # @no_grad()
++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++    #     # expert_cache = torch.zeros_like(x)
++++++    #     # idxs = flat_expert_indices.argsort()
++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++    #     #     if start_idx == end_idx:
++++++    #     #         continue
++++++    #     #     expert = self.experts[i]
++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #     #     expert_tokens = x[exp_token_idx]
++++++    #     #     expert_out = expert(expert_tokens)
++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++    #     # return expert_cache
++++++    #     expert_cache = ops.zeros_like(x)
++++++    #     idxs = flat_expert_indices.argsort()
++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++    #         if start_idx == end_idx:
++++++    #             continue
++++++    #         expert = self.experts[i]
++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #         expert_tokens = x[exp_token_idx]
++++++    #         expert_out = expert(expert_tokens)
++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++
++++++    #     return expert_cache
++++++    # @no_grad()
++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++    #     expert_cache = ops.zeros_like(x)
++++++
++++++    #     # 排序保证顺序一致
++++++    #     idxs = flat_expert_indices.argsort()
++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++    #     # 找出有 token 的专家
++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++
++++++    #     for i in active_experts.tolist():
++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++    #         end_idx = tokens_per_expert[i]
++++++    #         if start_idx == end_idx:  # 没有 token
++++++    #             continue
++++++
++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #         expert_tokens = x[exp_token_idx]
++++++    #         expert_out = self.experts[i](expert_tokens)
++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++
++++++    #         expert_cache = mindspore.mint.scatter_add(
++++++    #             expert_cache,
++++++    #             0,
++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++    #             expert_out
++++++    #         )
++++++
++++++    #     return expert_cache
++++++
++++++
+++++ 
+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++ #     """
+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++ 
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
++++++        self.warm_up = False
++++++
++++++    def warmup_moe_model_deep(self):
++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++        test_texts = [
++++++            "warmup short",
++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++++        ]
++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++        if tokenizer is None:
++++++            from mindnlp.transformers import AutoTokenizer
++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++            self._warmup_tokenizer = tokenizer
++++++
++++++        for text in test_texts:
++++++            inputs = tokenizer(text, return_tensors="ms")
++++++            with mindspore._no_grad():
++++++                _ = self(**inputs, use_cache=False)
++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++ 
+++++     def get_input_embeddings(self):
+++++         return self.model.embed_tokens
+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++         ```"""
++++++        if not self.warm_up:
++++++            self.warm_up = True
++++++            self.warmup_moe_model_deep()
++++++
+++++         output_attentions = (
+++++             output_attentions
+++++             if output_attentions is not None
+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++index 3cbf820e..d4c6b651 100644
+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++@@ -18,7 +18,6 @@
+++++ # See the License for the specific language governing permissions and
+++++ # limitations under the License.
+++++ """MindSpore Qwen2MoE model."""
+++++-
+++++ import math
+++++ from typing import List, Optional, Tuple, Union
+++++ 
+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++     TokenClassifierOutput,
+++++ )
+++++ from ...modeling_utils import PreTrainedModel
++++++from ...generation import GenerationMixin
+++++ from ....utils import logging
+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++++ 
+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++         self.variance_epsilon = eps
+++++ 
+++++     def forward(self, hidden_states):
++++++        # @dwj
++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++        # @lwx
++++++        # if not self.training :
++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++         input_dtype = hidden_states.dtype
+++++         hidden_states = hidden_states.to(mindspore.float32)
+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++@@ -234,6 +239,8 @@ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++     x1 = x[..., : x.shape[-1] // 2]
+++++     x2 = x[..., x.shape[-1] // 2 :]
++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++         self.config = config
+++++         self.hidden_size = config.hidden_size
+++++         self.intermediate_size = intermediate_size
++++++        
+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++         self.act_fn = ACT2FN[config.hidden_act]
+++++ 
+++++     def forward(self, x):
+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++-
+++++ 
++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++        # @lwx 
++++++        # gate_up_output = self.gate_up_proj(x)
++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++++        # return self.down_proj(swiglu_output)
++++++
++++++    # def forward(self, x):
++++++    #     gate_proj_out = self.gate_proj(x)
++++++    #     up_proj_out = self.up_proj(x)
++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++++    #     return self.down_proj(swiglu_out)
++++++        
+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++     """
+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++         use_cache: bool = False,
+++++         cache_position: Optional[mindspore.Tensor] = None,
+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++        
++++++
+++++         bsz, q_len, _ = hidden_states.shape
+++++ 
+++++         query_states = self.q_proj(hidden_states)
+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++                     "with a layer index."
+++++                 )
+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++            if isinstance(past_key_value, StaticCache):
++++++                kv_seq_len = key_states.shape[-2]
++++++            else:
++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++ 
+++++         if past_key_value is not None:
+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++            
++++++            if isinstance(past_key_value, StaticCache):
++++++                kv_seq_len = key_states.shape[-2]
+++++ 
+++++         # repeat k/v heads if n_kv_heads < n_heads
+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++-
++++++        
+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++ 
+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++-            raise ValueError(
+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++-                f" {attn_weights.shape}"
+++++-            )
+++++-
+++++-        if attention_mask is not None:  # no matter the length, we just slice it
+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++++        if attention_mask is not None:
++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++             attn_weights = attn_weights + causal_mask
+++++ 
+++++         # upcast attention to fp32
+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++ 
+++++         attn_output = self.o_proj(attn_output)
+++++-
++++++        # @lwx
++++++        
++++++        # max_seq_len = self.max_position_embeddings  # 2048
++++++
++++++        # if attention_mask is not None:
++++++        #     # attention_mask: [B, 1, Sq, Sk]
++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++
++++++        #     # pad 到 [max_seq_len, max_seq_len]
++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++        #     global_attention_mask = padded_mask
++++++        # else:
++++++        #     global_attention_mask = None
++++++
++++++
++++++        # sparse_mode=3
++++++        # attn_output = mindspore.ops.flash_attention_score(
++++++        #     query=query_states,
++++++        #     key=key_states,
++++++        #     value=value_states,
++++++        #     real_shift=None,
++++++        #     padding_mask=None,
++++++
++++++        #     head_num=self.num_heads,
++++++        #     attn_mask=global_attention_mask,
++++++        #     keep_prob=1.0 - self.attention_dropout,
++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++        #     input_layout="BNSD",
++++++        #     pre_tokens=2147483647,
++++++        #     next_tokens=2147483647,
++++++        #     inner_precise=0,
++++++        #     drop_mask=None,
++++++        #     prefix=None,
++++++        #     actual_seq_qlen=None,
++++++        #     actual_seq_kvlen=None,
++++++        #     sparse_mode=sparse_mode,
++++++        # )
+++++         if not output_attentions:
+++++             attn_weights = None
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++ 
++++++class Qwen2MoeFlashAttention(nn.Module):
++++++    """
++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++
++++++    关键改动:
++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++       直接传入原始的 key 和 value 张量效率更高。
++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++    """
++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++        super().__init__()
++++++        self.config = config
++++++        self.layer_idx = layer_idx
++++++        self.hidden_size = config.hidden_size
++++++        self.num_heads = config.num_attention_heads
++++++        self.head_dim = self.hidden_size // self.num_heads
++++++        self.num_key_value_heads = config.num_key_value_heads
++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++        self.max_position_embeddings = config.max_position_embeddings
++++++        self.rope_theta = config.rope_theta
++++++        self.attention_dropout = config.attention_dropout
++++++
++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++            raise ValueError(
++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++            )
++++++
++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++
++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++            self.head_dim,
++++++            max_position_embeddings=self.max_position_embeddings,
++++++            base=self.rope_theta,
++++++        )
++++++
++++++    def forward(
++++++        self,
++++++        hidden_states: mindspore.Tensor,
++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++        past_key_value: Optional[Cache] = None,
++++++        output_attentions: bool = False,
++++++        use_cache: bool = False,
++++++        cache_position: Optional[mindspore.Tensor] = None,
++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++        bsz, q_len, _ = hidden_states.shape
++++++
++++++        # 1. 线性投射 Q, K, V
++++++        query_states = self.q_proj(hidden_states)
++++++        key_states = self.k_proj(hidden_states)
++++++        value_states = self.v_proj(hidden_states)
++++++
++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++        # 3. RoPE 旋转位置编码
++++++        kv_seq_len = key_states.shape[-2]
++++++        if past_key_value is not None:
++++++            if self.layer_idx is None:
++++++                raise ValueError(
++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++                    "with a layer index."
++++++                )
++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++                if cache_position.shape[0] == 1:
++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++                    kv_seq_len = past_seen_tokens + 1
++++++                else:
++++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++            else:
++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++        # 4. KV 缓存更新
++++++        if past_key_value is not None:
++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++            key_states, value_states = past_key_value.update(
++++++                key_states, value_states, self.layer_idx, cache_kwargs
++++++            )
++++++            
++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++                if cache_position.shape[0] == 1:
++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++                    kv_seq_len = key_states.shape[-2]
++++++
++++++        # 5. [重要] 准备 Attention Mask
++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++        fa_attention_mask = None
++++++        if attention_mask is not None:
++++++            # 截取与当前key长度匹配的部分
++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++            fa_attention_mask = (mask_slice != 0)
++++++
++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++        input_dtype = query_states.dtype
++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++            query_states = query_states.to(mindspore.float16)
++++++            key_states = key_states.to(mindspore.float16)
++++++            value_states = value_states.to(mindspore.float16)
++++++
++++++        # 6. [核心] 调用 flash_attention_score 算子
++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++        attn_output = mindspore.ops.flash_attention_score(
++++++            query=query_states,
++++++            key=key_states,
++++++            value=value_states,
++++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++++            attn_mask=fa_attention_mask,
++++++            keep_prob=1.0 - self.attention_dropout,
++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++++            input_layout="BNSD",
++++++            sparse_mode=0 # 使用 defaultMask 模式
++++++        )
++++++
++++++        # 恢复原始数据类型
++++++        attn_output = attn_output.to(input_dtype)
++++++
++++++        # 7. 调整输出形状
++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++        attn_output = self.o_proj(attn_output)
++++++
++++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++++        attn_weights = None
++++++        if output_attentions:
++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++        return attn_output, attn_weights, past_key_value
++++++
++++++    # def forward(
++++++    #     self,
++++++    #     hidden_states: mindspore.Tensor,
++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++    #     past_key_value: Optional[Cache] = None,
++++++    #     output_attentions: bool = False,
++++++    #     use_cache: bool = False,
++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++    #     bsz, q_len, _ = hidden_states.shape
++++++
++++++    #     # 1. 线性投射 Q, K, V
++++++    #     query_states = self.q_proj(hidden_states)
++++++    #     key_states = self.k_proj(hidden_states)
++++++    #     value_states = self.v_proj(hidden_states)
++++++
++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++    #     # 3. RoPE 旋转位置编码
++++++    #     kv_seq_len = key_states.shape[-2]
++++++    #     if past_key_value is not None:
++++++    #         if self.layer_idx is None:
++++++    #             raise ValueError(
++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++    #                 "with a layer index."
++++++    #             )
++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++    #     # 4. KV 缓存更新
++++++    #     if past_key_value is not None:
++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++    #         key_states, value_states = past_key_value.update(
++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++    #         )
++++++
++++++    #     # 5. 准备 Attention Mask
++++++    #     fa_attention_mask = None
++++++    #     if attention_mask is not None:
++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++    #         fa_attention_mask = (mask_slice != 0)
++++++
++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++    #     input_dtype = query_states.dtype
++++++
++++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++    #         query=query_states,
++++++    #         key=key_states,
++++++    #         value=value_states,
++++++    #         head_num=self.num_heads,
++++++    #         attn_mask=fa_attention_mask,
++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++    #         input_layout="BNSD",
++++++    #         sparse_mode=0,
++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++    #         inner_precise=1
++++++    #     )
++++++
++++++    #     # 恢复原始数据类型
++++++    #     attn_output = attn_output.to(input_dtype)
++++++
++++++    #     # 7. 调整输出形状
++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++    #     attn_output = self.o_proj(attn_output)
++++++
++++++    #     attn_weights = None
++++++    #     if output_attentions:
++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++    #     return attn_output, attn_weights, past_key_value
++++++
++++++    # def forward(
++++++    #     self,
++++++    #     hidden_states: mindspore.Tensor,
++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++    #     past_key_value: Optional[Cache] = None,
++++++    #     output_attentions: bool = False,
++++++    #     use_cache: bool = False,
++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++    #     bsz, q_len, _ = hidden_states.shape
++++++
++++++    #     query_states = self.q_proj(hidden_states)
++++++    #     key_states = self.k_proj(hidden_states)
++++++    #     value_states = self.v_proj(hidden_states)
++++++
++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++    #     kv_seq_len = key_states.shape[-2]
++++++    #     if past_key_value is not None:
++++++    #         if self.layer_idx is None:
++++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++    #     if past_key_value is not None:
++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++    #         key_states, value_states = past_key_value.update(
++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++    #         )
++++++
++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++
++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++++    #     # <--- 修改结束 ---
++++++
++++++    #     fa_attention_mask = None
++++++    #     if attention_mask is not None:
++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++    #         fa_attention_mask = (mask_slice != 0)
++++++
++++++    #     input_dtype = query_states.dtype
++++++
++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++++    #         key=key_states,
++++++    #         value=value_states,
++++++    #         head_num=self.num_heads,
++++++    #         attn_mask=fa_attention_mask,
++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++    #         input_layout="BNSD",
++++++    #         sparse_mode=0,
++++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++++    #     )
++++++
++++++    #     attn_output = attn_output.to(input_dtype)
++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++    #     attn_output = self.o_proj(attn_output)
++++++
++++++    #     attn_weights = None
++++++    #     if output_attentions:
++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++
++++++    #     return attn_output, attn_weights, past_key_value
++++++
+++++ QWEN2MOE_ATTENTION_CLASSES = {
+++++     "eager": Qwen2MoeAttention,
++++++    "flash-attention": Qwen2MoeFlashAttention,
+++++ }
+++++ 
+++++ 
+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++ 
++++++    #@dwj
++++++    # 只遍历激活的专家，而非全部专家
+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++++-        # router_logits: (batch * sequence_length, n_experts)
+++++-        router_logits = self.gate(hidden_states)
+++++-
+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-        if self.norm_topk_prob:
+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        # we cast back to the input dtype
+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++++-
+++++-        final_hidden_states = ops.zeros(
+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++-        )
+++++-
+++++-        # One hot encode the selected experts to create an expert mask
+++++-        # this will be used to easily index which expert is going to be sollicitated
+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++-
+++++-        # Loop over all available experts in the model and perform the computation on each expert
+++++-        for expert_idx in range(self.num_experts):
+++++-            expert_layer = self.experts[expert_idx]
+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++-
+++++-            # Index the correct hidden states and compute the expert hidden state for
+++++-            # the current expert. We need to make sure to multiply the output hidden
+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++-            if 0 not in idx.shape:
+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++-
+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++++-                # the `top_x` tensor here.
+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++-
+++++-        shared_expert_output = self.shared_expert(hidden_states)
+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++-
+++++-        final_hidden_states = final_hidden_states + shared_expert_output
++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++            num_tokens = hidden_states_reshaped.shape[0]
++++++
++++++            router_logits = self.gate(hidden_states_reshaped)
++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++
++++++            if self.norm_topk_prob:
++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++++            
++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++            flat_selected_experts = selected_experts.flatten()
++++++            
++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++            token_indices = broadcasted_token_indices.flatten()
++++++            
++++++            active_experts = ops.unique(flat_selected_experts)
++++++            
++++++            for expert_idx_tensor in active_experts:
++++++                expert_idx = expert_idx_tensor.item()
++++++                expert_layer = self.experts[expert_idx]
++++++                
++++++                mask = (flat_selected_experts == expert_idx_tensor)
++++++                selected_token_indices = token_indices[mask]
++++++                selected_routing_weights = routing_weights.flatten()[mask]
++++++                
++++++                current_states = hidden_states_reshaped[selected_token_indices]
++++++                
++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++                
++++++                final_hidden_states = final_hidden_states.index_add(
++++++                    dim=0,
++++++                    index=selected_token_indices,
++++++                    source=expert_output.to(hidden_states.dtype)
++++++                )
++++++            
++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++ 
+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++-        return final_hidden_states, router_logits
++++++            final_hidden_states = final_hidden_states + shared_expert_output
++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++            
++++++            return final_hidden_states, router_logits
+++++ 
+++++ 
+++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++ 
+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++ 
++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++
+++++         if (layer_idx not in config.mlp_only_layers) and (
+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++         ):
+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++     _skip_keys_device_placement = "past_key_values"
+++++     _supports_cache_class = True
++++++#lwx
++++++    # _supports_static_cache = True
+++++ 
+++++     def _init_weights(self, module):
+++++         std = self.config.initializer_range
+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++         return causal_mask
+++++ 
+++++ 
+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++     _tied_weights_keys = ["lm_head.weight"]
+++++ 
+++++     def __init__(self, config):
+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++         self.num_experts_per_tok = config.num_experts_per_tok
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
++++++        # @lwx
++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++++        #     self.generation_config.cache_implementation = "static"
++++++        self._warmed_up = False
++++++
++++++    def warmup_moe_model(self):
++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++++        test_texts = [
++++++            "warmup short",
++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++++        ]
++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++        if tokenizer is None:
++++++            from mindnlp.transformers import AutoTokenizer
++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++            self._warmup_tokenizer = tokenizer
++++++
++++++        for text in test_texts:
++++++            inputs = tokenizer(text, return_tensors="ms")
++++++            with mindspore._no_grad():
++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++ 
+++++     def get_input_embeddings(self):
+++++         return self.model.embed_tokens
+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++         ```"""
++++++        if not self._warmed_up:
++++++            self._warmed_up = True
++++++            self.warmup_moe_model()
+++++ 
+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++         output_router_logits = (
+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++             }
+++++         )
+++++         return model_inputs
++++++# @lwx
++++++    # def _decode_one_tokens_logits(
++++++    #     self,
++++++    #     cur_token: mindspore.Tensor,
++++++    #     input_pos: Optional[mindspore.Tensor],
++++++    #     cache_position: mindspore.Tensor,
++++++    #     past_key_values: StaticCache,
++++++    # ) -> mindspore.Tensor:
++++++    #     """
++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++++        
++++++    #     Args:
++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++++    #         input_pos: 输入位置信息，可选
++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++++            
++++++    #     Returns:
++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++++    #     """
++++++    #     # 调用JIT编译的版本
++++++    #     return self.get_decode_one_tokens_logits(
++++++    #         cur_token=cur_token,
++++++    #         input_pos=input_pos,
++++++    #         cache_position=cache_position,
++++++    #         past_key_values=past_key_values,
++++++    #     )
++++++    
++++++    # @mindspore.jit(jit_level='O1')
++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++++    #     """
++++++    #     JIT编译的函数，用于高效的单token解码
++++++    #     使用JIT编译优化以支持静态shape和高效执行
++++++        
++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++++    #     """
++++++    #     outputs = self.model.forward(
++++++    #         input_ids=cur_token,
++++++    #         position_ids=input_pos,
++++++    #         cache_position=cache_position,
++++++    #         past_key_values=past_key_values,
++++++    #         use_cache=True,
++++++    #         return_dict=False,
++++++    #     )
++++++        
++++++    #     hidden_states = outputs[0]
++++++    #     logits = self.lm_head.forward(hidden_states)
++++++    #     logits = logits.float()
++++++        
++++++    #     return logits[:, -1, :]
++++++
++++++    # def _sample(
++++++    #     self,
++++++    #     input_ids: mindspore.Tensor,
++++++    #     logits_processor,
++++++    #     stopping_criteria,
++++++    #     generation_config,
++++++    #     synced_devices: bool,
++++++    #     streamer=None,
++++++    #     logits_warper=None,
++++++    #     **model_kwargs,
++++++    # ):
++++++    #     """
++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++++    #     """
++++++    #     from ...generation.logits_process import LogitsProcessorList
++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++++    #     from mindnlp.core import nn, ops, no_grad
++++++    #     import numpy as np
++++++        
++++++    #     # 检查是否使用 StaticCache
++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++++    #     # 否则，直接调用父类方法
++++++    #     past_key_values = model_kwargs.get("past_key_values")
++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++++
++++++    #     if not isinstance(past_key_values, StaticCache):
++++++    #         # 不使用 StaticCache，直接调用父类方法
++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++++    #         return super()._sample(
++++++    #             input_ids=input_ids,
++++++    #             logits_processor=logits_processor,
++++++    #             stopping_criteria=stopping_criteria,
++++++    #             generation_config=generation_config,
++++++    #             synced_devices=synced_devices,
++++++    #             streamer=streamer,
++++++    #             logits_warper=logits_warper,
++++++    #             **model_kwargs,
++++++    #         )
++++++        
++++++    #     # 使用 StaticCache，进入自定义循环
++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++++    #     pad_token_id = generation_config._pad_token_tensor
++++++    #     output_attentions = generation_config.output_attentions
++++++    #     output_hidden_states = generation_config.output_hidden_states
++++++    #     output_scores = generation_config.output_scores
++++++    #     output_logits = generation_config.output_logits
++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++++    #     max_length = generation_config.max_length
++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++++    #     do_sample = generation_config.do_sample
++++++        
++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++++    #         raise ValueError(
++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++++    #             f"{logits_warper})."
++++++    #         )
++++++        
++++++    #     # init attention / hidden states / scores tuples
++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++++        
++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++++    #         encoder_hidden_states = (
++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++++    #         )
++++++        
++++++    #     # keep track of which sequences are already finished
++++++    #     batch_size, cur_len = input_ids.shape
++++++    #     this_peer_finished = False
++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++++        
++++++    #     time_record = []
++++++    #     from ....utils.testing_utils import parse_flag_from_env
++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++++        
++++++    #     while self._has_unfinished_sequences(
++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++++    #     ):
++++++    #         if _record_time:
++++++    #             import time as time_module
++++++    #             infer_start = time_module.time()
++++++            
++++++    #         # prepare model inputs
++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++++            
++++++    #         # prepare variable output controls
++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++++            
++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++++    #         cur_cache_position = model_inputs.get("cache_position")
++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++++    #         cur_input_ids = model_inputs.get("input_ids")
++++++            
++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++++    #             cur_cache_position is not None and 
++++++    #             len(cur_cache_position.shape) > 0 and
++++++    #             cur_cache_position.shape[0] == 1 and
++++++    #             cur_input_ids is not None and
++++++    #             cur_input_ids.shape[1] == 1):
++++++    #             # 使用 JIT 优化的单 token 解码
++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++++    #             if not hasattr(self, '_jit_used'):
++++++    #                 self._jit_used = False
++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++++                
++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++++    #                 cur_token=cur_input_ids,
++++++    #                 input_pos=model_inputs.get("position_ids"),
++++++    #                 cache_position=cur_cache_position,
++++++    #                 past_key_values=cur_past_key_values,
++++++    #             )
++++++                
++++++    #             # 标记已使用JIT（用于后续判断）
++++++    #             if not self._jit_used:
++++++    #                 self._jit_used = True
++++++                
++++++    #             # 构造兼容的输出对象
++++++    #             class JitOptimizedOutput:
++++++    #                 def __init__(self, logits, config):
++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++++    #                     self.config = config
++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++++    #                     self.cross_attentions = None
++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++++                
++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++++    #         else:
++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++++    #             outputs = self(**model_inputs, return_dict=True)
++++++            
++++++    #         if synced_devices and this_peer_finished:
++++++    #             continue
++++++            
++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++++    #         next_token_logits = outputs.logits[:, -1, :]
++++++            
++++++    #         # pre-process distribution
++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++++    #         if do_sample:
++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++++            
++++++    #         # Store scores, attentions and hidden_states when required
++++++    #         if return_dict_in_generate:
++++++    #             if output_scores:
++++++    #                 scores += (next_token_scores,)
++++++    #             if output_logits:
++++++    #                 raw_logits += (next_token_logits,)
++++++    #             if output_attentions:
++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++++    #                 if self.config.is_encoder_decoder:
++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++++                
++++++    #             if output_hidden_states:
++++++    #                 hidden = (
++++++    #                     outputs.decoder_hidden_states
++++++    #                     if self.config.is_encoder_decoder
++++++    #                     else outputs.hidden_states
++++++    #                 )
++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++++            
++++++    #         # token selection
++++++    #         if do_sample:
++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++++    #         else:
++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++++            
++++++    #         # finished sentences should have their next token be a padding token
++++++    #         if has_eos_stopping_criteria:
++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++++            
++++++    #         # update generated ids, model inputs, and length for next step
++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++++    #         if streamer is not None:
++++++    #             streamer.put(next_tokens)
++++++            
++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++++    #             outputs,
++++++    #             model_kwargs,
++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++++    #         )
++++++            
++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++++    #         cur_len += 1
++++++            
++++++    #         if _record_time:
++++++    #             import time as time_module
++++++    #             infer_stop = time_module.time()
++++++    #             time_record.append(infer_stop - infer_start)
++++++            
++++++    #         del outputs
++++++        
++++++    #     average_infer_time = None
++++++    #     if time_record:
++++++    #         if len(time_record) > 1:
++++++    #             time_record.pop(0)
++++++    #         average_infer_time = sum(time_record) / len(time_record)
++++++    #         print(f'average inference time is: {average_infer_time}')
++++++    #         print(f'inference time record: {time_record}')
++++++        
++++++    #     if streamer is not None:
++++++    #         streamer.end()
++++++        
++++++    #     # 简单判断：打印是否使用了JIT路径
++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++++    #     else:
++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++++        
++++++    #     if return_dict_in_generate:
++++++    #         if self.config.is_encoder_decoder:
++++++    #             return GenerateEncoderDecoderOutput(
++++++    #                 sequences=input_ids,
++++++    #                 scores=scores,
++++++    #                 logits=raw_logits,
++++++    #                 encoder_attentions=encoder_attentions,
++++++    #                 encoder_hidden_states=encoder_hidden_states,
++++++    #                 decoder_attentions=decoder_attentions,
++++++    #                 cross_attentions=cross_attentions,
++++++    #                 decoder_hidden_states=decoder_hidden_states,
++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++    #                 average_infer_time=average_infer_time
++++++    #             )
++++++    #         else:
++++++    #             return GenerateDecoderOnlyOutput(
++++++    #                 sequences=input_ids,
++++++    #                 scores=scores,
++++++    #                 logits=raw_logits,
++++++    #                 attentions=decoder_attentions,
++++++    #                 hidden_states=decoder_hidden_states,
++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++    #                 average_infer_time=average_infer_time
++++++    #             )
++++++    #     else:
++++++    #         return input_ids
++++++            
++++++    # def _prepare_cache_for_generation(
++++++    #     self,
++++++    #     generation_config,
++++++    #     model_kwargs,
++++++    #     assistant_model,
++++++    #     batch_size,
++++++    #     max_cache_length,
++++++    # ):
++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++++    #         generation_config.cache_implementation = "static"
++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++++        
++++++    #     if generation_config.cache_implementation == "static":
++++++    #         base_required_from_max_length = generation_config.max_length + 1
++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++++    #         min_cache_size = 50
++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++++    #         else:
++++++    #             max_cache_length = max(base_required, min_cache_size)
++++++            
++++++    #         original_max_cache_length = max_cache_length
++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++++            
++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++    #             if max_cache_length > self.config.max_position_embeddings:
++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++++        
++++++    #     result = super()._prepare_cache_for_generation(
++++++    #         generation_config=generation_config,
++++++    #         model_kwargs=model_kwargs,
++++++    #         assistant_model=assistant_model,
++++++    #         batch_size=batch_size,
++++++    #         max_cache_length=max_cache_length,
++++++    #     )
++++++        
++++++    #     if generation_config.cache_implementation == "static":
++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++++    #         created_cache = model_kwargs.get(cache_name)
++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++++        
++++++    #     return result
++++++
++++++
++++++
+++++ 
+++++ 
+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++-- 
+++++2.27.0
+++++
++++-- 
++++2.27.0
++++
+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++new file mode 100644
+++index 00000000..966529e4
+++--- /dev/null
++++++ b/patches/0003-20261106secondcommit.patch
+++@@ -0,0 +1,2769 @@
++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Thu, 6 Nov 2025 14:54:37 +0800
++++Subject: [PATCH 3/3] 20261106secondcommit
++++
++++---
++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
++++ patches/0001-20251104commit.patch             | 1272 -----------------
++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
++++ delete mode 100644 patches/0001-20251104commit.patch
++++
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index 73773c22..2f9192bf 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
++++ 
++++ _CONFIG_FOR_DOC = "DeepseekConfig"
++++ 
+++++_attn_mask_cache = {}
+++++
+++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+++++    q_len = batch_and_seq[1]
+++++    kv_len = batch_and_seq[1] + past_key_values_length 
+++++    key = (batch_and_seq[0], q_len, kv_len)
+++++
+++++    if key in _attn_mask_cache:
+++++        return _attn_mask_cache[key]
+++++
+++++    mask = _prepare_4d_causal_attention_mask(
+++++        attention_mask,
+++++        batch_and_seq,
+++++        inputs_embeds,
+++++        past_key_values_length,
+++++    )
+++++    _attn_mask_cache[key] = mask
+++++    return mask
++++ 
++++ def _get_unpad_data(attention_mask):
++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
++++         return final_output
++++ 
++++ 
++++-    @no_grad()
++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++-        expert_cache = ops.zeros_like(x)
++++-        idxs = flat_expert_indices.argsort()
++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++-        token_idxs = idxs // self.num_experts_per_tok
++++-
++++-        for i, end_idx in enumerate(tokens_per_expert):
++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++-            if start_idx == end_idx:
++++-                continue
++++-            expert = self.experts[i]
++++-            exp_token_idx = token_idxs[start_idx:end_idx]
++++-            expert_tokens = x[exp_token_idx]
++++-            expert_out = expert(expert_tokens)
++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++-
++++-        return expert_cache
++++-        
++++     # @no_grad()
++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-    #     # expert_cache = torch.zeros_like(x)
++++-    #     # idxs = flat_expert_indices.argsort()
++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++-    #     # token_idxs = idxs // self.num_experts_per_tok
++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++-    #     #     if start_idx == end_idx:
++++-    #     #         continue
++++-    #     #     expert = self.experts[i]
++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++-    #     #     expert_tokens = x[exp_token_idx]
++++-    #     #     expert_out = expert(expert_tokens)
++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++-    #     # return expert_cache
+++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++     #     expert_cache = ops.zeros_like(x)
++++     #     idxs = flat_expert_indices.argsort()
++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++ 
++++     #     return expert_cache
++++-    # @no_grad()
++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-    #     expert_cache = ops.zeros_like(x)
+++++        
+++++    @no_grad()
+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++        """
+++++        优化版 MoE prefill：
+++++        - 批量张量化处理同一个 expert 的所有 token
+++++        - 跳过无 token 的专家
+++++        - 保持结果完全一致
+++++        """
+++++        # 初始化输出缓存
+++++        expert_cache = ops.zeros_like(x)
++++ 
++++-    #     # 排序保证顺序一致
++++-    #     idxs = flat_expert_indices.argsort()
++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++-    #     token_idxs = idxs // self.num_experts_per_tok
+++++        # 排序（确保 scatter_add 位置对应原逻辑）
+++++        idxs = flat_expert_indices.argsort()
+++++        sorted_expert_indices = flat_expert_indices[idxs]
+++++        sorted_token_indices = idxs // self.num_experts_per_tok
++++ 
++++-    #     # 找出有 token 的专家
++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++        # 每个 expert 的 token 数
+++++        tokens_per_expert = sorted_expert_indices.bincount()
++++ 
++++-    #     for i in active_experts.tolist():
++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++-    #         end_idx = tokens_per_expert[i]
++++-    #         if start_idx == end_idx:  # 没有 token
++++-    #             continue
+++++        # 找出有 token 的专家
+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++++ 
++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++-    #         expert_tokens = x[exp_token_idx]
++++-    #         expert_out = self.experts[i](expert_tokens)
++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++        for expert_id in active_experts.tolist():
+++++            # 取该 expert 对应的排序后 token 区间
+++++            start = (tokens_per_expert[:expert_id]).sum().item()
+++++            end = start + tokens_per_expert[expert_id].item()
++++ 
++++-    #         expert_cache = mindspore.mint.scatter_add(
++++-    #             expert_cache,
++++-    #             0,
++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++-    #             expert_out
++++-    #         )
+++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+++++            expert_tokens = x[token_idx]                     # 取输入向量
++++ 
++++-    #     return expert_cache
+++++            # 执行专家 MLP
+++++            expert_out = self.experts[expert_id](expert_tokens)
+++++
+++++            # 按权重缩放
+++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+++++
+++++            # 回写到缓存（等价 scatter_add）
+++++            expert_cache = mindspore.mint.scatter_add(
+++++                expert_cache,
+++++                0,
+++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++                scaled_out
+++++            )
+++++
+++++        return expert_cache
+++++
+++++        # @no_grad()
+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++        #     # expert_cache = torch.zeros_like(x)
+++++        #     # idxs = flat_expert_indices.argsort()
+++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++        #     # token_idxs = idxs // self.num_experts_per_tok
+++++        #     # for i, end_idx in enumerate(tokens_per_expert):
+++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++        #     #     if start_idx == end_idx:
+++++        #     #         continue
+++++        #     #     expert = self.experts[i]
+++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++        #     #     expert_tokens = x[exp_token_idx]
+++++        #     #     expert_out = expert(expert_tokens)
+++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++        #     # return expert_cache
+++++        #     expert_cache = ops.zeros_like(x)
+++++        #     idxs = flat_expert_indices.argsort()
+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++        #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++        #     for i, end_idx in enumerate(tokens_per_expert):
+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++        #         if start_idx == end_idx:
+++++        #             continue
+++++        #         expert = self.experts[i]
+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++        #         expert_tokens = x[exp_token_idx]
+++++        #         expert_out = expert(expert_tokens)
+++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++
+++++        #     return expert_cache
+++++        # @no_grad()
+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++        #     expert_cache = ops.zeros_like(x)
+++++
+++++        #     # 排序保证顺序一致
+++++        #     idxs = flat_expert_indices.argsort()
+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++        #     token_idxs = idxs // self.num_experts_per_tok
+++++
+++++        #     # 找出有 token 的专家
+++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++
+++++        #     for i in active_experts.tolist():
+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++        #         end_idx = tokens_per_expert[i]
+++++        #         if start_idx == end_idx:  # 没有 token
+++++        #             continue
+++++
+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++        #         expert_tokens = x[exp_token_idx]
+++++        #         expert_out = self.experts[i](expert_tokens)
+++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++
+++++        #         expert_cache = mindspore.mint.scatter_add(
+++++        #             expert_cache,
+++++        #             0,
+++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++        #             expert_out
+++++        #         )
+++++
+++++        #     return expert_cache
++++ 
++++ 
++++ 
++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
++++-
++++ # class DeepseekFlashAttention(nn.Module):
++++ #     """
++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
+++++
++++ Deepseek_ATTENTION_CLASSES = {
++++     "eager": DeepseekAttention,
++++     "flash-attention": DeepseekFlashAttention,
++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
++++             )
++++         else:
++++             # 4d mask is passed through the layers
++++-            attention_mask = _prepare_4d_causal_attention_mask(
+++++            # attention_mask = _prepare_4d_causal_attention_mask(
+++++            #     attention_mask,
+++++            #     (batch_size, seq_length),
+++++            #     inputs_embeds,
+++++            #     past_key_values_length,
+++++            # )
+++++            #@dwj
+++++            attention_mask = get_cached_causal_mask(
++++                 attention_mask,
++++                 (batch_size, seq_length),
++++                 inputs_embeds,
++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++         # Initialize weights and apply final processing
++++         self.post_init()
++++         self.warm_up = False
+++++        #@dwj
+++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+++++            self.num_layers,
+++++            self.num_attention_heads,
+++++            self.head_dim,
+++++            batch_size=1,
+++++            max_length=self.max_length,
+++++            dtype=mindspore.float16
+++++        )
+++++
+++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+++++        key_cache = []
+++++        value_cache = []
+++++        for _ in range(num_layers):
+++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++            key_cache.append(k)
+++++            value_cache.append(v)
+++++        return key_cache, value_cache
+++++
++++ 
++++     def warmup_moe_model_deep(self):
++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++index bced285c..ebd7782e 100644
++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++++ 
++++-Long_Prompt = False
++++-PROMPT_LENGTH_THRESHOLD = 128
+++++Long_Prompt = 1
+++++LONG_PROMPT_LENGTH_THRESHOLD = 128
+++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+++++
+++++_causal_mask_cache = {}
+++++
+++++def get_cached_causal_mask_with_cache_position(
+++++    attention_mask: mindspore.Tensor,
+++++    sequence_length: int,
+++++    target_length: int,
+++++    dtype: mindspore.dtype,
+++++    min_dtype: float,
+++++    cache_position: mindspore.Tensor,
+++++    batch_size: int,
+++++):
+++++    """
+++++    带缓存的 causal mask 构造函数
+++++    """
+++++    # q_len 是当前 query 长度
+++++    q_len = sequence_length
+++++    # kv_len 是 target_length
+++++    kv_len = target_length
+++++
+++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+++++
+++++    if key in _causal_mask_cache:
+++++        return _causal_mask_cache[key]
+++++
+++++    # 调用原来的 mask 构造逻辑
+++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++        attention_mask,
+++++        sequence_length=sequence_length,
+++++        target_length=target_length,
+++++        dtype=dtype,
+++++        min_dtype=min_dtype,
+++++        cache_position=cache_position,
+++++        batch_size=batch_size,
+++++    )
+++++    # 缓存结果
+++++    _causal_mask_cache[key] = causal_mask
+++++    return causal_mask
++++ 
++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++++ def _prepare_4d_causal_attention_mask_with_cache_position(
++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++ 
++++ 
++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+++++# class Qwen2MoeAttention(nn.Module):
+++++#     """
+++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++++#     and "Generating Long Sequences with Sparse Transformers".
+++++#     """
+++++
+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++#         super().__init__()
+++++#         self.config = config
+++++#         self.layer_idx = layer_idx
+++++#         if layer_idx is None:
+++++#             logger.warning_once(
+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++#                 "when creating this class."
+++++#             )
+++++
+++++#         self.hidden_size = config.hidden_size
+++++#         self.num_heads = config.num_attention_heads
+++++#         self.head_dim = self.hidden_size // self.num_heads
+++++#         self.num_key_value_heads = config.num_key_value_heads
+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++#         self.max_position_embeddings = config.max_position_embeddings
+++++#         self.rope_theta = config.rope_theta
+++++#         self.is_causal = True
+++++#         self.attention_dropout = config.attention_dropout
+++++
+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++#             raise ValueError(
+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++#                 f" and `num_heads`: {self.num_heads})."
+++++#             )
+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++
+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++#             self.head_dim,
+++++#             max_position_embeddings=self.max_position_embeddings,
+++++#             base=self.rope_theta,
+++++#         )
+++++
+++++#     def forward(
+++++#         self,
+++++#         hidden_states: mindspore.Tensor,
+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++#         past_key_value: Optional[Cache] = None,
+++++#         output_attentions: bool = False,
+++++#         use_cache: bool = False,
+++++#         cache_position: Optional[mindspore.Tensor] = None,
+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++
+++++        
+++++
+++++#         bsz, q_len, _ = hidden_states.shape
+++++
+++++#         query_states = self.q_proj(hidden_states)
+++++#         key_states = self.k_proj(hidden_states)
+++++#         value_states = self.v_proj(hidden_states)
+++++
+++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++
+++++#         kv_seq_len = key_states.shape[-2]
+++++#         if past_key_value is not None:
+++++#             if self.layer_idx is None:
+++++#                 raise ValueError(
+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++#                     "with a layer index."
+++++#                 )
+++++#             if isinstance(past_key_value, StaticCache):
+++++#                 kv_seq_len = key_states.shape[-2]
+++++#             else:
+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++
+++++#         if past_key_value is not None:
+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++            
+++++#             if isinstance(past_key_value, StaticCache):
+++++#                 kv_seq_len = key_states.shape[-2]
+++++
+++++#         # repeat k/v heads if n_kv_heads < n_heads
+++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++        
+++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++
+++++#         if attention_mask is not None:
+++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++#             attn_weights = attn_weights + causal_mask
+++++
+++++#         # upcast attention to fp32
+++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++#         attn_output = ops.matmul(attn_weights, value_states)
+++++
+++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++#             raise ValueError(
+++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++++#                 f" {attn_output.shape}"
+++++#             )
+++++
+++++#         attn_output = ops.transpose(attn_output, 1, 2)
+++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++
+++++#         attn_output = self.o_proj(attn_output)
+++++#         # @lwx
+++++        
+++++#         # max_seq_len = self.max_position_embeddings  # 2048
+++++
+++++#         # if attention_mask is not None:
+++++#         #     # attention_mask: [B, 1, Sq, Sk]
+++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++
+++++#         #     # pad 到 [max_seq_len, max_seq_len]
+++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++#         #     global_attention_mask = padded_mask
+++++#         # else:
+++++#         #     global_attention_mask = None
+++++
+++++
+++++#         # sparse_mode=3
+++++#         # attn_output = mindspore.ops.flash_attention_score(
+++++#         #     query=query_states,
+++++#         #     key=key_states,
+++++#         #     value=value_states,
+++++#         #     real_shift=None,
+++++#         #     padding_mask=None,
+++++
+++++#         #     head_num=self.num_heads,
+++++#         #     attn_mask=global_attention_mask,
+++++#         #     keep_prob=1.0 - self.attention_dropout,
+++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++#         #     input_layout="BNSD",
+++++#         #     pre_tokens=2147483647,
+++++#         #     next_tokens=2147483647,
+++++#         #     inner_precise=0,
+++++#         #     drop_mask=None,
+++++#         #     prefix=None,
+++++#         #     actual_seq_qlen=None,
+++++#         #     actual_seq_kvlen=None,
+++++#         #     sparse_mode=sparse_mode,
+++++#         # )
+++++#         if not output_attentions:
+++++#             attn_weights = None
+++++
+++++#         return attn_output, attn_weights, past_key_value
+++++
++++ class Qwen2MoeAttention(nn.Module):
++++     """
++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++++-    and "Generating Long Sequences with Sparse Transformers".
++++-    """
+++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
++++ 
+++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+++++
+++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+++++    """
++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++         super().__init__()
++++         self.config = config
++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
++++         if layer_idx is None:
++++             logger.warning_once(
++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++                 "when creating this class."
++++             )
++++ 
++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
++++         use_cache: bool = False,
++++         cache_position: Optional[mindspore.Tensor] = None,
++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-
++++         
++++-
+++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
++++         bsz, q_len, _ = hidden_states.shape
++++ 
++++         query_states = self.q_proj(hidden_states)
++++         key_states = self.k_proj(hidden_states)
++++         value_states = self.v_proj(hidden_states)
++++ 
++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++-
+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++        
++++         kv_seq_len = key_states.shape[-2]
++++         if past_key_value is not None:
++++-            if self.layer_idx is None:
++++-                raise ValueError(
++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-                    "with a layer index."
++++-                )
++++-            if isinstance(past_key_value, StaticCache):
++++-                kv_seq_len = key_states.shape[-2]
++++-            else:
++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++        
++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++ 
++++         if past_key_value is not None:
++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++
+++++        # --- 2. 动态调度核心注意力计算 ---
+++++        global Long_Prompt
+++++        if Long_Prompt >= 1:
+++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+++++            fa_attention_mask = None
+++++            if attention_mask is not None:
+++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++                fa_attention_mask = (mask_slice != 0)
+++++
+++++            attn_output = mindspore.ops.flash_attention_score(
+++++                query=query_states,
+++++                key=key_states,
+++++                value=value_states,
+++++                head_num=self.num_heads,
+++++                attn_mask=fa_attention_mask,
+++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+++++                scalar_value=1.0 / math.sqrt(self.head_dim),
+++++                input_layout="BNSD",
+++++                sparse_mode=0,
+++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+++++            )
++++             
++++-            if isinstance(past_key_value, StaticCache):
++++-                kv_seq_len = key_states.shape[-2]
+++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++            attn_output = self.o_proj(attn_output)
+++++            attn_weights = None
+++++            if output_attentions:
+++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++++ 
++++-        # repeat k/v heads if n_kv_heads < n_heads
++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
++++-        
++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++        else:
+++++            # --- Eager Attention 路径 (用于短序列和解码) ---
+++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++            
+++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++ 
++++-        if attention_mask is not None:
++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++-            attn_weights = attn_weights + causal_mask
+++++            if attention_mask is not None:
+++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++                attn_weights = attn_weights + causal_mask
++++ 
++++-        # upcast attention to fp32
++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++-        attn_output = ops.matmul(attn_weights, value_states)
+++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++            attn_output = ops.matmul(attn_weights, value_states)
++++ 
++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++-            raise ValueError(
++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++++-                f" {attn_output.shape}"
++++-            )
+++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++                raise ValueError(
+++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+++++                )
++++ 
++++-        attn_output = ops.transpose(attn_output, 1, 2)
++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++            attn_output = ops.transpose(attn_output, 1, 2)
+++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++            attn_output = self.o_proj(attn_output)
++++ 
++++-        attn_output = self.o_proj(attn_output)
++++-        # @lwx
+++++            if not output_attentions:
+++++                attn_weights = None
++++         
++++-        # max_seq_len = self.max_position_embeddings  # 2048
++++-
++++-        # if attention_mask is not None:
++++-        #     # attention_mask: [B, 1, Sq, Sk]
++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++-
++++-        #     # pad 到 [max_seq_len, max_seq_len]
++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++-        #     global_attention_mask = padded_mask
++++-        # else:
++++-        #     global_attention_mask = None
++++-
++++-
++++-        # sparse_mode=3
++++-        # attn_output = mindspore.ops.flash_attention_score(
++++-        #     query=query_states,
++++-        #     key=key_states,
++++-        #     value=value_states,
++++-        #     real_shift=None,
++++-        #     padding_mask=None,
++++-
++++-        #     head_num=self.num_heads,
++++-        #     attn_mask=global_attention_mask,
++++-        #     keep_prob=1.0 - self.attention_dropout,
++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++-        #     input_layout="BNSD",
++++-        #     pre_tokens=2147483647,
++++-        #     next_tokens=2147483647,
++++-        #     inner_precise=0,
++++-        #     drop_mask=None,
++++-        #     prefix=None,
++++-        #     actual_seq_qlen=None,
++++-        #     actual_seq_kvlen=None,
++++-        #     sparse_mode=sparse_mode,
++++-        # )
++++-        if not output_attentions:
++++-            attn_weights = None
++++-
++++         return attn_output, attn_weights, past_key_value
++++ 
++++-
++++ # class Qwen2MoeFlashAttention(nn.Module):
++++ #     """
++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
++++ #             return final_hidden_states, router_logits
++++ 
++++ 
++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++-#     """
++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
++++-#     """
++++-#     def __init__(self, config: Qwen2MoeConfig):
++++-#         super().__init__()
++++-#         self.num_experts = config.num_experts
++++-#         self.top_k = config.num_experts_per_tok
++++-#         self.norm_topk_prob = config.norm_topk_prob
++++-
++++-#         # 门控网络
++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++-#         # 专家列表
++++-#         self.experts = nn.ModuleList(
++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++-#         )
++++-#         # 共享专家
++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_decode(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         """
++++-#         【解码路径】针对 sequence_length=1 的极致优化。
++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++++-#         """
++++-#         batch_size, hidden_dim = hidden_states.shape
++++-        
++++-#         expert_outputs_list = [
++++-#             ops.cat([
++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++-#             ], dim=0) 
++++-#             for i in range(batch_size)
++++-#         ]
++++-        
++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++++-#         # shape: (batch_size, top_k, hidden_dim)
++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++-        
++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++-        
++++-#         return moe_output.squeeze(1)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_prefill(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         """
++++-#         【预填充路径】针对 sequence_length > 1 的优化。
++++-#         按专家对 Token 进行分组，并进行批处理。
++++-#         """
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens = hidden_states.shape[0]
++++-#         flat_selected_experts = selected_experts.flatten()
++++-        
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-        
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-        
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-            
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-#             selected_token_indices = token_indices[mask]
++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++-            
++++-#             current_states = hidden_states[selected_token_indices]
++++-            
++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-            
++++-#             moe_output = moe_output.index_add(
++++-#                 dim=0,
++++-#                 index=selected_token_indices,
++++-#                 source=expert_output.to(hidden_states.dtype)
++++-#             )
++++-#         return moe_output
++++-
++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-#         """
++++-#         顶层 forward 方法，作为智能分发器。
++++-#         """
++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        
++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-#         router_logits = self.gate(hidden_states_reshaped)
++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-
++++-#         if self.norm_topk_prob:
++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        
++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++-        
++++-#         moe_output = None
++++-#         # 在推理时，根据序列长度选择最优路径
++++-#         if not self.training:
++++-#             if sequence_length == 1:
++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++-#             else:
++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++-#         else:
++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++++-#             raise NotImplementedError("Training path is not implemented.")
++++-
++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++++-        
++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++++-        
++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++++-        
++++-#         return final_hidden_states, router_logits
++++-
++++-
++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++-#     """
++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++++-#     """
++++-#     def __init__(self, config: Qwen2MoeConfig):
++++-#         super().__init__()
++++-#         self.num_experts = config.num_experts
++++-#         self.top_k = config.num_experts_per_tok
++++-#         self.norm_topk_prob = config.norm_topk_prob
++++-
++++-#         # 门控网络
++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++-#         # 专家列表
++++-#         self.experts = nn.ModuleList(
++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++-#         )
++++-#         # 共享专家
++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_decode(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         batch_size, _ = hidden_states.shape
++++-#         expert_outputs_list = [
++++-#             ops.cat([
++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++-#             ], dim=0) 
++++-#             for i in range(batch_size)
++++-#         ]
++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++-#         return moe_output.squeeze(1)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_prefill(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens = hidden_states.shape[0]
++++-#         flat_selected_experts = selected_experts.flatten()
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-        
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-#             selected_token_indices = token_indices[mask]
++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++-#             current_states = hidden_states[selected_token_indices]
++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-#             moe_output = moe_output.index_add(
++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++-#             )
++++-#         return moe_output
++++-
++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-#         """
++++-#         顶层 forward 方法，作为智能分发器。
++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++++-#         """
++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        
++++-#         # 1. 门控计算 (通用逻辑)
++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-#         router_logits = self.gate(hidden_states_reshaped)
++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-
++++-#         if self.norm_topk_prob:
++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        
++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++-        
++++-#         # 2. 智能分发到最优 MoE 路径
++++-#         moe_output = None
++++-#         if not self.training:
++++-#             if sequence_length == 1:
++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++-#             else:
++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++-#         else:
++++-#             raise NotImplementedError("Training path is not implemented.")
++++-
++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++-        
++++-#         # 4. 合并 MoE 输出和共享专家输出
++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++-        
++++-#         # 5. 恢复原始形状并返回
++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++-        
++++-#         return final_hidden_states, router_logits
++++-
++++-# prefill fastest
++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++-#     """
++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++++-#     """
++++-#     def __init__(self, config: Qwen2MoeConfig):
++++-#         super().__init__()
++++-#         self.num_experts = config.num_experts
++++-#         self.top_k = config.num_experts_per_tok
++++-#         self.norm_topk_prob = config.norm_topk_prob
++++-
++++-#         # 门控网络
++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++-#         # 专家列表
++++-#         self.experts = nn.ModuleList(
++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++-#         )
++++-#         # 共享专家
++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_dispatch(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         """
++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++++-#         """
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens, _ = hidden_states.shape
++++-        
++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++++-#         flat_selected_experts = selected_experts.flatten()
++++-#         flat_routing_weights = routing_weights.flatten()
++++-
++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-
++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-        
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-            
++++-#             # 找到所有分配给该专家的 token
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-            
++++-#             # 使用 mask 选取对应的 token 和权重
++++-#             current_token_indices = token_indices[mask]
++++-#             current_routing_weights = flat_routing_weights[mask]
++++-#             current_hidden_states = hidden_states[current_token_indices]
++++-            
++++-#             # 对这些 token 进行批处理
++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++-            
++++-#             # 使用 index_add 将结果精确地加回到对应位置
++++-#             moe_output = moe_output.index_add(
++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++++-#             )
++++-#         return moe_output
++++-
++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-#         """
++++-#         顶层 forward 方法，作为智能分发器。
++++-#         """
++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        
++++-#         # 1. 门控计算
++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-#         router_logits = self.gate(hidden_states_reshaped)
++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-
++++-#         if self.norm_topk_prob:
++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        
++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++-        
++++-#         # 2. 调用统一的 MoE 计算内核
++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++++-
++++-#         # 3. 统一处理共享专家
++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++-        
++++-#         # 4. 合并输出
++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++-        
++++-#         # 5. 恢复原始形状并返回
++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++-        
++++-#         return final_hidden_states, router_logits
++++-
++++-
++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++-#     """
++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++-#     【最终高性能与高精度版】：
++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++++-#     3. 这样实现了速度和准确性的两全其美。
++++-#     """
++++-#     def __init__(self, config: Qwen2MoeConfig):
++++-#         super().__init__()
++++-#         self.num_experts = config.num_experts
++++-#         self.top_k = config.num_experts_per_tok
++++-#         self.norm_topk_prob = config.norm_topk_prob
++++-
++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++-#         self.experts = nn.ModuleList(
++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++-#         )
++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_decode(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         """
++++-#         【解码路径】极致优化版：bmm + 高精度累加。
++++-#         """
++++-#         original_dtype = hidden_states.dtype
++++-#         batch_size, _ = hidden_states.shape
++++-        
++++-#         expert_outputs_list = [
++++-#             ops.cat([
++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++-#             ], dim=0) 
++++-#             for i in range(batch_size)
++++-#         ]
++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++-
++++-#         # 在 float32 下执行 bmm，得到高精度结果
++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++-        
++++-#         # 将高精度结果转换回原始数据类型
++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++++-        
++++-#         return moe_output
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_prefill(
++++-#         self, 
++++-#         hidden_states: mindspore.Tensor, 
++++-#         selected_experts: mindspore.Tensor, 
++++-#         routing_weights: mindspore.Tensor
++++-#     ) -> mindspore.Tensor:
++++-#         """
++++-#         【预填充路径】与原始实现一致，结果精确。
++++-#         """
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens, _ = hidden_states.shape
++++-#         flat_selected_experts = selected_experts.flatten()
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-        
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-#             selected_token_indices = token_indices[mask]
++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++-#             current_states = hidden_states[selected_token_indices]
++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-#             moe_output = moe_output.index_add(
++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++-#             )
++++-#         return moe_output
++++-
++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-        
++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-#         router_logits = self.gate(hidden_states_reshaped)
++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-
++++-#         if self.norm_topk_prob:
++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        
++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++++-#         # 如果模型主体是 float16，后续再转换
++++-        
++++-#         moe_output = None
++++-#         if not self.training:
++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++++-#             # _moe_infer_decode 内部会处理好类型转换
++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++++-#             if sequence_length == 1:
++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++-#             else:
++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++-#         else:
++++-#             raise NotImplementedError("Training path is not implemented.")
++++-
++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++-        
++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++-        
++++-#         return final_hidden_states, router_logits
++++-    
++++-
++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++-#     """
++++-#     【融合版】一个混合专家模块，内置两种推理策略，
++++-#     由外部全局变量 `Long_Prompt` 控制：
++++-
++++-#     - if Long_Prompt is True:  【精度优先模式】
++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++++-#       适用于处理长序列，避免误差累积。
++++-
++++-#     - if Long_Prompt is False: 【速度优先模式】
++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
++++-#     """
++++-#     def __init__(self, config: Qwen2MoeConfig):
++++-#         super().__init__()
++++-#         self.num_experts = config.num_experts
++++-#         self.top_k = config.num_experts_per_tok
++++-#         self.norm_topk_prob = config.norm_topk_prob
++++-
++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++-#         self.experts = nn.ModuleList(
++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++-#         )
++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++-
++++-#     # --- 速度优先模式的辅助函数 ---
++++-#     @no_grad()
++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++-#         original_dtype = hidden_states.dtype
++++-#         batch_size, _ = hidden_states.shape
++++-#         expert_outputs_list = [
++++-#             ops.cat([
++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++-#             ], dim=0) 
++++-#             for i in range(batch_size)
++++-#         ]
++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
++++-
++++-#     @no_grad()
++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens, _ = hidden_states.shape
++++-#         flat_selected_experts = selected_experts.flatten()
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-#             selected_token_indices = token_indices[mask]
++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++-#             current_states = hidden_states[selected_token_indices]
++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++++-#         return moe_output
++++-
++++-#     # --- 精度优先模式的辅助函数 ---
++++-#     @no_grad()
++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++-#         moe_output = ops.zeros_like(hidden_states)
++++-#         num_tokens, _ = hidden_states.shape
++++-#         flat_selected_experts = selected_experts.flatten()
++++-#         flat_routing_weights = routing_weights.flatten()
++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++-#         active_experts = ops.unique(flat_selected_experts)
++++-#         for expert_idx_tensor in active_experts:
++++-#             expert_idx = expert_idx_tensor.item()
++++-#             expert_layer = self.experts[expert_idx]
++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++-#             current_token_indices = token_indices[mask]
++++-#             current_routing_weights = flat_routing_weights[mask]
++++-#             current_hidden_states = hidden_states[current_token_indices]
++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++-#         return moe_output
++++-
++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++++-#         global Long_Prompt
++++-
++++-#         # 1. 门控计算 (所有模式通用)
++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-#         router_logits = self.gate(hidden_states_reshaped)
++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++-#         if self.norm_topk_prob:
++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-        
++++-#         moe_output = None
++++-#         if not self.training:
++++-#             # 根据 Long_Prompt 标志选择模式
++++-#             if Long_Prompt:
++++-#                 # --- 精度优先模式 ---
++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++-#             else:
++++-#                 # --- 速度优先模式 ---
++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++-#                 if sequence_length == 1:
++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++-#                 else:
++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++-#         else:
++++-#             raise NotImplementedError("Training path is not implemented.")
++++-
++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++-        
++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++-        
++++-#         return final_hidden_states, router_logits
++++-    
++++ class Qwen2MoeSparseMoeBlock(nn.Module):
++++     """
++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++         return moe_output_fp32.squeeze(1).to(original_dtype)
++++ 
+++++    # @no_grad()
+++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++    #     num_tokens, _ = hidden_states.shape
+++++    #     flat_selected_experts = selected_experts.flatten()
+++++    #     sorted_expert_indices = flat_selected_experts.argsort()
+++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++    #     original_token_indices = sorted_expert_indices // self.top_k
+++++    #     moe_output = ops.zeros_like(hidden_states)
+++++    #     current_token_offset = 0
+++++    #     for i in range(self.num_experts):
+++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+++++    #         if expert_token_count == 0:
+++++    #             continue
+++++    #         end_offset = current_token_offset + expert_token_count
+++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++    #         current_token_offset += expert_token_count
+++++    #     return moe_output
+++++
++++     @no_grad()
++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++-        num_tokens, _ = hidden_states.shape
++++-        flat_selected_experts = selected_experts.flatten()
++++-        sorted_expert_indices = flat_selected_experts.argsort()
++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++-        original_token_indices = sorted_expert_indices // self.top_k
+++++        """
+++++        优化版 MoE prefill (速度优先模式)：
+++++        - 批量张量化处理同一个 expert 的所有 token
+++++        - 跳过无 token 的专家
+++++        - 保持结果完全一致
+++++        """
++++         moe_output = ops.zeros_like(hidden_states)
++++-        current_token_offset = 0
++++-        for i in range(self.num_experts):
++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
++++-            if expert_token_count == 0:
++++-                continue
++++-            end_offset = current_token_offset + expert_token_count
++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++-            current_token_offset += expert_token_count
+++++
+++++        flat_selected_experts = selected_experts.flatten()
+++++        flat_routing_weights = routing_weights.flatten()
+++++
+++++        idxs = flat_selected_experts.argsort()
+++++        sorted_expert_indices = flat_selected_experts[idxs]
+++++        sorted_token_indices = idxs // self.top_k
+++++
+++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+++++
+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++++
+++++        for expert_id in active_experts.tolist():
+++++            start = int(tokens_per_expert[:expert_id].sum().item())
+++++            end = start + int(tokens_per_expert[expert_id].item())
+++++
+++++            token_idx = sorted_token_indices[start:end]
+++++            expert_tokens = hidden_states[token_idx]
+++++
+++++            expert_out = self.experts[expert_id](expert_tokens)
+++++
+++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+++++
+++++            moe_output = mindspore.mint.scatter_add(
+++++                moe_output,
+++++                0,
+++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+++++                scaled_out.to(hidden_states.dtype)
+++++            )
+++++
++++         return moe_output
++++ 
+++++
++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++++     @no_grad()
++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++         
++++         moe_output = None
++++-        if Long_Prompt:
++++-            # --- 精度优先模式 (ACCURACY MODE) ---
++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        # if Long_Prompt==0:
+++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        # else:
+++++        #     # --- 速度优先模式 (SPEED MODE) ---
+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++        #     if sequence_length == 1:
+++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        #     else:
+++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++        
+++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++        if sequence_length == 1:
+++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++         else:
++++-            # --- 速度优先模式 (SPEED MODE) ---
++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++-            if sequence_length == 1:
++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++-            else:
++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++-        
+++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++    
++++ 
++++         # 3. 共享专家计算与合并 (所有模式通用)
++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++         
++++         return final_hidden_states, router_logits
++++ 
+++++
++++ class Qwen2MoeDecoderLayer(nn.Module):
++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++++         super().__init__()
++++         self.hidden_size = config.hidden_size
++++         
++++-        # if Long_Prompt:
++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++-        # else:
+++++        # if Long_Prompt == 2:
++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++        # else:
+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++ 
++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++ 
++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++             )
++++ 
++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++        #     attention_mask,
+++++        #     sequence_length=sequence_length,
+++++        #     target_length=target_length,
+++++        #     dtype=dtype,
+++++        #     min_dtype=min_dtype,
+++++        #     cache_position=cache_position,
+++++        #     batch_size=input_tensor.shape[0],
+++++        # )
+++++        #@dwj
+++++        causal_mask = get_cached_causal_mask_with_cache_position(
++++             attention_mask,
++++             sequence_length=sequence_length,
++++             target_length=target_length,
++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++++         """
++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+++++        _causal_mask_cache.clear()
++++ 
++++         input_ids = kwargs.get("input_ids")
++++         if input_ids is None and args:
++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++ 
++++         if input_ids is not None:
++++             prompt_length = input_ids.shape[1]
++++-            
++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++++-                Long_Prompt = True
+++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+++++                Long_Prompt = 2
+++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+++++                Long_Prompt = 0
++++             else:
++++-                Long_Prompt = False
+++++                Long_Prompt = 1
+++++
++++ 
++++         return super().generate(*args, **kwargs)
++++     
++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++             dtype = self.lm_head.weight.dtype
++++             min_dtype = float(ops.finfo(dtype).min)
++++ 
++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++            #     attention_mask,
+++++            #     sequence_length=sequence_length,
+++++            #     target_length=past_key_values.get_max_length(),
+++++            #     dtype=dtype,
+++++            #     min_dtype=min_dtype,
+++++            #     cache_position=cache_position,
+++++            #     batch_size=batch_size,
+++++            # )
+++++
+++++            #@dwj
+++++            attention_mask = get_cached_causal_mask_with_cache_position(
++++                 attention_mask,
++++                 sequence_length=sequence_length,
++++                 target_length=past_key_values.get_max_length(),
++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++deleted file mode 100644
++++index 6dfb5b93..00000000
++++--- a/patches/0001-20251104commit.patch
+++++++ /dev/null
++++@@ -1,1272 +0,0 @@
++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++-From: Pinoeer-kingxi <13022943007@163.com>
++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
++++-Subject: [PATCH] 20251104commit
++++-
++++----
++++- mindnlp/transformers/cache_utils.py           |  28 +-
++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++- 3 files changed, 976 insertions(+), 87 deletions(-)
++++-
++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++-index cadd2e04..02f8d4be 100644
++++---- a/mindnlp/transformers/cache_utils.py
++++-+++ b/mindnlp/transformers/cache_utils.py
++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++-             # k_out[:, :, cache_position] = key_states
++++-             # v_out[:, :, cache_position] = value_states
++++--            if ON_ORANGE_PI:
++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++--            else:
++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++--
++++-+            # if ON_ORANGE_PI:
++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++-+            # else:
++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++-+            if cache_position.ndim > 1:
++++-+                cache_position = cache_position.flatten()
++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++-+                cache_position = cache_position.int()
++++-+            
++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++-+            k_out[:, :, cache_position] = key_states
++++-+            v_out[:, :, cache_position] = value_states
++++-+            
++++-         return k_out, v_out
++++- 
++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++-index c695b944..d8303e45 100644
++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
++++- def rotate_half(x):
++++-     """Rotates half the hidden dims of the input."""
++++--    x1 = x[..., : x.shape[-1] // 2]
++++--    x2 = x[..., x.shape[-1] // 2 :]
++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++-+    # x1 = x[..., : x.shape[-1] // 2]
++++-+    # x2 = x[..., x.shape[-1] // 2 :]
++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++-     return ops.cat((-x2, x1), dim=-1)
++++- 
++++- 
++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++-         if self.training:
++++-             raise NotImplementedError("Training is not supported yet.")
++++-         else:
++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++--        if self.config.n_shared_experts is not None:
++++--            y = y + self.shared_experts(identity)
++++--        return y
++++-+            # @lwx
++++-+            if orig_shape[1] == 1:
++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++-+                y=y.view(*orig_shape)
++++-+                if self.config.n_shared_experts is not None:
++++-+                    y = y + self.shared_experts(identity)
++++-+                return y
++++-+            else:
++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++-+                if self.config.n_shared_experts is not None:
++++-+                    y = y + self.shared_experts(identity)
++++-+                return y
++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++-+        # if self.config.n_shared_experts is not None:
++++-+        #     y = y + self.shared_experts(identity)
++++-+        # return y
++++-+        
++++-+    @no_grad()
++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++-+
++++-+        expert_cache = ops.zeros_like(x)
++++-+        for i in range(self.num_experts_per_tok):
++++-+            expert_id = flat_expert_indices[i].item()
++++-+            weight = flat_expert_weights[i].item()
++++-+            expert = self.experts[expert_id]
++++-+            expert_out = expert(x)
++++-+            expert_cache += expert_out * weight
++++-+        return expert_cache
++++- 
++++-     @no_grad()
++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++--        # expert_cache = torch.zeros_like(x)
++++--        # idxs = flat_expert_indices.argsort()
++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++--        # token_idxs = idxs // self.num_experts_per_tok
++++--        # for i, end_idx in enumerate(tokens_per_expert):
++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++--        #     if start_idx == end_idx:
++++--        #         continue
++++--        #     expert = self.experts[i]
++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++--        #     expert_tokens = x[exp_token_idx]
++++--        #     expert_out = expert(expert_tokens)
++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++--        # return expert_cache
++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++-         expert_cache = ops.zeros_like(x)
++++-         idxs = flat_expert_indices.argsort()
++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++-         token_idxs = idxs // self.num_experts_per_tok
++++-+
++++-         for i, end_idx in enumerate(tokens_per_expert):
++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++-             if start_idx == end_idx:
++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++-             expert_out = expert(expert_tokens)
++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++-+
++++-         return expert_cache
++++-+        
++++-+    # @no_grad()
++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-+    #     # expert_cache = torch.zeros_like(x)
++++-+    #     # idxs = flat_expert_indices.argsort()
++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++-+    #     #     if start_idx == end_idx:
++++-+    #     #         continue
++++-+    #     #     expert = self.experts[i]
++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++-+    #     #     expert_tokens = x[exp_token_idx]
++++-+    #     #     expert_out = expert(expert_tokens)
++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++-+    #     # return expert_cache
++++-+    #     expert_cache = ops.zeros_like(x)
++++-+    #     idxs = flat_expert_indices.argsort()
++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++-+    #     token_idxs = idxs // self.num_experts_per_tok
++++-+
++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++-+    #         if start_idx == end_idx:
++++-+    #             continue
++++-+    #         expert = self.experts[i]
++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++-+    #         expert_tokens = x[exp_token_idx]
++++-+    #         expert_out = expert(expert_tokens)
++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++-+
++++-+    #     return expert_cache
++++-+    # @no_grad()
++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++-+    #     expert_cache = ops.zeros_like(x)
++++-+
++++-+    #     # 排序保证顺序一致
++++-+    #     idxs = flat_expert_indices.argsort()
++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++-+    #     token_idxs = idxs // self.num_experts_per_tok
++++-+
++++-+    #     # 找出有 token 的专家
++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++-+
++++-+    #     for i in active_experts.tolist():
++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++-+    #         end_idx = tokens_per_expert[i]
++++-+    #         if start_idx == end_idx:  # 没有 token
++++-+    #             continue
++++-+
++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++-+    #         expert_tokens = x[exp_token_idx]
++++-+    #         expert_out = self.experts[i](expert_tokens)
++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++-+
++++-+    #         expert_cache = mindspore.mint.scatter_add(
++++-+    #             expert_cache,
++++-+    #             0,
++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++-+    #             expert_out
++++-+    #         )
++++-+
++++-+    #     return expert_cache
++++-+
++++-+
++++- 
++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++- #     """
++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++- 
++++-         # Initialize weights and apply final processing
++++-         self.post_init()
++++-+        self.warm_up = False
++++-+
++++-+    def warmup_moe_model_deep(self):
++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++-+        test_texts = [
++++-+            "warmup short",
++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++-+        ]
++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++-+        if tokenizer is None:
++++-+            from mindnlp.transformers import AutoTokenizer
++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++-+            self._warmup_tokenizer = tokenizer
++++-+
++++-+        for text in test_texts:
++++-+            inputs = tokenizer(text, return_tensors="ms")
++++-+            with mindspore._no_grad():
++++-+                _ = self(**inputs, use_cache=False)
++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++- 
++++-     def get_input_embeddings(self):
++++-         return self.model.embed_tokens
++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++-         ```"""
++++-+        if not self.warm_up:
++++-+            self.warm_up = True
++++-+            self.warmup_moe_model_deep()
++++-+
++++-         output_attentions = (
++++-             output_attentions
++++-             if output_attentions is not None
++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++-index 3cbf820e..d4c6b651 100644
++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++-@@ -18,7 +18,6 @@
++++- # See the License for the specific language governing permissions and
++++- # limitations under the License.
++++- """MindSpore Qwen2MoE model."""
++++--
++++- import math
++++- from typing import List, Optional, Tuple, Union
++++- 
++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++-     TokenClassifierOutput,
++++- )
++++- from ...modeling_utils import PreTrainedModel
++++-+from ...generation import GenerationMixin
++++- from ....utils import logging
++++- from .configuration_qwen2_moe import Qwen2MoeConfig
++++- 
++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++-         self.variance_epsilon = eps
++++- 
++++-     def forward(self, hidden_states):
++++-+        # @dwj
++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++-+        # @lwx
++++-+        # if not self.training :
++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++-         input_dtype = hidden_states.dtype
++++-         hidden_states = hidden_states.to(mindspore.float32)
++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++-@@ -234,6 +239,8 @@ def rotate_half(x):
++++-     """Rotates half the hidden dims of the input."""
++++-     x1 = x[..., : x.shape[-1] // 2]
++++-     x2 = x[..., x.shape[-1] // 2 :]
++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++-     return ops.cat((-x2, x1), dim=-1)
++++- 
++++- 
++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++-         self.config = config
++++-         self.hidden_size = config.hidden_size
++++-         self.intermediate_size = intermediate_size
++++-+        
++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++-         self.act_fn = ACT2FN[config.hidden_act]
++++- 
++++-     def forward(self, x):
++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++--
++++- 
++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++-+        # @lwx 
++++-+        # gate_up_output = self.gate_up_proj(x)
++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++-+        # return self.down_proj(swiglu_output)
++++-+
++++-+    # def forward(self, x):
++++-+    #     gate_proj_out = self.gate_proj(x)
++++-+    #     up_proj_out = self.up_proj(x)
++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++-+    #     return self.down_proj(swiglu_out)
++++-+        
++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++-     """
++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++-         use_cache: bool = False,
++++-         cache_position: Optional[mindspore.Tensor] = None,
++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-+
++++-+        
++++-+
++++-         bsz, q_len, _ = hidden_states.shape
++++- 
++++-         query_states = self.q_proj(hidden_states)
++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-                     "with a layer index."
++++-                 )
++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-+            if isinstance(past_key_value, StaticCache):
++++-+                kv_seq_len = key_states.shape[-2]
++++-+            else:
++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++- 
++++-         if past_key_value is not None:
++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++-+            
++++-+            if isinstance(past_key_value, StaticCache):
++++-+                kv_seq_len = key_states.shape[-2]
++++- 
++++-         # repeat k/v heads if n_kv_heads < n_heads
++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++--
++++-+        
++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++- 
++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++--            raise ValueError(
++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++--                f" {attn_weights.shape}"
++++--            )
++++--
++++--        if attention_mask is not None:  # no matter the length, we just slice it
++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++-+        if attention_mask is not None:
++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++-             attn_weights = attn_weights + causal_mask
++++- 
++++-         # upcast attention to fp32
++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++- 
++++-         attn_output = self.o_proj(attn_output)
++++--
++++-+        # @lwx
++++-+        
++++-+        # max_seq_len = self.max_position_embeddings  # 2048
++++-+
++++-+        # if attention_mask is not None:
++++-+        #     # attention_mask: [B, 1, Sq, Sk]
++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++-+
++++-+        #     # pad 到 [max_seq_len, max_seq_len]
++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++-+        #     global_attention_mask = padded_mask
++++-+        # else:
++++-+        #     global_attention_mask = None
++++-+
++++-+
++++-+        # sparse_mode=3
++++-+        # attn_output = mindspore.ops.flash_attention_score(
++++-+        #     query=query_states,
++++-+        #     key=key_states,
++++-+        #     value=value_states,
++++-+        #     real_shift=None,
++++-+        #     padding_mask=None,
++++-+
++++-+        #     head_num=self.num_heads,
++++-+        #     attn_mask=global_attention_mask,
++++-+        #     keep_prob=1.0 - self.attention_dropout,
++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++-+        #     input_layout="BNSD",
++++-+        #     pre_tokens=2147483647,
++++-+        #     next_tokens=2147483647,
++++-+        #     inner_precise=0,
++++-+        #     drop_mask=None,
++++-+        #     prefix=None,
++++-+        #     actual_seq_qlen=None,
++++-+        #     actual_seq_kvlen=None,
++++-+        #     sparse_mode=sparse_mode,
++++-+        # )
++++-         if not output_attentions:
++++-             attn_weights = None
++++- 
++++-         return attn_output, attn_weights, past_key_value
++++- 
++++- 
++++-+class Qwen2MoeFlashAttention(nn.Module):
++++-+    """
++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++-+
++++-+    关键改动:
++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++-+       直接传入原始的 key 和 value 张量效率更高。
++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++-+    """
++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++-+        super().__init__()
++++-+        self.config = config
++++-+        self.layer_idx = layer_idx
++++-+        self.hidden_size = config.hidden_size
++++-+        self.num_heads = config.num_attention_heads
++++-+        self.head_dim = self.hidden_size // self.num_heads
++++-+        self.num_key_value_heads = config.num_key_value_heads
++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++-+        self.max_position_embeddings = config.max_position_embeddings
++++-+        self.rope_theta = config.rope_theta
++++-+        self.attention_dropout = config.attention_dropout
++++-+
++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
++++-+            raise ValueError(
++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++-+            )
++++-+
++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++-+
++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++-+            self.head_dim,
++++-+            max_position_embeddings=self.max_position_embeddings,
++++-+            base=self.rope_theta,
++++-+        )
++++-+
++++-+    def forward(
++++-+        self,
++++-+        hidden_states: mindspore.Tensor,
++++-+        attention_mask: Optional[mindspore.Tensor] = None,
++++-+        position_ids: Optional[mindspore.Tensor] = None,
++++-+        past_key_value: Optional[Cache] = None,
++++-+        output_attentions: bool = False,
++++-+        use_cache: bool = False,
++++-+        cache_position: Optional[mindspore.Tensor] = None,
++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-+
++++-+        bsz, q_len, _ = hidden_states.shape
++++-+
++++-+        # 1. 线性投射 Q, K, V
++++-+        query_states = self.q_proj(hidden_states)
++++-+        key_states = self.k_proj(hidden_states)
++++-+        value_states = self.v_proj(hidden_states)
++++-+
++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+
++++-+        # 3. RoPE 旋转位置编码
++++-+        kv_seq_len = key_states.shape[-2]
++++-+        if past_key_value is not None:
++++-+            if self.layer_idx is None:
++++-+                raise ValueError(
++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-+                    "with a layer index."
++++-+                )
++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++-+                if cache_position.shape[0] == 1:
++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++-+                    kv_seq_len = past_seen_tokens + 1
++++-+                else:
++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++-+            else:
++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-+
++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-+
++++-+        # 4. KV 缓存更新
++++-+        if past_key_value is not None:
++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-+            key_states, value_states = past_key_value.update(
++++-+                key_states, value_states, self.layer_idx, cache_kwargs
++++-+            )
++++-+            
++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++-+                if cache_position.shape[0] == 1:
++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++-+                    kv_seq_len = key_states.shape[-2]
++++-+
++++-+        # 5. [重要] 准备 Attention Mask
++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++-+        fa_attention_mask = None
++++-+        if attention_mask is not None:
++++-+            # 截取与当前key长度匹配的部分
++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++-+            fa_attention_mask = (mask_slice != 0)
++++-+
++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++-+        input_dtype = query_states.dtype
++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++-+            query_states = query_states.to(mindspore.float16)
++++-+            key_states = key_states.to(mindspore.float16)
++++-+            value_states = value_states.to(mindspore.float16)
++++-+
++++-+        # 6. [核心] 调用 flash_attention_score 算子
++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++-+        attn_output = mindspore.ops.flash_attention_score(
++++-+            query=query_states,
++++-+            key=key_states,
++++-+            value=value_states,
++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
++++-+            attn_mask=fa_attention_mask,
++++-+            keep_prob=1.0 - self.attention_dropout,
++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
++++-+            input_layout="BNSD",
++++-+            sparse_mode=0 # 使用 defaultMask 模式
++++-+        )
++++-+
++++-+        # 恢复原始数据类型
++++-+        attn_output = attn_output.to(input_dtype)
++++-+
++++-+        # 7. 调整输出形状
++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-+        attn_output = self.o_proj(attn_output)
++++-+
++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
++++-+        attn_weights = None
++++-+        if output_attentions:
++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++-+
++++-+        return attn_output, attn_weights, past_key_value
++++-+
++++-+    # def forward(
++++-+    #     self,
++++-+    #     hidden_states: mindspore.Tensor,
++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++++-+    #     past_key_value: Optional[Cache] = None,
++++-+    #     output_attentions: bool = False,
++++-+    #     use_cache: bool = False,
++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-+
++++-+    #     bsz, q_len, _ = hidden_states.shape
++++-+
++++-+    #     # 1. 线性投射 Q, K, V
++++-+    #     query_states = self.q_proj(hidden_states)
++++-+    #     key_states = self.k_proj(hidden_states)
++++-+    #     value_states = self.v_proj(hidden_states)
++++-+
++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+
++++-+    #     # 3. RoPE 旋转位置编码
++++-+    #     kv_seq_len = key_states.shape[-2]
++++-+    #     if past_key_value is not None:
++++-+    #         if self.layer_idx is None:
++++-+    #             raise ValueError(
++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-+    #                 "with a layer index."
++++-+    #             )
++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-+
++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-+
++++-+    #     # 4. KV 缓存更新
++++-+    #     if past_key_value is not None:
++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-+    #         key_states, value_states = past_key_value.update(
++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++++-+    #         )
++++-+
++++-+    #     # 5. 准备 Attention Mask
++++-+    #     fa_attention_mask = None
++++-+    #     if attention_mask is not None:
++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-+    #         fa_attention_mask = (mask_slice != 0)
++++-+
++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++-+    #     input_dtype = query_states.dtype
++++-+
++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
++++-+    #     attn_output = mindspore.ops.flash_attention_score(
++++-+    #         query=query_states,
++++-+    #         key=key_states,
++++-+    #         value=value_states,
++++-+    #         head_num=self.num_heads,
++++-+    #         attn_mask=fa_attention_mask,
++++-+    #         keep_prob=1.0 - self.attention_dropout,
++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++-+    #         input_layout="BNSD",
++++-+    #         sparse_mode=0,
++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++-+    #         inner_precise=1
++++-+    #     )
++++-+
++++-+    #     # 恢复原始数据类型
++++-+    #     attn_output = attn_output.to(input_dtype)
++++-+
++++-+    #     # 7. 调整输出形状
++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-+    #     attn_output = self.o_proj(attn_output)
++++-+
++++-+    #     attn_weights = None
++++-+    #     if output_attentions:
++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++-+
++++-+    #     return attn_output, attn_weights, past_key_value
++++-+
++++-+    # def forward(
++++-+    #     self,
++++-+    #     hidden_states: mindspore.Tensor,
++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++++-+    #     past_key_value: Optional[Cache] = None,
++++-+    #     output_attentions: bool = False,
++++-+    #     use_cache: bool = False,
++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-+
++++-+    #     bsz, q_len, _ = hidden_states.shape
++++-+
++++-+    #     query_states = self.q_proj(hidden_states)
++++-+    #     key_states = self.k_proj(hidden_states)
++++-+    #     value_states = self.v_proj(hidden_states)
++++-+
++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-+
++++-+    #     kv_seq_len = key_states.shape[-2]
++++-+    #     if past_key_value is not None:
++++-+    #         if self.layer_idx is None:
++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-+
++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-+
++++-+    #     if past_key_value is not None:
++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++-+    #         key_states, value_states = past_key_value.update(
++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++++-+    #         )
++++-+
++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++-+
++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
++++-+    #     # <--- 修改结束 ---
++++-+
++++-+    #     fa_attention_mask = None
++++-+    #     if attention_mask is not None:
++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++-+    #         fa_attention_mask = (mask_slice != 0)
++++-+
++++-+    #     input_dtype = query_states.dtype
++++-+
++++-+    #     attn_output = mindspore.ops.flash_attention_score(
++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
++++-+    #         key=key_states,
++++-+    #         value=value_states,
++++-+    #         head_num=self.num_heads,
++++-+    #         attn_mask=fa_attention_mask,
++++-+    #         keep_prob=1.0 - self.attention_dropout,
++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++-+    #         input_layout="BNSD",
++++-+    #         sparse_mode=0,
++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
++++-+    #     )
++++-+
++++-+    #     attn_output = attn_output.to(input_dtype)
++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-+    #     attn_output = self.o_proj(attn_output)
++++-+
++++-+    #     attn_weights = None
++++-+    #     if output_attentions:
++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++-+
++++-+    #     return attn_output, attn_weights, past_key_value
++++-+
++++- QWEN2MOE_ATTENTION_CLASSES = {
++++-     "eager": Qwen2MoeAttention,
++++-+    "flash-attention": Qwen2MoeFlashAttention,
++++- }
++++- 
++++- 
++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++- 
++++-+    #@dwj
++++-+    # 只遍历激活的专家，而非全部专家
++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++--        hidden_states = hidden_states.view(-1, hidden_dim)
++++--        # router_logits: (batch * sequence_length, n_experts)
++++--        router_logits = self.gate(hidden_states)
++++--
++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++--        if self.norm_topk_prob:
++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++--        # we cast back to the input dtype
++++--        routing_weights = routing_weights.to(hidden_states.dtype)
++++--
++++--        final_hidden_states = ops.zeros(
++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++--        )
++++--
++++--        # One hot encode the selected experts to create an expert mask
++++--        # this will be used to easily index which expert is going to be sollicitated
++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++--
++++--        # Loop over all available experts in the model and perform the computation on each expert
++++--        for expert_idx in range(self.num_experts):
++++--            expert_layer = self.experts[expert_idx]
++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++--
++++--            # Index the correct hidden states and compute the expert hidden state for
++++--            # the current expert. We need to make sure to multiply the output hidden
++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++--            if 0 not in idx.shape:
++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++--
++++--                # However `index_add_` only support torch tensors for indexing so we'll use
++++--                # the `top_x` tensor here.
++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++--
++++--        shared_expert_output = self.shared_expert(hidden_states)
++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++--
++++--        final_hidden_states = final_hidden_states + shared_expert_output
++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++-+            num_tokens = hidden_states_reshaped.shape[0]
++++-+
++++-+            router_logits = self.gate(hidden_states_reshaped)
++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++-+
++++-+            if self.norm_topk_prob:
++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
++++-+            
++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++-+            flat_selected_experts = selected_experts.flatten()
++++-+            
++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++-+            token_indices = broadcasted_token_indices.flatten()
++++-+            
++++-+            active_experts = ops.unique(flat_selected_experts)
++++-+            
++++-+            for expert_idx_tensor in active_experts:
++++-+                expert_idx = expert_idx_tensor.item()
++++-+                expert_layer = self.experts[expert_idx]
++++-+                
++++-+                mask = (flat_selected_experts == expert_idx_tensor)
++++-+                selected_token_indices = token_indices[mask]
++++-+                selected_routing_weights = routing_weights.flatten()[mask]
++++-+                
++++-+                current_states = hidden_states_reshaped[selected_token_indices]
++++-+                
++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++-+                
++++-+                final_hidden_states = final_hidden_states.index_add(
++++-+                    dim=0,
++++-+                    index=selected_token_indices,
++++-+                    source=expert_output.to(hidden_states.dtype)
++++-+                )
++++-+            
++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++- 
++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++--        return final_hidden_states, router_logits
++++-+            final_hidden_states = final_hidden_states + shared_expert_output
++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++-+            
++++-+            return final_hidden_states, router_logits
++++- 
++++- 
++++- class Qwen2MoeDecoderLayer(nn.Module):
++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++- 
++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++- 
++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++-+
++++-         if (layer_idx not in config.mlp_only_layers) and (
++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++-         ):
++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++-     _skip_keys_device_placement = "past_key_values"
++++-     _supports_cache_class = True
++++-+#lwx
++++-+    # _supports_static_cache = True
++++- 
++++-     def _init_weights(self, module):
++++-         std = self.config.initializer_range
++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++-         return causal_mask
++++- 
++++- 
++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++-     _tied_weights_keys = ["lm_head.weight"]
++++- 
++++-     def __init__(self, config):
++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++-         self.num_experts_per_tok = config.num_experts_per_tok
++++-         # Initialize weights and apply final processing
++++-         self.post_init()
++++-+        # @lwx
++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++-+        #     self.generation_config.cache_implementation = "static"
++++-+        self._warmed_up = False
++++-+
++++-+    def warmup_moe_model(self):
++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++-+        test_texts = [
++++-+            "warmup short",
++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++-+        ]
++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++-+        if tokenizer is None:
++++-+            from mindnlp.transformers import AutoTokenizer
++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++-+            self._warmup_tokenizer = tokenizer
++++-+
++++-+        for text in test_texts:
++++-+            inputs = tokenizer(text, return_tensors="ms")
++++-+            with mindspore._no_grad():
++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++- 
++++-     def get_input_embeddings(self):
++++-         return self.model.embed_tokens
++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++-         ```"""
++++-+        if not self._warmed_up:
++++-+            self._warmed_up = True
++++-+            self.warmup_moe_model()
++++- 
++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++-         output_router_logits = (
++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++-             }
++++-         )
++++-         return model_inputs
++++-+# @lwx
++++-+    # def _decode_one_tokens_logits(
++++-+    #     self,
++++-+    #     cur_token: mindspore.Tensor,
++++-+    #     input_pos: Optional[mindspore.Tensor],
++++-+    #     cache_position: mindspore.Tensor,
++++-+    #     past_key_values: StaticCache,
++++-+    # ) -> mindspore.Tensor:
++++-+    #     """
++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++-+        
++++-+    #     Args:
++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++-+    #         input_pos: 输入位置信息，可选
++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++-+            
++++-+    #     Returns:
++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++-+    #     """
++++-+    #     # 调用JIT编译的版本
++++-+    #     return self.get_decode_one_tokens_logits(
++++-+    #         cur_token=cur_token,
++++-+    #         input_pos=input_pos,
++++-+    #         cache_position=cache_position,
++++-+    #         past_key_values=past_key_values,
++++-+    #     )
++++-+    
++++-+    # @mindspore.jit(jit_level='O1')
++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++-+    #     """
++++-+    #     JIT编译的函数，用于高效的单token解码
++++-+    #     使用JIT编译优化以支持静态shape和高效执行
++++-+        
++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++-+    #     """
++++-+    #     outputs = self.model.forward(
++++-+    #         input_ids=cur_token,
++++-+    #         position_ids=input_pos,
++++-+    #         cache_position=cache_position,
++++-+    #         past_key_values=past_key_values,
++++-+    #         use_cache=True,
++++-+    #         return_dict=False,
++++-+    #     )
++++-+        
++++-+    #     hidden_states = outputs[0]
++++-+    #     logits = self.lm_head.forward(hidden_states)
++++-+    #     logits = logits.float()
++++-+        
++++-+    #     return logits[:, -1, :]
++++-+
++++-+    # def _sample(
++++-+    #     self,
++++-+    #     input_ids: mindspore.Tensor,
++++-+    #     logits_processor,
++++-+    #     stopping_criteria,
++++-+    #     generation_config,
++++-+    #     synced_devices: bool,
++++-+    #     streamer=None,
++++-+    #     logits_warper=None,
++++-+    #     **model_kwargs,
++++-+    # ):
++++-+    #     """
++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++-+    #     """
++++-+    #     from ...generation.logits_process import LogitsProcessorList
++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++-+    #     from mindnlp.core import nn, ops, no_grad
++++-+    #     import numpy as np
++++-+        
++++-+    #     # 检查是否使用 StaticCache
++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++-+    #     # 否则，直接调用父类方法
++++-+    #     past_key_values = model_kwargs.get("past_key_values")
++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++-+
++++-+    #     if not isinstance(past_key_values, StaticCache):
++++-+    #         # 不使用 StaticCache，直接调用父类方法
++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++-+    #         return super()._sample(
++++-+    #             input_ids=input_ids,
++++-+    #             logits_processor=logits_processor,
++++-+    #             stopping_criteria=stopping_criteria,
++++-+    #             generation_config=generation_config,
++++-+    #             synced_devices=synced_devices,
++++-+    #             streamer=streamer,
++++-+    #             logits_warper=logits_warper,
++++-+    #             **model_kwargs,
++++-+    #         )
++++-+        
++++-+    #     # 使用 StaticCache，进入自定义循环
++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++-+    #     pad_token_id = generation_config._pad_token_tensor
++++-+    #     output_attentions = generation_config.output_attentions
++++-+    #     output_hidden_states = generation_config.output_hidden_states
++++-+    #     output_scores = generation_config.output_scores
++++-+    #     output_logits = generation_config.output_logits
++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++-+    #     max_length = generation_config.max_length
++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++-+    #     do_sample = generation_config.do_sample
++++-+        
++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++-+    #         raise ValueError(
++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++-+    #             f"{logits_warper})."
++++-+    #         )
++++-+        
++++-+    #     # init attention / hidden states / scores tuples
++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++-+        
++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++-+    #         encoder_hidden_states = (
++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++-+    #         )
++++-+        
++++-+    #     # keep track of which sequences are already finished
++++-+    #     batch_size, cur_len = input_ids.shape
++++-+    #     this_peer_finished = False
++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++-+        
++++-+    #     time_record = []
++++-+    #     from ....utils.testing_utils import parse_flag_from_env
++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++-+        
++++-+    #     while self._has_unfinished_sequences(
++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++-+    #     ):
++++-+    #         if _record_time:
++++-+    #             import time as time_module
++++-+    #             infer_start = time_module.time()
++++-+            
++++-+    #         # prepare model inputs
++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++-+            
++++-+    #         # prepare variable output controls
++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++-+            
++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++-+    #         cur_cache_position = model_inputs.get("cache_position")
++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
++++-+    #         cur_input_ids = model_inputs.get("input_ids")
++++-+            
++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++-+    #             cur_cache_position is not None and 
++++-+    #             len(cur_cache_position.shape) > 0 and
++++-+    #             cur_cache_position.shape[0] == 1 and
++++-+    #             cur_input_ids is not None and
++++-+    #             cur_input_ids.shape[1] == 1):
++++-+    #             # 使用 JIT 优化的单 token 解码
++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++-+    #             if not hasattr(self, '_jit_used'):
++++-+    #                 self._jit_used = False
++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++-+                
++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
++++-+    #                 cur_token=cur_input_ids,
++++-+    #                 input_pos=model_inputs.get("position_ids"),
++++-+    #                 cache_position=cur_cache_position,
++++-+    #                 past_key_values=cur_past_key_values,
++++-+    #             )
++++-+                
++++-+    #             # 标记已使用JIT（用于后续判断）
++++-+    #             if not self._jit_used:
++++-+    #                 self._jit_used = True
++++-+                
++++-+    #             # 构造兼容的输出对象
++++-+    #             class JitOptimizedOutput:
++++-+    #                 def __init__(self, logits, config):
++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++-+    #                     self.config = config
++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
++++-+    #                     self.cross_attentions = None
++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++-+                
++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++-+    #         else:
++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++-+    #             outputs = self(**model_inputs, return_dict=True)
++++-+            
++++-+    #         if synced_devices and this_peer_finished:
++++-+    #             continue
++++-+            
++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++-+    #         next_token_logits = outputs.logits[:, -1, :]
++++-+            
++++-+    #         # pre-process distribution
++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++-+    #         if do_sample:
++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++-+            
++++-+    #         # Store scores, attentions and hidden_states when required
++++-+    #         if return_dict_in_generate:
++++-+    #             if output_scores:
++++-+    #                 scores += (next_token_scores,)
++++-+    #             if output_logits:
++++-+    #                 raw_logits += (next_token_logits,)
++++-+    #             if output_attentions:
++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++-+    #                 if self.config.is_encoder_decoder:
++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++-+                
++++-+    #             if output_hidden_states:
++++-+    #                 hidden = (
++++-+    #                     outputs.decoder_hidden_states
++++-+    #                     if self.config.is_encoder_decoder
++++-+    #                     else outputs.hidden_states
++++-+    #                 )
++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++-+            
++++-+    #         # token selection
++++-+    #         if do_sample:
++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++-+    #         else:
++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++-+            
++++-+    #         # finished sentences should have their next token be a padding token
++++-+    #         if has_eos_stopping_criteria:
++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++-+            
++++-+    #         # update generated ids, model inputs, and length for next step
++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++-+    #         if streamer is not None:
++++-+    #             streamer.put(next_tokens)
++++-+            
++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
++++-+    #             outputs,
++++-+    #             model_kwargs,
++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++-+    #         )
++++-+            
++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++-+    #         cur_len += 1
++++-+            
++++-+    #         if _record_time:
++++-+    #             import time as time_module
++++-+    #             infer_stop = time_module.time()
++++-+    #             time_record.append(infer_stop - infer_start)
++++-+            
++++-+    #         del outputs
++++-+        
++++-+    #     average_infer_time = None
++++-+    #     if time_record:
++++-+    #         if len(time_record) > 1:
++++-+    #             time_record.pop(0)
++++-+    #         average_infer_time = sum(time_record) / len(time_record)
++++-+    #         print(f'average inference time is: {average_infer_time}')
++++-+    #         print(f'inference time record: {time_record}')
++++-+        
++++-+    #     if streamer is not None:
++++-+    #         streamer.end()
++++-+        
++++-+    #     # 简单判断：打印是否使用了JIT路径
++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
++++-+    #     else:
++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++-+        
++++-+    #     if return_dict_in_generate:
++++-+    #         if self.config.is_encoder_decoder:
++++-+    #             return GenerateEncoderDecoderOutput(
++++-+    #                 sequences=input_ids,
++++-+    #                 scores=scores,
++++-+    #                 logits=raw_logits,
++++-+    #                 encoder_attentions=encoder_attentions,
++++-+    #                 encoder_hidden_states=encoder_hidden_states,
++++-+    #                 decoder_attentions=decoder_attentions,
++++-+    #                 cross_attentions=cross_attentions,
++++-+    #                 decoder_hidden_states=decoder_hidden_states,
++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++++-+    #                 average_infer_time=average_infer_time
++++-+    #             )
++++-+    #         else:
++++-+    #             return GenerateDecoderOnlyOutput(
++++-+    #                 sequences=input_ids,
++++-+    #                 scores=scores,
++++-+    #                 logits=raw_logits,
++++-+    #                 attentions=decoder_attentions,
++++-+    #                 hidden_states=decoder_hidden_states,
++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++++-+    #                 average_infer_time=average_infer_time
++++-+    #             )
++++-+    #     else:
++++-+    #         return input_ids
++++-+            
++++-+    # def _prepare_cache_for_generation(
++++-+    #     self,
++++-+    #     generation_config,
++++-+    #     model_kwargs,
++++-+    #     assistant_model,
++++-+    #     batch_size,
++++-+    #     max_cache_length,
++++-+    # ):
++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++-+    #         generation_config.cache_implementation = "static"
++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++-+        
++++-+    #     if generation_config.cache_implementation == "static":
++++-+    #         base_required_from_max_length = generation_config.max_length + 1
++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
++++-+    #         min_cache_size = 50
++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++-+    #         else:
++++-+    #             max_cache_length = max(base_required, min_cache_size)
++++-+            
++++-+    #         original_max_cache_length = max_cache_length
++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
++++-+            
++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++-+    #             if max_cache_length > self.config.max_position_embeddings:
++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++-+        
++++-+    #     result = super()._prepare_cache_for_generation(
++++-+    #         generation_config=generation_config,
++++-+    #         model_kwargs=model_kwargs,
++++-+    #         assistant_model=assistant_model,
++++-+    #         batch_size=batch_size,
++++-+    #         max_cache_length=max_cache_length,
++++-+    #     )
++++-+        
++++-+    #     if generation_config.cache_implementation == "static":
++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++-+    #         created_cache = model_kwargs.get(cache_name)
++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++-+        
++++-+    #     return result
++++-+
++++-+
++++-+
++++- 
++++- 
++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++--- 
++++-2.27.0
++++-
++++-- 
++++2.27.0
++++
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
new file mode 100644
index 00000000..695e3df9
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
@@ -0,0 +1,8034 @@
+From 2831c3ffbda41719e00e1cd83c3840bcb9dd79db Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Fri, 7 Nov 2025 12:12:51 +0800
+Subject: [PATCH 07/10] 20251107003commit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |    2 +-
+ patches/0001-20251104commit.patch             |    2 +-
+ patches/0002-20251106commit.patch             |    2 +-
+ patches/0003-20261106secondcommit.patch       |    2 +-
+ patches/0004-20251106change.patch             |    2 +-
+ patches/0005-20251107001commit.patch          |    2 +-
+ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
+ 7 files changed, 7937 insertions(+), 6 deletions(-)
+ create mode 100644 patches/0006-20251107002commit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index e7e1c053..ff631974 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
+     #     return expert_cache
+     
+     @no_grad()
+-    dwj
++    # dwj
+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+         # x 的 shape: (1, hidden_size)
+         # flat_expert_indices 的 shape: (num_experts_per_tok,)
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+index 2842180e..c9c8c5ee 100644
+--- a/patches/0001-20251104commit.patch
++++ b/patches/0001-20251104commit.patch
+@@ -1,7 +1,7 @@
+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/5] 20251104commit
++Subject: [PATCH 1/6] 20251104commit
+ 
+ ---
+  mindnlp/transformers/cache_utils.py           |  28 +-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+index c6cd8757..625656eb 100644
+--- a/patches/0002-20251106commit.patch
++++ b/patches/0002-20251106commit.patch
+@@ -1,7 +1,7 @@
+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/5] 20251106commit
++Subject: [PATCH 2/6] 20251106commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+index 601960c9..dcb85080 100644
+--- a/patches/0003-20261106secondcommit.patch
++++ b/patches/0003-20261106secondcommit.patch
+@@ -1,7 +1,7 @@
+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/5] 20261106secondcommit
++Subject: [PATCH 3/6] 20261106secondcommit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+index 8976f10b..bbed13cc 100644
+--- a/patches/0004-20251106change.patch
++++ b/patches/0004-20251106change.patch
+@@ -1,7 +1,7 @@
+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-Subject: [PATCH 4/5] 20251106change
++Subject: [PATCH 4/6] 20251106change
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+index 8d9032be..b2d1035c 100644
+--- a/patches/0005-20251107001commit.patch
++++ b/patches/0005-20251107001commit.patch
+@@ -1,7 +1,7 @@
+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 11:48:18 +0800
+-Subject: [PATCH 5/5] 20251107001commit
++Subject: [PATCH 5/6] 20251107001commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |   91 +-
+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+new file mode 100644
+index 00000000..bffa134e
+--- /dev/null
++++ b/patches/0006-20251107002commit.patch
+@@ -0,0 +1,7931 @@
++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Fri, 7 Nov 2025 12:06:32 +0800
++Subject: [PATCH 6/6] 20251107002commit
++
++---
++ .../models/deepseek/modeling_deepseek.py      |  122 +-
++ patches/0001-20251104commit.patch             |    2 +-
++ patches/0002-20251106commit.patch             |    2 +-
++ patches/0003-20261106secondcommit.patch       |    2 +-
++ patches/0004-20251106change.patch             |    2 +-
++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
++ 6 files changed, 7773 insertions(+), 64 deletions(-)
++ create mode 100644 patches/0005-20251107001commit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index 8831e4b7..e7e1c053 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
++     #         expert_out = expert(x)
++     #         expert_cache += expert_out * weight
++     #     return expert_cache
++-
++-    # @no_grad()
++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++-    #     # x 的 shape: (1, hidden_size)
++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++-
++-    #     # 1. 收集所有需要的专家层
++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
++-
++-    #     # 2. 并行计算所有专家的输出
++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++-
++-    #     # 3. 使用矩阵乘法进行加权求和
++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++    
+++    @no_grad()
+++    dwj
+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++        # x 的 shape: (1, hidden_size)
+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++
+++        # 1. 收集所有需要的专家层
+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++
+++        # 2. 并行计算所有专家的输出
+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++        # ops.cat 会将它们堆叠成一个新的 Tensor
+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++
+++        # 3. 使用矩阵乘法进行加权求和
+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++        # 最终结果 final_output 的 shape: (1, hidden_size)
+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++         
++-    #     return final_output
+++        return final_output
++ 
++ 
++     # @no_grad()
++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
++ 
++         return expert_cache
++ # 放置在 DeepseekMoE 类中
++-    @no_grad()
++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++-        """
++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++-
++-        Args:
++-            x (Tensor): 输入张量, shape: (1, hidden_size)
++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++-        """
++-        top_k, _ = flat_expert_weights.shape
++-        hidden_size = x.shape[-1]
++-
++-        # 1. 将所有专家的权重堆叠起来
++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+++    # @no_grad()
+++    # #lwx 20251107
+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++    #     """
+++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+++
+++    #     Args:
+++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+++    #     """
+++    #     top_k, _ = flat_expert_weights.shape
+++    #     hidden_size = x.shape[-1]
+++
+++    #     # 1. 将所有专家的权重堆叠起来
+++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++         
++-        # 2. "收集" 所需的专家权重
++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
++-        selected_up_w = stacked_up_w[flat_expert_indices]
++-        selected_down_w = stacked_down_w[flat_expert_indices]
+++    #     # 2. "收集" 所需的专家权重
+++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+++    #     selected_up_w = stacked_up_w[flat_expert_indices]
+++    #     selected_down_w = stacked_down_w[flat_expert_indices]
++         
++-        # 3. 准备输入
++-        x_expanded = x.expand((top_k, 1, hidden_size))
+++    #     # 3. 准备输入
+++    #     x_expanded = x.expand((top_k, 1, hidden_size))
++         
++-        # 4. 并行计算 gate_proj 和 up_proj
++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+++    #     # 4. 并行计算 gate_proj 和 up_proj
+++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++ 
++-        # 5. 计算中间状态
++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+++    #     # 5. 计算中间状态
+++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++         
++-        # 6. 并行计算 down_proj
++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++-        #    --- [FIX] ---
++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++-        #    --- [FIX END] ---
+++    #     # 6. 并行计算 down_proj
+++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+++    #     #    --- [FIX] ---
+++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+++    #     #    --- [FIX END] ---
++         
++-        # 7. 根据路由权重进行加权求和
++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+++    #     # 7. 根据路由权重进行加权求和
+++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++ 
++-        return weighted_sum
+++    #     return weighted_sum
++ 
++ 
++ 
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++index 0a0ef2d7..2842180e 100644
++--- a/patches/0001-20251104commit.patch
+++++ b/patches/0001-20251104commit.patch
++@@ -1,7 +1,7 @@
++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++-Subject: [PATCH 1/4] 20251104commit
+++Subject: [PATCH 1/5] 20251104commit
++ 
++ ---
++  mindnlp/transformers/cache_utils.py           |  28 +-
++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++index 5185270c..c6cd8757 100644
++--- a/patches/0002-20251106commit.patch
+++++ b/patches/0002-20251106commit.patch
++@@ -1,7 +1,7 @@
++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++-Subject: [PATCH 2/4] 20251106commit
+++Subject: [PATCH 2/5] 20251106commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++index 3e05f821..601960c9 100644
++--- a/patches/0003-20261106secondcommit.patch
+++++ b/patches/0003-20261106secondcommit.patch
++@@ -1,7 +1,7 @@
++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++-Subject: [PATCH 3/4] 20261106secondcommit
+++Subject: [PATCH 3/5] 20261106secondcommit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++index 88a1aef4..8976f10b 100644
++--- a/patches/0004-20251106change.patch
+++++ b/patches/0004-20251106change.patch
++@@ -1,7 +1,7 @@
++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 15:48:09 +0800
++-Subject: [PATCH 4/4] 20251106change
+++Subject: [PATCH 4/5] 20251106change
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  189 +-
++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
++new file mode 100644
++index 00000000..8d9032be
++--- /dev/null
+++++ b/patches/0005-20251107001commit.patch
++@@ -0,0 +1,7707 @@
+++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Fri, 7 Nov 2025 11:48:18 +0800
+++Subject: [PATCH 5/5] 20251107001commit
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |   91 +-
+++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+++ patches/0001-20251104commit.patch             |    2 +-
+++ patches/0002-20251106commit.patch             |    2 +-
+++ patches/0003-20261106secondcommit.patch       |    2 +-
+++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+++ 7 files changed, 7577 insertions(+), 30 deletions(-)
+++ create mode 100644 patches/0004-20251106change.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index 0546f318..8831e4b7 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+++     #         expert_cache += expert_out * weight
+++     #     return expert_cache
+++ 
+++-    @no_grad()
+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++-        # x 的 shape: (1, hidden_size)
+++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++-
+++-        # 1. 收集所有需要的专家层
+++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++-
+++-        # 2. 并行计算所有专家的输出
+++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++-        # ops.cat 会将它们堆叠成一个新的 Tensor
+++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++-
+++-        # 3. 使用矩阵乘法进行加权求和
+++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++-        # 最终结果 final_output 的 shape: (1, hidden_size)
+++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++    # @no_grad()
++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++    #     # x 的 shape: (1, hidden_size)
++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++
++++    #     # 1. 收集所有需要的专家层
++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
++++
++++    #     # 2. 并行计算所有专家的输出
++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++
++++    #     # 3. 使用矩阵乘法进行加权求和
++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++         
+++-        return final_output
++++    #     return final_output
+++ 
+++ 
+++     # @no_grad()
+++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+++             )
+++ 
+++         return expert_cache
++++# 放置在 DeepseekMoE 类中
++++    @no_grad()
++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++        """
++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++++
++++        Args:
++++            x (Tensor): 输入张量, shape: (1, hidden_size)
++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++++        """
++++        top_k, _ = flat_expert_weights.shape
++++        hidden_size = x.shape[-1]
++++
++++        # 1. 将所有专家的权重堆叠起来
++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++++        
++++        # 2. "收集" 所需的专家权重
++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
++++        selected_up_w = stacked_up_w[flat_expert_indices]
++++        selected_down_w = stacked_down_w[flat_expert_indices]
++++        
++++        # 3. 准备输入
++++        x_expanded = x.expand((top_k, 1, hidden_size))
++++        
++++        # 4. 并行计算 gate_proj 和 up_proj
++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++++
++++        # 5. 计算中间状态
++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++++        
++++        # 6. 并行计算 down_proj
++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++++        #    --- [FIX] ---
++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++++        #    --- [FIX END] ---
++++        
++++        # 7. 根据路由权重进行加权求和
++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++++
++++        return weighted_sum
++++
++++
+++ 
+++         # @no_grad()
+++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++index ebd7782e..913a7609 100644
+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++-    x1 = x[..., : x.shape[-1] // 2]
+++-    x2 = x[..., x.shape[-1] // 2 :]
++++    # x1 = x[..., : x.shape[-1] // 2]
++++    # x2 = x[..., x.shape[-1] // 2 :]
+++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++index d059dcbe..2b217b64 100644
+++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++ def rotate_half(x):
+++     """Rotates half the hidden dims of the input."""
+++-    x1 = x[..., : x.shape[-1] // 2]
+++-    x2 = x[..., x.shape[-1] // 2 :]
++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++    # x1 = x[..., : x.shape[-1] // 2]
++++    # x2 = x[..., x.shape[-1] // 2 :]
++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++     return ops.cat((-x2, x1), dim=-1)
+++ 
+++ 
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++index 78f22642..0a0ef2d7 100644
+++--- a/patches/0001-20251104commit.patch
++++++ b/patches/0001-20251104commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+++-Subject: [PATCH 1/3] 20251104commit
++++Subject: [PATCH 1/4] 20251104commit
+++ 
+++ ---
+++  mindnlp/transformers/cache_utils.py           |  28 +-
+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++index 22b65dd5..5185270c 100644
+++--- a/patches/0002-20251106commit.patch
++++++ b/patches/0002-20251106commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+++-Subject: [PATCH 2/3] 20251106commit
++++Subject: [PATCH 2/4] 20251106commit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++index 966529e4..3e05f821 100644
+++--- a/patches/0003-20261106secondcommit.patch
++++++ b/patches/0003-20261106secondcommit.patch
+++@@ -1,7 +1,7 @@
+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+++-Subject: [PATCH 3/3] 20261106secondcommit
++++Subject: [PATCH 3/4] 20261106secondcommit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+++new file mode 100644
+++index 00000000..88a1aef4
+++--- /dev/null
++++++ b/patches/0004-20251106change.patch
+++@@ -0,0 +1,7498 @@
++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Thu, 6 Nov 2025 15:48:09 +0800
++++Subject: [PATCH 4/4] 20251106change
++++
++++---
++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
++++ patches/0001-20251104commit.patch             | 1272 +++++++
++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
++++ create mode 100644 patches/0001-20251104commit.patch
++++ create mode 100644 patches/0002-20251106commit.patch
++++ create mode 100644 patches/0003-20261106secondcommit.patch
++++
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index 2f9192bf..0546f318 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
++++ 
++++         return attn_output, attn_weights, past_key_value
++++ 
++++-# class DeepseekFlashAttention(nn.Module):
++++-#     """
++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++++-
++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
++++-#     """
++++-
++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++-#         super().__init__()
++++-#         self.config = config
++++-#         self.layer_idx = layer_idx
++++-#         if layer_idx is None:
++++-#             logger.warning(
++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++-#                 "when creating this class."
++++-#             )
++++-
++++-#         self.attention_dropout = config.attention_dropout
++++-#         self.hidden_size = config.hidden_size
++++-#         self.num_heads = config.num_attention_heads
++++-#         self.head_dim = self.hidden_size // self.num_heads
++++-#         self.num_key_value_heads = config.num_key_value_heads
++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++-#         self.max_position_embeddings = config.max_position_embeddings
++++-#         self.rope_theta = config.rope_theta
++++-#         self.is_causal = True
++++-
++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++-#             raise ValueError(
++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++-#                 f" and `num_heads`: {self.num_heads})."
++++-#             )
++++-
++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++-#         self._init_rope()
++++-
++++-#     def _init_rope(self):
++++-#         if self.config.rope_scaling is None:
++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
++++-#                 self.head_dim,
++++-#                 max_position_embeddings=self.max_position_embeddings,
++++-#                 base=self.rope_theta,
++++-#             )
++++-#         else:
++++-#             scaling_type = self.config.rope_scaling["type"]
++++-#             scaling_factor = self.config.rope_scaling["factor"]
++++-#             if scaling_type == "linear":
++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++-#                     self.head_dim,
++++-#                     max_position_embeddings=self.max_position_embeddings,
++++-#                     scaling_factor=scaling_factor,
++++-#                     base=self.rope_theta,
++++-#                 )
++++-#             elif scaling_type == "dynamic":
++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++-#                     self.head_dim,
++++-#                     max_position_embeddings=self.max_position_embeddings,
++++-#                     scaling_factor=scaling_factor,
++++-#                     base=self.rope_theta,
++++-#                 )
++++-#             else:
++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++-
++++-#     def forward(
++++-#         self,
++++-#         hidden_states: mindspore.Tensor,
++++-#         attention_mask: Optional[mindspore.Tensor] = None,
++++-#         position_ids: Optional[mindspore.Tensor] = None,
++++-#         past_key_value: Optional[Cache] = None,
++++-#         output_attentions: bool = False,
++++-#         use_cache: bool = False,
++++-#         **kwargs,
++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++-#         if "padding_mask" in kwargs:
++++-#             warnings.warn(
++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++-#             )
++++-        
++++-#         if output_attentions:
++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++++-
++++-#         bsz, q_len, _ = hidden_states.shape
++++-
++++-#         if self.config.pretraining_tp > 1:
++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++-
++++-#         query_states = self.q_proj(hidden_states)
++++-#         key_states = self.k_proj(hidden_states)
++++-#         value_states = self.v_proj(hidden_states)
++++-
++++-#         # Reshape for multi-head attention
++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++-
++++-#         kv_seq_len = key_states.shape[-2]
++++-#         if past_key_value is not None:
++++-#             if self.layer_idx is None:
++++-#                 raise ValueError(
++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++-#                     "with a layer index."
++++-#                 )
++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++-        
++++-#         # Apply Rotary Positional Embedding
++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++-
++++-#         if past_key_value is not None:
++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++-
++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++-        
++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++-        
++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++-
++++-#         # Convert attention_mask for flash_attention_score
++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++++-#         if attention_mask is not None:
++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++-#                 raise ValueError(
++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++-#                 )
++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++++-#         else:
++++-#             attn_mask_for_fa = None
++++-        
++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++-
++++-#         # Call the fused flash_attention_score operator
++++-#         attn_output = mindspore.ops.flash_attention_score(
++++-#             query=query_states_for_fa,
++++-#             key=key_states_for_fa,
++++-#             value=value_states_for_fa,
++++-#             head_num=self.num_heads, # This is N1, the number of query heads
++++-#             input_layout='BSH',
++++-#             attn_mask=attn_mask_for_fa,
++++-#             keep_prob=keep_prob,
++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++-#             sparse_mode=0 # Default mask mode
++++-#         )
++++-        
++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++++-#         attn_output = self.o_proj(attn_output)
++++-        
++++-#         # Flash Attention does not return attention weights
++++-#         attn_weights = None
++++-
++++-#         return attn_output, attn_weights, past_key_value
++++ 
++++ class DeepseekFlashAttention(nn.Module):
++++     """
++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
++++         super().__init__()
++++         self.hidden_size = config.hidden_size
++++ 
++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++++-            config=config, layer_idx=layer_idx
++++-        )
+++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+++++            # config=config, layer_idx=layer_idx
+++++        # )
++++ 
++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++++             config=config, layer_idx=layer_idx
++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
++++         return outputs
++++ 
++++ 
++++-
++++ class DeepseekPreTrainedModel(PreTrainedModel):
++++     config_class = DeepseekConfig
++++     base_model_prefix = "model"
++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++         # Initialize weights and apply final processing
++++         self.post_init()
++++         self.warm_up = False
++++-        #@dwj
++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++++-            self.num_layers,
++++-            self.num_attention_heads,
++++-            self.head_dim,
++++-            batch_size=1,
++++-            max_length=self.max_length,
++++-            dtype=mindspore.float16
++++-        )
++++-
++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++++-        key_cache = []
++++-        value_cache = []
++++-        for _ in range(num_layers):
++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++-            key_cache.append(k)
++++-            value_cache.append(v)
++++-        return key_cache, value_cache
++++-
++++ 
++++     def warmup_moe_model_deep(self):
++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++new file mode 100644
++++index 00000000..78f22642
++++--- /dev/null
+++++++ b/patches/0001-20251104commit.patch
++++@@ -0,0 +1,1272 @@
+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++Subject: [PATCH 1/3] 20251104commit
+++++
+++++---
+++++ mindnlp/transformers/cache_utils.py           |  28 +-
+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++++
+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++index cadd2e04..02f8d4be 100644
+++++--- a/mindnlp/transformers/cache_utils.py
++++++++ b/mindnlp/transformers/cache_utils.py
+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++             # k_out[:, :, cache_position] = key_states
+++++             # v_out[:, :, cache_position] = value_states
+++++-            if ON_ORANGE_PI:
+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++-            else:
+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++-
++++++            # if ON_ORANGE_PI:
++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++            # else:
++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++++            if cache_position.ndim > 1:
++++++                cache_position = cache_position.flatten()
++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++++                cache_position = cache_position.int()
++++++            
++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++++            k_out[:, :, cache_position] = key_states
++++++            v_out[:, :, cache_position] = value_states
++++++            
+++++         return k_out, v_out
+++++ 
+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index c695b944..d8303e45 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++-    x1 = x[..., : x.shape[-1] // 2]
+++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++    # x2 = x[..., x.shape[-1] // 2 :]
++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++         if self.training:
+++++             raise NotImplementedError("Training is not supported yet.")
+++++         else:
+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++-        if self.config.n_shared_experts is not None:
+++++-            y = y + self.shared_experts(identity)
+++++-        return y
++++++            # @lwx
++++++            if orig_shape[1] == 1:
++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++++                y=y.view(*orig_shape)
++++++                if self.config.n_shared_experts is not None:
++++++                    y = y + self.shared_experts(identity)
++++++                return y
++++++            else:
++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++++                if self.config.n_shared_experts is not None:
++++++                    y = y + self.shared_experts(identity)
++++++                return y
++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++        # if self.config.n_shared_experts is not None:
++++++        #     y = y + self.shared_experts(identity)
++++++        # return y
++++++        
++++++    @no_grad()
++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++
++++++        expert_cache = ops.zeros_like(x)
++++++        for i in range(self.num_experts_per_tok):
++++++            expert_id = flat_expert_indices[i].item()
++++++            weight = flat_expert_weights[i].item()
++++++            expert = self.experts[expert_id]
++++++            expert_out = expert(x)
++++++            expert_cache += expert_out * weight
++++++        return expert_cache
+++++ 
+++++     @no_grad()
+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-        # expert_cache = torch.zeros_like(x)
+++++-        # idxs = flat_expert_indices.argsort()
+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++-        # token_idxs = idxs // self.num_experts_per_tok
+++++-        # for i, end_idx in enumerate(tokens_per_expert):
+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++-        #     if start_idx == end_idx:
+++++-        #         continue
+++++-        #     expert = self.experts[i]
+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++-        #     expert_tokens = x[exp_token_idx]
+++++-        #     expert_out = expert(expert_tokens)
+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++-        # return expert_cache
++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++         expert_cache = ops.zeros_like(x)
+++++         idxs = flat_expert_indices.argsort()
+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++         token_idxs = idxs // self.num_experts_per_tok
++++++
+++++         for i, end_idx in enumerate(tokens_per_expert):
+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++             if start_idx == end_idx:
+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++             expert_out = expert(expert_tokens)
+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++
+++++         return expert_cache
++++++        
++++++    # @no_grad()
++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++    #     # expert_cache = torch.zeros_like(x)
++++++    #     # idxs = flat_expert_indices.argsort()
++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++    #     #     if start_idx == end_idx:
++++++    #     #         continue
++++++    #     #     expert = self.experts[i]
++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #     #     expert_tokens = x[exp_token_idx]
++++++    #     #     expert_out = expert(expert_tokens)
++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++    #     # return expert_cache
++++++    #     expert_cache = ops.zeros_like(x)
++++++    #     idxs = flat_expert_indices.argsort()
++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++    #         if start_idx == end_idx:
++++++    #             continue
++++++    #         expert = self.experts[i]
++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #         expert_tokens = x[exp_token_idx]
++++++    #         expert_out = expert(expert_tokens)
++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++
++++++    #     return expert_cache
++++++    # @no_grad()
++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++    #     expert_cache = ops.zeros_like(x)
++++++
++++++    #     # 排序保证顺序一致
++++++    #     idxs = flat_expert_indices.argsort()
++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++    #     # 找出有 token 的专家
++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++
++++++    #     for i in active_experts.tolist():
++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++    #         end_idx = tokens_per_expert[i]
++++++    #         if start_idx == end_idx:  # 没有 token
++++++    #             continue
++++++
++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++    #         expert_tokens = x[exp_token_idx]
++++++    #         expert_out = self.experts[i](expert_tokens)
++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++
++++++    #         expert_cache = mindspore.mint.scatter_add(
++++++    #             expert_cache,
++++++    #             0,
++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++    #             expert_out
++++++    #         )
++++++
++++++    #     return expert_cache
++++++
++++++
+++++ 
+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++ #     """
+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++ 
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
++++++        self.warm_up = False
++++++
++++++    def warmup_moe_model_deep(self):
++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++        test_texts = [
++++++            "warmup short",
++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++++        ]
++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++        if tokenizer is None:
++++++            from mindnlp.transformers import AutoTokenizer
++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++            self._warmup_tokenizer = tokenizer
++++++
++++++        for text in test_texts:
++++++            inputs = tokenizer(text, return_tensors="ms")
++++++            with mindspore._no_grad():
++++++                _ = self(**inputs, use_cache=False)
++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++ 
+++++     def get_input_embeddings(self):
+++++         return self.model.embed_tokens
+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++         ```"""
++++++        if not self.warm_up:
++++++            self.warm_up = True
++++++            self.warmup_moe_model_deep()
++++++
+++++         output_attentions = (
+++++             output_attentions
+++++             if output_attentions is not None
+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++index 3cbf820e..d4c6b651 100644
+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++@@ -18,7 +18,6 @@
+++++ # See the License for the specific language governing permissions and
+++++ # limitations under the License.
+++++ """MindSpore Qwen2MoE model."""
+++++-
+++++ import math
+++++ from typing import List, Optional, Tuple, Union
+++++ 
+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++     TokenClassifierOutput,
+++++ )
+++++ from ...modeling_utils import PreTrainedModel
++++++from ...generation import GenerationMixin
+++++ from ....utils import logging
+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++++ 
+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++         self.variance_epsilon = eps
+++++ 
+++++     def forward(self, hidden_states):
++++++        # @dwj
++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++        # @lwx
++++++        # if not self.training :
++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++         input_dtype = hidden_states.dtype
+++++         hidden_states = hidden_states.to(mindspore.float32)
+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++@@ -234,6 +239,8 @@ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++     x1 = x[..., : x.shape[-1] // 2]
+++++     x2 = x[..., x.shape[-1] // 2 :]
++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++         self.config = config
+++++         self.hidden_size = config.hidden_size
+++++         self.intermediate_size = intermediate_size
++++++        
+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++         self.act_fn = ACT2FN[config.hidden_act]
+++++ 
+++++     def forward(self, x):
+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++-
+++++ 
++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++        # @lwx 
++++++        # gate_up_output = self.gate_up_proj(x)
++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++++        # return self.down_proj(swiglu_output)
++++++
++++++    # def forward(self, x):
++++++    #     gate_proj_out = self.gate_proj(x)
++++++    #     up_proj_out = self.up_proj(x)
++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++++    #     return self.down_proj(swiglu_out)
++++++        
+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++     """
+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++         use_cache: bool = False,
+++++         cache_position: Optional[mindspore.Tensor] = None,
+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++        
++++++
+++++         bsz, q_len, _ = hidden_states.shape
+++++ 
+++++         query_states = self.q_proj(hidden_states)
+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++                     "with a layer index."
+++++                 )
+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++            if isinstance(past_key_value, StaticCache):
++++++                kv_seq_len = key_states.shape[-2]
++++++            else:
++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++ 
+++++         if past_key_value is not None:
+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++            
++++++            if isinstance(past_key_value, StaticCache):
++++++                kv_seq_len = key_states.shape[-2]
+++++ 
+++++         # repeat k/v heads if n_kv_heads < n_heads
+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++-
++++++        
+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++ 
+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++-            raise ValueError(
+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++-                f" {attn_weights.shape}"
+++++-            )
+++++-
+++++-        if attention_mask is not None:  # no matter the length, we just slice it
+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++++        if attention_mask is not None:
++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++             attn_weights = attn_weights + causal_mask
+++++ 
+++++         # upcast attention to fp32
+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++ 
+++++         attn_output = self.o_proj(attn_output)
+++++-
++++++        # @lwx
++++++        
++++++        # max_seq_len = self.max_position_embeddings  # 2048
++++++
++++++        # if attention_mask is not None:
++++++        #     # attention_mask: [B, 1, Sq, Sk]
++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++
++++++        #     # pad 到 [max_seq_len, max_seq_len]
++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++        #     global_attention_mask = padded_mask
++++++        # else:
++++++        #     global_attention_mask = None
++++++
++++++
++++++        # sparse_mode=3
++++++        # attn_output = mindspore.ops.flash_attention_score(
++++++        #     query=query_states,
++++++        #     key=key_states,
++++++        #     value=value_states,
++++++        #     real_shift=None,
++++++        #     padding_mask=None,
++++++
++++++        #     head_num=self.num_heads,
++++++        #     attn_mask=global_attention_mask,
++++++        #     keep_prob=1.0 - self.attention_dropout,
++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++        #     input_layout="BNSD",
++++++        #     pre_tokens=2147483647,
++++++        #     next_tokens=2147483647,
++++++        #     inner_precise=0,
++++++        #     drop_mask=None,
++++++        #     prefix=None,
++++++        #     actual_seq_qlen=None,
++++++        #     actual_seq_kvlen=None,
++++++        #     sparse_mode=sparse_mode,
++++++        # )
+++++         if not output_attentions:
+++++             attn_weights = None
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++ 
++++++class Qwen2MoeFlashAttention(nn.Module):
++++++    """
++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++
++++++    关键改动:
++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++       直接传入原始的 key 和 value 张量效率更高。
++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++    """
++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++        super().__init__()
++++++        self.config = config
++++++        self.layer_idx = layer_idx
++++++        self.hidden_size = config.hidden_size
++++++        self.num_heads = config.num_attention_heads
++++++        self.head_dim = self.hidden_size // self.num_heads
++++++        self.num_key_value_heads = config.num_key_value_heads
++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++        self.max_position_embeddings = config.max_position_embeddings
++++++        self.rope_theta = config.rope_theta
++++++        self.attention_dropout = config.attention_dropout
++++++
++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++            raise ValueError(
++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++            )
++++++
++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++
++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++            self.head_dim,
++++++            max_position_embeddings=self.max_position_embeddings,
++++++            base=self.rope_theta,
++++++        )
++++++
++++++    def forward(
++++++        self,
++++++        hidden_states: mindspore.Tensor,
++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++        past_key_value: Optional[Cache] = None,
++++++        output_attentions: bool = False,
++++++        use_cache: bool = False,
++++++        cache_position: Optional[mindspore.Tensor] = None,
++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++        bsz, q_len, _ = hidden_states.shape
++++++
++++++        # 1. 线性投射 Q, K, V
++++++        query_states = self.q_proj(hidden_states)
++++++        key_states = self.k_proj(hidden_states)
++++++        value_states = self.v_proj(hidden_states)
++++++
++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++        # 3. RoPE 旋转位置编码
++++++        kv_seq_len = key_states.shape[-2]
++++++        if past_key_value is not None:
++++++            if self.layer_idx is None:
++++++                raise ValueError(
++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++                    "with a layer index."
++++++                )
++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++                if cache_position.shape[0] == 1:
++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++                    kv_seq_len = past_seen_tokens + 1
++++++                else:
++++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++            else:
++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++        # 4. KV 缓存更新
++++++        if past_key_value is not None:
++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++            key_states, value_states = past_key_value.update(
++++++                key_states, value_states, self.layer_idx, cache_kwargs
++++++            )
++++++            
++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++                if cache_position.shape[0] == 1:
++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++                    kv_seq_len = key_states.shape[-2]
++++++
++++++        # 5. [重要] 准备 Attention Mask
++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++        fa_attention_mask = None
++++++        if attention_mask is not None:
++++++            # 截取与当前key长度匹配的部分
++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++            fa_attention_mask = (mask_slice != 0)
++++++
++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++        input_dtype = query_states.dtype
++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++            query_states = query_states.to(mindspore.float16)
++++++            key_states = key_states.to(mindspore.float16)
++++++            value_states = value_states.to(mindspore.float16)
++++++
++++++        # 6. [核心] 调用 flash_attention_score 算子
++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++        attn_output = mindspore.ops.flash_attention_score(
++++++            query=query_states,
++++++            key=key_states,
++++++            value=value_states,
++++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++++            attn_mask=fa_attention_mask,
++++++            keep_prob=1.0 - self.attention_dropout,
++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++++            input_layout="BNSD",
++++++            sparse_mode=0 # 使用 defaultMask 模式
++++++        )
++++++
++++++        # 恢复原始数据类型
++++++        attn_output = attn_output.to(input_dtype)
++++++
++++++        # 7. 调整输出形状
++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++        attn_output = self.o_proj(attn_output)
++++++
++++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++++        attn_weights = None
++++++        if output_attentions:
++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++        return attn_output, attn_weights, past_key_value
++++++
++++++    # def forward(
++++++    #     self,
++++++    #     hidden_states: mindspore.Tensor,
++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++    #     past_key_value: Optional[Cache] = None,
++++++    #     output_attentions: bool = False,
++++++    #     use_cache: bool = False,
++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++    #     bsz, q_len, _ = hidden_states.shape
++++++
++++++    #     # 1. 线性投射 Q, K, V
++++++    #     query_states = self.q_proj(hidden_states)
++++++    #     key_states = self.k_proj(hidden_states)
++++++    #     value_states = self.v_proj(hidden_states)
++++++
++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++    #     # 3. RoPE 旋转位置编码
++++++    #     kv_seq_len = key_states.shape[-2]
++++++    #     if past_key_value is not None:
++++++    #         if self.layer_idx is None:
++++++    #             raise ValueError(
++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++    #                 "with a layer index."
++++++    #             )
++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++    #     # 4. KV 缓存更新
++++++    #     if past_key_value is not None:
++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++    #         key_states, value_states = past_key_value.update(
++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++    #         )
++++++
++++++    #     # 5. 准备 Attention Mask
++++++    #     fa_attention_mask = None
++++++    #     if attention_mask is not None:
++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++    #         fa_attention_mask = (mask_slice != 0)
++++++
++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++    #     input_dtype = query_states.dtype
++++++
++++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++    #         query=query_states,
++++++    #         key=key_states,
++++++    #         value=value_states,
++++++    #         head_num=self.num_heads,
++++++    #         attn_mask=fa_attention_mask,
++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++    #         input_layout="BNSD",
++++++    #         sparse_mode=0,
++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++    #         inner_precise=1
++++++    #     )
++++++
++++++    #     # 恢复原始数据类型
++++++    #     attn_output = attn_output.to(input_dtype)
++++++
++++++    #     # 7. 调整输出形状
++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++    #     attn_output = self.o_proj(attn_output)
++++++
++++++    #     attn_weights = None
++++++    #     if output_attentions:
++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++    #     return attn_output, attn_weights, past_key_value
++++++
++++++    # def forward(
++++++    #     self,
++++++    #     hidden_states: mindspore.Tensor,
++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++    #     past_key_value: Optional[Cache] = None,
++++++    #     output_attentions: bool = False,
++++++    #     use_cache: bool = False,
++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++    #     bsz, q_len, _ = hidden_states.shape
++++++
++++++    #     query_states = self.q_proj(hidden_states)
++++++    #     key_states = self.k_proj(hidden_states)
++++++    #     value_states = self.v_proj(hidden_states)
++++++
++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++    #     kv_seq_len = key_states.shape[-2]
++++++    #     if past_key_value is not None:
++++++    #         if self.layer_idx is None:
++++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++    #     if past_key_value is not None:
++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++    #         key_states, value_states = past_key_value.update(
++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++    #         )
++++++
++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++
++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++++    #     # <--- 修改结束 ---
++++++
++++++    #     fa_attention_mask = None
++++++    #     if attention_mask is not None:
++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++    #         fa_attention_mask = (mask_slice != 0)
++++++
++++++    #     input_dtype = query_states.dtype
++++++
++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++++    #         key=key_states,
++++++    #         value=value_states,
++++++    #         head_num=self.num_heads,
++++++    #         attn_mask=fa_attention_mask,
++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++    #         input_layout="BNSD",
++++++    #         sparse_mode=0,
++++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++++    #     )
++++++
++++++    #     attn_output = attn_output.to(input_dtype)
++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++    #     attn_output = self.o_proj(attn_output)
++++++
++++++    #     attn_weights = None
++++++    #     if output_attentions:
++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++
++++++    #     return attn_output, attn_weights, past_key_value
++++++
+++++ QWEN2MOE_ATTENTION_CLASSES = {
+++++     "eager": Qwen2MoeAttention,
++++++    "flash-attention": Qwen2MoeFlashAttention,
+++++ }
+++++ 
+++++ 
+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++ 
++++++    #@dwj
++++++    # 只遍历激活的专家，而非全部专家
+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++++-        # router_logits: (batch * sequence_length, n_experts)
+++++-        router_logits = self.gate(hidden_states)
+++++-
+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-        if self.norm_topk_prob:
+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        # we cast back to the input dtype
+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++++-
+++++-        final_hidden_states = ops.zeros(
+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++-        )
+++++-
+++++-        # One hot encode the selected experts to create an expert mask
+++++-        # this will be used to easily index which expert is going to be sollicitated
+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++-
+++++-        # Loop over all available experts in the model and perform the computation on each expert
+++++-        for expert_idx in range(self.num_experts):
+++++-            expert_layer = self.experts[expert_idx]
+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++-
+++++-            # Index the correct hidden states and compute the expert hidden state for
+++++-            # the current expert. We need to make sure to multiply the output hidden
+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++-            if 0 not in idx.shape:
+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++-
+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++++-                # the `top_x` tensor here.
+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++-
+++++-        shared_expert_output = self.shared_expert(hidden_states)
+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++-
+++++-        final_hidden_states = final_hidden_states + shared_expert_output
++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++            num_tokens = hidden_states_reshaped.shape[0]
++++++
++++++            router_logits = self.gate(hidden_states_reshaped)
++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++
++++++            if self.norm_topk_prob:
++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++++            
++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++            flat_selected_experts = selected_experts.flatten()
++++++            
++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++            token_indices = broadcasted_token_indices.flatten()
++++++            
++++++            active_experts = ops.unique(flat_selected_experts)
++++++            
++++++            for expert_idx_tensor in active_experts:
++++++                expert_idx = expert_idx_tensor.item()
++++++                expert_layer = self.experts[expert_idx]
++++++                
++++++                mask = (flat_selected_experts == expert_idx_tensor)
++++++                selected_token_indices = token_indices[mask]
++++++                selected_routing_weights = routing_weights.flatten()[mask]
++++++                
++++++                current_states = hidden_states_reshaped[selected_token_indices]
++++++                
++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++                
++++++                final_hidden_states = final_hidden_states.index_add(
++++++                    dim=0,
++++++                    index=selected_token_indices,
++++++                    source=expert_output.to(hidden_states.dtype)
++++++                )
++++++            
++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++ 
+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++-        return final_hidden_states, router_logits
++++++            final_hidden_states = final_hidden_states + shared_expert_output
++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++            
++++++            return final_hidden_states, router_logits
+++++ 
+++++ 
+++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++ 
+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++ 
++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++
+++++         if (layer_idx not in config.mlp_only_layers) and (
+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++         ):
+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++     _skip_keys_device_placement = "past_key_values"
+++++     _supports_cache_class = True
++++++#lwx
++++++    # _supports_static_cache = True
+++++ 
+++++     def _init_weights(self, module):
+++++         std = self.config.initializer_range
+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++         return causal_mask
+++++ 
+++++ 
+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++     _tied_weights_keys = ["lm_head.weight"]
+++++ 
+++++     def __init__(self, config):
+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++         self.num_experts_per_tok = config.num_experts_per_tok
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
++++++        # @lwx
++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++++        #     self.generation_config.cache_implementation = "static"
++++++        self._warmed_up = False
++++++
++++++    def warmup_moe_model(self):
++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++++        test_texts = [
++++++            "warmup short",
++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++++        ]
++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++        if tokenizer is None:
++++++            from mindnlp.transformers import AutoTokenizer
++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++            self._warmup_tokenizer = tokenizer
++++++
++++++        for text in test_texts:
++++++            inputs = tokenizer(text, return_tensors="ms")
++++++            with mindspore._no_grad():
++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++ 
+++++     def get_input_embeddings(self):
+++++         return self.model.embed_tokens
+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++         ```"""
++++++        if not self._warmed_up:
++++++            self._warmed_up = True
++++++            self.warmup_moe_model()
+++++ 
+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++         output_router_logits = (
+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++             }
+++++         )
+++++         return model_inputs
++++++# @lwx
++++++    # def _decode_one_tokens_logits(
++++++    #     self,
++++++    #     cur_token: mindspore.Tensor,
++++++    #     input_pos: Optional[mindspore.Tensor],
++++++    #     cache_position: mindspore.Tensor,
++++++    #     past_key_values: StaticCache,
++++++    # ) -> mindspore.Tensor:
++++++    #     """
++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++++        
++++++    #     Args:
++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++++    #         input_pos: 输入位置信息，可选
++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++++            
++++++    #     Returns:
++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++++    #     """
++++++    #     # 调用JIT编译的版本
++++++    #     return self.get_decode_one_tokens_logits(
++++++    #         cur_token=cur_token,
++++++    #         input_pos=input_pos,
++++++    #         cache_position=cache_position,
++++++    #         past_key_values=past_key_values,
++++++    #     )
++++++    
++++++    # @mindspore.jit(jit_level='O1')
++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++++    #     """
++++++    #     JIT编译的函数，用于高效的单token解码
++++++    #     使用JIT编译优化以支持静态shape和高效执行
++++++        
++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++++    #     """
++++++    #     outputs = self.model.forward(
++++++    #         input_ids=cur_token,
++++++    #         position_ids=input_pos,
++++++    #         cache_position=cache_position,
++++++    #         past_key_values=past_key_values,
++++++    #         use_cache=True,
++++++    #         return_dict=False,
++++++    #     )
++++++        
++++++    #     hidden_states = outputs[0]
++++++    #     logits = self.lm_head.forward(hidden_states)
++++++    #     logits = logits.float()
++++++        
++++++    #     return logits[:, -1, :]
++++++
++++++    # def _sample(
++++++    #     self,
++++++    #     input_ids: mindspore.Tensor,
++++++    #     logits_processor,
++++++    #     stopping_criteria,
++++++    #     generation_config,
++++++    #     synced_devices: bool,
++++++    #     streamer=None,
++++++    #     logits_warper=None,
++++++    #     **model_kwargs,
++++++    # ):
++++++    #     """
++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++++    #     """
++++++    #     from ...generation.logits_process import LogitsProcessorList
++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++++    #     from mindnlp.core import nn, ops, no_grad
++++++    #     import numpy as np
++++++        
++++++    #     # 检查是否使用 StaticCache
++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++++    #     # 否则，直接调用父类方法
++++++    #     past_key_values = model_kwargs.get("past_key_values")
++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++++
++++++    #     if not isinstance(past_key_values, StaticCache):
++++++    #         # 不使用 StaticCache，直接调用父类方法
++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++++    #         return super()._sample(
++++++    #             input_ids=input_ids,
++++++    #             logits_processor=logits_processor,
++++++    #             stopping_criteria=stopping_criteria,
++++++    #             generation_config=generation_config,
++++++    #             synced_devices=synced_devices,
++++++    #             streamer=streamer,
++++++    #             logits_warper=logits_warper,
++++++    #             **model_kwargs,
++++++    #         )
++++++        
++++++    #     # 使用 StaticCache，进入自定义循环
++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++++    #     pad_token_id = generation_config._pad_token_tensor
++++++    #     output_attentions = generation_config.output_attentions
++++++    #     output_hidden_states = generation_config.output_hidden_states
++++++    #     output_scores = generation_config.output_scores
++++++    #     output_logits = generation_config.output_logits
++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++++    #     max_length = generation_config.max_length
++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++++    #     do_sample = generation_config.do_sample
++++++        
++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++++    #         raise ValueError(
++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++++    #             f"{logits_warper})."
++++++    #         )
++++++        
++++++    #     # init attention / hidden states / scores tuples
++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++++        
++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++++    #         encoder_hidden_states = (
++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++++    #         )
++++++        
++++++    #     # keep track of which sequences are already finished
++++++    #     batch_size, cur_len = input_ids.shape
++++++    #     this_peer_finished = False
++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++++        
++++++    #     time_record = []
++++++    #     from ....utils.testing_utils import parse_flag_from_env
++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++++        
++++++    #     while self._has_unfinished_sequences(
++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++++    #     ):
++++++    #         if _record_time:
++++++    #             import time as time_module
++++++    #             infer_start = time_module.time()
++++++            
++++++    #         # prepare model inputs
++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++++            
++++++    #         # prepare variable output controls
++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++++            
++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++++    #         cur_cache_position = model_inputs.get("cache_position")
++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++++    #         cur_input_ids = model_inputs.get("input_ids")
++++++            
++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++++    #             cur_cache_position is not None and 
++++++    #             len(cur_cache_position.shape) > 0 and
++++++    #             cur_cache_position.shape[0] == 1 and
++++++    #             cur_input_ids is not None and
++++++    #             cur_input_ids.shape[1] == 1):
++++++    #             # 使用 JIT 优化的单 token 解码
++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++++    #             if not hasattr(self, '_jit_used'):
++++++    #                 self._jit_used = False
++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++++                
++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++++    #                 cur_token=cur_input_ids,
++++++    #                 input_pos=model_inputs.get("position_ids"),
++++++    #                 cache_position=cur_cache_position,
++++++    #                 past_key_values=cur_past_key_values,
++++++    #             )
++++++                
++++++    #             # 标记已使用JIT（用于后续判断）
++++++    #             if not self._jit_used:
++++++    #                 self._jit_used = True
++++++                
++++++    #             # 构造兼容的输出对象
++++++    #             class JitOptimizedOutput:
++++++    #                 def __init__(self, logits, config):
++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++++    #                     self.config = config
++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++++    #                     self.cross_attentions = None
++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++++                
++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++++    #         else:
++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++++    #             outputs = self(**model_inputs, return_dict=True)
++++++            
++++++    #         if synced_devices and this_peer_finished:
++++++    #             continue
++++++            
++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++++    #         next_token_logits = outputs.logits[:, -1, :]
++++++            
++++++    #         # pre-process distribution
++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++++    #         if do_sample:
++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++++            
++++++    #         # Store scores, attentions and hidden_states when required
++++++    #         if return_dict_in_generate:
++++++    #             if output_scores:
++++++    #                 scores += (next_token_scores,)
++++++    #             if output_logits:
++++++    #                 raw_logits += (next_token_logits,)
++++++    #             if output_attentions:
++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++++    #                 if self.config.is_encoder_decoder:
++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++++                
++++++    #             if output_hidden_states:
++++++    #                 hidden = (
++++++    #                     outputs.decoder_hidden_states
++++++    #                     if self.config.is_encoder_decoder
++++++    #                     else outputs.hidden_states
++++++    #                 )
++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++++            
++++++    #         # token selection
++++++    #         if do_sample:
++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++++    #         else:
++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++++            
++++++    #         # finished sentences should have their next token be a padding token
++++++    #         if has_eos_stopping_criteria:
++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++++            
++++++    #         # update generated ids, model inputs, and length for next step
++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++++    #         if streamer is not None:
++++++    #             streamer.put(next_tokens)
++++++            
++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++++    #             outputs,
++++++    #             model_kwargs,
++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++++    #         )
++++++            
++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++++    #         cur_len += 1
++++++            
++++++    #         if _record_time:
++++++    #             import time as time_module
++++++    #             infer_stop = time_module.time()
++++++    #             time_record.append(infer_stop - infer_start)
++++++            
++++++    #         del outputs
++++++        
++++++    #     average_infer_time = None
++++++    #     if time_record:
++++++    #         if len(time_record) > 1:
++++++    #             time_record.pop(0)
++++++    #         average_infer_time = sum(time_record) / len(time_record)
++++++    #         print(f'average inference time is: {average_infer_time}')
++++++    #         print(f'inference time record: {time_record}')
++++++        
++++++    #     if streamer is not None:
++++++    #         streamer.end()
++++++        
++++++    #     # 简单判断：打印是否使用了JIT路径
++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++++    #     else:
++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++++        
++++++    #     if return_dict_in_generate:
++++++    #         if self.config.is_encoder_decoder:
++++++    #             return GenerateEncoderDecoderOutput(
++++++    #                 sequences=input_ids,
++++++    #                 scores=scores,
++++++    #                 logits=raw_logits,
++++++    #                 encoder_attentions=encoder_attentions,
++++++    #                 encoder_hidden_states=encoder_hidden_states,
++++++    #                 decoder_attentions=decoder_attentions,
++++++    #                 cross_attentions=cross_attentions,
++++++    #                 decoder_hidden_states=decoder_hidden_states,
++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++    #                 average_infer_time=average_infer_time
++++++    #             )
++++++    #         else:
++++++    #             return GenerateDecoderOnlyOutput(
++++++    #                 sequences=input_ids,
++++++    #                 scores=scores,
++++++    #                 logits=raw_logits,
++++++    #                 attentions=decoder_attentions,
++++++    #                 hidden_states=decoder_hidden_states,
++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++    #                 average_infer_time=average_infer_time
++++++    #             )
++++++    #     else:
++++++    #         return input_ids
++++++            
++++++    # def _prepare_cache_for_generation(
++++++    #     self,
++++++    #     generation_config,
++++++    #     model_kwargs,
++++++    #     assistant_model,
++++++    #     batch_size,
++++++    #     max_cache_length,
++++++    # ):
++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++++    #         generation_config.cache_implementation = "static"
++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++++        
++++++    #     if generation_config.cache_implementation == "static":
++++++    #         base_required_from_max_length = generation_config.max_length + 1
++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++++    #         min_cache_size = 50
++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++++    #         else:
++++++    #             max_cache_length = max(base_required, min_cache_size)
++++++            
++++++    #         original_max_cache_length = max_cache_length
++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++++            
++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++    #             if max_cache_length > self.config.max_position_embeddings:
++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++++        
++++++    #     result = super()._prepare_cache_for_generation(
++++++    #         generation_config=generation_config,
++++++    #         model_kwargs=model_kwargs,
++++++    #         assistant_model=assistant_model,
++++++    #         batch_size=batch_size,
++++++    #         max_cache_length=max_cache_length,
++++++    #     )
++++++        
++++++    #     if generation_config.cache_implementation == "static":
++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++++    #         created_cache = model_kwargs.get(cache_name)
++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++++        
++++++    #     return result
++++++
++++++
++++++
+++++ 
+++++ 
+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++-- 
+++++2.27.0
+++++
++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++++new file mode 100644
++++index 00000000..22b65dd5
++++--- /dev/null
+++++++ b/patches/0002-20251106commit.patch
++++@@ -0,0 +1,3200 @@
+++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Thu, 6 Nov 2025 09:20:38 +0800
+++++Subject: [PATCH 2/3] 20251106commit
+++++
+++++---
+++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+++++ create mode 100644 patches/0001-20251104commit.patch
+++++
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index d8303e45..73773c22 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+++++         #     y = y + self.shared_experts(identity)
+++++         # return y
+++++         
++++++    # @no_grad()
++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++
++++++    #     expert_cache = ops.zeros_like(x)
++++++    #     for i in range(self.num_experts_per_tok):
++++++    #         expert_id = flat_expert_indices[i].item()
++++++    #         weight = flat_expert_weights[i].item()
++++++    #         expert = self.experts[expert_id]
++++++    #         expert_out = expert(x)
++++++    #         expert_cache += expert_out * weight
++++++    #     return expert_cache
++++++
+++++     @no_grad()
+++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++        # x 的 shape: (1, hidden_size)
++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++++
++++++        # 1. 收集所有需要的专家层
++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++++++
++++++        # 2. 并行计算所有专家的输出
++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++++
++++++        # 3. 使用矩阵乘法进行加权求和
++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++++        
++++++        return final_output
+++++ 
+++++-        expert_cache = ops.zeros_like(x)
+++++-        for i in range(self.num_experts_per_tok):
+++++-            expert_id = flat_expert_indices[i].item()
+++++-            weight = flat_expert_weights[i].item()
+++++-            expert = self.experts[expert_id]
+++++-            expert_out = expert(x)
+++++-            expert_cache += expert_out * weight
+++++-        return expert_cache
+++++ 
+++++     @no_grad()
+++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+++++             key_states = self.k_proj(hidden_states)
+++++             value_states = self.v_proj(hidden_states)
+++++ 
+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++        # @lwx
++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++++ 
+++++         kv_seq_len = key_states.shape[-2]
+++++         if past_key_value is not None:
+++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++ 
++++++# class DeepseekFlashAttention(nn.Module):
++++++#     """
++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++++++
++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
++++++#     """
++++++
++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++++#         super().__init__()
++++++#         self.config = config
++++++#         self.layer_idx = layer_idx
++++++#         if layer_idx is None:
++++++#             logger.warning(
++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++#                 "when creating this class."
++++++#             )
++++++
++++++#         self.attention_dropout = config.attention_dropout
++++++#         self.hidden_size = config.hidden_size
++++++#         self.num_heads = config.num_attention_heads
++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++#         self.rope_theta = config.rope_theta
++++++#         self.is_causal = True
++++++
++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++#             raise ValueError(
++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++#                 f" and `num_heads`: {self.num_heads})."
++++++#             )
++++++
++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++++#         self._init_rope()
++++++
++++++#     def _init_rope(self):
++++++#         if self.config.rope_scaling is None:
++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
++++++#                 self.head_dim,
++++++#                 max_position_embeddings=self.max_position_embeddings,
++++++#                 base=self.rope_theta,
++++++#             )
++++++#         else:
++++++#             scaling_type = self.config.rope_scaling["type"]
++++++#             scaling_factor = self.config.rope_scaling["factor"]
++++++#             if scaling_type == "linear":
++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++++#                     self.head_dim,
++++++#                     max_position_embeddings=self.max_position_embeddings,
++++++#                     scaling_factor=scaling_factor,
++++++#                     base=self.rope_theta,
++++++#                 )
++++++#             elif scaling_type == "dynamic":
++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++++#                     self.head_dim,
++++++#                     max_position_embeddings=self.max_position_embeddings,
++++++#                     scaling_factor=scaling_factor,
++++++#                     base=self.rope_theta,
++++++#                 )
++++++#             else:
++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++++
++++++#     def forward(
++++++#         self,
++++++#         hidden_states: mindspore.Tensor,
++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++#         past_key_value: Optional[Cache] = None,
++++++#         output_attentions: bool = False,
++++++#         use_cache: bool = False,
++++++#         **kwargs,
++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++#         if "padding_mask" in kwargs:
++++++#             warnings.warn(
++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++++#             )
++++++        
++++++#         if output_attentions:
++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++++++
++++++#         bsz, q_len, _ = hidden_states.shape
++++++
++++++#         if self.config.pretraining_tp > 1:
++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++++
++++++#         query_states = self.q_proj(hidden_states)
++++++#         key_states = self.k_proj(hidden_states)
++++++#         value_states = self.v_proj(hidden_states)
++++++
++++++#         # Reshape for multi-head attention
++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++#         kv_seq_len = key_states.shape[-2]
++++++#         if past_key_value is not None:
++++++#             if self.layer_idx is None:
++++++#                 raise ValueError(
++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++#                     "with a layer index."
++++++#                 )
++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++        
++++++#         # Apply Rotary Positional Embedding
++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++#         if past_key_value is not None:
++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++
++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++        
++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++        
++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++
++++++#         # Convert attention_mask for flash_attention_score
++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++++++#         if attention_mask is not None:
++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++++#                 raise ValueError(
++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++++#                 )
++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++++++#         else:
++++++#             attn_mask_for_fa = None
++++++        
++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++++
++++++#         # Call the fused flash_attention_score operator
++++++#         attn_output = mindspore.ops.flash_attention_score(
++++++#             query=query_states_for_fa,
++++++#             key=key_states_for_fa,
++++++#             value=value_states_for_fa,
++++++#             head_num=self.num_heads, # This is N1, the number of query heads
++++++#             input_layout='BSH',
++++++#             attn_mask=attn_mask_for_fa,
++++++#             keep_prob=keep_prob,
++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++#             sparse_mode=0 # Default mask mode
++++++#         )
++++++        
++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++++++#         attn_output = self.o_proj(attn_output)
++++++        
++++++#         # Flash Attention does not return attention weights
++++++#         attn_weights = None
++++++
++++++#         return attn_output, attn_weights, past_key_value
++++++
++++++class DeepseekFlashAttention(nn.Module):
++++++    """
++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
++++++    designed for high performance on supported hardware (Ascend).
++++++
++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
++++++    """
++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++++        super().__init__()
++++++        self.config = config
++++++        self.layer_idx = layer_idx
++++++        if layer_idx is None:
++++++            logger.warning(
++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++                "when creating this class."
++++++            )
++++++
++++++        # --- [FIX] Correctly initialize all required attributes ---
++++++        self.attention_dropout = config.attention_dropout
++++++        self.hidden_size = config.hidden_size
++++++        self.num_heads = config.num_attention_heads
++++++        self.head_dim = self.hidden_size // self.num_heads
++++++        self.num_key_value_heads = config.num_key_value_heads
++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++        self.max_position_embeddings = config.max_position_embeddings
++++++        self.rope_theta = config.rope_theta
++++++        self.is_causal = True
++++++
++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++            raise ValueError(
++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++                f" and `num_heads`: {self.num_heads})."
++++++            )
++++++
++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++++        
++++++        # This call will now succeed as all attributes are initialized.
++++++        self._init_rope()
++++++
++++++    def _init_rope(self):
++++++        if self.config.rope_scaling is None:
++++++            self.rotary_emb = DeepseekRotaryEmbedding(
++++++                self.head_dim,
++++++                max_position_embeddings=self.max_position_embeddings,
++++++                base=self.rope_theta,
++++++            )
++++++        else:
++++++            scaling_type = self.config.rope_scaling["type"]
++++++            scaling_factor = self.config.rope_scaling["factor"]
++++++            if scaling_type == "linear":
++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++++                    self.head_dim,
++++++                    max_position_embeddings=self.max_position_embeddings,
++++++                    scaling_factor=scaling_factor,
++++++                    base=self.rope_theta,
++++++                )
++++++            elif scaling_type == "dynamic":
++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++++                    self.head_dim,
++++++                    max_position_embeddings=self.max_position_embeddings,
++++++                    scaling_factor=scaling_factor,
++++++                    base=self.rope_theta,
++++++                )
++++++            else:
++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++++
++++++    def forward(
++++++        self,
++++++        hidden_states: mindspore.Tensor,
++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++        past_key_value: Optional[Cache] = None,
++++++        output_attentions: bool = False,
++++++        use_cache: bool = False,
++++++        **kwargs,
++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++        if "padding_mask" in kwargs:
++++++            warnings.warn(
++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++++            )
++++++        if output_attentions:
++++++            warnings.warn(
++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
++++++            )
++++++
++++++        bsz, q_len, _ = hidden_states.shape
++++++
++++++        if self.config.pretraining_tp > 1:
++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++++
++++++        query_states = self.q_proj(hidden_states)
++++++        key_states = self.k_proj(hidden_states)
++++++        value_states = self.v_proj(hidden_states)
++++++
++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++        kv_seq_len = key_states.shape[-2]
++++++        if past_key_value is not None:
++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++        
++++++        # Apply Rotary Position Embedding
++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++        if past_key_value is not None:
++++++            cache_kwargs = {"sin": sin, "cos": cos}
++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++
++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
++++++        # So we must explicitly repeat the KV heads.
++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++
++++++        # Convert attention mask for flash_attention_score
++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
++++++        if attention_mask is not None:
++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++++                 raise ValueError(
++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++++                )
++++++            attn_mask_for_fa = attention_mask < 0
++++++        else:
++++++            attn_mask_for_fa = None
++++++
++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++++
++++++        # Call the fused operator using the efficient BNSD layout
++++++        attn_output = mindspore.ops.flash_attention_score(
++++++            query=query_states,
++++++            key=key_states,
++++++            value=value_states,
++++++            head_num=self.num_heads,
++++++            input_layout='BNSD', # Specify the correct layout
++++++            attn_mask=attn_mask_for_fa,
++++++            keep_prob=keep_prob,
++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
++++++        )
++++++        
++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++        
++++++        # Apply output projection
++++++        attn_output = self.o_proj(attn_output)
++++++
++++++        # Flash attention does not return attention weights, so we return None.
++++++        attn_weights = None
++++++
++++++        return attn_output, attn_weights, past_key_value
++++++
+++++ Deepseek_ATTENTION_CLASSES = {
+++++     "eager": DeepseekAttention,
++++++    "flash-attention": DeepseekFlashAttention,
+++++ }
+++++ 
+++++ 
+++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+++++             config=config, layer_idx=layer_idx
+++++         )
+++++ 
++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++++++            config=config, layer_idx=layer_idx
++++++        )
++++++
+++++         self.mlp = (
+++++             DeepseekMoE(config)
+++++             if (
+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++index d4c6b651..bced285c 100644
+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+++++ 
+++++ import mindspore
+++++ import mindnlp.core.nn.functional as F
+++++-from mindnlp.core import nn, ops
++++++from mindnlp.core import nn, ops, no_grad
+++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+++++ 
+++++ from ....common.activations import ACT2FN
+++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++++ 
++++++Long_Prompt = False
++++++PROMPT_LENGTH_THRESHOLD = 128
+++++ 
+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++ 
++++++# class Qwen2MoeFlashAttention(nn.Module):
++++++#     """
++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++
++++++#     关键改动:
++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++#        直接传入原始的 key 和 value 张量效率更高。
++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++#         super().__init__()
++++++#         self.config = config
++++++#         self.layer_idx = layer_idx
++++++#         self.hidden_size = config.hidden_size
++++++#         self.num_heads = config.num_attention_heads
++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++#         self.rope_theta = config.rope_theta
++++++#         self.attention_dropout = config.attention_dropout
++++++
++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++#             raise ValueError(
++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++#             )
++++++
++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++
++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++#             self.head_dim,
++++++#             max_position_embeddings=self.max_position_embeddings,
++++++#             base=self.rope_theta,
++++++#         )
++++++
++++++#     def forward(
++++++#         self,
++++++#         hidden_states: mindspore.Tensor,
++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++#         past_key_value: Optional[Cache] = None,
++++++#         output_attentions: bool = False,
++++++#         use_cache: bool = False,
++++++#         cache_position: Optional[mindspore.Tensor] = None,
++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++#         bsz, q_len, _ = hidden_states.shape
++++++
++++++#         # 1. 线性投射 Q, K, V
++++++#         query_states = self.q_proj(hidden_states)
++++++#         key_states = self.k_proj(hidden_states)
++++++#         value_states = self.v_proj(hidden_states)
++++++
++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++#         # 3. RoPE 旋转位置编码
++++++#         kv_seq_len = key_states.shape[-2]
++++++#         if past_key_value is not None:
++++++#             if self.layer_idx is None:
++++++#                 raise ValueError(
++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++#                     "with a layer index."
++++++#                 )
++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++#                 if cache_position.shape[0] == 1:
++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++#                     kv_seq_len = past_seen_tokens + 1
++++++#                 else:
++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++#             else:
++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++#         # 4. KV 缓存更新
++++++#         if past_key_value is not None:
++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++#             key_states, value_states = past_key_value.update(
++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
++++++#             )
++++++            
++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++#                 if cache_position.shape[0] == 1:
++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++#                     kv_seq_len = key_states.shape[-2]
++++++
++++++#         # 5. [重要] 准备 Attention Mask
++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++#         fa_attention_mask = None
++++++#         if attention_mask is not None:
++++++#             # 截取与当前key长度匹配的部分
++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++#             fa_attention_mask = (mask_slice != 0)
++++++
++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++#         input_dtype = query_states.dtype
++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++#             query_states = query_states.to(mindspore.float16)
++++++#             key_states = key_states.to(mindspore.float16)
++++++#             value_states = value_states.to(mindspore.float16)
++++++
++++++#         # 6. [核心] 调用 flash_attention_score 算子
++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++#         attn_output = mindspore.ops.flash_attention_score(
++++++#             query=query_states,
++++++#             key=key_states,
++++++#             value=value_states,
++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
++++++#             attn_mask=fa_attention_mask,
++++++#             keep_prob=1.0 - self.attention_dropout,
++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++#             input_layout="BNSD",
++++++#             sparse_mode=0 # 使用 defaultMask 模式
++++++#         )
++++++
++++++#         # 恢复原始数据类型
++++++#         attn_output = attn_output.to(input_dtype)
++++++
++++++#         # 7. 调整输出形状
++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++#         attn_output = self.o_proj(attn_output)
++++++
++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
++++++#         attn_weights = None
++++++#         if output_attentions:
++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++#         return attn_output, attn_weights, past_key_value
++++++
++++++#     # def forward(
++++++#     #     self,
++++++#     #     hidden_states: mindspore.Tensor,
++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
++++++#     #     past_key_value: Optional[Cache] = None,
++++++#     #     output_attentions: bool = False,
++++++#     #     use_cache: bool = False,
++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++#     #     bsz, q_len, _ = hidden_states.shape
++++++
++++++#     #     # 1. 线性投射 Q, K, V
++++++#     #     query_states = self.q_proj(hidden_states)
++++++#     #     key_states = self.k_proj(hidden_states)
++++++#     #     value_states = self.v_proj(hidden_states)
++++++
++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++
++++++#     #     # 3. RoPE 旋转位置编码
++++++#     #     kv_seq_len = key_states.shape[-2]
++++++#     #     if past_key_value is not None:
++++++#     #         if self.layer_idx is None:
++++++#     #             raise ValueError(
++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++#     #                 "with a layer index."
++++++#     #             )
++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++
++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++#     #     # 4. KV 缓存更新
++++++#     #     if past_key_value is not None:
++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++#     #         key_states, value_states = past_key_value.update(
++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
++++++#     #         )
++++++
++++++#     #     # 5. 准备 Attention Mask
++++++#     #     fa_attention_mask = None
++++++#     #     if attention_mask is not None:
++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++#     #         fa_attention_mask = (mask_slice != 0)
++++++
++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++#     #     input_dtype = query_states.dtype
++++++
++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
++++++#     #     attn_output = mindspore.ops.flash_attention_score(
++++++#     #         query=query_states,
++++++#     #         key=key_states,
++++++#     #         value=value_states,
++++++#     #         head_num=self.num_heads,
++++++#     #         attn_mask=fa_attention_mask,
++++++#     #         keep_prob=1.0 - self.attention_dropout,
++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++#     #         input_layout="BNSD",
++++++#     #         sparse_mode=0,
++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++#     #         inner_precise=1
++++++#     #     )
++++++
++++++#     #     # 恢复原始数据类型
++++++#     #     attn_output = attn_output.to(input_dtype)
++++++
++++++#     #     # 7. 调整输出形状
++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++#     #     attn_output = self.o_proj(attn_output)
++++++
++++++#     #     attn_weights = None
++++++#     #     if output_attentions:
++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++
++++++#     #     return attn_output, attn_weights, past_key_value
++++++
++++++
+++++ class Qwen2MoeFlashAttention(nn.Module):
+++++     """
+++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++-
+++++-    关键改动:
+++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++-       直接传入原始的 key 和 value 张量效率更高。
+++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
++++++
++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
++++++    以达到理论上的最高执行速度。
+++++     """
+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++         super().__init__()
+++++         self.config = config
+++++         self.layer_idx = layer_idx
++++++        if layer_idx is None:
++++++            logger.warning_once(
++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
++++++            )
++++++
+++++         self.hidden_size = config.hidden_size
+++++         self.num_heads = config.num_attention_heads
+++++         self.head_dim = self.hidden_size // self.num_heads
+++++         self.num_key_value_heads = config.num_key_value_heads
+++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++         self.max_position_embeddings = config.max_position_embeddings
+++++         self.rope_theta = config.rope_theta
+++++         self.attention_dropout = config.attention_dropout
+++++ 
+++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++-            raise ValueError(
+++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++-            )
+++++-
+++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+++++         key_states = self.k_proj(hidden_states)
+++++         value_states = self.v_proj(hidden_states)
+++++ 
+++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++        # 2. 调整形状以匹配 BNSD 布局
+++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-
+++++-        # 3. RoPE 旋转位置编码
++++++        
++++++        # 3. RoPE 和 KV 缓存
+++++         kv_seq_len = key_states.shape[-2]
+++++         if past_key_value is not None:
+++++-            if self.layer_idx is None:
+++++-                raise ValueError(
+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-                    "with a layer index."
+++++-                )
+++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++-                if cache_position.shape[0] == 1:
+++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++-                    kv_seq_len = past_seen_tokens + 1
+++++-                else:
+++++-                    # prefill 阶段：cache_position 是范围，使用其长度
+++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++-            else:
+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-
++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++        
+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++ 
+++++-        # 4. KV 缓存更新
+++++         if past_key_value is not None:
+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-            key_states, value_states = past_key_value.update(
+++++-                key_states, value_states, self.layer_idx, cache_kwargs
+++++-            )
+++++-            
+++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++-                if cache_position.shape[0] == 1:
+++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++-                    kv_seq_len = key_states.shape[-2]
+++++-
+++++-        # 5. [重要] 准备 Attention Mask
+++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++
++++++        # 4. 准备 Attention Mask
+++++         fa_attention_mask = None
+++++         if attention_mask is not None:
+++++-            # 截取与当前key长度匹配的部分
+++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++             fa_attention_mask = (mask_slice != 0)
+++++ 
+++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++-        input_dtype = query_states.dtype
+++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++-            query_states = query_states.to(mindspore.float16)
+++++-            key_states = key_states.to(mindspore.float16)
+++++-            value_states = value_states.to(mindspore.float16)
+++++-
+++++-        # 6. [核心] 调用 flash_attention_score 算子
+++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+++++         attn_output = mindspore.ops.flash_attention_score(
+++++             query=query_states,
+++++             key=key_states,
+++++             value=value_states,
+++++-            head_num=self.num_heads, # 传入Q的头数(N1)
++++++            head_num=self.num_heads,
+++++             attn_mask=fa_attention_mask,
+++++-            keep_prob=1.0 - self.attention_dropout,
++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+++++             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++             input_layout="BNSD",
+++++-            sparse_mode=0 # 使用 defaultMask 模式
++++++            sparse_mode=0,
++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+++++         )
+++++ 
+++++-        # 恢复原始数据类型
+++++-        attn_output = attn_output.to(input_dtype)
+++++-
+++++-        # 7. 调整输出形状
+++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++        # 6. 调整输出形状
+++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++         attn_output = self.o_proj(attn_output)
+++++ 
+++++-        # FlashAttention 算子不直接返回注意力权重矩阵
++++++        # 7. 返回结果
+++++         attn_weights = None
+++++         if output_attentions:
+++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++-    # def forward(
+++++-    #     self,
+++++-    #     hidden_states: mindspore.Tensor,
+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++++-    #     past_key_value: Optional[Cache] = None,
+++++-    #     output_attentions: bool = False,
+++++-    #     use_cache: bool = False,
+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-
+++++-    #     bsz, q_len, _ = hidden_states.shape
+++++-
+++++-    #     # 1. 线性投射 Q, K, V
+++++-    #     query_states = self.q_proj(hidden_states)
+++++-    #     key_states = self.k_proj(hidden_states)
+++++-    #     value_states = self.v_proj(hidden_states)
+++++-
+++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-
+++++-    #     # 3. RoPE 旋转位置编码
+++++-    #     kv_seq_len = key_states.shape[-2]
+++++-    #     if past_key_value is not None:
+++++-    #         if self.layer_idx is None:
+++++-    #             raise ValueError(
+++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-    #                 "with a layer index."
+++++-    #             )
+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++ 
+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-
+++++-    #     # 4. KV 缓存更新
+++++-    #     if past_key_value is not None:
+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-    #         key_states, value_states = past_key_value.update(
+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++-    #         )
+++++-
+++++-    #     # 5. 准备 Attention Mask
+++++-    #     fa_attention_mask = None
+++++-    #     if attention_mask is not None:
+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-    #         fa_attention_mask = (mask_slice != 0)
+++++-
+++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++-    #     input_dtype = query_states.dtype
+++++-
+++++-    #     # 6. [核心] 调用 flash_attention_score 算子
+++++-    #     attn_output = mindspore.ops.flash_attention_score(
+++++-    #         query=query_states,
+++++-    #         key=key_states,
+++++-    #         value=value_states,
+++++-    #         head_num=self.num_heads,
+++++-    #         attn_mask=fa_attention_mask,
+++++-    #         keep_prob=1.0 - self.attention_dropout,
+++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-    #         input_layout="BNSD",
+++++-    #         sparse_mode=0,
+++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++-    #         inner_precise=1
+++++-    #     )
+++++-
+++++-    #     # 恢复原始数据类型
+++++-    #     attn_output = attn_output.to(input_dtype)
++++++QWEN2MOE_ATTENTION_CLASSES = {
++++++    "eager": Qwen2MoeAttention,
++++++    "flash-attention": Qwen2MoeFlashAttention,
++++++}
+++++ 
+++++-    #     # 7. 调整输出形状
+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-    #     attn_output = self.o_proj(attn_output)
+++++ 
+++++-    #     attn_weights = None
+++++-    #     if output_attentions:
+++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     def __init__(self, config):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         # gating
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     #@dwj
++++++#     # 只遍历激活的专家，而非全部专家
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#             num_tokens = hidden_states_reshaped.shape[0]
++++++
++++++#             router_logits = self.gate(hidden_states_reshaped)
++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++
++++++#             if self.norm_topk_prob:
++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
++++++            
++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++#             flat_selected_experts = selected_experts.flatten()
++++++            
++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++#             token_indices = broadcasted_token_indices.flatten()
++++++            
++++++#             active_experts = ops.unique(flat_selected_experts)
++++++            
++++++#             for expert_idx_tensor in active_experts:
++++++#                 expert_idx = expert_idx_tensor.item()
++++++#                 expert_layer = self.experts[expert_idx]
++++++                
++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
++++++#                 selected_token_indices = token_indices[mask]
++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
++++++                
++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
++++++                
++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++                
++++++#                 final_hidden_states = final_hidden_states.index_add(
++++++#                     dim=0,
++++++#                     index=selected_token_indices,
++++++#                     source=expert_output.to(hidden_states.dtype)
++++++#                 )
++++++            
++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++ 
+++++-    #     return attn_output, attn_weights, past_key_value
++++++#             final_hidden_states = final_hidden_states + shared_expert_output
++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++            
++++++#             return final_hidden_states, router_logits
++++++
++++++
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     """
++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         # 门控网络
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         # 专家列表
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++#         # 共享专家
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_decode(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         """
++++++#         【解码路径】针对 sequence_length=1 的极致优化。
++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++++++#         """
++++++#         batch_size, hidden_dim = hidden_states.shape
++++++        
++++++#         expert_outputs_list = [
++++++#             ops.cat([
++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++#             ], dim=0) 
++++++#             for i in range(batch_size)
++++++#         ]
++++++        
++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++++++#         # shape: (batch_size, top_k, hidden_dim)
++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++        
++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++        
++++++#         return moe_output.squeeze(1)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_prefill(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         """
++++++#         【预填充路径】针对 sequence_length > 1 的优化。
++++++#         按专家对 Token 进行分组，并进行批处理。
++++++#         """
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens = hidden_states.shape[0]
++++++#         flat_selected_experts = selected_experts.flatten()
++++++        
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++        
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++        
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++            
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++#             selected_token_indices = token_indices[mask]
++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++            
++++++#             current_states = hidden_states[selected_token_indices]
++++++            
++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++            
++++++#             moe_output = moe_output.index_add(
++++++#                 dim=0,
++++++#                 index=selected_token_indices,
++++++#                 source=expert_output.to(hidden_states.dtype)
++++++#             )
++++++#         return moe_output
++++++
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#         """
++++++#         顶层 forward 方法，作为智能分发器。
++++++#         """
++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++        
++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++ 
+++++-    # def forward(
+++++-    #     self,
+++++-    #     hidden_states: mindspore.Tensor,
+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++++-    #     past_key_value: Optional[Cache] = None,
+++++-    #     output_attentions: bool = False,
+++++-    #     use_cache: bool = False,
+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-
+++++-    #     bsz, q_len, _ = hidden_states.shape
+++++-
+++++-    #     query_states = self.q_proj(hidden_states)
+++++-    #     key_states = self.k_proj(hidden_states)
+++++-    #     value_states = self.v_proj(hidden_states)
+++++-
+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-
+++++-    #     kv_seq_len = key_states.shape[-2]
+++++-    #     if past_key_value is not None:
+++++-    #         if self.layer_idx is None:
+++++-    #             raise ValueError("`layer_idx` must be specified for caching")
+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-
+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-
+++++-    #     if past_key_value is not None:
+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-    #         key_states, value_states = past_key_value.update(
+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++-    #         )
++++++#         if self.norm_topk_prob:
++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++        
++++++#         moe_output = None
++++++#         # 在推理时，根据序列长度选择最优路径
++++++#         if not self.training:
++++++#             if sequence_length == 1:
++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++#             else:
++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++#         else:
++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++++++#             raise NotImplementedError("Training path is not implemented.")
++++++
++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++++++        
++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++++++        
++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++#         return final_hidden_states, router_logits
++++++
++++++
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     """
++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         # 门控网络
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         # 专家列表
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++#         # 共享专家
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_decode(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         batch_size, _ = hidden_states.shape
++++++#         expert_outputs_list = [
++++++#             ops.cat([
++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++#             ], dim=0) 
++++++#             for i in range(batch_size)
++++++#         ]
++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++#         return moe_output.squeeze(1)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_prefill(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens = hidden_states.shape[0]
++++++#         flat_selected_experts = selected_experts.flatten()
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++        
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++#             selected_token_indices = token_indices[mask]
++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++#             current_states = hidden_states[selected_token_indices]
++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++#             moe_output = moe_output.index_add(
++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++#             )
++++++#         return moe_output
++++++
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#         """
++++++#         顶层 forward 方法，作为智能分发器。
++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++++++#         """
++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++        
++++++#         # 1. 门控计算 (通用逻辑)
++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++
++++++#         if self.norm_topk_prob:
++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++        
++++++#         # 2. 智能分发到最优 MoE 路径
++++++#         moe_output = None
++++++#         if not self.training:
++++++#             if sequence_length == 1:
++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++#             else:
++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++#         else:
++++++#             raise NotImplementedError("Training path is not implemented.")
++++++
++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++        
++++++#         # 4. 合并 MoE 输出和共享专家输出
++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++        
++++++#         # 5. 恢复原始形状并返回
++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++#         return final_hidden_states, router_logits
++++++
++++++# prefill fastest
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     """
++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         # 门控网络
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         # 专家列表
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++#         # 共享专家
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_dispatch(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         """
++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++++++#         """
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens, _ = hidden_states.shape
++++++        
++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++++++#         flat_selected_experts = selected_experts.flatten()
++++++#         flat_routing_weights = routing_weights.flatten()
+++++ 
+++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++-
+++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++-    #     query_states = query_states / math.sqrt(self.head_dim)
+++++-    #     # <--- 修改结束 ---
+++++-
+++++-    #     fa_attention_mask = None
+++++-    #     if attention_mask is not None:
+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-    #         fa_attention_mask = (mask_slice != 0)
+++++-
+++++-    #     input_dtype = query_states.dtype
+++++-
+++++-    #     attn_output = mindspore.ops.flash_attention_score(
+++++-    #         query=query_states,    # 传入已经预先缩放过的 query
+++++-    #         key=key_states,
+++++-    #         value=value_states,
+++++-    #         head_num=self.num_heads,
+++++-    #         attn_mask=fa_attention_mask,
+++++-    #         keep_prob=1.0 - self.attention_dropout,
+++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++-    #         input_layout="BNSD",
+++++-    #         sparse_mode=0,
+++++-    #         inner_precise=1        # 仍然保持内部高精度计算
+++++-    #     )
++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++ 
+++++-    #     attn_output = attn_output.to(input_dtype)
+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-    #     attn_output = self.o_proj(attn_output)
++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++        
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++            
++++++#             # 找到所有分配给该专家的 token
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++            
++++++#             # 使用 mask 选取对应的 token 和权重
++++++#             current_token_indices = token_indices[mask]
++++++#             current_routing_weights = flat_routing_weights[mask]
++++++#             current_hidden_states = hidden_states[current_token_indices]
++++++            
++++++#             # 对这些 token 进行批处理
++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++            
++++++#             # 使用 index_add 将结果精确地加回到对应位置
++++++#             moe_output = moe_output.index_add(
++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++++++#             )
++++++#         return moe_output
++++++
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#         """
++++++#         顶层 forward 方法，作为智能分发器。
++++++#         """
++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++        
++++++#         # 1. 门控计算
++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++
++++++#         if self.norm_topk_prob:
++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++        
++++++#         # 2. 调用统一的 MoE 计算内核
++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++++ 
+++++-    #     attn_weights = None
+++++-    #     if output_attentions:
+++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++#         # 3. 统一处理共享专家
++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++        
++++++#         # 4. 合并输出
++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++        
++++++#         # 5. 恢复原始形状并返回
++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++#         return final_hidden_states, router_logits
++++++
++++++
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     """
++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++#     【最终高性能与高精度版】：
++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++++++#     3. 这样实现了速度和准确性的两全其美。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_decode(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         """
++++++#         【解码路径】极致优化版：bmm + 高精度累加。
++++++#         """
++++++#         original_dtype = hidden_states.dtype
++++++#         batch_size, _ = hidden_states.shape
++++++        
++++++#         expert_outputs_list = [
++++++#             ops.cat([
++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++#             ], dim=0) 
++++++#             for i in range(batch_size)
++++++#         ]
++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++
++++++#         # 在 float32 下执行 bmm，得到高精度结果
++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++        
++++++#         # 将高精度结果转换回原始数据类型
++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++++++        
++++++#         return moe_output
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_prefill(
++++++#         self, 
++++++#         hidden_states: mindspore.Tensor, 
++++++#         selected_experts: mindspore.Tensor, 
++++++#         routing_weights: mindspore.Tensor
++++++#     ) -> mindspore.Tensor:
++++++#         """
++++++#         【预填充路径】与原始实现一致，结果精确。
++++++#         """
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens, _ = hidden_states.shape
++++++#         flat_selected_experts = selected_experts.flatten()
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++        
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++#             selected_token_indices = token_indices[mask]
++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++#             current_states = hidden_states[selected_token_indices]
++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++#             moe_output = moe_output.index_add(
++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++#             )
++++++#         return moe_output
++++++
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++        
++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++ 
+++++-    #     return attn_output, attn_weights, past_key_value
++++++#         if self.norm_topk_prob:
++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++++++#         # 如果模型主体是 float16，后续再转换
++++++        
++++++#         moe_output = None
++++++#         if not self.training:
++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++++++#             # _moe_infer_decode 内部会处理好类型转换
++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++++++#             if sequence_length == 1:
++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++#             else:
++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++#         else:
++++++#             raise NotImplementedError("Training path is not implemented.")
++++++
++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++        
++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++#         return final_hidden_states, router_logits
++++++    
+++++ 
+++++-QWEN2MOE_ATTENTION_CLASSES = {
+++++-    "eager": Qwen2MoeAttention,
+++++-    "flash-attention": Qwen2MoeFlashAttention,
+++++-}
++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++#     """
++++++#     【融合版】一个混合专家模块，内置两种推理策略，
++++++#     由外部全局变量 `Long_Prompt` 控制：
++++++
++++++#     - if Long_Prompt is True:  【精度优先模式】
++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++++++#       适用于处理长序列，避免误差累积。
++++++
++++++#     - if Long_Prompt is False: 【速度优先模式】
++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
++++++#     """
++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++#         super().__init__()
++++++#         self.num_experts = config.num_experts
++++++#         self.top_k = config.num_experts_per_tok
++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++
++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++#         self.experts = nn.ModuleList(
++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++#         )
++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++
++++++#     # --- 速度优先模式的辅助函数 ---
++++++#     @no_grad()
++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++#         original_dtype = hidden_states.dtype
++++++#         batch_size, _ = hidden_states.shape
++++++#         expert_outputs_list = [
++++++#             ops.cat([
++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++#             ], dim=0) 
++++++#             for i in range(batch_size)
++++++#         ]
++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
++++++
++++++#     @no_grad()
++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens, _ = hidden_states.shape
++++++#         flat_selected_experts = selected_experts.flatten()
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++#             selected_token_indices = token_indices[mask]
++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++#             current_states = hidden_states[selected_token_indices]
++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++++++#         return moe_output
++++++
++++++#     # --- 精度优先模式的辅助函数 ---
++++++#     @no_grad()
++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++#         moe_output = ops.zeros_like(hidden_states)
++++++#         num_tokens, _ = hidden_states.shape
++++++#         flat_selected_experts = selected_experts.flatten()
++++++#         flat_routing_weights = routing_weights.flatten()
++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++#         active_experts = ops.unique(flat_selected_experts)
++++++#         for expert_idx_tensor in active_experts:
++++++#             expert_idx = expert_idx_tensor.item()
++++++#             expert_layer = self.experts[expert_idx]
++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++#             current_token_indices = token_indices[mask]
++++++#             current_routing_weights = flat_routing_weights[mask]
++++++#             current_hidden_states = hidden_states[current_token_indices]
++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++++#         return moe_output
++++++
++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++++++#         global Long_Prompt
++++++
++++++#         # 1. 门控计算 (所有模式通用)
++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++++#         if self.norm_topk_prob:
++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++#         moe_output = None
++++++#         if not self.training:
++++++#             # 根据 Long_Prompt 标志选择模式
++++++#             if Long_Prompt:
++++++#                 # --- 精度优先模式 ---
++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++#             else:
++++++#                 # --- 速度优先模式 ---
++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++#                 if sequence_length == 1:
++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++#                 else:
++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++#         else:
++++++#             raise NotImplementedError("Training path is not implemented.")
++++++
++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++        
++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++#         return final_hidden_states, router_logits
++++++    
++++++class Qwen2MoeSparseMoeBlock(nn.Module):
++++++    """
++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++++++    控制的顶级推理策略：
+++++ 
++++++    - if Long_Prompt is True:  【精度优先模式】
++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
++++++      适用于需要严格可复现性的长序列任务。
+++++ 
+++++-class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-    def __init__(self, config):
++++++    - if Long_Prompt is False: 【速度优先模式】
++++++      采用业界最强的性能组合：
++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
++++++    """
++++++    def __init__(self, config: Qwen2MoeConfig):
+++++         super().__init__()
+++++         self.num_experts = config.num_experts
+++++         self.top_k = config.num_experts_per_tok
+++++         self.norm_topk_prob = config.norm_topk_prob
+++++ 
+++++-        # gating
+++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++         self.experts = nn.ModuleList(
+++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++         )
+++++-
+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++ 
+++++-    #@dwj
+++++-    # 只遍历激活的专家，而非全部专家
+++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-            num_tokens = hidden_states_reshaped.shape[0]
+++++-
+++++-            router_logits = self.gate(hidden_states_reshaped)
+++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-
+++++-            if self.norm_topk_prob:
+++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-            routing_weights = routing_weights.to(hidden_states.dtype)
+++++-            
+++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++-            flat_selected_experts = selected_experts.flatten()
+++++-            
+++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++-            token_indices = broadcasted_token_indices.flatten()
+++++-            
+++++-            active_experts = ops.unique(flat_selected_experts)
+++++-            
+++++-            for expert_idx_tensor in active_experts:
+++++-                expert_idx = expert_idx_tensor.item()
+++++-                expert_layer = self.experts[expert_idx]
+++++-                
+++++-                mask = (flat_selected_experts == expert_idx_tensor)
+++++-                selected_token_indices = token_indices[mask]
+++++-                selected_routing_weights = routing_weights.flatten()[mask]
+++++-                
+++++-                current_states = hidden_states_reshaped[selected_token_indices]
+++++-                
+++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-                
+++++-                final_hidden_states = final_hidden_states.index_add(
+++++-                    dim=0,
+++++-                    index=selected_token_indices,
+++++-                    source=expert_output.to(hidden_states.dtype)
+++++-                )
+++++-            
+++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
++++++    @no_grad()
++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++        original_dtype = hidden_states.dtype
++++++        batch_size, _ = hidden_states.shape
++++++        expert_outputs_list = [
++++++            ops.cat([
++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++            ], dim=0)
++++++            for i in range(batch_size)
++++++        ]
++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++        weights_fp32 = routing_weights.to(mindspore.float32)
++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
++++++
++++++    @no_grad()
++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++        num_tokens, _ = hidden_states.shape
++++++        flat_selected_experts = selected_experts.flatten()
++++++        sorted_expert_indices = flat_selected_experts.argsort()
++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++++        original_token_indices = sorted_expert_indices // self.top_k
++++++        moe_output = ops.zeros_like(hidden_states)
++++++        current_token_offset = 0
++++++        for i in range(self.num_experts):
++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
++++++            if expert_token_count == 0:
++++++                continue
++++++            end_offset = current_token_offset + expert_token_count
++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++++            current_token_offset += expert_token_count
++++++        return moe_output
++++++
++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++++++    @no_grad()
++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++        moe_output = ops.zeros_like(hidden_states)
++++++        num_tokens, _ = hidden_states.shape
++++++        flat_selected_experts = selected_experts.flatten()
++++++        flat_routing_weights = routing_weights.flatten()
++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++        active_experts = ops.unique(flat_selected_experts)
++++++        for expert_idx_tensor in active_experts:
++++++            expert_idx = expert_idx_tensor.item()
++++++            expert_layer = self.experts[expert_idx]
++++++            mask = (flat_selected_experts == expert_idx_tensor)
++++++            current_token_indices = token_indices[mask]
++++++            current_routing_weights = flat_routing_weights[mask]
++++++            current_hidden_states = hidden_states[current_token_indices]
++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++++        return moe_output
+++++ 
+++++-            final_hidden_states = final_hidden_states + shared_expert_output
+++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++-            
+++++-            return final_hidden_states, router_logits
++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++        global Long_Prompt
++++++
++++++        # 1. 门控计算 (所有模式通用)
++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++        router_logits = self.gate(hidden_states_reshaped)
++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++++        if self.norm_topk_prob:
++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++        
++++++        moe_output = None
++++++        if Long_Prompt:
++++++            # --- 精度优先模式 (ACCURACY MODE) ---
++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        else:
++++++            # --- 速度优先模式 (SPEED MODE) ---
++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++            if sequence_length == 1:
++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++            else:
++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        
+++++ 
++++++        # 3. 共享专家计算与合并 (所有模式通用)
++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++        
++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++        
++++++        return final_hidden_states, router_logits
+++++ 
+++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++++         super().__init__()
+++++         self.hidden_size = config.hidden_size
++++++        
++++++        # if Long_Prompt:
++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++        # else:
++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++ 
+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++ 
+++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++-
+++++         if (layer_idx not in config.mlp_only_layers) and (
+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++         ):
+++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++             self._warmed_up = True
+++++             self.warmup_moe_model()
+++++ 
++++++
++++++
+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++         output_router_logits = (
+++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++             router_logits=outputs.router_logits,
+++++         )
+++++ 
++++++    def generate(self, *args, **kwargs):
++++++        """
++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++++++        """
++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++++
++++++        input_ids = kwargs.get("input_ids")
++++++        if input_ids is None and args:
++++++            input_ids = args[0]
++++++
++++++        if input_ids is not None:
++++++            prompt_length = input_ids.shape[1]
++++++            
++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++++++                Long_Prompt = True
++++++            else:
++++++                Long_Prompt = False
++++++
++++++        return super().generate(*args, **kwargs)
++++++    
+++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+++++     def prepare_inputs_for_generation(
+++++         self,
+++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
++++++        
+++++         if past_key_values is not None:
+++++             if inputs_embeds is not None:  # Exception 1
+++++                 if 0 not in input_ids.shape:
+++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++             }
+++++         )
+++++         return model_inputs
++++++
+++++ # @lwx
+++++     # def _decode_one_tokens_logits(
+++++     #     self,
+++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+++++             attentions=outputs.attentions,
+++++         )
+++++ 
++++++
+++++ __all__ = [
+++++     "Qwen2MoeForCausalLM",
+++++     "Qwen2MoeModel",
+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++new file mode 100644
+++++index 00000000..6dfb5b93
+++++--- /dev/null
++++++++ b/patches/0001-20251104commit.patch
+++++@@ -0,0 +1,1272 @@
++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
++++++Subject: [PATCH] 20251104commit
++++++
++++++---
++++++ mindnlp/transformers/cache_utils.py           |  28 +-
++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
++++++
++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++++index cadd2e04..02f8d4be 100644
++++++--- a/mindnlp/transformers/cache_utils.py
+++++++++ b/mindnlp/transformers/cache_utils.py
++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++++             # k_out[:, :, cache_position] = key_states
++++++             # v_out[:, :, cache_position] = value_states
++++++-            if ON_ORANGE_PI:
++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++-            else:
++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++-
+++++++            # if ON_ORANGE_PI:
+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++            # else:
+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++++            if cache_position.ndim > 1:
+++++++                cache_position = cache_position.flatten()
+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++++                cache_position = cache_position.int()
+++++++            
+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++++            k_out[:, :, cache_position] = key_states
+++++++            v_out[:, :, cache_position] = value_states
+++++++            
++++++         return k_out, v_out
++++++ 
++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++index c695b944..d8303e45 100644
++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++++ def rotate_half(x):
++++++     """Rotates half the hidden dims of the input."""
++++++-    x1 = x[..., : x.shape[-1] // 2]
++++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++    # x1 = x[..., : x.shape[-1] // 2]
+++++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++     return ops.cat((-x2, x1), dim=-1)
++++++ 
++++++ 
++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++++         if self.training:
++++++             raise NotImplementedError("Training is not supported yet.")
++++++         else:
++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++-        if self.config.n_shared_experts is not None:
++++++-            y = y + self.shared_experts(identity)
++++++-        return y
+++++++            # @lwx
+++++++            if orig_shape[1] == 1:
+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++++                y=y.view(*orig_shape)
+++++++                if self.config.n_shared_experts is not None:
+++++++                    y = y + self.shared_experts(identity)
+++++++                return y
+++++++            else:
+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++++                if self.config.n_shared_experts is not None:
+++++++                    y = y + self.shared_experts(identity)
+++++++                return y
+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++        # if self.config.n_shared_experts is not None:
+++++++        #     y = y + self.shared_experts(identity)
+++++++        # return y
+++++++        
+++++++    @no_grad()
+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++
+++++++        expert_cache = ops.zeros_like(x)
+++++++        for i in range(self.num_experts_per_tok):
+++++++            expert_id = flat_expert_indices[i].item()
+++++++            weight = flat_expert_weights[i].item()
+++++++            expert = self.experts[expert_id]
+++++++            expert_out = expert(x)
+++++++            expert_cache += expert_out * weight
+++++++        return expert_cache
++++++ 
++++++     @no_grad()
++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-        # expert_cache = torch.zeros_like(x)
++++++-        # idxs = flat_expert_indices.argsort()
++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++-        # token_idxs = idxs // self.num_experts_per_tok
++++++-        # for i, end_idx in enumerate(tokens_per_expert):
++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++-        #     if start_idx == end_idx:
++++++-        #         continue
++++++-        #     expert = self.experts[i]
++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++-        #     expert_tokens = x[exp_token_idx]
++++++-        #     expert_out = expert(expert_tokens)
++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++-        # return expert_cache
+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++         expert_cache = ops.zeros_like(x)
++++++         idxs = flat_expert_indices.argsort()
++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++         token_idxs = idxs // self.num_experts_per_tok
+++++++
++++++         for i, end_idx in enumerate(tokens_per_expert):
++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++             if start_idx == end_idx:
++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++++             expert_out = expert(expert_tokens)
++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++
++++++         return expert_cache
+++++++        
+++++++    # @no_grad()
+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++    #     # expert_cache = torch.zeros_like(x)
+++++++    #     # idxs = flat_expert_indices.argsort()
+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++    #     #     if start_idx == end_idx:
+++++++    #     #         continue
+++++++    #     #     expert = self.experts[i]
+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #     #     expert_tokens = x[exp_token_idx]
+++++++    #     #     expert_out = expert(expert_tokens)
+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++    #     # return expert_cache
+++++++    #     expert_cache = ops.zeros_like(x)
+++++++    #     idxs = flat_expert_indices.argsort()
+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++    #         if start_idx == end_idx:
+++++++    #             continue
+++++++    #         expert = self.experts[i]
+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #         expert_tokens = x[exp_token_idx]
+++++++    #         expert_out = expert(expert_tokens)
+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++
+++++++    #     return expert_cache
+++++++    # @no_grad()
+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++    #     expert_cache = ops.zeros_like(x)
+++++++
+++++++    #     # 排序保证顺序一致
+++++++    #     idxs = flat_expert_indices.argsort()
+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++    #     # 找出有 token 的专家
+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++
+++++++    #     for i in active_experts.tolist():
+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++    #         end_idx = tokens_per_expert[i]
+++++++    #         if start_idx == end_idx:  # 没有 token
+++++++    #             continue
+++++++
+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #         expert_tokens = x[exp_token_idx]
+++++++    #         expert_out = self.experts[i](expert_tokens)
+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++
+++++++    #         expert_cache = mindspore.mint.scatter_add(
+++++++    #             expert_cache,
+++++++    #             0,
+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++    #             expert_out
+++++++    #         )
+++++++
+++++++    #     return expert_cache
+++++++
+++++++
++++++ 
++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++++ #     """
++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++ 
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
+++++++        self.warm_up = False
+++++++
+++++++    def warmup_moe_model_deep(self):
+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++++        test_texts = [
+++++++            "warmup short",
+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++++        ]
+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++        if tokenizer is None:
+++++++            from mindnlp.transformers import AutoTokenizer
+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++            self._warmup_tokenizer = tokenizer
+++++++
+++++++        for text in test_texts:
+++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++            with mindspore._no_grad():
+++++++                _ = self(**inputs, use_cache=False)
+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++++ 
++++++     def get_input_embeddings(self):
++++++         return self.model.embed_tokens
++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++         ```"""
+++++++        if not self.warm_up:
+++++++            self.warm_up = True
+++++++            self.warmup_moe_model_deep()
+++++++
++++++         output_attentions = (
++++++             output_attentions
++++++             if output_attentions is not None
++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++index 3cbf820e..d4c6b651 100644
++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++@@ -18,7 +18,6 @@
++++++ # See the License for the specific language governing permissions and
++++++ # limitations under the License.
++++++ """MindSpore Qwen2MoE model."""
++++++-
++++++ import math
++++++ from typing import List, Optional, Tuple, Union
++++++ 
++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++++     TokenClassifierOutput,
++++++ )
++++++ from ...modeling_utils import PreTrainedModel
+++++++from ...generation import GenerationMixin
++++++ from ....utils import logging
++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
++++++ 
++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++++         self.variance_epsilon = eps
++++++ 
++++++     def forward(self, hidden_states):
+++++++        # @dwj
+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++        # @lwx
+++++++        # if not self.training :
+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++         input_dtype = hidden_states.dtype
++++++         hidden_states = hidden_states.to(mindspore.float32)
++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++++@@ -234,6 +239,8 @@ def rotate_half(x):
++++++     """Rotates half the hidden dims of the input."""
++++++     x1 = x[..., : x.shape[-1] // 2]
++++++     x2 = x[..., x.shape[-1] // 2 :]
+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++     return ops.cat((-x2, x1), dim=-1)
++++++ 
++++++ 
++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++++         self.config = config
++++++         self.hidden_size = config.hidden_size
++++++         self.intermediate_size = intermediate_size
+++++++        
++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++++         self.act_fn = ACT2FN[config.hidden_act]
++++++ 
++++++     def forward(self, x):
++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++-
++++++ 
+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++        # @lwx 
+++++++        # gate_up_output = self.gate_up_proj(x)
+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++++        # return self.down_proj(swiglu_output)
+++++++
+++++++    # def forward(self, x):
+++++++    #     gate_proj_out = self.gate_proj(x)
+++++++    #     up_proj_out = self.up_proj(x)
+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++++    #     return self.down_proj(swiglu_out)
+++++++        
++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++++     """
++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++++         use_cache: bool = False,
++++++         cache_position: Optional[mindspore.Tensor] = None,
++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++        
+++++++
++++++         bsz, q_len, _ = hidden_states.shape
++++++ 
++++++         query_states = self.q_proj(hidden_states)
++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++                     "with a layer index."
++++++                 )
++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++            if isinstance(past_key_value, StaticCache):
+++++++                kv_seq_len = key_states.shape[-2]
+++++++            else:
+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++ 
++++++         if past_key_value is not None:
++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++            
+++++++            if isinstance(past_key_value, StaticCache):
+++++++                kv_seq_len = key_states.shape[-2]
++++++ 
++++++         # repeat k/v heads if n_kv_heads < n_heads
++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++-
+++++++        
++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++ 
++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++++-            raise ValueError(
++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++++-                f" {attn_weights.shape}"
++++++-            )
++++++-
++++++-        if attention_mask is not None:  # no matter the length, we just slice it
++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++++        if attention_mask is not None:
+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++             attn_weights = attn_weights + causal_mask
++++++ 
++++++         # upcast attention to fp32
++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++ 
++++++         attn_output = self.o_proj(attn_output)
++++++-
+++++++        # @lwx
+++++++        
+++++++        # max_seq_len = self.max_position_embeddings  # 2048
+++++++
+++++++        # if attention_mask is not None:
+++++++        #     # attention_mask: [B, 1, Sq, Sk]
+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++
+++++++        #     # pad 到 [max_seq_len, max_seq_len]
+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++        #     global_attention_mask = padded_mask
+++++++        # else:
+++++++        #     global_attention_mask = None
+++++++
+++++++
+++++++        # sparse_mode=3
+++++++        # attn_output = mindspore.ops.flash_attention_score(
+++++++        #     query=query_states,
+++++++        #     key=key_states,
+++++++        #     value=value_states,
+++++++        #     real_shift=None,
+++++++        #     padding_mask=None,
+++++++
+++++++        #     head_num=self.num_heads,
+++++++        #     attn_mask=global_attention_mask,
+++++++        #     keep_prob=1.0 - self.attention_dropout,
+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++        #     input_layout="BNSD",
+++++++        #     pre_tokens=2147483647,
+++++++        #     next_tokens=2147483647,
+++++++        #     inner_precise=0,
+++++++        #     drop_mask=None,
+++++++        #     prefix=None,
+++++++        #     actual_seq_qlen=None,
+++++++        #     actual_seq_kvlen=None,
+++++++        #     sparse_mode=sparse_mode,
+++++++        # )
++++++         if not output_attentions:
++++++             attn_weights = None
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++ 
+++++++class Qwen2MoeFlashAttention(nn.Module):
+++++++    """
+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++
+++++++    关键改动:
+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++       直接传入原始的 key 和 value 张量效率更高。
+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++    """
+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++        super().__init__()
+++++++        self.config = config
+++++++        self.layer_idx = layer_idx
+++++++        self.hidden_size = config.hidden_size
+++++++        self.num_heads = config.num_attention_heads
+++++++        self.head_dim = self.hidden_size // self.num_heads
+++++++        self.num_key_value_heads = config.num_key_value_heads
+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++        self.max_position_embeddings = config.max_position_embeddings
+++++++        self.rope_theta = config.rope_theta
+++++++        self.attention_dropout = config.attention_dropout
+++++++
+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++            raise ValueError(
+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++            )
+++++++
+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++
+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++            self.head_dim,
+++++++            max_position_embeddings=self.max_position_embeddings,
+++++++            base=self.rope_theta,
+++++++        )
+++++++
+++++++    def forward(
+++++++        self,
+++++++        hidden_states: mindspore.Tensor,
+++++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++++        position_ids: Optional[mindspore.Tensor] = None,
+++++++        past_key_value: Optional[Cache] = None,
+++++++        output_attentions: bool = False,
+++++++        use_cache: bool = False,
+++++++        cache_position: Optional[mindspore.Tensor] = None,
+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++        bsz, q_len, _ = hidden_states.shape
+++++++
+++++++        # 1. 线性投射 Q, K, V
+++++++        query_states = self.q_proj(hidden_states)
+++++++        key_states = self.k_proj(hidden_states)
+++++++        value_states = self.v_proj(hidden_states)
+++++++
+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++        # 3. RoPE 旋转位置编码
+++++++        kv_seq_len = key_states.shape[-2]
+++++++        if past_key_value is not None:
+++++++            if self.layer_idx is None:
+++++++                raise ValueError(
+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++                    "with a layer index."
+++++++                )
+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++                if cache_position.shape[0] == 1:
+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++                    kv_seq_len = past_seen_tokens + 1
+++++++                else:
+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++            else:
+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++        # 4. KV 缓存更新
+++++++        if past_key_value is not None:
+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++            key_states, value_states = past_key_value.update(
+++++++                key_states, value_states, self.layer_idx, cache_kwargs
+++++++            )
+++++++            
+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++                if cache_position.shape[0] == 1:
+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++                    kv_seq_len = key_states.shape[-2]
+++++++
+++++++        # 5. [重要] 准备 Attention Mask
+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++        fa_attention_mask = None
+++++++        if attention_mask is not None:
+++++++            # 截取与当前key长度匹配的部分
+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++            fa_attention_mask = (mask_slice != 0)
+++++++
+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++        input_dtype = query_states.dtype
+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++            query_states = query_states.to(mindspore.float16)
+++++++            key_states = key_states.to(mindspore.float16)
+++++++            value_states = value_states.to(mindspore.float16)
+++++++
+++++++        # 6. [核心] 调用 flash_attention_score 算子
+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++        attn_output = mindspore.ops.flash_attention_score(
+++++++            query=query_states,
+++++++            key=key_states,
+++++++            value=value_states,
+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+++++++            attn_mask=fa_attention_mask,
+++++++            keep_prob=1.0 - self.attention_dropout,
+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++            input_layout="BNSD",
+++++++            sparse_mode=0 # 使用 defaultMask 模式
+++++++        )
+++++++
+++++++        # 恢复原始数据类型
+++++++        attn_output = attn_output.to(input_dtype)
+++++++
+++++++        # 7. 调整输出形状
+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++        attn_output = self.o_proj(attn_output)
+++++++
+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+++++++        attn_weights = None
+++++++        if output_attentions:
+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++        return attn_output, attn_weights, past_key_value
+++++++
+++++++    # def forward(
+++++++    #     self,
+++++++    #     hidden_states: mindspore.Tensor,
+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++    #     past_key_value: Optional[Cache] = None,
+++++++    #     output_attentions: bool = False,
+++++++    #     use_cache: bool = False,
+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++
+++++++    #     # 1. 线性投射 Q, K, V
+++++++    #     query_states = self.q_proj(hidden_states)
+++++++    #     key_states = self.k_proj(hidden_states)
+++++++    #     value_states = self.v_proj(hidden_states)
+++++++
+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++    #     # 3. RoPE 旋转位置编码
+++++++    #     kv_seq_len = key_states.shape[-2]
+++++++    #     if past_key_value is not None:
+++++++    #         if self.layer_idx is None:
+++++++    #             raise ValueError(
+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++    #                 "with a layer index."
+++++++    #             )
+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++    #     # 4. KV 缓存更新
+++++++    #     if past_key_value is not None:
+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++    #         key_states, value_states = past_key_value.update(
+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++    #         )
+++++++
+++++++    #     # 5. 准备 Attention Mask
+++++++    #     fa_attention_mask = None
+++++++    #     if attention_mask is not None:
+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++
+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++    #     input_dtype = query_states.dtype
+++++++
+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++    #         query=query_states,
+++++++    #         key=key_states,
+++++++    #         value=value_states,
+++++++    #         head_num=self.num_heads,
+++++++    #         attn_mask=fa_attention_mask,
+++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++    #         input_layout="BNSD",
+++++++    #         sparse_mode=0,
+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++    #         inner_precise=1
+++++++    #     )
+++++++
+++++++    #     # 恢复原始数据类型
+++++++    #     attn_output = attn_output.to(input_dtype)
+++++++
+++++++    #     # 7. 调整输出形状
+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++    #     attn_output = self.o_proj(attn_output)
+++++++
+++++++    #     attn_weights = None
+++++++    #     if output_attentions:
+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++    #     return attn_output, attn_weights, past_key_value
+++++++
+++++++    # def forward(
+++++++    #     self,
+++++++    #     hidden_states: mindspore.Tensor,
+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++    #     past_key_value: Optional[Cache] = None,
+++++++    #     output_attentions: bool = False,
+++++++    #     use_cache: bool = False,
+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++
+++++++    #     query_states = self.q_proj(hidden_states)
+++++++    #     key_states = self.k_proj(hidden_states)
+++++++    #     value_states = self.v_proj(hidden_states)
+++++++
+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++    #     kv_seq_len = key_states.shape[-2]
+++++++    #     if past_key_value is not None:
+++++++    #         if self.layer_idx is None:
+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++    #     if past_key_value is not None:
+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++    #         key_states, value_states = past_key_value.update(
+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++    #         )
+++++++
+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++
+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+++++++    #     # <--- 修改结束 ---
+++++++
+++++++    #     fa_attention_mask = None
+++++++    #     if attention_mask is not None:
+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++
+++++++    #     input_dtype = query_states.dtype
+++++++
+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+++++++    #         key=key_states,
+++++++    #         value=value_states,
+++++++    #         head_num=self.num_heads,
+++++++    #         attn_mask=fa_attention_mask,
+++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++++    #         input_layout="BNSD",
+++++++    #         sparse_mode=0,
+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+++++++    #     )
+++++++
+++++++    #     attn_output = attn_output.to(input_dtype)
+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++    #     attn_output = self.o_proj(attn_output)
+++++++
+++++++    #     attn_weights = None
+++++++    #     if output_attentions:
+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++++
+++++++    #     return attn_output, attn_weights, past_key_value
+++++++
++++++ QWEN2MOE_ATTENTION_CLASSES = {
++++++     "eager": Qwen2MoeAttention,
+++++++    "flash-attention": Qwen2MoeFlashAttention,
++++++ }
++++++ 
++++++ 
++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++ 
+++++++    #@dwj
+++++++    # 只遍历激活的专家，而非全部专家
++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
++++++-        # router_logits: (batch * sequence_length, n_experts)
++++++-        router_logits = self.gate(hidden_states)
++++++-
++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-        if self.norm_topk_prob:
++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        # we cast back to the input dtype
++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
++++++-
++++++-        final_hidden_states = ops.zeros(
++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++++-        )
++++++-
++++++-        # One hot encode the selected experts to create an expert mask
++++++-        # this will be used to easily index which expert is going to be sollicitated
++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++++-
++++++-        # Loop over all available experts in the model and perform the computation on each expert
++++++-        for expert_idx in range(self.num_experts):
++++++-            expert_layer = self.experts[expert_idx]
++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++++-
++++++-            # Index the correct hidden states and compute the expert hidden state for
++++++-            # the current expert. We need to make sure to multiply the output hidden
++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++++-            if 0 not in idx.shape:
++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++++-
++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
++++++-                # the `top_x` tensor here.
++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++++-
++++++-        shared_expert_output = self.shared_expert(hidden_states)
++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++++-
++++++-        final_hidden_states = final_hidden_states + shared_expert_output
+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++            num_tokens = hidden_states_reshaped.shape[0]
+++++++
+++++++            router_logits = self.gate(hidden_states_reshaped)
+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++
+++++++            if self.norm_topk_prob:
+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+++++++            
+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++            flat_selected_experts = selected_experts.flatten()
+++++++            
+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++            token_indices = broadcasted_token_indices.flatten()
+++++++            
+++++++            active_experts = ops.unique(flat_selected_experts)
+++++++            
+++++++            for expert_idx_tensor in active_experts:
+++++++                expert_idx = expert_idx_tensor.item()
+++++++                expert_layer = self.experts[expert_idx]
+++++++                
+++++++                mask = (flat_selected_experts == expert_idx_tensor)
+++++++                selected_token_indices = token_indices[mask]
+++++++                selected_routing_weights = routing_weights.flatten()[mask]
+++++++                
+++++++                current_states = hidden_states_reshaped[selected_token_indices]
+++++++                
+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++                
+++++++                final_hidden_states = final_hidden_states.index_add(
+++++++                    dim=0,
+++++++                    index=selected_token_indices,
+++++++                    source=expert_output.to(hidden_states.dtype)
+++++++                )
+++++++            
+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++ 
++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++-        return final_hidden_states, router_logits
+++++++            final_hidden_states = final_hidden_states + shared_expert_output
+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++            
+++++++            return final_hidden_states, router_logits
++++++ 
++++++ 
++++++ class Qwen2MoeDecoderLayer(nn.Module):
++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++++ 
++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++ 
+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++
++++++         if (layer_idx not in config.mlp_only_layers) and (
++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++++         ):
++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++++     _skip_keys_device_placement = "past_key_values"
++++++     _supports_cache_class = True
+++++++#lwx
+++++++    # _supports_static_cache = True
++++++ 
++++++     def _init_weights(self, module):
++++++         std = self.config.initializer_range
++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++++         return causal_mask
++++++ 
++++++ 
++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++     _tied_weights_keys = ["lm_head.weight"]
++++++ 
++++++     def __init__(self, config):
++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++         self.num_experts_per_tok = config.num_experts_per_tok
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
+++++++        # @lwx
+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++++        #     self.generation_config.cache_implementation = "static"
+++++++        self._warmed_up = False
+++++++
+++++++    def warmup_moe_model(self):
+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++++        test_texts = [
+++++++            "warmup short",
+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++++        ]
+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++        if tokenizer is None:
+++++++            from mindnlp.transformers import AutoTokenizer
+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++            self._warmup_tokenizer = tokenizer
+++++++
+++++++        for text in test_texts:
+++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++            with mindspore._no_grad():
+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++++ 
++++++     def get_input_embeddings(self):
++++++         return self.model.embed_tokens
++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++         ```"""
+++++++        if not self._warmed_up:
+++++++            self._warmed_up = True
+++++++            self.warmup_moe_model()
++++++ 
++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++++         output_router_logits = (
++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++             }
++++++         )
++++++         return model_inputs
+++++++# @lwx
+++++++    # def _decode_one_tokens_logits(
+++++++    #     self,
+++++++    #     cur_token: mindspore.Tensor,
+++++++    #     input_pos: Optional[mindspore.Tensor],
+++++++    #     cache_position: mindspore.Tensor,
+++++++    #     past_key_values: StaticCache,
+++++++    # ) -> mindspore.Tensor:
+++++++    #     """
+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++++        
+++++++    #     Args:
+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++++    #         input_pos: 输入位置信息，可选
+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++++            
+++++++    #     Returns:
+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++++    #     """
+++++++    #     # 调用JIT编译的版本
+++++++    #     return self.get_decode_one_tokens_logits(
+++++++    #         cur_token=cur_token,
+++++++    #         input_pos=input_pos,
+++++++    #         cache_position=cache_position,
+++++++    #         past_key_values=past_key_values,
+++++++    #     )
+++++++    
+++++++    # @mindspore.jit(jit_level='O1')
+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++++    #     """
+++++++    #     JIT编译的函数，用于高效的单token解码
+++++++    #     使用JIT编译优化以支持静态shape和高效执行
+++++++        
+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++++    #     """
+++++++    #     outputs = self.model.forward(
+++++++    #         input_ids=cur_token,
+++++++    #         position_ids=input_pos,
+++++++    #         cache_position=cache_position,
+++++++    #         past_key_values=past_key_values,
+++++++    #         use_cache=True,
+++++++    #         return_dict=False,
+++++++    #     )
+++++++        
+++++++    #     hidden_states = outputs[0]
+++++++    #     logits = self.lm_head.forward(hidden_states)
+++++++    #     logits = logits.float()
+++++++        
+++++++    #     return logits[:, -1, :]
+++++++
+++++++    # def _sample(
+++++++    #     self,
+++++++    #     input_ids: mindspore.Tensor,
+++++++    #     logits_processor,
+++++++    #     stopping_criteria,
+++++++    #     generation_config,
+++++++    #     synced_devices: bool,
+++++++    #     streamer=None,
+++++++    #     logits_warper=None,
+++++++    #     **model_kwargs,
+++++++    # ):
+++++++    #     """
+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++++    #     """
+++++++    #     from ...generation.logits_process import LogitsProcessorList
+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++++    #     from mindnlp.core import nn, ops, no_grad
+++++++    #     import numpy as np
+++++++        
+++++++    #     # 检查是否使用 StaticCache
+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++++    #     # 否则，直接调用父类方法
+++++++    #     past_key_values = model_kwargs.get("past_key_values")
+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++++
+++++++    #     if not isinstance(past_key_values, StaticCache):
+++++++    #         # 不使用 StaticCache，直接调用父类方法
+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++++    #         return super()._sample(
+++++++    #             input_ids=input_ids,
+++++++    #             logits_processor=logits_processor,
+++++++    #             stopping_criteria=stopping_criteria,
+++++++    #             generation_config=generation_config,
+++++++    #             synced_devices=synced_devices,
+++++++    #             streamer=streamer,
+++++++    #             logits_warper=logits_warper,
+++++++    #             **model_kwargs,
+++++++    #         )
+++++++        
+++++++    #     # 使用 StaticCache，进入自定义循环
+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++++    #     pad_token_id = generation_config._pad_token_tensor
+++++++    #     output_attentions = generation_config.output_attentions
+++++++    #     output_hidden_states = generation_config.output_hidden_states
+++++++    #     output_scores = generation_config.output_scores
+++++++    #     output_logits = generation_config.output_logits
+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++++    #     max_length = generation_config.max_length
+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++++    #     do_sample = generation_config.do_sample
+++++++        
+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++++    #         raise ValueError(
+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++++    #             f"{logits_warper})."
+++++++    #         )
+++++++        
+++++++    #     # init attention / hidden states / scores tuples
+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++++        
+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++++    #         encoder_hidden_states = (
+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++++    #         )
+++++++        
+++++++    #     # keep track of which sequences are already finished
+++++++    #     batch_size, cur_len = input_ids.shape
+++++++    #     this_peer_finished = False
+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++++        
+++++++    #     time_record = []
+++++++    #     from ....utils.testing_utils import parse_flag_from_env
+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++++        
+++++++    #     while self._has_unfinished_sequences(
+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++++    #     ):
+++++++    #         if _record_time:
+++++++    #             import time as time_module
+++++++    #             infer_start = time_module.time()
+++++++            
+++++++    #         # prepare model inputs
+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++++            
+++++++    #         # prepare variable output controls
+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++++            
+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++++    #         cur_cache_position = model_inputs.get("cache_position")
+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++++    #         cur_input_ids = model_inputs.get("input_ids")
+++++++            
+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++++    #             cur_cache_position is not None and 
+++++++    #             len(cur_cache_position.shape) > 0 and
+++++++    #             cur_cache_position.shape[0] == 1 and
+++++++    #             cur_input_ids is not None and
+++++++    #             cur_input_ids.shape[1] == 1):
+++++++    #             # 使用 JIT 优化的单 token 解码
+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++++    #             if not hasattr(self, '_jit_used'):
+++++++    #                 self._jit_used = False
+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++++                
+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++++    #                 cur_token=cur_input_ids,
+++++++    #                 input_pos=model_inputs.get("position_ids"),
+++++++    #                 cache_position=cur_cache_position,
+++++++    #                 past_key_values=cur_past_key_values,
+++++++    #             )
+++++++                
+++++++    #             # 标记已使用JIT（用于后续判断）
+++++++    #             if not self._jit_used:
+++++++    #                 self._jit_used = True
+++++++                
+++++++    #             # 构造兼容的输出对象
+++++++    #             class JitOptimizedOutput:
+++++++    #                 def __init__(self, logits, config):
+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++++    #                     self.config = config
+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++++    #                     self.cross_attentions = None
+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++++                
+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++++    #         else:
+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++++    #             outputs = self(**model_inputs, return_dict=True)
+++++++            
+++++++    #         if synced_devices and this_peer_finished:
+++++++    #             continue
+++++++            
+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++++    #         next_token_logits = outputs.logits[:, -1, :]
+++++++            
+++++++    #         # pre-process distribution
+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++++    #         if do_sample:
+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++++            
+++++++    #         # Store scores, attentions and hidden_states when required
+++++++    #         if return_dict_in_generate:
+++++++    #             if output_scores:
+++++++    #                 scores += (next_token_scores,)
+++++++    #             if output_logits:
+++++++    #                 raw_logits += (next_token_logits,)
+++++++    #             if output_attentions:
+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++++    #                 if self.config.is_encoder_decoder:
+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++++                
+++++++    #             if output_hidden_states:
+++++++    #                 hidden = (
+++++++    #                     outputs.decoder_hidden_states
+++++++    #                     if self.config.is_encoder_decoder
+++++++    #                     else outputs.hidden_states
+++++++    #                 )
+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++++            
+++++++    #         # token selection
+++++++    #         if do_sample:
+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++++    #         else:
+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++++            
+++++++    #         # finished sentences should have their next token be a padding token
+++++++    #         if has_eos_stopping_criteria:
+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++++            
+++++++    #         # update generated ids, model inputs, and length for next step
+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++++    #         if streamer is not None:
+++++++    #             streamer.put(next_tokens)
+++++++            
+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++++    #             outputs,
+++++++    #             model_kwargs,
+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++++    #         )
+++++++            
+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++++    #         cur_len += 1
+++++++            
+++++++    #         if _record_time:
+++++++    #             import time as time_module
+++++++    #             infer_stop = time_module.time()
+++++++    #             time_record.append(infer_stop - infer_start)
+++++++            
+++++++    #         del outputs
+++++++        
+++++++    #     average_infer_time = None
+++++++    #     if time_record:
+++++++    #         if len(time_record) > 1:
+++++++    #             time_record.pop(0)
+++++++    #         average_infer_time = sum(time_record) / len(time_record)
+++++++    #         print(f'average inference time is: {average_infer_time}')
+++++++    #         print(f'inference time record: {time_record}')
+++++++        
+++++++    #     if streamer is not None:
+++++++    #         streamer.end()
+++++++        
+++++++    #     # 简单判断：打印是否使用了JIT路径
+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++++    #     else:
+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++++        
+++++++    #     if return_dict_in_generate:
+++++++    #         if self.config.is_encoder_decoder:
+++++++    #             return GenerateEncoderDecoderOutput(
+++++++    #                 sequences=input_ids,
+++++++    #                 scores=scores,
+++++++    #                 logits=raw_logits,
+++++++    #                 encoder_attentions=encoder_attentions,
+++++++    #                 encoder_hidden_states=encoder_hidden_states,
+++++++    #                 decoder_attentions=decoder_attentions,
+++++++    #                 cross_attentions=cross_attentions,
+++++++    #                 decoder_hidden_states=decoder_hidden_states,
+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++    #                 average_infer_time=average_infer_time
+++++++    #             )
+++++++    #         else:
+++++++    #             return GenerateDecoderOnlyOutput(
+++++++    #                 sequences=input_ids,
+++++++    #                 scores=scores,
+++++++    #                 logits=raw_logits,
+++++++    #                 attentions=decoder_attentions,
+++++++    #                 hidden_states=decoder_hidden_states,
+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++    #                 average_infer_time=average_infer_time
+++++++    #             )
+++++++    #     else:
+++++++    #         return input_ids
+++++++            
+++++++    # def _prepare_cache_for_generation(
+++++++    #     self,
+++++++    #     generation_config,
+++++++    #     model_kwargs,
+++++++    #     assistant_model,
+++++++    #     batch_size,
+++++++    #     max_cache_length,
+++++++    # ):
+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++++    #         generation_config.cache_implementation = "static"
+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++++        
+++++++    #     if generation_config.cache_implementation == "static":
+++++++    #         base_required_from_max_length = generation_config.max_length + 1
+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++++    #         min_cache_size = 50
+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++++    #         else:
+++++++    #             max_cache_length = max(base_required, min_cache_size)
+++++++            
+++++++    #         original_max_cache_length = max_cache_length
+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++++            
+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++    #             if max_cache_length > self.config.max_position_embeddings:
+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++++        
+++++++    #     result = super()._prepare_cache_for_generation(
+++++++    #         generation_config=generation_config,
+++++++    #         model_kwargs=model_kwargs,
+++++++    #         assistant_model=assistant_model,
+++++++    #         batch_size=batch_size,
+++++++    #         max_cache_length=max_cache_length,
+++++++    #     )
+++++++        
+++++++    #     if generation_config.cache_implementation == "static":
+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++++    #         created_cache = model_kwargs.get(cache_name)
+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++++        
+++++++    #     return result
+++++++
+++++++
+++++++
++++++ 
++++++ 
++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++++-- 
++++++2.27.0
++++++
+++++-- 
+++++2.27.0
+++++
++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++++new file mode 100644
++++index 00000000..966529e4
++++--- /dev/null
+++++++ b/patches/0003-20261106secondcommit.patch
++++@@ -0,0 +1,2769 @@
+++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Thu, 6 Nov 2025 14:54:37 +0800
+++++Subject: [PATCH 3/3] 20261106secondcommit
+++++
+++++---
+++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+++++ patches/0001-20251104commit.patch             | 1272 -----------------
+++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+++++ delete mode 100644 patches/0001-20251104commit.patch
+++++
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index 73773c22..2f9192bf 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+++++ 
+++++ _CONFIG_FOR_DOC = "DeepseekConfig"
+++++ 
++++++_attn_mask_cache = {}
++++++
++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
++++++    q_len = batch_and_seq[1]
++++++    kv_len = batch_and_seq[1] + past_key_values_length 
++++++    key = (batch_and_seq[0], q_len, kv_len)
++++++
++++++    if key in _attn_mask_cache:
++++++        return _attn_mask_cache[key]
++++++
++++++    mask = _prepare_4d_causal_attention_mask(
++++++        attention_mask,
++++++        batch_and_seq,
++++++        inputs_embeds,
++++++        past_key_values_length,
++++++    )
++++++    _attn_mask_cache[key] = mask
++++++    return mask
+++++ 
+++++ def _get_unpad_data(attention_mask):
+++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+++++         return final_output
+++++ 
+++++ 
+++++-    @no_grad()
+++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++-        expert_cache = ops.zeros_like(x)
+++++-        idxs = flat_expert_indices.argsort()
+++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++-        token_idxs = idxs // self.num_experts_per_tok
+++++-
+++++-        for i, end_idx in enumerate(tokens_per_expert):
+++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++-            if start_idx == end_idx:
+++++-                continue
+++++-            expert = self.experts[i]
+++++-            exp_token_idx = token_idxs[start_idx:end_idx]
+++++-            expert_tokens = x[exp_token_idx]
+++++-            expert_out = expert(expert_tokens)
+++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++-
+++++-        return expert_cache
+++++-        
+++++     # @no_grad()
+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-    #     # expert_cache = torch.zeros_like(x)
+++++-    #     # idxs = flat_expert_indices.argsort()
+++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++-    #     # token_idxs = idxs // self.num_experts_per_tok
+++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++-    #     #     if start_idx == end_idx:
+++++-    #     #         continue
+++++-    #     #     expert = self.experts[i]
+++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++-    #     #     expert_tokens = x[exp_token_idx]
+++++-    #     #     expert_out = expert(expert_tokens)
+++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++-    #     # return expert_cache
++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++     #     expert_cache = ops.zeros_like(x)
+++++     #     idxs = flat_expert_indices.argsort()
+++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++ 
+++++     #     return expert_cache
+++++-    # @no_grad()
+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-    #     expert_cache = ops.zeros_like(x)
++++++        
++++++    @no_grad()
++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++        """
++++++        优化版 MoE prefill：
++++++        - 批量张量化处理同一个 expert 的所有 token
++++++        - 跳过无 token 的专家
++++++        - 保持结果完全一致
++++++        """
++++++        # 初始化输出缓存
++++++        expert_cache = ops.zeros_like(x)
+++++ 
+++++-    #     # 排序保证顺序一致
+++++-    #     idxs = flat_expert_indices.argsort()
+++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++-    #     token_idxs = idxs // self.num_experts_per_tok
++++++        # 排序（确保 scatter_add 位置对应原逻辑）
++++++        idxs = flat_expert_indices.argsort()
++++++        sorted_expert_indices = flat_expert_indices[idxs]
++++++        sorted_token_indices = idxs // self.num_experts_per_tok
+++++ 
+++++-    #     # 找出有 token 的专家
+++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++        # 每个 expert 的 token 数
++++++        tokens_per_expert = sorted_expert_indices.bincount()
+++++ 
+++++-    #     for i in active_experts.tolist():
+++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++-    #         end_idx = tokens_per_expert[i]
+++++-    #         if start_idx == end_idx:  # 没有 token
+++++-    #             continue
++++++        # 找出有 token 的专家
++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++++ 
+++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++-    #         expert_tokens = x[exp_token_idx]
+++++-    #         expert_out = self.experts[i](expert_tokens)
+++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++        for expert_id in active_experts.tolist():
++++++            # 取该 expert 对应的排序后 token 区间
++++++            start = (tokens_per_expert[:expert_id]).sum().item()
++++++            end = start + tokens_per_expert[expert_id].item()
+++++ 
+++++-    #         expert_cache = mindspore.mint.scatter_add(
+++++-    #             expert_cache,
+++++-    #             0,
+++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++-    #             expert_out
+++++-    #         )
++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
++++++            expert_tokens = x[token_idx]                     # 取输入向量
+++++ 
+++++-    #     return expert_cache
++++++            # 执行专家 MLP
++++++            expert_out = self.experts[expert_id](expert_tokens)
++++++
++++++            # 按权重缩放
++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
++++++
++++++            # 回写到缓存（等价 scatter_add）
++++++            expert_cache = mindspore.mint.scatter_add(
++++++                expert_cache,
++++++                0,
++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++                scaled_out
++++++            )
++++++
++++++        return expert_cache
++++++
++++++        # @no_grad()
++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++        #     # expert_cache = torch.zeros_like(x)
++++++        #     # idxs = flat_expert_indices.argsort()
++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++        #     # token_idxs = idxs // self.num_experts_per_tok
++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++        #     #     if start_idx == end_idx:
++++++        #     #         continue
++++++        #     #     expert = self.experts[i]
++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++        #     #     expert_tokens = x[exp_token_idx]
++++++        #     #     expert_out = expert(expert_tokens)
++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++        #     # return expert_cache
++++++        #     expert_cache = ops.zeros_like(x)
++++++        #     idxs = flat_expert_indices.argsort()
++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++        #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++        #     for i, end_idx in enumerate(tokens_per_expert):
++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++        #         if start_idx == end_idx:
++++++        #             continue
++++++        #         expert = self.experts[i]
++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++        #         expert_tokens = x[exp_token_idx]
++++++        #         expert_out = expert(expert_tokens)
++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++
++++++        #     return expert_cache
++++++        # @no_grad()
++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++        #     expert_cache = ops.zeros_like(x)
++++++
++++++        #     # 排序保证顺序一致
++++++        #     idxs = flat_expert_indices.argsort()
++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++        #     token_idxs = idxs // self.num_experts_per_tok
++++++
++++++        #     # 找出有 token 的专家
++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++
++++++        #     for i in active_experts.tolist():
++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++        #         end_idx = tokens_per_expert[i]
++++++        #         if start_idx == end_idx:  # 没有 token
++++++        #             continue
++++++
++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++        #         expert_tokens = x[exp_token_idx]
++++++        #         expert_out = self.experts[i](expert_tokens)
++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++
++++++        #         expert_cache = mindspore.mint.scatter_add(
++++++        #             expert_cache,
++++++        #             0,
++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++        #             expert_out
++++++        #         )
++++++
++++++        #     return expert_cache
+++++ 
+++++ 
+++++ 
+++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++-
+++++ # class DeepseekFlashAttention(nn.Module):
+++++ #     """
+++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
++++++
+++++ Deepseek_ATTENTION_CLASSES = {
+++++     "eager": DeepseekAttention,
+++++     "flash-attention": DeepseekFlashAttention,
+++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+++++             )
+++++         else:
+++++             # 4d mask is passed through the layers
+++++-            attention_mask = _prepare_4d_causal_attention_mask(
++++++            # attention_mask = _prepare_4d_causal_attention_mask(
++++++            #     attention_mask,
++++++            #     (batch_size, seq_length),
++++++            #     inputs_embeds,
++++++            #     past_key_values_length,
++++++            # )
++++++            #@dwj
++++++            attention_mask = get_cached_causal_mask(
+++++                 attention_mask,
+++++                 (batch_size, seq_length),
+++++                 inputs_embeds,
+++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
+++++         self.warm_up = False
++++++        #@dwj
++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++++++            self.num_layers,
++++++            self.num_attention_heads,
++++++            self.head_dim,
++++++            batch_size=1,
++++++            max_length=self.max_length,
++++++            dtype=mindspore.float16
++++++        )
++++++
++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++++++        key_cache = []
++++++        value_cache = []
++++++        for _ in range(num_layers):
++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++            key_cache.append(k)
++++++            value_cache.append(v)
++++++        return key_cache, value_cache
++++++
+++++ 
+++++     def warmup_moe_model_deep(self):
+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++index bced285c..ebd7782e 100644
+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++++ 
+++++-Long_Prompt = False
+++++-PROMPT_LENGTH_THRESHOLD = 128
++++++Long_Prompt = 1
++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
++++++
++++++_causal_mask_cache = {}
++++++
++++++def get_cached_causal_mask_with_cache_position(
++++++    attention_mask: mindspore.Tensor,
++++++    sequence_length: int,
++++++    target_length: int,
++++++    dtype: mindspore.dtype,
++++++    min_dtype: float,
++++++    cache_position: mindspore.Tensor,
++++++    batch_size: int,
++++++):
++++++    """
++++++    带缓存的 causal mask 构造函数
++++++    """
++++++    # q_len 是当前 query 长度
++++++    q_len = sequence_length
++++++    # kv_len 是 target_length
++++++    kv_len = target_length
++++++
++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
++++++
++++++    if key in _causal_mask_cache:
++++++        return _causal_mask_cache[key]
++++++
++++++    # 调用原来的 mask 构造逻辑
++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++        attention_mask,
++++++        sequence_length=sequence_length,
++++++        target_length=target_length,
++++++        dtype=dtype,
++++++        min_dtype=min_dtype,
++++++        cache_position=cache_position,
++++++        batch_size=batch_size,
++++++    )
++++++    # 缓存结果
++++++    _causal_mask_cache[key] = causal_mask
++++++    return causal_mask
+++++ 
+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++ 
+++++ 
+++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
++++++# class Qwen2MoeAttention(nn.Module):
++++++#     """
++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++++++#     and "Generating Long Sequences with Sparse Transformers".
++++++#     """
++++++
++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++#         super().__init__()
++++++#         self.config = config
++++++#         self.layer_idx = layer_idx
++++++#         if layer_idx is None:
++++++#             logger.warning_once(
++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++#                 "when creating this class."
++++++#             )
++++++
++++++#         self.hidden_size = config.hidden_size
++++++#         self.num_heads = config.num_attention_heads
++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++#         self.rope_theta = config.rope_theta
++++++#         self.is_causal = True
++++++#         self.attention_dropout = config.attention_dropout
++++++
++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++#             raise ValueError(
++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++#                 f" and `num_heads`: {self.num_heads})."
++++++#             )
++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++
++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++#             self.head_dim,
++++++#             max_position_embeddings=self.max_position_embeddings,
++++++#             base=self.rope_theta,
++++++#         )
++++++
++++++#     def forward(
++++++#         self,
++++++#         hidden_states: mindspore.Tensor,
++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++#         past_key_value: Optional[Cache] = None,
++++++#         output_attentions: bool = False,
++++++#         use_cache: bool = False,
++++++#         cache_position: Optional[mindspore.Tensor] = None,
++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++
++++++        
++++++
++++++#         bsz, q_len, _ = hidden_states.shape
++++++
++++++#         query_states = self.q_proj(hidden_states)
++++++#         key_states = self.k_proj(hidden_states)
++++++#         value_states = self.v_proj(hidden_states)
++++++
++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++
++++++#         kv_seq_len = key_states.shape[-2]
++++++#         if past_key_value is not None:
++++++#             if self.layer_idx is None:
++++++#                 raise ValueError(
++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++#                     "with a layer index."
++++++#                 )
++++++#             if isinstance(past_key_value, StaticCache):
++++++#                 kv_seq_len = key_states.shape[-2]
++++++#             else:
++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++
++++++#         if past_key_value is not None:
++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++            
++++++#             if isinstance(past_key_value, StaticCache):
++++++#                 kv_seq_len = key_states.shape[-2]
++++++
++++++#         # repeat k/v heads if n_kv_heads < n_heads
++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++        
++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++
++++++#         if attention_mask is not None:
++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++#             attn_weights = attn_weights + causal_mask
++++++
++++++#         # upcast attention to fp32
++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++++#         attn_output = ops.matmul(attn_weights, value_states)
++++++
++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++++#             raise ValueError(
++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++++++#                 f" {attn_output.shape}"
++++++#             )
++++++
++++++#         attn_output = ops.transpose(attn_output, 1, 2)
++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++
++++++#         attn_output = self.o_proj(attn_output)
++++++#         # @lwx
++++++        
++++++#         # max_seq_len = self.max_position_embeddings  # 2048
++++++
++++++#         # if attention_mask is not None:
++++++#         #     # attention_mask: [B, 1, Sq, Sk]
++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++
++++++#         #     # pad 到 [max_seq_len, max_seq_len]
++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++#         #     global_attention_mask = padded_mask
++++++#         # else:
++++++#         #     global_attention_mask = None
++++++
++++++
++++++#         # sparse_mode=3
++++++#         # attn_output = mindspore.ops.flash_attention_score(
++++++#         #     query=query_states,
++++++#         #     key=key_states,
++++++#         #     value=value_states,
++++++#         #     real_shift=None,
++++++#         #     padding_mask=None,
++++++
++++++#         #     head_num=self.num_heads,
++++++#         #     attn_mask=global_attention_mask,
++++++#         #     keep_prob=1.0 - self.attention_dropout,
++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++#         #     input_layout="BNSD",
++++++#         #     pre_tokens=2147483647,
++++++#         #     next_tokens=2147483647,
++++++#         #     inner_precise=0,
++++++#         #     drop_mask=None,
++++++#         #     prefix=None,
++++++#         #     actual_seq_qlen=None,
++++++#         #     actual_seq_kvlen=None,
++++++#         #     sparse_mode=sparse_mode,
++++++#         # )
++++++#         if not output_attentions:
++++++#             attn_weights = None
++++++
++++++#         return attn_output, attn_weights, past_key_value
++++++
+++++ class Qwen2MoeAttention(nn.Module):
+++++     """
+++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++++-    and "Generating Long Sequences with Sparse Transformers".
+++++-    """
++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+++++ 
++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
++++++
++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
++++++    """
+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++         super().__init__()
+++++         self.config = config
+++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+++++         if layer_idx is None:
+++++             logger.warning_once(
+++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++                 "when creating this class."
+++++             )
+++++ 
+++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+++++         use_cache: bool = False,
+++++         cache_position: Optional[mindspore.Tensor] = None,
+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-
+++++         
+++++-
++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+++++         bsz, q_len, _ = hidden_states.shape
+++++ 
+++++         query_states = self.q_proj(hidden_states)
+++++         key_states = self.k_proj(hidden_states)
+++++         value_states = self.v_proj(hidden_states)
+++++ 
+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++-
++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++        
+++++         kv_seq_len = key_states.shape[-2]
+++++         if past_key_value is not None:
+++++-            if self.layer_idx is None:
+++++-                raise ValueError(
+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-                    "with a layer index."
+++++-                )
+++++-            if isinstance(past_key_value, StaticCache):
+++++-                kv_seq_len = key_states.shape[-2]
+++++-            else:
+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++        
+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++ 
+++++         if past_key_value is not None:
+++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++
++++++        # --- 2. 动态调度核心注意力计算 ---
++++++        global Long_Prompt
++++++        if Long_Prompt >= 1:
++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
++++++            fa_attention_mask = None
++++++            if attention_mask is not None:
++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++                fa_attention_mask = (mask_slice != 0)
++++++
++++++            attn_output = mindspore.ops.flash_attention_score(
++++++                query=query_states,
++++++                key=key_states,
++++++                value=value_states,
++++++                head_num=self.num_heads,
++++++                attn_mask=fa_attention_mask,
++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
++++++                input_layout="BNSD",
++++++                sparse_mode=0,
++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
++++++            )
+++++             
+++++-            if isinstance(past_key_value, StaticCache):
+++++-                kv_seq_len = key_states.shape[-2]
++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++            attn_output = self.o_proj(attn_output)
++++++            attn_weights = None
++++++            if output_attentions:
++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++++ 
+++++-        # repeat k/v heads if n_kv_heads < n_heads
+++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++-        
+++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++        else:
++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++            
++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++ 
+++++-        if attention_mask is not None:
+++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++-            attn_weights = attn_weights + causal_mask
++++++            if attention_mask is not None:
++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++                attn_weights = attn_weights + causal_mask
+++++ 
+++++-        # upcast attention to fp32
+++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++-        attn_output = ops.matmul(attn_weights, value_states)
++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++++            attn_output = ops.matmul(attn_weights, value_states)
+++++ 
+++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++-            raise ValueError(
+++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++++-                f" {attn_output.shape}"
+++++-            )
++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++++                raise ValueError(
++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
++++++                )
+++++ 
+++++-        attn_output = ops.transpose(attn_output, 1, 2)
+++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++            attn_output = ops.transpose(attn_output, 1, 2)
++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++            attn_output = self.o_proj(attn_output)
+++++ 
+++++-        attn_output = self.o_proj(attn_output)
+++++-        # @lwx
++++++            if not output_attentions:
++++++                attn_weights = None
+++++         
+++++-        # max_seq_len = self.max_position_embeddings  # 2048
+++++-
+++++-        # if attention_mask is not None:
+++++-        #     # attention_mask: [B, 1, Sq, Sk]
+++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++-
+++++-        #     # pad 到 [max_seq_len, max_seq_len]
+++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++-        #     global_attention_mask = padded_mask
+++++-        # else:
+++++-        #     global_attention_mask = None
+++++-
+++++-
+++++-        # sparse_mode=3
+++++-        # attn_output = mindspore.ops.flash_attention_score(
+++++-        #     query=query_states,
+++++-        #     key=key_states,
+++++-        #     value=value_states,
+++++-        #     real_shift=None,
+++++-        #     padding_mask=None,
+++++-
+++++-        #     head_num=self.num_heads,
+++++-        #     attn_mask=global_attention_mask,
+++++-        #     keep_prob=1.0 - self.attention_dropout,
+++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-        #     input_layout="BNSD",
+++++-        #     pre_tokens=2147483647,
+++++-        #     next_tokens=2147483647,
+++++-        #     inner_precise=0,
+++++-        #     drop_mask=None,
+++++-        #     prefix=None,
+++++-        #     actual_seq_qlen=None,
+++++-        #     actual_seq_kvlen=None,
+++++-        #     sparse_mode=sparse_mode,
+++++-        # )
+++++-        if not output_attentions:
+++++-            attn_weights = None
+++++-
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++-
+++++ # class Qwen2MoeFlashAttention(nn.Module):
+++++ #     """
+++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+++++ #             return final_hidden_states, router_logits
+++++ 
+++++ 
+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-#     """
+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++++-#     """
+++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++-#         super().__init__()
+++++-#         self.num_experts = config.num_experts
+++++-#         self.top_k = config.num_experts_per_tok
+++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++-
+++++-#         # 门控网络
+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++-#         # 专家列表
+++++-#         self.experts = nn.ModuleList(
+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++-#         )
+++++-#         # 共享专家
+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_decode(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         """
+++++-#         【解码路径】针对 sequence_length=1 的极致优化。
+++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++++-#         """
+++++-#         batch_size, hidden_dim = hidden_states.shape
+++++-        
+++++-#         expert_outputs_list = [
+++++-#             ops.cat([
+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++-#             ], dim=0) 
+++++-#             for i in range(batch_size)
+++++-#         ]
+++++-        
+++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++++-#         # shape: (batch_size, top_k, hidden_dim)
+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++-        
+++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++-        
+++++-#         return moe_output.squeeze(1)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_prefill(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         """
+++++-#         【预填充路径】针对 sequence_length > 1 的优化。
+++++-#         按专家对 Token 进行分组，并进行批处理。
+++++-#         """
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens = hidden_states.shape[0]
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-        
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-        
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-        
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-            
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-#             selected_token_indices = token_indices[mask]
+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++-            
+++++-#             current_states = hidden_states[selected_token_indices]
+++++-            
+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-            
+++++-#             moe_output = moe_output.index_add(
+++++-#                 dim=0,
+++++-#                 index=selected_token_indices,
+++++-#                 source=expert_output.to(hidden_states.dtype)
+++++-#             )
+++++-#         return moe_output
+++++-
+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-#         """
+++++-#         顶层 forward 方法，作为智能分发器。
+++++-#         """
+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        
+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-
+++++-#         if self.norm_topk_prob:
+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        
+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++-        
+++++-#         moe_output = None
+++++-#         # 在推理时，根据序列长度选择最优路径
+++++-#         if not self.training:
+++++-#             if sequence_length == 1:
+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++-#             else:
+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++-#         else:
+++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++++-#             raise NotImplementedError("Training path is not implemented.")
+++++-
+++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++++-        
+++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++++-        
+++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++++-        
+++++-#         return final_hidden_states, router_logits
+++++-
+++++-
+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-#     """
+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++++-#     """
+++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++-#         super().__init__()
+++++-#         self.num_experts = config.num_experts
+++++-#         self.top_k = config.num_experts_per_tok
+++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++-
+++++-#         # 门控网络
+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++-#         # 专家列表
+++++-#         self.experts = nn.ModuleList(
+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++-#         )
+++++-#         # 共享专家
+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_decode(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         batch_size, _ = hidden_states.shape
+++++-#         expert_outputs_list = [
+++++-#             ops.cat([
+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++-#             ], dim=0) 
+++++-#             for i in range(batch_size)
+++++-#         ]
+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++-#         return moe_output.squeeze(1)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_prefill(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens = hidden_states.shape[0]
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-        
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-#             selected_token_indices = token_indices[mask]
+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++-#             current_states = hidden_states[selected_token_indices]
+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-#             moe_output = moe_output.index_add(
+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++-#             )
+++++-#         return moe_output
+++++-
+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-#         """
+++++-#         顶层 forward 方法，作为智能分发器。
+++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++++-#         """
+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        
+++++-#         # 1. 门控计算 (通用逻辑)
+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-
+++++-#         if self.norm_topk_prob:
+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        
+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++-        
+++++-#         # 2. 智能分发到最优 MoE 路径
+++++-#         moe_output = None
+++++-#         if not self.training:
+++++-#             if sequence_length == 1:
+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++-#             else:
+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++-#         else:
+++++-#             raise NotImplementedError("Training path is not implemented.")
+++++-
+++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++-        
+++++-#         # 4. 合并 MoE 输出和共享专家输出
+++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++-        
+++++-#         # 5. 恢复原始形状并返回
+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++-        
+++++-#         return final_hidden_states, router_logits
+++++-
+++++-# prefill fastest
+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-#     """
+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++++-#     """
+++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++-#         super().__init__()
+++++-#         self.num_experts = config.num_experts
+++++-#         self.top_k = config.num_experts_per_tok
+++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++-
+++++-#         # 门控网络
+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++-#         # 专家列表
+++++-#         self.experts = nn.ModuleList(
+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++-#         )
+++++-#         # 共享专家
+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_dispatch(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         """
+++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++++-#         """
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens, _ = hidden_states.shape
+++++-        
+++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-#         flat_routing_weights = routing_weights.flatten()
+++++-
+++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-
+++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-        
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-            
+++++-#             # 找到所有分配给该专家的 token
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-            
+++++-#             # 使用 mask 选取对应的 token 和权重
+++++-#             current_token_indices = token_indices[mask]
+++++-#             current_routing_weights = flat_routing_weights[mask]
+++++-#             current_hidden_states = hidden_states[current_token_indices]
+++++-            
+++++-#             # 对这些 token 进行批处理
+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++-            
+++++-#             # 使用 index_add 将结果精确地加回到对应位置
+++++-#             moe_output = moe_output.index_add(
+++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++++-#             )
+++++-#         return moe_output
+++++-
+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-#         """
+++++-#         顶层 forward 方法，作为智能分发器。
+++++-#         """
+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        
+++++-#         # 1. 门控计算
+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-
+++++-#         if self.norm_topk_prob:
+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        
+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++-        
+++++-#         # 2. 调用统一的 MoE 计算内核
+++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++++-
+++++-#         # 3. 统一处理共享专家
+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++-        
+++++-#         # 4. 合并输出
+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++-        
+++++-#         # 5. 恢复原始形状并返回
+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++-        
+++++-#         return final_hidden_states, router_logits
+++++-
+++++-
+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-#     """
+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++-#     【最终高性能与高精度版】：
+++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++++-#     3. 这样实现了速度和准确性的两全其美。
+++++-#     """
+++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++-#         super().__init__()
+++++-#         self.num_experts = config.num_experts
+++++-#         self.top_k = config.num_experts_per_tok
+++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++-
+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++-#         self.experts = nn.ModuleList(
+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++-#         )
+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_decode(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         """
+++++-#         【解码路径】极致优化版：bmm + 高精度累加。
+++++-#         """
+++++-#         original_dtype = hidden_states.dtype
+++++-#         batch_size, _ = hidden_states.shape
+++++-        
+++++-#         expert_outputs_list = [
+++++-#             ops.cat([
+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++-#             ], dim=0) 
+++++-#             for i in range(batch_size)
+++++-#         ]
+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++-
+++++-#         # 在 float32 下执行 bmm，得到高精度结果
+++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++-        
+++++-#         # 将高精度结果转换回原始数据类型
+++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++++-        
+++++-#         return moe_output
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_prefill(
+++++-#         self, 
+++++-#         hidden_states: mindspore.Tensor, 
+++++-#         selected_experts: mindspore.Tensor, 
+++++-#         routing_weights: mindspore.Tensor
+++++-#     ) -> mindspore.Tensor:
+++++-#         """
+++++-#         【预填充路径】与原始实现一致，结果精确。
+++++-#         """
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens, _ = hidden_states.shape
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-        
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-#             selected_token_indices = token_indices[mask]
+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++-#             current_states = hidden_states[selected_token_indices]
+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-#             moe_output = moe_output.index_add(
+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++-#             )
+++++-#         return moe_output
+++++-
+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-        
+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-
+++++-#         if self.norm_topk_prob:
+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        
+++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++++-#         # 如果模型主体是 float16，后续再转换
+++++-        
+++++-#         moe_output = None
+++++-#         if not self.training:
+++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++++-#             # _moe_infer_decode 内部会处理好类型转换
+++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++++-#             if sequence_length == 1:
+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++-#             else:
+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++-#         else:
+++++-#             raise NotImplementedError("Training path is not implemented.")
+++++-
+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++-        
+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++-        
+++++-#         return final_hidden_states, router_logits
+++++-    
+++++-
+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-#     """
+++++-#     【融合版】一个混合专家模块，内置两种推理策略，
+++++-#     由外部全局变量 `Long_Prompt` 控制：
+++++-
+++++-#     - if Long_Prompt is True:  【精度优先模式】
+++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++++-#       适用于处理长序列，避免误差累积。
+++++-
+++++-#     - if Long_Prompt is False: 【速度优先模式】
+++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+++++-#     """
+++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++-#         super().__init__()
+++++-#         self.num_experts = config.num_experts
+++++-#         self.top_k = config.num_experts_per_tok
+++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++-
+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++-#         self.experts = nn.ModuleList(
+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++-#         )
+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++-
+++++-#     # --- 速度优先模式的辅助函数 ---
+++++-#     @no_grad()
+++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++-#         original_dtype = hidden_states.dtype
+++++-#         batch_size, _ = hidden_states.shape
+++++-#         expert_outputs_list = [
+++++-#             ops.cat([
+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++-#             ], dim=0) 
+++++-#             for i in range(batch_size)
+++++-#         ]
+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++-
+++++-#     @no_grad()
+++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens, _ = hidden_states.shape
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-#             selected_token_indices = token_indices[mask]
+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++-#             current_states = hidden_states[selected_token_indices]
+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++++-#         return moe_output
+++++-
+++++-#     # --- 精度优先模式的辅助函数 ---
+++++-#     @no_grad()
+++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++-#         moe_output = ops.zeros_like(hidden_states)
+++++-#         num_tokens, _ = hidden_states.shape
+++++-#         flat_selected_experts = selected_experts.flatten()
+++++-#         flat_routing_weights = routing_weights.flatten()
+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++-#         active_experts = ops.unique(flat_selected_experts)
+++++-#         for expert_idx_tensor in active_experts:
+++++-#             expert_idx = expert_idx_tensor.item()
+++++-#             expert_layer = self.experts[expert_idx]
+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++-#             current_token_indices = token_indices[mask]
+++++-#             current_routing_weights = flat_routing_weights[mask]
+++++-#             current_hidden_states = hidden_states[current_token_indices]
+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++-#         return moe_output
+++++-
+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++++-#         global Long_Prompt
+++++-
+++++-#         # 1. 门控计算 (所有模式通用)
+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++-#         if self.norm_topk_prob:
+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-        
+++++-#         moe_output = None
+++++-#         if not self.training:
+++++-#             # 根据 Long_Prompt 标志选择模式
+++++-#             if Long_Prompt:
+++++-#                 # --- 精度优先模式 ---
+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++-#             else:
+++++-#                 # --- 速度优先模式 ---
+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++-#                 if sequence_length == 1:
+++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++-#                 else:
+++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++-#         else:
+++++-#             raise NotImplementedError("Training path is not implemented.")
+++++-
+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++-        
+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++-        
+++++-#         return final_hidden_states, router_logits
+++++-    
+++++ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++     """
+++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++ 
++++++    # @no_grad()
++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++    #     num_tokens, _ = hidden_states.shape
++++++    #     flat_selected_experts = selected_experts.flatten()
++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++++    #     original_token_indices = sorted_expert_indices // self.top_k
++++++    #     moe_output = ops.zeros_like(hidden_states)
++++++    #     current_token_offset = 0
++++++    #     for i in range(self.num_experts):
++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
++++++    #         if expert_token_count == 0:
++++++    #             continue
++++++    #         end_offset = current_token_offset + expert_token_count
++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++++    #         current_token_offset += expert_token_count
++++++    #     return moe_output
++++++
+++++     @no_grad()
+++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++-        num_tokens, _ = hidden_states.shape
+++++-        flat_selected_experts = selected_experts.flatten()
+++++-        sorted_expert_indices = flat_selected_experts.argsort()
+++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++-        original_token_indices = sorted_expert_indices // self.top_k
++++++        """
++++++        优化版 MoE prefill (速度优先模式)：
++++++        - 批量张量化处理同一个 expert 的所有 token
++++++        - 跳过无 token 的专家
++++++        - 保持结果完全一致
++++++        """
+++++         moe_output = ops.zeros_like(hidden_states)
+++++-        current_token_offset = 0
+++++-        for i in range(self.num_experts):
+++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+++++-            if expert_token_count == 0:
+++++-                continue
+++++-            end_offset = current_token_offset + expert_token_count
+++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++-            current_token_offset += expert_token_count
++++++
++++++        flat_selected_experts = selected_experts.flatten()
++++++        flat_routing_weights = routing_weights.flatten()
++++++
++++++        idxs = flat_selected_experts.argsort()
++++++        sorted_expert_indices = flat_selected_experts[idxs]
++++++        sorted_token_indices = idxs // self.top_k
++++++
++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
++++++
++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++++++
++++++        for expert_id in active_experts.tolist():
++++++            start = int(tokens_per_expert[:expert_id].sum().item())
++++++            end = start + int(tokens_per_expert[expert_id].item())
++++++
++++++            token_idx = sorted_token_indices[start:end]
++++++            expert_tokens = hidden_states[token_idx]
++++++
++++++            expert_out = self.experts[expert_id](expert_tokens)
++++++
++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
++++++
++++++            moe_output = mindspore.mint.scatter_add(
++++++                moe_output,
++++++                0,
++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
++++++                scaled_out.to(hidden_states.dtype)
++++++            )
++++++
+++++         return moe_output
+++++ 
++++++
+++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++++     @no_grad()
+++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++         
+++++         moe_output = None
+++++-        if Long_Prompt:
+++++-            # --- 精度优先模式 (ACCURACY MODE) ---
+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        # if Long_Prompt==0:
++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        # else:
++++++        #     # --- 速度优先模式 (SPEED MODE) ---
++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++        #     if sequence_length == 1:
++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        #     else:
++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++        
++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++        if sequence_length == 1:
++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++         else:
+++++-            # --- 速度优先模式 (SPEED MODE) ---
+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++-            if sequence_length == 1:
+++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++-            else:
+++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++-        
++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++    
+++++ 
+++++         # 3. 共享专家计算与合并 (所有模式通用)
+++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++         
+++++         return final_hidden_states, router_logits
+++++ 
++++++
+++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++++         super().__init__()
+++++         self.hidden_size = config.hidden_size
+++++         
+++++-        # if Long_Prompt:
+++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++-        # else:
++++++        # if Long_Prompt == 2:
+++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++        # else:
++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++ 
+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++ 
+++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++             )
+++++ 
+++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++        #     attention_mask,
++++++        #     sequence_length=sequence_length,
++++++        #     target_length=target_length,
++++++        #     dtype=dtype,
++++++        #     min_dtype=min_dtype,
++++++        #     cache_position=cache_position,
++++++        #     batch_size=input_tensor.shape[0],
++++++        # )
++++++        #@dwj
++++++        causal_mask = get_cached_causal_mask_with_cache_position(
+++++             attention_mask,
+++++             sequence_length=sequence_length,
+++++             target_length=target_length,
+++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++++         """
+++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
++++++        _causal_mask_cache.clear()
+++++ 
+++++         input_ids = kwargs.get("input_ids")
+++++         if input_ids is None and args:
+++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++ 
+++++         if input_ids is not None:
+++++             prompt_length = input_ids.shape[1]
+++++-            
+++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++++-                Long_Prompt = True
++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
++++++                Long_Prompt = 2
++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
++++++                Long_Prompt = 0
+++++             else:
+++++-                Long_Prompt = False
++++++                Long_Prompt = 1
++++++
+++++ 
+++++         return super().generate(*args, **kwargs)
+++++     
+++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++             dtype = self.lm_head.weight.dtype
+++++             min_dtype = float(ops.finfo(dtype).min)
+++++ 
+++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++            #     attention_mask,
++++++            #     sequence_length=sequence_length,
++++++            #     target_length=past_key_values.get_max_length(),
++++++            #     dtype=dtype,
++++++            #     min_dtype=min_dtype,
++++++            #     cache_position=cache_position,
++++++            #     batch_size=batch_size,
++++++            # )
++++++
++++++            #@dwj
++++++            attention_mask = get_cached_causal_mask_with_cache_position(
+++++                 attention_mask,
+++++                 sequence_length=sequence_length,
+++++                 target_length=past_key_values.get_max_length(),
+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++deleted file mode 100644
+++++index 6dfb5b93..00000000
+++++--- a/patches/0001-20251104commit.patch
++++++++ /dev/null
+++++@@ -1,1272 +0,0 @@
+++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++-From: Pinoeer-kingxi <13022943007@163.com>
+++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++-Subject: [PATCH] 20251104commit
+++++-
+++++----
+++++- mindnlp/transformers/cache_utils.py           |  28 +-
+++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++- 3 files changed, 976 insertions(+), 87 deletions(-)
+++++-
+++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++-index cadd2e04..02f8d4be 100644
+++++---- a/mindnlp/transformers/cache_utils.py
+++++-+++ b/mindnlp/transformers/cache_utils.py
+++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++-             # k_out[:, :, cache_position] = key_states
+++++-             # v_out[:, :, cache_position] = value_states
+++++--            if ON_ORANGE_PI:
+++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++--            else:
+++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++--
+++++-+            # if ON_ORANGE_PI:
+++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++-+            # else:
+++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++-+            if cache_position.ndim > 1:
+++++-+                cache_position = cache_position.flatten()
+++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++-+                cache_position = cache_position.int()
+++++-+            
+++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++-+            k_out[:, :, cache_position] = key_states
+++++-+            v_out[:, :, cache_position] = value_states
+++++-+            
+++++-         return k_out, v_out
+++++- 
+++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++-index c695b944..d8303e45 100644
+++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++- def rotate_half(x):
+++++-     """Rotates half the hidden dims of the input."""
+++++--    x1 = x[..., : x.shape[-1] // 2]
+++++--    x2 = x[..., x.shape[-1] // 2 :]
+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++-+    # x1 = x[..., : x.shape[-1] // 2]
+++++-+    # x2 = x[..., x.shape[-1] // 2 :]
+++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++-     return ops.cat((-x2, x1), dim=-1)
+++++- 
+++++- 
+++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++-         if self.training:
+++++-             raise NotImplementedError("Training is not supported yet.")
+++++-         else:
+++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++--        if self.config.n_shared_experts is not None:
+++++--            y = y + self.shared_experts(identity)
+++++--        return y
+++++-+            # @lwx
+++++-+            if orig_shape[1] == 1:
+++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++-+                y=y.view(*orig_shape)
+++++-+                if self.config.n_shared_experts is not None:
+++++-+                    y = y + self.shared_experts(identity)
+++++-+                return y
+++++-+            else:
+++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++-+                if self.config.n_shared_experts is not None:
+++++-+                    y = y + self.shared_experts(identity)
+++++-+                return y
+++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++-+        # if self.config.n_shared_experts is not None:
+++++-+        #     y = y + self.shared_experts(identity)
+++++-+        # return y
+++++-+        
+++++-+    @no_grad()
+++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++-+
+++++-+        expert_cache = ops.zeros_like(x)
+++++-+        for i in range(self.num_experts_per_tok):
+++++-+            expert_id = flat_expert_indices[i].item()
+++++-+            weight = flat_expert_weights[i].item()
+++++-+            expert = self.experts[expert_id]
+++++-+            expert_out = expert(x)
+++++-+            expert_cache += expert_out * weight
+++++-+        return expert_cache
+++++- 
+++++-     @no_grad()
+++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++--        # expert_cache = torch.zeros_like(x)
+++++--        # idxs = flat_expert_indices.argsort()
+++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++--        # token_idxs = idxs // self.num_experts_per_tok
+++++--        # for i, end_idx in enumerate(tokens_per_expert):
+++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++--        #     if start_idx == end_idx:
+++++--        #         continue
+++++--        #     expert = self.experts[i]
+++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++--        #     expert_tokens = x[exp_token_idx]
+++++--        #     expert_out = expert(expert_tokens)
+++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++--        # return expert_cache
+++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++-         expert_cache = ops.zeros_like(x)
+++++-         idxs = flat_expert_indices.argsort()
+++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++-         token_idxs = idxs // self.num_experts_per_tok
+++++-+
+++++-         for i, end_idx in enumerate(tokens_per_expert):
+++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++-             if start_idx == end_idx:
+++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++-             expert_out = expert(expert_tokens)
+++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++-+
+++++-         return expert_cache
+++++-+        
+++++-+    # @no_grad()
+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-+    #     # expert_cache = torch.zeros_like(x)
+++++-+    #     # idxs = flat_expert_indices.argsort()
+++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++-+    #     #     if start_idx == end_idx:
+++++-+    #     #         continue
+++++-+    #     #     expert = self.experts[i]
+++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++-+    #     #     expert_tokens = x[exp_token_idx]
+++++-+    #     #     expert_out = expert(expert_tokens)
+++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++-+    #     # return expert_cache
+++++-+    #     expert_cache = ops.zeros_like(x)
+++++-+    #     idxs = flat_expert_indices.argsort()
+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++++-+
+++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++-+    #         if start_idx == end_idx:
+++++-+    #             continue
+++++-+    #         expert = self.experts[i]
+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++-+    #         expert_tokens = x[exp_token_idx]
+++++-+    #         expert_out = expert(expert_tokens)
+++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++-+
+++++-+    #     return expert_cache
+++++-+    # @no_grad()
+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++-+    #     expert_cache = ops.zeros_like(x)
+++++-+
+++++-+    #     # 排序保证顺序一致
+++++-+    #     idxs = flat_expert_indices.argsort()
+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++++-+
+++++-+    #     # 找出有 token 的专家
+++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++-+
+++++-+    #     for i in active_experts.tolist():
+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++-+    #         end_idx = tokens_per_expert[i]
+++++-+    #         if start_idx == end_idx:  # 没有 token
+++++-+    #             continue
+++++-+
+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++-+    #         expert_tokens = x[exp_token_idx]
+++++-+    #         expert_out = self.experts[i](expert_tokens)
+++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++-+
+++++-+    #         expert_cache = mindspore.mint.scatter_add(
+++++-+    #             expert_cache,
+++++-+    #             0,
+++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++-+    #             expert_out
+++++-+    #         )
+++++-+
+++++-+    #     return expert_cache
+++++-+
+++++-+
+++++- 
+++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++- #     """
+++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++- 
+++++-         # Initialize weights and apply final processing
+++++-         self.post_init()
+++++-+        self.warm_up = False
+++++-+
+++++-+    def warmup_moe_model_deep(self):
+++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++-+        test_texts = [
+++++-+            "warmup short",
+++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++-+        ]
+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++-+        if tokenizer is None:
+++++-+            from mindnlp.transformers import AutoTokenizer
+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++-+            self._warmup_tokenizer = tokenizer
+++++-+
+++++-+        for text in test_texts:
+++++-+            inputs = tokenizer(text, return_tensors="ms")
+++++-+            with mindspore._no_grad():
+++++-+                _ = self(**inputs, use_cache=False)
+++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++- 
+++++-     def get_input_embeddings(self):
+++++-         return self.model.embed_tokens
+++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++-         ```"""
+++++-+        if not self.warm_up:
+++++-+            self.warm_up = True
+++++-+            self.warmup_moe_model_deep()
+++++-+
+++++-         output_attentions = (
+++++-             output_attentions
+++++-             if output_attentions is not None
+++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++-index 3cbf820e..d4c6b651 100644
+++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++-@@ -18,7 +18,6 @@
+++++- # See the License for the specific language governing permissions and
+++++- # limitations under the License.
+++++- """MindSpore Qwen2MoE model."""
+++++--
+++++- import math
+++++- from typing import List, Optional, Tuple, Union
+++++- 
+++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++-     TokenClassifierOutput,
+++++- )
+++++- from ...modeling_utils import PreTrainedModel
+++++-+from ...generation import GenerationMixin
+++++- from ....utils import logging
+++++- from .configuration_qwen2_moe import Qwen2MoeConfig
+++++- 
+++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++-         self.variance_epsilon = eps
+++++- 
+++++-     def forward(self, hidden_states):
+++++-+        # @dwj
+++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++-+        # @lwx
+++++-+        # if not self.training :
+++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++-         input_dtype = hidden_states.dtype
+++++-         hidden_states = hidden_states.to(mindspore.float32)
+++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++-@@ -234,6 +239,8 @@ def rotate_half(x):
+++++-     """Rotates half the hidden dims of the input."""
+++++-     x1 = x[..., : x.shape[-1] // 2]
+++++-     x2 = x[..., x.shape[-1] // 2 :]
+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++-     return ops.cat((-x2, x1), dim=-1)
+++++- 
+++++- 
+++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++-         self.config = config
+++++-         self.hidden_size = config.hidden_size
+++++-         self.intermediate_size = intermediate_size
+++++-+        
+++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++-         self.act_fn = ACT2FN[config.hidden_act]
+++++- 
+++++-     def forward(self, x):
+++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++--
+++++- 
+++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++-+        # @lwx 
+++++-+        # gate_up_output = self.gate_up_proj(x)
+++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++-+        # return self.down_proj(swiglu_output)
+++++-+
+++++-+    # def forward(self, x):
+++++-+    #     gate_proj_out = self.gate_proj(x)
+++++-+    #     up_proj_out = self.up_proj(x)
+++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++-+    #     return self.down_proj(swiglu_out)
+++++-+        
+++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++-     """
+++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++-         use_cache: bool = False,
+++++-         cache_position: Optional[mindspore.Tensor] = None,
+++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-+
+++++-+        
+++++-+
+++++-         bsz, q_len, _ = hidden_states.shape
+++++- 
+++++-         query_states = self.q_proj(hidden_states)
+++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-                     "with a layer index."
+++++-                 )
+++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-+            if isinstance(past_key_value, StaticCache):
+++++-+                kv_seq_len = key_states.shape[-2]
+++++-+            else:
+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++- 
+++++-         if past_key_value is not None:
+++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++-+            
+++++-+            if isinstance(past_key_value, StaticCache):
+++++-+                kv_seq_len = key_states.shape[-2]
+++++- 
+++++-         # repeat k/v heads if n_kv_heads < n_heads
+++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++--
+++++-+        
+++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++- 
+++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++--            raise ValueError(
+++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++--                f" {attn_weights.shape}"
+++++--            )
+++++--
+++++--        if attention_mask is not None:  # no matter the length, we just slice it
+++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++-+        if attention_mask is not None:
+++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++-             attn_weights = attn_weights + causal_mask
+++++- 
+++++-         # upcast attention to fp32
+++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++- 
+++++-         attn_output = self.o_proj(attn_output)
+++++--
+++++-+        # @lwx
+++++-+        
+++++-+        # max_seq_len = self.max_position_embeddings  # 2048
+++++-+
+++++-+        # if attention_mask is not None:
+++++-+        #     # attention_mask: [B, 1, Sq, Sk]
+++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++-+
+++++-+        #     # pad 到 [max_seq_len, max_seq_len]
+++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++-+        #     global_attention_mask = padded_mask
+++++-+        # else:
+++++-+        #     global_attention_mask = None
+++++-+
+++++-+
+++++-+        # sparse_mode=3
+++++-+        # attn_output = mindspore.ops.flash_attention_score(
+++++-+        #     query=query_states,
+++++-+        #     key=key_states,
+++++-+        #     value=value_states,
+++++-+        #     real_shift=None,
+++++-+        #     padding_mask=None,
+++++-+
+++++-+        #     head_num=self.num_heads,
+++++-+        #     attn_mask=global_attention_mask,
+++++-+        #     keep_prob=1.0 - self.attention_dropout,
+++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-+        #     input_layout="BNSD",
+++++-+        #     pre_tokens=2147483647,
+++++-+        #     next_tokens=2147483647,
+++++-+        #     inner_precise=0,
+++++-+        #     drop_mask=None,
+++++-+        #     prefix=None,
+++++-+        #     actual_seq_qlen=None,
+++++-+        #     actual_seq_kvlen=None,
+++++-+        #     sparse_mode=sparse_mode,
+++++-+        # )
+++++-         if not output_attentions:
+++++-             attn_weights = None
+++++- 
+++++-         return attn_output, attn_weights, past_key_value
+++++- 
+++++- 
+++++-+class Qwen2MoeFlashAttention(nn.Module):
+++++-+    """
+++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++-+
+++++-+    关键改动:
+++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++-+       直接传入原始的 key 和 value 张量效率更高。
+++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++-+    """
+++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++-+        super().__init__()
+++++-+        self.config = config
+++++-+        self.layer_idx = layer_idx
+++++-+        self.hidden_size = config.hidden_size
+++++-+        self.num_heads = config.num_attention_heads
+++++-+        self.head_dim = self.hidden_size // self.num_heads
+++++-+        self.num_key_value_heads = config.num_key_value_heads
+++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++-+        self.max_position_embeddings = config.max_position_embeddings
+++++-+        self.rope_theta = config.rope_theta
+++++-+        self.attention_dropout = config.attention_dropout
+++++-+
+++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++-+            raise ValueError(
+++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++-+            )
+++++-+
+++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++-+
+++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++-+            self.head_dim,
+++++-+            max_position_embeddings=self.max_position_embeddings,
+++++-+            base=self.rope_theta,
+++++-+        )
+++++-+
+++++-+    def forward(
+++++-+        self,
+++++-+        hidden_states: mindspore.Tensor,
+++++-+        attention_mask: Optional[mindspore.Tensor] = None,
+++++-+        position_ids: Optional[mindspore.Tensor] = None,
+++++-+        past_key_value: Optional[Cache] = None,
+++++-+        output_attentions: bool = False,
+++++-+        use_cache: bool = False,
+++++-+        cache_position: Optional[mindspore.Tensor] = None,
+++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-+
+++++-+        bsz, q_len, _ = hidden_states.shape
+++++-+
+++++-+        # 1. 线性投射 Q, K, V
+++++-+        query_states = self.q_proj(hidden_states)
+++++-+        key_states = self.k_proj(hidden_states)
+++++-+        value_states = self.v_proj(hidden_states)
+++++-+
+++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+
+++++-+        # 3. RoPE 旋转位置编码
+++++-+        kv_seq_len = key_states.shape[-2]
+++++-+        if past_key_value is not None:
+++++-+            if self.layer_idx is None:
+++++-+                raise ValueError(
+++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-+                    "with a layer index."
+++++-+                )
+++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++-+                if cache_position.shape[0] == 1:
+++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++-+                    kv_seq_len = past_seen_tokens + 1
+++++-+                else:
+++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++-+            else:
+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-+
+++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-+
+++++-+        # 4. KV 缓存更新
+++++-+        if past_key_value is not None:
+++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-+            key_states, value_states = past_key_value.update(
+++++-+                key_states, value_states, self.layer_idx, cache_kwargs
+++++-+            )
+++++-+            
+++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++-+                if cache_position.shape[0] == 1:
+++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++-+                    kv_seq_len = key_states.shape[-2]
+++++-+
+++++-+        # 5. [重要] 准备 Attention Mask
+++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++-+        fa_attention_mask = None
+++++-+        if attention_mask is not None:
+++++-+            # 截取与当前key长度匹配的部分
+++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++-+            fa_attention_mask = (mask_slice != 0)
+++++-+
+++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++-+        input_dtype = query_states.dtype
+++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++-+            query_states = query_states.to(mindspore.float16)
+++++-+            key_states = key_states.to(mindspore.float16)
+++++-+            value_states = value_states.to(mindspore.float16)
+++++-+
+++++-+        # 6. [核心] 调用 flash_attention_score 算子
+++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++-+        attn_output = mindspore.ops.flash_attention_score(
+++++-+            query=query_states,
+++++-+            key=key_states,
+++++-+            value=value_states,
+++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+++++-+            attn_mask=fa_attention_mask,
+++++-+            keep_prob=1.0 - self.attention_dropout,
+++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-+            input_layout="BNSD",
+++++-+            sparse_mode=0 # 使用 defaultMask 模式
+++++-+        )
+++++-+
+++++-+        # 恢复原始数据类型
+++++-+        attn_output = attn_output.to(input_dtype)
+++++-+
+++++-+        # 7. 调整输出形状
+++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-+        attn_output = self.o_proj(attn_output)
+++++-+
+++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+++++-+        attn_weights = None
+++++-+        if output_attentions:
+++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++-+
+++++-+        return attn_output, attn_weights, past_key_value
+++++-+
+++++-+    # def forward(
+++++-+    #     self,
+++++-+    #     hidden_states: mindspore.Tensor,
+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++++-+    #     past_key_value: Optional[Cache] = None,
+++++-+    #     output_attentions: bool = False,
+++++-+    #     use_cache: bool = False,
+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-+
+++++-+    #     bsz, q_len, _ = hidden_states.shape
+++++-+
+++++-+    #     # 1. 线性投射 Q, K, V
+++++-+    #     query_states = self.q_proj(hidden_states)
+++++-+    #     key_states = self.k_proj(hidden_states)
+++++-+    #     value_states = self.v_proj(hidden_states)
+++++-+
+++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+
+++++-+    #     # 3. RoPE 旋转位置编码
+++++-+    #     kv_seq_len = key_states.shape[-2]
+++++-+    #     if past_key_value is not None:
+++++-+    #         if self.layer_idx is None:
+++++-+    #             raise ValueError(
+++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-+    #                 "with a layer index."
+++++-+    #             )
+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-+
+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-+
+++++-+    #     # 4. KV 缓存更新
+++++-+    #     if past_key_value is not None:
+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-+    #         key_states, value_states = past_key_value.update(
+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++-+    #         )
+++++-+
+++++-+    #     # 5. 准备 Attention Mask
+++++-+    #     fa_attention_mask = None
+++++-+    #     if attention_mask is not None:
+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-+    #         fa_attention_mask = (mask_slice != 0)
+++++-+
+++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++-+    #     input_dtype = query_states.dtype
+++++-+
+++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++++-+    #         query=query_states,
+++++-+    #         key=key_states,
+++++-+    #         value=value_states,
+++++-+    #         head_num=self.num_heads,
+++++-+    #         attn_mask=fa_attention_mask,
+++++-+    #         keep_prob=1.0 - self.attention_dropout,
+++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-+    #         input_layout="BNSD",
+++++-+    #         sparse_mode=0,
+++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++-+    #         inner_precise=1
+++++-+    #     )
+++++-+
+++++-+    #     # 恢复原始数据类型
+++++-+    #     attn_output = attn_output.to(input_dtype)
+++++-+
+++++-+    #     # 7. 调整输出形状
+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-+    #     attn_output = self.o_proj(attn_output)
+++++-+
+++++-+    #     attn_weights = None
+++++-+    #     if output_attentions:
+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++-+
+++++-+    #     return attn_output, attn_weights, past_key_value
+++++-+
+++++-+    # def forward(
+++++-+    #     self,
+++++-+    #     hidden_states: mindspore.Tensor,
+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++++-+    #     past_key_value: Optional[Cache] = None,
+++++-+    #     output_attentions: bool = False,
+++++-+    #     use_cache: bool = False,
+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-+
+++++-+    #     bsz, q_len, _ = hidden_states.shape
+++++-+
+++++-+    #     query_states = self.q_proj(hidden_states)
+++++-+    #     key_states = self.k_proj(hidden_states)
+++++-+    #     value_states = self.v_proj(hidden_states)
+++++-+
+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-+
+++++-+    #     kv_seq_len = key_states.shape[-2]
+++++-+    #     if past_key_value is not None:
+++++-+    #         if self.layer_idx is None:
+++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-+
+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-+
+++++-+    #     if past_key_value is not None:
+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++-+    #         key_states, value_states = past_key_value.update(
+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++-+    #         )
+++++-+
+++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++-+
+++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+++++-+    #     # <--- 修改结束 ---
+++++-+
+++++-+    #     fa_attention_mask = None
+++++-+    #     if attention_mask is not None:
+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++-+    #         fa_attention_mask = (mask_slice != 0)
+++++-+
+++++-+    #     input_dtype = query_states.dtype
+++++-+
+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+++++-+    #         key=key_states,
+++++-+    #         value=value_states,
+++++-+    #         head_num=self.num_heads,
+++++-+    #         attn_mask=fa_attention_mask,
+++++-+    #         keep_prob=1.0 - self.attention_dropout,
+++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++-+    #         input_layout="BNSD",
+++++-+    #         sparse_mode=0,
+++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+++++-+    #     )
+++++-+
+++++-+    #     attn_output = attn_output.to(input_dtype)
+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-+    #     attn_output = self.o_proj(attn_output)
+++++-+
+++++-+    #     attn_weights = None
+++++-+    #     if output_attentions:
+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++-+
+++++-+    #     return attn_output, attn_weights, past_key_value
+++++-+
+++++- QWEN2MOE_ATTENTION_CLASSES = {
+++++-     "eager": Qwen2MoeAttention,
+++++-+    "flash-attention": Qwen2MoeFlashAttention,
+++++- }
+++++- 
+++++- 
+++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++- 
+++++-+    #@dwj
+++++-+    # 只遍历激活的专家，而非全部专家
+++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++--        hidden_states = hidden_states.view(-1, hidden_dim)
+++++--        # router_logits: (batch * sequence_length, n_experts)
+++++--        router_logits = self.gate(hidden_states)
+++++--
+++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++--        if self.norm_topk_prob:
+++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++--        # we cast back to the input dtype
+++++--        routing_weights = routing_weights.to(hidden_states.dtype)
+++++--
+++++--        final_hidden_states = ops.zeros(
+++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++--        )
+++++--
+++++--        # One hot encode the selected experts to create an expert mask
+++++--        # this will be used to easily index which expert is going to be sollicitated
+++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++--
+++++--        # Loop over all available experts in the model and perform the computation on each expert
+++++--        for expert_idx in range(self.num_experts):
+++++--            expert_layer = self.experts[expert_idx]
+++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++--
+++++--            # Index the correct hidden states and compute the expert hidden state for
+++++--            # the current expert. We need to make sure to multiply the output hidden
+++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++--            if 0 not in idx.shape:
+++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++--
+++++--                # However `index_add_` only support torch tensors for indexing so we'll use
+++++--                # the `top_x` tensor here.
+++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++--
+++++--        shared_expert_output = self.shared_expert(hidden_states)
+++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++--
+++++--        final_hidden_states = final_hidden_states + shared_expert_output
+++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++-+            num_tokens = hidden_states_reshaped.shape[0]
+++++-+
+++++-+            router_logits = self.gate(hidden_states_reshaped)
+++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++-+
+++++-+            if self.norm_topk_prob:
+++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+++++-+            
+++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++-+            flat_selected_experts = selected_experts.flatten()
+++++-+            
+++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++-+            token_indices = broadcasted_token_indices.flatten()
+++++-+            
+++++-+            active_experts = ops.unique(flat_selected_experts)
+++++-+            
+++++-+            for expert_idx_tensor in active_experts:
+++++-+                expert_idx = expert_idx_tensor.item()
+++++-+                expert_layer = self.experts[expert_idx]
+++++-+                
+++++-+                mask = (flat_selected_experts == expert_idx_tensor)
+++++-+                selected_token_indices = token_indices[mask]
+++++-+                selected_routing_weights = routing_weights.flatten()[mask]
+++++-+                
+++++-+                current_states = hidden_states_reshaped[selected_token_indices]
+++++-+                
+++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++-+                
+++++-+                final_hidden_states = final_hidden_states.index_add(
+++++-+                    dim=0,
+++++-+                    index=selected_token_indices,
+++++-+                    source=expert_output.to(hidden_states.dtype)
+++++-+                )
+++++-+            
+++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++- 
+++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++--        return final_hidden_states, router_logits
+++++-+            final_hidden_states = final_hidden_states + shared_expert_output
+++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++-+            
+++++-+            return final_hidden_states, router_logits
+++++- 
+++++- 
+++++- class Qwen2MoeDecoderLayer(nn.Module):
+++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++- 
+++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++- 
+++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++-+
+++++-         if (layer_idx not in config.mlp_only_layers) and (
+++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++-         ):
+++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++-     _skip_keys_device_placement = "past_key_values"
+++++-     _supports_cache_class = True
+++++-+#lwx
+++++-+    # _supports_static_cache = True
+++++- 
+++++-     def _init_weights(self, module):
+++++-         std = self.config.initializer_range
+++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++-         return causal_mask
+++++- 
+++++- 
+++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++-     _tied_weights_keys = ["lm_head.weight"]
+++++- 
+++++-     def __init__(self, config):
+++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++-         self.num_experts_per_tok = config.num_experts_per_tok
+++++-         # Initialize weights and apply final processing
+++++-         self.post_init()
+++++-+        # @lwx
+++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++-+        #     self.generation_config.cache_implementation = "static"
+++++-+        self._warmed_up = False
+++++-+
+++++-+    def warmup_moe_model(self):
+++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++-+        test_texts = [
+++++-+            "warmup short",
+++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++-+        ]
+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++-+        if tokenizer is None:
+++++-+            from mindnlp.transformers import AutoTokenizer
+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++-+            self._warmup_tokenizer = tokenizer
+++++-+
+++++-+        for text in test_texts:
+++++-+            inputs = tokenizer(text, return_tensors="ms")
+++++-+            with mindspore._no_grad():
+++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++- 
+++++-     def get_input_embeddings(self):
+++++-         return self.model.embed_tokens
+++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++-         ```"""
+++++-+        if not self._warmed_up:
+++++-+            self._warmed_up = True
+++++-+            self.warmup_moe_model()
+++++- 
+++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++-         output_router_logits = (
+++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++-             }
+++++-         )
+++++-         return model_inputs
+++++-+# @lwx
+++++-+    # def _decode_one_tokens_logits(
+++++-+    #     self,
+++++-+    #     cur_token: mindspore.Tensor,
+++++-+    #     input_pos: Optional[mindspore.Tensor],
+++++-+    #     cache_position: mindspore.Tensor,
+++++-+    #     past_key_values: StaticCache,
+++++-+    # ) -> mindspore.Tensor:
+++++-+    #     """
+++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++-+        
+++++-+    #     Args:
+++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++-+    #         input_pos: 输入位置信息，可选
+++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++-+            
+++++-+    #     Returns:
+++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++-+    #     """
+++++-+    #     # 调用JIT编译的版本
+++++-+    #     return self.get_decode_one_tokens_logits(
+++++-+    #         cur_token=cur_token,
+++++-+    #         input_pos=input_pos,
+++++-+    #         cache_position=cache_position,
+++++-+    #         past_key_values=past_key_values,
+++++-+    #     )
+++++-+    
+++++-+    # @mindspore.jit(jit_level='O1')
+++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++-+    #     """
+++++-+    #     JIT编译的函数，用于高效的单token解码
+++++-+    #     使用JIT编译优化以支持静态shape和高效执行
+++++-+        
+++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++-+    #     """
+++++-+    #     outputs = self.model.forward(
+++++-+    #         input_ids=cur_token,
+++++-+    #         position_ids=input_pos,
+++++-+    #         cache_position=cache_position,
+++++-+    #         past_key_values=past_key_values,
+++++-+    #         use_cache=True,
+++++-+    #         return_dict=False,
+++++-+    #     )
+++++-+        
+++++-+    #     hidden_states = outputs[0]
+++++-+    #     logits = self.lm_head.forward(hidden_states)
+++++-+    #     logits = logits.float()
+++++-+        
+++++-+    #     return logits[:, -1, :]
+++++-+
+++++-+    # def _sample(
+++++-+    #     self,
+++++-+    #     input_ids: mindspore.Tensor,
+++++-+    #     logits_processor,
+++++-+    #     stopping_criteria,
+++++-+    #     generation_config,
+++++-+    #     synced_devices: bool,
+++++-+    #     streamer=None,
+++++-+    #     logits_warper=None,
+++++-+    #     **model_kwargs,
+++++-+    # ):
+++++-+    #     """
+++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++-+    #     """
+++++-+    #     from ...generation.logits_process import LogitsProcessorList
+++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++-+    #     from mindnlp.core import nn, ops, no_grad
+++++-+    #     import numpy as np
+++++-+        
+++++-+    #     # 检查是否使用 StaticCache
+++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++-+    #     # 否则，直接调用父类方法
+++++-+    #     past_key_values = model_kwargs.get("past_key_values")
+++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++-+
+++++-+    #     if not isinstance(past_key_values, StaticCache):
+++++-+    #         # 不使用 StaticCache，直接调用父类方法
+++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++-+    #         return super()._sample(
+++++-+    #             input_ids=input_ids,
+++++-+    #             logits_processor=logits_processor,
+++++-+    #             stopping_criteria=stopping_criteria,
+++++-+    #             generation_config=generation_config,
+++++-+    #             synced_devices=synced_devices,
+++++-+    #             streamer=streamer,
+++++-+    #             logits_warper=logits_warper,
+++++-+    #             **model_kwargs,
+++++-+    #         )
+++++-+        
+++++-+    #     # 使用 StaticCache，进入自定义循环
+++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++-+    #     pad_token_id = generation_config._pad_token_tensor
+++++-+    #     output_attentions = generation_config.output_attentions
+++++-+    #     output_hidden_states = generation_config.output_hidden_states
+++++-+    #     output_scores = generation_config.output_scores
+++++-+    #     output_logits = generation_config.output_logits
+++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++-+    #     max_length = generation_config.max_length
+++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++-+    #     do_sample = generation_config.do_sample
+++++-+        
+++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++-+    #         raise ValueError(
+++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++-+    #             f"{logits_warper})."
+++++-+    #         )
+++++-+        
+++++-+    #     # init attention / hidden states / scores tuples
+++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++-+        
+++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++-+    #         encoder_hidden_states = (
+++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++-+    #         )
+++++-+        
+++++-+    #     # keep track of which sequences are already finished
+++++-+    #     batch_size, cur_len = input_ids.shape
+++++-+    #     this_peer_finished = False
+++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++-+        
+++++-+    #     time_record = []
+++++-+    #     from ....utils.testing_utils import parse_flag_from_env
+++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++-+        
+++++-+    #     while self._has_unfinished_sequences(
+++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++-+    #     ):
+++++-+    #         if _record_time:
+++++-+    #             import time as time_module
+++++-+    #             infer_start = time_module.time()
+++++-+            
+++++-+    #         # prepare model inputs
+++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++-+            
+++++-+    #         # prepare variable output controls
+++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++-+            
+++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++-+    #         cur_cache_position = model_inputs.get("cache_position")
+++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++-+    #         cur_input_ids = model_inputs.get("input_ids")
+++++-+            
+++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++-+    #             cur_cache_position is not None and 
+++++-+    #             len(cur_cache_position.shape) > 0 and
+++++-+    #             cur_cache_position.shape[0] == 1 and
+++++-+    #             cur_input_ids is not None and
+++++-+    #             cur_input_ids.shape[1] == 1):
+++++-+    #             # 使用 JIT 优化的单 token 解码
+++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++-+    #             if not hasattr(self, '_jit_used'):
+++++-+    #                 self._jit_used = False
+++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++-+                
+++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++-+    #                 cur_token=cur_input_ids,
+++++-+    #                 input_pos=model_inputs.get("position_ids"),
+++++-+    #                 cache_position=cur_cache_position,
+++++-+    #                 past_key_values=cur_past_key_values,
+++++-+    #             )
+++++-+                
+++++-+    #             # 标记已使用JIT（用于后续判断）
+++++-+    #             if not self._jit_used:
+++++-+    #                 self._jit_used = True
+++++-+                
+++++-+    #             # 构造兼容的输出对象
+++++-+    #             class JitOptimizedOutput:
+++++-+    #                 def __init__(self, logits, config):
+++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++-+    #                     self.config = config
+++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++-+    #                     self.cross_attentions = None
+++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++-+                
+++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++-+    #         else:
+++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++-+    #             outputs = self(**model_inputs, return_dict=True)
+++++-+            
+++++-+    #         if synced_devices and this_peer_finished:
+++++-+    #             continue
+++++-+            
+++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++-+    #         next_token_logits = outputs.logits[:, -1, :]
+++++-+            
+++++-+    #         # pre-process distribution
+++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++-+    #         if do_sample:
+++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++-+            
+++++-+    #         # Store scores, attentions and hidden_states when required
+++++-+    #         if return_dict_in_generate:
+++++-+    #             if output_scores:
+++++-+    #                 scores += (next_token_scores,)
+++++-+    #             if output_logits:
+++++-+    #                 raw_logits += (next_token_logits,)
+++++-+    #             if output_attentions:
+++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++-+    #                 if self.config.is_encoder_decoder:
+++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++-+                
+++++-+    #             if output_hidden_states:
+++++-+    #                 hidden = (
+++++-+    #                     outputs.decoder_hidden_states
+++++-+    #                     if self.config.is_encoder_decoder
+++++-+    #                     else outputs.hidden_states
+++++-+    #                 )
+++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++-+            
+++++-+    #         # token selection
+++++-+    #         if do_sample:
+++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++-+    #         else:
+++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++-+            
+++++-+    #         # finished sentences should have their next token be a padding token
+++++-+    #         if has_eos_stopping_criteria:
+++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++-+            
+++++-+    #         # update generated ids, model inputs, and length for next step
+++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++-+    #         if streamer is not None:
+++++-+    #             streamer.put(next_tokens)
+++++-+            
+++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++-+    #             outputs,
+++++-+    #             model_kwargs,
+++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++-+    #         )
+++++-+            
+++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++-+    #         cur_len += 1
+++++-+            
+++++-+    #         if _record_time:
+++++-+    #             import time as time_module
+++++-+    #             infer_stop = time_module.time()
+++++-+    #             time_record.append(infer_stop - infer_start)
+++++-+            
+++++-+    #         del outputs
+++++-+        
+++++-+    #     average_infer_time = None
+++++-+    #     if time_record:
+++++-+    #         if len(time_record) > 1:
+++++-+    #             time_record.pop(0)
+++++-+    #         average_infer_time = sum(time_record) / len(time_record)
+++++-+    #         print(f'average inference time is: {average_infer_time}')
+++++-+    #         print(f'inference time record: {time_record}')
+++++-+        
+++++-+    #     if streamer is not None:
+++++-+    #         streamer.end()
+++++-+        
+++++-+    #     # 简单判断：打印是否使用了JIT路径
+++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++-+    #     else:
+++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++-+        
+++++-+    #     if return_dict_in_generate:
+++++-+    #         if self.config.is_encoder_decoder:
+++++-+    #             return GenerateEncoderDecoderOutput(
+++++-+    #                 sequences=input_ids,
+++++-+    #                 scores=scores,
+++++-+    #                 logits=raw_logits,
+++++-+    #                 encoder_attentions=encoder_attentions,
+++++-+    #                 encoder_hidden_states=encoder_hidden_states,
+++++-+    #                 decoder_attentions=decoder_attentions,
+++++-+    #                 cross_attentions=cross_attentions,
+++++-+    #                 decoder_hidden_states=decoder_hidden_states,
+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++-+    #                 average_infer_time=average_infer_time
+++++-+    #             )
+++++-+    #         else:
+++++-+    #             return GenerateDecoderOnlyOutput(
+++++-+    #                 sequences=input_ids,
+++++-+    #                 scores=scores,
+++++-+    #                 logits=raw_logits,
+++++-+    #                 attentions=decoder_attentions,
+++++-+    #                 hidden_states=decoder_hidden_states,
+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++-+    #                 average_infer_time=average_infer_time
+++++-+    #             )
+++++-+    #     else:
+++++-+    #         return input_ids
+++++-+            
+++++-+    # def _prepare_cache_for_generation(
+++++-+    #     self,
+++++-+    #     generation_config,
+++++-+    #     model_kwargs,
+++++-+    #     assistant_model,
+++++-+    #     batch_size,
+++++-+    #     max_cache_length,
+++++-+    # ):
+++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++-+    #         generation_config.cache_implementation = "static"
+++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++-+        
+++++-+    #     if generation_config.cache_implementation == "static":
+++++-+    #         base_required_from_max_length = generation_config.max_length + 1
+++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++-+    #         min_cache_size = 50
+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++-+    #         else:
+++++-+    #             max_cache_length = max(base_required, min_cache_size)
+++++-+            
+++++-+    #         original_max_cache_length = max_cache_length
+++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++-+            
+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++-+    #             if max_cache_length > self.config.max_position_embeddings:
+++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++-+        
+++++-+    #     result = super()._prepare_cache_for_generation(
+++++-+    #         generation_config=generation_config,
+++++-+    #         model_kwargs=model_kwargs,
+++++-+    #         assistant_model=assistant_model,
+++++-+    #         batch_size=batch_size,
+++++-+    #         max_cache_length=max_cache_length,
+++++-+    #     )
+++++-+        
+++++-+    #     if generation_config.cache_implementation == "static":
+++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++-+    #         created_cache = model_kwargs.get(cache_name)
+++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++-+        
+++++-+    #     return result
+++++-+
+++++-+
+++++-+
+++++- 
+++++- 
+++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++--- 
+++++-2.27.0
+++++-
+++++-- 
+++++2.27.0
+++++
++++-- 
++++2.27.0
++++
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
new file mode 100644
index 00000000..31d324c3
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
@@ -0,0 +1,8789 @@
+From 3b0f98eeed90a7204357d96aacc9dc7098b9dab1 Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Sun, 9 Nov 2025 00:50:01 +0800
+Subject: [PATCH 08/10] moe change
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  433 +-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
+ patches/0001-20251104commit.patch             |    2 +-
+ patches/0002-20251106commit.patch             |    2 +-
+ patches/0003-20261106secondcommit.patch       |    2 +-
+ patches/0004-20251106change.patch             |    2 +-
+ patches/0005-20251107001commit.patch          |    2 +-
+ patches/0006-20251107002commit.patch          |    2 +-
+ patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
+ 9 files changed, 8510 insertions(+), 55 deletions(-)
+ create mode 100644 patches/0007-20251107003commit.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index ff631974..0af29305 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -19,8 +19,10 @@
+ # limitations under the License.
+ """ MindNLP DeepSeek model."""
+ import math
++import time
+ import warnings
+ from typing import List, Optional, Tuple, Union
++from mindspore import mint
+ import mindspore
+ from mindnlp.core import nn, ops, no_grad
+ from mindnlp.core.nn import functional as F
+@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
+ 
+ _CONFIG_FOR_DOC = "DeepseekConfig"
+ 
++Long_Prompt = 1
++LONG_PROMPT_LENGTH_THRESHOLD = 128
++SHORT_PROMPT_LENGTH_THRESHOLD = 32
++
+ _attn_mask_cache = {}
+ 
+ def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
+         return topk_idx, topk_weight, aux_loss
+ 
+ 
++bincount_op = mindspore.ops.Bincount()
++
+ class DeepseekMoE(nn.Module):
+     """
+     A mixed expert module containing shared experts.
+@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
+                     y = y + self.shared_experts(identity)
+                 return y
+             else:
+-                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++                if Long_Prompt == 0:
++                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++                else:
++                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+                 if self.config.n_shared_experts is not None:
+                     y = y + self.shared_experts(identity)
+                 return y
+@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
+         # if self.config.n_shared_experts is not None:
+         #     y = y + self.shared_experts(identity)
+         # return y
+-        
++    
++    
++    
++    # lwx
++    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
++    #     """
++    #     如果 expert_ids 为 None，走单专家逻辑；
++    #     如果有，多专家批量处理，保证和原逻辑一致。
++    #     """
++    #     if expert_ids is None:
++    #         # 原单专家逻辑
++    #         if self.config.pretraining_tp > 1:
++    #             slice = self.intermediate_size // self.config.pretraining_tp
++    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
++    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
++    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
++    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
++    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
++    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
++    #                                for i in range(self.config.pretraining_tp)], dim=-1)
++    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
++    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
++    #                          for i in range(self.config.pretraining_tp)]
++    #             down_proj = sum(down_proj)
++    #         else:
++    #             down_proj = self.down_proj(
++    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
++    #             )
++    #         return down_proj
++
++    #     # ====== 批量多专家路径 ======
++    #     hidden_size = x.shape[-1]
++
++    #     # 按 token expert_ids 选权重
++    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
++    #     up_weights   = self.up_proj.weight[expert_ids]
++    #     down_weights = self.down_proj.weight[expert_ids]
++
++    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
++    #     if self.config.pretraining_tp > 1:
++    #         outputs = []
++    #         slice = self.intermediate_size // self.config.pretraining_tp
++    #         for i in range(self.config.pretraining_tp):
++    #             # 每个 slice 单独计算
++    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
++    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
++    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
++    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
++    #             outputs.append(down_proj_out)
++    #         return sum(outputs)
++    #     else:
++    #         gate_proj_out = F.linear(x, gate_weights)
++    #         up_proj_out   = F.linear(x, up_weights)
++    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
++    #         return F.linear(act_out, down_weights)
++    # @no_grad()
++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++    #     num_tokens = x.shape[0]
++    #     hidden_size = x.shape[-1]
++
++    #     idxs = flat_expert_indices.argsort()
++    #     sorted_expert_indices = flat_expert_indices[idxs]
++    #     sorted_token_indices = idxs // self.num_experts_per_tok
++    #     sorted_indices = sorted_token_indices
++
++    #     permuted_tokens = x[sorted_token_indices]
++    #     sorted_weights  = flat_expert_weights[idxs]
++
++    #     # 一次调用多专家 forward
++    #     expert_outputs = ops.zeros_like(permuted_tokens)
++    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
++
++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
++    #     try:
++    #         final_output = ops.moe_token_unpermute(
++    #             expert_outputs,
++    #             sorted_indices,
++    #             probs=probs,
++    #             padded_mode=False
++    #         )
++    #     except Exception:
++    #         final_output = ops.zeros_like(x)
++    #         final_output = mindspore.mint.scatter_add(
++    #             final_output,
++    #             0,
++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
++    #             expert_outputs * sorted_weights
++    #         )
++
++    #     return final_output
++
++    # def mlp_batch_forward(self, tokens, expert_ids):
++    #     """
++    #     使用批量专家 forward（保留精度）
++    #     """
++    #     return self.experts[0].forward(tokens, expert_ids)
++
+     # @no_grad()
+     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+ 
+@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
+     #         expert_cache += expert_out * weight
+     #     return expert_cache
+     
++    #@dwj
+     @no_grad()
+-    # dwj
+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-        # x 的 shape: (1, hidden_size)
+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-
+-        # 1. 收集所有需要的专家层
+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+         selected_experts = [self.experts[i] for i in flat_expert_indices]
+-
+-        # 2. 并行计算所有专家的输出
+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-        # ops.cat 会将它们堆叠成一个新的 Tensor
+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-
+-        # 3. 使用矩阵乘法进行加权求和
+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-        # 最终结果 final_output 的 shape: (1, hidden_size)
+         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-        
+         return final_output
+ 
+ 
+-    # @no_grad()
+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-    #     expert_cache = ops.zeros_like(x)
+-    #     idxs = flat_expert_indices.argsort()
+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-    #     token_idxs = idxs // self.num_experts_per_tok
+-
+-    #     for i, end_idx in enumerate(tokens_per_expert):
+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-    #         if start_idx == end_idx:
+-    #             continue
+-    #         expert = self.experts[i]
+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-    #         expert_tokens = x[exp_token_idx]
+-    #         expert_out = expert(expert_tokens)
+-    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-
+-    #     return expert_cache
+-        
+     @no_grad()
+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+         """
+@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
+             )
+ 
+         return expert_cache
++
++
++    # @no_grad()
++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++    #     """
++    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
++    #     """
++    #     num_tokens = x.shape[0]
++    #     hidden_size = x.shape[-1]
++
++    #     # 生成排序后的 token 索引
++    #     idxs = flat_expert_indices.argsort()
++    #     sorted_expert_indices = flat_expert_indices[idxs]
++    #     sorted_token_indices = idxs // self.num_experts_per_tok
++
++    #     # 记录到 sorted_indices（moe_token_unpermute 用）
++    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
++
++    #     # 收集专家输入
++    #     permuted_tokens = x[sorted_token_indices]
++
++    #     # 执行每个专家的 MLP（批量处理）
++    #     expert_outputs = []
++    #     token_ptr = 0
++    #     tokens_per_expert = sorted_expert_indices.bincount()
++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
++    #         if count == 0:
++    #             continue
++    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
++    #         out = self.experts[expert_id](cur_tokens)
++    #         expert_outputs.append(out)
++    #         token_ptr += count
++
++    #     # 拼接所有专家输出
++    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
++
++    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
++    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
++
++    #     # 直接调用硬件加速的 unpermute
++    #     final_output = ops.moe_token_unpermute(
++    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
++    #         sorted_indices,           # shape: [num_tokens * top_k]
++    #         probs=probs,               # 按概率加权
++    #         padded_mode=False
++    #     )
++
++    #     return final_output
++
++    # lwx prefill 20251108
++    @no_grad()
++    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
++        """
++        高性能 + 数值一致的 MoE prefill 推理：
++        1. 批量化处理所有专家计算，减少 Python 循环开销
++        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
++        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
++        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
++
++        参数：
++            x: [num_tokens, hidden_size]，
++            MoE 输入的 token 表示
++            flat_expert_indices: [num_tokens * top_k]，
++            每个 token 的路由专家 id
++            flat_expert_weights: [num_tokens * top_k, 1]，
++            路由专家权重
++        """
++        num_tokens = x.shape[0]
++        hidden_size = x.shape[-1]
++
++        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
++        idxs = flat_expert_indices.argsort()  # 排序索引
++        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
++        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
++
++        # sorted_indices 必须与 permuted_tokens 顺序匹配
++        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
++
++        # 2) 收集专家输入（按 idxs 排序）
++        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
++        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
++
++        # 3) 计算每个专家的 token 数
++        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
++
++        # 4) 批量专家计算（减少 Python 循环）
++        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
++        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
++        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
++
++        expert_outputs = ops.zeros_like(permuted_tokens)
++        ptr = 0
++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
++            if count == 0:
++                continue
++            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
++            
++            # 与 DeepseekMLP forward 等价
++            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
++            up_proj_out   = F.linear(tokens, up_weights[expert_id])
++            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
++            expert_out    = F.linear(act_out, down_weights[expert_id])
++
++            expert_outputs[ptr:ptr+count] = expert_out
++            ptr += count
++
++        # 5) Ascend 加速的 unpermute（已排序的权重）
++        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
++
++        final_output = ops.zeros_like(x)
++        final_output = mindspore.mint.scatter_add(
++            final_output,
++            0,
++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
++            expert_outputs * sorted_weights
++        )      
++
++
++        # try:
++        #     final_output = ops.moe_token_unpermute(
++        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
++        #         sorted_indices,   # [num_tokens*top_k] 原 token id
++        #         probs=probs,      # 对应权重
++        #         padded_mode=False
++        #     )
++        # except Exception:
++        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
++        #     final_output = ops.zeros_like(x)
++        #     final_output = mindspore.mint.scatter_add(
++        #         final_output,
++        #         0,
++        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
++        #         expert_outputs * sorted_weights
++        #     )
++
++        return final_output
++
++
++    # @no_grad()
++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++    #     num_tokens = x.shape[0]
++    #     hidden_size = x.shape[-1]
++
++    #     idxs = flat_expert_indices.argsort()
++    #     sorted_expert_indices = flat_expert_indices[idxs]
++    #     sorted_token_indices = idxs // self.num_experts_per_tok
++        
++    #     # sorted_indices = sorted_token_indices
++    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
++    #     permuted_tokens = x[sorted_token_indices]
++    #     sorted_weights = flat_expert_weights[idxs]
++    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
++
++    #     expert_outputs = ops.zeros_like(permuted_tokens)
++    #     ptr = 0
++
++    #     # 只按专家维度循环
++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
++    #         if count == 0:
++    #             continue
++    #         token_slice = slice(ptr, ptr + count)
++    #         expert_tokens = permuted_tokens[token_slice]
++
++    #         # 保持原 forward（含 pretraining_tp、bias 等）
++    #         expert_out = self.experts[expert_id](expert_tokens)
++
++    #         expert_outputs[token_slice] = expert_out
++    #         ptr += count
++
++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
++    #     try:
++    #         final_output = mindspore.ops.moe_token_unpermute(
++    #             expert_outputs,
++    #             sorted_indices,
++    #             probs=probs,
++    #             padded_mode=False
++    #         )
++    #     except Exception:
++    #         final_output = ops.zeros_like(x)
++    #         final_output = mindspore.mint.scatter_add(
++    #             final_output,
++    #             0,
++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
++    #             expert_outputs * sorted_weights
++    #         )
++
++    #     return final_output
++
++
++    #lwx
++    # @no_grad()
++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++    #     """
++    #     并行化 MoE prefill：
++    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
++    #     - 保证结果与原版完全一致
++    #     """
++    #     # 输出缓存
++    #     expert_cache = ops.zeros_like(x)
++
++    #     # token 总数（批量*seq_len*num_experts_per_tok）
++    #     num_tokens = flat_expert_indices.shape[0]
++    #     hidden_dim = x.shape[-1]
++
++    #     # 原 token ID（idxs // num_experts_per_tok）
++    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
++
++    #     # ====== Step 1: 组织输入 ======
++    #     # 按 experts 排序，保证 scatter_add 对应位置一致
++    #     sort_ids = flat_expert_indices.argsort()
++    #     sorted_experts = flat_expert_indices[sort_ids]
++    #     sorted_tokens = token_ids[sort_ids]
++    #     sorted_weights = flat_expert_weights[sort_ids]
++
++    #     # 收集每个专家的输入
++    #     # build: expert_inputs[expert_id] = [tokens...]
++    #     expert_inputs = []
++    #     expert_outs = []
++
++    #     for eid in range(self.config.n_routed_experts):
++    #         eid_mask = (sorted_experts == eid)
++    #         if eid_mask.any():
++    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
++    #             expert_inputs.append(tokens_for_eid)
++    #         else:
++    #             expert_inputs.append(None)
++
++    #     # ====== Step 2: 并行计算所有专家输出 ======
++    #     # 存储所有专家结果到一个列表
++    #     for eid in range(self.config.n_routed_experts):
++    #         if expert_inputs[eid] is not None:
++    #             out = self.experts[eid](expert_inputs[eid])
++    #             expert_outs.append(out)
++    #         else:
++    #             expert_outs.append(None)
++
++    #     # ====== Step 3: scatter_add 回写结果 ======
++    #     # 遍历专家，将结果加回对应的 token
++    #     pos = 0
++    #     for eid in range(self.config.n_routed_experts):
++    #         if expert_outs[eid] is not None:
++    #             size = expert_outs[eid].shape[0]
++    #             tokens_idx = sorted_tokens[pos:pos+size]
++    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
++    #             pos += size
++
++    #             # scatter_add 到 expert_cache
++    #             expert_cache = mindspore.mint.scatter_add(
++    #                 expert_cache,
++    #                 dim=0,
++    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
++    #                 src=scaled_out
++    #             )
++
++    #     return expert_cache
++
++
++
+ # 放置在 DeepseekMoE 类中
+     # @no_grad()
+     # #lwx 20251107
+@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
+         self.hidden_size = config.hidden_size
+ 
+         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-            # config=config, layer_idx=layer_idx
++        #     config=config, layer_idx=layer_idx
+         # )
+ 
+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
+             )
+             else DeepseekMLP(config)
+         )
++
+         self.input_layernorm = DeepseekRMSNorm(
+             config.hidden_size, eps=config.rms_norm_eps
+         )
+@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+     def get_decoder(self):
+         return self.model
+ 
++    def generate(self, *args, **kwargs):
++        """
++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++        """
++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++
++        input_ids = kwargs.get("input_ids")
++        if input_ids is None and args:
++            input_ids = args[0]
++
++        if input_ids is not None:
++            prompt_length = input_ids.shape[1]
++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
++                Long_Prompt = 2
++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
++                Long_Prompt = 0
++            else:
++                Long_Prompt = 1
++
++
++        return super().generate(*args, **kwargs)
+ 
+     def forward(
+         self,
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index 913a7609..6566958b 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+ 
+     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+     @no_grad()
+-    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+         original_dtype = hidden_states.dtype
+         batch_size, _ = hidden_states.shape
+         expert_outputs_list = [
+@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+         return moe_output_fp32.squeeze(1).to(original_dtype)
+ 
++
+     # @no_grad()
+-    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+     #     num_tokens, _ = hidden_states.shape
+     #     flat_selected_experts = selected_experts.flatten()
+     #     sorted_expert_indices = flat_selected_experts.argsort()
+@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+     #         current_token_offset += expert_token_count
+     #     return moe_output
+ 
++    # baseline
+     @no_grad()
+-    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+         """
+         优化版 MoE prefill (速度优先模式)：
+         - 批量张量化处理同一个 expert 的所有 token
+@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         return moe_output
+ 
+ 
++    @no_grad()
++    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++        """
++        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
++        逻辑：
++        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
++        2. 每个 expert 一次性处理其全部 token
++        3. 最后一次 scatter_add 回到原 token 顺序
++        """
++
++        num_tokens = hidden_states.shape[0]
++        hidden_size = hidden_states.shape[-1]
++
++        # 展平为一维
++        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
++        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
++
++        # 按 expert 排序
++        idxs = flat_selected_experts.argsort()
++        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
++        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
++
++        # 排好序的输入向量（连续内存）
++        permuted_tokens = hidden_states[sorted_token_indices]
++
++        # 排好序的权重
++        sorted_weights = flat_routing_weights[idxs]
++
++        # 每个 expert 对应的 token 数量
++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
++
++        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
++        expert_outputs = ops.zeros_like(permuted_tokens)
++
++        ptr = 0  # 指向当前切片的起点
++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
++            if count == 0:
++                continue
++
++            token_slice = slice(ptr, ptr + count)
++            expert_tokens = permuted_tokens[token_slice]  # 连续切片
++
++            # 执行专家 MLP
++            expert_out = self.experts[expert_id](expert_tokens)
++
++            expert_outputs[token_slice] = expert_out
++            ptr += count
++
++        # 按权重缩放
++        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
++
++        # 回写到原 token 顺序 (单次 scatter_add)
++        moe_output = mindspore.mint.scatter_add(
++            ops.zeros_like(hidden_states),
++            0,
++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
++            scaled_outputs
++        )
++
++        return moe_output
++
++
++    
+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++
+     @no_grad()
+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+         moe_output = ops.zeros_like(hidden_states)
+@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         #     # --- 速度优先模式 (SPEED MODE) ---
+         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+         #     if sequence_length == 1:
+-        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+         #     else:
+-        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+         
+         routing_weights_casted = routing_weights.to(hidden_states.dtype)
+         if sequence_length == 1:
+-            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+         else:
+-            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-    
++            # if Long_Prompt == 1:
++            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++            # else:
++            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++
+ 
+         # 3. 共享专家计算与合并 (所有模式通用)
+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+index c9c8c5ee..513dd40b 100644
+--- a/patches/0001-20251104commit.patch
++++ b/patches/0001-20251104commit.patch
+@@ -1,7 +1,7 @@
+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/6] 20251104commit
++Subject: [PATCH 1/7] 20251104commit
+ 
+ ---
+  mindnlp/transformers/cache_utils.py           |  28 +-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+index 625656eb..41081b85 100644
+--- a/patches/0002-20251106commit.patch
++++ b/patches/0002-20251106commit.patch
+@@ -1,7 +1,7 @@
+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/6] 20251106commit
++Subject: [PATCH 2/7] 20251106commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+index dcb85080..c1392569 100644
+--- a/patches/0003-20261106secondcommit.patch
++++ b/patches/0003-20261106secondcommit.patch
+@@ -1,7 +1,7 @@
+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/6] 20261106secondcommit
++Subject: [PATCH 3/7] 20261106secondcommit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+index bbed13cc..e548b1b2 100644
+--- a/patches/0004-20251106change.patch
++++ b/patches/0004-20251106change.patch
+@@ -1,7 +1,7 @@
+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-Subject: [PATCH 4/6] 20251106change
++Subject: [PATCH 4/7] 20251106change
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+index b2d1035c..bf224d2a 100644
+--- a/patches/0005-20251107001commit.patch
++++ b/patches/0005-20251107001commit.patch
+@@ -1,7 +1,7 @@
+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 11:48:18 +0800
+-Subject: [PATCH 5/6] 20251107001commit
++Subject: [PATCH 5/7] 20251107001commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |   91 +-
+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+index bffa134e..1bd306b9 100644
+--- a/patches/0006-20251107002commit.patch
++++ b/patches/0006-20251107002commit.patch
+@@ -1,7 +1,7 @@
+ From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 12:06:32 +0800
+-Subject: [PATCH 6/6] 20251107002commit
++Subject: [PATCH 6/7] 20251107002commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  122 +-
+diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
+new file mode 100644
+index 00000000..ce558554
+--- /dev/null
++++ b/patches/0007-20251107003commit.patch
+@@ -0,0 +1,8034 @@
++From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Fri, 7 Nov 2025 12:12:51 +0800
++Subject: [PATCH 7/7] 20251107003commit
++
++---
++ .../models/deepseek/modeling_deepseek.py      |    2 +-
++ patches/0001-20251104commit.patch             |    2 +-
++ patches/0002-20251106commit.patch             |    2 +-
++ patches/0003-20261106secondcommit.patch       |    2 +-
++ patches/0004-20251106change.patch             |    2 +-
++ patches/0005-20251107001commit.patch          |    2 +-
++ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
++ 7 files changed, 7937 insertions(+), 6 deletions(-)
++ create mode 100644 patches/0006-20251107002commit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index e7e1c053..ff631974 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
++     #     return expert_cache
++     
++     @no_grad()
++-    dwj
+++    # dwj
++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++         # x 的 shape: (1, hidden_size)
++         # flat_expert_indices 的 shape: (num_experts_per_tok,)
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++index 2842180e..c9c8c5ee 100644
++--- a/patches/0001-20251104commit.patch
+++++ b/patches/0001-20251104commit.patch
++@@ -1,7 +1,7 @@
++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++-Subject: [PATCH 1/5] 20251104commit
+++Subject: [PATCH 1/6] 20251104commit
++ 
++ ---
++  mindnlp/transformers/cache_utils.py           |  28 +-
++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++index c6cd8757..625656eb 100644
++--- a/patches/0002-20251106commit.patch
+++++ b/patches/0002-20251106commit.patch
++@@ -1,7 +1,7 @@
++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++-Subject: [PATCH 2/5] 20251106commit
+++Subject: [PATCH 2/6] 20251106commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++index 601960c9..dcb85080 100644
++--- a/patches/0003-20261106secondcommit.patch
+++++ b/patches/0003-20261106secondcommit.patch
++@@ -1,7 +1,7 @@
++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++-Subject: [PATCH 3/5] 20261106secondcommit
+++Subject: [PATCH 3/6] 20261106secondcommit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++index 8976f10b..bbed13cc 100644
++--- a/patches/0004-20251106change.patch
+++++ b/patches/0004-20251106change.patch
++@@ -1,7 +1,7 @@
++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 15:48:09 +0800
++-Subject: [PATCH 4/5] 20251106change
+++Subject: [PATCH 4/6] 20251106change
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  189 +-
++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
++index 8d9032be..b2d1035c 100644
++--- a/patches/0005-20251107001commit.patch
+++++ b/patches/0005-20251107001commit.patch
++@@ -1,7 +1,7 @@
++ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Fri, 7 Nov 2025 11:48:18 +0800
++-Subject: [PATCH 5/5] 20251107001commit
+++Subject: [PATCH 5/6] 20251107001commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |   91 +-
++diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
++new file mode 100644
++index 00000000..bffa134e
++--- /dev/null
+++++ b/patches/0006-20251107002commit.patch
++@@ -0,0 +1,7931 @@
+++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Fri, 7 Nov 2025 12:06:32 +0800
+++Subject: [PATCH 6/6] 20251107002commit
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |  122 +-
+++ patches/0001-20251104commit.patch             |    2 +-
+++ patches/0002-20251106commit.patch             |    2 +-
+++ patches/0003-20261106secondcommit.patch       |    2 +-
+++ patches/0004-20251106change.patch             |    2 +-
+++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
+++ 6 files changed, 7773 insertions(+), 64 deletions(-)
+++ create mode 100644 patches/0005-20251107001commit.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index 8831e4b7..e7e1c053 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
+++     #         expert_out = expert(x)
+++     #         expert_cache += expert_out * weight
+++     #     return expert_cache
+++-
+++-    # @no_grad()
+++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++-    #     # x 的 shape: (1, hidden_size)
+++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++-
+++-    #     # 1. 收集所有需要的专家层
+++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+++-
+++-    #     # 2. 并行计算所有专家的输出
+++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++-
+++-    #     # 3. 使用矩阵乘法进行加权求和
+++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++    
++++    @no_grad()
++++    dwj
++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++        # x 的 shape: (1, hidden_size)
++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++
++++        # 1. 收集所有需要的专家层
++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++++
++++        # 2. 并行计算所有专家的输出
++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++        # ops.cat 会将它们堆叠成一个新的 Tensor
++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++
++++        # 3. 使用矩阵乘法进行加权求和
++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++        # 最终结果 final_output 的 shape: (1, hidden_size)
++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++         
+++-    #     return final_output
++++        return final_output
+++ 
+++ 
+++     # @no_grad()
+++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
+++ 
+++         return expert_cache
+++ # 放置在 DeepseekMoE 类中
+++-    @no_grad()
+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++-        """
+++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+++-
+++-        Args:
+++-            x (Tensor): 输入张量, shape: (1, hidden_size)
+++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+++-        """
+++-        top_k, _ = flat_expert_weights.shape
+++-        hidden_size = x.shape[-1]
+++-
+++-        # 1. 将所有专家的权重堆叠起来
+++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++++    # @no_grad()
++++    # #lwx 20251107
++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++    #     """
++++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++++
++++    #     Args:
++++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
++++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++++    #     """
++++    #     top_k, _ = flat_expert_weights.shape
++++    #     hidden_size = x.shape[-1]
++++
++++    #     # 1. 将所有专家的权重堆叠起来
++++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+++         
+++-        # 2. "收集" 所需的专家权重
+++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
+++-        selected_up_w = stacked_up_w[flat_expert_indices]
+++-        selected_down_w = stacked_down_w[flat_expert_indices]
++++    #     # 2. "收集" 所需的专家权重
++++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
++++    #     selected_up_w = stacked_up_w[flat_expert_indices]
++++    #     selected_down_w = stacked_down_w[flat_expert_indices]
+++         
+++-        # 3. 准备输入
+++-        x_expanded = x.expand((top_k, 1, hidden_size))
++++    #     # 3. 准备输入
++++    #     x_expanded = x.expand((top_k, 1, hidden_size))
+++         
+++-        # 4. 并行计算 gate_proj 和 up_proj
+++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++++    #     # 4. 并行计算 gate_proj 和 up_proj
++++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+++ 
+++-        # 5. 计算中间状态
+++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++++    #     # 5. 计算中间状态
++++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+++         
+++-        # 6. 并行计算 down_proj
+++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+++-        #    --- [FIX] ---
+++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+++-        #    --- [FIX END] ---
++++    #     # 6. 并行计算 down_proj
++++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++++    #     #    --- [FIX] ---
++++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++++    #     #    --- [FIX END] ---
+++         
+++-        # 7. 根据路由权重进行加权求和
+++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++++    #     # 7. 根据路由权重进行加权求和
++++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+++ 
+++-        return weighted_sum
++++    #     return weighted_sum
+++ 
+++ 
+++ 
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++index 0a0ef2d7..2842180e 100644
+++--- a/patches/0001-20251104commit.patch
++++++ b/patches/0001-20251104commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+++-Subject: [PATCH 1/4] 20251104commit
++++Subject: [PATCH 1/5] 20251104commit
+++ 
+++ ---
+++  mindnlp/transformers/cache_utils.py           |  28 +-
+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++index 5185270c..c6cd8757 100644
+++--- a/patches/0002-20251106commit.patch
++++++ b/patches/0002-20251106commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+++-Subject: [PATCH 2/4] 20251106commit
++++Subject: [PATCH 2/5] 20251106commit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++index 3e05f821..601960c9 100644
+++--- a/patches/0003-20261106secondcommit.patch
++++++ b/patches/0003-20261106secondcommit.patch
+++@@ -1,7 +1,7 @@
+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+++-Subject: [PATCH 3/4] 20261106secondcommit
++++Subject: [PATCH 3/5] 20261106secondcommit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+++index 88a1aef4..8976f10b 100644
+++--- a/patches/0004-20251106change.patch
++++++ b/patches/0004-20251106change.patch
+++@@ -1,7 +1,7 @@
+++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 15:48:09 +0800
+++-Subject: [PATCH 4/4] 20251106change
++++Subject: [PATCH 4/5] 20251106change
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  189 +-
+++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+++new file mode 100644
+++index 00000000..8d9032be
+++--- /dev/null
++++++ b/patches/0005-20251107001commit.patch
+++@@ -0,0 +1,7707 @@
++++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Fri, 7 Nov 2025 11:48:18 +0800
++++Subject: [PATCH 5/5] 20251107001commit
++++
++++---
++++ .../models/deepseek/modeling_deepseek.py      |   91 +-
++++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
++++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
++++ patches/0001-20251104commit.patch             |    2 +-
++++ patches/0002-20251106commit.patch             |    2 +-
++++ patches/0003-20261106secondcommit.patch       |    2 +-
++++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
++++ 7 files changed, 7577 insertions(+), 30 deletions(-)
++++ create mode 100644 patches/0004-20251106change.patch
++++
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index 0546f318..8831e4b7 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
++++     #         expert_cache += expert_out * weight
++++     #     return expert_cache
++++ 
++++-    @no_grad()
++++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++-        # x 的 shape: (1, hidden_size)
++++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++-
++++-        # 1. 收集所有需要的专家层
++++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
++++-
++++-        # 2. 并行计算所有专家的输出
++++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++-        # ops.cat 会将它们堆叠成一个新的 Tensor
++++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++-
++++-        # 3. 使用矩阵乘法进行加权求和
++++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++-        # 最终结果 final_output 的 shape: (1, hidden_size)
++++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++++    # @no_grad()
+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     # x 的 shape: (1, hidden_size)
+++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++++
+++++    #     # 1. 收集所有需要的专家层
+++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+++++
+++++    #     # 2. 并行计算所有专家的输出
+++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++++
+++++    #     # 3. 使用矩阵乘法进行加权求和
+++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++         
++++-        return final_output
+++++    #     return final_output
++++ 
++++ 
++++     # @no_grad()
++++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
++++             )
++++ 
++++         return expert_cache
+++++# 放置在 DeepseekMoE 类中
+++++    @no_grad()
+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++        """
+++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+++++
+++++        Args:
+++++            x (Tensor): 输入张量, shape: (1, hidden_size)
+++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+++++        """
+++++        top_k, _ = flat_expert_weights.shape
+++++        hidden_size = x.shape[-1]
+++++
+++++        # 1. 将所有专家的权重堆叠起来
+++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+++++        
+++++        # 2. "收集" 所需的专家权重
+++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
+++++        selected_up_w = stacked_up_w[flat_expert_indices]
+++++        selected_down_w = stacked_down_w[flat_expert_indices]
+++++        
+++++        # 3. 准备输入
+++++        x_expanded = x.expand((top_k, 1, hidden_size))
+++++        
+++++        # 4. 并行计算 gate_proj 和 up_proj
+++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+++++
+++++        # 5. 计算中间状态
+++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+++++        
+++++        # 6. 并行计算 down_proj
+++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+++++        #    --- [FIX] ---
+++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+++++        #    --- [FIX END] ---
+++++        
+++++        # 7. 根据路由权重进行加权求和
+++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+++++
+++++        return weighted_sum
+++++
+++++
++++ 
++++         # @no_grad()
++++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++index ebd7782e..913a7609 100644
++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++-    x1 = x[..., : x.shape[-1] // 2]
++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++    # x1 = x[..., : x.shape[-1] // 2]
+++++    # x2 = x[..., x.shape[-1] // 2 :]
++++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++++index d059dcbe..2b217b64 100644
++++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++ def rotate_half(x):
++++     """Rotates half the hidden dims of the input."""
++++-    x1 = x[..., : x.shape[-1] // 2]
++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++    # x1 = x[..., : x.shape[-1] // 2]
+++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++     return ops.cat((-x2, x1), dim=-1)
++++ 
++++ 
++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++index 78f22642..0a0ef2d7 100644
++++--- a/patches/0001-20251104commit.patch
+++++++ b/patches/0001-20251104commit.patch
++++@@ -1,7 +1,7 @@
++++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++++-Subject: [PATCH 1/3] 20251104commit
+++++Subject: [PATCH 1/4] 20251104commit
++++ 
++++ ---
++++  mindnlp/transformers/cache_utils.py           |  28 +-
++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++++index 22b65dd5..5185270c 100644
++++--- a/patches/0002-20251106commit.patch
+++++++ b/patches/0002-20251106commit.patch
++++@@ -1,7 +1,7 @@
++++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++++-Subject: [PATCH 2/3] 20251106commit
+++++Subject: [PATCH 2/4] 20251106commit
++++ 
++++ ---
++++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++++index 966529e4..3e05f821 100644
++++--- a/patches/0003-20261106secondcommit.patch
+++++++ b/patches/0003-20261106secondcommit.patch
++++@@ -1,7 +1,7 @@
++++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++++-Subject: [PATCH 3/3] 20261106secondcommit
+++++Subject: [PATCH 3/4] 20261106secondcommit
++++ 
++++ ---
++++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++++new file mode 100644
++++index 00000000..88a1aef4
++++--- /dev/null
+++++++ b/patches/0004-20251106change.patch
++++@@ -0,0 +1,7498 @@
+++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Thu, 6 Nov 2025 15:48:09 +0800
+++++Subject: [PATCH 4/4] 20251106change
+++++
+++++---
+++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
+++++ patches/0001-20251104commit.patch             | 1272 +++++++
+++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
+++++ create mode 100644 patches/0001-20251104commit.patch
+++++ create mode 100644 patches/0002-20251106commit.patch
+++++ create mode 100644 patches/0003-20261106secondcommit.patch
+++++
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index 2f9192bf..0546f318 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+++++ 
+++++         return attn_output, attn_weights, past_key_value
+++++ 
+++++-# class DeepseekFlashAttention(nn.Module):
+++++-#     """
+++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+++++-
+++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
+++++-#     """
+++++-
+++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++++-#         super().__init__()
+++++-#         self.config = config
+++++-#         self.layer_idx = layer_idx
+++++-#         if layer_idx is None:
+++++-#             logger.warning(
+++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++-#                 "when creating this class."
+++++-#             )
+++++-
+++++-#         self.attention_dropout = config.attention_dropout
+++++-#         self.hidden_size = config.hidden_size
+++++-#         self.num_heads = config.num_attention_heads
+++++-#         self.head_dim = self.hidden_size // self.num_heads
+++++-#         self.num_key_value_heads = config.num_key_value_heads
+++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++-#         self.max_position_embeddings = config.max_position_embeddings
+++++-#         self.rope_theta = config.rope_theta
+++++-#         self.is_causal = True
+++++-
+++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++-#             raise ValueError(
+++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++-#                 f" and `num_heads`: {self.num_heads})."
+++++-#             )
+++++-
+++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++++-#         self._init_rope()
+++++-
+++++-#     def _init_rope(self):
+++++-#         if self.config.rope_scaling is None:
+++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
+++++-#                 self.head_dim,
+++++-#                 max_position_embeddings=self.max_position_embeddings,
+++++-#                 base=self.rope_theta,
+++++-#             )
+++++-#         else:
+++++-#             scaling_type = self.config.rope_scaling["type"]
+++++-#             scaling_factor = self.config.rope_scaling["factor"]
+++++-#             if scaling_type == "linear":
+++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++++-#                     self.head_dim,
+++++-#                     max_position_embeddings=self.max_position_embeddings,
+++++-#                     scaling_factor=scaling_factor,
+++++-#                     base=self.rope_theta,
+++++-#                 )
+++++-#             elif scaling_type == "dynamic":
+++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++++-#                     self.head_dim,
+++++-#                     max_position_embeddings=self.max_position_embeddings,
+++++-#                     scaling_factor=scaling_factor,
+++++-#                     base=self.rope_theta,
+++++-#                 )
+++++-#             else:
+++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++++-
+++++-#     def forward(
+++++-#         self,
+++++-#         hidden_states: mindspore.Tensor,
+++++-#         attention_mask: Optional[mindspore.Tensor] = None,
+++++-#         position_ids: Optional[mindspore.Tensor] = None,
+++++-#         past_key_value: Optional[Cache] = None,
+++++-#         output_attentions: bool = False,
+++++-#         use_cache: bool = False,
+++++-#         **kwargs,
+++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++-#         if "padding_mask" in kwargs:
+++++-#             warnings.warn(
+++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++++-#             )
+++++-        
+++++-#         if output_attentions:
+++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+++++-
+++++-#         bsz, q_len, _ = hidden_states.shape
+++++-
+++++-#         if self.config.pretraining_tp > 1:
+++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++++-
+++++-#         query_states = self.q_proj(hidden_states)
+++++-#         key_states = self.k_proj(hidden_states)
+++++-#         value_states = self.v_proj(hidden_states)
+++++-
+++++-#         # Reshape for multi-head attention
+++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++-
+++++-#         kv_seq_len = key_states.shape[-2]
+++++-#         if past_key_value is not None:
+++++-#             if self.layer_idx is None:
+++++-#                 raise ValueError(
+++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++-#                     "with a layer index."
+++++-#                 )
+++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++-        
+++++-#         # Apply Rotary Positional Embedding
+++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++-
+++++-#         if past_key_value is not None:
+++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++-
+++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++-        
+++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++-        
+++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++-
+++++-#         # Convert attention_mask for flash_attention_score
+++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+++++-#         if attention_mask is not None:
+++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++++-#                 raise ValueError(
+++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++++-#                 )
+++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+++++-#         else:
+++++-#             attn_mask_for_fa = None
+++++-        
+++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++++-
+++++-#         # Call the fused flash_attention_score operator
+++++-#         attn_output = mindspore.ops.flash_attention_score(
+++++-#             query=query_states_for_fa,
+++++-#             key=key_states_for_fa,
+++++-#             value=value_states_for_fa,
+++++-#             head_num=self.num_heads, # This is N1, the number of query heads
+++++-#             input_layout='BSH',
+++++-#             attn_mask=attn_mask_for_fa,
+++++-#             keep_prob=keep_prob,
+++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++-#             sparse_mode=0 # Default mask mode
+++++-#         )
+++++-        
+++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+++++-#         attn_output = self.o_proj(attn_output)
+++++-        
+++++-#         # Flash Attention does not return attention weights
+++++-#         attn_weights = None
+++++-
+++++-#         return attn_output, attn_weights, past_key_value
+++++ 
+++++ class DeepseekFlashAttention(nn.Module):
+++++     """
+++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+++++         super().__init__()
+++++         self.hidden_size = config.hidden_size
+++++ 
+++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+++++-            config=config, layer_idx=layer_idx
+++++-        )
++++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++++++            # config=config, layer_idx=layer_idx
++++++        # )
+++++ 
+++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+++++             config=config, layer_idx=layer_idx
+++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+++++         return outputs
+++++ 
+++++ 
+++++-
+++++ class DeepseekPreTrainedModel(PreTrainedModel):
+++++     config_class = DeepseekConfig
+++++     base_model_prefix = "model"
+++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++         # Initialize weights and apply final processing
+++++         self.post_init()
+++++         self.warm_up = False
+++++-        #@dwj
+++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+++++-            self.num_layers,
+++++-            self.num_attention_heads,
+++++-            self.head_dim,
+++++-            batch_size=1,
+++++-            max_length=self.max_length,
+++++-            dtype=mindspore.float16
+++++-        )
+++++-
+++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+++++-        key_cache = []
+++++-        value_cache = []
+++++-        for _ in range(num_layers):
+++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++-            key_cache.append(k)
+++++-            value_cache.append(v)
+++++-        return key_cache, value_cache
+++++-
+++++ 
+++++     def warmup_moe_model_deep(self):
+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++new file mode 100644
+++++index 00000000..78f22642
+++++--- /dev/null
++++++++ b/patches/0001-20251104commit.patch
+++++@@ -0,0 +1,1272 @@
++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
++++++Subject: [PATCH 1/3] 20251104commit
++++++
++++++---
++++++ mindnlp/transformers/cache_utils.py           |  28 +-
++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
++++++
++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++++index cadd2e04..02f8d4be 100644
++++++--- a/mindnlp/transformers/cache_utils.py
+++++++++ b/mindnlp/transformers/cache_utils.py
++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++++             # k_out[:, :, cache_position] = key_states
++++++             # v_out[:, :, cache_position] = value_states
++++++-            if ON_ORANGE_PI:
++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++-            else:
++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++-
+++++++            # if ON_ORANGE_PI:
+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++            # else:
+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++++            if cache_position.ndim > 1:
+++++++                cache_position = cache_position.flatten()
+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++++                cache_position = cache_position.int()
+++++++            
+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++++            k_out[:, :, cache_position] = key_states
+++++++            v_out[:, :, cache_position] = value_states
+++++++            
++++++         return k_out, v_out
++++++ 
++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++index c695b944..d8303e45 100644
++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++++ def rotate_half(x):
++++++     """Rotates half the hidden dims of the input."""
++++++-    x1 = x[..., : x.shape[-1] // 2]
++++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++    # x1 = x[..., : x.shape[-1] // 2]
+++++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++     return ops.cat((-x2, x1), dim=-1)
++++++ 
++++++ 
++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++++         if self.training:
++++++             raise NotImplementedError("Training is not supported yet.")
++++++         else:
++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++-        if self.config.n_shared_experts is not None:
++++++-            y = y + self.shared_experts(identity)
++++++-        return y
+++++++            # @lwx
+++++++            if orig_shape[1] == 1:
+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++++                y=y.view(*orig_shape)
+++++++                if self.config.n_shared_experts is not None:
+++++++                    y = y + self.shared_experts(identity)
+++++++                return y
+++++++            else:
+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++++                if self.config.n_shared_experts is not None:
+++++++                    y = y + self.shared_experts(identity)
+++++++                return y
+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++        # if self.config.n_shared_experts is not None:
+++++++        #     y = y + self.shared_experts(identity)
+++++++        # return y
+++++++        
+++++++    @no_grad()
+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++
+++++++        expert_cache = ops.zeros_like(x)
+++++++        for i in range(self.num_experts_per_tok):
+++++++            expert_id = flat_expert_indices[i].item()
+++++++            weight = flat_expert_weights[i].item()
+++++++            expert = self.experts[expert_id]
+++++++            expert_out = expert(x)
+++++++            expert_cache += expert_out * weight
+++++++        return expert_cache
++++++ 
++++++     @no_grad()
++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-        # expert_cache = torch.zeros_like(x)
++++++-        # idxs = flat_expert_indices.argsort()
++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++-        # token_idxs = idxs // self.num_experts_per_tok
++++++-        # for i, end_idx in enumerate(tokens_per_expert):
++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++-        #     if start_idx == end_idx:
++++++-        #         continue
++++++-        #     expert = self.experts[i]
++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++-        #     expert_tokens = x[exp_token_idx]
++++++-        #     expert_out = expert(expert_tokens)
++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++-        # return expert_cache
+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++         expert_cache = ops.zeros_like(x)
++++++         idxs = flat_expert_indices.argsort()
++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++         token_idxs = idxs // self.num_experts_per_tok
+++++++
++++++         for i, end_idx in enumerate(tokens_per_expert):
++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++             if start_idx == end_idx:
++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++++             expert_out = expert(expert_tokens)
++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++
++++++         return expert_cache
+++++++        
+++++++    # @no_grad()
+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++    #     # expert_cache = torch.zeros_like(x)
+++++++    #     # idxs = flat_expert_indices.argsort()
+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++    #     #     if start_idx == end_idx:
+++++++    #     #         continue
+++++++    #     #     expert = self.experts[i]
+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #     #     expert_tokens = x[exp_token_idx]
+++++++    #     #     expert_out = expert(expert_tokens)
+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++    #     # return expert_cache
+++++++    #     expert_cache = ops.zeros_like(x)
+++++++    #     idxs = flat_expert_indices.argsort()
+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++    #         if start_idx == end_idx:
+++++++    #             continue
+++++++    #         expert = self.experts[i]
+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #         expert_tokens = x[exp_token_idx]
+++++++    #         expert_out = expert(expert_tokens)
+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++
+++++++    #     return expert_cache
+++++++    # @no_grad()
+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++    #     expert_cache = ops.zeros_like(x)
+++++++
+++++++    #     # 排序保证顺序一致
+++++++    #     idxs = flat_expert_indices.argsort()
+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++    #     # 找出有 token 的专家
+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++
+++++++    #     for i in active_experts.tolist():
+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++    #         end_idx = tokens_per_expert[i]
+++++++    #         if start_idx == end_idx:  # 没有 token
+++++++    #             continue
+++++++
+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++    #         expert_tokens = x[exp_token_idx]
+++++++    #         expert_out = self.experts[i](expert_tokens)
+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++
+++++++    #         expert_cache = mindspore.mint.scatter_add(
+++++++    #             expert_cache,
+++++++    #             0,
+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++    #             expert_out
+++++++    #         )
+++++++
+++++++    #     return expert_cache
+++++++
+++++++
++++++ 
++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++++ #     """
++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++ 
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
+++++++        self.warm_up = False
+++++++
+++++++    def warmup_moe_model_deep(self):
+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++++        test_texts = [
+++++++            "warmup short",
+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++++        ]
+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++        if tokenizer is None:
+++++++            from mindnlp.transformers import AutoTokenizer
+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++            self._warmup_tokenizer = tokenizer
+++++++
+++++++        for text in test_texts:
+++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++            with mindspore._no_grad():
+++++++                _ = self(**inputs, use_cache=False)
+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++++ 
++++++     def get_input_embeddings(self):
++++++         return self.model.embed_tokens
++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++         ```"""
+++++++        if not self.warm_up:
+++++++            self.warm_up = True
+++++++            self.warmup_moe_model_deep()
+++++++
++++++         output_attentions = (
++++++             output_attentions
++++++             if output_attentions is not None
++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++index 3cbf820e..d4c6b651 100644
++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++@@ -18,7 +18,6 @@
++++++ # See the License for the specific language governing permissions and
++++++ # limitations under the License.
++++++ """MindSpore Qwen2MoE model."""
++++++-
++++++ import math
++++++ from typing import List, Optional, Tuple, Union
++++++ 
++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++++     TokenClassifierOutput,
++++++ )
++++++ from ...modeling_utils import PreTrainedModel
+++++++from ...generation import GenerationMixin
++++++ from ....utils import logging
++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
++++++ 
++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++++         self.variance_epsilon = eps
++++++ 
++++++     def forward(self, hidden_states):
+++++++        # @dwj
+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++        # @lwx
+++++++        # if not self.training :
+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++         input_dtype = hidden_states.dtype
++++++         hidden_states = hidden_states.to(mindspore.float32)
++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++++@@ -234,6 +239,8 @@ def rotate_half(x):
++++++     """Rotates half the hidden dims of the input."""
++++++     x1 = x[..., : x.shape[-1] // 2]
++++++     x2 = x[..., x.shape[-1] // 2 :]
+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++     return ops.cat((-x2, x1), dim=-1)
++++++ 
++++++ 
++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++++         self.config = config
++++++         self.hidden_size = config.hidden_size
++++++         self.intermediate_size = intermediate_size
+++++++        
++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++++         self.act_fn = ACT2FN[config.hidden_act]
++++++ 
++++++     def forward(self, x):
++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++-
++++++ 
+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++        # @lwx 
+++++++        # gate_up_output = self.gate_up_proj(x)
+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++++        # return self.down_proj(swiglu_output)
+++++++
+++++++    # def forward(self, x):
+++++++    #     gate_proj_out = self.gate_proj(x)
+++++++    #     up_proj_out = self.up_proj(x)
+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++++    #     return self.down_proj(swiglu_out)
+++++++        
++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++++     """
++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++++         use_cache: bool = False,
++++++         cache_position: Optional[mindspore.Tensor] = None,
++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++        
+++++++
++++++         bsz, q_len, _ = hidden_states.shape
++++++ 
++++++         query_states = self.q_proj(hidden_states)
++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++                     "with a layer index."
++++++                 )
++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++            if isinstance(past_key_value, StaticCache):
+++++++                kv_seq_len = key_states.shape[-2]
+++++++            else:
+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++ 
++++++         if past_key_value is not None:
++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++            
+++++++            if isinstance(past_key_value, StaticCache):
+++++++                kv_seq_len = key_states.shape[-2]
++++++ 
++++++         # repeat k/v heads if n_kv_heads < n_heads
++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++-
+++++++        
++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++ 
++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++++-            raise ValueError(
++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++++-                f" {attn_weights.shape}"
++++++-            )
++++++-
++++++-        if attention_mask is not None:  # no matter the length, we just slice it
++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++++        if attention_mask is not None:
+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++             attn_weights = attn_weights + causal_mask
++++++ 
++++++         # upcast attention to fp32
++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++ 
++++++         attn_output = self.o_proj(attn_output)
++++++-
+++++++        # @lwx
+++++++        
+++++++        # max_seq_len = self.max_position_embeddings  # 2048
+++++++
+++++++        # if attention_mask is not None:
+++++++        #     # attention_mask: [B, 1, Sq, Sk]
+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++
+++++++        #     # pad 到 [max_seq_len, max_seq_len]
+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++        #     global_attention_mask = padded_mask
+++++++        # else:
+++++++        #     global_attention_mask = None
+++++++
+++++++
+++++++        # sparse_mode=3
+++++++        # attn_output = mindspore.ops.flash_attention_score(
+++++++        #     query=query_states,
+++++++        #     key=key_states,
+++++++        #     value=value_states,
+++++++        #     real_shift=None,
+++++++        #     padding_mask=None,
+++++++
+++++++        #     head_num=self.num_heads,
+++++++        #     attn_mask=global_attention_mask,
+++++++        #     keep_prob=1.0 - self.attention_dropout,
+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++        #     input_layout="BNSD",
+++++++        #     pre_tokens=2147483647,
+++++++        #     next_tokens=2147483647,
+++++++        #     inner_precise=0,
+++++++        #     drop_mask=None,
+++++++        #     prefix=None,
+++++++        #     actual_seq_qlen=None,
+++++++        #     actual_seq_kvlen=None,
+++++++        #     sparse_mode=sparse_mode,
+++++++        # )
++++++         if not output_attentions:
++++++             attn_weights = None
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++ 
+++++++class Qwen2MoeFlashAttention(nn.Module):
+++++++    """
+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++
+++++++    关键改动:
+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++       直接传入原始的 key 和 value 张量效率更高。
+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++    """
+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++        super().__init__()
+++++++        self.config = config
+++++++        self.layer_idx = layer_idx
+++++++        self.hidden_size = config.hidden_size
+++++++        self.num_heads = config.num_attention_heads
+++++++        self.head_dim = self.hidden_size // self.num_heads
+++++++        self.num_key_value_heads = config.num_key_value_heads
+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++        self.max_position_embeddings = config.max_position_embeddings
+++++++        self.rope_theta = config.rope_theta
+++++++        self.attention_dropout = config.attention_dropout
+++++++
+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++            raise ValueError(
+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++            )
+++++++
+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++
+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++            self.head_dim,
+++++++            max_position_embeddings=self.max_position_embeddings,
+++++++            base=self.rope_theta,
+++++++        )
+++++++
+++++++    def forward(
+++++++        self,
+++++++        hidden_states: mindspore.Tensor,
+++++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++++        position_ids: Optional[mindspore.Tensor] = None,
+++++++        past_key_value: Optional[Cache] = None,
+++++++        output_attentions: bool = False,
+++++++        use_cache: bool = False,
+++++++        cache_position: Optional[mindspore.Tensor] = None,
+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++        bsz, q_len, _ = hidden_states.shape
+++++++
+++++++        # 1. 线性投射 Q, K, V
+++++++        query_states = self.q_proj(hidden_states)
+++++++        key_states = self.k_proj(hidden_states)
+++++++        value_states = self.v_proj(hidden_states)
+++++++
+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++        # 3. RoPE 旋转位置编码
+++++++        kv_seq_len = key_states.shape[-2]
+++++++        if past_key_value is not None:
+++++++            if self.layer_idx is None:
+++++++                raise ValueError(
+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++                    "with a layer index."
+++++++                )
+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++                if cache_position.shape[0] == 1:
+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++                    kv_seq_len = past_seen_tokens + 1
+++++++                else:
+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++            else:
+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++        # 4. KV 缓存更新
+++++++        if past_key_value is not None:
+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++            key_states, value_states = past_key_value.update(
+++++++                key_states, value_states, self.layer_idx, cache_kwargs
+++++++            )
+++++++            
+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++                if cache_position.shape[0] == 1:
+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++                    kv_seq_len = key_states.shape[-2]
+++++++
+++++++        # 5. [重要] 准备 Attention Mask
+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++        fa_attention_mask = None
+++++++        if attention_mask is not None:
+++++++            # 截取与当前key长度匹配的部分
+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++            fa_attention_mask = (mask_slice != 0)
+++++++
+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++        input_dtype = query_states.dtype
+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++            query_states = query_states.to(mindspore.float16)
+++++++            key_states = key_states.to(mindspore.float16)
+++++++            value_states = value_states.to(mindspore.float16)
+++++++
+++++++        # 6. [核心] 调用 flash_attention_score 算子
+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++        attn_output = mindspore.ops.flash_attention_score(
+++++++            query=query_states,
+++++++            key=key_states,
+++++++            value=value_states,
+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+++++++            attn_mask=fa_attention_mask,
+++++++            keep_prob=1.0 - self.attention_dropout,
+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++            input_layout="BNSD",
+++++++            sparse_mode=0 # 使用 defaultMask 模式
+++++++        )
+++++++
+++++++        # 恢复原始数据类型
+++++++        attn_output = attn_output.to(input_dtype)
+++++++
+++++++        # 7. 调整输出形状
+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++        attn_output = self.o_proj(attn_output)
+++++++
+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+++++++        attn_weights = None
+++++++        if output_attentions:
+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++        return attn_output, attn_weights, past_key_value
+++++++
+++++++    # def forward(
+++++++    #     self,
+++++++    #     hidden_states: mindspore.Tensor,
+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++    #     past_key_value: Optional[Cache] = None,
+++++++    #     output_attentions: bool = False,
+++++++    #     use_cache: bool = False,
+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++
+++++++    #     # 1. 线性投射 Q, K, V
+++++++    #     query_states = self.q_proj(hidden_states)
+++++++    #     key_states = self.k_proj(hidden_states)
+++++++    #     value_states = self.v_proj(hidden_states)
+++++++
+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++    #     # 3. RoPE 旋转位置编码
+++++++    #     kv_seq_len = key_states.shape[-2]
+++++++    #     if past_key_value is not None:
+++++++    #         if self.layer_idx is None:
+++++++    #             raise ValueError(
+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++    #                 "with a layer index."
+++++++    #             )
+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++    #     # 4. KV 缓存更新
+++++++    #     if past_key_value is not None:
+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++    #         key_states, value_states = past_key_value.update(
+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++    #         )
+++++++
+++++++    #     # 5. 准备 Attention Mask
+++++++    #     fa_attention_mask = None
+++++++    #     if attention_mask is not None:
+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++
+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++    #     input_dtype = query_states.dtype
+++++++
+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++    #         query=query_states,
+++++++    #         key=key_states,
+++++++    #         value=value_states,
+++++++    #         head_num=self.num_heads,
+++++++    #         attn_mask=fa_attention_mask,
+++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++    #         input_layout="BNSD",
+++++++    #         sparse_mode=0,
+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++    #         inner_precise=1
+++++++    #     )
+++++++
+++++++    #     # 恢复原始数据类型
+++++++    #     attn_output = attn_output.to(input_dtype)
+++++++
+++++++    #     # 7. 调整输出形状
+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++    #     attn_output = self.o_proj(attn_output)
+++++++
+++++++    #     attn_weights = None
+++++++    #     if output_attentions:
+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++    #     return attn_output, attn_weights, past_key_value
+++++++
+++++++    # def forward(
+++++++    #     self,
+++++++    #     hidden_states: mindspore.Tensor,
+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++    #     past_key_value: Optional[Cache] = None,
+++++++    #     output_attentions: bool = False,
+++++++    #     use_cache: bool = False,
+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++
+++++++    #     query_states = self.q_proj(hidden_states)
+++++++    #     key_states = self.k_proj(hidden_states)
+++++++    #     value_states = self.v_proj(hidden_states)
+++++++
+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++    #     kv_seq_len = key_states.shape[-2]
+++++++    #     if past_key_value is not None:
+++++++    #         if self.layer_idx is None:
+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++    #     if past_key_value is not None:
+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++    #         key_states, value_states = past_key_value.update(
+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++    #         )
+++++++
+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++
+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+++++++    #     # <--- 修改结束 ---
+++++++
+++++++    #     fa_attention_mask = None
+++++++    #     if attention_mask is not None:
+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++
+++++++    #     input_dtype = query_states.dtype
+++++++
+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+++++++    #         key=key_states,
+++++++    #         value=value_states,
+++++++    #         head_num=self.num_heads,
+++++++    #         attn_mask=fa_attention_mask,
+++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++++    #         input_layout="BNSD",
+++++++    #         sparse_mode=0,
+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+++++++    #     )
+++++++
+++++++    #     attn_output = attn_output.to(input_dtype)
+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++    #     attn_output = self.o_proj(attn_output)
+++++++
+++++++    #     attn_weights = None
+++++++    #     if output_attentions:
+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++++
+++++++    #     return attn_output, attn_weights, past_key_value
+++++++
++++++ QWEN2MOE_ATTENTION_CLASSES = {
++++++     "eager": Qwen2MoeAttention,
+++++++    "flash-attention": Qwen2MoeFlashAttention,
++++++ }
++++++ 
++++++ 
++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++ 
+++++++    #@dwj
+++++++    # 只遍历激活的专家，而非全部专家
++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
++++++-        # router_logits: (batch * sequence_length, n_experts)
++++++-        router_logits = self.gate(hidden_states)
++++++-
++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-        if self.norm_topk_prob:
++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        # we cast back to the input dtype
++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
++++++-
++++++-        final_hidden_states = ops.zeros(
++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++++-        )
++++++-
++++++-        # One hot encode the selected experts to create an expert mask
++++++-        # this will be used to easily index which expert is going to be sollicitated
++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++++-
++++++-        # Loop over all available experts in the model and perform the computation on each expert
++++++-        for expert_idx in range(self.num_experts):
++++++-            expert_layer = self.experts[expert_idx]
++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++++-
++++++-            # Index the correct hidden states and compute the expert hidden state for
++++++-            # the current expert. We need to make sure to multiply the output hidden
++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++++-            if 0 not in idx.shape:
++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++++-
++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
++++++-                # the `top_x` tensor here.
++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++++-
++++++-        shared_expert_output = self.shared_expert(hidden_states)
++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++++-
++++++-        final_hidden_states = final_hidden_states + shared_expert_output
+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++            num_tokens = hidden_states_reshaped.shape[0]
+++++++
+++++++            router_logits = self.gate(hidden_states_reshaped)
+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++
+++++++            if self.norm_topk_prob:
+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+++++++            
+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++            flat_selected_experts = selected_experts.flatten()
+++++++            
+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++            token_indices = broadcasted_token_indices.flatten()
+++++++            
+++++++            active_experts = ops.unique(flat_selected_experts)
+++++++            
+++++++            for expert_idx_tensor in active_experts:
+++++++                expert_idx = expert_idx_tensor.item()
+++++++                expert_layer = self.experts[expert_idx]
+++++++                
+++++++                mask = (flat_selected_experts == expert_idx_tensor)
+++++++                selected_token_indices = token_indices[mask]
+++++++                selected_routing_weights = routing_weights.flatten()[mask]
+++++++                
+++++++                current_states = hidden_states_reshaped[selected_token_indices]
+++++++                
+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++                
+++++++                final_hidden_states = final_hidden_states.index_add(
+++++++                    dim=0,
+++++++                    index=selected_token_indices,
+++++++                    source=expert_output.to(hidden_states.dtype)
+++++++                )
+++++++            
+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++ 
++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++-        return final_hidden_states, router_logits
+++++++            final_hidden_states = final_hidden_states + shared_expert_output
+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++            
+++++++            return final_hidden_states, router_logits
++++++ 
++++++ 
++++++ class Qwen2MoeDecoderLayer(nn.Module):
++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++++ 
++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++ 
+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++
++++++         if (layer_idx not in config.mlp_only_layers) and (
++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++++         ):
++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++++     _skip_keys_device_placement = "past_key_values"
++++++     _supports_cache_class = True
+++++++#lwx
+++++++    # _supports_static_cache = True
++++++ 
++++++     def _init_weights(self, module):
++++++         std = self.config.initializer_range
++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++++         return causal_mask
++++++ 
++++++ 
++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++     _tied_weights_keys = ["lm_head.weight"]
++++++ 
++++++     def __init__(self, config):
++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++         self.num_experts_per_tok = config.num_experts_per_tok
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
+++++++        # @lwx
+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++++        #     self.generation_config.cache_implementation = "static"
+++++++        self._warmed_up = False
+++++++
+++++++    def warmup_moe_model(self):
+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++++        test_texts = [
+++++++            "warmup short",
+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++++        ]
+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++        if tokenizer is None:
+++++++            from mindnlp.transformers import AutoTokenizer
+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++            self._warmup_tokenizer = tokenizer
+++++++
+++++++        for text in test_texts:
+++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++            with mindspore._no_grad():
+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++++ 
++++++     def get_input_embeddings(self):
++++++         return self.model.embed_tokens
++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++         ```"""
+++++++        if not self._warmed_up:
+++++++            self._warmed_up = True
+++++++            self.warmup_moe_model()
++++++ 
++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++++         output_router_logits = (
++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++             }
++++++         )
++++++         return model_inputs
+++++++# @lwx
+++++++    # def _decode_one_tokens_logits(
+++++++    #     self,
+++++++    #     cur_token: mindspore.Tensor,
+++++++    #     input_pos: Optional[mindspore.Tensor],
+++++++    #     cache_position: mindspore.Tensor,
+++++++    #     past_key_values: StaticCache,
+++++++    # ) -> mindspore.Tensor:
+++++++    #     """
+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++++        
+++++++    #     Args:
+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++++    #         input_pos: 输入位置信息，可选
+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++++            
+++++++    #     Returns:
+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++++    #     """
+++++++    #     # 调用JIT编译的版本
+++++++    #     return self.get_decode_one_tokens_logits(
+++++++    #         cur_token=cur_token,
+++++++    #         input_pos=input_pos,
+++++++    #         cache_position=cache_position,
+++++++    #         past_key_values=past_key_values,
+++++++    #     )
+++++++    
+++++++    # @mindspore.jit(jit_level='O1')
+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++++    #     """
+++++++    #     JIT编译的函数，用于高效的单token解码
+++++++    #     使用JIT编译优化以支持静态shape和高效执行
+++++++        
+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++++    #     """
+++++++    #     outputs = self.model.forward(
+++++++    #         input_ids=cur_token,
+++++++    #         position_ids=input_pos,
+++++++    #         cache_position=cache_position,
+++++++    #         past_key_values=past_key_values,
+++++++    #         use_cache=True,
+++++++    #         return_dict=False,
+++++++    #     )
+++++++        
+++++++    #     hidden_states = outputs[0]
+++++++    #     logits = self.lm_head.forward(hidden_states)
+++++++    #     logits = logits.float()
+++++++        
+++++++    #     return logits[:, -1, :]
+++++++
+++++++    # def _sample(
+++++++    #     self,
+++++++    #     input_ids: mindspore.Tensor,
+++++++    #     logits_processor,
+++++++    #     stopping_criteria,
+++++++    #     generation_config,
+++++++    #     synced_devices: bool,
+++++++    #     streamer=None,
+++++++    #     logits_warper=None,
+++++++    #     **model_kwargs,
+++++++    # ):
+++++++    #     """
+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++++    #     """
+++++++    #     from ...generation.logits_process import LogitsProcessorList
+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++++    #     from mindnlp.core import nn, ops, no_grad
+++++++    #     import numpy as np
+++++++        
+++++++    #     # 检查是否使用 StaticCache
+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++++    #     # 否则，直接调用父类方法
+++++++    #     past_key_values = model_kwargs.get("past_key_values")
+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++++
+++++++    #     if not isinstance(past_key_values, StaticCache):
+++++++    #         # 不使用 StaticCache，直接调用父类方法
+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++++    #         return super()._sample(
+++++++    #             input_ids=input_ids,
+++++++    #             logits_processor=logits_processor,
+++++++    #             stopping_criteria=stopping_criteria,
+++++++    #             generation_config=generation_config,
+++++++    #             synced_devices=synced_devices,
+++++++    #             streamer=streamer,
+++++++    #             logits_warper=logits_warper,
+++++++    #             **model_kwargs,
+++++++    #         )
+++++++        
+++++++    #     # 使用 StaticCache，进入自定义循环
+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++++    #     pad_token_id = generation_config._pad_token_tensor
+++++++    #     output_attentions = generation_config.output_attentions
+++++++    #     output_hidden_states = generation_config.output_hidden_states
+++++++    #     output_scores = generation_config.output_scores
+++++++    #     output_logits = generation_config.output_logits
+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++++    #     max_length = generation_config.max_length
+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++++    #     do_sample = generation_config.do_sample
+++++++        
+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++++    #         raise ValueError(
+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++++    #             f"{logits_warper})."
+++++++    #         )
+++++++        
+++++++    #     # init attention / hidden states / scores tuples
+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++++        
+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++++    #         encoder_hidden_states = (
+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++++    #         )
+++++++        
+++++++    #     # keep track of which sequences are already finished
+++++++    #     batch_size, cur_len = input_ids.shape
+++++++    #     this_peer_finished = False
+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++++        
+++++++    #     time_record = []
+++++++    #     from ....utils.testing_utils import parse_flag_from_env
+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++++        
+++++++    #     while self._has_unfinished_sequences(
+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++++    #     ):
+++++++    #         if _record_time:
+++++++    #             import time as time_module
+++++++    #             infer_start = time_module.time()
+++++++            
+++++++    #         # prepare model inputs
+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++++            
+++++++    #         # prepare variable output controls
+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++++            
+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++++    #         cur_cache_position = model_inputs.get("cache_position")
+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++++    #         cur_input_ids = model_inputs.get("input_ids")
+++++++            
+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++++    #             cur_cache_position is not None and 
+++++++    #             len(cur_cache_position.shape) > 0 and
+++++++    #             cur_cache_position.shape[0] == 1 and
+++++++    #             cur_input_ids is not None and
+++++++    #             cur_input_ids.shape[1] == 1):
+++++++    #             # 使用 JIT 优化的单 token 解码
+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++++    #             if not hasattr(self, '_jit_used'):
+++++++    #                 self._jit_used = False
+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++++                
+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++++    #                 cur_token=cur_input_ids,
+++++++    #                 input_pos=model_inputs.get("position_ids"),
+++++++    #                 cache_position=cur_cache_position,
+++++++    #                 past_key_values=cur_past_key_values,
+++++++    #             )
+++++++                
+++++++    #             # 标记已使用JIT（用于后续判断）
+++++++    #             if not self._jit_used:
+++++++    #                 self._jit_used = True
+++++++                
+++++++    #             # 构造兼容的输出对象
+++++++    #             class JitOptimizedOutput:
+++++++    #                 def __init__(self, logits, config):
+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++++    #                     self.config = config
+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++++    #                     self.cross_attentions = None
+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++++                
+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++++    #         else:
+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++++    #             outputs = self(**model_inputs, return_dict=True)
+++++++            
+++++++    #         if synced_devices and this_peer_finished:
+++++++    #             continue
+++++++            
+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++++    #         next_token_logits = outputs.logits[:, -1, :]
+++++++            
+++++++    #         # pre-process distribution
+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++++    #         if do_sample:
+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++++            
+++++++    #         # Store scores, attentions and hidden_states when required
+++++++    #         if return_dict_in_generate:
+++++++    #             if output_scores:
+++++++    #                 scores += (next_token_scores,)
+++++++    #             if output_logits:
+++++++    #                 raw_logits += (next_token_logits,)
+++++++    #             if output_attentions:
+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++++    #                 if self.config.is_encoder_decoder:
+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++++                
+++++++    #             if output_hidden_states:
+++++++    #                 hidden = (
+++++++    #                     outputs.decoder_hidden_states
+++++++    #                     if self.config.is_encoder_decoder
+++++++    #                     else outputs.hidden_states
+++++++    #                 )
+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++++            
+++++++    #         # token selection
+++++++    #         if do_sample:
+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++++    #         else:
+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++++            
+++++++    #         # finished sentences should have their next token be a padding token
+++++++    #         if has_eos_stopping_criteria:
+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++++            
+++++++    #         # update generated ids, model inputs, and length for next step
+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++++    #         if streamer is not None:
+++++++    #             streamer.put(next_tokens)
+++++++            
+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++++    #             outputs,
+++++++    #             model_kwargs,
+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++++    #         )
+++++++            
+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++++    #         cur_len += 1
+++++++            
+++++++    #         if _record_time:
+++++++    #             import time as time_module
+++++++    #             infer_stop = time_module.time()
+++++++    #             time_record.append(infer_stop - infer_start)
+++++++            
+++++++    #         del outputs
+++++++        
+++++++    #     average_infer_time = None
+++++++    #     if time_record:
+++++++    #         if len(time_record) > 1:
+++++++    #             time_record.pop(0)
+++++++    #         average_infer_time = sum(time_record) / len(time_record)
+++++++    #         print(f'average inference time is: {average_infer_time}')
+++++++    #         print(f'inference time record: {time_record}')
+++++++        
+++++++    #     if streamer is not None:
+++++++    #         streamer.end()
+++++++        
+++++++    #     # 简单判断：打印是否使用了JIT路径
+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++++    #     else:
+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++++        
+++++++    #     if return_dict_in_generate:
+++++++    #         if self.config.is_encoder_decoder:
+++++++    #             return GenerateEncoderDecoderOutput(
+++++++    #                 sequences=input_ids,
+++++++    #                 scores=scores,
+++++++    #                 logits=raw_logits,
+++++++    #                 encoder_attentions=encoder_attentions,
+++++++    #                 encoder_hidden_states=encoder_hidden_states,
+++++++    #                 decoder_attentions=decoder_attentions,
+++++++    #                 cross_attentions=cross_attentions,
+++++++    #                 decoder_hidden_states=decoder_hidden_states,
+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++    #                 average_infer_time=average_infer_time
+++++++    #             )
+++++++    #         else:
+++++++    #             return GenerateDecoderOnlyOutput(
+++++++    #                 sequences=input_ids,
+++++++    #                 scores=scores,
+++++++    #                 logits=raw_logits,
+++++++    #                 attentions=decoder_attentions,
+++++++    #                 hidden_states=decoder_hidden_states,
+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++    #                 average_infer_time=average_infer_time
+++++++    #             )
+++++++    #     else:
+++++++    #         return input_ids
+++++++            
+++++++    # def _prepare_cache_for_generation(
+++++++    #     self,
+++++++    #     generation_config,
+++++++    #     model_kwargs,
+++++++    #     assistant_model,
+++++++    #     batch_size,
+++++++    #     max_cache_length,
+++++++    # ):
+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++++    #         generation_config.cache_implementation = "static"
+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++++        
+++++++    #     if generation_config.cache_implementation == "static":
+++++++    #         base_required_from_max_length = generation_config.max_length + 1
+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++++    #         min_cache_size = 50
+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++++    #         else:
+++++++    #             max_cache_length = max(base_required, min_cache_size)
+++++++            
+++++++    #         original_max_cache_length = max_cache_length
+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++++            
+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++    #             if max_cache_length > self.config.max_position_embeddings:
+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++++        
+++++++    #     result = super()._prepare_cache_for_generation(
+++++++    #         generation_config=generation_config,
+++++++    #         model_kwargs=model_kwargs,
+++++++    #         assistant_model=assistant_model,
+++++++    #         batch_size=batch_size,
+++++++    #         max_cache_length=max_cache_length,
+++++++    #     )
+++++++        
+++++++    #     if generation_config.cache_implementation == "static":
+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++++    #         created_cache = model_kwargs.get(cache_name)
+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++++        
+++++++    #     return result
+++++++
+++++++
+++++++
++++++ 
++++++ 
++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++++-- 
++++++2.27.0
++++++
+++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++++new file mode 100644
+++++index 00000000..22b65dd5
+++++--- /dev/null
++++++++ b/patches/0002-20251106commit.patch
+++++@@ -0,0 +1,3200 @@
++++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++Date: Thu, 6 Nov 2025 09:20:38 +0800
++++++Subject: [PATCH 2/3] 20251106commit
++++++
++++++---
++++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
++++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
++++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
++++++ create mode 100644 patches/0001-20251104commit.patch
++++++
++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++index d8303e45..73773c22 100644
++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
++++++         #     y = y + self.shared_experts(identity)
++++++         # return y
++++++         
+++++++    # @no_grad()
+++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++
+++++++    #     expert_cache = ops.zeros_like(x)
+++++++    #     for i in range(self.num_experts_per_tok):
+++++++    #         expert_id = flat_expert_indices[i].item()
+++++++    #         weight = flat_expert_weights[i].item()
+++++++    #         expert = self.experts[expert_id]
+++++++    #         expert_out = expert(x)
+++++++    #         expert_cache += expert_out * weight
+++++++    #     return expert_cache
+++++++
++++++     @no_grad()
++++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++        # x 的 shape: (1, hidden_size)
+++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++++++
+++++++        # 1. 收集所有需要的专家层
+++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++++++
+++++++        # 2. 并行计算所有专家的输出
+++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++++++
+++++++        # 3. 使用矩阵乘法进行加权求和
+++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++++++        
+++++++        return final_output
++++++ 
++++++-        expert_cache = ops.zeros_like(x)
++++++-        for i in range(self.num_experts_per_tok):
++++++-            expert_id = flat_expert_indices[i].item()
++++++-            weight = flat_expert_weights[i].item()
++++++-            expert = self.experts[expert_id]
++++++-            expert_out = expert(x)
++++++-            expert_cache += expert_out * weight
++++++-        return expert_cache
++++++ 
++++++     @no_grad()
++++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
++++++             key_states = self.k_proj(hidden_states)
++++++             value_states = self.v_proj(hidden_states)
++++++ 
++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++        # @lwx
+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++++++ 
++++++         kv_seq_len = key_states.shape[-2]
++++++         if past_key_value is not None:
++++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++ 
+++++++# class DeepseekFlashAttention(nn.Module):
+++++++#     """
+++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+++++++
+++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
+++++++#     """
+++++++
+++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++++++#         super().__init__()
+++++++#         self.config = config
+++++++#         self.layer_idx = layer_idx
+++++++#         if layer_idx is None:
+++++++#             logger.warning(
+++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++++#                 "when creating this class."
+++++++#             )
+++++++
+++++++#         self.attention_dropout = config.attention_dropout
+++++++#         self.hidden_size = config.hidden_size
+++++++#         self.num_heads = config.num_attention_heads
+++++++#         self.head_dim = self.hidden_size // self.num_heads
+++++++#         self.num_key_value_heads = config.num_key_value_heads
+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++#         self.max_position_embeddings = config.max_position_embeddings
+++++++#         self.rope_theta = config.rope_theta
+++++++#         self.is_causal = True
+++++++
+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++#             raise ValueError(
+++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++++#                 f" and `num_heads`: {self.num_heads})."
+++++++#             )
+++++++
+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++++++#         self._init_rope()
+++++++
+++++++#     def _init_rope(self):
+++++++#         if self.config.rope_scaling is None:
+++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
+++++++#                 self.head_dim,
+++++++#                 max_position_embeddings=self.max_position_embeddings,
+++++++#                 base=self.rope_theta,
+++++++#             )
+++++++#         else:
+++++++#             scaling_type = self.config.rope_scaling["type"]
+++++++#             scaling_factor = self.config.rope_scaling["factor"]
+++++++#             if scaling_type == "linear":
+++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++++++#                     self.head_dim,
+++++++#                     max_position_embeddings=self.max_position_embeddings,
+++++++#                     scaling_factor=scaling_factor,
+++++++#                     base=self.rope_theta,
+++++++#                 )
+++++++#             elif scaling_type == "dynamic":
+++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++++++#                     self.head_dim,
+++++++#                     max_position_embeddings=self.max_position_embeddings,
+++++++#                     scaling_factor=scaling_factor,
+++++++#                     base=self.rope_theta,
+++++++#                 )
+++++++#             else:
+++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++++++
+++++++#     def forward(
+++++++#         self,
+++++++#         hidden_states: mindspore.Tensor,
+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++++#         past_key_value: Optional[Cache] = None,
+++++++#         output_attentions: bool = False,
+++++++#         use_cache: bool = False,
+++++++#         **kwargs,
+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++#         if "padding_mask" in kwargs:
+++++++#             warnings.warn(
+++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++++++#             )
+++++++        
+++++++#         if output_attentions:
+++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+++++++
+++++++#         bsz, q_len, _ = hidden_states.shape
+++++++
+++++++#         if self.config.pretraining_tp > 1:
+++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++++++
+++++++#         query_states = self.q_proj(hidden_states)
+++++++#         key_states = self.k_proj(hidden_states)
+++++++#         value_states = self.v_proj(hidden_states)
+++++++
+++++++#         # Reshape for multi-head attention
+++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++#         kv_seq_len = key_states.shape[-2]
+++++++#         if past_key_value is not None:
+++++++#             if self.layer_idx is None:
+++++++#                 raise ValueError(
+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++#                     "with a layer index."
+++++++#                 )
+++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++        
+++++++#         # Apply Rotary Positional Embedding
+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++#         if past_key_value is not None:
+++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++
+++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++        
+++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++++        
+++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+++++++
+++++++#         # Convert attention_mask for flash_attention_score
+++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+++++++#         if attention_mask is not None:
+++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++++++#                 raise ValueError(
+++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++++++#                 )
+++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+++++++#         else:
+++++++#             attn_mask_for_fa = None
+++++++        
+++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++++++
+++++++#         # Call the fused flash_attention_score operator
+++++++#         attn_output = mindspore.ops.flash_attention_score(
+++++++#             query=query_states_for_fa,
+++++++#             key=key_states_for_fa,
+++++++#             value=value_states_for_fa,
+++++++#             head_num=self.num_heads, # This is N1, the number of query heads
+++++++#             input_layout='BSH',
+++++++#             attn_mask=attn_mask_for_fa,
+++++++#             keep_prob=keep_prob,
+++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++#             sparse_mode=0 # Default mask mode
+++++++#         )
+++++++        
+++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+++++++#         attn_output = self.o_proj(attn_output)
+++++++        
+++++++#         # Flash Attention does not return attention weights
+++++++#         attn_weights = None
+++++++
+++++++#         return attn_output, attn_weights, past_key_value
+++++++
+++++++class DeepseekFlashAttention(nn.Module):
+++++++    """
+++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+++++++    designed for high performance on supported hardware (Ascend).
+++++++
+++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+++++++    """
+++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+++++++        super().__init__()
+++++++        self.config = config
+++++++        self.layer_idx = layer_idx
+++++++        if layer_idx is None:
+++++++            logger.warning(
+++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++++                "when creating this class."
+++++++            )
+++++++
+++++++        # --- [FIX] Correctly initialize all required attributes ---
+++++++        self.attention_dropout = config.attention_dropout
+++++++        self.hidden_size = config.hidden_size
+++++++        self.num_heads = config.num_attention_heads
+++++++        self.head_dim = self.hidden_size // self.num_heads
+++++++        self.num_key_value_heads = config.num_key_value_heads
+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++        self.max_position_embeddings = config.max_position_embeddings
+++++++        self.rope_theta = config.rope_theta
+++++++        self.is_causal = True
+++++++
+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++            raise ValueError(
+++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++++                f" and `num_heads`: {self.num_heads})."
+++++++            )
+++++++
+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+++++++        
+++++++        # This call will now succeed as all attributes are initialized.
+++++++        self._init_rope()
+++++++
+++++++    def _init_rope(self):
+++++++        if self.config.rope_scaling is None:
+++++++            self.rotary_emb = DeepseekRotaryEmbedding(
+++++++                self.head_dim,
+++++++                max_position_embeddings=self.max_position_embeddings,
+++++++                base=self.rope_theta,
+++++++            )
+++++++        else:
+++++++            scaling_type = self.config.rope_scaling["type"]
+++++++            scaling_factor = self.config.rope_scaling["factor"]
+++++++            if scaling_type == "linear":
+++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+++++++                    self.head_dim,
+++++++                    max_position_embeddings=self.max_position_embeddings,
+++++++                    scaling_factor=scaling_factor,
+++++++                    base=self.rope_theta,
+++++++                )
+++++++            elif scaling_type == "dynamic":
+++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+++++++                    self.head_dim,
+++++++                    max_position_embeddings=self.max_position_embeddings,
+++++++                    scaling_factor=scaling_factor,
+++++++                    base=self.rope_theta,
+++++++                )
+++++++            else:
+++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+++++++
+++++++    def forward(
+++++++        self,
+++++++        hidden_states: mindspore.Tensor,
+++++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++++        position_ids: Optional[mindspore.Tensor] = None,
+++++++        past_key_value: Optional[Cache] = None,
+++++++        output_attentions: bool = False,
+++++++        use_cache: bool = False,
+++++++        **kwargs,
+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++        if "padding_mask" in kwargs:
+++++++            warnings.warn(
+++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+++++++            )
+++++++        if output_attentions:
+++++++            warnings.warn(
+++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+++++++            )
+++++++
+++++++        bsz, q_len, _ = hidden_states.shape
+++++++
+++++++        if self.config.pretraining_tp > 1:
+++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+++++++
+++++++        query_states = self.q_proj(hidden_states)
+++++++        key_states = self.k_proj(hidden_states)
+++++++        value_states = self.v_proj(hidden_states)
+++++++
+++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++        kv_seq_len = key_states.shape[-2]
+++++++        if past_key_value is not None:
+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++        
+++++++        # Apply Rotary Position Embedding
+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++        if past_key_value is not None:
+++++++            cache_kwargs = {"sin": sin, "cos": cos}
+++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++
+++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+++++++        # So we must explicitly repeat the KV heads.
+++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++
+++++++        # Convert attention mask for flash_attention_score
+++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+++++++        if attention_mask is not None:
+++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+++++++                 raise ValueError(
+++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+++++++                )
+++++++            attn_mask_for_fa = attention_mask < 0
+++++++        else:
+++++++            attn_mask_for_fa = None
+++++++
+++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+++++++
+++++++        # Call the fused operator using the efficient BNSD layout
+++++++        attn_output = mindspore.ops.flash_attention_score(
+++++++            query=query_states,
+++++++            key=key_states,
+++++++            value=value_states,
+++++++            head_num=self.num_heads,
+++++++            input_layout='BNSD', # Specify the correct layout
+++++++            attn_mask=attn_mask_for_fa,
+++++++            keep_prob=keep_prob,
+++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
+++++++        )
+++++++        
+++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++        
+++++++        # Apply output projection
+++++++        attn_output = self.o_proj(attn_output)
+++++++
+++++++        # Flash attention does not return attention weights, so we return None.
+++++++        attn_weights = None
+++++++
+++++++        return attn_output, attn_weights, past_key_value
+++++++
++++++ Deepseek_ATTENTION_CLASSES = {
++++++     "eager": DeepseekAttention,
+++++++    "flash-attention": DeepseekFlashAttention,
++++++ }
++++++ 
++++++ 
++++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
++++++             config=config, layer_idx=layer_idx
++++++         )
++++++ 
+++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+++++++            config=config, layer_idx=layer_idx
+++++++        )
+++++++
++++++         self.mlp = (
++++++             DeepseekMoE(config)
++++++             if (
++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++index d4c6b651..bced285c 100644
++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
++++++ 
++++++ import mindspore
++++++ import mindnlp.core.nn.functional as F
++++++-from mindnlp.core import nn, ops
+++++++from mindnlp.core import nn, ops, no_grad
++++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
++++++ 
++++++ from ....common.activations import ACT2FN
++++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++++++ 
+++++++Long_Prompt = False
+++++++PROMPT_LENGTH_THRESHOLD = 128
++++++ 
++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
++++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++ 
+++++++# class Qwen2MoeFlashAttention(nn.Module):
+++++++#     """
+++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++
+++++++#     关键改动:
+++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++#        直接传入原始的 key 和 value 张量效率更高。
+++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++#         super().__init__()
+++++++#         self.config = config
+++++++#         self.layer_idx = layer_idx
+++++++#         self.hidden_size = config.hidden_size
+++++++#         self.num_heads = config.num_attention_heads
+++++++#         self.head_dim = self.hidden_size // self.num_heads
+++++++#         self.num_key_value_heads = config.num_key_value_heads
+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++#         self.max_position_embeddings = config.max_position_embeddings
+++++++#         self.rope_theta = config.rope_theta
+++++++#         self.attention_dropout = config.attention_dropout
+++++++
+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++#             raise ValueError(
+++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++#             )
+++++++
+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++
+++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++#             self.head_dim,
+++++++#             max_position_embeddings=self.max_position_embeddings,
+++++++#             base=self.rope_theta,
+++++++#         )
+++++++
+++++++#     def forward(
+++++++#         self,
+++++++#         hidden_states: mindspore.Tensor,
+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++++#         past_key_value: Optional[Cache] = None,
+++++++#         output_attentions: bool = False,
+++++++#         use_cache: bool = False,
+++++++#         cache_position: Optional[mindspore.Tensor] = None,
+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++#         bsz, q_len, _ = hidden_states.shape
+++++++
+++++++#         # 1. 线性投射 Q, K, V
+++++++#         query_states = self.q_proj(hidden_states)
+++++++#         key_states = self.k_proj(hidden_states)
+++++++#         value_states = self.v_proj(hidden_states)
+++++++
+++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++#         # 3. RoPE 旋转位置编码
+++++++#         kv_seq_len = key_states.shape[-2]
+++++++#         if past_key_value is not None:
+++++++#             if self.layer_idx is None:
+++++++#                 raise ValueError(
+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++#                     "with a layer index."
+++++++#                 )
+++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++#                 if cache_position.shape[0] == 1:
+++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++#                     kv_seq_len = past_seen_tokens + 1
+++++++#                 else:
+++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
+++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++#             else:
+++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++#         # 4. KV 缓存更新
+++++++#         if past_key_value is not None:
+++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++#             key_states, value_states = past_key_value.update(
+++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
+++++++#             )
+++++++            
+++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++#                 if cache_position.shape[0] == 1:
+++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++#                     kv_seq_len = key_states.shape[-2]
+++++++
+++++++#         # 5. [重要] 准备 Attention Mask
+++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++#         fa_attention_mask = None
+++++++#         if attention_mask is not None:
+++++++#             # 截取与当前key长度匹配的部分
+++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++#             fa_attention_mask = (mask_slice != 0)
+++++++
+++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++#         input_dtype = query_states.dtype
+++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++#             query_states = query_states.to(mindspore.float16)
+++++++#             key_states = key_states.to(mindspore.float16)
+++++++#             value_states = value_states.to(mindspore.float16)
+++++++
+++++++#         # 6. [核心] 调用 flash_attention_score 算子
+++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++#         attn_output = mindspore.ops.flash_attention_score(
+++++++#             query=query_states,
+++++++#             key=key_states,
+++++++#             value=value_states,
+++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
+++++++#             attn_mask=fa_attention_mask,
+++++++#             keep_prob=1.0 - self.attention_dropout,
+++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++#             input_layout="BNSD",
+++++++#             sparse_mode=0 # 使用 defaultMask 模式
+++++++#         )
+++++++
+++++++#         # 恢复原始数据类型
+++++++#         attn_output = attn_output.to(input_dtype)
+++++++
+++++++#         # 7. 调整输出形状
+++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++#         attn_output = self.o_proj(attn_output)
+++++++
+++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
+++++++#         attn_weights = None
+++++++#         if output_attentions:
+++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++#         return attn_output, attn_weights, past_key_value
+++++++
+++++++#     # def forward(
+++++++#     #     self,
+++++++#     #     hidden_states: mindspore.Tensor,
+++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
+++++++#     #     past_key_value: Optional[Cache] = None,
+++++++#     #     output_attentions: bool = False,
+++++++#     #     use_cache: bool = False,
+++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
+++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++#     #     bsz, q_len, _ = hidden_states.shape
+++++++
+++++++#     #     # 1. 线性投射 Q, K, V
+++++++#     #     query_states = self.q_proj(hidden_states)
+++++++#     #     key_states = self.k_proj(hidden_states)
+++++++#     #     value_states = self.v_proj(hidden_states)
+++++++
+++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++
+++++++#     #     # 3. RoPE 旋转位置编码
+++++++#     #     kv_seq_len = key_states.shape[-2]
+++++++#     #     if past_key_value is not None:
+++++++#     #         if self.layer_idx is None:
+++++++#     #             raise ValueError(
+++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++#     #                 "with a layer index."
+++++++#     #             )
+++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++
+++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++#     #     # 4. KV 缓存更新
+++++++#     #     if past_key_value is not None:
+++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++#     #         key_states, value_states = past_key_value.update(
+++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++#     #         )
+++++++
+++++++#     #     # 5. 准备 Attention Mask
+++++++#     #     fa_attention_mask = None
+++++++#     #     if attention_mask is not None:
+++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++#     #         fa_attention_mask = (mask_slice != 0)
+++++++
+++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++#     #     input_dtype = query_states.dtype
+++++++
+++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
+++++++#     #     attn_output = mindspore.ops.flash_attention_score(
+++++++#     #         query=query_states,
+++++++#     #         key=key_states,
+++++++#     #         value=value_states,
+++++++#     #         head_num=self.num_heads,
+++++++#     #         attn_mask=fa_attention_mask,
+++++++#     #         keep_prob=1.0 - self.attention_dropout,
+++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++#     #         input_layout="BNSD",
+++++++#     #         sparse_mode=0,
+++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++#     #         inner_precise=1
+++++++#     #     )
+++++++
+++++++#     #     # 恢复原始数据类型
+++++++#     #     attn_output = attn_output.to(input_dtype)
+++++++
+++++++#     #     # 7. 调整输出形状
+++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++#     #     attn_output = self.o_proj(attn_output)
+++++++
+++++++#     #     attn_weights = None
+++++++#     #     if output_attentions:
+++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++
+++++++#     #     return attn_output, attn_weights, past_key_value
+++++++
+++++++
++++++ class Qwen2MoeFlashAttention(nn.Module):
++++++     """
++++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++-
++++++-    关键改动:
++++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++-       直接传入原始的 key 和 value 张量效率更高。
++++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+++++++
+++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
+++++++    以达到理论上的最高执行速度。
++++++     """
++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++         super().__init__()
++++++         self.config = config
++++++         self.layer_idx = layer_idx
+++++++        if layer_idx is None:
+++++++            logger.warning_once(
+++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+++++++            )
+++++++
++++++         self.hidden_size = config.hidden_size
++++++         self.num_heads = config.num_attention_heads
++++++         self.head_dim = self.hidden_size // self.num_heads
++++++         self.num_key_value_heads = config.num_key_value_heads
++++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++         self.max_position_embeddings = config.max_position_embeddings
++++++         self.rope_theta = config.rope_theta
++++++         self.attention_dropout = config.attention_dropout
++++++ 
++++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++-            raise ValueError(
++++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++-            )
++++++-
++++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
++++++         key_states = self.k_proj(hidden_states)
++++++         value_states = self.v_proj(hidden_states)
++++++ 
++++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++        # 2. 调整形状以匹配 BNSD 布局
++++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-
++++++-        # 3. RoPE 旋转位置编码
+++++++        
+++++++        # 3. RoPE 和 KV 缓存
++++++         kv_seq_len = key_states.shape[-2]
++++++         if past_key_value is not None:
++++++-            if self.layer_idx is None:
++++++-                raise ValueError(
++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-                    "with a layer index."
++++++-                )
++++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++-                if cache_position.shape[0] == 1:
++++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++-                    kv_seq_len = past_seen_tokens + 1
++++++-                else:
++++++-                    # prefill 阶段：cache_position 是范围，使用其长度
++++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++-            else:
++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-
+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++        
++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++ 
++++++-        # 4. KV 缓存更新
++++++         if past_key_value is not None:
++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-            key_states, value_states = past_key_value.update(
++++++-                key_states, value_states, self.layer_idx, cache_kwargs
++++++-            )
++++++-            
++++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++-                if cache_position.shape[0] == 1:
++++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++-                    kv_seq_len = key_states.shape[-2]
++++++-
++++++-        # 5. [重要] 准备 Attention Mask
++++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++
+++++++        # 4. 准备 Attention Mask
++++++         fa_attention_mask = None
++++++         if attention_mask is not None:
++++++-            # 截取与当前key长度匹配的部分
++++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++             fa_attention_mask = (mask_slice != 0)
++++++ 
++++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++-        input_dtype = query_states.dtype
++++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++-            query_states = query_states.to(mindspore.float16)
++++++-            key_states = key_states.to(mindspore.float16)
++++++-            value_states = value_states.to(mindspore.float16)
++++++-
++++++-        # 6. [核心] 调用 flash_attention_score 算子
++++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
++++++         attn_output = mindspore.ops.flash_attention_score(
++++++             query=query_states,
++++++             key=key_states,
++++++             value=value_states,
++++++-            head_num=self.num_heads, # 传入Q的头数(N1)
+++++++            head_num=self.num_heads,
++++++             attn_mask=fa_attention_mask,
++++++-            keep_prob=1.0 - self.attention_dropout,
+++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
++++++             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++             input_layout="BNSD",
++++++-            sparse_mode=0 # 使用 defaultMask 模式
+++++++            sparse_mode=0,
+++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
++++++         )
++++++ 
++++++-        # 恢复原始数据类型
++++++-        attn_output = attn_output.to(input_dtype)
++++++-
++++++-        # 7. 调整输出形状
++++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++        # 6. 调整输出形状
++++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++         attn_output = self.o_proj(attn_output)
++++++ 
++++++-        # FlashAttention 算子不直接返回注意力权重矩阵
+++++++        # 7. 返回结果
++++++         attn_weights = None
++++++         if output_attentions:
++++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++-    # def forward(
++++++-    #     self,
++++++-    #     hidden_states: mindspore.Tensor,
++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
++++++-    #     past_key_value: Optional[Cache] = None,
++++++-    #     output_attentions: bool = False,
++++++-    #     use_cache: bool = False,
++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-
++++++-    #     bsz, q_len, _ = hidden_states.shape
++++++-
++++++-    #     # 1. 线性投射 Q, K, V
++++++-    #     query_states = self.q_proj(hidden_states)
++++++-    #     key_states = self.k_proj(hidden_states)
++++++-    #     value_states = self.v_proj(hidden_states)
++++++-
++++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-
++++++-    #     # 3. RoPE 旋转位置编码
++++++-    #     kv_seq_len = key_states.shape[-2]
++++++-    #     if past_key_value is not None:
++++++-    #         if self.layer_idx is None:
++++++-    #             raise ValueError(
++++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-    #                 "with a layer index."
++++++-    #             )
++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++ 
++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-
++++++-    #     # 4. KV 缓存更新
++++++-    #     if past_key_value is not None:
++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-    #         key_states, value_states = past_key_value.update(
++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++-    #         )
++++++-
++++++-    #     # 5. 准备 Attention Mask
++++++-    #     fa_attention_mask = None
++++++-    #     if attention_mask is not None:
++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-    #         fa_attention_mask = (mask_slice != 0)
++++++-
++++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++-    #     input_dtype = query_states.dtype
++++++-
++++++-    #     # 6. [核心] 调用 flash_attention_score 算子
++++++-    #     attn_output = mindspore.ops.flash_attention_score(
++++++-    #         query=query_states,
++++++-    #         key=key_states,
++++++-    #         value=value_states,
++++++-    #         head_num=self.num_heads,
++++++-    #         attn_mask=fa_attention_mask,
++++++-    #         keep_prob=1.0 - self.attention_dropout,
++++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-    #         input_layout="BNSD",
++++++-    #         sparse_mode=0,
++++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++-    #         inner_precise=1
++++++-    #     )
++++++-
++++++-    #     # 恢复原始数据类型
++++++-    #     attn_output = attn_output.to(input_dtype)
+++++++QWEN2MOE_ATTENTION_CLASSES = {
+++++++    "eager": Qwen2MoeAttention,
+++++++    "flash-attention": Qwen2MoeFlashAttention,
+++++++}
++++++ 
++++++-    #     # 7. 调整输出形状
++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-    #     attn_output = self.o_proj(attn_output)
++++++ 
++++++-    #     attn_weights = None
++++++-    #     if output_attentions:
++++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     def __init__(self, config):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         # gating
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     #@dwj
+++++++#     # 只遍历激活的专家，而非全部专家
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#             num_tokens = hidden_states_reshaped.shape[0]
+++++++
+++++++#             router_logits = self.gate(hidden_states_reshaped)
+++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++
+++++++#             if self.norm_topk_prob:
+++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
+++++++            
+++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++#             flat_selected_experts = selected_experts.flatten()
+++++++            
+++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++#             token_indices = broadcasted_token_indices.flatten()
+++++++            
+++++++#             active_experts = ops.unique(flat_selected_experts)
+++++++            
+++++++#             for expert_idx_tensor in active_experts:
+++++++#                 expert_idx = expert_idx_tensor.item()
+++++++#                 expert_layer = self.experts[expert_idx]
+++++++                
+++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
+++++++#                 selected_token_indices = token_indices[mask]
+++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
+++++++                
+++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
+++++++                
+++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++                
+++++++#                 final_hidden_states = final_hidden_states.index_add(
+++++++#                     dim=0,
+++++++#                     index=selected_token_indices,
+++++++#                     source=expert_output.to(hidden_states.dtype)
+++++++#                 )
+++++++            
+++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++ 
++++++-    #     return attn_output, attn_weights, past_key_value
+++++++#             final_hidden_states = final_hidden_states + shared_expert_output
+++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++            
+++++++#             return final_hidden_states, router_logits
+++++++
+++++++
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     """
+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         # 门控网络
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         # 专家列表
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++#         # 共享专家
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_decode(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         """
+++++++#         【解码路径】针对 sequence_length=1 的极致优化。
+++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++++++#         """
+++++++#         batch_size, hidden_dim = hidden_states.shape
+++++++        
+++++++#         expert_outputs_list = [
+++++++#             ops.cat([
+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++#             ], dim=0) 
+++++++#             for i in range(batch_size)
+++++++#         ]
+++++++        
+++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++++++#         # shape: (batch_size, top_k, hidden_dim)
+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++        
+++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++        
+++++++#         return moe_output.squeeze(1)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_prefill(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         """
+++++++#         【预填充路径】针对 sequence_length > 1 的优化。
+++++++#         按专家对 Token 进行分组，并进行批处理。
+++++++#         """
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens = hidden_states.shape[0]
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++        
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++        
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++        
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++            
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++#             selected_token_indices = token_indices[mask]
+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++            
+++++++#             current_states = hidden_states[selected_token_indices]
+++++++            
+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++            
+++++++#             moe_output = moe_output.index_add(
+++++++#                 dim=0,
+++++++#                 index=selected_token_indices,
+++++++#                 source=expert_output.to(hidden_states.dtype)
+++++++#             )
+++++++#         return moe_output
+++++++
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#         """
+++++++#         顶层 forward 方法，作为智能分发器。
+++++++#         """
+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++        
+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#         router_logits = self.gate(hidden_states_reshaped)
+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++ 
++++++-    # def forward(
++++++-    #     self,
++++++-    #     hidden_states: mindspore.Tensor,
++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
++++++-    #     past_key_value: Optional[Cache] = None,
++++++-    #     output_attentions: bool = False,
++++++-    #     use_cache: bool = False,
++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-
++++++-    #     bsz, q_len, _ = hidden_states.shape
++++++-
++++++-    #     query_states = self.q_proj(hidden_states)
++++++-    #     key_states = self.k_proj(hidden_states)
++++++-    #     value_states = self.v_proj(hidden_states)
++++++-
++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-
++++++-    #     kv_seq_len = key_states.shape[-2]
++++++-    #     if past_key_value is not None:
++++++-    #         if self.layer_idx is None:
++++++-    #             raise ValueError("`layer_idx` must be specified for caching")
++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-
++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-
++++++-    #     if past_key_value is not None:
++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-    #         key_states, value_states = past_key_value.update(
++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++-    #         )
+++++++#         if self.norm_topk_prob:
+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++        
+++++++#         moe_output = None
+++++++#         # 在推理时，根据序列长度选择最优路径
+++++++#         if not self.training:
+++++++#             if sequence_length == 1:
+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++++#             else:
+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++++#         else:
+++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++++++#             raise NotImplementedError("Training path is not implemented.")
+++++++
+++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++++++        
+++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++++++        
+++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++#         return final_hidden_states, router_logits
+++++++
+++++++
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     """
+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         # 门控网络
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         # 专家列表
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++#         # 共享专家
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_decode(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         batch_size, _ = hidden_states.shape
+++++++#         expert_outputs_list = [
+++++++#             ops.cat([
+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++#             ], dim=0) 
+++++++#             for i in range(batch_size)
+++++++#         ]
+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++#         return moe_output.squeeze(1)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_prefill(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens = hidden_states.shape[0]
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++        
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++#             selected_token_indices = token_indices[mask]
+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++#             current_states = hidden_states[selected_token_indices]
+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++#             moe_output = moe_output.index_add(
+++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++#             )
+++++++#         return moe_output
+++++++
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#         """
+++++++#         顶层 forward 方法，作为智能分发器。
+++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++++++#         """
+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++        
+++++++#         # 1. 门控计算 (通用逻辑)
+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#         router_logits = self.gate(hidden_states_reshaped)
+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++
+++++++#         if self.norm_topk_prob:
+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++        
+++++++#         # 2. 智能分发到最优 MoE 路径
+++++++#         moe_output = None
+++++++#         if not self.training:
+++++++#             if sequence_length == 1:
+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++++#             else:
+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++++#         else:
+++++++#             raise NotImplementedError("Training path is not implemented.")
+++++++
+++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++        
+++++++#         # 4. 合并 MoE 输出和共享专家输出
+++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++        
+++++++#         # 5. 恢复原始形状并返回
+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++#         return final_hidden_states, router_logits
+++++++
+++++++# prefill fastest
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     """
+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         # 门控网络
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         # 专家列表
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++#         # 共享专家
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_dispatch(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         """
+++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++++++#         """
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens, _ = hidden_states.shape
+++++++        
+++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++#         flat_routing_weights = routing_weights.flatten()
++++++ 
++++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++-
++++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++-    #     query_states = query_states / math.sqrt(self.head_dim)
++++++-    #     # <--- 修改结束 ---
++++++-
++++++-    #     fa_attention_mask = None
++++++-    #     if attention_mask is not None:
++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-    #         fa_attention_mask = (mask_slice != 0)
++++++-
++++++-    #     input_dtype = query_states.dtype
++++++-
++++++-    #     attn_output = mindspore.ops.flash_attention_score(
++++++-    #         query=query_states,    # 传入已经预先缩放过的 query
++++++-    #         key=key_states,
++++++-    #         value=value_states,
++++++-    #         head_num=self.num_heads,
++++++-    #         attn_mask=fa_attention_mask,
++++++-    #         keep_prob=1.0 - self.attention_dropout,
++++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++-    #         input_layout="BNSD",
++++++-    #         sparse_mode=0,
++++++-    #         inner_precise=1        # 仍然保持内部高精度计算
++++++-    #     )
+++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++ 
++++++-    #     attn_output = attn_output.to(input_dtype)
++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-    #     attn_output = self.o_proj(attn_output)
+++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++        
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++            
+++++++#             # 找到所有分配给该专家的 token
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++            
+++++++#             # 使用 mask 选取对应的 token 和权重
+++++++#             current_token_indices = token_indices[mask]
+++++++#             current_routing_weights = flat_routing_weights[mask]
+++++++#             current_hidden_states = hidden_states[current_token_indices]
+++++++            
+++++++#             # 对这些 token 进行批处理
+++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++++            
+++++++#             # 使用 index_add 将结果精确地加回到对应位置
+++++++#             moe_output = moe_output.index_add(
+++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++#             )
+++++++#         return moe_output
+++++++
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#         """
+++++++#         顶层 forward 方法，作为智能分发器。
+++++++#         """
+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++        
+++++++#         # 1. 门控计算
+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#         router_logits = self.gate(hidden_states_reshaped)
+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++
+++++++#         if self.norm_topk_prob:
+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++        
+++++++#         # 2. 调用统一的 MoE 计算内核
+++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++++++ 
++++++-    #     attn_weights = None
++++++-    #     if output_attentions:
++++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++++#         # 3. 统一处理共享专家
+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++        
+++++++#         # 4. 合并输出
+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++        
+++++++#         # 5. 恢复原始形状并返回
+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++#         return final_hidden_states, router_logits
+++++++
+++++++
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     """
+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++#     【最终高性能与高精度版】：
+++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++++++#     3. 这样实现了速度和准确性的两全其美。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_decode(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         """
+++++++#         【解码路径】极致优化版：bmm + 高精度累加。
+++++++#         """
+++++++#         original_dtype = hidden_states.dtype
+++++++#         batch_size, _ = hidden_states.shape
+++++++        
+++++++#         expert_outputs_list = [
+++++++#             ops.cat([
+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++#             ], dim=0) 
+++++++#             for i in range(batch_size)
+++++++#         ]
+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++
+++++++#         # 在 float32 下执行 bmm，得到高精度结果
+++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++        
+++++++#         # 将高精度结果转换回原始数据类型
+++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++++++        
+++++++#         return moe_output
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_prefill(
+++++++#         self, 
+++++++#         hidden_states: mindspore.Tensor, 
+++++++#         selected_experts: mindspore.Tensor, 
+++++++#         routing_weights: mindspore.Tensor
+++++++#     ) -> mindspore.Tensor:
+++++++#         """
+++++++#         【预填充路径】与原始实现一致，结果精确。
+++++++#         """
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens, _ = hidden_states.shape
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++        
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++#             selected_token_indices = token_indices[mask]
+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++#             current_states = hidden_states[selected_token_indices]
+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++#             moe_output = moe_output.index_add(
+++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++#             )
+++++++#         return moe_output
+++++++
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++        
+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#         router_logits = self.gate(hidden_states_reshaped)
+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++ 
++++++-    #     return attn_output, attn_weights, past_key_value
+++++++#         if self.norm_topk_prob:
+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++++++#         # 如果模型主体是 float16，后续再转换
+++++++        
+++++++#         moe_output = None
+++++++#         if not self.training:
+++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++++++#             # _moe_infer_decode 内部会处理好类型转换
+++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++++++#             if sequence_length == 1:
+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++++#             else:
+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++++#         else:
+++++++#             raise NotImplementedError("Training path is not implemented.")
+++++++
+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++        
+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++#         return final_hidden_states, router_logits
+++++++    
++++++ 
++++++-QWEN2MOE_ATTENTION_CLASSES = {
++++++-    "eager": Qwen2MoeAttention,
++++++-    "flash-attention": Qwen2MoeFlashAttention,
++++++-}
+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++#     """
+++++++#     【融合版】一个混合专家模块，内置两种推理策略，
+++++++#     由外部全局变量 `Long_Prompt` 控制：
+++++++
+++++++#     - if Long_Prompt is True:  【精度优先模式】
+++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++++++#       适用于处理长序列，避免误差累积。
+++++++
+++++++#     - if Long_Prompt is False: 【速度优先模式】
+++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
+++++++#     """
+++++++#     def __init__(self, config: Qwen2MoeConfig):
+++++++#         super().__init__()
+++++++#         self.num_experts = config.num_experts
+++++++#         self.top_k = config.num_experts_per_tok
+++++++#         self.norm_topk_prob = config.norm_topk_prob
+++++++
+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++#         self.experts = nn.ModuleList(
+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++#         )
+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++
+++++++#     # --- 速度优先模式的辅助函数 ---
+++++++#     @no_grad()
+++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++#         original_dtype = hidden_states.dtype
+++++++#         batch_size, _ = hidden_states.shape
+++++++#         expert_outputs_list = [
+++++++#             ops.cat([
+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++#             ], dim=0) 
+++++++#             for i in range(batch_size)
+++++++#         ]
+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
+++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++++
+++++++#     @no_grad()
+++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens, _ = hidden_states.shape
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++#             selected_token_indices = token_indices[mask]
+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++#             current_states = hidden_states[selected_token_indices]
+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++#         return moe_output
+++++++
+++++++#     # --- 精度优先模式的辅助函数 ---
+++++++#     @no_grad()
+++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++#         moe_output = ops.zeros_like(hidden_states)
+++++++#         num_tokens, _ = hidden_states.shape
+++++++#         flat_selected_experts = selected_experts.flatten()
+++++++#         flat_routing_weights = routing_weights.flatten()
+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++#         active_experts = ops.unique(flat_selected_experts)
+++++++#         for expert_idx_tensor in active_experts:
+++++++#             expert_idx = expert_idx_tensor.item()
+++++++#             expert_layer = self.experts[expert_idx]
+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++#             current_token_indices = token_indices[mask]
+++++++#             current_routing_weights = flat_routing_weights[mask]
+++++++#             current_hidden_states = hidden_states[current_token_indices]
+++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++#         return moe_output
+++++++
+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
+++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++++++#         global Long_Prompt
+++++++
+++++++#         # 1. 门控计算 (所有模式通用)
+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++#         router_logits = self.gate(hidden_states_reshaped)
+++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++++#         if self.norm_topk_prob:
+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++#         moe_output = None
+++++++#         if not self.training:
+++++++#             # 根据 Long_Prompt 标志选择模式
+++++++#             if Long_Prompt:
+++++++#                 # --- 精度优先模式 ---
+++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++#             else:
+++++++#                 # --- 速度优先模式 ---
+++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++#                 if sequence_length == 1:
+++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++#                 else:
+++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++#         else:
+++++++#             raise NotImplementedError("Training path is not implemented.")
+++++++
+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++        
+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++#         return final_hidden_states, router_logits
+++++++    
+++++++class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++    """
+++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++++++    控制的顶级推理策略：
++++++ 
+++++++    - if Long_Prompt is True:  【精度优先模式】
+++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+++++++      适用于需要严格可复现性的长序列任务。
++++++ 
++++++-class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-    def __init__(self, config):
+++++++    - if Long_Prompt is False: 【速度优先模式】
+++++++      采用业界最强的性能组合：
+++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+++++++    """
+++++++    def __init__(self, config: Qwen2MoeConfig):
++++++         super().__init__()
++++++         self.num_experts = config.num_experts
++++++         self.top_k = config.num_experts_per_tok
++++++         self.norm_topk_prob = config.norm_topk_prob
++++++ 
++++++-        # gating
++++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++         self.experts = nn.ModuleList(
++++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++         )
++++++-
++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++ 
++++++-    #@dwj
++++++-    # 只遍历激活的专家，而非全部专家
++++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-            num_tokens = hidden_states_reshaped.shape[0]
++++++-
++++++-            router_logits = self.gate(hidden_states_reshaped)
++++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-
++++++-            if self.norm_topk_prob:
++++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-            routing_weights = routing_weights.to(hidden_states.dtype)
++++++-            
++++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++-            flat_selected_experts = selected_experts.flatten()
++++++-            
++++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++-            token_indices = broadcasted_token_indices.flatten()
++++++-            
++++++-            active_experts = ops.unique(flat_selected_experts)
++++++-            
++++++-            for expert_idx_tensor in active_experts:
++++++-                expert_idx = expert_idx_tensor.item()
++++++-                expert_layer = self.experts[expert_idx]
++++++-                
++++++-                mask = (flat_selected_experts == expert_idx_tensor)
++++++-                selected_token_indices = token_indices[mask]
++++++-                selected_routing_weights = routing_weights.flatten()[mask]
++++++-                
++++++-                current_states = hidden_states_reshaped[selected_token_indices]
++++++-                
++++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-                
++++++-                final_hidden_states = final_hidden_states.index_add(
++++++-                    dim=0,
++++++-                    index=selected_token_indices,
++++++-                    source=expert_output.to(hidden_states.dtype)
++++++-                )
++++++-            
++++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+++++++    @no_grad()
+++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++        original_dtype = hidden_states.dtype
+++++++        batch_size, _ = hidden_states.shape
+++++++        expert_outputs_list = [
+++++++            ops.cat([
+++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++            ], dim=0)
+++++++            for i in range(batch_size)
+++++++        ]
+++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++        weights_fp32 = routing_weights.to(mindspore.float32)
+++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
+++++++
+++++++    @no_grad()
+++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++        num_tokens, _ = hidden_states.shape
+++++++        flat_selected_experts = selected_experts.flatten()
+++++++        sorted_expert_indices = flat_selected_experts.argsort()
+++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++++        original_token_indices = sorted_expert_indices // self.top_k
+++++++        moe_output = ops.zeros_like(hidden_states)
+++++++        current_token_offset = 0
+++++++        for i in range(self.num_experts):
+++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
+++++++            if expert_token_count == 0:
+++++++                continue
+++++++            end_offset = current_token_offset + expert_token_count
+++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
+++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++            current_token_offset += expert_token_count
+++++++        return moe_output
+++++++
+++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++++++    @no_grad()
+++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++        moe_output = ops.zeros_like(hidden_states)
+++++++        num_tokens, _ = hidden_states.shape
+++++++        flat_selected_experts = selected_experts.flatten()
+++++++        flat_routing_weights = routing_weights.flatten()
+++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++        active_experts = ops.unique(flat_selected_experts)
+++++++        for expert_idx_tensor in active_experts:
+++++++            expert_idx = expert_idx_tensor.item()
+++++++            expert_layer = self.experts[expert_idx]
+++++++            mask = (flat_selected_experts == expert_idx_tensor)
+++++++            current_token_indices = token_indices[mask]
+++++++            current_routing_weights = flat_routing_weights[mask]
+++++++            current_hidden_states = hidden_states[current_token_indices]
+++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++        return moe_output
++++++ 
++++++-            final_hidden_states = final_hidden_states + shared_expert_output
++++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++-            
++++++-            return final_hidden_states, router_logits
+++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++        global Long_Prompt
+++++++
+++++++        # 1. 门控计算 (所有模式通用)
+++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++        router_logits = self.gate(hidden_states_reshaped)
+++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++++        if self.norm_topk_prob:
+++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++        
+++++++        moe_output = None
+++++++        if Long_Prompt:
+++++++            # --- 精度优先模式 (ACCURACY MODE) ---
+++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        else:
+++++++            # --- 速度优先模式 (SPEED MODE) ---
+++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++            if sequence_length == 1:
+++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++            else:
+++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        
++++++ 
+++++++        # 3. 共享专家计算与合并 (所有模式通用)
+++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++        
+++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++        
+++++++        return final_hidden_states, router_logits
++++++ 
++++++ class Qwen2MoeDecoderLayer(nn.Module):
++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++++++         super().__init__()
++++++         self.hidden_size = config.hidden_size
+++++++        
+++++++        # if Long_Prompt:
+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++        # else:
+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++ 
++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++ 
++++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++-
++++++         if (layer_idx not in config.mlp_only_layers) and (
++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++++         ):
++++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++             self._warmed_up = True
++++++             self.warmup_moe_model()
++++++ 
+++++++
+++++++
++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++++         output_router_logits = (
++++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
++++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++             router_logits=outputs.router_logits,
++++++         )
++++++ 
+++++++    def generate(self, *args, **kwargs):
+++++++        """
+++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++++++        """
+++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++++++
+++++++        input_ids = kwargs.get("input_ids")
+++++++        if input_ids is None and args:
+++++++            input_ids = args[0]
+++++++
+++++++        if input_ids is not None:
+++++++            prompt_length = input_ids.shape[1]
+++++++            
+++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++++++                Long_Prompt = True
+++++++            else:
+++++++                Long_Prompt = False
+++++++
+++++++        return super().generate(*args, **kwargs)
+++++++    
++++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
++++++     def prepare_inputs_for_generation(
++++++         self,
++++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
++++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
++++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+++++++        
++++++         if past_key_values is not None:
++++++             if inputs_embeds is not None:  # Exception 1
++++++                 if 0 not in input_ids.shape:
++++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++             }
++++++         )
++++++         return model_inputs
+++++++
++++++ # @lwx
++++++     # def _decode_one_tokens_logits(
++++++     #     self,
++++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
++++++             attentions=outputs.attentions,
++++++         )
++++++ 
+++++++
++++++ __all__ = [
++++++     "Qwen2MoeForCausalLM",
++++++     "Qwen2MoeModel",
++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++++new file mode 100644
++++++index 00000000..6dfb5b93
++++++--- /dev/null
+++++++++ b/patches/0001-20251104commit.patch
++++++@@ -0,0 +1,1272 @@
+++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++++From: Pinoeer-kingxi <13022943007@163.com>
+++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++++Subject: [PATCH] 20251104commit
+++++++
+++++++---
+++++++ mindnlp/transformers/cache_utils.py           |  28 +-
+++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++++++
+++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++++index cadd2e04..02f8d4be 100644
+++++++--- a/mindnlp/transformers/cache_utils.py
++++++++++ b/mindnlp/transformers/cache_utils.py
+++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++++             # k_out[:, :, cache_position] = key_states
+++++++             # v_out[:, :, cache_position] = value_states
+++++++-            if ON_ORANGE_PI:
+++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++-            else:
+++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++-
++++++++            # if ON_ORANGE_PI:
++++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++++            # else:
++++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++++++            if cache_position.ndim > 1:
++++++++                cache_position = cache_position.flatten()
++++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++++++                cache_position = cache_position.int()
++++++++            
++++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++++++            k_out[:, :, cache_position] = key_states
++++++++            v_out[:, :, cache_position] = value_states
++++++++            
+++++++         return k_out, v_out
+++++++ 
+++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++index c695b944..d8303e45 100644
+++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++++ def rotate_half(x):
+++++++     """Rotates half the hidden dims of the input."""
+++++++-    x1 = x[..., : x.shape[-1] // 2]
+++++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++++    # x2 = x[..., x.shape[-1] // 2 :]
++++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++     return ops.cat((-x2, x1), dim=-1)
+++++++ 
+++++++ 
+++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++++         if self.training:
+++++++             raise NotImplementedError("Training is not supported yet.")
+++++++         else:
+++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++-        if self.config.n_shared_experts is not None:
+++++++-            y = y + self.shared_experts(identity)
+++++++-        return y
++++++++            # @lwx
++++++++            if orig_shape[1] == 1:
++++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++++++                y=y.view(*orig_shape)
++++++++                if self.config.n_shared_experts is not None:
++++++++                    y = y + self.shared_experts(identity)
++++++++                return y
++++++++            else:
++++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++++++                if self.config.n_shared_experts is not None:
++++++++                    y = y + self.shared_experts(identity)
++++++++                return y
++++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++++        # if self.config.n_shared_experts is not None:
++++++++        #     y = y + self.shared_experts(identity)
++++++++        # return y
++++++++        
++++++++    @no_grad()
++++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++++
++++++++        expert_cache = ops.zeros_like(x)
++++++++        for i in range(self.num_experts_per_tok):
++++++++            expert_id = flat_expert_indices[i].item()
++++++++            weight = flat_expert_weights[i].item()
++++++++            expert = self.experts[expert_id]
++++++++            expert_out = expert(x)
++++++++            expert_cache += expert_out * weight
++++++++        return expert_cache
+++++++ 
+++++++     @no_grad()
+++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-        # expert_cache = torch.zeros_like(x)
+++++++-        # idxs = flat_expert_indices.argsort()
+++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++-        # token_idxs = idxs // self.num_experts_per_tok
+++++++-        # for i, end_idx in enumerate(tokens_per_expert):
+++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++-        #     if start_idx == end_idx:
+++++++-        #         continue
+++++++-        #     expert = self.experts[i]
+++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-        #     expert_tokens = x[exp_token_idx]
+++++++-        #     expert_out = expert(expert_tokens)
+++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++-        # return expert_cache
++++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++         expert_cache = ops.zeros_like(x)
+++++++         idxs = flat_expert_indices.argsort()
+++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++         token_idxs = idxs // self.num_experts_per_tok
++++++++
+++++++         for i, end_idx in enumerate(tokens_per_expert):
+++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++             if start_idx == end_idx:
+++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++++             expert_out = expert(expert_tokens)
+++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++++
+++++++         return expert_cache
++++++++        
++++++++    # @no_grad()
++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++    #     # expert_cache = torch.zeros_like(x)
++++++++    #     # idxs = flat_expert_indices.argsort()
++++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++++    #     #     if start_idx == end_idx:
++++++++    #     #         continue
++++++++    #     #     expert = self.experts[i]
++++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #     #     expert_tokens = x[exp_token_idx]
++++++++    #     #     expert_out = expert(expert_tokens)
++++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++++    #     # return expert_cache
++++++++    #     expert_cache = ops.zeros_like(x)
++++++++    #     idxs = flat_expert_indices.argsort()
++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++    #         if start_idx == end_idx:
++++++++    #             continue
++++++++    #         expert = self.experts[i]
++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #         expert_tokens = x[exp_token_idx]
++++++++    #         expert_out = expert(expert_tokens)
++++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++++
++++++++    #     return expert_cache
++++++++    # @no_grad()
++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++    #     expert_cache = ops.zeros_like(x)
++++++++
++++++++    #     # 排序保证顺序一致
++++++++    #     idxs = flat_expert_indices.argsort()
++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++    #     # 找出有 token 的专家
++++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++++
++++++++    #     for i in active_experts.tolist():
++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++    #         end_idx = tokens_per_expert[i]
++++++++    #         if start_idx == end_idx:  # 没有 token
++++++++    #             continue
++++++++
++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #         expert_tokens = x[exp_token_idx]
++++++++    #         expert_out = self.experts[i](expert_tokens)
++++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++++
++++++++    #         expert_cache = mindspore.mint.scatter_add(
++++++++    #             expert_cache,
++++++++    #             0,
++++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++++    #             expert_out
++++++++    #         )
++++++++
++++++++    #     return expert_cache
++++++++
++++++++
+++++++ 
+++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++++ #     """
+++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++ 
+++++++         # Initialize weights and apply final processing
+++++++         self.post_init()
++++++++        self.warm_up = False
++++++++
++++++++    def warmup_moe_model_deep(self):
++++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++++        test_texts = [
++++++++            "warmup short",
++++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++++++        ]
++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++++        if tokenizer is None:
++++++++            from mindnlp.transformers import AutoTokenizer
++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++++            self._warmup_tokenizer = tokenizer
++++++++
++++++++        for text in test_texts:
++++++++            inputs = tokenizer(text, return_tensors="ms")
++++++++            with mindspore._no_grad():
++++++++                _ = self(**inputs, use_cache=False)
++++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++++ 
+++++++     def get_input_embeddings(self):
+++++++         return self.model.embed_tokens
+++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++         ```"""
++++++++        if not self.warm_up:
++++++++            self.warm_up = True
++++++++            self.warmup_moe_model_deep()
++++++++
+++++++         output_attentions = (
+++++++             output_attentions
+++++++             if output_attentions is not None
+++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++index 3cbf820e..d4c6b651 100644
+++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++@@ -18,7 +18,6 @@
+++++++ # See the License for the specific language governing permissions and
+++++++ # limitations under the License.
+++++++ """MindSpore Qwen2MoE model."""
+++++++-
+++++++ import math
+++++++ from typing import List, Optional, Tuple, Union
+++++++ 
+++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++++     TokenClassifierOutput,
+++++++ )
+++++++ from ...modeling_utils import PreTrainedModel
++++++++from ...generation import GenerationMixin
+++++++ from ....utils import logging
+++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++++++ 
+++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++++         self.variance_epsilon = eps
+++++++ 
+++++++     def forward(self, hidden_states):
++++++++        # @dwj
++++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++++        # @lwx
++++++++        # if not self.training :
++++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++         input_dtype = hidden_states.dtype
+++++++         hidden_states = hidden_states.to(mindspore.float32)
+++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++++@@ -234,6 +239,8 @@ def rotate_half(x):
+++++++     """Rotates half the hidden dims of the input."""
+++++++     x1 = x[..., : x.shape[-1] // 2]
+++++++     x2 = x[..., x.shape[-1] // 2 :]
++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++     return ops.cat((-x2, x1), dim=-1)
+++++++ 
+++++++ 
+++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++++         self.config = config
+++++++         self.hidden_size = config.hidden_size
+++++++         self.intermediate_size = intermediate_size
++++++++        
+++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++++         self.act_fn = ACT2FN[config.hidden_act]
+++++++ 
+++++++     def forward(self, x):
+++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++-
+++++++ 
++++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++++        # @lwx 
++++++++        # gate_up_output = self.gate_up_proj(x)
++++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++++++        # return self.down_proj(swiglu_output)
++++++++
++++++++    # def forward(self, x):
++++++++    #     gate_proj_out = self.gate_proj(x)
++++++++    #     up_proj_out = self.up_proj(x)
++++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++++++    #     return self.down_proj(swiglu_out)
++++++++        
+++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++++     """
+++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++++         use_cache: bool = False,
+++++++         cache_position: Optional[mindspore.Tensor] = None,
+++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++        
++++++++
+++++++         bsz, q_len, _ = hidden_states.shape
+++++++ 
+++++++         query_states = self.q_proj(hidden_states)
+++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++                     "with a layer index."
+++++++                 )
+++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++            if isinstance(past_key_value, StaticCache):
++++++++                kv_seq_len = key_states.shape[-2]
++++++++            else:
++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++ 
+++++++         if past_key_value is not None:
+++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++            
++++++++            if isinstance(past_key_value, StaticCache):
++++++++                kv_seq_len = key_states.shape[-2]
+++++++ 
+++++++         # repeat k/v heads if n_kv_heads < n_heads
+++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++-
++++++++        
+++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++ 
+++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++++-            raise ValueError(
+++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++++-                f" {attn_weights.shape}"
+++++++-            )
+++++++-
+++++++-        if attention_mask is not None:  # no matter the length, we just slice it
+++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++++++        if attention_mask is not None:
++++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++             attn_weights = attn_weights + causal_mask
+++++++ 
+++++++         # upcast attention to fp32
+++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++ 
+++++++         attn_output = self.o_proj(attn_output)
+++++++-
++++++++        # @lwx
++++++++        
++++++++        # max_seq_len = self.max_position_embeddings  # 2048
++++++++
++++++++        # if attention_mask is not None:
++++++++        #     # attention_mask: [B, 1, Sq, Sk]
++++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++++
++++++++        #     # pad 到 [max_seq_len, max_seq_len]
++++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++++        #     global_attention_mask = padded_mask
++++++++        # else:
++++++++        #     global_attention_mask = None
++++++++
++++++++
++++++++        # sparse_mode=3
++++++++        # attn_output = mindspore.ops.flash_attention_score(
++++++++        #     query=query_states,
++++++++        #     key=key_states,
++++++++        #     value=value_states,
++++++++        #     real_shift=None,
++++++++        #     padding_mask=None,
++++++++
++++++++        #     head_num=self.num_heads,
++++++++        #     attn_mask=global_attention_mask,
++++++++        #     keep_prob=1.0 - self.attention_dropout,
++++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++        #     input_layout="BNSD",
++++++++        #     pre_tokens=2147483647,
++++++++        #     next_tokens=2147483647,
++++++++        #     inner_precise=0,
++++++++        #     drop_mask=None,
++++++++        #     prefix=None,
++++++++        #     actual_seq_qlen=None,
++++++++        #     actual_seq_kvlen=None,
++++++++        #     sparse_mode=sparse_mode,
++++++++        # )
+++++++         if not output_attentions:
+++++++             attn_weights = None
+++++++ 
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++ 
++++++++class Qwen2MoeFlashAttention(nn.Module):
++++++++    """
++++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++++
++++++++    关键改动:
++++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++++       直接传入原始的 key 和 value 张量效率更高。
++++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++++    """
++++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++++        super().__init__()
++++++++        self.config = config
++++++++        self.layer_idx = layer_idx
++++++++        self.hidden_size = config.hidden_size
++++++++        self.num_heads = config.num_attention_heads
++++++++        self.head_dim = self.hidden_size // self.num_heads
++++++++        self.num_key_value_heads = config.num_key_value_heads
++++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++        self.max_position_embeddings = config.max_position_embeddings
++++++++        self.rope_theta = config.rope_theta
++++++++        self.attention_dropout = config.attention_dropout
++++++++
++++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++            raise ValueError(
++++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++++            )
++++++++
++++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++++
++++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++++            self.head_dim,
++++++++            max_position_embeddings=self.max_position_embeddings,
++++++++            base=self.rope_theta,
++++++++        )
++++++++
++++++++    def forward(
++++++++        self,
++++++++        hidden_states: mindspore.Tensor,
++++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++++        past_key_value: Optional[Cache] = None,
++++++++        output_attentions: bool = False,
++++++++        use_cache: bool = False,
++++++++        cache_position: Optional[mindspore.Tensor] = None,
++++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++        bsz, q_len, _ = hidden_states.shape
++++++++
++++++++        # 1. 线性投射 Q, K, V
++++++++        query_states = self.q_proj(hidden_states)
++++++++        key_states = self.k_proj(hidden_states)
++++++++        value_states = self.v_proj(hidden_states)
++++++++
++++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++        # 3. RoPE 旋转位置编码
++++++++        kv_seq_len = key_states.shape[-2]
++++++++        if past_key_value is not None:
++++++++            if self.layer_idx is None:
++++++++                raise ValueError(
++++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++                    "with a layer index."
++++++++                )
++++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++++                if cache_position.shape[0] == 1:
++++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++++                    kv_seq_len = past_seen_tokens + 1
++++++++                else:
++++++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++++            else:
++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++        # 4. KV 缓存更新
++++++++        if past_key_value is not None:
++++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++            key_states, value_states = past_key_value.update(
++++++++                key_states, value_states, self.layer_idx, cache_kwargs
++++++++            )
++++++++            
++++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++                if cache_position.shape[0] == 1:
++++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++++                    kv_seq_len = key_states.shape[-2]
++++++++
++++++++        # 5. [重要] 准备 Attention Mask
++++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++++        fa_attention_mask = None
++++++++        if attention_mask is not None:
++++++++            # 截取与当前key长度匹配的部分
++++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++++            fa_attention_mask = (mask_slice != 0)
++++++++
++++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++++        input_dtype = query_states.dtype
++++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++++            query_states = query_states.to(mindspore.float16)
++++++++            key_states = key_states.to(mindspore.float16)
++++++++            value_states = value_states.to(mindspore.float16)
++++++++
++++++++        # 6. [核心] 调用 flash_attention_score 算子
++++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++++        attn_output = mindspore.ops.flash_attention_score(
++++++++            query=query_states,
++++++++            key=key_states,
++++++++            value=value_states,
++++++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++++++            attn_mask=fa_attention_mask,
++++++++            keep_prob=1.0 - self.attention_dropout,
++++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++            input_layout="BNSD",
++++++++            sparse_mode=0 # 使用 defaultMask 模式
++++++++        )
++++++++
++++++++        # 恢复原始数据类型
++++++++        attn_output = attn_output.to(input_dtype)
++++++++
++++++++        # 7. 调整输出形状
++++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++        attn_output = self.o_proj(attn_output)
++++++++
++++++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++++++        attn_weights = None
++++++++        if output_attentions:
++++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++        return attn_output, attn_weights, past_key_value
++++++++
++++++++    # def forward(
++++++++    #     self,
++++++++    #     hidden_states: mindspore.Tensor,
++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++++    #     past_key_value: Optional[Cache] = None,
++++++++    #     output_attentions: bool = False,
++++++++    #     use_cache: bool = False,
++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++    #     bsz, q_len, _ = hidden_states.shape
++++++++
++++++++    #     # 1. 线性投射 Q, K, V
++++++++    #     query_states = self.q_proj(hidden_states)
++++++++    #     key_states = self.k_proj(hidden_states)
++++++++    #     value_states = self.v_proj(hidden_states)
++++++++
++++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++    #     # 3. RoPE 旋转位置编码
++++++++    #     kv_seq_len = key_states.shape[-2]
++++++++    #     if past_key_value is not None:
++++++++    #         if self.layer_idx is None:
++++++++    #             raise ValueError(
++++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++    #                 "with a layer index."
++++++++    #             )
++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++    #     # 4. KV 缓存更新
++++++++    #     if past_key_value is not None:
++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++    #         key_states, value_states = past_key_value.update(
++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++++    #         )
++++++++
++++++++    #     # 5. 准备 Attention Mask
++++++++    #     fa_attention_mask = None
++++++++    #     if attention_mask is not None:
++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++    #         fa_attention_mask = (mask_slice != 0)
++++++++
++++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++++    #     input_dtype = query_states.dtype
++++++++
++++++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++++    #         query=query_states,
++++++++    #         key=key_states,
++++++++    #         value=value_states,
++++++++    #         head_num=self.num_heads,
++++++++    #         attn_mask=fa_attention_mask,
++++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++    #         input_layout="BNSD",
++++++++    #         sparse_mode=0,
++++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++++    #         inner_precise=1
++++++++    #     )
++++++++
++++++++    #     # 恢复原始数据类型
++++++++    #     attn_output = attn_output.to(input_dtype)
++++++++
++++++++    #     # 7. 调整输出形状
++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++    #     attn_output = self.o_proj(attn_output)
++++++++
++++++++    #     attn_weights = None
++++++++    #     if output_attentions:
++++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++    #     return attn_output, attn_weights, past_key_value
++++++++
++++++++    # def forward(
++++++++    #     self,
++++++++    #     hidden_states: mindspore.Tensor,
++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++++    #     past_key_value: Optional[Cache] = None,
++++++++    #     output_attentions: bool = False,
++++++++    #     use_cache: bool = False,
++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++    #     bsz, q_len, _ = hidden_states.shape
++++++++
++++++++    #     query_states = self.q_proj(hidden_states)
++++++++    #     key_states = self.k_proj(hidden_states)
++++++++    #     value_states = self.v_proj(hidden_states)
++++++++
++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++    #     kv_seq_len = key_states.shape[-2]
++++++++    #     if past_key_value is not None:
++++++++    #         if self.layer_idx is None:
++++++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++    #     if past_key_value is not None:
++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++    #         key_states, value_states = past_key_value.update(
++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++++    #         )
++++++++
++++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++
++++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++++++    #     # <--- 修改结束 ---
++++++++
++++++++    #     fa_attention_mask = None
++++++++    #     if attention_mask is not None:
++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++    #         fa_attention_mask = (mask_slice != 0)
++++++++
++++++++    #     input_dtype = query_states.dtype
++++++++
++++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++++++    #         key=key_states,
++++++++    #         value=value_states,
++++++++    #         head_num=self.num_heads,
++++++++    #         attn_mask=fa_attention_mask,
++++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++++    #         input_layout="BNSD",
++++++++    #         sparse_mode=0,
++++++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++++++    #     )
++++++++
++++++++    #     attn_output = attn_output.to(input_dtype)
++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++    #     attn_output = self.o_proj(attn_output)
++++++++
++++++++    #     attn_weights = None
++++++++    #     if output_attentions:
++++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++++
++++++++    #     return attn_output, attn_weights, past_key_value
++++++++
+++++++ QWEN2MOE_ATTENTION_CLASSES = {
+++++++     "eager": Qwen2MoeAttention,
++++++++    "flash-attention": Qwen2MoeFlashAttention,
+++++++ }
+++++++ 
+++++++ 
+++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++ 
++++++++    #@dwj
++++++++    # 只遍历激活的专家，而非全部专家
+++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++++++-        # router_logits: (batch * sequence_length, n_experts)
+++++++-        router_logits = self.gate(hidden_states)
+++++++-
+++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-        if self.norm_topk_prob:
+++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        # we cast back to the input dtype
+++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-
+++++++-        final_hidden_states = ops.zeros(
+++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++++-        )
+++++++-
+++++++-        # One hot encode the selected experts to create an expert mask
+++++++-        # this will be used to easily index which expert is going to be sollicitated
+++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++++-
+++++++-        # Loop over all available experts in the model and perform the computation on each expert
+++++++-        for expert_idx in range(self.num_experts):
+++++++-            expert_layer = self.experts[expert_idx]
+++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++++-
+++++++-            # Index the correct hidden states and compute the expert hidden state for
+++++++-            # the current expert. We need to make sure to multiply the output hidden
+++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++++-            if 0 not in idx.shape:
+++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++++-
+++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++++++-                # the `top_x` tensor here.
+++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++++-
+++++++-        shared_expert_output = self.shared_expert(hidden_states)
+++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++++-
+++++++-        final_hidden_states = final_hidden_states + shared_expert_output
++++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++            num_tokens = hidden_states_reshaped.shape[0]
++++++++
++++++++            router_logits = self.gate(hidden_states_reshaped)
++++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++
++++++++            if self.norm_topk_prob:
++++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++++++            
++++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++++            flat_selected_experts = selected_experts.flatten()
++++++++            
++++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++++            token_indices = broadcasted_token_indices.flatten()
++++++++            
++++++++            active_experts = ops.unique(flat_selected_experts)
++++++++            
++++++++            for expert_idx_tensor in active_experts:
++++++++                expert_idx = expert_idx_tensor.item()
++++++++                expert_layer = self.experts[expert_idx]
++++++++                
++++++++                mask = (flat_selected_experts == expert_idx_tensor)
++++++++                selected_token_indices = token_indices[mask]
++++++++                selected_routing_weights = routing_weights.flatten()[mask]
++++++++                
++++++++                current_states = hidden_states_reshaped[selected_token_indices]
++++++++                
++++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++                
++++++++                final_hidden_states = final_hidden_states.index_add(
++++++++                    dim=0,
++++++++                    index=selected_token_indices,
++++++++                    source=expert_output.to(hidden_states.dtype)
++++++++                )
++++++++            
++++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++++ 
+++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++-        return final_hidden_states, router_logits
++++++++            final_hidden_states = final_hidden_states + shared_expert_output
++++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++++            
++++++++            return final_hidden_states, router_logits
+++++++ 
+++++++ 
+++++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++++ 
+++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++ 
++++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++++
+++++++         if (layer_idx not in config.mlp_only_layers) and (
+++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++++         ):
+++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++++     _skip_keys_device_placement = "past_key_values"
+++++++     _supports_cache_class = True
++++++++#lwx
++++++++    # _supports_static_cache = True
+++++++ 
+++++++     def _init_weights(self, module):
+++++++         std = self.config.initializer_range
+++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++++         return causal_mask
+++++++ 
+++++++ 
+++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++     _tied_weights_keys = ["lm_head.weight"]
+++++++ 
+++++++     def __init__(self, config):
+++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++         self.num_experts_per_tok = config.num_experts_per_tok
+++++++         # Initialize weights and apply final processing
+++++++         self.post_init()
++++++++        # @lwx
++++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++++++        #     self.generation_config.cache_implementation = "static"
++++++++        self._warmed_up = False
++++++++
++++++++    def warmup_moe_model(self):
++++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++++++        test_texts = [
++++++++            "warmup short",
++++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++++++        ]
++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++++        if tokenizer is None:
++++++++            from mindnlp.transformers import AutoTokenizer
++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++++            self._warmup_tokenizer = tokenizer
++++++++
++++++++        for text in test_texts:
++++++++            inputs = tokenizer(text, return_tensors="ms")
++++++++            with mindspore._no_grad():
++++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++++ 
+++++++     def get_input_embeddings(self):
+++++++         return self.model.embed_tokens
+++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++         ```"""
++++++++        if not self._warmed_up:
++++++++            self._warmed_up = True
++++++++            self.warmup_moe_model()
+++++++ 
+++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++++         output_router_logits = (
+++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++             }
+++++++         )
+++++++         return model_inputs
++++++++# @lwx
++++++++    # def _decode_one_tokens_logits(
++++++++    #     self,
++++++++    #     cur_token: mindspore.Tensor,
++++++++    #     input_pos: Optional[mindspore.Tensor],
++++++++    #     cache_position: mindspore.Tensor,
++++++++    #     past_key_values: StaticCache,
++++++++    # ) -> mindspore.Tensor:
++++++++    #     """
++++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++++++        
++++++++    #     Args:
++++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++++++    #         input_pos: 输入位置信息，可选
++++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++++++            
++++++++    #     Returns:
++++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++++++    #     """
++++++++    #     # 调用JIT编译的版本
++++++++    #     return self.get_decode_one_tokens_logits(
++++++++    #         cur_token=cur_token,
++++++++    #         input_pos=input_pos,
++++++++    #         cache_position=cache_position,
++++++++    #         past_key_values=past_key_values,
++++++++    #     )
++++++++    
++++++++    # @mindspore.jit(jit_level='O1')
++++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++++++    #     """
++++++++    #     JIT编译的函数，用于高效的单token解码
++++++++    #     使用JIT编译优化以支持静态shape和高效执行
++++++++        
++++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++++++    #     """
++++++++    #     outputs = self.model.forward(
++++++++    #         input_ids=cur_token,
++++++++    #         position_ids=input_pos,
++++++++    #         cache_position=cache_position,
++++++++    #         past_key_values=past_key_values,
++++++++    #         use_cache=True,
++++++++    #         return_dict=False,
++++++++    #     )
++++++++        
++++++++    #     hidden_states = outputs[0]
++++++++    #     logits = self.lm_head.forward(hidden_states)
++++++++    #     logits = logits.float()
++++++++        
++++++++    #     return logits[:, -1, :]
++++++++
++++++++    # def _sample(
++++++++    #     self,
++++++++    #     input_ids: mindspore.Tensor,
++++++++    #     logits_processor,
++++++++    #     stopping_criteria,
++++++++    #     generation_config,
++++++++    #     synced_devices: bool,
++++++++    #     streamer=None,
++++++++    #     logits_warper=None,
++++++++    #     **model_kwargs,
++++++++    # ):
++++++++    #     """
++++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++++++    #     """
++++++++    #     from ...generation.logits_process import LogitsProcessorList
++++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++++++    #     from mindnlp.core import nn, ops, no_grad
++++++++    #     import numpy as np
++++++++        
++++++++    #     # 检查是否使用 StaticCache
++++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++++++    #     # 否则，直接调用父类方法
++++++++    #     past_key_values = model_kwargs.get("past_key_values")
++++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++++++
++++++++    #     if not isinstance(past_key_values, StaticCache):
++++++++    #         # 不使用 StaticCache，直接调用父类方法
++++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++++++    #         return super()._sample(
++++++++    #             input_ids=input_ids,
++++++++    #             logits_processor=logits_processor,
++++++++    #             stopping_criteria=stopping_criteria,
++++++++    #             generation_config=generation_config,
++++++++    #             synced_devices=synced_devices,
++++++++    #             streamer=streamer,
++++++++    #             logits_warper=logits_warper,
++++++++    #             **model_kwargs,
++++++++    #         )
++++++++        
++++++++    #     # 使用 StaticCache，进入自定义循环
++++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++++++    #     pad_token_id = generation_config._pad_token_tensor
++++++++    #     output_attentions = generation_config.output_attentions
++++++++    #     output_hidden_states = generation_config.output_hidden_states
++++++++    #     output_scores = generation_config.output_scores
++++++++    #     output_logits = generation_config.output_logits
++++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++++++    #     max_length = generation_config.max_length
++++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++++++    #     do_sample = generation_config.do_sample
++++++++        
++++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++++++    #         raise ValueError(
++++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++++++    #             f"{logits_warper})."
++++++++    #         )
++++++++        
++++++++    #     # init attention / hidden states / scores tuples
++++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++++++        
++++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++++++    #         encoder_hidden_states = (
++++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++++++    #         )
++++++++        
++++++++    #     # keep track of which sequences are already finished
++++++++    #     batch_size, cur_len = input_ids.shape
++++++++    #     this_peer_finished = False
++++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++++++        
++++++++    #     time_record = []
++++++++    #     from ....utils.testing_utils import parse_flag_from_env
++++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++++++        
++++++++    #     while self._has_unfinished_sequences(
++++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++++++    #     ):
++++++++    #         if _record_time:
++++++++    #             import time as time_module
++++++++    #             infer_start = time_module.time()
++++++++            
++++++++    #         # prepare model inputs
++++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++++++            
++++++++    #         # prepare variable output controls
++++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++++++            
++++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++++++    #         cur_cache_position = model_inputs.get("cache_position")
++++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++++++    #         cur_input_ids = model_inputs.get("input_ids")
++++++++            
++++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++++++    #             cur_cache_position is not None and 
++++++++    #             len(cur_cache_position.shape) > 0 and
++++++++    #             cur_cache_position.shape[0] == 1 and
++++++++    #             cur_input_ids is not None and
++++++++    #             cur_input_ids.shape[1] == 1):
++++++++    #             # 使用 JIT 优化的单 token 解码
++++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++++++    #             if not hasattr(self, '_jit_used'):
++++++++    #                 self._jit_used = False
++++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++++++                
++++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++++++    #                 cur_token=cur_input_ids,
++++++++    #                 input_pos=model_inputs.get("position_ids"),
++++++++    #                 cache_position=cur_cache_position,
++++++++    #                 past_key_values=cur_past_key_values,
++++++++    #             )
++++++++                
++++++++    #             # 标记已使用JIT（用于后续判断）
++++++++    #             if not self._jit_used:
++++++++    #                 self._jit_used = True
++++++++                
++++++++    #             # 构造兼容的输出对象
++++++++    #             class JitOptimizedOutput:
++++++++    #                 def __init__(self, logits, config):
++++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++++++    #                     self.config = config
++++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++++++    #                     self.cross_attentions = None
++++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++++++                
++++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++++++    #         else:
++++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++++++    #             outputs = self(**model_inputs, return_dict=True)
++++++++            
++++++++    #         if synced_devices and this_peer_finished:
++++++++    #             continue
++++++++            
++++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++++++    #         next_token_logits = outputs.logits[:, -1, :]
++++++++            
++++++++    #         # pre-process distribution
++++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++++++    #         if do_sample:
++++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++++++            
++++++++    #         # Store scores, attentions and hidden_states when required
++++++++    #         if return_dict_in_generate:
++++++++    #             if output_scores:
++++++++    #                 scores += (next_token_scores,)
++++++++    #             if output_logits:
++++++++    #                 raw_logits += (next_token_logits,)
++++++++    #             if output_attentions:
++++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++++++    #                 if self.config.is_encoder_decoder:
++++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++++++                
++++++++    #             if output_hidden_states:
++++++++    #                 hidden = (
++++++++    #                     outputs.decoder_hidden_states
++++++++    #                     if self.config.is_encoder_decoder
++++++++    #                     else outputs.hidden_states
++++++++    #                 )
++++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++++++            
++++++++    #         # token selection
++++++++    #         if do_sample:
++++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++++++    #         else:
++++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++++++            
++++++++    #         # finished sentences should have their next token be a padding token
++++++++    #         if has_eos_stopping_criteria:
++++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++++++            
++++++++    #         # update generated ids, model inputs, and length for next step
++++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++++++    #         if streamer is not None:
++++++++    #             streamer.put(next_tokens)
++++++++            
++++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++++++    #             outputs,
++++++++    #             model_kwargs,
++++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++++++    #         )
++++++++            
++++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++++++    #         cur_len += 1
++++++++            
++++++++    #         if _record_time:
++++++++    #             import time as time_module
++++++++    #             infer_stop = time_module.time()
++++++++    #             time_record.append(infer_stop - infer_start)
++++++++            
++++++++    #         del outputs
++++++++        
++++++++    #     average_infer_time = None
++++++++    #     if time_record:
++++++++    #         if len(time_record) > 1:
++++++++    #             time_record.pop(0)
++++++++    #         average_infer_time = sum(time_record) / len(time_record)
++++++++    #         print(f'average inference time is: {average_infer_time}')
++++++++    #         print(f'inference time record: {time_record}')
++++++++        
++++++++    #     if streamer is not None:
++++++++    #         streamer.end()
++++++++        
++++++++    #     # 简单判断：打印是否使用了JIT路径
++++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++++++    #     else:
++++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++++++        
++++++++    #     if return_dict_in_generate:
++++++++    #         if self.config.is_encoder_decoder:
++++++++    #             return GenerateEncoderDecoderOutput(
++++++++    #                 sequences=input_ids,
++++++++    #                 scores=scores,
++++++++    #                 logits=raw_logits,
++++++++    #                 encoder_attentions=encoder_attentions,
++++++++    #                 encoder_hidden_states=encoder_hidden_states,
++++++++    #                 decoder_attentions=decoder_attentions,
++++++++    #                 cross_attentions=cross_attentions,
++++++++    #                 decoder_hidden_states=decoder_hidden_states,
++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++++    #                 average_infer_time=average_infer_time
++++++++    #             )
++++++++    #         else:
++++++++    #             return GenerateDecoderOnlyOutput(
++++++++    #                 sequences=input_ids,
++++++++    #                 scores=scores,
++++++++    #                 logits=raw_logits,
++++++++    #                 attentions=decoder_attentions,
++++++++    #                 hidden_states=decoder_hidden_states,
++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++++    #                 average_infer_time=average_infer_time
++++++++    #             )
++++++++    #     else:
++++++++    #         return input_ids
++++++++            
++++++++    # def _prepare_cache_for_generation(
++++++++    #     self,
++++++++    #     generation_config,
++++++++    #     model_kwargs,
++++++++    #     assistant_model,
++++++++    #     batch_size,
++++++++    #     max_cache_length,
++++++++    # ):
++++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++++++    #         generation_config.cache_implementation = "static"
++++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++++++        
++++++++    #     if generation_config.cache_implementation == "static":
++++++++    #         base_required_from_max_length = generation_config.max_length + 1
++++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++++++    #         min_cache_size = 50
++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++++++    #         else:
++++++++    #             max_cache_length = max(base_required, min_cache_size)
++++++++            
++++++++    #         original_max_cache_length = max_cache_length
++++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++++++            
++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++++    #             if max_cache_length > self.config.max_position_embeddings:
++++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++++++        
++++++++    #     result = super()._prepare_cache_for_generation(
++++++++    #         generation_config=generation_config,
++++++++    #         model_kwargs=model_kwargs,
++++++++    #         assistant_model=assistant_model,
++++++++    #         batch_size=batch_size,
++++++++    #         max_cache_length=max_cache_length,
++++++++    #     )
++++++++        
++++++++    #     if generation_config.cache_implementation == "static":
++++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++++++    #         created_cache = model_kwargs.get(cache_name)
++++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++++++        
++++++++    #     return result
++++++++
++++++++
++++++++
+++++++ 
+++++++ 
+++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++++-- 
+++++++2.27.0
+++++++
++++++-- 
++++++2.27.0
++++++
+++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++++new file mode 100644
+++++index 00000000..966529e4
+++++--- /dev/null
++++++++ b/patches/0003-20261106secondcommit.patch
+++++@@ -0,0 +1,2769 @@
++++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++Date: Thu, 6 Nov 2025 14:54:37 +0800
++++++Subject: [PATCH 3/3] 20261106secondcommit
++++++
++++++---
++++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
++++++ patches/0001-20251104commit.patch             | 1272 -----------------
++++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
++++++ delete mode 100644 patches/0001-20251104commit.patch
++++++
++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++index 73773c22..2f9192bf 100644
++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
++++++ 
++++++ _CONFIG_FOR_DOC = "DeepseekConfig"
++++++ 
+++++++_attn_mask_cache = {}
+++++++
+++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+++++++    q_len = batch_and_seq[1]
+++++++    kv_len = batch_and_seq[1] + past_key_values_length 
+++++++    key = (batch_and_seq[0], q_len, kv_len)
+++++++
+++++++    if key in _attn_mask_cache:
+++++++        return _attn_mask_cache[key]
+++++++
+++++++    mask = _prepare_4d_causal_attention_mask(
+++++++        attention_mask,
+++++++        batch_and_seq,
+++++++        inputs_embeds,
+++++++        past_key_values_length,
+++++++    )
+++++++    _attn_mask_cache[key] = mask
+++++++    return mask
++++++ 
++++++ def _get_unpad_data(attention_mask):
++++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
++++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
++++++         return final_output
++++++ 
++++++ 
++++++-    @no_grad()
++++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++-        expert_cache = ops.zeros_like(x)
++++++-        idxs = flat_expert_indices.argsort()
++++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++-        token_idxs = idxs // self.num_experts_per_tok
++++++-
++++++-        for i, end_idx in enumerate(tokens_per_expert):
++++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++-            if start_idx == end_idx:
++++++-                continue
++++++-            expert = self.experts[i]
++++++-            exp_token_idx = token_idxs[start_idx:end_idx]
++++++-            expert_tokens = x[exp_token_idx]
++++++-            expert_out = expert(expert_tokens)
++++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++-
++++++-        return expert_cache
++++++-        
++++++     # @no_grad()
++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-    #     # expert_cache = torch.zeros_like(x)
++++++-    #     # idxs = flat_expert_indices.argsort()
++++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++-    #     # token_idxs = idxs // self.num_experts_per_tok
++++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++-    #     #     if start_idx == end_idx:
++++++-    #     #         continue
++++++-    #     #     expert = self.experts[i]
++++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++-    #     #     expert_tokens = x[exp_token_idx]
++++++-    #     #     expert_out = expert(expert_tokens)
++++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++-    #     # return expert_cache
+++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++     #     expert_cache = ops.zeros_like(x)
++++++     #     idxs = flat_expert_indices.argsort()
++++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
++++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++ 
++++++     #     return expert_cache
++++++-    # @no_grad()
++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-    #     expert_cache = ops.zeros_like(x)
+++++++        
+++++++    @no_grad()
+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++        """
+++++++        优化版 MoE prefill：
+++++++        - 批量张量化处理同一个 expert 的所有 token
+++++++        - 跳过无 token 的专家
+++++++        - 保持结果完全一致
+++++++        """
+++++++        # 初始化输出缓存
+++++++        expert_cache = ops.zeros_like(x)
++++++ 
++++++-    #     # 排序保证顺序一致
++++++-    #     idxs = flat_expert_indices.argsort()
++++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++-    #     token_idxs = idxs // self.num_experts_per_tok
+++++++        # 排序（确保 scatter_add 位置对应原逻辑）
+++++++        idxs = flat_expert_indices.argsort()
+++++++        sorted_expert_indices = flat_expert_indices[idxs]
+++++++        sorted_token_indices = idxs // self.num_experts_per_tok
++++++ 
++++++-    #     # 找出有 token 的专家
++++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++        # 每个 expert 的 token 数
+++++++        tokens_per_expert = sorted_expert_indices.bincount()
++++++ 
++++++-    #     for i in active_experts.tolist():
++++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++-    #         end_idx = tokens_per_expert[i]
++++++-    #         if start_idx == end_idx:  # 没有 token
++++++-    #             continue
+++++++        # 找出有 token 的专家
+++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++++++ 
++++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++-    #         expert_tokens = x[exp_token_idx]
++++++-    #         expert_out = self.experts[i](expert_tokens)
++++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++        for expert_id in active_experts.tolist():
+++++++            # 取该 expert 对应的排序后 token 区间
+++++++            start = (tokens_per_expert[:expert_id]).sum().item()
+++++++            end = start + tokens_per_expert[expert_id].item()
++++++ 
++++++-    #         expert_cache = mindspore.mint.scatter_add(
++++++-    #             expert_cache,
++++++-    #             0,
++++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++-    #             expert_out
++++++-    #         )
+++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+++++++            expert_tokens = x[token_idx]                     # 取输入向量
++++++ 
++++++-    #     return expert_cache
+++++++            # 执行专家 MLP
+++++++            expert_out = self.experts[expert_id](expert_tokens)
+++++++
+++++++            # 按权重缩放
+++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+++++++
+++++++            # 回写到缓存（等价 scatter_add）
+++++++            expert_cache = mindspore.mint.scatter_add(
+++++++                expert_cache,
+++++++                0,
+++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++                scaled_out
+++++++            )
+++++++
+++++++        return expert_cache
+++++++
+++++++        # @no_grad()
+++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++        #     # expert_cache = torch.zeros_like(x)
+++++++        #     # idxs = flat_expert_indices.argsort()
+++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++        #     # token_idxs = idxs // self.num_experts_per_tok
+++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++        #     #     if start_idx == end_idx:
+++++++        #     #         continue
+++++++        #     #     expert = self.experts[i]
+++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++        #     #     expert_tokens = x[exp_token_idx]
+++++++        #     #     expert_out = expert(expert_tokens)
+++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++        #     # return expert_cache
+++++++        #     expert_cache = ops.zeros_like(x)
+++++++        #     idxs = flat_expert_indices.argsort()
+++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++        #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++        #     for i, end_idx in enumerate(tokens_per_expert):
+++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++        #         if start_idx == end_idx:
+++++++        #             continue
+++++++        #         expert = self.experts[i]
+++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++        #         expert_tokens = x[exp_token_idx]
+++++++        #         expert_out = expert(expert_tokens)
+++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++
+++++++        #     return expert_cache
+++++++        # @no_grad()
+++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++        #     expert_cache = ops.zeros_like(x)
+++++++
+++++++        #     # 排序保证顺序一致
+++++++        #     idxs = flat_expert_indices.argsort()
+++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++        #     token_idxs = idxs // self.num_experts_per_tok
+++++++
+++++++        #     # 找出有 token 的专家
+++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++
+++++++        #     for i in active_experts.tolist():
+++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++        #         end_idx = tokens_per_expert[i]
+++++++        #         if start_idx == end_idx:  # 没有 token
+++++++        #             continue
+++++++
+++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++        #         expert_tokens = x[exp_token_idx]
+++++++        #         expert_out = self.experts[i](expert_tokens)
+++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++
+++++++        #         expert_cache = mindspore.mint.scatter_add(
+++++++        #             expert_cache,
+++++++        #             0,
+++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++        #             expert_out
+++++++        #         )
+++++++
+++++++        #     return expert_cache
++++++ 
++++++ 
++++++ 
++++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++-
++++++ # class DeepseekFlashAttention(nn.Module):
++++++ #     """
++++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
+++++++
++++++ Deepseek_ATTENTION_CLASSES = {
++++++     "eager": DeepseekAttention,
++++++     "flash-attention": DeepseekFlashAttention,
++++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
++++++             )
++++++         else:
++++++             # 4d mask is passed through the layers
++++++-            attention_mask = _prepare_4d_causal_attention_mask(
+++++++            # attention_mask = _prepare_4d_causal_attention_mask(
+++++++            #     attention_mask,
+++++++            #     (batch_size, seq_length),
+++++++            #     inputs_embeds,
+++++++            #     past_key_values_length,
+++++++            # )
+++++++            #@dwj
+++++++            attention_mask = get_cached_causal_mask(
++++++                 attention_mask,
++++++                 (batch_size, seq_length),
++++++                 inputs_embeds,
++++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
++++++         self.warm_up = False
+++++++        #@dwj
+++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+++++++            self.num_layers,
+++++++            self.num_attention_heads,
+++++++            self.head_dim,
+++++++            batch_size=1,
+++++++            max_length=self.max_length,
+++++++            dtype=mindspore.float16
+++++++        )
+++++++
+++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+++++++        key_cache = []
+++++++        value_cache = []
+++++++        for _ in range(num_layers):
+++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+++++++            key_cache.append(k)
+++++++            value_cache.append(v)
+++++++        return key_cache, value_cache
+++++++
++++++ 
++++++     def warmup_moe_model_deep(self):
++++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++index bced285c..ebd7782e 100644
++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
++++++ 
++++++-Long_Prompt = False
++++++-PROMPT_LENGTH_THRESHOLD = 128
+++++++Long_Prompt = 1
+++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
+++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+++++++
+++++++_causal_mask_cache = {}
+++++++
+++++++def get_cached_causal_mask_with_cache_position(
+++++++    attention_mask: mindspore.Tensor,
+++++++    sequence_length: int,
+++++++    target_length: int,
+++++++    dtype: mindspore.dtype,
+++++++    min_dtype: float,
+++++++    cache_position: mindspore.Tensor,
+++++++    batch_size: int,
+++++++):
+++++++    """
+++++++    带缓存的 causal mask 构造函数
+++++++    """
+++++++    # q_len 是当前 query 长度
+++++++    q_len = sequence_length
+++++++    # kv_len 是 target_length
+++++++    kv_len = target_length
+++++++
+++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+++++++
+++++++    if key in _causal_mask_cache:
+++++++        return _causal_mask_cache[key]
+++++++
+++++++    # 调用原来的 mask 构造逻辑
+++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++++        attention_mask,
+++++++        sequence_length=sequence_length,
+++++++        target_length=target_length,
+++++++        dtype=dtype,
+++++++        min_dtype=min_dtype,
+++++++        cache_position=cache_position,
+++++++        batch_size=batch_size,
+++++++    )
+++++++    # 缓存结果
+++++++    _causal_mask_cache[key] = causal_mask
+++++++    return causal_mask
++++++ 
++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
++++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++++ 
++++++ 
++++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+++++++# class Qwen2MoeAttention(nn.Module):
+++++++#     """
+++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++++++#     and "Generating Long Sequences with Sparse Transformers".
+++++++#     """
+++++++
+++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++#         super().__init__()
+++++++#         self.config = config
+++++++#         self.layer_idx = layer_idx
+++++++#         if layer_idx is None:
+++++++#             logger.warning_once(
+++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++++#                 "when creating this class."
+++++++#             )
+++++++
+++++++#         self.hidden_size = config.hidden_size
+++++++#         self.num_heads = config.num_attention_heads
+++++++#         self.head_dim = self.hidden_size // self.num_heads
+++++++#         self.num_key_value_heads = config.num_key_value_heads
+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++#         self.max_position_embeddings = config.max_position_embeddings
+++++++#         self.rope_theta = config.rope_theta
+++++++#         self.is_causal = True
+++++++#         self.attention_dropout = config.attention_dropout
+++++++
+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++#             raise ValueError(
+++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+++++++#                 f" and `num_heads`: {self.num_heads})."
+++++++#             )
+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++
+++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++#             self.head_dim,
+++++++#             max_position_embeddings=self.max_position_embeddings,
+++++++#             base=self.rope_theta,
+++++++#         )
+++++++
+++++++#     def forward(
+++++++#         self,
+++++++#         hidden_states: mindspore.Tensor,
+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+++++++#         position_ids: Optional[mindspore.Tensor] = None,
+++++++#         past_key_value: Optional[Cache] = None,
+++++++#         output_attentions: bool = False,
+++++++#         use_cache: bool = False,
+++++++#         cache_position: Optional[mindspore.Tensor] = None,
+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++
+++++++        
+++++++
+++++++#         bsz, q_len, _ = hidden_states.shape
+++++++
+++++++#         query_states = self.q_proj(hidden_states)
+++++++#         key_states = self.k_proj(hidden_states)
+++++++#         value_states = self.v_proj(hidden_states)
+++++++
+++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++
+++++++#         kv_seq_len = key_states.shape[-2]
+++++++#         if past_key_value is not None:
+++++++#             if self.layer_idx is None:
+++++++#                 raise ValueError(
+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++#                     "with a layer index."
+++++++#                 )
+++++++#             if isinstance(past_key_value, StaticCache):
+++++++#                 kv_seq_len = key_states.shape[-2]
+++++++#             else:
+++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++
+++++++#         if past_key_value is not None:
+++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++            
+++++++#             if isinstance(past_key_value, StaticCache):
+++++++#                 kv_seq_len = key_states.shape[-2]
+++++++
+++++++#         # repeat k/v heads if n_kv_heads < n_heads
+++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++        
+++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++
+++++++#         if attention_mask is not None:
+++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++#             attn_weights = attn_weights + causal_mask
+++++++
+++++++#         # upcast attention to fp32
+++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++++#         attn_output = ops.matmul(attn_weights, value_states)
+++++++
+++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++++#             raise ValueError(
+++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++++++#                 f" {attn_output.shape}"
+++++++#             )
+++++++
+++++++#         attn_output = ops.transpose(attn_output, 1, 2)
+++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++
+++++++#         attn_output = self.o_proj(attn_output)
+++++++#         # @lwx
+++++++        
+++++++#         # max_seq_len = self.max_position_embeddings  # 2048
+++++++
+++++++#         # if attention_mask is not None:
+++++++#         #     # attention_mask: [B, 1, Sq, Sk]
+++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++
+++++++#         #     # pad 到 [max_seq_len, max_seq_len]
+++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++#         #     global_attention_mask = padded_mask
+++++++#         # else:
+++++++#         #     global_attention_mask = None
+++++++
+++++++
+++++++#         # sparse_mode=3
+++++++#         # attn_output = mindspore.ops.flash_attention_score(
+++++++#         #     query=query_states,
+++++++#         #     key=key_states,
+++++++#         #     value=value_states,
+++++++#         #     real_shift=None,
+++++++#         #     padding_mask=None,
+++++++
+++++++#         #     head_num=self.num_heads,
+++++++#         #     attn_mask=global_attention_mask,
+++++++#         #     keep_prob=1.0 - self.attention_dropout,
+++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++#         #     input_layout="BNSD",
+++++++#         #     pre_tokens=2147483647,
+++++++#         #     next_tokens=2147483647,
+++++++#         #     inner_precise=0,
+++++++#         #     drop_mask=None,
+++++++#         #     prefix=None,
+++++++#         #     actual_seq_qlen=None,
+++++++#         #     actual_seq_kvlen=None,
+++++++#         #     sparse_mode=sparse_mode,
+++++++#         # )
+++++++#         if not output_attentions:
+++++++#             attn_weights = None
+++++++
+++++++#         return attn_output, attn_weights, past_key_value
+++++++
++++++ class Qwen2MoeAttention(nn.Module):
++++++     """
++++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++++++-    and "Generating Long Sequences with Sparse Transformers".
++++++-    """
+++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
++++++ 
+++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+++++++
+++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+++++++    """
++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++         super().__init__()
++++++         self.config = config
++++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
++++++         if layer_idx is None:
++++++             logger.warning_once(
++++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++                 "when creating this class."
++++++             )
++++++ 
++++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
++++++         use_cache: bool = False,
++++++         cache_position: Optional[mindspore.Tensor] = None,
++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-
++++++         
++++++-
+++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
++++++         bsz, q_len, _ = hidden_states.shape
++++++ 
++++++         query_states = self.q_proj(hidden_states)
++++++         key_states = self.k_proj(hidden_states)
++++++         value_states = self.v_proj(hidden_states)
++++++ 
++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++-
+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++        
++++++         kv_seq_len = key_states.shape[-2]
++++++         if past_key_value is not None:
++++++-            if self.layer_idx is None:
++++++-                raise ValueError(
++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-                    "with a layer index."
++++++-                )
++++++-            if isinstance(past_key_value, StaticCache):
++++++-                kv_seq_len = key_states.shape[-2]
++++++-            else:
++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++        
++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++ 
++++++         if past_key_value is not None:
++++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++
+++++++        # --- 2. 动态调度核心注意力计算 ---
+++++++        global Long_Prompt
+++++++        if Long_Prompt >= 1:
+++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+++++++            fa_attention_mask = None
+++++++            if attention_mask is not None:
+++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++                fa_attention_mask = (mask_slice != 0)
+++++++
+++++++            attn_output = mindspore.ops.flash_attention_score(
+++++++                query=query_states,
+++++++                key=key_states,
+++++++                value=value_states,
+++++++                head_num=self.num_heads,
+++++++                attn_mask=fa_attention_mask,
+++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++                input_layout="BNSD",
+++++++                sparse_mode=0,
+++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+++++++            )
++++++             
++++++-            if isinstance(past_key_value, StaticCache):
++++++-                kv_seq_len = key_states.shape[-2]
+++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++            attn_output = self.o_proj(attn_output)
+++++++            attn_weights = None
+++++++            if output_attentions:
+++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
++++++ 
++++++-        # repeat k/v heads if n_kv_heads < n_heads
++++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++-        
++++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++        else:
+++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
+++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++            
+++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++ 
++++++-        if attention_mask is not None:
++++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++-            attn_weights = attn_weights + causal_mask
+++++++            if attention_mask is not None:
+++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++                attn_weights = attn_weights + causal_mask
++++++ 
++++++-        # upcast attention to fp32
++++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++++-        attn_output = ops.matmul(attn_weights, value_states)
+++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++++            attn_output = ops.matmul(attn_weights, value_states)
++++++ 
++++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++++-            raise ValueError(
++++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++++++-                f" {attn_output.shape}"
++++++-            )
+++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++++                raise ValueError(
+++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+++++++                )
++++++ 
++++++-        attn_output = ops.transpose(attn_output, 1, 2)
++++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++            attn_output = ops.transpose(attn_output, 1, 2)
+++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++            attn_output = self.o_proj(attn_output)
++++++ 
++++++-        attn_output = self.o_proj(attn_output)
++++++-        # @lwx
+++++++            if not output_attentions:
+++++++                attn_weights = None
++++++         
++++++-        # max_seq_len = self.max_position_embeddings  # 2048
++++++-
++++++-        # if attention_mask is not None:
++++++-        #     # attention_mask: [B, 1, Sq, Sk]
++++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++-
++++++-        #     # pad 到 [max_seq_len, max_seq_len]
++++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++-        #     global_attention_mask = padded_mask
++++++-        # else:
++++++-        #     global_attention_mask = None
++++++-
++++++-
++++++-        # sparse_mode=3
++++++-        # attn_output = mindspore.ops.flash_attention_score(
++++++-        #     query=query_states,
++++++-        #     key=key_states,
++++++-        #     value=value_states,
++++++-        #     real_shift=None,
++++++-        #     padding_mask=None,
++++++-
++++++-        #     head_num=self.num_heads,
++++++-        #     attn_mask=global_attention_mask,
++++++-        #     keep_prob=1.0 - self.attention_dropout,
++++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-        #     input_layout="BNSD",
++++++-        #     pre_tokens=2147483647,
++++++-        #     next_tokens=2147483647,
++++++-        #     inner_precise=0,
++++++-        #     drop_mask=None,
++++++-        #     prefix=None,
++++++-        #     actual_seq_qlen=None,
++++++-        #     actual_seq_kvlen=None,
++++++-        #     sparse_mode=sparse_mode,
++++++-        # )
++++++-        if not output_attentions:
++++++-            attn_weights = None
++++++-
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++-
++++++ # class Qwen2MoeFlashAttention(nn.Module):
++++++ #     """
++++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
++++++ #             return final_hidden_states, router_logits
++++++ 
++++++ 
++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-#     """
++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
++++++-#     """
++++++-#     def __init__(self, config: Qwen2MoeConfig):
++++++-#         super().__init__()
++++++-#         self.num_experts = config.num_experts
++++++-#         self.top_k = config.num_experts_per_tok
++++++-#         self.norm_topk_prob = config.norm_topk_prob
++++++-
++++++-#         # 门控网络
++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++-#         # 专家列表
++++++-#         self.experts = nn.ModuleList(
++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++-#         )
++++++-#         # 共享专家
++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_decode(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         """
++++++-#         【解码路径】针对 sequence_length=1 的极致优化。
++++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++++++-#         """
++++++-#         batch_size, hidden_dim = hidden_states.shape
++++++-        
++++++-#         expert_outputs_list = [
++++++-#             ops.cat([
++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++-#             ], dim=0) 
++++++-#             for i in range(batch_size)
++++++-#         ]
++++++-        
++++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++++++-#         # shape: (batch_size, top_k, hidden_dim)
++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++-        
++++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++-        
++++++-#         return moe_output.squeeze(1)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_prefill(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         """
++++++-#         【预填充路径】针对 sequence_length > 1 的优化。
++++++-#         按专家对 Token 进行分组，并进行批处理。
++++++-#         """
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens = hidden_states.shape[0]
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-        
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-        
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-        
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-            
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-#             selected_token_indices = token_indices[mask]
++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++++-            
++++++-#             current_states = hidden_states[selected_token_indices]
++++++-            
++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-            
++++++-#             moe_output = moe_output.index_add(
++++++-#                 dim=0,
++++++-#                 index=selected_token_indices,
++++++-#                 source=expert_output.to(hidden_states.dtype)
++++++-#             )
++++++-#         return moe_output
++++++-
++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-#         """
++++++-#         顶层 forward 方法，作为智能分发器。
++++++-#         """
++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        
++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-#         router_logits = self.gate(hidden_states_reshaped)
++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-
++++++-#         if self.norm_topk_prob:
++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        
++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++-        
++++++-#         moe_output = None
++++++-#         # 在推理时，根据序列长度选择最优路径
++++++-#         if not self.training:
++++++-#             if sequence_length == 1:
++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++-#             else:
++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++-#         else:
++++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++++++-#             raise NotImplementedError("Training path is not implemented.")
++++++-
++++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++++++-        
++++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++++++-        
++++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++++++-        
++++++-#         return final_hidden_states, router_logits
++++++-
++++++-
++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-#     """
++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++++++-#     """
++++++-#     def __init__(self, config: Qwen2MoeConfig):
++++++-#         super().__init__()
++++++-#         self.num_experts = config.num_experts
++++++-#         self.top_k = config.num_experts_per_tok
++++++-#         self.norm_topk_prob = config.norm_topk_prob
++++++-
++++++-#         # 门控网络
++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++-#         # 专家列表
++++++-#         self.experts = nn.ModuleList(
++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++-#         )
++++++-#         # 共享专家
++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_decode(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         batch_size, _ = hidden_states.shape
++++++-#         expert_outputs_list = [
++++++-#             ops.cat([
++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++-#             ], dim=0) 
++++++-#             for i in range(batch_size)
++++++-#         ]
++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++-#         return moe_output.squeeze(1)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_prefill(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens = hidden_states.shape[0]
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-        
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-#             selected_token_indices = token_indices[mask]
++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++++-#             current_states = hidden_states[selected_token_indices]
++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-#             moe_output = moe_output.index_add(
++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++-#             )
++++++-#         return moe_output
++++++-
++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-#         """
++++++-#         顶层 forward 方法，作为智能分发器。
++++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++++++-#         """
++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        
++++++-#         # 1. 门控计算 (通用逻辑)
++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-#         router_logits = self.gate(hidden_states_reshaped)
++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-
++++++-#         if self.norm_topk_prob:
++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        
++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++-        
++++++-#         # 2. 智能分发到最优 MoE 路径
++++++-#         moe_output = None
++++++-#         if not self.training:
++++++-#             if sequence_length == 1:
++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++-#             else:
++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++-#         else:
++++++-#             raise NotImplementedError("Training path is not implemented.")
++++++-
++++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++-        
++++++-#         # 4. 合并 MoE 输出和共享专家输出
++++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++-        
++++++-#         # 5. 恢复原始形状并返回
++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++-        
++++++-#         return final_hidden_states, router_logits
++++++-
++++++-# prefill fastest
++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-#     """
++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++++++-#     """
++++++-#     def __init__(self, config: Qwen2MoeConfig):
++++++-#         super().__init__()
++++++-#         self.num_experts = config.num_experts
++++++-#         self.top_k = config.num_experts_per_tok
++++++-#         self.norm_topk_prob = config.norm_topk_prob
++++++-
++++++-#         # 门控网络
++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++-#         # 专家列表
++++++-#         self.experts = nn.ModuleList(
++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++-#         )
++++++-#         # 共享专家
++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_dispatch(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         """
++++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++++++-#         """
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens, _ = hidden_states.shape
++++++-        
++++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-#         flat_routing_weights = routing_weights.flatten()
++++++-
++++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-
++++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-        
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-            
++++++-#             # 找到所有分配给该专家的 token
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-            
++++++-#             # 使用 mask 选取对应的 token 和权重
++++++-#             current_token_indices = token_indices[mask]
++++++-#             current_routing_weights = flat_routing_weights[mask]
++++++-#             current_hidden_states = hidden_states[current_token_indices]
++++++-            
++++++-#             # 对这些 token 进行批处理
++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++-            
++++++-#             # 使用 index_add 将结果精确地加回到对应位置
++++++-#             moe_output = moe_output.index_add(
++++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++++++-#             )
++++++-#         return moe_output
++++++-
++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-#         """
++++++-#         顶层 forward 方法，作为智能分发器。
++++++-#         """
++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        
++++++-#         # 1. 门控计算
++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-#         router_logits = self.gate(hidden_states_reshaped)
++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-
++++++-#         if self.norm_topk_prob:
++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        
++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++-        
++++++-#         # 2. 调用统一的 MoE 计算内核
++++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
++++++-
++++++-#         # 3. 统一处理共享专家
++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++-        
++++++-#         # 4. 合并输出
++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++-        
++++++-#         # 5. 恢复原始形状并返回
++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++-        
++++++-#         return final_hidden_states, router_logits
++++++-
++++++-
++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-#     """
++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++-#     【最终高性能与高精度版】：
++++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++++++-#     3. 这样实现了速度和准确性的两全其美。
++++++-#     """
++++++-#     def __init__(self, config: Qwen2MoeConfig):
++++++-#         super().__init__()
++++++-#         self.num_experts = config.num_experts
++++++-#         self.top_k = config.num_experts_per_tok
++++++-#         self.norm_topk_prob = config.norm_topk_prob
++++++-
++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++-#         self.experts = nn.ModuleList(
++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++-#         )
++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_decode(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         """
++++++-#         【解码路径】极致优化版：bmm + 高精度累加。
++++++-#         """
++++++-#         original_dtype = hidden_states.dtype
++++++-#         batch_size, _ = hidden_states.shape
++++++-        
++++++-#         expert_outputs_list = [
++++++-#             ops.cat([
++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++-#             ], dim=0) 
++++++-#             for i in range(batch_size)
++++++-#         ]
++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++-
++++++-#         # 在 float32 下执行 bmm，得到高精度结果
++++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++-        
++++++-#         # 将高精度结果转换回原始数据类型
++++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++++++-        
++++++-#         return moe_output
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_prefill(
++++++-#         self, 
++++++-#         hidden_states: mindspore.Tensor, 
++++++-#         selected_experts: mindspore.Tensor, 
++++++-#         routing_weights: mindspore.Tensor
++++++-#     ) -> mindspore.Tensor:
++++++-#         """
++++++-#         【预填充路径】与原始实现一致，结果精确。
++++++-#         """
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens, _ = hidden_states.shape
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-        
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-#             selected_token_indices = token_indices[mask]
++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++++-#             current_states = hidden_states[selected_token_indices]
++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-#             moe_output = moe_output.index_add(
++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++-#             )
++++++-#         return moe_output
++++++-
++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-        
++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-#         router_logits = self.gate(hidden_states_reshaped)
++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-
++++++-#         if self.norm_topk_prob:
++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        
++++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++++++-#         # 如果模型主体是 float16，后续再转换
++++++-        
++++++-#         moe_output = None
++++++-#         if not self.training:
++++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++++++-#             # _moe_infer_decode 内部会处理好类型转换
++++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++++++-#             if sequence_length == 1:
++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++-#             else:
++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++-#         else:
++++++-#             raise NotImplementedError("Training path is not implemented.")
++++++-
++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++-        
++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++-        
++++++-#         return final_hidden_states, router_logits
++++++-    
++++++-
++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-#     """
++++++-#     【融合版】一个混合专家模块，内置两种推理策略，
++++++-#     由外部全局变量 `Long_Prompt` 控制：
++++++-
++++++-#     - if Long_Prompt is True:  【精度优先模式】
++++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++++++-#       适用于处理长序列，避免误差累积。
++++++-
++++++-#     - if Long_Prompt is False: 【速度优先模式】
++++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
++++++-#     """
++++++-#     def __init__(self, config: Qwen2MoeConfig):
++++++-#         super().__init__()
++++++-#         self.num_experts = config.num_experts
++++++-#         self.top_k = config.num_experts_per_tok
++++++-#         self.norm_topk_prob = config.norm_topk_prob
++++++-
++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++-#         self.experts = nn.ModuleList(
++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++-#         )
++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++-
++++++-#     # --- 速度优先模式的辅助函数 ---
++++++-#     @no_grad()
++++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++-#         original_dtype = hidden_states.dtype
++++++-#         batch_size, _ = hidden_states.shape
++++++-#         expert_outputs_list = [
++++++-#             ops.cat([
++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++-#             ], dim=0) 
++++++-#             for i in range(batch_size)
++++++-#         ]
++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
++++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
++++++-
++++++-#     @no_grad()
++++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens, _ = hidden_states.shape
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-#             selected_token_indices = token_indices[mask]
++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
++++++-#             current_states = hidden_states[selected_token_indices]
++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++++++-#         return moe_output
++++++-
++++++-#     # --- 精度优先模式的辅助函数 ---
++++++-#     @no_grad()
++++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++-#         moe_output = ops.zeros_like(hidden_states)
++++++-#         num_tokens, _ = hidden_states.shape
++++++-#         flat_selected_experts = selected_experts.flatten()
++++++-#         flat_routing_weights = routing_weights.flatten()
++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++-#         active_experts = ops.unique(flat_selected_experts)
++++++-#         for expert_idx_tensor in active_experts:
++++++-#             expert_idx = expert_idx_tensor.item()
++++++-#             expert_layer = self.experts[expert_idx]
++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
++++++-#             current_token_indices = token_indices[mask]
++++++-#             current_routing_weights = flat_routing_weights[mask]
++++++-#             current_hidden_states = hidden_states[current_token_indices]
++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++++-#         return moe_output
++++++-
++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
++++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++++++-#         global Long_Prompt
++++++-
++++++-#         # 1. 门控计算 (所有模式通用)
++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-#         router_logits = self.gate(hidden_states_reshaped)
++++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++++-#         if self.norm_topk_prob:
++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-        
++++++-#         moe_output = None
++++++-#         if not self.training:
++++++-#             # 根据 Long_Prompt 标志选择模式
++++++-#             if Long_Prompt:
++++++-#                 # --- 精度优先模式 ---
++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++-#             else:
++++++-#                 # --- 速度优先模式 ---
++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++-#                 if sequence_length == 1:
++++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++-#                 else:
++++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++-#         else:
++++++-#             raise NotImplementedError("Training path is not implemented.")
++++++-
++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++-        
++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++-        
++++++-#         return final_hidden_states, router_logits
++++++-    
++++++ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++     """
++++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++         return moe_output_fp32.squeeze(1).to(original_dtype)
++++++ 
+++++++    # @no_grad()
+++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++    #     num_tokens, _ = hidden_states.shape
+++++++    #     flat_selected_experts = selected_experts.flatten()
+++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
+++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++++    #     original_token_indices = sorted_expert_indices // self.top_k
+++++++    #     moe_output = ops.zeros_like(hidden_states)
+++++++    #     current_token_offset = 0
+++++++    #     for i in range(self.num_experts):
+++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+++++++    #         if expert_token_count == 0:
+++++++    #             continue
+++++++    #         end_offset = current_token_offset + expert_token_count
+++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++    #         current_token_offset += expert_token_count
+++++++    #     return moe_output
+++++++
++++++     @no_grad()
++++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++-        num_tokens, _ = hidden_states.shape
++++++-        flat_selected_experts = selected_experts.flatten()
++++++-        sorted_expert_indices = flat_selected_experts.argsort()
++++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++++-        original_token_indices = sorted_expert_indices // self.top_k
+++++++        """
+++++++        优化版 MoE prefill (速度优先模式)：
+++++++        - 批量张量化处理同一个 expert 的所有 token
+++++++        - 跳过无 token 的专家
+++++++        - 保持结果完全一致
+++++++        """
++++++         moe_output = ops.zeros_like(hidden_states)
++++++-        current_token_offset = 0
++++++-        for i in range(self.num_experts):
++++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
++++++-            if expert_token_count == 0:
++++++-                continue
++++++-            end_offset = current_token_offset + expert_token_count
++++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
++++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++++-            current_token_offset += expert_token_count
+++++++
+++++++        flat_selected_experts = selected_experts.flatten()
+++++++        flat_routing_weights = routing_weights.flatten()
+++++++
+++++++        idxs = flat_selected_experts.argsort()
+++++++        sorted_expert_indices = flat_selected_experts[idxs]
+++++++        sorted_token_indices = idxs // self.top_k
+++++++
+++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+++++++
+++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++++++
+++++++        for expert_id in active_experts.tolist():
+++++++            start = int(tokens_per_expert[:expert_id].sum().item())
+++++++            end = start + int(tokens_per_expert[expert_id].item())
+++++++
+++++++            token_idx = sorted_token_indices[start:end]
+++++++            expert_tokens = hidden_states[token_idx]
+++++++
+++++++            expert_out = self.experts[expert_id](expert_tokens)
+++++++
+++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+++++++
+++++++            moe_output = mindspore.mint.scatter_add(
+++++++                moe_output,
+++++++                0,
+++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+++++++                scaled_out.to(hidden_states.dtype)
+++++++            )
+++++++
++++++         return moe_output
++++++ 
+++++++
++++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++++++     @no_grad()
++++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++         
++++++         moe_output = None
++++++-        if Long_Prompt:
++++++-            # --- 精度优先模式 (ACCURACY MODE) ---
++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        # if Long_Prompt==0:
+++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        # else:
+++++++        #     # --- 速度优先模式 (SPEED MODE) ---
+++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++        #     if sequence_length == 1:
+++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        #     else:
+++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++        
+++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++        if sequence_length == 1:
+++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++         else:
++++++-            # --- 速度优先模式 (SPEED MODE) ---
++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++-            if sequence_length == 1:
++++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++-            else:
++++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++-        
+++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++    
++++++ 
++++++         # 3. 共享专家计算与合并 (所有模式通用)
++++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++         
++++++         return final_hidden_states, router_logits
++++++ 
+++++++
++++++ class Qwen2MoeDecoderLayer(nn.Module):
++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
++++++         super().__init__()
++++++         self.hidden_size = config.hidden_size
++++++         
++++++-        # if Long_Prompt:
++++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++-        # else:
+++++++        # if Long_Prompt == 2:
++++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++        # else:
+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++ 
++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++ 
++++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++++             )
++++++ 
++++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
++++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++++        #     attention_mask,
+++++++        #     sequence_length=sequence_length,
+++++++        #     target_length=target_length,
+++++++        #     dtype=dtype,
+++++++        #     min_dtype=min_dtype,
+++++++        #     cache_position=cache_position,
+++++++        #     batch_size=input_tensor.shape[0],
+++++++        # )
+++++++        #@dwj
+++++++        causal_mask = get_cached_causal_mask_with_cache_position(
++++++             attention_mask,
++++++             sequence_length=sequence_length,
++++++             target_length=target_length,
++++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++++++         """
++++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+++++++        _causal_mask_cache.clear()
++++++ 
++++++         input_ids = kwargs.get("input_ids")
++++++         if input_ids is None and args:
++++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++ 
++++++         if input_ids is not None:
++++++             prompt_length = input_ids.shape[1]
++++++-            
++++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++++++-                Long_Prompt = True
+++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+++++++                Long_Prompt = 2
+++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+++++++                Long_Prompt = 0
++++++             else:
++++++-                Long_Prompt = False
+++++++                Long_Prompt = 1
+++++++
++++++ 
++++++         return super().generate(*args, **kwargs)
++++++     
++++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++             dtype = self.lm_head.weight.dtype
++++++             min_dtype = float(ops.finfo(dtype).min)
++++++ 
++++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+++++++            #     attention_mask,
+++++++            #     sequence_length=sequence_length,
+++++++            #     target_length=past_key_values.get_max_length(),
+++++++            #     dtype=dtype,
+++++++            #     min_dtype=min_dtype,
+++++++            #     cache_position=cache_position,
+++++++            #     batch_size=batch_size,
+++++++            # )
+++++++
+++++++            #@dwj
+++++++            attention_mask = get_cached_causal_mask_with_cache_position(
++++++                 attention_mask,
++++++                 sequence_length=sequence_length,
++++++                 target_length=past_key_values.get_max_length(),
++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++++deleted file mode 100644
++++++index 6dfb5b93..00000000
++++++--- a/patches/0001-20251104commit.patch
+++++++++ /dev/null
++++++@@ -1,1272 +0,0 @@
++++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++++-From: Pinoeer-kingxi <13022943007@163.com>
++++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
++++++-Subject: [PATCH] 20251104commit
++++++-
++++++----
++++++- mindnlp/transformers/cache_utils.py           |  28 +-
++++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++++- 3 files changed, 976 insertions(+), 87 deletions(-)
++++++-
++++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++++-index cadd2e04..02f8d4be 100644
++++++---- a/mindnlp/transformers/cache_utils.py
++++++-+++ b/mindnlp/transformers/cache_utils.py
++++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++++-             # k_out[:, :, cache_position] = key_states
++++++-             # v_out[:, :, cache_position] = value_states
++++++--            if ON_ORANGE_PI:
++++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++--            else:
++++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++--
++++++-+            # if ON_ORANGE_PI:
++++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++-+            # else:
++++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
++++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++++-+            if cache_position.ndim > 1:
++++++-+                cache_position = cache_position.flatten()
++++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
++++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++++-+                cache_position = cache_position.int()
++++++-+            
++++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++++-+            k_out[:, :, cache_position] = key_states
++++++-+            v_out[:, :, cache_position] = value_states
++++++-+            
++++++-         return k_out, v_out
++++++- 
++++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++-index c695b944..d8303e45 100644
++++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
++++++- def rotate_half(x):
++++++-     """Rotates half the hidden dims of the input."""
++++++--    x1 = x[..., : x.shape[-1] // 2]
++++++--    x2 = x[..., x.shape[-1] // 2 :]
++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++-+    # x1 = x[..., : x.shape[-1] // 2]
++++++-+    # x2 = x[..., x.shape[-1] // 2 :]
++++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++-     return ops.cat((-x2, x1), dim=-1)
++++++- 
++++++- 
++++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++++-         if self.training:
++++++-             raise NotImplementedError("Training is not supported yet.")
++++++-         else:
++++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++--        if self.config.n_shared_experts is not None:
++++++--            y = y + self.shared_experts(identity)
++++++--        return y
++++++-+            # @lwx
++++++-+            if orig_shape[1] == 1:
++++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++++-+                y=y.view(*orig_shape)
++++++-+                if self.config.n_shared_experts is not None:
++++++-+                    y = y + self.shared_experts(identity)
++++++-+                return y
++++++-+            else:
++++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++++-+                if self.config.n_shared_experts is not None:
++++++-+                    y = y + self.shared_experts(identity)
++++++-+                return y
++++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++-+        # if self.config.n_shared_experts is not None:
++++++-+        #     y = y + self.shared_experts(identity)
++++++-+        # return y
++++++-+        
++++++-+    @no_grad()
++++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++-+
++++++-+        expert_cache = ops.zeros_like(x)
++++++-+        for i in range(self.num_experts_per_tok):
++++++-+            expert_id = flat_expert_indices[i].item()
++++++-+            weight = flat_expert_weights[i].item()
++++++-+            expert = self.experts[expert_id]
++++++-+            expert_out = expert(x)
++++++-+            expert_cache += expert_out * weight
++++++-+        return expert_cache
++++++- 
++++++-     @no_grad()
++++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++--        # expert_cache = torch.zeros_like(x)
++++++--        # idxs = flat_expert_indices.argsort()
++++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++--        # token_idxs = idxs // self.num_experts_per_tok
++++++--        # for i, end_idx in enumerate(tokens_per_expert):
++++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++--        #     if start_idx == end_idx:
++++++--        #         continue
++++++--        #     expert = self.experts[i]
++++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++--        #     expert_tokens = x[exp_token_idx]
++++++--        #     expert_out = expert(expert_tokens)
++++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++--        # return expert_cache
++++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++-         expert_cache = ops.zeros_like(x)
++++++-         idxs = flat_expert_indices.argsort()
++++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++-         token_idxs = idxs // self.num_experts_per_tok
++++++-+
++++++-         for i, end_idx in enumerate(tokens_per_expert):
++++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++-             if start_idx == end_idx:
++++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++++-             expert_out = expert(expert_tokens)
++++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++-+
++++++-         return expert_cache
++++++-+        
++++++-+    # @no_grad()
++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-+    #     # expert_cache = torch.zeros_like(x)
++++++-+    #     # idxs = flat_expert_indices.argsort()
++++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
++++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++-+    #     #     if start_idx == end_idx:
++++++-+    #     #         continue
++++++-+    #     #     expert = self.experts[i]
++++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++-+    #     #     expert_tokens = x[exp_token_idx]
++++++-+    #     #     expert_out = expert(expert_tokens)
++++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++-+    #     # return expert_cache
++++++-+    #     expert_cache = ops.zeros_like(x)
++++++-+    #     idxs = flat_expert_indices.argsort()
++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
++++++-+
++++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++-+    #         if start_idx == end_idx:
++++++-+    #             continue
++++++-+    #         expert = self.experts[i]
++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++-+    #         expert_tokens = x[exp_token_idx]
++++++-+    #         expert_out = expert(expert_tokens)
++++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++-+
++++++-+    #     return expert_cache
++++++-+    # @no_grad()
++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++-+    #     expert_cache = ops.zeros_like(x)
++++++-+
++++++-+    #     # 排序保证顺序一致
++++++-+    #     idxs = flat_expert_indices.argsort()
++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
++++++-+
++++++-+    #     # 找出有 token 的专家
++++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++-+
++++++-+    #     for i in active_experts.tolist():
++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++-+    #         end_idx = tokens_per_expert[i]
++++++-+    #         if start_idx == end_idx:  # 没有 token
++++++-+    #             continue
++++++-+
++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++-+    #         expert_tokens = x[exp_token_idx]
++++++-+    #         expert_out = self.experts[i](expert_tokens)
++++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++-+
++++++-+    #         expert_cache = mindspore.mint.scatter_add(
++++++-+    #             expert_cache,
++++++-+    #             0,
++++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++-+    #             expert_out
++++++-+    #         )
++++++-+
++++++-+    #     return expert_cache
++++++-+
++++++-+
++++++- 
++++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++++- #     """
++++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++- 
++++++-         # Initialize weights and apply final processing
++++++-         self.post_init()
++++++-+        self.warm_up = False
++++++-+
++++++-+    def warmup_moe_model_deep(self):
++++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++-+        test_texts = [
++++++-+            "warmup short",
++++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++++-+        ]
++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++-+        if tokenizer is None:
++++++-+            from mindnlp.transformers import AutoTokenizer
++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++-+            self._warmup_tokenizer = tokenizer
++++++-+
++++++-+        for text in test_texts:
++++++-+            inputs = tokenizer(text, return_tensors="ms")
++++++-+            with mindspore._no_grad():
++++++-+                _ = self(**inputs, use_cache=False)
++++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++++- 
++++++-     def get_input_embeddings(self):
++++++-         return self.model.embed_tokens
++++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++-         ```"""
++++++-+        if not self.warm_up:
++++++-+            self.warm_up = True
++++++-+            self.warmup_moe_model_deep()
++++++-+
++++++-         output_attentions = (
++++++-             output_attentions
++++++-             if output_attentions is not None
++++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++-index 3cbf820e..d4c6b651 100644
++++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++-@@ -18,7 +18,6 @@
++++++- # See the License for the specific language governing permissions and
++++++- # limitations under the License.
++++++- """MindSpore Qwen2MoE model."""
++++++--
++++++- import math
++++++- from typing import List, Optional, Tuple, Union
++++++- 
++++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++++-     TokenClassifierOutput,
++++++- )
++++++- from ...modeling_utils import PreTrainedModel
++++++-+from ...generation import GenerationMixin
++++++- from ....utils import logging
++++++- from .configuration_qwen2_moe import Qwen2MoeConfig
++++++- 
++++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++++-         self.variance_epsilon = eps
++++++- 
++++++-     def forward(self, hidden_states):
++++++-+        # @dwj
++++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++-+        # @lwx
++++++-+        # if not self.training :
++++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++-         input_dtype = hidden_states.dtype
++++++-         hidden_states = hidden_states.to(mindspore.float32)
++++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++++-@@ -234,6 +239,8 @@ def rotate_half(x):
++++++-     """Rotates half the hidden dims of the input."""
++++++-     x1 = x[..., : x.shape[-1] // 2]
++++++-     x2 = x[..., x.shape[-1] // 2 :]
++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++-     return ops.cat((-x2, x1), dim=-1)
++++++- 
++++++- 
++++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++++-         self.config = config
++++++-         self.hidden_size = config.hidden_size
++++++-         self.intermediate_size = intermediate_size
++++++-+        
++++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++++-         self.act_fn = ACT2FN[config.hidden_act]
++++++- 
++++++-     def forward(self, x):
++++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++--
++++++- 
++++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++-+        # @lwx 
++++++-+        # gate_up_output = self.gate_up_proj(x)
++++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++++-+        # return self.down_proj(swiglu_output)
++++++-+
++++++-+    # def forward(self, x):
++++++-+    #     gate_proj_out = self.gate_proj(x)
++++++-+    #     up_proj_out = self.up_proj(x)
++++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++++-+    #     return self.down_proj(swiglu_out)
++++++-+        
++++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++++-     """
++++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++++-         use_cache: bool = False,
++++++-         cache_position: Optional[mindspore.Tensor] = None,
++++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-+
++++++-+        
++++++-+
++++++-         bsz, q_len, _ = hidden_states.shape
++++++- 
++++++-         query_states = self.q_proj(hidden_states)
++++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-                     "with a layer index."
++++++-                 )
++++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-+            if isinstance(past_key_value, StaticCache):
++++++-+                kv_seq_len = key_states.shape[-2]
++++++-+            else:
++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++- 
++++++-         if past_key_value is not None:
++++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++-+            
++++++-+            if isinstance(past_key_value, StaticCache):
++++++-+                kv_seq_len = key_states.shape[-2]
++++++- 
++++++-         # repeat k/v heads if n_kv_heads < n_heads
++++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++--
++++++-+        
++++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++- 
++++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++++--            raise ValueError(
++++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++++--                f" {attn_weights.shape}"
++++++--            )
++++++--
++++++--        if attention_mask is not None:  # no matter the length, we just slice it
++++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++++-+        if attention_mask is not None:
++++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++-             attn_weights = attn_weights + causal_mask
++++++- 
++++++-         # upcast attention to fp32
++++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++- 
++++++-         attn_output = self.o_proj(attn_output)
++++++--
++++++-+        # @lwx
++++++-+        
++++++-+        # max_seq_len = self.max_position_embeddings  # 2048
++++++-+
++++++-+        # if attention_mask is not None:
++++++-+        #     # attention_mask: [B, 1, Sq, Sk]
++++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++-+
++++++-+        #     # pad 到 [max_seq_len, max_seq_len]
++++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++-+        #     global_attention_mask = padded_mask
++++++-+        # else:
++++++-+        #     global_attention_mask = None
++++++-+
++++++-+
++++++-+        # sparse_mode=3
++++++-+        # attn_output = mindspore.ops.flash_attention_score(
++++++-+        #     query=query_states,
++++++-+        #     key=key_states,
++++++-+        #     value=value_states,
++++++-+        #     real_shift=None,
++++++-+        #     padding_mask=None,
++++++-+
++++++-+        #     head_num=self.num_heads,
++++++-+        #     attn_mask=global_attention_mask,
++++++-+        #     keep_prob=1.0 - self.attention_dropout,
++++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-+        #     input_layout="BNSD",
++++++-+        #     pre_tokens=2147483647,
++++++-+        #     next_tokens=2147483647,
++++++-+        #     inner_precise=0,
++++++-+        #     drop_mask=None,
++++++-+        #     prefix=None,
++++++-+        #     actual_seq_qlen=None,
++++++-+        #     actual_seq_kvlen=None,
++++++-+        #     sparse_mode=sparse_mode,
++++++-+        # )
++++++-         if not output_attentions:
++++++-             attn_weights = None
++++++- 
++++++-         return attn_output, attn_weights, past_key_value
++++++- 
++++++- 
++++++-+class Qwen2MoeFlashAttention(nn.Module):
++++++-+    """
++++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++-+
++++++-+    关键改动:
++++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++-+       直接传入原始的 key 和 value 张量效率更高。
++++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++-+    """
++++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++-+        super().__init__()
++++++-+        self.config = config
++++++-+        self.layer_idx = layer_idx
++++++-+        self.hidden_size = config.hidden_size
++++++-+        self.num_heads = config.num_attention_heads
++++++-+        self.head_dim = self.hidden_size // self.num_heads
++++++-+        self.num_key_value_heads = config.num_key_value_heads
++++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++-+        self.max_position_embeddings = config.max_position_embeddings
++++++-+        self.rope_theta = config.rope_theta
++++++-+        self.attention_dropout = config.attention_dropout
++++++-+
++++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++-+            raise ValueError(
++++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++-+            )
++++++-+
++++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++-+
++++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++-+            self.head_dim,
++++++-+            max_position_embeddings=self.max_position_embeddings,
++++++-+            base=self.rope_theta,
++++++-+        )
++++++-+
++++++-+    def forward(
++++++-+        self,
++++++-+        hidden_states: mindspore.Tensor,
++++++-+        attention_mask: Optional[mindspore.Tensor] = None,
++++++-+        position_ids: Optional[mindspore.Tensor] = None,
++++++-+        past_key_value: Optional[Cache] = None,
++++++-+        output_attentions: bool = False,
++++++-+        use_cache: bool = False,
++++++-+        cache_position: Optional[mindspore.Tensor] = None,
++++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-+
++++++-+        bsz, q_len, _ = hidden_states.shape
++++++-+
++++++-+        # 1. 线性投射 Q, K, V
++++++-+        query_states = self.q_proj(hidden_states)
++++++-+        key_states = self.k_proj(hidden_states)
++++++-+        value_states = self.v_proj(hidden_states)
++++++-+
++++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+
++++++-+        # 3. RoPE 旋转位置编码
++++++-+        kv_seq_len = key_states.shape[-2]
++++++-+        if past_key_value is not None:
++++++-+            if self.layer_idx is None:
++++++-+                raise ValueError(
++++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-+                    "with a layer index."
++++++-+                )
++++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++-+                if cache_position.shape[0] == 1:
++++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++-+                    kv_seq_len = past_seen_tokens + 1
++++++-+                else:
++++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
++++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++-+            else:
++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-+
++++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-+
++++++-+        # 4. KV 缓存更新
++++++-+        if past_key_value is not None:
++++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-+            key_states, value_states = past_key_value.update(
++++++-+                key_states, value_states, self.layer_idx, cache_kwargs
++++++-+            )
++++++-+            
++++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++-+                if cache_position.shape[0] == 1:
++++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++-+                    kv_seq_len = key_states.shape[-2]
++++++-+
++++++-+        # 5. [重要] 准备 Attention Mask
++++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++-+        fa_attention_mask = None
++++++-+        if attention_mask is not None:
++++++-+            # 截取与当前key长度匹配的部分
++++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++-+            fa_attention_mask = (mask_slice != 0)
++++++-+
++++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++-+        input_dtype = query_states.dtype
++++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++-+            query_states = query_states.to(mindspore.float16)
++++++-+            key_states = key_states.to(mindspore.float16)
++++++-+            value_states = value_states.to(mindspore.float16)
++++++-+
++++++-+        # 6. [核心] 调用 flash_attention_score 算子
++++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++-+        attn_output = mindspore.ops.flash_attention_score(
++++++-+            query=query_states,
++++++-+            key=key_states,
++++++-+            value=value_states,
++++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
++++++-+            attn_mask=fa_attention_mask,
++++++-+            keep_prob=1.0 - self.attention_dropout,
++++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-+            input_layout="BNSD",
++++++-+            sparse_mode=0 # 使用 defaultMask 模式
++++++-+        )
++++++-+
++++++-+        # 恢复原始数据类型
++++++-+        attn_output = attn_output.to(input_dtype)
++++++-+
++++++-+        # 7. 调整输出形状
++++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-+        attn_output = self.o_proj(attn_output)
++++++-+
++++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
++++++-+        attn_weights = None
++++++-+        if output_attentions:
++++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++-+
++++++-+        return attn_output, attn_weights, past_key_value
++++++-+
++++++-+    # def forward(
++++++-+    #     self,
++++++-+    #     hidden_states: mindspore.Tensor,
++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++++++-+    #     past_key_value: Optional[Cache] = None,
++++++-+    #     output_attentions: bool = False,
++++++-+    #     use_cache: bool = False,
++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-+
++++++-+    #     bsz, q_len, _ = hidden_states.shape
++++++-+
++++++-+    #     # 1. 线性投射 Q, K, V
++++++-+    #     query_states = self.q_proj(hidden_states)
++++++-+    #     key_states = self.k_proj(hidden_states)
++++++-+    #     value_states = self.v_proj(hidden_states)
++++++-+
++++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+
++++++-+    #     # 3. RoPE 旋转位置编码
++++++-+    #     kv_seq_len = key_states.shape[-2]
++++++-+    #     if past_key_value is not None:
++++++-+    #         if self.layer_idx is None:
++++++-+    #             raise ValueError(
++++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-+    #                 "with a layer index."
++++++-+    #             )
++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-+
++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-+
++++++-+    #     # 4. KV 缓存更新
++++++-+    #     if past_key_value is not None:
++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-+    #         key_states, value_states = past_key_value.update(
++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++-+    #         )
++++++-+
++++++-+    #     # 5. 准备 Attention Mask
++++++-+    #     fa_attention_mask = None
++++++-+    #     if attention_mask is not None:
++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-+    #         fa_attention_mask = (mask_slice != 0)
++++++-+
++++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++-+    #     input_dtype = query_states.dtype
++++++-+
++++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
++++++-+    #         query=query_states,
++++++-+    #         key=key_states,
++++++-+    #         value=value_states,
++++++-+    #         head_num=self.num_heads,
++++++-+    #         attn_mask=fa_attention_mask,
++++++-+    #         keep_prob=1.0 - self.attention_dropout,
++++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-+    #         input_layout="BNSD",
++++++-+    #         sparse_mode=0,
++++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++-+    #         inner_precise=1
++++++-+    #     )
++++++-+
++++++-+    #     # 恢复原始数据类型
++++++-+    #     attn_output = attn_output.to(input_dtype)
++++++-+
++++++-+    #     # 7. 调整输出形状
++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-+    #     attn_output = self.o_proj(attn_output)
++++++-+
++++++-+    #     attn_weights = None
++++++-+    #     if output_attentions:
++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++-+
++++++-+    #     return attn_output, attn_weights, past_key_value
++++++-+
++++++-+    # def forward(
++++++-+    #     self,
++++++-+    #     hidden_states: mindspore.Tensor,
++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
++++++-+    #     past_key_value: Optional[Cache] = None,
++++++-+    #     output_attentions: bool = False,
++++++-+    #     use_cache: bool = False,
++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-+
++++++-+    #     bsz, q_len, _ = hidden_states.shape
++++++-+
++++++-+    #     query_states = self.q_proj(hidden_states)
++++++-+    #     key_states = self.k_proj(hidden_states)
++++++-+    #     value_states = self.v_proj(hidden_states)
++++++-+
++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-+
++++++-+    #     kv_seq_len = key_states.shape[-2]
++++++-+    #     if past_key_value is not None:
++++++-+    #         if self.layer_idx is None:
++++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-+
++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-+
++++++-+    #     if past_key_value is not None:
++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++-+    #         key_states, value_states = past_key_value.update(
++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++-+    #         )
++++++-+
++++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++-+
++++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
++++++-+    #     # <--- 修改结束 ---
++++++-+
++++++-+    #     fa_attention_mask = None
++++++-+    #     if attention_mask is not None:
++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++-+    #         fa_attention_mask = (mask_slice != 0)
++++++-+
++++++-+    #     input_dtype = query_states.dtype
++++++-+
++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
++++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
++++++-+    #         key=key_states,
++++++-+    #         value=value_states,
++++++-+    #         head_num=self.num_heads,
++++++-+    #         attn_mask=fa_attention_mask,
++++++-+    #         keep_prob=1.0 - self.attention_dropout,
++++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++-+    #         input_layout="BNSD",
++++++-+    #         sparse_mode=0,
++++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
++++++-+    #     )
++++++-+
++++++-+    #     attn_output = attn_output.to(input_dtype)
++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-+    #     attn_output = self.o_proj(attn_output)
++++++-+
++++++-+    #     attn_weights = None
++++++-+    #     if output_attentions:
++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++-+
++++++-+    #     return attn_output, attn_weights, past_key_value
++++++-+
++++++- QWEN2MOE_ATTENTION_CLASSES = {
++++++-     "eager": Qwen2MoeAttention,
++++++-+    "flash-attention": Qwen2MoeFlashAttention,
++++++- }
++++++- 
++++++- 
++++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++- 
++++++-+    #@dwj
++++++-+    # 只遍历激活的专家，而非全部专家
++++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++--        hidden_states = hidden_states.view(-1, hidden_dim)
++++++--        # router_logits: (batch * sequence_length, n_experts)
++++++--        router_logits = self.gate(hidden_states)
++++++--
++++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++--        if self.norm_topk_prob:
++++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++--        # we cast back to the input dtype
++++++--        routing_weights = routing_weights.to(hidden_states.dtype)
++++++--
++++++--        final_hidden_states = ops.zeros(
++++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++++--        )
++++++--
++++++--        # One hot encode the selected experts to create an expert mask
++++++--        # this will be used to easily index which expert is going to be sollicitated
++++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++++--
++++++--        # Loop over all available experts in the model and perform the computation on each expert
++++++--        for expert_idx in range(self.num_experts):
++++++--            expert_layer = self.experts[expert_idx]
++++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++++--
++++++--            # Index the correct hidden states and compute the expert hidden state for
++++++--            # the current expert. We need to make sure to multiply the output hidden
++++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++++--            if 0 not in idx.shape:
++++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++++--
++++++--                # However `index_add_` only support torch tensors for indexing so we'll use
++++++--                # the `top_x` tensor here.
++++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++++--
++++++--        shared_expert_output = self.shared_expert(hidden_states)
++++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++++--
++++++--        final_hidden_states = final_hidden_states + shared_expert_output
++++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++-+            num_tokens = hidden_states_reshaped.shape[0]
++++++-+
++++++-+            router_logits = self.gate(hidden_states_reshaped)
++++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++-+
++++++-+            if self.norm_topk_prob:
++++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
++++++-+            
++++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++-+            flat_selected_experts = selected_experts.flatten()
++++++-+            
++++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++-+            token_indices = broadcasted_token_indices.flatten()
++++++-+            
++++++-+            active_experts = ops.unique(flat_selected_experts)
++++++-+            
++++++-+            for expert_idx_tensor in active_experts:
++++++-+                expert_idx = expert_idx_tensor.item()
++++++-+                expert_layer = self.experts[expert_idx]
++++++-+                
++++++-+                mask = (flat_selected_experts == expert_idx_tensor)
++++++-+                selected_token_indices = token_indices[mask]
++++++-+                selected_routing_weights = routing_weights.flatten()[mask]
++++++-+                
++++++-+                current_states = hidden_states_reshaped[selected_token_indices]
++++++-+                
++++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++-+                
++++++-+                final_hidden_states = final_hidden_states.index_add(
++++++-+                    dim=0,
++++++-+                    index=selected_token_indices,
++++++-+                    source=expert_output.to(hidden_states.dtype)
++++++-+                )
++++++-+            
++++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++- 
++++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++--        return final_hidden_states, router_logits
++++++-+            final_hidden_states = final_hidden_states + shared_expert_output
++++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++-+            
++++++-+            return final_hidden_states, router_logits
++++++- 
++++++- 
++++++- class Qwen2MoeDecoderLayer(nn.Module):
++++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++++- 
++++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++- 
++++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++-+
++++++-         if (layer_idx not in config.mlp_only_layers) and (
++++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++++-         ):
++++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++++-     _skip_keys_device_placement = "past_key_values"
++++++-     _supports_cache_class = True
++++++-+#lwx
++++++-+    # _supports_static_cache = True
++++++- 
++++++-     def _init_weights(self, module):
++++++-         std = self.config.initializer_range
++++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++++-         return causal_mask
++++++- 
++++++- 
++++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++-     _tied_weights_keys = ["lm_head.weight"]
++++++- 
++++++-     def __init__(self, config):
++++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++-         self.num_experts_per_tok = config.num_experts_per_tok
++++++-         # Initialize weights and apply final processing
++++++-         self.post_init()
++++++-+        # @lwx
++++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++++-+        #     self.generation_config.cache_implementation = "static"
++++++-+        self._warmed_up = False
++++++-+
++++++-+    def warmup_moe_model(self):
++++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++++-+        test_texts = [
++++++-+            "warmup short",
++++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++++-+        ]
++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++-+        if tokenizer is None:
++++++-+            from mindnlp.transformers import AutoTokenizer
++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++-+            self._warmup_tokenizer = tokenizer
++++++-+
++++++-+        for text in test_texts:
++++++-+            inputs = tokenizer(text, return_tensors="ms")
++++++-+            with mindspore._no_grad():
++++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++++- 
++++++-     def get_input_embeddings(self):
++++++-         return self.model.embed_tokens
++++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++-         ```"""
++++++-+        if not self._warmed_up:
++++++-+            self._warmed_up = True
++++++-+            self.warmup_moe_model()
++++++- 
++++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++++-         output_router_logits = (
++++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++-             }
++++++-         )
++++++-         return model_inputs
++++++-+# @lwx
++++++-+    # def _decode_one_tokens_logits(
++++++-+    #     self,
++++++-+    #     cur_token: mindspore.Tensor,
++++++-+    #     input_pos: Optional[mindspore.Tensor],
++++++-+    #     cache_position: mindspore.Tensor,
++++++-+    #     past_key_values: StaticCache,
++++++-+    # ) -> mindspore.Tensor:
++++++-+    #     """
++++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++++-+        
++++++-+    #     Args:
++++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++++-+    #         input_pos: 输入位置信息，可选
++++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++++-+            
++++++-+    #     Returns:
++++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++++-+    #     """
++++++-+    #     # 调用JIT编译的版本
++++++-+    #     return self.get_decode_one_tokens_logits(
++++++-+    #         cur_token=cur_token,
++++++-+    #         input_pos=input_pos,
++++++-+    #         cache_position=cache_position,
++++++-+    #         past_key_values=past_key_values,
++++++-+    #     )
++++++-+    
++++++-+    # @mindspore.jit(jit_level='O1')
++++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++++-+    #     """
++++++-+    #     JIT编译的函数，用于高效的单token解码
++++++-+    #     使用JIT编译优化以支持静态shape和高效执行
++++++-+        
++++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++++-+    #     """
++++++-+    #     outputs = self.model.forward(
++++++-+    #         input_ids=cur_token,
++++++-+    #         position_ids=input_pos,
++++++-+    #         cache_position=cache_position,
++++++-+    #         past_key_values=past_key_values,
++++++-+    #         use_cache=True,
++++++-+    #         return_dict=False,
++++++-+    #     )
++++++-+        
++++++-+    #     hidden_states = outputs[0]
++++++-+    #     logits = self.lm_head.forward(hidden_states)
++++++-+    #     logits = logits.float()
++++++-+        
++++++-+    #     return logits[:, -1, :]
++++++-+
++++++-+    # def _sample(
++++++-+    #     self,
++++++-+    #     input_ids: mindspore.Tensor,
++++++-+    #     logits_processor,
++++++-+    #     stopping_criteria,
++++++-+    #     generation_config,
++++++-+    #     synced_devices: bool,
++++++-+    #     streamer=None,
++++++-+    #     logits_warper=None,
++++++-+    #     **model_kwargs,
++++++-+    # ):
++++++-+    #     """
++++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++++-+    #     """
++++++-+    #     from ...generation.logits_process import LogitsProcessorList
++++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++++-+    #     from mindnlp.core import nn, ops, no_grad
++++++-+    #     import numpy as np
++++++-+        
++++++-+    #     # 检查是否使用 StaticCache
++++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++++-+    #     # 否则，直接调用父类方法
++++++-+    #     past_key_values = model_kwargs.get("past_key_values")
++++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++++-+
++++++-+    #     if not isinstance(past_key_values, StaticCache):
++++++-+    #         # 不使用 StaticCache，直接调用父类方法
++++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++++-+    #         return super()._sample(
++++++-+    #             input_ids=input_ids,
++++++-+    #             logits_processor=logits_processor,
++++++-+    #             stopping_criteria=stopping_criteria,
++++++-+    #             generation_config=generation_config,
++++++-+    #             synced_devices=synced_devices,
++++++-+    #             streamer=streamer,
++++++-+    #             logits_warper=logits_warper,
++++++-+    #             **model_kwargs,
++++++-+    #         )
++++++-+        
++++++-+    #     # 使用 StaticCache，进入自定义循环
++++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++++-+    #     pad_token_id = generation_config._pad_token_tensor
++++++-+    #     output_attentions = generation_config.output_attentions
++++++-+    #     output_hidden_states = generation_config.output_hidden_states
++++++-+    #     output_scores = generation_config.output_scores
++++++-+    #     output_logits = generation_config.output_logits
++++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++++-+    #     max_length = generation_config.max_length
++++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++++-+    #     do_sample = generation_config.do_sample
++++++-+        
++++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++++-+    #         raise ValueError(
++++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++++-+    #             f"{logits_warper})."
++++++-+    #         )
++++++-+        
++++++-+    #     # init attention / hidden states / scores tuples
++++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
++++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++++-+        
++++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++++-+    #         encoder_hidden_states = (
++++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++++-+    #         )
++++++-+        
++++++-+    #     # keep track of which sequences are already finished
++++++-+    #     batch_size, cur_len = input_ids.shape
++++++-+    #     this_peer_finished = False
++++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++++-+        
++++++-+    #     time_record = []
++++++-+    #     from ....utils.testing_utils import parse_flag_from_env
++++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++++-+        
++++++-+    #     while self._has_unfinished_sequences(
++++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++++-+    #     ):
++++++-+    #         if _record_time:
++++++-+    #             import time as time_module
++++++-+    #             infer_start = time_module.time()
++++++-+            
++++++-+    #         # prepare model inputs
++++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++++-+            
++++++-+    #         # prepare variable output controls
++++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++++-+            
++++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++++-+    #         cur_cache_position = model_inputs.get("cache_position")
++++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
++++++-+    #         cur_input_ids = model_inputs.get("input_ids")
++++++-+            
++++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++++-+    #             cur_cache_position is not None and 
++++++-+    #             len(cur_cache_position.shape) > 0 and
++++++-+    #             cur_cache_position.shape[0] == 1 and
++++++-+    #             cur_input_ids is not None and
++++++-+    #             cur_input_ids.shape[1] == 1):
++++++-+    #             # 使用 JIT 优化的单 token 解码
++++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++++-+    #             if not hasattr(self, '_jit_used'):
++++++-+    #                 self._jit_used = False
++++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++++-+                
++++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
++++++-+    #                 cur_token=cur_input_ids,
++++++-+    #                 input_pos=model_inputs.get("position_ids"),
++++++-+    #                 cache_position=cur_cache_position,
++++++-+    #                 past_key_values=cur_past_key_values,
++++++-+    #             )
++++++-+                
++++++-+    #             # 标记已使用JIT（用于后续判断）
++++++-+    #             if not self._jit_used:
++++++-+    #                 self._jit_used = True
++++++-+                
++++++-+    #             # 构造兼容的输出对象
++++++-+    #             class JitOptimizedOutput:
++++++-+    #                 def __init__(self, logits, config):
++++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++++-+    #                     self.config = config
++++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
++++++-+    #                     self.cross_attentions = None
++++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++++-+                
++++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++++-+    #         else:
++++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++++-+    #             outputs = self(**model_inputs, return_dict=True)
++++++-+            
++++++-+    #         if synced_devices and this_peer_finished:
++++++-+    #             continue
++++++-+            
++++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++++-+    #         next_token_logits = outputs.logits[:, -1, :]
++++++-+            
++++++-+    #         # pre-process distribution
++++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++++-+    #         if do_sample:
++++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++++-+            
++++++-+    #         # Store scores, attentions and hidden_states when required
++++++-+    #         if return_dict_in_generate:
++++++-+    #             if output_scores:
++++++-+    #                 scores += (next_token_scores,)
++++++-+    #             if output_logits:
++++++-+    #                 raw_logits += (next_token_logits,)
++++++-+    #             if output_attentions:
++++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++++-+    #                 if self.config.is_encoder_decoder:
++++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++++-+                
++++++-+    #             if output_hidden_states:
++++++-+    #                 hidden = (
++++++-+    #                     outputs.decoder_hidden_states
++++++-+    #                     if self.config.is_encoder_decoder
++++++-+    #                     else outputs.hidden_states
++++++-+    #                 )
++++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++++-+            
++++++-+    #         # token selection
++++++-+    #         if do_sample:
++++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++++-+    #         else:
++++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++++-+            
++++++-+    #         # finished sentences should have their next token be a padding token
++++++-+    #         if has_eos_stopping_criteria:
++++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++++-+            
++++++-+    #         # update generated ids, model inputs, and length for next step
++++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++++-+    #         if streamer is not None:
++++++-+    #             streamer.put(next_tokens)
++++++-+            
++++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
++++++-+    #             outputs,
++++++-+    #             model_kwargs,
++++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++++-+    #         )
++++++-+            
++++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++++-+    #         cur_len += 1
++++++-+            
++++++-+    #         if _record_time:
++++++-+    #             import time as time_module
++++++-+    #             infer_stop = time_module.time()
++++++-+    #             time_record.append(infer_stop - infer_start)
++++++-+            
++++++-+    #         del outputs
++++++-+        
++++++-+    #     average_infer_time = None
++++++-+    #     if time_record:
++++++-+    #         if len(time_record) > 1:
++++++-+    #             time_record.pop(0)
++++++-+    #         average_infer_time = sum(time_record) / len(time_record)
++++++-+    #         print(f'average inference time is: {average_infer_time}')
++++++-+    #         print(f'inference time record: {time_record}')
++++++-+        
++++++-+    #     if streamer is not None:
++++++-+    #         streamer.end()
++++++-+        
++++++-+    #     # 简单判断：打印是否使用了JIT路径
++++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
++++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
++++++-+    #     else:
++++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++++-+        
++++++-+    #     if return_dict_in_generate:
++++++-+    #         if self.config.is_encoder_decoder:
++++++-+    #             return GenerateEncoderDecoderOutput(
++++++-+    #                 sequences=input_ids,
++++++-+    #                 scores=scores,
++++++-+    #                 logits=raw_logits,
++++++-+    #                 encoder_attentions=encoder_attentions,
++++++-+    #                 encoder_hidden_states=encoder_hidden_states,
++++++-+    #                 decoder_attentions=decoder_attentions,
++++++-+    #                 cross_attentions=cross_attentions,
++++++-+    #                 decoder_hidden_states=decoder_hidden_states,
++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++-+    #                 average_infer_time=average_infer_time
++++++-+    #             )
++++++-+    #         else:
++++++-+    #             return GenerateDecoderOnlyOutput(
++++++-+    #                 sequences=input_ids,
++++++-+    #                 scores=scores,
++++++-+    #                 logits=raw_logits,
++++++-+    #                 attentions=decoder_attentions,
++++++-+    #                 hidden_states=decoder_hidden_states,
++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++-+    #                 average_infer_time=average_infer_time
++++++-+    #             )
++++++-+    #     else:
++++++-+    #         return input_ids
++++++-+            
++++++-+    # def _prepare_cache_for_generation(
++++++-+    #     self,
++++++-+    #     generation_config,
++++++-+    #     model_kwargs,
++++++-+    #     assistant_model,
++++++-+    #     batch_size,
++++++-+    #     max_cache_length,
++++++-+    # ):
++++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++++-+    #         generation_config.cache_implementation = "static"
++++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++++-+        
++++++-+    #     if generation_config.cache_implementation == "static":
++++++-+    #         base_required_from_max_length = generation_config.max_length + 1
++++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
++++++-+    #         min_cache_size = 50
++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++++-+    #         else:
++++++-+    #             max_cache_length = max(base_required, min_cache_size)
++++++-+            
++++++-+    #         original_max_cache_length = max_cache_length
++++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
++++++-+            
++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++-+    #             if max_cache_length > self.config.max_position_embeddings:
++++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++++-+        
++++++-+    #     result = super()._prepare_cache_for_generation(
++++++-+    #         generation_config=generation_config,
++++++-+    #         model_kwargs=model_kwargs,
++++++-+    #         assistant_model=assistant_model,
++++++-+    #         batch_size=batch_size,
++++++-+    #         max_cache_length=max_cache_length,
++++++-+    #     )
++++++-+        
++++++-+    #     if generation_config.cache_implementation == "static":
++++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++++-+    #         created_cache = model_kwargs.get(cache_name)
++++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
++++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++++-+        
++++++-+    #     return result
++++++-+
++++++-+
++++++-+
++++++- 
++++++- 
++++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++++--- 
++++++-2.27.0
++++++-
++++++-- 
++++++2.27.0
++++++
+++++-- 
+++++2.27.0
+++++
++++-- 
++++2.27.0
++++
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
new file mode 100644
index 00000000..5ba94286
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
@@ -0,0 +1,9078 @@
+From 4f88911daf60910b3b94b56b8a590650454a2dde Mon Sep 17 00:00:00 2001
+From: Pinoeer-kingxi <13022943007@163.com>
+Date: Sun, 9 Nov 2025 02:09:15 +0800
+Subject: [PATCH 09/10] 20251109firstcommit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  103 +-
+ patches/0001-20251104commit.patch             |    2 +-
+ patches/0002-20251106commit.patch             |    2 +-
+ patches/0003-20261106secondcommit.patch       |    2 +-
+ patches/0004-20251106change.patch             |    2 +-
+ patches/0005-20251107001commit.patch          |    2 +-
+ patches/0006-20251107002commit.patch          |    2 +-
+ patches/0007-20251107003commit.patch          |    2 +-
+ patches/0008-moe-change.patch                 | 8789 +++++++++++++++++
+ 9 files changed, 8889 insertions(+), 17 deletions(-)
+ create mode 100644 patches/0008-moe-change.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 0af29305..8d004af1 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -415,7 +415,9 @@ class DeepseekMoE(nn.Module):
+         else:
+             # @lwx
+             if orig_shape[1] == 1:
+-                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++                # lwx moe_infer_decode_fast
++                # y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++                y=self.moe_infer_decode_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+                 y=y.view(*orig_shape)
+                 if self.config.n_shared_experts is not None:
+                     y = y + self.shared_experts(identity)
+@@ -544,6 +546,7 @@ class DeepseekMoE(nn.Module):
+     #@dwj
+     @no_grad()
+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++
+         selected_experts = [self.experts[i] for i in flat_expert_indices]
+         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+@@ -643,6 +646,43 @@ class DeepseekMoE(nn.Module):
+     #     )
+ 
+     #     return final_output
++    # def init_expert_cache(self):
++    #     """
++    #     在模型初始化时调用，缓存所有专家的权重到显存。
++    #     """
++    #     self.cache_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
++    #     self.cache_up_w   = ops.stack([expert.up_proj.weight for expert in self.experts], dim=0)
++    #     self.cache_down_w = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
++    @no_grad()
++    def moe_infer_decode_fast(self, x, flat_expert_indices, flat_expert_weights):
++        top_k = flat_expert_indices.shape[0]
++        hidden_size = x.shape[-1]
++
++        selected_gate_w = []
++        selected_up_w = []
++        selected_down_w = []
++
++        for eid in flat_expert_indices.tolist():
++            if hasattr(self, "cache_gate_w") and eid < self.cache_gate_w.shape[0]:
++                selected_gate_w.append(self.cache_gate_w[eid])
++                selected_up_w.append(self.cache_up_w[eid])
++                selected_down_w.append(self.cache_down_w[eid])
++            else:
++                selected_gate_w.append(self.experts[eid].gate_proj.weight)
++                selected_up_w.append(self.experts[eid].up_proj.weight)
++                selected_down_w.append(self.experts[eid].down_proj.weight)
++
++        selected_gate_w = ops.stack(selected_gate_w, dim=0)
++        selected_up_w   = ops.stack(selected_up_w, dim=0)
++        selected_down_w = ops.stack(selected_down_w, dim=0)
++
++        x_expanded = x.expand((top_k, 1, hidden_size))
++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++        up_out   = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++        return weighted_sum
+ 
+     # lwx prefill 20251108
+     @no_grad()
+@@ -711,7 +751,7 @@ class DeepseekMoE(nn.Module):
+             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+             expert_outputs * sorted_weights
+         )      
+-
++        return final_output
+ 
+         # try:
+         #     final_output = ops.moe_token_unpermute(
+@@ -730,7 +770,7 @@ class DeepseekMoE(nn.Module):
+         #         expert_outputs * sorted_weights
+         #     )
+ 
+-        return final_output
++        # return final_output
+ 
+ 
+     # @no_grad()
+@@ -1827,27 +1867,68 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+ 
+         # Initialize weights and apply final processing
+         self.post_init()
++        # lwx
+         self.warm_up = False
+-
++    #初始
++
++    # def warmup_moe_model_deep(self):
++    #     print("[Warmup] DeepSeek-MoE 模型预热开始...")
++    #     test_texts = [
++    #         "warmup short",
++    #         "This is a medium length warmup sentence for MoE experts. middle middle middle",
++    #         "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++    #     ]
++    #     tokenizer = getattr(self, "_warmup_tokenizer", None)
++    #     if tokenizer is None:
++    #         from mindnlp.transformers import AutoTokenizer
++    #         tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++    #         self._warmup_tokenizer = tokenizer
++
++    #     for text in test_texts:
++    #         inputs = tokenizer(text, return_tensors="ms")
++    #         with mindspore._no_grad():
++    #             _ = self(**inputs, use_cache=False)
++    #     print("[Warmup] DeepSeek-MoE 模型预热完成。")
++    
+     def warmup_moe_model_deep(self):
+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-        test_texts = [
+-            "warmup short",
+-            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++
++        # 直接用 eval.py 默认的 prompts 内容
++        warmup_prompts = [
++            "Hello, how are you?",
++            "This American studied art at Yale and is the author of multiple popular mystery novels. First name is 'Hillary'. What's the last name?",
++            """Summarize the following text: US President Donald Trump has said he is 'not happy' with his Russian counterpart Vladimir Putin, following Moscow's largest aerial attack yet on Ukraine.
++    In a rare rebuke, Trump said: "What the hell happened to him? He's killing a lot of people." He later called Putin "absolutely crazy".
++    Ukrainian President Volodymyr Zelensky earlier said Washington's "silence" over recent Russian attacks was encouraging Putin, urging "strong pressure" - including tougher sanctions - on Moscow.
++    """
+         ]
++
+         tokenizer = getattr(self, "_warmup_tokenizer", None)
+         if tokenizer is None:
+             from mindnlp.transformers import AutoTokenizer
+             tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+             self._warmup_tokenizer = tokenizer
+ 
+-        for text in test_texts:
++        # 跑一遍 warmup_prompts，触发路由逻辑
++        for text in warmup_prompts:
+             inputs = tokenizer(text, return_tensors="ms")
+             with mindspore._no_grad():
+                 _ = self(**inputs, use_cache=False)
++
++        # 这里可以加按需缓存逻辑，避免显存 OOM
++        from mindnlp.transformers.models.deepseek.modeling_deepseek import DeepseekMoE
++        for module in self.modules():
++            if isinstance(module, DeepseekMoE):
++                active_ids = getattr(module, "_last_routed_expert_ids", None)
++                if active_ids is not None:
++                    module.init_active_expert_cache(active_ids)
+         print("[Warmup] DeepSeek-MoE 模型预热完成。")
+ 
++    def init_active_expert_cache(self, active_ids):
++        self.cache_gate_w = ops.stack([self.experts[i].gate_proj.weight for i in active_ids], dim=0)
++        self.cache_up_w   = ops.stack([self.experts[i].up_proj.weight for i in active_ids], dim=0)
++        self.cache_down_w = ops.stack([self.experts[i].down_proj.weight for i in active_ids], dim=0)
++
+     def get_input_embeddings(self):
+         return self.model.embed_tokens
+ 
+@@ -2208,7 +2289,9 @@ if __name__ == "__main__":
+     config.num_hidden_layers = 2
+     config.n_routed_experts = 2
+     model = DeepseekForCausalLM(config)
+-
++    # for module in model.modules():
++    #     if isinstance(module, DeepseekMoE):
++    #         module.init_expert_cache()
+     print('init model')
+     input_ids = mindspore.Tensor(np.random.randint(0, 10000, (1, 11)), mindspore.int32)
+     attention_mask = mindspore.Tensor(np.ones((1,11)), mindspore.int32)
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+index 513dd40b..8de61195 100644
+--- a/patches/0001-20251104commit.patch
++++ b/patches/0001-20251104commit.patch
+@@ -1,7 +1,7 @@
+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/7] 20251104commit
++Subject: [PATCH 1/8] 20251104commit
+ 
+ ---
+  mindnlp/transformers/cache_utils.py           |  28 +-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+index 41081b85..d7a129ea 100644
+--- a/patches/0002-20251106commit.patch
++++ b/patches/0002-20251106commit.patch
+@@ -1,7 +1,7 @@
+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/7] 20251106commit
++Subject: [PATCH 2/8] 20251106commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+index c1392569..179a9bb5 100644
+--- a/patches/0003-20261106secondcommit.patch
++++ b/patches/0003-20261106secondcommit.patch
+@@ -1,7 +1,7 @@
+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/7] 20261106secondcommit
++Subject: [PATCH 3/8] 20261106secondcommit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+index e548b1b2..bc5549ca 100644
+--- a/patches/0004-20251106change.patch
++++ b/patches/0004-20251106change.patch
+@@ -1,7 +1,7 @@
+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-Subject: [PATCH 4/7] 20251106change
++Subject: [PATCH 4/8] 20251106change
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+index bf224d2a..7217a46b 100644
+--- a/patches/0005-20251107001commit.patch
++++ b/patches/0005-20251107001commit.patch
+@@ -1,7 +1,7 @@
+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 11:48:18 +0800
+-Subject: [PATCH 5/7] 20251107001commit
++Subject: [PATCH 5/8] 20251107001commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |   91 +-
+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+index 1bd306b9..80906633 100644
+--- a/patches/0006-20251107002commit.patch
++++ b/patches/0006-20251107002commit.patch
+@@ -1,7 +1,7 @@
+ From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 12:06:32 +0800
+-Subject: [PATCH 6/7] 20251107002commit
++Subject: [PATCH 6/8] 20251107002commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |  122 +-
+diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
+index ce558554..8a2fc4fe 100644
+--- a/patches/0007-20251107003commit.patch
++++ b/patches/0007-20251107003commit.patch
+@@ -1,7 +1,7 @@
+ From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
+ From: Pinoeer-kingxi <13022943007@163.com>
+ Date: Fri, 7 Nov 2025 12:12:51 +0800
+-Subject: [PATCH 7/7] 20251107003commit
++Subject: [PATCH 7/8] 20251107003commit
+ 
+ ---
+  .../models/deepseek/modeling_deepseek.py      |    2 +-
+diff --git a/patches/0008-moe-change.patch b/patches/0008-moe-change.patch
+new file mode 100644
+index 00000000..349f1429
+--- /dev/null
++++ b/patches/0008-moe-change.patch
+@@ -0,0 +1,8789 @@
++From 45ba3bbc411b64cbffd547fa3d66bce9545639dd Mon Sep 17 00:00:00 2001
++From: Pinoeer-kingxi <13022943007@163.com>
++Date: Sun, 9 Nov 2025 00:50:01 +0800
++Subject: [PATCH 8/8] moe change
++
++---
++ .../models/deepseek/modeling_deepseek.py      |  433 +-
++ .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
++ patches/0001-20251104commit.patch             |    2 +-
++ patches/0002-20251106commit.patch             |    2 +-
++ patches/0003-20261106secondcommit.patch       |    2 +-
++ patches/0004-20251106change.patch             |    2 +-
++ patches/0005-20251107001commit.patch          |    2 +-
++ patches/0006-20251107002commit.patch          |    2 +-
++ patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
++ 9 files changed, 8510 insertions(+), 55 deletions(-)
++ create mode 100644 patches/0007-20251107003commit.patch
++
++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++index ff631974..0af29305 100644
++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++@@ -19,8 +19,10 @@
++ # limitations under the License.
++ """ MindNLP DeepSeek model."""
++ import math
+++import time
++ import warnings
++ from typing import List, Optional, Tuple, Union
+++from mindspore import mint
++ import mindspore
++ from mindnlp.core import nn, ops, no_grad
++ from mindnlp.core.nn import functional as F
++@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
++ 
++ _CONFIG_FOR_DOC = "DeepseekConfig"
++ 
+++Long_Prompt = 1
+++LONG_PROMPT_LENGTH_THRESHOLD = 128
+++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+++
++ _attn_mask_cache = {}
++ 
++ def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
++@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
++         return topk_idx, topk_weight, aux_loss
++ 
++ 
+++bincount_op = mindspore.ops.Bincount()
+++
++ class DeepseekMoE(nn.Module):
++     """
++     A mixed expert module containing shared experts.
++@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
++                     y = y + self.shared_experts(identity)
++                 return y
++             else:
++-                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++                if Long_Prompt == 0:
+++                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++                else:
+++                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++                 if self.config.n_shared_experts is not None:
++                     y = y + self.shared_experts(identity)
++                 return y
++@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
++         # if self.config.n_shared_experts is not None:
++         #     y = y + self.shared_experts(identity)
++         # return y
++-        
+++    
+++    
+++    
+++    # lwx
+++    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
+++    #     """
+++    #     如果 expert_ids 为 None，走单专家逻辑；
+++    #     如果有，多专家批量处理，保证和原逻辑一致。
+++    #     """
+++    #     if expert_ids is None:
+++    #         # 原单专家逻辑
+++    #         if self.config.pretraining_tp > 1:
+++    #             slice = self.intermediate_size // self.config.pretraining_tp
+++    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
+++    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
+++    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
+++    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
+++    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
+++    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
+++    #                                for i in range(self.config.pretraining_tp)], dim=-1)
+++    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
+++    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
+++    #                          for i in range(self.config.pretraining_tp)]
+++    #             down_proj = sum(down_proj)
+++    #         else:
+++    #             down_proj = self.down_proj(
+++    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
+++    #             )
+++    #         return down_proj
+++
+++    #     # ====== 批量多专家路径 ======
+++    #     hidden_size = x.shape[-1]
+++
+++    #     # 按 token expert_ids 选权重
+++    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
+++    #     up_weights   = self.up_proj.weight[expert_ids]
+++    #     down_weights = self.down_proj.weight[expert_ids]
+++
+++    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
+++    #     if self.config.pretraining_tp > 1:
+++    #         outputs = []
+++    #         slice = self.intermediate_size // self.config.pretraining_tp
+++    #         for i in range(self.config.pretraining_tp):
+++    #             # 每个 slice 单独计算
+++    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
+++    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
+++    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
+++    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
+++    #             outputs.append(down_proj_out)
+++    #         return sum(outputs)
+++    #     else:
+++    #         gate_proj_out = F.linear(x, gate_weights)
+++    #         up_proj_out   = F.linear(x, up_weights)
+++    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
+++    #         return F.linear(act_out, down_weights)
+++    # @no_grad()
+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++    #     num_tokens = x.shape[0]
+++    #     hidden_size = x.shape[-1]
+++
+++    #     idxs = flat_expert_indices.argsort()
+++    #     sorted_expert_indices = flat_expert_indices[idxs]
+++    #     sorted_token_indices = idxs // self.num_experts_per_tok
+++    #     sorted_indices = sorted_token_indices
+++
+++    #     permuted_tokens = x[sorted_token_indices]
+++    #     sorted_weights  = flat_expert_weights[idxs]
+++
+++    #     # 一次调用多专家 forward
+++    #     expert_outputs = ops.zeros_like(permuted_tokens)
+++    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
+++
+++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+++    #     try:
+++    #         final_output = ops.moe_token_unpermute(
+++    #             expert_outputs,
+++    #             sorted_indices,
+++    #             probs=probs,
+++    #             padded_mode=False
+++    #         )
+++    #     except Exception:
+++    #         final_output = ops.zeros_like(x)
+++    #         final_output = mindspore.mint.scatter_add(
+++    #             final_output,
+++    #             0,
+++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+++    #             expert_outputs * sorted_weights
+++    #         )
+++
+++    #     return final_output
+++
+++    # def mlp_batch_forward(self, tokens, expert_ids):
+++    #     """
+++    #     使用批量专家 forward（保留精度）
+++    #     """
+++    #     return self.experts[0].forward(tokens, expert_ids)
+++
++     # @no_grad()
++     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++ 
++@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
++     #         expert_cache += expert_out * weight
++     #     return expert_cache
++     
+++    #@dwj
++     @no_grad()
++-    # dwj
++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++-        # x 的 shape: (1, hidden_size)
++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++-
++-        # 1. 收集所有需要的专家层
++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++         selected_experts = [self.experts[i] for i in flat_expert_indices]
++-
++-        # 2. 并行计算所有专家的输出
++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++-        # ops.cat 会将它们堆叠成一个新的 Tensor
++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++-
++-        # 3. 使用矩阵乘法进行加权求和
++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++-        # 最终结果 final_output 的 shape: (1, hidden_size)
++         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++-        
++         return final_output
++ 
++ 
++-    # @no_grad()
++-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++-    #     expert_cache = ops.zeros_like(x)
++-    #     idxs = flat_expert_indices.argsort()
++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++-    #     token_idxs = idxs // self.num_experts_per_tok
++-
++-    #     for i, end_idx in enumerate(tokens_per_expert):
++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++-    #         if start_idx == end_idx:
++-    #             continue
++-    #         expert = self.experts[i]
++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
++-    #         expert_tokens = x[exp_token_idx]
++-    #         expert_out = expert(expert_tokens)
++-    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++-    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++-
++-    #     return expert_cache
++-        
++     @no_grad()
++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++         """
++@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
++             )
++ 
++         return expert_cache
+++
+++
+++    # @no_grad()
+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++    #     """
+++    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
+++    #     """
+++    #     num_tokens = x.shape[0]
+++    #     hidden_size = x.shape[-1]
+++
+++    #     # 生成排序后的 token 索引
+++    #     idxs = flat_expert_indices.argsort()
+++    #     sorted_expert_indices = flat_expert_indices[idxs]
+++    #     sorted_token_indices = idxs // self.num_experts_per_tok
+++
+++    #     # 记录到 sorted_indices（moe_token_unpermute 用）
+++    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
+++
+++    #     # 收集专家输入
+++    #     permuted_tokens = x[sorted_token_indices]
+++
+++    #     # 执行每个专家的 MLP（批量处理）
+++    #     expert_outputs = []
+++    #     token_ptr = 0
+++    #     tokens_per_expert = sorted_expert_indices.bincount()
+++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+++    #         if count == 0:
+++    #             continue
+++    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
+++    #         out = self.experts[expert_id](cur_tokens)
+++    #         expert_outputs.append(out)
+++    #         token_ptr += count
+++
+++    #     # 拼接所有专家输出
+++    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
+++
+++    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
+++    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
+++
+++    #     # 直接调用硬件加速的 unpermute
+++    #     final_output = ops.moe_token_unpermute(
+++    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
+++    #         sorted_indices,           # shape: [num_tokens * top_k]
+++    #         probs=probs,               # 按概率加权
+++    #         padded_mode=False
+++    #     )
+++
+++    #     return final_output
+++
+++    # lwx prefill 20251108
+++    @no_grad()
+++    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
+++        """
+++        高性能 + 数值一致的 MoE prefill 推理：
+++        1. 批量化处理所有专家计算，减少 Python 循环开销
+++        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
+++        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
+++        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
+++
+++        参数：
+++            x: [num_tokens, hidden_size]，
+++            MoE 输入的 token 表示
+++            flat_expert_indices: [num_tokens * top_k]，
+++            每个 token 的路由专家 id
+++            flat_expert_weights: [num_tokens * top_k, 1]，
+++            路由专家权重
+++        """
+++        num_tokens = x.shape[0]
+++        hidden_size = x.shape[-1]
+++
+++        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
+++        idxs = flat_expert_indices.argsort()  # 排序索引
+++        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
+++        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
+++
+++        # sorted_indices 必须与 permuted_tokens 顺序匹配
+++        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
+++
+++        # 2) 收集专家输入（按 idxs 排序）
+++        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
+++        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
+++
+++        # 3) 计算每个专家的 token 数
+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+++
+++        # 4) 批量专家计算（减少 Python 循环）
+++        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
+++        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
+++        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
+++
+++        expert_outputs = ops.zeros_like(permuted_tokens)
+++        ptr = 0
+++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
+++            if count == 0:
+++                continue
+++            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
+++            
+++            # 与 DeepseekMLP forward 等价
+++            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
+++            up_proj_out   = F.linear(tokens, up_weights[expert_id])
+++            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
+++            expert_out    = F.linear(act_out, down_weights[expert_id])
+++
+++            expert_outputs[ptr:ptr+count] = expert_out
+++            ptr += count
+++
+++        # 5) Ascend 加速的 unpermute（已排序的权重）
+++        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
+++
+++        final_output = ops.zeros_like(x)
+++        final_output = mindspore.mint.scatter_add(
+++            final_output,
+++            0,
+++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+++            expert_outputs * sorted_weights
+++        )      
+++
+++
+++        # try:
+++        #     final_output = ops.moe_token_unpermute(
+++        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
+++        #         sorted_indices,   # [num_tokens*top_k] 原 token id
+++        #         probs=probs,      # 对应权重
+++        #         padded_mode=False
+++        #     )
+++        # except Exception:
+++        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
+++        #     final_output = ops.zeros_like(x)
+++        #     final_output = mindspore.mint.scatter_add(
+++        #         final_output,
+++        #         0,
+++        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+++        #         expert_outputs * sorted_weights
+++        #     )
+++
+++        return final_output
+++
+++
+++    # @no_grad()
+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++    #     num_tokens = x.shape[0]
+++    #     hidden_size = x.shape[-1]
+++
+++    #     idxs = flat_expert_indices.argsort()
+++    #     sorted_expert_indices = flat_expert_indices[idxs]
+++    #     sorted_token_indices = idxs // self.num_experts_per_tok
+++        
+++    #     # sorted_indices = sorted_token_indices
+++    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
+++    #     permuted_tokens = x[sorted_token_indices]
+++    #     sorted_weights = flat_expert_weights[idxs]
+++    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+++
+++    #     expert_outputs = ops.zeros_like(permuted_tokens)
+++    #     ptr = 0
+++
+++    #     # 只按专家维度循环
+++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+++    #         if count == 0:
+++    #             continue
+++    #         token_slice = slice(ptr, ptr + count)
+++    #         expert_tokens = permuted_tokens[token_slice]
+++
+++    #         # 保持原 forward（含 pretraining_tp、bias 等）
+++    #         expert_out = self.experts[expert_id](expert_tokens)
+++
+++    #         expert_outputs[token_slice] = expert_out
+++    #         ptr += count
+++
+++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+++    #     try:
+++    #         final_output = mindspore.ops.moe_token_unpermute(
+++    #             expert_outputs,
+++    #             sorted_indices,
+++    #             probs=probs,
+++    #             padded_mode=False
+++    #         )
+++    #     except Exception:
+++    #         final_output = ops.zeros_like(x)
+++    #         final_output = mindspore.mint.scatter_add(
+++    #             final_output,
+++    #             0,
+++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+++    #             expert_outputs * sorted_weights
+++    #         )
+++
+++    #     return final_output
+++
+++
+++    #lwx
+++    # @no_grad()
+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++    #     """
+++    #     并行化 MoE prefill：
+++    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
+++    #     - 保证结果与原版完全一致
+++    #     """
+++    #     # 输出缓存
+++    #     expert_cache = ops.zeros_like(x)
+++
+++    #     # token 总数（批量*seq_len*num_experts_per_tok）
+++    #     num_tokens = flat_expert_indices.shape[0]
+++    #     hidden_dim = x.shape[-1]
+++
+++    #     # 原 token ID（idxs // num_experts_per_tok）
+++    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
+++
+++    #     # ====== Step 1: 组织输入 ======
+++    #     # 按 experts 排序，保证 scatter_add 对应位置一致
+++    #     sort_ids = flat_expert_indices.argsort()
+++    #     sorted_experts = flat_expert_indices[sort_ids]
+++    #     sorted_tokens = token_ids[sort_ids]
+++    #     sorted_weights = flat_expert_weights[sort_ids]
+++
+++    #     # 收集每个专家的输入
+++    #     # build: expert_inputs[expert_id] = [tokens...]
+++    #     expert_inputs = []
+++    #     expert_outs = []
+++
+++    #     for eid in range(self.config.n_routed_experts):
+++    #         eid_mask = (sorted_experts == eid)
+++    #         if eid_mask.any():
+++    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
+++    #             expert_inputs.append(tokens_for_eid)
+++    #         else:
+++    #             expert_inputs.append(None)
+++
+++    #     # ====== Step 2: 并行计算所有专家输出 ======
+++    #     # 存储所有专家结果到一个列表
+++    #     for eid in range(self.config.n_routed_experts):
+++    #         if expert_inputs[eid] is not None:
+++    #             out = self.experts[eid](expert_inputs[eid])
+++    #             expert_outs.append(out)
+++    #         else:
+++    #             expert_outs.append(None)
+++
+++    #     # ====== Step 3: scatter_add 回写结果 ======
+++    #     # 遍历专家，将结果加回对应的 token
+++    #     pos = 0
+++    #     for eid in range(self.config.n_routed_experts):
+++    #         if expert_outs[eid] is not None:
+++    #             size = expert_outs[eid].shape[0]
+++    #             tokens_idx = sorted_tokens[pos:pos+size]
+++    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
+++    #             pos += size
+++
+++    #             # scatter_add 到 expert_cache
+++    #             expert_cache = mindspore.mint.scatter_add(
+++    #                 expert_cache,
+++    #                 dim=0,
+++    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
+++    #                 src=scaled_out
+++    #             )
+++
+++    #     return expert_cache
+++
+++
+++
++ # 放置在 DeepseekMoE 类中
++     # @no_grad()
++     # #lwx 20251107
++@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
++         self.hidden_size = config.hidden_size
++ 
++         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++-            # config=config, layer_idx=layer_idx
+++        #     config=config, layer_idx=layer_idx
++         # )
++ 
++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
++             )
++             else DeepseekMLP(config)
++         )
+++
++         self.input_layernorm = DeepseekRMSNorm(
++             config.hidden_size, eps=config.rms_norm_eps
++         )
++@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++     def get_decoder(self):
++         return self.model
++ 
+++    def generate(self, *args, **kwargs):
+++        """
+++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++        """
+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+++
+++        input_ids = kwargs.get("input_ids")
+++        if input_ids is None and args:
+++            input_ids = args[0]
+++
+++        if input_ids is not None:
+++            prompt_length = input_ids.shape[1]
+++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+++                Long_Prompt = 2
+++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+++                Long_Prompt = 0
+++            else:
+++                Long_Prompt = 1
+++
+++
+++        return super().generate(*args, **kwargs)
++ 
++     def forward(
++         self,
++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++index 913a7609..6566958b 100644
++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++ 
++     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
++     @no_grad()
++-    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++         original_dtype = hidden_states.dtype
++         batch_size, _ = hidden_states.shape
++         expert_outputs_list = [
++@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++         return moe_output_fp32.squeeze(1).to(original_dtype)
++ 
+++
++     # @no_grad()
++-    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++     #     num_tokens, _ = hidden_states.shape
++     #     flat_selected_experts = selected_experts.flatten()
++     #     sorted_expert_indices = flat_selected_experts.argsort()
++@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++     #         current_token_offset += expert_token_count
++     #     return moe_output
++ 
+++    # baseline
++     @no_grad()
++-    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++         """
++         优化版 MoE prefill (速度优先模式)：
++         - 批量张量化处理同一个 expert 的所有 token
++@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         return moe_output
++ 
++ 
+++    @no_grad()
+++    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++        """
+++        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
+++        逻辑：
+++        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
+++        2. 每个 expert 一次性处理其全部 token
+++        3. 最后一次 scatter_add 回到原 token 顺序
+++        """
+++
+++        num_tokens = hidden_states.shape[0]
+++        hidden_size = hidden_states.shape[-1]
+++
+++        # 展平为一维
+++        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
+++        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
+++
+++        # 按 expert 排序
+++        idxs = flat_selected_experts.argsort()
+++        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
+++        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
+++
+++        # 排好序的输入向量（连续内存）
+++        permuted_tokens = hidden_states[sorted_token_indices]
+++
+++        # 排好序的权重
+++        sorted_weights = flat_routing_weights[idxs]
+++
+++        # 每个 expert 对应的 token 数量
+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+++
+++        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
+++        expert_outputs = ops.zeros_like(permuted_tokens)
+++
+++        ptr = 0  # 指向当前切片的起点
+++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
+++            if count == 0:
+++                continue
+++
+++            token_slice = slice(ptr, ptr + count)
+++            expert_tokens = permuted_tokens[token_slice]  # 连续切片
+++
+++            # 执行专家 MLP
+++            expert_out = self.experts[expert_id](expert_tokens)
+++
+++            expert_outputs[token_slice] = expert_out
+++            ptr += count
+++
+++        # 按权重缩放
+++        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
+++
+++        # 回写到原 token 顺序 (单次 scatter_add)
+++        moe_output = mindspore.mint.scatter_add(
+++            ops.zeros_like(hidden_states),
+++            0,
+++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+++            scaled_outputs
+++        )
+++
+++        return moe_output
+++
+++
+++    
++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++
++     @no_grad()
++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++         moe_output = ops.zeros_like(hidden_states)
++@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++         #     # --- 速度优先模式 (SPEED MODE) ---
++         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++         #     if sequence_length == 1:
++-        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++         #     else:
++-        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++         
++         routing_weights_casted = routing_weights.to(hidden_states.dtype)
++         if sequence_length == 1:
++-            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++         else:
++-            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++-    
+++            # if Long_Prompt == 1:
+++            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++            # else:
+++            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++
++ 
++         # 3. 共享专家计算与合并 (所有模式通用)
++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++index c9c8c5ee..513dd40b 100644
++--- a/patches/0001-20251104commit.patch
+++++ b/patches/0001-20251104commit.patch
++@@ -1,7 +1,7 @@
++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++-Subject: [PATCH 1/6] 20251104commit
+++Subject: [PATCH 1/7] 20251104commit
++ 
++ ---
++  mindnlp/transformers/cache_utils.py           |  28 +-
++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++index 625656eb..41081b85 100644
++--- a/patches/0002-20251106commit.patch
+++++ b/patches/0002-20251106commit.patch
++@@ -1,7 +1,7 @@
++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++-Subject: [PATCH 2/6] 20251106commit
+++Subject: [PATCH 2/7] 20251106commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++index dcb85080..c1392569 100644
++--- a/patches/0003-20261106secondcommit.patch
+++++ b/patches/0003-20261106secondcommit.patch
++@@ -1,7 +1,7 @@
++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++-Subject: [PATCH 3/6] 20261106secondcommit
+++Subject: [PATCH 3/7] 20261106secondcommit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++index bbed13cc..e548b1b2 100644
++--- a/patches/0004-20251106change.patch
+++++ b/patches/0004-20251106change.patch
++@@ -1,7 +1,7 @@
++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Thu, 6 Nov 2025 15:48:09 +0800
++-Subject: [PATCH 4/6] 20251106change
+++Subject: [PATCH 4/7] 20251106change
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  189 +-
++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
++index b2d1035c..bf224d2a 100644
++--- a/patches/0005-20251107001commit.patch
+++++ b/patches/0005-20251107001commit.patch
++@@ -1,7 +1,7 @@
++ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Fri, 7 Nov 2025 11:48:18 +0800
++-Subject: [PATCH 5/6] 20251107001commit
+++Subject: [PATCH 5/7] 20251107001commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |   91 +-
++diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
++index bffa134e..1bd306b9 100644
++--- a/patches/0006-20251107002commit.patch
+++++ b/patches/0006-20251107002commit.patch
++@@ -1,7 +1,7 @@
++ From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
++ From: Pinoeer-kingxi <13022943007@163.com>
++ Date: Fri, 7 Nov 2025 12:06:32 +0800
++-Subject: [PATCH 6/6] 20251107002commit
+++Subject: [PATCH 6/7] 20251107002commit
++ 
++ ---
++  .../models/deepseek/modeling_deepseek.py      |  122 +-
++diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
++new file mode 100644
++index 00000000..ce558554
++--- /dev/null
+++++ b/patches/0007-20251107003commit.patch
++@@ -0,0 +1,8034 @@
+++From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
+++From: Pinoeer-kingxi <13022943007@163.com>
+++Date: Fri, 7 Nov 2025 12:12:51 +0800
+++Subject: [PATCH 7/7] 20251107003commit
+++
+++---
+++ .../models/deepseek/modeling_deepseek.py      |    2 +-
+++ patches/0001-20251104commit.patch             |    2 +-
+++ patches/0002-20251106commit.patch             |    2 +-
+++ patches/0003-20261106secondcommit.patch       |    2 +-
+++ patches/0004-20251106change.patch             |    2 +-
+++ patches/0005-20251107001commit.patch          |    2 +-
+++ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
+++ 7 files changed, 7937 insertions(+), 6 deletions(-)
+++ create mode 100644 patches/0006-20251107002commit.patch
+++
+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++index e7e1c053..ff631974 100644
+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
+++     #     return expert_cache
+++     
+++     @no_grad()
+++-    dwj
++++    # dwj
+++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++         # x 的 shape: (1, hidden_size)
+++         # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++index 2842180e..c9c8c5ee 100644
+++--- a/patches/0001-20251104commit.patch
++++++ b/patches/0001-20251104commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+++-Subject: [PATCH 1/5] 20251104commit
++++Subject: [PATCH 1/6] 20251104commit
+++ 
+++ ---
+++  mindnlp/transformers/cache_utils.py           |  28 +-
+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++index c6cd8757..625656eb 100644
+++--- a/patches/0002-20251106commit.patch
++++++ b/patches/0002-20251106commit.patch
+++@@ -1,7 +1,7 @@
+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+++-Subject: [PATCH 2/5] 20251106commit
++++Subject: [PATCH 2/6] 20251106commit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++index 601960c9..dcb85080 100644
+++--- a/patches/0003-20261106secondcommit.patch
++++++ b/patches/0003-20261106secondcommit.patch
+++@@ -1,7 +1,7 @@
+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+++-Subject: [PATCH 3/5] 20261106secondcommit
++++Subject: [PATCH 3/6] 20261106secondcommit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+++index 8976f10b..bbed13cc 100644
+++--- a/patches/0004-20251106change.patch
++++++ b/patches/0004-20251106change.patch
+++@@ -1,7 +1,7 @@
+++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Thu, 6 Nov 2025 15:48:09 +0800
+++-Subject: [PATCH 4/5] 20251106change
++++Subject: [PATCH 4/6] 20251106change
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |  189 +-
+++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+++index 8d9032be..b2d1035c 100644
+++--- a/patches/0005-20251107001commit.patch
++++++ b/patches/0005-20251107001commit.patch
+++@@ -1,7 +1,7 @@
+++ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+++ From: Pinoeer-kingxi <13022943007@163.com>
+++ Date: Fri, 7 Nov 2025 11:48:18 +0800
+++-Subject: [PATCH 5/5] 20251107001commit
++++Subject: [PATCH 5/6] 20251107001commit
+++ 
+++ ---
+++  .../models/deepseek/modeling_deepseek.py      |   91 +-
+++diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+++new file mode 100644
+++index 00000000..bffa134e
+++--- /dev/null
++++++ b/patches/0006-20251107002commit.patch
+++@@ -0,0 +1,7931 @@
++++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
++++From: Pinoeer-kingxi <13022943007@163.com>
++++Date: Fri, 7 Nov 2025 12:06:32 +0800
++++Subject: [PATCH 6/6] 20251107002commit
++++
++++---
++++ .../models/deepseek/modeling_deepseek.py      |  122 +-
++++ patches/0001-20251104commit.patch             |    2 +-
++++ patches/0002-20251106commit.patch             |    2 +-
++++ patches/0003-20261106secondcommit.patch       |    2 +-
++++ patches/0004-20251106change.patch             |    2 +-
++++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
++++ 6 files changed, 7773 insertions(+), 64 deletions(-)
++++ create mode 100644 patches/0005-20251107001commit.patch
++++
++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++index 8831e4b7..e7e1c053 100644
++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
++++     #         expert_out = expert(x)
++++     #         expert_cache += expert_out * weight
++++     #     return expert_cache
++++-
++++-    # @no_grad()
++++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++-    #     # x 的 shape: (1, hidden_size)
++++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++-
++++-    #     # 1. 收集所有需要的专家层
++++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
++++-
++++-    #     # 2. 并行计算所有专家的输出
++++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
++++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++-
++++-    #     # 3. 使用矩阵乘法进行加权求和
++++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
++++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++++    
+++++    @no_grad()
+++++    dwj
+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++        # x 的 shape: (1, hidden_size)
+++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++++
+++++        # 1. 收集所有需要的专家层
+++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++++
+++++        # 2. 并行计算所有专家的输出
+++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++++
+++++        # 3. 使用矩阵乘法进行加权求和
+++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++         
++++-    #     return final_output
+++++        return final_output
++++ 
++++ 
++++     # @no_grad()
++++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
++++ 
++++         return expert_cache
++++ # 放置在 DeepseekMoE 类中
++++-    @no_grad()
++++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++-        """
++++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++++-
++++-        Args:
++++-            x (Tensor): 输入张量, shape: (1, hidden_size)
++++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++++-        """
++++-        top_k, _ = flat_expert_weights.shape
++++-        hidden_size = x.shape[-1]
++++-
++++-        # 1. 将所有专家的权重堆叠起来
++++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+++++    # @no_grad()
+++++    # #lwx 20251107
+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++    #     """
+++++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+++++
+++++    #     Args:
+++++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+++++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+++++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+++++    #     """
+++++    #     top_k, _ = flat_expert_weights.shape
+++++    #     hidden_size = x.shape[-1]
+++++
+++++    #     # 1. 将所有专家的权重堆叠起来
+++++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+++++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+++++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++++         
++++-        # 2. "收集" 所需的专家权重
++++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
++++-        selected_up_w = stacked_up_w[flat_expert_indices]
++++-        selected_down_w = stacked_down_w[flat_expert_indices]
+++++    #     # 2. "收集" 所需的专家权重
+++++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+++++    #     selected_up_w = stacked_up_w[flat_expert_indices]
+++++    #     selected_down_w = stacked_down_w[flat_expert_indices]
++++         
++++-        # 3. 准备输入
++++-        x_expanded = x.expand((top_k, 1, hidden_size))
+++++    #     # 3. 准备输入
+++++    #     x_expanded = x.expand((top_k, 1, hidden_size))
++++         
++++-        # 4. 并行计算 gate_proj 和 up_proj
++++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+++++    #     # 4. 并行计算 gate_proj 和 up_proj
+++++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+++++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++++ 
++++-        # 5. 计算中间状态
++++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+++++    #     # 5. 计算中间状态
+++++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++++         
++++-        # 6. 并行计算 down_proj
++++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++++-        #    --- [FIX] ---
++++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++++-        #    --- [FIX END] ---
+++++    #     # 6. 并行计算 down_proj
+++++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+++++    #     #    --- [FIX] ---
+++++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+++++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+++++    #     #    --- [FIX END] ---
++++         
++++-        # 7. 根据路由权重进行加权求和
++++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+++++    #     # 7. 根据路由权重进行加权求和
+++++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++++ 
++++-        return weighted_sum
+++++    #     return weighted_sum
++++ 
++++ 
++++ 
++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++index 0a0ef2d7..2842180e 100644
++++--- a/patches/0001-20251104commit.patch
+++++++ b/patches/0001-20251104commit.patch
++++@@ -1,7 +1,7 @@
++++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Tue, 4 Nov 2025 09:11:51 +0800
++++-Subject: [PATCH 1/4] 20251104commit
+++++Subject: [PATCH 1/5] 20251104commit
++++ 
++++ ---
++++  mindnlp/transformers/cache_utils.py           |  28 +-
++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++++index 5185270c..c6cd8757 100644
++++--- a/patches/0002-20251106commit.patch
+++++++ b/patches/0002-20251106commit.patch
++++@@ -1,7 +1,7 @@
++++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Thu, 6 Nov 2025 09:20:38 +0800
++++-Subject: [PATCH 2/4] 20251106commit
+++++Subject: [PATCH 2/5] 20251106commit
++++ 
++++ ---
++++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++++index 3e05f821..601960c9 100644
++++--- a/patches/0003-20261106secondcommit.patch
+++++++ b/patches/0003-20261106secondcommit.patch
++++@@ -1,7 +1,7 @@
++++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Thu, 6 Nov 2025 14:54:37 +0800
++++-Subject: [PATCH 3/4] 20261106secondcommit
+++++Subject: [PATCH 3/5] 20261106secondcommit
++++ 
++++ ---
++++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
++++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
++++index 88a1aef4..8976f10b 100644
++++--- a/patches/0004-20251106change.patch
+++++++ b/patches/0004-20251106change.patch
++++@@ -1,7 +1,7 @@
++++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++++ From: Pinoeer-kingxi <13022943007@163.com>
++++ Date: Thu, 6 Nov 2025 15:48:09 +0800
++++-Subject: [PATCH 4/4] 20251106change
+++++Subject: [PATCH 4/5] 20251106change
++++ 
++++ ---
++++  .../models/deepseek/modeling_deepseek.py      |  189 +-
++++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
++++new file mode 100644
++++index 00000000..8d9032be
++++--- /dev/null
+++++++ b/patches/0005-20251107001commit.patch
++++@@ -0,0 +1,7707 @@
+++++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+++++From: Pinoeer-kingxi <13022943007@163.com>
+++++Date: Fri, 7 Nov 2025 11:48:18 +0800
+++++Subject: [PATCH 5/5] 20251107001commit
+++++
+++++---
+++++ .../models/deepseek/modeling_deepseek.py      |   91 +-
+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+++++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+++++ patches/0001-20251104commit.patch             |    2 +-
+++++ patches/0002-20251106commit.patch             |    2 +-
+++++ patches/0003-20261106secondcommit.patch       |    2 +-
+++++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+++++ 7 files changed, 7577 insertions(+), 30 deletions(-)
+++++ create mode 100644 patches/0004-20251106change.patch
+++++
+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++index 0546f318..8831e4b7 100644
+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+++++     #         expert_cache += expert_out * weight
+++++     #     return expert_cache
+++++ 
+++++-    @no_grad()
+++++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++-        # x 的 shape: (1, hidden_size)
+++++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+++++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+++++-
+++++-        # 1. 收集所有需要的专家层
+++++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+++++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+++++-
+++++-        # 2. 并行计算所有专家的输出
+++++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+++++-        # ops.cat 会将它们堆叠成一个新的 Tensor
+++++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+++++-
+++++-        # 3. 使用矩阵乘法进行加权求和
+++++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+++++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+++++-        # 最终结果 final_output 的 shape: (1, hidden_size)
+++++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++++    # @no_grad()
++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++    #     # x 的 shape: (1, hidden_size)
++++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++++
++++++    #     # 1. 收集所有需要的专家层
++++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
++++++
++++++    #     # 2. 并行计算所有专家的输出
++++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
++++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++++
++++++    #     # 3. 使用矩阵乘法进行加权求和
++++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
++++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+++++         
+++++-        return final_output
++++++    #     return final_output
+++++ 
+++++ 
+++++     # @no_grad()
+++++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+++++             )
+++++ 
+++++         return expert_cache
++++++# 放置在 DeepseekMoE 类中
++++++    @no_grad()
++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++        """
++++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
++++++
++++++        Args:
++++++            x (Tensor): 输入张量, shape: (1, hidden_size)
++++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
++++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
++++++        """
++++++        top_k, _ = flat_expert_weights.shape
++++++        hidden_size = x.shape[-1]
++++++
++++++        # 1. 将所有专家的权重堆叠起来
++++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
++++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
++++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
++++++        
++++++        # 2. "收集" 所需的专家权重
++++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
++++++        selected_up_w = stacked_up_w[flat_expert_indices]
++++++        selected_down_w = stacked_down_w[flat_expert_indices]
++++++        
++++++        # 3. 准备输入
++++++        x_expanded = x.expand((top_k, 1, hidden_size))
++++++        
++++++        # 4. 并行计算 gate_proj 和 up_proj
++++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
++++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
++++++
++++++        # 5. 计算中间状态
++++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
++++++        
++++++        # 6. 并行计算 down_proj
++++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
++++++        #    --- [FIX] ---
++++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
++++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
++++++        #    --- [FIX END] ---
++++++        
++++++        # 7. 根据路由权重进行加权求和
++++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
++++++
++++++        return weighted_sum
++++++
++++++
+++++ 
+++++         # @no_grad()
+++++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++index ebd7782e..913a7609 100644
+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++-    x1 = x[..., : x.shape[-1] // 2]
+++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++++index d059dcbe..2b217b64 100644
+++++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
++++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+++++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++ def rotate_half(x):
+++++     """Rotates half the hidden dims of the input."""
+++++-    x1 = x[..., : x.shape[-1] // 2]
+++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++    # x2 = x[..., x.shape[-1] // 2 :]
++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++     return ops.cat((-x2, x1), dim=-1)
+++++ 
+++++ 
+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++index 78f22642..0a0ef2d7 100644
+++++--- a/patches/0001-20251104commit.patch
++++++++ b/patches/0001-20251104commit.patch
+++++@@ -1,7 +1,7 @@
+++++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++ From: Pinoeer-kingxi <13022943007@163.com>
+++++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++-Subject: [PATCH 1/3] 20251104commit
++++++Subject: [PATCH 1/4] 20251104commit
+++++ 
+++++ ---
+++++  mindnlp/transformers/cache_utils.py           |  28 +-
+++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+++++index 22b65dd5..5185270c 100644
+++++--- a/patches/0002-20251106commit.patch
++++++++ b/patches/0002-20251106commit.patch
+++++@@ -1,7 +1,7 @@
+++++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++++ From: Pinoeer-kingxi <13022943007@163.com>
+++++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+++++-Subject: [PATCH 2/3] 20251106commit
++++++Subject: [PATCH 2/4] 20251106commit
+++++ 
+++++ ---
+++++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+++++index 966529e4..3e05f821 100644
+++++--- a/patches/0003-20261106secondcommit.patch
++++++++ b/patches/0003-20261106secondcommit.patch
+++++@@ -1,7 +1,7 @@
+++++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++++ From: Pinoeer-kingxi <13022943007@163.com>
+++++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+++++-Subject: [PATCH 3/3] 20261106secondcommit
++++++Subject: [PATCH 3/4] 20261106secondcommit
+++++ 
+++++ ---
+++++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+++++new file mode 100644
+++++index 00000000..88a1aef4
+++++--- /dev/null
++++++++ b/patches/0004-20251106change.patch
+++++@@ -0,0 +1,7498 @@
++++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++Date: Thu, 6 Nov 2025 15:48:09 +0800
++++++Subject: [PATCH 4/4] 20251106change
++++++
++++++---
++++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
++++++ patches/0001-20251104commit.patch             | 1272 +++++++
++++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
++++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
++++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
++++++ create mode 100644 patches/0001-20251104commit.patch
++++++ create mode 100644 patches/0002-20251106commit.patch
++++++ create mode 100644 patches/0003-20261106secondcommit.patch
++++++
++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++index 2f9192bf..0546f318 100644
++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
++++++ 
++++++         return attn_output, attn_weights, past_key_value
++++++ 
++++++-# class DeepseekFlashAttention(nn.Module):
++++++-#     """
++++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++++++-
++++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
++++++-#     """
++++++-
++++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++++-#         super().__init__()
++++++-#         self.config = config
++++++-#         self.layer_idx = layer_idx
++++++-#         if layer_idx is None:
++++++-#             logger.warning(
++++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++-#                 "when creating this class."
++++++-#             )
++++++-
++++++-#         self.attention_dropout = config.attention_dropout
++++++-#         self.hidden_size = config.hidden_size
++++++-#         self.num_heads = config.num_attention_heads
++++++-#         self.head_dim = self.hidden_size // self.num_heads
++++++-#         self.num_key_value_heads = config.num_key_value_heads
++++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++-#         self.max_position_embeddings = config.max_position_embeddings
++++++-#         self.rope_theta = config.rope_theta
++++++-#         self.is_causal = True
++++++-
++++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++-#             raise ValueError(
++++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++-#                 f" and `num_heads`: {self.num_heads})."
++++++-#             )
++++++-
++++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++++-#         self._init_rope()
++++++-
++++++-#     def _init_rope(self):
++++++-#         if self.config.rope_scaling is None:
++++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
++++++-#                 self.head_dim,
++++++-#                 max_position_embeddings=self.max_position_embeddings,
++++++-#                 base=self.rope_theta,
++++++-#             )
++++++-#         else:
++++++-#             scaling_type = self.config.rope_scaling["type"]
++++++-#             scaling_factor = self.config.rope_scaling["factor"]
++++++-#             if scaling_type == "linear":
++++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++++-#                     self.head_dim,
++++++-#                     max_position_embeddings=self.max_position_embeddings,
++++++-#                     scaling_factor=scaling_factor,
++++++-#                     base=self.rope_theta,
++++++-#                 )
++++++-#             elif scaling_type == "dynamic":
++++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++++-#                     self.head_dim,
++++++-#                     max_position_embeddings=self.max_position_embeddings,
++++++-#                     scaling_factor=scaling_factor,
++++++-#                     base=self.rope_theta,
++++++-#                 )
++++++-#             else:
++++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++++-
++++++-#     def forward(
++++++-#         self,
++++++-#         hidden_states: mindspore.Tensor,
++++++-#         attention_mask: Optional[mindspore.Tensor] = None,
++++++-#         position_ids: Optional[mindspore.Tensor] = None,
++++++-#         past_key_value: Optional[Cache] = None,
++++++-#         output_attentions: bool = False,
++++++-#         use_cache: bool = False,
++++++-#         **kwargs,
++++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++-#         if "padding_mask" in kwargs:
++++++-#             warnings.warn(
++++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++++-#             )
++++++-        
++++++-#         if output_attentions:
++++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++++++-
++++++-#         bsz, q_len, _ = hidden_states.shape
++++++-
++++++-#         if self.config.pretraining_tp > 1:
++++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++++-
++++++-#         query_states = self.q_proj(hidden_states)
++++++-#         key_states = self.k_proj(hidden_states)
++++++-#         value_states = self.v_proj(hidden_states)
++++++-
++++++-#         # Reshape for multi-head attention
++++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++-
++++++-#         kv_seq_len = key_states.shape[-2]
++++++-#         if past_key_value is not None:
++++++-#             if self.layer_idx is None:
++++++-#                 raise ValueError(
++++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++-#                     "with a layer index."
++++++-#                 )
++++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++-        
++++++-#         # Apply Rotary Positional Embedding
++++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++-
++++++-#         if past_key_value is not None:
++++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++-
++++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++-        
++++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++-        
++++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++-
++++++-#         # Convert attention_mask for flash_attention_score
++++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++++++-#         if attention_mask is not None:
++++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++++-#                 raise ValueError(
++++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++++-#                 )
++++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++++++-#         else:
++++++-#             attn_mask_for_fa = None
++++++-        
++++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++++-
++++++-#         # Call the fused flash_attention_score operator
++++++-#         attn_output = mindspore.ops.flash_attention_score(
++++++-#             query=query_states_for_fa,
++++++-#             key=key_states_for_fa,
++++++-#             value=value_states_for_fa,
++++++-#             head_num=self.num_heads, # This is N1, the number of query heads
++++++-#             input_layout='BSH',
++++++-#             attn_mask=attn_mask_for_fa,
++++++-#             keep_prob=keep_prob,
++++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++-#             sparse_mode=0 # Default mask mode
++++++-#         )
++++++-        
++++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++++++-#         attn_output = self.o_proj(attn_output)
++++++-        
++++++-#         # Flash Attention does not return attention weights
++++++-#         attn_weights = None
++++++-
++++++-#         return attn_output, attn_weights, past_key_value
++++++ 
++++++ class DeepseekFlashAttention(nn.Module):
++++++     """
++++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
++++++         super().__init__()
++++++         self.hidden_size = config.hidden_size
++++++ 
++++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
++++++-            config=config, layer_idx=layer_idx
++++++-        )
+++++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+++++++            # config=config, layer_idx=layer_idx
+++++++        # )
++++++ 
++++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++++++             config=config, layer_idx=layer_idx
++++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
++++++         return outputs
++++++ 
++++++ 
++++++-
++++++ class DeepseekPreTrainedModel(PreTrainedModel):
++++++     config_class = DeepseekConfig
++++++     base_model_prefix = "model"
++++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++         # Initialize weights and apply final processing
++++++         self.post_init()
++++++         self.warm_up = False
++++++-        #@dwj
++++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++++++-            self.num_layers,
++++++-            self.num_attention_heads,
++++++-            self.head_dim,
++++++-            batch_size=1,
++++++-            max_length=self.max_length,
++++++-            dtype=mindspore.float16
++++++-        )
++++++-
++++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++++++-        key_cache = []
++++++-        value_cache = []
++++++-        for _ in range(num_layers):
++++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++-            key_cache.append(k)
++++++-            value_cache.append(v)
++++++-        return key_cache, value_cache
++++++-
++++++ 
++++++     def warmup_moe_model_deep(self):
++++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
++++++new file mode 100644
++++++index 00000000..78f22642
++++++--- /dev/null
+++++++++ b/patches/0001-20251104commit.patch
++++++@@ -0,0 +1,1272 @@
+++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++++From: Pinoeer-kingxi <13022943007@163.com>
+++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++++Subject: [PATCH 1/3] 20251104commit
+++++++
+++++++---
+++++++ mindnlp/transformers/cache_utils.py           |  28 +-
+++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+++++++
+++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++++index cadd2e04..02f8d4be 100644
+++++++--- a/mindnlp/transformers/cache_utils.py
++++++++++ b/mindnlp/transformers/cache_utils.py
+++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++++             # k_out[:, :, cache_position] = key_states
+++++++             # v_out[:, :, cache_position] = value_states
+++++++-            if ON_ORANGE_PI:
+++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++-            else:
+++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++-
++++++++            # if ON_ORANGE_PI:
++++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++++            # else:
++++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++++            # 确保 cache_position 是 1D tensor 并且类型正确
++++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
++++++++            if cache_position.ndim > 1:
++++++++                cache_position = cache_position.flatten()
++++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
++++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
++++++++                cache_position = cache_position.int()
++++++++            
++++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
++++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
++++++++            k_out[:, :, cache_position] = key_states
++++++++            v_out[:, :, cache_position] = value_states
++++++++            
+++++++         return k_out, v_out
+++++++ 
+++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++index c695b944..d8303e45 100644
+++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++++ def rotate_half(x):
+++++++     """Rotates half the hidden dims of the input."""
+++++++-    x1 = x[..., : x.shape[-1] // 2]
+++++++-    x2 = x[..., x.shape[-1] // 2 :]
++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++++    # x1 = x[..., : x.shape[-1] // 2]
++++++++    # x2 = x[..., x.shape[-1] // 2 :]
++++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++     return ops.cat((-x2, x1), dim=-1)
+++++++ 
+++++++ 
+++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++++         if self.training:
+++++++             raise NotImplementedError("Training is not supported yet.")
+++++++         else:
+++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++-        if self.config.n_shared_experts is not None:
+++++++-            y = y + self.shared_experts(identity)
+++++++-        return y
++++++++            # @lwx
++++++++            if orig_shape[1] == 1:
++++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
++++++++                y=y.view(*orig_shape)
++++++++                if self.config.n_shared_experts is not None:
++++++++                    y = y + self.shared_experts(identity)
++++++++                return y
++++++++            else:
++++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
++++++++                if self.config.n_shared_experts is not None:
++++++++                    y = y + self.shared_experts(identity)
++++++++                return y
++++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++++        # if self.config.n_shared_experts is not None:
++++++++        #     y = y + self.shared_experts(identity)
++++++++        # return y
++++++++        
++++++++    @no_grad()
++++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++++
++++++++        expert_cache = ops.zeros_like(x)
++++++++        for i in range(self.num_experts_per_tok):
++++++++            expert_id = flat_expert_indices[i].item()
++++++++            weight = flat_expert_weights[i].item()
++++++++            expert = self.experts[expert_id]
++++++++            expert_out = expert(x)
++++++++            expert_cache += expert_out * weight
++++++++        return expert_cache
+++++++ 
+++++++     @no_grad()
+++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-        # expert_cache = torch.zeros_like(x)
+++++++-        # idxs = flat_expert_indices.argsort()
+++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++-        # token_idxs = idxs // self.num_experts_per_tok
+++++++-        # for i, end_idx in enumerate(tokens_per_expert):
+++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++-        #     if start_idx == end_idx:
+++++++-        #         continue
+++++++-        #     expert = self.experts[i]
+++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-        #     expert_tokens = x[exp_token_idx]
+++++++-        #     expert_out = expert(expert_tokens)
+++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++-        # return expert_cache
++++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++         expert_cache = ops.zeros_like(x)
+++++++         idxs = flat_expert_indices.argsort()
+++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++         token_idxs = idxs // self.num_experts_per_tok
++++++++
+++++++         for i, end_idx in enumerate(tokens_per_expert):
+++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++             if start_idx == end_idx:
+++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++++             expert_out = expert(expert_tokens)
+++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++++
+++++++         return expert_cache
++++++++        
++++++++    # @no_grad()
++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++    #     # expert_cache = torch.zeros_like(x)
++++++++    #     # idxs = flat_expert_indices.argsort()
++++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++++    #     # token_idxs = idxs // self.num_experts_per_tok
++++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
++++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++++    #     #     if start_idx == end_idx:
++++++++    #     #         continue
++++++++    #     #     expert = self.experts[i]
++++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #     #     expert_tokens = x[exp_token_idx]
++++++++    #     #     expert_out = expert(expert_tokens)
++++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++++    #     # return expert_cache
++++++++    #     expert_cache = ops.zeros_like(x)
++++++++    #     idxs = flat_expert_indices.argsort()
++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++    #     for i, end_idx in enumerate(tokens_per_expert):
++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++    #         if start_idx == end_idx:
++++++++    #             continue
++++++++    #         expert = self.experts[i]
++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #         expert_tokens = x[exp_token_idx]
++++++++    #         expert_out = expert(expert_tokens)
++++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++++
++++++++    #     return expert_cache
++++++++    # @no_grad()
++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++    #     expert_cache = ops.zeros_like(x)
++++++++
++++++++    #     # 排序保证顺序一致
++++++++    #     idxs = flat_expert_indices.argsort()
++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++    #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++    #     # 找出有 token 的专家
++++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++++
++++++++    #     for i in active_experts.tolist():
++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++    #         end_idx = tokens_per_expert[i]
++++++++    #         if start_idx == end_idx:  # 没有 token
++++++++    #             continue
++++++++
++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++    #         expert_tokens = x[exp_token_idx]
++++++++    #         expert_out = self.experts[i](expert_tokens)
++++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++++
++++++++    #         expert_cache = mindspore.mint.scatter_add(
++++++++    #             expert_cache,
++++++++    #             0,
++++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++++    #             expert_out
++++++++    #         )
++++++++
++++++++    #     return expert_cache
++++++++
++++++++
+++++++ 
+++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++++ #     """
+++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++ 
+++++++         # Initialize weights and apply final processing
+++++++         self.post_init()
++++++++        self.warm_up = False
++++++++
++++++++    def warmup_moe_model_deep(self):
++++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
++++++++        test_texts = [
++++++++            "warmup short",
++++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
++++++++        ]
++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++++        if tokenizer is None:
++++++++            from mindnlp.transformers import AutoTokenizer
++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++++            self._warmup_tokenizer = tokenizer
++++++++
++++++++        for text in test_texts:
++++++++            inputs = tokenizer(text, return_tensors="ms")
++++++++            with mindspore._no_grad():
++++++++                _ = self(**inputs, use_cache=False)
++++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++++ 
+++++++     def get_input_embeddings(self):
+++++++         return self.model.embed_tokens
+++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++         ```"""
++++++++        if not self.warm_up:
++++++++            self.warm_up = True
++++++++            self.warmup_moe_model_deep()
++++++++
+++++++         output_attentions = (
+++++++             output_attentions
+++++++             if output_attentions is not None
+++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++index 3cbf820e..d4c6b651 100644
+++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++@@ -18,7 +18,6 @@
+++++++ # See the License for the specific language governing permissions and
+++++++ # limitations under the License.
+++++++ """MindSpore Qwen2MoE model."""
+++++++-
+++++++ import math
+++++++ from typing import List, Optional, Tuple, Union
+++++++ 
+++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++++     TokenClassifierOutput,
+++++++ )
+++++++ from ...modeling_utils import PreTrainedModel
++++++++from ...generation import GenerationMixin
+++++++ from ....utils import logging
+++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+++++++ 
+++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++++         self.variance_epsilon = eps
+++++++ 
+++++++     def forward(self, hidden_states):
++++++++        # @dwj
++++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++++        # @lwx
++++++++        # if not self.training :
++++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++         input_dtype = hidden_states.dtype
+++++++         hidden_states = hidden_states.to(mindspore.float32)
+++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++++@@ -234,6 +239,8 @@ def rotate_half(x):
+++++++     """Rotates half the hidden dims of the input."""
+++++++     x1 = x[..., : x.shape[-1] // 2]
+++++++     x2 = x[..., x.shape[-1] // 2 :]
++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
++++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++     return ops.cat((-x2, x1), dim=-1)
+++++++ 
+++++++ 
+++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++++         self.config = config
+++++++         self.hidden_size = config.hidden_size
+++++++         self.intermediate_size = intermediate_size
++++++++        
+++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++++         self.act_fn = ACT2FN[config.hidden_act]
+++++++ 
+++++++     def forward(self, x):
+++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++-
+++++++ 
++++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++++        # @lwx 
++++++++        # gate_up_output = self.gate_up_proj(x)
++++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
++++++++        # return self.down_proj(swiglu_output)
++++++++
++++++++    # def forward(self, x):
++++++++    #     gate_proj_out = self.gate_proj(x)
++++++++    #     up_proj_out = self.up_proj(x)
++++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
++++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
++++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
++++++++    #     return self.down_proj(swiglu_out)
++++++++        
+++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++++     """
+++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++++         use_cache: bool = False,
+++++++         cache_position: Optional[mindspore.Tensor] = None,
+++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++        
++++++++
+++++++         bsz, q_len, _ = hidden_states.shape
+++++++ 
+++++++         query_states = self.q_proj(hidden_states)
+++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++                     "with a layer index."
+++++++                 )
+++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++            if isinstance(past_key_value, StaticCache):
++++++++                kv_seq_len = key_states.shape[-2]
++++++++            else:
++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++ 
+++++++         if past_key_value is not None:
+++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++            
++++++++            if isinstance(past_key_value, StaticCache):
++++++++                kv_seq_len = key_states.shape[-2]
+++++++ 
+++++++         # repeat k/v heads if n_kv_heads < n_heads
+++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++-
++++++++        
+++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++ 
+++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++++-            raise ValueError(
+++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++++-                f" {attn_weights.shape}"
+++++++-            )
+++++++-
+++++++-        if attention_mask is not None:  # no matter the length, we just slice it
+++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
++++++++        if attention_mask is not None:
++++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++             attn_weights = attn_weights + causal_mask
+++++++ 
+++++++         # upcast attention to fp32
+++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++ 
+++++++         attn_output = self.o_proj(attn_output)
+++++++-
++++++++        # @lwx
++++++++        
++++++++        # max_seq_len = self.max_position_embeddings  # 2048
++++++++
++++++++        # if attention_mask is not None:
++++++++        #     # attention_mask: [B, 1, Sq, Sk]
++++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++++
++++++++        #     # pad 到 [max_seq_len, max_seq_len]
++++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++++        #     global_attention_mask = padded_mask
++++++++        # else:
++++++++        #     global_attention_mask = None
++++++++
++++++++
++++++++        # sparse_mode=3
++++++++        # attn_output = mindspore.ops.flash_attention_score(
++++++++        #     query=query_states,
++++++++        #     key=key_states,
++++++++        #     value=value_states,
++++++++        #     real_shift=None,
++++++++        #     padding_mask=None,
++++++++
++++++++        #     head_num=self.num_heads,
++++++++        #     attn_mask=global_attention_mask,
++++++++        #     keep_prob=1.0 - self.attention_dropout,
++++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++        #     input_layout="BNSD",
++++++++        #     pre_tokens=2147483647,
++++++++        #     next_tokens=2147483647,
++++++++        #     inner_precise=0,
++++++++        #     drop_mask=None,
++++++++        #     prefix=None,
++++++++        #     actual_seq_qlen=None,
++++++++        #     actual_seq_kvlen=None,
++++++++        #     sparse_mode=sparse_mode,
++++++++        # )
+++++++         if not output_attentions:
+++++++             attn_weights = None
+++++++ 
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++ 
++++++++class Qwen2MoeFlashAttention(nn.Module):
++++++++    """
++++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++++
++++++++    关键改动:
++++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++++       直接传入原始的 key 和 value 张量效率更高。
++++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++++    """
++++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++++        super().__init__()
++++++++        self.config = config
++++++++        self.layer_idx = layer_idx
++++++++        self.hidden_size = config.hidden_size
++++++++        self.num_heads = config.num_attention_heads
++++++++        self.head_dim = self.hidden_size // self.num_heads
++++++++        self.num_key_value_heads = config.num_key_value_heads
++++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++        self.max_position_embeddings = config.max_position_embeddings
++++++++        self.rope_theta = config.rope_theta
++++++++        self.attention_dropout = config.attention_dropout
++++++++
++++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++            raise ValueError(
++++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++++            )
++++++++
++++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++++
++++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++++            self.head_dim,
++++++++            max_position_embeddings=self.max_position_embeddings,
++++++++            base=self.rope_theta,
++++++++        )
++++++++
++++++++    def forward(
++++++++        self,
++++++++        hidden_states: mindspore.Tensor,
++++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++++        past_key_value: Optional[Cache] = None,
++++++++        output_attentions: bool = False,
++++++++        use_cache: bool = False,
++++++++        cache_position: Optional[mindspore.Tensor] = None,
++++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++        bsz, q_len, _ = hidden_states.shape
++++++++
++++++++        # 1. 线性投射 Q, K, V
++++++++        query_states = self.q_proj(hidden_states)
++++++++        key_states = self.k_proj(hidden_states)
++++++++        value_states = self.v_proj(hidden_states)
++++++++
++++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
++++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++        # 3. RoPE 旋转位置编码
++++++++        kv_seq_len = key_states.shape[-2]
++++++++        if past_key_value is not None:
++++++++            if self.layer_idx is None:
++++++++                raise ValueError(
++++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++                    "with a layer index."
++++++++                )
++++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++++                if cache_position.shape[0] == 1:
++++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++++                    kv_seq_len = past_seen_tokens + 1
++++++++                else:
++++++++                    # prefill 阶段：cache_position 是范围，使用其长度
++++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++++            else:
++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++        # 4. KV 缓存更新
++++++++        if past_key_value is not None:
++++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++            key_states, value_states = past_key_value.update(
++++++++                key_states, value_states, self.layer_idx, cache_kwargs
++++++++            )
++++++++            
++++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++                if cache_position.shape[0] == 1:
++++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++++                    kv_seq_len = key_states.shape[-2]
++++++++
++++++++        # 5. [重要] 准备 Attention Mask
++++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++++        fa_attention_mask = None
++++++++        if attention_mask is not None:
++++++++            # 截取与当前key长度匹配的部分
++++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++++            fa_attention_mask = (mask_slice != 0)
++++++++
++++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++++        input_dtype = query_states.dtype
++++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++++            query_states = query_states.to(mindspore.float16)
++++++++            key_states = key_states.to(mindspore.float16)
++++++++            value_states = value_states.to(mindspore.float16)
++++++++
++++++++        # 6. [核心] 调用 flash_attention_score 算子
++++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++++        attn_output = mindspore.ops.flash_attention_score(
++++++++            query=query_states,
++++++++            key=key_states,
++++++++            value=value_states,
++++++++            head_num=self.num_heads, # 传入Q的头数(N1)
++++++++            attn_mask=fa_attention_mask,
++++++++            keep_prob=1.0 - self.attention_dropout,
++++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++            input_layout="BNSD",
++++++++            sparse_mode=0 # 使用 defaultMask 模式
++++++++        )
++++++++
++++++++        # 恢复原始数据类型
++++++++        attn_output = attn_output.to(input_dtype)
++++++++
++++++++        # 7. 调整输出形状
++++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++        attn_output = self.o_proj(attn_output)
++++++++
++++++++        # FlashAttention 算子不直接返回注意力权重矩阵
++++++++        attn_weights = None
++++++++        if output_attentions:
++++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++        return attn_output, attn_weights, past_key_value
++++++++
++++++++    # def forward(
++++++++    #     self,
++++++++    #     hidden_states: mindspore.Tensor,
++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++++    #     past_key_value: Optional[Cache] = None,
++++++++    #     output_attentions: bool = False,
++++++++    #     use_cache: bool = False,
++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++    #     bsz, q_len, _ = hidden_states.shape
++++++++
++++++++    #     # 1. 线性投射 Q, K, V
++++++++    #     query_states = self.q_proj(hidden_states)
++++++++    #     key_states = self.k_proj(hidden_states)
++++++++    #     value_states = self.v_proj(hidden_states)
++++++++
++++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++    #     # 3. RoPE 旋转位置编码
++++++++    #     kv_seq_len = key_states.shape[-2]
++++++++    #     if past_key_value is not None:
++++++++    #         if self.layer_idx is None:
++++++++    #             raise ValueError(
++++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++    #                 "with a layer index."
++++++++    #             )
++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++    #     # 4. KV 缓存更新
++++++++    #     if past_key_value is not None:
++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++    #         key_states, value_states = past_key_value.update(
++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++++    #         )
++++++++
++++++++    #     # 5. 准备 Attention Mask
++++++++    #     fa_attention_mask = None
++++++++    #     if attention_mask is not None:
++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++    #         fa_attention_mask = (mask_slice != 0)
++++++++
++++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++++    #     input_dtype = query_states.dtype
++++++++
++++++++    #     # 6. [核心] 调用 flash_attention_score 算子
++++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++++    #         query=query_states,
++++++++    #         key=key_states,
++++++++    #         value=value_states,
++++++++    #         head_num=self.num_heads,
++++++++    #         attn_mask=fa_attention_mask,
++++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++    #         input_layout="BNSD",
++++++++    #         sparse_mode=0,
++++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++++    #         inner_precise=1
++++++++    #     )
++++++++
++++++++    #     # 恢复原始数据类型
++++++++    #     attn_output = attn_output.to(input_dtype)
++++++++
++++++++    #     # 7. 调整输出形状
++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++    #     attn_output = self.o_proj(attn_output)
++++++++
++++++++    #     attn_weights = None
++++++++    #     if output_attentions:
++++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++    #     return attn_output, attn_weights, past_key_value
++++++++
++++++++    # def forward(
++++++++    #     self,
++++++++    #     hidden_states: mindspore.Tensor,
++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
++++++++    #     past_key_value: Optional[Cache] = None,
++++++++    #     output_attentions: bool = False,
++++++++    #     use_cache: bool = False,
++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++    #     bsz, q_len, _ = hidden_states.shape
++++++++
++++++++    #     query_states = self.q_proj(hidden_states)
++++++++    #     key_states = self.k_proj(hidden_states)
++++++++    #     value_states = self.v_proj(hidden_states)
++++++++
++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++    #     kv_seq_len = key_states.shape[-2]
++++++++    #     if past_key_value is not None:
++++++++    #         if self.layer_idx is None:
++++++++    #             raise ValueError("`layer_idx` must be specified for caching")
++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++    #     if past_key_value is not None:
++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++    #         key_states, value_states = past_key_value.update(
++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
++++++++    #         )
++++++++
++++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++
++++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
++++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
++++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
++++++++    #     query_states = query_states / math.sqrt(self.head_dim)
++++++++    #     # <--- 修改结束 ---
++++++++
++++++++    #     fa_attention_mask = None
++++++++    #     if attention_mask is not None:
++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++    #         fa_attention_mask = (mask_slice != 0)
++++++++
++++++++    #     input_dtype = query_states.dtype
++++++++
++++++++    #     attn_output = mindspore.ops.flash_attention_score(
++++++++    #         query=query_states,    # 传入已经预先缩放过的 query
++++++++    #         key=key_states,
++++++++    #         value=value_states,
++++++++    #         head_num=self.num_heads,
++++++++    #         attn_mask=fa_attention_mask,
++++++++    #         keep_prob=1.0 - self.attention_dropout,
++++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
++++++++    #         input_layout="BNSD",
++++++++    #         sparse_mode=0,
++++++++    #         inner_precise=1        # 仍然保持内部高精度计算
++++++++    #     )
++++++++
++++++++    #     attn_output = attn_output.to(input_dtype)
++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++    #     attn_output = self.o_proj(attn_output)
++++++++
++++++++    #     attn_weights = None
++++++++    #     if output_attentions:
++++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++++
++++++++    #     return attn_output, attn_weights, past_key_value
++++++++
+++++++ QWEN2MOE_ATTENTION_CLASSES = {
+++++++     "eager": Qwen2MoeAttention,
++++++++    "flash-attention": Qwen2MoeFlashAttention,
+++++++ }
+++++++ 
+++++++ 
+++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++ 
++++++++    #@dwj
++++++++    # 只遍历激活的专家，而非全部专家
+++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+++++++-        # router_logits: (batch * sequence_length, n_experts)
+++++++-        router_logits = self.gate(hidden_states)
+++++++-
+++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-        if self.norm_topk_prob:
+++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        # we cast back to the input dtype
+++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-
+++++++-        final_hidden_states = ops.zeros(
+++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++++-        )
+++++++-
+++++++-        # One hot encode the selected experts to create an expert mask
+++++++-        # this will be used to easily index which expert is going to be sollicitated
+++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++++-
+++++++-        # Loop over all available experts in the model and perform the computation on each expert
+++++++-        for expert_idx in range(self.num_experts):
+++++++-            expert_layer = self.experts[expert_idx]
+++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++++-
+++++++-            # Index the correct hidden states and compute the expert hidden state for
+++++++-            # the current expert. We need to make sure to multiply the output hidden
+++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++++-            if 0 not in idx.shape:
+++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++++-
+++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+++++++-                # the `top_x` tensor here.
+++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++++-
+++++++-        shared_expert_output = self.shared_expert(hidden_states)
+++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++++-
+++++++-        final_hidden_states = final_hidden_states + shared_expert_output
++++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++            num_tokens = hidden_states_reshaped.shape[0]
++++++++
++++++++            router_logits = self.gate(hidden_states_reshaped)
++++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++
++++++++            if self.norm_topk_prob:
++++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++            routing_weights = routing_weights.to(hidden_states.dtype)
++++++++            
++++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++++            flat_selected_experts = selected_experts.flatten()
++++++++            
++++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++++            token_indices = broadcasted_token_indices.flatten()
++++++++            
++++++++            active_experts = ops.unique(flat_selected_experts)
++++++++            
++++++++            for expert_idx_tensor in active_experts:
++++++++                expert_idx = expert_idx_tensor.item()
++++++++                expert_layer = self.experts[expert_idx]
++++++++                
++++++++                mask = (flat_selected_experts == expert_idx_tensor)
++++++++                selected_token_indices = token_indices[mask]
++++++++                selected_routing_weights = routing_weights.flatten()[mask]
++++++++                
++++++++                current_states = hidden_states_reshaped[selected_token_indices]
++++++++                
++++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++                
++++++++                final_hidden_states = final_hidden_states.index_add(
++++++++                    dim=0,
++++++++                    index=selected_token_indices,
++++++++                    source=expert_output.to(hidden_states.dtype)
++++++++                )
++++++++            
++++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++++ 
+++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++-        return final_hidden_states, router_logits
++++++++            final_hidden_states = final_hidden_states + shared_expert_output
++++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++++            
++++++++            return final_hidden_states, router_logits
+++++++ 
+++++++ 
+++++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++++ 
+++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++ 
++++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++++
+++++++         if (layer_idx not in config.mlp_only_layers) and (
+++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++++         ):
+++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++++     _skip_keys_device_placement = "past_key_values"
+++++++     _supports_cache_class = True
++++++++#lwx
++++++++    # _supports_static_cache = True
+++++++ 
+++++++     def _init_weights(self, module):
+++++++         std = self.config.initializer_range
+++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++++         return causal_mask
+++++++ 
+++++++ 
+++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++     _tied_weights_keys = ["lm_head.weight"]
+++++++ 
+++++++     def __init__(self, config):
+++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++         self.num_experts_per_tok = config.num_experts_per_tok
+++++++         # Initialize weights and apply final processing
+++++++         self.post_init()
++++++++        # @lwx
++++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
++++++++        #     self.generation_config.cache_implementation = "static"
++++++++        self._warmed_up = False
++++++++
++++++++    def warmup_moe_model(self):
++++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
++++++++        test_texts = [
++++++++            "warmup short",
++++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
++++++++        ]
++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
++++++++        if tokenizer is None:
++++++++            from mindnlp.transformers import AutoTokenizer
++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
++++++++            self._warmup_tokenizer = tokenizer
++++++++
++++++++        for text in test_texts:
++++++++            inputs = tokenizer(text, return_tensors="ms")
++++++++            with mindspore._no_grad():
++++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
++++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++++ 
+++++++     def get_input_embeddings(self):
+++++++         return self.model.embed_tokens
+++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++         ```"""
++++++++        if not self._warmed_up:
++++++++            self._warmed_up = True
++++++++            self.warmup_moe_model()
+++++++ 
+++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++++         output_router_logits = (
+++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++             }
+++++++         )
+++++++         return model_inputs
++++++++# @lwx
++++++++    # def _decode_one_tokens_logits(
++++++++    #     self,
++++++++    #     cur_token: mindspore.Tensor,
++++++++    #     input_pos: Optional[mindspore.Tensor],
++++++++    #     cache_position: mindspore.Tensor,
++++++++    #     past_key_values: StaticCache,
++++++++    # ) -> mindspore.Tensor:
++++++++    #     """
++++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
++++++++        
++++++++    #     Args:
++++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
++++++++    #         input_pos: 输入位置信息，可选
++++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
++++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
++++++++            
++++++++    #     Returns:
++++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
++++++++    #     """
++++++++    #     # 调用JIT编译的版本
++++++++    #     return self.get_decode_one_tokens_logits(
++++++++    #         cur_token=cur_token,
++++++++    #         input_pos=input_pos,
++++++++    #         cache_position=cache_position,
++++++++    #         past_key_values=past_key_values,
++++++++    #     )
++++++++    
++++++++    # @mindspore.jit(jit_level='O1')
++++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
++++++++    #     """
++++++++    #     JIT编译的函数，用于高效的单token解码
++++++++    #     使用JIT编译优化以支持静态shape和高效执行
++++++++        
++++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
++++++++    #     """
++++++++    #     outputs = self.model.forward(
++++++++    #         input_ids=cur_token,
++++++++    #         position_ids=input_pos,
++++++++    #         cache_position=cache_position,
++++++++    #         past_key_values=past_key_values,
++++++++    #         use_cache=True,
++++++++    #         return_dict=False,
++++++++    #     )
++++++++        
++++++++    #     hidden_states = outputs[0]
++++++++    #     logits = self.lm_head.forward(hidden_states)
++++++++    #     logits = logits.float()
++++++++        
++++++++    #     return logits[:, -1, :]
++++++++
++++++++    # def _sample(
++++++++    #     self,
++++++++    #     input_ids: mindspore.Tensor,
++++++++    #     logits_processor,
++++++++    #     stopping_criteria,
++++++++    #     generation_config,
++++++++    #     synced_devices: bool,
++++++++    #     streamer=None,
++++++++    #     logits_warper=None,
++++++++    #     **model_kwargs,
++++++++    # ):
++++++++    #     """
++++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
++++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
++++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
++++++++    #     """
++++++++    #     from ...generation.logits_process import LogitsProcessorList
++++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
++++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
++++++++    #     from mindnlp.core import nn, ops, no_grad
++++++++    #     import numpy as np
++++++++        
++++++++    #     # 检查是否使用 StaticCache
++++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
++++++++    #     # 否则，直接调用父类方法
++++++++    #     past_key_values = model_kwargs.get("past_key_values")
++++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
++++++++
++++++++    #     if not isinstance(past_key_values, StaticCache):
++++++++    #         # 不使用 StaticCache，直接调用父类方法
++++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
++++++++    #         return super()._sample(
++++++++    #             input_ids=input_ids,
++++++++    #             logits_processor=logits_processor,
++++++++    #             stopping_criteria=stopping_criteria,
++++++++    #             generation_config=generation_config,
++++++++    #             synced_devices=synced_devices,
++++++++    #             streamer=streamer,
++++++++    #             logits_warper=logits_warper,
++++++++    #             **model_kwargs,
++++++++    #         )
++++++++        
++++++++    #     # 使用 StaticCache，进入自定义循环
++++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
++++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
++++++++    #     pad_token_id = generation_config._pad_token_tensor
++++++++    #     output_attentions = generation_config.output_attentions
++++++++    #     output_hidden_states = generation_config.output_hidden_states
++++++++    #     output_scores = generation_config.output_scores
++++++++    #     output_logits = generation_config.output_logits
++++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
++++++++    #     max_length = generation_config.max_length
++++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
++++++++    #     do_sample = generation_config.do_sample
++++++++        
++++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
++++++++    #         raise ValueError(
++++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
++++++++    #             f"{logits_warper})."
++++++++    #         )
++++++++        
++++++++    #     # init attention / hidden states / scores tuples
++++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
++++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
++++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
++++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
++++++++        
++++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
++++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
++++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
++++++++    #         encoder_hidden_states = (
++++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
++++++++    #         )
++++++++        
++++++++    #     # keep track of which sequences are already finished
++++++++    #     batch_size, cur_len = input_ids.shape
++++++++    #     this_peer_finished = False
++++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
++++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
++++++++        
++++++++    #     time_record = []
++++++++    #     from ....utils.testing_utils import parse_flag_from_env
++++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
++++++++        
++++++++    #     while self._has_unfinished_sequences(
++++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
++++++++    #     ):
++++++++    #         if _record_time:
++++++++    #             import time as time_module
++++++++    #             infer_start = time_module.time()
++++++++            
++++++++    #         # prepare model inputs
++++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
++++++++            
++++++++    #         # prepare variable output controls
++++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
++++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
++++++++            
++++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
++++++++    #         cur_cache_position = model_inputs.get("cache_position")
++++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
++++++++    #         cur_input_ids = model_inputs.get("input_ids")
++++++++            
++++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
++++++++    #             cur_cache_position is not None and 
++++++++    #             len(cur_cache_position.shape) > 0 and
++++++++    #             cur_cache_position.shape[0] == 1 and
++++++++    #             cur_input_ids is not None and
++++++++    #             cur_input_ids.shape[1] == 1):
++++++++    #             # 使用 JIT 优化的单 token 解码
++++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
++++++++    #             if not hasattr(self, '_jit_used'):
++++++++    #                 self._jit_used = False
++++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
++++++++                
++++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
++++++++    #                 cur_token=cur_input_ids,
++++++++    #                 input_pos=model_inputs.get("position_ids"),
++++++++    #                 cache_position=cur_cache_position,
++++++++    #                 past_key_values=cur_past_key_values,
++++++++    #             )
++++++++                
++++++++    #             # 标记已使用JIT（用于后续判断）
++++++++    #             if not self._jit_used:
++++++++    #                 self._jit_used = True
++++++++                
++++++++    #             # 构造兼容的输出对象
++++++++    #             class JitOptimizedOutput:
++++++++    #                 def __init__(self, logits, config):
++++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
++++++++    #                     self.config = config
++++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
++++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
++++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
++++++++    #                     self.cross_attentions = None
++++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
++++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
++++++++                
++++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
++++++++    #         else:
++++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
++++++++    #             outputs = self(**model_inputs, return_dict=True)
++++++++            
++++++++    #         if synced_devices and this_peer_finished:
++++++++    #             continue
++++++++            
++++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
++++++++    #         next_token_logits = outputs.logits[:, -1, :]
++++++++            
++++++++    #         # pre-process distribution
++++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
++++++++    #         if do_sample:
++++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
++++++++            
++++++++    #         # Store scores, attentions and hidden_states when required
++++++++    #         if return_dict_in_generate:
++++++++    #             if output_scores:
++++++++    #                 scores += (next_token_scores,)
++++++++    #             if output_logits:
++++++++    #                 raw_logits += (next_token_logits,)
++++++++    #             if output_attentions:
++++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
++++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
++++++++    #                 if self.config.is_encoder_decoder:
++++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
++++++++                
++++++++    #             if output_hidden_states:
++++++++    #                 hidden = (
++++++++    #                     outputs.decoder_hidden_states
++++++++    #                     if self.config.is_encoder_decoder
++++++++    #                     else outputs.hidden_states
++++++++    #                 )
++++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
++++++++            
++++++++    #         # token selection
++++++++    #         if do_sample:
++++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
++++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
++++++++    #         else:
++++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
++++++++            
++++++++    #         # finished sentences should have their next token be a padding token
++++++++    #         if has_eos_stopping_criteria:
++++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
++++++++            
++++++++    #         # update generated ids, model inputs, and length for next step
++++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
++++++++    #         if streamer is not None:
++++++++    #             streamer.put(next_tokens)
++++++++            
++++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
++++++++    #             outputs,
++++++++    #             model_kwargs,
++++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
++++++++    #         )
++++++++            
++++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
++++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
++++++++    #         cur_len += 1
++++++++            
++++++++    #         if _record_time:
++++++++    #             import time as time_module
++++++++    #             infer_stop = time_module.time()
++++++++    #             time_record.append(infer_stop - infer_start)
++++++++            
++++++++    #         del outputs
++++++++        
++++++++    #     average_infer_time = None
++++++++    #     if time_record:
++++++++    #         if len(time_record) > 1:
++++++++    #             time_record.pop(0)
++++++++    #         average_infer_time = sum(time_record) / len(time_record)
++++++++    #         print(f'average inference time is: {average_infer_time}')
++++++++    #         print(f'inference time record: {time_record}')
++++++++        
++++++++    #     if streamer is not None:
++++++++    #         streamer.end()
++++++++        
++++++++    #     # 简单判断：打印是否使用了JIT路径
++++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
++++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
++++++++    #     else:
++++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
++++++++        
++++++++    #     if return_dict_in_generate:
++++++++    #         if self.config.is_encoder_decoder:
++++++++    #             return GenerateEncoderDecoderOutput(
++++++++    #                 sequences=input_ids,
++++++++    #                 scores=scores,
++++++++    #                 logits=raw_logits,
++++++++    #                 encoder_attentions=encoder_attentions,
++++++++    #                 encoder_hidden_states=encoder_hidden_states,
++++++++    #                 decoder_attentions=decoder_attentions,
++++++++    #                 cross_attentions=cross_attentions,
++++++++    #                 decoder_hidden_states=decoder_hidden_states,
++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++++    #                 average_infer_time=average_infer_time
++++++++    #             )
++++++++    #         else:
++++++++    #             return GenerateDecoderOnlyOutput(
++++++++    #                 sequences=input_ids,
++++++++    #                 scores=scores,
++++++++    #                 logits=raw_logits,
++++++++    #                 attentions=decoder_attentions,
++++++++    #                 hidden_states=decoder_hidden_states,
++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
++++++++    #                 average_infer_time=average_infer_time
++++++++    #             )
++++++++    #     else:
++++++++    #         return input_ids
++++++++            
++++++++    # def _prepare_cache_for_generation(
++++++++    #     self,
++++++++    #     generation_config,
++++++++    #     model_kwargs,
++++++++    #     assistant_model,
++++++++    #     batch_size,
++++++++    #     max_cache_length,
++++++++    # ):
++++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
++++++++    #         generation_config.cache_implementation = "static"
++++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
++++++++        
++++++++    #     if generation_config.cache_implementation == "static":
++++++++    #         base_required_from_max_length = generation_config.max_length + 1
++++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
++++++++    #         min_cache_size = 50
++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
++++++++    #         else:
++++++++    #             max_cache_length = max(base_required, min_cache_size)
++++++++            
++++++++    #         original_max_cache_length = max_cache_length
++++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
++++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
++++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
++++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
++++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
++++++++            
++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
++++++++    #             if max_cache_length > self.config.max_position_embeddings:
++++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
++++++++        
++++++++    #     result = super()._prepare_cache_for_generation(
++++++++    #         generation_config=generation_config,
++++++++    #         model_kwargs=model_kwargs,
++++++++    #         assistant_model=assistant_model,
++++++++    #         batch_size=batch_size,
++++++++    #         max_cache_length=max_cache_length,
++++++++    #     )
++++++++        
++++++++    #     if generation_config.cache_implementation == "static":
++++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
++++++++    #         created_cache = model_kwargs.get(cache_name)
++++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
++++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
++++++++    #             if created_cache.max_cache_len < generation_config.max_length:
++++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
++++++++        
++++++++    #     return result
++++++++
++++++++
++++++++
+++++++ 
+++++++ 
+++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++++-- 
+++++++2.27.0
+++++++
++++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
++++++new file mode 100644
++++++index 00000000..22b65dd5
++++++--- /dev/null
+++++++++ b/patches/0002-20251106commit.patch
++++++@@ -0,0 +1,3200 @@
+++++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+++++++From: Pinoeer-kingxi <13022943007@163.com>
+++++++Date: Thu, 6 Nov 2025 09:20:38 +0800
+++++++Subject: [PATCH 2/3] 20251106commit
+++++++
+++++++---
+++++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+++++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+++++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+++++++ create mode 100644 patches/0001-20251104commit.patch
+++++++
+++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++index d8303e45..73773c22 100644
+++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+++++++         #     y = y + self.shared_experts(identity)
+++++++         # return y
+++++++         
++++++++    # @no_grad()
++++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++++
++++++++    #     expert_cache = ops.zeros_like(x)
++++++++    #     for i in range(self.num_experts_per_tok):
++++++++    #         expert_id = flat_expert_indices[i].item()
++++++++    #         weight = flat_expert_weights[i].item()
++++++++    #         expert = self.experts[expert_id]
++++++++    #         expert_out = expert(x)
++++++++    #         expert_cache += expert_out * weight
++++++++    #     return expert_cache
++++++++
+++++++     @no_grad()
+++++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
++++++++        # x 的 shape: (1, hidden_size)
++++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
++++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
++++++++
++++++++        # 1. 收集所有需要的专家层
++++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
++++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
++++++++
++++++++        # 2. 并行计算所有专家的输出
++++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
++++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
++++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
++++++++
++++++++        # 3. 使用矩阵乘法进行加权求和
++++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
++++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
++++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
++++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
++++++++        
++++++++        return final_output
+++++++ 
+++++++-        expert_cache = ops.zeros_like(x)
+++++++-        for i in range(self.num_experts_per_tok):
+++++++-            expert_id = flat_expert_indices[i].item()
+++++++-            weight = flat_expert_weights[i].item()
+++++++-            expert = self.experts[expert_id]
+++++++-            expert_out = expert(x)
+++++++-            expert_cache += expert_out * weight
+++++++-        return expert_cache
+++++++ 
+++++++     @no_grad()
+++++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+++++++             key_states = self.k_proj(hidden_states)
+++++++             value_states = self.v_proj(hidden_states)
+++++++ 
+++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++++        # @lwx
++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
++++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
++++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+++++++ 
+++++++         kv_seq_len = key_states.shape[-2]
+++++++         if past_key_value is not None:
+++++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++ 
++++++++# class DeepseekFlashAttention(nn.Module):
++++++++#     """
++++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
++++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
++++++++
++++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
++++++++#     """
++++++++
++++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++++++#         super().__init__()
++++++++#         self.config = config
++++++++#         self.layer_idx = layer_idx
++++++++#         if layer_idx is None:
++++++++#             logger.warning(
++++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++++#                 "when creating this class."
++++++++#             )
++++++++
++++++++#         self.attention_dropout = config.attention_dropout
++++++++#         self.hidden_size = config.hidden_size
++++++++#         self.num_heads = config.num_attention_heads
++++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++++#         self.rope_theta = config.rope_theta
++++++++#         self.is_causal = True
++++++++
++++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++#             raise ValueError(
++++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++++#                 f" and `num_heads`: {self.num_heads})."
++++++++#             )
++++++++
++++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++++++#         self._init_rope()
++++++++
++++++++#     def _init_rope(self):
++++++++#         if self.config.rope_scaling is None:
++++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
++++++++#                 self.head_dim,
++++++++#                 max_position_embeddings=self.max_position_embeddings,
++++++++#                 base=self.rope_theta,
++++++++#             )
++++++++#         else:
++++++++#             scaling_type = self.config.rope_scaling["type"]
++++++++#             scaling_factor = self.config.rope_scaling["factor"]
++++++++#             if scaling_type == "linear":
++++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++++++#                     self.head_dim,
++++++++#                     max_position_embeddings=self.max_position_embeddings,
++++++++#                     scaling_factor=scaling_factor,
++++++++#                     base=self.rope_theta,
++++++++#                 )
++++++++#             elif scaling_type == "dynamic":
++++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++++++#                     self.head_dim,
++++++++#                     max_position_embeddings=self.max_position_embeddings,
++++++++#                     scaling_factor=scaling_factor,
++++++++#                     base=self.rope_theta,
++++++++#                 )
++++++++#             else:
++++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++++++
++++++++#     def forward(
++++++++#         self,
++++++++#         hidden_states: mindspore.Tensor,
++++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++++#         past_key_value: Optional[Cache] = None,
++++++++#         output_attentions: bool = False,
++++++++#         use_cache: bool = False,
++++++++#         **kwargs,
++++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++#         if "padding_mask" in kwargs:
++++++++#             warnings.warn(
++++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++++++#             )
++++++++        
++++++++#         if output_attentions:
++++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
++++++++
++++++++#         bsz, q_len, _ = hidden_states.shape
++++++++
++++++++#         if self.config.pretraining_tp > 1:
++++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++++++
++++++++#         query_states = self.q_proj(hidden_states)
++++++++#         key_states = self.k_proj(hidden_states)
++++++++#         value_states = self.v_proj(hidden_states)
++++++++
++++++++#         # Reshape for multi-head attention
++++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++#         kv_seq_len = key_states.shape[-2]
++++++++#         if past_key_value is not None:
++++++++#             if self.layer_idx is None:
++++++++#                 raise ValueError(
++++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++#                     "with a layer index."
++++++++#                 )
++++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++        
++++++++#         # Apply Rotary Positional Embedding
++++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++#         if past_key_value is not None:
++++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
++++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++
++++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
++++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
++++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++        
++++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++++        
++++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
++++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
++++++++
++++++++#         # Convert attention_mask for flash_attention_score
++++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
++++++++#         if attention_mask is not None:
++++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
++++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++++++#                 raise ValueError(
++++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++++++#                 )
++++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
++++++++#         else:
++++++++#             attn_mask_for_fa = None
++++++++        
++++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++++++
++++++++#         # Call the fused flash_attention_score operator
++++++++#         attn_output = mindspore.ops.flash_attention_score(
++++++++#             query=query_states_for_fa,
++++++++#             key=key_states_for_fa,
++++++++#             value=value_states_for_fa,
++++++++#             head_num=self.num_heads, # This is N1, the number of query heads
++++++++#             input_layout='BSH',
++++++++#             attn_mask=attn_mask_for_fa,
++++++++#             keep_prob=keep_prob,
++++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++#             sparse_mode=0 # Default mask mode
++++++++#         )
++++++++        
++++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
++++++++#         attn_output = self.o_proj(attn_output)
++++++++        
++++++++#         # Flash Attention does not return attention weights
++++++++#         attn_weights = None
++++++++
++++++++#         return attn_output, attn_weights, past_key_value
++++++++
++++++++class DeepseekFlashAttention(nn.Module):
++++++++    """
++++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
++++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
++++++++    designed for high performance on supported hardware (Ascend).
++++++++
++++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
++++++++    """
++++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
++++++++        super().__init__()
++++++++        self.config = config
++++++++        self.layer_idx = layer_idx
++++++++        if layer_idx is None:
++++++++            logger.warning(
++++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++++                "when creating this class."
++++++++            )
++++++++
++++++++        # --- [FIX] Correctly initialize all required attributes ---
++++++++        self.attention_dropout = config.attention_dropout
++++++++        self.hidden_size = config.hidden_size
++++++++        self.num_heads = config.num_attention_heads
++++++++        self.head_dim = self.hidden_size // self.num_heads
++++++++        self.num_key_value_heads = config.num_key_value_heads
++++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++        self.max_position_embeddings = config.max_position_embeddings
++++++++        self.rope_theta = config.rope_theta
++++++++        self.is_causal = True
++++++++
++++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++            raise ValueError(
++++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++++                f" and `num_heads`: {self.num_heads})."
++++++++            )
++++++++
++++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
++++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
++++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
++++++++        
++++++++        # This call will now succeed as all attributes are initialized.
++++++++        self._init_rope()
++++++++
++++++++    def _init_rope(self):
++++++++        if self.config.rope_scaling is None:
++++++++            self.rotary_emb = DeepseekRotaryEmbedding(
++++++++                self.head_dim,
++++++++                max_position_embeddings=self.max_position_embeddings,
++++++++                base=self.rope_theta,
++++++++            )
++++++++        else:
++++++++            scaling_type = self.config.rope_scaling["type"]
++++++++            scaling_factor = self.config.rope_scaling["factor"]
++++++++            if scaling_type == "linear":
++++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
++++++++                    self.head_dim,
++++++++                    max_position_embeddings=self.max_position_embeddings,
++++++++                    scaling_factor=scaling_factor,
++++++++                    base=self.rope_theta,
++++++++                )
++++++++            elif scaling_type == "dynamic":
++++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
++++++++                    self.head_dim,
++++++++                    max_position_embeddings=self.max_position_embeddings,
++++++++                    scaling_factor=scaling_factor,
++++++++                    base=self.rope_theta,
++++++++                )
++++++++            else:
++++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
++++++++
++++++++    def forward(
++++++++        self,
++++++++        hidden_states: mindspore.Tensor,
++++++++        attention_mask: Optional[mindspore.Tensor] = None,
++++++++        position_ids: Optional[mindspore.Tensor] = None,
++++++++        past_key_value: Optional[Cache] = None,
++++++++        output_attentions: bool = False,
++++++++        use_cache: bool = False,
++++++++        **kwargs,
++++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++        if "padding_mask" in kwargs:
++++++++            warnings.warn(
++++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
++++++++            )
++++++++        if output_attentions:
++++++++            warnings.warn(
++++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
++++++++            )
++++++++
++++++++        bsz, q_len, _ = hidden_states.shape
++++++++
++++++++        if self.config.pretraining_tp > 1:
++++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
++++++++
++++++++        query_states = self.q_proj(hidden_states)
++++++++        key_states = self.k_proj(hidden_states)
++++++++        value_states = self.v_proj(hidden_states)
++++++++
++++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++        kv_seq_len = key_states.shape[-2]
++++++++        if past_key_value is not None:
++++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++        
++++++++        # Apply Rotary Position Embedding
++++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++        if past_key_value is not None:
++++++++            cache_kwargs = {"sin": sin, "cos": cos}
++++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++
++++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
++++++++        # So we must explicitly repeat the KV heads.
++++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++
++++++++        # Convert attention mask for flash_attention_score
++++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
++++++++        if attention_mask is not None:
++++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
++++++++                 raise ValueError(
++++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
++++++++                )
++++++++            attn_mask_for_fa = attention_mask < 0
++++++++        else:
++++++++            attn_mask_for_fa = None
++++++++
++++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
++++++++
++++++++        # Call the fused operator using the efficient BNSD layout
++++++++        attn_output = mindspore.ops.flash_attention_score(
++++++++            query=query_states,
++++++++            key=key_states,
++++++++            value=value_states,
++++++++            head_num=self.num_heads,
++++++++            input_layout='BNSD', # Specify the correct layout
++++++++            attn_mask=attn_mask_for_fa,
++++++++            keep_prob=keep_prob,
++++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
++++++++        )
++++++++        
++++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
++++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++        
++++++++        # Apply output projection
++++++++        attn_output = self.o_proj(attn_output)
++++++++
++++++++        # Flash attention does not return attention weights, so we return None.
++++++++        attn_weights = None
++++++++
++++++++        return attn_output, attn_weights, past_key_value
++++++++
+++++++ Deepseek_ATTENTION_CLASSES = {
+++++++     "eager": DeepseekAttention,
++++++++    "flash-attention": DeepseekFlashAttention,
+++++++ }
+++++++ 
+++++++ 
+++++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+++++++             config=config, layer_idx=layer_idx
+++++++         )
+++++++ 
++++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
++++++++            config=config, layer_idx=layer_idx
++++++++        )
++++++++
+++++++         self.mlp = (
+++++++             DeepseekMoE(config)
+++++++             if (
+++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++index d4c6b651..bced285c 100644
+++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+++++++ 
+++++++ import mindspore
+++++++ import mindnlp.core.nn.functional as F
+++++++-from mindnlp.core import nn, ops
++++++++from mindnlp.core import nn, ops, no_grad
+++++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+++++++ 
+++++++ from ....common.activations import ACT2FN
+++++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++++++ 
++++++++Long_Prompt = False
++++++++PROMPT_LENGTH_THRESHOLD = 128
+++++++ 
+++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++ 
++++++++# class Qwen2MoeFlashAttention(nn.Module):
++++++++#     """
++++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
++++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
++++++++
++++++++#     关键改动:
++++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
++++++++#        直接传入原始的 key 和 value 张量效率更高。
++++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
++++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++++#         super().__init__()
++++++++#         self.config = config
++++++++#         self.layer_idx = layer_idx
++++++++#         self.hidden_size = config.hidden_size
++++++++#         self.num_heads = config.num_attention_heads
++++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++++#         self.rope_theta = config.rope_theta
++++++++#         self.attention_dropout = config.attention_dropout
++++++++
++++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++#             raise ValueError(
++++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
++++++++#             )
++++++++
++++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++++
++++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++++#             self.head_dim,
++++++++#             max_position_embeddings=self.max_position_embeddings,
++++++++#             base=self.rope_theta,
++++++++#         )
++++++++
++++++++#     def forward(
++++++++#         self,
++++++++#         hidden_states: mindspore.Tensor,
++++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++++#         past_key_value: Optional[Cache] = None,
++++++++#         output_attentions: bool = False,
++++++++#         use_cache: bool = False,
++++++++#         cache_position: Optional[mindspore.Tensor] = None,
++++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++#         bsz, q_len, _ = hidden_states.shape
++++++++
++++++++#         # 1. 线性投射 Q, K, V
++++++++#         query_states = self.q_proj(hidden_states)
++++++++#         key_states = self.k_proj(hidden_states)
++++++++#         value_states = self.v_proj(hidden_states)
++++++++
++++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
++++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++#         # 3. RoPE 旋转位置编码
++++++++#         kv_seq_len = key_states.shape[-2]
++++++++#         if past_key_value is not None:
++++++++#             if self.layer_idx is None:
++++++++#                 raise ValueError(
++++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++#                     "with a layer index."
++++++++#                 )
++++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
++++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
++++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
++++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
++++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
++++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
++++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
++++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
++++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
++++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
++++++++#                 if cache_position.shape[0] == 1:
++++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
++++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
++++++++#                     kv_seq_len = past_seen_tokens + 1
++++++++#                 else:
++++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
++++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
++++++++#             else:
++++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++#         # 4. KV 缓存更新
++++++++#         if past_key_value is not None:
++++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++#             key_states, value_states = past_key_value.update(
++++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
++++++++#             )
++++++++            
++++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
++++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
++++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
++++++++#                 if cache_position.shape[0] == 1:
++++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
++++++++#                     kv_seq_len = key_states.shape[-2]
++++++++
++++++++#         # 5. [重要] 准备 Attention Mask
++++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
++++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++++#         fa_attention_mask = None
++++++++#         if attention_mask is not None:
++++++++#             # 截取与当前key长度匹配的部分
++++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
++++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
++++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
++++++++#             fa_attention_mask = (mask_slice != 0)
++++++++
++++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
++++++++#         input_dtype = query_states.dtype
++++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
++++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
++++++++#             query_states = query_states.to(mindspore.float16)
++++++++#             key_states = key_states.to(mindspore.float16)
++++++++#             value_states = value_states.to(mindspore.float16)
++++++++
++++++++#         # 6. [核心] 调用 flash_attention_score 算子
++++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
++++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++++#         attn_output = mindspore.ops.flash_attention_score(
++++++++#             query=query_states,
++++++++#             key=key_states,
++++++++#             value=value_states,
++++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
++++++++#             attn_mask=fa_attention_mask,
++++++++#             keep_prob=1.0 - self.attention_dropout,
++++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++#             input_layout="BNSD",
++++++++#             sparse_mode=0 # 使用 defaultMask 模式
++++++++#         )
++++++++
++++++++#         # 恢复原始数据类型
++++++++#         attn_output = attn_output.to(input_dtype)
++++++++
++++++++#         # 7. 调整输出形状
++++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++#         attn_output = self.o_proj(attn_output)
++++++++
++++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
++++++++#         attn_weights = None
++++++++#         if output_attentions:
++++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++#         return attn_output, attn_weights, past_key_value
++++++++
++++++++#     # def forward(
++++++++#     #     self,
++++++++#     #     hidden_states: mindspore.Tensor,
++++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
++++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
++++++++#     #     past_key_value: Optional[Cache] = None,
++++++++#     #     output_attentions: bool = False,
++++++++#     #     use_cache: bool = False,
++++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
++++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++#     #     bsz, q_len, _ = hidden_states.shape
++++++++
++++++++#     #     # 1. 线性投射 Q, K, V
++++++++#     #     query_states = self.q_proj(hidden_states)
++++++++#     #     key_states = self.k_proj(hidden_states)
++++++++#     #     value_states = self.v_proj(hidden_states)
++++++++
++++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
++++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++
++++++++#     #     # 3. RoPE 旋转位置编码
++++++++#     #     kv_seq_len = key_states.shape[-2]
++++++++#     #     if past_key_value is not None:
++++++++#     #         if self.layer_idx is None:
++++++++#     #             raise ValueError(
++++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++#     #                 "with a layer index."
++++++++#     #             )
++++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++
++++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++#     #     # 4. KV 缓存更新
++++++++#     #     if past_key_value is not None:
++++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
++++++++#     #         key_states, value_states = past_key_value.update(
++++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
++++++++#     #         )
++++++++
++++++++#     #     # 5. 准备 Attention Mask
++++++++#     #     fa_attention_mask = None
++++++++#     #     if attention_mask is not None:
++++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++#     #         fa_attention_mask = (mask_slice != 0)
++++++++
++++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
++++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
++++++++#     #     input_dtype = query_states.dtype
++++++++
++++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
++++++++#     #     attn_output = mindspore.ops.flash_attention_score(
++++++++#     #         query=query_states,
++++++++#     #         key=key_states,
++++++++#     #         value=value_states,
++++++++#     #         head_num=self.num_heads,
++++++++#     #         attn_mask=fa_attention_mask,
++++++++#     #         keep_prob=1.0 - self.attention_dropout,
++++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++#     #         input_layout="BNSD",
++++++++#     #         sparse_mode=0,
++++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
++++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
++++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
++++++++#     #         inner_precise=1
++++++++#     #     )
++++++++
++++++++#     #     # 恢复原始数据类型
++++++++#     #     attn_output = attn_output.to(input_dtype)
++++++++
++++++++#     #     # 7. 调整输出形状
++++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++#     #     attn_output = self.o_proj(attn_output)
++++++++
++++++++#     #     attn_weights = None
++++++++#     #     if output_attentions:
++++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++
++++++++#     #     return attn_output, attn_weights, past_key_value
++++++++
++++++++
+++++++ class Qwen2MoeFlashAttention(nn.Module):
+++++++     """
+++++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++-
+++++++-    关键改动:
+++++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++-       直接传入原始的 key 和 value 张量效率更高。
+++++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
++++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
++++++++
++++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
++++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
++++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
++++++++    以达到理论上的最高执行速度。
+++++++     """
+++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++         super().__init__()
+++++++         self.config = config
+++++++         self.layer_idx = layer_idx
++++++++        if layer_idx is None:
++++++++            logger.warning_once(
++++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
++++++++            )
++++++++
+++++++         self.hidden_size = config.hidden_size
+++++++         self.num_heads = config.num_attention_heads
+++++++         self.head_dim = self.hidden_size // self.num_heads
+++++++         self.num_key_value_heads = config.num_key_value_heads
+++++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++         self.max_position_embeddings = config.max_position_embeddings
+++++++         self.rope_theta = config.rope_theta
+++++++         self.attention_dropout = config.attention_dropout
+++++++ 
+++++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++-            raise ValueError(
+++++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++-            )
+++++++-
+++++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+++++++         key_states = self.k_proj(hidden_states)
+++++++         value_states = self.v_proj(hidden_states)
+++++++ 
+++++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
++++++++        # 2. 调整形状以匹配 BNSD 布局
+++++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-
+++++++-        # 3. RoPE 旋转位置编码
++++++++        
++++++++        # 3. RoPE 和 KV 缓存
+++++++         kv_seq_len = key_states.shape[-2]
+++++++         if past_key_value is not None:
+++++++-            if self.layer_idx is None:
+++++++-                raise ValueError(
+++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-                    "with a layer index."
+++++++-                )
+++++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++-                if cache_position.shape[0] == 1:
+++++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++-                    kv_seq_len = past_seen_tokens + 1
+++++++-                else:
+++++++-                    # prefill 阶段：cache_position 是范围，使用其长度
+++++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++-            else:
+++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-
++++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++        
+++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++ 
+++++++-        # 4. KV 缓存更新
+++++++         if past_key_value is not None:
+++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-            key_states, value_states = past_key_value.update(
+++++++-                key_states, value_states, self.layer_idx, cache_kwargs
+++++++-            )
+++++++-            
+++++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++-                if cache_position.shape[0] == 1:
+++++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++-                    kv_seq_len = key_states.shape[-2]
+++++++-
+++++++-        # 5. [重要] 准备 Attention Mask
+++++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
++++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++
++++++++        # 4. 准备 Attention Mask
+++++++         fa_attention_mask = None
+++++++         if attention_mask is not None:
+++++++-            # 截取与当前key长度匹配的部分
+++++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++             fa_attention_mask = (mask_slice != 0)
+++++++ 
+++++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++-        input_dtype = query_states.dtype
+++++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++-            query_states = query_states.to(mindspore.float16)
+++++++-            key_states = key_states.to(mindspore.float16)
+++++++-            value_states = value_states.to(mindspore.float16)
+++++++-
+++++++-        # 6. [核心] 调用 flash_attention_score 算子
+++++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
++++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+++++++         attn_output = mindspore.ops.flash_attention_score(
+++++++             query=query_states,
+++++++             key=key_states,
+++++++             value=value_states,
+++++++-            head_num=self.num_heads, # 传入Q的头数(N1)
++++++++            head_num=self.num_heads,
+++++++             attn_mask=fa_attention_mask,
+++++++-            keep_prob=1.0 - self.attention_dropout,
++++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+++++++             scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++             input_layout="BNSD",
+++++++-            sparse_mode=0 # 使用 defaultMask 模式
++++++++            sparse_mode=0,
++++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+++++++         )
+++++++ 
+++++++-        # 恢复原始数据类型
+++++++-        attn_output = attn_output.to(input_dtype)
+++++++-
+++++++-        # 7. 调整输出形状
+++++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
++++++++        # 6. 调整输出形状
+++++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++         attn_output = self.o_proj(attn_output)
+++++++ 
+++++++-        # FlashAttention 算子不直接返回注意力权重矩阵
++++++++        # 7. 返回结果
+++++++         attn_weights = None
+++++++         if output_attentions:
+++++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++++++ 
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++-    # def forward(
+++++++-    #     self,
+++++++-    #     hidden_states: mindspore.Tensor,
+++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++-    #     past_key_value: Optional[Cache] = None,
+++++++-    #     output_attentions: bool = False,
+++++++-    #     use_cache: bool = False,
+++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-
+++++++-    #     bsz, q_len, _ = hidden_states.shape
+++++++-
+++++++-    #     # 1. 线性投射 Q, K, V
+++++++-    #     query_states = self.q_proj(hidden_states)
+++++++-    #     key_states = self.k_proj(hidden_states)
+++++++-    #     value_states = self.v_proj(hidden_states)
+++++++-
+++++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-
+++++++-    #     # 3. RoPE 旋转位置编码
+++++++-    #     kv_seq_len = key_states.shape[-2]
+++++++-    #     if past_key_value is not None:
+++++++-    #         if self.layer_idx is None:
+++++++-    #             raise ValueError(
+++++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-    #                 "with a layer index."
+++++++-    #             )
+++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++ 
+++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++-
+++++++-    #     # 4. KV 缓存更新
+++++++-    #     if past_key_value is not None:
+++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-    #         key_states, value_states = past_key_value.update(
+++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++-    #         )
+++++++-
+++++++-    #     # 5. 准备 Attention Mask
+++++++-    #     fa_attention_mask = None
+++++++-    #     if attention_mask is not None:
+++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-    #         fa_attention_mask = (mask_slice != 0)
+++++++-
+++++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++-    #     input_dtype = query_states.dtype
+++++++-
+++++++-    #     # 6. [核心] 调用 flash_attention_score 算子
+++++++-    #     attn_output = mindspore.ops.flash_attention_score(
+++++++-    #         query=query_states,
+++++++-    #         key=key_states,
+++++++-    #         value=value_states,
+++++++-    #         head_num=self.num_heads,
+++++++-    #         attn_mask=fa_attention_mask,
+++++++-    #         keep_prob=1.0 - self.attention_dropout,
+++++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++-    #         input_layout="BNSD",
+++++++-    #         sparse_mode=0,
+++++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++-    #         inner_precise=1
+++++++-    #     )
+++++++-
+++++++-    #     # 恢复原始数据类型
+++++++-    #     attn_output = attn_output.to(input_dtype)
++++++++QWEN2MOE_ATTENTION_CLASSES = {
++++++++    "eager": Qwen2MoeAttention,
++++++++    "flash-attention": Qwen2MoeFlashAttention,
++++++++}
+++++++ 
+++++++-    #     # 7. 调整输出形状
+++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++-    #     attn_output = self.o_proj(attn_output)
+++++++ 
+++++++-    #     attn_weights = None
+++++++-    #     if output_attentions:
+++++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     def __init__(self, config):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         # gating
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     #@dwj
++++++++#     # 只遍历激活的专家，而非全部专家
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#             num_tokens = hidden_states_reshaped.shape[0]
++++++++
++++++++#             router_logits = self.gate(hidden_states_reshaped)
++++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++
++++++++#             if self.norm_topk_prob:
++++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
++++++++            
++++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
++++++++#             flat_selected_experts = selected_experts.flatten()
++++++++            
++++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
++++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
++++++++#             token_indices = broadcasted_token_indices.flatten()
++++++++            
++++++++#             active_experts = ops.unique(flat_selected_experts)
++++++++            
++++++++#             for expert_idx_tensor in active_experts:
++++++++#                 expert_idx = expert_idx_tensor.item()
++++++++#                 expert_layer = self.experts[expert_idx]
++++++++                
++++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
++++++++#                 selected_token_indices = token_indices[mask]
++++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
++++++++                
++++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
++++++++                
++++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++                
++++++++#                 final_hidden_states = final_hidden_states.index_add(
++++++++#                     dim=0,
++++++++#                     index=selected_token_indices,
++++++++#                     source=expert_output.to(hidden_states.dtype)
++++++++#                 )
++++++++            
++++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++++ 
+++++++-    #     return attn_output, attn_weights, past_key_value
++++++++#             final_hidden_states = final_hidden_states + shared_expert_output
++++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++++            
++++++++#             return final_hidden_states, router_logits
++++++++
++++++++
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     """
++++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
++++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
++++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         # 门控网络
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         # 专家列表
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++#         # 共享专家
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_decode(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         """
++++++++#         【解码路径】针对 sequence_length=1 的极致优化。
++++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
++++++++#         """
++++++++#         batch_size, hidden_dim = hidden_states.shape
++++++++        
++++++++#         expert_outputs_list = [
++++++++#             ops.cat([
++++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++++#             ], dim=0) 
++++++++#             for i in range(batch_size)
++++++++#         ]
++++++++        
++++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
++++++++#         # shape: (batch_size, top_k, hidden_dim)
++++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++++        
++++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
++++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++++        
++++++++#         return moe_output.squeeze(1)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_prefill(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         """
++++++++#         【预填充路径】针对 sequence_length > 1 的优化。
++++++++#         按专家对 Token 进行分组，并进行批处理。
++++++++#         """
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens = hidden_states.shape[0]
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++        
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++        
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++        
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++            
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++#             selected_token_indices = token_indices[mask]
++++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++++            
++++++++#             current_states = hidden_states[selected_token_indices]
++++++++            
++++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++            
++++++++#             moe_output = moe_output.index_add(
++++++++#                 dim=0,
++++++++#                 index=selected_token_indices,
++++++++#                 source=expert_output.to(hidden_states.dtype)
++++++++#             )
++++++++#         return moe_output
++++++++
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#         """
++++++++#         顶层 forward 方法，作为智能分发器。
++++++++#         """
++++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++        
++++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++ 
+++++++-    # def forward(
+++++++-    #     self,
+++++++-    #     hidden_states: mindspore.Tensor,
+++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++-    #     past_key_value: Optional[Cache] = None,
+++++++-    #     output_attentions: bool = False,
+++++++-    #     use_cache: bool = False,
+++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-
+++++++-    #     bsz, q_len, _ = hidden_states.shape
+++++++-
+++++++-    #     query_states = self.q_proj(hidden_states)
+++++++-    #     key_states = self.k_proj(hidden_states)
+++++++-    #     value_states = self.v_proj(hidden_states)
+++++++-
+++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-
+++++++-    #     kv_seq_len = key_states.shape[-2]
+++++++-    #     if past_key_value is not None:
+++++++-    #         if self.layer_idx is None:
+++++++-    #             raise ValueError("`layer_idx` must be specified for caching")
+++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-
+++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++-
+++++++-    #     if past_key_value is not None:
+++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-    #         key_states, value_states = past_key_value.update(
+++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++-    #         )
++++++++#         if self.norm_topk_prob:
++++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++++        
++++++++#         moe_output = None
++++++++#         # 在推理时，根据序列长度选择最优路径
++++++++#         if not self.training:
++++++++#             if sequence_length == 1:
++++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++++#             else:
++++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++++#         else:
++++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
++++++++#             raise NotImplementedError("Training path is not implemented.")
++++++++
++++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
++++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
++++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
++++++++        
++++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
++++++++        
++++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++#         return final_hidden_states, router_logits
++++++++
++++++++
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     """
++++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         # 门控网络
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         # 专家列表
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++#         # 共享专家
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_decode(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         batch_size, _ = hidden_states.shape
++++++++#         expert_outputs_list = [
++++++++#             ops.cat([
++++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++++#             ], dim=0) 
++++++++#             for i in range(batch_size)
++++++++#         ]
++++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++++#         return moe_output.squeeze(1)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_prefill(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens = hidden_states.shape[0]
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++        
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++#             selected_token_indices = token_indices[mask]
++++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++++#             current_states = hidden_states[selected_token_indices]
++++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++#             moe_output = moe_output.index_add(
++++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++++#             )
++++++++#         return moe_output
++++++++
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#         """
++++++++#         顶层 forward 方法，作为智能分发器。
++++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
++++++++#         """
++++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++        
++++++++#         # 1. 门控计算 (通用逻辑)
++++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++
++++++++#         if self.norm_topk_prob:
++++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++++        
++++++++#         # 2. 智能分发到最优 MoE 路径
++++++++#         moe_output = None
++++++++#         if not self.training:
++++++++#             if sequence_length == 1:
++++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
++++++++#             else:
++++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
++++++++#         else:
++++++++#             raise NotImplementedError("Training path is not implemented.")
++++++++
++++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
++++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
++++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++++        
++++++++#         # 4. 合并 MoE 输出和共享专家输出
++++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
++++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++++        
++++++++#         # 5. 恢复原始形状并返回
++++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++#         return final_hidden_states, router_logits
++++++++
++++++++# prefill fastest
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     """
++++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
++++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         # 门控网络
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         # 专家列表
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++#         # 共享专家
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_dispatch(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         """
++++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
++++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
++++++++#         """
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens, _ = hidden_states.shape
++++++++        
++++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++#         flat_routing_weights = routing_weights.flatten()
+++++++ 
+++++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++-
+++++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++++-    #     query_states = query_states / math.sqrt(self.head_dim)
+++++++-    #     # <--- 修改结束 ---
+++++++-
+++++++-    #     fa_attention_mask = None
+++++++-    #     if attention_mask is not None:
+++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-    #         fa_attention_mask = (mask_slice != 0)
+++++++-
+++++++-    #     input_dtype = query_states.dtype
+++++++-
+++++++-    #     attn_output = mindspore.ops.flash_attention_score(
+++++++-    #         query=query_states,    # 传入已经预先缩放过的 query
+++++++-    #         key=key_states,
+++++++-    #         value=value_states,
+++++++-    #         head_num=self.num_heads,
+++++++-    #         attn_mask=fa_attention_mask,
+++++++-    #         keep_prob=1.0 - self.attention_dropout,
+++++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++++-    #         input_layout="BNSD",
+++++++-    #         sparse_mode=0,
+++++++-    #         inner_precise=1        # 仍然保持内部高精度计算
+++++++-    #     )
++++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++ 
+++++++-    #     attn_output = attn_output.to(input_dtype)
+++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++-    #     attn_output = self.o_proj(attn_output)
++++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++        
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++            
++++++++#             # 找到所有分配给该专家的 token
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++            
++++++++#             # 使用 mask 选取对应的 token 和权重
++++++++#             current_token_indices = token_indices[mask]
++++++++#             current_routing_weights = flat_routing_weights[mask]
++++++++#             current_hidden_states = hidden_states[current_token_indices]
++++++++            
++++++++#             # 对这些 token 进行批处理
++++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++++            
++++++++#             # 使用 index_add 将结果精确地加回到对应位置
++++++++#             moe_output = moe_output.index_add(
++++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
++++++++#             )
++++++++#         return moe_output
++++++++
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#         """
++++++++#         顶层 forward 方法，作为智能分发器。
++++++++#         """
++++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++        
++++++++#         # 1. 门控计算
++++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++
++++++++#         if self.norm_topk_prob:
++++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
++++++++        
++++++++#         # 2. 调用统一的 MoE 计算内核
++++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
++++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++++++ 
+++++++-    #     attn_weights = None
+++++++-    #     if output_attentions:
+++++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
++++++++#         # 3. 统一处理共享专家
++++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++++        
++++++++#         # 4. 合并输出
++++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++++        
++++++++#         # 5. 恢复原始形状并返回
++++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++#         return final_hidden_states, router_logits
++++++++
++++++++
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     """
++++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
++++++++#     【最终高性能与高精度版】：
++++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
++++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
++++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
++++++++#     3. 这样实现了速度和准确性的两全其美。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_decode(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         """
++++++++#         【解码路径】极致优化版：bmm + 高精度累加。
++++++++#         """
++++++++#         original_dtype = hidden_states.dtype
++++++++#         batch_size, _ = hidden_states.shape
++++++++        
++++++++#         expert_outputs_list = [
++++++++#             ops.cat([
++++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++++#             ], dim=0) 
++++++++#             for i in range(batch_size)
++++++++#         ]
++++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++++
++++++++#         # 在 float32 下执行 bmm，得到高精度结果
++++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
++++++++        
++++++++#         # 将高精度结果转换回原始数据类型
++++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
++++++++        
++++++++#         return moe_output
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_prefill(
++++++++#         self, 
++++++++#         hidden_states: mindspore.Tensor, 
++++++++#         selected_experts: mindspore.Tensor, 
++++++++#         routing_weights: mindspore.Tensor
++++++++#     ) -> mindspore.Tensor:
++++++++#         """
++++++++#         【预填充路径】与原始实现一致，结果精确。
++++++++#         """
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens, _ = hidden_states.shape
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++        
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++#             selected_token_indices = token_indices[mask]
++++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++++#             current_states = hidden_states[selected_token_indices]
++++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++#             moe_output = moe_output.index_add(
++++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
++++++++#             )
++++++++#         return moe_output
++++++++
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++        
++++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++ 
+++++++-    #     return attn_output, attn_weights, past_key_value
++++++++#         if self.norm_topk_prob:
++++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
++++++++#         # 如果模型主体是 float16，后续再转换
++++++++        
++++++++#         moe_output = None
++++++++#         if not self.training:
++++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
++++++++#             # _moe_infer_decode 内部会处理好类型转换
++++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
++++++++#             if sequence_length == 1:
++++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++++#             else:
++++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
++++++++#         else:
++++++++#             raise NotImplementedError("Training path is not implemented.")
++++++++
++++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++++        
++++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++#         return final_hidden_states, router_logits
++++++++    
+++++++ 
+++++++-QWEN2MOE_ATTENTION_CLASSES = {
+++++++-    "eager": Qwen2MoeAttention,
+++++++-    "flash-attention": Qwen2MoeFlashAttention,
+++++++-}
++++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++#     """
++++++++#     【融合版】一个混合专家模块，内置两种推理策略，
++++++++#     由外部全局变量 `Long_Prompt` 控制：
++++++++
++++++++#     - if Long_Prompt is True:  【精度优先模式】
++++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
++++++++#       适用于处理长序列，避免误差累积。
++++++++
++++++++#     - if Long_Prompt is False: 【速度优先模式】
++++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
++++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
++++++++#     """
++++++++#     def __init__(self, config: Qwen2MoeConfig):
++++++++#         super().__init__()
++++++++#         self.num_experts = config.num_experts
++++++++#         self.top_k = config.num_experts_per_tok
++++++++#         self.norm_topk_prob = config.norm_topk_prob
++++++++
++++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
++++++++#         self.experts = nn.ModuleList(
++++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
++++++++#         )
++++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++
++++++++#     # --- 速度优先模式的辅助函数 ---
++++++++#     @no_grad()
++++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++#         original_dtype = hidden_states.dtype
++++++++#         batch_size, _ = hidden_states.shape
++++++++#         expert_outputs_list = [
++++++++#             ops.cat([
++++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++++#             ], dim=0) 
++++++++#             for i in range(batch_size)
++++++++#         ]
++++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
++++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
++++++++
++++++++#     @no_grad()
++++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens, _ = hidden_states.shape
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++#             selected_token_indices = token_indices[mask]
++++++++#             selected_routing_weights = routing_weights.flatten()[mask]
++++++++#             current_states = hidden_states[selected_token_indices]
++++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
++++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
++++++++#         return moe_output
++++++++
++++++++#     # --- 精度优先模式的辅助函数 ---
++++++++#     @no_grad()
++++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++#         moe_output = ops.zeros_like(hidden_states)
++++++++#         num_tokens, _ = hidden_states.shape
++++++++#         flat_selected_experts = selected_experts.flatten()
++++++++#         flat_routing_weights = routing_weights.flatten()
++++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++#         active_experts = ops.unique(flat_selected_experts)
++++++++#         for expert_idx_tensor in active_experts:
++++++++#             expert_idx = expert_idx_tensor.item()
++++++++#             expert_layer = self.experts[expert_idx]
++++++++#             mask = (flat_selected_experts == expert_idx_tensor)
++++++++#             current_token_indices = token_indices[mask]
++++++++#             current_routing_weights = flat_routing_weights[mask]
++++++++#             current_hidden_states = hidden_states[current_token_indices]
++++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++++++#         return moe_output
++++++++
++++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
++++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
++++++++#         global Long_Prompt
++++++++
++++++++#         # 1. 门控计算 (所有模式通用)
++++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++#         router_logits = self.gate(hidden_states_reshaped)
++++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++++++#         if self.norm_topk_prob:
++++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++#         moe_output = None
++++++++#         if not self.training:
++++++++#             # 根据 Long_Prompt 标志选择模式
++++++++#             if Long_Prompt:
++++++++#                 # --- 精度优先模式 ---
++++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++#             else:
++++++++#                 # --- 速度优先模式 ---
++++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++#                 if sequence_length == 1:
++++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++#                 else:
++++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++#         else:
++++++++#             raise NotImplementedError("Training path is not implemented.")
++++++++
++++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++++        
++++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++#         return final_hidden_states, router_logits
++++++++    
++++++++class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++    """
++++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
++++++++    控制的顶级推理策略：
+++++++ 
++++++++    - if Long_Prompt is True:  【精度优先模式】
++++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
++++++++      适用于需要严格可复现性的长序列任务。
+++++++ 
+++++++-class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-    def __init__(self, config):
++++++++    - if Long_Prompt is False: 【速度优先模式】
++++++++      采用业界最强的性能组合：
++++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
++++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
++++++++    """
++++++++    def __init__(self, config: Qwen2MoeConfig):
+++++++         super().__init__()
+++++++         self.num_experts = config.num_experts
+++++++         self.top_k = config.num_experts_per_tok
+++++++         self.norm_topk_prob = config.norm_topk_prob
+++++++ 
+++++++-        # gating
+++++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++         self.experts = nn.ModuleList(
+++++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++         )
+++++++-
+++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++ 
+++++++-    #@dwj
+++++++-    # 只遍历激活的专家，而非全部专家
+++++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-            num_tokens = hidden_states_reshaped.shape[0]
+++++++-
+++++++-            router_logits = self.gate(hidden_states_reshaped)
+++++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-
+++++++-            if self.norm_topk_prob:
+++++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-            routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-            
+++++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++-            flat_selected_experts = selected_experts.flatten()
+++++++-            
+++++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++-            token_indices = broadcasted_token_indices.flatten()
+++++++-            
+++++++-            active_experts = ops.unique(flat_selected_experts)
+++++++-            
+++++++-            for expert_idx_tensor in active_experts:
+++++++-                expert_idx = expert_idx_tensor.item()
+++++++-                expert_layer = self.experts[expert_idx]
+++++++-                
+++++++-                mask = (flat_selected_experts == expert_idx_tensor)
+++++++-                selected_token_indices = token_indices[mask]
+++++++-                selected_routing_weights = routing_weights.flatten()[mask]
+++++++-                
+++++++-                current_states = hidden_states_reshaped[selected_token_indices]
+++++++-                
+++++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-                
+++++++-                final_hidden_states = final_hidden_states.index_add(
+++++++-                    dim=0,
+++++++-                    index=selected_token_indices,
+++++++-                    source=expert_output.to(hidden_states.dtype)
+++++++-                )
+++++++-            
+++++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
++++++++    @no_grad()
++++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++        original_dtype = hidden_states.dtype
++++++++        batch_size, _ = hidden_states.shape
++++++++        expert_outputs_list = [
++++++++            ops.cat([
++++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
++++++++            ], dim=0)
++++++++            for i in range(batch_size)
++++++++        ]
++++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
++++++++        weights_fp32 = routing_weights.to(mindspore.float32)
++++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
++++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
++++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
++++++++
++++++++    @no_grad()
++++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++        num_tokens, _ = hidden_states.shape
++++++++        flat_selected_experts = selected_experts.flatten()
++++++++        sorted_expert_indices = flat_selected_experts.argsort()
++++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++++++        original_token_indices = sorted_expert_indices // self.top_k
++++++++        moe_output = ops.zeros_like(hidden_states)
++++++++        current_token_offset = 0
++++++++        for i in range(self.num_experts):
++++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
++++++++            if expert_token_count == 0:
++++++++                continue
++++++++            end_offset = current_token_offset + expert_token_count
++++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
++++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++++++            current_token_offset += expert_token_count
++++++++        return moe_output
++++++++
++++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
++++++++    @no_grad()
++++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++        moe_output = ops.zeros_like(hidden_states)
++++++++        num_tokens, _ = hidden_states.shape
++++++++        flat_selected_experts = selected_experts.flatten()
++++++++        flat_routing_weights = routing_weights.flatten()
++++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
++++++++        active_experts = ops.unique(flat_selected_experts)
++++++++        for expert_idx_tensor in active_experts:
++++++++            expert_idx = expert_idx_tensor.item()
++++++++            expert_layer = self.experts[expert_idx]
++++++++            mask = (flat_selected_experts == expert_idx_tensor)
++++++++            current_token_indices = token_indices[mask]
++++++++            current_routing_weights = flat_routing_weights[mask]
++++++++            current_hidden_states = hidden_states[current_token_indices]
++++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
++++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
++++++++        return moe_output
+++++++ 
+++++++-            final_hidden_states = final_hidden_states + shared_expert_output
+++++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++-            
+++++++-            return final_hidden_states, router_logits
++++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++        global Long_Prompt
++++++++
++++++++        # 1. 门控计算 (所有模式通用)
++++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
++++++++        router_logits = self.gate(hidden_states_reshaped)
++++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
++++++++        if self.norm_topk_prob:
++++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++        
++++++++        moe_output = None
++++++++        if Long_Prompt:
++++++++            # --- 精度优先模式 (ACCURACY MODE) ---
++++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        else:
++++++++            # --- 速度优先模式 (SPEED MODE) ---
++++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++            if sequence_length == 1:
++++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++            else:
++++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        
+++++++ 
++++++++        # 3. 共享专家计算与合并 (所有模式通用)
++++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
++++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
++++++++        
++++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
++++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
++++++++        
++++++++        return final_hidden_states, router_logits
+++++++ 
+++++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++++++         super().__init__()
+++++++         self.hidden_size = config.hidden_size
++++++++        
++++++++        # if Long_Prompt:
++++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++++        # else:
++++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++ 
+++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++ 
+++++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++-
+++++++         if (layer_idx not in config.mlp_only_layers) and (
+++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++++         ):
+++++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++             self._warmed_up = True
+++++++             self.warmup_moe_model()
+++++++ 
++++++++
++++++++
+++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++++         output_router_logits = (
+++++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+++++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++             router_logits=outputs.router_logits,
+++++++         )
+++++++ 
++++++++    def generate(self, *args, **kwargs):
++++++++        """
++++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
++++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
++++++++        """
++++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++++++
++++++++        input_ids = kwargs.get("input_ids")
++++++++        if input_ids is None and args:
++++++++            input_ids = args[0]
++++++++
++++++++        if input_ids is not None:
++++++++            prompt_length = input_ids.shape[1]
++++++++            
++++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
++++++++                Long_Prompt = True
++++++++            else:
++++++++                Long_Prompt = False
++++++++
++++++++        return super().generate(*args, **kwargs)
++++++++    
+++++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+++++++     def prepare_inputs_for_generation(
+++++++         self,
+++++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+++++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+++++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
++++++++        
+++++++         if past_key_values is not None:
+++++++             if inputs_embeds is not None:  # Exception 1
+++++++                 if 0 not in input_ids.shape:
+++++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++             }
+++++++         )
+++++++         return model_inputs
++++++++
+++++++ # @lwx
+++++++     # def _decode_one_tokens_logits(
+++++++     #     self,
+++++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+++++++             attentions=outputs.attentions,
+++++++         )
+++++++ 
++++++++
+++++++ __all__ = [
+++++++     "Qwen2MoeForCausalLM",
+++++++     "Qwen2MoeModel",
+++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++++new file mode 100644
+++++++index 00000000..6dfb5b93
+++++++--- /dev/null
++++++++++ b/patches/0001-20251104commit.patch
+++++++@@ -0,0 +1,1272 @@
++++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
++++++++From: Pinoeer-kingxi <13022943007@163.com>
++++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
++++++++Subject: [PATCH] 20251104commit
++++++++
++++++++---
++++++++ mindnlp/transformers/cache_utils.py           |  28 +-
++++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
++++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
++++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
++++++++
++++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
++++++++index cadd2e04..02f8d4be 100644
++++++++--- a/mindnlp/transformers/cache_utils.py
+++++++++++ b/mindnlp/transformers/cache_utils.py
++++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
++++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
++++++++             # k_out[:, :, cache_position] = key_states
++++++++             # v_out[:, :, cache_position] = value_states
++++++++-            if ON_ORANGE_PI:
++++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
++++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
++++++++-            else:
++++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
++++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
++++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
++++++++-
+++++++++            # if ON_ORANGE_PI:
+++++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++++            # else:
+++++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+++++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++++++            if cache_position.ndim > 1:
+++++++++                cache_position = cache_position.flatten()
+++++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++++++                cache_position = cache_position.int()
+++++++++            
+++++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++++++            k_out[:, :, cache_position] = key_states
+++++++++            v_out[:, :, cache_position] = value_states
+++++++++            
++++++++         return k_out, v_out
++++++++ 
++++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
++++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++index c695b944..d8303e45 100644
++++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
++++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
++++++++ def rotate_half(x):
++++++++     """Rotates half the hidden dims of the input."""
++++++++-    x1 = x[..., : x.shape[-1] // 2]
++++++++-    x2 = x[..., x.shape[-1] // 2 :]
+++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++++    # x1 = x[..., : x.shape[-1] // 2]
+++++++++    # x2 = x[..., x.shape[-1] // 2 :]
+++++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++++     return ops.cat((-x2, x1), dim=-1)
++++++++ 
++++++++ 
++++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
++++++++         if self.training:
++++++++             raise NotImplementedError("Training is not supported yet.")
++++++++         else:
++++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
++++++++-        if self.config.n_shared_experts is not None:
++++++++-            y = y + self.shared_experts(identity)
++++++++-        return y
+++++++++            # @lwx
+++++++++            if orig_shape[1] == 1:
+++++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++++++                y=y.view(*orig_shape)
+++++++++                if self.config.n_shared_experts is not None:
+++++++++                    y = y + self.shared_experts(identity)
+++++++++                return y
+++++++++            else:
+++++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++++++                if self.config.n_shared_experts is not None:
+++++++++                    y = y + self.shared_experts(identity)
+++++++++                return y
+++++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++++        # if self.config.n_shared_experts is not None:
+++++++++        #     y = y + self.shared_experts(identity)
+++++++++        # return y
+++++++++        
+++++++++    @no_grad()
+++++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++++
+++++++++        expert_cache = ops.zeros_like(x)
+++++++++        for i in range(self.num_experts_per_tok):
+++++++++            expert_id = flat_expert_indices[i].item()
+++++++++            weight = flat_expert_weights[i].item()
+++++++++            expert = self.experts[expert_id]
+++++++++            expert_out = expert(x)
+++++++++            expert_cache += expert_out * weight
+++++++++        return expert_cache
++++++++ 
++++++++     @no_grad()
++++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++-        # expert_cache = torch.zeros_like(x)
++++++++-        # idxs = flat_expert_indices.argsort()
++++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++++-        # token_idxs = idxs // self.num_experts_per_tok
++++++++-        # for i, end_idx in enumerate(tokens_per_expert):
++++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++++-        #     if start_idx == end_idx:
++++++++-        #         continue
++++++++-        #     expert = self.experts[i]
++++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++++-        #     expert_tokens = x[exp_token_idx]
++++++++-        #     expert_out = expert(expert_tokens)
++++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++++-        # return expert_cache
+++++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++++         expert_cache = ops.zeros_like(x)
++++++++         idxs = flat_expert_indices.argsort()
++++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++         token_idxs = idxs // self.num_experts_per_tok
+++++++++
++++++++         for i, end_idx in enumerate(tokens_per_expert):
++++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++             if start_idx == end_idx:
++++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
++++++++             expert_out = expert(expert_tokens)
++++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++++
++++++++         return expert_cache
+++++++++        
+++++++++    # @no_grad()
+++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++++    #     # expert_cache = torch.zeros_like(x)
+++++++++    #     # idxs = flat_expert_indices.argsort()
+++++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+++++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++++    #     #     if start_idx == end_idx:
+++++++++    #     #         continue
+++++++++    #     #     expert = self.experts[i]
+++++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++++    #     #     expert_tokens = x[exp_token_idx]
+++++++++    #     #     expert_out = expert(expert_tokens)
+++++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++++    #     # return expert_cache
+++++++++    #     expert_cache = ops.zeros_like(x)
+++++++++    #     idxs = flat_expert_indices.argsort()
+++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++++
+++++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++++    #         if start_idx == end_idx:
+++++++++    #             continue
+++++++++    #         expert = self.experts[i]
+++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++++    #         expert_tokens = x[exp_token_idx]
+++++++++    #         expert_out = expert(expert_tokens)
+++++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++++
+++++++++    #     return expert_cache
+++++++++    # @no_grad()
+++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++++    #     expert_cache = ops.zeros_like(x)
+++++++++
+++++++++    #     # 排序保证顺序一致
+++++++++    #     idxs = flat_expert_indices.argsort()
+++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++++    #     token_idxs = idxs // self.num_experts_per_tok
+++++++++
+++++++++    #     # 找出有 token 的专家
+++++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++++
+++++++++    #     for i in active_experts.tolist():
+++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++++    #         end_idx = tokens_per_expert[i]
+++++++++    #         if start_idx == end_idx:  # 没有 token
+++++++++    #             continue
+++++++++
+++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++++    #         expert_tokens = x[exp_token_idx]
+++++++++    #         expert_out = self.experts[i](expert_tokens)
+++++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++++
+++++++++    #         expert_cache = mindspore.mint.scatter_add(
+++++++++    #             expert_cache,
+++++++++    #             0,
+++++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++++    #             expert_out
+++++++++    #         )
+++++++++
+++++++++    #     return expert_cache
+++++++++
+++++++++
++++++++ 
++++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
++++++++ #     """
++++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++++ 
++++++++         # Initialize weights and apply final processing
++++++++         self.post_init()
+++++++++        self.warm_up = False
+++++++++
+++++++++    def warmup_moe_model_deep(self):
+++++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++++++        test_texts = [
+++++++++            "warmup short",
+++++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++++++        ]
+++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++++        if tokenizer is None:
+++++++++            from mindnlp.transformers import AutoTokenizer
+++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++++            self._warmup_tokenizer = tokenizer
+++++++++
+++++++++        for text in test_texts:
+++++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++++            with mindspore._no_grad():
+++++++++                _ = self(**inputs, use_cache=False)
+++++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
++++++++ 
++++++++     def get_input_embeddings(self):
++++++++         return self.model.embed_tokens
++++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
++++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++++         ```"""
+++++++++        if not self.warm_up:
+++++++++            self.warm_up = True
+++++++++            self.warmup_moe_model_deep()
+++++++++
++++++++         output_attentions = (
++++++++             output_attentions
++++++++             if output_attentions is not None
++++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++index 3cbf820e..d4c6b651 100644
++++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++@@ -18,7 +18,6 @@
++++++++ # See the License for the specific language governing permissions and
++++++++ # limitations under the License.
++++++++ """MindSpore Qwen2MoE model."""
++++++++-
++++++++ import math
++++++++ from typing import List, Optional, Tuple, Union
++++++++ 
++++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
++++++++     TokenClassifierOutput,
++++++++ )
++++++++ from ...modeling_utils import PreTrainedModel
+++++++++from ...generation import GenerationMixin
++++++++ from ....utils import logging
++++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
++++++++ 
++++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
++++++++         self.variance_epsilon = eps
++++++++ 
++++++++     def forward(self, hidden_states):
+++++++++        # @dwj
+++++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++++        # @lwx
+++++++++        # if not self.training :
+++++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
++++++++         input_dtype = hidden_states.dtype
++++++++         hidden_states = hidden_states.to(mindspore.float32)
++++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
++++++++@@ -234,6 +239,8 @@ def rotate_half(x):
++++++++     """Rotates half the hidden dims of the input."""
++++++++     x1 = x[..., : x.shape[-1] // 2]
++++++++     x2 = x[..., x.shape[-1] // 2 :]
+++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
++++++++     return ops.cat((-x2, x1), dim=-1)
++++++++ 
++++++++ 
++++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
++++++++         self.config = config
++++++++         self.hidden_size = config.hidden_size
++++++++         self.intermediate_size = intermediate_size
+++++++++        
++++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
++++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
++++++++         self.act_fn = ACT2FN[config.hidden_act]
++++++++ 
++++++++     def forward(self, x):
++++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
++++++++-
++++++++ 
+++++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++++        # @lwx 
+++++++++        # gate_up_output = self.gate_up_proj(x)
+++++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++++++        # return self.down_proj(swiglu_output)
+++++++++
+++++++++    # def forward(self, x):
+++++++++    #     gate_proj_out = self.gate_proj(x)
+++++++++    #     up_proj_out = self.up_proj(x)
+++++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++++++    #     return self.down_proj(swiglu_out)
+++++++++        
++++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
++++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
++++++++     """
++++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
++++++++         use_cache: bool = False,
++++++++         cache_position: Optional[mindspore.Tensor] = None,
++++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++++
+++++++++        
+++++++++
++++++++         bsz, q_len, _ = hidden_states.shape
++++++++ 
++++++++         query_states = self.q_proj(hidden_states)
++++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
++++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++                     "with a layer index."
++++++++                 )
++++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++++            if isinstance(past_key_value, StaticCache):
+++++++++                kv_seq_len = key_states.shape[-2]
+++++++++            else:
+++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++ 
++++++++         if past_key_value is not None:
++++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++++            
+++++++++            if isinstance(past_key_value, StaticCache):
+++++++++                kv_seq_len = key_states.shape[-2]
++++++++ 
++++++++         # repeat k/v heads if n_kv_heads < n_heads
++++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++-
+++++++++        
++++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++++ 
++++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
++++++++-            raise ValueError(
++++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
++++++++-                f" {attn_weights.shape}"
++++++++-            )
++++++++-
++++++++-        if attention_mask is not None:  # no matter the length, we just slice it
++++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++++++        if attention_mask is not None:
+++++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++++             attn_weights = attn_weights + causal_mask
++++++++ 
++++++++         # upcast attention to fp32
++++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
++++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++++ 
++++++++         attn_output = self.o_proj(attn_output)
++++++++-
+++++++++        # @lwx
+++++++++        
+++++++++        # max_seq_len = self.max_position_embeddings  # 2048
+++++++++
+++++++++        # if attention_mask is not None:
+++++++++        #     # attention_mask: [B, 1, Sq, Sk]
+++++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++++
+++++++++        #     # pad 到 [max_seq_len, max_seq_len]
+++++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++++        #     global_attention_mask = padded_mask
+++++++++        # else:
+++++++++        #     global_attention_mask = None
+++++++++
+++++++++
+++++++++        # sparse_mode=3
+++++++++        # attn_output = mindspore.ops.flash_attention_score(
+++++++++        #     query=query_states,
+++++++++        #     key=key_states,
+++++++++        #     value=value_states,
+++++++++        #     real_shift=None,
+++++++++        #     padding_mask=None,
+++++++++
+++++++++        #     head_num=self.num_heads,
+++++++++        #     attn_mask=global_attention_mask,
+++++++++        #     keep_prob=1.0 - self.attention_dropout,
+++++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++++        #     input_layout="BNSD",
+++++++++        #     pre_tokens=2147483647,
+++++++++        #     next_tokens=2147483647,
+++++++++        #     inner_precise=0,
+++++++++        #     drop_mask=None,
+++++++++        #     prefix=None,
+++++++++        #     actual_seq_qlen=None,
+++++++++        #     actual_seq_kvlen=None,
+++++++++        #     sparse_mode=sparse_mode,
+++++++++        # )
++++++++         if not output_attentions:
++++++++             attn_weights = None
++++++++ 
++++++++         return attn_output, attn_weights, past_key_value
++++++++ 
++++++++ 
+++++++++class Qwen2MoeFlashAttention(nn.Module):
+++++++++    """
+++++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++++
+++++++++    关键改动:
+++++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++++       直接传入原始的 key 和 value 张量效率更高。
+++++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++++    """
+++++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++++        super().__init__()
+++++++++        self.config = config
+++++++++        self.layer_idx = layer_idx
+++++++++        self.hidden_size = config.hidden_size
+++++++++        self.num_heads = config.num_attention_heads
+++++++++        self.head_dim = self.hidden_size // self.num_heads
+++++++++        self.num_key_value_heads = config.num_key_value_heads
+++++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++++        self.max_position_embeddings = config.max_position_embeddings
+++++++++        self.rope_theta = config.rope_theta
+++++++++        self.attention_dropout = config.attention_dropout
+++++++++
+++++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++++            raise ValueError(
+++++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++++            )
+++++++++
+++++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++++
+++++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++++            self.head_dim,
+++++++++            max_position_embeddings=self.max_position_embeddings,
+++++++++            base=self.rope_theta,
+++++++++        )
+++++++++
+++++++++    def forward(
+++++++++        self,
+++++++++        hidden_states: mindspore.Tensor,
+++++++++        attention_mask: Optional[mindspore.Tensor] = None,
+++++++++        position_ids: Optional[mindspore.Tensor] = None,
+++++++++        past_key_value: Optional[Cache] = None,
+++++++++        output_attentions: bool = False,
+++++++++        use_cache: bool = False,
+++++++++        cache_position: Optional[mindspore.Tensor] = None,
+++++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++++
+++++++++        bsz, q_len, _ = hidden_states.shape
+++++++++
+++++++++        # 1. 线性投射 Q, K, V
+++++++++        query_states = self.q_proj(hidden_states)
+++++++++        key_states = self.k_proj(hidden_states)
+++++++++        value_states = self.v_proj(hidden_states)
+++++++++
+++++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++
+++++++++        # 3. RoPE 旋转位置编码
+++++++++        kv_seq_len = key_states.shape[-2]
+++++++++        if past_key_value is not None:
+++++++++            if self.layer_idx is None:
+++++++++                raise ValueError(
+++++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++++                    "with a layer index."
+++++++++                )
+++++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++++                if cache_position.shape[0] == 1:
+++++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++++                    kv_seq_len = past_seen_tokens + 1
+++++++++                else:
+++++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+++++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++++            else:
+++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++++
+++++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++++
+++++++++        # 4. KV 缓存更新
+++++++++        if past_key_value is not None:
+++++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++++            key_states, value_states = past_key_value.update(
+++++++++                key_states, value_states, self.layer_idx, cache_kwargs
+++++++++            )
+++++++++            
+++++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++++                if cache_position.shape[0] == 1:
+++++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++++                    kv_seq_len = key_states.shape[-2]
+++++++++
+++++++++        # 5. [重要] 准备 Attention Mask
+++++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++++        fa_attention_mask = None
+++++++++        if attention_mask is not None:
+++++++++            # 截取与当前key长度匹配的部分
+++++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++++            fa_attention_mask = (mask_slice != 0)
+++++++++
+++++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++++        input_dtype = query_states.dtype
+++++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++++            query_states = query_states.to(mindspore.float16)
+++++++++            key_states = key_states.to(mindspore.float16)
+++++++++            value_states = value_states.to(mindspore.float16)
+++++++++
+++++++++        # 6. [核心] 调用 flash_attention_score 算子
+++++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++++        attn_output = mindspore.ops.flash_attention_score(
+++++++++            query=query_states,
+++++++++            key=key_states,
+++++++++            value=value_states,
+++++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+++++++++            attn_mask=fa_attention_mask,
+++++++++            keep_prob=1.0 - self.attention_dropout,
+++++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++++            input_layout="BNSD",
+++++++++            sparse_mode=0 # 使用 defaultMask 模式
+++++++++        )
+++++++++
+++++++++        # 恢复原始数据类型
+++++++++        attn_output = attn_output.to(input_dtype)
+++++++++
+++++++++        # 7. 调整输出形状
+++++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++++        attn_output = self.o_proj(attn_output)
+++++++++
+++++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+++++++++        attn_weights = None
+++++++++        if output_attentions:
+++++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++++
+++++++++        return attn_output, attn_weights, past_key_value
+++++++++
+++++++++    # def forward(
+++++++++    #     self,
+++++++++    #     hidden_states: mindspore.Tensor,
+++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++++    #     past_key_value: Optional[Cache] = None,
+++++++++    #     output_attentions: bool = False,
+++++++++    #     use_cache: bool = False,
+++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++++
+++++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++++
+++++++++    #     # 1. 线性投射 Q, K, V
+++++++++    #     query_states = self.q_proj(hidden_states)
+++++++++    #     key_states = self.k_proj(hidden_states)
+++++++++    #     value_states = self.v_proj(hidden_states)
+++++++++
+++++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++
+++++++++    #     # 3. RoPE 旋转位置编码
+++++++++    #     kv_seq_len = key_states.shape[-2]
+++++++++    #     if past_key_value is not None:
+++++++++    #         if self.layer_idx is None:
+++++++++    #             raise ValueError(
+++++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++++    #                 "with a layer index."
+++++++++    #             )
+++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++++
+++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++++
+++++++++    #     # 4. KV 缓存更新
+++++++++    #     if past_key_value is not None:
+++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++++    #         key_states, value_states = past_key_value.update(
+++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++++    #         )
+++++++++
+++++++++    #     # 5. 准备 Attention Mask
+++++++++    #     fa_attention_mask = None
+++++++++    #     if attention_mask is not None:
+++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++++
+++++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++++    #     input_dtype = query_states.dtype
+++++++++
+++++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+++++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++++    #         query=query_states,
+++++++++    #         key=key_states,
+++++++++    #         value=value_states,
+++++++++    #         head_num=self.num_heads,
+++++++++    #         attn_mask=fa_attention_mask,
+++++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++++    #         input_layout="BNSD",
+++++++++    #         sparse_mode=0,
+++++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++++    #         inner_precise=1
+++++++++    #     )
+++++++++
+++++++++    #     # 恢复原始数据类型
+++++++++    #     attn_output = attn_output.to(input_dtype)
+++++++++
+++++++++    #     # 7. 调整输出形状
+++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++++    #     attn_output = self.o_proj(attn_output)
+++++++++
+++++++++    #     attn_weights = None
+++++++++    #     if output_attentions:
+++++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++++
+++++++++    #     return attn_output, attn_weights, past_key_value
+++++++++
+++++++++    # def forward(
+++++++++    #     self,
+++++++++    #     hidden_states: mindspore.Tensor,
+++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++++    #     past_key_value: Optional[Cache] = None,
+++++++++    #     output_attentions: bool = False,
+++++++++    #     use_cache: bool = False,
+++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++++
+++++++++    #     bsz, q_len, _ = hidden_states.shape
+++++++++
+++++++++    #     query_states = self.q_proj(hidden_states)
+++++++++    #     key_states = self.k_proj(hidden_states)
+++++++++    #     value_states = self.v_proj(hidden_states)
+++++++++
+++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++++
+++++++++    #     kv_seq_len = key_states.shape[-2]
+++++++++    #     if past_key_value is not None:
+++++++++    #         if self.layer_idx is None:
+++++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++++
+++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++++
+++++++++    #     if past_key_value is not None:
+++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++++    #         key_states, value_states = past_key_value.update(
+++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++++    #         )
+++++++++
+++++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++++
+++++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+++++++++    #     # <--- 修改结束 ---
+++++++++
+++++++++    #     fa_attention_mask = None
+++++++++    #     if attention_mask is not None:
+++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++++    #         fa_attention_mask = (mask_slice != 0)
+++++++++
+++++++++    #     input_dtype = query_states.dtype
+++++++++
+++++++++    #     attn_output = mindspore.ops.flash_attention_score(
+++++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+++++++++    #         key=key_states,
+++++++++    #         value=value_states,
+++++++++    #         head_num=self.num_heads,
+++++++++    #         attn_mask=fa_attention_mask,
+++++++++    #         keep_prob=1.0 - self.attention_dropout,
+++++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++++++    #         input_layout="BNSD",
+++++++++    #         sparse_mode=0,
+++++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+++++++++    #     )
+++++++++
+++++++++    #     attn_output = attn_output.to(input_dtype)
+++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++++    #     attn_output = self.o_proj(attn_output)
+++++++++
+++++++++    #     attn_weights = None
+++++++++    #     if output_attentions:
+++++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++++++
+++++++++    #     return attn_output, attn_weights, past_key_value
+++++++++
++++++++ QWEN2MOE_ATTENTION_CLASSES = {
++++++++     "eager": Qwen2MoeAttention,
+++++++++    "flash-attention": Qwen2MoeFlashAttention,
++++++++ }
++++++++ 
++++++++ 
++++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
++++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
++++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
++++++++ 
+++++++++    #@dwj
+++++++++    # 只遍历激活的专家，而非全部专家
++++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
++++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
++++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
++++++++-        # router_logits: (batch * sequence_length, n_experts)
++++++++-        router_logits = self.gate(hidden_states)
++++++++-
++++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
++++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
++++++++-        if self.norm_topk_prob:
++++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
++++++++-        # we cast back to the input dtype
++++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
++++++++-
++++++++-        final_hidden_states = ops.zeros(
++++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
++++++++-        )
++++++++-
++++++++-        # One hot encode the selected experts to create an expert mask
++++++++-        # this will be used to easily index which expert is going to be sollicitated
++++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
++++++++-
++++++++-        # Loop over all available experts in the model and perform the computation on each expert
++++++++-        for expert_idx in range(self.num_experts):
++++++++-            expert_layer = self.experts[expert_idx]
++++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
++++++++-
++++++++-            # Index the correct hidden states and compute the expert hidden state for
++++++++-            # the current expert. We need to make sure to multiply the output hidden
++++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
++++++++-            if 0 not in idx.shape:
++++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
++++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
++++++++-
++++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
++++++++-                # the `top_x` tensor here.
++++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
++++++++-
++++++++-        shared_expert_output = self.shared_expert(hidden_states)
++++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
++++++++-
++++++++-        final_hidden_states = final_hidden_states + shared_expert_output
+++++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++++            num_tokens = hidden_states_reshaped.shape[0]
+++++++++
+++++++++            router_logits = self.gate(hidden_states_reshaped)
+++++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++++
+++++++++            if self.norm_topk_prob:
+++++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+++++++++            
+++++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++++            flat_selected_experts = selected_experts.flatten()
+++++++++            
+++++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++++            token_indices = broadcasted_token_indices.flatten()
+++++++++            
+++++++++            active_experts = ops.unique(flat_selected_experts)
+++++++++            
+++++++++            for expert_idx_tensor in active_experts:
+++++++++                expert_idx = expert_idx_tensor.item()
+++++++++                expert_layer = self.experts[expert_idx]
+++++++++                
+++++++++                mask = (flat_selected_experts == expert_idx_tensor)
+++++++++                selected_token_indices = token_indices[mask]
+++++++++                selected_routing_weights = routing_weights.flatten()[mask]
+++++++++                
+++++++++                current_states = hidden_states_reshaped[selected_token_indices]
+++++++++                
+++++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++++                
+++++++++                final_hidden_states = final_hidden_states.index_add(
+++++++++                    dim=0,
+++++++++                    index=selected_token_indices,
+++++++++                    source=expert_output.to(hidden_states.dtype)
+++++++++                )
+++++++++            
+++++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
++++++++ 
++++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
++++++++-        return final_hidden_states, router_logits
+++++++++            final_hidden_states = final_hidden_states + shared_expert_output
+++++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++++            
+++++++++            return final_hidden_states, router_logits
++++++++ 
++++++++ 
++++++++ class Qwen2MoeDecoderLayer(nn.Module):
++++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
++++++++ 
++++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
++++++++ 
+++++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++++
++++++++         if (layer_idx not in config.mlp_only_layers) and (
++++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
++++++++         ):
++++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
++++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
++++++++     _skip_keys_device_placement = "past_key_values"
++++++++     _supports_cache_class = True
+++++++++#lwx
+++++++++    # _supports_static_cache = True
++++++++ 
++++++++     def _init_weights(self, module):
++++++++         std = self.config.initializer_range
++++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
++++++++         return causal_mask
++++++++ 
++++++++ 
++++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
++++++++     _tied_weights_keys = ["lm_head.weight"]
++++++++ 
++++++++     def __init__(self, config):
++++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++++         self.num_experts_per_tok = config.num_experts_per_tok
++++++++         # Initialize weights and apply final processing
++++++++         self.post_init()
+++++++++        # @lwx
+++++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++++++        #     self.generation_config.cache_implementation = "static"
+++++++++        self._warmed_up = False
+++++++++
+++++++++    def warmup_moe_model(self):
+++++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++++++        test_texts = [
+++++++++            "warmup short",
+++++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++++++        ]
+++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++++        if tokenizer is None:
+++++++++            from mindnlp.transformers import AutoTokenizer
+++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++++            self._warmup_tokenizer = tokenizer
+++++++++
+++++++++        for text in test_texts:
+++++++++            inputs = tokenizer(text, return_tensors="ms")
+++++++++            with mindspore._no_grad():
+++++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
++++++++ 
++++++++     def get_input_embeddings(self):
++++++++         return self.model.embed_tokens
++++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
++++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
++++++++         ```"""
+++++++++        if not self._warmed_up:
+++++++++            self._warmed_up = True
+++++++++            self.warmup_moe_model()
++++++++ 
++++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
++++++++         output_router_logits = (
++++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
++++++++             }
++++++++         )
++++++++         return model_inputs
+++++++++# @lwx
+++++++++    # def _decode_one_tokens_logits(
+++++++++    #     self,
+++++++++    #     cur_token: mindspore.Tensor,
+++++++++    #     input_pos: Optional[mindspore.Tensor],
+++++++++    #     cache_position: mindspore.Tensor,
+++++++++    #     past_key_values: StaticCache,
+++++++++    # ) -> mindspore.Tensor:
+++++++++    #     """
+++++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++++++        
+++++++++    #     Args:
+++++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++++++    #         input_pos: 输入位置信息，可选
+++++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++++++            
+++++++++    #     Returns:
+++++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++++++    #     """
+++++++++    #     # 调用JIT编译的版本
+++++++++    #     return self.get_decode_one_tokens_logits(
+++++++++    #         cur_token=cur_token,
+++++++++    #         input_pos=input_pos,
+++++++++    #         cache_position=cache_position,
+++++++++    #         past_key_values=past_key_values,
+++++++++    #     )
+++++++++    
+++++++++    # @mindspore.jit(jit_level='O1')
+++++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++++++    #     """
+++++++++    #     JIT编译的函数，用于高效的单token解码
+++++++++    #     使用JIT编译优化以支持静态shape和高效执行
+++++++++        
+++++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++++++    #     """
+++++++++    #     outputs = self.model.forward(
+++++++++    #         input_ids=cur_token,
+++++++++    #         position_ids=input_pos,
+++++++++    #         cache_position=cache_position,
+++++++++    #         past_key_values=past_key_values,
+++++++++    #         use_cache=True,
+++++++++    #         return_dict=False,
+++++++++    #     )
+++++++++        
+++++++++    #     hidden_states = outputs[0]
+++++++++    #     logits = self.lm_head.forward(hidden_states)
+++++++++    #     logits = logits.float()
+++++++++        
+++++++++    #     return logits[:, -1, :]
+++++++++
+++++++++    # def _sample(
+++++++++    #     self,
+++++++++    #     input_ids: mindspore.Tensor,
+++++++++    #     logits_processor,
+++++++++    #     stopping_criteria,
+++++++++    #     generation_config,
+++++++++    #     synced_devices: bool,
+++++++++    #     streamer=None,
+++++++++    #     logits_warper=None,
+++++++++    #     **model_kwargs,
+++++++++    # ):
+++++++++    #     """
+++++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++++++    #     """
+++++++++    #     from ...generation.logits_process import LogitsProcessorList
+++++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++++++    #     from mindnlp.core import nn, ops, no_grad
+++++++++    #     import numpy as np
+++++++++        
+++++++++    #     # 检查是否使用 StaticCache
+++++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++++++    #     # 否则，直接调用父类方法
+++++++++    #     past_key_values = model_kwargs.get("past_key_values")
+++++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++++++
+++++++++    #     if not isinstance(past_key_values, StaticCache):
+++++++++    #         # 不使用 StaticCache，直接调用父类方法
+++++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++++++    #         return super()._sample(
+++++++++    #             input_ids=input_ids,
+++++++++    #             logits_processor=logits_processor,
+++++++++    #             stopping_criteria=stopping_criteria,
+++++++++    #             generation_config=generation_config,
+++++++++    #             synced_devices=synced_devices,
+++++++++    #             streamer=streamer,
+++++++++    #             logits_warper=logits_warper,
+++++++++    #             **model_kwargs,
+++++++++    #         )
+++++++++        
+++++++++    #     # 使用 StaticCache，进入自定义循环
+++++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++++++    #     pad_token_id = generation_config._pad_token_tensor
+++++++++    #     output_attentions = generation_config.output_attentions
+++++++++    #     output_hidden_states = generation_config.output_hidden_states
+++++++++    #     output_scores = generation_config.output_scores
+++++++++    #     output_logits = generation_config.output_logits
+++++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++++++    #     max_length = generation_config.max_length
+++++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++++++    #     do_sample = generation_config.do_sample
+++++++++        
+++++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++++++    #         raise ValueError(
+++++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++++++    #             f"{logits_warper})."
+++++++++    #         )
+++++++++        
+++++++++    #     # init attention / hidden states / scores tuples
+++++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++++++        
+++++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++++++    #         encoder_hidden_states = (
+++++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++++++    #         )
+++++++++        
+++++++++    #     # keep track of which sequences are already finished
+++++++++    #     batch_size, cur_len = input_ids.shape
+++++++++    #     this_peer_finished = False
+++++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++++++        
+++++++++    #     time_record = []
+++++++++    #     from ....utils.testing_utils import parse_flag_from_env
+++++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++++++        
+++++++++    #     while self._has_unfinished_sequences(
+++++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++++++    #     ):
+++++++++    #         if _record_time:
+++++++++    #             import time as time_module
+++++++++    #             infer_start = time_module.time()
+++++++++            
+++++++++    #         # prepare model inputs
+++++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++++++            
+++++++++    #         # prepare variable output controls
+++++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++++++            
+++++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++++++    #         cur_cache_position = model_inputs.get("cache_position")
+++++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++++++    #         cur_input_ids = model_inputs.get("input_ids")
+++++++++            
+++++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++++++    #             cur_cache_position is not None and 
+++++++++    #             len(cur_cache_position.shape) > 0 and
+++++++++    #             cur_cache_position.shape[0] == 1 and
+++++++++    #             cur_input_ids is not None and
+++++++++    #             cur_input_ids.shape[1] == 1):
+++++++++    #             # 使用 JIT 优化的单 token 解码
+++++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++++++    #             if not hasattr(self, '_jit_used'):
+++++++++    #                 self._jit_used = False
+++++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++++++                
+++++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++++++    #                 cur_token=cur_input_ids,
+++++++++    #                 input_pos=model_inputs.get("position_ids"),
+++++++++    #                 cache_position=cur_cache_position,
+++++++++    #                 past_key_values=cur_past_key_values,
+++++++++    #             )
+++++++++                
+++++++++    #             # 标记已使用JIT（用于后续判断）
+++++++++    #             if not self._jit_used:
+++++++++    #                 self._jit_used = True
+++++++++                
+++++++++    #             # 构造兼容的输出对象
+++++++++    #             class JitOptimizedOutput:
+++++++++    #                 def __init__(self, logits, config):
+++++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++++++    #                     self.config = config
+++++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++++++    #                     self.cross_attentions = None
+++++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++++++                
+++++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++++++    #         else:
+++++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++++++    #             outputs = self(**model_inputs, return_dict=True)
+++++++++            
+++++++++    #         if synced_devices and this_peer_finished:
+++++++++    #             continue
+++++++++            
+++++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++++++    #         next_token_logits = outputs.logits[:, -1, :]
+++++++++            
+++++++++    #         # pre-process distribution
+++++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++++++    #         if do_sample:
+++++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++++++            
+++++++++    #         # Store scores, attentions and hidden_states when required
+++++++++    #         if return_dict_in_generate:
+++++++++    #             if output_scores:
+++++++++    #                 scores += (next_token_scores,)
+++++++++    #             if output_logits:
+++++++++    #                 raw_logits += (next_token_logits,)
+++++++++    #             if output_attentions:
+++++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++++++    #                 if self.config.is_encoder_decoder:
+++++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++++++                
+++++++++    #             if output_hidden_states:
+++++++++    #                 hidden = (
+++++++++    #                     outputs.decoder_hidden_states
+++++++++    #                     if self.config.is_encoder_decoder
+++++++++    #                     else outputs.hidden_states
+++++++++    #                 )
+++++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++++++            
+++++++++    #         # token selection
+++++++++    #         if do_sample:
+++++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++++++    #         else:
+++++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++++++            
+++++++++    #         # finished sentences should have their next token be a padding token
+++++++++    #         if has_eos_stopping_criteria:
+++++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++++++            
+++++++++    #         # update generated ids, model inputs, and length for next step
+++++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++++++    #         if streamer is not None:
+++++++++    #             streamer.put(next_tokens)
+++++++++            
+++++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++++++    #             outputs,
+++++++++    #             model_kwargs,
+++++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++++++    #         )
+++++++++            
+++++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++++++    #         cur_len += 1
+++++++++            
+++++++++    #         if _record_time:
+++++++++    #             import time as time_module
+++++++++    #             infer_stop = time_module.time()
+++++++++    #             time_record.append(infer_stop - infer_start)
+++++++++            
+++++++++    #         del outputs
+++++++++        
+++++++++    #     average_infer_time = None
+++++++++    #     if time_record:
+++++++++    #         if len(time_record) > 1:
+++++++++    #             time_record.pop(0)
+++++++++    #         average_infer_time = sum(time_record) / len(time_record)
+++++++++    #         print(f'average inference time is: {average_infer_time}')
+++++++++    #         print(f'inference time record: {time_record}')
+++++++++        
+++++++++    #     if streamer is not None:
+++++++++    #         streamer.end()
+++++++++        
+++++++++    #     # 简单判断：打印是否使用了JIT路径
+++++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++++++    #     else:
+++++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++++++        
+++++++++    #     if return_dict_in_generate:
+++++++++    #         if self.config.is_encoder_decoder:
+++++++++    #             return GenerateEncoderDecoderOutput(
+++++++++    #                 sequences=input_ids,
+++++++++    #                 scores=scores,
+++++++++    #                 logits=raw_logits,
+++++++++    #                 encoder_attentions=encoder_attentions,
+++++++++    #                 encoder_hidden_states=encoder_hidden_states,
+++++++++    #                 decoder_attentions=decoder_attentions,
+++++++++    #                 cross_attentions=cross_attentions,
+++++++++    #                 decoder_hidden_states=decoder_hidden_states,
+++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++++    #                 average_infer_time=average_infer_time
+++++++++    #             )
+++++++++    #         else:
+++++++++    #             return GenerateDecoderOnlyOutput(
+++++++++    #                 sequences=input_ids,
+++++++++    #                 scores=scores,
+++++++++    #                 logits=raw_logits,
+++++++++    #                 attentions=decoder_attentions,
+++++++++    #                 hidden_states=decoder_hidden_states,
+++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++++    #                 average_infer_time=average_infer_time
+++++++++    #             )
+++++++++    #     else:
+++++++++    #         return input_ids
+++++++++            
+++++++++    # def _prepare_cache_for_generation(
+++++++++    #     self,
+++++++++    #     generation_config,
+++++++++    #     model_kwargs,
+++++++++    #     assistant_model,
+++++++++    #     batch_size,
+++++++++    #     max_cache_length,
+++++++++    # ):
+++++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++++++    #         generation_config.cache_implementation = "static"
+++++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++++++        
+++++++++    #     if generation_config.cache_implementation == "static":
+++++++++    #         base_required_from_max_length = generation_config.max_length + 1
+++++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++++++    #         min_cache_size = 50
+++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++++++    #         else:
+++++++++    #             max_cache_length = max(base_required, min_cache_size)
+++++++++            
+++++++++    #         original_max_cache_length = max_cache_length
+++++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++++++            
+++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++++    #             if max_cache_length > self.config.max_position_embeddings:
+++++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++++++        
+++++++++    #     result = super()._prepare_cache_for_generation(
+++++++++    #         generation_config=generation_config,
+++++++++    #         model_kwargs=model_kwargs,
+++++++++    #         assistant_model=assistant_model,
+++++++++    #         batch_size=batch_size,
+++++++++    #         max_cache_length=max_cache_length,
+++++++++    #     )
+++++++++        
+++++++++    #     if generation_config.cache_implementation == "static":
+++++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++++++    #         created_cache = model_kwargs.get(cache_name)
+++++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+++++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++++++        
+++++++++    #     return result
+++++++++
+++++++++
+++++++++
++++++++ 
++++++++ 
++++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
++++++++-- 
++++++++2.27.0
++++++++
+++++++-- 
+++++++2.27.0
+++++++
++++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
++++++new file mode 100644
++++++index 00000000..966529e4
++++++--- /dev/null
+++++++++ b/patches/0003-20261106secondcommit.patch
++++++@@ -0,0 +1,2769 @@
+++++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+++++++From: Pinoeer-kingxi <13022943007@163.com>
+++++++Date: Thu, 6 Nov 2025 14:54:37 +0800
+++++++Subject: [PATCH 3/3] 20261106secondcommit
+++++++
+++++++---
+++++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+++++++ patches/0001-20251104commit.patch             | 1272 -----------------
+++++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+++++++ delete mode 100644 patches/0001-20251104commit.patch
+++++++
+++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++index 73773c22..2f9192bf 100644
+++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+++++++ 
+++++++ _CONFIG_FOR_DOC = "DeepseekConfig"
+++++++ 
++++++++_attn_mask_cache = {}
++++++++
++++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
++++++++    q_len = batch_and_seq[1]
++++++++    kv_len = batch_and_seq[1] + past_key_values_length 
++++++++    key = (batch_and_seq[0], q_len, kv_len)
++++++++
++++++++    if key in _attn_mask_cache:
++++++++        return _attn_mask_cache[key]
++++++++
++++++++    mask = _prepare_4d_causal_attention_mask(
++++++++        attention_mask,
++++++++        batch_and_seq,
++++++++        inputs_embeds,
++++++++        past_key_values_length,
++++++++    )
++++++++    _attn_mask_cache[key] = mask
++++++++    return mask
+++++++ 
+++++++ def _get_unpad_data(attention_mask):
+++++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+++++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+++++++         return final_output
+++++++ 
+++++++ 
+++++++-    @no_grad()
+++++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++-        expert_cache = ops.zeros_like(x)
+++++++-        idxs = flat_expert_indices.argsort()
+++++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++-        token_idxs = idxs // self.num_experts_per_tok
+++++++-
+++++++-        for i, end_idx in enumerate(tokens_per_expert):
+++++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++-            if start_idx == end_idx:
+++++++-                continue
+++++++-            expert = self.experts[i]
+++++++-            exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-            expert_tokens = x[exp_token_idx]
+++++++-            expert_out = expert(expert_tokens)
+++++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++-
+++++++-        return expert_cache
+++++++-        
+++++++     # @no_grad()
+++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-    #     # expert_cache = torch.zeros_like(x)
+++++++-    #     # idxs = flat_expert_indices.argsort()
+++++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++-    #     # token_idxs = idxs // self.num_experts_per_tok
+++++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++-    #     #     if start_idx == end_idx:
+++++++-    #     #         continue
+++++++-    #     #     expert = self.experts[i]
+++++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-    #     #     expert_tokens = x[exp_token_idx]
+++++++-    #     #     expert_out = expert(expert_tokens)
+++++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++-    #     # return expert_cache
++++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++     #     expert_cache = ops.zeros_like(x)
+++++++     #     idxs = flat_expert_indices.argsort()
+++++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+++++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++ 
+++++++     #     return expert_cache
+++++++-    # @no_grad()
+++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-    #     expert_cache = ops.zeros_like(x)
++++++++        
++++++++    @no_grad()
++++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
++++++++        """
++++++++        优化版 MoE prefill：
++++++++        - 批量张量化处理同一个 expert 的所有 token
++++++++        - 跳过无 token 的专家
++++++++        - 保持结果完全一致
++++++++        """
++++++++        # 初始化输出缓存
++++++++        expert_cache = ops.zeros_like(x)
+++++++ 
+++++++-    #     # 排序保证顺序一致
+++++++-    #     idxs = flat_expert_indices.argsort()
+++++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++-    #     token_idxs = idxs // self.num_experts_per_tok
++++++++        # 排序（确保 scatter_add 位置对应原逻辑）
++++++++        idxs = flat_expert_indices.argsort()
++++++++        sorted_expert_indices = flat_expert_indices[idxs]
++++++++        sorted_token_indices = idxs // self.num_experts_per_tok
+++++++ 
+++++++-    #     # 找出有 token 的专家
+++++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++++        # 每个 expert 的 token 数
++++++++        tokens_per_expert = sorted_expert_indices.bincount()
+++++++ 
+++++++-    #     for i in active_experts.tolist():
+++++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++-    #         end_idx = tokens_per_expert[i]
+++++++-    #         if start_idx == end_idx:  # 没有 token
+++++++-    #             continue
++++++++        # 找出有 token 的专家
++++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+++++++ 
+++++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-    #         expert_tokens = x[exp_token_idx]
+++++++-    #         expert_out = self.experts[i](expert_tokens)
+++++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++++        for expert_id in active_experts.tolist():
++++++++            # 取该 expert 对应的排序后 token 区间
++++++++            start = (tokens_per_expert[:expert_id]).sum().item()
++++++++            end = start + tokens_per_expert[expert_id].item()
+++++++ 
+++++++-    #         expert_cache = mindspore.mint.scatter_add(
+++++++-    #             expert_cache,
+++++++-    #             0,
+++++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++-    #             expert_out
+++++++-    #         )
++++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
++++++++            expert_tokens = x[token_idx]                     # 取输入向量
+++++++ 
+++++++-    #     return expert_cache
++++++++            # 执行专家 MLP
++++++++            expert_out = self.experts[expert_id](expert_tokens)
++++++++
++++++++            # 按权重缩放
++++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
++++++++
++++++++            # 回写到缓存（等价 scatter_add）
++++++++            expert_cache = mindspore.mint.scatter_add(
++++++++                expert_cache,
++++++++                0,
++++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++++                scaled_out
++++++++            )
++++++++
++++++++        return expert_cache
++++++++
++++++++        # @no_grad()
++++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++        #     # expert_cache = torch.zeros_like(x)
++++++++        #     # idxs = flat_expert_indices.argsort()
++++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
++++++++        #     # token_idxs = idxs // self.num_experts_per_tok
++++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
++++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
++++++++        #     #     if start_idx == end_idx:
++++++++        #     #         continue
++++++++        #     #     expert = self.experts[i]
++++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
++++++++        #     #     expert_tokens = x[exp_token_idx]
++++++++        #     #     expert_out = expert(expert_tokens)
++++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
++++++++        #     # return expert_cache
++++++++        #     expert_cache = ops.zeros_like(x)
++++++++        #     idxs = flat_expert_indices.argsort()
++++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++        #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++        #     for i, end_idx in enumerate(tokens_per_expert):
++++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++        #         if start_idx == end_idx:
++++++++        #             continue
++++++++        #         expert = self.experts[i]
++++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++        #         expert_tokens = x[exp_token_idx]
++++++++        #         expert_out = expert(expert_tokens)
++++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
++++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
++++++++
++++++++        #     return expert_cache
++++++++        # @no_grad()
++++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
++++++++        #     expert_cache = ops.zeros_like(x)
++++++++
++++++++        #     # 排序保证顺序一致
++++++++        #     idxs = flat_expert_indices.argsort()
++++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
++++++++        #     token_idxs = idxs // self.num_experts_per_tok
++++++++
++++++++        #     # 找出有 token 的专家
++++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
++++++++
++++++++        #     for i in active_experts.tolist():
++++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
++++++++        #         end_idx = tokens_per_expert[i]
++++++++        #         if start_idx == end_idx:  # 没有 token
++++++++        #             continue
++++++++
++++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
++++++++        #         expert_tokens = x[exp_token_idx]
++++++++        #         expert_out = self.experts[i](expert_tokens)
++++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
++++++++
++++++++        #         expert_cache = mindspore.mint.scatter_add(
++++++++        #             expert_cache,
++++++++        #             0,
++++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
++++++++        #             expert_out
++++++++        #         )
++++++++
++++++++        #     return expert_cache
+++++++ 
+++++++ 
+++++++ 
+++++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+++++++ 
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++-
+++++++ # class DeepseekFlashAttention(nn.Module):
+++++++ #     """
+++++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+++++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+++++++ 
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
++++++++
+++++++ Deepseek_ATTENTION_CLASSES = {
+++++++     "eager": DeepseekAttention,
+++++++     "flash-attention": DeepseekFlashAttention,
+++++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+++++++             )
+++++++         else:
+++++++             # 4d mask is passed through the layers
+++++++-            attention_mask = _prepare_4d_causal_attention_mask(
++++++++            # attention_mask = _prepare_4d_causal_attention_mask(
++++++++            #     attention_mask,
++++++++            #     (batch_size, seq_length),
++++++++            #     inputs_embeds,
++++++++            #     past_key_values_length,
++++++++            # )
++++++++            #@dwj
++++++++            attention_mask = get_cached_causal_mask(
+++++++                 attention_mask,
+++++++                 (batch_size, seq_length),
+++++++                 inputs_embeds,
+++++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++         # Initialize weights and apply final processing
+++++++         self.post_init()
+++++++         self.warm_up = False
++++++++        #@dwj
++++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
++++++++            self.num_layers,
++++++++            self.num_attention_heads,
++++++++            self.head_dim,
++++++++            batch_size=1,
++++++++            max_length=self.max_length,
++++++++            dtype=mindspore.float16
++++++++        )
++++++++
++++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
++++++++        key_cache = []
++++++++        value_cache = []
++++++++        for _ in range(num_layers):
++++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
++++++++            key_cache.append(k)
++++++++            value_cache.append(v)
++++++++        return key_cache, value_cache
++++++++
+++++++ 
+++++++     def warmup_moe_model_deep(self):
+++++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++index bced285c..ebd7782e 100644
+++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+++++++ 
+++++++-Long_Prompt = False
+++++++-PROMPT_LENGTH_THRESHOLD = 128
++++++++Long_Prompt = 1
++++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
++++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
++++++++
++++++++_causal_mask_cache = {}
++++++++
++++++++def get_cached_causal_mask_with_cache_position(
++++++++    attention_mask: mindspore.Tensor,
++++++++    sequence_length: int,
++++++++    target_length: int,
++++++++    dtype: mindspore.dtype,
++++++++    min_dtype: float,
++++++++    cache_position: mindspore.Tensor,
++++++++    batch_size: int,
++++++++):
++++++++    """
++++++++    带缓存的 causal mask 构造函数
++++++++    """
++++++++    # q_len 是当前 query 长度
++++++++    q_len = sequence_length
++++++++    # kv_len 是 target_length
++++++++    kv_len = target_length
++++++++
++++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
++++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
++++++++
++++++++    if key in _causal_mask_cache:
++++++++        return _causal_mask_cache[key]
++++++++
++++++++    # 调用原来的 mask 构造逻辑
++++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++++        attention_mask,
++++++++        sequence_length=sequence_length,
++++++++        target_length=target_length,
++++++++        dtype=dtype,
++++++++        min_dtype=min_dtype,
++++++++        cache_position=cache_position,
++++++++        batch_size=batch_size,
++++++++    )
++++++++    # 缓存结果
++++++++    _causal_mask_cache[key] = causal_mask
++++++++    return causal_mask
+++++++ 
+++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+++++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++++ 
+++++++ 
+++++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
++++++++# class Qwen2MoeAttention(nn.Module):
++++++++#     """
++++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
++++++++#     and "Generating Long Sequences with Sparse Transformers".
++++++++#     """
++++++++
++++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
++++++++#         super().__init__()
++++++++#         self.config = config
++++++++#         self.layer_idx = layer_idx
++++++++#         if layer_idx is None:
++++++++#             logger.warning_once(
++++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
++++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++++#                 "when creating this class."
++++++++#             )
++++++++
++++++++#         self.hidden_size = config.hidden_size
++++++++#         self.num_heads = config.num_attention_heads
++++++++#         self.head_dim = self.hidden_size // self.num_heads
++++++++#         self.num_key_value_heads = config.num_key_value_heads
++++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
++++++++#         self.max_position_embeddings = config.max_position_embeddings
++++++++#         self.rope_theta = config.rope_theta
++++++++#         self.is_causal = True
++++++++#         self.attention_dropout = config.attention_dropout
++++++++
++++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
++++++++#             raise ValueError(
++++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
++++++++#                 f" and `num_heads`: {self.num_heads})."
++++++++#             )
++++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
++++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
++++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
++++++++
++++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
++++++++#             self.head_dim,
++++++++#             max_position_embeddings=self.max_position_embeddings,
++++++++#             base=self.rope_theta,
++++++++#         )
++++++++
++++++++#     def forward(
++++++++#         self,
++++++++#         hidden_states: mindspore.Tensor,
++++++++#         attention_mask: Optional[mindspore.Tensor] = None,
++++++++#         position_ids: Optional[mindspore.Tensor] = None,
++++++++#         past_key_value: Optional[Cache] = None,
++++++++#         output_attentions: bool = False,
++++++++#         use_cache: bool = False,
++++++++#         cache_position: Optional[mindspore.Tensor] = None,
++++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
++++++++
++++++++        
++++++++
++++++++#         bsz, q_len, _ = hidden_states.shape
++++++++
++++++++#         query_states = self.q_proj(hidden_states)
++++++++#         key_states = self.k_proj(hidden_states)
++++++++#         value_states = self.v_proj(hidden_states)
++++++++
++++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
++++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
++++++++
++++++++#         kv_seq_len = key_states.shape[-2]
++++++++#         if past_key_value is not None:
++++++++#             if self.layer_idx is None:
++++++++#                 raise ValueError(
++++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
++++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
++++++++#                     "with a layer index."
++++++++#                 )
++++++++#             if isinstance(past_key_value, StaticCache):
++++++++#                 kv_seq_len = key_states.shape[-2]
++++++++#             else:
++++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
++++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
++++++++
++++++++#         if past_key_value is not None:
++++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++            
++++++++#             if isinstance(past_key_value, StaticCache):
++++++++#                 kv_seq_len = key_states.shape[-2]
++++++++
++++++++#         # repeat k/v heads if n_kv_heads < n_heads
++++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++        
++++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++++
++++++++#         if attention_mask is not None:
++++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++++#             attn_weights = attn_weights + causal_mask
++++++++
++++++++#         # upcast attention to fp32
++++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++++++#         attn_output = ops.matmul(attn_weights, value_states)
++++++++
++++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++++++#             raise ValueError(
++++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
++++++++#                 f" {attn_output.shape}"
++++++++#             )
++++++++
++++++++#         attn_output = ops.transpose(attn_output, 1, 2)
++++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++++
++++++++#         attn_output = self.o_proj(attn_output)
++++++++#         # @lwx
++++++++        
++++++++#         # max_seq_len = self.max_position_embeddings  # 2048
++++++++
++++++++#         # if attention_mask is not None:
++++++++#         #     # attention_mask: [B, 1, Sq, Sk]
++++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
++++++++
++++++++#         #     # pad 到 [max_seq_len, max_seq_len]
++++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
++++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
++++++++#         #     global_attention_mask = padded_mask
++++++++#         # else:
++++++++#         #     global_attention_mask = None
++++++++
++++++++
++++++++#         # sparse_mode=3
++++++++#         # attn_output = mindspore.ops.flash_attention_score(
++++++++#         #     query=query_states,
++++++++#         #     key=key_states,
++++++++#         #     value=value_states,
++++++++#         #     real_shift=None,
++++++++#         #     padding_mask=None,
++++++++
++++++++#         #     head_num=self.num_heads,
++++++++#         #     attn_mask=global_attention_mask,
++++++++#         #     keep_prob=1.0 - self.attention_dropout,
++++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++#         #     input_layout="BNSD",
++++++++#         #     pre_tokens=2147483647,
++++++++#         #     next_tokens=2147483647,
++++++++#         #     inner_precise=0,
++++++++#         #     drop_mask=None,
++++++++#         #     prefix=None,
++++++++#         #     actual_seq_qlen=None,
++++++++#         #     actual_seq_kvlen=None,
++++++++#         #     sparse_mode=sparse_mode,
++++++++#         # )
++++++++#         if not output_attentions:
++++++++#             attn_weights = None
++++++++
++++++++#         return attn_output, attn_weights, past_key_value
++++++++
+++++++ class Qwen2MoeAttention(nn.Module):
+++++++     """
+++++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+++++++-    and "Generating Long Sequences with Sparse Transformers".
+++++++-    """
++++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+++++++ 
++++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
++++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
++++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
++++++++
++++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
++++++++    """
+++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++         super().__init__()
+++++++         self.config = config
+++++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+++++++         if layer_idx is None:
+++++++             logger.warning_once(
+++++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+++++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
++++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+++++++                 "when creating this class."
+++++++             )
+++++++ 
+++++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+++++++         use_cache: bool = False,
+++++++         cache_position: Optional[mindspore.Tensor] = None,
+++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-
+++++++         
+++++++-
++++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+++++++         bsz, q_len, _ = hidden_states.shape
+++++++ 
+++++++         query_states = self.q_proj(hidden_states)
+++++++         key_states = self.k_proj(hidden_states)
+++++++         value_states = self.v_proj(hidden_states)
+++++++ 
+++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+++++++-
++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
++++++++        
+++++++         kv_seq_len = key_states.shape[-2]
+++++++         if past_key_value is not None:
+++++++-            if self.layer_idx is None:
+++++++-                raise ValueError(
+++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-                    "with a layer index."
+++++++-                )
+++++++-            if isinstance(past_key_value, StaticCache):
+++++++-                kv_seq_len = key_states.shape[-2]
+++++++-            else:
+++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
++++++++        
+++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++ 
+++++++         if past_key_value is not None:
+++++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
++++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
++++++++
++++++++        # --- 2. 动态调度核心注意力计算 ---
++++++++        global Long_Prompt
++++++++        if Long_Prompt >= 1:
++++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
++++++++            fa_attention_mask = None
++++++++            if attention_mask is not None:
++++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
++++++++                fa_attention_mask = (mask_slice != 0)
++++++++
++++++++            attn_output = mindspore.ops.flash_attention_score(
++++++++                query=query_states,
++++++++                key=key_states,
++++++++                value=value_states,
++++++++                head_num=self.num_heads,
++++++++                attn_mask=fa_attention_mask,
++++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
++++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
++++++++                input_layout="BNSD",
++++++++                sparse_mode=0,
++++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
++++++++            )
+++++++             
+++++++-            if isinstance(past_key_value, StaticCache):
+++++++-                kv_seq_len = key_states.shape[-2]
++++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
++++++++            attn_output = self.o_proj(attn_output)
++++++++            attn_weights = None
++++++++            if output_attentions:
++++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+++++++ 
+++++++-        # repeat k/v heads if n_kv_heads < n_heads
+++++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++-        
+++++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
++++++++        else:
++++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
++++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
++++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
++++++++            
++++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++ 
+++++++-        if attention_mask is not None:
+++++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++-            attn_weights = attn_weights + causal_mask
++++++++            if attention_mask is not None:
++++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
++++++++                attn_weights = attn_weights + causal_mask
+++++++ 
+++++++-        # upcast attention to fp32
+++++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+++++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+++++++-        attn_output = ops.matmul(attn_weights, value_states)
++++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
++++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
++++++++            attn_output = ops.matmul(attn_weights, value_states)
+++++++ 
+++++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+++++++-            raise ValueError(
+++++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+++++++-                f" {attn_output.shape}"
+++++++-            )
++++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
++++++++                raise ValueError(
++++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
++++++++                )
+++++++ 
+++++++-        attn_output = ops.transpose(attn_output, 1, 2)
+++++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++++            attn_output = ops.transpose(attn_output, 1, 2)
++++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
++++++++            attn_output = self.o_proj(attn_output)
+++++++ 
+++++++-        attn_output = self.o_proj(attn_output)
+++++++-        # @lwx
++++++++            if not output_attentions:
++++++++                attn_weights = None
+++++++         
+++++++-        # max_seq_len = self.max_position_embeddings  # 2048
+++++++-
+++++++-        # if attention_mask is not None:
+++++++-        #     # attention_mask: [B, 1, Sq, Sk]
+++++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++-
+++++++-        #     # pad 到 [max_seq_len, max_seq_len]
+++++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++-        #     global_attention_mask = padded_mask
+++++++-        # else:
+++++++-        #     global_attention_mask = None
+++++++-
+++++++-
+++++++-        # sparse_mode=3
+++++++-        # attn_output = mindspore.ops.flash_attention_score(
+++++++-        #     query=query_states,
+++++++-        #     key=key_states,
+++++++-        #     value=value_states,
+++++++-        #     real_shift=None,
+++++++-        #     padding_mask=None,
+++++++-
+++++++-        #     head_num=self.num_heads,
+++++++-        #     attn_mask=global_attention_mask,
+++++++-        #     keep_prob=1.0 - self.attention_dropout,
+++++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++-        #     input_layout="BNSD",
+++++++-        #     pre_tokens=2147483647,
+++++++-        #     next_tokens=2147483647,
+++++++-        #     inner_precise=0,
+++++++-        #     drop_mask=None,
+++++++-        #     prefix=None,
+++++++-        #     actual_seq_qlen=None,
+++++++-        #     actual_seq_kvlen=None,
+++++++-        #     sparse_mode=sparse_mode,
+++++++-        # )
+++++++-        if not output_attentions:
+++++++-            attn_weights = None
+++++++-
+++++++         return attn_output, attn_weights, past_key_value
+++++++ 
+++++++-
+++++++ # class Qwen2MoeFlashAttention(nn.Module):
+++++++ #     """
+++++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+++++++ #             return final_hidden_states, router_logits
+++++++ 
+++++++ 
+++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-#     """
+++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+++++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+++++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+++++++-#     """
+++++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++++-#         super().__init__()
+++++++-#         self.num_experts = config.num_experts
+++++++-#         self.top_k = config.num_experts_per_tok
+++++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++++-
+++++++-#         # 门控网络
+++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++-#         # 专家列表
+++++++-#         self.experts = nn.ModuleList(
+++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++-#         )
+++++++-#         # 共享专家
+++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_decode(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         【解码路径】针对 sequence_length=1 的极致优化。
+++++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+++++++-#         """
+++++++-#         batch_size, hidden_dim = hidden_states.shape
+++++++-        
+++++++-#         expert_outputs_list = [
+++++++-#             ops.cat([
+++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++-#             ], dim=0) 
+++++++-#             for i in range(batch_size)
+++++++-#         ]
+++++++-        
+++++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+++++++-#         # shape: (batch_size, top_k, hidden_dim)
+++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++-        
+++++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++-        
+++++++-#         return moe_output.squeeze(1)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_prefill(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         【预填充路径】针对 sequence_length > 1 的优化。
+++++++-#         按专家对 Token 进行分组，并进行批处理。
+++++++-#         """
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens = hidden_states.shape[0]
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-        
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-        
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-        
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-            
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-#             selected_token_indices = token_indices[mask]
+++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++-            
+++++++-#             current_states = hidden_states[selected_token_indices]
+++++++-            
+++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-            
+++++++-#             moe_output = moe_output.index_add(
+++++++-#                 dim=0,
+++++++-#                 index=selected_token_indices,
+++++++-#                 source=expert_output.to(hidden_states.dtype)
+++++++-#             )
+++++++-#         return moe_output
+++++++-
+++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         顶层 forward 方法，作为智能分发器。
+++++++-#         """
+++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        
+++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-
+++++++-#         if self.norm_topk_prob:
+++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        
+++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-        
+++++++-#         moe_output = None
+++++++-#         # 在推理时，根据序列长度选择最优路径
+++++++-#         if not self.training:
+++++++-#             if sequence_length == 1:
+++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++++-#             else:
+++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++++-#         else:
+++++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+++++++-#             raise NotImplementedError("Training path is not implemented.")
+++++++-
+++++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+++++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+++++++-        
+++++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+++++++-        
+++++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+++++++-        
+++++++-#         return final_hidden_states, router_logits
+++++++-
+++++++-
+++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-#     """
+++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+++++++-#     """
+++++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++++-#         super().__init__()
+++++++-#         self.num_experts = config.num_experts
+++++++-#         self.top_k = config.num_experts_per_tok
+++++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++++-
+++++++-#         # 门控网络
+++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++-#         # 专家列表
+++++++-#         self.experts = nn.ModuleList(
+++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++-#         )
+++++++-#         # 共享专家
+++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_decode(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         batch_size, _ = hidden_states.shape
+++++++-#         expert_outputs_list = [
+++++++-#             ops.cat([
+++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++-#             ], dim=0) 
+++++++-#             for i in range(batch_size)
+++++++-#         ]
+++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++-#         return moe_output.squeeze(1)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_prefill(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens = hidden_states.shape[0]
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-        
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-#             selected_token_indices = token_indices[mask]
+++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++-#             current_states = hidden_states[selected_token_indices]
+++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-#             moe_output = moe_output.index_add(
+++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++-#             )
+++++++-#         return moe_output
+++++++-
+++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         顶层 forward 方法，作为智能分发器。
+++++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+++++++-#         """
+++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        
+++++++-#         # 1. 门控计算 (通用逻辑)
+++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-
+++++++-#         if self.norm_topk_prob:
+++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        
+++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-        
+++++++-#         # 2. 智能分发到最优 MoE 路径
+++++++-#         moe_output = None
+++++++-#         if not self.training:
+++++++-#             if sequence_length == 1:
+++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+++++++-#             else:
+++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+++++++-#         else:
+++++++-#             raise NotImplementedError("Training path is not implemented.")
+++++++-
+++++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+++++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++-        
+++++++-#         # 4. 合并 MoE 输出和共享专家输出
+++++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++-        
+++++++-#         # 5. 恢复原始形状并返回
+++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++-        
+++++++-#         return final_hidden_states, router_logits
+++++++-
+++++++-# prefill fastest
+++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-#     """
+++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+++++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+++++++-#     """
+++++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++++-#         super().__init__()
+++++++-#         self.num_experts = config.num_experts
+++++++-#         self.top_k = config.num_experts_per_tok
+++++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++++-
+++++++-#         # 门控网络
+++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++-#         # 专家列表
+++++++-#         self.experts = nn.ModuleList(
+++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++-#         )
+++++++-#         # 共享专家
+++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_dispatch(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+++++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+++++++-#         """
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens, _ = hidden_states.shape
+++++++-        
+++++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-#         flat_routing_weights = routing_weights.flatten()
+++++++-
+++++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-
+++++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-        
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-            
+++++++-#             # 找到所有分配给该专家的 token
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-            
+++++++-#             # 使用 mask 选取对应的 token 和权重
+++++++-#             current_token_indices = token_indices[mask]
+++++++-#             current_routing_weights = flat_routing_weights[mask]
+++++++-#             current_hidden_states = hidden_states[current_token_indices]
+++++++-            
+++++++-#             # 对这些 token 进行批处理
+++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++++-            
+++++++-#             # 使用 index_add 将结果精确地加回到对应位置
+++++++-#             moe_output = moe_output.index_add(
+++++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++-#             )
+++++++-#         return moe_output
+++++++-
+++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         顶层 forward 方法，作为智能分发器。
+++++++-#         """
+++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        
+++++++-#         # 1. 门控计算
+++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-
+++++++-#         if self.norm_topk_prob:
+++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        
+++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-        
+++++++-#         # 2. 调用统一的 MoE 计算内核
+++++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+++++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+++++++-
+++++++-#         # 3. 统一处理共享专家
+++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++-        
+++++++-#         # 4. 合并输出
+++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++-        
+++++++-#         # 5. 恢复原始形状并返回
+++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++-        
+++++++-#         return final_hidden_states, router_logits
+++++++-
+++++++-
+++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-#     """
+++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+++++++-#     【最终高性能与高精度版】：
+++++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+++++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+++++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+++++++-#     3. 这样实现了速度和准确性的两全其美。
+++++++-#     """
+++++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++++-#         super().__init__()
+++++++-#         self.num_experts = config.num_experts
+++++++-#         self.top_k = config.num_experts_per_tok
+++++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++++-
+++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++-#         self.experts = nn.ModuleList(
+++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++-#         )
+++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_decode(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         【解码路径】极致优化版：bmm + 高精度累加。
+++++++-#         """
+++++++-#         original_dtype = hidden_states.dtype
+++++++-#         batch_size, _ = hidden_states.shape
+++++++-        
+++++++-#         expert_outputs_list = [
+++++++-#             ops.cat([
+++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++-#             ], dim=0) 
+++++++-#             for i in range(batch_size)
+++++++-#         ]
+++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++-
+++++++-#         # 在 float32 下执行 bmm，得到高精度结果
+++++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+++++++-        
+++++++-#         # 将高精度结果转换回原始数据类型
+++++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+++++++-        
+++++++-#         return moe_output
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_prefill(
+++++++-#         self, 
+++++++-#         hidden_states: mindspore.Tensor, 
+++++++-#         selected_experts: mindspore.Tensor, 
+++++++-#         routing_weights: mindspore.Tensor
+++++++-#     ) -> mindspore.Tensor:
+++++++-#         """
+++++++-#         【预填充路径】与原始实现一致，结果精确。
+++++++-#         """
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens, _ = hidden_states.shape
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-        
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-#             selected_token_indices = token_indices[mask]
+++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++-#             current_states = hidden_states[selected_token_indices]
+++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-#             moe_output = moe_output.index_add(
+++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+++++++-#             )
+++++++-#         return moe_output
+++++++-
+++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-        
+++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-
+++++++-#         if self.norm_topk_prob:
+++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        
+++++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+++++++-#         # 如果模型主体是 float16，后续再转换
+++++++-        
+++++++-#         moe_output = None
+++++++-#         if not self.training:
+++++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+++++++-#             # _moe_infer_decode 内部会处理好类型转换
+++++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-#             if sequence_length == 1:
+++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++++-#             else:
+++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+++++++-#         else:
+++++++-#             raise NotImplementedError("Training path is not implemented.")
+++++++-
+++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++-        
+++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++-        
+++++++-#         return final_hidden_states, router_logits
+++++++-    
+++++++-
+++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-#     """
+++++++-#     【融合版】一个混合专家模块，内置两种推理策略，
+++++++-#     由外部全局变量 `Long_Prompt` 控制：
+++++++-
+++++++-#     - if Long_Prompt is True:  【精度优先模式】
+++++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+++++++-#       适用于处理长序列，避免误差累积。
+++++++-
+++++++-#     - if Long_Prompt is False: 【速度优先模式】
+++++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+++++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+++++++-#     """
+++++++-#     def __init__(self, config: Qwen2MoeConfig):
+++++++-#         super().__init__()
+++++++-#         self.num_experts = config.num_experts
+++++++-#         self.top_k = config.num_experts_per_tok
+++++++-#         self.norm_topk_prob = config.norm_topk_prob
+++++++-
+++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+++++++-#         self.experts = nn.ModuleList(
+++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+++++++-#         )
+++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++-
+++++++-#     # --- 速度优先模式的辅助函数 ---
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++-#         original_dtype = hidden_states.dtype
+++++++-#         batch_size, _ = hidden_states.shape
+++++++-#         expert_outputs_list = [
+++++++-#             ops.cat([
+++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+++++++-#             ], dim=0) 
+++++++-#             for i in range(batch_size)
+++++++-#         ]
+++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+++++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+++++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+++++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++++-
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens, _ = hidden_states.shape
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-#             selected_token_indices = token_indices[mask]
+++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+++++++-#             current_states = hidden_states[selected_token_indices]
+++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++-#         return moe_output
+++++++-
+++++++-#     # --- 精度优先模式的辅助函数 ---
+++++++-#     @no_grad()
+++++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++-#         moe_output = ops.zeros_like(hidden_states)
+++++++-#         num_tokens, _ = hidden_states.shape
+++++++-#         flat_selected_experts = selected_experts.flatten()
+++++++-#         flat_routing_weights = routing_weights.flatten()
+++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+++++++-#         active_experts = ops.unique(flat_selected_experts)
+++++++-#         for expert_idx_tensor in active_experts:
+++++++-#             expert_idx = expert_idx_tensor.item()
+++++++-#             expert_layer = self.experts[expert_idx]
+++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+++++++-#             current_token_indices = token_indices[mask]
+++++++-#             current_routing_weights = flat_routing_weights[mask]
+++++++-#             current_hidden_states = hidden_states[current_token_indices]
+++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+++++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++-#         return moe_output
+++++++-
+++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+++++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+++++++-#         global Long_Prompt
+++++++-
+++++++-#         # 1. 门控计算 (所有模式通用)
+++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-#         router_logits = self.gate(hidden_states_reshaped)
+++++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+++++++-#         if self.norm_topk_prob:
+++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-        
+++++++-#         moe_output = None
+++++++-#         if not self.training:
+++++++-#             # 根据 Long_Prompt 标志选择模式
+++++++-#             if Long_Prompt:
+++++++-#                 # --- 精度优先模式 ---
+++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++-#             else:
+++++++-#                 # --- 速度优先模式 ---
+++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++-#                 if sequence_length == 1:
+++++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++-#                 else:
+++++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++-#         else:
+++++++-#             raise NotImplementedError("Training path is not implemented.")
+++++++-
+++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+++++++-        
+++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+++++++-        
+++++++-#         return final_hidden_states, router_logits
+++++++-    
+++++++ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++     """
+++++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+++++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+++++++         return moe_output_fp32.squeeze(1).to(original_dtype)
+++++++ 
++++++++    # @no_grad()
++++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
++++++++    #     num_tokens, _ = hidden_states.shape
++++++++    #     flat_selected_experts = selected_experts.flatten()
++++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
++++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
++++++++    #     original_token_indices = sorted_expert_indices // self.top_k
++++++++    #     moe_output = ops.zeros_like(hidden_states)
++++++++    #     current_token_offset = 0
++++++++    #     for i in range(self.num_experts):
++++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
++++++++    #         if expert_token_count == 0:
++++++++    #             continue
++++++++    #         end_offset = current_token_offset + expert_token_count
++++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
++++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
++++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
++++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
++++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
++++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
++++++++    #         current_token_offset += expert_token_count
++++++++    #     return moe_output
++++++++
+++++++     @no_grad()
+++++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++-        num_tokens, _ = hidden_states.shape
+++++++-        flat_selected_experts = selected_experts.flatten()
+++++++-        sorted_expert_indices = flat_selected_experts.argsort()
+++++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+++++++-        original_token_indices = sorted_expert_indices // self.top_k
++++++++        """
++++++++        优化版 MoE prefill (速度优先模式)：
++++++++        - 批量张量化处理同一个 expert 的所有 token
++++++++        - 跳过无 token 的专家
++++++++        - 保持结果完全一致
++++++++        """
+++++++         moe_output = ops.zeros_like(hidden_states)
+++++++-        current_token_offset = 0
+++++++-        for i in range(self.num_experts):
+++++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+++++++-            if expert_token_count == 0:
+++++++-                continue
+++++++-            end_offset = current_token_offset + expert_token_count
+++++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+++++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+++++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+++++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+++++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+++++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+++++++-            current_token_offset += expert_token_count
++++++++
++++++++        flat_selected_experts = selected_experts.flatten()
++++++++        flat_routing_weights = routing_weights.flatten()
++++++++
++++++++        idxs = flat_selected_experts.argsort()
++++++++        sorted_expert_indices = flat_selected_experts[idxs]
++++++++        sorted_token_indices = idxs // self.top_k
++++++++
++++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
++++++++
++++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
++++++++
++++++++        for expert_id in active_experts.tolist():
++++++++            start = int(tokens_per_expert[:expert_id].sum().item())
++++++++            end = start + int(tokens_per_expert[expert_id].item())
++++++++
++++++++            token_idx = sorted_token_indices[start:end]
++++++++            expert_tokens = hidden_states[token_idx]
++++++++
++++++++            expert_out = self.experts[expert_id](expert_tokens)
++++++++
++++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
++++++++
++++++++            moe_output = mindspore.mint.scatter_add(
++++++++                moe_output,
++++++++                0,
++++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
++++++++                scaled_out.to(hidden_states.dtype)
++++++++            )
++++++++
+++++++         return moe_output
+++++++ 
++++++++
+++++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+++++++     @no_grad()
+++++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+++++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++         
+++++++         moe_output = None
+++++++-        if Long_Prompt:
+++++++-            # --- 精度优先模式 (ACCURACY MODE) ---
+++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        # if Long_Prompt==0:
++++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
++++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        # else:
++++++++        #     # --- 速度优先模式 (SPEED MODE) ---
++++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++        #     if sequence_length == 1:
++++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        #     else:
++++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++        
++++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
++++++++        if sequence_length == 1:
++++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++         else:
+++++++-            # --- 速度优先模式 (SPEED MODE) ---
+++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+++++++-            if sequence_length == 1:
+++++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++-            else:
+++++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+++++++-        
++++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
++++++++    
+++++++ 
+++++++         # 3. 共享专家计算与合并 (所有模式通用)
+++++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+++++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++         
+++++++         return final_hidden_states, router_logits
+++++++ 
++++++++
+++++++ class Qwen2MoeDecoderLayer(nn.Module):
+++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+++++++         super().__init__()
+++++++         self.hidden_size = config.hidden_size
+++++++         
+++++++-        # if Long_Prompt:
+++++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++-        # else:
++++++++        # if Long_Prompt == 2:
+++++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
++++++++        # else:
++++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++ 
+++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++ 
+++++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++++             )
+++++++ 
+++++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+++++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++++        #     attention_mask,
++++++++        #     sequence_length=sequence_length,
++++++++        #     target_length=target_length,
++++++++        #     dtype=dtype,
++++++++        #     min_dtype=min_dtype,
++++++++        #     cache_position=cache_position,
++++++++        #     batch_size=input_tensor.shape[0],
++++++++        # )
++++++++        #@dwj
++++++++        causal_mask = get_cached_causal_mask_with_cache_position(
+++++++             attention_mask,
+++++++             sequence_length=sequence_length,
+++++++             target_length=target_length,
+++++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+++++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+++++++         """
+++++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
++++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
++++++++        _causal_mask_cache.clear()
+++++++ 
+++++++         input_ids = kwargs.get("input_ids")
+++++++         if input_ids is None and args:
+++++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++ 
+++++++         if input_ids is not None:
+++++++             prompt_length = input_ids.shape[1]
+++++++-            
+++++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+++++++-                Long_Prompt = True
++++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
++++++++                Long_Prompt = 2
++++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
++++++++                Long_Prompt = 0
+++++++             else:
+++++++-                Long_Prompt = False
++++++++                Long_Prompt = 1
++++++++
+++++++ 
+++++++         return super().generate(*args, **kwargs)
+++++++     
+++++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++             dtype = self.lm_head.weight.dtype
+++++++             min_dtype = float(ops.finfo(dtype).min)
+++++++ 
+++++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
++++++++            #     attention_mask,
++++++++            #     sequence_length=sequence_length,
++++++++            #     target_length=past_key_values.get_max_length(),
++++++++            #     dtype=dtype,
++++++++            #     min_dtype=min_dtype,
++++++++            #     cache_position=cache_position,
++++++++            #     batch_size=batch_size,
++++++++            # )
++++++++
++++++++            #@dwj
++++++++            attention_mask = get_cached_causal_mask_with_cache_position(
+++++++                 attention_mask,
+++++++                 sequence_length=sequence_length,
+++++++                 target_length=past_key_values.get_max_length(),
+++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+++++++deleted file mode 100644
+++++++index 6dfb5b93..00000000
+++++++--- a/patches/0001-20251104commit.patch
++++++++++ /dev/null
+++++++@@ -1,1272 +0,0 @@
+++++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+++++++-From: Pinoeer-kingxi <13022943007@163.com>
+++++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+++++++-Subject: [PATCH] 20251104commit
+++++++-
+++++++----
+++++++- mindnlp/transformers/cache_utils.py           |  28 +-
+++++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+++++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+++++++- 3 files changed, 976 insertions(+), 87 deletions(-)
+++++++-
+++++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+++++++-index cadd2e04..02f8d4be 100644
+++++++---- a/mindnlp/transformers/cache_utils.py
+++++++-+++ b/mindnlp/transformers/cache_utils.py
+++++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+++++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+++++++-             # k_out[:, :, cache_position] = key_states
+++++++-             # v_out[:, :, cache_position] = value_states
+++++++--            if ON_ORANGE_PI:
+++++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++--            else:
+++++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++--
+++++++-+            # if ON_ORANGE_PI:
+++++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+++++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+++++++-+            # else:
+++++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+++++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+++++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+++++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+++++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+++++++-+            if cache_position.ndim > 1:
+++++++-+                cache_position = cache_position.flatten()
+++++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+++++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+++++++-+                cache_position = cache_position.int()
+++++++-+            
+++++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+++++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+++++++-+            k_out[:, :, cache_position] = key_states
+++++++-+            v_out[:, :, cache_position] = value_states
+++++++-+            
+++++++-         return k_out, v_out
+++++++- 
+++++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+++++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++-index c695b944..d8303e45 100644
+++++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+++++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+++++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+++++++- def rotate_half(x):
+++++++-     """Rotates half the hidden dims of the input."""
+++++++--    x1 = x[..., : x.shape[-1] // 2]
+++++++--    x2 = x[..., x.shape[-1] // 2 :]
+++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++-+    # x1 = x[..., : x.shape[-1] // 2]
+++++++-+    # x2 = x[..., x.shape[-1] // 2 :]
+++++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++-     return ops.cat((-x2, x1), dim=-1)
+++++++- 
+++++++- 
+++++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+++++++-         if self.training:
+++++++-             raise NotImplementedError("Training is not supported yet.")
+++++++-         else:
+++++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++--        if self.config.n_shared_experts is not None:
+++++++--            y = y + self.shared_experts(identity)
+++++++--        return y
+++++++-+            # @lwx
+++++++-+            if orig_shape[1] == 1:
+++++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+++++++-+                y=y.view(*orig_shape)
+++++++-+                if self.config.n_shared_experts is not None:
+++++++-+                    y = y + self.shared_experts(identity)
+++++++-+                return y
+++++++-+            else:
+++++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+++++++-+                if self.config.n_shared_experts is not None:
+++++++-+                    y = y + self.shared_experts(identity)
+++++++-+                return y
+++++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+++++++-+        # if self.config.n_shared_experts is not None:
+++++++-+        #     y = y + self.shared_experts(identity)
+++++++-+        # return y
+++++++-+        
+++++++-+    @no_grad()
+++++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+++++++-+
+++++++-+        expert_cache = ops.zeros_like(x)
+++++++-+        for i in range(self.num_experts_per_tok):
+++++++-+            expert_id = flat_expert_indices[i].item()
+++++++-+            weight = flat_expert_weights[i].item()
+++++++-+            expert = self.experts[expert_id]
+++++++-+            expert_out = expert(x)
+++++++-+            expert_cache += expert_out * weight
+++++++-+        return expert_cache
+++++++- 
+++++++-     @no_grad()
+++++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++--        # expert_cache = torch.zeros_like(x)
+++++++--        # idxs = flat_expert_indices.argsort()
+++++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++--        # token_idxs = idxs // self.num_experts_per_tok
+++++++--        # for i, end_idx in enumerate(tokens_per_expert):
+++++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++--        #     if start_idx == end_idx:
+++++++--        #         continue
+++++++--        #     expert = self.experts[i]
+++++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++--        #     expert_tokens = x[exp_token_idx]
+++++++--        #     expert_out = expert(expert_tokens)
+++++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++--        # return expert_cache
+++++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+++++++-         expert_cache = ops.zeros_like(x)
+++++++-         idxs = flat_expert_indices.argsort()
+++++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++-         token_idxs = idxs // self.num_experts_per_tok
+++++++-+
+++++++-         for i, end_idx in enumerate(tokens_per_expert):
+++++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++-             if start_idx == end_idx:
+++++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+++++++-             expert_out = expert(expert_tokens)
+++++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++-+
+++++++-         return expert_cache
+++++++-+        
+++++++-+    # @no_grad()
+++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-+    #     # expert_cache = torch.zeros_like(x)
+++++++-+    #     # idxs = flat_expert_indices.argsort()
+++++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+++++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+++++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+++++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+++++++-+    #     #     if start_idx == end_idx:
+++++++-+    #     #         continue
+++++++-+    #     #     expert = self.experts[i]
+++++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-+    #     #     expert_tokens = x[exp_token_idx]
+++++++-+    #     #     expert_out = expert(expert_tokens)
+++++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+++++++-+    #     # return expert_cache
+++++++-+    #     expert_cache = ops.zeros_like(x)
+++++++-+    #     idxs = flat_expert_indices.argsort()
+++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++++++-+
+++++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++-+    #         if start_idx == end_idx:
+++++++-+    #             continue
+++++++-+    #         expert = self.experts[i]
+++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-+    #         expert_tokens = x[exp_token_idx]
+++++++-+    #         expert_out = expert(expert_tokens)
+++++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+++++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+++++++-+
+++++++-+    #     return expert_cache
+++++++-+    # @no_grad()
+++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+++++++-+    #     expert_cache = ops.zeros_like(x)
+++++++-+
+++++++-+    #     # 排序保证顺序一致
+++++++-+    #     idxs = flat_expert_indices.argsort()
+++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+++++++-+
+++++++-+    #     # 找出有 token 的专家
+++++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+++++++-+
+++++++-+    #     for i in active_experts.tolist():
+++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+++++++-+    #         end_idx = tokens_per_expert[i]
+++++++-+    #         if start_idx == end_idx:  # 没有 token
+++++++-+    #             continue
+++++++-+
+++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+++++++-+    #         expert_tokens = x[exp_token_idx]
+++++++-+    #         expert_out = self.experts[i](expert_tokens)
+++++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+++++++-+
+++++++-+    #         expert_cache = mindspore.mint.scatter_add(
+++++++-+    #             expert_cache,
+++++++-+    #             0,
+++++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+++++++-+    #             expert_out
+++++++-+    #         )
+++++++-+
+++++++-+    #     return expert_cache
+++++++-+
+++++++-+
+++++++- 
+++++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+++++++- #     """
+++++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++- 
+++++++-         # Initialize weights and apply final processing
+++++++-         self.post_init()
+++++++-+        self.warm_up = False
+++++++-+
+++++++-+    def warmup_moe_model_deep(self):
+++++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+++++++-+        test_texts = [
+++++++-+            "warmup short",
+++++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+++++++-+        ]
+++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++-+        if tokenizer is None:
+++++++-+            from mindnlp.transformers import AutoTokenizer
+++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++-+            self._warmup_tokenizer = tokenizer
+++++++-+
+++++++-+        for text in test_texts:
+++++++-+            inputs = tokenizer(text, return_tensors="ms")
+++++++-+            with mindspore._no_grad():
+++++++-+                _ = self(**inputs, use_cache=False)
+++++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+++++++- 
+++++++-     def get_input_embeddings(self):
+++++++-         return self.model.embed_tokens
+++++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++-         ```"""
+++++++-+        if not self.warm_up:
+++++++-+            self.warm_up = True
+++++++-+            self.warmup_moe_model_deep()
+++++++-+
+++++++-         output_attentions = (
+++++++-             output_attentions
+++++++-             if output_attentions is not None
+++++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++-index 3cbf820e..d4c6b651 100644
+++++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+++++++-@@ -18,7 +18,6 @@
+++++++- # See the License for the specific language governing permissions and
+++++++- # limitations under the License.
+++++++- """MindSpore Qwen2MoE model."""
+++++++--
+++++++- import math
+++++++- from typing import List, Optional, Tuple, Union
+++++++- 
+++++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+++++++-     TokenClassifierOutput,
+++++++- )
+++++++- from ...modeling_utils import PreTrainedModel
+++++++-+from ...generation import GenerationMixin
+++++++- from ....utils import logging
+++++++- from .configuration_qwen2_moe import Qwen2MoeConfig
+++++++- 
+++++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+++++++-         self.variance_epsilon = eps
+++++++- 
+++++++-     def forward(self, hidden_states):
+++++++-+        # @dwj
+++++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++-+        # @lwx
+++++++-+        # if not self.training :
+++++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+++++++-         input_dtype = hidden_states.dtype
+++++++-         hidden_states = hidden_states.to(mindspore.float32)
+++++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+++++++-@@ -234,6 +239,8 @@ def rotate_half(x):
+++++++-     """Rotates half the hidden dims of the input."""
+++++++-     x1 = x[..., : x.shape[-1] // 2]
+++++++-     x2 = x[..., x.shape[-1] // 2 :]
+++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+++++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+++++++-     return ops.cat((-x2, x1), dim=-1)
+++++++- 
+++++++- 
+++++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+++++++-         self.config = config
+++++++-         self.hidden_size = config.hidden_size
+++++++-         self.intermediate_size = intermediate_size
+++++++-+        
+++++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+++++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+++++++-         self.act_fn = ACT2FN[config.hidden_act]
+++++++- 
+++++++-     def forward(self, x):
+++++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++--
+++++++- 
+++++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+++++++-+        # @lwx 
+++++++-+        # gate_up_output = self.gate_up_proj(x)
+++++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+++++++-+        # return self.down_proj(swiglu_output)
+++++++-+
+++++++-+    # def forward(self, x):
+++++++-+    #     gate_proj_out = self.gate_proj(x)
+++++++-+    #     up_proj_out = self.up_proj(x)
+++++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+++++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+++++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+++++++-+    #     return self.down_proj(swiglu_out)
+++++++-+        
+++++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+++++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+++++++-     """
+++++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+++++++-         use_cache: bool = False,
+++++++-         cache_position: Optional[mindspore.Tensor] = None,
+++++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-+
+++++++-+        
+++++++-+
+++++++-         bsz, q_len, _ = hidden_states.shape
+++++++- 
+++++++-         query_states = self.q_proj(hidden_states)
+++++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+++++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-                     "with a layer index."
+++++++-                 )
+++++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-+            if isinstance(past_key_value, StaticCache):
+++++++-+                kv_seq_len = key_states.shape[-2]
+++++++-+            else:
+++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++- 
+++++++-         if past_key_value is not None:
+++++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+++++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+++++++-+            
+++++++-+            if isinstance(past_key_value, StaticCache):
+++++++-+                kv_seq_len = key_states.shape[-2]
+++++++- 
+++++++-         # repeat k/v heads if n_kv_heads < n_heads
+++++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++--
+++++++-+        
+++++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+++++++- 
+++++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+++++++--            raise ValueError(
+++++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+++++++--                f" {attn_weights.shape}"
+++++++--            )
+++++++--
+++++++--        if attention_mask is not None:  # no matter the length, we just slice it
+++++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+++++++-+        if attention_mask is not None:
+++++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+++++++-             attn_weights = attn_weights + causal_mask
+++++++- 
+++++++-         # upcast attention to fp32
+++++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+++++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+++++++- 
+++++++-         attn_output = self.o_proj(attn_output)
+++++++--
+++++++-+        # @lwx
+++++++-+        
+++++++-+        # max_seq_len = self.max_position_embeddings  # 2048
+++++++-+
+++++++-+        # if attention_mask is not None:
+++++++-+        #     # attention_mask: [B, 1, Sq, Sk]
+++++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+++++++-+
+++++++-+        #     # pad 到 [max_seq_len, max_seq_len]
+++++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+++++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+++++++-+        #     global_attention_mask = padded_mask
+++++++-+        # else:
+++++++-+        #     global_attention_mask = None
+++++++-+
+++++++-+
+++++++-+        # sparse_mode=3
+++++++-+        # attn_output = mindspore.ops.flash_attention_score(
+++++++-+        #     query=query_states,
+++++++-+        #     key=key_states,
+++++++-+        #     value=value_states,
+++++++-+        #     real_shift=None,
+++++++-+        #     padding_mask=None,
+++++++-+
+++++++-+        #     head_num=self.num_heads,
+++++++-+        #     attn_mask=global_attention_mask,
+++++++-+        #     keep_prob=1.0 - self.attention_dropout,
+++++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++-+        #     input_layout="BNSD",
+++++++-+        #     pre_tokens=2147483647,
+++++++-+        #     next_tokens=2147483647,
+++++++-+        #     inner_precise=0,
+++++++-+        #     drop_mask=None,
+++++++-+        #     prefix=None,
+++++++-+        #     actual_seq_qlen=None,
+++++++-+        #     actual_seq_kvlen=None,
+++++++-+        #     sparse_mode=sparse_mode,
+++++++-+        # )
+++++++-         if not output_attentions:
+++++++-             attn_weights = None
+++++++- 
+++++++-         return attn_output, attn_weights, past_key_value
+++++++- 
+++++++- 
+++++++-+class Qwen2MoeFlashAttention(nn.Module):
+++++++-+    """
+++++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+++++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+++++++-+
+++++++-+    关键改动:
+++++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+++++++-+       直接传入原始的 key 和 value 张量效率更高。
+++++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+++++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+++++++-+    """
+++++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+++++++-+        super().__init__()
+++++++-+        self.config = config
+++++++-+        self.layer_idx = layer_idx
+++++++-+        self.hidden_size = config.hidden_size
+++++++-+        self.num_heads = config.num_attention_heads
+++++++-+        self.head_dim = self.hidden_size // self.num_heads
+++++++-+        self.num_key_value_heads = config.num_key_value_heads
+++++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+++++++-+        self.max_position_embeddings = config.max_position_embeddings
+++++++-+        self.rope_theta = config.rope_theta
+++++++-+        self.attention_dropout = config.attention_dropout
+++++++-+
+++++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+++++++-+            raise ValueError(
+++++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+++++++-+            )
+++++++-+
+++++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+++++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+++++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+++++++-+
+++++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+++++++-+            self.head_dim,
+++++++-+            max_position_embeddings=self.max_position_embeddings,
+++++++-+            base=self.rope_theta,
+++++++-+        )
+++++++-+
+++++++-+    def forward(
+++++++-+        self,
+++++++-+        hidden_states: mindspore.Tensor,
+++++++-+        attention_mask: Optional[mindspore.Tensor] = None,
+++++++-+        position_ids: Optional[mindspore.Tensor] = None,
+++++++-+        past_key_value: Optional[Cache] = None,
+++++++-+        output_attentions: bool = False,
+++++++-+        use_cache: bool = False,
+++++++-+        cache_position: Optional[mindspore.Tensor] = None,
+++++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-+
+++++++-+        bsz, q_len, _ = hidden_states.shape
+++++++-+
+++++++-+        # 1. 线性投射 Q, K, V
+++++++-+        query_states = self.q_proj(hidden_states)
+++++++-+        key_states = self.k_proj(hidden_states)
+++++++-+        value_states = self.v_proj(hidden_states)
+++++++-+
+++++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+++++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+++++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+
+++++++-+        # 3. RoPE 旋转位置编码
+++++++-+        kv_seq_len = key_states.shape[-2]
+++++++-+        if past_key_value is not None:
+++++++-+            if self.layer_idx is None:
+++++++-+                raise ValueError(
+++++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-+                    "with a layer index."
+++++++-+                )
+++++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+++++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+++++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+++++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+++++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+++++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+++++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+++++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+++++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+++++++-+                if cache_position.shape[0] == 1:
+++++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+++++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+++++++-+                    kv_seq_len = past_seen_tokens + 1
+++++++-+                else:
+++++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+++++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+++++++-+            else:
+++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-+
+++++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++-+
+++++++-+        # 4. KV 缓存更新
+++++++-+        if past_key_value is not None:
+++++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-+            key_states, value_states = past_key_value.update(
+++++++-+                key_states, value_states, self.layer_idx, cache_kwargs
+++++++-+            )
+++++++-+            
+++++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+++++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+++++++-+                if cache_position.shape[0] == 1:
+++++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+++++++-+                    kv_seq_len = key_states.shape[-2]
+++++++-+
+++++++-+        # 5. [重要] 准备 Attention Mask
+++++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+++++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+++++++-+        fa_attention_mask = None
+++++++-+        if attention_mask is not None:
+++++++-+            # 截取与当前key长度匹配的部分
+++++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+++++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+++++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+++++++-+            fa_attention_mask = (mask_slice != 0)
+++++++-+
+++++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+++++++-+        input_dtype = query_states.dtype
+++++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+++++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+++++++-+            query_states = query_states.to(mindspore.float16)
+++++++-+            key_states = key_states.to(mindspore.float16)
+++++++-+            value_states = value_states.to(mindspore.float16)
+++++++-+
+++++++-+        # 6. [核心] 调用 flash_attention_score 算子
+++++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+++++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+++++++-+        attn_output = mindspore.ops.flash_attention_score(
+++++++-+            query=query_states,
+++++++-+            key=key_states,
+++++++-+            value=value_states,
+++++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+++++++-+            attn_mask=fa_attention_mask,
+++++++-+            keep_prob=1.0 - self.attention_dropout,
+++++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++-+            input_layout="BNSD",
+++++++-+            sparse_mode=0 # 使用 defaultMask 模式
+++++++-+        )
+++++++-+
+++++++-+        # 恢复原始数据类型
+++++++-+        attn_output = attn_output.to(input_dtype)
+++++++-+
+++++++-+        # 7. 调整输出形状
+++++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+++++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++-+        attn_output = self.o_proj(attn_output)
+++++++-+
+++++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+++++++-+        attn_weights = None
+++++++-+        if output_attentions:
+++++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++-+
+++++++-+        return attn_output, attn_weights, past_key_value
+++++++-+
+++++++-+    # def forward(
+++++++-+    #     self,
+++++++-+    #     hidden_states: mindspore.Tensor,
+++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++-+    #     past_key_value: Optional[Cache] = None,
+++++++-+    #     output_attentions: bool = False,
+++++++-+    #     use_cache: bool = False,
+++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-+
+++++++-+    #     bsz, q_len, _ = hidden_states.shape
+++++++-+
+++++++-+    #     # 1. 线性投射 Q, K, V
+++++++-+    #     query_states = self.q_proj(hidden_states)
+++++++-+    #     key_states = self.k_proj(hidden_states)
+++++++-+    #     value_states = self.v_proj(hidden_states)
+++++++-+
+++++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+
+++++++-+    #     # 3. RoPE 旋转位置编码
+++++++-+    #     kv_seq_len = key_states.shape[-2]
+++++++-+    #     if past_key_value is not None:
+++++++-+    #         if self.layer_idx is None:
+++++++-+    #             raise ValueError(
+++++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+++++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+++++++-+    #                 "with a layer index."
+++++++-+    #             )
+++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-+
+++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++-+
+++++++-+    #     # 4. KV 缓存更新
+++++++-+    #     if past_key_value is not None:
+++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-+    #         key_states, value_states = past_key_value.update(
+++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++-+    #         )
+++++++-+
+++++++-+    #     # 5. 准备 Attention Mask
+++++++-+    #     fa_attention_mask = None
+++++++-+    #     if attention_mask is not None:
+++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-+    #         fa_attention_mask = (mask_slice != 0)
+++++++-+
+++++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+++++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+++++++-+    #     input_dtype = query_states.dtype
+++++++-+
+++++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++++++-+    #         query=query_states,
+++++++-+    #         key=key_states,
+++++++-+    #         value=value_states,
+++++++-+    #         head_num=self.num_heads,
+++++++-+    #         attn_mask=fa_attention_mask,
+++++++-+    #         keep_prob=1.0 - self.attention_dropout,
+++++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+++++++-+    #         input_layout="BNSD",
+++++++-+    #         sparse_mode=0,
+++++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+++++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+++++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+++++++-+    #         inner_precise=1
+++++++-+    #     )
+++++++-+
+++++++-+    #     # 恢复原始数据类型
+++++++-+    #     attn_output = attn_output.to(input_dtype)
+++++++-+
+++++++-+    #     # 7. 调整输出形状
+++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++-+    #     attn_output = self.o_proj(attn_output)
+++++++-+
+++++++-+    #     attn_weights = None
+++++++-+    #     if output_attentions:
+++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+++++++-+
+++++++-+    #     return attn_output, attn_weights, past_key_value
+++++++-+
+++++++-+    # def forward(
+++++++-+    #     self,
+++++++-+    #     hidden_states: mindspore.Tensor,
+++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+++++++-+    #     past_key_value: Optional[Cache] = None,
+++++++-+    #     output_attentions: bool = False,
+++++++-+    #     use_cache: bool = False,
+++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+++++++-+
+++++++-+    #     bsz, q_len, _ = hidden_states.shape
+++++++-+
+++++++-+    #     query_states = self.q_proj(hidden_states)
+++++++-+    #     key_states = self.k_proj(hidden_states)
+++++++-+    #     value_states = self.v_proj(hidden_states)
+++++++-+
+++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+++++++-+
+++++++-+    #     kv_seq_len = key_states.shape[-2]
+++++++-+    #     if past_key_value is not None:
+++++++-+    #         if self.layer_idx is None:
+++++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+++++++-+
+++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+++++++-+
+++++++-+    #     if past_key_value is not None:
+++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+++++++-+    #         key_states, value_states = past_key_value.update(
+++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+++++++-+    #         )
+++++++-+
+++++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+++++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+++++++-+
+++++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+++++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+++++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+++++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+++++++-+    #     # <--- 修改结束 ---
+++++++-+
+++++++-+    #     fa_attention_mask = None
+++++++-+    #     if attention_mask is not None:
+++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+++++++-+    #         fa_attention_mask = (mask_slice != 0)
+++++++-+
+++++++-+    #     input_dtype = query_states.dtype
+++++++-+
+++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+++++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+++++++-+    #         key=key_states,
+++++++-+    #         value=value_states,
+++++++-+    #         head_num=self.num_heads,
+++++++-+    #         attn_mask=fa_attention_mask,
+++++++-+    #         keep_prob=1.0 - self.attention_dropout,
+++++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+++++++-+    #         input_layout="BNSD",
+++++++-+    #         sparse_mode=0,
+++++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+++++++-+    #     )
+++++++-+
+++++++-+    #     attn_output = attn_output.to(input_dtype)
+++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+++++++-+    #     attn_output = self.o_proj(attn_output)
+++++++-+
+++++++-+    #     attn_weights = None
+++++++-+    #     if output_attentions:
+++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+++++++-+
+++++++-+    #     return attn_output, attn_weights, past_key_value
+++++++-+
+++++++- QWEN2MOE_ATTENTION_CLASSES = {
+++++++-     "eager": Qwen2MoeAttention,
+++++++-+    "flash-attention": Qwen2MoeFlashAttention,
+++++++- }
+++++++- 
+++++++- 
+++++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+++++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+++++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+++++++- 
+++++++-+    #@dwj
+++++++-+    # 只遍历激活的专家，而非全部专家
+++++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+++++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++--        hidden_states = hidden_states.view(-1, hidden_dim)
+++++++--        # router_logits: (batch * sequence_length, n_experts)
+++++++--        router_logits = self.gate(hidden_states)
+++++++--
+++++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++--        if self.norm_topk_prob:
+++++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++--        # we cast back to the input dtype
+++++++--        routing_weights = routing_weights.to(hidden_states.dtype)
+++++++--
+++++++--        final_hidden_states = ops.zeros(
+++++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+++++++--        )
+++++++--
+++++++--        # One hot encode the selected experts to create an expert mask
+++++++--        # this will be used to easily index which expert is going to be sollicitated
+++++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+++++++--
+++++++--        # Loop over all available experts in the model and perform the computation on each expert
+++++++--        for expert_idx in range(self.num_experts):
+++++++--            expert_layer = self.experts[expert_idx]
+++++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+++++++--
+++++++--            # Index the correct hidden states and compute the expert hidden state for
+++++++--            # the current expert. We need to make sure to multiply the output hidden
+++++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+++++++--            if 0 not in idx.shape:
+++++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+++++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+++++++--
+++++++--                # However `index_add_` only support torch tensors for indexing so we'll use
+++++++--                # the `top_x` tensor here.
+++++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+++++++--
+++++++--        shared_expert_output = self.shared_expert(hidden_states)
+++++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+++++++--
+++++++--        final_hidden_states = final_hidden_states + shared_expert_output
+++++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+++++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+++++++-+            num_tokens = hidden_states_reshaped.shape[0]
+++++++-+
+++++++-+            router_logits = self.gate(hidden_states_reshaped)
+++++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+++++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+++++++-+
+++++++-+            if self.norm_topk_prob:
+++++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+++++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+++++++-+            
+++++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+++++++-+            flat_selected_experts = selected_experts.flatten()
+++++++-+            
+++++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+++++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+++++++-+            token_indices = broadcasted_token_indices.flatten()
+++++++-+            
+++++++-+            active_experts = ops.unique(flat_selected_experts)
+++++++-+            
+++++++-+            for expert_idx_tensor in active_experts:
+++++++-+                expert_idx = expert_idx_tensor.item()
+++++++-+                expert_layer = self.experts[expert_idx]
+++++++-+                
+++++++-+                mask = (flat_selected_experts == expert_idx_tensor)
+++++++-+                selected_token_indices = token_indices[mask]
+++++++-+                selected_routing_weights = routing_weights.flatten()[mask]
+++++++-+                
+++++++-+                current_states = hidden_states_reshaped[selected_token_indices]
+++++++-+                
+++++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+++++++-+                
+++++++-+                final_hidden_states = final_hidden_states.index_add(
+++++++-+                    dim=0,
+++++++-+                    index=selected_token_indices,
+++++++-+                    source=expert_output.to(hidden_states.dtype)
+++++++-+                )
+++++++-+            
+++++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+++++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+++++++- 
+++++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++--        return final_hidden_states, router_logits
+++++++-+            final_hidden_states = final_hidden_states + shared_expert_output
+++++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+++++++-+            
+++++++-+            return final_hidden_states, router_logits
+++++++- 
+++++++- 
+++++++- class Qwen2MoeDecoderLayer(nn.Module):
+++++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+++++++- 
+++++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+++++++- 
+++++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+++++++-+
+++++++-         if (layer_idx not in config.mlp_only_layers) and (
+++++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+++++++-         ):
+++++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+++++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+++++++-     _skip_keys_device_placement = "past_key_values"
+++++++-     _supports_cache_class = True
+++++++-+#lwx
+++++++-+    # _supports_static_cache = True
+++++++- 
+++++++-     def _init_weights(self, module):
+++++++-         std = self.config.initializer_range
+++++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+++++++-         return causal_mask
+++++++- 
+++++++- 
+++++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+++++++-     _tied_weights_keys = ["lm_head.weight"]
+++++++- 
+++++++-     def __init__(self, config):
+++++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++-         self.num_experts_per_tok = config.num_experts_per_tok
+++++++-         # Initialize weights and apply final processing
+++++++-         self.post_init()
+++++++-+        # @lwx
+++++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+++++++-+        #     self.generation_config.cache_implementation = "static"
+++++++-+        self._warmed_up = False
+++++++-+
+++++++-+    def warmup_moe_model(self):
+++++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+++++++-+        test_texts = [
+++++++-+            "warmup short",
+++++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+++++++-+        ]
+++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+++++++-+        if tokenizer is None:
+++++++-+            from mindnlp.transformers import AutoTokenizer
+++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+++++++-+            self._warmup_tokenizer = tokenizer
+++++++-+
+++++++-+        for text in test_texts:
+++++++-+            inputs = tokenizer(text, return_tensors="ms")
+++++++-+            with mindspore._no_grad():
+++++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+++++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+++++++- 
+++++++-     def get_input_embeddings(self):
+++++++-         return self.model.embed_tokens
+++++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+++++++-         ```"""
+++++++-+        if not self._warmed_up:
+++++++-+            self._warmed_up = True
+++++++-+            self.warmup_moe_model()
+++++++- 
+++++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+++++++-         output_router_logits = (
+++++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+++++++-             }
+++++++-         )
+++++++-         return model_inputs
+++++++-+# @lwx
+++++++-+    # def _decode_one_tokens_logits(
+++++++-+    #     self,
+++++++-+    #     cur_token: mindspore.Tensor,
+++++++-+    #     input_pos: Optional[mindspore.Tensor],
+++++++-+    #     cache_position: mindspore.Tensor,
+++++++-+    #     past_key_values: StaticCache,
+++++++-+    # ) -> mindspore.Tensor:
+++++++-+    #     """
+++++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+++++++-+        
+++++++-+    #     Args:
+++++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+++++++-+    #         input_pos: 输入位置信息，可选
+++++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+++++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+++++++-+            
+++++++-+    #     Returns:
+++++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+++++++-+    #     """
+++++++-+    #     # 调用JIT编译的版本
+++++++-+    #     return self.get_decode_one_tokens_logits(
+++++++-+    #         cur_token=cur_token,
+++++++-+    #         input_pos=input_pos,
+++++++-+    #         cache_position=cache_position,
+++++++-+    #         past_key_values=past_key_values,
+++++++-+    #     )
+++++++-+    
+++++++-+    # @mindspore.jit(jit_level='O1')
+++++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+++++++-+    #     """
+++++++-+    #     JIT编译的函数，用于高效的单token解码
+++++++-+    #     使用JIT编译优化以支持静态shape和高效执行
+++++++-+        
+++++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+++++++-+    #     """
+++++++-+    #     outputs = self.model.forward(
+++++++-+    #         input_ids=cur_token,
+++++++-+    #         position_ids=input_pos,
+++++++-+    #         cache_position=cache_position,
+++++++-+    #         past_key_values=past_key_values,
+++++++-+    #         use_cache=True,
+++++++-+    #         return_dict=False,
+++++++-+    #     )
+++++++-+        
+++++++-+    #     hidden_states = outputs[0]
+++++++-+    #     logits = self.lm_head.forward(hidden_states)
+++++++-+    #     logits = logits.float()
+++++++-+        
+++++++-+    #     return logits[:, -1, :]
+++++++-+
+++++++-+    # def _sample(
+++++++-+    #     self,
+++++++-+    #     input_ids: mindspore.Tensor,
+++++++-+    #     logits_processor,
+++++++-+    #     stopping_criteria,
+++++++-+    #     generation_config,
+++++++-+    #     synced_devices: bool,
+++++++-+    #     streamer=None,
+++++++-+    #     logits_warper=None,
+++++++-+    #     **model_kwargs,
+++++++-+    # ):
+++++++-+    #     """
+++++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+++++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+++++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+++++++-+    #     """
+++++++-+    #     from ...generation.logits_process import LogitsProcessorList
+++++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+++++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+++++++-+    #     from mindnlp.core import nn, ops, no_grad
+++++++-+    #     import numpy as np
+++++++-+        
+++++++-+    #     # 检查是否使用 StaticCache
+++++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+++++++-+    #     # 否则，直接调用父类方法
+++++++-+    #     past_key_values = model_kwargs.get("past_key_values")
+++++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+++++++-+
+++++++-+    #     if not isinstance(past_key_values, StaticCache):
+++++++-+    #         # 不使用 StaticCache，直接调用父类方法
+++++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+++++++-+    #         return super()._sample(
+++++++-+    #             input_ids=input_ids,
+++++++-+    #             logits_processor=logits_processor,
+++++++-+    #             stopping_criteria=stopping_criteria,
+++++++-+    #             generation_config=generation_config,
+++++++-+    #             synced_devices=synced_devices,
+++++++-+    #             streamer=streamer,
+++++++-+    #             logits_warper=logits_warper,
+++++++-+    #             **model_kwargs,
+++++++-+    #         )
+++++++-+        
+++++++-+    #     # 使用 StaticCache，进入自定义循环
+++++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+++++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+++++++-+    #     pad_token_id = generation_config._pad_token_tensor
+++++++-+    #     output_attentions = generation_config.output_attentions
+++++++-+    #     output_hidden_states = generation_config.output_hidden_states
+++++++-+    #     output_scores = generation_config.output_scores
+++++++-+    #     output_logits = generation_config.output_logits
+++++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+++++++-+    #     max_length = generation_config.max_length
+++++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+++++++-+    #     do_sample = generation_config.do_sample
+++++++-+        
+++++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+++++++-+    #         raise ValueError(
+++++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+++++++-+    #             f"{logits_warper})."
+++++++-+    #         )
+++++++-+        
+++++++-+    #     # init attention / hidden states / scores tuples
+++++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+++++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+++++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+++++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+++++++-+        
+++++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+++++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+++++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+++++++-+    #         encoder_hidden_states = (
+++++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+++++++-+    #         )
+++++++-+        
+++++++-+    #     # keep track of which sequences are already finished
+++++++-+    #     batch_size, cur_len = input_ids.shape
+++++++-+    #     this_peer_finished = False
+++++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+++++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+++++++-+        
+++++++-+    #     time_record = []
+++++++-+    #     from ....utils.testing_utils import parse_flag_from_env
+++++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+++++++-+        
+++++++-+    #     while self._has_unfinished_sequences(
+++++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+++++++-+    #     ):
+++++++-+    #         if _record_time:
+++++++-+    #             import time as time_module
+++++++-+    #             infer_start = time_module.time()
+++++++-+            
+++++++-+    #         # prepare model inputs
+++++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+++++++-+            
+++++++-+    #         # prepare variable output controls
+++++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+++++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+++++++-+            
+++++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+++++++-+    #         cur_cache_position = model_inputs.get("cache_position")
+++++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+++++++-+    #         cur_input_ids = model_inputs.get("input_ids")
+++++++-+            
+++++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+++++++-+    #             cur_cache_position is not None and 
+++++++-+    #             len(cur_cache_position.shape) > 0 and
+++++++-+    #             cur_cache_position.shape[0] == 1 and
+++++++-+    #             cur_input_ids is not None and
+++++++-+    #             cur_input_ids.shape[1] == 1):
+++++++-+    #             # 使用 JIT 优化的单 token 解码
+++++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+++++++-+    #             if not hasattr(self, '_jit_used'):
+++++++-+    #                 self._jit_used = False
+++++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+++++++-+                
+++++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+++++++-+    #                 cur_token=cur_input_ids,
+++++++-+    #                 input_pos=model_inputs.get("position_ids"),
+++++++-+    #                 cache_position=cur_cache_position,
+++++++-+    #                 past_key_values=cur_past_key_values,
+++++++-+    #             )
+++++++-+                
+++++++-+    #             # 标记已使用JIT（用于后续判断）
+++++++-+    #             if not self._jit_used:
+++++++-+    #                 self._jit_used = True
+++++++-+                
+++++++-+    #             # 构造兼容的输出对象
+++++++-+    #             class JitOptimizedOutput:
+++++++-+    #                 def __init__(self, logits, config):
+++++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+++++++-+    #                     self.config = config
+++++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+++++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+++++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+++++++-+    #                     self.cross_attentions = None
+++++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+++++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+++++++-+                
+++++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+++++++-+    #         else:
+++++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+++++++-+    #             outputs = self(**model_inputs, return_dict=True)
+++++++-+            
+++++++-+    #         if synced_devices and this_peer_finished:
+++++++-+    #             continue
+++++++-+            
+++++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+++++++-+    #         next_token_logits = outputs.logits[:, -1, :]
+++++++-+            
+++++++-+    #         # pre-process distribution
+++++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+++++++-+    #         if do_sample:
+++++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+++++++-+            
+++++++-+    #         # Store scores, attentions and hidden_states when required
+++++++-+    #         if return_dict_in_generate:
+++++++-+    #             if output_scores:
+++++++-+    #                 scores += (next_token_scores,)
+++++++-+    #             if output_logits:
+++++++-+    #                 raw_logits += (next_token_logits,)
+++++++-+    #             if output_attentions:
+++++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+++++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+++++++-+    #                 if self.config.is_encoder_decoder:
+++++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+++++++-+                
+++++++-+    #             if output_hidden_states:
+++++++-+    #                 hidden = (
+++++++-+    #                     outputs.decoder_hidden_states
+++++++-+    #                     if self.config.is_encoder_decoder
+++++++-+    #                     else outputs.hidden_states
+++++++-+    #                 )
+++++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+++++++-+            
+++++++-+    #         # token selection
+++++++-+    #         if do_sample:
+++++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+++++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+++++++-+    #         else:
+++++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+++++++-+            
+++++++-+    #         # finished sentences should have their next token be a padding token
+++++++-+    #         if has_eos_stopping_criteria:
+++++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+++++++-+            
+++++++-+    #         # update generated ids, model inputs, and length for next step
+++++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+++++++-+    #         if streamer is not None:
+++++++-+    #             streamer.put(next_tokens)
+++++++-+            
+++++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+++++++-+    #             outputs,
+++++++-+    #             model_kwargs,
+++++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+++++++-+    #         )
+++++++-+            
+++++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+++++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+++++++-+    #         cur_len += 1
+++++++-+            
+++++++-+    #         if _record_time:
+++++++-+    #             import time as time_module
+++++++-+    #             infer_stop = time_module.time()
+++++++-+    #             time_record.append(infer_stop - infer_start)
+++++++-+            
+++++++-+    #         del outputs
+++++++-+        
+++++++-+    #     average_infer_time = None
+++++++-+    #     if time_record:
+++++++-+    #         if len(time_record) > 1:
+++++++-+    #             time_record.pop(0)
+++++++-+    #         average_infer_time = sum(time_record) / len(time_record)
+++++++-+    #         print(f'average inference time is: {average_infer_time}')
+++++++-+    #         print(f'inference time record: {time_record}')
+++++++-+        
+++++++-+    #     if streamer is not None:
+++++++-+    #         streamer.end()
+++++++-+        
+++++++-+    #     # 简单判断：打印是否使用了JIT路径
+++++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+++++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+++++++-+    #     else:
+++++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+++++++-+        
+++++++-+    #     if return_dict_in_generate:
+++++++-+    #         if self.config.is_encoder_decoder:
+++++++-+    #             return GenerateEncoderDecoderOutput(
+++++++-+    #                 sequences=input_ids,
+++++++-+    #                 scores=scores,
+++++++-+    #                 logits=raw_logits,
+++++++-+    #                 encoder_attentions=encoder_attentions,
+++++++-+    #                 encoder_hidden_states=encoder_hidden_states,
+++++++-+    #                 decoder_attentions=decoder_attentions,
+++++++-+    #                 cross_attentions=cross_attentions,
+++++++-+    #                 decoder_hidden_states=decoder_hidden_states,
+++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++-+    #                 average_infer_time=average_infer_time
+++++++-+    #             )
+++++++-+    #         else:
+++++++-+    #             return GenerateDecoderOnlyOutput(
+++++++-+    #                 sequences=input_ids,
+++++++-+    #                 scores=scores,
+++++++-+    #                 logits=raw_logits,
+++++++-+    #                 attentions=decoder_attentions,
+++++++-+    #                 hidden_states=decoder_hidden_states,
+++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+++++++-+    #                 average_infer_time=average_infer_time
+++++++-+    #             )
+++++++-+    #     else:
+++++++-+    #         return input_ids
+++++++-+            
+++++++-+    # def _prepare_cache_for_generation(
+++++++-+    #     self,
+++++++-+    #     generation_config,
+++++++-+    #     model_kwargs,
+++++++-+    #     assistant_model,
+++++++-+    #     batch_size,
+++++++-+    #     max_cache_length,
+++++++-+    # ):
+++++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+++++++-+    #         generation_config.cache_implementation = "static"
+++++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+++++++-+        
+++++++-+    #     if generation_config.cache_implementation == "static":
+++++++-+    #         base_required_from_max_length = generation_config.max_length + 1
+++++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+++++++-+    #         min_cache_size = 50
+++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+++++++-+    #         else:
+++++++-+    #             max_cache_length = max(base_required, min_cache_size)
+++++++-+            
+++++++-+    #         original_max_cache_length = max_cache_length
+++++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+++++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+++++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+++++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+++++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+++++++-+            
+++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+++++++-+    #             if max_cache_length > self.config.max_position_embeddings:
+++++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+++++++-+        
+++++++-+    #     result = super()._prepare_cache_for_generation(
+++++++-+    #         generation_config=generation_config,
+++++++-+    #         model_kwargs=model_kwargs,
+++++++-+    #         assistant_model=assistant_model,
+++++++-+    #         batch_size=batch_size,
+++++++-+    #         max_cache_length=max_cache_length,
+++++++-+    #     )
+++++++-+        
+++++++-+    #     if generation_config.cache_implementation == "static":
+++++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+++++++-+    #         created_cache = model_kwargs.get(cache_name)
+++++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+++++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+++++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+++++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+++++++-+        
+++++++-+    #     return result
+++++++-+
+++++++-+
+++++++-+
+++++++- 
+++++++- 
+++++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+++++++--- 
+++++++-2.27.0
+++++++-
+++++++-- 
+++++++2.27.0
+++++++
++++++-- 
++++++2.27.0
++++++
+++++-- 
+++++2.27.0
+++++
++++-- 
++++2.27.0
++++
+++-- 
+++2.27.0
+++
++-- 
++2.27.0
++
+-- 
+2.39.5 (Apple Git-154)
+
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"
new file mode 100644
index 00000000..a1832dc4
--- /dev/null
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"
@@ -0,0 +1,49453 @@
+From 5d88d879c9a97cf89b7f7a00df9534ba2df9e955 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?=E9=82=93=E4=BC=9F=E9=94=AE?= <emmmvkdeng@gmail.com>
+Date: Wed, 3 Dec 2025 16:13:15 +0800
+Subject: [PATCH 10/10] =?UTF-8?q?=E6=9C=80=E5=90=8E=E6=95=B4=E7=90=86?=
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+---
+ .../models/deepseek/modeling_deepseek.py      |  731 +-
+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1005 +-
+ patches/0001-20251104commit.patch             | 1272 ---
+ patches/0002-20251106commit.patch             | 3200 ------
+ patches/0003-20261106secondcommit.patch       | 2769 ------
+ patches/0004-20251106change.patch             | 7498 --------------
+ patches/0005-20251107001commit.patch          | 7707 ---------------
+ patches/0006-20251107002commit.patch          | 7931 ---------------
+ patches/0007-20251107003commit.patch          | 8034 ---------------
+ patches/0008-moe-change.patch                 | 8789 -----------------
+ 10 files changed, 29 insertions(+), 48907 deletions(-)
+ delete mode 100644 patches/0001-20251104commit.patch
+ delete mode 100644 patches/0002-20251106commit.patch
+ delete mode 100644 patches/0003-20261106secondcommit.patch
+ delete mode 100644 patches/0004-20251106change.patch
+ delete mode 100644 patches/0005-20251107001commit.patch
+ delete mode 100644 patches/0006-20251107002commit.patch
+ delete mode 100644 patches/0007-20251107003commit.patch
+ delete mode 100644 patches/0008-moe-change.patch
+
+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+index 8d004af1..8178fb05 100644
+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+@@ -234,9 +234,6 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+-    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-    # x1 = x[..., : x.shape[-1] // 2]
+-    # x2 = x[..., x.shape[-1] // 2 :]
+     x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+@@ -413,10 +410,7 @@ class DeepseekMoE(nn.Module):
+         if self.training:
+             raise NotImplementedError("Training is not supported yet.")
+         else:
+-            # @lwx
+             if orig_shape[1] == 1:
+-                # lwx moe_infer_decode_fast
+-                # y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+                 y=self.moe_infer_decode_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+                 y=y.view(*orig_shape)
+                 if self.config.n_shared_experts is not None:
+@@ -430,120 +424,7 @@ class DeepseekMoE(nn.Module):
+                 if self.config.n_shared_experts is not None:
+                     y = y + self.shared_experts(identity)
+                 return y
+-            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-        # if self.config.n_shared_experts is not None:
+-        #     y = y + self.shared_experts(identity)
+-        # return y
+-    
+-    
+-    
+-    # lwx
+-    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
+-    #     """
+-    #     如果 expert_ids 为 None，走单专家逻辑；
+-    #     如果有，多专家批量处理，保证和原逻辑一致。
+-    #     """
+-    #     if expert_ids is None:
+-    #         # 原单专家逻辑
+-    #         if self.config.pretraining_tp > 1:
+-    #             slice = self.intermediate_size // self.config.pretraining_tp
+-    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
+-    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
+-    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
+-    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
+-    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
+-    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
+-    #                                for i in range(self.config.pretraining_tp)], dim=-1)
+-    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
+-    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
+-    #                          for i in range(self.config.pretraining_tp)]
+-    #             down_proj = sum(down_proj)
+-    #         else:
+-    #             down_proj = self.down_proj(
+-    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
+-    #             )
+-    #         return down_proj
+-
+-    #     # ====== 批量多专家路径 ======
+-    #     hidden_size = x.shape[-1]
+-
+-    #     # 按 token expert_ids 选权重
+-    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
+-    #     up_weights   = self.up_proj.weight[expert_ids]
+-    #     down_weights = self.down_proj.weight[expert_ids]
+-
+-    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
+-    #     if self.config.pretraining_tp > 1:
+-    #         outputs = []
+-    #         slice = self.intermediate_size // self.config.pretraining_tp
+-    #         for i in range(self.config.pretraining_tp):
+-    #             # 每个 slice 单独计算
+-    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
+-    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
+-    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
+-    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
+-    #             outputs.append(down_proj_out)
+-    #         return sum(outputs)
+-    #     else:
+-    #         gate_proj_out = F.linear(x, gate_weights)
+-    #         up_proj_out   = F.linear(x, up_weights)
+-    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
+-    #         return F.linear(act_out, down_weights)
+-    # @no_grad()
+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-    #     num_tokens = x.shape[0]
+-    #     hidden_size = x.shape[-1]
+-
+-    #     idxs = flat_expert_indices.argsort()
+-    #     sorted_expert_indices = flat_expert_indices[idxs]
+-    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-    #     sorted_indices = sorted_token_indices
+-
+-    #     permuted_tokens = x[sorted_token_indices]
+-    #     sorted_weights  = flat_expert_weights[idxs]
+-
+-    #     # 一次调用多专家 forward
+-    #     expert_outputs = ops.zeros_like(permuted_tokens)
+-    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
+-
+-    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+-    #     try:
+-    #         final_output = ops.moe_token_unpermute(
+-    #             expert_outputs,
+-    #             sorted_indices,
+-    #             probs=probs,
+-    #             padded_mode=False
+-    #         )
+-    #     except Exception:
+-    #         final_output = ops.zeros_like(x)
+-    #         final_output = mindspore.mint.scatter_add(
+-    #             final_output,
+-    #             0,
+-    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-    #             expert_outputs * sorted_weights
+-    #         )
+-
+-    #     return final_output
+-
+-    # def mlp_batch_forward(self, tokens, expert_ids):
+-    #     """
+-    #     使用批量专家 forward（保留精度）
+-    #     """
+-    #     return self.experts[0].forward(tokens, expert_ids)
+-
+-    # @no_grad()
+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-
+-    #     expert_cache = ops.zeros_like(x)
+-    #     for i in range(self.num_experts_per_tok):
+-    #         expert_id = flat_expert_indices[i].item()
+-    #         weight = flat_expert_weights[i].item()
+-    #         expert = self.experts[expert_id]
+-    #         expert_out = expert(x)
+-    #         expert_cache += expert_out * weight
+-    #     return expert_cache
+-    
+-    #@dwj
++
+     @no_grad()
+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+ 
+@@ -561,35 +442,27 @@ class DeepseekMoE(nn.Module):
+         - 跳过无 token 的专家
+         - 保持结果完全一致
+         """
+-        # 初始化输出缓存
+         expert_cache = ops.zeros_like(x)
+ 
+-        # 排序（确保 scatter_add 位置对应原逻辑）
+         idxs = flat_expert_indices.argsort()
+         sorted_expert_indices = flat_expert_indices[idxs]
+         sorted_token_indices = idxs // self.num_experts_per_tok
+ 
+-        # 每个 expert 的 token 数
+         tokens_per_expert = sorted_expert_indices.bincount()
+ 
+-        # 找出有 token 的专家
+         active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+ 
+         for expert_id in active_experts.tolist():
+-            # 取该 expert 对应的排序后 token 区间
+             start = (tokens_per_expert[:expert_id]).sum().item()
+             end = start + tokens_per_expert[expert_id].item()
+ 
+-            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-            expert_tokens = x[token_idx]                     # 取输入向量
++            token_idx = sorted_token_indices[start:end]
++            expert_tokens = x[token_idx]
+ 
+-            # 执行专家 MLP
+             expert_out = self.experts[expert_id](expert_tokens)
+ 
+-            # 按权重缩放
+             scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+ 
+-            # 回写到缓存（等价 scatter_add）
+             expert_cache = mindspore.mint.scatter_add(
+                 expert_cache,
+                 0,
+@@ -599,60 +472,6 @@ class DeepseekMoE(nn.Module):
+ 
+         return expert_cache
+ 
+-
+-    # @no_grad()
+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-    #     """
+-    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
+-    #     """
+-    #     num_tokens = x.shape[0]
+-    #     hidden_size = x.shape[-1]
+-
+-    #     # 生成排序后的 token 索引
+-    #     idxs = flat_expert_indices.argsort()
+-    #     sorted_expert_indices = flat_expert_indices[idxs]
+-    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-
+-    #     # 记录到 sorted_indices（moe_token_unpermute 用）
+-    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
+-
+-    #     # 收集专家输入
+-    #     permuted_tokens = x[sorted_token_indices]
+-
+-    #     # 执行每个专家的 MLP（批量处理）
+-    #     expert_outputs = []
+-    #     token_ptr = 0
+-    #     tokens_per_expert = sorted_expert_indices.bincount()
+-    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-    #         if count == 0:
+-    #             continue
+-    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
+-    #         out = self.experts[expert_id](cur_tokens)
+-    #         expert_outputs.append(out)
+-    #         token_ptr += count
+-
+-    #     # 拼接所有专家输出
+-    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
+-
+-    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
+-    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
+-
+-    #     # 直接调用硬件加速的 unpermute
+-    #     final_output = ops.moe_token_unpermute(
+-    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
+-    #         sorted_indices,           # shape: [num_tokens * top_k]
+-    #         probs=probs,               # 按概率加权
+-    #         padded_mode=False
+-    #     )
+-
+-    #     return final_output
+-    # def init_expert_cache(self):
+-    #     """
+-    #     在模型初始化时调用，缓存所有专家的权重到显存。
+-    #     """
+-    #     self.cache_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
+-    #     self.cache_up_w   = ops.stack([expert.up_proj.weight for expert in self.experts], dim=0)
+-    #     self.cache_down_w = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
+     @no_grad()
+     def moe_infer_decode_fast(self, x, flat_expert_indices, flat_expert_weights):
+         top_k = flat_expert_indices.shape[0]
+@@ -684,43 +503,22 @@ class DeepseekMoE(nn.Module):
+         weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+         return weighted_sum
+ 
+-    # lwx prefill 20251108
+     @no_grad()
+     def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
+-        """
+-        高性能 + 数值一致的 MoE prefill 推理：
+-        1. 批量化处理所有专家计算，减少 Python 循环开销
+-        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
+-        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
+-        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
+-
+-        参数：
+-            x: [num_tokens, hidden_size]，
+-            MoE 输入的 token 表示
+-            flat_expert_indices: [num_tokens * top_k]，
+-            每个 token 的路由专家 id
+-            flat_expert_weights: [num_tokens * top_k, 1]，
+-            路由专家权重
+-        """
+         num_tokens = x.shape[0]
+         hidden_size = x.shape[-1]
+ 
+-        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
+-        idxs = flat_expert_indices.argsort()  # 排序索引
+-        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
+-        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
++        idxs = flat_expert_indices.argsort()
++        sorted_expert_indices = flat_expert_indices[idxs]
++        sorted_token_indices = idxs // self.num_experts_per_tok
+ 
+-        # sorted_indices 必须与 permuted_tokens 顺序匹配
+-        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
++        sorted_indices = sorted_token_indices
+ 
+-        # 2) 收集专家输入（按 idxs 排序）
+-        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
+-        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
++        permuted_tokens = x[sorted_token_indices]
++        sorted_weights  = flat_expert_weights[idxs]
+ 
+-        # 3) 计算每个专家的 token 数
+         tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+ 
+-        # 4) 批量专家计算（减少 Python 循环）
+         gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
+         up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
+         down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
+@@ -731,8 +529,7 @@ class DeepseekMoE(nn.Module):
+             if count == 0:
+                 continue
+             tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
+-            
+-            # 与 DeepseekMLP forward 等价
++
+             gate_proj_out = F.linear(tokens, gate_weights[expert_id])
+             up_proj_out   = F.linear(tokens, up_weights[expert_id])
+             act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
+@@ -741,7 +538,6 @@ class DeepseekMoE(nn.Module):
+             expert_outputs[ptr:ptr+count] = expert_out
+             ptr += count
+ 
+-        # 5) Ascend 加速的 unpermute（已排序的权重）
+         probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
+ 
+         final_output = ops.zeros_like(x)
+@@ -753,444 +549,6 @@ class DeepseekMoE(nn.Module):
+         )      
+         return final_output
+ 
+-        # try:
+-        #     final_output = ops.moe_token_unpermute(
+-        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
+-        #         sorted_indices,   # [num_tokens*top_k] 原 token id
+-        #         probs=probs,      # 对应权重
+-        #         padded_mode=False
+-        #     )
+-        # except Exception:
+-        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
+-        #     final_output = ops.zeros_like(x)
+-        #     final_output = mindspore.mint.scatter_add(
+-        #         final_output,
+-        #         0,
+-        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-        #         expert_outputs * sorted_weights
+-        #     )
+-
+-        # return final_output
+-
+-
+-    # @no_grad()
+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-    #     num_tokens = x.shape[0]
+-    #     hidden_size = x.shape[-1]
+-
+-    #     idxs = flat_expert_indices.argsort()
+-    #     sorted_expert_indices = flat_expert_indices[idxs]
+-    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-        
+-    #     # sorted_indices = sorted_token_indices
+-    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
+-    #     permuted_tokens = x[sorted_token_indices]
+-    #     sorted_weights = flat_expert_weights[idxs]
+-    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+-
+-    #     expert_outputs = ops.zeros_like(permuted_tokens)
+-    #     ptr = 0
+-
+-    #     # 只按专家维度循环
+-    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-    #         if count == 0:
+-    #             continue
+-    #         token_slice = slice(ptr, ptr + count)
+-    #         expert_tokens = permuted_tokens[token_slice]
+-
+-    #         # 保持原 forward（含 pretraining_tp、bias 等）
+-    #         expert_out = self.experts[expert_id](expert_tokens)
+-
+-    #         expert_outputs[token_slice] = expert_out
+-    #         ptr += count
+-
+-    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+-    #     try:
+-    #         final_output = mindspore.ops.moe_token_unpermute(
+-    #             expert_outputs,
+-    #             sorted_indices,
+-    #             probs=probs,
+-    #             padded_mode=False
+-    #         )
+-    #     except Exception:
+-    #         final_output = ops.zeros_like(x)
+-    #         final_output = mindspore.mint.scatter_add(
+-    #             final_output,
+-    #             0,
+-    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-    #             expert_outputs * sorted_weights
+-    #         )
+-
+-    #     return final_output
+-
+-
+-    #lwx
+-    # @no_grad()
+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-    #     """
+-    #     并行化 MoE prefill：
+-    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
+-    #     - 保证结果与原版完全一致
+-    #     """
+-    #     # 输出缓存
+-    #     expert_cache = ops.zeros_like(x)
+-
+-    #     # token 总数（批量*seq_len*num_experts_per_tok）
+-    #     num_tokens = flat_expert_indices.shape[0]
+-    #     hidden_dim = x.shape[-1]
+-
+-    #     # 原 token ID（idxs // num_experts_per_tok）
+-    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
+-
+-    #     # ====== Step 1: 组织输入 ======
+-    #     # 按 experts 排序，保证 scatter_add 对应位置一致
+-    #     sort_ids = flat_expert_indices.argsort()
+-    #     sorted_experts = flat_expert_indices[sort_ids]
+-    #     sorted_tokens = token_ids[sort_ids]
+-    #     sorted_weights = flat_expert_weights[sort_ids]
+-
+-    #     # 收集每个专家的输入
+-    #     # build: expert_inputs[expert_id] = [tokens...]
+-    #     expert_inputs = []
+-    #     expert_outs = []
+-
+-    #     for eid in range(self.config.n_routed_experts):
+-    #         eid_mask = (sorted_experts == eid)
+-    #         if eid_mask.any():
+-    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
+-    #             expert_inputs.append(tokens_for_eid)
+-    #         else:
+-    #             expert_inputs.append(None)
+-
+-    #     # ====== Step 2: 并行计算所有专家输出 ======
+-    #     # 存储所有专家结果到一个列表
+-    #     for eid in range(self.config.n_routed_experts):
+-    #         if expert_inputs[eid] is not None:
+-    #             out = self.experts[eid](expert_inputs[eid])
+-    #             expert_outs.append(out)
+-    #         else:
+-    #             expert_outs.append(None)
+-
+-    #     # ====== Step 3: scatter_add 回写结果 ======
+-    #     # 遍历专家，将结果加回对应的 token
+-    #     pos = 0
+-    #     for eid in range(self.config.n_routed_experts):
+-    #         if expert_outs[eid] is not None:
+-    #             size = expert_outs[eid].shape[0]
+-    #             tokens_idx = sorted_tokens[pos:pos+size]
+-    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
+-    #             pos += size
+-
+-    #             # scatter_add 到 expert_cache
+-    #             expert_cache = mindspore.mint.scatter_add(
+-    #                 expert_cache,
+-    #                 dim=0,
+-    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
+-    #                 src=scaled_out
+-    #             )
+-
+-    #     return expert_cache
+-
+-
+-
+-# 放置在 DeepseekMoE 类中
+-    # @no_grad()
+-    # #lwx 20251107
+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-    #     """
+-    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-
+-    #     Args:
+-    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+-    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-    #     """
+-    #     top_k, _ = flat_expert_weights.shape
+-    #     hidden_size = x.shape[-1]
+-
+-    #     # 1. 将所有专家的权重堆叠起来
+-    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-        
+-    #     # 2. "收集" 所需的专家权重
+-    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+-    #     selected_up_w = stacked_up_w[flat_expert_indices]
+-    #     selected_down_w = stacked_down_w[flat_expert_indices]
+-        
+-    #     # 3. 准备输入
+-    #     x_expanded = x.expand((top_k, 1, hidden_size))
+-        
+-    #     # 4. 并行计算 gate_proj 和 up_proj
+-    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-
+-    #     # 5. 计算中间状态
+-    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-        
+-    #     # 6. 并行计算 down_proj
+-    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-    #     #    --- [FIX] ---
+-    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-    #     #    --- [FIX END] ---
+-        
+-    #     # 7. 根据路由权重进行加权求和
+-    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-
+-    #     return weighted_sum
+-
+-
+-
+-        # @no_grad()
+-        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-        #     # expert_cache = torch.zeros_like(x)
+-        #     # idxs = flat_expert_indices.argsort()
+-        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-        #     # token_idxs = idxs // self.num_experts_per_tok
+-        #     # for i, end_idx in enumerate(tokens_per_expert):
+-        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-        #     #     if start_idx == end_idx:
+-        #     #         continue
+-        #     #     expert = self.experts[i]
+-        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-        #     #     expert_tokens = x[exp_token_idx]
+-        #     #     expert_out = expert(expert_tokens)
+-        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-        #     # return expert_cache
+-        #     expert_cache = ops.zeros_like(x)
+-        #     idxs = flat_expert_indices.argsort()
+-        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-        #     token_idxs = idxs // self.num_experts_per_tok
+-
+-        #     for i, end_idx in enumerate(tokens_per_expert):
+-        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-        #         if start_idx == end_idx:
+-        #             continue
+-        #         expert = self.experts[i]
+-        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-        #         expert_tokens = x[exp_token_idx]
+-        #         expert_out = expert(expert_tokens)
+-        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-
+-        #     return expert_cache
+-        # @no_grad()
+-        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-        #     expert_cache = ops.zeros_like(x)
+-
+-        #     # 排序保证顺序一致
+-        #     idxs = flat_expert_indices.argsort()
+-        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-        #     token_idxs = idxs // self.num_experts_per_tok
+-
+-        #     # 找出有 token 的专家
+-        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-
+-        #     for i in active_experts.tolist():
+-        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-        #         end_idx = tokens_per_expert[i]
+-        #         if start_idx == end_idx:  # 没有 token
+-        #             continue
+-
+-        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-        #         expert_tokens = x[exp_token_idx]
+-        #         expert_out = self.experts[i](expert_tokens)
+-        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-
+-        #         expert_cache = mindspore.mint.scatter_add(
+-        #             expert_cache,
+-        #             0,
+-        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-        #             expert_out
+-        #         )
+-
+-        #     return expert_cache
+-
+-
+-
+-# class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-#     """
+-#     The trick function of adding auxiliary (aux) loss, 
+-#     which includes the gradient of the aux loss during backpropagation.
+-#     """
+-#     @staticmethod
+-#     def forward(ctx, x, loss):
+-#         assert loss.numel() == 1
+-#         ctx.dtype = loss.dtype
+-#         ctx.required_aux_loss = loss.requires_grad
+-#         return x
+-
+-#     @staticmethod
+-#     def backward(ctx, grad_output):
+-#         grad_loss = None
+-#         if ctx.required_aux_loss:
+-#             grad_loss = ops.ones(1, dtype=ctx.dtype)
+-#         return grad_output, grad_loss
+-
+-
+-# class DeepseekMoE(nn.Module):
+-#     '''
+-#     A mixed expert module containing shared experts.
+-#     '''
+-#     def __init__(self, config):
+-#         super().__init__()
+-#         self.config = config
+-#         self.num_experts_per_tok = config.num_experts_per_tok
+-#         if hasattr(config, "ep_size") and config.ep_size > 1:
+-#             assert config.ep_size == mindspore.mint.distributed.get_world_size()
+-#             self.ep_size = config.ep_size
+-#             self.experts_per_rank = config.n_routed_experts // config.ep_size
+-#             self.ep_rank = mindspore.mint.distributed.get_rank()
+-#             self.experts = nn.ModuleList(
+-#                 [
+-#                     (
+-#                         DeepseekMLP(
+-#                             config, intermediate_size=config.moe_intermediate_size
+-#                         )
+-#                         if i >= self.ep_rank * self.experts_per_rank
+-#                            and i < (self.ep_rank + 1) * self.experts_per_rank
+-#                         else None
+-#                     )
+-#                     for i in range(config.n_routed_experts)
+-#                 ]
+-#             )
+-
+-#         else:
+-#             self.ep_size = 1
+-#             self.experts_per_rank = config.n_routed_experts
+-#             self.ep_rank = 0
+-#             self.experts = nn.ModuleList(
+-#                 [
+-#                     DeepseekMLP(
+-#                         config, intermediate_size=config.moe_intermediate_size
+-#                     )
+-#                     for i in range(config.n_routed_experts)
+-#                 ]
+-#             )
+-#         self.gate = MoEGate(config)
+-#         if config.n_shared_experts is not None:
+-#             intermediate_size = config.moe_intermediate_size * config.n_shared_experts
+-#             self.shared_experts = DeepseekMLP(
+-#                 config=config, intermediate_size=intermediate_size
+-#             )
+-
+-#     def forward(self, hidden_states):
+-#         identity = hidden_states
+-#         orig_shape = hidden_states.shape
+-#         topk_idx, topk_weight, aux_loss = self.gate(hidden_states)
+-#         hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
+-#         flat_topk_idx = topk_idx.view(-1)
+-#         if self.training:
+-#             hidden_states = hidden_states.repeat_interleave(
+-#             self.num_experts_per_tok, dim=0
+-#             )
+-#             y = ops.empty(hidden_states.shape)
+-#             for i, expert in enumerate(self.experts):
+-#                 y[flat_topk_idx == i] = expert(hidden_states[flat_topk_idx == i])
+-#             y = ops.sum(y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1), dim=1)
+-#             y = y.to(hidden_states.dtype).view(*orig_shape)
+-#             # y = AddAuxiliaryLoss.apply(y, aux_loss)
+-#         else:
+-#             # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-#             y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
+-#         if self.config.n_shared_experts is not None:
+-#             y = y + self.shared_experts(identity)
+-#         return y
+-    
+-#     # # @mindnlp.core.no_grad()
+-#     # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-#     #     expert_cache = ops.zeros_like(x)
+-#     #     idxs = flat_expert_indices.argsort()
+-#     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-#     #     token_idxs = idxs // self.num_experts_per_tok
+-#     #     for i, end_idx in enumerate(tokens_per_expert):
+-#     #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-#     #         if start_idx == end_idx:
+-#     #             continue
+-#     #         expert = self.experts[i]
+-#     #         exp_token_idx = token_idxs[start_idx:end_idx]
+-#     #         expert_tokens = x[exp_token_idx]
+-#     #         expert_out = expert(expert_tokens)
+-#     #         expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-#     #         # expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out, reduce='sum')
+-#     #     return expert_out # expert_cache
+-#     def moe_infer(self, x, topk_ids, topk_weight):
+-#         cnts = topk_ids.new_zeros((topk_ids.shape[0], len(self.experts)))
+-#         cnts.scatter_(1, topk_ids, 1)
+-#         tokens_per_expert = cnts.sum(dim=0)
+-#         idxs = topk_ids.view(-1).argsort()
+-#         sorted_tokens = x[idxs // topk_ids.shape[1]]
+-#         sorted_tokens_shape = sorted_tokens.shape
+-#         if self.ep_size > 1:
+-#             tokens_per_ep_rank = tokens_per_expert.view(self.ep_size, -1).sum(dim=1)
+-#             tokens_per_expert_group = tokens_per_expert.new_empty(
+-#                 tokens_per_expert.shape[0]
+-#             )
+-#             mindspore.mint.distributed.all_to_all_single(tokens_per_expert_group, tokens_per_expert)
+-#             output_splits = (
+-#                 tokens_per_expert_group.view(self.ep_size, -1)
+-#                 .sum(1)
+-#                 .cpu()
+-#                 .numpy()
+-#                 .tolist()
+-#             )
+-#             gathered_tokens = sorted_tokens.new_empty(
+-#                 tokens_per_expert_group.sum(dim=0).cpu().item(), sorted_tokens.shape[1]
+-#             )
+-#             input_split_sizes = tokens_per_ep_rank.cpu().numpy().tolist()
+-#             mindspore.mint.distributed.all_to_all(
+-#                 list(gathered_tokens.split(output_splits)),
+-#                 list(sorted_tokens.split(input_split_sizes)),
+-#             )
+-#             tokens_per_expert_post_gather = tokens_per_expert_group.view(
+-#                 self.ep_size, self.experts_per_rank
+-#             ).sum(dim=0)
+-#             gatherd_idxs = np.zeros(shape=(gathered_tokens.shape[0],), dtype=np.int32)
+-#             s = 0
+-#             for i, k in enumerate(tokens_per_expert_group.cpu().numpy()):
+-#                 gatherd_idxs[s : s + k] = i % self.experts_per_rank
+-#                 s += k
+-#             gatherd_idxs = gatherd_idxs.argsort()
+-#             sorted_tokens = gathered_tokens[gatherd_idxs]
+-#             tokens_per_expert = tokens_per_expert_post_gather
+-#         tokens_per_expert = tokens_per_expert.cpu().numpy()
+-#         outputs = []
+-#         start_idx = 0
+-#         for i, num_tokens in enumerate(tokens_per_expert):
+-#             end_idx = start_idx + num_tokens
+-#             if num_tokens == 0:
+-#                 continue
+-#             expert = self.experts[i + self.ep_rank * self.experts_per_rank]
+-#             tokens_for_this_expert = sorted_tokens[start_idx:end_idx]
+-#             expert_out = expert(tokens_for_this_expert)
+-#             outputs.append(expert_out)
+-#             start_idx = end_idx
+-
+-#         outs = ops.cat(outputs, dim=0) if len(outputs) else sorted_tokens.new_empty(0)
+-#         if self.ep_size > 1:
+-#             new_x = ops.empty_like(outs)
+-#             new_x[gatherd_idxs] = outs
+-#             gathered_tokens = new_x.new_empty(*sorted_tokens_shape)
+-#             mindspore.mint.distributed.all_to_all(
+-#                 list(gathered_tokens.split(input_split_sizes)),
+-#                 list(new_x.split(output_splits)),
+-#             )
+-#             outs = gathered_tokens
+-
+-#         new_x = ops.empty_like(outs)
+-#         new_x[idxs] = outs
+-#         final_out = (
+-#             new_x.view(*topk_ids.shape, -1)
+-#             .type(topk_weight.dtype)
+-#             .mul_(topk_weight.unsqueeze(dim=-1))
+-#             .sum(dim=1)
+-#             .type(new_x.dtype)
+-#         )
+-#         return final_out
+-
+-
+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+     """
+@@ -1313,10 +671,6 @@ class DeepseekAttention(nn.Module):
+             key_states = self.k_proj(hidden_states)
+             value_states = self.v_proj(hidden_states)
+ 
+-        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-        # @lwx
+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+         query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+@@ -1555,10 +909,6 @@ class DeepseekDecoderLayer(nn.Module):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+ 
+-        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-        #     config=config, layer_idx=layer_idx
+-        # )
+-
+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+             config=config, layer_idx=layer_idx
+         )
+@@ -1774,14 +1124,6 @@ class DeepseekModel(DeepseekPreTrainedModel):
+                 else None
+             )
+         else:
+-            # 4d mask is passed through the layers
+-            # attention_mask = _prepare_4d_causal_attention_mask(
+-            #     attention_mask,
+-            #     (batch_size, seq_length),
+-            #     inputs_embeds,
+-            #     past_key_values_length,
+-            # )
+-            #@dwj
+             attention_mask = get_cached_causal_mask(
+                 attention_mask,
+                 (batch_size, seq_length),
+@@ -1869,38 +1211,14 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+         self.post_init()
+         # lwx
+         self.warm_up = False
+-    #初始
+-
+-    # def warmup_moe_model_deep(self):
+-    #     print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-    #     test_texts = [
+-    #         "warmup short",
+-    #         "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-    #         "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-    #     ]
+-    #     tokenizer = getattr(self, "_warmup_tokenizer", None)
+-    #     if tokenizer is None:
+-    #         from mindnlp.transformers import AutoTokenizer
+-    #         tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-    #         self._warmup_tokenizer = tokenizer
+-
+-    #     for text in test_texts:
+-    #         inputs = tokenizer(text, return_tensors="ms")
+-    #         with mindspore._no_grad():
+-    #             _ = self(**inputs, use_cache=False)
+-    #     print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-    
++
+     def warmup_moe_model_deep(self):
+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+ 
+-        # 直接用 eval.py 默认的 prompts 内容
+         warmup_prompts = [
+-            "Hello, how are you?",
+-            "This American studied art at Yale and is the author of multiple popular mystery novels. First name is 'Hillary'. What's the last name?",
+-            """Summarize the following text: US President Donald Trump has said he is 'not happy' with his Russian counterpart Vladimir Putin, following Moscow's largest aerial attack yet on Ukraine.
+-    In a rare rebuke, Trump said: "What the hell happened to him? He's killing a lot of people." He later called Putin "absolutely crazy".
+-    Ukrainian President Volodymyr Zelensky earlier said Washington's "silence" over recent Russian attacks was encouraging Putin, urging "strong pressure" - including tougher sanctions - on Moscow.
+-    """
++            "warmup short",
++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+         ]
+ 
+         tokenizer = getattr(self, "_warmup_tokenizer", None)
+@@ -1909,13 +1227,11 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+             tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+             self._warmup_tokenizer = tokenizer
+ 
+-        # 跑一遍 warmup_prompts，触发路由逻辑
+         for text in warmup_prompts:
+             inputs = tokenizer(text, return_tensors="ms")
+             with mindspore._no_grad():
+                 _ = self(**inputs, use_cache=False)
+ 
+-        # 这里可以加按需缓存逻辑，避免显存 OOM
+         from mindnlp.transformers.models.deepseek.modeling_deepseek import DeepseekMoE
+         for module in self.modules():
+             if isinstance(module, DeepseekMoE):
+@@ -2051,15 +1367,13 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+ 
+         loss = None
+         if labels is not None:
+-            # Shift so that tokens < n predict n
+             shift_logits = logits[..., :-1, :]
+             shift_labels = labels[..., 1:]
+-            # Flatten the tokens
++
+             loss_fct = nn.CrossEntropyLoss()
+             shift_logits = shift_logits.view(-1, self.config.vocab_size)
+             shift_labels = shift_labels.view(-1)
+-            # Enable model parallelism
+-            # shift_labels = shift_labels.to(shift_logits)
++
+             loss = loss_fct(shift_logits, shift_labels)
+ 
+         if not return_dict:
+@@ -2091,22 +1405,16 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+                 cache_length = past_length = past_key_values[0][0].shape[2]
+                 max_cache_length = None
+ 
+-            # Keep only the unprocessed tokens:
+-            # 1 - If the length of the attention_mask exceeds the length of input_ids, then we are in a setting where
+-            # some of the inputs are exclusivelly passed as part of the cache (e.g. when passing input_embeds as
+-            # input)
++
+             if (
+                 attention_mask is not None
+                 and attention_mask.shape[1] > input_ids.shape[1]
+             ):
+                 input_ids = input_ids[:, -(attention_mask.shape[1] - past_length) :]
+-            # 2 - If the past_length is smaller than input_ids', then input_ids holds all input tokens. We can discard
+-            # input_ids based on the past_length.
++
+             elif past_length < input_ids.shape[1]:
+                 input_ids = input_ids[:, past_length:]
+-            # 3 - Otherwise (past_length >= input_ids.shape[1]), let's assume input_ids only has unprocessed tokens.
+ 
+-            # If we are about to go beyond the maximum cache length, we need to crop the input attention mask.
+             if (
+                 max_cache_length is not None
+                 and attention_mask is not None
+@@ -2116,14 +1424,11 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+ 
+         position_ids = kwargs.get("position_ids", None)
+         if attention_mask is not None and position_ids is None:
+-            # create position_ids on the fly for batch generation
+             position_ids = attention_mask.to(mindspore.int32).cumsum(-1) - 1
+-            # position_ids.masked_fill_(attention_mask == 0, 1)
+             position_ids = ops.masked_fill(position_ids, attention_mask == 0, 1)
+             if past_key_values:
+                 position_ids = position_ids[:, -input_ids.shape[1] :]
+ 
+-        # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
+         if inputs_embeds is not None and past_key_values is None:
+             model_inputs = {"inputs_embeds": inputs_embeds}
+         else:
+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+index 6566958b..d689e36d 100644
+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+@@ -63,18 +63,14 @@ def get_cached_causal_mask_with_cache_position(
+     """
+     带缓存的 causal mask 构造函数
+     """
+-    # q_len 是当前 query 长度
+     q_len = sequence_length
+-    # kv_len 是 target_length
+     kv_len = target_length
+ 
+-    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+     key = (batch_size, q_len, kv_len, dtype, min_dtype)
+ 
+     if key in _causal_mask_cache:
+         return _causal_mask_cache[key]
+ 
+-    # 调用原来的 mask 构造逻辑
+     causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+         attention_mask,
+         sequence_length=sequence_length,
+@@ -84,7 +80,6 @@ def get_cached_causal_mask_with_cache_position(
+         cache_position=cache_position,
+         batch_size=batch_size,
+     )
+-    # 缓存结果
+     _causal_mask_cache[key] = causal_mask
+     return causal_mask
+ 
+@@ -224,11 +219,6 @@ class Qwen2MoeRMSNorm(nn.Module):
+         self.variance_epsilon = eps
+ 
+     def forward(self, hidden_states):
+-        # @dwj
+-        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-        # @lwx
+-        # if not self.training :
+-        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+         input_dtype = hidden_states.dtype
+         hidden_states = hidden_states.to(mindspore.float32)
+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+@@ -279,9 +269,6 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+ def rotate_half(x):
+     """Rotates half the hidden dims of the input."""
+-    # x1 = x[..., : x.shape[-1] // 2]
+-    # x2 = x[..., x.shape[-1] // 2 :]
+-    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+     x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+     return ops.cat((-x2, x1), dim=-1)
+ 
+@@ -329,21 +316,8 @@ class Qwen2MoeMLP(nn.Module):
+         self.act_fn = ACT2FN[config.hidden_act]
+ 
+     def forward(self, x):
+-
+         return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-        # @lwx 
+-        # gate_up_output = self.gate_up_proj(x)
+-        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-        # return self.down_proj(swiglu_output)
+-
+-    # def forward(self, x):
+-    #     gate_proj_out = self.gate_proj(x)
+-    #     up_proj_out = self.up_proj(x)
+-    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-    #     return self.down_proj(swiglu_out)
+-        
++
+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+     """
+@@ -356,164 +330,6 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+     hidden_states = hidden_states[:, :, None, :, :].broadcast_to((batch, num_key_value_heads, n_rep, slen, head_dim))
+     return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
+ 
+-
+-# Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-# class Qwen2MoeAttention(nn.Module):
+-#     """
+-#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-#     and "Generating Long Sequences with Sparse Transformers".
+-#     """
+-
+-#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-#         super().__init__()
+-#         self.config = config
+-#         self.layer_idx = layer_idx
+-#         if layer_idx is None:
+-#             logger.warning_once(
+-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-#                 "when creating this class."
+-#             )
+-
+-#         self.hidden_size = config.hidden_size
+-#         self.num_heads = config.num_attention_heads
+-#         self.head_dim = self.hidden_size // self.num_heads
+-#         self.num_key_value_heads = config.num_key_value_heads
+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-#         self.max_position_embeddings = config.max_position_embeddings
+-#         self.rope_theta = config.rope_theta
+-#         self.is_causal = True
+-#         self.attention_dropout = config.attention_dropout
+-
+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-#             raise ValueError(
+-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-#                 f" and `num_heads`: {self.num_heads})."
+-#             )
+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-
+-#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-#             self.head_dim,
+-#             max_position_embeddings=self.max_position_embeddings,
+-#             base=self.rope_theta,
+-#         )
+-
+-#     def forward(
+-#         self,
+-#         hidden_states: mindspore.Tensor,
+-#         attention_mask: Optional[mindspore.Tensor] = None,
+-#         position_ids: Optional[mindspore.Tensor] = None,
+-#         past_key_value: Optional[Cache] = None,
+-#         output_attentions: bool = False,
+-#         use_cache: bool = False,
+-#         cache_position: Optional[mindspore.Tensor] = None,
+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+-        
+-
+-#         bsz, q_len, _ = hidden_states.shape
+-
+-#         query_states = self.q_proj(hidden_states)
+-#         key_states = self.k_proj(hidden_states)
+-#         value_states = self.v_proj(hidden_states)
+-
+-#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-
+-#         kv_seq_len = key_states.shape[-2]
+-#         if past_key_value is not None:
+-#             if self.layer_idx is None:
+-#                 raise ValueError(
+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-#                     "with a layer index."
+-#                 )
+-#             if isinstance(past_key_value, StaticCache):
+-#                 kv_seq_len = key_states.shape[-2]
+-#             else:
+-#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-#         if past_key_value is not None:
+-#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-            
+-#             if isinstance(past_key_value, StaticCache):
+-#                 kv_seq_len = key_states.shape[-2]
+-
+-#         # repeat k/v heads if n_kv_heads < n_heads
+-#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-        
+-#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-
+-#         if attention_mask is not None:
+-#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-#             attn_weights = attn_weights + causal_mask
+-
+-#         # upcast attention to fp32
+-#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-#         attn_output = ops.matmul(attn_weights, value_states)
+-
+-#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-#             raise ValueError(
+-#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-#                 f" {attn_output.shape}"
+-#             )
+-
+-#         attn_output = ops.transpose(attn_output, 1, 2)
+-#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-
+-#         attn_output = self.o_proj(attn_output)
+-#         # @lwx
+-        
+-#         # max_seq_len = self.max_position_embeddings  # 2048
+-
+-#         # if attention_mask is not None:
+-#         #     # attention_mask: [B, 1, Sq, Sk]
+-#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-
+-#         #     # pad 到 [max_seq_len, max_seq_len]
+-#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-#         #     global_attention_mask = padded_mask
+-#         # else:
+-#         #     global_attention_mask = None
+-
+-
+-#         # sparse_mode=3
+-#         # attn_output = mindspore.ops.flash_attention_score(
+-#         #     query=query_states,
+-#         #     key=key_states,
+-#         #     value=value_states,
+-#         #     real_shift=None,
+-#         #     padding_mask=None,
+-
+-#         #     head_num=self.num_heads,
+-#         #     attn_mask=global_attention_mask,
+-#         #     keep_prob=1.0 - self.attention_dropout,
+-#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-#         #     input_layout="BNSD",
+-#         #     pre_tokens=2147483647,
+-#         #     next_tokens=2147483647,
+-#         #     inner_precise=0,
+-#         #     drop_mask=None,
+-#         #     prefix=None,
+-#         #     actual_seq_qlen=None,
+-#         #     actual_seq_kvlen=None,
+-#         #     sparse_mode=sparse_mode,
+-#         # )
+-#         if not output_attentions:
+-#             attn_weights = None
+-
+-#         return attn_output, attn_weights, past_key_value
+-
+ class Qwen2MoeAttention(nn.Module):
+     """
+     一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+@@ -594,10 +410,8 @@ class Qwen2MoeAttention(nn.Module):
+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+ 
+-        # --- 2. 动态调度核心注意力计算 ---
+         global Long_Prompt
+         if Long_Prompt >= 1:
+-            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+             fa_attention_mask = None
+             if attention_mask is not None:
+                 mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+@@ -613,7 +427,7 @@ class Qwen2MoeAttention(nn.Module):
+                 scalar_value=1.0 / math.sqrt(self.head_dim),
+                 input_layout="BNSD",
+                 sparse_mode=0,
+-                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
++                inner_precise=0
+             )
+             
+             attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+@@ -623,7 +437,6 @@ class Qwen2MoeAttention(nn.Module):
+                 logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+ 
+         else:
+-            # --- Eager Attention 路径 (用于短序列和解码) ---
+             key_states = repeat_kv(key_states, self.num_key_value_groups)
+             value_states = repeat_kv(value_states, self.num_key_value_groups)
+             
+@@ -651,252 +464,6 @@ class Qwen2MoeAttention(nn.Module):
+         
+         return attn_output, attn_weights, past_key_value
+ 
+-# class Qwen2MoeFlashAttention(nn.Module):
+-#     """
+-#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-
+-#     关键改动:
+-#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-#        直接传入原始的 key 和 value 张量效率更高。
+-#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-#     """
+-#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-#         super().__init__()
+-#         self.config = config
+-#         self.layer_idx = layer_idx
+-#         self.hidden_size = config.hidden_size
+-#         self.num_heads = config.num_attention_heads
+-#         self.head_dim = self.hidden_size // self.num_heads
+-#         self.num_key_value_heads = config.num_key_value_heads
+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-#         self.max_position_embeddings = config.max_position_embeddings
+-#         self.rope_theta = config.rope_theta
+-#         self.attention_dropout = config.attention_dropout
+-
+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-#             raise ValueError(
+-#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-#             )
+-
+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-
+-#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-#             self.head_dim,
+-#             max_position_embeddings=self.max_position_embeddings,
+-#             base=self.rope_theta,
+-#         )
+-
+-#     def forward(
+-#         self,
+-#         hidden_states: mindspore.Tensor,
+-#         attention_mask: Optional[mindspore.Tensor] = None,
+-#         position_ids: Optional[mindspore.Tensor] = None,
+-#         past_key_value: Optional[Cache] = None,
+-#         output_attentions: bool = False,
+-#         use_cache: bool = False,
+-#         cache_position: Optional[mindspore.Tensor] = None,
+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+-#         bsz, q_len, _ = hidden_states.shape
+-
+-#         # 1. 线性投射 Q, K, V
+-#         query_states = self.q_proj(hidden_states)
+-#         key_states = self.k_proj(hidden_states)
+-#         value_states = self.v_proj(hidden_states)
+-
+-#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-#         # 3. RoPE 旋转位置编码
+-#         kv_seq_len = key_states.shape[-2]
+-#         if past_key_value is not None:
+-#             if self.layer_idx is None:
+-#                 raise ValueError(
+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-#                     "with a layer index."
+-#                 )
+-#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-#                 if cache_position.shape[0] == 1:
+-#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-#                     kv_seq_len = past_seen_tokens + 1
+-#                 else:
+-#                     # prefill 阶段：cache_position 是范围，使用其长度
+-#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-#             else:
+-#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-
+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-#         # 4. KV 缓存更新
+-#         if past_key_value is not None:
+-#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-#             key_states, value_states = past_key_value.update(
+-#                 key_states, value_states, self.layer_idx, cache_kwargs
+-#             )
+-            
+-#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-#                 if cache_position.shape[0] == 1:
+-#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-#                     kv_seq_len = key_states.shape[-2]
+-
+-#         # 5. [重要] 准备 Attention Mask
+-#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-#         fa_attention_mask = None
+-#         if attention_mask is not None:
+-#             # 截取与当前key长度匹配的部分
+-#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-#             fa_attention_mask = (mask_slice != 0)
+-
+-#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-#         input_dtype = query_states.dtype
+-#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-#             query_states = query_states.to(mindspore.float16)
+-#             key_states = key_states.to(mindspore.float16)
+-#             value_states = value_states.to(mindspore.float16)
+-
+-#         # 6. [核心] 调用 flash_attention_score 算子
+-#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-#         attn_output = mindspore.ops.flash_attention_score(
+-#             query=query_states,
+-#             key=key_states,
+-#             value=value_states,
+-#             head_num=self.num_heads, # 传入Q的头数(N1)
+-#             attn_mask=fa_attention_mask,
+-#             keep_prob=1.0 - self.attention_dropout,
+-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-#             input_layout="BNSD",
+-#             sparse_mode=0 # 使用 defaultMask 模式
+-#         )
+-
+-#         # 恢复原始数据类型
+-#         attn_output = attn_output.to(input_dtype)
+-
+-#         # 7. 调整输出形状
+-#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-#         attn_output = self.o_proj(attn_output)
+-
+-#         # FlashAttention 算子不直接返回注意力权重矩阵
+-#         attn_weights = None
+-#         if output_attentions:
+-#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-
+-#         return attn_output, attn_weights, past_key_value
+-
+-#     # def forward(
+-#     #     self,
+-#     #     hidden_states: mindspore.Tensor,
+-#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-#     #     position_ids: Optional[mindspore.Tensor] = None,
+-#     #     past_key_value: Optional[Cache] = None,
+-#     #     output_attentions: bool = False,
+-#     #     use_cache: bool = False,
+-#     #     cache_position: Optional[mindspore.Tensor] = None,
+-#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-
+-#     #     bsz, q_len, _ = hidden_states.shape
+-
+-#     #     # 1. 线性投射 Q, K, V
+-#     #     query_states = self.q_proj(hidden_states)
+-#     #     key_states = self.k_proj(hidden_states)
+-#     #     value_states = self.v_proj(hidden_states)
+-
+-#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-
+-#     #     # 3. RoPE 旋转位置编码
+-#     #     kv_seq_len = key_states.shape[-2]
+-#     #     if past_key_value is not None:
+-#     #         if self.layer_idx is None:
+-#     #             raise ValueError(
+-#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-#     #                 "with a layer index."
+-#     #             )
+-#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-
+-#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-
+-#     #     # 4. KV 缓存更新
+-#     #     if past_key_value is not None:
+-#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-#     #         key_states, value_states = past_key_value.update(
+-#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-#     #         )
+-
+-#     #     # 5. 准备 Attention Mask
+-#     #     fa_attention_mask = None
+-#     #     if attention_mask is not None:
+-#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-#     #         fa_attention_mask = (mask_slice != 0)
+-
+-#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-#     #     input_dtype = query_states.dtype
+-
+-#     #     # 6. [核心] 调用 flash_attention_score 算子
+-#     #     attn_output = mindspore.ops.flash_attention_score(
+-#     #         query=query_states,
+-#     #         key=key_states,
+-#     #         value=value_states,
+-#     #         head_num=self.num_heads,
+-#     #         attn_mask=fa_attention_mask,
+-#     #         keep_prob=1.0 - self.attention_dropout,
+-#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-#     #         input_layout="BNSD",
+-#     #         sparse_mode=0,
+-#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-#     #         inner_precise=1
+-#     #     )
+-
+-#     #     # 恢复原始数据类型
+-#     #     attn_output = attn_output.to(input_dtype)
+-
+-#     #     # 7. 调整输出形状
+-#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-#     #     attn_output = self.o_proj(attn_output)
+-
+-#     #     attn_weights = None
+-#     #     if output_attentions:
+-#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-
+-#     #     return attn_output, attn_weights, past_key_value
+-
+ 
+ class Qwen2MoeFlashAttention(nn.Module):
+     """
+@@ -948,17 +515,14 @@ class Qwen2MoeFlashAttention(nn.Module):
+ 
+         bsz, q_len, _ = hidden_states.shape
+ 
+-        # 1. 线性投射 Q, K, V
+         query_states = self.q_proj(hidden_states)
+         key_states = self.k_proj(hidden_states)
+         value_states = self.v_proj(hidden_states)
+ 
+-        # 2. 调整形状以匹配 BNSD 布局
+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-        
+-        # 3. RoPE 和 KV 缓存
++
+         kv_seq_len = key_states.shape[-2]
+         if past_key_value is not None:
+             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+@@ -970,13 +534,11 @@ class Qwen2MoeFlashAttention(nn.Module):
+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+ 
+-        # 4. 准备 Attention Mask
+         fa_attention_mask = None
+         if attention_mask is not None:
+             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+             fa_attention_mask = (mask_slice != 0)
+ 
+-        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+         attn_output = mindspore.ops.flash_attention_score(
+             query=query_states,
+             key=key_states,
+@@ -987,14 +549,12 @@ class Qwen2MoeFlashAttention(nn.Module):
+             scalar_value=1.0 / math.sqrt(self.head_dim),
+             input_layout="BNSD",
+             sparse_mode=0,
+-            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
++            inner_precise=0
+         )
+ 
+-        # 6. 调整输出形状
+         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+         attn_output = self.o_proj(attn_output)
+ 
+-        # 7. 返回结果
+         attn_weights = None
+         if output_attentions:
+              logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+@@ -1007,88 +567,7 @@ QWEN2MOE_ATTENTION_CLASSES = {
+     "flash-attention": Qwen2MoeFlashAttention,
+ }
+ 
+-
+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-#     def __init__(self, config):
+-#         super().__init__()
+-#         self.num_experts = config.num_experts
+-#         self.top_k = config.num_experts_per_tok
+-#         self.norm_topk_prob = config.norm_topk_prob
+-
+-#         # gating
+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-#         self.experts = nn.ModuleList(
+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-#         )
+-
+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-
+-#     #@dwj
+-#     # 只遍历激活的专家，而非全部专家
+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-#             num_tokens = hidden_states_reshaped.shape[0]
+-
+-#             router_logits = self.gate(hidden_states_reshaped)
+-#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-
+-#             if self.norm_topk_prob:
+-#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-#             routing_weights = routing_weights.to(hidden_states.dtype)
+-            
+-#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-#             flat_selected_experts = selected_experts.flatten()
+-            
+-#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-#             token_indices = broadcasted_token_indices.flatten()
+-            
+-#             active_experts = ops.unique(flat_selected_experts)
+-            
+-#             for expert_idx_tensor in active_experts:
+-#                 expert_idx = expert_idx_tensor.item()
+-#                 expert_layer = self.experts[expert_idx]
+-                
+-#                 mask = (flat_selected_experts == expert_idx_tensor)
+-#                 selected_token_indices = token_indices[mask]
+-#                 selected_routing_weights = routing_weights.flatten()[mask]
+-                
+-#                 current_states = hidden_states_reshaped[selected_token_indices]
+-                
+-#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-                
+-#                 final_hidden_states = final_hidden_states.index_add(
+-#                     dim=0,
+-#                     index=selected_token_indices,
+-#                     source=expert_output.to(hidden_states.dtype)
+-#                 )
+-            
+-#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-
+-#             final_hidden_states = final_hidden_states + shared_expert_output
+-#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-            
+-#             return final_hidden_states, router_logits
+-
+-
+ class Qwen2MoeSparseMoeBlock(nn.Module):
+-    """
+-    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-    控制的顶级推理策略：
+-
+-    - if Long_Prompt is True:  【精度优先模式】
+-      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-      适用于需要严格可复现性的长序列任务。
+-
+-    - if Long_Prompt is False: 【速度优先模式】
+-      采用业界最强的性能组合：
+-      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-    """
+     def __init__(self, config: Qwen2MoeConfig):
+         super().__init__()
+         self.num_experts = config.num_experts
+@@ -1102,7 +581,6 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+ 
+-    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+     @no_grad()
+     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+         original_dtype = hidden_states.dtype
+@@ -1119,39 +597,8 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+         return moe_output_fp32.squeeze(1).to(original_dtype)
+ 
+-
+-    # @no_grad()
+-    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-    #     num_tokens, _ = hidden_states.shape
+-    #     flat_selected_experts = selected_experts.flatten()
+-    #     sorted_expert_indices = flat_selected_experts.argsort()
+-    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-    #     original_token_indices = sorted_expert_indices // self.top_k
+-    #     moe_output = ops.zeros_like(hidden_states)
+-    #     current_token_offset = 0
+-    #     for i in range(self.num_experts):
+-    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-    #         if expert_token_count == 0:
+-    #             continue
+-    #         end_offset = current_token_offset + expert_token_count
+-    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-    #         current_token_offset += expert_token_count
+-    #     return moe_output
+-
+-    # baseline
+     @no_grad()
+     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-        """
+-        优化版 MoE prefill (速度优先模式)：
+-        - 批量张量化处理同一个 expert 的所有 token
+-        - 跳过无 token 的专家
+-        - 保持结果完全一致
+-        """
+         moe_output = ops.zeros_like(hidden_states)
+ 
+         flat_selected_experts = selected_experts.flatten()
+@@ -1188,56 +635,39 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+ 
+     @no_grad()
+     def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-        """
+-        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
+-        逻辑：
+-        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
+-        2. 每个 expert 一次性处理其全部 token
+-        3. 最后一次 scatter_add 回到原 token 顺序
+-        """
+-
+         num_tokens = hidden_states.shape[0]
+         hidden_size = hidden_states.shape[-1]
+ 
+-        # 展平为一维
+-        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
+-        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
++        flat_selected_experts = selected_experts.flatten()
++        flat_routing_weights = routing_weights.flatten()
+ 
+-        # 按 expert 排序
+         idxs = flat_selected_experts.argsort()
+-        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
+-        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
++        sorted_expert_indices = flat_selected_experts[idxs]
++        sorted_token_indices = idxs // self.top_k
+ 
+-        # 排好序的输入向量（连续内存）
+         permuted_tokens = hidden_states[sorted_token_indices]
+ 
+-        # 排好序的权重
+         sorted_weights = flat_routing_weights[idxs]
+ 
+-        # 每个 expert 对应的 token 数量
+         tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+ 
+-        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
+         expert_outputs = ops.zeros_like(permuted_tokens)
+ 
+-        ptr = 0  # 指向当前切片的起点
++        ptr = 0
+         for expert_id, count in enumerate(tokens_per_expert.tolist()):
+             if count == 0:
+                 continue
+ 
+             token_slice = slice(ptr, ptr + count)
+-            expert_tokens = permuted_tokens[token_slice]  # 连续切片
++            expert_tokens = permuted_tokens[token_slice]
+ 
+-            # 执行专家 MLP
+             expert_out = self.experts[expert_id](expert_tokens)
+ 
+             expert_outputs[token_slice] = expert_out
+             ptr += count
+ 
+-        # 按权重缩放
+         scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
+ 
+-        # 回写到原 token 顺序 (单次 scatter_add)
+         moe_output = mindspore.mint.scatter_add(
+             ops.zeros_like(hidden_states),
+             0,
+@@ -1247,10 +677,6 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+ 
+         return moe_output
+ 
+-
+-    
+-    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-
+     @no_grad()
+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+         moe_output = ops.zeros_like(hidden_states)
+@@ -1282,31 +708,12 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+         if self.norm_topk_prob:
+             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+         
+-        moe_output = None
+-        # if Long_Prompt==0:
+-        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-        # else:
+-        #     # --- 速度优先模式 (SPEED MODE) ---
+-        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-        #     if sequence_length == 1:
+-        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-        #     else:
+-        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-        
+         routing_weights_casted = routing_weights.to(hidden_states.dtype)
+         if sequence_length == 1:
+             moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+         else:
+-            # if Long_Prompt == 1:
+-            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-            # else:
+-            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+             moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+ 
+-
+-        # 3. 共享专家计算与合并 (所有模式通用)
+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+         
+@@ -1320,11 +727,6 @@ class Qwen2MoeDecoderLayer(nn.Module):
+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+-        
+-        # if Long_Prompt == 2:
+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-        # else:
+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+ 
+@@ -1421,8 +823,6 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+     _skip_keys_device_placement = "past_key_values"
+     _supports_cache_class = True
+-#lwx
+-    # _supports_static_cache = True
+ 
+     def _init_weights(self, module):
+         std = self.config.initializer_range
+@@ -1576,7 +976,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+ 
+         hidden_states = self.norm(hidden_states)
+ 
+-        # add hidden states from the last decoder layer
+         if output_hidden_states:
+             all_hidden_states += (hidden_states,)
+ 
+@@ -1598,7 +997,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+             router_logits=all_router_logits,
+         )
+ 
+-    # Copied from transformers.models.llama.modeling_llama.LlamaModel._update_causal_mask
+     def _update_causal_mask(
+         self,
+         attention_mask: mindspore.Tensor,
+@@ -1626,17 +1024,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+                 else past_seen_tokens + sequence_length + 1
+             )
+ 
+-        # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-        #     attention_mask,
+-        #     sequence_length=sequence_length,
+-        #     target_length=target_length,
+-        #     dtype=dtype,
+-        #     min_dtype=min_dtype,
+-        #     cache_position=cache_position,
+-        #     batch_size=input_tensor.shape[0],
+-        # )
+-        #@dwj
+         causal_mask = get_cached_causal_mask_with_cache_position(
+             attention_mask,
+             sequence_length=sequence_length,
+@@ -1664,9 +1051,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+         self.num_experts_per_tok = config.num_experts_per_tok
+         # Initialize weights and apply final processing
+         self.post_init()
+-        # @lwx
+-        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-        #     self.generation_config.cache_implementation = "static"
++
+         self._warmed_up = False
+ 
+     def warmup_moe_model(self):
+@@ -1890,17 +1275,6 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+             dtype = self.lm_head.weight.dtype
+             min_dtype = float(ops.finfo(dtype).min)
+ 
+-            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-            #     attention_mask,
+-            #     sequence_length=sequence_length,
+-            #     target_length=past_key_values.get_max_length(),
+-            #     dtype=dtype,
+-            #     min_dtype=min_dtype,
+-            #     cache_position=cache_position,
+-            #     batch_size=batch_size,
+-            # )
+-
+-            #@dwj
+             attention_mask = get_cached_causal_mask_with_cache_position(
+                 attention_mask,
+                 sequence_length=sequence_length,
+@@ -1922,363 +1296,6 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+         )
+         return model_inputs
+ 
+-# @lwx
+-    # def _decode_one_tokens_logits(
+-    #     self,
+-    #     cur_token: mindspore.Tensor,
+-    #     input_pos: Optional[mindspore.Tensor],
+-    #     cache_position: mindspore.Tensor,
+-    #     past_key_values: StaticCache,
+-    # ) -> mindspore.Tensor:
+-    #     """
+-    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-        
+-    #     Args:
+-    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-    #         input_pos: 输入位置信息，可选
+-    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-            
+-    #     Returns:
+-    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-    #     """
+-    #     # 调用JIT编译的版本
+-    #     return self.get_decode_one_tokens_logits(
+-    #         cur_token=cur_token,
+-    #         input_pos=input_pos,
+-    #         cache_position=cache_position,
+-    #         past_key_values=past_key_values,
+-    #     )
+-    
+-    # @mindspore.jit(jit_level='O1')
+-    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-    #     """
+-    #     JIT编译的函数，用于高效的单token解码
+-    #     使用JIT编译优化以支持静态shape和高效执行
+-        
+-    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-    #     """
+-    #     outputs = self.model.forward(
+-    #         input_ids=cur_token,
+-    #         position_ids=input_pos,
+-    #         cache_position=cache_position,
+-    #         past_key_values=past_key_values,
+-    #         use_cache=True,
+-    #         return_dict=False,
+-    #     )
+-        
+-    #     hidden_states = outputs[0]
+-    #     logits = self.lm_head.forward(hidden_states)
+-    #     logits = logits.float()
+-        
+-    #     return logits[:, -1, :]
+-
+-    # def _sample(
+-    #     self,
+-    #     input_ids: mindspore.Tensor,
+-    #     logits_processor,
+-    #     stopping_criteria,
+-    #     generation_config,
+-    #     synced_devices: bool,
+-    #     streamer=None,
+-    #     logits_warper=None,
+-    #     **model_kwargs,
+-    # ):
+-    #     """
+-    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-    #     """
+-    #     from ...generation.logits_process import LogitsProcessorList
+-    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-    #     from mindnlp.core import nn, ops, no_grad
+-    #     import numpy as np
+-        
+-    #     # 检查是否使用 StaticCache
+-    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-    #     # 否则，直接调用父类方法
+-    #     past_key_values = model_kwargs.get("past_key_values")
+-    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-
+-    #     if not isinstance(past_key_values, StaticCache):
+-    #         # 不使用 StaticCache，直接调用父类方法
+-    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-    #         return super()._sample(
+-    #             input_ids=input_ids,
+-    #             logits_processor=logits_processor,
+-    #             stopping_criteria=stopping_criteria,
+-    #             generation_config=generation_config,
+-    #             synced_devices=synced_devices,
+-    #             streamer=streamer,
+-    #             logits_warper=logits_warper,
+-    #             **model_kwargs,
+-    #         )
+-        
+-    #     # 使用 StaticCache，进入自定义循环
+-    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-    #     pad_token_id = generation_config._pad_token_tensor
+-    #     output_attentions = generation_config.output_attentions
+-    #     output_hidden_states = generation_config.output_hidden_states
+-    #     output_scores = generation_config.output_scores
+-    #     output_logits = generation_config.output_logits
+-    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-    #     max_length = generation_config.max_length
+-    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-    #     do_sample = generation_config.do_sample
+-        
+-    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-    #         raise ValueError(
+-    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-    #             f"{logits_warper})."
+-    #         )
+-        
+-    #     # init attention / hidden states / scores tuples
+-    #     scores = () if (return_dict_in_generate and output_scores) else None
+-    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-        
+-    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-    #         encoder_hidden_states = (
+-    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-    #         )
+-        
+-    #     # keep track of which sequences are already finished
+-    #     batch_size, cur_len = input_ids.shape
+-    #     this_peer_finished = False
+-    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-        
+-    #     time_record = []
+-    #     from ....utils.testing_utils import parse_flag_from_env
+-    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-        
+-    #     while self._has_unfinished_sequences(
+-    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-    #     ):
+-    #         if _record_time:
+-    #             import time as time_module
+-    #             infer_start = time_module.time()
+-            
+-    #         # prepare model inputs
+-    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-            
+-    #         # prepare variable output controls
+-    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-            
+-    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-    #         cur_cache_position = model_inputs.get("cache_position")
+-    #         cur_past_key_values = model_inputs.get("past_key_values")
+-    #         cur_input_ids = model_inputs.get("input_ids")
+-            
+-    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-    #             cur_cache_position is not None and 
+-    #             len(cur_cache_position.shape) > 0 and
+-    #             cur_cache_position.shape[0] == 1 and
+-    #             cur_input_ids is not None and
+-    #             cur_input_ids.shape[1] == 1):
+-    #             # 使用 JIT 优化的单 token 解码
+-    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-    #             if not hasattr(self, '_jit_used'):
+-    #                 self._jit_used = False
+-    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-                
+-    #             next_token_logits = self.get_decode_one_tokens_logits(
+-    #                 cur_token=cur_input_ids,
+-    #                 input_pos=model_inputs.get("position_ids"),
+-    #                 cache_position=cur_cache_position,
+-    #                 past_key_values=cur_past_key_values,
+-    #             )
+-                
+-    #             # 标记已使用JIT（用于后续判断）
+-    #             if not self._jit_used:
+-    #                 self._jit_used = True
+-                
+-    #             # 构造兼容的输出对象
+-    #             class JitOptimizedOutput:
+-    #                 def __init__(self, logits, config):
+-    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-    #                     self.config = config
+-    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-    #                     self.attentions = None if not config.is_encoder_decoder else None
+-    #                     self.cross_attentions = None
+-    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-                
+-    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-    #         else:
+-    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-    #             outputs = self(**model_inputs, return_dict=True)
+-            
+-    #         if synced_devices and this_peer_finished:
+-    #             continue
+-            
+-    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-    #         next_token_logits = outputs.logits[:, -1, :]
+-            
+-    #         # pre-process distribution
+-    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-    #         if do_sample:
+-    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-            
+-    #         # Store scores, attentions and hidden_states when required
+-    #         if return_dict_in_generate:
+-    #             if output_scores:
+-    #                 scores += (next_token_scores,)
+-    #             if output_logits:
+-    #                 raw_logits += (next_token_logits,)
+-    #             if output_attentions:
+-    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-    #                 if self.config.is_encoder_decoder:
+-    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-                
+-    #             if output_hidden_states:
+-    #                 hidden = (
+-    #                     outputs.decoder_hidden_states
+-    #                     if self.config.is_encoder_decoder
+-    #                     else outputs.hidden_states
+-    #                 )
+-    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-            
+-    #         # token selection
+-    #         if do_sample:
+-    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-    #         else:
+-    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-            
+-    #         # finished sentences should have their next token be a padding token
+-    #         if has_eos_stopping_criteria:
+-    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-            
+-    #         # update generated ids, model inputs, and length for next step
+-    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-    #         if streamer is not None:
+-    #             streamer.put(next_tokens)
+-            
+-    #         model_kwargs = self._update_model_kwargs_for_generation(
+-    #             outputs,
+-    #             model_kwargs,
+-    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-    #         )
+-            
+-    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-    #         cur_len += 1
+-            
+-    #         if _record_time:
+-    #             import time as time_module
+-    #             infer_stop = time_module.time()
+-    #             time_record.append(infer_stop - infer_start)
+-            
+-    #         del outputs
+-        
+-    #     average_infer_time = None
+-    #     if time_record:
+-    #         if len(time_record) > 1:
+-    #             time_record.pop(0)
+-    #         average_infer_time = sum(time_record) / len(time_record)
+-    #         print(f'average inference time is: {average_infer_time}')
+-    #         print(f'inference time record: {time_record}')
+-        
+-    #     if streamer is not None:
+-    #         streamer.end()
+-        
+-    #     # 简单判断：打印是否使用了JIT路径
+-    #     if hasattr(self, '_jit_used') and self._jit_used:
+-    #         print("[JIT] ✓ JIT optimization was used during generation")
+-    #     else:
+-    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-        
+-    #     if return_dict_in_generate:
+-    #         if self.config.is_encoder_decoder:
+-    #             return GenerateEncoderDecoderOutput(
+-    #                 sequences=input_ids,
+-    #                 scores=scores,
+-    #                 logits=raw_logits,
+-    #                 encoder_attentions=encoder_attentions,
+-    #                 encoder_hidden_states=encoder_hidden_states,
+-    #                 decoder_attentions=decoder_attentions,
+-    #                 cross_attentions=cross_attentions,
+-    #                 decoder_hidden_states=decoder_hidden_states,
+-    #                 past_key_values=model_kwargs.get("past_key_values"),
+-    #                 average_infer_time=average_infer_time
+-    #             )
+-    #         else:
+-    #             return GenerateDecoderOnlyOutput(
+-    #                 sequences=input_ids,
+-    #                 scores=scores,
+-    #                 logits=raw_logits,
+-    #                 attentions=decoder_attentions,
+-    #                 hidden_states=decoder_hidden_states,
+-    #                 past_key_values=model_kwargs.get("past_key_values"),
+-    #                 average_infer_time=average_infer_time
+-    #             )
+-    #     else:
+-    #         return input_ids
+-            
+-    # def _prepare_cache_for_generation(
+-    #     self,
+-    #     generation_config,
+-    #     model_kwargs,
+-    #     assistant_model,
+-    #     batch_size,
+-    #     max_cache_length,
+-    # ):
+-    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-    #         generation_config.cache_implementation = "static"
+-    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-        
+-    #     if generation_config.cache_implementation == "static":
+-    #         base_required_from_max_length = generation_config.max_length + 1
+-    #         base_required = max(max_cache_length, base_required_from_max_length)
+-    #         min_cache_size = 50
+-    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-    #         else:
+-    #             max_cache_length = max(base_required, min_cache_size)
+-            
+-    #         original_max_cache_length = max_cache_length
+-    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-    #         print(f"  - final max_cache_length: {max_cache_length}")
+-            
+-    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-    #             if max_cache_length > self.config.max_position_embeddings:
+-    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-        
+-    #     result = super()._prepare_cache_for_generation(
+-    #         generation_config=generation_config,
+-    #         model_kwargs=model_kwargs,
+-    #         assistant_model=assistant_model,
+-    #         batch_size=batch_size,
+-    #         max_cache_length=max_cache_length,
+-    #     )
+-        
+-    #     if generation_config.cache_implementation == "static":
+-    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-    #         created_cache = model_kwargs.get(cache_name)
+-    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-    #             if created_cache.max_cache_len < generation_config.max_length:
+-    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-        
+-    #     return result
+-
+-
+-
+-
+-
+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+ class Qwen2MoeForSequenceClassification(Qwen2MoePreTrainedModel):
+     def __init__(self, config):
+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+deleted file mode 100644
+index 8de61195..00000000
+--- a/patches/0001-20251104commit.patch
++++ /dev/null
+@@ -1,1272 +0,0 @@
+-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-Subject: [PATCH 1/8] 20251104commit
+-
+----
+- mindnlp/transformers/cache_utils.py           |  28 +-
+- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+- 3 files changed, 976 insertions(+), 87 deletions(-)
+-
+-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-index cadd2e04..02f8d4be 100644
+---- a/mindnlp/transformers/cache_utils.py
+-+++ b/mindnlp/transformers/cache_utils.py
+-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-             # k_out[:, :, cache_position] = key_states
+-             # v_out[:, :, cache_position] = value_states
+--            if ON_ORANGE_PI:
+--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+--            else:
+--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+--
+-+            # if ON_ORANGE_PI:
+-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+            # else:
+-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+            if cache_position.ndim > 1:
+-+                cache_position = cache_position.flatten()
+-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+                cache_position = cache_position.int()
+-+            
+-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+            k_out[:, :, cache_position] = key_states
+-+            v_out[:, :, cache_position] = value_states
+-+            
+-         return k_out, v_out
+- 
+-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index c695b944..d8303e45 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+- # Copied from transformers.models.llama.modeling_llama.rotate_half
+- def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+--    x1 = x[..., : x.shape[-1] // 2]
+--    x2 = x[..., x.shape[-1] // 2 :]
+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+    # x1 = x[..., : x.shape[-1] // 2]
+-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-         if self.training:
+-             raise NotImplementedError("Training is not supported yet.")
+-         else:
+--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+--        if self.config.n_shared_experts is not None:
+--            y = y + self.shared_experts(identity)
+--        return y
+-+            # @lwx
+-+            if orig_shape[1] == 1:
+-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+                y=y.view(*orig_shape)
+-+                if self.config.n_shared_experts is not None:
+-+                    y = y + self.shared_experts(identity)
+-+                return y
+-+            else:
+-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+                if self.config.n_shared_experts is not None:
+-+                    y = y + self.shared_experts(identity)
+-+                return y
+-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+        # if self.config.n_shared_experts is not None:
+-+        #     y = y + self.shared_experts(identity)
+-+        # return y
+-+        
+-+    @no_grad()
+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+
+-+        expert_cache = ops.zeros_like(x)
+-+        for i in range(self.num_experts_per_tok):
+-+            expert_id = flat_expert_indices[i].item()
+-+            weight = flat_expert_weights[i].item()
+-+            expert = self.experts[expert_id]
+-+            expert_out = expert(x)
+-+            expert_cache += expert_out * weight
+-+        return expert_cache
+- 
+-     @no_grad()
+--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--        # expert_cache = torch.zeros_like(x)
+--        # idxs = flat_expert_indices.argsort()
+--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+--        # token_idxs = idxs // self.num_experts_per_tok
+--        # for i, end_idx in enumerate(tokens_per_expert):
+--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+--        #     if start_idx == end_idx:
+--        #         continue
+--        #     expert = self.experts[i]
+--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+--        #     expert_tokens = x[exp_token_idx]
+--        #     expert_out = expert(expert_tokens)
+--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+--        # return expert_cache
+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-         expert_cache = ops.zeros_like(x)
+-         idxs = flat_expert_indices.argsort()
+-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-         token_idxs = idxs // self.num_experts_per_tok
+-+
+-         for i, end_idx in enumerate(tokens_per_expert):
+-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-             if start_idx == end_idx:
+-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-             expert_out = expert(expert_tokens)
+-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+
+-         return expert_cache
+-+        
+-+    # @no_grad()
+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     # expert_cache = torch.zeros_like(x)
+-+    #     # idxs = flat_expert_indices.argsort()
+-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+    #     #     if start_idx == end_idx:
+-+    #     #         continue
+-+    #     #     expert = self.experts[i]
+-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #     #     expert_tokens = x[exp_token_idx]
+-+    #     #     expert_out = expert(expert_tokens)
+-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+    #     # return expert_cache
+-+    #     expert_cache = ops.zeros_like(x)
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+    #         if start_idx == end_idx:
+-+    #             continue
+-+    #         expert = self.experts[i]
+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #         expert_tokens = x[exp_token_idx]
+-+    #         expert_out = expert(expert_tokens)
+-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+
+-+    #     return expert_cache
+-+    # @no_grad()
+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     expert_cache = ops.zeros_like(x)
+-+
+-+    #     # 排序保证顺序一致
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+    #     # 找出有 token 的专家
+-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+
+-+    #     for i in active_experts.tolist():
+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+    #         end_idx = tokens_per_expert[i]
+-+    #         if start_idx == end_idx:  # 没有 token
+-+    #             continue
+-+
+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+    #         expert_tokens = x[exp_token_idx]
+-+    #         expert_out = self.experts[i](expert_tokens)
+-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+
+-+    #         expert_cache = mindspore.mint.scatter_add(
+-+    #             expert_cache,
+-+    #             0,
+-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+    #             expert_out
+-+    #         )
+-+
+-+    #     return expert_cache
+-+
+-+
+- 
+- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+- #     """
+-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+- 
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-+        self.warm_up = False
+-+
+-+    def warmup_moe_model_deep(self):
+-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+        test_texts = [
+-+            "warmup short",
+-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+        ]
+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+        if tokenizer is None:
+-+            from mindnlp.transformers import AutoTokenizer
+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+            self._warmup_tokenizer = tokenizer
+-+
+-+        for text in test_texts:
+-+            inputs = tokenizer(text, return_tensors="ms")
+-+            with mindspore._no_grad():
+-+                _ = self(**inputs, use_cache=False)
+-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+- 
+-     def get_input_embeddings(self):
+-         return self.model.embed_tokens
+-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-         ```"""
+-+        if not self.warm_up:
+-+            self.warm_up = True
+-+            self.warmup_moe_model_deep()
+-+
+-         output_attentions = (
+-             output_attentions
+-             if output_attentions is not None
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index 3cbf820e..d4c6b651 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -18,7 +18,6 @@
+- # See the License for the specific language governing permissions and
+- # limitations under the License.
+- """MindSpore Qwen2MoE model."""
+--
+- import math
+- from typing import List, Optional, Tuple, Union
+- 
+-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-     TokenClassifierOutput,
+- )
+- from ...modeling_utils import PreTrainedModel
+-+from ...generation import GenerationMixin
+- from ....utils import logging
+- from .configuration_qwen2_moe import Qwen2MoeConfig
+- 
+-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-         self.variance_epsilon = eps
+- 
+-     def forward(self, hidden_states):
+-+        # @dwj
+-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+        # @lwx
+-+        # if not self.training :
+-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-         input_dtype = hidden_states.dtype
+-         hidden_states = hidden_states.to(mindspore.float32)
+-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-@@ -234,6 +239,8 @@ def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+-     x1 = x[..., : x.shape[-1] // 2]
+-     x2 = x[..., x.shape[-1] // 2 :]
+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-         self.config = config
+-         self.hidden_size = config.hidden_size
+-         self.intermediate_size = intermediate_size
+-+        
+-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-         self.act_fn = ACT2FN[config.hidden_act]
+- 
+-     def forward(self, x):
+--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+--
+- 
+-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+        # @lwx 
+-+        # gate_up_output = self.gate_up_proj(x)
+-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+        # return self.down_proj(swiglu_output)
+-+
+-+    # def forward(self, x):
+-+    #     gate_proj_out = self.gate_proj(x)
+-+    #     up_proj_out = self.up_proj(x)
+-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+    #     return self.down_proj(swiglu_out)
+-+        
+- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-     """
+-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-         use_cache: bool = False,
+-         cache_position: Optional[mindspore.Tensor] = None,
+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+        
+-+
+-         bsz, q_len, _ = hidden_states.shape
+- 
+-         query_states = self.q_proj(hidden_states)
+-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-                     "with a layer index."
+-                 )
+--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+            if isinstance(past_key_value, StaticCache):
+-+                kv_seq_len = key_states.shape[-2]
+-+            else:
+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+- 
+-         if past_key_value is not None:
+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+            
+-+            if isinstance(past_key_value, StaticCache):
+-+                kv_seq_len = key_states.shape[-2]
+- 
+-         # repeat k/v heads if n_kv_heads < n_heads
+-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+--
+-+        
+-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+- 
+--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+--            raise ValueError(
+--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+--                f" {attn_weights.shape}"
+--            )
+--
+--        if attention_mask is not None:  # no matter the length, we just slice it
+--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+        if attention_mask is not None:
+-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-             attn_weights = attn_weights + causal_mask
+- 
+-         # upcast attention to fp32
+-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+- 
+-         attn_output = self.o_proj(attn_output)
+--
+-+        # @lwx
+-+        
+-+        # max_seq_len = self.max_position_embeddings  # 2048
+-+
+-+        # if attention_mask is not None:
+-+        #     # attention_mask: [B, 1, Sq, Sk]
+-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+
+-+        #     # pad 到 [max_seq_len, max_seq_len]
+-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+        #     global_attention_mask = padded_mask
+-+        # else:
+-+        #     global_attention_mask = None
+-+
+-+
+-+        # sparse_mode=3
+-+        # attn_output = mindspore.ops.flash_attention_score(
+-+        #     query=query_states,
+-+        #     key=key_states,
+-+        #     value=value_states,
+-+        #     real_shift=None,
+-+        #     padding_mask=None,
+-+
+-+        #     head_num=self.num_heads,
+-+        #     attn_mask=global_attention_mask,
+-+        #     keep_prob=1.0 - self.attention_dropout,
+-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+        #     input_layout="BNSD",
+-+        #     pre_tokens=2147483647,
+-+        #     next_tokens=2147483647,
+-+        #     inner_precise=0,
+-+        #     drop_mask=None,
+-+        #     prefix=None,
+-+        #     actual_seq_qlen=None,
+-+        #     actual_seq_kvlen=None,
+-+        #     sparse_mode=sparse_mode,
+-+        # )
+-         if not output_attentions:
+-             attn_weights = None
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+- 
+-+class Qwen2MoeFlashAttention(nn.Module):
+-+    """
+-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+
+-+    关键改动:
+-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+       直接传入原始的 key 和 value 张量效率更高。
+-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+    """
+-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+        super().__init__()
+-+        self.config = config
+-+        self.layer_idx = layer_idx
+-+        self.hidden_size = config.hidden_size
+-+        self.num_heads = config.num_attention_heads
+-+        self.head_dim = self.hidden_size // self.num_heads
+-+        self.num_key_value_heads = config.num_key_value_heads
+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+        self.max_position_embeddings = config.max_position_embeddings
+-+        self.rope_theta = config.rope_theta
+-+        self.attention_dropout = config.attention_dropout
+-+
+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+            raise ValueError(
+-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+            )
+-+
+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+
+-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+            self.head_dim,
+-+            max_position_embeddings=self.max_position_embeddings,
+-+            base=self.rope_theta,
+-+        )
+-+
+-+    def forward(
+-+        self,
+-+        hidden_states: mindspore.Tensor,
+-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+        position_ids: Optional[mindspore.Tensor] = None,
+-+        past_key_value: Optional[Cache] = None,
+-+        output_attentions: bool = False,
+-+        use_cache: bool = False,
+-+        cache_position: Optional[mindspore.Tensor] = None,
+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+        bsz, q_len, _ = hidden_states.shape
+-+
+-+        # 1. 线性投射 Q, K, V
+-+        query_states = self.q_proj(hidden_states)
+-+        key_states = self.k_proj(hidden_states)
+-+        value_states = self.v_proj(hidden_states)
+-+
+-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+        # 3. RoPE 旋转位置编码
+-+        kv_seq_len = key_states.shape[-2]
+-+        if past_key_value is not None:
+-+            if self.layer_idx is None:
+-+                raise ValueError(
+-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+                    "with a layer index."
+-+                )
+-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+                if cache_position.shape[0] == 1:
+-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+                    kv_seq_len = past_seen_tokens + 1
+-+                else:
+-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+            else:
+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+        # 4. KV 缓存更新
+-+        if past_key_value is not None:
+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+            key_states, value_states = past_key_value.update(
+-+                key_states, value_states, self.layer_idx, cache_kwargs
+-+            )
+-+            
+-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+                if cache_position.shape[0] == 1:
+-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+                    kv_seq_len = key_states.shape[-2]
+-+
+-+        # 5. [重要] 准备 Attention Mask
+-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+        fa_attention_mask = None
+-+        if attention_mask is not None:
+-+            # 截取与当前key长度匹配的部分
+-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+            fa_attention_mask = (mask_slice != 0)
+-+
+-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+        input_dtype = query_states.dtype
+-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+            query_states = query_states.to(mindspore.float16)
+-+            key_states = key_states.to(mindspore.float16)
+-+            value_states = value_states.to(mindspore.float16)
+-+
+-+        # 6. [核心] 调用 flash_attention_score 算子
+-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+        attn_output = mindspore.ops.flash_attention_score(
+-+            query=query_states,
+-+            key=key_states,
+-+            value=value_states,
+-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-+            attn_mask=fa_attention_mask,
+-+            keep_prob=1.0 - self.attention_dropout,
+-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+            input_layout="BNSD",
+-+            sparse_mode=0 # 使用 defaultMask 模式
+-+        )
+-+
+-+        # 恢复原始数据类型
+-+        attn_output = attn_output.to(input_dtype)
+-+
+-+        # 7. 调整输出形状
+-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+        attn_output = self.o_proj(attn_output)
+-+
+-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-+        attn_weights = None
+-+        if output_attentions:
+-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+        return attn_output, attn_weights, past_key_value
+-+
+-+    # def forward(
+-+    #     self,
+-+    #     hidden_states: mindspore.Tensor,
+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+    #     past_key_value: Optional[Cache] = None,
+-+    #     output_attentions: bool = False,
+-+    #     use_cache: bool = False,
+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+    #     bsz, q_len, _ = hidden_states.shape
+-+
+-+    #     # 1. 线性投射 Q, K, V
+-+    #     query_states = self.q_proj(hidden_states)
+-+    #     key_states = self.k_proj(hidden_states)
+-+    #     value_states = self.v_proj(hidden_states)
+-+
+-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+    #     # 3. RoPE 旋转位置编码
+-+    #     kv_seq_len = key_states.shape[-2]
+-+    #     if past_key_value is not None:
+-+    #         if self.layer_idx is None:
+-+    #             raise ValueError(
+-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+    #                 "with a layer index."
+-+    #             )
+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+    #     # 4. KV 缓存更新
+-+    #     if past_key_value is not None:
+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+    #         key_states, value_states = past_key_value.update(
+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+    #         )
+-+
+-+    #     # 5. 准备 Attention Mask
+-+    #     fa_attention_mask = None
+-+    #     if attention_mask is not None:
+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+    #         fa_attention_mask = (mask_slice != 0)
+-+
+-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+    #     input_dtype = query_states.dtype
+-+
+-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+    #         query=query_states,
+-+    #         key=key_states,
+-+    #         value=value_states,
+-+    #         head_num=self.num_heads,
+-+    #         attn_mask=fa_attention_mask,
+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+    #         input_layout="BNSD",
+-+    #         sparse_mode=0,
+-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+    #         inner_precise=1
+-+    #     )
+-+
+-+    #     # 恢复原始数据类型
+-+    #     attn_output = attn_output.to(input_dtype)
+-+
+-+    #     # 7. 调整输出形状
+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+    #     attn_output = self.o_proj(attn_output)
+-+
+-+    #     attn_weights = None
+-+    #     if output_attentions:
+-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+    #     return attn_output, attn_weights, past_key_value
+-+
+-+    # def forward(
+-+    #     self,
+-+    #     hidden_states: mindspore.Tensor,
+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+    #     past_key_value: Optional[Cache] = None,
+-+    #     output_attentions: bool = False,
+-+    #     use_cache: bool = False,
+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+    #     bsz, q_len, _ = hidden_states.shape
+-+
+-+    #     query_states = self.q_proj(hidden_states)
+-+    #     key_states = self.k_proj(hidden_states)
+-+    #     value_states = self.v_proj(hidden_states)
+-+
+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+    #     kv_seq_len = key_states.shape[-2]
+-+    #     if past_key_value is not None:
+-+    #         if self.layer_idx is None:
+-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+    #     if past_key_value is not None:
+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+    #         key_states, value_states = past_key_value.update(
+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+    #         )
+-+
+-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+
+-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-+    #     # <--- 修改结束 ---
+-+
+-+    #     fa_attention_mask = None
+-+    #     if attention_mask is not None:
+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+    #         fa_attention_mask = (mask_slice != 0)
+-+
+-+    #     input_dtype = query_states.dtype
+-+
+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-+    #         key=key_states,
+-+    #         value=value_states,
+-+    #         head_num=self.num_heads,
+-+    #         attn_mask=fa_attention_mask,
+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+    #         input_layout="BNSD",
+-+    #         sparse_mode=0,
+-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-+    #     )
+-+
+-+    #     attn_output = attn_output.to(input_dtype)
+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+    #     attn_output = self.o_proj(attn_output)
+-+
+-+    #     attn_weights = None
+-+    #     if output_attentions:
+-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+
+-+    #     return attn_output, attn_weights, past_key_value
+-+
+- QWEN2MOE_ATTENTION_CLASSES = {
+-     "eager": Qwen2MoeAttention,
+-+    "flash-attention": Qwen2MoeFlashAttention,
+- }
+- 
+- 
+-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+- 
+-+    #@dwj
+-+    # 只遍历激活的专家，而非全部专家
+-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        hidden_states = hidden_states.view(-1, hidden_dim)
+--        # router_logits: (batch * sequence_length, n_experts)
+--        router_logits = self.gate(hidden_states)
+--
+--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--        if self.norm_topk_prob:
+--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        # we cast back to the input dtype
+--        routing_weights = routing_weights.to(hidden_states.dtype)
+--
+--        final_hidden_states = ops.zeros(
+--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+--        )
+--
+--        # One hot encode the selected experts to create an expert mask
+--        # this will be used to easily index which expert is going to be sollicitated
+--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+--
+--        # Loop over all available experts in the model and perform the computation on each expert
+--        for expert_idx in range(self.num_experts):
+--            expert_layer = self.experts[expert_idx]
+--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+--
+--            # Index the correct hidden states and compute the expert hidden state for
+--            # the current expert. We need to make sure to multiply the output hidden
+--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+--            if 0 not in idx.shape:
+--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+--
+--                # However `index_add_` only support torch tensors for indexing so we'll use
+--                # the `top_x` tensor here.
+--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+--
+--        shared_expert_output = self.shared_expert(hidden_states)
+--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+--
+--        final_hidden_states = final_hidden_states + shared_expert_output
+-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+            num_tokens = hidden_states_reshaped.shape[0]
+-+
+-+            router_logits = self.gate(hidden_states_reshaped)
+-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+
+-+            if self.norm_topk_prob:
+-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-+            
+-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+            flat_selected_experts = selected_experts.flatten()
+-+            
+-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+            token_indices = broadcasted_token_indices.flatten()
+-+            
+-+            active_experts = ops.unique(flat_selected_experts)
+-+            
+-+            for expert_idx_tensor in active_experts:
+-+                expert_idx = expert_idx_tensor.item()
+-+                expert_layer = self.experts[expert_idx]
+-+                
+-+                mask = (flat_selected_experts == expert_idx_tensor)
+-+                selected_token_indices = token_indices[mask]
+-+                selected_routing_weights = routing_weights.flatten()[mask]
+-+                
+-+                current_states = hidden_states_reshaped[selected_token_indices]
+-+                
+-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+                
+-+                final_hidden_states = final_hidden_states.index_add(
+-+                    dim=0,
+-+                    index=selected_token_indices,
+-+                    source=expert_output.to(hidden_states.dtype)
+-+                )
+-+            
+-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+- 
+--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+--        return final_hidden_states, router_logits
+-+            final_hidden_states = final_hidden_states + shared_expert_output
+-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+            
+-+            return final_hidden_states, router_logits
+- 
+- 
+- class Qwen2MoeDecoderLayer(nn.Module):
+-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+- 
+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+- 
+-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+
+-         if (layer_idx not in config.mlp_only_layers) and (
+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-         ):
+-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-     _skip_keys_device_placement = "past_key_values"
+-     _supports_cache_class = True
+-+#lwx
+-+    # _supports_static_cache = True
+- 
+-     def _init_weights(self, module):
+-         std = self.config.initializer_range
+-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-         return causal_mask
+- 
+- 
+--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-     _tied_weights_keys = ["lm_head.weight"]
+- 
+-     def __init__(self, config):
+-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-         self.num_experts_per_tok = config.num_experts_per_tok
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-+        # @lwx
+-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+        #     self.generation_config.cache_implementation = "static"
+-+        self._warmed_up = False
+-+
+-+    def warmup_moe_model(self):
+-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+        test_texts = [
+-+            "warmup short",
+-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+        ]
+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+        if tokenizer is None:
+-+            from mindnlp.transformers import AutoTokenizer
+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+            self._warmup_tokenizer = tokenizer
+-+
+-+        for text in test_texts:
+-+            inputs = tokenizer(text, return_tensors="ms")
+-+            with mindspore._no_grad():
+-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+- 
+-     def get_input_embeddings(self):
+-         return self.model.embed_tokens
+-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-         ```"""
+-+        if not self._warmed_up:
+-+            self._warmed_up = True
+-+            self.warmup_moe_model()
+- 
+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-         output_router_logits = (
+-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-             }
+-         )
+-         return model_inputs
+-+# @lwx
+-+    # def _decode_one_tokens_logits(
+-+    #     self,
+-+    #     cur_token: mindspore.Tensor,
+-+    #     input_pos: Optional[mindspore.Tensor],
+-+    #     cache_position: mindspore.Tensor,
+-+    #     past_key_values: StaticCache,
+-+    # ) -> mindspore.Tensor:
+-+    #     """
+-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+        
+-+    #     Args:
+-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+    #         input_pos: 输入位置信息，可选
+-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+            
+-+    #     Returns:
+-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+    #     """
+-+    #     # 调用JIT编译的版本
+-+    #     return self.get_decode_one_tokens_logits(
+-+    #         cur_token=cur_token,
+-+    #         input_pos=input_pos,
+-+    #         cache_position=cache_position,
+-+    #         past_key_values=past_key_values,
+-+    #     )
+-+    
+-+    # @mindspore.jit(jit_level='O1')
+-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+    #     """
+-+    #     JIT编译的函数，用于高效的单token解码
+-+    #     使用JIT编译优化以支持静态shape和高效执行
+-+        
+-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+    #     """
+-+    #     outputs = self.model.forward(
+-+    #         input_ids=cur_token,
+-+    #         position_ids=input_pos,
+-+    #         cache_position=cache_position,
+-+    #         past_key_values=past_key_values,
+-+    #         use_cache=True,
+-+    #         return_dict=False,
+-+    #     )
+-+        
+-+    #     hidden_states = outputs[0]
+-+    #     logits = self.lm_head.forward(hidden_states)
+-+    #     logits = logits.float()
+-+        
+-+    #     return logits[:, -1, :]
+-+
+-+    # def _sample(
+-+    #     self,
+-+    #     input_ids: mindspore.Tensor,
+-+    #     logits_processor,
+-+    #     stopping_criteria,
+-+    #     generation_config,
+-+    #     synced_devices: bool,
+-+    #     streamer=None,
+-+    #     logits_warper=None,
+-+    #     **model_kwargs,
+-+    # ):
+-+    #     """
+-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+    #     """
+-+    #     from ...generation.logits_process import LogitsProcessorList
+-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+    #     from mindnlp.core import nn, ops, no_grad
+-+    #     import numpy as np
+-+        
+-+    #     # 检查是否使用 StaticCache
+-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+    #     # 否则，直接调用父类方法
+-+    #     past_key_values = model_kwargs.get("past_key_values")
+-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+
+-+    #     if not isinstance(past_key_values, StaticCache):
+-+    #         # 不使用 StaticCache，直接调用父类方法
+-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+    #         return super()._sample(
+-+    #             input_ids=input_ids,
+-+    #             logits_processor=logits_processor,
+-+    #             stopping_criteria=stopping_criteria,
+-+    #             generation_config=generation_config,
+-+    #             synced_devices=synced_devices,
+-+    #             streamer=streamer,
+-+    #             logits_warper=logits_warper,
+-+    #             **model_kwargs,
+-+    #         )
+-+        
+-+    #     # 使用 StaticCache，进入自定义循环
+-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+    #     pad_token_id = generation_config._pad_token_tensor
+-+    #     output_attentions = generation_config.output_attentions
+-+    #     output_hidden_states = generation_config.output_hidden_states
+-+    #     output_scores = generation_config.output_scores
+-+    #     output_logits = generation_config.output_logits
+-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+    #     max_length = generation_config.max_length
+-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+    #     do_sample = generation_config.do_sample
+-+        
+-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+    #         raise ValueError(
+-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+    #             f"{logits_warper})."
+-+    #         )
+-+        
+-+    #     # init attention / hidden states / scores tuples
+-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+        
+-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+    #         encoder_hidden_states = (
+-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+    #         )
+-+        
+-+    #     # keep track of which sequences are already finished
+-+    #     batch_size, cur_len = input_ids.shape
+-+    #     this_peer_finished = False
+-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+        
+-+    #     time_record = []
+-+    #     from ....utils.testing_utils import parse_flag_from_env
+-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+        
+-+    #     while self._has_unfinished_sequences(
+-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+    #     ):
+-+    #         if _record_time:
+-+    #             import time as time_module
+-+    #             infer_start = time_module.time()
+-+            
+-+    #         # prepare model inputs
+-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+            
+-+    #         # prepare variable output controls
+-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+            
+-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+    #         cur_cache_position = model_inputs.get("cache_position")
+-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+    #         cur_input_ids = model_inputs.get("input_ids")
+-+            
+-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+    #             cur_cache_position is not None and 
+-+    #             len(cur_cache_position.shape) > 0 and
+-+    #             cur_cache_position.shape[0] == 1 and
+-+    #             cur_input_ids is not None and
+-+    #             cur_input_ids.shape[1] == 1):
+-+    #             # 使用 JIT 优化的单 token 解码
+-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+    #             if not hasattr(self, '_jit_used'):
+-+    #                 self._jit_used = False
+-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+                
+-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+    #                 cur_token=cur_input_ids,
+-+    #                 input_pos=model_inputs.get("position_ids"),
+-+    #                 cache_position=cur_cache_position,
+-+    #                 past_key_values=cur_past_key_values,
+-+    #             )
+-+                
+-+    #             # 标记已使用JIT（用于后续判断）
+-+    #             if not self._jit_used:
+-+    #                 self._jit_used = True
+-+                
+-+    #             # 构造兼容的输出对象
+-+    #             class JitOptimizedOutput:
+-+    #                 def __init__(self, logits, config):
+-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+    #                     self.config = config
+-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+    #                     self.cross_attentions = None
+-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+                
+-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+    #         else:
+-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+    #             outputs = self(**model_inputs, return_dict=True)
+-+            
+-+    #         if synced_devices and this_peer_finished:
+-+    #             continue
+-+            
+-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+    #         next_token_logits = outputs.logits[:, -1, :]
+-+            
+-+    #         # pre-process distribution
+-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+    #         if do_sample:
+-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+            
+-+    #         # Store scores, attentions and hidden_states when required
+-+    #         if return_dict_in_generate:
+-+    #             if output_scores:
+-+    #                 scores += (next_token_scores,)
+-+    #             if output_logits:
+-+    #                 raw_logits += (next_token_logits,)
+-+    #             if output_attentions:
+-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+    #                 if self.config.is_encoder_decoder:
+-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+                
+-+    #             if output_hidden_states:
+-+    #                 hidden = (
+-+    #                     outputs.decoder_hidden_states
+-+    #                     if self.config.is_encoder_decoder
+-+    #                     else outputs.hidden_states
+-+    #                 )
+-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+            
+-+    #         # token selection
+-+    #         if do_sample:
+-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+    #         else:
+-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+            
+-+    #         # finished sentences should have their next token be a padding token
+-+    #         if has_eos_stopping_criteria:
+-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+            
+-+    #         # update generated ids, model inputs, and length for next step
+-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+    #         if streamer is not None:
+-+    #             streamer.put(next_tokens)
+-+            
+-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+    #             outputs,
+-+    #             model_kwargs,
+-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+    #         )
+-+            
+-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+    #         cur_len += 1
+-+            
+-+    #         if _record_time:
+-+    #             import time as time_module
+-+    #             infer_stop = time_module.time()
+-+    #             time_record.append(infer_stop - infer_start)
+-+            
+-+    #         del outputs
+-+        
+-+    #     average_infer_time = None
+-+    #     if time_record:
+-+    #         if len(time_record) > 1:
+-+    #             time_record.pop(0)
+-+    #         average_infer_time = sum(time_record) / len(time_record)
+-+    #         print(f'average inference time is: {average_infer_time}')
+-+    #         print(f'inference time record: {time_record}')
+-+        
+-+    #     if streamer is not None:
+-+    #         streamer.end()
+-+        
+-+    #     # 简单判断：打印是否使用了JIT路径
+-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+    #     else:
+-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+        
+-+    #     if return_dict_in_generate:
+-+    #         if self.config.is_encoder_decoder:
+-+    #             return GenerateEncoderDecoderOutput(
+-+    #                 sequences=input_ids,
+-+    #                 scores=scores,
+-+    #                 logits=raw_logits,
+-+    #                 encoder_attentions=encoder_attentions,
+-+    #                 encoder_hidden_states=encoder_hidden_states,
+-+    #                 decoder_attentions=decoder_attentions,
+-+    #                 cross_attentions=cross_attentions,
+-+    #                 decoder_hidden_states=decoder_hidden_states,
+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+    #                 average_infer_time=average_infer_time
+-+    #             )
+-+    #         else:
+-+    #             return GenerateDecoderOnlyOutput(
+-+    #                 sequences=input_ids,
+-+    #                 scores=scores,
+-+    #                 logits=raw_logits,
+-+    #                 attentions=decoder_attentions,
+-+    #                 hidden_states=decoder_hidden_states,
+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+    #                 average_infer_time=average_infer_time
+-+    #             )
+-+    #     else:
+-+    #         return input_ids
+-+            
+-+    # def _prepare_cache_for_generation(
+-+    #     self,
+-+    #     generation_config,
+-+    #     model_kwargs,
+-+    #     assistant_model,
+-+    #     batch_size,
+-+    #     max_cache_length,
+-+    # ):
+-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+    #         generation_config.cache_implementation = "static"
+-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+        
+-+    #     if generation_config.cache_implementation == "static":
+-+    #         base_required_from_max_length = generation_config.max_length + 1
+-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+    #         min_cache_size = 50
+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+    #         else:
+-+    #             max_cache_length = max(base_required, min_cache_size)
+-+            
+-+    #         original_max_cache_length = max_cache_length
+-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+            
+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+    #             if max_cache_length > self.config.max_position_embeddings:
+-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+        
+-+    #     result = super()._prepare_cache_for_generation(
+-+    #         generation_config=generation_config,
+-+    #         model_kwargs=model_kwargs,
+-+    #         assistant_model=assistant_model,
+-+    #         batch_size=batch_size,
+-+    #         max_cache_length=max_cache_length,
+-+    #     )
+-+        
+-+    #     if generation_config.cache_implementation == "static":
+-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+    #         created_cache = model_kwargs.get(cache_name)
+-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+        
+-+    #     return result
+-+
+-+
+-+
+- 
+- 
+- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+--- 
+-2.27.0
+-
+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+deleted file mode 100644
+index d7a129ea..00000000
+--- a/patches/0002-20251106commit.patch
++++ /dev/null
+@@ -1,3200 +0,0 @@
+-From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Thu, 6 Nov 2025 09:20:38 +0800
+-Subject: [PATCH 2/8] 20251106commit
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+- patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+- 3 files changed, 2689 insertions(+), 305 deletions(-)
+- create mode 100644 patches/0001-20251104commit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index d8303e45..73773c22 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-         #     y = y + self.shared_experts(identity)
+-         # return y
+-         
+-+    # @no_grad()
+-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+
+-+    #     expert_cache = ops.zeros_like(x)
+-+    #     for i in range(self.num_experts_per_tok):
+-+    #         expert_id = flat_expert_indices[i].item()
+-+    #         weight = flat_expert_weights[i].item()
+-+    #         expert = self.experts[expert_id]
+-+    #         expert_out = expert(x)
+-+    #         expert_cache += expert_out * weight
+-+    #     return expert_cache
+-+
+-     @no_grad()
+-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+        # x 的 shape: (1, hidden_size)
+-+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+
+-+        # 1. 收集所有需要的专家层
+-+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+
+-+        # 2. 并行计算所有专家的输出
+-+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+
+-+        # 3. 使用矩阵乘法进行加权求和
+-+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+        
+-+        return final_output
+- 
+--        expert_cache = ops.zeros_like(x)
+--        for i in range(self.num_experts_per_tok):
+--            expert_id = flat_expert_indices[i].item()
+--            weight = flat_expert_weights[i].item()
+--            expert = self.experts[expert_id]
+--            expert_out = expert(x)
+--            expert_cache += expert_out * weight
+--        return expert_cache
+- 
+-     @no_grad()
+-     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-             key_states = self.k_proj(hidden_states)
+-             value_states = self.v_proj(hidden_states)
+- 
+--        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+--        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+--        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+        # @lwx
+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-+        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+- 
+-         kv_seq_len = key_states.shape[-2]
+-         if past_key_value is not None:
+-@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-         return attn_output, attn_weights, past_key_value
+- 
+- 
+-+# class DeepseekFlashAttention(nn.Module):
+-+#     """
+-+#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-+
+-+#     This class is designed as a drop-in replacement for DeepseekAttention.
+-+#     """
+-+
+-+#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+#         super().__init__()
+-+#         self.config = config
+-+#         self.layer_idx = layer_idx
+-+#         if layer_idx is None:
+-+#             logger.warning(
+-+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+#                 "when creating this class."
+-+#             )
+-+
+-+#         self.attention_dropout = config.attention_dropout
+-+#         self.hidden_size = config.hidden_size
+-+#         self.num_heads = config.num_attention_heads
+-+#         self.head_dim = self.hidden_size // self.num_heads
+-+#         self.num_key_value_heads = config.num_key_value_heads
+-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+#         self.max_position_embeddings = config.max_position_embeddings
+-+#         self.rope_theta = config.rope_theta
+-+#         self.is_causal = True
+-+
+-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+#             raise ValueError(
+-+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+#                 f" and `num_heads`: {self.num_heads})."
+-+#             )
+-+
+-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+#         self._init_rope()
+-+
+-+#     def _init_rope(self):
+-+#         if self.config.rope_scaling is None:
+-+#             self.rotary_emb = DeepseekRotaryEmbedding(
+-+#                 self.head_dim,
+-+#                 max_position_embeddings=self.max_position_embeddings,
+-+#                 base=self.rope_theta,
+-+#             )
+-+#         else:
+-+#             scaling_type = self.config.rope_scaling["type"]
+-+#             scaling_factor = self.config.rope_scaling["factor"]
+-+#             if scaling_type == "linear":
+-+#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+#                     self.head_dim,
+-+#                     max_position_embeddings=self.max_position_embeddings,
+-+#                     scaling_factor=scaling_factor,
+-+#                     base=self.rope_theta,
+-+#                 )
+-+#             elif scaling_type == "dynamic":
+-+#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+#                     self.head_dim,
+-+#                     max_position_embeddings=self.max_position_embeddings,
+-+#                     scaling_factor=scaling_factor,
+-+#                     base=self.rope_theta,
+-+#                 )
+-+#             else:
+-+#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+
+-+#     def forward(
+-+#         self,
+-+#         hidden_states: mindspore.Tensor,
+-+#         attention_mask: Optional[mindspore.Tensor] = None,
+-+#         position_ids: Optional[mindspore.Tensor] = None,
+-+#         past_key_value: Optional[Cache] = None,
+-+#         output_attentions: bool = False,
+-+#         use_cache: bool = False,
+-+#         **kwargs,
+-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+#         if "padding_mask" in kwargs:
+-+#             warnings.warn(
+-+#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+#             )
+-+        
+-+#         if output_attentions:
+-+#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-+
+-+#         bsz, q_len, _ = hidden_states.shape
+-+
+-+#         if self.config.pretraining_tp > 1:
+-+#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+
+-+#         query_states = self.q_proj(hidden_states)
+-+#         key_states = self.k_proj(hidden_states)
+-+#         value_states = self.v_proj(hidden_states)
+-+
+-+#         # Reshape for multi-head attention
+-+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+#         kv_seq_len = key_states.shape[-2]
+-+#         if past_key_value is not None:
+-+#             if self.layer_idx is None:
+-+#                 raise ValueError(
+-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+#                     "with a layer index."
+-+#                 )
+-+#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+        
+-+#         # Apply Rotary Positional Embedding
+-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+#         if past_key_value is not None:
+-+#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+
+-+#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-+#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-+#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+        
+-+#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+        
+-+#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+
+-+#         # Convert attention_mask for flash_attention_score
+-+#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-+#         if attention_mask is not None:
+-+#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-+#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+#                 raise ValueError(
+-+#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+#                 )
+-+#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-+#         else:
+-+#             attn_mask_for_fa = None
+-+        
+-+#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+
+-+#         # Call the fused flash_attention_score operator
+-+#         attn_output = mindspore.ops.flash_attention_score(
+-+#             query=query_states_for_fa,
+-+#             key=key_states_for_fa,
+-+#             value=value_states_for_fa,
+-+#             head_num=self.num_heads, # This is N1, the number of query heads
+-+#             input_layout='BSH',
+-+#             attn_mask=attn_mask_for_fa,
+-+#             keep_prob=keep_prob,
+-+#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+#             sparse_mode=0 # Default mask mode
+-+#         )
+-+        
+-+#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-+#         attn_output = self.o_proj(attn_output)
+-+        
+-+#         # Flash Attention does not return attention weights
+-+#         attn_weights = None
+-+
+-+#         return attn_output, attn_weights, past_key_value
+-+
+-+class DeepseekFlashAttention(nn.Module):
+-+    """
+-+    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-+    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-+    designed for high performance on supported hardware (Ascend).
+-+
+-+    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-+    """
+-+    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+        super().__init__()
+-+        self.config = config
+-+        self.layer_idx = layer_idx
+-+        if layer_idx is None:
+-+            logger.warning(
+-+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+                "when creating this class."
+-+            )
+-+
+-+        # --- [FIX] Correctly initialize all required attributes ---
+-+        self.attention_dropout = config.attention_dropout
+-+        self.hidden_size = config.hidden_size
+-+        self.num_heads = config.num_attention_heads
+-+        self.head_dim = self.hidden_size // self.num_heads
+-+        self.num_key_value_heads = config.num_key_value_heads
+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+        self.max_position_embeddings = config.max_position_embeddings
+-+        self.rope_theta = config.rope_theta
+-+        self.is_causal = True
+-+
+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+            raise ValueError(
+-+                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+                f" and `num_heads`: {self.num_heads})."
+-+            )
+-+
+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+        
+-+        # This call will now succeed as all attributes are initialized.
+-+        self._init_rope()
+-+
+-+    def _init_rope(self):
+-+        if self.config.rope_scaling is None:
+-+            self.rotary_emb = DeepseekRotaryEmbedding(
+-+                self.head_dim,
+-+                max_position_embeddings=self.max_position_embeddings,
+-+                base=self.rope_theta,
+-+            )
+-+        else:
+-+            scaling_type = self.config.rope_scaling["type"]
+-+            scaling_factor = self.config.rope_scaling["factor"]
+-+            if scaling_type == "linear":
+-+                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+                    self.head_dim,
+-+                    max_position_embeddings=self.max_position_embeddings,
+-+                    scaling_factor=scaling_factor,
+-+                    base=self.rope_theta,
+-+                )
+-+            elif scaling_type == "dynamic":
+-+                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+                    self.head_dim,
+-+                    max_position_embeddings=self.max_position_embeddings,
+-+                    scaling_factor=scaling_factor,
+-+                    base=self.rope_theta,
+-+                )
+-+            else:
+-+                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+
+-+    def forward(
+-+        self,
+-+        hidden_states: mindspore.Tensor,
+-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+        position_ids: Optional[mindspore.Tensor] = None,
+-+        past_key_value: Optional[Cache] = None,
+-+        output_attentions: bool = False,
+-+        use_cache: bool = False,
+-+        **kwargs,
+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+        if "padding_mask" in kwargs:
+-+            warnings.warn(
+-+                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+            )
+-+        if output_attentions:
+-+            warnings.warn(
+-+                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-+            )
+-+
+-+        bsz, q_len, _ = hidden_states.shape
+-+
+-+        if self.config.pretraining_tp > 1:
+-+            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+
+-+        query_states = self.q_proj(hidden_states)
+-+        key_states = self.k_proj(hidden_states)
+-+        value_states = self.v_proj(hidden_states)
+-+
+-+        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+        kv_seq_len = key_states.shape[-2]
+-+        if past_key_value is not None:
+-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+        
+-+        # Apply Rotary Position Embedding
+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+        if past_key_value is not None:
+-+            cache_kwargs = {"sin": sin, "cos": cos}
+-+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+
+-+        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-+        # So we must explicitly repeat the KV heads.
+-+        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+
+-+        # Convert attention mask for flash_attention_score
+-+        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-+        if attention_mask is not None:
+-+            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+                 raise ValueError(
+-+                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+                )
+-+            attn_mask_for_fa = attention_mask < 0
+-+        else:
+-+            attn_mask_for_fa = None
+-+
+-+        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+
+-+        # Call the fused operator using the efficient BNSD layout
+-+        attn_output = mindspore.ops.flash_attention_score(
+-+            query=query_states,
+-+            key=key_states,
+-+            value=value_states,
+-+            head_num=self.num_heads,
+-+            input_layout='BNSD', # Specify the correct layout
+-+            attn_mask=attn_mask_for_fa,
+-+            keep_prob=keep_prob,
+-+            scalar_value=1.0 / math.sqrt(self.head_dim)
+-+        )
+-+        
+-+        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+        
+-+        # Apply output projection
+-+        attn_output = self.o_proj(attn_output)
+-+
+-+        # Flash attention does not return attention weights, so we return None.
+-+        attn_weights = None
+-+
+-+        return attn_output, attn_weights, past_key_value
+-+
+- Deepseek_ATTENTION_CLASSES = {
+-     "eager": DeepseekAttention,
+-+    "flash-attention": DeepseekFlashAttention,
+- }
+- 
+- 
+-@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-             config=config, layer_idx=layer_idx
+-         )
+- 
+-+        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-+            config=config, layer_idx=layer_idx
+-+        )
+-+
+-         self.mlp = (
+-             DeepseekMoE(config)
+-             if (
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index d4c6b651..bced285c 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+- 
+- import mindspore
+- import mindnlp.core.nn.functional as F
+--from mindnlp.core import nn, ops
+-+from mindnlp.core import nn, ops, no_grad
+- from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+- 
+- from ....common.activations import ACT2FN
+-@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+- 
+-+Long_Prompt = False
+-+PROMPT_LENGTH_THRESHOLD = 128
+- 
+- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+- def _prepare_4d_causal_attention_mask_with_cache_position(
+-@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-         return attn_output, attn_weights, past_key_value
+- 
+- 
+-+# class Qwen2MoeFlashAttention(nn.Module):
+-+#     """
+-+#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+
+-+#     关键改动:
+-+#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+#        直接传入原始的 key 和 value 张量效率更高。
+-+#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+#         super().__init__()
+-+#         self.config = config
+-+#         self.layer_idx = layer_idx
+-+#         self.hidden_size = config.hidden_size
+-+#         self.num_heads = config.num_attention_heads
+-+#         self.head_dim = self.hidden_size // self.num_heads
+-+#         self.num_key_value_heads = config.num_key_value_heads
+-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+#         self.max_position_embeddings = config.max_position_embeddings
+-+#         self.rope_theta = config.rope_theta
+-+#         self.attention_dropout = config.attention_dropout
+-+
+-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+#             raise ValueError(
+-+#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+#             )
+-+
+-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+
+-+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+#             self.head_dim,
+-+#             max_position_embeddings=self.max_position_embeddings,
+-+#             base=self.rope_theta,
+-+#         )
+-+
+-+#     def forward(
+-+#         self,
+-+#         hidden_states: mindspore.Tensor,
+-+#         attention_mask: Optional[mindspore.Tensor] = None,
+-+#         position_ids: Optional[mindspore.Tensor] = None,
+-+#         past_key_value: Optional[Cache] = None,
+-+#         output_attentions: bool = False,
+-+#         use_cache: bool = False,
+-+#         cache_position: Optional[mindspore.Tensor] = None,
+-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+#         bsz, q_len, _ = hidden_states.shape
+-+
+-+#         # 1. 线性投射 Q, K, V
+-+#         query_states = self.q_proj(hidden_states)
+-+#         key_states = self.k_proj(hidden_states)
+-+#         value_states = self.v_proj(hidden_states)
+-+
+-+#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-+#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+#         # 3. RoPE 旋转位置编码
+-+#         kv_seq_len = key_states.shape[-2]
+-+#         if past_key_value is not None:
+-+#             if self.layer_idx is None:
+-+#                 raise ValueError(
+-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+#                     "with a layer index."
+-+#                 )
+-+#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+#                 if cache_position.shape[0] == 1:
+-+#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+#                     kv_seq_len = past_seen_tokens + 1
+-+#                 else:
+-+#                     # prefill 阶段：cache_position 是范围，使用其长度
+-+#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+#             else:
+-+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+#         # 4. KV 缓存更新
+-+#         if past_key_value is not None:
+-+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+#             key_states, value_states = past_key_value.update(
+-+#                 key_states, value_states, self.layer_idx, cache_kwargs
+-+#             )
+-+            
+-+#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+#                 if cache_position.shape[0] == 1:
+-+#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+#                     kv_seq_len = key_states.shape[-2]
+-+
+-+#         # 5. [重要] 准备 Attention Mask
+-+#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+#         fa_attention_mask = None
+-+#         if attention_mask is not None:
+-+#             # 截取与当前key长度匹配的部分
+-+#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+#             fa_attention_mask = (mask_slice != 0)
+-+
+-+#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+#         input_dtype = query_states.dtype
+-+#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+#             query_states = query_states.to(mindspore.float16)
+-+#             key_states = key_states.to(mindspore.float16)
+-+#             value_states = value_states.to(mindspore.float16)
+-+
+-+#         # 6. [核心] 调用 flash_attention_score 算子
+-+#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+#         attn_output = mindspore.ops.flash_attention_score(
+-+#             query=query_states,
+-+#             key=key_states,
+-+#             value=value_states,
+-+#             head_num=self.num_heads, # 传入Q的头数(N1)
+-+#             attn_mask=fa_attention_mask,
+-+#             keep_prob=1.0 - self.attention_dropout,
+-+#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+#             input_layout="BNSD",
+-+#             sparse_mode=0 # 使用 defaultMask 模式
+-+#         )
+-+
+-+#         # 恢复原始数据类型
+-+#         attn_output = attn_output.to(input_dtype)
+-+
+-+#         # 7. 调整输出形状
+-+#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+#         attn_output = self.o_proj(attn_output)
+-+
+-+#         # FlashAttention 算子不直接返回注意力权重矩阵
+-+#         attn_weights = None
+-+#         if output_attentions:
+-+#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+#         return attn_output, attn_weights, past_key_value
+-+
+-+#     # def forward(
+-+#     #     self,
+-+#     #     hidden_states: mindspore.Tensor,
+-+#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-+#     #     position_ids: Optional[mindspore.Tensor] = None,
+-+#     #     past_key_value: Optional[Cache] = None,
+-+#     #     output_attentions: bool = False,
+-+#     #     use_cache: bool = False,
+-+#     #     cache_position: Optional[mindspore.Tensor] = None,
+-+#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+#     #     bsz, q_len, _ = hidden_states.shape
+-+
+-+#     #     # 1. 线性投射 Q, K, V
+-+#     #     query_states = self.q_proj(hidden_states)
+-+#     #     key_states = self.k_proj(hidden_states)
+-+#     #     value_states = self.v_proj(hidden_states)
+-+
+-+#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+
+-+#     #     # 3. RoPE 旋转位置编码
+-+#     #     kv_seq_len = key_states.shape[-2]
+-+#     #     if past_key_value is not None:
+-+#     #         if self.layer_idx is None:
+-+#     #             raise ValueError(
+-+#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+#     #                 "with a layer index."
+-+#     #             )
+-+#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+
+-+#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+#     #     # 4. KV 缓存更新
+-+#     #     if past_key_value is not None:
+-+#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+#     #         key_states, value_states = past_key_value.update(
+-+#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-+#     #         )
+-+
+-+#     #     # 5. 准备 Attention Mask
+-+#     #     fa_attention_mask = None
+-+#     #     if attention_mask is not None:
+-+#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+#     #         fa_attention_mask = (mask_slice != 0)
+-+
+-+#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+#     #     input_dtype = query_states.dtype
+-+
+-+#     #     # 6. [核心] 调用 flash_attention_score 算子
+-+#     #     attn_output = mindspore.ops.flash_attention_score(
+-+#     #         query=query_states,
+-+#     #         key=key_states,
+-+#     #         value=value_states,
+-+#     #         head_num=self.num_heads,
+-+#     #         attn_mask=fa_attention_mask,
+-+#     #         keep_prob=1.0 - self.attention_dropout,
+-+#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+#     #         input_layout="BNSD",
+-+#     #         sparse_mode=0,
+-+#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+#     #         inner_precise=1
+-+#     #     )
+-+
+-+#     #     # 恢复原始数据类型
+-+#     #     attn_output = attn_output.to(input_dtype)
+-+
+-+#     #     # 7. 调整输出形状
+-+#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+#     #     attn_output = self.o_proj(attn_output)
+-+
+-+#     #     attn_weights = None
+-+#     #     if output_attentions:
+-+#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+
+-+#     #     return attn_output, attn_weights, past_key_value
+-+
+-+
+- class Qwen2MoeFlashAttention(nn.Module):
+-     """
+--    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+--    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+--
+--    关键改动:
+--    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+--       直接传入原始的 key 和 value 张量效率更高。
+--    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+--    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-+
+-+    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-+    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-+    完全使用模型的低精度数据类型（如 float16）进行计算，
+-+    以达到理论上的最高执行速度。
+-     """
+-     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-         super().__init__()
+-         self.config = config
+-         self.layer_idx = layer_idx
+-+        if layer_idx is None:
+-+            logger.warning_once(
+-+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-+            )
+-+
+-         self.hidden_size = config.hidden_size
+-         self.num_heads = config.num_attention_heads
+-         self.head_dim = self.hidden_size // self.num_heads
+-         self.num_key_value_heads = config.num_key_value_heads
+--        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-         self.max_position_embeddings = config.max_position_embeddings
+-         self.rope_theta = config.rope_theta
+-         self.attention_dropout = config.attention_dropout
+- 
+--        if (self.head_dim * self.num_heads) != self.hidden_size:
+--            raise ValueError(
+--                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+--            )
+--
+-         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-         key_states = self.k_proj(hidden_states)
+-         value_states = self.v_proj(hidden_states)
+- 
+--        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+--        # query:   [B, S, H*D] -> [B, N1, S, D]
+--        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+        # 2. 调整形状以匹配 BNSD 布局
+-         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--
+--        # 3. RoPE 旋转位置编码
+-+        
+-+        # 3. RoPE 和 KV 缓存
+-         kv_seq_len = key_states.shape[-2]
+-         if past_key_value is not None:
+--            if self.layer_idx is None:
+--                raise ValueError(
+--                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--                    "with a layer index."
+--                )
+--            # 对于 StaticCache，需要特殊处理 kv_seq_len
+--            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+--            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+--                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+--                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+--                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+--                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+--                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+--                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+--                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+--                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+--                if cache_position.shape[0] == 1:
+--                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+--                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+--                    kv_seq_len = past_seen_tokens + 1
+--                else:
+--                    # prefill 阶段：cache_position 是范围，使用其长度
+--                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+--            else:
+--                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--
+-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+        
+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+- 
+--        # 4. KV 缓存更新
+-         if past_key_value is not None:
+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--            key_states, value_states = past_key_value.update(
+--                key_states, value_states, self.layer_idx, cache_kwargs
+--            )
+--            
+--            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+--            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+--            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+--                if cache_position.shape[0] == 1:
+--                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+--                    kv_seq_len = key_states.shape[-2]
+--
+--        # 5. [重要] 准备 Attention Mask
+--        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+--        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+
+-+        # 4. 准备 Attention Mask
+-         fa_attention_mask = None
+-         if attention_mask is not None:
+--            # 截取与当前key长度匹配的部分
+--            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+--            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-             fa_attention_mask = (mask_slice != 0)
+- 
+--        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+--        input_dtype = query_states.dtype
+--        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+--            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+--            query_states = query_states.to(mindspore.float16)
+--            key_states = key_states.to(mindspore.float16)
+--            value_states = value_states.to(mindspore.float16)
+--
+--        # 6. [核心] 调用 flash_attention_score 算子
+--        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+--        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-         attn_output = mindspore.ops.flash_attention_score(
+-             query=query_states,
+-             key=key_states,
+-             value=value_states,
+--            head_num=self.num_heads, # 传入Q的头数(N1)
+-+            head_num=self.num_heads,
+-             attn_mask=fa_attention_mask,
+--            keep_prob=1.0 - self.attention_dropout,
+-+            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-             scalar_value=1.0 / math.sqrt(self.head_dim),
+-             input_layout="BNSD",
+--            sparse_mode=0 # 使用 defaultMask 模式
+-+            sparse_mode=0,
+-+            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-         )
+- 
+--        # 恢复原始数据类型
+--        attn_output = attn_output.to(input_dtype)
+--
+--        # 7. 调整输出形状
+--        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+        # 6. 调整输出形状
+-         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-         attn_output = self.o_proj(attn_output)
+- 
+--        # FlashAttention 算子不直接返回注意力权重矩阵
+-+        # 7. 返回结果
+-         attn_weights = None
+-         if output_attentions:
+--             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+--    # def forward(
+--    #     self,
+--    #     hidden_states: mindspore.Tensor,
+--    #     attention_mask: Optional[mindspore.Tensor] = None,
+--    #     position_ids: Optional[mindspore.Tensor] = None,
+--    #     past_key_value: Optional[Cache] = None,
+--    #     output_attentions: bool = False,
+--    #     use_cache: bool = False,
+--    #     cache_position: Optional[mindspore.Tensor] = None,
+--    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--
+--    #     bsz, q_len, _ = hidden_states.shape
+--
+--    #     # 1. 线性投射 Q, K, V
+--    #     query_states = self.q_proj(hidden_states)
+--    #     key_states = self.k_proj(hidden_states)
+--    #     value_states = self.v_proj(hidden_states)
+--
+--    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+--    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--
+--    #     # 3. RoPE 旋转位置编码
+--    #     kv_seq_len = key_states.shape[-2]
+--    #     if past_key_value is not None:
+--    #         if self.layer_idx is None:
+--    #             raise ValueError(
+--    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--    #                 "with a layer index."
+--    #             )
+--    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+- 
+--    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--
+--    #     # 4. KV 缓存更新
+--    #     if past_key_value is not None:
+--    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--    #         key_states, value_states = past_key_value.update(
+--    #             key_states, value_states, self.layer_idx, cache_kwargs
+--    #         )
+--
+--    #     # 5. 准备 Attention Mask
+--    #     fa_attention_mask = None
+--    #     if attention_mask is not None:
+--    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--    #         fa_attention_mask = (mask_slice != 0)
+--
+--    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+--    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+--    #     input_dtype = query_states.dtype
+--
+--    #     # 6. [核心] 调用 flash_attention_score 算子
+--    #     attn_output = mindspore.ops.flash_attention_score(
+--    #         query=query_states,
+--    #         key=key_states,
+--    #         value=value_states,
+--    #         head_num=self.num_heads,
+--    #         attn_mask=fa_attention_mask,
+--    #         keep_prob=1.0 - self.attention_dropout,
+--    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+--    #         input_layout="BNSD",
+--    #         sparse_mode=0,
+--    #         # <--- 修改点 2: 启用内部高精度计算 ---
+--    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+--    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+--    #         inner_precise=1
+--    #     )
+--
+--    #     # 恢复原始数据类型
+--    #     attn_output = attn_output.to(input_dtype)
+-+QWEN2MOE_ATTENTION_CLASSES = {
+-+    "eager": Qwen2MoeAttention,
+-+    "flash-attention": Qwen2MoeFlashAttention,
+-+}
+- 
+--    #     # 7. 调整输出形状
+--    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--    #     attn_output = self.o_proj(attn_output)
+- 
+--    #     attn_weights = None
+--    #     if output_attentions:
+--    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     def __init__(self, config):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         # gating
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     #@dwj
+-+#     # 只遍历激活的专家，而非全部专家
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#             num_tokens = hidden_states_reshaped.shape[0]
+-+
+-+#             router_logits = self.gate(hidden_states_reshaped)
+-+#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+
+-+#             if self.norm_topk_prob:
+-+#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+#             routing_weights = routing_weights.to(hidden_states.dtype)
+-+            
+-+#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+#             flat_selected_experts = selected_experts.flatten()
+-+            
+-+#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+#             token_indices = broadcasted_token_indices.flatten()
+-+            
+-+#             active_experts = ops.unique(flat_selected_experts)
+-+            
+-+#             for expert_idx_tensor in active_experts:
+-+#                 expert_idx = expert_idx_tensor.item()
+-+#                 expert_layer = self.experts[expert_idx]
+-+                
+-+#                 mask = (flat_selected_experts == expert_idx_tensor)
+-+#                 selected_token_indices = token_indices[mask]
+-+#                 selected_routing_weights = routing_weights.flatten()[mask]
+-+                
+-+#                 current_states = hidden_states_reshaped[selected_token_indices]
+-+                
+-+#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+                
+-+#                 final_hidden_states = final_hidden_states.index_add(
+-+#                     dim=0,
+-+#                     index=selected_token_indices,
+-+#                     source=expert_output.to(hidden_states.dtype)
+-+#                 )
+-+            
+-+#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+- 
+--    #     return attn_output, attn_weights, past_key_value
+-+#             final_hidden_states = final_hidden_states + shared_expert_output
+-+#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+            
+-+#             return final_hidden_states, router_logits
+-+
+-+
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     """
+-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         # 门控网络
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         # 专家列表
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+#         # 共享专家
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_decode(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         """
+-+#         【解码路径】针对 sequence_length=1 的极致优化。
+-+#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+#         """
+-+#         batch_size, hidden_dim = hidden_states.shape
+-+        
+-+#         expert_outputs_list = [
+-+#             ops.cat([
+-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+#             ], dim=0) 
+-+#             for i in range(batch_size)
+-+#         ]
+-+        
+-+#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+#         # shape: (batch_size, top_k, hidden_dim)
+-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+        
+-+#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+        
+-+#         return moe_output.squeeze(1)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_prefill(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         """
+-+#         【预填充路径】针对 sequence_length > 1 的优化。
+-+#         按专家对 Token 进行分组，并进行批处理。
+-+#         """
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens = hidden_states.shape[0]
+-+#         flat_selected_experts = selected_experts.flatten()
+-+        
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+        
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+        
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+            
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+#             selected_token_indices = token_indices[mask]
+-+#             selected_routing_weights = routing_weights.flatten()[mask]
+-+            
+-+#             current_states = hidden_states[selected_token_indices]
+-+            
+-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+            
+-+#             moe_output = moe_output.index_add(
+-+#                 dim=0,
+-+#                 index=selected_token_indices,
+-+#                 source=expert_output.to(hidden_states.dtype)
+-+#             )
+-+#         return moe_output
+-+
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#         """
+-+#         顶层 forward 方法，作为智能分发器。
+-+#         """
+-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+        
+-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#         router_logits = self.gate(hidden_states_reshaped)
+-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+- 
+--    # def forward(
+--    #     self,
+--    #     hidden_states: mindspore.Tensor,
+--    #     attention_mask: Optional[mindspore.Tensor] = None,
+--    #     position_ids: Optional[mindspore.Tensor] = None,
+--    #     past_key_value: Optional[Cache] = None,
+--    #     output_attentions: bool = False,
+--    #     use_cache: bool = False,
+--    #     cache_position: Optional[mindspore.Tensor] = None,
+--    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--
+--    #     bsz, q_len, _ = hidden_states.shape
+--
+--    #     query_states = self.q_proj(hidden_states)
+--    #     key_states = self.k_proj(hidden_states)
+--    #     value_states = self.v_proj(hidden_states)
+--
+--    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--
+--    #     kv_seq_len = key_states.shape[-2]
+--    #     if past_key_value is not None:
+--    #         if self.layer_idx is None:
+--    #             raise ValueError("`layer_idx` must be specified for caching")
+--    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--
+--    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--
+--    #     if past_key_value is not None:
+--    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--    #         key_states, value_states = past_key_value.update(
+--    #             key_states, value_states, self.layer_idx, cache_kwargs
+--    #         )
+-+#         if self.norm_topk_prob:
+-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+        
+-+#         moe_output = None
+-+#         # 在推理时，根据序列长度选择最优路径
+-+#         if not self.training:
+-+#             if sequence_length == 1:
+-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+#             else:
+-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+#         else:
+-+#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+#             raise NotImplementedError("Training path is not implemented.")
+-+
+-+#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+        
+-+#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+        
+-+#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+#         return final_hidden_states, router_logits
+-+
+-+
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     """
+-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         # 门控网络
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         # 专家列表
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+#         # 共享专家
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_decode(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         batch_size, _ = hidden_states.shape
+-+#         expert_outputs_list = [
+-+#             ops.cat([
+-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+#             ], dim=0) 
+-+#             for i in range(batch_size)
+-+#         ]
+-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+#         return moe_output.squeeze(1)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_prefill(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens = hidden_states.shape[0]
+-+#         flat_selected_experts = selected_experts.flatten()
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+        
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+#             selected_token_indices = token_indices[mask]
+-+#             selected_routing_weights = routing_weights.flatten()[mask]
+-+#             current_states = hidden_states[selected_token_indices]
+-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+#             moe_output = moe_output.index_add(
+-+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+#             )
+-+#         return moe_output
+-+
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#         """
+-+#         顶层 forward 方法，作为智能分发器。
+-+#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+#         """
+-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+        
+-+#         # 1. 门控计算 (通用逻辑)
+-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#         router_logits = self.gate(hidden_states_reshaped)
+-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+
+-+#         if self.norm_topk_prob:
+-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+        
+-+#         # 2. 智能分发到最优 MoE 路径
+-+#         moe_output = None
+-+#         if not self.training:
+-+#             if sequence_length == 1:
+-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+#             else:
+-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+#         else:
+-+#             raise NotImplementedError("Training path is not implemented.")
+-+
+-+#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+        
+-+#         # 4. 合并 MoE 输出和共享专家输出
+-+#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+        
+-+#         # 5. 恢复原始形状并返回
+-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+#         return final_hidden_states, router_logits
+-+
+-+# prefill fastest
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     """
+-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         # 门控网络
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         # 专家列表
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+#         # 共享专家
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_dispatch(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         """
+-+#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+#         """
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens, _ = hidden_states.shape
+-+        
+-+#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+#         flat_selected_experts = selected_experts.flatten()
+-+#         flat_routing_weights = routing_weights.flatten()
+- 
+--    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+--    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+--
+--    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+--    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+--    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+--    #     query_states = query_states / math.sqrt(self.head_dim)
+--    #     # <--- 修改结束 ---
+--
+--    #     fa_attention_mask = None
+--    #     if attention_mask is not None:
+--    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--    #         fa_attention_mask = (mask_slice != 0)
+--
+--    #     input_dtype = query_states.dtype
+--
+--    #     attn_output = mindspore.ops.flash_attention_score(
+--    #         query=query_states,    # 传入已经预先缩放过的 query
+--    #         key=key_states,
+--    #         value=value_states,
+--    #         head_num=self.num_heads,
+--    #         attn_mask=fa_attention_mask,
+--    #         keep_prob=1.0 - self.attention_dropout,
+--    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+--    #         input_layout="BNSD",
+--    #         sparse_mode=0,
+--    #         inner_precise=1        # 仍然保持内部高精度计算
+--    #     )
+-+#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+- 
+--    #     attn_output = attn_output.to(input_dtype)
+--    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--    #     attn_output = self.o_proj(attn_output)
+-+#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+        
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+            
+-+#             # 找到所有分配给该专家的 token
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+            
+-+#             # 使用 mask 选取对应的 token 和权重
+-+#             current_token_indices = token_indices[mask]
+-+#             current_routing_weights = flat_routing_weights[mask]
+-+#             current_hidden_states = hidden_states[current_token_indices]
+-+            
+-+#             # 对这些 token 进行批处理
+-+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+            
+-+#             # 使用 index_add 将结果精确地加回到对应位置
+-+#             moe_output = moe_output.index_add(
+-+#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+#             )
+-+#         return moe_output
+-+
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#         """
+-+#         顶层 forward 方法，作为智能分发器。
+-+#         """
+-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+        
+-+#         # 1. 门控计算
+-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#         router_logits = self.gate(hidden_states_reshaped)
+-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+
+-+#         if self.norm_topk_prob:
+-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+        
+-+#         # 2. 调用统一的 MoE 计算内核
+-+#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+- 
+--    #     attn_weights = None
+--    #     if output_attentions:
+--    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+#         # 3. 统一处理共享专家
+-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+        
+-+#         # 4. 合并输出
+-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+        
+-+#         # 5. 恢复原始形状并返回
+-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+#         return final_hidden_states, router_logits
+-+
+-+
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     """
+-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+#     【最终高性能与高精度版】：
+-+#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+#     3. 这样实现了速度和准确性的两全其美。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_decode(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         """
+-+#         【解码路径】极致优化版：bmm + 高精度累加。
+-+#         """
+-+#         original_dtype = hidden_states.dtype
+-+#         batch_size, _ = hidden_states.shape
+-+        
+-+#         expert_outputs_list = [
+-+#             ops.cat([
+-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+#             ], dim=0) 
+-+#             for i in range(batch_size)
+-+#         ]
+-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+
+-+#         # 在 float32 下执行 bmm，得到高精度结果
+-+#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+        
+-+#         # 将高精度结果转换回原始数据类型
+-+#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+        
+-+#         return moe_output
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_prefill(
+-+#         self, 
+-+#         hidden_states: mindspore.Tensor, 
+-+#         selected_experts: mindspore.Tensor, 
+-+#         routing_weights: mindspore.Tensor
+-+#     ) -> mindspore.Tensor:
+-+#         """
+-+#         【预填充路径】与原始实现一致，结果精确。
+-+#         """
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens, _ = hidden_states.shape
+-+#         flat_selected_experts = selected_experts.flatten()
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+        
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+#             selected_token_indices = token_indices[mask]
+-+#             selected_routing_weights = routing_weights.flatten()[mask]
+-+#             current_states = hidden_states[selected_token_indices]
+-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+#             moe_output = moe_output.index_add(
+-+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+#             )
+-+#         return moe_output
+-+
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+        
+-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#         router_logits = self.gate(hidden_states_reshaped)
+-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+- 
+--    #     return attn_output, attn_weights, past_key_value
+-+#         if self.norm_topk_prob:
+-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+#         # 如果模型主体是 float16，后续再转换
+-+        
+-+#         moe_output = None
+-+#         if not self.training:
+-+#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+#             # _moe_infer_decode 内部会处理好类型转换
+-+#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+#             if sequence_length == 1:
+-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+#             else:
+-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+#         else:
+-+#             raise NotImplementedError("Training path is not implemented.")
+-+
+-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+        
+-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+#         return final_hidden_states, router_logits
+-+    
+- 
+--QWEN2MOE_ATTENTION_CLASSES = {
+--    "eager": Qwen2MoeAttention,
+--    "flash-attention": Qwen2MoeFlashAttention,
+--}
+-+# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+#     """
+-+#     【融合版】一个混合专家模块，内置两种推理策略，
+-+#     由外部全局变量 `Long_Prompt` 控制：
+-+
+-+#     - if Long_Prompt is True:  【精度优先模式】
+-+#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+#       适用于处理长序列，避免误差累积。
+-+
+-+#     - if Long_Prompt is False: 【速度优先模式】
+-+#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+#     """
+-+#     def __init__(self, config: Qwen2MoeConfig):
+-+#         super().__init__()
+-+#         self.num_experts = config.num_experts
+-+#         self.top_k = config.num_experts_per_tok
+-+#         self.norm_topk_prob = config.norm_topk_prob
+-+
+-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+#         self.experts = nn.ModuleList(
+-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+#         )
+-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+
+-+#     # --- 速度优先模式的辅助函数 ---
+-+#     @no_grad()
+-+#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+#         original_dtype = hidden_states.dtype
+-+#         batch_size, _ = hidden_states.shape
+-+#         expert_outputs_list = [
+-+#             ops.cat([
+-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+#             ], dim=0) 
+-+#             for i in range(batch_size)
+-+#         ]
+-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+
+-+#     @no_grad()
+-+#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens, _ = hidden_states.shape
+-+#         flat_selected_experts = selected_experts.flatten()
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+#             selected_token_indices = token_indices[mask]
+-+#             selected_routing_weights = routing_weights.flatten()[mask]
+-+#             current_states = hidden_states[selected_token_indices]
+-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+#         return moe_output
+-+
+-+#     # --- 精度优先模式的辅助函数 ---
+-+#     @no_grad()
+-+#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+#         moe_output = ops.zeros_like(hidden_states)
+-+#         num_tokens, _ = hidden_states.shape
+-+#         flat_selected_experts = selected_experts.flatten()
+-+#         flat_routing_weights = routing_weights.flatten()
+-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+#         active_experts = ops.unique(flat_selected_experts)
+-+#         for expert_idx_tensor in active_experts:
+-+#             expert_idx = expert_idx_tensor.item()
+-+#             expert_layer = self.experts[expert_idx]
+-+#             mask = (flat_selected_experts == expert_idx_tensor)
+-+#             current_token_indices = token_indices[mask]
+-+#             current_routing_weights = flat_routing_weights[mask]
+-+#             current_hidden_states = hidden_states[current_token_indices]
+-+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+#         return moe_output
+-+
+-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+#         global Long_Prompt
+-+
+-+#         # 1. 门控计算 (所有模式通用)
+-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+#         router_logits = self.gate(hidden_states_reshaped)
+-+#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+#         if self.norm_topk_prob:
+-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+#         moe_output = None
+-+#         if not self.training:
+-+#             # 根据 Long_Prompt 标志选择模式
+-+#             if Long_Prompt:
+-+#                 # --- 精度优先模式 ---
+-+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+#             else:
+-+#                 # --- 速度优先模式 ---
+-+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+#                 if sequence_length == 1:
+-+#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+#                 else:
+-+#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+#         else:
+-+#             raise NotImplementedError("Training path is not implemented.")
+-+
+-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+        
+-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+#         return final_hidden_states, router_logits
+-+    
+-+class Qwen2MoeSparseMoeBlock(nn.Module):
+-+    """
+-+    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+    控制的顶级推理策略：
+- 
+-+    - if Long_Prompt is True:  【精度优先模式】
+-+      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-+      适用于需要严格可复现性的长序列任务。
+- 
+--class Qwen2MoeSparseMoeBlock(nn.Module):
+--    def __init__(self, config):
+-+    - if Long_Prompt is False: 【速度优先模式】
+-+      采用业界最强的性能组合：
+-+      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-+      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-+    """
+-+    def __init__(self, config: Qwen2MoeConfig):
+-         super().__init__()
+-         self.num_experts = config.num_experts
+-         self.top_k = config.num_experts_per_tok
+-         self.norm_topk_prob = config.norm_topk_prob
+- 
+--        # gating
+-         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-         self.experts = nn.ModuleList(
+-             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-         )
+--
+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+- 
+--    #@dwj
+--    # 只遍历激活的专家，而非全部专家
+--    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--            batch_size, sequence_length, hidden_dim = hidden_states.shape
+--            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--            num_tokens = hidden_states_reshaped.shape[0]
+--
+--            router_logits = self.gate(hidden_states_reshaped)
+--            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--
+--            if self.norm_topk_prob:
+--                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--            routing_weights = routing_weights.to(hidden_states.dtype)
+--            
+--            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+--            flat_selected_experts = selected_experts.flatten()
+--            
+--            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+--            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+--            token_indices = broadcasted_token_indices.flatten()
+--            
+--            active_experts = ops.unique(flat_selected_experts)
+--            
+--            for expert_idx_tensor in active_experts:
+--                expert_idx = expert_idx_tensor.item()
+--                expert_layer = self.experts[expert_idx]
+--                
+--                mask = (flat_selected_experts == expert_idx_tensor)
+--                selected_token_indices = token_indices[mask]
+--                selected_routing_weights = routing_weights.flatten()[mask]
+--                
+--                current_states = hidden_states_reshaped[selected_token_indices]
+--                
+--                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--                
+--                final_hidden_states = final_hidden_states.index_add(
+--                    dim=0,
+--                    index=selected_token_indices,
+--                    source=expert_output.to(hidden_states.dtype)
+--                )
+--            
+--            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+--            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-+    @no_grad()
+-+    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+        original_dtype = hidden_states.dtype
+-+        batch_size, _ = hidden_states.shape
+-+        expert_outputs_list = [
+-+            ops.cat([
+-+                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+            ], dim=0)
+-+            for i in range(batch_size)
+-+        ]
+-+        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+        weights_fp32 = routing_weights.to(mindspore.float32)
+-+        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+        return moe_output_fp32.squeeze(1).to(original_dtype)
+-+
+-+    @no_grad()
+-+    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+        num_tokens, _ = hidden_states.shape
+-+        flat_selected_experts = selected_experts.flatten()
+-+        sorted_expert_indices = flat_selected_experts.argsort()
+-+        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+        original_token_indices = sorted_expert_indices // self.top_k
+-+        moe_output = ops.zeros_like(hidden_states)
+-+        current_token_offset = 0
+-+        for i in range(self.num_experts):
+-+            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+            if expert_token_count == 0:
+-+                continue
+-+            end_offset = current_token_offset + expert_token_count
+-+            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+            current_token_offset += expert_token_count
+-+        return moe_output
+-+
+-+    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+    @no_grad()
+-+    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+        moe_output = ops.zeros_like(hidden_states)
+-+        num_tokens, _ = hidden_states.shape
+-+        flat_selected_experts = selected_experts.flatten()
+-+        flat_routing_weights = routing_weights.flatten()
+-+        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+        active_experts = ops.unique(flat_selected_experts)
+-+        for expert_idx_tensor in active_experts:
+-+            expert_idx = expert_idx_tensor.item()
+-+            expert_layer = self.experts[expert_idx]
+-+            mask = (flat_selected_experts == expert_idx_tensor)
+-+            current_token_indices = token_indices[mask]
+-+            current_routing_weights = flat_routing_weights[mask]
+-+            current_hidden_states = hidden_states[current_token_indices]
+-+            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+        return moe_output
+- 
+--            final_hidden_states = final_hidden_states + shared_expert_output
+--            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+--            
+--            return final_hidden_states, router_logits
+-+    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+        global Long_Prompt
+-+
+-+        # 1. 门控计算 (所有模式通用)
+-+        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+        router_logits = self.gate(hidden_states_reshaped)
+-+        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+        if self.norm_topk_prob:
+-+            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+        
+-+        moe_output = None
+-+        if Long_Prompt:
+-+            # --- 精度优先模式 (ACCURACY MODE) ---
+-+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        else:
+-+            # --- 速度优先模式 (SPEED MODE) ---
+-+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+            if sequence_length == 1:
+-+                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+            else:
+-+                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        
+- 
+-+        # 3. 共享专家计算与合并 (所有模式通用)
+-+        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+        
+-+        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+        
+-+        return final_hidden_states, router_logits
+- 
+- class Qwen2MoeDecoderLayer(nn.Module):
+-     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-         super().__init__()
+-         self.hidden_size = config.hidden_size
+-+        
+-+        # if Long_Prompt:
+-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+        # else:
+-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+- 
+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+- 
+--        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+--
+-         if (layer_idx not in config.mlp_only_layers) and (
+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-         ):
+-@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-             self._warmed_up = True
+-             self.warmup_moe_model()
+- 
+-+
+-+
+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-         output_router_logits = (
+-             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-             router_logits=outputs.router_logits,
+-         )
+- 
+-+    def generate(self, *args, **kwargs):
+-+        """
+-+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+        """
+-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+
+-+        input_ids = kwargs.get("input_ids")
+-+        if input_ids is None and args:
+-+            input_ids = args[0]
+-+
+-+        if input_ids is not None:
+-+            prompt_length = input_ids.shape[1]
+-+            
+-+            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+                Long_Prompt = True
+-+            else:
+-+                Long_Prompt = False
+-+
+-+        return super().generate(*args, **kwargs)
+-+    
+-     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-     def prepare_inputs_for_generation(
+-         self,
+-@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-+        
+-         if past_key_values is not None:
+-             if inputs_embeds is not None:  # Exception 1
+-                 if 0 not in input_ids.shape:
+-@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-             }
+-         )
+-         return model_inputs
+-+
+- # @lwx
+-     # def _decode_one_tokens_logits(
+-     #     self,
+-@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-             attentions=outputs.attentions,
+-         )
+- 
+-+
+- __all__ = [
+-     "Qwen2MoeForCausalLM",
+-     "Qwen2MoeModel",
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-new file mode 100644
+-index 00000000..6dfb5b93
+---- /dev/null
+-+++ b/patches/0001-20251104commit.patch
+-@@ -0,0 +1,1272 @@
+-+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+Subject: [PATCH] 20251104commit
+-+
+-+---
+-+ mindnlp/transformers/cache_utils.py           |  28 +-
+-+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+
+-+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+index cadd2e04..02f8d4be 100644
+-+--- a/mindnlp/transformers/cache_utils.py
+-++++ b/mindnlp/transformers/cache_utils.py
+-+@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+             # k_out[:, :, cache_position] = key_states
+-+             # v_out[:, :, cache_position] = value_states
+-+-            if ON_ORANGE_PI:
+-+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+-            else:
+-+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+-
+-++            # if ON_ORANGE_PI:
+-++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++            # else:
+-++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++            if cache_position.ndim > 1:
+-++                cache_position = cache_position.flatten()
+-++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++                cache_position = cache_position.int()
+-++            
+-++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++            k_out[:, :, cache_position] = key_states
+-++            v_out[:, :, cache_position] = value_states
+-++            
+-+         return k_out, v_out
+-+ 
+-+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index c695b944..d8303e45 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+-    x1 = x[..., : x.shape[-1] // 2]
+-+-    x2 = x[..., x.shape[-1] // 2 :]
+-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++    # x1 = x[..., : x.shape[-1] // 2]
+-++    # x2 = x[..., x.shape[-1] // 2 :]
+-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+         if self.training:
+-+             raise NotImplementedError("Training is not supported yet.")
+-+         else:
+-+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+-        if self.config.n_shared_experts is not None:
+-+-            y = y + self.shared_experts(identity)
+-+-        return y
+-++            # @lwx
+-++            if orig_shape[1] == 1:
+-++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++                y=y.view(*orig_shape)
+-++                if self.config.n_shared_experts is not None:
+-++                    y = y + self.shared_experts(identity)
+-++                return y
+-++            else:
+-++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++                if self.config.n_shared_experts is not None:
+-++                    y = y + self.shared_experts(identity)
+-++                return y
+-++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++        # if self.config.n_shared_experts is not None:
+-++        #     y = y + self.shared_experts(identity)
+-++        # return y
+-++        
+-++    @no_grad()
+-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++
+-++        expert_cache = ops.zeros_like(x)
+-++        for i in range(self.num_experts_per_tok):
+-++            expert_id = flat_expert_indices[i].item()
+-++            weight = flat_expert_weights[i].item()
+-++            expert = self.experts[expert_id]
+-++            expert_out = expert(x)
+-++            expert_cache += expert_out * weight
+-++        return expert_cache
+-+ 
+-+     @no_grad()
+-+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-        # expert_cache = torch.zeros_like(x)
+-+-        # idxs = flat_expert_indices.argsort()
+-+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+-        # token_idxs = idxs // self.num_experts_per_tok
+-+-        # for i, end_idx in enumerate(tokens_per_expert):
+-+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+-        #     if start_idx == end_idx:
+-+-        #         continue
+-+-        #     expert = self.experts[i]
+-+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+-        #     expert_tokens = x[exp_token_idx]
+-+-        #     expert_out = expert(expert_tokens)
+-+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+-        # return expert_cache
+-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+         expert_cache = ops.zeros_like(x)
+-+         idxs = flat_expert_indices.argsort()
+-+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+         token_idxs = idxs // self.num_experts_per_tok
+-++
+-+         for i, end_idx in enumerate(tokens_per_expert):
+-+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+             if start_idx == end_idx:
+-+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+             expert_out = expert(expert_tokens)
+-+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++
+-+         return expert_cache
+-++        
+-++    # @no_grad()
+-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     # expert_cache = torch.zeros_like(x)
+-++    #     # idxs = flat_expert_indices.argsort()
+-++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++    #     #     if start_idx == end_idx:
+-++    #     #         continue
+-++    #     #     expert = self.experts[i]
+-++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #     #     expert_tokens = x[exp_token_idx]
+-++    #     #     expert_out = expert(expert_tokens)
+-++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++    #     # return expert_cache
+-++    #     expert_cache = ops.zeros_like(x)
+-++    #     idxs = flat_expert_indices.argsort()
+-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++    #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++    #         if start_idx == end_idx:
+-++    #             continue
+-++    #         expert = self.experts[i]
+-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #         expert_tokens = x[exp_token_idx]
+-++    #         expert_out = expert(expert_tokens)
+-++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++
+-++    #     return expert_cache
+-++    # @no_grad()
+-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     expert_cache = ops.zeros_like(x)
+-++
+-++    #     # 排序保证顺序一致
+-++    #     idxs = flat_expert_indices.argsort()
+-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++    #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++    #     # 找出有 token 的专家
+-++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++
+-++    #     for i in active_experts.tolist():
+-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++    #         end_idx = tokens_per_expert[i]
+-++    #         if start_idx == end_idx:  # 没有 token
+-++    #             continue
+-++
+-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #         expert_tokens = x[exp_token_idx]
+-++    #         expert_out = self.experts[i](expert_tokens)
+-++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++
+-++    #         expert_cache = mindspore.mint.scatter_add(
+-++    #             expert_cache,
+-++    #             0,
+-++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++    #             expert_out
+-++    #         )
+-++
+-++    #     return expert_cache
+-++
+-++
+-+ 
+-+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+ #     """
+-+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+ 
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-++        self.warm_up = False
+-++
+-++    def warmup_moe_model_deep(self):
+-++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++        test_texts = [
+-++            "warmup short",
+-++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++        ]
+-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++        if tokenizer is None:
+-++            from mindnlp.transformers import AutoTokenizer
+-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++            self._warmup_tokenizer = tokenizer
+-++
+-++        for text in test_texts:
+-++            inputs = tokenizer(text, return_tensors="ms")
+-++            with mindspore._no_grad():
+-++                _ = self(**inputs, use_cache=False)
+-++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+ 
+-+     def get_input_embeddings(self):
+-+         return self.model.embed_tokens
+-+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+         ```"""
+-++        if not self.warm_up:
+-++            self.warm_up = True
+-++            self.warmup_moe_model_deep()
+-++
+-+         output_attentions = (
+-+             output_attentions
+-+             if output_attentions is not None
+-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+index 3cbf820e..d4c6b651 100644
+-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+@@ -18,7 +18,6 @@
+-+ # See the License for the specific language governing permissions and
+-+ # limitations under the License.
+-+ """MindSpore Qwen2MoE model."""
+-+-
+-+ import math
+-+ from typing import List, Optional, Tuple, Union
+-+ 
+-+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+     TokenClassifierOutput,
+-+ )
+-+ from ...modeling_utils import PreTrainedModel
+-++from ...generation import GenerationMixin
+-+ from ....utils import logging
+-+ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+ 
+-+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+         self.variance_epsilon = eps
+-+ 
+-+     def forward(self, hidden_states):
+-++        # @dwj
+-++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++        # @lwx
+-++        # if not self.training :
+-++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+         input_dtype = hidden_states.dtype
+-+         hidden_states = hidden_states.to(mindspore.float32)
+-+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+@@ -234,6 +239,8 @@ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+     x1 = x[..., : x.shape[-1] // 2]
+-+     x2 = x[..., x.shape[-1] // 2 :]
+-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+         self.config = config
+-+         self.hidden_size = config.hidden_size
+-+         self.intermediate_size = intermediate_size
+-++        
+-+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+         self.act_fn = ACT2FN[config.hidden_act]
+-+ 
+-+     def forward(self, x):
+-+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+-
+-+ 
+-++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++        # @lwx 
+-++        # gate_up_output = self.gate_up_proj(x)
+-++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++        # return self.down_proj(swiglu_output)
+-++
+-++    # def forward(self, x):
+-++    #     gate_proj_out = self.gate_proj(x)
+-++    #     up_proj_out = self.up_proj(x)
+-++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++    #     return self.down_proj(swiglu_out)
+-++        
+-+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+     """
+-+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+         use_cache: bool = False,
+-+         cache_position: Optional[mindspore.Tensor] = None,
+-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++        
+-++
+-+         bsz, q_len, _ = hidden_states.shape
+-+ 
+-+         query_states = self.q_proj(hidden_states)
+-+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+                     "with a layer index."
+-+                 )
+-+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++            if isinstance(past_key_value, StaticCache):
+-++                kv_seq_len = key_states.shape[-2]
+-++            else:
+-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+ 
+-+         if past_key_value is not None:
+-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++            
+-++            if isinstance(past_key_value, StaticCache):
+-++                kv_seq_len = key_states.shape[-2]
+-+ 
+-+         # repeat k/v heads if n_kv_heads < n_heads
+-+         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+-
+-++        
+-+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+ 
+-+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+-            raise ValueError(
+-+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+-                f" {attn_weights.shape}"
+-+-            )
+-+-
+-+-        if attention_mask is not None:  # no matter the length, we just slice it
+-+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++        if attention_mask is not None:
+-++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+             attn_weights = attn_weights + causal_mask
+-+ 
+-+         # upcast attention to fp32
+-+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+ 
+-+         attn_output = self.o_proj(attn_output)
+-+-
+-++        # @lwx
+-++        
+-++        # max_seq_len = self.max_position_embeddings  # 2048
+-++
+-++        # if attention_mask is not None:
+-++        #     # attention_mask: [B, 1, Sq, Sk]
+-++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++
+-++        #     # pad 到 [max_seq_len, max_seq_len]
+-++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++        #     global_attention_mask = padded_mask
+-++        # else:
+-++        #     global_attention_mask = None
+-++
+-++
+-++        # sparse_mode=3
+-++        # attn_output = mindspore.ops.flash_attention_score(
+-++        #     query=query_states,
+-++        #     key=key_states,
+-++        #     value=value_states,
+-++        #     real_shift=None,
+-++        #     padding_mask=None,
+-++
+-++        #     head_num=self.num_heads,
+-++        #     attn_mask=global_attention_mask,
+-++        #     keep_prob=1.0 - self.attention_dropout,
+-++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++        #     input_layout="BNSD",
+-++        #     pre_tokens=2147483647,
+-++        #     next_tokens=2147483647,
+-++        #     inner_precise=0,
+-++        #     drop_mask=None,
+-++        #     prefix=None,
+-++        #     actual_seq_qlen=None,
+-++        #     actual_seq_kvlen=None,
+-++        #     sparse_mode=sparse_mode,
+-++        # )
+-+         if not output_attentions:
+-+             attn_weights = None
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+ 
+-++class Qwen2MoeFlashAttention(nn.Module):
+-++    """
+-++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++
+-++    关键改动:
+-++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++       直接传入原始的 key 和 value 张量效率更高。
+-++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++    """
+-++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++        super().__init__()
+-++        self.config = config
+-++        self.layer_idx = layer_idx
+-++        self.hidden_size = config.hidden_size
+-++        self.num_heads = config.num_attention_heads
+-++        self.head_dim = self.hidden_size // self.num_heads
+-++        self.num_key_value_heads = config.num_key_value_heads
+-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++        self.max_position_embeddings = config.max_position_embeddings
+-++        self.rope_theta = config.rope_theta
+-++        self.attention_dropout = config.attention_dropout
+-++
+-++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++            raise ValueError(
+-++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++            )
+-++
+-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++
+-++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++            self.head_dim,
+-++            max_position_embeddings=self.max_position_embeddings,
+-++            base=self.rope_theta,
+-++        )
+-++
+-++    def forward(
+-++        self,
+-++        hidden_states: mindspore.Tensor,
+-++        attention_mask: Optional[mindspore.Tensor] = None,
+-++        position_ids: Optional[mindspore.Tensor] = None,
+-++        past_key_value: Optional[Cache] = None,
+-++        output_attentions: bool = False,
+-++        use_cache: bool = False,
+-++        cache_position: Optional[mindspore.Tensor] = None,
+-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++        bsz, q_len, _ = hidden_states.shape
+-++
+-++        # 1. 线性投射 Q, K, V
+-++        query_states = self.q_proj(hidden_states)
+-++        key_states = self.k_proj(hidden_states)
+-++        value_states = self.v_proj(hidden_states)
+-++
+-++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++        # 3. RoPE 旋转位置编码
+-++        kv_seq_len = key_states.shape[-2]
+-++        if past_key_value is not None:
+-++            if self.layer_idx is None:
+-++                raise ValueError(
+-++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++                    "with a layer index."
+-++                )
+-++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++                if cache_position.shape[0] == 1:
+-++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++                    kv_seq_len = past_seen_tokens + 1
+-++                else:
+-++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++            else:
+-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++        # 4. KV 缓存更新
+-++        if past_key_value is not None:
+-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++            key_states, value_states = past_key_value.update(
+-++                key_states, value_states, self.layer_idx, cache_kwargs
+-++            )
+-++            
+-++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++                if cache_position.shape[0] == 1:
+-++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++                    kv_seq_len = key_states.shape[-2]
+-++
+-++        # 5. [重要] 准备 Attention Mask
+-++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++        fa_attention_mask = None
+-++        if attention_mask is not None:
+-++            # 截取与当前key长度匹配的部分
+-++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++            fa_attention_mask = (mask_slice != 0)
+-++
+-++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++        input_dtype = query_states.dtype
+-++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++            query_states = query_states.to(mindspore.float16)
+-++            key_states = key_states.to(mindspore.float16)
+-++            value_states = value_states.to(mindspore.float16)
+-++
+-++        # 6. [核心] 调用 flash_attention_score 算子
+-++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++        attn_output = mindspore.ops.flash_attention_score(
+-++            query=query_states,
+-++            key=key_states,
+-++            value=value_states,
+-++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++            attn_mask=fa_attention_mask,
+-++            keep_prob=1.0 - self.attention_dropout,
+-++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++            input_layout="BNSD",
+-++            sparse_mode=0 # 使用 defaultMask 模式
+-++        )
+-++
+-++        # 恢复原始数据类型
+-++        attn_output = attn_output.to(input_dtype)
+-++
+-++        # 7. 调整输出形状
+-++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++        attn_output = self.o_proj(attn_output)
+-++
+-++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++        attn_weights = None
+-++        if output_attentions:
+-++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++        return attn_output, attn_weights, past_key_value
+-++
+-++    # def forward(
+-++    #     self,
+-++    #     hidden_states: mindspore.Tensor,
+-++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++    #     past_key_value: Optional[Cache] = None,
+-++    #     output_attentions: bool = False,
+-++    #     use_cache: bool = False,
+-++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++    #     bsz, q_len, _ = hidden_states.shape
+-++
+-++    #     # 1. 线性投射 Q, K, V
+-++    #     query_states = self.q_proj(hidden_states)
+-++    #     key_states = self.k_proj(hidden_states)
+-++    #     value_states = self.v_proj(hidden_states)
+-++
+-++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++    #     # 3. RoPE 旋转位置编码
+-++    #     kv_seq_len = key_states.shape[-2]
+-++    #     if past_key_value is not None:
+-++    #         if self.layer_idx is None:
+-++    #             raise ValueError(
+-++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++    #                 "with a layer index."
+-++    #             )
+-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++    #     # 4. KV 缓存更新
+-++    #     if past_key_value is not None:
+-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++    #         key_states, value_states = past_key_value.update(
+-++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++    #         )
+-++
+-++    #     # 5. 准备 Attention Mask
+-++    #     fa_attention_mask = None
+-++    #     if attention_mask is not None:
+-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++    #         fa_attention_mask = (mask_slice != 0)
+-++
+-++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++    #     input_dtype = query_states.dtype
+-++
+-++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++    #     attn_output = mindspore.ops.flash_attention_score(
+-++    #         query=query_states,
+-++    #         key=key_states,
+-++    #         value=value_states,
+-++    #         head_num=self.num_heads,
+-++    #         attn_mask=fa_attention_mask,
+-++    #         keep_prob=1.0 - self.attention_dropout,
+-++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++    #         input_layout="BNSD",
+-++    #         sparse_mode=0,
+-++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++    #         inner_precise=1
+-++    #     )
+-++
+-++    #     # 恢复原始数据类型
+-++    #     attn_output = attn_output.to(input_dtype)
+-++
+-++    #     # 7. 调整输出形状
+-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++    #     attn_output = self.o_proj(attn_output)
+-++
+-++    #     attn_weights = None
+-++    #     if output_attentions:
+-++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++    #     return attn_output, attn_weights, past_key_value
+-++
+-++    # def forward(
+-++    #     self,
+-++    #     hidden_states: mindspore.Tensor,
+-++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++    #     past_key_value: Optional[Cache] = None,
+-++    #     output_attentions: bool = False,
+-++    #     use_cache: bool = False,
+-++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++    #     bsz, q_len, _ = hidden_states.shape
+-++
+-++    #     query_states = self.q_proj(hidden_states)
+-++    #     key_states = self.k_proj(hidden_states)
+-++    #     value_states = self.v_proj(hidden_states)
+-++
+-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++    #     kv_seq_len = key_states.shape[-2]
+-++    #     if past_key_value is not None:
+-++    #         if self.layer_idx is None:
+-++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++    #     if past_key_value is not None:
+-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++    #         key_states, value_states = past_key_value.update(
+-++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++    #         )
+-++
+-++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++
+-++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++    #     # <--- 修改结束 ---
+-++
+-++    #     fa_attention_mask = None
+-++    #     if attention_mask is not None:
+-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++    #         fa_attention_mask = (mask_slice != 0)
+-++
+-++    #     input_dtype = query_states.dtype
+-++
+-++    #     attn_output = mindspore.ops.flash_attention_score(
+-++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++    #         key=key_states,
+-++    #         value=value_states,
+-++    #         head_num=self.num_heads,
+-++    #         attn_mask=fa_attention_mask,
+-++    #         keep_prob=1.0 - self.attention_dropout,
+-++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++    #         input_layout="BNSD",
+-++    #         sparse_mode=0,
+-++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++    #     )
+-++
+-++    #     attn_output = attn_output.to(input_dtype)
+-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++    #     attn_output = self.o_proj(attn_output)
+-++
+-++    #     attn_weights = None
+-++    #     if output_attentions:
+-++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++
+-++    #     return attn_output, attn_weights, past_key_value
+-++
+-+ QWEN2MOE_ATTENTION_CLASSES = {
+-+     "eager": Qwen2MoeAttention,
+-++    "flash-attention": Qwen2MoeFlashAttention,
+-+ }
+-+ 
+-+ 
+-+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+ 
+-++    #@dwj
+-++    # 只遍历激活的专家，而非全部专家
+-+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+-        # router_logits: (batch * sequence_length, n_experts)
+-+-        router_logits = self.gate(hidden_states)
+-+-
+-+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-        if self.norm_topk_prob:
+-+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        # we cast back to the input dtype
+-+-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+-
+-+-        final_hidden_states = ops.zeros(
+-+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+-        )
+-+-
+-+-        # One hot encode the selected experts to create an expert mask
+-+-        # this will be used to easily index which expert is going to be sollicitated
+-+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+-
+-+-        # Loop over all available experts in the model and perform the computation on each expert
+-+-        for expert_idx in range(self.num_experts):
+-+-            expert_layer = self.experts[expert_idx]
+-+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+-
+-+-            # Index the correct hidden states and compute the expert hidden state for
+-+-            # the current expert. We need to make sure to multiply the output hidden
+-+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+-            if 0 not in idx.shape:
+-+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+-
+-+-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+-                # the `top_x` tensor here.
+-+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+-
+-+-        shared_expert_output = self.shared_expert(hidden_states)
+-+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+-
+-+-        final_hidden_states = final_hidden_states + shared_expert_output
+-++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++            num_tokens = hidden_states_reshaped.shape[0]
+-++
+-++            router_logits = self.gate(hidden_states_reshaped)
+-++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++
+-++            if self.norm_topk_prob:
+-++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++            
+-++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++            flat_selected_experts = selected_experts.flatten()
+-++            
+-++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++            token_indices = broadcasted_token_indices.flatten()
+-++            
+-++            active_experts = ops.unique(flat_selected_experts)
+-++            
+-++            for expert_idx_tensor in active_experts:
+-++                expert_idx = expert_idx_tensor.item()
+-++                expert_layer = self.experts[expert_idx]
+-++                
+-++                mask = (flat_selected_experts == expert_idx_tensor)
+-++                selected_token_indices = token_indices[mask]
+-++                selected_routing_weights = routing_weights.flatten()[mask]
+-++                
+-++                current_states = hidden_states_reshaped[selected_token_indices]
+-++                
+-++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++                
+-++                final_hidden_states = final_hidden_states.index_add(
+-++                    dim=0,
+-++                    index=selected_token_indices,
+-++                    source=expert_output.to(hidden_states.dtype)
+-++                )
+-++            
+-++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+ 
+-+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+-        return final_hidden_states, router_logits
+-++            final_hidden_states = final_hidden_states + shared_expert_output
+-++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++            
+-++            return final_hidden_states, router_logits
+-+ 
+-+ 
+-+ class Qwen2MoeDecoderLayer(nn.Module):
+-+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+ 
+-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+ 
+-++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++
+-+         if (layer_idx not in config.mlp_only_layers) and (
+-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+         ):
+-+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+     _skip_keys_device_placement = "past_key_values"
+-+     _supports_cache_class = True
+-++#lwx
+-++    # _supports_static_cache = True
+-+ 
+-+     def _init_weights(self, module):
+-+         std = self.config.initializer_range
+-+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+         return causal_mask
+-+ 
+-+ 
+-+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+     _tied_weights_keys = ["lm_head.weight"]
+-+ 
+-+     def __init__(self, config):
+-+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+         self.num_experts_per_tok = config.num_experts_per_tok
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-++        # @lwx
+-++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++        #     self.generation_config.cache_implementation = "static"
+-++        self._warmed_up = False
+-++
+-++    def warmup_moe_model(self):
+-++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++        test_texts = [
+-++            "warmup short",
+-++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++        ]
+-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++        if tokenizer is None:
+-++            from mindnlp.transformers import AutoTokenizer
+-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++            self._warmup_tokenizer = tokenizer
+-++
+-++        for text in test_texts:
+-++            inputs = tokenizer(text, return_tensors="ms")
+-++            with mindspore._no_grad():
+-++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+ 
+-+     def get_input_embeddings(self):
+-+         return self.model.embed_tokens
+-+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+         ```"""
+-++        if not self._warmed_up:
+-++            self._warmed_up = True
+-++            self.warmup_moe_model()
+-+ 
+-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+         output_router_logits = (
+-+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+             }
+-+         )
+-+         return model_inputs
+-++# @lwx
+-++    # def _decode_one_tokens_logits(
+-++    #     self,
+-++    #     cur_token: mindspore.Tensor,
+-++    #     input_pos: Optional[mindspore.Tensor],
+-++    #     cache_position: mindspore.Tensor,
+-++    #     past_key_values: StaticCache,
+-++    # ) -> mindspore.Tensor:
+-++    #     """
+-++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++        
+-++    #     Args:
+-++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++    #         input_pos: 输入位置信息，可选
+-++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++            
+-++    #     Returns:
+-++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++    #     """
+-++    #     # 调用JIT编译的版本
+-++    #     return self.get_decode_one_tokens_logits(
+-++    #         cur_token=cur_token,
+-++    #         input_pos=input_pos,
+-++    #         cache_position=cache_position,
+-++    #         past_key_values=past_key_values,
+-++    #     )
+-++    
+-++    # @mindspore.jit(jit_level='O1')
+-++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++    #     """
+-++    #     JIT编译的函数，用于高效的单token解码
+-++    #     使用JIT编译优化以支持静态shape和高效执行
+-++        
+-++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++    #     """
+-++    #     outputs = self.model.forward(
+-++    #         input_ids=cur_token,
+-++    #         position_ids=input_pos,
+-++    #         cache_position=cache_position,
+-++    #         past_key_values=past_key_values,
+-++    #         use_cache=True,
+-++    #         return_dict=False,
+-++    #     )
+-++        
+-++    #     hidden_states = outputs[0]
+-++    #     logits = self.lm_head.forward(hidden_states)
+-++    #     logits = logits.float()
+-++        
+-++    #     return logits[:, -1, :]
+-++
+-++    # def _sample(
+-++    #     self,
+-++    #     input_ids: mindspore.Tensor,
+-++    #     logits_processor,
+-++    #     stopping_criteria,
+-++    #     generation_config,
+-++    #     synced_devices: bool,
+-++    #     streamer=None,
+-++    #     logits_warper=None,
+-++    #     **model_kwargs,
+-++    # ):
+-++    #     """
+-++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++    #     """
+-++    #     from ...generation.logits_process import LogitsProcessorList
+-++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++    #     from mindnlp.core import nn, ops, no_grad
+-++    #     import numpy as np
+-++        
+-++    #     # 检查是否使用 StaticCache
+-++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++    #     # 否则，直接调用父类方法
+-++    #     past_key_values = model_kwargs.get("past_key_values")
+-++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++
+-++    #     if not isinstance(past_key_values, StaticCache):
+-++    #         # 不使用 StaticCache，直接调用父类方法
+-++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++    #         return super()._sample(
+-++    #             input_ids=input_ids,
+-++    #             logits_processor=logits_processor,
+-++    #             stopping_criteria=stopping_criteria,
+-++    #             generation_config=generation_config,
+-++    #             synced_devices=synced_devices,
+-++    #             streamer=streamer,
+-++    #             logits_warper=logits_warper,
+-++    #             **model_kwargs,
+-++    #         )
+-++        
+-++    #     # 使用 StaticCache，进入自定义循环
+-++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++    #     pad_token_id = generation_config._pad_token_tensor
+-++    #     output_attentions = generation_config.output_attentions
+-++    #     output_hidden_states = generation_config.output_hidden_states
+-++    #     output_scores = generation_config.output_scores
+-++    #     output_logits = generation_config.output_logits
+-++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++    #     max_length = generation_config.max_length
+-++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++    #     do_sample = generation_config.do_sample
+-++        
+-++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++    #         raise ValueError(
+-++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++    #             f"{logits_warper})."
+-++    #         )
+-++        
+-++    #     # init attention / hidden states / scores tuples
+-++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++        
+-++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++    #         encoder_hidden_states = (
+-++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++    #         )
+-++        
+-++    #     # keep track of which sequences are already finished
+-++    #     batch_size, cur_len = input_ids.shape
+-++    #     this_peer_finished = False
+-++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++        
+-++    #     time_record = []
+-++    #     from ....utils.testing_utils import parse_flag_from_env
+-++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++        
+-++    #     while self._has_unfinished_sequences(
+-++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++    #     ):
+-++    #         if _record_time:
+-++    #             import time as time_module
+-++    #             infer_start = time_module.time()
+-++            
+-++    #         # prepare model inputs
+-++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++            
+-++    #         # prepare variable output controls
+-++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++            
+-++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++    #         cur_cache_position = model_inputs.get("cache_position")
+-++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++    #         cur_input_ids = model_inputs.get("input_ids")
+-++            
+-++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++    #             cur_cache_position is not None and 
+-++    #             len(cur_cache_position.shape) > 0 and
+-++    #             cur_cache_position.shape[0] == 1 and
+-++    #             cur_input_ids is not None and
+-++    #             cur_input_ids.shape[1] == 1):
+-++    #             # 使用 JIT 优化的单 token 解码
+-++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++    #             if not hasattr(self, '_jit_used'):
+-++    #                 self._jit_used = False
+-++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++                
+-++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++    #                 cur_token=cur_input_ids,
+-++    #                 input_pos=model_inputs.get("position_ids"),
+-++    #                 cache_position=cur_cache_position,
+-++    #                 past_key_values=cur_past_key_values,
+-++    #             )
+-++                
+-++    #             # 标记已使用JIT（用于后续判断）
+-++    #             if not self._jit_used:
+-++    #                 self._jit_used = True
+-++                
+-++    #             # 构造兼容的输出对象
+-++    #             class JitOptimizedOutput:
+-++    #                 def __init__(self, logits, config):
+-++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++    #                     self.config = config
+-++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++    #                     self.cross_attentions = None
+-++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++                
+-++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++    #         else:
+-++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++    #             outputs = self(**model_inputs, return_dict=True)
+-++            
+-++    #         if synced_devices and this_peer_finished:
+-++    #             continue
+-++            
+-++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++    #         next_token_logits = outputs.logits[:, -1, :]
+-++            
+-++    #         # pre-process distribution
+-++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++    #         if do_sample:
+-++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++            
+-++    #         # Store scores, attentions and hidden_states when required
+-++    #         if return_dict_in_generate:
+-++    #             if output_scores:
+-++    #                 scores += (next_token_scores,)
+-++    #             if output_logits:
+-++    #                 raw_logits += (next_token_logits,)
+-++    #             if output_attentions:
+-++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++    #                 if self.config.is_encoder_decoder:
+-++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++                
+-++    #             if output_hidden_states:
+-++    #                 hidden = (
+-++    #                     outputs.decoder_hidden_states
+-++    #                     if self.config.is_encoder_decoder
+-++    #                     else outputs.hidden_states
+-++    #                 )
+-++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++            
+-++    #         # token selection
+-++    #         if do_sample:
+-++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++    #         else:
+-++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++            
+-++    #         # finished sentences should have their next token be a padding token
+-++    #         if has_eos_stopping_criteria:
+-++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++            
+-++    #         # update generated ids, model inputs, and length for next step
+-++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++    #         if streamer is not None:
+-++    #             streamer.put(next_tokens)
+-++            
+-++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++    #             outputs,
+-++    #             model_kwargs,
+-++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++    #         )
+-++            
+-++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++    #         cur_len += 1
+-++            
+-++    #         if _record_time:
+-++    #             import time as time_module
+-++    #             infer_stop = time_module.time()
+-++    #             time_record.append(infer_stop - infer_start)
+-++            
+-++    #         del outputs
+-++        
+-++    #     average_infer_time = None
+-++    #     if time_record:
+-++    #         if len(time_record) > 1:
+-++    #             time_record.pop(0)
+-++    #         average_infer_time = sum(time_record) / len(time_record)
+-++    #         print(f'average inference time is: {average_infer_time}')
+-++    #         print(f'inference time record: {time_record}')
+-++        
+-++    #     if streamer is not None:
+-++    #         streamer.end()
+-++        
+-++    #     # 简单判断：打印是否使用了JIT路径
+-++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++    #     else:
+-++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++        
+-++    #     if return_dict_in_generate:
+-++    #         if self.config.is_encoder_decoder:
+-++    #             return GenerateEncoderDecoderOutput(
+-++    #                 sequences=input_ids,
+-++    #                 scores=scores,
+-++    #                 logits=raw_logits,
+-++    #                 encoder_attentions=encoder_attentions,
+-++    #                 encoder_hidden_states=encoder_hidden_states,
+-++    #                 decoder_attentions=decoder_attentions,
+-++    #                 cross_attentions=cross_attentions,
+-++    #                 decoder_hidden_states=decoder_hidden_states,
+-++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++    #                 average_infer_time=average_infer_time
+-++    #             )
+-++    #         else:
+-++    #             return GenerateDecoderOnlyOutput(
+-++    #                 sequences=input_ids,
+-++    #                 scores=scores,
+-++    #                 logits=raw_logits,
+-++    #                 attentions=decoder_attentions,
+-++    #                 hidden_states=decoder_hidden_states,
+-++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++    #                 average_infer_time=average_infer_time
+-++    #             )
+-++    #     else:
+-++    #         return input_ids
+-++            
+-++    # def _prepare_cache_for_generation(
+-++    #     self,
+-++    #     generation_config,
+-++    #     model_kwargs,
+-++    #     assistant_model,
+-++    #     batch_size,
+-++    #     max_cache_length,
+-++    # ):
+-++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++    #         generation_config.cache_implementation = "static"
+-++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++        
+-++    #     if generation_config.cache_implementation == "static":
+-++    #         base_required_from_max_length = generation_config.max_length + 1
+-++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++    #         min_cache_size = 50
+-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++    #         else:
+-++    #             max_cache_length = max(base_required, min_cache_size)
+-++            
+-++    #         original_max_cache_length = max_cache_length
+-++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++            
+-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++    #             if max_cache_length > self.config.max_position_embeddings:
+-++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++        
+-++    #     result = super()._prepare_cache_for_generation(
+-++    #         generation_config=generation_config,
+-++    #         model_kwargs=model_kwargs,
+-++    #         assistant_model=assistant_model,
+-++    #         batch_size=batch_size,
+-++    #         max_cache_length=max_cache_length,
+-++    #     )
+-++        
+-++    #     if generation_config.cache_implementation == "static":
+-++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++    #         created_cache = model_kwargs.get(cache_name)
+-++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++        
+-++    #     return result
+-++
+-++
+-++
+-+ 
+-+ 
+-+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+deleted file mode 100644
+index 179a9bb5..00000000
+--- a/patches/0003-20261106secondcommit.patch
++++ /dev/null
+@@ -1,2769 +0,0 @@
+-From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Thu, 6 Nov 2025 14:54:37 +0800
+-Subject: [PATCH 3/8] 20261106secondcommit
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |  217 ++-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+- patches/0001-20251104commit.patch             | 1272 -----------------
+- 3 files changed, 528 insertions(+), 2032 deletions(-)
+- delete mode 100644 patches/0001-20251104commit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index 73773c22..2f9192bf 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+- 
+- _CONFIG_FOR_DOC = "DeepseekConfig"
+- 
+-+_attn_mask_cache = {}
+-+
+-+def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-+    q_len = batch_and_seq[1]
+-+    kv_len = batch_and_seq[1] + past_key_values_length 
+-+    key = (batch_and_seq[0], q_len, kv_len)
+-+
+-+    if key in _attn_mask_cache:
+-+        return _attn_mask_cache[key]
+-+
+-+    mask = _prepare_4d_causal_attention_mask(
+-+        attention_mask,
+-+        batch_and_seq,
+-+        inputs_embeds,
+-+        past_key_values_length,
+-+    )
+-+    _attn_mask_cache[key] = mask
+-+    return mask
+- 
+- def _get_unpad_data(attention_mask):
+-     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-         return final_output
+- 
+- 
+--    @no_grad()
+--    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+--        expert_cache = ops.zeros_like(x)
+--        idxs = flat_expert_indices.argsort()
+--        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--        token_idxs = idxs // self.num_experts_per_tok
+--
+--        for i, end_idx in enumerate(tokens_per_expert):
+--            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--            if start_idx == end_idx:
+--                continue
+--            expert = self.experts[i]
+--            exp_token_idx = token_idxs[start_idx:end_idx]
+--            expert_tokens = x[exp_token_idx]
+--            expert_out = expert(expert_tokens)
+--            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+--            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+--
+--        return expert_cache
+--        
+-     # @no_grad()
+--    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--    #     # expert_cache = torch.zeros_like(x)
+--    #     # idxs = flat_expert_indices.argsort()
+--    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+--    #     # token_idxs = idxs // self.num_experts_per_tok
+--    #     # for i, end_idx in enumerate(tokens_per_expert):
+--    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+--    #     #     if start_idx == end_idx:
+--    #     #         continue
+--    #     #     expert = self.experts[i]
+--    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+--    #     #     expert_tokens = x[exp_token_idx]
+--    #     #     expert_out = expert(expert_tokens)
+--    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+--    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+--    #     # return expert_cache
+-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-     #     expert_cache = ops.zeros_like(x)
+-     #     idxs = flat_expert_indices.argsort()
+-     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+- 
+-     #     return expert_cache
+--    # @no_grad()
+--    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--    #     expert_cache = ops.zeros_like(x)
+-+        
+-+    @no_grad()
+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+        """
+-+        优化版 MoE prefill：
+-+        - 批量张量化处理同一个 expert 的所有 token
+-+        - 跳过无 token 的专家
+-+        - 保持结果完全一致
+-+        """
+-+        # 初始化输出缓存
+-+        expert_cache = ops.zeros_like(x)
+- 
+--    #     # 排序保证顺序一致
+--    #     idxs = flat_expert_indices.argsort()
+--    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--    #     token_idxs = idxs // self.num_experts_per_tok
+-+        # 排序（确保 scatter_add 位置对应原逻辑）
+-+        idxs = flat_expert_indices.argsort()
+-+        sorted_expert_indices = flat_expert_indices[idxs]
+-+        sorted_token_indices = idxs // self.num_experts_per_tok
+- 
+--    #     # 找出有 token 的专家
+--    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+        # 每个 expert 的 token 数
+-+        tokens_per_expert = sorted_expert_indices.bincount()
+- 
+--    #     for i in active_experts.tolist():
+--    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--    #         end_idx = tokens_per_expert[i]
+--    #         if start_idx == end_idx:  # 没有 token
+--    #             continue
+-+        # 找出有 token 的专家
+-+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+- 
+--    #         exp_token_idx = token_idxs[start_idx:end_idx]
+--    #         expert_tokens = x[exp_token_idx]
+--    #         expert_out = self.experts[i](expert_tokens)
+--    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+        for expert_id in active_experts.tolist():
+-+            # 取该 expert 对应的排序后 token 区间
+-+            start = (tokens_per_expert[:expert_id]).sum().item()
+-+            end = start + tokens_per_expert[expert_id].item()
+- 
+--    #         expert_cache = mindspore.mint.scatter_add(
+--    #             expert_cache,
+--    #             0,
+--    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+--    #             expert_out
+--    #         )
+-+            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-+            expert_tokens = x[token_idx]                     # 取输入向量
+- 
+--    #     return expert_cache
+-+            # 执行专家 MLP
+-+            expert_out = self.experts[expert_id](expert_tokens)
+-+
+-+            # 按权重缩放
+-+            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-+
+-+            # 回写到缓存（等价 scatter_add）
+-+            expert_cache = mindspore.mint.scatter_add(
+-+                expert_cache,
+-+                0,
+-+                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+                scaled_out
+-+            )
+-+
+-+        return expert_cache
+-+
+-+        # @no_grad()
+-+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+        #     # expert_cache = torch.zeros_like(x)
+-+        #     # idxs = flat_expert_indices.argsort()
+-+        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+        #     # token_idxs = idxs // self.num_experts_per_tok
+-+        #     # for i, end_idx in enumerate(tokens_per_expert):
+-+        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+        #     #     if start_idx == end_idx:
+-+        #     #         continue
+-+        #     #     expert = self.experts[i]
+-+        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+        #     #     expert_tokens = x[exp_token_idx]
+-+        #     #     expert_out = expert(expert_tokens)
+-+        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+        #     # return expert_cache
+-+        #     expert_cache = ops.zeros_like(x)
+-+        #     idxs = flat_expert_indices.argsort()
+-+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+        #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+        #     for i, end_idx in enumerate(tokens_per_expert):
+-+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+        #         if start_idx == end_idx:
+-+        #             continue
+-+        #         expert = self.experts[i]
+-+        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+        #         expert_tokens = x[exp_token_idx]
+-+        #         expert_out = expert(expert_tokens)
+-+        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+
+-+        #     return expert_cache
+-+        # @no_grad()
+-+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+        #     expert_cache = ops.zeros_like(x)
+-+
+-+        #     # 排序保证顺序一致
+-+        #     idxs = flat_expert_indices.argsort()
+-+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+        #     token_idxs = idxs // self.num_experts_per_tok
+-+
+-+        #     # 找出有 token 的专家
+-+        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+
+-+        #     for i in active_experts.tolist():
+-+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+        #         end_idx = tokens_per_expert[i]
+-+        #         if start_idx == end_idx:  # 没有 token
+-+        #             continue
+-+
+-+        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+        #         expert_tokens = x[exp_token_idx]
+-+        #         expert_out = self.experts[i](expert_tokens)
+-+        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+
+-+        #         expert_cache = mindspore.mint.scatter_add(
+-+        #             expert_cache,
+-+        #             0,
+-+        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+        #             expert_out
+-+        #         )
+-+
+-+        #     return expert_cache
+- 
+- 
+- 
+-@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+--
+- # class DeepseekFlashAttention(nn.Module):
+- #     """
+- #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+-+
+- Deepseek_ATTENTION_CLASSES = {
+-     "eager": DeepseekAttention,
+-     "flash-attention": DeepseekFlashAttention,
+-@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-             )
+-         else:
+-             # 4d mask is passed through the layers
+--            attention_mask = _prepare_4d_causal_attention_mask(
+-+            # attention_mask = _prepare_4d_causal_attention_mask(
+-+            #     attention_mask,
+-+            #     (batch_size, seq_length),
+-+            #     inputs_embeds,
+-+            #     past_key_values_length,
+-+            # )
+-+            #@dwj
+-+            attention_mask = get_cached_causal_mask(
+-                 attention_mask,
+-                 (batch_size, seq_length),
+-                 inputs_embeds,
+-@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-         self.warm_up = False
+-+        #@dwj
+-+        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-+            self.num_layers,
+-+            self.num_attention_heads,
+-+            self.head_dim,
+-+            batch_size=1,
+-+            max_length=self.max_length,
+-+            dtype=mindspore.float16
+-+        )
+-+
+-+    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-+        key_cache = []
+-+        value_cache = []
+-+        for _ in range(num_layers):
+-+            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+            key_cache.append(k)
+-+            value_cache.append(v)
+-+        return key_cache, value_cache
+-+
+- 
+-     def warmup_moe_model_deep(self):
+-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index bced285c..ebd7782e 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+- 
+--Long_Prompt = False
+--PROMPT_LENGTH_THRESHOLD = 128
+-+Long_Prompt = 1
+-+LONG_PROMPT_LENGTH_THRESHOLD = 128
+-+SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-+
+-+_causal_mask_cache = {}
+-+
+-+def get_cached_causal_mask_with_cache_position(
+-+    attention_mask: mindspore.Tensor,
+-+    sequence_length: int,
+-+    target_length: int,
+-+    dtype: mindspore.dtype,
+-+    min_dtype: float,
+-+    cache_position: mindspore.Tensor,
+-+    batch_size: int,
+-+):
+-+    """
+-+    带缓存的 causal mask 构造函数
+-+    """
+-+    # q_len 是当前 query 长度
+-+    q_len = sequence_length
+-+    # kv_len 是 target_length
+-+    kv_len = target_length
+-+
+-+    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-+    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-+
+-+    if key in _causal_mask_cache:
+-+        return _causal_mask_cache[key]
+-+
+-+    # 调用原来的 mask 构造逻辑
+-+    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+        attention_mask,
+-+        sequence_length=sequence_length,
+-+        target_length=target_length,
+-+        dtype=dtype,
+-+        min_dtype=min_dtype,
+-+        cache_position=cache_position,
+-+        batch_size=batch_size,
+-+    )
+-+    # 缓存结果
+-+    _causal_mask_cache[key] = causal_mask
+-+    return causal_mask
+- 
+- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+- def _prepare_4d_causal_attention_mask_with_cache_position(
+-@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+- 
+- 
+- # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-+# class Qwen2MoeAttention(nn.Module):
+-+#     """
+-+#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+#     and "Generating Long Sequences with Sparse Transformers".
+-+#     """
+-+
+-+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+#         super().__init__()
+-+#         self.config = config
+-+#         self.layer_idx = layer_idx
+-+#         if layer_idx is None:
+-+#             logger.warning_once(
+-+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+#                 "when creating this class."
+-+#             )
+-+
+-+#         self.hidden_size = config.hidden_size
+-+#         self.num_heads = config.num_attention_heads
+-+#         self.head_dim = self.hidden_size // self.num_heads
+-+#         self.num_key_value_heads = config.num_key_value_heads
+-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+#         self.max_position_embeddings = config.max_position_embeddings
+-+#         self.rope_theta = config.rope_theta
+-+#         self.is_causal = True
+-+#         self.attention_dropout = config.attention_dropout
+-+
+-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+#             raise ValueError(
+-+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+#                 f" and `num_heads`: {self.num_heads})."
+-+#             )
+-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+
+-+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+#             self.head_dim,
+-+#             max_position_embeddings=self.max_position_embeddings,
+-+#             base=self.rope_theta,
+-+#         )
+-+
+-+#     def forward(
+-+#         self,
+-+#         hidden_states: mindspore.Tensor,
+-+#         attention_mask: Optional[mindspore.Tensor] = None,
+-+#         position_ids: Optional[mindspore.Tensor] = None,
+-+#         past_key_value: Optional[Cache] = None,
+-+#         output_attentions: bool = False,
+-+#         use_cache: bool = False,
+-+#         cache_position: Optional[mindspore.Tensor] = None,
+-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+
+-+        
+-+
+-+#         bsz, q_len, _ = hidden_states.shape
+-+
+-+#         query_states = self.q_proj(hidden_states)
+-+#         key_states = self.k_proj(hidden_states)
+-+#         value_states = self.v_proj(hidden_states)
+-+
+-+#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+
+-+#         kv_seq_len = key_states.shape[-2]
+-+#         if past_key_value is not None:
+-+#             if self.layer_idx is None:
+-+#                 raise ValueError(
+-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+#                     "with a layer index."
+-+#                 )
+-+#             if isinstance(past_key_value, StaticCache):
+-+#                 kv_seq_len = key_states.shape[-2]
+-+#             else:
+-+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+
+-+#         if past_key_value is not None:
+-+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+            
+-+#             if isinstance(past_key_value, StaticCache):
+-+#                 kv_seq_len = key_states.shape[-2]
+-+
+-+#         # repeat k/v heads if n_kv_heads < n_heads
+-+#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+        
+-+#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+
+-+#         if attention_mask is not None:
+-+#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+#             attn_weights = attn_weights + causal_mask
+-+
+-+#         # upcast attention to fp32
+-+#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+#         attn_output = ops.matmul(attn_weights, value_states)
+-+
+-+#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+#             raise ValueError(
+-+#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+#                 f" {attn_output.shape}"
+-+#             )
+-+
+-+#         attn_output = ops.transpose(attn_output, 1, 2)
+-+#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+
+-+#         attn_output = self.o_proj(attn_output)
+-+#         # @lwx
+-+        
+-+#         # max_seq_len = self.max_position_embeddings  # 2048
+-+
+-+#         # if attention_mask is not None:
+-+#         #     # attention_mask: [B, 1, Sq, Sk]
+-+#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+
+-+#         #     # pad 到 [max_seq_len, max_seq_len]
+-+#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+#         #     global_attention_mask = padded_mask
+-+#         # else:
+-+#         #     global_attention_mask = None
+-+
+-+
+-+#         # sparse_mode=3
+-+#         # attn_output = mindspore.ops.flash_attention_score(
+-+#         #     query=query_states,
+-+#         #     key=key_states,
+-+#         #     value=value_states,
+-+#         #     real_shift=None,
+-+#         #     padding_mask=None,
+-+
+-+#         #     head_num=self.num_heads,
+-+#         #     attn_mask=global_attention_mask,
+-+#         #     keep_prob=1.0 - self.attention_dropout,
+-+#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+#         #     input_layout="BNSD",
+-+#         #     pre_tokens=2147483647,
+-+#         #     next_tokens=2147483647,
+-+#         #     inner_precise=0,
+-+#         #     drop_mask=None,
+-+#         #     prefix=None,
+-+#         #     actual_seq_qlen=None,
+-+#         #     actual_seq_kvlen=None,
+-+#         #     sparse_mode=sparse_mode,
+-+#         # )
+-+#         if not output_attentions:
+-+#             attn_weights = None
+-+
+-+#         return attn_output, attn_weights, past_key_value
+-+
+- class Qwen2MoeAttention(nn.Module):
+-     """
+--    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+--    and "Generating Long Sequences with Sparse Transformers".
+--    """
+-+    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+- 
+-+    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-+    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-+    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-+
+-+    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-+    """
+-     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-         super().__init__()
+-         self.config = config
+-@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-         if layer_idx is None:
+-             logger.warning_once(
+-                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+--                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-                 "when creating this class."
+-             )
+- 
+-@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-         use_cache: bool = False,
+-         cache_position: Optional[mindspore.Tensor] = None,
+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--
+-         
+--
+-+        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-         bsz, q_len, _ = hidden_states.shape
+- 
+-         query_states = self.q_proj(hidden_states)
+-         key_states = self.k_proj(hidden_states)
+-         value_states = self.v_proj(hidden_states)
+- 
+--        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+--        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+--        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+--
+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+        
+-         kv_seq_len = key_states.shape[-2]
+-         if past_key_value is not None:
+--            if self.layer_idx is None:
+--                raise ValueError(
+--                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--                    "with a layer index."
+--                )
+--            if isinstance(past_key_value, StaticCache):
+--                kv_seq_len = key_states.shape[-2]
+--            else:
+--                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+        
+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+- 
+-         if past_key_value is not None:
+--            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+
+-+        # --- 2. 动态调度核心注意力计算 ---
+-+        global Long_Prompt
+-+        if Long_Prompt >= 1:
+-+            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-+            fa_attention_mask = None
+-+            if attention_mask is not None:
+-+                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+                fa_attention_mask = (mask_slice != 0)
+-+
+-+            attn_output = mindspore.ops.flash_attention_score(
+-+                query=query_states,
+-+                key=key_states,
+-+                value=value_states,
+-+                head_num=self.num_heads,
+-+                attn_mask=fa_attention_mask,
+-+                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-+                scalar_value=1.0 / math.sqrt(self.head_dim),
+-+                input_layout="BNSD",
+-+                sparse_mode=0,
+-+                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-+            )
+-             
+--            if isinstance(past_key_value, StaticCache):
+--                kv_seq_len = key_states.shape[-2]
+-+            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+            attn_output = self.o_proj(attn_output)
+-+            attn_weights = None
+-+            if output_attentions:
+-+                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+- 
+--        # repeat k/v heads if n_kv_heads < n_heads
+--        key_states = repeat_kv(key_states, self.num_key_value_groups)
+--        value_states = repeat_kv(value_states, self.num_key_value_groups)
+--        
+--        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+        else:
+-+            # --- Eager Attention 路径 (用于短序列和解码) ---
+-+            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+            
+-+            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+- 
+--        if attention_mask is not None:
+--            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+--            attn_weights = attn_weights + causal_mask
+-+            if attention_mask is not None:
+-+                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+                attn_weights = attn_weights + causal_mask
+- 
+--        # upcast attention to fp32
+--        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+--        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+--        attn_output = ops.matmul(attn_weights, value_states)
+-+            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+            attn_output = ops.matmul(attn_weights, value_states)
+- 
+--        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+--            raise ValueError(
+--                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+--                f" {attn_output.shape}"
+--            )
+-+            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+                raise ValueError(
+-+                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-+                )
+- 
+--        attn_output = ops.transpose(attn_output, 1, 2)
+--        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+            attn_output = ops.transpose(attn_output, 1, 2)
+-+            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+            attn_output = self.o_proj(attn_output)
+- 
+--        attn_output = self.o_proj(attn_output)
+--        # @lwx
+-+            if not output_attentions:
+-+                attn_weights = None
+-         
+--        # max_seq_len = self.max_position_embeddings  # 2048
+--
+--        # if attention_mask is not None:
+--        #     # attention_mask: [B, 1, Sq, Sk]
+--        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+--
+--        #     # pad 到 [max_seq_len, max_seq_len]
+--        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+--        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+--        #     global_attention_mask = padded_mask
+--        # else:
+--        #     global_attention_mask = None
+--
+--
+--        # sparse_mode=3
+--        # attn_output = mindspore.ops.flash_attention_score(
+--        #     query=query_states,
+--        #     key=key_states,
+--        #     value=value_states,
+--        #     real_shift=None,
+--        #     padding_mask=None,
+--
+--        #     head_num=self.num_heads,
+--        #     attn_mask=global_attention_mask,
+--        #     keep_prob=1.0 - self.attention_dropout,
+--        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+--        #     input_layout="BNSD",
+--        #     pre_tokens=2147483647,
+--        #     next_tokens=2147483647,
+--        #     inner_precise=0,
+--        #     drop_mask=None,
+--        #     prefix=None,
+--        #     actual_seq_qlen=None,
+--        #     actual_seq_kvlen=None,
+--        #     sparse_mode=sparse_mode,
+--        # )
+--        if not output_attentions:
+--            attn_weights = None
+--
+-         return attn_output, attn_weights, past_key_value
+- 
+--
+- # class Qwen2MoeFlashAttention(nn.Module):
+- #     """
+- #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+- #             return final_hidden_states, router_logits
+- 
+- 
+--# class Qwen2MoeSparseMoeBlock(nn.Module):
+--#     """
+--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+--#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+--#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+--#     `_moe_infer_prefill` (用于长序列处理) 方法。
+--#     """
+--#     def __init__(self, config: Qwen2MoeConfig):
+--#         super().__init__()
+--#         self.num_experts = config.num_experts
+--#         self.top_k = config.num_experts_per_tok
+--#         self.norm_topk_prob = config.norm_topk_prob
+--
+--#         # 门控网络
+--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+--#         # 专家列表
+--#         self.experts = nn.ModuleList(
+--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+--#         )
+--#         # 共享专家
+--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+--
+--#     @no_grad()
+--#     def _moe_infer_decode(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         """
+--#         【解码路径】针对 sequence_length=1 的极致优化。
+--#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+--#         """
+--#         batch_size, hidden_dim = hidden_states.shape
+--        
+--#         expert_outputs_list = [
+--#             ops.cat([
+--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+--#             ], dim=0) 
+--#             for i in range(batch_size)
+--#         ]
+--        
+--#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+--#         # shape: (batch_size, top_k, hidden_dim)
+--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+--        
+--#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+--#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+--        
+--#         return moe_output.squeeze(1)
+--
+--#     @no_grad()
+--#     def _moe_infer_prefill(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         """
+--#         【预填充路径】针对 sequence_length > 1 的优化。
+--#         按专家对 Token 进行分组，并进行批处理。
+--#         """
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens = hidden_states.shape[0]
+--#         flat_selected_experts = selected_experts.flatten()
+--        
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--        
+--#         active_experts = ops.unique(flat_selected_experts)
+--        
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--            
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--#             selected_token_indices = token_indices[mask]
+--#             selected_routing_weights = routing_weights.flatten()[mask]
+--            
+--#             current_states = hidden_states[selected_token_indices]
+--            
+--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--            
+--#             moe_output = moe_output.index_add(
+--#                 dim=0,
+--#                 index=selected_token_indices,
+--#                 source=expert_output.to(hidden_states.dtype)
+--#             )
+--#         return moe_output
+--
+--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--#         """
+--#         顶层 forward 方法，作为智能分发器。
+--#         """
+--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        
+--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--#         router_logits = self.gate(hidden_states_reshaped)
+--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--
+--#         if self.norm_topk_prob:
+--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        
+--#         routing_weights = routing_weights.to(hidden_states.dtype)
+--        
+--#         moe_output = None
+--#         # 在推理时，根据序列长度选择最优路径
+--#         if not self.training:
+--#             if sequence_length == 1:
+--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+--#             else:
+--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+--#         else:
+--#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+--#             raise NotImplementedError("Training path is not implemented.")
+--
+--#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+--#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+--#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+--        
+--#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+--        
+--#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+--        
+--#         return final_hidden_states, router_logits
+--
+--
+--# class Qwen2MoeSparseMoeBlock(nn.Module):
+--#     """
+--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+--#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+--#     """
+--#     def __init__(self, config: Qwen2MoeConfig):
+--#         super().__init__()
+--#         self.num_experts = config.num_experts
+--#         self.top_k = config.num_experts_per_tok
+--#         self.norm_topk_prob = config.norm_topk_prob
+--
+--#         # 门控网络
+--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+--#         # 专家列表
+--#         self.experts = nn.ModuleList(
+--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+--#         )
+--#         # 共享专家
+--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+--
+--#     @no_grad()
+--#     def _moe_infer_decode(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         batch_size, _ = hidden_states.shape
+--#         expert_outputs_list = [
+--#             ops.cat([
+--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+--#             ], dim=0) 
+--#             for i in range(batch_size)
+--#         ]
+--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+--#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+--#         return moe_output.squeeze(1)
+--
+--#     @no_grad()
+--#     def _moe_infer_prefill(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens = hidden_states.shape[0]
+--#         flat_selected_experts = selected_experts.flatten()
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--#         active_experts = ops.unique(flat_selected_experts)
+--        
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--#             selected_token_indices = token_indices[mask]
+--#             selected_routing_weights = routing_weights.flatten()[mask]
+--#             current_states = hidden_states[selected_token_indices]
+--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--#             moe_output = moe_output.index_add(
+--#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+--#             )
+--#         return moe_output
+--
+--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--#         """
+--#         顶层 forward 方法，作为智能分发器。
+--#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+--#         """
+--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        
+--#         # 1. 门控计算 (通用逻辑)
+--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--#         router_logits = self.gate(hidden_states_reshaped)
+--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--
+--#         if self.norm_topk_prob:
+--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        
+--#         routing_weights = routing_weights.to(hidden_states.dtype)
+--        
+--#         # 2. 智能分发到最优 MoE 路径
+--#         moe_output = None
+--#         if not self.training:
+--#             if sequence_length == 1:
+--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+--#             else:
+--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+--#         else:
+--#             raise NotImplementedError("Training path is not implemented.")
+--
+--#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+--#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+--        
+--#         # 4. 合并 MoE 输出和共享专家输出
+--#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+--        
+--#         # 5. 恢复原始形状并返回
+--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+--        
+--#         return final_hidden_states, router_logits
+--
+--# prefill fastest
+--# class Qwen2MoeSparseMoeBlock(nn.Module):
+--#     """
+--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+--#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+--#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+--#     """
+--#     def __init__(self, config: Qwen2MoeConfig):
+--#         super().__init__()
+--#         self.num_experts = config.num_experts
+--#         self.top_k = config.num_experts_per_tok
+--#         self.norm_topk_prob = config.norm_topk_prob
+--
+--#         # 门控网络
+--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+--#         # 专家列表
+--#         self.experts = nn.ModuleList(
+--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+--#         )
+--#         # 共享专家
+--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+--
+--#     @no_grad()
+--#     def _moe_infer_dispatch(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         """
+--#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+--#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+--#         """
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens, _ = hidden_states.shape
+--        
+--#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+--#         flat_selected_experts = selected_experts.flatten()
+--#         flat_routing_weights = routing_weights.flatten()
+--
+--#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--
+--#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+--#         active_experts = ops.unique(flat_selected_experts)
+--        
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--            
+--#             # 找到所有分配给该专家的 token
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--            
+--#             # 使用 mask 选取对应的 token 和权重
+--#             current_token_indices = token_indices[mask]
+--#             current_routing_weights = flat_routing_weights[mask]
+--#             current_hidden_states = hidden_states[current_token_indices]
+--            
+--#             # 对这些 token 进行批处理
+--#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+--            
+--#             # 使用 index_add 将结果精确地加回到对应位置
+--#             moe_output = moe_output.index_add(
+--#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+--#             )
+--#         return moe_output
+--
+--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--#         """
+--#         顶层 forward 方法，作为智能分发器。
+--#         """
+--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        
+--#         # 1. 门控计算
+--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--#         router_logits = self.gate(hidden_states_reshaped)
+--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--
+--#         if self.norm_topk_prob:
+--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        
+--#         routing_weights = routing_weights.to(hidden_states.dtype)
+--        
+--#         # 2. 调用统一的 MoE 计算内核
+--#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+--#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+--
+--#         # 3. 统一处理共享专家
+--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+--        
+--#         # 4. 合并输出
+--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+--        
+--#         # 5. 恢复原始形状并返回
+--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+--        
+--#         return final_hidden_states, router_logits
+--
+--
+--# class Qwen2MoeSparseMoeBlock(nn.Module):
+--#     """
+--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+--#     【最终高性能与高精度版】：
+--#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+--#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+--#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+--#     3. 这样实现了速度和准确性的两全其美。
+--#     """
+--#     def __init__(self, config: Qwen2MoeConfig):
+--#         super().__init__()
+--#         self.num_experts = config.num_experts
+--#         self.top_k = config.num_experts_per_tok
+--#         self.norm_topk_prob = config.norm_topk_prob
+--
+--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+--#         self.experts = nn.ModuleList(
+--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+--#         )
+--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+--
+--#     @no_grad()
+--#     def _moe_infer_decode(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         """
+--#         【解码路径】极致优化版：bmm + 高精度累加。
+--#         """
+--#         original_dtype = hidden_states.dtype
+--#         batch_size, _ = hidden_states.shape
+--        
+--#         expert_outputs_list = [
+--#             ops.cat([
+--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+--#             ], dim=0) 
+--#             for i in range(batch_size)
+--#         ]
+--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+--
+--#         # 在 float32 下执行 bmm，得到高精度结果
+--#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+--        
+--#         # 将高精度结果转换回原始数据类型
+--#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+--        
+--#         return moe_output
+--
+--#     @no_grad()
+--#     def _moe_infer_prefill(
+--#         self, 
+--#         hidden_states: mindspore.Tensor, 
+--#         selected_experts: mindspore.Tensor, 
+--#         routing_weights: mindspore.Tensor
+--#     ) -> mindspore.Tensor:
+--#         """
+--#         【预填充路径】与原始实现一致，结果精确。
+--#         """
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens, _ = hidden_states.shape
+--#         flat_selected_experts = selected_experts.flatten()
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--#         active_experts = ops.unique(flat_selected_experts)
+--        
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--#             selected_token_indices = token_indices[mask]
+--#             selected_routing_weights = routing_weights.flatten()[mask]
+--#             current_states = hidden_states[selected_token_indices]
+--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--#             moe_output = moe_output.index_add(
+--#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+--#             )
+--#         return moe_output
+--
+--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+--        
+--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--#         router_logits = self.gate(hidden_states_reshaped)
+--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--
+--#         if self.norm_topk_prob:
+--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        
+--#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+--#         # 如果模型主体是 float16，后续再转换
+--        
+--#         moe_output = None
+--#         if not self.training:
+--#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+--#             # _moe_infer_decode 内部会处理好类型转换
+--#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+--#             if sequence_length == 1:
+--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+--#             else:
+--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+--#         else:
+--#             raise NotImplementedError("Training path is not implemented.")
+--
+--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+--        
+--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+--        
+--#         return final_hidden_states, router_logits
+--    
+--
+--# class Qwen2MoeSparseMoeBlock(nn.Module):
+--#     """
+--#     【融合版】一个混合专家模块，内置两种推理策略，
+--#     由外部全局变量 `Long_Prompt` 控制：
+--
+--#     - if Long_Prompt is True:  【精度优先模式】
+--#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+--#       适用于处理长序列，避免误差累积。
+--
+--#     - if Long_Prompt is False: 【速度优先模式】
+--#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+--#       在解码阶段获得极致速度，同时保证结果高度准确。
+--#     """
+--#     def __init__(self, config: Qwen2MoeConfig):
+--#         super().__init__()
+--#         self.num_experts = config.num_experts
+--#         self.top_k = config.num_experts_per_tok
+--#         self.norm_topk_prob = config.norm_topk_prob
+--
+--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+--#         self.experts = nn.ModuleList(
+--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+--#         )
+--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+--
+--#     # --- 速度优先模式的辅助函数 ---
+--#     @no_grad()
+--#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+--#         original_dtype = hidden_states.dtype
+--#         batch_size, _ = hidden_states.shape
+--#         expert_outputs_list = [
+--#             ops.cat([
+--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+--#             ], dim=0) 
+--#             for i in range(batch_size)
+--#         ]
+--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+--#         weights_fp32 = routing_weights.to(mindspore.float32)
+--#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+--#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+--#         return moe_output_fp32.squeeze(1).to(original_dtype)
+--
+--#     @no_grad()
+--#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens, _ = hidden_states.shape
+--#         flat_selected_experts = selected_experts.flatten()
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--#         active_experts = ops.unique(flat_selected_experts)
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--#             selected_token_indices = token_indices[mask]
+--#             selected_routing_weights = routing_weights.flatten()[mask]
+--#             current_states = hidden_states[selected_token_indices]
+--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+--#         return moe_output
+--
+--#     # --- 精度优先模式的辅助函数 ---
+--#     @no_grad()
+--#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+--#         moe_output = ops.zeros_like(hidden_states)
+--#         num_tokens, _ = hidden_states.shape
+--#         flat_selected_experts = selected_experts.flatten()
+--#         flat_routing_weights = routing_weights.flatten()
+--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+--#         active_experts = ops.unique(flat_selected_experts)
+--#         for expert_idx_tensor in active_experts:
+--#             expert_idx = expert_idx_tensor.item()
+--#             expert_layer = self.experts[expert_idx]
+--#             mask = (flat_selected_experts == expert_idx_tensor)
+--#             current_token_indices = token_indices[mask]
+--#             current_routing_weights = flat_routing_weights[mask]
+--#             current_hidden_states = hidden_states[current_token_indices]
+--#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+--#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+--#         return moe_output
+--
+--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+--#         # 声明我们将要使用一个在模块外部定义的全局变量
+--#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+--#         global Long_Prompt
+--
+--#         # 1. 门控计算 (所有模式通用)
+--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--#         router_logits = self.gate(hidden_states_reshaped)
+--#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+--#         if self.norm_topk_prob:
+--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--        
+--#         moe_output = None
+--#         if not self.training:
+--#             # 根据 Long_Prompt 标志选择模式
+--#             if Long_Prompt:
+--#                 # --- 精度优先模式 ---
+--#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+--#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--#             else:
+--#                 # --- 速度优先模式 ---
+--#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+--#                 if sequence_length == 1:
+--#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--#                 else:
+--#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--#         else:
+--#             raise NotImplementedError("Training path is not implemented.")
+--
+--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+--        
+--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+--        
+--#         return final_hidden_states, router_logits
+--    
+- class Qwen2MoeSparseMoeBlock(nn.Module):
+-     """
+-     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-         return moe_output_fp32.squeeze(1).to(original_dtype)
+- 
+-+    # @no_grad()
+-+    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+    #     num_tokens, _ = hidden_states.shape
+-+    #     flat_selected_experts = selected_experts.flatten()
+-+    #     sorted_expert_indices = flat_selected_experts.argsort()
+-+    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+    #     original_token_indices = sorted_expert_indices // self.top_k
+-+    #     moe_output = ops.zeros_like(hidden_states)
+-+    #     current_token_offset = 0
+-+    #     for i in range(self.num_experts):
+-+    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-+    #         if expert_token_count == 0:
+-+    #             continue
+-+    #         end_offset = current_token_offset + expert_token_count
+-+    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-+    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+    #         current_token_offset += expert_token_count
+-+    #     return moe_output
+-+
+-     @no_grad()
+-     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+--        num_tokens, _ = hidden_states.shape
+--        flat_selected_experts = selected_experts.flatten()
+--        sorted_expert_indices = flat_selected_experts.argsort()
+--        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+--        original_token_indices = sorted_expert_indices // self.top_k
+-+        """
+-+        优化版 MoE prefill (速度优先模式)：
+-+        - 批量张量化处理同一个 expert 的所有 token
+-+        - 跳过无 token 的专家
+-+        - 保持结果完全一致
+-+        """
+-         moe_output = ops.zeros_like(hidden_states)
+--        current_token_offset = 0
+--        for i in range(self.num_experts):
+--            expert_token_count = tokens_per_expert[i] - current_token_offset
+--            if expert_token_count == 0:
+--                continue
+--            end_offset = current_token_offset + expert_token_count
+--            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+--            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+--            expert_hidden_states = hidden_states[expert_original_token_indices]
+--            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+--            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+--            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+--            current_token_offset += expert_token_count
+-+
+-+        flat_selected_experts = selected_experts.flatten()
+-+        flat_routing_weights = routing_weights.flatten()
+-+
+-+        idxs = flat_selected_experts.argsort()
+-+        sorted_expert_indices = flat_selected_experts[idxs]
+-+        sorted_token_indices = idxs // self.top_k
+-+
+-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-+
+-+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+
+-+        for expert_id in active_experts.tolist():
+-+            start = int(tokens_per_expert[:expert_id].sum().item())
+-+            end = start + int(tokens_per_expert[expert_id].item())
+-+
+-+            token_idx = sorted_token_indices[start:end]
+-+            expert_tokens = hidden_states[token_idx]
+-+
+-+            expert_out = self.experts[expert_id](expert_tokens)
+-+
+-+            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-+
+-+            moe_output = mindspore.mint.scatter_add(
+-+                moe_output,
+-+                0,
+-+                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-+                scaled_out.to(hidden_states.dtype)
+-+            )
+-+
+-         return moe_output
+- 
+-+
+-     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-     @no_grad()
+-     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-         
+-         moe_output = None
+--        if Long_Prompt:
+--            # --- 精度优先模式 (ACCURACY MODE) ---
+--            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+--            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        # if Long_Prompt==0:
+-+        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        # else:
+-+        #     # --- 速度优先模式 (SPEED MODE) ---
+-+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+        #     if sequence_length == 1:
+-+        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        #     else:
+-+        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        
+-+        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+        if sequence_length == 1:
+-+            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-         else:
+--            # --- 速度优先模式 (SPEED MODE) ---
+--            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+--            if sequence_length == 1:
+--                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--            else:
+--                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--        
+-+            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+    
+- 
+-         # 3. 共享专家计算与合并 (所有模式通用)
+-         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         
+-         return final_hidden_states, router_logits
+- 
+-+
+- class Qwen2MoeDecoderLayer(nn.Module):
+-     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-         super().__init__()
+-         self.hidden_size = config.hidden_size
+-         
+--        # if Long_Prompt:
+--        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+--        # else:
+-+        # if Long_Prompt == 2:
+-         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+        # else:
+-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+- 
+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+- 
+-@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-             )
+- 
+-         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+--        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+        #     attention_mask,
+-+        #     sequence_length=sequence_length,
+-+        #     target_length=target_length,
+-+        #     dtype=dtype,
+-+        #     min_dtype=min_dtype,
+-+        #     cache_position=cache_position,
+-+        #     batch_size=input_tensor.shape[0],
+-+        # )
+-+        #@dwj
+-+        causal_mask = get_cached_causal_mask_with_cache_position(
+-             attention_mask,
+-             sequence_length=sequence_length,
+-             target_length=target_length,
+-@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-         """
+--        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-+        _causal_mask_cache.clear()
+- 
+-         input_ids = kwargs.get("input_ids")
+-         if input_ids is None and args:
+-@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+- 
+-         if input_ids is not None:
+-             prompt_length = input_ids.shape[1]
+--            
+--            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+--                Long_Prompt = True
+-+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-+                Long_Prompt = 2
+-+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-+                Long_Prompt = 0
+-             else:
+--                Long_Prompt = False
+-+                Long_Prompt = 1
+-+
+- 
+-         return super().generate(*args, **kwargs)
+-     
+-@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-             dtype = self.lm_head.weight.dtype
+-             min_dtype = float(ops.finfo(dtype).min)
+- 
+--            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+            #     attention_mask,
+-+            #     sequence_length=sequence_length,
+-+            #     target_length=past_key_values.get_max_length(),
+-+            #     dtype=dtype,
+-+            #     min_dtype=min_dtype,
+-+            #     cache_position=cache_position,
+-+            #     batch_size=batch_size,
+-+            # )
+-+
+-+            #@dwj
+-+            attention_mask = get_cached_causal_mask_with_cache_position(
+-                 attention_mask,
+-                 sequence_length=sequence_length,
+-                 target_length=past_key_values.get_max_length(),
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-deleted file mode 100644
+-index 6dfb5b93..00000000
+---- a/patches/0001-20251104commit.patch
+-+++ /dev/null
+-@@ -1,1272 +0,0 @@
+--From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+--From: Pinoeer-kingxi <13022943007@163.com>
+--Date: Tue, 4 Nov 2025 09:11:51 +0800
+--Subject: [PATCH] 20251104commit
+--
+-----
+-- mindnlp/transformers/cache_utils.py           |  28 +-
+-- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-- 3 files changed, 976 insertions(+), 87 deletions(-)
+--
+--diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+--index cadd2e04..02f8d4be 100644
+----- a/mindnlp/transformers/cache_utils.py
+--+++ b/mindnlp/transformers/cache_utils.py
+--@@ -812,14 +812,26 @@ class StaticCache(Cache):
+--             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+--             # k_out[:, :, cache_position] = key_states
+--             # v_out[:, :, cache_position] = value_states
+---            if ON_ORANGE_PI:
+---                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+---                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+---            else:
+---                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+---                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+---                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+---
+--+            # if ON_ORANGE_PI:
+--+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+--+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+--+            # else:
+--+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+--+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+--+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+--+            # 确保 cache_position 是 1D tensor 并且类型正确
+--+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+--+            if cache_position.ndim > 1:
+--+                cache_position = cache_position.flatten()
+--+            # 确保类型是 int32 或 int64（MindSpore 要求）
+--+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+--+                cache_position = cache_position.int()
+--+            
+--+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+--+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+--+            k_out[:, :, cache_position] = key_states
+--+            v_out[:, :, cache_position] = value_states
+--+            
+--         return k_out, v_out
+-- 
+--     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+--index c695b944..d8303e45 100644
+----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+--@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-- def rotate_half(x):
+--     """Rotates half the hidden dims of the input."""
+---    x1 = x[..., : x.shape[-1] // 2]
+---    x2 = x[..., x.shape[-1] // 2 :]
+--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+--+    # x1 = x[..., : x.shape[-1] // 2]
+--+    # x2 = x[..., x.shape[-1] // 2 :]
+--+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+--     return ops.cat((-x2, x1), dim=-1)
+-- 
+-- 
+--@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+--         if self.training:
+--             raise NotImplementedError("Training is not supported yet.")
+--         else:
+---            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+---        if self.config.n_shared_experts is not None:
+---            y = y + self.shared_experts(identity)
+---        return y
+--+            # @lwx
+--+            if orig_shape[1] == 1:
+--+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+--+                y=y.view(*orig_shape)
+--+                if self.config.n_shared_experts is not None:
+--+                    y = y + self.shared_experts(identity)
+--+                return y
+--+            else:
+--+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+--+                if self.config.n_shared_experts is not None:
+--+                    y = y + self.shared_experts(identity)
+--+                return y
+--+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+--+        # if self.config.n_shared_experts is not None:
+--+        #     y = y + self.shared_experts(identity)
+--+        # return y
+--+        
+--+    @no_grad()
+--+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+--+
+--+        expert_cache = ops.zeros_like(x)
+--+        for i in range(self.num_experts_per_tok):
+--+            expert_id = flat_expert_indices[i].item()
+--+            weight = flat_expert_weights[i].item()
+--+            expert = self.experts[expert_id]
+--+            expert_out = expert(x)
+--+            expert_cache += expert_out * weight
+--+        return expert_cache
+-- 
+--     @no_grad()
+---    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+---        # expert_cache = torch.zeros_like(x)
+---        # idxs = flat_expert_indices.argsort()
+---        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+---        # token_idxs = idxs // self.num_experts_per_tok
+---        # for i, end_idx in enumerate(tokens_per_expert):
+---        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+---        #     if start_idx == end_idx:
+---        #         continue
+---        #     expert = self.experts[i]
+---        #     exp_token_idx = token_idxs[start_idx:end_idx]
+---        #     expert_tokens = x[exp_token_idx]
+---        #     expert_out = expert(expert_tokens)
+---        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+---        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+---        # return expert_cache
+--+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+--         expert_cache = ops.zeros_like(x)
+--         idxs = flat_expert_indices.argsort()
+--         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--         token_idxs = idxs // self.num_experts_per_tok
+--+
+--         for i, end_idx in enumerate(tokens_per_expert):
+--             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--             if start_idx == end_idx:
+--@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+--             expert_out = expert(expert_tokens)
+--             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+--             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+--+
+--         return expert_cache
+--+        
+--+    # @no_grad()
+--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--+    #     # expert_cache = torch.zeros_like(x)
+--+    #     # idxs = flat_expert_indices.argsort()
+--+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+--+    #     # token_idxs = idxs // self.num_experts_per_tok
+--+    #     # for i, end_idx in enumerate(tokens_per_expert):
+--+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+--+    #     #     if start_idx == end_idx:
+--+    #     #         continue
+--+    #     #     expert = self.experts[i]
+--+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+--+    #     #     expert_tokens = x[exp_token_idx]
+--+    #     #     expert_out = expert(expert_tokens)
+--+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+--+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+--+    #     # return expert_cache
+--+    #     expert_cache = ops.zeros_like(x)
+--+    #     idxs = flat_expert_indices.argsort()
+--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--+    #     token_idxs = idxs // self.num_experts_per_tok
+--+
+--+    #     for i, end_idx in enumerate(tokens_per_expert):
+--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--+    #         if start_idx == end_idx:
+--+    #             continue
+--+    #         expert = self.experts[i]
+--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+--+    #         expert_tokens = x[exp_token_idx]
+--+    #         expert_out = expert(expert_tokens)
+--+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+--+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+--+
+--+    #     return expert_cache
+--+    # @no_grad()
+--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+--+    #     expert_cache = ops.zeros_like(x)
+--+
+--+    #     # 排序保证顺序一致
+--+    #     idxs = flat_expert_indices.argsort()
+--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--+    #     token_idxs = idxs // self.num_experts_per_tok
+--+
+--+    #     # 找出有 token 的专家
+--+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+--+
+--+    #     for i in active_experts.tolist():
+--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--+    #         end_idx = tokens_per_expert[i]
+--+    #         if start_idx == end_idx:  # 没有 token
+--+    #             continue
+--+
+--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+--+    #         expert_tokens = x[exp_token_idx]
+--+    #         expert_out = self.experts[i](expert_tokens)
+--+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+--+
+--+    #         expert_cache = mindspore.mint.scatter_add(
+--+    #             expert_cache,
+--+    #             0,
+--+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+--+    #             expert_out
+--+    #         )
+--+
+--+    #     return expert_cache
+--+
+--+
+-- 
+-- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-- #     """
+--@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-- 
+--         # Initialize weights and apply final processing
+--         self.post_init()
+--+        self.warm_up = False
+--+
+--+    def warmup_moe_model_deep(self):
+--+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+--+        test_texts = [
+--+            "warmup short",
+--+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+--+        ]
+--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+--+        if tokenizer is None:
+--+            from mindnlp.transformers import AutoTokenizer
+--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+--+            self._warmup_tokenizer = tokenizer
+--+
+--+        for text in test_texts:
+--+            inputs = tokenizer(text, return_tensors="ms")
+--+            with mindspore._no_grad():
+--+                _ = self(**inputs, use_cache=False)
+--+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-- 
+--     def get_input_embeddings(self):
+--         return self.model.embed_tokens
+--@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+--         ```"""
+--+        if not self.warm_up:
+--+            self.warm_up = True
+--+            self.warmup_moe_model_deep()
+--+
+--         output_attentions = (
+--             output_attentions
+--             if output_attentions is not None
+--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+--index 3cbf820e..d4c6b651 100644
+----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+--@@ -18,7 +18,6 @@
+-- # See the License for the specific language governing permissions and
+-- # limitations under the License.
+-- """MindSpore Qwen2MoE model."""
+---
+-- import math
+-- from typing import List, Optional, Tuple, Union
+-- 
+--@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+--     TokenClassifierOutput,
+-- )
+-- from ...modeling_utils import PreTrainedModel
+--+from ...generation import GenerationMixin
+-- from ....utils import logging
+-- from .configuration_qwen2_moe import Qwen2MoeConfig
+-- 
+--@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+--         self.variance_epsilon = eps
+-- 
+--     def forward(self, hidden_states):
+--+        # @dwj
+--+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+--+        # @lwx
+--+        # if not self.training :
+--+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+--         input_dtype = hidden_states.dtype
+--         hidden_states = hidden_states.to(mindspore.float32)
+--         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+--@@ -234,6 +239,8 @@ def rotate_half(x):
+--     """Rotates half the hidden dims of the input."""
+--     x1 = x[..., : x.shape[-1] // 2]
+--     x2 = x[..., x.shape[-1] // 2 :]
+--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+--+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+--     return ops.cat((-x2, x1), dim=-1)
+-- 
+-- 
+--@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+--         self.config = config
+--         self.hidden_size = config.hidden_size
+--         self.intermediate_size = intermediate_size
+--+        
+--         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+--         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+--         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+--         self.act_fn = ACT2FN[config.hidden_act]
+-- 
+--     def forward(self, x):
+---        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+---
+-- 
+--+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+--+        # @lwx 
+--+        # gate_up_output = self.gate_up_proj(x)
+--+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+--+        # return self.down_proj(swiglu_output)
+--+
+--+    # def forward(self, x):
+--+    #     gate_proj_out = self.gate_proj(x)
+--+    #     up_proj_out = self.up_proj(x)
+--+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+--+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+--+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+--+    #     return self.down_proj(swiglu_out)
+--+        
+-- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+--     """
+--@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+--         use_cache: bool = False,
+--         cache_position: Optional[mindspore.Tensor] = None,
+--     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--+
+--+        
+--+
+--         bsz, q_len, _ = hidden_states.shape
+-- 
+--         query_states = self.q_proj(hidden_states)
+--@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+--                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--                     "with a layer index."
+--                 )
+---            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--+            if isinstance(past_key_value, StaticCache):
+--+                kv_seq_len = key_states.shape[-2]
+--+            else:
+--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-- 
+--         if past_key_value is not None:
+--             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+--             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+--+            
+--+            if isinstance(past_key_value, StaticCache):
+--+                kv_seq_len = key_states.shape[-2]
+-- 
+--         # repeat k/v heads if n_kv_heads < n_heads
+--         key_states = repeat_kv(key_states, self.num_key_value_groups)
+--         value_states = repeat_kv(value_states, self.num_key_value_groups)
+---
+--+        
+--         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-- 
+---        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+---            raise ValueError(
+---                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+---                f" {attn_weights.shape}"
+---            )
+---
+---        if attention_mask is not None:  # no matter the length, we just slice it
+---            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+--+        if attention_mask is not None:
+--+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+--             attn_weights = attn_weights + causal_mask
+-- 
+--         # upcast attention to fp32
+--@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+--         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-- 
+--         attn_output = self.o_proj(attn_output)
+---
+--+        # @lwx
+--+        
+--+        # max_seq_len = self.max_position_embeddings  # 2048
+--+
+--+        # if attention_mask is not None:
+--+        #     # attention_mask: [B, 1, Sq, Sk]
+--+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+--+
+--+        #     # pad 到 [max_seq_len, max_seq_len]
+--+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+--+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+--+        #     global_attention_mask = padded_mask
+--+        # else:
+--+        #     global_attention_mask = None
+--+
+--+
+--+        # sparse_mode=3
+--+        # attn_output = mindspore.ops.flash_attention_score(
+--+        #     query=query_states,
+--+        #     key=key_states,
+--+        #     value=value_states,
+--+        #     real_shift=None,
+--+        #     padding_mask=None,
+--+
+--+        #     head_num=self.num_heads,
+--+        #     attn_mask=global_attention_mask,
+--+        #     keep_prob=1.0 - self.attention_dropout,
+--+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+--+        #     input_layout="BNSD",
+--+        #     pre_tokens=2147483647,
+--+        #     next_tokens=2147483647,
+--+        #     inner_precise=0,
+--+        #     drop_mask=None,
+--+        #     prefix=None,
+--+        #     actual_seq_qlen=None,
+--+        #     actual_seq_kvlen=None,
+--+        #     sparse_mode=sparse_mode,
+--+        # )
+--         if not output_attentions:
+--             attn_weights = None
+-- 
+--         return attn_output, attn_weights, past_key_value
+-- 
+-- 
+--+class Qwen2MoeFlashAttention(nn.Module):
+--+    """
+--+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+--+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+--+
+--+    关键改动:
+--+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+--+       直接传入原始的 key 和 value 张量效率更高。
+--+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+--+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+--+    """
+--+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+--+        super().__init__()
+--+        self.config = config
+--+        self.layer_idx = layer_idx
+--+        self.hidden_size = config.hidden_size
+--+        self.num_heads = config.num_attention_heads
+--+        self.head_dim = self.hidden_size // self.num_heads
+--+        self.num_key_value_heads = config.num_key_value_heads
+--+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+--+        self.max_position_embeddings = config.max_position_embeddings
+--+        self.rope_theta = config.rope_theta
+--+        self.attention_dropout = config.attention_dropout
+--+
+--+        if (self.head_dim * self.num_heads) != self.hidden_size:
+--+            raise ValueError(
+--+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+--+            )
+--+
+--+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+--+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+--+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+--+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+--+
+--+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+--+            self.head_dim,
+--+            max_position_embeddings=self.max_position_embeddings,
+--+            base=self.rope_theta,
+--+        )
+--+
+--+    def forward(
+--+        self,
+--+        hidden_states: mindspore.Tensor,
+--+        attention_mask: Optional[mindspore.Tensor] = None,
+--+        position_ids: Optional[mindspore.Tensor] = None,
+--+        past_key_value: Optional[Cache] = None,
+--+        output_attentions: bool = False,
+--+        use_cache: bool = False,
+--+        cache_position: Optional[mindspore.Tensor] = None,
+--+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--+
+--+        bsz, q_len, _ = hidden_states.shape
+--+
+--+        # 1. 线性投射 Q, K, V
+--+        query_states = self.q_proj(hidden_states)
+--+        key_states = self.k_proj(hidden_states)
+--+        value_states = self.v_proj(hidden_states)
+--+
+--+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+--+        # query:   [B, S, H*D] -> [B, N1, S, D]
+--+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+
+--+        # 3. RoPE 旋转位置编码
+--+        kv_seq_len = key_states.shape[-2]
+--+        if past_key_value is not None:
+--+            if self.layer_idx is None:
+--+                raise ValueError(
+--+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--+                    "with a layer index."
+--+                )
+--+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+--+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+--+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+--+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+--+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+--+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+--+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+--+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+--+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+--+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+--+                if cache_position.shape[0] == 1:
+--+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+--+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+--+                    kv_seq_len = past_seen_tokens + 1
+--+                else:
+--+                    # prefill 阶段：cache_position 是范围，使用其长度
+--+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+--+            else:
+--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--+
+--+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--+
+--+        # 4. KV 缓存更新
+--+        if past_key_value is not None:
+--+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--+            key_states, value_states = past_key_value.update(
+--+                key_states, value_states, self.layer_idx, cache_kwargs
+--+            )
+--+            
+--+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+--+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+--+                if cache_position.shape[0] == 1:
+--+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+--+                    kv_seq_len = key_states.shape[-2]
+--+
+--+        # 5. [重要] 准备 Attention Mask
+--+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+--+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+--+        fa_attention_mask = None
+--+        if attention_mask is not None:
+--+            # 截取与当前key长度匹配的部分
+--+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+--+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+--+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+--+            fa_attention_mask = (mask_slice != 0)
+--+
+--+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+--+        input_dtype = query_states.dtype
+--+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+--+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+--+            query_states = query_states.to(mindspore.float16)
+--+            key_states = key_states.to(mindspore.float16)
+--+            value_states = value_states.to(mindspore.float16)
+--+
+--+        # 6. [核心] 调用 flash_attention_score 算子
+--+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+--+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+--+        attn_output = mindspore.ops.flash_attention_score(
+--+            query=query_states,
+--+            key=key_states,
+--+            value=value_states,
+--+            head_num=self.num_heads, # 传入Q的头数(N1)
+--+            attn_mask=fa_attention_mask,
+--+            keep_prob=1.0 - self.attention_dropout,
+--+            scalar_value=1.0 / math.sqrt(self.head_dim),
+--+            input_layout="BNSD",
+--+            sparse_mode=0 # 使用 defaultMask 模式
+--+        )
+--+
+--+        # 恢复原始数据类型
+--+        attn_output = attn_output.to(input_dtype)
+--+
+--+        # 7. 调整输出形状
+--+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+--+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--+        attn_output = self.o_proj(attn_output)
+--+
+--+        # FlashAttention 算子不直接返回注意力权重矩阵
+--+        attn_weights = None
+--+        if output_attentions:
+--+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+--+
+--+        return attn_output, attn_weights, past_key_value
+--+
+--+    # def forward(
+--+    #     self,
+--+    #     hidden_states: mindspore.Tensor,
+--+    #     attention_mask: Optional[mindspore.Tensor] = None,
+--+    #     position_ids: Optional[mindspore.Tensor] = None,
+--+    #     past_key_value: Optional[Cache] = None,
+--+    #     output_attentions: bool = False,
+--+    #     use_cache: bool = False,
+--+    #     cache_position: Optional[mindspore.Tensor] = None,
+--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--+
+--+    #     bsz, q_len, _ = hidden_states.shape
+--+
+--+    #     # 1. 线性投射 Q, K, V
+--+    #     query_states = self.q_proj(hidden_states)
+--+    #     key_states = self.k_proj(hidden_states)
+--+    #     value_states = self.v_proj(hidden_states)
+--+
+--+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+
+--+    #     # 3. RoPE 旋转位置编码
+--+    #     kv_seq_len = key_states.shape[-2]
+--+    #     if past_key_value is not None:
+--+    #         if self.layer_idx is None:
+--+    #             raise ValueError(
+--+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--+    #                 "with a layer index."
+--+    #             )
+--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--+
+--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--+
+--+    #     # 4. KV 缓存更新
+--+    #     if past_key_value is not None:
+--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--+    #         key_states, value_states = past_key_value.update(
+--+    #             key_states, value_states, self.layer_idx, cache_kwargs
+--+    #         )
+--+
+--+    #     # 5. 准备 Attention Mask
+--+    #     fa_attention_mask = None
+--+    #     if attention_mask is not None:
+--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--+    #         fa_attention_mask = (mask_slice != 0)
+--+
+--+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+--+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+--+    #     input_dtype = query_states.dtype
+--+
+--+    #     # 6. [核心] 调用 flash_attention_score 算子
+--+    #     attn_output = mindspore.ops.flash_attention_score(
+--+    #         query=query_states,
+--+    #         key=key_states,
+--+    #         value=value_states,
+--+    #         head_num=self.num_heads,
+--+    #         attn_mask=fa_attention_mask,
+--+    #         keep_prob=1.0 - self.attention_dropout,
+--+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+--+    #         input_layout="BNSD",
+--+    #         sparse_mode=0,
+--+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+--+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+--+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+--+    #         inner_precise=1
+--+    #     )
+--+
+--+    #     # 恢复原始数据类型
+--+    #     attn_output = attn_output.to(input_dtype)
+--+
+--+    #     # 7. 调整输出形状
+--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--+    #     attn_output = self.o_proj(attn_output)
+--+
+--+    #     attn_weights = None
+--+    #     if output_attentions:
+--+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+--+
+--+    #     return attn_output, attn_weights, past_key_value
+--+
+--+    # def forward(
+--+    #     self,
+--+    #     hidden_states: mindspore.Tensor,
+--+    #     attention_mask: Optional[mindspore.Tensor] = None,
+--+    #     position_ids: Optional[mindspore.Tensor] = None,
+--+    #     past_key_value: Optional[Cache] = None,
+--+    #     output_attentions: bool = False,
+--+    #     use_cache: bool = False,
+--+    #     cache_position: Optional[mindspore.Tensor] = None,
+--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--+
+--+    #     bsz, q_len, _ = hidden_states.shape
+--+
+--+    #     query_states = self.q_proj(hidden_states)
+--+    #     key_states = self.k_proj(hidden_states)
+--+    #     value_states = self.v_proj(hidden_states)
+--+
+--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--+
+--+    #     kv_seq_len = key_states.shape[-2]
+--+    #     if past_key_value is not None:
+--+    #         if self.layer_idx is None:
+--+    #             raise ValueError("`layer_idx` must be specified for caching")
+--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--+
+--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--+
+--+    #     if past_key_value is not None:
+--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+--+    #         key_states, value_states = past_key_value.update(
+--+    #             key_states, value_states, self.layer_idx, cache_kwargs
+--+    #         )
+--+
+--+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+--+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+--+
+--+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+--+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+--+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+--+    #     query_states = query_states / math.sqrt(self.head_dim)
+--+    #     # <--- 修改结束 ---
+--+
+--+    #     fa_attention_mask = None
+--+    #     if attention_mask is not None:
+--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+--+    #         fa_attention_mask = (mask_slice != 0)
+--+
+--+    #     input_dtype = query_states.dtype
+--+
+--+    #     attn_output = mindspore.ops.flash_attention_score(
+--+    #         query=query_states,    # 传入已经预先缩放过的 query
+--+    #         key=key_states,
+--+    #         value=value_states,
+--+    #         head_num=self.num_heads,
+--+    #         attn_mask=fa_attention_mask,
+--+    #         keep_prob=1.0 - self.attention_dropout,
+--+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+--+    #         input_layout="BNSD",
+--+    #         sparse_mode=0,
+--+    #         inner_precise=1        # 仍然保持内部高精度计算
+--+    #     )
+--+
+--+    #     attn_output = attn_output.to(input_dtype)
+--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--+    #     attn_output = self.o_proj(attn_output)
+--+
+--+    #     attn_weights = None
+--+    #     if output_attentions:
+--+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+--+
+--+    #     return attn_output, attn_weights, past_key_value
+--+
+-- QWEN2MOE_ATTENTION_CLASSES = {
+--     "eager": Qwen2MoeAttention,
+--+    "flash-attention": Qwen2MoeFlashAttention,
+-- }
+-- 
+-- 
+--@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+--         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+--         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-- 
+--+    #@dwj
+--+    # 只遍历激活的专家，而非全部专家
+--     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+---        batch_size, sequence_length, hidden_dim = hidden_states.shape
+---        hidden_states = hidden_states.view(-1, hidden_dim)
+---        # router_logits: (batch * sequence_length, n_experts)
+---        router_logits = self.gate(hidden_states)
+---
+---        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+---        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+---        if self.norm_topk_prob:
+---            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+---        # we cast back to the input dtype
+---        routing_weights = routing_weights.to(hidden_states.dtype)
+---
+---        final_hidden_states = ops.zeros(
+---            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+---        )
+---
+---        # One hot encode the selected experts to create an expert mask
+---        # this will be used to easily index which expert is going to be sollicitated
+---        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+---
+---        # Loop over all available experts in the model and perform the computation on each expert
+---        for expert_idx in range(self.num_experts):
+---            expert_layer = self.experts[expert_idx]
+---            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+---
+---            # Index the correct hidden states and compute the expert hidden state for
+---            # the current expert. We need to make sure to multiply the output hidden
+---            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+---            if 0 not in idx.shape:
+---                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+---                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+---
+---                # However `index_add_` only support torch tensors for indexing so we'll use
+---                # the `top_x` tensor here.
+---                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+---
+---        shared_expert_output = self.shared_expert(hidden_states)
+---        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+---
+---        final_hidden_states = final_hidden_states + shared_expert_output
+--+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+--+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+--+            num_tokens = hidden_states_reshaped.shape[0]
+--+
+--+            router_logits = self.gate(hidden_states_reshaped)
+--+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+--+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+--+
+--+            if self.norm_topk_prob:
+--+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+--+            routing_weights = routing_weights.to(hidden_states.dtype)
+--+            
+--+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+--+            flat_selected_experts = selected_experts.flatten()
+--+            
+--+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+--+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+--+            token_indices = broadcasted_token_indices.flatten()
+--+            
+--+            active_experts = ops.unique(flat_selected_experts)
+--+            
+--+            for expert_idx_tensor in active_experts:
+--+                expert_idx = expert_idx_tensor.item()
+--+                expert_layer = self.experts[expert_idx]
+--+                
+--+                mask = (flat_selected_experts == expert_idx_tensor)
+--+                selected_token_indices = token_indices[mask]
+--+                selected_routing_weights = routing_weights.flatten()[mask]
+--+                
+--+                current_states = hidden_states_reshaped[selected_token_indices]
+--+                
+--+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+--+                
+--+                final_hidden_states = final_hidden_states.index_add(
+--+                    dim=0,
+--+                    index=selected_token_indices,
+--+                    source=expert_output.to(hidden_states.dtype)
+--+                )
+--+            
+--+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+--+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-- 
+---        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+---        return final_hidden_states, router_logits
+--+            final_hidden_states = final_hidden_states + shared_expert_output
+--+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+--+            
+--+            return final_hidden_states, router_logits
+-- 
+-- 
+-- class Qwen2MoeDecoderLayer(nn.Module):
+--@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-- 
+--         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-- 
+--+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+--+
+--         if (layer_idx not in config.mlp_only_layers) and (
+--             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+--         ):
+--@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+--     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+--     _skip_keys_device_placement = "past_key_values"
+--     _supports_cache_class = True
+--+#lwx
+--+    # _supports_static_cache = True
+-- 
+--     def _init_weights(self, module):
+--         std = self.config.initializer_range
+--@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+--         return causal_mask
+-- 
+-- 
+---class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+--+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+--     _tied_weights_keys = ["lm_head.weight"]
+-- 
+--     def __init__(self, config):
+--@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+--         self.num_experts_per_tok = config.num_experts_per_tok
+--         # Initialize weights and apply final processing
+--         self.post_init()
+--+        # @lwx
+--+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+--+        #     self.generation_config.cache_implementation = "static"
+--+        self._warmed_up = False
+--+
+--+    def warmup_moe_model(self):
+--+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+--+        test_texts = [
+--+            "warmup short",
+--+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+--+        ]
+--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+--+        if tokenizer is None:
+--+            from mindnlp.transformers import AutoTokenizer
+--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+--+            self._warmup_tokenizer = tokenizer
+--+
+--+        for text in test_texts:
+--+            inputs = tokenizer(text, return_tensors="ms")
+--+            with mindspore._no_grad():
+--+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+--+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-- 
+--     def get_input_embeddings(self):
+--         return self.model.embed_tokens
+--@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+--         ```"""
+--+        if not self._warmed_up:
+--+            self._warmed_up = True
+--+            self.warmup_moe_model()
+-- 
+--         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+--         output_router_logits = (
+--@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+--             }
+--         )
+--         return model_inputs
+--+# @lwx
+--+    # def _decode_one_tokens_logits(
+--+    #     self,
+--+    #     cur_token: mindspore.Tensor,
+--+    #     input_pos: Optional[mindspore.Tensor],
+--+    #     cache_position: mindspore.Tensor,
+--+    #     past_key_values: StaticCache,
+--+    # ) -> mindspore.Tensor:
+--+    #     """
+--+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+--+        
+--+    #     Args:
+--+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+--+    #         input_pos: 输入位置信息，可选
+--+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+--+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+--+            
+--+    #     Returns:
+--+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+--+    #     """
+--+    #     # 调用JIT编译的版本
+--+    #     return self.get_decode_one_tokens_logits(
+--+    #         cur_token=cur_token,
+--+    #         input_pos=input_pos,
+--+    #         cache_position=cache_position,
+--+    #         past_key_values=past_key_values,
+--+    #     )
+--+    
+--+    # @mindspore.jit(jit_level='O1')
+--+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+--+    #     """
+--+    #     JIT编译的函数，用于高效的单token解码
+--+    #     使用JIT编译优化以支持静态shape和高效执行
+--+        
+--+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+--+    #     """
+--+    #     outputs = self.model.forward(
+--+    #         input_ids=cur_token,
+--+    #         position_ids=input_pos,
+--+    #         cache_position=cache_position,
+--+    #         past_key_values=past_key_values,
+--+    #         use_cache=True,
+--+    #         return_dict=False,
+--+    #     )
+--+        
+--+    #     hidden_states = outputs[0]
+--+    #     logits = self.lm_head.forward(hidden_states)
+--+    #     logits = logits.float()
+--+        
+--+    #     return logits[:, -1, :]
+--+
+--+    # def _sample(
+--+    #     self,
+--+    #     input_ids: mindspore.Tensor,
+--+    #     logits_processor,
+--+    #     stopping_criteria,
+--+    #     generation_config,
+--+    #     synced_devices: bool,
+--+    #     streamer=None,
+--+    #     logits_warper=None,
+--+    #     **model_kwargs,
+--+    # ):
+--+    #     """
+--+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+--+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+--+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+--+    #     """
+--+    #     from ...generation.logits_process import LogitsProcessorList
+--+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+--+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+--+    #     from mindnlp.core import nn, ops, no_grad
+--+    #     import numpy as np
+--+        
+--+    #     # 检查是否使用 StaticCache
+--+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+--+    #     # 否则，直接调用父类方法
+--+    #     past_key_values = model_kwargs.get("past_key_values")
+--+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+--+
+--+    #     if not isinstance(past_key_values, StaticCache):
+--+    #         # 不使用 StaticCache，直接调用父类方法
+--+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+--+    #         return super()._sample(
+--+    #             input_ids=input_ids,
+--+    #             logits_processor=logits_processor,
+--+    #             stopping_criteria=stopping_criteria,
+--+    #             generation_config=generation_config,
+--+    #             synced_devices=synced_devices,
+--+    #             streamer=streamer,
+--+    #             logits_warper=logits_warper,
+--+    #             **model_kwargs,
+--+    #         )
+--+        
+--+    #     # 使用 StaticCache，进入自定义循环
+--+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+--+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+--+    #     pad_token_id = generation_config._pad_token_tensor
+--+    #     output_attentions = generation_config.output_attentions
+--+    #     output_hidden_states = generation_config.output_hidden_states
+--+    #     output_scores = generation_config.output_scores
+--+    #     output_logits = generation_config.output_logits
+--+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+--+    #     max_length = generation_config.max_length
+--+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+--+    #     do_sample = generation_config.do_sample
+--+        
+--+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+--+    #         raise ValueError(
+--+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+--+    #             f"{logits_warper})."
+--+    #         )
+--+        
+--+    #     # init attention / hidden states / scores tuples
+--+    #     scores = () if (return_dict_in_generate and output_scores) else None
+--+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+--+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+--+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+--+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+--+        
+--+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+--+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+--+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+--+    #         encoder_hidden_states = (
+--+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+--+    #         )
+--+        
+--+    #     # keep track of which sequences are already finished
+--+    #     batch_size, cur_len = input_ids.shape
+--+    #     this_peer_finished = False
+--+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+--+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+--+        
+--+    #     time_record = []
+--+    #     from ....utils.testing_utils import parse_flag_from_env
+--+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+--+        
+--+    #     while self._has_unfinished_sequences(
+--+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+--+    #     ):
+--+    #         if _record_time:
+--+    #             import time as time_module
+--+    #             infer_start = time_module.time()
+--+            
+--+    #         # prepare model inputs
+--+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+--+            
+--+    #         # prepare variable output controls
+--+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+--+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+--+            
+--+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+--+    #         cur_cache_position = model_inputs.get("cache_position")
+--+    #         cur_past_key_values = model_inputs.get("past_key_values")
+--+    #         cur_input_ids = model_inputs.get("input_ids")
+--+            
+--+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+--+    #             cur_cache_position is not None and 
+--+    #             len(cur_cache_position.shape) > 0 and
+--+    #             cur_cache_position.shape[0] == 1 and
+--+    #             cur_input_ids is not None and
+--+    #             cur_input_ids.shape[1] == 1):
+--+    #             # 使用 JIT 优化的单 token 解码
+--+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+--+    #             if not hasattr(self, '_jit_used'):
+--+    #                 self._jit_used = False
+--+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+--+                
+--+    #             next_token_logits = self.get_decode_one_tokens_logits(
+--+    #                 cur_token=cur_input_ids,
+--+    #                 input_pos=model_inputs.get("position_ids"),
+--+    #                 cache_position=cur_cache_position,
+--+    #                 past_key_values=cur_past_key_values,
+--+    #             )
+--+                
+--+    #             # 标记已使用JIT（用于后续判断）
+--+    #             if not self._jit_used:
+--+    #                 self._jit_used = True
+--+                
+--+    #             # 构造兼容的输出对象
+--+    #             class JitOptimizedOutput:
+--+    #                 def __init__(self, logits, config):
+--+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+--+    #                     self.config = config
+--+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+--+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+--+    #                     self.attentions = None if not config.is_encoder_decoder else None
+--+    #                     self.cross_attentions = None
+--+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+--+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+--+                
+--+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+--+    #         else:
+--+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+--+    #             outputs = self(**model_inputs, return_dict=True)
+--+            
+--+    #         if synced_devices and this_peer_finished:
+--+    #             continue
+--+            
+--+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+--+    #         next_token_logits = outputs.logits[:, -1, :]
+--+            
+--+    #         # pre-process distribution
+--+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+--+    #         if do_sample:
+--+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+--+            
+--+    #         # Store scores, attentions and hidden_states when required
+--+    #         if return_dict_in_generate:
+--+    #             if output_scores:
+--+    #                 scores += (next_token_scores,)
+--+    #             if output_logits:
+--+    #                 raw_logits += (next_token_logits,)
+--+    #             if output_attentions:
+--+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+--+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+--+    #                 if self.config.is_encoder_decoder:
+--+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+--+                
+--+    #             if output_hidden_states:
+--+    #                 hidden = (
+--+    #                     outputs.decoder_hidden_states
+--+    #                     if self.config.is_encoder_decoder
+--+    #                     else outputs.hidden_states
+--+    #                 )
+--+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+--+            
+--+    #         # token selection
+--+    #         if do_sample:
+--+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+--+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+--+    #         else:
+--+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+--+            
+--+    #         # finished sentences should have their next token be a padding token
+--+    #         if has_eos_stopping_criteria:
+--+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+--+            
+--+    #         # update generated ids, model inputs, and length for next step
+--+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+--+    #         if streamer is not None:
+--+    #             streamer.put(next_tokens)
+--+            
+--+    #         model_kwargs = self._update_model_kwargs_for_generation(
+--+    #             outputs,
+--+    #             model_kwargs,
+--+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+--+    #         )
+--+            
+--+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+--+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+--+    #         cur_len += 1
+--+            
+--+    #         if _record_time:
+--+    #             import time as time_module
+--+    #             infer_stop = time_module.time()
+--+    #             time_record.append(infer_stop - infer_start)
+--+            
+--+    #         del outputs
+--+        
+--+    #     average_infer_time = None
+--+    #     if time_record:
+--+    #         if len(time_record) > 1:
+--+    #             time_record.pop(0)
+--+    #         average_infer_time = sum(time_record) / len(time_record)
+--+    #         print(f'average inference time is: {average_infer_time}')
+--+    #         print(f'inference time record: {time_record}')
+--+        
+--+    #     if streamer is not None:
+--+    #         streamer.end()
+--+        
+--+    #     # 简单判断：打印是否使用了JIT路径
+--+    #     if hasattr(self, '_jit_used') and self._jit_used:
+--+    #         print("[JIT] ✓ JIT optimization was used during generation")
+--+    #     else:
+--+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+--+        
+--+    #     if return_dict_in_generate:
+--+    #         if self.config.is_encoder_decoder:
+--+    #             return GenerateEncoderDecoderOutput(
+--+    #                 sequences=input_ids,
+--+    #                 scores=scores,
+--+    #                 logits=raw_logits,
+--+    #                 encoder_attentions=encoder_attentions,
+--+    #                 encoder_hidden_states=encoder_hidden_states,
+--+    #                 decoder_attentions=decoder_attentions,
+--+    #                 cross_attentions=cross_attentions,
+--+    #                 decoder_hidden_states=decoder_hidden_states,
+--+    #                 past_key_values=model_kwargs.get("past_key_values"),
+--+    #                 average_infer_time=average_infer_time
+--+    #             )
+--+    #         else:
+--+    #             return GenerateDecoderOnlyOutput(
+--+    #                 sequences=input_ids,
+--+    #                 scores=scores,
+--+    #                 logits=raw_logits,
+--+    #                 attentions=decoder_attentions,
+--+    #                 hidden_states=decoder_hidden_states,
+--+    #                 past_key_values=model_kwargs.get("past_key_values"),
+--+    #                 average_infer_time=average_infer_time
+--+    #             )
+--+    #     else:
+--+    #         return input_ids
+--+            
+--+    # def _prepare_cache_for_generation(
+--+    #     self,
+--+    #     generation_config,
+--+    #     model_kwargs,
+--+    #     assistant_model,
+--+    #     batch_size,
+--+    #     max_cache_length,
+--+    # ):
+--+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+--+    #         generation_config.cache_implementation = "static"
+--+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+--+        
+--+    #     if generation_config.cache_implementation == "static":
+--+    #         base_required_from_max_length = generation_config.max_length + 1
+--+    #         base_required = max(max_cache_length, base_required_from_max_length)
+--+    #         min_cache_size = 50
+--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+--+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+--+    #         else:
+--+    #             max_cache_length = max(base_required, min_cache_size)
+--+            
+--+    #         original_max_cache_length = max_cache_length
+--+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+--+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+--+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+--+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+--+    #         print(f"  - final max_cache_length: {max_cache_length}")
+--+            
+--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+--+    #             if max_cache_length > self.config.max_position_embeddings:
+--+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+--+        
+--+    #     result = super()._prepare_cache_for_generation(
+--+    #         generation_config=generation_config,
+--+    #         model_kwargs=model_kwargs,
+--+    #         assistant_model=assistant_model,
+--+    #         batch_size=batch_size,
+--+    #         max_cache_length=max_cache_length,
+--+    #     )
+--+        
+--+    #     if generation_config.cache_implementation == "static":
+--+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+--+    #         created_cache = model_kwargs.get(cache_name)
+--+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+--+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+--+    #             if created_cache.max_cache_len < generation_config.max_length:
+--+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+--+        
+--+    #     return result
+--+
+--+
+--+
+-- 
+-- 
+-- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+---- 
+--2.27.0
+--
+--- 
+-2.27.0
+-
+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+deleted file mode 100644
+index bc5549ca..00000000
+--- a/patches/0004-20251106change.patch
++++ /dev/null
+@@ -1,7498 +0,0 @@
+-From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Thu, 6 Nov 2025 15:48:09 +0800
+-Subject: [PATCH 4/8] 20251106change
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |  189 +-
+- patches/0001-20251104commit.patch             | 1272 +++++++
+- patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+- patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+- 4 files changed, 7244 insertions(+), 186 deletions(-)
+- create mode 100644 patches/0001-20251104commit.patch
+- create mode 100644 patches/0002-20251106commit.patch
+- create mode 100644 patches/0003-20261106secondcommit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index 2f9192bf..0546f318 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+- 
+-         return attn_output, attn_weights, past_key_value
+- 
+--# class DeepseekFlashAttention(nn.Module):
+--#     """
+--#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+--#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+--
+--#     This class is designed as a drop-in replacement for DeepseekAttention.
+--#     """
+--
+--#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+--#         super().__init__()
+--#         self.config = config
+--#         self.layer_idx = layer_idx
+--#         if layer_idx is None:
+--#             logger.warning(
+--#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+--#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+--#                 "when creating this class."
+--#             )
+--
+--#         self.attention_dropout = config.attention_dropout
+--#         self.hidden_size = config.hidden_size
+--#         self.num_heads = config.num_attention_heads
+--#         self.head_dim = self.hidden_size // self.num_heads
+--#         self.num_key_value_heads = config.num_key_value_heads
+--#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+--#         self.max_position_embeddings = config.max_position_embeddings
+--#         self.rope_theta = config.rope_theta
+--#         self.is_causal = True
+--
+--#         if (self.head_dim * self.num_heads) != self.hidden_size:
+--#             raise ValueError(
+--#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+--#                 f" and `num_heads`: {self.num_heads})."
+--#             )
+--
+--#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+--#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+--#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+--#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+--#         self._init_rope()
+--
+--#     def _init_rope(self):
+--#         if self.config.rope_scaling is None:
+--#             self.rotary_emb = DeepseekRotaryEmbedding(
+--#                 self.head_dim,
+--#                 max_position_embeddings=self.max_position_embeddings,
+--#                 base=self.rope_theta,
+--#             )
+--#         else:
+--#             scaling_type = self.config.rope_scaling["type"]
+--#             scaling_factor = self.config.rope_scaling["factor"]
+--#             if scaling_type == "linear":
+--#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+--#                     self.head_dim,
+--#                     max_position_embeddings=self.max_position_embeddings,
+--#                     scaling_factor=scaling_factor,
+--#                     base=self.rope_theta,
+--#                 )
+--#             elif scaling_type == "dynamic":
+--#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+--#                     self.head_dim,
+--#                     max_position_embeddings=self.max_position_embeddings,
+--#                     scaling_factor=scaling_factor,
+--#                     base=self.rope_theta,
+--#                 )
+--#             else:
+--#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+--
+--#     def forward(
+--#         self,
+--#         hidden_states: mindspore.Tensor,
+--#         attention_mask: Optional[mindspore.Tensor] = None,
+--#         position_ids: Optional[mindspore.Tensor] = None,
+--#         past_key_value: Optional[Cache] = None,
+--#         output_attentions: bool = False,
+--#         use_cache: bool = False,
+--#         **kwargs,
+--#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+--#         if "padding_mask" in kwargs:
+--#             warnings.warn(
+--#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+--#             )
+--        
+--#         if output_attentions:
+--#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+--
+--#         bsz, q_len, _ = hidden_states.shape
+--
+--#         if self.config.pretraining_tp > 1:
+--#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+--
+--#         query_states = self.q_proj(hidden_states)
+--#         key_states = self.k_proj(hidden_states)
+--#         value_states = self.v_proj(hidden_states)
+--
+--#         # Reshape for multi-head attention
+--#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+--#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+--
+--#         kv_seq_len = key_states.shape[-2]
+--#         if past_key_value is not None:
+--#             if self.layer_idx is None:
+--#                 raise ValueError(
+--#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+--#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+--#                     "with a layer index."
+--#                 )
+--#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+--        
+--#         # Apply Rotary Positional Embedding
+--#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+--#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+--
+--#         if past_key_value is not None:
+--#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+--#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+--
+--#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+--#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+--#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+--        
+--#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+--#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+--        
+--#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+--#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+--
+--#         # Convert attention_mask for flash_attention_score
+--#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+--#         if attention_mask is not None:
+--#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+--#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+--#                 raise ValueError(
+--#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+--#                 )
+--#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+--#         else:
+--#             attn_mask_for_fa = None
+--        
+--#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+--
+--#         # Call the fused flash_attention_score operator
+--#         attn_output = mindspore.ops.flash_attention_score(
+--#             query=query_states_for_fa,
+--#             key=key_states_for_fa,
+--#             value=value_states_for_fa,
+--#             head_num=self.num_heads, # This is N1, the number of query heads
+--#             input_layout='BSH',
+--#             attn_mask=attn_mask_for_fa,
+--#             keep_prob=keep_prob,
+--#             scalar_value=1.0 / math.sqrt(self.head_dim),
+--#             sparse_mode=0 # Default mask mode
+--#         )
+--        
+--#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+--#         attn_output = self.o_proj(attn_output)
+--        
+--#         # Flash Attention does not return attention weights
+--#         attn_weights = None
+--
+--#         return attn_output, attn_weights, past_key_value
+- 
+- class DeepseekFlashAttention(nn.Module):
+-     """
+-@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+-         super().__init__()
+-         self.hidden_size = config.hidden_size
+- 
+--        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+--            config=config, layer_idx=layer_idx
+--        )
+-+        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-+            # config=config, layer_idx=layer_idx
+-+        # )
+- 
+-         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-             config=config, layer_idx=layer_idx
+-@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+-         return outputs
+- 
+- 
+--
+- class DeepseekPreTrainedModel(PreTrainedModel):
+-     config_class = DeepseekConfig
+-     base_model_prefix = "model"
+-@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-         # Initialize weights and apply final processing
+-         self.post_init()
+-         self.warm_up = False
+--        #@dwj
+--        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+--            self.num_layers,
+--            self.num_attention_heads,
+--            self.head_dim,
+--            batch_size=1,
+--            max_length=self.max_length,
+--            dtype=mindspore.float16
+--        )
+--
+--    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+--        key_cache = []
+--        value_cache = []
+--        for _ in range(num_layers):
+--            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+--            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+--            key_cache.append(k)
+--            value_cache.append(v)
+--        return key_cache, value_cache
+--
+- 
+-     def warmup_moe_model_deep(self):
+-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-new file mode 100644
+-index 00000000..78f22642
+---- /dev/null
+-+++ b/patches/0001-20251104commit.patch
+-@@ -0,0 +1,1272 @@
+-+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+Subject: [PATCH 1/3] 20251104commit
+-+
+-+---
+-+ mindnlp/transformers/cache_utils.py           |  28 +-
+-+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+
+-+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+index cadd2e04..02f8d4be 100644
+-+--- a/mindnlp/transformers/cache_utils.py
+-++++ b/mindnlp/transformers/cache_utils.py
+-+@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+             # k_out[:, :, cache_position] = key_states
+-+             # v_out[:, :, cache_position] = value_states
+-+-            if ON_ORANGE_PI:
+-+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+-            else:
+-+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+-
+-++            # if ON_ORANGE_PI:
+-++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++            # else:
+-++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++            if cache_position.ndim > 1:
+-++                cache_position = cache_position.flatten()
+-++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++                cache_position = cache_position.int()
+-++            
+-++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++            k_out[:, :, cache_position] = key_states
+-++            v_out[:, :, cache_position] = value_states
+-++            
+-+         return k_out, v_out
+-+ 
+-+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index c695b944..d8303e45 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+-    x1 = x[..., : x.shape[-1] // 2]
+-+-    x2 = x[..., x.shape[-1] // 2 :]
+-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++    # x1 = x[..., : x.shape[-1] // 2]
+-++    # x2 = x[..., x.shape[-1] // 2 :]
+-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+         if self.training:
+-+             raise NotImplementedError("Training is not supported yet.")
+-+         else:
+-+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+-        if self.config.n_shared_experts is not None:
+-+-            y = y + self.shared_experts(identity)
+-+-        return y
+-++            # @lwx
+-++            if orig_shape[1] == 1:
+-++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++                y=y.view(*orig_shape)
+-++                if self.config.n_shared_experts is not None:
+-++                    y = y + self.shared_experts(identity)
+-++                return y
+-++            else:
+-++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++                if self.config.n_shared_experts is not None:
+-++                    y = y + self.shared_experts(identity)
+-++                return y
+-++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++        # if self.config.n_shared_experts is not None:
+-++        #     y = y + self.shared_experts(identity)
+-++        # return y
+-++        
+-++    @no_grad()
+-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++
+-++        expert_cache = ops.zeros_like(x)
+-++        for i in range(self.num_experts_per_tok):
+-++            expert_id = flat_expert_indices[i].item()
+-++            weight = flat_expert_weights[i].item()
+-++            expert = self.experts[expert_id]
+-++            expert_out = expert(x)
+-++            expert_cache += expert_out * weight
+-++        return expert_cache
+-+ 
+-+     @no_grad()
+-+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-        # expert_cache = torch.zeros_like(x)
+-+-        # idxs = flat_expert_indices.argsort()
+-+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+-        # token_idxs = idxs // self.num_experts_per_tok
+-+-        # for i, end_idx in enumerate(tokens_per_expert):
+-+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+-        #     if start_idx == end_idx:
+-+-        #         continue
+-+-        #     expert = self.experts[i]
+-+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+-        #     expert_tokens = x[exp_token_idx]
+-+-        #     expert_out = expert(expert_tokens)
+-+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+-        # return expert_cache
+-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+         expert_cache = ops.zeros_like(x)
+-+         idxs = flat_expert_indices.argsort()
+-+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+         token_idxs = idxs // self.num_experts_per_tok
+-++
+-+         for i, end_idx in enumerate(tokens_per_expert):
+-+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+             if start_idx == end_idx:
+-+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+             expert_out = expert(expert_tokens)
+-+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++
+-+         return expert_cache
+-++        
+-++    # @no_grad()
+-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     # expert_cache = torch.zeros_like(x)
+-++    #     # idxs = flat_expert_indices.argsort()
+-++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++    #     #     if start_idx == end_idx:
+-++    #     #         continue
+-++    #     #     expert = self.experts[i]
+-++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #     #     expert_tokens = x[exp_token_idx]
+-++    #     #     expert_out = expert(expert_tokens)
+-++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++    #     # return expert_cache
+-++    #     expert_cache = ops.zeros_like(x)
+-++    #     idxs = flat_expert_indices.argsort()
+-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++    #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++    #         if start_idx == end_idx:
+-++    #             continue
+-++    #         expert = self.experts[i]
+-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #         expert_tokens = x[exp_token_idx]
+-++    #         expert_out = expert(expert_tokens)
+-++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++
+-++    #     return expert_cache
+-++    # @no_grad()
+-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     expert_cache = ops.zeros_like(x)
+-++
+-++    #     # 排序保证顺序一致
+-++    #     idxs = flat_expert_indices.argsort()
+-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++    #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++    #     # 找出有 token 的专家
+-++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++
+-++    #     for i in active_experts.tolist():
+-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++    #         end_idx = tokens_per_expert[i]
+-++    #         if start_idx == end_idx:  # 没有 token
+-++    #             continue
+-++
+-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++    #         expert_tokens = x[exp_token_idx]
+-++    #         expert_out = self.experts[i](expert_tokens)
+-++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++
+-++    #         expert_cache = mindspore.mint.scatter_add(
+-++    #             expert_cache,
+-++    #             0,
+-++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++    #             expert_out
+-++    #         )
+-++
+-++    #     return expert_cache
+-++
+-++
+-+ 
+-+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+ #     """
+-+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+ 
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-++        self.warm_up = False
+-++
+-++    def warmup_moe_model_deep(self):
+-++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++        test_texts = [
+-++            "warmup short",
+-++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++        ]
+-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++        if tokenizer is None:
+-++            from mindnlp.transformers import AutoTokenizer
+-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++            self._warmup_tokenizer = tokenizer
+-++
+-++        for text in test_texts:
+-++            inputs = tokenizer(text, return_tensors="ms")
+-++            with mindspore._no_grad():
+-++                _ = self(**inputs, use_cache=False)
+-++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+ 
+-+     def get_input_embeddings(self):
+-+         return self.model.embed_tokens
+-+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+         ```"""
+-++        if not self.warm_up:
+-++            self.warm_up = True
+-++            self.warmup_moe_model_deep()
+-++
+-+         output_attentions = (
+-+             output_attentions
+-+             if output_attentions is not None
+-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+index 3cbf820e..d4c6b651 100644
+-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+@@ -18,7 +18,6 @@
+-+ # See the License for the specific language governing permissions and
+-+ # limitations under the License.
+-+ """MindSpore Qwen2MoE model."""
+-+-
+-+ import math
+-+ from typing import List, Optional, Tuple, Union
+-+ 
+-+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+     TokenClassifierOutput,
+-+ )
+-+ from ...modeling_utils import PreTrainedModel
+-++from ...generation import GenerationMixin
+-+ from ....utils import logging
+-+ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+ 
+-+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+         self.variance_epsilon = eps
+-+ 
+-+     def forward(self, hidden_states):
+-++        # @dwj
+-++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++        # @lwx
+-++        # if not self.training :
+-++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+         input_dtype = hidden_states.dtype
+-+         hidden_states = hidden_states.to(mindspore.float32)
+-+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+@@ -234,6 +239,8 @@ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+     x1 = x[..., : x.shape[-1] // 2]
+-+     x2 = x[..., x.shape[-1] // 2 :]
+-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+         self.config = config
+-+         self.hidden_size = config.hidden_size
+-+         self.intermediate_size = intermediate_size
+-++        
+-+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+         self.act_fn = ACT2FN[config.hidden_act]
+-+ 
+-+     def forward(self, x):
+-+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+-
+-+ 
+-++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++        # @lwx 
+-++        # gate_up_output = self.gate_up_proj(x)
+-++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++        # return self.down_proj(swiglu_output)
+-++
+-++    # def forward(self, x):
+-++    #     gate_proj_out = self.gate_proj(x)
+-++    #     up_proj_out = self.up_proj(x)
+-++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++    #     return self.down_proj(swiglu_out)
+-++        
+-+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+     """
+-+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+         use_cache: bool = False,
+-+         cache_position: Optional[mindspore.Tensor] = None,
+-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++        
+-++
+-+         bsz, q_len, _ = hidden_states.shape
+-+ 
+-+         query_states = self.q_proj(hidden_states)
+-+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+                     "with a layer index."
+-+                 )
+-+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++            if isinstance(past_key_value, StaticCache):
+-++                kv_seq_len = key_states.shape[-2]
+-++            else:
+-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+ 
+-+         if past_key_value is not None:
+-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++            
+-++            if isinstance(past_key_value, StaticCache):
+-++                kv_seq_len = key_states.shape[-2]
+-+ 
+-+         # repeat k/v heads if n_kv_heads < n_heads
+-+         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+-
+-++        
+-+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+ 
+-+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+-            raise ValueError(
+-+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+-                f" {attn_weights.shape}"
+-+-            )
+-+-
+-+-        if attention_mask is not None:  # no matter the length, we just slice it
+-+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++        if attention_mask is not None:
+-++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+             attn_weights = attn_weights + causal_mask
+-+ 
+-+         # upcast attention to fp32
+-+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+ 
+-+         attn_output = self.o_proj(attn_output)
+-+-
+-++        # @lwx
+-++        
+-++        # max_seq_len = self.max_position_embeddings  # 2048
+-++
+-++        # if attention_mask is not None:
+-++        #     # attention_mask: [B, 1, Sq, Sk]
+-++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++
+-++        #     # pad 到 [max_seq_len, max_seq_len]
+-++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++        #     global_attention_mask = padded_mask
+-++        # else:
+-++        #     global_attention_mask = None
+-++
+-++
+-++        # sparse_mode=3
+-++        # attn_output = mindspore.ops.flash_attention_score(
+-++        #     query=query_states,
+-++        #     key=key_states,
+-++        #     value=value_states,
+-++        #     real_shift=None,
+-++        #     padding_mask=None,
+-++
+-++        #     head_num=self.num_heads,
+-++        #     attn_mask=global_attention_mask,
+-++        #     keep_prob=1.0 - self.attention_dropout,
+-++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++        #     input_layout="BNSD",
+-++        #     pre_tokens=2147483647,
+-++        #     next_tokens=2147483647,
+-++        #     inner_precise=0,
+-++        #     drop_mask=None,
+-++        #     prefix=None,
+-++        #     actual_seq_qlen=None,
+-++        #     actual_seq_kvlen=None,
+-++        #     sparse_mode=sparse_mode,
+-++        # )
+-+         if not output_attentions:
+-+             attn_weights = None
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+ 
+-++class Qwen2MoeFlashAttention(nn.Module):
+-++    """
+-++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++
+-++    关键改动:
+-++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++       直接传入原始的 key 和 value 张量效率更高。
+-++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++    """
+-++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++        super().__init__()
+-++        self.config = config
+-++        self.layer_idx = layer_idx
+-++        self.hidden_size = config.hidden_size
+-++        self.num_heads = config.num_attention_heads
+-++        self.head_dim = self.hidden_size // self.num_heads
+-++        self.num_key_value_heads = config.num_key_value_heads
+-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++        self.max_position_embeddings = config.max_position_embeddings
+-++        self.rope_theta = config.rope_theta
+-++        self.attention_dropout = config.attention_dropout
+-++
+-++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++            raise ValueError(
+-++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++            )
+-++
+-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++
+-++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++            self.head_dim,
+-++            max_position_embeddings=self.max_position_embeddings,
+-++            base=self.rope_theta,
+-++        )
+-++
+-++    def forward(
+-++        self,
+-++        hidden_states: mindspore.Tensor,
+-++        attention_mask: Optional[mindspore.Tensor] = None,
+-++        position_ids: Optional[mindspore.Tensor] = None,
+-++        past_key_value: Optional[Cache] = None,
+-++        output_attentions: bool = False,
+-++        use_cache: bool = False,
+-++        cache_position: Optional[mindspore.Tensor] = None,
+-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++        bsz, q_len, _ = hidden_states.shape
+-++
+-++        # 1. 线性投射 Q, K, V
+-++        query_states = self.q_proj(hidden_states)
+-++        key_states = self.k_proj(hidden_states)
+-++        value_states = self.v_proj(hidden_states)
+-++
+-++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++        # 3. RoPE 旋转位置编码
+-++        kv_seq_len = key_states.shape[-2]
+-++        if past_key_value is not None:
+-++            if self.layer_idx is None:
+-++                raise ValueError(
+-++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++                    "with a layer index."
+-++                )
+-++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++                if cache_position.shape[0] == 1:
+-++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++                    kv_seq_len = past_seen_tokens + 1
+-++                else:
+-++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++            else:
+-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++        # 4. KV 缓存更新
+-++        if past_key_value is not None:
+-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++            key_states, value_states = past_key_value.update(
+-++                key_states, value_states, self.layer_idx, cache_kwargs
+-++            )
+-++            
+-++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++                if cache_position.shape[0] == 1:
+-++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++                    kv_seq_len = key_states.shape[-2]
+-++
+-++        # 5. [重要] 准备 Attention Mask
+-++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++        fa_attention_mask = None
+-++        if attention_mask is not None:
+-++            # 截取与当前key长度匹配的部分
+-++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++            fa_attention_mask = (mask_slice != 0)
+-++
+-++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++        input_dtype = query_states.dtype
+-++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++            query_states = query_states.to(mindspore.float16)
+-++            key_states = key_states.to(mindspore.float16)
+-++            value_states = value_states.to(mindspore.float16)
+-++
+-++        # 6. [核心] 调用 flash_attention_score 算子
+-++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++        attn_output = mindspore.ops.flash_attention_score(
+-++            query=query_states,
+-++            key=key_states,
+-++            value=value_states,
+-++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++            attn_mask=fa_attention_mask,
+-++            keep_prob=1.0 - self.attention_dropout,
+-++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++            input_layout="BNSD",
+-++            sparse_mode=0 # 使用 defaultMask 模式
+-++        )
+-++
+-++        # 恢复原始数据类型
+-++        attn_output = attn_output.to(input_dtype)
+-++
+-++        # 7. 调整输出形状
+-++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++        attn_output = self.o_proj(attn_output)
+-++
+-++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++        attn_weights = None
+-++        if output_attentions:
+-++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++        return attn_output, attn_weights, past_key_value
+-++
+-++    # def forward(
+-++    #     self,
+-++    #     hidden_states: mindspore.Tensor,
+-++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++    #     past_key_value: Optional[Cache] = None,
+-++    #     output_attentions: bool = False,
+-++    #     use_cache: bool = False,
+-++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++    #     bsz, q_len, _ = hidden_states.shape
+-++
+-++    #     # 1. 线性投射 Q, K, V
+-++    #     query_states = self.q_proj(hidden_states)
+-++    #     key_states = self.k_proj(hidden_states)
+-++    #     value_states = self.v_proj(hidden_states)
+-++
+-++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++    #     # 3. RoPE 旋转位置编码
+-++    #     kv_seq_len = key_states.shape[-2]
+-++    #     if past_key_value is not None:
+-++    #         if self.layer_idx is None:
+-++    #             raise ValueError(
+-++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++    #                 "with a layer index."
+-++    #             )
+-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++    #     # 4. KV 缓存更新
+-++    #     if past_key_value is not None:
+-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++    #         key_states, value_states = past_key_value.update(
+-++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++    #         )
+-++
+-++    #     # 5. 准备 Attention Mask
+-++    #     fa_attention_mask = None
+-++    #     if attention_mask is not None:
+-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++    #         fa_attention_mask = (mask_slice != 0)
+-++
+-++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++    #     input_dtype = query_states.dtype
+-++
+-++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++    #     attn_output = mindspore.ops.flash_attention_score(
+-++    #         query=query_states,
+-++    #         key=key_states,
+-++    #         value=value_states,
+-++    #         head_num=self.num_heads,
+-++    #         attn_mask=fa_attention_mask,
+-++    #         keep_prob=1.0 - self.attention_dropout,
+-++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++    #         input_layout="BNSD",
+-++    #         sparse_mode=0,
+-++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++    #         inner_precise=1
+-++    #     )
+-++
+-++    #     # 恢复原始数据类型
+-++    #     attn_output = attn_output.to(input_dtype)
+-++
+-++    #     # 7. 调整输出形状
+-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++    #     attn_output = self.o_proj(attn_output)
+-++
+-++    #     attn_weights = None
+-++    #     if output_attentions:
+-++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++    #     return attn_output, attn_weights, past_key_value
+-++
+-++    # def forward(
+-++    #     self,
+-++    #     hidden_states: mindspore.Tensor,
+-++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++    #     past_key_value: Optional[Cache] = None,
+-++    #     output_attentions: bool = False,
+-++    #     use_cache: bool = False,
+-++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++    #     bsz, q_len, _ = hidden_states.shape
+-++
+-++    #     query_states = self.q_proj(hidden_states)
+-++    #     key_states = self.k_proj(hidden_states)
+-++    #     value_states = self.v_proj(hidden_states)
+-++
+-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++    #     kv_seq_len = key_states.shape[-2]
+-++    #     if past_key_value is not None:
+-++    #         if self.layer_idx is None:
+-++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++    #     if past_key_value is not None:
+-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++    #         key_states, value_states = past_key_value.update(
+-++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++    #         )
+-++
+-++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++
+-++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++    #     # <--- 修改结束 ---
+-++
+-++    #     fa_attention_mask = None
+-++    #     if attention_mask is not None:
+-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++    #         fa_attention_mask = (mask_slice != 0)
+-++
+-++    #     input_dtype = query_states.dtype
+-++
+-++    #     attn_output = mindspore.ops.flash_attention_score(
+-++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++    #         key=key_states,
+-++    #         value=value_states,
+-++    #         head_num=self.num_heads,
+-++    #         attn_mask=fa_attention_mask,
+-++    #         keep_prob=1.0 - self.attention_dropout,
+-++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++    #         input_layout="BNSD",
+-++    #         sparse_mode=0,
+-++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++    #     )
+-++
+-++    #     attn_output = attn_output.to(input_dtype)
+-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++    #     attn_output = self.o_proj(attn_output)
+-++
+-++    #     attn_weights = None
+-++    #     if output_attentions:
+-++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++
+-++    #     return attn_output, attn_weights, past_key_value
+-++
+-+ QWEN2MOE_ATTENTION_CLASSES = {
+-+     "eager": Qwen2MoeAttention,
+-++    "flash-attention": Qwen2MoeFlashAttention,
+-+ }
+-+ 
+-+ 
+-+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+ 
+-++    #@dwj
+-++    # 只遍历激活的专家，而非全部专家
+-+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+-        # router_logits: (batch * sequence_length, n_experts)
+-+-        router_logits = self.gate(hidden_states)
+-+-
+-+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-        if self.norm_topk_prob:
+-+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        # we cast back to the input dtype
+-+-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+-
+-+-        final_hidden_states = ops.zeros(
+-+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+-        )
+-+-
+-+-        # One hot encode the selected experts to create an expert mask
+-+-        # this will be used to easily index which expert is going to be sollicitated
+-+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+-
+-+-        # Loop over all available experts in the model and perform the computation on each expert
+-+-        for expert_idx in range(self.num_experts):
+-+-            expert_layer = self.experts[expert_idx]
+-+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+-
+-+-            # Index the correct hidden states and compute the expert hidden state for
+-+-            # the current expert. We need to make sure to multiply the output hidden
+-+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+-            if 0 not in idx.shape:
+-+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+-
+-+-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+-                # the `top_x` tensor here.
+-+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+-
+-+-        shared_expert_output = self.shared_expert(hidden_states)
+-+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+-
+-+-        final_hidden_states = final_hidden_states + shared_expert_output
+-++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++            num_tokens = hidden_states_reshaped.shape[0]
+-++
+-++            router_logits = self.gate(hidden_states_reshaped)
+-++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++
+-++            if self.norm_topk_prob:
+-++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++            
+-++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++            flat_selected_experts = selected_experts.flatten()
+-++            
+-++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++            token_indices = broadcasted_token_indices.flatten()
+-++            
+-++            active_experts = ops.unique(flat_selected_experts)
+-++            
+-++            for expert_idx_tensor in active_experts:
+-++                expert_idx = expert_idx_tensor.item()
+-++                expert_layer = self.experts[expert_idx]
+-++                
+-++                mask = (flat_selected_experts == expert_idx_tensor)
+-++                selected_token_indices = token_indices[mask]
+-++                selected_routing_weights = routing_weights.flatten()[mask]
+-++                
+-++                current_states = hidden_states_reshaped[selected_token_indices]
+-++                
+-++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++                
+-++                final_hidden_states = final_hidden_states.index_add(
+-++                    dim=0,
+-++                    index=selected_token_indices,
+-++                    source=expert_output.to(hidden_states.dtype)
+-++                )
+-++            
+-++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+ 
+-+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+-        return final_hidden_states, router_logits
+-++            final_hidden_states = final_hidden_states + shared_expert_output
+-++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++            
+-++            return final_hidden_states, router_logits
+-+ 
+-+ 
+-+ class Qwen2MoeDecoderLayer(nn.Module):
+-+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+ 
+-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+ 
+-++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++
+-+         if (layer_idx not in config.mlp_only_layers) and (
+-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+         ):
+-+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+     _skip_keys_device_placement = "past_key_values"
+-+     _supports_cache_class = True
+-++#lwx
+-++    # _supports_static_cache = True
+-+ 
+-+     def _init_weights(self, module):
+-+         std = self.config.initializer_range
+-+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+         return causal_mask
+-+ 
+-+ 
+-+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+     _tied_weights_keys = ["lm_head.weight"]
+-+ 
+-+     def __init__(self, config):
+-+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+         self.num_experts_per_tok = config.num_experts_per_tok
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-++        # @lwx
+-++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++        #     self.generation_config.cache_implementation = "static"
+-++        self._warmed_up = False
+-++
+-++    def warmup_moe_model(self):
+-++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++        test_texts = [
+-++            "warmup short",
+-++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++        ]
+-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++        if tokenizer is None:
+-++            from mindnlp.transformers import AutoTokenizer
+-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++            self._warmup_tokenizer = tokenizer
+-++
+-++        for text in test_texts:
+-++            inputs = tokenizer(text, return_tensors="ms")
+-++            with mindspore._no_grad():
+-++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+ 
+-+     def get_input_embeddings(self):
+-+         return self.model.embed_tokens
+-+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+         ```"""
+-++        if not self._warmed_up:
+-++            self._warmed_up = True
+-++            self.warmup_moe_model()
+-+ 
+-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+         output_router_logits = (
+-+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+             }
+-+         )
+-+         return model_inputs
+-++# @lwx
+-++    # def _decode_one_tokens_logits(
+-++    #     self,
+-++    #     cur_token: mindspore.Tensor,
+-++    #     input_pos: Optional[mindspore.Tensor],
+-++    #     cache_position: mindspore.Tensor,
+-++    #     past_key_values: StaticCache,
+-++    # ) -> mindspore.Tensor:
+-++    #     """
+-++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++        
+-++    #     Args:
+-++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++    #         input_pos: 输入位置信息，可选
+-++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++            
+-++    #     Returns:
+-++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++    #     """
+-++    #     # 调用JIT编译的版本
+-++    #     return self.get_decode_one_tokens_logits(
+-++    #         cur_token=cur_token,
+-++    #         input_pos=input_pos,
+-++    #         cache_position=cache_position,
+-++    #         past_key_values=past_key_values,
+-++    #     )
+-++    
+-++    # @mindspore.jit(jit_level='O1')
+-++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++    #     """
+-++    #     JIT编译的函数，用于高效的单token解码
+-++    #     使用JIT编译优化以支持静态shape和高效执行
+-++        
+-++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++    #     """
+-++    #     outputs = self.model.forward(
+-++    #         input_ids=cur_token,
+-++    #         position_ids=input_pos,
+-++    #         cache_position=cache_position,
+-++    #         past_key_values=past_key_values,
+-++    #         use_cache=True,
+-++    #         return_dict=False,
+-++    #     )
+-++        
+-++    #     hidden_states = outputs[0]
+-++    #     logits = self.lm_head.forward(hidden_states)
+-++    #     logits = logits.float()
+-++        
+-++    #     return logits[:, -1, :]
+-++
+-++    # def _sample(
+-++    #     self,
+-++    #     input_ids: mindspore.Tensor,
+-++    #     logits_processor,
+-++    #     stopping_criteria,
+-++    #     generation_config,
+-++    #     synced_devices: bool,
+-++    #     streamer=None,
+-++    #     logits_warper=None,
+-++    #     **model_kwargs,
+-++    # ):
+-++    #     """
+-++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++    #     """
+-++    #     from ...generation.logits_process import LogitsProcessorList
+-++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++    #     from mindnlp.core import nn, ops, no_grad
+-++    #     import numpy as np
+-++        
+-++    #     # 检查是否使用 StaticCache
+-++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++    #     # 否则，直接调用父类方法
+-++    #     past_key_values = model_kwargs.get("past_key_values")
+-++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++
+-++    #     if not isinstance(past_key_values, StaticCache):
+-++    #         # 不使用 StaticCache，直接调用父类方法
+-++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++    #         return super()._sample(
+-++    #             input_ids=input_ids,
+-++    #             logits_processor=logits_processor,
+-++    #             stopping_criteria=stopping_criteria,
+-++    #             generation_config=generation_config,
+-++    #             synced_devices=synced_devices,
+-++    #             streamer=streamer,
+-++    #             logits_warper=logits_warper,
+-++    #             **model_kwargs,
+-++    #         )
+-++        
+-++    #     # 使用 StaticCache，进入自定义循环
+-++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++    #     pad_token_id = generation_config._pad_token_tensor
+-++    #     output_attentions = generation_config.output_attentions
+-++    #     output_hidden_states = generation_config.output_hidden_states
+-++    #     output_scores = generation_config.output_scores
+-++    #     output_logits = generation_config.output_logits
+-++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++    #     max_length = generation_config.max_length
+-++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++    #     do_sample = generation_config.do_sample
+-++        
+-++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++    #         raise ValueError(
+-++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++    #             f"{logits_warper})."
+-++    #         )
+-++        
+-++    #     # init attention / hidden states / scores tuples
+-++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++        
+-++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++    #         encoder_hidden_states = (
+-++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++    #         )
+-++        
+-++    #     # keep track of which sequences are already finished
+-++    #     batch_size, cur_len = input_ids.shape
+-++    #     this_peer_finished = False
+-++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++        
+-++    #     time_record = []
+-++    #     from ....utils.testing_utils import parse_flag_from_env
+-++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++        
+-++    #     while self._has_unfinished_sequences(
+-++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++    #     ):
+-++    #         if _record_time:
+-++    #             import time as time_module
+-++    #             infer_start = time_module.time()
+-++            
+-++    #         # prepare model inputs
+-++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++            
+-++    #         # prepare variable output controls
+-++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++            
+-++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++    #         cur_cache_position = model_inputs.get("cache_position")
+-++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++    #         cur_input_ids = model_inputs.get("input_ids")
+-++            
+-++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++    #             cur_cache_position is not None and 
+-++    #             len(cur_cache_position.shape) > 0 and
+-++    #             cur_cache_position.shape[0] == 1 and
+-++    #             cur_input_ids is not None and
+-++    #             cur_input_ids.shape[1] == 1):
+-++    #             # 使用 JIT 优化的单 token 解码
+-++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++    #             if not hasattr(self, '_jit_used'):
+-++    #                 self._jit_used = False
+-++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++                
+-++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++    #                 cur_token=cur_input_ids,
+-++    #                 input_pos=model_inputs.get("position_ids"),
+-++    #                 cache_position=cur_cache_position,
+-++    #                 past_key_values=cur_past_key_values,
+-++    #             )
+-++                
+-++    #             # 标记已使用JIT（用于后续判断）
+-++    #             if not self._jit_used:
+-++    #                 self._jit_used = True
+-++                
+-++    #             # 构造兼容的输出对象
+-++    #             class JitOptimizedOutput:
+-++    #                 def __init__(self, logits, config):
+-++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++    #                     self.config = config
+-++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++    #                     self.cross_attentions = None
+-++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++                
+-++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++    #         else:
+-++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++    #             outputs = self(**model_inputs, return_dict=True)
+-++            
+-++    #         if synced_devices and this_peer_finished:
+-++    #             continue
+-++            
+-++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++    #         next_token_logits = outputs.logits[:, -1, :]
+-++            
+-++    #         # pre-process distribution
+-++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++    #         if do_sample:
+-++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++            
+-++    #         # Store scores, attentions and hidden_states when required
+-++    #         if return_dict_in_generate:
+-++    #             if output_scores:
+-++    #                 scores += (next_token_scores,)
+-++    #             if output_logits:
+-++    #                 raw_logits += (next_token_logits,)
+-++    #             if output_attentions:
+-++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++    #                 if self.config.is_encoder_decoder:
+-++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++                
+-++    #             if output_hidden_states:
+-++    #                 hidden = (
+-++    #                     outputs.decoder_hidden_states
+-++    #                     if self.config.is_encoder_decoder
+-++    #                     else outputs.hidden_states
+-++    #                 )
+-++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++            
+-++    #         # token selection
+-++    #         if do_sample:
+-++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++    #         else:
+-++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++            
+-++    #         # finished sentences should have their next token be a padding token
+-++    #         if has_eos_stopping_criteria:
+-++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++            
+-++    #         # update generated ids, model inputs, and length for next step
+-++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++    #         if streamer is not None:
+-++    #             streamer.put(next_tokens)
+-++            
+-++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++    #             outputs,
+-++    #             model_kwargs,
+-++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++    #         )
+-++            
+-++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++    #         cur_len += 1
+-++            
+-++    #         if _record_time:
+-++    #             import time as time_module
+-++    #             infer_stop = time_module.time()
+-++    #             time_record.append(infer_stop - infer_start)
+-++            
+-++    #         del outputs
+-++        
+-++    #     average_infer_time = None
+-++    #     if time_record:
+-++    #         if len(time_record) > 1:
+-++    #             time_record.pop(0)
+-++    #         average_infer_time = sum(time_record) / len(time_record)
+-++    #         print(f'average inference time is: {average_infer_time}')
+-++    #         print(f'inference time record: {time_record}')
+-++        
+-++    #     if streamer is not None:
+-++    #         streamer.end()
+-++        
+-++    #     # 简单判断：打印是否使用了JIT路径
+-++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++    #     else:
+-++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++        
+-++    #     if return_dict_in_generate:
+-++    #         if self.config.is_encoder_decoder:
+-++    #             return GenerateEncoderDecoderOutput(
+-++    #                 sequences=input_ids,
+-++    #                 scores=scores,
+-++    #                 logits=raw_logits,
+-++    #                 encoder_attentions=encoder_attentions,
+-++    #                 encoder_hidden_states=encoder_hidden_states,
+-++    #                 decoder_attentions=decoder_attentions,
+-++    #                 cross_attentions=cross_attentions,
+-++    #                 decoder_hidden_states=decoder_hidden_states,
+-++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++    #                 average_infer_time=average_infer_time
+-++    #             )
+-++    #         else:
+-++    #             return GenerateDecoderOnlyOutput(
+-++    #                 sequences=input_ids,
+-++    #                 scores=scores,
+-++    #                 logits=raw_logits,
+-++    #                 attentions=decoder_attentions,
+-++    #                 hidden_states=decoder_hidden_states,
+-++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++    #                 average_infer_time=average_infer_time
+-++    #             )
+-++    #     else:
+-++    #         return input_ids
+-++            
+-++    # def _prepare_cache_for_generation(
+-++    #     self,
+-++    #     generation_config,
+-++    #     model_kwargs,
+-++    #     assistant_model,
+-++    #     batch_size,
+-++    #     max_cache_length,
+-++    # ):
+-++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++    #         generation_config.cache_implementation = "static"
+-++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++        
+-++    #     if generation_config.cache_implementation == "static":
+-++    #         base_required_from_max_length = generation_config.max_length + 1
+-++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++    #         min_cache_size = 50
+-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++    #         else:
+-++    #             max_cache_length = max(base_required, min_cache_size)
+-++            
+-++    #         original_max_cache_length = max_cache_length
+-++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++            
+-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++    #             if max_cache_length > self.config.max_position_embeddings:
+-++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++        
+-++    #     result = super()._prepare_cache_for_generation(
+-++    #         generation_config=generation_config,
+-++    #         model_kwargs=model_kwargs,
+-++    #         assistant_model=assistant_model,
+-++    #         batch_size=batch_size,
+-++    #         max_cache_length=max_cache_length,
+-++    #     )
+-++        
+-++    #     if generation_config.cache_implementation == "static":
+-++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++    #         created_cache = model_kwargs.get(cache_name)
+-++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++        
+-++    #     return result
+-++
+-++
+-++
+-+ 
+-+ 
+-+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+-- 
+-+2.27.0
+-+
+-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-new file mode 100644
+-index 00000000..22b65dd5
+---- /dev/null
+-+++ b/patches/0002-20251106commit.patch
+-@@ -0,0 +1,3200 @@
+-+From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+Subject: [PATCH 2/3] 20251106commit
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+-+ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+-+ 3 files changed, 2689 insertions(+), 305 deletions(-)
+-+ create mode 100644 patches/0001-20251104commit.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index d8303e45..73773c22 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-+         #     y = y + self.shared_experts(identity)
+-+         # return y
+-+         
+-++    # @no_grad()
+-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++
+-++    #     expert_cache = ops.zeros_like(x)
+-++    #     for i in range(self.num_experts_per_tok):
+-++    #         expert_id = flat_expert_indices[i].item()
+-++    #         weight = flat_expert_weights[i].item()
+-++    #         expert = self.experts[expert_id]
+-++    #         expert_out = expert(x)
+-++    #         expert_cache += expert_out * weight
+-++    #     return expert_cache
+-++
+-+     @no_grad()
+-+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++        # x 的 shape: (1, hidden_size)
+-++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++
+-++        # 1. 收集所有需要的专家层
+-++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++
+-++        # 2. 并行计算所有专家的输出
+-++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++
+-++        # 3. 使用矩阵乘法进行加权求和
+-++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++        
+-++        return final_output
+-+ 
+-+-        expert_cache = ops.zeros_like(x)
+-+-        for i in range(self.num_experts_per_tok):
+-+-            expert_id = flat_expert_indices[i].item()
+-+-            weight = flat_expert_weights[i].item()
+-+-            expert = self.experts[expert_id]
+-+-            expert_out = expert(x)
+-+-            expert_cache += expert_out * weight
+-+-        return expert_cache
+-+ 
+-+     @no_grad()
+-+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-+             key_states = self.k_proj(hidden_states)
+-+             value_states = self.v_proj(hidden_states)
+-+ 
+-+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++        # @lwx
+-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+ 
+-+         kv_seq_len = key_states.shape[-2]
+-+         if past_key_value is not None:
+-+@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+ 
+-++# class DeepseekFlashAttention(nn.Module):
+-++#     """
+-++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-++
+-++#     This class is designed as a drop-in replacement for DeepseekAttention.
+-++#     """
+-++
+-++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++#         super().__init__()
+-++#         self.config = config
+-++#         self.layer_idx = layer_idx
+-++#         if layer_idx is None:
+-++#             logger.warning(
+-++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++#                 "when creating this class."
+-++#             )
+-++
+-++#         self.attention_dropout = config.attention_dropout
+-++#         self.hidden_size = config.hidden_size
+-++#         self.num_heads = config.num_attention_heads
+-++#         self.head_dim = self.hidden_size // self.num_heads
+-++#         self.num_key_value_heads = config.num_key_value_heads
+-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++#         self.max_position_embeddings = config.max_position_embeddings
+-++#         self.rope_theta = config.rope_theta
+-++#         self.is_causal = True
+-++
+-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++#             raise ValueError(
+-++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++#                 f" and `num_heads`: {self.num_heads})."
+-++#             )
+-++
+-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++#         self._init_rope()
+-++
+-++#     def _init_rope(self):
+-++#         if self.config.rope_scaling is None:
+-++#             self.rotary_emb = DeepseekRotaryEmbedding(
+-++#                 self.head_dim,
+-++#                 max_position_embeddings=self.max_position_embeddings,
+-++#                 base=self.rope_theta,
+-++#             )
+-++#         else:
+-++#             scaling_type = self.config.rope_scaling["type"]
+-++#             scaling_factor = self.config.rope_scaling["factor"]
+-++#             if scaling_type == "linear":
+-++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++#                     self.head_dim,
+-++#                     max_position_embeddings=self.max_position_embeddings,
+-++#                     scaling_factor=scaling_factor,
+-++#                     base=self.rope_theta,
+-++#                 )
+-++#             elif scaling_type == "dynamic":
+-++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++#                     self.head_dim,
+-++#                     max_position_embeddings=self.max_position_embeddings,
+-++#                     scaling_factor=scaling_factor,
+-++#                     base=self.rope_theta,
+-++#                 )
+-++#             else:
+-++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++
+-++#     def forward(
+-++#         self,
+-++#         hidden_states: mindspore.Tensor,
+-++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++#         position_ids: Optional[mindspore.Tensor] = None,
+-++#         past_key_value: Optional[Cache] = None,
+-++#         output_attentions: bool = False,
+-++#         use_cache: bool = False,
+-++#         **kwargs,
+-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++#         if "padding_mask" in kwargs:
+-++#             warnings.warn(
+-++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++#             )
+-++        
+-++#         if output_attentions:
+-++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-++
+-++#         bsz, q_len, _ = hidden_states.shape
+-++
+-++#         if self.config.pretraining_tp > 1:
+-++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++
+-++#         query_states = self.q_proj(hidden_states)
+-++#         key_states = self.k_proj(hidden_states)
+-++#         value_states = self.v_proj(hidden_states)
+-++
+-++#         # Reshape for multi-head attention
+-++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++#         kv_seq_len = key_states.shape[-2]
+-++#         if past_key_value is not None:
+-++#             if self.layer_idx is None:
+-++#                 raise ValueError(
+-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++#                     "with a layer index."
+-++#                 )
+-++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++        
+-++#         # Apply Rotary Positional Embedding
+-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++#         if past_key_value is not None:
+-++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++
+-++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++        
+-++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++        
+-++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++
+-++#         # Convert attention_mask for flash_attention_score
+-++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-++#         if attention_mask is not None:
+-++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++#                 raise ValueError(
+-++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++#                 )
+-++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-++#         else:
+-++#             attn_mask_for_fa = None
+-++        
+-++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++
+-++#         # Call the fused flash_attention_score operator
+-++#         attn_output = mindspore.ops.flash_attention_score(
+-++#             query=query_states_for_fa,
+-++#             key=key_states_for_fa,
+-++#             value=value_states_for_fa,
+-++#             head_num=self.num_heads, # This is N1, the number of query heads
+-++#             input_layout='BSH',
+-++#             attn_mask=attn_mask_for_fa,
+-++#             keep_prob=keep_prob,
+-++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++#             sparse_mode=0 # Default mask mode
+-++#         )
+-++        
+-++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-++#         attn_output = self.o_proj(attn_output)
+-++        
+-++#         # Flash Attention does not return attention weights
+-++#         attn_weights = None
+-++
+-++#         return attn_output, attn_weights, past_key_value
+-++
+-++class DeepseekFlashAttention(nn.Module):
+-++    """
+-++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-++    designed for high performance on supported hardware (Ascend).
+-++
+-++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-++    """
+-++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++        super().__init__()
+-++        self.config = config
+-++        self.layer_idx = layer_idx
+-++        if layer_idx is None:
+-++            logger.warning(
+-++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++                "when creating this class."
+-++            )
+-++
+-++        # --- [FIX] Correctly initialize all required attributes ---
+-++        self.attention_dropout = config.attention_dropout
+-++        self.hidden_size = config.hidden_size
+-++        self.num_heads = config.num_attention_heads
+-++        self.head_dim = self.hidden_size // self.num_heads
+-++        self.num_key_value_heads = config.num_key_value_heads
+-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++        self.max_position_embeddings = config.max_position_embeddings
+-++        self.rope_theta = config.rope_theta
+-++        self.is_causal = True
+-++
+-++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++            raise ValueError(
+-++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++                f" and `num_heads`: {self.num_heads})."
+-++            )
+-++
+-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++        
+-++        # This call will now succeed as all attributes are initialized.
+-++        self._init_rope()
+-++
+-++    def _init_rope(self):
+-++        if self.config.rope_scaling is None:
+-++            self.rotary_emb = DeepseekRotaryEmbedding(
+-++                self.head_dim,
+-++                max_position_embeddings=self.max_position_embeddings,
+-++                base=self.rope_theta,
+-++            )
+-++        else:
+-++            scaling_type = self.config.rope_scaling["type"]
+-++            scaling_factor = self.config.rope_scaling["factor"]
+-++            if scaling_type == "linear":
+-++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++                    self.head_dim,
+-++                    max_position_embeddings=self.max_position_embeddings,
+-++                    scaling_factor=scaling_factor,
+-++                    base=self.rope_theta,
+-++                )
+-++            elif scaling_type == "dynamic":
+-++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++                    self.head_dim,
+-++                    max_position_embeddings=self.max_position_embeddings,
+-++                    scaling_factor=scaling_factor,
+-++                    base=self.rope_theta,
+-++                )
+-++            else:
+-++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++
+-++    def forward(
+-++        self,
+-++        hidden_states: mindspore.Tensor,
+-++        attention_mask: Optional[mindspore.Tensor] = None,
+-++        position_ids: Optional[mindspore.Tensor] = None,
+-++        past_key_value: Optional[Cache] = None,
+-++        output_attentions: bool = False,
+-++        use_cache: bool = False,
+-++        **kwargs,
+-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++        if "padding_mask" in kwargs:
+-++            warnings.warn(
+-++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++            )
+-++        if output_attentions:
+-++            warnings.warn(
+-++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-++            )
+-++
+-++        bsz, q_len, _ = hidden_states.shape
+-++
+-++        if self.config.pretraining_tp > 1:
+-++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++
+-++        query_states = self.q_proj(hidden_states)
+-++        key_states = self.k_proj(hidden_states)
+-++        value_states = self.v_proj(hidden_states)
+-++
+-++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++        kv_seq_len = key_states.shape[-2]
+-++        if past_key_value is not None:
+-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++        
+-++        # Apply Rotary Position Embedding
+-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++        if past_key_value is not None:
+-++            cache_kwargs = {"sin": sin, "cos": cos}
+-++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++
+-++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-++        # So we must explicitly repeat the KV heads.
+-++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++
+-++        # Convert attention mask for flash_attention_score
+-++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-++        if attention_mask is not None:
+-++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++                 raise ValueError(
+-++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++                )
+-++            attn_mask_for_fa = attention_mask < 0
+-++        else:
+-++            attn_mask_for_fa = None
+-++
+-++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++
+-++        # Call the fused operator using the efficient BNSD layout
+-++        attn_output = mindspore.ops.flash_attention_score(
+-++            query=query_states,
+-++            key=key_states,
+-++            value=value_states,
+-++            head_num=self.num_heads,
+-++            input_layout='BNSD', # Specify the correct layout
+-++            attn_mask=attn_mask_for_fa,
+-++            keep_prob=keep_prob,
+-++            scalar_value=1.0 / math.sqrt(self.head_dim)
+-++        )
+-++        
+-++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++        
+-++        # Apply output projection
+-++        attn_output = self.o_proj(attn_output)
+-++
+-++        # Flash attention does not return attention weights, so we return None.
+-++        attn_weights = None
+-++
+-++        return attn_output, attn_weights, past_key_value
+-++
+-+ Deepseek_ATTENTION_CLASSES = {
+-+     "eager": DeepseekAttention,
+-++    "flash-attention": DeepseekFlashAttention,
+-+ }
+-+ 
+-+ 
+-+@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-+             config=config, layer_idx=layer_idx
+-+         )
+-+ 
+-++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-++            config=config, layer_idx=layer_idx
+-++        )
+-++
+-+         self.mlp = (
+-+             DeepseekMoE(config)
+-+             if (
+-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+index d4c6b651..bced285c 100644
+-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+-+ 
+-+ import mindspore
+-+ import mindnlp.core.nn.functional as F
+-+-from mindnlp.core import nn, ops
+-++from mindnlp.core import nn, ops, no_grad
+-+ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+-+ 
+-+ from ....common.activations import ACT2FN
+-+@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+-+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+ 
+-++Long_Prompt = False
+-++PROMPT_LENGTH_THRESHOLD = 128
+-+ 
+-+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+ 
+-++# class Qwen2MoeFlashAttention(nn.Module):
+-++#     """
+-++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++
+-++#     关键改动:
+-++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++#        直接传入原始的 key 和 value 张量效率更高。
+-++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++#         super().__init__()
+-++#         self.config = config
+-++#         self.layer_idx = layer_idx
+-++#         self.hidden_size = config.hidden_size
+-++#         self.num_heads = config.num_attention_heads
+-++#         self.head_dim = self.hidden_size // self.num_heads
+-++#         self.num_key_value_heads = config.num_key_value_heads
+-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++#         self.max_position_embeddings = config.max_position_embeddings
+-++#         self.rope_theta = config.rope_theta
+-++#         self.attention_dropout = config.attention_dropout
+-++
+-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++#             raise ValueError(
+-++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++#             )
+-++
+-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++
+-++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++#             self.head_dim,
+-++#             max_position_embeddings=self.max_position_embeddings,
+-++#             base=self.rope_theta,
+-++#         )
+-++
+-++#     def forward(
+-++#         self,
+-++#         hidden_states: mindspore.Tensor,
+-++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++#         position_ids: Optional[mindspore.Tensor] = None,
+-++#         past_key_value: Optional[Cache] = None,
+-++#         output_attentions: bool = False,
+-++#         use_cache: bool = False,
+-++#         cache_position: Optional[mindspore.Tensor] = None,
+-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++#         bsz, q_len, _ = hidden_states.shape
+-++
+-++#         # 1. 线性投射 Q, K, V
+-++#         query_states = self.q_proj(hidden_states)
+-++#         key_states = self.k_proj(hidden_states)
+-++#         value_states = self.v_proj(hidden_states)
+-++
+-++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++#         # 3. RoPE 旋转位置编码
+-++#         kv_seq_len = key_states.shape[-2]
+-++#         if past_key_value is not None:
+-++#             if self.layer_idx is None:
+-++#                 raise ValueError(
+-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++#                     "with a layer index."
+-++#                 )
+-++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++#                 if cache_position.shape[0] == 1:
+-++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++#                     kv_seq_len = past_seen_tokens + 1
+-++#                 else:
+-++#                     # prefill 阶段：cache_position 是范围，使用其长度
+-++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++#             else:
+-++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++#         # 4. KV 缓存更新
+-++#         if past_key_value is not None:
+-++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++#             key_states, value_states = past_key_value.update(
+-++#                 key_states, value_states, self.layer_idx, cache_kwargs
+-++#             )
+-++            
+-++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++#                 if cache_position.shape[0] == 1:
+-++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++#                     kv_seq_len = key_states.shape[-2]
+-++
+-++#         # 5. [重要] 准备 Attention Mask
+-++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++#         fa_attention_mask = None
+-++#         if attention_mask is not None:
+-++#             # 截取与当前key长度匹配的部分
+-++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++#             fa_attention_mask = (mask_slice != 0)
+-++
+-++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++#         input_dtype = query_states.dtype
+-++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++#             query_states = query_states.to(mindspore.float16)
+-++#             key_states = key_states.to(mindspore.float16)
+-++#             value_states = value_states.to(mindspore.float16)
+-++
+-++#         # 6. [核心] 调用 flash_attention_score 算子
+-++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++#         attn_output = mindspore.ops.flash_attention_score(
+-++#             query=query_states,
+-++#             key=key_states,
+-++#             value=value_states,
+-++#             head_num=self.num_heads, # 传入Q的头数(N1)
+-++#             attn_mask=fa_attention_mask,
+-++#             keep_prob=1.0 - self.attention_dropout,
+-++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++#             input_layout="BNSD",
+-++#             sparse_mode=0 # 使用 defaultMask 模式
+-++#         )
+-++
+-++#         # 恢复原始数据类型
+-++#         attn_output = attn_output.to(input_dtype)
+-++
+-++#         # 7. 调整输出形状
+-++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++#         attn_output = self.o_proj(attn_output)
+-++
+-++#         # FlashAttention 算子不直接返回注意力权重矩阵
+-++#         attn_weights = None
+-++#         if output_attentions:
+-++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++#         return attn_output, attn_weights, past_key_value
+-++
+-++#     # def forward(
+-++#     #     self,
+-++#     #     hidden_states: mindspore.Tensor,
+-++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-++#     #     position_ids: Optional[mindspore.Tensor] = None,
+-++#     #     past_key_value: Optional[Cache] = None,
+-++#     #     output_attentions: bool = False,
+-++#     #     use_cache: bool = False,
+-++#     #     cache_position: Optional[mindspore.Tensor] = None,
+-++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++#     #     bsz, q_len, _ = hidden_states.shape
+-++
+-++#     #     # 1. 线性投射 Q, K, V
+-++#     #     query_states = self.q_proj(hidden_states)
+-++#     #     key_states = self.k_proj(hidden_states)
+-++#     #     value_states = self.v_proj(hidden_states)
+-++
+-++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++
+-++#     #     # 3. RoPE 旋转位置编码
+-++#     #     kv_seq_len = key_states.shape[-2]
+-++#     #     if past_key_value is not None:
+-++#     #         if self.layer_idx is None:
+-++#     #             raise ValueError(
+-++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++#     #                 "with a layer index."
+-++#     #             )
+-++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++
+-++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++#     #     # 4. KV 缓存更新
+-++#     #     if past_key_value is not None:
+-++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++#     #         key_states, value_states = past_key_value.update(
+-++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-++#     #         )
+-++
+-++#     #     # 5. 准备 Attention Mask
+-++#     #     fa_attention_mask = None
+-++#     #     if attention_mask is not None:
+-++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++#     #         fa_attention_mask = (mask_slice != 0)
+-++
+-++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++#     #     input_dtype = query_states.dtype
+-++
+-++#     #     # 6. [核心] 调用 flash_attention_score 算子
+-++#     #     attn_output = mindspore.ops.flash_attention_score(
+-++#     #         query=query_states,
+-++#     #         key=key_states,
+-++#     #         value=value_states,
+-++#     #         head_num=self.num_heads,
+-++#     #         attn_mask=fa_attention_mask,
+-++#     #         keep_prob=1.0 - self.attention_dropout,
+-++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++#     #         input_layout="BNSD",
+-++#     #         sparse_mode=0,
+-++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++#     #         inner_precise=1
+-++#     #     )
+-++
+-++#     #     # 恢复原始数据类型
+-++#     #     attn_output = attn_output.to(input_dtype)
+-++
+-++#     #     # 7. 调整输出形状
+-++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++#     #     attn_output = self.o_proj(attn_output)
+-++
+-++#     #     attn_weights = None
+-++#     #     if output_attentions:
+-++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++
+-++#     #     return attn_output, attn_weights, past_key_value
+-++
+-++
+-+ class Qwen2MoeFlashAttention(nn.Module):
+-+     """
+-+-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+-
+-+-    关键改动:
+-+-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+-       直接传入原始的 key 和 value 张量效率更高。
+-+-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-++
+-++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-++    完全使用模型的低精度数据类型（如 float16）进行计算，
+-++    以达到理论上的最高执行速度。
+-+     """
+-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+         super().__init__()
+-+         self.config = config
+-+         self.layer_idx = layer_idx
+-++        if layer_idx is None:
+-++            logger.warning_once(
+-++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-++            )
+-++
+-+         self.hidden_size = config.hidden_size
+-+         self.num_heads = config.num_attention_heads
+-+         self.head_dim = self.hidden_size // self.num_heads
+-+         self.num_key_value_heads = config.num_key_value_heads
+-+-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+         self.max_position_embeddings = config.max_position_embeddings
+-+         self.rope_theta = config.rope_theta
+-+         self.attention_dropout = config.attention_dropout
+-+ 
+-+-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+-            raise ValueError(
+-+-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+-            )
+-+-
+-+         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-+         key_states = self.k_proj(hidden_states)
+-+         value_states = self.v_proj(hidden_states)
+-+ 
+-+-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++        # 2. 调整形状以匹配 BNSD 布局
+-+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-
+-+-        # 3. RoPE 旋转位置编码
+-++        
+-++        # 3. RoPE 和 KV 缓存
+-+         kv_seq_len = key_states.shape[-2]
+-+         if past_key_value is not None:
+-+-            if self.layer_idx is None:
+-+-                raise ValueError(
+-+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-                    "with a layer index."
+-+-                )
+-+-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+-                if cache_position.shape[0] == 1:
+-+-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+-                    kv_seq_len = past_seen_tokens + 1
+-+-                else:
+-+-                    # prefill 阶段：cache_position 是范围，使用其长度
+-+-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+-            else:
+-+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-
+-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++        
+-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+ 
+-+-        # 4. KV 缓存更新
+-+         if past_key_value is not None:
+-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-            key_states, value_states = past_key_value.update(
+-+-                key_states, value_states, self.layer_idx, cache_kwargs
+-+-            )
+-+-            
+-+-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+-                if cache_position.shape[0] == 1:
+-+-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+-                    kv_seq_len = key_states.shape[-2]
+-+-
+-+-        # 5. [重要] 准备 Attention Mask
+-+-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++
+-++        # 4. 准备 Attention Mask
+-+         fa_attention_mask = None
+-+         if attention_mask is not None:
+-+-            # 截取与当前key长度匹配的部分
+-+-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+             fa_attention_mask = (mask_slice != 0)
+-+ 
+-+-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+-        input_dtype = query_states.dtype
+-+-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+-            query_states = query_states.to(mindspore.float16)
+-+-            key_states = key_states.to(mindspore.float16)
+-+-            value_states = value_states.to(mindspore.float16)
+-+-
+-+-        # 6. [核心] 调用 flash_attention_score 算子
+-+-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-+         attn_output = mindspore.ops.flash_attention_score(
+-+             query=query_states,
+-+             key=key_states,
+-+             value=value_states,
+-+-            head_num=self.num_heads, # 传入Q的头数(N1)
+-++            head_num=self.num_heads,
+-+             attn_mask=fa_attention_mask,
+-+-            keep_prob=1.0 - self.attention_dropout,
+-++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-+             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+             input_layout="BNSD",
+-+-            sparse_mode=0 # 使用 defaultMask 模式
+-++            sparse_mode=0,
+-++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-+         )
+-+ 
+-+-        # 恢复原始数据类型
+-+-        attn_output = attn_output.to(input_dtype)
+-+-
+-+-        # 7. 调整输出形状
+-+-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++        # 6. 调整输出形状
+-+         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+         attn_output = self.o_proj(attn_output)
+-+ 
+-+-        # FlashAttention 算子不直接返回注意力权重矩阵
+-++        # 7. 返回结果
+-+         attn_weights = None
+-+         if output_attentions:
+-+-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+-    # def forward(
+-+-    #     self,
+-+-    #     hidden_states: mindspore.Tensor,
+-+-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+-    #     past_key_value: Optional[Cache] = None,
+-+-    #     output_attentions: bool = False,
+-+-    #     use_cache: bool = False,
+-+-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-
+-+-    #     bsz, q_len, _ = hidden_states.shape
+-+-
+-+-    #     # 1. 线性投射 Q, K, V
+-+-    #     query_states = self.q_proj(hidden_states)
+-+-    #     key_states = self.k_proj(hidden_states)
+-+-    #     value_states = self.v_proj(hidden_states)
+-+-
+-+-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-
+-+-    #     # 3. RoPE 旋转位置编码
+-+-    #     kv_seq_len = key_states.shape[-2]
+-+-    #     if past_key_value is not None:
+-+-    #         if self.layer_idx is None:
+-+-    #             raise ValueError(
+-+-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-    #                 "with a layer index."
+-+-    #             )
+-+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+ 
+-+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-
+-+-    #     # 4. KV 缓存更新
+-+-    #     if past_key_value is not None:
+-+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-    #         key_states, value_states = past_key_value.update(
+-+-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+-    #         )
+-+-
+-+-    #     # 5. 准备 Attention Mask
+-+-    #     fa_attention_mask = None
+-+-    #     if attention_mask is not None:
+-+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-    #         fa_attention_mask = (mask_slice != 0)
+-+-
+-+-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+-    #     input_dtype = query_states.dtype
+-+-
+-+-    #     # 6. [核心] 调用 flash_attention_score 算子
+-+-    #     attn_output = mindspore.ops.flash_attention_score(
+-+-    #         query=query_states,
+-+-    #         key=key_states,
+-+-    #         value=value_states,
+-+-    #         head_num=self.num_heads,
+-+-    #         attn_mask=fa_attention_mask,
+-+-    #         keep_prob=1.0 - self.attention_dropout,
+-+-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-    #         input_layout="BNSD",
+-+-    #         sparse_mode=0,
+-+-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+-    #         inner_precise=1
+-+-    #     )
+-+-
+-+-    #     # 恢复原始数据类型
+-+-    #     attn_output = attn_output.to(input_dtype)
+-++QWEN2MOE_ATTENTION_CLASSES = {
+-++    "eager": Qwen2MoeAttention,
+-++    "flash-attention": Qwen2MoeFlashAttention,
+-++}
+-+ 
+-+-    #     # 7. 调整输出形状
+-+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-    #     attn_output = self.o_proj(attn_output)
+-+ 
+-+-    #     attn_weights = None
+-+-    #     if output_attentions:
+-+-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     def __init__(self, config):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         # gating
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     #@dwj
+-++#     # 只遍历激活的专家，而非全部专家
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#             num_tokens = hidden_states_reshaped.shape[0]
+-++
+-++#             router_logits = self.gate(hidden_states_reshaped)
+-++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++
+-++#             if self.norm_topk_prob:
+-++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++#             routing_weights = routing_weights.to(hidden_states.dtype)
+-++            
+-++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++#             flat_selected_experts = selected_experts.flatten()
+-++            
+-++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++#             token_indices = broadcasted_token_indices.flatten()
+-++            
+-++#             active_experts = ops.unique(flat_selected_experts)
+-++            
+-++#             for expert_idx_tensor in active_experts:
+-++#                 expert_idx = expert_idx_tensor.item()
+-++#                 expert_layer = self.experts[expert_idx]
+-++                
+-++#                 mask = (flat_selected_experts == expert_idx_tensor)
+-++#                 selected_token_indices = token_indices[mask]
+-++#                 selected_routing_weights = routing_weights.flatten()[mask]
+-++                
+-++#                 current_states = hidden_states_reshaped[selected_token_indices]
+-++                
+-++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++                
+-++#                 final_hidden_states = final_hidden_states.index_add(
+-++#                     dim=0,
+-++#                     index=selected_token_indices,
+-++#                     source=expert_output.to(hidden_states.dtype)
+-++#                 )
+-++            
+-++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+ 
+-+-    #     return attn_output, attn_weights, past_key_value
+-++#             final_hidden_states = final_hidden_states + shared_expert_output
+-++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++            
+-++#             return final_hidden_states, router_logits
+-++
+-++
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     """
+-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         # 门控网络
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         # 专家列表
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++#         # 共享专家
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_decode(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         """
+-++#         【解码路径】针对 sequence_length=1 的极致优化。
+-++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-++#         """
+-++#         batch_size, hidden_dim = hidden_states.shape
+-++        
+-++#         expert_outputs_list = [
+-++#             ops.cat([
+-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++#             ], dim=0) 
+-++#             for i in range(batch_size)
+-++#         ]
+-++        
+-++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-++#         # shape: (batch_size, top_k, hidden_dim)
+-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++        
+-++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++        
+-++#         return moe_output.squeeze(1)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_prefill(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         """
+-++#         【预填充路径】针对 sequence_length > 1 的优化。
+-++#         按专家对 Token 进行分组，并进行批处理。
+-++#         """
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens = hidden_states.shape[0]
+-++#         flat_selected_experts = selected_experts.flatten()
+-++        
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++        
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++        
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++            
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++#             selected_token_indices = token_indices[mask]
+-++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++            
+-++#             current_states = hidden_states[selected_token_indices]
+-++            
+-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++            
+-++#             moe_output = moe_output.index_add(
+-++#                 dim=0,
+-++#                 index=selected_token_indices,
+-++#                 source=expert_output.to(hidden_states.dtype)
+-++#             )
+-++#         return moe_output
+-++
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#         """
+-++#         顶层 forward 方法，作为智能分发器。
+-++#         """
+-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++        
+-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#         router_logits = self.gate(hidden_states_reshaped)
+-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+ 
+-+-    # def forward(
+-+-    #     self,
+-+-    #     hidden_states: mindspore.Tensor,
+-+-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+-    #     past_key_value: Optional[Cache] = None,
+-+-    #     output_attentions: bool = False,
+-+-    #     use_cache: bool = False,
+-+-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-
+-+-    #     bsz, q_len, _ = hidden_states.shape
+-+-
+-+-    #     query_states = self.q_proj(hidden_states)
+-+-    #     key_states = self.k_proj(hidden_states)
+-+-    #     value_states = self.v_proj(hidden_states)
+-+-
+-+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-
+-+-    #     kv_seq_len = key_states.shape[-2]
+-+-    #     if past_key_value is not None:
+-+-    #         if self.layer_idx is None:
+-+-    #             raise ValueError("`layer_idx` must be specified for caching")
+-+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-
+-+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-
+-+-    #     if past_key_value is not None:
+-+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-    #         key_states, value_states = past_key_value.update(
+-+-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+-    #         )
+-++#         if self.norm_topk_prob:
+-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++        
+-++#         moe_output = None
+-++#         # 在推理时，根据序列长度选择最优路径
+-++#         if not self.training:
+-++#             if sequence_length == 1:
+-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++#             else:
+-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++#         else:
+-++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-++#             raise NotImplementedError("Training path is not implemented.")
+-++
+-++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-++        
+-++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-++        
+-++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++#         return final_hidden_states, router_logits
+-++
+-++
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     """
+-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         # 门控网络
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         # 专家列表
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++#         # 共享专家
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_decode(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         batch_size, _ = hidden_states.shape
+-++#         expert_outputs_list = [
+-++#             ops.cat([
+-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++#             ], dim=0) 
+-++#             for i in range(batch_size)
+-++#         ]
+-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++#         return moe_output.squeeze(1)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_prefill(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens = hidden_states.shape[0]
+-++#         flat_selected_experts = selected_experts.flatten()
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++        
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++#             selected_token_indices = token_indices[mask]
+-++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++#             current_states = hidden_states[selected_token_indices]
+-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++#             moe_output = moe_output.index_add(
+-++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++#             )
+-++#         return moe_output
+-++
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#         """
+-++#         顶层 forward 方法，作为智能分发器。
+-++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-++#         """
+-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++        
+-++#         # 1. 门控计算 (通用逻辑)
+-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#         router_logits = self.gate(hidden_states_reshaped)
+-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++
+-++#         if self.norm_topk_prob:
+-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++        
+-++#         # 2. 智能分发到最优 MoE 路径
+-++#         moe_output = None
+-++#         if not self.training:
+-++#             if sequence_length == 1:
+-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++#             else:
+-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++#         else:
+-++#             raise NotImplementedError("Training path is not implemented.")
+-++
+-++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++        
+-++#         # 4. 合并 MoE 输出和共享专家输出
+-++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++        
+-++#         # 5. 恢复原始形状并返回
+-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++#         return final_hidden_states, router_logits
+-++
+-++# prefill fastest
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     """
+-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         # 门控网络
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         # 专家列表
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++#         # 共享专家
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_dispatch(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         """
+-++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-++#         """
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens, _ = hidden_states.shape
+-++        
+-++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-++#         flat_selected_experts = selected_experts.flatten()
+-++#         flat_routing_weights = routing_weights.flatten()
+-+ 
+-+-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+-
+-+-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+-    #     query_states = query_states / math.sqrt(self.head_dim)
+-+-    #     # <--- 修改结束 ---
+-+-
+-+-    #     fa_attention_mask = None
+-+-    #     if attention_mask is not None:
+-+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-    #         fa_attention_mask = (mask_slice != 0)
+-+-
+-+-    #     input_dtype = query_states.dtype
+-+-
+-+-    #     attn_output = mindspore.ops.flash_attention_score(
+-+-    #         query=query_states,    # 传入已经预先缩放过的 query
+-+-    #         key=key_states,
+-+-    #         value=value_states,
+-+-    #         head_num=self.num_heads,
+-+-    #         attn_mask=fa_attention_mask,
+-+-    #         keep_prob=1.0 - self.attention_dropout,
+-+-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+-    #         input_layout="BNSD",
+-+-    #         sparse_mode=0,
+-+-    #         inner_precise=1        # 仍然保持内部高精度计算
+-+-    #     )
+-++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+ 
+-+-    #     attn_output = attn_output.to(input_dtype)
+-+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-    #     attn_output = self.o_proj(attn_output)
+-++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++        
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++            
+-++#             # 找到所有分配给该专家的 token
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++            
+-++#             # 使用 mask 选取对应的 token 和权重
+-++#             current_token_indices = token_indices[mask]
+-++#             current_routing_weights = flat_routing_weights[mask]
+-++#             current_hidden_states = hidden_states[current_token_indices]
+-++            
+-++#             # 对这些 token 进行批处理
+-++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++            
+-++#             # 使用 index_add 将结果精确地加回到对应位置
+-++#             moe_output = moe_output.index_add(
+-++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-++#             )
+-++#         return moe_output
+-++
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#         """
+-++#         顶层 forward 方法，作为智能分发器。
+-++#         """
+-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++        
+-++#         # 1. 门控计算
+-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#         router_logits = self.gate(hidden_states_reshaped)
+-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++
+-++#         if self.norm_topk_prob:
+-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++        
+-++#         # 2. 调用统一的 MoE 计算内核
+-++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+ 
+-+-    #     attn_weights = None
+-+-    #     if output_attentions:
+-+-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++#         # 3. 统一处理共享专家
+-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++        
+-++#         # 4. 合并输出
+-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++        
+-++#         # 5. 恢复原始形状并返回
+-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++#         return final_hidden_states, router_logits
+-++
+-++
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     """
+-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++#     【最终高性能与高精度版】：
+-++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-++#     3. 这样实现了速度和准确性的两全其美。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_decode(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         """
+-++#         【解码路径】极致优化版：bmm + 高精度累加。
+-++#         """
+-++#         original_dtype = hidden_states.dtype
+-++#         batch_size, _ = hidden_states.shape
+-++        
+-++#         expert_outputs_list = [
+-++#             ops.cat([
+-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++#             ], dim=0) 
+-++#             for i in range(batch_size)
+-++#         ]
+-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++
+-++#         # 在 float32 下执行 bmm，得到高精度结果
+-++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++        
+-++#         # 将高精度结果转换回原始数据类型
+-++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-++        
+-++#         return moe_output
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_prefill(
+-++#         self, 
+-++#         hidden_states: mindspore.Tensor, 
+-++#         selected_experts: mindspore.Tensor, 
+-++#         routing_weights: mindspore.Tensor
+-++#     ) -> mindspore.Tensor:
+-++#         """
+-++#         【预填充路径】与原始实现一致，结果精确。
+-++#         """
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens, _ = hidden_states.shape
+-++#         flat_selected_experts = selected_experts.flatten()
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++        
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++#             selected_token_indices = token_indices[mask]
+-++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++#             current_states = hidden_states[selected_token_indices]
+-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++#             moe_output = moe_output.index_add(
+-++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++#             )
+-++#         return moe_output
+-++
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++        
+-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#         router_logits = self.gate(hidden_states_reshaped)
+-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+ 
+-+-    #     return attn_output, attn_weights, past_key_value
+-++#         if self.norm_topk_prob:
+-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-++#         # 如果模型主体是 float16，后续再转换
+-++        
+-++#         moe_output = None
+-++#         if not self.training:
+-++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-++#             # _moe_infer_decode 内部会处理好类型转换
+-++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-++#             if sequence_length == 1:
+-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++#             else:
+-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++#         else:
+-++#             raise NotImplementedError("Training path is not implemented.")
+-++
+-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++        
+-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++#         return final_hidden_states, router_logits
+-++    
+-+ 
+-+-QWEN2MOE_ATTENTION_CLASSES = {
+-+-    "eager": Qwen2MoeAttention,
+-+-    "flash-attention": Qwen2MoeFlashAttention,
+-+-}
+-++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++#     """
+-++#     【融合版】一个混合专家模块，内置两种推理策略，
+-++#     由外部全局变量 `Long_Prompt` 控制：
+-++
+-++#     - if Long_Prompt is True:  【精度优先模式】
+-++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-++#       适用于处理长序列，避免误差累积。
+-++
+-++#     - if Long_Prompt is False: 【速度优先模式】
+-++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-++#       在解码阶段获得极致速度，同时保证结果高度准确。
+-++#     """
+-++#     def __init__(self, config: Qwen2MoeConfig):
+-++#         super().__init__()
+-++#         self.num_experts = config.num_experts
+-++#         self.top_k = config.num_experts_per_tok
+-++#         self.norm_topk_prob = config.norm_topk_prob
+-++
+-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++#         self.experts = nn.ModuleList(
+-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++#         )
+-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++
+-++#     # --- 速度优先模式的辅助函数 ---
+-++#     @no_grad()
+-++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++#         original_dtype = hidden_states.dtype
+-++#         batch_size, _ = hidden_states.shape
+-++#         expert_outputs_list = [
+-++#             ops.cat([
+-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++#             ], dim=0) 
+-++#             for i in range(batch_size)
+-++#         ]
+-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++#         weights_fp32 = routing_weights.to(mindspore.float32)
+-++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++
+-++#     @no_grad()
+-++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens, _ = hidden_states.shape
+-++#         flat_selected_experts = selected_experts.flatten()
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++#             selected_token_indices = token_indices[mask]
+-++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++#             current_states = hidden_states[selected_token_indices]
+-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-++#         return moe_output
+-++
+-++#     # --- 精度优先模式的辅助函数 ---
+-++#     @no_grad()
+-++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++#         moe_output = ops.zeros_like(hidden_states)
+-++#         num_tokens, _ = hidden_states.shape
+-++#         flat_selected_experts = selected_experts.flatten()
+-++#         flat_routing_weights = routing_weights.flatten()
+-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++#         active_experts = ops.unique(flat_selected_experts)
+-++#         for expert_idx_tensor in active_experts:
+-++#             expert_idx = expert_idx_tensor.item()
+-++#             expert_layer = self.experts[expert_idx]
+-++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++#             current_token_indices = token_indices[mask]
+-++#             current_routing_weights = flat_routing_weights[mask]
+-++#             current_hidden_states = hidden_states[current_token_indices]
+-++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++#         return moe_output
+-++
+-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++#         # 声明我们将要使用一个在模块外部定义的全局变量
+-++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-++#         global Long_Prompt
+-++
+-++#         # 1. 门控计算 (所有模式通用)
+-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++#         router_logits = self.gate(hidden_states_reshaped)
+-++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++#         if self.norm_topk_prob:
+-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++#         moe_output = None
+-++#         if not self.training:
+-++#             # 根据 Long_Prompt 标志选择模式
+-++#             if Long_Prompt:
+-++#                 # --- 精度优先模式 ---
+-++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++#             else:
+-++#                 # --- 速度优先模式 ---
+-++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++#                 if sequence_length == 1:
+-++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++#                 else:
+-++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++#         else:
+-++#             raise NotImplementedError("Training path is not implemented.")
+-++
+-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++        
+-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++#         return final_hidden_states, router_logits
+-++    
+-++class Qwen2MoeSparseMoeBlock(nn.Module):
+-++    """
+-++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-++    控制的顶级推理策略：
+-+ 
+-++    - if Long_Prompt is True:  【精度优先模式】
+-++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-++      适用于需要严格可复现性的长序列任务。
+-+ 
+-+-class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-    def __init__(self, config):
+-++    - if Long_Prompt is False: 【速度优先模式】
+-++      采用业界最强的性能组合：
+-++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-++    """
+-++    def __init__(self, config: Qwen2MoeConfig):
+-+         super().__init__()
+-+         self.num_experts = config.num_experts
+-+         self.top_k = config.num_experts_per_tok
+-+         self.norm_topk_prob = config.norm_topk_prob
+-+ 
+-+-        # gating
+-+         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+         self.experts = nn.ModuleList(
+-+             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+         )
+-+-
+-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+ 
+-+-    #@dwj
+-+-    # 只遍历激活的专家，而非全部专家
+-+-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-            num_tokens = hidden_states_reshaped.shape[0]
+-+-
+-+-            router_logits = self.gate(hidden_states_reshaped)
+-+-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-
+-+-            if self.norm_topk_prob:
+-+-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-            routing_weights = routing_weights.to(hidden_states.dtype)
+-+-            
+-+-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+-            flat_selected_experts = selected_experts.flatten()
+-+-            
+-+-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+-            token_indices = broadcasted_token_indices.flatten()
+-+-            
+-+-            active_experts = ops.unique(flat_selected_experts)
+-+-            
+-+-            for expert_idx_tensor in active_experts:
+-+-                expert_idx = expert_idx_tensor.item()
+-+-                expert_layer = self.experts[expert_idx]
+-+-                
+-+-                mask = (flat_selected_experts == expert_idx_tensor)
+-+-                selected_token_indices = token_indices[mask]
+-+-                selected_routing_weights = routing_weights.flatten()[mask]
+-+-                
+-+-                current_states = hidden_states_reshaped[selected_token_indices]
+-+-                
+-+-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-                
+-+-                final_hidden_states = final_hidden_states.index_add(
+-+-                    dim=0,
+-+-                    index=selected_token_indices,
+-+-                    source=expert_output.to(hidden_states.dtype)
+-+-                )
+-+-            
+-+-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-++    @no_grad()
+-++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++        original_dtype = hidden_states.dtype
+-++        batch_size, _ = hidden_states.shape
+-++        expert_outputs_list = [
+-++            ops.cat([
+-++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++            ], dim=0)
+-++            for i in range(batch_size)
+-++        ]
+-++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++        weights_fp32 = routing_weights.to(mindspore.float32)
+-++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++        return moe_output_fp32.squeeze(1).to(original_dtype)
+-++
+-++    @no_grad()
+-++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++        num_tokens, _ = hidden_states.shape
+-++        flat_selected_experts = selected_experts.flatten()
+-++        sorted_expert_indices = flat_selected_experts.argsort()
+-++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++        original_token_indices = sorted_expert_indices // self.top_k
+-++        moe_output = ops.zeros_like(hidden_states)
+-++        current_token_offset = 0
+-++        for i in range(self.num_experts):
+-++            expert_token_count = tokens_per_expert[i] - current_token_offset
+-++            if expert_token_count == 0:
+-++                continue
+-++            end_offset = current_token_offset + expert_token_count
+-++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++            expert_hidden_states = hidden_states[expert_original_token_indices]
+-++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++            current_token_offset += expert_token_count
+-++        return moe_output
+-++
+-++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-++    @no_grad()
+-++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++        moe_output = ops.zeros_like(hidden_states)
+-++        num_tokens, _ = hidden_states.shape
+-++        flat_selected_experts = selected_experts.flatten()
+-++        flat_routing_weights = routing_weights.flatten()
+-++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++        active_experts = ops.unique(flat_selected_experts)
+-++        for expert_idx_tensor in active_experts:
+-++            expert_idx = expert_idx_tensor.item()
+-++            expert_layer = self.experts[expert_idx]
+-++            mask = (flat_selected_experts == expert_idx_tensor)
+-++            current_token_indices = token_indices[mask]
+-++            current_routing_weights = flat_routing_weights[mask]
+-++            current_hidden_states = hidden_states[current_token_indices]
+-++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++        return moe_output
+-+ 
+-+-            final_hidden_states = final_hidden_states + shared_expert_output
+-+-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+-            
+-+-            return final_hidden_states, router_logits
+-++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++        global Long_Prompt
+-++
+-++        # 1. 门控计算 (所有模式通用)
+-++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++        router_logits = self.gate(hidden_states_reshaped)
+-++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++        if self.norm_topk_prob:
+-++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++        
+-++        moe_output = None
+-++        if Long_Prompt:
+-++            # --- 精度优先模式 (ACCURACY MODE) ---
+-++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        else:
+-++            # --- 速度优先模式 (SPEED MODE) ---
+-++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++            if sequence_length == 1:
+-++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++            else:
+-++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        
+-+ 
+-++        # 3. 共享专家计算与合并 (所有模式通用)
+-++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++        
+-++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++        
+-++        return final_hidden_states, router_logits
+-+ 
+-+ class Qwen2MoeDecoderLayer(nn.Module):
+-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+         super().__init__()
+-+         self.hidden_size = config.hidden_size
+-++        
+-++        # if Long_Prompt:
+-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++        # else:
+-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+ 
+-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+ 
+-+-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+-
+-+         if (layer_idx not in config.mlp_only_layers) and (
+-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+         ):
+-+@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+             self._warmed_up = True
+-+             self.warmup_moe_model()
+-+ 
+-++
+-++
+-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+         output_router_logits = (
+-+             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-+@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+             router_logits=outputs.router_logits,
+-+         )
+-+ 
+-++    def generate(self, *args, **kwargs):
+-++        """
+-++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-++        """
+-++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++
+-++        input_ids = kwargs.get("input_ids")
+-++        if input_ids is None and args:
+-++            input_ids = args[0]
+-++
+-++        if input_ids is not None:
+-++            prompt_length = input_ids.shape[1]
+-++            
+-++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-++                Long_Prompt = True
+-++            else:
+-++                Long_Prompt = False
+-++
+-++        return super().generate(*args, **kwargs)
+-++    
+-+     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-+     def prepare_inputs_for_generation(
+-+         self,
+-+@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-+         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-+         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-++        
+-+         if past_key_values is not None:
+-+             if inputs_embeds is not None:  # Exception 1
+-+                 if 0 not in input_ids.shape:
+-+@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+             }
+-+         )
+-+         return model_inputs
+-++
+-+ # @lwx
+-+     # def _decode_one_tokens_logits(
+-+     #     self,
+-+@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-+             attentions=outputs.attentions,
+-+         )
+-+ 
+-++
+-+ __all__ = [
+-+     "Qwen2MoeForCausalLM",
+-+     "Qwen2MoeModel",
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+new file mode 100644
+-+index 00000000..6dfb5b93
+-+--- /dev/null
+-++++ b/patches/0001-20251104commit.patch
+-+@@ -0,0 +1,1272 @@
+-++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++Subject: [PATCH] 20251104commit
+-++
+-++---
+-++ mindnlp/transformers/cache_utils.py           |  28 +-
+-++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-++
+-++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++index cadd2e04..02f8d4be 100644
+-++--- a/mindnlp/transformers/cache_utils.py
+-+++++ b/mindnlp/transformers/cache_utils.py
+-++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++             # k_out[:, :, cache_position] = key_states
+-++             # v_out[:, :, cache_position] = value_states
+-++-            if ON_ORANGE_PI:
+-++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++-            else:
+-++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++-
+-+++            # if ON_ORANGE_PI:
+-+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++            # else:
+-+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++            if cache_position.ndim > 1:
+-+++                cache_position = cache_position.flatten()
+-+++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++                cache_position = cache_position.int()
+-+++            
+-+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++            k_out[:, :, cache_position] = key_states
+-+++            v_out[:, :, cache_position] = value_states
+-+++            
+-++         return k_out, v_out
+-++ 
+-++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index c695b944..d8303e45 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++-    x1 = x[..., : x.shape[-1] // 2]
+-++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++    # x1 = x[..., : x.shape[-1] // 2]
+-+++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++         if self.training:
+-++             raise NotImplementedError("Training is not supported yet.")
+-++         else:
+-++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++-        if self.config.n_shared_experts is not None:
+-++-            y = y + self.shared_experts(identity)
+-++-        return y
+-+++            # @lwx
+-+++            if orig_shape[1] == 1:
+-+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++                y=y.view(*orig_shape)
+-+++                if self.config.n_shared_experts is not None:
+-+++                    y = y + self.shared_experts(identity)
+-+++                return y
+-+++            else:
+-+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++                if self.config.n_shared_experts is not None:
+-+++                    y = y + self.shared_experts(identity)
+-+++                return y
+-+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++        # if self.config.n_shared_experts is not None:
+-+++        #     y = y + self.shared_experts(identity)
+-+++        # return y
+-+++        
+-+++    @no_grad()
+-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++
+-+++        expert_cache = ops.zeros_like(x)
+-+++        for i in range(self.num_experts_per_tok):
+-+++            expert_id = flat_expert_indices[i].item()
+-+++            weight = flat_expert_weights[i].item()
+-+++            expert = self.experts[expert_id]
+-+++            expert_out = expert(x)
+-+++            expert_cache += expert_out * weight
+-+++        return expert_cache
+-++ 
+-++     @no_grad()
+-++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-        # expert_cache = torch.zeros_like(x)
+-++-        # idxs = flat_expert_indices.argsort()
+-++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++-        # token_idxs = idxs // self.num_experts_per_tok
+-++-        # for i, end_idx in enumerate(tokens_per_expert):
+-++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++-        #     if start_idx == end_idx:
+-++-        #         continue
+-++-        #     expert = self.experts[i]
+-++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++-        #     expert_tokens = x[exp_token_idx]
+-++-        #     expert_out = expert(expert_tokens)
+-++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++-        # return expert_cache
+-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++         expert_cache = ops.zeros_like(x)
+-++         idxs = flat_expert_indices.argsort()
+-++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++         token_idxs = idxs // self.num_experts_per_tok
+-+++
+-++         for i, end_idx in enumerate(tokens_per_expert):
+-++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++             if start_idx == end_idx:
+-++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++             expert_out = expert(expert_tokens)
+-++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++
+-++         return expert_cache
+-+++        
+-+++    # @no_grad()
+-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     # expert_cache = torch.zeros_like(x)
+-+++    #     # idxs = flat_expert_indices.argsort()
+-+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++    #     #     if start_idx == end_idx:
+-+++    #     #         continue
+-+++    #     #     expert = self.experts[i]
+-+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #     #     expert_tokens = x[exp_token_idx]
+-+++    #     #     expert_out = expert(expert_tokens)
+-+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++    #     # return expert_cache
+-+++    #     expert_cache = ops.zeros_like(x)
+-+++    #     idxs = flat_expert_indices.argsort()
+-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++    #         if start_idx == end_idx:
+-+++    #             continue
+-+++    #         expert = self.experts[i]
+-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #         expert_tokens = x[exp_token_idx]
+-+++    #         expert_out = expert(expert_tokens)
+-+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++
+-+++    #     return expert_cache
+-+++    # @no_grad()
+-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     expert_cache = ops.zeros_like(x)
+-+++
+-+++    #     # 排序保证顺序一致
+-+++    #     idxs = flat_expert_indices.argsort()
+-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++    #     # 找出有 token 的专家
+-+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++
+-+++    #     for i in active_experts.tolist():
+-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++    #         end_idx = tokens_per_expert[i]
+-+++    #         if start_idx == end_idx:  # 没有 token
+-+++    #             continue
+-+++
+-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #         expert_tokens = x[exp_token_idx]
+-+++    #         expert_out = self.experts[i](expert_tokens)
+-+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++
+-+++    #         expert_cache = mindspore.mint.scatter_add(
+-+++    #             expert_cache,
+-+++    #             0,
+-+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++    #             expert_out
+-+++    #         )
+-+++
+-+++    #     return expert_cache
+-+++
+-+++
+-++ 
+-++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++ #     """
+-++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++ 
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-+++        self.warm_up = False
+-+++
+-+++    def warmup_moe_model_deep(self):
+-+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++        test_texts = [
+-+++            "warmup short",
+-+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++        ]
+-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++        if tokenizer is None:
+-+++            from mindnlp.transformers import AutoTokenizer
+-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++            self._warmup_tokenizer = tokenizer
+-+++
+-+++        for text in test_texts:
+-+++            inputs = tokenizer(text, return_tensors="ms")
+-+++            with mindspore._no_grad():
+-+++                _ = self(**inputs, use_cache=False)
+-+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++ 
+-++     def get_input_embeddings(self):
+-++         return self.model.embed_tokens
+-++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++         ```"""
+-+++        if not self.warm_up:
+-+++            self.warm_up = True
+-+++            self.warmup_moe_model_deep()
+-+++
+-++         output_attentions = (
+-++             output_attentions
+-++             if output_attentions is not None
+-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++index 3cbf820e..d4c6b651 100644
+-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++@@ -18,7 +18,6 @@
+-++ # See the License for the specific language governing permissions and
+-++ # limitations under the License.
+-++ """MindSpore Qwen2MoE model."""
+-++-
+-++ import math
+-++ from typing import List, Optional, Tuple, Union
+-++ 
+-++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++     TokenClassifierOutput,
+-++ )
+-++ from ...modeling_utils import PreTrainedModel
+-+++from ...generation import GenerationMixin
+-++ from ....utils import logging
+-++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-++ 
+-++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++         self.variance_epsilon = eps
+-++ 
+-++     def forward(self, hidden_states):
+-+++        # @dwj
+-+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++        # @lwx
+-+++        # if not self.training :
+-+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++         input_dtype = hidden_states.dtype
+-++         hidden_states = hidden_states.to(mindspore.float32)
+-++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++@@ -234,6 +239,8 @@ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++     x1 = x[..., : x.shape[-1] // 2]
+-++     x2 = x[..., x.shape[-1] // 2 :]
+-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++         self.config = config
+-++         self.hidden_size = config.hidden_size
+-++         self.intermediate_size = intermediate_size
+-+++        
+-++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++         self.act_fn = ACT2FN[config.hidden_act]
+-++ 
+-++     def forward(self, x):
+-++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++-
+-++ 
+-+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++        # @lwx 
+-+++        # gate_up_output = self.gate_up_proj(x)
+-+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++        # return self.down_proj(swiglu_output)
+-+++
+-+++    # def forward(self, x):
+-+++    #     gate_proj_out = self.gate_proj(x)
+-+++    #     up_proj_out = self.up_proj(x)
+-+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++    #     return self.down_proj(swiglu_out)
+-+++        
+-++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++     """
+-++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++         use_cache: bool = False,
+-++         cache_position: Optional[mindspore.Tensor] = None,
+-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++        
+-+++
+-++         bsz, q_len, _ = hidden_states.shape
+-++ 
+-++         query_states = self.q_proj(hidden_states)
+-++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++                     "with a layer index."
+-++                 )
+-++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++            if isinstance(past_key_value, StaticCache):
+-+++                kv_seq_len = key_states.shape[-2]
+-+++            else:
+-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++ 
+-++         if past_key_value is not None:
+-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++            
+-+++            if isinstance(past_key_value, StaticCache):
+-+++                kv_seq_len = key_states.shape[-2]
+-++ 
+-++         # repeat k/v heads if n_kv_heads < n_heads
+-++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++-
+-+++        
+-++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++ 
+-++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++-            raise ValueError(
+-++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++-                f" {attn_weights.shape}"
+-++-            )
+-++-
+-++-        if attention_mask is not None:  # no matter the length, we just slice it
+-++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++        if attention_mask is not None:
+-+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++             attn_weights = attn_weights + causal_mask
+-++ 
+-++         # upcast attention to fp32
+-++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++ 
+-++         attn_output = self.o_proj(attn_output)
+-++-
+-+++        # @lwx
+-+++        
+-+++        # max_seq_len = self.max_position_embeddings  # 2048
+-+++
+-+++        # if attention_mask is not None:
+-+++        #     # attention_mask: [B, 1, Sq, Sk]
+-+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++
+-+++        #     # pad 到 [max_seq_len, max_seq_len]
+-+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++        #     global_attention_mask = padded_mask
+-+++        # else:
+-+++        #     global_attention_mask = None
+-+++
+-+++
+-+++        # sparse_mode=3
+-+++        # attn_output = mindspore.ops.flash_attention_score(
+-+++        #     query=query_states,
+-+++        #     key=key_states,
+-+++        #     value=value_states,
+-+++        #     real_shift=None,
+-+++        #     padding_mask=None,
+-+++
+-+++        #     head_num=self.num_heads,
+-+++        #     attn_mask=global_attention_mask,
+-+++        #     keep_prob=1.0 - self.attention_dropout,
+-+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++        #     input_layout="BNSD",
+-+++        #     pre_tokens=2147483647,
+-+++        #     next_tokens=2147483647,
+-+++        #     inner_precise=0,
+-+++        #     drop_mask=None,
+-+++        #     prefix=None,
+-+++        #     actual_seq_qlen=None,
+-+++        #     actual_seq_kvlen=None,
+-+++        #     sparse_mode=sparse_mode,
+-+++        # )
+-++         if not output_attentions:
+-++             attn_weights = None
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++ 
+-+++class Qwen2MoeFlashAttention(nn.Module):
+-+++    """
+-+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++
+-+++    关键改动:
+-+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++       直接传入原始的 key 和 value 张量效率更高。
+-+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++    """
+-+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++        super().__init__()
+-+++        self.config = config
+-+++        self.layer_idx = layer_idx
+-+++        self.hidden_size = config.hidden_size
+-+++        self.num_heads = config.num_attention_heads
+-+++        self.head_dim = self.hidden_size // self.num_heads
+-+++        self.num_key_value_heads = config.num_key_value_heads
+-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++        self.max_position_embeddings = config.max_position_embeddings
+-+++        self.rope_theta = config.rope_theta
+-+++        self.attention_dropout = config.attention_dropout
+-+++
+-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++            raise ValueError(
+-+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++            )
+-+++
+-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++
+-+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++            self.head_dim,
+-+++            max_position_embeddings=self.max_position_embeddings,
+-+++            base=self.rope_theta,
+-+++        )
+-+++
+-+++    def forward(
+-+++        self,
+-+++        hidden_states: mindspore.Tensor,
+-+++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++        position_ids: Optional[mindspore.Tensor] = None,
+-+++        past_key_value: Optional[Cache] = None,
+-+++        output_attentions: bool = False,
+-+++        use_cache: bool = False,
+-+++        cache_position: Optional[mindspore.Tensor] = None,
+-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++        bsz, q_len, _ = hidden_states.shape
+-+++
+-+++        # 1. 线性投射 Q, K, V
+-+++        query_states = self.q_proj(hidden_states)
+-+++        key_states = self.k_proj(hidden_states)
+-+++        value_states = self.v_proj(hidden_states)
+-+++
+-+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++        # 3. RoPE 旋转位置编码
+-+++        kv_seq_len = key_states.shape[-2]
+-+++        if past_key_value is not None:
+-+++            if self.layer_idx is None:
+-+++                raise ValueError(
+-+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++                    "with a layer index."
+-+++                )
+-+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++                if cache_position.shape[0] == 1:
+-+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++                    kv_seq_len = past_seen_tokens + 1
+-+++                else:
+-+++                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++            else:
+-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++        # 4. KV 缓存更新
+-+++        if past_key_value is not None:
+-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++            key_states, value_states = past_key_value.update(
+-+++                key_states, value_states, self.layer_idx, cache_kwargs
+-+++            )
+-+++            
+-+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++                if cache_position.shape[0] == 1:
+-+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++                    kv_seq_len = key_states.shape[-2]
+-+++
+-+++        # 5. [重要] 准备 Attention Mask
+-+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++        fa_attention_mask = None
+-+++        if attention_mask is not None:
+-+++            # 截取与当前key长度匹配的部分
+-+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++            fa_attention_mask = (mask_slice != 0)
+-+++
+-+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++        input_dtype = query_states.dtype
+-+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++            query_states = query_states.to(mindspore.float16)
+-+++            key_states = key_states.to(mindspore.float16)
+-+++            value_states = value_states.to(mindspore.float16)
+-+++
+-+++        # 6. [核心] 调用 flash_attention_score 算子
+-+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++        attn_output = mindspore.ops.flash_attention_score(
+-+++            query=query_states,
+-+++            key=key_states,
+-+++            value=value_states,
+-+++            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++            attn_mask=fa_attention_mask,
+-+++            keep_prob=1.0 - self.attention_dropout,
+-+++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++            input_layout="BNSD",
+-+++            sparse_mode=0 # 使用 defaultMask 模式
+-+++        )
+-+++
+-+++        # 恢复原始数据类型
+-+++        attn_output = attn_output.to(input_dtype)
+-+++
+-+++        # 7. 调整输出形状
+-+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++        attn_output = self.o_proj(attn_output)
+-+++
+-+++        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++        attn_weights = None
+-+++        if output_attentions:
+-+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++        return attn_output, attn_weights, past_key_value
+-+++
+-+++    # def forward(
+-+++    #     self,
+-+++    #     hidden_states: mindspore.Tensor,
+-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++    #     past_key_value: Optional[Cache] = None,
+-+++    #     output_attentions: bool = False,
+-+++    #     use_cache: bool = False,
+-+++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++    #     bsz, q_len, _ = hidden_states.shape
+-+++
+-+++    #     # 1. 线性投射 Q, K, V
+-+++    #     query_states = self.q_proj(hidden_states)
+-+++    #     key_states = self.k_proj(hidden_states)
+-+++    #     value_states = self.v_proj(hidden_states)
+-+++
+-+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++    #     # 3. RoPE 旋转位置编码
+-+++    #     kv_seq_len = key_states.shape[-2]
+-+++    #     if past_key_value is not None:
+-+++    #         if self.layer_idx is None:
+-+++    #             raise ValueError(
+-+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++    #                 "with a layer index."
+-+++    #             )
+-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++    #     # 4. KV 缓存更新
+-+++    #     if past_key_value is not None:
+-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++    #         key_states, value_states = past_key_value.update(
+-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++    #         )
+-+++
+-+++    #     # 5. 准备 Attention Mask
+-+++    #     fa_attention_mask = None
+-+++    #     if attention_mask is not None:
+-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++    #         fa_attention_mask = (mask_slice != 0)
+-+++
+-+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++    #     input_dtype = query_states.dtype
+-+++
+-+++    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++    #         query=query_states,
+-+++    #         key=key_states,
+-+++    #         value=value_states,
+-+++    #         head_num=self.num_heads,
+-+++    #         attn_mask=fa_attention_mask,
+-+++    #         keep_prob=1.0 - self.attention_dropout,
+-+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++    #         input_layout="BNSD",
+-+++    #         sparse_mode=0,
+-+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++    #         inner_precise=1
+-+++    #     )
+-+++
+-+++    #     # 恢复原始数据类型
+-+++    #     attn_output = attn_output.to(input_dtype)
+-+++
+-+++    #     # 7. 调整输出形状
+-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++    #     attn_output = self.o_proj(attn_output)
+-+++
+-+++    #     attn_weights = None
+-+++    #     if output_attentions:
+-+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++    #     return attn_output, attn_weights, past_key_value
+-+++
+-+++    # def forward(
+-+++    #     self,
+-+++    #     hidden_states: mindspore.Tensor,
+-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++    #     past_key_value: Optional[Cache] = None,
+-+++    #     output_attentions: bool = False,
+-+++    #     use_cache: bool = False,
+-+++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++    #     bsz, q_len, _ = hidden_states.shape
+-+++
+-+++    #     query_states = self.q_proj(hidden_states)
+-+++    #     key_states = self.k_proj(hidden_states)
+-+++    #     value_states = self.v_proj(hidden_states)
+-+++
+-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++    #     kv_seq_len = key_states.shape[-2]
+-+++    #     if past_key_value is not None:
+-+++    #         if self.layer_idx is None:
+-+++    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++    #     if past_key_value is not None:
+-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++    #         key_states, value_states = past_key_value.update(
+-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++    #         )
+-+++
+-+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++
+-+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++    #     # <--- 修改结束 ---
+-+++
+-+++    #     fa_attention_mask = None
+-+++    #     if attention_mask is not None:
+-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++    #         fa_attention_mask = (mask_slice != 0)
+-+++
+-+++    #     input_dtype = query_states.dtype
+-+++
+-+++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++    #         key=key_states,
+-+++    #         value=value_states,
+-+++    #         head_num=self.num_heads,
+-+++    #         attn_mask=fa_attention_mask,
+-+++    #         keep_prob=1.0 - self.attention_dropout,
+-+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++    #         input_layout="BNSD",
+-+++    #         sparse_mode=0,
+-+++    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++    #     )
+-+++
+-+++    #     attn_output = attn_output.to(input_dtype)
+-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++    #     attn_output = self.o_proj(attn_output)
+-+++
+-+++    #     attn_weights = None
+-+++    #     if output_attentions:
+-+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++
+-+++    #     return attn_output, attn_weights, past_key_value
+-+++
+-++ QWEN2MOE_ATTENTION_CLASSES = {
+-++     "eager": Qwen2MoeAttention,
+-+++    "flash-attention": Qwen2MoeFlashAttention,
+-++ }
+-++ 
+-++ 
+-++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++ 
+-+++    #@dwj
+-+++    # 只遍历激活的专家，而非全部专家
+-++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-++-        # router_logits: (batch * sequence_length, n_experts)
+-++-        router_logits = self.gate(hidden_states)
+-++-
+-++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-        if self.norm_topk_prob:
+-++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        # we cast back to the input dtype
+-++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-++-
+-++-        final_hidden_states = ops.zeros(
+-++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++-        )
+-++-
+-++-        # One hot encode the selected experts to create an expert mask
+-++-        # this will be used to easily index which expert is going to be sollicitated
+-++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++-
+-++-        # Loop over all available experts in the model and perform the computation on each expert
+-++-        for expert_idx in range(self.num_experts):
+-++-            expert_layer = self.experts[expert_idx]
+-++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++-
+-++-            # Index the correct hidden states and compute the expert hidden state for
+-++-            # the current expert. We need to make sure to multiply the output hidden
+-++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++-            if 0 not in idx.shape:
+-++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++-
+-++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-++-                # the `top_x` tensor here.
+-++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++-
+-++-        shared_expert_output = self.shared_expert(hidden_states)
+-++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++-
+-++-        final_hidden_states = final_hidden_states + shared_expert_output
+-+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++            num_tokens = hidden_states_reshaped.shape[0]
+-+++
+-+++            router_logits = self.gate(hidden_states_reshaped)
+-+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++
+-+++            if self.norm_topk_prob:
+-+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++            
+-+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++            flat_selected_experts = selected_experts.flatten()
+-+++            
+-+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++            token_indices = broadcasted_token_indices.flatten()
+-+++            
+-+++            active_experts = ops.unique(flat_selected_experts)
+-+++            
+-+++            for expert_idx_tensor in active_experts:
+-+++                expert_idx = expert_idx_tensor.item()
+-+++                expert_layer = self.experts[expert_idx]
+-+++                
+-+++                mask = (flat_selected_experts == expert_idx_tensor)
+-+++                selected_token_indices = token_indices[mask]
+-+++                selected_routing_weights = routing_weights.flatten()[mask]
+-+++                
+-+++                current_states = hidden_states_reshaped[selected_token_indices]
+-+++                
+-+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++                
+-+++                final_hidden_states = final_hidden_states.index_add(
+-+++                    dim=0,
+-+++                    index=selected_token_indices,
+-+++                    source=expert_output.to(hidden_states.dtype)
+-+++                )
+-+++            
+-+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++ 
+-++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++-        return final_hidden_states, router_logits
+-+++            final_hidden_states = final_hidden_states + shared_expert_output
+-+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++            
+-+++            return final_hidden_states, router_logits
+-++ 
+-++ 
+-++ class Qwen2MoeDecoderLayer(nn.Module):
+-++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++ 
+-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++ 
+-+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++
+-++         if (layer_idx not in config.mlp_only_layers) and (
+-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++         ):
+-++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++     _skip_keys_device_placement = "past_key_values"
+-++     _supports_cache_class = True
+-+++#lwx
+-+++    # _supports_static_cache = True
+-++ 
+-++     def _init_weights(self, module):
+-++         std = self.config.initializer_range
+-++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++         return causal_mask
+-++ 
+-++ 
+-++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++     _tied_weights_keys = ["lm_head.weight"]
+-++ 
+-++     def __init__(self, config):
+-++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++         self.num_experts_per_tok = config.num_experts_per_tok
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-+++        # @lwx
+-+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++        #     self.generation_config.cache_implementation = "static"
+-+++        self._warmed_up = False
+-+++
+-+++    def warmup_moe_model(self):
+-+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++        test_texts = [
+-+++            "warmup short",
+-+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++        ]
+-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++        if tokenizer is None:
+-+++            from mindnlp.transformers import AutoTokenizer
+-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++            self._warmup_tokenizer = tokenizer
+-+++
+-+++        for text in test_texts:
+-+++            inputs = tokenizer(text, return_tensors="ms")
+-+++            with mindspore._no_grad():
+-+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++ 
+-++     def get_input_embeddings(self):
+-++         return self.model.embed_tokens
+-++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++         ```"""
+-+++        if not self._warmed_up:
+-+++            self._warmed_up = True
+-+++            self.warmup_moe_model()
+-++ 
+-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++         output_router_logits = (
+-++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++             }
+-++         )
+-++         return model_inputs
+-+++# @lwx
+-+++    # def _decode_one_tokens_logits(
+-+++    #     self,
+-+++    #     cur_token: mindspore.Tensor,
+-+++    #     input_pos: Optional[mindspore.Tensor],
+-+++    #     cache_position: mindspore.Tensor,
+-+++    #     past_key_values: StaticCache,
+-+++    # ) -> mindspore.Tensor:
+-+++    #     """
+-+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++        
+-+++    #     Args:
+-+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++    #         input_pos: 输入位置信息，可选
+-+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++            
+-+++    #     Returns:
+-+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++    #     """
+-+++    #     # 调用JIT编译的版本
+-+++    #     return self.get_decode_one_tokens_logits(
+-+++    #         cur_token=cur_token,
+-+++    #         input_pos=input_pos,
+-+++    #         cache_position=cache_position,
+-+++    #         past_key_values=past_key_values,
+-+++    #     )
+-+++    
+-+++    # @mindspore.jit(jit_level='O1')
+-+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++    #     """
+-+++    #     JIT编译的函数，用于高效的单token解码
+-+++    #     使用JIT编译优化以支持静态shape和高效执行
+-+++        
+-+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++    #     """
+-+++    #     outputs = self.model.forward(
+-+++    #         input_ids=cur_token,
+-+++    #         position_ids=input_pos,
+-+++    #         cache_position=cache_position,
+-+++    #         past_key_values=past_key_values,
+-+++    #         use_cache=True,
+-+++    #         return_dict=False,
+-+++    #     )
+-+++        
+-+++    #     hidden_states = outputs[0]
+-+++    #     logits = self.lm_head.forward(hidden_states)
+-+++    #     logits = logits.float()
+-+++        
+-+++    #     return logits[:, -1, :]
+-+++
+-+++    # def _sample(
+-+++    #     self,
+-+++    #     input_ids: mindspore.Tensor,
+-+++    #     logits_processor,
+-+++    #     stopping_criteria,
+-+++    #     generation_config,
+-+++    #     synced_devices: bool,
+-+++    #     streamer=None,
+-+++    #     logits_warper=None,
+-+++    #     **model_kwargs,
+-+++    # ):
+-+++    #     """
+-+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++    #     """
+-+++    #     from ...generation.logits_process import LogitsProcessorList
+-+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++    #     from mindnlp.core import nn, ops, no_grad
+-+++    #     import numpy as np
+-+++        
+-+++    #     # 检查是否使用 StaticCache
+-+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++    #     # 否则，直接调用父类方法
+-+++    #     past_key_values = model_kwargs.get("past_key_values")
+-+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++
+-+++    #     if not isinstance(past_key_values, StaticCache):
+-+++    #         # 不使用 StaticCache，直接调用父类方法
+-+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++    #         return super()._sample(
+-+++    #             input_ids=input_ids,
+-+++    #             logits_processor=logits_processor,
+-+++    #             stopping_criteria=stopping_criteria,
+-+++    #             generation_config=generation_config,
+-+++    #             synced_devices=synced_devices,
+-+++    #             streamer=streamer,
+-+++    #             logits_warper=logits_warper,
+-+++    #             **model_kwargs,
+-+++    #         )
+-+++        
+-+++    #     # 使用 StaticCache，进入自定义循环
+-+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++    #     pad_token_id = generation_config._pad_token_tensor
+-+++    #     output_attentions = generation_config.output_attentions
+-+++    #     output_hidden_states = generation_config.output_hidden_states
+-+++    #     output_scores = generation_config.output_scores
+-+++    #     output_logits = generation_config.output_logits
+-+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++    #     max_length = generation_config.max_length
+-+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++    #     do_sample = generation_config.do_sample
+-+++        
+-+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++    #         raise ValueError(
+-+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++    #             f"{logits_warper})."
+-+++    #         )
+-+++        
+-+++    #     # init attention / hidden states / scores tuples
+-+++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++        
+-+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++    #         encoder_hidden_states = (
+-+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++    #         )
+-+++        
+-+++    #     # keep track of which sequences are already finished
+-+++    #     batch_size, cur_len = input_ids.shape
+-+++    #     this_peer_finished = False
+-+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++        
+-+++    #     time_record = []
+-+++    #     from ....utils.testing_utils import parse_flag_from_env
+-+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++        
+-+++    #     while self._has_unfinished_sequences(
+-+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++    #     ):
+-+++    #         if _record_time:
+-+++    #             import time as time_module
+-+++    #             infer_start = time_module.time()
+-+++            
+-+++    #         # prepare model inputs
+-+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++            
+-+++    #         # prepare variable output controls
+-+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++            
+-+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++    #         cur_cache_position = model_inputs.get("cache_position")
+-+++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++    #         cur_input_ids = model_inputs.get("input_ids")
+-+++            
+-+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++    #             cur_cache_position is not None and 
+-+++    #             len(cur_cache_position.shape) > 0 and
+-+++    #             cur_cache_position.shape[0] == 1 and
+-+++    #             cur_input_ids is not None and
+-+++    #             cur_input_ids.shape[1] == 1):
+-+++    #             # 使用 JIT 优化的单 token 解码
+-+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++    #             if not hasattr(self, '_jit_used'):
+-+++    #                 self._jit_used = False
+-+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++                
+-+++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++    #                 cur_token=cur_input_ids,
+-+++    #                 input_pos=model_inputs.get("position_ids"),
+-+++    #                 cache_position=cur_cache_position,
+-+++    #                 past_key_values=cur_past_key_values,
+-+++    #             )
+-+++                
+-+++    #             # 标记已使用JIT（用于后续判断）
+-+++    #             if not self._jit_used:
+-+++    #                 self._jit_used = True
+-+++                
+-+++    #             # 构造兼容的输出对象
+-+++    #             class JitOptimizedOutput:
+-+++    #                 def __init__(self, logits, config):
+-+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++    #                     self.config = config
+-+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++    #                     self.cross_attentions = None
+-+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++                
+-+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++    #         else:
+-+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++    #             outputs = self(**model_inputs, return_dict=True)
+-+++            
+-+++    #         if synced_devices and this_peer_finished:
+-+++    #             continue
+-+++            
+-+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++    #         next_token_logits = outputs.logits[:, -1, :]
+-+++            
+-+++    #         # pre-process distribution
+-+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++    #         if do_sample:
+-+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++            
+-+++    #         # Store scores, attentions and hidden_states when required
+-+++    #         if return_dict_in_generate:
+-+++    #             if output_scores:
+-+++    #                 scores += (next_token_scores,)
+-+++    #             if output_logits:
+-+++    #                 raw_logits += (next_token_logits,)
+-+++    #             if output_attentions:
+-+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++    #                 if self.config.is_encoder_decoder:
+-+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++                
+-+++    #             if output_hidden_states:
+-+++    #                 hidden = (
+-+++    #                     outputs.decoder_hidden_states
+-+++    #                     if self.config.is_encoder_decoder
+-+++    #                     else outputs.hidden_states
+-+++    #                 )
+-+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++            
+-+++    #         # token selection
+-+++    #         if do_sample:
+-+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++    #         else:
+-+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++            
+-+++    #         # finished sentences should have their next token be a padding token
+-+++    #         if has_eos_stopping_criteria:
+-+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++            
+-+++    #         # update generated ids, model inputs, and length for next step
+-+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++    #         if streamer is not None:
+-+++    #             streamer.put(next_tokens)
+-+++            
+-+++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++    #             outputs,
+-+++    #             model_kwargs,
+-+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++    #         )
+-+++            
+-+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++    #         cur_len += 1
+-+++            
+-+++    #         if _record_time:
+-+++    #             import time as time_module
+-+++    #             infer_stop = time_module.time()
+-+++    #             time_record.append(infer_stop - infer_start)
+-+++            
+-+++    #         del outputs
+-+++        
+-+++    #     average_infer_time = None
+-+++    #     if time_record:
+-+++    #         if len(time_record) > 1:
+-+++    #             time_record.pop(0)
+-+++    #         average_infer_time = sum(time_record) / len(time_record)
+-+++    #         print(f'average inference time is: {average_infer_time}')
+-+++    #         print(f'inference time record: {time_record}')
+-+++        
+-+++    #     if streamer is not None:
+-+++    #         streamer.end()
+-+++        
+-+++    #     # 简单判断：打印是否使用了JIT路径
+-+++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++    #     else:
+-+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++        
+-+++    #     if return_dict_in_generate:
+-+++    #         if self.config.is_encoder_decoder:
+-+++    #             return GenerateEncoderDecoderOutput(
+-+++    #                 sequences=input_ids,
+-+++    #                 scores=scores,
+-+++    #                 logits=raw_logits,
+-+++    #                 encoder_attentions=encoder_attentions,
+-+++    #                 encoder_hidden_states=encoder_hidden_states,
+-+++    #                 decoder_attentions=decoder_attentions,
+-+++    #                 cross_attentions=cross_attentions,
+-+++    #                 decoder_hidden_states=decoder_hidden_states,
+-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++    #                 average_infer_time=average_infer_time
+-+++    #             )
+-+++    #         else:
+-+++    #             return GenerateDecoderOnlyOutput(
+-+++    #                 sequences=input_ids,
+-+++    #                 scores=scores,
+-+++    #                 logits=raw_logits,
+-+++    #                 attentions=decoder_attentions,
+-+++    #                 hidden_states=decoder_hidden_states,
+-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++    #                 average_infer_time=average_infer_time
+-+++    #             )
+-+++    #     else:
+-+++    #         return input_ids
+-+++            
+-+++    # def _prepare_cache_for_generation(
+-+++    #     self,
+-+++    #     generation_config,
+-+++    #     model_kwargs,
+-+++    #     assistant_model,
+-+++    #     batch_size,
+-+++    #     max_cache_length,
+-+++    # ):
+-+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++    #         generation_config.cache_implementation = "static"
+-+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++        
+-+++    #     if generation_config.cache_implementation == "static":
+-+++    #         base_required_from_max_length = generation_config.max_length + 1
+-+++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++    #         min_cache_size = 50
+-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++    #         else:
+-+++    #             max_cache_length = max(base_required, min_cache_size)
+-+++            
+-+++    #         original_max_cache_length = max_cache_length
+-+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++            
+-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++    #             if max_cache_length > self.config.max_position_embeddings:
+-+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++        
+-+++    #     result = super()._prepare_cache_for_generation(
+-+++    #         generation_config=generation_config,
+-+++    #         model_kwargs=model_kwargs,
+-+++    #         assistant_model=assistant_model,
+-+++    #         batch_size=batch_size,
+-+++    #         max_cache_length=max_cache_length,
+-+++    #     )
+-+++        
+-+++    #     if generation_config.cache_implementation == "static":
+-+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++    #         created_cache = model_kwargs.get(cache_name)
+-+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++        
+-+++    #     return result
+-+++
+-+++
+-+++
+-++ 
+-++ 
+-++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++-- 
+-++2.27.0
+-++
+-+-- 
+-+2.27.0
+-+
+-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-new file mode 100644
+-index 00000000..966529e4
+---- /dev/null
+-+++ b/patches/0003-20261106secondcommit.patch
+-@@ -0,0 +1,2769 @@
+-+From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+Subject: [PATCH 3/3] 20261106secondcommit
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+-+ patches/0001-20251104commit.patch             | 1272 -----------------
+-+ 3 files changed, 528 insertions(+), 2032 deletions(-)
+-+ delete mode 100644 patches/0001-20251104commit.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index 73773c22..2f9192bf 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+-+ 
+-+ _CONFIG_FOR_DOC = "DeepseekConfig"
+-+ 
+-++_attn_mask_cache = {}
+-++
+-++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-++    q_len = batch_and_seq[1]
+-++    kv_len = batch_and_seq[1] + past_key_values_length 
+-++    key = (batch_and_seq[0], q_len, kv_len)
+-++
+-++    if key in _attn_mask_cache:
+-++        return _attn_mask_cache[key]
+-++
+-++    mask = _prepare_4d_causal_attention_mask(
+-++        attention_mask,
+-++        batch_and_seq,
+-++        inputs_embeds,
+-++        past_key_values_length,
+-++    )
+-++    _attn_mask_cache[key] = mask
+-++    return mask
+-+ 
+-+ def _get_unpad_data(attention_mask):
+-+     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-+@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-+         return final_output
+-+ 
+-+ 
+-+-    @no_grad()
+-+-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+-        expert_cache = ops.zeros_like(x)
+-+-        idxs = flat_expert_indices.argsort()
+-+-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+-        token_idxs = idxs // self.num_experts_per_tok
+-+-
+-+-        for i, end_idx in enumerate(tokens_per_expert):
+-+-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+-            if start_idx == end_idx:
+-+-                continue
+-+-            expert = self.experts[i]
+-+-            exp_token_idx = token_idxs[start_idx:end_idx]
+-+-            expert_tokens = x[exp_token_idx]
+-+-            expert_out = expert(expert_tokens)
+-+-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+-
+-+-        return expert_cache
+-+-        
+-+     # @no_grad()
+-+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-    #     # expert_cache = torch.zeros_like(x)
+-+-    #     # idxs = flat_expert_indices.argsort()
+-+-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+-    #     # token_idxs = idxs // self.num_experts_per_tok
+-+-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+-    #     #     if start_idx == end_idx:
+-+-    #     #         continue
+-+-    #     #     expert = self.experts[i]
+-+-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+-    #     #     expert_tokens = x[exp_token_idx]
+-+-    #     #     expert_out = expert(expert_tokens)
+-+-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+-    #     # return expert_cache
+-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+     #     expert_cache = ops.zeros_like(x)
+-+     #     idxs = flat_expert_indices.argsort()
+-+     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-+     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+ 
+-+     #     return expert_cache
+-+-    # @no_grad()
+-+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-    #     expert_cache = ops.zeros_like(x)
+-++        
+-++    @no_grad()
+-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++        """
+-++        优化版 MoE prefill：
+-++        - 批量张量化处理同一个 expert 的所有 token
+-++        - 跳过无 token 的专家
+-++        - 保持结果完全一致
+-++        """
+-++        # 初始化输出缓存
+-++        expert_cache = ops.zeros_like(x)
+-+ 
+-+-    #     # 排序保证顺序一致
+-+-    #     idxs = flat_expert_indices.argsort()
+-+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+-    #     token_idxs = idxs // self.num_experts_per_tok
+-++        # 排序（确保 scatter_add 位置对应原逻辑）
+-++        idxs = flat_expert_indices.argsort()
+-++        sorted_expert_indices = flat_expert_indices[idxs]
+-++        sorted_token_indices = idxs // self.num_experts_per_tok
+-+ 
+-+-    #     # 找出有 token 的专家
+-+-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++        # 每个 expert 的 token 数
+-++        tokens_per_expert = sorted_expert_indices.bincount()
+-+ 
+-+-    #     for i in active_experts.tolist():
+-+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+-    #         end_idx = tokens_per_expert[i]
+-+-    #         if start_idx == end_idx:  # 没有 token
+-+-    #             continue
+-++        # 找出有 token 的专家
+-++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+ 
+-+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+-    #         expert_tokens = x[exp_token_idx]
+-+-    #         expert_out = self.experts[i](expert_tokens)
+-+-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++        for expert_id in active_experts.tolist():
+-++            # 取该 expert 对应的排序后 token 区间
+-++            start = (tokens_per_expert[:expert_id]).sum().item()
+-++            end = start + tokens_per_expert[expert_id].item()
+-+ 
+-+-    #         expert_cache = mindspore.mint.scatter_add(
+-+-    #             expert_cache,
+-+-    #             0,
+-+-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+-    #             expert_out
+-+-    #         )
+-++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-++            expert_tokens = x[token_idx]                     # 取输入向量
+-+ 
+-+-    #     return expert_cache
+-++            # 执行专家 MLP
+-++            expert_out = self.experts[expert_id](expert_tokens)
+-++
+-++            # 按权重缩放
+-++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-++
+-++            # 回写到缓存（等价 scatter_add）
+-++            expert_cache = mindspore.mint.scatter_add(
+-++                expert_cache,
+-++                0,
+-++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++                scaled_out
+-++            )
+-++
+-++        return expert_cache
+-++
+-++        # @no_grad()
+-++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++        #     # expert_cache = torch.zeros_like(x)
+-++        #     # idxs = flat_expert_indices.argsort()
+-++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++        #     # token_idxs = idxs // self.num_experts_per_tok
+-++        #     # for i, end_idx in enumerate(tokens_per_expert):
+-++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++        #     #     if start_idx == end_idx:
+-++        #     #         continue
+-++        #     #     expert = self.experts[i]
+-++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++        #     #     expert_tokens = x[exp_token_idx]
+-++        #     #     expert_out = expert(expert_tokens)
+-++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++        #     # return expert_cache
+-++        #     expert_cache = ops.zeros_like(x)
+-++        #     idxs = flat_expert_indices.argsort()
+-++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++        #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++        #     for i, end_idx in enumerate(tokens_per_expert):
+-++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++        #         if start_idx == end_idx:
+-++        #             continue
+-++        #         expert = self.experts[i]
+-++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++        #         expert_tokens = x[exp_token_idx]
+-++        #         expert_out = expert(expert_tokens)
+-++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++
+-++        #     return expert_cache
+-++        # @no_grad()
+-++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++        #     expert_cache = ops.zeros_like(x)
+-++
+-++        #     # 排序保证顺序一致
+-++        #     idxs = flat_expert_indices.argsort()
+-++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++        #     token_idxs = idxs // self.num_experts_per_tok
+-++
+-++        #     # 找出有 token 的专家
+-++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++
+-++        #     for i in active_experts.tolist():
+-++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++        #         end_idx = tokens_per_expert[i]
+-++        #         if start_idx == end_idx:  # 没有 token
+-++        #             continue
+-++
+-++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++        #         expert_tokens = x[exp_token_idx]
+-++        #         expert_out = self.experts[i](expert_tokens)
+-++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++
+-++        #         expert_cache = mindspore.mint.scatter_add(
+-++        #             expert_cache,
+-++        #             0,
+-++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++        #             expert_out
+-++        #         )
+-++
+-++        #     return expert_cache
+-+ 
+-+ 
+-+ 
+-+@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+-
+-+ # class DeepseekFlashAttention(nn.Module):
+-+ #     """
+-+ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-++
+-+ Deepseek_ATTENTION_CLASSES = {
+-+     "eager": DeepseekAttention,
+-+     "flash-attention": DeepseekFlashAttention,
+-+@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-+             )
+-+         else:
+-+             # 4d mask is passed through the layers
+-+-            attention_mask = _prepare_4d_causal_attention_mask(
+-++            # attention_mask = _prepare_4d_causal_attention_mask(
+-++            #     attention_mask,
+-++            #     (batch_size, seq_length),
+-++            #     inputs_embeds,
+-++            #     past_key_values_length,
+-++            # )
+-++            #@dwj
+-++            attention_mask = get_cached_causal_mask(
+-+                 attention_mask,
+-+                 (batch_size, seq_length),
+-+                 inputs_embeds,
+-+@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-+         self.warm_up = False
+-++        #@dwj
+-++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-++            self.num_layers,
+-++            self.num_attention_heads,
+-++            self.head_dim,
+-++            batch_size=1,
+-++            max_length=self.max_length,
+-++            dtype=mindspore.float16
+-++        )
+-++
+-++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-++        key_cache = []
+-++        value_cache = []
+-++        for _ in range(num_layers):
+-++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++            key_cache.append(k)
+-++            value_cache.append(v)
+-++        return key_cache, value_cache
+-++
+-+ 
+-+     def warmup_moe_model_deep(self):
+-+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+index bced285c..ebd7782e 100644
+-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+-+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+ 
+-+-Long_Prompt = False
+-+-PROMPT_LENGTH_THRESHOLD = 128
+-++Long_Prompt = 1
+-++LONG_PROMPT_LENGTH_THRESHOLD = 128
+-++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-++
+-++_causal_mask_cache = {}
+-++
+-++def get_cached_causal_mask_with_cache_position(
+-++    attention_mask: mindspore.Tensor,
+-++    sequence_length: int,
+-++    target_length: int,
+-++    dtype: mindspore.dtype,
+-++    min_dtype: float,
+-++    cache_position: mindspore.Tensor,
+-++    batch_size: int,
+-++):
+-++    """
+-++    带缓存的 causal mask 构造函数
+-++    """
+-++    # q_len 是当前 query 长度
+-++    q_len = sequence_length
+-++    # kv_len 是 target_length
+-++    kv_len = target_length
+-++
+-++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-++
+-++    if key in _causal_mask_cache:
+-++        return _causal_mask_cache[key]
+-++
+-++    # 调用原来的 mask 构造逻辑
+-++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++        attention_mask,
+-++        sequence_length=sequence_length,
+-++        target_length=target_length,
+-++        dtype=dtype,
+-++        min_dtype=min_dtype,
+-++        cache_position=cache_position,
+-++        batch_size=batch_size,
+-++    )
+-++    # 缓存结果
+-++    _causal_mask_cache[key] = causal_mask
+-++    return causal_mask
+-+ 
+-+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+ 
+-+ 
+-+ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-++# class Qwen2MoeAttention(nn.Module):
+-++#     """
+-++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-++#     and "Generating Long Sequences with Sparse Transformers".
+-++#     """
+-++
+-++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++#         super().__init__()
+-++#         self.config = config
+-++#         self.layer_idx = layer_idx
+-++#         if layer_idx is None:
+-++#             logger.warning_once(
+-++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++#                 "when creating this class."
+-++#             )
+-++
+-++#         self.hidden_size = config.hidden_size
+-++#         self.num_heads = config.num_attention_heads
+-++#         self.head_dim = self.hidden_size // self.num_heads
+-++#         self.num_key_value_heads = config.num_key_value_heads
+-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++#         self.max_position_embeddings = config.max_position_embeddings
+-++#         self.rope_theta = config.rope_theta
+-++#         self.is_causal = True
+-++#         self.attention_dropout = config.attention_dropout
+-++
+-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++#             raise ValueError(
+-++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++#                 f" and `num_heads`: {self.num_heads})."
+-++#             )
+-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++
+-++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++#             self.head_dim,
+-++#             max_position_embeddings=self.max_position_embeddings,
+-++#             base=self.rope_theta,
+-++#         )
+-++
+-++#     def forward(
+-++#         self,
+-++#         hidden_states: mindspore.Tensor,
+-++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++#         position_ids: Optional[mindspore.Tensor] = None,
+-++#         past_key_value: Optional[Cache] = None,
+-++#         output_attentions: bool = False,
+-++#         use_cache: bool = False,
+-++#         cache_position: Optional[mindspore.Tensor] = None,
+-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++
+-++        
+-++
+-++#         bsz, q_len, _ = hidden_states.shape
+-++
+-++#         query_states = self.q_proj(hidden_states)
+-++#         key_states = self.k_proj(hidden_states)
+-++#         value_states = self.v_proj(hidden_states)
+-++
+-++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++
+-++#         kv_seq_len = key_states.shape[-2]
+-++#         if past_key_value is not None:
+-++#             if self.layer_idx is None:
+-++#                 raise ValueError(
+-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++#                     "with a layer index."
+-++#                 )
+-++#             if isinstance(past_key_value, StaticCache):
+-++#                 kv_seq_len = key_states.shape[-2]
+-++#             else:
+-++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++
+-++#         if past_key_value is not None:
+-++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++            
+-++#             if isinstance(past_key_value, StaticCache):
+-++#                 kv_seq_len = key_states.shape[-2]
+-++
+-++#         # repeat k/v heads if n_kv_heads < n_heads
+-++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++        
+-++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++
+-++#         if attention_mask is not None:
+-++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++#             attn_weights = attn_weights + causal_mask
+-++
+-++#         # upcast attention to fp32
+-++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++#         attn_output = ops.matmul(attn_weights, value_states)
+-++
+-++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++#             raise ValueError(
+-++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-++#                 f" {attn_output.shape}"
+-++#             )
+-++
+-++#         attn_output = ops.transpose(attn_output, 1, 2)
+-++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++
+-++#         attn_output = self.o_proj(attn_output)
+-++#         # @lwx
+-++        
+-++#         # max_seq_len = self.max_position_embeddings  # 2048
+-++
+-++#         # if attention_mask is not None:
+-++#         #     # attention_mask: [B, 1, Sq, Sk]
+-++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++
+-++#         #     # pad 到 [max_seq_len, max_seq_len]
+-++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++#         #     global_attention_mask = padded_mask
+-++#         # else:
+-++#         #     global_attention_mask = None
+-++
+-++
+-++#         # sparse_mode=3
+-++#         # attn_output = mindspore.ops.flash_attention_score(
+-++#         #     query=query_states,
+-++#         #     key=key_states,
+-++#         #     value=value_states,
+-++#         #     real_shift=None,
+-++#         #     padding_mask=None,
+-++
+-++#         #     head_num=self.num_heads,
+-++#         #     attn_mask=global_attention_mask,
+-++#         #     keep_prob=1.0 - self.attention_dropout,
+-++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++#         #     input_layout="BNSD",
+-++#         #     pre_tokens=2147483647,
+-++#         #     next_tokens=2147483647,
+-++#         #     inner_precise=0,
+-++#         #     drop_mask=None,
+-++#         #     prefix=None,
+-++#         #     actual_seq_qlen=None,
+-++#         #     actual_seq_kvlen=None,
+-++#         #     sparse_mode=sparse_mode,
+-++#         # )
+-++#         if not output_attentions:
+-++#             attn_weights = None
+-++
+-++#         return attn_output, attn_weights, past_key_value
+-++
+-+ class Qwen2MoeAttention(nn.Module):
+-+     """
+-+-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+-    and "Generating Long Sequences with Sparse Transformers".
+-+-    """
+-++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+-+ 
+-++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-++
+-++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-++    """
+-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+         super().__init__()
+-+         self.config = config
+-+@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-+         if layer_idx is None:
+-+             logger.warning_once(
+-+                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+                 "when creating this class."
+-+             )
+-+ 
+-+@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-+         use_cache: bool = False,
+-+         cache_position: Optional[mindspore.Tensor] = None,
+-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-
+-+         
+-+-
+-++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-+         bsz, q_len, _ = hidden_states.shape
+-+ 
+-+         query_states = self.q_proj(hidden_states)
+-+         key_states = self.k_proj(hidden_states)
+-+         value_states = self.v_proj(hidden_states)
+-+ 
+-+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+-
+-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++        
+-+         kv_seq_len = key_states.shape[-2]
+-+         if past_key_value is not None:
+-+-            if self.layer_idx is None:
+-+-                raise ValueError(
+-+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-                    "with a layer index."
+-+-                )
+-+-            if isinstance(past_key_value, StaticCache):
+-+-                kv_seq_len = key_states.shape[-2]
+-+-            else:
+-+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++        
+-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+ 
+-+         if past_key_value is not None:
+-+-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++
+-++        # --- 2. 动态调度核心注意力计算 ---
+-++        global Long_Prompt
+-++        if Long_Prompt >= 1:
+-++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-++            fa_attention_mask = None
+-++            if attention_mask is not None:
+-++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++                fa_attention_mask = (mask_slice != 0)
+-++
+-++            attn_output = mindspore.ops.flash_attention_score(
+-++                query=query_states,
+-++                key=key_states,
+-++                value=value_states,
+-++                head_num=self.num_heads,
+-++                attn_mask=fa_attention_mask,
+-++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-++                scalar_value=1.0 / math.sqrt(self.head_dim),
+-++                input_layout="BNSD",
+-++                sparse_mode=0,
+-++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-++            )
+-+             
+-+-            if isinstance(past_key_value, StaticCache):
+-+-                kv_seq_len = key_states.shape[-2]
+-++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++            attn_output = self.o_proj(attn_output)
+-++            attn_weights = None
+-++            if output_attentions:
+-++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+ 
+-+-        # repeat k/v heads if n_kv_heads < n_heads
+-+-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+-        
+-+-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++        else:
+-++            # --- Eager Attention 路径 (用于短序列和解码) ---
+-++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++            
+-++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+ 
+-+-        if attention_mask is not None:
+-+-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+-            attn_weights = attn_weights + causal_mask
+-++            if attention_mask is not None:
+-++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++                attn_weights = attn_weights + causal_mask
+-+ 
+-+-        # upcast attention to fp32
+-+-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+-        attn_output = ops.matmul(attn_weights, value_states)
+-++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++            attn_output = ops.matmul(attn_weights, value_states)
+-+ 
+-+-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+-            raise ValueError(
+-+-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+-                f" {attn_output.shape}"
+-+-            )
+-++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++                raise ValueError(
+-++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-++                )
+-+ 
+-+-        attn_output = ops.transpose(attn_output, 1, 2)
+-+-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++            attn_output = ops.transpose(attn_output, 1, 2)
+-++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++            attn_output = self.o_proj(attn_output)
+-+ 
+-+-        attn_output = self.o_proj(attn_output)
+-+-        # @lwx
+-++            if not output_attentions:
+-++                attn_weights = None
+-+         
+-+-        # max_seq_len = self.max_position_embeddings  # 2048
+-+-
+-+-        # if attention_mask is not None:
+-+-        #     # attention_mask: [B, 1, Sq, Sk]
+-+-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+-
+-+-        #     # pad 到 [max_seq_len, max_seq_len]
+-+-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+-        #     global_attention_mask = padded_mask
+-+-        # else:
+-+-        #     global_attention_mask = None
+-+-
+-+-
+-+-        # sparse_mode=3
+-+-        # attn_output = mindspore.ops.flash_attention_score(
+-+-        #     query=query_states,
+-+-        #     key=key_states,
+-+-        #     value=value_states,
+-+-        #     real_shift=None,
+-+-        #     padding_mask=None,
+-+-
+-+-        #     head_num=self.num_heads,
+-+-        #     attn_mask=global_attention_mask,
+-+-        #     keep_prob=1.0 - self.attention_dropout,
+-+-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-        #     input_layout="BNSD",
+-+-        #     pre_tokens=2147483647,
+-+-        #     next_tokens=2147483647,
+-+-        #     inner_precise=0,
+-+-        #     drop_mask=None,
+-+-        #     prefix=None,
+-+-        #     actual_seq_qlen=None,
+-+-        #     actual_seq_kvlen=None,
+-+-        #     sparse_mode=sparse_mode,
+-+-        # )
+-+-        if not output_attentions:
+-+-            attn_weights = None
+-+-
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+-
+-+ # class Qwen2MoeFlashAttention(nn.Module):
+-+ #     """
+-+ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+-+ #             return final_hidden_states, router_logits
+-+ 
+-+ 
+-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-#     """
+-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+-#     """
+-+-#     def __init__(self, config: Qwen2MoeConfig):
+-+-#         super().__init__()
+-+-#         self.num_experts = config.num_experts
+-+-#         self.top_k = config.num_experts_per_tok
+-+-#         self.norm_topk_prob = config.norm_topk_prob
+-+-
+-+-#         # 门控网络
+-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+-#         # 专家列表
+-+-#         self.experts = nn.ModuleList(
+-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+-#         )
+-+-#         # 共享专家
+-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_decode(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         """
+-+-#         【解码路径】针对 sequence_length=1 的极致优化。
+-+-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+-#         """
+-+-#         batch_size, hidden_dim = hidden_states.shape
+-+-        
+-+-#         expert_outputs_list = [
+-+-#             ops.cat([
+-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+-#             ], dim=0) 
+-+-#             for i in range(batch_size)
+-+-#         ]
+-+-        
+-+-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+-#         # shape: (batch_size, top_k, hidden_dim)
+-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+-        
+-+-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+-        
+-+-#         return moe_output.squeeze(1)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_prefill(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         """
+-+-#         【预填充路径】针对 sequence_length > 1 的优化。
+-+-#         按专家对 Token 进行分组，并进行批处理。
+-+-#         """
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens = hidden_states.shape[0]
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-        
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-        
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-        
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-            
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-#             selected_token_indices = token_indices[mask]
+-+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+-            
+-+-#             current_states = hidden_states[selected_token_indices]
+-+-            
+-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-            
+-+-#             moe_output = moe_output.index_add(
+-+-#                 dim=0,
+-+-#                 index=selected_token_indices,
+-+-#                 source=expert_output.to(hidden_states.dtype)
+-+-#             )
+-+-#         return moe_output
+-+-
+-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-#         """
+-+-#         顶层 forward 方法，作为智能分发器。
+-+-#         """
+-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        
+-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-#         router_logits = self.gate(hidden_states_reshaped)
+-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-
+-+-#         if self.norm_topk_prob:
+-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        
+-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+-        
+-+-#         moe_output = None
+-+-#         # 在推理时，根据序列长度选择最优路径
+-+-#         if not self.training:
+-+-#             if sequence_length == 1:
+-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+-#             else:
+-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+-#         else:
+-+-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+-#             raise NotImplementedError("Training path is not implemented.")
+-+-
+-+-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+-        
+-+-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+-        
+-+-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+-        
+-+-#         return final_hidden_states, router_logits
+-+-
+-+-
+-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-#     """
+-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+-#     """
+-+-#     def __init__(self, config: Qwen2MoeConfig):
+-+-#         super().__init__()
+-+-#         self.num_experts = config.num_experts
+-+-#         self.top_k = config.num_experts_per_tok
+-+-#         self.norm_topk_prob = config.norm_topk_prob
+-+-
+-+-#         # 门控网络
+-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+-#         # 专家列表
+-+-#         self.experts = nn.ModuleList(
+-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+-#         )
+-+-#         # 共享专家
+-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_decode(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         batch_size, _ = hidden_states.shape
+-+-#         expert_outputs_list = [
+-+-#             ops.cat([
+-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+-#             ], dim=0) 
+-+-#             for i in range(batch_size)
+-+-#         ]
+-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+-#         return moe_output.squeeze(1)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_prefill(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens = hidden_states.shape[0]
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-        
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-#             selected_token_indices = token_indices[mask]
+-+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+-#             current_states = hidden_states[selected_token_indices]
+-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-#             moe_output = moe_output.index_add(
+-+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+-#             )
+-+-#         return moe_output
+-+-
+-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-#         """
+-+-#         顶层 forward 方法，作为智能分发器。
+-+-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+-#         """
+-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        
+-+-#         # 1. 门控计算 (通用逻辑)
+-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-#         router_logits = self.gate(hidden_states_reshaped)
+-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-
+-+-#         if self.norm_topk_prob:
+-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        
+-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+-        
+-+-#         # 2. 智能分发到最优 MoE 路径
+-+-#         moe_output = None
+-+-#         if not self.training:
+-+-#             if sequence_length == 1:
+-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+-#             else:
+-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+-#         else:
+-+-#             raise NotImplementedError("Training path is not implemented.")
+-+-
+-+-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+-        
+-+-#         # 4. 合并 MoE 输出和共享专家输出
+-+-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+-        
+-+-#         # 5. 恢复原始形状并返回
+-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+-        
+-+-#         return final_hidden_states, router_logits
+-+-
+-+-# prefill fastest
+-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-#     """
+-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+-#     """
+-+-#     def __init__(self, config: Qwen2MoeConfig):
+-+-#         super().__init__()
+-+-#         self.num_experts = config.num_experts
+-+-#         self.top_k = config.num_experts_per_tok
+-+-#         self.norm_topk_prob = config.norm_topk_prob
+-+-
+-+-#         # 门控网络
+-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+-#         # 专家列表
+-+-#         self.experts = nn.ModuleList(
+-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+-#         )
+-+-#         # 共享专家
+-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_dispatch(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         """
+-+-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+-#         """
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens, _ = hidden_states.shape
+-+-        
+-+-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-#         flat_routing_weights = routing_weights.flatten()
+-+-
+-+-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-
+-+-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-        
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-            
+-+-#             # 找到所有分配给该专家的 token
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-            
+-+-#             # 使用 mask 选取对应的 token 和权重
+-+-#             current_token_indices = token_indices[mask]
+-+-#             current_routing_weights = flat_routing_weights[mask]
+-+-#             current_hidden_states = hidden_states[current_token_indices]
+-+-            
+-+-#             # 对这些 token 进行批处理
+-+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+-            
+-+-#             # 使用 index_add 将结果精确地加回到对应位置
+-+-#             moe_output = moe_output.index_add(
+-+-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+-#             )
+-+-#         return moe_output
+-+-
+-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-#         """
+-+-#         顶层 forward 方法，作为智能分发器。
+-+-#         """
+-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        
+-+-#         # 1. 门控计算
+-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-#         router_logits = self.gate(hidden_states_reshaped)
+-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-
+-+-#         if self.norm_topk_prob:
+-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        
+-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+-        
+-+-#         # 2. 调用统一的 MoE 计算内核
+-+-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+-
+-+-#         # 3. 统一处理共享专家
+-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+-        
+-+-#         # 4. 合并输出
+-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+-        
+-+-#         # 5. 恢复原始形状并返回
+-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+-        
+-+-#         return final_hidden_states, router_logits
+-+-
+-+-
+-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-#     """
+-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+-#     【最终高性能与高精度版】：
+-+-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+-#     3. 这样实现了速度和准确性的两全其美。
+-+-#     """
+-+-#     def __init__(self, config: Qwen2MoeConfig):
+-+-#         super().__init__()
+-+-#         self.num_experts = config.num_experts
+-+-#         self.top_k = config.num_experts_per_tok
+-+-#         self.norm_topk_prob = config.norm_topk_prob
+-+-
+-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+-#         self.experts = nn.ModuleList(
+-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+-#         )
+-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_decode(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         """
+-+-#         【解码路径】极致优化版：bmm + 高精度累加。
+-+-#         """
+-+-#         original_dtype = hidden_states.dtype
+-+-#         batch_size, _ = hidden_states.shape
+-+-        
+-+-#         expert_outputs_list = [
+-+-#             ops.cat([
+-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+-#             ], dim=0) 
+-+-#             for i in range(batch_size)
+-+-#         ]
+-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+-
+-+-#         # 在 float32 下执行 bmm，得到高精度结果
+-+-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+-        
+-+-#         # 将高精度结果转换回原始数据类型
+-+-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+-        
+-+-#         return moe_output
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_prefill(
+-+-#         self, 
+-+-#         hidden_states: mindspore.Tensor, 
+-+-#         selected_experts: mindspore.Tensor, 
+-+-#         routing_weights: mindspore.Tensor
+-+-#     ) -> mindspore.Tensor:
+-+-#         """
+-+-#         【预填充路径】与原始实现一致，结果精确。
+-+-#         """
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens, _ = hidden_states.shape
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-        
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-#             selected_token_indices = token_indices[mask]
+-+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+-#             current_states = hidden_states[selected_token_indices]
+-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-#             moe_output = moe_output.index_add(
+-+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+-#             )
+-+-#         return moe_output
+-+-
+-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-        
+-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-#         router_logits = self.gate(hidden_states_reshaped)
+-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-
+-+-#         if self.norm_topk_prob:
+-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        
+-+-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+-#         # 如果模型主体是 float16，后续再转换
+-+-        
+-+-#         moe_output = None
+-+-#         if not self.training:
+-+-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+-#             # _moe_infer_decode 内部会处理好类型转换
+-+-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+-#             if sequence_length == 1:
+-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+-#             else:
+-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+-#         else:
+-+-#             raise NotImplementedError("Training path is not implemented.")
+-+-
+-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+-        
+-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+-        
+-+-#         return final_hidden_states, router_logits
+-+-    
+-+-
+-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-#     """
+-+-#     【融合版】一个混合专家模块，内置两种推理策略，
+-+-#     由外部全局变量 `Long_Prompt` 控制：
+-+-
+-+-#     - if Long_Prompt is True:  【精度优先模式】
+-+-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+-#       适用于处理长序列，避免误差累积。
+-+-
+-+-#     - if Long_Prompt is False: 【速度优先模式】
+-+-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+-#     """
+-+-#     def __init__(self, config: Qwen2MoeConfig):
+-+-#         super().__init__()
+-+-#         self.num_experts = config.num_experts
+-+-#         self.top_k = config.num_experts_per_tok
+-+-#         self.norm_topk_prob = config.norm_topk_prob
+-+-
+-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+-#         self.experts = nn.ModuleList(
+-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+-#         )
+-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+-
+-+-#     # --- 速度优先模式的辅助函数 ---
+-+-#     @no_grad()
+-+-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+-#         original_dtype = hidden_states.dtype
+-+-#         batch_size, _ = hidden_states.shape
+-+-#         expert_outputs_list = [
+-+-#             ops.cat([
+-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+-#             ], dim=0) 
+-+-#             for i in range(batch_size)
+-+-#         ]
+-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+-
+-+-#     @no_grad()
+-+-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens, _ = hidden_states.shape
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-#             selected_token_indices = token_indices[mask]
+-+-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+-#             current_states = hidden_states[selected_token_indices]
+-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+-#         return moe_output
+-+-
+-+-#     # --- 精度优先模式的辅助函数 ---
+-+-#     @no_grad()
+-+-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+-#         moe_output = ops.zeros_like(hidden_states)
+-+-#         num_tokens, _ = hidden_states.shape
+-+-#         flat_selected_experts = selected_experts.flatten()
+-+-#         flat_routing_weights = routing_weights.flatten()
+-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+-#         active_experts = ops.unique(flat_selected_experts)
+-+-#         for expert_idx_tensor in active_experts:
+-+-#             expert_idx = expert_idx_tensor.item()
+-+-#             expert_layer = self.experts[expert_idx]
+-+-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+-#             current_token_indices = token_indices[mask]
+-+-#             current_routing_weights = flat_routing_weights[mask]
+-+-#             current_hidden_states = hidden_states[current_token_indices]
+-+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+-#         return moe_output
+-+-
+-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+-#         global Long_Prompt
+-+-
+-+-#         # 1. 门控计算 (所有模式通用)
+-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-#         router_logits = self.gate(hidden_states_reshaped)
+-+-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+-#         if self.norm_topk_prob:
+-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-        
+-+-#         moe_output = None
+-+-#         if not self.training:
+-+-#             # 根据 Long_Prompt 标志选择模式
+-+-#             if Long_Prompt:
+-+-#                 # --- 精度优先模式 ---
+-+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+-#             else:
+-+-#                 # --- 速度优先模式 ---
+-+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+-#                 if sequence_length == 1:
+-+-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+-#                 else:
+-+-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+-#         else:
+-+-#             raise NotImplementedError("Training path is not implemented.")
+-+-
+-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+-        
+-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+-        
+-+-#         return final_hidden_states, router_logits
+-+-    
+-+ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+     """
+-+     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+ 
+-++    # @no_grad()
+-++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++    #     num_tokens, _ = hidden_states.shape
+-++    #     flat_selected_experts = selected_experts.flatten()
+-++    #     sorted_expert_indices = flat_selected_experts.argsort()
+-++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++    #     original_token_indices = sorted_expert_indices // self.top_k
+-++    #     moe_output = ops.zeros_like(hidden_states)
+-++    #     current_token_offset = 0
+-++    #     for i in range(self.num_experts):
+-++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-++    #         if expert_token_count == 0:
+-++    #             continue
+-++    #         end_offset = current_token_offset + expert_token_count
+-++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++    #         current_token_offset += expert_token_count
+-++    #     return moe_output
+-++
+-+     @no_grad()
+-+     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+-        num_tokens, _ = hidden_states.shape
+-+-        flat_selected_experts = selected_experts.flatten()
+-+-        sorted_expert_indices = flat_selected_experts.argsort()
+-+-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+-        original_token_indices = sorted_expert_indices // self.top_k
+-++        """
+-++        优化版 MoE prefill (速度优先模式)：
+-++        - 批量张量化处理同一个 expert 的所有 token
+-++        - 跳过无 token 的专家
+-++        - 保持结果完全一致
+-++        """
+-+         moe_output = ops.zeros_like(hidden_states)
+-+-        current_token_offset = 0
+-+-        for i in range(self.num_experts):
+-+-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+-            if expert_token_count == 0:
+-+-                continue
+-+-            end_offset = current_token_offset + expert_token_count
+-+-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+-            current_token_offset += expert_token_count
+-++
+-++        flat_selected_experts = selected_experts.flatten()
+-++        flat_routing_weights = routing_weights.flatten()
+-++
+-++        idxs = flat_selected_experts.argsort()
+-++        sorted_expert_indices = flat_selected_experts[idxs]
+-++        sorted_token_indices = idxs // self.top_k
+-++
+-++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-++
+-++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-++
+-++        for expert_id in active_experts.tolist():
+-++            start = int(tokens_per_expert[:expert_id].sum().item())
+-++            end = start + int(tokens_per_expert[expert_id].item())
+-++
+-++            token_idx = sorted_token_indices[start:end]
+-++            expert_tokens = hidden_states[token_idx]
+-++
+-++            expert_out = self.experts[expert_id](expert_tokens)
+-++
+-++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-++
+-++            moe_output = mindspore.mint.scatter_add(
+-++                moe_output,
+-++                0,
+-++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-++                scaled_out.to(hidden_states.dtype)
+-++            )
+-++
+-+         return moe_output
+-+ 
+-++
+-+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+     @no_grad()
+-+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+         
+-+         moe_output = None
+-+-        if Long_Prompt:
+-+-            # --- 精度优先模式 (ACCURACY MODE) ---
+-+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        # if Long_Prompt==0:
+-++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        # else:
+-++        #     # --- 速度优先模式 (SPEED MODE) ---
+-++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++        #     if sequence_length == 1:
+-++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        #     else:
+-++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++        
+-++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++        if sequence_length == 1:
+-++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+         else:
+-+-            # --- 速度优先模式 (SPEED MODE) ---
+-+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+-            if sequence_length == 1:
+-+-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+-            else:
+-+-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+-        
+-++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++    
+-+ 
+-+         # 3. 共享专家计算与合并 (所有模式通用)
+-+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+         
+-+         return final_hidden_states, router_logits
+-+ 
+-++
+-+ class Qwen2MoeDecoderLayer(nn.Module):
+-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+         super().__init__()
+-+         self.hidden_size = config.hidden_size
+-+         
+-+-        # if Long_Prompt:
+-+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+-        # else:
+-++        # if Long_Prompt == 2:
+-+         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++        # else:
+-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+ 
+-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+ 
+-+@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+             )
+-+ 
+-+         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-+-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++        #     attention_mask,
+-++        #     sequence_length=sequence_length,
+-++        #     target_length=target_length,
+-++        #     dtype=dtype,
+-++        #     min_dtype=min_dtype,
+-++        #     cache_position=cache_position,
+-++        #     batch_size=input_tensor.shape[0],
+-++        # )
+-++        #@dwj
+-++        causal_mask = get_cached_causal_mask_with_cache_position(
+-+             attention_mask,
+-+             sequence_length=sequence_length,
+-+             target_length=target_length,
+-+@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+         """
+-+-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-++        _causal_mask_cache.clear()
+-+ 
+-+         input_ids = kwargs.get("input_ids")
+-+         if input_ids is None and args:
+-+@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+ 
+-+         if input_ids is not None:
+-+             prompt_length = input_ids.shape[1]
+-+-            
+-+-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+-                Long_Prompt = True
+-++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-++                Long_Prompt = 2
+-++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-++                Long_Prompt = 0
+-+             else:
+-+-                Long_Prompt = False
+-++                Long_Prompt = 1
+-++
+-+ 
+-+         return super().generate(*args, **kwargs)
+-+     
+-+@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+             dtype = self.lm_head.weight.dtype
+-+             min_dtype = float(ops.finfo(dtype).min)
+-+ 
+-+-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++            #     attention_mask,
+-++            #     sequence_length=sequence_length,
+-++            #     target_length=past_key_values.get_max_length(),
+-++            #     dtype=dtype,
+-++            #     min_dtype=min_dtype,
+-++            #     cache_position=cache_position,
+-++            #     batch_size=batch_size,
+-++            # )
+-++
+-++            #@dwj
+-++            attention_mask = get_cached_causal_mask_with_cache_position(
+-+                 attention_mask,
+-+                 sequence_length=sequence_length,
+-+                 target_length=past_key_values.get_max_length(),
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+deleted file mode 100644
+-+index 6dfb5b93..00000000
+-+--- a/patches/0001-20251104commit.patch
+-++++ /dev/null
+-+@@ -1,1272 +0,0 @@
+-+-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+-From: Pinoeer-kingxi <13022943007@163.com>
+-+-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+-Subject: [PATCH] 20251104commit
+-+-
+-+----
+-+- mindnlp/transformers/cache_utils.py           |  28 +-
+-+- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+- 3 files changed, 976 insertions(+), 87 deletions(-)
+-+-
+-+-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+-index cadd2e04..02f8d4be 100644
+-+---- a/mindnlp/transformers/cache_utils.py
+-+-+++ b/mindnlp/transformers/cache_utils.py
+-+-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+-             # k_out[:, :, cache_position] = key_states
+-+-             # v_out[:, :, cache_position] = value_states
+-+--            if ON_ORANGE_PI:
+-+--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+--            else:
+-+--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+--
+-+-+            # if ON_ORANGE_PI:
+-+-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+-+            # else:
+-+-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-+-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+-+            if cache_position.ndim > 1:
+-+-+                cache_position = cache_position.flatten()
+-+-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+-+                cache_position = cache_position.int()
+-+-+            
+-+-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+-+            k_out[:, :, cache_position] = key_states
+-+-+            v_out[:, :, cache_position] = value_states
+-+-+            
+-+-         return k_out, v_out
+-+- 
+-+-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+-index c695b944..d8303e45 100644
+-+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+- def rotate_half(x):
+-+-     """Rotates half the hidden dims of the input."""
+-+--    x1 = x[..., : x.shape[-1] // 2]
+-+--    x2 = x[..., x.shape[-1] // 2 :]
+-+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+-+    # x1 = x[..., : x.shape[-1] // 2]
+-+-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+-     return ops.cat((-x2, x1), dim=-1)
+-+- 
+-+- 
+-+-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+-         if self.training:
+-+-             raise NotImplementedError("Training is not supported yet.")
+-+-         else:
+-+--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+--        if self.config.n_shared_experts is not None:
+-+--            y = y + self.shared_experts(identity)
+-+--        return y
+-+-+            # @lwx
+-+-+            if orig_shape[1] == 1:
+-+-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+-+                y=y.view(*orig_shape)
+-+-+                if self.config.n_shared_experts is not None:
+-+-+                    y = y + self.shared_experts(identity)
+-+-+                return y
+-+-+            else:
+-+-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+-+                if self.config.n_shared_experts is not None:
+-+-+                    y = y + self.shared_experts(identity)
+-+-+                return y
+-+-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+-+        # if self.config.n_shared_experts is not None:
+-+-+        #     y = y + self.shared_experts(identity)
+-+-+        # return y
+-+-+        
+-+-+    @no_grad()
+-+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+-+
+-+-+        expert_cache = ops.zeros_like(x)
+-+-+        for i in range(self.num_experts_per_tok):
+-+-+            expert_id = flat_expert_indices[i].item()
+-+-+            weight = flat_expert_weights[i].item()
+-+-+            expert = self.experts[expert_id]
+-+-+            expert_out = expert(x)
+-+-+            expert_cache += expert_out * weight
+-+-+        return expert_cache
+-+- 
+-+-     @no_grad()
+-+--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+--        # expert_cache = torch.zeros_like(x)
+-+--        # idxs = flat_expert_indices.argsort()
+-+--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+--        # token_idxs = idxs // self.num_experts_per_tok
+-+--        # for i, end_idx in enumerate(tokens_per_expert):
+-+--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+--        #     if start_idx == end_idx:
+-+--        #         continue
+-+--        #     expert = self.experts[i]
+-+--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+--        #     expert_tokens = x[exp_token_idx]
+-+--        #     expert_out = expert(expert_tokens)
+-+--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+--        # return expert_cache
+-+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+-         expert_cache = ops.zeros_like(x)
+-+-         idxs = flat_expert_indices.argsort()
+-+-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+-         token_idxs = idxs // self.num_experts_per_tok
+-+-+
+-+-         for i, end_idx in enumerate(tokens_per_expert):
+-+-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+-             if start_idx == end_idx:
+-+-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+-             expert_out = expert(expert_tokens)
+-+-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+-+
+-+-         return expert_cache
+-+-+        
+-+-+    # @no_grad()
+-+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-+    #     # expert_cache = torch.zeros_like(x)
+-+-+    #     # idxs = flat_expert_indices.argsort()
+-+-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-+-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+-+    #     #     if start_idx == end_idx:
+-+-+    #     #         continue
+-+-+    #     #     expert = self.experts[i]
+-+-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+-+    #     #     expert_tokens = x[exp_token_idx]
+-+-+    #     #     expert_out = expert(expert_tokens)
+-+-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+-+    #     # return expert_cache
+-+-+    #     expert_cache = ops.zeros_like(x)
+-+-+    #     idxs = flat_expert_indices.argsort()
+-+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+-+
+-+-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+-+    #         if start_idx == end_idx:
+-+-+    #             continue
+-+-+    #         expert = self.experts[i]
+-+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+-+    #         expert_tokens = x[exp_token_idx]
+-+-+    #         expert_out = expert(expert_tokens)
+-+-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+-+
+-+-+    #     return expert_cache
+-+-+    # @no_grad()
+-+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+-+    #     expert_cache = ops.zeros_like(x)
+-+-+
+-+-+    #     # 排序保证顺序一致
+-+-+    #     idxs = flat_expert_indices.argsort()
+-+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+-+
+-+-+    #     # 找出有 token 的专家
+-+-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+-+
+-+-+    #     for i in active_experts.tolist():
+-+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+-+    #         end_idx = tokens_per_expert[i]
+-+-+    #         if start_idx == end_idx:  # 没有 token
+-+-+    #             continue
+-+-+
+-+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+-+    #         expert_tokens = x[exp_token_idx]
+-+-+    #         expert_out = self.experts[i](expert_tokens)
+-+-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+-+
+-+-+    #         expert_cache = mindspore.mint.scatter_add(
+-+-+    #             expert_cache,
+-+-+    #             0,
+-+-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+-+    #             expert_out
+-+-+    #         )
+-+-+
+-+-+    #     return expert_cache
+-+-+
+-+-+
+-+- 
+-+- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+- #     """
+-+-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+- 
+-+-         # Initialize weights and apply final processing
+-+-         self.post_init()
+-+-+        self.warm_up = False
+-+-+
+-+-+    def warmup_moe_model_deep(self):
+-+-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+-+        test_texts = [
+-+-+            "warmup short",
+-+-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+-+        ]
+-+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+-+        if tokenizer is None:
+-+-+            from mindnlp.transformers import AutoTokenizer
+-+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+-+            self._warmup_tokenizer = tokenizer
+-+-+
+-+-+        for text in test_texts:
+-+-+            inputs = tokenizer(text, return_tensors="ms")
+-+-+            with mindspore._no_grad():
+-+-+                _ = self(**inputs, use_cache=False)
+-+-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+- 
+-+-     def get_input_embeddings(self):
+-+-         return self.model.embed_tokens
+-+-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+-         ```"""
+-+-+        if not self.warm_up:
+-+-+            self.warm_up = True
+-+-+            self.warmup_moe_model_deep()
+-+-+
+-+-         output_attentions = (
+-+-             output_attentions
+-+-             if output_attentions is not None
+-+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+-index 3cbf820e..d4c6b651 100644
+-+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+-@@ -18,7 +18,6 @@
+-+- # See the License for the specific language governing permissions and
+-+- # limitations under the License.
+-+- """MindSpore Qwen2MoE model."""
+-+--
+-+- import math
+-+- from typing import List, Optional, Tuple, Union
+-+- 
+-+-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+-     TokenClassifierOutput,
+-+- )
+-+- from ...modeling_utils import PreTrainedModel
+-+-+from ...generation import GenerationMixin
+-+- from ....utils import logging
+-+- from .configuration_qwen2_moe import Qwen2MoeConfig
+-+- 
+-+-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+-         self.variance_epsilon = eps
+-+- 
+-+-     def forward(self, hidden_states):
+-+-+        # @dwj
+-+-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+-+        # @lwx
+-+-+        # if not self.training :
+-+-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+-         input_dtype = hidden_states.dtype
+-+-         hidden_states = hidden_states.to(mindspore.float32)
+-+-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+-@@ -234,6 +239,8 @@ def rotate_half(x):
+-+-     """Rotates half the hidden dims of the input."""
+-+-     x1 = x[..., : x.shape[-1] // 2]
+-+-     x2 = x[..., x.shape[-1] // 2 :]
+-+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+-     return ops.cat((-x2, x1), dim=-1)
+-+- 
+-+- 
+-+-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+-         self.config = config
+-+-         self.hidden_size = config.hidden_size
+-+-         self.intermediate_size = intermediate_size
+-+-+        
+-+-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+-         self.act_fn = ACT2FN[config.hidden_act]
+-+- 
+-+-     def forward(self, x):
+-+--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+--
+-+- 
+-+-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+-+        # @lwx 
+-+-+        # gate_up_output = self.gate_up_proj(x)
+-+-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+-+        # return self.down_proj(swiglu_output)
+-+-+
+-+-+    # def forward(self, x):
+-+-+    #     gate_proj_out = self.gate_proj(x)
+-+-+    #     up_proj_out = self.up_proj(x)
+-+-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+-+    #     return self.down_proj(swiglu_out)
+-+-+        
+-+- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+-     """
+-+-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+-         use_cache: bool = False,
+-+-         cache_position: Optional[mindspore.Tensor] = None,
+-+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-+
+-+-+        
+-+-+
+-+-         bsz, q_len, _ = hidden_states.shape
+-+- 
+-+-         query_states = self.q_proj(hidden_states)
+-+-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-                     "with a layer index."
+-+-                 )
+-+--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-+            if isinstance(past_key_value, StaticCache):
+-+-+                kv_seq_len = key_states.shape[-2]
+-+-+            else:
+-+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+- 
+-+-         if past_key_value is not None:
+-+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+-+            
+-+-+            if isinstance(past_key_value, StaticCache):
+-+-+                kv_seq_len = key_states.shape[-2]
+-+- 
+-+-         # repeat k/v heads if n_kv_heads < n_heads
+-+-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+--
+-+-+        
+-+-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+- 
+-+--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+--            raise ValueError(
+-+--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+--                f" {attn_weights.shape}"
+-+--            )
+-+--
+-+--        if attention_mask is not None:  # no matter the length, we just slice it
+-+--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+-+        if attention_mask is not None:
+-+-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+-             attn_weights = attn_weights + causal_mask
+-+- 
+-+-         # upcast attention to fp32
+-+-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+- 
+-+-         attn_output = self.o_proj(attn_output)
+-+--
+-+-+        # @lwx
+-+-+        
+-+-+        # max_seq_len = self.max_position_embeddings  # 2048
+-+-+
+-+-+        # if attention_mask is not None:
+-+-+        #     # attention_mask: [B, 1, Sq, Sk]
+-+-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+-+
+-+-+        #     # pad 到 [max_seq_len, max_seq_len]
+-+-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+-+        #     global_attention_mask = padded_mask
+-+-+        # else:
+-+-+        #     global_attention_mask = None
+-+-+
+-+-+
+-+-+        # sparse_mode=3
+-+-+        # attn_output = mindspore.ops.flash_attention_score(
+-+-+        #     query=query_states,
+-+-+        #     key=key_states,
+-+-+        #     value=value_states,
+-+-+        #     real_shift=None,
+-+-+        #     padding_mask=None,
+-+-+
+-+-+        #     head_num=self.num_heads,
+-+-+        #     attn_mask=global_attention_mask,
+-+-+        #     keep_prob=1.0 - self.attention_dropout,
+-+-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-+        #     input_layout="BNSD",
+-+-+        #     pre_tokens=2147483647,
+-+-+        #     next_tokens=2147483647,
+-+-+        #     inner_precise=0,
+-+-+        #     drop_mask=None,
+-+-+        #     prefix=None,
+-+-+        #     actual_seq_qlen=None,
+-+-+        #     actual_seq_kvlen=None,
+-+-+        #     sparse_mode=sparse_mode,
+-+-+        # )
+-+-         if not output_attentions:
+-+-             attn_weights = None
+-+- 
+-+-         return attn_output, attn_weights, past_key_value
+-+- 
+-+- 
+-+-+class Qwen2MoeFlashAttention(nn.Module):
+-+-+    """
+-+-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+-+
+-+-+    关键改动:
+-+-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+-+       直接传入原始的 key 和 value 张量效率更高。
+-+-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+-+    """
+-+-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+-+        super().__init__()
+-+-+        self.config = config
+-+-+        self.layer_idx = layer_idx
+-+-+        self.hidden_size = config.hidden_size
+-+-+        self.num_heads = config.num_attention_heads
+-+-+        self.head_dim = self.hidden_size // self.num_heads
+-+-+        self.num_key_value_heads = config.num_key_value_heads
+-+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+-+        self.max_position_embeddings = config.max_position_embeddings
+-+-+        self.rope_theta = config.rope_theta
+-+-+        self.attention_dropout = config.attention_dropout
+-+-+
+-+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+-+            raise ValueError(
+-+-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+-+            )
+-+-+
+-+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+-+
+-+-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+-+            self.head_dim,
+-+-+            max_position_embeddings=self.max_position_embeddings,
+-+-+            base=self.rope_theta,
+-+-+        )
+-+-+
+-+-+    def forward(
+-+-+        self,
+-+-+        hidden_states: mindspore.Tensor,
+-+-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+-+        position_ids: Optional[mindspore.Tensor] = None,
+-+-+        past_key_value: Optional[Cache] = None,
+-+-+        output_attentions: bool = False,
+-+-+        use_cache: bool = False,
+-+-+        cache_position: Optional[mindspore.Tensor] = None,
+-+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-+
+-+-+        bsz, q_len, _ = hidden_states.shape
+-+-+
+-+-+        # 1. 线性投射 Q, K, V
+-+-+        query_states = self.q_proj(hidden_states)
+-+-+        key_states = self.k_proj(hidden_states)
+-+-+        value_states = self.v_proj(hidden_states)
+-+-+
+-+-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+
+-+-+        # 3. RoPE 旋转位置编码
+-+-+        kv_seq_len = key_states.shape[-2]
+-+-+        if past_key_value is not None:
+-+-+            if self.layer_idx is None:
+-+-+                raise ValueError(
+-+-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-+                    "with a layer index."
+-+-+                )
+-+-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+-+                if cache_position.shape[0] == 1:
+-+-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+-+                    kv_seq_len = past_seen_tokens + 1
+-+-+                else:
+-+-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-+-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+-+            else:
+-+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-+
+-+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-+
+-+-+        # 4. KV 缓存更新
+-+-+        if past_key_value is not None:
+-+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-+            key_states, value_states = past_key_value.update(
+-+-+                key_states, value_states, self.layer_idx, cache_kwargs
+-+-+            )
+-+-+            
+-+-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+-+                if cache_position.shape[0] == 1:
+-+-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+-+                    kv_seq_len = key_states.shape[-2]
+-+-+
+-+-+        # 5. [重要] 准备 Attention Mask
+-+-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+-+        fa_attention_mask = None
+-+-+        if attention_mask is not None:
+-+-+            # 截取与当前key长度匹配的部分
+-+-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+-+            fa_attention_mask = (mask_slice != 0)
+-+-+
+-+-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+-+        input_dtype = query_states.dtype
+-+-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+-+            query_states = query_states.to(mindspore.float16)
+-+-+            key_states = key_states.to(mindspore.float16)
+-+-+            value_states = value_states.to(mindspore.float16)
+-+-+
+-+-+        # 6. [核心] 调用 flash_attention_score 算子
+-+-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+-+        attn_output = mindspore.ops.flash_attention_score(
+-+-+            query=query_states,
+-+-+            key=key_states,
+-+-+            value=value_states,
+-+-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-+-+            attn_mask=fa_attention_mask,
+-+-+            keep_prob=1.0 - self.attention_dropout,
+-+-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-+            input_layout="BNSD",
+-+-+            sparse_mode=0 # 使用 defaultMask 模式
+-+-+        )
+-+-+
+-+-+        # 恢复原始数据类型
+-+-+        attn_output = attn_output.to(input_dtype)
+-+-+
+-+-+        # 7. 调整输出形状
+-+-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-+        attn_output = self.o_proj(attn_output)
+-+-+
+-+-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-+-+        attn_weights = None
+-+-+        if output_attentions:
+-+-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+-+
+-+-+        return attn_output, attn_weights, past_key_value
+-+-+
+-+-+    # def forward(
+-+-+    #     self,
+-+-+    #     hidden_states: mindspore.Tensor,
+-+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+-+    #     past_key_value: Optional[Cache] = None,
+-+-+    #     output_attentions: bool = False,
+-+-+    #     use_cache: bool = False,
+-+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-+
+-+-+    #     bsz, q_len, _ = hidden_states.shape
+-+-+
+-+-+    #     # 1. 线性投射 Q, K, V
+-+-+    #     query_states = self.q_proj(hidden_states)
+-+-+    #     key_states = self.k_proj(hidden_states)
+-+-+    #     value_states = self.v_proj(hidden_states)
+-+-+
+-+-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+
+-+-+    #     # 3. RoPE 旋转位置编码
+-+-+    #     kv_seq_len = key_states.shape[-2]
+-+-+    #     if past_key_value is not None:
+-+-+    #         if self.layer_idx is None:
+-+-+    #             raise ValueError(
+-+-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-+    #                 "with a layer index."
+-+-+    #             )
+-+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-+
+-+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-+
+-+-+    #     # 4. KV 缓存更新
+-+-+    #     if past_key_value is not None:
+-+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-+    #         key_states, value_states = past_key_value.update(
+-+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+-+    #         )
+-+-+
+-+-+    #     # 5. 准备 Attention Mask
+-+-+    #     fa_attention_mask = None
+-+-+    #     if attention_mask is not None:
+-+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-+    #         fa_attention_mask = (mask_slice != 0)
+-+-+
+-+-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+-+    #     input_dtype = query_states.dtype
+-+-+
+-+-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+-+    #         query=query_states,
+-+-+    #         key=key_states,
+-+-+    #         value=value_states,
+-+-+    #         head_num=self.num_heads,
+-+-+    #         attn_mask=fa_attention_mask,
+-+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-+    #         input_layout="BNSD",
+-+-+    #         sparse_mode=0,
+-+-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+-+    #         inner_precise=1
+-+-+    #     )
+-+-+
+-+-+    #     # 恢复原始数据类型
+-+-+    #     attn_output = attn_output.to(input_dtype)
+-+-+
+-+-+    #     # 7. 调整输出形状
+-+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-+    #     attn_output = self.o_proj(attn_output)
+-+-+
+-+-+    #     attn_weights = None
+-+-+    #     if output_attentions:
+-+-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+-+
+-+-+    #     return attn_output, attn_weights, past_key_value
+-+-+
+-+-+    # def forward(
+-+-+    #     self,
+-+-+    #     hidden_states: mindspore.Tensor,
+-+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+-+    #     past_key_value: Optional[Cache] = None,
+-+-+    #     output_attentions: bool = False,
+-+-+    #     use_cache: bool = False,
+-+-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-+
+-+-+    #     bsz, q_len, _ = hidden_states.shape
+-+-+
+-+-+    #     query_states = self.q_proj(hidden_states)
+-+-+    #     key_states = self.k_proj(hidden_states)
+-+-+    #     value_states = self.v_proj(hidden_states)
+-+-+
+-+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-+
+-+-+    #     kv_seq_len = key_states.shape[-2]
+-+-+    #     if past_key_value is not None:
+-+-+    #         if self.layer_idx is None:
+-+-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-+
+-+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-+
+-+-+    #     if past_key_value is not None:
+-+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+-+    #         key_states, value_states = past_key_value.update(
+-+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+-+    #         )
+-+-+
+-+-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+-+
+-+-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-+-+    #     # <--- 修改结束 ---
+-+-+
+-+-+    #     fa_attention_mask = None
+-+-+    #     if attention_mask is not None:
+-+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+-+    #         fa_attention_mask = (mask_slice != 0)
+-+-+
+-+-+    #     input_dtype = query_states.dtype
+-+-+
+-+-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-+-+    #         key=key_states,
+-+-+    #         value=value_states,
+-+-+    #         head_num=self.num_heads,
+-+-+    #         attn_mask=fa_attention_mask,
+-+-+    #         keep_prob=1.0 - self.attention_dropout,
+-+-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+-+    #         input_layout="BNSD",
+-+-+    #         sparse_mode=0,
+-+-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-+-+    #     )
+-+-+
+-+-+    #     attn_output = attn_output.to(input_dtype)
+-+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-+    #     attn_output = self.o_proj(attn_output)
+-+-+
+-+-+    #     attn_weights = None
+-+-+    #     if output_attentions:
+-+-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+-+
+-+-+    #     return attn_output, attn_weights, past_key_value
+-+-+
+-+- QWEN2MOE_ATTENTION_CLASSES = {
+-+-     "eager": Qwen2MoeAttention,
+-+-+    "flash-attention": Qwen2MoeFlashAttention,
+-+- }
+-+- 
+-+- 
+-+-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+- 
+-+-+    #@dwj
+-+-+    # 只遍历激活的专家，而非全部专家
+-+-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+--        hidden_states = hidden_states.view(-1, hidden_dim)
+-+--        # router_logits: (batch * sequence_length, n_experts)
+-+--        router_logits = self.gate(hidden_states)
+-+--
+-+--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+--        if self.norm_topk_prob:
+-+--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+--        # we cast back to the input dtype
+-+--        routing_weights = routing_weights.to(hidden_states.dtype)
+-+--
+-+--        final_hidden_states = ops.zeros(
+-+--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+--        )
+-+--
+-+--        # One hot encode the selected experts to create an expert mask
+-+--        # this will be used to easily index which expert is going to be sollicitated
+-+--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+--
+-+--        # Loop over all available experts in the model and perform the computation on each expert
+-+--        for expert_idx in range(self.num_experts):
+-+--            expert_layer = self.experts[expert_idx]
+-+--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+--
+-+--            # Index the correct hidden states and compute the expert hidden state for
+-+--            # the current expert. We need to make sure to multiply the output hidden
+-+--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+--            if 0 not in idx.shape:
+-+--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+--
+-+--                # However `index_add_` only support torch tensors for indexing so we'll use
+-+--                # the `top_x` tensor here.
+-+--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+--
+-+--        shared_expert_output = self.shared_expert(hidden_states)
+-+--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+--
+-+--        final_hidden_states = final_hidden_states + shared_expert_output
+-+-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+-+            num_tokens = hidden_states_reshaped.shape[0]
+-+-+
+-+-+            router_logits = self.gate(hidden_states_reshaped)
+-+-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+-+
+-+-+            if self.norm_topk_prob:
+-+-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-+-+            
+-+-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+-+            flat_selected_experts = selected_experts.flatten()
+-+-+            
+-+-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+-+            token_indices = broadcasted_token_indices.flatten()
+-+-+            
+-+-+            active_experts = ops.unique(flat_selected_experts)
+-+-+            
+-+-+            for expert_idx_tensor in active_experts:
+-+-+                expert_idx = expert_idx_tensor.item()
+-+-+                expert_layer = self.experts[expert_idx]
+-+-+                
+-+-+                mask = (flat_selected_experts == expert_idx_tensor)
+-+-+                selected_token_indices = token_indices[mask]
+-+-+                selected_routing_weights = routing_weights.flatten()[mask]
+-+-+                
+-+-+                current_states = hidden_states_reshaped[selected_token_indices]
+-+-+                
+-+-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+-+                
+-+-+                final_hidden_states = final_hidden_states.index_add(
+-+-+                    dim=0,
+-+-+                    index=selected_token_indices,
+-+-+                    source=expert_output.to(hidden_states.dtype)
+-+-+                )
+-+-+            
+-+-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+- 
+-+--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+--        return final_hidden_states, router_logits
+-+-+            final_hidden_states = final_hidden_states + shared_expert_output
+-+-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+-+            
+-+-+            return final_hidden_states, router_logits
+-+- 
+-+- 
+-+- class Qwen2MoeDecoderLayer(nn.Module):
+-+-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+- 
+-+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+- 
+-+-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+-+
+-+-         if (layer_idx not in config.mlp_only_layers) and (
+-+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+-         ):
+-+-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+-     _skip_keys_device_placement = "past_key_values"
+-+-     _supports_cache_class = True
+-+-+#lwx
+-+-+    # _supports_static_cache = True
+-+- 
+-+-     def _init_weights(self, module):
+-+-         std = self.config.initializer_range
+-+-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+-         return causal_mask
+-+- 
+-+- 
+-+--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+-     _tied_weights_keys = ["lm_head.weight"]
+-+- 
+-+-     def __init__(self, config):
+-+-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+-         self.num_experts_per_tok = config.num_experts_per_tok
+-+-         # Initialize weights and apply final processing
+-+-         self.post_init()
+-+-+        # @lwx
+-+-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+-+        #     self.generation_config.cache_implementation = "static"
+-+-+        self._warmed_up = False
+-+-+
+-+-+    def warmup_moe_model(self):
+-+-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+-+        test_texts = [
+-+-+            "warmup short",
+-+-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+-+        ]
+-+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+-+        if tokenizer is None:
+-+-+            from mindnlp.transformers import AutoTokenizer
+-+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+-+            self._warmup_tokenizer = tokenizer
+-+-+
+-+-+        for text in test_texts:
+-+-+            inputs = tokenizer(text, return_tensors="ms")
+-+-+            with mindspore._no_grad():
+-+-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+- 
+-+-     def get_input_embeddings(self):
+-+-         return self.model.embed_tokens
+-+-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+-         ```"""
+-+-+        if not self._warmed_up:
+-+-+            self._warmed_up = True
+-+-+            self.warmup_moe_model()
+-+- 
+-+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+-         output_router_logits = (
+-+-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+-             }
+-+-         )
+-+-         return model_inputs
+-+-+# @lwx
+-+-+    # def _decode_one_tokens_logits(
+-+-+    #     self,
+-+-+    #     cur_token: mindspore.Tensor,
+-+-+    #     input_pos: Optional[mindspore.Tensor],
+-+-+    #     cache_position: mindspore.Tensor,
+-+-+    #     past_key_values: StaticCache,
+-+-+    # ) -> mindspore.Tensor:
+-+-+    #     """
+-+-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+-+        
+-+-+    #     Args:
+-+-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+-+    #         input_pos: 输入位置信息，可选
+-+-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+-+            
+-+-+    #     Returns:
+-+-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+-+    #     """
+-+-+    #     # 调用JIT编译的版本
+-+-+    #     return self.get_decode_one_tokens_logits(
+-+-+    #         cur_token=cur_token,
+-+-+    #         input_pos=input_pos,
+-+-+    #         cache_position=cache_position,
+-+-+    #         past_key_values=past_key_values,
+-+-+    #     )
+-+-+    
+-+-+    # @mindspore.jit(jit_level='O1')
+-+-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+-+    #     """
+-+-+    #     JIT编译的函数，用于高效的单token解码
+-+-+    #     使用JIT编译优化以支持静态shape和高效执行
+-+-+        
+-+-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+-+    #     """
+-+-+    #     outputs = self.model.forward(
+-+-+    #         input_ids=cur_token,
+-+-+    #         position_ids=input_pos,
+-+-+    #         cache_position=cache_position,
+-+-+    #         past_key_values=past_key_values,
+-+-+    #         use_cache=True,
+-+-+    #         return_dict=False,
+-+-+    #     )
+-+-+        
+-+-+    #     hidden_states = outputs[0]
+-+-+    #     logits = self.lm_head.forward(hidden_states)
+-+-+    #     logits = logits.float()
+-+-+        
+-+-+    #     return logits[:, -1, :]
+-+-+
+-+-+    # def _sample(
+-+-+    #     self,
+-+-+    #     input_ids: mindspore.Tensor,
+-+-+    #     logits_processor,
+-+-+    #     stopping_criteria,
+-+-+    #     generation_config,
+-+-+    #     synced_devices: bool,
+-+-+    #     streamer=None,
+-+-+    #     logits_warper=None,
+-+-+    #     **model_kwargs,
+-+-+    # ):
+-+-+    #     """
+-+-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+-+    #     """
+-+-+    #     from ...generation.logits_process import LogitsProcessorList
+-+-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+-+    #     from mindnlp.core import nn, ops, no_grad
+-+-+    #     import numpy as np
+-+-+        
+-+-+    #     # 检查是否使用 StaticCache
+-+-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+-+    #     # 否则，直接调用父类方法
+-+-+    #     past_key_values = model_kwargs.get("past_key_values")
+-+-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+-+
+-+-+    #     if not isinstance(past_key_values, StaticCache):
+-+-+    #         # 不使用 StaticCache，直接调用父类方法
+-+-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+-+    #         return super()._sample(
+-+-+    #             input_ids=input_ids,
+-+-+    #             logits_processor=logits_processor,
+-+-+    #             stopping_criteria=stopping_criteria,
+-+-+    #             generation_config=generation_config,
+-+-+    #             synced_devices=synced_devices,
+-+-+    #             streamer=streamer,
+-+-+    #             logits_warper=logits_warper,
+-+-+    #             **model_kwargs,
+-+-+    #         )
+-+-+        
+-+-+    #     # 使用 StaticCache，进入自定义循环
+-+-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+-+    #     pad_token_id = generation_config._pad_token_tensor
+-+-+    #     output_attentions = generation_config.output_attentions
+-+-+    #     output_hidden_states = generation_config.output_hidden_states
+-+-+    #     output_scores = generation_config.output_scores
+-+-+    #     output_logits = generation_config.output_logits
+-+-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+-+    #     max_length = generation_config.max_length
+-+-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+-+    #     do_sample = generation_config.do_sample
+-+-+        
+-+-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+-+    #         raise ValueError(
+-+-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+-+    #             f"{logits_warper})."
+-+-+    #         )
+-+-+        
+-+-+    #     # init attention / hidden states / scores tuples
+-+-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+-+        
+-+-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+-+    #         encoder_hidden_states = (
+-+-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+-+    #         )
+-+-+        
+-+-+    #     # keep track of which sequences are already finished
+-+-+    #     batch_size, cur_len = input_ids.shape
+-+-+    #     this_peer_finished = False
+-+-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+-+        
+-+-+    #     time_record = []
+-+-+    #     from ....utils.testing_utils import parse_flag_from_env
+-+-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+-+        
+-+-+    #     while self._has_unfinished_sequences(
+-+-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+-+    #     ):
+-+-+    #         if _record_time:
+-+-+    #             import time as time_module
+-+-+    #             infer_start = time_module.time()
+-+-+            
+-+-+    #         # prepare model inputs
+-+-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+-+            
+-+-+    #         # prepare variable output controls
+-+-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+-+            
+-+-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+-+    #         cur_cache_position = model_inputs.get("cache_position")
+-+-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+-+    #         cur_input_ids = model_inputs.get("input_ids")
+-+-+            
+-+-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+-+    #             cur_cache_position is not None and 
+-+-+    #             len(cur_cache_position.shape) > 0 and
+-+-+    #             cur_cache_position.shape[0] == 1 and
+-+-+    #             cur_input_ids is not None and
+-+-+    #             cur_input_ids.shape[1] == 1):
+-+-+    #             # 使用 JIT 优化的单 token 解码
+-+-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+-+    #             if not hasattr(self, '_jit_used'):
+-+-+    #                 self._jit_used = False
+-+-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+-+                
+-+-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+-+    #                 cur_token=cur_input_ids,
+-+-+    #                 input_pos=model_inputs.get("position_ids"),
+-+-+    #                 cache_position=cur_cache_position,
+-+-+    #                 past_key_values=cur_past_key_values,
+-+-+    #             )
+-+-+                
+-+-+    #             # 标记已使用JIT（用于后续判断）
+-+-+    #             if not self._jit_used:
+-+-+    #                 self._jit_used = True
+-+-+                
+-+-+    #             # 构造兼容的输出对象
+-+-+    #             class JitOptimizedOutput:
+-+-+    #                 def __init__(self, logits, config):
+-+-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+-+    #                     self.config = config
+-+-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+-+    #                     self.cross_attentions = None
+-+-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+-+                
+-+-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+-+    #         else:
+-+-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+-+    #             outputs = self(**model_inputs, return_dict=True)
+-+-+            
+-+-+    #         if synced_devices and this_peer_finished:
+-+-+    #             continue
+-+-+            
+-+-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+-+    #         next_token_logits = outputs.logits[:, -1, :]
+-+-+            
+-+-+    #         # pre-process distribution
+-+-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+-+    #         if do_sample:
+-+-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+-+            
+-+-+    #         # Store scores, attentions and hidden_states when required
+-+-+    #         if return_dict_in_generate:
+-+-+    #             if output_scores:
+-+-+    #                 scores += (next_token_scores,)
+-+-+    #             if output_logits:
+-+-+    #                 raw_logits += (next_token_logits,)
+-+-+    #             if output_attentions:
+-+-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+-+    #                 if self.config.is_encoder_decoder:
+-+-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+-+                
+-+-+    #             if output_hidden_states:
+-+-+    #                 hidden = (
+-+-+    #                     outputs.decoder_hidden_states
+-+-+    #                     if self.config.is_encoder_decoder
+-+-+    #                     else outputs.hidden_states
+-+-+    #                 )
+-+-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+-+            
+-+-+    #         # token selection
+-+-+    #         if do_sample:
+-+-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+-+    #         else:
+-+-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+-+            
+-+-+    #         # finished sentences should have their next token be a padding token
+-+-+    #         if has_eos_stopping_criteria:
+-+-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+-+            
+-+-+    #         # update generated ids, model inputs, and length for next step
+-+-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+-+    #         if streamer is not None:
+-+-+    #             streamer.put(next_tokens)
+-+-+            
+-+-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+-+    #             outputs,
+-+-+    #             model_kwargs,
+-+-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+-+    #         )
+-+-+            
+-+-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+-+    #         cur_len += 1
+-+-+            
+-+-+    #         if _record_time:
+-+-+    #             import time as time_module
+-+-+    #             infer_stop = time_module.time()
+-+-+    #             time_record.append(infer_stop - infer_start)
+-+-+            
+-+-+    #         del outputs
+-+-+        
+-+-+    #     average_infer_time = None
+-+-+    #     if time_record:
+-+-+    #         if len(time_record) > 1:
+-+-+    #             time_record.pop(0)
+-+-+    #         average_infer_time = sum(time_record) / len(time_record)
+-+-+    #         print(f'average inference time is: {average_infer_time}')
+-+-+    #         print(f'inference time record: {time_record}')
+-+-+        
+-+-+    #     if streamer is not None:
+-+-+    #         streamer.end()
+-+-+        
+-+-+    #     # 简单判断：打印是否使用了JIT路径
+-+-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+-+    #     else:
+-+-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+-+        
+-+-+    #     if return_dict_in_generate:
+-+-+    #         if self.config.is_encoder_decoder:
+-+-+    #             return GenerateEncoderDecoderOutput(
+-+-+    #                 sequences=input_ids,
+-+-+    #                 scores=scores,
+-+-+    #                 logits=raw_logits,
+-+-+    #                 encoder_attentions=encoder_attentions,
+-+-+    #                 encoder_hidden_states=encoder_hidden_states,
+-+-+    #                 decoder_attentions=decoder_attentions,
+-+-+    #                 cross_attentions=cross_attentions,
+-+-+    #                 decoder_hidden_states=decoder_hidden_states,
+-+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+-+    #                 average_infer_time=average_infer_time
+-+-+    #             )
+-+-+    #         else:
+-+-+    #             return GenerateDecoderOnlyOutput(
+-+-+    #                 sequences=input_ids,
+-+-+    #                 scores=scores,
+-+-+    #                 logits=raw_logits,
+-+-+    #                 attentions=decoder_attentions,
+-+-+    #                 hidden_states=decoder_hidden_states,
+-+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+-+    #                 average_infer_time=average_infer_time
+-+-+    #             )
+-+-+    #     else:
+-+-+    #         return input_ids
+-+-+            
+-+-+    # def _prepare_cache_for_generation(
+-+-+    #     self,
+-+-+    #     generation_config,
+-+-+    #     model_kwargs,
+-+-+    #     assistant_model,
+-+-+    #     batch_size,
+-+-+    #     max_cache_length,
+-+-+    # ):
+-+-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+-+    #         generation_config.cache_implementation = "static"
+-+-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+-+        
+-+-+    #     if generation_config.cache_implementation == "static":
+-+-+    #         base_required_from_max_length = generation_config.max_length + 1
+-+-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+-+    #         min_cache_size = 50
+-+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+-+    #         else:
+-+-+    #             max_cache_length = max(base_required, min_cache_size)
+-+-+            
+-+-+    #         original_max_cache_length = max_cache_length
+-+-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+-+            
+-+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+-+    #             if max_cache_length > self.config.max_position_embeddings:
+-+-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+-+        
+-+-+    #     result = super()._prepare_cache_for_generation(
+-+-+    #         generation_config=generation_config,
+-+-+    #         model_kwargs=model_kwargs,
+-+-+    #         assistant_model=assistant_model,
+-+-+    #         batch_size=batch_size,
+-+-+    #         max_cache_length=max_cache_length,
+-+-+    #     )
+-+-+        
+-+-+    #     if generation_config.cache_implementation == "static":
+-+-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+-+    #         created_cache = model_kwargs.get(cache_name)
+-+-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-+-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+-+        
+-+-+    #     return result
+-+-+
+-+-+
+-+-+
+-+- 
+-+- 
+-+- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+--- 
+-+-2.27.0
+-+-
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+deleted file mode 100644
+index 7217a46b..00000000
+--- a/patches/0005-20251107001commit.patch
++++ /dev/null
+@@ -1,7707 +0,0 @@
+-From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Fri, 7 Nov 2025 11:48:18 +0800
+-Subject: [PATCH 5/8] 20251107001commit
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |   91 +-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+- .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+- patches/0001-20251104commit.patch             |    2 +-
+- patches/0002-20251106commit.patch             |    2 +-
+- patches/0003-20261106secondcommit.patch       |    2 +-
+- patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+- 7 files changed, 7577 insertions(+), 30 deletions(-)
+- create mode 100644 patches/0004-20251106change.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index 0546f318..8831e4b7 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+-     #         expert_cache += expert_out * weight
+-     #     return expert_cache
+- 
+--    @no_grad()
+--    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+--        # x 的 shape: (1, hidden_size)
+--        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+--        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+--
+--        # 1. 收集所有需要的专家层
+--        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+--        selected_experts = [self.experts[i] for i in flat_expert_indices]
+--
+--        # 2. 并行计算所有专家的输出
+--        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+--        # ops.cat 会将它们堆叠成一个新的 Tensor
+--        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+--        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+--
+--        # 3. 使用矩阵乘法进行加权求和
+--        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+--        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+--        # 最终结果 final_output 的 shape: (1, hidden_size)
+--        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+    # @no_grad()
+-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     # x 的 shape: (1, hidden_size)
+-+    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+
+-+    #     # 1. 收集所有需要的专家层
+-+    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+
+-+    #     # 2. 并行计算所有专家的输出
+-+    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-+    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+
+-+    #     # 3. 使用矩阵乘法进行加权求和
+-+    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-+    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-         
+--        return final_output
+-+    #     return final_output
+- 
+- 
+-     # @no_grad()
+-@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+-             )
+- 
+-         return expert_cache
+-+# 放置在 DeepseekMoE 类中
+-+    @no_grad()
+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+        """
+-+        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-+
+-+        Args:
+-+            x (Tensor): 输入张量, shape: (1, hidden_size)
+-+            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-+            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-+        """
+-+        top_k, _ = flat_expert_weights.shape
+-+        hidden_size = x.shape[-1]
+-+
+-+        # 1. 将所有专家的权重堆叠起来
+-+        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-+        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-+        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-+        
+-+        # 2. "收集" 所需的专家权重
+-+        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-+        selected_up_w = stacked_up_w[flat_expert_indices]
+-+        selected_down_w = stacked_down_w[flat_expert_indices]
+-+        
+-+        # 3. 准备输入
+-+        x_expanded = x.expand((top_k, 1, hidden_size))
+-+        
+-+        # 4. 并行计算 gate_proj 和 up_proj
+-+        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-+        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-+
+-+        # 5. 计算中间状态
+-+        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-+        
+-+        # 6. 并行计算 down_proj
+-+        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-+        #    --- [FIX] ---
+-+        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-+        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-+        #    --- [FIX END] ---
+-+        
+-+        # 7. 根据路由权重进行加权求和
+-+        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-+
+-+        return weighted_sum
+-+
+-+
+- 
+-         # @no_grad()
+-         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index ebd7782e..913a7609 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+- # Copied from transformers.models.llama.modeling_llama.rotate_half
+- def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+--    x1 = x[..., : x.shape[-1] // 2]
+--    x2 = x[..., x.shape[-1] // 2 :]
+-+    # x1 = x[..., : x.shape[-1] // 2]
+-+    # x2 = x[..., x.shape[-1] // 2 :]
+-     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+--    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-index d059dcbe..2b217b64 100644
+---- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+- # Copied from transformers.models.llama.modeling_llama.rotate_half
+- def rotate_half(x):
+-     """Rotates half the hidden dims of the input."""
+--    x1 = x[..., : x.shape[-1] // 2]
+--    x2 = x[..., x.shape[-1] // 2 :]
+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+    # x1 = x[..., : x.shape[-1] // 2]
+-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-     return ops.cat((-x2, x1), dim=-1)
+- 
+- 
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-index 78f22642..0a0ef2d7 100644
+---- a/patches/0001-20251104commit.patch
+-+++ b/patches/0001-20251104commit.patch
+-@@ -1,7 +1,7 @@
+- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Tue, 4 Nov 2025 09:11:51 +0800
+--Subject: [PATCH 1/3] 20251104commit
+-+Subject: [PATCH 1/4] 20251104commit
+- 
+- ---
+-  mindnlp/transformers/cache_utils.py           |  28 +-
+-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-index 22b65dd5..5185270c 100644
+---- a/patches/0002-20251106commit.patch
+-+++ b/patches/0002-20251106commit.patch
+-@@ -1,7 +1,7 @@
+- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 09:20:38 +0800
+--Subject: [PATCH 2/3] 20251106commit
+-+Subject: [PATCH 2/4] 20251106commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-index 966529e4..3e05f821 100644
+---- a/patches/0003-20261106secondcommit.patch
+-+++ b/patches/0003-20261106secondcommit.patch
+-@@ -1,7 +1,7 @@
+- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 14:54:37 +0800
+--Subject: [PATCH 3/3] 20261106secondcommit
+-+Subject: [PATCH 3/4] 20261106secondcommit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-new file mode 100644
+-index 00000000..88a1aef4
+---- /dev/null
+-+++ b/patches/0004-20251106change.patch
+-@@ -0,0 +1,7498 @@
+-+From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Thu, 6 Nov 2025 15:48:09 +0800
+-+Subject: [PATCH 4/4] 20251106change
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |  189 +-
+-+ patches/0001-20251104commit.patch             | 1272 +++++++
+-+ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+-+ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+-+ 4 files changed, 7244 insertions(+), 186 deletions(-)
+-+ create mode 100644 patches/0001-20251104commit.patch
+-+ create mode 100644 patches/0002-20251106commit.patch
+-+ create mode 100644 patches/0003-20261106secondcommit.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index 2f9192bf..0546f318 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+-+ 
+-+         return attn_output, attn_weights, past_key_value
+-+ 
+-+-# class DeepseekFlashAttention(nn.Module):
+-+-#     """
+-+-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-+-
+-+-#     This class is designed as a drop-in replacement for DeepseekAttention.
+-+-#     """
+-+-
+-+-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+-#         super().__init__()
+-+-#         self.config = config
+-+-#         self.layer_idx = layer_idx
+-+-#         if layer_idx is None:
+-+-#             logger.warning(
+-+-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+-#                 "when creating this class."
+-+-#             )
+-+-
+-+-#         self.attention_dropout = config.attention_dropout
+-+-#         self.hidden_size = config.hidden_size
+-+-#         self.num_heads = config.num_attention_heads
+-+-#         self.head_dim = self.hidden_size // self.num_heads
+-+-#         self.num_key_value_heads = config.num_key_value_heads
+-+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+-#         self.max_position_embeddings = config.max_position_embeddings
+-+-#         self.rope_theta = config.rope_theta
+-+-#         self.is_causal = True
+-+-
+-+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+-#             raise ValueError(
+-+-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+-#                 f" and `num_heads`: {self.num_heads})."
+-+-#             )
+-+-
+-+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+-#         self._init_rope()
+-+-
+-+-#     def _init_rope(self):
+-+-#         if self.config.rope_scaling is None:
+-+-#             self.rotary_emb = DeepseekRotaryEmbedding(
+-+-#                 self.head_dim,
+-+-#                 max_position_embeddings=self.max_position_embeddings,
+-+-#                 base=self.rope_theta,
+-+-#             )
+-+-#         else:
+-+-#             scaling_type = self.config.rope_scaling["type"]
+-+-#             scaling_factor = self.config.rope_scaling["factor"]
+-+-#             if scaling_type == "linear":
+-+-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+-#                     self.head_dim,
+-+-#                     max_position_embeddings=self.max_position_embeddings,
+-+-#                     scaling_factor=scaling_factor,
+-+-#                     base=self.rope_theta,
+-+-#                 )
+-+-#             elif scaling_type == "dynamic":
+-+-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+-#                     self.head_dim,
+-+-#                     max_position_embeddings=self.max_position_embeddings,
+-+-#                     scaling_factor=scaling_factor,
+-+-#                     base=self.rope_theta,
+-+-#                 )
+-+-#             else:
+-+-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+-
+-+-#     def forward(
+-+-#         self,
+-+-#         hidden_states: mindspore.Tensor,
+-+-#         attention_mask: Optional[mindspore.Tensor] = None,
+-+-#         position_ids: Optional[mindspore.Tensor] = None,
+-+-#         past_key_value: Optional[Cache] = None,
+-+-#         output_attentions: bool = False,
+-+-#         use_cache: bool = False,
+-+-#         **kwargs,
+-+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+-#         if "padding_mask" in kwargs:
+-+-#             warnings.warn(
+-+-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+-#             )
+-+-        
+-+-#         if output_attentions:
+-+-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-+-
+-+-#         bsz, q_len, _ = hidden_states.shape
+-+-
+-+-#         if self.config.pretraining_tp > 1:
+-+-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+-
+-+-#         query_states = self.q_proj(hidden_states)
+-+-#         key_states = self.k_proj(hidden_states)
+-+-#         value_states = self.v_proj(hidden_states)
+-+-
+-+-#         # Reshape for multi-head attention
+-+-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+-
+-+-#         kv_seq_len = key_states.shape[-2]
+-+-#         if past_key_value is not None:
+-+-#             if self.layer_idx is None:
+-+-#                 raise ValueError(
+-+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+-#                     "with a layer index."
+-+-#                 )
+-+-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+-        
+-+-#         # Apply Rotary Positional Embedding
+-+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+-
+-+-#         if past_key_value is not None:
+-+-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-+-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+-
+-+-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-+-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-+-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+-        
+-+-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+-        
+-+-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+-
+-+-#         # Convert attention_mask for flash_attention_score
+-+-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-+-#         if attention_mask is not None:
+-+-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-+-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+-#                 raise ValueError(
+-+-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+-#                 )
+-+-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-+-#         else:
+-+-#             attn_mask_for_fa = None
+-+-        
+-+-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+-
+-+-#         # Call the fused flash_attention_score operator
+-+-#         attn_output = mindspore.ops.flash_attention_score(
+-+-#             query=query_states_for_fa,
+-+-#             key=key_states_for_fa,
+-+-#             value=value_states_for_fa,
+-+-#             head_num=self.num_heads, # This is N1, the number of query heads
+-+-#             input_layout='BSH',
+-+-#             attn_mask=attn_mask_for_fa,
+-+-#             keep_prob=keep_prob,
+-+-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+-#             sparse_mode=0 # Default mask mode
+-+-#         )
+-+-        
+-+-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-+-#         attn_output = self.o_proj(attn_output)
+-+-        
+-+-#         # Flash Attention does not return attention weights
+-+-#         attn_weights = None
+-+-
+-+-#         return attn_output, attn_weights, past_key_value
+-+ 
+-+ class DeepseekFlashAttention(nn.Module):
+-+     """
+-+@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+-+         super().__init__()
+-+         self.hidden_size = config.hidden_size
+-+ 
+-+-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-+-            config=config, layer_idx=layer_idx
+-+-        )
+-++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-++            # config=config, layer_idx=layer_idx
+-++        # )
+-+ 
+-+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-+             config=config, layer_idx=layer_idx
+-+@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+-+         return outputs
+-+ 
+-+ 
+-+-
+-+ class DeepseekPreTrainedModel(PreTrainedModel):
+-+     config_class = DeepseekConfig
+-+     base_model_prefix = "model"
+-+@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+         # Initialize weights and apply final processing
+-+         self.post_init()
+-+         self.warm_up = False
+-+-        #@dwj
+-+-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-+-            self.num_layers,
+-+-            self.num_attention_heads,
+-+-            self.head_dim,
+-+-            batch_size=1,
+-+-            max_length=self.max_length,
+-+-            dtype=mindspore.float16
+-+-        )
+-+-
+-+-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-+-        key_cache = []
+-+-        value_cache = []
+-+-        for _ in range(num_layers):
+-+-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+-            key_cache.append(k)
+-+-            value_cache.append(v)
+-+-        return key_cache, value_cache
+-+-
+-+ 
+-+     def warmup_moe_model_deep(self):
+-+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+new file mode 100644
+-+index 00000000..78f22642
+-+--- /dev/null
+-++++ b/patches/0001-20251104commit.patch
+-+@@ -0,0 +1,1272 @@
+-++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++Subject: [PATCH 1/3] 20251104commit
+-++
+-++---
+-++ mindnlp/transformers/cache_utils.py           |  28 +-
+-++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-++
+-++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++index cadd2e04..02f8d4be 100644
+-++--- a/mindnlp/transformers/cache_utils.py
+-+++++ b/mindnlp/transformers/cache_utils.py
+-++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++             # k_out[:, :, cache_position] = key_states
+-++             # v_out[:, :, cache_position] = value_states
+-++-            if ON_ORANGE_PI:
+-++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++-            else:
+-++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++-
+-+++            # if ON_ORANGE_PI:
+-+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++            # else:
+-+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++            if cache_position.ndim > 1:
+-+++                cache_position = cache_position.flatten()
+-+++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++                cache_position = cache_position.int()
+-+++            
+-+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++            k_out[:, :, cache_position] = key_states
+-+++            v_out[:, :, cache_position] = value_states
+-+++            
+-++         return k_out, v_out
+-++ 
+-++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index c695b944..d8303e45 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++-    x1 = x[..., : x.shape[-1] // 2]
+-++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++    # x1 = x[..., : x.shape[-1] // 2]
+-+++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++         if self.training:
+-++             raise NotImplementedError("Training is not supported yet.")
+-++         else:
+-++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++-        if self.config.n_shared_experts is not None:
+-++-            y = y + self.shared_experts(identity)
+-++-        return y
+-+++            # @lwx
+-+++            if orig_shape[1] == 1:
+-+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++                y=y.view(*orig_shape)
+-+++                if self.config.n_shared_experts is not None:
+-+++                    y = y + self.shared_experts(identity)
+-+++                return y
+-+++            else:
+-+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++                if self.config.n_shared_experts is not None:
+-+++                    y = y + self.shared_experts(identity)
+-+++                return y
+-+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++        # if self.config.n_shared_experts is not None:
+-+++        #     y = y + self.shared_experts(identity)
+-+++        # return y
+-+++        
+-+++    @no_grad()
+-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++
+-+++        expert_cache = ops.zeros_like(x)
+-+++        for i in range(self.num_experts_per_tok):
+-+++            expert_id = flat_expert_indices[i].item()
+-+++            weight = flat_expert_weights[i].item()
+-+++            expert = self.experts[expert_id]
+-+++            expert_out = expert(x)
+-+++            expert_cache += expert_out * weight
+-+++        return expert_cache
+-++ 
+-++     @no_grad()
+-++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-        # expert_cache = torch.zeros_like(x)
+-++-        # idxs = flat_expert_indices.argsort()
+-++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++-        # token_idxs = idxs // self.num_experts_per_tok
+-++-        # for i, end_idx in enumerate(tokens_per_expert):
+-++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++-        #     if start_idx == end_idx:
+-++-        #         continue
+-++-        #     expert = self.experts[i]
+-++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++-        #     expert_tokens = x[exp_token_idx]
+-++-        #     expert_out = expert(expert_tokens)
+-++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++-        # return expert_cache
+-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++         expert_cache = ops.zeros_like(x)
+-++         idxs = flat_expert_indices.argsort()
+-++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++         token_idxs = idxs // self.num_experts_per_tok
+-+++
+-++         for i, end_idx in enumerate(tokens_per_expert):
+-++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++             if start_idx == end_idx:
+-++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++             expert_out = expert(expert_tokens)
+-++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++
+-++         return expert_cache
+-+++        
+-+++    # @no_grad()
+-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     # expert_cache = torch.zeros_like(x)
+-+++    #     # idxs = flat_expert_indices.argsort()
+-+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++    #     #     if start_idx == end_idx:
+-+++    #     #         continue
+-+++    #     #     expert = self.experts[i]
+-+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #     #     expert_tokens = x[exp_token_idx]
+-+++    #     #     expert_out = expert(expert_tokens)
+-+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++    #     # return expert_cache
+-+++    #     expert_cache = ops.zeros_like(x)
+-+++    #     idxs = flat_expert_indices.argsort()
+-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++    #         if start_idx == end_idx:
+-+++    #             continue
+-+++    #         expert = self.experts[i]
+-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #         expert_tokens = x[exp_token_idx]
+-+++    #         expert_out = expert(expert_tokens)
+-+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++
+-+++    #     return expert_cache
+-+++    # @no_grad()
+-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     expert_cache = ops.zeros_like(x)
+-+++
+-+++    #     # 排序保证顺序一致
+-+++    #     idxs = flat_expert_indices.argsort()
+-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++    #     # 找出有 token 的专家
+-+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++
+-+++    #     for i in active_experts.tolist():
+-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++    #         end_idx = tokens_per_expert[i]
+-+++    #         if start_idx == end_idx:  # 没有 token
+-+++    #             continue
+-+++
+-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++    #         expert_tokens = x[exp_token_idx]
+-+++    #         expert_out = self.experts[i](expert_tokens)
+-+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++
+-+++    #         expert_cache = mindspore.mint.scatter_add(
+-+++    #             expert_cache,
+-+++    #             0,
+-+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++    #             expert_out
+-+++    #         )
+-+++
+-+++    #     return expert_cache
+-+++
+-+++
+-++ 
+-++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++ #     """
+-++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++ 
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-+++        self.warm_up = False
+-+++
+-+++    def warmup_moe_model_deep(self):
+-+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++        test_texts = [
+-+++            "warmup short",
+-+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++        ]
+-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++        if tokenizer is None:
+-+++            from mindnlp.transformers import AutoTokenizer
+-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++            self._warmup_tokenizer = tokenizer
+-+++
+-+++        for text in test_texts:
+-+++            inputs = tokenizer(text, return_tensors="ms")
+-+++            with mindspore._no_grad():
+-+++                _ = self(**inputs, use_cache=False)
+-+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++ 
+-++     def get_input_embeddings(self):
+-++         return self.model.embed_tokens
+-++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++         ```"""
+-+++        if not self.warm_up:
+-+++            self.warm_up = True
+-+++            self.warmup_moe_model_deep()
+-+++
+-++         output_attentions = (
+-++             output_attentions
+-++             if output_attentions is not None
+-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++index 3cbf820e..d4c6b651 100644
+-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++@@ -18,7 +18,6 @@
+-++ # See the License for the specific language governing permissions and
+-++ # limitations under the License.
+-++ """MindSpore Qwen2MoE model."""
+-++-
+-++ import math
+-++ from typing import List, Optional, Tuple, Union
+-++ 
+-++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++     TokenClassifierOutput,
+-++ )
+-++ from ...modeling_utils import PreTrainedModel
+-+++from ...generation import GenerationMixin
+-++ from ....utils import logging
+-++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-++ 
+-++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++         self.variance_epsilon = eps
+-++ 
+-++     def forward(self, hidden_states):
+-+++        # @dwj
+-+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++        # @lwx
+-+++        # if not self.training :
+-+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++         input_dtype = hidden_states.dtype
+-++         hidden_states = hidden_states.to(mindspore.float32)
+-++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++@@ -234,6 +239,8 @@ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++     x1 = x[..., : x.shape[-1] // 2]
+-++     x2 = x[..., x.shape[-1] // 2 :]
+-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++         self.config = config
+-++         self.hidden_size = config.hidden_size
+-++         self.intermediate_size = intermediate_size
+-+++        
+-++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++         self.act_fn = ACT2FN[config.hidden_act]
+-++ 
+-++     def forward(self, x):
+-++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++-
+-++ 
+-+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++        # @lwx 
+-+++        # gate_up_output = self.gate_up_proj(x)
+-+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++        # return self.down_proj(swiglu_output)
+-+++
+-+++    # def forward(self, x):
+-+++    #     gate_proj_out = self.gate_proj(x)
+-+++    #     up_proj_out = self.up_proj(x)
+-+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++    #     return self.down_proj(swiglu_out)
+-+++        
+-++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++     """
+-++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++         use_cache: bool = False,
+-++         cache_position: Optional[mindspore.Tensor] = None,
+-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++        
+-+++
+-++         bsz, q_len, _ = hidden_states.shape
+-++ 
+-++         query_states = self.q_proj(hidden_states)
+-++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++                     "with a layer index."
+-++                 )
+-++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++            if isinstance(past_key_value, StaticCache):
+-+++                kv_seq_len = key_states.shape[-2]
+-+++            else:
+-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++ 
+-++         if past_key_value is not None:
+-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++            
+-+++            if isinstance(past_key_value, StaticCache):
+-+++                kv_seq_len = key_states.shape[-2]
+-++ 
+-++         # repeat k/v heads if n_kv_heads < n_heads
+-++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++-
+-+++        
+-++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++ 
+-++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++-            raise ValueError(
+-++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++-                f" {attn_weights.shape}"
+-++-            )
+-++-
+-++-        if attention_mask is not None:  # no matter the length, we just slice it
+-++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++        if attention_mask is not None:
+-+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++             attn_weights = attn_weights + causal_mask
+-++ 
+-++         # upcast attention to fp32
+-++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++ 
+-++         attn_output = self.o_proj(attn_output)
+-++-
+-+++        # @lwx
+-+++        
+-+++        # max_seq_len = self.max_position_embeddings  # 2048
+-+++
+-+++        # if attention_mask is not None:
+-+++        #     # attention_mask: [B, 1, Sq, Sk]
+-+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++
+-+++        #     # pad 到 [max_seq_len, max_seq_len]
+-+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++        #     global_attention_mask = padded_mask
+-+++        # else:
+-+++        #     global_attention_mask = None
+-+++
+-+++
+-+++        # sparse_mode=3
+-+++        # attn_output = mindspore.ops.flash_attention_score(
+-+++        #     query=query_states,
+-+++        #     key=key_states,
+-+++        #     value=value_states,
+-+++        #     real_shift=None,
+-+++        #     padding_mask=None,
+-+++
+-+++        #     head_num=self.num_heads,
+-+++        #     attn_mask=global_attention_mask,
+-+++        #     keep_prob=1.0 - self.attention_dropout,
+-+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++        #     input_layout="BNSD",
+-+++        #     pre_tokens=2147483647,
+-+++        #     next_tokens=2147483647,
+-+++        #     inner_precise=0,
+-+++        #     drop_mask=None,
+-+++        #     prefix=None,
+-+++        #     actual_seq_qlen=None,
+-+++        #     actual_seq_kvlen=None,
+-+++        #     sparse_mode=sparse_mode,
+-+++        # )
+-++         if not output_attentions:
+-++             attn_weights = None
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++ 
+-+++class Qwen2MoeFlashAttention(nn.Module):
+-+++    """
+-+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++
+-+++    关键改动:
+-+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++       直接传入原始的 key 和 value 张量效率更高。
+-+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++    """
+-+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++        super().__init__()
+-+++        self.config = config
+-+++        self.layer_idx = layer_idx
+-+++        self.hidden_size = config.hidden_size
+-+++        self.num_heads = config.num_attention_heads
+-+++        self.head_dim = self.hidden_size // self.num_heads
+-+++        self.num_key_value_heads = config.num_key_value_heads
+-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++        self.max_position_embeddings = config.max_position_embeddings
+-+++        self.rope_theta = config.rope_theta
+-+++        self.attention_dropout = config.attention_dropout
+-+++
+-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++            raise ValueError(
+-+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++            )
+-+++
+-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++
+-+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++            self.head_dim,
+-+++            max_position_embeddings=self.max_position_embeddings,
+-+++            base=self.rope_theta,
+-+++        )
+-+++
+-+++    def forward(
+-+++        self,
+-+++        hidden_states: mindspore.Tensor,
+-+++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++        position_ids: Optional[mindspore.Tensor] = None,
+-+++        past_key_value: Optional[Cache] = None,
+-+++        output_attentions: bool = False,
+-+++        use_cache: bool = False,
+-+++        cache_position: Optional[mindspore.Tensor] = None,
+-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++        bsz, q_len, _ = hidden_states.shape
+-+++
+-+++        # 1. 线性投射 Q, K, V
+-+++        query_states = self.q_proj(hidden_states)
+-+++        key_states = self.k_proj(hidden_states)
+-+++        value_states = self.v_proj(hidden_states)
+-+++
+-+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++        # 3. RoPE 旋转位置编码
+-+++        kv_seq_len = key_states.shape[-2]
+-+++        if past_key_value is not None:
+-+++            if self.layer_idx is None:
+-+++                raise ValueError(
+-+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++                    "with a layer index."
+-+++                )
+-+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++                if cache_position.shape[0] == 1:
+-+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++                    kv_seq_len = past_seen_tokens + 1
+-+++                else:
+-+++                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++            else:
+-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++        # 4. KV 缓存更新
+-+++        if past_key_value is not None:
+-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++            key_states, value_states = past_key_value.update(
+-+++                key_states, value_states, self.layer_idx, cache_kwargs
+-+++            )
+-+++            
+-+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++                if cache_position.shape[0] == 1:
+-+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++                    kv_seq_len = key_states.shape[-2]
+-+++
+-+++        # 5. [重要] 准备 Attention Mask
+-+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++        fa_attention_mask = None
+-+++        if attention_mask is not None:
+-+++            # 截取与当前key长度匹配的部分
+-+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++            fa_attention_mask = (mask_slice != 0)
+-+++
+-+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++        input_dtype = query_states.dtype
+-+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++            query_states = query_states.to(mindspore.float16)
+-+++            key_states = key_states.to(mindspore.float16)
+-+++            value_states = value_states.to(mindspore.float16)
+-+++
+-+++        # 6. [核心] 调用 flash_attention_score 算子
+-+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++        attn_output = mindspore.ops.flash_attention_score(
+-+++            query=query_states,
+-+++            key=key_states,
+-+++            value=value_states,
+-+++            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++            attn_mask=fa_attention_mask,
+-+++            keep_prob=1.0 - self.attention_dropout,
+-+++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++            input_layout="BNSD",
+-+++            sparse_mode=0 # 使用 defaultMask 模式
+-+++        )
+-+++
+-+++        # 恢复原始数据类型
+-+++        attn_output = attn_output.to(input_dtype)
+-+++
+-+++        # 7. 调整输出形状
+-+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++        attn_output = self.o_proj(attn_output)
+-+++
+-+++        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++        attn_weights = None
+-+++        if output_attentions:
+-+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++        return attn_output, attn_weights, past_key_value
+-+++
+-+++    # def forward(
+-+++    #     self,
+-+++    #     hidden_states: mindspore.Tensor,
+-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++    #     past_key_value: Optional[Cache] = None,
+-+++    #     output_attentions: bool = False,
+-+++    #     use_cache: bool = False,
+-+++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++    #     bsz, q_len, _ = hidden_states.shape
+-+++
+-+++    #     # 1. 线性投射 Q, K, V
+-+++    #     query_states = self.q_proj(hidden_states)
+-+++    #     key_states = self.k_proj(hidden_states)
+-+++    #     value_states = self.v_proj(hidden_states)
+-+++
+-+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++    #     # 3. RoPE 旋转位置编码
+-+++    #     kv_seq_len = key_states.shape[-2]
+-+++    #     if past_key_value is not None:
+-+++    #         if self.layer_idx is None:
+-+++    #             raise ValueError(
+-+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++    #                 "with a layer index."
+-+++    #             )
+-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++    #     # 4. KV 缓存更新
+-+++    #     if past_key_value is not None:
+-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++    #         key_states, value_states = past_key_value.update(
+-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++    #         )
+-+++
+-+++    #     # 5. 准备 Attention Mask
+-+++    #     fa_attention_mask = None
+-+++    #     if attention_mask is not None:
+-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++    #         fa_attention_mask = (mask_slice != 0)
+-+++
+-+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++    #     input_dtype = query_states.dtype
+-+++
+-+++    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++    #         query=query_states,
+-+++    #         key=key_states,
+-+++    #         value=value_states,
+-+++    #         head_num=self.num_heads,
+-+++    #         attn_mask=fa_attention_mask,
+-+++    #         keep_prob=1.0 - self.attention_dropout,
+-+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++    #         input_layout="BNSD",
+-+++    #         sparse_mode=0,
+-+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++    #         inner_precise=1
+-+++    #     )
+-+++
+-+++    #     # 恢复原始数据类型
+-+++    #     attn_output = attn_output.to(input_dtype)
+-+++
+-+++    #     # 7. 调整输出形状
+-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++    #     attn_output = self.o_proj(attn_output)
+-+++
+-+++    #     attn_weights = None
+-+++    #     if output_attentions:
+-+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++    #     return attn_output, attn_weights, past_key_value
+-+++
+-+++    # def forward(
+-+++    #     self,
+-+++    #     hidden_states: mindspore.Tensor,
+-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++    #     past_key_value: Optional[Cache] = None,
+-+++    #     output_attentions: bool = False,
+-+++    #     use_cache: bool = False,
+-+++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++    #     bsz, q_len, _ = hidden_states.shape
+-+++
+-+++    #     query_states = self.q_proj(hidden_states)
+-+++    #     key_states = self.k_proj(hidden_states)
+-+++    #     value_states = self.v_proj(hidden_states)
+-+++
+-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++    #     kv_seq_len = key_states.shape[-2]
+-+++    #     if past_key_value is not None:
+-+++    #         if self.layer_idx is None:
+-+++    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++    #     if past_key_value is not None:
+-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++    #         key_states, value_states = past_key_value.update(
+-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++    #         )
+-+++
+-+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++
+-+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++    #     # <--- 修改结束 ---
+-+++
+-+++    #     fa_attention_mask = None
+-+++    #     if attention_mask is not None:
+-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++    #         fa_attention_mask = (mask_slice != 0)
+-+++
+-+++    #     input_dtype = query_states.dtype
+-+++
+-+++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++    #         key=key_states,
+-+++    #         value=value_states,
+-+++    #         head_num=self.num_heads,
+-+++    #         attn_mask=fa_attention_mask,
+-+++    #         keep_prob=1.0 - self.attention_dropout,
+-+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++    #         input_layout="BNSD",
+-+++    #         sparse_mode=0,
+-+++    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++    #     )
+-+++
+-+++    #     attn_output = attn_output.to(input_dtype)
+-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++    #     attn_output = self.o_proj(attn_output)
+-+++
+-+++    #     attn_weights = None
+-+++    #     if output_attentions:
+-+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++
+-+++    #     return attn_output, attn_weights, past_key_value
+-+++
+-++ QWEN2MOE_ATTENTION_CLASSES = {
+-++     "eager": Qwen2MoeAttention,
+-+++    "flash-attention": Qwen2MoeFlashAttention,
+-++ }
+-++ 
+-++ 
+-++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++ 
+-+++    #@dwj
+-+++    # 只遍历激活的专家，而非全部专家
+-++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-++-        # router_logits: (batch * sequence_length, n_experts)
+-++-        router_logits = self.gate(hidden_states)
+-++-
+-++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-        if self.norm_topk_prob:
+-++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        # we cast back to the input dtype
+-++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-++-
+-++-        final_hidden_states = ops.zeros(
+-++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++-        )
+-++-
+-++-        # One hot encode the selected experts to create an expert mask
+-++-        # this will be used to easily index which expert is going to be sollicitated
+-++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++-
+-++-        # Loop over all available experts in the model and perform the computation on each expert
+-++-        for expert_idx in range(self.num_experts):
+-++-            expert_layer = self.experts[expert_idx]
+-++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++-
+-++-            # Index the correct hidden states and compute the expert hidden state for
+-++-            # the current expert. We need to make sure to multiply the output hidden
+-++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++-            if 0 not in idx.shape:
+-++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++-
+-++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-++-                # the `top_x` tensor here.
+-++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++-
+-++-        shared_expert_output = self.shared_expert(hidden_states)
+-++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++-
+-++-        final_hidden_states = final_hidden_states + shared_expert_output
+-+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++            num_tokens = hidden_states_reshaped.shape[0]
+-+++
+-+++            router_logits = self.gate(hidden_states_reshaped)
+-+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++
+-+++            if self.norm_topk_prob:
+-+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++            
+-+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++            flat_selected_experts = selected_experts.flatten()
+-+++            
+-+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++            token_indices = broadcasted_token_indices.flatten()
+-+++            
+-+++            active_experts = ops.unique(flat_selected_experts)
+-+++            
+-+++            for expert_idx_tensor in active_experts:
+-+++                expert_idx = expert_idx_tensor.item()
+-+++                expert_layer = self.experts[expert_idx]
+-+++                
+-+++                mask = (flat_selected_experts == expert_idx_tensor)
+-+++                selected_token_indices = token_indices[mask]
+-+++                selected_routing_weights = routing_weights.flatten()[mask]
+-+++                
+-+++                current_states = hidden_states_reshaped[selected_token_indices]
+-+++                
+-+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++                
+-+++                final_hidden_states = final_hidden_states.index_add(
+-+++                    dim=0,
+-+++                    index=selected_token_indices,
+-+++                    source=expert_output.to(hidden_states.dtype)
+-+++                )
+-+++            
+-+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++ 
+-++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++-        return final_hidden_states, router_logits
+-+++            final_hidden_states = final_hidden_states + shared_expert_output
+-+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++            
+-+++            return final_hidden_states, router_logits
+-++ 
+-++ 
+-++ class Qwen2MoeDecoderLayer(nn.Module):
+-++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++ 
+-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++ 
+-+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++
+-++         if (layer_idx not in config.mlp_only_layers) and (
+-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++         ):
+-++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++     _skip_keys_device_placement = "past_key_values"
+-++     _supports_cache_class = True
+-+++#lwx
+-+++    # _supports_static_cache = True
+-++ 
+-++     def _init_weights(self, module):
+-++         std = self.config.initializer_range
+-++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++         return causal_mask
+-++ 
+-++ 
+-++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++     _tied_weights_keys = ["lm_head.weight"]
+-++ 
+-++     def __init__(self, config):
+-++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++         self.num_experts_per_tok = config.num_experts_per_tok
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-+++        # @lwx
+-+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++        #     self.generation_config.cache_implementation = "static"
+-+++        self._warmed_up = False
+-+++
+-+++    def warmup_moe_model(self):
+-+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++        test_texts = [
+-+++            "warmup short",
+-+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++        ]
+-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++        if tokenizer is None:
+-+++            from mindnlp.transformers import AutoTokenizer
+-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++            self._warmup_tokenizer = tokenizer
+-+++
+-+++        for text in test_texts:
+-+++            inputs = tokenizer(text, return_tensors="ms")
+-+++            with mindspore._no_grad():
+-+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++ 
+-++     def get_input_embeddings(self):
+-++         return self.model.embed_tokens
+-++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++         ```"""
+-+++        if not self._warmed_up:
+-+++            self._warmed_up = True
+-+++            self.warmup_moe_model()
+-++ 
+-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++         output_router_logits = (
+-++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++             }
+-++         )
+-++         return model_inputs
+-+++# @lwx
+-+++    # def _decode_one_tokens_logits(
+-+++    #     self,
+-+++    #     cur_token: mindspore.Tensor,
+-+++    #     input_pos: Optional[mindspore.Tensor],
+-+++    #     cache_position: mindspore.Tensor,
+-+++    #     past_key_values: StaticCache,
+-+++    # ) -> mindspore.Tensor:
+-+++    #     """
+-+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++        
+-+++    #     Args:
+-+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++    #         input_pos: 输入位置信息，可选
+-+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++            
+-+++    #     Returns:
+-+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++    #     """
+-+++    #     # 调用JIT编译的版本
+-+++    #     return self.get_decode_one_tokens_logits(
+-+++    #         cur_token=cur_token,
+-+++    #         input_pos=input_pos,
+-+++    #         cache_position=cache_position,
+-+++    #         past_key_values=past_key_values,
+-+++    #     )
+-+++    
+-+++    # @mindspore.jit(jit_level='O1')
+-+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++    #     """
+-+++    #     JIT编译的函数，用于高效的单token解码
+-+++    #     使用JIT编译优化以支持静态shape和高效执行
+-+++        
+-+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++    #     """
+-+++    #     outputs = self.model.forward(
+-+++    #         input_ids=cur_token,
+-+++    #         position_ids=input_pos,
+-+++    #         cache_position=cache_position,
+-+++    #         past_key_values=past_key_values,
+-+++    #         use_cache=True,
+-+++    #         return_dict=False,
+-+++    #     )
+-+++        
+-+++    #     hidden_states = outputs[0]
+-+++    #     logits = self.lm_head.forward(hidden_states)
+-+++    #     logits = logits.float()
+-+++        
+-+++    #     return logits[:, -1, :]
+-+++
+-+++    # def _sample(
+-+++    #     self,
+-+++    #     input_ids: mindspore.Tensor,
+-+++    #     logits_processor,
+-+++    #     stopping_criteria,
+-+++    #     generation_config,
+-+++    #     synced_devices: bool,
+-+++    #     streamer=None,
+-+++    #     logits_warper=None,
+-+++    #     **model_kwargs,
+-+++    # ):
+-+++    #     """
+-+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++    #     """
+-+++    #     from ...generation.logits_process import LogitsProcessorList
+-+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++    #     from mindnlp.core import nn, ops, no_grad
+-+++    #     import numpy as np
+-+++        
+-+++    #     # 检查是否使用 StaticCache
+-+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++    #     # 否则，直接调用父类方法
+-+++    #     past_key_values = model_kwargs.get("past_key_values")
+-+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++
+-+++    #     if not isinstance(past_key_values, StaticCache):
+-+++    #         # 不使用 StaticCache，直接调用父类方法
+-+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++    #         return super()._sample(
+-+++    #             input_ids=input_ids,
+-+++    #             logits_processor=logits_processor,
+-+++    #             stopping_criteria=stopping_criteria,
+-+++    #             generation_config=generation_config,
+-+++    #             synced_devices=synced_devices,
+-+++    #             streamer=streamer,
+-+++    #             logits_warper=logits_warper,
+-+++    #             **model_kwargs,
+-+++    #         )
+-+++        
+-+++    #     # 使用 StaticCache，进入自定义循环
+-+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++    #     pad_token_id = generation_config._pad_token_tensor
+-+++    #     output_attentions = generation_config.output_attentions
+-+++    #     output_hidden_states = generation_config.output_hidden_states
+-+++    #     output_scores = generation_config.output_scores
+-+++    #     output_logits = generation_config.output_logits
+-+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++    #     max_length = generation_config.max_length
+-+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++    #     do_sample = generation_config.do_sample
+-+++        
+-+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++    #         raise ValueError(
+-+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++    #             f"{logits_warper})."
+-+++    #         )
+-+++        
+-+++    #     # init attention / hidden states / scores tuples
+-+++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++        
+-+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++    #         encoder_hidden_states = (
+-+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++    #         )
+-+++        
+-+++    #     # keep track of which sequences are already finished
+-+++    #     batch_size, cur_len = input_ids.shape
+-+++    #     this_peer_finished = False
+-+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++        
+-+++    #     time_record = []
+-+++    #     from ....utils.testing_utils import parse_flag_from_env
+-+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++        
+-+++    #     while self._has_unfinished_sequences(
+-+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++    #     ):
+-+++    #         if _record_time:
+-+++    #             import time as time_module
+-+++    #             infer_start = time_module.time()
+-+++            
+-+++    #         # prepare model inputs
+-+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++            
+-+++    #         # prepare variable output controls
+-+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++            
+-+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++    #         cur_cache_position = model_inputs.get("cache_position")
+-+++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++    #         cur_input_ids = model_inputs.get("input_ids")
+-+++            
+-+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++    #             cur_cache_position is not None and 
+-+++    #             len(cur_cache_position.shape) > 0 and
+-+++    #             cur_cache_position.shape[0] == 1 and
+-+++    #             cur_input_ids is not None and
+-+++    #             cur_input_ids.shape[1] == 1):
+-+++    #             # 使用 JIT 优化的单 token 解码
+-+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++    #             if not hasattr(self, '_jit_used'):
+-+++    #                 self._jit_used = False
+-+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++                
+-+++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++    #                 cur_token=cur_input_ids,
+-+++    #                 input_pos=model_inputs.get("position_ids"),
+-+++    #                 cache_position=cur_cache_position,
+-+++    #                 past_key_values=cur_past_key_values,
+-+++    #             )
+-+++                
+-+++    #             # 标记已使用JIT（用于后续判断）
+-+++    #             if not self._jit_used:
+-+++    #                 self._jit_used = True
+-+++                
+-+++    #             # 构造兼容的输出对象
+-+++    #             class JitOptimizedOutput:
+-+++    #                 def __init__(self, logits, config):
+-+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++    #                     self.config = config
+-+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++    #                     self.cross_attentions = None
+-+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++                
+-+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++    #         else:
+-+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++    #             outputs = self(**model_inputs, return_dict=True)
+-+++            
+-+++    #         if synced_devices and this_peer_finished:
+-+++    #             continue
+-+++            
+-+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++    #         next_token_logits = outputs.logits[:, -1, :]
+-+++            
+-+++    #         # pre-process distribution
+-+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++    #         if do_sample:
+-+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++            
+-+++    #         # Store scores, attentions and hidden_states when required
+-+++    #         if return_dict_in_generate:
+-+++    #             if output_scores:
+-+++    #                 scores += (next_token_scores,)
+-+++    #             if output_logits:
+-+++    #                 raw_logits += (next_token_logits,)
+-+++    #             if output_attentions:
+-+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++    #                 if self.config.is_encoder_decoder:
+-+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++                
+-+++    #             if output_hidden_states:
+-+++    #                 hidden = (
+-+++    #                     outputs.decoder_hidden_states
+-+++    #                     if self.config.is_encoder_decoder
+-+++    #                     else outputs.hidden_states
+-+++    #                 )
+-+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++            
+-+++    #         # token selection
+-+++    #         if do_sample:
+-+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++    #         else:
+-+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++            
+-+++    #         # finished sentences should have their next token be a padding token
+-+++    #         if has_eos_stopping_criteria:
+-+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++            
+-+++    #         # update generated ids, model inputs, and length for next step
+-+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++    #         if streamer is not None:
+-+++    #             streamer.put(next_tokens)
+-+++            
+-+++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++    #             outputs,
+-+++    #             model_kwargs,
+-+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++    #         )
+-+++            
+-+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++    #         cur_len += 1
+-+++            
+-+++    #         if _record_time:
+-+++    #             import time as time_module
+-+++    #             infer_stop = time_module.time()
+-+++    #             time_record.append(infer_stop - infer_start)
+-+++            
+-+++    #         del outputs
+-+++        
+-+++    #     average_infer_time = None
+-+++    #     if time_record:
+-+++    #         if len(time_record) > 1:
+-+++    #             time_record.pop(0)
+-+++    #         average_infer_time = sum(time_record) / len(time_record)
+-+++    #         print(f'average inference time is: {average_infer_time}')
+-+++    #         print(f'inference time record: {time_record}')
+-+++        
+-+++    #     if streamer is not None:
+-+++    #         streamer.end()
+-+++        
+-+++    #     # 简单判断：打印是否使用了JIT路径
+-+++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++    #     else:
+-+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++        
+-+++    #     if return_dict_in_generate:
+-+++    #         if self.config.is_encoder_decoder:
+-+++    #             return GenerateEncoderDecoderOutput(
+-+++    #                 sequences=input_ids,
+-+++    #                 scores=scores,
+-+++    #                 logits=raw_logits,
+-+++    #                 encoder_attentions=encoder_attentions,
+-+++    #                 encoder_hidden_states=encoder_hidden_states,
+-+++    #                 decoder_attentions=decoder_attentions,
+-+++    #                 cross_attentions=cross_attentions,
+-+++    #                 decoder_hidden_states=decoder_hidden_states,
+-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++    #                 average_infer_time=average_infer_time
+-+++    #             )
+-+++    #         else:
+-+++    #             return GenerateDecoderOnlyOutput(
+-+++    #                 sequences=input_ids,
+-+++    #                 scores=scores,
+-+++    #                 logits=raw_logits,
+-+++    #                 attentions=decoder_attentions,
+-+++    #                 hidden_states=decoder_hidden_states,
+-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++    #                 average_infer_time=average_infer_time
+-+++    #             )
+-+++    #     else:
+-+++    #         return input_ids
+-+++            
+-+++    # def _prepare_cache_for_generation(
+-+++    #     self,
+-+++    #     generation_config,
+-+++    #     model_kwargs,
+-+++    #     assistant_model,
+-+++    #     batch_size,
+-+++    #     max_cache_length,
+-+++    # ):
+-+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++    #         generation_config.cache_implementation = "static"
+-+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++        
+-+++    #     if generation_config.cache_implementation == "static":
+-+++    #         base_required_from_max_length = generation_config.max_length + 1
+-+++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++    #         min_cache_size = 50
+-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++    #         else:
+-+++    #             max_cache_length = max(base_required, min_cache_size)
+-+++            
+-+++    #         original_max_cache_length = max_cache_length
+-+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++            
+-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++    #             if max_cache_length > self.config.max_position_embeddings:
+-+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++        
+-+++    #     result = super()._prepare_cache_for_generation(
+-+++    #         generation_config=generation_config,
+-+++    #         model_kwargs=model_kwargs,
+-+++    #         assistant_model=assistant_model,
+-+++    #         batch_size=batch_size,
+-+++    #         max_cache_length=max_cache_length,
+-+++    #     )
+-+++        
+-+++    #     if generation_config.cache_implementation == "static":
+-+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++    #         created_cache = model_kwargs.get(cache_name)
+-+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++        
+-+++    #     return result
+-+++
+-+++
+-+++
+-++ 
+-++ 
+-++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++-- 
+-++2.27.0
+-++
+-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+new file mode 100644
+-+index 00000000..22b65dd5
+-+--- /dev/null
+-++++ b/patches/0002-20251106commit.patch
+-+@@ -0,0 +1,3200 @@
+-++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Thu, 6 Nov 2025 09:20:38 +0800
+-++Subject: [PATCH 2/3] 20251106commit
+-++
+-++---
+-++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+-++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+-++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+-++ create mode 100644 patches/0001-20251104commit.patch
+-++
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index d8303e45..73773c22 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-++         #     y = y + self.shared_experts(identity)
+-++         # return y
+-++         
+-+++    # @no_grad()
+-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++
+-+++    #     expert_cache = ops.zeros_like(x)
+-+++    #     for i in range(self.num_experts_per_tok):
+-+++    #         expert_id = flat_expert_indices[i].item()
+-+++    #         weight = flat_expert_weights[i].item()
+-+++    #         expert = self.experts[expert_id]
+-+++    #         expert_out = expert(x)
+-+++    #         expert_cache += expert_out * weight
+-+++    #     return expert_cache
+-+++
+-++     @no_grad()
+-++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++        # x 的 shape: (1, hidden_size)
+-+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+++
+-+++        # 1. 收集所有需要的专家层
+-+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+++
+-+++        # 2. 并行计算所有专家的输出
+-+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+++
+-+++        # 3. 使用矩阵乘法进行加权求和
+-+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+++        
+-+++        return final_output
+-++ 
+-++-        expert_cache = ops.zeros_like(x)
+-++-        for i in range(self.num_experts_per_tok):
+-++-            expert_id = flat_expert_indices[i].item()
+-++-            weight = flat_expert_weights[i].item()
+-++-            expert = self.experts[expert_id]
+-++-            expert_out = expert(x)
+-++-            expert_cache += expert_out * weight
+-++-        return expert_cache
+-++ 
+-++     @no_grad()
+-++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-++             key_states = self.k_proj(hidden_states)
+-++             value_states = self.v_proj(hidden_states)
+-++ 
+-++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++        # @lwx
+-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-+++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-++ 
+-++         kv_seq_len = key_states.shape[-2]
+-++         if past_key_value is not None:
+-++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++ 
+-+++# class DeepseekFlashAttention(nn.Module):
+-+++#     """
+-+++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-+++
+-+++#     This class is designed as a drop-in replacement for DeepseekAttention.
+-+++#     """
+-+++
+-+++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+++#         super().__init__()
+-+++#         self.config = config
+-+++#         self.layer_idx = layer_idx
+-+++#         if layer_idx is None:
+-+++#             logger.warning(
+-+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++#                 "when creating this class."
+-+++#             )
+-+++
+-+++#         self.attention_dropout = config.attention_dropout
+-+++#         self.hidden_size = config.hidden_size
+-+++#         self.num_heads = config.num_attention_heads
+-+++#         self.head_dim = self.hidden_size // self.num_heads
+-+++#         self.num_key_value_heads = config.num_key_value_heads
+-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++#         self.max_position_embeddings = config.max_position_embeddings
+-+++#         self.rope_theta = config.rope_theta
+-+++#         self.is_causal = True
+-+++
+-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++#             raise ValueError(
+-+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++#                 f" and `num_heads`: {self.num_heads})."
+-+++#             )
+-+++
+-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+++#         self._init_rope()
+-+++
+-+++#     def _init_rope(self):
+-+++#         if self.config.rope_scaling is None:
+-+++#             self.rotary_emb = DeepseekRotaryEmbedding(
+-+++#                 self.head_dim,
+-+++#                 max_position_embeddings=self.max_position_embeddings,
+-+++#                 base=self.rope_theta,
+-+++#             )
+-+++#         else:
+-+++#             scaling_type = self.config.rope_scaling["type"]
+-+++#             scaling_factor = self.config.rope_scaling["factor"]
+-+++#             if scaling_type == "linear":
+-+++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+++#                     self.head_dim,
+-+++#                     max_position_embeddings=self.max_position_embeddings,
+-+++#                     scaling_factor=scaling_factor,
+-+++#                     base=self.rope_theta,
+-+++#                 )
+-+++#             elif scaling_type == "dynamic":
+-+++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+++#                     self.head_dim,
+-+++#                     max_position_embeddings=self.max_position_embeddings,
+-+++#                     scaling_factor=scaling_factor,
+-+++#                     base=self.rope_theta,
+-+++#                 )
+-+++#             else:
+-+++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+++
+-+++#     def forward(
+-+++#         self,
+-+++#         hidden_states: mindspore.Tensor,
+-+++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++#         past_key_value: Optional[Cache] = None,
+-+++#         output_attentions: bool = False,
+-+++#         use_cache: bool = False,
+-+++#         **kwargs,
+-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++#         if "padding_mask" in kwargs:
+-+++#             warnings.warn(
+-+++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+++#             )
+-+++        
+-+++#         if output_attentions:
+-+++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-+++
+-+++#         bsz, q_len, _ = hidden_states.shape
+-+++
+-+++#         if self.config.pretraining_tp > 1:
+-+++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+++
+-+++#         query_states = self.q_proj(hidden_states)
+-+++#         key_states = self.k_proj(hidden_states)
+-+++#         value_states = self.v_proj(hidden_states)
+-+++
+-+++#         # Reshape for multi-head attention
+-+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++#         kv_seq_len = key_states.shape[-2]
+-+++#         if past_key_value is not None:
+-+++#             if self.layer_idx is None:
+-+++#                 raise ValueError(
+-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++#                     "with a layer index."
+-+++#                 )
+-+++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++        
+-+++#         # Apply Rotary Positional Embedding
+-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++#         if past_key_value is not None:
+-+++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++
+-+++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-+++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-+++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++        
+-+++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++        
+-+++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++
+-+++#         # Convert attention_mask for flash_attention_score
+-+++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-+++#         if attention_mask is not None:
+-+++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-+++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+++#                 raise ValueError(
+-+++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+++#                 )
+-+++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-+++#         else:
+-+++#             attn_mask_for_fa = None
+-+++        
+-+++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+++
+-+++#         # Call the fused flash_attention_score operator
+-+++#         attn_output = mindspore.ops.flash_attention_score(
+-+++#             query=query_states_for_fa,
+-+++#             key=key_states_for_fa,
+-+++#             value=value_states_for_fa,
+-+++#             head_num=self.num_heads, # This is N1, the number of query heads
+-+++#             input_layout='BSH',
+-+++#             attn_mask=attn_mask_for_fa,
+-+++#             keep_prob=keep_prob,
+-+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++#             sparse_mode=0 # Default mask mode
+-+++#         )
+-+++        
+-+++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-+++#         attn_output = self.o_proj(attn_output)
+-+++        
+-+++#         # Flash Attention does not return attention weights
+-+++#         attn_weights = None
+-+++
+-+++#         return attn_output, attn_weights, past_key_value
+-+++
+-+++class DeepseekFlashAttention(nn.Module):
+-+++    """
+-+++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-+++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-+++    designed for high performance on supported hardware (Ascend).
+-+++
+-+++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-+++    """
+-+++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+++        super().__init__()
+-+++        self.config = config
+-+++        self.layer_idx = layer_idx
+-+++        if layer_idx is None:
+-+++            logger.warning(
+-+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++                "when creating this class."
+-+++            )
+-+++
+-+++        # --- [FIX] Correctly initialize all required attributes ---
+-+++        self.attention_dropout = config.attention_dropout
+-+++        self.hidden_size = config.hidden_size
+-+++        self.num_heads = config.num_attention_heads
+-+++        self.head_dim = self.hidden_size // self.num_heads
+-+++        self.num_key_value_heads = config.num_key_value_heads
+-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++        self.max_position_embeddings = config.max_position_embeddings
+-+++        self.rope_theta = config.rope_theta
+-+++        self.is_causal = True
+-+++
+-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++            raise ValueError(
+-+++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++                f" and `num_heads`: {self.num_heads})."
+-+++            )
+-+++
+-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+++        
+-+++        # This call will now succeed as all attributes are initialized.
+-+++        self._init_rope()
+-+++
+-+++    def _init_rope(self):
+-+++        if self.config.rope_scaling is None:
+-+++            self.rotary_emb = DeepseekRotaryEmbedding(
+-+++                self.head_dim,
+-+++                max_position_embeddings=self.max_position_embeddings,
+-+++                base=self.rope_theta,
+-+++            )
+-+++        else:
+-+++            scaling_type = self.config.rope_scaling["type"]
+-+++            scaling_factor = self.config.rope_scaling["factor"]
+-+++            if scaling_type == "linear":
+-+++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+++                    self.head_dim,
+-+++                    max_position_embeddings=self.max_position_embeddings,
+-+++                    scaling_factor=scaling_factor,
+-+++                    base=self.rope_theta,
+-+++                )
+-+++            elif scaling_type == "dynamic":
+-+++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+++                    self.head_dim,
+-+++                    max_position_embeddings=self.max_position_embeddings,
+-+++                    scaling_factor=scaling_factor,
+-+++                    base=self.rope_theta,
+-+++                )
+-+++            else:
+-+++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+++
+-+++    def forward(
+-+++        self,
+-+++        hidden_states: mindspore.Tensor,
+-+++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++        position_ids: Optional[mindspore.Tensor] = None,
+-+++        past_key_value: Optional[Cache] = None,
+-+++        output_attentions: bool = False,
+-+++        use_cache: bool = False,
+-+++        **kwargs,
+-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++        if "padding_mask" in kwargs:
+-+++            warnings.warn(
+-+++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+++            )
+-+++        if output_attentions:
+-+++            warnings.warn(
+-+++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-+++            )
+-+++
+-+++        bsz, q_len, _ = hidden_states.shape
+-+++
+-+++        if self.config.pretraining_tp > 1:
+-+++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+++
+-+++        query_states = self.q_proj(hidden_states)
+-+++        key_states = self.k_proj(hidden_states)
+-+++        value_states = self.v_proj(hidden_states)
+-+++
+-+++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++        kv_seq_len = key_states.shape[-2]
+-+++        if past_key_value is not None:
+-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++        
+-+++        # Apply Rotary Position Embedding
+-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++        if past_key_value is not None:
+-+++            cache_kwargs = {"sin": sin, "cos": cos}
+-+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++
+-+++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-+++        # So we must explicitly repeat the KV heads.
+-+++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++
+-+++        # Convert attention mask for flash_attention_score
+-+++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-+++        if attention_mask is not None:
+-+++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+++                 raise ValueError(
+-+++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+++                )
+-+++            attn_mask_for_fa = attention_mask < 0
+-+++        else:
+-+++            attn_mask_for_fa = None
+-+++
+-+++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+++
+-+++        # Call the fused operator using the efficient BNSD layout
+-+++        attn_output = mindspore.ops.flash_attention_score(
+-+++            query=query_states,
+-+++            key=key_states,
+-+++            value=value_states,
+-+++            head_num=self.num_heads,
+-+++            input_layout='BNSD', # Specify the correct layout
+-+++            attn_mask=attn_mask_for_fa,
+-+++            keep_prob=keep_prob,
+-+++            scalar_value=1.0 / math.sqrt(self.head_dim)
+-+++        )
+-+++        
+-+++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++        
+-+++        # Apply output projection
+-+++        attn_output = self.o_proj(attn_output)
+-+++
+-+++        # Flash attention does not return attention weights, so we return None.
+-+++        attn_weights = None
+-+++
+-+++        return attn_output, attn_weights, past_key_value
+-+++
+-++ Deepseek_ATTENTION_CLASSES = {
+-++     "eager": DeepseekAttention,
+-+++    "flash-attention": DeepseekFlashAttention,
+-++ }
+-++ 
+-++ 
+-++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-++             config=config, layer_idx=layer_idx
+-++         )
+-++ 
+-+++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-+++            config=config, layer_idx=layer_idx
+-+++        )
+-+++
+-++         self.mlp = (
+-++             DeepseekMoE(config)
+-++             if (
+-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++index d4c6b651..bced285c 100644
+-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+-++ 
+-++ import mindspore
+-++ import mindnlp.core.nn.functional as F
+-++-from mindnlp.core import nn, ops
+-+++from mindnlp.core import nn, ops, no_grad
+-++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+-++ 
+-++ from ....common.activations import ACT2FN
+-++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+-++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-++ 
+-+++Long_Prompt = False
+-+++PROMPT_LENGTH_THRESHOLD = 128
+-++ 
+-++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++ 
+-+++# class Qwen2MoeFlashAttention(nn.Module):
+-+++#     """
+-+++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++
+-+++#     关键改动:
+-+++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++#        直接传入原始的 key 和 value 张量效率更高。
+-+++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++#         super().__init__()
+-+++#         self.config = config
+-+++#         self.layer_idx = layer_idx
+-+++#         self.hidden_size = config.hidden_size
+-+++#         self.num_heads = config.num_attention_heads
+-+++#         self.head_dim = self.hidden_size // self.num_heads
+-+++#         self.num_key_value_heads = config.num_key_value_heads
+-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++#         self.max_position_embeddings = config.max_position_embeddings
+-+++#         self.rope_theta = config.rope_theta
+-+++#         self.attention_dropout = config.attention_dropout
+-+++
+-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++#             raise ValueError(
+-+++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++#             )
+-+++
+-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++
+-+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++#             self.head_dim,
+-+++#             max_position_embeddings=self.max_position_embeddings,
+-+++#             base=self.rope_theta,
+-+++#         )
+-+++
+-+++#     def forward(
+-+++#         self,
+-+++#         hidden_states: mindspore.Tensor,
+-+++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++#         past_key_value: Optional[Cache] = None,
+-+++#         output_attentions: bool = False,
+-+++#         use_cache: bool = False,
+-+++#         cache_position: Optional[mindspore.Tensor] = None,
+-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++#         bsz, q_len, _ = hidden_states.shape
+-+++
+-+++#         # 1. 线性投射 Q, K, V
+-+++#         query_states = self.q_proj(hidden_states)
+-+++#         key_states = self.k_proj(hidden_states)
+-+++#         value_states = self.v_proj(hidden_states)
+-+++
+-+++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++#         # 3. RoPE 旋转位置编码
+-+++#         kv_seq_len = key_states.shape[-2]
+-+++#         if past_key_value is not None:
+-+++#             if self.layer_idx is None:
+-+++#                 raise ValueError(
+-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++#                     "with a layer index."
+-+++#                 )
+-+++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++#                 if cache_position.shape[0] == 1:
+-+++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++#                     kv_seq_len = past_seen_tokens + 1
+-+++#                 else:
+-+++#                     # prefill 阶段：cache_position 是范围，使用其长度
+-+++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++#             else:
+-+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++#         # 4. KV 缓存更新
+-+++#         if past_key_value is not None:
+-+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++#             key_states, value_states = past_key_value.update(
+-+++#                 key_states, value_states, self.layer_idx, cache_kwargs
+-+++#             )
+-+++            
+-+++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++#                 if cache_position.shape[0] == 1:
+-+++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++#                     kv_seq_len = key_states.shape[-2]
+-+++
+-+++#         # 5. [重要] 准备 Attention Mask
+-+++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++#         fa_attention_mask = None
+-+++#         if attention_mask is not None:
+-+++#             # 截取与当前key长度匹配的部分
+-+++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++#             fa_attention_mask = (mask_slice != 0)
+-+++
+-+++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++#         input_dtype = query_states.dtype
+-+++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++#             query_states = query_states.to(mindspore.float16)
+-+++#             key_states = key_states.to(mindspore.float16)
+-+++#             value_states = value_states.to(mindspore.float16)
+-+++
+-+++#         # 6. [核心] 调用 flash_attention_score 算子
+-+++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++#         attn_output = mindspore.ops.flash_attention_score(
+-+++#             query=query_states,
+-+++#             key=key_states,
+-+++#             value=value_states,
+-+++#             head_num=self.num_heads, # 传入Q的头数(N1)
+-+++#             attn_mask=fa_attention_mask,
+-+++#             keep_prob=1.0 - self.attention_dropout,
+-+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++#             input_layout="BNSD",
+-+++#             sparse_mode=0 # 使用 defaultMask 模式
+-+++#         )
+-+++
+-+++#         # 恢复原始数据类型
+-+++#         attn_output = attn_output.to(input_dtype)
+-+++
+-+++#         # 7. 调整输出形状
+-+++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++#         attn_output = self.o_proj(attn_output)
+-+++
+-+++#         # FlashAttention 算子不直接返回注意力权重矩阵
+-+++#         attn_weights = None
+-+++#         if output_attentions:
+-+++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++#         return attn_output, attn_weights, past_key_value
+-+++
+-+++#     # def forward(
+-+++#     #     self,
+-+++#     #     hidden_states: mindspore.Tensor,
+-+++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++#     #     position_ids: Optional[mindspore.Tensor] = None,
+-+++#     #     past_key_value: Optional[Cache] = None,
+-+++#     #     output_attentions: bool = False,
+-+++#     #     use_cache: bool = False,
+-+++#     #     cache_position: Optional[mindspore.Tensor] = None,
+-+++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++#     #     bsz, q_len, _ = hidden_states.shape
+-+++
+-+++#     #     # 1. 线性投射 Q, K, V
+-+++#     #     query_states = self.q_proj(hidden_states)
+-+++#     #     key_states = self.k_proj(hidden_states)
+-+++#     #     value_states = self.v_proj(hidden_states)
+-+++
+-+++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++
+-+++#     #     # 3. RoPE 旋转位置编码
+-+++#     #     kv_seq_len = key_states.shape[-2]
+-+++#     #     if past_key_value is not None:
+-+++#     #         if self.layer_idx is None:
+-+++#     #             raise ValueError(
+-+++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++#     #                 "with a layer index."
+-+++#     #             )
+-+++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++
+-+++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++#     #     # 4. KV 缓存更新
+-+++#     #     if past_key_value is not None:
+-+++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++#     #         key_states, value_states = past_key_value.update(
+-+++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++#     #         )
+-+++
+-+++#     #     # 5. 准备 Attention Mask
+-+++#     #     fa_attention_mask = None
+-+++#     #     if attention_mask is not None:
+-+++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++#     #         fa_attention_mask = (mask_slice != 0)
+-+++
+-+++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++#     #     input_dtype = query_states.dtype
+-+++
+-+++#     #     # 6. [核心] 调用 flash_attention_score 算子
+-+++#     #     attn_output = mindspore.ops.flash_attention_score(
+-+++#     #         query=query_states,
+-+++#     #         key=key_states,
+-+++#     #         value=value_states,
+-+++#     #         head_num=self.num_heads,
+-+++#     #         attn_mask=fa_attention_mask,
+-+++#     #         keep_prob=1.0 - self.attention_dropout,
+-+++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++#     #         input_layout="BNSD",
+-+++#     #         sparse_mode=0,
+-+++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++#     #         inner_precise=1
+-+++#     #     )
+-+++
+-+++#     #     # 恢复原始数据类型
+-+++#     #     attn_output = attn_output.to(input_dtype)
+-+++
+-+++#     #     # 7. 调整输出形状
+-+++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++#     #     attn_output = self.o_proj(attn_output)
+-+++
+-+++#     #     attn_weights = None
+-+++#     #     if output_attentions:
+-+++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++
+-+++#     #     return attn_output, attn_weights, past_key_value
+-+++
+-+++
+-++ class Qwen2MoeFlashAttention(nn.Module):
+-++     """
+-++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++-
+-++-    关键改动:
+-++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++-       直接传入原始的 key 和 value 张量效率更高。
+-++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-+++
+-+++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-+++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-+++    完全使用模型的低精度数据类型（如 float16）进行计算，
+-+++    以达到理论上的最高执行速度。
+-++     """
+-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++         super().__init__()
+-++         self.config = config
+-++         self.layer_idx = layer_idx
+-+++        if layer_idx is None:
+-+++            logger.warning_once(
+-+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-+++            )
+-+++
+-++         self.hidden_size = config.hidden_size
+-++         self.num_heads = config.num_attention_heads
+-++         self.head_dim = self.hidden_size // self.num_heads
+-++         self.num_key_value_heads = config.num_key_value_heads
+-++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++         self.max_position_embeddings = config.max_position_embeddings
+-++         self.rope_theta = config.rope_theta
+-++         self.attention_dropout = config.attention_dropout
+-++ 
+-++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++-            raise ValueError(
+-++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++-            )
+-++-
+-++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-++         key_states = self.k_proj(hidden_states)
+-++         value_states = self.v_proj(hidden_states)
+-++ 
+-++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++        # 2. 调整形状以匹配 BNSD 布局
+-++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-
+-++-        # 3. RoPE 旋转位置编码
+-+++        
+-+++        # 3. RoPE 和 KV 缓存
+-++         kv_seq_len = key_states.shape[-2]
+-++         if past_key_value is not None:
+-++-            if self.layer_idx is None:
+-++-                raise ValueError(
+-++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-                    "with a layer index."
+-++-                )
+-++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++-                if cache_position.shape[0] == 1:
+-++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++-                    kv_seq_len = past_seen_tokens + 1
+-++-                else:
+-++-                    # prefill 阶段：cache_position 是范围，使用其长度
+-++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++-            else:
+-++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-
+-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++        
+-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++ 
+-++-        # 4. KV 缓存更新
+-++         if past_key_value is not None:
+-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-            key_states, value_states = past_key_value.update(
+-++-                key_states, value_states, self.layer_idx, cache_kwargs
+-++-            )
+-++-            
+-++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++-                if cache_position.shape[0] == 1:
+-++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++-                    kv_seq_len = key_states.shape[-2]
+-++-
+-++-        # 5. [重要] 准备 Attention Mask
+-++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++
+-+++        # 4. 准备 Attention Mask
+-++         fa_attention_mask = None
+-++         if attention_mask is not None:
+-++-            # 截取与当前key长度匹配的部分
+-++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++             fa_attention_mask = (mask_slice != 0)
+-++ 
+-++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++-        input_dtype = query_states.dtype
+-++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++-            query_states = query_states.to(mindspore.float16)
+-++-            key_states = key_states.to(mindspore.float16)
+-++-            value_states = value_states.to(mindspore.float16)
+-++-
+-++-        # 6. [核心] 调用 flash_attention_score 算子
+-++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-++         attn_output = mindspore.ops.flash_attention_score(
+-++             query=query_states,
+-++             key=key_states,
+-++             value=value_states,
+-++-            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++            head_num=self.num_heads,
+-++             attn_mask=fa_attention_mask,
+-++-            keep_prob=1.0 - self.attention_dropout,
+-+++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-++             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++             input_layout="BNSD",
+-++-            sparse_mode=0 # 使用 defaultMask 模式
+-+++            sparse_mode=0,
+-+++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-++         )
+-++ 
+-++-        # 恢复原始数据类型
+-++-        attn_output = attn_output.to(input_dtype)
+-++-
+-++-        # 7. 调整输出形状
+-++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++        # 6. 调整输出形状
+-++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++         attn_output = self.o_proj(attn_output)
+-++ 
+-++-        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++        # 7. 返回结果
+-++         attn_weights = None
+-++         if output_attentions:
+-++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++-    # def forward(
+-++-    #     self,
+-++-    #     hidden_states: mindspore.Tensor,
+-++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-++-    #     past_key_value: Optional[Cache] = None,
+-++-    #     output_attentions: bool = False,
+-++-    #     use_cache: bool = False,
+-++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-
+-++-    #     bsz, q_len, _ = hidden_states.shape
+-++-
+-++-    #     # 1. 线性投射 Q, K, V
+-++-    #     query_states = self.q_proj(hidden_states)
+-++-    #     key_states = self.k_proj(hidden_states)
+-++-    #     value_states = self.v_proj(hidden_states)
+-++-
+-++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-
+-++-    #     # 3. RoPE 旋转位置编码
+-++-    #     kv_seq_len = key_states.shape[-2]
+-++-    #     if past_key_value is not None:
+-++-    #         if self.layer_idx is None:
+-++-    #             raise ValueError(
+-++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-    #                 "with a layer index."
+-++-    #             )
+-++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++ 
+-++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-
+-++-    #     # 4. KV 缓存更新
+-++-    #     if past_key_value is not None:
+-++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-    #         key_states, value_states = past_key_value.update(
+-++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++-    #         )
+-++-
+-++-    #     # 5. 准备 Attention Mask
+-++-    #     fa_attention_mask = None
+-++-    #     if attention_mask is not None:
+-++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-    #         fa_attention_mask = (mask_slice != 0)
+-++-
+-++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++-    #     input_dtype = query_states.dtype
+-++-
+-++-    #     # 6. [核心] 调用 flash_attention_score 算子
+-++-    #     attn_output = mindspore.ops.flash_attention_score(
+-++-    #         query=query_states,
+-++-    #         key=key_states,
+-++-    #         value=value_states,
+-++-    #         head_num=self.num_heads,
+-++-    #         attn_mask=fa_attention_mask,
+-++-    #         keep_prob=1.0 - self.attention_dropout,
+-++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-    #         input_layout="BNSD",
+-++-    #         sparse_mode=0,
+-++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++-    #         inner_precise=1
+-++-    #     )
+-++-
+-++-    #     # 恢复原始数据类型
+-++-    #     attn_output = attn_output.to(input_dtype)
+-+++QWEN2MOE_ATTENTION_CLASSES = {
+-+++    "eager": Qwen2MoeAttention,
+-+++    "flash-attention": Qwen2MoeFlashAttention,
+-+++}
+-++ 
+-++-    #     # 7. 调整输出形状
+-++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-    #     attn_output = self.o_proj(attn_output)
+-++ 
+-++-    #     attn_weights = None
+-++-    #     if output_attentions:
+-++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     def __init__(self, config):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         # gating
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     #@dwj
+-+++#     # 只遍历激活的专家，而非全部专家
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#             num_tokens = hidden_states_reshaped.shape[0]
+-+++
+-+++#             router_logits = self.gate(hidden_states_reshaped)
+-+++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++
+-+++#             if self.norm_topk_prob:
+-+++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++#             routing_weights = routing_weights.to(hidden_states.dtype)
+-+++            
+-+++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++#             flat_selected_experts = selected_experts.flatten()
+-+++            
+-+++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++#             token_indices = broadcasted_token_indices.flatten()
+-+++            
+-+++#             active_experts = ops.unique(flat_selected_experts)
+-+++            
+-+++#             for expert_idx_tensor in active_experts:
+-+++#                 expert_idx = expert_idx_tensor.item()
+-+++#                 expert_layer = self.experts[expert_idx]
+-+++                
+-+++#                 mask = (flat_selected_experts == expert_idx_tensor)
+-+++#                 selected_token_indices = token_indices[mask]
+-+++#                 selected_routing_weights = routing_weights.flatten()[mask]
+-+++                
+-+++#                 current_states = hidden_states_reshaped[selected_token_indices]
+-+++                
+-+++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++                
+-+++#                 final_hidden_states = final_hidden_states.index_add(
+-+++#                     dim=0,
+-+++#                     index=selected_token_indices,
+-+++#                     source=expert_output.to(hidden_states.dtype)
+-+++#                 )
+-+++            
+-+++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++ 
+-++-    #     return attn_output, attn_weights, past_key_value
+-+++#             final_hidden_states = final_hidden_states + shared_expert_output
+-+++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++            
+-+++#             return final_hidden_states, router_logits
+-+++
+-+++
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     """
+-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         # 门控网络
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         # 专家列表
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++#         # 共享专家
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_decode(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         """
+-+++#         【解码路径】针对 sequence_length=1 的极致优化。
+-+++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+++#         """
+-+++#         batch_size, hidden_dim = hidden_states.shape
+-+++        
+-+++#         expert_outputs_list = [
+-+++#             ops.cat([
+-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++#             ], dim=0) 
+-+++#             for i in range(batch_size)
+-+++#         ]
+-+++        
+-+++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+++#         # shape: (batch_size, top_k, hidden_dim)
+-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++        
+-+++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++        
+-+++#         return moe_output.squeeze(1)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_prefill(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         """
+-+++#         【预填充路径】针对 sequence_length > 1 的优化。
+-+++#         按专家对 Token 进行分组，并进行批处理。
+-+++#         """
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens = hidden_states.shape[0]
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++        
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++        
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++        
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++            
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++#             selected_token_indices = token_indices[mask]
+-+++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++            
+-+++#             current_states = hidden_states[selected_token_indices]
+-+++            
+-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++            
+-+++#             moe_output = moe_output.index_add(
+-+++#                 dim=0,
+-+++#                 index=selected_token_indices,
+-+++#                 source=expert_output.to(hidden_states.dtype)
+-+++#             )
+-+++#         return moe_output
+-+++
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#         """
+-+++#         顶层 forward 方法，作为智能分发器。
+-+++#         """
+-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++        
+-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#         router_logits = self.gate(hidden_states_reshaped)
+-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++ 
+-++-    # def forward(
+-++-    #     self,
+-++-    #     hidden_states: mindspore.Tensor,
+-++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-++-    #     past_key_value: Optional[Cache] = None,
+-++-    #     output_attentions: bool = False,
+-++-    #     use_cache: bool = False,
+-++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-
+-++-    #     bsz, q_len, _ = hidden_states.shape
+-++-
+-++-    #     query_states = self.q_proj(hidden_states)
+-++-    #     key_states = self.k_proj(hidden_states)
+-++-    #     value_states = self.v_proj(hidden_states)
+-++-
+-++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-
+-++-    #     kv_seq_len = key_states.shape[-2]
+-++-    #     if past_key_value is not None:
+-++-    #         if self.layer_idx is None:
+-++-    #             raise ValueError("`layer_idx` must be specified for caching")
+-++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-
+-++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-
+-++-    #     if past_key_value is not None:
+-++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-    #         key_states, value_states = past_key_value.update(
+-++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++-    #         )
+-+++#         if self.norm_topk_prob:
+-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++        
+-+++#         moe_output = None
+-+++#         # 在推理时，根据序列长度选择最优路径
+-+++#         if not self.training:
+-+++#             if sequence_length == 1:
+-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++#             else:
+-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++#         else:
+-+++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+++#             raise NotImplementedError("Training path is not implemented.")
+-+++
+-+++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+++        
+-+++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+++        
+-+++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++#         return final_hidden_states, router_logits
+-+++
+-+++
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     """
+-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         # 门控网络
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         # 专家列表
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++#         # 共享专家
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_decode(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         batch_size, _ = hidden_states.shape
+-+++#         expert_outputs_list = [
+-+++#             ops.cat([
+-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++#             ], dim=0) 
+-+++#             for i in range(batch_size)
+-+++#         ]
+-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++#         return moe_output.squeeze(1)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_prefill(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens = hidden_states.shape[0]
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++        
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++#             selected_token_indices = token_indices[mask]
+-+++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++#             current_states = hidden_states[selected_token_indices]
+-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++#             moe_output = moe_output.index_add(
+-+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++#             )
+-+++#         return moe_output
+-+++
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#         """
+-+++#         顶层 forward 方法，作为智能分发器。
+-+++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+++#         """
+-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++        
+-+++#         # 1. 门控计算 (通用逻辑)
+-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#         router_logits = self.gate(hidden_states_reshaped)
+-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++
+-+++#         if self.norm_topk_prob:
+-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++        
+-+++#         # 2. 智能分发到最优 MoE 路径
+-+++#         moe_output = None
+-+++#         if not self.training:
+-+++#             if sequence_length == 1:
+-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++#             else:
+-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++#         else:
+-+++#             raise NotImplementedError("Training path is not implemented.")
+-+++
+-+++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++        
+-+++#         # 4. 合并 MoE 输出和共享专家输出
+-+++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++        
+-+++#         # 5. 恢复原始形状并返回
+-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++#         return final_hidden_states, router_logits
+-+++
+-+++# prefill fastest
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     """
+-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         # 门控网络
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         # 专家列表
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++#         # 共享专家
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_dispatch(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         """
+-+++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+++#         """
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens, _ = hidden_states.shape
+-+++        
+-+++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++#         flat_routing_weights = routing_weights.flatten()
+-++ 
+-++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++-
+-++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++-    #     query_states = query_states / math.sqrt(self.head_dim)
+-++-    #     # <--- 修改结束 ---
+-++-
+-++-    #     fa_attention_mask = None
+-++-    #     if attention_mask is not None:
+-++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-    #         fa_attention_mask = (mask_slice != 0)
+-++-
+-++-    #     input_dtype = query_states.dtype
+-++-
+-++-    #     attn_output = mindspore.ops.flash_attention_score(
+-++-    #         query=query_states,    # 传入已经预先缩放过的 query
+-++-    #         key=key_states,
+-++-    #         value=value_states,
+-++-    #         head_num=self.num_heads,
+-++-    #         attn_mask=fa_attention_mask,
+-++-    #         keep_prob=1.0 - self.attention_dropout,
+-++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++-    #         input_layout="BNSD",
+-++-    #         sparse_mode=0,
+-++-    #         inner_precise=1        # 仍然保持内部高精度计算
+-++-    #     )
+-+++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++ 
+-++-    #     attn_output = attn_output.to(input_dtype)
+-++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-    #     attn_output = self.o_proj(attn_output)
+-+++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++        
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++            
+-+++#             # 找到所有分配给该专家的 token
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++            
+-+++#             # 使用 mask 选取对应的 token 和权重
+-+++#             current_token_indices = token_indices[mask]
+-+++#             current_routing_weights = flat_routing_weights[mask]
+-+++#             current_hidden_states = hidden_states[current_token_indices]
+-+++            
+-+++#             # 对这些 token 进行批处理
+-+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++            
+-+++#             # 使用 index_add 将结果精确地加回到对应位置
+-+++#             moe_output = moe_output.index_add(
+-+++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++#             )
+-+++#         return moe_output
+-+++
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#         """
+-+++#         顶层 forward 方法，作为智能分发器。
+-+++#         """
+-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++        
+-+++#         # 1. 门控计算
+-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#         router_logits = self.gate(hidden_states_reshaped)
+-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++
+-+++#         if self.norm_topk_prob:
+-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++        
+-+++#         # 2. 调用统一的 MoE 计算内核
+-+++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-++ 
+-++-    #     attn_weights = None
+-++-    #     if output_attentions:
+-++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++#         # 3. 统一处理共享专家
+-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++        
+-+++#         # 4. 合并输出
+-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++        
+-+++#         # 5. 恢复原始形状并返回
+-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++#         return final_hidden_states, router_logits
+-+++
+-+++
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     """
+-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++#     【最终高性能与高精度版】：
+-+++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+++#     3. 这样实现了速度和准确性的两全其美。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_decode(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         """
+-+++#         【解码路径】极致优化版：bmm + 高精度累加。
+-+++#         """
+-+++#         original_dtype = hidden_states.dtype
+-+++#         batch_size, _ = hidden_states.shape
+-+++        
+-+++#         expert_outputs_list = [
+-+++#             ops.cat([
+-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++#             ], dim=0) 
+-+++#             for i in range(batch_size)
+-+++#         ]
+-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++
+-+++#         # 在 float32 下执行 bmm，得到高精度结果
+-+++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++        
+-+++#         # 将高精度结果转换回原始数据类型
+-+++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+++        
+-+++#         return moe_output
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_prefill(
+-+++#         self, 
+-+++#         hidden_states: mindspore.Tensor, 
+-+++#         selected_experts: mindspore.Tensor, 
+-+++#         routing_weights: mindspore.Tensor
+-+++#     ) -> mindspore.Tensor:
+-+++#         """
+-+++#         【预填充路径】与原始实现一致，结果精确。
+-+++#         """
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens, _ = hidden_states.shape
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++        
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++#             selected_token_indices = token_indices[mask]
+-+++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++#             current_states = hidden_states[selected_token_indices]
+-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++#             moe_output = moe_output.index_add(
+-+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++#             )
+-+++#         return moe_output
+-+++
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++        
+-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#         router_logits = self.gate(hidden_states_reshaped)
+-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++ 
+-++-    #     return attn_output, attn_weights, past_key_value
+-+++#         if self.norm_topk_prob:
+-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+++#         # 如果模型主体是 float16，后续再转换
+-+++        
+-+++#         moe_output = None
+-+++#         if not self.training:
+-+++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+++#             # _moe_infer_decode 内部会处理好类型转换
+-+++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+++#             if sequence_length == 1:
+-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++#             else:
+-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++#         else:
+-+++#             raise NotImplementedError("Training path is not implemented.")
+-+++
+-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++        
+-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++#         return final_hidden_states, router_logits
+-+++    
+-++ 
+-++-QWEN2MOE_ATTENTION_CLASSES = {
+-++-    "eager": Qwen2MoeAttention,
+-++-    "flash-attention": Qwen2MoeFlashAttention,
+-++-}
+-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++#     """
+-+++#     【融合版】一个混合专家模块，内置两种推理策略，
+-+++#     由外部全局变量 `Long_Prompt` 控制：
+-+++
+-+++#     - if Long_Prompt is True:  【精度优先模式】
+-+++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+++#       适用于处理长序列，避免误差累积。
+-+++
+-+++#     - if Long_Prompt is False: 【速度优先模式】
+-+++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+++#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+++#     """
+-+++#     def __init__(self, config: Qwen2MoeConfig):
+-+++#         super().__init__()
+-+++#         self.num_experts = config.num_experts
+-+++#         self.top_k = config.num_experts_per_tok
+-+++#         self.norm_topk_prob = config.norm_topk_prob
+-+++
+-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++#         self.experts = nn.ModuleList(
+-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++#         )
+-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++
+-+++#     # --- 速度优先模式的辅助函数 ---
+-+++#     @no_grad()
+-+++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++#         original_dtype = hidden_states.dtype
+-+++#         batch_size, _ = hidden_states.shape
+-+++#         expert_outputs_list = [
+-+++#             ops.cat([
+-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++#             ], dim=0) 
+-+++#             for i in range(batch_size)
+-+++#         ]
+-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++
+-+++#     @no_grad()
+-+++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens, _ = hidden_states.shape
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++#             selected_token_indices = token_indices[mask]
+-+++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++#             current_states = hidden_states[selected_token_indices]
+-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++#         return moe_output
+-+++
+-+++#     # --- 精度优先模式的辅助函数 ---
+-+++#     @no_grad()
+-+++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++#         moe_output = ops.zeros_like(hidden_states)
+-+++#         num_tokens, _ = hidden_states.shape
+-+++#         flat_selected_experts = selected_experts.flatten()
+-+++#         flat_routing_weights = routing_weights.flatten()
+-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++#         active_experts = ops.unique(flat_selected_experts)
+-+++#         for expert_idx_tensor in active_experts:
+-+++#             expert_idx = expert_idx_tensor.item()
+-+++#             expert_layer = self.experts[expert_idx]
+-+++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++#             current_token_indices = token_indices[mask]
+-+++#             current_routing_weights = flat_routing_weights[mask]
+-+++#             current_hidden_states = hidden_states[current_token_indices]
+-+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++#         return moe_output
+-+++
+-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+++#         global Long_Prompt
+-+++
+-+++#         # 1. 门控计算 (所有模式通用)
+-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++#         router_logits = self.gate(hidden_states_reshaped)
+-+++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++#         if self.norm_topk_prob:
+-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++#         moe_output = None
+-+++#         if not self.training:
+-+++#             # 根据 Long_Prompt 标志选择模式
+-+++#             if Long_Prompt:
+-+++#                 # --- 精度优先模式 ---
+-+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++#             else:
+-+++#                 # --- 速度优先模式 ---
+-+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++#                 if sequence_length == 1:
+-+++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++#                 else:
+-+++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++#         else:
+-+++#             raise NotImplementedError("Training path is not implemented.")
+-+++
+-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++        
+-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++#         return final_hidden_states, router_logits
+-+++    
+-+++class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++    """
+-+++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+++    控制的顶级推理策略：
+-++ 
+-+++    - if Long_Prompt is True:  【精度优先模式】
+-+++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-+++      适用于需要严格可复现性的长序列任务。
+-++ 
+-++-class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-    def __init__(self, config):
+-+++    - if Long_Prompt is False: 【速度优先模式】
+-+++      采用业界最强的性能组合：
+-+++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-+++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-+++    """
+-+++    def __init__(self, config: Qwen2MoeConfig):
+-++         super().__init__()
+-++         self.num_experts = config.num_experts
+-++         self.top_k = config.num_experts_per_tok
+-++         self.norm_topk_prob = config.norm_topk_prob
+-++ 
+-++-        # gating
+-++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++         self.experts = nn.ModuleList(
+-++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++         )
+-++-
+-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++ 
+-++-    #@dwj
+-++-    # 只遍历激活的专家，而非全部专家
+-++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-            num_tokens = hidden_states_reshaped.shape[0]
+-++-
+-++-            router_logits = self.gate(hidden_states_reshaped)
+-++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-
+-++-            if self.norm_topk_prob:
+-++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-            routing_weights = routing_weights.to(hidden_states.dtype)
+-++-            
+-++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++-            flat_selected_experts = selected_experts.flatten()
+-++-            
+-++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++-            token_indices = broadcasted_token_indices.flatten()
+-++-            
+-++-            active_experts = ops.unique(flat_selected_experts)
+-++-            
+-++-            for expert_idx_tensor in active_experts:
+-++-                expert_idx = expert_idx_tensor.item()
+-++-                expert_layer = self.experts[expert_idx]
+-++-                
+-++-                mask = (flat_selected_experts == expert_idx_tensor)
+-++-                selected_token_indices = token_indices[mask]
+-++-                selected_routing_weights = routing_weights.flatten()[mask]
+-++-                
+-++-                current_states = hidden_states_reshaped[selected_token_indices]
+-++-                
+-++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-                
+-++-                final_hidden_states = final_hidden_states.index_add(
+-++-                    dim=0,
+-++-                    index=selected_token_indices,
+-++-                    source=expert_output.to(hidden_states.dtype)
+-++-                )
+-++-            
+-++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-+++    @no_grad()
+-+++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++        original_dtype = hidden_states.dtype
+-+++        batch_size, _ = hidden_states.shape
+-+++        expert_outputs_list = [
+-+++            ops.cat([
+-+++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++            ], dim=0)
+-+++            for i in range(batch_size)
+-+++        ]
+-+++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++        weights_fp32 = routing_weights.to(mindspore.float32)
+-+++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++        return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++
+-+++    @no_grad()
+-+++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++        num_tokens, _ = hidden_states.shape
+-+++        flat_selected_experts = selected_experts.flatten()
+-+++        sorted_expert_indices = flat_selected_experts.argsort()
+-+++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++        original_token_indices = sorted_expert_indices // self.top_k
+-+++        moe_output = ops.zeros_like(hidden_states)
+-+++        current_token_offset = 0
+-+++        for i in range(self.num_experts):
+-+++            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++            if expert_token_count == 0:
+-+++                continue
+-+++            end_offset = current_token_offset + expert_token_count
+-+++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++            current_token_offset += expert_token_count
+-+++        return moe_output
+-+++
+-+++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+++    @no_grad()
+-+++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++        moe_output = ops.zeros_like(hidden_states)
+-+++        num_tokens, _ = hidden_states.shape
+-+++        flat_selected_experts = selected_experts.flatten()
+-+++        flat_routing_weights = routing_weights.flatten()
+-+++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++        active_experts = ops.unique(flat_selected_experts)
+-+++        for expert_idx_tensor in active_experts:
+-+++            expert_idx = expert_idx_tensor.item()
+-+++            expert_layer = self.experts[expert_idx]
+-+++            mask = (flat_selected_experts == expert_idx_tensor)
+-+++            current_token_indices = token_indices[mask]
+-+++            current_routing_weights = flat_routing_weights[mask]
+-+++            current_hidden_states = hidden_states[current_token_indices]
+-+++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++        return moe_output
+-++ 
+-++-            final_hidden_states = final_hidden_states + shared_expert_output
+-++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++-            
+-++-            return final_hidden_states, router_logits
+-+++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++        global Long_Prompt
+-+++
+-+++        # 1. 门控计算 (所有模式通用)
+-+++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++        router_logits = self.gate(hidden_states_reshaped)
+-+++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++        if self.norm_topk_prob:
+-+++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++        
+-+++        moe_output = None
+-+++        if Long_Prompt:
+-+++            # --- 精度优先模式 (ACCURACY MODE) ---
+-+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        else:
+-+++            # --- 速度优先模式 (SPEED MODE) ---
+-+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++            if sequence_length == 1:
+-+++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++            else:
+-+++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        
+-++ 
+-+++        # 3. 共享专家计算与合并 (所有模式通用)
+-+++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++        
+-+++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++        
+-+++        return final_hidden_states, router_logits
+-++ 
+-++ class Qwen2MoeDecoderLayer(nn.Module):
+-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-++         super().__init__()
+-++         self.hidden_size = config.hidden_size
+-+++        
+-+++        # if Long_Prompt:
+-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++        # else:
+-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++ 
+-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++ 
+-++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++-
+-++         if (layer_idx not in config.mlp_only_layers) and (
+-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++         ):
+-++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++             self._warmed_up = True
+-++             self.warmup_moe_model()
+-++ 
+-+++
+-+++
+-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++         output_router_logits = (
+-++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++             router_logits=outputs.router_logits,
+-++         )
+-++ 
+-+++    def generate(self, *args, **kwargs):
+-+++        """
+-+++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+++        """
+-+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+++
+-+++        input_ids = kwargs.get("input_ids")
+-+++        if input_ids is None and args:
+-+++            input_ids = args[0]
+-+++
+-+++        if input_ids is not None:
+-+++            prompt_length = input_ids.shape[1]
+-+++            
+-+++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+++                Long_Prompt = True
+-+++            else:
+-+++                Long_Prompt = False
+-+++
+-+++        return super().generate(*args, **kwargs)
+-+++    
+-++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-++     def prepare_inputs_for_generation(
+-++         self,
+-++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-+++        
+-++         if past_key_values is not None:
+-++             if inputs_embeds is not None:  # Exception 1
+-++                 if 0 not in input_ids.shape:
+-++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++             }
+-++         )
+-++         return model_inputs
+-+++
+-++ # @lwx
+-++     # def _decode_one_tokens_logits(
+-++     #     self,
+-++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-++             attentions=outputs.attentions,
+-++         )
+-++ 
+-+++
+-++ __all__ = [
+-++     "Qwen2MoeForCausalLM",
+-++     "Qwen2MoeModel",
+-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++new file mode 100644
+-++index 00000000..6dfb5b93
+-++--- /dev/null
+-+++++ b/patches/0001-20251104commit.patch
+-++@@ -0,0 +1,1272 @@
+-+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++Subject: [PATCH] 20251104commit
+-+++
+-+++---
+-+++ mindnlp/transformers/cache_utils.py           |  28 +-
+-+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++
+-+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++index cadd2e04..02f8d4be 100644
+-+++--- a/mindnlp/transformers/cache_utils.py
+-++++++ b/mindnlp/transformers/cache_utils.py
+-+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++             # k_out[:, :, cache_position] = key_states
+-+++             # v_out[:, :, cache_position] = value_states
+-+++-            if ON_ORANGE_PI:
+-+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++-            else:
+-+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++-
+-++++            # if ON_ORANGE_PI:
+-++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++            # else:
+-++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++++            if cache_position.ndim > 1:
+-++++                cache_position = cache_position.flatten()
+-++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++++                cache_position = cache_position.int()
+-++++            
+-++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++++            k_out[:, :, cache_position] = key_states
+-++++            v_out[:, :, cache_position] = value_states
+-++++            
+-+++         return k_out, v_out
+-+++ 
+-+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index c695b944..d8303e45 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++-    x1 = x[..., : x.shape[-1] // 2]
+-+++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++    # x2 = x[..., x.shape[-1] // 2 :]
+-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++         if self.training:
+-+++             raise NotImplementedError("Training is not supported yet.")
+-+++         else:
+-+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++-        if self.config.n_shared_experts is not None:
+-+++-            y = y + self.shared_experts(identity)
+-+++-        return y
+-++++            # @lwx
+-++++            if orig_shape[1] == 1:
+-++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++++                y=y.view(*orig_shape)
+-++++                if self.config.n_shared_experts is not None:
+-++++                    y = y + self.shared_experts(identity)
+-++++                return y
+-++++            else:
+-++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++++                if self.config.n_shared_experts is not None:
+-++++                    y = y + self.shared_experts(identity)
+-++++                return y
+-++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++        # if self.config.n_shared_experts is not None:
+-++++        #     y = y + self.shared_experts(identity)
+-++++        # return y
+-++++        
+-++++    @no_grad()
+-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++
+-++++        expert_cache = ops.zeros_like(x)
+-++++        for i in range(self.num_experts_per_tok):
+-++++            expert_id = flat_expert_indices[i].item()
+-++++            weight = flat_expert_weights[i].item()
+-++++            expert = self.experts[expert_id]
+-++++            expert_out = expert(x)
+-++++            expert_cache += expert_out * weight
+-++++        return expert_cache
+-+++ 
+-+++     @no_grad()
+-+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-        # expert_cache = torch.zeros_like(x)
+-+++-        # idxs = flat_expert_indices.argsort()
+-+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++-        # token_idxs = idxs // self.num_experts_per_tok
+-+++-        # for i, end_idx in enumerate(tokens_per_expert):
+-+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++-        #     if start_idx == end_idx:
+-+++-        #         continue
+-+++-        #     expert = self.experts[i]
+-+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-        #     expert_tokens = x[exp_token_idx]
+-+++-        #     expert_out = expert(expert_tokens)
+-+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++-        # return expert_cache
+-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++         expert_cache = ops.zeros_like(x)
+-+++         idxs = flat_expert_indices.argsort()
+-+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++         token_idxs = idxs // self.num_experts_per_tok
+-++++
+-+++         for i, end_idx in enumerate(tokens_per_expert):
+-+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++             if start_idx == end_idx:
+-+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++             expert_out = expert(expert_tokens)
+-+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++
+-+++         return expert_cache
+-++++        
+-++++    # @no_grad()
+-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++    #     # expert_cache = torch.zeros_like(x)
+-++++    #     # idxs = flat_expert_indices.argsort()
+-++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++    #     #     if start_idx == end_idx:
+-++++    #     #         continue
+-++++    #     #     expert = self.experts[i]
+-++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #     #     expert_tokens = x[exp_token_idx]
+-++++    #     #     expert_out = expert(expert_tokens)
+-++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++    #     # return expert_cache
+-++++    #     expert_cache = ops.zeros_like(x)
+-++++    #     idxs = flat_expert_indices.argsort()
+-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++    #         if start_idx == end_idx:
+-++++    #             continue
+-++++    #         expert = self.experts[i]
+-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #         expert_tokens = x[exp_token_idx]
+-++++    #         expert_out = expert(expert_tokens)
+-++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++
+-++++    #     return expert_cache
+-++++    # @no_grad()
+-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++    #     expert_cache = ops.zeros_like(x)
+-++++
+-++++    #     # 排序保证顺序一致
+-++++    #     idxs = flat_expert_indices.argsort()
+-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++    #     # 找出有 token 的专家
+-++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++
+-++++    #     for i in active_experts.tolist():
+-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++    #         end_idx = tokens_per_expert[i]
+-++++    #         if start_idx == end_idx:  # 没有 token
+-++++    #             continue
+-++++
+-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #         expert_tokens = x[exp_token_idx]
+-++++    #         expert_out = self.experts[i](expert_tokens)
+-++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++
+-++++    #         expert_cache = mindspore.mint.scatter_add(
+-++++    #             expert_cache,
+-++++    #             0,
+-++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++    #             expert_out
+-++++    #         )
+-++++
+-++++    #     return expert_cache
+-++++
+-++++
+-+++ 
+-+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++ #     """
+-+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++ 
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-++++        self.warm_up = False
+-++++
+-++++    def warmup_moe_model_deep(self):
+-++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++        test_texts = [
+-++++            "warmup short",
+-++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++++        ]
+-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++        if tokenizer is None:
+-++++            from mindnlp.transformers import AutoTokenizer
+-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++            self._warmup_tokenizer = tokenizer
+-++++
+-++++        for text in test_texts:
+-++++            inputs = tokenizer(text, return_tensors="ms")
+-++++            with mindspore._no_grad():
+-++++                _ = self(**inputs, use_cache=False)
+-++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++ 
+-+++     def get_input_embeddings(self):
+-+++         return self.model.embed_tokens
+-+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++         ```"""
+-++++        if not self.warm_up:
+-++++            self.warm_up = True
+-++++            self.warmup_moe_model_deep()
+-++++
+-+++         output_attentions = (
+-+++             output_attentions
+-+++             if output_attentions is not None
+-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++index 3cbf820e..d4c6b651 100644
+-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++@@ -18,7 +18,6 @@
+-+++ # See the License for the specific language governing permissions and
+-+++ # limitations under the License.
+-+++ """MindSpore Qwen2MoE model."""
+-+++-
+-+++ import math
+-+++ from typing import List, Optional, Tuple, Union
+-+++ 
+-+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++     TokenClassifierOutput,
+-+++ )
+-+++ from ...modeling_utils import PreTrainedModel
+-++++from ...generation import GenerationMixin
+-+++ from ....utils import logging
+-+++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++ 
+-+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++         self.variance_epsilon = eps
+-+++ 
+-+++     def forward(self, hidden_states):
+-++++        # @dwj
+-++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++        # @lwx
+-++++        # if not self.training :
+-++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++         input_dtype = hidden_states.dtype
+-+++         hidden_states = hidden_states.to(mindspore.float32)
+-+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++     x1 = x[..., : x.shape[-1] // 2]
+-+++     x2 = x[..., x.shape[-1] // 2 :]
+-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++         self.config = config
+-+++         self.hidden_size = config.hidden_size
+-+++         self.intermediate_size = intermediate_size
+-++++        
+-+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++         self.act_fn = ACT2FN[config.hidden_act]
+-+++ 
+-+++     def forward(self, x):
+-+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++-
+-+++ 
+-++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++        # @lwx 
+-++++        # gate_up_output = self.gate_up_proj(x)
+-++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++++        # return self.down_proj(swiglu_output)
+-++++
+-++++    # def forward(self, x):
+-++++    #     gate_proj_out = self.gate_proj(x)
+-++++    #     up_proj_out = self.up_proj(x)
+-++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++++    #     return self.down_proj(swiglu_out)
+-++++        
+-+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++     """
+-+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++         use_cache: bool = False,
+-+++         cache_position: Optional[mindspore.Tensor] = None,
+-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++        
+-++++
+-+++         bsz, q_len, _ = hidden_states.shape
+-+++ 
+-+++         query_states = self.q_proj(hidden_states)
+-+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++                     "with a layer index."
+-+++                 )
+-+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++            if isinstance(past_key_value, StaticCache):
+-++++                kv_seq_len = key_states.shape[-2]
+-++++            else:
+-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++ 
+-+++         if past_key_value is not None:
+-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++            
+-++++            if isinstance(past_key_value, StaticCache):
+-++++                kv_seq_len = key_states.shape[-2]
+-+++ 
+-+++         # repeat k/v heads if n_kv_heads < n_heads
+-+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++-
+-++++        
+-+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++ 
+-+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++-            raise ValueError(
+-+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++-                f" {attn_weights.shape}"
+-+++-            )
+-+++-
+-+++-        if attention_mask is not None:  # no matter the length, we just slice it
+-+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++++        if attention_mask is not None:
+-++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++             attn_weights = attn_weights + causal_mask
+-+++ 
+-+++         # upcast attention to fp32
+-+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++ 
+-+++         attn_output = self.o_proj(attn_output)
+-+++-
+-++++        # @lwx
+-++++        
+-++++        # max_seq_len = self.max_position_embeddings  # 2048
+-++++
+-++++        # if attention_mask is not None:
+-++++        #     # attention_mask: [B, 1, Sq, Sk]
+-++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++
+-++++        #     # pad 到 [max_seq_len, max_seq_len]
+-++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++        #     global_attention_mask = padded_mask
+-++++        # else:
+-++++        #     global_attention_mask = None
+-++++
+-++++
+-++++        # sparse_mode=3
+-++++        # attn_output = mindspore.ops.flash_attention_score(
+-++++        #     query=query_states,
+-++++        #     key=key_states,
+-++++        #     value=value_states,
+-++++        #     real_shift=None,
+-++++        #     padding_mask=None,
+-++++
+-++++        #     head_num=self.num_heads,
+-++++        #     attn_mask=global_attention_mask,
+-++++        #     keep_prob=1.0 - self.attention_dropout,
+-++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++        #     input_layout="BNSD",
+-++++        #     pre_tokens=2147483647,
+-++++        #     next_tokens=2147483647,
+-++++        #     inner_precise=0,
+-++++        #     drop_mask=None,
+-++++        #     prefix=None,
+-++++        #     actual_seq_qlen=None,
+-++++        #     actual_seq_kvlen=None,
+-++++        #     sparse_mode=sparse_mode,
+-++++        # )
+-+++         if not output_attentions:
+-+++             attn_weights = None
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++ 
+-++++class Qwen2MoeFlashAttention(nn.Module):
+-++++    """
+-++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++
+-++++    关键改动:
+-++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++       直接传入原始的 key 和 value 张量效率更高。
+-++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++    """
+-++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++        super().__init__()
+-++++        self.config = config
+-++++        self.layer_idx = layer_idx
+-++++        self.hidden_size = config.hidden_size
+-++++        self.num_heads = config.num_attention_heads
+-++++        self.head_dim = self.hidden_size // self.num_heads
+-++++        self.num_key_value_heads = config.num_key_value_heads
+-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++        self.max_position_embeddings = config.max_position_embeddings
+-++++        self.rope_theta = config.rope_theta
+-++++        self.attention_dropout = config.attention_dropout
+-++++
+-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++            raise ValueError(
+-++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++            )
+-++++
+-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++
+-++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++            self.head_dim,
+-++++            max_position_embeddings=self.max_position_embeddings,
+-++++            base=self.rope_theta,
+-++++        )
+-++++
+-++++    def forward(
+-++++        self,
+-++++        hidden_states: mindspore.Tensor,
+-++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++        past_key_value: Optional[Cache] = None,
+-++++        output_attentions: bool = False,
+-++++        use_cache: bool = False,
+-++++        cache_position: Optional[mindspore.Tensor] = None,
+-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++        bsz, q_len, _ = hidden_states.shape
+-++++
+-++++        # 1. 线性投射 Q, K, V
+-++++        query_states = self.q_proj(hidden_states)
+-++++        key_states = self.k_proj(hidden_states)
+-++++        value_states = self.v_proj(hidden_states)
+-++++
+-++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++        # 3. RoPE 旋转位置编码
+-++++        kv_seq_len = key_states.shape[-2]
+-++++        if past_key_value is not None:
+-++++            if self.layer_idx is None:
+-++++                raise ValueError(
+-++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++                    "with a layer index."
+-++++                )
+-++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++                if cache_position.shape[0] == 1:
+-++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++                    kv_seq_len = past_seen_tokens + 1
+-++++                else:
+-++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++            else:
+-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++        # 4. KV 缓存更新
+-++++        if past_key_value is not None:
+-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++            key_states, value_states = past_key_value.update(
+-++++                key_states, value_states, self.layer_idx, cache_kwargs
+-++++            )
+-++++            
+-++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++                if cache_position.shape[0] == 1:
+-++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++                    kv_seq_len = key_states.shape[-2]
+-++++
+-++++        # 5. [重要] 准备 Attention Mask
+-++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++        fa_attention_mask = None
+-++++        if attention_mask is not None:
+-++++            # 截取与当前key长度匹配的部分
+-++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++            fa_attention_mask = (mask_slice != 0)
+-++++
+-++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++        input_dtype = query_states.dtype
+-++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++            query_states = query_states.to(mindspore.float16)
+-++++            key_states = key_states.to(mindspore.float16)
+-++++            value_states = value_states.to(mindspore.float16)
+-++++
+-++++        # 6. [核心] 调用 flash_attention_score 算子
+-++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++        attn_output = mindspore.ops.flash_attention_score(
+-++++            query=query_states,
+-++++            key=key_states,
+-++++            value=value_states,
+-++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++            attn_mask=fa_attention_mask,
+-++++            keep_prob=1.0 - self.attention_dropout,
+-++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++            input_layout="BNSD",
+-++++            sparse_mode=0 # 使用 defaultMask 模式
+-++++        )
+-++++
+-++++        # 恢复原始数据类型
+-++++        attn_output = attn_output.to(input_dtype)
+-++++
+-++++        # 7. 调整输出形状
+-++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++        attn_output = self.o_proj(attn_output)
+-++++
+-++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++        attn_weights = None
+-++++        if output_attentions:
+-++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++        return attn_output, attn_weights, past_key_value
+-++++
+-++++    # def forward(
+-++++    #     self,
+-++++    #     hidden_states: mindspore.Tensor,
+-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++    #     past_key_value: Optional[Cache] = None,
+-++++    #     output_attentions: bool = False,
+-++++    #     use_cache: bool = False,
+-++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++    #     bsz, q_len, _ = hidden_states.shape
+-++++
+-++++    #     # 1. 线性投射 Q, K, V
+-++++    #     query_states = self.q_proj(hidden_states)
+-++++    #     key_states = self.k_proj(hidden_states)
+-++++    #     value_states = self.v_proj(hidden_states)
+-++++
+-++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++    #     # 3. RoPE 旋转位置编码
+-++++    #     kv_seq_len = key_states.shape[-2]
+-++++    #     if past_key_value is not None:
+-++++    #         if self.layer_idx is None:
+-++++    #             raise ValueError(
+-++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++    #                 "with a layer index."
+-++++    #             )
+-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++    #     # 4. KV 缓存更新
+-++++    #     if past_key_value is not None:
+-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++    #         key_states, value_states = past_key_value.update(
+-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++    #         )
+-++++
+-++++    #     # 5. 准备 Attention Mask
+-++++    #     fa_attention_mask = None
+-++++    #     if attention_mask is not None:
+-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++    #         fa_attention_mask = (mask_slice != 0)
+-++++
+-++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++    #     input_dtype = query_states.dtype
+-++++
+-++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++    #         query=query_states,
+-++++    #         key=key_states,
+-++++    #         value=value_states,
+-++++    #         head_num=self.num_heads,
+-++++    #         attn_mask=fa_attention_mask,
+-++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++    #         input_layout="BNSD",
+-++++    #         sparse_mode=0,
+-++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++    #         inner_precise=1
+-++++    #     )
+-++++
+-++++    #     # 恢复原始数据类型
+-++++    #     attn_output = attn_output.to(input_dtype)
+-++++
+-++++    #     # 7. 调整输出形状
+-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++    #     attn_output = self.o_proj(attn_output)
+-++++
+-++++    #     attn_weights = None
+-++++    #     if output_attentions:
+-++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++    #     return attn_output, attn_weights, past_key_value
+-++++
+-++++    # def forward(
+-++++    #     self,
+-++++    #     hidden_states: mindspore.Tensor,
+-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++    #     past_key_value: Optional[Cache] = None,
+-++++    #     output_attentions: bool = False,
+-++++    #     use_cache: bool = False,
+-++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++    #     bsz, q_len, _ = hidden_states.shape
+-++++
+-++++    #     query_states = self.q_proj(hidden_states)
+-++++    #     key_states = self.k_proj(hidden_states)
+-++++    #     value_states = self.v_proj(hidden_states)
+-++++
+-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++    #     kv_seq_len = key_states.shape[-2]
+-++++    #     if past_key_value is not None:
+-++++    #         if self.layer_idx is None:
+-++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++    #     if past_key_value is not None:
+-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++    #         key_states, value_states = past_key_value.update(
+-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++    #         )
+-++++
+-++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++
+-++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++    #     # <--- 修改结束 ---
+-++++
+-++++    #     fa_attention_mask = None
+-++++    #     if attention_mask is not None:
+-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++    #         fa_attention_mask = (mask_slice != 0)
+-++++
+-++++    #     input_dtype = query_states.dtype
+-++++
+-++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++    #         key=key_states,
+-++++    #         value=value_states,
+-++++    #         head_num=self.num_heads,
+-++++    #         attn_mask=fa_attention_mask,
+-++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++    #         input_layout="BNSD",
+-++++    #         sparse_mode=0,
+-++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++    #     )
+-++++
+-++++    #     attn_output = attn_output.to(input_dtype)
+-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++    #     attn_output = self.o_proj(attn_output)
+-++++
+-++++    #     attn_weights = None
+-++++    #     if output_attentions:
+-++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++
+-++++    #     return attn_output, attn_weights, past_key_value
+-++++
+-+++ QWEN2MOE_ATTENTION_CLASSES = {
+-+++     "eager": Qwen2MoeAttention,
+-++++    "flash-attention": Qwen2MoeFlashAttention,
+-+++ }
+-+++ 
+-+++ 
+-+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++ 
+-++++    #@dwj
+-++++    # 只遍历激活的专家，而非全部专家
+-+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++-        # router_logits: (batch * sequence_length, n_experts)
+-+++-        router_logits = self.gate(hidden_states)
+-+++-
+-+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-        if self.norm_topk_prob:
+-+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        # we cast back to the input dtype
+-+++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-
+-+++-        final_hidden_states = ops.zeros(
+-+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++-        )
+-+++-
+-+++-        # One hot encode the selected experts to create an expert mask
+-+++-        # this will be used to easily index which expert is going to be sollicitated
+-+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++-
+-+++-        # Loop over all available experts in the model and perform the computation on each expert
+-+++-        for expert_idx in range(self.num_experts):
+-+++-            expert_layer = self.experts[expert_idx]
+-+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++-
+-+++-            # Index the correct hidden states and compute the expert hidden state for
+-+++-            # the current expert. We need to make sure to multiply the output hidden
+-+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++-            if 0 not in idx.shape:
+-+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++-
+-+++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++-                # the `top_x` tensor here.
+-+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++-
+-+++-        shared_expert_output = self.shared_expert(hidden_states)
+-+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++-
+-+++-        final_hidden_states = final_hidden_states + shared_expert_output
+-++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++            num_tokens = hidden_states_reshaped.shape[0]
+-++++
+-++++            router_logits = self.gate(hidden_states_reshaped)
+-++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++
+-++++            if self.norm_topk_prob:
+-++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++            
+-++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++            flat_selected_experts = selected_experts.flatten()
+-++++            
+-++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++            token_indices = broadcasted_token_indices.flatten()
+-++++            
+-++++            active_experts = ops.unique(flat_selected_experts)
+-++++            
+-++++            for expert_idx_tensor in active_experts:
+-++++                expert_idx = expert_idx_tensor.item()
+-++++                expert_layer = self.experts[expert_idx]
+-++++                
+-++++                mask = (flat_selected_experts == expert_idx_tensor)
+-++++                selected_token_indices = token_indices[mask]
+-++++                selected_routing_weights = routing_weights.flatten()[mask]
+-++++                
+-++++                current_states = hidden_states_reshaped[selected_token_indices]
+-++++                
+-++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++                
+-++++                final_hidden_states = final_hidden_states.index_add(
+-++++                    dim=0,
+-++++                    index=selected_token_indices,
+-++++                    source=expert_output.to(hidden_states.dtype)
+-++++                )
+-++++            
+-++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++ 
+-+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++-        return final_hidden_states, router_logits
+-++++            final_hidden_states = final_hidden_states + shared_expert_output
+-++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++            
+-++++            return final_hidden_states, router_logits
+-+++ 
+-+++ 
+-+++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++ 
+-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++ 
+-++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++
+-+++         if (layer_idx not in config.mlp_only_layers) and (
+-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++         ):
+-+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++     _skip_keys_device_placement = "past_key_values"
+-+++     _supports_cache_class = True
+-++++#lwx
+-++++    # _supports_static_cache = True
+-+++ 
+-+++     def _init_weights(self, module):
+-+++         std = self.config.initializer_range
+-+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++         return causal_mask
+-+++ 
+-+++ 
+-+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++     _tied_weights_keys = ["lm_head.weight"]
+-+++ 
+-+++     def __init__(self, config):
+-+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++         self.num_experts_per_tok = config.num_experts_per_tok
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-++++        # @lwx
+-++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++++        #     self.generation_config.cache_implementation = "static"
+-++++        self._warmed_up = False
+-++++
+-++++    def warmup_moe_model(self):
+-++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++++        test_texts = [
+-++++            "warmup short",
+-++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++++        ]
+-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++        if tokenizer is None:
+-++++            from mindnlp.transformers import AutoTokenizer
+-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++            self._warmup_tokenizer = tokenizer
+-++++
+-++++        for text in test_texts:
+-++++            inputs = tokenizer(text, return_tensors="ms")
+-++++            with mindspore._no_grad():
+-++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++ 
+-+++     def get_input_embeddings(self):
+-+++         return self.model.embed_tokens
+-+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++         ```"""
+-++++        if not self._warmed_up:
+-++++            self._warmed_up = True
+-++++            self.warmup_moe_model()
+-+++ 
+-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++         output_router_logits = (
+-+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++             }
+-+++         )
+-+++         return model_inputs
+-++++# @lwx
+-++++    # def _decode_one_tokens_logits(
+-++++    #     self,
+-++++    #     cur_token: mindspore.Tensor,
+-++++    #     input_pos: Optional[mindspore.Tensor],
+-++++    #     cache_position: mindspore.Tensor,
+-++++    #     past_key_values: StaticCache,
+-++++    # ) -> mindspore.Tensor:
+-++++    #     """
+-++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++++        
+-++++    #     Args:
+-++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++++    #         input_pos: 输入位置信息，可选
+-++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++++            
+-++++    #     Returns:
+-++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++++    #     """
+-++++    #     # 调用JIT编译的版本
+-++++    #     return self.get_decode_one_tokens_logits(
+-++++    #         cur_token=cur_token,
+-++++    #         input_pos=input_pos,
+-++++    #         cache_position=cache_position,
+-++++    #         past_key_values=past_key_values,
+-++++    #     )
+-++++    
+-++++    # @mindspore.jit(jit_level='O1')
+-++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++++    #     """
+-++++    #     JIT编译的函数，用于高效的单token解码
+-++++    #     使用JIT编译优化以支持静态shape和高效执行
+-++++        
+-++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++++    #     """
+-++++    #     outputs = self.model.forward(
+-++++    #         input_ids=cur_token,
+-++++    #         position_ids=input_pos,
+-++++    #         cache_position=cache_position,
+-++++    #         past_key_values=past_key_values,
+-++++    #         use_cache=True,
+-++++    #         return_dict=False,
+-++++    #     )
+-++++        
+-++++    #     hidden_states = outputs[0]
+-++++    #     logits = self.lm_head.forward(hidden_states)
+-++++    #     logits = logits.float()
+-++++        
+-++++    #     return logits[:, -1, :]
+-++++
+-++++    # def _sample(
+-++++    #     self,
+-++++    #     input_ids: mindspore.Tensor,
+-++++    #     logits_processor,
+-++++    #     stopping_criteria,
+-++++    #     generation_config,
+-++++    #     synced_devices: bool,
+-++++    #     streamer=None,
+-++++    #     logits_warper=None,
+-++++    #     **model_kwargs,
+-++++    # ):
+-++++    #     """
+-++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++++    #     """
+-++++    #     from ...generation.logits_process import LogitsProcessorList
+-++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++++    #     from mindnlp.core import nn, ops, no_grad
+-++++    #     import numpy as np
+-++++        
+-++++    #     # 检查是否使用 StaticCache
+-++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++++    #     # 否则，直接调用父类方法
+-++++    #     past_key_values = model_kwargs.get("past_key_values")
+-++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++++
+-++++    #     if not isinstance(past_key_values, StaticCache):
+-++++    #         # 不使用 StaticCache，直接调用父类方法
+-++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++++    #         return super()._sample(
+-++++    #             input_ids=input_ids,
+-++++    #             logits_processor=logits_processor,
+-++++    #             stopping_criteria=stopping_criteria,
+-++++    #             generation_config=generation_config,
+-++++    #             synced_devices=synced_devices,
+-++++    #             streamer=streamer,
+-++++    #             logits_warper=logits_warper,
+-++++    #             **model_kwargs,
+-++++    #         )
+-++++        
+-++++    #     # 使用 StaticCache，进入自定义循环
+-++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++++    #     pad_token_id = generation_config._pad_token_tensor
+-++++    #     output_attentions = generation_config.output_attentions
+-++++    #     output_hidden_states = generation_config.output_hidden_states
+-++++    #     output_scores = generation_config.output_scores
+-++++    #     output_logits = generation_config.output_logits
+-++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++++    #     max_length = generation_config.max_length
+-++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++++    #     do_sample = generation_config.do_sample
+-++++        
+-++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++++    #         raise ValueError(
+-++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++++    #             f"{logits_warper})."
+-++++    #         )
+-++++        
+-++++    #     # init attention / hidden states / scores tuples
+-++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++++        
+-++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++++    #         encoder_hidden_states = (
+-++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++++    #         )
+-++++        
+-++++    #     # keep track of which sequences are already finished
+-++++    #     batch_size, cur_len = input_ids.shape
+-++++    #     this_peer_finished = False
+-++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++++        
+-++++    #     time_record = []
+-++++    #     from ....utils.testing_utils import parse_flag_from_env
+-++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++++        
+-++++    #     while self._has_unfinished_sequences(
+-++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++++    #     ):
+-++++    #         if _record_time:
+-++++    #             import time as time_module
+-++++    #             infer_start = time_module.time()
+-++++            
+-++++    #         # prepare model inputs
+-++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++++            
+-++++    #         # prepare variable output controls
+-++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++++            
+-++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++++    #         cur_cache_position = model_inputs.get("cache_position")
+-++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++++    #         cur_input_ids = model_inputs.get("input_ids")
+-++++            
+-++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++++    #             cur_cache_position is not None and 
+-++++    #             len(cur_cache_position.shape) > 0 and
+-++++    #             cur_cache_position.shape[0] == 1 and
+-++++    #             cur_input_ids is not None and
+-++++    #             cur_input_ids.shape[1] == 1):
+-++++    #             # 使用 JIT 优化的单 token 解码
+-++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++++    #             if not hasattr(self, '_jit_used'):
+-++++    #                 self._jit_used = False
+-++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++++                
+-++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++++    #                 cur_token=cur_input_ids,
+-++++    #                 input_pos=model_inputs.get("position_ids"),
+-++++    #                 cache_position=cur_cache_position,
+-++++    #                 past_key_values=cur_past_key_values,
+-++++    #             )
+-++++                
+-++++    #             # 标记已使用JIT（用于后续判断）
+-++++    #             if not self._jit_used:
+-++++    #                 self._jit_used = True
+-++++                
+-++++    #             # 构造兼容的输出对象
+-++++    #             class JitOptimizedOutput:
+-++++    #                 def __init__(self, logits, config):
+-++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++++    #                     self.config = config
+-++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++++    #                     self.cross_attentions = None
+-++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++++                
+-++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++++    #         else:
+-++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++++    #             outputs = self(**model_inputs, return_dict=True)
+-++++            
+-++++    #         if synced_devices and this_peer_finished:
+-++++    #             continue
+-++++            
+-++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++++    #         next_token_logits = outputs.logits[:, -1, :]
+-++++            
+-++++    #         # pre-process distribution
+-++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++++    #         if do_sample:
+-++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++++            
+-++++    #         # Store scores, attentions and hidden_states when required
+-++++    #         if return_dict_in_generate:
+-++++    #             if output_scores:
+-++++    #                 scores += (next_token_scores,)
+-++++    #             if output_logits:
+-++++    #                 raw_logits += (next_token_logits,)
+-++++    #             if output_attentions:
+-++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++++    #                 if self.config.is_encoder_decoder:
+-++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++++                
+-++++    #             if output_hidden_states:
+-++++    #                 hidden = (
+-++++    #                     outputs.decoder_hidden_states
+-++++    #                     if self.config.is_encoder_decoder
+-++++    #                     else outputs.hidden_states
+-++++    #                 )
+-++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++++            
+-++++    #         # token selection
+-++++    #         if do_sample:
+-++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++++    #         else:
+-++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++++            
+-++++    #         # finished sentences should have their next token be a padding token
+-++++    #         if has_eos_stopping_criteria:
+-++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++++            
+-++++    #         # update generated ids, model inputs, and length for next step
+-++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++++    #         if streamer is not None:
+-++++    #             streamer.put(next_tokens)
+-++++            
+-++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++++    #             outputs,
+-++++    #             model_kwargs,
+-++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++++    #         )
+-++++            
+-++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++++    #         cur_len += 1
+-++++            
+-++++    #         if _record_time:
+-++++    #             import time as time_module
+-++++    #             infer_stop = time_module.time()
+-++++    #             time_record.append(infer_stop - infer_start)
+-++++            
+-++++    #         del outputs
+-++++        
+-++++    #     average_infer_time = None
+-++++    #     if time_record:
+-++++    #         if len(time_record) > 1:
+-++++    #             time_record.pop(0)
+-++++    #         average_infer_time = sum(time_record) / len(time_record)
+-++++    #         print(f'average inference time is: {average_infer_time}')
+-++++    #         print(f'inference time record: {time_record}')
+-++++        
+-++++    #     if streamer is not None:
+-++++    #         streamer.end()
+-++++        
+-++++    #     # 简单判断：打印是否使用了JIT路径
+-++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++++    #     else:
+-++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++++        
+-++++    #     if return_dict_in_generate:
+-++++    #         if self.config.is_encoder_decoder:
+-++++    #             return GenerateEncoderDecoderOutput(
+-++++    #                 sequences=input_ids,
+-++++    #                 scores=scores,
+-++++    #                 logits=raw_logits,
+-++++    #                 encoder_attentions=encoder_attentions,
+-++++    #                 encoder_hidden_states=encoder_hidden_states,
+-++++    #                 decoder_attentions=decoder_attentions,
+-++++    #                 cross_attentions=cross_attentions,
+-++++    #                 decoder_hidden_states=decoder_hidden_states,
+-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++    #                 average_infer_time=average_infer_time
+-++++    #             )
+-++++    #         else:
+-++++    #             return GenerateDecoderOnlyOutput(
+-++++    #                 sequences=input_ids,
+-++++    #                 scores=scores,
+-++++    #                 logits=raw_logits,
+-++++    #                 attentions=decoder_attentions,
+-++++    #                 hidden_states=decoder_hidden_states,
+-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++    #                 average_infer_time=average_infer_time
+-++++    #             )
+-++++    #     else:
+-++++    #         return input_ids
+-++++            
+-++++    # def _prepare_cache_for_generation(
+-++++    #     self,
+-++++    #     generation_config,
+-++++    #     model_kwargs,
+-++++    #     assistant_model,
+-++++    #     batch_size,
+-++++    #     max_cache_length,
+-++++    # ):
+-++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++++    #         generation_config.cache_implementation = "static"
+-++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++++        
+-++++    #     if generation_config.cache_implementation == "static":
+-++++    #         base_required_from_max_length = generation_config.max_length + 1
+-++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++++    #         min_cache_size = 50
+-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++++    #         else:
+-++++    #             max_cache_length = max(base_required, min_cache_size)
+-++++            
+-++++    #         original_max_cache_length = max_cache_length
+-++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++++            
+-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++    #             if max_cache_length > self.config.max_position_embeddings:
+-++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++++        
+-++++    #     result = super()._prepare_cache_for_generation(
+-++++    #         generation_config=generation_config,
+-++++    #         model_kwargs=model_kwargs,
+-++++    #         assistant_model=assistant_model,
+-++++    #         batch_size=batch_size,
+-++++    #         max_cache_length=max_cache_length,
+-++++    #     )
+-++++        
+-++++    #     if generation_config.cache_implementation == "static":
+-++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++++    #         created_cache = model_kwargs.get(cache_name)
+-++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++++        
+-++++    #     return result
+-++++
+-++++
+-++++
+-+++ 
+-+++ 
+-+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++-- 
+-+++2.27.0
+-+++
+-++-- 
+-++2.27.0
+-++
+-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+new file mode 100644
+-+index 00000000..966529e4
+-+--- /dev/null
+-++++ b/patches/0003-20261106secondcommit.patch
+-+@@ -0,0 +1,2769 @@
+-++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Thu, 6 Nov 2025 14:54:37 +0800
+-++Subject: [PATCH 3/3] 20261106secondcommit
+-++
+-++---
+-++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+-++ patches/0001-20251104commit.patch             | 1272 -----------------
+-++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+-++ delete mode 100644 patches/0001-20251104commit.patch
+-++
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index 73773c22..2f9192bf 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+-++ 
+-++ _CONFIG_FOR_DOC = "DeepseekConfig"
+-++ 
+-+++_attn_mask_cache = {}
+-+++
+-+++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-+++    q_len = batch_and_seq[1]
+-+++    kv_len = batch_and_seq[1] + past_key_values_length 
+-+++    key = (batch_and_seq[0], q_len, kv_len)
+-+++
+-+++    if key in _attn_mask_cache:
+-+++        return _attn_mask_cache[key]
+-+++
+-+++    mask = _prepare_4d_causal_attention_mask(
+-+++        attention_mask,
+-+++        batch_and_seq,
+-+++        inputs_embeds,
+-+++        past_key_values_length,
+-+++    )
+-+++    _attn_mask_cache[key] = mask
+-+++    return mask
+-++ 
+-++ def _get_unpad_data(attention_mask):
+-++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-++         return final_output
+-++ 
+-++ 
+-++-    @no_grad()
+-++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++-        expert_cache = ops.zeros_like(x)
+-++-        idxs = flat_expert_indices.argsort()
+-++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++-        token_idxs = idxs // self.num_experts_per_tok
+-++-
+-++-        for i, end_idx in enumerate(tokens_per_expert):
+-++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++-            if start_idx == end_idx:
+-++-                continue
+-++-            expert = self.experts[i]
+-++-            exp_token_idx = token_idxs[start_idx:end_idx]
+-++-            expert_tokens = x[exp_token_idx]
+-++-            expert_out = expert(expert_tokens)
+-++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++-
+-++-        return expert_cache
+-++-        
+-++     # @no_grad()
+-++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-    #     # expert_cache = torch.zeros_like(x)
+-++-    #     # idxs = flat_expert_indices.argsort()
+-++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++-    #     # token_idxs = idxs // self.num_experts_per_tok
+-++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++-    #     #     if start_idx == end_idx:
+-++-    #     #         continue
+-++-    #     #     expert = self.experts[i]
+-++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++-    #     #     expert_tokens = x[exp_token_idx]
+-++-    #     #     expert_out = expert(expert_tokens)
+-++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++-    #     # return expert_cache
+-+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++     #     expert_cache = ops.zeros_like(x)
+-++     #     idxs = flat_expert_indices.argsort()
+-++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++ 
+-++     #     return expert_cache
+-++-    # @no_grad()
+-++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-    #     expert_cache = ops.zeros_like(x)
+-+++        
+-+++    @no_grad()
+-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++        """
+-+++        优化版 MoE prefill：
+-+++        - 批量张量化处理同一个 expert 的所有 token
+-+++        - 跳过无 token 的专家
+-+++        - 保持结果完全一致
+-+++        """
+-+++        # 初始化输出缓存
+-+++        expert_cache = ops.zeros_like(x)
+-++ 
+-++-    #     # 排序保证顺序一致
+-++-    #     idxs = flat_expert_indices.argsort()
+-++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++-    #     token_idxs = idxs // self.num_experts_per_tok
+-+++        # 排序（确保 scatter_add 位置对应原逻辑）
+-+++        idxs = flat_expert_indices.argsort()
+-+++        sorted_expert_indices = flat_expert_indices[idxs]
+-+++        sorted_token_indices = idxs // self.num_experts_per_tok
+-++ 
+-++-    #     # 找出有 token 的专家
+-++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++        # 每个 expert 的 token 数
+-+++        tokens_per_expert = sorted_expert_indices.bincount()
+-++ 
+-++-    #     for i in active_experts.tolist():
+-++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++-    #         end_idx = tokens_per_expert[i]
+-++-    #         if start_idx == end_idx:  # 没有 token
+-++-    #             continue
+-+++        # 找出有 token 的专家
+-+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-++ 
+-++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++-    #         expert_tokens = x[exp_token_idx]
+-++-    #         expert_out = self.experts[i](expert_tokens)
+-++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++        for expert_id in active_experts.tolist():
+-+++            # 取该 expert 对应的排序后 token 区间
+-+++            start = (tokens_per_expert[:expert_id]).sum().item()
+-+++            end = start + tokens_per_expert[expert_id].item()
+-++ 
+-++-    #         expert_cache = mindspore.mint.scatter_add(
+-++-    #             expert_cache,
+-++-    #             0,
+-++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++-    #             expert_out
+-++-    #         )
+-+++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-+++            expert_tokens = x[token_idx]                     # 取输入向量
+-++ 
+-++-    #     return expert_cache
+-+++            # 执行专家 MLP
+-+++            expert_out = self.experts[expert_id](expert_tokens)
+-+++
+-+++            # 按权重缩放
+-+++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-+++
+-+++            # 回写到缓存（等价 scatter_add）
+-+++            expert_cache = mindspore.mint.scatter_add(
+-+++                expert_cache,
+-+++                0,
+-+++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++                scaled_out
+-+++            )
+-+++
+-+++        return expert_cache
+-+++
+-+++        # @no_grad()
+-+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++        #     # expert_cache = torch.zeros_like(x)
+-+++        #     # idxs = flat_expert_indices.argsort()
+-+++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++        #     # token_idxs = idxs // self.num_experts_per_tok
+-+++        #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++        #     #     if start_idx == end_idx:
+-+++        #     #         continue
+-+++        #     #     expert = self.experts[i]
+-+++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++        #     #     expert_tokens = x[exp_token_idx]
+-+++        #     #     expert_out = expert(expert_tokens)
+-+++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++        #     # return expert_cache
+-+++        #     expert_cache = ops.zeros_like(x)
+-+++        #     idxs = flat_expert_indices.argsort()
+-+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++        #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++        #     for i, end_idx in enumerate(tokens_per_expert):
+-+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++        #         if start_idx == end_idx:
+-+++        #             continue
+-+++        #         expert = self.experts[i]
+-+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++        #         expert_tokens = x[exp_token_idx]
+-+++        #         expert_out = expert(expert_tokens)
+-+++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++
+-+++        #     return expert_cache
+-+++        # @no_grad()
+-+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++        #     expert_cache = ops.zeros_like(x)
+-+++
+-+++        #     # 排序保证顺序一致
+-+++        #     idxs = flat_expert_indices.argsort()
+-+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++        #     token_idxs = idxs // self.num_experts_per_tok
+-+++
+-+++        #     # 找出有 token 的专家
+-+++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++
+-+++        #     for i in active_experts.tolist():
+-+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++        #         end_idx = tokens_per_expert[i]
+-+++        #         if start_idx == end_idx:  # 没有 token
+-+++        #             continue
+-+++
+-+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++        #         expert_tokens = x[exp_token_idx]
+-+++        #         expert_out = self.experts[i](expert_tokens)
+-+++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++
+-+++        #         expert_cache = mindspore.mint.scatter_add(
+-+++        #             expert_cache,
+-+++        #             0,
+-+++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++        #             expert_out
+-+++        #         )
+-+++
+-+++        #     return expert_cache
+-++ 
+-++ 
+-++ 
+-++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++-
+-++ # class DeepseekFlashAttention(nn.Module):
+-++ #     """
+-++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-+++
+-++ Deepseek_ATTENTION_CLASSES = {
+-++     "eager": DeepseekAttention,
+-++     "flash-attention": DeepseekFlashAttention,
+-++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-++             )
+-++         else:
+-++             # 4d mask is passed through the layers
+-++-            attention_mask = _prepare_4d_causal_attention_mask(
+-+++            # attention_mask = _prepare_4d_causal_attention_mask(
+-+++            #     attention_mask,
+-+++            #     (batch_size, seq_length),
+-+++            #     inputs_embeds,
+-+++            #     past_key_values_length,
+-+++            # )
+-+++            #@dwj
+-+++            attention_mask = get_cached_causal_mask(
+-++                 attention_mask,
+-++                 (batch_size, seq_length),
+-++                 inputs_embeds,
+-++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-++         self.warm_up = False
+-+++        #@dwj
+-+++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-+++            self.num_layers,
+-+++            self.num_attention_heads,
+-+++            self.head_dim,
+-+++            batch_size=1,
+-+++            max_length=self.max_length,
+-+++            dtype=mindspore.float16
+-+++        )
+-+++
+-+++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-+++        key_cache = []
+-+++        value_cache = []
+-+++        for _ in range(num_layers):
+-+++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++            key_cache.append(k)
+-+++            value_cache.append(v)
+-+++        return key_cache, value_cache
+-+++
+-++ 
+-++     def warmup_moe_model_deep(self):
+-++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++index bced285c..ebd7782e 100644
+-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+-++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-++ 
+-++-Long_Prompt = False
+-++-PROMPT_LENGTH_THRESHOLD = 128
+-+++Long_Prompt = 1
+-+++LONG_PROMPT_LENGTH_THRESHOLD = 128
+-+++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-+++
+-+++_causal_mask_cache = {}
+-+++
+-+++def get_cached_causal_mask_with_cache_position(
+-+++    attention_mask: mindspore.Tensor,
+-+++    sequence_length: int,
+-+++    target_length: int,
+-+++    dtype: mindspore.dtype,
+-+++    min_dtype: float,
+-+++    cache_position: mindspore.Tensor,
+-+++    batch_size: int,
+-+++):
+-+++    """
+-+++    带缓存的 causal mask 构造函数
+-+++    """
+-+++    # q_len 是当前 query 长度
+-+++    q_len = sequence_length
+-+++    # kv_len 是 target_length
+-+++    kv_len = target_length
+-+++
+-+++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-+++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-+++
+-+++    if key in _causal_mask_cache:
+-+++        return _causal_mask_cache[key]
+-+++
+-+++    # 调用原来的 mask 构造逻辑
+-+++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++        attention_mask,
+-+++        sequence_length=sequence_length,
+-+++        target_length=target_length,
+-+++        dtype=dtype,
+-+++        min_dtype=min_dtype,
+-+++        cache_position=cache_position,
+-+++        batch_size=batch_size,
+-+++    )
+-+++    # 缓存结果
+-+++    _causal_mask_cache[key] = causal_mask
+-+++    return causal_mask
+-++ 
+-++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++ 
+-++ 
+-++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-+++# class Qwen2MoeAttention(nn.Module):
+-+++#     """
+-+++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+++#     and "Generating Long Sequences with Sparse Transformers".
+-+++#     """
+-+++
+-+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++#         super().__init__()
+-+++#         self.config = config
+-+++#         self.layer_idx = layer_idx
+-+++#         if layer_idx is None:
+-+++#             logger.warning_once(
+-+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++#                 "when creating this class."
+-+++#             )
+-+++
+-+++#         self.hidden_size = config.hidden_size
+-+++#         self.num_heads = config.num_attention_heads
+-+++#         self.head_dim = self.hidden_size // self.num_heads
+-+++#         self.num_key_value_heads = config.num_key_value_heads
+-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++#         self.max_position_embeddings = config.max_position_embeddings
+-+++#         self.rope_theta = config.rope_theta
+-+++#         self.is_causal = True
+-+++#         self.attention_dropout = config.attention_dropout
+-+++
+-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++#             raise ValueError(
+-+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++#                 f" and `num_heads`: {self.num_heads})."
+-+++#             )
+-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++
+-+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++#             self.head_dim,
+-+++#             max_position_embeddings=self.max_position_embeddings,
+-+++#             base=self.rope_theta,
+-+++#         )
+-+++
+-+++#     def forward(
+-+++#         self,
+-+++#         hidden_states: mindspore.Tensor,
+-+++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++#         past_key_value: Optional[Cache] = None,
+-+++#         output_attentions: bool = False,
+-+++#         use_cache: bool = False,
+-+++#         cache_position: Optional[mindspore.Tensor] = None,
+-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++
+-+++        
+-+++
+-+++#         bsz, q_len, _ = hidden_states.shape
+-+++
+-+++#         query_states = self.q_proj(hidden_states)
+-+++#         key_states = self.k_proj(hidden_states)
+-+++#         value_states = self.v_proj(hidden_states)
+-+++
+-+++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++
+-+++#         kv_seq_len = key_states.shape[-2]
+-+++#         if past_key_value is not None:
+-+++#             if self.layer_idx is None:
+-+++#                 raise ValueError(
+-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++#                     "with a layer index."
+-+++#                 )
+-+++#             if isinstance(past_key_value, StaticCache):
+-+++#                 kv_seq_len = key_states.shape[-2]
+-+++#             else:
+-+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++
+-+++#         if past_key_value is not None:
+-+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++            
+-+++#             if isinstance(past_key_value, StaticCache):
+-+++#                 kv_seq_len = key_states.shape[-2]
+-+++
+-+++#         # repeat k/v heads if n_kv_heads < n_heads
+-+++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++        
+-+++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++
+-+++#         if attention_mask is not None:
+-+++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++#             attn_weights = attn_weights + causal_mask
+-+++
+-+++#         # upcast attention to fp32
+-+++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++#         attn_output = ops.matmul(attn_weights, value_states)
+-+++
+-+++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++#             raise ValueError(
+-+++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+++#                 f" {attn_output.shape}"
+-+++#             )
+-+++
+-+++#         attn_output = ops.transpose(attn_output, 1, 2)
+-+++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++
+-+++#         attn_output = self.o_proj(attn_output)
+-+++#         # @lwx
+-+++        
+-+++#         # max_seq_len = self.max_position_embeddings  # 2048
+-+++
+-+++#         # if attention_mask is not None:
+-+++#         #     # attention_mask: [B, 1, Sq, Sk]
+-+++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++
+-+++#         #     # pad 到 [max_seq_len, max_seq_len]
+-+++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++#         #     global_attention_mask = padded_mask
+-+++#         # else:
+-+++#         #     global_attention_mask = None
+-+++
+-+++
+-+++#         # sparse_mode=3
+-+++#         # attn_output = mindspore.ops.flash_attention_score(
+-+++#         #     query=query_states,
+-+++#         #     key=key_states,
+-+++#         #     value=value_states,
+-+++#         #     real_shift=None,
+-+++#         #     padding_mask=None,
+-+++
+-+++#         #     head_num=self.num_heads,
+-+++#         #     attn_mask=global_attention_mask,
+-+++#         #     keep_prob=1.0 - self.attention_dropout,
+-+++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++#         #     input_layout="BNSD",
+-+++#         #     pre_tokens=2147483647,
+-+++#         #     next_tokens=2147483647,
+-+++#         #     inner_precise=0,
+-+++#         #     drop_mask=None,
+-+++#         #     prefix=None,
+-+++#         #     actual_seq_qlen=None,
+-+++#         #     actual_seq_kvlen=None,
+-+++#         #     sparse_mode=sparse_mode,
+-+++#         # )
+-+++#         if not output_attentions:
+-+++#             attn_weights = None
+-+++
+-+++#         return attn_output, attn_weights, past_key_value
+-+++
+-++ class Qwen2MoeAttention(nn.Module):
+-++     """
+-++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-++-    and "Generating Long Sequences with Sparse Transformers".
+-++-    """
+-+++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+-++ 
+-+++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-+++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-+++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-+++
+-+++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-+++    """
+-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++         super().__init__()
+-++         self.config = config
+-++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-++         if layer_idx is None:
+-++             logger.warning_once(
+-++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++                 "when creating this class."
+-++             )
+-++ 
+-++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-++         use_cache: bool = False,
+-++         cache_position: Optional[mindspore.Tensor] = None,
+-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-
+-++         
+-++-
+-+++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-++         bsz, q_len, _ = hidden_states.shape
+-++ 
+-++         query_states = self.q_proj(hidden_states)
+-++         key_states = self.k_proj(hidden_states)
+-++         value_states = self.v_proj(hidden_states)
+-++ 
+-++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++-
+-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++        
+-++         kv_seq_len = key_states.shape[-2]
+-++         if past_key_value is not None:
+-++-            if self.layer_idx is None:
+-++-                raise ValueError(
+-++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-                    "with a layer index."
+-++-                )
+-++-            if isinstance(past_key_value, StaticCache):
+-++-                kv_seq_len = key_states.shape[-2]
+-++-            else:
+-++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++        
+-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++ 
+-++         if past_key_value is not None:
+-++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++
+-+++        # --- 2. 动态调度核心注意力计算 ---
+-+++        global Long_Prompt
+-+++        if Long_Prompt >= 1:
+-+++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-+++            fa_attention_mask = None
+-+++            if attention_mask is not None:
+-+++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++                fa_attention_mask = (mask_slice != 0)
+-+++
+-+++            attn_output = mindspore.ops.flash_attention_score(
+-+++                query=query_states,
+-+++                key=key_states,
+-+++                value=value_states,
+-+++                head_num=self.num_heads,
+-+++                attn_mask=fa_attention_mask,
+-+++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-+++                scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++                input_layout="BNSD",
+-+++                sparse_mode=0,
+-+++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-+++            )
+-++             
+-++-            if isinstance(past_key_value, StaticCache):
+-++-                kv_seq_len = key_states.shape[-2]
+-+++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++            attn_output = self.o_proj(attn_output)
+-+++            attn_weights = None
+-+++            if output_attentions:
+-+++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-++ 
+-++-        # repeat k/v heads if n_kv_heads < n_heads
+-++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++-        
+-++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++        else:
+-+++            # --- Eager Attention 路径 (用于短序列和解码) ---
+-+++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++            
+-+++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++ 
+-++-        if attention_mask is not None:
+-++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++-            attn_weights = attn_weights + causal_mask
+-+++            if attention_mask is not None:
+-+++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++                attn_weights = attn_weights + causal_mask
+-++ 
+-++-        # upcast attention to fp32
+-++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++-        attn_output = ops.matmul(attn_weights, value_states)
+-+++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++            attn_output = ops.matmul(attn_weights, value_states)
+-++ 
+-++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++-            raise ValueError(
+-++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-++-                f" {attn_output.shape}"
+-++-            )
+-+++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++                raise ValueError(
+-+++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-+++                )
+-++ 
+-++-        attn_output = ops.transpose(attn_output, 1, 2)
+-++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++            attn_output = ops.transpose(attn_output, 1, 2)
+-+++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++            attn_output = self.o_proj(attn_output)
+-++ 
+-++-        attn_output = self.o_proj(attn_output)
+-++-        # @lwx
+-+++            if not output_attentions:
+-+++                attn_weights = None
+-++         
+-++-        # max_seq_len = self.max_position_embeddings  # 2048
+-++-
+-++-        # if attention_mask is not None:
+-++-        #     # attention_mask: [B, 1, Sq, Sk]
+-++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++-
+-++-        #     # pad 到 [max_seq_len, max_seq_len]
+-++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++-        #     global_attention_mask = padded_mask
+-++-        # else:
+-++-        #     global_attention_mask = None
+-++-
+-++-
+-++-        # sparse_mode=3
+-++-        # attn_output = mindspore.ops.flash_attention_score(
+-++-        #     query=query_states,
+-++-        #     key=key_states,
+-++-        #     value=value_states,
+-++-        #     real_shift=None,
+-++-        #     padding_mask=None,
+-++-
+-++-        #     head_num=self.num_heads,
+-++-        #     attn_mask=global_attention_mask,
+-++-        #     keep_prob=1.0 - self.attention_dropout,
+-++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-        #     input_layout="BNSD",
+-++-        #     pre_tokens=2147483647,
+-++-        #     next_tokens=2147483647,
+-++-        #     inner_precise=0,
+-++-        #     drop_mask=None,
+-++-        #     prefix=None,
+-++-        #     actual_seq_qlen=None,
+-++-        #     actual_seq_kvlen=None,
+-++-        #     sparse_mode=sparse_mode,
+-++-        # )
+-++-        if not output_attentions:
+-++-            attn_weights = None
+-++-
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++-
+-++ # class Qwen2MoeFlashAttention(nn.Module):
+-++ #     """
+-++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+-++ #             return final_hidden_states, router_logits
+-++ 
+-++ 
+-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-#     """
+-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-++-#     """
+-++-#     def __init__(self, config: Qwen2MoeConfig):
+-++-#         super().__init__()
+-++-#         self.num_experts = config.num_experts
+-++-#         self.top_k = config.num_experts_per_tok
+-++-#         self.norm_topk_prob = config.norm_topk_prob
+-++-
+-++-#         # 门控网络
+-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++-#         # 专家列表
+-++-#         self.experts = nn.ModuleList(
+-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++-#         )
+-++-#         # 共享专家
+-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_decode(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         """
+-++-#         【解码路径】针对 sequence_length=1 的极致优化。
+-++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-++-#         """
+-++-#         batch_size, hidden_dim = hidden_states.shape
+-++-        
+-++-#         expert_outputs_list = [
+-++-#             ops.cat([
+-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++-#             ], dim=0) 
+-++-#             for i in range(batch_size)
+-++-#         ]
+-++-        
+-++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-++-#         # shape: (batch_size, top_k, hidden_dim)
+-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++-        
+-++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++-        
+-++-#         return moe_output.squeeze(1)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_prefill(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         """
+-++-#         【预填充路径】针对 sequence_length > 1 的优化。
+-++-#         按专家对 Token 进行分组，并进行批处理。
+-++-#         """
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens = hidden_states.shape[0]
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-        
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-        
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-        
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-            
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-#             selected_token_indices = token_indices[mask]
+-++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++-            
+-++-#             current_states = hidden_states[selected_token_indices]
+-++-            
+-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-            
+-++-#             moe_output = moe_output.index_add(
+-++-#                 dim=0,
+-++-#                 index=selected_token_indices,
+-++-#                 source=expert_output.to(hidden_states.dtype)
+-++-#             )
+-++-#         return moe_output
+-++-
+-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-#         """
+-++-#         顶层 forward 方法，作为智能分发器。
+-++-#         """
+-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        
+-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-#         router_logits = self.gate(hidden_states_reshaped)
+-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-
+-++-#         if self.norm_topk_prob:
+-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        
+-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++-        
+-++-#         moe_output = None
+-++-#         # 在推理时，根据序列长度选择最优路径
+-++-#         if not self.training:
+-++-#             if sequence_length == 1:
+-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++-#             else:
+-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++-#         else:
+-++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-++-#             raise NotImplementedError("Training path is not implemented.")
+-++-
+-++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-++-        
+-++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-++-        
+-++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-++-        
+-++-#         return final_hidden_states, router_logits
+-++-
+-++-
+-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-#     """
+-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-++-#     """
+-++-#     def __init__(self, config: Qwen2MoeConfig):
+-++-#         super().__init__()
+-++-#         self.num_experts = config.num_experts
+-++-#         self.top_k = config.num_experts_per_tok
+-++-#         self.norm_topk_prob = config.norm_topk_prob
+-++-
+-++-#         # 门控网络
+-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++-#         # 专家列表
+-++-#         self.experts = nn.ModuleList(
+-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++-#         )
+-++-#         # 共享专家
+-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_decode(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         batch_size, _ = hidden_states.shape
+-++-#         expert_outputs_list = [
+-++-#             ops.cat([
+-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++-#             ], dim=0) 
+-++-#             for i in range(batch_size)
+-++-#         ]
+-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++-#         return moe_output.squeeze(1)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_prefill(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens = hidden_states.shape[0]
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-        
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-#             selected_token_indices = token_indices[mask]
+-++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++-#             current_states = hidden_states[selected_token_indices]
+-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-#             moe_output = moe_output.index_add(
+-++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++-#             )
+-++-#         return moe_output
+-++-
+-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-#         """
+-++-#         顶层 forward 方法，作为智能分发器。
+-++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-++-#         """
+-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        
+-++-#         # 1. 门控计算 (通用逻辑)
+-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-#         router_logits = self.gate(hidden_states_reshaped)
+-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-
+-++-#         if self.norm_topk_prob:
+-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        
+-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++-        
+-++-#         # 2. 智能分发到最优 MoE 路径
+-++-#         moe_output = None
+-++-#         if not self.training:
+-++-#             if sequence_length == 1:
+-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++-#             else:
+-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++-#         else:
+-++-#             raise NotImplementedError("Training path is not implemented.")
+-++-
+-++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++-        
+-++-#         # 4. 合并 MoE 输出和共享专家输出
+-++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++-        
+-++-#         # 5. 恢复原始形状并返回
+-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++-        
+-++-#         return final_hidden_states, router_logits
+-++-
+-++-# prefill fastest
+-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-#     """
+-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-++-#     """
+-++-#     def __init__(self, config: Qwen2MoeConfig):
+-++-#         super().__init__()
+-++-#         self.num_experts = config.num_experts
+-++-#         self.top_k = config.num_experts_per_tok
+-++-#         self.norm_topk_prob = config.norm_topk_prob
+-++-
+-++-#         # 门控网络
+-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++-#         # 专家列表
+-++-#         self.experts = nn.ModuleList(
+-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++-#         )
+-++-#         # 共享专家
+-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_dispatch(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         """
+-++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-++-#         """
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens, _ = hidden_states.shape
+-++-        
+-++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-#         flat_routing_weights = routing_weights.flatten()
+-++-
+-++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-
+-++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-        
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-            
+-++-#             # 找到所有分配给该专家的 token
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-            
+-++-#             # 使用 mask 选取对应的 token 和权重
+-++-#             current_token_indices = token_indices[mask]
+-++-#             current_routing_weights = flat_routing_weights[mask]
+-++-#             current_hidden_states = hidden_states[current_token_indices]
+-++-            
+-++-#             # 对这些 token 进行批处理
+-++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++-            
+-++-#             # 使用 index_add 将结果精确地加回到对应位置
+-++-#             moe_output = moe_output.index_add(
+-++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-++-#             )
+-++-#         return moe_output
+-++-
+-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-#         """
+-++-#         顶层 forward 方法，作为智能分发器。
+-++-#         """
+-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        
+-++-#         # 1. 门控计算
+-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-#         router_logits = self.gate(hidden_states_reshaped)
+-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-
+-++-#         if self.norm_topk_prob:
+-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        
+-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++-        
+-++-#         # 2. 调用统一的 MoE 计算内核
+-++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-++-
+-++-#         # 3. 统一处理共享专家
+-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++-        
+-++-#         # 4. 合并输出
+-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++-        
+-++-#         # 5. 恢复原始形状并返回
+-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++-        
+-++-#         return final_hidden_states, router_logits
+-++-
+-++-
+-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-#     """
+-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++-#     【最终高性能与高精度版】：
+-++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-++-#     3. 这样实现了速度和准确性的两全其美。
+-++-#     """
+-++-#     def __init__(self, config: Qwen2MoeConfig):
+-++-#         super().__init__()
+-++-#         self.num_experts = config.num_experts
+-++-#         self.top_k = config.num_experts_per_tok
+-++-#         self.norm_topk_prob = config.norm_topk_prob
+-++-
+-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++-#         self.experts = nn.ModuleList(
+-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++-#         )
+-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_decode(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         """
+-++-#         【解码路径】极致优化版：bmm + 高精度累加。
+-++-#         """
+-++-#         original_dtype = hidden_states.dtype
+-++-#         batch_size, _ = hidden_states.shape
+-++-        
+-++-#         expert_outputs_list = [
+-++-#             ops.cat([
+-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++-#             ], dim=0) 
+-++-#             for i in range(batch_size)
+-++-#         ]
+-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++-
+-++-#         # 在 float32 下执行 bmm，得到高精度结果
+-++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++-        
+-++-#         # 将高精度结果转换回原始数据类型
+-++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-++-        
+-++-#         return moe_output
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_prefill(
+-++-#         self, 
+-++-#         hidden_states: mindspore.Tensor, 
+-++-#         selected_experts: mindspore.Tensor, 
+-++-#         routing_weights: mindspore.Tensor
+-++-#     ) -> mindspore.Tensor:
+-++-#         """
+-++-#         【预填充路径】与原始实现一致，结果精确。
+-++-#         """
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens, _ = hidden_states.shape
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-        
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-#             selected_token_indices = token_indices[mask]
+-++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++-#             current_states = hidden_states[selected_token_indices]
+-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-#             moe_output = moe_output.index_add(
+-++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++-#             )
+-++-#         return moe_output
+-++-
+-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-        
+-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-#         router_logits = self.gate(hidden_states_reshaped)
+-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-
+-++-#         if self.norm_topk_prob:
+-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        
+-++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-++-#         # 如果模型主体是 float16，后续再转换
+-++-        
+-++-#         moe_output = None
+-++-#         if not self.training:
+-++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-++-#             # _moe_infer_decode 内部会处理好类型转换
+-++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-++-#             if sequence_length == 1:
+-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++-#             else:
+-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++-#         else:
+-++-#             raise NotImplementedError("Training path is not implemented.")
+-++-
+-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++-        
+-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++-        
+-++-#         return final_hidden_states, router_logits
+-++-    
+-++-
+-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-#     """
+-++-#     【融合版】一个混合专家模块，内置两种推理策略，
+-++-#     由外部全局变量 `Long_Prompt` 控制：
+-++-
+-++-#     - if Long_Prompt is True:  【精度优先模式】
+-++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-++-#       适用于处理长序列，避免误差累积。
+-++-
+-++-#     - if Long_Prompt is False: 【速度优先模式】
+-++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-++-#     """
+-++-#     def __init__(self, config: Qwen2MoeConfig):
+-++-#         super().__init__()
+-++-#         self.num_experts = config.num_experts
+-++-#         self.top_k = config.num_experts_per_tok
+-++-#         self.norm_topk_prob = config.norm_topk_prob
+-++-
+-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++-#         self.experts = nn.ModuleList(
+-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++-#         )
+-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++-
+-++-#     # --- 速度优先模式的辅助函数 ---
+-++-#     @no_grad()
+-++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++-#         original_dtype = hidden_states.dtype
+-++-#         batch_size, _ = hidden_states.shape
+-++-#         expert_outputs_list = [
+-++-#             ops.cat([
+-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++-#             ], dim=0) 
+-++-#             for i in range(batch_size)
+-++-#         ]
+-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++-
+-++-#     @no_grad()
+-++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens, _ = hidden_states.shape
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-#             selected_token_indices = token_indices[mask]
+-++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++-#             current_states = hidden_states[selected_token_indices]
+-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-++-#         return moe_output
+-++-
+-++-#     # --- 精度优先模式的辅助函数 ---
+-++-#     @no_grad()
+-++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++-#         moe_output = ops.zeros_like(hidden_states)
+-++-#         num_tokens, _ = hidden_states.shape
+-++-#         flat_selected_experts = selected_experts.flatten()
+-++-#         flat_routing_weights = routing_weights.flatten()
+-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++-#         active_experts = ops.unique(flat_selected_experts)
+-++-#         for expert_idx_tensor in active_experts:
+-++-#             expert_idx = expert_idx_tensor.item()
+-++-#             expert_layer = self.experts[expert_idx]
+-++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++-#             current_token_indices = token_indices[mask]
+-++-#             current_routing_weights = flat_routing_weights[mask]
+-++-#             current_hidden_states = hidden_states[current_token_indices]
+-++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++-#         return moe_output
+-++-
+-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-++-#         global Long_Prompt
+-++-
+-++-#         # 1. 门控计算 (所有模式通用)
+-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-#         router_logits = self.gate(hidden_states_reshaped)
+-++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++-#         if self.norm_topk_prob:
+-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-        
+-++-#         moe_output = None
+-++-#         if not self.training:
+-++-#             # 根据 Long_Prompt 标志选择模式
+-++-#             if Long_Prompt:
+-++-#                 # --- 精度优先模式 ---
+-++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++-#             else:
+-++-#                 # --- 速度优先模式 ---
+-++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++-#                 if sequence_length == 1:
+-++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++-#                 else:
+-++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++-#         else:
+-++-#             raise NotImplementedError("Training path is not implemented.")
+-++-
+-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++-        
+-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++-        
+-++-#         return final_hidden_states, router_logits
+-++-    
+-++ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++     """
+-++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++ 
+-+++    # @no_grad()
+-+++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++    #     num_tokens, _ = hidden_states.shape
+-+++    #     flat_selected_experts = selected_experts.flatten()
+-+++    #     sorted_expert_indices = flat_selected_experts.argsort()
+-+++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++    #     original_token_indices = sorted_expert_indices // self.top_k
+-+++    #     moe_output = ops.zeros_like(hidden_states)
+-+++    #     current_token_offset = 0
+-+++    #     for i in range(self.num_experts):
+-+++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++    #         if expert_token_count == 0:
+-+++    #             continue
+-+++    #         end_offset = current_token_offset + expert_token_count
+-+++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++    #         current_token_offset += expert_token_count
+-+++    #     return moe_output
+-+++
+-++     @no_grad()
+-++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++-        num_tokens, _ = hidden_states.shape
+-++-        flat_selected_experts = selected_experts.flatten()
+-++-        sorted_expert_indices = flat_selected_experts.argsort()
+-++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++-        original_token_indices = sorted_expert_indices // self.top_k
+-+++        """
+-+++        优化版 MoE prefill (速度优先模式)：
+-+++        - 批量张量化处理同一个 expert 的所有 token
+-+++        - 跳过无 token 的专家
+-+++        - 保持结果完全一致
+-+++        """
+-++         moe_output = ops.zeros_like(hidden_states)
+-++-        current_token_offset = 0
+-++-        for i in range(self.num_experts):
+-++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-++-            if expert_token_count == 0:
+-++-                continue
+-++-            end_offset = current_token_offset + expert_token_count
+-++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++-            current_token_offset += expert_token_count
+-+++
+-+++        flat_selected_experts = selected_experts.flatten()
+-+++        flat_routing_weights = routing_weights.flatten()
+-+++
+-+++        idxs = flat_selected_experts.argsort()
+-+++        sorted_expert_indices = flat_selected_experts[idxs]
+-+++        sorted_token_indices = idxs // self.top_k
+-+++
+-+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-+++
+-+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+++
+-+++        for expert_id in active_experts.tolist():
+-+++            start = int(tokens_per_expert[:expert_id].sum().item())
+-+++            end = start + int(tokens_per_expert[expert_id].item())
+-+++
+-+++            token_idx = sorted_token_indices[start:end]
+-+++            expert_tokens = hidden_states[token_idx]
+-+++
+-+++            expert_out = self.experts[expert_id](expert_tokens)
+-+++
+-+++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-+++
+-+++            moe_output = mindspore.mint.scatter_add(
+-+++                moe_output,
+-+++                0,
+-+++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-+++                scaled_out.to(hidden_states.dtype)
+-+++            )
+-+++
+-++         return moe_output
+-++ 
+-+++
+-++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-++     @no_grad()
+-++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++         
+-++         moe_output = None
+-++-        if Long_Prompt:
+-++-            # --- 精度优先模式 (ACCURACY MODE) ---
+-++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        # if Long_Prompt==0:
+-+++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        # else:
+-+++        #     # --- 速度优先模式 (SPEED MODE) ---
+-+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++        #     if sequence_length == 1:
+-+++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        #     else:
+-+++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++        
+-+++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++        if sequence_length == 1:
+-+++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++         else:
+-++-            # --- 速度优先模式 (SPEED MODE) ---
+-++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++-            if sequence_length == 1:
+-++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++-            else:
+-++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++-        
+-+++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++    
+-++ 
+-++         # 3. 共享专家计算与合并 (所有模式通用)
+-++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++         
+-++         return final_hidden_states, router_logits
+-++ 
+-+++
+-++ class Qwen2MoeDecoderLayer(nn.Module):
+-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-++         super().__init__()
+-++         self.hidden_size = config.hidden_size
+-++         
+-++-        # if Long_Prompt:
+-++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++-        # else:
+-+++        # if Long_Prompt == 2:
+-++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++        # else:
+-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++ 
+-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++ 
+-++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++             )
+-++ 
+-++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++        #     attention_mask,
+-+++        #     sequence_length=sequence_length,
+-+++        #     target_length=target_length,
+-+++        #     dtype=dtype,
+-+++        #     min_dtype=min_dtype,
+-+++        #     cache_position=cache_position,
+-+++        #     batch_size=input_tensor.shape[0],
+-+++        # )
+-+++        #@dwj
+-+++        causal_mask = get_cached_causal_mask_with_cache_position(
+-++             attention_mask,
+-++             sequence_length=sequence_length,
+-++             target_length=target_length,
+-++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-++         """
+-++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-+++        _causal_mask_cache.clear()
+-++ 
+-++         input_ids = kwargs.get("input_ids")
+-++         if input_ids is None and args:
+-++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++ 
+-++         if input_ids is not None:
+-++             prompt_length = input_ids.shape[1]
+-++-            
+-++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-++-                Long_Prompt = True
+-+++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-+++                Long_Prompt = 2
+-+++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-+++                Long_Prompt = 0
+-++             else:
+-++-                Long_Prompt = False
+-+++                Long_Prompt = 1
+-+++
+-++ 
+-++         return super().generate(*args, **kwargs)
+-++     
+-++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++             dtype = self.lm_head.weight.dtype
+-++             min_dtype = float(ops.finfo(dtype).min)
+-++ 
+-++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++            #     attention_mask,
+-+++            #     sequence_length=sequence_length,
+-+++            #     target_length=past_key_values.get_max_length(),
+-+++            #     dtype=dtype,
+-+++            #     min_dtype=min_dtype,
+-+++            #     cache_position=cache_position,
+-+++            #     batch_size=batch_size,
+-+++            # )
+-+++
+-+++            #@dwj
+-+++            attention_mask = get_cached_causal_mask_with_cache_position(
+-++                 attention_mask,
+-++                 sequence_length=sequence_length,
+-++                 target_length=past_key_values.get_max_length(),
+-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++deleted file mode 100644
+-++index 6dfb5b93..00000000
+-++--- a/patches/0001-20251104commit.patch
+-+++++ /dev/null
+-++@@ -1,1272 +0,0 @@
+-++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++-From: Pinoeer-kingxi <13022943007@163.com>
+-++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++-Subject: [PATCH] 20251104commit
+-++-
+-++----
+-++- mindnlp/transformers/cache_utils.py           |  28 +-
+-++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++- 3 files changed, 976 insertions(+), 87 deletions(-)
+-++-
+-++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++-index cadd2e04..02f8d4be 100644
+-++---- a/mindnlp/transformers/cache_utils.py
+-++-+++ b/mindnlp/transformers/cache_utils.py
+-++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++-             # k_out[:, :, cache_position] = key_states
+-++-             # v_out[:, :, cache_position] = value_states
+-++--            if ON_ORANGE_PI:
+-++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++--            else:
+-++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++--
+-++-+            # if ON_ORANGE_PI:
+-++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++-+            # else:
+-++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++-+            if cache_position.ndim > 1:
+-++-+                cache_position = cache_position.flatten()
+-++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++-+                cache_position = cache_position.int()
+-++-+            
+-++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++-+            k_out[:, :, cache_position] = key_states
+-++-+            v_out[:, :, cache_position] = value_states
+-++-+            
+-++-         return k_out, v_out
+-++- 
+-++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++-index c695b944..d8303e45 100644
+-++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++- def rotate_half(x):
+-++-     """Rotates half the hidden dims of the input."""
+-++--    x1 = x[..., : x.shape[-1] // 2]
+-++--    x2 = x[..., x.shape[-1] // 2 :]
+-++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++-+    # x1 = x[..., : x.shape[-1] // 2]
+-++-+    # x2 = x[..., x.shape[-1] // 2 :]
+-++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++-     return ops.cat((-x2, x1), dim=-1)
+-++- 
+-++- 
+-++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++-         if self.training:
+-++-             raise NotImplementedError("Training is not supported yet.")
+-++-         else:
+-++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++--        if self.config.n_shared_experts is not None:
+-++--            y = y + self.shared_experts(identity)
+-++--        return y
+-++-+            # @lwx
+-++-+            if orig_shape[1] == 1:
+-++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++-+                y=y.view(*orig_shape)
+-++-+                if self.config.n_shared_experts is not None:
+-++-+                    y = y + self.shared_experts(identity)
+-++-+                return y
+-++-+            else:
+-++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++-+                if self.config.n_shared_experts is not None:
+-++-+                    y = y + self.shared_experts(identity)
+-++-+                return y
+-++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++-+        # if self.config.n_shared_experts is not None:
+-++-+        #     y = y + self.shared_experts(identity)
+-++-+        # return y
+-++-+        
+-++-+    @no_grad()
+-++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++-+
+-++-+        expert_cache = ops.zeros_like(x)
+-++-+        for i in range(self.num_experts_per_tok):
+-++-+            expert_id = flat_expert_indices[i].item()
+-++-+            weight = flat_expert_weights[i].item()
+-++-+            expert = self.experts[expert_id]
+-++-+            expert_out = expert(x)
+-++-+            expert_cache += expert_out * weight
+-++-+        return expert_cache
+-++- 
+-++-     @no_grad()
+-++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++--        # expert_cache = torch.zeros_like(x)
+-++--        # idxs = flat_expert_indices.argsort()
+-++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++--        # token_idxs = idxs // self.num_experts_per_tok
+-++--        # for i, end_idx in enumerate(tokens_per_expert):
+-++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++--        #     if start_idx == end_idx:
+-++--        #         continue
+-++--        #     expert = self.experts[i]
+-++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++--        #     expert_tokens = x[exp_token_idx]
+-++--        #     expert_out = expert(expert_tokens)
+-++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++--        # return expert_cache
+-++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++-         expert_cache = ops.zeros_like(x)
+-++-         idxs = flat_expert_indices.argsort()
+-++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++-         token_idxs = idxs // self.num_experts_per_tok
+-++-+
+-++-         for i, end_idx in enumerate(tokens_per_expert):
+-++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++-             if start_idx == end_idx:
+-++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++-             expert_out = expert(expert_tokens)
+-++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++-+
+-++-         return expert_cache
+-++-+        
+-++-+    # @no_grad()
+-++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-+    #     # expert_cache = torch.zeros_like(x)
+-++-+    #     # idxs = flat_expert_indices.argsort()
+-++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++-+    #     #     if start_idx == end_idx:
+-++-+    #     #         continue
+-++-+    #     #     expert = self.experts[i]
+-++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++-+    #     #     expert_tokens = x[exp_token_idx]
+-++-+    #     #     expert_out = expert(expert_tokens)
+-++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++-+    #     # return expert_cache
+-++-+    #     expert_cache = ops.zeros_like(x)
+-++-+    #     idxs = flat_expert_indices.argsort()
+-++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-++-+
+-++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++-+    #         if start_idx == end_idx:
+-++-+    #             continue
+-++-+    #         expert = self.experts[i]
+-++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++-+    #         expert_tokens = x[exp_token_idx]
+-++-+    #         expert_out = expert(expert_tokens)
+-++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++-+
+-++-+    #     return expert_cache
+-++-+    # @no_grad()
+-++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++-+    #     expert_cache = ops.zeros_like(x)
+-++-+
+-++-+    #     # 排序保证顺序一致
+-++-+    #     idxs = flat_expert_indices.argsort()
+-++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-++-+
+-++-+    #     # 找出有 token 的专家
+-++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++-+
+-++-+    #     for i in active_experts.tolist():
+-++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++-+    #         end_idx = tokens_per_expert[i]
+-++-+    #         if start_idx == end_idx:  # 没有 token
+-++-+    #             continue
+-++-+
+-++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++-+    #         expert_tokens = x[exp_token_idx]
+-++-+    #         expert_out = self.experts[i](expert_tokens)
+-++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++-+
+-++-+    #         expert_cache = mindspore.mint.scatter_add(
+-++-+    #             expert_cache,
+-++-+    #             0,
+-++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++-+    #             expert_out
+-++-+    #         )
+-++-+
+-++-+    #     return expert_cache
+-++-+
+-++-+
+-++- 
+-++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++- #     """
+-++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++- 
+-++-         # Initialize weights and apply final processing
+-++-         self.post_init()
+-++-+        self.warm_up = False
+-++-+
+-++-+    def warmup_moe_model_deep(self):
+-++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++-+        test_texts = [
+-++-+            "warmup short",
+-++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++-+        ]
+-++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++-+        if tokenizer is None:
+-++-+            from mindnlp.transformers import AutoTokenizer
+-++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++-+            self._warmup_tokenizer = tokenizer
+-++-+
+-++-+        for text in test_texts:
+-++-+            inputs = tokenizer(text, return_tensors="ms")
+-++-+            with mindspore._no_grad():
+-++-+                _ = self(**inputs, use_cache=False)
+-++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++- 
+-++-     def get_input_embeddings(self):
+-++-         return self.model.embed_tokens
+-++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++-         ```"""
+-++-+        if not self.warm_up:
+-++-+            self.warm_up = True
+-++-+            self.warmup_moe_model_deep()
+-++-+
+-++-         output_attentions = (
+-++-             output_attentions
+-++-             if output_attentions is not None
+-++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++-index 3cbf820e..d4c6b651 100644
+-++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++-@@ -18,7 +18,6 @@
+-++- # See the License for the specific language governing permissions and
+-++- # limitations under the License.
+-++- """MindSpore Qwen2MoE model."""
+-++--
+-++- import math
+-++- from typing import List, Optional, Tuple, Union
+-++- 
+-++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++-     TokenClassifierOutput,
+-++- )
+-++- from ...modeling_utils import PreTrainedModel
+-++-+from ...generation import GenerationMixin
+-++- from ....utils import logging
+-++- from .configuration_qwen2_moe import Qwen2MoeConfig
+-++- 
+-++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++-         self.variance_epsilon = eps
+-++- 
+-++-     def forward(self, hidden_states):
+-++-+        # @dwj
+-++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++-+        # @lwx
+-++-+        # if not self.training :
+-++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++-         input_dtype = hidden_states.dtype
+-++-         hidden_states = hidden_states.to(mindspore.float32)
+-++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++-@@ -234,6 +239,8 @@ def rotate_half(x):
+-++-     """Rotates half the hidden dims of the input."""
+-++-     x1 = x[..., : x.shape[-1] // 2]
+-++-     x2 = x[..., x.shape[-1] // 2 :]
+-++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++-     return ops.cat((-x2, x1), dim=-1)
+-++- 
+-++- 
+-++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++-         self.config = config
+-++-         self.hidden_size = config.hidden_size
+-++-         self.intermediate_size = intermediate_size
+-++-+        
+-++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++-         self.act_fn = ACT2FN[config.hidden_act]
+-++- 
+-++-     def forward(self, x):
+-++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++--
+-++- 
+-++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++-+        # @lwx 
+-++-+        # gate_up_output = self.gate_up_proj(x)
+-++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++-+        # return self.down_proj(swiglu_output)
+-++-+
+-++-+    # def forward(self, x):
+-++-+    #     gate_proj_out = self.gate_proj(x)
+-++-+    #     up_proj_out = self.up_proj(x)
+-++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++-+    #     return self.down_proj(swiglu_out)
+-++-+        
+-++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++-     """
+-++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++-         use_cache: bool = False,
+-++-         cache_position: Optional[mindspore.Tensor] = None,
+-++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-+
+-++-+        
+-++-+
+-++-         bsz, q_len, _ = hidden_states.shape
+-++- 
+-++-         query_states = self.q_proj(hidden_states)
+-++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-                     "with a layer index."
+-++-                 )
+-++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-+            if isinstance(past_key_value, StaticCache):
+-++-+                kv_seq_len = key_states.shape[-2]
+-++-+            else:
+-++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++- 
+-++-         if past_key_value is not None:
+-++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++-+            
+-++-+            if isinstance(past_key_value, StaticCache):
+-++-+                kv_seq_len = key_states.shape[-2]
+-++- 
+-++-         # repeat k/v heads if n_kv_heads < n_heads
+-++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++--
+-++-+        
+-++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++- 
+-++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++--            raise ValueError(
+-++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++--                f" {attn_weights.shape}"
+-++--            )
+-++--
+-++--        if attention_mask is not None:  # no matter the length, we just slice it
+-++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++-+        if attention_mask is not None:
+-++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++-             attn_weights = attn_weights + causal_mask
+-++- 
+-++-         # upcast attention to fp32
+-++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++- 
+-++-         attn_output = self.o_proj(attn_output)
+-++--
+-++-+        # @lwx
+-++-+        
+-++-+        # max_seq_len = self.max_position_embeddings  # 2048
+-++-+
+-++-+        # if attention_mask is not None:
+-++-+        #     # attention_mask: [B, 1, Sq, Sk]
+-++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++-+
+-++-+        #     # pad 到 [max_seq_len, max_seq_len]
+-++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++-+        #     global_attention_mask = padded_mask
+-++-+        # else:
+-++-+        #     global_attention_mask = None
+-++-+
+-++-+
+-++-+        # sparse_mode=3
+-++-+        # attn_output = mindspore.ops.flash_attention_score(
+-++-+        #     query=query_states,
+-++-+        #     key=key_states,
+-++-+        #     value=value_states,
+-++-+        #     real_shift=None,
+-++-+        #     padding_mask=None,
+-++-+
+-++-+        #     head_num=self.num_heads,
+-++-+        #     attn_mask=global_attention_mask,
+-++-+        #     keep_prob=1.0 - self.attention_dropout,
+-++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-+        #     input_layout="BNSD",
+-++-+        #     pre_tokens=2147483647,
+-++-+        #     next_tokens=2147483647,
+-++-+        #     inner_precise=0,
+-++-+        #     drop_mask=None,
+-++-+        #     prefix=None,
+-++-+        #     actual_seq_qlen=None,
+-++-+        #     actual_seq_kvlen=None,
+-++-+        #     sparse_mode=sparse_mode,
+-++-+        # )
+-++-         if not output_attentions:
+-++-             attn_weights = None
+-++- 
+-++-         return attn_output, attn_weights, past_key_value
+-++- 
+-++- 
+-++-+class Qwen2MoeFlashAttention(nn.Module):
+-++-+    """
+-++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++-+
+-++-+    关键改动:
+-++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++-+       直接传入原始的 key 和 value 张量效率更高。
+-++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++-+    """
+-++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++-+        super().__init__()
+-++-+        self.config = config
+-++-+        self.layer_idx = layer_idx
+-++-+        self.hidden_size = config.hidden_size
+-++-+        self.num_heads = config.num_attention_heads
+-++-+        self.head_dim = self.hidden_size // self.num_heads
+-++-+        self.num_key_value_heads = config.num_key_value_heads
+-++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++-+        self.max_position_embeddings = config.max_position_embeddings
+-++-+        self.rope_theta = config.rope_theta
+-++-+        self.attention_dropout = config.attention_dropout
+-++-+
+-++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++-+            raise ValueError(
+-++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++-+            )
+-++-+
+-++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++-+
+-++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++-+            self.head_dim,
+-++-+            max_position_embeddings=self.max_position_embeddings,
+-++-+            base=self.rope_theta,
+-++-+        )
+-++-+
+-++-+    def forward(
+-++-+        self,
+-++-+        hidden_states: mindspore.Tensor,
+-++-+        attention_mask: Optional[mindspore.Tensor] = None,
+-++-+        position_ids: Optional[mindspore.Tensor] = None,
+-++-+        past_key_value: Optional[Cache] = None,
+-++-+        output_attentions: bool = False,
+-++-+        use_cache: bool = False,
+-++-+        cache_position: Optional[mindspore.Tensor] = None,
+-++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-+
+-++-+        bsz, q_len, _ = hidden_states.shape
+-++-+
+-++-+        # 1. 线性投射 Q, K, V
+-++-+        query_states = self.q_proj(hidden_states)
+-++-+        key_states = self.k_proj(hidden_states)
+-++-+        value_states = self.v_proj(hidden_states)
+-++-+
+-++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+
+-++-+        # 3. RoPE 旋转位置编码
+-++-+        kv_seq_len = key_states.shape[-2]
+-++-+        if past_key_value is not None:
+-++-+            if self.layer_idx is None:
+-++-+                raise ValueError(
+-++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-+                    "with a layer index."
+-++-+                )
+-++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++-+                if cache_position.shape[0] == 1:
+-++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++-+                    kv_seq_len = past_seen_tokens + 1
+-++-+                else:
+-++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++-+            else:
+-++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-+
+-++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-+
+-++-+        # 4. KV 缓存更新
+-++-+        if past_key_value is not None:
+-++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-+            key_states, value_states = past_key_value.update(
+-++-+                key_states, value_states, self.layer_idx, cache_kwargs
+-++-+            )
+-++-+            
+-++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++-+                if cache_position.shape[0] == 1:
+-++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++-+                    kv_seq_len = key_states.shape[-2]
+-++-+
+-++-+        # 5. [重要] 准备 Attention Mask
+-++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++-+        fa_attention_mask = None
+-++-+        if attention_mask is not None:
+-++-+            # 截取与当前key长度匹配的部分
+-++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++-+            fa_attention_mask = (mask_slice != 0)
+-++-+
+-++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++-+        input_dtype = query_states.dtype
+-++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++-+            query_states = query_states.to(mindspore.float16)
+-++-+            key_states = key_states.to(mindspore.float16)
+-++-+            value_states = value_states.to(mindspore.float16)
+-++-+
+-++-+        # 6. [核心] 调用 flash_attention_score 算子
+-++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++-+        attn_output = mindspore.ops.flash_attention_score(
+-++-+            query=query_states,
+-++-+            key=key_states,
+-++-+            value=value_states,
+-++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-++-+            attn_mask=fa_attention_mask,
+-++-+            keep_prob=1.0 - self.attention_dropout,
+-++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-+            input_layout="BNSD",
+-++-+            sparse_mode=0 # 使用 defaultMask 模式
+-++-+        )
+-++-+
+-++-+        # 恢复原始数据类型
+-++-+        attn_output = attn_output.to(input_dtype)
+-++-+
+-++-+        # 7. 调整输出形状
+-++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-+        attn_output = self.o_proj(attn_output)
+-++-+
+-++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-++-+        attn_weights = None
+-++-+        if output_attentions:
+-++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++-+
+-++-+        return attn_output, attn_weights, past_key_value
+-++-+
+-++-+    # def forward(
+-++-+    #     self,
+-++-+    #     hidden_states: mindspore.Tensor,
+-++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-++-+    #     past_key_value: Optional[Cache] = None,
+-++-+    #     output_attentions: bool = False,
+-++-+    #     use_cache: bool = False,
+-++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-+
+-++-+    #     bsz, q_len, _ = hidden_states.shape
+-++-+
+-++-+    #     # 1. 线性投射 Q, K, V
+-++-+    #     query_states = self.q_proj(hidden_states)
+-++-+    #     key_states = self.k_proj(hidden_states)
+-++-+    #     value_states = self.v_proj(hidden_states)
+-++-+
+-++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+
+-++-+    #     # 3. RoPE 旋转位置编码
+-++-+    #     kv_seq_len = key_states.shape[-2]
+-++-+    #     if past_key_value is not None:
+-++-+    #         if self.layer_idx is None:
+-++-+    #             raise ValueError(
+-++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-+    #                 "with a layer index."
+-++-+    #             )
+-++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-+
+-++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-+
+-++-+    #     # 4. KV 缓存更新
+-++-+    #     if past_key_value is not None:
+-++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-+    #         key_states, value_states = past_key_value.update(
+-++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++-+    #         )
+-++-+
+-++-+    #     # 5. 准备 Attention Mask
+-++-+    #     fa_attention_mask = None
+-++-+    #     if attention_mask is not None:
+-++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-+    #         fa_attention_mask = (mask_slice != 0)
+-++-+
+-++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++-+    #     input_dtype = query_states.dtype
+-++-+
+-++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-++-+    #         query=query_states,
+-++-+    #         key=key_states,
+-++-+    #         value=value_states,
+-++-+    #         head_num=self.num_heads,
+-++-+    #         attn_mask=fa_attention_mask,
+-++-+    #         keep_prob=1.0 - self.attention_dropout,
+-++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-+    #         input_layout="BNSD",
+-++-+    #         sparse_mode=0,
+-++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++-+    #         inner_precise=1
+-++-+    #     )
+-++-+
+-++-+    #     # 恢复原始数据类型
+-++-+    #     attn_output = attn_output.to(input_dtype)
+-++-+
+-++-+    #     # 7. 调整输出形状
+-++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-+    #     attn_output = self.o_proj(attn_output)
+-++-+
+-++-+    #     attn_weights = None
+-++-+    #     if output_attentions:
+-++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++-+
+-++-+    #     return attn_output, attn_weights, past_key_value
+-++-+
+-++-+    # def forward(
+-++-+    #     self,
+-++-+    #     hidden_states: mindspore.Tensor,
+-++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-++-+    #     past_key_value: Optional[Cache] = None,
+-++-+    #     output_attentions: bool = False,
+-++-+    #     use_cache: bool = False,
+-++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-+
+-++-+    #     bsz, q_len, _ = hidden_states.shape
+-++-+
+-++-+    #     query_states = self.q_proj(hidden_states)
+-++-+    #     key_states = self.k_proj(hidden_states)
+-++-+    #     value_states = self.v_proj(hidden_states)
+-++-+
+-++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-+
+-++-+    #     kv_seq_len = key_states.shape[-2]
+-++-+    #     if past_key_value is not None:
+-++-+    #         if self.layer_idx is None:
+-++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-+
+-++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-+
+-++-+    #     if past_key_value is not None:
+-++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++-+    #         key_states, value_states = past_key_value.update(
+-++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++-+    #         )
+-++-+
+-++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++-+
+-++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-++-+    #     # <--- 修改结束 ---
+-++-+
+-++-+    #     fa_attention_mask = None
+-++-+    #     if attention_mask is not None:
+-++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++-+    #         fa_attention_mask = (mask_slice != 0)
+-++-+
+-++-+    #     input_dtype = query_states.dtype
+-++-+
+-++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-++-+    #         key=key_states,
+-++-+    #         value=value_states,
+-++-+    #         head_num=self.num_heads,
+-++-+    #         attn_mask=fa_attention_mask,
+-++-+    #         keep_prob=1.0 - self.attention_dropout,
+-++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++-+    #         input_layout="BNSD",
+-++-+    #         sparse_mode=0,
+-++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-++-+    #     )
+-++-+
+-++-+    #     attn_output = attn_output.to(input_dtype)
+-++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-+    #     attn_output = self.o_proj(attn_output)
+-++-+
+-++-+    #     attn_weights = None
+-++-+    #     if output_attentions:
+-++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++-+
+-++-+    #     return attn_output, attn_weights, past_key_value
+-++-+
+-++- QWEN2MOE_ATTENTION_CLASSES = {
+-++-     "eager": Qwen2MoeAttention,
+-++-+    "flash-attention": Qwen2MoeFlashAttention,
+-++- }
+-++- 
+-++- 
+-++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++- 
+-++-+    #@dwj
+-++-+    # 只遍历激活的专家，而非全部专家
+-++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++--        hidden_states = hidden_states.view(-1, hidden_dim)
+-++--        # router_logits: (batch * sequence_length, n_experts)
+-++--        router_logits = self.gate(hidden_states)
+-++--
+-++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++--        if self.norm_topk_prob:
+-++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++--        # we cast back to the input dtype
+-++--        routing_weights = routing_weights.to(hidden_states.dtype)
+-++--
+-++--        final_hidden_states = ops.zeros(
+-++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++--        )
+-++--
+-++--        # One hot encode the selected experts to create an expert mask
+-++--        # this will be used to easily index which expert is going to be sollicitated
+-++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++--
+-++--        # Loop over all available experts in the model and perform the computation on each expert
+-++--        for expert_idx in range(self.num_experts):
+-++--            expert_layer = self.experts[expert_idx]
+-++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++--
+-++--            # Index the correct hidden states and compute the expert hidden state for
+-++--            # the current expert. We need to make sure to multiply the output hidden
+-++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++--            if 0 not in idx.shape:
+-++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++--
+-++--                # However `index_add_` only support torch tensors for indexing so we'll use
+-++--                # the `top_x` tensor here.
+-++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++--
+-++--        shared_expert_output = self.shared_expert(hidden_states)
+-++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++--
+-++--        final_hidden_states = final_hidden_states + shared_expert_output
+-++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++-+            num_tokens = hidden_states_reshaped.shape[0]
+-++-+
+-++-+            router_logits = self.gate(hidden_states_reshaped)
+-++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++-+
+-++-+            if self.norm_topk_prob:
+-++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-++-+            
+-++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++-+            flat_selected_experts = selected_experts.flatten()
+-++-+            
+-++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++-+            token_indices = broadcasted_token_indices.flatten()
+-++-+            
+-++-+            active_experts = ops.unique(flat_selected_experts)
+-++-+            
+-++-+            for expert_idx_tensor in active_experts:
+-++-+                expert_idx = expert_idx_tensor.item()
+-++-+                expert_layer = self.experts[expert_idx]
+-++-+                
+-++-+                mask = (flat_selected_experts == expert_idx_tensor)
+-++-+                selected_token_indices = token_indices[mask]
+-++-+                selected_routing_weights = routing_weights.flatten()[mask]
+-++-+                
+-++-+                current_states = hidden_states_reshaped[selected_token_indices]
+-++-+                
+-++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++-+                
+-++-+                final_hidden_states = final_hidden_states.index_add(
+-++-+                    dim=0,
+-++-+                    index=selected_token_indices,
+-++-+                    source=expert_output.to(hidden_states.dtype)
+-++-+                )
+-++-+            
+-++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++- 
+-++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++--        return final_hidden_states, router_logits
+-++-+            final_hidden_states = final_hidden_states + shared_expert_output
+-++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++-+            
+-++-+            return final_hidden_states, router_logits
+-++- 
+-++- 
+-++- class Qwen2MoeDecoderLayer(nn.Module):
+-++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++- 
+-++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++- 
+-++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++-+
+-++-         if (layer_idx not in config.mlp_only_layers) and (
+-++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++-         ):
+-++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++-     _skip_keys_device_placement = "past_key_values"
+-++-     _supports_cache_class = True
+-++-+#lwx
+-++-+    # _supports_static_cache = True
+-++- 
+-++-     def _init_weights(self, module):
+-++-         std = self.config.initializer_range
+-++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++-         return causal_mask
+-++- 
+-++- 
+-++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++-     _tied_weights_keys = ["lm_head.weight"]
+-++- 
+-++-     def __init__(self, config):
+-++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++-         self.num_experts_per_tok = config.num_experts_per_tok
+-++-         # Initialize weights and apply final processing
+-++-         self.post_init()
+-++-+        # @lwx
+-++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++-+        #     self.generation_config.cache_implementation = "static"
+-++-+        self._warmed_up = False
+-++-+
+-++-+    def warmup_moe_model(self):
+-++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++-+        test_texts = [
+-++-+            "warmup short",
+-++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++-+        ]
+-++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++-+        if tokenizer is None:
+-++-+            from mindnlp.transformers import AutoTokenizer
+-++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++-+            self._warmup_tokenizer = tokenizer
+-++-+
+-++-+        for text in test_texts:
+-++-+            inputs = tokenizer(text, return_tensors="ms")
+-++-+            with mindspore._no_grad():
+-++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++- 
+-++-     def get_input_embeddings(self):
+-++-         return self.model.embed_tokens
+-++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++-         ```"""
+-++-+        if not self._warmed_up:
+-++-+            self._warmed_up = True
+-++-+            self.warmup_moe_model()
+-++- 
+-++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++-         output_router_logits = (
+-++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++-             }
+-++-         )
+-++-         return model_inputs
+-++-+# @lwx
+-++-+    # def _decode_one_tokens_logits(
+-++-+    #     self,
+-++-+    #     cur_token: mindspore.Tensor,
+-++-+    #     input_pos: Optional[mindspore.Tensor],
+-++-+    #     cache_position: mindspore.Tensor,
+-++-+    #     past_key_values: StaticCache,
+-++-+    # ) -> mindspore.Tensor:
+-++-+    #     """
+-++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++-+        
+-++-+    #     Args:
+-++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++-+    #         input_pos: 输入位置信息，可选
+-++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++-+            
+-++-+    #     Returns:
+-++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++-+    #     """
+-++-+    #     # 调用JIT编译的版本
+-++-+    #     return self.get_decode_one_tokens_logits(
+-++-+    #         cur_token=cur_token,
+-++-+    #         input_pos=input_pos,
+-++-+    #         cache_position=cache_position,
+-++-+    #         past_key_values=past_key_values,
+-++-+    #     )
+-++-+    
+-++-+    # @mindspore.jit(jit_level='O1')
+-++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++-+    #     """
+-++-+    #     JIT编译的函数，用于高效的单token解码
+-++-+    #     使用JIT编译优化以支持静态shape和高效执行
+-++-+        
+-++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++-+    #     """
+-++-+    #     outputs = self.model.forward(
+-++-+    #         input_ids=cur_token,
+-++-+    #         position_ids=input_pos,
+-++-+    #         cache_position=cache_position,
+-++-+    #         past_key_values=past_key_values,
+-++-+    #         use_cache=True,
+-++-+    #         return_dict=False,
+-++-+    #     )
+-++-+        
+-++-+    #     hidden_states = outputs[0]
+-++-+    #     logits = self.lm_head.forward(hidden_states)
+-++-+    #     logits = logits.float()
+-++-+        
+-++-+    #     return logits[:, -1, :]
+-++-+
+-++-+    # def _sample(
+-++-+    #     self,
+-++-+    #     input_ids: mindspore.Tensor,
+-++-+    #     logits_processor,
+-++-+    #     stopping_criteria,
+-++-+    #     generation_config,
+-++-+    #     synced_devices: bool,
+-++-+    #     streamer=None,
+-++-+    #     logits_warper=None,
+-++-+    #     **model_kwargs,
+-++-+    # ):
+-++-+    #     """
+-++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++-+    #     """
+-++-+    #     from ...generation.logits_process import LogitsProcessorList
+-++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++-+    #     from mindnlp.core import nn, ops, no_grad
+-++-+    #     import numpy as np
+-++-+        
+-++-+    #     # 检查是否使用 StaticCache
+-++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++-+    #     # 否则，直接调用父类方法
+-++-+    #     past_key_values = model_kwargs.get("past_key_values")
+-++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++-+
+-++-+    #     if not isinstance(past_key_values, StaticCache):
+-++-+    #         # 不使用 StaticCache，直接调用父类方法
+-++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++-+    #         return super()._sample(
+-++-+    #             input_ids=input_ids,
+-++-+    #             logits_processor=logits_processor,
+-++-+    #             stopping_criteria=stopping_criteria,
+-++-+    #             generation_config=generation_config,
+-++-+    #             synced_devices=synced_devices,
+-++-+    #             streamer=streamer,
+-++-+    #             logits_warper=logits_warper,
+-++-+    #             **model_kwargs,
+-++-+    #         )
+-++-+        
+-++-+    #     # 使用 StaticCache，进入自定义循环
+-++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++-+    #     pad_token_id = generation_config._pad_token_tensor
+-++-+    #     output_attentions = generation_config.output_attentions
+-++-+    #     output_hidden_states = generation_config.output_hidden_states
+-++-+    #     output_scores = generation_config.output_scores
+-++-+    #     output_logits = generation_config.output_logits
+-++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++-+    #     max_length = generation_config.max_length
+-++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++-+    #     do_sample = generation_config.do_sample
+-++-+        
+-++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++-+    #         raise ValueError(
+-++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++-+    #             f"{logits_warper})."
+-++-+    #         )
+-++-+        
+-++-+    #     # init attention / hidden states / scores tuples
+-++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++-+        
+-++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++-+    #         encoder_hidden_states = (
+-++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++-+    #         )
+-++-+        
+-++-+    #     # keep track of which sequences are already finished
+-++-+    #     batch_size, cur_len = input_ids.shape
+-++-+    #     this_peer_finished = False
+-++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++-+        
+-++-+    #     time_record = []
+-++-+    #     from ....utils.testing_utils import parse_flag_from_env
+-++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++-+        
+-++-+    #     while self._has_unfinished_sequences(
+-++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++-+    #     ):
+-++-+    #         if _record_time:
+-++-+    #             import time as time_module
+-++-+    #             infer_start = time_module.time()
+-++-+            
+-++-+    #         # prepare model inputs
+-++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++-+            
+-++-+    #         # prepare variable output controls
+-++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++-+            
+-++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++-+    #         cur_cache_position = model_inputs.get("cache_position")
+-++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++-+    #         cur_input_ids = model_inputs.get("input_ids")
+-++-+            
+-++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++-+    #             cur_cache_position is not None and 
+-++-+    #             len(cur_cache_position.shape) > 0 and
+-++-+    #             cur_cache_position.shape[0] == 1 and
+-++-+    #             cur_input_ids is not None and
+-++-+    #             cur_input_ids.shape[1] == 1):
+-++-+    #             # 使用 JIT 优化的单 token 解码
+-++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++-+    #             if not hasattr(self, '_jit_used'):
+-++-+    #                 self._jit_used = False
+-++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++-+                
+-++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++-+    #                 cur_token=cur_input_ids,
+-++-+    #                 input_pos=model_inputs.get("position_ids"),
+-++-+    #                 cache_position=cur_cache_position,
+-++-+    #                 past_key_values=cur_past_key_values,
+-++-+    #             )
+-++-+                
+-++-+    #             # 标记已使用JIT（用于后续判断）
+-++-+    #             if not self._jit_used:
+-++-+    #                 self._jit_used = True
+-++-+                
+-++-+    #             # 构造兼容的输出对象
+-++-+    #             class JitOptimizedOutput:
+-++-+    #                 def __init__(self, logits, config):
+-++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++-+    #                     self.config = config
+-++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++-+    #                     self.cross_attentions = None
+-++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++-+                
+-++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++-+    #         else:
+-++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++-+    #             outputs = self(**model_inputs, return_dict=True)
+-++-+            
+-++-+    #         if synced_devices and this_peer_finished:
+-++-+    #             continue
+-++-+            
+-++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++-+    #         next_token_logits = outputs.logits[:, -1, :]
+-++-+            
+-++-+    #         # pre-process distribution
+-++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++-+    #         if do_sample:
+-++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++-+            
+-++-+    #         # Store scores, attentions and hidden_states when required
+-++-+    #         if return_dict_in_generate:
+-++-+    #             if output_scores:
+-++-+    #                 scores += (next_token_scores,)
+-++-+    #             if output_logits:
+-++-+    #                 raw_logits += (next_token_logits,)
+-++-+    #             if output_attentions:
+-++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++-+    #                 if self.config.is_encoder_decoder:
+-++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++-+                
+-++-+    #             if output_hidden_states:
+-++-+    #                 hidden = (
+-++-+    #                     outputs.decoder_hidden_states
+-++-+    #                     if self.config.is_encoder_decoder
+-++-+    #                     else outputs.hidden_states
+-++-+    #                 )
+-++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++-+            
+-++-+    #         # token selection
+-++-+    #         if do_sample:
+-++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++-+    #         else:
+-++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++-+            
+-++-+    #         # finished sentences should have their next token be a padding token
+-++-+    #         if has_eos_stopping_criteria:
+-++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++-+            
+-++-+    #         # update generated ids, model inputs, and length for next step
+-++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++-+    #         if streamer is not None:
+-++-+    #             streamer.put(next_tokens)
+-++-+            
+-++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++-+    #             outputs,
+-++-+    #             model_kwargs,
+-++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++-+    #         )
+-++-+            
+-++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++-+    #         cur_len += 1
+-++-+            
+-++-+    #         if _record_time:
+-++-+    #             import time as time_module
+-++-+    #             infer_stop = time_module.time()
+-++-+    #             time_record.append(infer_stop - infer_start)
+-++-+            
+-++-+    #         del outputs
+-++-+        
+-++-+    #     average_infer_time = None
+-++-+    #     if time_record:
+-++-+    #         if len(time_record) > 1:
+-++-+    #             time_record.pop(0)
+-++-+    #         average_infer_time = sum(time_record) / len(time_record)
+-++-+    #         print(f'average inference time is: {average_infer_time}')
+-++-+    #         print(f'inference time record: {time_record}')
+-++-+        
+-++-+    #     if streamer is not None:
+-++-+    #         streamer.end()
+-++-+        
+-++-+    #     # 简单判断：打印是否使用了JIT路径
+-++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++-+    #     else:
+-++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++-+        
+-++-+    #     if return_dict_in_generate:
+-++-+    #         if self.config.is_encoder_decoder:
+-++-+    #             return GenerateEncoderDecoderOutput(
+-++-+    #                 sequences=input_ids,
+-++-+    #                 scores=scores,
+-++-+    #                 logits=raw_logits,
+-++-+    #                 encoder_attentions=encoder_attentions,
+-++-+    #                 encoder_hidden_states=encoder_hidden_states,
+-++-+    #                 decoder_attentions=decoder_attentions,
+-++-+    #                 cross_attentions=cross_attentions,
+-++-+    #                 decoder_hidden_states=decoder_hidden_states,
+-++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++-+    #                 average_infer_time=average_infer_time
+-++-+    #             )
+-++-+    #         else:
+-++-+    #             return GenerateDecoderOnlyOutput(
+-++-+    #                 sequences=input_ids,
+-++-+    #                 scores=scores,
+-++-+    #                 logits=raw_logits,
+-++-+    #                 attentions=decoder_attentions,
+-++-+    #                 hidden_states=decoder_hidden_states,
+-++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++-+    #                 average_infer_time=average_infer_time
+-++-+    #             )
+-++-+    #     else:
+-++-+    #         return input_ids
+-++-+            
+-++-+    # def _prepare_cache_for_generation(
+-++-+    #     self,
+-++-+    #     generation_config,
+-++-+    #     model_kwargs,
+-++-+    #     assistant_model,
+-++-+    #     batch_size,
+-++-+    #     max_cache_length,
+-++-+    # ):
+-++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++-+    #         generation_config.cache_implementation = "static"
+-++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++-+        
+-++-+    #     if generation_config.cache_implementation == "static":
+-++-+    #         base_required_from_max_length = generation_config.max_length + 1
+-++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++-+    #         min_cache_size = 50
+-++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++-+    #         else:
+-++-+    #             max_cache_length = max(base_required, min_cache_size)
+-++-+            
+-++-+    #         original_max_cache_length = max_cache_length
+-++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++-+            
+-++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++-+    #             if max_cache_length > self.config.max_position_embeddings:
+-++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++-+        
+-++-+    #     result = super()._prepare_cache_for_generation(
+-++-+    #         generation_config=generation_config,
+-++-+    #         model_kwargs=model_kwargs,
+-++-+    #         assistant_model=assistant_model,
+-++-+    #         batch_size=batch_size,
+-++-+    #         max_cache_length=max_cache_length,
+-++-+    #     )
+-++-+        
+-++-+    #     if generation_config.cache_implementation == "static":
+-++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++-+    #         created_cache = model_kwargs.get(cache_name)
+-++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++-+        
+-++-+    #     return result
+-++-+
+-++-+
+-++-+
+-++- 
+-++- 
+-++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++--- 
+-++-2.27.0
+-++-
+-++-- 
+-++2.27.0
+-++
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+deleted file mode 100644
+index 80906633..00000000
+--- a/patches/0006-20251107002commit.patch
++++ /dev/null
+@@ -1,7931 +0,0 @@
+-From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Fri, 7 Nov 2025 12:06:32 +0800
+-Subject: [PATCH 6/8] 20251107002commit
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |  122 +-
+- patches/0001-20251104commit.patch             |    2 +-
+- patches/0002-20251106commit.patch             |    2 +-
+- patches/0003-20261106secondcommit.patch       |    2 +-
+- patches/0004-20251106change.patch             |    2 +-
+- patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
+- 6 files changed, 7773 insertions(+), 64 deletions(-)
+- create mode 100644 patches/0005-20251107001commit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index 8831e4b7..e7e1c053 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
+-     #         expert_out = expert(x)
+-     #         expert_cache += expert_out * weight
+-     #     return expert_cache
+--
+--    # @no_grad()
+--    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+--    #     # x 的 shape: (1, hidden_size)
+--    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+--    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+--
+--    #     # 1. 收集所有需要的专家层
+--    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+--    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+--
+--    #     # 2. 并行计算所有专家的输出
+--    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+--    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+--    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+--    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+--
+--    #     # 3. 使用矩阵乘法进行加权求和
+--    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+--    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+--    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+--    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+    
+-+    @no_grad()
+-+    dwj
+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+        # x 的 shape: (1, hidden_size)
+-+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+
+-+        # 1. 收集所有需要的专家层
+-+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+
+-+        # 2. 并行计算所有专家的输出
+-+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+
+-+        # 3. 使用矩阵乘法进行加权求和
+-+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-         
+--    #     return final_output
+-+        return final_output
+- 
+- 
+-     # @no_grad()
+-@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
+- 
+-         return expert_cache
+- # 放置在 DeepseekMoE 类中
+--    @no_grad()
+--    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+--        """
+--        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+--
+--        Args:
+--            x (Tensor): 输入张量, shape: (1, hidden_size)
+--            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+--            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+--        """
+--        top_k, _ = flat_expert_weights.shape
+--        hidden_size = x.shape[-1]
+--
+--        # 1. 将所有专家的权重堆叠起来
+--        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+--        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+--        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-+    # @no_grad()
+-+    # #lwx 20251107
+-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     """
+-+    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-+
+-+    #     Args:
+-+    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+-+    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-+    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-+    #     """
+-+    #     top_k, _ = flat_expert_weights.shape
+-+    #     hidden_size = x.shape[-1]
+-+
+-+    #     # 1. 将所有专家的权重堆叠起来
+-+    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-+    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-+    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-         
+--        # 2. "收集" 所需的专家权重
+--        selected_gate_w = stacked_gate_w[flat_expert_indices]
+--        selected_up_w = stacked_up_w[flat_expert_indices]
+--        selected_down_w = stacked_down_w[flat_expert_indices]
+-+    #     # 2. "收集" 所需的专家权重
+-+    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+-+    #     selected_up_w = stacked_up_w[flat_expert_indices]
+-+    #     selected_down_w = stacked_down_w[flat_expert_indices]
+-         
+--        # 3. 准备输入
+--        x_expanded = x.expand((top_k, 1, hidden_size))
+-+    #     # 3. 准备输入
+-+    #     x_expanded = x.expand((top_k, 1, hidden_size))
+-         
+--        # 4. 并行计算 gate_proj 和 up_proj
+--        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+--        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-+    #     # 4. 并行计算 gate_proj 和 up_proj
+-+    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-+    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+- 
+--        # 5. 计算中间状态
+--        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-+    #     # 5. 计算中间状态
+-+    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-         
+--        # 6. 并行计算 down_proj
+--        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+--        #    --- [FIX] ---
+--        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+--        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+--        #    --- [FIX END] ---
+-+    #     # 6. 并行计算 down_proj
+-+    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-+    #     #    --- [FIX] ---
+-+    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-+    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-+    #     #    --- [FIX END] ---
+-         
+--        # 7. 根据路由权重进行加权求和
+--        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-+    #     # 7. 根据路由权重进行加权求和
+-+    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+- 
+--        return weighted_sum
+-+    #     return weighted_sum
+- 
+- 
+- 
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-index 0a0ef2d7..2842180e 100644
+---- a/patches/0001-20251104commit.patch
+-+++ b/patches/0001-20251104commit.patch
+-@@ -1,7 +1,7 @@
+- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Tue, 4 Nov 2025 09:11:51 +0800
+--Subject: [PATCH 1/4] 20251104commit
+-+Subject: [PATCH 1/5] 20251104commit
+- 
+- ---
+-  mindnlp/transformers/cache_utils.py           |  28 +-
+-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-index 5185270c..c6cd8757 100644
+---- a/patches/0002-20251106commit.patch
+-+++ b/patches/0002-20251106commit.patch
+-@@ -1,7 +1,7 @@
+- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 09:20:38 +0800
+--Subject: [PATCH 2/4] 20251106commit
+-+Subject: [PATCH 2/5] 20251106commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-index 3e05f821..601960c9 100644
+---- a/patches/0003-20261106secondcommit.patch
+-+++ b/patches/0003-20261106secondcommit.patch
+-@@ -1,7 +1,7 @@
+- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 14:54:37 +0800
+--Subject: [PATCH 3/4] 20261106secondcommit
+-+Subject: [PATCH 3/5] 20261106secondcommit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-index 88a1aef4..8976f10b 100644
+---- a/patches/0004-20251106change.patch
+-+++ b/patches/0004-20251106change.patch
+-@@ -1,7 +1,7 @@
+- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 15:48:09 +0800
+--Subject: [PATCH 4/4] 20251106change
+-+Subject: [PATCH 4/5] 20251106change
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-new file mode 100644
+-index 00000000..8d9032be
+---- /dev/null
+-+++ b/patches/0005-20251107001commit.patch
+-@@ -0,0 +1,7707 @@
+-+From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Fri, 7 Nov 2025 11:48:18 +0800
+-+Subject: [PATCH 5/5] 20251107001commit
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |   91 +-
+-+ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+-+ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+-+ patches/0001-20251104commit.patch             |    2 +-
+-+ patches/0002-20251106commit.patch             |    2 +-
+-+ patches/0003-20261106secondcommit.patch       |    2 +-
+-+ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+-+ 7 files changed, 7577 insertions(+), 30 deletions(-)
+-+ create mode 100644 patches/0004-20251106change.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index 0546f318..8831e4b7 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+-+     #         expert_cache += expert_out * weight
+-+     #     return expert_cache
+-+ 
+-+-    @no_grad()
+-+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+-        # x 的 shape: (1, hidden_size)
+-+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+-
+-+-        # 1. 收集所有需要的专家层
+-+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+-
+-+-        # 2. 并行计算所有专家的输出
+-+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+-        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+-
+-+-        # 3. 使用矩阵乘法进行加权求和
+-+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+-        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++    # @no_grad()
+-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     # x 的 shape: (1, hidden_size)
+-++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++
+-++    #     # 1. 收集所有需要的专家层
+-++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++
+-++    #     # 2. 并行计算所有专家的输出
+-++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++
+-++    #     # 3. 使用矩阵乘法进行加权求和
+-++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+         
+-+-        return final_output
+-++    #     return final_output
+-+ 
+-+ 
+-+     # @no_grad()
+-+@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+-+             )
+-+ 
+-+         return expert_cache
+-++# 放置在 DeepseekMoE 类中
+-++    @no_grad()
+-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++        """
+-++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-++
+-++        Args:
+-++            x (Tensor): 输入张量, shape: (1, hidden_size)
+-++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-++        """
+-++        top_k, _ = flat_expert_weights.shape
+-++        hidden_size = x.shape[-1]
+-++
+-++        # 1. 将所有专家的权重堆叠起来
+-++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-++        
+-++        # 2. "收集" 所需的专家权重
+-++        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-++        selected_up_w = stacked_up_w[flat_expert_indices]
+-++        selected_down_w = stacked_down_w[flat_expert_indices]
+-++        
+-++        # 3. 准备输入
+-++        x_expanded = x.expand((top_k, 1, hidden_size))
+-++        
+-++        # 4. 并行计算 gate_proj 和 up_proj
+-++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-++
+-++        # 5. 计算中间状态
+-++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-++        
+-++        # 6. 并行计算 down_proj
+-++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-++        #    --- [FIX] ---
+-++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-++        #    --- [FIX END] ---
+-++        
+-++        # 7. 根据路由权重进行加权求和
+-++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-++
+-++        return weighted_sum
+-++
+-++
+-+ 
+-+         # @no_grad()
+-+         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+index ebd7782e..913a7609 100644
+-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+-    x1 = x[..., : x.shape[-1] // 2]
+-+-    x2 = x[..., x.shape[-1] // 2 :]
+-++    # x1 = x[..., : x.shape[-1] // 2]
+-++    # x2 = x[..., x.shape[-1] // 2 :]
+-+     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+index d059dcbe..2b217b64 100644
+-+--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+ def rotate_half(x):
+-+     """Rotates half the hidden dims of the input."""
+-+-    x1 = x[..., : x.shape[-1] // 2]
+-+-    x2 = x[..., x.shape[-1] // 2 :]
+-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++    # x1 = x[..., : x.shape[-1] // 2]
+-++    # x2 = x[..., x.shape[-1] // 2 :]
+-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+     return ops.cat((-x2, x1), dim=-1)
+-+ 
+-+ 
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+index 78f22642..0a0ef2d7 100644
+-+--- a/patches/0001-20251104commit.patch
+-++++ b/patches/0001-20251104commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+-Subject: [PATCH 1/3] 20251104commit
+-++Subject: [PATCH 1/4] 20251104commit
+-+ 
+-+ ---
+-+  mindnlp/transformers/cache_utils.py           |  28 +-
+-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+index 22b65dd5..5185270c 100644
+-+--- a/patches/0002-20251106commit.patch
+-++++ b/patches/0002-20251106commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+-Subject: [PATCH 2/3] 20251106commit
+-++Subject: [PATCH 2/4] 20251106commit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+index 966529e4..3e05f821 100644
+-+--- a/patches/0003-20261106secondcommit.patch
+-++++ b/patches/0003-20261106secondcommit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+-Subject: [PATCH 3/3] 20261106secondcommit
+-++Subject: [PATCH 3/4] 20261106secondcommit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-+new file mode 100644
+-+index 00000000..88a1aef4
+-+--- /dev/null
+-++++ b/patches/0004-20251106change.patch
+-+@@ -0,0 +1,7498 @@
+-++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Thu, 6 Nov 2025 15:48:09 +0800
+-++Subject: [PATCH 4/4] 20251106change
+-++
+-++---
+-++ .../models/deepseek/modeling_deepseek.py      |  189 +-
+-++ patches/0001-20251104commit.patch             | 1272 +++++++
+-++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+-++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+-++ 4 files changed, 7244 insertions(+), 186 deletions(-)
+-++ create mode 100644 patches/0001-20251104commit.patch
+-++ create mode 100644 patches/0002-20251106commit.patch
+-++ create mode 100644 patches/0003-20261106secondcommit.patch
+-++
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index 2f9192bf..0546f318 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+-++ 
+-++         return attn_output, attn_weights, past_key_value
+-++ 
+-++-# class DeepseekFlashAttention(nn.Module):
+-++-#     """
+-++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-++-
+-++-#     This class is designed as a drop-in replacement for DeepseekAttention.
+-++-#     """
+-++-
+-++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++-#         super().__init__()
+-++-#         self.config = config
+-++-#         self.layer_idx = layer_idx
+-++-#         if layer_idx is None:
+-++-#             logger.warning(
+-++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++-#                 "when creating this class."
+-++-#             )
+-++-
+-++-#         self.attention_dropout = config.attention_dropout
+-++-#         self.hidden_size = config.hidden_size
+-++-#         self.num_heads = config.num_attention_heads
+-++-#         self.head_dim = self.hidden_size // self.num_heads
+-++-#         self.num_key_value_heads = config.num_key_value_heads
+-++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++-#         self.max_position_embeddings = config.max_position_embeddings
+-++-#         self.rope_theta = config.rope_theta
+-++-#         self.is_causal = True
+-++-
+-++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++-#             raise ValueError(
+-++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++-#                 f" and `num_heads`: {self.num_heads})."
+-++-#             )
+-++-
+-++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++-#         self._init_rope()
+-++-
+-++-#     def _init_rope(self):
+-++-#         if self.config.rope_scaling is None:
+-++-#             self.rotary_emb = DeepseekRotaryEmbedding(
+-++-#                 self.head_dim,
+-++-#                 max_position_embeddings=self.max_position_embeddings,
+-++-#                 base=self.rope_theta,
+-++-#             )
+-++-#         else:
+-++-#             scaling_type = self.config.rope_scaling["type"]
+-++-#             scaling_factor = self.config.rope_scaling["factor"]
+-++-#             if scaling_type == "linear":
+-++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++-#                     self.head_dim,
+-++-#                     max_position_embeddings=self.max_position_embeddings,
+-++-#                     scaling_factor=scaling_factor,
+-++-#                     base=self.rope_theta,
+-++-#                 )
+-++-#             elif scaling_type == "dynamic":
+-++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++-#                     self.head_dim,
+-++-#                     max_position_embeddings=self.max_position_embeddings,
+-++-#                     scaling_factor=scaling_factor,
+-++-#                     base=self.rope_theta,
+-++-#                 )
+-++-#             else:
+-++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++-
+-++-#     def forward(
+-++-#         self,
+-++-#         hidden_states: mindspore.Tensor,
+-++-#         attention_mask: Optional[mindspore.Tensor] = None,
+-++-#         position_ids: Optional[mindspore.Tensor] = None,
+-++-#         past_key_value: Optional[Cache] = None,
+-++-#         output_attentions: bool = False,
+-++-#         use_cache: bool = False,
+-++-#         **kwargs,
+-++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++-#         if "padding_mask" in kwargs:
+-++-#             warnings.warn(
+-++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++-#             )
+-++-        
+-++-#         if output_attentions:
+-++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-++-
+-++-#         bsz, q_len, _ = hidden_states.shape
+-++-
+-++-#         if self.config.pretraining_tp > 1:
+-++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++-
+-++-#         query_states = self.q_proj(hidden_states)
+-++-#         key_states = self.k_proj(hidden_states)
+-++-#         value_states = self.v_proj(hidden_states)
+-++-
+-++-#         # Reshape for multi-head attention
+-++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++-
+-++-#         kv_seq_len = key_states.shape[-2]
+-++-#         if past_key_value is not None:
+-++-#             if self.layer_idx is None:
+-++-#                 raise ValueError(
+-++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++-#                     "with a layer index."
+-++-#                 )
+-++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++-        
+-++-#         # Apply Rotary Positional Embedding
+-++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++-
+-++-#         if past_key_value is not None:
+-++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++-
+-++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++-        
+-++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++-        
+-++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++-
+-++-#         # Convert attention_mask for flash_attention_score
+-++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-++-#         if attention_mask is not None:
+-++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++-#                 raise ValueError(
+-++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++-#                 )
+-++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-++-#         else:
+-++-#             attn_mask_for_fa = None
+-++-        
+-++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++-
+-++-#         # Call the fused flash_attention_score operator
+-++-#         attn_output = mindspore.ops.flash_attention_score(
+-++-#             query=query_states_for_fa,
+-++-#             key=key_states_for_fa,
+-++-#             value=value_states_for_fa,
+-++-#             head_num=self.num_heads, # This is N1, the number of query heads
+-++-#             input_layout='BSH',
+-++-#             attn_mask=attn_mask_for_fa,
+-++-#             keep_prob=keep_prob,
+-++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++-#             sparse_mode=0 # Default mask mode
+-++-#         )
+-++-        
+-++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-++-#         attn_output = self.o_proj(attn_output)
+-++-        
+-++-#         # Flash Attention does not return attention weights
+-++-#         attn_weights = None
+-++-
+-++-#         return attn_output, attn_weights, past_key_value
+-++ 
+-++ class DeepseekFlashAttention(nn.Module):
+-++     """
+-++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+-++         super().__init__()
+-++         self.hidden_size = config.hidden_size
+-++ 
+-++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-++-            config=config, layer_idx=layer_idx
+-++-        )
+-+++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-+++            # config=config, layer_idx=layer_idx
+-+++        # )
+-++ 
+-++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-++             config=config, layer_idx=layer_idx
+-++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+-++         return outputs
+-++ 
+-++ 
+-++-
+-++ class DeepseekPreTrainedModel(PreTrainedModel):
+-++     config_class = DeepseekConfig
+-++     base_model_prefix = "model"
+-++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++         # Initialize weights and apply final processing
+-++         self.post_init()
+-++         self.warm_up = False
+-++-        #@dwj
+-++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-++-            self.num_layers,
+-++-            self.num_attention_heads,
+-++-            self.head_dim,
+-++-            batch_size=1,
+-++-            max_length=self.max_length,
+-++-            dtype=mindspore.float16
+-++-        )
+-++-
+-++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-++-        key_cache = []
+-++-        value_cache = []
+-++-        for _ in range(num_layers):
+-++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++-            key_cache.append(k)
+-++-            value_cache.append(v)
+-++-        return key_cache, value_cache
+-++-
+-++ 
+-++     def warmup_moe_model_deep(self):
+-++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++new file mode 100644
+-++index 00000000..78f22642
+-++--- /dev/null
+-+++++ b/patches/0001-20251104commit.patch
+-++@@ -0,0 +1,1272 @@
+-+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++Subject: [PATCH 1/3] 20251104commit
+-+++
+-+++---
+-+++ mindnlp/transformers/cache_utils.py           |  28 +-
+-+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++
+-+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++index cadd2e04..02f8d4be 100644
+-+++--- a/mindnlp/transformers/cache_utils.py
+-++++++ b/mindnlp/transformers/cache_utils.py
+-+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++             # k_out[:, :, cache_position] = key_states
+-+++             # v_out[:, :, cache_position] = value_states
+-+++-            if ON_ORANGE_PI:
+-+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++-            else:
+-+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++-
+-++++            # if ON_ORANGE_PI:
+-++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++            # else:
+-++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++++            if cache_position.ndim > 1:
+-++++                cache_position = cache_position.flatten()
+-++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++++                cache_position = cache_position.int()
+-++++            
+-++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++++            k_out[:, :, cache_position] = key_states
+-++++            v_out[:, :, cache_position] = value_states
+-++++            
+-+++         return k_out, v_out
+-+++ 
+-+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index c695b944..d8303e45 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++-    x1 = x[..., : x.shape[-1] // 2]
+-+++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++    # x2 = x[..., x.shape[-1] // 2 :]
+-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++         if self.training:
+-+++             raise NotImplementedError("Training is not supported yet.")
+-+++         else:
+-+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++-        if self.config.n_shared_experts is not None:
+-+++-            y = y + self.shared_experts(identity)
+-+++-        return y
+-++++            # @lwx
+-++++            if orig_shape[1] == 1:
+-++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++++                y=y.view(*orig_shape)
+-++++                if self.config.n_shared_experts is not None:
+-++++                    y = y + self.shared_experts(identity)
+-++++                return y
+-++++            else:
+-++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++++                if self.config.n_shared_experts is not None:
+-++++                    y = y + self.shared_experts(identity)
+-++++                return y
+-++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++        # if self.config.n_shared_experts is not None:
+-++++        #     y = y + self.shared_experts(identity)
+-++++        # return y
+-++++        
+-++++    @no_grad()
+-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++
+-++++        expert_cache = ops.zeros_like(x)
+-++++        for i in range(self.num_experts_per_tok):
+-++++            expert_id = flat_expert_indices[i].item()
+-++++            weight = flat_expert_weights[i].item()
+-++++            expert = self.experts[expert_id]
+-++++            expert_out = expert(x)
+-++++            expert_cache += expert_out * weight
+-++++        return expert_cache
+-+++ 
+-+++     @no_grad()
+-+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-        # expert_cache = torch.zeros_like(x)
+-+++-        # idxs = flat_expert_indices.argsort()
+-+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++-        # token_idxs = idxs // self.num_experts_per_tok
+-+++-        # for i, end_idx in enumerate(tokens_per_expert):
+-+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++-        #     if start_idx == end_idx:
+-+++-        #         continue
+-+++-        #     expert = self.experts[i]
+-+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-        #     expert_tokens = x[exp_token_idx]
+-+++-        #     expert_out = expert(expert_tokens)
+-+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++-        # return expert_cache
+-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++         expert_cache = ops.zeros_like(x)
+-+++         idxs = flat_expert_indices.argsort()
+-+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++         token_idxs = idxs // self.num_experts_per_tok
+-++++
+-+++         for i, end_idx in enumerate(tokens_per_expert):
+-+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++             if start_idx == end_idx:
+-+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++             expert_out = expert(expert_tokens)
+-+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++
+-+++         return expert_cache
+-++++        
+-++++    # @no_grad()
+-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++    #     # expert_cache = torch.zeros_like(x)
+-++++    #     # idxs = flat_expert_indices.argsort()
+-++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++    #     #     if start_idx == end_idx:
+-++++    #     #         continue
+-++++    #     #     expert = self.experts[i]
+-++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #     #     expert_tokens = x[exp_token_idx]
+-++++    #     #     expert_out = expert(expert_tokens)
+-++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++    #     # return expert_cache
+-++++    #     expert_cache = ops.zeros_like(x)
+-++++    #     idxs = flat_expert_indices.argsort()
+-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++    #         if start_idx == end_idx:
+-++++    #             continue
+-++++    #         expert = self.experts[i]
+-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #         expert_tokens = x[exp_token_idx]
+-++++    #         expert_out = expert(expert_tokens)
+-++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++
+-++++    #     return expert_cache
+-++++    # @no_grad()
+-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++    #     expert_cache = ops.zeros_like(x)
+-++++
+-++++    #     # 排序保证顺序一致
+-++++    #     idxs = flat_expert_indices.argsort()
+-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++    #     # 找出有 token 的专家
+-++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++
+-++++    #     for i in active_experts.tolist():
+-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++    #         end_idx = tokens_per_expert[i]
+-++++    #         if start_idx == end_idx:  # 没有 token
+-++++    #             continue
+-++++
+-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++    #         expert_tokens = x[exp_token_idx]
+-++++    #         expert_out = self.experts[i](expert_tokens)
+-++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++
+-++++    #         expert_cache = mindspore.mint.scatter_add(
+-++++    #             expert_cache,
+-++++    #             0,
+-++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++    #             expert_out
+-++++    #         )
+-++++
+-++++    #     return expert_cache
+-++++
+-++++
+-+++ 
+-+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++ #     """
+-+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++ 
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-++++        self.warm_up = False
+-++++
+-++++    def warmup_moe_model_deep(self):
+-++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++        test_texts = [
+-++++            "warmup short",
+-++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++++        ]
+-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++        if tokenizer is None:
+-++++            from mindnlp.transformers import AutoTokenizer
+-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++            self._warmup_tokenizer = tokenizer
+-++++
+-++++        for text in test_texts:
+-++++            inputs = tokenizer(text, return_tensors="ms")
+-++++            with mindspore._no_grad():
+-++++                _ = self(**inputs, use_cache=False)
+-++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++ 
+-+++     def get_input_embeddings(self):
+-+++         return self.model.embed_tokens
+-+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++         ```"""
+-++++        if not self.warm_up:
+-++++            self.warm_up = True
+-++++            self.warmup_moe_model_deep()
+-++++
+-+++         output_attentions = (
+-+++             output_attentions
+-+++             if output_attentions is not None
+-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++index 3cbf820e..d4c6b651 100644
+-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++@@ -18,7 +18,6 @@
+-+++ # See the License for the specific language governing permissions and
+-+++ # limitations under the License.
+-+++ """MindSpore Qwen2MoE model."""
+-+++-
+-+++ import math
+-+++ from typing import List, Optional, Tuple, Union
+-+++ 
+-+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++     TokenClassifierOutput,
+-+++ )
+-+++ from ...modeling_utils import PreTrainedModel
+-++++from ...generation import GenerationMixin
+-+++ from ....utils import logging
+-+++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++ 
+-+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++         self.variance_epsilon = eps
+-+++ 
+-+++     def forward(self, hidden_states):
+-++++        # @dwj
+-++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++        # @lwx
+-++++        # if not self.training :
+-++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++         input_dtype = hidden_states.dtype
+-+++         hidden_states = hidden_states.to(mindspore.float32)
+-+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++     x1 = x[..., : x.shape[-1] // 2]
+-+++     x2 = x[..., x.shape[-1] // 2 :]
+-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++         self.config = config
+-+++         self.hidden_size = config.hidden_size
+-+++         self.intermediate_size = intermediate_size
+-++++        
+-+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++         self.act_fn = ACT2FN[config.hidden_act]
+-+++ 
+-+++     def forward(self, x):
+-+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++-
+-+++ 
+-++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++        # @lwx 
+-++++        # gate_up_output = self.gate_up_proj(x)
+-++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++++        # return self.down_proj(swiglu_output)
+-++++
+-++++    # def forward(self, x):
+-++++    #     gate_proj_out = self.gate_proj(x)
+-++++    #     up_proj_out = self.up_proj(x)
+-++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++++    #     return self.down_proj(swiglu_out)
+-++++        
+-+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++     """
+-+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++         use_cache: bool = False,
+-+++         cache_position: Optional[mindspore.Tensor] = None,
+-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++        
+-++++
+-+++         bsz, q_len, _ = hidden_states.shape
+-+++ 
+-+++         query_states = self.q_proj(hidden_states)
+-+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++                     "with a layer index."
+-+++                 )
+-+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++            if isinstance(past_key_value, StaticCache):
+-++++                kv_seq_len = key_states.shape[-2]
+-++++            else:
+-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++ 
+-+++         if past_key_value is not None:
+-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++            
+-++++            if isinstance(past_key_value, StaticCache):
+-++++                kv_seq_len = key_states.shape[-2]
+-+++ 
+-+++         # repeat k/v heads if n_kv_heads < n_heads
+-+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++-
+-++++        
+-+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++ 
+-+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++-            raise ValueError(
+-+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++-                f" {attn_weights.shape}"
+-+++-            )
+-+++-
+-+++-        if attention_mask is not None:  # no matter the length, we just slice it
+-+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++++        if attention_mask is not None:
+-++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++             attn_weights = attn_weights + causal_mask
+-+++ 
+-+++         # upcast attention to fp32
+-+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++ 
+-+++         attn_output = self.o_proj(attn_output)
+-+++-
+-++++        # @lwx
+-++++        
+-++++        # max_seq_len = self.max_position_embeddings  # 2048
+-++++
+-++++        # if attention_mask is not None:
+-++++        #     # attention_mask: [B, 1, Sq, Sk]
+-++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++
+-++++        #     # pad 到 [max_seq_len, max_seq_len]
+-++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++        #     global_attention_mask = padded_mask
+-++++        # else:
+-++++        #     global_attention_mask = None
+-++++
+-++++
+-++++        # sparse_mode=3
+-++++        # attn_output = mindspore.ops.flash_attention_score(
+-++++        #     query=query_states,
+-++++        #     key=key_states,
+-++++        #     value=value_states,
+-++++        #     real_shift=None,
+-++++        #     padding_mask=None,
+-++++
+-++++        #     head_num=self.num_heads,
+-++++        #     attn_mask=global_attention_mask,
+-++++        #     keep_prob=1.0 - self.attention_dropout,
+-++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++        #     input_layout="BNSD",
+-++++        #     pre_tokens=2147483647,
+-++++        #     next_tokens=2147483647,
+-++++        #     inner_precise=0,
+-++++        #     drop_mask=None,
+-++++        #     prefix=None,
+-++++        #     actual_seq_qlen=None,
+-++++        #     actual_seq_kvlen=None,
+-++++        #     sparse_mode=sparse_mode,
+-++++        # )
+-+++         if not output_attentions:
+-+++             attn_weights = None
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++ 
+-++++class Qwen2MoeFlashAttention(nn.Module):
+-++++    """
+-++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++
+-++++    关键改动:
+-++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++       直接传入原始的 key 和 value 张量效率更高。
+-++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++    """
+-++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++        super().__init__()
+-++++        self.config = config
+-++++        self.layer_idx = layer_idx
+-++++        self.hidden_size = config.hidden_size
+-++++        self.num_heads = config.num_attention_heads
+-++++        self.head_dim = self.hidden_size // self.num_heads
+-++++        self.num_key_value_heads = config.num_key_value_heads
+-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++        self.max_position_embeddings = config.max_position_embeddings
+-++++        self.rope_theta = config.rope_theta
+-++++        self.attention_dropout = config.attention_dropout
+-++++
+-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++            raise ValueError(
+-++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++            )
+-++++
+-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++
+-++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++            self.head_dim,
+-++++            max_position_embeddings=self.max_position_embeddings,
+-++++            base=self.rope_theta,
+-++++        )
+-++++
+-++++    def forward(
+-++++        self,
+-++++        hidden_states: mindspore.Tensor,
+-++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++        past_key_value: Optional[Cache] = None,
+-++++        output_attentions: bool = False,
+-++++        use_cache: bool = False,
+-++++        cache_position: Optional[mindspore.Tensor] = None,
+-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++        bsz, q_len, _ = hidden_states.shape
+-++++
+-++++        # 1. 线性投射 Q, K, V
+-++++        query_states = self.q_proj(hidden_states)
+-++++        key_states = self.k_proj(hidden_states)
+-++++        value_states = self.v_proj(hidden_states)
+-++++
+-++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++        # 3. RoPE 旋转位置编码
+-++++        kv_seq_len = key_states.shape[-2]
+-++++        if past_key_value is not None:
+-++++            if self.layer_idx is None:
+-++++                raise ValueError(
+-++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++                    "with a layer index."
+-++++                )
+-++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++                if cache_position.shape[0] == 1:
+-++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++                    kv_seq_len = past_seen_tokens + 1
+-++++                else:
+-++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++            else:
+-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++        # 4. KV 缓存更新
+-++++        if past_key_value is not None:
+-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++            key_states, value_states = past_key_value.update(
+-++++                key_states, value_states, self.layer_idx, cache_kwargs
+-++++            )
+-++++            
+-++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++                if cache_position.shape[0] == 1:
+-++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++                    kv_seq_len = key_states.shape[-2]
+-++++
+-++++        # 5. [重要] 准备 Attention Mask
+-++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++        fa_attention_mask = None
+-++++        if attention_mask is not None:
+-++++            # 截取与当前key长度匹配的部分
+-++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++            fa_attention_mask = (mask_slice != 0)
+-++++
+-++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++        input_dtype = query_states.dtype
+-++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++            query_states = query_states.to(mindspore.float16)
+-++++            key_states = key_states.to(mindspore.float16)
+-++++            value_states = value_states.to(mindspore.float16)
+-++++
+-++++        # 6. [核心] 调用 flash_attention_score 算子
+-++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++        attn_output = mindspore.ops.flash_attention_score(
+-++++            query=query_states,
+-++++            key=key_states,
+-++++            value=value_states,
+-++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++            attn_mask=fa_attention_mask,
+-++++            keep_prob=1.0 - self.attention_dropout,
+-++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++            input_layout="BNSD",
+-++++            sparse_mode=0 # 使用 defaultMask 模式
+-++++        )
+-++++
+-++++        # 恢复原始数据类型
+-++++        attn_output = attn_output.to(input_dtype)
+-++++
+-++++        # 7. 调整输出形状
+-++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++        attn_output = self.o_proj(attn_output)
+-++++
+-++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++        attn_weights = None
+-++++        if output_attentions:
+-++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++        return attn_output, attn_weights, past_key_value
+-++++
+-++++    # def forward(
+-++++    #     self,
+-++++    #     hidden_states: mindspore.Tensor,
+-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++    #     past_key_value: Optional[Cache] = None,
+-++++    #     output_attentions: bool = False,
+-++++    #     use_cache: bool = False,
+-++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++    #     bsz, q_len, _ = hidden_states.shape
+-++++
+-++++    #     # 1. 线性投射 Q, K, V
+-++++    #     query_states = self.q_proj(hidden_states)
+-++++    #     key_states = self.k_proj(hidden_states)
+-++++    #     value_states = self.v_proj(hidden_states)
+-++++
+-++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++    #     # 3. RoPE 旋转位置编码
+-++++    #     kv_seq_len = key_states.shape[-2]
+-++++    #     if past_key_value is not None:
+-++++    #         if self.layer_idx is None:
+-++++    #             raise ValueError(
+-++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++    #                 "with a layer index."
+-++++    #             )
+-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++    #     # 4. KV 缓存更新
+-++++    #     if past_key_value is not None:
+-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++    #         key_states, value_states = past_key_value.update(
+-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++    #         )
+-++++
+-++++    #     # 5. 准备 Attention Mask
+-++++    #     fa_attention_mask = None
+-++++    #     if attention_mask is not None:
+-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++    #         fa_attention_mask = (mask_slice != 0)
+-++++
+-++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++    #     input_dtype = query_states.dtype
+-++++
+-++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++    #         query=query_states,
+-++++    #         key=key_states,
+-++++    #         value=value_states,
+-++++    #         head_num=self.num_heads,
+-++++    #         attn_mask=fa_attention_mask,
+-++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++    #         input_layout="BNSD",
+-++++    #         sparse_mode=0,
+-++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++    #         inner_precise=1
+-++++    #     )
+-++++
+-++++    #     # 恢复原始数据类型
+-++++    #     attn_output = attn_output.to(input_dtype)
+-++++
+-++++    #     # 7. 调整输出形状
+-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++    #     attn_output = self.o_proj(attn_output)
+-++++
+-++++    #     attn_weights = None
+-++++    #     if output_attentions:
+-++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++    #     return attn_output, attn_weights, past_key_value
+-++++
+-++++    # def forward(
+-++++    #     self,
+-++++    #     hidden_states: mindspore.Tensor,
+-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++    #     past_key_value: Optional[Cache] = None,
+-++++    #     output_attentions: bool = False,
+-++++    #     use_cache: bool = False,
+-++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++    #     bsz, q_len, _ = hidden_states.shape
+-++++
+-++++    #     query_states = self.q_proj(hidden_states)
+-++++    #     key_states = self.k_proj(hidden_states)
+-++++    #     value_states = self.v_proj(hidden_states)
+-++++
+-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++    #     kv_seq_len = key_states.shape[-2]
+-++++    #     if past_key_value is not None:
+-++++    #         if self.layer_idx is None:
+-++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++    #     if past_key_value is not None:
+-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++    #         key_states, value_states = past_key_value.update(
+-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++    #         )
+-++++
+-++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++
+-++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++    #     # <--- 修改结束 ---
+-++++
+-++++    #     fa_attention_mask = None
+-++++    #     if attention_mask is not None:
+-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++    #         fa_attention_mask = (mask_slice != 0)
+-++++
+-++++    #     input_dtype = query_states.dtype
+-++++
+-++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++    #         key=key_states,
+-++++    #         value=value_states,
+-++++    #         head_num=self.num_heads,
+-++++    #         attn_mask=fa_attention_mask,
+-++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++    #         input_layout="BNSD",
+-++++    #         sparse_mode=0,
+-++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++    #     )
+-++++
+-++++    #     attn_output = attn_output.to(input_dtype)
+-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++    #     attn_output = self.o_proj(attn_output)
+-++++
+-++++    #     attn_weights = None
+-++++    #     if output_attentions:
+-++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++
+-++++    #     return attn_output, attn_weights, past_key_value
+-++++
+-+++ QWEN2MOE_ATTENTION_CLASSES = {
+-+++     "eager": Qwen2MoeAttention,
+-++++    "flash-attention": Qwen2MoeFlashAttention,
+-+++ }
+-+++ 
+-+++ 
+-+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++ 
+-++++    #@dwj
+-++++    # 只遍历激活的专家，而非全部专家
+-+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++-        # router_logits: (batch * sequence_length, n_experts)
+-+++-        router_logits = self.gate(hidden_states)
+-+++-
+-+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-        if self.norm_topk_prob:
+-+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        # we cast back to the input dtype
+-+++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-
+-+++-        final_hidden_states = ops.zeros(
+-+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++-        )
+-+++-
+-+++-        # One hot encode the selected experts to create an expert mask
+-+++-        # this will be used to easily index which expert is going to be sollicitated
+-+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++-
+-+++-        # Loop over all available experts in the model and perform the computation on each expert
+-+++-        for expert_idx in range(self.num_experts):
+-+++-            expert_layer = self.experts[expert_idx]
+-+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++-
+-+++-            # Index the correct hidden states and compute the expert hidden state for
+-+++-            # the current expert. We need to make sure to multiply the output hidden
+-+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++-            if 0 not in idx.shape:
+-+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++-
+-+++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++-                # the `top_x` tensor here.
+-+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++-
+-+++-        shared_expert_output = self.shared_expert(hidden_states)
+-+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++-
+-+++-        final_hidden_states = final_hidden_states + shared_expert_output
+-++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++            num_tokens = hidden_states_reshaped.shape[0]
+-++++
+-++++            router_logits = self.gate(hidden_states_reshaped)
+-++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++
+-++++            if self.norm_topk_prob:
+-++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++            
+-++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++            flat_selected_experts = selected_experts.flatten()
+-++++            
+-++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++            token_indices = broadcasted_token_indices.flatten()
+-++++            
+-++++            active_experts = ops.unique(flat_selected_experts)
+-++++            
+-++++            for expert_idx_tensor in active_experts:
+-++++                expert_idx = expert_idx_tensor.item()
+-++++                expert_layer = self.experts[expert_idx]
+-++++                
+-++++                mask = (flat_selected_experts == expert_idx_tensor)
+-++++                selected_token_indices = token_indices[mask]
+-++++                selected_routing_weights = routing_weights.flatten()[mask]
+-++++                
+-++++                current_states = hidden_states_reshaped[selected_token_indices]
+-++++                
+-++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++                
+-++++                final_hidden_states = final_hidden_states.index_add(
+-++++                    dim=0,
+-++++                    index=selected_token_indices,
+-++++                    source=expert_output.to(hidden_states.dtype)
+-++++                )
+-++++            
+-++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++ 
+-+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++-        return final_hidden_states, router_logits
+-++++            final_hidden_states = final_hidden_states + shared_expert_output
+-++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++            
+-++++            return final_hidden_states, router_logits
+-+++ 
+-+++ 
+-+++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++ 
+-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++ 
+-++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++
+-+++         if (layer_idx not in config.mlp_only_layers) and (
+-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++         ):
+-+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++     _skip_keys_device_placement = "past_key_values"
+-+++     _supports_cache_class = True
+-++++#lwx
+-++++    # _supports_static_cache = True
+-+++ 
+-+++     def _init_weights(self, module):
+-+++         std = self.config.initializer_range
+-+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++         return causal_mask
+-+++ 
+-+++ 
+-+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++     _tied_weights_keys = ["lm_head.weight"]
+-+++ 
+-+++     def __init__(self, config):
+-+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++         self.num_experts_per_tok = config.num_experts_per_tok
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-++++        # @lwx
+-++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++++        #     self.generation_config.cache_implementation = "static"
+-++++        self._warmed_up = False
+-++++
+-++++    def warmup_moe_model(self):
+-++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++++        test_texts = [
+-++++            "warmup short",
+-++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++++        ]
+-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++        if tokenizer is None:
+-++++            from mindnlp.transformers import AutoTokenizer
+-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++            self._warmup_tokenizer = tokenizer
+-++++
+-++++        for text in test_texts:
+-++++            inputs = tokenizer(text, return_tensors="ms")
+-++++            with mindspore._no_grad():
+-++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++ 
+-+++     def get_input_embeddings(self):
+-+++         return self.model.embed_tokens
+-+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++         ```"""
+-++++        if not self._warmed_up:
+-++++            self._warmed_up = True
+-++++            self.warmup_moe_model()
+-+++ 
+-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++         output_router_logits = (
+-+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++             }
+-+++         )
+-+++         return model_inputs
+-++++# @lwx
+-++++    # def _decode_one_tokens_logits(
+-++++    #     self,
+-++++    #     cur_token: mindspore.Tensor,
+-++++    #     input_pos: Optional[mindspore.Tensor],
+-++++    #     cache_position: mindspore.Tensor,
+-++++    #     past_key_values: StaticCache,
+-++++    # ) -> mindspore.Tensor:
+-++++    #     """
+-++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++++        
+-++++    #     Args:
+-++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++++    #         input_pos: 输入位置信息，可选
+-++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++++            
+-++++    #     Returns:
+-++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++++    #     """
+-++++    #     # 调用JIT编译的版本
+-++++    #     return self.get_decode_one_tokens_logits(
+-++++    #         cur_token=cur_token,
+-++++    #         input_pos=input_pos,
+-++++    #         cache_position=cache_position,
+-++++    #         past_key_values=past_key_values,
+-++++    #     )
+-++++    
+-++++    # @mindspore.jit(jit_level='O1')
+-++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++++    #     """
+-++++    #     JIT编译的函数，用于高效的单token解码
+-++++    #     使用JIT编译优化以支持静态shape和高效执行
+-++++        
+-++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++++    #     """
+-++++    #     outputs = self.model.forward(
+-++++    #         input_ids=cur_token,
+-++++    #         position_ids=input_pos,
+-++++    #         cache_position=cache_position,
+-++++    #         past_key_values=past_key_values,
+-++++    #         use_cache=True,
+-++++    #         return_dict=False,
+-++++    #     )
+-++++        
+-++++    #     hidden_states = outputs[0]
+-++++    #     logits = self.lm_head.forward(hidden_states)
+-++++    #     logits = logits.float()
+-++++        
+-++++    #     return logits[:, -1, :]
+-++++
+-++++    # def _sample(
+-++++    #     self,
+-++++    #     input_ids: mindspore.Tensor,
+-++++    #     logits_processor,
+-++++    #     stopping_criteria,
+-++++    #     generation_config,
+-++++    #     synced_devices: bool,
+-++++    #     streamer=None,
+-++++    #     logits_warper=None,
+-++++    #     **model_kwargs,
+-++++    # ):
+-++++    #     """
+-++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++++    #     """
+-++++    #     from ...generation.logits_process import LogitsProcessorList
+-++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++++    #     from mindnlp.core import nn, ops, no_grad
+-++++    #     import numpy as np
+-++++        
+-++++    #     # 检查是否使用 StaticCache
+-++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++++    #     # 否则，直接调用父类方法
+-++++    #     past_key_values = model_kwargs.get("past_key_values")
+-++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++++
+-++++    #     if not isinstance(past_key_values, StaticCache):
+-++++    #         # 不使用 StaticCache，直接调用父类方法
+-++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++++    #         return super()._sample(
+-++++    #             input_ids=input_ids,
+-++++    #             logits_processor=logits_processor,
+-++++    #             stopping_criteria=stopping_criteria,
+-++++    #             generation_config=generation_config,
+-++++    #             synced_devices=synced_devices,
+-++++    #             streamer=streamer,
+-++++    #             logits_warper=logits_warper,
+-++++    #             **model_kwargs,
+-++++    #         )
+-++++        
+-++++    #     # 使用 StaticCache，进入自定义循环
+-++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++++    #     pad_token_id = generation_config._pad_token_tensor
+-++++    #     output_attentions = generation_config.output_attentions
+-++++    #     output_hidden_states = generation_config.output_hidden_states
+-++++    #     output_scores = generation_config.output_scores
+-++++    #     output_logits = generation_config.output_logits
+-++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++++    #     max_length = generation_config.max_length
+-++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++++    #     do_sample = generation_config.do_sample
+-++++        
+-++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++++    #         raise ValueError(
+-++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++++    #             f"{logits_warper})."
+-++++    #         )
+-++++        
+-++++    #     # init attention / hidden states / scores tuples
+-++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++++        
+-++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++++    #         encoder_hidden_states = (
+-++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++++    #         )
+-++++        
+-++++    #     # keep track of which sequences are already finished
+-++++    #     batch_size, cur_len = input_ids.shape
+-++++    #     this_peer_finished = False
+-++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++++        
+-++++    #     time_record = []
+-++++    #     from ....utils.testing_utils import parse_flag_from_env
+-++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++++        
+-++++    #     while self._has_unfinished_sequences(
+-++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++++    #     ):
+-++++    #         if _record_time:
+-++++    #             import time as time_module
+-++++    #             infer_start = time_module.time()
+-++++            
+-++++    #         # prepare model inputs
+-++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++++            
+-++++    #         # prepare variable output controls
+-++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++++            
+-++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++++    #         cur_cache_position = model_inputs.get("cache_position")
+-++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++++    #         cur_input_ids = model_inputs.get("input_ids")
+-++++            
+-++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++++    #             cur_cache_position is not None and 
+-++++    #             len(cur_cache_position.shape) > 0 and
+-++++    #             cur_cache_position.shape[0] == 1 and
+-++++    #             cur_input_ids is not None and
+-++++    #             cur_input_ids.shape[1] == 1):
+-++++    #             # 使用 JIT 优化的单 token 解码
+-++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++++    #             if not hasattr(self, '_jit_used'):
+-++++    #                 self._jit_used = False
+-++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++++                
+-++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++++    #                 cur_token=cur_input_ids,
+-++++    #                 input_pos=model_inputs.get("position_ids"),
+-++++    #                 cache_position=cur_cache_position,
+-++++    #                 past_key_values=cur_past_key_values,
+-++++    #             )
+-++++                
+-++++    #             # 标记已使用JIT（用于后续判断）
+-++++    #             if not self._jit_used:
+-++++    #                 self._jit_used = True
+-++++                
+-++++    #             # 构造兼容的输出对象
+-++++    #             class JitOptimizedOutput:
+-++++    #                 def __init__(self, logits, config):
+-++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++++    #                     self.config = config
+-++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++++    #                     self.cross_attentions = None
+-++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++++                
+-++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++++    #         else:
+-++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++++    #             outputs = self(**model_inputs, return_dict=True)
+-++++            
+-++++    #         if synced_devices and this_peer_finished:
+-++++    #             continue
+-++++            
+-++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++++    #         next_token_logits = outputs.logits[:, -1, :]
+-++++            
+-++++    #         # pre-process distribution
+-++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++++    #         if do_sample:
+-++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++++            
+-++++    #         # Store scores, attentions and hidden_states when required
+-++++    #         if return_dict_in_generate:
+-++++    #             if output_scores:
+-++++    #                 scores += (next_token_scores,)
+-++++    #             if output_logits:
+-++++    #                 raw_logits += (next_token_logits,)
+-++++    #             if output_attentions:
+-++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++++    #                 if self.config.is_encoder_decoder:
+-++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++++                
+-++++    #             if output_hidden_states:
+-++++    #                 hidden = (
+-++++    #                     outputs.decoder_hidden_states
+-++++    #                     if self.config.is_encoder_decoder
+-++++    #                     else outputs.hidden_states
+-++++    #                 )
+-++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++++            
+-++++    #         # token selection
+-++++    #         if do_sample:
+-++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++++    #         else:
+-++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++++            
+-++++    #         # finished sentences should have their next token be a padding token
+-++++    #         if has_eos_stopping_criteria:
+-++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++++            
+-++++    #         # update generated ids, model inputs, and length for next step
+-++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++++    #         if streamer is not None:
+-++++    #             streamer.put(next_tokens)
+-++++            
+-++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++++    #             outputs,
+-++++    #             model_kwargs,
+-++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++++    #         )
+-++++            
+-++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++++    #         cur_len += 1
+-++++            
+-++++    #         if _record_time:
+-++++    #             import time as time_module
+-++++    #             infer_stop = time_module.time()
+-++++    #             time_record.append(infer_stop - infer_start)
+-++++            
+-++++    #         del outputs
+-++++        
+-++++    #     average_infer_time = None
+-++++    #     if time_record:
+-++++    #         if len(time_record) > 1:
+-++++    #             time_record.pop(0)
+-++++    #         average_infer_time = sum(time_record) / len(time_record)
+-++++    #         print(f'average inference time is: {average_infer_time}')
+-++++    #         print(f'inference time record: {time_record}')
+-++++        
+-++++    #     if streamer is not None:
+-++++    #         streamer.end()
+-++++        
+-++++    #     # 简单判断：打印是否使用了JIT路径
+-++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++++    #     else:
+-++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++++        
+-++++    #     if return_dict_in_generate:
+-++++    #         if self.config.is_encoder_decoder:
+-++++    #             return GenerateEncoderDecoderOutput(
+-++++    #                 sequences=input_ids,
+-++++    #                 scores=scores,
+-++++    #                 logits=raw_logits,
+-++++    #                 encoder_attentions=encoder_attentions,
+-++++    #                 encoder_hidden_states=encoder_hidden_states,
+-++++    #                 decoder_attentions=decoder_attentions,
+-++++    #                 cross_attentions=cross_attentions,
+-++++    #                 decoder_hidden_states=decoder_hidden_states,
+-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++    #                 average_infer_time=average_infer_time
+-++++    #             )
+-++++    #         else:
+-++++    #             return GenerateDecoderOnlyOutput(
+-++++    #                 sequences=input_ids,
+-++++    #                 scores=scores,
+-++++    #                 logits=raw_logits,
+-++++    #                 attentions=decoder_attentions,
+-++++    #                 hidden_states=decoder_hidden_states,
+-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++    #                 average_infer_time=average_infer_time
+-++++    #             )
+-++++    #     else:
+-++++    #         return input_ids
+-++++            
+-++++    # def _prepare_cache_for_generation(
+-++++    #     self,
+-++++    #     generation_config,
+-++++    #     model_kwargs,
+-++++    #     assistant_model,
+-++++    #     batch_size,
+-++++    #     max_cache_length,
+-++++    # ):
+-++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++++    #         generation_config.cache_implementation = "static"
+-++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++++        
+-++++    #     if generation_config.cache_implementation == "static":
+-++++    #         base_required_from_max_length = generation_config.max_length + 1
+-++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++++    #         min_cache_size = 50
+-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++++    #         else:
+-++++    #             max_cache_length = max(base_required, min_cache_size)
+-++++            
+-++++    #         original_max_cache_length = max_cache_length
+-++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++++            
+-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++    #             if max_cache_length > self.config.max_position_embeddings:
+-++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++++        
+-++++    #     result = super()._prepare_cache_for_generation(
+-++++    #         generation_config=generation_config,
+-++++    #         model_kwargs=model_kwargs,
+-++++    #         assistant_model=assistant_model,
+-++++    #         batch_size=batch_size,
+-++++    #         max_cache_length=max_cache_length,
+-++++    #     )
+-++++        
+-++++    #     if generation_config.cache_implementation == "static":
+-++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++++    #         created_cache = model_kwargs.get(cache_name)
+-++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++++        
+-++++    #     return result
+-++++
+-++++
+-++++
+-+++ 
+-+++ 
+-+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++-- 
+-+++2.27.0
+-+++
+-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-++new file mode 100644
+-++index 00000000..22b65dd5
+-++--- /dev/null
+-+++++ b/patches/0002-20251106commit.patch
+-++@@ -0,0 +1,3200 @@
+-+++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+++Subject: [PATCH 2/3] 20251106commit
+-+++
+-+++---
+-+++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+-+++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+-+++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+-+++ create mode 100644 patches/0001-20251104commit.patch
+-+++
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index d8303e45..73773c22 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-+++         #     y = y + self.shared_experts(identity)
+-+++         # return y
+-+++         
+-++++    # @no_grad()
+-++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++
+-++++    #     expert_cache = ops.zeros_like(x)
+-++++    #     for i in range(self.num_experts_per_tok):
+-++++    #         expert_id = flat_expert_indices[i].item()
+-++++    #         weight = flat_expert_weights[i].item()
+-++++    #         expert = self.experts[expert_id]
+-++++    #         expert_out = expert(x)
+-++++    #         expert_cache += expert_out * weight
+-++++    #     return expert_cache
+-++++
+-+++     @no_grad()
+-+++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++        # x 的 shape: (1, hidden_size)
+-++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++++
+-++++        # 1. 收集所有需要的专家层
+-++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++++
+-++++        # 2. 并行计算所有专家的输出
+-++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++++
+-++++        # 3. 使用矩阵乘法进行加权求和
+-++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++++        
+-++++        return final_output
+-+++ 
+-+++-        expert_cache = ops.zeros_like(x)
+-+++-        for i in range(self.num_experts_per_tok):
+-+++-            expert_id = flat_expert_indices[i].item()
+-+++-            weight = flat_expert_weights[i].item()
+-+++-            expert = self.experts[expert_id]
+-+++-            expert_out = expert(x)
+-+++-            expert_cache += expert_out * weight
+-+++-        return expert_cache
+-+++ 
+-+++     @no_grad()
+-+++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-+++             key_states = self.k_proj(hidden_states)
+-+++             value_states = self.v_proj(hidden_states)
+-+++ 
+-+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++        # @lwx
+-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+++ 
+-+++         kv_seq_len = key_states.shape[-2]
+-+++         if past_key_value is not None:
+-+++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++ 
+-++++# class DeepseekFlashAttention(nn.Module):
+-++++#     """
+-++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-++++
+-++++#     This class is designed as a drop-in replacement for DeepseekAttention.
+-++++#     """
+-++++
+-++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++++#         super().__init__()
+-++++#         self.config = config
+-++++#         self.layer_idx = layer_idx
+-++++#         if layer_idx is None:
+-++++#             logger.warning(
+-++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++#                 "when creating this class."
+-++++#             )
+-++++
+-++++#         self.attention_dropout = config.attention_dropout
+-++++#         self.hidden_size = config.hidden_size
+-++++#         self.num_heads = config.num_attention_heads
+-++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++#         self.rope_theta = config.rope_theta
+-++++#         self.is_causal = True
+-++++
+-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++#             raise ValueError(
+-++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++#                 f" and `num_heads`: {self.num_heads})."
+-++++#             )
+-++++
+-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++++#         self._init_rope()
+-++++
+-++++#     def _init_rope(self):
+-++++#         if self.config.rope_scaling is None:
+-++++#             self.rotary_emb = DeepseekRotaryEmbedding(
+-++++#                 self.head_dim,
+-++++#                 max_position_embeddings=self.max_position_embeddings,
+-++++#                 base=self.rope_theta,
+-++++#             )
+-++++#         else:
+-++++#             scaling_type = self.config.rope_scaling["type"]
+-++++#             scaling_factor = self.config.rope_scaling["factor"]
+-++++#             if scaling_type == "linear":
+-++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++++#                     self.head_dim,
+-++++#                     max_position_embeddings=self.max_position_embeddings,
+-++++#                     scaling_factor=scaling_factor,
+-++++#                     base=self.rope_theta,
+-++++#                 )
+-++++#             elif scaling_type == "dynamic":
+-++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++++#                     self.head_dim,
+-++++#                     max_position_embeddings=self.max_position_embeddings,
+-++++#                     scaling_factor=scaling_factor,
+-++++#                     base=self.rope_theta,
+-++++#                 )
+-++++#             else:
+-++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++++
+-++++#     def forward(
+-++++#         self,
+-++++#         hidden_states: mindspore.Tensor,
+-++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++#         past_key_value: Optional[Cache] = None,
+-++++#         output_attentions: bool = False,
+-++++#         use_cache: bool = False,
+-++++#         **kwargs,
+-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++#         if "padding_mask" in kwargs:
+-++++#             warnings.warn(
+-++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++++#             )
+-++++        
+-++++#         if output_attentions:
+-++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-++++
+-++++#         bsz, q_len, _ = hidden_states.shape
+-++++
+-++++#         if self.config.pretraining_tp > 1:
+-++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++++
+-++++#         query_states = self.q_proj(hidden_states)
+-++++#         key_states = self.k_proj(hidden_states)
+-++++#         value_states = self.v_proj(hidden_states)
+-++++
+-++++#         # Reshape for multi-head attention
+-++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++#         kv_seq_len = key_states.shape[-2]
+-++++#         if past_key_value is not None:
+-++++#             if self.layer_idx is None:
+-++++#                 raise ValueError(
+-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++#                     "with a layer index."
+-++++#                 )
+-++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++        
+-++++#         # Apply Rotary Positional Embedding
+-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++#         if past_key_value is not None:
+-++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++
+-++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++        
+-++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++        
+-++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++
+-++++#         # Convert attention_mask for flash_attention_score
+-++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-++++#         if attention_mask is not None:
+-++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++++#                 raise ValueError(
+-++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++++#                 )
+-++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-++++#         else:
+-++++#             attn_mask_for_fa = None
+-++++        
+-++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++++
+-++++#         # Call the fused flash_attention_score operator
+-++++#         attn_output = mindspore.ops.flash_attention_score(
+-++++#             query=query_states_for_fa,
+-++++#             key=key_states_for_fa,
+-++++#             value=value_states_for_fa,
+-++++#             head_num=self.num_heads, # This is N1, the number of query heads
+-++++#             input_layout='BSH',
+-++++#             attn_mask=attn_mask_for_fa,
+-++++#             keep_prob=keep_prob,
+-++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++#             sparse_mode=0 # Default mask mode
+-++++#         )
+-++++        
+-++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-++++#         attn_output = self.o_proj(attn_output)
+-++++        
+-++++#         # Flash Attention does not return attention weights
+-++++#         attn_weights = None
+-++++
+-++++#         return attn_output, attn_weights, past_key_value
+-++++
+-++++class DeepseekFlashAttention(nn.Module):
+-++++    """
+-++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-++++    designed for high performance on supported hardware (Ascend).
+-++++
+-++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-++++    """
+-++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++++        super().__init__()
+-++++        self.config = config
+-++++        self.layer_idx = layer_idx
+-++++        if layer_idx is None:
+-++++            logger.warning(
+-++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++                "when creating this class."
+-++++            )
+-++++
+-++++        # --- [FIX] Correctly initialize all required attributes ---
+-++++        self.attention_dropout = config.attention_dropout
+-++++        self.hidden_size = config.hidden_size
+-++++        self.num_heads = config.num_attention_heads
+-++++        self.head_dim = self.hidden_size // self.num_heads
+-++++        self.num_key_value_heads = config.num_key_value_heads
+-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++        self.max_position_embeddings = config.max_position_embeddings
+-++++        self.rope_theta = config.rope_theta
+-++++        self.is_causal = True
+-++++
+-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++            raise ValueError(
+-++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++                f" and `num_heads`: {self.num_heads})."
+-++++            )
+-++++
+-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++++        
+-++++        # This call will now succeed as all attributes are initialized.
+-++++        self._init_rope()
+-++++
+-++++    def _init_rope(self):
+-++++        if self.config.rope_scaling is None:
+-++++            self.rotary_emb = DeepseekRotaryEmbedding(
+-++++                self.head_dim,
+-++++                max_position_embeddings=self.max_position_embeddings,
+-++++                base=self.rope_theta,
+-++++            )
+-++++        else:
+-++++            scaling_type = self.config.rope_scaling["type"]
+-++++            scaling_factor = self.config.rope_scaling["factor"]
+-++++            if scaling_type == "linear":
+-++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++++                    self.head_dim,
+-++++                    max_position_embeddings=self.max_position_embeddings,
+-++++                    scaling_factor=scaling_factor,
+-++++                    base=self.rope_theta,
+-++++                )
+-++++            elif scaling_type == "dynamic":
+-++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++++                    self.head_dim,
+-++++                    max_position_embeddings=self.max_position_embeddings,
+-++++                    scaling_factor=scaling_factor,
+-++++                    base=self.rope_theta,
+-++++                )
+-++++            else:
+-++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++++
+-++++    def forward(
+-++++        self,
+-++++        hidden_states: mindspore.Tensor,
+-++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++        past_key_value: Optional[Cache] = None,
+-++++        output_attentions: bool = False,
+-++++        use_cache: bool = False,
+-++++        **kwargs,
+-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++        if "padding_mask" in kwargs:
+-++++            warnings.warn(
+-++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++++            )
+-++++        if output_attentions:
+-++++            warnings.warn(
+-++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-++++            )
+-++++
+-++++        bsz, q_len, _ = hidden_states.shape
+-++++
+-++++        if self.config.pretraining_tp > 1:
+-++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++++
+-++++        query_states = self.q_proj(hidden_states)
+-++++        key_states = self.k_proj(hidden_states)
+-++++        value_states = self.v_proj(hidden_states)
+-++++
+-++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++        kv_seq_len = key_states.shape[-2]
+-++++        if past_key_value is not None:
+-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++        
+-++++        # Apply Rotary Position Embedding
+-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++        if past_key_value is not None:
+-++++            cache_kwargs = {"sin": sin, "cos": cos}
+-++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++
+-++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-++++        # So we must explicitly repeat the KV heads.
+-++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++
+-++++        # Convert attention mask for flash_attention_score
+-++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-++++        if attention_mask is not None:
+-++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++++                 raise ValueError(
+-++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++++                )
+-++++            attn_mask_for_fa = attention_mask < 0
+-++++        else:
+-++++            attn_mask_for_fa = None
+-++++
+-++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++++
+-++++        # Call the fused operator using the efficient BNSD layout
+-++++        attn_output = mindspore.ops.flash_attention_score(
+-++++            query=query_states,
+-++++            key=key_states,
+-++++            value=value_states,
+-++++            head_num=self.num_heads,
+-++++            input_layout='BNSD', # Specify the correct layout
+-++++            attn_mask=attn_mask_for_fa,
+-++++            keep_prob=keep_prob,
+-++++            scalar_value=1.0 / math.sqrt(self.head_dim)
+-++++        )
+-++++        
+-++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++        
+-++++        # Apply output projection
+-++++        attn_output = self.o_proj(attn_output)
+-++++
+-++++        # Flash attention does not return attention weights, so we return None.
+-++++        attn_weights = None
+-++++
+-++++        return attn_output, attn_weights, past_key_value
+-++++
+-+++ Deepseek_ATTENTION_CLASSES = {
+-+++     "eager": DeepseekAttention,
+-++++    "flash-attention": DeepseekFlashAttention,
+-+++ }
+-+++ 
+-+++ 
+-+++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-+++             config=config, layer_idx=layer_idx
+-+++         )
+-+++ 
+-++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-++++            config=config, layer_idx=layer_idx
+-++++        )
+-++++
+-+++         self.mlp = (
+-+++             DeepseekMoE(config)
+-+++             if (
+-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++index d4c6b651..bced285c 100644
+-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+-+++ 
+-+++ import mindspore
+-+++ import mindnlp.core.nn.functional as F
+-+++-from mindnlp.core import nn, ops
+-++++from mindnlp.core import nn, ops, no_grad
+-+++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+-+++ 
+-+++ from ....common.activations import ACT2FN
+-+++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+-+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+++ 
+-++++Long_Prompt = False
+-++++PROMPT_LENGTH_THRESHOLD = 128
+-+++ 
+-+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++ 
+-++++# class Qwen2MoeFlashAttention(nn.Module):
+-++++#     """
+-++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++
+-++++#     关键改动:
+-++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++#        直接传入原始的 key 和 value 张量效率更高。
+-++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++#         super().__init__()
+-++++#         self.config = config
+-++++#         self.layer_idx = layer_idx
+-++++#         self.hidden_size = config.hidden_size
+-++++#         self.num_heads = config.num_attention_heads
+-++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++#         self.rope_theta = config.rope_theta
+-++++#         self.attention_dropout = config.attention_dropout
+-++++
+-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++#             raise ValueError(
+-++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++#             )
+-++++
+-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++
+-++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++#             self.head_dim,
+-++++#             max_position_embeddings=self.max_position_embeddings,
+-++++#             base=self.rope_theta,
+-++++#         )
+-++++
+-++++#     def forward(
+-++++#         self,
+-++++#         hidden_states: mindspore.Tensor,
+-++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++#         past_key_value: Optional[Cache] = None,
+-++++#         output_attentions: bool = False,
+-++++#         use_cache: bool = False,
+-++++#         cache_position: Optional[mindspore.Tensor] = None,
+-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++#         bsz, q_len, _ = hidden_states.shape
+-++++
+-++++#         # 1. 线性投射 Q, K, V
+-++++#         query_states = self.q_proj(hidden_states)
+-++++#         key_states = self.k_proj(hidden_states)
+-++++#         value_states = self.v_proj(hidden_states)
+-++++
+-++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++#         # 3. RoPE 旋转位置编码
+-++++#         kv_seq_len = key_states.shape[-2]
+-++++#         if past_key_value is not None:
+-++++#             if self.layer_idx is None:
+-++++#                 raise ValueError(
+-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++#                     "with a layer index."
+-++++#                 )
+-++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++#                 if cache_position.shape[0] == 1:
+-++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++#                     kv_seq_len = past_seen_tokens + 1
+-++++#                 else:
+-++++#                     # prefill 阶段：cache_position 是范围，使用其长度
+-++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++#             else:
+-++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++#         # 4. KV 缓存更新
+-++++#         if past_key_value is not None:
+-++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++#             key_states, value_states = past_key_value.update(
+-++++#                 key_states, value_states, self.layer_idx, cache_kwargs
+-++++#             )
+-++++            
+-++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++#                 if cache_position.shape[0] == 1:
+-++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++#                     kv_seq_len = key_states.shape[-2]
+-++++
+-++++#         # 5. [重要] 准备 Attention Mask
+-++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++#         fa_attention_mask = None
+-++++#         if attention_mask is not None:
+-++++#             # 截取与当前key长度匹配的部分
+-++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++#             fa_attention_mask = (mask_slice != 0)
+-++++
+-++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++#         input_dtype = query_states.dtype
+-++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++#             query_states = query_states.to(mindspore.float16)
+-++++#             key_states = key_states.to(mindspore.float16)
+-++++#             value_states = value_states.to(mindspore.float16)
+-++++
+-++++#         # 6. [核心] 调用 flash_attention_score 算子
+-++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++#         attn_output = mindspore.ops.flash_attention_score(
+-++++#             query=query_states,
+-++++#             key=key_states,
+-++++#             value=value_states,
+-++++#             head_num=self.num_heads, # 传入Q的头数(N1)
+-++++#             attn_mask=fa_attention_mask,
+-++++#             keep_prob=1.0 - self.attention_dropout,
+-++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++#             input_layout="BNSD",
+-++++#             sparse_mode=0 # 使用 defaultMask 模式
+-++++#         )
+-++++
+-++++#         # 恢复原始数据类型
+-++++#         attn_output = attn_output.to(input_dtype)
+-++++
+-++++#         # 7. 调整输出形状
+-++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++#         attn_output = self.o_proj(attn_output)
+-++++
+-++++#         # FlashAttention 算子不直接返回注意力权重矩阵
+-++++#         attn_weights = None
+-++++#         if output_attentions:
+-++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++#         return attn_output, attn_weights, past_key_value
+-++++
+-++++#     # def forward(
+-++++#     #     self,
+-++++#     #     hidden_states: mindspore.Tensor,
+-++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++#     #     position_ids: Optional[mindspore.Tensor] = None,
+-++++#     #     past_key_value: Optional[Cache] = None,
+-++++#     #     output_attentions: bool = False,
+-++++#     #     use_cache: bool = False,
+-++++#     #     cache_position: Optional[mindspore.Tensor] = None,
+-++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++#     #     bsz, q_len, _ = hidden_states.shape
+-++++
+-++++#     #     # 1. 线性投射 Q, K, V
+-++++#     #     query_states = self.q_proj(hidden_states)
+-++++#     #     key_states = self.k_proj(hidden_states)
+-++++#     #     value_states = self.v_proj(hidden_states)
+-++++
+-++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++
+-++++#     #     # 3. RoPE 旋转位置编码
+-++++#     #     kv_seq_len = key_states.shape[-2]
+-++++#     #     if past_key_value is not None:
+-++++#     #         if self.layer_idx is None:
+-++++#     #             raise ValueError(
+-++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++#     #                 "with a layer index."
+-++++#     #             )
+-++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++
+-++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++#     #     # 4. KV 缓存更新
+-++++#     #     if past_key_value is not None:
+-++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++#     #         key_states, value_states = past_key_value.update(
+-++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++#     #         )
+-++++
+-++++#     #     # 5. 准备 Attention Mask
+-++++#     #     fa_attention_mask = None
+-++++#     #     if attention_mask is not None:
+-++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++#     #         fa_attention_mask = (mask_slice != 0)
+-++++
+-++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++#     #     input_dtype = query_states.dtype
+-++++
+-++++#     #     # 6. [核心] 调用 flash_attention_score 算子
+-++++#     #     attn_output = mindspore.ops.flash_attention_score(
+-++++#     #         query=query_states,
+-++++#     #         key=key_states,
+-++++#     #         value=value_states,
+-++++#     #         head_num=self.num_heads,
+-++++#     #         attn_mask=fa_attention_mask,
+-++++#     #         keep_prob=1.0 - self.attention_dropout,
+-++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++#     #         input_layout="BNSD",
+-++++#     #         sparse_mode=0,
+-++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++#     #         inner_precise=1
+-++++#     #     )
+-++++
+-++++#     #     # 恢复原始数据类型
+-++++#     #     attn_output = attn_output.to(input_dtype)
+-++++
+-++++#     #     # 7. 调整输出形状
+-++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++#     #     attn_output = self.o_proj(attn_output)
+-++++
+-++++#     #     attn_weights = None
+-++++#     #     if output_attentions:
+-++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++
+-++++#     #     return attn_output, attn_weights, past_key_value
+-++++
+-++++
+-+++ class Qwen2MoeFlashAttention(nn.Module):
+-+++     """
+-+++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++-
+-+++-    关键改动:
+-+++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++-       直接传入原始的 key 和 value 张量效率更高。
+-+++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-++++
+-++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-++++    完全使用模型的低精度数据类型（如 float16）进行计算，
+-++++    以达到理论上的最高执行速度。
+-+++     """
+-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++         super().__init__()
+-+++         self.config = config
+-+++         self.layer_idx = layer_idx
+-++++        if layer_idx is None:
+-++++            logger.warning_once(
+-++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-++++            )
+-++++
+-+++         self.hidden_size = config.hidden_size
+-+++         self.num_heads = config.num_attention_heads
+-+++         self.head_dim = self.hidden_size // self.num_heads
+-+++         self.num_key_value_heads = config.num_key_value_heads
+-+++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++         self.max_position_embeddings = config.max_position_embeddings
+-+++         self.rope_theta = config.rope_theta
+-+++         self.attention_dropout = config.attention_dropout
+-+++ 
+-+++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++-            raise ValueError(
+-+++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++-            )
+-+++-
+-+++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-+++         key_states = self.k_proj(hidden_states)
+-+++         value_states = self.v_proj(hidden_states)
+-+++ 
+-+++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++        # 2. 调整形状以匹配 BNSD 布局
+-+++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-
+-+++-        # 3. RoPE 旋转位置编码
+-++++        
+-++++        # 3. RoPE 和 KV 缓存
+-+++         kv_seq_len = key_states.shape[-2]
+-+++         if past_key_value is not None:
+-+++-            if self.layer_idx is None:
+-+++-                raise ValueError(
+-+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-                    "with a layer index."
+-+++-                )
+-+++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++-                if cache_position.shape[0] == 1:
+-+++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++-                    kv_seq_len = past_seen_tokens + 1
+-+++-                else:
+-+++-                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++-            else:
+-+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-
+-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++        
+-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++ 
+-+++-        # 4. KV 缓存更新
+-+++         if past_key_value is not None:
+-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-            key_states, value_states = past_key_value.update(
+-+++-                key_states, value_states, self.layer_idx, cache_kwargs
+-+++-            )
+-+++-            
+-+++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++-                if cache_position.shape[0] == 1:
+-+++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++-                    kv_seq_len = key_states.shape[-2]
+-+++-
+-+++-        # 5. [重要] 准备 Attention Mask
+-+++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++
+-++++        # 4. 准备 Attention Mask
+-+++         fa_attention_mask = None
+-+++         if attention_mask is not None:
+-+++-            # 截取与当前key长度匹配的部分
+-+++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++             fa_attention_mask = (mask_slice != 0)
+-+++ 
+-+++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++-        input_dtype = query_states.dtype
+-+++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++-            query_states = query_states.to(mindspore.float16)
+-+++-            key_states = key_states.to(mindspore.float16)
+-+++-            value_states = value_states.to(mindspore.float16)
+-+++-
+-+++-        # 6. [核心] 调用 flash_attention_score 算子
+-+++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-+++         attn_output = mindspore.ops.flash_attention_score(
+-+++             query=query_states,
+-+++             key=key_states,
+-+++             value=value_states,
+-+++-            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++            head_num=self.num_heads,
+-+++             attn_mask=fa_attention_mask,
+-+++-            keep_prob=1.0 - self.attention_dropout,
+-++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-+++             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++             input_layout="BNSD",
+-+++-            sparse_mode=0 # 使用 defaultMask 模式
+-++++            sparse_mode=0,
+-++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-+++         )
+-+++ 
+-+++-        # 恢复原始数据类型
+-+++-        attn_output = attn_output.to(input_dtype)
+-+++-
+-+++-        # 7. 调整输出形状
+-+++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++        # 6. 调整输出形状
+-+++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++         attn_output = self.o_proj(attn_output)
+-+++ 
+-+++-        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++        # 7. 返回结果
+-+++         attn_weights = None
+-+++         if output_attentions:
+-+++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++-    # def forward(
+-+++-    #     self,
+-+++-    #     hidden_states: mindspore.Tensor,
+-+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++-    #     past_key_value: Optional[Cache] = None,
+-+++-    #     output_attentions: bool = False,
+-+++-    #     use_cache: bool = False,
+-+++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-
+-+++-    #     bsz, q_len, _ = hidden_states.shape
+-+++-
+-+++-    #     # 1. 线性投射 Q, K, V
+-+++-    #     query_states = self.q_proj(hidden_states)
+-+++-    #     key_states = self.k_proj(hidden_states)
+-+++-    #     value_states = self.v_proj(hidden_states)
+-+++-
+-+++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-
+-+++-    #     # 3. RoPE 旋转位置编码
+-+++-    #     kv_seq_len = key_states.shape[-2]
+-+++-    #     if past_key_value is not None:
+-+++-    #         if self.layer_idx is None:
+-+++-    #             raise ValueError(
+-+++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-    #                 "with a layer index."
+-+++-    #             )
+-+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++ 
+-+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-
+-+++-    #     # 4. KV 缓存更新
+-+++-    #     if past_key_value is not None:
+-+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-    #         key_states, value_states = past_key_value.update(
+-+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++-    #         )
+-+++-
+-+++-    #     # 5. 准备 Attention Mask
+-+++-    #     fa_attention_mask = None
+-+++-    #     if attention_mask is not None:
+-+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-    #         fa_attention_mask = (mask_slice != 0)
+-+++-
+-+++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++-    #     input_dtype = query_states.dtype
+-+++-
+-+++-    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++-    #     attn_output = mindspore.ops.flash_attention_score(
+-+++-    #         query=query_states,
+-+++-    #         key=key_states,
+-+++-    #         value=value_states,
+-+++-    #         head_num=self.num_heads,
+-+++-    #         attn_mask=fa_attention_mask,
+-+++-    #         keep_prob=1.0 - self.attention_dropout,
+-+++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-    #         input_layout="BNSD",
+-+++-    #         sparse_mode=0,
+-+++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++-    #         inner_precise=1
+-+++-    #     )
+-+++-
+-+++-    #     # 恢复原始数据类型
+-+++-    #     attn_output = attn_output.to(input_dtype)
+-++++QWEN2MOE_ATTENTION_CLASSES = {
+-++++    "eager": Qwen2MoeAttention,
+-++++    "flash-attention": Qwen2MoeFlashAttention,
+-++++}
+-+++ 
+-+++-    #     # 7. 调整输出形状
+-+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-    #     attn_output = self.o_proj(attn_output)
+-+++ 
+-+++-    #     attn_weights = None
+-+++-    #     if output_attentions:
+-+++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     def __init__(self, config):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         # gating
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     #@dwj
+-++++#     # 只遍历激活的专家，而非全部专家
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#             num_tokens = hidden_states_reshaped.shape[0]
+-++++
+-++++#             router_logits = self.gate(hidden_states_reshaped)
+-++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++
+-++++#             if self.norm_topk_prob:
+-++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++#             routing_weights = routing_weights.to(hidden_states.dtype)
+-++++            
+-++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++#             flat_selected_experts = selected_experts.flatten()
+-++++            
+-++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++#             token_indices = broadcasted_token_indices.flatten()
+-++++            
+-++++#             active_experts = ops.unique(flat_selected_experts)
+-++++            
+-++++#             for expert_idx_tensor in active_experts:
+-++++#                 expert_idx = expert_idx_tensor.item()
+-++++#                 expert_layer = self.experts[expert_idx]
+-++++                
+-++++#                 mask = (flat_selected_experts == expert_idx_tensor)
+-++++#                 selected_token_indices = token_indices[mask]
+-++++#                 selected_routing_weights = routing_weights.flatten()[mask]
+-++++                
+-++++#                 current_states = hidden_states_reshaped[selected_token_indices]
+-++++                
+-++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++                
+-++++#                 final_hidden_states = final_hidden_states.index_add(
+-++++#                     dim=0,
+-++++#                     index=selected_token_indices,
+-++++#                     source=expert_output.to(hidden_states.dtype)
+-++++#                 )
+-++++            
+-++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++ 
+-+++-    #     return attn_output, attn_weights, past_key_value
+-++++#             final_hidden_states = final_hidden_states + shared_expert_output
+-++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++            
+-++++#             return final_hidden_states, router_logits
+-++++
+-++++
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     """
+-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         # 门控网络
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         # 专家列表
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++#         # 共享专家
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_decode(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         """
+-++++#         【解码路径】针对 sequence_length=1 的极致优化。
+-++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-++++#         """
+-++++#         batch_size, hidden_dim = hidden_states.shape
+-++++        
+-++++#         expert_outputs_list = [
+-++++#             ops.cat([
+-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++#             ], dim=0) 
+-++++#             for i in range(batch_size)
+-++++#         ]
+-++++        
+-++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-++++#         # shape: (batch_size, top_k, hidden_dim)
+-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++        
+-++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++        
+-++++#         return moe_output.squeeze(1)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_prefill(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         """
+-++++#         【预填充路径】针对 sequence_length > 1 的优化。
+-++++#         按专家对 Token 进行分组，并进行批处理。
+-++++#         """
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens = hidden_states.shape[0]
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++        
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++        
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++        
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++            
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++#             selected_token_indices = token_indices[mask]
+-++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++            
+-++++#             current_states = hidden_states[selected_token_indices]
+-++++            
+-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++            
+-++++#             moe_output = moe_output.index_add(
+-++++#                 dim=0,
+-++++#                 index=selected_token_indices,
+-++++#                 source=expert_output.to(hidden_states.dtype)
+-++++#             )
+-++++#         return moe_output
+-++++
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#         """
+-++++#         顶层 forward 方法，作为智能分发器。
+-++++#         """
+-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++        
+-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++ 
+-+++-    # def forward(
+-+++-    #     self,
+-+++-    #     hidden_states: mindspore.Tensor,
+-+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++-    #     past_key_value: Optional[Cache] = None,
+-+++-    #     output_attentions: bool = False,
+-+++-    #     use_cache: bool = False,
+-+++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-
+-+++-    #     bsz, q_len, _ = hidden_states.shape
+-+++-
+-+++-    #     query_states = self.q_proj(hidden_states)
+-+++-    #     key_states = self.k_proj(hidden_states)
+-+++-    #     value_states = self.v_proj(hidden_states)
+-+++-
+-+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-
+-+++-    #     kv_seq_len = key_states.shape[-2]
+-+++-    #     if past_key_value is not None:
+-+++-    #         if self.layer_idx is None:
+-+++-    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-
+-+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-
+-+++-    #     if past_key_value is not None:
+-+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-    #         key_states, value_states = past_key_value.update(
+-+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++-    #         )
+-++++#         if self.norm_topk_prob:
+-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++        
+-++++#         moe_output = None
+-++++#         # 在推理时，根据序列长度选择最优路径
+-++++#         if not self.training:
+-++++#             if sequence_length == 1:
+-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++#             else:
+-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++#         else:
+-++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-++++#             raise NotImplementedError("Training path is not implemented.")
+-++++
+-++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-++++        
+-++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-++++        
+-++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++#         return final_hidden_states, router_logits
+-++++
+-++++
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     """
+-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         # 门控网络
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         # 专家列表
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++#         # 共享专家
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_decode(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         batch_size, _ = hidden_states.shape
+-++++#         expert_outputs_list = [
+-++++#             ops.cat([
+-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++#             ], dim=0) 
+-++++#             for i in range(batch_size)
+-++++#         ]
+-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++#         return moe_output.squeeze(1)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_prefill(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens = hidden_states.shape[0]
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++        
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++#             selected_token_indices = token_indices[mask]
+-++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++#             current_states = hidden_states[selected_token_indices]
+-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++#             moe_output = moe_output.index_add(
+-++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++#             )
+-++++#         return moe_output
+-++++
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#         """
+-++++#         顶层 forward 方法，作为智能分发器。
+-++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-++++#         """
+-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++        
+-++++#         # 1. 门控计算 (通用逻辑)
+-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++
+-++++#         if self.norm_topk_prob:
+-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++        
+-++++#         # 2. 智能分发到最优 MoE 路径
+-++++#         moe_output = None
+-++++#         if not self.training:
+-++++#             if sequence_length == 1:
+-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++#             else:
+-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++#         else:
+-++++#             raise NotImplementedError("Training path is not implemented.")
+-++++
+-++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++        
+-++++#         # 4. 合并 MoE 输出和共享专家输出
+-++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++        
+-++++#         # 5. 恢复原始形状并返回
+-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++#         return final_hidden_states, router_logits
+-++++
+-++++# prefill fastest
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     """
+-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         # 门控网络
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         # 专家列表
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++#         # 共享专家
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_dispatch(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         """
+-++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-++++#         """
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens, _ = hidden_states.shape
+-++++        
+-++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++#         flat_routing_weights = routing_weights.flatten()
+-+++ 
+-+++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++-
+-+++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++-    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++-    #     # <--- 修改结束 ---
+-+++-
+-+++-    #     fa_attention_mask = None
+-+++-    #     if attention_mask is not None:
+-+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-    #         fa_attention_mask = (mask_slice != 0)
+-+++-
+-+++-    #     input_dtype = query_states.dtype
+-+++-
+-+++-    #     attn_output = mindspore.ops.flash_attention_score(
+-+++-    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++-    #         key=key_states,
+-+++-    #         value=value_states,
+-+++-    #         head_num=self.num_heads,
+-+++-    #         attn_mask=fa_attention_mask,
+-+++-    #         keep_prob=1.0 - self.attention_dropout,
+-+++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++-    #         input_layout="BNSD",
+-+++-    #         sparse_mode=0,
+-+++-    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++-    #     )
+-++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++ 
+-+++-    #     attn_output = attn_output.to(input_dtype)
+-+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-    #     attn_output = self.o_proj(attn_output)
+-++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++        
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++            
+-++++#             # 找到所有分配给该专家的 token
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++            
+-++++#             # 使用 mask 选取对应的 token 和权重
+-++++#             current_token_indices = token_indices[mask]
+-++++#             current_routing_weights = flat_routing_weights[mask]
+-++++#             current_hidden_states = hidden_states[current_token_indices]
+-++++            
+-++++#             # 对这些 token 进行批处理
+-++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++            
+-++++#             # 使用 index_add 将结果精确地加回到对应位置
+-++++#             moe_output = moe_output.index_add(
+-++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++#             )
+-++++#         return moe_output
+-++++
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#         """
+-++++#         顶层 forward 方法，作为智能分发器。
+-++++#         """
+-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++        
+-++++#         # 1. 门控计算
+-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++
+-++++#         if self.norm_topk_prob:
+-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++        
+-++++#         # 2. 调用统一的 MoE 计算内核
+-++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+++ 
+-+++-    #     attn_weights = None
+-+++-    #     if output_attentions:
+-+++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++#         # 3. 统一处理共享专家
+-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++        
+-++++#         # 4. 合并输出
+-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++        
+-++++#         # 5. 恢复原始形状并返回
+-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++#         return final_hidden_states, router_logits
+-++++
+-++++
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     """
+-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++#     【最终高性能与高精度版】：
+-++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-++++#     3. 这样实现了速度和准确性的两全其美。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_decode(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         """
+-++++#         【解码路径】极致优化版：bmm + 高精度累加。
+-++++#         """
+-++++#         original_dtype = hidden_states.dtype
+-++++#         batch_size, _ = hidden_states.shape
+-++++        
+-++++#         expert_outputs_list = [
+-++++#             ops.cat([
+-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++#             ], dim=0) 
+-++++#             for i in range(batch_size)
+-++++#         ]
+-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++
+-++++#         # 在 float32 下执行 bmm，得到高精度结果
+-++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++        
+-++++#         # 将高精度结果转换回原始数据类型
+-++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-++++        
+-++++#         return moe_output
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_prefill(
+-++++#         self, 
+-++++#         hidden_states: mindspore.Tensor, 
+-++++#         selected_experts: mindspore.Tensor, 
+-++++#         routing_weights: mindspore.Tensor
+-++++#     ) -> mindspore.Tensor:
+-++++#         """
+-++++#         【预填充路径】与原始实现一致，结果精确。
+-++++#         """
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens, _ = hidden_states.shape
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++        
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++#             selected_token_indices = token_indices[mask]
+-++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++#             current_states = hidden_states[selected_token_indices]
+-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++#             moe_output = moe_output.index_add(
+-++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++#             )
+-++++#         return moe_output
+-++++
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++        
+-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++ 
+-+++-    #     return attn_output, attn_weights, past_key_value
+-++++#         if self.norm_topk_prob:
+-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-++++#         # 如果模型主体是 float16，后续再转换
+-++++        
+-++++#         moe_output = None
+-++++#         if not self.training:
+-++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-++++#             # _moe_infer_decode 内部会处理好类型转换
+-++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-++++#             if sequence_length == 1:
+-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++#             else:
+-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++#         else:
+-++++#             raise NotImplementedError("Training path is not implemented.")
+-++++
+-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++        
+-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++#         return final_hidden_states, router_logits
+-++++    
+-+++ 
+-+++-QWEN2MOE_ATTENTION_CLASSES = {
+-+++-    "eager": Qwen2MoeAttention,
+-+++-    "flash-attention": Qwen2MoeFlashAttention,
+-+++-}
+-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++#     """
+-++++#     【融合版】一个混合专家模块，内置两种推理策略，
+-++++#     由外部全局变量 `Long_Prompt` 控制：
+-++++
+-++++#     - if Long_Prompt is True:  【精度优先模式】
+-++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-++++#       适用于处理长序列，避免误差累积。
+-++++
+-++++#     - if Long_Prompt is False: 【速度优先模式】
+-++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-++++#       在解码阶段获得极致速度，同时保证结果高度准确。
+-++++#     """
+-++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++#         super().__init__()
+-++++#         self.num_experts = config.num_experts
+-++++#         self.top_k = config.num_experts_per_tok
+-++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++
+-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++#         self.experts = nn.ModuleList(
+-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++#         )
+-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++
+-++++#     # --- 速度优先模式的辅助函数 ---
+-++++#     @no_grad()
+-++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++#         original_dtype = hidden_states.dtype
+-++++#         batch_size, _ = hidden_states.shape
+-++++#         expert_outputs_list = [
+-++++#             ops.cat([
+-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++#             ], dim=0) 
+-++++#             for i in range(batch_size)
+-++++#         ]
+-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++#         weights_fp32 = routing_weights.to(mindspore.float32)
+-++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++
+-++++#     @no_grad()
+-++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens, _ = hidden_states.shape
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++#             selected_token_indices = token_indices[mask]
+-++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++#             current_states = hidden_states[selected_token_indices]
+-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++#         return moe_output
+-++++
+-++++#     # --- 精度优先模式的辅助函数 ---
+-++++#     @no_grad()
+-++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++#         moe_output = ops.zeros_like(hidden_states)
+-++++#         num_tokens, _ = hidden_states.shape
+-++++#         flat_selected_experts = selected_experts.flatten()
+-++++#         flat_routing_weights = routing_weights.flatten()
+-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++#         active_experts = ops.unique(flat_selected_experts)
+-++++#         for expert_idx_tensor in active_experts:
+-++++#             expert_idx = expert_idx_tensor.item()
+-++++#             expert_layer = self.experts[expert_idx]
+-++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++#             current_token_indices = token_indices[mask]
+-++++#             current_routing_weights = flat_routing_weights[mask]
+-++++#             current_hidden_states = hidden_states[current_token_indices]
+-++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++#         return moe_output
+-++++
+-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++#         # 声明我们将要使用一个在模块外部定义的全局变量
+-++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-++++#         global Long_Prompt
+-++++
+-++++#         # 1. 门控计算 (所有模式通用)
+-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++++#         if self.norm_topk_prob:
+-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++#         moe_output = None
+-++++#         if not self.training:
+-++++#             # 根据 Long_Prompt 标志选择模式
+-++++#             if Long_Prompt:
+-++++#                 # --- 精度优先模式 ---
+-++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++#             else:
+-++++#                 # --- 速度优先模式 ---
+-++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++#                 if sequence_length == 1:
+-++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++#                 else:
+-++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++#         else:
+-++++#             raise NotImplementedError("Training path is not implemented.")
+-++++
+-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++        
+-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++#         return final_hidden_states, router_logits
+-++++    
+-++++class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++    """
+-++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-++++    控制的顶级推理策略：
+-+++ 
+-++++    - if Long_Prompt is True:  【精度优先模式】
+-++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-++++      适用于需要严格可复现性的长序列任务。
+-+++ 
+-+++-class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-    def __init__(self, config):
+-++++    - if Long_Prompt is False: 【速度优先模式】
+-++++      采用业界最强的性能组合：
+-++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-++++    """
+-++++    def __init__(self, config: Qwen2MoeConfig):
+-+++         super().__init__()
+-+++         self.num_experts = config.num_experts
+-+++         self.top_k = config.num_experts_per_tok
+-+++         self.norm_topk_prob = config.norm_topk_prob
+-+++ 
+-+++-        # gating
+-+++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++         self.experts = nn.ModuleList(
+-+++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++         )
+-+++-
+-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++ 
+-+++-    #@dwj
+-+++-    # 只遍历激活的专家，而非全部专家
+-+++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-            num_tokens = hidden_states_reshaped.shape[0]
+-+++-
+-+++-            router_logits = self.gate(hidden_states_reshaped)
+-+++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-
+-+++-            if self.norm_topk_prob:
+-+++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-            
+-+++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++-            flat_selected_experts = selected_experts.flatten()
+-+++-            
+-+++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++-            token_indices = broadcasted_token_indices.flatten()
+-+++-            
+-+++-            active_experts = ops.unique(flat_selected_experts)
+-+++-            
+-+++-            for expert_idx_tensor in active_experts:
+-+++-                expert_idx = expert_idx_tensor.item()
+-+++-                expert_layer = self.experts[expert_idx]
+-+++-                
+-+++-                mask = (flat_selected_experts == expert_idx_tensor)
+-+++-                selected_token_indices = token_indices[mask]
+-+++-                selected_routing_weights = routing_weights.flatten()[mask]
+-+++-                
+-+++-                current_states = hidden_states_reshaped[selected_token_indices]
+-+++-                
+-+++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-                
+-+++-                final_hidden_states = final_hidden_states.index_add(
+-+++-                    dim=0,
+-+++-                    index=selected_token_indices,
+-+++-                    source=expert_output.to(hidden_states.dtype)
+-+++-                )
+-+++-            
+-+++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-++++    @no_grad()
+-++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++        original_dtype = hidden_states.dtype
+-++++        batch_size, _ = hidden_states.shape
+-++++        expert_outputs_list = [
+-++++            ops.cat([
+-++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++            ], dim=0)
+-++++            for i in range(batch_size)
+-++++        ]
+-++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++        weights_fp32 = routing_weights.to(mindspore.float32)
+-++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++        return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++
+-++++    @no_grad()
+-++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++        num_tokens, _ = hidden_states.shape
+-++++        flat_selected_experts = selected_experts.flatten()
+-++++        sorted_expert_indices = flat_selected_experts.argsort()
+-++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++++        original_token_indices = sorted_expert_indices // self.top_k
+-++++        moe_output = ops.zeros_like(hidden_states)
+-++++        current_token_offset = 0
+-++++        for i in range(self.num_experts):
+-++++            expert_token_count = tokens_per_expert[i] - current_token_offset
+-++++            if expert_token_count == 0:
+-++++                continue
+-++++            end_offset = current_token_offset + expert_token_count
+-++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++++            expert_hidden_states = hidden_states[expert_original_token_indices]
+-++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++            current_token_offset += expert_token_count
+-++++        return moe_output
+-++++
+-++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-++++    @no_grad()
+-++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++        moe_output = ops.zeros_like(hidden_states)
+-++++        num_tokens, _ = hidden_states.shape
+-++++        flat_selected_experts = selected_experts.flatten()
+-++++        flat_routing_weights = routing_weights.flatten()
+-++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++        active_experts = ops.unique(flat_selected_experts)
+-++++        for expert_idx_tensor in active_experts:
+-++++            expert_idx = expert_idx_tensor.item()
+-++++            expert_layer = self.experts[expert_idx]
+-++++            mask = (flat_selected_experts == expert_idx_tensor)
+-++++            current_token_indices = token_indices[mask]
+-++++            current_routing_weights = flat_routing_weights[mask]
+-++++            current_hidden_states = hidden_states[current_token_indices]
+-++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++        return moe_output
+-+++ 
+-+++-            final_hidden_states = final_hidden_states + shared_expert_output
+-+++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++-            
+-+++-            return final_hidden_states, router_logits
+-++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++        global Long_Prompt
+-++++
+-++++        # 1. 门控计算 (所有模式通用)
+-++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++        router_logits = self.gate(hidden_states_reshaped)
+-++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++++        if self.norm_topk_prob:
+-++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++        
+-++++        moe_output = None
+-++++        if Long_Prompt:
+-++++            # --- 精度优先模式 (ACCURACY MODE) ---
+-++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        else:
+-++++            # --- 速度优先模式 (SPEED MODE) ---
+-++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++            if sequence_length == 1:
+-++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++            else:
+-++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        
+-+++ 
+-++++        # 3. 共享专家计算与合并 (所有模式通用)
+-++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++        
+-++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++        
+-++++        return final_hidden_states, router_logits
+-+++ 
+-+++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+++         super().__init__()
+-+++         self.hidden_size = config.hidden_size
+-++++        
+-++++        # if Long_Prompt:
+-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++        # else:
+-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++ 
+-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++ 
+-+++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++-
+-+++         if (layer_idx not in config.mlp_only_layers) and (
+-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++         ):
+-+++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++             self._warmed_up = True
+-+++             self.warmup_moe_model()
+-+++ 
+-++++
+-++++
+-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++         output_router_logits = (
+-+++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-+++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++             router_logits=outputs.router_logits,
+-+++         )
+-+++ 
+-++++    def generate(self, *args, **kwargs):
+-++++        """
+-++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-++++        """
+-++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++++
+-++++        input_ids = kwargs.get("input_ids")
+-++++        if input_ids is None and args:
+-++++            input_ids = args[0]
+-++++
+-++++        if input_ids is not None:
+-++++            prompt_length = input_ids.shape[1]
+-++++            
+-++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-++++                Long_Prompt = True
+-++++            else:
+-++++                Long_Prompt = False
+-++++
+-++++        return super().generate(*args, **kwargs)
+-++++    
+-+++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-+++     def prepare_inputs_for_generation(
+-+++         self,
+-+++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-+++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-+++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-++++        
+-+++         if past_key_values is not None:
+-+++             if inputs_embeds is not None:  # Exception 1
+-+++                 if 0 not in input_ids.shape:
+-+++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++             }
+-+++         )
+-+++         return model_inputs
+-++++
+-+++ # @lwx
+-+++     # def _decode_one_tokens_logits(
+-+++     #     self,
+-+++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-+++             attentions=outputs.attentions,
+-+++         )
+-+++ 
+-++++
+-+++ __all__ = [
+-+++     "Qwen2MoeForCausalLM",
+-+++     "Qwen2MoeModel",
+-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++new file mode 100644
+-+++index 00000000..6dfb5b93
+-+++--- /dev/null
+-++++++ b/patches/0001-20251104commit.patch
+-+++@@ -0,0 +1,1272 @@
+-++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++++Subject: [PATCH] 20251104commit
+-++++
+-++++---
+-++++ mindnlp/transformers/cache_utils.py           |  28 +-
+-++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-++++
+-++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++++index cadd2e04..02f8d4be 100644
+-++++--- a/mindnlp/transformers/cache_utils.py
+-+++++++ b/mindnlp/transformers/cache_utils.py
+-++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++++             # k_out[:, :, cache_position] = key_states
+-++++             # v_out[:, :, cache_position] = value_states
+-++++-            if ON_ORANGE_PI:
+-++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++-            else:
+-++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++-
+-+++++            # if ON_ORANGE_PI:
+-+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++            # else:
+-+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++++            if cache_position.ndim > 1:
+-+++++                cache_position = cache_position.flatten()
+-+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++++                cache_position = cache_position.int()
+-+++++            
+-+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++++            k_out[:, :, cache_position] = key_states
+-+++++            v_out[:, :, cache_position] = value_states
+-+++++            
+-++++         return k_out, v_out
+-++++ 
+-++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++index c695b944..d8303e45 100644
+-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++++ def rotate_half(x):
+-++++     """Rotates half the hidden dims of the input."""
+-++++-    x1 = x[..., : x.shape[-1] // 2]
+-++++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++    # x1 = x[..., : x.shape[-1] // 2]
+-+++++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++     return ops.cat((-x2, x1), dim=-1)
+-++++ 
+-++++ 
+-++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++++         if self.training:
+-++++             raise NotImplementedError("Training is not supported yet.")
+-++++         else:
+-++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++-        if self.config.n_shared_experts is not None:
+-++++-            y = y + self.shared_experts(identity)
+-++++-        return y
+-+++++            # @lwx
+-+++++            if orig_shape[1] == 1:
+-+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++++                y=y.view(*orig_shape)
+-+++++                if self.config.n_shared_experts is not None:
+-+++++                    y = y + self.shared_experts(identity)
+-+++++                return y
+-+++++            else:
+-+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++                if self.config.n_shared_experts is not None:
+-+++++                    y = y + self.shared_experts(identity)
+-+++++                return y
+-+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++        # if self.config.n_shared_experts is not None:
+-+++++        #     y = y + self.shared_experts(identity)
+-+++++        # return y
+-+++++        
+-+++++    @no_grad()
+-+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++
+-+++++        expert_cache = ops.zeros_like(x)
+-+++++        for i in range(self.num_experts_per_tok):
+-+++++            expert_id = flat_expert_indices[i].item()
+-+++++            weight = flat_expert_weights[i].item()
+-+++++            expert = self.experts[expert_id]
+-+++++            expert_out = expert(x)
+-+++++            expert_cache += expert_out * weight
+-+++++        return expert_cache
+-++++ 
+-++++     @no_grad()
+-++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-        # expert_cache = torch.zeros_like(x)
+-++++-        # idxs = flat_expert_indices.argsort()
+-++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++-        # token_idxs = idxs // self.num_experts_per_tok
+-++++-        # for i, end_idx in enumerate(tokens_per_expert):
+-++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++-        #     if start_idx == end_idx:
+-++++-        #         continue
+-++++-        #     expert = self.experts[i]
+-++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-        #     expert_tokens = x[exp_token_idx]
+-++++-        #     expert_out = expert(expert_tokens)
+-++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++-        # return expert_cache
+-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++         expert_cache = ops.zeros_like(x)
+-++++         idxs = flat_expert_indices.argsort()
+-++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++         token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-++++         for i, end_idx in enumerate(tokens_per_expert):
+-++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++             if start_idx == end_idx:
+-++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++++             expert_out = expert(expert_tokens)
+-++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++
+-++++         return expert_cache
+-+++++        
+-+++++    # @no_grad()
+-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++    #     # expert_cache = torch.zeros_like(x)
+-+++++    #     # idxs = flat_expert_indices.argsort()
+-+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++    #     #     if start_idx == end_idx:
+-+++++    #     #         continue
+-+++++    #     #     expert = self.experts[i]
+-+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #     #     expert_tokens = x[exp_token_idx]
+-+++++    #     #     expert_out = expert(expert_tokens)
+-+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++    #     # return expert_cache
+-+++++    #     expert_cache = ops.zeros_like(x)
+-+++++    #     idxs = flat_expert_indices.argsort()
+-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++    #         if start_idx == end_idx:
+-+++++    #             continue
+-+++++    #         expert = self.experts[i]
+-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #         expert_tokens = x[exp_token_idx]
+-+++++    #         expert_out = expert(expert_tokens)
+-+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++
+-+++++    #     return expert_cache
+-+++++    # @no_grad()
+-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++    #     expert_cache = ops.zeros_like(x)
+-+++++
+-+++++    #     # 排序保证顺序一致
+-+++++    #     idxs = flat_expert_indices.argsort()
+-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++    #     # 找出有 token 的专家
+-+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++
+-+++++    #     for i in active_experts.tolist():
+-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++    #         end_idx = tokens_per_expert[i]
+-+++++    #         if start_idx == end_idx:  # 没有 token
+-+++++    #             continue
+-+++++
+-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #         expert_tokens = x[exp_token_idx]
+-+++++    #         expert_out = self.experts[i](expert_tokens)
+-+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++
+-+++++    #         expert_cache = mindspore.mint.scatter_add(
+-+++++    #             expert_cache,
+-+++++    #             0,
+-+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++    #             expert_out
+-+++++    #         )
+-+++++
+-+++++    #     return expert_cache
+-+++++
+-+++++
+-++++ 
+-++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++++ #     """
+-++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++ 
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-+++++        self.warm_up = False
+-+++++
+-+++++    def warmup_moe_model_deep(self):
+-+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++++        test_texts = [
+-+++++            "warmup short",
+-+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++++        ]
+-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++        if tokenizer is None:
+-+++++            from mindnlp.transformers import AutoTokenizer
+-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++            self._warmup_tokenizer = tokenizer
+-+++++
+-+++++        for text in test_texts:
+-+++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++            with mindspore._no_grad():
+-+++++                _ = self(**inputs, use_cache=False)
+-+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++++ 
+-++++     def get_input_embeddings(self):
+-++++         return self.model.embed_tokens
+-++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++         ```"""
+-+++++        if not self.warm_up:
+-+++++            self.warm_up = True
+-+++++            self.warmup_moe_model_deep()
+-+++++
+-++++         output_attentions = (
+-++++             output_attentions
+-++++             if output_attentions is not None
+-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++index 3cbf820e..d4c6b651 100644
+-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++@@ -18,7 +18,6 @@
+-++++ # See the License for the specific language governing permissions and
+-++++ # limitations under the License.
+-++++ """MindSpore Qwen2MoE model."""
+-++++-
+-++++ import math
+-++++ from typing import List, Optional, Tuple, Union
+-++++ 
+-++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++++     TokenClassifierOutput,
+-++++ )
+-++++ from ...modeling_utils import PreTrainedModel
+-+++++from ...generation import GenerationMixin
+-++++ from ....utils import logging
+-++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-++++ 
+-++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++++         self.variance_epsilon = eps
+-++++ 
+-++++     def forward(self, hidden_states):
+-+++++        # @dwj
+-+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++        # @lwx
+-+++++        # if not self.training :
+-+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++         input_dtype = hidden_states.dtype
+-++++         hidden_states = hidden_states.to(mindspore.float32)
+-++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++++@@ -234,6 +239,8 @@ def rotate_half(x):
+-++++     """Rotates half the hidden dims of the input."""
+-++++     x1 = x[..., : x.shape[-1] // 2]
+-++++     x2 = x[..., x.shape[-1] // 2 :]
+-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++     return ops.cat((-x2, x1), dim=-1)
+-++++ 
+-++++ 
+-++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++++         self.config = config
+-++++         self.hidden_size = config.hidden_size
+-++++         self.intermediate_size = intermediate_size
+-+++++        
+-++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++++         self.act_fn = ACT2FN[config.hidden_act]
+-++++ 
+-++++     def forward(self, x):
+-++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++-
+-++++ 
+-+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++        # @lwx 
+-+++++        # gate_up_output = self.gate_up_proj(x)
+-+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++++        # return self.down_proj(swiglu_output)
+-+++++
+-+++++    # def forward(self, x):
+-+++++    #     gate_proj_out = self.gate_proj(x)
+-+++++    #     up_proj_out = self.up_proj(x)
+-+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++++    #     return self.down_proj(swiglu_out)
+-+++++        
+-++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++++     """
+-++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++++         use_cache: bool = False,
+-++++         cache_position: Optional[mindspore.Tensor] = None,
+-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++        
+-+++++
+-++++         bsz, q_len, _ = hidden_states.shape
+-++++ 
+-++++         query_states = self.q_proj(hidden_states)
+-++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++                     "with a layer index."
+-++++                 )
+-++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++            if isinstance(past_key_value, StaticCache):
+-+++++                kv_seq_len = key_states.shape[-2]
+-+++++            else:
+-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++ 
+-++++         if past_key_value is not None:
+-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++            
+-+++++            if isinstance(past_key_value, StaticCache):
+-+++++                kv_seq_len = key_states.shape[-2]
+-++++ 
+-++++         # repeat k/v heads if n_kv_heads < n_heads
+-++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++-
+-+++++        
+-++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++ 
+-++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++++-            raise ValueError(
+-++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++++-                f" {attn_weights.shape}"
+-++++-            )
+-++++-
+-++++-        if attention_mask is not None:  # no matter the length, we just slice it
+-++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++++        if attention_mask is not None:
+-+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++             attn_weights = attn_weights + causal_mask
+-++++ 
+-++++         # upcast attention to fp32
+-++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++ 
+-++++         attn_output = self.o_proj(attn_output)
+-++++-
+-+++++        # @lwx
+-+++++        
+-+++++        # max_seq_len = self.max_position_embeddings  # 2048
+-+++++
+-+++++        # if attention_mask is not None:
+-+++++        #     # attention_mask: [B, 1, Sq, Sk]
+-+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++
+-+++++        #     # pad 到 [max_seq_len, max_seq_len]
+-+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++        #     global_attention_mask = padded_mask
+-+++++        # else:
+-+++++        #     global_attention_mask = None
+-+++++
+-+++++
+-+++++        # sparse_mode=3
+-+++++        # attn_output = mindspore.ops.flash_attention_score(
+-+++++        #     query=query_states,
+-+++++        #     key=key_states,
+-+++++        #     value=value_states,
+-+++++        #     real_shift=None,
+-+++++        #     padding_mask=None,
+-+++++
+-+++++        #     head_num=self.num_heads,
+-+++++        #     attn_mask=global_attention_mask,
+-+++++        #     keep_prob=1.0 - self.attention_dropout,
+-+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++        #     input_layout="BNSD",
+-+++++        #     pre_tokens=2147483647,
+-+++++        #     next_tokens=2147483647,
+-+++++        #     inner_precise=0,
+-+++++        #     drop_mask=None,
+-+++++        #     prefix=None,
+-+++++        #     actual_seq_qlen=None,
+-+++++        #     actual_seq_kvlen=None,
+-+++++        #     sparse_mode=sparse_mode,
+-+++++        # )
+-++++         if not output_attentions:
+-++++             attn_weights = None
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++ 
+-+++++class Qwen2MoeFlashAttention(nn.Module):
+-+++++    """
+-+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++
+-+++++    关键改动:
+-+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++       直接传入原始的 key 和 value 张量效率更高。
+-+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++    """
+-+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++        super().__init__()
+-+++++        self.config = config
+-+++++        self.layer_idx = layer_idx
+-+++++        self.hidden_size = config.hidden_size
+-+++++        self.num_heads = config.num_attention_heads
+-+++++        self.head_dim = self.hidden_size // self.num_heads
+-+++++        self.num_key_value_heads = config.num_key_value_heads
+-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++        self.max_position_embeddings = config.max_position_embeddings
+-+++++        self.rope_theta = config.rope_theta
+-+++++        self.attention_dropout = config.attention_dropout
+-+++++
+-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++            raise ValueError(
+-+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++            )
+-+++++
+-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++
+-+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++            self.head_dim,
+-+++++            max_position_embeddings=self.max_position_embeddings,
+-+++++            base=self.rope_theta,
+-+++++        )
+-+++++
+-+++++    def forward(
+-+++++        self,
+-+++++        hidden_states: mindspore.Tensor,
+-+++++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++++        position_ids: Optional[mindspore.Tensor] = None,
+-+++++        past_key_value: Optional[Cache] = None,
+-+++++        output_attentions: bool = False,
+-+++++        use_cache: bool = False,
+-+++++        cache_position: Optional[mindspore.Tensor] = None,
+-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++        bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++        # 1. 线性投射 Q, K, V
+-+++++        query_states = self.q_proj(hidden_states)
+-+++++        key_states = self.k_proj(hidden_states)
+-+++++        value_states = self.v_proj(hidden_states)
+-+++++
+-+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++        # 3. RoPE 旋转位置编码
+-+++++        kv_seq_len = key_states.shape[-2]
+-+++++        if past_key_value is not None:
+-+++++            if self.layer_idx is None:
+-+++++                raise ValueError(
+-+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++                    "with a layer index."
+-+++++                )
+-+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++                if cache_position.shape[0] == 1:
+-+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++                    kv_seq_len = past_seen_tokens + 1
+-+++++                else:
+-+++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++            else:
+-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++        # 4. KV 缓存更新
+-+++++        if past_key_value is not None:
+-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++            key_states, value_states = past_key_value.update(
+-+++++                key_states, value_states, self.layer_idx, cache_kwargs
+-+++++            )
+-+++++            
+-+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++                if cache_position.shape[0] == 1:
+-+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++                    kv_seq_len = key_states.shape[-2]
+-+++++
+-+++++        # 5. [重要] 准备 Attention Mask
+-+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++        fa_attention_mask = None
+-+++++        if attention_mask is not None:
+-+++++            # 截取与当前key长度匹配的部分
+-+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++            fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++        input_dtype = query_states.dtype
+-+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++            query_states = query_states.to(mindspore.float16)
+-+++++            key_states = key_states.to(mindspore.float16)
+-+++++            value_states = value_states.to(mindspore.float16)
+-+++++
+-+++++        # 6. [核心] 调用 flash_attention_score 算子
+-+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++        attn_output = mindspore.ops.flash_attention_score(
+-+++++            query=query_states,
+-+++++            key=key_states,
+-+++++            value=value_states,
+-+++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++            attn_mask=fa_attention_mask,
+-+++++            keep_prob=1.0 - self.attention_dropout,
+-+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++            input_layout="BNSD",
+-+++++            sparse_mode=0 # 使用 defaultMask 模式
+-+++++        )
+-+++++
+-+++++        # 恢复原始数据类型
+-+++++        attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++        # 7. 调整输出形状
+-+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++        attn_output = self.o_proj(attn_output)
+-+++++
+-+++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++        attn_weights = None
+-+++++        if output_attentions:
+-+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++        return attn_output, attn_weights, past_key_value
+-+++++
+-+++++    # def forward(
+-+++++    #     self,
+-+++++    #     hidden_states: mindspore.Tensor,
+-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++    #     past_key_value: Optional[Cache] = None,
+-+++++    #     output_attentions: bool = False,
+-+++++    #     use_cache: bool = False,
+-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++    #     # 1. 线性投射 Q, K, V
+-+++++    #     query_states = self.q_proj(hidden_states)
+-+++++    #     key_states = self.k_proj(hidden_states)
+-+++++    #     value_states = self.v_proj(hidden_states)
+-+++++
+-+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++    #     # 3. RoPE 旋转位置编码
+-+++++    #     kv_seq_len = key_states.shape[-2]
+-+++++    #     if past_key_value is not None:
+-+++++    #         if self.layer_idx is None:
+-+++++    #             raise ValueError(
+-+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++    #                 "with a layer index."
+-+++++    #             )
+-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++    #     # 4. KV 缓存更新
+-+++++    #     if past_key_value is not None:
+-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++    #         key_states, value_states = past_key_value.update(
+-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++    #         )
+-+++++
+-+++++    #     # 5. 准备 Attention Mask
+-+++++    #     fa_attention_mask = None
+-+++++    #     if attention_mask is not None:
+-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++    #     input_dtype = query_states.dtype
+-+++++
+-+++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++    #         query=query_states,
+-+++++    #         key=key_states,
+-+++++    #         value=value_states,
+-+++++    #         head_num=self.num_heads,
+-+++++    #         attn_mask=fa_attention_mask,
+-+++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++    #         input_layout="BNSD",
+-+++++    #         sparse_mode=0,
+-+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++    #         inner_precise=1
+-+++++    #     )
+-+++++
+-+++++    #     # 恢复原始数据类型
+-+++++    #     attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++    #     # 7. 调整输出形状
+-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++    #     attn_output = self.o_proj(attn_output)
+-+++++
+-+++++    #     attn_weights = None
+-+++++    #     if output_attentions:
+-+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++    #     return attn_output, attn_weights, past_key_value
+-+++++
+-+++++    # def forward(
+-+++++    #     self,
+-+++++    #     hidden_states: mindspore.Tensor,
+-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++    #     past_key_value: Optional[Cache] = None,
+-+++++    #     output_attentions: bool = False,
+-+++++    #     use_cache: bool = False,
+-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++    #     query_states = self.q_proj(hidden_states)
+-+++++    #     key_states = self.k_proj(hidden_states)
+-+++++    #     value_states = self.v_proj(hidden_states)
+-+++++
+-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++    #     kv_seq_len = key_states.shape[-2]
+-+++++    #     if past_key_value is not None:
+-+++++    #         if self.layer_idx is None:
+-+++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++    #     if past_key_value is not None:
+-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++    #         key_states, value_states = past_key_value.update(
+-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++    #         )
+-+++++
+-+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++
+-+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++++    #     # <--- 修改结束 ---
+-+++++
+-+++++    #     fa_attention_mask = None
+-+++++    #     if attention_mask is not None:
+-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++    #     input_dtype = query_states.dtype
+-+++++
+-+++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++++    #         key=key_states,
+-+++++    #         value=value_states,
+-+++++    #         head_num=self.num_heads,
+-+++++    #         attn_mask=fa_attention_mask,
+-+++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++++    #         input_layout="BNSD",
+-+++++    #         sparse_mode=0,
+-+++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++++    #     )
+-+++++
+-+++++    #     attn_output = attn_output.to(input_dtype)
+-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++    #     attn_output = self.o_proj(attn_output)
+-+++++
+-+++++    #     attn_weights = None
+-+++++    #     if output_attentions:
+-+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++++
+-+++++    #     return attn_output, attn_weights, past_key_value
+-+++++
+-++++ QWEN2MOE_ATTENTION_CLASSES = {
+-++++     "eager": Qwen2MoeAttention,
+-+++++    "flash-attention": Qwen2MoeFlashAttention,
+-++++ }
+-++++ 
+-++++ 
+-++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++ 
+-+++++    #@dwj
+-+++++    # 只遍历激活的专家，而非全部专家
+-++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-++++-        # router_logits: (batch * sequence_length, n_experts)
+-++++-        router_logits = self.gate(hidden_states)
+-++++-
+-++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-        if self.norm_topk_prob:
+-++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        # we cast back to the input dtype
+-++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-
+-++++-        final_hidden_states = ops.zeros(
+-++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++++-        )
+-++++-
+-++++-        # One hot encode the selected experts to create an expert mask
+-++++-        # this will be used to easily index which expert is going to be sollicitated
+-++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++++-
+-++++-        # Loop over all available experts in the model and perform the computation on each expert
+-++++-        for expert_idx in range(self.num_experts):
+-++++-            expert_layer = self.experts[expert_idx]
+-++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++++-
+-++++-            # Index the correct hidden states and compute the expert hidden state for
+-++++-            # the current expert. We need to make sure to multiply the output hidden
+-++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++++-            if 0 not in idx.shape:
+-++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++++-
+-++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-++++-                # the `top_x` tensor here.
+-++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++++-
+-++++-        shared_expert_output = self.shared_expert(hidden_states)
+-++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++++-
+-++++-        final_hidden_states = final_hidden_states + shared_expert_output
+-+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++            num_tokens = hidden_states_reshaped.shape[0]
+-+++++
+-+++++            router_logits = self.gate(hidden_states_reshaped)
+-+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++
+-+++++            if self.norm_topk_prob:
+-+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++            
+-+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++            flat_selected_experts = selected_experts.flatten()
+-+++++            
+-+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++            token_indices = broadcasted_token_indices.flatten()
+-+++++            
+-+++++            active_experts = ops.unique(flat_selected_experts)
+-+++++            
+-+++++            for expert_idx_tensor in active_experts:
+-+++++                expert_idx = expert_idx_tensor.item()
+-+++++                expert_layer = self.experts[expert_idx]
+-+++++                
+-+++++                mask = (flat_selected_experts == expert_idx_tensor)
+-+++++                selected_token_indices = token_indices[mask]
+-+++++                selected_routing_weights = routing_weights.flatten()[mask]
+-+++++                
+-+++++                current_states = hidden_states_reshaped[selected_token_indices]
+-+++++                
+-+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++                
+-+++++                final_hidden_states = final_hidden_states.index_add(
+-+++++                    dim=0,
+-+++++                    index=selected_token_indices,
+-+++++                    source=expert_output.to(hidden_states.dtype)
+-+++++                )
+-+++++            
+-+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++ 
+-++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++-        return final_hidden_states, router_logits
+-+++++            final_hidden_states = final_hidden_states + shared_expert_output
+-+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++            
+-+++++            return final_hidden_states, router_logits
+-++++ 
+-++++ 
+-++++ class Qwen2MoeDecoderLayer(nn.Module):
+-++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++++ 
+-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++ 
+-+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++
+-++++         if (layer_idx not in config.mlp_only_layers) and (
+-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++++         ):
+-++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++++     _skip_keys_device_placement = "past_key_values"
+-++++     _supports_cache_class = True
+-+++++#lwx
+-+++++    # _supports_static_cache = True
+-++++ 
+-++++     def _init_weights(self, module):
+-++++         std = self.config.initializer_range
+-++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++++         return causal_mask
+-++++ 
+-++++ 
+-++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++     _tied_weights_keys = ["lm_head.weight"]
+-++++ 
+-++++     def __init__(self, config):
+-++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++         self.num_experts_per_tok = config.num_experts_per_tok
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-+++++        # @lwx
+-+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++++        #     self.generation_config.cache_implementation = "static"
+-+++++        self._warmed_up = False
+-+++++
+-+++++    def warmup_moe_model(self):
+-+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++++        test_texts = [
+-+++++            "warmup short",
+-+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++++        ]
+-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++        if tokenizer is None:
+-+++++            from mindnlp.transformers import AutoTokenizer
+-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++            self._warmup_tokenizer = tokenizer
+-+++++
+-+++++        for text in test_texts:
+-+++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++            with mindspore._no_grad():
+-+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++++ 
+-++++     def get_input_embeddings(self):
+-++++         return self.model.embed_tokens
+-++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++         ```"""
+-+++++        if not self._warmed_up:
+-+++++            self._warmed_up = True
+-+++++            self.warmup_moe_model()
+-++++ 
+-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++++         output_router_logits = (
+-++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++             }
+-++++         )
+-++++         return model_inputs
+-+++++# @lwx
+-+++++    # def _decode_one_tokens_logits(
+-+++++    #     self,
+-+++++    #     cur_token: mindspore.Tensor,
+-+++++    #     input_pos: Optional[mindspore.Tensor],
+-+++++    #     cache_position: mindspore.Tensor,
+-+++++    #     past_key_values: StaticCache,
+-+++++    # ) -> mindspore.Tensor:
+-+++++    #     """
+-+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++++        
+-+++++    #     Args:
+-+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++++    #         input_pos: 输入位置信息，可选
+-+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++++            
+-+++++    #     Returns:
+-+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++++    #     """
+-+++++    #     # 调用JIT编译的版本
+-+++++    #     return self.get_decode_one_tokens_logits(
+-+++++    #         cur_token=cur_token,
+-+++++    #         input_pos=input_pos,
+-+++++    #         cache_position=cache_position,
+-+++++    #         past_key_values=past_key_values,
+-+++++    #     )
+-+++++    
+-+++++    # @mindspore.jit(jit_level='O1')
+-+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++++    #     """
+-+++++    #     JIT编译的函数，用于高效的单token解码
+-+++++    #     使用JIT编译优化以支持静态shape和高效执行
+-+++++        
+-+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++++    #     """
+-+++++    #     outputs = self.model.forward(
+-+++++    #         input_ids=cur_token,
+-+++++    #         position_ids=input_pos,
+-+++++    #         cache_position=cache_position,
+-+++++    #         past_key_values=past_key_values,
+-+++++    #         use_cache=True,
+-+++++    #         return_dict=False,
+-+++++    #     )
+-+++++        
+-+++++    #     hidden_states = outputs[0]
+-+++++    #     logits = self.lm_head.forward(hidden_states)
+-+++++    #     logits = logits.float()
+-+++++        
+-+++++    #     return logits[:, -1, :]
+-+++++
+-+++++    # def _sample(
+-+++++    #     self,
+-+++++    #     input_ids: mindspore.Tensor,
+-+++++    #     logits_processor,
+-+++++    #     stopping_criteria,
+-+++++    #     generation_config,
+-+++++    #     synced_devices: bool,
+-+++++    #     streamer=None,
+-+++++    #     logits_warper=None,
+-+++++    #     **model_kwargs,
+-+++++    # ):
+-+++++    #     """
+-+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++++    #     """
+-+++++    #     from ...generation.logits_process import LogitsProcessorList
+-+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++++    #     from mindnlp.core import nn, ops, no_grad
+-+++++    #     import numpy as np
+-+++++        
+-+++++    #     # 检查是否使用 StaticCache
+-+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++++    #     # 否则，直接调用父类方法
+-+++++    #     past_key_values = model_kwargs.get("past_key_values")
+-+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++++
+-+++++    #     if not isinstance(past_key_values, StaticCache):
+-+++++    #         # 不使用 StaticCache，直接调用父类方法
+-+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++++    #         return super()._sample(
+-+++++    #             input_ids=input_ids,
+-+++++    #             logits_processor=logits_processor,
+-+++++    #             stopping_criteria=stopping_criteria,
+-+++++    #             generation_config=generation_config,
+-+++++    #             synced_devices=synced_devices,
+-+++++    #             streamer=streamer,
+-+++++    #             logits_warper=logits_warper,
+-+++++    #             **model_kwargs,
+-+++++    #         )
+-+++++        
+-+++++    #     # 使用 StaticCache，进入自定义循环
+-+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++++    #     pad_token_id = generation_config._pad_token_tensor
+-+++++    #     output_attentions = generation_config.output_attentions
+-+++++    #     output_hidden_states = generation_config.output_hidden_states
+-+++++    #     output_scores = generation_config.output_scores
+-+++++    #     output_logits = generation_config.output_logits
+-+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++++    #     max_length = generation_config.max_length
+-+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++++    #     do_sample = generation_config.do_sample
+-+++++        
+-+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++++    #         raise ValueError(
+-+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++++    #             f"{logits_warper})."
+-+++++    #         )
+-+++++        
+-+++++    #     # init attention / hidden states / scores tuples
+-+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++++        
+-+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++++    #         encoder_hidden_states = (
+-+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++++    #         )
+-+++++        
+-+++++    #     # keep track of which sequences are already finished
+-+++++    #     batch_size, cur_len = input_ids.shape
+-+++++    #     this_peer_finished = False
+-+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++++        
+-+++++    #     time_record = []
+-+++++    #     from ....utils.testing_utils import parse_flag_from_env
+-+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++++        
+-+++++    #     while self._has_unfinished_sequences(
+-+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++++    #     ):
+-+++++    #         if _record_time:
+-+++++    #             import time as time_module
+-+++++    #             infer_start = time_module.time()
+-+++++            
+-+++++    #         # prepare model inputs
+-+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++++            
+-+++++    #         # prepare variable output controls
+-+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++++            
+-+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++++    #         cur_cache_position = model_inputs.get("cache_position")
+-+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++++    #         cur_input_ids = model_inputs.get("input_ids")
+-+++++            
+-+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++++    #             cur_cache_position is not None and 
+-+++++    #             len(cur_cache_position.shape) > 0 and
+-+++++    #             cur_cache_position.shape[0] == 1 and
+-+++++    #             cur_input_ids is not None and
+-+++++    #             cur_input_ids.shape[1] == 1):
+-+++++    #             # 使用 JIT 优化的单 token 解码
+-+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++++    #             if not hasattr(self, '_jit_used'):
+-+++++    #                 self._jit_used = False
+-+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++++                
+-+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++++    #                 cur_token=cur_input_ids,
+-+++++    #                 input_pos=model_inputs.get("position_ids"),
+-+++++    #                 cache_position=cur_cache_position,
+-+++++    #                 past_key_values=cur_past_key_values,
+-+++++    #             )
+-+++++                
+-+++++    #             # 标记已使用JIT（用于后续判断）
+-+++++    #             if not self._jit_used:
+-+++++    #                 self._jit_used = True
+-+++++                
+-+++++    #             # 构造兼容的输出对象
+-+++++    #             class JitOptimizedOutput:
+-+++++    #                 def __init__(self, logits, config):
+-+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++++    #                     self.config = config
+-+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++++    #                     self.cross_attentions = None
+-+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++++                
+-+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++++    #         else:
+-+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++++    #             outputs = self(**model_inputs, return_dict=True)
+-+++++            
+-+++++    #         if synced_devices and this_peer_finished:
+-+++++    #             continue
+-+++++            
+-+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++++    #         next_token_logits = outputs.logits[:, -1, :]
+-+++++            
+-+++++    #         # pre-process distribution
+-+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++++    #         if do_sample:
+-+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++++            
+-+++++    #         # Store scores, attentions and hidden_states when required
+-+++++    #         if return_dict_in_generate:
+-+++++    #             if output_scores:
+-+++++    #                 scores += (next_token_scores,)
+-+++++    #             if output_logits:
+-+++++    #                 raw_logits += (next_token_logits,)
+-+++++    #             if output_attentions:
+-+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++++    #                 if self.config.is_encoder_decoder:
+-+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++++                
+-+++++    #             if output_hidden_states:
+-+++++    #                 hidden = (
+-+++++    #                     outputs.decoder_hidden_states
+-+++++    #                     if self.config.is_encoder_decoder
+-+++++    #                     else outputs.hidden_states
+-+++++    #                 )
+-+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++++            
+-+++++    #         # token selection
+-+++++    #         if do_sample:
+-+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++++    #         else:
+-+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++++            
+-+++++    #         # finished sentences should have their next token be a padding token
+-+++++    #         if has_eos_stopping_criteria:
+-+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++++            
+-+++++    #         # update generated ids, model inputs, and length for next step
+-+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++++    #         if streamer is not None:
+-+++++    #             streamer.put(next_tokens)
+-+++++            
+-+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++++    #             outputs,
+-+++++    #             model_kwargs,
+-+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++++    #         )
+-+++++            
+-+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++++    #         cur_len += 1
+-+++++            
+-+++++    #         if _record_time:
+-+++++    #             import time as time_module
+-+++++    #             infer_stop = time_module.time()
+-+++++    #             time_record.append(infer_stop - infer_start)
+-+++++            
+-+++++    #         del outputs
+-+++++        
+-+++++    #     average_infer_time = None
+-+++++    #     if time_record:
+-+++++    #         if len(time_record) > 1:
+-+++++    #             time_record.pop(0)
+-+++++    #         average_infer_time = sum(time_record) / len(time_record)
+-+++++    #         print(f'average inference time is: {average_infer_time}')
+-+++++    #         print(f'inference time record: {time_record}')
+-+++++        
+-+++++    #     if streamer is not None:
+-+++++    #         streamer.end()
+-+++++        
+-+++++    #     # 简单判断：打印是否使用了JIT路径
+-+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++++    #     else:
+-+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++++        
+-+++++    #     if return_dict_in_generate:
+-+++++    #         if self.config.is_encoder_decoder:
+-+++++    #             return GenerateEncoderDecoderOutput(
+-+++++    #                 sequences=input_ids,
+-+++++    #                 scores=scores,
+-+++++    #                 logits=raw_logits,
+-+++++    #                 encoder_attentions=encoder_attentions,
+-+++++    #                 encoder_hidden_states=encoder_hidden_states,
+-+++++    #                 decoder_attentions=decoder_attentions,
+-+++++    #                 cross_attentions=cross_attentions,
+-+++++    #                 decoder_hidden_states=decoder_hidden_states,
+-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++    #                 average_infer_time=average_infer_time
+-+++++    #             )
+-+++++    #         else:
+-+++++    #             return GenerateDecoderOnlyOutput(
+-+++++    #                 sequences=input_ids,
+-+++++    #                 scores=scores,
+-+++++    #                 logits=raw_logits,
+-+++++    #                 attentions=decoder_attentions,
+-+++++    #                 hidden_states=decoder_hidden_states,
+-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++    #                 average_infer_time=average_infer_time
+-+++++    #             )
+-+++++    #     else:
+-+++++    #         return input_ids
+-+++++            
+-+++++    # def _prepare_cache_for_generation(
+-+++++    #     self,
+-+++++    #     generation_config,
+-+++++    #     model_kwargs,
+-+++++    #     assistant_model,
+-+++++    #     batch_size,
+-+++++    #     max_cache_length,
+-+++++    # ):
+-+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++++    #         generation_config.cache_implementation = "static"
+-+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++++        
+-+++++    #     if generation_config.cache_implementation == "static":
+-+++++    #         base_required_from_max_length = generation_config.max_length + 1
+-+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++++    #         min_cache_size = 50
+-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++++    #         else:
+-+++++    #             max_cache_length = max(base_required, min_cache_size)
+-+++++            
+-+++++    #         original_max_cache_length = max_cache_length
+-+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++++            
+-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++    #             if max_cache_length > self.config.max_position_embeddings:
+-+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++++        
+-+++++    #     result = super()._prepare_cache_for_generation(
+-+++++    #         generation_config=generation_config,
+-+++++    #         model_kwargs=model_kwargs,
+-+++++    #         assistant_model=assistant_model,
+-+++++    #         batch_size=batch_size,
+-+++++    #         max_cache_length=max_cache_length,
+-+++++    #     )
+-+++++        
+-+++++    #     if generation_config.cache_implementation == "static":
+-+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++++    #         created_cache = model_kwargs.get(cache_name)
+-+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++++        
+-+++++    #     return result
+-+++++
+-+++++
+-+++++
+-++++ 
+-++++ 
+-++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++++-- 
+-++++2.27.0
+-++++
+-+++-- 
+-+++2.27.0
+-+++
+-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-++new file mode 100644
+-++index 00000000..966529e4
+-++--- /dev/null
+-+++++ b/patches/0003-20261106secondcommit.patch
+-++@@ -0,0 +1,2769 @@
+-+++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+++Subject: [PATCH 3/3] 20261106secondcommit
+-+++
+-+++---
+-+++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+-+++ patches/0001-20251104commit.patch             | 1272 -----------------
+-+++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+-+++ delete mode 100644 patches/0001-20251104commit.patch
+-+++
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index 73773c22..2f9192bf 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+-+++ 
+-+++ _CONFIG_FOR_DOC = "DeepseekConfig"
+-+++ 
+-++++_attn_mask_cache = {}
+-++++
+-++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-++++    q_len = batch_and_seq[1]
+-++++    kv_len = batch_and_seq[1] + past_key_values_length 
+-++++    key = (batch_and_seq[0], q_len, kv_len)
+-++++
+-++++    if key in _attn_mask_cache:
+-++++        return _attn_mask_cache[key]
+-++++
+-++++    mask = _prepare_4d_causal_attention_mask(
+-++++        attention_mask,
+-++++        batch_and_seq,
+-++++        inputs_embeds,
+-++++        past_key_values_length,
+-++++    )
+-++++    _attn_mask_cache[key] = mask
+-++++    return mask
+-+++ 
+-+++ def _get_unpad_data(attention_mask):
+-+++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-+++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-+++         return final_output
+-+++ 
+-+++ 
+-+++-    @no_grad()
+-+++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++-        expert_cache = ops.zeros_like(x)
+-+++-        idxs = flat_expert_indices.argsort()
+-+++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++-        token_idxs = idxs // self.num_experts_per_tok
+-+++-
+-+++-        for i, end_idx in enumerate(tokens_per_expert):
+-+++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++-            if start_idx == end_idx:
+-+++-                continue
+-+++-            expert = self.experts[i]
+-+++-            exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-            expert_tokens = x[exp_token_idx]
+-+++-            expert_out = expert(expert_tokens)
+-+++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++-
+-+++-        return expert_cache
+-+++-        
+-+++     # @no_grad()
+-+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-    #     # expert_cache = torch.zeros_like(x)
+-+++-    #     # idxs = flat_expert_indices.argsort()
+-+++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++-    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++-    #     #     if start_idx == end_idx:
+-+++-    #     #         continue
+-+++-    #     #     expert = self.experts[i]
+-+++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-    #     #     expert_tokens = x[exp_token_idx]
+-+++-    #     #     expert_out = expert(expert_tokens)
+-+++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++-    #     # return expert_cache
+-++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++     #     expert_cache = ops.zeros_like(x)
+-+++     #     idxs = flat_expert_indices.argsort()
+-+++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-+++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++ 
+-+++     #     return expert_cache
+-+++-    # @no_grad()
+-+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-    #     expert_cache = ops.zeros_like(x)
+-++++        
+-++++    @no_grad()
+-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++        """
+-++++        优化版 MoE prefill：
+-++++        - 批量张量化处理同一个 expert 的所有 token
+-++++        - 跳过无 token 的专家
+-++++        - 保持结果完全一致
+-++++        """
+-++++        # 初始化输出缓存
+-++++        expert_cache = ops.zeros_like(x)
+-+++ 
+-+++-    #     # 排序保证顺序一致
+-+++-    #     idxs = flat_expert_indices.argsort()
+-+++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++-    #     token_idxs = idxs // self.num_experts_per_tok
+-++++        # 排序（确保 scatter_add 位置对应原逻辑）
+-++++        idxs = flat_expert_indices.argsort()
+-++++        sorted_expert_indices = flat_expert_indices[idxs]
+-++++        sorted_token_indices = idxs // self.num_experts_per_tok
+-+++ 
+-+++-    #     # 找出有 token 的专家
+-+++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++        # 每个 expert 的 token 数
+-++++        tokens_per_expert = sorted_expert_indices.bincount()
+-+++ 
+-+++-    #     for i in active_experts.tolist():
+-+++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++-    #         end_idx = tokens_per_expert[i]
+-+++-    #         if start_idx == end_idx:  # 没有 token
+-+++-    #             continue
+-++++        # 找出有 token 的专家
+-++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+++ 
+-+++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-    #         expert_tokens = x[exp_token_idx]
+-+++-    #         expert_out = self.experts[i](expert_tokens)
+-+++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++        for expert_id in active_experts.tolist():
+-++++            # 取该 expert 对应的排序后 token 区间
+-++++            start = (tokens_per_expert[:expert_id]).sum().item()
+-++++            end = start + tokens_per_expert[expert_id].item()
+-+++ 
+-+++-    #         expert_cache = mindspore.mint.scatter_add(
+-+++-    #             expert_cache,
+-+++-    #             0,
+-+++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++-    #             expert_out
+-+++-    #         )
+-++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-++++            expert_tokens = x[token_idx]                     # 取输入向量
+-+++ 
+-+++-    #     return expert_cache
+-++++            # 执行专家 MLP
+-++++            expert_out = self.experts[expert_id](expert_tokens)
+-++++
+-++++            # 按权重缩放
+-++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-++++
+-++++            # 回写到缓存（等价 scatter_add）
+-++++            expert_cache = mindspore.mint.scatter_add(
+-++++                expert_cache,
+-++++                0,
+-++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++                scaled_out
+-++++            )
+-++++
+-++++        return expert_cache
+-++++
+-++++        # @no_grad()
+-++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++        #     # expert_cache = torch.zeros_like(x)
+-++++        #     # idxs = flat_expert_indices.argsort()
+-++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++        #     # token_idxs = idxs // self.num_experts_per_tok
+-++++        #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++        #     #     if start_idx == end_idx:
+-++++        #     #         continue
+-++++        #     #     expert = self.experts[i]
+-++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++        #     #     expert_tokens = x[exp_token_idx]
+-++++        #     #     expert_out = expert(expert_tokens)
+-++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++        #     # return expert_cache
+-++++        #     expert_cache = ops.zeros_like(x)
+-++++        #     idxs = flat_expert_indices.argsort()
+-++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++        #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++        #     for i, end_idx in enumerate(tokens_per_expert):
+-++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++        #         if start_idx == end_idx:
+-++++        #             continue
+-++++        #         expert = self.experts[i]
+-++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++        #         expert_tokens = x[exp_token_idx]
+-++++        #         expert_out = expert(expert_tokens)
+-++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++
+-++++        #     return expert_cache
+-++++        # @no_grad()
+-++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++        #     expert_cache = ops.zeros_like(x)
+-++++
+-++++        #     # 排序保证顺序一致
+-++++        #     idxs = flat_expert_indices.argsort()
+-++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++        #     token_idxs = idxs // self.num_experts_per_tok
+-++++
+-++++        #     # 找出有 token 的专家
+-++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++
+-++++        #     for i in active_experts.tolist():
+-++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++        #         end_idx = tokens_per_expert[i]
+-++++        #         if start_idx == end_idx:  # 没有 token
+-++++        #             continue
+-++++
+-++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++        #         expert_tokens = x[exp_token_idx]
+-++++        #         expert_out = self.experts[i](expert_tokens)
+-++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++
+-++++        #         expert_cache = mindspore.mint.scatter_add(
+-++++        #             expert_cache,
+-++++        #             0,
+-++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++        #             expert_out
+-++++        #         )
+-++++
+-++++        #     return expert_cache
+-+++ 
+-+++ 
+-+++ 
+-+++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++-
+-+++ # class DeepseekFlashAttention(nn.Module):
+-+++ #     """
+-+++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-++++
+-+++ Deepseek_ATTENTION_CLASSES = {
+-+++     "eager": DeepseekAttention,
+-+++     "flash-attention": DeepseekFlashAttention,
+-+++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-+++             )
+-+++         else:
+-+++             # 4d mask is passed through the layers
+-+++-            attention_mask = _prepare_4d_causal_attention_mask(
+-++++            # attention_mask = _prepare_4d_causal_attention_mask(
+-++++            #     attention_mask,
+-++++            #     (batch_size, seq_length),
+-++++            #     inputs_embeds,
+-++++            #     past_key_values_length,
+-++++            # )
+-++++            #@dwj
+-++++            attention_mask = get_cached_causal_mask(
+-+++                 attention_mask,
+-+++                 (batch_size, seq_length),
+-+++                 inputs_embeds,
+-+++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-+++         self.warm_up = False
+-++++        #@dwj
+-++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-++++            self.num_layers,
+-++++            self.num_attention_heads,
+-++++            self.head_dim,
+-++++            batch_size=1,
+-++++            max_length=self.max_length,
+-++++            dtype=mindspore.float16
+-++++        )
+-++++
+-++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-++++        key_cache = []
+-++++        value_cache = []
+-++++        for _ in range(num_layers):
+-++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++            key_cache.append(k)
+-++++            value_cache.append(v)
+-++++        return key_cache, value_cache
+-++++
+-+++ 
+-+++     def warmup_moe_model_deep(self):
+-+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++index bced285c..ebd7782e 100644
+-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+-+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+++ 
+-+++-Long_Prompt = False
+-+++-PROMPT_LENGTH_THRESHOLD = 128
+-++++Long_Prompt = 1
+-++++LONG_PROMPT_LENGTH_THRESHOLD = 128
+-++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-++++
+-++++_causal_mask_cache = {}
+-++++
+-++++def get_cached_causal_mask_with_cache_position(
+-++++    attention_mask: mindspore.Tensor,
+-++++    sequence_length: int,
+-++++    target_length: int,
+-++++    dtype: mindspore.dtype,
+-++++    min_dtype: float,
+-++++    cache_position: mindspore.Tensor,
+-++++    batch_size: int,
+-++++):
+-++++    """
+-++++    带缓存的 causal mask 构造函数
+-++++    """
+-++++    # q_len 是当前 query 长度
+-++++    q_len = sequence_length
+-++++    # kv_len 是 target_length
+-++++    kv_len = target_length
+-++++
+-++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-++++
+-++++    if key in _causal_mask_cache:
+-++++        return _causal_mask_cache[key]
+-++++
+-++++    # 调用原来的 mask 构造逻辑
+-++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++        attention_mask,
+-++++        sequence_length=sequence_length,
+-++++        target_length=target_length,
+-++++        dtype=dtype,
+-++++        min_dtype=min_dtype,
+-++++        cache_position=cache_position,
+-++++        batch_size=batch_size,
+-++++    )
+-++++    # 缓存结果
+-++++    _causal_mask_cache[key] = causal_mask
+-++++    return causal_mask
+-+++ 
+-+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++ 
+-+++ 
+-+++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-++++# class Qwen2MoeAttention(nn.Module):
+-++++#     """
+-++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-++++#     and "Generating Long Sequences with Sparse Transformers".
+-++++#     """
+-++++
+-++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++#         super().__init__()
+-++++#         self.config = config
+-++++#         self.layer_idx = layer_idx
+-++++#         if layer_idx is None:
+-++++#             logger.warning_once(
+-++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++#                 "when creating this class."
+-++++#             )
+-++++
+-++++#         self.hidden_size = config.hidden_size
+-++++#         self.num_heads = config.num_attention_heads
+-++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++#         self.rope_theta = config.rope_theta
+-++++#         self.is_causal = True
+-++++#         self.attention_dropout = config.attention_dropout
+-++++
+-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++#             raise ValueError(
+-++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++#                 f" and `num_heads`: {self.num_heads})."
+-++++#             )
+-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++
+-++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++#             self.head_dim,
+-++++#             max_position_embeddings=self.max_position_embeddings,
+-++++#             base=self.rope_theta,
+-++++#         )
+-++++
+-++++#     def forward(
+-++++#         self,
+-++++#         hidden_states: mindspore.Tensor,
+-++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++#         past_key_value: Optional[Cache] = None,
+-++++#         output_attentions: bool = False,
+-++++#         use_cache: bool = False,
+-++++#         cache_position: Optional[mindspore.Tensor] = None,
+-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++
+-++++        
+-++++
+-++++#         bsz, q_len, _ = hidden_states.shape
+-++++
+-++++#         query_states = self.q_proj(hidden_states)
+-++++#         key_states = self.k_proj(hidden_states)
+-++++#         value_states = self.v_proj(hidden_states)
+-++++
+-++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++
+-++++#         kv_seq_len = key_states.shape[-2]
+-++++#         if past_key_value is not None:
+-++++#             if self.layer_idx is None:
+-++++#                 raise ValueError(
+-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++#                     "with a layer index."
+-++++#                 )
+-++++#             if isinstance(past_key_value, StaticCache):
+-++++#                 kv_seq_len = key_states.shape[-2]
+-++++#             else:
+-++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++
+-++++#         if past_key_value is not None:
+-++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++            
+-++++#             if isinstance(past_key_value, StaticCache):
+-++++#                 kv_seq_len = key_states.shape[-2]
+-++++
+-++++#         # repeat k/v heads if n_kv_heads < n_heads
+-++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++        
+-++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++
+-++++#         if attention_mask is not None:
+-++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++#             attn_weights = attn_weights + causal_mask
+-++++
+-++++#         # upcast attention to fp32
+-++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++++#         attn_output = ops.matmul(attn_weights, value_states)
+-++++
+-++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++++#             raise ValueError(
+-++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-++++#                 f" {attn_output.shape}"
+-++++#             )
+-++++
+-++++#         attn_output = ops.transpose(attn_output, 1, 2)
+-++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++
+-++++#         attn_output = self.o_proj(attn_output)
+-++++#         # @lwx
+-++++        
+-++++#         # max_seq_len = self.max_position_embeddings  # 2048
+-++++
+-++++#         # if attention_mask is not None:
+-++++#         #     # attention_mask: [B, 1, Sq, Sk]
+-++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++
+-++++#         #     # pad 到 [max_seq_len, max_seq_len]
+-++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++#         #     global_attention_mask = padded_mask
+-++++#         # else:
+-++++#         #     global_attention_mask = None
+-++++
+-++++
+-++++#         # sparse_mode=3
+-++++#         # attn_output = mindspore.ops.flash_attention_score(
+-++++#         #     query=query_states,
+-++++#         #     key=key_states,
+-++++#         #     value=value_states,
+-++++#         #     real_shift=None,
+-++++#         #     padding_mask=None,
+-++++
+-++++#         #     head_num=self.num_heads,
+-++++#         #     attn_mask=global_attention_mask,
+-++++#         #     keep_prob=1.0 - self.attention_dropout,
+-++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++#         #     input_layout="BNSD",
+-++++#         #     pre_tokens=2147483647,
+-++++#         #     next_tokens=2147483647,
+-++++#         #     inner_precise=0,
+-++++#         #     drop_mask=None,
+-++++#         #     prefix=None,
+-++++#         #     actual_seq_qlen=None,
+-++++#         #     actual_seq_kvlen=None,
+-++++#         #     sparse_mode=sparse_mode,
+-++++#         # )
+-++++#         if not output_attentions:
+-++++#             attn_weights = None
+-++++
+-++++#         return attn_output, attn_weights, past_key_value
+-++++
+-+++ class Qwen2MoeAttention(nn.Module):
+-+++     """
+-+++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+++-    and "Generating Long Sequences with Sparse Transformers".
+-+++-    """
+-++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+-+++ 
+-++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-++++
+-++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-++++    """
+-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++         super().__init__()
+-+++         self.config = config
+-+++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-+++         if layer_idx is None:
+-+++             logger.warning_once(
+-+++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++                 "when creating this class."
+-+++             )
+-+++ 
+-+++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-+++         use_cache: bool = False,
+-+++         cache_position: Optional[mindspore.Tensor] = None,
+-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-
+-+++         
+-+++-
+-++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-+++         bsz, q_len, _ = hidden_states.shape
+-+++ 
+-+++         query_states = self.q_proj(hidden_states)
+-+++         key_states = self.k_proj(hidden_states)
+-+++         value_states = self.v_proj(hidden_states)
+-+++ 
+-+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++-
+-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++        
+-+++         kv_seq_len = key_states.shape[-2]
+-+++         if past_key_value is not None:
+-+++-            if self.layer_idx is None:
+-+++-                raise ValueError(
+-+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-                    "with a layer index."
+-+++-                )
+-+++-            if isinstance(past_key_value, StaticCache):
+-+++-                kv_seq_len = key_states.shape[-2]
+-+++-            else:
+-+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++        
+-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++ 
+-+++         if past_key_value is not None:
+-+++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++
+-++++        # --- 2. 动态调度核心注意力计算 ---
+-++++        global Long_Prompt
+-++++        if Long_Prompt >= 1:
+-++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-++++            fa_attention_mask = None
+-++++            if attention_mask is not None:
+-++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++                fa_attention_mask = (mask_slice != 0)
+-++++
+-++++            attn_output = mindspore.ops.flash_attention_score(
+-++++                query=query_states,
+-++++                key=key_states,
+-++++                value=value_states,
+-++++                head_num=self.num_heads,
+-++++                attn_mask=fa_attention_mask,
+-++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-++++                scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++                input_layout="BNSD",
+-++++                sparse_mode=0,
+-++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-++++            )
+-+++             
+-+++-            if isinstance(past_key_value, StaticCache):
+-+++-                kv_seq_len = key_states.shape[-2]
+-++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++            attn_output = self.o_proj(attn_output)
+-++++            attn_weights = None
+-++++            if output_attentions:
+-++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+++ 
+-+++-        # repeat k/v heads if n_kv_heads < n_heads
+-+++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++-        
+-+++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++        else:
+-++++            # --- Eager Attention 路径 (用于短序列和解码) ---
+-++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++            
+-++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++ 
+-+++-        if attention_mask is not None:
+-+++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++-            attn_weights = attn_weights + causal_mask
+-++++            if attention_mask is not None:
+-++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++                attn_weights = attn_weights + causal_mask
+-+++ 
+-+++-        # upcast attention to fp32
+-+++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++-        attn_output = ops.matmul(attn_weights, value_states)
+-++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++++            attn_output = ops.matmul(attn_weights, value_states)
+-+++ 
+-+++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++-            raise ValueError(
+-+++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+++-                f" {attn_output.shape}"
+-+++-            )
+-++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++++                raise ValueError(
+-++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-++++                )
+-+++ 
+-+++-        attn_output = ops.transpose(attn_output, 1, 2)
+-+++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++            attn_output = ops.transpose(attn_output, 1, 2)
+-++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++            attn_output = self.o_proj(attn_output)
+-+++ 
+-+++-        attn_output = self.o_proj(attn_output)
+-+++-        # @lwx
+-++++            if not output_attentions:
+-++++                attn_weights = None
+-+++         
+-+++-        # max_seq_len = self.max_position_embeddings  # 2048
+-+++-
+-+++-        # if attention_mask is not None:
+-+++-        #     # attention_mask: [B, 1, Sq, Sk]
+-+++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++-
+-+++-        #     # pad 到 [max_seq_len, max_seq_len]
+-+++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++-        #     global_attention_mask = padded_mask
+-+++-        # else:
+-+++-        #     global_attention_mask = None
+-+++-
+-+++-
+-+++-        # sparse_mode=3
+-+++-        # attn_output = mindspore.ops.flash_attention_score(
+-+++-        #     query=query_states,
+-+++-        #     key=key_states,
+-+++-        #     value=value_states,
+-+++-        #     real_shift=None,
+-+++-        #     padding_mask=None,
+-+++-
+-+++-        #     head_num=self.num_heads,
+-+++-        #     attn_mask=global_attention_mask,
+-+++-        #     keep_prob=1.0 - self.attention_dropout,
+-+++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-        #     input_layout="BNSD",
+-+++-        #     pre_tokens=2147483647,
+-+++-        #     next_tokens=2147483647,
+-+++-        #     inner_precise=0,
+-+++-        #     drop_mask=None,
+-+++-        #     prefix=None,
+-+++-        #     actual_seq_qlen=None,
+-+++-        #     actual_seq_kvlen=None,
+-+++-        #     sparse_mode=sparse_mode,
+-+++-        # )
+-+++-        if not output_attentions:
+-+++-            attn_weights = None
+-+++-
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++-
+-+++ # class Qwen2MoeFlashAttention(nn.Module):
+-+++ #     """
+-+++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+-+++ #             return final_hidden_states, router_logits
+-+++ 
+-+++ 
+-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-#     """
+-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+++-#     """
+-+++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++-#         super().__init__()
+-+++-#         self.num_experts = config.num_experts
+-+++-#         self.top_k = config.num_experts_per_tok
+-+++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++-
+-+++-#         # 门控网络
+-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++-#         # 专家列表
+-+++-#         self.experts = nn.ModuleList(
+-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++-#         )
+-+++-#         # 共享专家
+-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_decode(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         【解码路径】针对 sequence_length=1 的极致优化。
+-+++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+++-#         """
+-+++-#         batch_size, hidden_dim = hidden_states.shape
+-+++-        
+-+++-#         expert_outputs_list = [
+-+++-#             ops.cat([
+-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++-#             ], dim=0) 
+-+++-#             for i in range(batch_size)
+-+++-#         ]
+-+++-        
+-+++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+++-#         # shape: (batch_size, top_k, hidden_dim)
+-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++-        
+-+++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++-        
+-+++-#         return moe_output.squeeze(1)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_prefill(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         【预填充路径】针对 sequence_length > 1 的优化。
+-+++-#         按专家对 Token 进行分组，并进行批处理。
+-+++-#         """
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens = hidden_states.shape[0]
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-        
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-        
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-        
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-            
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-#             selected_token_indices = token_indices[mask]
+-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++-            
+-+++-#             current_states = hidden_states[selected_token_indices]
+-+++-            
+-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-            
+-+++-#             moe_output = moe_output.index_add(
+-+++-#                 dim=0,
+-+++-#                 index=selected_token_indices,
+-+++-#                 source=expert_output.to(hidden_states.dtype)
+-+++-#             )
+-+++-#         return moe_output
+-+++-
+-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         顶层 forward 方法，作为智能分发器。
+-+++-#         """
+-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        
+-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-
+-+++-#         if self.norm_topk_prob:
+-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        
+-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-        
+-+++-#         moe_output = None
+-+++-#         # 在推理时，根据序列长度选择最优路径
+-+++-#         if not self.training:
+-+++-#             if sequence_length == 1:
+-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++-#             else:
+-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++-#         else:
+-+++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+++-#             raise NotImplementedError("Training path is not implemented.")
+-+++-
+-+++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+++-        
+-+++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+++-        
+-+++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+++-        
+-+++-#         return final_hidden_states, router_logits
+-+++-
+-+++-
+-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-#     """
+-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+++-#     """
+-+++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++-#         super().__init__()
+-+++-#         self.num_experts = config.num_experts
+-+++-#         self.top_k = config.num_experts_per_tok
+-+++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++-
+-+++-#         # 门控网络
+-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++-#         # 专家列表
+-+++-#         self.experts = nn.ModuleList(
+-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++-#         )
+-+++-#         # 共享专家
+-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_decode(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         batch_size, _ = hidden_states.shape
+-+++-#         expert_outputs_list = [
+-+++-#             ops.cat([
+-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++-#             ], dim=0) 
+-+++-#             for i in range(batch_size)
+-+++-#         ]
+-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++-#         return moe_output.squeeze(1)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_prefill(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens = hidden_states.shape[0]
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-        
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-#             selected_token_indices = token_indices[mask]
+-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++-#             current_states = hidden_states[selected_token_indices]
+-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-#             moe_output = moe_output.index_add(
+-+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++-#             )
+-+++-#         return moe_output
+-+++-
+-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         顶层 forward 方法，作为智能分发器。
+-+++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+++-#         """
+-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        
+-+++-#         # 1. 门控计算 (通用逻辑)
+-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-
+-+++-#         if self.norm_topk_prob:
+-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        
+-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-        
+-+++-#         # 2. 智能分发到最优 MoE 路径
+-+++-#         moe_output = None
+-+++-#         if not self.training:
+-+++-#             if sequence_length == 1:
+-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++-#             else:
+-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++-#         else:
+-+++-#             raise NotImplementedError("Training path is not implemented.")
+-+++-
+-+++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++-        
+-+++-#         # 4. 合并 MoE 输出和共享专家输出
+-+++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++-        
+-+++-#         # 5. 恢复原始形状并返回
+-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++-        
+-+++-#         return final_hidden_states, router_logits
+-+++-
+-+++-# prefill fastest
+-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-#     """
+-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+++-#     """
+-+++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++-#         super().__init__()
+-+++-#         self.num_experts = config.num_experts
+-+++-#         self.top_k = config.num_experts_per_tok
+-+++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++-
+-+++-#         # 门控网络
+-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++-#         # 专家列表
+-+++-#         self.experts = nn.ModuleList(
+-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++-#         )
+-+++-#         # 共享专家
+-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_dispatch(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+++-#         """
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens, _ = hidden_states.shape
+-+++-        
+-+++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-#         flat_routing_weights = routing_weights.flatten()
+-+++-
+-+++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-
+-+++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-        
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-            
+-+++-#             # 找到所有分配给该专家的 token
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-            
+-+++-#             # 使用 mask 选取对应的 token 和权重
+-+++-#             current_token_indices = token_indices[mask]
+-+++-#             current_routing_weights = flat_routing_weights[mask]
+-+++-#             current_hidden_states = hidden_states[current_token_indices]
+-+++-            
+-+++-#             # 对这些 token 进行批处理
+-+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++-            
+-+++-#             # 使用 index_add 将结果精确地加回到对应位置
+-+++-#             moe_output = moe_output.index_add(
+-+++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++-#             )
+-+++-#         return moe_output
+-+++-
+-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         顶层 forward 方法，作为智能分发器。
+-+++-#         """
+-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        
+-+++-#         # 1. 门控计算
+-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-
+-+++-#         if self.norm_topk_prob:
+-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        
+-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-        
+-+++-#         # 2. 调用统一的 MoE 计算内核
+-+++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+++-
+-+++-#         # 3. 统一处理共享专家
+-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++-        
+-+++-#         # 4. 合并输出
+-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++-        
+-+++-#         # 5. 恢复原始形状并返回
+-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++-        
+-+++-#         return final_hidden_states, router_logits
+-+++-
+-+++-
+-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-#     """
+-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++-#     【最终高性能与高精度版】：
+-+++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+++-#     3. 这样实现了速度和准确性的两全其美。
+-+++-#     """
+-+++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++-#         super().__init__()
+-+++-#         self.num_experts = config.num_experts
+-+++-#         self.top_k = config.num_experts_per_tok
+-+++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++-
+-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++-#         self.experts = nn.ModuleList(
+-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++-#         )
+-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_decode(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         【解码路径】极致优化版：bmm + 高精度累加。
+-+++-#         """
+-+++-#         original_dtype = hidden_states.dtype
+-+++-#         batch_size, _ = hidden_states.shape
+-+++-        
+-+++-#         expert_outputs_list = [
+-+++-#             ops.cat([
+-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++-#             ], dim=0) 
+-+++-#             for i in range(batch_size)
+-+++-#         ]
+-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++-
+-+++-#         # 在 float32 下执行 bmm，得到高精度结果
+-+++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++-        
+-+++-#         # 将高精度结果转换回原始数据类型
+-+++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+++-        
+-+++-#         return moe_output
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_prefill(
+-+++-#         self, 
+-+++-#         hidden_states: mindspore.Tensor, 
+-+++-#         selected_experts: mindspore.Tensor, 
+-+++-#         routing_weights: mindspore.Tensor
+-+++-#     ) -> mindspore.Tensor:
+-+++-#         """
+-+++-#         【预填充路径】与原始实现一致，结果精确。
+-+++-#         """
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens, _ = hidden_states.shape
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-        
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-#             selected_token_indices = token_indices[mask]
+-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++-#             current_states = hidden_states[selected_token_indices]
+-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-#             moe_output = moe_output.index_add(
+-+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++-#             )
+-+++-#         return moe_output
+-+++-
+-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-        
+-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-
+-+++-#         if self.norm_topk_prob:
+-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        
+-+++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+++-#         # 如果模型主体是 float16，后续再转换
+-+++-        
+-+++-#         moe_output = None
+-+++-#         if not self.training:
+-+++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+++-#             # _moe_infer_decode 内部会处理好类型转换
+-+++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-#             if sequence_length == 1:
+-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++-#             else:
+-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++-#         else:
+-+++-#             raise NotImplementedError("Training path is not implemented.")
+-+++-
+-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++-        
+-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++-        
+-+++-#         return final_hidden_states, router_logits
+-+++-    
+-+++-
+-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-#     """
+-+++-#     【融合版】一个混合专家模块，内置两种推理策略，
+-+++-#     由外部全局变量 `Long_Prompt` 控制：
+-+++-
+-+++-#     - if Long_Prompt is True:  【精度优先模式】
+-+++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+++-#       适用于处理长序列，避免误差累积。
+-+++-
+-+++-#     - if Long_Prompt is False: 【速度优先模式】
+-+++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+++-#     """
+-+++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++-#         super().__init__()
+-+++-#         self.num_experts = config.num_experts
+-+++-#         self.top_k = config.num_experts_per_tok
+-+++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++-
+-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++-#         self.experts = nn.ModuleList(
+-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++-#         )
+-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++-
+-+++-#     # --- 速度优先模式的辅助函数 ---
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++-#         original_dtype = hidden_states.dtype
+-+++-#         batch_size, _ = hidden_states.shape
+-+++-#         expert_outputs_list = [
+-+++-#             ops.cat([
+-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++-#             ], dim=0) 
+-+++-#             for i in range(batch_size)
+-+++-#         ]
+-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++-
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens, _ = hidden_states.shape
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-#             selected_token_indices = token_indices[mask]
+-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++-#             current_states = hidden_states[selected_token_indices]
+-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++-#         return moe_output
+-+++-
+-+++-#     # --- 精度优先模式的辅助函数 ---
+-+++-#     @no_grad()
+-+++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++-#         moe_output = ops.zeros_like(hidden_states)
+-+++-#         num_tokens, _ = hidden_states.shape
+-+++-#         flat_selected_experts = selected_experts.flatten()
+-+++-#         flat_routing_weights = routing_weights.flatten()
+-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++-#         active_experts = ops.unique(flat_selected_experts)
+-+++-#         for expert_idx_tensor in active_experts:
+-+++-#             expert_idx = expert_idx_tensor.item()
+-+++-#             expert_layer = self.experts[expert_idx]
+-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++-#             current_token_indices = token_indices[mask]
+-+++-#             current_routing_weights = flat_routing_weights[mask]
+-+++-#             current_hidden_states = hidden_states[current_token_indices]
+-+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++-#         return moe_output
+-+++-
+-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+++-#         global Long_Prompt
+-+++-
+-+++-#         # 1. 门控计算 (所有模式通用)
+-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++-#         if self.norm_topk_prob:
+-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-        
+-+++-#         moe_output = None
+-+++-#         if not self.training:
+-+++-#             # 根据 Long_Prompt 标志选择模式
+-+++-#             if Long_Prompt:
+-+++-#                 # --- 精度优先模式 ---
+-+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++-#             else:
+-+++-#                 # --- 速度优先模式 ---
+-+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++-#                 if sequence_length == 1:
+-+++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++-#                 else:
+-+++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++-#         else:
+-+++-#             raise NotImplementedError("Training path is not implemented.")
+-+++-
+-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++-        
+-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++-        
+-+++-#         return final_hidden_states, router_logits
+-+++-    
+-+++ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++     """
+-+++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++ 
+-++++    # @no_grad()
+-++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++    #     num_tokens, _ = hidden_states.shape
+-++++    #     flat_selected_experts = selected_experts.flatten()
+-++++    #     sorted_expert_indices = flat_selected_experts.argsort()
+-++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++++    #     original_token_indices = sorted_expert_indices // self.top_k
+-++++    #     moe_output = ops.zeros_like(hidden_states)
+-++++    #     current_token_offset = 0
+-++++    #     for i in range(self.num_experts):
+-++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-++++    #         if expert_token_count == 0:
+-++++    #             continue
+-++++    #         end_offset = current_token_offset + expert_token_count
+-++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++    #         current_token_offset += expert_token_count
+-++++    #     return moe_output
+-++++
+-+++     @no_grad()
+-+++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++-        num_tokens, _ = hidden_states.shape
+-+++-        flat_selected_experts = selected_experts.flatten()
+-+++-        sorted_expert_indices = flat_selected_experts.argsort()
+-+++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++-        original_token_indices = sorted_expert_indices // self.top_k
+-++++        """
+-++++        优化版 MoE prefill (速度优先模式)：
+-++++        - 批量张量化处理同一个 expert 的所有 token
+-++++        - 跳过无 token 的专家
+-++++        - 保持结果完全一致
+-++++        """
+-+++         moe_output = ops.zeros_like(hidden_states)
+-+++-        current_token_offset = 0
+-+++-        for i in range(self.num_experts):
+-+++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++-            if expert_token_count == 0:
+-+++-                continue
+-+++-            end_offset = current_token_offset + expert_token_count
+-+++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++-            current_token_offset += expert_token_count
+-++++
+-++++        flat_selected_experts = selected_experts.flatten()
+-++++        flat_routing_weights = routing_weights.flatten()
+-++++
+-++++        idxs = flat_selected_experts.argsort()
+-++++        sorted_expert_indices = flat_selected_experts[idxs]
+-++++        sorted_token_indices = idxs // self.top_k
+-++++
+-++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-++++
+-++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-++++
+-++++        for expert_id in active_experts.tolist():
+-++++            start = int(tokens_per_expert[:expert_id].sum().item())
+-++++            end = start + int(tokens_per_expert[expert_id].item())
+-++++
+-++++            token_idx = sorted_token_indices[start:end]
+-++++            expert_tokens = hidden_states[token_idx]
+-++++
+-++++            expert_out = self.experts[expert_id](expert_tokens)
+-++++
+-++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-++++
+-++++            moe_output = mindspore.mint.scatter_add(
+-++++                moe_output,
+-++++                0,
+-++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-++++                scaled_out.to(hidden_states.dtype)
+-++++            )
+-++++
+-+++         return moe_output
+-+++ 
+-++++
+-+++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+++     @no_grad()
+-+++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++         
+-+++         moe_output = None
+-+++-        if Long_Prompt:
+-+++-            # --- 精度优先模式 (ACCURACY MODE) ---
+-+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        # if Long_Prompt==0:
+-++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        # else:
+-++++        #     # --- 速度优先模式 (SPEED MODE) ---
+-++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++        #     if sequence_length == 1:
+-++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        #     else:
+-++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++        
+-++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++        if sequence_length == 1:
+-++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++         else:
+-+++-            # --- 速度优先模式 (SPEED MODE) ---
+-+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++-            if sequence_length == 1:
+-+++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++-            else:
+-+++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++-        
+-++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++    
+-+++ 
+-+++         # 3. 共享专家计算与合并 (所有模式通用)
+-+++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++         
+-+++         return final_hidden_states, router_logits
+-+++ 
+-++++
+-+++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+++         super().__init__()
+-+++         self.hidden_size = config.hidden_size
+-+++         
+-+++-        # if Long_Prompt:
+-+++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++-        # else:
+-++++        # if Long_Prompt == 2:
+-+++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++        # else:
+-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++ 
+-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++ 
+-+++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++             )
+-+++ 
+-+++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-+++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++        #     attention_mask,
+-++++        #     sequence_length=sequence_length,
+-++++        #     target_length=target_length,
+-++++        #     dtype=dtype,
+-++++        #     min_dtype=min_dtype,
+-++++        #     cache_position=cache_position,
+-++++        #     batch_size=input_tensor.shape[0],
+-++++        # )
+-++++        #@dwj
+-++++        causal_mask = get_cached_causal_mask_with_cache_position(
+-+++             attention_mask,
+-+++             sequence_length=sequence_length,
+-+++             target_length=target_length,
+-+++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+++         """
+-+++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-++++        _causal_mask_cache.clear()
+-+++ 
+-+++         input_ids = kwargs.get("input_ids")
+-+++         if input_ids is None and args:
+-+++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++ 
+-+++         if input_ids is not None:
+-+++             prompt_length = input_ids.shape[1]
+-+++-            
+-+++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+++-                Long_Prompt = True
+-++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-++++                Long_Prompt = 2
+-++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-++++                Long_Prompt = 0
+-+++             else:
+-+++-                Long_Prompt = False
+-++++                Long_Prompt = 1
+-++++
+-+++ 
+-+++         return super().generate(*args, **kwargs)
+-+++     
+-+++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++             dtype = self.lm_head.weight.dtype
+-+++             min_dtype = float(ops.finfo(dtype).min)
+-+++ 
+-+++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++            #     attention_mask,
+-++++            #     sequence_length=sequence_length,
+-++++            #     target_length=past_key_values.get_max_length(),
+-++++            #     dtype=dtype,
+-++++            #     min_dtype=min_dtype,
+-++++            #     cache_position=cache_position,
+-++++            #     batch_size=batch_size,
+-++++            # )
+-++++
+-++++            #@dwj
+-++++            attention_mask = get_cached_causal_mask_with_cache_position(
+-+++                 attention_mask,
+-+++                 sequence_length=sequence_length,
+-+++                 target_length=past_key_values.get_max_length(),
+-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++deleted file mode 100644
+-+++index 6dfb5b93..00000000
+-+++--- a/patches/0001-20251104commit.patch
+-++++++ /dev/null
+-+++@@ -1,1272 +0,0 @@
+-+++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++-From: Pinoeer-kingxi <13022943007@163.com>
+-+++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++-Subject: [PATCH] 20251104commit
+-+++-
+-+++----
+-+++- mindnlp/transformers/cache_utils.py           |  28 +-
+-+++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++- 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++-
+-+++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++-index cadd2e04..02f8d4be 100644
+-+++---- a/mindnlp/transformers/cache_utils.py
+-+++-+++ b/mindnlp/transformers/cache_utils.py
+-+++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++-             # k_out[:, :, cache_position] = key_states
+-+++-             # v_out[:, :, cache_position] = value_states
+-+++--            if ON_ORANGE_PI:
+-+++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++--            else:
+-+++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++--
+-+++-+            # if ON_ORANGE_PI:
+-+++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++-+            # else:
+-+++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++-+            if cache_position.ndim > 1:
+-+++-+                cache_position = cache_position.flatten()
+-+++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++-+                cache_position = cache_position.int()
+-+++-+            
+-+++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++-+            k_out[:, :, cache_position] = key_states
+-+++-+            v_out[:, :, cache_position] = value_states
+-+++-+            
+-+++-         return k_out, v_out
+-+++- 
+-+++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++-index c695b944..d8303e45 100644
+-+++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++- def rotate_half(x):
+-+++-     """Rotates half the hidden dims of the input."""
+-+++--    x1 = x[..., : x.shape[-1] // 2]
+-+++--    x2 = x[..., x.shape[-1] // 2 :]
+-+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++-+    # x1 = x[..., : x.shape[-1] // 2]
+-+++-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++-     return ops.cat((-x2, x1), dim=-1)
+-+++- 
+-+++- 
+-+++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++-         if self.training:
+-+++-             raise NotImplementedError("Training is not supported yet.")
+-+++-         else:
+-+++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++--        if self.config.n_shared_experts is not None:
+-+++--            y = y + self.shared_experts(identity)
+-+++--        return y
+-+++-+            # @lwx
+-+++-+            if orig_shape[1] == 1:
+-+++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++-+                y=y.view(*orig_shape)
+-+++-+                if self.config.n_shared_experts is not None:
+-+++-+                    y = y + self.shared_experts(identity)
+-+++-+                return y
+-+++-+            else:
+-+++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++-+                if self.config.n_shared_experts is not None:
+-+++-+                    y = y + self.shared_experts(identity)
+-+++-+                return y
+-+++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++-+        # if self.config.n_shared_experts is not None:
+-+++-+        #     y = y + self.shared_experts(identity)
+-+++-+        # return y
+-+++-+        
+-+++-+    @no_grad()
+-+++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++-+
+-+++-+        expert_cache = ops.zeros_like(x)
+-+++-+        for i in range(self.num_experts_per_tok):
+-+++-+            expert_id = flat_expert_indices[i].item()
+-+++-+            weight = flat_expert_weights[i].item()
+-+++-+            expert = self.experts[expert_id]
+-+++-+            expert_out = expert(x)
+-+++-+            expert_cache += expert_out * weight
+-+++-+        return expert_cache
+-+++- 
+-+++-     @no_grad()
+-+++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++--        # expert_cache = torch.zeros_like(x)
+-+++--        # idxs = flat_expert_indices.argsort()
+-+++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++--        # token_idxs = idxs // self.num_experts_per_tok
+-+++--        # for i, end_idx in enumerate(tokens_per_expert):
+-+++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++--        #     if start_idx == end_idx:
+-+++--        #         continue
+-+++--        #     expert = self.experts[i]
+-+++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++--        #     expert_tokens = x[exp_token_idx]
+-+++--        #     expert_out = expert(expert_tokens)
+-+++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++--        # return expert_cache
+-+++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++-         expert_cache = ops.zeros_like(x)
+-+++-         idxs = flat_expert_indices.argsort()
+-+++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++-         token_idxs = idxs // self.num_experts_per_tok
+-+++-+
+-+++-         for i, end_idx in enumerate(tokens_per_expert):
+-+++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++-             if start_idx == end_idx:
+-+++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++-             expert_out = expert(expert_tokens)
+-+++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++-+
+-+++-         return expert_cache
+-+++-+        
+-+++-+    # @no_grad()
+-+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-+    #     # expert_cache = torch.zeros_like(x)
+-+++-+    #     # idxs = flat_expert_indices.argsort()
+-+++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++-+    #     #     if start_idx == end_idx:
+-+++-+    #     #         continue
+-+++-+    #     #     expert = self.experts[i]
+-+++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-+    #     #     expert_tokens = x[exp_token_idx]
+-+++-+    #     #     expert_out = expert(expert_tokens)
+-+++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++-+    #     # return expert_cache
+-+++-+    #     expert_cache = ops.zeros_like(x)
+-+++-+    #     idxs = flat_expert_indices.argsort()
+-+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+++-+
+-+++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++-+    #         if start_idx == end_idx:
+-+++-+    #             continue
+-+++-+    #         expert = self.experts[i]
+-+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-+    #         expert_tokens = x[exp_token_idx]
+-+++-+    #         expert_out = expert(expert_tokens)
+-+++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++-+
+-+++-+    #     return expert_cache
+-+++-+    # @no_grad()
+-+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++-+    #     expert_cache = ops.zeros_like(x)
+-+++-+
+-+++-+    #     # 排序保证顺序一致
+-+++-+    #     idxs = flat_expert_indices.argsort()
+-+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+++-+
+-+++-+    #     # 找出有 token 的专家
+-+++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++-+
+-+++-+    #     for i in active_experts.tolist():
+-+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++-+    #         end_idx = tokens_per_expert[i]
+-+++-+    #         if start_idx == end_idx:  # 没有 token
+-+++-+    #             continue
+-+++-+
+-+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++-+    #         expert_tokens = x[exp_token_idx]
+-+++-+    #         expert_out = self.experts[i](expert_tokens)
+-+++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++-+
+-+++-+    #         expert_cache = mindspore.mint.scatter_add(
+-+++-+    #             expert_cache,
+-+++-+    #             0,
+-+++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++-+    #             expert_out
+-+++-+    #         )
+-+++-+
+-+++-+    #     return expert_cache
+-+++-+
+-+++-+
+-+++- 
+-+++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++- #     """
+-+++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++- 
+-+++-         # Initialize weights and apply final processing
+-+++-         self.post_init()
+-+++-+        self.warm_up = False
+-+++-+
+-+++-+    def warmup_moe_model_deep(self):
+-+++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++-+        test_texts = [
+-+++-+            "warmup short",
+-+++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++-+        ]
+-+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++-+        if tokenizer is None:
+-+++-+            from mindnlp.transformers import AutoTokenizer
+-+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++-+            self._warmup_tokenizer = tokenizer
+-+++-+
+-+++-+        for text in test_texts:
+-+++-+            inputs = tokenizer(text, return_tensors="ms")
+-+++-+            with mindspore._no_grad():
+-+++-+                _ = self(**inputs, use_cache=False)
+-+++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++- 
+-+++-     def get_input_embeddings(self):
+-+++-         return self.model.embed_tokens
+-+++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++-         ```"""
+-+++-+        if not self.warm_up:
+-+++-+            self.warm_up = True
+-+++-+            self.warmup_moe_model_deep()
+-+++-+
+-+++-         output_attentions = (
+-+++-             output_attentions
+-+++-             if output_attentions is not None
+-+++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++-index 3cbf820e..d4c6b651 100644
+-+++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++-@@ -18,7 +18,6 @@
+-+++- # See the License for the specific language governing permissions and
+-+++- # limitations under the License.
+-+++- """MindSpore Qwen2MoE model."""
+-+++--
+-+++- import math
+-+++- from typing import List, Optional, Tuple, Union
+-+++- 
+-+++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++-     TokenClassifierOutput,
+-+++- )
+-+++- from ...modeling_utils import PreTrainedModel
+-+++-+from ...generation import GenerationMixin
+-+++- from ....utils import logging
+-+++- from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++- 
+-+++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++-         self.variance_epsilon = eps
+-+++- 
+-+++-     def forward(self, hidden_states):
+-+++-+        # @dwj
+-+++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++-+        # @lwx
+-+++-+        # if not self.training :
+-+++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++-         input_dtype = hidden_states.dtype
+-+++-         hidden_states = hidden_states.to(mindspore.float32)
+-+++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++-@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++-     """Rotates half the hidden dims of the input."""
+-+++-     x1 = x[..., : x.shape[-1] // 2]
+-+++-     x2 = x[..., x.shape[-1] // 2 :]
+-+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++-     return ops.cat((-x2, x1), dim=-1)
+-+++- 
+-+++- 
+-+++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++-         self.config = config
+-+++-         self.hidden_size = config.hidden_size
+-+++-         self.intermediate_size = intermediate_size
+-+++-+        
+-+++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++-         self.act_fn = ACT2FN[config.hidden_act]
+-+++- 
+-+++-     def forward(self, x):
+-+++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++--
+-+++- 
+-+++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++-+        # @lwx 
+-+++-+        # gate_up_output = self.gate_up_proj(x)
+-+++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++-+        # return self.down_proj(swiglu_output)
+-+++-+
+-+++-+    # def forward(self, x):
+-+++-+    #     gate_proj_out = self.gate_proj(x)
+-+++-+    #     up_proj_out = self.up_proj(x)
+-+++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++-+    #     return self.down_proj(swiglu_out)
+-+++-+        
+-+++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++-     """
+-+++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++-         use_cache: bool = False,
+-+++-         cache_position: Optional[mindspore.Tensor] = None,
+-+++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-+
+-+++-+        
+-+++-+
+-+++-         bsz, q_len, _ = hidden_states.shape
+-+++- 
+-+++-         query_states = self.q_proj(hidden_states)
+-+++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-                     "with a layer index."
+-+++-                 )
+-+++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-+            if isinstance(past_key_value, StaticCache):
+-+++-+                kv_seq_len = key_states.shape[-2]
+-+++-+            else:
+-+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++- 
+-+++-         if past_key_value is not None:
+-+++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++-+            
+-+++-+            if isinstance(past_key_value, StaticCache):
+-+++-+                kv_seq_len = key_states.shape[-2]
+-+++- 
+-+++-         # repeat k/v heads if n_kv_heads < n_heads
+-+++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++--
+-+++-+        
+-+++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++- 
+-+++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++--            raise ValueError(
+-+++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++--                f" {attn_weights.shape}"
+-+++--            )
+-+++--
+-+++--        if attention_mask is not None:  # no matter the length, we just slice it
+-+++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++-+        if attention_mask is not None:
+-+++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++-             attn_weights = attn_weights + causal_mask
+-+++- 
+-+++-         # upcast attention to fp32
+-+++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++- 
+-+++-         attn_output = self.o_proj(attn_output)
+-+++--
+-+++-+        # @lwx
+-+++-+        
+-+++-+        # max_seq_len = self.max_position_embeddings  # 2048
+-+++-+
+-+++-+        # if attention_mask is not None:
+-+++-+        #     # attention_mask: [B, 1, Sq, Sk]
+-+++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++-+
+-+++-+        #     # pad 到 [max_seq_len, max_seq_len]
+-+++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++-+        #     global_attention_mask = padded_mask
+-+++-+        # else:
+-+++-+        #     global_attention_mask = None
+-+++-+
+-+++-+
+-+++-+        # sparse_mode=3
+-+++-+        # attn_output = mindspore.ops.flash_attention_score(
+-+++-+        #     query=query_states,
+-+++-+        #     key=key_states,
+-+++-+        #     value=value_states,
+-+++-+        #     real_shift=None,
+-+++-+        #     padding_mask=None,
+-+++-+
+-+++-+        #     head_num=self.num_heads,
+-+++-+        #     attn_mask=global_attention_mask,
+-+++-+        #     keep_prob=1.0 - self.attention_dropout,
+-+++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-+        #     input_layout="BNSD",
+-+++-+        #     pre_tokens=2147483647,
+-+++-+        #     next_tokens=2147483647,
+-+++-+        #     inner_precise=0,
+-+++-+        #     drop_mask=None,
+-+++-+        #     prefix=None,
+-+++-+        #     actual_seq_qlen=None,
+-+++-+        #     actual_seq_kvlen=None,
+-+++-+        #     sparse_mode=sparse_mode,
+-+++-+        # )
+-+++-         if not output_attentions:
+-+++-             attn_weights = None
+-+++- 
+-+++-         return attn_output, attn_weights, past_key_value
+-+++- 
+-+++- 
+-+++-+class Qwen2MoeFlashAttention(nn.Module):
+-+++-+    """
+-+++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++-+
+-+++-+    关键改动:
+-+++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++-+       直接传入原始的 key 和 value 张量效率更高。
+-+++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++-+    """
+-+++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++-+        super().__init__()
+-+++-+        self.config = config
+-+++-+        self.layer_idx = layer_idx
+-+++-+        self.hidden_size = config.hidden_size
+-+++-+        self.num_heads = config.num_attention_heads
+-+++-+        self.head_dim = self.hidden_size // self.num_heads
+-+++-+        self.num_key_value_heads = config.num_key_value_heads
+-+++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++-+        self.max_position_embeddings = config.max_position_embeddings
+-+++-+        self.rope_theta = config.rope_theta
+-+++-+        self.attention_dropout = config.attention_dropout
+-+++-+
+-+++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++-+            raise ValueError(
+-+++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++-+            )
+-+++-+
+-+++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++-+
+-+++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++-+            self.head_dim,
+-+++-+            max_position_embeddings=self.max_position_embeddings,
+-+++-+            base=self.rope_theta,
+-+++-+        )
+-+++-+
+-+++-+    def forward(
+-+++-+        self,
+-+++-+        hidden_states: mindspore.Tensor,
+-+++-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+++-+        position_ids: Optional[mindspore.Tensor] = None,
+-+++-+        past_key_value: Optional[Cache] = None,
+-+++-+        output_attentions: bool = False,
+-+++-+        use_cache: bool = False,
+-+++-+        cache_position: Optional[mindspore.Tensor] = None,
+-+++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-+
+-+++-+        bsz, q_len, _ = hidden_states.shape
+-+++-+
+-+++-+        # 1. 线性投射 Q, K, V
+-+++-+        query_states = self.q_proj(hidden_states)
+-+++-+        key_states = self.k_proj(hidden_states)
+-+++-+        value_states = self.v_proj(hidden_states)
+-+++-+
+-+++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+
+-+++-+        # 3. RoPE 旋转位置编码
+-+++-+        kv_seq_len = key_states.shape[-2]
+-+++-+        if past_key_value is not None:
+-+++-+            if self.layer_idx is None:
+-+++-+                raise ValueError(
+-+++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-+                    "with a layer index."
+-+++-+                )
+-+++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++-+                if cache_position.shape[0] == 1:
+-+++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++-+                    kv_seq_len = past_seen_tokens + 1
+-+++-+                else:
+-+++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++-+            else:
+-+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-+
+-+++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-+
+-+++-+        # 4. KV 缓存更新
+-+++-+        if past_key_value is not None:
+-+++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-+            key_states, value_states = past_key_value.update(
+-+++-+                key_states, value_states, self.layer_idx, cache_kwargs
+-+++-+            )
+-+++-+            
+-+++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++-+                if cache_position.shape[0] == 1:
+-+++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++-+                    kv_seq_len = key_states.shape[-2]
+-+++-+
+-+++-+        # 5. [重要] 准备 Attention Mask
+-+++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++-+        fa_attention_mask = None
+-+++-+        if attention_mask is not None:
+-+++-+            # 截取与当前key长度匹配的部分
+-+++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++-+            fa_attention_mask = (mask_slice != 0)
+-+++-+
+-+++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++-+        input_dtype = query_states.dtype
+-+++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++-+            query_states = query_states.to(mindspore.float16)
+-+++-+            key_states = key_states.to(mindspore.float16)
+-+++-+            value_states = value_states.to(mindspore.float16)
+-+++-+
+-+++-+        # 6. [核心] 调用 flash_attention_score 算子
+-+++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++-+        attn_output = mindspore.ops.flash_attention_score(
+-+++-+            query=query_states,
+-+++-+            key=key_states,
+-+++-+            value=value_states,
+-+++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++-+            attn_mask=fa_attention_mask,
+-+++-+            keep_prob=1.0 - self.attention_dropout,
+-+++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-+            input_layout="BNSD",
+-+++-+            sparse_mode=0 # 使用 defaultMask 模式
+-+++-+        )
+-+++-+
+-+++-+        # 恢复原始数据类型
+-+++-+        attn_output = attn_output.to(input_dtype)
+-+++-+
+-+++-+        # 7. 调整输出形状
+-+++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-+        attn_output = self.o_proj(attn_output)
+-+++-+
+-+++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++-+        attn_weights = None
+-+++-+        if output_attentions:
+-+++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++-+
+-+++-+        return attn_output, attn_weights, past_key_value
+-+++-+
+-+++-+    # def forward(
+-+++-+    #     self,
+-+++-+    #     hidden_states: mindspore.Tensor,
+-+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++-+    #     past_key_value: Optional[Cache] = None,
+-+++-+    #     output_attentions: bool = False,
+-+++-+    #     use_cache: bool = False,
+-+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-+
+-+++-+    #     bsz, q_len, _ = hidden_states.shape
+-+++-+
+-+++-+    #     # 1. 线性投射 Q, K, V
+-+++-+    #     query_states = self.q_proj(hidden_states)
+-+++-+    #     key_states = self.k_proj(hidden_states)
+-+++-+    #     value_states = self.v_proj(hidden_states)
+-+++-+
+-+++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+
+-+++-+    #     # 3. RoPE 旋转位置编码
+-+++-+    #     kv_seq_len = key_states.shape[-2]
+-+++-+    #     if past_key_value is not None:
+-+++-+    #         if self.layer_idx is None:
+-+++-+    #             raise ValueError(
+-+++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-+    #                 "with a layer index."
+-+++-+    #             )
+-+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-+
+-+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-+
+-+++-+    #     # 4. KV 缓存更新
+-+++-+    #     if past_key_value is not None:
+-+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-+    #         key_states, value_states = past_key_value.update(
+-+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++-+    #         )
+-+++-+
+-+++-+    #     # 5. 准备 Attention Mask
+-+++-+    #     fa_attention_mask = None
+-+++-+    #     if attention_mask is not None:
+-+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-+    #         fa_attention_mask = (mask_slice != 0)
+-+++-+
+-+++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++-+    #     input_dtype = query_states.dtype
+-+++-+
+-+++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+++-+    #         query=query_states,
+-+++-+    #         key=key_states,
+-+++-+    #         value=value_states,
+-+++-+    #         head_num=self.num_heads,
+-+++-+    #         attn_mask=fa_attention_mask,
+-+++-+    #         keep_prob=1.0 - self.attention_dropout,
+-+++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-+    #         input_layout="BNSD",
+-+++-+    #         sparse_mode=0,
+-+++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++-+    #         inner_precise=1
+-+++-+    #     )
+-+++-+
+-+++-+    #     # 恢复原始数据类型
+-+++-+    #     attn_output = attn_output.to(input_dtype)
+-+++-+
+-+++-+    #     # 7. 调整输出形状
+-+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-+    #     attn_output = self.o_proj(attn_output)
+-+++-+
+-+++-+    #     attn_weights = None
+-+++-+    #     if output_attentions:
+-+++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++-+
+-+++-+    #     return attn_output, attn_weights, past_key_value
+-+++-+
+-+++-+    # def forward(
+-+++-+    #     self,
+-+++-+    #     hidden_states: mindspore.Tensor,
+-+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++-+    #     past_key_value: Optional[Cache] = None,
+-+++-+    #     output_attentions: bool = False,
+-+++-+    #     use_cache: bool = False,
+-+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-+
+-+++-+    #     bsz, q_len, _ = hidden_states.shape
+-+++-+
+-+++-+    #     query_states = self.q_proj(hidden_states)
+-+++-+    #     key_states = self.k_proj(hidden_states)
+-+++-+    #     value_states = self.v_proj(hidden_states)
+-+++-+
+-+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-+
+-+++-+    #     kv_seq_len = key_states.shape[-2]
+-+++-+    #     if past_key_value is not None:
+-+++-+    #         if self.layer_idx is None:
+-+++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-+
+-+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-+
+-+++-+    #     if past_key_value is not None:
+-+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++-+    #         key_states, value_states = past_key_value.update(
+-+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++-+    #         )
+-+++-+
+-+++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++-+
+-+++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++-+    #     # <--- 修改结束 ---
+-+++-+
+-+++-+    #     fa_attention_mask = None
+-+++-+    #     if attention_mask is not None:
+-+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++-+    #         fa_attention_mask = (mask_slice != 0)
+-+++-+
+-+++-+    #     input_dtype = query_states.dtype
+-+++-+
+-+++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++-+    #         key=key_states,
+-+++-+    #         value=value_states,
+-+++-+    #         head_num=self.num_heads,
+-+++-+    #         attn_mask=fa_attention_mask,
+-+++-+    #         keep_prob=1.0 - self.attention_dropout,
+-+++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++-+    #         input_layout="BNSD",
+-+++-+    #         sparse_mode=0,
+-+++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++-+    #     )
+-+++-+
+-+++-+    #     attn_output = attn_output.to(input_dtype)
+-+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-+    #     attn_output = self.o_proj(attn_output)
+-+++-+
+-+++-+    #     attn_weights = None
+-+++-+    #     if output_attentions:
+-+++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++-+
+-+++-+    #     return attn_output, attn_weights, past_key_value
+-+++-+
+-+++- QWEN2MOE_ATTENTION_CLASSES = {
+-+++-     "eager": Qwen2MoeAttention,
+-+++-+    "flash-attention": Qwen2MoeFlashAttention,
+-+++- }
+-+++- 
+-+++- 
+-+++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++- 
+-+++-+    #@dwj
+-+++-+    # 只遍历激活的专家，而非全部专家
+-+++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++--        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++--        # router_logits: (batch * sequence_length, n_experts)
+-+++--        router_logits = self.gate(hidden_states)
+-+++--
+-+++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++--        if self.norm_topk_prob:
+-+++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++--        # we cast back to the input dtype
+-+++--        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++--
+-+++--        final_hidden_states = ops.zeros(
+-+++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++--        )
+-+++--
+-+++--        # One hot encode the selected experts to create an expert mask
+-+++--        # this will be used to easily index which expert is going to be sollicitated
+-+++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++--
+-+++--        # Loop over all available experts in the model and perform the computation on each expert
+-+++--        for expert_idx in range(self.num_experts):
+-+++--            expert_layer = self.experts[expert_idx]
+-+++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++--
+-+++--            # Index the correct hidden states and compute the expert hidden state for
+-+++--            # the current expert. We need to make sure to multiply the output hidden
+-+++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++--            if 0 not in idx.shape:
+-+++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++--
+-+++--                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++--                # the `top_x` tensor here.
+-+++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++--
+-+++--        shared_expert_output = self.shared_expert(hidden_states)
+-+++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++--
+-+++--        final_hidden_states = final_hidden_states + shared_expert_output
+-+++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++-+            num_tokens = hidden_states_reshaped.shape[0]
+-+++-+
+-+++-+            router_logits = self.gate(hidden_states_reshaped)
+-+++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++-+
+-+++-+            if self.norm_topk_prob:
+-+++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++-+            
+-+++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++-+            flat_selected_experts = selected_experts.flatten()
+-+++-+            
+-+++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++-+            token_indices = broadcasted_token_indices.flatten()
+-+++-+            
+-+++-+            active_experts = ops.unique(flat_selected_experts)
+-+++-+            
+-+++-+            for expert_idx_tensor in active_experts:
+-+++-+                expert_idx = expert_idx_tensor.item()
+-+++-+                expert_layer = self.experts[expert_idx]
+-+++-+                
+-+++-+                mask = (flat_selected_experts == expert_idx_tensor)
+-+++-+                selected_token_indices = token_indices[mask]
+-+++-+                selected_routing_weights = routing_weights.flatten()[mask]
+-+++-+                
+-+++-+                current_states = hidden_states_reshaped[selected_token_indices]
+-+++-+                
+-+++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++-+                
+-+++-+                final_hidden_states = final_hidden_states.index_add(
+-+++-+                    dim=0,
+-+++-+                    index=selected_token_indices,
+-+++-+                    source=expert_output.to(hidden_states.dtype)
+-+++-+                )
+-+++-+            
+-+++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++- 
+-+++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++--        return final_hidden_states, router_logits
+-+++-+            final_hidden_states = final_hidden_states + shared_expert_output
+-+++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++-+            
+-+++-+            return final_hidden_states, router_logits
+-+++- 
+-+++- 
+-+++- class Qwen2MoeDecoderLayer(nn.Module):
+-+++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++- 
+-+++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++- 
+-+++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++-+
+-+++-         if (layer_idx not in config.mlp_only_layers) and (
+-+++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++-         ):
+-+++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++-     _skip_keys_device_placement = "past_key_values"
+-+++-     _supports_cache_class = True
+-+++-+#lwx
+-+++-+    # _supports_static_cache = True
+-+++- 
+-+++-     def _init_weights(self, module):
+-+++-         std = self.config.initializer_range
+-+++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++-         return causal_mask
+-+++- 
+-+++- 
+-+++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++-     _tied_weights_keys = ["lm_head.weight"]
+-+++- 
+-+++-     def __init__(self, config):
+-+++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++-         self.num_experts_per_tok = config.num_experts_per_tok
+-+++-         # Initialize weights and apply final processing
+-+++-         self.post_init()
+-+++-+        # @lwx
+-+++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++-+        #     self.generation_config.cache_implementation = "static"
+-+++-+        self._warmed_up = False
+-+++-+
+-+++-+    def warmup_moe_model(self):
+-+++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++-+        test_texts = [
+-+++-+            "warmup short",
+-+++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++-+        ]
+-+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++-+        if tokenizer is None:
+-+++-+            from mindnlp.transformers import AutoTokenizer
+-+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++-+            self._warmup_tokenizer = tokenizer
+-+++-+
+-+++-+        for text in test_texts:
+-+++-+            inputs = tokenizer(text, return_tensors="ms")
+-+++-+            with mindspore._no_grad():
+-+++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++- 
+-+++-     def get_input_embeddings(self):
+-+++-         return self.model.embed_tokens
+-+++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++-         ```"""
+-+++-+        if not self._warmed_up:
+-+++-+            self._warmed_up = True
+-+++-+            self.warmup_moe_model()
+-+++- 
+-+++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++-         output_router_logits = (
+-+++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++-             }
+-+++-         )
+-+++-         return model_inputs
+-+++-+# @lwx
+-+++-+    # def _decode_one_tokens_logits(
+-+++-+    #     self,
+-+++-+    #     cur_token: mindspore.Tensor,
+-+++-+    #     input_pos: Optional[mindspore.Tensor],
+-+++-+    #     cache_position: mindspore.Tensor,
+-+++-+    #     past_key_values: StaticCache,
+-+++-+    # ) -> mindspore.Tensor:
+-+++-+    #     """
+-+++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++-+        
+-+++-+    #     Args:
+-+++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++-+    #         input_pos: 输入位置信息，可选
+-+++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++-+            
+-+++-+    #     Returns:
+-+++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++-+    #     """
+-+++-+    #     # 调用JIT编译的版本
+-+++-+    #     return self.get_decode_one_tokens_logits(
+-+++-+    #         cur_token=cur_token,
+-+++-+    #         input_pos=input_pos,
+-+++-+    #         cache_position=cache_position,
+-+++-+    #         past_key_values=past_key_values,
+-+++-+    #     )
+-+++-+    
+-+++-+    # @mindspore.jit(jit_level='O1')
+-+++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++-+    #     """
+-+++-+    #     JIT编译的函数，用于高效的单token解码
+-+++-+    #     使用JIT编译优化以支持静态shape和高效执行
+-+++-+        
+-+++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++-+    #     """
+-+++-+    #     outputs = self.model.forward(
+-+++-+    #         input_ids=cur_token,
+-+++-+    #         position_ids=input_pos,
+-+++-+    #         cache_position=cache_position,
+-+++-+    #         past_key_values=past_key_values,
+-+++-+    #         use_cache=True,
+-+++-+    #         return_dict=False,
+-+++-+    #     )
+-+++-+        
+-+++-+    #     hidden_states = outputs[0]
+-+++-+    #     logits = self.lm_head.forward(hidden_states)
+-+++-+    #     logits = logits.float()
+-+++-+        
+-+++-+    #     return logits[:, -1, :]
+-+++-+
+-+++-+    # def _sample(
+-+++-+    #     self,
+-+++-+    #     input_ids: mindspore.Tensor,
+-+++-+    #     logits_processor,
+-+++-+    #     stopping_criteria,
+-+++-+    #     generation_config,
+-+++-+    #     synced_devices: bool,
+-+++-+    #     streamer=None,
+-+++-+    #     logits_warper=None,
+-+++-+    #     **model_kwargs,
+-+++-+    # ):
+-+++-+    #     """
+-+++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++-+    #     """
+-+++-+    #     from ...generation.logits_process import LogitsProcessorList
+-+++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++-+    #     from mindnlp.core import nn, ops, no_grad
+-+++-+    #     import numpy as np
+-+++-+        
+-+++-+    #     # 检查是否使用 StaticCache
+-+++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++-+    #     # 否则，直接调用父类方法
+-+++-+    #     past_key_values = model_kwargs.get("past_key_values")
+-+++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++-+
+-+++-+    #     if not isinstance(past_key_values, StaticCache):
+-+++-+    #         # 不使用 StaticCache，直接调用父类方法
+-+++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++-+    #         return super()._sample(
+-+++-+    #             input_ids=input_ids,
+-+++-+    #             logits_processor=logits_processor,
+-+++-+    #             stopping_criteria=stopping_criteria,
+-+++-+    #             generation_config=generation_config,
+-+++-+    #             synced_devices=synced_devices,
+-+++-+    #             streamer=streamer,
+-+++-+    #             logits_warper=logits_warper,
+-+++-+    #             **model_kwargs,
+-+++-+    #         )
+-+++-+        
+-+++-+    #     # 使用 StaticCache，进入自定义循环
+-+++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++-+    #     pad_token_id = generation_config._pad_token_tensor
+-+++-+    #     output_attentions = generation_config.output_attentions
+-+++-+    #     output_hidden_states = generation_config.output_hidden_states
+-+++-+    #     output_scores = generation_config.output_scores
+-+++-+    #     output_logits = generation_config.output_logits
+-+++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++-+    #     max_length = generation_config.max_length
+-+++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++-+    #     do_sample = generation_config.do_sample
+-+++-+        
+-+++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++-+    #         raise ValueError(
+-+++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++-+    #             f"{logits_warper})."
+-+++-+    #         )
+-+++-+        
+-+++-+    #     # init attention / hidden states / scores tuples
+-+++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++-+        
+-+++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++-+    #         encoder_hidden_states = (
+-+++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++-+    #         )
+-+++-+        
+-+++-+    #     # keep track of which sequences are already finished
+-+++-+    #     batch_size, cur_len = input_ids.shape
+-+++-+    #     this_peer_finished = False
+-+++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++-+        
+-+++-+    #     time_record = []
+-+++-+    #     from ....utils.testing_utils import parse_flag_from_env
+-+++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++-+        
+-+++-+    #     while self._has_unfinished_sequences(
+-+++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++-+    #     ):
+-+++-+    #         if _record_time:
+-+++-+    #             import time as time_module
+-+++-+    #             infer_start = time_module.time()
+-+++-+            
+-+++-+    #         # prepare model inputs
+-+++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++-+            
+-+++-+    #         # prepare variable output controls
+-+++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++-+            
+-+++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++-+    #         cur_cache_position = model_inputs.get("cache_position")
+-+++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++-+    #         cur_input_ids = model_inputs.get("input_ids")
+-+++-+            
+-+++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++-+    #             cur_cache_position is not None and 
+-+++-+    #             len(cur_cache_position.shape) > 0 and
+-+++-+    #             cur_cache_position.shape[0] == 1 and
+-+++-+    #             cur_input_ids is not None and
+-+++-+    #             cur_input_ids.shape[1] == 1):
+-+++-+    #             # 使用 JIT 优化的单 token 解码
+-+++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++-+    #             if not hasattr(self, '_jit_used'):
+-+++-+    #                 self._jit_used = False
+-+++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++-+                
+-+++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++-+    #                 cur_token=cur_input_ids,
+-+++-+    #                 input_pos=model_inputs.get("position_ids"),
+-+++-+    #                 cache_position=cur_cache_position,
+-+++-+    #                 past_key_values=cur_past_key_values,
+-+++-+    #             )
+-+++-+                
+-+++-+    #             # 标记已使用JIT（用于后续判断）
+-+++-+    #             if not self._jit_used:
+-+++-+    #                 self._jit_used = True
+-+++-+                
+-+++-+    #             # 构造兼容的输出对象
+-+++-+    #             class JitOptimizedOutput:
+-+++-+    #                 def __init__(self, logits, config):
+-+++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++-+    #                     self.config = config
+-+++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++-+    #                     self.cross_attentions = None
+-+++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++-+                
+-+++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++-+    #         else:
+-+++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++-+    #             outputs = self(**model_inputs, return_dict=True)
+-+++-+            
+-+++-+    #         if synced_devices and this_peer_finished:
+-+++-+    #             continue
+-+++-+            
+-+++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++-+    #         next_token_logits = outputs.logits[:, -1, :]
+-+++-+            
+-+++-+    #         # pre-process distribution
+-+++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++-+    #         if do_sample:
+-+++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++-+            
+-+++-+    #         # Store scores, attentions and hidden_states when required
+-+++-+    #         if return_dict_in_generate:
+-+++-+    #             if output_scores:
+-+++-+    #                 scores += (next_token_scores,)
+-+++-+    #             if output_logits:
+-+++-+    #                 raw_logits += (next_token_logits,)
+-+++-+    #             if output_attentions:
+-+++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++-+    #                 if self.config.is_encoder_decoder:
+-+++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++-+                
+-+++-+    #             if output_hidden_states:
+-+++-+    #                 hidden = (
+-+++-+    #                     outputs.decoder_hidden_states
+-+++-+    #                     if self.config.is_encoder_decoder
+-+++-+    #                     else outputs.hidden_states
+-+++-+    #                 )
+-+++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++-+            
+-+++-+    #         # token selection
+-+++-+    #         if do_sample:
+-+++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++-+    #         else:
+-+++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++-+            
+-+++-+    #         # finished sentences should have their next token be a padding token
+-+++-+    #         if has_eos_stopping_criteria:
+-+++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++-+            
+-+++-+    #         # update generated ids, model inputs, and length for next step
+-+++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++-+    #         if streamer is not None:
+-+++-+    #             streamer.put(next_tokens)
+-+++-+            
+-+++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++-+    #             outputs,
+-+++-+    #             model_kwargs,
+-+++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++-+    #         )
+-+++-+            
+-+++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++-+    #         cur_len += 1
+-+++-+            
+-+++-+    #         if _record_time:
+-+++-+    #             import time as time_module
+-+++-+    #             infer_stop = time_module.time()
+-+++-+    #             time_record.append(infer_stop - infer_start)
+-+++-+            
+-+++-+    #         del outputs
+-+++-+        
+-+++-+    #     average_infer_time = None
+-+++-+    #     if time_record:
+-+++-+    #         if len(time_record) > 1:
+-+++-+    #             time_record.pop(0)
+-+++-+    #         average_infer_time = sum(time_record) / len(time_record)
+-+++-+    #         print(f'average inference time is: {average_infer_time}')
+-+++-+    #         print(f'inference time record: {time_record}')
+-+++-+        
+-+++-+    #     if streamer is not None:
+-+++-+    #         streamer.end()
+-+++-+        
+-+++-+    #     # 简单判断：打印是否使用了JIT路径
+-+++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++-+    #     else:
+-+++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++-+        
+-+++-+    #     if return_dict_in_generate:
+-+++-+    #         if self.config.is_encoder_decoder:
+-+++-+    #             return GenerateEncoderDecoderOutput(
+-+++-+    #                 sequences=input_ids,
+-+++-+    #                 scores=scores,
+-+++-+    #                 logits=raw_logits,
+-+++-+    #                 encoder_attentions=encoder_attentions,
+-+++-+    #                 encoder_hidden_states=encoder_hidden_states,
+-+++-+    #                 decoder_attentions=decoder_attentions,
+-+++-+    #                 cross_attentions=cross_attentions,
+-+++-+    #                 decoder_hidden_states=decoder_hidden_states,
+-+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++-+    #                 average_infer_time=average_infer_time
+-+++-+    #             )
+-+++-+    #         else:
+-+++-+    #             return GenerateDecoderOnlyOutput(
+-+++-+    #                 sequences=input_ids,
+-+++-+    #                 scores=scores,
+-+++-+    #                 logits=raw_logits,
+-+++-+    #                 attentions=decoder_attentions,
+-+++-+    #                 hidden_states=decoder_hidden_states,
+-+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++-+    #                 average_infer_time=average_infer_time
+-+++-+    #             )
+-+++-+    #     else:
+-+++-+    #         return input_ids
+-+++-+            
+-+++-+    # def _prepare_cache_for_generation(
+-+++-+    #     self,
+-+++-+    #     generation_config,
+-+++-+    #     model_kwargs,
+-+++-+    #     assistant_model,
+-+++-+    #     batch_size,
+-+++-+    #     max_cache_length,
+-+++-+    # ):
+-+++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++-+    #         generation_config.cache_implementation = "static"
+-+++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++-+        
+-+++-+    #     if generation_config.cache_implementation == "static":
+-+++-+    #         base_required_from_max_length = generation_config.max_length + 1
+-+++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++-+    #         min_cache_size = 50
+-+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++-+    #         else:
+-+++-+    #             max_cache_length = max(base_required, min_cache_size)
+-+++-+            
+-+++-+    #         original_max_cache_length = max_cache_length
+-+++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++-+            
+-+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++-+    #             if max_cache_length > self.config.max_position_embeddings:
+-+++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++-+        
+-+++-+    #     result = super()._prepare_cache_for_generation(
+-+++-+    #         generation_config=generation_config,
+-+++-+    #         model_kwargs=model_kwargs,
+-+++-+    #         assistant_model=assistant_model,
+-+++-+    #         batch_size=batch_size,
+-+++-+    #         max_cache_length=max_cache_length,
+-+++-+    #     )
+-+++-+        
+-+++-+    #     if generation_config.cache_implementation == "static":
+-+++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++-+    #         created_cache = model_kwargs.get(cache_name)
+-+++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++-+        
+-+++-+    #     return result
+-+++-+
+-+++-+
+-+++-+
+-+++- 
+-+++- 
+-+++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++--- 
+-+++-2.27.0
+-+++-
+-+++-- 
+-+++2.27.0
+-+++
+-++-- 
+-++2.27.0
+-++
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
+deleted file mode 100644
+index 8a2fc4fe..00000000
+--- a/patches/0007-20251107003commit.patch
++++ /dev/null
+@@ -1,8034 +0,0 @@
+-From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Fri, 7 Nov 2025 12:12:51 +0800
+-Subject: [PATCH 7/8] 20251107003commit
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |    2 +-
+- patches/0001-20251104commit.patch             |    2 +-
+- patches/0002-20251106commit.patch             |    2 +-
+- patches/0003-20261106secondcommit.patch       |    2 +-
+- patches/0004-20251106change.patch             |    2 +-
+- patches/0005-20251107001commit.patch          |    2 +-
+- patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
+- 7 files changed, 7937 insertions(+), 6 deletions(-)
+- create mode 100644 patches/0006-20251107002commit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index e7e1c053..ff631974 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
+-     #     return expert_cache
+-     
+-     @no_grad()
+--    dwj
+-+    # dwj
+-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-         # x 的 shape: (1, hidden_size)
+-         # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-index 2842180e..c9c8c5ee 100644
+---- a/patches/0001-20251104commit.patch
+-+++ b/patches/0001-20251104commit.patch
+-@@ -1,7 +1,7 @@
+- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Tue, 4 Nov 2025 09:11:51 +0800
+--Subject: [PATCH 1/5] 20251104commit
+-+Subject: [PATCH 1/6] 20251104commit
+- 
+- ---
+-  mindnlp/transformers/cache_utils.py           |  28 +-
+-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-index c6cd8757..625656eb 100644
+---- a/patches/0002-20251106commit.patch
+-+++ b/patches/0002-20251106commit.patch
+-@@ -1,7 +1,7 @@
+- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 09:20:38 +0800
+--Subject: [PATCH 2/5] 20251106commit
+-+Subject: [PATCH 2/6] 20251106commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-index 601960c9..dcb85080 100644
+---- a/patches/0003-20261106secondcommit.patch
+-+++ b/patches/0003-20261106secondcommit.patch
+-@@ -1,7 +1,7 @@
+- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 14:54:37 +0800
+--Subject: [PATCH 3/5] 20261106secondcommit
+-+Subject: [PATCH 3/6] 20261106secondcommit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-index 8976f10b..bbed13cc 100644
+---- a/patches/0004-20251106change.patch
+-+++ b/patches/0004-20251106change.patch
+-@@ -1,7 +1,7 @@
+- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 15:48:09 +0800
+--Subject: [PATCH 4/5] 20251106change
+-+Subject: [PATCH 4/6] 20251106change
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-index 8d9032be..b2d1035c 100644
+---- a/patches/0005-20251107001commit.patch
+-+++ b/patches/0005-20251107001commit.patch
+-@@ -1,7 +1,7 @@
+- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Fri, 7 Nov 2025 11:48:18 +0800
+--Subject: [PATCH 5/5] 20251107001commit
+-+Subject: [PATCH 5/6] 20251107001commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |   91 +-
+-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+-new file mode 100644
+-index 00000000..bffa134e
+---- /dev/null
+-+++ b/patches/0006-20251107002commit.patch
+-@@ -0,0 +1,7931 @@
+-+From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Fri, 7 Nov 2025 12:06:32 +0800
+-+Subject: [PATCH 6/6] 20251107002commit
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |  122 +-
+-+ patches/0001-20251104commit.patch             |    2 +-
+-+ patches/0002-20251106commit.patch             |    2 +-
+-+ patches/0003-20261106secondcommit.patch       |    2 +-
+-+ patches/0004-20251106change.patch             |    2 +-
+-+ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
+-+ 6 files changed, 7773 insertions(+), 64 deletions(-)
+-+ create mode 100644 patches/0005-20251107001commit.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index 8831e4b7..e7e1c053 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
+-+     #         expert_out = expert(x)
+-+     #         expert_cache += expert_out * weight
+-+     #     return expert_cache
+-+-
+-+-    # @no_grad()
+-+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+-    #     # x 的 shape: (1, hidden_size)
+-+-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+-
+-+-    #     # 1. 收集所有需要的专家层
+-+-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+-
+-+-    #     # 2. 并行计算所有专家的输出
+-+-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-+-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+-
+-+-    #     # 3. 使用矩阵乘法进行加权求和
+-+-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-+-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++    
+-++    @no_grad()
+-++    dwj
+-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++        # x 的 shape: (1, hidden_size)
+-++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++
+-++        # 1. 收集所有需要的专家层
+-++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++
+-++        # 2. 并行计算所有专家的输出
+-++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++
+-++        # 3. 使用矩阵乘法进行加权求和
+-++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+         
+-+-    #     return final_output
+-++        return final_output
+-+ 
+-+ 
+-+     # @no_grad()
+-+@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
+-+ 
+-+         return expert_cache
+-+ # 放置在 DeepseekMoE 类中
+-+-    @no_grad()
+-+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+-        """
+-+-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-+-
+-+-        Args:
+-+-            x (Tensor): 输入张量, shape: (1, hidden_size)
+-+-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-+-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-+-        """
+-+-        top_k, _ = flat_expert_weights.shape
+-+-        hidden_size = x.shape[-1]
+-+-
+-+-        # 1. 将所有专家的权重堆叠起来
+-+-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-+-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-+-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-++    # @no_grad()
+-++    # #lwx 20251107
+-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++    #     """
+-++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-++
+-++    #     Args:
+-++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+-++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-++    #     """
+-++    #     top_k, _ = flat_expert_weights.shape
+-++    #     hidden_size = x.shape[-1]
+-++
+-++    #     # 1. 将所有专家的权重堆叠起来
+-++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-+         
+-+-        # 2. "收集" 所需的专家权重
+-+-        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-+-        selected_up_w = stacked_up_w[flat_expert_indices]
+-+-        selected_down_w = stacked_down_w[flat_expert_indices]
+-++    #     # 2. "收集" 所需的专家权重
+-++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+-++    #     selected_up_w = stacked_up_w[flat_expert_indices]
+-++    #     selected_down_w = stacked_down_w[flat_expert_indices]
+-+         
+-+-        # 3. 准备输入
+-+-        x_expanded = x.expand((top_k, 1, hidden_size))
+-++    #     # 3. 准备输入
+-++    #     x_expanded = x.expand((top_k, 1, hidden_size))
+-+         
+-+-        # 4. 并行计算 gate_proj 和 up_proj
+-+-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-+-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-++    #     # 4. 并行计算 gate_proj 和 up_proj
+-++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-+ 
+-+-        # 5. 计算中间状态
+-+-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-++    #     # 5. 计算中间状态
+-++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-+         
+-+-        # 6. 并行计算 down_proj
+-+-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-+-        #    --- [FIX] ---
+-+-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-+-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-+-        #    --- [FIX END] ---
+-++    #     # 6. 并行计算 down_proj
+-++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-++    #     #    --- [FIX] ---
+-++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-++    #     #    --- [FIX END] ---
+-+         
+-+-        # 7. 根据路由权重进行加权求和
+-+-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-++    #     # 7. 根据路由权重进行加权求和
+-++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-+ 
+-+-        return weighted_sum
+-++    #     return weighted_sum
+-+ 
+-+ 
+-+ 
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+index 0a0ef2d7..2842180e 100644
+-+--- a/patches/0001-20251104commit.patch
+-++++ b/patches/0001-20251104commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+-Subject: [PATCH 1/4] 20251104commit
+-++Subject: [PATCH 1/5] 20251104commit
+-+ 
+-+ ---
+-+  mindnlp/transformers/cache_utils.py           |  28 +-
+-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+index 5185270c..c6cd8757 100644
+-+--- a/patches/0002-20251106commit.patch
+-++++ b/patches/0002-20251106commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+-Subject: [PATCH 2/4] 20251106commit
+-++Subject: [PATCH 2/5] 20251106commit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+index 3e05f821..601960c9 100644
+-+--- a/patches/0003-20261106secondcommit.patch
+-++++ b/patches/0003-20261106secondcommit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+-Subject: [PATCH 3/4] 20261106secondcommit
+-++Subject: [PATCH 3/5] 20261106secondcommit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-+index 88a1aef4..8976f10b 100644
+-+--- a/patches/0004-20251106change.patch
+-++++ b/patches/0004-20251106change.patch
+-+@@ -1,7 +1,7 @@
+-+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-+-Subject: [PATCH 4/4] 20251106change
+-++Subject: [PATCH 4/5] 20251106change
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-+new file mode 100644
+-+index 00000000..8d9032be
+-+--- /dev/null
+-++++ b/patches/0005-20251107001commit.patch
+-+@@ -0,0 +1,7707 @@
+-++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Fri, 7 Nov 2025 11:48:18 +0800
+-++Subject: [PATCH 5/5] 20251107001commit
+-++
+-++---
+-++ .../models/deepseek/modeling_deepseek.py      |   91 +-
+-++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+-++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+-++ patches/0001-20251104commit.patch             |    2 +-
+-++ patches/0002-20251106commit.patch             |    2 +-
+-++ patches/0003-20261106secondcommit.patch       |    2 +-
+-++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+-++ 7 files changed, 7577 insertions(+), 30 deletions(-)
+-++ create mode 100644 patches/0004-20251106change.patch
+-++
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index 0546f318..8831e4b7 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+-++     #         expert_cache += expert_out * weight
+-++     #     return expert_cache
+-++ 
+-++-    @no_grad()
+-++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++-        # x 的 shape: (1, hidden_size)
+-++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++-
+-++-        # 1. 收集所有需要的专家层
+-++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++-
+-++-        # 2. 并行计算所有专家的输出
+-++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++-        # ops.cat 会将它们堆叠成一个新的 Tensor
+-++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++-
+-++-        # 3. 使用矩阵乘法进行加权求和
+-++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++-        # 最终结果 final_output 的 shape: (1, hidden_size)
+-++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+++    # @no_grad()
+-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     # x 的 shape: (1, hidden_size)
+-+++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+++
+-+++    #     # 1. 收集所有需要的专家层
+-+++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+++
+-+++    #     # 2. 并行计算所有专家的输出
+-+++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-+++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+++
+-+++    #     # 3. 使用矩阵乘法进行加权求和
+-+++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-+++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++         
+-++-        return final_output
+-+++    #     return final_output
+-++ 
+-++ 
+-++     # @no_grad()
+-++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+-++             )
+-++ 
+-++         return expert_cache
+-+++# 放置在 DeepseekMoE 类中
+-+++    @no_grad()
+-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++        """
+-+++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-+++
+-+++        Args:
+-+++            x (Tensor): 输入张量, shape: (1, hidden_size)
+-+++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-+++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-+++        """
+-+++        top_k, _ = flat_expert_weights.shape
+-+++        hidden_size = x.shape[-1]
+-+++
+-+++        # 1. 将所有专家的权重堆叠起来
+-+++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-+++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-+++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-+++        
+-+++        # 2. "收集" 所需的专家权重
+-+++        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-+++        selected_up_w = stacked_up_w[flat_expert_indices]
+-+++        selected_down_w = stacked_down_w[flat_expert_indices]
+-+++        
+-+++        # 3. 准备输入
+-+++        x_expanded = x.expand((top_k, 1, hidden_size))
+-+++        
+-+++        # 4. 并行计算 gate_proj 和 up_proj
+-+++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-+++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-+++
+-+++        # 5. 计算中间状态
+-+++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-+++        
+-+++        # 6. 并行计算 down_proj
+-+++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-+++        #    --- [FIX] ---
+-+++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-+++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-+++        #    --- [FIX END] ---
+-+++        
+-+++        # 7. 根据路由权重进行加权求和
+-+++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-+++
+-+++        return weighted_sum
+-+++
+-+++
+-++ 
+-++         # @no_grad()
+-++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++index ebd7782e..913a7609 100644
+-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++-    x1 = x[..., : x.shape[-1] // 2]
+-++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++    # x1 = x[..., : x.shape[-1] // 2]
+-+++    # x2 = x[..., x.shape[-1] // 2 :]
+-++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-++index d059dcbe..2b217b64 100644
+-++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++ def rotate_half(x):
+-++     """Rotates half the hidden dims of the input."""
+-++-    x1 = x[..., : x.shape[-1] // 2]
+-++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++    # x1 = x[..., : x.shape[-1] // 2]
+-+++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++     return ops.cat((-x2, x1), dim=-1)
+-++ 
+-++ 
+-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++index 78f22642..0a0ef2d7 100644
+-++--- a/patches/0001-20251104commit.patch
+-+++++ b/patches/0001-20251104commit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++-Subject: [PATCH 1/3] 20251104commit
+-+++Subject: [PATCH 1/4] 20251104commit
+-++ 
+-++ ---
+-++  mindnlp/transformers/cache_utils.py           |  28 +-
+-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-++index 22b65dd5..5185270c 100644
+-++--- a/patches/0002-20251106commit.patch
+-+++++ b/patches/0002-20251106commit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-++-Subject: [PATCH 2/3] 20251106commit
+-+++Subject: [PATCH 2/4] 20251106commit
+-++ 
+-++ ---
+-++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-++index 966529e4..3e05f821 100644
+-++--- a/patches/0003-20261106secondcommit.patch
+-+++++ b/patches/0003-20261106secondcommit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-++-Subject: [PATCH 3/3] 20261106secondcommit
+-+++Subject: [PATCH 3/4] 20261106secondcommit
+-++ 
+-++ ---
+-++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-++new file mode 100644
+-++index 00000000..88a1aef4
+-++--- /dev/null
+-+++++ b/patches/0004-20251106change.patch
+-++@@ -0,0 +1,7498 @@
+-+++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Thu, 6 Nov 2025 15:48:09 +0800
+-+++Subject: [PATCH 4/4] 20251106change
+-+++
+-+++---
+-+++ .../models/deepseek/modeling_deepseek.py      |  189 +-
+-+++ patches/0001-20251104commit.patch             | 1272 +++++++
+-+++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+-+++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+-+++ 4 files changed, 7244 insertions(+), 186 deletions(-)
+-+++ create mode 100644 patches/0001-20251104commit.patch
+-+++ create mode 100644 patches/0002-20251106commit.patch
+-+++ create mode 100644 patches/0003-20261106secondcommit.patch
+-+++
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index 2f9192bf..0546f318 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+-+++ 
+-+++         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++-# class DeepseekFlashAttention(nn.Module):
+-+++-#     """
+-+++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-+++-
+-+++-#     This class is designed as a drop-in replacement for DeepseekAttention.
+-+++-#     """
+-+++-
+-+++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+++-#         super().__init__()
+-+++-#         self.config = config
+-+++-#         self.layer_idx = layer_idx
+-+++-#         if layer_idx is None:
+-+++-#             logger.warning(
+-+++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++-#                 "when creating this class."
+-+++-#             )
+-+++-
+-+++-#         self.attention_dropout = config.attention_dropout
+-+++-#         self.hidden_size = config.hidden_size
+-+++-#         self.num_heads = config.num_attention_heads
+-+++-#         self.head_dim = self.hidden_size // self.num_heads
+-+++-#         self.num_key_value_heads = config.num_key_value_heads
+-+++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++-#         self.max_position_embeddings = config.max_position_embeddings
+-+++-#         self.rope_theta = config.rope_theta
+-+++-#         self.is_causal = True
+-+++-
+-+++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++-#             raise ValueError(
+-+++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++-#                 f" and `num_heads`: {self.num_heads})."
+-+++-#             )
+-+++-
+-+++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+++-#         self._init_rope()
+-+++-
+-+++-#     def _init_rope(self):
+-+++-#         if self.config.rope_scaling is None:
+-+++-#             self.rotary_emb = DeepseekRotaryEmbedding(
+-+++-#                 self.head_dim,
+-+++-#                 max_position_embeddings=self.max_position_embeddings,
+-+++-#                 base=self.rope_theta,
+-+++-#             )
+-+++-#         else:
+-+++-#             scaling_type = self.config.rope_scaling["type"]
+-+++-#             scaling_factor = self.config.rope_scaling["factor"]
+-+++-#             if scaling_type == "linear":
+-+++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+++-#                     self.head_dim,
+-+++-#                     max_position_embeddings=self.max_position_embeddings,
+-+++-#                     scaling_factor=scaling_factor,
+-+++-#                     base=self.rope_theta,
+-+++-#                 )
+-+++-#             elif scaling_type == "dynamic":
+-+++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+++-#                     self.head_dim,
+-+++-#                     max_position_embeddings=self.max_position_embeddings,
+-+++-#                     scaling_factor=scaling_factor,
+-+++-#                     base=self.rope_theta,
+-+++-#                 )
+-+++-#             else:
+-+++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+++-
+-+++-#     def forward(
+-+++-#         self,
+-+++-#         hidden_states: mindspore.Tensor,
+-+++-#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++-#         position_ids: Optional[mindspore.Tensor] = None,
+-+++-#         past_key_value: Optional[Cache] = None,
+-+++-#         output_attentions: bool = False,
+-+++-#         use_cache: bool = False,
+-+++-#         **kwargs,
+-+++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++-#         if "padding_mask" in kwargs:
+-+++-#             warnings.warn(
+-+++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+++-#             )
+-+++-        
+-+++-#         if output_attentions:
+-+++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-+++-
+-+++-#         bsz, q_len, _ = hidden_states.shape
+-+++-
+-+++-#         if self.config.pretraining_tp > 1:
+-+++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+++-
+-+++-#         query_states = self.q_proj(hidden_states)
+-+++-#         key_states = self.k_proj(hidden_states)
+-+++-#         value_states = self.v_proj(hidden_states)
+-+++-
+-+++-#         # Reshape for multi-head attention
+-+++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++-
+-+++-#         kv_seq_len = key_states.shape[-2]
+-+++-#         if past_key_value is not None:
+-+++-#             if self.layer_idx is None:
+-+++-#                 raise ValueError(
+-+++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++-#                     "with a layer index."
+-+++-#                 )
+-+++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++-        
+-+++-#         # Apply Rotary Positional Embedding
+-+++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++-
+-+++-#         if past_key_value is not None:
+-+++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-+++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++-
+-+++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-+++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-+++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++-        
+-+++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++-        
+-+++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++-
+-+++-#         # Convert attention_mask for flash_attention_score
+-+++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-+++-#         if attention_mask is not None:
+-+++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-+++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+++-#                 raise ValueError(
+-+++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+++-#                 )
+-+++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-+++-#         else:
+-+++-#             attn_mask_for_fa = None
+-+++-        
+-+++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+++-
+-+++-#         # Call the fused flash_attention_score operator
+-+++-#         attn_output = mindspore.ops.flash_attention_score(
+-+++-#             query=query_states_for_fa,
+-+++-#             key=key_states_for_fa,
+-+++-#             value=value_states_for_fa,
+-+++-#             head_num=self.num_heads, # This is N1, the number of query heads
+-+++-#             input_layout='BSH',
+-+++-#             attn_mask=attn_mask_for_fa,
+-+++-#             keep_prob=keep_prob,
+-+++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++-#             sparse_mode=0 # Default mask mode
+-+++-#         )
+-+++-        
+-+++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-+++-#         attn_output = self.o_proj(attn_output)
+-+++-        
+-+++-#         # Flash Attention does not return attention weights
+-+++-#         attn_weights = None
+-+++-
+-+++-#         return attn_output, attn_weights, past_key_value
+-+++ 
+-+++ class DeepseekFlashAttention(nn.Module):
+-+++     """
+-+++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+-+++         super().__init__()
+-+++         self.hidden_size = config.hidden_size
+-+++ 
+-+++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-+++-            config=config, layer_idx=layer_idx
+-+++-        )
+-++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-++++            # config=config, layer_idx=layer_idx
+-++++        # )
+-+++ 
+-+++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-+++             config=config, layer_idx=layer_idx
+-+++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+-+++         return outputs
+-+++ 
+-+++ 
+-+++-
+-+++ class DeepseekPreTrainedModel(PreTrainedModel):
+-+++     config_class = DeepseekConfig
+-+++     base_model_prefix = "model"
+-+++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++         # Initialize weights and apply final processing
+-+++         self.post_init()
+-+++         self.warm_up = False
+-+++-        #@dwj
+-+++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-+++-            self.num_layers,
+-+++-            self.num_attention_heads,
+-+++-            self.head_dim,
+-+++-            batch_size=1,
+-+++-            max_length=self.max_length,
+-+++-            dtype=mindspore.float16
+-+++-        )
+-+++-
+-+++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-+++-        key_cache = []
+-+++-        value_cache = []
+-+++-        for _ in range(num_layers):
+-+++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++-            key_cache.append(k)
+-+++-            value_cache.append(v)
+-+++-        return key_cache, value_cache
+-+++-
+-+++ 
+-+++     def warmup_moe_model_deep(self):
+-+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++new file mode 100644
+-+++index 00000000..78f22642
+-+++--- /dev/null
+-++++++ b/patches/0001-20251104commit.patch
+-+++@@ -0,0 +1,1272 @@
+-++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++++Subject: [PATCH 1/3] 20251104commit
+-++++
+-++++---
+-++++ mindnlp/transformers/cache_utils.py           |  28 +-
+-++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-++++
+-++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++++index cadd2e04..02f8d4be 100644
+-++++--- a/mindnlp/transformers/cache_utils.py
+-+++++++ b/mindnlp/transformers/cache_utils.py
+-++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++++             # k_out[:, :, cache_position] = key_states
+-++++             # v_out[:, :, cache_position] = value_states
+-++++-            if ON_ORANGE_PI:
+-++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++-            else:
+-++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++-
+-+++++            # if ON_ORANGE_PI:
+-+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++            # else:
+-+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++++            if cache_position.ndim > 1:
+-+++++                cache_position = cache_position.flatten()
+-+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++++                cache_position = cache_position.int()
+-+++++            
+-+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++++            k_out[:, :, cache_position] = key_states
+-+++++            v_out[:, :, cache_position] = value_states
+-+++++            
+-++++         return k_out, v_out
+-++++ 
+-++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++index c695b944..d8303e45 100644
+-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++++ def rotate_half(x):
+-++++     """Rotates half the hidden dims of the input."""
+-++++-    x1 = x[..., : x.shape[-1] // 2]
+-++++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++    # x1 = x[..., : x.shape[-1] // 2]
+-+++++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++     return ops.cat((-x2, x1), dim=-1)
+-++++ 
+-++++ 
+-++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++++         if self.training:
+-++++             raise NotImplementedError("Training is not supported yet.")
+-++++         else:
+-++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++-        if self.config.n_shared_experts is not None:
+-++++-            y = y + self.shared_experts(identity)
+-++++-        return y
+-+++++            # @lwx
+-+++++            if orig_shape[1] == 1:
+-+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++++                y=y.view(*orig_shape)
+-+++++                if self.config.n_shared_experts is not None:
+-+++++                    y = y + self.shared_experts(identity)
+-+++++                return y
+-+++++            else:
+-+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++                if self.config.n_shared_experts is not None:
+-+++++                    y = y + self.shared_experts(identity)
+-+++++                return y
+-+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++        # if self.config.n_shared_experts is not None:
+-+++++        #     y = y + self.shared_experts(identity)
+-+++++        # return y
+-+++++        
+-+++++    @no_grad()
+-+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++
+-+++++        expert_cache = ops.zeros_like(x)
+-+++++        for i in range(self.num_experts_per_tok):
+-+++++            expert_id = flat_expert_indices[i].item()
+-+++++            weight = flat_expert_weights[i].item()
+-+++++            expert = self.experts[expert_id]
+-+++++            expert_out = expert(x)
+-+++++            expert_cache += expert_out * weight
+-+++++        return expert_cache
+-++++ 
+-++++     @no_grad()
+-++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-        # expert_cache = torch.zeros_like(x)
+-++++-        # idxs = flat_expert_indices.argsort()
+-++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++-        # token_idxs = idxs // self.num_experts_per_tok
+-++++-        # for i, end_idx in enumerate(tokens_per_expert):
+-++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++-        #     if start_idx == end_idx:
+-++++-        #         continue
+-++++-        #     expert = self.experts[i]
+-++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-        #     expert_tokens = x[exp_token_idx]
+-++++-        #     expert_out = expert(expert_tokens)
+-++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++-        # return expert_cache
+-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++         expert_cache = ops.zeros_like(x)
+-++++         idxs = flat_expert_indices.argsort()
+-++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++         token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-++++         for i, end_idx in enumerate(tokens_per_expert):
+-++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++             if start_idx == end_idx:
+-++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++++             expert_out = expert(expert_tokens)
+-++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++
+-++++         return expert_cache
+-+++++        
+-+++++    # @no_grad()
+-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++    #     # expert_cache = torch.zeros_like(x)
+-+++++    #     # idxs = flat_expert_indices.argsort()
+-+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++    #     #     if start_idx == end_idx:
+-+++++    #     #         continue
+-+++++    #     #     expert = self.experts[i]
+-+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #     #     expert_tokens = x[exp_token_idx]
+-+++++    #     #     expert_out = expert(expert_tokens)
+-+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++    #     # return expert_cache
+-+++++    #     expert_cache = ops.zeros_like(x)
+-+++++    #     idxs = flat_expert_indices.argsort()
+-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++    #         if start_idx == end_idx:
+-+++++    #             continue
+-+++++    #         expert = self.experts[i]
+-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #         expert_tokens = x[exp_token_idx]
+-+++++    #         expert_out = expert(expert_tokens)
+-+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++
+-+++++    #     return expert_cache
+-+++++    # @no_grad()
+-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++    #     expert_cache = ops.zeros_like(x)
+-+++++
+-+++++    #     # 排序保证顺序一致
+-+++++    #     idxs = flat_expert_indices.argsort()
+-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++    #     # 找出有 token 的专家
+-+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++
+-+++++    #     for i in active_experts.tolist():
+-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++    #         end_idx = tokens_per_expert[i]
+-+++++    #         if start_idx == end_idx:  # 没有 token
+-+++++    #             continue
+-+++++
+-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++    #         expert_tokens = x[exp_token_idx]
+-+++++    #         expert_out = self.experts[i](expert_tokens)
+-+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++
+-+++++    #         expert_cache = mindspore.mint.scatter_add(
+-+++++    #             expert_cache,
+-+++++    #             0,
+-+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++    #             expert_out
+-+++++    #         )
+-+++++
+-+++++    #     return expert_cache
+-+++++
+-+++++
+-++++ 
+-++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++++ #     """
+-++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++ 
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-+++++        self.warm_up = False
+-+++++
+-+++++    def warmup_moe_model_deep(self):
+-+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++++        test_texts = [
+-+++++            "warmup short",
+-+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++++        ]
+-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++        if tokenizer is None:
+-+++++            from mindnlp.transformers import AutoTokenizer
+-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++            self._warmup_tokenizer = tokenizer
+-+++++
+-+++++        for text in test_texts:
+-+++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++            with mindspore._no_grad():
+-+++++                _ = self(**inputs, use_cache=False)
+-+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++++ 
+-++++     def get_input_embeddings(self):
+-++++         return self.model.embed_tokens
+-++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++         ```"""
+-+++++        if not self.warm_up:
+-+++++            self.warm_up = True
+-+++++            self.warmup_moe_model_deep()
+-+++++
+-++++         output_attentions = (
+-++++             output_attentions
+-++++             if output_attentions is not None
+-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++index 3cbf820e..d4c6b651 100644
+-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++@@ -18,7 +18,6 @@
+-++++ # See the License for the specific language governing permissions and
+-++++ # limitations under the License.
+-++++ """MindSpore Qwen2MoE model."""
+-++++-
+-++++ import math
+-++++ from typing import List, Optional, Tuple, Union
+-++++ 
+-++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++++     TokenClassifierOutput,
+-++++ )
+-++++ from ...modeling_utils import PreTrainedModel
+-+++++from ...generation import GenerationMixin
+-++++ from ....utils import logging
+-++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-++++ 
+-++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++++         self.variance_epsilon = eps
+-++++ 
+-++++     def forward(self, hidden_states):
+-+++++        # @dwj
+-+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++        # @lwx
+-+++++        # if not self.training :
+-+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++         input_dtype = hidden_states.dtype
+-++++         hidden_states = hidden_states.to(mindspore.float32)
+-++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++++@@ -234,6 +239,8 @@ def rotate_half(x):
+-++++     """Rotates half the hidden dims of the input."""
+-++++     x1 = x[..., : x.shape[-1] // 2]
+-++++     x2 = x[..., x.shape[-1] // 2 :]
+-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++     return ops.cat((-x2, x1), dim=-1)
+-++++ 
+-++++ 
+-++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++++         self.config = config
+-++++         self.hidden_size = config.hidden_size
+-++++         self.intermediate_size = intermediate_size
+-+++++        
+-++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++++         self.act_fn = ACT2FN[config.hidden_act]
+-++++ 
+-++++     def forward(self, x):
+-++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++-
+-++++ 
+-+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++        # @lwx 
+-+++++        # gate_up_output = self.gate_up_proj(x)
+-+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++++        # return self.down_proj(swiglu_output)
+-+++++
+-+++++    # def forward(self, x):
+-+++++    #     gate_proj_out = self.gate_proj(x)
+-+++++    #     up_proj_out = self.up_proj(x)
+-+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++++    #     return self.down_proj(swiglu_out)
+-+++++        
+-++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++++     """
+-++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++++         use_cache: bool = False,
+-++++         cache_position: Optional[mindspore.Tensor] = None,
+-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++        
+-+++++
+-++++         bsz, q_len, _ = hidden_states.shape
+-++++ 
+-++++         query_states = self.q_proj(hidden_states)
+-++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++                     "with a layer index."
+-++++                 )
+-++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++            if isinstance(past_key_value, StaticCache):
+-+++++                kv_seq_len = key_states.shape[-2]
+-+++++            else:
+-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++ 
+-++++         if past_key_value is not None:
+-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++            
+-+++++            if isinstance(past_key_value, StaticCache):
+-+++++                kv_seq_len = key_states.shape[-2]
+-++++ 
+-++++         # repeat k/v heads if n_kv_heads < n_heads
+-++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++-
+-+++++        
+-++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++ 
+-++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++++-            raise ValueError(
+-++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++++-                f" {attn_weights.shape}"
+-++++-            )
+-++++-
+-++++-        if attention_mask is not None:  # no matter the length, we just slice it
+-++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++++        if attention_mask is not None:
+-+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++             attn_weights = attn_weights + causal_mask
+-++++ 
+-++++         # upcast attention to fp32
+-++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++ 
+-++++         attn_output = self.o_proj(attn_output)
+-++++-
+-+++++        # @lwx
+-+++++        
+-+++++        # max_seq_len = self.max_position_embeddings  # 2048
+-+++++
+-+++++        # if attention_mask is not None:
+-+++++        #     # attention_mask: [B, 1, Sq, Sk]
+-+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++
+-+++++        #     # pad 到 [max_seq_len, max_seq_len]
+-+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++        #     global_attention_mask = padded_mask
+-+++++        # else:
+-+++++        #     global_attention_mask = None
+-+++++
+-+++++
+-+++++        # sparse_mode=3
+-+++++        # attn_output = mindspore.ops.flash_attention_score(
+-+++++        #     query=query_states,
+-+++++        #     key=key_states,
+-+++++        #     value=value_states,
+-+++++        #     real_shift=None,
+-+++++        #     padding_mask=None,
+-+++++
+-+++++        #     head_num=self.num_heads,
+-+++++        #     attn_mask=global_attention_mask,
+-+++++        #     keep_prob=1.0 - self.attention_dropout,
+-+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++        #     input_layout="BNSD",
+-+++++        #     pre_tokens=2147483647,
+-+++++        #     next_tokens=2147483647,
+-+++++        #     inner_precise=0,
+-+++++        #     drop_mask=None,
+-+++++        #     prefix=None,
+-+++++        #     actual_seq_qlen=None,
+-+++++        #     actual_seq_kvlen=None,
+-+++++        #     sparse_mode=sparse_mode,
+-+++++        # )
+-++++         if not output_attentions:
+-++++             attn_weights = None
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++ 
+-+++++class Qwen2MoeFlashAttention(nn.Module):
+-+++++    """
+-+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++
+-+++++    关键改动:
+-+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++       直接传入原始的 key 和 value 张量效率更高。
+-+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++    """
+-+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++        super().__init__()
+-+++++        self.config = config
+-+++++        self.layer_idx = layer_idx
+-+++++        self.hidden_size = config.hidden_size
+-+++++        self.num_heads = config.num_attention_heads
+-+++++        self.head_dim = self.hidden_size // self.num_heads
+-+++++        self.num_key_value_heads = config.num_key_value_heads
+-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++        self.max_position_embeddings = config.max_position_embeddings
+-+++++        self.rope_theta = config.rope_theta
+-+++++        self.attention_dropout = config.attention_dropout
+-+++++
+-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++            raise ValueError(
+-+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++            )
+-+++++
+-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++
+-+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++            self.head_dim,
+-+++++            max_position_embeddings=self.max_position_embeddings,
+-+++++            base=self.rope_theta,
+-+++++        )
+-+++++
+-+++++    def forward(
+-+++++        self,
+-+++++        hidden_states: mindspore.Tensor,
+-+++++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++++        position_ids: Optional[mindspore.Tensor] = None,
+-+++++        past_key_value: Optional[Cache] = None,
+-+++++        output_attentions: bool = False,
+-+++++        use_cache: bool = False,
+-+++++        cache_position: Optional[mindspore.Tensor] = None,
+-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++        bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++        # 1. 线性投射 Q, K, V
+-+++++        query_states = self.q_proj(hidden_states)
+-+++++        key_states = self.k_proj(hidden_states)
+-+++++        value_states = self.v_proj(hidden_states)
+-+++++
+-+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++        # 3. RoPE 旋转位置编码
+-+++++        kv_seq_len = key_states.shape[-2]
+-+++++        if past_key_value is not None:
+-+++++            if self.layer_idx is None:
+-+++++                raise ValueError(
+-+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++                    "with a layer index."
+-+++++                )
+-+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++                if cache_position.shape[0] == 1:
+-+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++                    kv_seq_len = past_seen_tokens + 1
+-+++++                else:
+-+++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++            else:
+-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++        # 4. KV 缓存更新
+-+++++        if past_key_value is not None:
+-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++            key_states, value_states = past_key_value.update(
+-+++++                key_states, value_states, self.layer_idx, cache_kwargs
+-+++++            )
+-+++++            
+-+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++                if cache_position.shape[0] == 1:
+-+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++                    kv_seq_len = key_states.shape[-2]
+-+++++
+-+++++        # 5. [重要] 准备 Attention Mask
+-+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++        fa_attention_mask = None
+-+++++        if attention_mask is not None:
+-+++++            # 截取与当前key长度匹配的部分
+-+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++            fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++        input_dtype = query_states.dtype
+-+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++            query_states = query_states.to(mindspore.float16)
+-+++++            key_states = key_states.to(mindspore.float16)
+-+++++            value_states = value_states.to(mindspore.float16)
+-+++++
+-+++++        # 6. [核心] 调用 flash_attention_score 算子
+-+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++        attn_output = mindspore.ops.flash_attention_score(
+-+++++            query=query_states,
+-+++++            key=key_states,
+-+++++            value=value_states,
+-+++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++            attn_mask=fa_attention_mask,
+-+++++            keep_prob=1.0 - self.attention_dropout,
+-+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++            input_layout="BNSD",
+-+++++            sparse_mode=0 # 使用 defaultMask 模式
+-+++++        )
+-+++++
+-+++++        # 恢复原始数据类型
+-+++++        attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++        # 7. 调整输出形状
+-+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++        attn_output = self.o_proj(attn_output)
+-+++++
+-+++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++        attn_weights = None
+-+++++        if output_attentions:
+-+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++        return attn_output, attn_weights, past_key_value
+-+++++
+-+++++    # def forward(
+-+++++    #     self,
+-+++++    #     hidden_states: mindspore.Tensor,
+-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++    #     past_key_value: Optional[Cache] = None,
+-+++++    #     output_attentions: bool = False,
+-+++++    #     use_cache: bool = False,
+-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++    #     # 1. 线性投射 Q, K, V
+-+++++    #     query_states = self.q_proj(hidden_states)
+-+++++    #     key_states = self.k_proj(hidden_states)
+-+++++    #     value_states = self.v_proj(hidden_states)
+-+++++
+-+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++    #     # 3. RoPE 旋转位置编码
+-+++++    #     kv_seq_len = key_states.shape[-2]
+-+++++    #     if past_key_value is not None:
+-+++++    #         if self.layer_idx is None:
+-+++++    #             raise ValueError(
+-+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++    #                 "with a layer index."
+-+++++    #             )
+-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++    #     # 4. KV 缓存更新
+-+++++    #     if past_key_value is not None:
+-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++    #         key_states, value_states = past_key_value.update(
+-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++    #         )
+-+++++
+-+++++    #     # 5. 准备 Attention Mask
+-+++++    #     fa_attention_mask = None
+-+++++    #     if attention_mask is not None:
+-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++    #     input_dtype = query_states.dtype
+-+++++
+-+++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++    #         query=query_states,
+-+++++    #         key=key_states,
+-+++++    #         value=value_states,
+-+++++    #         head_num=self.num_heads,
+-+++++    #         attn_mask=fa_attention_mask,
+-+++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++    #         input_layout="BNSD",
+-+++++    #         sparse_mode=0,
+-+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++    #         inner_precise=1
+-+++++    #     )
+-+++++
+-+++++    #     # 恢复原始数据类型
+-+++++    #     attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++    #     # 7. 调整输出形状
+-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++    #     attn_output = self.o_proj(attn_output)
+-+++++
+-+++++    #     attn_weights = None
+-+++++    #     if output_attentions:
+-+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++    #     return attn_output, attn_weights, past_key_value
+-+++++
+-+++++    # def forward(
+-+++++    #     self,
+-+++++    #     hidden_states: mindspore.Tensor,
+-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++    #     past_key_value: Optional[Cache] = None,
+-+++++    #     output_attentions: bool = False,
+-+++++    #     use_cache: bool = False,
+-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++    #     query_states = self.q_proj(hidden_states)
+-+++++    #     key_states = self.k_proj(hidden_states)
+-+++++    #     value_states = self.v_proj(hidden_states)
+-+++++
+-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++    #     kv_seq_len = key_states.shape[-2]
+-+++++    #     if past_key_value is not None:
+-+++++    #         if self.layer_idx is None:
+-+++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++    #     if past_key_value is not None:
+-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++    #         key_states, value_states = past_key_value.update(
+-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++    #         )
+-+++++
+-+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++
+-+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++++    #     # <--- 修改结束 ---
+-+++++
+-+++++    #     fa_attention_mask = None
+-+++++    #     if attention_mask is not None:
+-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++    #     input_dtype = query_states.dtype
+-+++++
+-+++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++++    #         key=key_states,
+-+++++    #         value=value_states,
+-+++++    #         head_num=self.num_heads,
+-+++++    #         attn_mask=fa_attention_mask,
+-+++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++++    #         input_layout="BNSD",
+-+++++    #         sparse_mode=0,
+-+++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++++    #     )
+-+++++
+-+++++    #     attn_output = attn_output.to(input_dtype)
+-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++    #     attn_output = self.o_proj(attn_output)
+-+++++
+-+++++    #     attn_weights = None
+-+++++    #     if output_attentions:
+-+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++++
+-+++++    #     return attn_output, attn_weights, past_key_value
+-+++++
+-++++ QWEN2MOE_ATTENTION_CLASSES = {
+-++++     "eager": Qwen2MoeAttention,
+-+++++    "flash-attention": Qwen2MoeFlashAttention,
+-++++ }
+-++++ 
+-++++ 
+-++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++ 
+-+++++    #@dwj
+-+++++    # 只遍历激活的专家，而非全部专家
+-++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-++++-        # router_logits: (batch * sequence_length, n_experts)
+-++++-        router_logits = self.gate(hidden_states)
+-++++-
+-++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-        if self.norm_topk_prob:
+-++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        # we cast back to the input dtype
+-++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-
+-++++-        final_hidden_states = ops.zeros(
+-++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++++-        )
+-++++-
+-++++-        # One hot encode the selected experts to create an expert mask
+-++++-        # this will be used to easily index which expert is going to be sollicitated
+-++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++++-
+-++++-        # Loop over all available experts in the model and perform the computation on each expert
+-++++-        for expert_idx in range(self.num_experts):
+-++++-            expert_layer = self.experts[expert_idx]
+-++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++++-
+-++++-            # Index the correct hidden states and compute the expert hidden state for
+-++++-            # the current expert. We need to make sure to multiply the output hidden
+-++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++++-            if 0 not in idx.shape:
+-++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++++-
+-++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-++++-                # the `top_x` tensor here.
+-++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++++-
+-++++-        shared_expert_output = self.shared_expert(hidden_states)
+-++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++++-
+-++++-        final_hidden_states = final_hidden_states + shared_expert_output
+-+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++            num_tokens = hidden_states_reshaped.shape[0]
+-+++++
+-+++++            router_logits = self.gate(hidden_states_reshaped)
+-+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++
+-+++++            if self.norm_topk_prob:
+-+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++            
+-+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++            flat_selected_experts = selected_experts.flatten()
+-+++++            
+-+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++            token_indices = broadcasted_token_indices.flatten()
+-+++++            
+-+++++            active_experts = ops.unique(flat_selected_experts)
+-+++++            
+-+++++            for expert_idx_tensor in active_experts:
+-+++++                expert_idx = expert_idx_tensor.item()
+-+++++                expert_layer = self.experts[expert_idx]
+-+++++                
+-+++++                mask = (flat_selected_experts == expert_idx_tensor)
+-+++++                selected_token_indices = token_indices[mask]
+-+++++                selected_routing_weights = routing_weights.flatten()[mask]
+-+++++                
+-+++++                current_states = hidden_states_reshaped[selected_token_indices]
+-+++++                
+-+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++                
+-+++++                final_hidden_states = final_hidden_states.index_add(
+-+++++                    dim=0,
+-+++++                    index=selected_token_indices,
+-+++++                    source=expert_output.to(hidden_states.dtype)
+-+++++                )
+-+++++            
+-+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++ 
+-++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++-        return final_hidden_states, router_logits
+-+++++            final_hidden_states = final_hidden_states + shared_expert_output
+-+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++            
+-+++++            return final_hidden_states, router_logits
+-++++ 
+-++++ 
+-++++ class Qwen2MoeDecoderLayer(nn.Module):
+-++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++++ 
+-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++ 
+-+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++
+-++++         if (layer_idx not in config.mlp_only_layers) and (
+-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++++         ):
+-++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++++     _skip_keys_device_placement = "past_key_values"
+-++++     _supports_cache_class = True
+-+++++#lwx
+-+++++    # _supports_static_cache = True
+-++++ 
+-++++     def _init_weights(self, module):
+-++++         std = self.config.initializer_range
+-++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++++         return causal_mask
+-++++ 
+-++++ 
+-++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++     _tied_weights_keys = ["lm_head.weight"]
+-++++ 
+-++++     def __init__(self, config):
+-++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++         self.num_experts_per_tok = config.num_experts_per_tok
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-+++++        # @lwx
+-+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++++        #     self.generation_config.cache_implementation = "static"
+-+++++        self._warmed_up = False
+-+++++
+-+++++    def warmup_moe_model(self):
+-+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++++        test_texts = [
+-+++++            "warmup short",
+-+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++++        ]
+-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++        if tokenizer is None:
+-+++++            from mindnlp.transformers import AutoTokenizer
+-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++            self._warmup_tokenizer = tokenizer
+-+++++
+-+++++        for text in test_texts:
+-+++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++            with mindspore._no_grad():
+-+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++++ 
+-++++     def get_input_embeddings(self):
+-++++         return self.model.embed_tokens
+-++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++         ```"""
+-+++++        if not self._warmed_up:
+-+++++            self._warmed_up = True
+-+++++            self.warmup_moe_model()
+-++++ 
+-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++++         output_router_logits = (
+-++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++             }
+-++++         )
+-++++         return model_inputs
+-+++++# @lwx
+-+++++    # def _decode_one_tokens_logits(
+-+++++    #     self,
+-+++++    #     cur_token: mindspore.Tensor,
+-+++++    #     input_pos: Optional[mindspore.Tensor],
+-+++++    #     cache_position: mindspore.Tensor,
+-+++++    #     past_key_values: StaticCache,
+-+++++    # ) -> mindspore.Tensor:
+-+++++    #     """
+-+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++++        
+-+++++    #     Args:
+-+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++++    #         input_pos: 输入位置信息，可选
+-+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++++            
+-+++++    #     Returns:
+-+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++++    #     """
+-+++++    #     # 调用JIT编译的版本
+-+++++    #     return self.get_decode_one_tokens_logits(
+-+++++    #         cur_token=cur_token,
+-+++++    #         input_pos=input_pos,
+-+++++    #         cache_position=cache_position,
+-+++++    #         past_key_values=past_key_values,
+-+++++    #     )
+-+++++    
+-+++++    # @mindspore.jit(jit_level='O1')
+-+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++++    #     """
+-+++++    #     JIT编译的函数，用于高效的单token解码
+-+++++    #     使用JIT编译优化以支持静态shape和高效执行
+-+++++        
+-+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++++    #     """
+-+++++    #     outputs = self.model.forward(
+-+++++    #         input_ids=cur_token,
+-+++++    #         position_ids=input_pos,
+-+++++    #         cache_position=cache_position,
+-+++++    #         past_key_values=past_key_values,
+-+++++    #         use_cache=True,
+-+++++    #         return_dict=False,
+-+++++    #     )
+-+++++        
+-+++++    #     hidden_states = outputs[0]
+-+++++    #     logits = self.lm_head.forward(hidden_states)
+-+++++    #     logits = logits.float()
+-+++++        
+-+++++    #     return logits[:, -1, :]
+-+++++
+-+++++    # def _sample(
+-+++++    #     self,
+-+++++    #     input_ids: mindspore.Tensor,
+-+++++    #     logits_processor,
+-+++++    #     stopping_criteria,
+-+++++    #     generation_config,
+-+++++    #     synced_devices: bool,
+-+++++    #     streamer=None,
+-+++++    #     logits_warper=None,
+-+++++    #     **model_kwargs,
+-+++++    # ):
+-+++++    #     """
+-+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++++    #     """
+-+++++    #     from ...generation.logits_process import LogitsProcessorList
+-+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++++    #     from mindnlp.core import nn, ops, no_grad
+-+++++    #     import numpy as np
+-+++++        
+-+++++    #     # 检查是否使用 StaticCache
+-+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++++    #     # 否则，直接调用父类方法
+-+++++    #     past_key_values = model_kwargs.get("past_key_values")
+-+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++++
+-+++++    #     if not isinstance(past_key_values, StaticCache):
+-+++++    #         # 不使用 StaticCache，直接调用父类方法
+-+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++++    #         return super()._sample(
+-+++++    #             input_ids=input_ids,
+-+++++    #             logits_processor=logits_processor,
+-+++++    #             stopping_criteria=stopping_criteria,
+-+++++    #             generation_config=generation_config,
+-+++++    #             synced_devices=synced_devices,
+-+++++    #             streamer=streamer,
+-+++++    #             logits_warper=logits_warper,
+-+++++    #             **model_kwargs,
+-+++++    #         )
+-+++++        
+-+++++    #     # 使用 StaticCache，进入自定义循环
+-+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++++    #     pad_token_id = generation_config._pad_token_tensor
+-+++++    #     output_attentions = generation_config.output_attentions
+-+++++    #     output_hidden_states = generation_config.output_hidden_states
+-+++++    #     output_scores = generation_config.output_scores
+-+++++    #     output_logits = generation_config.output_logits
+-+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++++    #     max_length = generation_config.max_length
+-+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++++    #     do_sample = generation_config.do_sample
+-+++++        
+-+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++++    #         raise ValueError(
+-+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++++    #             f"{logits_warper})."
+-+++++    #         )
+-+++++        
+-+++++    #     # init attention / hidden states / scores tuples
+-+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++++        
+-+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++++    #         encoder_hidden_states = (
+-+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++++    #         )
+-+++++        
+-+++++    #     # keep track of which sequences are already finished
+-+++++    #     batch_size, cur_len = input_ids.shape
+-+++++    #     this_peer_finished = False
+-+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++++        
+-+++++    #     time_record = []
+-+++++    #     from ....utils.testing_utils import parse_flag_from_env
+-+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++++        
+-+++++    #     while self._has_unfinished_sequences(
+-+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++++    #     ):
+-+++++    #         if _record_time:
+-+++++    #             import time as time_module
+-+++++    #             infer_start = time_module.time()
+-+++++            
+-+++++    #         # prepare model inputs
+-+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++++            
+-+++++    #         # prepare variable output controls
+-+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++++            
+-+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++++    #         cur_cache_position = model_inputs.get("cache_position")
+-+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++++    #         cur_input_ids = model_inputs.get("input_ids")
+-+++++            
+-+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++++    #             cur_cache_position is not None and 
+-+++++    #             len(cur_cache_position.shape) > 0 and
+-+++++    #             cur_cache_position.shape[0] == 1 and
+-+++++    #             cur_input_ids is not None and
+-+++++    #             cur_input_ids.shape[1] == 1):
+-+++++    #             # 使用 JIT 优化的单 token 解码
+-+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++++    #             if not hasattr(self, '_jit_used'):
+-+++++    #                 self._jit_used = False
+-+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++++                
+-+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++++    #                 cur_token=cur_input_ids,
+-+++++    #                 input_pos=model_inputs.get("position_ids"),
+-+++++    #                 cache_position=cur_cache_position,
+-+++++    #                 past_key_values=cur_past_key_values,
+-+++++    #             )
+-+++++                
+-+++++    #             # 标记已使用JIT（用于后续判断）
+-+++++    #             if not self._jit_used:
+-+++++    #                 self._jit_used = True
+-+++++                
+-+++++    #             # 构造兼容的输出对象
+-+++++    #             class JitOptimizedOutput:
+-+++++    #                 def __init__(self, logits, config):
+-+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++++    #                     self.config = config
+-+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++++    #                     self.cross_attentions = None
+-+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++++                
+-+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++++    #         else:
+-+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++++    #             outputs = self(**model_inputs, return_dict=True)
+-+++++            
+-+++++    #         if synced_devices and this_peer_finished:
+-+++++    #             continue
+-+++++            
+-+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++++    #         next_token_logits = outputs.logits[:, -1, :]
+-+++++            
+-+++++    #         # pre-process distribution
+-+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++++    #         if do_sample:
+-+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++++            
+-+++++    #         # Store scores, attentions and hidden_states when required
+-+++++    #         if return_dict_in_generate:
+-+++++    #             if output_scores:
+-+++++    #                 scores += (next_token_scores,)
+-+++++    #             if output_logits:
+-+++++    #                 raw_logits += (next_token_logits,)
+-+++++    #             if output_attentions:
+-+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++++    #                 if self.config.is_encoder_decoder:
+-+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++++                
+-+++++    #             if output_hidden_states:
+-+++++    #                 hidden = (
+-+++++    #                     outputs.decoder_hidden_states
+-+++++    #                     if self.config.is_encoder_decoder
+-+++++    #                     else outputs.hidden_states
+-+++++    #                 )
+-+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++++            
+-+++++    #         # token selection
+-+++++    #         if do_sample:
+-+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++++    #         else:
+-+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++++            
+-+++++    #         # finished sentences should have their next token be a padding token
+-+++++    #         if has_eos_stopping_criteria:
+-+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++++            
+-+++++    #         # update generated ids, model inputs, and length for next step
+-+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++++    #         if streamer is not None:
+-+++++    #             streamer.put(next_tokens)
+-+++++            
+-+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++++    #             outputs,
+-+++++    #             model_kwargs,
+-+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++++    #         )
+-+++++            
+-+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++++    #         cur_len += 1
+-+++++            
+-+++++    #         if _record_time:
+-+++++    #             import time as time_module
+-+++++    #             infer_stop = time_module.time()
+-+++++    #             time_record.append(infer_stop - infer_start)
+-+++++            
+-+++++    #         del outputs
+-+++++        
+-+++++    #     average_infer_time = None
+-+++++    #     if time_record:
+-+++++    #         if len(time_record) > 1:
+-+++++    #             time_record.pop(0)
+-+++++    #         average_infer_time = sum(time_record) / len(time_record)
+-+++++    #         print(f'average inference time is: {average_infer_time}')
+-+++++    #         print(f'inference time record: {time_record}')
+-+++++        
+-+++++    #     if streamer is not None:
+-+++++    #         streamer.end()
+-+++++        
+-+++++    #     # 简单判断：打印是否使用了JIT路径
+-+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++++    #     else:
+-+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++++        
+-+++++    #     if return_dict_in_generate:
+-+++++    #         if self.config.is_encoder_decoder:
+-+++++    #             return GenerateEncoderDecoderOutput(
+-+++++    #                 sequences=input_ids,
+-+++++    #                 scores=scores,
+-+++++    #                 logits=raw_logits,
+-+++++    #                 encoder_attentions=encoder_attentions,
+-+++++    #                 encoder_hidden_states=encoder_hidden_states,
+-+++++    #                 decoder_attentions=decoder_attentions,
+-+++++    #                 cross_attentions=cross_attentions,
+-+++++    #                 decoder_hidden_states=decoder_hidden_states,
+-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++    #                 average_infer_time=average_infer_time
+-+++++    #             )
+-+++++    #         else:
+-+++++    #             return GenerateDecoderOnlyOutput(
+-+++++    #                 sequences=input_ids,
+-+++++    #                 scores=scores,
+-+++++    #                 logits=raw_logits,
+-+++++    #                 attentions=decoder_attentions,
+-+++++    #                 hidden_states=decoder_hidden_states,
+-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++    #                 average_infer_time=average_infer_time
+-+++++    #             )
+-+++++    #     else:
+-+++++    #         return input_ids
+-+++++            
+-+++++    # def _prepare_cache_for_generation(
+-+++++    #     self,
+-+++++    #     generation_config,
+-+++++    #     model_kwargs,
+-+++++    #     assistant_model,
+-+++++    #     batch_size,
+-+++++    #     max_cache_length,
+-+++++    # ):
+-+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++++    #         generation_config.cache_implementation = "static"
+-+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++++        
+-+++++    #     if generation_config.cache_implementation == "static":
+-+++++    #         base_required_from_max_length = generation_config.max_length + 1
+-+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++++    #         min_cache_size = 50
+-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++++    #         else:
+-+++++    #             max_cache_length = max(base_required, min_cache_size)
+-+++++            
+-+++++    #         original_max_cache_length = max_cache_length
+-+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++++            
+-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++    #             if max_cache_length > self.config.max_position_embeddings:
+-+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++++        
+-+++++    #     result = super()._prepare_cache_for_generation(
+-+++++    #         generation_config=generation_config,
+-+++++    #         model_kwargs=model_kwargs,
+-+++++    #         assistant_model=assistant_model,
+-+++++    #         batch_size=batch_size,
+-+++++    #         max_cache_length=max_cache_length,
+-+++++    #     )
+-+++++        
+-+++++    #     if generation_config.cache_implementation == "static":
+-+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++++    #         created_cache = model_kwargs.get(cache_name)
+-+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++++        
+-+++++    #     return result
+-+++++
+-+++++
+-+++++
+-++++ 
+-++++ 
+-++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++++-- 
+-++++2.27.0
+-++++
+-+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+++new file mode 100644
+-+++index 00000000..22b65dd5
+-+++--- /dev/null
+-++++++ b/patches/0002-20251106commit.patch
+-+++@@ -0,0 +1,3200 @@
+-++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++Date: Thu, 6 Nov 2025 09:20:38 +0800
+-++++Subject: [PATCH 2/3] 20251106commit
+-++++
+-++++---
+-++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+-++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+-++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+-++++ create mode 100644 patches/0001-20251104commit.patch
+-++++
+-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++index d8303e45..73773c22 100644
+-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-++++         #     y = y + self.shared_experts(identity)
+-++++         # return y
+-++++         
+-+++++    # @no_grad()
+-+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++
+-+++++    #     expert_cache = ops.zeros_like(x)
+-+++++    #     for i in range(self.num_experts_per_tok):
+-+++++    #         expert_id = flat_expert_indices[i].item()
+-+++++    #         weight = flat_expert_weights[i].item()
+-+++++    #         expert = self.experts[expert_id]
+-+++++    #         expert_out = expert(x)
+-+++++    #         expert_cache += expert_out * weight
+-+++++    #     return expert_cache
+-+++++
+-++++     @no_grad()
+-++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++        # x 的 shape: (1, hidden_size)
+-+++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+++++
+-+++++        # 1. 收集所有需要的专家层
+-+++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+++++
+-+++++        # 2. 并行计算所有专家的输出
+-+++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+++++
+-+++++        # 3. 使用矩阵乘法进行加权求和
+-+++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+++++        
+-+++++        return final_output
+-++++ 
+-++++-        expert_cache = ops.zeros_like(x)
+-++++-        for i in range(self.num_experts_per_tok):
+-++++-            expert_id = flat_expert_indices[i].item()
+-++++-            weight = flat_expert_weights[i].item()
+-++++-            expert = self.experts[expert_id]
+-++++-            expert_out = expert(x)
+-++++-            expert_cache += expert_out * weight
+-++++-        return expert_cache
+-++++ 
+-++++     @no_grad()
+-++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-++++             key_states = self.k_proj(hidden_states)
+-++++             value_states = self.v_proj(hidden_states)
+-++++ 
+-++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++        # @lwx
+-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-+++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-+++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-++++ 
+-++++         kv_seq_len = key_states.shape[-2]
+-++++         if past_key_value is not None:
+-++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++ 
+-+++++# class DeepseekFlashAttention(nn.Module):
+-+++++#     """
+-+++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-+++++
+-+++++#     This class is designed as a drop-in replacement for DeepseekAttention.
+-+++++#     """
+-+++++
+-+++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+++++#         super().__init__()
+-+++++#         self.config = config
+-+++++#         self.layer_idx = layer_idx
+-+++++#         if layer_idx is None:
+-+++++#             logger.warning(
+-+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++++#                 "when creating this class."
+-+++++#             )
+-+++++
+-+++++#         self.attention_dropout = config.attention_dropout
+-+++++#         self.hidden_size = config.hidden_size
+-+++++#         self.num_heads = config.num_attention_heads
+-+++++#         self.head_dim = self.hidden_size // self.num_heads
+-+++++#         self.num_key_value_heads = config.num_key_value_heads
+-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++#         self.max_position_embeddings = config.max_position_embeddings
+-+++++#         self.rope_theta = config.rope_theta
+-+++++#         self.is_causal = True
+-+++++
+-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++#             raise ValueError(
+-+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++++#                 f" and `num_heads`: {self.num_heads})."
+-+++++#             )
+-+++++
+-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+++++#         self._init_rope()
+-+++++
+-+++++#     def _init_rope(self):
+-+++++#         if self.config.rope_scaling is None:
+-+++++#             self.rotary_emb = DeepseekRotaryEmbedding(
+-+++++#                 self.head_dim,
+-+++++#                 max_position_embeddings=self.max_position_embeddings,
+-+++++#                 base=self.rope_theta,
+-+++++#             )
+-+++++#         else:
+-+++++#             scaling_type = self.config.rope_scaling["type"]
+-+++++#             scaling_factor = self.config.rope_scaling["factor"]
+-+++++#             if scaling_type == "linear":
+-+++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+++++#                     self.head_dim,
+-+++++#                     max_position_embeddings=self.max_position_embeddings,
+-+++++#                     scaling_factor=scaling_factor,
+-+++++#                     base=self.rope_theta,
+-+++++#                 )
+-+++++#             elif scaling_type == "dynamic":
+-+++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+++++#                     self.head_dim,
+-+++++#                     max_position_embeddings=self.max_position_embeddings,
+-+++++#                     scaling_factor=scaling_factor,
+-+++++#                     base=self.rope_theta,
+-+++++#                 )
+-+++++#             else:
+-+++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+++++
+-+++++#     def forward(
+-+++++#         self,
+-+++++#         hidden_states: mindspore.Tensor,
+-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++++#         past_key_value: Optional[Cache] = None,
+-+++++#         output_attentions: bool = False,
+-+++++#         use_cache: bool = False,
+-+++++#         **kwargs,
+-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++#         if "padding_mask" in kwargs:
+-+++++#             warnings.warn(
+-+++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+++++#             )
+-+++++        
+-+++++#         if output_attentions:
+-+++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-+++++
+-+++++#         bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++#         if self.config.pretraining_tp > 1:
+-+++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+++++
+-+++++#         query_states = self.q_proj(hidden_states)
+-+++++#         key_states = self.k_proj(hidden_states)
+-+++++#         value_states = self.v_proj(hidden_states)
+-+++++
+-+++++#         # Reshape for multi-head attention
+-+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++#         kv_seq_len = key_states.shape[-2]
+-+++++#         if past_key_value is not None:
+-+++++#             if self.layer_idx is None:
+-+++++#                 raise ValueError(
+-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++#                     "with a layer index."
+-+++++#                 )
+-+++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++        
+-+++++#         # Apply Rotary Positional Embedding
+-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++#         if past_key_value is not None:
+-+++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++
+-+++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-+++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-+++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++        
+-+++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++++        
+-+++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-+++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-+++++
+-+++++#         # Convert attention_mask for flash_attention_score
+-+++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-+++++#         if attention_mask is not None:
+-+++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-+++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+++++#                 raise ValueError(
+-+++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+++++#                 )
+-+++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-+++++#         else:
+-+++++#             attn_mask_for_fa = None
+-+++++        
+-+++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+++++
+-+++++#         # Call the fused flash_attention_score operator
+-+++++#         attn_output = mindspore.ops.flash_attention_score(
+-+++++#             query=query_states_for_fa,
+-+++++#             key=key_states_for_fa,
+-+++++#             value=value_states_for_fa,
+-+++++#             head_num=self.num_heads, # This is N1, the number of query heads
+-+++++#             input_layout='BSH',
+-+++++#             attn_mask=attn_mask_for_fa,
+-+++++#             keep_prob=keep_prob,
+-+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++#             sparse_mode=0 # Default mask mode
+-+++++#         )
+-+++++        
+-+++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-+++++#         attn_output = self.o_proj(attn_output)
+-+++++        
+-+++++#         # Flash Attention does not return attention weights
+-+++++#         attn_weights = None
+-+++++
+-+++++#         return attn_output, attn_weights, past_key_value
+-+++++
+-+++++class DeepseekFlashAttention(nn.Module):
+-+++++    """
+-+++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-+++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-+++++    designed for high performance on supported hardware (Ascend).
+-+++++
+-+++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-+++++    """
+-+++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-+++++        super().__init__()
+-+++++        self.config = config
+-+++++        self.layer_idx = layer_idx
+-+++++        if layer_idx is None:
+-+++++            logger.warning(
+-+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++++                "when creating this class."
+-+++++            )
+-+++++
+-+++++        # --- [FIX] Correctly initialize all required attributes ---
+-+++++        self.attention_dropout = config.attention_dropout
+-+++++        self.hidden_size = config.hidden_size
+-+++++        self.num_heads = config.num_attention_heads
+-+++++        self.head_dim = self.hidden_size // self.num_heads
+-+++++        self.num_key_value_heads = config.num_key_value_heads
+-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++        self.max_position_embeddings = config.max_position_embeddings
+-+++++        self.rope_theta = config.rope_theta
+-+++++        self.is_causal = True
+-+++++
+-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++            raise ValueError(
+-+++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++++                f" and `num_heads`: {self.num_heads})."
+-+++++            )
+-+++++
+-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-+++++        
+-+++++        # This call will now succeed as all attributes are initialized.
+-+++++        self._init_rope()
+-+++++
+-+++++    def _init_rope(self):
+-+++++        if self.config.rope_scaling is None:
+-+++++            self.rotary_emb = DeepseekRotaryEmbedding(
+-+++++                self.head_dim,
+-+++++                max_position_embeddings=self.max_position_embeddings,
+-+++++                base=self.rope_theta,
+-+++++            )
+-+++++        else:
+-+++++            scaling_type = self.config.rope_scaling["type"]
+-+++++            scaling_factor = self.config.rope_scaling["factor"]
+-+++++            if scaling_type == "linear":
+-+++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-+++++                    self.head_dim,
+-+++++                    max_position_embeddings=self.max_position_embeddings,
+-+++++                    scaling_factor=scaling_factor,
+-+++++                    base=self.rope_theta,
+-+++++                )
+-+++++            elif scaling_type == "dynamic":
+-+++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-+++++                    self.head_dim,
+-+++++                    max_position_embeddings=self.max_position_embeddings,
+-+++++                    scaling_factor=scaling_factor,
+-+++++                    base=self.rope_theta,
+-+++++                )
+-+++++            else:
+-+++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-+++++
+-+++++    def forward(
+-+++++        self,
+-+++++        hidden_states: mindspore.Tensor,
+-+++++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++++        position_ids: Optional[mindspore.Tensor] = None,
+-+++++        past_key_value: Optional[Cache] = None,
+-+++++        output_attentions: bool = False,
+-+++++        use_cache: bool = False,
+-+++++        **kwargs,
+-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++        if "padding_mask" in kwargs:
+-+++++            warnings.warn(
+-+++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-+++++            )
+-+++++        if output_attentions:
+-+++++            warnings.warn(
+-+++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-+++++            )
+-+++++
+-+++++        bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++        if self.config.pretraining_tp > 1:
+-+++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-+++++
+-+++++        query_states = self.q_proj(hidden_states)
+-+++++        key_states = self.k_proj(hidden_states)
+-+++++        value_states = self.v_proj(hidden_states)
+-+++++
+-+++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++        kv_seq_len = key_states.shape[-2]
+-+++++        if past_key_value is not None:
+-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++        
+-+++++        # Apply Rotary Position Embedding
+-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++        if past_key_value is not None:
+-+++++            cache_kwargs = {"sin": sin, "cos": cos}
+-+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++
+-+++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-+++++        # So we must explicitly repeat the KV heads.
+-+++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++
+-+++++        # Convert attention mask for flash_attention_score
+-+++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-+++++        if attention_mask is not None:
+-+++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-+++++                 raise ValueError(
+-+++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-+++++                )
+-+++++            attn_mask_for_fa = attention_mask < 0
+-+++++        else:
+-+++++            attn_mask_for_fa = None
+-+++++
+-+++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-+++++
+-+++++        # Call the fused operator using the efficient BNSD layout
+-+++++        attn_output = mindspore.ops.flash_attention_score(
+-+++++            query=query_states,
+-+++++            key=key_states,
+-+++++            value=value_states,
+-+++++            head_num=self.num_heads,
+-+++++            input_layout='BNSD', # Specify the correct layout
+-+++++            attn_mask=attn_mask_for_fa,
+-+++++            keep_prob=keep_prob,
+-+++++            scalar_value=1.0 / math.sqrt(self.head_dim)
+-+++++        )
+-+++++        
+-+++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++        
+-+++++        # Apply output projection
+-+++++        attn_output = self.o_proj(attn_output)
+-+++++
+-+++++        # Flash attention does not return attention weights, so we return None.
+-+++++        attn_weights = None
+-+++++
+-+++++        return attn_output, attn_weights, past_key_value
+-+++++
+-++++ Deepseek_ATTENTION_CLASSES = {
+-++++     "eager": DeepseekAttention,
+-+++++    "flash-attention": DeepseekFlashAttention,
+-++++ }
+-++++ 
+-++++ 
+-++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-++++             config=config, layer_idx=layer_idx
+-++++         )
+-++++ 
+-+++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-+++++            config=config, layer_idx=layer_idx
+-+++++        )
+-+++++
+-++++         self.mlp = (
+-++++             DeepseekMoE(config)
+-++++             if (
+-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++index d4c6b651..bced285c 100644
+-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+-++++ 
+-++++ import mindspore
+-++++ import mindnlp.core.nn.functional as F
+-++++-from mindnlp.core import nn, ops
+-+++++from mindnlp.core import nn, ops, no_grad
+-++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+-++++ 
+-++++ from ....common.activations import ACT2FN
+-++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+-++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-++++ 
+-+++++Long_Prompt = False
+-+++++PROMPT_LENGTH_THRESHOLD = 128
+-++++ 
+-++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++ 
+-+++++# class Qwen2MoeFlashAttention(nn.Module):
+-+++++#     """
+-+++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++
+-+++++#     关键改动:
+-+++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++#        直接传入原始的 key 和 value 张量效率更高。
+-+++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++#         super().__init__()
+-+++++#         self.config = config
+-+++++#         self.layer_idx = layer_idx
+-+++++#         self.hidden_size = config.hidden_size
+-+++++#         self.num_heads = config.num_attention_heads
+-+++++#         self.head_dim = self.hidden_size // self.num_heads
+-+++++#         self.num_key_value_heads = config.num_key_value_heads
+-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++#         self.max_position_embeddings = config.max_position_embeddings
+-+++++#         self.rope_theta = config.rope_theta
+-+++++#         self.attention_dropout = config.attention_dropout
+-+++++
+-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++#             raise ValueError(
+-+++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++#             )
+-+++++
+-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++
+-+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++#             self.head_dim,
+-+++++#             max_position_embeddings=self.max_position_embeddings,
+-+++++#             base=self.rope_theta,
+-+++++#         )
+-+++++
+-+++++#     def forward(
+-+++++#         self,
+-+++++#         hidden_states: mindspore.Tensor,
+-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++++#         past_key_value: Optional[Cache] = None,
+-+++++#         output_attentions: bool = False,
+-+++++#         use_cache: bool = False,
+-+++++#         cache_position: Optional[mindspore.Tensor] = None,
+-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++#         bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++#         # 1. 线性投射 Q, K, V
+-+++++#         query_states = self.q_proj(hidden_states)
+-+++++#         key_states = self.k_proj(hidden_states)
+-+++++#         value_states = self.v_proj(hidden_states)
+-+++++
+-+++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++#         # 3. RoPE 旋转位置编码
+-+++++#         kv_seq_len = key_states.shape[-2]
+-+++++#         if past_key_value is not None:
+-+++++#             if self.layer_idx is None:
+-+++++#                 raise ValueError(
+-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++#                     "with a layer index."
+-+++++#                 )
+-+++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++#                 if cache_position.shape[0] == 1:
+-+++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++#                     kv_seq_len = past_seen_tokens + 1
+-+++++#                 else:
+-+++++#                     # prefill 阶段：cache_position 是范围，使用其长度
+-+++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++#             else:
+-+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++#         # 4. KV 缓存更新
+-+++++#         if past_key_value is not None:
+-+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++#             key_states, value_states = past_key_value.update(
+-+++++#                 key_states, value_states, self.layer_idx, cache_kwargs
+-+++++#             )
+-+++++            
+-+++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++#                 if cache_position.shape[0] == 1:
+-+++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++#                     kv_seq_len = key_states.shape[-2]
+-+++++
+-+++++#         # 5. [重要] 准备 Attention Mask
+-+++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++#         fa_attention_mask = None
+-+++++#         if attention_mask is not None:
+-+++++#             # 截取与当前key长度匹配的部分
+-+++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++#             fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++#         input_dtype = query_states.dtype
+-+++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++#             query_states = query_states.to(mindspore.float16)
+-+++++#             key_states = key_states.to(mindspore.float16)
+-+++++#             value_states = value_states.to(mindspore.float16)
+-+++++
+-+++++#         # 6. [核心] 调用 flash_attention_score 算子
+-+++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++#         attn_output = mindspore.ops.flash_attention_score(
+-+++++#             query=query_states,
+-+++++#             key=key_states,
+-+++++#             value=value_states,
+-+++++#             head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++#             attn_mask=fa_attention_mask,
+-+++++#             keep_prob=1.0 - self.attention_dropout,
+-+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++#             input_layout="BNSD",
+-+++++#             sparse_mode=0 # 使用 defaultMask 模式
+-+++++#         )
+-+++++
+-+++++#         # 恢复原始数据类型
+-+++++#         attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++#         # 7. 调整输出形状
+-+++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++#         attn_output = self.o_proj(attn_output)
+-+++++
+-+++++#         # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++#         attn_weights = None
+-+++++#         if output_attentions:
+-+++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++#         return attn_output, attn_weights, past_key_value
+-+++++
+-+++++#     # def forward(
+-+++++#     #     self,
+-+++++#     #     hidden_states: mindspore.Tensor,
+-+++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++#     #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++#     #     past_key_value: Optional[Cache] = None,
+-+++++#     #     output_attentions: bool = False,
+-+++++#     #     use_cache: bool = False,
+-+++++#     #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++#     #     bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++#     #     # 1. 线性投射 Q, K, V
+-+++++#     #     query_states = self.q_proj(hidden_states)
+-+++++#     #     key_states = self.k_proj(hidden_states)
+-+++++#     #     value_states = self.v_proj(hidden_states)
+-+++++
+-+++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++
+-+++++#     #     # 3. RoPE 旋转位置编码
+-+++++#     #     kv_seq_len = key_states.shape[-2]
+-+++++#     #     if past_key_value is not None:
+-+++++#     #         if self.layer_idx is None:
+-+++++#     #             raise ValueError(
+-+++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++#     #                 "with a layer index."
+-+++++#     #             )
+-+++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++
+-+++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++#     #     # 4. KV 缓存更新
+-+++++#     #     if past_key_value is not None:
+-+++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++#     #         key_states, value_states = past_key_value.update(
+-+++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++#     #         )
+-+++++
+-+++++#     #     # 5. 准备 Attention Mask
+-+++++#     #     fa_attention_mask = None
+-+++++#     #     if attention_mask is not None:
+-+++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++#     #         fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++#     #     input_dtype = query_states.dtype
+-+++++
+-+++++#     #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++#     #     attn_output = mindspore.ops.flash_attention_score(
+-+++++#     #         query=query_states,
+-+++++#     #         key=key_states,
+-+++++#     #         value=value_states,
+-+++++#     #         head_num=self.num_heads,
+-+++++#     #         attn_mask=fa_attention_mask,
+-+++++#     #         keep_prob=1.0 - self.attention_dropout,
+-+++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++#     #         input_layout="BNSD",
+-+++++#     #         sparse_mode=0,
+-+++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++#     #         inner_precise=1
+-+++++#     #     )
+-+++++
+-+++++#     #     # 恢复原始数据类型
+-+++++#     #     attn_output = attn_output.to(input_dtype)
+-+++++
+-+++++#     #     # 7. 调整输出形状
+-+++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++#     #     attn_output = self.o_proj(attn_output)
+-+++++
+-+++++#     #     attn_weights = None
+-+++++#     #     if output_attentions:
+-+++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++
+-+++++#     #     return attn_output, attn_weights, past_key_value
+-+++++
+-+++++
+-++++ class Qwen2MoeFlashAttention(nn.Module):
+-++++     """
+-++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++-
+-++++-    关键改动:
+-++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++-       直接传入原始的 key 和 value 张量效率更高。
+-++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-+++++
+-+++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-+++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-+++++    完全使用模型的低精度数据类型（如 float16）进行计算，
+-+++++    以达到理论上的最高执行速度。
+-++++     """
+-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++         super().__init__()
+-++++         self.config = config
+-++++         self.layer_idx = layer_idx
+-+++++        if layer_idx is None:
+-+++++            logger.warning_once(
+-+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-+++++            )
+-+++++
+-++++         self.hidden_size = config.hidden_size
+-++++         self.num_heads = config.num_attention_heads
+-++++         self.head_dim = self.hidden_size // self.num_heads
+-++++         self.num_key_value_heads = config.num_key_value_heads
+-++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++         self.max_position_embeddings = config.max_position_embeddings
+-++++         self.rope_theta = config.rope_theta
+-++++         self.attention_dropout = config.attention_dropout
+-++++ 
+-++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++-            raise ValueError(
+-++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++-            )
+-++++-
+-++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-++++         key_states = self.k_proj(hidden_states)
+-++++         value_states = self.v_proj(hidden_states)
+-++++ 
+-++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++        # 2. 调整形状以匹配 BNSD 布局
+-++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-
+-++++-        # 3. RoPE 旋转位置编码
+-+++++        
+-+++++        # 3. RoPE 和 KV 缓存
+-++++         kv_seq_len = key_states.shape[-2]
+-++++         if past_key_value is not None:
+-++++-            if self.layer_idx is None:
+-++++-                raise ValueError(
+-++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-                    "with a layer index."
+-++++-                )
+-++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++-                if cache_position.shape[0] == 1:
+-++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++-                    kv_seq_len = past_seen_tokens + 1
+-++++-                else:
+-++++-                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++-            else:
+-++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-
+-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++        
+-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++ 
+-++++-        # 4. KV 缓存更新
+-++++         if past_key_value is not None:
+-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-            key_states, value_states = past_key_value.update(
+-++++-                key_states, value_states, self.layer_idx, cache_kwargs
+-++++-            )
+-++++-            
+-++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++-                if cache_position.shape[0] == 1:
+-++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++-                    kv_seq_len = key_states.shape[-2]
+-++++-
+-++++-        # 5. [重要] 准备 Attention Mask
+-++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++
+-+++++        # 4. 准备 Attention Mask
+-++++         fa_attention_mask = None
+-++++         if attention_mask is not None:
+-++++-            # 截取与当前key长度匹配的部分
+-++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++             fa_attention_mask = (mask_slice != 0)
+-++++ 
+-++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++-        input_dtype = query_states.dtype
+-++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++-            query_states = query_states.to(mindspore.float16)
+-++++-            key_states = key_states.to(mindspore.float16)
+-++++-            value_states = value_states.to(mindspore.float16)
+-++++-
+-++++-        # 6. [核心] 调用 flash_attention_score 算子
+-++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-++++         attn_output = mindspore.ops.flash_attention_score(
+-++++             query=query_states,
+-++++             key=key_states,
+-++++             value=value_states,
+-++++-            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++            head_num=self.num_heads,
+-++++             attn_mask=fa_attention_mask,
+-++++-            keep_prob=1.0 - self.attention_dropout,
+-+++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-++++             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++             input_layout="BNSD",
+-++++-            sparse_mode=0 # 使用 defaultMask 模式
+-+++++            sparse_mode=0,
+-+++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-++++         )
+-++++ 
+-++++-        # 恢复原始数据类型
+-++++-        attn_output = attn_output.to(input_dtype)
+-++++-
+-++++-        # 7. 调整输出形状
+-++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++        # 6. 调整输出形状
+-++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++         attn_output = self.o_proj(attn_output)
+-++++ 
+-++++-        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++        # 7. 返回结果
+-++++         attn_weights = None
+-++++         if output_attentions:
+-++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++-    # def forward(
+-++++-    #     self,
+-++++-    #     hidden_states: mindspore.Tensor,
+-++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++-    #     past_key_value: Optional[Cache] = None,
+-++++-    #     output_attentions: bool = False,
+-++++-    #     use_cache: bool = False,
+-++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-
+-++++-    #     bsz, q_len, _ = hidden_states.shape
+-++++-
+-++++-    #     # 1. 线性投射 Q, K, V
+-++++-    #     query_states = self.q_proj(hidden_states)
+-++++-    #     key_states = self.k_proj(hidden_states)
+-++++-    #     value_states = self.v_proj(hidden_states)
+-++++-
+-++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-
+-++++-    #     # 3. RoPE 旋转位置编码
+-++++-    #     kv_seq_len = key_states.shape[-2]
+-++++-    #     if past_key_value is not None:
+-++++-    #         if self.layer_idx is None:
+-++++-    #             raise ValueError(
+-++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-    #                 "with a layer index."
+-++++-    #             )
+-++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++ 
+-++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-
+-++++-    #     # 4. KV 缓存更新
+-++++-    #     if past_key_value is not None:
+-++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-    #         key_states, value_states = past_key_value.update(
+-++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++-    #         )
+-++++-
+-++++-    #     # 5. 准备 Attention Mask
+-++++-    #     fa_attention_mask = None
+-++++-    #     if attention_mask is not None:
+-++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-    #         fa_attention_mask = (mask_slice != 0)
+-++++-
+-++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++-    #     input_dtype = query_states.dtype
+-++++-
+-++++-    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++-    #     attn_output = mindspore.ops.flash_attention_score(
+-++++-    #         query=query_states,
+-++++-    #         key=key_states,
+-++++-    #         value=value_states,
+-++++-    #         head_num=self.num_heads,
+-++++-    #         attn_mask=fa_attention_mask,
+-++++-    #         keep_prob=1.0 - self.attention_dropout,
+-++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-    #         input_layout="BNSD",
+-++++-    #         sparse_mode=0,
+-++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++-    #         inner_precise=1
+-++++-    #     )
+-++++-
+-++++-    #     # 恢复原始数据类型
+-++++-    #     attn_output = attn_output.to(input_dtype)
+-+++++QWEN2MOE_ATTENTION_CLASSES = {
+-+++++    "eager": Qwen2MoeAttention,
+-+++++    "flash-attention": Qwen2MoeFlashAttention,
+-+++++}
+-++++ 
+-++++-    #     # 7. 调整输出形状
+-++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-    #     attn_output = self.o_proj(attn_output)
+-++++ 
+-++++-    #     attn_weights = None
+-++++-    #     if output_attentions:
+-++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     def __init__(self, config):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         # gating
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     #@dwj
+-+++++#     # 只遍历激活的专家，而非全部专家
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#             num_tokens = hidden_states_reshaped.shape[0]
+-+++++
+-+++++#             router_logits = self.gate(hidden_states_reshaped)
+-+++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++
+-+++++#             if self.norm_topk_prob:
+-+++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++#             routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++            
+-+++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++#             flat_selected_experts = selected_experts.flatten()
+-+++++            
+-+++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++#             token_indices = broadcasted_token_indices.flatten()
+-+++++            
+-+++++#             active_experts = ops.unique(flat_selected_experts)
+-+++++            
+-+++++#             for expert_idx_tensor in active_experts:
+-+++++#                 expert_idx = expert_idx_tensor.item()
+-+++++#                 expert_layer = self.experts[expert_idx]
+-+++++                
+-+++++#                 mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#                 selected_token_indices = token_indices[mask]
+-+++++#                 selected_routing_weights = routing_weights.flatten()[mask]
+-+++++                
+-+++++#                 current_states = hidden_states_reshaped[selected_token_indices]
+-+++++                
+-+++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++                
+-+++++#                 final_hidden_states = final_hidden_states.index_add(
+-+++++#                     dim=0,
+-+++++#                     index=selected_token_indices,
+-+++++#                     source=expert_output.to(hidden_states.dtype)
+-+++++#                 )
+-+++++            
+-+++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++ 
+-++++-    #     return attn_output, attn_weights, past_key_value
+-+++++#             final_hidden_states = final_hidden_states + shared_expert_output
+-+++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++            
+-+++++#             return final_hidden_states, router_logits
+-+++++
+-+++++
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     """
+-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         # 门控网络
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         # 专家列表
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++#         # 共享专家
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_decode(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         【解码路径】针对 sequence_length=1 的极致优化。
+-+++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+++++#         """
+-+++++#         batch_size, hidden_dim = hidden_states.shape
+-+++++        
+-+++++#         expert_outputs_list = [
+-+++++#             ops.cat([
+-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++#             ], dim=0) 
+-+++++#             for i in range(batch_size)
+-+++++#         ]
+-+++++        
+-+++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+++++#         # shape: (batch_size, top_k, hidden_dim)
+-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++        
+-+++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++        
+-+++++#         return moe_output.squeeze(1)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_prefill(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         【预填充路径】针对 sequence_length > 1 的优化。
+-+++++#         按专家对 Token 进行分组，并进行批处理。
+-+++++#         """
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens = hidden_states.shape[0]
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++        
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++        
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++        
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++            
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#             selected_token_indices = token_indices[mask]
+-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++            
+-+++++#             current_states = hidden_states[selected_token_indices]
+-+++++            
+-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++            
+-+++++#             moe_output = moe_output.index_add(
+-+++++#                 dim=0,
+-+++++#                 index=selected_token_indices,
+-+++++#                 source=expert_output.to(hidden_states.dtype)
+-+++++#             )
+-+++++#         return moe_output
+-+++++
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         顶层 forward 方法，作为智能分发器。
+-+++++#         """
+-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++        
+-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#         router_logits = self.gate(hidden_states_reshaped)
+-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++ 
+-++++-    # def forward(
+-++++-    #     self,
+-++++-    #     hidden_states: mindspore.Tensor,
+-++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++-    #     past_key_value: Optional[Cache] = None,
+-++++-    #     output_attentions: bool = False,
+-++++-    #     use_cache: bool = False,
+-++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-
+-++++-    #     bsz, q_len, _ = hidden_states.shape
+-++++-
+-++++-    #     query_states = self.q_proj(hidden_states)
+-++++-    #     key_states = self.k_proj(hidden_states)
+-++++-    #     value_states = self.v_proj(hidden_states)
+-++++-
+-++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-
+-++++-    #     kv_seq_len = key_states.shape[-2]
+-++++-    #     if past_key_value is not None:
+-++++-    #         if self.layer_idx is None:
+-++++-    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-
+-++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-
+-++++-    #     if past_key_value is not None:
+-++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-    #         key_states, value_states = past_key_value.update(
+-++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++-    #         )
+-+++++#         if self.norm_topk_prob:
+-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++        
+-+++++#         moe_output = None
+-+++++#         # 在推理时，根据序列长度选择最优路径
+-+++++#         if not self.training:
+-+++++#             if sequence_length == 1:
+-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++#             else:
+-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++#         else:
+-+++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+++++#             raise NotImplementedError("Training path is not implemented.")
+-+++++
+-+++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+++++        
+-+++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+++++        
+-+++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++#         return final_hidden_states, router_logits
+-+++++
+-+++++
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     """
+-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         # 门控网络
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         # 专家列表
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++#         # 共享专家
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_decode(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         batch_size, _ = hidden_states.shape
+-+++++#         expert_outputs_list = [
+-+++++#             ops.cat([
+-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++#             ], dim=0) 
+-+++++#             for i in range(batch_size)
+-+++++#         ]
+-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++#         return moe_output.squeeze(1)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_prefill(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens = hidden_states.shape[0]
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++        
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#             selected_token_indices = token_indices[mask]
+-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++#             current_states = hidden_states[selected_token_indices]
+-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++#             moe_output = moe_output.index_add(
+-+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++#             )
+-+++++#         return moe_output
+-+++++
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         顶层 forward 方法，作为智能分发器。
+-+++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+++++#         """
+-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++        
+-+++++#         # 1. 门控计算 (通用逻辑)
+-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#         router_logits = self.gate(hidden_states_reshaped)
+-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++
+-+++++#         if self.norm_topk_prob:
+-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++        
+-+++++#         # 2. 智能分发到最优 MoE 路径
+-+++++#         moe_output = None
+-+++++#         if not self.training:
+-+++++#             if sequence_length == 1:
+-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++#             else:
+-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++#         else:
+-+++++#             raise NotImplementedError("Training path is not implemented.")
+-+++++
+-+++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++        
+-+++++#         # 4. 合并 MoE 输出和共享专家输出
+-+++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++        
+-+++++#         # 5. 恢复原始形状并返回
+-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++#         return final_hidden_states, router_logits
+-+++++
+-+++++# prefill fastest
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     """
+-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         # 门控网络
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         # 专家列表
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++#         # 共享专家
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_dispatch(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+++++#         """
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens, _ = hidden_states.shape
+-+++++        
+-+++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++#         flat_routing_weights = routing_weights.flatten()
+-++++ 
+-++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++-
+-++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++-    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++-    #     # <--- 修改结束 ---
+-++++-
+-++++-    #     fa_attention_mask = None
+-++++-    #     if attention_mask is not None:
+-++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-    #         fa_attention_mask = (mask_slice != 0)
+-++++-
+-++++-    #     input_dtype = query_states.dtype
+-++++-
+-++++-    #     attn_output = mindspore.ops.flash_attention_score(
+-++++-    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++-    #         key=key_states,
+-++++-    #         value=value_states,
+-++++-    #         head_num=self.num_heads,
+-++++-    #         attn_mask=fa_attention_mask,
+-++++-    #         keep_prob=1.0 - self.attention_dropout,
+-++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++-    #         input_layout="BNSD",
+-++++-    #         sparse_mode=0,
+-++++-    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++-    #     )
+-+++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++ 
+-++++-    #     attn_output = attn_output.to(input_dtype)
+-++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-    #     attn_output = self.o_proj(attn_output)
+-+++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++        
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++            
+-+++++#             # 找到所有分配给该专家的 token
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++            
+-+++++#             # 使用 mask 选取对应的 token 和权重
+-+++++#             current_token_indices = token_indices[mask]
+-+++++#             current_routing_weights = flat_routing_weights[mask]
+-+++++#             current_hidden_states = hidden_states[current_token_indices]
+-+++++            
+-+++++#             # 对这些 token 进行批处理
+-+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++++            
+-+++++#             # 使用 index_add 将结果精确地加回到对应位置
+-+++++#             moe_output = moe_output.index_add(
+-+++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++#             )
+-+++++#         return moe_output
+-+++++
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         顶层 forward 方法，作为智能分发器。
+-+++++#         """
+-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++        
+-+++++#         # 1. 门控计算
+-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#         router_logits = self.gate(hidden_states_reshaped)
+-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++
+-+++++#         if self.norm_topk_prob:
+-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++        
+-+++++#         # 2. 调用统一的 MoE 计算内核
+-+++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-++++ 
+-++++-    #     attn_weights = None
+-++++-    #     if output_attentions:
+-++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++++#         # 3. 统一处理共享专家
+-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++        
+-+++++#         # 4. 合并输出
+-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++        
+-+++++#         # 5. 恢复原始形状并返回
+-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++#         return final_hidden_states, router_logits
+-+++++
+-+++++
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     """
+-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++#     【最终高性能与高精度版】：
+-+++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+++++#     3. 这样实现了速度和准确性的两全其美。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_decode(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         【解码路径】极致优化版：bmm + 高精度累加。
+-+++++#         """
+-+++++#         original_dtype = hidden_states.dtype
+-+++++#         batch_size, _ = hidden_states.shape
+-+++++        
+-+++++#         expert_outputs_list = [
+-+++++#             ops.cat([
+-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++#             ], dim=0) 
+-+++++#             for i in range(batch_size)
+-+++++#         ]
+-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++
+-+++++#         # 在 float32 下执行 bmm，得到高精度结果
+-+++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++        
+-+++++#         # 将高精度结果转换回原始数据类型
+-+++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++        
+-+++++#         return moe_output
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_prefill(
+-+++++#         self, 
+-+++++#         hidden_states: mindspore.Tensor, 
+-+++++#         selected_experts: mindspore.Tensor, 
+-+++++#         routing_weights: mindspore.Tensor
+-+++++#     ) -> mindspore.Tensor:
+-+++++#         """
+-+++++#         【预填充路径】与原始实现一致，结果精确。
+-+++++#         """
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens, _ = hidden_states.shape
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++        
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#             selected_token_indices = token_indices[mask]
+-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++#             current_states = hidden_states[selected_token_indices]
+-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++#             moe_output = moe_output.index_add(
+-+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++#             )
+-+++++#         return moe_output
+-+++++
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++        
+-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#         router_logits = self.gate(hidden_states_reshaped)
+-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++ 
+-++++-    #     return attn_output, attn_weights, past_key_value
+-+++++#         if self.norm_topk_prob:
+-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+++++#         # 如果模型主体是 float16，后续再转换
+-+++++        
+-+++++#         moe_output = None
+-+++++#         if not self.training:
+-+++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+++++#             # _moe_infer_decode 内部会处理好类型转换
+-+++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++#             if sequence_length == 1:
+-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++++#             else:
+-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++++#         else:
+-+++++#             raise NotImplementedError("Training path is not implemented.")
+-+++++
+-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++        
+-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++#         return final_hidden_states, router_logits
+-+++++    
+-++++ 
+-++++-QWEN2MOE_ATTENTION_CLASSES = {
+-++++-    "eager": Qwen2MoeAttention,
+-++++-    "flash-attention": Qwen2MoeFlashAttention,
+-++++-}
+-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++#     """
+-+++++#     【融合版】一个混合专家模块，内置两种推理策略，
+-+++++#     由外部全局变量 `Long_Prompt` 控制：
+-+++++
+-+++++#     - if Long_Prompt is True:  【精度优先模式】
+-+++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+++++#       适用于处理长序列，避免误差累积。
+-+++++
+-+++++#     - if Long_Prompt is False: 【速度优先模式】
+-+++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+++++#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+++++#     """
+-+++++#     def __init__(self, config: Qwen2MoeConfig):
+-+++++#         super().__init__()
+-+++++#         self.num_experts = config.num_experts
+-+++++#         self.top_k = config.num_experts_per_tok
+-+++++#         self.norm_topk_prob = config.norm_topk_prob
+-+++++
+-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++#         self.experts = nn.ModuleList(
+-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++#         )
+-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++
+-+++++#     # --- 速度优先模式的辅助函数 ---
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++#         original_dtype = hidden_states.dtype
+-+++++#         batch_size, _ = hidden_states.shape
+-+++++#         expert_outputs_list = [
+-+++++#             ops.cat([
+-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++#             ], dim=0) 
+-+++++#             for i in range(batch_size)
+-+++++#         ]
+-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens, _ = hidden_states.shape
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#             selected_token_indices = token_indices[mask]
+-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++#             current_states = hidden_states[selected_token_indices]
+-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++#         return moe_output
+-+++++
+-+++++#     # --- 精度优先模式的辅助函数 ---
+-+++++#     @no_grad()
+-+++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++#         moe_output = ops.zeros_like(hidden_states)
+-+++++#         num_tokens, _ = hidden_states.shape
+-+++++#         flat_selected_experts = selected_experts.flatten()
+-+++++#         flat_routing_weights = routing_weights.flatten()
+-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++#         active_experts = ops.unique(flat_selected_experts)
+-+++++#         for expert_idx_tensor in active_experts:
+-+++++#             expert_idx = expert_idx_tensor.item()
+-+++++#             expert_layer = self.experts[expert_idx]
+-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++#             current_token_indices = token_indices[mask]
+-+++++#             current_routing_weights = flat_routing_weights[mask]
+-+++++#             current_hidden_states = hidden_states[current_token_indices]
+-+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++#         return moe_output
+-+++++
+-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+++++#         global Long_Prompt
+-+++++
+-+++++#         # 1. 门控计算 (所有模式通用)
+-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++#         router_logits = self.gate(hidden_states_reshaped)
+-+++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++++#         if self.norm_topk_prob:
+-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++#         moe_output = None
+-+++++#         if not self.training:
+-+++++#             # 根据 Long_Prompt 标志选择模式
+-+++++#             if Long_Prompt:
+-+++++#                 # --- 精度优先模式 ---
+-+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++#             else:
+-+++++#                 # --- 速度优先模式 ---
+-+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++#                 if sequence_length == 1:
+-+++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++#                 else:
+-+++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++#         else:
+-+++++#             raise NotImplementedError("Training path is not implemented.")
+-+++++
+-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++        
+-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++#         return final_hidden_states, router_logits
+-+++++    
+-+++++class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++    """
+-+++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+++++    控制的顶级推理策略：
+-++++ 
+-+++++    - if Long_Prompt is True:  【精度优先模式】
+-+++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-+++++      适用于需要严格可复现性的长序列任务。
+-++++ 
+-++++-class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-    def __init__(self, config):
+-+++++    - if Long_Prompt is False: 【速度优先模式】
+-+++++      采用业界最强的性能组合：
+-+++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-+++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-+++++    """
+-+++++    def __init__(self, config: Qwen2MoeConfig):
+-++++         super().__init__()
+-++++         self.num_experts = config.num_experts
+-++++         self.top_k = config.num_experts_per_tok
+-++++         self.norm_topk_prob = config.norm_topk_prob
+-++++ 
+-++++-        # gating
+-++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++         self.experts = nn.ModuleList(
+-++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++         )
+-++++-
+-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++ 
+-++++-    #@dwj
+-++++-    # 只遍历激活的专家，而非全部专家
+-++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-            num_tokens = hidden_states_reshaped.shape[0]
+-++++-
+-++++-            router_logits = self.gate(hidden_states_reshaped)
+-++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-
+-++++-            if self.norm_topk_prob:
+-++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-            
+-++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++-            flat_selected_experts = selected_experts.flatten()
+-++++-            
+-++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++-            token_indices = broadcasted_token_indices.flatten()
+-++++-            
+-++++-            active_experts = ops.unique(flat_selected_experts)
+-++++-            
+-++++-            for expert_idx_tensor in active_experts:
+-++++-                expert_idx = expert_idx_tensor.item()
+-++++-                expert_layer = self.experts[expert_idx]
+-++++-                
+-++++-                mask = (flat_selected_experts == expert_idx_tensor)
+-++++-                selected_token_indices = token_indices[mask]
+-++++-                selected_routing_weights = routing_weights.flatten()[mask]
+-++++-                
+-++++-                current_states = hidden_states_reshaped[selected_token_indices]
+-++++-                
+-++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-                
+-++++-                final_hidden_states = final_hidden_states.index_add(
+-++++-                    dim=0,
+-++++-                    index=selected_token_indices,
+-++++-                    source=expert_output.to(hidden_states.dtype)
+-++++-                )
+-++++-            
+-++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-+++++    @no_grad()
+-+++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++        original_dtype = hidden_states.dtype
+-+++++        batch_size, _ = hidden_states.shape
+-+++++        expert_outputs_list = [
+-+++++            ops.cat([
+-+++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++            ], dim=0)
+-+++++            for i in range(batch_size)
+-+++++        ]
+-+++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++        weights_fp32 = routing_weights.to(mindspore.float32)
+-+++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++++        return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++
+-+++++    @no_grad()
+-+++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++        num_tokens, _ = hidden_states.shape
+-+++++        flat_selected_experts = selected_experts.flatten()
+-+++++        sorted_expert_indices = flat_selected_experts.argsort()
+-+++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++++        original_token_indices = sorted_expert_indices // self.top_k
+-+++++        moe_output = ops.zeros_like(hidden_states)
+-+++++        current_token_offset = 0
+-+++++        for i in range(self.num_experts):
+-+++++            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++++            if expert_token_count == 0:
+-+++++                continue
+-+++++            end_offset = current_token_offset + expert_token_count
+-+++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++++            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++            current_token_offset += expert_token_count
+-+++++        return moe_output
+-+++++
+-+++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+++++    @no_grad()
+-+++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++        moe_output = ops.zeros_like(hidden_states)
+-+++++        num_tokens, _ = hidden_states.shape
+-+++++        flat_selected_experts = selected_experts.flatten()
+-+++++        flat_routing_weights = routing_weights.flatten()
+-+++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++        active_experts = ops.unique(flat_selected_experts)
+-+++++        for expert_idx_tensor in active_experts:
+-+++++            expert_idx = expert_idx_tensor.item()
+-+++++            expert_layer = self.experts[expert_idx]
+-+++++            mask = (flat_selected_experts == expert_idx_tensor)
+-+++++            current_token_indices = token_indices[mask]
+-+++++            current_routing_weights = flat_routing_weights[mask]
+-+++++            current_hidden_states = hidden_states[current_token_indices]
+-+++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++        return moe_output
+-++++ 
+-++++-            final_hidden_states = final_hidden_states + shared_expert_output
+-++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++-            
+-++++-            return final_hidden_states, router_logits
+-+++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++        global Long_Prompt
+-+++++
+-+++++        # 1. 门控计算 (所有模式通用)
+-+++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++        router_logits = self.gate(hidden_states_reshaped)
+-+++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++++        if self.norm_topk_prob:
+-+++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++        
+-+++++        moe_output = None
+-+++++        if Long_Prompt:
+-+++++            # --- 精度优先模式 (ACCURACY MODE) ---
+-+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        else:
+-+++++            # --- 速度优先模式 (SPEED MODE) ---
+-+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++            if sequence_length == 1:
+-+++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++            else:
+-+++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        
+-++++ 
+-+++++        # 3. 共享专家计算与合并 (所有模式通用)
+-+++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++        
+-+++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++        
+-+++++        return final_hidden_states, router_logits
+-++++ 
+-++++ class Qwen2MoeDecoderLayer(nn.Module):
+-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-++++         super().__init__()
+-++++         self.hidden_size = config.hidden_size
+-+++++        
+-+++++        # if Long_Prompt:
+-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++        # else:
+-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++ 
+-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++ 
+-++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++-
+-++++         if (layer_idx not in config.mlp_only_layers) and (
+-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++++         ):
+-++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++             self._warmed_up = True
+-++++             self.warmup_moe_model()
+-++++ 
+-+++++
+-+++++
+-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++++         output_router_logits = (
+-++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++             router_logits=outputs.router_logits,
+-++++         )
+-++++ 
+-+++++    def generate(self, *args, **kwargs):
+-+++++        """
+-+++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+++++        """
+-+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+++++
+-+++++        input_ids = kwargs.get("input_ids")
+-+++++        if input_ids is None and args:
+-+++++            input_ids = args[0]
+-+++++
+-+++++        if input_ids is not None:
+-+++++            prompt_length = input_ids.shape[1]
+-+++++            
+-+++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+++++                Long_Prompt = True
+-+++++            else:
+-+++++                Long_Prompt = False
+-+++++
+-+++++        return super().generate(*args, **kwargs)
+-+++++    
+-++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-++++     def prepare_inputs_for_generation(
+-++++         self,
+-++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-+++++        
+-++++         if past_key_values is not None:
+-++++             if inputs_embeds is not None:  # Exception 1
+-++++                 if 0 not in input_ids.shape:
+-++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++             }
+-++++         )
+-++++         return model_inputs
+-+++++
+-++++ # @lwx
+-++++     # def _decode_one_tokens_logits(
+-++++     #     self,
+-++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-++++             attentions=outputs.attentions,
+-++++         )
+-++++ 
+-+++++
+-++++ __all__ = [
+-++++     "Qwen2MoeForCausalLM",
+-++++     "Qwen2MoeModel",
+-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++++new file mode 100644
+-++++index 00000000..6dfb5b93
+-++++--- /dev/null
+-+++++++ b/patches/0001-20251104commit.patch
+-++++@@ -0,0 +1,1272 @@
+-+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++++From: Pinoeer-kingxi <13022943007@163.com>
+-+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++++Subject: [PATCH] 20251104commit
+-+++++
+-+++++---
+-+++++ mindnlp/transformers/cache_utils.py           |  28 +-
+-+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++++
+-+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++++index cadd2e04..02f8d4be 100644
+-+++++--- a/mindnlp/transformers/cache_utils.py
+-++++++++ b/mindnlp/transformers/cache_utils.py
+-+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++++             # k_out[:, :, cache_position] = key_states
+-+++++             # v_out[:, :, cache_position] = value_states
+-+++++-            if ON_ORANGE_PI:
+-+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-            else:
+-+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-
+-++++++            # if ON_ORANGE_PI:
+-++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++++            # else:
+-++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++++++            if cache_position.ndim > 1:
+-++++++                cache_position = cache_position.flatten()
+-++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++++++                cache_position = cache_position.int()
+-++++++            
+-++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++++++            k_out[:, :, cache_position] = key_states
+-++++++            v_out[:, :, cache_position] = value_states
+-++++++            
+-+++++         return k_out, v_out
+-+++++ 
+-+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++index c695b944..d8303e45 100644
+-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++++ def rotate_half(x):
+-+++++     """Rotates half the hidden dims of the input."""
+-+++++-    x1 = x[..., : x.shape[-1] // 2]
+-+++++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++++    # x2 = x[..., x.shape[-1] // 2 :]
+-++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++     return ops.cat((-x2, x1), dim=-1)
+-+++++ 
+-+++++ 
+-+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++++         if self.training:
+-+++++             raise NotImplementedError("Training is not supported yet.")
+-+++++         else:
+-+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++-        if self.config.n_shared_experts is not None:
+-+++++-            y = y + self.shared_experts(identity)
+-+++++-        return y
+-++++++            # @lwx
+-++++++            if orig_shape[1] == 1:
+-++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++++++                y=y.view(*orig_shape)
+-++++++                if self.config.n_shared_experts is not None:
+-++++++                    y = y + self.shared_experts(identity)
+-++++++                return y
+-++++++            else:
+-++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++++++                if self.config.n_shared_experts is not None:
+-++++++                    y = y + self.shared_experts(identity)
+-++++++                return y
+-++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++++        # if self.config.n_shared_experts is not None:
+-++++++        #     y = y + self.shared_experts(identity)
+-++++++        # return y
+-++++++        
+-++++++    @no_grad()
+-++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++++
+-++++++        expert_cache = ops.zeros_like(x)
+-++++++        for i in range(self.num_experts_per_tok):
+-++++++            expert_id = flat_expert_indices[i].item()
+-++++++            weight = flat_expert_weights[i].item()
+-++++++            expert = self.experts[expert_id]
+-++++++            expert_out = expert(x)
+-++++++            expert_cache += expert_out * weight
+-++++++        return expert_cache
+-+++++ 
+-+++++     @no_grad()
+-+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-        # expert_cache = torch.zeros_like(x)
+-+++++-        # idxs = flat_expert_indices.argsort()
+-+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++-        # token_idxs = idxs // self.num_experts_per_tok
+-+++++-        # for i, end_idx in enumerate(tokens_per_expert):
+-+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++-        #     if start_idx == end_idx:
+-+++++-        #         continue
+-+++++-        #     expert = self.experts[i]
+-+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-        #     expert_tokens = x[exp_token_idx]
+-+++++-        #     expert_out = expert(expert_tokens)
+-+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++-        # return expert_cache
+-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++         expert_cache = ops.zeros_like(x)
+-+++++         idxs = flat_expert_indices.argsort()
+-+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++         token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-+++++         for i, end_idx in enumerate(tokens_per_expert):
+-+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++             if start_idx == end_idx:
+-+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++++             expert_out = expert(expert_tokens)
+-+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++++
+-+++++         return expert_cache
+-++++++        
+-++++++    # @no_grad()
+-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++    #     # expert_cache = torch.zeros_like(x)
+-++++++    #     # idxs = flat_expert_indices.argsort()
+-++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++++    #     #     if start_idx == end_idx:
+-++++++    #     #         continue
+-++++++    #     #     expert = self.experts[i]
+-++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #     #     expert_tokens = x[exp_token_idx]
+-++++++    #     #     expert_out = expert(expert_tokens)
+-++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++++    #     # return expert_cache
+-++++++    #     expert_cache = ops.zeros_like(x)
+-++++++    #     idxs = flat_expert_indices.argsort()
+-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++    #         if start_idx == end_idx:
+-++++++    #             continue
+-++++++    #         expert = self.experts[i]
+-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #         expert_tokens = x[exp_token_idx]
+-++++++    #         expert_out = expert(expert_tokens)
+-++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++++
+-++++++    #     return expert_cache
+-++++++    # @no_grad()
+-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++    #     expert_cache = ops.zeros_like(x)
+-++++++
+-++++++    #     # 排序保证顺序一致
+-++++++    #     idxs = flat_expert_indices.argsort()
+-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++    #     # 找出有 token 的专家
+-++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++++
+-++++++    #     for i in active_experts.tolist():
+-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++    #         end_idx = tokens_per_expert[i]
+-++++++    #         if start_idx == end_idx:  # 没有 token
+-++++++    #             continue
+-++++++
+-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #         expert_tokens = x[exp_token_idx]
+-++++++    #         expert_out = self.experts[i](expert_tokens)
+-++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++++
+-++++++    #         expert_cache = mindspore.mint.scatter_add(
+-++++++    #             expert_cache,
+-++++++    #             0,
+-++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++++    #             expert_out
+-++++++    #         )
+-++++++
+-++++++    #     return expert_cache
+-++++++
+-++++++
+-+++++ 
+-+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++++ #     """
+-+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++ 
+-+++++         # Initialize weights and apply final processing
+-+++++         self.post_init()
+-++++++        self.warm_up = False
+-++++++
+-++++++    def warmup_moe_model_deep(self):
+-++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++++        test_texts = [
+-++++++            "warmup short",
+-++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++++++        ]
+-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++++        if tokenizer is None:
+-++++++            from mindnlp.transformers import AutoTokenizer
+-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++++            self._warmup_tokenizer = tokenizer
+-++++++
+-++++++        for text in test_texts:
+-++++++            inputs = tokenizer(text, return_tensors="ms")
+-++++++            with mindspore._no_grad():
+-++++++                _ = self(**inputs, use_cache=False)
+-++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++++ 
+-+++++     def get_input_embeddings(self):
+-+++++         return self.model.embed_tokens
+-+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++         ```"""
+-++++++        if not self.warm_up:
+-++++++            self.warm_up = True
+-++++++            self.warmup_moe_model_deep()
+-++++++
+-+++++         output_attentions = (
+-+++++             output_attentions
+-+++++             if output_attentions is not None
+-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++index 3cbf820e..d4c6b651 100644
+-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++@@ -18,7 +18,6 @@
+-+++++ # See the License for the specific language governing permissions and
+-+++++ # limitations under the License.
+-+++++ """MindSpore Qwen2MoE model."""
+-+++++-
+-+++++ import math
+-+++++ from typing import List, Optional, Tuple, Union
+-+++++ 
+-+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++++     TokenClassifierOutput,
+-+++++ )
+-+++++ from ...modeling_utils import PreTrainedModel
+-++++++from ...generation import GenerationMixin
+-+++++ from ....utils import logging
+-+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++++ 
+-+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++++         self.variance_epsilon = eps
+-+++++ 
+-+++++     def forward(self, hidden_states):
+-++++++        # @dwj
+-++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++++        # @lwx
+-++++++        # if not self.training :
+-++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++         input_dtype = hidden_states.dtype
+-+++++         hidden_states = hidden_states.to(mindspore.float32)
+-+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++++@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++++     """Rotates half the hidden dims of the input."""
+-+++++     x1 = x[..., : x.shape[-1] // 2]
+-+++++     x2 = x[..., x.shape[-1] // 2 :]
+-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++     return ops.cat((-x2, x1), dim=-1)
+-+++++ 
+-+++++ 
+-+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++++         self.config = config
+-+++++         self.hidden_size = config.hidden_size
+-+++++         self.intermediate_size = intermediate_size
+-++++++        
+-+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++++         self.act_fn = ACT2FN[config.hidden_act]
+-+++++ 
+-+++++     def forward(self, x):
+-+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++-
+-+++++ 
+-++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++++        # @lwx 
+-++++++        # gate_up_output = self.gate_up_proj(x)
+-++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++++++        # return self.down_proj(swiglu_output)
+-++++++
+-++++++    # def forward(self, x):
+-++++++    #     gate_proj_out = self.gate_proj(x)
+-++++++    #     up_proj_out = self.up_proj(x)
+-++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++++++    #     return self.down_proj(swiglu_out)
+-++++++        
+-+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++++     """
+-+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         use_cache: bool = False,
+-+++++         cache_position: Optional[mindspore.Tensor] = None,
+-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++        
+-++++++
+-+++++         bsz, q_len, _ = hidden_states.shape
+-+++++ 
+-+++++         query_states = self.q_proj(hidden_states)
+-+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++                     "with a layer index."
+-+++++                 )
+-+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++            if isinstance(past_key_value, StaticCache):
+-++++++                kv_seq_len = key_states.shape[-2]
+-++++++            else:
+-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++ 
+-+++++         if past_key_value is not None:
+-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++            
+-++++++            if isinstance(past_key_value, StaticCache):
+-++++++                kv_seq_len = key_states.shape[-2]
+-+++++ 
+-+++++         # repeat k/v heads if n_kv_heads < n_heads
+-+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++-
+-++++++        
+-+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++ 
+-+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++++-            raise ValueError(
+-+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++++-                f" {attn_weights.shape}"
+-+++++-            )
+-+++++-
+-+++++-        if attention_mask is not None:  # no matter the length, we just slice it
+-+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++++++        if attention_mask is not None:
+-++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++             attn_weights = attn_weights + causal_mask
+-+++++ 
+-+++++         # upcast attention to fp32
+-+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++ 
+-+++++         attn_output = self.o_proj(attn_output)
+-+++++-
+-++++++        # @lwx
+-++++++        
+-++++++        # max_seq_len = self.max_position_embeddings  # 2048
+-++++++
+-++++++        # if attention_mask is not None:
+-++++++        #     # attention_mask: [B, 1, Sq, Sk]
+-++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++++
+-++++++        #     # pad 到 [max_seq_len, max_seq_len]
+-++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++++        #     global_attention_mask = padded_mask
+-++++++        # else:
+-++++++        #     global_attention_mask = None
+-++++++
+-++++++
+-++++++        # sparse_mode=3
+-++++++        # attn_output = mindspore.ops.flash_attention_score(
+-++++++        #     query=query_states,
+-++++++        #     key=key_states,
+-++++++        #     value=value_states,
+-++++++        #     real_shift=None,
+-++++++        #     padding_mask=None,
+-++++++
+-++++++        #     head_num=self.num_heads,
+-++++++        #     attn_mask=global_attention_mask,
+-++++++        #     keep_prob=1.0 - self.attention_dropout,
+-++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++        #     input_layout="BNSD",
+-++++++        #     pre_tokens=2147483647,
+-++++++        #     next_tokens=2147483647,
+-++++++        #     inner_precise=0,
+-++++++        #     drop_mask=None,
+-++++++        #     prefix=None,
+-++++++        #     actual_seq_qlen=None,
+-++++++        #     actual_seq_kvlen=None,
+-++++++        #     sparse_mode=sparse_mode,
+-++++++        # )
+-+++++         if not output_attentions:
+-+++++             attn_weights = None
+-+++++ 
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++ 
+-++++++class Qwen2MoeFlashAttention(nn.Module):
+-++++++    """
+-++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++++
+-++++++    关键改动:
+-++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++++       直接传入原始的 key 和 value 张量效率更高。
+-++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++++    """
+-++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++++        super().__init__()
+-++++++        self.config = config
+-++++++        self.layer_idx = layer_idx
+-++++++        self.hidden_size = config.hidden_size
+-++++++        self.num_heads = config.num_attention_heads
+-++++++        self.head_dim = self.hidden_size // self.num_heads
+-++++++        self.num_key_value_heads = config.num_key_value_heads
+-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++        self.max_position_embeddings = config.max_position_embeddings
+-++++++        self.rope_theta = config.rope_theta
+-++++++        self.attention_dropout = config.attention_dropout
+-++++++
+-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++            raise ValueError(
+-++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++++            )
+-++++++
+-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++++
+-++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++++            self.head_dim,
+-++++++            max_position_embeddings=self.max_position_embeddings,
+-++++++            base=self.rope_theta,
+-++++++        )
+-++++++
+-++++++    def forward(
+-++++++        self,
+-++++++        hidden_states: mindspore.Tensor,
+-++++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++++        past_key_value: Optional[Cache] = None,
+-++++++        output_attentions: bool = False,
+-++++++        use_cache: bool = False,
+-++++++        cache_position: Optional[mindspore.Tensor] = None,
+-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++        bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++        # 1. 线性投射 Q, K, V
+-++++++        query_states = self.q_proj(hidden_states)
+-++++++        key_states = self.k_proj(hidden_states)
+-++++++        value_states = self.v_proj(hidden_states)
+-++++++
+-++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++        # 3. RoPE 旋转位置编码
+-++++++        kv_seq_len = key_states.shape[-2]
+-++++++        if past_key_value is not None:
+-++++++            if self.layer_idx is None:
+-++++++                raise ValueError(
+-++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++                    "with a layer index."
+-++++++                )
+-++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++++                if cache_position.shape[0] == 1:
+-++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++++                    kv_seq_len = past_seen_tokens + 1
+-++++++                else:
+-++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++++            else:
+-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++        # 4. KV 缓存更新
+-++++++        if past_key_value is not None:
+-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++            key_states, value_states = past_key_value.update(
+-++++++                key_states, value_states, self.layer_idx, cache_kwargs
+-++++++            )
+-++++++            
+-++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++                if cache_position.shape[0] == 1:
+-++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++++                    kv_seq_len = key_states.shape[-2]
+-++++++
+-++++++        # 5. [重要] 准备 Attention Mask
+-++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++++        fa_attention_mask = None
+-++++++        if attention_mask is not None:
+-++++++            # 截取与当前key长度匹配的部分
+-++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++++            fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++++        input_dtype = query_states.dtype
+-++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++++            query_states = query_states.to(mindspore.float16)
+-++++++            key_states = key_states.to(mindspore.float16)
+-++++++            value_states = value_states.to(mindspore.float16)
+-++++++
+-++++++        # 6. [核心] 调用 flash_attention_score 算子
+-++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++++        attn_output = mindspore.ops.flash_attention_score(
+-++++++            query=query_states,
+-++++++            key=key_states,
+-++++++            value=value_states,
+-++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++++            attn_mask=fa_attention_mask,
+-++++++            keep_prob=1.0 - self.attention_dropout,
+-++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++            input_layout="BNSD",
+-++++++            sparse_mode=0 # 使用 defaultMask 模式
+-++++++        )
+-++++++
+-++++++        # 恢复原始数据类型
+-++++++        attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++        # 7. 调整输出形状
+-++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++        attn_output = self.o_proj(attn_output)
+-++++++
+-++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++++        attn_weights = None
+-++++++        if output_attentions:
+-++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++        return attn_output, attn_weights, past_key_value
+-++++++
+-++++++    # def forward(
+-++++++    #     self,
+-++++++    #     hidden_states: mindspore.Tensor,
+-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++++    #     past_key_value: Optional[Cache] = None,
+-++++++    #     output_attentions: bool = False,
+-++++++    #     use_cache: bool = False,
+-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++    #     bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++    #     # 1. 线性投射 Q, K, V
+-++++++    #     query_states = self.q_proj(hidden_states)
+-++++++    #     key_states = self.k_proj(hidden_states)
+-++++++    #     value_states = self.v_proj(hidden_states)
+-++++++
+-++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++    #     # 3. RoPE 旋转位置编码
+-++++++    #     kv_seq_len = key_states.shape[-2]
+-++++++    #     if past_key_value is not None:
+-++++++    #         if self.layer_idx is None:
+-++++++    #             raise ValueError(
+-++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++    #                 "with a layer index."
+-++++++    #             )
+-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++    #     # 4. KV 缓存更新
+-++++++    #     if past_key_value is not None:
+-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++    #         key_states, value_states = past_key_value.update(
+-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++++    #         )
+-++++++
+-++++++    #     # 5. 准备 Attention Mask
+-++++++    #     fa_attention_mask = None
+-++++++    #     if attention_mask is not None:
+-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++    #         fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++++    #     input_dtype = query_states.dtype
+-++++++
+-++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++++    #         query=query_states,
+-++++++    #         key=key_states,
+-++++++    #         value=value_states,
+-++++++    #         head_num=self.num_heads,
+-++++++    #         attn_mask=fa_attention_mask,
+-++++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++    #         input_layout="BNSD",
+-++++++    #         sparse_mode=0,
+-++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++++    #         inner_precise=1
+-++++++    #     )
+-++++++
+-++++++    #     # 恢复原始数据类型
+-++++++    #     attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++    #     # 7. 调整输出形状
+-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++    #     attn_output = self.o_proj(attn_output)
+-++++++
+-++++++    #     attn_weights = None
+-++++++    #     if output_attentions:
+-++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++    #     return attn_output, attn_weights, past_key_value
+-++++++
+-++++++    # def forward(
+-++++++    #     self,
+-++++++    #     hidden_states: mindspore.Tensor,
+-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++++    #     past_key_value: Optional[Cache] = None,
+-++++++    #     output_attentions: bool = False,
+-++++++    #     use_cache: bool = False,
+-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++    #     bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++    #     query_states = self.q_proj(hidden_states)
+-++++++    #     key_states = self.k_proj(hidden_states)
+-++++++    #     value_states = self.v_proj(hidden_states)
+-++++++
+-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++    #     kv_seq_len = key_states.shape[-2]
+-++++++    #     if past_key_value is not None:
+-++++++    #         if self.layer_idx is None:
+-++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++    #     if past_key_value is not None:
+-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++    #         key_states, value_states = past_key_value.update(
+-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++++    #         )
+-++++++
+-++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++
+-++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++++    #     # <--- 修改结束 ---
+-++++++
+-++++++    #     fa_attention_mask = None
+-++++++    #     if attention_mask is not None:
+-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++    #         fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++    #     input_dtype = query_states.dtype
+-++++++
+-++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++++    #         key=key_states,
+-++++++    #         value=value_states,
+-++++++    #         head_num=self.num_heads,
+-++++++    #         attn_mask=fa_attention_mask,
+-++++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++++    #         input_layout="BNSD",
+-++++++    #         sparse_mode=0,
+-++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++++    #     )
+-++++++
+-++++++    #     attn_output = attn_output.to(input_dtype)
+-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++    #     attn_output = self.o_proj(attn_output)
+-++++++
+-++++++    #     attn_weights = None
+-++++++    #     if output_attentions:
+-++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++++
+-++++++    #     return attn_output, attn_weights, past_key_value
+-++++++
+-+++++ QWEN2MOE_ATTENTION_CLASSES = {
+-+++++     "eager": Qwen2MoeAttention,
+-++++++    "flash-attention": Qwen2MoeFlashAttention,
+-+++++ }
+-+++++ 
+-+++++ 
+-+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++ 
+-++++++    #@dwj
+-++++++    # 只遍历激活的专家，而非全部专家
+-+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++++-        # router_logits: (batch * sequence_length, n_experts)
+-+++++-        router_logits = self.gate(hidden_states)
+-+++++-
+-+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-        if self.norm_topk_prob:
+-+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        # we cast back to the input dtype
+-+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-
+-+++++-        final_hidden_states = ops.zeros(
+-+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++++-        )
+-+++++-
+-+++++-        # One hot encode the selected experts to create an expert mask
+-+++++-        # this will be used to easily index which expert is going to be sollicitated
+-+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++++-
+-+++++-        # Loop over all available experts in the model and perform the computation on each expert
+-+++++-        for expert_idx in range(self.num_experts):
+-+++++-            expert_layer = self.experts[expert_idx]
+-+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++++-
+-+++++-            # Index the correct hidden states and compute the expert hidden state for
+-+++++-            # the current expert. We need to make sure to multiply the output hidden
+-+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++++-            if 0 not in idx.shape:
+-+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++++-
+-+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++++-                # the `top_x` tensor here.
+-+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++++-
+-+++++-        shared_expert_output = self.shared_expert(hidden_states)
+-+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++++-
+-+++++-        final_hidden_states = final_hidden_states + shared_expert_output
+-++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++            num_tokens = hidden_states_reshaped.shape[0]
+-++++++
+-++++++            router_logits = self.gate(hidden_states_reshaped)
+-++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++
+-++++++            if self.norm_topk_prob:
+-++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++            
+-++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++++            flat_selected_experts = selected_experts.flatten()
+-++++++            
+-++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++++            token_indices = broadcasted_token_indices.flatten()
+-++++++            
+-++++++            active_experts = ops.unique(flat_selected_experts)
+-++++++            
+-++++++            for expert_idx_tensor in active_experts:
+-++++++                expert_idx = expert_idx_tensor.item()
+-++++++                expert_layer = self.experts[expert_idx]
+-++++++                
+-++++++                mask = (flat_selected_experts == expert_idx_tensor)
+-++++++                selected_token_indices = token_indices[mask]
+-++++++                selected_routing_weights = routing_weights.flatten()[mask]
+-++++++                
+-++++++                current_states = hidden_states_reshaped[selected_token_indices]
+-++++++                
+-++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++                
+-++++++                final_hidden_states = final_hidden_states.index_add(
+-++++++                    dim=0,
+-++++++                    index=selected_token_indices,
+-++++++                    source=expert_output.to(hidden_states.dtype)
+-++++++                )
+-++++++            
+-++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++++ 
+-+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++-        return final_hidden_states, router_logits
+-++++++            final_hidden_states = final_hidden_states + shared_expert_output
+-++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++++            
+-++++++            return final_hidden_states, router_logits
+-+++++ 
+-+++++ 
+-+++++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++ 
+-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++ 
+-++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++++
+-+++++         if (layer_idx not in config.mlp_only_layers) and (
+-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++++         ):
+-+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++++     _skip_keys_device_placement = "past_key_values"
+-+++++     _supports_cache_class = True
+-++++++#lwx
+-++++++    # _supports_static_cache = True
+-+++++ 
+-+++++     def _init_weights(self, module):
+-+++++         std = self.config.initializer_range
+-+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++++         return causal_mask
+-+++++ 
+-+++++ 
+-+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++     _tied_weights_keys = ["lm_head.weight"]
+-+++++ 
+-+++++     def __init__(self, config):
+-+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++         self.num_experts_per_tok = config.num_experts_per_tok
+-+++++         # Initialize weights and apply final processing
+-+++++         self.post_init()
+-++++++        # @lwx
+-++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++++++        #     self.generation_config.cache_implementation = "static"
+-++++++        self._warmed_up = False
+-++++++
+-++++++    def warmup_moe_model(self):
+-++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++++++        test_texts = [
+-++++++            "warmup short",
+-++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++++++        ]
+-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++++        if tokenizer is None:
+-++++++            from mindnlp.transformers import AutoTokenizer
+-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++++            self._warmup_tokenizer = tokenizer
+-++++++
+-++++++        for text in test_texts:
+-++++++            inputs = tokenizer(text, return_tensors="ms")
+-++++++            with mindspore._no_grad():
+-++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++++ 
+-+++++     def get_input_embeddings(self):
+-+++++         return self.model.embed_tokens
+-+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++         ```"""
+-++++++        if not self._warmed_up:
+-++++++            self._warmed_up = True
+-++++++            self.warmup_moe_model()
+-+++++ 
+-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++++         output_router_logits = (
+-+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++             }
+-+++++         )
+-+++++         return model_inputs
+-++++++# @lwx
+-++++++    # def _decode_one_tokens_logits(
+-++++++    #     self,
+-++++++    #     cur_token: mindspore.Tensor,
+-++++++    #     input_pos: Optional[mindspore.Tensor],
+-++++++    #     cache_position: mindspore.Tensor,
+-++++++    #     past_key_values: StaticCache,
+-++++++    # ) -> mindspore.Tensor:
+-++++++    #     """
+-++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++++++        
+-++++++    #     Args:
+-++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++++++    #         input_pos: 输入位置信息，可选
+-++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++++++            
+-++++++    #     Returns:
+-++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++++++    #     """
+-++++++    #     # 调用JIT编译的版本
+-++++++    #     return self.get_decode_one_tokens_logits(
+-++++++    #         cur_token=cur_token,
+-++++++    #         input_pos=input_pos,
+-++++++    #         cache_position=cache_position,
+-++++++    #         past_key_values=past_key_values,
+-++++++    #     )
+-++++++    
+-++++++    # @mindspore.jit(jit_level='O1')
+-++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++++++    #     """
+-++++++    #     JIT编译的函数，用于高效的单token解码
+-++++++    #     使用JIT编译优化以支持静态shape和高效执行
+-++++++        
+-++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++++++    #     """
+-++++++    #     outputs = self.model.forward(
+-++++++    #         input_ids=cur_token,
+-++++++    #         position_ids=input_pos,
+-++++++    #         cache_position=cache_position,
+-++++++    #         past_key_values=past_key_values,
+-++++++    #         use_cache=True,
+-++++++    #         return_dict=False,
+-++++++    #     )
+-++++++        
+-++++++    #     hidden_states = outputs[0]
+-++++++    #     logits = self.lm_head.forward(hidden_states)
+-++++++    #     logits = logits.float()
+-++++++        
+-++++++    #     return logits[:, -1, :]
+-++++++
+-++++++    # def _sample(
+-++++++    #     self,
+-++++++    #     input_ids: mindspore.Tensor,
+-++++++    #     logits_processor,
+-++++++    #     stopping_criteria,
+-++++++    #     generation_config,
+-++++++    #     synced_devices: bool,
+-++++++    #     streamer=None,
+-++++++    #     logits_warper=None,
+-++++++    #     **model_kwargs,
+-++++++    # ):
+-++++++    #     """
+-++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++++++    #     """
+-++++++    #     from ...generation.logits_process import LogitsProcessorList
+-++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++++++    #     from mindnlp.core import nn, ops, no_grad
+-++++++    #     import numpy as np
+-++++++        
+-++++++    #     # 检查是否使用 StaticCache
+-++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++++++    #     # 否则，直接调用父类方法
+-++++++    #     past_key_values = model_kwargs.get("past_key_values")
+-++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++++++
+-++++++    #     if not isinstance(past_key_values, StaticCache):
+-++++++    #         # 不使用 StaticCache，直接调用父类方法
+-++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++++++    #         return super()._sample(
+-++++++    #             input_ids=input_ids,
+-++++++    #             logits_processor=logits_processor,
+-++++++    #             stopping_criteria=stopping_criteria,
+-++++++    #             generation_config=generation_config,
+-++++++    #             synced_devices=synced_devices,
+-++++++    #             streamer=streamer,
+-++++++    #             logits_warper=logits_warper,
+-++++++    #             **model_kwargs,
+-++++++    #         )
+-++++++        
+-++++++    #     # 使用 StaticCache，进入自定义循环
+-++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++++++    #     pad_token_id = generation_config._pad_token_tensor
+-++++++    #     output_attentions = generation_config.output_attentions
+-++++++    #     output_hidden_states = generation_config.output_hidden_states
+-++++++    #     output_scores = generation_config.output_scores
+-++++++    #     output_logits = generation_config.output_logits
+-++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++++++    #     max_length = generation_config.max_length
+-++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++++++    #     do_sample = generation_config.do_sample
+-++++++        
+-++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++++++    #         raise ValueError(
+-++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++++++    #             f"{logits_warper})."
+-++++++    #         )
+-++++++        
+-++++++    #     # init attention / hidden states / scores tuples
+-++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++++++        
+-++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++++++    #         encoder_hidden_states = (
+-++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++++++    #         )
+-++++++        
+-++++++    #     # keep track of which sequences are already finished
+-++++++    #     batch_size, cur_len = input_ids.shape
+-++++++    #     this_peer_finished = False
+-++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++++++        
+-++++++    #     time_record = []
+-++++++    #     from ....utils.testing_utils import parse_flag_from_env
+-++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++++++        
+-++++++    #     while self._has_unfinished_sequences(
+-++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++++++    #     ):
+-++++++    #         if _record_time:
+-++++++    #             import time as time_module
+-++++++    #             infer_start = time_module.time()
+-++++++            
+-++++++    #         # prepare model inputs
+-++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++++++            
+-++++++    #         # prepare variable output controls
+-++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++++++            
+-++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++++++    #         cur_cache_position = model_inputs.get("cache_position")
+-++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++++++    #         cur_input_ids = model_inputs.get("input_ids")
+-++++++            
+-++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++++++    #             cur_cache_position is not None and 
+-++++++    #             len(cur_cache_position.shape) > 0 and
+-++++++    #             cur_cache_position.shape[0] == 1 and
+-++++++    #             cur_input_ids is not None and
+-++++++    #             cur_input_ids.shape[1] == 1):
+-++++++    #             # 使用 JIT 优化的单 token 解码
+-++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++++++    #             if not hasattr(self, '_jit_used'):
+-++++++    #                 self._jit_used = False
+-++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++++++                
+-++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++++++    #                 cur_token=cur_input_ids,
+-++++++    #                 input_pos=model_inputs.get("position_ids"),
+-++++++    #                 cache_position=cur_cache_position,
+-++++++    #                 past_key_values=cur_past_key_values,
+-++++++    #             )
+-++++++                
+-++++++    #             # 标记已使用JIT（用于后续判断）
+-++++++    #             if not self._jit_used:
+-++++++    #                 self._jit_used = True
+-++++++                
+-++++++    #             # 构造兼容的输出对象
+-++++++    #             class JitOptimizedOutput:
+-++++++    #                 def __init__(self, logits, config):
+-++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++++++    #                     self.config = config
+-++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++++++    #                     self.cross_attentions = None
+-++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++++++                
+-++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++++++    #         else:
+-++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++++++    #             outputs = self(**model_inputs, return_dict=True)
+-++++++            
+-++++++    #         if synced_devices and this_peer_finished:
+-++++++    #             continue
+-++++++            
+-++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++++++    #         next_token_logits = outputs.logits[:, -1, :]
+-++++++            
+-++++++    #         # pre-process distribution
+-++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++++++    #         if do_sample:
+-++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++++++            
+-++++++    #         # Store scores, attentions and hidden_states when required
+-++++++    #         if return_dict_in_generate:
+-++++++    #             if output_scores:
+-++++++    #                 scores += (next_token_scores,)
+-++++++    #             if output_logits:
+-++++++    #                 raw_logits += (next_token_logits,)
+-++++++    #             if output_attentions:
+-++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++++++    #                 if self.config.is_encoder_decoder:
+-++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++++++                
+-++++++    #             if output_hidden_states:
+-++++++    #                 hidden = (
+-++++++    #                     outputs.decoder_hidden_states
+-++++++    #                     if self.config.is_encoder_decoder
+-++++++    #                     else outputs.hidden_states
+-++++++    #                 )
+-++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++++++            
+-++++++    #         # token selection
+-++++++    #         if do_sample:
+-++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++++++    #         else:
+-++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++++++            
+-++++++    #         # finished sentences should have their next token be a padding token
+-++++++    #         if has_eos_stopping_criteria:
+-++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++++++            
+-++++++    #         # update generated ids, model inputs, and length for next step
+-++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++++++    #         if streamer is not None:
+-++++++    #             streamer.put(next_tokens)
+-++++++            
+-++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++++++    #             outputs,
+-++++++    #             model_kwargs,
+-++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++++++    #         )
+-++++++            
+-++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++++++    #         cur_len += 1
+-++++++            
+-++++++    #         if _record_time:
+-++++++    #             import time as time_module
+-++++++    #             infer_stop = time_module.time()
+-++++++    #             time_record.append(infer_stop - infer_start)
+-++++++            
+-++++++    #         del outputs
+-++++++        
+-++++++    #     average_infer_time = None
+-++++++    #     if time_record:
+-++++++    #         if len(time_record) > 1:
+-++++++    #             time_record.pop(0)
+-++++++    #         average_infer_time = sum(time_record) / len(time_record)
+-++++++    #         print(f'average inference time is: {average_infer_time}')
+-++++++    #         print(f'inference time record: {time_record}')
+-++++++        
+-++++++    #     if streamer is not None:
+-++++++    #         streamer.end()
+-++++++        
+-++++++    #     # 简单判断：打印是否使用了JIT路径
+-++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++++++    #     else:
+-++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++++++        
+-++++++    #     if return_dict_in_generate:
+-++++++    #         if self.config.is_encoder_decoder:
+-++++++    #             return GenerateEncoderDecoderOutput(
+-++++++    #                 sequences=input_ids,
+-++++++    #                 scores=scores,
+-++++++    #                 logits=raw_logits,
+-++++++    #                 encoder_attentions=encoder_attentions,
+-++++++    #                 encoder_hidden_states=encoder_hidden_states,
+-++++++    #                 decoder_attentions=decoder_attentions,
+-++++++    #                 cross_attentions=cross_attentions,
+-++++++    #                 decoder_hidden_states=decoder_hidden_states,
+-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++++    #                 average_infer_time=average_infer_time
+-++++++    #             )
+-++++++    #         else:
+-++++++    #             return GenerateDecoderOnlyOutput(
+-++++++    #                 sequences=input_ids,
+-++++++    #                 scores=scores,
+-++++++    #                 logits=raw_logits,
+-++++++    #                 attentions=decoder_attentions,
+-++++++    #                 hidden_states=decoder_hidden_states,
+-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++++    #                 average_infer_time=average_infer_time
+-++++++    #             )
+-++++++    #     else:
+-++++++    #         return input_ids
+-++++++            
+-++++++    # def _prepare_cache_for_generation(
+-++++++    #     self,
+-++++++    #     generation_config,
+-++++++    #     model_kwargs,
+-++++++    #     assistant_model,
+-++++++    #     batch_size,
+-++++++    #     max_cache_length,
+-++++++    # ):
+-++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++++++    #         generation_config.cache_implementation = "static"
+-++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++++++        
+-++++++    #     if generation_config.cache_implementation == "static":
+-++++++    #         base_required_from_max_length = generation_config.max_length + 1
+-++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++++++    #         min_cache_size = 50
+-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++++++    #         else:
+-++++++    #             max_cache_length = max(base_required, min_cache_size)
+-++++++            
+-++++++    #         original_max_cache_length = max_cache_length
+-++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++++++            
+-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++++    #             if max_cache_length > self.config.max_position_embeddings:
+-++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++++++        
+-++++++    #     result = super()._prepare_cache_for_generation(
+-++++++    #         generation_config=generation_config,
+-++++++    #         model_kwargs=model_kwargs,
+-++++++    #         assistant_model=assistant_model,
+-++++++    #         batch_size=batch_size,
+-++++++    #         max_cache_length=max_cache_length,
+-++++++    #     )
+-++++++        
+-++++++    #     if generation_config.cache_implementation == "static":
+-++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++++++    #         created_cache = model_kwargs.get(cache_name)
+-++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++++++        
+-++++++    #     return result
+-++++++
+-++++++
+-++++++
+-+++++ 
+-+++++ 
+-+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++++-- 
+-+++++2.27.0
+-+++++
+-++++-- 
+-++++2.27.0
+-++++
+-+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+++new file mode 100644
+-+++index 00000000..966529e4
+-+++--- /dev/null
+-++++++ b/patches/0003-20261106secondcommit.patch
+-+++@@ -0,0 +1,2769 @@
+-++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++Date: Thu, 6 Nov 2025 14:54:37 +0800
+-++++Subject: [PATCH 3/3] 20261106secondcommit
+-++++
+-++++---
+-++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+-++++ patches/0001-20251104commit.patch             | 1272 -----------------
+-++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+-++++ delete mode 100644 patches/0001-20251104commit.patch
+-++++
+-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++index 73773c22..2f9192bf 100644
+-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+-++++ 
+-++++ _CONFIG_FOR_DOC = "DeepseekConfig"
+-++++ 
+-+++++_attn_mask_cache = {}
+-+++++
+-+++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-+++++    q_len = batch_and_seq[1]
+-+++++    kv_len = batch_and_seq[1] + past_key_values_length 
+-+++++    key = (batch_and_seq[0], q_len, kv_len)
+-+++++
+-+++++    if key in _attn_mask_cache:
+-+++++        return _attn_mask_cache[key]
+-+++++
+-+++++    mask = _prepare_4d_causal_attention_mask(
+-+++++        attention_mask,
+-+++++        batch_and_seq,
+-+++++        inputs_embeds,
+-+++++        past_key_values_length,
+-+++++    )
+-+++++    _attn_mask_cache[key] = mask
+-+++++    return mask
+-++++ 
+-++++ def _get_unpad_data(attention_mask):
+-++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-++++         return final_output
+-++++ 
+-++++ 
+-++++-    @no_grad()
+-++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++-        expert_cache = ops.zeros_like(x)
+-++++-        idxs = flat_expert_indices.argsort()
+-++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++-        token_idxs = idxs // self.num_experts_per_tok
+-++++-
+-++++-        for i, end_idx in enumerate(tokens_per_expert):
+-++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++-            if start_idx == end_idx:
+-++++-                continue
+-++++-            expert = self.experts[i]
+-++++-            exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-            expert_tokens = x[exp_token_idx]
+-++++-            expert_out = expert(expert_tokens)
+-++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++-
+-++++-        return expert_cache
+-++++-        
+-++++     # @no_grad()
+-++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-    #     # expert_cache = torch.zeros_like(x)
+-++++-    #     # idxs = flat_expert_indices.argsort()
+-++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++-    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++-    #     #     if start_idx == end_idx:
+-++++-    #     #         continue
+-++++-    #     #     expert = self.experts[i]
+-++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-    #     #     expert_tokens = x[exp_token_idx]
+-++++-    #     #     expert_out = expert(expert_tokens)
+-++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++-    #     # return expert_cache
+-+++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++     #     expert_cache = ops.zeros_like(x)
+-++++     #     idxs = flat_expert_indices.argsort()
+-++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++ 
+-++++     #     return expert_cache
+-++++-    # @no_grad()
+-++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-    #     expert_cache = ops.zeros_like(x)
+-+++++        
+-+++++    @no_grad()
+-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++        """
+-+++++        优化版 MoE prefill：
+-+++++        - 批量张量化处理同一个 expert 的所有 token
+-+++++        - 跳过无 token 的专家
+-+++++        - 保持结果完全一致
+-+++++        """
+-+++++        # 初始化输出缓存
+-+++++        expert_cache = ops.zeros_like(x)
+-++++ 
+-++++-    #     # 排序保证顺序一致
+-++++-    #     idxs = flat_expert_indices.argsort()
+-++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++-    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++        # 排序（确保 scatter_add 位置对应原逻辑）
+-+++++        idxs = flat_expert_indices.argsort()
+-+++++        sorted_expert_indices = flat_expert_indices[idxs]
+-+++++        sorted_token_indices = idxs // self.num_experts_per_tok
+-++++ 
+-++++-    #     # 找出有 token 的专家
+-++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++        # 每个 expert 的 token 数
+-+++++        tokens_per_expert = sorted_expert_indices.bincount()
+-++++ 
+-++++-    #     for i in active_experts.tolist():
+-++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++-    #         end_idx = tokens_per_expert[i]
+-++++-    #         if start_idx == end_idx:  # 没有 token
+-++++-    #             continue
+-+++++        # 找出有 token 的专家
+-+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-++++ 
+-++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-    #         expert_tokens = x[exp_token_idx]
+-++++-    #         expert_out = self.experts[i](expert_tokens)
+-++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++        for expert_id in active_experts.tolist():
+-+++++            # 取该 expert 对应的排序后 token 区间
+-+++++            start = (tokens_per_expert[:expert_id]).sum().item()
+-+++++            end = start + tokens_per_expert[expert_id].item()
+-++++ 
+-++++-    #         expert_cache = mindspore.mint.scatter_add(
+-++++-    #             expert_cache,
+-++++-    #             0,
+-++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++-    #             expert_out
+-++++-    #         )
+-+++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-+++++            expert_tokens = x[token_idx]                     # 取输入向量
+-++++ 
+-++++-    #     return expert_cache
+-+++++            # 执行专家 MLP
+-+++++            expert_out = self.experts[expert_id](expert_tokens)
+-+++++
+-+++++            # 按权重缩放
+-+++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-+++++
+-+++++            # 回写到缓存（等价 scatter_add）
+-+++++            expert_cache = mindspore.mint.scatter_add(
+-+++++                expert_cache,
+-+++++                0,
+-+++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++                scaled_out
+-+++++            )
+-+++++
+-+++++        return expert_cache
+-+++++
+-+++++        # @no_grad()
+-+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++        #     # expert_cache = torch.zeros_like(x)
+-+++++        #     # idxs = flat_expert_indices.argsort()
+-+++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++        #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++        #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++        #     #     if start_idx == end_idx:
+-+++++        #     #         continue
+-+++++        #     #     expert = self.experts[i]
+-+++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++        #     #     expert_tokens = x[exp_token_idx]
+-+++++        #     #     expert_out = expert(expert_tokens)
+-+++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++        #     # return expert_cache
+-+++++        #     expert_cache = ops.zeros_like(x)
+-+++++        #     idxs = flat_expert_indices.argsort()
+-+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++        #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++        #     for i, end_idx in enumerate(tokens_per_expert):
+-+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++        #         if start_idx == end_idx:
+-+++++        #             continue
+-+++++        #         expert = self.experts[i]
+-+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++        #         expert_tokens = x[exp_token_idx]
+-+++++        #         expert_out = expert(expert_tokens)
+-+++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++
+-+++++        #     return expert_cache
+-+++++        # @no_grad()
+-+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++        #     expert_cache = ops.zeros_like(x)
+-+++++
+-+++++        #     # 排序保证顺序一致
+-+++++        #     idxs = flat_expert_indices.argsort()
+-+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++        #     token_idxs = idxs // self.num_experts_per_tok
+-+++++
+-+++++        #     # 找出有 token 的专家
+-+++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++
+-+++++        #     for i in active_experts.tolist():
+-+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++        #         end_idx = tokens_per_expert[i]
+-+++++        #         if start_idx == end_idx:  # 没有 token
+-+++++        #             continue
+-+++++
+-+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++        #         expert_tokens = x[exp_token_idx]
+-+++++        #         expert_out = self.experts[i](expert_tokens)
+-+++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++
+-+++++        #         expert_cache = mindspore.mint.scatter_add(
+-+++++        #             expert_cache,
+-+++++        #             0,
+-+++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++        #             expert_out
+-+++++        #         )
+-+++++
+-+++++        #     return expert_cache
+-++++ 
+-++++ 
+-++++ 
+-++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++-
+-++++ # class DeepseekFlashAttention(nn.Module):
+-++++ #     """
+-++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-+++++
+-++++ Deepseek_ATTENTION_CLASSES = {
+-++++     "eager": DeepseekAttention,
+-++++     "flash-attention": DeepseekFlashAttention,
+-++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-++++             )
+-++++         else:
+-++++             # 4d mask is passed through the layers
+-++++-            attention_mask = _prepare_4d_causal_attention_mask(
+-+++++            # attention_mask = _prepare_4d_causal_attention_mask(
+-+++++            #     attention_mask,
+-+++++            #     (batch_size, seq_length),
+-+++++            #     inputs_embeds,
+-+++++            #     past_key_values_length,
+-+++++            # )
+-+++++            #@dwj
+-+++++            attention_mask = get_cached_causal_mask(
+-++++                 attention_mask,
+-++++                 (batch_size, seq_length),
+-++++                 inputs_embeds,
+-++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-++++         self.warm_up = False
+-+++++        #@dwj
+-+++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-+++++            self.num_layers,
+-+++++            self.num_attention_heads,
+-+++++            self.head_dim,
+-+++++            batch_size=1,
+-+++++            max_length=self.max_length,
+-+++++            dtype=mindspore.float16
+-+++++        )
+-+++++
+-+++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-+++++        key_cache = []
+-+++++        value_cache = []
+-+++++        for _ in range(num_layers):
+-+++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-+++++            key_cache.append(k)
+-+++++            value_cache.append(v)
+-+++++        return key_cache, value_cache
+-+++++
+-++++ 
+-++++     def warmup_moe_model_deep(self):
+-++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++index bced285c..ebd7782e 100644
+-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+-++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-++++ 
+-++++-Long_Prompt = False
+-++++-PROMPT_LENGTH_THRESHOLD = 128
+-+++++Long_Prompt = 1
+-+++++LONG_PROMPT_LENGTH_THRESHOLD = 128
+-+++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-+++++
+-+++++_causal_mask_cache = {}
+-+++++
+-+++++def get_cached_causal_mask_with_cache_position(
+-+++++    attention_mask: mindspore.Tensor,
+-+++++    sequence_length: int,
+-+++++    target_length: int,
+-+++++    dtype: mindspore.dtype,
+-+++++    min_dtype: float,
+-+++++    cache_position: mindspore.Tensor,
+-+++++    batch_size: int,
+-+++++):
+-+++++    """
+-+++++    带缓存的 causal mask 构造函数
+-+++++    """
+-+++++    # q_len 是当前 query 长度
+-+++++    q_len = sequence_length
+-+++++    # kv_len 是 target_length
+-+++++    kv_len = target_length
+-+++++
+-+++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-+++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-+++++
+-+++++    if key in _causal_mask_cache:
+-+++++        return _causal_mask_cache[key]
+-+++++
+-+++++    # 调用原来的 mask 构造逻辑
+-+++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++        attention_mask,
+-+++++        sequence_length=sequence_length,
+-+++++        target_length=target_length,
+-+++++        dtype=dtype,
+-+++++        min_dtype=min_dtype,
+-+++++        cache_position=cache_position,
+-+++++        batch_size=batch_size,
+-+++++    )
+-+++++    # 缓存结果
+-+++++    _causal_mask_cache[key] = causal_mask
+-+++++    return causal_mask
+-++++ 
+-++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++++ 
+-++++ 
+-++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-+++++# class Qwen2MoeAttention(nn.Module):
+-+++++#     """
+-+++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+++++#     and "Generating Long Sequences with Sparse Transformers".
+-+++++#     """
+-+++++
+-+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++#         super().__init__()
+-+++++#         self.config = config
+-+++++#         self.layer_idx = layer_idx
+-+++++#         if layer_idx is None:
+-+++++#             logger.warning_once(
+-+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++++#                 "when creating this class."
+-+++++#             )
+-+++++
+-+++++#         self.hidden_size = config.hidden_size
+-+++++#         self.num_heads = config.num_attention_heads
+-+++++#         self.head_dim = self.hidden_size // self.num_heads
+-+++++#         self.num_key_value_heads = config.num_key_value_heads
+-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++#         self.max_position_embeddings = config.max_position_embeddings
+-+++++#         self.rope_theta = config.rope_theta
+-+++++#         self.is_causal = True
+-+++++#         self.attention_dropout = config.attention_dropout
+-+++++
+-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++#             raise ValueError(
+-+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-+++++#                 f" and `num_heads`: {self.num_heads})."
+-+++++#             )
+-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++
+-+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++#             self.head_dim,
+-+++++#             max_position_embeddings=self.max_position_embeddings,
+-+++++#             base=self.rope_theta,
+-+++++#         )
+-+++++
+-+++++#     def forward(
+-+++++#         self,
+-+++++#         hidden_states: mindspore.Tensor,
+-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-+++++#         position_ids: Optional[mindspore.Tensor] = None,
+-+++++#         past_key_value: Optional[Cache] = None,
+-+++++#         output_attentions: bool = False,
+-+++++#         use_cache: bool = False,
+-+++++#         cache_position: Optional[mindspore.Tensor] = None,
+-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++
+-+++++        
+-+++++
+-+++++#         bsz, q_len, _ = hidden_states.shape
+-+++++
+-+++++#         query_states = self.q_proj(hidden_states)
+-+++++#         key_states = self.k_proj(hidden_states)
+-+++++#         value_states = self.v_proj(hidden_states)
+-+++++
+-+++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++
+-+++++#         kv_seq_len = key_states.shape[-2]
+-+++++#         if past_key_value is not None:
+-+++++#             if self.layer_idx is None:
+-+++++#                 raise ValueError(
+-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++#                     "with a layer index."
+-+++++#                 )
+-+++++#             if isinstance(past_key_value, StaticCache):
+-+++++#                 kv_seq_len = key_states.shape[-2]
+-+++++#             else:
+-+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++
+-+++++#         if past_key_value is not None:
+-+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++            
+-+++++#             if isinstance(past_key_value, StaticCache):
+-+++++#                 kv_seq_len = key_states.shape[-2]
+-+++++
+-+++++#         # repeat k/v heads if n_kv_heads < n_heads
+-+++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++        
+-+++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++
+-+++++#         if attention_mask is not None:
+-+++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++#             attn_weights = attn_weights + causal_mask
+-+++++
+-+++++#         # upcast attention to fp32
+-+++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++++#         attn_output = ops.matmul(attn_weights, value_states)
+-+++++
+-+++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++++#             raise ValueError(
+-+++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+++++#                 f" {attn_output.shape}"
+-+++++#             )
+-+++++
+-+++++#         attn_output = ops.transpose(attn_output, 1, 2)
+-+++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++
+-+++++#         attn_output = self.o_proj(attn_output)
+-+++++#         # @lwx
+-+++++        
+-+++++#         # max_seq_len = self.max_position_embeddings  # 2048
+-+++++
+-+++++#         # if attention_mask is not None:
+-+++++#         #     # attention_mask: [B, 1, Sq, Sk]
+-+++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++
+-+++++#         #     # pad 到 [max_seq_len, max_seq_len]
+-+++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++#         #     global_attention_mask = padded_mask
+-+++++#         # else:
+-+++++#         #     global_attention_mask = None
+-+++++
+-+++++
+-+++++#         # sparse_mode=3
+-+++++#         # attn_output = mindspore.ops.flash_attention_score(
+-+++++#         #     query=query_states,
+-+++++#         #     key=key_states,
+-+++++#         #     value=value_states,
+-+++++#         #     real_shift=None,
+-+++++#         #     padding_mask=None,
+-+++++
+-+++++#         #     head_num=self.num_heads,
+-+++++#         #     attn_mask=global_attention_mask,
+-+++++#         #     keep_prob=1.0 - self.attention_dropout,
+-+++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++#         #     input_layout="BNSD",
+-+++++#         #     pre_tokens=2147483647,
+-+++++#         #     next_tokens=2147483647,
+-+++++#         #     inner_precise=0,
+-+++++#         #     drop_mask=None,
+-+++++#         #     prefix=None,
+-+++++#         #     actual_seq_qlen=None,
+-+++++#         #     actual_seq_kvlen=None,
+-+++++#         #     sparse_mode=sparse_mode,
+-+++++#         # )
+-+++++#         if not output_attentions:
+-+++++#             attn_weights = None
+-+++++
+-+++++#         return attn_output, attn_weights, past_key_value
+-+++++
+-++++ class Qwen2MoeAttention(nn.Module):
+-++++     """
+-++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-++++-    and "Generating Long Sequences with Sparse Transformers".
+-++++-    """
+-+++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+-++++ 
+-+++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-+++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-+++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-+++++
+-+++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-+++++    """
+-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++         super().__init__()
+-++++         self.config = config
+-++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-++++         if layer_idx is None:
+-++++             logger.warning_once(
+-++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++                 "when creating this class."
+-++++             )
+-++++ 
+-++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-++++         use_cache: bool = False,
+-++++         cache_position: Optional[mindspore.Tensor] = None,
+-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-
+-++++         
+-++++-
+-+++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-++++         bsz, q_len, _ = hidden_states.shape
+-++++ 
+-++++         query_states = self.q_proj(hidden_states)
+-++++         key_states = self.k_proj(hidden_states)
+-++++         value_states = self.v_proj(hidden_states)
+-++++ 
+-++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++-
+-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++        
+-++++         kv_seq_len = key_states.shape[-2]
+-++++         if past_key_value is not None:
+-++++-            if self.layer_idx is None:
+-++++-                raise ValueError(
+-++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-                    "with a layer index."
+-++++-                )
+-++++-            if isinstance(past_key_value, StaticCache):
+-++++-                kv_seq_len = key_states.shape[-2]
+-++++-            else:
+-++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++        
+-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++ 
+-++++         if past_key_value is not None:
+-++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++
+-+++++        # --- 2. 动态调度核心注意力计算 ---
+-+++++        global Long_Prompt
+-+++++        if Long_Prompt >= 1:
+-+++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-+++++            fa_attention_mask = None
+-+++++            if attention_mask is not None:
+-+++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++                fa_attention_mask = (mask_slice != 0)
+-+++++
+-+++++            attn_output = mindspore.ops.flash_attention_score(
+-+++++                query=query_states,
+-+++++                key=key_states,
+-+++++                value=value_states,
+-+++++                head_num=self.num_heads,
+-+++++                attn_mask=fa_attention_mask,
+-+++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-+++++                scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++                input_layout="BNSD",
+-+++++                sparse_mode=0,
+-+++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-+++++            )
+-++++             
+-++++-            if isinstance(past_key_value, StaticCache):
+-++++-                kv_seq_len = key_states.shape[-2]
+-+++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++            attn_output = self.o_proj(attn_output)
+-+++++            attn_weights = None
+-+++++            if output_attentions:
+-+++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-++++ 
+-++++-        # repeat k/v heads if n_kv_heads < n_heads
+-++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++-        
+-++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++        else:
+-+++++            # --- Eager Attention 路径 (用于短序列和解码) ---
+-+++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++            
+-+++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++ 
+-++++-        if attention_mask is not None:
+-++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++-            attn_weights = attn_weights + causal_mask
+-+++++            if attention_mask is not None:
+-+++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++                attn_weights = attn_weights + causal_mask
+-++++ 
+-++++-        # upcast attention to fp32
+-++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++++-        attn_output = ops.matmul(attn_weights, value_states)
+-+++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++++            attn_output = ops.matmul(attn_weights, value_states)
+-++++ 
+-++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++++-            raise ValueError(
+-++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-++++-                f" {attn_output.shape}"
+-++++-            )
+-+++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++++                raise ValueError(
+-+++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-+++++                )
+-++++ 
+-++++-        attn_output = ops.transpose(attn_output, 1, 2)
+-++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++            attn_output = ops.transpose(attn_output, 1, 2)
+-+++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++            attn_output = self.o_proj(attn_output)
+-++++ 
+-++++-        attn_output = self.o_proj(attn_output)
+-++++-        # @lwx
+-+++++            if not output_attentions:
+-+++++                attn_weights = None
+-++++         
+-++++-        # max_seq_len = self.max_position_embeddings  # 2048
+-++++-
+-++++-        # if attention_mask is not None:
+-++++-        #     # attention_mask: [B, 1, Sq, Sk]
+-++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++-
+-++++-        #     # pad 到 [max_seq_len, max_seq_len]
+-++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++-        #     global_attention_mask = padded_mask
+-++++-        # else:
+-++++-        #     global_attention_mask = None
+-++++-
+-++++-
+-++++-        # sparse_mode=3
+-++++-        # attn_output = mindspore.ops.flash_attention_score(
+-++++-        #     query=query_states,
+-++++-        #     key=key_states,
+-++++-        #     value=value_states,
+-++++-        #     real_shift=None,
+-++++-        #     padding_mask=None,
+-++++-
+-++++-        #     head_num=self.num_heads,
+-++++-        #     attn_mask=global_attention_mask,
+-++++-        #     keep_prob=1.0 - self.attention_dropout,
+-++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-        #     input_layout="BNSD",
+-++++-        #     pre_tokens=2147483647,
+-++++-        #     next_tokens=2147483647,
+-++++-        #     inner_precise=0,
+-++++-        #     drop_mask=None,
+-++++-        #     prefix=None,
+-++++-        #     actual_seq_qlen=None,
+-++++-        #     actual_seq_kvlen=None,
+-++++-        #     sparse_mode=sparse_mode,
+-++++-        # )
+-++++-        if not output_attentions:
+-++++-            attn_weights = None
+-++++-
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++-
+-++++ # class Qwen2MoeFlashAttention(nn.Module):
+-++++ #     """
+-++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+-++++ #             return final_hidden_states, router_logits
+-++++ 
+-++++ 
+-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-#     """
+-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-++++-#     """
+-++++-#     def __init__(self, config: Qwen2MoeConfig):
+-++++-#         super().__init__()
+-++++-#         self.num_experts = config.num_experts
+-++++-#         self.top_k = config.num_experts_per_tok
+-++++-#         self.norm_topk_prob = config.norm_topk_prob
+-++++-
+-++++-#         # 门控网络
+-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++-#         # 专家列表
+-++++-#         self.experts = nn.ModuleList(
+-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++-#         )
+-++++-#         # 共享专家
+-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_decode(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         【解码路径】针对 sequence_length=1 的极致优化。
+-++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-++++-#         """
+-++++-#         batch_size, hidden_dim = hidden_states.shape
+-++++-        
+-++++-#         expert_outputs_list = [
+-++++-#             ops.cat([
+-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++-#             ], dim=0) 
+-++++-#             for i in range(batch_size)
+-++++-#         ]
+-++++-        
+-++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-++++-#         # shape: (batch_size, top_k, hidden_dim)
+-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++-        
+-++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++-        
+-++++-#         return moe_output.squeeze(1)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_prefill(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         【预填充路径】针对 sequence_length > 1 的优化。
+-++++-#         按专家对 Token 进行分组，并进行批处理。
+-++++-#         """
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens = hidden_states.shape[0]
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-        
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-        
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-        
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-            
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-#             selected_token_indices = token_indices[mask]
+-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++-            
+-++++-#             current_states = hidden_states[selected_token_indices]
+-++++-            
+-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-            
+-++++-#             moe_output = moe_output.index_add(
+-++++-#                 dim=0,
+-++++-#                 index=selected_token_indices,
+-++++-#                 source=expert_output.to(hidden_states.dtype)
+-++++-#             )
+-++++-#         return moe_output
+-++++-
+-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         顶层 forward 方法，作为智能分发器。
+-++++-#         """
+-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        
+-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-#         router_logits = self.gate(hidden_states_reshaped)
+-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-
+-++++-#         if self.norm_topk_prob:
+-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        
+-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-        
+-++++-#         moe_output = None
+-++++-#         # 在推理时，根据序列长度选择最优路径
+-++++-#         if not self.training:
+-++++-#             if sequence_length == 1:
+-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++-#             else:
+-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++-#         else:
+-++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-++++-#             raise NotImplementedError("Training path is not implemented.")
+-++++-
+-++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-++++-        
+-++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-++++-        
+-++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-++++-        
+-++++-#         return final_hidden_states, router_logits
+-++++-
+-++++-
+-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-#     """
+-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-++++-#     """
+-++++-#     def __init__(self, config: Qwen2MoeConfig):
+-++++-#         super().__init__()
+-++++-#         self.num_experts = config.num_experts
+-++++-#         self.top_k = config.num_experts_per_tok
+-++++-#         self.norm_topk_prob = config.norm_topk_prob
+-++++-
+-++++-#         # 门控网络
+-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++-#         # 专家列表
+-++++-#         self.experts = nn.ModuleList(
+-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++-#         )
+-++++-#         # 共享专家
+-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_decode(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         batch_size, _ = hidden_states.shape
+-++++-#         expert_outputs_list = [
+-++++-#             ops.cat([
+-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++-#             ], dim=0) 
+-++++-#             for i in range(batch_size)
+-++++-#         ]
+-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++-#         return moe_output.squeeze(1)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_prefill(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens = hidden_states.shape[0]
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-        
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-#             selected_token_indices = token_indices[mask]
+-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++-#             current_states = hidden_states[selected_token_indices]
+-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-#             moe_output = moe_output.index_add(
+-++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++-#             )
+-++++-#         return moe_output
+-++++-
+-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         顶层 forward 方法，作为智能分发器。
+-++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-++++-#         """
+-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        
+-++++-#         # 1. 门控计算 (通用逻辑)
+-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-#         router_logits = self.gate(hidden_states_reshaped)
+-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-
+-++++-#         if self.norm_topk_prob:
+-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        
+-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-        
+-++++-#         # 2. 智能分发到最优 MoE 路径
+-++++-#         moe_output = None
+-++++-#         if not self.training:
+-++++-#             if sequence_length == 1:
+-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++-#             else:
+-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++-#         else:
+-++++-#             raise NotImplementedError("Training path is not implemented.")
+-++++-
+-++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++-        
+-++++-#         # 4. 合并 MoE 输出和共享专家输出
+-++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++-        
+-++++-#         # 5. 恢复原始形状并返回
+-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++-        
+-++++-#         return final_hidden_states, router_logits
+-++++-
+-++++-# prefill fastest
+-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-#     """
+-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-++++-#     """
+-++++-#     def __init__(self, config: Qwen2MoeConfig):
+-++++-#         super().__init__()
+-++++-#         self.num_experts = config.num_experts
+-++++-#         self.top_k = config.num_experts_per_tok
+-++++-#         self.norm_topk_prob = config.norm_topk_prob
+-++++-
+-++++-#         # 门控网络
+-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++-#         # 专家列表
+-++++-#         self.experts = nn.ModuleList(
+-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++-#         )
+-++++-#         # 共享专家
+-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_dispatch(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-++++-#         """
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens, _ = hidden_states.shape
+-++++-        
+-++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-#         flat_routing_weights = routing_weights.flatten()
+-++++-
+-++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-
+-++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-        
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-            
+-++++-#             # 找到所有分配给该专家的 token
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-            
+-++++-#             # 使用 mask 选取对应的 token 和权重
+-++++-#             current_token_indices = token_indices[mask]
+-++++-#             current_routing_weights = flat_routing_weights[mask]
+-++++-#             current_hidden_states = hidden_states[current_token_indices]
+-++++-            
+-++++-#             # 对这些 token 进行批处理
+-++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++-            
+-++++-#             # 使用 index_add 将结果精确地加回到对应位置
+-++++-#             moe_output = moe_output.index_add(
+-++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++-#             )
+-++++-#         return moe_output
+-++++-
+-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         顶层 forward 方法，作为智能分发器。
+-++++-#         """
+-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        
+-++++-#         # 1. 门控计算
+-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-#         router_logits = self.gate(hidden_states_reshaped)
+-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-
+-++++-#         if self.norm_topk_prob:
+-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        
+-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-        
+-++++-#         # 2. 调用统一的 MoE 计算内核
+-++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-++++-
+-++++-#         # 3. 统一处理共享专家
+-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++-        
+-++++-#         # 4. 合并输出
+-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++-        
+-++++-#         # 5. 恢复原始形状并返回
+-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++-        
+-++++-#         return final_hidden_states, router_logits
+-++++-
+-++++-
+-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-#     """
+-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++-#     【最终高性能与高精度版】：
+-++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-++++-#     3. 这样实现了速度和准确性的两全其美。
+-++++-#     """
+-++++-#     def __init__(self, config: Qwen2MoeConfig):
+-++++-#         super().__init__()
+-++++-#         self.num_experts = config.num_experts
+-++++-#         self.top_k = config.num_experts_per_tok
+-++++-#         self.norm_topk_prob = config.norm_topk_prob
+-++++-
+-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++-#         self.experts = nn.ModuleList(
+-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++-#         )
+-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_decode(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         【解码路径】极致优化版：bmm + 高精度累加。
+-++++-#         """
+-++++-#         original_dtype = hidden_states.dtype
+-++++-#         batch_size, _ = hidden_states.shape
+-++++-        
+-++++-#         expert_outputs_list = [
+-++++-#             ops.cat([
+-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++-#             ], dim=0) 
+-++++-#             for i in range(batch_size)
+-++++-#         ]
+-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++-
+-++++-#         # 在 float32 下执行 bmm，得到高精度结果
+-++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++-        
+-++++-#         # 将高精度结果转换回原始数据类型
+-++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-++++-        
+-++++-#         return moe_output
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_prefill(
+-++++-#         self, 
+-++++-#         hidden_states: mindspore.Tensor, 
+-++++-#         selected_experts: mindspore.Tensor, 
+-++++-#         routing_weights: mindspore.Tensor
+-++++-#     ) -> mindspore.Tensor:
+-++++-#         """
+-++++-#         【预填充路径】与原始实现一致，结果精确。
+-++++-#         """
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens, _ = hidden_states.shape
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-        
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-#             selected_token_indices = token_indices[mask]
+-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++-#             current_states = hidden_states[selected_token_indices]
+-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-#             moe_output = moe_output.index_add(
+-++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++-#             )
+-++++-#         return moe_output
+-++++-
+-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-        
+-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-#         router_logits = self.gate(hidden_states_reshaped)
+-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-
+-++++-#         if self.norm_topk_prob:
+-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        
+-++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-++++-#         # 如果模型主体是 float16，后续再转换
+-++++-        
+-++++-#         moe_output = None
+-++++-#         if not self.training:
+-++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-++++-#             # _moe_infer_decode 内部会处理好类型转换
+-++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-#             if sequence_length == 1:
+-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++-#             else:
+-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++-#         else:
+-++++-#             raise NotImplementedError("Training path is not implemented.")
+-++++-
+-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++-        
+-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++-        
+-++++-#         return final_hidden_states, router_logits
+-++++-    
+-++++-
+-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-#     """
+-++++-#     【融合版】一个混合专家模块，内置两种推理策略，
+-++++-#     由外部全局变量 `Long_Prompt` 控制：
+-++++-
+-++++-#     - if Long_Prompt is True:  【精度优先模式】
+-++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-++++-#       适用于处理长序列，避免误差累积。
+-++++-
+-++++-#     - if Long_Prompt is False: 【速度优先模式】
+-++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-++++-#     """
+-++++-#     def __init__(self, config: Qwen2MoeConfig):
+-++++-#         super().__init__()
+-++++-#         self.num_experts = config.num_experts
+-++++-#         self.top_k = config.num_experts_per_tok
+-++++-#         self.norm_topk_prob = config.norm_topk_prob
+-++++-
+-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++-#         self.experts = nn.ModuleList(
+-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++-#         )
+-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++-
+-++++-#     # --- 速度优先模式的辅助函数 ---
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++-#         original_dtype = hidden_states.dtype
+-++++-#         batch_size, _ = hidden_states.shape
+-++++-#         expert_outputs_list = [
+-++++-#             ops.cat([
+-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++-#             ], dim=0) 
+-++++-#             for i in range(batch_size)
+-++++-#         ]
+-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++-
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens, _ = hidden_states.shape
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-#             selected_token_indices = token_indices[mask]
+-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++-#             current_states = hidden_states[selected_token_indices]
+-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++-#         return moe_output
+-++++-
+-++++-#     # --- 精度优先模式的辅助函数 ---
+-++++-#     @no_grad()
+-++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++-#         moe_output = ops.zeros_like(hidden_states)
+-++++-#         num_tokens, _ = hidden_states.shape
+-++++-#         flat_selected_experts = selected_experts.flatten()
+-++++-#         flat_routing_weights = routing_weights.flatten()
+-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++-#         active_experts = ops.unique(flat_selected_experts)
+-++++-#         for expert_idx_tensor in active_experts:
+-++++-#             expert_idx = expert_idx_tensor.item()
+-++++-#             expert_layer = self.experts[expert_idx]
+-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++-#             current_token_indices = token_indices[mask]
+-++++-#             current_routing_weights = flat_routing_weights[mask]
+-++++-#             current_hidden_states = hidden_states[current_token_indices]
+-++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++-#         return moe_output
+-++++-
+-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-++++-#         global Long_Prompt
+-++++-
+-++++-#         # 1. 门控计算 (所有模式通用)
+-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-#         router_logits = self.gate(hidden_states_reshaped)
+-++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++++-#         if self.norm_topk_prob:
+-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-        
+-++++-#         moe_output = None
+-++++-#         if not self.training:
+-++++-#             # 根据 Long_Prompt 标志选择模式
+-++++-#             if Long_Prompt:
+-++++-#                 # --- 精度优先模式 ---
+-++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++-#             else:
+-++++-#                 # --- 速度优先模式 ---
+-++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++-#                 if sequence_length == 1:
+-++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++-#                 else:
+-++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++-#         else:
+-++++-#             raise NotImplementedError("Training path is not implemented.")
+-++++-
+-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++-        
+-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++-        
+-++++-#         return final_hidden_states, router_logits
+-++++-    
+-++++ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++     """
+-++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++ 
+-+++++    # @no_grad()
+-+++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++    #     num_tokens, _ = hidden_states.shape
+-+++++    #     flat_selected_experts = selected_experts.flatten()
+-+++++    #     sorted_expert_indices = flat_selected_experts.argsort()
+-+++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++++    #     original_token_indices = sorted_expert_indices // self.top_k
+-+++++    #     moe_output = ops.zeros_like(hidden_states)
+-+++++    #     current_token_offset = 0
+-+++++    #     for i in range(self.num_experts):
+-+++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++++    #         if expert_token_count == 0:
+-+++++    #             continue
+-+++++    #         end_offset = current_token_offset + expert_token_count
+-+++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++    #         current_token_offset += expert_token_count
+-+++++    #     return moe_output
+-+++++
+-++++     @no_grad()
+-++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++-        num_tokens, _ = hidden_states.shape
+-++++-        flat_selected_experts = selected_experts.flatten()
+-++++-        sorted_expert_indices = flat_selected_experts.argsort()
+-++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++++-        original_token_indices = sorted_expert_indices // self.top_k
+-+++++        """
+-+++++        优化版 MoE prefill (速度优先模式)：
+-+++++        - 批量张量化处理同一个 expert 的所有 token
+-+++++        - 跳过无 token 的专家
+-+++++        - 保持结果完全一致
+-+++++        """
+-++++         moe_output = ops.zeros_like(hidden_states)
+-++++-        current_token_offset = 0
+-++++-        for i in range(self.num_experts):
+-++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-++++-            if expert_token_count == 0:
+-++++-                continue
+-++++-            end_offset = current_token_offset + expert_token_count
+-++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++-            current_token_offset += expert_token_count
+-+++++
+-+++++        flat_selected_experts = selected_experts.flatten()
+-+++++        flat_routing_weights = routing_weights.flatten()
+-+++++
+-+++++        idxs = flat_selected_experts.argsort()
+-+++++        sorted_expert_indices = flat_selected_experts[idxs]
+-+++++        sorted_token_indices = idxs // self.top_k
+-+++++
+-+++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-+++++
+-+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+++++
+-+++++        for expert_id in active_experts.tolist():
+-+++++            start = int(tokens_per_expert[:expert_id].sum().item())
+-+++++            end = start + int(tokens_per_expert[expert_id].item())
+-+++++
+-+++++            token_idx = sorted_token_indices[start:end]
+-+++++            expert_tokens = hidden_states[token_idx]
+-+++++
+-+++++            expert_out = self.experts[expert_id](expert_tokens)
+-+++++
+-+++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-+++++
+-+++++            moe_output = mindspore.mint.scatter_add(
+-+++++                moe_output,
+-+++++                0,
+-+++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-+++++                scaled_out.to(hidden_states.dtype)
+-+++++            )
+-+++++
+-++++         return moe_output
+-++++ 
+-+++++
+-++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-++++     @no_grad()
+-++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++         
+-++++         moe_output = None
+-++++-        if Long_Prompt:
+-++++-            # --- 精度优先模式 (ACCURACY MODE) ---
+-++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        # if Long_Prompt==0:
+-+++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        # else:
+-+++++        #     # --- 速度优先模式 (SPEED MODE) ---
+-+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++        #     if sequence_length == 1:
+-+++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        #     else:
+-+++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++        
+-+++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++        if sequence_length == 1:
+-+++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++         else:
+-++++-            # --- 速度优先模式 (SPEED MODE) ---
+-++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++-            if sequence_length == 1:
+-++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++-            else:
+-++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++-        
+-+++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++    
+-++++ 
+-++++         # 3. 共享专家计算与合并 (所有模式通用)
+-++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++         
+-++++         return final_hidden_states, router_logits
+-++++ 
+-+++++
+-++++ class Qwen2MoeDecoderLayer(nn.Module):
+-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-++++         super().__init__()
+-++++         self.hidden_size = config.hidden_size
+-++++         
+-++++-        # if Long_Prompt:
+-++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++-        # else:
+-+++++        # if Long_Prompt == 2:
+-++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++        # else:
+-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++ 
+-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++ 
+-++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++++             )
+-++++ 
+-++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++        #     attention_mask,
+-+++++        #     sequence_length=sequence_length,
+-+++++        #     target_length=target_length,
+-+++++        #     dtype=dtype,
+-+++++        #     min_dtype=min_dtype,
+-+++++        #     cache_position=cache_position,
+-+++++        #     batch_size=input_tensor.shape[0],
+-+++++        # )
+-+++++        #@dwj
+-+++++        causal_mask = get_cached_causal_mask_with_cache_position(
+-++++             attention_mask,
+-++++             sequence_length=sequence_length,
+-++++             target_length=target_length,
+-++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-++++         """
+-++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-+++++        _causal_mask_cache.clear()
+-++++ 
+-++++         input_ids = kwargs.get("input_ids")
+-++++         if input_ids is None and args:
+-++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++ 
+-++++         if input_ids is not None:
+-++++             prompt_length = input_ids.shape[1]
+-++++-            
+-++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-++++-                Long_Prompt = True
+-+++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-+++++                Long_Prompt = 2
+-+++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-+++++                Long_Prompt = 0
+-++++             else:
+-++++-                Long_Prompt = False
+-+++++                Long_Prompt = 1
+-+++++
+-++++ 
+-++++         return super().generate(*args, **kwargs)
+-++++     
+-++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++             dtype = self.lm_head.weight.dtype
+-++++             min_dtype = float(ops.finfo(dtype).min)
+-++++ 
+-++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++            #     attention_mask,
+-+++++            #     sequence_length=sequence_length,
+-+++++            #     target_length=past_key_values.get_max_length(),
+-+++++            #     dtype=dtype,
+-+++++            #     min_dtype=min_dtype,
+-+++++            #     cache_position=cache_position,
+-+++++            #     batch_size=batch_size,
+-+++++            # )
+-+++++
+-+++++            #@dwj
+-+++++            attention_mask = get_cached_causal_mask_with_cache_position(
+-++++                 attention_mask,
+-++++                 sequence_length=sequence_length,
+-++++                 target_length=past_key_values.get_max_length(),
+-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++++deleted file mode 100644
+-++++index 6dfb5b93..00000000
+-++++--- a/patches/0001-20251104commit.patch
+-+++++++ /dev/null
+-++++@@ -1,1272 +0,0 @@
+-++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++++-From: Pinoeer-kingxi <13022943007@163.com>
+-++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++++-Subject: [PATCH] 20251104commit
+-++++-
+-++++----
+-++++- mindnlp/transformers/cache_utils.py           |  28 +-
+-++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++++- 3 files changed, 976 insertions(+), 87 deletions(-)
+-++++-
+-++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++++-index cadd2e04..02f8d4be 100644
+-++++---- a/mindnlp/transformers/cache_utils.py
+-++++-+++ b/mindnlp/transformers/cache_utils.py
+-++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++++-             # k_out[:, :, cache_position] = key_states
+-++++-             # v_out[:, :, cache_position] = value_states
+-++++--            if ON_ORANGE_PI:
+-++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++--            else:
+-++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++--
+-++++-+            # if ON_ORANGE_PI:
+-++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++-+            # else:
+-++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++++-+            if cache_position.ndim > 1:
+-++++-+                cache_position = cache_position.flatten()
+-++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++++-+                cache_position = cache_position.int()
+-++++-+            
+-++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++++-+            k_out[:, :, cache_position] = key_states
+-++++-+            v_out[:, :, cache_position] = value_states
+-++++-+            
+-++++-         return k_out, v_out
+-++++- 
+-++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++-index c695b944..d8303e45 100644
+-++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++++- def rotate_half(x):
+-++++-     """Rotates half the hidden dims of the input."""
+-++++--    x1 = x[..., : x.shape[-1] // 2]
+-++++--    x2 = x[..., x.shape[-1] // 2 :]
+-++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++-+    # x1 = x[..., : x.shape[-1] // 2]
+-++++-+    # x2 = x[..., x.shape[-1] // 2 :]
+-++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++-     return ops.cat((-x2, x1), dim=-1)
+-++++- 
+-++++- 
+-++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++++-         if self.training:
+-++++-             raise NotImplementedError("Training is not supported yet.")
+-++++-         else:
+-++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++--        if self.config.n_shared_experts is not None:
+-++++--            y = y + self.shared_experts(identity)
+-++++--        return y
+-++++-+            # @lwx
+-++++-+            if orig_shape[1] == 1:
+-++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++++-+                y=y.view(*orig_shape)
+-++++-+                if self.config.n_shared_experts is not None:
+-++++-+                    y = y + self.shared_experts(identity)
+-++++-+                return y
+-++++-+            else:
+-++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++++-+                if self.config.n_shared_experts is not None:
+-++++-+                    y = y + self.shared_experts(identity)
+-++++-+                return y
+-++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++-+        # if self.config.n_shared_experts is not None:
+-++++-+        #     y = y + self.shared_experts(identity)
+-++++-+        # return y
+-++++-+        
+-++++-+    @no_grad()
+-++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++-+
+-++++-+        expert_cache = ops.zeros_like(x)
+-++++-+        for i in range(self.num_experts_per_tok):
+-++++-+            expert_id = flat_expert_indices[i].item()
+-++++-+            weight = flat_expert_weights[i].item()
+-++++-+            expert = self.experts[expert_id]
+-++++-+            expert_out = expert(x)
+-++++-+            expert_cache += expert_out * weight
+-++++-+        return expert_cache
+-++++- 
+-++++-     @no_grad()
+-++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++--        # expert_cache = torch.zeros_like(x)
+-++++--        # idxs = flat_expert_indices.argsort()
+-++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++--        # token_idxs = idxs // self.num_experts_per_tok
+-++++--        # for i, end_idx in enumerate(tokens_per_expert):
+-++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++--        #     if start_idx == end_idx:
+-++++--        #         continue
+-++++--        #     expert = self.experts[i]
+-++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++--        #     expert_tokens = x[exp_token_idx]
+-++++--        #     expert_out = expert(expert_tokens)
+-++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++--        # return expert_cache
+-++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++-         expert_cache = ops.zeros_like(x)
+-++++-         idxs = flat_expert_indices.argsort()
+-++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++-         token_idxs = idxs // self.num_experts_per_tok
+-++++-+
+-++++-         for i, end_idx in enumerate(tokens_per_expert):
+-++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++-             if start_idx == end_idx:
+-++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++++-             expert_out = expert(expert_tokens)
+-++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++-+
+-++++-         return expert_cache
+-++++-+        
+-++++-+    # @no_grad()
+-++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-+    #     # expert_cache = torch.zeros_like(x)
+-++++-+    #     # idxs = flat_expert_indices.argsort()
+-++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++-+    #     #     if start_idx == end_idx:
+-++++-+    #     #         continue
+-++++-+    #     #     expert = self.experts[i]
+-++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-+    #     #     expert_tokens = x[exp_token_idx]
+-++++-+    #     #     expert_out = expert(expert_tokens)
+-++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++-+    #     # return expert_cache
+-++++-+    #     expert_cache = ops.zeros_like(x)
+-++++-+    #     idxs = flat_expert_indices.argsort()
+-++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-++++-+
+-++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++-+    #         if start_idx == end_idx:
+-++++-+    #             continue
+-++++-+    #         expert = self.experts[i]
+-++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-+    #         expert_tokens = x[exp_token_idx]
+-++++-+    #         expert_out = expert(expert_tokens)
+-++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++-+
+-++++-+    #     return expert_cache
+-++++-+    # @no_grad()
+-++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++-+    #     expert_cache = ops.zeros_like(x)
+-++++-+
+-++++-+    #     # 排序保证顺序一致
+-++++-+    #     idxs = flat_expert_indices.argsort()
+-++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-++++-+
+-++++-+    #     # 找出有 token 的专家
+-++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++-+
+-++++-+    #     for i in active_experts.tolist():
+-++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++-+    #         end_idx = tokens_per_expert[i]
+-++++-+    #         if start_idx == end_idx:  # 没有 token
+-++++-+    #             continue
+-++++-+
+-++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++-+    #         expert_tokens = x[exp_token_idx]
+-++++-+    #         expert_out = self.experts[i](expert_tokens)
+-++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++-+
+-++++-+    #         expert_cache = mindspore.mint.scatter_add(
+-++++-+    #             expert_cache,
+-++++-+    #             0,
+-++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++-+    #             expert_out
+-++++-+    #         )
+-++++-+
+-++++-+    #     return expert_cache
+-++++-+
+-++++-+
+-++++- 
+-++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++++- #     """
+-++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++- 
+-++++-         # Initialize weights and apply final processing
+-++++-         self.post_init()
+-++++-+        self.warm_up = False
+-++++-+
+-++++-+    def warmup_moe_model_deep(self):
+-++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++-+        test_texts = [
+-++++-+            "warmup short",
+-++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++++-+        ]
+-++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++-+        if tokenizer is None:
+-++++-+            from mindnlp.transformers import AutoTokenizer
+-++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++-+            self._warmup_tokenizer = tokenizer
+-++++-+
+-++++-+        for text in test_texts:
+-++++-+            inputs = tokenizer(text, return_tensors="ms")
+-++++-+            with mindspore._no_grad():
+-++++-+                _ = self(**inputs, use_cache=False)
+-++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++++- 
+-++++-     def get_input_embeddings(self):
+-++++-         return self.model.embed_tokens
+-++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++-         ```"""
+-++++-+        if not self.warm_up:
+-++++-+            self.warm_up = True
+-++++-+            self.warmup_moe_model_deep()
+-++++-+
+-++++-         output_attentions = (
+-++++-             output_attentions
+-++++-             if output_attentions is not None
+-++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++-index 3cbf820e..d4c6b651 100644
+-++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++-@@ -18,7 +18,6 @@
+-++++- # See the License for the specific language governing permissions and
+-++++- # limitations under the License.
+-++++- """MindSpore Qwen2MoE model."""
+-++++--
+-++++- import math
+-++++- from typing import List, Optional, Tuple, Union
+-++++- 
+-++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++++-     TokenClassifierOutput,
+-++++- )
+-++++- from ...modeling_utils import PreTrainedModel
+-++++-+from ...generation import GenerationMixin
+-++++- from ....utils import logging
+-++++- from .configuration_qwen2_moe import Qwen2MoeConfig
+-++++- 
+-++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++++-         self.variance_epsilon = eps
+-++++- 
+-++++-     def forward(self, hidden_states):
+-++++-+        # @dwj
+-++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++-+        # @lwx
+-++++-+        # if not self.training :
+-++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++-         input_dtype = hidden_states.dtype
+-++++-         hidden_states = hidden_states.to(mindspore.float32)
+-++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++++-@@ -234,6 +239,8 @@ def rotate_half(x):
+-++++-     """Rotates half the hidden dims of the input."""
+-++++-     x1 = x[..., : x.shape[-1] // 2]
+-++++-     x2 = x[..., x.shape[-1] // 2 :]
+-++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++-     return ops.cat((-x2, x1), dim=-1)
+-++++- 
+-++++- 
+-++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++++-         self.config = config
+-++++-         self.hidden_size = config.hidden_size
+-++++-         self.intermediate_size = intermediate_size
+-++++-+        
+-++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++++-         self.act_fn = ACT2FN[config.hidden_act]
+-++++- 
+-++++-     def forward(self, x):
+-++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++--
+-++++- 
+-++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++-+        # @lwx 
+-++++-+        # gate_up_output = self.gate_up_proj(x)
+-++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++++-+        # return self.down_proj(swiglu_output)
+-++++-+
+-++++-+    # def forward(self, x):
+-++++-+    #     gate_proj_out = self.gate_proj(x)
+-++++-+    #     up_proj_out = self.up_proj(x)
+-++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++++-+    #     return self.down_proj(swiglu_out)
+-++++-+        
+-++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++++-     """
+-++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++++-         use_cache: bool = False,
+-++++-         cache_position: Optional[mindspore.Tensor] = None,
+-++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-+
+-++++-+        
+-++++-+
+-++++-         bsz, q_len, _ = hidden_states.shape
+-++++- 
+-++++-         query_states = self.q_proj(hidden_states)
+-++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-                     "with a layer index."
+-++++-                 )
+-++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-+            if isinstance(past_key_value, StaticCache):
+-++++-+                kv_seq_len = key_states.shape[-2]
+-++++-+            else:
+-++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++- 
+-++++-         if past_key_value is not None:
+-++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++-+            
+-++++-+            if isinstance(past_key_value, StaticCache):
+-++++-+                kv_seq_len = key_states.shape[-2]
+-++++- 
+-++++-         # repeat k/v heads if n_kv_heads < n_heads
+-++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++--
+-++++-+        
+-++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++- 
+-++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++++--            raise ValueError(
+-++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++++--                f" {attn_weights.shape}"
+-++++--            )
+-++++--
+-++++--        if attention_mask is not None:  # no matter the length, we just slice it
+-++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++++-+        if attention_mask is not None:
+-++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++-             attn_weights = attn_weights + causal_mask
+-++++- 
+-++++-         # upcast attention to fp32
+-++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++- 
+-++++-         attn_output = self.o_proj(attn_output)
+-++++--
+-++++-+        # @lwx
+-++++-+        
+-++++-+        # max_seq_len = self.max_position_embeddings  # 2048
+-++++-+
+-++++-+        # if attention_mask is not None:
+-++++-+        #     # attention_mask: [B, 1, Sq, Sk]
+-++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++-+
+-++++-+        #     # pad 到 [max_seq_len, max_seq_len]
+-++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++-+        #     global_attention_mask = padded_mask
+-++++-+        # else:
+-++++-+        #     global_attention_mask = None
+-++++-+
+-++++-+
+-++++-+        # sparse_mode=3
+-++++-+        # attn_output = mindspore.ops.flash_attention_score(
+-++++-+        #     query=query_states,
+-++++-+        #     key=key_states,
+-++++-+        #     value=value_states,
+-++++-+        #     real_shift=None,
+-++++-+        #     padding_mask=None,
+-++++-+
+-++++-+        #     head_num=self.num_heads,
+-++++-+        #     attn_mask=global_attention_mask,
+-++++-+        #     keep_prob=1.0 - self.attention_dropout,
+-++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-+        #     input_layout="BNSD",
+-++++-+        #     pre_tokens=2147483647,
+-++++-+        #     next_tokens=2147483647,
+-++++-+        #     inner_precise=0,
+-++++-+        #     drop_mask=None,
+-++++-+        #     prefix=None,
+-++++-+        #     actual_seq_qlen=None,
+-++++-+        #     actual_seq_kvlen=None,
+-++++-+        #     sparse_mode=sparse_mode,
+-++++-+        # )
+-++++-         if not output_attentions:
+-++++-             attn_weights = None
+-++++- 
+-++++-         return attn_output, attn_weights, past_key_value
+-++++- 
+-++++- 
+-++++-+class Qwen2MoeFlashAttention(nn.Module):
+-++++-+    """
+-++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++-+
+-++++-+    关键改动:
+-++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++-+       直接传入原始的 key 和 value 张量效率更高。
+-++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++-+    """
+-++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++-+        super().__init__()
+-++++-+        self.config = config
+-++++-+        self.layer_idx = layer_idx
+-++++-+        self.hidden_size = config.hidden_size
+-++++-+        self.num_heads = config.num_attention_heads
+-++++-+        self.head_dim = self.hidden_size // self.num_heads
+-++++-+        self.num_key_value_heads = config.num_key_value_heads
+-++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++-+        self.max_position_embeddings = config.max_position_embeddings
+-++++-+        self.rope_theta = config.rope_theta
+-++++-+        self.attention_dropout = config.attention_dropout
+-++++-+
+-++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++-+            raise ValueError(
+-++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++-+            )
+-++++-+
+-++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++-+
+-++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++-+            self.head_dim,
+-++++-+            max_position_embeddings=self.max_position_embeddings,
+-++++-+            base=self.rope_theta,
+-++++-+        )
+-++++-+
+-++++-+    def forward(
+-++++-+        self,
+-++++-+        hidden_states: mindspore.Tensor,
+-++++-+        attention_mask: Optional[mindspore.Tensor] = None,
+-++++-+        position_ids: Optional[mindspore.Tensor] = None,
+-++++-+        past_key_value: Optional[Cache] = None,
+-++++-+        output_attentions: bool = False,
+-++++-+        use_cache: bool = False,
+-++++-+        cache_position: Optional[mindspore.Tensor] = None,
+-++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-+
+-++++-+        bsz, q_len, _ = hidden_states.shape
+-++++-+
+-++++-+        # 1. 线性投射 Q, K, V
+-++++-+        query_states = self.q_proj(hidden_states)
+-++++-+        key_states = self.k_proj(hidden_states)
+-++++-+        value_states = self.v_proj(hidden_states)
+-++++-+
+-++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+
+-++++-+        # 3. RoPE 旋转位置编码
+-++++-+        kv_seq_len = key_states.shape[-2]
+-++++-+        if past_key_value is not None:
+-++++-+            if self.layer_idx is None:
+-++++-+                raise ValueError(
+-++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-+                    "with a layer index."
+-++++-+                )
+-++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++-+                if cache_position.shape[0] == 1:
+-++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++-+                    kv_seq_len = past_seen_tokens + 1
+-++++-+                else:
+-++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++-+            else:
+-++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-+
+-++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-+
+-++++-+        # 4. KV 缓存更新
+-++++-+        if past_key_value is not None:
+-++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-+            key_states, value_states = past_key_value.update(
+-++++-+                key_states, value_states, self.layer_idx, cache_kwargs
+-++++-+            )
+-++++-+            
+-++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++-+                if cache_position.shape[0] == 1:
+-++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++-+                    kv_seq_len = key_states.shape[-2]
+-++++-+
+-++++-+        # 5. [重要] 准备 Attention Mask
+-++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++-+        fa_attention_mask = None
+-++++-+        if attention_mask is not None:
+-++++-+            # 截取与当前key长度匹配的部分
+-++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++-+            fa_attention_mask = (mask_slice != 0)
+-++++-+
+-++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++-+        input_dtype = query_states.dtype
+-++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++-+            query_states = query_states.to(mindspore.float16)
+-++++-+            key_states = key_states.to(mindspore.float16)
+-++++-+            value_states = value_states.to(mindspore.float16)
+-++++-+
+-++++-+        # 6. [核心] 调用 flash_attention_score 算子
+-++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++-+        attn_output = mindspore.ops.flash_attention_score(
+-++++-+            query=query_states,
+-++++-+            key=key_states,
+-++++-+            value=value_states,
+-++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++-+            attn_mask=fa_attention_mask,
+-++++-+            keep_prob=1.0 - self.attention_dropout,
+-++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-+            input_layout="BNSD",
+-++++-+            sparse_mode=0 # 使用 defaultMask 模式
+-++++-+        )
+-++++-+
+-++++-+        # 恢复原始数据类型
+-++++-+        attn_output = attn_output.to(input_dtype)
+-++++-+
+-++++-+        # 7. 调整输出形状
+-++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-+        attn_output = self.o_proj(attn_output)
+-++++-+
+-++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++-+        attn_weights = None
+-++++-+        if output_attentions:
+-++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++-+
+-++++-+        return attn_output, attn_weights, past_key_value
+-++++-+
+-++++-+    # def forward(
+-++++-+    #     self,
+-++++-+    #     hidden_states: mindspore.Tensor,
+-++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++-+    #     past_key_value: Optional[Cache] = None,
+-++++-+    #     output_attentions: bool = False,
+-++++-+    #     use_cache: bool = False,
+-++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-+
+-++++-+    #     bsz, q_len, _ = hidden_states.shape
+-++++-+
+-++++-+    #     # 1. 线性投射 Q, K, V
+-++++-+    #     query_states = self.q_proj(hidden_states)
+-++++-+    #     key_states = self.k_proj(hidden_states)
+-++++-+    #     value_states = self.v_proj(hidden_states)
+-++++-+
+-++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+
+-++++-+    #     # 3. RoPE 旋转位置编码
+-++++-+    #     kv_seq_len = key_states.shape[-2]
+-++++-+    #     if past_key_value is not None:
+-++++-+    #         if self.layer_idx is None:
+-++++-+    #             raise ValueError(
+-++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-+    #                 "with a layer index."
+-++++-+    #             )
+-++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-+
+-++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-+
+-++++-+    #     # 4. KV 缓存更新
+-++++-+    #     if past_key_value is not None:
+-++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-+    #         key_states, value_states = past_key_value.update(
+-++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++-+    #         )
+-++++-+
+-++++-+    #     # 5. 准备 Attention Mask
+-++++-+    #     fa_attention_mask = None
+-++++-+    #     if attention_mask is not None:
+-++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-+    #         fa_attention_mask = (mask_slice != 0)
+-++++-+
+-++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++-+    #     input_dtype = query_states.dtype
+-++++-+
+-++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-++++-+    #         query=query_states,
+-++++-+    #         key=key_states,
+-++++-+    #         value=value_states,
+-++++-+    #         head_num=self.num_heads,
+-++++-+    #         attn_mask=fa_attention_mask,
+-++++-+    #         keep_prob=1.0 - self.attention_dropout,
+-++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-+    #         input_layout="BNSD",
+-++++-+    #         sparse_mode=0,
+-++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++-+    #         inner_precise=1
+-++++-+    #     )
+-++++-+
+-++++-+    #     # 恢复原始数据类型
+-++++-+    #     attn_output = attn_output.to(input_dtype)
+-++++-+
+-++++-+    #     # 7. 调整输出形状
+-++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-+    #     attn_output = self.o_proj(attn_output)
+-++++-+
+-++++-+    #     attn_weights = None
+-++++-+    #     if output_attentions:
+-++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++-+
+-++++-+    #     return attn_output, attn_weights, past_key_value
+-++++-+
+-++++-+    # def forward(
+-++++-+    #     self,
+-++++-+    #     hidden_states: mindspore.Tensor,
+-++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++-+    #     past_key_value: Optional[Cache] = None,
+-++++-+    #     output_attentions: bool = False,
+-++++-+    #     use_cache: bool = False,
+-++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-+
+-++++-+    #     bsz, q_len, _ = hidden_states.shape
+-++++-+
+-++++-+    #     query_states = self.q_proj(hidden_states)
+-++++-+    #     key_states = self.k_proj(hidden_states)
+-++++-+    #     value_states = self.v_proj(hidden_states)
+-++++-+
+-++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-+
+-++++-+    #     kv_seq_len = key_states.shape[-2]
+-++++-+    #     if past_key_value is not None:
+-++++-+    #         if self.layer_idx is None:
+-++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-+
+-++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-+
+-++++-+    #     if past_key_value is not None:
+-++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++-+    #         key_states, value_states = past_key_value.update(
+-++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++-+    #         )
+-++++-+
+-++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++-+
+-++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++-+    #     # <--- 修改结束 ---
+-++++-+
+-++++-+    #     fa_attention_mask = None
+-++++-+    #     if attention_mask is not None:
+-++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++-+    #         fa_attention_mask = (mask_slice != 0)
+-++++-+
+-++++-+    #     input_dtype = query_states.dtype
+-++++-+
+-++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++-+    #         key=key_states,
+-++++-+    #         value=value_states,
+-++++-+    #         head_num=self.num_heads,
+-++++-+    #         attn_mask=fa_attention_mask,
+-++++-+    #         keep_prob=1.0 - self.attention_dropout,
+-++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++-+    #         input_layout="BNSD",
+-++++-+    #         sparse_mode=0,
+-++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++-+    #     )
+-++++-+
+-++++-+    #     attn_output = attn_output.to(input_dtype)
+-++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-+    #     attn_output = self.o_proj(attn_output)
+-++++-+
+-++++-+    #     attn_weights = None
+-++++-+    #     if output_attentions:
+-++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++-+
+-++++-+    #     return attn_output, attn_weights, past_key_value
+-++++-+
+-++++- QWEN2MOE_ATTENTION_CLASSES = {
+-++++-     "eager": Qwen2MoeAttention,
+-++++-+    "flash-attention": Qwen2MoeFlashAttention,
+-++++- }
+-++++- 
+-++++- 
+-++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++- 
+-++++-+    #@dwj
+-++++-+    # 只遍历激活的专家，而非全部专家
+-++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++--        hidden_states = hidden_states.view(-1, hidden_dim)
+-++++--        # router_logits: (batch * sequence_length, n_experts)
+-++++--        router_logits = self.gate(hidden_states)
+-++++--
+-++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++--        if self.norm_topk_prob:
+-++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++--        # we cast back to the input dtype
+-++++--        routing_weights = routing_weights.to(hidden_states.dtype)
+-++++--
+-++++--        final_hidden_states = ops.zeros(
+-++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++++--        )
+-++++--
+-++++--        # One hot encode the selected experts to create an expert mask
+-++++--        # this will be used to easily index which expert is going to be sollicitated
+-++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++++--
+-++++--        # Loop over all available experts in the model and perform the computation on each expert
+-++++--        for expert_idx in range(self.num_experts):
+-++++--            expert_layer = self.experts[expert_idx]
+-++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++++--
+-++++--            # Index the correct hidden states and compute the expert hidden state for
+-++++--            # the current expert. We need to make sure to multiply the output hidden
+-++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++++--            if 0 not in idx.shape:
+-++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++++--
+-++++--                # However `index_add_` only support torch tensors for indexing so we'll use
+-++++--                # the `top_x` tensor here.
+-++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++++--
+-++++--        shared_expert_output = self.shared_expert(hidden_states)
+-++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++++--
+-++++--        final_hidden_states = final_hidden_states + shared_expert_output
+-++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++-+            num_tokens = hidden_states_reshaped.shape[0]
+-++++-+
+-++++-+            router_logits = self.gate(hidden_states_reshaped)
+-++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++-+
+-++++-+            if self.norm_topk_prob:
+-++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++-+            
+-++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++-+            flat_selected_experts = selected_experts.flatten()
+-++++-+            
+-++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++-+            token_indices = broadcasted_token_indices.flatten()
+-++++-+            
+-++++-+            active_experts = ops.unique(flat_selected_experts)
+-++++-+            
+-++++-+            for expert_idx_tensor in active_experts:
+-++++-+                expert_idx = expert_idx_tensor.item()
+-++++-+                expert_layer = self.experts[expert_idx]
+-++++-+                
+-++++-+                mask = (flat_selected_experts == expert_idx_tensor)
+-++++-+                selected_token_indices = token_indices[mask]
+-++++-+                selected_routing_weights = routing_weights.flatten()[mask]
+-++++-+                
+-++++-+                current_states = hidden_states_reshaped[selected_token_indices]
+-++++-+                
+-++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++-+                
+-++++-+                final_hidden_states = final_hidden_states.index_add(
+-++++-+                    dim=0,
+-++++-+                    index=selected_token_indices,
+-++++-+                    source=expert_output.to(hidden_states.dtype)
+-++++-+                )
+-++++-+            
+-++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++- 
+-++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++--        return final_hidden_states, router_logits
+-++++-+            final_hidden_states = final_hidden_states + shared_expert_output
+-++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++-+            
+-++++-+            return final_hidden_states, router_logits
+-++++- 
+-++++- 
+-++++- class Qwen2MoeDecoderLayer(nn.Module):
+-++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++++- 
+-++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++- 
+-++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++-+
+-++++-         if (layer_idx not in config.mlp_only_layers) and (
+-++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++++-         ):
+-++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++++-     _skip_keys_device_placement = "past_key_values"
+-++++-     _supports_cache_class = True
+-++++-+#lwx
+-++++-+    # _supports_static_cache = True
+-++++- 
+-++++-     def _init_weights(self, module):
+-++++-         std = self.config.initializer_range
+-++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++++-         return causal_mask
+-++++- 
+-++++- 
+-++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++-     _tied_weights_keys = ["lm_head.weight"]
+-++++- 
+-++++-     def __init__(self, config):
+-++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++-         self.num_experts_per_tok = config.num_experts_per_tok
+-++++-         # Initialize weights and apply final processing
+-++++-         self.post_init()
+-++++-+        # @lwx
+-++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++++-+        #     self.generation_config.cache_implementation = "static"
+-++++-+        self._warmed_up = False
+-++++-+
+-++++-+    def warmup_moe_model(self):
+-++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++++-+        test_texts = [
+-++++-+            "warmup short",
+-++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++++-+        ]
+-++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++-+        if tokenizer is None:
+-++++-+            from mindnlp.transformers import AutoTokenizer
+-++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++-+            self._warmup_tokenizer = tokenizer
+-++++-+
+-++++-+        for text in test_texts:
+-++++-+            inputs = tokenizer(text, return_tensors="ms")
+-++++-+            with mindspore._no_grad():
+-++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++++- 
+-++++-     def get_input_embeddings(self):
+-++++-         return self.model.embed_tokens
+-++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++-         ```"""
+-++++-+        if not self._warmed_up:
+-++++-+            self._warmed_up = True
+-++++-+            self.warmup_moe_model()
+-++++- 
+-++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++++-         output_router_logits = (
+-++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++-             }
+-++++-         )
+-++++-         return model_inputs
+-++++-+# @lwx
+-++++-+    # def _decode_one_tokens_logits(
+-++++-+    #     self,
+-++++-+    #     cur_token: mindspore.Tensor,
+-++++-+    #     input_pos: Optional[mindspore.Tensor],
+-++++-+    #     cache_position: mindspore.Tensor,
+-++++-+    #     past_key_values: StaticCache,
+-++++-+    # ) -> mindspore.Tensor:
+-++++-+    #     """
+-++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++++-+        
+-++++-+    #     Args:
+-++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++++-+    #         input_pos: 输入位置信息，可选
+-++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++++-+            
+-++++-+    #     Returns:
+-++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++++-+    #     """
+-++++-+    #     # 调用JIT编译的版本
+-++++-+    #     return self.get_decode_one_tokens_logits(
+-++++-+    #         cur_token=cur_token,
+-++++-+    #         input_pos=input_pos,
+-++++-+    #         cache_position=cache_position,
+-++++-+    #         past_key_values=past_key_values,
+-++++-+    #     )
+-++++-+    
+-++++-+    # @mindspore.jit(jit_level='O1')
+-++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++++-+    #     """
+-++++-+    #     JIT编译的函数，用于高效的单token解码
+-++++-+    #     使用JIT编译优化以支持静态shape和高效执行
+-++++-+        
+-++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++++-+    #     """
+-++++-+    #     outputs = self.model.forward(
+-++++-+    #         input_ids=cur_token,
+-++++-+    #         position_ids=input_pos,
+-++++-+    #         cache_position=cache_position,
+-++++-+    #         past_key_values=past_key_values,
+-++++-+    #         use_cache=True,
+-++++-+    #         return_dict=False,
+-++++-+    #     )
+-++++-+        
+-++++-+    #     hidden_states = outputs[0]
+-++++-+    #     logits = self.lm_head.forward(hidden_states)
+-++++-+    #     logits = logits.float()
+-++++-+        
+-++++-+    #     return logits[:, -1, :]
+-++++-+
+-++++-+    # def _sample(
+-++++-+    #     self,
+-++++-+    #     input_ids: mindspore.Tensor,
+-++++-+    #     logits_processor,
+-++++-+    #     stopping_criteria,
+-++++-+    #     generation_config,
+-++++-+    #     synced_devices: bool,
+-++++-+    #     streamer=None,
+-++++-+    #     logits_warper=None,
+-++++-+    #     **model_kwargs,
+-++++-+    # ):
+-++++-+    #     """
+-++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++++-+    #     """
+-++++-+    #     from ...generation.logits_process import LogitsProcessorList
+-++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++++-+    #     from mindnlp.core import nn, ops, no_grad
+-++++-+    #     import numpy as np
+-++++-+        
+-++++-+    #     # 检查是否使用 StaticCache
+-++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++++-+    #     # 否则，直接调用父类方法
+-++++-+    #     past_key_values = model_kwargs.get("past_key_values")
+-++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++++-+
+-++++-+    #     if not isinstance(past_key_values, StaticCache):
+-++++-+    #         # 不使用 StaticCache，直接调用父类方法
+-++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++++-+    #         return super()._sample(
+-++++-+    #             input_ids=input_ids,
+-++++-+    #             logits_processor=logits_processor,
+-++++-+    #             stopping_criteria=stopping_criteria,
+-++++-+    #             generation_config=generation_config,
+-++++-+    #             synced_devices=synced_devices,
+-++++-+    #             streamer=streamer,
+-++++-+    #             logits_warper=logits_warper,
+-++++-+    #             **model_kwargs,
+-++++-+    #         )
+-++++-+        
+-++++-+    #     # 使用 StaticCache，进入自定义循环
+-++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++++-+    #     pad_token_id = generation_config._pad_token_tensor
+-++++-+    #     output_attentions = generation_config.output_attentions
+-++++-+    #     output_hidden_states = generation_config.output_hidden_states
+-++++-+    #     output_scores = generation_config.output_scores
+-++++-+    #     output_logits = generation_config.output_logits
+-++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++++-+    #     max_length = generation_config.max_length
+-++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++++-+    #     do_sample = generation_config.do_sample
+-++++-+        
+-++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++++-+    #         raise ValueError(
+-++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++++-+    #             f"{logits_warper})."
+-++++-+    #         )
+-++++-+        
+-++++-+    #     # init attention / hidden states / scores tuples
+-++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++++-+        
+-++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++++-+    #         encoder_hidden_states = (
+-++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++++-+    #         )
+-++++-+        
+-++++-+    #     # keep track of which sequences are already finished
+-++++-+    #     batch_size, cur_len = input_ids.shape
+-++++-+    #     this_peer_finished = False
+-++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++++-+        
+-++++-+    #     time_record = []
+-++++-+    #     from ....utils.testing_utils import parse_flag_from_env
+-++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++++-+        
+-++++-+    #     while self._has_unfinished_sequences(
+-++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++++-+    #     ):
+-++++-+    #         if _record_time:
+-++++-+    #             import time as time_module
+-++++-+    #             infer_start = time_module.time()
+-++++-+            
+-++++-+    #         # prepare model inputs
+-++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++++-+            
+-++++-+    #         # prepare variable output controls
+-++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++++-+            
+-++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++++-+    #         cur_cache_position = model_inputs.get("cache_position")
+-++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++++-+    #         cur_input_ids = model_inputs.get("input_ids")
+-++++-+            
+-++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++++-+    #             cur_cache_position is not None and 
+-++++-+    #             len(cur_cache_position.shape) > 0 and
+-++++-+    #             cur_cache_position.shape[0] == 1 and
+-++++-+    #             cur_input_ids is not None and
+-++++-+    #             cur_input_ids.shape[1] == 1):
+-++++-+    #             # 使用 JIT 优化的单 token 解码
+-++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++++-+    #             if not hasattr(self, '_jit_used'):
+-++++-+    #                 self._jit_used = False
+-++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++++-+                
+-++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++++-+    #                 cur_token=cur_input_ids,
+-++++-+    #                 input_pos=model_inputs.get("position_ids"),
+-++++-+    #                 cache_position=cur_cache_position,
+-++++-+    #                 past_key_values=cur_past_key_values,
+-++++-+    #             )
+-++++-+                
+-++++-+    #             # 标记已使用JIT（用于后续判断）
+-++++-+    #             if not self._jit_used:
+-++++-+    #                 self._jit_used = True
+-++++-+                
+-++++-+    #             # 构造兼容的输出对象
+-++++-+    #             class JitOptimizedOutput:
+-++++-+    #                 def __init__(self, logits, config):
+-++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++++-+    #                     self.config = config
+-++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++++-+    #                     self.cross_attentions = None
+-++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++++-+                
+-++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++++-+    #         else:
+-++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++++-+    #             outputs = self(**model_inputs, return_dict=True)
+-++++-+            
+-++++-+    #         if synced_devices and this_peer_finished:
+-++++-+    #             continue
+-++++-+            
+-++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++++-+    #         next_token_logits = outputs.logits[:, -1, :]
+-++++-+            
+-++++-+    #         # pre-process distribution
+-++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++++-+    #         if do_sample:
+-++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++++-+            
+-++++-+    #         # Store scores, attentions and hidden_states when required
+-++++-+    #         if return_dict_in_generate:
+-++++-+    #             if output_scores:
+-++++-+    #                 scores += (next_token_scores,)
+-++++-+    #             if output_logits:
+-++++-+    #                 raw_logits += (next_token_logits,)
+-++++-+    #             if output_attentions:
+-++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++++-+    #                 if self.config.is_encoder_decoder:
+-++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++++-+                
+-++++-+    #             if output_hidden_states:
+-++++-+    #                 hidden = (
+-++++-+    #                     outputs.decoder_hidden_states
+-++++-+    #                     if self.config.is_encoder_decoder
+-++++-+    #                     else outputs.hidden_states
+-++++-+    #                 )
+-++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++++-+            
+-++++-+    #         # token selection
+-++++-+    #         if do_sample:
+-++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++++-+    #         else:
+-++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++++-+            
+-++++-+    #         # finished sentences should have their next token be a padding token
+-++++-+    #         if has_eos_stopping_criteria:
+-++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++++-+            
+-++++-+    #         # update generated ids, model inputs, and length for next step
+-++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++++-+    #         if streamer is not None:
+-++++-+    #             streamer.put(next_tokens)
+-++++-+            
+-++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++++-+    #             outputs,
+-++++-+    #             model_kwargs,
+-++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++++-+    #         )
+-++++-+            
+-++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++++-+    #         cur_len += 1
+-++++-+            
+-++++-+    #         if _record_time:
+-++++-+    #             import time as time_module
+-++++-+    #             infer_stop = time_module.time()
+-++++-+    #             time_record.append(infer_stop - infer_start)
+-++++-+            
+-++++-+    #         del outputs
+-++++-+        
+-++++-+    #     average_infer_time = None
+-++++-+    #     if time_record:
+-++++-+    #         if len(time_record) > 1:
+-++++-+    #             time_record.pop(0)
+-++++-+    #         average_infer_time = sum(time_record) / len(time_record)
+-++++-+    #         print(f'average inference time is: {average_infer_time}')
+-++++-+    #         print(f'inference time record: {time_record}')
+-++++-+        
+-++++-+    #     if streamer is not None:
+-++++-+    #         streamer.end()
+-++++-+        
+-++++-+    #     # 简单判断：打印是否使用了JIT路径
+-++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++++-+    #     else:
+-++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++++-+        
+-++++-+    #     if return_dict_in_generate:
+-++++-+    #         if self.config.is_encoder_decoder:
+-++++-+    #             return GenerateEncoderDecoderOutput(
+-++++-+    #                 sequences=input_ids,
+-++++-+    #                 scores=scores,
+-++++-+    #                 logits=raw_logits,
+-++++-+    #                 encoder_attentions=encoder_attentions,
+-++++-+    #                 encoder_hidden_states=encoder_hidden_states,
+-++++-+    #                 decoder_attentions=decoder_attentions,
+-++++-+    #                 cross_attentions=cross_attentions,
+-++++-+    #                 decoder_hidden_states=decoder_hidden_states,
+-++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++-+    #                 average_infer_time=average_infer_time
+-++++-+    #             )
+-++++-+    #         else:
+-++++-+    #             return GenerateDecoderOnlyOutput(
+-++++-+    #                 sequences=input_ids,
+-++++-+    #                 scores=scores,
+-++++-+    #                 logits=raw_logits,
+-++++-+    #                 attentions=decoder_attentions,
+-++++-+    #                 hidden_states=decoder_hidden_states,
+-++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++-+    #                 average_infer_time=average_infer_time
+-++++-+    #             )
+-++++-+    #     else:
+-++++-+    #         return input_ids
+-++++-+            
+-++++-+    # def _prepare_cache_for_generation(
+-++++-+    #     self,
+-++++-+    #     generation_config,
+-++++-+    #     model_kwargs,
+-++++-+    #     assistant_model,
+-++++-+    #     batch_size,
+-++++-+    #     max_cache_length,
+-++++-+    # ):
+-++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++++-+    #         generation_config.cache_implementation = "static"
+-++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++++-+        
+-++++-+    #     if generation_config.cache_implementation == "static":
+-++++-+    #         base_required_from_max_length = generation_config.max_length + 1
+-++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++++-+    #         min_cache_size = 50
+-++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++++-+    #         else:
+-++++-+    #             max_cache_length = max(base_required, min_cache_size)
+-++++-+            
+-++++-+    #         original_max_cache_length = max_cache_length
+-++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++++-+            
+-++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++-+    #             if max_cache_length > self.config.max_position_embeddings:
+-++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++++-+        
+-++++-+    #     result = super()._prepare_cache_for_generation(
+-++++-+    #         generation_config=generation_config,
+-++++-+    #         model_kwargs=model_kwargs,
+-++++-+    #         assistant_model=assistant_model,
+-++++-+    #         batch_size=batch_size,
+-++++-+    #         max_cache_length=max_cache_length,
+-++++-+    #     )
+-++++-+        
+-++++-+    #     if generation_config.cache_implementation == "static":
+-++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++++-+    #         created_cache = model_kwargs.get(cache_name)
+-++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++++-+        
+-++++-+    #     return result
+-++++-+
+-++++-+
+-++++-+
+-++++- 
+-++++- 
+-++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++++--- 
+-++++-2.27.0
+-++++-
+-++++-- 
+-++++2.27.0
+-++++
+-+++-- 
+-+++2.27.0
+-+++
+-++-- 
+-++2.27.0
+-++
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+diff --git a/patches/0008-moe-change.patch b/patches/0008-moe-change.patch
+deleted file mode 100644
+index 349f1429..00000000
+--- a/patches/0008-moe-change.patch
++++ /dev/null
+@@ -1,8789 +0,0 @@
+-From 45ba3bbc411b64cbffd547fa3d66bce9545639dd Mon Sep 17 00:00:00 2001
+-From: Pinoeer-kingxi <13022943007@163.com>
+-Date: Sun, 9 Nov 2025 00:50:01 +0800
+-Subject: [PATCH 8/8] moe change
+-
+----
+- .../models/deepseek/modeling_deepseek.py      |  433 +-
+- .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
+- patches/0001-20251104commit.patch             |    2 +-
+- patches/0002-20251106commit.patch             |    2 +-
+- patches/0003-20261106secondcommit.patch       |    2 +-
+- patches/0004-20251106change.patch             |    2 +-
+- patches/0005-20251107001commit.patch          |    2 +-
+- patches/0006-20251107002commit.patch          |    2 +-
+- patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
+- 9 files changed, 8510 insertions(+), 55 deletions(-)
+- create mode 100644 patches/0007-20251107003commit.patch
+-
+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-index ff631974..0af29305 100644
+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-@@ -19,8 +19,10 @@
+- # limitations under the License.
+- """ MindNLP DeepSeek model."""
+- import math
+-+import time
+- import warnings
+- from typing import List, Optional, Tuple, Union
+-+from mindspore import mint
+- import mindspore
+- from mindnlp.core import nn, ops, no_grad
+- from mindnlp.core.nn import functional as F
+-@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
+- 
+- _CONFIG_FOR_DOC = "DeepseekConfig"
+- 
+-+Long_Prompt = 1
+-+LONG_PROMPT_LENGTH_THRESHOLD = 128
+-+SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-+
+- _attn_mask_cache = {}
+- 
+- def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
+-         return topk_idx, topk_weight, aux_loss
+- 
+- 
+-+bincount_op = mindspore.ops.Bincount()
+-+
+- class DeepseekMoE(nn.Module):
+-     """
+-     A mixed expert module containing shared experts.
+-@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
+-                     y = y + self.shared_experts(identity)
+-                 return y
+-             else:
+--                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+                if Long_Prompt == 0:
+-+                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+                else:
+-+                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-                 if self.config.n_shared_experts is not None:
+-                     y = y + self.shared_experts(identity)
+-                 return y
+-@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
+-         # if self.config.n_shared_experts is not None:
+-         #     y = y + self.shared_experts(identity)
+-         # return y
+--        
+-+    
+-+    
+-+    
+-+    # lwx
+-+    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
+-+    #     """
+-+    #     如果 expert_ids 为 None，走单专家逻辑；
+-+    #     如果有，多专家批量处理，保证和原逻辑一致。
+-+    #     """
+-+    #     if expert_ids is None:
+-+    #         # 原单专家逻辑
+-+    #         if self.config.pretraining_tp > 1:
+-+    #             slice = self.intermediate_size // self.config.pretraining_tp
+-+    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
+-+    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
+-+    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
+-+    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
+-+    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
+-+    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
+-+    #                                for i in range(self.config.pretraining_tp)], dim=-1)
+-+    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
+-+    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
+-+    #                          for i in range(self.config.pretraining_tp)]
+-+    #             down_proj = sum(down_proj)
+-+    #         else:
+-+    #             down_proj = self.down_proj(
+-+    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
+-+    #             )
+-+    #         return down_proj
+-+
+-+    #     # ====== 批量多专家路径 ======
+-+    #     hidden_size = x.shape[-1]
+-+
+-+    #     # 按 token expert_ids 选权重
+-+    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
+-+    #     up_weights   = self.up_proj.weight[expert_ids]
+-+    #     down_weights = self.down_proj.weight[expert_ids]
+-+
+-+    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
+-+    #     if self.config.pretraining_tp > 1:
+-+    #         outputs = []
+-+    #         slice = self.intermediate_size // self.config.pretraining_tp
+-+    #         for i in range(self.config.pretraining_tp):
+-+    #             # 每个 slice 单独计算
+-+    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
+-+    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
+-+    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
+-+    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
+-+    #             outputs.append(down_proj_out)
+-+    #         return sum(outputs)
+-+    #     else:
+-+    #         gate_proj_out = F.linear(x, gate_weights)
+-+    #         up_proj_out   = F.linear(x, up_weights)
+-+    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
+-+    #         return F.linear(act_out, down_weights)
+-+    # @no_grad()
+-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     num_tokens = x.shape[0]
+-+    #     hidden_size = x.shape[-1]
+-+
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     sorted_expert_indices = flat_expert_indices[idxs]
+-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-+    #     sorted_indices = sorted_token_indices
+-+
+-+    #     permuted_tokens = x[sorted_token_indices]
+-+    #     sorted_weights  = flat_expert_weights[idxs]
+-+
+-+    #     # 一次调用多专家 forward
+-+    #     expert_outputs = ops.zeros_like(permuted_tokens)
+-+    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
+-+
+-+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+-+    #     try:
+-+    #         final_output = ops.moe_token_unpermute(
+-+    #             expert_outputs,
+-+    #             sorted_indices,
+-+    #             probs=probs,
+-+    #             padded_mode=False
+-+    #         )
+-+    #     except Exception:
+-+    #         final_output = ops.zeros_like(x)
+-+    #         final_output = mindspore.mint.scatter_add(
+-+    #             final_output,
+-+    #             0,
+-+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-+    #             expert_outputs * sorted_weights
+-+    #         )
+-+
+-+    #     return final_output
+-+
+-+    # def mlp_batch_forward(self, tokens, expert_ids):
+-+    #     """
+-+    #     使用批量专家 forward（保留精度）
+-+    #     """
+-+    #     return self.experts[0].forward(tokens, expert_ids)
+-+
+-     # @no_grad()
+-     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+- 
+-@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
+-     #         expert_cache += expert_out * weight
+-     #     return expert_cache
+-     
+-+    #@dwj
+-     @no_grad()
+--    # dwj
+-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+--        # x 的 shape: (1, hidden_size)
+--        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+--        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+--
+--        # 1. 收集所有需要的专家层
+--        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-         selected_experts = [self.experts[i] for i in flat_expert_indices]
+--
+--        # 2. 并行计算所有专家的输出
+--        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+--        # ops.cat 会将它们堆叠成一个新的 Tensor
+--        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+--
+--        # 3. 使用矩阵乘法进行加权求和
+--        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+--        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+--        # 最终结果 final_output 的 shape: (1, hidden_size)
+-         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+--        
+-         return final_output
+- 
+- 
+--    # @no_grad()
+--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+--    #     expert_cache = ops.zeros_like(x)
+--    #     idxs = flat_expert_indices.argsort()
+--    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+--    #     token_idxs = idxs // self.num_experts_per_tok
+--
+--    #     for i, end_idx in enumerate(tokens_per_expert):
+--    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+--    #         if start_idx == end_idx:
+--    #             continue
+--    #         expert = self.experts[i]
+--    #         exp_token_idx = token_idxs[start_idx:end_idx]
+--    #         expert_tokens = x[exp_token_idx]
+--    #         expert_out = expert(expert_tokens)
+--    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+--    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+--
+--    #     return expert_cache
+--        
+-     @no_grad()
+-     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-         """
+-@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
+-             )
+- 
+-         return expert_cache
+-+
+-+
+-+    # @no_grad()
+-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     """
+-+    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
+-+    #     """
+-+    #     num_tokens = x.shape[0]
+-+    #     hidden_size = x.shape[-1]
+-+
+-+    #     # 生成排序后的 token 索引
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     sorted_expert_indices = flat_expert_indices[idxs]
+-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-+
+-+    #     # 记录到 sorted_indices（moe_token_unpermute 用）
+-+    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
+-+
+-+    #     # 收集专家输入
+-+    #     permuted_tokens = x[sorted_token_indices]
+-+
+-+    #     # 执行每个专家的 MLP（批量处理）
+-+    #     expert_outputs = []
+-+    #     token_ptr = 0
+-+    #     tokens_per_expert = sorted_expert_indices.bincount()
+-+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-+    #         if count == 0:
+-+    #             continue
+-+    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
+-+    #         out = self.experts[expert_id](cur_tokens)
+-+    #         expert_outputs.append(out)
+-+    #         token_ptr += count
+-+
+-+    #     # 拼接所有专家输出
+-+    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
+-+
+-+    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
+-+    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
+-+
+-+    #     # 直接调用硬件加速的 unpermute
+-+    #     final_output = ops.moe_token_unpermute(
+-+    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
+-+    #         sorted_indices,           # shape: [num_tokens * top_k]
+-+    #         probs=probs,               # 按概率加权
+-+    #         padded_mode=False
+-+    #     )
+-+
+-+    #     return final_output
+-+
+-+    # lwx prefill 20251108
+-+    @no_grad()
+-+    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
+-+        """
+-+        高性能 + 数值一致的 MoE prefill 推理：
+-+        1. 批量化处理所有专家计算，减少 Python 循环开销
+-+        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
+-+        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
+-+        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
+-+
+-+        参数：
+-+            x: [num_tokens, hidden_size]，
+-+            MoE 输入的 token 表示
+-+            flat_expert_indices: [num_tokens * top_k]，
+-+            每个 token 的路由专家 id
+-+            flat_expert_weights: [num_tokens * top_k, 1]，
+-+            路由专家权重
+-+        """
+-+        num_tokens = x.shape[0]
+-+        hidden_size = x.shape[-1]
+-+
+-+        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
+-+        idxs = flat_expert_indices.argsort()  # 排序索引
+-+        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
+-+        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
+-+
+-+        # sorted_indices 必须与 permuted_tokens 顺序匹配
+-+        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
+-+
+-+        # 2) 收集专家输入（按 idxs 排序）
+-+        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
+-+        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
+-+
+-+        # 3) 计算每个专家的 token 数
+-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+-+
+-+        # 4) 批量专家计算（减少 Python 循环）
+-+        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
+-+        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
+-+        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
+-+
+-+        expert_outputs = ops.zeros_like(permuted_tokens)
+-+        ptr = 0
+-+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-+            if count == 0:
+-+                continue
+-+            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
+-+            
+-+            # 与 DeepseekMLP forward 等价
+-+            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
+-+            up_proj_out   = F.linear(tokens, up_weights[expert_id])
+-+            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
+-+            expert_out    = F.linear(act_out, down_weights[expert_id])
+-+
+-+            expert_outputs[ptr:ptr+count] = expert_out
+-+            ptr += count
+-+
+-+        # 5) Ascend 加速的 unpermute（已排序的权重）
+-+        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
+-+
+-+        final_output = ops.zeros_like(x)
+-+        final_output = mindspore.mint.scatter_add(
+-+            final_output,
+-+            0,
+-+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-+            expert_outputs * sorted_weights
+-+        )      
+-+
+-+
+-+        # try:
+-+        #     final_output = ops.moe_token_unpermute(
+-+        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
+-+        #         sorted_indices,   # [num_tokens*top_k] 原 token id
+-+        #         probs=probs,      # 对应权重
+-+        #         padded_mode=False
+-+        #     )
+-+        # except Exception:
+-+        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
+-+        #     final_output = ops.zeros_like(x)
+-+        #     final_output = mindspore.mint.scatter_add(
+-+        #         final_output,
+-+        #         0,
+-+        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-+        #         expert_outputs * sorted_weights
+-+        #     )
+-+
+-+        return final_output
+-+
+-+
+-+    # @no_grad()
+-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     num_tokens = x.shape[0]
+-+    #     hidden_size = x.shape[-1]
+-+
+-+    #     idxs = flat_expert_indices.argsort()
+-+    #     sorted_expert_indices = flat_expert_indices[idxs]
+-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
+-+        
+-+    #     # sorted_indices = sorted_token_indices
+-+    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
+-+    #     permuted_tokens = x[sorted_token_indices]
+-+    #     sorted_weights = flat_expert_weights[idxs]
+-+    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
+-+
+-+    #     expert_outputs = ops.zeros_like(permuted_tokens)
+-+    #     ptr = 0
+-+
+-+    #     # 只按专家维度循环
+-+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-+    #         if count == 0:
+-+    #             continue
+-+    #         token_slice = slice(ptr, ptr + count)
+-+    #         expert_tokens = permuted_tokens[token_slice]
+-+
+-+    #         # 保持原 forward（含 pretraining_tp、bias 等）
+-+    #         expert_out = self.experts[expert_id](expert_tokens)
+-+
+-+    #         expert_outputs[token_slice] = expert_out
+-+    #         ptr += count
+-+
+-+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
+-+    #     try:
+-+    #         final_output = mindspore.ops.moe_token_unpermute(
+-+    #             expert_outputs,
+-+    #             sorted_indices,
+-+    #             probs=probs,
+-+    #             padded_mode=False
+-+    #         )
+-+    #     except Exception:
+-+    #         final_output = ops.zeros_like(x)
+-+    #         final_output = mindspore.mint.scatter_add(
+-+    #             final_output,
+-+    #             0,
+-+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-+    #             expert_outputs * sorted_weights
+-+    #         )
+-+
+-+    #     return final_output
+-+
+-+
+-+    #lwx
+-+    # @no_grad()
+-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+    #     """
+-+    #     并行化 MoE prefill：
+-+    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
+-+    #     - 保证结果与原版完全一致
+-+    #     """
+-+    #     # 输出缓存
+-+    #     expert_cache = ops.zeros_like(x)
+-+
+-+    #     # token 总数（批量*seq_len*num_experts_per_tok）
+-+    #     num_tokens = flat_expert_indices.shape[0]
+-+    #     hidden_dim = x.shape[-1]
+-+
+-+    #     # 原 token ID（idxs // num_experts_per_tok）
+-+    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
+-+
+-+    #     # ====== Step 1: 组织输入 ======
+-+    #     # 按 experts 排序，保证 scatter_add 对应位置一致
+-+    #     sort_ids = flat_expert_indices.argsort()
+-+    #     sorted_experts = flat_expert_indices[sort_ids]
+-+    #     sorted_tokens = token_ids[sort_ids]
+-+    #     sorted_weights = flat_expert_weights[sort_ids]
+-+
+-+    #     # 收集每个专家的输入
+-+    #     # build: expert_inputs[expert_id] = [tokens...]
+-+    #     expert_inputs = []
+-+    #     expert_outs = []
+-+
+-+    #     for eid in range(self.config.n_routed_experts):
+-+    #         eid_mask = (sorted_experts == eid)
+-+    #         if eid_mask.any():
+-+    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
+-+    #             expert_inputs.append(tokens_for_eid)
+-+    #         else:
+-+    #             expert_inputs.append(None)
+-+
+-+    #     # ====== Step 2: 并行计算所有专家输出 ======
+-+    #     # 存储所有专家结果到一个列表
+-+    #     for eid in range(self.config.n_routed_experts):
+-+    #         if expert_inputs[eid] is not None:
+-+    #             out = self.experts[eid](expert_inputs[eid])
+-+    #             expert_outs.append(out)
+-+    #         else:
+-+    #             expert_outs.append(None)
+-+
+-+    #     # ====== Step 3: scatter_add 回写结果 ======
+-+    #     # 遍历专家，将结果加回对应的 token
+-+    #     pos = 0
+-+    #     for eid in range(self.config.n_routed_experts):
+-+    #         if expert_outs[eid] is not None:
+-+    #             size = expert_outs[eid].shape[0]
+-+    #             tokens_idx = sorted_tokens[pos:pos+size]
+-+    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
+-+    #             pos += size
+-+
+-+    #             # scatter_add 到 expert_cache
+-+    #             expert_cache = mindspore.mint.scatter_add(
+-+    #                 expert_cache,
+-+    #                 dim=0,
+-+    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
+-+    #                 src=scaled_out
+-+    #             )
+-+
+-+    #     return expert_cache
+-+
+-+
+-+
+- # 放置在 DeepseekMoE 类中
+-     # @no_grad()
+-     # #lwx 20251107
+-@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
+-         self.hidden_size = config.hidden_size
+- 
+-         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+--            # config=config, layer_idx=layer_idx
+-+        #     config=config, layer_idx=layer_idx
+-         # )
+- 
+-         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
+-             )
+-             else DeepseekMLP(config)
+-         )
+-+
+-         self.input_layernorm = DeepseekRMSNorm(
+-             config.hidden_size, eps=config.rms_norm_eps
+-         )
+-@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-     def get_decoder(self):
+-         return self.model
+- 
+-+    def generate(self, *args, **kwargs):
+-+        """
+-+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+        """
+-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-+
+-+        input_ids = kwargs.get("input_ids")
+-+        if input_ids is None and args:
+-+            input_ids = args[0]
+-+
+-+        if input_ids is not None:
+-+            prompt_length = input_ids.shape[1]
+-+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-+                Long_Prompt = 2
+-+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-+                Long_Prompt = 0
+-+            else:
+-+                Long_Prompt = 1
+-+
+-+
+-+        return super().generate(*args, **kwargs)
+- 
+-     def forward(
+-         self,
+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-index 913a7609..6566958b 100644
+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+- 
+-     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-     @no_grad()
+--    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-         original_dtype = hidden_states.dtype
+-         batch_size, _ = hidden_states.shape
+-         expert_outputs_list = [
+-@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-         return moe_output_fp32.squeeze(1).to(original_dtype)
+- 
+-+
+-     # @no_grad()
+--    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-     #     num_tokens, _ = hidden_states.shape
+-     #     flat_selected_experts = selected_experts.flatten()
+-     #     sorted_expert_indices = flat_selected_experts.argsort()
+-@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-     #         current_token_offset += expert_token_count
+-     #     return moe_output
+- 
+-+    # baseline
+-     @no_grad()
+--    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-         """
+-         优化版 MoE prefill (速度优先模式)：
+-         - 批量张量化处理同一个 expert 的所有 token
+-@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         return moe_output
+- 
+- 
+-+    @no_grad()
+-+    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+        """
+-+        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
+-+        逻辑：
+-+        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
+-+        2. 每个 expert 一次性处理其全部 token
+-+        3. 最后一次 scatter_add 回到原 token 顺序
+-+        """
+-+
+-+        num_tokens = hidden_states.shape[0]
+-+        hidden_size = hidden_states.shape[-1]
+-+
+-+        # 展平为一维
+-+        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
+-+        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
+-+
+-+        # 按 expert 排序
+-+        idxs = flat_selected_experts.argsort()
+-+        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
+-+        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
+-+
+-+        # 排好序的输入向量（连续内存）
+-+        permuted_tokens = hidden_states[sorted_token_indices]
+-+
+-+        # 排好序的权重
+-+        sorted_weights = flat_routing_weights[idxs]
+-+
+-+        # 每个 expert 对应的 token 数量
+-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-+
+-+        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
+-+        expert_outputs = ops.zeros_like(permuted_tokens)
+-+
+-+        ptr = 0  # 指向当前切片的起点
+-+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
+-+            if count == 0:
+-+                continue
+-+
+-+            token_slice = slice(ptr, ptr + count)
+-+            expert_tokens = permuted_tokens[token_slice]  # 连续切片
+-+
+-+            # 执行专家 MLP
+-+            expert_out = self.experts[expert_id](expert_tokens)
+-+
+-+            expert_outputs[token_slice] = expert_out
+-+            ptr += count
+-+
+-+        # 按权重缩放
+-+        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
+-+
+-+        # 回写到原 token 顺序 (单次 scatter_add)
+-+        moe_output = mindspore.mint.scatter_add(
+-+            ops.zeros_like(hidden_states),
+-+            0,
+-+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
+-+            scaled_outputs
+-+        )
+-+
+-+        return moe_output
+-+
+-+
+-+    
+-     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+
+-     @no_grad()
+-     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-         moe_output = ops.zeros_like(hidden_states)
+-@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-         #     # --- 速度优先模式 (SPEED MODE) ---
+-         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-         #     if sequence_length == 1:
+--        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-         #     else:
+--        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-         
+-         routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-         if sequence_length == 1:
+--            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-         else:
+--            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+--    
+-+            # if Long_Prompt == 1:
+-+            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+            # else:
+-+            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+
+- 
+-         # 3. 共享专家计算与合并 (所有模式通用)
+-         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-index c9c8c5ee..513dd40b 100644
+---- a/patches/0001-20251104commit.patch
+-+++ b/patches/0001-20251104commit.patch
+-@@ -1,7 +1,7 @@
+- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Tue, 4 Nov 2025 09:11:51 +0800
+--Subject: [PATCH 1/6] 20251104commit
+-+Subject: [PATCH 1/7] 20251104commit
+- 
+- ---
+-  mindnlp/transformers/cache_utils.py           |  28 +-
+-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-index 625656eb..41081b85 100644
+---- a/patches/0002-20251106commit.patch
+-+++ b/patches/0002-20251106commit.patch
+-@@ -1,7 +1,7 @@
+- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 09:20:38 +0800
+--Subject: [PATCH 2/6] 20251106commit
+-+Subject: [PATCH 2/7] 20251106commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-index dcb85080..c1392569 100644
+---- a/patches/0003-20261106secondcommit.patch
+-+++ b/patches/0003-20261106secondcommit.patch
+-@@ -1,7 +1,7 @@
+- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 14:54:37 +0800
+--Subject: [PATCH 3/6] 20261106secondcommit
+-+Subject: [PATCH 3/7] 20261106secondcommit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-index bbed13cc..e548b1b2 100644
+---- a/patches/0004-20251106change.patch
+-+++ b/patches/0004-20251106change.patch
+-@@ -1,7 +1,7 @@
+- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Thu, 6 Nov 2025 15:48:09 +0800
+--Subject: [PATCH 4/6] 20251106change
+-+Subject: [PATCH 4/7] 20251106change
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-index b2d1035c..bf224d2a 100644
+---- a/patches/0005-20251107001commit.patch
+-+++ b/patches/0005-20251107001commit.patch
+-@@ -1,7 +1,7 @@
+- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Fri, 7 Nov 2025 11:48:18 +0800
+--Subject: [PATCH 5/6] 20251107001commit
+-+Subject: [PATCH 5/7] 20251107001commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |   91 +-
+-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+-index bffa134e..1bd306b9 100644
+---- a/patches/0006-20251107002commit.patch
+-+++ b/patches/0006-20251107002commit.patch
+-@@ -1,7 +1,7 @@
+- From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+- From: Pinoeer-kingxi <13022943007@163.com>
+- Date: Fri, 7 Nov 2025 12:06:32 +0800
+--Subject: [PATCH 6/6] 20251107002commit
+-+Subject: [PATCH 6/7] 20251107002commit
+- 
+- ---
+-  .../models/deepseek/modeling_deepseek.py      |  122 +-
+-diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
+-new file mode 100644
+-index 00000000..ce558554
+---- /dev/null
+-+++ b/patches/0007-20251107003commit.patch
+-@@ -0,0 +1,8034 @@
+-+From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
+-+From: Pinoeer-kingxi <13022943007@163.com>
+-+Date: Fri, 7 Nov 2025 12:12:51 +0800
+-+Subject: [PATCH 7/7] 20251107003commit
+-+
+-+---
+-+ .../models/deepseek/modeling_deepseek.py      |    2 +-
+-+ patches/0001-20251104commit.patch             |    2 +-
+-+ patches/0002-20251106commit.patch             |    2 +-
+-+ patches/0003-20261106secondcommit.patch       |    2 +-
+-+ patches/0004-20251106change.patch             |    2 +-
+-+ patches/0005-20251107001commit.patch          |    2 +-
+-+ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
+-+ 7 files changed, 7937 insertions(+), 6 deletions(-)
+-+ create mode 100644 patches/0006-20251107002commit.patch
+-+
+-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+index e7e1c053..ff631974 100644
+-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
+-+     #     return expert_cache
+-+     
+-+     @no_grad()
+-+-    dwj
+-++    # dwj
+-+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+         # x 的 shape: (1, hidden_size)
+-+         # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+index 2842180e..c9c8c5ee 100644
+-+--- a/patches/0001-20251104commit.patch
+-++++ b/patches/0001-20251104commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+-Subject: [PATCH 1/5] 20251104commit
+-++Subject: [PATCH 1/6] 20251104commit
+-+ 
+-+ ---
+-+  mindnlp/transformers/cache_utils.py           |  28 +-
+-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+index c6cd8757..625656eb 100644
+-+--- a/patches/0002-20251106commit.patch
+-++++ b/patches/0002-20251106commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+-Subject: [PATCH 2/5] 20251106commit
+-++Subject: [PATCH 2/6] 20251106commit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+index 601960c9..dcb85080 100644
+-+--- a/patches/0003-20261106secondcommit.patch
+-++++ b/patches/0003-20261106secondcommit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+-Subject: [PATCH 3/5] 20261106secondcommit
+-++Subject: [PATCH 3/6] 20261106secondcommit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-+index 8976f10b..bbed13cc 100644
+-+--- a/patches/0004-20251106change.patch
+-++++ b/patches/0004-20251106change.patch
+-+@@ -1,7 +1,7 @@
+-+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-+-Subject: [PATCH 4/5] 20251106change
+-++Subject: [PATCH 4/6] 20251106change
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-+index 8d9032be..b2d1035c 100644
+-+--- a/patches/0005-20251107001commit.patch
+-++++ b/patches/0005-20251107001commit.patch
+-+@@ -1,7 +1,7 @@
+-+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+-+ From: Pinoeer-kingxi <13022943007@163.com>
+-+ Date: Fri, 7 Nov 2025 11:48:18 +0800
+-+-Subject: [PATCH 5/5] 20251107001commit
+-++Subject: [PATCH 5/6] 20251107001commit
+-+ 
+-+ ---
+-+  .../models/deepseek/modeling_deepseek.py      |   91 +-
+-+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
+-+new file mode 100644
+-+index 00000000..bffa134e
+-+--- /dev/null
+-++++ b/patches/0006-20251107002commit.patch
+-+@@ -0,0 +1,7931 @@
+-++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
+-++From: Pinoeer-kingxi <13022943007@163.com>
+-++Date: Fri, 7 Nov 2025 12:06:32 +0800
+-++Subject: [PATCH 6/6] 20251107002commit
+-++
+-++---
+-++ .../models/deepseek/modeling_deepseek.py      |  122 +-
+-++ patches/0001-20251104commit.patch             |    2 +-
+-++ patches/0002-20251106commit.patch             |    2 +-
+-++ patches/0003-20261106secondcommit.patch       |    2 +-
+-++ patches/0004-20251106change.patch             |    2 +-
+-++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
+-++ 6 files changed, 7773 insertions(+), 64 deletions(-)
+-++ create mode 100644 patches/0005-20251107001commit.patch
+-++
+-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++index 8831e4b7..e7e1c053 100644
+-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
+-++     #         expert_out = expert(x)
+-++     #         expert_cache += expert_out * weight
+-++     #     return expert_cache
+-++-
+-++-    # @no_grad()
+-++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++-    #     # x 的 shape: (1, hidden_size)
+-++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++-
+-++-    #     # 1. 收集所有需要的专家层
+-++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++-
+-++-    #     # 2. 并行计算所有专家的输出
+-++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++-
+-++-    #     # 3. 使用矩阵乘法进行加权求和
+-++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+++    
+-+++    @no_grad()
+-+++    dwj
+-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++        # x 的 shape: (1, hidden_size)
+-+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+++
+-+++        # 1. 收集所有需要的专家层
+-+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+++
+-+++        # 2. 并行计算所有专家的输出
+-+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+++
+-+++        # 3. 使用矩阵乘法进行加权求和
+-+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++         
+-++-    #     return final_output
+-+++        return final_output
+-++ 
+-++ 
+-++     # @no_grad()
+-++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
+-++ 
+-++         return expert_cache
+-++ # 放置在 DeepseekMoE 类中
+-++-    @no_grad()
+-++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++-        """
+-++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-++-
+-++-        Args:
+-++-            x (Tensor): 输入张量, shape: (1, hidden_size)
+-++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-++-        """
+-++-        top_k, _ = flat_expert_weights.shape
+-++-        hidden_size = x.shape[-1]
+-++-
+-++-        # 1. 将所有专家的权重堆叠起来
+-++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-+++    # @no_grad()
+-+++    # #lwx 20251107
+-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++    #     """
+-+++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-+++
+-+++    #     Args:
+-+++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
+-+++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-+++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-+++    #     """
+-+++    #     top_k, _ = flat_expert_weights.shape
+-+++    #     hidden_size = x.shape[-1]
+-+++
+-+++    #     # 1. 将所有专家的权重堆叠起来
+-+++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-+++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-+++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-++         
+-++-        # 2. "收集" 所需的专家权重
+-++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-++-        selected_up_w = stacked_up_w[flat_expert_indices]
+-++-        selected_down_w = stacked_down_w[flat_expert_indices]
+-+++    #     # 2. "收集" 所需的专家权重
+-+++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
+-+++    #     selected_up_w = stacked_up_w[flat_expert_indices]
+-+++    #     selected_down_w = stacked_down_w[flat_expert_indices]
+-++         
+-++-        # 3. 准备输入
+-++-        x_expanded = x.expand((top_k, 1, hidden_size))
+-+++    #     # 3. 准备输入
+-+++    #     x_expanded = x.expand((top_k, 1, hidden_size))
+-++         
+-++-        # 4. 并行计算 gate_proj 和 up_proj
+-++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-+++    #     # 4. 并行计算 gate_proj 和 up_proj
+-+++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-+++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-++ 
+-++-        # 5. 计算中间状态
+-++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-+++    #     # 5. 计算中间状态
+-+++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-++         
+-++-        # 6. 并行计算 down_proj
+-++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-++-        #    --- [FIX] ---
+-++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-++-        #    --- [FIX END] ---
+-+++    #     # 6. 并行计算 down_proj
+-+++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-+++    #     #    --- [FIX] ---
+-+++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-+++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-+++    #     #    --- [FIX END] ---
+-++         
+-++-        # 7. 根据路由权重进行加权求和
+-++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-+++    #     # 7. 根据路由权重进行加权求和
+-+++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-++ 
+-++-        return weighted_sum
+-+++    #     return weighted_sum
+-++ 
+-++ 
+-++ 
+-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++index 0a0ef2d7..2842180e 100644
+-++--- a/patches/0001-20251104commit.patch
+-+++++ b/patches/0001-20251104commit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++-Subject: [PATCH 1/4] 20251104commit
+-+++Subject: [PATCH 1/5] 20251104commit
+-++ 
+-++ ---
+-++  mindnlp/transformers/cache_utils.py           |  28 +-
+-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-++index 5185270c..c6cd8757 100644
+-++--- a/patches/0002-20251106commit.patch
+-+++++ b/patches/0002-20251106commit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-++-Subject: [PATCH 2/4] 20251106commit
+-+++Subject: [PATCH 2/5] 20251106commit
+-++ 
+-++ ---
+-++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-++index 3e05f821..601960c9 100644
+-++--- a/patches/0003-20261106secondcommit.patch
+-+++++ b/patches/0003-20261106secondcommit.patch
+-++@@ -1,7 +1,7 @@
+-++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-++-Subject: [PATCH 3/4] 20261106secondcommit
+-+++Subject: [PATCH 3/5] 20261106secondcommit
+-++ 
+-++ ---
+-++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-++index 88a1aef4..8976f10b 100644
+-++--- a/patches/0004-20251106change.patch
+-+++++ b/patches/0004-20251106change.patch
+-++@@ -1,7 +1,7 @@
+-++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-++ From: Pinoeer-kingxi <13022943007@163.com>
+-++ Date: Thu, 6 Nov 2025 15:48:09 +0800
+-++-Subject: [PATCH 4/4] 20251106change
+-+++Subject: [PATCH 4/5] 20251106change
+-++ 
+-++ ---
+-++  .../models/deepseek/modeling_deepseek.py      |  189 +-
+-++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
+-++new file mode 100644
+-++index 00000000..8d9032be
+-++--- /dev/null
+-+++++ b/patches/0005-20251107001commit.patch
+-++@@ -0,0 +1,7707 @@
+-+++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
+-+++From: Pinoeer-kingxi <13022943007@163.com>
+-+++Date: Fri, 7 Nov 2025 11:48:18 +0800
+-+++Subject: [PATCH 5/5] 20251107001commit
+-+++
+-+++---
+-+++ .../models/deepseek/modeling_deepseek.py      |   91 +-
+-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
+-+++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
+-+++ patches/0001-20251104commit.patch             |    2 +-
+-+++ patches/0002-20251106commit.patch             |    2 +-
+-+++ patches/0003-20261106secondcommit.patch       |    2 +-
+-+++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
+-+++ 7 files changed, 7577 insertions(+), 30 deletions(-)
+-+++ create mode 100644 patches/0004-20251106change.patch
+-+++
+-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++index 0546f318..8831e4b7 100644
+-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
+-+++     #         expert_cache += expert_out * weight
+-+++     #     return expert_cache
+-+++ 
+-+++-    @no_grad()
+-+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++-        # x 的 shape: (1, hidden_size)
+-+++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-+++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-+++-
+-+++-        # 1. 收集所有需要的专家层
+-+++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-+++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-+++-
+-+++-        # 2. 并行计算所有专家的输出
+-+++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-+++-        # ops.cat 会将它们堆叠成一个新的 Tensor
+-+++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-+++-
+-+++-        # 3. 使用矩阵乘法进行加权求和
+-+++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-+++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-+++-        # 最终结果 final_output 的 shape: (1, hidden_size)
+-+++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++++    # @no_grad()
+-++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++    #     # x 的 shape: (1, hidden_size)
+-++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++++
+-++++    #     # 1. 收集所有需要的专家层
+-++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++++
+-++++    #     # 2. 并行计算所有专家的输出
+-++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
+-++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++++
+-++++    #     # 3. 使用矩阵乘法进行加权求和
+-++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
+-++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-+++         
+-+++-        return final_output
+-++++    #     return final_output
+-+++ 
+-+++ 
+-+++     # @no_grad()
+-+++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
+-+++             )
+-+++ 
+-+++         return expert_cache
+-++++# 放置在 DeepseekMoE 类中
+-++++    @no_grad()
+-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++        """
+-++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
+-++++
+-++++        Args:
+-++++            x (Tensor): 输入张量, shape: (1, hidden_size)
+-++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
+-++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
+-++++        """
+-++++        top_k, _ = flat_expert_weights.shape
+-++++        hidden_size = x.shape[-1]
+-++++
+-++++        # 1. 将所有专家的权重堆叠起来
+-++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
+-++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
+-++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
+-++++        
+-++++        # 2. "收集" 所需的专家权重
+-++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
+-++++        selected_up_w = stacked_up_w[flat_expert_indices]
+-++++        selected_down_w = stacked_down_w[flat_expert_indices]
+-++++        
+-++++        # 3. 准备输入
+-++++        x_expanded = x.expand((top_k, 1, hidden_size))
+-++++        
+-++++        # 4. 并行计算 gate_proj 和 up_proj
+-++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
+-++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
+-++++
+-++++        # 5. 计算中间状态
+-++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
+-++++        
+-++++        # 6. 并行计算 down_proj
+-++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
+-++++        #    --- [FIX] ---
+-++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
+-++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
+-++++        #    --- [FIX END] ---
+-++++        
+-++++        # 7. 根据路由权重进行加权求和
+-++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
+-++++
+-++++        return weighted_sum
+-++++
+-++++
+-+++ 
+-+++         # @no_grad()
+-+++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++index ebd7782e..913a7609 100644
+-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
+-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++-    x1 = x[..., : x.shape[-1] // 2]
+-+++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+++index d059dcbe..2b217b64 100644
+-+++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
+-+++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
+-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++ def rotate_half(x):
+-+++     """Rotates half the hidden dims of the input."""
+-+++-    x1 = x[..., : x.shape[-1] // 2]
+-+++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++    # x2 = x[..., x.shape[-1] // 2 :]
+-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++     return ops.cat((-x2, x1), dim=-1)
+-+++ 
+-+++ 
+-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++index 78f22642..0a0ef2d7 100644
+-+++--- a/patches/0001-20251104commit.patch
+-++++++ b/patches/0001-20251104commit.patch
+-+++@@ -1,7 +1,7 @@
+-+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++ From: Pinoeer-kingxi <13022943007@163.com>
+-+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++-Subject: [PATCH 1/3] 20251104commit
+-++++Subject: [PATCH 1/4] 20251104commit
+-+++ 
+-+++ ---
+-+++  mindnlp/transformers/cache_utils.py           |  28 +-
+-+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-+++index 22b65dd5..5185270c 100644
+-+++--- a/patches/0002-20251106commit.patch
+-++++++ b/patches/0002-20251106commit.patch
+-+++@@ -1,7 +1,7 @@
+-+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+++ From: Pinoeer-kingxi <13022943007@163.com>
+-+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+++-Subject: [PATCH 2/3] 20251106commit
+-++++Subject: [PATCH 2/4] 20251106commit
+-+++ 
+-+++ ---
+-+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-+++index 966529e4..3e05f821 100644
+-+++--- a/patches/0003-20261106secondcommit.patch
+-++++++ b/patches/0003-20261106secondcommit.patch
+-+++@@ -1,7 +1,7 @@
+-+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+++ From: Pinoeer-kingxi <13022943007@163.com>
+-+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+++-Subject: [PATCH 3/3] 20261106secondcommit
+-++++Subject: [PATCH 3/4] 20261106secondcommit
+-+++ 
+-+++ ---
+-+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
+-+++new file mode 100644
+-+++index 00000000..88a1aef4
+-+++--- /dev/null
+-++++++ b/patches/0004-20251106change.patch
+-+++@@ -0,0 +1,7498 @@
+-++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
+-++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++Date: Thu, 6 Nov 2025 15:48:09 +0800
+-++++Subject: [PATCH 4/4] 20251106change
+-++++
+-++++---
+-++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
+-++++ patches/0001-20251104commit.patch             | 1272 +++++++
+-++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
+-++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
+-++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
+-++++ create mode 100644 patches/0001-20251104commit.patch
+-++++ create mode 100644 patches/0002-20251106commit.patch
+-++++ create mode 100644 patches/0003-20261106secondcommit.patch
+-++++
+-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++index 2f9192bf..0546f318 100644
+-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
+-++++ 
+-++++         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++-# class DeepseekFlashAttention(nn.Module):
+-++++-#     """
+-++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-++++-
+-++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
+-++++-#     """
+-++++-
+-++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++++-#         super().__init__()
+-++++-#         self.config = config
+-++++-#         self.layer_idx = layer_idx
+-++++-#         if layer_idx is None:
+-++++-#             logger.warning(
+-++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++-#                 "when creating this class."
+-++++-#             )
+-++++-
+-++++-#         self.attention_dropout = config.attention_dropout
+-++++-#         self.hidden_size = config.hidden_size
+-++++-#         self.num_heads = config.num_attention_heads
+-++++-#         self.head_dim = self.hidden_size // self.num_heads
+-++++-#         self.num_key_value_heads = config.num_key_value_heads
+-++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++-#         self.max_position_embeddings = config.max_position_embeddings
+-++++-#         self.rope_theta = config.rope_theta
+-++++-#         self.is_causal = True
+-++++-
+-++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++-#             raise ValueError(
+-++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++-#                 f" and `num_heads`: {self.num_heads})."
+-++++-#             )
+-++++-
+-++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++++-#         self._init_rope()
+-++++-
+-++++-#     def _init_rope(self):
+-++++-#         if self.config.rope_scaling is None:
+-++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
+-++++-#                 self.head_dim,
+-++++-#                 max_position_embeddings=self.max_position_embeddings,
+-++++-#                 base=self.rope_theta,
+-++++-#             )
+-++++-#         else:
+-++++-#             scaling_type = self.config.rope_scaling["type"]
+-++++-#             scaling_factor = self.config.rope_scaling["factor"]
+-++++-#             if scaling_type == "linear":
+-++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++++-#                     self.head_dim,
+-++++-#                     max_position_embeddings=self.max_position_embeddings,
+-++++-#                     scaling_factor=scaling_factor,
+-++++-#                     base=self.rope_theta,
+-++++-#                 )
+-++++-#             elif scaling_type == "dynamic":
+-++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++++-#                     self.head_dim,
+-++++-#                     max_position_embeddings=self.max_position_embeddings,
+-++++-#                     scaling_factor=scaling_factor,
+-++++-#                     base=self.rope_theta,
+-++++-#                 )
+-++++-#             else:
+-++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++++-
+-++++-#     def forward(
+-++++-#         self,
+-++++-#         hidden_states: mindspore.Tensor,
+-++++-#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++-#         position_ids: Optional[mindspore.Tensor] = None,
+-++++-#         past_key_value: Optional[Cache] = None,
+-++++-#         output_attentions: bool = False,
+-++++-#         use_cache: bool = False,
+-++++-#         **kwargs,
+-++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++-#         if "padding_mask" in kwargs:
+-++++-#             warnings.warn(
+-++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++++-#             )
+-++++-        
+-++++-#         if output_attentions:
+-++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-++++-
+-++++-#         bsz, q_len, _ = hidden_states.shape
+-++++-
+-++++-#         if self.config.pretraining_tp > 1:
+-++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++++-
+-++++-#         query_states = self.q_proj(hidden_states)
+-++++-#         key_states = self.k_proj(hidden_states)
+-++++-#         value_states = self.v_proj(hidden_states)
+-++++-
+-++++-#         # Reshape for multi-head attention
+-++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++-
+-++++-#         kv_seq_len = key_states.shape[-2]
+-++++-#         if past_key_value is not None:
+-++++-#             if self.layer_idx is None:
+-++++-#                 raise ValueError(
+-++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++-#                     "with a layer index."
+-++++-#                 )
+-++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++-        
+-++++-#         # Apply Rotary Positional Embedding
+-++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++-
+-++++-#         if past_key_value is not None:
+-++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++-
+-++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++-        
+-++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++-        
+-++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++-
+-++++-#         # Convert attention_mask for flash_attention_score
+-++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-++++-#         if attention_mask is not None:
+-++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++++-#                 raise ValueError(
+-++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++++-#                 )
+-++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-++++-#         else:
+-++++-#             attn_mask_for_fa = None
+-++++-        
+-++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++++-
+-++++-#         # Call the fused flash_attention_score operator
+-++++-#         attn_output = mindspore.ops.flash_attention_score(
+-++++-#             query=query_states_for_fa,
+-++++-#             key=key_states_for_fa,
+-++++-#             value=value_states_for_fa,
+-++++-#             head_num=self.num_heads, # This is N1, the number of query heads
+-++++-#             input_layout='BSH',
+-++++-#             attn_mask=attn_mask_for_fa,
+-++++-#             keep_prob=keep_prob,
+-++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++-#             sparse_mode=0 # Default mask mode
+-++++-#         )
+-++++-        
+-++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-++++-#         attn_output = self.o_proj(attn_output)
+-++++-        
+-++++-#         # Flash Attention does not return attention weights
+-++++-#         attn_weights = None
+-++++-
+-++++-#         return attn_output, attn_weights, past_key_value
+-++++ 
+-++++ class DeepseekFlashAttention(nn.Module):
+-++++     """
+-++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
+-++++         super().__init__()
+-++++         self.hidden_size = config.hidden_size
+-++++ 
+-++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-++++-            config=config, layer_idx=layer_idx
+-++++-        )
+-+++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
+-+++++            # config=config, layer_idx=layer_idx
+-+++++        # )
+-++++ 
+-++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-++++             config=config, layer_idx=layer_idx
+-++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
+-++++         return outputs
+-++++ 
+-++++ 
+-++++-
+-++++ class DeepseekPreTrainedModel(PreTrainedModel):
+-++++     config_class = DeepseekConfig
+-++++     base_model_prefix = "model"
+-++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++         # Initialize weights and apply final processing
+-++++         self.post_init()
+-++++         self.warm_up = False
+-++++-        #@dwj
+-++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-++++-            self.num_layers,
+-++++-            self.num_attention_heads,
+-++++-            self.head_dim,
+-++++-            batch_size=1,
+-++++-            max_length=self.max_length,
+-++++-            dtype=mindspore.float16
+-++++-        )
+-++++-
+-++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-++++-        key_cache = []
+-++++-        value_cache = []
+-++++-        for _ in range(num_layers):
+-++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++-            key_cache.append(k)
+-++++-            value_cache.append(v)
+-++++-        return key_cache, value_cache
+-++++-
+-++++ 
+-++++     def warmup_moe_model_deep(self):
+-++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-++++new file mode 100644
+-++++index 00000000..78f22642
+-++++--- /dev/null
+-+++++++ b/patches/0001-20251104commit.patch
+-++++@@ -0,0 +1,1272 @@
+-+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++++From: Pinoeer-kingxi <13022943007@163.com>
+-+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++++Subject: [PATCH 1/3] 20251104commit
+-+++++
+-+++++---
+-+++++ mindnlp/transformers/cache_utils.py           |  28 +-
+-+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++++
+-+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++++index cadd2e04..02f8d4be 100644
+-+++++--- a/mindnlp/transformers/cache_utils.py
+-++++++++ b/mindnlp/transformers/cache_utils.py
+-+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++++             # k_out[:, :, cache_position] = key_states
+-+++++             # v_out[:, :, cache_position] = value_states
+-+++++-            if ON_ORANGE_PI:
+-+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-            else:
+-+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-
+-++++++            # if ON_ORANGE_PI:
+-++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++++            # else:
+-++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-++++++            if cache_position.ndim > 1:
+-++++++                cache_position = cache_position.flatten()
+-++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-++++++                cache_position = cache_position.int()
+-++++++            
+-++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-++++++            k_out[:, :, cache_position] = key_states
+-++++++            v_out[:, :, cache_position] = value_states
+-++++++            
+-+++++         return k_out, v_out
+-+++++ 
+-+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++index c695b944..d8303e45 100644
+-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++++ def rotate_half(x):
+-+++++     """Rotates half the hidden dims of the input."""
+-+++++-    x1 = x[..., : x.shape[-1] // 2]
+-+++++-    x2 = x[..., x.shape[-1] // 2 :]
+-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++++    # x1 = x[..., : x.shape[-1] // 2]
+-++++++    # x2 = x[..., x.shape[-1] // 2 :]
+-++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++     return ops.cat((-x2, x1), dim=-1)
+-+++++ 
+-+++++ 
+-+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++++         if self.training:
+-+++++             raise NotImplementedError("Training is not supported yet.")
+-+++++         else:
+-+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++-        if self.config.n_shared_experts is not None:
+-+++++-            y = y + self.shared_experts(identity)
+-+++++-        return y
+-++++++            # @lwx
+-++++++            if orig_shape[1] == 1:
+-++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-++++++                y=y.view(*orig_shape)
+-++++++                if self.config.n_shared_experts is not None:
+-++++++                    y = y + self.shared_experts(identity)
+-++++++                return y
+-++++++            else:
+-++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-++++++                if self.config.n_shared_experts is not None:
+-++++++                    y = y + self.shared_experts(identity)
+-++++++                return y
+-++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++++        # if self.config.n_shared_experts is not None:
+-++++++        #     y = y + self.shared_experts(identity)
+-++++++        # return y
+-++++++        
+-++++++    @no_grad()
+-++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++++
+-++++++        expert_cache = ops.zeros_like(x)
+-++++++        for i in range(self.num_experts_per_tok):
+-++++++            expert_id = flat_expert_indices[i].item()
+-++++++            weight = flat_expert_weights[i].item()
+-++++++            expert = self.experts[expert_id]
+-++++++            expert_out = expert(x)
+-++++++            expert_cache += expert_out * weight
+-++++++        return expert_cache
+-+++++ 
+-+++++     @no_grad()
+-+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-        # expert_cache = torch.zeros_like(x)
+-+++++-        # idxs = flat_expert_indices.argsort()
+-+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++-        # token_idxs = idxs // self.num_experts_per_tok
+-+++++-        # for i, end_idx in enumerate(tokens_per_expert):
+-+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++-        #     if start_idx == end_idx:
+-+++++-        #         continue
+-+++++-        #     expert = self.experts[i]
+-+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-        #     expert_tokens = x[exp_token_idx]
+-+++++-        #     expert_out = expert(expert_tokens)
+-+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++-        # return expert_cache
+-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++         expert_cache = ops.zeros_like(x)
+-+++++         idxs = flat_expert_indices.argsort()
+-+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++         token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-+++++         for i, end_idx in enumerate(tokens_per_expert):
+-+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++             if start_idx == end_idx:
+-+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++++             expert_out = expert(expert_tokens)
+-+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++++
+-+++++         return expert_cache
+-++++++        
+-++++++    # @no_grad()
+-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++    #     # expert_cache = torch.zeros_like(x)
+-++++++    #     # idxs = flat_expert_indices.argsort()
+-++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++++    #     #     if start_idx == end_idx:
+-++++++    #     #         continue
+-++++++    #     #     expert = self.experts[i]
+-++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #     #     expert_tokens = x[exp_token_idx]
+-++++++    #     #     expert_out = expert(expert_tokens)
+-++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++++    #     # return expert_cache
+-++++++    #     expert_cache = ops.zeros_like(x)
+-++++++    #     idxs = flat_expert_indices.argsort()
+-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++    #         if start_idx == end_idx:
+-++++++    #             continue
+-++++++    #         expert = self.experts[i]
+-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #         expert_tokens = x[exp_token_idx]
+-++++++    #         expert_out = expert(expert_tokens)
+-++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++++
+-++++++    #     return expert_cache
+-++++++    # @no_grad()
+-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++    #     expert_cache = ops.zeros_like(x)
+-++++++
+-++++++    #     # 排序保证顺序一致
+-++++++    #     idxs = flat_expert_indices.argsort()
+-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++    #     # 找出有 token 的专家
+-++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++++
+-++++++    #     for i in active_experts.tolist():
+-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++    #         end_idx = tokens_per_expert[i]
+-++++++    #         if start_idx == end_idx:  # 没有 token
+-++++++    #             continue
+-++++++
+-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++    #         expert_tokens = x[exp_token_idx]
+-++++++    #         expert_out = self.experts[i](expert_tokens)
+-++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++++
+-++++++    #         expert_cache = mindspore.mint.scatter_add(
+-++++++    #             expert_cache,
+-++++++    #             0,
+-++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++++    #             expert_out
+-++++++    #         )
+-++++++
+-++++++    #     return expert_cache
+-++++++
+-++++++
+-+++++ 
+-+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++++ #     """
+-+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++ 
+-+++++         # Initialize weights and apply final processing
+-+++++         self.post_init()
+-++++++        self.warm_up = False
+-++++++
+-++++++    def warmup_moe_model_deep(self):
+-++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-++++++        test_texts = [
+-++++++            "warmup short",
+-++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-++++++        ]
+-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++++        if tokenizer is None:
+-++++++            from mindnlp.transformers import AutoTokenizer
+-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++++            self._warmup_tokenizer = tokenizer
+-++++++
+-++++++        for text in test_texts:
+-++++++            inputs = tokenizer(text, return_tensors="ms")
+-++++++            with mindspore._no_grad():
+-++++++                _ = self(**inputs, use_cache=False)
+-++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++++ 
+-+++++     def get_input_embeddings(self):
+-+++++         return self.model.embed_tokens
+-+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++         ```"""
+-++++++        if not self.warm_up:
+-++++++            self.warm_up = True
+-++++++            self.warmup_moe_model_deep()
+-++++++
+-+++++         output_attentions = (
+-+++++             output_attentions
+-+++++             if output_attentions is not None
+-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++index 3cbf820e..d4c6b651 100644
+-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++@@ -18,7 +18,6 @@
+-+++++ # See the License for the specific language governing permissions and
+-+++++ # limitations under the License.
+-+++++ """MindSpore Qwen2MoE model."""
+-+++++-
+-+++++ import math
+-+++++ from typing import List, Optional, Tuple, Union
+-+++++ 
+-+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++++     TokenClassifierOutput,
+-+++++ )
+-+++++ from ...modeling_utils import PreTrainedModel
+-++++++from ...generation import GenerationMixin
+-+++++ from ....utils import logging
+-+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++++ 
+-+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++++         self.variance_epsilon = eps
+-+++++ 
+-+++++     def forward(self, hidden_states):
+-++++++        # @dwj
+-++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++++        # @lwx
+-++++++        # if not self.training :
+-++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++         input_dtype = hidden_states.dtype
+-+++++         hidden_states = hidden_states.to(mindspore.float32)
+-+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++++@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++++     """Rotates half the hidden dims of the input."""
+-+++++     x1 = x[..., : x.shape[-1] // 2]
+-+++++     x2 = x[..., x.shape[-1] // 2 :]
+-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++     return ops.cat((-x2, x1), dim=-1)
+-+++++ 
+-+++++ 
+-+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++++         self.config = config
+-+++++         self.hidden_size = config.hidden_size
+-+++++         self.intermediate_size = intermediate_size
+-++++++        
+-+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++++         self.act_fn = ACT2FN[config.hidden_act]
+-+++++ 
+-+++++     def forward(self, x):
+-+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++-
+-+++++ 
+-++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++++        # @lwx 
+-++++++        # gate_up_output = self.gate_up_proj(x)
+-++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-++++++        # return self.down_proj(swiglu_output)
+-++++++
+-++++++    # def forward(self, x):
+-++++++    #     gate_proj_out = self.gate_proj(x)
+-++++++    #     up_proj_out = self.up_proj(x)
+-++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-++++++    #     return self.down_proj(swiglu_out)
+-++++++        
+-+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++++     """
+-+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         use_cache: bool = False,
+-+++++         cache_position: Optional[mindspore.Tensor] = None,
+-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++        
+-++++++
+-+++++         bsz, q_len, _ = hidden_states.shape
+-+++++ 
+-+++++         query_states = self.q_proj(hidden_states)
+-+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++                     "with a layer index."
+-+++++                 )
+-+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++            if isinstance(past_key_value, StaticCache):
+-++++++                kv_seq_len = key_states.shape[-2]
+-++++++            else:
+-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++ 
+-+++++         if past_key_value is not None:
+-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++            
+-++++++            if isinstance(past_key_value, StaticCache):
+-++++++                kv_seq_len = key_states.shape[-2]
+-+++++ 
+-+++++         # repeat k/v heads if n_kv_heads < n_heads
+-+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++-
+-++++++        
+-+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++ 
+-+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++++-            raise ValueError(
+-+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++++-                f" {attn_weights.shape}"
+-+++++-            )
+-+++++-
+-+++++-        if attention_mask is not None:  # no matter the length, we just slice it
+-+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-++++++        if attention_mask is not None:
+-++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++             attn_weights = attn_weights + causal_mask
+-+++++ 
+-+++++         # upcast attention to fp32
+-+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++ 
+-+++++         attn_output = self.o_proj(attn_output)
+-+++++-
+-++++++        # @lwx
+-++++++        
+-++++++        # max_seq_len = self.max_position_embeddings  # 2048
+-++++++
+-++++++        # if attention_mask is not None:
+-++++++        #     # attention_mask: [B, 1, Sq, Sk]
+-++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++++
+-++++++        #     # pad 到 [max_seq_len, max_seq_len]
+-++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++++        #     global_attention_mask = padded_mask
+-++++++        # else:
+-++++++        #     global_attention_mask = None
+-++++++
+-++++++
+-++++++        # sparse_mode=3
+-++++++        # attn_output = mindspore.ops.flash_attention_score(
+-++++++        #     query=query_states,
+-++++++        #     key=key_states,
+-++++++        #     value=value_states,
+-++++++        #     real_shift=None,
+-++++++        #     padding_mask=None,
+-++++++
+-++++++        #     head_num=self.num_heads,
+-++++++        #     attn_mask=global_attention_mask,
+-++++++        #     keep_prob=1.0 - self.attention_dropout,
+-++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++        #     input_layout="BNSD",
+-++++++        #     pre_tokens=2147483647,
+-++++++        #     next_tokens=2147483647,
+-++++++        #     inner_precise=0,
+-++++++        #     drop_mask=None,
+-++++++        #     prefix=None,
+-++++++        #     actual_seq_qlen=None,
+-++++++        #     actual_seq_kvlen=None,
+-++++++        #     sparse_mode=sparse_mode,
+-++++++        # )
+-+++++         if not output_attentions:
+-+++++             attn_weights = None
+-+++++ 
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++ 
+-++++++class Qwen2MoeFlashAttention(nn.Module):
+-++++++    """
+-++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++++
+-++++++    关键改动:
+-++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++++       直接传入原始的 key 和 value 张量效率更高。
+-++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++++    """
+-++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++++        super().__init__()
+-++++++        self.config = config
+-++++++        self.layer_idx = layer_idx
+-++++++        self.hidden_size = config.hidden_size
+-++++++        self.num_heads = config.num_attention_heads
+-++++++        self.head_dim = self.hidden_size // self.num_heads
+-++++++        self.num_key_value_heads = config.num_key_value_heads
+-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++        self.max_position_embeddings = config.max_position_embeddings
+-++++++        self.rope_theta = config.rope_theta
+-++++++        self.attention_dropout = config.attention_dropout
+-++++++
+-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++            raise ValueError(
+-++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++++            )
+-++++++
+-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++++
+-++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++++            self.head_dim,
+-++++++            max_position_embeddings=self.max_position_embeddings,
+-++++++            base=self.rope_theta,
+-++++++        )
+-++++++
+-++++++    def forward(
+-++++++        self,
+-++++++        hidden_states: mindspore.Tensor,
+-++++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++++        past_key_value: Optional[Cache] = None,
+-++++++        output_attentions: bool = False,
+-++++++        use_cache: bool = False,
+-++++++        cache_position: Optional[mindspore.Tensor] = None,
+-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++        bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++        # 1. 线性投射 Q, K, V
+-++++++        query_states = self.q_proj(hidden_states)
+-++++++        key_states = self.k_proj(hidden_states)
+-++++++        value_states = self.v_proj(hidden_states)
+-++++++
+-++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++        # 3. RoPE 旋转位置编码
+-++++++        kv_seq_len = key_states.shape[-2]
+-++++++        if past_key_value is not None:
+-++++++            if self.layer_idx is None:
+-++++++                raise ValueError(
+-++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++                    "with a layer index."
+-++++++                )
+-++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++++                if cache_position.shape[0] == 1:
+-++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++++                    kv_seq_len = past_seen_tokens + 1
+-++++++                else:
+-++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++++            else:
+-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++        # 4. KV 缓存更新
+-++++++        if past_key_value is not None:
+-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++            key_states, value_states = past_key_value.update(
+-++++++                key_states, value_states, self.layer_idx, cache_kwargs
+-++++++            )
+-++++++            
+-++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++                if cache_position.shape[0] == 1:
+-++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++++                    kv_seq_len = key_states.shape[-2]
+-++++++
+-++++++        # 5. [重要] 准备 Attention Mask
+-++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++++        fa_attention_mask = None
+-++++++        if attention_mask is not None:
+-++++++            # 截取与当前key长度匹配的部分
+-++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++++            fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++++        input_dtype = query_states.dtype
+-++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++++            query_states = query_states.to(mindspore.float16)
+-++++++            key_states = key_states.to(mindspore.float16)
+-++++++            value_states = value_states.to(mindspore.float16)
+-++++++
+-++++++        # 6. [核心] 调用 flash_attention_score 算子
+-++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++++        attn_output = mindspore.ops.flash_attention_score(
+-++++++            query=query_states,
+-++++++            key=key_states,
+-++++++            value=value_states,
+-++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++++            attn_mask=fa_attention_mask,
+-++++++            keep_prob=1.0 - self.attention_dropout,
+-++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++            input_layout="BNSD",
+-++++++            sparse_mode=0 # 使用 defaultMask 模式
+-++++++        )
+-++++++
+-++++++        # 恢复原始数据类型
+-++++++        attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++        # 7. 调整输出形状
+-++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++        attn_output = self.o_proj(attn_output)
+-++++++
+-++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++++        attn_weights = None
+-++++++        if output_attentions:
+-++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++        return attn_output, attn_weights, past_key_value
+-++++++
+-++++++    # def forward(
+-++++++    #     self,
+-++++++    #     hidden_states: mindspore.Tensor,
+-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++++    #     past_key_value: Optional[Cache] = None,
+-++++++    #     output_attentions: bool = False,
+-++++++    #     use_cache: bool = False,
+-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++    #     bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++    #     # 1. 线性投射 Q, K, V
+-++++++    #     query_states = self.q_proj(hidden_states)
+-++++++    #     key_states = self.k_proj(hidden_states)
+-++++++    #     value_states = self.v_proj(hidden_states)
+-++++++
+-++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++    #     # 3. RoPE 旋转位置编码
+-++++++    #     kv_seq_len = key_states.shape[-2]
+-++++++    #     if past_key_value is not None:
+-++++++    #         if self.layer_idx is None:
+-++++++    #             raise ValueError(
+-++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++    #                 "with a layer index."
+-++++++    #             )
+-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++    #     # 4. KV 缓存更新
+-++++++    #     if past_key_value is not None:
+-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++    #         key_states, value_states = past_key_value.update(
+-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++++    #         )
+-++++++
+-++++++    #     # 5. 准备 Attention Mask
+-++++++    #     fa_attention_mask = None
+-++++++    #     if attention_mask is not None:
+-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++    #         fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++++    #     input_dtype = query_states.dtype
+-++++++
+-++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++++    #         query=query_states,
+-++++++    #         key=key_states,
+-++++++    #         value=value_states,
+-++++++    #         head_num=self.num_heads,
+-++++++    #         attn_mask=fa_attention_mask,
+-++++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++    #         input_layout="BNSD",
+-++++++    #         sparse_mode=0,
+-++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++++    #         inner_precise=1
+-++++++    #     )
+-++++++
+-++++++    #     # 恢复原始数据类型
+-++++++    #     attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++    #     # 7. 调整输出形状
+-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++    #     attn_output = self.o_proj(attn_output)
+-++++++
+-++++++    #     attn_weights = None
+-++++++    #     if output_attentions:
+-++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++    #     return attn_output, attn_weights, past_key_value
+-++++++
+-++++++    # def forward(
+-++++++    #     self,
+-++++++    #     hidden_states: mindspore.Tensor,
+-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-++++++    #     past_key_value: Optional[Cache] = None,
+-++++++    #     output_attentions: bool = False,
+-++++++    #     use_cache: bool = False,
+-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++    #     bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++    #     query_states = self.q_proj(hidden_states)
+-++++++    #     key_states = self.k_proj(hidden_states)
+-++++++    #     value_states = self.v_proj(hidden_states)
+-++++++
+-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++    #     kv_seq_len = key_states.shape[-2]
+-++++++    #     if past_key_value is not None:
+-++++++    #         if self.layer_idx is None:
+-++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++    #     if past_key_value is not None:
+-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++    #         key_states, value_states = past_key_value.update(
+-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++++    #         )
+-++++++
+-++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++
+-++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-++++++    #     # <--- 修改结束 ---
+-++++++
+-++++++    #     fa_attention_mask = None
+-++++++    #     if attention_mask is not None:
+-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++    #         fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++    #     input_dtype = query_states.dtype
+-++++++
+-++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-++++++    #         key=key_states,
+-++++++    #         value=value_states,
+-++++++    #         head_num=self.num_heads,
+-++++++    #         attn_mask=fa_attention_mask,
+-++++++    #         keep_prob=1.0 - self.attention_dropout,
+-++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-++++++    #         input_layout="BNSD",
+-++++++    #         sparse_mode=0,
+-++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-++++++    #     )
+-++++++
+-++++++    #     attn_output = attn_output.to(input_dtype)
+-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++    #     attn_output = self.o_proj(attn_output)
+-++++++
+-++++++    #     attn_weights = None
+-++++++    #     if output_attentions:
+-++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++++
+-++++++    #     return attn_output, attn_weights, past_key_value
+-++++++
+-+++++ QWEN2MOE_ATTENTION_CLASSES = {
+-+++++     "eager": Qwen2MoeAttention,
+-++++++    "flash-attention": Qwen2MoeFlashAttention,
+-+++++ }
+-+++++ 
+-+++++ 
+-+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++ 
+-++++++    #@dwj
+-++++++    # 只遍历激活的专家，而非全部专家
+-+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++++-        # router_logits: (batch * sequence_length, n_experts)
+-+++++-        router_logits = self.gate(hidden_states)
+-+++++-
+-+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-        if self.norm_topk_prob:
+-+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        # we cast back to the input dtype
+-+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-
+-+++++-        final_hidden_states = ops.zeros(
+-+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++++-        )
+-+++++-
+-+++++-        # One hot encode the selected experts to create an expert mask
+-+++++-        # this will be used to easily index which expert is going to be sollicitated
+-+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++++-
+-+++++-        # Loop over all available experts in the model and perform the computation on each expert
+-+++++-        for expert_idx in range(self.num_experts):
+-+++++-            expert_layer = self.experts[expert_idx]
+-+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++++-
+-+++++-            # Index the correct hidden states and compute the expert hidden state for
+-+++++-            # the current expert. We need to make sure to multiply the output hidden
+-+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++++-            if 0 not in idx.shape:
+-+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++++-
+-+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++++-                # the `top_x` tensor here.
+-+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++++-
+-+++++-        shared_expert_output = self.shared_expert(hidden_states)
+-+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++++-
+-+++++-        final_hidden_states = final_hidden_states + shared_expert_output
+-++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++            num_tokens = hidden_states_reshaped.shape[0]
+-++++++
+-++++++            router_logits = self.gate(hidden_states_reshaped)
+-++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++
+-++++++            if self.norm_topk_prob:
+-++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++            
+-++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++++            flat_selected_experts = selected_experts.flatten()
+-++++++            
+-++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++++            token_indices = broadcasted_token_indices.flatten()
+-++++++            
+-++++++            active_experts = ops.unique(flat_selected_experts)
+-++++++            
+-++++++            for expert_idx_tensor in active_experts:
+-++++++                expert_idx = expert_idx_tensor.item()
+-++++++                expert_layer = self.experts[expert_idx]
+-++++++                
+-++++++                mask = (flat_selected_experts == expert_idx_tensor)
+-++++++                selected_token_indices = token_indices[mask]
+-++++++                selected_routing_weights = routing_weights.flatten()[mask]
+-++++++                
+-++++++                current_states = hidden_states_reshaped[selected_token_indices]
+-++++++                
+-++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++                
+-++++++                final_hidden_states = final_hidden_states.index_add(
+-++++++                    dim=0,
+-++++++                    index=selected_token_indices,
+-++++++                    source=expert_output.to(hidden_states.dtype)
+-++++++                )
+-++++++            
+-++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++++ 
+-+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++-        return final_hidden_states, router_logits
+-++++++            final_hidden_states = final_hidden_states + shared_expert_output
+-++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++++            
+-++++++            return final_hidden_states, router_logits
+-+++++ 
+-+++++ 
+-+++++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++ 
+-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++ 
+-++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++++
+-+++++         if (layer_idx not in config.mlp_only_layers) and (
+-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++++         ):
+-+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++++     _skip_keys_device_placement = "past_key_values"
+-+++++     _supports_cache_class = True
+-++++++#lwx
+-++++++    # _supports_static_cache = True
+-+++++ 
+-+++++     def _init_weights(self, module):
+-+++++         std = self.config.initializer_range
+-+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++++         return causal_mask
+-+++++ 
+-+++++ 
+-+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++     _tied_weights_keys = ["lm_head.weight"]
+-+++++ 
+-+++++     def __init__(self, config):
+-+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++         self.num_experts_per_tok = config.num_experts_per_tok
+-+++++         # Initialize weights and apply final processing
+-+++++         self.post_init()
+-++++++        # @lwx
+-++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-++++++        #     self.generation_config.cache_implementation = "static"
+-++++++        self._warmed_up = False
+-++++++
+-++++++    def warmup_moe_model(self):
+-++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-++++++        test_texts = [
+-++++++            "warmup short",
+-++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-++++++        ]
+-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-++++++        if tokenizer is None:
+-++++++            from mindnlp.transformers import AutoTokenizer
+-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-++++++            self._warmup_tokenizer = tokenizer
+-++++++
+-++++++        for text in test_texts:
+-++++++            inputs = tokenizer(text, return_tensors="ms")
+-++++++            with mindspore._no_grad():
+-++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++++ 
+-+++++     def get_input_embeddings(self):
+-+++++         return self.model.embed_tokens
+-+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++         ```"""
+-++++++        if not self._warmed_up:
+-++++++            self._warmed_up = True
+-++++++            self.warmup_moe_model()
+-+++++ 
+-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++++         output_router_logits = (
+-+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++             }
+-+++++         )
+-+++++         return model_inputs
+-++++++# @lwx
+-++++++    # def _decode_one_tokens_logits(
+-++++++    #     self,
+-++++++    #     cur_token: mindspore.Tensor,
+-++++++    #     input_pos: Optional[mindspore.Tensor],
+-++++++    #     cache_position: mindspore.Tensor,
+-++++++    #     past_key_values: StaticCache,
+-++++++    # ) -> mindspore.Tensor:
+-++++++    #     """
+-++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-++++++        
+-++++++    #     Args:
+-++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-++++++    #         input_pos: 输入位置信息，可选
+-++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-++++++            
+-++++++    #     Returns:
+-++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-++++++    #     """
+-++++++    #     # 调用JIT编译的版本
+-++++++    #     return self.get_decode_one_tokens_logits(
+-++++++    #         cur_token=cur_token,
+-++++++    #         input_pos=input_pos,
+-++++++    #         cache_position=cache_position,
+-++++++    #         past_key_values=past_key_values,
+-++++++    #     )
+-++++++    
+-++++++    # @mindspore.jit(jit_level='O1')
+-++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-++++++    #     """
+-++++++    #     JIT编译的函数，用于高效的单token解码
+-++++++    #     使用JIT编译优化以支持静态shape和高效执行
+-++++++        
+-++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-++++++    #     """
+-++++++    #     outputs = self.model.forward(
+-++++++    #         input_ids=cur_token,
+-++++++    #         position_ids=input_pos,
+-++++++    #         cache_position=cache_position,
+-++++++    #         past_key_values=past_key_values,
+-++++++    #         use_cache=True,
+-++++++    #         return_dict=False,
+-++++++    #     )
+-++++++        
+-++++++    #     hidden_states = outputs[0]
+-++++++    #     logits = self.lm_head.forward(hidden_states)
+-++++++    #     logits = logits.float()
+-++++++        
+-++++++    #     return logits[:, -1, :]
+-++++++
+-++++++    # def _sample(
+-++++++    #     self,
+-++++++    #     input_ids: mindspore.Tensor,
+-++++++    #     logits_processor,
+-++++++    #     stopping_criteria,
+-++++++    #     generation_config,
+-++++++    #     synced_devices: bool,
+-++++++    #     streamer=None,
+-++++++    #     logits_warper=None,
+-++++++    #     **model_kwargs,
+-++++++    # ):
+-++++++    #     """
+-++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-++++++    #     """
+-++++++    #     from ...generation.logits_process import LogitsProcessorList
+-++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-++++++    #     from mindnlp.core import nn, ops, no_grad
+-++++++    #     import numpy as np
+-++++++        
+-++++++    #     # 检查是否使用 StaticCache
+-++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-++++++    #     # 否则，直接调用父类方法
+-++++++    #     past_key_values = model_kwargs.get("past_key_values")
+-++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-++++++
+-++++++    #     if not isinstance(past_key_values, StaticCache):
+-++++++    #         # 不使用 StaticCache，直接调用父类方法
+-++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-++++++    #         return super()._sample(
+-++++++    #             input_ids=input_ids,
+-++++++    #             logits_processor=logits_processor,
+-++++++    #             stopping_criteria=stopping_criteria,
+-++++++    #             generation_config=generation_config,
+-++++++    #             synced_devices=synced_devices,
+-++++++    #             streamer=streamer,
+-++++++    #             logits_warper=logits_warper,
+-++++++    #             **model_kwargs,
+-++++++    #         )
+-++++++        
+-++++++    #     # 使用 StaticCache，进入自定义循环
+-++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-++++++    #     pad_token_id = generation_config._pad_token_tensor
+-++++++    #     output_attentions = generation_config.output_attentions
+-++++++    #     output_hidden_states = generation_config.output_hidden_states
+-++++++    #     output_scores = generation_config.output_scores
+-++++++    #     output_logits = generation_config.output_logits
+-++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-++++++    #     max_length = generation_config.max_length
+-++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-++++++    #     do_sample = generation_config.do_sample
+-++++++        
+-++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-++++++    #         raise ValueError(
+-++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-++++++    #             f"{logits_warper})."
+-++++++    #         )
+-++++++        
+-++++++    #     # init attention / hidden states / scores tuples
+-++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-++++++        
+-++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-++++++    #         encoder_hidden_states = (
+-++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-++++++    #         )
+-++++++        
+-++++++    #     # keep track of which sequences are already finished
+-++++++    #     batch_size, cur_len = input_ids.shape
+-++++++    #     this_peer_finished = False
+-++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-++++++        
+-++++++    #     time_record = []
+-++++++    #     from ....utils.testing_utils import parse_flag_from_env
+-++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-++++++        
+-++++++    #     while self._has_unfinished_sequences(
+-++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-++++++    #     ):
+-++++++    #         if _record_time:
+-++++++    #             import time as time_module
+-++++++    #             infer_start = time_module.time()
+-++++++            
+-++++++    #         # prepare model inputs
+-++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-++++++            
+-++++++    #         # prepare variable output controls
+-++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-++++++            
+-++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-++++++    #         cur_cache_position = model_inputs.get("cache_position")
+-++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-++++++    #         cur_input_ids = model_inputs.get("input_ids")
+-++++++            
+-++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-++++++    #             cur_cache_position is not None and 
+-++++++    #             len(cur_cache_position.shape) > 0 and
+-++++++    #             cur_cache_position.shape[0] == 1 and
+-++++++    #             cur_input_ids is not None and
+-++++++    #             cur_input_ids.shape[1] == 1):
+-++++++    #             # 使用 JIT 优化的单 token 解码
+-++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-++++++    #             if not hasattr(self, '_jit_used'):
+-++++++    #                 self._jit_used = False
+-++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-++++++                
+-++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-++++++    #                 cur_token=cur_input_ids,
+-++++++    #                 input_pos=model_inputs.get("position_ids"),
+-++++++    #                 cache_position=cur_cache_position,
+-++++++    #                 past_key_values=cur_past_key_values,
+-++++++    #             )
+-++++++                
+-++++++    #             # 标记已使用JIT（用于后续判断）
+-++++++    #             if not self._jit_used:
+-++++++    #                 self._jit_used = True
+-++++++                
+-++++++    #             # 构造兼容的输出对象
+-++++++    #             class JitOptimizedOutput:
+-++++++    #                 def __init__(self, logits, config):
+-++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-++++++    #                     self.config = config
+-++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-++++++    #                     self.cross_attentions = None
+-++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-++++++                
+-++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-++++++    #         else:
+-++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-++++++    #             outputs = self(**model_inputs, return_dict=True)
+-++++++            
+-++++++    #         if synced_devices and this_peer_finished:
+-++++++    #             continue
+-++++++            
+-++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-++++++    #         next_token_logits = outputs.logits[:, -1, :]
+-++++++            
+-++++++    #         # pre-process distribution
+-++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-++++++    #         if do_sample:
+-++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-++++++            
+-++++++    #         # Store scores, attentions and hidden_states when required
+-++++++    #         if return_dict_in_generate:
+-++++++    #             if output_scores:
+-++++++    #                 scores += (next_token_scores,)
+-++++++    #             if output_logits:
+-++++++    #                 raw_logits += (next_token_logits,)
+-++++++    #             if output_attentions:
+-++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-++++++    #                 if self.config.is_encoder_decoder:
+-++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-++++++                
+-++++++    #             if output_hidden_states:
+-++++++    #                 hidden = (
+-++++++    #                     outputs.decoder_hidden_states
+-++++++    #                     if self.config.is_encoder_decoder
+-++++++    #                     else outputs.hidden_states
+-++++++    #                 )
+-++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-++++++            
+-++++++    #         # token selection
+-++++++    #         if do_sample:
+-++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-++++++    #         else:
+-++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-++++++            
+-++++++    #         # finished sentences should have their next token be a padding token
+-++++++    #         if has_eos_stopping_criteria:
+-++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-++++++            
+-++++++    #         # update generated ids, model inputs, and length for next step
+-++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-++++++    #         if streamer is not None:
+-++++++    #             streamer.put(next_tokens)
+-++++++            
+-++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-++++++    #             outputs,
+-++++++    #             model_kwargs,
+-++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-++++++    #         )
+-++++++            
+-++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-++++++    #         cur_len += 1
+-++++++            
+-++++++    #         if _record_time:
+-++++++    #             import time as time_module
+-++++++    #             infer_stop = time_module.time()
+-++++++    #             time_record.append(infer_stop - infer_start)
+-++++++            
+-++++++    #         del outputs
+-++++++        
+-++++++    #     average_infer_time = None
+-++++++    #     if time_record:
+-++++++    #         if len(time_record) > 1:
+-++++++    #             time_record.pop(0)
+-++++++    #         average_infer_time = sum(time_record) / len(time_record)
+-++++++    #         print(f'average inference time is: {average_infer_time}')
+-++++++    #         print(f'inference time record: {time_record}')
+-++++++        
+-++++++    #     if streamer is not None:
+-++++++    #         streamer.end()
+-++++++        
+-++++++    #     # 简单判断：打印是否使用了JIT路径
+-++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-++++++    #     else:
+-++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-++++++        
+-++++++    #     if return_dict_in_generate:
+-++++++    #         if self.config.is_encoder_decoder:
+-++++++    #             return GenerateEncoderDecoderOutput(
+-++++++    #                 sequences=input_ids,
+-++++++    #                 scores=scores,
+-++++++    #                 logits=raw_logits,
+-++++++    #                 encoder_attentions=encoder_attentions,
+-++++++    #                 encoder_hidden_states=encoder_hidden_states,
+-++++++    #                 decoder_attentions=decoder_attentions,
+-++++++    #                 cross_attentions=cross_attentions,
+-++++++    #                 decoder_hidden_states=decoder_hidden_states,
+-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++++    #                 average_infer_time=average_infer_time
+-++++++    #             )
+-++++++    #         else:
+-++++++    #             return GenerateDecoderOnlyOutput(
+-++++++    #                 sequences=input_ids,
+-++++++    #                 scores=scores,
+-++++++    #                 logits=raw_logits,
+-++++++    #                 attentions=decoder_attentions,
+-++++++    #                 hidden_states=decoder_hidden_states,
+-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-++++++    #                 average_infer_time=average_infer_time
+-++++++    #             )
+-++++++    #     else:
+-++++++    #         return input_ids
+-++++++            
+-++++++    # def _prepare_cache_for_generation(
+-++++++    #     self,
+-++++++    #     generation_config,
+-++++++    #     model_kwargs,
+-++++++    #     assistant_model,
+-++++++    #     batch_size,
+-++++++    #     max_cache_length,
+-++++++    # ):
+-++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-++++++    #         generation_config.cache_implementation = "static"
+-++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-++++++        
+-++++++    #     if generation_config.cache_implementation == "static":
+-++++++    #         base_required_from_max_length = generation_config.max_length + 1
+-++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-++++++    #         min_cache_size = 50
+-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-++++++    #         else:
+-++++++    #             max_cache_length = max(base_required, min_cache_size)
+-++++++            
+-++++++    #         original_max_cache_length = max_cache_length
+-++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-++++++            
+-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-++++++    #             if max_cache_length > self.config.max_position_embeddings:
+-++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-++++++        
+-++++++    #     result = super()._prepare_cache_for_generation(
+-++++++    #         generation_config=generation_config,
+-++++++    #         model_kwargs=model_kwargs,
+-++++++    #         assistant_model=assistant_model,
+-++++++    #         batch_size=batch_size,
+-++++++    #         max_cache_length=max_cache_length,
+-++++++    #     )
+-++++++        
+-++++++    #     if generation_config.cache_implementation == "static":
+-++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-++++++    #         created_cache = model_kwargs.get(cache_name)
+-++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-++++++        
+-++++++    #     return result
+-++++++
+-++++++
+-++++++
+-+++++ 
+-+++++ 
+-+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++++-- 
+-+++++2.27.0
+-+++++
+-++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
+-++++new file mode 100644
+-++++index 00000000..22b65dd5
+-++++--- /dev/null
+-+++++++ b/patches/0002-20251106commit.patch
+-++++@@ -0,0 +1,3200 @@
+-+++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
+-+++++From: Pinoeer-kingxi <13022943007@163.com>
+-+++++Date: Thu, 6 Nov 2025 09:20:38 +0800
+-+++++Subject: [PATCH 2/3] 20251106commit
+-+++++
+-+++++---
+-+++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
+-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
+-+++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
+-+++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
+-+++++ create mode 100644 patches/0001-20251104commit.patch
+-+++++
+-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++index d8303e45..73773c22 100644
+-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
+-+++++         #     y = y + self.shared_experts(identity)
+-+++++         # return y
+-+++++         
+-++++++    # @no_grad()
+-++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++++
+-++++++    #     expert_cache = ops.zeros_like(x)
+-++++++    #     for i in range(self.num_experts_per_tok):
+-++++++    #         expert_id = flat_expert_indices[i].item()
+-++++++    #         weight = flat_expert_weights[i].item()
+-++++++    #         expert = self.experts[expert_id]
+-++++++    #         expert_out = expert(x)
+-++++++    #         expert_cache += expert_out * weight
+-++++++    #     return expert_cache
+-++++++
+-+++++     @no_grad()
+-+++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-++++++        # x 的 shape: (1, hidden_size)
+-++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
+-++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
+-++++++
+-++++++        # 1. 收集所有需要的专家层
+-++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
+-++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
+-++++++
+-++++++        # 2. 并行计算所有专家的输出
+-++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
+-++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
+-++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
+-++++++
+-++++++        # 3. 使用矩阵乘法进行加权求和
+-++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
+-++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
+-++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
+-++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
+-++++++        
+-++++++        return final_output
+-+++++ 
+-+++++-        expert_cache = ops.zeros_like(x)
+-+++++-        for i in range(self.num_experts_per_tok):
+-+++++-            expert_id = flat_expert_indices[i].item()
+-+++++-            weight = flat_expert_weights[i].item()
+-+++++-            expert = self.experts[expert_id]
+-+++++-            expert_out = expert(x)
+-+++++-            expert_cache += expert_out * weight
+-+++++-        return expert_cache
+-+++++ 
+-+++++     @no_grad()
+-+++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
+-+++++             key_states = self.k_proj(hidden_states)
+-+++++             value_states = self.v_proj(hidden_states)
+-+++++ 
+-+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++++        # @lwx
+-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
+-++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
+-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
+-++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
+-+++++ 
+-+++++         kv_seq_len = key_states.shape[-2]
+-+++++         if past_key_value is not None:
+-+++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++ 
+-++++++# class DeepseekFlashAttention(nn.Module):
+-++++++#     """
+-++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
+-++++++
+-++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
+-++++++#     """
+-++++++
+-++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++++++#         super().__init__()
+-++++++#         self.config = config
+-++++++#         self.layer_idx = layer_idx
+-++++++#         if layer_idx is None:
+-++++++#             logger.warning(
+-++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++++#                 "when creating this class."
+-++++++#             )
+-++++++
+-++++++#         self.attention_dropout = config.attention_dropout
+-++++++#         self.hidden_size = config.hidden_size
+-++++++#         self.num_heads = config.num_attention_heads
+-++++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++++#         self.rope_theta = config.rope_theta
+-++++++#         self.is_causal = True
+-++++++
+-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++#             raise ValueError(
+-++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++++#                 f" and `num_heads`: {self.num_heads})."
+-++++++#             )
+-++++++
+-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++++++#         self._init_rope()
+-++++++
+-++++++#     def _init_rope(self):
+-++++++#         if self.config.rope_scaling is None:
+-++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
+-++++++#                 self.head_dim,
+-++++++#                 max_position_embeddings=self.max_position_embeddings,
+-++++++#                 base=self.rope_theta,
+-++++++#             )
+-++++++#         else:
+-++++++#             scaling_type = self.config.rope_scaling["type"]
+-++++++#             scaling_factor = self.config.rope_scaling["factor"]
+-++++++#             if scaling_type == "linear":
+-++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++++++#                     self.head_dim,
+-++++++#                     max_position_embeddings=self.max_position_embeddings,
+-++++++#                     scaling_factor=scaling_factor,
+-++++++#                     base=self.rope_theta,
+-++++++#                 )
+-++++++#             elif scaling_type == "dynamic":
+-++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++++++#                     self.head_dim,
+-++++++#                     max_position_embeddings=self.max_position_embeddings,
+-++++++#                     scaling_factor=scaling_factor,
+-++++++#                     base=self.rope_theta,
+-++++++#                 )
+-++++++#             else:
+-++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++++++
+-++++++#     def forward(
+-++++++#         self,
+-++++++#         hidden_states: mindspore.Tensor,
+-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++++#         past_key_value: Optional[Cache] = None,
+-++++++#         output_attentions: bool = False,
+-++++++#         use_cache: bool = False,
+-++++++#         **kwargs,
+-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++#         if "padding_mask" in kwargs:
+-++++++#             warnings.warn(
+-++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++++++#             )
+-++++++        
+-++++++#         if output_attentions:
+-++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
+-++++++
+-++++++#         bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++#         if self.config.pretraining_tp > 1:
+-++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++++++
+-++++++#         query_states = self.q_proj(hidden_states)
+-++++++#         key_states = self.k_proj(hidden_states)
+-++++++#         value_states = self.v_proj(hidden_states)
+-++++++
+-++++++#         # Reshape for multi-head attention
+-++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++#         kv_seq_len = key_states.shape[-2]
+-++++++#         if past_key_value is not None:
+-++++++#             if self.layer_idx is None:
+-++++++#                 raise ValueError(
+-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++#                     "with a layer index."
+-++++++#                 )
+-++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++        
+-++++++#         # Apply Rotary Positional Embedding
+-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++#         if past_key_value is not None:
+-++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
+-++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++
+-++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
+-++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
+-++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++        
+-++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++++        
+-++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
+-++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
+-++++++
+-++++++#         # Convert attention_mask for flash_attention_score
+-++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
+-++++++#         if attention_mask is not None:
+-++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
+-++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++++++#                 raise ValueError(
+-++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++++++#                 )
+-++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
+-++++++#         else:
+-++++++#             attn_mask_for_fa = None
+-++++++        
+-++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++++++
+-++++++#         # Call the fused flash_attention_score operator
+-++++++#         attn_output = mindspore.ops.flash_attention_score(
+-++++++#             query=query_states_for_fa,
+-++++++#             key=key_states_for_fa,
+-++++++#             value=value_states_for_fa,
+-++++++#             head_num=self.num_heads, # This is N1, the number of query heads
+-++++++#             input_layout='BSH',
+-++++++#             attn_mask=attn_mask_for_fa,
+-++++++#             keep_prob=keep_prob,
+-++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++#             sparse_mode=0 # Default mask mode
+-++++++#         )
+-++++++        
+-++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
+-++++++#         attn_output = self.o_proj(attn_output)
+-++++++        
+-++++++#         # Flash Attention does not return attention weights
+-++++++#         attn_weights = None
+-++++++
+-++++++#         return attn_output, attn_weights, past_key_value
+-++++++
+-++++++class DeepseekFlashAttention(nn.Module):
+-++++++    """
+-++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
+-++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
+-++++++    designed for high performance on supported hardware (Ascend).
+-++++++
+-++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
+-++++++    """
+-++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
+-++++++        super().__init__()
+-++++++        self.config = config
+-++++++        self.layer_idx = layer_idx
+-++++++        if layer_idx is None:
+-++++++            logger.warning(
+-++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++++                "when creating this class."
+-++++++            )
+-++++++
+-++++++        # --- [FIX] Correctly initialize all required attributes ---
+-++++++        self.attention_dropout = config.attention_dropout
+-++++++        self.hidden_size = config.hidden_size
+-++++++        self.num_heads = config.num_attention_heads
+-++++++        self.head_dim = self.hidden_size // self.num_heads
+-++++++        self.num_key_value_heads = config.num_key_value_heads
+-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++        self.max_position_embeddings = config.max_position_embeddings
+-++++++        self.rope_theta = config.rope_theta
+-++++++        self.is_causal = True
+-++++++
+-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++            raise ValueError(
+-++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++++                f" and `num_heads`: {self.num_heads})."
+-++++++            )
+-++++++
+-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
+-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
+-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
+-++++++        
+-++++++        # This call will now succeed as all attributes are initialized.
+-++++++        self._init_rope()
+-++++++
+-++++++    def _init_rope(self):
+-++++++        if self.config.rope_scaling is None:
+-++++++            self.rotary_emb = DeepseekRotaryEmbedding(
+-++++++                self.head_dim,
+-++++++                max_position_embeddings=self.max_position_embeddings,
+-++++++                base=self.rope_theta,
+-++++++            )
+-++++++        else:
+-++++++            scaling_type = self.config.rope_scaling["type"]
+-++++++            scaling_factor = self.config.rope_scaling["factor"]
+-++++++            if scaling_type == "linear":
+-++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
+-++++++                    self.head_dim,
+-++++++                    max_position_embeddings=self.max_position_embeddings,
+-++++++                    scaling_factor=scaling_factor,
+-++++++                    base=self.rope_theta,
+-++++++                )
+-++++++            elif scaling_type == "dynamic":
+-++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
+-++++++                    self.head_dim,
+-++++++                    max_position_embeddings=self.max_position_embeddings,
+-++++++                    scaling_factor=scaling_factor,
+-++++++                    base=self.rope_theta,
+-++++++                )
+-++++++            else:
+-++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
+-++++++
+-++++++    def forward(
+-++++++        self,
+-++++++        hidden_states: mindspore.Tensor,
+-++++++        attention_mask: Optional[mindspore.Tensor] = None,
+-++++++        position_ids: Optional[mindspore.Tensor] = None,
+-++++++        past_key_value: Optional[Cache] = None,
+-++++++        output_attentions: bool = False,
+-++++++        use_cache: bool = False,
+-++++++        **kwargs,
+-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++        if "padding_mask" in kwargs:
+-++++++            warnings.warn(
+-++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
+-++++++            )
+-++++++        if output_attentions:
+-++++++            warnings.warn(
+-++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
+-++++++            )
+-++++++
+-++++++        bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++        if self.config.pretraining_tp > 1:
+-++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
+-++++++
+-++++++        query_states = self.q_proj(hidden_states)
+-++++++        key_states = self.k_proj(hidden_states)
+-++++++        value_states = self.v_proj(hidden_states)
+-++++++
+-++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
+-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++        kv_seq_len = key_states.shape[-2]
+-++++++        if past_key_value is not None:
+-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++        
+-++++++        # Apply Rotary Position Embedding
+-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++        if past_key_value is not None:
+-++++++            cache_kwargs = {"sin": sin, "cos": cos}
+-++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++
+-++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
+-++++++        # So we must explicitly repeat the KV heads.
+-++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++
+-++++++        # Convert attention mask for flash_attention_score
+-++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
+-++++++        if attention_mask is not None:
+-++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
+-++++++                 raise ValueError(
+-++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
+-++++++                )
+-++++++            attn_mask_for_fa = attention_mask < 0
+-++++++        else:
+-++++++            attn_mask_for_fa = None
+-++++++
+-++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
+-++++++
+-++++++        # Call the fused operator using the efficient BNSD layout
+-++++++        attn_output = mindspore.ops.flash_attention_score(
+-++++++            query=query_states,
+-++++++            key=key_states,
+-++++++            value=value_states,
+-++++++            head_num=self.num_heads,
+-++++++            input_layout='BNSD', # Specify the correct layout
+-++++++            attn_mask=attn_mask_for_fa,
+-++++++            keep_prob=keep_prob,
+-++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
+-++++++        )
+-++++++        
+-++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
+-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++        
+-++++++        # Apply output projection
+-++++++        attn_output = self.o_proj(attn_output)
+-++++++
+-++++++        # Flash attention does not return attention weights, so we return None.
+-++++++        attn_weights = None
+-++++++
+-++++++        return attn_output, attn_weights, past_key_value
+-++++++
+-+++++ Deepseek_ATTENTION_CLASSES = {
+-+++++     "eager": DeepseekAttention,
+-++++++    "flash-attention": DeepseekFlashAttention,
+-+++++ }
+-+++++ 
+-+++++ 
+-+++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
+-+++++             config=config, layer_idx=layer_idx
+-+++++         )
+-+++++ 
+-++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
+-++++++            config=config, layer_idx=layer_idx
+-++++++        )
+-++++++
+-+++++         self.mlp = (
+-+++++             DeepseekMoE(config)
+-+++++             if (
+-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++index d4c6b651..bced285c 100644
+-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
+-+++++ 
+-+++++ import mindspore
+-+++++ import mindnlp.core.nn.functional as F
+-+++++-from mindnlp.core import nn, ops
+-++++++from mindnlp.core import nn, ops, no_grad
+-+++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+-+++++ 
+-+++++ from ....common.activations import ACT2FN
+-+++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
+-+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+++++ 
+-++++++Long_Prompt = False
+-++++++PROMPT_LENGTH_THRESHOLD = 128
+-+++++ 
+-+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++ 
+-++++++# class Qwen2MoeFlashAttention(nn.Module):
+-++++++#     """
+-++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-++++++
+-++++++#     关键改动:
+-++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-++++++#        直接传入原始的 key 和 value 张量效率更高。
+-++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++++#         super().__init__()
+-++++++#         self.config = config
+-++++++#         self.layer_idx = layer_idx
+-++++++#         self.hidden_size = config.hidden_size
+-++++++#         self.num_heads = config.num_attention_heads
+-++++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++++#         self.rope_theta = config.rope_theta
+-++++++#         self.attention_dropout = config.attention_dropout
+-++++++
+-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++#             raise ValueError(
+-++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-++++++#             )
+-++++++
+-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++++
+-++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++++#             self.head_dim,
+-++++++#             max_position_embeddings=self.max_position_embeddings,
+-++++++#             base=self.rope_theta,
+-++++++#         )
+-++++++
+-++++++#     def forward(
+-++++++#         self,
+-++++++#         hidden_states: mindspore.Tensor,
+-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++++#         past_key_value: Optional[Cache] = None,
+-++++++#         output_attentions: bool = False,
+-++++++#         use_cache: bool = False,
+-++++++#         cache_position: Optional[mindspore.Tensor] = None,
+-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++#         bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++#         # 1. 线性投射 Q, K, V
+-++++++#         query_states = self.q_proj(hidden_states)
+-++++++#         key_states = self.k_proj(hidden_states)
+-++++++#         value_states = self.v_proj(hidden_states)
+-++++++
+-++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
+-++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++#         # 3. RoPE 旋转位置编码
+-++++++#         kv_seq_len = key_states.shape[-2]
+-++++++#         if past_key_value is not None:
+-++++++#             if self.layer_idx is None:
+-++++++#                 raise ValueError(
+-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++#                     "with a layer index."
+-++++++#                 )
+-++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
+-++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-++++++#                 if cache_position.shape[0] == 1:
+-++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-++++++#                     kv_seq_len = past_seen_tokens + 1
+-++++++#                 else:
+-++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
+-++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-++++++#             else:
+-++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++#         # 4. KV 缓存更新
+-++++++#         if past_key_value is not None:
+-++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++#             key_states, value_states = past_key_value.update(
+-++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
+-++++++#             )
+-++++++            
+-++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-++++++#                 if cache_position.shape[0] == 1:
+-++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-++++++#                     kv_seq_len = key_states.shape[-2]
+-++++++
+-++++++#         # 5. [重要] 准备 Attention Mask
+-++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++++#         fa_attention_mask = None
+-++++++#         if attention_mask is not None:
+-++++++#             # 截取与当前key长度匹配的部分
+-++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
+-++++++#             fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-++++++#         input_dtype = query_states.dtype
+-++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-++++++#             query_states = query_states.to(mindspore.float16)
+-++++++#             key_states = key_states.to(mindspore.float16)
+-++++++#             value_states = value_states.to(mindspore.float16)
+-++++++
+-++++++#         # 6. [核心] 调用 flash_attention_score 算子
+-++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++++#         attn_output = mindspore.ops.flash_attention_score(
+-++++++#             query=query_states,
+-++++++#             key=key_states,
+-++++++#             value=value_states,
+-++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
+-++++++#             attn_mask=fa_attention_mask,
+-++++++#             keep_prob=1.0 - self.attention_dropout,
+-++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++#             input_layout="BNSD",
+-++++++#             sparse_mode=0 # 使用 defaultMask 模式
+-++++++#         )
+-++++++
+-++++++#         # 恢复原始数据类型
+-++++++#         attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++#         # 7. 调整输出形状
+-++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++#         attn_output = self.o_proj(attn_output)
+-++++++
+-++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
+-++++++#         attn_weights = None
+-++++++#         if output_attentions:
+-++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++#         return attn_output, attn_weights, past_key_value
+-++++++
+-++++++#     # def forward(
+-++++++#     #     self,
+-++++++#     #     hidden_states: mindspore.Tensor,
+-++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
+-++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
+-++++++#     #     past_key_value: Optional[Cache] = None,
+-++++++#     #     output_attentions: bool = False,
+-++++++#     #     use_cache: bool = False,
+-++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
+-++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++#     #     bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++#     #     # 1. 线性投射 Q, K, V
+-++++++#     #     query_states = self.q_proj(hidden_states)
+-++++++#     #     key_states = self.k_proj(hidden_states)
+-++++++#     #     value_states = self.v_proj(hidden_states)
+-++++++
+-++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++
+-++++++#     #     # 3. RoPE 旋转位置编码
+-++++++#     #     kv_seq_len = key_states.shape[-2]
+-++++++#     #     if past_key_value is not None:
+-++++++#     #         if self.layer_idx is None:
+-++++++#     #             raise ValueError(
+-++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++#     #                 "with a layer index."
+-++++++#     #             )
+-++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++
+-++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++#     #     # 4. KV 缓存更新
+-++++++#     #     if past_key_value is not None:
+-++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-++++++#     #         key_states, value_states = past_key_value.update(
+-++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
+-++++++#     #         )
+-++++++
+-++++++#     #     # 5. 准备 Attention Mask
+-++++++#     #     fa_attention_mask = None
+-++++++#     #     if attention_mask is not None:
+-++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++#     #         fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-++++++#     #     input_dtype = query_states.dtype
+-++++++
+-++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
+-++++++#     #     attn_output = mindspore.ops.flash_attention_score(
+-++++++#     #         query=query_states,
+-++++++#     #         key=key_states,
+-++++++#     #         value=value_states,
+-++++++#     #         head_num=self.num_heads,
+-++++++#     #         attn_mask=fa_attention_mask,
+-++++++#     #         keep_prob=1.0 - self.attention_dropout,
+-++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++#     #         input_layout="BNSD",
+-++++++#     #         sparse_mode=0,
+-++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
+-++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-++++++#     #         inner_precise=1
+-++++++#     #     )
+-++++++
+-++++++#     #     # 恢复原始数据类型
+-++++++#     #     attn_output = attn_output.to(input_dtype)
+-++++++
+-++++++#     #     # 7. 调整输出形状
+-++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++#     #     attn_output = self.o_proj(attn_output)
+-++++++
+-++++++#     #     attn_weights = None
+-++++++#     #     if output_attentions:
+-++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++
+-++++++#     #     return attn_output, attn_weights, past_key_value
+-++++++
+-++++++
+-+++++ class Qwen2MoeFlashAttention(nn.Module):
+-+++++     """
+-+++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++-
+-+++++-    关键改动:
+-+++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++-       直接传入原始的 key 和 value 张量效率更高。
+-+++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
+-++++++
+-++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
+-++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
+-++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
+-++++++    以达到理论上的最高执行速度。
+-+++++     """
+-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++         super().__init__()
+-+++++         self.config = config
+-+++++         self.layer_idx = layer_idx
+-++++++        if layer_idx is None:
+-++++++            logger.warning_once(
+-++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
+-++++++            )
+-++++++
+-+++++         self.hidden_size = config.hidden_size
+-+++++         self.num_heads = config.num_attention_heads
+-+++++         self.head_dim = self.hidden_size // self.num_heads
+-+++++         self.num_key_value_heads = config.num_key_value_heads
+-+++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++         self.max_position_embeddings = config.max_position_embeddings
+-+++++         self.rope_theta = config.rope_theta
+-+++++         self.attention_dropout = config.attention_dropout
+-+++++ 
+-+++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++-            raise ValueError(
+-+++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++-            )
+-+++++-
+-+++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
+-+++++         key_states = self.k_proj(hidden_states)
+-+++++         value_states = self.v_proj(hidden_states)
+-+++++ 
+-+++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-++++++        # 2. 调整形状以匹配 BNSD 布局
+-+++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-
+-+++++-        # 3. RoPE 旋转位置编码
+-++++++        
+-++++++        # 3. RoPE 和 KV 缓存
+-+++++         kv_seq_len = key_states.shape[-2]
+-+++++         if past_key_value is not None:
+-+++++-            if self.layer_idx is None:
+-+++++-                raise ValueError(
+-+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-                    "with a layer index."
+-+++++-                )
+-+++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++-                if cache_position.shape[0] == 1:
+-+++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++-                    kv_seq_len = past_seen_tokens + 1
+-+++++-                else:
+-+++++-                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++-            else:
+-+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-
+-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++        
+-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++ 
+-+++++-        # 4. KV 缓存更新
+-+++++         if past_key_value is not None:
+-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-            key_states, value_states = past_key_value.update(
+-+++++-                key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-            )
+-+++++-            
+-+++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++-                if cache_position.shape[0] == 1:
+-+++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++-                    kv_seq_len = key_states.shape[-2]
+-+++++-
+-+++++-        # 5. [重要] 准备 Attention Mask
+-+++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++
+-++++++        # 4. 准备 Attention Mask
+-+++++         fa_attention_mask = None
+-+++++         if attention_mask is not None:
+-+++++-            # 截取与当前key长度匹配的部分
+-+++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++             fa_attention_mask = (mask_slice != 0)
+-+++++ 
+-+++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++-        input_dtype = query_states.dtype
+-+++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++-            query_states = query_states.to(mindspore.float16)
+-+++++-            key_states = key_states.to(mindspore.float16)
+-+++++-            value_states = value_states.to(mindspore.float16)
+-+++++-
+-+++++-        # 6. [核心] 调用 flash_attention_score 算子
+-+++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
+-+++++         attn_output = mindspore.ops.flash_attention_score(
+-+++++             query=query_states,
+-+++++             key=key_states,
+-+++++             value=value_states,
+-+++++-            head_num=self.num_heads, # 传入Q的头数(N1)
+-++++++            head_num=self.num_heads,
+-+++++             attn_mask=fa_attention_mask,
+-+++++-            keep_prob=1.0 - self.attention_dropout,
+-++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
+-+++++             scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++             input_layout="BNSD",
+-+++++-            sparse_mode=0 # 使用 defaultMask 模式
+-++++++            sparse_mode=0,
+-++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
+-+++++         )
+-+++++ 
+-+++++-        # 恢复原始数据类型
+-+++++-        attn_output = attn_output.to(input_dtype)
+-+++++-
+-+++++-        # 7. 调整输出形状
+-+++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-++++++        # 6. 调整输出形状
+-+++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++         attn_output = self.o_proj(attn_output)
+-+++++ 
+-+++++-        # FlashAttention 算子不直接返回注意力权重矩阵
+-++++++        # 7. 返回结果
+-+++++         attn_weights = None
+-+++++         if output_attentions:
+-+++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+++++ 
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++-    # def forward(
+-+++++-    #     self,
+-+++++-    #     hidden_states: mindspore.Tensor,
+-+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++-    #     past_key_value: Optional[Cache] = None,
+-+++++-    #     output_attentions: bool = False,
+-+++++-    #     use_cache: bool = False,
+-+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-
+-+++++-    #     bsz, q_len, _ = hidden_states.shape
+-+++++-
+-+++++-    #     # 1. 线性投射 Q, K, V
+-+++++-    #     query_states = self.q_proj(hidden_states)
+-+++++-    #     key_states = self.k_proj(hidden_states)
+-+++++-    #     value_states = self.v_proj(hidden_states)
+-+++++-
+-+++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-
+-+++++-    #     # 3. RoPE 旋转位置编码
+-+++++-    #     kv_seq_len = key_states.shape[-2]
+-+++++-    #     if past_key_value is not None:
+-+++++-    #         if self.layer_idx is None:
+-+++++-    #             raise ValueError(
+-+++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-    #                 "with a layer index."
+-+++++-    #             )
+-+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++ 
+-+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++-
+-+++++-    #     # 4. KV 缓存更新
+-+++++-    #     if past_key_value is not None:
+-+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-    #         key_states, value_states = past_key_value.update(
+-+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-    #         )
+-+++++-
+-+++++-    #     # 5. 准备 Attention Mask
+-+++++-    #     fa_attention_mask = None
+-+++++-    #     if attention_mask is not None:
+-+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-    #         fa_attention_mask = (mask_slice != 0)
+-+++++-
+-+++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++-    #     input_dtype = query_states.dtype
+-+++++-
+-+++++-    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++-    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++-    #         query=query_states,
+-+++++-    #         key=key_states,
+-+++++-    #         value=value_states,
+-+++++-    #         head_num=self.num_heads,
+-+++++-    #         attn_mask=fa_attention_mask,
+-+++++-    #         keep_prob=1.0 - self.attention_dropout,
+-+++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++-    #         input_layout="BNSD",
+-+++++-    #         sparse_mode=0,
+-+++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++-    #         inner_precise=1
+-+++++-    #     )
+-+++++-
+-+++++-    #     # 恢复原始数据类型
+-+++++-    #     attn_output = attn_output.to(input_dtype)
+-++++++QWEN2MOE_ATTENTION_CLASSES = {
+-++++++    "eager": Qwen2MoeAttention,
+-++++++    "flash-attention": Qwen2MoeFlashAttention,
+-++++++}
+-+++++ 
+-+++++-    #     # 7. 调整输出形状
+-+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++-    #     attn_output = self.o_proj(attn_output)
+-+++++ 
+-+++++-    #     attn_weights = None
+-+++++-    #     if output_attentions:
+-+++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     def __init__(self, config):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         # gating
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     #@dwj
+-++++++#     # 只遍历激活的专家，而非全部专家
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#             num_tokens = hidden_states_reshaped.shape[0]
+-++++++
+-++++++#             router_logits = self.gate(hidden_states_reshaped)
+-++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++
+-++++++#             if self.norm_topk_prob:
+-++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++            
+-++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-++++++#             flat_selected_experts = selected_experts.flatten()
+-++++++            
+-++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-++++++#             token_indices = broadcasted_token_indices.flatten()
+-++++++            
+-++++++#             active_experts = ops.unique(flat_selected_experts)
+-++++++            
+-++++++#             for expert_idx_tensor in active_experts:
+-++++++#                 expert_idx = expert_idx_tensor.item()
+-++++++#                 expert_layer = self.experts[expert_idx]
+-++++++                
+-++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#                 selected_token_indices = token_indices[mask]
+-++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
+-++++++                
+-++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
+-++++++                
+-++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++                
+-++++++#                 final_hidden_states = final_hidden_states.index_add(
+-++++++#                     dim=0,
+-++++++#                     index=selected_token_indices,
+-++++++#                     source=expert_output.to(hidden_states.dtype)
+-++++++#                 )
+-++++++            
+-++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++++ 
+-+++++-    #     return attn_output, attn_weights, past_key_value
+-++++++#             final_hidden_states = final_hidden_states + shared_expert_output
+-++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++++            
+-++++++#             return final_hidden_states, router_logits
+-++++++
+-++++++
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     """
+-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         # 门控网络
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         # 专家列表
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++#         # 共享专家
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_decode(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         【解码路径】针对 sequence_length=1 的极致优化。
+-++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-++++++#         """
+-++++++#         batch_size, hidden_dim = hidden_states.shape
+-++++++        
+-++++++#         expert_outputs_list = [
+-++++++#             ops.cat([
+-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++++#             ], dim=0) 
+-++++++#             for i in range(batch_size)
+-++++++#         ]
+-++++++        
+-++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-++++++#         # shape: (batch_size, top_k, hidden_dim)
+-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++++        
+-++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++++        
+-++++++#         return moe_output.squeeze(1)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_prefill(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         【预填充路径】针对 sequence_length > 1 的优化。
+-++++++#         按专家对 Token 进行分组，并进行批处理。
+-++++++#         """
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens = hidden_states.shape[0]
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++        
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++        
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++        
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++            
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#             selected_token_indices = token_indices[mask]
+-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++++            
+-++++++#             current_states = hidden_states[selected_token_indices]
+-++++++            
+-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++            
+-++++++#             moe_output = moe_output.index_add(
+-++++++#                 dim=0,
+-++++++#                 index=selected_token_indices,
+-++++++#                 source=expert_output.to(hidden_states.dtype)
+-++++++#             )
+-++++++#         return moe_output
+-++++++
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         顶层 forward 方法，作为智能分发器。
+-++++++#         """
+-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++        
+-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++ 
+-+++++-    # def forward(
+-+++++-    #     self,
+-+++++-    #     hidden_states: mindspore.Tensor,
+-+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++-    #     past_key_value: Optional[Cache] = None,
+-+++++-    #     output_attentions: bool = False,
+-+++++-    #     use_cache: bool = False,
+-+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-
+-+++++-    #     bsz, q_len, _ = hidden_states.shape
+-+++++-
+-+++++-    #     query_states = self.q_proj(hidden_states)
+-+++++-    #     key_states = self.k_proj(hidden_states)
+-+++++-    #     value_states = self.v_proj(hidden_states)
+-+++++-
+-+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-
+-+++++-    #     kv_seq_len = key_states.shape[-2]
+-+++++-    #     if past_key_value is not None:
+-+++++-    #         if self.layer_idx is None:
+-+++++-    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-
+-+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++-
+-+++++-    #     if past_key_value is not None:
+-+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-    #         key_states, value_states = past_key_value.update(
+-+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-    #         )
+-++++++#         if self.norm_topk_prob:
+-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++        
+-++++++#         moe_output = None
+-++++++#         # 在推理时，根据序列长度选择最优路径
+-++++++#         if not self.training:
+-++++++#             if sequence_length == 1:
+-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++++#             else:
+-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++++#         else:
+-++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-++++++#             raise NotImplementedError("Training path is not implemented.")
+-++++++
+-++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-++++++        
+-++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-++++++        
+-++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++#         return final_hidden_states, router_logits
+-++++++
+-++++++
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     """
+-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         # 门控网络
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         # 专家列表
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++#         # 共享专家
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_decode(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         batch_size, _ = hidden_states.shape
+-++++++#         expert_outputs_list = [
+-++++++#             ops.cat([
+-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++++#             ], dim=0) 
+-++++++#             for i in range(batch_size)
+-++++++#         ]
+-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++++#         return moe_output.squeeze(1)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_prefill(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens = hidden_states.shape[0]
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++        
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#             selected_token_indices = token_indices[mask]
+-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++++#             current_states = hidden_states[selected_token_indices]
+-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++#             moe_output = moe_output.index_add(
+-++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++++#             )
+-++++++#         return moe_output
+-++++++
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         顶层 forward 方法，作为智能分发器。
+-++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-++++++#         """
+-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++        
+-++++++#         # 1. 门控计算 (通用逻辑)
+-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++
+-++++++#         if self.norm_topk_prob:
+-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++        
+-++++++#         # 2. 智能分发到最优 MoE 路径
+-++++++#         moe_output = None
+-++++++#         if not self.training:
+-++++++#             if sequence_length == 1:
+-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-++++++#             else:
+-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-++++++#         else:
+-++++++#             raise NotImplementedError("Training path is not implemented.")
+-++++++
+-++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++++        
+-++++++#         # 4. 合并 MoE 输出和共享专家输出
+-++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++++        
+-++++++#         # 5. 恢复原始形状并返回
+-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++#         return final_hidden_states, router_logits
+-++++++
+-++++++# prefill fastest
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     """
+-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         # 门控网络
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         # 专家列表
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++#         # 共享专家
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_dispatch(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-++++++#         """
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens, _ = hidden_states.shape
+-++++++        
+-++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++#         flat_routing_weights = routing_weights.flatten()
+-+++++ 
+-+++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++-
+-+++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++++-    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++++-    #     # <--- 修改结束 ---
+-+++++-
+-+++++-    #     fa_attention_mask = None
+-+++++-    #     if attention_mask is not None:
+-+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-    #         fa_attention_mask = (mask_slice != 0)
+-+++++-
+-+++++-    #     input_dtype = query_states.dtype
+-+++++-
+-+++++-    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++-    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++++-    #         key=key_states,
+-+++++-    #         value=value_states,
+-+++++-    #         head_num=self.num_heads,
+-+++++-    #         attn_mask=fa_attention_mask,
+-+++++-    #         keep_prob=1.0 - self.attention_dropout,
+-+++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++++-    #         input_layout="BNSD",
+-+++++-    #         sparse_mode=0,
+-+++++-    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++++-    #     )
+-++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++ 
+-+++++-    #     attn_output = attn_output.to(input_dtype)
+-+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++-    #     attn_output = self.o_proj(attn_output)
+-++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++        
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++            
+-++++++#             # 找到所有分配给该专家的 token
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++            
+-++++++#             # 使用 mask 选取对应的 token 和权重
+-++++++#             current_token_indices = token_indices[mask]
+-++++++#             current_routing_weights = flat_routing_weights[mask]
+-++++++#             current_hidden_states = hidden_states[current_token_indices]
+-++++++            
+-++++++#             # 对这些 token 进行批处理
+-++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++++            
+-++++++#             # 使用 index_add 将结果精确地加回到对应位置
+-++++++#             moe_output = moe_output.index_add(
+-++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++++#             )
+-++++++#         return moe_output
+-++++++
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         顶层 forward 方法，作为智能分发器。
+-++++++#         """
+-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++        
+-++++++#         # 1. 门控计算
+-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++
+-++++++#         if self.norm_topk_prob:
+-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++        
+-++++++#         # 2. 调用统一的 MoE 计算内核
+-++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++ 
+-+++++-    #     attn_weights = None
+-+++++-    #     if output_attentions:
+-+++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-++++++#         # 3. 统一处理共享专家
+-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++++        
+-++++++#         # 4. 合并输出
+-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++++        
+-++++++#         # 5. 恢复原始形状并返回
+-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++#         return final_hidden_states, router_logits
+-++++++
+-++++++
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     """
+-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-++++++#     【最终高性能与高精度版】：
+-++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-++++++#     3. 这样实现了速度和准确性的两全其美。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_decode(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         【解码路径】极致优化版：bmm + 高精度累加。
+-++++++#         """
+-++++++#         original_dtype = hidden_states.dtype
+-++++++#         batch_size, _ = hidden_states.shape
+-++++++        
+-++++++#         expert_outputs_list = [
+-++++++#             ops.cat([
+-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++++#             ], dim=0) 
+-++++++#             for i in range(batch_size)
+-++++++#         ]
+-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++++
+-++++++#         # 在 float32 下执行 bmm，得到高精度结果
+-++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-++++++        
+-++++++#         # 将高精度结果转换回原始数据类型
+-++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-++++++        
+-++++++#         return moe_output
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_prefill(
+-++++++#         self, 
+-++++++#         hidden_states: mindspore.Tensor, 
+-++++++#         selected_experts: mindspore.Tensor, 
+-++++++#         routing_weights: mindspore.Tensor
+-++++++#     ) -> mindspore.Tensor:
+-++++++#         """
+-++++++#         【预填充路径】与原始实现一致，结果精确。
+-++++++#         """
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens, _ = hidden_states.shape
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++        
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#             selected_token_indices = token_indices[mask]
+-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++++#             current_states = hidden_states[selected_token_indices]
+-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++#             moe_output = moe_output.index_add(
+-++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-++++++#             )
+-++++++#         return moe_output
+-++++++
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++        
+-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++ 
+-+++++-    #     return attn_output, attn_weights, past_key_value
+-++++++#         if self.norm_topk_prob:
+-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-++++++#         # 如果模型主体是 float16，后续再转换
+-++++++        
+-++++++#         moe_output = None
+-++++++#         if not self.training:
+-++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-++++++#             # _moe_infer_decode 内部会处理好类型转换
+-++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++#             if sequence_length == 1:
+-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++++#             else:
+-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-++++++#         else:
+-++++++#             raise NotImplementedError("Training path is not implemented.")
+-++++++
+-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++++        
+-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++#         return final_hidden_states, router_logits
+-++++++    
+-+++++ 
+-+++++-QWEN2MOE_ATTENTION_CLASSES = {
+-+++++-    "eager": Qwen2MoeAttention,
+-+++++-    "flash-attention": Qwen2MoeFlashAttention,
+-+++++-}
+-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++#     """
+-++++++#     【融合版】一个混合专家模块，内置两种推理策略，
+-++++++#     由外部全局变量 `Long_Prompt` 控制：
+-++++++
+-++++++#     - if Long_Prompt is True:  【精度优先模式】
+-++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-++++++#       适用于处理长序列，避免误差累积。
+-++++++
+-++++++#     - if Long_Prompt is False: 【速度优先模式】
+-++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
+-++++++#     """
+-++++++#     def __init__(self, config: Qwen2MoeConfig):
+-++++++#         super().__init__()
+-++++++#         self.num_experts = config.num_experts
+-++++++#         self.top_k = config.num_experts_per_tok
+-++++++#         self.norm_topk_prob = config.norm_topk_prob
+-++++++
+-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-++++++#         self.experts = nn.ModuleList(
+-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-++++++#         )
+-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++
+-++++++#     # --- 速度优先模式的辅助函数 ---
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++#         original_dtype = hidden_states.dtype
+-++++++#         batch_size, _ = hidden_states.shape
+-++++++#         expert_outputs_list = [
+-++++++#             ops.cat([
+-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++++#             ], dim=0) 
+-++++++#             for i in range(batch_size)
+-++++++#         ]
+-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
+-++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++++
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens, _ = hidden_states.shape
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#             selected_token_indices = token_indices[mask]
+-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
+-++++++#             current_states = hidden_states[selected_token_indices]
+-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++++#         return moe_output
+-++++++
+-++++++#     # --- 精度优先模式的辅助函数 ---
+-++++++#     @no_grad()
+-++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++#         moe_output = ops.zeros_like(hidden_states)
+-++++++#         num_tokens, _ = hidden_states.shape
+-++++++#         flat_selected_experts = selected_experts.flatten()
+-++++++#         flat_routing_weights = routing_weights.flatten()
+-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++#         active_experts = ops.unique(flat_selected_experts)
+-++++++#         for expert_idx_tensor in active_experts:
+-++++++#             expert_idx = expert_idx_tensor.item()
+-++++++#             expert_layer = self.experts[expert_idx]
+-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
+-++++++#             current_token_indices = token_indices[mask]
+-++++++#             current_routing_weights = flat_routing_weights[mask]
+-++++++#             current_hidden_states = hidden_states[current_token_indices]
+-++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++++#         return moe_output
+-++++++
+-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
+-++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-++++++#         global Long_Prompt
+-++++++
+-++++++#         # 1. 门控计算 (所有模式通用)
+-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++#         router_logits = self.gate(hidden_states_reshaped)
+-++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++++++#         if self.norm_topk_prob:
+-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++#         moe_output = None
+-++++++#         if not self.training:
+-++++++#             # 根据 Long_Prompt 标志选择模式
+-++++++#             if Long_Prompt:
+-++++++#                 # --- 精度优先模式 ---
+-++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++#             else:
+-++++++#                 # --- 速度优先模式 ---
+-++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++#                 if sequence_length == 1:
+-++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++#                 else:
+-++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++#         else:
+-++++++#             raise NotImplementedError("Training path is not implemented.")
+-++++++
+-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++++        
+-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++#         return final_hidden_states, router_logits
+-++++++    
+-++++++class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++    """
+-++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-++++++    控制的顶级推理策略：
+-+++++ 
+-++++++    - if Long_Prompt is True:  【精度优先模式】
+-++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
+-++++++      适用于需要严格可复现性的长序列任务。
+-+++++ 
+-+++++-class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-    def __init__(self, config):
+-++++++    - if Long_Prompt is False: 【速度优先模式】
+-++++++      采用业界最强的性能组合：
+-++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
+-++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
+-++++++    """
+-++++++    def __init__(self, config: Qwen2MoeConfig):
+-+++++         super().__init__()
+-+++++         self.num_experts = config.num_experts
+-+++++         self.top_k = config.num_experts_per_tok
+-+++++         self.norm_topk_prob = config.norm_topk_prob
+-+++++ 
+-+++++-        # gating
+-+++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++         self.experts = nn.ModuleList(
+-+++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++         )
+-+++++-
+-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++ 
+-+++++-    #@dwj
+-+++++-    # 只遍历激活的专家，而非全部专家
+-+++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-            num_tokens = hidden_states_reshaped.shape[0]
+-+++++-
+-+++++-            router_logits = self.gate(hidden_states_reshaped)
+-+++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-
+-+++++-            if self.norm_topk_prob:
+-+++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-            
+-+++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++-            flat_selected_experts = selected_experts.flatten()
+-+++++-            
+-+++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++-            token_indices = broadcasted_token_indices.flatten()
+-+++++-            
+-+++++-            active_experts = ops.unique(flat_selected_experts)
+-+++++-            
+-+++++-            for expert_idx_tensor in active_experts:
+-+++++-                expert_idx = expert_idx_tensor.item()
+-+++++-                expert_layer = self.experts[expert_idx]
+-+++++-                
+-+++++-                mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-                selected_token_indices = token_indices[mask]
+-+++++-                selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-                
+-+++++-                current_states = hidden_states_reshaped[selected_token_indices]
+-+++++-                
+-+++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-                
+-+++++-                final_hidden_states = final_hidden_states.index_add(
+-+++++-                    dim=0,
+-+++++-                    index=selected_token_indices,
+-+++++-                    source=expert_output.to(hidden_states.dtype)
+-+++++-                )
+-+++++-            
+-+++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
+-++++++    @no_grad()
+-++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++        original_dtype = hidden_states.dtype
+-++++++        batch_size, _ = hidden_states.shape
+-++++++        expert_outputs_list = [
+-++++++            ops.cat([
+-++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-++++++            ], dim=0)
+-++++++            for i in range(batch_size)
+-++++++        ]
+-++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-++++++        weights_fp32 = routing_weights.to(mindspore.float32)
+-++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
+-++++++
+-++++++    @no_grad()
+-++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++        num_tokens, _ = hidden_states.shape
+-++++++        flat_selected_experts = selected_experts.flatten()
+-++++++        sorted_expert_indices = flat_selected_experts.argsort()
+-++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++++++        original_token_indices = sorted_expert_indices // self.top_k
+-++++++        moe_output = ops.zeros_like(hidden_states)
+-++++++        current_token_offset = 0
+-++++++        for i in range(self.num_experts):
+-++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
+-++++++            if expert_token_count == 0:
+-++++++                continue
+-++++++            end_offset = current_token_offset + expert_token_count
+-++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
+-++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++++            current_token_offset += expert_token_count
+-++++++        return moe_output
+-++++++
+-++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-++++++    @no_grad()
+-++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++        moe_output = ops.zeros_like(hidden_states)
+-++++++        num_tokens, _ = hidden_states.shape
+-++++++        flat_selected_experts = selected_experts.flatten()
+-++++++        flat_routing_weights = routing_weights.flatten()
+-++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-++++++        active_experts = ops.unique(flat_selected_experts)
+-++++++        for expert_idx_tensor in active_experts:
+-++++++            expert_idx = expert_idx_tensor.item()
+-++++++            expert_layer = self.experts[expert_idx]
+-++++++            mask = (flat_selected_experts == expert_idx_tensor)
+-++++++            current_token_indices = token_indices[mask]
+-++++++            current_routing_weights = flat_routing_weights[mask]
+-++++++            current_hidden_states = hidden_states[current_token_indices]
+-++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++++        return moe_output
+-+++++ 
+-+++++-            final_hidden_states = final_hidden_states + shared_expert_output
+-+++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++-            
+-+++++-            return final_hidden_states, router_logits
+-++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++        global Long_Prompt
+-++++++
+-++++++        # 1. 门控计算 (所有模式通用)
+-++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-++++++        router_logits = self.gate(hidden_states_reshaped)
+-++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-++++++        if self.norm_topk_prob:
+-++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++        
+-++++++        moe_output = None
+-++++++        if Long_Prompt:
+-++++++            # --- 精度优先模式 (ACCURACY MODE) ---
+-++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        else:
+-++++++            # --- 速度优先模式 (SPEED MODE) ---
+-++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++            if sequence_length == 1:
+-++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++            else:
+-++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        
+-+++++ 
+-++++++        # 3. 共享专家计算与合并 (所有模式通用)
+-++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-++++++        
+-++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-++++++        
+-++++++        return final_hidden_states, router_logits
+-+++++ 
+-+++++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+++++         super().__init__()
+-+++++         self.hidden_size = config.hidden_size
+-++++++        
+-++++++        # if Long_Prompt:
+-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++++        # else:
+-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++ 
+-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++ 
+-+++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++-
+-+++++         if (layer_idx not in config.mlp_only_layers) and (
+-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++++         ):
+-+++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++             self._warmed_up = True
+-+++++             self.warmup_moe_model()
+-+++++ 
+-++++++
+-++++++
+-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++++         output_router_logits = (
+-+++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
+-+++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++             router_logits=outputs.router_logits,
+-+++++         )
+-+++++ 
+-++++++    def generate(self, *args, **kwargs):
+-++++++        """
+-++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-++++++        """
+-++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++++++
+-++++++        input_ids = kwargs.get("input_ids")
+-++++++        if input_ids is None and args:
+-++++++            input_ids = args[0]
+-++++++
+-++++++        if input_ids is not None:
+-++++++            prompt_length = input_ids.shape[1]
+-++++++            
+-++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-++++++                Long_Prompt = True
+-++++++            else:
+-++++++                Long_Prompt = False
+-++++++
+-++++++        return super().generate(*args, **kwargs)
+-++++++    
+-+++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
+-+++++     def prepare_inputs_for_generation(
+-+++++         self,
+-+++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
+-+++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
+-+++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
+-++++++        
+-+++++         if past_key_values is not None:
+-+++++             if inputs_embeds is not None:  # Exception 1
+-+++++                 if 0 not in input_ids.shape:
+-+++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++             }
+-+++++         )
+-+++++         return model_inputs
+-++++++
+-+++++ # @lwx
+-+++++     # def _decode_one_tokens_logits(
+-+++++     #     self,
+-+++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
+-+++++             attentions=outputs.attentions,
+-+++++         )
+-+++++ 
+-++++++
+-+++++ __all__ = [
+-+++++     "Qwen2MoeForCausalLM",
+-+++++     "Qwen2MoeModel",
+-+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++++new file mode 100644
+-+++++index 00000000..6dfb5b93
+-+++++--- /dev/null
+-++++++++ b/patches/0001-20251104commit.patch
+-+++++@@ -0,0 +1,1272 @@
+-++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-++++++From: Pinoeer-kingxi <13022943007@163.com>
+-++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
+-++++++Subject: [PATCH] 20251104commit
+-++++++
+-++++++---
+-++++++ mindnlp/transformers/cache_utils.py           |  28 +-
+-++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
+-++++++
+-++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-++++++index cadd2e04..02f8d4be 100644
+-++++++--- a/mindnlp/transformers/cache_utils.py
+-+++++++++ b/mindnlp/transformers/cache_utils.py
+-++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-++++++             # k_out[:, :, cache_position] = key_states
+-++++++             # v_out[:, :, cache_position] = value_states
+-++++++-            if ON_ORANGE_PI:
+-++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-++++++-            else:
+-++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-++++++-
+-+++++++            # if ON_ORANGE_PI:
+-+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++++            # else:
+-+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++++++            if cache_position.ndim > 1:
+-+++++++                cache_position = cache_position.flatten()
+-+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++++++                cache_position = cache_position.int()
+-+++++++            
+-+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++++++            k_out[:, :, cache_position] = key_states
+-+++++++            v_out[:, :, cache_position] = value_states
+-+++++++            
+-++++++         return k_out, v_out
+-++++++ 
+-++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++index c695b944..d8303e45 100644
+-++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
+-++++++ def rotate_half(x):
+-++++++     """Rotates half the hidden dims of the input."""
+-++++++-    x1 = x[..., : x.shape[-1] // 2]
+-++++++-    x2 = x[..., x.shape[-1] // 2 :]
+-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++++    # x1 = x[..., : x.shape[-1] // 2]
+-+++++++    # x2 = x[..., x.shape[-1] // 2 :]
+-+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++++     return ops.cat((-x2, x1), dim=-1)
+-++++++ 
+-++++++ 
+-++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-++++++         if self.training:
+-++++++             raise NotImplementedError("Training is not supported yet.")
+-++++++         else:
+-++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-++++++-        if self.config.n_shared_experts is not None:
+-++++++-            y = y + self.shared_experts(identity)
+-++++++-        return y
+-+++++++            # @lwx
+-+++++++            if orig_shape[1] == 1:
+-+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++++++                y=y.view(*orig_shape)
+-+++++++                if self.config.n_shared_experts is not None:
+-+++++++                    y = y + self.shared_experts(identity)
+-+++++++                return y
+-+++++++            else:
+-+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++++                if self.config.n_shared_experts is not None:
+-+++++++                    y = y + self.shared_experts(identity)
+-+++++++                return y
+-+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++++        # if self.config.n_shared_experts is not None:
+-+++++++        #     y = y + self.shared_experts(identity)
+-+++++++        # return y
+-+++++++        
+-+++++++    @no_grad()
+-+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++++
+-+++++++        expert_cache = ops.zeros_like(x)
+-+++++++        for i in range(self.num_experts_per_tok):
+-+++++++            expert_id = flat_expert_indices[i].item()
+-+++++++            weight = flat_expert_weights[i].item()
+-+++++++            expert = self.experts[expert_id]
+-+++++++            expert_out = expert(x)
+-+++++++            expert_cache += expert_out * weight
+-+++++++        return expert_cache
+-++++++ 
+-++++++     @no_grad()
+-++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++-        # expert_cache = torch.zeros_like(x)
+-++++++-        # idxs = flat_expert_indices.argsort()
+-++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++++-        # token_idxs = idxs // self.num_experts_per_tok
+-++++++-        # for i, end_idx in enumerate(tokens_per_expert):
+-++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++++-        #     if start_idx == end_idx:
+-++++++-        #         continue
+-++++++-        #     expert = self.experts[i]
+-++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++-        #     expert_tokens = x[exp_token_idx]
+-++++++-        #     expert_out = expert(expert_tokens)
+-++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++++-        # return expert_cache
+-+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++++         expert_cache = ops.zeros_like(x)
+-++++++         idxs = flat_expert_indices.argsort()
+-++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++         token_idxs = idxs // self.num_experts_per_tok
+-+++++++
+-++++++         for i, end_idx in enumerate(tokens_per_expert):
+-++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++             if start_idx == end_idx:
+-++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-++++++             expert_out = expert(expert_tokens)
+-++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++++
+-++++++         return expert_cache
+-+++++++        
+-+++++++    # @no_grad()
+-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++++    #     # expert_cache = torch.zeros_like(x)
+-+++++++    #     # idxs = flat_expert_indices.argsort()
+-+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++++    #     #     if start_idx == end_idx:
+-+++++++    #     #         continue
+-+++++++    #     #     expert = self.experts[i]
+-+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++++    #     #     expert_tokens = x[exp_token_idx]
+-+++++++    #     #     expert_out = expert(expert_tokens)
+-+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++++    #     # return expert_cache
+-+++++++    #     expert_cache = ops.zeros_like(x)
+-+++++++    #     idxs = flat_expert_indices.argsort()
+-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++++
+-+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++++    #         if start_idx == end_idx:
+-+++++++    #             continue
+-+++++++    #         expert = self.experts[i]
+-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++++    #         expert_tokens = x[exp_token_idx]
+-+++++++    #         expert_out = expert(expert_tokens)
+-+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++++
+-+++++++    #     return expert_cache
+-+++++++    # @no_grad()
+-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++++    #     expert_cache = ops.zeros_like(x)
+-+++++++
+-+++++++    #     # 排序保证顺序一致
+-+++++++    #     idxs = flat_expert_indices.argsort()
+-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++++
+-+++++++    #     # 找出有 token 的专家
+-+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++++
+-+++++++    #     for i in active_experts.tolist():
+-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++++    #         end_idx = tokens_per_expert[i]
+-+++++++    #         if start_idx == end_idx:  # 没有 token
+-+++++++    #             continue
+-+++++++
+-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++++    #         expert_tokens = x[exp_token_idx]
+-+++++++    #         expert_out = self.experts[i](expert_tokens)
+-+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++++
+-+++++++    #         expert_cache = mindspore.mint.scatter_add(
+-+++++++    #             expert_cache,
+-+++++++    #             0,
+-+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++++    #             expert_out
+-+++++++    #         )
+-+++++++
+-+++++++    #     return expert_cache
+-+++++++
+-+++++++
+-++++++ 
+-++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-++++++ #     """
+-++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++++ 
+-++++++         # Initialize weights and apply final processing
+-++++++         self.post_init()
+-+++++++        self.warm_up = False
+-+++++++
+-+++++++    def warmup_moe_model_deep(self):
+-+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++++++        test_texts = [
+-+++++++            "warmup short",
+-+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++++++        ]
+-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++++        if tokenizer is None:
+-+++++++            from mindnlp.transformers import AutoTokenizer
+-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++++            self._warmup_tokenizer = tokenizer
+-+++++++
+-+++++++        for text in test_texts:
+-+++++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++++            with mindspore._no_grad():
+-+++++++                _ = self(**inputs, use_cache=False)
+-+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-++++++ 
+-++++++     def get_input_embeddings(self):
+-++++++         return self.model.embed_tokens
+-++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++++         ```"""
+-+++++++        if not self.warm_up:
+-+++++++            self.warm_up = True
+-+++++++            self.warmup_moe_model_deep()
+-+++++++
+-++++++         output_attentions = (
+-++++++             output_attentions
+-++++++             if output_attentions is not None
+-++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++index 3cbf820e..d4c6b651 100644
+-++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++@@ -18,7 +18,6 @@
+-++++++ # See the License for the specific language governing permissions and
+-++++++ # limitations under the License.
+-++++++ """MindSpore Qwen2MoE model."""
+-++++++-
+-++++++ import math
+-++++++ from typing import List, Optional, Tuple, Union
+-++++++ 
+-++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-++++++     TokenClassifierOutput,
+-++++++ )
+-++++++ from ...modeling_utils import PreTrainedModel
+-+++++++from ...generation import GenerationMixin
+-++++++ from ....utils import logging
+-++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
+-++++++ 
+-++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-++++++         self.variance_epsilon = eps
+-++++++ 
+-++++++     def forward(self, hidden_states):
+-+++++++        # @dwj
+-+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++++        # @lwx
+-+++++++        # if not self.training :
+-+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-++++++         input_dtype = hidden_states.dtype
+-++++++         hidden_states = hidden_states.to(mindspore.float32)
+-++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-++++++@@ -234,6 +239,8 @@ def rotate_half(x):
+-++++++     """Rotates half the hidden dims of the input."""
+-++++++     x1 = x[..., : x.shape[-1] // 2]
+-++++++     x2 = x[..., x.shape[-1] // 2 :]
+-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-++++++     return ops.cat((-x2, x1), dim=-1)
+-++++++ 
+-++++++ 
+-++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-++++++         self.config = config
+-++++++         self.hidden_size = config.hidden_size
+-++++++         self.intermediate_size = intermediate_size
+-+++++++        
+-++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-++++++         self.act_fn = ACT2FN[config.hidden_act]
+-++++++ 
+-++++++     def forward(self, x):
+-++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-++++++-
+-++++++ 
+-+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++++        # @lwx 
+-+++++++        # gate_up_output = self.gate_up_proj(x)
+-+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++++++        # return self.down_proj(swiglu_output)
+-+++++++
+-+++++++    # def forward(self, x):
+-+++++++    #     gate_proj_out = self.gate_proj(x)
+-+++++++    #     up_proj_out = self.up_proj(x)
+-+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++++++    #     return self.down_proj(swiglu_out)
+-+++++++        
+-++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-++++++     """
+-++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-++++++         use_cache: bool = False,
+-++++++         cache_position: Optional[mindspore.Tensor] = None,
+-++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++++
+-+++++++        
+-+++++++
+-++++++         bsz, q_len, _ = hidden_states.shape
+-++++++ 
+-++++++         query_states = self.q_proj(hidden_states)
+-++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++                     "with a layer index."
+-++++++                 )
+-++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++++            if isinstance(past_key_value, StaticCache):
+-+++++++                kv_seq_len = key_states.shape[-2]
+-+++++++            else:
+-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++ 
+-++++++         if past_key_value is not None:
+-++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++++            
+-+++++++            if isinstance(past_key_value, StaticCache):
+-+++++++                kv_seq_len = key_states.shape[-2]
+-++++++ 
+-++++++         # repeat k/v heads if n_kv_heads < n_heads
+-++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++-
+-+++++++        
+-++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++++ 
+-++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-++++++-            raise ValueError(
+-++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-++++++-                f" {attn_weights.shape}"
+-++++++-            )
+-++++++-
+-++++++-        if attention_mask is not None:  # no matter the length, we just slice it
+-++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++++++        if attention_mask is not None:
+-+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++++             attn_weights = attn_weights + causal_mask
+-++++++ 
+-++++++         # upcast attention to fp32
+-++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++++ 
+-++++++         attn_output = self.o_proj(attn_output)
+-++++++-
+-+++++++        # @lwx
+-+++++++        
+-+++++++        # max_seq_len = self.max_position_embeddings  # 2048
+-+++++++
+-+++++++        # if attention_mask is not None:
+-+++++++        #     # attention_mask: [B, 1, Sq, Sk]
+-+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++++
+-+++++++        #     # pad 到 [max_seq_len, max_seq_len]
+-+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++++        #     global_attention_mask = padded_mask
+-+++++++        # else:
+-+++++++        #     global_attention_mask = None
+-+++++++
+-+++++++
+-+++++++        # sparse_mode=3
+-+++++++        # attn_output = mindspore.ops.flash_attention_score(
+-+++++++        #     query=query_states,
+-+++++++        #     key=key_states,
+-+++++++        #     value=value_states,
+-+++++++        #     real_shift=None,
+-+++++++        #     padding_mask=None,
+-+++++++
+-+++++++        #     head_num=self.num_heads,
+-+++++++        #     attn_mask=global_attention_mask,
+-+++++++        #     keep_prob=1.0 - self.attention_dropout,
+-+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++++        #     input_layout="BNSD",
+-+++++++        #     pre_tokens=2147483647,
+-+++++++        #     next_tokens=2147483647,
+-+++++++        #     inner_precise=0,
+-+++++++        #     drop_mask=None,
+-+++++++        #     prefix=None,
+-+++++++        #     actual_seq_qlen=None,
+-+++++++        #     actual_seq_kvlen=None,
+-+++++++        #     sparse_mode=sparse_mode,
+-+++++++        # )
+-++++++         if not output_attentions:
+-++++++             attn_weights = None
+-++++++ 
+-++++++         return attn_output, attn_weights, past_key_value
+-++++++ 
+-++++++ 
+-+++++++class Qwen2MoeFlashAttention(nn.Module):
+-+++++++    """
+-+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++++
+-+++++++    关键改动:
+-+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++++       直接传入原始的 key 和 value 张量效率更高。
+-+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++++    """
+-+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++++        super().__init__()
+-+++++++        self.config = config
+-+++++++        self.layer_idx = layer_idx
+-+++++++        self.hidden_size = config.hidden_size
+-+++++++        self.num_heads = config.num_attention_heads
+-+++++++        self.head_dim = self.hidden_size // self.num_heads
+-+++++++        self.num_key_value_heads = config.num_key_value_heads
+-+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++++        self.max_position_embeddings = config.max_position_embeddings
+-+++++++        self.rope_theta = config.rope_theta
+-+++++++        self.attention_dropout = config.attention_dropout
+-+++++++
+-+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++++            raise ValueError(
+-+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++++            )
+-+++++++
+-+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++++
+-+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++++            self.head_dim,
+-+++++++            max_position_embeddings=self.max_position_embeddings,
+-+++++++            base=self.rope_theta,
+-+++++++        )
+-+++++++
+-+++++++    def forward(
+-+++++++        self,
+-+++++++        hidden_states: mindspore.Tensor,
+-+++++++        attention_mask: Optional[mindspore.Tensor] = None,
+-+++++++        position_ids: Optional[mindspore.Tensor] = None,
+-+++++++        past_key_value: Optional[Cache] = None,
+-+++++++        output_attentions: bool = False,
+-+++++++        use_cache: bool = False,
+-+++++++        cache_position: Optional[mindspore.Tensor] = None,
+-+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++++
+-+++++++        bsz, q_len, _ = hidden_states.shape
+-+++++++
+-+++++++        # 1. 线性投射 Q, K, V
+-+++++++        query_states = self.q_proj(hidden_states)
+-+++++++        key_states = self.k_proj(hidden_states)
+-+++++++        value_states = self.v_proj(hidden_states)
+-+++++++
+-+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++
+-+++++++        # 3. RoPE 旋转位置编码
+-+++++++        kv_seq_len = key_states.shape[-2]
+-+++++++        if past_key_value is not None:
+-+++++++            if self.layer_idx is None:
+-+++++++                raise ValueError(
+-+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++++                    "with a layer index."
+-+++++++                )
+-+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++++                if cache_position.shape[0] == 1:
+-+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++++                    kv_seq_len = past_seen_tokens + 1
+-+++++++                else:
+-+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++++            else:
+-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++++
+-+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++++
+-+++++++        # 4. KV 缓存更新
+-+++++++        if past_key_value is not None:
+-+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++++            key_states, value_states = past_key_value.update(
+-+++++++                key_states, value_states, self.layer_idx, cache_kwargs
+-+++++++            )
+-+++++++            
+-+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++++                if cache_position.shape[0] == 1:
+-+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++++                    kv_seq_len = key_states.shape[-2]
+-+++++++
+-+++++++        # 5. [重要] 准备 Attention Mask
+-+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++++        fa_attention_mask = None
+-+++++++        if attention_mask is not None:
+-+++++++            # 截取与当前key长度匹配的部分
+-+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++++            fa_attention_mask = (mask_slice != 0)
+-+++++++
+-+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++++        input_dtype = query_states.dtype
+-+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++++            query_states = query_states.to(mindspore.float16)
+-+++++++            key_states = key_states.to(mindspore.float16)
+-+++++++            value_states = value_states.to(mindspore.float16)
+-+++++++
+-+++++++        # 6. [核心] 调用 flash_attention_score 算子
+-+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++++        attn_output = mindspore.ops.flash_attention_score(
+-+++++++            query=query_states,
+-+++++++            key=key_states,
+-+++++++            value=value_states,
+-+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++++            attn_mask=fa_attention_mask,
+-+++++++            keep_prob=1.0 - self.attention_dropout,
+-+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++++            input_layout="BNSD",
+-+++++++            sparse_mode=0 # 使用 defaultMask 模式
+-+++++++        )
+-+++++++
+-+++++++        # 恢复原始数据类型
+-+++++++        attn_output = attn_output.to(input_dtype)
+-+++++++
+-+++++++        # 7. 调整输出形状
+-+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++++        attn_output = self.o_proj(attn_output)
+-+++++++
+-+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++++        attn_weights = None
+-+++++++        if output_attentions:
+-+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++++
+-+++++++        return attn_output, attn_weights, past_key_value
+-+++++++
+-+++++++    # def forward(
+-+++++++    #     self,
+-+++++++    #     hidden_states: mindspore.Tensor,
+-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++++    #     past_key_value: Optional[Cache] = None,
+-+++++++    #     output_attentions: bool = False,
+-+++++++    #     use_cache: bool = False,
+-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++++
+-+++++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++++
+-+++++++    #     # 1. 线性投射 Q, K, V
+-+++++++    #     query_states = self.q_proj(hidden_states)
+-+++++++    #     key_states = self.k_proj(hidden_states)
+-+++++++    #     value_states = self.v_proj(hidden_states)
+-+++++++
+-+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++
+-+++++++    #     # 3. RoPE 旋转位置编码
+-+++++++    #     kv_seq_len = key_states.shape[-2]
+-+++++++    #     if past_key_value is not None:
+-+++++++    #         if self.layer_idx is None:
+-+++++++    #             raise ValueError(
+-+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++++    #                 "with a layer index."
+-+++++++    #             )
+-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++++
+-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++++
+-+++++++    #     # 4. KV 缓存更新
+-+++++++    #     if past_key_value is not None:
+-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++++    #         key_states, value_states = past_key_value.update(
+-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++++    #         )
+-+++++++
+-+++++++    #     # 5. 准备 Attention Mask
+-+++++++    #     fa_attention_mask = None
+-+++++++    #     if attention_mask is not None:
+-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++++
+-+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++++    #     input_dtype = query_states.dtype
+-+++++++
+-+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++++    #         query=query_states,
+-+++++++    #         key=key_states,
+-+++++++    #         value=value_states,
+-+++++++    #         head_num=self.num_heads,
+-+++++++    #         attn_mask=fa_attention_mask,
+-+++++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++++    #         input_layout="BNSD",
+-+++++++    #         sparse_mode=0,
+-+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++++    #         inner_precise=1
+-+++++++    #     )
+-+++++++
+-+++++++    #     # 恢复原始数据类型
+-+++++++    #     attn_output = attn_output.to(input_dtype)
+-+++++++
+-+++++++    #     # 7. 调整输出形状
+-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++++    #     attn_output = self.o_proj(attn_output)
+-+++++++
+-+++++++    #     attn_weights = None
+-+++++++    #     if output_attentions:
+-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++++
+-+++++++    #     return attn_output, attn_weights, past_key_value
+-+++++++
+-+++++++    # def forward(
+-+++++++    #     self,
+-+++++++    #     hidden_states: mindspore.Tensor,
+-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++++    #     past_key_value: Optional[Cache] = None,
+-+++++++    #     output_attentions: bool = False,
+-+++++++    #     use_cache: bool = False,
+-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++++
+-+++++++    #     bsz, q_len, _ = hidden_states.shape
+-+++++++
+-+++++++    #     query_states = self.q_proj(hidden_states)
+-+++++++    #     key_states = self.k_proj(hidden_states)
+-+++++++    #     value_states = self.v_proj(hidden_states)
+-+++++++
+-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++++
+-+++++++    #     kv_seq_len = key_states.shape[-2]
+-+++++++    #     if past_key_value is not None:
+-+++++++    #         if self.layer_idx is None:
+-+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++++
+-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++++
+-+++++++    #     if past_key_value is not None:
+-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++++    #         key_states, value_states = past_key_value.update(
+-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++++    #         )
+-+++++++
+-+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++++
+-+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++++++    #     # <--- 修改结束 ---
+-+++++++
+-+++++++    #     fa_attention_mask = None
+-+++++++    #     if attention_mask is not None:
+-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++++    #         fa_attention_mask = (mask_slice != 0)
+-+++++++
+-+++++++    #     input_dtype = query_states.dtype
+-+++++++
+-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++++++    #         key=key_states,
+-+++++++    #         value=value_states,
+-+++++++    #         head_num=self.num_heads,
+-+++++++    #         attn_mask=fa_attention_mask,
+-+++++++    #         keep_prob=1.0 - self.attention_dropout,
+-+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++++++    #         input_layout="BNSD",
+-+++++++    #         sparse_mode=0,
+-+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++++++    #     )
+-+++++++
+-+++++++    #     attn_output = attn_output.to(input_dtype)
+-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++++    #     attn_output = self.o_proj(attn_output)
+-+++++++
+-+++++++    #     attn_weights = None
+-+++++++    #     if output_attentions:
+-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++++++
+-+++++++    #     return attn_output, attn_weights, past_key_value
+-+++++++
+-++++++ QWEN2MOE_ATTENTION_CLASSES = {
+-++++++     "eager": Qwen2MoeAttention,
+-+++++++    "flash-attention": Qwen2MoeFlashAttention,
+-++++++ }
+-++++++ 
+-++++++ 
+-++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-++++++ 
+-+++++++    #@dwj
+-+++++++    # 只遍历激活的专家，而非全部专家
+-++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
+-++++++-        # router_logits: (batch * sequence_length, n_experts)
+-++++++-        router_logits = self.gate(hidden_states)
+-++++++-
+-++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-++++++-        if self.norm_topk_prob:
+-++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-++++++-        # we cast back to the input dtype
+-++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
+-++++++-
+-++++++-        final_hidden_states = ops.zeros(
+-++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-++++++-        )
+-++++++-
+-++++++-        # One hot encode the selected experts to create an expert mask
+-++++++-        # this will be used to easily index which expert is going to be sollicitated
+-++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-++++++-
+-++++++-        # Loop over all available experts in the model and perform the computation on each expert
+-++++++-        for expert_idx in range(self.num_experts):
+-++++++-            expert_layer = self.experts[expert_idx]
+-++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-++++++-
+-++++++-            # Index the correct hidden states and compute the expert hidden state for
+-++++++-            # the current expert. We need to make sure to multiply the output hidden
+-++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-++++++-            if 0 not in idx.shape:
+-++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-++++++-
+-++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
+-++++++-                # the `top_x` tensor here.
+-++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-++++++-
+-++++++-        shared_expert_output = self.shared_expert(hidden_states)
+-++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-++++++-
+-++++++-        final_hidden_states = final_hidden_states + shared_expert_output
+-+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++++            num_tokens = hidden_states_reshaped.shape[0]
+-+++++++
+-+++++++            router_logits = self.gate(hidden_states_reshaped)
+-+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++++
+-+++++++            if self.norm_topk_prob:
+-+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++++            
+-+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++++            flat_selected_experts = selected_experts.flatten()
+-+++++++            
+-+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++++            token_indices = broadcasted_token_indices.flatten()
+-+++++++            
+-+++++++            active_experts = ops.unique(flat_selected_experts)
+-+++++++            
+-+++++++            for expert_idx_tensor in active_experts:
+-+++++++                expert_idx = expert_idx_tensor.item()
+-+++++++                expert_layer = self.experts[expert_idx]
+-+++++++                
+-+++++++                mask = (flat_selected_experts == expert_idx_tensor)
+-+++++++                selected_token_indices = token_indices[mask]
+-+++++++                selected_routing_weights = routing_weights.flatten()[mask]
+-+++++++                
+-+++++++                current_states = hidden_states_reshaped[selected_token_indices]
+-+++++++                
+-+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++++                
+-+++++++                final_hidden_states = final_hidden_states.index_add(
+-+++++++                    dim=0,
+-+++++++                    index=selected_token_indices,
+-+++++++                    source=expert_output.to(hidden_states.dtype)
+-+++++++                )
+-+++++++            
+-+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-++++++ 
+-++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-++++++-        return final_hidden_states, router_logits
+-+++++++            final_hidden_states = final_hidden_states + shared_expert_output
+-+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++++            
+-+++++++            return final_hidden_states, router_logits
+-++++++ 
+-++++++ 
+-++++++ class Qwen2MoeDecoderLayer(nn.Module):
+-++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-++++++ 
+-++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-++++++ 
+-+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++++
+-++++++         if (layer_idx not in config.mlp_only_layers) and (
+-++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-++++++         ):
+-++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-++++++     _skip_keys_device_placement = "past_key_values"
+-++++++     _supports_cache_class = True
+-+++++++#lwx
+-+++++++    # _supports_static_cache = True
+-++++++ 
+-++++++     def _init_weights(self, module):
+-++++++         std = self.config.initializer_range
+-++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-++++++         return causal_mask
+-++++++ 
+-++++++ 
+-++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-++++++     _tied_weights_keys = ["lm_head.weight"]
+-++++++ 
+-++++++     def __init__(self, config):
+-++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++++         self.num_experts_per_tok = config.num_experts_per_tok
+-++++++         # Initialize weights and apply final processing
+-++++++         self.post_init()
+-+++++++        # @lwx
+-+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++++++        #     self.generation_config.cache_implementation = "static"
+-+++++++        self._warmed_up = False
+-+++++++
+-+++++++    def warmup_moe_model(self):
+-+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++++++        test_texts = [
+-+++++++            "warmup short",
+-+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++++++        ]
+-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++++        if tokenizer is None:
+-+++++++            from mindnlp.transformers import AutoTokenizer
+-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++++            self._warmup_tokenizer = tokenizer
+-+++++++
+-+++++++        for text in test_texts:
+-+++++++            inputs = tokenizer(text, return_tensors="ms")
+-+++++++            with mindspore._no_grad():
+-+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-++++++ 
+-++++++     def get_input_embeddings(self):
+-++++++         return self.model.embed_tokens
+-++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-++++++         ```"""
+-+++++++        if not self._warmed_up:
+-+++++++            self._warmed_up = True
+-+++++++            self.warmup_moe_model()
+-++++++ 
+-++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-++++++         output_router_logits = (
+-++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-++++++             }
+-++++++         )
+-++++++         return model_inputs
+-+++++++# @lwx
+-+++++++    # def _decode_one_tokens_logits(
+-+++++++    #     self,
+-+++++++    #     cur_token: mindspore.Tensor,
+-+++++++    #     input_pos: Optional[mindspore.Tensor],
+-+++++++    #     cache_position: mindspore.Tensor,
+-+++++++    #     past_key_values: StaticCache,
+-+++++++    # ) -> mindspore.Tensor:
+-+++++++    #     """
+-+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++++++        
+-+++++++    #     Args:
+-+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++++++    #         input_pos: 输入位置信息，可选
+-+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++++++            
+-+++++++    #     Returns:
+-+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++++++    #     """
+-+++++++    #     # 调用JIT编译的版本
+-+++++++    #     return self.get_decode_one_tokens_logits(
+-+++++++    #         cur_token=cur_token,
+-+++++++    #         input_pos=input_pos,
+-+++++++    #         cache_position=cache_position,
+-+++++++    #         past_key_values=past_key_values,
+-+++++++    #     )
+-+++++++    
+-+++++++    # @mindspore.jit(jit_level='O1')
+-+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++++++    #     """
+-+++++++    #     JIT编译的函数，用于高效的单token解码
+-+++++++    #     使用JIT编译优化以支持静态shape和高效执行
+-+++++++        
+-+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++++++    #     """
+-+++++++    #     outputs = self.model.forward(
+-+++++++    #         input_ids=cur_token,
+-+++++++    #         position_ids=input_pos,
+-+++++++    #         cache_position=cache_position,
+-+++++++    #         past_key_values=past_key_values,
+-+++++++    #         use_cache=True,
+-+++++++    #         return_dict=False,
+-+++++++    #     )
+-+++++++        
+-+++++++    #     hidden_states = outputs[0]
+-+++++++    #     logits = self.lm_head.forward(hidden_states)
+-+++++++    #     logits = logits.float()
+-+++++++        
+-+++++++    #     return logits[:, -1, :]
+-+++++++
+-+++++++    # def _sample(
+-+++++++    #     self,
+-+++++++    #     input_ids: mindspore.Tensor,
+-+++++++    #     logits_processor,
+-+++++++    #     stopping_criteria,
+-+++++++    #     generation_config,
+-+++++++    #     synced_devices: bool,
+-+++++++    #     streamer=None,
+-+++++++    #     logits_warper=None,
+-+++++++    #     **model_kwargs,
+-+++++++    # ):
+-+++++++    #     """
+-+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++++++    #     """
+-+++++++    #     from ...generation.logits_process import LogitsProcessorList
+-+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++++++    #     from mindnlp.core import nn, ops, no_grad
+-+++++++    #     import numpy as np
+-+++++++        
+-+++++++    #     # 检查是否使用 StaticCache
+-+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++++++    #     # 否则，直接调用父类方法
+-+++++++    #     past_key_values = model_kwargs.get("past_key_values")
+-+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++++++
+-+++++++    #     if not isinstance(past_key_values, StaticCache):
+-+++++++    #         # 不使用 StaticCache，直接调用父类方法
+-+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++++++    #         return super()._sample(
+-+++++++    #             input_ids=input_ids,
+-+++++++    #             logits_processor=logits_processor,
+-+++++++    #             stopping_criteria=stopping_criteria,
+-+++++++    #             generation_config=generation_config,
+-+++++++    #             synced_devices=synced_devices,
+-+++++++    #             streamer=streamer,
+-+++++++    #             logits_warper=logits_warper,
+-+++++++    #             **model_kwargs,
+-+++++++    #         )
+-+++++++        
+-+++++++    #     # 使用 StaticCache，进入自定义循环
+-+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++++++    #     pad_token_id = generation_config._pad_token_tensor
+-+++++++    #     output_attentions = generation_config.output_attentions
+-+++++++    #     output_hidden_states = generation_config.output_hidden_states
+-+++++++    #     output_scores = generation_config.output_scores
+-+++++++    #     output_logits = generation_config.output_logits
+-+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++++++    #     max_length = generation_config.max_length
+-+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++++++    #     do_sample = generation_config.do_sample
+-+++++++        
+-+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++++++    #         raise ValueError(
+-+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++++++    #             f"{logits_warper})."
+-+++++++    #         )
+-+++++++        
+-+++++++    #     # init attention / hidden states / scores tuples
+-+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++++++        
+-+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++++++    #         encoder_hidden_states = (
+-+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++++++    #         )
+-+++++++        
+-+++++++    #     # keep track of which sequences are already finished
+-+++++++    #     batch_size, cur_len = input_ids.shape
+-+++++++    #     this_peer_finished = False
+-+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++++++        
+-+++++++    #     time_record = []
+-+++++++    #     from ....utils.testing_utils import parse_flag_from_env
+-+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++++++        
+-+++++++    #     while self._has_unfinished_sequences(
+-+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++++++    #     ):
+-+++++++    #         if _record_time:
+-+++++++    #             import time as time_module
+-+++++++    #             infer_start = time_module.time()
+-+++++++            
+-+++++++    #         # prepare model inputs
+-+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++++++            
+-+++++++    #         # prepare variable output controls
+-+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++++++            
+-+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++++++    #         cur_cache_position = model_inputs.get("cache_position")
+-+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++++++    #         cur_input_ids = model_inputs.get("input_ids")
+-+++++++            
+-+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++++++    #             cur_cache_position is not None and 
+-+++++++    #             len(cur_cache_position.shape) > 0 and
+-+++++++    #             cur_cache_position.shape[0] == 1 and
+-+++++++    #             cur_input_ids is not None and
+-+++++++    #             cur_input_ids.shape[1] == 1):
+-+++++++    #             # 使用 JIT 优化的单 token 解码
+-+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++++++    #             if not hasattr(self, '_jit_used'):
+-+++++++    #                 self._jit_used = False
+-+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++++++                
+-+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++++++    #                 cur_token=cur_input_ids,
+-+++++++    #                 input_pos=model_inputs.get("position_ids"),
+-+++++++    #                 cache_position=cur_cache_position,
+-+++++++    #                 past_key_values=cur_past_key_values,
+-+++++++    #             )
+-+++++++                
+-+++++++    #             # 标记已使用JIT（用于后续判断）
+-+++++++    #             if not self._jit_used:
+-+++++++    #                 self._jit_used = True
+-+++++++                
+-+++++++    #             # 构造兼容的输出对象
+-+++++++    #             class JitOptimizedOutput:
+-+++++++    #                 def __init__(self, logits, config):
+-+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++++++    #                     self.config = config
+-+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++++++    #                     self.cross_attentions = None
+-+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++++++                
+-+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++++++    #         else:
+-+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++++++    #             outputs = self(**model_inputs, return_dict=True)
+-+++++++            
+-+++++++    #         if synced_devices and this_peer_finished:
+-+++++++    #             continue
+-+++++++            
+-+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++++++    #         next_token_logits = outputs.logits[:, -1, :]
+-+++++++            
+-+++++++    #         # pre-process distribution
+-+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++++++    #         if do_sample:
+-+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++++++            
+-+++++++    #         # Store scores, attentions and hidden_states when required
+-+++++++    #         if return_dict_in_generate:
+-+++++++    #             if output_scores:
+-+++++++    #                 scores += (next_token_scores,)
+-+++++++    #             if output_logits:
+-+++++++    #                 raw_logits += (next_token_logits,)
+-+++++++    #             if output_attentions:
+-+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++++++    #                 if self.config.is_encoder_decoder:
+-+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++++++                
+-+++++++    #             if output_hidden_states:
+-+++++++    #                 hidden = (
+-+++++++    #                     outputs.decoder_hidden_states
+-+++++++    #                     if self.config.is_encoder_decoder
+-+++++++    #                     else outputs.hidden_states
+-+++++++    #                 )
+-+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++++++            
+-+++++++    #         # token selection
+-+++++++    #         if do_sample:
+-+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++++++    #         else:
+-+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++++++            
+-+++++++    #         # finished sentences should have their next token be a padding token
+-+++++++    #         if has_eos_stopping_criteria:
+-+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++++++            
+-+++++++    #         # update generated ids, model inputs, and length for next step
+-+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++++++    #         if streamer is not None:
+-+++++++    #             streamer.put(next_tokens)
+-+++++++            
+-+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++++++    #             outputs,
+-+++++++    #             model_kwargs,
+-+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++++++    #         )
+-+++++++            
+-+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++++++    #         cur_len += 1
+-+++++++            
+-+++++++    #         if _record_time:
+-+++++++    #             import time as time_module
+-+++++++    #             infer_stop = time_module.time()
+-+++++++    #             time_record.append(infer_stop - infer_start)
+-+++++++            
+-+++++++    #         del outputs
+-+++++++        
+-+++++++    #     average_infer_time = None
+-+++++++    #     if time_record:
+-+++++++    #         if len(time_record) > 1:
+-+++++++    #             time_record.pop(0)
+-+++++++    #         average_infer_time = sum(time_record) / len(time_record)
+-+++++++    #         print(f'average inference time is: {average_infer_time}')
+-+++++++    #         print(f'inference time record: {time_record}')
+-+++++++        
+-+++++++    #     if streamer is not None:
+-+++++++    #         streamer.end()
+-+++++++        
+-+++++++    #     # 简单判断：打印是否使用了JIT路径
+-+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++++++    #     else:
+-+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++++++        
+-+++++++    #     if return_dict_in_generate:
+-+++++++    #         if self.config.is_encoder_decoder:
+-+++++++    #             return GenerateEncoderDecoderOutput(
+-+++++++    #                 sequences=input_ids,
+-+++++++    #                 scores=scores,
+-+++++++    #                 logits=raw_logits,
+-+++++++    #                 encoder_attentions=encoder_attentions,
+-+++++++    #                 encoder_hidden_states=encoder_hidden_states,
+-+++++++    #                 decoder_attentions=decoder_attentions,
+-+++++++    #                 cross_attentions=cross_attentions,
+-+++++++    #                 decoder_hidden_states=decoder_hidden_states,
+-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++++    #                 average_infer_time=average_infer_time
+-+++++++    #             )
+-+++++++    #         else:
+-+++++++    #             return GenerateDecoderOnlyOutput(
+-+++++++    #                 sequences=input_ids,
+-+++++++    #                 scores=scores,
+-+++++++    #                 logits=raw_logits,
+-+++++++    #                 attentions=decoder_attentions,
+-+++++++    #                 hidden_states=decoder_hidden_states,
+-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++++    #                 average_infer_time=average_infer_time
+-+++++++    #             )
+-+++++++    #     else:
+-+++++++    #         return input_ids
+-+++++++            
+-+++++++    # def _prepare_cache_for_generation(
+-+++++++    #     self,
+-+++++++    #     generation_config,
+-+++++++    #     model_kwargs,
+-+++++++    #     assistant_model,
+-+++++++    #     batch_size,
+-+++++++    #     max_cache_length,
+-+++++++    # ):
+-+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++++++    #         generation_config.cache_implementation = "static"
+-+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++++++        
+-+++++++    #     if generation_config.cache_implementation == "static":
+-+++++++    #         base_required_from_max_length = generation_config.max_length + 1
+-+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++++++    #         min_cache_size = 50
+-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++++++    #         else:
+-+++++++    #             max_cache_length = max(base_required, min_cache_size)
+-+++++++            
+-+++++++    #         original_max_cache_length = max_cache_length
+-+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++++++            
+-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++++    #             if max_cache_length > self.config.max_position_embeddings:
+-+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++++++        
+-+++++++    #     result = super()._prepare_cache_for_generation(
+-+++++++    #         generation_config=generation_config,
+-+++++++    #         model_kwargs=model_kwargs,
+-+++++++    #         assistant_model=assistant_model,
+-+++++++    #         batch_size=batch_size,
+-+++++++    #         max_cache_length=max_cache_length,
+-+++++++    #     )
+-+++++++        
+-+++++++    #     if generation_config.cache_implementation == "static":
+-+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++++++    #         created_cache = model_kwargs.get(cache_name)
+-+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++++++        
+-+++++++    #     return result
+-+++++++
+-+++++++
+-+++++++
+-++++++ 
+-++++++ 
+-++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-++++++-- 
+-++++++2.27.0
+-++++++
+-+++++-- 
+-+++++2.27.0
+-+++++
+-++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
+-++++new file mode 100644
+-++++index 00000000..966529e4
+-++++--- /dev/null
+-+++++++ b/patches/0003-20261106secondcommit.patch
+-++++@@ -0,0 +1,2769 @@
+-+++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
+-+++++From: Pinoeer-kingxi <13022943007@163.com>
+-+++++Date: Thu, 6 Nov 2025 14:54:37 +0800
+-+++++Subject: [PATCH 3/3] 20261106secondcommit
+-+++++
+-+++++---
+-+++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
+-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
+-+++++ patches/0001-20251104commit.patch             | 1272 -----------------
+-+++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
+-+++++ delete mode 100644 patches/0001-20251104commit.patch
+-+++++
+-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++index 73773c22..2f9192bf 100644
+-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
+-+++++ 
+-+++++ _CONFIG_FOR_DOC = "DeepseekConfig"
+-+++++ 
+-++++++_attn_mask_cache = {}
+-++++++
+-++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
+-++++++    q_len = batch_and_seq[1]
+-++++++    kv_len = batch_and_seq[1] + past_key_values_length 
+-++++++    key = (batch_and_seq[0], q_len, kv_len)
+-++++++
+-++++++    if key in _attn_mask_cache:
+-++++++        return _attn_mask_cache[key]
+-++++++
+-++++++    mask = _prepare_4d_causal_attention_mask(
+-++++++        attention_mask,
+-++++++        batch_and_seq,
+-++++++        inputs_embeds,
+-++++++        past_key_values_length,
+-++++++    )
+-++++++    _attn_mask_cache[key] = mask
+-++++++    return mask
+-+++++ 
+-+++++ def _get_unpad_data(attention_mask):
+-+++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
+-+++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
+-+++++         return final_output
+-+++++ 
+-+++++ 
+-+++++-    @no_grad()
+-+++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-        expert_cache = ops.zeros_like(x)
+-+++++-        idxs = flat_expert_indices.argsort()
+-+++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++-        token_idxs = idxs // self.num_experts_per_tok
+-+++++-
+-+++++-        for i, end_idx in enumerate(tokens_per_expert):
+-+++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++-            if start_idx == end_idx:
+-+++++-                continue
+-+++++-            expert = self.experts[i]
+-+++++-            exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-            expert_tokens = x[exp_token_idx]
+-+++++-            expert_out = expert(expert_tokens)
+-+++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++-
+-+++++-        return expert_cache
+-+++++-        
+-+++++     # @no_grad()
+-+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-    #     # expert_cache = torch.zeros_like(x)
+-+++++-    #     # idxs = flat_expert_indices.argsort()
+-+++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++-    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++-    #     #     if start_idx == end_idx:
+-+++++-    #     #         continue
+-+++++-    #     #     expert = self.experts[i]
+-+++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-    #     #     expert_tokens = x[exp_token_idx]
+-+++++-    #     #     expert_out = expert(expert_tokens)
+-+++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++-    #     # return expert_cache
+-++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++     #     expert_cache = ops.zeros_like(x)
+-+++++     #     idxs = flat_expert_indices.argsort()
+-+++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
+-+++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++ 
+-+++++     #     return expert_cache
+-+++++-    # @no_grad()
+-+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-    #     expert_cache = ops.zeros_like(x)
+-++++++        
+-++++++    @no_grad()
+-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-++++++        """
+-++++++        优化版 MoE prefill：
+-++++++        - 批量张量化处理同一个 expert 的所有 token
+-++++++        - 跳过无 token 的专家
+-++++++        - 保持结果完全一致
+-++++++        """
+-++++++        # 初始化输出缓存
+-++++++        expert_cache = ops.zeros_like(x)
+-+++++ 
+-+++++-    #     # 排序保证顺序一致
+-+++++-    #     idxs = flat_expert_indices.argsort()
+-+++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++-    #     token_idxs = idxs // self.num_experts_per_tok
+-++++++        # 排序（确保 scatter_add 位置对应原逻辑）
+-++++++        idxs = flat_expert_indices.argsort()
+-++++++        sorted_expert_indices = flat_expert_indices[idxs]
+-++++++        sorted_token_indices = idxs // self.num_experts_per_tok
+-+++++ 
+-+++++-    #     # 找出有 token 的专家
+-+++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++++        # 每个 expert 的 token 数
+-++++++        tokens_per_expert = sorted_expert_indices.bincount()
+-+++++ 
+-+++++-    #     for i in active_experts.tolist():
+-+++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++-    #         end_idx = tokens_per_expert[i]
+-+++++-    #         if start_idx == end_idx:  # 没有 token
+-+++++-    #             continue
+-++++++        # 找出有 token 的专家
+-++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-+++++ 
+-+++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-    #         expert_tokens = x[exp_token_idx]
+-+++++-    #         expert_out = self.experts[i](expert_tokens)
+-+++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++++        for expert_id in active_experts.tolist():
+-++++++            # 取该 expert 对应的排序后 token 区间
+-++++++            start = (tokens_per_expert[:expert_id]).sum().item()
+-++++++            end = start + tokens_per_expert[expert_id].item()
+-+++++ 
+-+++++-    #         expert_cache = mindspore.mint.scatter_add(
+-+++++-    #             expert_cache,
+-+++++-    #             0,
+-+++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++-    #             expert_out
+-+++++-    #         )
+-++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
+-++++++            expert_tokens = x[token_idx]                     # 取输入向量
+-+++++ 
+-+++++-    #     return expert_cache
+-++++++            # 执行专家 MLP
+-++++++            expert_out = self.experts[expert_id](expert_tokens)
+-++++++
+-++++++            # 按权重缩放
+-++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
+-++++++
+-++++++            # 回写到缓存（等价 scatter_add）
+-++++++            expert_cache = mindspore.mint.scatter_add(
+-++++++                expert_cache,
+-++++++                0,
+-++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++++                scaled_out
+-++++++            )
+-++++++
+-++++++        return expert_cache
+-++++++
+-++++++        # @no_grad()
+-++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++        #     # expert_cache = torch.zeros_like(x)
+-++++++        #     # idxs = flat_expert_indices.argsort()
+-++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-++++++        #     # token_idxs = idxs // self.num_experts_per_tok
+-++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
+-++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-++++++        #     #     if start_idx == end_idx:
+-++++++        #     #         continue
+-++++++        #     #     expert = self.experts[i]
+-++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++        #     #     expert_tokens = x[exp_token_idx]
+-++++++        #     #     expert_out = expert(expert_tokens)
+-++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-++++++        #     # return expert_cache
+-++++++        #     expert_cache = ops.zeros_like(x)
+-++++++        #     idxs = flat_expert_indices.argsort()
+-++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++        #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++        #     for i, end_idx in enumerate(tokens_per_expert):
+-++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++        #         if start_idx == end_idx:
+-++++++        #             continue
+-++++++        #         expert = self.experts[i]
+-++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++        #         expert_tokens = x[exp_token_idx]
+-++++++        #         expert_out = expert(expert_tokens)
+-++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-++++++
+-++++++        #     return expert_cache
+-++++++        # @no_grad()
+-++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-++++++        #     expert_cache = ops.zeros_like(x)
+-++++++
+-++++++        #     # 排序保证顺序一致
+-++++++        #     idxs = flat_expert_indices.argsort()
+-++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-++++++        #     token_idxs = idxs // self.num_experts_per_tok
+-++++++
+-++++++        #     # 找出有 token 的专家
+-++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-++++++
+-++++++        #     for i in active_experts.tolist():
+-++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-++++++        #         end_idx = tokens_per_expert[i]
+-++++++        #         if start_idx == end_idx:  # 没有 token
+-++++++        #             continue
+-++++++
+-++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
+-++++++        #         expert_tokens = x[exp_token_idx]
+-++++++        #         expert_out = self.experts[i](expert_tokens)
+-++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-++++++
+-++++++        #         expert_cache = mindspore.mint.scatter_add(
+-++++++        #             expert_cache,
+-++++++        #             0,
+-++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-++++++        #             expert_out
+-++++++        #         )
+-++++++
+-++++++        #     return expert_cache
+-+++++ 
+-+++++ 
+-+++++ 
+-+++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
+-+++++ 
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++-
+-+++++ # class DeepseekFlashAttention(nn.Module):
+-+++++ #     """
+-+++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
+-+++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
+-+++++ 
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-++++++
+-+++++ Deepseek_ATTENTION_CLASSES = {
+-+++++     "eager": DeepseekAttention,
+-+++++     "flash-attention": DeepseekFlashAttention,
+-+++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
+-+++++             )
+-+++++         else:
+-+++++             # 4d mask is passed through the layers
+-+++++-            attention_mask = _prepare_4d_causal_attention_mask(
+-++++++            # attention_mask = _prepare_4d_causal_attention_mask(
+-++++++            #     attention_mask,
+-++++++            #     (batch_size, seq_length),
+-++++++            #     inputs_embeds,
+-++++++            #     past_key_values_length,
+-++++++            # )
+-++++++            #@dwj
+-++++++            attention_mask = get_cached_causal_mask(
+-+++++                 attention_mask,
+-+++++                 (batch_size, seq_length),
+-+++++                 inputs_embeds,
+-+++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++         # Initialize weights and apply final processing
+-+++++         self.post_init()
+-+++++         self.warm_up = False
+-++++++        #@dwj
+-++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
+-++++++            self.num_layers,
+-++++++            self.num_attention_heads,
+-++++++            self.head_dim,
+-++++++            batch_size=1,
+-++++++            max_length=self.max_length,
+-++++++            dtype=mindspore.float16
+-++++++        )
+-++++++
+-++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
+-++++++        key_cache = []
+-++++++        value_cache = []
+-++++++        for _ in range(num_layers):
+-++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
+-++++++            key_cache.append(k)
+-++++++            value_cache.append(v)
+-++++++        return key_cache, value_cache
+-++++++
+-+++++ 
+-+++++     def warmup_moe_model_deep(self):
+-+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++index bced285c..ebd7782e 100644
+-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
+-+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
+-+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
+-+++++ 
+-+++++-Long_Prompt = False
+-+++++-PROMPT_LENGTH_THRESHOLD = 128
+-++++++Long_Prompt = 1
+-++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
+-++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
+-++++++
+-++++++_causal_mask_cache = {}
+-++++++
+-++++++def get_cached_causal_mask_with_cache_position(
+-++++++    attention_mask: mindspore.Tensor,
+-++++++    sequence_length: int,
+-++++++    target_length: int,
+-++++++    dtype: mindspore.dtype,
+-++++++    min_dtype: float,
+-++++++    cache_position: mindspore.Tensor,
+-++++++    batch_size: int,
+-++++++):
+-++++++    """
+-++++++    带缓存的 causal mask 构造函数
+-++++++    """
+-++++++    # q_len 是当前 query 长度
+-++++++    q_len = sequence_length
+-++++++    # kv_len 是 target_length
+-++++++    kv_len = target_length
+-++++++
+-++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
+-++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
+-++++++
+-++++++    if key in _causal_mask_cache:
+-++++++        return _causal_mask_cache[key]
+-++++++
+-++++++    # 调用原来的 mask 构造逻辑
+-++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++++        attention_mask,
+-++++++        sequence_length=sequence_length,
+-++++++        target_length=target_length,
+-++++++        dtype=dtype,
+-++++++        min_dtype=min_dtype,
+-++++++        cache_position=cache_position,
+-++++++        batch_size=batch_size,
+-++++++    )
+-++++++    # 缓存结果
+-++++++    _causal_mask_cache[key] = causal_mask
+-++++++    return causal_mask
+-+++++ 
+-+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
+-+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
+-+++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++++ 
+-+++++ 
+-+++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
+-++++++# class Qwen2MoeAttention(nn.Module):
+-++++++#     """
+-++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-++++++#     and "Generating Long Sequences with Sparse Transformers".
+-++++++#     """
+-++++++
+-++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-++++++#         super().__init__()
+-++++++#         self.config = config
+-++++++#         self.layer_idx = layer_idx
+-++++++#         if layer_idx is None:
+-++++++#             logger.warning_once(
+-++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++++#                 "when creating this class."
+-++++++#             )
+-++++++
+-++++++#         self.hidden_size = config.hidden_size
+-++++++#         self.num_heads = config.num_attention_heads
+-++++++#         self.head_dim = self.hidden_size // self.num_heads
+-++++++#         self.num_key_value_heads = config.num_key_value_heads
+-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-++++++#         self.max_position_embeddings = config.max_position_embeddings
+-++++++#         self.rope_theta = config.rope_theta
+-++++++#         self.is_causal = True
+-++++++#         self.attention_dropout = config.attention_dropout
+-++++++
+-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
+-++++++#             raise ValueError(
+-++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
+-++++++#                 f" and `num_heads`: {self.num_heads})."
+-++++++#             )
+-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-++++++
+-++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-++++++#             self.head_dim,
+-++++++#             max_position_embeddings=self.max_position_embeddings,
+-++++++#             base=self.rope_theta,
+-++++++#         )
+-++++++
+-++++++#     def forward(
+-++++++#         self,
+-++++++#         hidden_states: mindspore.Tensor,
+-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
+-++++++#         position_ids: Optional[mindspore.Tensor] = None,
+-++++++#         past_key_value: Optional[Cache] = None,
+-++++++#         output_attentions: bool = False,
+-++++++#         use_cache: bool = False,
+-++++++#         cache_position: Optional[mindspore.Tensor] = None,
+-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-++++++
+-++++++        
+-++++++
+-++++++#         bsz, q_len, _ = hidden_states.shape
+-++++++
+-++++++#         query_states = self.q_proj(hidden_states)
+-++++++#         key_states = self.k_proj(hidden_states)
+-++++++#         value_states = self.v_proj(hidden_states)
+-++++++
+-++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-++++++
+-++++++#         kv_seq_len = key_states.shape[-2]
+-++++++#         if past_key_value is not None:
+-++++++#             if self.layer_idx is None:
+-++++++#                 raise ValueError(
+-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-++++++#                     "with a layer index."
+-++++++#                 )
+-++++++#             if isinstance(past_key_value, StaticCache):
+-++++++#                 kv_seq_len = key_states.shape[-2]
+-++++++#             else:
+-++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-++++++
+-++++++#         if past_key_value is not None:
+-++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++            
+-++++++#             if isinstance(past_key_value, StaticCache):
+-++++++#                 kv_seq_len = key_states.shape[-2]
+-++++++
+-++++++#         # repeat k/v heads if n_kv_heads < n_heads
+-++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++        
+-++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++++
+-++++++#         if attention_mask is not None:
+-++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++++#             attn_weights = attn_weights + causal_mask
+-++++++
+-++++++#         # upcast attention to fp32
+-++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++++++#         attn_output = ops.matmul(attn_weights, value_states)
+-++++++
+-++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++++++#             raise ValueError(
+-++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-++++++#                 f" {attn_output.shape}"
+-++++++#             )
+-++++++
+-++++++#         attn_output = ops.transpose(attn_output, 1, 2)
+-++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++++
+-++++++#         attn_output = self.o_proj(attn_output)
+-++++++#         # @lwx
+-++++++        
+-++++++#         # max_seq_len = self.max_position_embeddings  # 2048
+-++++++
+-++++++#         # if attention_mask is not None:
+-++++++#         #     # attention_mask: [B, 1, Sq, Sk]
+-++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-++++++
+-++++++#         #     # pad 到 [max_seq_len, max_seq_len]
+-++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-++++++#         #     global_attention_mask = padded_mask
+-++++++#         # else:
+-++++++#         #     global_attention_mask = None
+-++++++
+-++++++
+-++++++#         # sparse_mode=3
+-++++++#         # attn_output = mindspore.ops.flash_attention_score(
+-++++++#         #     query=query_states,
+-++++++#         #     key=key_states,
+-++++++#         #     value=value_states,
+-++++++#         #     real_shift=None,
+-++++++#         #     padding_mask=None,
+-++++++
+-++++++#         #     head_num=self.num_heads,
+-++++++#         #     attn_mask=global_attention_mask,
+-++++++#         #     keep_prob=1.0 - self.attention_dropout,
+-++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++#         #     input_layout="BNSD",
+-++++++#         #     pre_tokens=2147483647,
+-++++++#         #     next_tokens=2147483647,
+-++++++#         #     inner_precise=0,
+-++++++#         #     drop_mask=None,
+-++++++#         #     prefix=None,
+-++++++#         #     actual_seq_qlen=None,
+-++++++#         #     actual_seq_kvlen=None,
+-++++++#         #     sparse_mode=sparse_mode,
+-++++++#         # )
+-++++++#         if not output_attentions:
+-++++++#             attn_weights = None
+-++++++
+-++++++#         return attn_output, attn_weights, past_key_value
+-++++++
+-+++++ class Qwen2MoeAttention(nn.Module):
+-+++++     """
+-+++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
+-+++++-    and "Generating Long Sequences with Sparse Transformers".
+-+++++-    """
+-++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
+-+++++ 
+-++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
+-++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
+-++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
+-++++++
+-++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
+-++++++    """
+-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++         super().__init__()
+-+++++         self.config = config
+-+++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         if layer_idx is None:
+-+++++             logger.warning_once(
+-+++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
+-+++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
+-+++++                 "when creating this class."
+-+++++             )
+-+++++ 
+-+++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
+-+++++         use_cache: bool = False,
+-+++++         cache_position: Optional[mindspore.Tensor] = None,
+-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-
+-+++++         
+-+++++-
+-++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
+-+++++         bsz, q_len, _ = hidden_states.shape
+-+++++ 
+-+++++         query_states = self.q_proj(hidden_states)
+-+++++         key_states = self.k_proj(hidden_states)
+-+++++         value_states = self.v_proj(hidden_states)
+-+++++ 
+-+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
+-+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
+-+++++-
+-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-++++++        
+-+++++         kv_seq_len = key_states.shape[-2]
+-+++++         if past_key_value is not None:
+-+++++-            if self.layer_idx is None:
+-+++++-                raise ValueError(
+-+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-                    "with a layer index."
+-+++++-                )
+-+++++-            if isinstance(past_key_value, StaticCache):
+-+++++-                kv_seq_len = key_states.shape[-2]
+-+++++-            else:
+-+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-++++++        
+-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++ 
+-+++++         if past_key_value is not None:
+-+++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-++++++
+-++++++        # --- 2. 动态调度核心注意力计算 ---
+-++++++        global Long_Prompt
+-++++++        if Long_Prompt >= 1:
+-++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
+-++++++            fa_attention_mask = None
+-++++++            if attention_mask is not None:
+-++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-++++++                fa_attention_mask = (mask_slice != 0)
+-++++++
+-++++++            attn_output = mindspore.ops.flash_attention_score(
+-++++++                query=query_states,
+-++++++                key=key_states,
+-++++++                value=value_states,
+-++++++                head_num=self.num_heads,
+-++++++                attn_mask=fa_attention_mask,
+-++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
+-++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
+-++++++                input_layout="BNSD",
+-++++++                sparse_mode=0,
+-++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
+-++++++            )
+-+++++             
+-+++++-            if isinstance(past_key_value, StaticCache):
+-+++++-                kv_seq_len = key_states.shape[-2]
+-++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-++++++            attn_output = self.o_proj(attn_output)
+-++++++            attn_weights = None
+-++++++            if output_attentions:
+-++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
+-+++++ 
+-+++++-        # repeat k/v heads if n_kv_heads < n_heads
+-+++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++-        
+-+++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-++++++        else:
+-++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
+-++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
+-++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
+-++++++            
+-++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++ 
+-+++++-        if attention_mask is not None:
+-+++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++-            attn_weights = attn_weights + causal_mask
+-++++++            if attention_mask is not None:
+-++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-++++++                attn_weights = attn_weights + causal_mask
+-+++++ 
+-+++++-        # upcast attention to fp32
+-+++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-+++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-+++++-        attn_output = ops.matmul(attn_weights, value_states)
+-++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
+-++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
+-++++++            attn_output = ops.matmul(attn_weights, value_states)
+-+++++ 
+-+++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-+++++-            raise ValueError(
+-+++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
+-+++++-                f" {attn_output.shape}"
+-+++++-            )
+-++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
+-++++++                raise ValueError(
+-++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
+-++++++                )
+-+++++ 
+-+++++-        attn_output = ops.transpose(attn_output, 1, 2)
+-+++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++++            attn_output = ops.transpose(attn_output, 1, 2)
+-++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-++++++            attn_output = self.o_proj(attn_output)
+-+++++ 
+-+++++-        attn_output = self.o_proj(attn_output)
+-+++++-        # @lwx
+-++++++            if not output_attentions:
+-++++++                attn_weights = None
+-+++++         
+-+++++-        # max_seq_len = self.max_position_embeddings  # 2048
+-+++++-
+-+++++-        # if attention_mask is not None:
+-+++++-        #     # attention_mask: [B, 1, Sq, Sk]
+-+++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++-
+-+++++-        #     # pad 到 [max_seq_len, max_seq_len]
+-+++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++-        #     global_attention_mask = padded_mask
+-+++++-        # else:
+-+++++-        #     global_attention_mask = None
+-+++++-
+-+++++-
+-+++++-        # sparse_mode=3
+-+++++-        # attn_output = mindspore.ops.flash_attention_score(
+-+++++-        #     query=query_states,
+-+++++-        #     key=key_states,
+-+++++-        #     value=value_states,
+-+++++-        #     real_shift=None,
+-+++++-        #     padding_mask=None,
+-+++++-
+-+++++-        #     head_num=self.num_heads,
+-+++++-        #     attn_mask=global_attention_mask,
+-+++++-        #     keep_prob=1.0 - self.attention_dropout,
+-+++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++-        #     input_layout="BNSD",
+-+++++-        #     pre_tokens=2147483647,
+-+++++-        #     next_tokens=2147483647,
+-+++++-        #     inner_precise=0,
+-+++++-        #     drop_mask=None,
+-+++++-        #     prefix=None,
+-+++++-        #     actual_seq_qlen=None,
+-+++++-        #     actual_seq_kvlen=None,
+-+++++-        #     sparse_mode=sparse_mode,
+-+++++-        # )
+-+++++-        if not output_attentions:
+-+++++-            attn_weights = None
+-+++++-
+-+++++         return attn_output, attn_weights, past_key_value
+-+++++ 
+-+++++-
+-+++++ # class Qwen2MoeFlashAttention(nn.Module):
+-+++++ #     """
+-+++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
+-+++++ #             return final_hidden_states, router_logits
+-+++++ 
+-+++++ 
+-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-#     """
+-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
+-+++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
+-+++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
+-+++++-#     """
+-+++++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++++-#         super().__init__()
+-+++++-#         self.num_experts = config.num_experts
+-+++++-#         self.top_k = config.num_experts_per_tok
+-+++++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++++-
+-+++++-#         # 门控网络
+-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++-#         # 专家列表
+-+++++-#         self.experts = nn.ModuleList(
+-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++-#         )
+-+++++-#         # 共享专家
+-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_decode(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         【解码路径】针对 sequence_length=1 的极致优化。
+-+++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
+-+++++-#         """
+-+++++-#         batch_size, hidden_dim = hidden_states.shape
+-+++++-        
+-+++++-#         expert_outputs_list = [
+-+++++-#             ops.cat([
+-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++-#             ], dim=0) 
+-+++++-#             for i in range(batch_size)
+-+++++-#         ]
+-+++++-        
+-+++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
+-+++++-#         # shape: (batch_size, top_k, hidden_dim)
+-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++-        
+-+++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
+-+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++-        
+-+++++-#         return moe_output.squeeze(1)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_prefill(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         【预填充路径】针对 sequence_length > 1 的优化。
+-+++++-#         按专家对 Token 进行分组，并进行批处理。
+-+++++-#         """
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens = hidden_states.shape[0]
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-        
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-        
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-        
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-            
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-#             selected_token_indices = token_indices[mask]
+-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-            
+-+++++-#             current_states = hidden_states[selected_token_indices]
+-+++++-            
+-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-            
+-+++++-#             moe_output = moe_output.index_add(
+-+++++-#                 dim=0,
+-+++++-#                 index=selected_token_indices,
+-+++++-#                 source=expert_output.to(hidden_states.dtype)
+-+++++-#             )
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         顶层 forward 方法，作为智能分发器。
+-+++++-#         """
+-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        
+-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-
+-+++++-#         if self.norm_topk_prob:
+-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        
+-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-        
+-+++++-#         moe_output = None
+-+++++-#         # 在推理时，根据序列长度选择最优路径
+-+++++-#         if not self.training:
+-+++++-#             if sequence_length == 1:
+-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++-#             else:
+-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++-#         else:
+-+++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
+-+++++-#             raise NotImplementedError("Training path is not implemented.")
+-+++++-
+-+++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
+-+++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
+-+++++-        
+-+++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
+-+++++-        
+-+++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
+-+++++-        
+-+++++-#         return final_hidden_states, router_logits
+-+++++-
+-+++++-
+-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-#     """
+-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
+-+++++-#     """
+-+++++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++++-#         super().__init__()
+-+++++-#         self.num_experts = config.num_experts
+-+++++-#         self.top_k = config.num_experts_per_tok
+-+++++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++++-
+-+++++-#         # 门控网络
+-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++-#         # 专家列表
+-+++++-#         self.experts = nn.ModuleList(
+-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++-#         )
+-+++++-#         # 共享专家
+-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_decode(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         batch_size, _ = hidden_states.shape
+-+++++-#         expert_outputs_list = [
+-+++++-#             ops.cat([
+-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++-#             ], dim=0) 
+-+++++-#             for i in range(batch_size)
+-+++++-#         ]
+-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++-#         return moe_output.squeeze(1)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_prefill(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens = hidden_states.shape[0]
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-        
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-#             selected_token_indices = token_indices[mask]
+-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-#             current_states = hidden_states[selected_token_indices]
+-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-#             moe_output = moe_output.index_add(
+-+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++-#             )
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         顶层 forward 方法，作为智能分发器。
+-+++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
+-+++++-#         """
+-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        
+-+++++-#         # 1. 门控计算 (通用逻辑)
+-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-
+-+++++-#         if self.norm_topk_prob:
+-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        
+-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-        
+-+++++-#         # 2. 智能分发到最优 MoE 路径
+-+++++-#         moe_output = None
+-+++++-#         if not self.training:
+-+++++-#             if sequence_length == 1:
+-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++-#             else:
+-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++-#         else:
+-+++++-#             raise NotImplementedError("Training path is not implemented.")
+-+++++-
+-+++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
+-+++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
+-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++-        
+-+++++-#         # 4. 合并 MoE 输出和共享专家输出
+-+++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
+-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++-        
+-+++++-#         # 5. 恢复原始形状并返回
+-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++-        
+-+++++-#         return final_hidden_states, router_logits
+-+++++-
+-+++++-# prefill fastest
+-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-#     """
+-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
+-+++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
+-+++++-#     """
+-+++++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++++-#         super().__init__()
+-+++++-#         self.num_experts = config.num_experts
+-+++++-#         self.top_k = config.num_experts_per_tok
+-+++++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++++-
+-+++++-#         # 门控网络
+-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++-#         # 专家列表
+-+++++-#         self.experts = nn.ModuleList(
+-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++-#         )
+-+++++-#         # 共享专家
+-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_dispatch(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
+-+++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
+-+++++-#         """
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens, _ = hidden_states.shape
+-+++++-        
+-+++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-#         flat_routing_weights = routing_weights.flatten()
+-+++++-
+-+++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-
+-+++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-        
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-            
+-+++++-#             # 找到所有分配给该专家的 token
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-            
+-+++++-#             # 使用 mask 选取对应的 token 和权重
+-+++++-#             current_token_indices = token_indices[mask]
+-+++++-#             current_routing_weights = flat_routing_weights[mask]
+-+++++-#             current_hidden_states = hidden_states[current_token_indices]
+-+++++-            
+-+++++-#             # 对这些 token 进行批处理
+-+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++++-            
+-+++++-#             # 使用 index_add 将结果精确地加回到对应位置
+-+++++-#             moe_output = moe_output.index_add(
+-+++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++-#             )
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         顶层 forward 方法，作为智能分发器。
+-+++++-#         """
+-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        
+-+++++-#         # 1. 门控计算
+-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-
+-+++++-#         if self.norm_topk_prob:
+-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        
+-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-        
+-+++++-#         # 2. 调用统一的 MoE 计算内核
+-+++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
+-+++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
+-+++++-
+-+++++-#         # 3. 统一处理共享专家
+-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++-        
+-+++++-#         # 4. 合并输出
+-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++-        
+-+++++-#         # 5. 恢复原始形状并返回
+-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++-        
+-+++++-#         return final_hidden_states, router_logits
+-+++++-
+-+++++-
+-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-#     """
+-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
+-+++++-#     【最终高性能与高精度版】：
+-+++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
+-+++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
+-+++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
+-+++++-#     3. 这样实现了速度和准确性的两全其美。
+-+++++-#     """
+-+++++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++++-#         super().__init__()
+-+++++-#         self.num_experts = config.num_experts
+-+++++-#         self.top_k = config.num_experts_per_tok
+-+++++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++++-
+-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++-#         self.experts = nn.ModuleList(
+-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++-#         )
+-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_decode(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         【解码路径】极致优化版：bmm + 高精度累加。
+-+++++-#         """
+-+++++-#         original_dtype = hidden_states.dtype
+-+++++-#         batch_size, _ = hidden_states.shape
+-+++++-        
+-+++++-#         expert_outputs_list = [
+-+++++-#             ops.cat([
+-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++-#             ], dim=0) 
+-+++++-#             for i in range(batch_size)
+-+++++-#         ]
+-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++-
+-+++++-#         # 在 float32 下执行 bmm，得到高精度结果
+-+++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
+-+++++-        
+-+++++-#         # 将高精度结果转换回原始数据类型
+-+++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++-        
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_prefill(
+-+++++-#         self, 
+-+++++-#         hidden_states: mindspore.Tensor, 
+-+++++-#         selected_experts: mindspore.Tensor, 
+-+++++-#         routing_weights: mindspore.Tensor
+-+++++-#     ) -> mindspore.Tensor:
+-+++++-#         """
+-+++++-#         【预填充路径】与原始实现一致，结果精确。
+-+++++-#         """
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens, _ = hidden_states.shape
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-        
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-#             selected_token_indices = token_indices[mask]
+-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-#             current_states = hidden_states[selected_token_indices]
+-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-#             moe_output = moe_output.index_add(
+-+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
+-+++++-#             )
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-        
+-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-
+-+++++-#         if self.norm_topk_prob:
+-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        
+-+++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
+-+++++-#         # 如果模型主体是 float16，后续再转换
+-+++++-        
+-+++++-#         moe_output = None
+-+++++-#         if not self.training:
+-+++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
+-+++++-#             # _moe_infer_decode 内部会处理好类型转换
+-+++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-#             if sequence_length == 1:
+-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++++-#             else:
+-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
+-+++++-#         else:
+-+++++-#             raise NotImplementedError("Training path is not implemented.")
+-+++++-
+-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++-        
+-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++-        
+-+++++-#         return final_hidden_states, router_logits
+-+++++-    
+-+++++-
+-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-#     """
+-+++++-#     【融合版】一个混合专家模块，内置两种推理策略，
+-+++++-#     由外部全局变量 `Long_Prompt` 控制：
+-+++++-
+-+++++-#     - if Long_Prompt is True:  【精度优先模式】
+-+++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
+-+++++-#       适用于处理长序列，避免误差累积。
+-+++++-
+-+++++-#     - if Long_Prompt is False: 【速度优先模式】
+-+++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
+-+++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
+-+++++-#     """
+-+++++-#     def __init__(self, config: Qwen2MoeConfig):
+-+++++-#         super().__init__()
+-+++++-#         self.num_experts = config.num_experts
+-+++++-#         self.top_k = config.num_experts_per_tok
+-+++++-#         self.norm_topk_prob = config.norm_topk_prob
+-+++++-
+-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
+-+++++-#         self.experts = nn.ModuleList(
+-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
+-+++++-#         )
+-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++-
+-+++++-#     # --- 速度优先模式的辅助函数 ---
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++-#         original_dtype = hidden_states.dtype
+-+++++-#         batch_size, _ = hidden_states.shape
+-+++++-#         expert_outputs_list = [
+-+++++-#             ops.cat([
+-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
+-+++++-#             ], dim=0) 
+-+++++-#             for i in range(batch_size)
+-+++++-#         ]
+-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
+-+++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
+-+++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
+-+++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++-
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens, _ = hidden_states.shape
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-#             selected_token_indices = token_indices[mask]
+-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-#             current_states = hidden_states[selected_token_indices]
+-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     # --- 精度优先模式的辅助函数 ---
+-+++++-#     @no_grad()
+-+++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++-#         moe_output = ops.zeros_like(hidden_states)
+-+++++-#         num_tokens, _ = hidden_states.shape
+-+++++-#         flat_selected_experts = selected_experts.flatten()
+-+++++-#         flat_routing_weights = routing_weights.flatten()
+-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
+-+++++-#         active_experts = ops.unique(flat_selected_experts)
+-+++++-#         for expert_idx_tensor in active_experts:
+-+++++-#             expert_idx = expert_idx_tensor.item()
+-+++++-#             expert_layer = self.experts[expert_idx]
+-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-#             current_token_indices = token_indices[mask]
+-+++++-#             current_routing_weights = flat_routing_weights[mask]
+-+++++-#             current_hidden_states = hidden_states[current_token_indices]
+-+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
+-+++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++-#         return moe_output
+-+++++-
+-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
+-+++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
+-+++++-#         global Long_Prompt
+-+++++-
+-+++++-#         # 1. 门控计算 (所有模式通用)
+-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-#         router_logits = self.gate(hidden_states_reshaped)
+-+++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
+-+++++-#         if self.norm_topk_prob:
+-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-        
+-+++++-#         moe_output = None
+-+++++-#         if not self.training:
+-+++++-#             # 根据 Long_Prompt 标志选择模式
+-+++++-#             if Long_Prompt:
+-+++++-#                 # --- 精度优先模式 ---
+-+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++-#             else:
+-+++++-#                 # --- 速度优先模式 ---
+-+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++-#                 if sequence_length == 1:
+-+++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++-#                 else:
+-+++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++-#         else:
+-+++++-#             raise NotImplementedError("Training path is not implemented.")
+-+++++-
+-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
+-+++++-        
+-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
+-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
+-+++++-        
+-+++++-#         return final_hidden_states, router_logits
+-+++++-    
+-+++++ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++     """
+-+++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
+-+++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
+-+++++         return moe_output_fp32.squeeze(1).to(original_dtype)
+-+++++ 
+-++++++    # @no_grad()
+-++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-++++++    #     num_tokens, _ = hidden_states.shape
+-++++++    #     flat_selected_experts = selected_experts.flatten()
+-++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
+-++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-++++++    #     original_token_indices = sorted_expert_indices // self.top_k
+-++++++    #     moe_output = ops.zeros_like(hidden_states)
+-++++++    #     current_token_offset = 0
+-++++++    #     for i in range(self.num_experts):
+-++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
+-++++++    #         if expert_token_count == 0:
+-++++++    #             continue
+-++++++    #         end_offset = current_token_offset + expert_token_count
+-++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
+-++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-++++++    #         current_token_offset += expert_token_count
+-++++++    #     return moe_output
+-++++++
+-+++++     @no_grad()
+-+++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++-        num_tokens, _ = hidden_states.shape
+-+++++-        flat_selected_experts = selected_experts.flatten()
+-+++++-        sorted_expert_indices = flat_selected_experts.argsort()
+-+++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
+-+++++-        original_token_indices = sorted_expert_indices // self.top_k
+-++++++        """
+-++++++        优化版 MoE prefill (速度优先模式)：
+-++++++        - 批量张量化处理同一个 expert 的所有 token
+-++++++        - 跳过无 token 的专家
+-++++++        - 保持结果完全一致
+-++++++        """
+-+++++         moe_output = ops.zeros_like(hidden_states)
+-+++++-        current_token_offset = 0
+-+++++-        for i in range(self.num_experts):
+-+++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
+-+++++-            if expert_token_count == 0:
+-+++++-                continue
+-+++++-            end_offset = current_token_offset + expert_token_count
+-+++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
+-+++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
+-+++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
+-+++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
+-+++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
+-+++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
+-+++++-            current_token_offset += expert_token_count
+-++++++
+-++++++        flat_selected_experts = selected_experts.flatten()
+-++++++        flat_routing_weights = routing_weights.flatten()
+-++++++
+-++++++        idxs = flat_selected_experts.argsort()
+-++++++        sorted_expert_indices = flat_selected_experts[idxs]
+-++++++        sorted_token_indices = idxs // self.top_k
+-++++++
+-++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
+-++++++
+-++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
+-++++++
+-++++++        for expert_id in active_experts.tolist():
+-++++++            start = int(tokens_per_expert[:expert_id].sum().item())
+-++++++            end = start + int(tokens_per_expert[expert_id].item())
+-++++++
+-++++++            token_idx = sorted_token_indices[start:end]
+-++++++            expert_tokens = hidden_states[token_idx]
+-++++++
+-++++++            expert_out = self.experts[expert_id](expert_tokens)
+-++++++
+-++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
+-++++++
+-++++++            moe_output = mindspore.mint.scatter_add(
+-++++++                moe_output,
+-++++++                0,
+-++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
+-++++++                scaled_out.to(hidden_states.dtype)
+-++++++            )
+-++++++
+-+++++         return moe_output
+-+++++ 
+-++++++
+-+++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
+-+++++     @no_grad()
+-+++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
+-+++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++         
+-+++++         moe_output = None
+-+++++-        if Long_Prompt:
+-+++++-            # --- 精度优先模式 (ACCURACY MODE) ---
+-+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        # if Long_Prompt==0:
+-++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
+-++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        # else:
+-++++++        #     # --- 速度优先模式 (SPEED MODE) ---
+-++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++        #     if sequence_length == 1:
+-++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        #     else:
+-++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++        
+-++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-++++++        if sequence_length == 1:
+-++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++         else:
+-+++++-            # --- 速度优先模式 (SPEED MODE) ---
+-+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
+-+++++-            if sequence_length == 1:
+-+++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++-            else:
+-+++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-+++++-        
+-++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
+-++++++    
+-+++++ 
+-+++++         # 3. 共享专家计算与合并 (所有模式通用)
+-+++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
+-+++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++         
+-+++++         return final_hidden_states, router_logits
+-+++++ 
+-++++++
+-+++++ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
+-+++++         super().__init__()
+-+++++         self.hidden_size = config.hidden_size
+-+++++         
+-+++++-        # if Long_Prompt:
+-+++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++-        # else:
+-++++++        # if Long_Prompt == 2:
+-+++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-++++++        # else:
+-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++ 
+-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++ 
+-+++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++++             )
+-+++++ 
+-+++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
+-+++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++++        #     attention_mask,
+-++++++        #     sequence_length=sequence_length,
+-++++++        #     target_length=target_length,
+-++++++        #     dtype=dtype,
+-++++++        #     min_dtype=min_dtype,
+-++++++        #     cache_position=cache_position,
+-++++++        #     batch_size=input_tensor.shape[0],
+-++++++        # )
+-++++++        #@dwj
+-++++++        causal_mask = get_cached_causal_mask_with_cache_position(
+-+++++             attention_mask,
+-+++++             sequence_length=sequence_length,
+-+++++             target_length=target_length,
+-+++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
+-+++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
+-+++++         """
+-+++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
+-++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
+-++++++        _causal_mask_cache.clear()
+-+++++ 
+-+++++         input_ids = kwargs.get("input_ids")
+-+++++         if input_ids is None and args:
+-+++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++ 
+-+++++         if input_ids is not None:
+-+++++             prompt_length = input_ids.shape[1]
+-+++++-            
+-+++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
+-+++++-                Long_Prompt = True
+-++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
+-++++++                Long_Prompt = 2
+-++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
+-++++++                Long_Prompt = 0
+-+++++             else:
+-+++++-                Long_Prompt = False
+-++++++                Long_Prompt = 1
+-++++++
+-+++++ 
+-+++++         return super().generate(*args, **kwargs)
+-+++++     
+-+++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++             dtype = self.lm_head.weight.dtype
+-+++++             min_dtype = float(ops.finfo(dtype).min)
+-+++++ 
+-+++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
+-++++++            #     attention_mask,
+-++++++            #     sequence_length=sequence_length,
+-++++++            #     target_length=past_key_values.get_max_length(),
+-++++++            #     dtype=dtype,
+-++++++            #     min_dtype=min_dtype,
+-++++++            #     cache_position=cache_position,
+-++++++            #     batch_size=batch_size,
+-++++++            # )
+-++++++
+-++++++            #@dwj
+-++++++            attention_mask = get_cached_causal_mask_with_cache_position(
+-+++++                 attention_mask,
+-+++++                 sequence_length=sequence_length,
+-+++++                 target_length=past_key_values.get_max_length(),
+-+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
+-+++++deleted file mode 100644
+-+++++index 6dfb5b93..00000000
+-+++++--- a/patches/0001-20251104commit.patch
+-++++++++ /dev/null
+-+++++@@ -1,1272 +0,0 @@
+-+++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
+-+++++-From: Pinoeer-kingxi <13022943007@163.com>
+-+++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
+-+++++-Subject: [PATCH] 20251104commit
+-+++++-
+-+++++----
+-+++++- mindnlp/transformers/cache_utils.py           |  28 +-
+-+++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
+-+++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
+-+++++- 3 files changed, 976 insertions(+), 87 deletions(-)
+-+++++-
+-+++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
+-+++++-index cadd2e04..02f8d4be 100644
+-+++++---- a/mindnlp/transformers/cache_utils.py
+-+++++-+++ b/mindnlp/transformers/cache_utils.py
+-+++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
+-+++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
+-+++++-             # k_out[:, :, cache_position] = key_states
+-+++++-             # v_out[:, :, cache_position] = value_states
+-+++++--            if ON_ORANGE_PI:
+-+++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++--            else:
+-+++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++--
+-+++++-+            # if ON_ORANGE_PI:
+-+++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-+            # else:
+-+++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
+-+++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
+-+++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
+-+++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
+-+++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
+-+++++-+            if cache_position.ndim > 1:
+-+++++-+                cache_position = cache_position.flatten()
+-+++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
+-+++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
+-+++++-+                cache_position = cache_position.int()
+-+++++-+            
+-+++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
+-+++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
+-+++++-+            k_out[:, :, cache_position] = key_states
+-+++++-+            v_out[:, :, cache_position] = value_states
+-+++++-+            
+-+++++-         return k_out, v_out
+-+++++- 
+-+++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+-+++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++-index c695b944..d8303e45 100644
+-+++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
+-+++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
+-+++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
+-+++++- def rotate_half(x):
+-+++++-     """Rotates half the hidden dims of the input."""
+-+++++--    x1 = x[..., : x.shape[-1] // 2]
+-+++++--    x2 = x[..., x.shape[-1] // 2 :]
+-+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++-+    # x1 = x[..., : x.shape[-1] // 2]
+-+++++-+    # x2 = x[..., x.shape[-1] // 2 :]
+-+++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++-     return ops.cat((-x2, x1), dim=-1)
+-+++++- 
+-+++++- 
+-+++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
+-+++++-         if self.training:
+-+++++-             raise NotImplementedError("Training is not supported yet.")
+-+++++-         else:
+-+++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++--        if self.config.n_shared_experts is not None:
+-+++++--            y = y + self.shared_experts(identity)
+-+++++--        return y
+-+++++-+            # @lwx
+-+++++-+            if orig_shape[1] == 1:
+-+++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
+-+++++-+                y=y.view(*orig_shape)
+-+++++-+                if self.config.n_shared_experts is not None:
+-+++++-+                    y = y + self.shared_experts(identity)
+-+++++-+                return y
+-+++++-+            else:
+-+++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++-+                if self.config.n_shared_experts is not None:
+-+++++-+                    y = y + self.shared_experts(identity)
+-+++++-+                return y
+-+++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
+-+++++-+        # if self.config.n_shared_experts is not None:
+-+++++-+        #     y = y + self.shared_experts(identity)
+-+++++-+        # return y
+-+++++-+        
+-+++++-+    @no_grad()
+-+++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-+
+-+++++-+        expert_cache = ops.zeros_like(x)
+-+++++-+        for i in range(self.num_experts_per_tok):
+-+++++-+            expert_id = flat_expert_indices[i].item()
+-+++++-+            weight = flat_expert_weights[i].item()
+-+++++-+            expert = self.experts[expert_id]
+-+++++-+            expert_out = expert(x)
+-+++++-+            expert_cache += expert_out * weight
+-+++++-+        return expert_cache
+-+++++- 
+-+++++-     @no_grad()
+-+++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++--        # expert_cache = torch.zeros_like(x)
+-+++++--        # idxs = flat_expert_indices.argsort()
+-+++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++--        # token_idxs = idxs // self.num_experts_per_tok
+-+++++--        # for i, end_idx in enumerate(tokens_per_expert):
+-+++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++--        #     if start_idx == end_idx:
+-+++++--        #         continue
+-+++++--        #     expert = self.experts[i]
+-+++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++--        #     expert_tokens = x[exp_token_idx]
+-+++++--        #     expert_out = expert(expert_tokens)
+-+++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++--        # return expert_cache
+-+++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-         expert_cache = ops.zeros_like(x)
+-+++++-         idxs = flat_expert_indices.argsort()
+-+++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++-         token_idxs = idxs // self.num_experts_per_tok
+-+++++-+
+-+++++-         for i, end_idx in enumerate(tokens_per_expert):
+-+++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++-             if start_idx == end_idx:
+-+++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
+-+++++-             expert_out = expert(expert_tokens)
+-+++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++-+
+-+++++-         return expert_cache
+-+++++-+        
+-+++++-+    # @no_grad()
+-+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-+    #     # expert_cache = torch.zeros_like(x)
+-+++++-+    #     # idxs = flat_expert_indices.argsort()
+-+++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
+-+++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
+-+++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
+-+++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
+-+++++-+    #     #     if start_idx == end_idx:
+-+++++-+    #     #         continue
+-+++++-+    #     #     expert = self.experts[i]
+-+++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-+    #     #     expert_tokens = x[exp_token_idx]
+-+++++-+    #     #     expert_out = expert(expert_tokens)
+-+++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
+-+++++-+    #     # return expert_cache
+-+++++-+    #     expert_cache = ops.zeros_like(x)
+-+++++-+    #     idxs = flat_expert_indices.argsort()
+-+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++-+
+-+++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
+-+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++-+    #         if start_idx == end_idx:
+-+++++-+    #             continue
+-+++++-+    #         expert = self.experts[i]
+-+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-+    #         expert_tokens = x[exp_token_idx]
+-+++++-+    #         expert_out = expert(expert_tokens)
+-+++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
+-+++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
+-+++++-+
+-+++++-+    #     return expert_cache
+-+++++-+    # @no_grad()
+-+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
+-+++++-+    #     expert_cache = ops.zeros_like(x)
+-+++++-+
+-+++++-+    #     # 排序保证顺序一致
+-+++++-+    #     idxs = flat_expert_indices.argsort()
+-+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
+-+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
+-+++++-+
+-+++++-+    #     # 找出有 token 的专家
+-+++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
+-+++++-+
+-+++++-+    #     for i in active_experts.tolist():
+-+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
+-+++++-+    #         end_idx = tokens_per_expert[i]
+-+++++-+    #         if start_idx == end_idx:  # 没有 token
+-+++++-+    #             continue
+-+++++-+
+-+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
+-+++++-+    #         expert_tokens = x[exp_token_idx]
+-+++++-+    #         expert_out = self.experts[i](expert_tokens)
+-+++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
+-+++++-+
+-+++++-+    #         expert_cache = mindspore.mint.scatter_add(
+-+++++-+    #             expert_cache,
+-+++++-+    #             0,
+-+++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
+-+++++-+    #             expert_out
+-+++++-+    #         )
+-+++++-+
+-+++++-+    #     return expert_cache
+-+++++-+
+-+++++-+
+-+++++- 
+-+++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
+-+++++- #     """
+-+++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++- 
+-+++++-         # Initialize weights and apply final processing
+-+++++-         self.post_init()
+-+++++-+        self.warm_up = False
+-+++++-+
+-+++++-+    def warmup_moe_model_deep(self):
+-+++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
+-+++++-+        test_texts = [
+-+++++-+            "warmup short",
+-+++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
+-+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
+-+++++-+        ]
+-+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++-+        if tokenizer is None:
+-+++++-+            from mindnlp.transformers import AutoTokenizer
+-+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++-+            self._warmup_tokenizer = tokenizer
+-+++++-+
+-+++++-+        for text in test_texts:
+-+++++-+            inputs = tokenizer(text, return_tensors="ms")
+-+++++-+            with mindspore._no_grad():
+-+++++-+                _ = self(**inputs, use_cache=False)
+-+++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
+-+++++- 
+-+++++-     def get_input_embeddings(self):
+-+++++-         return self.model.embed_tokens
+-+++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
+-+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++-         ```"""
+-+++++-+        if not self.warm_up:
+-+++++-+            self.warm_up = True
+-+++++-+            self.warmup_moe_model_deep()
+-+++++-+
+-+++++-         output_attentions = (
+-+++++-             output_attentions
+-+++++-             if output_attentions is not None
+-+++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++-index 3cbf820e..d4c6b651 100644
+-+++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
+-+++++-@@ -18,7 +18,6 @@
+-+++++- # See the License for the specific language governing permissions and
+-+++++- # limitations under the License.
+-+++++- """MindSpore Qwen2MoE model."""
+-+++++--
+-+++++- import math
+-+++++- from typing import List, Optional, Tuple, Union
+-+++++- 
+-+++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
+-+++++-     TokenClassifierOutput,
+-+++++- )
+-+++++- from ...modeling_utils import PreTrainedModel
+-+++++-+from ...generation import GenerationMixin
+-+++++- from ....utils import logging
+-+++++- from .configuration_qwen2_moe import Qwen2MoeConfig
+-+++++- 
+-+++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
+-+++++-         self.variance_epsilon = eps
+-+++++- 
+-+++++-     def forward(self, hidden_states):
+-+++++-+        # @dwj
+-+++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++-+        # @lwx
+-+++++-+        # if not self.training :
+-+++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
+-+++++-         input_dtype = hidden_states.dtype
+-+++++-         hidden_states = hidden_states.to(mindspore.float32)
+-+++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
+-+++++-@@ -234,6 +239,8 @@ def rotate_half(x):
+-+++++-     """Rotates half the hidden dims of the input."""
+-+++++-     x1 = x[..., : x.shape[-1] // 2]
+-+++++-     x2 = x[..., x.shape[-1] // 2 :]
+-+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
+-+++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
+-+++++-     return ops.cat((-x2, x1), dim=-1)
+-+++++- 
+-+++++- 
+-+++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
+-+++++-         self.config = config
+-+++++-         self.hidden_size = config.hidden_size
+-+++++-         self.intermediate_size = intermediate_size
+-+++++-+        
+-+++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+-+++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+-+++++-         self.act_fn = ACT2FN[config.hidden_act]
+-+++++- 
+-+++++-     def forward(self, x):
+-+++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++--
+-+++++- 
+-+++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+-+++++-+        # @lwx 
+-+++++-+        # gate_up_output = self.gate_up_proj(x)
+-+++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
+-+++++-+        # return self.down_proj(swiglu_output)
+-+++++-+
+-+++++-+    # def forward(self, x):
+-+++++-+    #     gate_proj_out = self.gate_proj(x)
+-+++++-+    #     up_proj_out = self.up_proj(x)
+-+++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
+-+++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
+-+++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
+-+++++-+    #     return self.down_proj(swiglu_out)
+-+++++-+        
+-+++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
+-+++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
+-+++++-     """
+-+++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
+-+++++-         use_cache: bool = False,
+-+++++-         cache_position: Optional[mindspore.Tensor] = None,
+-+++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-+
+-+++++-+        
+-+++++-+
+-+++++-         bsz, q_len, _ = hidden_states.shape
+-+++++- 
+-+++++-         query_states = self.q_proj(hidden_states)
+-+++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
+-+++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-                     "with a layer index."
+-+++++-                 )
+-+++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-+            if isinstance(past_key_value, StaticCache):
+-+++++-+                kv_seq_len = key_states.shape[-2]
+-+++++-+            else:
+-+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++- 
+-+++++-         if past_key_value is not None:
+-+++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
+-+++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
+-+++++-+            
+-+++++-+            if isinstance(past_key_value, StaticCache):
+-+++++-+                kv_seq_len = key_states.shape[-2]
+-+++++- 
+-+++++-         # repeat k/v heads if n_kv_heads < n_heads
+-+++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++--
+-+++++-+        
+-+++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
+-+++++- 
+-+++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
+-+++++--            raise ValueError(
+-+++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
+-+++++--                f" {attn_weights.shape}"
+-+++++--            )
+-+++++--
+-+++++--        if attention_mask is not None:  # no matter the length, we just slice it
+-+++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
+-+++++-+        if attention_mask is not None:
+-+++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
+-+++++-             attn_weights = attn_weights + causal_mask
+-+++++- 
+-+++++-         # upcast attention to fp32
+-+++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
+-+++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+-+++++- 
+-+++++-         attn_output = self.o_proj(attn_output)
+-+++++--
+-+++++-+        # @lwx
+-+++++-+        
+-+++++-+        # max_seq_len = self.max_position_embeddings  # 2048
+-+++++-+
+-+++++-+        # if attention_mask is not None:
+-+++++-+        #     # attention_mask: [B, 1, Sq, Sk]
+-+++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
+-+++++-+
+-+++++-+        #     # pad 到 [max_seq_len, max_seq_len]
+-+++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
+-+++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
+-+++++-+        #     global_attention_mask = padded_mask
+-+++++-+        # else:
+-+++++-+        #     global_attention_mask = None
+-+++++-+
+-+++++-+
+-+++++-+        # sparse_mode=3
+-+++++-+        # attn_output = mindspore.ops.flash_attention_score(
+-+++++-+        #     query=query_states,
+-+++++-+        #     key=key_states,
+-+++++-+        #     value=value_states,
+-+++++-+        #     real_shift=None,
+-+++++-+        #     padding_mask=None,
+-+++++-+
+-+++++-+        #     head_num=self.num_heads,
+-+++++-+        #     attn_mask=global_attention_mask,
+-+++++-+        #     keep_prob=1.0 - self.attention_dropout,
+-+++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++-+        #     input_layout="BNSD",
+-+++++-+        #     pre_tokens=2147483647,
+-+++++-+        #     next_tokens=2147483647,
+-+++++-+        #     inner_precise=0,
+-+++++-+        #     drop_mask=None,
+-+++++-+        #     prefix=None,
+-+++++-+        #     actual_seq_qlen=None,
+-+++++-+        #     actual_seq_kvlen=None,
+-+++++-+        #     sparse_mode=sparse_mode,
+-+++++-+        # )
+-+++++-         if not output_attentions:
+-+++++-             attn_weights = None
+-+++++- 
+-+++++-         return attn_output, attn_weights, past_key_value
+-+++++- 
+-+++++- 
+-+++++-+class Qwen2MoeFlashAttention(nn.Module):
+-+++++-+    """
+-+++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
+-+++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
+-+++++-+
+-+++++-+    关键改动:
+-+++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
+-+++++-+       直接传入原始的 key 和 value 张量效率更高。
+-+++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
+-+++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
+-+++++-+    """
+-+++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
+-+++++-+        super().__init__()
+-+++++-+        self.config = config
+-+++++-+        self.layer_idx = layer_idx
+-+++++-+        self.hidden_size = config.hidden_size
+-+++++-+        self.num_heads = config.num_attention_heads
+-+++++-+        self.head_dim = self.hidden_size // self.num_heads
+-+++++-+        self.num_key_value_heads = config.num_key_value_heads
+-+++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
+-+++++-+        self.max_position_embeddings = config.max_position_embeddings
+-+++++-+        self.rope_theta = config.rope_theta
+-+++++-+        self.attention_dropout = config.attention_dropout
+-+++++-+
+-+++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
+-+++++-+            raise ValueError(
+-+++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
+-+++++-+            )
+-+++++-+
+-+++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
+-+++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
+-+++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
+-+++++-+
+-+++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
+-+++++-+            self.head_dim,
+-+++++-+            max_position_embeddings=self.max_position_embeddings,
+-+++++-+            base=self.rope_theta,
+-+++++-+        )
+-+++++-+
+-+++++-+    def forward(
+-+++++-+        self,
+-+++++-+        hidden_states: mindspore.Tensor,
+-+++++-+        attention_mask: Optional[mindspore.Tensor] = None,
+-+++++-+        position_ids: Optional[mindspore.Tensor] = None,
+-+++++-+        past_key_value: Optional[Cache] = None,
+-+++++-+        output_attentions: bool = False,
+-+++++-+        use_cache: bool = False,
+-+++++-+        cache_position: Optional[mindspore.Tensor] = None,
+-+++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-+
+-+++++-+        bsz, q_len, _ = hidden_states.shape
+-+++++-+
+-+++++-+        # 1. 线性投射 Q, K, V
+-+++++-+        query_states = self.q_proj(hidden_states)
+-+++++-+        key_states = self.k_proj(hidden_states)
+-+++++-+        value_states = self.v_proj(hidden_states)
+-+++++-+
+-+++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
+-+++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
+-+++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+
+-+++++-+        # 3. RoPE 旋转位置编码
+-+++++-+        kv_seq_len = key_states.shape[-2]
+-+++++-+        if past_key_value is not None:
+-+++++-+            if self.layer_idx is None:
+-+++++-+                raise ValueError(
+-+++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-+                    "with a layer index."
+-+++++-+                )
+-+++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
+-+++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
+-+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
+-+++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
+-+++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
+-+++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
+-+++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
+-+++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
+-+++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
+-+++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
+-+++++-+                if cache_position.shape[0] == 1:
+-+++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
+-+++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
+-+++++-+                    kv_seq_len = past_seen_tokens + 1
+-+++++-+                else:
+-+++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
+-+++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
+-+++++-+            else:
+-+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-+
+-+++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++-+
+-+++++-+        # 4. KV 缓存更新
+-+++++-+        if past_key_value is not None:
+-+++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-+            key_states, value_states = past_key_value.update(
+-+++++-+                key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-+            )
+-+++++-+            
+-+++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
+-+++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
+-+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
+-+++++-+                if cache_position.shape[0] == 1:
+-+++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
+-+++++-+                    kv_seq_len = key_states.shape[-2]
+-+++++-+
+-+++++-+        # 5. [重要] 准备 Attention Mask
+-+++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
+-+++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
+-+++++-+        fa_attention_mask = None
+-+++++-+        if attention_mask is not None:
+-+++++-+            # 截取与当前key长度匹配的部分
+-+++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
+-+++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
+-+++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
+-+++++-+            fa_attention_mask = (mask_slice != 0)
+-+++++-+
+-+++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
+-+++++-+        input_dtype = query_states.dtype
+-+++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
+-+++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
+-+++++-+            query_states = query_states.to(mindspore.float16)
+-+++++-+            key_states = key_states.to(mindspore.float16)
+-+++++-+            value_states = value_states.to(mindspore.float16)
+-+++++-+
+-+++++-+        # 6. [核心] 调用 flash_attention_score 算子
+-+++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
+-+++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
+-+++++-+        attn_output = mindspore.ops.flash_attention_score(
+-+++++-+            query=query_states,
+-+++++-+            key=key_states,
+-+++++-+            value=value_states,
+-+++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
+-+++++-+            attn_mask=fa_attention_mask,
+-+++++-+            keep_prob=1.0 - self.attention_dropout,
+-+++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++-+            input_layout="BNSD",
+-+++++-+            sparse_mode=0 # 使用 defaultMask 模式
+-+++++-+        )
+-+++++-+
+-+++++-+        # 恢复原始数据类型
+-+++++-+        attn_output = attn_output.to(input_dtype)
+-+++++-+
+-+++++-+        # 7. 调整输出形状
+-+++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
+-+++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++-+        attn_output = self.o_proj(attn_output)
+-+++++-+
+-+++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
+-+++++-+        attn_weights = None
+-+++++-+        if output_attentions:
+-+++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++-+
+-+++++-+        return attn_output, attn_weights, past_key_value
+-+++++-+
+-+++++-+    # def forward(
+-+++++-+    #     self,
+-+++++-+    #     hidden_states: mindspore.Tensor,
+-+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++-+    #     past_key_value: Optional[Cache] = None,
+-+++++-+    #     output_attentions: bool = False,
+-+++++-+    #     use_cache: bool = False,
+-+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-+
+-+++++-+    #     bsz, q_len, _ = hidden_states.shape
+-+++++-+
+-+++++-+    #     # 1. 线性投射 Q, K, V
+-+++++-+    #     query_states = self.q_proj(hidden_states)
+-+++++-+    #     key_states = self.k_proj(hidden_states)
+-+++++-+    #     value_states = self.v_proj(hidden_states)
+-+++++-+
+-+++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
+-+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+
+-+++++-+    #     # 3. RoPE 旋转位置编码
+-+++++-+    #     kv_seq_len = key_states.shape[-2]
+-+++++-+    #     if past_key_value is not None:
+-+++++-+    #         if self.layer_idx is None:
+-+++++-+    #             raise ValueError(
+-+++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
+-+++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
+-+++++-+    #                 "with a layer index."
+-+++++-+    #             )
+-+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-+
+-+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++-+
+-+++++-+    #     # 4. KV 缓存更新
+-+++++-+    #     if past_key_value is not None:
+-+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-+    #         key_states, value_states = past_key_value.update(
+-+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-+    #         )
+-+++++-+
+-+++++-+    #     # 5. 准备 Attention Mask
+-+++++-+    #     fa_attention_mask = None
+-+++++-+    #     if attention_mask is not None:
+-+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-+    #         fa_attention_mask = (mask_slice != 0)
+-+++++-+
+-+++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
+-+++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
+-+++++-+    #     input_dtype = query_states.dtype
+-+++++-+
+-+++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
+-+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++-+    #         query=query_states,
+-+++++-+    #         key=key_states,
+-+++++-+    #         value=value_states,
+-+++++-+    #         head_num=self.num_heads,
+-+++++-+    #         attn_mask=fa_attention_mask,
+-+++++-+    #         keep_prob=1.0 - self.attention_dropout,
+-+++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
+-+++++-+    #         input_layout="BNSD",
+-+++++-+    #         sparse_mode=0,
+-+++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
+-+++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
+-+++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
+-+++++-+    #         inner_precise=1
+-+++++-+    #     )
+-+++++-+
+-+++++-+    #     # 恢复原始数据类型
+-+++++-+    #     attn_output = attn_output.to(input_dtype)
+-+++++-+
+-+++++-+    #     # 7. 调整输出形状
+-+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++-+    #     attn_output = self.o_proj(attn_output)
+-+++++-+
+-+++++-+    #     attn_weights = None
+-+++++-+    #     if output_attentions:
+-+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
+-+++++-+
+-+++++-+    #     return attn_output, attn_weights, past_key_value
+-+++++-+
+-+++++-+    # def forward(
+-+++++-+    #     self,
+-+++++-+    #     hidden_states: mindspore.Tensor,
+-+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
+-+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
+-+++++-+    #     past_key_value: Optional[Cache] = None,
+-+++++-+    #     output_attentions: bool = False,
+-+++++-+    #     use_cache: bool = False,
+-+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
+-+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
+-+++++-+
+-+++++-+    #     bsz, q_len, _ = hidden_states.shape
+-+++++-+
+-+++++-+    #     query_states = self.q_proj(hidden_states)
+-+++++-+    #     key_states = self.k_proj(hidden_states)
+-+++++-+    #     value_states = self.v_proj(hidden_states)
+-+++++-+
+-+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
+-+++++-+
+-+++++-+    #     kv_seq_len = key_states.shape[-2]
+-+++++-+    #     if past_key_value is not None:
+-+++++-+    #         if self.layer_idx is None:
+-+++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
+-+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
+-+++++-+
+-+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+-+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
+-+++++-+
+-+++++-+    #     if past_key_value is not None:
+-+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+-+++++-+    #         key_states, value_states = past_key_value.update(
+-+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
+-+++++-+    #         )
+-+++++-+
+-+++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
+-+++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
+-+++++-+
+-+++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
+-+++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
+-+++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
+-+++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
+-+++++-+    #     # <--- 修改结束 ---
+-+++++-+
+-+++++-+    #     fa_attention_mask = None
+-+++++-+    #     if attention_mask is not None:
+-+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
+-+++++-+    #         fa_attention_mask = (mask_slice != 0)
+-+++++-+
+-+++++-+    #     input_dtype = query_states.dtype
+-+++++-+
+-+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
+-+++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
+-+++++-+    #         key=key_states,
+-+++++-+    #         value=value_states,
+-+++++-+    #         head_num=self.num_heads,
+-+++++-+    #         attn_mask=fa_attention_mask,
+-+++++-+    #         keep_prob=1.0 - self.attention_dropout,
+-+++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
+-+++++-+    #         input_layout="BNSD",
+-+++++-+    #         sparse_mode=0,
+-+++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
+-+++++-+    #     )
+-+++++-+
+-+++++-+    #     attn_output = attn_output.to(input_dtype)
+-+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
+-+++++-+    #     attn_output = self.o_proj(attn_output)
+-+++++-+
+-+++++-+    #     attn_weights = None
+-+++++-+    #     if output_attentions:
+-+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
+-+++++-+
+-+++++-+    #     return attn_output, attn_weights, past_key_value
+-+++++-+
+-+++++- QWEN2MOE_ATTENTION_CLASSES = {
+-+++++-     "eager": Qwen2MoeAttention,
+-+++++-+    "flash-attention": Qwen2MoeFlashAttention,
+-+++++- }
+-+++++- 
+-+++++- 
+-+++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
+-+++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
+-+++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
+-+++++- 
+-+++++-+    #@dwj
+-+++++-+    # 只遍历激活的专家，而非全部专家
+-+++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
+-+++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++--        hidden_states = hidden_states.view(-1, hidden_dim)
+-+++++--        # router_logits: (batch * sequence_length, n_experts)
+-+++++--        router_logits = self.gate(hidden_states)
+-+++++--
+-+++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++--        if self.norm_topk_prob:
+-+++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++--        # we cast back to the input dtype
+-+++++--        routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++--
+-+++++--        final_hidden_states = ops.zeros(
+-+++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
+-+++++--        )
+-+++++--
+-+++++--        # One hot encode the selected experts to create an expert mask
+-+++++--        # this will be used to easily index which expert is going to be sollicitated
+-+++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
+-+++++--
+-+++++--        # Loop over all available experts in the model and perform the computation on each expert
+-+++++--        for expert_idx in range(self.num_experts):
+-+++++--            expert_layer = self.experts[expert_idx]
+-+++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
+-+++++--
+-+++++--            # Index the correct hidden states and compute the expert hidden state for
+-+++++--            # the current expert. We need to make sure to multiply the output hidden
+-+++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
+-+++++--            if 0 not in idx.shape:
+-+++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
+-+++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
+-+++++--
+-+++++--                # However `index_add_` only support torch tensors for indexing so we'll use
+-+++++--                # the `top_x` tensor here.
+-+++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
+-+++++--
+-+++++--        shared_expert_output = self.shared_expert(hidden_states)
+-+++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
+-+++++--
+-+++++--        final_hidden_states = final_hidden_states + shared_expert_output
+-+++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
+-+++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
+-+++++-+            num_tokens = hidden_states_reshaped.shape[0]
+-+++++-+
+-+++++-+            router_logits = self.gate(hidden_states_reshaped)
+-+++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
+-+++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
+-+++++-+
+-+++++-+            if self.norm_topk_prob:
+-+++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
+-+++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
+-+++++-+            
+-+++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
+-+++++-+            flat_selected_experts = selected_experts.flatten()
+-+++++-+            
+-+++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
+-+++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
+-+++++-+            token_indices = broadcasted_token_indices.flatten()
+-+++++-+            
+-+++++-+            active_experts = ops.unique(flat_selected_experts)
+-+++++-+            
+-+++++-+            for expert_idx_tensor in active_experts:
+-+++++-+                expert_idx = expert_idx_tensor.item()
+-+++++-+                expert_layer = self.experts[expert_idx]
+-+++++-+                
+-+++++-+                mask = (flat_selected_experts == expert_idx_tensor)
+-+++++-+                selected_token_indices = token_indices[mask]
+-+++++-+                selected_routing_weights = routing_weights.flatten()[mask]
+-+++++-+                
+-+++++-+                current_states = hidden_states_reshaped[selected_token_indices]
+-+++++-+                
+-+++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
+-+++++-+                
+-+++++-+                final_hidden_states = final_hidden_states.index_add(
+-+++++-+                    dim=0,
+-+++++-+                    index=selected_token_indices,
+-+++++-+                    source=expert_output.to(hidden_states.dtype)
+-+++++-+                )
+-+++++-+            
+-+++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
+-+++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
+-+++++- 
+-+++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++--        return final_hidden_states, router_logits
+-+++++-+            final_hidden_states = final_hidden_states + shared_expert_output
+-+++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+-+++++-+            
+-+++++-+            return final_hidden_states, router_logits
+-+++++- 
+-+++++- 
+-+++++- class Qwen2MoeDecoderLayer(nn.Module):
+-+++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
+-+++++- 
+-+++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
+-+++++- 
+-+++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
+-+++++-+
+-+++++-         if (layer_idx not in config.mlp_only_layers) and (
+-+++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
+-+++++-         ):
+-+++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
+-+++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
+-+++++-     _skip_keys_device_placement = "past_key_values"
+-+++++-     _supports_cache_class = True
+-+++++-+#lwx
+-+++++-+    # _supports_static_cache = True
+-+++++- 
+-+++++-     def _init_weights(self, module):
+-+++++-         std = self.config.initializer_range
+-+++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
+-+++++-         return causal_mask
+-+++++- 
+-+++++- 
+-+++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
+-+++++-     _tied_weights_keys = ["lm_head.weight"]
+-+++++- 
+-+++++-     def __init__(self, config):
+-+++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++-         self.num_experts_per_tok = config.num_experts_per_tok
+-+++++-         # Initialize weights and apply final processing
+-+++++-         self.post_init()
+-+++++-+        # @lwx
+-+++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
+-+++++-+        #     self.generation_config.cache_implementation = "static"
+-+++++-+        self._warmed_up = False
+-+++++-+
+-+++++-+    def warmup_moe_model(self):
+-+++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
+-+++++-+        test_texts = [
+-+++++-+            "warmup short",
+-+++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
+-+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
+-+++++-+        ]
+-+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
+-+++++-+        if tokenizer is None:
+-+++++-+            from mindnlp.transformers import AutoTokenizer
+-+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
+-+++++-+            self._warmup_tokenizer = tokenizer
+-+++++-+
+-+++++-+        for text in test_texts:
+-+++++-+            inputs = tokenizer(text, return_tensors="ms")
+-+++++-+            with mindspore._no_grad():
+-+++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
+-+++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
+-+++++- 
+-+++++-     def get_input_embeddings(self):
+-+++++-         return self.model.embed_tokens
+-+++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
+-+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
+-+++++-         ```"""
+-+++++-+        if not self._warmed_up:
+-+++++-+            self._warmed_up = True
+-+++++-+            self.warmup_moe_model()
+-+++++- 
+-+++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+-+++++-         output_router_logits = (
+-+++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
+-+++++-             }
+-+++++-         )
+-+++++-         return model_inputs
+-+++++-+# @lwx
+-+++++-+    # def _decode_one_tokens_logits(
+-+++++-+    #     self,
+-+++++-+    #     cur_token: mindspore.Tensor,
+-+++++-+    #     input_pos: Optional[mindspore.Tensor],
+-+++++-+    #     cache_position: mindspore.Tensor,
+-+++++-+    #     past_key_values: StaticCache,
+-+++++-+    # ) -> mindspore.Tensor:
+-+++++-+    #     """
+-+++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
+-+++++-+        
+-+++++-+    #     Args:
+-+++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
+-+++++-+    #         input_pos: 输入位置信息，可选
+-+++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
+-+++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
+-+++++-+            
+-+++++-+    #     Returns:
+-+++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
+-+++++-+    #     """
+-+++++-+    #     # 调用JIT编译的版本
+-+++++-+    #     return self.get_decode_one_tokens_logits(
+-+++++-+    #         cur_token=cur_token,
+-+++++-+    #         input_pos=input_pos,
+-+++++-+    #         cache_position=cache_position,
+-+++++-+    #         past_key_values=past_key_values,
+-+++++-+    #     )
+-+++++-+    
+-+++++-+    # @mindspore.jit(jit_level='O1')
+-+++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
+-+++++-+    #     """
+-+++++-+    #     JIT编译的函数，用于高效的单token解码
+-+++++-+    #     使用JIT编译优化以支持静态shape和高效执行
+-+++++-+        
+-+++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
+-+++++-+    #     """
+-+++++-+    #     outputs = self.model.forward(
+-+++++-+    #         input_ids=cur_token,
+-+++++-+    #         position_ids=input_pos,
+-+++++-+    #         cache_position=cache_position,
+-+++++-+    #         past_key_values=past_key_values,
+-+++++-+    #         use_cache=True,
+-+++++-+    #         return_dict=False,
+-+++++-+    #     )
+-+++++-+        
+-+++++-+    #     hidden_states = outputs[0]
+-+++++-+    #     logits = self.lm_head.forward(hidden_states)
+-+++++-+    #     logits = logits.float()
+-+++++-+        
+-+++++-+    #     return logits[:, -1, :]
+-+++++-+
+-+++++-+    # def _sample(
+-+++++-+    #     self,
+-+++++-+    #     input_ids: mindspore.Tensor,
+-+++++-+    #     logits_processor,
+-+++++-+    #     stopping_criteria,
+-+++++-+    #     generation_config,
+-+++++-+    #     synced_devices: bool,
+-+++++-+    #     streamer=None,
+-+++++-+    #     logits_warper=None,
+-+++++-+    #     **model_kwargs,
+-+++++-+    # ):
+-+++++-+    #     """
+-+++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
+-+++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
+-+++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
+-+++++-+    #     """
+-+++++-+    #     from ...generation.logits_process import LogitsProcessorList
+-+++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
+-+++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
+-+++++-+    #     from mindnlp.core import nn, ops, no_grad
+-+++++-+    #     import numpy as np
+-+++++-+        
+-+++++-+    #     # 检查是否使用 StaticCache
+-+++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
+-+++++-+    #     # 否则，直接调用父类方法
+-+++++-+    #     past_key_values = model_kwargs.get("past_key_values")
+-+++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
+-+++++-+
+-+++++-+    #     if not isinstance(past_key_values, StaticCache):
+-+++++-+    #         # 不使用 StaticCache，直接调用父类方法
+-+++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
+-+++++-+    #         return super()._sample(
+-+++++-+    #             input_ids=input_ids,
+-+++++-+    #             logits_processor=logits_processor,
+-+++++-+    #             stopping_criteria=stopping_criteria,
+-+++++-+    #             generation_config=generation_config,
+-+++++-+    #             synced_devices=synced_devices,
+-+++++-+    #             streamer=streamer,
+-+++++-+    #             logits_warper=logits_warper,
+-+++++-+    #             **model_kwargs,
+-+++++-+    #         )
+-+++++-+        
+-+++++-+    #     # 使用 StaticCache，进入自定义循环
+-+++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
+-+++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
+-+++++-+    #     pad_token_id = generation_config._pad_token_tensor
+-+++++-+    #     output_attentions = generation_config.output_attentions
+-+++++-+    #     output_hidden_states = generation_config.output_hidden_states
+-+++++-+    #     output_scores = generation_config.output_scores
+-+++++-+    #     output_logits = generation_config.output_logits
+-+++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
+-+++++-+    #     max_length = generation_config.max_length
+-+++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
+-+++++-+    #     do_sample = generation_config.do_sample
+-+++++-+        
+-+++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
+-+++++-+    #         raise ValueError(
+-+++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
+-+++++-+    #             f"{logits_warper})."
+-+++++-+    #         )
+-+++++-+        
+-+++++-+    #     # init attention / hidden states / scores tuples
+-+++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
+-+++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
+-+++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
+-+++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
+-+++++-+        
+-+++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
+-+++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
+-+++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
+-+++++-+    #         encoder_hidden_states = (
+-+++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
+-+++++-+    #         )
+-+++++-+        
+-+++++-+    #     # keep track of which sequences are already finished
+-+++++-+    #     batch_size, cur_len = input_ids.shape
+-+++++-+    #     this_peer_finished = False
+-+++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
+-+++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
+-+++++-+        
+-+++++-+    #     time_record = []
+-+++++-+    #     from ....utils.testing_utils import parse_flag_from_env
+-+++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
+-+++++-+        
+-+++++-+    #     while self._has_unfinished_sequences(
+-+++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
+-+++++-+    #     ):
+-+++++-+    #         if _record_time:
+-+++++-+    #             import time as time_module
+-+++++-+    #             infer_start = time_module.time()
+-+++++-+            
+-+++++-+    #         # prepare model inputs
+-+++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
+-+++++-+            
+-+++++-+    #         # prepare variable output controls
+-+++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
+-+++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
+-+++++-+            
+-+++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
+-+++++-+    #         cur_cache_position = model_inputs.get("cache_position")
+-+++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
+-+++++-+    #         cur_input_ids = model_inputs.get("input_ids")
+-+++++-+            
+-+++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
+-+++++-+    #             cur_cache_position is not None and 
+-+++++-+    #             len(cur_cache_position.shape) > 0 and
+-+++++-+    #             cur_cache_position.shape[0] == 1 and
+-+++++-+    #             cur_input_ids is not None and
+-+++++-+    #             cur_input_ids.shape[1] == 1):
+-+++++-+    #             # 使用 JIT 优化的单 token 解码
+-+++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
+-+++++-+    #             if not hasattr(self, '_jit_used'):
+-+++++-+    #                 self._jit_used = False
+-+++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
+-+++++-+                
+-+++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
+-+++++-+    #                 cur_token=cur_input_ids,
+-+++++-+    #                 input_pos=model_inputs.get("position_ids"),
+-+++++-+    #                 cache_position=cur_cache_position,
+-+++++-+    #                 past_key_values=cur_past_key_values,
+-+++++-+    #             )
+-+++++-+                
+-+++++-+    #             # 标记已使用JIT（用于后续判断）
+-+++++-+    #             if not self._jit_used:
+-+++++-+    #                 self._jit_used = True
+-+++++-+                
+-+++++-+    #             # 构造兼容的输出对象
+-+++++-+    #             class JitOptimizedOutput:
+-+++++-+    #                 def __init__(self, logits, config):
+-+++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
+-+++++-+    #                     self.config = config
+-+++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
+-+++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
+-+++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
+-+++++-+    #                     self.cross_attentions = None
+-+++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
+-+++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
+-+++++-+                
+-+++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
+-+++++-+    #         else:
+-+++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
+-+++++-+    #             outputs = self(**model_inputs, return_dict=True)
+-+++++-+            
+-+++++-+    #         if synced_devices and this_peer_finished:
+-+++++-+    #             continue
+-+++++-+            
+-+++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
+-+++++-+    #         next_token_logits = outputs.logits[:, -1, :]
+-+++++-+            
+-+++++-+    #         # pre-process distribution
+-+++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
+-+++++-+    #         if do_sample:
+-+++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
+-+++++-+            
+-+++++-+    #         # Store scores, attentions and hidden_states when required
+-+++++-+    #         if return_dict_in_generate:
+-+++++-+    #             if output_scores:
+-+++++-+    #                 scores += (next_token_scores,)
+-+++++-+    #             if output_logits:
+-+++++-+    #                 raw_logits += (next_token_logits,)
+-+++++-+    #             if output_attentions:
+-+++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
+-+++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
+-+++++-+    #                 if self.config.is_encoder_decoder:
+-+++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
+-+++++-+                
+-+++++-+    #             if output_hidden_states:
+-+++++-+    #                 hidden = (
+-+++++-+    #                     outputs.decoder_hidden_states
+-+++++-+    #                     if self.config.is_encoder_decoder
+-+++++-+    #                     else outputs.hidden_states
+-+++++-+    #                 )
+-+++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
+-+++++-+            
+-+++++-+    #         # token selection
+-+++++-+    #         if do_sample:
+-+++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
+-+++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
+-+++++-+    #         else:
+-+++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
+-+++++-+            
+-+++++-+    #         # finished sentences should have their next token be a padding token
+-+++++-+    #         if has_eos_stopping_criteria:
+-+++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
+-+++++-+            
+-+++++-+    #         # update generated ids, model inputs, and length for next step
+-+++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
+-+++++-+    #         if streamer is not None:
+-+++++-+    #             streamer.put(next_tokens)
+-+++++-+            
+-+++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
+-+++++-+    #             outputs,
+-+++++-+    #             model_kwargs,
+-+++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
+-+++++-+    #         )
+-+++++-+            
+-+++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
+-+++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
+-+++++-+    #         cur_len += 1
+-+++++-+            
+-+++++-+    #         if _record_time:
+-+++++-+    #             import time as time_module
+-+++++-+    #             infer_stop = time_module.time()
+-+++++-+    #             time_record.append(infer_stop - infer_start)
+-+++++-+            
+-+++++-+    #         del outputs
+-+++++-+        
+-+++++-+    #     average_infer_time = None
+-+++++-+    #     if time_record:
+-+++++-+    #         if len(time_record) > 1:
+-+++++-+    #             time_record.pop(0)
+-+++++-+    #         average_infer_time = sum(time_record) / len(time_record)
+-+++++-+    #         print(f'average inference time is: {average_infer_time}')
+-+++++-+    #         print(f'inference time record: {time_record}')
+-+++++-+        
+-+++++-+    #     if streamer is not None:
+-+++++-+    #         streamer.end()
+-+++++-+        
+-+++++-+    #     # 简单判断：打印是否使用了JIT路径
+-+++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
+-+++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
+-+++++-+    #     else:
+-+++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
+-+++++-+        
+-+++++-+    #     if return_dict_in_generate:
+-+++++-+    #         if self.config.is_encoder_decoder:
+-+++++-+    #             return GenerateEncoderDecoderOutput(
+-+++++-+    #                 sequences=input_ids,
+-+++++-+    #                 scores=scores,
+-+++++-+    #                 logits=raw_logits,
+-+++++-+    #                 encoder_attentions=encoder_attentions,
+-+++++-+    #                 encoder_hidden_states=encoder_hidden_states,
+-+++++-+    #                 decoder_attentions=decoder_attentions,
+-+++++-+    #                 cross_attentions=cross_attentions,
+-+++++-+    #                 decoder_hidden_states=decoder_hidden_states,
+-+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++-+    #                 average_infer_time=average_infer_time
+-+++++-+    #             )
+-+++++-+    #         else:
+-+++++-+    #             return GenerateDecoderOnlyOutput(
+-+++++-+    #                 sequences=input_ids,
+-+++++-+    #                 scores=scores,
+-+++++-+    #                 logits=raw_logits,
+-+++++-+    #                 attentions=decoder_attentions,
+-+++++-+    #                 hidden_states=decoder_hidden_states,
+-+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
+-+++++-+    #                 average_infer_time=average_infer_time
+-+++++-+    #             )
+-+++++-+    #     else:
+-+++++-+    #         return input_ids
+-+++++-+            
+-+++++-+    # def _prepare_cache_for_generation(
+-+++++-+    #     self,
+-+++++-+    #     generation_config,
+-+++++-+    #     model_kwargs,
+-+++++-+    #     assistant_model,
+-+++++-+    #     batch_size,
+-+++++-+    #     max_cache_length,
+-+++++-+    # ):
+-+++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
+-+++++-+    #         generation_config.cache_implementation = "static"
+-+++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
+-+++++-+        
+-+++++-+    #     if generation_config.cache_implementation == "static":
+-+++++-+    #         base_required_from_max_length = generation_config.max_length + 1
+-+++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
+-+++++-+    #         min_cache_size = 50
+-+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
+-+++++-+    #         else:
+-+++++-+    #             max_cache_length = max(base_required, min_cache_size)
+-+++++-+            
+-+++++-+    #         original_max_cache_length = max_cache_length
+-+++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
+-+++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
+-+++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
+-+++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
+-+++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
+-+++++-+            
+-+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
+-+++++-+    #             if max_cache_length > self.config.max_position_embeddings:
+-+++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
+-+++++-+        
+-+++++-+    #     result = super()._prepare_cache_for_generation(
+-+++++-+    #         generation_config=generation_config,
+-+++++-+    #         model_kwargs=model_kwargs,
+-+++++-+    #         assistant_model=assistant_model,
+-+++++-+    #         batch_size=batch_size,
+-+++++-+    #         max_cache_length=max_cache_length,
+-+++++-+    #     )
+-+++++-+        
+-+++++-+    #     if generation_config.cache_implementation == "static":
+-+++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
+-+++++-+    #         created_cache = model_kwargs.get(cache_name)
+-+++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
+-+++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
+-+++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
+-+++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
+-+++++-+        
+-+++++-+    #     return result
+-+++++-+
+-+++++-+
+-+++++-+
+-+++++- 
+-+++++- 
+-+++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
+-+++++--- 
+-+++++-2.27.0
+-+++++-
+-+++++-- 
+-+++++2.27.0
+-+++++
+-++++-- 
+-++++2.27.0
+-++++
+-+++-- 
+-+++2.27.0
+-+++
+-++-- 
+-++2.27.0
+-++
+-+-- 
+-+2.27.0
+-+
+--- 
+-2.27.0
+-
+-- 
+2.39.5 (Apple Git-154)
+

From 03e95e4e202f66d0321cf2c9824aae4e38488afa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E9=82=93=E4=BC=9F=E9=94=AE?= <emmmvkdeng@gmail.com>
Date: Wed, 10 Dec 2025 14:10:28 +0800
Subject: [PATCH 2/3] =?UTF-8?q?=E6=A0=B9=E6=8D=AEreview=E9=87=8D=E6=96=B0?=
 =?UTF-8?q?=E4=B8=8A=E4=BC=A0?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../\351\230\237\344\274\215emmm/README.md"   |    23 +-
 .../assets/mindstudio.png"                    |   Bin 0 -> 448846 bytes
 .../assets/score\350\256\241\347\256\227.png" |   Bin 0 -> 77871 bytes
 .../\346\216\222\350\241\214\346\246\234.png" |   Bin 0 -> 144470 bytes
 ...0\347\273\210\346\210\220\347\273\251.png" |   Bin 0 -> 420137 bytes
 .../patches/0001-20251104commit.patch"        |  1272 -
 .../patches/0002-20251106commit.patch"        |  3200 -
 .../patches/0003-20261106secondcommit.patch"  |  2769 -
 .../patches/0004-20251106change.patch"        |  7498 ---
 .../patches/0005-20251107001commit.patch"     |  7707 ---
 .../patches/0006-20251107002commit.patch"     |  7931 ---
 .../patches/0007-20251107003commit.patch"     |  8034 ---
 .../patches/0008-moe-change.patch"            |  8789 ---
 .../patches/0009-20251109firstcommit.patch"   |  9078 ---
 .../patches/0010-.patch"                      | 49453 ----------------
 15 files changed, 13 insertions(+), 105741 deletions(-)
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/mindstudio.png"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/score\350\256\241\347\256\227.png"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\216\222\350\241\214\346\246\234.png"
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\234\200\347\273\210\346\210\220\347\273\251.png"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
 delete mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
index a3cda3f5..b29d1520 100644
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
+++ "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/README.md"
@@ -10,11 +10,11 @@ Qwen/Qwen1.5-MoE-A2.7B-Chat
 
 在无精度误差的情况下提速这两个模型的prefill，decode和显存峰值
 
-![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=NWIwMTIzNjY4NDhkMTI4ZmYxNTFmMWNhOWIyNWRlYzZfeldtbk84b3lhUWVNYjJCZlRtT05TZ0JubU1hMzB0S3RfVG9rZW46WU10NmJsWXFab01CaER4NFlHT2NZRzJHbjJuXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+![img](./assets/score计算.png)
 
 ## 最终成绩
 
-![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=Zjk2MzEzNmNhYWUxODQ5NzI1NTNhODRmMjhmMDljMGZfeWNuZ0tzT3JBcHBlY0Z1ZnFJWHRNczRGWnd1UWFOaGdfVG9rZW46QVRXVWJGeUpGb2k5R094WmxuVGM5TUdEbmxmXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+![img](./assets/最终成绩.png)
 
 # 比赛复盘
 
@@ -32,7 +32,7 @@ Qwen/Qwen1.5-MoE-A2.7B-Chat
   - 通过简单网络来测试，flash-attention对于长序列确而有提速效果，但是在中短序列不明显，有时候还会因为未知波动效果不如baseline
   - 官方接口 `mindspore.ops.flash_attention_score`会带来一定的精度误差，具体而言qwen的prompt2会mismatch
 - 算子融合
-  - F.rms_norm 不仅没加速还带来了精度误差(应该是qwen的prompt1会mismatch)，遂直接放弃
+  - F.rms_norm 不仅没加速还带来了精度误差(应该是qwen的prompt1会mismatch)，遂直接放弃；对于review中提到的融合算子精度对齐没有缺陷，我猜测可能是进入F.rms_norm前所必须做的精度转化操作导致的，虽然我当时尝试了float32也还是有mismatch
   - 但是我没太理解会议里面讲的要比较下放损耗和融合算子加速效果，我个人仍然觉得这应该要work，但是却没有
 - Graph&Pynative mode - kernal/图复用
   - 一开始打算用分桶填充策略，设置 `seq_len = [1,2,4,8,..,128]`的桶来多次调用模型生成来生成这些尺寸的图，为输入的prompt寻找恰好不小于他的桶进行padding触发图复用，但是毫无效果，于是开始探索图复用的条件，网上有说法是需要 `@mindspore.jit`即时编译/`Graph mode`静态图模式才能生成可以复用的图，于是进入下一步测试
@@ -41,7 +41,7 @@ Qwen/Qwen1.5-MoE-A2.7B-Chat
   - static-cache ：没做成功，因为需要把动态cache 换成 static cache，bug较多，时间上不允许，而且直播的时候说提升不大。
 - Profiler
   - 这是一个很好的工具(疑似)，但是直到最后都不知道如何使用,一方面是断点设置和信息收集的问题，但这个问题不大
-  - ![img](https://kxqaj5kr937.feishu.cn/space/api/box/stream/download/asynccode/?code=ODNmOTFhMDg2NjZjYmJmODgwMjBlNzVjYTE1MWFiMzRfTTh1S1pnUVZXbWdPRGU0MGhSREh5TU05ZkRaNEJCMGZfVG9rZW46VmFIRWJnMDlob1FUV294YktYZGNTNklqbnlmXzE3NjQ3NTAyNDg6MTc2NDc1Mzg0OF9WNA)
+  - ![img](./assets/mindstudio.png)
   - 最重要的是这个页面我只看到NPU的free/compute比值很大，除此之外不知道如何分析来调优了,要是能看**别人实际调优一遍肯定会好很多，求教程！！**
 - MOE分析
   - 通过模型原来的代码，在self.mlp = ... 这一行，我发现了有一个if控制流，走moe/mlp,尝试使用走mlp之后，prefill/decode耗时降低了**20倍**，这时候我才意识到，原来前面有的没的都是**次要矛盾**，只要把**moe这个模块的代码**优化好了，就已经胜利了
@@ -186,14 +186,17 @@ Qwen/Qwen1.5-MoE-A2.7B-Chat
   -  在预热的时候，记录下所有被激活过的专家的ID，缓存那些在预热中被激活过的active_ids的权重（ops.stack）。
   -  如果缓存已经建立，并且当前需要的专家 eid 就在缓存里，它会直接从连续的 cache_gate_w 张量中索引权重。
 
+
+
 ## 收益点
 
-| DeepseekMoe + Qwen moe都进行MoE模块前向优化，decode直接遍历激活专家 | 总分的具体收益估计是从100->120                               |
-| ------------------------------------------------------------ | ------------------------------------------------------------ |
-| 在DeepseekAttention和QwenAttention的forward函数里有用apply_rotary_pos_emb函数，而对于该函数里用了rotate_half函数。对于rotate_half函数，可以使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]。 | 显存峰值100优化后100prefill133.4445 132.4821decode427.7311 437.5848总分220.919 223.3556 |
-| **moe_prefill_fast** 通过将权重堆叠、对输入 tokens 重新排序，将原本多个小的、串行的专家计算，转换为了几个大的、连续的计算块，并使用一个高效的 scatter_add 操作完成结果聚合，从而大幅提升了性能。**moe_decode_fast** 将多个小规模的、串行的专家计算，巧妙地转换成了一次大规模的、并行的批量矩阵乘法（bmm）操作，彻底消除了 Python 循环，因此速度更快。但是有mismatch，所以根据LongPrompt来做dispatch | 显存峰值100优化后98.4848prefill132.4821 163.8114decode437.5848 454.7424总分223.3556 239.0129 |
-| **init_active_expert_cache**和**warmup_moe_model_deep**：在预热的时候，记录下所有被激活过的专家的ID，缓存那些在预热中被激活过的active_ids的权重（ops.stack）。如果缓存已经建立，并且当前需要的专家 eid 就在缓存里，它会直接从连续的 cache_gate_w 张量中索引权重。 | 显存峰值98.4848优化后98.4848prefill163.8114 198.4985decode454.7424 493.2538总分239.0129 263.4124 |
-| 通过 **Pad -> BMM -> Gather** 的流程，将所有专家的计算合并为单个、大规模的并行操作Pad : 将分配给不同专家的、数量不等的“锯齿状”token数据，通过 tensor_scatter_update 填充成一个规整的、[专家数, 最大Token数, 隐藏层大小] 的“矩形”张量。BMM: 利用这个规整的张量，调用一次 ops.bmm 即可同时计算所有专家的输出，将硬件并行度拉满。Gather : 计算完成后，用 gather_nd 从填充后的结果中高效地提取出有效的输出数据。+但是有mismatch，解决思路是：在核心计算中使用 float32 保证数值精度，从根本上解决 mismatch 问题+根据LongPrompt来做dispatch | 显存峰值98.4848优化后83.3333prefill198.4985 487.1616decode493.2538 490.5996总分263.4124 353.6982 |
+|                     策略名称                     | 说明                                                         |    显存峰值     |      Prefill      |      Decode       |       总分        |
+| :----------------------------------------------: | :----------------------------------------------------------- | :-------------: | :---------------: | :---------------: | :---------------: |
+|          DeepseekMoe + Qwen MoE模块优化          | Decode直接遍历激活专家                                       |     100→100     |      100→132      |      100→400      |      100→200      |
+|                    Rotary优化                    | 用`ops.split`替代`rotate_half`切片方式                       |     100→100     | 133.4445→132.4821 | 427.7311→437.5848 | 220.919→223.3556  |
+|        moe_prefill_fast / moe_decode_fast        | 串行专家计算改为大批量并行BMM，减少Python循环，速度更快（LongPrompt dispatch） |   100→98.4848   | 132.4821→163.8114 | 437.5848→454.7424 | 223.3556→239.0129 |
+| init_active_expert_cache / warmup_moe_model_deep | 缓存预热期间激活专家权重，直接索引cache提升性能              | 98.4848→98.4848 | 163.8114→198.4985 | 454.7424→493.2538 | 239.0129→263.4124 |
+|                Pad→BMM→Gather流程                | 将专家计算合并为一次BMM，保证精度float32并按LongPrompt dispatch | 98.4848→83.3333 | 198.4985→487.1616 | 493.2538→490.5996 | 263.4124→353.6982 |
 
 ## 总结
 
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/mindstudio.png" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/mindstudio.png"
new file mode 100644
index 0000000000000000000000000000000000000000..a963d5c66afcbf9a0e3bc28050d7b638aea560eb
GIT binary patch
literal 448846
zcmd42cT`l(@-94ziV{VVWRaY677&q~a|V%|bA~}hKtMo-I7CSU!w@8AMnJNXVaSN&
zFytJ+?eTqo=e+0MKkoYexp%L{+Iu%m@9wJVr>dT6BGgsoaIwg-Kp+sVg8WNO5a{>g
z-Ou;$0XGPUd>**n@pz%2eIK|2?^}KV-pM>=^gXp)tUZ0q-K{`2&Mr<?93B?#R#wg)
zwl1DK7)S}AlenF>zNfUimAR*#i}Mq0J0~lk5d`Ap<`r13uj1iZN&mpZy^@^&WF;kQ
zB|U4Y?#&Y(;g#eX0iop&VcY_%1rWyj{O2Ih6Oh77DQ(~Mty!NyGoPTHy^}MJt16UQ
zfc{8QEyE5$mv=Y)gT;H~Q`idicR53o=7vqChUFI8(081(pJP@ve|n#Qb6-Jz%hyG$
zhQUvvBCPu4679B4Q(@Hy;pj<1Gy93t2w#F%0|!nnL{^vf)IBdH=F5gAC+&U<;y<qM
zEJ=+f|MC9&F-0WJj`!A+-;e*kJaG9yxBvYLJaJ<ZfbRUG<rUsj(EWe3e5JJHQ#o)(
z{nhxL%zXZ{zo^*QveMw?88ni2+gpb4M=1FY5#QE`x>|u~W1Bk%rzTd}ON@X0#ZQxO
zg0o?93$*p~fhxyEixqgdijcMl1uV`I^_i>K?hjE&`0cj0U_VX?rp;aF$*1DH451u$
zSWJ$(%5!Di8}co1@I?rQKkP^beMqexJIMvrOcjtZ%0X>fOwwpqh_A!W@BZscUnyUm
zkAAtrJddR!wc-HFt4RixxuYuwT~5=lHxctAt>#+eZTK5%D>kh5Bn=rl@ngP?rCn%m
zgjbp~)ftLwbCORAdTe$6O)o0pSDn+?U!wAvsk;4U&LZ40_&A)^bHY*pr^j!d1v;O*
zJJf7HU#pdEcI72gW-ON5DxTreGk)$xOrKT(BZqJzBWH45PYdkJ{H5N`Na}iaj*}fV
z<_46t3}10TIio%bOG9)Zqa{*T^!usTh!xUv%}y7Gw9QPiBkX^z=ZzS%r>x|4RPZ=K
zElayb4*pf^YSp9{THD)P;O3XhQ8XHjl_tm$xZihv*dR<6=T9}IlMBvb$(YTjBNbw)
zo$#lKitY~DEA(#ik{>%lZ%{mtj}d0D=^uG#g{XFG*R&GB?Kv#m+f}7PIOr9KneE&}
zV`{sK^^vo=)Rf7+BsDv0Xen=2B4{t&FF?WylGH9t7Rhk<u)We?u<CogvkgLjY~m|N
zihC>ePPues@G)j@L`7`aO{uv6&ax;vGI(WvNgV7ug%Kb+ulWL{m$(Tj9#=T8Cht{=
z|JB&?)Md5b?&P;nuHDztG#hlntX?C-IzB<o$IS4+(k@3^8Q#c0AE~QU_PMd$Rf3Qe
zqPAzfu#BgzmlA4d*3>5w@;;mgV9K_eCMgpWf9rkyxZ)crocYST^upf0a6{y(y*6{q
zh>2oUU`dPlkw&*r-gYM|I@l@MekM5$58-`dH04ul=;aiGHP>0xS)YrtAerrH*corW
z_2<LRZ-4NY-CR3Z`pO^M1Q(7L!ciUa(eB+R-|>l{NO@V~8QkeGjH`-jun9{k+$>u~
zp{jRe-zzj;xBFM!UPbd&>#d;t3!@5{+Eq<C!%%6N_l;R16T1#X+^zTc@3lP*xgsYI
z@FdolA2Jc$Yk6Iwz};EnMLCl{lGzunjEjfE>bTOaUNIiHAFDGu&efinuU+7oyZUOx
zI_BneD(t&!u6IkIwm7X8Z+fqBg<+rjqSXUD1Y$3J!;!7}+1c<ZYV=j=N_CpET6CwQ
zS?r!yY-X2Z)`vW!=2*RfYa7#ZaC7_4x4)NcJ&7CdzAa>?6=!j-<YPHiQZ&Dp`b%;*
zj-uwkI8ZgT2Uf~{CMJV~69@?2yxi-^YB6uLF*KtzIrd3C3e&<7A326di0#1bsQ<|D
zjaUaI*?9LDE#Hj@T4bVB2j4oD(}AFHWDbJ^%Or6ZD+Z^}MNzQc>(s2J*k&M%ewsJv
z7Xn%8E~j21XF3!^sA9jfG4}O~-rFyhu}WItN&DuVg#ZD`^$k*8m9wds&Ezula1V9{
zIi6@j`X(e0H9iR&fBS*H=hxI{rYwKugJQk&`{61o6`-*1tyky$J7}7#*9s#wT$gdw
zedzIHm&<7I5M@$aokrGGC6l&&VjC{k=H^YEk~l#WFU_gh`Bk_F7YCZBncPa%TbvWU
zU$cH1BUH7coiT-ZS@8{KH<u$Yb^KKvWgNSnvLJxxjc~s}Or3dWOr1AtYNtPfjA(zk
z4)J%lh$$8-6ywz{3lZ~PUxVKB!to`lc*!D2YfxInPFxx!AIt6&ZFWs>T#$A3BWtrB
z9II?o;Z9OIh7s$iW3d0tBmHo0X&}yk@Q+w`Gu_(@bYn(b)Ec&Sj`TT5L|ErDD>9rc
z2lD!^O>3JwJ~>&QB%r;|@cz}|v95GuP&jOs`j651s1AxhMY8mKVUdt93?tLMSimU4
zmWROOih@LW9>$qR=>>jY<D(rBe>&i0<BMImymTu=7?l^(LU8uZT;{3yS-7Znz!LG*
z!3j9~95MbFuXbk$dA%50jcCxpnETX58|zQbRm3C=2GOy~^3*6To@9*nUy<@ZCZA?Q
z939{7f*CCyrEDMgpV1)mX?BG3Vbry^vp)rfphbgsZO-R}z=a{R3s7SIcn8M@ouf;i
z!yT7QZHs{qw->(L&XhW;5-gkzT1*aXpT#^1LEhAW8wwOv#Oe0KpJHJ)k#E>(T}{?{
zof^k78s8epeVb{1)^2`xM%-TbxafMhQBp4>*t*~^N7ocT<LWOtJ@|>pD$5{T8bhn5
zT+ZKSw16IPMp`xN8c0|-zv>`WrXR;&+jUks`eU7(w6W~Fq`5|YfwFC#fv*p!3c>pM
zY)zsyH^qUl>=cHyYS@hz+7_$t@ME`=qMcSVo&AsT)Gi^!ghXCyv)G|`qgi&p^i;q(
zm^ewUk6X_3*BHvH)j=^#8roO8o*5c}b(B7IoiVn7WkSs5k!07N%CED7)?}V3(Bc9c
zjk#FbUq_A^BHc<zY&w+Bzx94FDU<_S<yNpmJ~HR47#vn7+ujpU?Jb)^fpeOjz|b(A
z9&AB}KKJCn>WicC2*EQ<waw)$YTZd2L1w?Oe1+6=^6(9>3U_q^t%|&|aY7oy0vv=I
zh6xpK<Z#(d!T{xHc`_I_5<&o#O>iG?GQ%(F)j9l1aJ2-L-iS0_6;^8KrD@SSuxKut
zzk$V6pA&CO6bIQ*tGoGqulu^dF;DSFct7#Vp(}@T_c9~k=lm`keo!7OWJ81p&quq%
zzug}D(}52snjmh&2iW6wXgG#{^(2B1GX}wCB_E#pf|rA(XhOX-feM*G52dc1A<%-E
z$E4lr=l?MznJn~}lS98zb{Wr3#y!g(87waiGb`c9h%8QFibCp+RfC}rcBs*z1M>Kd
zdoiR<!Cq3kq|{uYVa6swk^16eruVZl(&KbJv#;6{v!#_6!7vZZS~i>6;1MPXea>VR
zh@Gk5dn*;)>oytIX8XH_eC}h{eFA9^IKm#;6eKI4>MkC*>UD|+%g`HW3T^0Rft5S(
z6c9?wYc~#8#kBOnvpgAFCIuXDl*Axk8Pwdxlk&$Bc^_+|3dBrvkgkkDW3nTZmEAyC
zcd_o8(<!QM08!z5*i<rUbcCRp%wvxtTDMcL;%UirJDGvN8BU7z>ZOb1*T>Fws761L
zsT+BqY{j)^DKcsQ#4bOn*1|@XLmxe|nK9=EZ&<t_L4Cgv(1bb0TPtL^A>p$7A7lSO
z$yc;>?`ks<NIM~&hs?0ryoE9h{;FL<{9>5EO9scFa@^iTBRwY<hYa%ljamy?siA|4
zs!q=Ga0+#nvO-*|#8J4iLRgObeu8<k8%+hH!w_B8c-H0F&ZgFlQ=0BlN1wI4^|#!l
zkb$-EX(Rsj3K;()d6_u)v(oI_p+%~*j$R+*DRTu4H@A(okG^r#UPs0*ip}e3TjYJZ
z#nqdY%iMYLH+EN-jW<TYXGc4m9U;-=Ev=tcy5EGgC2#ci>rAN>e{ob%tk;Z13Axq3
z`eSOZV(Jza%fCY0Lv41g9lH664O{FtyxmTiDKn*z<}8#*uU_vHQB?0oWuzG&yHoTW
zH~MdftfhPwcs&#=n~;tt9nmLbYnM(d(=ax{&pz@6VcBkPWmp^;*^*$&7qowF{dl!P
zzmpnbB370`FY|&{Md-FoM9*kjbnFJ<&8FH>M=U!9E)+`ZrL7BJ)MV<!)$8K6Bx7Ub
zMZ4K(F6u~|pc3WY>5X}?F*bMN5kI>$2-?=Gu~+aeE{^O*1*rzyY)+rAc<7L~^gHoW
zs>+z0Q&N&cPOOImQ1z=qEq{k#VgtA|^BprMpQf&|?R>_`aM=8(0U(R&zbc%%fZgzy
z<j$)r&&k0YPb|kVE!Q)zb5$M6;7pYg|6a4SGO}sz3!q2y<aTHI{<~9Oh6RqF_kA$Q
zIbwTcCGRdwnY`~f2gL+2nJaj*jG4>x5<3d#Hc-I6E8m@=zT4&G!5p*@m~X~8zDk8#
zMCLt-POimu5O!73lda0hNX|_2yBOVUxvt|wxIq@<hZDi;r_Z%YQaMqN*2hAzn{&&B
zqm<T5C*?t=;RLx&0<r7y$P?1@;ngg4nN%IXLES`L6;VSk13!NAOHF<qXUS@O0RvYo
zCfIY07H()a-*Z_WSMzfDqxv^KPtyEC$gV%u`JQD3DmD*Xslzp5Hf!SZsDqzhGfA@8
zIHo%T2yd{Zuy1vfeeQo<tNF;KNl;<um3-@#Wc)}Hw-r*M=H->b)asF5$UKB$2;5$?
z@b+`d(m55w#IJ%a(mx;aNm-8{2wsOqk8!4%_wtwY8a2w+2bhg@e92DOKQdgS%qZyx
zi_}}4FCRg32Ld{&&pv{)0;k6+XBBt_8fHy%h@*aa?w@{nD&2LobyCvc;}rL3ET<pw
z&OA5h(Dnv-)jYQk1ndyw{0=dz2e^o%L$|R|hMrFyiKW`@32@*1`W&m#*lwImR1B#P
zt1(I0;Lp$L*tNDYvLD^uwJhQ5n)HEEVwnLt)_!5dROo#ZqYGvGQTo+w=CLAT1yQ#z
zW^pYc=X*$lDnEmwKrhVdMizRlu$)L;TeloWKd`q~Fr>3|2dC66Pv#<1^7F#h3#$Tp
z=1!7){Q@0LeHfW0w{J#(^^$%21<B@`#~69c;pQRYrgVhx`fBL=&RN3Air_ukzoNi2
z%e11mm9L}kF`Q*y_*hMqRG3z-$aV9Nk>LO-LvMlUwNLT<AqRQFlHHAi73`&D<;N@n
zZX)W8Gp>{?8+VvfrZ>_^OGp9IGJfO~s)b87zIT<nFa*8%trRB(t248I>Ew+MJ>Qe8
zonf0slVhb=Uv|6A!?=R@(d|ms>O9x2g``<?nv7e6hh`7(=CJ4tqvmhSOYAS>VvfT2
zhdt$ywxaxYBl0MgjHbt*#Ef%m#q1fqm%E2oF!fl6KGCN|ju&g5@6B%DV2*;fR{d8d
z){Zs1SUCpd)wj<B1a1a{*VCDbuPen4M=EPg7ngFh&j$&o26uO+<_QL$|De8dEDn@T
zI^`8V&-PNRSLyqe8WLUf46IKYL)l^-OKa20o5*15@=Ce~0oFmTPn4p(Gu2HlKHpB#
zujkDXn3-XzWg$ySWAWbzXvb>(V}ZT}P+kyobQp*(AKS}(mCLFu(C9c@N<ovGRwLVr
zjo4#8#J#ighp(&6If)EJw=jFzZ&KE~k9wU7je-=Z&yaMmpMOX{{zi7vDBu;}OPI7V
zd7NB59Uw2hkxXBoP|Pv3d1N_+nP97y;`Vng20%sN+o2~}=9PAixDV)LC!<&s#7oO*
z)2{X{IFXl?1fT><dd`uuKw1MCF`o@rujZ+XS*%`88^?E=#$>%Fsb!B9k`l1QkIj)W
z+T(@gB}N#|$fqRXuOAura+>B;pE!^2v#;v=e%Cnt0A93f1Yh^%7TM82%)C45-4m$^
z$*F{_N+hlg(&l`FVpYd{?7K<hIu^7YUql*{iLkOMQf@jl2$-HwF7_U_4g{;#PuFff
zU%p$7f6p>PMaOi~-YMWGHHUo5-Ud}Iqd}-r)RMEU-p&^_+V<>;JRD+n(}I&(q^w;o
z;?gu+dcBW&NUM*UZ2D1Au)@u1mguq|ZbxV5jJCc@QNSOL1JpF~n1h3(IQUNl1A5he
z-^5o-QglNP42D5oJ4V(`>(u+ys`_ABkw+NbtV@m)9-w_1mH#O4XH6xP@l5j13&=y}
zWkz*@c2Jc`qFmXel|B)2SIUW{CSo=7FLiUJj6eGNSLa<-b0w}J$ZP46SqyUT3!-uv
zHXaVMd^SP5Y1m!G@)<PNUt3SC-PYN8MMwXyzx(U9@^~)v4|no+`zyQ(C?ny2vqUIv
zj3<Bh_;dI8MB9^)C)0<TTLzf9|Gd6>^87EIAKrTYpItzpl_$5~{Ilh|)E&^Dx&Gar
z5_12)=kb5UR=os0{CkLh-QS3O06qCf3&`VtGok5O&VP2|_IMD%;@;W8BFd;bG*n(8
zMkzF_$Gw(}LMII@CZWW0vRhjWiiv8mZ+*4f?;z=^ny|u+?p|<eo?iaPK)$gRVmS$V
z{AT5D4{Aqa?NTkWoCL#v5mH5}DJRkU&9vK>mT7SRMCS99l4O0#Q#ol-GY~oB#j)r;
z#wE3{3g|*$g|}Dgwks2T?;C-1U?;|B<S#AJAD!S9=9if0jM;dW*DsfjdU8lZSQMg?
z%9{J~m5F$^8tOJEuw-pm3-<dW$qE}Bsz;r{QmrYD9bVfS*AbGBrzKMSkfNlQHLnBk
z*$U3td`f<kqyN5kz$|uTz27+!#_XHdB}(zG6=0mfjgNXLu88o$$A*RruQU78s5k9n
zgHQDmKc(b(*su~uymX!zPfW^g@{_T5F49vw`L-7%QMTbQ(6f5bujr&VT?dYqI6b{c
zOpDJi-KHKhDb({(D;qY3r<!T9C$s7N;3)CGE|44<T4b^gx`fu{@_~+tB=Sq|W@S{E
z=pLaGX7(#C_d53TN)zaD3+4m#Fy5f}Qm#I|7_uY(?)MV9dM!2n>B2erAC%@#6Xwap
z>~uc{p`gpepYyhBE`4ERhAmafXCel@@*eQYMmbWWDX)t2Vs9E@M#NZ7x?cyxt_WGX
znY5{<G%yMsVe;DAZK0^?>+eBDCgu`OE&pKCoNLNam@X;QN04USsQHOkCQtfrEUMY}
ztwc{%2V}Ke3`h=!o_9WQQjEm*+7YKXG|IY2FbzVhh+kYrRSI4!H|HjUyzw&h6x%BS
zEiuE7mK#m_^Nf#fXnWKhcX--@|E&q&$r)QNL2J@46j;LYZWfEy1Gn<?EaY{bZizl`
zsfwv;(!P5|m77zjZZg-H<kd8vdq7ZcFIz%@rTi`APEx;A7%kuHm}zap8vX4!^Xh72
z%IO+=G$UD<Rqu9=#cLhS26df+m{~)U<Y#Ab_;#}!iRXP~m(}EHsb$lp+b=>kIQeZ)
z<}l9Nxvp!%<Yvx099+ID;_`XPVZ2e`;9_ZUIGH$!V!KgoO`(**PuKMxGc#f7ou^aa
zwPZcG6xz0o;3MI4RXlUK*`DS63o-vJ?`xyJvpJkysD5iFao6mnQm^7s=s@6$hzEN@
z&#YBCc!`OZ*3;%M@H5IB)K#S-$ouu?M^ED?{cmz3BO-K+1K+*K{v2pp<{!}DdHR(z
zzit>dtETKYQ#@K}x%Z;^W&!hXx$H8_Z2`#~F0GWzuBodVc7A@YV-O_U-cARR4S6$v
zd3<sdZg%OEArfGU8?92Q*{3`+1p3S&%xvW9y46U4XL=r7mNOGgIE9%(Ry4~idSKQ$
z>B3ySKDjroqGdH*TPV4TLEZ~if4j!saZIf<4n@W@Y-#AN7YKYhD8w1j4qR2WO22@y
zTf5FY2AI^$Uj+i`Ee5S?>>}X~iGh0sidz#BB_8%h`vv6PLCQ&=<bW|%Cbk=Pr5S{f
zg{2U+o{R?Bur@`<ARRleo2Sx?BgL(S+FprZ-OL0mnPOEL{7fBxw9hhSX@>4!{KQ|T
zGZ&O&u}DOb?5Y%$!fTZZ+ivu^N?1Vl8ME~GueAhtme!c^xb3-je(M!d>O(gS^BrEs
zUh5DegluiUm{Flirkc>S3<Y_;_8C&eKkAMQi*=*Gi>RAM(_o9PI)>n6cGeNn;8mez
zPtmTLySHxz9nc`lYSnZVAw_CgV{fJ94&iFoK36?i!AIj!&}^T>M_;vgaZHL&G5s{z
zIt6j9Bn!XLDZtEWYMX=S2rI&56HG5=%bf_Kw4yg#n9SPVl{;l7xch1L5{qt<V^hvW
zE*u7oW}VQ6Qbkxy`zv0DZ?!lx)Hd<g{2<jUh6&7*p`7W1a%bQOft_v}Vm{yVB!_=E
z6z-F6-&DE22$7BObA{eKo%#YZra);c2My{7mcj?)YT3Mgc0cI&)-hlR8vK+UIYyC?
zm{G7qlx%DV<+WsH5D3ye&x+6?W=mL692)@do!6n&f-Ws8^x4l>YLR7xOst9N>3L;W
zW66i3U0IaaAjM%N$pD?;8&q)qS$;<myxCRGA<I8|$wydi-@Sh0TIWnISgF6&143Wx
zP!jk6JHc<oCDG?Hs^ZL_K>=eDKUS>9;ksa1N&kt%EA_jWH{rc=GezG=g6R}CIL~lH
zooXe-^CVmJz0JbiRj=W|5A>t!a|nOR)lYp2Es|=EGc7<;k=tdOJ*D_|t;ZLM6X!+B
z2x${+iQqo3f1pcB+UsA=5+LE;1m*R&9BIxQYnU~Tkncm#`&Ox1G~Bp0MXDb!sPCWZ
zc(nM-_TBU+MeZ*bNd!4N+L_sT>VDlS#u+oQ9n5a;TEf3GIy9MIbF7+QP;Xgj0u3l@
z&eh`Ja4OO---<y=)Y^a&61*2M6&rWCAbIK|fY~3DLh`nXklHU$)C!QsE8HVuurjSp
zs&YGPJ5C>e{@ZkXd59*(wTEL^^U@0Gn8`j%Xf9+)N&oXi%VMo|7Qr#%=b(*c#3{W(
zhzPy|y;I2Tp-Bs;z`gu^1`Y^4u=BzwTaJ~yuJl7hV!Pv;j0xH$%+Fr`WBEH^Au){4
zcB#Yp`N1ksMW6f`ZOQdgA}B906(Q3|N}`UBj)_QWUy**ZR0zgZgL;52<Z#!5M#Idd
zHkEv%v)atGT$iK|4c*}sW0SV&AOAQ_f+u(ApmsChHVuJ165x{56K@W}3VpG7JZ4&`
zCna_51mBTe2?Py3YGCyud)%}y;%QZ+;Nbe|FvMKr+eU)Gk}fx+96>-aJ=~#nl_|rD
z>v4ux-w~O$FZ-pBS2Tgq;e?>s>-CA%&Z8Sq1tyZvbUSz7Y3KUCh~2*y=Mm+G-70i2
zEU7%At@*l5<tQ#8<;e)Y)#FNnD~ht_5(99S&sIPYn+j>#hS#qNDJ$#M7$-B`7Q=W>
zUomXLN1XsLEFR5$iRNQfM4pIc#gEyKA<B><=D{v5Su4pT^Zq=1{Hqvm^!Q3Wgra){
z&5pQoJ=U6v-@bjILb<&%$%<x;C$cbwGr&$#aQL2OM+gWDoBu@u{+Z$%AIY*Vj)0s+
z##ri%s{wu$0b|GU%U^9|Cxfo#1~9wc&3b;8G`pBSv)RbwaW=A&lnG%O5jiH^O8pZ5
zrM=h~hw0fd+5$d9T+)6vAR(S<dS?J{Mei~5ESxXM1D<^TW_Ez>M`983V#UXE>$K;A
z)33$FW$ys%{7k5?-d&JQ+qPXI!EsNOC^4$qY~5=f=};{!SE{TB1&p+|X^Z<m?c|LQ
zepXk>`)^BWJ3lpb%Ua;eu-k8rR5)_IPG_{Lngrmbhvfs-^f`oGc#4omkK(nB8`K+X
zZeA>?%Rh+d5%sq%+nzlC{M{hs_Ls&+IX^!U)GX3pCzOQ4_;Ygb^;F)1qylS7azW{K
zrctRC%y}TE+v5$bmI6naJj6drqV#RL9C1x2vt#s->lC9OVQ*)3?R&WqDBws!o!lx7
zqdW>_`&)+u@}`r(3fl>Zk&nGYDOolKE<U%_q#3pUL+)Q^Ru8_b?r#4uxgd&oY3->@
z|7eJ7eu_<DKfrkh#|I9#v&d2z(GD=GoSIG@Lrsg<G#BP6#|Hk)?fu@Xqo&tY{Lu(-
zq1Y?Q%)LBk>`{UU6`TG{R7AW^{B~@NcTrxQdI2lK=C_as?tvIsnpeak$y=U`yw;l@
z=32kJa{2uO_gkJX3fiOTX=0_^pnS%pu+Nn6`@P?u3gCLi*UP#eQEGCFDycZMlz8B8
z>-?zE8*4d@9W}Y^F`&Z#PsMzl;o+a~QKEg`PML9g_r;=2^L}SP8P!EEh1eKiN8l_y
z>5wZSlXlNMV*!kx%M5R5{4R41o1H7^dp>1P)D88F0x=K~R=DNbc=A&T|E>2gT0c<-
z9A+z=)K=B3f1f<q*nkFxA@PHEHKOZgMURi+zZEn$a_hOnL@r!fa`5l70Bsm*Y$eAR
zk<`K*bW4e|vpMf;gc5YFs@?ju3s7J+p5`vA_f5PU8kMF>hw9Oa#}e?omE@(z&Its^
zEcSJ(EYXs>_KP=ZM|=OdI?#7qz^I~d1^6<XT@rZCCEGt)uE|LT8O$~Zjfhs%mYLn`
zAWf6(Qtj=E6FFmDW-QnG`K|OzgB^=Di^xD+9Jd0`V&_+z>v;-*t+7_WYBq{V9VFi&
zbC>Y=W~wW5kdOv!j6aKXYHN|fCSAM2pDnWi@4pL_n0G?#`QzAm5-j~r^Y&W*;Y>jz
z@u`{lTbDU@wL&Ud5cmhOI6GM)9+l8`a~V0gb?hv#;-+4r*~vt^OV+4l;0flwKgNc=
z%lLD()%?lhk7=1+I}x!#nlXjU=;@=dcdcLRg2HgbFDD2g=7st78i`CXAE$$NYQllY
zTjB>ByM~koT$NIc88OBh)lt-r={>70jylhszj?Q^yu9HMe4t|D=u9@<V%IL}G}$CI
zaJh8Xq|k+O+SE`pPx5qjet5ld^(gx0V%7j|4*RdP@+)N=Bnkd~=C(n}xLq}{8rV}_
zx?HKRLy|z1i@<5NN|!B}Lq`f6!#grEIy6=SThXN|D;_rr!0-9^er$FQmfd2ikXw15
z9OmqQ@9Rv{?b(yT#fr%-A+JkIRyE3@A?W~{{Ia?U{i|^ou;PJI)L~m6a?qj7MGxSi
z2RVBe1{Y%`^11+>FsbTpWVM`NoLyf3Z|C{i=Os(+7j3t_2(6mtXV`$tiZsT$9Y$vk
zjbwfl(Vf5oT}j0FI7D{cL|r?@HO@wf)A!g!M!%N#VqPEZ-Kc3o7SDON3H`Qf$y0Wq
z5GiC9pg>B&HKVD5^4|?II$A0V)YpxY;V5ykci7QZb1CB;9!`Klv(wYr&#u5%tFho5
z@e92EjR`)P|4Ok}J~P`?y<Q|FB24}~ar#n(QX{^GBIE90f}O0aWSU;mT|ur;jF4N%
zXWImRTvrdyuOK*NVSZNY^F}FGE~38(Ayv+cNt>Bn)rLl?02!J+t*PYh4iPLi%FS0I
zVHM+(R#1r8pn0tAp@b+Z>bJegrXweFn@=jYdG}wLebY1)rYt?YmL3cdIkt10UN*Rv
zy%7;b&rj4&Q(m|KD4#|Dsf_&34y%QlcXO*i!)O=XcX!-B4>Z}E&nwa39Q5963*IsQ
zFHYiT^UtD(Ki9UvE4b5&PcKjK<o-(#b8H`#DzBiqQP-(qc@FzQ7L_+-o$rHuZD{}N
zP|c1&vdw9l=R8%(o!YPNwSVzO|1&GT{~q{_$XX2Nt-zIRX&?7GpM06Jvc_qK|LOj(
zhq+rRxBi>?=>BA>IGnTWu;b41&(5zJ{@2d`XEez&ew^#6^Y}l}0A&4&9V9$B?;*7L
z&+mZvzWxtc_1ACyuQNUW&rwhR|0Xo)d;%g#(*c}s7(2wR5y-{G=;+wgGSEMK8gYKh
zdgpGK&T*A{2gg}Ljwb0m+MXM?wbdV24=ne#lu+|qaBbgTN_4GF0TBGf!l$R%yG~@k
zYVLv1c;PthPuriIVg5s%zW6-!(j=3`-ZJchp|>k<+IF3cZcn_f?0sGS==vXtJ*dfs
zEQ~hFyxYEdz<<7$17zEX+wsq>$BlX%MvM3%x_LEM1NuyB{a*t-ZxDxyP8Xl}lKrMe
z-)t(XDvUMYrd2um)xUmiF*Y)r-y84ZaMDi&$+PO~t+q#t0)7HQKb9eh5O4G;f|r}W
zFoSa@CvlVw!UAf0?W^s5UDf6AXa?T~1z0{NDVeTQ=+o0?rsGqBWY1-CHeW|&A<D~5
z!C5ZGW-M?{onqsBflWbH9v+QweCom0tAMe&xuThwXF-P+G<xoeMeIpKC-Oi_%ua&9
z+nO#pSnX%wV^i7`bky`z{&*>o6+<JQL+C=IV<UI*z5yF6>WCuiOvQj*7s|_^wXtS9
z;!xI9+}Y{RVPu+3yv!z0_C-F<^`(G-O|?UpcXe$wv!I|ZHVNl&%>}hKz;iU2#@hF1
z3=NopHL}%RlXgfzAR+6@$!fa;5`ct)FC}Q>;^J+5Usptf9U7le$0{n4+M>^+Q_X7B
zA)}+}Wg(cy8(Zre4}v<F=xv97<u}+I=yr54b)%^@ymYF)sa+CVD}gb$KZMWU+~P27
z(B9j7Ku%6M__jH~^V~B(EIceBKCR%0VtZ(qco~vcGfqM!<}cqB3wSWKz!&cXo#u)<
zqLNu9cr>npf{IF?!O<>FjP2Z2*0c9Y3@T0S$e`{oMMh^v^Gw9%N7svb9|YwW*c1jX
zlb0A)>h`d(_#Y)ESKF-Og}$IwfUt3xPzXAW4_mQW){0rO=FE!u6z(rg^WsPY8nZn}
zx}Yc9uaom!QLaATn#s=2ZgpD}-ojg6T3PRjmCVn}vkZMl3$=a;&N7?7DR6;TR!T>a
zaE*u?)v$;$$;HXP94k;=n;2@h7+@EitCS&T)9Xvj<jg`Lrcx{Y4~|g$!ou8rK3JeP
z@L7)?L7igL+=M0VySF8%CTjI`5}AIR-p67`@$=L0g!p*2sE!;!jZ_l8&u!H7RMuAa
zcdnI^OFprQn*dZ>hFE~?bnsabyDp-aP1j*IOhr*yIichA*qMio=T<juMrr1$hd3>d
z#xgOY+W&`WzK!)pclMS4S?DS;PRzBjS+jSuu6D8Zb<>Nx7`LkV%|<DZfjS(dX`M%v
zO@aA<*mK^0qi??M?JFuR)#iR!r&&1*D>)G_)Ip$FSws}FUjZdL4LPGm)3Y;dd&fdO
zV4&HXK{^Z6=hBs?7)p`CR1X^lg_GHvPf2Bk>Mt}Q?`|Ohb4CD`JMuhu-f@l*<m*z8
zJK*PH26~ucQLKyD!Yg1WiNd+oT^4WWR{_oe-NgWhwl6fA;JrDy-#D8+)AY<~)*IQM
zAyV?T^_G>j75w$Gp-CYkF~IVrQXg9aWMg=}S8M;`%CO!)TC`DAU{LaNYwNY!)ekmi
z&5oimax!X7uQ8TH4&$8szCIvlEcCEZWs^ic55d5&x3^UfeEm+h)Hv6RVkd53HRB;O
zKOL;eH0$=;H^Y3qiRr1VatYE9es1B<vyI-&qN1{4wDrS$+PXFJXLm#G9gFpxO*2V=
z3ezj*LJXIt(!1ECaL1w7XP0vd7;lD#hI;$b3bL9HwT_OCSc$NDfwQ5twQ@W><*a7)
zhCMwjq9%Td+3NaisA3HxrKo4lwT~YCsy=GL1hS~#+RU#rW;LpMnURqp3O@XD37&rs
zMDlf1j)G|rEzPQh%pV(@>pII3wTTG{c>uN)!n38QsGJPH*v738lWcf;CEk`OBjGfF
zF(@S=5S7x+;^K?vN(0@aupdU9;7%(`E4bs{>uSG8W9zOn`IUD-Kbu?0s7lWAtqf~&
zjAv&}iVz|NGfjT}yK!Bk09ZYL^11ETBMyU>7d8mQ(!qs}wstWwn{H7)Tp=Ta$|7hI
z?QLak?XX#Mp`+!YdUQm(r08LHl<Fnwb6oW_?gKAx#nN$|m-7i8LqS88{gGiJe(Ct&
z;HT6y@~$xwLAw3(rj7M=jmex2MP)_m6ZfyZV-DKJyk<c`;zfv(vB=0o!)p7I22nAq
zO2aDeE5>MFS<emhA}c$qhQfN_S&&5YS)X24v_-X1VrmksstP9qFySBde8BPF5L#S(
z{NFlfV`WW9Ow7*^n;UI<_kuW)bMQmJ%O9j^L=-dyUKE18s|WCnr3WH8IaTJ1%?;{$
z1qU_*-HUGXonrSvUx`VgbPh<_2P=q|Bj2})2HGO`53sAuN18V_(TTLu%gb!lpB=Nk
zkw>s!gpU#vGY|p}D7E__!;<mx`72D!d)i+6?*Ug+?cB~dHkO_|yRB~M575W{UQ2_w
z5;^rh51(N#9o6aR>G3VdT*}kuJpWqOGH3dO%cyX4l)vY3(E40rQmKx^I14SU6jKb%
zNO$)qLB|F?0B_hRvEa=mIuIfX(lP-2t9|!0{emPqI=W(;>w=-?`U#nt*(no`$57+z
z-=VBf$P69b#^$DuwQh~Bu@^1`qLm1R<`XX?wDq)dD`cXo9c4misAKKz`&op9%qRqB
z5db~|LB%efH0a@{CD)fd);<p}1MmR=)kUa~6zH_Rdfxix2B?$#6o10PVgPInMHMxz
z<w_>c@^UE$l*@4H(Pp*l(4(W16Bafxxe$z?sYaXROK?__Ua>Y9oK3z-0l;#HQZfQr
zrYAl-+uPf#3%Eq^^)Pmk=vmF{9E&R39$pS+qe8wh9eX=xO*Nm|DMKS!={wlTwx&|q
zkK!Q=(fA<$W@-U}Q8peP1sR5XUb?vQAiDN;VC(dLyw{hM%qGSr2O#yv3zHJX)npc6
zo;F2@DxU$sQjgZM<*0vS8BmAqK~s|j)8a@IjEpX{V~);E@sfP2(JuxzqCV7Ma%000
zH7DL^$&)v>IS&8)MIM-K#iS!E3&)6B@V>=wCx+ci!otO6#->}Ut+V_95ZfCuMnO(h
zU`mQg1A{}-m^xk??-f^)?e>P3SxwYor|<79udJ{=#<9QIt_8MLqM3SOaXKY22}$4U
z@}fM;7E#fuQPea7uxOxAG|0x){MkYd-|j3Ds|0C-h>VOJo9x{w27zXPsv?h}u4(q#
z+GCT#5)H301r8&P>;lO%GAo<a-h~gErq|CV(Rk?TMkT<N?d|RL{^-+lTfco42u}Q-
zRkpA|d)8!1Ohn8gAg;1%dJVT9$QY|wCT0bH(Z(j_&@e&!RaaN9{pz^)+Y2c~C>S<4
z)>l+iCPEVhXhCakjbrM;QW#f30Jhn`Ku=EAyNhNNm53ndFi+SxIaLw5Wm<uOO*xbr
zz1`nOu?Hj(i*m8HmbUtc@!}Bxni=naI7rffokO&H;R!3!#KhmCstq<>FtoB-wYPPB
zesR*2pA^+`{*#bK$JF#~MTL7pQe5`oM|Sk=RZ<MK1bs4_N?NTW#nnzmjrYH#-f3NJ
z-`dupsHix;zM+VouGGpu=sO%`k}OG|c_xyZ!OCLtBx|66E*zhwcYU-5@N)0oJq5xD
z)`$0*xg^a1M;9EN<$i#jkd&F-8_(o<xJ$ddyzFo^b@W?p{l|FK{6g@Wm2_3T49$Q5
z1Xd=ewybr{qhExhK`NlZos-SiTDf82;rzi1ZRKbZgwCW$J-~PA8W|yu(O@MBq?%H4
zBI0wQehCNZ*xWcPVhqO6m&8Ix2UIogUdG>C!DyucJ|h|jck>X+ik@!vn7PjRr3ok~
z+9V_;<+1o_cW$a4U@OJ!97ZiK0DX!EG$@ASZ|i9&z%$R63KBEY5J^!g?gt)JpgErV
zpkLwxugf1^hv2k-0<edfh?3O%)|=_p#3UH7ZrsCTcuBsDiAl!B_IASjDk_ITN#Gej
zdjXwdH5HY;i$ff}mzO6q1A~l#fdQ7>3;%tTM0SxDleUhZA`1(A{~}N{&^3$xcW~6f
zkPoO>lbL8V)5Ik@nu?UBEc-1Ohz^x~Dbyi>5@m*loUEdZvH+0@vs3lQ>tC~7CMN0z
z=CPV}6$g~xa?*!@67|N;jut(oF8b_Xs>)9v&I0fK-aj5XmFgBWQLJlFqMJ`&(9Mkl
zUHCAQNl+gi4&ck{$znk-lZJa58meC2p3EF{zo)YAJ5w=@7WMm8cklc{U~s-HbK4n^
zVUP8qq}wf!kLrsTkR*0f#;h(xO$~{cXB8WZhyex$1}i70hM|AMBf>{xvy<@gaW+X5
zc4VXuZ}3}sX_~8vu4<bBBhdhuBHWJC0H<$apl)uC8u$2wjUoSKYv<=JpA*kpAYZ+x
z$Vg2osknQX_eSSY1#_9yW8>)nO`{E#%I5WOPl$`lpUZRx!mtE-FCQ;fq2SEkUXmgN
zLONFB+FU?z9I&8}$jH^$>+Sieoc?}1FE1ZEOUsv0N$g`0i(RAY!RC%zs#$97($bMB
zFvsx-B+6f$T0lskupTZY5o^<Sef<^;#ILLpoKmHS{C7j7aB*>2AK(9-3qZ1g45N4F
z1fE}Q&~bH8?Yakfa0MLs?&#?3<7NkvBOL%Odia-*effn4%K+>Vp~N$(Q%Ahw&TE)D
z*g&sq6!ql;h|R*rq?i5bIThMzW6L()vM9IN<m><|At6u^jpnHQ{m&M41nu@3&dx2D
zU}QsMBR{`zAzuO8@_|3#A?frCQ#CcUyZ|p@S|&GZl40*+r(rPpl*716pUQ-)Vdrf)
zF<!Xm<pco-yAgX<m)J31|GH;UL8DSf2SbsFkFL#tc5HM|Ib2Fwx}BGuL`Eco&>s8V
zZF?s>T?eJguzUA9Ym9KBG__1B9*2ctVveCuZ0%nPOR8lKNE28XIt0#@#Tj&rYQ;Y<
zWYbnx*K~{YJkBLGKSa6|12Gl^l9+7vWaOi*udTfE$p;|jO8E3ic$?g_s+LVmObyV2
zmRry5dO6wA>FJW|>z4A(<g~Q1utU*$icOoVmRlfaykN1&lV^?Mk-IeXz=0|%Z_G;B
zTXkqk4GjPr-5Y1I(sC>3fKulf#w`YV`rg<2L>e#OK7X~9(ue#gmaef6u){_s`3+O#
zla7KN-@0NQ#GXB)4}S8*#`BQgLM$N@3O^nf^lK)(X?xW`!f9Nb)Ar~H|8V_#{_?|1
z31Azk0@w*;%=S`~PwS1>fzqOT{cz`9P;O3oWK@)%qj9>#NhlZV6d7G@R|#GC@uu9(
z9tM!mF%x0MWF*_Ay2s)GnM}5JvXW}87LdrYe_lKC=l?~m3uL4?7O%e$fZj}ZIM=dq
z(wa&j^|F;JC+|tq`aN@&>H5~ojbl)(dsccbzPLZ|D$44R_PV}u%)ZH0iZ_2L&k{Hc
zTb|`c9cxonMjxKHi7d8!3~6VaWY0y;r^4XsFR|X2Zq*twq<MMX11VsIsf4y^TJAR?
zi<5>VIFi9OF)6Tjv@D+Rk9}7LIMTebT{vNRuTJlwHj6Bjp;&jy4w)2^lvd4~1F3n5
zW8|-Y3aqmAILB{|50Qmp(6T)1S?0;R8JG(S?{Q0rpt<orIb8apkmi%<QBlERBeLM=
z_%S*fpD<SGYwO$AH#}h&VArouy7X8K964b;i;r|QRz@PLRm0ngjl(NVVt%lZo!B_t
zB!<44stW#<t7iinLx-MvIzB3DDtbthP}uUmcm++&;wyc*J~(SV@A}lxGhA$C!Wha_
zi{m3#aww`t>ptzvYfT+uvnS78g7L}dP~KuSD!fp7B0TeHyappC>L^^7c-4Y(6~fuY
zY(!;9Sddp$73I?mRf63MdHP2a;lXq%vPqtzD#DafBQd6ptdZ~LX+9SgrmU<mHUu=x
zxlf`>4OQ0*H96}q6<4@KX_A#}&e*ts{6zK1!*9#(<)>{?es=2z+Pb<LT1HVQkH9Z)
zii%D#z*tz=emfaH9v4(T<|+5mgi2TOkvAOFWfX6GFV}duD-4WNW!-w1pT^yXDwB+S
z7nrNDB8-*U^KD*hxZv1He)!_PTM>~=>*k4hiUBXY9o>M@qx~prAYAhC*Zj?w78>d8
zq@4yMUn^9wqN3o4=^xi&{>mekpOH6f)i<sbHjocpduJG3Rc@&;9;*${wm5D8X4>&_
zo5*UMMrLOta+Yu3ii%QvOi6C3ZO5`8f5*g!kcIhYn-XV!anrl}0ZBP8CyoPBPSHJ{
z=dEaOEfcN2ydJH3^zljD4{P5Au7NM^XMJ_E&f3Y|-+8!kFuKZFI9X^P<MHzIKWEGk
zSvfrmk-B{Q)u@b!_WeVfk9R<kNdF>O2Kwm-j}QLu3LHWo{`x4q(Cv>G6MrGLcOlZ;
zEPi>W78|F?2rnv?@%vGv$O&IeO4ifny=XeSBW*6|6qtToqZPc-l2d_|HQ$x3r=p-;
zYr`bd!SFqkFE+A>)o~hDSNHz7hC~x?5rU|k#JM;@9UR=8^Qd4R6XJxXh`G=bGN~;`
zBIb_H)!0Sx@cJL%$G&qv<ZK3Hy;mbL?2WYETX-YfwQ%Zv<!a%S!YJN6`|Ru+O@wT2
zLJX_;{z^GQ?2X4hCW;kHd#<^=-Os|aNz_x$Nf0e`I$LDS6l9g%3xPV5sEt3kAbLdR
zYh@EQc7cO;L`~DN;?w>fi*(0iE4zr<XEv}9^{bwPtVS?xEw!yfYt~rMq##FK#)rid
zf3oQdM|;Jr!6T-WAbHf!urunFk8T=biWbc4yTJ%8N6}Y!n5Q}<@TQ%oTv|oOzMdcQ
zDZQ!r&$5VFey!(*S1{+fp5X`y^`8umTJ3Em;ZTtOSlq`sBU<_B;dW(=pHD!n^ER`v
zeWTeRb3TbS?yZ{@c@)#8vFL2HT=?6%cDBY!_W6u_uB5)*DG_SU%+zT;%r)|gv#TP0
zF}zgEm;qg2yUn@#He>fK2wuH(l6W;u()03NKG!oj#(0ruRzLU9KAcbs!ibkx_1m3k
zu5Vx%7j_PIetisC5*~tYpMU6&5!8ee0eZ;KsdyCp<hh<6>Cwfp`<|5^w_PuP#Uy*S
z`kfKc7J{fQ3-$mj79f%S7M8Q-EL3To)%EUxw9;wQ`d#bK2eBHZY>T@)IfPX0bQ#1i
z?$dsoUOP&q>2gz79mdZJ06l5l(CKrd3|D9NY;y>fTd1oUSln~n%U6<Tj0ap$bN;?z
zlFZ2U)z)lvl!6v)e=qo)NIE3sB^K5fb_$gdX09^J>zo6FMr#HqrxI;#9i+2=Se(^5
z_CUGaqJPCYziC!hA1CtNU2Lw;&I8-IU2D}BWt3-Vk6)g|tKWhSfeH0zHbr={aI!v)
z|KU1<i_*~5Riu9eL%O`dQ29}8gp8YOkpRJzGhbTCh;VTkxlQ@adA3)lBW+t*Ty7+h
zXX0J~CmG%%pvXup5tn0jdxs)8y!0ta?8xx2Vo*ESdwWX!sUM}=RW-_W;g?0$BUj3B
zvzy}^S;ydh2By?&^~)p3K*+;i=vU<TCX3&qR%IAYnit*cUDr7^;q*P)@1Api==1V~
zNR|ymSwSC%y^Z^U=R<jlacg*;pP?Z$O%~<0dw#j*oKfuNsr@zQ*K6wmXxFIS$V9ml
z+XV&1hpxqfCVZKW4%;GG8=KYLiWL|m`v|NYOcR;&%8>Ex^G`2%6?PnCxqxY1UrUUa
zn)ZDC339GY`}ONlr&{{Ax(9^ad3?61f`K@{A@0cBkg?o0!SZRuG+I{dvKj5YDy#F|
zny+hCp5V7A=|}h#3N+W2srdy+nG5X5scXYM8C9M_zvXYPHQ4x4)H0piIEn@-H&UOR
zDbW9_n>}}>Sl#m-YL>uh2dM@&Ym#d~#Z*_@Se(5n`PX8_K9c3TUDFFuVF+$+;Pz+b
z@dZ>xWxsDH+#E6H>1%%%A=>N~F5Nu2IL!V4F(WKV%3tfKm-Jc9+&w>PbJcGWmHFy*
zXq4R;Io|H&F*^foBD9*5Gj*f$%g<1BR{@;)rzWk%yRLk7a^2kqVm$m~IRF>hB18|m
zHj7Jy)?%lXnNcYko4-;Io;|B&;G@*nC;EPuKPgP--fKbAt8cd;GPH>ZYrhY7v`y!r
z9l7nJ_N02>-75ptl3sS2D<(#T5_&a+IgX=jM1mn{cNe`JDKe)L%nhTXuo*XnzgCFf
z3tRt<0m;f9@qz$0i1vI62ZXpEiGD5^-@17G(|cKVUxSzdRa&5#5+)@jMN2&}<ZXiT
z;#Z+dRJ1T4nVqfQoYz?A<(>50o(et{dWT(W#FVJ`nugk{ps?<J{Q(KLOY52ATwNyM
zOY^kgwZ_NmBQ`wz!e7IV9J=!~`Gsg+ogW$`Bz$~rHcOq<vP&r-P@Mu(sQV21kxzgp
z6Qk6{uhJkYD$k^qH_T$3irrVE?>^QD<YH~Wzx-f<cZLiwjT^$8R$+uVcitr=CFR2F
zr`3v5DS}usRPkez*xx^V(L_aw$qU7CuBDg_s>^~Jo4Jp7zs^_2ii&)2devUpQO|!&
z;WhF1SpZy_u@ADjDPbbdHp;pd{l=<;OQW^4iOW!OqCuXrZ`yL6ayBfvvm$^~MecVS
zeb9kQR=Py;jD*Do%@7z2<Ie3{j?>mFTOyyC-_x0Qs-(yp7~Ea#lzniDF&aZHy<X@}
z!fB|?8~18{u3;h0-)VZ~>$NQZkI=Xu4{<_SI5<`t(wsY256sCKYpiHIaQIcE-H2Wf
z**a&LobTQbmy0nI&0vgABpZAG@Ru=!^wyiZcL`c<T;}u625Dicnco_YC`4R8LnHTj
z=YmlIXA^SU`Ppugp`mvZh)6L$+p{W}8G0L@F>0|`D`&|X8Hr(pgt+)Orf%39bE2!9
zi;z=4giHo5=1)v1G0DmQHB?T{i&A5EGMc(oPH*&FyAM$2zISgfkW6hwes{j#Bp@u=
z#O*9UT_|8hj2E=)dFM7*7B#MJqVwQ6FuV$;kvyYXkfO}qxI7hLR~qajxn*U^c6`<Z
zEFT_5;}Kwm?$tF@+B(vU3cqyQV{VJP->QDUkC%m+{V7zLmtXiz*CG&Nr29ToWzBd>
zPSv)U`CJ8UD>^mJ%+B71_8u8a#O{xJiIpN@K@wk}+`+A^TNW7l_N@fvbmz_iA?t@X
zi-V|Xp=Uv@6nhsK`ZYnyud?h?s%j{6>is04o?+#SOb=saBV;inWW$-6ndu+n@eQId
zvEH}ewX7r~icu~o_&Gt6WIjAMr)Z@!w#zf<l>rz45bD{wxU|<e5y@odRHW*&D{>Rj
zYpOpFcG=k-Rs5HOp-ogA8Xs?4f{%Q7Pj}~?)Qjwgt?HkWl0puTj(_bb(^Ez~#0oE0
zj+f5^wR5lS?6jlz)J;uWfBppZXhYt<1@o&=I4`y>aA&K3s_`WfzGf6(y1fe1byy=W
zp0pX9bMct;6e3j~H{h1Caft6Zb;qqO{ZcadYRnV^DYraTPtBO%;OL_Jfstw@dL#ly
zX?Vf$LLwBJYcg7$>S)WqyMbkDvhc1)v^tj`r_bC65ixSR5fA$E5(@yR{TGRQ%jAxd
zE$K<P0n;iC;f|Noq2>8qG2Ju|n6-)FzOz^ko48XK?5;Bjq=m{wuH~bNp&zAn;(CjR
zID4N0aket;rJ*U6y%N7DCP$4RS-aNLdG3KIJ-tFs-4ylgg&|^7{WNQ%N(DE0t2y5c
z79Gya$@J_7`F!Exih_=%<&65To%M^JtdZJ^ZLCj_;()1VwDIErIW03g>(j?L3n!S&
zoMfTX=<EU}p{J(Ik6?fjT<PE0Elqh5g=TJZ@~h+mVW~E&0%JU^pfL(sx$-dP{ewI2
z-nY?Sf+Yc$#Ql-|?%h!3ES{)Ik5vLHa?3s#ZdGk9M*Y{e$X?c8d%}DAA03bHv~RX3
z+8K}?VO;SGjsu1v$4_Fl_bci_cza5cOn?LSQNX4vWm+hyvAmZKZqTGL&GRRZ>4{=q
zS@HpKh_()jn|m<UX;L^f#K}OoCUj942N##%e%SB1!|Z_Q;lVw+JcxoJo<O_+uX6e`
zsH@cb=iD!!Jk9|U3KOxKcxC#2Un(phB9;@b`SCnhQlL5)2$gnyn?JI0w67lU2y;JY
z7k#PW_8?cW&oyA?W!3GccS9#&GDZ&Acz9M;%M&pnR)+1UYq_bqSIo-vn$P>PtsxP2
z?uLebPZbZBA!Uax4@!AE&nwCWIFn==n|TvYIwtu(I}U5TZf>6S^z!<o2m>VkdVEXd
z@Z{(bRU|}_ezM#N8`P>^S~BL9@rwEVz)nG)B@wV?+qpk<<N)x?11#CffVt?M6`62N
z$h#L$GoJuBOcQcjZ(%_qik%}o6;37|2E+(27Puv{Tux7qPL2QtKqu}1>!hROx(&j_
z3<vP{@C2WmTe#cj<|@3aNiukjl2P^XfI5F~r;1VCP0PT;$8SYelUu4jZWOVSn0@)=
z5_EC#5?}5#>-#86yx(qL)@o~6P*l=*tzh0xMBrsE&9D7F&K2bKg@$S1dg9_-Sy9oF
z1x-ux4*0_Ctc?_@sONKHx4JHj*XWJfYn+!9Y;tV9tOqI>ydIsNpj3QndJ+mKsH?VZ
z=Y+2)FC`>=?wsg89lY?%o0EAjF;aHJuh$qVl(wSpVQ^#ry}KVN#KM63!2uAPA%%{h
z1O>7i&+o?#G&G9Y*-rd_#JzP`)Z6w4jDaW;2BEZqprnUxP(-A22qmPuJ4fJv($WIb
zJ;2Z%BOr{l<j~y%3|&LKn{(rL&wKCR?>o-}`V8OLvG!V@^;v7Ljf^TN)*py5zxnzJ
z%u;G$l;{JWt&y4Cr#}|jm35VyyR}v#<rrVQVAq#urFj40{d0EqxznU+`SWQ(7poZ=
zR}H2rL&t-9o7tuKl-wK_Zk!%GqS+B=lN7$s1@KxL$e?Du32rj==1u=sBx$g!xUupT
zgE6vVR&QS?#(+L_Zd%Db`N8cI?>G3t+t~U*{{m9U=7pW84x`nzwa3%N17!{}U7at>
zZi!6kXrIH;_MXB{6-z!3<&#B|AD8zUw(V+^$VY3f^gK(JDYfom43>)JbUAY)ku=ux
zruIDA(C-{FZ$<tnV+DNqScCbEumkMvfR=sWfHzB$8%~fxAj5B-;#y5t(6sXV{Comt
zeO~S{)uWHt5jO3rMP^_yKba?~p+eiy2g=o+<V9?sZwI`VOXLhoC|~(1!}y$qB{#D%
z*&i?I1vRAofN!d$-5H%X(HNZqXE*3i&$I?%mJ9JuhmCB{W#;a5DN!irvk$CNRg6s;
z43IlM`e5=$=15x&XIRUp*Y@wSnJmh)JLnPuc(7&K{E%GOJ41lga@eDfSPvaaCX9@!
z-JAkiK6SRg_3wQ`Ip{ZE(g7(Kkt=f8*@d%yND5b=933Ivyg>#WrBv+C%pEP>4Pmou
zWDl9Ok+?V%j0{{vkLme}RN9^1nNSUA{YtD|3uqAh&NtFF-yxXnPloG5ie|X>sp^J#
zYjfF?4UM%&C94z_f3PZ%epA`Z$k8;WvnjW7Wxs6z&20m??%5OE&)NwCkMRbtPq@6J
z{MyN{NyOCi;fm4#fOli8F`A{=hdg$8IE%~NB$Y4T=&8cs#oO_T_jkzOeB*YfDDO>@
zWll&dD+_kwlx}Tm1o8*3*|i|5q%!F1rQRr-=;1-l{Q6T`)ox9Yr_%7Z8aWWmpDtF*
z-N}8~vhXXdBzC@Y`2}fseD1>yX-4l6>R~b9wSbWIYo>$2(rdMEo?QzLmVNj0VXe5I
zQ{q4{KBIi1+@SAyD2>Q1oUW=D?Cd7{-%2Z}1iu(h$R&w}eB|$r3qxOf#N{H)Wzw5b
zo%`AP1E2kYGhdl&A3G7X)l(Iv9~pPq8IpT{Pfq@sV}7&s33GvEULjn+JyeT?j7$ZH
z2*6-zMUAnt>>pab^&3hvJor;W|0yo+5DJrxKpe4-!0)~Og5a&P?pH#@Cyu`d_kVk-
z|A5P+r@hJ=DBr=65gC2u`yuQ#UtsGLKx*@;MFKA_bk~oD?-K;@U$%t&!4^$ui6YR$
z(wZ;4jSYQzkKpRH`D%W(d*r)tw9U`#dd+eN+yuEmfDOvl2($)2cEq9p?vR*DnF3e?
z&`X&xin<AZ)Ji5G9gH|%36mM_#+dVm%WFvC8McS#!g2yhyL3%j>xO~W-5nS}5MWlq
z_-5KO+Ctpj*Y>O?OTud%$yIbdIMyHxxHNPfsh#Ux2SCHk%^$Dy=!ir0hyiSHVwTp>
zz;j$-%YxMfDrXz3si{-5uduh^;%eIu^IX{@`vJ8ah178C2`Vc=aZJG$;q51X@?&?O
z;%pCx6Nq3fl6Y;62bYUX84yy?1Ih*(V!iXubFaA&;;7z+3ssj&o?$PLA-eQo=H%%9
z;}@7J*O{NBvg}J-so1BV-@jL^Q2>xn&wZ8BYtAusxf!qN2a}jL?bybznSFK+4#SpX
zd)C^G2RJF%u8^}ugb(lhNkim>ZGFJRwD&_Cd3ikdtpM_qu07eA5l%G?$ZmEYG%=fl
zW5u5q%QUcNDE81NH6qEjE)MFw>eF=CgXy1Q6Z3hSIOKcp=}flj3m`5gCtV!&V^IaS
zZ{FB<axHgn6{G-^m=MuKwe?(-Y`h&t>Vj<6?(-$PM_qjazdL97O1FeENn#SQU0SK{
zrm?ZMdrxJzPRUm{)^^xZaf$wv#p}rHaKs#0dnj#$@%tDJHM>1F%(Yptb&fhv|D|<1
zs~lHX;xNlM#I_fE!=j3X1ybqWi&?+B96&zD8n(~4gm;i)IJJ@31|Qn~=2xel5k5#V
zQgZv|0CFzpQ#lg4$S-@l1LUK1#t(fEsX@`XoSwa2E||$zj)T{%kaNM7@TEFRTaf<C
z`FjT0PH3rByFtp@Wz@AS{~g*KFo}&sv~jWlNmT~q76_0Uahj(<x6I-7b0#LqeDU3L
zN!)#6Snjn7IT1RbXg3WXY1P%&Y1Uhi?vW?=J&V_S<z)ATEs0kJQ5|mP^rdd9{sl+z
z{0-XE=U?v*%v3rGfo&64C@Fz!<u?i?^}+KlNPA$_JA9x4cErv7@M8cP!UWZO>uEW;
zV}^QfCX1WXYM8+2=Nkhmk3RzkR87+mxWQla^!1Y3QsO4foOL9K`m-c{7ON=dJy0wl
zxk(b}#|$v&nGG(|e$?TI5nW1BMv3KbMoYoZ{%7HA{yBbqMIXnAb}AfYkY%+KPv<`a
zpZa(^R#Y8joX@V_uT!%3ReHoJkG=n!@<Ik(+Ad(RP(25B^LDM@-;Ad>O&RSN@x^Vv
zbaG4mfMtz({Q0X!bM}j$DpK?(f_T>RFFq^ZC!Gj9AiPDcI1D`Gkh1Gyl<Vbcqy>cq
ze5;SX-WGrIryElc(y&msG&FJi57xR_19;uChk=B@@cGa~VAUQdGMHKAnQqkFnRLW_
zzGemr*9`|7J2_SSVY1-{)>;?}18O~812J^Xi(OgP_ch;uk&e4)60Ncsdt-PmN|u87
zCL6Eaw*R>NC4s-^*7?hD<LGzoT8$0@mKw!i2aK()WoAZMl+e~GWy{>{Y~v+u$Mdl7
ze5N$@uGTB$m*=+&)L9qK+Q@gC>cu`Sjjq-uS%2W|pLD`Wo#|`2%f9H1xugq*{YDM8
zwjWq(9`1(V05`O?du+u%Ej<!U{nszZnytX3I|1}&i84>`uO#GN{A|v-@g`GZl|6YE
z%~E^mTb#~}e}z$w;MAINl6s27XjGwoKQ6h$g(P{jLe6M|{fz@>=Qp1Ndrb}iTs>wn
zOk_;rEzAWN{4RULw}22b!^B%%x;~L{5_smemd5AfJfW%l35av7Vmg|gh?}>>YD_i{
zCgbk@Of^TYwL*^Ka5MYl$F&jo`?l}ydjgjkAwkG_tFa2-(?(}jO}A?rjcCF4uq5%P
zk+Za-2OJ+K_74uqKn8h=`}&<6g``({w#0xTeva?8j}d#tK$4zSK{8SGNXX@SPY?E_
zr`V@CCVwzN#FWNY{N4oTSgkkZM72#$K|PxAs6MAEE`QK;VBFzLrKS3ugZboXiqLII
zd?nHVpcA+h&x69N)Ng&hwd2+RhEd3i;Ba_Tf3W@FTTaI*PeC3GW-{$3sqM8EHMf8o
z8@u2gN(08H`YGl*PG8{V*;=3bwlBwa09LeBg9hv|MS<Li<pRCh)YOYJ<tiJYW??>C
zWqalOnEC|~;oU*K2Al1f0+F-U%f(3AOCi1~JL(9Li=XBzm+$4}ds;KE`|Q$+g-;Yk
zxY5|Qb6l=e3!m(l%RLecALBCJZYFL>b$9Gr5Slq}etgHvt0-#Tpy@22ZyU*!vK)|&
zjQDlCHa`=3ghfhc4|we~L6^ITeN^A3Q@&Ecd;P^V*tbbk*Jp>rYNqT?{MOT0nZX%U
zi1#u4JVi%nb#47A3BK=je}8``XD1w@P3xjF5U)^lGW!DWRUjDr*aIWou(cA0KtiqJ
zfIFtL#+_rlgVsAINbMsg8VwgY6PL8?dSca;Q&dSx^||@t8UVJhJwLFBB4{CyQftbt
zvem8iN9>-Qk5?3-$0C7|Z7;(W9#2+4#Y#0}AS}9ohO#tT+uATPGsA%lyDeptk5w7p
zEn@3ocyMQ=NVntS)Z2`m5HkELOLhN%Iyuk{S;baCO~dnouje-%HTC8v3BbQ#M|$?1
zoYU*4Gy^+ke}B4pc?Z}+76SUN_2I73dRIQ(;?GvRT_2%hJ8o1+dw#n#UpLmyU}M==
zzM9-`0SX69cBALf_Af;raa%<`kUTjW*Rf1UX=Z8u%C-Ivd4J!FS5a?YcyI<$h>Akv
zZzbk@<*^R`bNVsaqa1%Jjpz9JHEOt69((tk*a`J(>+jh_j|v{;;10J@kjoE8B)vyn
zlj~~6T#NVM-cV50-_j&s=;8^CB62Qb%x$UAe}3n)2P-T@r(wm5!@N}=`zo5$vn;E;
z#H~!hII?}3H)IKL^9geT=7}AJtxYZ=0*2NmBkMKXt07~;l`ka5aUx`2sfJT`6m!@-
zlobkPhdk1O4UAh3PTO)dQJI9f*iW00_s6uxF-9-jBwT3Rell$i;zeV?QdJq#+f2#x
zCV}u+h#PnCb*GA|-N`}4GW|^pCAx%nzXt~;7?X1<m*@}|!+q}<=bc=g4IXLzB}!8b
z*QzJv8I#iy2-=CNvFhsZ=H_d!(_e;LFU;cL=a>V;Bx`GXv|R~xbtPx-X)A`Fa5?E9
zmj+6$ldKrIsMM+rvFZTN-oELN^E_Avs>cC*l)S_cvCAFe&)7#K1NkK0-i<~?#BL?-
zgC2=VL_oI%Av0M$Pb90X$Nt=M$$TyYPfgEF(qs)Z*17_`vmeDS#Bq@Js~EJ92}v3u
zTD=@mN|QSC*<2I&>M!|!AGv5`gMg2(aYCV9eQ<lA2f##e#)HGd&A=ZIU-;cW-FbO_
z2;Igz0}W!O$j+^&oTyq`HCxybyP!e3tgf2THXRmEls^Q|)PF=kMD!p6Iaca@H#vC|
z@aJ-aR9pUiCY%KON0A*y$3QbFfPpQpj<;fLwS^uyI!%RjEnTN7)ABcZ+WY%AU6-y_
z=OJ5?eXY)@x()LJd`rPD96D>$ddkDiz0q@i_{-$VZV0(pJ&To#l3{ee7|Ye7)D7C|
z73qlYF)_2Xq$dlR8|D2>#0!|kz`$E;n;kuZW{Hd}@raf!9ii=-2pYtN??rVHQ2<~~
z1s>oWGkPz4m|61Wm_~|0cWI9=TEzsH8$Z7MAwfLZw*d8sGVam{d*?&WuKRO#_V=CK
zbCF0R&W>nBk*@4`ZMX4jpMo>&+Lq#h6J0T9Y={gsL2cCy%D83`5D=8v50dfPjJG^g
z1j?}K%zX=i;SxTsv`yJ*xLiKR7>@-H7gM?&S||#iuo`D))|^)*9gk%q1eS!Ir+UP^
z=2H{+%{k}nzJXL#UAM!>DL(6?9!>Kw%WsH}pstu=Zc%mshxdivwA!ibj3WoRo`*p5
z$FJ#TF~%GmOT?F+>q{boy2V8B`ehlM6z}0LU+E!uAc)EcXg^@9%`8QBDh@&3DghC}
z_wB|v!o5{&NfBBjWBm%I$Q{M0$78JZu3P-(S{@$ZqCBHhu{rhVMaBCRfv*f(cQUDo
z2?;@D^H}ghLIsso&eF%d?WpiWUO-yv@}*uh>7HU0ghjrZCjwHXe};+g9-yeddjnw_
z?ziXHymSM>4NY#3mZMvzr;ps^Jx3-c<iV=w#|;h?h23m@$*)|KWbAJ&)Y|Oz^={`N
zcR=CNJ=rl$-Ub&HT=^dBBwngA#{D;rd=FYglZ@J5dG?&*r{&|D)5p^;=Jc~|vuJy#
z;OVH2P8pP-f9K=kD8Pqm?1IBJG>+T4q?z#{CSfD_FGgyN)8j{cLLT$5dl>Q-td}vA
zQ;`oHzn6R!+m(afu;{==L&z-Tz^!huq#is1B?t4o^?vKuy11zyo*42I{G?`2oip*O
zZ69Iu&w)=4b(XiUYI936r8tbb2V+?P6rCI%_xLh0BEjUvks#wxa)N`GL`-$0%SvI~
z2RAHk*29cL(-0B97twlyO^4%9yX~LnH*Fak14o0PAvjL1E(23z0%vuVOZSS{n5Hl3
zm-w{+vaMhBc}PzJ^!Pp7&%hGZ=??cczGsfj@R<q1_%hAg16(1(TZF1TgmPKWq)Z7#
zxHnoN(ZtL{4kZa>iHeESb_s$)tLwt=b3lfY8>AZByqptllltzb-qE8Ult&A9fra49
zvuy~*`DIe>bxt{B0$iH29_$QX!{nyR-fVo+k3?rmxAg=>sHRUelcE9oFnXpCT2wUQ
z^0KZfQ0ckbdv`JH<lTn-P6rT_<2pN99PCRjHIGx2vel#2J3p=dCTxDadfH*(aZdr@
zmT>o$x4RuVn%1mS+_xkF3;)8V8XKnJL#=n#DIu3A9KchczP)$p^Jlv6tE~ns&2o1V
zwA<s#kuE8@y~NgC;k3kqayXZ-F7ciqYdSe7E&~LNP3?immGh9RJ<YXd<O~{?*k3t}
zWTOod1ey+v-Rm!o3HpMM7;g%%+kH)y!Yd}~sF0AgWlt-9nWm)t4ULa~5`L?e@L71{
zS1LI{C!+DZ;Mdkc!86ga92=^?bza`9?w|^HXCYwL=56iy6F<z`nsFkD@YaJ|UR*op
z()LBz<f=!~A!S1aS67wEjR|`oe!XWqf=dC++4`}ME7@&yXZ6+Iq&FxTk31qTe@lz~
zZt1J1ubt5TO(l$w@K%>AWkZQb>+%go;5g(A*C-87{_>|m^_wA;^<&dCo+Jl2jp!df
zBKL`C@nsJob=K0IjLgnP8E7ZG-F&|G{^39R_*X~2>$vk>PeUy%!C2T$d%ur83T%cA
zTaMM3cTGgO3P#Vd5PSRELie6{U{Bo+Ysrr{OhucU<H&r@CuSndbeqLLRaGrW?Fb~q
zjZCUFFBu#EG}uw?^&C!g&TZICsIa>bX0P8c@bL6>pYJeT#q^2jk4{V=FE8foZ4jGe
zG|>5cuDHw4$AhD))fhJUXA|Qi3a+lis*Lg<k#LdR7)<Kr{-Y@!omRyl8$bZiitIYF
z*K8Y9*-UdcOt;QpiEw7H5vdJp6{fw3J+nV>#7^c~aW<wa{RQ?`9H%GU3qRCTd?NPV
zU+7LofN~k@$~N`q3NkWM{lonoiay6LGm)Wl?Mb%T6@@V%<IU&H|FgNxxY+lLsiVoR
zws^KDk8Z!@=Kcx7bjM45m)Mx_8{Zp<HXahV6Vj^!Re2>Sx9!<Oa;R=S<gI*0k=epr
zWvxo3a>tz%u>M2`?+F;JV8Lq3q0ar}J)D&?^z$10qn3lB9~TDFay03!`3asms7{;6
zeU=0f%Fhma3*jt-CDh=O=uW|yKS2t@aIJpIRhGf7h}L1{O3qKnrxLw%y0ovI%YL`Q
z_`b36%|dRUkO-byYbU-&ev((MZp)G(H<aNybbZh5`r&3HzG?vvHzfT>@yGZ0hO#;p
z2V)!NJfDv;pXaGSVm=fLt^cO2ek>>c>c`@#J576Rt=W3nBbo6~`~o%df_;Lqp%}B9
z!uU7t;7R&MeGN8V$&uq_8%5dlkDLDC8LIFj^Sf8>)d>7G4B*#1L019J52L4mep*hc
z#~5{0ZQi_bt;y?hB{0`Ntlc0KUX)&v_DN3A@ondCAO+`o9$##|&abg+R(u5}6MD89
zdB2Y-S>YkUt?2<?-3anea?9Nt*fFPz4BScRjA^+0c_{+BxQzxDcX!Scqd)NT&JZ7W
zN+jo-mM4OO5kym7{qvGB@v$j(Q@wB-^&%}|@daNKX68;sf#Z$gS3Wp9ngoN~_UK%(
z3>_&?%5u}bbXc7U{|}{C=?grn@7}iD7d@VfD79}Dtg=zCJE%nM<@b4!6{C=W&X%L4
zQ}gS9iU_Zt0g`a4aqSzItte{Bjq5|TmBH|?v0Q>^GT@tXA=A;r&ENVfaT->pY9r~S
z%833sMDtb-I8+b2Nn8$(@iK^<&4x(bzi~0PNUCaA98So=m9{NzO;8#dsXQL{WMyVL
z{>JszSgO2*r`%aM@s+o%-+9rCU_W+I599lI^@MR4xoEi@{?R+-7E>E$>dy&a-L$#C
z$CE#rF3<(8%_ri%veBGv+w_|$d^b35I_Kh>qX}}h#&~Grx(^3OZMM%wxTpLXeXd-M
z{>sVyW?h|TjQX#v=${K(124ofWre%*Am(fS{h1_haoOsyoM#SebdAcMZipMY0k+Vz
z7^C+2{IEOfsBFDTS}hlz+_f#mh1W|u<vQPT!cavoq2Ghy6#IwBQKk6y+|x%0x*t>I
z?rsK#fiGh5DIOEKWR;Y^IK4&9sCM#AE&n&=;5%dL_E41{onPgim6Yc%pu6Qj#1Y`c
zr_cXGy!fMcnQ=<8E5jnZ&Qe#mwjS%MzZVQf)M6F3+%QFMlQf6JeA}zVKCNY>1-+UA
zg7tOB{fa)F8cqA|HgyNRTv1Y<o_MOc3huHKa^GGOeS&LHr+R^ePmrwCTFg<m?Poa6
z!NFp<n-#QCV2kt7?757m6qDgwR7xq1_(YZRaHc9)|IGU9Lmf=osETkROOPp&m=mTc
z(5vTl-&g!od;9620(vYr#i+Su?_}3s&vcAM76jaYLJjE?bM5x1o9KaF9OD)0)?Np3
z4kQ;yGHbd_wk<Ore2c><pYqz+czB!F5E&*l$W3$EiV-M5*t@-H9<(nLeE$6Q4FZH{
zw6ooxedHjALo-L3^JG~nkOVlX_|?b|)C!WDFe-?@q1UYNYB}<Y1RLt_Z<?5$qr?Xz
zJfTXEcVWc_6}zJBsRHCvo5pFy(Xq0qovu`28BX-J(Ye_WW$|)G2>aaS6zSW_jE3!Y
ziPqtN<pRv_6`pi)8n8`;Cj1dQJ}VZzX45`~C2_8I*$t5xJf5AI?2+Oqcne7@Z}im@
z+c70-EQ@rVap00=)RmXFXUoywjzFy%u~Ac+S&jB7(wx}h%jT-CHN|}E|FizWJc<zS
zxSG~0wX@R!UxMOQYKv6d>SWcuri%mGvW@=6EEoBe8%L8pcpS_~b?yx-zYz&Mw=+e>
z{xT1iprQWHG=tZ~+U1EO<+HxnK*L8*t%u|i-c?=>W^MWxPVRCGeR%%}GuhIByTXNC
z2JHM-`NO?|kPq$~Oc4WPTwij1@FjO-YrHy0yWt04gO0Xa?s9A|a&ZjwPb2#d)$RIp
zs<~XZyvP)9gDemAoTQah&&Wy+cvkh!Z+lR5<So1AVCU2OpTLk$+axXno?cFvtnF8h
z{a3UzP=`;UFGxtpqUC=6G8%XutZjt<<Y(=&_801o0vz$#Z?g#c;>%6!<xW*ReG*Tv
z*=bU>%}Uh=+nkJsjc@n+6qbXZ#g`hR4;yH&ZO0|qe4O*EoxEK792gHakkD4^>Rpz0
zff#MN(spQUL9y)>Od>$|ob~l>AE&0ew{F5<%%c^?*e|gQBGog15@%ApJW<R|R_7Lc
z1G2cfg)zc5HEZhH>|scsOQEQspY~{)I~O;;5E1HYE!%C=iVKrv??dR%tj8*Xl5Cn{
zp%?L3&C~h4vy&a9LVW{Mfj<Jf+K6+um>WS;J@pRNeBY*+JOU}jn-qs5{^Y0Lv#F)_
z72+uA+;`6&@;|`2TF2i7s(h2zgyueIeT1IVMF{Vzg0x^ebHI#{h=?@4!4t$Wn<QVV
zF4fh2Uq@2?0i?W)PYA7UtUm>Dj|#q?R<YyzB8@tLQ~8*ONOiWvTivo^y0?AfUI^0M
z-27SpOkXg0YMMPt@a&pcl{Ky7ji+QG<%o3Orm9A(nTjY)VFgP|p$@n2v|?w)ZjHKX
z5Xn8#^!V<!3M&eV!#{r4rb1Xw&1^;f>|2p|)m$Mrzw?HD#`l^PcJ=nBpwM<#1XhQ&
zE2F3=dFTfYVLjG;ySe!5I|egrYx+pJ-PK|PVeF*dZ)b9R`~<7GiwhsY_ep@tY!rI9
z?^?aR+r^QdPXYdLct|uZG~QuCc*sYvy_$(gJI-gauXnb4q?=oxizk#Itl#*&_43M|
z+Hyc3cm8b28jzM#Z7VI}n|^oo?FxgW@-|+J5tn({*!~!MIDn}S?K~UZ8br$y%E^VI
zT(5nCvRx&pG@ahMxLBZF0(90nAu3hKl$V@MG_}ur&bK<<Nh&sm#BrE3p0?Tx*{icL
zU49fZ5ZZ2_rmjX+zOf@F($W2R@Y_pKzRtN~pL4|iA&uukW?{vCP1=vhid-Ao)H7s+
zvZm6HAG5Pb<0fj-K$V3VyG}LTf6<D(oO>7pQQy@Qqe)0ZK`%B|9*cOe>NQ~Wp1)w9
z8=1f?E>GAXE96p+7ie`|g(d`>T7P-;5T-8YoS$Kjnp&-^XPT2;r&^a};fp~?TEnp}
zS;bl)Y&WpZehP2v{%uLMV`Q>NRgX?la~(!e?d>2lVM}Zd>+dgNHC~piY4yqIX|S~x
zh@StR5wmMKNFt`mI+<;bjx)EjTdi>(fls_VgAVJ3+sfViwEyuz7x&uQW_E7%QPvub
zz7`qS)gxju_TvYg<p^BJL`y<~o{%YljgBJZeps9C(P-A7`Qet(x$-e!2mTNE$SKXZ
zk=j>z9bv_cCP)7`Yn}1l&qDWn6Ai;_a-&7_S$RwdWfj+)so2!by*KIo#(U-pD1qH3
zAsewJ?xW3u+`PQ6<B98Y#uEL{Q!h5o&jxcvyQXN5Z(nhpe@tA3(%&(3!>62czhfwa
z+6@YI5Rsl3_q{<;==KfQQiCOmsA4XPA0O<oAb+~USk~>LHdm|mEOt;qagcktH@R<o
zm}PBq%>*+o;<aE&{JR1X39>PFD+ciZm-SHZqQw;8<{8}CX=3k~kI$Vt*OCi~C@6?a
ziZ6~S1?jc^1$BW4{$HnGf0t#z*`^v98*cN~JHy{w{*I&<^Z9la#~RF00O?!GA5?py
zy4dF&FLz%w?Ns88R@R40-d%BQb;fvn=*ojJ<0U56UUm@^UN!<>FlO-Ed`?lLe_;Fh
zq&>C>+332hbhlLVtY*n|%3Y}51tVai_9OBTJtNW{)A6wOU>QAQH&u`EB8J40o-IQa
zq3aW9+lGWB+uB${^vEt+3hL9j=GHhpcZseeg2@OvZYcHlr>9SP=1QZcS}*lL;+mrH
z#UtI+h{s%}sp@sb8h9YBv;%;L2rb?yXAmm-kFNf6rhnGH&A|~xR#sK*1SvwpJ-3>o
z(bLX6AT10&cQlYdaBvVd>3XS=H&GuE;d87jdKl>$gj`_#P`Vx(+fhXguiJHiYdSo}
z*?D8|bMJVq&!?jradMB6&sKkIHyYD*fYNQ<(Be9tJU8_@UAVZob`{|JD*_0E$@m<G
z=tawAhifrW;qSWAV($$PlX`6?&Cq-JeQK+oPqZvv>8;;NuRXy*j}*?&%Z4k|HrQ-h
z5YWrMBr%xJ>JYav64P<zGM)AUgtU&3D?zSfAD-yXyHNLVJ(1HHAlD#FXunCw7u90|
zl2$42k|p<rj+Kv9*i7B&+Ew@3<w^ycdtyL{*5KpEPb~1dKquz5>AI=h*sZa=8-JIK
ze-2l#<O7^<c=J0q#oQN7agM!2Eno#8eLkbUe&ySN&uO`oy;TyJH`)%0TVSz?(J!4K
zkQdxkA8QtOWy2F-u2~yX8hx%!*q5p2-Awtg!o?t_)F(TM;;nZ4-PF}Hona_M2p-~4
zJ~5Ff+)Y3stFkB=OM5H`rbEH!@K-B0H_fcLDpwtx%fyLrp*>u`4OZ#3<EwWstW7}(
z;W~4u>r|a|=ONQxKC~9|?Fd9be><$wfeiln1?Xjv;dDLb0uMBIc7Yb0mjtqctVZZ8
zjRH~b$DT9Gt+eMtHjSG^TXie`4D<FQ#lS<oSeQ|)pX?JpFx_&S!MYz=W7<~s`cX4T
z+l!U>D-z9R+}0H&0R9ijvS(Mp!Aonf=>EdEv2*&CyOGxGasrr?yJt%Q+4PB9`LKNW
zywJ=kg=^ELglYYAr2D0g>-O|fLlRKYAFQU!Hba*WnuMlXN#pf8Ws6y9b$K1zxpB5G
zcO`O@FRo;2pXoBPrRDlMCB+y50iqcOKuv|x^2NcoY?}6_Xq9GuyLGzX^qW(i=6yu)
z(!_~v7(zN)B_V4``I)FP3gj;e$nT7N4-$O$Gt;(4K}(Ww%?bZ6dv8*LSRt&w50Gf0
zIGx-}6|k4Vz^sHAS=HwfdDT<<;D%|OfHuGj++cX9Y}F;W;10WR@G^ws(s{2wD?Co;
zO34M6?V!X_Q!yH0In}w*a?C;2sV6QvS3_kQn31QZ7N%c<NHXGl%l=aWWUkjTsA&2j
z(ez2~d!_UVPU~SB9(Q+5ZvMcCoY8{<nY}K51Xfm5?UW)ehsN03?*K{(fqY58+}YLV
zrpv}w<aD%8fOk$^nrk!jB@5w=WEVX=O8vO?eeV&U3jql4YgYA~5+)r_U5d5`$|p%N
zq^goeFkLAxoqBYyg}_Ck^#7d1F9>z_hd85?XNPi_Q}s8Qw!$)_W4l-};ft!pa*G~8
z;EuY!-TP%!R;W|$TZ!iV9=UHl@fI@76;<_1tSVIZ+5eD>C%b>X5eY(g*di<5K81eY
z?DI#WnI18}L5VEx-J7<lj{lf<FO|csD#YQv)08~jqbHA;f32*G4|6wM&^6oXi(+nt
zFQnBs-lUfVLX+FGxpP05WZt2+&qeuWTK&{b|NDM(w}Yb)ra?Ki+qix==>wk2TwHX!
zD*YDGZE?Ry#!agi`IEk_M*DP(z0<tknK<3<iEc4Sfxi#-z7R;>{`2W`_cNTY;AX7<
zpY)P>Fqf(j$4(N-SFWBxBqS>bu4JFOvLC)+eLJ{YEjlf&U6-Vk_euQacht%ka1UO~
zpLc_QS2Vc$9PtNd%pV><f6m;2KE?U1Er=_bod*kC;5V)3&y{nm{X#;raOk#Ir-RZK
zOn>*};^S?e2n3X;N0ROlBgu<5T?;dzR}VP-LJ~LYu!M1qyOGCE9@!Gr0FQ8eAn$me
z{dvm@ar&~oConp`O5Z-_;fO?F{YWI+(0dPqNmjg5<CY!nhm$OwAJk>!y}_5UtKY-=
z_`xhd-;d8;aq~d?oFyT50ztCkUAwgt&+zBLetqP0kJw-LEu7=v#&^l{t9;oy{Jz<8
zTO6YHdhV&$#Yf$0ba2K!{z=SN%*vNOat0FiUs%VVOZE>#N#+mTzyv6SEbxPQS=~CD
zCdR!Tcz65M-+hWW;(D_kLvincu;}>fP=dkkWoKm3BLT3V>sz}Pn?G{l*}dCy4xUdU
zgyyGcIKVhW^ht@j3a{4TrE+`{yZ5GJ;ao|FLN}EbN*9AlW%-3Wc+)SvTqkre$5Z(D
zhQ79qt1h%^`UQxoaNPSzetYRaUTJu#1&qS&KX*Nx3?-F+JNM1aqR)zme|eh*yvD8J
z)&Mtvc@ux`$JBQNN2iVw`cjVW>&mNF?Bq0hIMA0-sB6T}@of~h(j&btdi`2`7w7yM
zwi<kx>}#FQn(v9~(E(}Md{<9Hu;Pr91~aRf(tSOb(LfyKT&w-&X@EO|HZf=Ax{~m8
z!`2rtyS_>{ZBtUwu$;L)ygN5r+^P9<VfZu=pTYZKOSUh={QSX8g_0~fdpAUzv6D9|
zUDGY5WlM#&()xU1a)qy81TtV2Tg{k{kK$7hSl^7aS|`7`chi&IpsWMzK0bK{daGgU
zwK61D{ADN-E!DGt(S!8gzS(+wy}9AS>IEzqxk7)BAbn}i=Q(~Ne1Z7wu4v6#@@oOA
zdzhWmz0*1wWwkS3Iv?nBgugcgHv$zQcJHn03dScbZ?*2TIGq<iooK`h8TL`uRP+sT
zfYn>w8k)|CHl&&{#GprZOVY{TJsX9uc5qReBUJZ%GWYvTrdM#f%3g+UjIjsu<2@Q9
zs4`C3H29xpVkhUj=UiN{gUO9{g~Qw3LIRlRB`EKns1&#>PJh;+zMorTIzs%(wD3e#
zDeFV}q{v9R|HP^PLNsr51&*%c%$b#pzPq*1Fosj%_yiN#Ek?U5Dm5A|&l@c|(e^*k
z5719v_#|)8+yv74mRmnwR_y!@iSgpa?u)CNXLiY+D&Ca1KVSNUb@0b8lb7;_Zw8!k
zx;Q&b21ny~o|>l+5)oAy>l+QyyIA$!LSCET?VxxZYIUJNx;6XhYTJz#J>o_(X~3Ma
z&|cz=ICBG|fl-%>6f{npHGm{9!_Gvd9A>mmyx}+at73&v<tLtSA^uv&MLW-LsPC;S
z?-&EDSB~DbRvAb3WR(`4S))iDu=+-vAeiNVXxNRmo_O?@j6S*(qg+Uh*H!k*D9W^+
zAr~~G>er&X_o5GW+=%mqRhQR?UA28WkLEWx-RX?qH-3M~vYhxl;Y<&e@^JmN`PuxD
zQJMSMn1jQmW}Gx*G7@`j_W5X)q+s$-d#V`WEo>Qek{n~0b(6@-X~NvBW-W%5K1rmU
zE9tvFSi?qKz;mNQg3%Q)4!@CO93`K->mnBJ6-qV17~P9G{X~zAZ0TI^!NX&!rcL0^
ziR?kUFu1}^9nYKGv16S*491C!333$3Uevi#_=(k+dp#{s4PQI5N_LNZUT<Z_P7n`v
zUpk5;^?WqiyDX1I8+OkwS_{oDp89QQr>g57wX)P+4A8H5T<C@@9?~btBX0CM6CDW|
z?H!dnA8#j_fbr>=Ip;sCEn{QBV2W4R-^H9ET=N!vkhBQ_gB=q-ZqJUAo%t1J!_Yt7
zyGrRtZXb0$rxLB~DrgTiYoRH&eKYtFL{6juJkX(LGp&F0Pl1c9R)Y#O{eimvRdoM>
z`q^(S_Q?N~ukY>&P8M$lRi(Ij{uem9`mTXZ+mv~(EI5%XwgONI4;MWZ{<%UFk{4dQ
zx+95mI}|2<X9*oc_b(y8GN(Js)D|$>{{3D}oHV(x7a)%*`o*Vzu88^lJf9n>(A%G!
z6!Fxx;idm+e&>s#{(ap)p7#Z8;Z4R{bDlEwfo`3`8*R<FDOTHM{20WqPx2{0c%~fp
zOe@Z+Zxj+O7p9v3LIc*GuO`3~?Z60UFzz$@{@7GoL3_c7)4m~`T5I>+$m^h|&$jN~
zjv7U^4`8Ry;jSIDi0XdA>vUDOnsb^`VeDx$N7IgdA}f<0BzNa9|EObD_+O_CX#UqJ
z10}cq>r70-|2h+s;=fKA;Q6oed6xfm%D~8f9eeQqdxI|3c_s%ZpaqGyK1JM|cDx!U
zyM+O{-0wML(kkf%8Vaf&NuK|zr(Zq&yMw-9e46bpPSfcEW*U}rZMMpDd#7(V<*NGI
zr=)vT`vvF}!l<l{8)C}F>#NpV_x~~QIFlae(_8A5B2p?IEELoHHQtbke-w=Sfs{&I
ziy7RzgT;=@Ixw%;I!65Smf4XHT%<TspZg|ZIUFj#Rt_vLk2QwXSapv#Jo`j*SD<I}
z&ecmEDPDc?3xW=da1%b;Ahn<<1_+g;Ow@51@+=8@Fp-#;IC8lycHaAWvOZ~WtRFse
zin+|cI99xT&C#^K?wpKOyeNPc(%s(YfF2#$C}VIYE{<9~`mRdj{Xj-06|=;r#xg+K
zKa+B7_x<bm47*0r(-+kRnyj(=Na!v(xsRCJ>Mz28%QHHqM@EZv(}}k)rK9LIQqV65
zxOUHHU0h8e@^T7m^)r|0Ep!cAj^wh>&JR!s`@0?i5#PBhQ!B^+cJ_kZLz^+O9|=}A
zkcs-ldQ6?F{|(9m*O_zQ^@CIAWHdj#<t07R9<<e_ky{>8pb^X1vAKo6Jy$n;t&ydH
zJntX1Ci##(#i};5dr?`PzsrNk18x3eU<=ecucmS0`f=;wmM7i;__1Bpo;gA}9V<pr
zAtNy-Fs90hDbm^2%H1)YqL;H2P}+aa%3uG?=C&e1WiT#nQ){aePafKMERM~iW|9BA
z<c#FWAB-LLVl=JW_O*Yj*NgLGIP%fos|9}iA1iEeW{ap$k`jyEkL7^A5)skqG~%S7
zemqE=dKskad0uoEc~*XTh^lOIO=fr@{UEl>I9W(6!xp-tMJ?cQ)?$x1>ARo|i%?u8
zzSy48a|**doG!RLqdoIJ??dbrJ8zwBe|FIO&lcl+s`SU1b*_W<HJ&aZjvzhV-K%Z!
zx`UHv4K7v_Ful|9c<24mJK$?w{=>!mC+ACg-VzCNc_}IEA_qSWPlut$ET@~7OVGig
zp?&Lyzt{Bq@m>mNTkyE4I2Ajv!o$f4n>la1)KMmpj!J>fJp+|#EM082G8WJWbqd7E
zkqn><+8Ap7!kN>anctv*(Lp0d>`34BZ<Bzt{m1+?7^7!^g((>ye}8ab>w|rLY}<it
zg~oI7@Dy28!VX79J`79#t(QL(L6IX=da-LVb8K)pm57hwN-+k)n>0Lne9V0GSLMim
zU%x#V$o8pNLRPk%Om@jqRP?sr+f|NCLT~gzel87aXum?&{h;T=sO-OQxcbiz#T(!S
z`X(+4NqBrZdplRss|a04eJ=fgjXjRrX}PfJqQ|yCO_%6Cqx93hJ$3;U@pWmHBK7){
z@V|?NsLO3|7#Jv%WL&OgJyEA66_H%Cz9SV*gF>8CLyvut-90@8`T1@gE1=K-427mh
z{njvd@%J*5qD_+%Q@)jp0knw1+7oy0laSPSw63s|5N4DeX%DaOiR}O5z{xc~Hy1m9
zxHXv`)RCXG&hfVn%`S9B@Ao~+ADyg&5n_ExV4G7isBsj3ID|d^;FrV^x@qbtPl+|p
zRVPApZ=Y#DAoZj$C&$!jYqEr4tpiD`T%?X-DQ?>FwN4Rqej6cn_;qW-yIQypVrv`G
zbX?wNHZjt3dU>(@fqcAPU0DqQU${2;%|JDTjC|R~^{)foKK3~HW%!L)M`f8EW(|yQ
z{HI>ECn>V<w7TY%4E7}PgoVkk%8FqHfIw#AU)NiB7=2nUc8M+sJZCp8RU5!uHq)+I
zid{;I?2RdQ9f`>Ib+0Shmi%qug89RfQel?<OJC2)@m61TH{m4<QU_cABv(4aqR64X
z{@R(^>e?o1bvt2VituW%G|MN?1eb1!a!A7nL<-#WS9y>Ab&Rx4qeXTDY_sTmXoC6O
zz@m$*KEz9V>+dsd=X?|AS)09AeSHb@yqojU=Uac90WT+TqErk+X-1w)eZ9SEMqUm>
z`gOX$U2;~jZR?6B%AE0Bg$lXbC?3Aj=`4CW_KQJ2radTwU#0U9)VBGL&$jAP94?Nh
z2i6uch_fB}!;|Rhk2cy@#%SoNG_%t|r@7337^CGdqSO<mV%M(gpJ)%maDIHw&#g{9
zqPn_i@+P022D=!GG%S>uMRzu}^y*9Sne`~eeTg%friVT$nwa|Qpe4<6GA^xT_CuK{
zUZ=_I<=1QfF-+5#h>PXV9wML^mYx40SxU{6&ec;s@mLDRtoyP^*Rdd8O-);G{j11t
z;j699t*a~k<5T1F|0-FG^ZzlxxpZ;;S99P}KK{AOLO0yCr0Dz9N#Y04wm&2N$fR^4
zjGjp)dX(pMzH2y5ZexpGa~eK)qaMy8-=GFLIU8OnSx67%q=qwRmgY?7O_enXG^tT-
z5AS?@Yl=`-sj$0&b_$qJE1p-^3tuqX2ne3v?Wz1nomx?vf7kZbRBch$U~|2RpQsbo
zsMJ`Blg7Y-&NHvag)K*nKptx?h0n+B?S{YKB3N`A*r?};EuQ-h9@_F&(ed$0+$|i(
z#+3J4cOaAVT|3kP*5SM7DGrS(4TaXds;P?H&_eit5^L>9wHKl|twTBSUB3aW)}IHG
z`=BF|lR70h$y>fr8KN!yjPui9gK;)Q|F8PbALZ@6q;#iaHS8Ao`TD=I)V|8}O)|%;
z4qM=N@Rkba&_Ns6ag`lBoF13%J33(|I);Ta$t<YV5~hrA-GxvQ#I)z1H3XcU`DL0l
z!Ls6D=NF-g)mR5@DQS^d%JD{dgqEr^r@FJSI(AyAR##E^MK(U&Uu~fKcMA>#ns5)|
zIJL}~vz1}oag*Xv^x2Uy?JVI(6v&ZhOeJ<Gm6sgySy?GE9rF7fmfA??#hR%2QnA6s
zT_me?hMKt%)RpommT|4^JVfQ$0jc%P_}Zh@8|^tMx{pE%>?Ne(^w09C`Y5EMmn`g9
z#OlpLIUH@xCs$oP?6KkX4jZb^hAq((13ZiTDv7!^>J%K#3gq5#Z|Yv=Vra>;!%8DG
z_WLRi%7+4z`m^WcP1{7c$7}6UQNOwbWd_w8*przA!YbGKWngQP-g}(kQNKpaN9Q`b
z1QXrbTD<j>n2?bshY+3}3;BRoifFM^jS26^&tlc~kn8JSbVKuFnwfoZBpRyLrvZK*
zDcvwH(UD2tI++y%$-DV=G+lY3sb)J)tjNlhXJO9zg;d4&ZJek1MDq(B(groVrh_Z*
zWy?@lwh_kR4Qlnb2blU!KKW~OX&?W4-4!w;I+W3HC7lQplB!w<Dydiw(iW0`<pPvy
zJ}g$Rv}=G*_brid*HS=BGuHUnVuw%c-dfj7v<<5*h~?*DcM8kU!Y7g%=20-tD6`dK
zGRWh`_LxcIicO-cbr5h(F?=PDmy`vyV!0FZY>!IOqLxVwIjt0MO;2|=PgjO#$3c?I
z7PWXBF1Xc~=dGC6mi}&xM&ngTZcU8Y@>E^#rH<B)0W7Xw;=qToz^a<*nTj!I|IaSc
zFoaU<f~<(5@vw_+o?I@|vwT=|P#{8Yq^0y-|NL+?<OV`pZFCDJ7@6l@&%x)UCipO)
zGw#<;cAU_S+Q(9NbDzpLX%7!k@(831)fL!Zo~KNPX4b@5EGH+n`)EUqf3^=3W?vL9
z4p)dMDcO-~sIsL5Ii2>AllDOEW<WP)C}DeZp;R-<+VYUmRat3Rg6q1BMSdB&q1|*~
zs0qfXn{TmW)BsZABq;f$E7#sEa6ov<5jo>dPx)9#PrVO(cylK`k|ds~KjVP29<fa<
zn776|Fz-){4w*dcp~+j@KouMNf4qA+wWG&ZAR7ORkjl7DyVkd8E)o;FM6O-ANwh>t
zMU8EBA2!__Z@Z@V0a}??N~=+62ZPHc6`Xoy<nzH~ECz-sT{z62J{wjW?$ETZ=j?QH
z!xs|fsf|i<<}OFT0-__S^eE1iB`Fbc8gS(Byt$OUfU-NkO3q+(lJlaQ%AmSlV`_)_
zvbg*pc0ITt|Gmqt_ec5PTgWtQdS81<$1o?#IvpP^$F70%g~i_jHMU#Et1*bRNc3uC
z&Q`$;3Y^Hto!w2O#+8T9%fNQxnG-#Va@K#!Gk_9#;PV6K0Bqr3f{BCU<;C9=gFYAn
z<)Xq@%;wu*?Rtg3cCz47T<Y1ZQ9#e)HdB;|q_psANVCN*$k?>^&*#4bQfPMZE^FRI
zFPYdhW^)IQP=%{at{0s9hL$y^EJsKQUKW*EaCMR5QltDrrWB20HV?Ty)8KnEFIe8c
z8`}A#gWAbX8&melEY?oV9Z1l;i$ptIm$E(z+<h(8R@5}UMMzRnL*bd8F^7qVP{3I7
zb9S$^G5*kr*<0x;zwpdH2nl!nx(7pn+)Cv+dK^4UKf2hex&R0SEEL&kyFcurmuD4U
zuy)Qfob!N`@1*kx>mdBX%z60S=xwbtoJ>KI!m}x3yuzAk!o}82{|x0VSTN%GI`*?+
z*m+jQn1nNr^_`*fpFI10r8pm|kg836(P$wupOCDoDPPbK1}QMuFb3)21`)NUzW8|8
zArHkoIFi%B%Y3~E$RYe!&$}r_lbD&+4#I4?q+I9JUgjB$<(yGg2a5H%t3IWvbX!jo
zP%(MtA;k$doWn%*l&u++yOPuEQhxpF(QU0lz)7m=nfI~5T|qgz#pWJR`7s+sYU1~K
zmAvtJ{P*%+-uu-h&{=eJ_2RU#;g+4cv+wkX4!<C2h<Z;6v*ca%KVBj_G>#r>EqMqJ
z=NQCFJ*^E3j*(`As3?-MWhDx!Xi&%|*eL236x1+7m>#+dOq`B(33VQNFxwEu%*=Lm
z2{03#uGffV%6ZZoVbvkZFO*eec`ghf4l0!xLI_ofJ=JNKWNaVOVw@(grXwCLg#TWf
z$NDY_w*Yf;7!S099Wk3{t&x`&dkN~9Udf%*9a5aXBoGIuFtMb^Ponl9rT5ZWOVw6A
zma(d5>l7%xs=DCihuM*Z14;A&MQ~1Pb$zP~Go9G}HxD^9NZA;o(7$%5YMUUMN!}R2
zLwjpG;e2D$H0|QtPP4*NT8Nq`EekxSv#4A=9ZX3By^{S8RJV|`e3J^qm{na(recl!
zZs&nV)q%O8*ytLcWul0h#_tW8@S!HA%(OQ1vzlFp@OcNIRxEe-;2CXieEON`1KHq=
zdOEOfQ{R^PG!De7OkEZJXE2QJ{3FL9q{iWH>z6XRx+DF-ctjVgL6T%Dw$=*Ebl?tX
zktM7M=`AmLOo91&C5$;UsI#?{_u{p#{Dc<-X#im|sKj(Um(LnIT2B;H*RV;>UEcIf
zbiJ7<q=v}pv<*k+IkAuY3?Yq`GQgG8D5}3c+}oDuRtoEMD=e8(wf~{<dy)CMpv4sF
zbE^aHQ*dhbbm5CIRmmkabZSR|xn<h=n~bzo*h#Na_FyD{O2Af(j#}Sgw~|i6><s}8
z>n%DwgoXfE@*<McQ=)R3)WSq>GZ7{4E!saB`K}xfy}K8VPwBlhf@Mw)sk9STS&_&G
z&4Y!ucOJSiTkWeQNvl1U9U7qi)m0Nr#-7rb)}}mIB~DFj6oc7|X&-3mcy|{`zceyI
z!D5B4mZT6nq(X}44e*|R2#Xjdk257t_(1`E#X8hn=SpC9t+bZ*N;6L0$bYzQTrV<A
z{!o6ipyPmnui~o#04@4IH0)n0d9(K~f|DSr;RQ|KV3%F)OCH|C8fqTjMbuE;7l|U7
z$tj%4DGio)l>c(`J`odM6aP#0aBvD3tsEhvi(Lm)!{iSFHd6+k0|>U{rc%}QTBUOA
z4Oj*x!w~)78l&ql!C12k&8&5!r+lHLRkfhAoi4R9PYcpkZlPwFkg>azUIK76iePRi
zLll+yoSM$TNhm`{Up-k>r$BIlx;`5Utuq?MXJrquv^q!}Qju@ahB!Y}&Kaz<a3z*~
z6;fdK2gF%exk=HYiA%|wSJ|B#NT#(F?<x89Nc{p9L}?`~zEm7l3F~|Io+=pF&;ln$
z^*&WCcXz1}Ci>XkHW}6Yy~xlrMvYogMKD;k>6s_3&ONkYF+<s>209|CG5}fNPv{J<
zJTUfTWmpRgKPWak-wQ#B9-1?=r37=Iyq8y?15yPRYlWNO2(ZVszG~DFz@&EUSas<X
zRK;6>{tt|4I%F?a!bZHAbG=d_Br%2``g^JK3%E)6MuM!3H@Gk4tMw{)AT;4iOFG63
z0=}8tRh>boawY|5PURD?!cIF@K6*&I0V&>%>_|15@S;wDpaGiMe_T<=9PtytR$W8S
zX@<HQc|OAh3V?+qq=q}*NQ*#J;$=*Mx?)RPIj-tWthx!3I;e`3sY?=Jg7>?2Sj`@5
zsIXCmRM=T91*l}3!?>fg6@v?0QdER;U#_y4IYW}m?AdJ&V@Fi8gGjiwE*W01=Yl1L
zI*WW4AfEzL4Jw;Tp~C;i;j!D1um5#M%MaKNXZV>PJqOFdrPnL-25tF5-guc5i16KU
zZD-N%$`g&g`2N0=zahp$%-tuFQs;UdA0#V&`YFlIWoP4iq1x;hm#8e!__#Nz6?c=g
zYnw9R$TEwWq}pwrL>cLfu3P(2K;q<=ezoY_MLv}uDFFQ~a>cW#rT=}l^KOAd9YP(l
zw4#IC13zv0s-dDslT;I>TWYHj13$K`n?F(`hp`IU7%BwA2kJ;-H9}u8w;6YE@H<G{
zMY;<}oe-w*o@x8IZ#}q+XcX&q>6r>0c#EguRW(#a-{H7uLIt4Xno4uSn_wAUcANGp
zQWs$<2bI`f$>2`%>gu0W<}OQ1)YiIsHFjfJi7+SjTC;9AhS7>zFg(m$TcHBLo4Dlk
zvy6hYj>vdtZfFBMtH~gt3F9$b`i6d{*lhnZY31gfVXTt^uz5?y)XA6iW$jRH2sTBj
zN9cN6NPZ`M>8^LmdU;TW3XEHtLzq}YB~_M$<Nd*pL##wVQ3`&N7qGlNK)T$e=s044
z{U5H_oL4h-XK1>Y)3ACt*#2Fh>>u>jcI2!7o3fyS{EPs#gA+BWi_PyAgXFf4DPvW#
z57YwWgQ9|myi5TE$Tj=p%Cc5<K`>kbOQ*tNB6|J%O^HBF@SYpIsyp2(kB@(E*16-<
z#_VFwD(M(BZviWxVKrYr2&oPZE68^cQJ2P8kSR*RLfI{jh2SG%(W>Yv-*K1EiSG3|
zc`@%E4vg8xa6_FW=Q(26@A7&0)3SzH3w_}G+7OBh_FcTT@X=M7ydC>YrQ(~J{jaJR
zw&Wm<ib&nOz?#!@Sq`<w4fo2NSe4d{L8%c{2+~l%7gyZTy9u0j-Igtjjee_QE!f`@
z^lqg|N2TI7f6o1nA8H#no$a&IR++<8pB<<jjHyL)poTnL^Wu`5T3iaq1a?#)_B;^p
zCy)nP$Gf{ntUVjW`88u2XdV4bl^pI98OR_O23*2d<<{hrK6-frC+`4E1(ptyQyOrM
zBUdyY#z|99GpvD~E*#*8oM;zz08k6dU*8ML0H!g%fZ{@TDZ?@PJ<KaLcci|Rl9K5!
z6}z3DsVVjkW%~)Dn*6T9u7MpHD@fD*+|1m{KFL{abWu78;$G0l2$M$w;1aXu1bG|+
z5hd5lJ6ygMo^di%h1uL=pKhvD-t4U%u`mi!6D9Am&zg@D2e!{xP{&#H8>DLDfocEi
zjD#Z#(4xd@Tmb9HUOHw0$6R+r`GKB@@JL0g22$cNY4tHo;pcmPEsY1#KE29`S-`@^
z$5&fx_^WH!8(Pk47wasje<6F&)1;xQEj`@CmkGdMz$RRfwqrC7VOYv+SPJA|_e67y
zU87c4&t~>{em1k!?jrdVdOYLU<L5I<a$v%oUGrK#6<bB*PWLS@8dC$^T4BemV$G>s
z@ip%NE`H*p2z;>~*JJ#N<hv6CBPQ0KYhH-lNkK*FDdyYDr^2;k?e?<652(k;YCd=s
zU3w{O9(~>o>2mzK<`lc>605TU1J2pmTfGB;yRi0QwQT(fcwNmirJOERYi4EdUcVSH
zarOd{^-rZx2X1e@^}6%Di^HJyjsOrb@qLZ5c#4b%yukm%+IxmI(XDNuC?YB%C<;gw
zQBXS4yA-8^(tD9!Lhl`<NJonF-lW$gK<HAWH)#nF5FtPa0qKNtM)!XA_w94OpXU!3
zS0<U6wVrlA_nKLE%BIs=s9Fg^9kKInx=aR0vd+EN!_qla<uxf-M9KMm;3wc#_;L81
z3~K$mP+x#j5XK2S&%T^`lYQkO?-~JOAEm9V5YK$kF`>PG^ct9cUX>I~-B$9fVdR6q
z24B$9!29<qP_6BjYb4MCWQZ92`Lbl!dl1KGYMol!OY0(lka$IIx1v$BPWvT+g`^)q
z62FP!SQzU7Rz%N}uiIoXCoGZb6!tait9a@QsQ9~m+TitjGn{3-O==GmryOUKH=R<j
z$KV}xApksC_^`R9GHI>Uyn0}_-j?Cg_7_fj!@`yq-j#D|aKh`46#$m!G`2mYpyFo`
zK`YP;{dTGZ${ILb=Jc>V3LiL72zM*Uo7MzO=Fak-m)`i0Y8Kty)}{~OeeL6<$I~VF
z0Q@OeX3p@gsRn*2FE%|#q<QGaQZWO=2XOV9HJNna5l~!t6#0Ya)6+HAb&TcsF0_<S
ziDR8@T#|nj0VVI045-Z&75wSnooPl|)&Bsu<5k_>6b;j%-e7RAI)Ke8%g}W(Ep#L0
zO+vM#Ig9{cEE7-CdGuO)(5)H39st(*0W{n|m!2}AGcB*L7%-s;PzXQ+@oEvkG1CCT
zO$mRB5N}Sl-E%8~_(}u2U{5J(vam8nZZ(tv%+#wz{th5MfGazXYY&aSHqETCFD(Ve
z<0{K2RQ(4h4`l}@3Tlp<2e+2ZR+Pu+AsqUEl&S8|UFT#r^S&CX-T(2iP<tT|5H^f7
z)BwW2cNR4SA#ep6@l1K%z^3ZvkM4lTZaFXxAkk;N1Bx5KFlgoi)Y*%FHJw#|(ze5R
z*JS(ukexr@tsbh#PwxFrW@1gL$O;BO6WrYBX@For<39&Czl1H1_6m3C0O)8fCn*Z&
zcj!3Cs;9)~(o`@HbUXEh>NRIupLhTcubALvj@+YM@bdhL*PFvr8EKp}SSWw}mAi5=
zUy7pk?a?+ycXMf=dr#TiP3A8x&x(6A+yppa24zLL=mYc**I+&Don-X>T2)e_9N?8y
zxiu^-%))`M3THm#t-9`)r>GI+6t24l_bm->4O`1ERThR|D{2Q)p<O97W#=kI=}@~2
zajjIeA~Vrr@WTBU)~;)n4Ipg_OGafj4b7K^Vt!yRUe1nXYoDT$O|^?|E2F-NUpkum
z#d0~T)|c;$<g;x3HkSe({^^&1^q`WNtOQ_|2Y-4=7Bde+Pra>?Um7$AYkbXW44N7$
ze1_$=5Cd(@E5r_mvtOCJ;P~nBs|@OLyGo4|v|=Gj3fx9vyQ8$)@R~{_>AfPu_;z|r
zUun4usyIoRpTjhhHB8Oz=`=u?tjEo2W-71s$Ek>;DpoIJfOxTa(W_ooz@zoj1$dIK
zcdD+wq1q>;@;BOFC_4_o%<e2RZe=CSE{2Q|(qKzAFX4<YDVc}UK1}Z{L}xUbOMG<U
z<k*Q1WkoL?6^cs|8`H|&Ud+rXK1vxy=<RRCl>=fi`_%y>DkGcY6RuS3DfV0nY8I8U
z)B<iV=k+G`>ePi@-`O5YRlJzDv@7)Bof$^B{|OocWj={S?RI2prYD=miB@GDYN$Ns
z`R`vg{&Uq!N7raY{PapoOEjzA+NKlnI<r)q?zJ$jS8UnrXq3RKqLOUQztsZ%t)Bn&
zi&AD3kr1u0i&`m?txV^)ea@`@Er{@EYnS$NxJE%8<Kr;(oN(qry%%o=RbW&?pWk!u
zxd*gXo~pS`yWjbn?Zj!~sIHLPE$zXYAi<;&V0{H|_x<D)qGn=+2MYVErCMv*eCj?(
z>}Qmy@z_UeuX`L=l9{RJy88>)WznHHr6EsKp;zJmy{#oclJxtrztHF!^twu-ObUYX
z%SdRx=+*Bl)l)qxRK?4Ozt0LK&hcplz1|^D)}f79%tfjg2U+;pFLl)|JL8L}C9i#U
zKt=na*wV_1)9>mBbI!O`&SkPe|8&QR_Ekkmsg*;yF|cK@&lZ{LmiFZ(=N9y;7rWKj
zCC$Q4a8LW>cWQuE?&$%u`_Er5FJM>uS6UC>_$*(Fopq0i7UGU~CIXK$JBlx`t%YG>
zy_sA8TB>d0o>uVIL=^tB$P&bdsgSsG^*Q()@hChs9B-(&K8{u*|7!Q@JaT;<<tUA<
zUQF#cAz1A2jEBSZIZrvtWr5YW@s7(j<m~Y9w#8`sVZd7IRn88~@i>DXW#?Y}q~w{!
ztH&TZnsGpiMIlAuO{ofrjRUX<YIcuasC;S3{Vd%|oEZr_4Sdw$kI=%jnnkGOiWlbd
zSaDEaCKXAE*OZhn^*XZ^mnjq{7QFbfDwvU(IW;~`y7DiKBK{Uw4Co|2leo!gtozrj
zPVQ6rHU6NSXPY}c+S)0BeciFtl>Ul_oLQgI&8dDAy`OiojmMXkWWb~GN5TlfW@Fl#
zTusXGRKHZFGM4Em{zI8^@L~>b&Sm`(n3G0#f7OpKK)ev6tKBw|v$IXaD{|b9+r*@L
z_WQH!c*}e=#3&dAO8T#noiztXU%C5Ij#jwZcbpl;9&*Ca<@}2aO>QlPTf#>LSq3T{
zQ|~paoKBC<vTwW3mi##GbinLjG{nlw3Y+EIf(yI#C4OAZ8NEMz#h~(cEVaUyLrL!L
z9>=y*>lhYZuv{WJ%F6w$=e}|ban0G`+;vJkhu5l(9Y(qW&cJfrpWoQc)FP{{ew+)*
zPfodx**p<;{qtl8aljL?%NSmsm&CM5C4C&1?QM!%)XDsJ6;kTm4h(I!votD<;98^#
zr%W~hdJ2h|=v0)xujCW<Va3fyaI^vQp<|r{t-MQ-VB&g7@qTIX=Y}JZzjxU5oK6r~
zV(3fXyzPlYddXX<(i=~s-~UIWGU)kxL!X^Y?aKh$g@7gR!!6LX-Q@PKtpnD$?NXPN
zRi?Mw|DrG#$TK1KSDjcb5j?L=qra<-e=qR&&vMHek+Q(SV(kCt5gnFWA-m!2u=oEV
z=KuYp|Nj=mzIQCr{`;ijM{rnCgg84NRG~(UIa|<e6-2z9;IWgEsN=H-n_M^mXCHT}
ztEpGG{S0#3`hIBaImK^d0&_bmay$F_aJr@=c6xG><IwTT%%P&hX`BX!VIVbNm?Pb?
z25n(?9iLT#m$!<?{4_JNljG=<-;NsS>yK3NGgmXy!kLw04Fdyb%9oTUA+YDjq<~#w
z-1eSTQMHkcmz>Xq>?D3}BQm~?mDX$j8+J9@*ZzXecWnJkt9tQCviRw98xI9xZ#gQF
zKn&_E_}~Op{2|Hh7wzWvOZ{2Zzk}PY`%gR8G;Lta!Yj{CtCrT*41o(5(H_ekwuH^N
z#a`=FMilKFk~_dY<Cz@$*q{x`jg+R)5`lbRcX@>8yw{mT)7f4Q=i)`sFTu$TU7`}9
zxMWtnG%oF?lcp608^2$0{GeWPn^{1-RcP}hn7X58+T$2M5vy<Q`T=<72~n%Nwf)(a
zZ?lv7NY@TVfcMc!2v5fz;+&n{_|E)_oG5-&MgA-SKM1o3%mUljrmIK#uuN=%j{xAf
zM3a}lU~0wppSp{ouV!FH5EY!g-`uFb#4MQm@q#HPN91|7kOS(_p`O2n8E?aTa-CMq
zt~5-}ahdyK+^srk#xe1gkj>SQ8rTKq>xMDVwzW-y>?D`;v43r*f-Exff#s44$rJxB
zSuacfhi_<_9wgG5s3?1rk<-)+p_yd~E3PL+fCqoEZIFk~298K2{eYWZYyQlmMgv&8
zqO5p?Wl%X*Uf$(!)!!J73nd6VYO}u9AqcT`;vpm7O|%!?fAcaG>RKI!h6^pCi;tId
z?j`y^+zyyqZ`0Gw##=4{P0}7K5V;Ordp!o-(`&9)p-(DdDk_`k<IbT)wFn?1(9<2g
z-L`9QHqT$;>`fvXarbcx&P4)&^`eeBoP5yv<BuYhV-d)9KRl<GTxa8@MYi+Tn@i(#
z^0qHai*|_B_yhgzy7w`s9&MRz`}!4D9~)856)Zj{81N&!13%wD(F2>!?}!Qtyw+7C
zqAzp-nRrkB_|UTSe-wk8=p>t-e*ZOxocbZ<Y|}#Hrl5|n<&7(v3~}3FC&In~zx^}@
zHB0J|lB+h$Zx^iJpZozo7A3mD9(K_k<KNzMSjj#Mh8T>st4yvzIGBfd;cnZivaHqg
z@|#g6i-2*LuGs{rl+SA2XlYd(Q#ZTohlDFzfZj~FIB{9u$eMH$+R&(hV-;6QSr7Z2
z*%H!5<Ykw@uCIhDereUrb1TlOQq7M1s#M4|XZWt95`~_O{K!wCeTP?A#jIWXfH@MR
zlo}E--x|hv(JAHJpssBoOaVp{^2aHfB|n4kCm#G1>R2$SrePe|k&Na4*%C(0)4b+9
z_-K2~wl~f&qa($PM)v5K%Zz@;ZU`-OtCB>y>WEC%11ax!`VLv=nL4{V+2T%hF}hw!
zV&k)^Q~j$&ZG2;BkS~KbhgDWfcj~i71&4OA!JYBqNPls!h(mV^i@wPa%>W{)OjvXg
zf56l##ND&qYfWf)jU6v-NRg=F$eTvKM%)T<uLk)7MA=vsQM{I5PJ?LU4D)X5GYHp(
zVUJNh7kS*qv7D1XW)x`bPH9*4eX+)ztVlkJncYb}#ciBPFRoX2Pu#U_?UGHbAKbTZ
zY+VmdV4YJ1w~D5wR$@rfP(#f>aTdK7O^O(Sbs&KYE3R(0c;uiB{8XLo?OGdTZ5?bx
zsbj$g6^2Ze85|w)i+!GoW*rARh8eX|Nf*DUK^$#Yj-I_5f83?BpY9)0cUg}h|6sF5
z&L98Jtm(&Skyl0OWGILn&yj;u-Vj}HyQmLS$hj>4F~Kf|bFVP)L>5j~NBpTnM=C^5
zQ^<>@aYd*O3(5KtOB?j!<mwo2h=QW&3s0eK{-$p%&cK9h`CaI<{dgs0IUBH|pSmQ=
z#OjaHE|zK_(9?sn)vD<w9JQHGk7_vtBBog_UQO=Wz*1>pRV@o8m{=2CUE$p?2WRn>
z#>b*B7LU_F#Z>)T&R9%p@oY}R5s+NV0xPni4uucHCy|#O9%Ce;&oOfr3<Z-RvHf;q
zij2}84*ID-X4o|cCnr3&PqKa4mJb)NtdASe47l@QX8=kR<zC57cJdQ#S7K9l{C}Lm
zp}Jdo%(Q-rK;=x0Q<sfD_%Q~j34KhIoGtRQZP#;h9>^4%J4yUV8^mJY-t=t3s+=LZ
zSXP`JG;cG#Uho|fe{pH7ziU6FvROAKPaAv3ohON$;~e;mI$o3=v^0NfY5<;8l)pMj
zJ*Fs&ObIv?s6<xVrEKJ!6&%QRn1Pn|gWUQZGWTuY_#E?XX3w(C9i#+WBNPKsoJI3C
z)I+KP6Eo_^K1BY{^@}e@$n&x&+Yab*Pv0Quj()AIIlCKfUNRTMuu!l;jY7nc{&ee*
zrip%s{MnX7{kTote7TiRPSxjJzp)T)&jq{v-}G16vF{%s_>M#!9+7k$mLA<@AC@oG
z-Y&(>s8&8*USj4X#+$L5oD<Sl96K)nIr<4@ql;7n=4Tv=?ad&Q=P^I#Sq~3%@wn_;
zMY?0eSo@cfd~m>g093c-{>)oSJyK94y<B{NRtqYitgZjL1y9nXYoOgDHnA~Cj{k1b
zQC3T}=HtbysDr{Q^Vz@?8=^h(4EvEBF;i@Q1G9aW#2x{Wa7zwo(V#*O+s*-6u$62!
z=$hH+`E&pcw4X)bn*6aolQkEv6@Dkb8zy_O{*u5Wc2Dr+Wh6Cf-L9Co(}yWQ{@J2z
zZNDq4_E-rLQ$ru4(M!zszH3}&*j`3laca9Gdij!>dd%E<-|N6|vnsWSjPT2>v1hJf
zg===zyu$nvK|hHOEIx21Co|4{((o9;KOeY{#lVlnZLwbcU=!4kMD6K<U#T0)|0o~;
z>n?J1$&ATnoQFZ*HF~$97&fQr=X|l2743hQRGc3DgP4Nsz>RP0k$K6yQu>V2<T5Q+
zWaKIt>5aP!*~jt->}Mv~;<|luDH}}flWcmqUM-^ZqDTU033KHlj4r_l_Zd7b@Iq|&
zHc=a`*SG7gXn}c@V#qsNO!5qg&hpxZFS#vR&rz~|hyaha|7;32(ETamSTTB^d<)Us
z-_5~_AEaCbiQOz?Bq#MbXwaes9aQrAy|qO*1}Ud~=69&WQ>Ibm^g~Fw`Qwu(r{~m;
ztF>EStGDgFo`6!(b^3!O)<xKgTn`MXzX909#YPCP$#lG0Ou)>v%6{~$QKM|+vu1~#
z4YV#dCm{zH9>%uX13~$r9uYx7OE&#$cT#h)AELt^k7d`_=A(7r+4*@j+nOk5j*K;{
z&qDa>>2-n}aU5m5oNcFO6)|5N>m>y2Aza$NsUYobLh={z!-TMGqrjyzn8Qbc)$lmo
zIlFxo5eKy4WUxRn_9BI#J!D)eoB5s|4n|+UyuO<dcIhGU0c~!o0>7#U8J=~jYjsuY
zLGTQ`C)Qn#zwy%peo1oK*ZqFcKXu70f1nRsTi|1q2Ck}&7}L@5n3bM68ZN3Toa`=M
z{V`*4tt#Ja=8DT~f3K6dh@yjaL>b}JhLYS&p_fx&zcIUH(KR6m!R1n>jArRL!1U@4
zeTq-@%Hho#1bZ&%g1`y_O%g1Wuk4ssn?pXwjzPcwv!1ZM1COpR#19F%w^Tnhv(c=6
zv?xZxp`FXhuZWK*Cd{q(_22ZJnemTDP!2028jd@e*|NNM*<lWpa9HOt^-4!JIh)wY
zmmdqZE6qSYYJ;DQ;g5ih7DknHNq5d)+WGF&PDV-}Ic#i!j~AAhi&hCiiu+PYVr0lu
zMPH`LA1qII{1*VrEH${dm^K_Nmlio*3=k6Hi6Pf_CdTw?&lcn}Va0gH$e^&@n;vss
z+2SAkY3594=^odE^{nf0x(2nrLcWuI-e^OJC4h$yE1`Ve+hRmav$ha_Yz}i}2ok|L
z?~7Y0Wq)!qT0tJvj=HwFB)i?#=3EiGNX;q7XjQE0Kd|J<OfqjpR><%1r&QrBeE3ha
zP;<M?ut=v+R1RujnTU#8f8!HT9v;XmgUCYxc)fY5sP91MH5DRsTJWsD{5n5W&!iMk
zY5tWhZmT&&zRo5H*4P6@qIJf90XxFXS$xYQcp%ck9;9Ieyv&<OHQlMAhgYHoY2TT%
zIA+u(sC1{U+?rX4;T5rw%uq(K-8ksmCttR6unV2LGx^$VXqrCYn^g1&h};^sYGi}*
zC;zV7a&`K+f{H(2sr=57V_lu^sV-WZRF4mu65L;HM5K@{;Sbu^FltOtwV7b41ORb<
z6<0&6#YW0WdssY8KO`ob5*%e0?K;%|>kcGrD_EdPYMId4KqR2wp(4oBtAijqSz!dj
zAEED|N4ojTG;{x%QT5j*Gy@SQM|$pOqtpD@T$CbWryfu9f&Obd4|w0!!F|{a1P0n+
zdA@l_!S)O)$mxHXYM+BVWK(LwI1p(OJf{9A5cr{Mp#2(As!F{6^lKY$yS0i)GorAz
zqb$w23N}50HsX&S7_!u|>uDwYB@svn&c?P5+~FKOvUXBS0EH}pBnL<7h)hz*F#-B;
zw!3=lH4db97fik?1>e_e=Iqga#R~F?4Ky58U#h6Gum~7q5;(}I_`{=X@vZD@QT!6~
zeL~7rlcQ}%;$Hjx5V||cYGXHTR{4n@9y`>NuFoeOiE<1hzVygX>e#`y9}Av*3A5FQ
zZ=Wb5*dGe{^_^`JlDh!Kqw27jJ&VE;BQ(5Tg{6kV#o}|J-0os_z8-_uVIO1N8?bL+
zOJB12nSqw&eD`BNc)amEK<FH@xIb4r;+qJhns0fA$Gup+WycRa<u4rImJ<C`ns`Y)
z?fO3S6q%Jz&D1NZGVAsy*WVDCwI7GK1?HYV)<^b}DAZhj-|;u>Xgs7NqN0a0)A}H!
z;()qf#_uuOXJ%^v-%pD@-A!ow7wdD9&;*6y|MhFU&J6VmodnEs=Hw|u#RrI>V^_R1
zZW}iLQsgKqKlKMo490hYWhq<2B4xaLT#98Rb~g{L5V2IBH@Kl&VXMt<Jeh3COx@K?
zM(&cCUc0OAivCTWg4Ncm<sZvw^cD1-lu~XE$zJs)Zy8u;znk=oe3ytvYi(--Hj|qa
z!DbF5)ZOLo%CHp6(n=r1ICn+)jDR)6(*Y?VGZfnf4_~|A*kMLk6u$#_TscEeN1=W!
z6VCR~WBo)~d4;b`ll{dHxJ`c3XioG;b4C5B8MIdZjx8edtX<RI4&6vP2cY`PLKK@{
zO36_+8ZJyF;V9YPz|6KiyU5xcyBEDhF1K)1Uj-X@Pd&AK2R*W1sivo%Ir1h$&%OC;
zyFoA)(tK9Ld#)UN=+J1gAyNa+=$Nk)4TR5}omYamMFc>0=NG{<&uwh`v&80gaArrx
zN;5Nk=P_aTXAgBZK&$-v(mZ5Gt&Q#oy*7hK9n_lE^?CDW%9DG?Mfru~wXlVYlClxm
z485SEX?g|D07oHS1l;sO)W2tD{(C~0Un&dPdNjG0qFQDlgoCJ4Yl#2m2uMvo!ryaQ
zd2_+Ww?u^=EP<fXkB-j*TREM#UI_WYmx#mGi-R3*upcANmEsXRLN0yXRg3-}*)Qsy
zex@ujvt!7GjarV~FOK|VKDS95ptj6l&Y2_g6Q1m!1h`v&UZ^#C^o`Dc>2&3pL9I-A
za8g6{0-t9ledd`N1<X^Q#&i_WMuUX1luKHhPD>>bBo=;eCzA5b1&V*4uTz8}`j;sY
zl@Ytr-!}|8*<}b`|I5*q-&PJk?5zt^)C!<vJ}Pxh86Qe~QID5OL)7V30|jxRH4%6A
zZKv0?0Bv|l;n@)J*~tiAaJhy22Zl_kiSi`komyOGHe^|~ZeGeP+BKHNiq+heJnAL8
z0ZmlYRxHM6UyDGsK0&<=;-{2QH?C|Ie~^1|*1PI*{@M)jlU&5ocH#PK?K`llFPpbF
zPaWrvdXL3t=I$KuZBk!#?bzTrDmo!tzNe&fI13kY8lqR^bD8N)C%8BVBKv7ua@!~>
z$y&XUnYU?%IYYeDZ_w<K>@U_0Hm^=SD(~o})!hlDzR7v%W7)mVA<xF;j-s;*v>`%I
z-xEwB2Ps>*%#_-F?$H{V++(ZsT2FJ^`a1Hhh1f&2+DcMVR+}Vi@l$b?x#6r-b)4Nb
z?pbh@lWE^NYvl9d)WD<PZxG}H-{B{t%7&ME)mFi<4-E(DH45)nz2z;shS#--7BvUS
zpTue3c-P=f<aF~fV%;~O#o57@L91W)-fBs`6}PUBRr67n|4J%If754ihglh<Z`R*z
z-ft#P?lCH$%e6;w8NGJ&b1o!HvBNl$e$G90i4#jA%vqI`g6-BT-lJ0hWpU_Tu?nXf
zt@&>^9bK|C&V3nz(d#+KW*8RgL27pnnh2g+m9W|0{$uG*(A$5O^-o@8q}PutrGW=E
z)X69a*R?Xy@T0nR$Cb=ep_fUYg=chADn2t1Vc3Ut$4lqzgVP725>&h=&e8_>L*B*a
z#nJ1K+L)dXGUWM8v$SLjKJ&l0(z_H*A;=X)LKfM0buBmtuFETSH4=5v_3x~?NYcyB
z(tS_fHbQ!JL12sRLm@$)7;wPl3D%x<M(GGH{d&4yv$jIJDp6lQHSBR}Og8lO+%hdW
zDK&i^HwjN{U9~|VpGdz=Xn|Vhl&!oEmTyxe(0kAMFk_?`i}3g+Z@xd<p$iXyYMM3N
zq?Qoqa%Rt(`SHQ*m2Vi{0*)YGGc5*9ZY|<Dm=izQ>V`5xqYpy;>H4)UFm+?|K055^
zGxo3ZOY9dPjiIMDwn=_t-I+i%l8#Vd#F6wLlGv2^eMUnZvc<^LGPBzcnw~+dZ1pCg
zR|_mF?DFcqOu(Y<*^#47<NU4IFXU|Bbt~r5z;1JVH2pb?ylpPMgG&LMkz)uNv18?m
zdT2n>!7!)c*?F~JPC-VPsp9@*?nST|bE4QcoBDHHij2FBbJr4Qn58eIoRvkDLo?st
zG36n?KE6w3y}3_H!b)$p@ugh`@6694kl6VaiMHSDEm^3C9E=`f1#Pr;g}TPa?x(>2
zzU#0o{rZ0l&N8B(CZFgb_9B)<QGpuK1TmtaficA{&kf$uJuT3r+D3k0$N+gFP_9!D
z&Gm3eqLrBJq`q(v^*p1ae&Ei6S#V}jL{)5w77;6PZr!_K=8k7=2Vz9bAf&~$G7>nG
z=XRDMv|_fr5)<g&+^ySYncoqs&t}dmGT&7Equb2+*&(scYbR%;7Q6T`cdaEKoBX;F
zP~H8ntOg}|M(|s6LgZWUfOvx3B?tQkk$?bieRbf?GR+|h!Ul0U4XA=|QsE2|1FP()
zwnOaEiI}+3$xiOb+E?iGe+F9ZV;(;rZ{In)_ceOPBD1<dyC=SrafTiwA|CqVFEgP?
z3e+=YMNf(GCRD1uvi<x^u@iYB4!~9>TJ{jo=fbxs0~MMcN%q6Dj){;3i$_1`yx}v$
zk*OeGBtSLPFReg}d6>6p`s?^=0}5je$8Ms>0IU!G>cB9v(m;M`XA7=Mif^B4y2_wc
zxBnvSW53`ygW%L%cD1;LAv3N$6>^sWNNDn+;dKyu3RMcjMd$S*$+@{ikA)d$WS-d6
zd-LPjX}grGXPHS!5k<J%1b&clHLx?I)yhIK(r`A!hS3U>6-P%6Q;(pw1HFU$MGy!h
zU?8H^-yxuUU!n9Y(9rhP)%K!Qg3+u$>J%$<yoA~;b(KQJ&_pUd3z2{J?EM3FyGmN$
zY=|XtIN4C7VQHRM1Wn~@TUugyVQFbqYFkB%u$x~3+s`MyP<#43@yYF=8xwNGdH03$
zTt^pGCB|I8%~O=(OEHpOpDQ}J(#XdX9cx3KW1CS+<u?OG`@#8QV-2iq>lADed{g`U
zS=xFwD5;0=UF1ey_AD9#fAk=>&cCHPTVldrYM4iT+~YQ6JA{_dexn*&LYuU39RY~>
zg`0WOHC`L5^4;F&7BjuN<YFxj52j_6Fhkeagq}Wg1JAdYQ*C2utB&U7e!-nbIYThG
zZ64uq(8oOB3TRBLEf)84E7EBACNGc8Je|(J->&W<W4;FHKm6xyZSm;q$eV>{!k{N>
z>73VyOf#0rLOaQyqwH^$Rob5N%z{nxdOH=I(}|hgp@_VNGYr2Q505Fn1iTJu3^Vzr
zfXjsjS1w$q))Do+E?hk)?jZ{Ls7=tzFE}-QGc^-SJHBZH?JA}vT;8O>Y?(zfl<)9U
zld`@17-z?Cev990X4>+b=plOLI|b;qya+VHq2T+?TeBeWicA$KZ{pCQri+r=Yml`z
zxKxJ*f`wjCnMuE2{l>4Osb*T^Xg?PoOc&10|IBTVk=g*>^sTzZI))aYJrG*BN3c7~
z6PyUr_mXuO)VJKF_MbmLD#gGvpjap-SHebGIX2S-+ra<pSMT;#3?>!Rgc17i?H#Q)
zhGtlyvjxPTCm>+<@tr$*naCI#{QdJa-28`3cKRv<#fNZAB1wn+M?*X4oU5Y|)U`4|
zY+Km;qyOr+Bbs7+5aI~2+Te=uG7z+o4l==a)XrGxHLmvuuFW-9;5K*@-!8*G8*G>F
z&_-;aCGh%bf4F9U`1YJqTz5(KyGGI-7B;Xy9j>Hp2{OdxwsK^LfQ(h^LA}IG?1=G2
z=qaEzODD<1v(K$SP5WEkA`p>#b1mdGoCawyb+U06_-JnnI3e3i+QU2SNSL|LUZ;L5
zyp4IxZ$jL-?!*O0As))^yIr&O77@M*AA=#bc|1xFTK$)z6t-QnjC9_NAE$awXHzz?
zYR&M@drp}^iJuv@nIIVvNApN!^W26y23!o%@zK<&&_=r{lJk-Kkr)@BL%&79j8Xh*
z_qw6e#&c@h%CFE>-?m<H5fZWlVn1p&8tVF$$-<gBR{)r5Ua;6=VCEMdFW4XqQ=hMW
z=Y;I9{K`BMcU-bSGK-MF+0%g8@YjR(j`jyT;Aq6G*QEdP)}L0-{<yA}OsVCstEE4Z
zw}2aelI5d|-iSnKej6PNyK64aEMeAs&Fv_d{$usEwA)?#`E3;IK|2gM-N6V5N1ELH
zSYM)36@GYWUf}f-K+kfIoHDxsC1bPkIJ|)(&+{(RtAwp<{*yPG>VJcFYkY618G2Cv
zsC#xY1;JF>Bkl$9kYL0Tj!ewSORhJ;?=cZFO$zcb5I_Ey^!xGHi?^EMGWQH@8dl1W
z_+6p^Nou39lTOkkMxG)osVAZH+=7=mhKJ?#<n^o!*}+TiH|Pli@j^lk8Mtc5LvB^O
z!*@rg!>V41;4s``7ejnwcVCx0R=dY*AMz8--y|R|W6z%__+BbF$)vV4zHHP;j`Ii4
zBEg%HgdhCoYyHc^-jrD(JK~7;q+nO}1Q*Z-d&j+-O!<WrOzWvwi@}KXeg02^!i3Uf
z?_S+#e&liZEz*0VJw|+1YUJ``lb2Gg5*L4M_fgex!KBmA<ZMnp<%Abt1m=jO>!jPG
z>5`DEG%xRac&3jmX3)A+yb<I^9OC+|It?NiAjX(r-Egg<cZ|ajUt%ac!==JhI5a*J
z1LkuGNpvSN6TM4jlNP(3W!roQv`AF3Z4t~;+tC7NnEoI!R2EWp_rUPF>%h`jtDand
zJ!(nmM%#tQ4w(I1jMWt1UnuNIsKtf68Z6BewDNdkL*I+;sY}+UvnNkittdbJ9DYe?
zDc$6KPy8G&{OuF60<|eG3Vog9b53p%{=SplXARKvOy#c5TfMPp6J^@$2Ko?=(4S1Z
zQKr^ZZ`bqk=)Ql<#81cxIoEsP*tp6!Wrv@!mT7AAG0@E0j{@w8@SWxRl);z$<>C8;
zl>Q$Z4;iD)?#~{b?ut?>cbnf-$*D(b{s_{)%d^krG_mspA#(lO6qm=N)>R_Y#s?zI
zA;aAU1D$e&C(d&#h#)<-vC-(u+lc;1^#l!mle8DX8rG<j$Rz$oZ6iA*JtGfSBUkQO
z-*O?(k@o=S4Q5jWthx5i=57qJZl8W9Yp@P?56W~q95fU!IYo|h=$tc?MFCnoJIB=|
z3_|8ePJIuA72o@N`};?ao8Z?Lza{kC1X^s9R|JNsd52kTH+6CX1qf5Inz9n+KZ#r?
zBgz<X)r+<Gt=FBRmfupCt8|rkQ#)}OI!YI-uO>uF!o$932_Al4%n@08-9(y~6`2Pn
zUU*8~$Y$G!&GYs@_WPqs{HKcS?brOC!l=~kUTte?2#C&vuj)qj!*fWb{Jtb>&%A1E
zf(QM()_hV`veHhlK6!vwwv+o{RQ%T++n;rS{`~pn)o+Q<xw-xE$n)DsTo(ye*b@C^
z<UIcB0=ow%K`!?Ur~{frHB!Y|y7N@+=e=e$g0;DhSG~r2U6JY)T12l!=};XPr5(~w
zKaWI`m7eVBOI&+x7mM;1jBEJm-A5Vnr1KbCM_jK{?>+VQa;@Wpo|1}cT|e-*l|lPL
z%TMQOw3_bw(T)p&!0+>}6kZpx^sdG3-iqrV36J)FD{}s{jtD%gxgs__G5@BMgWB8D
zHeUI@^0D=}>vlWN6B$@=;h)oE#@)D@m)wFDF}rI@nK1CZ;Xg@hAKS|rxY~~FxLkCU
zn7K43_}|qxK(gv#&$aI(Bl(r~#MNk*t-`6hcklTRrYyc!Rk@eR?_lEOl#rf2uUSRV
z=}c2zUT)p&mR{{ZZ8WXd06c_ZbUMksMZKjZ>iEGgkwi6cIMeYW+5FY1&6o3L_Zp#=
zul4mI$jcF9!sWs#de_3L3kY^BM@-dB8XtEIC|ynAgr@wyxSi1<-bn!aiq)Fo_}{*L
zyQ$whzHOx}O1|20Dcdn}=58l%j%@RKZ7cp}f8O<60(n`X+9CL#@wuv82lmU|`*i8q
zU+HoCAFpWUTchN#!=Ym1@?`@0&NP}iw%?KBE6{kY$L$@T^S;rpd3T7t61(y<|HON(
zvi0j?=~GYpw7!l@)>n^n%Yzw8w7uJ#Br`?=&q_6vRO)^0V|4rTKE2pRmz>4YCnhBU
zh6R-A`X;z@h$e5mLSZX9!@zfJdCz%>lRV?JwX)_uGd^tJeM<ajtqC=@XH0ysXPeyY
zUmb2^tH+@x%rEu6pZHSp7nyihou^Xg7ZTM(I=cM2icI6rxVv-Cy2m@J10eBO^uk5%
z^KPyp5Bn_IS4RIH8biRn*1ea<e^Hc|%orMXMZfXSp&gJM>pFQrrH|fA+cm37Rc*@U
zHrzcHI%a5FHrO*R@L}iNi<nrKTP9%anBRlyId+o@lS4E&gMn@SE$bF=&gM%Tnyx-r
z%4D$$_32Sr9d~QT72W*6Gg$vyZqK+>?j|y_CT$MW(d_4~9<kZ1w1ne`SSq?aYnPQJ
z(SbSvhOKk&;cd3*^q2+I^ZN^R!mHW8gM1!y{$nc%!e@Us?)^6u4E3t_1J<)VwP1G`
zb_QbN5>J<ocl?8}WLYAzPCE-9sTG;Hb7$g)`b_%zC5lJ?2Q$8|@woQSkpJh?fBkpz
z>%Xt@AD{^w_PZyK_wJvI|L+69XD2j?cz~?b|9Nyz=x^cwKac)&mHXp={|xcJ{_8;{
z+`#+4w#x$U<<l60Cz$Aeo$@CtPVH-}=!brzw~xpl{F#E1(}@fgmTuoOo2-(zulnQ3
zjM{&{LkVT_uk66X(zALboUikUT?`11j`rz7$vvl-LU=|(X=&r%T_hE<vPO2C@wshM
z6{OpQOqK4b`ZvKX#KiShD-EL?oEaxN@g(;PM{>trVK;c1GWqW-D<xizdgjQ{w?N3P
zzTHgLGG96yC<MRPdb_=rI{y55kui?J)K{dXJnBTdL&53vRadcM+Q2}+!Ra?=h<(>$
z>*FMq&)zs%IVSDmxMH6f{`(>xF$W|@j8rr?kjB$K|Cs5M@r?&~5C3PDeme8qB)jpo
z)z9tyeXlxM2ZiSk1ktmrm?dSW;*^94BD}4rJFWh{k!OUrMsomV_Se~|5l~+N6+^L=
zSbpjvJ?K3i5k;rHr$tloE<*8PgCk%wmxC88hv>8c`2rwsRiGlBXcjOCry0j5x4unR
zuUoB)gYh-r$?*y+c=e8&rhI?<h@vpo`@@B>-Rw`yTwJ<O(`ijql#?em{M0*SMr*j|
zMiU2%1ihx-oxLWp>e__42`jF2UQ?_7Edo4hDko2S<|e1EOMeiV3a}$w(ZfgaE4gB_
zUi<4VT<`4M4B2;D4mTzrIMFEeF>mkmJwrJSvW{zNLOlx}S~`5$wWN?7dXtrr;d_+t
zvhM`7XP~0$4!puISXohZ7Zqn8g@ZS?ScZq$x|&@p41wL5|N7Y{W2gU~LwPx-+nmNs
z!vhRk82<y0WT|G3o)0go=r0>}?xti$4~~YbIxS6}&d&={^PBWBb8YgfnBmu@8b95l
zf70qf+>mjtvs)^w@2!RdM>-VAWZh=_QBOjZhR|wJn7UnWO>>cs(DT?KEBq$)2g6%<
z;R23QlqTz_T%8UT5HSmLYAqC-A2-|E+tYFIrr`7Yf)&&5$lKIDZBF;67h@mO6TW5-
zzuxd5FPmBG>f-Cq4_Q%nUf(ycAwc%q*!PHdky@?3b+dWNl+8r-xXxAXrN+}>&t<%&
zoyD(N2Dk8?XhIZIZ^K&bn;cjB*7jboNTob9dn+s7aVczShxW&D?eEKclvWJ32r$AA
zq89jxLCXG^{a+ns#oy}ipXe0*gqk-t#fytpyF>Pd^o>c3A%p#1!xJ6&(Ig;;9F8T2
zb`1~eIjGzD#&I9vp45V_^?XnSmCIbu61o0XukgWbCDJ(w&Ku`Sl<<^OExq?6&YkWU
z?EA)rD_TDMyoct@PAn{P*`i_T>5_cBc|>=()u+BL&`{&?_8H}>4WZZ}s1`+kJV>t6
z&F<Y5J6%f8;)gfCz_p(D0%n#gy$$o_;^8^+ng#cB#>qaMzkHPr>wj|heYIV`eVvEd
zYu4n+qjt8LhVc^aK8UB?$%ONh^?Y(E6Q-yySmxIr5HZ=8B3gZOKIB_E8Th8wW6@9b
z<Iq1fG*^KAy77zj5Z=y1Y1fYydby|0y1;A@thI^h@YH7U*frSTVilO3&TRNMAa)i{
z(IIftfH3*(qR^49?#B6b>a?K&$+cf5uO4@23fzXl9I*&@%z|;+zTzy!jiG+AZFlQf
z4#SV~qxT_Tb}{Xusi9+Y)6oPKM38~Xy_VWrJh@V9zg}yRTL#c|G(T<(aa-T9vU8^M
zTDm~DgGPI{(_Xsc{0y`336}bD{&^N4djDVsWtvZt9HJ35xAJYHgsGt2sJlWNL_Of$
z-5#K!!r?QLfxyM~V`k?2<L(pR`+Sz!!LYs6y$B&*Yq^0MOY*)rM5sji?@`vFa=t04
zCV8U_htJRbai9A7c2#qk(6a%L4Bx$O64a*1ZF^FSeP4_Lvkf=AQQT)ws@p#uMR_}m
zvw|C`81IN+wS00}Ec(wzt=N-vbH=Lea0Y&39PJ3h*`?gxb_(npwr~zMnMV)vm`dNx
z&ktdtxM{Y3@|`Hde~0R|4`ULl=P8_I22Cd4UWF0qr+avOutYg|iidG<01?n7;hlJZ
z3?vEaalE_<WZPi`b~*5zJ2LQh{QEt@LFe(cM_;?I4<Fo3r@u-YE>lnYeE(Dbz*)o<
zspBjM&fnytsV(>Rbs)|Y_Q$S7&f~}OSXrEGZ98Rd(9#xN{!)#+Nr-K^X|-?pVN5In
z=8V6xoM8Bvoes(ucyfOe8{oOAqxmkVy833-?a|zR5w}k#z)tgawoc_0dk}jD*Pr+W
zWiPWbs5Bg^iS<)b0DA+j6%X}IU1Z{ACM@_=3(Gu+4^RR$dPdI<dve608a&udJxO-L
z55I2P7j!Y4lQGL(EfeAanc{h?ReXV>0#~~ul&9T^&QMo=d$W{5Ahk8{BI0DkYHutj
z?8%eMHBO+Cy?$9R@&0^@J_)E$U;39f-e-Z8Ll+ml0JXa+Ri4Kh0pb%IFI3$5TJkz?
z{(y_@j^#35hH>T%4RLSs2DtK+UEw4+V^eTH)O$L9-vP3;wK6Z4+<La>_trylbE~A|
zh``r#(*%2hD$<M-<D{FjMvNZ6xJFBqzsm3fS*gh{{eYxh)vbZ(!EHA)MnDJ|1PV1Y
z1Om1`05j{9k+EGK7e~ZY-&MKVzlRiyt1kC(VsC3(_Gv=TN@|gts&bY|T<sU*J>C<H
zM4l68Rq0Qh0xBg@wQV0y9sP-T_$+b(O||=1!;f3>2{FgBHTRWOS22hP7B(4gx`|z`
z%XFTij${A&K$Pe15$4U>+RBevG7+bs1GI!pO7ilLnFq0PudV{mW${`+67J;)g&=<1
zsH3#BjT#<MKVepV;O@Q{K)c7bP|}xDBMnp}+VUs4%W%Ge&-ZZ#Vr{v=Jwz`rcx#w4
zEPZC|G4>8SmAEu>lf1TP?DNP_|04%yzab(tuq3^<kaqD$sJiBxt+C6PytPTgu~gXp
zOTe4sVHVMbU}q2VtHLSsCExh3FkGCKGg&0^j2&<m;7l!fq%_LB;wopc0=GnjBWZ;~
zVE)wU$4kunqaq1PKX(?Tw@*<M&ECTo6ZG@!k=!zjjEW%K$jA+Lb{RsF<sH8*NPznr
zf;zMQ*CIuM2aM<O;vr9-@CY_NWNB3vjf44%$lMRu*M*HZKU{>8@NfqQuHV%~o>;hn
z*vkv?L-0It9<9=Z1Fgs6K&^EPxCWeP*JDR!Z4I74XJTo23jvo8Ja(M0SnG*vi&58y
z@#aoBwgz8a;=XoEJ-GTsk9h4Q8ppvZyM{6?v6AaJqPhfx1p4~kd3&2*P#_I*#s;2A
zqBYf!mq(L+NB8LaVq)h{6EhXP4p}C?bM5Rb^8bPg=Bj2r*`I<&240~5ghh2_XbHF}
zQ-I6>rf`P63^0w+uGakrLwc$G7lQ4<2Vy6yMVKo*uZq@C<5P86mxxz_wNt|GTk4Wl
z8bkG17zm+;_il|MFFG2-5};Q|6zxQ{71`H!%hBUQ!7ayGhlNGht%!)*u?$Du`(A|i
zNMqy4AdPs-7^IbUvtA`LxwN(xd>;<K&0#2|t@#tPAYbdrd0D3JX)Fy4oc+!04^n02
z&)!%507e^aarcTKx$oggX3%_n;8{z-b<ahgf(dS=SCN5R4_j(UIYlzRLQ{c8@WIk{
zw~eg~b}@Qt++?i7Eq!@G2=>85g#(zZg5Ur*W%eCNRZpC_wAW#ImBk1ZfIoKq&(|9S
z-N75>ELj-D^5LwrC$?`9MsXf=K<zjO`ebMO$p`m|iOHFY%4w#?>XP#^q;|iAm`VJY
zrlmDzn2<h?hGKC%$n%7hNk|q5@GpIxHug8F_A)bXJ1)u0PTxn=6O^`OEN^KRbV(wX
zmfAL=T)Mj?3nJy+K*UR<hBt7>2(Arv)%zl6cUu~hhhhEPWe=B@#Jp6o{lmkgjg5H<
zv&-ZZWJd?fxnhdETtQGP0boL8E-r3R=-Vf--hE=AGtD7E?>Oc#Z18}GCmM(zk>7Wh
z;%LoPvO0&KJM%mA7((On8m7Wo1RQ!0(Bl=`=J~L|m8U)<R0~!5yGPqy!y~4N^3)#?
zu+-f%;`7S@{h{`@o;ccX5&;Y1M8w3DRkuZix9<c<2C=X_&B$On*<j3?R9jHa+&#iF
zxB7(*>pSL0M~^>Bxd9T`4LsvFHKwoQZ2iRX@*2`X8f_1k0b=d_^WGa(RiPMKuZVj;
zX}zct<zF!}dc1-3qOGw_IN_4g$TM7*vy<n}js9LGWyB-;{omtuGi(tywEfj5+LjVT
z9L8_g))F-}ExsiIU5P<U=-_DIVgsR}-krJ=!!hz8b!1S3DSj8&?xa+8wc1dYR}g^P
zKF1A|(S<{U<0&bjvo%T0evbiY3iafyqcD2H_j@+&hbRjtiD&;kphj>3j^#$IV)1<9
z`wPmCgLM^(Z<ExAVoRse+<baLg_=);)iuN9M^`Y2Xdsm|aag(}4ihqOD<nm;erVyS
zW0VE1aYMU$IajEmHF3rN>`4??*~5u*(H48+mL^Ibo{#F<5ioC_l@X4D;A=oR`w(&O
z0U!<)aByEmT5R+Sds>;zt9~SSZ0GU<=uBY0O@u+H{jQH`=9A$cqN64(>^_A@+!hvo
z7uUB4ruH18D_CK8I`++XW#)7tg5zcHV#{INsX=^M8N=2NQ<K*<p43JLY@KrmiG)GM
z*iO5Dr~rId;_Q~8QFsMycf0s2<dyiu0dGs)ee9<1EDf(fjiBd0@ZnQ#%71l`UbmQT
z-IA5vQ_Z<?w~oTNQL-&q@?5?29RR+KysjC(k7VKOkBO~Xct`ob!y^%ZJX|+RTW%T}
zgmAET#l$Sr3gX2v33}eUI<RpwWl%9IG<?1d35g;0Fcegsn~sH95bczVG{we-YFAe-
zK-E0nfT00WUaL1*m?Ly3SH#-rJyGiX$!SVJ;U0hVD3qwe^NpBTsMw;{z_W?nTbKP-
zT<YrlO(hRsgk3G+03Va__g`)?Ukb>yw!YQaH!^yy3bZ?_f5I$ZjM;LR@Xy@sBQ7{Z
zN4D*XwzSe{>U`qhcuVmJo^tiJe>=)-Z7tN_e<SJ+&`)H+^+MJgWB~SoT40&?2U#QJ
zCw(9ZS$NMqx=xWZwxxXO6J`pj&vE$acO?~xE(w`G_4V(5a2ClK@IF75=u}c(p%s>^
zY@sqBx$2Q0UZAP7%IOl><sA(t21dH<uozNZ!MIKTWd8_}=0;^`WckemXo=Zw2*zR&
zwQxWdPVFV^r>PO$U1Bmoo7}L~1nQQPb>p>_)vEoR+z(`YdZu)0@^Urk`fRU+ca*hx
zXc~4kIn}Jh>eJcT-aR_MCOu25+ttR5*5c6|FZ&jo?w3}NRyzrD2YYTj31zr!b_MQ&
zEmRNs(*gO`X|Djp7{%4l;NcEy7l(oEO&JW@6nKTMDJu)w%@zy|0*FKI>KC(2p&8I1
z733Jfe$$w!K^=>?0aBp4`YAxn9K<7^ZDE-+G8xg+H7oY;PzOiGt=48g+A8f^dK>^6
zC@vnl)=5llU+e65o}aU4ZT)HajF{Slnx47I>+|lJyPNbk{4veQ8^WCNPi}o`X&JbC
zZywjICU<Z9sG;oP<-F?+3Jw5XVZ5sJBL)YbQc*t3NCPr$t^QNgdckaR$Rtp@J*Ed`
zu1`K5?&*;>%ylu;Frs4-ATcyj;pLBu>L1VKpSL=am4Gn3#+&V3Y#XXBnx75C9?y$a
zk338IdVU@F`b1|CF-7UrQQK;Z$J%(e)cLG2^U2BGrELxoLR3;H2-9PHL8W5x$kZw#
zdn`RBd(8b1U=9$Pd|mCTj*Xv%oUf;(gaMhzsNtreM$Ev8m+vwA7-}dLkk<G%Uwa8T
zjZsSgC=0AuH?=^g^Z1)Lr1AHamDxwEN`a!PExUf=T##?7WJn88C6}91cwU4g3sB43
zcz_i9-!}aJU8SXgP87H)4+Jvuc2*zL0<FlcY9P9{Wr*)R04;@9SJ$)pDy!Dasql&^
z&rR59A1glNS~dGR$dXLafauZEAYE#;k#=5OBn?li<5n_Tr6ryRxqpLrbRT&nqv7w~
z?fmx0@H7r(prY=V2=Lo1tSuX;meV$j{r!Qm3MBV}T>lQ$;_L^??Vj7PFz(l)4*kyr
zC`#+@mzGaf>wL*3R^Xg{CH!`ML;z!)(!JPDYqfe)<2}Q~_getsZE2x$MDQ0x-UeXy
za5f_OI*R~>W{ah%W=BKlg+EvB(--<5VP=FXnva@>t{ze;t9%x^%vn_9zx5=g+x!(j
zp2z7c%gO2^D@)5yHwkam7y(7&KL3wZ3^Ib_JCs%U%a?+B*UqsyyoIM#+mMpz3u5B?
z5a@jmj}X*-i053Mn(pJBxYAwK?4Fodvq>B3w}Yb)*3)cZ3#jfW81z`8^G16%nWTv}
zu}09(e7oC(H!+az*%~sSA+@x6^11d}kNJbr@{HOJ<jrpi3gf2)ToUnvLxf`uu|;a3
z+1Ws?F|)MX)@o~L;P2G3`xeSVE-iGgyy(W7;Bl_B34Q^QB&@b7d-t~m9#DBc(TbZR
z9PAD5;K9+>9XR3o`fZlroCpq9#wIW4UiLbAsI^O?wacB->U)k?;;^%|^K-Fy0f3-n
z2B7UHAD>PcfPiZf@tDKZ;?97`aTc(DotpYlZG+Du?HA2uv0}nNo+wGufXU8Yb#t^D
zzf#&G<}J)OiNJd)sjBC!HOrVq#ofL1HN3o;4Y=O$uZNbuSFVRc<x4Rhs69ad)aNyO
zzr-tLWqEA}D_CAw_~1+$s#uDiRQo>IeIuB7mEDC#!(nvNxNzX5NG_+L)FTQrCZ=~`
z1k2w1$-3_)c?GRet~M`$NL1TItT)J)0`1^!j;#I|)wxqqfd~%H4v;tyZf|CNhfxt#
z|AR;u{u_}Na#?ySDs?_fO8X_MV5cE^Q*Uv#HO!Ubis}u2PjL!yZ%+ePgDc6sr7Dg1
zNOE|yERr?ysS@RUP_$Y?(npr>{Y7dg*p4fwi;}6K*`h~4n_Fpop_z4vev%T-Rcp$H
zaT*((wDBF$D-@*`*U;7V8WW3#d2%kTe7j*amKcEHySolich`OFqYThS3@zSM4ljqK
zYMIkzI77S1(R{A1G8hc2@laJ@``RT(wq1<j8`3A_(lsyNP|)=F+M<3ZsHmEK2~jp#
zS0AcaJQd%hKgVnA^Id=Z)A;OZUEx@QP(T)dqwdG50u0W(I?8ksiJq9a2hSe>^$17<
z4<tNdtbO3sCDwOW4yh@Dy{`4_B|QyRtM(mDqmex|SACNnwM<dB2+tUHG01PEJR(qD
z@s`TFo=EYZ#+R=YUILK@@E4TtqVu~5-`~lW3&NV6>#tvSyA}iBLa3QF*B9m9#8LkB
zR(7Tx^g+8fnW65fle=g^9{5p8s_ga#Pfy3Ga%-35BoZf{o_qNtT#3H(#<hv<%IvFC
z@6eHtC#W$p-#vn#oUK7!d9!oV#1yFNYoDZ)D3_p^=cb<kp60N4p!I?uD3t)g7z*&o
z7&y3<uU^d~bFbj(IfTDJwf>BrZGi;8S}$P70U%c%EHtUmd}=#*&{7V+nLu{CU?_Fl
ztM|i3{i_dSG|wT9SHGK+l05Cj{BW>N83qORR(#sn!I5TPal&gW<I9tNOflR0kLMw7
zx{m=H_O$PY!PfVyB!9w0-A4YQf_qk<|MznttOd7LaS7QMyIp0uDX9Dp+})dxZ%Ma;
z^qsf1xTe%u={uzIP+BeJcd)w*tr&#7KT70ADn~vmD{!B$91ItkSw<Wkn3`M&O55X3
zW;^0wo+PG@oTkQDQ2(2Y3mN>6O_|W;N`2WmB#C5DlvTcq%M+nM-zKax5EW-<Ag(=M
zr2g5Q%ze=(o#_E+_Vaa10HOaO)myD{pEr5kPsxk`8sLIG?oZ^`71h;4+x*i`9|DL%
z@aFKZ8=FuhgX`~SMQRf2?1}}qctoUsEJ>abO4PEF*OS)W!3QYaojzsH>~)7lsqJ(|
znQziXm;1S-%1WexhngS8=-Mti!=~?Oo8O)Adr;bV<HO$C*s3#Z8bnGuGA#PTVx!<G
z(0R)yXBfA8nAzhgq1S{e@nma6OQ#1{z`l_!uz|b8D;+qFrTf%$(PCFLOxfLm$0zt>
z+Pb#^_Vw0v*qy}%8cT<`|A)7)jEk~+yQPsXX#o)dr5mJ6P#Wnj>Fy5cMn$?o8isNP
zn4weYkY-2;fuTFk{run8=Y2gN&d2ln@d1XpXYPB)b*;6owfC-tQNz{Rzj=BJiM9m1
z6HXWy>3qq`u(R!BO4oP6ovg8aeZBE?!QAC?T3<o&=o}$qX$??TmDx#{wEYE)B4XeN
zQ)YUT_aS>ur;^gVzvm&+_~Mg{B=P4!Ei*172Yk|8iV0v%9*OJmV}4npuNogyd|YUx
zqBC-tc3Q4twi+%@gSv8^CW4RB22ZXI3k(wVR|AjpGyl{~qM2p?Dy{@*zvoH+WR_i_
zkgc2rV0r*G!e?!Cwr`%3>8wIM`6Y|Tii>-UgD3%=v9uh^d6f1MYG@oe_sI!BDK(!-
zx!heNhAny0mHM0<Gg2%YpBfb~0W3trI8ovK93eG^(Yx4&lG6f39Z;P2;nVlMF;M?U
zH#b;LiyiQ0l9c2yDF4$)qd;ydnh#tZA|5s`x7VHEu^cZMTYO0Z9tv)Pb;nme!k>LB
zK-SWG3-Esx4Uz%(7z_-vVs=xpq}h{Ei;0(A0AfSm+kUYhiMQhCwE#apj>tp|xK@0n
zC~d>%vE`|``v~AM8!s|m@wT7+3{!5USH6Y&v>uj>P)KTR4FSMj8a#2-ACQ=u-KbfZ
zF#!!zh#!zwA}4=_PpSnk+;W|~g+Yy^kbBkz#x|GgQ|unkXWH8zK>!SRz+ls=a2sbL
ztD{9^43qOg%y+}nv!4Kg7r4nZZ3(_MRX$<t#4wn9n$Z~#Gp6%s?!uNU#XKVZ<OGpA
zrKjg*uqwh1!YC&NumDg(V4>}QxWtvRuf!W;BrdxR_&&^dpHgEgC-+F-0FnMOf`y~Y
zwhVK*U$1e*$x}?&&RziHA9?1S?I_Je&eD3_1S&h*_T-fJ<OH}_q=WzfH{B9}RNmbf
zfMOBD#PpPz8bD%vVwR2h7@FE-Q0PFLjFq)x$7!<eY@hJXUXPM$!SRXfbagkN`aazH
zcAX1oibev1^5k3u;J;k!B(m%4UW&G4o9=<ADl(`QRq+7h5Xqwf=$s5dGS!10jsy1h
z``+^a$Poy0aBuO-7EuZ+sH3>M;){i$A!&+292|9f1^RW8ganEF&1TY@e@{9O(*9f7
z<$ac!{oH-+IgmhB0Ae~hi<B5PaiLqhCk|*M8LEY>EMprRiu6n=c}d=vTq+txczusa
zfjS5yP(H#yii_nwdD+EU-PkDj>NTHm?Btl_ZCbm^?6gyNi1)dOn9R?Gs0_t58X?n0
zXZG*?6z&Iz-~%9a7CtGh!}8jD3IOo(YDOwrroCN;%-puSORncFbFatTu)I&K)=chL
zn((1tzu4G(1cVuLS|+3mtgGSSGFcRWZ~yb(aZt`SzdfLgAF1X+7KoNT9jV}INexyB
z@MHcQDWDwb<+-q;r;Nu!#w;}1^7#9+c%UkA#0Cr;fO!hFC1Ifq4dLF;HF%wMsOk^W
z6dt+qNZfPR)dZ0O?AXJTpXn8VE&3l0i;7*QbB2EMjrRb{k+Uy318Iy5<NrY&)Vnq=
zgbhJr`z6Z%B)6_aeCm4r$Khzh)~<GT<DOtZm_+y9QUM|`w}k}|=84YF=B!tdHUAOj
z7y^&K;58!y;cR7q6cBb6X%@y)NWDCvz1+qvsaL*TAdbZZ>X_D3bPuhmiRrpV8gjJs
zA1|}#g$vunHQ!<Z14AVcwMP_)4?cjJlwYkE=Dkg*u9JpW5gM4g?h44Vu8oc|<c~vk
zSM!9V_{@`lp$SwY8P0D{Ud;3Yq;RR;6P%Iz0SGZi$L`xSff%I_{L%tsLvkPh0fOUw
z6dkc)yUN&vw96UGK2XwY5yizwc?0PBN7AO`usdg|g4X?~P4;I)H|OKiR?C8Oa{zIc
z?Pe&8khB7_Lwd6w@K~^r-1WlU9Tza<ppvU+Xqec`gV3-wmgPkUy!ON8Oq<-#W$Uez
zF+zZaJEWf=16S|>412cD!b@C7+nmQ120qHzycx0P6${faa$g8&<ClGMeomN>P_%3J
z<GN2U8Uq7ML_~dV?xO0|6S`p2|H;fnAOES0ix%>ZlgtYL=h#>vX$MIu$wuBs&{c9!
zc8^)#TC+L(e3{+Q<_VKQqogCDY*{60Vb%E&A5V0NEE}o&fV7`d`IWg8>+YkE4uPL|
zf`qWjO4NH#4}cmxlIuHUKn;3xYbEIClxu2(7VL(7caPb#rYDq;P!{?O;Opbf{{--f
zbrQ5sof)}|t_A4rF!Erh+!s7TyzcM8hD>{V&v<#G2GD%i(_E>k-Zax*CCa~kAALg!
z8XhXN-$j08P(#EgNOFp&rm6YnwG(Yaa`Ne-P5+{s$>|E=(p8*~6YdVxBdgD{ypuwL
z^Xk67L#2o&F40)`eUzgeDi2pIrcV|!x?3FiIB{=Ec(=<bTN1O%_Rvmn0?V%W>T-><
zb+b=T*`UFX8uPz{-sozE#ysoNpj$O*3&oOd0nsk{guZ79i@~j^#N28nh<=#>=yBCC
za{Z6=^}lCd6Cznh1R*buyP8+e#K_6Lu|`pPPbLnNn_W<}I%@dDZLR|%_PY3Vj@lG}
z-v7ef3AeO_O27$-TH)Ommqg(~Z$?tIyY^{s%PG6QOA-?8BKn7i<<qK=`#a;62>~O4
zS6}JZe59F+X0Ib+rW9n07l)_=8n$ln1(l#&B|Szx?xeNFCkl__i#I84#O0Yt0=o32
z>wyMSRw0g8C00`Ex}hWF<miRh(rOuh;O>juk`AMzTr|i<UnZdE5i%0<u^w4<T0&E{
zk0~ixkrP3qH*=P-mFQ?)O4Mn=uy!P|x|6#-di!%0?K;!wj-51oZ0W;ICss?VKQOmV
z5!VrQuabg)V~-mW*AoxUaEtlRc=p_V3@tQQfK%w`*L==QuV*!5+FJFa)?-##tulx@
z7A;hg^f{Ihm3SZI7R$v`26P9n_0b|lMOk<)zyuo!;7c*N84RslUVd(wm%n+pZ$)qC
zp0m6YDz@D2V^WJPb4Getw+f}EDf>)ZkAVS9=OYdAE37nd-%9BOE)(%kGwr3g*~u%O
zgH}}`9y|M7#WfGl22U$8(l~A*zM-+h?-MP<zjsUP3qI+-eVKE__O6`oXx?wpabXWT
zjy+8ZHa7OAq&i1wkdcC_(|=0{kdOhWtVBU<<ivQ=`L^p5cQme_`0J`*!mTyNcO+hh
zPAQ&w;^s2lnvP4-EIb}~xIFyaG3RSOyzV}&ckivXVk{Y7D8JYEJi?`>JpvR}L0&J0
zh6{9AHzjVP_v!AUe_=8ojtOa{44yOxKa${+{f(<9I4@~`%Y9?4X`AQMul#dc=>9uJ
z=QhR89Z8Sd2_gVg)g?=0Ww(=hB^v5|MrHQUG2y=Naki2pk^~F7L!7)yrr)M}zK*#)
z;?xqy-eiocWu!<!F)_R~Mv{C{bv5GpuZM${JmM}GQC8;UQ-#;xK%4{?1I#pRFKenS
zZSt+bgMT~=qwG0GQiZFENME7B3PKC51EhDi3yt0cCJ)6mpVQL%5AFrqy=N`}e;Z(V
zbNe+WrO#os$@#F{ZXCf9Tqov#`bZw1<~a}1!{+k+_{0|AAKRF!X_YoMrhV8UxEJi}
z1Kb2<TXFryYt$yeDUHjoMjkG5+KXF@+(|1R9A?`bSx3cga18{U0S<Og$0IBNnP-Gs
zj_w=XE=k-NMMg$mO#^EL(>NPJKH2YLZ*K$cS1umVr3jDzaRBD5AFv0SyfYlCxecoU
zj<flmcl$eUa(d~*0~!Q+{~Y2pH{|><U9)~89(<2zFFf0UF9<j<efwm=8T_Ht%VA66
zw*NtPcWU>`WAdY0_tE1NowlByU(X-5g0I_~X5DN_L6U!}KbLc@mmAKD7dJEQH;DfE
z<**CI|78-AzFnNuZ-A5&Nj##UPOfIrkqI<nkJ&>DyhCja%uP;Ke(TD;nOdz&)8FZ7
z6w39ru_mec{qZEn1#fvvOZ>n9wy?9;DrTo$n!Cvk@`9g^w&vlKPMfTwG9J0Uev{W-
zpX0KslF9^pGA;&}v+eLnOoSt_U!<3pz?4=>%R}ee$jB%i(URKQkmGuuZ_{5VDGRxt
z$Kt9#E_fWDnE1KP!rtm%tHolL12RwsPui9UU^GM!|F@#O=5GHk!qCGZ-BwXU_T@3}
zbh7s1<99~ZhxJ3T5|hHh%0)LAOYwt|%9K&nf4l%Yu5HgGE`oM;PL>(`H*pObtY^Ik
zFVm**-jTzkG-1iX7wz}>86L5VduW#ESkp?mYrE)lz!0MN9lzL6UyoCjvCq6wYU3A#
zA#u5v@Yc1+EU>(Q!|H{&dAeN-OwQ<z&o|4sq@^U0XX*!MO>@_3SoeD;SU<hxl#Xt4
z?rM3<D@dT(lG9*IZ0yWwz1Jn|R(nIwSWFnacA56&LM<gQW3L|jerx&8^K(N(k7&~8
zP&Cp_jNX&wQa^8k25NEti$6!ph5Jwh(*;8B{$oR<9Ved#d7=L|0Y19or|8=23bn*+
z$X_ym`vnn|-J1RU`SPdXEv>|4r0G0;$PP;WiUx-o68s}_z3F82#S42Ru^^NUB{l8G
zn9sO~m{NL_JnCOAb(iStaSu~{IDLeaa8_wJAXauu@3EZqF{AjK2W~@GS6~HOjK&!H
z32J(bia?)JeVu^&#fN>uuT1o(<uwMYH>z2@7#5{I6{fhy2Sh9G^6Cx4La32<bIl(e
zJ1CSscHy3XPHvX=^u->KfKU9()cQ9LoJB9w```XW*ltm{CdJZ!=T33Rh{pbWuKqx|
zEzvNa65ZdSbU_Tc-MCAlbfTwl@^!K3Y`HtrBPua-+NcgrLwh+i>ls?No2R1uuRbvP
zs&yXy&n)@hQ{tc3){>L|f4leoM+*MGZzfOXFJ|!P3H~z!#G@9cIR2gNfDJHDr^15)
zYRAcvs_25?Pp6Lj{_qRWmcSa$t?Bi0{0*!6vOK{|urdEgUky=$F(Z}JhhN*wdo!<R
zXr+7IqM2Y^3_9Yz2-R4UutkMMyqbJs#`K?P<16Te`xiQ}`hEqiJ?Sm&+1=ec7x19(
zp2sk<#}w_*EyXIBPSR?>?xG-ex=>-dNJ)0~OdRbbuZ4&?5w_lrRvdFJ!9?XOpXMa3
zgP?Z^#^D`5n%xc3t*9lkMkvb4u*)*&JQV}mfe8m?NCj6l5sZ62)Vy=eeqd24=;|v?
z6-ujPCiUgtQylXD-DN$TLVvZ`(bsJX7WMd8ayGPwAqL2LzJfqL&Mx<F+im~vC-vAa
zJBcA!+{roV@>odP`;jkX8T+~QUT;^lrpMq9?LECrlBhZxYk87=OtZanm)>laz8~=v
z>~|(g5A|SRsi6{ATpg7E3Ow6!Me0@(7JX{mz)MUE$ycf+%c1(&u{pAnV)=B1Nf9fR
zZxhW+{OQH;6r@I|9%C<}32|#mu|v<f#ygs)aFF^=#u=%{ru<_tI1AUV)O3?ZDlr7{
zLo)I@l)jmj+(_DLnlV(=;*$=dl9=(h3kV5G1nEfXr8vlv3z!xn@y^F)&`75_%WurE
zf!kf4%OW%J$GlC<XU*+Wx*#cGddh$G690RtOj?yZX*FY4$p4i3s(Aixtnl<Xslxk)
z6~?`t;3TCF>mSY~C2l2tzeGqE61MC`;fTZMO0yz39G+6Og}+TKJZ5ptyT<F9ZfC&W
z<gTPZ(mWFNxWfP9g#}w;*z+T}mDim^G${FRx3G+HtSUX&>74LRrodcw@JS5s#I1sG
zLkW4OlxN)sB6~tlNw$B!wycq2jO8{ukhvsebrI%XWN(+|^Y~DE;z$imKq<WEHN~|m
zq5Fa%h?Amanm4Rfh`pm$Oihm)%``t2fwz5Kw7upjFqEeCC_}F+F^{lh8G~e%uQh~Y
z!D$&w3S&`7Y9T4wPesb@ee;nnhe~&YPy(mvga#V<{S#h^L>4<~oqb@eJ6EDUULE$v
z%6y<$FI6kfQsWIw&+A5dYAnE#8|6k)X%zGCJp4;Yq<!-Whn^06=xzDnARUcDJ)q%t
z)xd?y6nf=a_yO2S&28;mmxMk1__3)2m#ym=dog)e5Y948@5Kh&Y22L>Y^7;14x7a>
zxE=g}Y23f|>Qigt3)6Bb=E#AUjSCNnjTR-)u@so&%FA3F<3xC*AoiYusrvf`jRDXm
zgpb4SimE7Lp6gr$!G2o0ul`_uIBo=*3x@=9Qm1IdVog#=*y=nO>}mKnnk<-S)!yDs
zTs8zCoVxBor(IV#{0+6j@anvX-v;|B{@w`%&cP;7mDC09&MwQ{4dJxSjx{m0vTUB{
zE#M}&WXSjGR2+Ok?1}udyCy9qA~h1&dyD$zsl9$_`x1jQ4YQ-w;FsXrR=uXJKDJvu
zxZ8;IvRNPGpa&7_euSdQ)<2Rr*01DYUgAe~-Ly^%7Zy;jC0ydYXiw~QHTj|yuJG6_
z^s)LV*3E4*mI!U!4D3=#?t|FOVBLvmDrTLbOT*JQM4oxIGZ2a@Sn|Z6BV5~ibKjy9
z!=3zK+3{kqF^@CR{iA*x)5h$pS}tQ~KE9TkJYNv1G@VvGGU+J8&a1aRCq^C!vuq%Y
zRT7wPGIR6Z37w4+D|-+c-yk!k_rvd!TxNnPgDb4fQq&e2ITj*0;E#?1-XyZ8&_IIH
zYR_PUGpju=1h*PM?6kU#G}3a}6Rz+ef##>WY_D|%8a0*1ogk!Am%<uvt>cURpUk=H
z`bDA05@IwCO-6WcuZg~9ZzoV2+~P>Qnz+5#^WU02*$)V=e*e2!57%J%N(tu=x>1e%
zpScPII3FafHdG)mdig6$TBe1rn=v^PLmtpd?<qCM^<4~yd@!YUy@6;}wst0D-yRaX
zu}1eGgS42_9>G%9J#@bQMFQO^)Fvpt7NJ}S5p-C2UW4HcSKN+;m-)~#DRG7)$WQ=A
z=>a!eye>)~iZ^=6(T)-JjOzYj4tDTVq=vWsWZ8Y`qV?(=5j~QCK}Cu725IOm@oN^f
zFV<f_iFGy{@A){^AnIFTmNO2BAy6Y^n&V-kS|2uTo87lgY1>D_7K1(XIbHJOQ@4F5
z4?*FrtnQoKhTv>D29oED#D!-86}nR`7w5jCuMT{jNjvFsx59DG*>YR2gdqX6IS#1t
zIP$*S4z`IlYy5?AHIkm52&&=j5At00tHREQ8kZNh7lfm$2e&K^L7xv7y}31<Nfs|8
z%(a=-e&dyCQbX#M`0iPZA7nt+`X74)Skjx5!{r|sejab%M;$=|kPR~BE&bJgXs7k+
ziK_bSw43U16uv`jf9OBNkcf!2?vzd6AabxB0c$%gBhNE89XJzPL1RWF;`W;foKHwN
zTFSl|Ec~J`JXwyAvqeDjy2-56*t68@_{O&f(=!;*lJ3biC$hFnf@UfNs!Yj7Oj!p(
z!*&|&{*KpNt1-^ftsw*F`8Tv()B%g=eTBkSM^DCQlWHfU#!-`2@Phi_?S4d45K>-u
zG%77k$O2dgOlDUp+dWF6Oc%Mv=OYe%kUBJ<^&3au&h!J8p0HHdc0_EqbVPja!cGW#
zz-<tX-EYAOzDc8>`tS}rInT|kc6DpfP(W;T^Brpwi>q)4TQ)uPG8X5%EgjsQbE#xB
zKNAyvW{*D9s}tR4@*4fEx9T-rCtHs9Ney{z#9Pgea{L5^p}5nLAn5YdfrVA#%E3dW
zPgY*I56Qh)FnaKoj?I%>>#|df(-P*|Udk@-ha3^=){9j<u?VLn-1+WQa^PbbGJR+i
z+;yf4L~Xv*81)|#7#6qjLr*x~csH?l7(aPfYEK;SFq1Q3qVh#<0r;(nd?UVDUr@Ln
zE&chnvlIlAb1rnIfsWVxOP|3*_R4BIfdWZFTeM!&kCUmNN$n4U>dJTfds?Y69IbCX
zj<W=V%F$k$tKKp=cv7}d2rnc|Aw#%cdu+B#Ysy@Rh3tz<t|tV?8qr49M%pb)xr~~0
zYLh)uy(_mba9|(*X%=lKYu*0(E9!;Oco|{N4L&q5%RhA<VxC;R!W>hw)W?>KPD7aU
zK-<~Y9_zHjs88#~o&-)3JpCeKaxf*l7`KYJt+;_JIuj?fp)fs%o{|Oae#G%Z*+z2S
zC08G&=GKUU>%s7*c*2)1g@Tq*_n*jl4s}s_=aG%BE}so<pXYF_#B`-@asSiltqBT{
zLP7Dw8F!=xo=B%1&(7~%xK^7lgePO{;GwaCX4o3(<7q$>ht-BQ-ch!6jy5gZ!}mtq
zTP67dXW>~p$I4rkH*|dqmlRM-UTxbl5TTJzqRkG%o|W~F_kd8s@-KwBCh&z$-V!>-
zp2Zcn#f`J2_S;MI2LJXwH3Sl?C%P+bT`l8tM?jLg*^yKLoH^R!<y+-L0(GSB!-+0g
z_LPD^s(r*TYgvnCH`H&3Rr6u{uInnT=t0DA|C&maNBMj#sjiNK)RyEzS;Fy4wbbxt
zOkUr^O=L-leQCeptlBthuJsye(YXEDh{mh8Lu=>`MJVbtq+g9pUr5AMZ`83Cg>U8c
zE?;EHrY-f1<aP|@gglKI@K^&*8f&sTr+^4W#cn;VGGkgsu~!33aMyHwLr*$~XymW3
zJU|2sKckF9<6bHiX8Nf+8jRm15t3S~(ZI%xyk&m6e1M%|sA0qWu*gP>^rO_IGHrgn
z9{Rbky_s;ikcYg{yi27+G`w*9%E6YF*V_RBMV&1P+B<$-*u0*8rSXC@a=@bqc43hg
zSqB!a@MT!)C<3LN3{a>Sg*z?9@B2R*xoxg&nr<U;UNLj1mIQ`l;evO*8>&wr0ijAe
zP7rYlLax_6Jw?4RPNIhc2nK=pDK<=*!r>XOBKKR#HPWHP9Pel=JaHh#J*i|s0Bd14
zkp<&6hZyL3-b~iT9sY7ppU7TVS5}p7$=F`>Hj0~=ruTS|Nl)R97A&Z5B^)#*A0<j?
z-<f~W==i!iv9nn4=~8^-1`gq<c?p5VN%D3Z=8=GSqLFZtqo!JnmFD16{SSJgNk&j4
z0qL`)qzs9KVb#KUNYmJ_zJ4Pv=U?dL13S<tn^VT2#kWHP6vl&0nia!Z%^JC|>5|dH
zdNElIXNyW!pB(QZe>$X%<^lJhiE4&7=4p{Y2<WnHVnHg)yjC0N{>3cyQt;w^R%ILJ
zKG-K2=M3YegIQ+Z_%egfVX&K7!w;GCtI7QgMM%iG)BL&C2F4g_QvQM$VM5zB%kSQ-
zr)<kR2IwhHQSWiQR@ug9D?uAh%XhP#{oEuq7CQavS$SQ%{vl=1P<GBjo}7JLw(JNw
z^x{Brg&eWXiphv6drph$N*nTs+1i~7N;`x&`BZ^=-$;(jr~-sK#irAbyCoNOdsA7q
zh_j%AwXCGgfUUcG7m8`X3f<!IaBP}EjZ3S%c$r|g6z#NJr+svEt~tomG6JieA}w6q
zQg04&(DmF^n~x~cVUFgY3HKp$UQs{oDB7>GN3<Cj2ox!W>p?>lz=c_uMwY`nJXXI&
zS-_nUKK_8=$`7+E_QuI^M?Q;?_YMTpaWpuBUcmDABMwYN)HyeETRT;Y%qa_xbhxh+
zePr_FFT_^b`WgRS61T?i_cF63PJ6=_cB{A*7UaenbSw=YY;LvoCeX>H2Lh<3T8K9K
zoyC+i`^idTINCyEEf0xbX+pcv4Y4)06OBOYzxLFJda?*d3-tx0>JjrFzC;;O^~K*&
z7;qwEI3MzYGIPCmcP*p~%e@UhvZZ>`=MLw+Zp0@_u`=rA<cw4RlO(i9R<lXFVu%D&
zIfNTgb`L#q)monSZsG<f841aF{Kj)&Pg3zFCM^LPPE#~4&CQG%4yA+%Lm)(wn#@oS
zLW9ii-K8YjTRy1e$2T`+x@0Lf7$oM3?x5DP#Afb0Ax)jSX>&3a8AsA7PXZP}Ngu#{
z7b9p$3YMb;HqR`8?e~$DayUORg&UHOS|~P@r4EPmPAjH2lgzH+Lhb36a!q<Uj9ic_
zYq9&9V#9x{yzz#Tl>;H*^3~9#{p=dw{n(a~{j}zdzM?Um0jIf@swtCk_w+H)@Td~2
z6}#-&5w$z#8*U+rwjO?{<3s8WK`7?79t#5A7+Z-JI?Q8dp*WhPCG^-vDGi&*L${Pe
z#e#;O0viSX`1zd6%*A!Zb6{SP_=Fu0)Utx+9nJQ1_4YKoY_fB+e_mclSz_?TjCzKX
zgRN7wXWiCFXAd<HD1PdT*d*tnkQ63YkAcS7vkbvQz3#KOj1D&fK-e7mRd7~2JLPeM
zuM=1&q{??aq0bLRs<|-s=vOIpKRVOy7Lf2w_N<Dyq2ll<C9ZZ#awd*$E%!s_J;M*@
zcQB*k^<T_^4Swh}z<Rge9A|9=9=y`9M+7=gUW1K2q`EU0%5*W{5YilKmr<P{Y!=0B
zh@&itbKw`cEua9IA@O@(Mii$c=%m79Ufz2-OyL(-<@#*IB;jcD&7^3Pmy=!DH2ouD
zP_=8zk70`yBB1~EM5E401PPEZY{_c}J<K@T;*FYpHd^I=Xwk<DO4ZZYu>x3b_W0T6
z&S)zp=~qLXy$O^Op>-CE%&`d8?#xO*T~<?SOU+%fT6&hP3XA#LwUHPs?#U=_-h;v{
z!J^NIG6#ayULCH6{xQdeqtC--4j16+NM#y>Y)0rQ6CaDubpAD+wD+Fd*0G)OP~e5N
z|8%A8{P8m&bNY1~ak#wr<D(4-|J%WIWA@C!l9Klb!^;zwW>DIbvJ!RZm%;3m>l?nJ
zUCZcVBcUyAmO<d*XCruZJE(mg15tF~eYwCJ=Z=v)x7DI4?!)BS0b-WaR<9@aTpCf+
ztXCPb_+p0?Jz4ar&1b^Ega43ma^OsS%eU%yhf-llVz>VsVl2(Lpe7udA$GLHF4+mL
zW#@#N&fmV&`B0PUJ%zyw=Ma577{bqi`%PPFSdQNzXaXHnF_^9?0#)~Nl$i;&g9z=i
zMB$0<geDg2m4PHnPUhc!6&Z6hfak`yg$^gd4wLX}vm|ya1ex6IXB<s|sit6R`8)IO
zn__o*Z^LY{SuMr^$fS=g+Y=e0kDOu?QBZ0K?vlj;a-V5I%P;+LbHk@E-xRss*%v&E
zFlvsbyrCBe{$W(X^?SY^;v6-=V|BEca+rs&R*wH@Vs;~Lb_z={wJV~6ZMHVD8Uz)a
z%EeJA5zIHIEaNERffH(V$|04N7(2LP)XoBV5`eXVvDQZzv{JqG+lCU*8(N^9(PcR!
z?RL7gQw`Hu;4#p}$az^=^5E^zPYJ}+aTZk(gY*!kQk;(7iKdu#)!%rg)s+H5!2?Ei
zn?^IS{F7MaUNhI^aXAvc-NMoQ%}))AP=097oO@n9J7;3|53aEDbuhy=FmQE@vS&>P
zX4-^fZq6E?mt~LE!zsz^En0=62uo_`&nWJ=Ej1bu^&eJmz_6z-p0?G6MzG3nwnGW1
z{;oP1?t|ph-OT`oW0{N?Hj(z+>p4j#)KaGrq5q!mvC^mXYm8kcd@HxND$xWh$ymO4
z{_vU>O(HQvV!f+ShnJpZF4SqKcZkCjL@)x6tmcsM9oK4KyllcTsWA-K??gu?AA!1f
zI3WSiQz=}5uX&8U=|qRqoFjn1AiEa?M$?I58ah1LzhQi1UReoZtbMoZs%*e%mda_W
zBUfPD#Ya#ma?zB(x1mtA>rc~kK+Czw;*RlB)1FnJs(EWUE->HoU)oCd<!{=$3<7Oh
zcF2NSH?J2!YiWKphs${DE7FE9G}#c+@1=+}GP_g7B^TaU<Qsb-yjs&Mzn=%mdX}Xv
z>bIcTETiv3FPf25;1P~au-w|6J|vxLkUiZ!s0NB1mhgFjnz}hPzi~cqkaY43PJD)<
zv-Ho71)$W5nzVMdq{<hVZu1LWhjO1~AWr6^M`w-P(LlE2uXDyX#^r~W41y;94Lc6!
z^Lf?H$NIXx(>VLqoSXW4z;kIxbfN}t&8uT4>vBfwR|?ovD(<dIgxK7A){?>9>Fz%~
zAj3qZeSS%pTB$a5OqMWD?<><5X2QJw!LxZc*4<7zXI0t&2$f{iA1lXy16g&vXIok+
ztr;qYK*=~-f?B5$X)%jSXtw0fe~RBjI6F6)YhZ$#6=Jw`tPJ~0EC~bWD+gEnnhV-G
zbE8yW&J9^DJQsw0cE6CjYccH>SDw2Kpqsj*Zl`wIvx1Ep@i)+4a3)TPFZJ~Uob#tP
zJo~oiCNDZ`_~!mAyAlBJFk+q<FQZ<12g6$hu8GIzoNIE577{}fcFG1x@dx98j6!s}
zxGVil3OmK7uS%)ur22He!3_?CZA;jZfJDUpY)pm^O#>fp8f%iV&k~q*e-ZU8)E=Hi
z-J_kV&Ym?!BVvT)bgM-UIWc!ql2{E&+;I}C`&PNF#|dQt?6G}N>$#=#o5Rha#h1@G
zKP(-85t#%KZbG{p-!(C$KD=?AeB8(%?)%6d^cp&`c-*G9o_klBcQaT6cS;oKnJ3&}
zbmnQ3yLzknEDGRMI36_*`kt*+#ed|uBlS;zY4Z}ST~uD4D1{g-lQ9Zli$S`lXqRTs
zGXX}(@m*IhT#j-|bY<QT6c3GcpP5|pdvjo=2?ctbGc@K=5};0QzJLabbTS0z5u|r?
z4bk-f;w*5#^MO5l#$hHcF%&vv5vVf8y1v5FDISz_0@Z!P!If*o;R)h`sJpF}MojV<
z-!ad$SwD;3DN9I#-+Gg)kZ>gB%b7?!2MmGnmX7(f`}6Cg?Upwyx~tJw9g0HrXkUB7
z6ZWAC=qQz};nw%KEQ+&!SO&TMWY*Ct0jRw0Kedq1s`xn^W-nI!IlqU`?qG-WcXhYN
zWrzpB{a87or`t|4XSAF;&<CLB#JRT&L#IR@yH+zJuNse_aWBRi@eSq%r9hP|x)mis
zI=<D$ZtS+Oi`ENaq;FA9z+S%UXB8|->EH(cjoS)?9LwURlkDaE7D<oecEX%Hfu&T=
zHOGL*@CPod{V%|N?+w<!=Wz<nl{vnU@#jNgGhW5A4;L;#3W#5+QSo|rMO=}MTEy{|
zEp~v5_|8`Gg0P*p0mk;K@Kc)MYel5Vr_f8D@nXSg7I{d%h!@NklvqSOyzRi*lNB9!
zP6xAoF^(cd9K7r-eZr)p&Kdk8!YLWSAxi-VBxd6w@V+r+;Df&pc;FeJkZ?7~t0P1X
zAz}y?O$@BP;}?LyBTE7lG1?Q4QCbita?2IRzf}hm^IO7kGMAorI8u|o;56lETSFO0
zTdvpspE8oT+uvl@Iy_-#cH+qwV9g-%UL=ewfw;n#5dJd>`!k*S*e;5ye$m=1j~OwF
zYIS8)fFT(ftYk3E9*f#_a~z@!{aPTL2E|)Gv(Z$~;Lxl<^{<@n5XLCSe=#iWNlH3e
zD7c7joK5gfOWB()L^I2tq{WBUC|j<0-jK0=H%%|Qmn~cF0v^?f7zF5rv?hSy;}yOU
z0;Ip1qu?|9LGXc7H(RbZaz(EHc$T=L$0}xsRI$i-5DIVbbstNZLXjJ4**T4xQ72oH
zxwUG?N~wAsF3TIUU35)on9Hr!Oc1%66gFHqyqY{FGOwQ_$+WuyhM5qHgGOe(m<ce|
zR1Q;}O{1(;O%bd8UyhYybNa}<clbc`*aigxI|o|rHvuQf*HglL?l-5Nt}B-*rP*s4
z8GB6qd`AOHIGQRH9?<vDkPw&)WrtHlHxyIa%<ZwvIL?Muv#p&ykz_Qei+EX9)zWU~
zryx^8!W-D5)#P4evHMzwzQK|Z?v6z2EC+UUPeGYsUX?Zg(wc&E$0l<#_o-#g9L`h=
z))bUdE&XVwcCdSPL<~gg@hoa*Ho|Tpl6}O~Dygawq4is3hDoF`9`H=s(YO^fABP%P
z>vmQddW;5F&@_lj^BqC`IjV6o)|l0s6K1=fzlV;6<w;*G?y?Z(UTX|ZM6j;PAlA<$
z!e1&i3yfwsc>q$3U$NfMo;p?BRtN039lu36?={%DKfY~GKi|YDt2!{cxOStNbxHzW
z@pNvl0$pGL6q|oO9!OA7Z!{ax0P;L_+)ji87-gtS5BO=x^)F!G(l3tUkG8A>hvFz&
zK^G~1E;;rGN@#wR-Vi7R0BI_H%SbEXSFV%N#k1+re3VH*pX;>Wc@4&uI_LA+|BCYt
zlk&x|86(;s?D>DQx*%=c^@{A_99eI8*#6uBOCF<vktOV6O73)LRl00L{WlN_DqbF%
zZ|C}}E4P-trv^@1$_OG)&U0}z)j$yP0}>CYT;HmQ3L+HeL>b!o=$0Z}Q2YK_#XlVB
zX~KRmVONDc9`)4KQ&J=?D^#dECnS&=!UYH}((J*EAs*m)7C^$Zy61yhl;SYYJp!0y
zGEmNH-<4Kjm!Y#ESk`dGXmG&*UaondC-<rEi?|%X;R?RCstlFIfm)C&U;<2Me@PBc
zhY_R>q#!9@kexxR52EJgt!Bvr7}hI2NC<(3W=O>7(a3XVzXm&YQu|>WWWVm^F=ZG!
z-bO5BN0KVp_JJRd0H*@ovsG9o;~4hwaAf=Of>9tsI0-xex%=iQZz<psI6FCgJS(Rk
z^H}+h7l0DpOkPD#bu$iT|Ax%#{y1evveBt?`54_nEXfHNYMjuqblb&3&yy+tFOnPA
z>Om7=g0?4`{C((e4IudjjRm2n^SQb@b*_Ph7#i79#lnXew6D=$7XI#@**DSb1*w=)
zGkV;pIMkD-{MPd?(TDSX6(j#v$yTu7U3jzK&w*#5%_;4R!0pc(gWoKQTl+79trN|4
z`ldIZK(RBMkq=QEEXUF9IaT>9idvE2ZDD16QA;l5#-o8{i%1YIAacdbw2W(c52-@^
zVXGW1iAMXrN|3_O?CC$k0A4E){5w;7Dg=kg${*-wL;m(LtyEdt8K<kA^JV**-RbI;
z{@QVUHTC{+0IoVNa3%08av*K@swuZ_FWsmQhk<wiE^46XmhzTWadsXX>WRaW8=*Vt
zpL8MR%k81T>2URgWjxkyr+1T+NSW4Sw}uocLcv?|W8V;sw-myG0N~7A)dq_fTCK8t
zX9AB)gP|w?8eT!q{;8{7e#BQ<1f2N)RSD@~o+0LHPQ??3;PTSt4{qk<SE74ZP>_xi
z8&{df<?s5o^l@!dtlHoipulx|aYn<q!^w9)vaL;V7KT1g06!$2sv1id@5-j7awXf>
zSTd_CQ}H%P^3fJ5I_KZ812k@1d3J**J{OJ6%8!<pq>#$pXG24FD-A9b6h1<JNL<Q2
zIhh_KjOKegOCN0=!<7FL)re6>blTK|BcBRia>;Zw?|G&VUV~b``=rq?O~I5oQl=5(
zg&<w_nzeGlamNKoPYdlE?l|isV(WaU0^hqHworY;*Mi2q3$>V`;yQPBK^V8BdlN$A
zvXm)$efHVisar99(hf8pO`$VFNeOnc7W5S@;W}(k(=YoUjvh6*Jm&Q!GpJtckD`A=
zV)Wf&bEoXL>(8zWfD*_u!lZ2X7N1#d3XS(#aO7*Z9YK+xnq9#+LgU3RhcrpEO#G{_
zD8|3DS(QXqQ+jx!OZdhg0`KQYq7q5SRcVs6bPz-mXb;WEV#=)Dw!x|iB5UA#&4>I#
zZG*?yVjSF`07v(%J;<dOeqd6@nytM<vf>_{z4E1ct@D8!PR1>yTk`2VgBMorm-U7T
zML(<I+chZ%`>lWKNa9$!AC|L5xWakJPMGwRRzg-{k=>vVcz3h@U8ibx|ManJ_iG1V
ztk&x%&<kYpkmGY1v7QkX;zVjLeh&*`=MABRI6MdR^jx{omW0NM2n8>5$43ouvxi0=
z^}T^0Y$*4K5juC|cMT~nN6#z1#c~)1N+cGPqO^P<OhKT7kzq}MW*V9y9M5h7E^TNK
z1QqeAI=ssFwjmIv<9P*@bZz<O6~gdy4d$arvkgJyyU)(EKwJn@Y<iX0n~qu@9jq*D
z<9QdDSI<t6b(ND`_RAqfGo^A*lBSC)=PzMo)?T5=o|KJnd}$}ZrV<fJ$ncA@aX@hf
zHl7JYHh4i~bbMIhqJuY3ER1ud9eYOhoqFh4st`xQtGWaEOQ(To-&iE`5f$Sm$4H&e
zRNh;D%j0jQ)L-4a-Gla@s`JF1E3DLj543NlSwg0{HY~zPm=Ci~wo{c?n9abU%`RWI
z7jb6So){W^cTOPBd(D~4O!dQpZ0OqB100<v9+yGQ1l8>#n--@#nS<I(oM{cN$>vUB
zbu?Vgq1d?+xtH&qUx}6Jh6peM?^kd}Yo!#??j6eMQEJCALl15q1b>~TuA9=@i|`(w
zbLv_R#mSuJEU^9@Zoj3Orm%fYQSpN?aSCpqf&%p-O5q?%Y#u#x)cmG}dlDD9v7=pB
z-=$*MrPMCNcbsTYL@CGlO<I#7JJAA%e9T?KV~WN7E`L94>QRFavir-ffS#PiKZc*%
z0FA1<lm*aIXWmZ7QBRbO{IhVvJi;9K-H?hR%AZ}f#MyFop1rz!ja0GDNle;*a2OvX
zE_%CydfQtQ7?GMvcE9>?mPA1fb6pO;;=CvJzgX&>bKkmZeK^JRHcgdSrVac>pX+-a
z$5vBy$oX*AZt3Ys8iH7W92-2W1h1~mQUqTPKXhHr3WbGHUmf$Rp7e4CBre~1KYSoJ
z4#>E=GWceB{|eZ4?`iOQ&4U(83U=^ig2d)t^NF{gg5aA{joXFbJ5C|bYrf^{bMIGf
zr_-cIZp}~H&j{|VzfA<}iY#xO2`(81J8vZ-M9+SaJeh4zdN^x*AOKF86i@u!URvUi
z8hji4kQX0s{0nomz2isnjCPl%!s?B<6yah3(o<|1Dzj*Ax~iSus@={3cQd|gym@N|
z%R#9JcPp<H7;TqX|M=p<uJ$I|Pje4~FI0mcO8vyj|CNRR@ek?F%+E_#Yp<XFZt(VF
zkw~kbWBc(wbjw6ksao40SS%X-8wBFw>VRv{4))7Y04uOPdXYA$q@q&S<dZ!8T_TD&
z1`Tdx6tKH6TK}yjptSTP_lw`bap{q6yvYOD#7agUle7JpVaa#vkpIAhg0`T#I_2g7
zmgawPuzn_pU*F`hH6o>`$m@Lomtjd+8(TO4_GgQr7N4Ai4|fxYbtJ0>ZKke25%CJu
z3A(gU$%X1sBa(98?LaQ%{yR~GU+U5SSNhoW0uHwF!~SD?i`l?@tDkj)q#NT{u=E1H
zN_iKcJNNT7*q8UrA+)$UIIdagK63Q7^oLYvGX<{{P9;nt7}M(N1c7;`vLrU!`F@=;
zwNX5h8??Nm>y6_)3~q4kXSm&5UXrcTeMYAGd3SqLWQ*j9@&hrih=ga9^+Q9!=iy@!
z&PEs8GhVhcxJ;nTor>yjjJfxc48Zy!{=tDw{LL<VR>8i2SwX@rL0VfYQx{(5O;LQR
zq?l?M!T2-n7WBYx2{Nm^Lk<BS^}P|1qXCB6=G@1|yk{giEi^YQ$qV9v-PW-`9>;BX
zEs4+i4!Wl5Gz<w4)It5v7_t$=q_9Sf(bIQ(fswsA{y%i6=x(0;t5Z${wglJxhx{DT
zYt7R^CpG??Mn|hwrI%una3k5-p-zd6Z60lnN4!{pZLNP@&&;@_WkCq;-~8yBC#Vy!
zdGOg{HsuQ)r<8OCgvlD!>r4qT`~{eM&z;p_Cm&m{`rvpBm4!p;MB7>7@Qw9b^mJ;}
zSzKjrAD%|jfL=P9NhJ`UyE@|Q1g-H&&~~S}MtlQH?++p&ukya~0i=}kCF;J+vDk@P
zD~Pz{PrgG}_SF&=$t6k4pB8!t`SVWB*BQM$*))MQpXTXp+8LImoY|RP&=B|tE{`gJ
zlJ&3q`2UOv!UM!Q9ui9jipLrLy(VTN^Jj^vBf!jO{npxB!rAfWVgHox&-@@^-Lw^+
zMhS^af&Qgo5k4qi4xa}5j9#?gl(s);wC3KICo=tM&wsz^zlK|MQpDdE|1WP|wCdBp
zJXe3L9{+hup5$K~^nYFUh+h1|U)u6tUmbw*|JZR5?f<dkAl9$t-&p_q?(Y|7QcirV
zoMZ8nX-8>U-8~9}N=r{a&B005j~ub;v4-2P&2pz=n<q8&FTkj_*9L(cL;RQL<HtF<
zBdX7-b@uiXj==uBx4%VRt0+zX&_U4TB)=O=KDorX_&PUEQ>4j@J1R)A^GNrF?Cqgp
zL`FbQDTML&cfWuw-I&$p!^Dw)R|6B~IsDxiRj^WtJ_sm8$<SiDUeoNo9t))$IaPM9
z)1p^=>X*$nW!C}E!R8DM_lRgE!v<%I?1J-<M$T#RHLXSCmCi~3Nj_$N{SW9>5ejgw
zw#Pi8%cgBEH!Dkh0(sTcDA@enpN@pVV-ugI7DXsZ#nzALOgL-xueN6zn!BxQ+4EVq
zh%QJA>Z-orb-Uw@j0i+er`A_8_vJ28<Z9<#BAE{9oHkwt+=CuFCuqNtSm=QNxH!Lr
zpoKXi11s?kB&*Fb4Dp+r#!Ss$U5~jR^Y0<zj<%=Ka<yATXy1XT{W1f;MZQ;>%ntX*
zT*rlAK6z4pFN@sap)|g{)6l>yC$Ku=1~$vkOAwMUux6sGYvh!3SKw_WcD33(A3eY<
zNb$$u!5R*rFC@~3c%na{bsv=$E}^hF0s4)BR41G$o--ml&yg$Z;qd0)J;S0|{~l1f
z7nhLdCSo6%C^Zbz@sJe-3d#l8fTzhY9MLG#+w=cb1$C!wKYEjzF%>Q0)9~GpUYa%9
zH#WWI@Z0t5v^#5z<dLBGl&t);Mfu}8O`e{>?>+pbB|8DTZ!*$G7R8$zf##}NOfz%f
z{<WwSrf)MoU(LJ8`cu9|v)EK-di?~ZAu9AGUu0`Mx?_^#=Xsw?iiCe-&m4F~XGMxl
z?Q%(8q$Vj$d&U3bDQO$V(w1?sWpHLcgz4opt_CX+H{AsH@TK|pL2H(Pb+sdz(I1ns
zchu{@K4%`}y*RmXe#}Mo47E3NW8&Q#j^ZCoe8X>*SMJ_pzRw@t`rT@qyOi9e|0scx
z&&9Q%^CF4WSXg7VK_uA$5r1?Yv7;)6v~BNW>K?jel@wOQAFI{PxtzDUIK%@UgI*!t
z9$mrSv8CZDV+|h$gjpx0TrM3Y(Nq4(xYD;z{<E&>-$}Mc#cJ7IxwU&clE;pGxB=6+
zoe$H?T})Y9d6O_X>8Zf4C|19lvO)ZzOCu!I;oW5JQQ1%Sxv$SiVFE5>7o-_YIQ@D!
z3EwB8J9Noy;{5kd(z@M0H6K&z;?((KTS(cHqrq;Q^}C7n$d!)8BEV`m2(1UFauY3|
zy&=sUXIhryuHV=)G~&TGF;Dz&)(rLWug0&ie@Lnmz|RsqXJT)#tKkP*pEIZ9t;(xx
z6zo{=yxAAEKVpNbrgY1qbf#Br8xsfpEf6W-Nc)>i+{b<hZ?WNPp@gw&hRsf1e~kd3
zSH5?+GY{TwJ47Jh+_}vn!&?UmpA>EHFC+}~=`@hr+Rkai?Pn5(QuvMAmNtB+q{(kj
z#dHD>b^-~CSb@(3RS6#I);#|)?`c=y-#J5giH)wgLsq=lbgwmom$1+<=pTG`^ScQ+
zhy08sXx9Rili+7_Bt|{}3&7`qH4z>!{@`i9Sr@%y7dHuhK3HTx5+KKT)mQK=1{|nk
zPwmxQp9^!=*+e3gQ%hQ0`f45v%1(M0Ylmt_uSu0^os?SF^lIZ>9d+)j;>E?9<(a`P
z@lBY&*VVl(qHi8Z)#XNS0|@TdE741IWFTIB6gFpbv$SJh%Emt*+q|{S{2c2bt!^L^
zmNuH#=w)0Zk(bf5&_6f>yUoyFAGkXGW-hNI>8|hYR*|TknFHrWSsp&i{UDU&tjd#>
zbeSVz@JrEuQnZ=1x3rPjcmKWR-Hpt@BZn^c@2qn0yLXZlh`>Q-AJ~$d6%Q9iza5_U
zQZ4&6_85@cKS$|ZzE(=4F+%>4{)}eQVgJ##<I&s{$@=go7G|d15FER^fjdgzT3i~v
zq<6!I)!vFNF3_%R={MdZ>zwec)jG$wnHrgL@0+=1Ru<$HZnq6n6uOUpjXV!NTEd;I
zB%)pj6y~9QA@oXbYx!Piw0RCA$x-pRqeKW8-|biGxlvKXq7e2@L;AG^`N1!CGfYW#
zc!RwWAHx{5uJr@wd1>>Oa(^|p&2|L`Hh#ZIN^9n#;~CpLnW42e&mZpB-5&6Dup^*w
zAee3E9%)Sa5;INg9FxA*m}hGylPu3t()8`|Q8fc&TX3OLxl=O`TR8U=PsY?j-z%w?
zQ7zF_9<{DpE>XBTeh$9LAQR3KWlELaUw+J-HKqg#zrBJn-fyd6Eo+FjiM_2hGwN*?
zjMJo!)R@H6W{osyVjhOyxr(=n{jevmK<Yv5y~|ny&g$HKp7!%`MWhV5eZZDYOc9AE
z#1rFo1G8ZfPRBqq16=OiXLox~{hBP6=mM|r3XIv{v1RDy>I3RzwPp=CQL_yk8GboD
zWcQuH;PVjaGZa<zDyZxI6u}A7V*9HRo%7<ieS8mxqvW=OJ^_d&+zWgr8wDG?f>ofy
zf_-YgM>K!TwMsH+UPeF5{G4e%wWwxxbcA>-4rcERpkKQ%<~;m9{rm=VjV<NbeJ|@3
z^$4VntC`r%r>gH96yhNL>Cxlht(YgMj^eH+pAuQ2ZnO^vIF&!82V29i_7@(PzS>YP
zZ*IMr{zS}MNpBJ?G>Do-!dJ={e8Q{j;Ba!qylk}jaFyb9tkvC^_(ivRG2L(B<C)KU
z#ZZ&kw8F|C{UF5dN$G&7cYT-6Md>Rp3TD#DOIln{w19Qf5Bu4wE1f;>dg82EhBg?C
zsX@(L1*MlaCN`UO)mUyXlM3*VRS%R;5*lMgy$gr-mncj`KAPFcguHXHefcXZDmLK2
zCqv2b2-ZH{DLqbjK}x+4fTGIX_W0PphTO;gn?k{kbXN11l6T&#;~s`a8Z_Vd5d|qY
zB^)WZqd9E6V>`Ux=vS6HFZ~uA{)JlsK7XT&meN2}*ZrLY^@?WGIiN`*1<)LJ4t;mc
z@l-qab3@)Mx!ZgjFy}VUXYcQE>p5W_1XA>;_?NU)3S+6`kL@ETH9RA?s6^}eJ4?Nu
zzY>GzPuGgpG5kn7c<T=2luB|IK1M$8i{_5G?*U<T#UC{d8_F3>HRL;v6g~UM47r@W
zRpU#&iqmyD0I)PWNp4}qFKxf=d@=|8YRCUiZW!N13Hg9Yohj<&)&J%)HYJwcgga!T
zqfaQR-_9_d;71?7|Ft=ghn7W*t=ywjnLw0AHiyj4X+pdj);g;3m;%f;qdxKMkx!_q
z$ESYPfq7RNV4wfNU>^kXMiDAU9h+#~CwTZtQ~%7j{UxDX?d;t0<Qm=iiG4qAx+Mxw
zeZdcF=iqqx#^*vJaMSk%Qr;uh0d<|<11!W0wQ<;MOv-p>_s?y0R67jFjgYuVUpzx~
zPSDISPVn)t54}URzwoZ;)PQO1?O;(D;Q9i+rZd72|CA@yNR0cwtBtSPc35zU<+CF6
zN(=~p<>kWG)>Pk$`TcL!<A>~lujk^St1sp3OU2`ccwy}Nx3%)hnZ%76c8*1_q~X7B
zDwP<S%Mi9M*#(B2DBqV~YXNT*D@Gt73KOVEZ;$!+xY9v@!T{)MX0ac{l++sN%A_NW
zVlqBc@3pMcm6hulfm7~?!2Qn1&s#}^ZKGoEb<t-do4VSJAtA2w4`!*voY>os;f7Hi
zF&{jhlOWv6{XlW{{Wy%ovJDusBe_C>?6ftxHOKwV<VIPWOiAen6$~KbnJa<KZeaeI
zG)u@Z)B{1_lY?}F^V2?^^HN`gbPDi?r;MaPIb{yP)%ho4OnJM#9q=9^sR8JoT>?wr
zGwGD%WY0}(qTI-bTdNO{_>%UitwY_M0~$_V$gJtIWcDb+i+<tY3A;Lonszz$*N>dH
z06zH?LuK?TFR^N4TQKa*d^0{rAZbMC)Yub@mFjBLL(jYOp;0V38t89DKu!|^37
zZ}?~;1G{yYcF?jLr<CQ(ZmOapQNvX<yp>y^5i>ixN&|LP{pT<Q`ktG=Eo43XpFfz(
zdv}=lKF8_E&80w4CRvbpCMy3F5+}4!b<R$#imEAKFZ%6t{z`CiWy~hw9bC;xck_h)
zAb!fnx~dNXfV}rBz|E4Yq{Q6B79@r&S>-uMPfo76n_6Uh7@pjgV;{t8*}iSYHaD5i
zvT_d2mOfJe*V=U-19z>#N}uGUe0ZA6w;+saPHUKMe>m|QyOK1Fn~P^=1;~-dq}>Wv
z3YRs!TPNj+x6mxHc^-~O0?kj{oJu<1y<FQ<3PTtnE5B9liaL+Cl1c6dqCPOY%^kAM
zhq=JS$KCFvsS1(%Ru}ys@Yr#818Jhl&sjK0QRSRPtysrQtPz~s>#V0NB2_}daINQk
z&RjrG+Ih(=oO&I!B;=d~WeGVO<EeXmyjRE5{I#U?mArSuD8OTi>*<r#*isl$2WBFv
z$YUd~q{G3f02XPSmWI&kaa1P=Hc)+zVFh~iy@54uE1C9SG2gl(eYZPDc6y?HEdQWs
z*61`IvV7leiw{(KQRh8LCTf`L`FTI3k&7*d1-tuIG%zlBb)sKPvC-(|?PtrYj#xkl
z#~1dHV}ERXOs{DQ)(ABpc5(UIN8L>tDQpg8p-$kw3>6y6#h#K*-mbjQnBmm?ZCF|5
zqiT{p7=gq&#P75JK4U7jV7fLt15^~U;pQ>C^v!B9n_4|J_1ortkPiFtZ1McHHHxY$
z(6-)*sjtNT4|8uF6=nDJ4`YBxDuR@PAYIZWB_T>kcS*-c_mGMV?V!@#Nap}Ucf*iF
z4oD6m9Ygb8_w#)3`+4qnt>5qc_g!liYcYAQ>+G}lXMgtR?4y-mHfasAZnf*$N3Ad#
zi<?;b@~oDLFLVN{VhJ878Bp5^<Sne3B7Eb&r*lOU&)hBX%EnQvuAU~5fFT6j>ZhSq
z;@)@i^UyZeOUgJ_snDyQrI{62c%AA3L0Et(j-HKB(O`BZEcHFjbPABDRh01k?yUn&
z`dMTR|0zsHoX?29N_XDu_$#hU8MFq^-~gGen!T=I4fZEHPp#Kbr8>V}9Rhjgtws^U
zT-E}3(bP~?S(h~!*6rJtL6(wi|A=xR?=&#?hiU1f?}~OwQ-$|Fl)VKZT5{EEZLmvh
zn>?6u@BY)(|KR8U8}I`GSMgj7#{RG_1NA*j^e<0YLwdO4ss$`9H=)Z<Te#C>OXD#U
zf2GE0jw3K6)hQTRFEvWtq2=#Z3NS_KeX-Ha+O&(V_NotN@8;=98%t=I3erA4k{;vf
zdbO)q74Tvhp4fen!o|h!b>bTwoR!IrRx70DeHf)4E+lW)DB{96c+&mOzNv&uN^nV9
zZ-`Z=t+ZjLgCao}PefRMW#21GdUAiomF$$bbJO;8W{k2N#`j{MG~_La<7pXTtWn8h
zy><M@i{i#sRs!^t;~Y;34PJ}Kd{KUW2k_hHaGX~u9+}1Q2u%QEPR~49xoH#bc?GZO
zI|>ecxxzJwy+3)+7DxZyEHBMwLi<$s+l^B1D)9$8mb)$lCAw9S9Q?v=qdPr-({J!G
zug)>@nQ`TFukpktsJ;-ZW^4l-k`h-4!`%(rQ(FZa0A5{}NW|7N8iKqV3s60nd9C%s
z$30w=!eYDY#hOun#ql{(bhGk9BBSE5$v;2KC;*NE_$)9sVbRBrF_~qIQXrNo&K*4{
zvRdByXV0-hFMj^~42!lAV_#8|fc{F!C_Co72VHb^=6eG11JXqR(4WM?ipT(xfXXq_
z@cdJWsfd{NN<)v4?unvjJ`h=bZ)FqFD%vcQoMkB`hXp&OL;G=rZG=<ak~p)zKrA^B
zPZ*Lvt~ukU*DcXCYc|vHkrFK}2q~pnqHQtKzeVx!5gtymmn3?+^-M3Uu}S!EzYCjs
zHQ@jrO0!DWR8q6mb$JgpB#|~hsLm3}ostx4CUx=18=jfL=}R!PXl`4$5iYTXx}cl@
zyf^Nno>HgTqpQd}V(~L>ic8b>Lw(@lN&ewsi-`pLKdjPO-@AXEQ{fyH8@|*TEZTBp
zttjMod#*u7gG&17qZ8TU=9pxtpixRWj6eeM+<_A7EBoi_0ZAeH2=$B@PzUvqY3?W8
zffzOD3lkWSM6Jg#JvMwuIJ4aidn)F~xN+*Al3IE+BqP`CTI@THNf*~%Z}AZxxK1WH
zhMl<ydJh&uQj!XPmNk2>RTfRzMH&(7vaS_!u&v#7h>zWlGn+J{w;8JI4*}Zo=`JR}
zwMaC4$T~>~k70_RnDQN-ipC|?XMGw`O0ql@vlXY`G4@>z09yf0<)Pa9!Q!m`=gkBv
zd+(3_RFJGh&>4w9zcE1p%(@KB8?r~2O-^&m!*2*SqF!i}c;2br3ncM=)aF25@J1|f
zH>GEe9Q3S3b0zIY_@EVbj2eAjT=w{*@8}O)(gUdqb>68)jA<0-R{-`7w%n-QKrlZ*
z*2Of&4~5vU;V~gF)na`G9aL@qETUjK1izZmJUlOl>G{-EQ@)F=4k|y`?CT!f^*NPL
z{<d<7>{VMEJuvECVEV4cKJycQ@<5+-uC^<eQ4*7i4@s!(jRH~J?E0k3Og0l75oCt)
zbN9*g`ULvYg0bMEPV#NOdG{1n6Q^%;e)53tYX4KL*-*5i-Cbg;?{o42JKF$Wq+p-C
z{hK%pP5-w_?-A|vghD6s!@xqbus#6RWt&~>6XaP>KDT?F^kSz}K5U}#KZ4<F=bDAD
zRc6kfdEJxgZU=Ib;o>}j=N-BfY50!6YVs!^;0MHR<ACV=v}`zb8(`oxeE+8OjAU53
zF8A@OFF3jEdS&5Kuxm!>_qs2*@Xa+7+1)nS)ew_FkyeM21U>IQ^Ipu^0i5jt&Wi8#
z5{Jp`&*k)%#{Ofw-rTk5fb)Jf03&TL37H#dbsIo>oMH*khmYr8-#eb2&)24B(xG2|
zld5myb)Xpx#6BKg+R|HN@oKRqp)MdPd2iw_DDeT=uFfOP`mT%2nbr=Z(fI7DL3ID@
zjpkNdtV<^V2KKJd*jCy&XL~Y#YXQnM<Eo`H?KVEuyZ<`1dzD@it&dOYA>eIjWT9f@
z{<es40D$y>>pRPq={lK;3RbvY$EdQ}eDIOQS;vRyAaq!@<IfvZ1dkm3WAgdX2m5it
z@;%#EeM=+}!J%!Zfk2RB_GI#2`xLt#V<m&ke%v-~7POC)<Nvz`hI=5fITs-N4-Wt=
z;8J&R{jgyh_w?hD#3Nx*HS^NnV`7<~rMpIci~%@6^Ut-Ge6kTCWuoKA)bdeo6Dvg2
z_tY`+2%wuLWCq&4v+<qp*Q{VYo2FRVCbmB7zu`}V_*|G@b-&c}UrHmj-^ASZWSLrc
zuSmy^w03)>CzN@)<|x>gkQi`QPsk|xbiX9iuviZ7ygUO<q0v+vq%Q$)?R&mU%E{VA
zj>++|VfhPKo#1N~Mj3=`BGJ~rhrZ+WB`o(Du{L|W7*Y*EJb6vxyCVhRq`o^mIN@!+
zdTYHm{AIO&_jOAE0xmSwKa26ZhbLuf&w<*A{iJ3np9RTVt%$2&>aG-D%J+tKd`oz!
z^7YS1(Dlb2rCz!@bH%i<gmR1T^vR^POvz1Vi};FUb|zJE#l`)D{f(Gsk}3MCQkiPN
zu0_Jb+b|o!dfuajDwHh$+4gNqtG;cKx{(1)B|!0O5U1J)_pX8uU?nabFFb#YET!Wy
zW9VDbdm4;+ymXd4Lt9CfW;_-p7T)eEU&8B{ehS6(2#B9ZBwqFCpsADCoh=*RfnX{o
z8Zp5srdA>us+CHX(%tL#>bL2S1!c>T*f^I4S+ZhXXVR^jOgY)z=%NQt{y2s^RkC-3
ze4fnhI%n~9r#Bi8@1?P1a}){#v^&&<6K&8lK-0uZ+b|qQy!SN0h)87N@d|CT>RYzF
zv)Ro14-B#`4k6SUUGgTlt@RnGx6#hRM$22Wk5wA@R1-&5BOu{*%c_c`=F{3_PBMN9
zN3`h*2d@J1ApwVWBFv)Z&p@u6fkLO<x;jgv<KbI5Of;Vl6U;=PeB?Y9c)RiIL{Hb9
zuA|1z(I9pR$(DXfl`c+L%6Rs9o#lv~P9qU3mvqfpuGAu72(pSK2+mqNRr>BGXaTcH
zG<i6R=U*lHp1#n_QY9tr0@ob>PJDNGGUEH+noT;Ibj|;LOpmm$S*^7~WR7CvC^<7O
z8F@D{!GIWRJpz*HCKccs^XjA_W7@$?VkP1ZEkQ{|3u{05DWgrnAxnfUc<ZIFn`hMa
zPdVOm`H*ES;Qz*Ep|($gt3L7CGvkkzPMVzac<%ty$i&xxPlAIp{Ebu?MUBqL?~Nz0
z4w(`RQdATn_undv+V^4Bm``9EQ$N{|^k@=P%W3LH3lTG%uUGH_llaUo>PNWkFkzW9
zC{Nawd~#GFb8Y9d7$F&r_{rm&w(ktroN8MG;<F&%x9EHRPK-T+hsZ9{(yhh4RHY?Q
z*FrZg-@qMf$kfR8t)a>8_u;oj+iPE~yd*xFPL{{;8PdEOTO?VN6gBQXGkfQrXvJA>
zNcqYE6y?*>r_@@KNb=OC_}B~cA#F4mt7<_T1N?H?pZEK-Y4_b(p}|@mZE?6WUrfzd
zS35HAUJt4HAAS6>dx2}ICg&5S()S5Qvs;QeUv9qtQ}K@E^WO0((xs?o;qk=i7Lk^V
z4QxML^N&t7o$&7h#(%yYOv=2s)1V2N%d){H??Kk_*EC>#q$?lf2zmA?%qc^pU6bi|
zzt{Hxb5P1g<Pxi{u0Tu|X|s09bK|)K(a)bPWJjk+O&zllfR(5b>lgXO;X|rCr+VW#
zfZq-=$bM!IJq&pj%O|X}+^rcl8`E=>huh*q-?gOeNn_1Rm&=VnRr^R3{z%dRuK>$q
zOl{jdaxEzHE?^xDS*1}<6|4G;muXp#baC=(3zdn6W9tMovdVbc0}jo?JM4L+?~%Wq
zN-#m8s{W_i0sZfz%#<p+*b!jd|A3%_`EXe?ubRGVFd(Ptmt1)qM(FA}Q1`q$K`B0z
z(~9<@Ian^i!Pe9g9uYfv_hoIVx-JLFJr~BI2eb?<ucLEb=W$yyIv`>wX!cn@_!NID
zdt_kPK>-b$vosu#@B3B~P08tG)t8+ay(+5{+2O4GNPpMR)wJ{&C7{_?DMp1}k7XDi
zpc74Hj1#ixF`IAJf7@)Kx@p;7C_J#{tgS7b6+!vO@W-G{%Dlq6Ppl!ekDQMB?+<u)
z^n0aEn=NI!J~s(GzY#CypQ_-l>g;XBq6)oJ6(oVqeyHyE@PkgE0=aH#ErAgxLB(cb
zo}@xc#*_`{-9?i6t#$tH3OmHJ>hX_6hMw8%M&pt`PD-wMmcs}G@53i@^sT)^3cT-!
z4yTb6Jr3Hcg~xgnT)z?MI$q*}Sc-g~<zpdv5vnQDBgpG1=>>Q!8Zg9)lg5?R9hcG)
z*>Wk9{+C-ID;%S`-41FIdXbLfkKya56xXc8uW;ZG2YiCCpF&)VOZEKgNV;DCuuutB
zF?qfk9+&JYSEya}wFvS(ITo;{0rOz{?fM$rmv6rE(Q2uuP8f=%TY}4_J`H7&U2UKx
zz;u0?NS0TzpmyF~3z`0^qIpWA6_lQGK5Kp%t8{=YPdXvI>0@6cm?O$+bdx@}Us&<w
zL+whsn|9vZ&HDpk6zl_Z6<vet7k~(1keED_bDd^-FLV#dniy8@gz&+O*dO*$(-a+E
zLQ=G-<?9Vl4HJ-hqu-{DR?t_3n*FiDUW)7)dk4$|jbve+pL<HIK*yGB+PrUT?h%iE
z5QBR#$s}A%=I5P&pAiStmrwq@!t5844PbYS7=~?_#T}!BYwc6|f=kf6wYc%0VhRmY
zWJ=WPpHMw-Kdf8Y^7>9|gm|2H&%r2a^eMxJW7wfpYA{?(5}uOJy}3fGFs4SG*>ne~
z5EuLD`<Z&0E|ctowXa3q`nnm6aUB**C<8fv50BumnZOz<=#|BnWRdjtPPW7s?H`fm
zf2^Jo?+3Pll4^C#VxB7>f?@JJ<|2}n-7~->c8zhh#Qp;j(S!LQ+5Iq9r`rR24baq+
zCMS+Guj6E(b}lkGJr4x#=_(q?f1J5kty^W7?9|j9mdKm0PL{`%(09J-K%XsGnSY|=
z^{^u!zgzt6$>Nox1#hCYcyT^GN!UK~?J-outvtO7&9xtJHFtZdNBllCzp$92&*bX7
zW&I+G+85ohh+fVeK!D%<owszlt|%^^o%S@}T({f|FoPwpWy}6|CH@TNbQNsg=(ff1
zhT<@&<@WLURZ#4NKbb83<L1M)SI*_PSgxD=djb1PRWv^yrBmikRMC9-pNRBE9}AzZ
z(_D1s*!q9_3AQe?RrM1rhV5@p-itS07oPcs$u7Yb!^^Erp0D-cFzwnDR^CRM!$cwl
z5MVDAP5YnAv1Q#GsJTcYyTpXvF<m>x$>SnQ4^lH3Udl|dSw-{8Z#T2XS9x95;tYSY
z(#s-;ZPs9+zXSrE=63I!&Wl4{{AV%{CDIc=zCo}i7<}GejQv{nUOHE$@mggj!!P3Y
zo{C;3EvU2>uVgOcEqqt(%|XyMmcl5m`6kyPllP=po~d(uP5yVckA&ipr7XU3b<Ur^
zF8v=Ls2lsQofFGPYy`jmsB~D1M<Qgc+71U=@Y7DXzn%?ZGoE#MuKoFEpAX{Rxk}ML
zDE04)`0uX`{l6Vb`2VlhZi^rMeV2i;cUF!0NmcFD$@@>boLb*0O6+Mtj_QcpQIS27
zff$evy<DVKkyfnD+YVeuVt1VV&X%nDd80v-3jm+op3AGB6BRldb2++t{@@jF?IZie
zqRshy9=b-1c6kxPOCrU0T*fxx8E@o5jmjsn6V7m?ag>15%BOeEJ7Go_-+^DrZ(07C
z=3?!~dcEN-P|YTa0f_-8z!H1b`m1Q{nJR;7tyZmiJEIv{#G8E19}8cTrGM(?_3z=9
zyj~q{D=Q3oWxHj-e>5J)z89i$?H|VLbub{IXge@%FGJC~keyVC*99Y%8Pk9+RNZJK
zA!&xxXI<8rRXhh{Vc|{k0tE_Plf)iE5`Rx-OZ~q@0{-@0B&u5Z`!9cc<9S^@k~0s<
zgBhsx-VqZ`S(i5q1ugPGXLttAels_utE-|iHm39Y1_LDS21Y#(&m+2?HtmLHlTHYV
zH=JuD*+eh--(WSu8W3JIZaWy&G{%;@*|q70EWZkkNByLJrd8VPg&BG1{9q-Q2xt6w
zjNCoVY*ZM9tE|YA(q8Se67Q>-7eG$Mlk$j$Qj6%w1j1*T<h)}qxv+GNAm0t>*cEsG
z$gOa38)|3n`{F7@C)P&J^WSHDmy=O%JUU$}R8~=W-}NVCurasLSZyk+w@mjT)~>Wn
zQ68wJ8$+lg5p*eMR*ng^vg_PYQLS`4wk_NiczMrC=T3u{x|&`#Zn#Q7V;Cv$ydAYb
zhtU7+mT?a1I~e1#u72da-MmYh_P}3uX@@s57e-H_#h_u3nXMx2GvY3#p_kqJC+qIq
z`1-l3QQ3j7Zi4sW$|%k8GqA_eBvoTCb7Ux6Xgq5*j^xGto@Ka{kzVfdMPl?mAJ?z^
zYW#9=vvdK<uREeHaAGCpgt-LPqg7})V2~%zarw>X8MrUIvRj!cE|44R#M23F7b-9^
zRw8_@mHtlivrA^{WF%vQrkVe&`c$!qmqHc#S5N;BM&rERvewq~=qgD<iY7~8-`VVD
zEk2j=^0<KRh<#vJkzb)?_U!{1R*caz6<uH2;?G{v-`~OaYETI@uBIdN{o}X(hm9%{
z|I3tCO$h(TTXedr*oMls>5~u6;4<oMe-YEK(qkObXDNZV<mYoJZaxChvDCi^K9y>-
z>V42k;rYDExK$Z6A~DzQj|t(XuQ;hFfjY{l&ejiqI125azpfeTC=S?7GXu>AtrN-V
zR$7pADxmmod?afn9Ck`t{1LkCSnVTC#D=kapxSjCLedC_w6xp(0~Tz+bS%!nTGa+P
z2osYh4FW<=<`3_BMn#=N=xp2(K9O_TKDG#$(n88!$LV~r$zGbc5RO7uNreppdwN%#
znaRR{T6gF3o@3PJ0>*KAdSyQi7AlLn`~rQrW?wwigjp0Or+(8h<X8EqG`%ItPeDt%
zP+g5p*TI<kOs)3YuaavlA)KyLz6$5`D8B_;4r=t>?NaaC8~I&)K8Hr4UEmy;;be8O
zYVm^))~7et_H&n#>*p-7HKqAZmwva;b}mG0wkJDD!bc5oN;2z*gYUe$qPMW?9BJ-Z
z!Eae{-KYi;r|FfaTt(r=3WS^Zzxrs}Jpx}`Io9zzr5P<=lS!H-GrAOiYR$QL+Y)~K
z>{$3PT>g?~C?tS5ei46DJjd3?Rf|{NfvWcH!bOS@^3wcMah2uv`M$5`hbYtyA7b%h
zSH}MK1AfrS$@uH^H@UUx*FFe8isJOH6Y+1dw9T~CX!n=g_0D;F0SkhsLr2qM9kV*g
zEej<f;;-8-JQJGtE)ie1t#=-hJxQg6<uFSRcEjhqxR4P83zyGrW^Pi+AsDe0JhOum
zlf7#Be^5ZvzZ_GV521Z{+4Y;4+-kZ+b<K&oLQtaj0ARgH8f6glEl4+|s_{gJF>amX
z(s*C^1{*qQIe@sSmz}x;O$$oWoex$i+(x-Gg%D0j;KE@G9yPfJ&cAcy5pSTncMsWB
zwKE0TCFRWaVUm%L!s-<QwA{lhU3$yfgFWv8OWTv+vf7RHW+{Ni(yUk2cx~NOIw6_Y
zcfTVrk*9(3a1^g_i=S@=0)H>#4O`wQqCs6Jh1&)5M5pl^`oFU9Y8@HP-;`V%9ox~5
zS=afE%#BV)%jC&(jtcu;rpxIWc`VgK1c#Q3mr7bVMm;C8Z)4=On%eaf(kp%UF?U`H
zp}<}ee_rQ>F_ZF%C>Gv5rNsze0WJBum(NSP*jo1qJAhF#?dc0GN`|={Nx!qND(Z*0
z$5#8F=`d<n8Gy3MCMMY5IyRgOR<kx=Tt0q^<|qg+5jRQTAxQB&LQNkfI(HD?6n1X$
zG7Myzm#uduOK7>5enA~(tqre5V-{u=kZ!aiDOZG0h+q??S7i4KKb1?Q<CWU|SMA_n
zA<R+PmH$@@ywXNmUbpKui|^=;gI8Y1&rg08yKck99M3m;?yLksF0{KUru=3xHZfn?
z+2)%byQ~F8&p3NRzctYaM1@Zl7jc|*3;EemlljX&4wJS_1<CoHH(@R6{1M42_~k75
z`t~TWt%y$1b*@KUYY$1^U<1E;)0?VDH^a!n9s&fm5R+3OM63Yy<n6rIY141#jp_&e
zhfM+7NN<GcP#d}U57w^5o33GTHFgl$gAaBW-@IdV;2<b56)<iW&k)$0mqfP%b>TKy
zL177cC4Q0cSD`1ul0WZk=~lzSyDhkZsCm_SUk;k9*D)7BU1u!-N_^%*HaFb#{8P;d
z;k{li>}NhG8BhZyON8u40gzq-ED}|bLC|Ad#n3-r4?+a*KeZwNVb{6y<TPuk>E$0}
zbA7F;|5RY>glM0B?%53tC?}slXc+F%L5psEUpw3-&~WbCxDV<1uH(DTL@0Sq%;=&g
zd>{uE4-t1TA}9b&#>6(*fx{SLrdYUI8Wv9LW}5N@ES_@g-M;u}(W(kG{W?%=_Q^vQ
z>2{N-a<mRb(CIAT1LJbrnex4c($Af~WNEvKoP!=$GqOmVCl{sFjna6#Qv&wxefxlm
z(LgJIL$YATb3VIx%*gyD1>9g%nBSe{Tg%e;NNHI2%%_HH061=JVm4mq0WPF0@zL%A
zx8e1Rd3Wk<dG94c=t7W#<l?zQYuZbikPjf=#i4@Cq11ymcDh`WiCUX2gL#C52_F1Z
zt#{Q((VOh5O%lo@X&xqPROFdEzh`6}P=C_&kI3%Rze;XiyXVdLfm%p#4l|*~OGqW2
zT#vCbJ$6J@TwenuoJtPuue_)>)aot6Pqa44Wta)0e;QJMqB^)*jvp-GX-B0mFyrU6
z{ezAwNL*e_6yptq67q$I1@@Z9$3Dqq=opvJtq^Txk5TT|4%|3kB0De<*kavid-TEj
zV>Qhw)^;gA37Z58UQHyc=%v`VMgnw1o0V=E+~A2~>s2))fzi9|UiQ-K4H(ognlmm?
zdSPrnSNJRoKPqaGM+r)G`s`pseDyJ@sq-+lbAJ)3J7+kMG>^cdgDGWhS;?6yk-@W(
zpM&xZUyOX^BFN?EcehiJ8#v681IdXM{BoRR7(mSu8@a|#0waTzf|&Lo4$Xw&3mr6k
z=8#bXt{0PjQsX|lt{4biL+`rpOSP`6YD?!~^~J1$;z`ZP3O@wKJ4TBoZp^@E0k)XE
zmteYIm|3dlysDMnPt_GwMVI#A#GcDnv$tcCSb7%x*TVYcFhXF&$1zN)-ypzg1fKJc
z)vAxH`Mv=TdzDdasHT1A`*hIS!18;W#UhczN<@F<TE7onm%zYx9U~b0&yU<1wehi%
zL<-({_$Qk!H`hLnPpR<$nBy-@crL{9B2k^+(@udgfs$`fS^tz~@|n+&HKVznarvOK
zF0d+&X(o!TID)mve~4Was#JW0yLQ3a7hF46f~S17<2Z7EWY)ZQUXFMyh6t|tLeK9B
z;JDg#@+g@?1-M<%wjya5t9opBC0_8@>v)&#*Lf&Pj<$wH;$Qaa)qA7IG-DR=9GiVR
zh~Dm_$D{)0L3HLpT(Z+MXyA4X>_k8&BXiHo@sUR0DJ@K~t*R!H4N|J7v{5+9Ii#2O
z)-c*G`?+Hw$yS4ay$^pFZx?DiPHDC>u2zYt;c4@Az8G5}Sy)WFE;+qHpRP7<+@z(V
zPkK-0_*2-8*jn6<Rm3_>%m$&=uep6^CC(uj4SMTHtsN)PeE%awy_BwU5C~>8x0+V5
zh2u$x2)WSKF;%stXtKB290jgY>Gzj%&r&MDsjN-X9BDw>*msjcKKT5R+*Rv^3n*x@
zC5Z}bzV8@upTdD(X5`)G^!%x#OV%UKN7d2_5Qf(n*V3XFiNY9v@7qFajE9Es>EImv
z?reZfX~&5<7!%C+48nlEzFG>$e&pW(>5|Fa^1L_Bhsx=BABI-T>G&K+!Q!$ubl+Xb
zcpj8NI0khuRJ^M(c2+$E*|i>i$xt%Ix<?jV+x{wHU72}mPoml;%9oIgGMGD1V9nu)
z^bS%xnDG7)-x?i7^SpLQY^2VDy)N@7W!j^1ND*KdAoGC&?JGE>3eNt_HUFKEo?ZRR
z4V=ZAMp^pya{KqFE@o}amW45rjhjOXOl34evbF<@?h1f?&}<5LIwD)$iP6W^_`4@G
zhrf#5$=0c6Nr#NM{&-v~^`^4Z0pX`mRL9bO&a2qcf7azhhCHRn(K++iAsfsb>ycq3
zM~fL{)7lrX^7AC5gR*Nrlh)=JtAm^@b0`ymp;m=#;w9Qk-T5Fj+91At=#iP#f(rmF
z?0+)xIl`~}-+}v84TDx3&wwf0N+R22ZD@vY3)$BBGv(-(H=dW^0tp$4x>P!LYy^(b
z+Agvmi$pnN3X8QOiK*7}Ymi!?1RD3rAv_V!9rLow9JbooUj1$9aPwNiO;#C$vOIg0
zuiYL)g9*SnIy{TZN%(qX_{sQ@uupSG!_5ycjrP#I0vVpyM8Guy0r)u7^mpT2ydS8v
zULSCfLq}Z4dQ-)HIVi9tIYC<fFWW29kSFhASiu(p1#Dfd!O_&#GS!ec2O0AQrtj*q
zX1jiG`mk#%6~nK>Ejer5Q!5qZoQJTV2_Z^c-b`)N2Cr%x3RD+@hpwvezm~3L!sIoK
zcZN-}ltxiytYzwkSC+j%5Xu!GPY^xHQ&Fs45UX+@3U_sUU?}-Sovr?QYh(QNpcCdQ
z6rbenN_ZxS?jBmYaEsJ^C@Bd?*{{g^1s3k^x$cnuuCWkcZ&Y)rqfulwrLZ-C{qBFx
z^($MSm0tpL>#7YJymh4N@28`3-Htaw)jP-uxX~VOi02eIP8M5n{`55R8q+8Ly#*jL
zAeA4_z?IQ8F3-s)?7C|a|B=<+cGH7SWQR2zdtf@%sfuh@zio*5sgCZwP1cT9Z#j=X
z^<HuA$p?VjNc2`T56NoOS2HSmANQu*>V4h|y;2*S(HO0IypV`0u0ixU5J2iqO?w9u
zl2s_`t4szB?5G5Ef7xqmYBscPcAG&_9*8nnso&vd*%^PeZJzwjx64tW_0ZPzEOZNu
z@zd*t7ltNcV?mqE!@v0@E#1L1b}fguf#{|r2o7HEUgq#LH(+I)8sZ-UX>ymyK<WS_
zkw=DMOWHg4M{)o#R9$z_kIm7E+Pb5FfXh?nJ}pMrJaE6#gcyE=FUZi~S8+|p8}Vui
z)$NU~`hD(!JyRl}ggh_|o;o$gG-+x%CM5sQf(zyRm!PtRH;$Zh#nTW5H){+2z-v3T
z<?7Y?D2Bi5E<?YeA|mxZ<G>-k;d!gpC}^v^ddGRPwE96wQ)$=kw0%d^5gkZGw@;U-
z*=Fh<*g!kJws28y#ld&-q_lFp<J0{htDYh7fKG?{th!cyXz2qpAhgV7>ls(C>Ts@U
z0ms!29e!C`4|d`kd=O_ds3mahOKwKhyfpT13h~u#*LA67ol6<qKjg`34+I@0LIbmh
zzZ{$<JC!!>zV*K!jx98$I~E+l1~QBoIcJpt_dDR|6$iJmLq5AgWD|?NdH!+7V7y#F
z&EqdiXP1tDb?GY>Y_vyEZ&&jy=2j=~;g}l@%=%!&(yK}@7R7yx+4M<RVU#%IR1QXN
zcdC}Da@Yv%eU2|g;cUkG9TTCZpPIyNto$#VXyrNXaH43KzGm347O665lnS`<?;?3y
zmWP#Pl8elQcx-y@=XKtA=e}-9Jm0y~3i~p*`1uI(ZV=`c)y+Ac4UkhtYyE3-igK{I
zjwnEH7PoY#V>P-c!Q&him|X_uuVK&PA*Udask<=ko&Q>7MXsX%<p=%`aiZLeM!Z5h
zqugrs$6NY_8Qd)J>(b&Gkt<8WLFGCuO>n<i2~dooHoROJ{9<WKM5S-^qQ(!3zi#!E
zgKs5n_g=V7%8=>;!~&jGzX03*CX3P|lF>ntY~PJ2(u*LZ`>x*kgj;B{y*5}Y0u4WH
zORSc&m6H?T7>3utjqkVrxP-w<<?492#XeHf;CeJU0BqHK&eCs1nxWLBS*s{oEsJ==
zIW$CDP9vY5D!41L1c5C_(PB<1+~PMD>mI&^qbySdl6xLiaMB;^;5daB_dR*Hq4mg=
zlh$Q9f(E9{#Zou!FPk-o$u6D`PD5LBjMil}_WMDpR@fNwkOrw`7KWitD-9s;Hubsc
z2KO+=c|#J`u5-`3-Fpcok5;;i-PZrs0)Q=tk8^U1$5P>WL^{e75mFqRX-)av^)*Nx
zN!&1IRHE$;Oo@Q1)Q|KBi|d`XBpe!Cdahyk;ytS#qkV7xZq=z}{gS`K)g#Qmc4nQ$
zS`yC5C?KzuY=hN_A5z%gK+S>eyQCRFJGek;2Oq7PkZ(&jK1C;>9S@@I*xHzd3qy`w
zeKgT6y@ln&!mU;eU9FkXz<>{HXt!bDD;9g@VqK>I`|53!s(#h|t`excS*h|Ne%a+S
zu<iEIjGSV@Z~b5zv7OnvbseKX5Oa5()OXl#`L>ApDqKu)Ev)p%O4DC9{aVub2-Pfw
zxi<G&L?EY7ut_)5@{A2>MbkP7be2KHy9w;Et59mkB@LXL-Fz5+ZLC&n(c*q>iB`q$
zFZvp;5D+c)ar&Y63AKb`6(nL0lHxN*c-s0N>Ltw+>#p_eCGo?bNlw>)=5{1q?>U=?
zbvbF>A_E)<NL?H_MiW4m&m!0ruobxmtk~x+?Irjc#HJJ)T1^_2)aHJ%lpP8mU^^b2
z9LM-=h=6}DY$HO910=?N#)UONO8wL8t;kQul7^(FJJK#U)dr4jS`*XTKd;c_x2zwH
z#gonA>!k=Z9t`0|xI#>PnP0@IKJyzgU&N;YFm0W8+lWtRHo)@E&)xAXqrOl9KhwPh
zt>zUe)g2fem-C4A_mcn!6zLL&G*-4O!KdXK)LVY9Y@IIe9@Kt!W)RsO(kAW$eZ9!M
z;8xNEgBT*SnxMB1c=s@=<T#~dRhVcj7|$F==Cahxl0NsBW7xa~;bQ=>5|)%6q9#H?
zuxnxUSguA}g25;QJotn3xm|xCPxad}AnW>)46Gwpnuw!mQ>04&*VE7YwiKn3GPJH7
zf&sLx;*$Rdj-=B;kNlhZR-2u&N(A~1@7fo4Q`|68&B=zqq5VJrvDxl^c@gL}rd2v;
zJ2-n!vObpk;N^Ck)QUclq>?kCRkvZo%4b&tR(f+o_Ju-GImLKT)447C>dxxIIgzpJ
zk)L5i72qKAn#hwx`SGjhOtD6P713)O8aP6!>Q2JNv&N8bGhFBNbcw@vJ1Kh6SR5^9
zAHphL8@N^=cY=7FPOX8s#s5rr(|MH)dM{rLdFXWg>78P{6{lk#mtQ?h5C_luU0ehF
zOR^#ZPK7YX&>OC8Hp{nWTPrd})NO)ICrn)(9Tv#eKVLyTqAmk22l02zYxfHNQ0Oaw
z8j^dukJQ-e(dMl7l(hV$+8j+G0?MhSyz7I%K-Z$a<4#u*#KdZAmE2AeISq?i^@?#a
z9-}bj=KXJ$j4QK!`YVM5f-q3H_pee<tv_5X+p`P-$17MnZ3{1xu6x9XZ#Z70xoy`0
zyR@6w$>^$Z$tFu(jKO2P;#Mr|I+{p0Ui>EOK>OgZsi@>~dw%Mr*skG2`0)VN6Sn3X
z=*#4#_0-OvBh#f}jHKseGNzLx1>pT_e_8uED7VujTfRKUUX#}Br-n~cbDllR;jl(V
zt)<ytuZH~(AAR)btK{C6j*Qw#+N^jW6WA0YZuU_4=x6j?yfgq~9sRfISi2E)B}TCd
z<}@^&WdAXZgKfnd1D?N0e&9)uzT(=VAxk6d(vC7O*(c?<U-K$ZFl`hkh||=jV%q*Q
zPi{t@nucD!&6esh&FL@O365cX7riOzI=JFbqsrnnR6=D$+_|lcd(-lWA`L7myxyt{
zpjAXqZP)V&O|f=-_<e{#X79a##Q#EjXe^~D;k5ibkJ;#Q`r@ga9vtLpI8`iY(Dyd0
z)M$G{S;lHfgKM9t2H_ck7n0tpzW{qmC^H?<o?ZHWbUxQ+aeG!2@y(#UwWJCO5ZSgn
zll&faP180$^>i$6cyn!bko?uG#<%wA-KRTrN+7JYFDAjyA{QSTU7~1|3&*tR*~?rm
zC#N%~eKf1~u%YUsydKO9wPs;&jpj^p4S%~h^z*(x*t{DEe25s*`4cFNh|Pk@0FXRx
zauC){_Ok<$xlLE~q%?QGU~&JKqr3S$<_pu)MLcXAMwVDZgN<(z%8NY8Q+JfzJCM=|
zPUf73)6SNCl?fg@S#w}<F~oIoA+m(EU(5b-xz<+{Q!pf0q5zLWlEK2`c1zGN#W$r*
zn7;ZVd4+6kcPDb#3|}~SwFc`1Wr2DcMh~Z;t)P~*qr)k&dSV;9EwYj(aMcfEU6&Q2
zdp286u`B~q9iDwLpwxyWSPWj=p@Q6^8u4Pt)>~kQP!=xoo<YUeWFI>8-Ov*8@c5Q>
zfaPTRK8tE5!{?fhFIJVM-}O$R6c|*dQs0g}UAHDMul4j-je!(`bxum7UE7wriKe#B
z7Yh9q;7-PKG|H~=S-A$TSA@hMPIFI!BR-EKKg=#bn>N`?YV*alv+-9q@X^F@e5aTY
zu)PYUZ?G@|ZBfa2w&c~f?fCd5U#87YCas!!FZ@90gNbbcIssri04SSX2GDPB3Z+YG
zc{~<tOT?38&ri3fquAoMttVzxv`bFj7Zq!1PO@+5llt|D0fbG_K-{0KbHhlL|ET0T
z(INj*7MaBte}=yY;8rH6rRV!N4M2roXv+aS{JiBPZ=l&IOq6rgP@ulsgM3;Iub<5@
znuQ>v=6yvHY#vTh8B+sfzJ=R{Yh_V_1&X#Ey@4RlS$C-?VfT>1DNHzoKRe<I!7?(*
z>%Hs}Lq?F&>!jr*OJ8maXAd;D9z|;_{1LF~U%zTxs^T`c{iLH5MRgULnWdj84&-ia
zz090iFaeVzR(6xbk&^ROBOpD+e_)HiZUX@Hlms4YDI7jg%LWwndRl&~=?87$oJdX9
z4K`-h^kp}gJc(++GGeIL${f(w)e{7ed|;+Jx7Q@90WR|-jE&b;(Xyn7rj{E7ELeb?
zV*?@Y#XSb{A&#Mj-<JHPx+xVFW9behM@4aMMUXfynZFPgMsL7C6k}`EO#s;W3{TMA
zbtX>K1M#b0J~&CDCpAx-61_X-zvg9<5jCv>F14mCcNs~+hab1bs0cXgMf{{p5TwGO
z)Wt|Kez`N5W=8IpttODIwvUX=Zx*80v^gSNxmxpMK<D7!br10DsOFEHx7FWyj5-Y4
zpS|rZSUl#zi35;qGhtU&Zt=4%15|x{tc-h*ykAj~g09~7U6&3a!lQAnUP18doccG;
z8*)(Cm)y#?YWh_KM)--1pio(0r8@01LJ#f8=L?I~021)03?yTf%?*-?zq5HF*z5Dn
zYcP^>o#{`H_VWC!Py&@8P{z|ybD0EYIugZu{QO9@b&I#y!oz{sKB}l!XYuY(DxT!B
zhEdtRm=0{uSV2ZUok_1W^sSMqlUXnCYX`x(^u-ohSkxS*<342RbM!T{;p^2|rY6eu
zXN`71B9jL}Vczu^M<FHpTrhs%fTH;j9voiV0TeX^sKDNk^OuOvVpozy5_~P@FrZ>H
z>@6=`@Jl~$o+ThPxL2en6kmy_^U#G6FrF#?8qk==sfHG<-?P3LYDe?ryqZ37m5zIm
znfFu8Ucg&6{XJIxZl-i_{$JfNyp&wwRsRw*1V47kwZqah4V#occ%1I_6?>?QT>@Tg
z`H9Pd%lf0S>ryykEX9&z$3Jgf0R8HGFw&qKEyMd(&0`2V5|Gg?eC=KsDYi1@LX&*x
z(4bie#1|VXjY}}XK5FinFT^*F@oc-QWb_*fcJ1yVtYW8|ay;)Oaq=AFj>iwXA5QJ*
z%l4Givqy$$UFIoXc({X(GpACpg^T5Vt5*YXU{A&!5S|i;E)#;9tMSQ?0k$m+()>zc
zIEB{2tSuU$x9$=>%&r|HuUN~ZR>V!GgfM{Q%K+SRj%$**JpV~ak|s0oX&}YNDWkMP
z6OJY<AD-X>{$Vsq;J%gr!z1j!7$!D5QmwUUPW`8NI7Vt8lnpqK6yq9xW?MW(C9R3c
z&D1eD_gQ<fedHo1AtW1Q8#XzH3w%l#{%R4mJl};|3z1=ZD*e>usd$-4z>M`_sj1!L
zdOMI1`gJT1>C~U~G?V&^-)Sa|TMbeZ;y(UMm54tVY1RNEpy~=#2sC4h_=R}QDdD!l
z7}xlDgZg4!JVVCjL$=_s_K~laUU)9-9!H;28a664$0h_pO4x=i_7uq)#asq&^9ZYE
zqFu>->kpHTD;ZOXDe7%lW&P?;K>$s>W>l{IHStsLpD&CcGEJKw_FPV0J`=oN0v1*b
z{RiX*daNvO^YbYb%dmdOlirh|Il>;=_+`pP{)m+o2S+?FUZ7)TTN0mC$l)?(h!xT!
z159`xpK9c}+pb9&R{U23fEL-_RVE)+K$EpC`nWyc57mL=frD1!G9M<S*C=s(UDZL=
zO=U@h#h7tcM=pFGOJdY3P0-^~OQ<+Qi(g1Xy&^0R%vwXNU!-C-=F8qsvKQLXrPmIU
z6X45rtY$HZZ1JQ%OhkliNQ-=)QM7WWVI$|ZvOhW(KKf8MbLw7@!;%6$Y>z*!NHDf^
zarg$GKU|;cI@2Uf6&KKiQCUemTkhaXD>9tQ2%BibbK1>X>wj?Pr_9q_x~uQ6i3`|*
zO^6cd3H&~K-R9pZq8yknw5r$`t&vw-$i@Bq`3?D_M&~Yd1B0meVsk71#Lt<O>n@N@
zM(D)Xp7s>R)J5>R#PDIU*;*t0%&x{@$IkcY(V6HOExC?N+}R+54mwx3)7%V2ij(9x
zjj!9=pAW_Ck_ATJCLr|kh8tS9PzBNIbZ_WvGRvPb8>BG2v*MvDCDO$i4vvhZ)OL-_
zgW263ud<hXYt1>1-kpK6Cg?eXG^UE**T9jj_<RsI35L`vUpMUvUANdvdcyF@FKb(Y
z_<x3r(-#4aA*;53j<C~McD?b+l$69?d*jN{9qVCR@F7isLa+`9u-n7|)@~e>ry518
zytSVl6{%|xT{a)i?U$ay!(Ovqd7384Aak1BuRY7fm*OvYa^D*tQRoG5WoMc>8Lk)|
z)xBQ#f65m=_mU<!)XL`Cnu#)-l!hVMgej6U!%jPKpl#BV04NkECiBPWuwCKP1aIv2
zF#kuiv|Lp+kaemOd6z@#ANg3;)D!7Y#Lw<^539WHzjQ_atwUOd=l)b?sBb2cGCgMg
z%ASqzl1(S>`9`yZxCsg9GLXD_ym-EgspAu_-Z_z_Zxaf*q<$vul5o4;a!rn~ipF^D
zOvay=IQph`<ucw3rQc#FDy@&wUfh>(Jo1YJ*EgKKx{bJv^p!xgTw-oNC)YYiZqxa#
z3#LI^u6HqKb9?^Kii(5hi|9*<A?2tmXu!py>rFdj^DUAX?r;9_=p<gHrB|o<icxIm
z;MMJ9ItgS+<GCjT$&#o4CGGxf(W|2fljZ@F+aXUHFDNp}hCdt$4q>m0Tgp`rdvO+H
z$?dt+dgFI>AbBGrDmr(beuKUZ=CelZoHgs1cqaxNEM8AR=b<VxtaAJ$Z+vLB=&l`t
zGJxJfvj;HdHz4Is%)wRaHfx>7?wwrqKv;S+o{72nz1z$5Tabj9D`d|Xb7|P<cCKNf
z@i+dl-u|~p?q7|bI?<8-??%=(zRS3Ix_|HE*gLzhE@^B`k)<Do`n#mTU@9ZGUh)5R
z>y{6Mhm@87lHr+~Iv?l}qZ7fpk3Pw`x<0vGmArn5C2_SuP`gLq;B$R-9szvY!E(Wq
z6#q@R;WQsnv}X|7`<aMuC>+lJ<jK$1&H1&pApyrei+OP1XvxFhv~y*pt7kXF{=LwE
z!>WA2;hvLU*Ju2*v-;=VQ_TtixBklq)8F>IFVTDZ9WA#!`1mSl_{Qeux}u~1_mv*#
z?irZQ<oA4V#MN1kq`8aTRu~FnTUzuVVOyxle=(AwJ#Lkoh_sVO{}`H9cvo-k)}43A
z=gdFY#8w{rN&G1(VcHcMO}S7enxf6HbGMw^o-A^Y3{aJ^Az)5UN*`sZJ=2IXyzv+R
zo~Zu~>HBj;Y(~K@z4lx}k<3(N@3uv}I4U5yxii^K(67H{29X(s&gMMP`i7*<tf;Qv
zIk6!iN;&pgipA4UeofIpnOl0MBBFm;^5B)=O#g<wfTmTlSEfVSZ&vL)k!KPSLYNAE
zlN>WR^r!Oe#%GM?FFYm*J8QZ$sqF_rCb=(O@GSQC<vEJMoE<g_%QCoT<mCAKKF{9b
z%~hALT%JxL{r>oiJuI-HocU~plgD;IqyC&H!G2xp98@fqKFp_<InYPM?_!w;?wRV~
zwy@6(5kJtewe9!Yx|~FIy??83qtyI1!2Xs0c$w#Ko&T6qnhLFNp6Ab}t<ty$6jaG`
z-UWXg|M%L}3e_JI;BwNa%6R|OlgQ($zQ92><t(db)u;myo_TEQLEnWUpiyY^t8b=%
zzL}mhw@3TmufG&X1vFrf-#Sp{$rGGEcd<Rl@wx5)xuf6o@-IsAut*h5<dO^s(*TKv
z=6|W3&Nox%W(BL+zBOL#Uq6B}8{?Gu4l#>cQnr}uCR`ue2mh@v@~zRnyg4>E2^ixH
z{I8=gpl|+Z^?R`S?|p9vMH^K_SpDh6{~UbtH-YlE_LA=Z+_DMr|Et$-^@#u1sgJmK
zN>Jg=<YHzIyL=q6IbKcoH2;4}8Jo?MEZ}C~PHypwvy}i6K#<63gtr5n1V%0q>+{UJ
z>t%}_+;8&$x$VrTKH}4x4cBl!zm7i6VI+{a;Y)k)-~$*)%ktz2J-H;|JVNd)K$#KQ
z3RI1|V9Xwy@`c43jr&O<V@_+pXC`)sQE4y#(jWqy{|2XwD!BjNH3d*9wQovNA9VrV
zVz#u28aT=X1TT8nN(6e{4godRo-(}HB~3}C_h<kSMgnEkgdi(X^M+clo`VrD$B;Pq
zl}h8L+IU9dGeN`4{Y^16HI23J-}BjN{XbUl7y02!bf(7g7uSx5xOAJc+Zw4pWkDYP
zVZ+*x&uo^fxW$IfI0I_N6;y4m!IQAI>D2!cc#afq%%j^yRg%VbF90bIJPaX$MG*xb
zG1@bQHcSDz2{bMHG$YqwbgYDjPK6&!cdB?(WCN&Tj-(&{@mCLHaC<)DMFUjK8;Gz3
zs{yGAcvXkviHz7d)<>D5EKrM;*9!(?oE;wlSX7ek-Iy=1yOE2|OWn&<oy+CuP<c66
z`%z}OU1EAg@|*5%54lTY<wAA)g_i^$9mbq4f4l*0%;!H;fP}r_M!dL35yeh5%m3hm
z4Wx|Ss+$9-CIWXnHqz2}q?{kKvCW_ne#j=rad6yoCHzm;KqrC!uTl=M3E#Vo*G-%u
zPL=d9ovS)eeBU=1+2GrvE}IoDo7f(ZBFUqvYK*<61?Y6H4K=$99Yz?~KOV3~$^m29
zD}1hhIfKw*)fs(7;@qMQy}|u0t#gn=PvYldKAJX%u@D(lPLS88_g#*8y`8p)G^?do
zdlv)eUSmF={~1^Kr4Wo=r>e)BJMB4`T`(R@B~F{=%u*EiyCZrHThSHBG0Da0)r6d$
zv@fTadXH8LgKW1Z0N<mcl}azZcFMr=8Rm4<XbOPzMQ3IArmQTSxUux>#@H`5q$HLO
zd#ytP#(B8me?;Gj3ViD2(0yu^cH?^Xf4LbC)$y)QK0c*h;#g}JGW}iH<9(5q9{Ykn
zS}!oxGDuxj-<6IJ;kj39#qLz;oEnu8i_Gw=(IPzFz0{DDY}ckxwZ~3V>Xo@?pwJLa
z>Tzi;xa%m50bwnM1ih-S$vV;~eO`jD;*7B@xl~e}#bv4F%%H$|1aT<+Ev-U5HMI>=
z<!~q&=(hHxf~o`E*GYj3Lmhyr(+R^%@6LAnmgo^rqAt^ZG~D3mD641;8dOGjY*%?)
zS5dnPI((R&tIy|AWh*V)zR<niIE78?<`13Q)OJ%_zo7EeqTd^$xlkMFM>(3S2%TQ9
zur=;ft56?l*=?T;RV0?2;Jnk4)?)-XrDUKQjFxt<Q<_^BO&U?wTdb;~HV1@aA!!jG
zFBi_gmw;r=?F<4Pk9@sKT^7121l!%Z+E3l*%6W8lP)EM=LR0p80TV~2^Wrei#iK3z
z5>IDBK(%ILH#LZ$0F-J#ecAXvP8lR}0S{o?OfEdW0-C0$d7S>8`(yDh0srEEbBk@U
zPWYFU@30VY{A=lgvi1ScRB=CWZSMv_Uo<e+BO$aYU4O!v)FdxHeATHu0<D85e2mw+
zGl{I<Jx*B+sM$p!m@U0pwfG69FgE65>8n?!Q$1Vu5^;0GQmV6}+5h+q^!I<YUGn+@
z_{3KMDCWew2n~k*7iy)tL%KksIySO3>&m3QYi;dM0+h0BY{mh>*ULop#*=tPKw%P;
zFGls6bdyz4BZSEqN6|Iz^AVkOz!u3XPE`0kW^q75)JYO;;#~68m7S}9S$=7)?3}-b
z^(wUJMq4W%SB9d|4yY8j9<~P^+1P0Ko?u?CR+g&k>jf80z$z_7ZjYkBC({i?>^3RL
z015<_w!<HciC@xWl3u>L)I`=JIA65^S~&Cq5#SsCOKMnh@G^R!eP@792oa(L)z5N1
z3j)-OrpVxJz+C`5e(g=AEC_L7-M~K^G-(M`3|n^;=@f0r=L;OwVzlZC!&ic=j=+kx
z5!gu${G$o@?9CYMRur|BYcrs-0t8o<B?d)*vetmaAmN$ds?NMG>`M~3>$gD1{Hl0*
zY89ZZi_6jPXpaO02bm2f;}e;qR94-Dl4nI@RxCw%Y6r89=lt$<JY+815;nq^s=(vw
zuCESIX!CM|cCAD)R|C%iwF3CaKmw)=&`or-*VCy{Q!Sh0(|ME)O(#`8-V)DlzsN7D
zLHk96Vdejb3=i45K5NiiLVeC2lgA~|2pCFIyKH}H@a{2&0@@}!BsS1CA0zChzj7IF
z)}{p{NjPO+ZzX*ksm!Mm7tXI2U3Z%`QL~t)aIG#aLk9Irq<U@r&bJ<Wgz)fviq4;u
zgCQ@D=%fM%37#=XMPelqn={<`!DIC&=ebWeO#^CY*TYk>#3+kJ-7uU3*}UgjP=5&?
z?hCnvyE}fDt7K8nng;tzeJ<9p_KrvUsf1zj(Qhhn`Kwu~-43RYweAUVQRpVuW^A_p
z<{^|%8w2|1wJqH}q{QblXDao#c~NiLwuYl$&!6I_EN}jRZ&i;e<2CE}9VzsSm26k3
zj)1ccu^+Fc6!Y7AZJ%f7yI*a#4DWT@3&05RvaR`_fu-fSb3P`egj7HY2ce4&0a|6m
zCXe<XknuNFN(gEzMW!J~UEg(_<>Pb@Q?slFe$J-)mNvO`!hW^CNT#YGKhE(uJEw-*
z*37<bK6b;9CJsZ0WVIo}gJv=X@015m#|6Vg@asEdG10-zYR?rlkfoivy?U4Z->}M`
zRH!a~L7`t8w(>2yv~J>9^(b*D&Chw^Puy*KX^75zdm6xOX1WUrQ{0Pjx}tntLq+Pw
z9F4duB*AKWc-S#OdVb~k+-7&8oZI2|p$`rYRA;$Nb5%^M$3&w6lIH<qz8QP?@wob1
zhstJe9%tP=UfZLd0Kc}*^Yr*~_|-;L65f{=xc~iVPknSp_0jz||MQ@sPa?VxBAW%1
zF7v#obdZkF-JqS^pC}^AS$;~t{WRqRep9w6<io}^k%3cvpj4D}hW1XJXGuXmhh>Zs
z#|Nrkj%vRgkt2TZdb&)goZz3^Rz$%NcZOP43c+fYV_KwVRTr!3Z_wu0$3)UP<WnUA
znrsF#y{YI>6xXHjvzXsoNr6L__!>F{8y`NN<$TLl*?!l}$x!2+s!sYVE>Edbkdxot
zM))ZFo<nW>a?a;ZR8Bsu3GReyhlP%6x`{j$6&7Y^G2cFCiz;A$e67R>96~^+TlUBp
z#_ic5<<=n!iidPqv!o~+@su~)EDa9C&{na2WHyr;|GdyamExcfolF1QLvyzl{>jX_
zT}ru0dMe}FO*Yk!*KsHlLU-kqDwy{Z2ig(WcRl;8&l6*UKZU!P8KmRnE7Db3navI7
zXwPNPP2hgwO4c>uUn*1Epg{tELxVYshmwC4=xjV6b>4XRWL$^vz8%>&@f|4)9bv>z
zCK4WecV)cPdsG?EywGJ4xDe8}AGS6qM<1cexg{Mk3S?vu`wJw9W~?q5XxR^kp=~rz
z+3hJ+Ee$Pt`D~mhhvaFu1g~nZf$&Z7J$c0DyItL1Angt6N!8vvw6{hNqwMkChMDOl
zLbS1!HA3x^0`Jb#WHbl!uB`KNJNNxRq<v*rTie!dAO(uI1%g8rDZ$-oDWy1twz#yE
zP~6?Ucw5}9SPKLxE(w~Vp}1Rs;0^(T+?;d2z4v#{kNfBTN-~otYpl73ykpFFjtPhJ
zGY2DFR|Mofq~mW55i`$)yV#WJPU;HWA>(AbBi`{B7a+rDK2Yw}d(Y`k_0i|(fUvJ4
z2~~E+0=bd_g3!%R8^aY+W)B<8atqbpFBXU8aRVfb5)N?7HA0z|i%jYS*(SNQo0DZY
zyE7vL1ycHJ;kKUOCN~UEQs$wxH?+i<b&C{~R75Ujn0RvcCL;91V9*3~NHFjH@;w50
znpgSKi)g2;8AW>0%#|w@0zXfnoONB;xLxQ4gv7su*;+I`*Bk(eILK5nXwoVk8D!fy
z;KUI?e74*(f1^2?c+}~2O%|QC93h*>XQOuA{o^cyGiAPWl;Kf3T+pK_M#4w}_=Z-~
z)Y+_Y=$rQTUMBQrj}FtiWcyY5Y%Ea$6Kd7(4PxOUD5B&i+r>++1KPN>lvb;}n@>-!
zt2n7xWQQgT@5BLAMhHRd0pl#>Ex2<eIwT^ye23-!m$B*Gx(guqU$%B@DN2qLtihPl
z_yKN`KQx~IfKDr=>AfkW|AkuSbMNf1#<eQ~ZY=0`AvoIC#?-9mg-TtR8JhZeOn>Xx
zWckn^=!5KPhn+{jBT@1Y1{$kP;Sh!~TMP-*p#CB^@XajFHgj8a<Z~c57C<ms4-AN+
zvZ?09dDu^%w*D;EL7f`{$L$D`72mVBN4pxyGo(oh8TzEBo~I~iKC^@`I+DlOf<3*?
zrGb^)ROeuaLjq=IAVX3Sjff5h(GDa7J1Y0UVyUy50>{<54Na>*{Tiq{`P{nhIp#s^
zBhzH`&Y0#?{a2l*$}i8sD)>kv(L1*}e_=L0eJc2!*at{b;Iy(F?raYUY=n^Sv@kHt
z^g6%D{ptQf{(;4VL{lZbSQ@O1{|LN0FLX^kboT+ddR+TqLSWfjQd?<~%wQ(iDm~Q+
zH$!8%(8@6Ui~uJiDf1PB&ogCPTEAzwc0P*oGNguH>8ZCRFSu`6rO+Q7Et7MQv5YK1
z-_{?L1V*S3x)2Rb+vc@s1oP%BI&qx^F{Q7>hEB2x))m}KbR-X<9_A$N6Qp^jJQA?~
zBh;ET@y^UIw`>|?LH^U%#4~pzy=_~ze<Da-Sciwkf4<-R5x9jW@5~3~ZLt}j+p;$j
zy1p3argoBB_#77Vgfu=mX0AUaGkBU=aE|#_IO+QO1n!!+RhfFvuR)*Wqrxsg<lIJ}
zht_qZrmH9X?j7*<5MqBdbKm;eoL{tcL&GJ{+&MgawneEc#LW3RvKcvOxWA*kC2=~i
zG%3zZUe<q6FJ*hb3i9BN*6&F@8OCWL-K2<EKk_ySf}-?;>^^|x>u#q98cNkAbA$I8
zI3FYv-kXtoVBfY;NTK&zuZh2+#%pieKbNq^B2&W;az7g=WlYckqDn4QEgb+hL=(it
z`HzolJI!mf`{-_WFCYlDBtf%T%9lpi-*Rk9dR0vR00m1mAc!rMhA;jWE>fgPGU4{=
zxK#`MZv(jd-v;o&Oixax@35MbicP<fdYG6riMVu7Pd7sdM`iq?CK#VmzOW=?+Wm`*
zmG0D6?U+nk`DaNH6R6m{`_~3P@SLo#gYCn(oh4GBOR9Qgz*uz20q>wVblff}u*E<Y
zC!&ZBIAj7mDgt2avk9J2nfkp&<fSN_KtkPN$dXeKwQ9-E^V)}tW|A?PQ4E^~Orx;j
zHqgz={Umgd8aV$t(rv(qK}2VA;YELY_qT{L8snMK+_qIbvf$JaiHp>)>Xrc=JtC4!
z{V(e!EFLyC=zNS+RJ8%8=^9ilp6dTL2a6ez@KD;|eR<-qgH{70bvY7MR|NX4-m$RP
zB?m3x{HTn<*A#Hnx7q3Dh(vnQ!rTYRtA?e#X*k>i_{0GMdtBNSArPweeS-bHDN6Wh
zgWuYvA5n~s;;Au{%meP|3GxeeUx(<ZZNCr_a-SEZ-upb$L1vdktjng-e}OM2LlpUu
z;(mP9ojCp5!cWO$bKl#|zVH}$&I6RgYtuh{Z{5|orj<Bkj<*%XW@jw4<beN&oYY7j
zBm&aAH?{PgX<~Z7#?3DK`9uI=5vgvPP=$zn-kV!hmZU`{?`IhFslPdrw&t`CyAU&~
z*rbmPh&LKAEsz(aR8|+p2K{)?Hwd_%Lji+wub|w}xEOP)udbwB&&MwI0e{p;mxQTy
zd?`jn1muluTb|2UzjuuEbrGXhCa|vi=uRJ(oVt_Yrnl}ByVFz1T{=ilCsmm5{dv`f
zQJ>R_;mQ1?TFVa?PbYjl<~oNQnLF5BkIAf4=(TlCa#D1E^BX(_Ck?lr{%Iqxn<fk`
zJnfZOFswAC4z#MYm$YeTeN$}v-KKPE>y7)Yo`fIA<bmjUDsh~>{Ur_+ES>Ddz~=hE
zrj9{dw17wgsvhPf51wQj+P^VU2=w7Yz&|>T!%^+D3W>ZQ=qBR}`l|pV6<syJ3%)LV
z&8J$;&fe5Mc#+Bt(V;$TaJ3SvIM+oBppCtZ?XwX8l+wYI!`_u*eKI2LcL8T#gEUEK
zMyLyZ?WKiT^12W?M4BreDG;AkxWoP4px+>IYscCbFSNtLg^a1>Ba92=NjcZn|Jj(8
zASb4)iBvL42~YJc`*8K?^_)Ou(P;ThzUD+J-w{YKVy9(^dx62dk=2}UuFb+iJ3$_g
zB%@M|l!Ez_jm7w@0&0@;a4yzLRq!GdFYBrnSny%hB9sZuU1hi2OGKzH14XTmP>^L+
z@OjU{$0*5ccns*I^E*&)e;1}$?yBF*vV3!)BMzaA$qQK}=SYYI9qPTA{(u(>UOWQk
z7AmJs>~9*MS26T8!e*V6QDWYg31q<!n2StGb>Dw=wMI_P#z=2<Kr5I<`29yZ4~hcK
z&~z_^Pe>}ct>gGCQ|q~#W=}N8`seum<>@x;#rH==nTCXb*)0a`p`zp-KK-Udnj-W1
z{7aPhg~X-(`nsuQvlS_#niHMT`8Y(?{*y5JTbEW#_DGj=BOtk{YB0v2`_)L`BaIC@
z@h*YMsb2&|@s35l*PV&kQQln(%#|sjqZ{vrU@Ib||Dfcn`}jG3LFRvtN`)o#xg{x7
z@dCqD9*ll(<KQ>bl>v$w1Zd{}ORUGWYNMl;ztMnX#Ws_UqaUH=KA_tn9?)K2J(4jY
z&W@d^F7T%wW>UckGDDx4`c(_Cwa-dvtWVYyl0^h(*H(FOA@AU+HkMoYFYflxzG=Gm
zp#s;2T-og$6$T1zWFygW&j$;>?R)F)oUTgs8{)C*EernO39%RHu!#(s9HjyFjj%}&
z(Wt}L<uG4jpM!2g6Nl!1KFBr~X&`Fjz}xsv5Uw)AZDu{~lr}?}pe!dv`0-mbrrRwE
zZ_wP%zq}^y7crnd7;c^l_DokX(9NiFSPj0HRK$f5Cf|ZrzG;!sA|?$5ui|Q(5o_uY
zI2TQ-Z?$TrY1^vV%a(!FmaCq^ROTw%c6enG^Zwb>iJN6vYWE3WMOYZxe-4M)`6BD$
zdfX<TJB2^_(U<pL{%4TvtDe&%O%8{c#0k%*gPym<{DTofWZ9!%XKVM{;#5AD8;D^W
zW3M~hU%6k<ys!+ILHO3{)??29envt2#b0jh)&5Cja0bgFuNthkE1*P*gDtT*on0k(
zsc!uV2lJjdQU;mp@sW{&P9eA6kldbpu5Uo8vg^h={AITCi<-ecVL-PCxiTJCHJg^C
zoRrlE+rfZBV3Dy?j`NoS-*EL$VuuN_`)S1aw_M1raIUFs2RPWAmtP{^p`Nnpj@g0Y
zf*1oXLJPT899udef_MR=!d$nk-WEvhC575t;1PMSZl<NUayN94xzs^unoMvqZiIxM
z8k0+RY-=L|!ea5rXABZ*abc81oRmA0fg<~l=EN;BmWe;}b(m~76@nz1a{ITRcHW7d
zJVZKrbD_5(Id8yRO#)^P{TITwtWuQ(=(IlBnCY$5vV@{Ew)W2gaEZ%73IV}3WSLYL
zcRnTe3sDwTKz|`O&a`lO!lNSB<JeEdB}B2ABd-_o-m}|$aGrYc5b_`%|00gBd&Y|f
zm&J{a{xn+R&93>Pl<MFa_(xFUg2H;8*?g_Z3f#v+N$?#7s(l3V!Glzfx+a~H3-h5q
zK0w+;Abk;mX2?&))@u(rp>b;zJGCdTc{np8fzBv4{2CHhHHIq7$V_t*Pk3&12y_>q
z-cpPT-IH)w;Uve=P;h~c+em^*oy4sbfJ?jN_ikfRluunKEpLYj^~W&Y7Uq<?;$}wr
zDJo9fndn@%WZrR`E$^>5cHnz4Rb|>2(+Kof7H76F{_|(tdGA-ii~-oz$QW(g!jL9G
zO(C7T)F~)|gS}@#BwhGlR1EeB0xuxsJnrU~F$PsENvmoFfD=z`Dj9$>9uVQ;a_mf=
z;qy!p2hKkZbh=IYtn5(z6Bi^O(Ek^@=kFBYwkgwW-jdBjtsLR5L%o@fdpI;C4Z#aP
zzYvKet9{g#vw(%|KP7llK+VLX%wHzW(2PegN33cnRKZ+($4V2Yt*;QN1m$l&cEb|I
zNmaDjroDyhrTZ_?hnm0Anr+})MAKfb8Gb!Dv_gf0@J{rkQQCwb()$nHxWd{>y@gK_
z_SIo4a%cdoNDlHayGC;+hYyEsUTiNV^%De`W<K|gZy%4WN#EIiHfdPN=Q-6$h1?Lk
zjs~VC5cq52fgDiUzy_fTA@hg2MSM<?D4MydJH@Z}CyJ$QKij3?*pWE$p~J2#Q8ONd
z9Bjc;4Ti*SF$HA4)qKAh8CdA{FJym-7us1HD);Ka-I_0Idd1e`d>)312H9b%!_SLf
zKS->=p%dI~I(Sm&F@lyU#l}OK%*Us;2-BuHc?+o|a4%&8G3k>y7k~P+-&hm}wsAR!
z#GyEeM?qeV!Ko-q4GxyPKaLK?`Sf+BHvt89T(8rSuZ#axW6&c6b*p*!Sk<lAJ#9NP
z&Jg;lOm}<qDOvXLU1RN6on#3li6^*>9GCOzzN~MS={-nR*R+%JdKWGV*nV-EbTd#0
z9-YDGg7=4(Nt5jOMiYfoeX`NH4b}f_mxdEKO_HWDT-b%&(!rY=w^bKc&ghRzcM#(Y
zs6Dazw7Ym#<7Tay);=5(_?u8rl2D$cvHXi#x@KfS4mS3aLjz5frK#@$NOBK^1FG*J
z19Ps%AZljy$TDkZ&^`Xsoe1<9f5?atCo%)6-`|{4iOYt~*xQ{JQXiY~jTTqb^tPb-
z+N%O$WeAMi;X=4njvSW*k;b#tc7PZR#va*b{}ZxiiT@3}W!RVcyA<dC(OE=N-qMRg
zT_S*H1M_!hn8XQ<$_xo=gktmsH3*ke3KB-!hBC%ec=(V+8Aw6wtm&FI0(JpOwV#~M
z*<C%ZNUS9lI>`wZ5-93uZ(DrgT}6?^J4-A-A@BQo4<`jlw$<lV;a@J64mE&uVp2v?
zT^SeiuFiMU@X?^ykdpVFJuk$xgYp6WJR2Nd!lgX=x-p4&`nX1oT;Qv(e>ahTq%x+C
z_M?s>_*i!Pu>7%rxanzL1lJzpXFFXwnQ?%L@1r%^y1VKxzw~M{DED(x&*w?4L0=DD
z%00w&u<GfYR0ICW2y06SE2JPFgqS*Zv=4uem||qB{lI2u7s_R(ZWGQ84#nLi^}V{n
zfar2Fz8)L1dG$~q;?E&!k5%kEd>+shcTJv=)rydCOs#4fb?;!D=}C0##lET;=+5$e
zQ3vQiCBG=?>((Oq>x2y%(XzSaFI9iJiol1g9XEQxBFk(ftA{20wwUR$;<@?aFs=#a
z$p~HD82Zm2Rf~(m`X-nr^_MEw3nLu#AD~@wDvg3Wh-?bLHtM?d2N8jjwGpegzXXI)
zy9m`<Rs_MFI@_m3LVp%G5F2ok>Oi?3=jL0#3SPsj72)nL2fXZg?lI8*?nj6<N<vwN
zIf>vg7$`VFPVGy9bL_a1k|DQTBcK#P{-TP_>2)Z)F?RYTK{Bmwj6N<|xufTp=bpn{
zlF{C~Ch^XU6A(}iJlc}*5p@AiR~AYJOwll3kTW0kxm}6}q&^{1_9J)hV-wMSvaYu{
z=PpE+IrL}fN84TecV_V6UxDe_uf<L-Xy$ztJ8+fPwl6`e-}gnUJFBX3N+BYm`k0%1
zo`Pu8fLb;93aNpI3(eywg6{$#Hni&}R|yyPJL<HRzVwqkxB^(N+!i??qBPW@yHmYK
z?g13n^Xq)}Go!<&*4_)v^xu+#uXM`demC}Pd(q!d8@sDP);2_wG|`b_6tlfcTdl%!
zOJ9GHr16W;I|uVp-18lpEoei%<*<Se(36xO#S<zkz-j+1+Knuca;|_)%?3M{zeki<
zQYiz68^a#EjD}EW48UKuw_gN>;=IxL2hc1Jtznzp<7$O|ZrcandU94KD(cHSmfq;j
ztg*Q22*^3<KPYp|B(kDTpqh*|&@~a@7l+*68vY|(+_tLCvBC{Du?Xv#G6!<K1|B$u
zHfm&606u0Cj*F-nUaBg3XUD#(4av-l7?m;4`k9*Hj=Q6ZDjQQS^Y*6loe0_FT)w6}
z+-qG4U-kFB?gh8Oz&Dk6Px5KfC8J)I+?w?ZoiguU=u8RB{2cmA3p0m1DeZsEd}%9U
zPsJ}Q=t_33#_d>Ul(9*A-aGs8?V$aX;o<Snfs<9mIkLZg9fAN2Opw^`F#-Z+hYk3&
zRO0Ft+RLJ2$xtR&Q8z5<=>}&bve>wKkB#b?!Shqbg-%hE8$!R!cvy$`p8sdVU72AD
znU62SD3@yNn}a&?AD!vIZe$<bi;)?{dbb$+U?yNOREW4G|Fh&9lza;Y_0r#j_j*J2
zT>4|R-sKdI$HZ1t$m$y3>hfa(!Fq+2ZHr+P#Yk&#hNh)4Tx!5>cr~BWpUtq|P}{?>
z^SemK&8>qO*a>E6VIkcA264j+fuJ?PAWX|dha6|<?k>+|wf|vUWmRJ&>`cQyW?-Of
zX3$YQbI66+3g(glP%d~#63SbinO4}g;UMR}o!t>&-lxI=QmH5<ir=4}ncnZNGw#6n
z9f|G}-E7^2VPwzN19#c~r+SV9pHM%nxBJz(#`Pwa^2GDoe-+b1B%RptipzOIy2HFa
z3Qj5V+~CQ{%L8$QAAVTH=MNhXM$s*yxar7)*3lk#Eaxs@9Xn75Ib}68vX_72?#|5n
zjShHzpSAQ&u(Y=A;^KvUE+16CTvYV;P6s!%R0kKV(j1?hE|2LLw{LDv(>?`AWKuL=
zE|cPd&hqlT_NCvpXcsrP#dK&(q!}W%9p$*^o$WsF%PNaKPygK2nsUCkx7YEL7Z##j
zq`r+kG873{Nro;Ch%R(bUtGw0uC=Ub{iQ|E?zH~+4B)_Njrck2A|tMm_;2VP$34SC
zR6e=^2*@u4s(GH<mqHoZ)Y-$NNm`zf@9a?6AWjFr_8GmHIV3nU6WzC;CIiH|Q7}dw
zeG<Q!_>DfKv*gUvy>XZs`Ba{Bu({C2r*`Z=aHDpAeY$;L<N9{)iLg(ROw{wmEm4-S
zt^9(yh!>D-elah?R%d8lK@%hQjLI2NgkG~@x?Ec|W_GO~%mAg;?EmigWA1t7epgIU
z`PMpdT$^?Id$H;uldV5^>)wM0SnW>+7q@a(Zn>x}U}Ss^-Z#%1%H^I$Sy~UlV#d4T
zG_xmVtMlH|iQSv&wsxieC`CC=s8LPDO-}KviTc_zo$6-ay0!K1Vr~c#TGp=z8GjFs
zo0aU~q*@-IVK3~>1Ma^yg(=!qQP9uuNDTlNy|<TB_v`Yr!Q<JN#du<l6u->_ryTEB
z9TXcpzHP#cQ~VWSQ;TeW_Dy>H3re6mdqo3&zZ#$?a|Y?$fLBDykr?ot!UO)T7Sp*w
zXvmZE=GI}Hh%;IT-vqF}4!j5p{D~IV`TOYv?E@~2XX^GYoH_Bpe~a_4H|?}Yb!ra`
zBCe4pXn$<w{+5|znh7TF3wk11fgi~m_)!afOb7cLEZkb+f8jk!{*OZZ^UYu0<JQvk
z-TyLy|N0S&{0ji}e~ta`k0FRV|5YOV`@jG18=Mfg<o;{yKR$Yp{r`)A$hQml2Rz|)
z{>jiN1(O6JlMF%gQ62mc{H4#Q)tgzC)P)>S$+v&NOW`+Xy#N$YGCgIx)3b2nOhf@3
z*pU;UFuXawhTi09k`&tl)MbFjWOxM+EBFWu&u_na!s5Ec!ct}=x4>tbwrwDTS(R>T
zVD{K(f{&B%`O#16w_t=dT5sKwhtG<kQTIPF3{4@Sx^yY^mvRFj3OM|IPBo=&DiNrt
z!LJqk1GPC#aKF}d&pOMK8Mo>ds#JSEHPHZ5Hzw%j@j0*sv6%SaEq3hzzLJ~T;eWRC
zt3+<G+`3Q<&>|MQ8PA+Z;)S<djUELcNK1JSn7xO5kAA*TWZmz#hOd!0oJie?limdV
zZSX#v|8@j_Ptd-r+{js*pBa~FYJjEBwRephUU)3fniRthh|RiH`$*0s7{BvsSQ_vP
zgYPVQew9ia;Brsbz~>k1p|!-2_iZsB+(KbM3BPh=H!%7@=WC&e=?ZadU7>#i@^k{A
zhy*yxm|7Bw1r&v5Js>eYSoggh;HB^i`*dWO_?N4*w4_nM>pbd)MnK2}tLxToT&<ny
z{_RkXz6+;pBW4{z(STr^N5F7ADCMhNhvpW%oLZLsv)1)-8J07AId4jg6*Ml$<td&G
z*qPZcH3+Rk92d8I`Be~(u*d7dM?v;)4-K_D_Cf+q6ihEJ=B+c;b6V-e+>%hD?(us=
zgMK;w)I;`Daf$D~e&0$v8*IINfQh&o*EJ*MbNXgW59`EiBy)Pms#<LZJSsZJ4t9<y
z&N2L@uOYnT|EcV)xV5|l-V7d7XBrNlSTY}TA#w*XyepX-%bqQZ@n;D189Qfq@WbG8
zos~9cC_W>X9NWn{0lZx1{+;jj2fJ~BKBqb=E`MBe*lyvLJeYeqd>fEDk@uo^b7r0H
z%o=xRjJWQjKS>OYA<-+<+0Zf*!mIrLwW&$vR8hL+W}k~))Dr-qRiGvi;ca$)@R>{F
ztsy<3AYOMqQJ?a;>-|wchF9zq8Yh(Jc4=`u{NfNg-s&NXQ_&&*ZvBtU94lv6&I@@=
zx=BY#7M#6`qFhy3u&~3EEg11f0Vz=y)CH@AzGqRR6w^)G9fu|6#c6iAkMR?=Zumc=
zuol;4*7*98DY=*(R{RL&oHGFIk(9E~+w-dzCI7EEV_{Fo^U%6BHQ@BqGky6$Ex+`w
zZ$BSL5BC6{x4|GNor%Iv(~gchTL{aL-?`O`nP8UEv82Aoq@Hu(>}o{hQgo#nWvgjg
zGO6Wwqw1kNr58orXDQ*+TzGk@6_HBPH!7kj6BUYc&?HpX43PB03y-+xS^m!tjLmb4
z-h50KnzO6R6M74-h)L_-QXOGII$^n+uAUg@ZzY8GX4Zk{5jzlULU$$i%1*-&UQzzx
ztO2&rGGJr?mYi+P&^*7hbS`HuMLr^<^HTy+5mTsVnV+u=sZy0ea#S)?5J832{OnT`
z{gSI+?T8ZL1bEkr=15$p=w<8OO({H(7%A?QJE@!#X-^V7Zb9MNs#a-p!e7&92S`J)
zWd!c#U~m-bb?lr$Ni51qQKg_ouab!ZB9YOe^!lBpq5T-LBK@;Zc@?PdC})Fase(ze
zaZn~5QedmD^W^${-0NJp$XjUNW{qJB&rGAoB}z24VRvi+5EiHX9J_thtGhs4a3J0o
zj2suL?W#iON%<>J9ci!VCL(;@!8^675Oa#3$Un@9zFk5dahu!u9qhU?qN%0}y=DXH
zll&i{@&((zKQ<?`8l2x{5zR`M_u4iJm-iwa-gIS|zgxq+Qw_#`Ep(3lfhfG}ck(6!
za)7wDe08**KcN8syvIH+y|zCtV^F+JS@3Y?U2NA}i@HTN_tB}@>H(N^9=jV{ysfq>
zUE92uyzL$%`+GZG5CxBk>X*Tc_Q9!;0Q+m<>e^j9JNF!^u+^CA=485;tJqDmmt)sj
zeynqiya|zsx|~Qy@$p_qe|GUo37$TbmT~;5`{&(?g*x+yw{MsGI0R-Ea#8Sch6RMJ
zG(5tY`p9peUCsq4=0-s$-sXL|k4@N5AQW<UAXED}jPxKoTSs3$9hnC<pIIjaBY5#p
z+5FCSl>n(DwO(H?=fvxDLNu?Y_T6JrPd~k+77g*I&OTcbym3*Xy=Mt~mU@Xdf8ZO-
zf@)@aEV_QQ^n?p#V`kQA+c5LDSFx{I^sz>k2~WNCJb9gVkIMg!(q|0+S-WmWy|4xD
zcgq5TYA)sGSXaU^Q`Gyr8XSP*xOcHF=Ri}hX#9x-XE>=n;0C)VI5VC~8)#u%#KGs2
zq})gPHx~diXpT$t@MJG!vUMhIG5M2-or48QX-F>lS{tJmaqe>i&3WgeRw9{IpEJB@
zPm``GF$`il$ulw`LLy|DaIUP7<$oXz5#n+_d%So6oFYh)z6`(Neo~yIp2VHeOnAA$
ziVwRo(WZ6~d%F4>dGifl!Gd-1NSjVKtALE(J>Klp;3g&Y+oiCa(WBK`4{8srB7B=X
z_9WLVf$Q>FEoT+Zp&$5luyc5fifZ4-Uc+*?9XG{uEWTj+AOR)TaCZA@A180h{q!3e
z!P+sA{fjZrMcy0k{kUt%`{`@W5?DMVAjl$M5_HGvqZ_vo(HeUI#*YN7?l?%9O{gxd
z&B;kW1>k9+oUQNF`dO4kAaRi*Ug=JE8+wp+eIG>#aI!RAW^gz#BFP6JZLKUU&F1i!
zWJ#qCgwWu02%}m&{hblwK@~+-h5QIEn~qsEWp-rqpwrO6j*XdAcfV`APQW_oPAOij
z46)StLqylFF2xq`ttp`z;KHS?3}O{u%W-jyX0gcM&%1epVsS;##@&FjE8d2o{_-xL
zTi|lWOozCm_tf6WliGur1w09F&$u@kN9&}3n^VewiB^Pe>osSe7e(|T9o)0^@ByD!
zq+V;1MIbKy!tQ4-ta@OB6F$CM3xu7UK{mnAzMh}axoLN@^;&rRPidojsvM4ccpF$4
zX2fW5hgUBs>Pu`Nh#_=s?1+4xnX>m@<y0nO4v11?wuimp1(Yc?l_lh2kUio+24&x$
zNAQy<!##lJ!i%M?_4re!>C{6R-*D@iM&`!B=uhgjh{lFY$7LsHX>;v<$l8IE)MlE;
zc1N~gd0j_hJ&sCbOZ3#y6?JdI$<;8F!8Z5LG^0+NevbVuk8AR*e0Yt3Uj(u#(J5UL
z8`#IXuZKV$&>s{VxwXWg)s4*DhEASz@5b|LjUEy_%`@idaJ}w>Ev3vyq2~eW8LL_5
z2?1rUkhG?Km2SA`jSTjz8fHp3+4g95t9?tVR;Yu_X#p$Uy|qf&kd@_^Xsyc$$-}PG
zUZ&0fFKKULxNRE5WUucRv|{{w9raUgUGLq5R+;f59gE9=jeha`*?msTbHsJw*vV8{
zF=zq)M5!gdDDP1g4gy+>VnVlQIa_$t1tDR$UI1AZvKc(PirNzaNA|B<xssQCPhIB`
z8y^@EvM8ox%$!y035E-+XDi(&gBF;bNrKd~smoF}p8$6WQmM^0ykQ9b<}(&Ff^|_}
zc^BZ@5nZ_s!X_%=wMmpml;iE5D>xXj$`Fj!@qvuoe}DS16@Ar~4<9?~Kh^o|xOW?R
zWMRjP++uydt!iv0si?l63fM>90M4p!fUj(vVN!lcTg|#3RgKR-V<Db~Y13!(b{pS)
z>ZCgO=VSMc2q%LMi3k5sUZ)_8u`CNpKS`yokAM2udtMo)TSvhc{IW6h(mhw*TPSR8
z33&<wDq~F8ftx_yCcpj4f*tf%GH^yCcdhd3Z#Q=5){^7%iSuXf$UCxh3x<R^3Q8@`
zjmkNtFN6y(PR3Jz$V<m<umZ@M@+OlWpPk_~musuz&KSGeCx+OkH7W5gPU`vPT=&de
zI^)1$tN;(?YS6S-@110sMe!YJIq$t|{2YkIm!BoT3{KNihS124B2HS{GJzSB1~O}J
z)~>ir4uuxjIn|>3HGNm^R&q)x>Eh%s*Xod^Lnd>^IXpN%h>e4w#zpHJolM&vu82$L
zI^eTAO!g$C?GY1fhp!+ZUqdV8Y$cic{u~QdTreRi`W|8%B`OZmXi<Ir-ZE)fY2mg)
ztWB6!REJVvVQ#2zTEk$!voh+W%&31{$T`zY+#%&*f46`_Q`UppgA0YMiD5h`j9M+K
zT{2;oFczTEqQ!Y*I0F}d9r*#(D~0%J@fodZjF%=ciQai--kPS*IZ?{3S>fP~89X10
z<7^~nMN4(JOSv5@Me?L`fp5e;=)|xB3d09Tju|Eyh4V+rOOrzG_(;F_BexNmBI_?m
zF04PZFZC)ezpT^<bjd)uuG7sJgWkAJ!~n%Es8=gD_y?61_*U<C3|_qDsZJf3q;G5t
zU^glJB_v7lVw_{0jH&RI{$QG(fkxY#5f0^X;X?~sBe-zp2Q^#fio5{1&syv2O63jv
z5(GawiUNyZZ8h4)$t3^cKeUo(=Nt>Tj|}F$8>^Eezfs+FXjvc)kGI`eFKDh7K)AV_
z0ySVS&n-NYwn|#Uz2TEZHY0G3Rhf>Q<FVO6=fP%pieAc!Zd}$%%fcmBti}zW+2hs$
zIzlm?5-~o}!IWbn&O7KTDGM`VvBSrGzKFVpYtf{Jx)_;fp2Epxoo{6na|N@1cD|MB
z#*rV(%YC2s6j$5Z8=g=AHCj7$&-N`6j=aJqvscBt{e@RWCdQq!gkmcHj1R~rLjrU|
zFm!(S^@seS&{$&<v#;bNX13A^RZjH1HkD=G{n^0>{>w}=emnS>Yn;xRcwHsh*k<}(
z%8bHc6T(h=hZhYu$1Ka0U6$8ZXL`Lk$@4{WB3_Z+PCW8U^=zFj0-(Y>khBDN-x0s3
ztG$w2(M(cf<SfxC*E8>}l*X2X$8#*Z&<WXsOV1S{Oz=8eTujzSsd8=y-k<hs5<g1N
z5)q2xMF=Eb$z#l8DD2_QTz#jzsv@hyvL%U|vuMNh&jTXyqLO;w1>0SkU`38Pb;G4b
zdWl})7NHJz0}EL`*`AY4Qa+hAc8&`xiG`N1Hb)*Bal4Xw?#k(BQ@D(E%v17bzVnUa
z);j9in)<vv*6vN!{?3brawdvXSVKJO+vHN#o<l`T!M*(>*b0k;*S9TjHVXT_C4KAi
zm~{K5{V}4z`-}J5^h5JjpRj|W`Ht<dbnA@darIbvTWL&k_d62bb1(u&%2{LcP}gQe
zoc7Ri+BNPh@Tbj)DT?wqvPc;(Zgbh(6i{vpOLgk%U@7DO%3Rx_Pydt3Ar;G^nbvU4
zDJ!<eyhgM|rNg1-c}1PbFvG#}c<+m2565eU##3U2v=Ls-s&(lX=Y<&rHp@>PmTV*C
zuK}w`RzjJ}9YuRuZ+@F|?2HOmabwNj;&X6-2k=pQfODyt^9y^tOdPRl5MOuVH@2J`
zFZHpC3dHu0ja`Po^OHpzu=Z$_ofe_4$@f`Jo-vOzA%e1tLS?1r`JGNbMkY8JT`0Be
z4v9R~)R+}QoRZ3&^!nF9T`_4A%68T&32}`?oc=x+Jc2v6W&CDZdw0wuLb!W46xwXI
z+l>#{(+d*>U1Nrq-EUMPgO8}FYrjdtNf!ti3^@Fc<FxR7u!{Y|Z+ObidfhqSh|Ogb
zX$2yi{JtVd#|2)h2AkW9zB}LE6ekU%V|AwTM*CRDIKA?S86NsgMJ>kD+))X&{_a9e
zhKFb1^0N$$l=aSjXv#y{e3AY7Hy4Ow?UxJw-sKl33lW9)kVS{psev78e=TBVY010u
z*phd(OBY=R<579_D+gMi^)idLQ`O&mlz>vMvwUDGil_cU_|kK*z+GIVrNhQV{M!9P
zzr<Su#KEVX4~C7iwIO+xWnK7t8<d(A0&3+Y=fq9C;hfg$CeY4h^<M&UMOo;^ZH64f
zH%4o^6P$hUyZ{|;qWKu7h#Bi_zFh-}r%nfovS>s1V}Wvm=>|du{ebzqy>|9DU+5Q}
zllQky$oAI9l{(+xm^z-u*IyimX1n=!)>n0F-L!rg7%euIKwJa2*wXVVrYC!A_NR!b
zzcmtaRL?W$uwIhYyJcCoFzH-9GQ^BIx}H##2d}t2DQ8kwHnyG12)9P3s*UdC^3962
zd&u5yRFAmWMD#NxBei5o%g|XP!)>y%XOg|Dk1QeIz$=^#^OmsL7Fh7NL?m*Hw*}#7
zx$)hGK=Xq^2IS^>OmksF=CDpq*jC%dM*{&%60>~v^FiHA^cn3@ZSz<ce|Fsj>~&s>
z`-v26Z?nkIfIBSN(kdUoP>t!4Oq^<=QA1t(FENvt+3{ixWs=k@D2;&Iak9T#-N+AG
zJT|1ubJszbx$C>T+RA37IE3O?_h@#c3D$Sg(%{qtxUE^rwddZ$OfrVBDe<t55wI|0
zG!w4vy6Rsgd<XG_xk*O77Z~K{AVrpC+Gt4hINI2tT$K-)hh=+P&X>LW8@<@|CFq-Q
zbt&u60|4J?9;4f-EKzn#?JG}r=g_?>ClKO!mbZn0DB}6ZISVuF7P!_S6GQe|$S`a5
zDlw1G<C;6x-jkI{qW65rrPT!NaI?d@dO;Z`COR@P)lF*eoR!J5%d1X1<otQqMx2E#
z*s+1HYf}2+G3qFjRZEixS~GYZ+?Dc=OStpjE@7oAp|7x;g9*oBAK_L%0G<@O^}eP+
z&-E>>I#a8^TjL0GIj)?w<9l&XRGTHx+fUbpAE2d>TY-^DQm{lHhF|(Qrs*X&`}GpI
z-_wu2x7xfat}K|WoiNRi#t91D`IcnNdS^bz(1llkv(|jKa1H^+{+WD<$EMkbGM0t)
z3cU46-^ps~F-xU>a%=9HrI~Z+1SK=-jCVluRs#(bcdigZg}>4V=F-a}I}NW2ji5tt
z)1#XcLrI|A$+k1}wV)!-l;%_E!M0weYNHK_YW1;EZKjRwBAc&j+VxKp#bAxW-(K|a
z#HB=Qf1frqZ~E|_#d_moE%cmNhOwfnK_b$+L<~{mH8=s{ED_Jh_?ab7?NXkJo>UOw
z3>ITbyw}t~hHVl+o*JLu_P91+G8o~hv}B+j8Bd<tmE#UAbK0;p2|(L|bv{bfg$KRL
zNvistbtW`r5|?a?Lz6q-zqw7llHAfQ#-qcA$lC9%IfK{&LoH8X<zM|prpzJv226W$
z+~}6_ugm$ea5ENcR;m!xxsg8nt83C>X<1QoLZ4Hn*!H$@)0R^f#(uA6!kn9+P=H6Y
z8gW9Q{gRB;)b|Zx=#hWVpjZ+0gqt@d`Y~wjD&DczO4*Tz(O@dMkJX9CJ&!ukD~vAu
z<aBy$dmCaq=accCuhx2tYdCMDOE^L4K(EO+yWscsp(ex8MFWVfd@@zE?wl;#+1_wq
z;iKw1#9M1x<P4RL%B;6fo3bH+m4^m2ZpYH9=+4{NhHg+TR!-g;k1Ba;-t==Gs(N8X
zu-T2?X=G(KNuggW;@jvmO$N+w)ktjHCO*Q6U4IHWtkhj@>0~=5JUu+dhH7jL-60FB
zr%h$U?zuVJ1l1&>ExpW5@Xze@QuALK4u&w4mUY^vHoi0%B0Zd4##@!8c!<K))_mMN
zGQH1xnFwS}zF<`>aBb54$<2l~@`f7u1{%0QDKWyMa#NMMi_`su4Rv}TOq5oM*0ry!
zlOBoP>fy9?RP=k3fCcEFAkflQx&VItx*NInH2ZFq%KPMq(DU7`b@P`4s^5ZaeW_+p
zPCCj3(0)g0ChyI`4w_PP%jGW1LG#q-vETAXda!F%vnVwCt`BsaryP|@_XKGehLywI
zhxM9|_MxgO+xvf**FK8=lO><QSQlRTV`@uAx6jub66HI>>=rgH(VORbs(X%~xL`t-
zf4JLg&f!((>Kx7M|NB6QEcShg$9@fNG4uZ#@IF`2?SOxlKJMR+dz2I^o$bI#_@t<p
zSZZzPX;9CzJ?p?c6>HknFCN>AsnFgP=Y&@-iVE8ev|-PP^K@5SDQF2|s-YD|8=dq0
z7Vdd(X4B1oI0HP^gd<729-7k{6~S}fl1kUn0h4+vwE;<mdecK&MY9v4{bZrE3+8LN
zRxU*5JoztPzponrnG+Xi-vT~;!axqheD@lRxo3WRB*zea@LJ;NjDv*X?Ulyvr0<xA
zCy#t(Y5yU(ATl0_&wC|Nt?GO14AIt(L;!23^$w?$i8l=RmFqfle8~VgV&g2lX89jk
zMadzaeP}VZwmaA_eOdyI+O>Q}7U>t;P7SMQ&>^+6ovmWY?{DP|Le*U@*9QGRFsBQe
zpk8y%f_lS*sM<#vifR^;sz;|HfCRq_#gqr-)!C`z<(U?4jct$mcWJD?-v*I7Q#99x
zobB8+3&<2IH5{p5aYI$qpb533Ldxk2*UaVdr1m<>@BLYN=^vFD#HPMfI6b)FxM!~A
zT`-@=t?vnRJD&j%G)_w_HcW7#ZA<s1)X)sQXgj33*Ux&hgs1oBclo_riVc>3_%utD
z&$0~Kx*G0uo70dFBv=N!!nJuTU)VSBmvkD_@B8tQm5cZNc;%%cI2$$C+U{s&XTb`A
zVC-l7>TM3nGd6bKRaq*3CvMtqD0JK@=YFkay#;CZZjP4j$bsWZR2g1L^z40)QFRHs
zEhENyC*_V}Q?L&9DKw?O<K7|6y@O+}ih^5G1df(qik9pcEZ3>R81*e<u1ys}PCfMI
z-Bw<htCM;{q=YCdXn~3DxC%={O1<(1v?OsWzai6FCb=wH?WCfN*Kwh&49)T(jT$PN
z#MvfXD-fJz=>$7Hd%DM66lJpN9#yDatWBb8G%KlIsPulJF>V3ad|G`t)#CgP?67z0
zQQ35tzMxk(8AqY~Q&$kb=ij%itILLwF##lEa$kxcie7)s{95;0oa2T?%dXaZ`l^QL
z&+5`7T2M~FvBJIhpFqa<pMmT)mTKzz?>of>-<@Fh7O$311?1zANE`2sTLs~*cS#;&
z6V2Zc-C;tuJg;1ApTQN#fJu0lBmSNiC0pt*L;_$J?#&Y;3pp8%$2~Iya`~NPmaB)k
zjU7o|)Lq2YFQ!)BYyB2XH+o9pb3++8@Z;MLbG*Wv{p*RB<9ZZ_!`N-4HZLA~c7BHA
z`6I~wQxIp6%<V$%=jYZ>@8xRId^KaYvgmPst86-t<5YI$_q&~}nP@X{W5zff!KYF)
z-yM#kU>{!od}JXOj&6O{z?EzcL1gMa;px>X;{|+j|D@s(QxFoap_wx53LyZ4^w^Hh
z#3L02C#J^B1C_&iF@CCgj&&M@{^OE96$jf4AC(5@<-sH3o-t;J3g47ow>x@^DXxmS
zHJ;Z{6;P)t;Z=+3f_`O0P%X|<A2K1s-a*K5Yhncw<;T9uPpjI*-5U2;biT$YmrQ5h
zJP|Ab?OdnU`nW)cn~4fi&|W0t{67su)DDi5{i|<e>h`2bwvltUtov3nJba@&c5Vw&
zg)yA@+A<4vawA_ybAjlP6TJAWH+fF1Bh4`BzNL@d9qd`TQXU7=rnXCqPAGKln-$59
zrH>%_xk0%S>cxX2w0Vz6JpMeEQ)Py&*0aUjagODU77jQ{ay4|6MoQl8b}|ygM4NK+
zc_!LM+P+?vJhxH!1mx(vUX0^PrNYgVlyD?`ohqX!4Ah%e@Q}bVJa}(Kj(fl3CVb)~
zPC(_;Qwmk%DXxz5poI9ShTy)K)HjK<>pwnTx;9RY+pO&h*a%R3NA=qkOQy6qvbU3^
zgvVA>PLW}DYI!2jYPN6#*+rT0`Hrdqckh=aaDGRx)7X-~6?1}mXTIn!hyCwS6P0^2
z35xi_;5wa&-}l0qeG{v~F~R16*`qqX`cO$uo@15|oCsR~n4q?0A;AX?hW+1v@hNx~
zr1{i-8|y(e9XVS+ul@C<@Q(8~XC(DBUY{qv%}j&N6lM}sy+lLe-S;6$i9PjzJN`p$
zLkGLKn}xCIyROXNZOZful$B~p*iqNTGwp3>8I@-R)&ckW{SFm<sS02yjn}vIGnSj1
zW|<7s_yzS{;_>H#zo=E))F6=_o6Kvot(RLi=uQ04)Eb3IcBFr}?cO#~Y&OAcYh-hx
z@}#+!H%ANup+fvP&%6rfHY8Gu0ezD?!4VG;?1v}y3B=-xM4IFS;2yof_GGoA+1+Qx
z6T<Lwd#j(vCE%UJ$o@~b(bkQ}=0C3Q4$89)I3J4lmXUMIsnD7uo(K9K6H&0l;}UYK
zQwls$UhRrVyII(%7YvN70z2ah{D<!J79EJdwvGV#qGQ^bJgIF;Csh}sKI;il?cr*0
zIeT3=Kh2C!g~(gP*wRQw(Pq7VTEUw)%sRGnU=4j{osGxU2{fe98rj+AP0}(DfxhqC
zs?l#9PdptPIX@doQeGY-Ih`_Zp3FFW0S71b2<uUOn3~CY&<(L*HHQfl64;ap4mSJ6
zj`d1aNFY>W+w^;@bdcKnQMQNYwsX`%@s5#p<lM7Sl+TL;3kH#`{nv(9DX1HTUfeFC
z<`Vw$N-7QQbz2((b!j4@I)1hZkT?QYsp0G0&RFe_H3HHs9qn^2j=>Q9S?Wy-cD#GI
znUj8xE9~v=+{TQ52z*{K;#ZUy36D{smL|udm0t^-czD82jgyR!=I830n`o@trUV+-
z&*$zw;)dbu=!WX-;0EE;@JG#b8-ITa`_{}Dc|L#^TN^7OoU?27chVaSEOdsQ5AMC2
z53Q?`Qn~x9lE&kDrrF&yv!hByWkG>0+D^;vc*4PcPTxEo&AxNELq1EjJabw<y;>75
z$pjl$Z<;+f>TlJZ^m_^LI2Wzw#V{?iES^SE<0v)%HiGsEik^S*YyPs6p!?ezmGI%Z
zRjrW5LCU?YbjmEH;<PszteX?U_>u-vg>oCY8|0D1(((y6NfvsW6%E19;hXqSC|je_
z>%~!LYHFSq7%YKmF7Gp+Pf)vo3VXqK?1p?!Qaj>vyk3^RL`vW%=hai^BXI+X#^-t~
zv!W@fWi?Sw-79no37Q>k+5WuW9D9{VCeBrp5lNhe2?j~%Q`jJFj@T$}CI4j7Tqt;3
z$0IW+a)#$FgRk6cuEg)RhR5aG)(iGDl{ae9K7K(Tj2l)~*Qw-eKkGH<H*ug7T9RrM
z7N-f6Us@P{S1sO(&fUt^0{499#MHP|W#h`!s5e3U=6dj6FzUxZ(t(|EdBWJNKFT9Y
z3}F*EB~(yrAXsj$l`)a^Ac=}6`J-Al=QrE3w^c<?KKyI79tmr^7Ovq(>B<WVDiu<u
zT-hE!7Nj7wI=_rZR?RhlyQU6QC`EMKA660f$7okb`XyCq89jQE*F{gFu|dt}8iQ|y
ze6TpSU|KSL0f9UZQ4r_K&hPXrx>rkt^ZZ_ffu`h$CMF%ycxrOB{Chfq_o24C#hq_U
zI(>^uVk!?W8R2dTSHB@=)rMitMhpG?9(NKx7+N#zS<^8^t<ozex%Y{DN(f-2rJuOU
zr-my(NeU`YdZiYuWq)%!JHEC$L_G#i9??Kq7@t_<dnADA-F>haYOj6q!}`08D{6SY
zKhHfcm%!sfc$fJrc`dXsc8j}`JlelotBx|=CJl?|G>_e?hF04TFemJ@K^+$M3F^lQ
zDYSccs|D@6t_&r+@Ix2m5n_Hn-5OlNx#?hM%&`V4L<;+<TZtlIRYzId9ej}_vGUa#
z`-J-B6Q<>PtAv197KoC~-WuJQu$EQRS-DYf-6k_lX+cHNRy2B3VAog{)2MaeMm)K?
z>di}oK9XjM9MO>K@U|_Rmr44nW(vs&T8SuXXk|1wL$_1+F)x&@?7D?qaU{3I+VBCt
zBM>s@K|V$HWll2fjE33>q!YPNZB?1O^LzI^!x><BAe9AY$EL*4^nmy!(^%u*WGfbn
ze{+b=GJG;FnONQa=A}q+JS9~f#9Mk?QA_@qT&@qFs`^sXcTT3zBJjXcnRH81?kRbH
z8Gdp(;HwZ)Zyvuq21vREcf<Ql2Un(}Ya%WM<IRXKPOIgt=U#%=b4-PVHU8Xh9>kV*
z1srgi+J-IweY!irtKqN05WI1(FQ+;0X9|B60E>|UtHr6d-T?vzt;@5B)+RI!76QW7
zjCCcIO3M_!#La7^9aLMxzbFy+%#f_Om|6t}jVsp=C#mQJCJ>AY<F&e(3X^n}YdPsI
z5NdIriqjK`pqhEU;oGSmT*Qc($8>R%FuGpJ&EHBY3QX4tcyTwyXw6lHQLY*A*bGj@
ziXVQoT@B4N6Y!_6_oAPi#gDXGytozm>(CkRzR6;ByrhrYuxU~~BNa7g?Kio+Tc7A_
zy&Y50tHW>u`_~jdt#3VlFgS5>Z>IQOG;<vEu=G<BYRQ(ss*&>vw>=&zdC|!D88#jX
z?+4wP`uw25Jb7%4rh;Kg65Z&V>3>gGuO==rpvx7`PYf=1RIOtNU?dZQ7}D_EznHmS
zX`8x%CS&m|0a<mNiyra;y@D@kMLD8vo}ABm+?D%Mr8YE(Pe%$Q4n}6D5P)UBZ~*85
z&epYLlP~#K*U0v#*o&una?0xne^Xa@tV(?8;qREfe5!d#$Javr{tb7Kc#b1Gu}LuL
zh$U6%p+ut>^KXvDZdX&Y@Q1xMj<xphe^l|r&&Ztt<tX{1QtF5u^yD5IS-N`~@l01H
zK=qSJx!~%7N3dT6#eAXJVb2EDB;F$yq^}1aej|kF05EhDp2L84PE*KTzGk_8ncBp~
zG-bV0Ps?sdPhY=z8&8g@12mNZ$OiGa>6-{BefbAkNZ=r<ou@*#Z~e^0o$NtvYeeQJ
zhI1za<|n3*)FW7P7g{n3etfSYlPGCNIFqYNO<0mCd9$8DLb-Ch?5U%0i`)U&7T0M5
zsF1%|ANf6?pDjDPHl?7n74#b&n901$s}z5Q+_2HUOcJq2d&0QP62*Bz^)k?CyeP)~
zRhrF2qMB5uq~@64LS9QlLLotZLS9B3Mlx|@4!9kI%RGB{RP_$Td{+r3Ey6=@h`DE!
zLVO-{I<bkw6Xt-+N7j^bAfzcNsKy3y;ITTO63i#99g%E}lAG_j7OKUMk+<>x5R6WM
zzGc9~eAgLLEdGlNfVZLyUy)jj1908FuZ{0^D0j&=k($r)sSKSqJRUfFHir^4S-E!d
z{dS~1f0I;KZ^C2qCYHM#SWX;!u8#LAJkWgsNRf@FKaHnBiWxswTI_v$p8;={KIfPR
zJ3LA_dRh_phx>j$Uf7^?cMEaIh+JMsc{71Y`Dc&$_OKrWLPQhD0c1eBs2et4Z7|=Q
zBYJxGUS1T}=Ppvd?d*np%TarUJ62mAKN>?`2=ofm4D4rqNcB<FFnnJ+t6Tziw|8i&
z0`->KJMY*kVwl{;TtZw}(}1qJX=yJ+G6KF5W=q0)>t{I~co0SB#&NWqUB-+=Bs`Hv
zuD)`)ozC67K<!#hnWlc>*IDrbJ&z~m3i=X{+BxBjK%VJu8<S#ap)W){hPMtWdIK7h
zVKd-TV8yln>8@1w@OJYa7L#!_U-rXp2ixkR<9gDnrh)1IlJP*n>6KD~dAH`TNhKQu
zD5Hylj7aIrE%gs_wdPNr-MJrhpSbqnOJcGp5U10tNO=kiF>j4M>$f`YxWAci#I&MU
z;+(Cw%8d(~ydGs!@_js-$F^c@_Lv>^)6oasUoub8>Ws3jF6*#I&3IqkFP&MT$H?P>
zXZ7NX*&8LR*c;IQ^o8q0X*u7-ZGS&8rW<n|kn}?7N>%1#aNR5|l-{ZV%^p$i+e`m-
zw!`E*Z7e^9|2eDH8{JnJZ~u0z?DdhP%u#P&(OCi1FEnwK1a?E@-#IyXrcJkjgpMac
zuCEuclBK1k`vd+rwu2^<Yzc>giv{t#htof^yCnMdE~6Hj&=zq_GMy$jW6(-yb^O9L
z^5y{w6_@*M>XDZ6-8jT+cgt+cvM1>S-v(sXl1kkx?d{ob2K=w$&inS7+i*=5PX!xr
zWF>SKMeAjO$>K^BJC1g~>A$*J!_%v*f@`6PJg70bekv$feYwPX!^mUQTD5aBuHVl0
zIKIcmQyLU=XaGBBJ={dg^if`Ks$K8dUrxoR3>JYRUfd2^EJq8H$-mRl$=)yC-C@1a
zxp}g))8uh;6?Zn_e^}&xW%%rw4D4VF)=i3tJsg0YiP&GQFefGZI<yoPrtx04qt7pY
zW}`4Si<Ia4`6Km2mYB=i%F2<RC(hZc(3=|Q0rF;b&+%uSy|2_?&6!xBygnye7@wOa
ze?QEP|2f}x{G9)3|A^~c=)v|RUg=(h816c@Ny(3Y?SJn=Ia>Yrc|KkT>*(xU8DSYT
z)7H^ZdC2=r2NP2yoxuEOdbr>JAp5G85^qyd{++yhNAuNSQ}ao@d)C<`NxafiF0R`_
z0T|iK&pkaQ>@s0RW%c!+fAszz>c09R%5VKvKtV|Z5D7;Nq@_CqM39i~?rxDBQb3SW
zV1S{dks7)gl<t(F8G7ihVHmit=X?Bo&$)lWy}uFz^UmJSUe8)<KT9s?XITTZGn5`7
zR9t;iBB1S{_ByVm!{E?GK|#UwBChgs0XFZ;-lpFIgei9uy`s&ot6Psg<Juq&mTJ4a
z0$tKqR=UmTW9@yg@<uDbpwQWd@xB<OcS=G+!pT`@XQ$WJ%RT4Kjt3uhr`o#LslZ@t
zjJwEAqer=i5<xQ<e0)YGl8BQ2JNq;E>z_Plk(}l67EINDBuI}|+&>KRHkJ6amUSJV
zaPmV~>_Mcu%VHx%!B@i5hu2S&gYwnNw^14<Nh?_YiQdhH37aUt^yy8jBIb$rJ;#U<
zUW4vd$(LP4F|$XNI~Fh0*Tz-uvMo8UUUyf$7tK@=tCNgdnU48rRyD=<rlzSWC1~n>
zw3h!)cnlkL__rkHr5$m*oIrm`i`AjE<c7pvg-1otc-yDHT?V&cl(hA=fk<%GBvvz<
zDyX~@fR_lUvTK{g5_F}NJX{NokIV`A{T^hiNST>D3n`bQCL$6v1RuXs+)#4~CYjY|
zdUij2_8`3)lDjIp&1V0cPDwmu!~vG~J$TgfTV57*v=(UkFr8CbJ<oH41*dU_riYc%
z&8{p8EUn?J*U9<nv&#{dluOFJ3O~P~z|gyRxl}fh)BOkSgFtM@HFq+uP1xVCL8U<g
zrQ?~LTV=o_{!|FXH|)P&8~cwKboFqv>Vf;=l37M^)J^jBEcEI09jg)3nu;djOWQL#
znp(U5QFl|Mvy@eTSkIP(qnNJAP;};_(P$f)p%?ON^c7CJK%iD$)_YFAh4sA(VbS9Z
zy+$&dZygaoRAi7RC0m6&D=$s<?npf^OnOusc`Rtzxo5xEcsli)%beSzsd>5T05x#4
ztZDze<)3Nzp}D8XdK6|9SNq><1Q>oOa;6C}b{!^oE|*W|1x&;g!C1betBg~XmKSMK
zpzb7~TMicXi7pPAw}Z^v3480|e8_0=Ng4%)Q3S_aZC+vOPp4jXgv!e5{QD*S-rFN(
z{&V`|f5c0Cx;w=4|H%x^%mthOoT2$SKQ~ux%;k4k#fkuSnN{A6%pWH8ibk%8T#BE~
z0JRMk2Mq;JQv$hfd0WL6WvHtOd#gR7P=)Cc@YP}|@?BITuf=#o2fB+tJ)ajDItRp6
zN0BTx^H-34h`O_IZik<1a-NslGP5)3$!tsE?m%s|xVaH~ycs=r0SErb%NBAD!KI;#
z?G-Fa14!SP&gKkU3f>c+ybyiwK)~b928D0YOYzf9^)F*epWp&NUdDl<YBEWto~w7;
zue3#efkx;K{4Q8t!YrCy!{3Odo}2#M&N2HwbKlbBj9$?%bLlXQKJM8Wva<P=^`<Uw
zWn6R&(znTsomp*LEQbl0Rd%uPtMi|M&Q^*4ISj9TsKlC<P@Y^j7(wWHeplK(JFiru
zyrC8E){Wo-GM61eQ!3C0;#1#-JvHiHJ8~e;X)4_6*$wTn;vcrk#~yo^+VQ9K6~J@J
z;*tjD;-%c0_bjuE3|wo*FQH?RA4WF2Wq$uAC%q-scK%&Dg|2Z2uZLxSG28R;v4`lS
z*mHUE0Uu(Cn_}lzW{(%Z<OxjBY2JhrXsH{02Sr4ox1!09r}uBsvi1vlv+Jzo)5%G&
z;xv*y0yfeU++?191#NG3T`rq$B}tvAyc+2T;X!U?9LrQYwF)tUgni@*ClTWoFL8bn
zC~LrMrnB!ng?!l)zuNoKPQDXH{0t1)P?6zVw%#0$SGT?>ogyHdlwOVqdMYh+a^97q
zbwiO3TEVQiH%>s4@pl|?hQWVL)fBI_WT2gUD5=^P`(9KxK$3<dknDljFe>3sEchYC
zFR4p4p%So0gL%2B<>f}0xWYoP#`L~+XEra-j+gh*`1H^S7z)$8t8~tN@6AVNR(!$;
za<6wuREOie>CY2VV)zJiA8%7XIKMmhzO=2a&_**}omo?hc3CPqIy<!aC9!m_A6NO!
zPdeSjmh~=7?K=Jg<ys_P3vh4A{TfZR^k!}QU65U+TX|MzIYHmHRQxO*rmgeI&PtL&
zrh<6Bj8?#gv1p|IK4d&0zv=dnKneZr-w%jtnOBQ$bS_;L>k^gMJx$-cJ`1J-RZQN6
zJOo|u#qnkM{WFQx?_<w3FZthN94HT>xjL|hT-Qa~{cGX=3yvP$*mlA0;*0BaT9^L!
zp#lU`Rb;Bxc!UTkv4yrLCY1K20V#;@ct(p313UDQvaI|yv*!qNZkn?x$HyZ{*KoF@
zvidDP3cWFWU|OR|XF11mQG7{^|B-vW3nt*H9&UsQW$Um%sjqZ<HSj`LCm*JV#%_9m
zVpNs2H}P!T)S}_+U+%ctE%B}4`>{?Hw{%VWw%x6oK7O=yX3g#3+qZIpo_82>*!4>}
zoz*oecW_{xK7TU3=4L8YQx!OlA7pH%kRsk<=@8SVXm>CCV#s3eUay{B$^f$EBYPG8
zgQ+Sd($-nD-b{YbQ_zimV6x1RZcu@d%<oRW08p^$R<brGy&634$ZCheMF9h_uVk`n
z2fFN2uRl1Zo#ff=BGNq@xqUE|HLFKwo?=|dQ5dF6KctvUqA(E73e}bzL0XJZ%k_e(
z_P1x~fyEkX5ITgF`*{@y7d@Y?T0TcTBTr!VyGUalGlS%bhQ=9@PJ)#y8}6&lEwIoI
zMhe3*FKu+s9+AS$ttWHQ`vlI5&9OCDXs`F5Exy+Po3&PO9JJd4li6);PkM?eEDZf{
z*E8b&)#(JX-Ez`R8Ysz_zanOCKG}8>J2T<)pmg)04W9>#BL=s;jn$oeC)Q974#-Fo
zEb-Rsl54cY+|E?bnqS+oLyJFOe_AyiGlLLfFNv10+z#cqQ{PRIICn|2K6#PyFf=9I
zWi|M-o&UUJ%)h5IOMV{szoNi_tvliWtam%-1?;to_GVG(fTCb&+8rPg`FFK#)v3ny
z{Q29^sXH=V_u58=9zoXV0BlY?kQRynL$znal;oH%!J$5EiQ>06daKh)98^9$=C9i2
zZogkvaT^_C+kB|@93OUV`APP7EhRg<7H}9_+!=(&tzJwy&y*=SR4&1=`6{7~y5Bnd
z9`k()zt=~A9OB|z0G6(<giF1p{V*4&{W}OFX|uicR1p)aiZ&4tAVahla%@-$q<fsg
zjSCz`Zr&4LD?OqI0t+Kgg@Ngu7fv!d(6hycUhjeay45K0fnmSyva?%KE+MUgNz^XB
z`@yW>39NV04Ixsyr8^yy71`hKp3?Ht3Rq~Tx+T3<(cDpEZPsG5LiTX^_My>NbBq&1
zq4DF^*N+IY@gRHt&)P;?JEyI0<x;?#_uI>ID+~`-jbb>s7a196CyzEb`>MJ9;C4P@
zeAeiBeqheBl;M4954-4QN@e|BmdSx30Z0AYUVBkd=8E$9Yv(AEgcEPkeX6KmD|x3S
zM%a?u47mCJ@wqL1EmyflF|k@V9wN!=y;cX!zus9`M-c-RReIG=6qQd4!gBN($K+<c
zcRA<AA!+21ld!$xXh2ebF42Ad{W`jHE_9T+V>HO=1$GK{D_(;s=(CfFo_eV)TU*Fk
z#mzv|2?^vpUp29n-?v<pYWH?KqMTgoyjnZSMgRQCN+&utL<c$j@r-^TQ0>=6y(>bi
zQN1@NfBCb?^*g-YgEar;39kv90r-Q^Cfp9AOBL@KR84`xh?oAD-()SKniAiK_0K<E
zJuqD&3c{O%d?S>WIOse&7;~~>D&7BA9Qf6BoTcA%SmSB4^`u8D5b)`>^{V1Z$H18w
zS!K?>-=n9auy7{O-`Z*qY0Ee$NtwX4_<Ea_O(3s}4v(Lfv2Rf;6}Ir`AYrrk2<60k
z%K#<Gyh$#-3VpIEWi5XcJt}w2N<6!7Rg5h&acy@!iHq%+a!9l>K)JLGZfMkA`O>0`
zf#^h>2W=4kYlqO)TLPzrCKEVP6ZNr&LW+L(gFZj~O3<U)5#k|5C{JO*dnH<8;y8V9
zbyjg~P#Lez^xl_#e<(i46aEwu@PV*2gQ))8u=}_!+bd>`rG;Xu0yIUe5QWAj(Yl+D
z0SBW1dNrUB|9Ym9hYVOxP1m9s-`Lo-wna-f1Z=xs9iC|uy%j64rY)qQkP)X*p86i%
ziFMNV_vlu-=91qsl){)GPX(#Z=o|F3*b>EVgX*GAu)rJ0bjUysdBD>wCM4Rf;-F9V
z-3p7(teQ}!vJ7|>)c~Nm);&YV_JKl(k*?3bEb?EeME>BEdA>OVW?8+ScfXLKX=(g>
zw6s(IC+_@=GQ4|!Lt7=;#C4Ku)T-x|IuNn>oX}T(dQlJ3swzG9SP?@%P|Xdorv3Ip
z?0=HM5A_)>>n6flZJ|5~?$8j@bs?wlPU8s)Ag9U1q|@azEw+MLddslV(B(Z7=yq}O
z7!W?&Nx+7X+!9j+kjremS=HX)P2;;N@$oFJ#g=7VTpz|%)l02y`DsONS&dGelo=|1
zVSoex=$`5#7V#_57c<faLO;=C#rZ)YqgZK*yaS@9X)<}VQ@br>Z9U`Eoo8a%O5g(E
zm_yx;_SogM)mUJI&Oh``>J|jqR#F1-_IQDO$0m%qnGNsa-U+W+(dx1+c|}ABDH@K#
zumgSY3f(QSfdLliqF!c(Q*PEyfQkLjQRFI~Dq9+~1`i5+8bsX1L;UwO6rzLxlma~S
zEG|VCLA|r4wpgQrq%@!;tcix6wn5V6A&1l*zY_5x<S4Y1!l>{lPQ-lUbpz!=pMKZc
zkfw6QtJ^vn)&N**Q<tZDKo28~ddL)X8_s6WK9ZcFzVcA=`9~{sg=f0cjPJhs&!wj2
zB_H|?jjmsb;OhDi9T}Yj0GV-r7;{|?%5CTwvT*cXZi?4n$x`V603nXotq3TCA_JZo
z9wgkPYgY=|2D`~RC<s8k2U5|eZJ{+_Xs)1)ZBN*9vS!-Y_kxc4$0cPRbUd_PFN|Ff
zUwgEB3n_q9CgVsl2a7S~v0|nllSxKAq_;YHwP^a;RG#NCGODgTWR9PQ$LnECc7}}c
z=dq5E3R~$mcEuULf!1O(SI;f(ic??nF=$|VY~W+iQ+wc3^_?044_5G?3IrE|-BHMs
zaZz~|3Tcaoltend<MDONm$c8oVzSkU=-ThMnRR(i8hOHfJ4#rh>)&;4Mwr!2)b7t=
zf-=kAd}UPDMoF1}af|K{>>6##&YUZ|&XHP8&tLVbozL=e@2FKAl)pfJ2C9(5m)4um
zE62X2?PmEKp_yc#5;SdrLs>#PEiY#j*>+P|1;s+lOoE+*0DY)pWv3<h?rG<XzT>19
zI;Kg~<;VKRXS4pJWvBxB>g|bw@3Ja$Imxs&vrS7gPinWvwnZJOKvk^$tF1P*J2E!8
zS@40quH;%tAZgHfR2fL~@Z&T0!YNE4FTwno9leE;{0KTNi7PotxF7s8?Ah`yGDF<5
zDpi-u>VQp#0cV(nV{fKA`<wMADhn*8r}~7+kKMMxzlMkN9)ZdLWWPT#tH!!0728L2
zwZ~L05U**I#YweM{UNQk>I3yhI-VSeiDU05c)kJ?n^```A+}Bhr^u=nhe?o!I_Z?A
z*Q~stXe7HhN_;QDfIa@-ba2m){}uai*?+qdj@^D;3%fh<J2SkKT&mXBA{WXaee_sS
zBir&jW>+H~vbNaTr~{PdsYo(uibM>v(fB|_gXVy3uL}T(^t7ZR`Y1wjp-YD)ffO`C
zK+F74ynBf@UPA_0IOLE52fToV84>mYLEr>?wB@1BYikSjWh~xm3i2~@FZPOINIB5x
zI3yw;po5oebcqvg)UV?OphF_iHeFY7IIFzSADn$twNiOqY<*v%mtnVlSKHt>P?*(k
zx|&eF#jt6Kk@n=!8C}xcJ0zmue-)<hEB?mDNCH-@t72I6sP0}?JZx#)V~&fnuDn#Y
zO*v+DG)!zAqS#hGf%~n4X2&fpc=eXdo<?8$=iE$1wFyd~T5$6%%wS6g_|p#Qi$2)y
z=z=t-n@~Ut`}G_l3PY|=IGawYYGxM{s}Y}`3r^pij`<0km7+4l23Ar);t^0Mk~zDk
zYAaa>(Ug5J67=u_GBmSZmv05YT!B$-v341sl5P~zuvN1`n{cI{<A?hiQP`VKatY^Z
z>wG9?cbd)ru3ZjwC7zv}Ci>ty>2fs}`@15$LQL{quBC6c>y<&v74HDlJp5@gQQ3~r
zI?G;~A4|?wD%n{{9&J9HHq$Q=mac7>%?MNRJr8D`0OGnv;r9)fcV+axJRx*1xk5L&
zaIS1_=xa1EE7~OF75?HB^tFPip^Qy{+14XmylVxv`#v*eea<@%`i|?K#kugfQ?dJ#
ztEw)xMf|YnDq(>HnM$*k%EE*+A$+@f0P=ZJv%?S6BT7rUU{Ntye2x!eqh7PazcYUn
zsD07_Jx!%OptxSM@*S?Lk#<`!58y0oT3>H`Tw2>|U~q62@sfP-LN0Mr6kt>mLT0SI
z^U*dSzY@n$pp%tj#wPdV(0aX-mdMPiFhogOM<=^n`%*3AS)=@@xdgYJNj0@xONJ_=
z4{f8|{9r`#yp<2?aZtF?6GP`i?5~6j(bCuUr5f2yU~%ec--PCBEMtlK?;2~!L%<11
zf5dHElDYgjt6pu)XHMK{Bq<9M1Y@_x1@)KOc$eMy&IjMLFl)3lopixQ!XS^{JfYSC
zo@K;V|EFioOUk;O{Hs#!J@?&ZFsErbyb8pE>`kBwfvwLSSf`(FN`<x#gR%?sRaA<4
z<Q>!Ydb0qw!F3CoTc)itvE-j9$GmfK@s}sR`vm9p|MEl!+~zKWoTGLddli#FJF+y>
z7=6;Hzn~On7~<Kgg){R$$)$hjXEnFjO|^+lS=a?fI6`^s4@p7%L9gLeXfZqH?)+s{
zcA2?dfSK1wiB`>*PVBiHGx-`jl0;0IThuX`+C?E+psKuINf_9=Bl;Tk+PJnU;DoA#
z7cv#vYWkTvNiMTZ<Xj1MYQnsmpF6)-^@cs{BA~_Nziet4)R{gN255`~5@@*us0Nzx
zG4d1>@QxKndKC_(I$zMadoZ34bm1d)PS5nzH?Kvwck#5n>U#|aCH8pxqP{@hh&}&8
z*R@ZvwG=+<%N9w7F8q{@B{lP1=Ysvz`Z+CvoMIA34%k(5G`bI|vG~2>+w)@B`HaFu
z*oV9z%*{l7v+(Z!Y8cYW1%wC;mc%^;5~=p7X{v~khsatEydSGFM04#+%5&>_gu+|0
z5XM77W8PJ~uR{9{@`-D7KshlWAdwP<jH~bZP(1(Q`{GwUlXw^Jh6YgEkM*nGnEDwk
zd9iB^jA)hWryg1H+`iN<q`vefnJ<ex!9>_-iIWOm>AsdOd3xLTH8b(z^i*uPiO~T2
ztBr0;aO}28!K?`4^eJ&>)*z^Y<M>FP(#gNXjf2-VnR-AJG%XH#kpRaYTt=EXj)qMw
zdBIx^CCW`rI{o3fV_~dw4rrw3<NAmYG3A;jW0%3}DM93wm)FRmB=SDLA@PH@=r@66
z5Ag$GeEj?yWr!HyR?!S$%I!~&T4V+Rt)QC2vyIbpc`$Z)u3u3&?=71|yYl#VrlJEj
z_lKzO1+=|(6bU2|vN!V{@|X5m8s|)e9V4uFq}*~7;3RB=S`LwK_z_;bt5loikt6~N
zH<GJXR}wFsv?WG5x#3>oOW`G$uUZR5uir-rE}`Mo-ggfsRv%tq0Qw04<X7z17@RVJ
zt!{+aWJCLXfIc{DY!U90g_(VH-Eju^3px+;i0^g%TC$OEuzSv|i!9n=e57wLSSN<G
zm%Og{>rHCSs)k102d~)&-wX5>ftB4V>zP{e)SgBE{$^&`s%n^Gr}FA=t84%Czqd=D
zPEBZhs=p$T7(NrmID6smiC8OD2{;^|`Wg0%Ps^fli^}oZF`;oHc<`451BslI{=+XA
zM18YU80(;<i{Q;bGRCPHA{u6pVFkT8cM7f<)AfXI<2-M&elDE%YZ9!}eT5qbV>Dam
zXVUrg9eVgTJPsbU)%%T%#-U3W#HdU3I=MSfda`PM5bZp>=B}Hv7sZeDtz{o}xaMx!
zsh(}X)IgyCuI3|oAO+z_Z0^m$>b}VmJAe4*(o;=7hVr~I4IyR?F}nTApLeoCbA#M`
zjDyRh{!#_f{OV@Y<#+&)zf<oaCFa#fR&!KGpROFqsy$j%Oi<%!)iyijp=Wf2cK{@Y
zKdEduXsC|WJv9sO^rd2_Jfm2HJk)lw@M%1;TEiKt9#u9h3P-W@7Lxavwv`V#<YX|&
z%o+e!@sVjhVjp(VNwgn{=PG#iNEl%UCa-`Dl}~d!^`IS($Wtryxy-QFZ|B34fQKyw
z!wlw@vxieK>7?YcCTt07`KP5}meA;Dp?LYk0E0743XBZr&1gzkMNst3FNY4!8i6Ok
zC-v3Htj4HCKT0bwZ~eH1j6n(vH%Zr3K}lfA3!3*)>RoOt4v`xT4akyzNA$`U<TFrv
zikQ|~_`NpYqG*D+UIUue2(%-;rCsfA7^!d3_M5=0EWQdA63$gAzXW%9#G-75Q#F<Y
zC_OJI>I;*D9cR8f_89_gEDnWx#E0IY4l4gaGndCg{TDV<!*ln<NZMw(Kg5`oU!CD-
zlbQchitYZ3MrjXr6{F21kEwQ8vFv<p7)XW)p?^;Jy{?tO-W!Id%;g=39gq}J%5)+N
zTj?w&xj7Q`>D}F)TX-c-yk7i3QK50x@2M^}Fe`ED(9<t1a6t4c*q02}s__lgq7L86
zBf>bj9pA0k0f*BI_{Kpj2hm6Y>?XwUguaAhYk~z*=b>!JHeA2Tx)PK%Q%<P~!J&2X
zXKDKxRWeSj<%%;nCQe#^ndv0xv;RU0MHD(tQRg~cD{o-n!rCu3t*b}$gQ8phqCO}?
z{`p!F-r)P6hq!N^!ntNex$W$LtndT+j=45-_+~cD7BR6Za5lz@H%D_+p82fu!uUwr
zM~utn+PDAx1)}+fw9E_;;~ZxQyKN)2-s~uNu{kHD-bduTqafR7h$Iwvb%%&M!n}NG
zTNps;Mm8fl1&xUAmw&wg#bfIH)}t)Ss+j-{qwhIU`|fHzOs6Vvj)2!En)>~<fV{{M
zY+09OA?)kBw%JGkzGxQ$i!qW^`!Fsd%T-z>+0?b2Q?v%0#Qynw-$pHLv9lxgNPNIl
zszd9_(p+@5Da|t-M)gRZQInGG#|xR9^T8p4d({@=t;H~N{%MrwUR1yLj0NhllJW}+
z!2aF}iCA(S=s(XV!(s0i%IgAuE-#ee7IL<_UA7ZI4+)4t5^sosRCQpUJ6d}x5hstb
zc~@S5yAK>C^#7jH;w>vkioXG~z!#TrbcnLSkkwm8fm=W2L#=U7X~)!g3mR)E=A+%B
zIX(!?LVhLzu}Ha0%v$RF%+sV)az#Ac%h9X1kbad66IUMY9Wgd0owfLEKa{uqN0c>l
z0sOWtY+}7VtPrK!%F44nx-IlQeD&AYA!_DLp2`Ps)9JZ2VZrU0F3sP}hd@2D_lk1L
zrFr9ft07BO`r5s3_V^k5K3~nCM+%S~-ntp#<U$)-hDyq8(p4ENww^2jD7Q#5q1Iv}
ztCGwbX|;*GAGXInE<RsS)tViEpwjEzVdW?Iz0kz&_L$CV^n#Y{BYdY%@LZRoyzkvw
zPvt2w*g{)%Kpfc+FzwC(SQ|;mL%I>h=ZDxSMs_WQNP*6<$80x}bVGlo$9krlo=Hf7
zQ?jb$Tn^=NM(RxKCR~r?y3b>sm@ZpUBoyv`k#sV?G^5o4Mtpe5kxoQ&lchG|#X#(L
zyt*UnCz1(7YZ%=_D?08txWNZ-$}g&GU6*7dwRqU{=_U-H$GzYLhz^(VipneNuRSyH
z5n~7gulI$PRV*Y?eS{l$WRkU?7#=N-+fUan56E8>x=7qKMCS&m!Dw`acylQt8u*$G
z-@WQTcL~!ea6C?6jH}-_Fbr%J)Yf%2SB*CRnuU#}zx_R@*B|kQd`jP#1+wp^#D=+H
zYEZQFdn<Hs8T#f|@Q$9)Y8%)q1U|mElxkLt1lUccZjI>_>dN}l2@h532o~S*#@`iW
zq*}ZennNi?7fh>dUO>~!I}kBVIWRIroqFWZp5Hlpf#Tbf*%E8E5QnVM{&-Pfzi%ay
z;_}8+*|6l!(})nAuFJ^mGWC%PEj87Rv*xV%>-VC=(Bhw$@n#mkMrw@Ika<in4W5tI
zq|K(Xlu{CztvBoUbAuW*6`$6hh&h%-SnNg$ooy6W?^Xe7lS@celjexuq_R%F4%CCS
zf6~hIA8jI?P8CP_e|6H&1m<j!B)S0iSArpy5i<k2pLk7M!U$ix9)eYtEgenDOt=O&
z=L(*%90mgMG2)mUCV*2~(oPJXYvxp!h~@Bzt#_~Jp1hCI^JZsaw&ew01}!Z{aX+f<
zbuGxLLLet^%6WS{VC}uUK@mgs_WIVJhIj`5T`Q~p<&%{KCeRI$gN^`zul6c2J%_xY
z_8^BaRt0`F>^r<1)ViSnDL6rk*{-tCw4TQT467<KPw*<_hS?LH;-s|wg!NG-ss<aT
zqSGBSCBL3nos;Fd<_cv0r`+P+TGfG9hZ;*k-ETwK|MWnr0A^x8`}Gsh&wx&D+EURk
z(A2L}MY%XB!&M<lN3Q77C&3+1l$fL9vf-^d8o^Jgfh=<&%a@NGN)jK^{h=zic!37(
zvK62XZS??w6q}q(b(paG7DMNNAqmfE)hQ?#Hy@>$3$bEg%?3yG3Joj{-89z(chn6e
z>{>F=0lLeD0Tk~g2|y+u8<w{pk-L{b)nEjyg8*D72T!mebO0VHt&0Tc#<o}TP}$l*
ze7%#Ap7v%&3=1f%A5|y?v_6jd#DXW@uc<DEQA@L(vtvckL6QZw>VOWo<fUwASK#om
zIPcO>X4(1Zn*mJS^GHtCCSj@D>e377+Kcl5DJBGHCn8nI6NO3P_KYwMKq`so<1xcR
zu1aI=J8Q|vD&eMS2zZb&#-z=<g^YE+=sv0w{tKY3e;8oOmI2f@>#`M*yIl5awT_42
zi`z_X*hfmsPP)}`vAd}m#IlfczMk4z15HAxFt}(@mK=C$z*{rJ``C;{?_gBaYuv@T
ziXd4;^gc>dry0Pgv~nP4kK@CKj9P%)!gtdCCLIB=Z=UH-(@_&1ZZk_h<&^&;B~s6j
zV3f`|^6&WmEI+gDZY|Dk7ztBj*>YR#e_cWlQF3CZslAl3JmLpZD7GC2n!&s-tdRIS
zuXmt9hsjlcaZR;onul!Jc+pRAkGP|!A^H4bf2;lw=^q-kT?(o#hW$@Vi2{|0pc-y~
z+(1RVocJ_JJ}-(LpY@dhK7%9!giXO_(C|30I`G{!6grwJ2;$&dr%EMEu8g?s3tg4&
z>ei>^&T-M=l_%mT-yY*=eunR+7&q{<T{SfP#e!MeabaGe#DkKca<8s+3vHdCUlM)i
zrrimp0ou5ZBzg=Gsi!&}<1{!SV-b46^ipH=V?`0)#^6iSwyfqhf*JO8DfPKFTq8D;
zDW=A)oU?X<7>Vw~psb4QpkeZZp8Ch3R;(<2(cu<Jh8j6fLjeUp3&^Hs$fs?7MqgX1
zm#4+sNXbj#4wZdDtRQd!1GcVSXFeMcCf=HG`w#4|=Dpb|_6a>)w)H);O#N#?aB`KI
zX^*7ew<Rc7o+Vv#HnxA@f>)0(;!;9!$@lf1EF^q2*1Eu0-(^kzwHD0c1H1VzW--Bs
znMM4ykNZePa|PY#jn5$#=J?RQUIa>5$m$)e;Y3v2e`8Qo+=I#Zqw7|BM)O^c^qC`|
z6b6lL>+zD4vs4B3+>=4N_r2W1tjupybl^gxDWod2zq-RoDjpv$R#aEIoCcYQ{RWht
z3QkZBZIUwUfbhjzwq+uEt3d)Dm@R`9?SnoYZ+yDsrhqk3sCoHQ-COcDRLMk-GF<!h
zLga1lTL7JB(Zq-wlX<at@5lpp);#iHvuDKb*>LZglTSEO=cz)k{QBBmA+m2>O1e0N
zU>kJ(%@7U!9n$U$<>R$M<fO<aG>)Q=?=1n18xu}K4TE8`4U26$YD^*6ubK5@fPjrC
zox?`c;BoM7Qb9>|bX-}Pr1@R}<Y2IzTU)G-JHY3e9X(Pti(dC?!^JmPOL}i52&S~;
z3e6K0(rPAM(*oWC<VU=Pv)qF{UgDuo3hZG8>Z9VIFKXrw#<UxI^>@+d=WWi-J6rZ|
z%f0D)iHvrQExp4Rd+snmP!s;__%iY~wrsg&(_B~citjTxt4eL<Sa`qUCJnqs3U+mn
zRS+XTFp`|I86}s0*Eadakq}f$#<tfIXJIqQ*<6@RL~_zdKEm8V$l=}Qa^%`QUKCwX
zasNYBYWap&DU#aEbc##Fteb{IAx4-Xkw3lOC%HBv_b9^L?X~x<28^UXLW{gw>!~%K
z0hs<l-y)=PgnPwk4F7$`2&;gUPSjh9U*}n7`^99%^VD~Ga_;9;ROgJktZ9bo+@aLI
z)d>?U1>Ya#x7N}9s1Ad~lIYcLclCdEde;O8nQ_!->HJhS%)Zn6<U#9k(zQXV{zqTe
z_nc1PRk&=XDf=o_{Tg1QS^bfZI8kuKYAt!%m(Xv7im_z!EF41CkD!CD7sYw*<I2i8
zw~1&CYH1(HM+82;)+4uET;|5fO^7HDi4Am-UuiPwpt7ulyZ~QgHy&}A-wy?ds<VLh
z&kTDNnXk_D-%3djeecxgo2A}|brfec4#h@TJFXd4dU0|sicvyPI=o}%hY+f?u<||#
z5$KZwG3KZFN)tVu|Ng<9e^(z**T-cAz)Y`|E9rsFhBdlo7uNL25zHO>_R5^Xyuk9W
zf%&4ue}C3-qsjLEfDz7<@oR=f(`&WIBbAH$nZcL4;@4J92iTS+J>(TBzqg~Km-tuN
z{XcfcFw(~hEttLQjj=~746r8O2&0Qlt)%3%RltfEla!LucrBgLTFHmZ|52Tr+;V+(
zMasg$ayfO4ajKsm2dd$5JD>*oAMEVyKLlPH4=;GekNq|-;4?gb*mBh3yteHH$8?_6
zyPiy6VDjImFUT8R;=ZN1%J)AN{<Me(f%?dPmHlLdUNu@tNK`sFRYI-~wV+Fc#j>jz
zsH3v@&tm4=1!WlG=u^P$TuEf#-INHfN066_)G5^G$g8vKxELk>iXD`p-v;^hvH!W#
zULz<*@?SaPG{Ft>Ei?9fQbK7{r_E{Ks~PU!OftrJ6T;JcAPNj$++qwvV1<R>#IJ&{
zzmSPvB{_jHjj6)^i(mJc{>?Zn^<1=SwfGI4wO~;G4cW+-6gU1xECUG1vva-irU5D4
ztdhJNx0~D14E#Fhe=L{&Cu@%YE2zii)zGwWMxLk0dqsC%<i)8+QhKu2bemV<F6B)<
zH8r&X|I1oGq5v}Si=<r|QN4L_w>4U}fq_)dMXHxZBEL429uJ`n%pw;T0<nOVl8c9@
zX~*hMK6rJ@fjjVoZ$I_8GQ;F{SimvOdq-{j{+E-T=(62fP)rsa9u0$ES)A_9Nr^n3
zIpzLiHicgYgSbSop6Oo@2ue5*+fFO$Z9Q|Q#=;~f=o1aP+V3KciQJw%6B~4=__u&~
zP$~T<IQk$qTI+gu>~YY)-8>?-T;)`Kw<vC`B!FJ1#(MWY^XA)TXQ6+ruBNt6L|d?)
z<7wm4hj}5tO6;dBXU*1CWS2h5$ba*hhW@$Zt*_-=qzQge!@Kq`s;wO0N^1Y%xKwI-
zt9J6NX26dWSXC#tjMkJpBnr9Y_2y$bzaP~tcFTwS9_Qh57M1h5aB?fy*4k^Da;C$A
z5>%^{2fiLMRm7AN8b80CR6qY~TOHbp{s{cv({ve3>LJA7-#<Ty1R^}}TezlJ!+!_?
z*36UgC^tnnY5G;K0rIy;>#P`py3Gv-@!I_2%XCcn1HvpctA5eqwCue#_VIg_%e)^L
zj7)ehJk!G^)65rjDIRvo)quqT({H}s6or5C^yaF}6};k5!0W4S;h~|4lMao<$<TkL
zh>MIk{ZESc-9lSxPtNNUB_FE4PkL5Pnl3I^%o*@K^~gJcw$HuP)bj*9dpU%RA8k~#
z?%K;3(5Ktk4Rtxg{KX)(Yi67{fL~&Bh6gx$&+{yNyT48|#8Jqf550ON5*WzB$oL6x
zM1qX+r|IHYf#?Omve58K+p@FP$xctRWYXty7eoK}Ju6O^5mC}(z()xHSx|Ec;OhJr
z^#A<Q?g!Rm7k%}<_6Fo%cG{c&2+8f)|1#MA(;_Tcjl=-~;^I1g7tLSP^zU~@pDy)Z
z)~~-`{?{J+f9rd{#vTX!uYvk5?xrsJjtnuta{-RUex=4R_(WzRNv=jjW(uU#?G&C7
zgOYIgk_X5L?g*|)w`;qaF+c6NMD@s1owpc}-=%P7m7K~<@$g$fFWWEo1$13v7WwGi
zBeGgUG8UB^Kv%%47(uibra&s(y4?)p#21R8iC`hm#zJTwest^~kBZo2h^llx`5^1<
z-sK23l3tB?2BV4({Q~?-w8&)rjJvYhMB$_Lnqv>0isUYlVV?qI?hmIiz)lPNNRC>}
zUZcVdpe0***gCA(_4+JMw9)R6@O!D^*sM+6z9Xa@M@FX_5ZI+`r$*f8xrAM=n?+8c
znN_OXvv&2Di_2bv6C(ZRf7#d1FaJ9}Tm&zO5zvPZ3CTM}rke|Cc->1P#wi{%-k^Gu
znWb5d`fTDm%sF}FjMSUS3U4=F=xj4+iDTX(SELgwt{jU$J#(5WN~ZEY#QAC6AGQlH
zBaQ~;3Q_a^(bKOTzGT%%n@N2H(!&k2ZRV^1j(%N6q%lMDDH%X9cK#gbI}OAO0Q?B2
zQlSabUBim$wl~Tu-j6iNjQOxVPbVV*Y^NIkN)I0=C!XJ5>^x=$(!>8?)i)-%vTTBR
zowMGt0QWOerl+QygJ^ly-wL>XFip~SfThu#8<P3BOo{$adN|OWq;>H$YmuAbMF&0Y
zC=G@JVzhCL#mA+qvmO@Z=U*tw+m}H{ZIOSy#-BXz|2L1wZR|Jg0yJ5ynX)A!y4H$-
z(NS`X;Iipo>EZCFo=N_u1y6K{drSdej}HU0S%SLlA)(0{Cy*X)J-u8+Cv>)gDvXD%
zKMX$?FTwz0K87_!l<tM^&T3a3VA2ymh12^W<vt0&M^`%HwosBW-w$~MCsXr-Zhh31
z;3U^#{2A5@d_r_#B;$%=zpZgUcha}qtpJ4r5|AwvkpFq#Wce-deM-L52Z$4vvw-p~
zcxwCFr5+&O_TKT~4gNaeh5+V}=xivUvzsvWzZXM$vE2F>-!mVWVJBkrRq^Pj#3;~?
z&sLdvZy$(<k#YaiHZ8r2{Zl*ssT9JpOt)Qsu_9V^dc&Lmj_$G5aoz;iUEQRV&<?LJ
zsz7^A&;8(hNVxHyfKKC(?y#KQFpgd!K6K{lhtXY=XT$}3xJrC0rGad@U|GW26KM5b
zZRR!i>$`vq3olv!{n<&z4!7o0Me5OnIkz2-bj?ljHr7v>87$|+W7Al7N0QNvD0iL4
zkBtY6t5P;oJb-^@X5MLM+J0K=wO$IWvR-*^ZOHr*J8^F74Ly!9GwF2D-*TNO9hkZh
zw5dEuLr5~D=}pXP2d^Wlwe|8((AF)Tc0hLcnZ{FoE3#ll>(v$~2c?NqtRcmG%}jnZ
znz{Pm>SA)c(;JVRT*B`jz$tB6v2~s+(}|7Mz@zz&)etF&j}FClK7JE!U*{S;B+Qoq
z`{wlh1wC<4r`DhzO`C2Kd)jn^qa9+ca3Bx0TvP9ezQm@JZMCSS%%p2omd|f!OuGG6
zp8K9mT?xFMs@?IV+sFEZ8+-qX4nOz)WI#YXmL2`X)a&5OJ)pFRqtA5K@>uC)hzX2H
z0kf|@dU8*~a(u*R;2`l@XkhdbQLaRciFK4Nu{@)1>vFodgMUx_VG8(XVSuffL9NfA
zlek{WnOpv`;SmMEqdzW*LXD-0>pK=(V#{Y;@hg&}xS9Xf3z0AXy%R{8OH@-Y<X#)%
zP-V-mNIm_b4RuQezs#u^tVJo1pHGPSam(-V%3CzA?OPE8<`(`d9w~{hrbwdD!nYkO
zpR*fO9t?#|-RY|4rqB0ZdGM;5En;vPa4m*v<7!fZSr%8NK^AwioHtJb3caNPF%k)A
zoF=zL6Im0*`?<uyC=uo@3Ztu&+kHfERb9VK*zanAErFxBSinMl#C=sMC#4w2kP?X2
zkuk)n+MS>kp-@&|gQaAI(4YVXh6NBEEdggc&?(;yT)njpzw3%s=(cWoMjeJcmr((@
z<hs%dLvC(IuVp3f-_>iS4hU;J`6K6zHr6?ml;umXbAFcQSEKbJX42Ei6U@MSAzHw4
z1L!r{2f*1f+h99^66W6Rex**HfUed}a=APf)&X3YEMNTKx)L@aS*QvNQ_r#L2hrx`
zPVWkN6=HNs)5@#1$JP05t3t~q`p)2Ab;v9dG@1SooSSdO^L}#?pK6PtT_8OwVI<hc
zAaADq5=T63=$Y9pt(^Cc7G7^5FW{Q+J(lw46FaH~p4q_bvrfB|<Qvg`gMhtKS-sSm
zbnUMt@r6$Dy6>+)4w=vAWxuUkR(m&MODnJXXgp{D?~^={*z0lNlE8Hi!)@}7c-UJj
zwqt$b9`o<Taw_8sMq6SaS9Eslq8_P(@&EV>H{UAW4H>uS0#xCny-<Kf%i=7s9d0_8
zmhwu2#iCe~K9h~%{N6i6Wy2MZuCbsZ)_zHiX44@~DHLZLzn_ykcGgi-J#SqEV2kZ#
zDOrk7FwO^9z>>17;P2okcK9teL*2Uaz&DaqS19HY;EM;$B7lifR_K-kh8JVR131i)
z7t1^$FT`y4#CN1@n<v=Xnm>xZc^HK`h4<VKA@yA-uao5DUQBxI0kpqKKS%3kMn-Mc
z_guH=J3sU~(0sg^l=dNBv-}DxL2l?R`s99a`WN7y$eB_T1-Zo0<LlIii5T?Idm+I6
z<$I_J34XOYsF~dQ#@PK9yMGd3l}#%mM<>H>sUz&XVV_dL!sZjdqK*;=&cV~ZR{%#!
zQEl6Y)`Nx(n*H&r1DL^D;VbLumiVY0dVxTv6F=QgL+vlhT5|lqw%I5?K#@TiI%atk
zqP356va8kHmWXfw9~D5BJYTZj<=GZ9PfB)vlclU&0C0X=I|fAZU~T4p-HXz&cugjt
z`H$KO9TNvC$Uine0^+_x@)Dl`?`nY%vV9F+>(mpR(nZ6*QarKeyn<QD8F23H2RFrT
zYN_G?2PMFuGT;Iw2BY+da*Y<)TSB?M0fjAnT%v0Q1!J_=zOPyH03S$2#m(|hWJ$c+
zr}Qwu*1AB_XWf#!T#)I|LZ1?vd?$mLnN{PtJOyB8OG%UhWbJVhXmb~Mp?FKbTQ+0)
z+)nJWuHtrO9R2j`BSXdKYf`aO#O*;*m?#qMi_#Es2s8D^N4jTrl$HjLaNzK2v1+Sj
zEf)cNhaYHpSzieOB}1S*54iWh%n-)LT`y-VZ2?_!rJ^Z|Jk|jWMNxF2my#XFpUlo7
zxd83n)HUokISLlb`7Z0Sp#B=TptRRu;Ww?Knpxm9_8oWr!ZH-NsW5a;YV7f!>n4bV
zSio+m)`g%P^13DFNi{UXSPSVsgS|N1+jgNDI`VP(Z@VAfX$B8#zmOa=1z<t^KsbMG
z2p9{clTLnB0M};-ODM+7D9HhDgYIfuq#2&`PF!paUMOH^_+gQhae#xE3nK<Tu>*-N
zYk%1aZ5KeZzSR*C=-9U%4{V=GR1%qm0k+)~SK#WFlt!@SLa2oaLPa~vdIr|Bf*bI)
zfNYrp!d!J=u>iPZUm2-XZB)-xgAiB5mJ4%xmP|#OfCN>wI@q<ckFett2R&<!4&04>
z(s1D051#n?PVhcDB2N@*mvMf>*yM+QC@LQ;@Ct%ztCJ_Xx|*uG24L=Z6m;{gv`sGA
zCEH-biESwR&6E?AaP<m8D)Xy8Z;xj>phX;9IK9zUM3>naX?4rOuG0ZA@dtN$K7z$k
zlOGLk1eea-$@25H8}b@jja8t2s#0wj^$Ma45aakx7tP*_M>uSl%e}d2fyd+1kBm~f
z){giA8;y{Y&Bm;NzndCl_XvW((x99yM(z)V0hZ0#=f@sl#|FWh48YALOUu^S9(y%>
z)xS8!^X>my<=U@@=E1l0KF*=R*QM2MJ!k1DW=MG2aMW;=?ENQqf>!!txl`})Rurr1
zc$f8Mf0IA`g4BPZ^CI!`yX(}ekLUr{o_8<gE9?_2-#&}(UJSVTD<JYMURv9LU|Q3e
zp&i;JEk)=?ROI6;aSc|UK~bx)>IvQaDQw#{V0u^@`%Ye|MVyG_iT5Tqli^J$T)Mc`
zHhu(27wb6ZR7CF2;pxsv4HpUzUeA{Cm_14!O$BSJ&>fH0;&)Lz=;BSQS>8z`;h*PL
zC!i3;LN~uRENd5iiLUjtEN~L;ZH_pwQLIPpqVSP}+=>ebY5#!^LgGm4&S+b?xuFh0
zsx;MhuMPwhrtPs3Z|Ny{v>5}o5Hpfi&g4lI$$vz9#HNHi4>vLE7`6EohrFF6keS?^
zrJa7X_`Ny84Mz2?X88aXLL6zMiDGkVv7gj-k9^OR_xN!6_S|XH7?jn0+n$#6k-IMZ
z#r}y0@oJ&zrS{LC1!S((DGRA&(gXSxvrX3PG>AwgJN&MX8DGghC}xa2caQ<5@y4H+
zt?pqveL{2fq8HzF{AN;&*XlRqEAAKWjAmv~x;RK_w%q`YD5Q;6(d_TSh-p}&!<V6^
z9bwacCwpKj=i`FBv7iaRZI*lQX+7e7d@b5z3Y9feMQD^aHS|3@oDX=V?zXPSTU;jP
zX=}C%N|!Hn<<+cCYM&;|YzkwRbxWD4-MhBOu>tXjGNC<B%xYwmMXg8a_^m#4A~)ZJ
zud|c=G|~=!LygR)GmkeB$<=g3)OOln1|B)wX`yskG5<N8^~|QNoEw_^s4JJh-_WAJ
zC+;FWVRu)Cs;%8cF;z`p)X8&C19X>;YUDV^`faoTf^wHQV5c-@>mf1!D_uEJ%+|BP
zMJb<|=&$b-?=l}s`y(0a+^5b7{RO*KWy~=9Bnq?>sbD^1V_;-bYO|@fo^yOrgk^0n
z>St1?U?1=|a+U6_US2x!o7l|_t>!FCL8X@G2(#=S?ci_QOOCp{5d?)POglP}7Wex*
zpR{ZS2vrTfYIa<&qBl}ba^!GXtFTzauS$vb9l5ToMTVvBQMO*@Y!9?#&}`~O>-+g%
zHDfC>b&fI|Kc)Zm0+{s7p$WtoOL%9SR*G=>?8<7^snTyhN<sK&Jo?M2eSY=yAC}<{
ziafaEMk>UH^gCUX?(4oMZPvBO+Hd4B+hG1kxX_x{_eJ@Wx-OS}im`d$o)0m0g5J|%
zquv)_f#TUMmOKMNr*$5%&tw`&?ZmX3*7U*nMrpIz9kWS8kI46EzOp?J&<R`DVXB$?
zuMy!J@|9QAuD3vZ%<;w-CW?@AW9`(iug~D3I~^0g`)u-;bM1oXy-p$PH#9BjDz(7F
z>q}dq8)4vK$MesYUW3TJ!B8`s&I&G+xG+y<_>plWES1k?Rx3x1y0zQLjq1Yo)6oRm
zR$o}K!i*gxR}T`B60QGh#8D_q`yOfo{|2}ZiKc-}6-BdFR;9oTr*;Ka3B`=-s(06`
zPNftE(Xq3wBz?0QVZ6|fed1p_fZwMf9nJ}VW^_1_{OUr^3HXmRIxR^)Ea(w}L?4Wf
z5lB6VkbH!oWYrqNV06-I?MrMzUy3_3&#STTvD66?4PH=Ic5r<F?riDY9%wN0nNy)t
zPxX^@!y5E$24&CPBv<#wiWc|lK&(h_Hqf%pS2Ud!zKApi_qeRYn{7ax!fD@%(eewe
z7EP^2L_B-Rqe9G`NKC}fszX>3y1e0pv7@0<`CUWT5n;L}*Et&5Qp1SK7{LrIpJm6x
zCa1<wc!izkUieoN=%?z19@pEu7uwM3Vm6`Cm7f>gZ5I{5N5bXDn&UC7Rj{?A^+7jU
zFA*ymn)y_I*068a038Igf9O6V=`nrftxM8%>E6~;iKCY9D?AgZ0{YF9EZ*5!vD&NW
zHvnVTeV;}ofHnU)G6pR|QGDSCb8y;VY{LbRa+saOJ@G|^@OLU|q!FsmjWn+Hmoaj!
zFjrmWnDnBoe7AJ$Nt4I=`Gb)(|3t&ul`ml^<SIW~$mk-v_KcIDmi^*{D3$>-4RJbX
znyE1OdvcsV{q|oCZw3U<>z<i_dTzg9Sssmk<B*5~^flyS!GImc2mveqnEUVndIn}9
zDD9d%tN&%>l)pGkO(^~ONUVw1>#4Eb+^FMunLayAb~?I~bZ*~Bt5!On7rr@F`o2$V
zbnL>JXr^lUR2+%5FE937Ne*}rv#Zr2ujU{5<T&9Tku>OrZo|n{Ae|NXh5{XH4vO@8
zT8FfHdSI0T+!j$Jewffs3fiyn=RFfjCkwunDeh6U+m^Y2mrBzY%@Od+#fVpxNqQ7*
z-X;TAWR3S;utO5{jAAohI9$&nDA}Cg=AMV>fUa~RL3G{nv~XyxAs6eA&|^P_Cq~Lu
zoAA-uW=T)PHWGPdHwRpe|D?glQt^Alrre*6^cy9)yfWRi(@}$vzksh-tQ)AbCUw*=
z_2>nNt5_W`-OogvqNZk@G-?1NSS2PGF`&^Gb;gQH53U@#c=vk@HmU6rh8s8ND&t=7
z;b&9l_lv_t|7g}J%ooisRmInIW!eezv+jNfz9qbGvYa3&{{t7%I?LNNTaCrbqmx3$
z!UTl8BrAoSIFgS_yv0o^Uq#Rl?!esKW~er9JjS~IOwRYE+16~kC+#=C5o1_)_ze9W
z?V&WNMIYrFvA_+t{0Dfq{WrTo_I<7<tx=+4NRo=BP>MPZ9buBj@aCB^o%-cwSZJD}
zyi!KqSf~Ju@#OrBj`)MSu5#&x*$!x;R<arqwz%7iX11<o>l&{mfn80`o^m4*=OeJ2
zAB5l+Q!h)DLpNi|I1CyWih{wk3`Y_<rqVkhcUhW&yDSI?e-VzXb(+J#qO(dGOChRK
z?-u*G>S5aprK=TWPQRh3{y>?=8BTeh6SXBwn?Z>=-H+aUQ~mX5(0E8BfC2JXi+X?;
z4SqXtuhSbARRVy2*IV6Olk)+V14C!OT#)#6b4^E0yX={YH+JtWxFTws9QbK#*@fqQ
znaoamuUtx8&O{SQOJgpgpV^?D=Qo2aJV&*D_Nf|y-L#+rOIfgjS`nXfKV1?dI@j&2
z#celw^wJgT8Y`}JhQPMF3>^70<&58mn-^qADFSqNCzWOz9$G4mcxEwkjLP$HBeTY>
zCiqUqvvZWkVe#5=98R@jm%M*!IeZ<MPmUWc?i%BoZ}eJDs&xHqiU+-L_K&46;~{zv
z@t2MO_NeC#o#mogUc2b<2hd6*lFvlWYc>Pj_o^rEpZN)(I64;v^zSJ%F?!(Hn8|?q
z=6;0nRI|wxEi%TNI4fk6_Zw9Ht~K_Vt8kwCj_0|VLLc`MY==~fvHP=VHiJ6MomX<!
zR&*3gZx8tj8(T|OR&UPi&7b(O^&4@U|1h&|traVA5R9-`r_qe}8Ta#<r;sT+;3PO_
zzX<gmubzg$UVHdxl#><dxbb|XT^@sG=VL_`Q->8&hwV$xhY^23gYa7h-WZYW4Vs`h
z6mDk>-BtD+>w!ZEMus9=Eh67=f3Jt)6j`YisEV$saMOwcRW3taVX)^qKbY~meDzk!
zvm@MwZmoPNb8f3>UQ{mA3Wo_5Xd8Seeg$s2VO+Eb?4{VM#r`pzF)JQXtn=wqz}3A0
zM4^QjqM~xq8nJku?H3)R$P%4Sy1t?16WH>;Z=_gn30LG&;b*xF&lnk|*MmTCf{yG~
zltb4VG+X<0oHh7o?P&?=c|_d1E-m@Xi@nfPY+;_$<g0IUC!8Xgd)lBzw%n~5XRLvI
zz19eSjt#GGnVZ&VAjdwNZ}8o8Z~6rqB$?N79~1z3!=Fr7WM5OE0}RTEWk90}83$)-
zg^YRcZJ0F%PCQ7)={ww7==Q6TBm%Czu<)=9QmVbdnffC5iF58|8{{4TEY+v|_6fXn
z^v9o*BV_DEb@fYndV8fu<u(JupO!LT@&LwaV^2x7V#d*%(<CzGxY#zRK&|-IT1{bC
z<Dz2jYw+*UPoQ8`=Oz1}?Jh0&>6dR`D`l8E{Hmz86+heh#}v3L%?y}D{#kYhH(U|r
zT&Q4=E)dC|lDT+C`noJ{wn%pR%)dAXudfoKNyiXlr-=f;e#azDg;DNbS5eF?zWucB
zywJpLT!l_Kl2(PM224@MdIDUhC>6H+<Hg{uZ-QMsM1hAlM`=PO?{wzjrC(WgAGblC
z<@4<1Eh8-2IxAp@ABD|)$D7mrK&?I(f7a_|3+6_0?wIQ_DI+`FadPV1TyG6m{em1T
z6_FrVtF^Petex(r@7;C?Jf}f0Z;p;fJToCJ0E&klfA(9*=PJ3ZF^L;)u|&JZWc|)2
zHE6r|W??PKDU|DVDoh_z0i&+dtM3;y5_CMgIae4I0aTpOPF`RstLl_jZZK+52yaDe
z&W-RF3wp5|ZztvHxB;EX30)Tej#oZE9b`cy($0X~isNd?Dh++QICys3zC@)3WV(QK
z8?Q<&EC#=^Faj8kDg_~ORUPnthG4(q_f9o-&d`H()>-5-a(UzxfkL|a!f!e@TVV%K
z0gYgm%b@d-fjqq6RCgEQad`=M?@>i^#`{p@X8<#ImACDwD$M=CBFArj{+5DA?eAVp
zruj$FfgF9eS#UzQ>`UkzuV)H{FB)+)_Y;U@VwgK)(CE8wIRr|z(>#h_=`A&xda81|
z>$<JPGbCyqlQL5}>N+;r-FXJOL6?FcRyseRNgHJ>NvgNj);mm~W6_Yl%cx88MHNnQ
z{C}AH>c1x2zkL-2Bt_|(7)W=gfP{pEbcjedNGWW9A|aB}-QC@wG!mm5h9Ha@qsJIy
z-|PN7_x*Y9=Rf%Vw!L<3yRI{i_i-HW^L(E*%EIoiCLDgOS1e6_Y$|aT->O{b@akmH
zB6$pQGHX7VUpPda_s(I>6<)Mp&qp#6PZDU-qVjBSHO_+DkN`0M+|7Pxy^3VYqJqC&
zPkuxwUmiTcKH49?vGsI#tJKm>7g9O*Pu&Xn<!^u0tyq6-?A&?z(fq$z+-jW%6Y7{7
zS_p1>M3)$rs!pS?%|PCFma-|4v`iytLD#gI@;fhwOU1?9e&N=YAB!BJAg9cbUMr;*
z8#{HQPGe;52l0mnfis1gyk3n|NtFanwy6sZ5u<v_$Kf`4d355rUzOi}9H{a9*+ia^
zR#z@WlH`iNZ#cRiVcn2I!RqGB`jFdh3wU@@0TGGtP6LJY_S`tZdT*j9TVnR9<L@0k
z*Xc(^O#Nn#w=_xj)S{+r3E0lb-*3#amNf|PYJlh$TPjuj|MXZEFbDJed^O51YAKG(
zwCJc1H<}~Su*<b*ZupbjV%6gPv$odat4FvfRN1Lg!bTr>I*^KK3(4riHLFnr22OPj
z3A*wMV+W%sa7}nl(h2g7wKY4-UZX<fXvV;|$n^?h3V_+pe$T1FbM1IASuSqVhO9|T
zZGxQze39s~p-W@@al51ad`md7k$cDmGfmooT0oy2oK-rmf}dGAj`DX|7&;nHw{Ar?
zDd=xqIqnT7&V=ML1(QTMc0VegW*VkpqVcz%pRXUO{7IH7aWlx0-r&=M@IZGVx#?)R
z4RD2PM_w_7+n;zPxP|qCyEB>kmZ6zj`Ly7wzvx3wd@%yaF}E`&1!IyFnY<ERsv|<7
z`-aOR<qcd2FU61i**;nnG;jS^YTTx3L}>US@6vru_?y}XuauYq9|kgfyV*y4v#HgC
zoACFd>2o)hwSFg`>tHu8)+e0yl~CivJvuq!H&-5`B?EZ)O2QUYTcU5b!hd(nLDuTE
zblNZAse+mWx}`)k=pV2dI+(Y&H@Btzkd8-c_X`zVST}tDE@PO_UB0e4L(5fyXN7Pa
za#QYhS!DdE=vzTjG2eayJ7Vj3dvX1=#EOv^Pf7#Zs}s{jkuR;yT{6=G8O1L*eq0)w
z)OWVNc`$2ob4jv@@6$VP_XNiiM?XwoLyFX9EXBI2`sm2f7W#`#HmywV5DjHX2&(cj
za4%9Vge!PGl~t5ls~lK*omgq&VpeO~`A(aj{!<|U-g*?IJ(E$EM}>b8*kDp+o~W;!
zucv{tk!<x19YuyzQ%%0zw8%G{KlPi&ILITyb91l~yeKx^+i6QteRRVGCn5TbK+tF9
z;rJe9dCp{RalBY=;j&VVt;|Ix5E9fs>glfCoSDPxkG@mhAke6zhUIb2ByaV6Y}HYa
zk|ta}%^j==vAgb5a(8|ON(~S%YnFHbsOV8|6=|l2f_Yr7E4h{4g9(&#HrT7S53AJ;
zTXt+`IaUAU8hfjX)@{KuiYVDuhK%Iv7v$-~G%^<&fNl<?r<3eHUf+j9=>E>W9sQW>
z(<Ykg!h7zwrJQ^@0tYLudy{90kHT<%II7`cPN2B=Iq!$FhB@R1D9Rt+21ZGC4S{e=
z0Rx);mx7B3a&7O!2i{w!3RknP{%Ok0GCeRH0gl^oZ!c^6h&cEb45tR$IP$PXx8cqL
zPQEf@<UFqCg>Vb7Ydqr$YTQ2VI?1#Zym``@4_*Fwb`*^p!i@zJBUjEKs_eeLz89gm
zHDs)Hi0tKt?C$~IYbzui4~xBOz#Ue(U0v{B58=)OfF(p;lR_PPH8OWq*$WG!4t=iA
zVeef0GXv07*RqL81_vKB85Or5K3}O|d=_KP{_=RotY|isjpS~3+Z^Aa^RVEgP)|!V
zO*aFGVZ+BC({xdW`-wb>zHY;P%IZ8l9+1fVCs}#K0K=3aF(&~S+_ibuHv0b*R=c;J
zc|dhzcdtx9Y+SQ!;c~er;07KZUPNT%`;9H`$^8f0P`|VRsC0kLj(ua3kIa#!tlcqM
z$}%tSDoysvQba^l7CDSIU2%l$(=#w_F2Q_1J&udJ+LOJ?lkEPUW@0ijGUo2&n+m%)
z7fs1BztF`C)TTVHpy~WH8E(>KV`yg9wjmuC_W+D)@(-%B1wGD!uAGSlT(4`Qe`a0B
zL`G`D(I8p>2-wA}=*QiX0p#V0ciE_;Yx@v+z=b6>HH~bnhyQrpf9ub57XK%M0GbHk
zKPHz&s{P#vVFdb*li>OvNdkvHDetfg%fdPDH!(1m3s8WHjA`Lj=LQ$Q27hO3@kG&e
z2&vjv_JU5Y&9<@WJoS!tj)9$^NUd$p3(eu4CumKP7s9$At2H>5)-eEqQ_p@@@fNp9
zt2q)7U2;|DsdT+wf7oeGFgIY1#6gXiTLcN`P#;EO;bjpTkvhgD$A6A}Xh}dtf|KH>
zoktB8_;2V6r=^BqrObJ+uTJ{~$8Qd}YG_v`eQ<G3F3Eudqd=UC-`72Ub(7;0(SdJz
zkwIBRLT1hbuY=2z=(13vJY<xX>3-K9)I((BYb3wYd(8<(^amJ7THz^ksXE`isN*Ee
zwdGjk)%m|hgnMk0%+$$C;<qecE(n$SqYk5qa0CCIUyBjH&u)_bSRfrfEQ({H04XdN
z?F2~l*E32cy_P6DKk0nJ(#Z#?Bm_$5d!*!?GWvtcrfygiRmB|zgsN(*hi#iNQ&1F@
zh{rtHifj$;bU%X^k%^o$)zXPCl>hZiUc_9@jyn~K`Th^Hlv=yqNx`8Z0X*D+|2SxW
z84+2nR#Ua7J<z6#s(a?Ts5+V5Il<jI*=D8&jUTL8?{|?NQZSkfdok}-wS|w!DaQ`3
ze|{01dJ2c!{NH^2|9y3F_<s^j|9t5`-iK7+|4*NFLov7h_x5^=cxCCwPuyO>EGfRp
zI!!d&^KqkMqh@fr>h==(bvB0Rm+($m(d)K5wIK!G;+7tx;O!HAW+L_o{QRY;(WdX@
zShQy3=jKwni}&j24`CWP{@j+>88Rb8UgQytwuMyo`$xS-h5^Dqy=B_`hNJN#AVtp|
z`7QUrs(w2`aOj$W#|!E3#e|5<hW+bT`8h&VIui8bMoms#asDHmG@$%rEfsr?eEvA?
zuGrr)JuI?fRU$Bz?eYcktC2(EkDerFdvuW|bdO;paUjz9&X`(sKaHLFeRDC3m;9G$
zcNRHA7H8teb_t+O(RpC5e6T>Tb@lO-$N}@xTP+>kS6@hE(iGrS72-WxCPP?cn)$=N
zlE7HkKla@8v*v0lO)@L|)c@1bVZ22DwqLL4wdg;$Kayn(ykjfl`=N}Fc^u>R!czZd
zDB%r#(6^T&BnWF_5X%k2Ofq)ny9+J*59EWOTPk~rTc%2_XXrZ!-5Z+^SY|x<CG_{(
zlfEAS#f7^1c<ZlJo+B;!M_Wj!z>RRS;qcC9@49ny-5=*cKw0b{a`uSZPtp=E8!(d5
zMA>l97xo-);lF4^uMB93+0^m>gx>Nb;Rynmn0Olc?Cb{ntB}+5sA991zXjj)6ToN2
z3NU`F(&WLn6%|`8?Z+;(3`}Qk-#KG9M+C8Ov4_Xq?0f8=exo)uuOUXe?Ro)Us!Qeb
z=UgGCNdRwtX)gT<#q$My9U@&`$RF}#WT<-fWDh?rF#Mq??KsXVLW8NHj4qOa&MwT4
zYBP=kopXaK&|bp9KxdJYn(`0+EuxK^dtJLRbUt&!i6Ek48R{tyiFYT6FIL0v7RhZs
zNFU7Tf}-ydv76B7_-hoU{SVBp&&B+^rkp=%zOWkBH#{o4VO!x2C}EO_irw?#_bpzF
zmlnoESc>Zxs=+dY*|*M1SxKd{iW*`p=%YOMPRexih2o<^URaNw_r(_kz`E<1&VO=d
zdTMeU%EcAu+Kn-oo-c$b#2>TJQr+^|vOeG#<oui~uZ)t_^vNn9x)`JqkR?udE#sC7
zB6FLi&=ab-)3ZCwv^b0c{lc~fQz<SNY3JlNyge|=*`Ipv_J-Xn;N%G#TlZ%rWg^=g
zZ{f!a^(eiB<2kwIB4Cz8P1cMPKs|mLD0XB{ri)tb(D;}^QgfLVp(%t^@_v1RXF2EH
zfBaJ=4rY2Gsh$HmU5VdaI8_+6jb1~z-mUu(TWN#}a?^_nc-~H!q5uX}80Hoq85Kt^
z^XM*A(sPzHLr>vsBCPwvbB}|Aga8k`RN&>T8`pKVVym<4ERsUTfJirAD`#CMss8U^
zb`|{Zt@dFAUvDv?19B-#G4z2NOsYXd=eXG=Rx(?bGD!d}Q|mA{Z%}<EqZ$*#r5p^T
zyJG-`0lrWR{Y6k^D$3<?`2DjBpBlRxSpilv!>c)>&7gaanWlj7oub>r>CK3qLdH^E
zN|b$fcy67`LQDF6N=j{`*u++rD#m%T?d4CeJ*#?bR@`^4C3k;_b7ixHQH_bq7Pg7P
zuaYIEI^IN*$04HP@X8h7;IO--g$TZ{vgZfQ`x!o{)!L;oM<%;~H?|pF(e1mLfeIU6
zmpT%$bLR(n7<F=dP&!@`8Xkzh(!p$t73^B#M>f9#YjN(*B$GLHW?qQLwharyD)_wK
zw!6yCBId$Qpo4VhRn|UZUa&lm-SPA#NR)((+seTYj}Brg@2sIunEE!$4R2W9Ai1Nt
zoa4|kmT_fC#2G%9droy>x&;plZ$qs}Hxhv#2II{&`i!jfcxPb_y!^*oxzulGzePWD
zc1Y6c4`_Ri%;Vp*3~4a;KJUNG)b-){veJ^>3U_}SdGb>XvevNTzU%LeqW2Ns%Yrc>
z?~=hb5Ewb&JLuTJdYhUOP?CaQ+Nx|<Zd((gGm$x2unmgZ7%MxFFqrrx?1|}MS|gdR
z%%84g2Zc+!MJ=J?6fldQj$qQr17TQaAuKxFg~2jFRaS)B(}!oV_Y>lci4Sx^df>pZ
z4Qja4gGuyQ^BzbI@++6OKZmQJq%DlJC73Vuk*-fE2k=5V6N_*ujKsJ`MSa}bjWy}6
z*WR3S%*%MLF!O^3B=g0qBk09pYh(XPokUJl;1AgL&+KBs759njl%>-{i3xWUgsvBI
zvm^RNeIkib&t$Xxyg%~CU11X0EX!@jkCV=7Sfwr0ZG?#)IfNUY)%WQm1*K3Pj3@iC
z;l~;cM;Xg2zdIqBQ0=zN+Ox(SaaTU}2(Tb@J*&}fS-=m~g;}9a6qeJrF!l%4zVzWy
zB$)ly=0S=)2=G@Y2Wujt>}zqIE6B8<svN9`$2kww3Dt4%1N_7Mn+`$$o0XQrzj?#@
zT<pI=i20#=*4Cgo&wYJL4y`ZNin<e4q{EZZi<HmsILH-WSmPGEv+CzNlke*ZVKw_A
zz->SLa99xlGS)oAmEH{2oq5YcAM>}!MA{oQuD(n`2c!m8kxZkkMt&rD^g`&<#5~7I
zn>_VE(R5d;QPssN42j1R_oz7Nn*{XD5313Z83kY72i=_UppcW}m%h&@&3HeNQpDY<
zGq(EjgscLM&DT0}QrFV5(oo*98yxD=D-@&9p(WlHXR5)v{A$28jGe-`^4fmqUsWp&
zgeN2|{!zwLeNAC8K@dhSDo7nLp6mTIPmGQaE2`$H&#i6r0Obh8xdkDvt87I?t!u+P
z#;teWm5Oj!=F{|>ntEyyRXG4{n(1ReBy174O|j+<r!x2X4>uV%Em<Ur_%sI2{qB~@
z?v1AlU$^75zn<4VJvd3oO0A<4y@ui!UB3!0-x+8YqRRg4vbW>~&_Z?)vG%82Z((b+
zvNM1-08IGu*sQ6EANSA~P$H;6giF4hIeZiszf51VR0KyHe+@gY&8f4wI~<<*n&O7h
za1A5}L$Y-xSj-p=)^Z%ky|o=6QHa}Nw0IfYz*tanuiO&T5nukBE#l6~+}l0*?nPdS
z=}N2OaRTkq=sut=;y(if5opOJn)?!tiN(3aG8BTR=~*6rH)EF;$>7SjSi8)#J)gW7
z=zO|28~mjuV5QZvv*g}o)>pa@9Lnwjw@HnrYK~OXB1P{ns!?Zzu3J5iF%W`--3s~W
z`Op|eRpnc{#VU@}Q=VU*_7&ht`Y(`FgQO0!wgHPd$6ABENa~x0SfkK01BuhjNII-{
zX8&kJi>y;FB_o<svFnu@{gk_qji3wm1m9>BQ8A1lW(TH^%!dW3b{;rRgP1B%12XQm
zSs)VpHD=lE`HtQ5jed+l9%lpriBw!3t|YnNRLSDWLoUbfq&-_pchMBMtK8bcItwF=
zVj2KJ)y}d9Sj~PCFgt6xo#~d{2yNr4sv)Fh$_$ozZTtRImg2#gJzvzNa2Lp_s}1DK
z5n-=;17RH$WpRvAJkGl*ah$Wd{5H_uP(NUng^dm13t<FTj>7?~HYXMFRy_x!Wt{^M
zy%+OY1}3i0o}`7s^(xIV<H?dzRX_mTwCWvfB3hX(kDC1qic6qDmX|mCjTY?0oSA#g
zgE@16UQ~c?ZaW!&{tOF!{?~AxBsZzwA_b5zr)sM>Gm=nkF*N7!X8Gc1d|$v^uK3AM
zsyoVEdfB+0<Yvu{2+%;Wa?NEO*;yK0T)+^t4`@~(xK02(<kB_Xj+<c%H)s&M1abE1
zq>lbV67X2redJlkQd!6*&5~35_iPWhb#e}|jh^uqISpNXE3e_ntcC@!WR5rY#w5<L
zr$B}z!-+n()CK@0j-p4k`ttg1K7?BEY~)e$Pg{>-m*PsX9aK`q3A>0pCF`y52mS#I
zAL@D*c_oEh&ThQp_-8EuttT1KY;73RHB@k_HGCyLAxWY>Nxg~6^~WIb9ij*Rl~2<`
zfL{R`wO<I?-uM+sgZ1BHbkAaCb5I|yIeR6K4KbGmc!A+X95sXMljOq3Kq3IAL`PYU
z%?%Qas?G=cI@A+B<`Ex_(VrjtF`3nL{`pIvJI*WxL}<s8+@baCqQ!)kcpm&XM?7xA
zSfIRp`u88u3*D&D`XrJ<YkOW>fC_rP$+=aWPC|$F{)F&(kjtzApun*wAsaKsufr7c
z%nJZ-SR6lqfG?;bHLwz}4-Gw+oI|V4f0S>TZlS;#C$QHNRVpG}qBwiJz#|y@IL7V;
z*0;YR-~d&91mo06C$t;0nRc}zfk<omt303<7V!{*lL+EU<!W6@GS4s?TGMPm1)C)#
zt5ZkDJrn_nOw!|nvpK?S*$wKme|#Oq|I2H_ChwiusSYSe>?O{=Q*xXbN4*%?cBNKn
zg0NLoU>zZvy`HmNjSMn_<abt7B{~zV-5))_!w(FX3}Hq0NdxxM%x}eJW7`m+TwDn)
z(~n)~;<nbriPj6#72XA!Ka}X-fz6t)yJBn4%f4hzvGfhatW$W*-N+`og6F=0sd~V5
zKHH05j-ef0xaq%p1=(XzG|{)`Z+|CMv+Z-+GZzv&0v2ZSGKj!b=Ues_6t$ys#V9VR
z!<$G((IW(4#_7t!CqN}a!{^AIuvKNt@7&+Ws9O9%tB&0CyP|@COaKG{?t%DsQstPS
z&S&5?`?CTOF>`yil^P)zD*)M?@EbVkIA!f0=m`gz4{Ots6~jh=zvFl&UutWqK4lTt
zo8`mzP?U1qNO<2=9SgAoopMER(hJf_Ch3GL%X1VxCW;K4mG2+xi$CTE(JuH3=J0O>
zJp`oJ21$9x^36NaMUMeSkmt{>IMAtMaTG>=zHh2(KW0VwOySpnyu;k<M%c@tXnfCE
z?xBAFl9H=1avveMfVn$646>!Cfwk%HcVq%?r8HW}id)%srS{q(6wp7vo;@-qmeFeR
zlLn$(2J&rc)NV`iZ`VJ3+-kGOa1hppnW}phO*!QmU#I}u-?cFhe|=K{x>cM`xbQ*n
zX_*lqJUYk+u9vHx_>44NKlA9tOGSLU8<{3^BW3oS(;LyaFk(CwO8Xa^NiBNMk?YM~
z(C{wIPxwDV>CUIB7gPqq!8P|jlWjhTC6BUKw2*pk58D#loyophWV(8<epYVokv4qS
z!f48e$L0bp1*n|;3bnw?iiqm9=oyU43uGQe!?29i$HUd!aPmoO$;7c=1W8e^Eqp4}
zgXc-;o4_hS{Np>8&$L>aY)+v#a|n-P!D?>JgQt=<O4EsU0$BU$46o~`0hn928U$Db
zFnkx!;yVX`mQtLbt~ngg0FYO@=#ei}>4}Md^sTTm8C~Ngvo0zPdx_q9rizaW9x^$&
zqvRYoM6ceeXVUc0&je}d?<(lrBwD4uRCA<6u(Xh6FqHW7$-#L0;s@!UHd?6?QdL09
z<7W+1oUtYPFF)eKR6FD1!~q(T>fsY|3_J=F|4!s&cy%$G`fWf_Vr`UaUD!b6B{I}C
zM-HK5sX3lc-X3z+z`9n|*<OXnFwQlW<8l>j5!kF8B8$ToMyw|0z6XF$GSz1v)Lq82
zmW<42Dut--Q&YNk6^c>aBe(M$Gq-iodM!h7pZubu4J?%p;5dL@SZg0fX#c5fN>3vr
zpgFax9?8;<8`TNf3Z@5l4m>W6MmID;Se~vdcKI`5JYv!3C!lF#6ZTD15(TKg0ya1g
z7R*7#h5?t>7UtDC0HY-5((naTp@w{ajP^a=%^uhwg#m;LYBi-J^Bc)tAt~}^?;bT(
z5APrL2m+_hgtv?S5tms}*i~Qgw#T<=6S0D$4v87VbVG?}@B{qwM&2ht?9hbLfl4c3
zgJGqqlM{i94H4xoch%?%Aj8`L^1@BTe|H&}d;|2dP1UbA?i~3-RFlcVVtePtsRVZv
zk5sk?wtba6e(qN=q{%OX0MMFDJ=t&*dJ)<QIJ#VbrSfE@(aBr|ytGzIemsh`_%_xG
zJcBv_xP==)v{rP_Z5zuPjde0|m=ox(%>{Gy3NekgMD#m_*h^kAN3*}W4D@##ieUgK
zTd0+nvaTNB<y>h!*V0OE+-nR!hh^EX=+GYhyv~6imtWO)p!_1PigDLVKn(WoGq}a<
zFB3|1n4ozi_p2EvgX#TyNkOmq!+&pO`|y0GGT*5)<=&roa`IDqpjn(UGzr5~GeaL8
z_uK^N*mA-8y+GlZ12GYElw24PB6XpC@zKgc@Cfqgc&z)J|Eyeacy<0=wWfvz;_L=1
zL%$XU%UJ^^dVd+s%C05jLms^Z?-^GcD_7R!sK+br>Q`@KCnsVqN1dvO-nu-@{>Ygp
zoE0m~XyRG`Yyq)=tEyrbKV7F283QGOk{0U?xLf=+9LGFO(l|VCT$-x8OFb05cX6J4
zArc7yo;Ob+5afPMv7jdctM|)T`}8zoPV`!LMFmW^Of<b`BdYlHXX>{EQ$6TrJUVlA
z5Dxu|!_JGtiL<$XQK4VYuI5=+Vj^^^J;W_fB71&<rjDJ{e1OvpBdpc24fjRPSkn8s
zL67LLk+)iY2d>8k{we_^I}nQ-4Fd{-_VmXXa4tUyF5U%CcK&*YsLhhh5LTPs$SM-N
z22QCaH5z8}wb70~zM0KQV62ZnZY%--aAAX0*ss)8<;S(H)`(3qgr34Otm4Vs9%rx~
z6s`KTjJL@MXHAeMQJZLa)$~pG)#VRE3#oiQ=IX<s#DFmkW#c+s=J0dl57bKFg5mlt
zaIr(rfnGqF41L|%mh;W$1!2BIa&i`uQA$n5`o<o9+%JPZ=YmHFKGeDp762}tejUIs
z6dZ>_@rgHN-rk9EGBVwThfH=F8_WVVFgN^R)Dp7y*K`6hi}SYwI6~{HAuI*n!UcSQ
z`xrz10AZ?*nzgRAU)Yuz_*r$~Wt2)*0vmiN&3I#998o6fEi#hhwuael6j%Xw^qd#O
z17UDFt_aL!wb@4|X#hSH+MVe%wp<JPnh?h#Cs%V3L~R0pu-+LMOGBCP8t~aD5D9=*
zh<R8vFO^k&ttww>{+jk|hX0!~n5|fh`CbB5Bpqm1vRK^G5c6@#;41x;rK3MG0${{^
zOv-3Gl}d9Me^p|@^1F?*+qPs-&H{0vDv>v8Zglxem=jgSi<qewgXaakO|4{9a<3*l
zSaX3u(g6lRJ{}=n`S92}wJ+X;FDXWY%bqV64UyjdGjO$9cWUe0xcX`^HEKDVTwK&<
z1apwo25m>0s%z;wbs#vMQlrQ#E`J5b%y6IWrI0(3+vyeq3M+r9Q^IUIPs`%D1=kND
zh~m_)r9Y{#%~fk-N!2ZrbV7D7l;EMC^@dIVrQ^!@j5dfX!VgLEhAIS)DDNiXK;~nU
z{x6>n<qEqo3^Lwo=|1(9$Ez*UGF~4QM}H)TX6Oql*rU$ln69x=(FdUX<!XQtJ%I6Q
ziEnbQ=j6b3UjT}{Ou@xaeVG4x<uf@a?<(ZV48P*(9<;-Gg5p#*;jw78J+~3yyI#3H
zcM;3(uqjCo11q+t(HTu?r%#I!?~G+wX5TW0P-gSjfu3_|gq})KehN-7uc!OKx{6Vc
zG58Y-t`=LV@yp_O4%J|i)28@Ch0s&;o5>}<_8Tb}PZ#9EHfuQk45jL+WvwtVT|Y1F
zOVkWu10Dg*k$T3`HKsNo4-jbk<^9z!pT|P1H;VwLt{*o1K?O?J6;OH)hq(hs%>&cz
zJeOrx4|u#Kn%7QCHJ9&0t)_A<6}7as4Yg$`YOhOo#|U8OK4+U`fW<7Kb1F>{%Rp#&
zF-4u``ozz%!c+HpQSE=0HXiLMUuVVqlO1E}#T%?=IxAzBL>BxB6&`AQ`o8o2_*maa
zlVb(iL5@QUB^Bi?321GAS?x6DiBaiD?{Ow5cBD4*K6Lym?_lr4`RyqKP!?ZXmSD8S
zsp{Am+nS4GS2jF#4o6bixzp8e35;-q7y+QzP;d_Fq;mCJxOaBBrtjA`{^okEF}d)!
zN$G5*l^=@s%>KAX<_qQ7MU1M}X+p_&mm7QwwbQN_(1q!yuob2*&jAB&&ItU|^sits
z<kkwO_#H*iAjxs+WrHX3gZ*RcLve*=B$9ifRt!Gq&3NX>9+_r?Jh_qKrQpGu=f4s<
z(;v{JVmmY5yqT3m^!9?`<9cbKr%=A9&<ld_^50<?i14#1@8TZ>Y|dL?!+aYfOo3~t
z{e$gtlSw96cMTAA5e`bWd6Hv9xo<Y{9to>iZasg$`9shkmYh5>Iv9|43{_9J2_?!3
zO{^yg^i`F+JKkDa_J_Lfb22G)nLU)HUvqk~O!GFfYI4??ubGmbYO|_~lWWJNflEPa
z`p+TN(~KhvMVDa~@x3cQV!+;JkzPv7K!G=(r(L?=J?@lW@87#S)L6hrcQ3IOdg!aF
z@6n4*EVr?k&&sZI7)p(*o~$Sq$+oYe?XG1lz0jWoBF3y+zNF0ZchKEdZdWRIy$6ZH
zg$Tmo(j}tCHXfj=phfk1>_Kb!_R2Ezn#nlzGX;)kVCxQy5r|-+x8__8B&GMO%2{#;
z=sN}5m4gQTG17}{AWQO4mJtYqvjH<d>sb~C*)(k9qcGyh41{4AJi;Gz=cRNKcoy0e
zDD<x|fMo0biFTzYTNxEui0VQCN>BnmXKi>q!`H-{zn%VQ!TZl;eU+W}+cz_n-espu
z$p(F2e-H?`W*MZAvm(aIipyIOW?26O&`wAY_7vdz%M&4&Z_<MhKKC1W^MPbcbuKNs
zu<8j7CFI_mIOuo<k}YxqUza&fQ{{ZGKtEO8De${oPI(1BM;dBF+EpcH0`$Xulw|H3
z4cnP}iSq9iZq}UnL2B*YXAPLo_68A%xEpr3eeVL{fYrrKm$o@)oIe%Qb>&wKL~)?H
ze4!=&z-$Q{WdR@-&>M2vfZ)`9r6}ZrF8ylpa>7V>g?i`0A#wmBdAL{*zj#KH)UBT?
za3KtP)y_x#_O<eS-cYK)l-r7%(bnJ@W;e{oaxn^Ugxq_OpGH3_AnDW$XW2FZDS0Zj
zUv{yHF9bw`#tl^8N{!d~7{=hHAIxVbD$jkm%7RJZ%;|-{P_{d#B?c^yi^JS}F67_k
z=941OC|MD2t~nGU7wW%?hDlmYSFK8ZHQEl@h|?GbB#S!nWX6GnEVWKD$@#hA)O$ZY
zdmk^6D}Nnv<{X9KG4|S!v){v0vE&&v!jg{cqX#}+QQ)#|Ab?fM{xHU2LRW<0t+<-*
z%PMfg8W6pS9<Z<;`Dxd>xZ}SMN^wPU|DCMsviWxw@hH%PjBh#imRmmoL$l8{l84{~
z_xcn0!3*hMP5W0dKOaqbi*0D-xZeC|UL+z@|4uM8oKixkw<9&D3jRaR*k}GUDB4mm
zRPwNMT^lxA4=}_+8Lj+VvLd_ZpY;d^su^;%h=3de^-@&!pB_j;97SXronzN_^^p>M
zoW*ku&tptX{TtTN`|XlUdz~mNnN#<hwgJC4YABc)N;nD4_eJp9DEN+fAMapKFOrHp
zc=J0G!D3dD%XYDNT|hVBX*HO8(O$ieszFk2<Gx#O_K6J3o@0{;I^!Rgvl)LQTe7GM
zK4nTu6JK`zsAAi?LwV=64B<No7ujqJ7RO6;?uJW3XFGX`RYMQ-=pog`nvRw(BZwCf
z`3)hKdU76BQ%A?uHQ!zDahmZh<)Y(JkU2kKV2+OUQBF0!kiKql>f9;&yz@Lq$V!+1
z#dbBDJX#DWE%681=|o>IZvm;lFpRKt5v1Wqvw0ohsMxL$Onv_B+8_-Wc(O7p?siql
zJwK^@ma3R!JnP!hTlEmE0bMG(dgblu)gI`)XXfeFIN$R(r%+Fyj4yYBiKvXGpoos9
z0)OVvr--09MMZz1hn1I&({0AY_-Pkf4(JzQh~~BsJ94wnzj;ndG2fQ9W>)tI;Ccr<
zC%Zg7@eIe@6?vsLhb>Zg=0gbSpz^J1rY2FS42kVSM|5AArDCnU{jy4peP&M8=1}xS
z;~YSNJor0K=#~{?PAYVr;Rmv!8X#7+My_COQ0p{>&=!gNjKSFy{|bjYzAbo7oSBgJ
z;(MX5i>y=POnFymM_MbE6@~g`#_{EM6}%)f@}&qVCvPJ^BAPF-t!R+UDX^^y5tVR@
ziK9EWzaXYz9wy0{oJj81A4H#o7M@3aaW(2}@_BNw>UU%|-`GPBYt@HS0+~I)7d!4r
zEvN7_DZwNTv{|&b1xtFBZ~F<b9?VSh_}DEU{ytcC8?U&Ab0*^7*+jh=eZ#P3HFXJJ
zgk&5T>rwb;X)umm2KXn#2)?X1d@3Ae+FQ2vAKRC4vTfe*TM*)AJ)&aw@)sRHP$=vh
z3M~F3N;!27({S&y*{w<%B)xDb8S@Tm`XrCY9QciwzrsH)-e3?bTGoQ`clpOMqDlC-
z+23o{V4%^|nsTl;oOn=$-Bd0((V{*1opYZ1xXi>{1F8F>)*gH(XK6*b*aFc0Ek6Ti
zvb+HQmcW+D_Ifx|QeF|1U-z#03`+k8U#JHPFOj00K)nDf-x#Nd)#`f%y(~4ktHDl4
z^uy)h#sjMduy-OA;+5AM^c73SrKyQ|?{^OZDCghsbKz24hn4`hYTUpB@Mc5p(W-6*
zJNJ?<tA_@DM=Vu&Hu<raGW;3j{H~#3yw?~wQlemyD|s*{_FV(E?s=`5>gNo0zm~E`
zoZo&3L{;K|{iQB_D3F|Z3_U#gSj+Uy-qClgx0AyXC6aMAK&<McPq{qvMgYjlbVT=_
zN;Bsvz-A{4vYF2j{ypE{!F5!zN=g7Lo~;(LAyJccm`Gm<cr_+g8<GcJzrG%9lwuON
z8kI_Tz*5HRHUCi*I}hCsveE$kBJm(%!i<A19IWN`;VK@pO)QmWszzvgzpDGsF8u15
zVBf`MDXXz>E^~VNnk2=l&N==HUkE)P^OMq!cYtO8q$~t2>wuokAG&TA_=@>4+m0G$
zjd3voVh=1%Tz#tcOsTt@+!(dey-+A~yLiK|Yd~B@6oOf<&IPQ-CFGM{hzuX2fiUZl
zoLmw*P|tCq`b@w3_!|Q6*$FD3OPBC5?AH{oAlY-E_Wb0_R(cEaybBZzSVXh2<eR{W
zmGs?@-ZEzB1!+JM<~KnN3ldrT4eyE=_yj93ZP4PS`!XvvK+<HWwE&P}1bDt`sm025
zSeMZ9gQa_-wgUK_>1`Wg9H-nK+HYIUpGxTKLi>BMsaM&qYR+{o34PBgAae%K&8}ZE
z#EHRLKl~hjBlt&2NbXIXi^*i_V=fF)O`F9JOsp^@n`h8{K*E4!em28mBxsUJxpf2V
z)qihsIApBwDVKd|^o#BBcgBh32W(LU5&i>BCFWOgtV_s3z=&hTy+4AKS+q|77EpGJ
z4v+k2*zavI2Y5dH-*qAWE5J@HtDgIMlpWEReDcs97GPmS_33Zi<cbmaEwYhAyYDf@
z<_1eaNz4R{D_cPk>#ORxn*A6i(BRGectMQOrPdkd&o8=NX1k)cTs9F~WxU08DjoZH
z_b#9YFuLhPw7xQ$M=C0|qXg1Uzs#U%GfY6rrM9pDX<AS7hZfAE46Gfgv#_rM*8(Rb
zI)+>}P!{CihLVftni5El!zU+Ph-cfZde(1hoE;@!KeJRHXxh?f<dQh2s`Cyq2E?z&
z^+th@`jmEC>V0aenSn>ZDxu@YH!Wf8_x@0g8ugq9#{)|-VqQmFD`if40&7gNf#e8Z
z)%n&deUB$giGU*Swp$C5VWDIN_pui&5J$7EkUr^%DFO>Tag+ceqdFj++7ejHQm<lw
zeT-;T%jdK8$%%ihF?lbLFT|vkN;DX`kNV`*FYB5n&hhLKE~W9Yv!T=awm6lA1liQ#
zPw^>tAbW;p(Xilpp-hw}rmBnwQ$3Y2897fUqq{VYI<AQUxLuhn@<RM<XA5VyJYqLy
z3#<sJK~jJ9r2>*ZfN;u@pKnQ@G!pcj1vC8l^%CdX>O4ex={<0OvqioNdNMqA(usq%
zh|c{QSwvMH?CwIc3T(%NehT2-<9w+_S)SdZ(g>XLG5Bp?YtTZ{Db$Jc^LM+2>or0_
z+I;!kms1oZpB_EDaa-kH5Z#SBRHqMcY_5s^rjAhy%r8ndy}Yz?wUB>GTAH-Ii}0Rc
zil`t|dKmHJFa_0cAhpw+D?p<K^{=k0wl-(_HYw8TWuEZtRya-5LD`>bWrq=1ZPjOC
zHORu-)Ny~Y&#<k<MBlpxqkLRT+<IcjV&=;S<JyN)smntWg1E|&=`3BtON*5|Mxcjz
z){zCBjQ8yg^1add{ZikTAdBA1=UhUGyYB+STW$5U>>Efji9|SD@SpB2w$E}ml;3oG
zN#fV*C!P9q!x$)yi@_&#r9w=e`b#Mxzwv(-RybI}?^BCZroYCstE3N4V)!DgHdn_l
zfH4#9Jb+vB{OF|ANg=Z@r%f#jIb9{3;OUsy`Y8Q`VBB4(dpzmB)j5X2xuv7|(JH*f
zdoNEW1wzr@oSP}gpm*r-jU^Llth4k(M{LRskzj)4+>}3Ps*KfHD>75tKC_LsH#{z4
zP`}bnDeHRu?#jH1AKO5*SrT~pUgcUhV}DOreN+cZzo^t6+1Mcs?IU#LnKm*fJ9-yx
zm&97$`YTCQVaYo>*8J5Z+OxMoy-VZcqihk-edNhDPW=||$6;{-)s0aM{<h;n%&uu*
z*5nIGsZwGVbR6kye_AEPuc+`1pHMhHlR>YauyNa3b`;xv1?JfLR@Wb0o$rcAf-juI
zb9h<9P;J!X83VcZiT+*@d8PiH2q=2?&lZmLO7#b`E+Su1_5yy7?Lb%gk7n~g%^I5(
zpTDzwpXhBBsxReX&vu^9YZ4`6&i7U_Y+JB$7PvurBuCEV{=0g~BzQKF<SR>bX0+nb
z5+$eWwDxKSE*f{nr!$a4SmQ#|t|LQkN_y9{ZDB+P1QC;WW_cpB-B4=1lH$NGyeF?r
zlUk2B+`;HZD9Omb8+^uQqR%T*QoraX!!@dBpkaTe8rP_u^)orQ#zDkbG4UCax%PO8
z(s>D@BMJGwwA_BduHnp*-$?nL@yK-W>0YTRB%8m|=(vdfNfY`>2n1giE+1Z!AydGh
zp*oiq|Lj)zr(>(<1F>weWRdp3|9Q;R7^~^0kPT)yjbq+LtVR#KL=ye<(CY&FRfsqw
zyOxgiXvtKIqhk(6n9}_5u3h4?j1Gn3^|H-O)2_VA(fgnSTgy`j$3dw@A?2zwcSTTb
z8fN-c)(mH4q)5@|_Z);<{4TgzTe?D<kE=t!2q)++L8h#7>UUdDp?u(Rz@i?p-#vtK
zt|#NS`!5~4YYQ-Fwnn<A#jSxBznwPlDrP#SYgq~$E3KX91hpU8)9{4XJI=b`|A7h-
z{z_a|q#0C;M3X?+<?8IddO7M>34ID2xejy0nKb(lFA;d{b*e^QzL+GMdOO@%Iu<hT
z=3$Pu_c>q3OT+xa=eej~ggxn4`s<$P#cA%7mL(or#Z~Wq{tWaQ5x9$s=OTgeSbGuD
zMapfl(kP$!8&XEPc<+4?!zrs}KI%o7XY*`htaryIot=VV8%50O1G$e5d8&3tI&tTC
zJn3}h&Y6hr7W~-wT<0R_4``o#mJdtI<@+#cc&S}yl3{cqa>{*C3GAR2a_W+0*Va$3
zmS<-+(~%IGrAun^bF~|dKV*@0owP1ANAguJOKDZBc|2c$Vf~p|8ggWnQ2N=se4myt
zSxH}Z!GZSjz6M`LeiF_a{;Kfa`Q-!6`2_9LW`2SX0p?3tk|+jN8A(v@i~?Xp*G7Sc
z&6!E@fuZ`;clwhO?vS(u{Dy8dJH9kTs54;$I%e@|3#=?qsaHK2{>w?lnxj{ZLw~h2
zPMkIe(?W|NAOXf+IGjNjeoK8tuRFraPso6@(0;;Q@m+7xUJLE7jmiWFFkxrPG;+;S
z4>q)O5#fea%K2umTPN~A<a~rBwne7rNPs?jA;KNiDg}y@v<x?{BTCeRr>U2HrxWGh
zNgt%#pXEA|fi_TeLO~Oe?H0~b59@e6p}F2P*{?`Gl6Q1h{%uuB-TjZEL{1g_+8L0(
z{XbK>7n;3Y`pAb{j;qMXxcqgw(AP=_B7^%E?0u6uku$2#>x<b<RdXL#*X!mP33#e2
zrF<0aKLdX??yDAJus%19=++G~Es@iz;Tls+kp>k=QGGY7pb#(GdAMH6<mE8eh)gYe
zqH98uQlV>a9>M3wujIx07O2dt)PIti!0E1zmq;Y+K&7yw{<$|s*}Y=u{o_}o5%RPa
zIW_7N&>ShkDntF%iaLpj{JgS^#vK3JKBA<psv6^B=ag^DCA5eJHh*L7(Jv$TRQgp<
z4|Bu2!EvU2PkC#RO7>MsU%efjLJL~n9)*Vk5NlEL(sD*iEbF0RgH_UAx!$e&;?Y-q
zx7W6yA&Cbmv_ll285ohBDG__Rs}`5&5KKL$*tdL4XsB{VTdx_NstI?9|5F{kdpoWv
zz8~*BQlxk+_(=rZ-^cmH=cW|i%s2K_MFpOvw-xu-=!(r$v`vVHDujyl<#X?hE}}&M
z$Q-g}_(C-tiB#Mc+vgxFl_7LY+;tk1ZwdBM{BK<VB~K$kAMWrZt>yCgn<|t-2x4sk
z1jQ9If+J^?j>wilh*h{vGa{yJwz*Mkcwd#-O)fbKO(@t@-g0i?E}}Y6z_vLQ7_2&q
z=-8o6`S7J-D6r3z)a-K_@J^cIoix49z2)jZujbgzNQ<jzbG=6Ypz^HFl@GrY<vmOn
z;HxO+{!(5(U7{Y^JDhs0vK(C8N+nW1Z6JS~fL}Jls%G%6-j5>HF$Ap?d1bZSkbI!^
zmZjOr&_C}JPya$=8;t|Ss)R?M-k#u^#`CnZohkR+4Eo#{!o_VK@3;dO%XaTi4XPQ9
z;?WC&`~s1LXhBl_TCaG5Dfl~2#oY2p0xs^^VqN*-z_FgUTyhRY(Mhx8Hg`)C!3D2s
z7N1BnSOA0VN<C=VIHS71zlBTVww0P!_~z}D6vATtv?&%J*3_pxyT(7j1->x2id0of
z8(rRGDhqJKi@bL)x1_dQUcB5R6g94D;Px9ai~e4YUN$t#Gpg_9by!R&bJ{_%sqb-7
z@(K@`dE4@Mt4>Ws#E9QxnLLmq>Mmt5x6jVi_Equ`6qkiIbU#AR8hSk-621Rqml?f)
z&zXQP5xD;0^L(kYh0969%-+jfBMtvbIi9)sc>@IIF@s}6XO;dynh=rqaOZK{FFl*|
zS_X`RxcReW;d)`P?=Jr%CVJswJ}(7vhHBbrJALjJHFtqenvfa3*Og9mCk`bq2h`(%
zHnU0MDA&IwMf|l~`AD5%cuF``?u%Skn$8{S$&+|5I&)<^&4#Ji?d+phku3=?-wzk~
zRc1}z`=~at*QbIAZw`NNJf*i^C_2+8@9{}3!9G4@UU76y<+EK|p<BVwyOxknl5aHM
zpL}C3Ge{k2$XrSf@_vxh#{6^@-g7UxFvpaXO*d8d*G<QyB`x}6Vp5_6R*AsLnJ2OF
zCehjN*s2<Hc<OIC-_?>dU2AQ$GZZr@Fm3Vi9U*2B1X&!&N6eJ4swGHdenKy>Wzgi7
zziN6@7aqLBhH$0`pVFu;kZ2_n^}m;MVi(_=9s28eva__0=1d7UUHk1P=0`g`ZGO`o
zL`(J4i3-$ddg>2qTWFo9WXf9bhd1$u-qN&Y=1g<NM)I_lCO!!1Zec}xkfv5t+w7+f
z<S0CL0`1v2_{MTNo=h8~7WGutZ3(`*stJNp-qz^cc(~54Iv2;L8Jp!CQ?)GJzQ?I}
z?VCRMU6Qy(6>lc+XijJ)lD0$}v{Gmn!JxldUMKS4`D;N+M6Rr0_YCK2b$Iyn-c3Xq
zMQxnB>ad2ss@FO(b4F6`{Efn4>u!lBV;}3&2$@13eZZ!{+ubVVu1krfsa^ZO8bD@^
z9cy!L?YL(Z_{7~9SuNN4btj)1qQCm`-b?mEsDWX|-7qW42d;|SM0pp^^l5=^HC`g0
z(BOpPNl&w=T&ZNI$Y`QoUPqV4zCp&rBZ2U0o4e1r&I@xJX=AvjOAMIZjdcHJY!_i#
z_1Ds+pMFn-UiQ^|SkLhXPN4?JpOQyai^*b}L;oRmn&nJ>S(_!iR|<=Qbd}f(@();y
zdD}*+X8XmhuER}U?Fv0~1`G?Krb%M%Q3>#lL@;h5aUER>4aS?AxD`<`kMrdX=E=uL
zM-M<7n@KXkH6gG|IPSP}P6j*8e|^at5UfBzc9@1k%Uap9V^|`Of3g>Z4Q#M^p+u7V
zX+J-FNXYE2IqN6pd3Xbd2snwpK*6p=V7Q-UpdoQ^JlQ^MfJO%Ak$y3Ma`g?k06w`{
z=IP^ge5{rQ97lR#eT=@<g{rzHh4Y%W@4+MazgVc$Ei51Va?x*!`eSAP6ley7Bd_~V
z7XjDX(XtUs=&y>8D8R5T_%%sSj4u{1F&I2}JrRI4uk}H?ya~W^U!Q@leoI%W6)avf
zm}99j{m$vGPobSZIb}aYsutY(v9@+23)2EUOi7X%MP3~skG4#ZZxMMFx^oK)Kh?6X
zDFd)okYHdR+4WP=XRM<AkN+0i>l2szNnO4$RM%O99}YQ;Quy}{>9FX0PYI1j6z~L4
zyJ`VfPB6KN6tk_{+rX-TjWnye;b=dx=#j(2dta?Sm(5km9^^<-T1qOTd9UkWo-EHH
zwu27dK$%~23`Nz|S_3)6DaA9jv-1rY20s&pz5DoaQvuYoF*`BwD!?4niPWwV<+BLm
zr?w>NL|WEdA#i_O-J`*b3;P1dcEA6$R^}kz94~eqda~>Rot&8PAMhJhic47#_lNrg
zM9b$~&?=#Rf{#wJHU@tyEVqmQM45j~aPX?T4_{xu58Q$98=$)w=ex4d!HTTUikNpk
zoF6QV{={OElhdsn94@~u^Zp|(m?QqPD;FXmbGQ`kNBECV(Yw2Q@MOpJ@s={JGmhMj
ztle!u_!=y0+cono6}jm)0G(2o&J4GfEfcUvox>q$&r+%csIeppc%X_>Q*t$rcgbZf
zBn6W<qsXNoFh`WP&twuKE0r_6q!afwy>bP0Il!_}&~jzqcUbQ;ch9JxHdOcbjRAtr
z<D6)2`mf&&?cp=Bt7C+?<j&5lF=;PZ$f(QiNl>qM0h%BH1*+5Mc<}YbP<|UbFNxc=
z-^y)L5Om513$D;thLgFK?+FSCra^n{rY-$!BvTAQOM-O3eVkaVOltOyMY!cHYCO5$
zvLS6rviZxO%%*WF4Awz)u)&wSsB9oL8g(ugMb5Z%oC%lnYg*qhKVGQwksNIxa~a!T
z*YcBHPIOnl<acTBjNaPkv%DrgC5s2`H2X2El&2$qe*5t|3Kwj%yIPU-&t@noCs%NB
z7wccUqt+qYAjaR8&H<PIn*Llf_9aMx7r8OXlhZZ_+6Ss((J@iXedEcjQzc?&4$nI-
zSF<SA^<&vQ&)FN16TNZ2ust7%M?<WIZRuoP?K-3P(`Kt7RfRr=HDb;wtq#oR;{i4Z
z@xqrrz<ESmaR!ZJWs~EHAwf_--~@}S?-u@(-_2X=I*^wQ4Mev`C0G5o>*42VaD{@?
z?STL8=`1{6g|#9vleY~2D?sI?j!LZ!Gu>}~txSLV_4bWEWpj%K9m7Np<8`c~$XlTB
zkkE-6DwdzmeSyy%o#S6|qcWq#CwY?Ikh9w);AnNP#UByc_eDj;=<lEOS6c3f{&%6{
ze@5hg=Bv!)zq2s^faU*OBK7J2r_Zhiq;LG|{+N?82!t@1N5zTUNU&_SDHsRy+E<LO
z^zYjz?U|X{lCFw=W?@%ka~im{%ic3`;TN#wLM8*Ba5AjS_!y8CkPLLPCpp`!K4|3w
z^Bgv6e`0^Wh@a)^V(BLyr2pji#W1sK&oXJxv@mA8e@Zf+cD0}ep9k0vUMyQ*<*#Xm
z4S$e@GRpMv>!`k3Ma_to&)1l5pu<HwPu`xO*boxAbgLyP?e_)zdhQE>R$S2(Vn-jd
zbS4BmR;p_c56^#UmMMUyPRZ<n!2^Fo4niZJF0P@WXv#$HsW}ph8-7vtivr=qt|45|
zE>tQL_?4hkAv>4J>h)?KlZRguP#sgF)Xe=ll7ImScmL_>u&g)#UaoXcwl3Fw(37Jc
zzk=%UT7eR8Nw~){+&6)M=*Fcs&H5s)eqb`f+UDvRM!F!!R*T>F`RhS{p<97Ld8yYF
zebWu0veX4;Vf*eYuE4(y$Y|WV0rRhheqfGZJ{@1w;qC#hpds4be;vVeY?#V#hs=Q>
z@Lm?0Ua)0;DJdgwB-?q_x9v|Zd&?CJryxI+uJ_^e-k&*jo-&nYnw%)xh7}HZvOm)P
zAzxw#OfSPxqxjE;z;Rk9Dn4vighC84Q2Kr-T`H-angS}@#;%C<{vF4h_$DrOy7%7_
z)tho%({kU`+`j?7fn5@kJa@bb`@OsueT{m=-?wVXd=eM3{Q0S!vG@Lz=eYwbZyEv7
zyy8Z<Zd<Ol-|2=%;alKwe7bDbCxasYI}<f||GgV7EXxR8z6l)rXncphft)&focTj1
z@dKIhGgXSO9QU1xaF8pYa=V#ct7qt=uG5PiP$u%T-r`1VgtFs&4en^AYW9E`p?N$-
zw0^hg+0~!|<2Qm5Z$$yp1tWus{hoYdRGxIO4*5A9)Q6=tRGb_+w^p26r>R|KoM^)h
zjA^lpc4fb$wiLr#-8WLt0{7qw<?dM@&!0xC=zh9YkZn5shdQmC%diz+h#=b08+2mG
zX}InjzDsW^Lsu&W;`3N>`}0K^mCc&SH(eI}@xIJ@JwD6;WZbOBZg7W>+Q)B(Hmr~8
zNCbr<e`i1YR=&7g`Vt;@?tlE>Qz$NWp5mtdbuN_DQM^swZlXi5#}6!owwZ6pH>TIv
zZgwSr7ivN@%rYxK?j}cs`1Cn4hP;_ig*MQh%593i9?8*O(I)+c;%h_yERZtGZb3`?
zv_;#EU{@0{d==0Y9VZPxzGgED5$=NaGMz^LYZKWy`X*ww+NasH`1e8%B;2Sd=I~sM
z+vXTZeX=qGkVkBHQg~=oI4ccVtLUY=gr9kztp136O1SJ(4s)gbtlwm6xlF&J>Q`xf
z+>c##9M#yvxEOg@n)WrG)!0GrOt`E(W3^IaFhqPpx<>8H?#)$HxqlcoeUW?6x+as;
ze%8`*hMGeK&#o!D7(FAlaKu$qy)Iq3b0bZ7IN>$E?CustLksg-fA)Oqv)KS!NCr|Q
z@eK0G3-%Fnshr!i65Z5-W2pEz+XUWvgZPSUv~`trG^y-@qU}UPyEK2#V$S{QD<IX*
zNvw%l>Sn@rq#4pYBp-HXV$d&p23?{xao?f8A5=<z;I|dOP9a2?g@gPBA2!*|*T6P<
zX~5@<Gt*@(c#kW)E&PX$D`f5XyazY72c(s}9apb-(y{J?l|8AQ&f%XBsY9HV@AKdd
z2Ol8-?}21;r@?zpR%C>|_Z&03(EAF`kiuX&$LrG-RQ99^>be3WLt^Y#)6gHg-7#kg
z>2kTPbhhQX;=4LKH+nKZ-?M1I5P;1-@bdmVE2s`TzY?@?tdh=um;3E?;^xA=x=!RQ
zRI#jizXJMvc-Hm}?dtpf1mZ`PxU-d6m4BjI^Z!jkKB#g{TmRNYT~O_$Qb?J?W8hWM
z{H_HoVD&*AfVqi9JuK!NB+PCF%!Tt0TT{%Ndc&x$S*z2)fq51Uu7mDp865|HTdX8B
ztAs6!4w91fuie+0fzP51Z<RWZCu%b!{6E^>Ga9b<dmok}MTtmsL8R)vM6`q`38EWi
z5WS4vyEMIw1QSFFf@q_cF?w&KGkT5A3<jeO&&l`s<@5V~o)^#m)qgE7tTnSZ=RW7&
z`?{`u-Fv4!;6GX{QV4xz2w8|`mNQ_;vv_u(-x#mCVAvuhn;G74m?ocCXQJp?eagpb
zzDn-s_w=3fq{C;w@{iHXsKrQDaVtenj_A6z65I~}An;v-t&&*<-V<=105HJHyeqS;
zxNtyg)bU1Q4Chl>nryDYXMP8wnQ<V)YPjR~Ck^IDhb$r!zP(GpSDETaASM>Gd+nZA
zo)lFd@A!@5<DR6dap`d7l5GXHFLjbKL*yb>Icu(i!z~uSE{s4fxr!Dd=NcH7yK`eO
zFSp?(#eDWheZY!*6Iqq#T-$Kp4k~jE2A(zVneAu4v9H~BaBqh}r{kZ<xdUP2Wd5^b
z;JZaK0_0P8>s0@VoXigGVgr-`JdON#w@NftSe(K-I<P&{@%!uz33;<(Kc&+#-V6Dw
zyqB?O+KnUH_><K@VHp8pATJ@Dcl*r0thNZI9gh3<X4Z9}RyV?z3meE1=tcK$ai*U9
z?C*4l(oo}UK`F7W&fd^Va&9Q`hbn*?GQci{Kcx?-(=Y)+EY93^Q@&IcB*0tx9?R*$
z;=XMUp?-G!>HRK_HE3(B#BIL@!M4n8HFpCM@Z_hKQ=vWn-tax|rnzy@Shel9H;COW
z-^rz&n-p=&-!IxxILyN&*?!+gn&Hx?gm%MzdjLQ-I-_QDzz{>Tcb3ymn1h(_o{7i<
z9KSzmU?N}9-7Xc%67cXxCOW0ZT6s6+qN>TZF9><dzjtW>#~k#yg)@+*x=g>ccZTn(
ziWfefsNWNE@0uFNRf&Vw=o)WHHJY#&2=gv(z3d_bwcs2n6GP(u(0nbcf~mj*hCdli
zyucbec%)7%!&#(Xb26=$P>(qrpUshgt&)G6CD-Gec25$y_YAXwP_JTKU0PT5zDKB5
z_2}EFleW=dZz|*U-jZ5$21Y;h7iV*m2fX^CC{v^4e5k;h$RmgU;j0Q^Tj46Y@NWEV
zA#2sJc-()2QDzUVVmG~MJu(k8=XRC81zejv9#X&0_=S6uQpm(&u&`qf$bl~|MB2i5
zkM^O}R;u3A@MIEiLcigg%1j04oodTDL1z1HjQH56FBBXMAs@EGt)!NRMKi+MX9OKK
z^gAgo{2Rw#LhDo!mHmU&q=PfH=FcwVj~eX@=6Z(H_Qu^i3s)g?H^%)(fF~1$Mfz?`
ztsEX5A!01=`h*D8$zm!XRx&)b)B>KvUor2EPm4?oxbz>WqcH~Tcq_DcV8!wDKSM;u
zUK9N*-T2KBH%=lYfn5hA^U6rmQ)ZP1%lbooGeYJyR6xQ%WF7`f*#PqmX4adj^*K$Q
z*KC4JSDt{`?3_Zf9d$J)uY>yOGWdaOGwYCoYRtN9RK#Ec+QwfxW-cac@Z263Enr_J
zR}#Et_HHZ2atn}V+e-Potu7Bh*`E7Snfyqk7ww(pE61~WY#ls1xtwfwmC>ZGDFv7^
z0pK;0H5q*cY0M<l$g78V!t~}ZZSGH*;j;b6f$?7VmgYDUygJk{MDpXc-(@_Zu^6>|
zGWOCkV9JM#O9a3V5PV-Mct$MA?)mf)UshRbR(sRZcuNI1v4%1z*J+InpEXet*j_nr
zCkE-+Z^f*%%b})G12c|krf5oNn@!7zHf_wL_lvU4(3O?Pvh4ls1o@2ccw!|f$U4(d
zxhLeg&v?zssRRHXoA*|dpcwr4X`IPfEdcSH_vO+KdD^l(Wh5wkst`Nsg(RzUHzq2T
zTz?Zo`=n4&q``h;e6^Z_LNyf)t_H|v==@}J<((9J8##k(4@$&)zO}`|?Qw&&eq%b^
zs{!|hei=m9ujQfQNnn@&)WBO50K5;%f<G`}jKUEp!jR;TP<d)#iHU)I+QgJ(d6y=K
zd_t@z)hsDTYN1T0ykLcELaztdBpysdaS<^6iJ(W&y0zh7dhj&Y!eXFftJle0DG0Gr
znavurn#8Hb)qdCX!XK9afZ8{UI*n*&9Zd&UG>uWonf`%x+?i@8n5^kTsjL}hyJl!T
z^?>9Lx{1OGcU?^mUD#@6>X!o?rI66@66^31&#P*)qK-jYZuDEmZn>U$O4<T(+(Xbz
zS-$GvY+<!?lc@|S3LaC@t6Fn^)JJ?)OMD+QVmDRzx}4p?sGRB>_d%Y8LvzKB<WaIh
zw&oW__DFZu(uxd=LTmOSefdM5dr~Iqg*Yv}3P$dvO>MM@SfWRNXrEErz}uizNH&~)
zj0Deni7uGu_Sk67u3r^}Pl#}eKSo@F#V5ot=w+74Mv{rOvf#5k{tu+hM$(R*+zdl0
z%$&I3T~~}aMcU5zMWjOE;tJa+oNhc_FI8~5dTsJPR^q3a*EG<WR*j>w*`}&c0hDGs
z#Z$%n)c7VaVzBcj)E+4WdK;wmhObzb-$SuXXQ$fbxp}`P+`W_^u91O7I+;rx%s(bV
zo=|ldAJUJfCw}Ln0&(eA$$X}{!w#Z!J@Zp^7<AbYgMKkg$^zcQ(d*}lr{RymH%c{J
z0<&8e2S$s4Ch`HzP6v@DtCyfN3X-<eY;Zbw3jdbaN{N5bAG(3D3>|Z5I8jeP+EnjH
zvnJ|niESJjXI?|KFFBs_{1uQgW&ez!`pfkebL8o&56gzp;Gk-zQ=!2G)5Gli0-<VI
z{u<-q217yr?OE-(?tG_1g-eLXN&{}`zoMBz%aC_k{1Vx@Xf7aYyWlhw6`&7>v)0!p
z7<Bp5Eo|CdFi}L1bl>|jd4^xswLHm*K0<K1Jo*ksBznoE`;KTbj8l4Xz7>@+BF`-I
zRQTnvHZ??vT^^69%yzuc?mQBsC9>;b({Y8hJoWRrCd?TTtXUNN8%7o{+_U@5KiJbI
zyNI2}V05{eh$fqicJ7vq2v^!EBPFmafiTcl$DC1~{{t=gV~u`D*_cATTYERD*0i(F
zt}lO#Pu$_zMK);-F}c^FMVecO<tZ4Fu2HHC764$%L!?j(@&Vb_0M}cjN5dw5>K7tL
zJG1g!k5;F~9aV)Le6?tR_M37L4Ve%Fh*O56F25iKGx)g}Z}V)ww&OfU5}ryz&oBz`
z&b<hjX3@0*YQu=-zWTyTjcHmBvc~;ewZ)J-9-YgcvJx!UDj9=4Y`?P0J-QYq0;1*;
zw_+<`%Y(ptEO5RWOyNE#0e^o8@4&KHPa;du2nVd*p7QyrkWyK&ptK84`dQv<PmZBN
zqA5%!d~a5~=rdSyyV?IBX*r1Z)p+HUw(1tW);<}>?3qL3ax5z<IRA5%fB*IGq|ziU
zW*9#sX<C={6dmoF02NR}c3^9%gr`-JCL~>}I5JHTNFyf-)05F#ht+3r)+%!w5pMBA
zZ?G5V&oXonHH5kc6%niz@_+^;1W*=loUgbKII9&1$TiogvOZc^P7`+#Svu0)x`!Vx
zQ5_a9xN(K{0XWr@`a%cCV=Ymx>H7OKFzU$yCE)#d+B(Rfs0JJUlxnQ{&9{w?Vx7;n
z>xbfX!vK0PzV2Z?iKI^Pcjy;_z|8f(bhDjQuIt=DeKmFg@HUued!IrUw#qdoQ#Wbr
z&#dcnO=zs9qR-8SWMb^S!KDt)r!y76>+&>UHU)KH8&_4-(a8g{aP*g~IiUj3W&9m*
zfT+1JMZ4FQuRg%e1)bYcs;rfrw+r%l){c1+B&(O2F2A32)#k0jj+(D>f!@~GCI(#1
zaF#1u$g5r{gUOF(lxI{g%CxS(byqk&40#Rd?9=b7XD-z#Ze7IC$JlFI6~=LGONP&~
zstD8_1mr_t>yY5@-^dehwe7scgEzJW8F4^k`u}$)zW<ktQ1uHC?E;$l+(+NZsMKvt
zB6;3U=*+7Q|B{th$a_DdG7bdyg}v9bAL*IPgf@7TGla=713xFJ0ciu>7c|1j^SLUi
z%=~%PwPzd<kIYg-Axt}ZB7X<7J2ih<81cBJyVk;cHbb(-wr^-KGqf_VyX_92&_>=*
zz_sWWlAQ_C>^5*nrqHvg$Tid}eq}Z=ifxWi8iuWJV(Q_kB3sYx)`6ER%9_RsJZ>v&
zrI+AxDOUILq%T;Q->FGZdNEHmYm$1f8uD;zcKQ5y!|%(wT{_D=5R=8yg%dH^4Dul+
zn3Xg2=3>|wfk8LPUHc|&HdpGZ%(t$%{P*sg1R9-b!t8TUR`Zx}k@k%Kbj=DX`0?xt
zN2$eg>NhP%-m(sZL%(FRr;5$e98j?tg)`ba`pNvgGj0u_ZrQ@UCK=I^XJmGs+(CsU
zKgFrLs74?>w{!i`mF2az>+~aAQXBl@4!_X)4f*$HFdR2&K+}?N$cTweJ6}NG?J$IQ
z*i-xbq|AIULdVT_1V5t->;*p*(5d7tBHZTN>@BhLQdMkBGv4q82id}2Zp>D4zU4P%
zgO82GU9%LUHPC~K8?tLm9^bjhW(Zm0m~%aVoHP}e4m@X>OaWj3`rJ#{k(~$0XCSf8
zv6c0lDytS5uzHEMw7hCLaOt%^HL2l`0u2G(&GSip^f{u`vLqc6#)y5tmHhaN%Yw|D
z=jcj!4~^OM(WWq3XFTG?xEsX0kW`xCiXfU>5RD}hn^oes($nKRqJ*APs^EV>a*;Zp
z51H5KHTI*}9#D1`+6VAhH=lhbT`wG-3|gr<0W<0s<WJItIrO>KStc?8_h}%(l0@DT
z){nH(&GW}a&ebsQwvHVrW`n6a<K>iUQ58D)hZt9Tdc5TNTRHDh0ryc!X`(R_*t@l+
zmB+C6%HIsIxsXe^yRhcHlNGlML)}uLsr1Y#ZN<n}Gp^5IPSjj1^rQgp<a|oL`_uOc
zbgP6*nE{^a(cdOBHY7OzF{z3JUM*&rST)iO^BWrw9I;jZXP->V{5yBSgzd=L_3%kU
zBv}~djAF)%4*u)|dSW(E3OSf=Yb?(Wf3Ib`p-_MB;0OL)`UFvys5ET^atv{|D7b+7
zN9J_^2*>qq2a(5<@+RK?xWfIRB<Zx@2RZAG(#fiK|B{g3ez5=4|5l(K4wqG%rRSQS
zEBh48h_;;wio&$h(X#1Nyzu^F@MHmYTsaoB<dBkca9VdR>DgfLbgi<1+|jpC7z;1V
zelnbOgCQnG_;cZwcG<$pgFEg`($b6V2AqM#jdR=O9xux($!d=*0(-#1wzq}eNkOW2
zbQ+sh{fy2^RZY5_P83+N^^4bh=c^SY(i3>;8QeuK8&m;00d}ge2=Sr1q}6nwz_H$e
z%q+UJ>2F_v5e^5g!35Hlwnca|5TE5ar>5+a6R%IB-^ljIJekd2!+?H@OQaI^El+H`
zXJ=hmW_G6vap?xDX5M`QzQcSz136X$L3QPp4{v)Exi;`w=p~SNofgeE3Nk;fU&EF3
zvzOC;H5TyO;H2EsYsojJ2nAZaNG+6LZJ-Z%y})7?<L)D7fI2(_y6j*ibWZ}~{No@E
zP)<&!Ps-Me%)EFZ#2^*8Q3)B=MVkm>p~M3Tdnzn`=mXRTxYRftc>L(5OAo6FgsmQU
zMb#Mma?T;<!j9w$zL)e0I{es()zYjI9&%`yzxC?Ppg7e;L9nhXYeT&WsLd&0sA4ie
zT4OuBZ&qitm);ls1Jm~jAhf3dqh=8H<V{^Y^ycb2y;XMpQ`s7OBQ4lulWfMy!0t?p
zi1CBnUtDCG^4}r(vzOXfP-fiXm&&Nx_|=yJ%OBlMZiG<*#QHs|)dHR2=0x?8Xz_JW
zz2v<ihO)y(sKXQGGeu51YjA#-ug!>w^Q243XsJgEKb+c80cvWk72EGn*oQBk_Gwt6
z6)`BSXE}Z>o1r3D+M~VL!yWYb(yL7P9o<ZfjK0bI3MZ^OPz@qEfUw{#|Cr%vMyAC{
z1Oqk9nxCAE*#?j5yvL<rL%_g9t7OR&XXurF?~#0`T_!Y7g;7Nu35y*(>VrP@R~yfw
zdCcQGuA`So^?t?OzvCRgfcdUxKvB7lilD;Zibdo>;dv9kwR_fCt|=Q&)w0~xPdd1c
zgdX4q?W%F=x*mnT5D3grLFkJ3zPy~`-R4&67D3(_@mJdHO#s3Wr>N0s>=KIG3wH}y
z9D%?pGV;r<$@j@2*W`twINT07#=>D`(k}L2=T3TvwwqEjlL-A@wK&3iuOZZu59d5S
zbK<?}thQ>e8#|EHNr!_g%QNZ}0J@-?TJ@U-%bbmS$m3UImez6L%4!ur_fDqm+qtFr
z!)ZU!-Jkh%^hyA73f7;vvtEt!k<O?rP@t=R<3Sn=V+lke20bS9J;zL)<?3u-TTn4c
zCY=qbGGFGR=0ZHCHQizv<WioazdY^U`VjWMs)5#JTUl{Vq^+mHFHNI>g<oG*U|S{{
zPqO&83-!^&;|qAgKd1LT-=^rK(@hZ4dIP_V{sQ%Vb+`CJPvs)C#>!v;Su%Dfm?mQ8
z4xJ`R&51C81ppDiP(cFt{rIs`*6&HjTa}X+WBCrsS+x-I$%*`DXZ%o105Ifu5s*!$
z@IdhbC{;W}3InRH>1(eOP3F3mB$+X7Wh9!w(~meTsgn(_EC<r8;F|vW2_@(QyiSSV
zy-lsxzwu^$L?Pby8%c=t2cOZ3XSZa>FA5a;-<vvV@s}#Zu4{3jbekJdWn+Bhnu4`T
z!woZqZQI<bXai~r?8v^orr#c||A|6nFkj)NIMRSo4n9{&H}0qop<EA87}w0gSbiW9
zD3PG*b0m|<d-e}vx|1F=1ikt`#mFQXzEx{sbNGpv2tu7ePZ9Q%+Be$AZYpbS1C)GJ
zt11Lf_|7+7O;;+ksZSz}r-#m(h_?HyB1L{~UTXMp6O|1it@AIk#A+rDUP<b!M}_e%
zoPeu~m-L&y8hdPKRBi9=NZ|Dqs=jN+<xd_C@ty)zC6UQ;4r0kqNy=3aEHXd}whCVB
z`K-pRSL!-evDsJsV!G9W)`*+|&g3Yto!5R$dDO~#RJ$Vw1qxh-b>Rz!KAB&Q#@&`~
zU7;JtaHwYk&&~Xe33r?l%o6q00kkE#vT!;ufrTGD2>(Gn9{zZ`4$Z;y{+UF!*FpsI
z^=QbQ-0KEa@~EKGuk`2*&q8axWPnPL<?3YB#mf!b2KqnEO3y@V*X3)2ovH(okOM!6
zwsaQwnd6iLYW|n0b<Rqns<-auWM~oBWYngEA+$u+r^n|HVW&TlpJt{=f+>;#*f!vm
zI)|<f4yOFgboHf*Rhj{Q)*d5v@WNFiXp~y1!W<ytW%l2)trScA(^kfxN(?c@bhAk`
zP}ghmNo2qNSrq^^EHa&#%p5y6AJu26Jamv#D9Ql7jc13}o2Yn?YIJ{xw*E7o)Xz<k
z_{HBy2q0AE82;jCW`HOOl%u>(Vwr)V7jP++yqOA^pBlV0HNEEARJ+iG9__gLWJS5e
zQF6#l!}<KUDzMbLK45)C#0(8ZW=+_Rem4+^Oapp*+qB>EF0VjpDGJ>hMo)VUGqHZk
zWbUv3E$Gt-Ty3s4Q16v>i8zG6{L1&<)$#K^|0ZIhw2Nd_R>(TNM(>Q!t0kCnj*Ks#
zw8^gmsWX!I>u)QZ8(7Ns;%}PRi~*<_r~#}TS@}y)3^9cR`y~T1^6oG=-M6s<D!J<o
zwVa7sI_wTOE=WhtRcP2lJhd0kIxT~E0Q$oRj0k$}1LoV=SdW=yMGgGQ=FZYZ831%_
zl)9%oBRzkv4lP=+>j%nm_HNRCw#lOYYUz(U#CqDaQd4S)@Ol2z@w=sy)-Y_>X5^i9
zj_G0Wlps)*T6Zw%F5N6vcu-h;4G`~8?UtSkeq+UaxxiE=Km$#)oY9U%Rty;iOlW{h
z03}4srFjVur^6zGfu86R+1?8gQjmODv_so(>Q5!X3n@ugIsp2wWO59{4W>VHX|5<j
zxgl`QyW26NI_{m_Ny{gQ7~1-^yTLsB2%YI#y!_19?iUvxa=F%FPt&bB&8$Z>iVzji
z?F;SRW$L**?QJnYVRtyq?UZ#C{tFF|oe*VXL136zW{1nZ95j_x&i7kJra{>=i-a16
zJ3@x1JA6*mo8=zVSMgHIC49@5Bk=jH2!sQlpu>0x2w7}4@zwax%qc*L=Umq1Y1ZGx
zY^N7~IQyDq)kyqtYDjARE&9pvVP(&ornO6G+h=!z>g|E5N^qM^+2Hxv;ej=FwuTv1
zlTNVfY+9AP)i5tuK72cjqj|t(JJO`kIuYTZSj8!i_XXrUAbns9?IW-qbe&e*=)PdP
zxlQLIZ`zf8KU$~cYAOv5-D{OdC)>G7ooB-!#t)`u)5#XkXT*#WT$);efeZt0RUEod
zxDfV4kA>+kGnixbB0wJ#<We>`yHT*hEi2A4_urf1kBT~*W4E00kN@?vm#J{ejd;ta
zM0q+tz!1M;-d>xxX}H`tuHlIoySB#gl@T#o-S0*oMY+_dE<vdXL0DpZOon>7;m)rD
zy_JYdvjy!&Lvczmh1Ej6t+gSZdUu6b4II95Uvc#Py#dsURmxdtK-{FXKlFnW>05^M
zJKOT5O0G^P3z~DXetF@%UZeUXr&{c;z<1KFQR)<IbvWceR}nEJe;bpo+cy}waziC*
zy#SEYoHn9j-n<^HD*846S)CbP>Rl!u4S#I#0c@xq;YGGR@<@Y6qO8_5F@XZ%-4>G%
z@hGwZbh9r~34Qw6aJqJ0I+#l^z>g12KkwyP*!ti4&?agkybmbA4~Mf9$z<zIx?tqx
zS?hFLmRau-brUjlom_Qxsq6)lTItimj^Eooe#F3`tC1|R?dslKtVYI8--gV22u`s^
zDYRvPgr4@F0Q<Uk)ubHhyA(hh9xOlOP;i`x^MoI-<D`O?3n2j;1^pTjloGL0m+h@l
zNew2F!gt=Y0^V&qm`xr8sQMRM`UL0zsBd51oV>rON6TX{$C%USQxKeld8u;S)ud;=
zAfwhc5zNOGFY`8am%b1J!@=wxzw>J!Qa*{qpnr-}4F<4@zN@DzEjQH@Ciy=0s0N^$
zqc?kwj$_(#+4?1<d-YZL%bxZ3b~6wKXgVJUcf3&%6t?Kpx%@z-ZP^(bq>XA--mCzg
zNs_984*%d}U_}1y%HyqZEJtH56&PWH@Okok#htcKKbO7N*gKMWC-f;8f`(|qHiwH#
z-zB1gw0dBr8dqPZz3L{84|g3XF!*ZIwrCqWneo`CJ@a!2rt&w#C2aCXzhSeEfSEj_
z>-pMBm8QD6^p-k8yEmZ-zfPCOuSSP3eg%cMAGy$J=8QjQacrP>DEH?UqCO~7Pd#mx
z+CvDq_UM(yk4AO0^~%tSx7T!%(ZK@zK6$?Ni5wLMT8k=@kf<a`*o)!i8>h)5hTa)D
z$J1*vk?@jp3Rj<GFC3sRcgrck;q1Z8Qy!cdgm8{xrx}+ena(UF7zXp!EhD6Dngis(
z^R<v}WskYb20H!K0#WL{g0UK9>$aj=S5K3thN}cS6oIH^X&+isz~2U&b1uN@_@y?D
zhM8+!H8e|^XFf5E-co|V5;b{>eO#q}>M5y_2LBRXq{?^IBHVF*w_duuhh&Wv40dS~
zP{TgD(E$LyfZ2*iv${|@UAXNVuYEGxGY9M77z4Ndw&n2Unc;BlB@VotK)5MNAaH`%
zgVx{_Ul(Ud6%V4iJ|^+pS2%z%F29)1ECEGa>-t7e2d`Mc*&MgLp`uh-0E{i|&)|Ds
zrF!Caz)qvYI;<4jNe|6iRlh-IL73eXl-eFt_WJ#8I_*-Jx60nf4)aa0ylmT1WIJ-+
z*>$pxMlc)_%Ult<BEKmd8#zcup`KTufp-y#wV=^4+5Nb%#mG7#Mw<|syldUc;^+5Q
zn3a;C&>cBXS1&)8L;au4ms~0L?(?5KCbHF>dkFsS)BEtm+G}H6J>WErpjl5yX_()f
zJ9>idaJ2K5{-U@OZ+s@u5zV~NaKUokwM74$TWWWa^sM<?!>f9=dCvemo}DHFOGr;^
zRM-w+$tg+6JR>yVJ%8!spY2}YpMnuS+s+j0vQ+_6=Ua?(zx(TVRkCxeFg_{RQ?gt8
zxc<P+z~$e$f_|G*a|?@EJ&f1}r$uk63q=2-4fNtT?VQHs=zZ48Y>wii=KpvD!)T4?
zM_JB8iwNszY_&W@MTkLeZc_S$W=uZ`_2lKwrJi?E@2{NVcUcH1W2@6>%5&%i?qYG<
zbGB~Noj2f&7-(pJG*`Fk<9GWFd&ZX=lQ=c}nkVB`6^5u4!#Q8N^08NbEd33>lEmmf
z$sZGB#fP6j3!7N%k}&~uf%@*)fI>Iya=%RzJN?k0`9HdNYshAq@%e`T+{GCBd?)=p
zRg37K7Vp3?$%pcD57%6_)=WI2qcynTvj+qUgXbwvzJ%gv=fq#v83nG*U8p1HI@p<^
zg3tIGfxoZfa(uSL+-K%qbbq|2sp-=BQo{iwQSI^A6c{O0zf6dIs#@mi7Pp$_3%kqW
zo9a31FLdgevg&<L(j~#pBXME^fix<Epy?X+MC-ffUq}c1?hCXtHp<^Al)ktXcwC2-
zE_JRq##S)1v{u@;_#H<W&(AjzT9-<YM!0SYqb2b<f|3hD!Ih`I-D7X!oAF|zRrJgO
zLBFCMne!I+E%4ZXWo4CR5-Zj1{kLkkJ1qA{Y}u7!VmaFWSHd&n=g2i8`YPFf`}HU@
zE>w!)(9QD>{yLp9;husI#elle)LwuUA|z3?=0?=QQRBQB!}62B`n0mSjmM?qs1kV5
z?W7Qc*V~yM4N%P|%D9sTrc5cE_TdvF%Z|qDcHaqC^`%*+G>qqUvFaz^m$|VDW=?<s
zVVI?#Pog?yom1V1A@IA|OoQL*Hx#+xfe!6<)}>U89;?@9ZK8+us^{HOB%I^|`MzRO
zJDi^2Sao{KIR;i*)gK2bO+}WsV|3|PjvBwQOguwJ9a^rvj{X|<L~MXSQG5kh;;IJZ
zYW?Oe-_H#DI35ZY?|LRI^C;z~te3ZbO23pmvF>{%pTlm5S)DdrHJqaWoWBl{GrP3(
z48Q#5=Qc56v5{nhX=hijBRv-=&g`xS{A~%_wcj3c0{M>*aW>?<iBf9pFbC>O_dlC`
zw>3|WyT_2nX*hoP5u+!C$60|5RtG@6W+syW#?|$$W^xbm$|y{OB5rz0aTwrAyrk^r
z@`2Lgu$}IT10CRMsz92Tn%?yE&)R1t1Ha_B0+=G9aLsMmSj=Z8U&*gF)gEIp#x=eM
zQ<zI_O-pzqirw?pJ5Fa7Y14p}vROm-eBM6juhrA9y4yE!vmPtGMbr4l@)nAJ`H$$f
z+w{NVT>og-yTfZOw}`r>_5Qvz|3w=9qiv_Y_?K($Z+QE^@!6m8vMIz<?lYbNz<Ali
zS*v>dv^O)~mDX36spK>=*wkT<RX05Df%0Z3Ubn&Dy6=6h(yAC9&Fzx~adgSn?7^B+
z$_-~6a)zVr^<dn3cjdwsuim~Ix_-GDCz0Yk?pq{L;QEy8>W#}+O$gH39A8#b>+qA1
zq6%J8ATH7Zch;(@_|HQ9f7jvv|Fcftp)3C-DabSqNVP4jl7+=BMmdI1jcv)MZSK>O
zrTB+WF!F=?+wg<{0rF3$LQ5S&pt*>|#Oe!vPuC+omFGFS36XjUK1aLKB*2%OoP$Kd
z@pGZfJ;HD6>;NYxx=(uAPqILZ=2kKvStR@Hg-vUB@?#R|6FRf`aa^z;)YODy<>VCV
z&`Zb+Kj)8%T4<?y$Z(+Gv2kjP)seu<eHwCF$f4xKQB-1Xr4Es8qm_*#5^EUlim>2b
zNH_1iMD1JxO6`c9;ZXVQUBe!)o!(WlM3IC#lf?SNb@&YTIu{w!KqQM`?L}-%n~f5v
zVeyDGSijBXz^}J`kC3)JQA7tMW7jHEp88|2!(&(o{a^d#ja{36PY6uAWJHuGBKJI>
zZzVP*KTu1s+*rM6@5QATyzf(AwhgY5DFH}3ECh*}mx@YzzrIP~d6qc>_)NtYo1a6I
zhA?NRse@eo0VkIl?=Q%BUVqErU=*%rSbZUBd*#z96|lI@tq;kGJSeOWSwy)|t!t{)
z9@20vhIXh;<OfFDrR%qd4sY4qN)ypif=w_=3=+M=#+aV{S{YjoHNPf*%ckZ;D2$4Y
z_<*4o?77e`&vfVYGq3J}v2;FWK-C_VLo%6fF{(%28#137JrvHk4O=yZr$7f8+M}(T
zEqk}FwyNI0LidOo(0k0RgsXViX>zzHJldKxlZLV_EFV~!9iOkhwxUMVKKSBuW)_g}
zkF)T0+O_{STHe^DSJN$<uBHNt=(eM(H2TBWB69{#E;STU1^iI7pqgzt3rqh9W_IzG
zLs(1SjW2obBfKu68%SpM=Td5HL&gG5@Og3hYpbW2qEQ0HCGZ+YqRvrMCS!yM29e`h
zEWtQeb!XvW&a}R0>i3B2%hzMNtNE5Y#4zeTE39*wM6?`AR@4fq!uwE!TITM3MVV~!
zX3>Wv5ZDLjwh3Owvqo}>=y(2N*~Jce#-;mEo4$aFhjWFxiJtwB>^@dFEzcw%zVj>e
z0i>MDKBc&1Z+)&x<MvhHpIBYT7{$`djn9Ah^b=QQDuU*ML_h;sq<a6MYoV&2{;lAC
zj=Nk$-D}F3oCQ95LSlIEEYs6}>@!<1N|=q>H14D2Oi*v+YQ5u9N@LfUcjZSN{ZEJE
zn!*FqhdBYM{cq{iwy&lxhWcKsjPqb+-<k#FXBGua(Ip9r=!W^paCjfmqrBZrm90nL
zQ+a%vgst@H54xb+au}ngpw!B~A_La}>#g3Nu@X;4s0L3QjmSGpcHQQ2Wi&Y>{PN@&
zVX~S!Z;FtTEq*}*QUNM``r(@%tGZLqJvLL#JXI&Rz=F}~6V@mQq`;`ylc#dU<}{%E
z`%<Cs?MH_3oih`%yhMf52bHPoW3Cs!NE|(V-x}caNBa7!I%u%HqAx0$1hxkX^x(C$
zW|5?$3j+32I64Ik>;R6KVb~lKY*?+iftM2RpA%DvQS-g*H?`SP-R*5)Hcm1uZ0oyf
z-!oQcTLuZ7_9*hL4{RV%A|bY}Qvt%_xS3xqgMz^TkKQotomS~Nw-Ra(38Q$B4VS<A
zx(yWAvpWB}W%6Si&P?h;IL!~b7!t@(iePa4E!L9D>&2zq2KMQZ{A4-fvRS&!d#?S-
z#RNP4Z56CoWu@l2R=!Y(MKJFBdGb0Rz1O}p+E+VkpEg%t=^@>(usmh~8sxK;%q-mX
zOFjs(j~cae?2&}rx5E^u|BbwhW&e)G5;8zY{1(vmFooERDwpi1OZZ;)avh7p{w>5|
za&E!tRv|WMy5=5GvGzWsmVKW8$70x3$r&BkHz1m)n^?6Ewf#yzu3HJzjTg5Y+{3;#
z`G=eJGXmk^k%k&KLZ><YYecEnrP_URKv4@?MQ%7AJIGNrwO;3^og-r}nDJ(*YY8v9
zxRUl}Q2R|!IpU3B7I_a+Ix?a5j=HYroHoO?b|I6yB#baOn!~h#WS_3gx*jIbIABkH
zID7oEbDR<-m?lVu8TOLW3U}HjgmPl#xs>%jK050QWGb%w-Ruqoi-W8L#|x9#bV8w7
zhw$VJ1QAOuz00;=YkN2;25<ui*u(P{dtcT1*GExD5MOH-e%TyEJ$K~MiJ}0e1Q;5Z
znom+~8#=(lH(}@oz*xqat?z=*r*=DJd&1Ms)xZ7RW8!NM682NSWzw{F+0*P$GIO0}
z)S5g(kU#K10ky5BW0eLNPcehi8|_;8jlHjW#pfvW^om=*__pU5AA<7@D7uM7+h@{$
z7M*1SCdQTaw$QXdbYT>zejS)(m{?zln{a6VBv`}%1XjS(UJ%rrx4b$>o|qWgwM^Y1
zWd%$)n?L5^s^Ft`UEdDP$08!2%mU?}9NluXKc2g^J1xpZt+F!+gQ9q%O%`QdmEAP-
z53i@hPgLM2X#<)0J%=;M?@8o}oqtRZ)LJ4|E7FA<hl*XqhK>>|#k{YlM6EVht}(k%
z@7a;Hdzw6dzwpVypNwVTY`396bNYR~N!wsqu$dKk;J!4v3cc+BScuK~A-AL<oy*kD
zTd6HUY4X#xZdNA{{JG-l{a8)wvNCBJOVj8S;Yg-2un*^AbN>Sd;b%@Vu8X)}E%{xm
z3#(6+x4sE+79e6K3px2+s}{F!8^*Y8`@1rcY$F=Y>J>`r?eq))-()czujPJ>#T-9i
zP(w1A+Kp@(wnP&#a>DwgDvPkhJA0xMXIGWMp$PZm^ROyA@kkc;Q;E54OrTkt%aUoY
zxbzLN0~d(BJ9{L>0SDMm@OQT_6t3}`b8df)X>7WNYIhd;;8yCSUZBaaWX{ZHk(Ju8
z;IG4i2A4~~)l1+$rt4Q;y(!0oT?stb>Fqy%Zti^21rq)T^kM$P6#ni!83~!m7jQ#<
zzU*=Er}$&KI@5_jWn42z^38WxymtiFpdCtoD@6nt`glfEY9DTgoOHlWj;%LqsM860
zKL(uogA@7ml6GFFjt@Uxy(eKoCvM6HN4_@!ELRtTmPq~NX5yn0c_9&xOK^k%|8>=x
zn33+<>|zhA@eUmL20ZgRkS#&|a&#=w5?0T4zou8)uoobg;#@A}<>vE4B4DYkC|r^E
zqdVbUoRp&ifbjKB#HBN+_W3E5Ali&3+cY|v|K2l`8;r&ETVG$=r-XxZOuff|Wo+G$
zQN9RWNnACY-`YTg9>4Cv89Og@*xnS;c~Jgg*Ged$Ip}`NlSj2FIPSL{%0|)Fn_7|g
zgtny9*&W8m2)56CrUkzv>0=dJ#!Xtj2nN6D(%?ooEjCI;v9Rf^n2s65k4z$kjvaK9
zqZ892yUN++p>zWF;T0A=)2M{Xw{1u2Eelp=08Z$v<a}HIo8LzlSg!11%k?#N;4S?j
z5wOFO^m{Knv!A{_+E#~_OO#h`pUkE`ddq${xvys4M4(R7fjb^JG%DbESgn#hQ}*oO
zt3vKO#E$%{Ob4+8V(497hHyq$v{75VS&}??zzK+T=1Mr5TT>~wA%Ucj8WO?JruZnm
ztUrjJHfLc^oDx7yw^9X0IMj4KFOA9OkY|Pfo=;d%ZYbDjcbb!VqSo|8zqdWl;^_U#
znjUTU$xaI0QWMTt81Rl{7j;mHa@jM5UwJBo(jgk40Miu}YNkhVURga3catBi-z=MA
z3eZ6ckAF@(vrq&znEw8;@&e*t(7P>Zi9b(hut=<@Ss)faHN)|8qmmxSt4A`8f&)5;
ziKd><F!K~3oQxi9H&-ubhzE{4T_{-q8-7?iZ*9x{L&H$;4BXQCVW@0XJ;VaZL~>v-
z`Y6^0Faj?!HY{ltA&NJDkA@x_V4i2GfN}*Bb*88VPdlbvkpv@bJL&XM<>7H_Td3ug
zvlsTDC8eK#JWngZzB*~XD6=_deEk$d$J2_6_9~1S{^WZDwJQK^>&?VB9bJW2CXW*c
zPd&zrE2v_7U*O-Hx)Nu@<&=$r&-qbD{HT-C^od#}dd0uHXi@NAUI!V8TW|RNxfg*K
zPx-s)JLT9OP6u#gbpUI2p)7y3>hPd1{=Lp76#z#`rK5Bc!ewrPZ2x)z))_~W{nWo;
zJw~f)u2ajN3Ijedg6IQ>0J<+&opN{Ey_t<@SMGj;zKcg0<u*XlyVfBLoX9t31dI1u
zUU|Zc<j-*(7|o%l8#dSPAilZAq79_?_V9dNQ5Qwv?3VZGDO?~e|G_Uc;B^>sqo{`E
zU<qud_Uze4L96xVbkBbz)@SP`J1_j$4tI5w6S9R38}F<!3~=?g*IvlCPL^f`YVrGw
zqaKz|;(#HZ*;&PvqUKsfHLylesnE_9>QAR+h6J8Y?fmx@JXFH&kN<$|Xd~2ob++%j
zO?t!&^pUzsrX@QySV>P5gqd{=pey0CS!dFTWF9(t-L^j(nx)s%m|%MJ<9oyUCVz^5
z(x7aTK0OaX!qmP?pxijN51}^If6*Rv>9j-WfTI@Y4+?X#Zg@Pk!tqe5KugDY=O^nh
zy!uF|b=mVl<aI(OKfn&w3mz!44T*`0wvqQtX5>%naBTHLM2gtHaSfwAFKg(x>=gh@
zEHg&4eOLj2d!Aa#()z`ZzE_q2*vd{PSwK&R21~N+HuXOc0deQ!_=^aY{hcoXXI7T_
zCw~QDQcV~qR&FYuKZIh#SEt7Jwh<BV*j~yw6J%OL9Fgy>L221#s>7nDdWlFuqRI;C
zKktIO*puPPGU<<3dCIG7))~j>RKF-APvx6_k9KMwDZLmkj}yhuHVy=!x@gG_3Gj`j
z{ff|dF(r|rU*&mz9^}>qQS_5VUG8jWpA9Z5NbCw~DyDxSV=VjM<$qjzqgY55<ZeB}
z9njpy$|#Jiy9Tqw8WVWuvJSU!02CR4#3-|8=<qpj*aArKmo`^d`RT>sTUwWTgBv*Y
z38O&iC^mcDJ6$OU_;W_e_zY4dheBzmmo|k9W7Ht(VvW!7-b&b*j>hU)csJW$SDYWs
zzkKVV3cdQjK!&Fi2H(t1<f)cFwzc#5wg5gt%XiA^+{ibk?bAyV)!{a)Mo|6LD0=Zl
z85{c)o_jERPwHW~VdyUq*mt}+X~tn2k)`K3y8c8ebocT&xMCk#BT~%b&H})(#YhY9
zzM)xS(qRXHe3LR}3Gf9d-xUFh28uFsp%7KyV<uWZQbvr!$E$wl5a!9op>m6p8d3(s
z08!Jk1k%ZaXuuRVO`sX$BGu$q62y$BvZ};<W^;)j4n9PNyS@3eisbiA&6j`X`_R(e
z*)hh(^|7UifF%a@_)F;s2aUAa)z|u$UZD3p4=sE5-IeItM>qge#+bxoAV*b~*ye&f
zWt7KiVy1z7khlqZHZ+ol(d$)h@0}@spFL6hNYEj<t2}>S#D7NHqJZipBU;02>z9JT
zb-*F7p{JV|fe2`t>EE;KZ4b}(JgQgPO}+CP)P!n(XNEQvNeS{Hr+tL`p;2lF*Wa57
zf7oy901nqnEgRu@{}GHthGZoswB;Ip-~p^#{4DiMOUK1E@94$M^%C5;zevsnEuAPB
z3L{_X^SiuQy3XhR*c4-_JpS2s!sSp{v=S~Q=REwik3Ab6#HEZVIlhUHl(J<Cq{;%f
z^;O=YRf~F1)z?UH7a&KR%|K@+SxATZIdL#cGxWJ)^G1e+H;<)Lnu2dtz8ZQq_ZNse
z7BnO2lls7Jd>b}n@Rc1j(dSqZ1P3a~c@{leQK6rn0+|RbJ_Qe;9f_3jjIL-)07_n|
z#;H&-w{9nZd;nPbfHeHrfG1)67s5777yagxAj&MUFgBaDMn65V3KZ#n-9&!!-|XQ}
zOwOpA3W{=vq%mM;)0Y%7D$e1xjIqjC!$3fY`6=Lwo;x(50YW5!bil5aN)#NiS_s?z
zD|5A4{c9G~XD_hFvRp=}q;_{f&@)?NG&3K?upQvxYeK%#_|jMNL!;0X=)T~^hHN)u
zIhverH{dlW57uJ%_><I;oQ2P>pnas9hwGan2ZOK@;sxg!ddPN=UID%_0rMHv9vb#>
zB%QLH!Q7jBE|Q`xfva6;+)df=L}iPI?1C*?wI+$*(>#%WUTkJYOQeY;U7x56v^P5}
z2TqYwGu4G}&bh|79;WV8Hbi0ErD(4SrXoEmemAGjUrvTZ(Dc8+|Hx;^jmuCEEMItt
zenKN)zY}D9aO1nUG-PDj)gTcnhWBZ8Euwo{``dY^++y)lD6U8W@DfH#Is#|VM!)6@
zc^2I^y0<+&*Jo9jtIqQ|zMBJx^hOp|Tr>>YidOgM%7$0my*(U{Ttp$#M)qP^GZIe)
zdCU`f82jpLVo3}Vt9(~hGPfpJ=-W<QdQlW+Z|kl;HFk^mRj64~X^y}8#J-nlbXm$v
za1kIC>4c<j^$WYryx!JKyBG@Dfb*G9^GWUEE3_g;Nnu^T??;<hHVX)rxHel#0!41^
zdrVS4^ircjWj6H>q|9mhDefz7ZQ<4`#fGj$H@pK_M@17H#tj_BO9?n?j;B}-*u<t5
z<jZV4rCQNsF8iz8?K=NU91YDG(3$Ehyg!lCL`rqctU^^2ZgN0!<R638IXgpA1|1|+
z-cqT%_EO`EjqBVL(^?*g^}FZlq*xIoh@pdr%o8-G{gTr`7v8|Kdx+1R3RszBV7aJG
zH}V|qXRgqq*^}gH6mlNazJB^BV}SU}kXQwiOm79U+twtX;0#IwZi^T$)vc3Y<5|6t
zS*qepHX1Izrw}*>xW;Ay*VtFmP(w*zel@q{(T~NgeCEs$vM3I&{y$^1H-J`XnAlJm
zHd=0@A5~N1?S)SI4~R`}e21v9*Bs1Bu1yl5IXsQfvyaPfHcDOwi+FH5LM(a!?-&9t
zn_aY>EE>Iar6_CWG*hcsD32$(N`iJ+e2Zf~NCo8Ys#Ioja|oU>ySP7lnhP^@@BGk1
z`P6t%G21ix;=8ni1NIbnc%XomO#R?r-mA#~NeE|z%N^(p6z3a%=|aS8w^$2JGfz#f
zW~t!sHAyyndMd1tDoBOqE(+u{cc0xmHWt|mTwz{*EBz@h_|;^c)%U2=K;ys19lSJt
zHyGUgyFGzQ4m*_syxTC#n*R9@vEx!!#iqfsq<YGVnLGHGx{$C35f+YeFS71Wk^Cf&
zM8J@lA;N9sN)>jmEz*5oWK#7ix5kOQa}~D4$bS}bTIpeKxo!xVlg$zCFy`axZ+pcW
zd2a;Zo3DQ^9HYr3Ws{;T&;fWjp0Q=?)vNLHeKN#Y4)EumlLFfhAVD_$Nb_8n^upCS
zIR$8bjn1?a^_oKw$Dy-!ILxI<;29jqXc(Kd_m;qqe^*eY&I)%_QkJ7Gt{0aS%?!wk
zeASLvmhE72v`%%I@t}0xGoUr_+Xof6lx>o@?uz&n1T#Wxs;H|H`Q|I4HUqH3$T8Dt
z0zyNKatn}9P{I)(x8L|3JGAZ9TDUL75)<WH^h8BaMlsxzgwRRn*=&A}<z)blATaju
zM-1}^2`gn?J%u8LrTE|gK?A7H)Q1M2Q4RLCtL)DXYfa_^lXlJEdovDWl`FC@y@yNd
zO<!9X0zKQokU76^`OhYPA#KBq&xE0|yuhKVnI{vIEOoLc(S(vwjhgbCqC9QpQ`CT0
z&vt@Xyh><eo*vzd_HtU$ZuwTf?tDh9f-MHzOM{}OJASmqN>(OY#g=EkkZH<q`F#{)
zdmupO>cq1dodhqCHdxY${XT3hDGP2It4Tp%^eI;Nc<d&ugBYW(_XiB_oI!#=l$?Ge
z1{=E+7FHb|H&z2CD(53T)cQB_I@u(*w}W;s3hiF6{wM?8edMtsT~oNRpyQ><u0~;y
zv~yRD<W{l}*|w+eJc5^g@;Mc!EB|H>(8A^)+1i;`<I;Z}y$j8WiQ-lg<ngMys9r#T
z$N&dlpOH$X;ovXHBLssx!2EJy@4Ol;y~l{-;xxQIkhTG$sD@t9?!w*_B{1pU?^}lB
zEA?47?fK^H{3B7<76Er*ODAxwVohG)%OT=IK;iq+-Pl;Yu=P!{9ohy6%)rT!u6#;J
z1iX7{JEK#32D0hKT3cp(NGCOaK1<wbx&aJJ__nwQ#G?v@DFNF1==8NMbuQHSwyjlK
z>tf_`w=X}iOimjX&^KLQyV#4Jb_+xG>9gVOzn@bn^j9D28<v_+Q6-90SdrzE@=MBw
zpR1!H8n(W4*W`ULi0>(Ik6#L=7|7%JRqex*VWlSc4Bll<9pnz6PPPL5yvIrkz<FQx
zmPRR+EbA^$&Nsw8D|uOtuP7f;=aa%vsz8k{iL~sk-|a!cwdRQBkLiFXdF&4^NeX-Y
zlj<(ZE3!z)+rTEZA_S4qc=?olpeTsjRg}9Uwd!7A6Q7^j2W`*tUxjG!54htiGyB#;
zCH}7RNe6@hNA-auz@4>10V)m9eXf3X5(uS<8E~2@V2saayn=FelEq-N2<)`bem70@
zXK8UsX%)e-!hGG1Shq==XNyIX4CA`lgiH5TE0KWc8Ajgx3Y;u#TcbpEvZ_4DwZ`i7
z&V(bp+1qi3L~ifE!{WbJiUPGV%S|_2TDw2yZzc3_MyhM|@V`Om44g(oK7P*74?gai
z8uG4Ux7W3NvbgPEPpt-cnCw!JChP?WU*E$BjIr!dSju7VL7sa&EhnrbGdPMGjX`R-
zLHGd0Jjop!tr}6mSFUOjsc=vB;+Lg;m&@jx-jmWSq_O1S{}u@F4=w@DJ}<?d_^P!1
z+y8^K7js7={Wws!h_viKxol(K2DUjF0yY+fsc=&lepAF9%0f;aqg};p>R2UAnU|sK
zf-J>S)5upP7YjI@27TMKd@M@wOTDe)&v%H$A*BPAws^*1l$v1Oozi+c8^AZ84*~R=
zmn>W~T%@#yrvr(J30W?!qXxk;1&OKQdhpL)<OvA^LYhevuY0!iJM|3*cPkZQv_|ey
zj>?3~m#0~ly&Q$BQG{#R^6h%`B3Kj(cyyzaw;s$)pio`zun5jL4jv(`>U3+r{-6#4
zdy<q7GJ^tw)UFFHp`%GF?_LiJ{@nF70b;28W6MR;-XVSE7`0^Nd|AC+qJFe4y!yrB
zkoGUSmZpq+5g3Ech^($&G<)#Puj>+&+u&-ql&qj@CizUZw`aWT?`tGCtZ8x^JfVHi
zP5kWQgF=>dqJ#faBXn(G7lCqP>rHQ?6YUq9)fCseTz2h3m6JK}^=&gNkinASX!2>J
znOU0vlBPr>Og%pdKa1B*a!xA*l+|z4!gHCvYD-xrx#oJcc8DX()=8h<eG;mHI6gND
zYr9+7ATHNU9w*_7d!<^X$rmrwTlriRGh>U(trHVIu$6Q5>%rczVvPkp+kC1}E2czC
zhgxR?``;F*>kM39zSip~U!Y}Xn-6$%)9VZ<f4=h=74YmS6<87sco(DbA5+to{!JpY
z`ITB$|3SV9eMtMKF#1mj>uRwX{Xk}L+TIPpB*tO>h^*V1K`iGl0C7G&-*{=+#Dg4A
zF#L|-q{-O=;eS5ZFJS!!(w!qdhXco&%Z52wNlE$T@?VpC1GfIf&Y3_02T%a)brfOK
z<^nk|xP>$w6@wj0&zTEE$;}iD-;2Zt$&8kQ!zOM#WN>dgV8=_)iaeA^nVn#<{ZeC`
zUJD5<=<R^l7!m8_2cw!<A7mz^8e5urSe*Lv^T$7$LMz0OvdIf3BL{w%Qq?_7GJi8T
zva{e$>s06c_<@~_k6bJr-%nh4Qcw^he=NupZWN);&@HdoIsgM5)fKB116_ud{ciJD
zS-!q{>7_%QX+>95?qeMe>gGsGm)ZH_aD+QP`b3eM%}6t@Vrq3lBxujXe^@}8+I3mx
zl`tQW?62C$WjZ9DH)<y&CqyUcMUB_~T=VMbxU4O0>cGFU?FkfZ$v8)2qAZoydF)d?
zkuJk#6?ut?(YhRMKUWS?4`^S7htbzr5gwZYH(yqFJ_cdVvZOT`>0fMr^jQ1;7j<Q2
z<bmh03?S%d6^wK+|4lQY#>(re$3Ed>fo^DNp>svmWW{XE#baQfzd0a}{L`uJ=l^zU
z<4(am6+Y)A8p*C#&`R=Yet4%!+++X7#XkOm5ophLlo^P_dYz_tVvqWY&b!gaC%hMT
z0M~xQqJs4K&5FqmHu0lSCcu_bL>PidxNfms?57d%1qE=aAzY3Id;KwEL-)Go)8nT=
zmo>a%KcRf0!7=$_k%hoBY8xMjbG-P>2`Z9YXZ7Fkz(Oww6z9A=MrS+lIn6hNV`J^o
z7gi0bbn<6yRrM%*;8Eswny0(R-qgjeV`<=p)cL$I0RmT@{xM-UdjN)h)CeCOq&(l1
zK9RJvv^tIrykOA<gKL(=KVUPXUd|NWdwM$<Z~=9QzAY54Uj@bop!uBadt{SPw;|Gc
zHynYsXvW9w=%d1mH7t(zJmeybrQy8&>GQv<dSiFgm{15kJ2pOV2s~0OdP+_7*CX)<
zPvYH*h}0AmQZewidGQ-}DK0L#xx4f84fLZ7Vue~12GZ~%IjSk|=8m3y{~n|efXPu2
zNHqV^kS2YKb1bQ>KmQqmdo=?Iw4-0cu2vM)c6aXZ1p=)hoE3Ck{LMb)uG1ezL2;w#
z`rfUrX)|)7lll&j5#i1oPM{N-u8|R9NxY>+){g)&UjE&6_fc#ZGIjO3^cnx$@g}yk
z^3W^w;+$}Lv;nH`O%%+*p5Oqdk2p4E2{}iQa>e1GzmxF3u)jFC)#^VDL>5^IApeWA
z*Ay@R2QH`TFTeWQwM9hKV%TBRqm{Z7+Pdw7#E}u7cI8|<F6_IPhMi)hdUN#>0d&&U
zYko9y7<hSG)%pjwB%`65UlCNRt}ifkJL`*IIcr|3hNz+R_M`A6<0I+2){vFZ3-k9@
zR)O(@zavIY&oSjY0cuEmN8r%0ap}0H)ou8aTVp4u_ci&`F0N&7huv@Mhi^5p`Pk#x
z`8T|Vjt5_J@BPN_o&L&Une$_?AD0$;QyI6A6>lW(YXm%hJzm+1<$y_A9GwNmC$+4{
z`<g(%CVtbCaQt*sKKfkA|L3YWuqLnW?QKmG`G*N=ok&e!p_Qp*A+EA+-h{Kd=!-Z_
z|GA@AtN(BX9@lfty3}$a8kx|L0a3o@21UZ0s?R?bo)LYD`57Ma#;%Hl*^XFN%Cr{)
z>FW}Vx3#&0@>Mo^c4L{p-cF+j?r<eUg#qG;fo=srr&%eO*z#N{+aM}QLE=-R8$@nD
zLS>b}IR{S<q5X#kNrjt)kAn1amLm^+8_ftQ*G7-mUQ@jOpNG?`iGfC+7qJ|>+FauL
ztC#_wd2KOl@{`W_m)iRW9ToV*5Y%fZHENtMJVu<aEVi4yB}g`H`j5$3C@r)+38G*z
z-n95fadsY3^&$HEQv5&6y=7Qj+tMxy5eObESb!iwL*s72-QC^YT@r!@ch}%9-Dq%k
zw?=|HjW!Oa*SFW&XYX_Fy}$4C@NY8b>^a6es@|$Gs^0&2<o^V?x3~Wv?aG4x|8%X5
z`afHlo!@?bK(ib0<&nP~4QQ0UQx{bQVz^a}v%;r2#oSBxPRAT#_^$k^E_^q_+*eL^
zL_amqZhu-MQMNw0(om7(;CsA9O7vDpf96{Ft9ysU%8zN$!9%cH_h!%7xrLJTWGwF=
zBNI)(Rw3P&t5b$02mIR|oVmN$C=@wslp6!VRYvMFFixKhSz-8D7vL8e>tP%L>#T+e
zhoGf@cihj1A!6uYq<3Pmur#O0aWU6U>%!c<585I9xnInIAA2}@t}DeaK7#3+9m*m9
zp4~pGBZyzfn#?BazzEHK0IirEZNBGkDZjhtqmzWe)5}4Us~R$x;#tV=0h>uvoaWag
ztBh2WQetjY#!tRmUF#V+WCuzj>nZW7dlc&HR+B59Z7%<SCkOXGk80h%&1FmdMZRS7
z34w^;*OQ*z(o%3&C0qTB`BmT@HhdH?Y}ptU0~2mE+UJ)9AF_s?aLGg8n=xfsGpb+B
z=iA@ihrdDtC7x9S(Q$$?GT2a<p<C;l7TR?alm6tnEZF&@#Xfe7<@=r-oig;poFdj3
zj<T@PEt{CsjG>j|TXakO2BHv1Qxe%&w%~kBA!Pa$f;L1Fgx(pV2+QQkhS~@4i%s{C
z*h%<CETo^&I&7^Tl{xNoWqW{)FcQ>ZuS=rEJ4C)9Oi+&TYEVFAERe(?nx?SNhU?BR
z<iS&JJ3m0+6O@sajw86)=OV7T+9Gs%orn=GF9(i;V=SH&$DCsT7KTOL5Lp_;kvO>=
znBgeR1+POsd_?78U-Bb4Fp4?UmE6#Nhm55cGMj63z30|FxaJn~*#{LiIIv$ezwR)(
z7#41=VByJcxv;H(n5vo(^;6DYC?(@x>7q@%*#A79;t$XElNr`Gz5!$IF6<T?r%YKp
z&o>w<p#P`uEk4dY*|Vs$7ho_K6}%CetpS@o;rzNn*f7)K(xRm?V+fWnwXK+LIFyO9
z%RFA;Dzs3WeX`WJ<rXvJ=nl!T=MtsGOI9MuB|XMZP)%(Re!ys#Z_F-a;d2chRS-Fi
zMaiV{LE`}3a?7~^jPIg&fPIJ<I|5ev@|dociNz)3VmB3v(57R34Qc*G9%kC$lxa%?
zBNG~Z46BL(4ArCyJRz8|^ClBA&)NIFhqP0czh<eN7WFP8XVveT>&`9$R3I=Mw5%-Z
z!b-J2yRGo{sx9-)Tg|YH<YC#t-97}W!oky<eJ@(6{DsLcy{dcptq5>4Jt1ov2@Rx|
zS^?EUcHBkngh~d{gQPc7XoPgE9@UQ-`_sP=(2SN?Ub5)M#%{5nSi7+czuqD5(2_XH
z{G%|sj`W|1_DrXlY<~zy6b1^s#`~y9n7DM;xaQPjZ1PSfc{(a~_(~q`t5+_6a8w0v
zky~oi@p#9h2H>%ff9Jaf2se3&pK=D-oc8T}e#1j)6a&ZhY0{bYo*1ocE>o+p9!FiT
z98{GKnTJhZUdbSeR@cs08CkI0(p;OY`c1FfWVl|6WfTRv&gJyI&uo=VZy*9TO$E?i
zzR|u4x4%Zori~Df3Y<=j()%IHk&KEBHjT56du!<_Cr9?HI-X!R6O?7Wh5q?f31uLl
zGyAUBKFVyb{^Y@ue`d7{(WWcplsi|bvERhxW)$K+&f6c3Vtsd#UV?rE;aEwRT-ix;
zP)l8AUvJDgf-)Pgvesepnk;F%R6_QDEZ6UXvJ24Nqrj)$RUUb5^Q=p5Ip^~$i<J<X
z_}}T9K=q7`R=ol@u+mt|?-ISY0z@tB`QPXL8pkUmQGV6f*n;;~c<hdv9ilG%A6VE;
zzjkfLSmRW)5j=`VK)2wR{4nVq<{S6!vsG;vJS%~@v+{7XbpZTg4?hsQLZwM#e^X!o
z>D}r%as3U6LjY+5DTXAJAiXcO>0sAvqE!x}YcS>AXy-S-0yGC@L#XvMD*dV+^?dfB
zKHxra#kTohTzR@Q?hEYp0XfHB)hFpMl}GkbcToN(rNjX6$5xq^=5V3LdE=Fu_l^DY
z-g~xsCP(#^78O@(H_1T5#@+N)h-UEgF;dmipqkp~@r`cz%_Z7&P)If-<$<q2*NTBU
zlFSVT+}fqbu=>vODx?#<&UoVuIla%?%syZ9BbI<>&mjRZ@I4AzgYr47$9}p`@Y$cP
zQuA2)G5*NynS72^a3drCGLWmwzWThNEXqF8?NP3ZM~<}`Vw&Q9Y0m&_K57u9%iiZB
zn{}akbOB=pjvAfS%~qwO;-Yf<56<y-9aYNNkswIGkw-?#Txvr(6aE63eZzcwhfT*K
z@6Fl#!^GGDj6LhPEHB=z%|kL)bMjVQ>d*DEm*zcAs;&cxek!-_>aR8z|8p1_@4xmW
zvdM))&9@<i=g!DxIQr@vdu&GB<{Lij(>)<8Mx=sA0vQkPNn;~&kC{c6vR~J2Xaycg
zIZMaITssnTA#;>O5Kc=Us!vmS;bqK3q-<&T0d#@osI%v~Ul&l2zN5>n)L70SH#Wcw
zplg7@w-j}@5#}Xkj2+tH1hNr_S*?JZab@9fdy(hF7%AHR2h^?KReeS)H5DBThIhqo
zW%ayLd5=SI3CcfkWz1pS-XLaD7+${+P+<m^gONS$R!}<frRPfvuDmqV+BH|x`?q-A
zdMv^9`843GYCG;<wgKjN4~%+l1xFPPqR;Eob5%^@Bl^92S3JsaszY+~o>C_|VY_5n
zDPPMNnyqD}g`S%fVmT>W(Mkn+u3su8A6NzS7%5?nbrMNTwbs^u(eRGijcz8c^;B?B
zQ700KhfOH7h=ths(jbNVD#qmpF6Tg+QV>h!d}BGLP^isyNhHiY4{hF9C))q;<Q98Y
z{kpJ+Md`yXO=~R!cn@-F7qT8~8rChS5;KJcD%dgSw_|OY;o61z!gZIKT^b(|j5b-s
zf+4NB>SW>s`+P)`UL4<2lzynmn9l|+*4-H;)YwX)rOsQ_IMsFyPaNV@2h&WvRl1;l
z75{`#m|IIy|2V^wfj5VH-Sp2{0HRq9<Ht<$7u&BBGWKJ3h@$p3yqU+Wr~#~E=Di|_
zRiBUg`w&<Iw11oYrp4gN!Pw_X_{4wwEu_X+%cs15*$2%Ug_-A|%rfm)J`1)|T9o89
zQOJvb`e>MtSA;n;Rhed6<mPb47na*2TCCOO$|ceT7{kH82eQG92$t60t7kzI;Ge2L
z)2<_@-CZI}cMUoetqP1;=C>5WWWeqQ%)0Ye*HSX@G8zQ8TK|}tLE1B)>gk5RuDA;I
z6fW5O3gN^09Wt7l=X+UDK2?Z-rZ+*o&xfs;jU;tV&{k1Y`=R#PKgcx4$KPLF#o(vb
zomNy5=TR>^CnRg)t*alPbHvKll2$>?Hq)WzQT#WsgFOrO0AG$y$mDx1Y3x(K8ZRqq
zvn0Yvn%BU-cf_4%oHLXUXv-dXYod*0bOT>renm^sF6W#hwX9hD6fCLrZ<jJ*Sku=a
zF4j{MUDVR}YdKUv^fbcdPlQd}AuTZVnC2h>KLe?_edmWZp1obugHR83^%X)F^A!^q
zX_%kj+~>>4-79=^eT|>s3?Wmr)wgbmU9c5}5v5D#?~hWXj0Jl5$atM}Yt}3;8ftVZ
zH`O$YFG)CA7O%gfM@6mh-P=X%<jm5pA`?8LHGZ&oZWFd5|8){JaSJgm0f>)2izQu*
zWlezQZM=I6Ze)<AUjq*GbfPsMjiP{!g!^wk-mZS+a5mZ(ZS0?*>sB7^UZ2+t=n!%u
z-yVKZ?&Z+KZ>c5O*kcdsNGJCgq^{t2_wxoF{_u@!Ci=cV^Ed0@!|9&Wr#h!Rsg;^v
zA*csrH~w0@8WuZaJUbbfJn$zIZz^vxogl%+vKz_?g$qZsAM(}d^+ndpX_ZCnE7|8M
zH~?%~k3vx;&U=d??P=~1RvXx&qN)rpOOC=IOtf!#nt3w*-t3^E<x?%BT6@dg2m3Ce
zzp%bIxM{{lFIYbFMH7oZeFYvP;rqc$weY$1{<5?}DB#P&5A&|Nd-ou7X7gLlX)r^Q
z!vbtTK@{yM5O#)Ee|~H=6PV)CkbaOnUE;SDp_ujsW!TvNX|%c+@T=P*$>lnKIFWr6
z3g0!C9dLoqvUv5FNoC*S*A8{hnrkEkU&bG^E>8ZwbnAY~o|Y0bVc4A~O@haNOu{)C
z@RQ~grg~ZY-GdOoFXug|n1)Wfih6Ko9B&%aFJrSgshk#`dVga^Sk8QxaO!6=O<4G8
z$zic=)^O_yL8v(Iq414RD-lKDbE3}dKUBuz+kdjfY;x%!Sc~7|o095&)jg52OHASk
zgsT_jh}|NT!nuZrimj!<duiDfWKv2W+4A^k8?!VL78-2D7F=ZmrC*_XN%@boIh%8_
z&@BPjhc$eI0cbKi|D;ACcAa*~0xQC_nrweE(Vj!=&MOE+@fX{@l5|~w%Vjy`>R1I%
zf6=vq1-8V+E2-s|m{D_KCA;l8B+R|A6C{1JDdbcydTFm;w4#@dd++ZMg3AG_+l{o-
zA|zwNvQ{klEJQ7VF~OVHR8di>vw?X}VG0lDf&Jo<P@HgiytVRiiA4KgPb)DnX3n?;
zHrxCcDtO|Fmd=@rBleoAX~p{YD(wXE$DF-nFjM#CsU#$4!8L$PsBGb{Pt%DUBt9F=
zK@FozZklpjIN_l5M!nt%>)t79#W?w%T`DBzMfc0GP|CZP!lRd3jRVG_kyA2EZ@e0a
z&tcU-xqNR+;bE~+!s0viwUQthR0Q}9Bo)&w^IvO}GVyPI=MScrqf>$TlbB<Hn9V4v
z3{o~JDA8d#a}+QO%XCNp@3ds}N7bfs-8F1SZn`oS=yABMfUq6&0JGPA?B>@eJg`|Z
z#kf`7)6KQF;dAPNEgN~&+P?^~Kdq2z6jJ%={gZ)wKHp-~BxRW<%T_juJMe-L7O+kH
zL~=&tn;&OK`pH}>B=S((srOLXPkdB)+5AQ*#r)Da=z;++(Ds7|o%@}Z-185j08J2#
z9`iF&4-sF`>RT!6MZ8;Pw+rRR>t(LOQj1;qg}}$;R)^?|_i-j-Hdn}MFxuz#gfJ1@
zy2|)wTcL%&v3;>~#F|I`?Nxic>Gj_Ai64?;I_kOdsqV?dZ3F2It-c%C$m-4Ro^CAQ
zOYdyTsR56?SLJV(L(EXcY*(_4(ULqEIfxody{LTOp_f1c-S-7_-E(V1(|ZgST;9PE
zb#1PzOF;=>A>1nCXBdic9M}%Hh+_peusIoA*2`j#IU9&CtOO5c79AlpfuA74g?*Jz
z88fE+{Q2pHJVZ)qc9UK_xqC^JV{(mD^e(5X^Zwnx!TIT{`Wi}Obx{4RruO<OLLyA1
zo>MIi8wf|3hz7geP0KLJ*?~XpxAh{;TYp@}qi#^v#Ah0X5LNS2)SWwSyft>(sA*)z
zN=Y;VS#6~j+-c1ySZFesbEf6fD)0B1oYi>$+4s+y!5-l+*jPrF_f<Ss$ouUS<m6rC
z!d7jTg<g0w@P-gj<Hqx8>ZHofL}L!Xdbm-DBUDVqZ~d-?jc21jZGYYr!}E>a>1x(0
zWXg-9bHp-v-F9qiC~$lat5;3t+&gT&O46O^kRDKBZnDpHk|4MNU4upbV^LY>N1b{<
zCxJN~1~ztui9Q`y%}n+O=7aam2(W2!&4eP+C&La$EH@+EVGG!WzQ$Y3ijZ_A@a{<b
zG}ua)9d(_04DX>Z7H@Wv#iOr1^<W(j3tgXlgGzRHOD=8ytW|3C@Cpa&&-2gJ;Hxwd
zPA`V70%L9$-*GT<>^~{Tg9-qudMyN-d+ce<B2MI#U-~==qNc0NT<hN_tnS6_a|GI4
zg8h3DD@&@yaRh9YGvASgG?|CfRpoCHyO@1ivat-H>eSYVVqpc{7~F}v;f0Jvds$AX
zjmn|p!{D8s!JL6dM5C~=rx)w6J5fHXK>;H&JlQB=hsisfWxqnTA7Fw@Ftf2K8kV94
zTehOO3!o-GVyw9@#Ko_tdjPo|YI98N`<TgJ{Pm|+8Yq2GsS3laL34D=PtUf=#>#$x
z#q~?&_(IYJ&<g>@62$+dGRHx&ZI=%Irak^Ef$omVr&d^9r*Egg`QJ&fbC0n9GP}yK
zRNedpI}?)FX`sR+?7Zfp;g<00g!u+ys1%g2EizxJ$!DKDS~n^0+C%SsJ}e<}rvkt9
zB#@Lg7%WkD+D0!}TEG{aGUb^Dro$vC0f<(#DHZb(W2%{W8%V|P+X6qd$*2`F6zY&|
zG$SajPJhu1@#mK_A1!dDU#WVPy3e9YB~n&TX0!w=_FO_v7sIJyr<FTCoG8uZ!K;1r
zj#_q>%~w&$3fHcb!(9#(z>Otgqil(^i3zuv(QKOj<=8A$w-8HwBMBvHt0+&$*=DEO
zD05A+4_TZ9W>-{;R&{IYt7M^-j{ewIXpWHXJ0gQA?3FOUTnt4>M;et+=a^jqZoCpQ
zvWCd~I+*A&ysXL&w#i0S`VFS)IsTqg?_Dw9fiUHhfn3}^`!5O9W)Xo>p`LC74K2U&
zXgjrWqDSxHx3{l?r>lmI&&zapuaodQ#`qP8nk_W5V&0XzPq1Es6H(ywz;hHcU)`9E
zccKXAg5L?2TM3?GCmX|zvWiahJ06{+?Pa~fnnzj<TBo}x!g7cC@QmadYgojYz9NK&
zFN5X{Gkq@Wr7e59wXmd0bI-Fo%BX5ix$DvwZM-t=pN-`mJ28m1nXL?PB{=>W{l9RW
zR+Y1nO97&6KiykDmrBvrJ?I8f+XgE-hw^tDkHWZyDxR(-6T2md-ie*V*mkB8t&oi0
z@?CiqyW`}+vM}`nD(IkU%L2y9uom%Qi-lI-A^s3%z}$`UyLWb>?Y0#I6~Jtho4wWc
z(MX>}`;f^MA~;*#%`(pUg?#>v&X`$ivzBN3tTB2y<<|%u3{vhn1Jf8sI91JmO4qCO
zfZ|)<!<&Ohs0veG<ddRM(7vO?2vo!hSq#jJFDpwz14E@~<{8;E{mO;%RT2(8=9p(F
z062*xY`0osUc<rn?&Y##6^{+ObpGhAvHExBgTona`7NJ$cjNYFJu1rkATV~-_$>-r
zMPIm7Up2pGWhi24e_BDmvn4DqLXx3DSTzSMhJhZel(RyXv%LBGYNrp#%b<$}GmGf+
zz@MiN(h<2L3HvORh%2oY8xA(MZR?*ZG_h6z+q&5i0vCSG*Q31hZ&MqLxBN7WSj$=`
z%gS}%8POKGQn-FQ0YqKklnRKr%%~M#f^mjfVey?(pBmd(j-yWr-wxDm#Z}F4Mua47
z{sDERwa%B`^g2s@-+J^dN)VJcNTXC=q-<T$<~KEDUiz-;rUW-h>xQ?mmSC%J#l?P9
z{7Z*Yv6d^O>bNO1$Aw|jC3KZ*VS-&Z)|sKG9gFp945wKn+vcb;?<7{?z<j?<q^eI;
zZRF2_3ZQ~7;O4H8Ocl@A<75o~L}V}RO*lZ#jjRZS6L-u`>5T?VN?n0pSL<!X#Zv!r
zU=$UVr{(BkI8E3wrdBG4=KbR>9+vDi(Q^Rwly@|~yzS=6qq1z->*h3*YA=&AiX1CG
z)>IZ$(`6XR+>amW<a>OychB8Euy=pYMZCVXWkYHk1fLMuWysX^Nl}~vDa!oUP;+as
zi#)Ho=1mOH=jLkk%97HYEpTmuf`8&_ZBDJcOkHiF>hAi%+b!MKL(wLs6`xpx?KB$G
zW_C35iZ(|4mqs=o+!q3A3QQiWfM^0z<lkgL7)dlxa7}*3)NMUxk+JohF{_PUVo2P8
z9q!oD2cN8Bjpl8Rfk6*7g<sz<JbTYf%%nxghQFy6%dCp~yr=@85S0Ekllmh*>DOei
z#1~0aD$gNy5-Np4VyUd?ermTAovUn|))osivTPg?4E5L!XQ?lK-B4_~c!25Py#L}J
z7w(s|40QK@r~LP>hCb4ELhL^W>d1Oya%)Ue`43qAda0+%N@c}7#g{r4r#eA4S<IFb
zAY^@6RhI46R~P*7&xvHfl|uiFO4Z#RwpxVHPY$U9n<{1MAIc*oe(WJ)qvCi6Y3~4+
zW+M8JQ0<37wEABrjbG6~5q_+4a4|uj>d=7fVD+}by!VV2K{|ySvD1?%jeV$>pu)q^
z@LW)geh}3Z4`~&gqcVs^t&k9|LRNe<yLKJ#4kUKGa(-ECH&<ucDzYj*5pvmL`wrqO
zKa9YV^}`&RXrPmmb7R&1nR4FprjmugD3UCdV>J6>$RijCL8OTZ#_@h-t^lsl-aArB
zln}mHh5FJWYC|O7moNO%XoCG&N?>lI)r#wk(S9LFG<lt<Wcc-)AU5-TqFpF~S}xy*
z4F&Lr=xZH*b9scsf${;ZZz(DqpIETtAl%fyUB<aJve7Wi)thl1g!^yQCrJ_|x~-F>
z9I+y<SQ<arj@~RX$zMIOoOn=6HSnemQq2E(9}JS#sTJ(Kjh&$5eWv5A44<=!cugiR
zjXj{A$d-pj@2rnLB-#(iaFt|mDb0vT6)>hp6mRDT$akwtehdH!o6(Us&vwaaOl(-?
z81}!niF{{HZV4o!i4lGCHYlS~UDakfF+^k~vVUY*efnHnE|&9lhGQk((xr^S0k_sW
zDr4%$=b#LwIrW;kqdMM_;4bBnvIR5si5(~$_&W!1_H%2$`h08V(Uz`KAJcszS~i_o
z^@(Mv(eZ=Xu!&M7(7A0$ok`Z6hf<puoPKq{VuVDu1%HqPe=z5c-k<T&I0~K1P}q77
zRqma=G^R~W66O(yC(T5a;3$!61y51k>dnw=0UrBm8BV)$BYENrnAE1(@Oy_zQ2rA4
zrS+~UP<oyFk`CBg`)%}cPt7n$%H?u+q{P}s!>2>KL9M6D-TS?R6TnjBH@=DCp<!^T
zh=6`p_ZZ#sX+zR>W+K`qp@~Sl3pE?^F{ml>v1mHN^C#7)1c+s0>HUe?gaP|U!tE2k
zK2a&G6F{waP|6Gp5-SZdZGB`PmVEro$h$>B7bU-=Wlsc<0zrHB(Sjz8ye@6ar*6Z5
z>b77(8{;TYenG~REa|0<N$ZM9Dg+rn0Q5aU83$3M?uhSV6=a>Cq5c;Je!AKM`{zK!
zx1-ECX*2Nsp;GI`cN}1OzNL-#cL<I0*f_`MovJ-%-^kW1%PI6_HCYlvgfA?P`D-Te
z;=%YE=4{e1LCM#L97~5y=Tvg;Su(1io4#ffn|zn=t%Qj>zZWoqWr@$j*)0h@`byTZ
z3Iz3jVTsmA?y7o$?hUraxspm_mlEKOcDlESlF*XQ-=8ddS-hOSXYz4Ar!=o4jINdy
zjf5~@f9T4o(%&oBjhD$<=I8j5uC2Drz2oE~I8+6ltP&o;6%wf17(vTM%PMA@DG&%g
zobW$cQlA<^5A`e%!CC4M9My9vx*tI{Q2<+aH8YS5eFhlyt^K4k!KC`(>0$`>`6s-l
z{~!6Vf3`mTdkJ3u(0%`rIS-AX#`76MPGA2*KY-xT#$%@A-acrkq{)&xnX3zaby3Gb
zygVt5gP&MDnN6NN)6hnTj_{}@vfXj!o!@CVt_1*ggV@I^oWp&IXq4sI1D*W4%3NZM
zOzXK}sYcn(uS0)+*)=h?&Qq-8GTob|o%`*)EuV}=8%>*h{~?XE{#%J@4=>Qf_*2=U
zU&mRX?k%vi-TW0%2vZ)vhW)hqxeHd!Hd^UhqvqiTc)La!c30K?v~|OFdP5lS$2j{B
zni!?FPQx}Nq*3`z9~e1LqPNU(``|@Xi{^#zzi%MmflAqo*k^yi4Il`0*QH@tioVA=
z_MNJ=PHO*wR{Dk&Wn!y2(IiMlUb?o0RgV{-w4}KI6Ts!J1Sb-MgCyfb2pZ752*DZS
z4895|@4ZsWslnNpTe@VaT~~iQ6D@4@&U`eVEc0<G@R%p1C5ohU2cRF+v$%#_D=xQ+
zpu@m4Uy{bBna5`yWF<chTGqc$PUIJu+k#aQ;UQ>@dIfMSPL8eYs8(7xUNO+NdH0=R
z;}*qqTy6ghPbF125%ZDXtHS0iRjeYXS!~wR8Ew2=l(Qkql%>TwL9g`&Cwt4+3R0FW
zDq5RI$*6MRp-lnsg$KoU)vhu2#>$tE_QuO^#dhb!&{f*{Kn>7C_OV7v%NAOFUc|0Q
zGP;aqJ|AY;cS2M8Z9GLKvZC7)x1#>vlfhV&%eG(}>e5t!LjzCr4;#I9^m8S?=Qlmy
z$=2=Yqo}}k`^rAC5VY_)aG5`kc~Es>+<WZhPM3<IJgdecf1e6al_6E#gCe-}(!=>#
zpIf4N^1kfNHP~2{N;rg+7BgAD8$J^(>Nt+(cnWMbDg4@s%vs1R@J4JGLX6OzkLJ^#
z4y<fXQDr-BU>LEviH^{E)c3_y8iLtvx5{XGDvUeR1usf%cA>RWka8nKOj4Z*-%XPk
zu1IPO%-LK$>f{mf0M&57r_NHYJU`6pEj<O+a;@3~#Oqd#d;`)p4JGReyt9dk(AA^S
z1SN1U^PfrnsF%MD`3z_pmowZOT^Zsa!+X?4N5M4eQ~xy^-u$%{Ev+b-$J5G31W5*J
zp;Dn#|9E1xDF@On3oq-^k;vNxnbg2fA8t%Ksg(~Bji!zWIQ$b`ZAt%Uzl_a%X$m7?
zOBzUR1c9;oX-B3IJNCF{BX2s8EH%oO{kI!QQuw6$d!BW(AB5Tt3hy1<@$dB0;vh-P
zaYkDX^Lp>;yvcSpVoxV_!a1#A{}8JOnW|xR7lLC#wj0q_L!&CS{>)a`DfP<bTO>VA
z+2V)#=g5t12@9I+1!eXxgmp)NXPO$ek1SWYc0=GAR`HQar*!uLq@N%xUGKu5=XNs7
zq(6|xd<$Ef^Gd8^De_8Wf84UpeTa1eev)<VdJYL0{$b~|d^09o8O7LdKaj&M2>-6H
zJ@&a846VKfn>4J(<z7SIbh(OZPS4s!E|o>jt+K`G<s`J%<p0T>Hk5Q?qHN?A?}^4=
z2HG)1VOS^F-5eF+&J6gPPrNZzZ+deV=BqIH$Kxl?_;?OAHsSX+vU*sDttE5ymWH*o
zPp$oC+vhAZp--f@AAiT@8j7UZ>*krvY9}7zb20NuAY0fGZKjzB^u#1tCvkP1ny7*p
z(y$rORU{tqhyeoY&6j2Bd5}ymHA3bT_G;-Kj<0G2NCywvQy{9!(TVb!I#2wgz6iQM
z3gVvJay?aCMr6j$2Eoq`xQwmAF5BSuxzP;mk<5P-7NO1d>PiGLsXUrw^M<c`YQ5z&
zYlVcc7X#ItKAB!LBa6f&jxXWkRRS9EcZ9)uNQ52(XT=oaH>xyOAt|fRH-%wkb|9dI
zh!^mo3W4+OkB|#V$!#!Qt;jk9vsbRNO`klw;fD{54uZd}Wjq6T^7*io2!7I_hxDcs
zJ6nO*JPsYx4Cg^)fU7rbtN7J^0Zc?23KFL3t)M>dz7V^D=FRfGw{#PG=4?xIfZ?F=
zhxmryJl~+IK<DbpFFa>i<n<P>rs-D>?-?D|zkf%Y<#vJrKM6(Yyqjsa`Ozv!(9YIP
z91q5VcaXrFP5$+v?iAg|qD(fDZafX^7T{S7-}=s3qnuggltKr2y*k38b2KU@uAi#5
zf47ec(?}O1(L?&@$c=EA22TJiDN3?ohTF`*tTShuAY<Jel;MB=vTsjrYY1thQz|th
zNt**4>;O(~D5t87p$!VPjG{guv%}HSz~S3&M-|n&6TGc4koUZ?F?STyc8&#GsArr@
zKg)!EQ?F>&IW!rJ{VTaSQNnJL;-Y!qJxHr(HV;{dEZ?YA;)%oOINc#U;n<ZXMj|n!
z+O|1=+H&ximb5U>NduM90zAxvB<3v!e0wBTXH-ii_Y{pE)oT3144mG`LXc+yGM?8}
zEN8<~-hb$t<~8bKa+w~Gqdjc&Hl1&xT>i*&Dziz66I@TG66%Zm8(v;_>N)+*K@#j+
z(4!Xm9I?wdJiZf%z!)QNlx*pAF*gQf=rd?DaeB6!0yIYjLf+O+$u%6zL3$^HX<w#o
zRi~d|Ifdi%TE`0Y9WtiTRIJucyxvi~{z0t%>3fF^$ZH#6pZ0>zSiP+!2ra>*uc#{X
zY~`gbZgm^lc(9oW%o|MQOSv~$P9OwK(|-g@U$N+n&f7U!H8aSYCyi{w=fv@w@fKIv
zKgalLe9dG0G@pBL8;+wd9|t~GYFra(X7XDp<|9-l0oujZ+ngLdtBSTQg~UhJE+oj)
zAV0zH#E~9rYcf^4_exgla#dgG!K;MgjhD}2g^pT#bhROyqwP$E4Ti3$V{qDmlX;|c
z#+XJk-l}mimXO?&@8dUVjosFpdB5;0OyD7-3hIM|=0^rC(*_ku(9WGBMhR0!twU#>
zQ?4Q|PlMy~xNvffq58NuWeh<+-?jV34Eq;3s?9{KC|{XK0-PipxNceF?5x1~j09|a
z8Wl=2&Ni9J;LSJnm|H&dvc#pjb7!Q1g2c>j4Z@l%Z2F|d_63#w3$7+B2#cX2$QB}W
zCLEqAdJtIb2;`V>A>$6bKOxByO#5YDpkp&~LWvWuJ^<1&I8biv>XD)u-hye}=v&eB
zI@!rsW8+ihlrKVu=Wdef$UMN)YY3^#lkOJZMaxC%#>jrpf))X2At@E9nuN&Q4@s{E
zUfe?xNHcaui!V_HH1k;q;$WP0Nd<=TtV<Y8&}>LzObWSj4mA@@$~Q`}&?(BHUdL4=
zd?(ew)3BP=E0yJv^_sQ#Q3hdOhJvi`8IH3ce2y1Jcd4=~nS6#DPryWMJ40c^sz2mu
zuows{?Hm<-B20^eQq)pVZ&L=yk)O!ywWscuDZ@FyuUd>GWQD)|%Q=N<P?o~|yAbsq
z8G!D>2pXx7uciV+LhGL*oO5%(B|Ewa+)+_aw{*{f5bL?mZ+~g<G&E3lejBBOiIfY}
zufkhOx^h`gg!GI%;A#^_OS<#0t>g5_(f4P6Dab^5*$4ncs&@2IteFOB4DZyHlYO(}
z)M@MzbGW8K3!&u6v<6n7s4yrEEly66<y;}B%=2!lQt`!S0fg3={1%H9?RvZlZ)ZU1
z(^t$nSJElbP5CUudHg;wo{xv2)odl^-7I6CBx7C553m<kAw;D`3;mVM&9Tj5T(0}{
z)g+T$_P9J&`!HU2bK0pIjTROqj%GsBxDOH_$V`~qt*;Lo`_2y8<1mKgLu+IiKB{pM
zF+JM`!%)<)lK!cc>dk=qQ~IJoS5D+R<VzzgPP-uGrE7ZzQ{xz!KtYqexC<O?#NG0C
zTttzgXm!gL0r?|hBpo!?0>^P1E8^{w+KCG1Gy(O}VE4zgu&a6ezHc?^{ICF-`B5_s
z)QUapa4iYqe$yA`S>My^ufuTcybXc{5|}+B*4=G+2mSc{F_<`ZosJiG!x*krA5;jD
z(zuOu<~j87dPbHV<5FfuvyoCgtAt}YWr;!G+D5HY#&cvh6$m!v(hp1o-YYgv)t>}V
z<2e}lG>>O#q$L6L)iKpzi#NiLj4Crn!+u`hCmed`W%i25H{JensLIW%l~>4YB}B9*
zpKSdE`-7b#)0(NsH^!u{kS^scfHe3zWAc5n9J?uje%*2|5{yd)C#z;2^I%l3fk4;J
z-d+5YKbU-YKDCaEaF?lnc0CMJuunj<K&Zc5fNp<7))$Y{-<kGT_`VW|Hk2eGG77IF
zb{X88!CqM{AFIEsOc*^_<rcofM#Z996j|BA>AsUtX2ib8>s2Fc9WDdTLV8>ZGYyq7
zBR9T?RmfxPm0RLicRS)wcZg2VGr{(kM~-+6ABVBTPU42g&yM@gS%3_FS<^uca3TG|
zTIe!FF88CPdlPK!6vd3+bY$`4!ptfR9E$rWcDW7SwBp~$F3{5Esw%@cvEm?TsXV*j
zoN8up_u`0kQM9V9TPHgQevbYImV=X{@uYJG8r!bVEAw%iJL0>L-Hffo0SfT57a#fW
z;m>Q^e-tD{+l8ochRgs_ty?sqKh>cOQ9oo!Z^GHF)9u4BM`$-0(Dmh6U6w5lV*`eg
zTa(ViHov|@Xt3yshItctWh3$j)}l#lWx|HN0=AD>4xpRF8dJ#z9#G}kXMa%*Dspp$
zx3hNdPJYu^yq+DB-;2;#{JU3480p*p>1mg;FKP4wB>DdLVLV(Kj!JyNI7)#k0unk_
zwX<+J8>*PkPOK3d8(1^Jfs2Xe7m~;)U1yF+0aW?zyyB2GE^2CX8eH;_2jeT^FxDj%
zBi|xnMJCkifqh<ligj!j7QJrX+|q&ixoE-$JFrK$Iu;#T@D*;vb_m}+`T#8y0Tss<
zJlOlg{8;mQ$*Z1(&(gtRX?yXMyu?N3e&X~<(o@p!_z~y}WSh5W5W7u6YJZy*wfY_s
zx%}b<<EPO%n*Mn+lTi3}z}VLyQ<pX*Lk;H$7Sp|&p^oQb?%NKW@a&>u6o>W?;+piP
zvlXrV!63|}74u><bzUrHA!hbsl47#MCtzUJc_$goEBjIY74$Z3a&nf48aOHjTr!k9
zkR06ZLRRt031^jx7YjYm&xu=NDSKx@;$oE=9jCpSPxvXFX*3fVe_bbUku*N?W9#bD
z-N`F1OdJf0vtsgo3A)h_uQK0Bhr$vEBK-q&I4ESbQe($mZ1ontu!}$5_8dUGXB~jJ
zh-&mdK8w^cnlc%XbCewjV)jVUjpJOvQBt!vcj3EnV>D`kr7fQ@6DBlDTy5sp+9J=5
z!?8vDuA=1NINqrU{+aR_=UxDyUiXy<Vo|IUGilN+HG6Q#>vfxecnvMveq?`RUE*>(
z;@cb}WO<*-5eZ>rb>V|0&M@h1T1X?58mD@sXkpc@3U4fM3%g=#Ygu3j3vVkLlUaPP
zme393pPGZBdEKs47Pp$2nOhqZ;;5?9vaZy&KM%a$B785vEKpFNR6$CvvilZ4Ff&11
zKX*^=*z(CvSKIBoW>5$|We}U3ikxZ2bS53KF-VbW6*76Fo7oaE(CdQRFFs6zI8G8a
zv)`A>kITfTny>gS56=4)&!!Yhxo&1fNCO!SAr-P&QG~#fKur5<Cn|6H$cbbmnc1q5
zw5q_kG!~URO1Ix6CDs`;@;-96te7+P@UMe6KSL<Ok>p;|xULCeRAyLJkwa#iL|zW?
zwN9K`?zO#V^Il9NRfl%2uD-z1rcRm(DxkVTa79?zLia3Y&Upbe{U>B*_JPDLgRd>~
zHNm@|nApL{bRP=5cC=d%5xAJSdqv$UmP3bTq~gs&!`qQff{}RwOI4&cXV4R#F018o
zWHE`RWUn;vNvmgf)$9v2cr{!ut?v^R%%;DjYS>FJ4M*}9aa4;_55MulJcy8S;f~g*
zRw9ll`4~rvgpVwJE7o%x_r~p;j2BNnr-3HZ?g2s0{TjaCMBf1sfJyJ2BPylD$D(W*
zR&US`Qcu?=xltTLCf=|5drm+JKth3lmcmXN)|1I<WX$*%@8?7Wh?Uq+Z0h>{dbO8^
zBB!$OWLb)!=TESZNQrPWl%04azSlW@$V5Qo4(px<l~_71-8ofYcgEnigA1H<oP-Lu
zc>Y|leAr`{FSB9YAC*DGQ4+ZVHd$2l)GxRDaOTm)XDm?PH^E_#u7qxb<8tx2>WiIj
zRy4+EU*mj@&aYCjg;xV5Uh=qR_yu}`S>)fm_G2bQ?@JZX@f)K-nCKrY<S!b(j~q>$
zGUIv*ljwos51}KfA#O-sx3^ZyT5)wt2_G7z6t%5NtmZSd=dls8Y~zLd;uFLq2pqBS
z>l?8dkl|Jcr5vT6Ti}PIEtMtfVJ0B@QlYD!l^c!SB84*OB-i@7Ve?yra&ho}Ep4fe
zN}yedL&uV-NA=W0#ierc!7V0rvTCP|&ZxavpPq^hf5I&?_f4O1I^^XuCP17E`n{&2
zXpNEf5wy_L%t8S$K}<*tc_ltT|3<*IpU|pYr~ll9Ka)5oUl)B)&L`59I+fs>AcHmm
z-_?=}tLKd)M-KHDk&8ePz(vXW=0?Mq!CnF)o$SZPyn#|yBrLI%r9<vg+ON>Wk+U@%
zI6n*I(MUwo5Z(T4sZBE+3ycq`r)Xz~IC=^`Y$41m4gHj4AItPJIy*(<Qm5HSqVhiO
zS4GFXqKvj4|Ft!1=j?QdV>I;MonVwe%!<a)v`)6!F=-_KZ}E;^3?#AZo08>}ine_E
zXzF+kS3g(_)1F_ICIlj{Evm1X{war)n2)$#sK2E457N~mrp~^#fbv}xq9l9G*)^Vp
zEId*6DaXY;_6*<?61F2y`0Z%_w*K64@obQ<`Q>6K-~}Vw=eZL|GOgxxG{)()kmd0{
zE$rpl;$<s-<aD_?=po=x&UUFG9OH4>fr*drf+gzWg0@hS5{t&|oFi2U2NPC&=9uJD
z2QVJlm3w@KJfRyr+;e*E$4Lax#B9tpSZyKrZ?^k1-4zktU3NUuM}Op(S220*v3h5{
zcGbVY%*;F_@p6#xk<axuf1Oke(+Gd_;raW*RxLi{zBy-STw}efk@Oajb3eMLAHC`|
z#W&I8wkoz~fgh4MB50^JNPq0%f3%?0;m_a$<r|}ZzMJ_8NE!O&wa2?QetJ~BHDYjw
zLE^v7j2!$5j+ppl#b;u5#c!|tZcxxSu{hfw!KBK*uq7nK`_%WEc;zLT^Sn6VS&(aH
zPLC+&?~R?X@+0<YpQF#~Fa0SWUVQJA|4VJ+T!c1bEMAt%^EOU`cKdU~nX?QfmTN=f
z<=o3H4+TZRf;X4OSp4TKN)@T-94@cQP-n3UePBFB!2WBFHzkayTG!Xtm-{B`TN@H2
zK1&1YDPz;i%ioV$j=c`8#3cswq0G8A*Vo+3+B!~pq0Y_1I3c2OWD(@L3w?&35XSZl
zu?wfj$jFP)H9y~nZ;)rnkmj|={)Kv<o9o$yJ%Hy#_3ah(`Z6xymL?%VWx>!@Fm++g
zgAoj}-xy5;ka*uJMfBZCXNrjQ@I4NitT@j^*)BK5ed{7?Y4xc(^qjBO8QF~@+CMxk
zD&9Me+4jZ!I~(TY^*=Q<j8kYQ0ZwutUNwPX#SJGyED5iH=&%5om|>PWm+N%uijc%E
z`+}*ihPEl4tDIk4QLmfnmd)gF>%0iozn-Kk6eWlDYXzps7n9VS)9=u+^WALa8%Oqw
zIEr;{aOHLNz=Uo_gjuz!n^7|V3n<+gqN7FJnlTO~tt!*ih*$^y*xGU%9ktyIf$F8`
zy8EfTt}Oj*YRv5pdM4XyaUI&J4(6dAnL=FHu-wZhs}nXNP8*V2nS3rv@v~)S6nFkI
zP>`cD&f=dkKTkTr?M7SakG=B^ebc7y@$kVvWhsp>8?OKW7WJ>m3R<@#FKOa}$(-J#
zNSj^VQn|COMHz4gC`9&sOkQ9p2CNNVwD8zlcAV}k7vl9@0k&LgktSNtxD{~3wo!<d
zbQ?3J^PS@$?HoKTM3%CqT5(5zLZHMbYc3xL*=g;cq*C0DQnn`MU5Y%=be!!R_4}E3
z`ni8wy!KzmQg3aXoX+kj+p;fZV|K407znDoenAtg_x#`WP#Vh#j7RhTniVCB-71?p
zJqa(_q?lNl&L|nQb)LNa9C(ewA@B04rKYNePhXf412RK;-P2VTHmt1}{-s&g8-tCD
zPlZ5kKJbaVIeL8b9tCf>N0Nk3+fu6S(Zsj!XD)<>#+l8#Gv)wdCZ=NeGYUA9rsXkY
zw%WEiDln#*LSAf77U!iEa=VGYFX&$Smc>$gf7I5iS-e3J^FCTg<ybvTFM~{1#vaof
zgVt3nL2@ZJ+rhAAfQjFJUog3|!xYN!nr$jxlS=~_Qmc}_ClgUxPr$=kUcD^v`03FJ
zAHQQ3%Z+Vd*}r7Bztd>Uw!_$$Ny5T>?)yz*NCV<&g9zj@B77sP_Vi;`S|MJAgq4p1
z18(|YnVM<!#gho0Q3cUQJZ-Ej4G)vQaZL7MQ070?h98^gca0hr;3m23C2J)+WXSx(
zWTOeGBnl&vJr5p{QEv!Ce5#N6BFOi72&dexBC6EksIz~5@S6~2;(e<ic-Fn=V!xCk
zgCpF0+dcA6;_P<H<6Pj?KX3fMO4ahH;5$>aTvRAXpXYAs?N8Ee_!EELPa>`ljB%c{
z&-eU$gZaPjJby(Acgr$jUVpq&8#SN2``2B|?QSSASw#IKqrZ2k|N9nLy&e21kfbx_
zCr10r?-j?VDzWu|i&DsC%*)jT<5pNzbbknf|MTA3c*M>~*ud6brei7K?$s*41>P5r
znJCV){)LX@{t)kw5b@jIVu4mef(19Pn^ym1VlXHlCsSZ%Oz?SzdT0*?W&U#c;dJA)
z)t)+IjD;PxUD0v*2}9t@^5m&fhvWRO$Bd7o=QC|Co@4uNs^k<XYBrJlJJ0xEKmOK<
z0vE12w%Ye3<8jX5g(2X2Az-8<;Q9#8cVoR>+2_P&{dS(6WMhZf{QSna7Qf@gr$dS-
zX>+~(QBB}_X+kAs?5v#BleCIA=d)g`IXlGbhWFs`aF^oc)(N?fyyNbOlrl|rHrZ=|
zcfEi7MCWdS(LXK5|51gFgX65V<LL+m<gq$_(so_EHn?Vyvla8Q&A8a=K`QvbdQ<Pe
zstw0cVK5RZa3;{-Pknn!EzrZ+ehrtwVfQCx?XrsANAdrhg1>{f&`J81SARg28CF*9
zuJmg(5sR1GGl}#`GJu73!LI=?aBxGW)9oBgoq0VHj4f^MY#q=p1M}5ob~wJPK@;Gg
zPn|kzSKjt3e#E3N+ZK1t9*?M5#3#PDjn5dvJ1nKD5jRhp6EHMccQb9CO_@B-Pz%Vt
ze*^bRnf5ebglMtJh+i%WZTs>qOmPA64(dUo%@hK2gZ>Bn5aXQ)+{6*H?A4o(M)-i1
zm?Y`ofaj|Y$Ne)US$0}YT8-Pkuvrv!#O-*-1b5TEE8jmreY-y(__nili~6oy@Wte`
z{b<hjH{p*q`vr6ouPYy4hY1z9zx<TJ%Lv3}=Jd$x@WBUuvE7e;3p>I08O7k$9~6Uo
zWX@I3*RAgMd$-{UA9?-bJ02cl=0GTuE3<=@oa>%LE}#FO$5Ieoe+s^HVZ{wmE-H$#
zY2~)}%>*fzt61X`sIfg*BsAutk1-$^Y~LKd@EfWR<-21O>7f9cu^g~Lo`Q2s)}Qyh
z_BRG-Qbyn&*C3BP*?Z>`Id^>7dl&PKZQ%x<3%s|B)V6<4=QA8kM~b3gp84Sl3Uzrn
zX*`>`=}3Exh{4Ll^(Ig`V*Tj^BSE_8_HdzHC{J;FT%#jGl-z&FiImUPv}<98l@*Q>
z>vHZ%y;8d?u3*5%AZ5gC915+q>fvdBj&g__nTwMz+&?+0W`I4fxAV_?UU%E_w{Q>l
z_sy_jbZ+J;Dk`$sIwi#~@mp`+5DYY~{jt?#Ss9+cAOAr9|1vVee~pYoz#iG!IjB+J
zIj#NMH+qui*Pf5z#nwqD-s8}V(3d?(t=CNdTlh{_p0R=$$i`2Dd+8IP&uM!C2&DcD
zCVkOj7)_C-Y<^*I_+ye&Gvd3~6?%6n5;I$Nb)CsSlJhh~>a=xwbT=H}`#LbE>4v=l
z2)f`5xWPCMIQ{A-t8jr4a3g3i>wmo5jJA%!<top_lqJ3J%Etv(25)6={mRhW8(|$o
zv(+0{f&U5BFvw2{C$7A_+;8HqACsKnu#MsW;&D6fK}wY+tyHS&c`*cG<jDrwsI#*1
zec2EstBO_qFmF3ov2^I@>L7Uo%f1`~Ns|7ynr(ILAX?bfFM8!mH<E*2Jw%&tr{{sT
zS>}BRayZ@C>!^SupO1L<O(R0QlOA;Wdq2FUz<TZSGqY8xSsa<3r^|u^2fBrtb=s`1
z9QiMiWM|dK_ZoK7vz!JQPkKP9diPfIUlWeEX|d7=(<3%^F#}~|=q5oSJ?jQ-CWA6c
z>-mdL<vl7GZxWF;T<1~}tAeh1iI+7jq5N1(i!NT5?tKVjb#4*es2_a~gJag`W+%%u
zLF!YpRZEwZfy&Q^b~09$2349sDb^JmF0Vh!2E~HMu+Gk&St5S*o0BtSuveX3*q1X(
zeL>H#cO2xP@m3IXB@UM?+5j!IMqUw(XNTdFwp{Tk{4(6SMkl6cvxpY`xK4_tw<P2L
z4UbpgADf%9hf55{=evy`BX~|(-EzJAv)zE^!`<j!yQ3Ot6%5vRa8XcT?8kR*Zb3<Y
zfFZp3&(ZbKJQEj_D#{D|M*-(VkLP2BndRy3voN&xgeI1@mVa9Gcr{xZ5<b2A@Z(6m
zT7Rgw*OcLr(*nTe$qhrZeaFcMs6cCsu+uZG3m-{=rta-*vo2jig*p;jzvJN*f&0nI
z_?q8D+{9k<OYxedjtxprPr>E73c%%J+p4O=p4#E5lyjp6kaM@vv5ANmVC~#<-KyDP
zkCyXzwFmGa{qa7f2*;CI%&%UU_pnUh-<l{+UJW6l%1Y}M%)5vvHrweq^c<Go;cj<P
z0O)~Rox@}3(XWHG_PNhD5hrVvXF#%3E}Nkgs<X59L9rGJ;%#9ACMkA9gmNZFz@_#3
z4}Ix&=+Q(@wl63-^o4nOeIW?HJ$pC$0`P9cM&{|0E9WZKHfwI;;Pm_948P><YvyS+
z_H6XJkm;Jk)s*Vhy0`Z9)@<z8FL1yu)``}7mj})Nv4o**rbRWbO2rcKJ96Ze1nM<@
z5p^V*OO<*9!`v?xWRqNFj;rhld}1b5>DbCpT%wQ8a+gQ=#F6^~ACzc?9-QS^Oz?@=
zWFsXtkCcs_d>oE6^I(L9S-{sVCgrGW8kn9eFUIV!&+|%OxAvSTRFuxxce2f}6M#yV
zt!R3Ln{AQvR$q?M{oevVfC+jYxOZojh$N;5@$%|bn*LorIS-E=%&acgGI|6iAJEp+
z=#t(VaB^_e*e+o`p9j2Le_tq|Ax|kc@ict{BMC1LYdx??f(s0maWD{UC(R|+(`gsb
z7_faVe2FMgj=*@lU{o%>eUt&lZ)7T$%Ce;Co@w)u6l<mj@IcQM{L>8{gFVhS4EzLl
zoYO3zrxYYV85h@Y+8(khw1Dw1uwHS(Re%wN-*;$jC&duzcOoQy$Kh{}9PF*Pi5-0C
zC1Qy8Y6ouGwx6(szo^po!0f=6Jba#qX?p2@$VQVhGHXbEG5L2maYXXRVU&Og;|37!
zlfLS{+!9>=7NIVr&tb-?G&qW?%K7WAGbO9yf+4JLdB6_6J2zm?prV!Ip`5*`3e8^M
z9yD<MDNVU|Q-cF-cF~-kL44s)1O40O+6Snyu)Mu2Oy6J=$z`rbDFsXBt4k|YmRhDK
zfRd%gIl=t;gB*DwQ{2h+q=$B`AZl7R6Bod?y~}8chiEa1JdFC29-cDYk5NLA0qHR`
z58mEQj!z}T_y-!>7N{cpA;u(x+BjL|iErQx0+pfKE_%W0DFMHKU!g5~!QRQIFxL;$
zS=>bOfq0Sm6CkvZ)0y=sI`q!jS6mtoQ2*^gHZKMxzWk}rc_IIca1;SDDt|-A9Q;D7
zPP1*e%6?{Kjk$mtV+8NF1a(}~4DS1KEUt%f9|Z&MQ-IRnICVR0x>$mR<62r!=+X77
zZaN_R6`JjlFL#`z&7UaK7;#D_c0RPJYiJA{A3r)5m~1`v89Y_L9eF$;eKGjR@BXR=
zz9+-oV4}nCPLKu%83+08^C?8IAM&_c?F3o%0K9|i-*WQbS`Y|?JUp(hzs$(2QIq+v
zG&4^|EYA!yht3Nb8cdIKSKqx4{-a^ow7?{QGy5zgW%WQbW@Pz-X>{t7KPbTVk2;c5
zS|XYmH^A_C!6m1*yVKwAbgzo3<^}I^(ubukFX|AYGyXP1)>N=D>p_d*?BoczquYwA
z%ZJK-bLRv=eFL3v0UOu~>Moojd|C0|S3yaldFvJxq57eyA9ZzrJQ1`tdD+pb{UFPO
zx9q%f=SFC<KoD}drRHLkambANcz4b_{^;E^c^bq`y<KM<ROD7##AMPj@pHkw+&$Cq
zxxUWmxs9OSv6&`w9s2!gtW&Zh@4@rO7QlZzN!#;zH@cY_%3e&ko$+AnK>pC|UxjRH
zm*$dvpG`WQ$vYocJGW-&|GU4&5>ArVkx$3V-u?mj)YpER$(kd}7SOf_Gkj<a5pio_
zOsS+ljh@qjndi1_k2BMKPFC(Th(@^gppVVyfBJU!9KSHX>~U*@FNsITs9nZgV)}M@
zFT0kO??<^^j@lx189bW?uQ6};-Z)Ww;0s90ut;KQ04Oy8)Dlr@Ahk7w(?8<$MHXrU
zhm?BsZhXFIRUQbMaJfy61KoH(_9LK9_%VKZ_Xo<^V!OEGxGO2j2XE78#iMT7{^XTY
z3*QrR#8%G1ts780;sKLZoF-LTKdvza?oQ4?|L{bw7Sm`y;%BwKxVTJLFS_u0$l2KC
zd={JHUf@<NRV)|h^dO!XLs9$9sDDMM2+hp-U(CJtS5r~fHL9YZND&1ADG>`rih%U0
zfTDEiRRp942)zb^AYF>myY${mq$MCFD7}SF=p8~25Fp?2c^}_<zxx;59~grHX9UhU
zd#^RuoO7?`{X9`aJi}(*r@XG`bagisf8a05)PwQv%OI)z((p0TD6)RBdX8DAS=HKd
z{)PD084M@A=8pJ1J`XqF6lc84B_C<qDek#j)P51pw67i%ASrB8ank(yzcHkpQstW8
z%Szpa7a90#ASl86stELuPF3RcP>^}F!xfjhjPS-<01?y+fA+kgrznn7#Bsxg`+T*5
zr>uJSciX@&uYnNc)5Orz)Y7`geajna347oYX<W5*o2xkdpg-f1_AiOliqBWcFqxSg
ztu2rFeZ4z@yZ%SoUOkX5RWvjnd#6{SW*iw~hhjNz56c0hPmO@^DB{>K0dpBa`R%vG
zNjLJ0fiy6o5)yJpT4tJVdXF_BPbhGcqls=QnI`v|1VQelL?Hk;G{&rqy^p`OTnc&K
zWGHw4InR?oB87F{_$ark&|xr3yEu-Bj?b+N{1*Cg*R~xlwHT98kM`#LZy1q&!0%cC
zq!#5yraw0VHaQ;X)A-&{*`jnb8)%Rl`ygH=yfmP}qTd#u!YE{f-R(dY^A17AE4od_
zm?(x9gU&2o8wL1uz8GMtTs63wO%(LwTCvcCjd7`BEqULG(3XIhNQQpH0XGjfQ*tO(
z+kQl{++7P+GCYWK0=E{({b}{s3~$G??Ek7U8muYDa+nr5j(2#8<!JOp1(J&khUS<S
zeB50a7%Kd^C&<ICD7g0|_V$1~>r&?P8y<WeXKBgYjdtNfykGWM+77(0QApB^N@0a+
z#`_!ho{R{xGhgDaAWd}}I_pv**lY*Z4l|?M`oA2erDHHb_akU^$Z$a(jvm9_W^`r)
z`T)>)YiU=!co9XyIHhw@0!?eHGhobPBm1B<IAR~=KK3mnAIZQw1b2WLslI=fI9WOA
z7T(sA!*V}EdR0E-);o(I^=6l;An2d;gWvCEjj~Nww<*Bz3AeIWS04LNkz|mGx(;aj
z+uCZ_HmOv5!WAEpFK0<2CzD_+aJ(Q-_ZYY{_!09-lQPwILz~Gct9Typ8jT$y<NWg|
zI?BWu0f_QsIIJbOEStp}M8QEz)-5bD?GdH;%&hIDnQV{-iA=hugKu9BzmS9b#>cm6
zpQkWOL(`dioI#!1%Fk|B#|Y8sC-T`gPdrTvT))-!f-;a1%AX4A-lpIb5_vK#bVqO;
zc!|<P#KomP&2-bQ$nO0kiclB9FH0#m=DYev!|<@Dor}h|MJ^4yww}0@X*QiEm`Who
zmgY2QAxTXxr4|R|6vaYd#<3tL)h$ZVD&bE#Fq3N)W$!EP8!dph+|eYtI^ktAhKj*Z
z0QhRcDW<LcZM&>7DSO}KrfIh_#iqL$^c34&!#&on!7FIf@2d70>(h3rZK+miQq#)A
zfOckNc<5s0MnQ2XM|Sj}iH^*TLEB!lj{g&xA*9U5wtXZqv(jh~F?~mM{9x9bKg*|{
zH7+efmcdOFC@T^fUr1cE<W4E9CVw2>vH7tZE@dSDNKD7ZZFXm9yyyhI`i#tlLECn~
zw)uCfD-Gdvg0$Q>T%NzQOO+uWGO(0SW+eYuU9V$EFj=#6=fEMu1e1Lgb^JPKbFLQq
zb^hn^xup~7bBbIEM(DG-lgpdWDTHShzuQ%4o!zze#<2JO`W?VRl75pOB=pI-!Y#OM
zWrm`jIPmHl{9_79Cdd8LXa)NC(lJ|P2LK|(Dffqi8i-%dzZRegV>3b7@qp_lEp6Eo
znc<&%lWwi)7V1`yvP}K3new>2<;`u#Aw0cY;Tzo6nlF=%xmUGG5xl%9xM%aDGE-|l
zVSedzbRl0`=O*KZ!4?>W3uYX@_dSDe)Vs1Xd}1-9kHeodRn)XRcQ!`-)xQ%HW%qAg
z(LSOi+dl<J7A?+%gA9!Cu0EyZd&1rkVJUBB>|mbqf)sPI0$<+*eQZ0cZ8zmZoX3@4
zRqIOcUseC-1)v?Tu;>2E7Vz~iBD{AVF-JYu?H9g0xN6_HY$3IKiQA8W_@;O~>hr-(
zU-(~bffITzrd?OJC3a_m=@~LU0;Kkp*lrq?bTRT4cMvoYsN9WL6EJxD^s~zkAG|S=
z)QV4;p(mA-Ub6=~?ss2P;g6;JzxS#5?z8R8&t*Pl6t^b|e09fRX`FBNJZS7ahmMHX
z`E~I+i1}FGwnXJB$CVoNIT`{DS?-SC1sJqP^@dDGM|Jyi?Mu`b7eqR9A-DGyUztpS
zT}lRG6?un1d_xdjxNP--7Eg6d(7I#W+YFNnm0&>wftc(dAbUP4>=K+9c&LUaDHNPy
zByN0c!A4HbE%frIMas%ZYsmSb;Ed6KE4h<zKLAJO<a8!GEe+8O)i8>TQ`lMvv!<3R
z_?40&uag|uN@&}`!?ZA9{i)OA_3U_hs1;voH0Bld@{Zpdlr*)nt<4}3-f{A<c=p3J
z&0>n1g1j_&IQS8(yxpCA=iCpxWtwjWD-{+!sEmSTtB%5sY3fGT035mIc{Se|XJ&ED
zOGVBFRGXO@mm=^&gJM~|>uts>po%piG9JDs#8nLhrL<x}S<A<r!FDNHV?<;n6Gamq
zX#sAwqtz}<E@dOO^@C5zuBkAE07Jj*x&;(k9U$$6L_n!aWnk~3+wkVa5UnHBxV@7>
zfT>tu_+AiV_+UOsi77TbDE%Il!o+}D*}E4nsQ3)+LhJ1t+cp5-s98{}z3ii0E4{3D
zIa_;DKJ2Cw`oW_H0`kX2`d1w6VTv~zk<O?h^HA1wezLSQ>D4KFSN0w{);BBtrw!Z5
z>5EkFKS{Si8YfI_yBroyqcuD0hMVW@JJx>mZ{?V*>-Xku8fD!~_Yrw$WZ9-(pLc5M
z`&$4gq;k*C!FxH6G$`M@>{$jxl875GSGd$#K)VKDtT}~k+F=X~HFhz8J3gm+w?~%c
zLr+k*zw@Mqa2>}+vOwJ3R7rZOlV~N*$prA9*KIF#zPNo{f6xnT@Wr;E)}0WKyJ9-P
zyBt-D0KoiXw;6jPg_vwAwhZA<8#F*o3&bCXkE&gV*KHd+Ot#&Efn2uUc064+g{Yvc
zTAhftln^!kJan{G8>jTVgjXcrwa|3&C^>giK$7iIL*yIVX>%ln4Bfi1Zn82Mh^`~-
zhp9K$rASK^w~zSvOojR3SZ5jr27T&n#kkTCogg_<#!_}#@^J$D%66?WCd<kwORwsU
z1_#6bi%w!FC$FffZk@`#8*0zS@xMlXW~%##Qpv(}+9&OjQ+8ZMeb{SQ1OO!<eQb|K
z)^~DXTnIL|VerE_uJee@Vf~@LFY&&{F`~-0)Z*t@dijcq_g5WTm&Ej&zgKKsyVh?7
zTpiLAM+V~`^)XV=gZOLzxU~6DWnWPBG9{Y>01`gwsmn952}evs;zwHVAWdcAo9-e#
zN-axPa&@h<!$7c90a5T#{jJqciG$~7Lrhx-8kDLJgo1g$Sb}ba=5L)&c*W$cdc5@s
zC!gwzs9e>R>%Kdcb?Rh5$m2h5sNbOocKYqG;75)Ss++jqV{$XHn4@QMH+$%B_?)z(
zsMZKdU)0IV`S2*WOK5!aC5N}P)%r;Ly#b85$ev*rkP;c(MU_-1PkF`eRGJ2~?Ykb+
zFI4)l6qaO}hnA{Oct87O+8#8lW9yKZ?l*YJsayB2QFHsXyH_Mot+MhW9&25-%$-kf
zylR0F!U+j~nM;rE6l2v#%$BxLe-r|qU)QsWJb5~MQRcoIj`d}GoNe}(OrG{Sh`5b@
z0kHHK-=*xP#cjy45f;h-U`L6=-R$YhA_LkPH?ZREzf^Cm`m}q$(ciCH+?zM3lMCsS
z@++xgGaV@DQ~KGHpo~vI1SBJm*c~Ha<k=GdrIks<BT*Q+=L(36_WP&Tym8yB8X2eO
z{@WLA^3tv1vzNn77j`&+H%&}STi#<k-hEC0h6C?3ukrxI`t9+T^Y!U71V$JhDSh@R
zvgA%VBM><#B<R(j(dj363Dnq}^~*5=qwLj%TuMehF;FLob=;&S^ctG|;57r~s6<YN
zkt(7+YRU%=)9NU%{cDM7$Slu!rZS}fxdL3jPoBJEMircK%aO4Fq7C%H-5od0sQ=10
z8m2^*O+*E*6SZxe$aGEWJfKeF6y#1z4rMXm1KUfIm!A2%o|-f40?CpATEFeeT>s{^
zVz5OGVc|2V)ULLI$lG?gs5L$0c$s;-dwaF-n-s%NF+|%16i&wwQUF#fnKhMtg~G`n
zW>nfKYS1%euIpNJMx}{;ELM7|KKIkjSlfOiRg4$&-6aR3&B3@XEKob)r&WLbFXr<Z
z5Aw1<y=CCY6Lb5ws;qPVqqJ(&;q*r{YCxhQCrKycB?gKpeo+HG$BL<uWB95^d!^6u
zfs5MbO{~*z^i+|VZ_?ytx)>eb5CgY}wj}CNBo+XioWfug<`9v+^K(&%y-xexI)(W#
z@;deT%=~KrmH2k_6}8xwM_l*5Yw`pK{!}9VOFq(!6VEn$EMmdHBmCWkCHLXCFIp*f
zFDo~#+Cx`mk|!T)mqZQL8Q#;h-F<dnC-$~bqQF?Ps|r9;<xyDVlLb0AkR*0#a;68;
znYD!~Z-*)T-krX8tov!xPOTgWNv7kGxthGGQq+U4Pk=zG3|M=M1p{V+ctgf3hFj(8
zZ1l#c44BI73RyJ>Ls2)z=M1L!m2*Fh6B0{B5|!HWHkl@YVF>o6nVY&@nfbD$F9xPf
z%A+FVl4mujWF@JlAg=4WWqaLJNvV&Lv;eN{#5{z}*7c{8<NOf@VOYH<ri>?5mJMGR
z$qcEqdEL)jVNws=Uhbg6@95tg$I|yF4|om8PkLK6N|QT<UUN|l#{|WY*n04gZ5c*B
zrRGixt7MVISj*Ob8rzdO-=7v07J!5AzVl}CX6{jXFd1*7k|(>s5xDpaAf2){nr_Lq
zov5`RU~Y5tJOD7^+YHH0-;P;T06hY)UUV2^UW9!`ND2v14V3#<O4<NzCjX@psZC-r
z*YhpJtbfvTRc9{G+X&#7f3>8q8pbRXl3a`2B6pNz*qZXHmX$3gFRSAo2FRj*DMtSB
zdzGJnEUovgtg%Tn{x`sqJ9dq5MoLjTEhR-e4F*ZC7F9jC$LE4(ZUgRYZpu**zR7w2
zN)uh~$0Geu2e0X%bRBz#5Sg|EO!cZ%nwEg8)kaGHgs~a+qF<VdU(lYN#&&e@*!&MM
z!*W8<zlL5#(O2fQb^*V_`A|9ZBnz9izJXZtMxM7~ba#*b%6jK`9i;D|tCF(v0rK(~
zOFMJ<S5*p-K1b3v9nbw8n~QpzoO<d53}Z@3ii>l_xVyT#x*thOzUZy*dMes&v~xA|
zgI@fc(Ud=V*Dw7lveqDSJY}~K*%YsB3-PBm@NY0(#dUDRk-}ukl2iSo;M8JlV?$BH
z&WmDx&g%U>=;L$wk%_UxLl;$5$_z^0IVEODCj%DGy7%T{-_!Rbo8*H#zE1>3K%PM;
zC5~sX(hX<x`ux8!=e>lvZ~a)@Jc{;ocQHH)dbyCxY1rhlTsg&lU-3MHFa`|5KEPbe
zQl9MKe0Sv*sboCjsvt-Ft}}3niAk5KQPT4)Pk|L$>J<Ia&+#HSXY})j<vpV41g1$5
z8!ZA!a+-0`f6oAZ@!;=k-D2uqa#d6eepI`%#;_N6xHD5x<BAV6+Lp9|D@?SF6h*7(
zuk7d#y^{wgWUF_=_eT&++3LU~9KGb@`R}DQ<)m)N^Z0TX_^`^+VM^qgiL%kc9-dFy
zI;nBf4PIHh&e``g`P|3CIfb$&cMSTpZRZszJ}IT3QlOdEQ2o!~&{gG9Uud;Ecgp5v
z{ivtnm|s?*9faRTkZ=OD2D$V31%y6XpQcbS+<5?|t5JMbSG-4ziLXt>*~yzZc1-$D
zmF5wvb)9ZNo!yxR#k$OW#yjLFcCaH6(Z4s{8jI2wU4I}3Nmgus!jYTZh=zHUia}9K
zqz8cyBtiTF-bRFK0=Iw;<YKaNWvk=dlrW?dlV5BfEZpiY#_XEpIt>m^F1LLZmL}y`
zW~rWEliuR2m*l>;cQ87#(Vy_pug>1MuuT86TmS^uh`7Ez<8Tt2r@x0XgXz|0k7W?8
z2L@#3s#6JRZ`mhuq&T&2&@gaD2L9eg4I%I^Wn;1ml?TU3Z^uODT1%H3q*T<+adLDr
zP`?@>A=X)e7P$CKn3R_DjN$2DXxJHr$pt)WTpxHdD7-MGN+f3%Md7<`xGHX^=1ecW
zbM^+bF{RKp;r{XDrv-SJ<S2IE&L6C0{GRHEmvQilZtC`i9m38fcg>+shwLjGb>C^-
zAJ?Nl@b-vj&&-3#nSd57!Y5#s1QX$=<PW&d+Hjwfl0Yl!24bHn%!Fh2)=5miUhP&C
zST-CaD=}~IC4Y>Odpo)F{%fBAN(LTf{Kbtf%X9aEKAajKmR(pD9Gqo2fK&xbWA}(}
zOi6#@USa(G6t&Ux_&)A)O{STb@XIZ(SjET=j_!@7axM{1?xiEh1ATECQf12w8L+f)
zX<9Px;ik>2IimWRfXhZRlRr9FZeYHPr?{yyUYRZ=*6+cK8l~YDF{fn*&JgSEks3Ff
z2IBLE6d`YnR=;BHLR;oNoePAy5MyBK<K~sbX^Ib)>XB$3ijo`JoR|8BiThW3#S>&y
zQgJLObe}xQ_`j4675a}?I>SV0+Ms8KKzAZdA&ExdL|0*z<<t_68TbB4U6eFvsGlVc
z(LK*;%5n4?XSjoaxqIW>U$v8XDx9+2{5XxfYk3h6b^D=|lJ3E01~X`Fu?B?RMhnv*
za95+o6-*s%=s@l44Js0FIXvBf8Cc7E**&k-;*rrkqG>;d_3d1g_<IDfJ_u-IBa;ju
zA(8YRqq<IN2knqx5UUdcrQU;MjZ<gRFXelXGZfUKB(F409)$&293Km#><iZC50;ok
z+!|4gyq_tVX`JkFQc@+S^#yTEdtWAflndvivG1kq&RHKEj8X>_n64!Pb1F=5fonrx
zUgs$ifz6FQyH_Z(+cEJ|v@-Tkx-4n|IOm_Iy6K+hH&buy++G)5U45iozV=|3hWiDB
z_f^@|^XY=^`k_gZm9SG(FermHj5_r6j<s{}O8I%CH=9W*@xi?e9wR#RxvN_7$$*Z{
z=#a2!R~}>OOqlxB&gX%Rqfbm$^`~Drj`I6V4Goof986tB84F*h>PYL3On=q8H%7#B
z?irE09$j{|cLGih@fH7Q)<G4{+2=EQr&!}0hSbVyHgG;>lEqZ4Fd*;m<`Z(fewe19
zRpDD2JfcF{jPl0sfpNAEQhdUvj-JY6S=K;%SK)>h&mD*_lMCZ<k?O!4*Ddj@!K&7_
zYqGJgG&4uCktwMMET*duwXp%;D8(sq{Rx<dSs}tNjby%A`9m)CKj9j0Kr1HbKuuo)
z7V$MMbN%`1--t3Q+phDv_3cgyU)$L5xg&HRm)WO^H#*lmF2B$tNUrX9lQB03t*y?&
zvm!@Nv7Pyi8~<c?=-vN&lJ$#+C8vMhSDvgMbaUrI=kl~C%H-&R5x|I+q4z!Fpey>0
zKYn<tRU@`Il|EA@m-Og6(PNIQv*oKR7lv6~9YGdcG|ZRveXkt5;pwRjRQrq45{Xxe
zwV2|*XN03EI(|uZpAA(*a_eC*EM)4;I#3G~R8*|fXu@Cz^<clNLG2W;f5R5v`f*8C
z@<~lL=$?H^Ec8Zg1?ggfti3Urp27H3M#Tnu!Z{y)R=RX(NN@4H@v6_`<m_@Qdtd4y
z^tP;X_wde(T5fJyCdFZvQH7V_{JhGJs@io}bfVQ^YeB_Iz&n#(on~*tV4~es#XHlq
z1ztT*7Yrt4Y#uG)k=58~C$(b9Q^(nQsb^2+YZr*V9>l~imNNvQ1YRjs*6(!L+L9=R
z`|7$LQBss?e|8z#yUftG#)b|}meNU>u5D>hJ4eQecAtNFT)SalOc;X5S#7=_VS^Sp
zxwfkF{5oV(6o{nMKL&r@n;Ikj^AuBe=J*ON|J#xYXCKH2n;8(_GI;a6Uj(>UQG8^k
zcb87=wDIYVNhB~bmtI}zLv<5j=2_pb^2-SJ&`wm%2cdzXOrTSc!Im#-OGk&&hzJp8
z%4hjI58Bb+>w02Is8A4U#xbulE<dWlJ)3NpD7`!pu<g~MDS_-RHen{*$=LcBhTe5Y
z5q>D}kx1hbe2O;{KHvILjUnOSG|xHv*P~xT{y!mVRnB@18162XH(`joJ#9kXhh&I;
zhmU_A_kFQTPu8q@;kd&AK04TKzgv(;sK`nT0Jd_nICX%$fIeROek`fO26;f}iM9}D
zwQ=Od{r~Zmg*>nN6<_{+R5%M5a{NV0zCU`>J`TUW(+yo_bA2d~N6jdf`L0y2%39=z
z1{37I-|)27G*r$<ae7aYB`KCN-dvc`z#6q?tt_;x_UCRfj%L-~_~vh0k;w*u;C?#v
zeQrQ2`utw`&ifoRvpWB6<9FaQmA8a~ylhoIr=j41jfDYpHaji#x*!t!O$=k>W_C1z
zN!Y<)3|Fe{q*LmE$ykwTFH+AGT2A7kJMW8J<Yj6HB#FoC1Cao~J8ppd^z+j9ZGW_d
zhtj>!9i{?>iQrj%F@eWy0)rxCjDrGoR{RW7skA{Poa8&2CbrnU)(W6)ITY=<^c5IC
zG$>iH&aC-KsBUV%Oss*#M^~AFPSl#$LcHLF`f3%E3As~>)D~JYhhC(pFqRv@<d;-H
zAT9o-RUB#iXK=81O}fOfZ)mb`jd_IR+ULHrUBmtg3)j{`m@eP(WI*{|oWKQR8Mi(#
z7RJsfInyk8ubU-es?@HS2Ba>5b0#h|F&Ww+){657#*3FKgB~(?j~avAd4h6c;<thp
zJA+wJANX93@j@+0u5bf8M|hY~8f7?nlr?db6{$gFF1;cjR^+k)v$9`8St%&+YUbu4
znGF#4UbPz&I$t{8u0YtB0^6_JxVql{oq#=jk-RlCgbm6tqfg_c2YMsnG7Zn}wPX~Q
zzFc}~$%iF^b)-U;Q}vx~%kTpxg;#*vbsF?BeOZEdPZh!O=c!`-75RSp<f&1j(Srr@
zuw&-e{S}V}3q)LG?xCSqdi9VNw8a|0GpBY!_^%|v$@@<4to+$>+uVKM8(MBvhM@hf
zEDpdr<<jH{M@6q*K9`X}@e+~CWDD0>ey`2}$&;neEi$vgFZvO7z)v${-Iw4W*Anv*
zoq_Sb8w}L-%JNd10YltOVOF*fhDI6|oKH#Z6yf4pgNt09!D~vUgVG3Ng2hXpY*XJy
zpR6bEh8EF^{j;f@RDHkl<Bp638?X^{8O1GilMZh8kCJ|Ews_0KQB@diNKBD2QiEf^
zpSP93dS!gDSj4RlUXt|lKJaoypnpDI(>1DCs^)c4xEH~paD(mdaH0SBgiaW_W)H5G
z0kf73FhZJ~1LOPG?E<ljjJ9_A4a88or-My6kN)VD(Q*<^PQ;m!O)sCgJ;zEWF8U3F
zZh9qkU4Y0$>|$R>v@9DmxA^npOTo#Ov~ZS#d+~YNuQ4ng;>f$1JoB=y-PQ4pI?T)x
zRrkM#M<$80$oD_!YqGQ2RF}&_=#OoAyyc?|j$|k@m*-jJye6d>`o;=7F@-U`thRfe
z=AtxnbA4@?(R&owX(1h%wHkb`lAk;~)j34(Du+c%^(MiIk|(LjrwGkR{+SK`TyP6P
z4I5!hR)Hota??-R=`rS61pd_5bac;j<uroMY3b|yE{C#NaXHu6V17YtX68FSiIo87
z0C~ZgfMkW=z%=hc;xkI{pQpH_fM^lWM@Hll%IWYcFih^AD!a>-n1th%1>l}vKuEqe
z+W_!d7J`F6D~$!<BIu(8PVhCIjAC$>{w&y4pXZD;vXm(-o&o^<gm0^B*1h%R4;s#B
zT#v#kh=GR^u^c_9aVZ}a$gP`vyjtiyAf3hA+3(pOjg+$J<X+Dy|6X6Y(3Jz-1^Y}?
ziXq)2A$DRmQ(EL{2WF6UH8SQ=vadEpT5C+lb^)~5(}xB;*=uW~K)|!MIiU3-gqWJ6
zzfU;WmdScYnV<YzsV!$vxB)6C)-o}f4pwL+79j5pQ}8TLa~k`xu%_2`_T4Az$`5j%
z<z`?nPnX1BG))bV_~GRRbDQw(q<#R}vulCx_^#?)oAjpH^J2D|XOcT_5~w|`Ff<D{
zb(>}`Fs#3)Yj2OkkTW$)jl;ztdd6-ePq~P)<LAwib<?t9aoa(h=BR}lbKhUE3}z{F
zkBb>Um!ZLdoDgAlQiBYUr&pGQnC|JwG>}j?2MbhsyyU}9nfK3-bI?GwM(XT43@{W{
zwpve;kM$j<?f`Moc4{IYNNY=Q=*_x9ty3(nKS}*^+IKqz;k4dDrED`=>Cmq}?%#?2
zfBatmHd5}V`Cn!k!dJO+i8fY!8Z(Ye-|7bZuJ`M8MJaR=^nEQv9~!;qvO~YUFXcHX
zV33}3*P=^nZl%byxalI?)hI{Jz-#TNf6+ppWZRo}FKX$&bA6F33KQ1|iYr-}?K2bX
z*beoRQ|D`@uKHg4<i5aO@zrXu*7qCXdSix7cjtGMx`r&?=FJT-BrA?xi_ha$EW9)I
zqtKXl=|yC%VZ_61Yr|utukmv<yVggyi92#M-6@yVSWBZMr=P9b=67?c^<QV4&;3_?
zN9-1ve37Z8xuP}s#JH$wj>_kf^vyTj0l9K|i};J?_k7ujiKAa4f|6(mx}w>18A3}n
z78XyR#JD}l6RiDW!e{hUtD=XC;jS?GM69{!y(%q-o)Wsda9|RpnW%=D)GY(6&afb$
zY`R&}Cw`bkx%%WbLSQP*`)V(J$6?h>voS*kh3<0>i~C$)1D1p`Vlo<m!kWjc0`)fH
zwG|0RJ&*KKvdZ4^Rg(?FWm(Jg9^8$nIdrS4Tu;<XdR)CO%~_qrey--2XJxgnCU-YX
z;TD>f3tazTeX~Mum{HX1tpLY9MLa7LfZsaHzg(w+Asau8^z`d}58VEue$|Fnu<*OZ
zTu5HoSg^&AHOh%KCuehwk=eviw0G!D@~74W>+lh3;T^ewPQK<(?$|iU3^}mY@@ScT
z%NfaON;r6oZY3t)zGe3Jf7;rH<ax~{Vqw@mzEzu+>0>i-RMQ>49=771a$R-v*Bmwg
zLJJX;k@^y75Fc91@*+IarsF+tvdl+1u_C_h=s)7$V-SCgoPJj|(0p$o>UbGb8`R+S
zngyGf=$P1|W23lIl*5|2tp3tFiq*EJ!~)!eOSj0*S*J(08&JC{SERe!#>x!Q3>+kv
zy?R|D|E;WLhQ7PUPBG7>03z9WQu>;8m7fGH)tOHu6qX$4gWBn9_8aBy9lAv2U3;LB
ze0rmMLpuWJn>E1>KBKTUz4e;)OJR7tA~vvuThIUIr&*09<(KkNopL`kU6hQRYH(I7
zF}~@X#bLPF7i7$WbVNxtW?qAXH9vj|o9WGxar8VJ5tP+IT2yPBo6lD2X+1dz`usS|
z3Z7-bQoEr$OQzRtu*3cO<~3cVK->67OGS1gzdmRSTmCNOs%)P<a$NQL$p8%x3|g*z
zmE6dnYdA5tGDE|EgD^c~t&E!INls24Bu|-6(wB#y&Y5Su%UtAbT+mJ3%d7%m0dKK`
zauA{|*(2tcUZSBx&vca%_m|l4=*lpqwKnsHTWF^WA^Uo7!B6JHj!O|NngV}*876-+
zVe$)f4bT(F(Z(4$XFOEbGHiG-kTBX489bdtS{OgSFWB}*FGk;SI-ShDKg!@9m8q*O
zjPeXsHqC204|~h7uFjRq2o?pgr|8w?C<HlqBtqY5RB$MK6EG)H6b4{MQ6=y4+K#eX
z(UC?`dlz*AJF!9R=VgOPU@cVNj)$Tv1q<1UlLv=`f-GLEVT*aoN_4-UM%N3<^O=s9
zSWh+zmQk&YF>%tvn0m{{bi3<O@IamxeU<m6!ax`Yg@8Zj)|j~o=EMf)9=eSVtGq6n
z)2&|&fN?s<$JutpZ6=$}aej`^V&P1Qr<4W);rkMMNoeo3Z;bQR+AkJ1ET6bR{DP2z
zPV)8I=0$N)2>Tnd&2H}~<sRatSOC<0f5s6u`YJ5Wt4_h%iRayu%%3lo6B|sDm?_`U
zFnZ=n9i99$!kKFlL}BUI7+<OqGA$eCb&bxG>{V~FFjpLC*dC?&a-St`yW=RT8YBo%
zDV_fYAz$LPGv<Y0-y42I)08)zou7NiEm*s{x$G)pE^h$7HG&vn4tBwPxulB3wSW=F
zqwrHj<QZp<u<&Uq>}2W_e-bV4kYv6}`keHI>gD6h-KfhifSuLn{{4jX^<$(J9-l>L
zwzjjefIKZH+`h!Y4yT~I^Qh-uZ!;PZvmb!b!H``B<mK2Y_7YF{214S7<q>t6EUl2s
zQsnQ-Z26NItZxSFxF1GnIe;yXA=-p}cZv<`z0xoj&9FB-IC@1z-)m-Mgn10UQQ*m4
zD1ijl=Pm6?$Se<ilU*QR(jRE@wp9BuSq5yNIJ-FwNys2rTIB*N(771`cL9>vO}>Xh
zow#3a!@@s*_Pxv}oWkZ8Z_{Q>d2DQK4CqTAih{XK=+sMngVM({UV@dM+-;>h=o-^}
zMq^XUcH2=(MN#nu42M1UW|V@4NT0@F_XvOi&}YT&-=WCvEd%74j`A(GvVUFx5dIuQ
zn3eVgnr-|YdG`EyF7$1`B<i@bpVRPvdtyL;OFX-BePDe2t;7cu4(1T*RDUiSx$%me
z{8mGQ*e!oQob-9fa@?rpCB1RZh0nLiN&VA6h);}qp<M2Mwa&6L5@iIMTiuAOw)9xu
zqG!#|OHA?J4bDF}xLs0K>bMj4;`|bHNws@^&99U0gQ$0M<gr^B=I3t|Esx}Ga(ujn
zN`~0)@(WH_QfFsSu>I<6GrjNmtm^4H9wY&;oW;hhoE;2IPX0m?;INq4bbF=Uk5?dO
zk2``vzCFm30BJ9KGZF6<0cpaHFdc*VS=ZS4Fy_>V$M9#nHV<yP^B=Xfa2f;P<Avoq
z*ek5wjjI_B-*%z>gk4)HtcHHyMfA}EY9#sbli3rB4g=}TI<dgAmx^20A9EAVZqtud
zTiAuBFcAFe8me8|joI$IG_A4x_XJaVrVZDd%*=a!XOUwe0<j-}(Q3msc2RuOBeD`}
z%Wu0ebI|_hQoG3?>r=q6>35%M({v^gvR1~1Qz)w^GSKWm(1o-t**<7^^mL2J>pu^8
zp;AmiL(<R?oKw2%E`m9z*11(#&DrOmLA&=Op|TG7x^lG~SWWYMy-C80Um5KlXOeU&
zxeR@0@cf9-ck1Ob-fBN)6&c+(R{tu6RLj*4Lp$9LPdKai{W#Uy=i>92T=x6qs$Qk@
z*V5k}?hkC+F2Aw<owf|^U*rTy+~(fqo7_+_tyy@hNK%+?>^mnIES9ns5;Z^_;hc2=
zS@crxH|MI<jqRX}`W}V?J$TENz_}J-mWn-HaXy9m)hTdp>NFHigP}{{yX*J+YVryM
zpmhpglzZPdY(@<*0)~!}U#IDC>EYvlv1A!{dU1#1@9De~{?k<0%+%QK&gz+`xKkWG
z|8mN>2UF!YXU+SlLc5;K;yY1?O%;xk;8BP0#&t{{3%`>51%tqwVB^E+|9)J*8pnoi
z)R~&5!3FzKYunV5u;++G|DKYSHtE6GF_nQahx1n2WtOC1_B59<*rgftCDnaYrsNrg
zTt?e}g2=VF@0S@~Vg)ddZ$F>cioxA|#t-Sw9x$I@1sAJVWWGaF<Aj(taSKmur9<(n
zoL{ycX$og@pFG>~?6<t3&DI)ZbSNm16=p~kkCn22On%*E7$?ezFYw=^BvW|h+8l@o
zilJ;`)%GupB3C5;QP(GV=pn(m`6A;mOz7-Eq$IpQpETum#c5)KmC6tmk!i+u_9Nnz
zGHdl0VWl0cz!lIs@_!p%k>~mUmT7-$wf62TL;w0)-P2!u24$%hZs%*BENz<EwN}>c
ze$sjb)*AOR%-W>qM98I?a6bKcHoLg$$VomXeKJwfbt<97>9Cr04)gnWID~zF*yi8f
z`Tv~HTX_29|N6WCef0mi>i=KAqi$Ne3Z$Qtg(Aq8!QsD&SDaSX0;?ZZPb>NC-`yEM
zv6~5lBbd~B#O;l4>ZLiW)~0z}d(%(n6-+ZWOQ~n<^raSsMA3;j&JIjx>ZZW~l1x?b
z5+O@!M?L2IKx%Y6BezfLHKCwW@23w-j=FE%6n0JPkgPfJf(htDBAR&%yt_wf6HBUk
zj2J_h+{~&FS=MdBtw-%;0|c}RvQRI{olDh#G-Ehf{?6w-w~i?cMIt?A5)*t$Ncz;L
zxh)LfRjks^G!Jk9(|Jb0p@N{=&Bm1b@@&2?hmPJU4e0QW&t1>Qr=ON&@d0H$;N8?s
ztm>JrYrnVK`JC9cOnilnt8f$*Ra(XziL_s1LMvhFrkn4YW(?=*Vy;|d``Tgmqijn0
zkcpoE<|OVx-~B!O+5VX0#<zm3+%|&AoG162|E|0*9-cR-tn4+c$Y04_Ri)((mCLkz
z%jea}>(gCTx&M0#D30GyOSVbz;sma?l=*h-CT41$m|i#2Z=?=ZrAOcAnv}qA<H)Ro
zP=~iCFaT+rc`vzwGy!(Fgk_K@Dh%h`pNQ`8?zfjXir_s?HgQD+`MA~pQvF2s<iXr{
zaggb3CI4oZWUK4&beV|DFEz}MpXZAYnp&=W^o}w;tjQ!)u-^PpsRdA9YiomYo9AGF
zN;(BZnN(S85ZxB@$cXT>l0X?WpRmdAJjU80X<QU?Mb=m0)NbTiMg7iC@25`&bAXB&
zBd9q1+~V7_kP~bwf=ZbWn^;f&EpcI-GiX^)(Wb-4^a?K-i*`N$v*aFm<Ig1IHmQw>
zl4Fedl#D2!{%_AIWA*>cF57z>$IsQTpXG}94jh2Zl7I3q{DlMb&vf&B^&VOKgY~wj
z3a1}OR35Ipd%-s-_S&fV(OsX0e!FepOw}u~(}e)O$(OR#SqQ4{;m7wlA4)|WGavmr
z*0H{M?M9#EDbW2jcVpwCey^l`K}wAlqLzb}S{0;RUVU`;2zCCB=p$=ZNQci2(b9A&
zTRfKwxs_Lt;CjwevP|$xzT=`yY&0ubCJzPrmC>V|=+gJ4XEtfFC;@#=8?9BC$Af^q
zZmH|z=u!`S0(svn9-`5zY;}LDH1iGO5^miF5;A}n@pzP@xtl+<s<4gk3q2vqk1hZB
zp!gt;O8Q+bS@(;0g^9|Bj~U<-O^%Q^K2KZN=>I!FPbMTGW_te?OM8FFLsI?9#ufx#
zaU&UKMeuhnB^CN&a|3;!v2gUk=$~VJ4aEI!(2t{8?~<e2iT{GqhYy1d%`YR+m$1?s
zmKPUuJ7$^4443;(dH`)2;(SN@;v)Cf@xGA_Ub63%u9@)2O54XR?fP&u$5&ChwOfxa
zUocH&wrocPWfDqI+b9*Gjh0oN3vYM!8xh=L`hhd?S**LS+VLp6wq`yotS{)&bM+jr
zbj8+kx@Oi_YMMfXm;{^cUe}nNIYDxz_$lhdJe=>68A>>2KAyEolfGZjJSMV=-JC%<
zd{ZA7adUuWtR!Olh7lJNN#T=yj_*GTB>K?w`QIe*hlw97j~i%=!!9MtF7}8rAes6z
z%<xGm_hT)!8g!JM<#}^kavPiA>DO%a<MJ7#P!++@`FPGUu(lZm5~LL`pAe-~{)BH*
zmx9uTGWWdP*nat=*88BO*0|r%9@8{yx!S~U0kL=QtCx3m8gG^wY2_O$@_uDFa(s?W
zW$288tTut|aVG64pumx5h-(j9W{O#arODPA2AeLrubQn2#C<pM{o}L`ZQ1Z&{zX}=
z*}Yo|<DLVUzmn%VfFSUYx1=>qk%bmUSA~G`&Io(X3`rJ<IH5^yVP~pFil!FLv0~7>
zQDPQ-QXV{W=X{>=pQk&lyZ@ZeTJ8|aru2|g^?7ccjj&X4vrsuWs{UT`EQ~@T_)6t}
zQ&Pxt*1t+pd%xd9Qe%S3OK<q2kdvaDy-a}GR(4VIhO3ND^e?X!WptJx$XXUOeg_D4
zA_V&u94!CO2)QD_HkM!XvI9JU7zw~ulUkaX&+gB_)Sn`cvT)k}ty0>DL(r|p7rkrt
z;PG^m2y_LnHQ+xGsD0)oFEH6M{d0K;SyJiIKPdR)B-c19ddkJ{#Dx4FdmqBAK$X2D
zAWs`aJ!$A}2$Nnq4vbcqfDb~K)BI%F1Xy>sWi}Z`KetS#09I`2lEPR<Ko0J_4^l|}
z`M_zpIofukwMMmu!gmq-nq%bx2FJ0x%yG^dUA~X-7Us(Jr|bnLZ{JmpRu;XV$HnUB
zZ2aPDx)7kTYG<F8EN-aE?97l^Zw@tGmSuul`g_rNW*b4v0KGmzjv5CC4MS}FL$Wko
z4k^GE_6DqFfdX(KiI`Z~?rp2m=5fKFe^D}=nVQec4yv<%?b{uk-1*@}s;}a`Q1&oS
z`TmNG;tub~6cCyYY3q)2H|xkGT9eEP?fhsqu}4d2`m-P@EiaBY!2HlPuH#bLZWgXx
z;caa-O9!~{DGKHTN~AG>LEd9-%J<u-AZY+ON$pSne7hj*Lu~lKz;#)9!Ez+S!1NBz
zldU~z>hjw>5aEF!I${$E0~$E~vhz?n_@hJq>IxE%loVS&JrgEFeGmi^XX*hSzbIW&
zC9n<fLQQ+&LGygF0>p85OZp6W&R<8w>B;LJyy{wU>)r4#uK^Ao`xJ0#uj2A+72g;C
zS~`l3pg|Q1lW{X$RyOL$0Lorf-Q4PNErhP~!%=^L^jLWlEx!-vlZ>Nz<N@|UB*N6n
zT4%%t;bc*gKI1nt_r4c?soyf$w3rP#@!SE8HsQ;OZF8BhFx<R#S)^3QcyV&24$;Dq
zRDqDRNxdt}r!#aw?y_5?C(mY^S*a1J-TBepUdCvFB70<WR%izeVS!K(-F<>%dN6zr
zD0)B7VxpJn70tpQWd!ai#w*<Dd>*7^bh{F5oIRo0^{qO2#qG6oql`7?fyc(1uA!Qz
zH|NesSpIGb12{nGW-#b;hKb6T9`8OeE}Uz#1L8(~V?2<E2djzg&b_pSI@q*F!aL><
zQ-aJ^$?S}zL=%)nfoJgjgYzNt{&oa|0S)Ue<|+VgoE8Ht$J|cN8-vWs=>W~#Y5O8N
z1F!N@zMQq>6x(q2W?Vts0>ur;bM2>oa*sc$N8Xk`Gz%3zIYOXSl%H%wS!PX;K=UEN
z(<i)7GCVD0|EtN}+`Ga07ek)V|L{8_evPfjGxT5j{<MBc<ScAudhD803LrmyGjsuj
z3_8i&2}AcMw2ZEfNVUdc1&$DQ07qn7l5tkYPv>RV4wC>Y%PP-acYlJmq48f<NdKiG
z*wiQ0@h8>XZF)%XhbUA*qwCcBHz2HCQuTJd>m4EQU1*MesD?O8IKX{<bzNVKdWI-D
z*Sh<c=35}gRvSBSX|)0CBu}VgPlvn>B?x$2)(u$fB!Lh-YTqI1=OHurB8P*P2_Roa
zJ>>}nCc6blNb~V936ZF{*3JV16YEZ=P5-CqYK)S_qI(UQ>W6XdHJ@eK1ivL_MLS=o
z0{s%h)m~ZscozQ@u;2h3b#-K>Sr8E4OO2Afg?%c(9zFf4biaZz#Z<*+!cEn6SKv~l
zdPWswU(ehMYoDDjMS%f12N234Q~7dYB9ESe+-$J#<+8(SZ`}R^r?%Tfw{MzQ6aUqD
z1@hLJv`d~_cRrt*xv;Z(U&5_#fjv28l2{1%x|~&>Q&)|ge95v}+}z>sgEbQI=sD2S
zH5T>gf%ufXEfo{P3zoxP_@jN5I$z>`Q0|y(#IH}qU%u~6a)+Kk&Yj-s9%PnABAOfA
z2gR(c-ZvVNv;-$UNIv&G<YnERFRdISi>oS^vgS@b3!5KdAYMK9==>m9(x;bauMfbL
zWA$Qtc$U7ze6gs?hiXmEJK>ySc!9m^Tq^jFtH3@39=-uRpo{%<ALe)I;dRlB=@t>v
zmg#FU2k`h@2_<oSxYSS24Q1Kdt;CF7XUF4hKbsCdU(p<0zt+cIa8l3-2qFaM*iUM<
zfqW|{FIg=CZC*Tj52IifHDh040$u2JwG?D!kLd_Qb+0(ALf;9r4b?0k-;kF68?Ssj
zDiBNTG#{6~<l&-$zKeCw-l*6Bq7s0(J$}JsOJis3<2n^42=~He`zx|wa>M9lgX@qI
zk-N&IxjqIM{n224+C1nDBwFRQY&A^kelQTE<jeCk{UT_(fs|y^3u;uD&=Iz^J)${C
z)*rR!ty^zkP4I*&&YTeg(AIbrEyh}AKx%-vaFE)(DI0uCATZTzCGxek$Z^<w7riXc
zVqzoS_Us5fipTcqLh}PF2xXirP=k+lALlBhS%EbRVAT?eGYaq<Rj@iZzBh+b+5MRF
z&kfx~(syi3X60wQU?K8hc)F3*`+pgbHsSBAqy@{KjizNFs^t1-gCEL(69@r4y-_mH
z%mJGTpp0c@zWcT`kF0a+)3HcwpsK3hp`#l;jC!+1j|TL@@P&W-@PkUOe>tzP4_i*Q
z{6RP2%@_U2O5Sx@4<Ku$8B<toYqm2mH#0GuIgI~F&9&4W9xqy7drz0WM!%n#=0wGF
zS<yN4HP?N7aGKPJ=xoldgfKWc3QXZtyc2r*L&*T{)z0SKA;G1h2j~VHcLyZtGD{7H
zXU&qDb_}4$OhY3@zW6=j4nTYcwsAxh;QCQ}oDmo&+R+)dTnoAGatIcRiR6?%X5sN0
ze36`)ovaA^4Dn`oX>xmKhP2+h<1U(!4QW~cQJZdF;Cv`l%K=cdrAA>puX@kEQ?3it
zu>X0AI$!!=B@YG}HeUKhEPc;bKOEgio*em1>XhqmW<BYMee+f}JPXpU!)I%+(|EN7
zFuJG@N~(!gYC2Fd9w<ym9VRA_hGjmM>cU-S04a70<#j?0J<=hf$k_j!8~Fk7j_rdf
zKxai!Xc_0=`D)%y9Z*TJPLHjQsK%rY!xr-ZSHDP|5bl~~%Dcbvv0+Cl<_$O>FyR3Z
z{?4?DxOMKsgwMs&h2NA#P51m{&4^3qCsl2h_#=V1>s^iw`AEhn3Jw|LWGy=_@~bU^
zEJ^_3Vr%TvaO7i4fl!N%Wf;8^-ZZ58jyB_C)KRwk;VqYAPU1I$oh?T=@K@t1+Y)xf
zJ~!y6oqcfWVH@WHjk8p$h!09#2U{N1C#CE(sB<EInQY>r6r=B2<Bs39vlCMXN4Dh)
z00oZY;+$u!l|NA5-diP4^LfB<rac^v<6`m&t5_*kvKW@~6KgVZtjyGVpWtV6YX}^9
z9{-puvsq_w0uqAekh(m0kSn3d8L2-?rFyd1Sh?}jhi|&H>-60y;XyJ@cK-R1^C{}m
zi;MO`^qc;{{Hs;2;sYSdf|}7PX9__~l4nUbu7EoA<i)J(&(a}<v*RUJ*uAUE#nOAN
zXh6<V3<h;uIN`m2wf;$6o>Z0P=K#%}|AAlgBBn`-K*3c1Rl_Gv(SN1)bF#u%6LU15
z>&)HI)4;(7FVvq@0*yD|S`|*IGf0&PtFEZC(o1rlg30Y7XUiZShtCiHIrN$s`5AyT
z(mf<8Z}W7?0LJ~W*+MVrfM7)g@nLwN{{yiHG%#h5@$Kew9EE?6=wJ%umY1!Dz$d2X
z+X6Nf<9}~T0hXyx6NY0h<c!nW^*1-W<CDatY-9D@U5@Y%=N4^qxi`BbBWUEGtQe#o
zQ*Z*27w>;1dCKQ?b%d6&QL*=1{FPa`uKllsrnCw}uCf7AZCbAG>dQ(1IQ03x_J&WE
zz76t^(CVJyj2|h=&Lz4VaVaHEmRXGnM*O;CdNeS;fEgYqb3KZla46~9zO5~)YveF}
zHl8KOC8Vd)Y*J8+8|@)AxHMlEMExl-x9;qBFPsD3VU{c7IN-%ZBJsIn(?lq!S%;)U
z4}d?=)__^%C-}Uo92mlK^AStA@65npzV*7MK%L(ZuWK79SYZNdkb5}Zb{rZTgyw0+
zW68OFxwG{5R&}1Na1Ps*h&?b63|2WQvisf9EV@m9tVMZXu=A2Oz@{T=Z^v9(I*DAK
zWp&&Tw;j<6|7f@E|8|6A05EHI{Q3~6U<W;Mo-4>JB%iF4O<&G3v;PIa!;lUW$4PZ>
z4;>5}dbHen`*1r>(GyhWO!2FzJZB7@``Z%qf$Dkn{FsYSJgzJXK?NF`(%oYj1a>@t
z;pAeFAB45Jg$@_Tf=DJ0-@x0T5!{FvIo9Piui9sTaH?J$KWh@f7Y1Z67tMPS_{}#Q
zNEKJSOxIaawxPq<)DZv+iFlzMj+&#@XQ{MVN>0N(6<R0#&FtoX7f+9l;(5*Lm!>w$
z`tUyEhL|Z@FgH+O03gf~DUGZ#!RdQ_P<X9tJX_442n2$oun2S1ucc!%Ymma%4&1Cx
z>4~Rqq7(ZKcb>ph5eSgRYutS6q6`0(UI_vJlM@ojeFTws5D_xuQWsCSKYv~x8)pi4
zjre!na}rB97N1HvyHrhK*-B@8bvv|f9{N-|_gvZRNs*eudk)wc>ApCwa$|TJCRShv
zPtuyySqEDMc}N26h_(fjPcU_*7$#W&`j)r*`ek)@U^8ESOnhlg9yYrC*y1ap#IQ5C
zpyqR=l(LZwU{5;v)6ejM&^eovi-wB)`AgBM(duwP^q!eBpS4MlXev;EnOJuZy0VLK
z25LV^j@?a$uvoMF&G#q?zj3#M3VpZOaPQ5VLWd^3R%Bw{g36^~`W^0%Zi2q`J4yWH
z*?aPc&-m8%c|S=iesY$WpF_TOD?kHi`x7w?q3=W711ynzo{~!>NP}$cBbJ`Gu|%0I
zL;;h+l(^v4>0=nSBt_2j(gRz==Y|fheD)gj))qiz3t={%kTT|FK));b0JuFT8>o@{
z5a#pOE0bA|L!{<it9*P4wlAKoFe$!ZdCc||f}2_^KkC?yZzf$%fbB3ic=hh>QOfXK
zzU~6b3}IiEza@D4hRiI%J_0OF4xbvfQSg)(Terd<t?GD+pC4ySk2jvw9MilK_a5DW
z8o+(X(w#2ekz6KfDNsr?v-(v!2aO@9)O4`2p_M;>iBi<C!R&X$2|o$v&VK4v7rVCr
zd$nV!7WDSs+gMeY>A&7>9_w%a-~Hik?$whP`%O{gLn&ebk1yrkR*jY9QC_BDkj(_L
zQ=dHns?{7Eh_%r|rlpbkU)h#K2wGLn0Un{RsUnFXhct&wxlNEt{9eXbV#c_RHP_+i
z+O*@Q<t&IXzGNGN(LbuTE7o_64ZBl#pfwwNj4vb~>Bx)@)*J#l0<F;>Uo9K4IT=t}
zwokM-j$eQ0Yq%!Ps9OfRv!fk-9mLN~>vOWcBF}P9M;3s~slMI3UY*h1ErB4kmCbu<
z=L!$(Z%&k?J=eGK+4>SepiTyM2!NcNeyek*%qX}=EuTv;Cq|)~nqu8!_W_!7U!H(#
zC1OBBYdg|DE-Y>tg3xCzmKJS}2W`;+%ls_{VPwy9drW$_H1)QqVrd>oFK{@B7hptS
z<Blh>5I7@x4yeu-b!5J3KS}-wsIP}LEcHv{(;mA`<(kkr^ma<1ZjTOuYdM2Q!UlYq
zILdZzh{0}&2LX9<J!csOQ_D>Z&cG{Dolzhe;|`VP3}o4o&wW>_BikQ8tNq`^wEv&P
zB*CU23y>cJe7!3T5DXQ14sf(4)tH+M0pA`P1UU`zdG>zy>@r>Lw)&5)(pU)z@YwaT
zd>(SywGc|pXDvXb%z7;xCsn`yPag<*o}l#SJ%?xmi2z$aINp-G4MpzrI78~w;<Ve_
z4a8SPjpDr8?jdycnsg3eKwRQS)LL`x5V@y-BTZ+K2+c!WfhFdcTi!VZAu}<r5oFpo
zU(%+(1{x(}I+zlJDhv;0zGLG)-TLsujxY{XgNF^-mTzD1+UPduJg%Onqtnv-6BZ{b
zUqb=ZZNu(dV9H_z96hj1)c30GV9qlpN`I^5F^r($rDMn-!bU$!kI%Kb^Q<bjv>Oy%
zsY70TJKjoZP&n(IK;g_$)UWt#eYYAn4i%(%m+i@^Jyx3p{q|VLrv?oV-(encIkt|s
z<SM=$DT@;9U~Io9wL30Z@Y83oiXTERl4ZTAE>QEbKROFB2+z4F*`yLTTW3N8hKb82
zo?qkO2f5H_p6dCxJo*5u0$9v6iz&?JmKcJpY%z5*AT-!U1Nis{$xQObd;$W+GeRR<
zDI~k2oWYTLUX<1<$lUs~4<XJ7?n?Er4r_VBd~<DK(v<hMU&<5>U|)<Z$#U;922I48
zaoTYtJgi?r%}Q@@8u0XPr7b4KW*fSYTx%+a_703^=^9sY_3P?4LPxtF{<44F4Q%?6
z<lMx7c$JQ>Te3pi()qyi^^gOMj^J6R!HjC43I&eAc-@jZr@d|~z0zhc4pwpJ38rkO
zz;Q15G_Gi@lB0}Y#O_C)-v?fq)-CTKfg|#;={yMK%tX2Mx*AF3sQs?m3~v1P##gFP
znXw<zeH~4UM-XCweP3adxsTO*qbhAusQpicdCvVGa?iPHIB#iOv<-kxM6$Kl%*yG%
zD~8!V%Q1BM+FZn8+q(btY4u)~{DrN{;WNNC`$;1*6wFeZ7S@(e#_BhE1Nl+ne;kZ~
z{fIf9e;oUAm4KZ!_B|u4clS(~NC?ds`sS@bVX$_XH}xvCtH_;hvXRE;aaaTmEQ%iD
z2YeL3;{z2N-I;=Q_;7r~Ps*1w_0&5-#ej|g_*;!G!peLx2!Vvpb<7>-GtqJXhpqRH
zYHEwNhXqBYE4>M#f^<QeQWZo{KspH!Kza$icTk#uG=<QKfYgLe=tX+(9YXKDgdWOw
zyzkyOzBk729~_ZzlAN8r*IIMVHJ4VfBQqdQoIMNlGn@^Mbf9goOFD^OA_3YCy2-+&
zZVY;-8yKRxMYN|V<%eMq*54@khxr(zIRy!<=dzrWtuH2_zx+yS3?z_TXGANuxVIDj
z<u7hkYqFSU;y6<bWQ{49#u8y)<~Q`_J4ZN$k2+ays!#Nq&t1K5@seqfmJ#u+J_77M
z${BrqAUBVnkpb)z=O!6WaSEE`KP>=I9mv8BeVR(D7R)U*7(81C&;^a>b}OqvMhB^x
zw(pE^Wq6ESo8d}OGp<3Pb3!iKoG2&8+oS`4M7olnmhM)1v;PK|0<l%j_nMp%zk{a<
z5wVB$q093s&AGzF03VRnT9~wbrw@P(4Z7i5&Cdk@ktdKnW@)yDj1v<p5|}Z6<M<&D
zTbbPdGo|j(SsZ&pkFZr|5Qox;Ew)qGfOd2ePIpqfWte!yPD0bPzx4i|egw5_jqVqj
z@&xeOqADC<a^u7n{owaxCxP-acYPhZ9;E<tVrip9097epm6W6c0piUc+KvJ!c0h)k
za+O%%qeydcI?hq=jro+>XP#oal%!WN5w#=u+*cval)E+gsh7en-inR*O0uEZ%)h8+
z>lF%(E&wMJuu}qEHW~Rk2V)ht0qdv`Rup}58S^xb-ugDCUA$ps`i(G(gHA`{=FTtG
z0si0afTXZ`^FK-94{c|G@{2bs`}WPN)kY?t#uonjDt5pBJ1XlS=0_l5r1$Kdq8O4K
z`1oJ=Cf(MR|26elYq^44)qOP#P9`qW82XGbQY#IunZu)6%3?LCs!P|tD#VOjV2FWk
zkv;+`YoM#lCfF>psHA@cM#*W|n{^fa7)4LJBeN>SA4bV>WW`BomcUxRvib;qcFF%(
z@|kZdF%0wsor145Mbt^b;34g}q5Ws5%!HlNkY3=JFvzQ|7>0OgED+ueD7Wjt8R}mq
zO{5=aaM@n;*Xyr`j#z~lS|{`lDSkvP|72qLpRL<Mtta?hIj0==*T<w~d8L7DRWZ<@
zvi*)`fY%sb9MK0VF5uCMM?kE5`{Mf!k6G1^oIJYWy!_T$A9cRohtxe(o^a@SF0F7Y
z#9mq~4cw{ONzL$9^=CXU(zJr9Tf{;8XY;@y!alqx#(@i~16lncIOY&h!ym*g004l-
zjnB7P*Kj|o^tOeXaD$enxTV=GdGB@=N2%lIolvYegpjc}1#$myujI3-I(RF`O1pHn
z;Ubk-<pceJiX}kh`Qsu@nXT@-NUKy6!-gX!HM)CtqS{`|WSN?84d%SBteB677b!pp
zJ69hKcL!Z01+6619pFMLk#bbtHN$|6s}`q3rd@khHC$$$cIP=rl?%yJS`>Zz4-Ml)
z>ey$sL}&$3D+5<1blK4kxCzH5-H@SO#tMY+Rk4Dp(a31U_RgMmg0}TuyySKrs6ZB6
zYo>)Rx)*W2{M}c#YMt#SffObvz!~%PSVq5UXZ1MBp;M(2(XpVf1}e<aWY{XLr%NV&
zgqWlYOqSMtN%0*JKdY!jwA#xWelW;?s<m%zT{uH9fwUAVjTp`6Krl%)o}H8d%$A$v
zZ_Wtg#*%1zE6ravjSp;(7Fq&{+YgZ|7bfklK3DgXs>s4RiGKv5cli<%h1HjWAf|hc
z&)0}szZGr+6Z%@L-h8qE0Hx7<I^d#wybKZ+K+*8vPj5{GK(SM?a#taZ`0@wxgRYws
z0f{3%xOxlRmosxrN&gj_yXwaBIMq+Jk*{KPnhw-mRQb^G(fU>*QZVKT^HyNgh;v6k
zP&9cWILX$rF4nnbi~F=taYa5fPh2KV`InWd`WB{Uygimw>W93KHHYdbI?2N1%=iU8
zz0UVJ9agRVx#3P+qI{Qusk!&3=HWp5Wx*UlC#Q6O)RVFF4t8*=oq`@=LbVN0t-G-N
zg1#ZQ4hSa$fD8!MLQcfugB3dmUvG|J3(&F102_*k)#}~}gM}-~8%gtL)%)kCue7CT
zX_iFDy&tSJ9(Z4zT3D>-bnW32gNt0D$jkw74(uPjup8v0hP=cU_~T!s<ZZTL={mUX
zKcU{)O{wb~pG&5N&s@WvXJ2oHL!XRhacDyCfV+oCuT^}kDM-6>;A085ku_fTSG@Z`
zZ*KW5eB3)YC3refWj|OZg2sM`)0+t(CQ~+c5@r+ndW@rE!LPG*G%9Q)_O>Kvbwap;
zc3G2#hx_lDJ7IzxXC3=IXQ!Rd5o-%mepwW!Y_X52N#d1gC90RzrgOj(VrQNOHQ@>u
zF=UB@8-A_%evN$aSfr_k<>^($-WivbCF`k#(56~(9xWUn+-p(NN>UQ|Dyd3v$R$Ks
z%HDJ2rAITb5KWYJF_M&xPLfFF{$qNNvV;H1gfpu}IOdvSr46UodM47jIof=?_^(*d
zCs6?9RKzQMTo(La4xkSfnN>hysc||K>t&)+y}F)~Npg~fyt=v2zjl28UiSRT);BIv
zqP}wPJmbpS>k4XXcR8VQo!x*v)zWjXUO6LW+FUKX8r27!l@Ir>uPSI5=zNgVHx{C^
z+3X1l`=mFGb4zW#II|vDSt8O%u^ndb-SWN2FFF-xC$86ioS?!-XVr4AdbSF^V<k@!
zzCmGeJeZ`L{+sT$)zv;sVci8lJWkP@XP(R6ocWqqb04>~U$<=)9!??jSz<pPG+h3?
zx%%^0U{xkj8C%I0^!Pp-RhW|1_ximh?6r6M=X@4;zfn)KFaDW99s+Tu<BPGiJF^qM
zIgq@5cr#~v)n_)F^<OF7zSL`bxp^CY<3D#^5$Wu!iu0eX#ot(Rz%R=lsCrtyghU`R
zGBXp35AZ`Xl?H6j@BuX<;A@82+4G7UMWNd3ui)iN-r7$Vc)7W`pZ&nUYcpeNWp!ym
zx{)igK084b*}k#4JrC9Q^oQ`YpS3(o+Bm8oCn6yp9vbQq_E}gvsOw)Vr1kJ1@kbHO
zJg?MJQ|tA`=-)VNCnoKC?WL}PG&RrX{Ugt=_PkC*!Vg-hQ<v82uUW&xiBHAMaGOm<
zdI6#1l@9Q-K88mBezyAuD^LGne`2WgFXtpGyxq?!4ITF$=(>?wJDhs!@=Z0FPQzL6
zkAFU{(|7WDcvdBo4h4l2`rLVAQVGv$=~kDC5MPP@_Z!BTZTL)zbpw~|;KDKB$LWoh
z>^H#IgH+|=ZW`ZMOL6rl1<YnW;NGxF<53d=;pq?Q;vp!04kF%GQNK%BdpaH5sdU5P
zT|B;F=~KH?<U18{YH?5|`}Nh{jSK1E%FS1^P2ZKjw`-zaGI8;kp>V6wc9u!R+Dh$6
zl6+yEpHg7l)#2)jY;aVhPuSagZmTJc7ci5*J7=*1ZoN^6gZm!(!^K;+z3zyPb_cEs
z@4AbS;qzk}A8HyE-@YAj?uAn~)E>Axo;vY>bBsRz>;oa;V260W79lX4LUUNI*8Pfq
z4^hIQpjI1~C-+r4b`)anbHHz__8(loba=g1E&NaNoE>A0omwk@At^3n?#KX>_<GOj
zKA+E5NH1;ZpypY!5B9>IFtdHXG)&Yt!cB1Gozh^k+X{QczdoM_l`BoK6}ww+tZRcS
zeeG)AF-x%b2Xp#vN9#pq2+^4<PF=m|Xxt3I_Fu?7M<)%Rd(i@lm{5+*rZ;DNwl~>3
zZg5sYN9hNdL1W9qj6K7}-o94I)&u_DQ#6xC{gY%(s|X8Co5tmxTPX$N`(AiZUKe6Z
zsTV3&uP+%l+qmLy)mn9X7tS4?HJ7W(LreDfBCR61xMtn=<rhaD_!41WEH^F-_HI@h
zI%fSCGu|_s%lY4}VkrOrA3zH1j*UAkXY<&`VaA{blCg{~zQ{uG7>KKaM`B)Bx%;Ec
zgo1eApg36QYJc1-fr8Ku`>^2We;0#$A0kHoqc8qV+x~mlLLK!N2m9~g|M~5=7mxog
z!vFcdf7dABf&QC+R6zgF-`4y6Gn(N4c~LK%e`Hwy&x@Yn{?qaGzpwHCzrX?BvgFSH
zNZ(!!Q7pDLKkRdnIcu1kF}ACh%lZsIDf;)4lFfOW6xsTRIXRjT-xr@PTf8={ABkjb
zi0A*M8od~z6eZ7&wE+*6bZn)pW#`L%mHph4X%D$U7jP-B36L969KtR6mEJkV0shdk
z@)d1&U}>B<Oa4Jln--DV2f2A_$zFQ=k|tX50>8&4;a46K>Z6=`m#^UmjefZX&25{_
zbK}nH!`|xrf>!YiFJnR@%NqdM(dcpLQIN^mW_E*8OGd7Za<Y5plK);@hlxbP)dh17
zaykV>9%<JgRL))<Vfp<_m~D;o_T?5MDjev%YwjhLn3-a&7wfBETnPMZIZ?!WY}`O-
zWv-U&)@S6?#LH*Ai`7a&ewA()_fWF9j54%A^LlZ+EiQSYFCUBua_XJA!yDyshFyKi
zCH((gZ8+&ai|b}w9~7FCRo9kH^X9QH{PF+XMYpE1dk4?d$2g>-`^)n;@5=PEN16lI
z?2TG!WX8$V^+T)$n40$R?hj^9+v=8PA@uphE#=hNJwzxc1-{SuTu7>;oGeJ-1Mdt&
z(-1*Ncke{fyuzk`1HHL<9T}KQ*0>h$IK=qu$AIhD(;5@4<nsNf*Y}jif3J%!MJwu<
z?(TQs{3&Ntw9YVeINRt)U~_F%1Y9kEg-u<Hur`^+3svpbPagOrgG$c;Ii3Zyw=%`f
zI}+XUaP38$<I7(#WAf794|<PesF#T1{BcuNQOW<-A`z)_b3_=JR;r{n{rg)R<(QO9
z_e)_@aB2mGgOX=}Qt%1n)y@EZ??s8pvwMTDAa2d=q&xLIY>h)m0}o?i<DGLJ<5>?m
zBjO%Ku<84mixFSV;eo7E>!+m_33NhiJ7b}n9+4}r<%PWWFF$E=UZf6|xj@@&>Bm3~
z@#CGC$6A@MTxN|-W=jOH`<~kWe#v+L5e+<uw6frzA2Lq$!e`YCU6wyIiTR%`x!Bod
z!nIUEEbj<A9_>Hj|28&{oj)fy{<e4EO??+%mht!zw)Kay*3#DpoERA3@t8~P1vBkc
zlDSn>{pn!{yXh{??3F)RCRyYU6>QUeSzWw8VAbMkbBXbf)&$f~hy!)My%%4kAP8iJ
z`PnTG&MloQ>lO|$BDt@f^#0gUh<FIL!!h4Leq%f%HN3f1&U}wim#w){py}m2kHl>I
zywT@yP%3FOja%ysb$@uxpOMWbr8Tc}G51u$1i0I5MuFBt0!}feB3`%G^Ieg*H68W)
zFcbvM3*?Dzs(j*x6X4>J3Xb{<ML1AKjAe<wXvGC{g$)yo%@{4*XAs@9ztoD9BH_9D
zP(*%9%%d-3?RQpGu+#Z5JpzVe=^pw{Sf1)Jn34CXxZaSRYWI<jKvs_Jm{oxe&s9zf
zI-u3d9u}?JHPf9|loS?Oz5Cxtb-jNYvtKad1$f=}Bruwxme>A*QYJ#f#p`a04L@s5
zzl5Z+*%@WM5xH5pT>LjI9c);G_SO7CD$8BRs(rCzf$A~1Mi3KT@;5ewyJ#oMFCSYE
z`#I~e?KG!rI}KdII22)iGdI=No}9#AJU@AE2jIl=*tS<%z9$tS(U6F$^HpJrK})*y
zN6PD7CAPWL4L1QR7LEMc8-u7d2`BvImNo-#EL*#_4xbJltC^l=oObCj*n6$vsaN5B
z&Bdc!<ZaVvEy`AP_3kG@CNmth3yFCXtKZ<-@D~5;FELg~JQVv1<q~<XG4;)l8Al2K
zL;H=ZNIVu-vFY(9j0?QN<&@^=<m?pk+~`kC<XcZ{zfx|cXkVL9>%vlEzVo<;7IYwO
z#<AFSm!|Ya3~z;Rs?JDkUhFQ8XxfJtW%Tp+Wg)+ocDaIwY<M=s+bzFhK^LOyXsdo4
zoz$&ZRHz#n%#@qlzb#N#8oSnVa?C-AUP^7K(bphp-*nKZ|Fzs!;awZ(b)HI4wMRJN
zI1AjoS7oBBkJ<FMcE)MgR&`C9+el95vA$qKXtTpELP9Ym(Qt674NI!D%cdd0v(i@N
z&7{4>0oqL=^;{b`@vP`yIXTP!qPo#o$oEAwtFAFUPeex1-_T;}L5xdbcI!<0w9%AL
z|H+BVYESC%%FjN8-1b2a;$Ye))e6*l)Vxw0*S~SKdt0(r^xjFt<ego}RPb^lX)TG%
zDJ1Tr4dYIa<C5N1@ClOM<+RwaaYt>gt}kJxSo%D9N6W~y&-r0GzWI(OaMZTh(wept
z3m%7J3b72HV`keh_us#MpSbLe?g)~i`M^tz<F4L)+bK^y_Nt+$$(}W4*z0R(tt)qB
z(pn96D*kF~`Gf&IJA2|jmufqWVY`oZ-|}q3I};k?ijGJ0-P<3trL6Mam?ta{X!#L(
zUCJ08|D?3a*Kf){(4u=39#eU6T{m5Mq8EZG^aT3dgw$ZIv|AG0V#1o4kI7tbervjn
zH~C9o_g;^$oL4x8cVTdKt{Kbh_RgoV@oRg1oI5jET3wE9n`zQLJ<rUp8@MVPd`bND
z=;Q{@;PmL<*s-Yc4^dJdu*oSP>H4Hk82+KC?4k0*(X!FA<<OwLnI2`z;UnR|?nteA
zN9V8`JLj;j_|-L~mh<zU_SanvTE;$CVP{a({{;AN*(=o7syUE__bMO35{WOOdfzlx
zXQ75NPGe-n@HxX^OgF(~Mz@2eSX;vy=GBIC-1a<K#^Si!c`8Gt9La<7v}H{4pTvFX
zWN!AH<19&VpsbEa9cHMRCzr3h^%D`V@hKbUx&A%`JDmeP1yBDTpaWJ&c#f2&&fS)c
znZmyPp;9$fR;L4DFPaKnO~dQP{@KFi)OH_ACy@5-$xorbp&AX9Ahr(8BB3UMr1D?R
z>#*?D?gD1NS~?{RA`7_CZbz8iokS7A!FWO=1rmCjS|(nh&n|6b1T~hHK;(aiXIwFV
z2#=wuW%!Ky>KZ9~qZJuS=bD_$(%lgb9IlF|e2OyR=_PI1Rce^JIWE<?yuZ)dVjfTB
zE5&afM;G@hX*a2yh53s1Y^KXtm|KloA?ey|=tM3A1}mSyi-F+y{mX$)&^9cyUS@D)
zfw6}FbLIKlkX(bXwdN|2xW^}t5uqwR8zflb@@+^0pR`)=>PQ%4aex#%b$wzI@Ap|=
zKzdbuA>Q<m`ehPa!0WP^3;;WSEf*+hq?o2Lj7-(94#rPwDrP)kYn<S2Zi)fM8ZeF?
z?-mcz<A8k0VsuWMa;ax(5!Ox%aU&%wz*)~~bI&RZBDdAF#VFfXnWw>O$IbopAcb9)
zZLx10qkn2{K6qv@G36&Jd>eRA8-EA_fi+t9d_O#~s>+(62nYtrT<_wl%JY05y7}Hl
z%I-kx>YXSm*k8AyGM9DW#b(U>vmAVT&#OjGF=G{tkTAM!(=bXLI{d=j>DV3ODGbCL
zd4Y1yPhxA)NL4TF97S%Ph8RX85)hzprLj>OT)O;eWx5M+wAwY44Tt(cw6zQa1G`fX
zb#7zSf+R0{pM>$lT<dk4Fu$T4-zEeJNo&-x!raOPf0utaSbTBuj00UqSE&4sya;7;
zzMk&~@V8h4#b>*Wo~M$qcse2D3W(fc0!Yk5smjtcGaahsv2^7(NFQW~v#CQV{#@{R
zDkh{=cfj0o0*H;DmoIib9w=2qUwb_LrS1o+-%y)NXQ$!<vOhPx-)cPHwK}vSQcpjq
zSKb<Jx4nmLidfTgJ~eZ`x^@>7dQgNa_SMrEXaC63k#*XSkR#gE0saXW>~67c4wiiN
zs`s)>*qd%zaOpM<wJ*i~sO`mwAiJ1;RaXKk^syb~{#^;8_@tsU1Gk<2GbhFaInCX>
zyN9o7>gnv3{QV}2?>&#Vq@!r(F~8{dc$SBD?y%DSh!@Ym2XW7C5%v}DG3|1=s`G2L
z0HQjh{y7VL<6(0w{LXI}k#~RP@sVr;$k=CSU~<j=gfcSJGLA?$6$NZEyDM?m0^i+n
zsb?G%8^rQ}`jX85m~x!#d^H%yKaSX_f{^EyPC%QBjAtaE$f$WP<|7HjN%H<Gj;~wY
zHpC;0JrPy640KC%Zi_Qz$r1}oPNaaY%)}#f2Lzm}*Z7lo-KtvVO`3<kvrq)3{!BEY
z7ONm89ioRNG2J23=SLQzJ!fCk?%FT9!!v26p%qUkRE+s~UHIoWYjAYE3!bbU6)h}P
z$cyHCHXLSq(M<^~Rp8!zaZi1e!^WGTsCs_&P07cdJA8ZDX&MLUaW4v+t18iNdE%}+
zd!X1+8dq;ix*0Ps(_x3R0AQ>4X1d~9Zc=9JyPPVftEy<wuQoHq#MAL5yHC!nMOUV)
z+rbHiCG1YC@X?7LN42sZy+b1f_uDDXk<jy6l)?F|EfqV}{5DAZQ^PX9sHxkQ&TcWL
zu<8a><<0(>?Jba+SF_3p>`HwpB@dE3cL|Agm)D709o0Vp+Hl4t^wNMTr1q*t1~{>X
zJ-af}r}FvFk*%IB?yUb-O!uPyihaEB-L+L{M;R;#fS>4OcHTP8#lTdyhD(JXqxpUz
zD^G@@EP9*ahuXg1BijyXpt%uHZ3F;W?O**@x9f#`lrc&o3=9863*e}U9)KH0q`4;G
zp7HHnTB-i9=TzXl)wy@9V5l8<;T-lZGd)-OVT?fOQc$>{JYYe!{yI{AFQ&tkYo{*J
zXwAFq9&oNah>%))sHibxGlP_)*whgSApi=r?+`-s4TragULO*oTO25Fz^-Ls;>{aY
z>(?%`*=xULJ!VP-YUp=AXjV5&Fx6X5C}>$G-G0Is3D}smGQm359+w(~Jg3YL7EJqQ
zaH;vH>uYBE$Aw_Zr3&5s-Lq0gZr#LK_wEV5Nq(x{)kHuc^Z7e;+zlu)E<#PbTOEJ}
zdX-*z;PZyrEH_8R${JiyPyZl-BM~>AmJi}B{3~xZu--BWQqz9^LHg!3bss0?XY0<u
z5Dq9MB`P$NwmT&W2N4!zH=!XTugSUEws6wwm#W4X905IiO}z?GQu7_OktfZGv0A-D
z3UAsU_HIynzqIH+cD+sD-n|j${i%LFy1RPb)VVIbShrLR8qeT>)A88vx>SPjXVAf0
z|1qBLRn<k)cVE(}ok@I$IN}u6jeD8)0Ep&*i=6#(SypayN}Q-Lv9&nQI!wD9*>f-1
zqcQMC%7CVtEUI4$D9_0IJwF=&2a-JnA#8BVM$hFquDdBVTh}CfaZ*?eScGOk%}i^)
ztqeq1+<dj>)4r;{r-@y3t?S`_A#dgPKJ-qdn?N(fo(lp|$$_@JEitp?z5Rl7WY(wN
zk)kA5O2JT|9zN$fW2W}`-j<fxjRK&jVZhw=$LN5Eoc)!zYpP|^(I>mL9?sqBd>*iA
zJMN~oKyZXRvPb7o%b!VgM1aHNlY5U6%@Di1qW`pQKYQ;waVh_*wO$p+<W=6tjH0Kq
z%F5JUK@UC5lsJ_%IMPDz5)Xb<ZDK$5fI#|`j~Xe@T~h?vWADwS7Sp8S0zME&9%XMz
zBB3oCY%T1!+OEeV@xADo60b9VKfp7`J8ZRs*7kXh#`NzT<S>N;ub?3tB{R%!y335)
zap~^PThGYVv(nv5;4MAjGZ#m0yw26q72s0P{!}+_zx_2>kjEU<vD1ezoP8;1Wxh3;
z0`iZ$T|ztl(S5Hr&jKx3c=+0U7o7qsZmAQOF~XOTK|$>c{M=LrhgP&p`jqH^D((Vq
zpEl5HA@3nRzyBe!o}bZiDX17n-uma_gnLC~$lK<rq2}(mN3SzB#CHscfbjR{RM(DY
z@)tOfQ{QZd<GrRj=McM0`$OtHq^%<y59q5qjv{=ob31>A$BjCajpv%}G0ZOVSUf{3
zk&mlan~c&XTsPSTYE{y8b-s_^dsu7W)1-kvW6FJIAd_qE$`LD}z_yLs6>gnq*vJb6
zEkIVDFFQOzk_3fCZ{YwtIE$qBA{1&r=K{7r#8WK!$j4_6*5y5dV>Ue7p2N9|69~RN
zUudG2NF|j9-3AtNk3Dkug$GyP#O>RDgOz}L#={s2X%KPhJ?@!A0M>AzrQY6^Tx*)O
z{+XE89c0cxHw8KK<Tp|I_%MP4s5Np~PS5F;ss)#upc39@TyJl2VuBEo_&}Bb3mB75
zJYnqJcXlN~&Pd)v-E-~@b<{t8<<s~yuK4@ROCXXsH_=fOg|j@S^S4u1nacxoM)3?n
zTnbt?x&k8tmB;%)>NWvJ9f@ZEC(O*+FcT06sys2<Aj+^|86d-!>(R})Fzp6RUKZZJ
zdA9m>INX{Cf*!~-Hy>V!szk(0k1UjD3iP1)oS25=|KNF->g=AUk|%B$0LD~fX^HO@
zt;xCi>Dun%!c4iup9=us6%b?ZzMwCmU6<g{FZ&mQlc4_x5{-+&^~ygp7wc!5>G&l$
z0&=Szp<hW!<FX|-4mlgpTXM{ic=sfIqUQtCzkk<aW`ui6y?dL9cGXM11UTtFTX#8N
zji{3?Hb+JT0Z@fFcg53}M5RmPAw!j>;gRV)M>CFD{o+rg19Q;{qvET(Km!!7mmnLc
z(?2xP)1Oy|-tKtDFV>xUbwj*PCw$B2>kq<C0Bd^;N2#lGk~})iu}~H@LP9m4<w0_+
zY|wZEfbz16yNmhvRW2DhSd|lokaS6F>Y@WB<U~1+ZjZ(3zDmo}+(F2pygwB-=r?KS
zKuMyJ4sR2Vs_Au<9Y(`VcPr=T$FfWui7!%%0aOauvp=QbtO6$gX#s-P>Q^}u^0{bu
z8kW9#pCb_gBV!o}bedc-oQu|Q|1Xi~*Xyr0(olxZjSf3AWPnx%tZKhr$(mFS@vUf?
zDiJMM{iK-w@u55ikgt9G3>*jr4NUXr1856#@!ucRt2u(EcgXKaki;|EkCr8h2!wp3
z?g8r;Be|=2CtPY)@<ej-KonsGxS*ov#xOlsjUypV?kUWMc<z9<{myfip)D%OW!(3u
z@PZ+uJqr-N?P58C0L)R%0R|}jtWT{?W$C9eqFLRN4jQEd?nPf#lwVJziOZ>1zCn-V
zaD#SUGED8SZ@T95fL2LJ4&J4QN|CI#5l@zx*taS7?9Gw_B?gdMu4qZ8s}6cA@JlY>
z8UYmv@R4K-XDwbNbc0_e!SX~vX~^ov-rRg=YVd&=^&wH9u3($p9g;_mUt$lYgBB1t
z&b>he^}Il-9r&Tb0IF+lcJ%&@<>b(8hQ${CAUZRlYAFv|^|r>&O1G4Lcxv$}EAoV<
z!Mf6MI11^|i0aaA9f<Wim@ar}zQwGlTl$1dO?{*?N{cPMEA}}taOxY*I;?U^q{s?v
z5DU-BsoNjyJn@es;>3iS%{hIU%M+P^I5MXhv-}?CPUxLE02@t$QIM+-ekf&?9qlgD
zAs(wO_ub-64;k-LQ+L*1r)9ESyQ{#Q8;DHtj{E39)+D$TyE>X5_62IKGa*@+w|Vk6
z36_^YSnN4tIBJg9*PIQy>~?1S@`-`4kyWf4onv=ui8TN%I66!K<=Xo8u+y=pF+Xgi
zWTxtLS2pgp`GVZJkMH=*9`NR2@j9Z4^LdZUouKLV-&!b9ilHN+EDS0wKOTf|<k3^;
zB=LBSjpXK9D1RrWx%;BZI_POAfM9MJ^iKVdzeA1TdF+Al_p>9?E5uisnU`|Fkk_2*
zJzLGI&>ZK~q~<=ys^^OC(@<jbsHq6I1H=mJsEY)aRz3Rh`}0=}*2Py;rhg|?8C4!W
zXAv6bs^Uua?ai(O93(A6SZ2^Q*LopeNN)>uTlY4ctVsdcJykYoK^BOkx<4g}3AXmB
zGf-jLQ-JBlJ_IELWV?jVW~&U=U#XI`Drsx3>V5-pR+v&8J|O1ONdj;+d1kW%X&%2R
zkB*FZR}DUBl0b@gq9LP=_jXfxXWkV&cO!;{&T;gqw1N4SJrs%m`6$R9=T8|SI_OC#
z8fM_oq1;hLP$*x@GXU<Wna7NLOUwH-IJlKorZZ>O&ZXg8kY_bV3kd03*N<nmAU!pj
z&GP)a&OjgXKE9$x9U*P|&rriYyrZ8h^Y$fE8c!jqr2$PrvF@O{AyR3sv=CN`@JET~
zoikT#D^iQ((z|gCKw3;*s0_KX|I^{@s$2Fob;%<;5cMyo;q_0q^0yc4gkb3TfgOw~
za(O-HI3B`&XyqDk`rfftG%=%Q)=WraQBLi?Qn#@3znDZn&rcVe?+JX(9{HK^t_Ee9
zv>-)7h#wn8kloSsbnzGpodPdYAD0Fa-Y3sEnCV<KJj;c+#^*SrXB<WYfV&!HQ~1k#
z670x~=U3hglbV$%TQPF!#=EupT$gY<kX_ykj>ubyZLo*#jRGJ!5ZYbBf^GOq@qsk1
z%Hcs~ROvGg-OB9qZ>(f^bFLcOO@lLe7JoI4UA(WCry8=xb<LXztNfY|k!j^0hV9>}
zasBDZVNNp5{x~a?uS=-D)3_KjQ^i>E0uKl}BNdgOo14l!m)i17e1?i%?eu*S+^y#N
zq+{c4wJ*J!T*3Gi$V_+ScZ-4LmrjB*(Xm$`P;KGa>F*yZ!K9fV?t2B|L$;>Su$2>;
z<~Gao%jJAT3^-$B^Pbh8rV*QUAQx5BU6YdX<m|WB$E0=do~d@j6sE2`MPsV!4J|hP
zlqc|rpmdN2jy_!_ye5N;XxjVcFF<Pu3%h)Aj8^w@k$wJ;r@Z{8Y_V&QXX!aeXX_9Q
zt+WiKAK`o~0SrKR*c|q1!TQeDyh$R0b=h2{OEm`X`(5+3n^nN6{J@bxoWG*^vzZ0B
zyCy}-Em7S?{KEW6uw#vJ*lRj9bxqDtya}+N(~)HpAIRHUluKI+k%P3yc{LY2#R_6Z
zg8kl+rQ?r}jnjdX?b{o~%m!04)tyFHPWpV6q|M;6-eH8fHvK4BgM38Z>(|S9ebykG
zmTEy`hmL`v%nqBXt?Xl${ZFM;t9dRq0O-NV5LjARZD|siTw_~Qkm>3eNBI1%49FW<
zu2PCJ$g#CQJ)1hZ=36T%AedT?nyiw1V*j_VuiDFt9%{oBO(TloNSdsRJn2Pf=U4;z
zAyqba^n@;kW0gaG7Y^F3tXR`f_A1j~=UO(fL2EP~HSbE8o2#uK)~a;jMfm@K4qD=l
zC!)pAqry2WBtS6Es81xcn$$CuK%SL5TE}%C0$m-mq5c#4ocrLJLL_*~ld?GANunL)
z6>JTisH(Lz*5cKt!NNs_78@kU+d@}O%=TZl?9$Z~uYPSBJiBfj7pN=(J-aI`DezmE
z@rv4z+lG76MjSn2(`ln!K4gcveL6AqBU9OqTi<&Qk0yVnpw?V>;=I|f-a+&LP_Vt)
z#H|WsX!*fI5!YWSv`s0<^DChnE!yWbRwA$@HJtf`t6!(Ol*g+uMkj-QTKC&4Sb%hU
z-J!@%Q`@7u>dQxqCnxB&k@-x&bM^`<=SQPJA-%RtScEdLFdqeAu3zihl`L^zj7P8S
z6~k#@rZWr}Y2}%9+@#W#F65Y$gwcVRYlF#9!H=yD6t<9(a<o8q{~~nyJ@Z<dn%3?1
z7nK7`M>(QVs)Hrv+uGyQhb#R*-CGA88G!-;W>e9YInJ!Npe-frY%awN=JxgB>D9`!
zoT)-UU1A^R+i&SUTkT8Wb1|!bp`a}i!A7S|3B>73mm)`{e#Lm%0s;2_Q+40}<IS!i
zKxJe<r)eWe1upYAJFWbe_B_Zrc~7x@(*M|#vFxt$uh{CX(R+j+c@%I8Hr2Qm06^~L
zHe$eeu6TPgkk94@UPhoK0WBUQi`WTD(<ZIjpEBJXY<ilRo*6ZL<!A<9UB)nS&lJp<
z^B5g?f%rJ+{DGmOE-wAPR+0Mb)KVAoSR6Awm+7s|;hyib03i2gdqUUZHr~QFLZO(G
zJ3)t9u3|vB|2DyJRzt!gYG;4;x*hEd>cVqOx2GoZxZaXj)0>&GWnMeM*^U1wbE1m?
zI%{Ml*?{DKT&iTR%yixIk4}R|x_ZC`mCQ4eQX_2t&CEEOE{xB$E=$Z*gM=jTNz}|&
zLmO8yK%go$hH60<2!ML#=#glu!U^I&pfY*w<yG+njt`07{dKEjK9g+B<0$pe4bvYC
zj(>T%?<%iAQnTs$TyLL^Qg@|UP65c{RVoKeBs}-gI{L+23Ur?&_6A4N0P58WS^ZcK
zs5gMQa~dgxs`-W50669Y9-PcJ-&Mx8Rh7Gq^X-XOjHT#jPMjOQWgmZXG%4)Wqyox?
z?hieeuSa{Uzb&Fyo|-MhhiLF*LOQxmb-^xJ=gjn^>C9Rdm~w#B?&YW;r_aHQJ*6$c
zOZIZ#236<PhquD`>iuoBGkC0;{8x8_F>Pq2pGt8NHg(567O6TSv=Q8gFph=RU4B>_
z&A5+S!K_hmOIX<8mB;K7fQ<xc3d=S<1PSt19M5CYfZE#J@Ixy)z`%NA{w*aTub}ZO
zmqu0HLN+Rav?9YZ%TAPjfeHo2hFd$n{M`}{Qd<M@3QGypSni_a&r(Wwh}__OTCQ(h
zb{*~B?(7;dif0QDss(lL#mE)<Hj!5Z^Q}-dc}kk$7HRGvhojd|s%yk~goQC$@+GQ2
zbCd(pdG5RJ3P!5E@DQDzs&5yMlqI`kqrbrR>$Atm!0b}qBY?=(t{(XW^%yxZTe$A`
z?=<LM-nU7tj|$#30<a;_AS!QG<FE!muqYfFjq+#2X$WedUQ1Uq_>bh@cor^9xJSw%
zy-3*lbut;xF%zdV;@Ta1BvtB83ux6rG9^o~et;;Vj+&Dwaq7xXXsw0f>5Xr5T;=m<
zH9Z{Ptu3C>AV))77kVQYdrJYpQ=r;Y0Pg(OIDbZRdP|kBp+>?z^$CN@aw+_}{@)3q
z-~9hi_s69IJ0)cP$x&3OXJnXa0YTCldr49Ow>cZpsB>bb);R2VcwJ%@V3w<nn*EZP
z@C3@)_0EUk+gr6>vv^`uW<<bovDEM3#54+VBlfX#^)%5Txp1Ekb|t6wsbnA$0CWf*
zEJ#@cB4d7=wY?jIkin#ywJvSY@!o*F_q->6k{w2Re6b1}-wuEj0N(L?{4XN;3;|&?
zTJ7`xp$31$y!Z9(LXPMQZE5*9Rx%KhhmOh&BG_&BJ;l<<b$ZHawBn=8*?Q#d_XT*K
zUuAd{61tlfmaucej_oC{bQf2#IPE-c$>*W{lZ75`^$5f9seCOR2ZIKMDb(tn99w5=
zHqtx-*sNcmg}j8>hD27#bKPG6_x@wL<S5y0lzfkf2Eam<HL~xA^14l;(sYw&XH1pL
zmK_IN&Lr;n!~xTiw|s@(S(AGmxr7w!z^^XE6mr+EL)u)M^`;zgYL(wbf1oGO%9;AQ
zq0+r~-F?-mNnUx94FrSQSHh;Q99P~{fnM%J3y1M1J%}p&<{@NS{3+wBLLs%H>a%(G
z(b2*Ui4*hjHJ?K*aI%o`5Ul#|b_U#ZIcA8x%qJk8``CPYj3kJpS95&pwvOaImUy^B
zEGOp{M5D)Ww$9bN6=kWYZmICO%A4;o+?jh0w938b(~okUC04J#^;rVD-hRHFB=J1;
zbOP)=Et-?RyHxh*B&-FkUDHcibIf`lA}63~2kp_B<FGgTeB;i!jK1`uSSU)uoTF`8
z#{w-tuZwSzm8^f7o>{)J#48v2Y&h_Xk*tXjcbw!El^Vc%|A;Lhn=8CN=6&1w1L=}K
zLLST5N!QBci>P!Azok&+>}H{qqYDL86(3iYd?<j^MH0viz8LLP^%p9Gi%|x<xwy5U
z;`y<*JP+aPR*UY-7-p4)DP_<Y0Ehs+9YN+qsn}{+He9BLM|N9ArwJJ9>*<bq-ou&i
zUknGs4}4C`{>zc+{$a$Q|9XO(owP7X;^xW_1OuFN8XRy7`kPy6%@vyXxpz@FyBt;$
zi*Z_#ic*ZVBjhV?(0J&uy9DD4yK;U7U-|fzUcbGW4!b<=p1|Gv09riD)Eix`bd2+_
zM#v2Ul<m1Jpw_h&{lhAtWSL@B1T`R@9kXfRRnQuBYowoqwNW}NhOY8}a1aQFf)9l{
zQq1PNPTaRZtGY?r72MzySMDIhR$8Wfw08M2$^Gnrrv)6TOfJmaMqB_)WE^W~v@_VQ
z51e!d$)+A7QQAUWy&NJQx|9Vq&G*^>uM=Rh<M1G2Lp(EqPB8f@PgXj~)OzrxqwDCO
zezv$T8_}P$hZdU34BY!rQADx^7ANyY;h;Ba!CAS|_FL)f6_x4&WggEEGv`X`)h8vu
zsDzz0Wg`gvO4cVu&q7m+zQERtev24~QK0&VB=uf9G6Fo$+t2v8uMTQuE74;WQKeSM
z%C1150uCGAO=N>Ow?%K6W{aewcq)emvT}g&Zq{I`Y+d>Z^8T7Uv*>tHUky|!pxfDS
z2EBDK8bt|6ctJ`9jJ~n_Ld<29Xztv3m2W?qH^gqfGxiG-*6Y<*5?xZ}<Y_*Ks_wJ?
zG6n;RJ>AmuV#QKkN+3D<`>~TIn9%M~bsF5Zx<r+Z%|!=M^Ak%^+T~pRv=^xduMUv?
z^Ra%3cUwEx#6PkCwf)3Y(L3@1`*)6qc)g=c_rUU1Ddnc7BX;ZY!E052<}(;ES62N9
z7>{E6!^n{mLXn*m&=#P<F&tcUx4Wzo$8dP=)oSZFaeGaZ*0^hM*(tH~DRIF3#+vDX
z+}$dBkH$fVrxKM=LYi-Cnoo$dDbNPE761!Z?GD=XP?WA=C7biMR{omr?4qrnJ<i?w
zm8EWXN)t3@DXd=#5N{v2>5-$r3Cq<B78)ZGJTCY+zr3F@a}!OI?9;il?jG5bWisFF
zcaC2(Y<d2HozlSa)qD#^Lf4xSuy1PBYMum%S<+MTyAm~z-zJL<y~Z%kv<q$_V3T~(
zV)#Ir33zMBDGbgojfHLP#ZD+#GsPhY5|J;}N7+TNGgBghhDB{DTj7+_e=!Do%8%Ky
z=tRqK!d5(-m3bC`*a8?IjI&1zDt|{T9ng&dCKUiN3^=^Iu0F3+HZchY2C?%xvClUp
zN!0I6!Cv%M9wQNRvAjJdu82=<<e^f1cqT?4%te<a+V8!%ch5)YHw?fzr2!E4K2trF
z@?83BkJ*eU*Bw+TX^y&9M^)9x#O|uQ$Bf2DYfD$0DX>&=$pc<ND`h$kD8zFY4U>Vk
z0u)^|8<ykaEH@m5?4t)5J%-miX?3^y{WI3;QmA(ROFb<*B7vxQaOYVQ&#kjC@1kgb
zCmfNyd3ssq#pd-;EQT`}!n=QZw>}={PhR}0m03XHN!2yOcs3z}H9iJlh@xAc%rzvW
z6|%R`w;Q4pIbT+TpT&AN%_aYfN!A|$QwD}Jf-W@wLh!)*1LxIisBw7z*k|JEud<W3
z&Hw;0Q-_I6$YoUeaS`FrIde|IcuzPj0DBDvq+@erl?|WA;H{jb@&}=0<khlQCkxNH
zKG7F6xZ2_g7(uXI7gUM>F$O5ttgCw0GVZ#I&7s=1B{l~q3MQw@)JNF}+^HESZ3=6+
z??WEs6}<kHIlXiVm6&NKdL%1Q_1(mf?Xz>E&*Di3>5mUOK?QdJdV+3>0wAADi&r_`
zTO$EP?Vf(2-3ZEZM|4R=V@*4COo$yoHWNgH#-yiEthyD(6%x}l{$zl0(i@<WKtt?C
z5H7{InJ8Fl6a+k}H)drl(+}V`K@HH{q<$&uj46w5tMAZk(S4OSrz${Vb>?)Wt?)S)
zX}_bO)?M%er0fto4**4vWYNVy0&8iX`J_jY#z9N??oQRbsY^HN1Pw-gpS=&@m9H@+
zvJ!-pf~R_AsyJg^ZzlnY({va>VOP1Y%mfq4SafU>lKPwS!*J@d#Q=>SfYR|Acq7uI
zf~_tVewL(Z3{K_0_V`4uSVaD3OZ4`6GW4nF>!V43=X+WS!ry1@pl{~1BBxAc#P9*7
z&)^o0<5EG1u10>1SwzaIoF)g-ew0_OJE+PYw89H_d|TFU>8Tg^G<5ZojM#=6*LEM`
z{6qJ};k7$z23D12a8s+T3sksMo)G=xP!8cjp)p8&QLdW(CSGJ?H@tQ;o$mpKE@CoB
zlUfyR^CG#wI`kWY&_&;*q9&GrSI|-pHlAq?c^F<}H#s9Tj@Bj*X&%XGy~9Hh@)b~+
z30m^5{MtbWWTuS<I=!eRp@$}#qw(%hXUSu=5lqiu$_yf_yFjKQW7!(0f!La<)C2?|
zCDr?9zPTu{-vTr!&jXmFwhGV`TZkZl<m!zfyaM=`#Sg71^f-XV$Fb0ZL89P5x<Ja%
za*~Xo>Vd9jArzC^UG4dj!sTrjwa;vJx<^e(%EC!|BjDb%;!Z7Ky>rr;#f&m^<zUiY
z;wT;yWY62u#qI$Jhhtn8YP#3IOIya2yx{s7;OX{81a(6bnOuCv+}c%Z;?gwW0Uic~
zsvaF?&}NvVG^OJb&x89ll((p}h$hPj@yj2LG5Gcck_xMx7Q6=d4^d&tQDT5IKNl#6
zlEo^k<kQqW<P|nxd`0~dz@U!@>}8mtoi<v)PW&ldaFK`NiBHsX#km^Aqh6JC&>kKs
z0AwaSE!Mk|N&w{O55fi17eGKe=Fan)LmYZCdGG!&@%vBupu}Esw%`rNRQyuL$1vYF
zeOynZN3t6HSd`d<@hbze{m=VGcyPyU?NGNWmfCHxiF>|p5&lFA;moy{9GdQ@-YFdB
znr#`7n&i(=5C0Br#*c}CgEYdIuM%ia<B7#ggO|{3G95iu${qBp?+qa!6|gpQ8fD=5
zEBUnhf#(cqDDFc?`?2(gLUHl|<3jhTmXP0DI|^Xme!qM-#(1gd!zLUVr$2k{$UO1+
zspQveLsGAQC$%hQ&;3#UCP&m}N{rA_%mOzLg_^#q=-h-vWPZmcthMbt7;8GKHL&jG
z<y}@orN@Ks1B^165uNv-k!gO1Wm{1u<)B4xFcbEBFDyQlAGu0NJC}j^)`kc=CPIF%
zbcsjE!_LFW(rCl$GJ&CGz(uu8y!x7J@_~57u4{Zuq7FH+*d4h;3Ha!<fHn;{_kObH
zM-%3!i&xrrJ8T*LS&mA}68a=P&P=7L+EaimiJ83mVrx&BS8#Sjr}!$Jmof+#?=tNw
zmN!Oe<Y>Ak2B}{y4k6@KHK9!?;mV>U91f~F_S>c+B03jcb7Cidz`<@4ex_c)k!4CO
zpA^(+4+o@jIaFivye>1m`}Th>Lru)lPc=pql<262S*T<mQ=hCWLV>uP_C7woLV<+t
zy)7LTh<M@KV%PBlDB#qQ>6iJv5%bxq@2S{)v49b{D`ww|CgtSwA0wY3@l`DE%Z;Cr
zmVo4nAHhN;^o!<nLo7WfZ#W?7Gvb!x77k%j(D5xYe?W_5=#$Z*+4e&R6hJL7Xb}kx
zguZy)V36=g7}8*MzFK(vqi-!i3u3}vTI_he*W-2y!2r)vl`1VR$4+#_&@WB>Ffb@>
zX}VWr0jO?jK6;$lId-5P&yd{e%ja|omz_4Cmf&#X6<%g!q9cvhb<|iL@<qN%InJ>d
z3fTMu`^;ma#Tf(crG3vTZa5Bsc5cMp63w?T*Lb%{fn&E4v~n2C(xivCb485&&9#L2
zi7c!s&ZeeV4Nf$&0E8`p#M124EG3f>#E)Z>4B1QH`e@l7l(c6Z#Ha8QDc#{J1oP+*
zQ!Gj3vNX@fCxNsZBYlxtofe5T0>GrnGBQ{N7iX2<Xl)}8CyyR}m60MI(u~VP#&ILD
ze><L8x9r_v;=%{T2e+uC!XK#q65?)5G!SHp>YtGmQ`1%>BEqWYb7a9ja+we{CN8f2
zQsOGc#%m2j8djD7xp-B}CDPzEq9agT#6$cE<*r^l4KI;CF%8_Fa`ojnCBj`107K;R
z9W=ItxE$G=JyggGH;n8)2Ss-}!(P^WNcpFEYBpi**1x?|*<VvUFHUllTJuc3SU(%^
ze_puValsfI>G=#n8Q`)udi0IC&LBZ+TC&jL&-Sc)EJ26$TT6iHcC~)LHa1hM4HtF;
ziAR7*b<MQ6V`vyK;7-e|Y+R}eqKZ8?<g!2pzfG+VPSA5N2ZP>}T)$C;fMchplTx@E
zd`d9_A0!oCYJHHYjecWWQdZxgYPs}FzG7*JviJRb-uEC5%B|P(&t{~!5}&D|joJA(
zBwoB1`$`ZACJVBqSslMkqme@wnRpkZXk#P(IIh_EF6Gu2#1Ras9+0Gq#k7{`=$C4B
z|FQcfN71(Cp8|FDY<XV%8?03x|3Kk|XHQ{`@>6_(l;bO;B-UAKtcUtUt*5i{tvAhy
z!r3v2`@dwFE3gw2$-%DOPk)y8*it0S`9WF$(Xy;}X%oVK(4zZ{yBUywbj!mAs-L)Z
zYa}3S#{v=|zZ@(Eiz;d7&y`X0(ZuYqa$FdHkW+Ciq&VVu`(>U-xIoaP3iX3}ACptj
zwxi#yliJOlAd!T4ZBrlXgv{@pib;g-3JQW`xdrb)zbpAGzxfVk)Xd?{i!+&Oes*cv
zzZPWy?7jBprd_@A#sq&3<6zjE+v6y5(`Aj4o{)iJC?GW!w6)uLTa^CyM=1r}20h+C
zXRYL~*=Wk>W{l>GoQz4zm}F^<-I4%!(<&5ME@ij2O7Th+Rt!SCB?HiZGi&6SpvQLb
z+p~(1D4AwekqKH%kP1ZsI|oUiTsBuTIgVO4&XzC*_iclZGTj3K)2Bfywb*)DPy=hU
z`kfE1y;&f@Th3c3@m9g?ey5eT>G(&aDVOnQzMWNj$81(I=hGwfFB6_jnpCu|<i8SB
zeXE1uhUMa^4fi4R*cDiqA>^a7^!<ZUL~f>kU{!9mIMKgl=ULOxfvMq4!~HilhumL}
zmem*sGeTMoFK2EJe1Dace3iV+VA{vdo$rblzP7)lxw-N^bv?`2YiOO8ynHy!U#evA
zTVJo!_^a<3={2Kt^V7@u{0iDiXDrY_6>`?q{o*jY{vZsQsd^dadtRueuivl-jbzwZ
z*}L3nkYS@(gkP7=%^$`h;7f8$;cM;z%?2ah9~JJtee>W>OsbR8*lo8|Sv8}lH}iWp
zx;{(#SJ2s;`HX&VA|;)<Xg!~u7nknZ+S+H-4F|q<E1QCeiFD_GUlV?l?HiJ|@UR}k
zbnOT94?4|b^HIwf_a2<OuQyY`VAW*FQ=;*Lx%s~@?=}>Cn0r>Q#75yUf5}BbaIgHk
zul9A9?@65d(lO}##I?SA&v&hcBJAFcs*M2M$|$J5Ngs=7=(D})+dDWRoilK;A+dwI
z>-%DyXWgzeX0h|q^~X95p7VlU$s!_C=W26jczSNvudf#*FWS-7;%DzppZ|+;D>Ws}
zosHg;xVhJWv1Qu2{*4Pr&HlH+2R?8m^%%IfZNSj*xwWORxcDUfvt-$m6)OB-3kwV8
zXa!yYv8Mfw^<l;#a-a?PEf>`a)>iikykBx>!Hbo$^O;#+@*>FRoMWm({68&#lA21i
z*uJ3a?sv>WFcan^a`rq~_8tl)=(W>3-m&69K|s20kLBm03m-y{1k9G6SmfgeN`axd
zi2RlQn@#w_i|Z+0hk2~yBtyeN;WcrFYg>I3=D2rUe<{vqgZkUMrc*Hx7ezv1>OzI@
z+9?ff9fsn)tMA8LtdE{`jomeuZS4U^SXkKQ;OU&9v$geFm6cHU>6}Lr`SXau$-%~A
zqbqc+`&LLqRaSN4b^TLFol8o1c<=M-7yo@@wV3~>m++H&P`;y%6I^~ztXyO$BH`lh
z^`=E=6;zB#RPmiG+vJnXOWuWEU!~`;fz-Vj2de8ToRl*WjbhBZs|#<eR<E9^{nqo}
z^5mqC)X=tRrHB1L?xWOd&ap>pg#(H%Yv;K8DWSt&6(_Gh7PH=RVwaX#c}&@>w%&H7
zU*buxm1m=LAgm2K8>6f&|1}^e*rMn5Mb4LrgQ1D5P<3kJ52B!{Ft5pj&hN83p5rk`
zgBRv*T_7$#>PGk%_kLd||7B-s|KjM_wwPWm|EH5flhBMB-w%%WyX6V`HfK%?A1r=4
z6UfkLxckR=^YmK~ow-77t*zw7=ZFn(s;Ko9skrYwm8^FHatFd!zD)<Af@BXnC=Gnn
z4r-_=^<OkaW4$61QM@p@!(jazk*X7tZ|_gU^%Ild%RF@pC!SD9JwCk)?mhVda>nYp
zI^@5RXa6Jz+7^Rxau<z=yvZ}8)2PvTbC%FOT<hf=3KQQ`(vp)E5JWK&GY5hg_beaU
z7#sce7tSR-emESW<Q$P2z=8#uob{DljNJ|A8rFW-l-It+!P}!R|01+L$yg&=+RYNI
zT~JBT_VM8}f5SVp9?^`)Ilgbx?u&1}C18Knw#&ix(xo#cfpM5$d?s9omsoAKW=nVW
zNS0LV4Gx==@Br`Vt-j~!Ppiq?ZBwd+eck)cS6(O;*n2ufOxtR>X5sAI+S{`k=&k!A
zd?XYy=|36&tmAt|%h(sTANLWZ9jB02KZ%CTh`n2reCqB;#x6_GzB=c-EiNcD^%WuS
zO;m9$7_s@K@WWJtT-O5K&spbLj>}upVi!?P+z=DpR=2)Evx(~9u%CAm)<t|LF{cwx
z=8a0@B2O7iHuCc^pK7avYVG~S;a*LW{D00vPqA66$rZU~JcFM72QIx=4_>&uPO1qo
zLN}1PM!7+Dy#9ov6>CdnzWe)=)~{4Z?QdCB@6<VAXIZ%zn|Sn@%Vg!g_mq3{qoBqJ
zc#)|3<>a4+7TENuI|JOw$A1tbdff<B3?=E&G%z|A{WivO(PO)|cANwA*UGBhItyu6
zy0~5}))O79>+*s~i?p*(+=HtSGVy%dJhM+ocf8{ll$dT`sd%}oNlG2Z=>{qw)2ml#
z@N0-E+G9^;r7m9I5SOzx826s`9Rz1F>@4;+!L2+Mg*WU<vFzig3QJ6#Ph2T0c1p<?
zCkYm$-c_LD#?pL4|3B*9vMbJ@Sr;Zjf(CbjyAMtX?(XjH?(Xgo+#Q0uLvVMO!Ciyf
z0Pj5e?0wcbAI=YW*Xj>FGxwUiy1KinuIjF;rA#V0^2oZDOY3J<mWT6$iR-(vZvU}1
zUleJ-yT&z!C+8X}{{jA96o3|rM}d)-Ehp=py)*68a56}9oA%Hl2YLMyS6V}7(l_4F
z`^mVw2UWHGQl6QJb9Lp-e5O1Fxol5#_&FgDYaTPp&k~&HZ<PB_*$fak((4p^l|px*
zuFgfbBd#=0mg^H*6s4krn!q42eF?xtYKa^B^jtFRy(=S_%-(wJ;QqwQ)5=reOE$PV
z6<wR*>`rN9!5I-x$oTaLE^&d|9s+9w%G)j_n_j?sXY6%(Yyfdm@XdUxH^~ofY?1Ze
z;J@Ob6Qml048bMh9kcjOKG=&o`3~jdY5Fz(ZPDMeeM-OXt>is-_aaI4!RvdZWnAd(
z=OMkjW5o6_0ap@WJX3_<J9%}qUY*6fr&)*?9q1SVsh_A}K7R5H^?TpD`w-u9Ak+Iz
zj2vR+<A3iAessQtBrCV|exlTO^z&mlhzVSiJ(_pc_)U+VvGCH-fr_|@Ia)>zk=}lQ
zF1X{q=N{DE;e}Cf=e<F`Mt#F+62omCzTV)M@lh)h^&|O=JDF$R<9;yhZ_qe7KeWF^
z&~WnpEZz%QpJzzL(*1M%A-*>WYvv1&NhKop`S>I)Bx4_(8``_~+ciXajdRuX`t1lj
z^^4d)_;fydpQ(&lK@+@y%2-7H&iXXxVg4a3WF{>iMe39~oXLNNZ`c<tPj_Y?ZyUp_
z!fN~jtl}s9=6Bnlcb|b`jd8}NI4AhToc#Uz?;!p;i@5sM>k;JlAC72^Jf}tG<+2#t
z{(74m-svAQ{QIGfTdS@I^&9+u{6-vKq~pxz34otHpQN1MyTfqs+B)TrkFR&PyT&xi
z29U|Klj1q`H+=7YcH(0q4z$^O57cQk!MwzA50V^nRr`87>_<TNP9W9({c`O3I<;$i
z?OyptlLwaiKhyef&IQNH__tfPxB@m8qTjp84q<6G5cehVd}VDSUJb6dY>i##5-
z{qoNW+bHFJMDN<=gn(@15Tt31@73d$Ooz35xGMGg6Q=;ji{L-RH*=_Rg(PRj<hSqU
zmi+yi^D{L%yv)<KYG`%!=gY&>%ifCjO9e<Br)Xmajwy5F3bPb+ZesFS|Mo8GAJyI#
zXnnts#{CeZ_3Qw2$RURt<FVAAgFmw9M*7I8?qSQ$9PS3ewTF66=Rb#s4IZ6Vg`7qr
z=B?MVCU}1vk(hZXfKR{e5rF8NbbPIP+pqk=-d0fK!0nHF94_9+LU=pPL#6o#ebVKI
z7P2W0w341<ngq2M9wQ%<AGVN9-g_hcQTMF;eT}N+(1&oO$~_r>Zu_>Ij9o5{9D|^(
zndHU8D|=<iKk!ASL1QKvBIVyo^CyzDM)p|u(3j(z8aS7IjyyO15_0-scz!RjOOO-q
z7}EEN&r$rfCX~11A)t@LKenrzWcMeyGP4uE58RAX+#b5!pnx8#3n7eIKv)L-2j@Lb
zw?yKwEZLvi=;6<ac(2bGJ!gZBbH-NxDQq%y5&Q3gj^2Tw#!@`OJ|?QugRP-+?;jKl
zG0UWaN4t06R7anG^VN@IJhP)iul@G7@hc;&+WLOqeE!QBFpWc|v#L3;x6jIs`9RVX
z>%H;z*WMg+#4u&rrujIicd7-p4TJ-cEywl5?ymU;)c8cWX3CWa>gKO^{&?nuy1!Zw
zowavLA7A^&@pJcO$Yl|I*}s+EI)`J!O!pP9cP>hII>!d7nr0Gn50>QqM~t$s?>trK
z)jRT$@TG_M-|<t4tEZjTJnSDa?LOrVGy2NWI}fVOwD>kI&EPQl4g$`pv2U-FI9smo
z%6~dWNA5s^K=#kr*b6D8E85JRPQT4gkyl$GSAud|mJ9-WD?hihCwr#rFK}Td#rfr%
z_I`41&RA%9kFCGe#cY|c&VMV%;cx3FGUY!AX&mindXaAjnDu<32kCg({;XL#@w*3n
z#AXp7KyG0<T+s|IHIR*BBR(FrZv9iA3UTrX^1FLO)C31`b~<Kd{KF!8@R#ywAp(Fv
zwcg=GOgkywr$<|x*~;%{?H_)+5apJck4Q-P2C>itT-V^kPtl+@XN-y7kXoJTgY5$9
zI}gV+6!fwZF1o(<vWT)0j8E}G<_pM}14hNVr>!S%v_s{)nm)2$yTiXlVH%H5o%M`c
zpHOSVSGZ@Eubq-Z<QoP&e9d8_!@nd5=4?7dIF1hd^!Oy7re>#6k?QhR_*Xdwg)u4v
z^Y)LV0h`h2pdln$I-~^cXq9ntm!vp@@-S#fG^v?g4%jn>Ur_iNC}LKHF;O(2!{`3p
z+UDHIZ^a4r=&0?ffcza+4?T)#>yB_@{BOP690Ld?py3T5dJfUa>EBa?CI?Ojb9&W&
z|3I`eF!;r+X_)5R{}3oRBp^3?i@9JrG*gR{9SvpgKPFIGL2R!Lx;g=Hd7>3Ld_o7>
z^+eDgI6hueS-DwEyZzHX2A>_o2l}j<vYk*k%cDw7<PHD0wfKR={A7dl=vK?_aYswg
zeT<{68r4Cws>-%{40|y(<2WWSQCA^0g44R~bwch}Hc6qpC`R6Bp0_;&PH4=IeX|9n
zXvn!Wn>9F1%)tTdmrT%(6h?g`7Yp1KaZG}|@C!nag6OeIlw5uZ?p9x8g<@3|Ei-Q9
z_e&-Sj#s6AvgN#%osYxMp6Cv~!+{H-u}8~Fkh{A4k3ob`=z5zt6Yu3Gxg`NxMeqFf
zsNii~$?&m=Vb(m35_Ane${(CLMYoJ|zJ?I<QyV+(+%ThM?!e{>&9f+=txTaAos%Y%
zPR1My?&;Au4|sQv|9~;LqArQkN5mD+@A9P(g~>Z$aMN#2Q;bpWj-y?}k<mRe`^E&j
zbR`Hnrk~2kzSoHPTb?;|;}s5R$LK2L`)xEOmGfc|yxr!W(Z@|vmf_wUygGp87<W?F
zLyW<vO|C5ppPo5eeqWjrx{z0OuQd(`q|(qK*V-n=N+6T<%JYoEmg_}<3C{3yd)V|x
zob+UD*}R*bX+%}G9Cj^*p&JyBN)OMiVg^~lp~Geo7T}e29>+(}sYe<IYuNCAW?r#`
z<>}dQ9b_MfT)8^QJ0u6BwM~Zod*mWGled5l()mrL#dEtla?rXK#6N`e#FFa5mCHNQ
zHyJtN_3qUsB0TiFu-SHh&79$Tt#i8PM;p@*^QsNcQsf_hd&~T#@fbj3k_05HbUQx^
za6I8}TF47X^GC3J$}AknxA4wEc@$Z~(~^w84!9B}C|Ot+pD5Bt#8KOiz3V06g7w1@
zxFUya^aFg|kR>@okdGq|^w!_8%kd#?WI%Ye#~6}^-iT|L+$P?kUyKhb2K#M5eBa(M
z{atXnm<{VmaAo4^dAzvyY?iyGyxIn1+Qup_vJm+b4uF(jC=H0HMh{;V{o+E22Q03T
z>!yy==v#|ui&#cZFK`JmNN~~*juN2{w6&pI7{6XeGojEqXwYt<2F;1b*3QHH=4j(l
z_8C1d?#`-lj`RJX0d23t!gW9BUNp1D@Pu<Gd0zIs1B+$(P7ZzdImy3$tB<o|((zI|
z6X2!gW&2DF0SJ0Vsx1xjfH(|yiCgFs2BWBXT5SDMUF*Ri!u#8-2H&0Oy;F-aShw43
zFWbfj&#0=F?C+@Tm$^78(`j)x(@zhp9<V%LjJDa(C`jT)9f9u=iq^>H&i3}55$4Z9
z&(4u6O*j*|CiK;f0t_@-)&S+aDupM6@<1l0ut~?jP2tj)LpVtlbT-=V3&vj`&u;=c
zd$M;=UCRF;o`*hwlexUXfF9MIGk?cq1FOuglYf~3*I!>oPA8UE=06E%IU1f{d=bBE
z$7H?FF39uVp0cmACHd)1=oqyAN7aEPzji41W-#<5nK10#DK-sowE2Y1j%8_|RPMUJ
z=lEvuYr=dni^@);O|q7IiuTo4)TlZS2>W{;YEuI$N?@+Ck4_7mvFOHGxF9qwW$y>V
zZNV7$1_%0sX>8jtcoo(nG5)*y2r;S|-&{W?r+ecv%q!vlyu?xFhvUCeic%ZJ9~$|n
zB4BYTbj@7GF%BjAl?CG$8OAT1?SW{r&{MeiPVgOmvbALzxrX^46{TZ<Bh7QJf=S*y
zQkAXYY=e6v)f)^n9mC1#Fry(FF$#f`oK1ACQkxii4O#RGKp$V&tXCJhxWJV755wLS
zV}~*S-<oaGIOsWHcq$op%*2ob4)V9lqU@<|+1<!tKv2p*Ub$H6E#7jRQFx~?1Iy;W
zo_A=|2>O#X5)#jENBW7V+~0X>9r-?UM!e$1R7|=5>`A`0&rCqTppD_)5c+Q>*Z6h(
znw@wc6GwTPoYJvGnVdE%;-ZI2Bah)VoE(yxOW?lP6or{YwMm?=T^^v|y8<OCw?}W-
z;eF9~IL^4x70w|TYBNs^I;PmNK=tF2U-d?@KVbMhX@JY?f`FVCe>_fP7#;;&{-8Q*
z1q~XrXP4Q6{JVOzZ*qL==2dhwQ?lWv7b*!q&BE-nc(~)fID4>egHGC4z{FB&iKdgy
z@b}Jj_oHGQXWze?PsLdh5D9kkIk^*kp1B6Sc!u%VZecj=;ckW~N@1wA<k`${P4liK
z^I6#KP$19HFhVzOp81K{x<ie;0bug{ca<9EC{GdhpK?k%-^5XdLd2L78-3+0pj@lQ
z*~KP-+2h@PJH{AIyybYQClR8XLYWH?={3|WK)&%Q<?Jc`PisN>jbg{xJkHNdfHxCW
zBQ{PeGViVR#4c@0dDLMxaUQqq)%h)nykb&F8_R)_W01RZ<)lRTId8X3*T+vkdoN?N
zwhk#S9(e<~S)zXHptqW1{>ZG<d?qF?U~38!eJBZnQv`6?^U?H*>AS6nsu0tj)HT${
zoZ+PJhCP<g^4<$xB&W!%d*)Rx|5tTv2&>Fe%m3ZhTa}89{F)uT%d!eA<0)*&Z9udP
zakm>A>LrU!wdn4cIqm7wn*;HAbs0YU6tm`b<npsDmlHabOpet(_p|=W0eK=%59W@Y
z!jl8N(V1_c9x}|A!AsgRnrhOr|9Q<Ltp`kW=KuBH|4qPucOI)wzY54Mu>C)q_m}@)
zJdnWyg!o@$xO`!~3kpQwc!8vh%_3=@`Kr>jP(9z2rf!!fxQ&Z%F3|8nhiuP43Fp?K
zn)!H(f;L%bFvCBN+JctJDXf(|C&wwhphqi)^*X(FpIc|S318uRl>E#Cq?Pq<hea3j
zPatWW$iET&DZjvWd5Zg%Oz+lP%!{x1(tS0^cB#qi6ttC$<y&UE-bm6+_r*~7(xs8@
zGL-GsH++*YdD=+il(S=dzKM#&c9e*D?IB!KE19IZ^2>)<!@|^jQ<+rle+*FYI(dXw
zs_!3#ukwp*f{MtoODyf)3?X5@``W2rC=@#14lV2b2P2=f?v30`kW>a(Z@IouYQ#(s
z$V&{M7dYWsXeyJ9xHn_yls7VQjRF5?h6(LfJ2-jrJl|w47Em`2C~AU|+D<F6+)mGs
zc8)X?OmneVPgaujf~;qy2RMBtdGJv1emTK}VE-?s>Iv3=yKIkGRd16{WE2Dc2YopO
z8x?8S15WMl6R5UQURyYkAEK990*gw|rrlN5;4dU0azb%PAb%pD($3ShTY%|9vyc^q
zYKF3xeKO5_Y-%ng>!1SwU4Q6jK30(nW?EP2qPiTB7RC{>TCH;~<;KEpMCBJIKV?1^
zt3c_|6udeLcr%h)owv>F@pV8b{Aa~BF;UPbgb7g4cH@gdz^f!t+FjWw!>bg#2u{c)
z5Kp<d)^~2BXKhPXp^(0}c5Z;f^VS<tuY_dKXe8z-X0NNDs6!-HqPBcw(MmpRyStzg
z*lmnbh(iSI2odY06ast<v=au))o=0>h#5GHMK^Up5;URkf;dluI!VJ>Q6x5$Z1zyZ
z;6n0gSO%~Xi@=q9;>d-k`6l%g=6P{qcoK!P+#UusOMUWo`>d$;H=MU|@h{}393mZB
z8A|XWZT9sI<O=TkMV7VX?YKnsUp?jh-8iSAgzRiP?^eCs0sY#tBk{CS#|Z9^=tkD6
zmY8%&D7=0?aJWK7@R1nc(&<ijJCfi`l!By*Lc{?YG^Cj5-G%P>lF^+r+wGklgGaw)
zzsz@;gRpJEg_4p<sx1?-)}os9e;+-5$c$t>(snZGyUysG+4Y=B-x9~3XE`@;0k7={
zP42o}TZy7EBoVtZ%WBhdDUc9JuK5s!;RBUWDxgBwFOIKlqT!McmL~ljW;Cuz0EH6!
z|E60yeKldXv}6?I=Ck*Y&SNIcF5s>v^eF@9>r1=uLo(t{uMiKee#}%rg}nU5Vfu<@
zAurx^yw~}T6)<U|mtc7QiU>zuFzswn8&R7{W;<PQHQ~JJ>`PZpQS~|B*(r;2`lp#3
z`9Lgbsy+28uL(Pb5n}N_GV4-0_Jes~M>PCVM&u?vG9~RfC4E{+1`QnXr==kp*@~+C
zvTDAs{vpTtl9Ie(&HP*{yiJoett8^NIXDUricKG`PM9ng^y`hx!1Oo=lx%ye@WCr6
zzP^9YZl(F$D!b>&^Yu^M_W5Pmf!G10iqsK(Sh(MQN8;V^%s}FT{wQnoAFe*a^Hoc|
z*@S=}ni{5E%Ae}lB!0UKrp8z~lLk%gl~uW%^8hc7Ydl6x4!+?|1_{6hE>SVt2)gr{
z?W+^MAKSuzlPP96sTr<%-z?%qPyE4(^BO1Ymex1;IZyhC(fKKuy61@C4k$csi_O3e
zsSX(r=d1i&+c@P{U2v1x=FTc;{ZU0IIIDnxoLMe1*M^JMg0Th&?*D?0R_>3g@G@BD
zN!I<x`(hgl&I&GNg6~6YvcO;UFB63ZbR;m3PYz&3MgRUSZCONQ(T|<!z*1{Be=N~o
zx7d%DnAIi$7ZBkq>P<;%LviqwVd9QE=qI!<HahNqf#!-Ke9aZ(!zujo#CN#;67!)L
zwnvh@8a!Dyc$Q4wQ|Lxvu+K@jl}X6oV>5jp>uh<#9F9On?`}=N5N8BGW@4Q~(K@bC
zWAB{RcX>qHODB6H=;>9K)lK#Y+9y~8Kw_5N(Nz+|oxx3QkkpsqbXgc<nR=m~VG|y1
zze(T^e>%p2z9f%}^F1&Lz0Ry5_U`Yjpv6=Xvyy%h1XH*^)8Uf#kfL|F<ZR#b=czd5
zNN`;H<xkH7hnbwsg!<)M^snhU!8<XDQ+LAsPtot*0;Gt6iKtw`XqYs#;8<Y(httW^
zXD{4$ml;p+i7lzI29AW=vDc{>3DNJeQxQV@?}R4m9Kgq{SU5lBOb?1XA_!yubPE5`
zK?kV$ujZ@7j~GkxK1S+upB=HRzwIi04wPR;sjTS0Y9;1_S@@Kb55M<uG!B>Gy;T&f
z1Jo3gHjD$H?0F4$w_P35T&gUYS_`DVgM*?bb-dgq%tT5#{bWMf?86azllySxTG{Qx
zt3JmEbR!lvcTuB{R7P)mufJMuky%lZJ7$)#HGH*aFni&5v)r6@$FOGNuM}r?m^1t*
zo7Gl{QcBEflmHdAe*9%VY_m<$Y?T-e6TB=Ak$t<C$frU<t<bb-kmE$pCQmpqK1{+2
zvJPR0*&j{iExg$Ex9-4um{EhWPa|1xm)F3X|FR-{X;`VoV(~G)XEl|PZKl9>ZX4_b
z$?oP$TC)^);_V|!!G~Ap*j>>{W`6BdPuj3f*f5>V{C-*^Vj@OCcKy>NjG}rB#hrZ^
zm*s^2xY!*($*<R~cW|^g8LyEVtR=^;I8JJY7HdkAOLF(Ic*P%{-Wpg#Ojc=}z**e=
z&>Sn<rKsRNtWSvT5ebX&ziryo>`|fGBM}ykj9<AVyN5JEO$P|h0Pr)w`o*}z8W8%9
z<?WX_aFq!D*cF?;sh1oW&7r19%n+#hd-Z)7sF|*r0AA|{Pg!hK!deR?Vq&x2<E0*p
zU7dwVrQKGDRI;0gkHydzV1^mV&$*8q?R3R9Nwh*o^&3X__f_+aU!`gAS{l|cFL+$?
zT9U||Op3cBIDMPu;SwJH{*jW|(pXF}iYN`1P$HKhk1}2b7XJ!mKQ*K4Fc)%>P?cu=
zx5U;@ZU2d>?I21_+H`0&WuPBCg2}=2zy~~7f1wTA14M5nKgxYXuCwk*B$Jel2KJ*2
zWRT{m9algzj(EG>T#W)=tAC}@IkO3+1&12SJ;h#w84;r?ajc%97<{Fajm3Kp3$yKV
z!fiE|5pN+IzUrIQ(})!+eA@kbnyzSMLeVd=Pv1F0+&Utmp3;Mx^bF@)#&E&MYp|#%
z`*c)^*9tyl?SA<)91CIZ>)kU>4}ZZInJBI9Q6H7mFNk0+fu>??l?M&0JnKA9+NoWe
zchf+d{{7pn@Wt5ULM$X7?F@0~PA8eTr}ICQsoDMgar#t4hf@D+W8NYj;K2{$Bj^{$
z<VnWhy6SnHMCRvI;ZM)3x<rLeG-8GmZbL;bi&8pSb11joz(k<V<s_6j^AGPoV8Qoe
zFAY6PoZ(fj3{Y;##L7xH8kc#1oL;V>*o%txEow`dqlkYL?d?=HUb2{)fa8=WXS!rE
zXz(m8{iPbG{*7YD7)vca-;oY%&Lskk=e^fSIcx3+)+gj85~mY~Wg0qV!P~hD)5yf&
zgduAWHM%Ziwk_po{Y`Gegu^*t9*s*FMoo}-lI>dt&~)}KQ}p6LHd2t5&~CukU+8aS
zWp7lMLxtwd|7*{bkbmvPP;S2<v+X-UX2pot@~{S^Onh0JOW&OmV)Z80ofBY+e$R$e
zP}D-4%Z&u@gL*IWHO1hiJkKU`vM6!cszPD2!BwBh;j}7`-vMAv=d~@Z?u5%y_ewrp
zCr8*tt5q=a@XB(R!vbbJD9f1Nq(FJs$~Q7|y8yKf^wJZ3G5**vTd^}-d&S@oYyD*^
zl3AC+(%nmacYNea^=U%CVDe<Sq}`?r(bNq^lESWSapeab`mkR%1z0QKb6U=bzWV&>
z?m+xE0avp9;&MbbnVtXXnl&g-{W)FMh-3U96(!!&^DHQO`r{rf!PVl6k2So$Owjvy
z+E}eu3Vc!2o@+zhOEL1&Z(MtmcWS~W#nUgQ`;8P5wR>&iAV#kEB>l4l3S`wk{6^@A
zooMwH_&;|65VaGw^RRd=nW$)>rs^h`mhD&;)`=xZKMbB>OPM|F-zgg}Ss<`s9$Ax4
z*LIh$>Y%`iKK6-M!QknTBbpjbs`sG}0(?IZmu>Wd#UZUQyeF0!YAQ0~%#;XYFxTil
zT=N2KcF$ERS=1Kuy!^Qh=TlQ;zJ2$;RV8`_&yw)<W^(Nx-<N}N-58qpWvWNa4BgZ5
zHZ?{4f4*zgZs#DIOBmDMp?u{7jTy!cFaWIu=3}|wOx?4bVBomv-Sos%>li9X*L@=o
z6{gHBqXz$l1*J(CBKW1e77+S9Uo{sM+3dx^#<0^HVw5<2!#?G*Nr?V39LnV(w(ES`
z4IP}2(Dvy)%>MlLD^x6wr6~=8lrxq-#TD&O1VFRA?<DvLgj2T<hzq!n%3P-t#+l9i
zPV>p9d%AXoc~8)4vGtwW|AGwBA&fj@t~%&GTnXhZYQ5|WusOYl6-zjc;MQBVxTjXp
zyYJ+`p<$2G1*HoX>1Ij#s_H#K<LDKK#f+Fsno+-*(2Qg>NDtjy1FZ$^_w>pycSUu@
zSKZfVBr-BsD>${<Vo;#`rm6;SP+j#XqOWVc@B!`SVfs<^eZMl81774|)zRd7m~_CO
zjhASxnUyLa4#lLp*$S2oM#V?1)p;nW0lCDW)2>YD04Y>=bUaSs^##;0GT1CQYQZ}<
z32(~&NvXY>euqJ`xyu1jTTyeq!bX$=p%Z@WWL$RUyP#U?rKP5{yr8S2A#I7z-Vs%o
z6=PbWvg04%t;c0S+C$lhx`+-21=W!!SL2xTXt2Uhw(E3^fmc={=74_P2vjh-*s=Q{
za6`)$96Nb}&M<_uf8WJUDy%8&#9Qdfh$DKJ60*zgtc0;*A_^_sbANtWM}qj?F~BAq
zFG11O>W2`pWCzLgz5cLL7Z$G_-kqY9y&jin35tyBv}U-ew57!~jKb_{!dQ&PLI>Hx
zC$Waq0t>o`UqT(1jU|YDwn*%9sWGKi>R4%rO#6k_otyFzbI2HM#^iteGf~~6`{`_X
z)2S&zM3Ifle#OTt=*<l+Ia;WFSTv3ZLD4%@Q<p~3I(&FW0AboI+ncjJ8Qj&8q%fx&
zS)nhB@D^kx#8&km&01?~S4pAai)QKj@I0d(4^FX@3uS3HGz!max2Ok4s|K|caz>TK
z9jlC`NJnBpzR0(|sU~-<6n3`+*i|>$e6hGwd;l2I39j!|MrX5KQWRmZ*@Z6FhH?#{
z*^?DGz-FS8#GaF6C`Rhbj$X0!uzX7of$#l2n~iwGnRGwb=hu;0jq<a!2|KzcRPw)Y
zUG39OGgNakgBxaKtw8C7`T5PiX=S*LXlBfW@{td}#hh!scszZ8wqWh=uPKBGerR)u
zozxhqYzeQp&410hhBn=6RIHreiUv2kk5^6Z%wHmEmzvZ=qFSG9Y2{|*;}=_bQOt#Y
z?VYn-nm+yI(g)r-@6i>?bqH$txXxWb`V!|8nSR1YstSSrO8!)Xi&TQ;&2P`!>v_2c
z+6w);&6Xar++~m=u?>9AG(wQ|uJ9G@p*{0y^if4CQ3mj;`Ly$T(8~QLReT~+aQ9mt
z>3>)0;@;|X2eoEQJg?83_0=CAZOR~HH>uAHJ3Yr^N+S>F{|EZ8KuZ=YAV0DudfkIS
zkj`uccX#Uu5@mb;Z|4jc072hd)oNG&$7isvOd;Z>{;;#bF^|?%CfpSpQs7Py60)ok
zq8~>?z;eoMr`U@(n-L$`ZU50$XjnFS4*7;tch{w#NiJa|cYk3dUz7K=YzO9L_62Y%
z5QKtW&2Go0)>183z>9Y(@Ol1yI-r>M#~!1-Z`zXlS1@{4t2Nmvl-*QS;YWb&JwWA|
z8TebiGYAdYTiN!mDKisba1k)bypyb78ehlbr8dt`tC1n8k*z{L)6w1%`<|R9K!7eU
zhgMs)1G*s+CP3Bg7Mnjc(CrE!z#)HrnUPrL%ykUTjiEu5yN_{n{0EUd<qD>UONZdd
zz%Siv=hn{>oE1h;?(vE;f{d?bdLCBiU1MW}*=4Mc9)5Eu9qerRFMrQ`7v_9!v;z!z
z5&!reV@THVYEESP2Wnp+dXiDz&bYmC@t6H@iXp$gW@J(jBY~IfJWms@z49w{Tbc-S
zHqC9B_cUNl$GEUrZn@6Gcn89c;Pzt}3VLE!WCkZ#E)dKzMPyxI9xwD75N4=|(k;*A
zofnyviG~0GHivG%@forfJT=L>E&i<?R$;mZ8P9|dLwU!S-aL*mk#+o7bysP)PB-!7
zFmz-w+y;#8FbXynw!K91$z9>hJUBO)$7G-W`xqwA89lx%5Wd&YwBuY!x2V2Z-XVEB
zcul`nd2agIyCCa#BS}rbMnlK}ryVtn{Y4V0)9*KUuQrvD8tDUE#z-D9V)03~)(42#
zLB9^XcSbdXjcN^UtztIRVu+G1)|uKL1!XF0a5k9MoHZI)@Ef@*BKm(=Um*;k!SLho
z22voEYT-||NP|2sFv1eRCbrU}b{D`QK#*$2URs&Hy*bD37C~5I^qLjNXF`(5lONaU
zKl+~`CdL5F3~zyX=S3)za=#;V4%Lp^`|k;2KN$%K<wrs%Sfj|yDY_wp=A(>0-1=*L
zgM3HG9~z9V%!FDSS#94r)_b{olVJI%<@d$-w(+^+D3HM8*SK7Mxml$_D`{KPG5yor
z10H@s2J}s{Cvqh<HDNOfOjYl3L1B*d^;PiGG4E9jzM{kVLb@BeT(4~coq(q*;4YMq
zijP(r4~#^eI^2<+bKt?s2S?~XQ{Avw;ndh3MFM61I*eO)2Qqr6#MiSf%X$|AL!A4Z
z`p9m!7DRk_arz5B$wTA3@{$u?(_KW#byn2NQ;Rorg<LYrzgLh!!exl5=IE*BXv2@v
zVk6M8Q=Y3QTWl~zD3tv1r(ONwBE32LVDmuk>1wz8LA%OZ_VEaYYr8joOf6IS*D~vP
zABY(&!0n}z!3HsQ>Jk~whh124W%)L&Vr1b{u`A~_sw)QA!|7XA?Q+3|$~*UvK?P`&
zu6v3gUr9bf=EaW7;~d2@v}K$k%JpCQJO1~~rhopv(Y}lttBz4u2oVv@7X-Kg{c@Ti
zyde5a_b>`N8AdwTfjC-gq;8dd{&*`lyUS=j7Zi^o%RU1NlU>|ey5^d#9*%U%B?9P6
zwa%-&kQl#TnEz#TX#Y83(+S$#cy6F3WFNccjO>h>xAzT!3%2J<hZgjv1Z!l#Z8xHj
zkU3)^d$V(}pyl8CI~ZSK1uZptS*i;@>NS6z2Ij=NpLq#P@Y8-&mz`>6=vuY^7T@G@
z!Ki!Ft_GM^)`l+5q?If5-%j+$dkJQ5DBm)hVzP+Mko+3(6p%5%W<`z4QWLP?`b>ab
zbI*qdk(BAYYN>bcZ@A!nR8%jNFqnOXe{6D%gFfYIGqVHkn7=)uRfcF!sT%3J-^NNv
z&pL0C_O5`DyPxfO=lul0kgV@4`M%c<y%}RFGNG==`I6xtKCYY5QM-;ggPEC@+eynP
zJ!ob-zW&Fmw@kxvK6#{}C8b{rb{`_-+;;rK5k87oxm{YZ){xgZg9=9JkY5V?Q78h=
z{(eXrf0(`J7lY|0ZXKK>;BHZRzd?=ik$_)BYuNYtz2)cpUqe?mMA+XR_lHRCa^R<k
zeM9OVEuu!~S;)}9xkRI`hYb{@9n?u#IuXMdxfh+k*8aZG_+n6??(p1X;*e$n-rASN
z(1F68(9+6ittI~<&ssn8SucCNeJ{?QY=n6&=W_pzY{aoxa?M_ujkSx{#e+4Gb0k6e
zz~)mAycpY-G)mB4Iw1bhfDKi!KfA#g<`5b3f<*G#8Y_b&_|Zk;fD#Om!7lO>dXep@
zuYYp`t3BTKif~FWI+3O1bl6bPma*m$<iE*DEMuYRLU_a{${-)VKFygQr`y>oAm~g#
zUYoAc%*P5-YFQIpuMJgf0VPOWzw9Uz9~=#Bf^Q~K^l8US3JrmwlRY>o6;+5VxSLg1
za#AzMqXm&ZhVZ>R6>HXxo(dMuJ06dKmq{~6-hazi!D%n(3$@yjkuJnDEYz&|F&BSb
ze2gI~IRL>fbmiNlv2?K4$sYKg&)X9q_x#w*_#srH%3m7?SJ2J=Clfe3ZYNqYu3R*A
zRZu?I`8OF+{Pa7P`LAwF#5#{UTjEq1XA}Wf#`M54mhmG&Qp{yqx?2COOaExw!K!C)
zGqk9s=-lPW%fOFLmhP^tUSwrFu+UtDe6q7?Cd;#$-JR+S9;7WG>;Ur>4woA&vEq4d
zC}mflWApf|?81MgKjCara61hP&}d{xpEO~3$4kM8L8a!LeEmCO02d6#@{CACl-BDi
z?;NRY9`4W7`m1zc*-?z*?)*DX<rvI8<A!gsY%S}SHkzV>*KpAn>4BX%yIF8xX#3(z
zDG7VoZZQ(I_9YwfV!f3enXZ;rY#3U62ES^m?5>Unfa4L>9LffxuPVwqFqRAG;r9w!
zbITB+`Hmg-61-c4qr|eX)t054ZD|W8h|0i(0uq~`P%Wqvtq9Fke@nQ6Z3f^lCnzCk
z&PV#_R$A>V928tD`5LwWGAyfD^%t7wp_bTV^Y_>(CRfJ@A*7e(4{tZ(h<<rJAH8u*
z3iaG;W<S{KUgZaUF&rk*)D(=n9PV&|Te3m*y&a3gfVi4Mx*gsTl+sInamxV9BAHJv
zU$yT{-z`N6k2#Lj5$r^<A@~9>?2>yLISk1P?s<qRsZY5LcM+wG^miFY?sFReHW9wf
z`_KX2@vvaATpUBgfbdB0JKJY~IPR}4&1)D`ZMUU?h5{3vR<nAeeFvw!A{(pA{=c4I
zLpfYtbGI1(b?$J9|A%UpQM+n$dK0Lot2!7fI(fz3?~qt=#m+)193f9IZAcP<7E00b
zbAfS(HV{$0+;inb?1}0~tB>CB)}r;Nc%<||uWa@+N=kdqpvV)3=ChQ-?KxMqrOClp
z;(ZP!dh`wUh%2IBm15y`z~ej2a<?xe5<vkLmXY~o*fv>hI;1t65?Zo*N;E%oAsj7(
zi<L|%<?1jnlHR_dz}2Z#qsqvkss3?q4M(^etbY48!GOW~_t1N*NTy#$rQkph=5MqV
z>;W+E+G2P@WPV`n*-XA0TQUS!Jslu?8<=kpj18BHKO2Xc3?IWJ$<_v6+Mh7p|Fs7t
zLgul@ANlAwAp=peQg+X)d>lTNN3j1LQyEwz&pv59mRoH*M+<yqxe>xVw9}1XRMoMg
z78JJiRa1qk)~Mf-)IfqRm1s>P8>%9jkHPw<lB=Vz`N*k)Mc@UG)qEJ>#Y6na(Q3*t
zZp~LM2OF0E>pUS^hGU$tzP7yDG@&J5580wwSc{Ikn5KOB>!#bF*HEeda;ef&g}boN
z(+9F+j;Ym>Y5&;xik~>pN2r25d{o7PsS1q>E=4N0L@3<B6B4t7?T>U7hHz%HX@4_X
z!RuLMp_OMj^RR>T>I0KRm0r~{B5j<a%@1i(xOyJvMW1w4!r1ax%<I<&(%pvNRSu@S
z1{Da_41Gp)<o7*)3$P1fY-`6<k<?0UZzIQCIz=x8dyMK>Op{$@OL<OPgvLxf=Z6`k
z%{FX}K0mHom*FhrOzSIQE{773R=(u}k}1ydz@U2kN8YPfDm~1S(<^Z6g>eEYPmFW$
zkF-^^rjdI6`q{iI(T8SQzv=kgPjp~qdcWVnj0J^`QBFGJ27mN7O{B5=!pP%(FILhT
z_y#&OOIWF#gsOm{q)2o4)JnKRIgxuzXRs}|KaAkB8R;N0S$wLfy^-(z9^LsSLN_s(
z0?&?-M;GWUPfsd<M=4kFRZ(Q|A)-Ix;uU5P5V%2cBb@DoNh8~#Cy->5=QAZ@`sYLT
z&&{%lF|ST(R#V1(O){V%e-P+@mm<xrLw4UuZY|qCHZT0iw{qo>LO02wX7M`>4oF)A
z+;zU;zDv<uPci-4D@%s|PZNYYVGZ_=qU8up8Z`@mwR2sMT=hZSIu*62Y=0<Pj#Jo!
zQy6h%#EP^?HQJ3ZJVbJ8xJ%{g6dbXnG2=<n=tYY!gF7m2rwjT}i2Dz5o9$!A--W6%
zcBAH%L0gD2Y6jJ+Vji3o;!k<u@TS4d8@LxqCKYFU9e~h=u~*w~tV|kzN=JO_O&a`I
zgP9FWmPNJdeqAP5+6?D^Yi>t;87ywWQ@AdsBUQy_!H`KlGLk6SU(lG?4M{TNE}r{d
zQDZ)8q)_w9)}7F0J)b=_w*_0#<nt4Ga8rm`mX~pVMAxn(kAL~N0gZ7sW_)yqV{FA_
zsXENWZ7BZ?VHpoq4(4bD=`1mSGxp}|kkME^88lsU`g~e8{?)&ITD`U}q_Gc_tv6B(
zh3DF;Sl}(*rbMZ@1J_pa*I#1NN7<RQfYYL3vB%%0g`kBx6d(Ehm+q1gnGU5zQ$(84
zBAHtDMwslXF$<B5LUrvdLv5gC<9v3(j~5SwbC(}!P88vjixH^{8t}>Dl|Lg_GRxHh
z)_{N4o?$E5h0&yqTJrp5dt*+^jw4eHu!PfBNgHi@#hdXa7h-o`f2!yNV)v<ZPk6OG
zmyDQPcMRRe-jkj;Y(N|ME|q?k#<un}S2vj~cn_3zqSfz}uAS|O1%f@d+pz!6;i}KM
z3rh=C9xXdj<#gw}5+MzY2ZSZS>E9f_2Dhu>Z=@ntt*_?xd9_M+;w!F=Dp04paJKk8
z1xwAr=M+a)*c^f2x(;p{jvv}YD0KRztu}@mk;j)?_iVdy(M0QX*d2~2CQ?F_c+3n3
zh#4hl@uQO7*heIPO`u`&S(G?(Tf15^*8ZC9t<LXaB&n;xP*2IfrWw?h?CS_}gH;En
zE7iH$m5Y;1>=LYn6Hra`Au=NEu|ucAO&684Zt8_Aafh38i&RoX{GzS)G~$u(>=4qR
zRlH=ZQ@Sr-N`qA=+A`-;K`H3q3)2{>(H*Jts{h$YjhQ@p2{7?GCznivSsLNv=a-?V
zGHit+Vx$eys9OyQ>l7gKdiFjBm$96=TQrCK(+n{qJ8zqZSw88#curzSRnTUq_(nf+
z7A0MYJJ)Da?=w}<f}vph_WS4ue19P;QGE+;cnWlQq3xV*w5D^X&7zH0nr)$(&=RtQ
zkf=34TtT{=9TM!NGX{uX&`Kf2r*`@Zv=A~V#M2E9PCj!|hg-*k*4UEqD*JxvHOURV
z5hx$^x@?!zq-(QRKwJ7RS<+%j!C0v^%*9zeM;-nd)JmXNN0-XM>3ut_HyC2*H^4&(
zrLa-Mn|PK!ckn$=N5aOE++<YxOBz%O|C(py&w-ipFBtA@ytmVqJAFW%5dbiB@;7wO
z@2C>F3u%yU*xc!3m2d@`cmc_+*WE`_4xPdxT_K@Eluuh?&j?!8F$D?Gr+$;*E8HGA
z6pMDY2x+Vt@a7IB!Bo$~=W3TM=8BvAX~I^A%R(O1|DB+sYhq%cUTf@OIyr2sQpntO
ziQOoIX5hAhA}K;vQ@<mNMzIYFDk%F&AMUII_fq)#(VQ+eBx&1>r6*4Kx@CpUjMb#$
z$oH=DNCvCSbXEtR#@^0a7Hk;}A9Ypnf#o&+Vk)FoY{V8t^da5%34BIb&VGdi23)q9
z0~#hTm!E+4QbA8f;Yti8yEl_F7)ZMBW@^kEcFFeX_qC<a0j4u$z}Pl-oJ5pH47q)p
zG^?h*VzB5sAUtQa8dmMHIJJHQY@^bvBJK@Ub7kstp5P^wD1lL@W(Dp67DaPX;btsN
zyFM=#`_flV6NoJ5_t<8%j{}UtEpQxxN=1GlEOxr;EE&|aazt~7ga)jxNm+<-v2mpv
z0rRo9O$Z6=FazGrBhUVFzo<{EYy$y0LST~71|RZ%YNL%x1C9_{Z5qivwd=y_nTq-i
z{(03fI9FwXsuoEVpO=2A79pt|GY~U-TonVYkvCci-u0S5hdM+m!7$lwui+JQzeD^r
zOC?K8$G8PmNdxnMUZZBH3TDD2iD`7=^5(&Sk1X<PMxFg9iC9xgtrnB}Y1@sjXS@|p
z{?8PW1-bp|Js2D*g-mB^Jj#cXgd?&8UIKBJx+K0AqnhL+ZNetJ=><8wL6P`(O@^?A
z#loL<dq(5^>Q|jg>{k=s2j@xsPFfHeS7N_`AusIFUep?l6u$}QkGGzHA(iB`#A=p=
z?T7LUx`9^@;MfBV4dSaNxWHH_K4^j)m)H852+fG`whts0L$6?yh#9Rm2>1|`jlIeY
zl?w98aybDOZIYIj<M`Bd<pb}Z^exgM6P7XbZj|-#u2|am?jr$rKsc3}Nhjyn{3gN9
z<%!!c*`jVirAid(F(YQt65Ud5En!%}irgb=;F@ax^kbJ<!@NMV4q|VRa(m2@7P;JC
zl#LN8Gljmh8e$0z1nl&CuTh^NuA}oRsK%zEfQ(GyW7sH;O-8!$o(T~bOl4dCYjtb_
zcr!27#hiLQTRvgEjHYAZWzM&TSDVyq`&li-ik7clYV>L~JSwTav~~iel&3m0vzc%e
zs*pOIl*-9XHHSCL#<ZqHWMni<rdPwM6;SWz92u(y0}rqRA+braVt8y;B$z9`wM@d)
z7;tu*lu4-4eUrvdrgrnic-8Z$M=Ou!(8_h|%;}ri;kHf2w-k`hl1f|I;nmAX@DnhG
z`P7~I+j-DdQIfy-P%Qm9sMcMuuurjCkIec&UnO~B+bK%gO35eA#*F0NKQDlqF*++5
zJ1?K@)>Z8yS8x3$PWqyXi$>LM?H9BgysD;cl?qGBIEya)lWeodcWjVs%uIL80?1|x
z76vfodS+JVHH@7MYm8|O(y0uZ2ozU58E-wJBEy94ymt;XNs_i0k~ES>FJItj>~Dv8
z9rts&j(wgwk0B^D%?te!*?r;Oy<Y!%)k(2avnuwAOut$9TQBJk(y7CiI)$vs_lB#+
zdmlJkmZl<E-ARwG|4jL9z?hY<pz&8zizH9BVU>JT4SWkS59l}MgQWSyJbt%&1YK-b
zm{tng9Zv*#Zb!&Z-FU{dsoZ%eon})fBIn-L0N)XygQ-SGV!VZk)&NTM1!r)<W6TPv
zG3mhbEvuPcV<Sfb{V_RuZE4oba+w^YgPK&cu96>;kO&@Mocifuy;<b(?%n0bEqNjh
zFHP92Bw)%B>F?`9A*|6&afSMtR2nq27b8k@-l$cvuga+>Of<~ZB+v=2jB&SW%}$^A
zltwLm%zCcakiSq@p7=-fW+0O;LvvYj*?t(A3R`B6rqAc9aX)hY2NjHl;O~I$3wgy1
ztunK%u3V7L_fyX*={DHk_UFj(oW!Y9WG?8fN(0z<Vb~~HIV1W^dmV&Y?Lo~=K~-uM
z-h8($E7TIRIi&UD@L_G9{FY0#dtpRG$~t%0G%wJg<^TgNw4*<HuX=PZVyVkE%Rt(X
zTUtHS*c5}P4&1VYtTH_<!^_v_KWx1LtsiTyw3Df*stC$DWNb?5LjX43P&*$NkgeJ=
zT(K~phG~~uBBt*RtNv&5?*Uh}F4JthE>3(~ZE85mXbSi(a<io^4;$fGHgkaYvYBbU
zSkIXuCR8*lmMjw1gtY5&C83(2vE@8zbw(Av_=a74=pcPgC<2EapGk$rNOc$+V+OrW
z6P)f?zgMivbg^aMAYItx@}E`7SdbKAV@-XoyJl^N3U-o3sR1wc1C2d6_FzXkp4~oU
z?A282BZn#$tDj!eKC-aRAyK8PT^`h0N|mlTSAmDeN=HJM63|{HXCr0Khn?UmwArw8
z_Pg)G>aP<36w^ZU*O97Cg|6V+7wOCqD~(dGLl!&IjxvIy2B;ST7FZgo^aT9t<Lzi%
z{e&t>sn@9v{!<CHuCH<)q2snM=u35j4LsrCK*`CY#A&K0SN6eJE>Y3eqP86d$0+3U
zxw2szKC$%LLe&*O6Wm40{=ZY;)0-Sh^SR3-;?@dwi}4fKW2%U8D7ZYgG;+IlVTLWN
zh#~r4?XaMP;%6kXA7sGmq&3uGhAd-ZA$8RM(Z;*IPqTVIEBrshi9k!fyjwuo@Z$gL
ziU0FyaDTj1Ng9GKs&F%y%1c;}Ak{@*KjUGNwAiMO?RjH@vOVoPO0ki`pM~F3cPS-s
zagS0Tw@t%qhq2P$Y#VQu@_eG}>n(<FPKOLwboM*p(0tPBQn%GPNli#h`e-HyfX6}Y
zXR3@7_S(hU-${3ateQ5ux18o|Q8fYSfy*l~4;RD66FQjJ-IgK@Vm{l_&(^FeOug}W
zXJu{gy%bG&%+ctBYTz0gegdy+*M^T5B7@1#(#nhSprEUDzP-ZWAnz0bwNu&8?-1<v
zi6!Is7#mi_d7tQPJaHIbkJrB~&b*`X^Yj+b+lpn6B<K7}IU_H{r|&S$UL-BzIdb%Q
zMlOR|8%J=V(iiGAZ6AGjh1UAhyQ}@7YJ)24`{s#3u9Nuldi70$N}t@&Z66vM{t8o#
zwzq)uori6;s7)|8$~i-K-Sm+v$s`8HS}*o#Vxmh|ubBNsmL)PceVt+ABC5BbXcy6S
z`?&drnu`A+#~yuyFKST$Dk)#(XmwSdn?PosAUwmh@6%gK!2wF`Q;&rw^ThTA`qU+H
z7Flwd0+*H?cT$3}3f2z*<18A>@grBpx@$H+aki+Dd0)b>fzY8IE9nySBYQX}<Q2$t
z<Ic~XLdC6?w-<lqz}@<0xmEpdqM!cQ%6fDd22K7~;PPU~Q)6CZW7D*=gjcP;L|h0u
zNf(1**Cp;B+Zp2K8NJNM<P*P?$7c1H#yP-n+BwKbh!{0QEYoQFdY3|WkvsB#?gDgN
zFLZADs96Rr&d>ZqgH+MZBd^Q4w&3q{h@Ad2Ah!?aa(TzF?^t{Gfmh%Y&(95_``8}~
zsMHLUXg09t7uwrd`OEBR&bS+&Gn73Am<Lf=^nYAne-M-+Y?m>(=6;lthQYLF7PC^c
z)qPD{LHybo`cU96R=)kK=15(Vp<7DaH_)DP!;^l=Pk+Lt_N-v$W}d}9<#<f{XZ`zI
z#1B4^Q*nMH3(wGFpR}A--j@4Hl&z!$VJbE>j5j|bra9*oQwfHTAdg3`y%w>;t-5ff
z8P0_D946)W{Cb462y=OZw#vn<N5zR>Vg^7>U+JoDd1b?}@5bphi_!WHlAM3MS<{DW
zpZA5SURp<7DV@GDEXDBpL=2*D!~E688EFS`Oh41eFic4-@RM0SY~R%jpDI43;xd8Q
z<?6y1$%AE+{5|Op!qeENnc&p2$?d%Ghqml(SJmC6$<?-v(&)GhF3VO3dya26E0yY|
zwTJRpcJ+kpElY0VKTwWuO|x0extba#pX{t#%!EA<xEy_HZj_EW%Q>%wt@3(hmqE1k
z53I$h{x6BW&}NW_acTdQgpp-F_w`nXD#@3(hN`A^wQ-}~w<(zMub;azQn>eC&`oI1
zFYMmc^&KoZ|9GZJjOZV4fXnXOwX$PPeC1sT?!Lph;nn|=?{%h{J#k&A`YY48^Hk0k
zQ~C46$tN#dAPGI1zJ5ARO;Sk8NKV2S@d&}eO;iDcVUb0=yqRFF&fezzSH^{CM>TKL
zYWUm(Q>(l~XGia_(hK|TG_-{zDvu9rcS-2k(`RmQ?hI?l{nm2oJ9EH3+L;#t)0}+!
zc?r*lUr$N`+7w-)mKSEHi|N4jS2Ee^L)fKds_Vl`UX-d^@|z3Cs(C2Es#vL!JFe42
zp7oXTS`__W2{YfMxyd}kGbHff?B|=|_LlW$lUzL9VUDEnw2XPi2J}Y=R>PSYvMY%c
zcyG{6+GXU=tWAu<>tag1`P`^^kD~YsxgB!@kG{qN(#HoL)jXV~n6*(nZV2Ih{R7z0
zM$dncOx2>$=@xlka{AQP$N#vQ@z?c3)nl5x6*SFQN{{PyXR7b^F7JB$Igq?hfL?dq
za;RL1RwMOn2y05>ptMn8`hCB&)wds&FXt+y9^qJ`swWs+Q1u!OsmzgcDb@J-=38an
zyJy5KRf}IPc-3~KjYwV>uRi7QY^Y1IwTgJ%jNFIhxRmJqooCf|h~$jY=BKj!MC3;S
znZa1~5Vu@ICw=RMsV|SE?tc8-mCGSvTtAWV3q5)c^d_uL0ZKYGy8~j;L1P68d3u;G
zWkrB>e=e|ISJ#eHwBHK1Z?#?JtbZhgJWnQQj%UHy2&BB?OLgF{@^B_#`b@$#C#bH?
z8Z?e4LHu!@qV6&N#7#<WyHcr1O+rQC&jf7@>(d!)H37&T2D)RS&vl{2vTlpJtIZt@
zo8n97=nu$V6W8^0U0*Q?89n?k3I(5OM{U&QI324eW_pUh4d0quGEOhhY@AB@KOiuk
ziiN(KcHnItzu1;N^ZOMkx(P^|*M%b|dywKGaHnmTxV_j8YUP%$+P&iU1XEsJbo*v@
z^bTrWGK6^KZ4*9Z&<TtMLhAk_l84V0P_X3|V{kSZcraX4v!Kp4)jmc0?D8dg<Q+V{
zY87qmuN=aiR$D=-z=+*`hun;m?2=C7>S~yDx2HP`!tf!cun*rs5NzKUwPFxY+`Bq*
ziv(vk*Lwy8#Y(@Q%IdQDyyvm<Bcv3L@YYavT8f+$bwbbI)fy~g-|YFSY~*(B^N|H8
zYE!nqVgdD%`i8Gb2h;GgmYdPZt3GAagJ&xDgO;Mg=D3CS{;dGKTObvBZwViw?MWNX
zo2c@N**~~a{p{bHbuKZukFPx(S4%X)+n|;Db-u_kS#nj_F0#b$mvRnsDNWQmx3LU!
z;=i4?T={vPb=IFPacOGqF6|*|Q3~HyP@$&Aeh$~3%v9fOTg*CXXJ3E98Niq0BXJ?|
z(75|v*2Go6xK`cbjHPhScE?V5ysCvs9ZPeaUb;UG6QZbDW)H4!q&HJ^i7=f9`+JPM
z;We8k(e)NHT@*9DcSgL;Qf^ixy>V=4ZyQsyPUQ9C`&MMWdq-Y+!B_foLJZ}7@^pn-
zvsse4Q0TWY)ej;|p6-WI)Es>T75PfDttxJ~)+yf}HSK7r)e#mu;JYiGy}pL|PMwjS
z4(0s|=}F6Z)XSV)egmT}z_r}1-%A>OdgUrlOEPF(oyPC;%_b}7P;$n2hCw!-io#y1
zk6$7X1E%>XcrgokZ{*7Emx)?h#GxDtQ}$XN@B~MQ$Sb>pUBD0c7n$$+$JV!CSjehz
z^4R~6yLWJoH2B^}V>=sfw6SetW83z`wzILFY;4>1#<p$ScRpX-`rTXiFSu1xQ#EhR
z%-h}X>E}G>>C=5!{GbJ!4kfM9gWP_CGoNWD0PN-u$xg!XyK0s%%;V>b>pq~iR~#aM
zS$_hplyMo7*2~H(1@Z=sPbdeNrJe80TXPUpRk6*3=#7KO+w%@0r%c)(sAVf$V$6`-
z34s-w*p~5~TvfksqcpIFwzf$c=Q!TP+O<iu1*Ektle-V33o8TO9dNIr5GXm-(-?ZB
zfcrFYe-`d`^dKKo^mH=kqJd!iR=kpw2mg5<G<*KZNJT~mTZ`ByZ^k;tvwoYI82n0o
zn(mL!&&xsjy2^$_*Xt`?EHq(3fzBhSy?2W^Qu;n#sj_{Ps9x4XWjXLACn+s_cTN>8
zHJdvvbt^2Q?~Gs=XETvtn@Vc)@_s+-+dnhQ0$cfsmq_vP>e2}>4U^t`wvNaQa!Hz(
zoVWFgAk>*P!yllS75b>G@#z(I>5ayh+$Whc+~Jua<~pB?h_<)?2eMt60Ysrxzi3CV
zIDr~<`<_;dH)O`SRNY~i|K*>Tv<1|FE6FsgbK;Us^`etX#yD@AtkTntcxs=+`-ud9
zkjpqCet=l2SA0!y97yAO0}uZ|eG*KWgPvUhop12)3X;_pGts~0sf2L+Tt1Jeh?X$P
z-rDZ5IujN?*9e@Fe-Y~Y?xyLDe8S>;mPTUZ(aT6p%h*#VNt>BD)m=8hFZ<#s?Q$pY
z5N*5zTXybqiO$84$0=mrr+-19ABNczf-aq1pjlhgV0a~m8#!x#ajigRz<AiP;~osU
zV24|F&REQth#v3gupI<B{=<N<wF=W3-wzf$bIQtLkP0FwP=59xzCA=l5^!!y**9{D
zzGnkv!gI1fDJVdH9maD`f{Z0u_4qte(v*_wO$X@|MP5vy5bSwp#=GLq_e=oaRCbRc
zmD0qJm@M9E#cFXF`E-v&hM=+r&OG4p6XHPwJTkUF0%D0PoYUlu&k(~p5Inb{>v@7G
zQeVb$caq169EmEi#YRwWyQcFr9fEBKv{F{`b4hM7WIn&3dekbe=Q3LR4H8Msdc9%)
zWRgXaAqEpI?5k)Kh~sj^R<xZD-j#&6ak&n$aI6JG|K_qE^>mR_N!8>)oVcRj3q?|V
zJ;XDq2V5ZUp03-TOi7MHEBS|=h0eEPl{g&hrV=;%29$Q&L^(pc!Z%GQ782}}tOg*%
zqQ0yY0VMqJ1-392TVApjISRT1)#wu0|JXKe>!;*9A+2`~Mh@I?m>1aeM-{X@CT@JA
zfjo+$Z{0zFA&LxIMiUlubzf)4UWT$sU6IZY5$=Fw_!nb$a?*0u!5DMJWrq_23S707
zqV@ELsC5ez;lzx*RukPXvd5ij)Vh98smFj5p)u`+BBL*zyBjZ`pxyM_Jf4Fk09rPX
z9p^=s*ITA@V1S)JvHddsCU3&F_2hXx5(z#En*DC2P%@~|?}aENC|UB3BU<tufgQ~A
zKyBa-(nRilcvG`S&`O~2a8<awmIpDN*|V&yp2d5d)c)I>#T%^`6g(@dy@w6kaGxE6
zH|EdCIgVyGpWt8ISV&B7DIJg$7zrS1;?#3IA#stt2UrY2;#MiHDa#vf?<3ZDsXCTW
zEifq9*Ze|vsLT_Z*wQ-_SUA1&AwaKvZ*G6`rJN;`$U|RApML!83_zI8V%uwrjmaxC
z+b4tKcl;s{Q;Iaks1f_bCkdIaO$6F#P8W$m3Ps&tFfoVZr$MGYvk@$vwM^%#com(Q
zZHVL4e4C7{-m`5H?`bECI*59;BG)IET&XqOJUWl4!c~{)k*5LbwlzsDv}J6oKTi+~
zd@GyeTn#WpFq1!;CX9p4z;)!~W;S@Wu#FDo?TKcG6HmPu#y=T#863+}j3?QdI@-f8
zx8*;B&>P|U^ezi7S3N0b19Eh}`D!yA@LYXK^IqgmAK6%I!WOp7MYe)xZZ0v;WZ!S*
zGIL)1T0~FsJ%H7HJe+F;4O+Hd-9iy&&;LVsNl~5WF$Kit+TTN+uEeNaLxx)iVRv5s
z=0c^}d=5VGMrKE$EdeOLP}ME4wF05Mpz%k|8JoP$gG>OjoL{6VR37>2jz$E3SQr<)
zopg4GI!Zb42c>O070nk?!c*@NMqKLzZb^^u$%A|p$<I<-b$Fst!1%I;xA?*SgfF;2
zf#no{LDej?$V*`s?KBasaV#XZo}!jl$@l7ZB6nqzb-g$_GeGwZLkKdkiPAZR?a<p%
z3Q$g*@v51^(K+85Ujm&uN%V{PHH?KGPSsU>g8o;l3>KzOv@-7^!|T0UzD<9a;RZ8(
zVv)<=QzsZjVnf&Yba$wIe%`-|&ckJ|#ugdF-7zR4Fpg|}dGw?DR$yQ^Xu5=L{1bDW
zmTn=#XPojS9g^EX`7_e7D&^SO1p~xTbMWaBe4FBL#@kBWYO+E-tsVX05!qW=^Utmd
zPbfvgw|#<GtUA&<UHM(;gp5n#f!M2eMwEa^MSb$N4_uId^kE=FUGg;d5eus(1crra
zz^tLypTnsif~Cc(mSOPCEpS`e0jh%X9LC6mq}}`2AcjT$EFQSJDw63zjbt(Ltk;)$
zw~UpFh|Xf#hE2aLYYQL*M`Ei}d##5pS0uYkjiUzW^~m?>?MYXsVqQ`+KLjp+jK0o_
z;Arv2G}++&A`t&6AV-co0`ah=R`TU!?}Jdh`f_{KK|o=tha;M_;lool;xNZk+Qjdm
zUh~<D0#*0uBvB`2z~)=+WSTr2UXw$<>oy<njVbbeNARXoRb5BtlnuVg@owZPLQ|I~
zGWg63E_m3bev?Wbj#Je*h!ih*(PKA)EF9_?XKM6mcWM*YNJgxV@$)JGpgB6x4q~}G
z!p-j}7c|F~P|=e=fNP+>#{{INqzwMSDLamrKl-&sEUt+>9$?3M+@mM@!MuE8kL}9j
zU10C~LVI98G_W3v!QIRHI=EdGa0e=<dTs7^EW*X2uCo3O>Yb%TAM+JPX8s|#OM%7f
z6y<m<=INm@Q!@)su0~Caey02KDG98V8_hJ+AHhf&LkkeJ0xM|BS$P_%dw1QX*C}${
zmDR}ZpUbqUciVKzSwyZd0Z>@f*D$X9jINQQzpx5ok)G|EpOF#&6zqVk7VxiHR%hC9
znP@L$^9_^v==I@*cO!@43ue?T>xR!01Xu(65zfE|dP7`A_WzQc{vRLfo5~;r|6APp
z|9<y?`Txfka58m4{=0{!PMxhRQ}gS_6URPZZ<}Z_J*#9&NmZt9siW#*`_62qADg$#
z=m35fd0W@IABTBQ0-ZAw)a0COmpAv3O_TI7`Mi$V8<_hoPlK9NP#=#SWDzclwp3%E
zCKfjr7JqA^hOzY@Uxq?(rH2d}B8w<vqx>8xN2}=*+d8N9yOp;t_9(|lB$sok@@}{J
zHvRD=w%XP92OB$9hm?{+m7Ik4kz+_tEg0R+JBI<OXPLUBzdZ^EW&5@WA_e$Ujt*q9
z^!3VE5>+|Il0s86Ko$N=m^6))8e-+fb;jMt1*=;fQcR*7nSqoIFD2maCPh$13(4Q8
zSZWm0mZGq%aFtc;hLO~sna!O~nwbeMH;-`afdx%L!bjwmlpeTM{Z>%CDsh&XNm;U#
z1K$0|Zx0hm*lY$%X@E%rZgJjQRD@$or{G-DjHNt4qnw^<%m+&%yjsXki~?hxQROM=
zE>4*Rg!;dX%vv!KK+w|;Ml%GWPewpBmTKF!bN^+MRmK*y+8J9x?xvXJ_4HL3R5mbJ
z?y02?Y?l*P4oCIEXfk+mS|O#rKT&WUMlA3`n$G>G`oJW8LO!n&J$-A^d0x48#2q9Y
z*^l#t?Wjcu?@@2L=*b_4w#oRT;bq+D72L)&TV`0|iUeHM$ShnrIaxK1wp(QSiaw%;
zyT$)S*MIze6)i?ep@#vL6%|kt;~)!EFv8s%W@ixGgxDbc1LYvS13Al#*WxRe2O)B7
zEKo8YNsmrL5C9<nu@0i!!G)FU`e{ia5S6|0xVT~tn1eW3X2@rZqW@(It_wbn!cnY_
z_3I5*#2zfX4Az-$4KsL0uSOn_9OpKS&pO$sdV5nP>pvrO0tO)=|EtAaH8NeX3%0*h
z0E}auZKZV*{vf)hSYNN27~&;B6ebMkS9v1H@;fsM<V&Yu{@r#960zWX0U;;^D1`P)
zhdR3AH&fE=9T8!(t`PPjp)NXh%R)m2O?>Z5`H1xv1w!7{Z}~Cpx5>9RLockc(wMR*
z8+cSva2ValuX9i9M+Qw1aAhnDtp$e8COJq$=M4V8hI>o=d(Knlb=W)59Mk$^m=K04
zJrC#a!NdzblDWf{+9T2Y#t#(;g0{2Qn)`3q7f&xkp*|w_p6d{bG6P+D>QmDlI$96x
z4^(t~o*1K(Z!Ul=m1Y##4eJpiT-JY8n<=0xVhaoi{iRnv*w-($>Fq9=UgIkli;--k
ziZY5}CoYY#IGG;4joX;l?3A&bpPoH4tu`CZ1^SfiwjFF!w@<A>^!c7(vk66na>LcS
zgd>tTpPR_d-dH-TdkHB!1Jw3_9{MCjT_*lwzg8WDm-T(qPNKD9N@SK~kkbDM@?#Wl
zPkp4@E15*vu&3iEjl~Gmw*T6=Qk|TTwl{KeYgP*nf<N|*qW#`I4I3gm$?)-FZUw$=
z(R(^|js<gmdj-kWVLJLwnO6P08?2QzFA&PZygORN{eH&#{Zs$$GDy<b+%NDJBi8T!
z9?Z|1c4&(rGZ-I|1WFqr5Jhf^^W#M(nWJWr<GqSnpIu}G+j?Y$#h+;=MxagfQ+$M@
zLk0OSco*jPgGI)X$qyk5>2p%+=rXq0?>pb{V>j}Zbz{^PR&q7Iud^&Pb2I$rrgZOP
zg@it}M8DVNiX?mtgO&0Nwdf;o%;*kUR``_d(nOwJAeyl+_Pkk<bac>FSGn`H(fzT3
z8CK&hUiKCyosJm}&kP)wCir%*?jJ9y0Qoq#<Cw=@etBHv?rB%r_wz1`mk}0UCc?*Z
zX2)*rK1iO&_}bM;9&MuYT=s^S;^=Gk;TGv&mC<JOVA}cU0A0K@OSN{oIAML&_iafJ
zdGwc2DyWzhTsE+pJdUPOeR^K|dC@WpsiXPyqOU(1PZ#R#qW<?9UzRBq`GrG3+q`h|
zAoCsl;;J57+hKLvJh74O=#F2YN8zw$Yc-$39e$44@*xjtf)D&8YjWZ1oaaUTn4?1@
z^X#N{^7=xadBKV~@o{C_+}Oh2cFWXivOvJNwq;~^0kXs#-`Bagf8E%{7vT1v8{Q8m
z<9>ftp;2kyVEL^L50=Yspk18RhouLOs>XhwV?oMI+g|byq~rx;g}+51^2nmZ30&4c
zRX30&g5+$X9%sIzv68VC25&_L5J}y5eJUl};ma^B;BJ0#vz*_lU#GOT?%Ywx7jrRV
z8+$MAZ+GkM06oX+QXeUEa0XU=!+8G!8-H5){oN^?ot~;gT$SiGU}$4AmVYLu7Qt%U
zFVlb@=o;3Xzdsu+V~~bgoA(?{X2@OmwwSunq9=ilhy;HgTmSauHa*k0;rsHL**yUG
za8!X0_UmF5svQ?GN_)gC;DB&i^)<9XwVJkKbuvkOdDl0y<|^iX#UP3%lK$PZ;A>;B
zd?Z~n^$U|E0zU(HS&i4%yLdu=YRR9HzrJhWAejY2k6-mr->)ORwTxJF;j5{{`EJp_
z=R0@eJQXl;*z$s2G?tQF5BYs8U{j<Ko4vELX_E)Ax1BQe<;!glGS{yODT9}j<M~+Z
zAe)+Kzc{aZ+3s2IpQcywcvT6hOQu_CQzO8k{%?PlB%f5zCblN~Cc-Ij%$SOdZOP$b
zX#TekGw9VN$C}Heap2%I9vf)*{ArG*Uew`-<3Ac8g6PQgSCPz*b<`q3*Vkmbn<@Id
zw~qJnrJK>}yEYGK3LL&RGk^aSc=P7Lb#_cfkrfotG4=|ToWs;fb9)VSQq3j^Z>ShI
z;wLPH=%o`Sv=~|03R#+s+FF7UMyXbZNm=6Pus|kV{{f^37ngOdohObSvqfB}pIdj{
zB&&}9h9Fa$8>eEB61UgT&DoEQ@d9O-%R>qIU#qPVg`pdS3LIi3JGCj6s*Q`;gw>TY
zZXD{Z%Hxj;hpXw)WrH}XRaRFmndWSnfv>Db_e-PXpilPob=f1&3p|c=EoJUW*ytqu
z$+x+Txz%kzoeB)L_HAqX!R=f%=d^XKli89gtbc6lu$s_zn3#D^ZksMi3dD9Y?PPFN
z4zfeD{qis~1fMLF4^B{>Y3%(yQk7*^!2tUuA})(&4iNLQqM=6q*mSqk^ku38fLId)
z6;zHl`Gv&-sT<AB(=M|i2t+q-;IKkzA;ci8P=nqmmH^V;p=RILXLvGD$~c3Gzb2>+
zJ>b=A&-ct!rp0(<KLqd2Y1p2gHGiHo;9N3l(m{TlNkxl*v-5D%9o_R^M=b&)vaCPu
z5BA|GGpiOe8B1-QCl22Ea^eMjyP%Ok9`#RMq`0A+&r4;9n{|$ad=xZDqvMYmAULLw
zbyu--&t_G#06oPZcgUc2=wRaSz<VN+NEK8p^4bGCp2;}qwY-?k9FR2~yA1-zX%=PK
z!fs|%6^s>BQroz7vD3lnr74mv25E$I!OqEd>p!E%JuV24GQ7C>eQhNMuoPS&uRq$+
z&!VX0&eZLP7B>DkTDj;7=N73V52Aw#s+wK7;8;GpIBUkXW?Kb1uS&YA{kMcxjOa;b
z$!MQ0%vY)2<RS~}Cm<&@9$=!UZ3P;)N8yl+ka*Bkj2!w%Vkd{9dE>m1SLkm)=>-OF
zx6Y|*`beOSmx%9ipn+s2Dy}Ej7jzo}q|6%!Osam08MkqT0z2bjMLMO$WkRpNInh#1
znJESX>=(`{V_WNvqjx><pCd;?t;GPu#x(rbvI{`yynl(C1kTaF=`M9OBjOq8YtiX>
z7>9;M|4V~Z@`8ogb_l1{M^%NpohV0_T}MFOABCfp_zs>4r5)XS&E#8#R=L#4Kx_jA
zrRop;Ip~)uxriAq!tP7tIqV(Rb|0nLtVM3tz0i3Z_~iT|`(7w`he-X09{mabRx&mn
z&0;TeE!fSjmio2XNvyIS?b@Vuyw5UI74GlK)1m@r)^Y^I`0FJ77Wnn@@|ic|X)SOT
zYIg|PHS`H&(>Pi!nflcX2}brkAteSE1;T14n+a7am+ZG;yJjMxZRkjxmPe|?l+goc
zE%Ktz=tIWzcu9HZ_Q4%PHXyfzN=nW1)j4@cxR;PWKA*vOX0^0I<>iHE2aJ>H@@B1w
z5~jOqv#?{92!EOXX`3eY&lYOmlMi=R(Ch@vP^F2XRDCyQf^LFIs*0cLn8j$*GX=h!
z3{RA61}{4biJ|4+EGI6W<NA{IJ%yghY=T61ctU~kG;)aZ#kf>?RZfoU`j(MB9_Yje
zc5%=LVLFv@1-?L0Wwi4X%3%o(i$MD>8P1Lfof;I{8uTOSOx;1~Kyrg3?3)2)3?R?t
z>^^9DJ`G0u5Au#Fsnlez5|lLD&Wk)2Y5g0u3`JK)va!ti?7{)Me`od+;G9HVter2k
z?2*i07W}hg0{*u$wgFQ?8^&HuHK*?E&&P>#NEurWmg~@|CHp^8wy(P^G$)Uo?X|?6
zB3%cuwMo43-c7mXowfRr@e&$fbOdBX&jj-Gi=~OpkYsGi?*&w5MhI?31Z#hm=0K{&
zTkGXowMc1IU@)EjIodQxzxKY^TR-iysAeSM`n{=ulU$M;j}r)R;SHhmyuVD=tvcQ&
zj+h-E>G*m#g9*wf_8C+N72sx4Y5SAs?ge*xxA%v5gdV#FvtCKYNm_jj)zIi@I+k%J
zn7a+0+I&6u<`MP-oz$~@67U)@GgCG9iahx(kFv<~->PP+8~GVROZ$g>&t~v`UzufS
zBs@J3lL-aViv^5lk`yju?<^Fn{QoOeBd7c*WWvM;=!`0pMUWme4=UK_xpL+kklLjF
z?FLwc*A@p0@)0#~@e<NAwCU(MrjJ9j4zYcmq#b;8jwctLz^?7GC)Qla&EznpeFqnT
zoOJ`&Zf}96^p7j`d3Dj$2=6RtlX^ne2b;$$-X+$+5%zkH*04E@cCi3n_EjS<DiHoU
zW|@m^AFTyqRVoZ3MxF~eIVDF}$6O@)=Vlg@{IpYMAYH{u4ohWEB&ZwslW}tTe`D29
z!k(cm@QHyeRS;w0!Uk0RZ!Q2<f$POs!bl4eX;@}p!#%w*4FTBb@-wML6u_Co#}Uk6
z6_yKV>r<^F>IGB%?eh9gjSoq8Cub*%*lSg@akCn_2$~tN={Yc?HBx6*!Vn;ZNtG#J
zQw*)00*5=H2}*n>btv`@eTP9*&{TO>lE4`Ai9!^WETKracRMG{_P`1(25I%;yvb7j
zJSi*AK5|Ook#ZgvO+&o7fQ=j$(+L8e>!afS-y>Z8o+wT+99r7=>&K3Xih!%D#{ms!
zo0RiCmI3*(vde&cr7>Z~<g1R3Jr~PyLJTZiQ(R$He632*Q>hdZHtSV0<9y&<lDy?c
z0XN=GWe5bD>PRP%)rek)O2LIhvrF0mHE0Zl$G%i_1PRcWDmoP~tx!bw7DlBUeZ7X@
zGAYMNa(f>eMEq$LGZrp5C>6U<B&7xKq6u;_fH{46*@u5YmxfTL2~Otbv|?}9-u#)_
z#>vmyLn23B=HR+@cvdeynaJDnpO5pKg$-J7FP_Q{B2yDvKQkw5-VvH{(x`JQtwGWL
zsS1BxeOjbEd|?JHD*2;qoX}jIJWwqKPG4*v(txU`>cq&4u_RX~fxWRnF)1&S{Z5~L
z*U@q=NEQGR`k{M7j$tJ%EL{ZrZZFc`URot&Wy4Bi1;5GuJE*=(w?(9phBa(;C`g*U
zrp_~U8eRVqNV*$tF2_@<428)jgXX_1Xn)kpB9syqRNl{}8_3ZRE;O>x#hKXrJh6#(
zPd&O}m_(sH>Q&|d3JMz<Zo;ralpNNHXkf5^cF^tksbafl;uliD97}YRz`4T-z$+w4
zW=j%zPL;B5Z)qII>5rc*g<)H!oEr4t%k=e?&gN52TDyE6zyw95gD7LG7enIG0f7Pd
zfiZf~qEa(lD2|WLy$gqNy+MMD4Y3D!p)F0igtqe`f*XcKWnO3_i6>xa%?d7;sAM$2
z#Vi)=0zQi>#1;9ger$`89?nQNOZE&lN!iaAR3Yro4NtQVqH?z!Rds$P{hDi%1&zyi
z)vS9e-FXQ+4hdvDjY-^O$(#+U)Hycf(}u(MsFtP38{;!|%7Pr8Xv<(FC09}*$~xyf
zG%5na`m>$ss$Jde-`9YVm8Iwu$^BTmJ>3klH0_}F60|jefKD50wbGCU-tg*xqDc7|
zkBz%dZC3U~7!^Oo)r>&Kh%P5DTK)rN<7dLP6)36T)`rg?rWN7k>7BDzK>FcJ$>hSQ
z0@i*PrU>i*zna^|PWI0c9)akkIe(^&fVu{(_Z6hd5%C+jD8=v7@JagoeBK2Um?^8?
z3BR&joWKS4pyDpfGUCLeAofMeoT6nwWsWk79OYP-Mp{zk)cap|lH4-Y+H<C?DNvzr
z;Hf2K2N&z0=*J>#c)9r-MU48x9*;k!@Q!Do*BHgUfbZcx=dc&Y&r0uK>o3NUnYuE+
zJxY_SJynW*qLPTVY-St>f3aE5w2l|HqtxD|i0uCDI7|e=IIMml?i)mF3i)}$ebfya
zHA^tz6O<MkjWSWC_&w{;U-V&?s`5WMa#vweAS(Y!mpkl!f6M07m4(`;A-I-ZTHQ1S
z#w}*_*Y`ldr@D`jFW!v$+`($tl-uihJi_WSIt@BBo(=1J58;<>9<tjnXZ+(yc#F$u
z?18LY*v|RHe@d!!Rqm;@;7`vhKXK~QweH>BI%5dM=8L@Y4j!VUn}6O+JYioVatrWk
z8v6R&Ux}J<L+~J<bFz&lD}xr(oe=zJbhi5IHBzD!xVwovi60me9=8gZMNiSr-Y3S7
z&Amdyq<%pk4-1awpZP<p*2m8HjF@Mt^rDUh<3#rGr4|)$+sTp$#InW|)}t*@lgS|-
zo^aaIiwGXW)0>Ye&MsLf^P`_~)9v0)`hhY^u_-Tl5ALq4V(N)~Ug$2skSA03054-5
zGKhz5PBxIG2X@B!WV1Nl7V|%C(hB@ijE>P0BBp6`HUg-#<M``G6i?IXJ(V&*rpe;g
zxn^lp{6T2V#<oz(exTBQ40ky#{DWvhr0Fq0%vGAWWro(C@L`EnQ;6HrF}s@EP%^T2
z5;r+sLM>DM8r7Kja~-qbBKYG$C)vgZ%FB~hhwy9u_Rb6|-I>K1(E{OOhVxlY&{uVi
zEq7L+oJjzO*^92wPo<zpSz)99aq>Ao$5U2`N@@&J@)w=OIa*;Rq=e9Nu9#PBu2x`p
zB&Zx>OlPigMgNDpMAa{x)tW>r|2ik)@rLqjA3ZSOI{5PQprxDz(8aYip4>r;D`B_q
zeEc0~!ifFsVvj{$^=8yOp7<+W*3EUyy$V9|?lKo02mo#hF_pZo6+0GG*@n}U1h_36
zsY=_Q27xTECgt09EBvZJ+ChnW4W@x?oKFl>K`>Tln7f_v(w+(BMs<LHo`erO`Y~)z
z(a&rzA4LK{S~4avz=dF;0m6IcXElKqFsPA_9kc+<yso}d$4~(2Da)GW-B=mi_>lhq
z`0Vv8+BSVm57(p2e1U_8wjC={W=`CL?YzY`h6-H5c!@CgJLG2p?l~Z#Oq#f9toVn^
zd<a2s_F#=vXF_NPW-bLAS{JvNR4z^&0h<J3#)3L{ggS39R2IW<83T0EFrQ$Y$HcEv
z$))KwPJniUzCD9d^UFdjbZ?$-W}jM+x}3TOHAHLX54cl;h&PUQB#NytxH_Nj;ntC>
z0`Bl1m2^_%^wziMeeiL5bZ^@Y$`JE3VZ}eRq;b305QV~ksM1vesad%PgqoGvpsjRb
z9hyDeze0r8BiN_E(x!J~8H8ObToJBx0aE*dIg@$5k@`B7Ol?O@8)kP85>Y_rL{Y2@
zjPrn@+Y2brQ{y~eiHtqV%;$M2Fm{TY`!bvO{wf3*7;h95$j`Ls4m|Iyo%%~z-j8*x
zHa0^=&FVD{LriI7p)`l^Zv-r*4f6S7XgV>vJ?1sB?#R=jM={vA#uI8qJ4C0t4q4jG
z{KsJfqzN`<x&2;5Gk~!ug}DxpV@zL$ZHIYPs&ka^S)BfOh_)cb&3`fMg5yCt8=S9Y
z#bD+kU-eZeC=#jVX21~uR=BFtW+iM4bFor627|-mWd-Nfi`rKg88hY&AhGnBBd?TP
zVi2nnKb!(e5AH@Q<obvnmrBtxO8fRbved=5^9>kdF$Y7s9L7J*j|l!^cNGtu2xS$k
zX=J5NM29*f|7IFYuTa!B9f9<U{rx;yO#jCuwZ*Jb+Cq_MOQ2j)FLp0509jm*k@7$U
zlY*>zAliw|mAx>|=Vm(j{{}>!8FS#w>}Q84G5aaTBcr#0aXgnl(%&AN2kYylAIQ>x
zBptG|g$9C`m<IhPt_Zj9f?7)%&gmHhn|o?Kbw(SH$?Z9y8g-%)EeT-t8q;BN8knU^
zuD^+YJW)CI5;5W_dj|xfY=%*F;ld<fEH<oAF<Mizz`+K}xJZf3_ha*KRvTu51T2*6
z$wQ1NELuD9Dc<4<&KwMQS|$v8wSN7g2Pm^X`9JYd`duqfXGB9>?1Ie`zN+AJjKY0G
z1Xlw-cHuNS)T42W^oMBA(^$H4h+dwBq!~_T3!re)e@k|JtQ&(cjBUXbhi2x)FCJ>`
z>a0nN^#M)&p9ZWns{;{be{B_HJSKres{_)>7}u7suuHx6Gk0pvxz!Xsfj>q+<>e3r
z6O3A!ABcG~iG?khCIk(r8M(}t=+_IDTjr>qieLtv+9j4bk_xFRX5O6%a0i>wDJnk(
z-#5V}>FL@lfwW2+L2nI%LLJuQ-YPoYHSySWu#EdIKmtUDT@>`LOCFncQYw%L?&#86
zM%<#76@AkkccZM1wcsohp|46jKt8ULPOx2brvqr)#wI;-#1x}d!75U~R?4-ZkReFr
zlP1e_7`O-lJKsSB(MZY`gE(?xHCNW20H7fNN-^jxkg6Oec__6#%X@fU=<Wd~Kyq7d
zH|3^9ONG6KPc~CM5ekF&)_iU)P+G@{PZnaf#wLb*_M<)lVBTU6wnJyKD5Qoje3uK`
zZ<Gvl;n=6cKsiui#9_O`)7YAPkNxX`+ORC5t^(OGbLSNp^$`FUX5^y>v8`CR*e-Wo
z5`ghPP|9*J?E4)?DReq&!-OCU)ZvC!3k$>DnLQOu?A2M3v@;MbuF<WxYZ{tSm|f`$
zyEw|trCAadAtkHohkE=(V5k$M$Uwl(B0<fJ8+9~27Ybi<f13`Ej6%Wa#(8WM^X~2*
z2p0w)iD3FlDy2WEg6-&rX|I(G^C9395Y}_?8aTO5k?A8k1WmS?vEbv9c(FN)*uPbQ
zj31FW#CBgQZVxvKCxf`dYnGz?pstM*A+-ryf--6Q(`^Uh^i8yT$9<J9`S9gBk=6CG
zFYT^=%;c$6hlH>@MBAV2@}D0rXP|S3zT0%S3G{t<3T`G|Ts6PEWkWb9+q<k9*Dmg~
z&2qZP-xbVbY{+_#t;YzNe@{BBtmS`5YyfT0*d!-LI~6F6sWjgNdF3;Ow{;*_72rG%
zYW$cdyc9aO9?sDrr&XI3Z)ft<j?b<^1GyHO!&K}6{x2{#wq7#NUEI-zP8g)?b6zUx
zGBn;W_DK|Ok*x~M+r*W%QT(V?N<z3;u&_oA2Azn3d_b^#uv?yK^uN8&RGxxXv|1tJ
zv--uFx`%<>AX)I58=?+#yCl+?PwKQX%n+i21(po*!X@g^2LXTn9|h;slG)vgR7_gx
z_@h&#`loG}1Fr?FzYj@ZbwKG<<rKW<A68fVSy=CXH4+yP7p=bAv>lW!)?m5C4#pel
zx-ol!+3-9_=auv@<wgEdid&rOk;=p&7iOt%4ga;Uo8(oi72-4=7fbNt0GQ;b6JvH$
z#))P}Ego0(OB!i0nR)))-ccy70(BEuw+8P3m;o1lB+*)SF^M+DbI!Y<GrEFHI|A`r
zd}qwxv}O$Y@UYaoi5GH!EIud=_9~Pqn~0Ql>9)Gc)}60SO=w9+Sbab{n~1Vf7(4&<
z_r--JSOsi_J4RP?APo3M5o|Dq$RO?w)(DK1$Ji8Uo?5}#^lqrCS_L79!sC&{2d!aK
z1d9XQWDH4o3_3R<zRkjbzemkr*%&VDq`?&39=2+uN5H;N-)!mI_<>c)Y|~)$r_>ZS
zKqz3Icm-_mSmt9)IxT@XnK4$gCk!veK8V-U4Od=`s$CtmzZcdqsT1cY2<+GjG*T}_
zwPq||otP;1Y%-xDr*8q*w!1ZDh*|h-Ym(UmxU>B9hmr-#ABdt~_mQSkP=T*#G6#dv
zC1F91*>7Mbe25%`6b~X+@KANUFKELJff3KUuR&bucfMRiMdHgVg(Rb|2?|mZYrL+z
zzw$7*U%pEAb)rYo$KNafrWaZgw$y#9G{gP%q1klC%cQGi5bBlK(hc;U&BCVizo!~X
z8938k;rgWhUYPvVnxfDv%>&f(Zb^6y1uas$`tGw;tCEi{sl5<=TvSJ7Ck~no!&K1Q
z`hud8G9J^-uvrz6MRoT6Wf5Aor=fv3#DEca<xX>vKJbPP7j38i>IbV=8+qn+RB}`l
ze0}?JfnZTm!ZKJ;55u4*3i(Pm{Th5mOUl#RIkW1Wi>44J`{i6WwjXx0-veKM&web&
zQOm3svWV+SRmg1abfp5mItpi06>}vCH_tScEvJ}4mv63nH(!UBQxi{9#~r_uVcZaR
zUwf|z>ZlAm96YH-)jN?2qlQ)~J6KxvwsNi-<tS)d4l4lSSN2EanQE#~gYK+Wt9Q!p
z11&|(1&vtOnIcrTGw^Cdb9%LM3Wd|9zRwbYZuD|H9q-FV$k=-+S1;u7JEx1sXYQaC
zndf_wJG<e!!>hf-%hU4;JqXkR9RfDIkW{L%wAR?P26<G^%dM$S?>chRueXljXB}Rh
zR#g+A#zqq<lbb4{P*Ms477YT%0A#}F@j*s@dt`q4lh;N4u-hGXa~>PP77Y94Ib0B;
zxIL!QlbVLbFI`nL@*mkuv)SHSgfL4nss$z-d3KQoHJLRlNq);I#C^VtQP)_TY3W<$
zzQllkYipFiJ+}r`f3oG?8@G^7!9mu5!l|2!-gOEVMyWgMxJ!tQ6?=vOPDFW)5Fl!h
zLb*%W$fUWMX%EUwfmp)Z?DI@5{U+PSw#3nLj>ZYwA;iEqpirw%Pu_5vI`9t7&=|4d
zBtH9^)B)_{j9Ie4b9%E~a?og|SNfMTu;@(C&YKI`NFFZ5#!Q^Aa<=XQ%E^)-QgZN0
zbY|7Gai@H7PoK6A?gC7CVMXe!8QeqHHJjh+(b6KXT#00@7Ktv>ZHn?nXM#Bi0Hs<@
zGL>ZG@CQrkdaW$Xveiq9(XJ<Em#rb@c;g{L=OHiIHo2CQWHSC4d{)vK`By}&=7R-o
zv*H77plav?9!6Y?A<sRwB6_lD+85O68f)?M)m`Rcb?!d3Y)bRkwHHB&6=R3B(j7*{
zVwf}3(UJ^xywO1Psy#YzfS2<7hTDsGIzO)uMH>qDnaxvvxafn!;lB<Zs&1UJfCP^(
zOIrPA&CqyH`UHr+1-Uhoa`%R-g>gJ5Ta-fEn~fSYf%+CIa`z)p7mj1OSp)uHOyo+C
zsL<lFV0Vr%=YduRZI+!7NYpXs<m*LIvhuA-v+IsiIp?OkWIU)TIn%F_5?i|yARbqN
zdo&e_@>w1VUB168SEE(Y7U-rT?ksW~)~w@3w7X1#TAN!ocza#%f|HaRx7mG8C`$To
z{uRfwL}T$<-cBv%sg=S^=5jmN)Bk%#q}gIY%PN`blT?$|u_VM~vs1$=kt$-Ve_5Eu
zM7jBtx8v}x#nTt@+o^7Ljht1ZJ^r{nv$%Vv*J99}1}DYNMB61RLcBF2pEI-M$a2Cc
zxt310h)#rMrf}Sj4^pF?D-^Y*8>C>*-`OT(g{JfTJjH6t5XuyIAY06<>>>g;9C?{G
z6rd?Ap+=J$r|=jdB?2}?Y~rI`a>Go)8|$?SXDY4~s)nIu4Lw)FJG7LsCwY5s4^8qU
z@dSl|Ug3#;4gqU>v28Q6xDV_g_e^L3f<mD+zIaFQuSL>R-Ns?ZUB)GCf~X6wRd#|G
z!MKIjvN#=QFP@L2E_wRzL^0Wh-QgF|4Xs9d+>AvlnWo~lE+fS|jPb*?e!jbOtg8K?
z-;FU6-aV>*2qMmZ6&CC<?6;DZmt4-fz>5;qJnjjyr`?oDv4s#xI^UgsQNz}5EOt~A
z3R-CEbgf9PO5UVo7LJ1ojUA1JAf?bSeMu!cbCF7{>hN!meppG=?kNB=bb8W^n=p;7
z7aW>~HkxQ(8|#t<5(Z5?$VVQ{BJ7*Q_-9#D?~*FWZ`;MoFO<^Hf7#IVV<1-?pjVn{
zcBPZ)JW8nR%pGPW*Q*#~vRL^qm97lgZfq%Td+?;R1ny#mHHwUO2vT2kAGa&YI(QA`
z|7`TVH<M-2XPj+-I*$%Rr0Ao^ac0P1rOtR13qIfIurb8puo%t|4MK0quHe4WvN_g)
z9vefc+7u4uFAX+pm00ZvL#OYmaPr~_N3t&fQxz~MgOUUJ4Q(bJab(sY{G0xCjjb_R
z#2Cat6p_{~o(Q&eH?=PLb_>}zFL|``3c_nt>~OBN31h-BtjnT+ncde3@$$7$%}lZR
zw~Eyw$+any^$s~|FpA9&7~2gZTh>ym*0Q{PS&zy*L-twG%)@llrP|CE$*5;_0geG`
zU?=D7tQak#aa1Ii%^CBva>|@ilqU2n5v^rYwM`v-m?kHiX=VMqG+l@XWZ0$KaEwy{
z3r8$E9cd+u=kfF$bxIKxSScDB$hZF)M--N#R$qs3Ba&fFM}_R-?l|)T+xW_51z6V9
z`pWVjYIQ>01gMo}D~!qz6;-aDZ!ruu)cA%S(%5y-7Ew1>o>N_vXi>F)P}y!0z{tMQ
z<eaZhb3Pl5>MYP0uxXRg;3m_C5W$^B3ma1u?2S@@yD%Yx9P=K%={pD1EpcbzqQlfW
zobMa+@2J138cLlB{9E<FvQPlAVw~cme4%+;n@$IMXMG5^I;0TvRa8>dMrG~y<98`N
zLgoC?s2A4gt5<NoD&|zJvcvy+T09HDX@k-OvbR`u;j0)x)ILxUN~aI)mw!n$Ej{!e
zhS1g(XjZhdUMr?l1v|0s;o{29S)<X?=E@{;N@~p7;|vX(@kc8Da1llX36bKpHV=?F
zoO>^4$uPvg;ObnYc%^eocfRDl%{)tr4q@e3chZ2;D1ti`RM!7z-#{mw<gZ#8hs4(O
zO3U)PgcVb%<RZQ?K+RS<VcP0QEWW#B=F39NN?WaeVd6kSs%d8g`DU|EFsAgMaUNkW
zbF7%c={%Kg20q>hzN8p!hqyD-hYoLo)hRS?>Rr-Pw@M27RGIp5s6S<ttRnSrm~cxm
zts@)dvScc>7-(=)>#-2kuPl{p<2o9uqM0o*F<^UhV`_X%lpA7RvaG0jD!C`wL+&}#
zYdl=D%(g0`I2q~rMn%=7koG>{dz0EjckB|&)t9XT)^3o8HVX2;ra0hDqMm0FwLsY;
z_ZB>n0K~*pOSfl$!lmQ7JN{cM(4KCo^0bj`marxG;Apl4vBT^RU4%20Up0@hl!G}y
zp+uMDmX{rP+G)e@Dh(04t_mxUimK*SFB@~)`YgyTN3En@vNXO}O@C)L;T|fQ>Md*I
zHCe@Ur#fl+!CK~dIql8jx*1Z{C#H;up;OmDS9HH3q4J-O_fj<5tXR5~e`s~=23Bgz
zCOITi`BW@wJy}K;iDXTu`Y8Xn(`c$W;!tuP06;zYhTf`Zx1?x}eikXKwAZ)h4b!*W
zec>e`F?}+NZCDB5DmJJ-WnO=2+I7&R86pbAP`1?+Y#pU9nNg{glXI}I8=gm}ePr$t
z&|oAslO;0FSaV+MWTy|)+9DLgZkcT|Ih;X6E*r8O{j6%b)2Sr+YvIvw)%sCAeOt9K
zunytS@BLf(ws}9>v?Uper99DUOAN0Je_c~A!0NzZkxnsoyL;0zg{`c}6~PX?r9N<u
zONoXmDmQG|ViErG(L<(Obcig_Elj^UR5oQ`3Ep$<9u_N=DEZeDUpX$lrG}4kc6o(b
zlsFl2vA+AVOBM%<5%jvI`?yAz)W_sIeT#8h$6p{_fa<c1r%wg3pf64zH??R?*B-}(
zw=`qXyrXJb?Dll>KA371Z{IkR)=s-3(rTcbXxR&W3~^;Y!@SkXvSuSvm8#>N4{tay
z{(*0_oXff1jPWd??zV({adfG+sWyUbxX-6cv#vV~bM8JWRw1MC#J4=DF3YWku(qN0
zF@wl@4<Prj>p<*QsD*KYx<9|p-+;r^sBAO2xa72*t3;ud2d?mCCWzrw8Gn=tv$cv%
zgmSn`pT`lEp;8b45JLz7Rs=e9g0res{F?ZNZ$lmXKd*jKyHfq3nebdL6)S4tQ!9!q
zqBB>d2?mmdvVVnjX=GMbV@>|GDGJ*EP`NUM7Yn|SmTmc4!xLF&6h&32clFw%fG|r2
z%fZ&T8J`2zr78pF+0gB^Pq&YD{pXV25Xx<29j!%9CoL#SsNC8(_&H`8T!U&TS~}92
z)o8;s)tReThN_r>zFr{$Mos5JNdhfXr)y^D&PcHHXr}~mD(BMZ^^bVt5!x{3(7K1!
zBX)xZG_4i9rpgkLvgO<nIcut;Hd748-EJ2pz9EOLUCBI6T(x}9IX~05_6Wk*i^EOK
ztsRfbL`^2A8qIVihcRX(dHOoJbdg%r7Q*Tohjy8$oi;le_aI_wI+aJ##-vv8l4!DO
z|0@ltZBS{e;9MmrlcG)u>O{A3)?*fsltsd^28JbP2m>bn6r@$1RJs2_<tjX1mh6nF
zYLCZJtKCrTFBM+wk*mDcWlzVXO{H^$hiX!)*u)WMWU*6R1@6<xY`O$AX?eW^io(lY
zSwtn#8_CN!>=|^w1ebR|6|BGHL^{a%7w}71+@NnlQ1<4H?A@c9l0#y7aQ>&NyTdlK
zqlgJbH-bOMH?4#5m$6*@|55N=*xO$JrKqOs5?CTP=Q|Yn(fxWKL+*d-p8ucU`>zu5
z-Ww)x2E@THAac1L!hPA3y9?Xl7&Vr%vQ)J!&R=8S{)>E~I+_DL9+<};$hGaEqXa$)
z_C848Z_?WD5mj=3yTu*6<ZbAbwGwG~<-2d@pZ_+`lfVvK&ZN&?oc`JNk=(`q_$7ri
zn5-IQO<^WZxb>KWnwr8Dr})tjpOv$h{mAxhMVJ56GGUS`5>ckTQqQ;P99XNAC+)tZ
z?cncjvw-Xpfnn#Tx}~(d(zCgS?FWv28nqD)>M}gse6YSi;p?_OYpHwH=VKJ)PY+@A
zzwd;SR>qY}dWJ=}B)@dm<5LJ&2Eug+<=vv<S?Fv%RrN~goch*Rv_{T3qt@RU4Q<^4
zn#JDPUuS3qPY&Gk(A!T=`*g%j56`E3kH+=jG_2B?<Ec*AE$^>~%~a$pS(VU8HEheR
zDD?MuBO*x1y}}s2%oSm5Q7*lc^?M)X`IAfDhq-12&zvpq*@X)}AjuX9y+ds`B1wjU
zO06}1$o_TBH8Y7`AnO+5=V4-u<3^tytObxN&EQ_n0%h?*t*L&(olLQ-%hFE^L50Zp
z);=lkz%w0|tMiq(>cs`5(ZqwLIl?%O{^4QjpZ(pDXii0<8B%Q-B*d&nJ4xHRFvtGK
zC&SP7m3`fWOfAWK)&)A0h$+?$UiD+cI~_d4{IzDdr3BH_)h6ls$CG0C-9d_P@2qdm
zbPNCq`D#OdwT=oS2Cdo$k;B3Lv~|?_m2Cqc0N&!Fe<<*tN&5I}>tQ_Yvg?5Gi!GzZ
zJJRBCSoic@Db$G8<|g#_Z@Pxaqn%?Irw$OtU$Hv7a>DuW|C<W{IoAY8Q%olLgza(B
zGR}l+;wl<8QoWsDg!nc{*0BpqUMgL5m}B@xLw|!fi?QU%aLBq={|dW_>=AN$WxiUy
zw)?zP=s60%4G#K@@F+?2>%P_IJP>Hy-W@QaFnyb7;QhLBtE~jtX8Om#+uKd^ZQ|9e
zSJif^d+60v1AbHF1N;7NKqB)Z#C0&EtSY0uPyd+E^N9itw{G`O$8pMjvm?;1v99FY
z&Zu-jyKwJ#05(-)z9!y#mkO7`C&>O4Y7msFAabDV`^Wf%vg5Q8vVI;1$E01(k_rF8
z+S7HH3DEVk!X=*<?x$}HkDzWH5~6Q(`->Txte?DuY!Bo91rg-1Qt`W~374ihkj)Vn
z-(5!9_9}13qw#qeVlZeMxqe%(BA$%g$H1NT%hJcuJPzpd-_ikS#ZfM8bbht8so1w3
zWtq~gW#i7_ZUnKhh8Svb9{UMC4PwnW$h3NAwbtGP<dKhl=%2B^s)vNJ0phf_(O<Hl
zT`T9CpG4~MYI4~r`cEe<d|&;K^6D<{-v*>J4C;lZ`LXbWuG6BjUhPiH35MGXXzll)
z7}&ltDz;WTrZ`Nko*pf}`lw{r{@B&v@4YucYqmUfPS!lk;!OPAKk3bx>FId1me%!7
zxXP5DA6Qx9PR%fH-RYL$(u3ZifksDqv^?qz&S?jQ&bF{QpG2PFdM8ahWU#S$m)aNC
zan>U-MXu{PCiDq5hjQ`z)3iVqW$qjPCjSM#dxtPctmzaM)b)*DETxQCU3b(evDqV*
ziTHg(JZUoBb$#hJ4y(n2<!i3BE6W!>yCtoXR`TiO7dpoA1?k}6lmI_Jd~s2-5AR1<
z008kX)XM!z6!=9w!LoI9Kj_|1(s9i%Heh*NG>*(ND5B9=OuxvThqLVQbuU7z|H&R2
zF@4(-GjyO_6?%NJMifaCN~!kl(foDH^JMNCJM#+oMqjl1)?YD(W^}@D#mXirv%Ao!
z840X;Ow(~et8ULI+91Xy(f?J~+_RHowLz}$?(5`9sbE2X6SB?Q#y8aGX2yEY6`L2T
zrR=r9Vg01@-SL1$u<Fq1sIpV;mC`WPbzbjxZ_!^zEUON+p}?eD|GK;-V8DmdddyB(
z)Vu>%g7nEoGQcvJCuB@FbW8J1!)3l4^QZ^lpJsceO?C(Weg66_Pbht3JhUqI&wI)E
zLlhttcXLMUY3CQC`{ml@x70CA_KcCzPu4ycif!}F+`Rf2|JQhxXWH|f_Ei4^acuhq
zYp}@l2PS@*&&ByG9mo-2+hl-Cg#XZ?RsVh*I+y3U?@oI_y;CXTl?Ll#mnz|JPz;hm
zq2pi^Sl<O#<8`hhej$Z?g5Se(OS22ZXYi4<U%(u!{zmG}3_Z4W2M-_3IC3rLGs5N4
zQxk1D+`J9mMp`X)+byE6L+ih<?n|3#oeySs)6C2**84u!&6$7CaoOG=4d&R#b4~~H
z-^^c4Ij9r5D-JS6Y&qAgTq!Is0&3`i&3v3bv8YEaIg`{#7W!ZG>1#(SM8*005&}I;
z3daF509+b|KtDG&a+gw}=T`XxEmo`!F5i2%e;y>-?{nOSd^$&m!ws~-9M|noK{grF
z7jwKUMP=_dcG5ll(p#_(@99gA$Gu@bZ};ji7q@<qmLexK+;;p!H;*;(?Z$7tIM=^w
zSbasgZ26Xvp2&E)@+{>C2zBOuekG&1YDei$8Eo<o&$fM=0oIcZV&ErJ+c0EURv$AP
z*S#3GS;>~CgO*TTjMLXwu_U7L`@24PZ-;rIs2s}_pIpCetQ&${jv;~1p`ZO%H)jzS
zU-{`LO2YB)o0@4#%&N^n5$HPZH<*;_L|mTw#cYUPsrlO?zo<<Cs1#WxhFx6`(C0Jw
z#{Iihz6^%T9v;6C)TB*>#q@3BnC24I7}b7hAL)s652!sF-n8uf_(|?Hv4P^|C)P-S
z*Q@WLeAx&_qg674m>1#dfIJG1Y8Z(McPJ8Hr?kM9!v<?cU1J}E3yyb7hW#q`&^7m;
zbsxtOPcsD)Wq$EyXDAX&*?884>er?qAPoPYQfARs)3Ft$ed2zQbx&(1Pur9=J0eD6
z?P)ajTP*#jdHlP0^FG*)Np>Z1A2v3|^6f4HZ}Df(@{0|1`#s;6l>njGg_Gyrc7xXq
zni@GMgUEJ6k4UOOKv(+oFZ{cPHGP(S{k~bYHJiyTy~%CfVh;~g%ynKRW!d^mSh+=m
zyFG!t3io=puDGSjQ0Y#EZvucrz}5upFwgcE4K!gfW0|sqe#u5t%$Vy=_BWvMdd5ti
zIn6KXYQiCZ(6IR~rxa=#$9kTZt=hvvCDEe<WtNAtrPBSXdC$z!DPnM!sy<X8Qc2;v
zuLm@Da!bVgnCfDbWI_I+kv-~H29N<<e<1%`&0vtTltb9gdX9|#vDj1sE0wcZ_o1d8
zBK-<^wo6AKsdxNfKVgtb9)_{#Nl}#9GZ0H7aPY?sRo+fsG9}v+rJyUewNtp&H8V#4
zg|4%L0c}EG=smjX(GjkD#&E`l<}262dU&1S*I4tJIW?>#n{7NzXV_)a9{X$3kvg&k
z60)GWT7yndPBpoEsMSHc$VB6?&Z@{&QBAEh{uU~{rKsvx;luY1Y4l}lAHJ^x_3rpP
zt8KzwEY{Oq#Me)Zg6q#u3qocRYkP}jWKP~&5d`Ud71^hp&k|=9WdAk(bVAjh(Mmsj
zb&h{p8PnI2?iL7o+RL5UFF)Pu7wIduuMADTq@z-@x7MDThVONrr}ip-h`dhECvvs3
zVVq2B&w4~l$%h=g5T60p0VJzHBt<j%Mn#Wn5}Tkj%D?^dpSh_TdgDPS?S3v63IuEX
zvzEL~UH;J`6_+a!O}O0|>Q&}@>_%+Ow|)t%aU=H)=Q_=}#t4I1y{GFF?|ucw$2GYv
zP=w<bY2{KEpWo>aQF@cy@pXiUYVOq1wy9at;1Lt5%;90}-9g*g)|nY{D8ib>pI!Bd
zL%cpej$^aw?DObX{4^lYORyRnZ#m%cxK3(&KcyYmnfGedGAdQtwr6E)jy#Y3${gON
z`(2lLe-Kd<S3aF9QHwB1Vw}<AV$DN0y%l`gI6X5^WjXx$XS6rDX0GSO<ASFcfQTMT
zb9MgFc-!TrcRxxn$#Ajw6Muo$3;*NmgJdE-^fv0L<+X?(SBip{`__+TP9QpYR%4Nv
zJLmn~nZkG_v#yWGAsyNF)^T5xI@4u1;QylTD|qW@nl&A>9Wyh;%*@Pe#}G3!Gcz-D
z%<MR3W`<+O%*-4!!!hi6-|t@SYOnSOtTY-OY0gYnS5<dcSJhMPJ0QegJ*ItKoEExT
zHJ&%G0EWjUiIe>aFZM961Niym;8^qW&BiGz?;K8YJ>Z3m4?uoHjmM>F*DGl~4VR<Q
zqoAwR&&qo`MquOt*akCfejzxU#FoxBBU-KYvelh``tmm(kyu|l)IopKzvh7G75?!W
zW9we-dD$O2aFtE;Xs6gTo*ypI9glt57eC;Y;?)^0;OfhyG)xJn$9@=A;~C+vf$h>$
zvrODsU4pxRB==&FCy#Ssy9rR>0#tG#p>sUC6nZ7;e56&o;^;w`J_;cyNBSH>HALhK
z7sFxKpkW@l=KwT15qS}wi<672o>cIGZbkBy8$Yy*jq__(g@)ka!A$vV<9+QDYsu;E
zQFg){D+^!DXDy~fKm}e|{3&Acc={k>KHuX_+GRq<rz%0VN&hV3mTiO7-HGLZSn>C}
zoqUHk&X`%p%I$sh9brH|3(qraZb<cM!-G;lkX|X(iG0iYPwbqGw!JP%7JV`kE2KxO
z9&gRN*9K$GvpJ_Wx40~c?5N6sgr2yF)|T8ShimIQDsEc9_Ya!2c^{z$7ac1>DHB%2
z6p-rf8-N)mnlpcrcdqrq_ASP3m*O%y+A{Cn`Jk713w7;tfww8Ey$zd5jHz0hC9XB+
z$1{DH;4yu`^<T(YwnD4pHTpz8PQMdQ)cAu9q-8JkJONLr57;`<h_<U&eWccoxH^7S
z3q>#bf(Q2U9yPQ*tCn@S_!_k}G+&C=r#v*nXi}fY5?h-k;;l*n*?JWo)<3C!k|`Al
zz#OMJY`GBuS)&6uO>cS`A~skh9leQ$&IDT?JlK9EGUSnJ@Xl)+;u7$04yKTy0z$sY
zfZzQX9`8}A-Y#`$O^8Rdo)3{{^p6WViXsGLgLH3<XVBL1f}z%CZQ(5?0*$i6sQ%|}
z`<!4;W$$ZVKz@FN;u$B^>jz>8I)^-+<g`L^_Z!b6WP*M5{*ar!XU4PDey7fO9)Xgl
zxeA+J_;RWoutdto0QOMd^FpR9XWE3S+VeuPZNR13L0Jax{O>cSH!IbpEMQu$NbKnM
zHi`c=tcd0a&$6}6sw93VU$3mv5u{>WU-e(&-C-!oPBNl(MC1g&y39i2{wq*kb*(T8
z-9(<=0WlEoPn~7YtwPb_<nuD22@A7A@-6S=a<3AzN0J^-0>EQ1#iwPSwQ)|W@t)c0
znB^OKkI#mY-H{Z*qQ>){W!UZ{?ewq?j22^T{RV>9rkB5s&*D;;fOF#g<siHNE1+Zq
zsbx(g%`I4`_KG{A9^)-nFhxUfCe8E86s!P6)E6v7_9n2<tIzVs1{1j>?yoO@*vHO&
zV0|p3E#I+ZiXmkS8|+q!BW9gK0{}fuKPfG@L)KgMXh@>!@msev*232e9TLn76kLp9
zKejax_k!S3tqR?OS7##@g~GFXA%$y8pN*HNjCXCxj)ZW&w-@UfQ%gu5a%cHO64`*E
z$As-rLPn>qDk?pGd37$Q#^K@IxN4|v75@{M{IvxxL5cirTmykN8#vxV5lb}Zc}ErX
zfU4U&;C%;G#I=aX8a{TZhOg9=Fk@9}#b#^oiFw_!95wIfsV4=;`{+>4Woy`hx>+%Z
za6klL-S&@u#X+X?2upg;nu3d%OIGj`2<tY!6SfajwS$=O{sEsf7|Y4cgOHyaNN2;`
zSO|wUtX5k_1V*hwNpL<-AOEOKC=?|0%*`7WmCL*PS0ObG9P#q#W_?~2;fRRL<N=j}
zk)}9><5(gRt54C$9T{Xl#R~bv<9uJGe;k7R38RF{Sc|fiW#=ZL$+@&c<cg7~wiL-l
zQY!yw(dk*`m+8+J5f)T|FvG%Q6S;Lx<K5ta{QTnebq`wIAmykZ!+aU+53}?DUUAnD
zn@1DkaU3T?KrHi#FkQ$NK@NeH-X~wRI%2w-ko~?Bo>LlrrybV3+1t-$Pm>c->vI$*
zE8NdVuZc(dIiTXc-V4pFPKm=e6G!&S=LCw7q06Bw{@`sdSTM3*N!A<R7IVU2UvIDP
zLn%#c7R*CFtVws%^LH*EPH+%M56~A7Q#KCY3F3&?;Z?q*4kG5&zFP!W%tJ<Xx*34o
zAF%@>c5XJ|ux&Vlc;m8N`>0Cp9uZ}lf|ImO?la{mFZR{sd0p|#kF*h0OEIG2XHS*}
z(N^ZGc{mWh1|3<5#}c|oo3S^$a_K1wA@eHPU-aUoX~JeGn<Iy`Q`HLvPpxyQZP+Ym
zQmuYcUoIZQE!&GAl>K2%49IC(m8tSGE;}9TKYkQmkKh-)Im6a|N?$5C$Ec{7M+po4
zn%&P*`dvM10^|{SYT^|szQudM{>WU>q%!OC#OAwQ%5FYGP;ASOy*bbeP<>~qRj8ih
zzhiiOq#@cyEY3}|R6r3q{wwmk+1dtdfDrf3sGAY8VpE<MHo|$3zzO*`ka&Afal~kb
zLCBELdYqs|$(ecwi&fYXo8?hj2v<4{3%kjr{@mk-^Wd^}KEv}8F^nLc7H;mw{QezO
zR*wy?v|po5tuEIi`1E4)84=Tp)@!QK<C0vN5svf78$fXXP&J*jo~XpA;%dci*NTY3
zMf6xF(0RMoYNl(wqxjg~zhMN~;D3f6`7)U9ncA#}L-{qM5+Sq%JVAX@F2~B1OxR1V
zweS@P!b6csap;Zk-;z(QUZMV0_Zww*MmQn{sPLpV{CZusqLtPG?XD&v$gb-4Jqq_k
zt9Dm)qT2!x8a*%wUk~B~^46aG1J3e($b$S5GQ^&t3x+n|Ej2R-z21T1pbDaFSz_<e
z7LQwm*VO8E(Jtunf2R`*F$=$@Bx<Yd4Fq*pVNbCg*yLpe)=d5$*Gem`X8BVjg!GUL
zRQ<`{UaSPlx$XrJ<@DKJ0XjeC^v=xE_DrV=h-~bES={px>vt$J)}3Nf*;=fJE?u+4
zHGz7B-clbOD&gi9O>UpI^eY7PimbcmzV`GljTprbt=(HNK=eF<h<;9q$q3su@x_FF
z%KX@%7A>X%pTD4!_jp3{MCu9S8S_eX2si4tWSOV!YB7JQ1={run(RPl8S@lwq1x&(
z)utPj@DJ}EJwB*t*P~3-5!56#N1RLxh)xk56(^#oSRs!R`DFd|_%y4GkNRf!Xm|Kt
z1jM6w;I%@R<lVWs2~CPeUh-0QMVF=iu&!8Cw$ARHaWy>Rg{{Wu_qB1%ku|G(bx#rc
z2&=Z+zong=bT(AVII7>QD+gYhY)HShYjZ1n5-vN7RDZwFH?>!a&U?6WwkF?As=sh?
zvQH@9Kp3tY<zI&Y`@tK+!>nlS34e7|S@M)auASF@mMPosxwhlF1qM~W8Mp0)(|cxV
z168e@Bv<qNCq7Qak0)NCjlc25UrJ!c*M^3)6JW9I1XF9T%+bl+;1o>c0{n<cj^z2|
z)!0~WE57oQ%SKe~zMd#4JL%(6N#TBo=zlg1h5WH1?A94e<RV>)?h7#OyE3!BQFb1A
z9A`{IYTxPV`jPi=w6l^}##H{i|1(rCg3wujCC*G@&EdE|?&6KZ&b1C_!<e&6!s8KE
z;Acw7WNVv|psCZI3GO_-`^JjxV6MX<lUWjLhxUbFCoz+@l#$^tb^MRG*F~5`*qv!c
znx#2H90>H4?&2XuGbtZ5P7`d^a_Kmt5owF4XgTpPiG};|5)AprF1O9u81_ocl;-A$
z$oInP$AsE%_78d{dOG*vF$7Z%8Khq+LPXg`dc2c~@wW?;)VzvQ)lTV`Mg`T}2)_}N
zZ7b|;Z5CSNR)4ubK0?s9Cnq<D5%Is$dytr?GxUaP_)BjqX|QI?&GO63w3X6^Zi>Ms
zM>e>1eBC;P8|UC_r)OfW*_w9SsHkI1_FuGJoqMxb_oly^#d=xi9Z`%zf!PRK-Fs3B
zHfjemJTFwSmrl~h#*Uc97D#L-IV*oG{P&y(yN7(aL(p6IxFEbHQ}(lpOi&)f50H49
z9wZjvIU_QjSPlf^3la9njhFI|bnhf$+vJz<UZl_B_0@(ZX{_IIs~!b6#;1djttzvp
z?v284l~9L%q~mQzfZft?&(yx06Ddf&hNPLv6!EmeJH%?U-Q|{0#TR|oS!7;6DG_MR
zzrQvi2k0=NkQXYoxgsX0=NcJY=^Vd@H58%5?E@XGv)yW0w^t=m+Rmd8iPj10%keju
z3%|=Y*1h7^u-C9w*7OrTRjvxeE^>t{8mn1_-EYCbX;>(avz_n-r=|yY<Ax+*SPqg;
z-S4{*lG)~i>bEei)s3c3JJ^}vS7T!n6E7pQ($zTh=@wYYw>Kp~mbl=NIm|iTkid8$
zoW*O7O16y{q<%6~YtURx8TxeWp2{c)sPiA+qqu-Cd&acd3PP6|P<8_8iva}ma;ocf
zJp3M+A1A1>e9)(QuKMOInZt|Kpz|e}_<M9zW!UF$+&gW>+}IA!k46c5UqI7CPC)L;
z=v=PggyFhs=GAlXT?CnI*0Y|CxGM6T=OI(!R<uazB;#uX@@#rm!O?cfUjquW0ck$|
zZWw1`QThX2#k)7VI2CA1W<F?X1p>h3s>Ec|Og02q+@n4#kmwsn1G1SmHJw76Z-}_a
zlzlUL0`g|=GWK}%4*~g;4MLzL4iR0h*Ah_9Rl|ZOb90sGZ}j@;BGoT)P^iv6Xsafa
zhEMEh2Ay611?zE!+`I?o_}n1K`jT5xw!g4SMAqu)J(GJgX5IzQ?ywa)fl?jj^i}?B
z8xH}h_=0cz@C82A*`w)VoN|9yPd=51!bj?SFzQTgAKsyy4}vP3T}f%+K{8s%@|?1c
zXoy2-XqM$FVyVESqx?fi5}?a!2lq&OpU*Sl=o(tJN;{tAM2MdKb!7UNt-9+iCqHkQ
zm0W){N#RDjUv+ekv4I~GFy+Fhk8)Vy3LZa7Y>EgdJ>JM%gZ`5SUf{<ZI>3qP4Hitb
zbaYd>g|{DE;sLK^)mu<d5ZZ)zXR|)Jc-7H;B;#V2X3mG0oOa6L!3g&!rwSQN%bJ;T
zb!5Pun=gq29nspNcR!y^>*aQM`H7kq!V`vB8UTMrH{bR!WIC3TfXK31qb8eu3Fc-$
zWTI1!Z_cRXZe<WS6BN^NL;HA5&KuuqFwr&G(^Ryqk{SN3-nng)D)P3tT8NTG@_erT
zy1eU#C?b~=XKTHGe+;iPFst^I+A8k{YG*umlN*3t10mTQA35Dakm|Q`lyg%*#}k2r
zvh8d)F0!|GLfMJ6yc-5pCis3aZr8gaE5tNQ8<nA@5*^df^6Lt5jw$z+|Cdez%(g?l
zFdSQ`cP~$0t_6ODMzW%(Ix!t3c`MS%{yp;jOTriF(zeSyDo(zWsXRlHrx_`C`#8tE
zU%iNr0>W-CKMUp8RQ7eePRw7n=VQ}P9AbF7zbamJJbi3c7OmqszI9vmsZyI6wmc;Z
zJhE>!W5Z>X%`#Gt4@`I@2|QCj3ZE4d=+~rV8GS(=yr^~+@RV9(ES<v`5HRIjr?D8Z
zsp@FF=c+l^d~)WUv~$2Vw=<T~{Z0c>+#RtoTmDM`=m46AMfrZT7+|h0+0ptJ&L2E2
zeM75^EAX?H@)>tlugks2_-qc&OQHunLh<j5;Mmku*`#No4#OqX7c)8;es{!x!Gp>4
zRn-}%Y+jvyTUuV^++Bdp!CBI+%+zH8I;Kc%dmkof>QKsE)nwKeE2#VoU1_p1$T?Ub
ze27ue?3-uYnC&9G2m}E78#r1g6MNnnyE`G05AEA{2H+lW8)VLQ1&=p(<nI9NuHqiA
z*HYS?67_QZFSxTOPEEZC*!d8B-yhYr|7@8Z{oR%vmf)Rt^c3#3`<(AoMrIFiRc?G9
z0Yo)in3gFPBH};YDzz+iy3mIktwUC}UOVKHX?BVFz#~0g$;(Adsl-uPsVePh!!cb}
zsM6}2P`8hfnbbKn{xMPS9AgJWh4A@U_18752n?2tBoynvQkp-ZsBhb2{H<AU9Sn_F
z7kvE^m|sO?^8bp$cG0ndeg8Ke`#*1O$YWr_|Bn~<1j!-5!v054_ur310uvFyF#aoK
z{C~fHDfa(OgQk8PaR2EcG2?n@YK1hIYLH3+#}M35)JO})$Wa3`w9p)}NAkaH7y^v4
zS}vk;eO7axB|3qNl*rgE5pQcANo8@rXr|DIqKMkc=osR`Llvw<dx^u5gMyLPfCiE|
zax{3<|FFt8dh|c+j#T<-An@~eT*?I_8xFl2OTdm2QD}G1xNle@zt@R&+Au<5Qtr7Q
zGh-WQ^sukk@Gp(eIDY0z;h@J#9$UgTIDea@wV1o{ZYLWNdz>5M#{9EWp_bJ}KQ_N|
zFfzeF&H{3sL6QOgIFnlE;mo4USJ5mp?piLwwam+R*P@qt+e;$o^wA#o_gvg%n&to%
zLJ^h!kAmEslB$Az{YR?4f0{}W%b;zEaFo~#qCwF9Vh(;k=5Y<hgP#md=|AC#^8Z{9
z^&q4#!2ipjkC-$W8p%!u#klgZmFSolLAk@o6cd_{S=HPU`L?!`M(sErcH6s&U3x7Y
z+C&gb1aWyAs<}rO3KRyFo_Bv>tM;=+H2A()#z&*yZMbEm!j)6a<*em?wEfH7f2bp-
zjaq>Fmo#_^yx)owty5et#cYV64zHM>Ic&G@Tt8^w>1rXJ*syW4CJ(SL0J1?D2q5;|
z)W;Z4Hfrd7oHtW4ZQsHf<6N?HWpH?!EkPK58d_J{)BMw4G-|_Aw438vu9x{`ybapW
z5<7|18SO{wkalBnpAo|*DM|p^e~Gd~nojB;iuP1T5z&ugP;0jyoxmrf>M_iyCUt4{
z&bt!utwD_7Y3EbWP%Jck%l}7)Ljkiz;;Pn^Z};JV32*6uPLT>+lkDhY2hXZS<8VHP
ztRBjIbsUu-(9pVp+mpyzAC4r}cl)AQF%tVoK7lFb@7v80=+}%lzippfUy6YK-?WTk
z|7WBMmfq)n){@G&GcI&IIJ~_6+h}Ou<m+ykxLE4>n@^F>C$37szn^!t;K8EEdR92p
z%xN0<CJSBvugF0{G5gmI^dDXLmv{gB=0u!K6Zns;Z}MU<*j6|yEY%EObtguNLV}bf
z8h^LDa~fH;8bA4m{0TpLT!d_WaVppB(owpcW?pq@R8Dv~qeB9Cl=5^_O<u$$vNZ~p
zed&F1<m38Eij4(v6b(|@gVx2_;!J+ynXS^cYh(HDBHWcyZqGlaR-8sglt!K#hK^Nk
zLI!>ul<`8BYB5SFy0T6?y=M3SoeSW}J$x!)Ri*(kH3>YJhc@u3i^~p6Tt4=dSG_qg
z&FXPa>GkX$Q#t(Tg6o9AK!?$Jxp@0p{l>{!n8`o)$3d=@^KSiDk@kYFAA{F%F8mGS
z1#y$tU*Yud{OipaBp)CHQd~)-@$)Mkq6($&jb;1?XUbiSVC^InXpRs<F%z~x&hcL}
z1`&H>E~;Zehx}?^$$y5=iG!yxgxdA754>ytc8QD}U(Ve?C$TyIcGyk{a4+QssYnIx
z-Mm!-s2Tk7^*Z_CQi=TKW1-a-rtL4P0EIO^OfJTuDkdv$7)$`MO`64c<NC{FMeD83
zPQXS5Rz7q@xE*MS&Zl!eD={tOUN&(z^TiqUHEIz2^&em&98YP%IL7j2WQzEQ%0w;z
zCNB`fL=`&x%d&pLjUM+cF8X|UsSaGF%WJL&?T}AugD0R(3dJgKCBvv-|HHD@4VhY!
z5~EuT8an`VF8tu-JKsJLdIZ92IA&b*(Jii3Milb2XTmT<rz(PLmhV-OSjErl6}9kL
z#uyNDj<LY~Pc`H7y2hFp12Y^pKhu5%DGzTISG%v=w`ax^4TF@2+`f5G{_7(l*U7mm
z{kE)TX40*ESL0?kR)S&~?n&K0y{hPdlkzC_^ramf9uCwv_9e<11?g11Zq&;b#qqx4
zEpUl%;Wxw)F9wLlT^`4s5}6i4b%)m5iJlbb?vUt#<f9`Bn${toCrRBpJ_|>G(w)Du
zeYJTLDRTIrs%<}+Bz!N0(?R-iT+^wm(;l#{9S#Rn%5r@WF_LjPa~9=Yk0xEJ%qq=>
z13?~%S#I<uFX-TgIcTy475;0qY^C%V4S%Fi4r5MM)3HF9MEe00IMdP187UEB@y>;;
z8hC4I=xsL2)QGOIrKD8&`F|Rj3mgEJ^^lO(`CYijU!YPBWkb!VdtB)Om92j856|dA
zV9d)%DN-OtW3XYHfFnaoA|eTx8f(_|l;GlQ_T<;*r+o?B8%(QTR2Q_Cql16QoXV+B
zOTdytEvxV}Hb}qGkIC1461*XleVSWZh4+3!!rf?_`QsqgiHIa<S%k2nX?57@N2Q>C
zjD`zFEvQDOhh@N>TDl<b>Az-X-FH;+uGRJZ4m6?OA0QI_oT%$u9!R+N15uS0o+lDQ
zhODDm5jodUX3@9P2SKSwXs@Suf^W4r4X!i}CVb<<!7+mu*r^}$GW9|Ugs~I!AQ^O~
zF!$ydqWed)C`{;bNM44J!6ptcQz8((BQVb7;L;Ekw(It;#j1m()8f?q*8}ECv=^#b
zuRg9ivGT!81hDcqGYa{(@l2aK0eeyJxbzTZ%iz%unSgTEgE?ppruHisupXY8nayo)
zr=cc8H`4x!!8|hPilzaVCXl@%giWK`4Z`qVQt;O^4dw{~dqBGFN;iU>8LNm(KS{|Q
zcTm#?m}%=jFba7X7Ka9lMqd1$F*fMKc7D;?-h!6QN~Wk(!o!|3kiedi4z8$VigVRG
zlch8a))!CIyojWeDRY9$p2=&D2375@$%$+NL=1Pw7bm5k@NLK^-Agel*gS~x)1m}{
z`22cgigvH1qnU`tjBumr#=l6pmwLLgPNt<@_Yi@tES9ghtQ1rl)N`B6R9TH6<GDW4
zCby+oxL>fWbWis&8{6x@wyDxza>>?XViR1lf>xfmheWAI^OLb`ZV}N%J48+5tLC<P
ztgR;qFhCi`;C?kfn}=Ux6|b3!wvwe9$7I$~A4GKqUJRW!G?@RNf6bC-Fj>nQo5&RP
z-L|sbxJv{=0OXy9bQb2`9K4YgR1@V5;lP8|p0yF6M-9!GPY#3?Y0XG7d^x+Kv%^O+
zwY7wVhk!MXl%=B{B(X~UE+ZN$2~#~H!V#3e=)oy$(FG|K#24dVO)9Nx@13E=mac_h
zKO0$Dz+rU$9kHfZ=GQA1EP8&Jb#N($8MZCNR4w%z-v|O!{Mf6l==W~L4Ft>3JMs;^
z_S=-q#hKk4RT764`A*C~Nv$S?WWN%vCkpm1c7iVGQ6nV&ZVa$D;LyH5Vk?F%By*^^
z%-pjr#9%3|4k3`$_)ws+SBj)8)<t34lyPBLsj?Xa1s6x6Y5t;>tpO*I-*K0yxBv0$
z)M~>TUPAPoXQ&jDnc@3-gvWq{>=u-5afmn!`<ssV*b&uzQD8}nNSpiDi`I0jX)<kn
z$~Nz~I!*H5>N;m5q{>!sMUg``O`OYccmBCwkDznSv+(u9G+V*_^i@Fnf3l^k4N{Op
zD_MuV0*f64c=(_B_aYz+2u4mTGUZ5=RQ2B8b_>gT#|%y#@3{Bs{$bd~!%+;9j0+;2
zZGrSJ@90xR>1;l_ax<w$Wm<I4V@KnNKffi^v_%2yPd}PC<#gADBfdnbQfan{?k#Pz
z`i%!sR+B7^eNJ!r3F^6T!1#Z<=eyqdIcE~E;6em~MGB+H??sn%ku&^WVqK=q&`ZBq
z!=_zc3y2tJI;YL}!`QymNjt8uLZ=+WnvyAuLc*LGIHokkRh}|!)ys{XXOSm^115$b
zc{z=bF?HXqwC$p_efg2QeJV-rcfO?{-58@u2@zWmFWvZb`FwrU(AOYrt8P<IR^djf
z*>K|<l_r-&2gl>_MJHG4_r>lC$;D17L{X_n6#h8h-`SfTNwFOq<}*pn>Tzxi5jgwh
ztZ5g9e2WTH6HXuQq#_=Tovd6FGl2o?M`;tw3CuR@V@(}|`YgY<G&F|6GL3|KZGq%C
z@Q0rS`V5rYH_YT!+f}TZUki_Rn=ZTb&zpi+ZEQD2jk1CrNpP!tbZ@5ZJ~ClZXvTLp
z9(gX|Y8H&hAbTz9{K+DWXu2q(maUyu1zLZ&3uq6>`Gg7&2H@`t+G)E|xKu({f3I7L
za)KDv)^PTlGOfs~n^mG-LYSbCDKGog&Z6vjH8w?}-KTN(>*XtX2xrNvgLx5n)&~#0
zRuS>ko`e9e;G4NtA{+7LLB+mB?;sd0QWO@G7+%u7PBsiCZ4@jvV~XUe+fO8I>Rx18
zQbkA`f#zbn%}Zx85*&KNP3Nuc#sx)KY#kJD`WDRNCBycw@FF>V%^w73vtUe{?e`Xc
zpBe;5nMF#6s?F4rO&>ZG$kcTmRehmoZ|b1Ga+{k;;m+6E<-?LlLvm&F329U_!7kCq
zR`RHQyM{>?^~j4i816Xiz%9|^mAR`q{w4!OXmGRCKw>c<+a2?Fa<2`UGp+$#rXi_Z
zFojGrZq&oq)}YS6alnv4oLv_s_);`U8xW`sQ~t2?1sbzzhE;)p4koa1w*RhnUa584
zL_nKTzPtL<;Jj}bG4l?x#-_oU!>76zL8M&rCjF|Ls1~&n<)>>!yTjIJqdR#c9rUDO
zXLq$bLd~>XI{s<=qx$yug$-c9l%iv<u$|#VmjdrepLswUZ9=#QlIqcEI2T-sA}6QJ
z&Su<sFYzKt=ugT}mZ54lwXEdFtmoxz62z8}-_|a=*i(Asrwfg*Zc8$oMBfw|o~&G!
zAD3KAu)2R3yTQqsEqllgketrsJESx%TXh7tszXZf`Ao)jOb2(0Trlc2!wbCO_lmsu
z?Tyg>aZ&adlZI0n<`6O?Q-h?zQN}L^GFB+B(3(71B;B~<s|(wac=|bYPF6QT<$3~N
zPZ#Nr*2X<s-|gA=IJ9Tq0S$Ky{p~++{sduvtyxyRw%;N5_z>KXYq*@rWSD)xlq?og
zbGF2G@k|=9YNmz@>35DYpC;FTa@0m9BH${L=!ik3vx+B2b6p0(B5G8eJ>tb|Zr}_5
zOtpjWVKCN+p$vK}&3!$uzQeQ&B@dApVmP*;$SKWtRUU$L3P}$L5@~Xwzs~DIS%DZ^
z)Jh4{{^QL>S_efI1KsoWtQ)qI#wne`s1MlEs?RKTV2`rl8Nz<gDHQTec3sDAJBCa8
zhis<kWSEOh^6-?@*}iw7p^?i;mbYHAVjR0wa)(~>)X0*j+%Tz=58~AtzD{kN%~ze?
zuXF-qFC4>?pHr{&T-x-I85L4zI)J|oGL4-gO~$i?p}I9uIuZLD1XT@0+V}I9@lPts
z(sSCR$!Lc@;&jW>(A9h&Wh!-eWx9`GX~shW=N}X9y%ZR%a7ji!XB?t5MecoZX`#!F
z<B&qAO+avmRY`7|BnxKEw2AxLM*s1La%s&tEZh<g;&E@~$Mw&VhJt;u-g^0|&d7lk
zE&EmTqfPL*F7nNjohCGBIV3J7Oh@{lU#(8iG$9lbXBpAt6`~7#;x;^Mr39P)!JB^*
zZ80pTj&bhMo$atYG3WD$2kIebe+NhbQ=$@NGaBoQt6}|Q+Ga((u<Io3D>r1hr84b5
zYutR>W)p4iG5x7mGRKUT(Spn3et3Aq5@!@=yQW~tQrrjQeg{`gzx3|et1+}+EWw0V
z*M>tBDqYFY&1Tc8qNwsv1=nHjGM#a-xFenJ_R;xHTSkX1BhHpR__FF?^Bb+GDPs~?
zMEmaeMM4&y-VN%iiYC3&!*3jUC;@6y7`1$LL$#E}s<z!k5UyY@)$tVGYKK%lqC|tr
zs=8n<6h-rLqnjxPO~fQ|&V8E#ZrRhp7=Jv~+$T?%VLl3G0(!l6`-H9|_s3DNO(?A}
zO*03vdXkcg0Xc4yvQy~&X7zT7&M<sO)LwM{oPzLg!LrsSvUG4QVhd>ef)*GU{gSpN
zluXGM_@T)%!%`+Id`@?k;W~_6S=kzcbtl9xO(}R6CG{K;J)5s!jCOq^!j9d!B;FcM
zb1(_V<Yzp*U%irZhHP8D$17NnCeV;ByVJD8bq}KzQpj-e=PczOAJv44j(mAf3w&UY
zNR#Vtm4R-&_l;^^o%)736Mr_8%@V{NRz~INgNe01^PMuJo#0(Pc%3!#lz#4UnlyF7
zKVm?lt101IzmoM8pL>@;jiEdF;SUI!!rkF#(urEU%ofAu-0v<|0;4hWj5S#xmKB2|
z?^S(9CKGi#hKF;Br*?$OES)~1PP9TfR11HQZ4tI_q1@Z!I+yAvSUy0vLVNEuMVw8e
zFK+jDwj>kE7tt$9yDKocLLOhvJ=#D>bT-O*u0OQda-Nib6<5wpQ*-iHsGHRYjVjoT
z`Q?>iN#Tapd{@|Aq60%J17uJ`Lu1sVJYDn{k^@7-kTZ(Tkd;=A{S`LfS<fMfVHGK$
zDNMP(Y?NbavVHs8un;juPxA`CO9Ez@gE<$&gr{_WnzVw%J}L}!xJ6QQ`|_x9d-kyQ
zhWPBcYtmo5_>*&f4K|4y7$mw$*z&Z8s534(T@O2T>ve#-Ln@6c!_tyy)zXgNKGC=D
zuGq2*9Ul3$xK79c)sx;V5k><AcFgm+rKiUtW7W<VZ6aMSe&n+C8>n54&~kQR*;}kN
z`ltuIHlsQ{ciLT|1>nY|Ct;16_L!vZkNa0+ZWN_%5YuQ2&~R={W;ZQu0OLtv1{pjX
z8jhzAWcSBGtzi9Z(;L=MqAQ80;wB}!Ht|Gu$6`!0Occ9#^ljZQ$CryhVSCO3{K#rj
zHC9Ws>MfFsR^?Xy?h0tVfiOAh)(|x4*+0$acrb4`jCQbHod^+UeiFrcR6TO%gBhfV
zUg+B=vlmZ4?>rvEmhp(bUeb0mlPdEC<f~zzP4p=Ib&J9WedMc5YHo)PbgmP8qsSlY
z5=T|$0i^aRGnM((=!ex1Ez^}%JgG9;GY9t}Ilf?bL)+;KAGFmCl3P&aQNPeX#aPuF
zPdQKE!_Ckb^qTPBf-mvPx`N5*p8Pg_#3##oQ_BUJ>vbTayCDN8l-TMbmj%w#<s9l|
z8!b?(pt-AIHaI0g2RWZIN>oK?8JM^9J(!=A(_pPyAg7Kyvr8q~C!}?YJNoN=XU41~
zlwoRUSzrHcT-Yh@JkC*vG07_JQ$jnGu)N(WMAw91-@}D`HbSl-;v|6<w#rXG@fFL4
z65?kZ!_{&=MTOLMCh6%CoJLJCCqSeE{wC#BT@QMeH3~jTMe}rk@2t-GsiF_!G-B0v
zSKY}Ebr(Z=ZT+ckZiejlq{HO6O0C@7<j1B`gVT2|l$Gp@?HCn%c8C(6F#SZ6;)=-h
z#;c>bjuLFsWw&j)pJ!ZAgFN1?)BWK2Z7q?}Vrpop%NfeF6c1Nj?pUG{d}C`-(!M!c
z9I?#VwQwrpXe22YHe^VucybV1GMI9-F;W{RG#f4bBMK4rJi`uaGuKx**d100+D)=0
zmx9ZeP+IxVN3vM+O4?jw!~X8u`yv>29bY9ly(t@JOK|8GB=h2a^6zQ~t`h7pVq3~a
zh%gLT`PGey2}gt*C53D0e6Pkz_lR@v+N`aO7>A_gGpG+v&zMxj>YRyINY;w5XPDd*
zsixS_Sy$!`$$pIb?iylqp5wIrsLThQ_Csb2eW4qYNsYb=ALIRkvr&^R3O+e3?&+#^
z15~mzEdt8tO!K$OhNW92qQog!YSH1qAs#t03w94Sd<_>#Y~ys5nk?(dKPz#G>N=iF
z1v%*rFx2HZYJD|_RJ*;yt5-ys*wih?p<>;Zp=e%bHG1`F37Y%@QQN<%_*_@OxVp99
zy6LWqu2dRE?=8D`%oy*?h3_bgPf)7Jvpb(z8bj1m^Kvi#B^{TOIrw$y^K0qJlCM2C
zOB$F8MW$VUhr>*>ErMJjnv9Pg%XT(ZTeSj%1lxV`x__iSYPn-Ok}|F42^IS~L(ai~
z!1%T-Jay7Aln!Bt=^((~%N6T)WH7n*L()UymT{E#GP%~(ZW9r|h#qfaQ>iIK{$P{J
zg0s9wo%q)JN9=pk<f|P^Puv&KT*1b00*ih`2(DC=1wLEpG}#@QYwi>iX9>sTCZi_J
zl+*37Z7LBQh6u4du_nJMM}o4U$$Z*^VUrswf^E;3^~_k+vJc!ERQMsB$R5dMk0OP*
z(sh;iJ5!%qY|<~qpIs@-n;N5gvhDT#Qb+(+XzEDoN}+dthC;Ukd_dT1z1OSt$e2VS
zd5Ko0_KVd33D!jOD(ZaC8tlna##sAtWL0W^gWO!Ds*)IEm52+Qsay5D$(TVY?iIpN
zB{#|GLX^5gL2_itI+zhCU9YQ`T^bU5>-gi@A;i#SNU#~$C+25Dl@ajks@y8;QHGVL
zJL6$7Va3NPtL#75`)4M-e3H;LCfHjTaWV+_;S|TNF#msEf*nl$4^LOXs{ml`=Uw4P
zH;g>T^8fjV|NHkvinzm<Do}H%nh{9JLOxh$C#YcF+d%vQn`P2?<$<D3;8SEY4EF`o
zt?3h*|HAG=KW|V6P~}R>B^X;SIAeU3_jD5WqbKD$p&PkJyvlEfe&iaXVws}tb`6*=
zbkG;TXQj;9aA>3orO}o8Y4twUA#00-=$9SboDH!DJ^XCvKWkvbK{fVynNg2)PSxcf
za_RAj_T@maJG#zA+=eSk!{OYw)*2J|>;yk86ft^y?SF$8V(@`h=x+Z;dh0cz*GU6<
z7qw+_^{z3t`<+qftfr2rvxGWSg<miAZ8>FqbNy)@F5ob3hR7Q_-~^$<>60N)0#8*J
zh&l*BQeF9kwCYv(eE1Ze(;oSFGdlP{b*~-nx_%9;v=6ED&(_B?RwORFuBhG11%8<w
z**I#!S&u(7G@WMgR_YphT|qyn>FH;qi=pQ{@*MBB<2&gV%cLr!j{o8L+KgnrejL7T
z%-IH1xh&!qKO7H`pVRK|y`|f0#`j>bAmh=;*^N2u&QzI^`$j<e()enIr10F;KpY=0
zuyG+vJmih+e}VPl<j-|f$dXDb%~*q$s}pQ^_sYg_Yta*^oLlgj5^z_)<nZm2Rl<gN
z{Job-Vf`&ho`<GSV}_YPw%FoX-y%p(A7XrCJnwdydL5{B@fm5>C?=EvxVPcX^}r=~
zrtBZpvaKJgd*ho(nKv|_HW~FF9W5FU=;0pf)eFd2bv%B$z(J2an)+5cN%BGT`FfZd
z4Y9XHcOry#XZO5*hJ_7GaYgdOr;vvqp`e91Vh)d?{&HxRLDu*MfR`Mg(JRaIL_=T#
z%cNh9t51r<6W4i6m)ZZVbl?ZPk_9oVU7N4jR9Of2&NgRFVgWbny_8zczO>QBFR9F?
z@3bQ<fY$iMO#^fay)roiu%z54HK_Y??*;w8Byy$I7n&1L32#cgvCeq)?GpMznMOTN
z8lJO%9*L#{sWWh1l|E65)%tzT7`j@wDeW8_&Yqa5Nm-%?F$|u0ci$v}&w6_tyw1`F
zaexv1EXo4JpPkq!6(!H-%Y99_Z+lEc*6&1Rq{kSl!_y5jKq8T*k*lag4SO%!Q^H2k
zOnf66xEF~4x3lNVIgL8gq@Oz+Yd$@)PVdAlgNVVi<xWrjP>pPZGebJu^Q1*x-K9Hu
zcMS2>w7eIBLRZ`Ego<04M^Q<i(W{U7F2;M%wFHr?LT@?(P#~*(r&*%kU?Zo%<9WSg
z3YOx9_i#6RbQ1n|`gn!~!|k13&w7VQ3C4aHHj5b?mu^8CJz9Luz~+Nlm1mIiA$)tv
z>O-Ev$?dAd*Cp?_h(`o4OdwoYbghb0-ik6dm-ZJl_@%9C6SpTs6--uTU#~OGRM%5N
zqM7&DjWt9$a`<D(KiAU=&Pzk;tLYv=FIxs^+b!=g1carp)efXFP{DFnd)@C?V<S`k
z_r+*0@GornQZIf3hn3`UrNH#oUT1I~WoXqRZN~m>-2sBFgzhKuisixH=LF<tf8sC-
zx^r58IH8JpB*sR*)twmdcsqY$7kFxU<Fv<Xp_=aIhmPNV!8ZeURrdiyuLpNHm;5cQ
z(#iPcE_(@1e;rhRc1hZ2#85jVG{@_ouy9=MBK!KgXFNhN|M0{lrH7sce?Otcl)9-E
ztj?tU<eVM~0j^gDFQ1XotLYCD2L$Q9EBr3csFVD$6ak>IT0S<&Yi(=j8Q=@S3z@vw
z%)|LG$@<`9?{*O{3*ma#dK<&0@vc?I%O2!y=!QDIa{yvkXm2>K>mu_NK*Cgh2G0(Z
zyDtD@mQipH>`%8}cL|i&204FYRQu@w_~tqAl&Z)c9Wo~M;HkUwzH=hm6ILSXo76uE
zsI;^;e-G>q5K0TUfkC0T@A-TRz0&u$JjN7O2RI?|Cv+dv&)zlz(=k=eoDF!QlEL31
zpXcz#;=W8J9VdoI33=o-IXy9?5Qp6i$RY5Qx2?VkQKQ}>R(LwQ#G;aUjpGh;#Y{*-
zFnX@9ADuJ$VUj}1ozhQMOn=vY7i4|7q0%SE@rl;G(@>jkVUAc>!kLT3DBrSV>q2c{
zKDn;xby-JM#h_kSVpjHyYQHqNjzh=qs?#tM46S}43NMY{X{HDvW=fb5e5Z<BDLv6G
zVdaR(T(Ko{%fQYcSE46>t8KQ;&hKK8;P3!f#nL?gK&)Vs{p{nCzF?Jb%KLM-u7jB0
zhU;|io6!o5KLI6hyJSQO&%}}=)LqQzuS+G{k(w)}16b?-vEdk`ceuUEdo<>zJ&zA*
zuyNTs<Qwr8mN<6w=p$*B0CHM{U3z%<TMqW&5Ei}yx4{QTM-qD94y6P$JRIQ>0`7Xt
zKgX|77<9Wlc*Kka8)5Ets8b2m*2%nGl3bgWUeW?&=4$tQJ~WVf{fU$NzFEZA_udjW
z&V9m&d&bR!_jrwa)|VGpq(%LXcmX~Vd{xdDrqFB{kiN8{FCJ=GEp{jw9imun7Y_L3
zf|Hy|yxT6d?TWhFIC<Qb$5ed?-F``%p7x=e2qWBZwWuojn^RzUDYA=3DYUPoAG|vN
z5bEvGOCIRNzxG(B(1_^`;d`ewUob7Xhd39YOeKs|J+~ii-R69L82OgbP1yquz}fGc
zA+Fm1f->CPf_GEykv>t)1L6zyuJ+^~QK9hJRC4^gCmqR*%eMD^Z;Kkcc%H1MUW7<{
zjVp@KlFEBa8yVqA2&0xltbI^8kHDH8pQz7D-q423M<1*_?&$P%|CyPfUhIEtn~zJ^
z!hcIG>lJYS!t8PJ&Et`NDr$eI883wRJ=61L_Yt!loyjQn3c&YCmaItVl;BA$fxkb>
zz0|1H{bQL*us2v6n_DU`?5PEW(Y*S2O~~!9*~1>yN||W>Kt%83LgemR%?O)2(tZKs
z8vL^S#(7pB(`U~9$sW%lJFl`Z`Gxf|$tKsraR(og%%X0X6$B(OB*@?R3viwc)YI_s
z?a|u1(r%o9^sxWrOb^IQZcebj|0>vQla}#B6^kV}GVerutg(w7gXJ)6aZ32k>3+s4
z=CeJF`?bK7Y!9P-VOL1!m$RK+81X}T;tImCqAK&#%vXmQd2HAS#aX^Lm;;;Mk|vM1
zjU)_=Nb24E-k%NiZBsn0l5}ySxF!U@Kd3OyyZQdkA=>YZ(#AInJnCG0W)`()%i8-O
zW&DmDso>q*7i?(q`xaCClFF!qF`>t&-@E$}OuhNio!RN3`u28QS5|i7_xVZD<wSTW
zu(Nc4Ji6u=yVqz?3Y-s+TC)=NezGGTmj1FI<uqZk>XxQki)x@R^^7ED@VV?7aNUbq
zyXpJO%?AfT+3@!>Px3-BzP^!0(*U;F(o{(x6h-7My4oM$bN=#nVSYFX<PRWZ%J6jE
z+~*gYl<9BX5;;?yB`++zR3G@U{j7(7C8FouhGrhGu2v=mrOd^`J6HnN^C;9j9+k^}
zG5>g@hYiE`OQ<DM&B^+ndP%JKS$!wkrj6};m1SuD{B#WviYr~hnfgR=2|!gIZF#M7
zH`aePOHg#^`%yb=&EJ<Uc1{xz=g%!g)fH_+30bT(zx|0Qi~lKnYKZ9b>F)Tjy4-P~
zp@-GQsqs$V;^F$-TVhQdrqi;8w1*cyVe;pKCl{Xqb%=q!SbS`&twxvpzjFcd9NaFT
zI;393r|lCM+SH3(ht0dXH#y(i6gT~HIi9Ha1bau1>ye4e?iaTwKc<xYqo0;^bab`?
z)SNxxeWu}J_n1Qr{EoMv6oQ;w(auUCz<z1xpS;1d;`H;p63o6y45Xc*ygd0QY!{vG
zVb`0brS&66%^`itQa{2mL*9oC>pa5pluz)q_BR(G${PPclM3&hwD!#7SSO<=DCoE+
ze5<#(DKmgd`;MSre7l<a<#|_eAR}PzxM6)yd!{7!eB;|(IvD~F74<4G^Yo49b5o86
z7=bZDWvEq_{`K=iZw!A64jR}ask;WS1S*XYs*F}RV<AYZ^t{|94nH6=`XFY>1lz^d
z)xHrc-|3eh8jv$yls+7M2F~*<nLQ!i&ijp;Jsk?Z-yykAnz+vxbH?I~2@KcFVb{mz
zu19%<*c?na*_!xNE#GgApdfkKcmr-8=H(FL!y}!Y&NN?$A^xyI5t!JXt%yE#E^Kgp
zw{@7VDk-@>Z^SV03LCmR`569`P;uZtSN$4DAD7Jc^34hdWm1pas&=cj>%CbO2c1Ig
zw{*I;xKLNT%NvDhkEm`om9}@AQzp!hhDR6ElinRk;53Q$4%?9LX*$k;f<=y3W1sAS
zvprjxQ8b!Kn8g?$O#9z=6o-2r=L;8Uw5~A?*pa-Vwp*@L#rwLaO9W|tV2hkp*|<XO
z<ipMskQOb*FRgZe<P7H)t(@}A_kE~1-zE%BcPmiHk54M#a~Uo|Y5$iN?b38TsHQEz
zecFqlV@1phRp_dEr!v&&m?qqXN|W#nfeB<O2*e0I;Hr@VB0mXp8V7f9sJTo(tp|wp
zr*C?zdc_28ohD>P;HO!Cyluq44xI*qT;d$9e!UYh>ROkdfe?!h_77Shzldq>$Q8}p
z@1)^1FkImmruEcK8v5c6EzTY7y8sImhC=%&eB+;~J#W*Fin%1(x<$Z%Pa))L0~L7F
z8u5qxPq>if&t%Kpx$dvo;CHKjnJ^;-mW6*I1`H-h=fAbj=Tq6$?&p5aid8Jbq;Zw4
z=)K@wJ6>V-5a)vZ`e!?bBPO;Ym7wDJ;RUu~(2ca*ctA&iFfgb5w%eHc>V+g;i6H(*
zCj>Frr)xX%OZi=L0B7nuKNAexl{YbCC*W7Skpkhj(+%L&ru8U>Z_PJc7N~HPp^74M
zC`XHH71)&B4Xf~TbOH|B-@E+S@4Wc&K`!SwFO3#olFTS+!gyM+wWw1b@iw;JV|0;h
zmf2q{1jCdWQJLuNyX~cT=7(^IzI`stNBYM55vFu!Xn6V5i7;T5P?rSK$l2x!1+Idb
zs*%uN0to$(%&^*tlb*SA$8qKXqr?G8ZO!36A5!FYQT)uoe&$?*VSy>wI!z`C?FDkk
z%+X@p5&5KTFIYuALl<Jx%BrjatfKonxZXYn1uJKcrYPv{Hhoh=P+IQHNO6hV98+84
z(D7UaL~R=}O)dxVf>KT1xt1@qH*1^YxT35^M_w)5Q4ADuD`ZRzA6-syV`%)1Iya{R
zPXYqHm~4m=)h4&|x6C6ytxBv=*mc|58gjv?6{;%4H1~lg0TT~&+CDgcat-~iO^l(+
zgEUGD$}z@OSk<x)5L>633&PCQa3+O@r7sb>9kGlYP%vo=R0v4nDwTE=x*<Wc-y6l!
zPpi;0z1oITWf|VCm7fQ-XC9PJ+6}^Ef{`?YXi&B18nIY?iTPCseRnp|%Nmf75^p!h
z%5y`UViU!TBWh4a($VhOn|2xVc~PV_s!e5MKUJwQXnPcs^M~?%zW}NW{t%v1HM(Fh
zHFNJ_WmdJajQZz>J6mLq_6AxVcr2KpR44vUZ16KY)5K`KnTH0gBHzNIP=cv(4VBnW
zg!Be+@I*)H(M&Mb)wZfwO7(|Va_AS$(tbPvJnI8>wA)<?a;T<_<Kxkm&z>`gKZbN+
zgB+b#V5SoJ`xHpqJ#g+0*ozF4mSAA_jj>kN+%T+;arp1?1o?SfVs|&R)PX`27##7i
z&u<LNUY5L&ikQa4D@aAD7h>GbILR^3VoJ$227Qs8L#2Lg^R>y1w2=apnGt@mHy2<R
zGvNn@2D5E|=_;BSc`SYN_9s90G^ptdb$*`rlXX3=@9erkSxhLg0nu%BF<rlm&aI7e
zR}qD~=5|N>3Y+Y|B?*hpZp4`(pqBfkYEv-E?<@5L-pVm=_4~7SZV?(e4NVrVt$)D<
zRo&u#B2O8{_E;$0*P|dcK1dgjran!k&etoQb|AbfLOgKIKH-UCZg^7I&t#?9cUVo$
zy#V8(HLLja@QBC1^>S9q9j;QB(aYHR`vZ;08NA@o@Pb!%%>j*L`;)TX$q{3}4|c9z
zTAdg22s8GUd1=8YdCP&S;MoZGn-s+pq-16>@e~VW4(9cqIld3o4?}FHXmmdP<4iJy
zLq*l6E%M(4QmBTK49RaeTbE3vbFj_J!~An2IP3NX#ShRh)&kJ8-9A^r&Db=Adrb-W
zVp|v>L*L2tK9A`|Q*Y~g*Zef<kN7=>-r!i?(LuN)#;}Q>znCd29(cM>X6)SC>9Y}%
z&;JmP<qs=aH@(Lgu85}%daBm$+;b-Sc&{kg!&E#~x#MapFNhHjAdyt?Qn@#g1}Zv2
z1c@A`KhE7x?NT8+k3f>Wu;~B}hX_K@8#Bw1Z3*c-{UjNOVpo+M2*2*}7C8iWP$W`8
zHfR?<zm>}jqC8D<5zpanQs#m?o5C0Kiga<hd^Skn-%T(kf^)!1TSFZ9MhkeM$m5&%
z#iuGMf*a0mfW{bo<3OKsu<Ebs9(r-+I{5os+V^!d3ED?eNVG!lXE6(!g#f515jBBe
zt`}DGu6QTzh-NvvIMPWHv)3}y-&bs~mpLzRMk!nX+)1HHl`9_tVh!Jqx&am<KLcn-
zFEX<Vi^Q7an=*aTaJ^dz<o#<<(QjC=Q{!yEq%qZp3!8;h3E3ro`eFpp#4NAi*~XG~
zquZjmcT<PTbJKH|^Q^I|ZKEWj7M2Oe%LT%VN^HhG0LRGJX<I_J(72%$<rWGDG^I%+
z!QNbIY5%yn-_tdYojJtIGIO22R4U(MB%Cd#zu9Jcn&d`gv)<?*DmVl5hdS>Yb?gH+
zW#AH3F=NqHamb*On;FNN@lL}dTz1%<{GxoOC5fSeXvSj?^9wJ#gT9=bAA8o<U?|KF
zFv(w6e&EsF1PYtI8_ur9m>`i|Kqpm$q8Yz|jJq=|Z(|zF&@=;i{r=8ROTY^5!VVml
z$G1Q_|CG`?z(qZpRQ&e_<RTDGcVHYw#XXJKkMFBVUCAEyUG%Re;(%Ms2}fL(H>43b
z7?Tp=zW{E;gmMddENz7v&|-tZ#G33Ao(1Fui((50?ZMBFP|mN632n(Ly(UP^mO1*H
z@aG;ID=A{|bB)+P7nUaVOED^<Z&G;6CU$WKOK2&jd4I7NSzZ}+UB9Vs?iUjL{q@by
z-}@}nQs1ntJ)5R}bG!MS&NaBw!`EGl(TG@3dUw?V?&$bt%k$aqCr<;6%@tg5Wo^oz
zkxfxct!_IAlZGxgJ&I<cQ)m3eB+ZlvlKaT?+N9etX;6Z^N&~V=3sM~zhH?&O@G<B5
zPbw&N|4@`MXZApx5@s6Bd5j4GSC{C~zlM;ADA)~}qeJ5)wIt^TM!BnDbqz0pjel9y
zxx8p`_L?s@fw#fUF~41Mq;vQC>UIwO>dIf8a^rxivW}%R(nT^|71Kd=h6ea1y)i7h
zfg~f$^iz+La5{b`K`78XF3}Axq0Dy0dv1FJ>{|=R%td(T9QFC~-z!9+P^rw(a}YBs
z{Zrht4F7y94pU;u;cG5tETc|lqx!`S(-Qv>;`MvViBgWqmM}&&!SfSscHVDdx3GHu
zu#~abg(p16WW5&o*y_2ro$Sl^^grBGN_zynp-+!I120-c?iQ5eWT4HSMAwG~Nw#Cc
z@xH{?z>3~xM~$curIsbBTHafQ22K}fdy%Q(j|OX`Qjd&INx0$p(<o_jGB3#xuKxuC
zys}T+Dq3i>El{`s3_xJU?UCLAgjPPqNyh06NEsp;$o0qU^h>+-M85kql)0wZQZa=1
z$RR-azu0@r;JBJ)OH|BcF@wcqOBS<aK})unnVFfHnVFfHnVBtYF*8}r()4%Fx$oVX
zx$pm*i21iWdPnW5tgKvFt12@CaL0IEUi;$v^Yux@;DdhJnQLCJ1cb-E6<o};clR@6
z38b;HA1Qt>SU{@3V_yo>Gh~Z^$L9Eol2apxXtbm<q8rL9+?G&Cr;o;%Zz!p_8q@On
zoAZy<AUMYBr1QGLj8levhm^p&v_8RDmR{vw-r(AnGt5bS0xbK(hry-`o#CityGc2*
z+qGLpavwiKorL4(i;yOHHb7IhiAv`NyT=ZMwxdaXY%N1%9dr1M55Di;Rpf=WBUja}
zaK~`fJrCyKk}C~q_%K|0%X_11%>={hu$^e#D=wi`Z9wJJx}DDco_zgd0t=PuV}0R)
zN#SlG8t~_AqJOnk;XV=z$phJ_epPTI-ZY<R0#)5Sd8J{GkDVUA&zoBy!WimdUCw%1
zo6349=lvP>1*b0AIRp;|7;;*P?p59?A?)_bMmS3X!Wjk=ti}nQH+zZr0?h}|A&|TT
zFK(-kH=hYLsz4rEuK(JF&$Mk4hDan`f9&hPGs|5*Lmqe2xNAoblrNA(m)MMqqao-~
z`^VU0d!tv38@Q+i2SyOis73<MseUM-kp~I1g~xhUF@G_GymPc{V|}1w5$2{JJ%tQ1
z6QE^H^T0i)u8FQlgE+9VqX)*jVebl#mbtiOl<$$De7WRt*$4T}dDp3bOizl0^8RJe
zqTxzollMpPItPTEKN=eN;iz@wDQ?-8$JX*^vugdg;kD6W$CfC<Sl`<2EhT-7nNt0S
zQ^RLUf4=TTQ9O@SdZ&;oe>htz<QEK-oc;`NtP&S+x7;u<^CNwf4wco&YPj^ckqv88
zL_K+@X!5`yy~%nXxNgU%mWOMSR2*6cT3K9?MT4h=?;sys>~b#pc|Z2lUU!K?%1isI
zW~U;sldUh-A-?^S-J7AoL*GD#yc`Sn?H^C4-ZZY6Pi!(aMfvxe+J#o&U*;%utk=Ip
z%*X=P`r%(Of@BQvSA>bPwz-k7h(LUp)xDbB?i@?9?MyMn9+f?dbW8@*T>-M#AQ{4;
zqB8ph%kl?Gwkl=YJ*+l9X9%jC=g0h05kl4!Gzal`O(vLR#QJs0-|wGqUb}ec1}a#$
zPf+%)g7vds-94b=5^Eo`Nt=l`6p)G;ex!S#Kb?%cxCnr^9scJz)Bkhe^j{ymWVyj*
z|EI&I|K%)+Sor^l7f>m|Yh~#YW^hP`HU8`P{<r4`Fn)Q6TtV4tq__zzVr8%YWw`(I
zPY<xe5U;#gm;-aH`C4M*f4|%qBQBjOghcYw=}=<%yYDZw0vXbgXko(G$q%?WIJxwn
zk%XfA0P*Ti#=<`Z)$PT6-ZurKb<}7@S`-mAei{?2%YTLTt4;bW;w#@^-q`b5qcV9R
zgR_$E@nhN_(jS1au-c-klyjF5=I=k9El<q`XTcYAvmCEeGN7gQvvIyS$2q=3l()RW
zSyY+DpfbB7aoHg(W>Z(`1dmR=)Q7R;&RM?Uw#niGqKQQlT}<{yO#hZvRGY=X>XDY@
z2qn7YsdI>PeDjM9z4^AkFkea`xcIH0JpxQkC%GcN`AF`5BBu2w{b7vyu{&@SwZOfi
zu)+qMlxmtZ1SPkh_+(wC-oIZ^It$cvTy<AM6qeG<F&(INjDhK0{EloPrxXc`6tZ)p
zPsig3-9w2Cj+Pfju6J4yv#%B@7DvQJQ*0L)fJ7Jal57!5=9BS^?G~_D4{gE361fe^
zxdV>+`aSBn#jes(@~qF3g39J%Fa5AxsGDt^`j0Yiymjt<0P{l^R6nVf7viECSVthC
z7X-7Tuj2(_Ios_&qeNW{`31&`lwfZJXNHT~!T&GqUj<N5N$U}Q5X}`WH|T1{la9#f
zXTNzRE>o#JsU_zst_0Zqy$u~Fj!z>wh7y!|MGH&<NN0-@yJ8)%NvOQWrjCZOq?UZ5
zEvAuMP342;tQ~gmSVd61@V<iPpRGCSB=8kIza16ZNAyM;VtW&Z>A@?w(V6n>>J57z
zz=YxjKS^ljf#M$1yqW&c8e1TvOZ>?_Hl9(q0UfHjKPoqwu=TU(%h=qRn!K(V!d~xD
z>|TlZ;inumnE7fzJdFMg>u|S<xQS2dy~fY4BQm;~PcU@n358?M5ZfOrXhjmK@62J+
zU<-|plr91xggj#H&ffIkv%q`W{g0Xam$SmNqdp5EZ``&=%Kc-E!YZ}nD&F06i&X;=
z+|YHjVT>MtdPk%^LFp)TN1)w)Md%yT_0*Nky@vf2qMC+=x9~sRxX45R&4Qq~q7U*x
zOYb)mhpI7Z36)LYFR(-f6Xpj`{6&p<c;9sA??dpQ(R|%#4ZX>rX^EzZHpqU%7LORU
z$m2$YEAd}{MIXLA3VUk?b@yZJxmN=$qgd3<RF(SkUrW_Xen>lcTLl-AtR5^&$#)%%
z(}>AZ&{u2vL%<aBCCo+Yl}XVhtkDgF49fl-x1Wx*{1t5=q}LX0NP;P|vUbN;OQg+7
zydR)pC*?6o(-K#eXo%As9k$Dv>Ly}_&~`KrHaf-<+7CBgm3OK_tmMRYah>o~3It$M
zqs>8(_0b7>!|f=d1$V8${En^2_s1^Xms?h!Cnlj2Z8~aj-c#(OGrDd3Qvk=3vJpxr
zV`3EKR~v!WR`}io(6Ev6kff=BHiK|BJ4>aBCOb#ZX9`|_wQrWcdm6K(l&<kB*P8Q6
zEA~RG1GImjj9ZZ+7=(_0t3?<+F(Fk6(!|<PYzap!iTq`tn42>`*SJ>pVg!=59!?y2
z2zV?;Nj+zXR$xC+SxpZ|D3zw>xwO2(EXUI<5P>^X6HD(9+>eQ96xrlGAqi5?vf7v@
zD4b8+AE0F+<#Eg}3z2KZlUOkze(iS)!}cGy3mRj7AOAcjzZoEf(m%(|=gq46OAU3?
zdqgG>S&InA5@N5ByHHTwRLs{M4~FFyI;WU?r;`11KUnrsEZNn<@>{1hP%Y!E>KH7b
z*h{dyC*5kscq1Mxl|U@WOI&z=w)<Sw42Sft+%WwCyliwr`Dr4?kUj~SL`3mH5Wip!
zx9y5Wb`x5eSPx6q8bkgaEUcKXmUXN-%boyamIjaiFqAd3Mj^Dv4#$#A$A41Iklv7&
zUe-$+FE_%nsfKV0st#QkwB<3Npm&VM1&|uCo3LbAB9)n~hG0DOPLkOV=pM%IQOoLG
zMN?>={}9REd>5R*q$D(Ce%^%w)8B`qlakd#ja!p0p4ufvU{m#By<cb?5kmJ{o_ryj
z7fPi1QZOPow9-{yhY?z5psT93PJS$0xp%uNz=xVSmNJ1ulUxBu=ACi>=U;%q{}s!*
zp~NB!fi!j?q_50aW)Sdg7>J=JyFMAfDi?tKkBuM^MHl28^@j~kg6^v#EboDl2qhGV
z!W9Ukk<RyRbe7n%oohs;snQa9&a#9~37a1cn^!|yw=jlp4Vv<G)5BEEUrR0alGPJf
ztKKX#h8IujQiF(J_RozV<Z67T=z<JRxVLj(q>y?A!-Ud;<6beBC>lH0tLmpI=Zm>f
zhbxJ<Lge)XRn~yk3S9?iftCkDVbob+0J4alp+zOl`476$UGjWRXmJs5Tr#9t15v5h
zS;bjVamdUc{k3R|d}w#|Nhu(O)tV7<66h~8rf;yZwy7~&m8K&UOeeh^02{qgY&Hm8
z10zqV(&q8RjUx~eg`9h70%#gYy+ZwlnESx*pETmp#TrIi^rERrnrcEQ2%V!4M~tg5
zT!T7mOMZ}<g`@<-{$+$PR4@JUFQ3E%iKeBlEl3a#VJ|fAU01dA1ApA&>v%_Bnc*62
zX<{Q^h0V74D&Whr+MI|r4dPc$BPfW)bRjpXeGtSBrdc+HjZ#`o{X7>qRR>u+(fi~+
z8?tkGX9btjvP#sm$$;f?bXzhFFBW(q8<rw7lF%FV!i|G-v*&)?{C1p5Pj+v@1+=2F
zDX7P<a1yJXX?%L10S73UC5KW-i5PK!489(g1c6G!K3qw7DrXSkYED%xUfarPOL}qF
z_eBlpmh39AhfUQlaaDyUOZ6clfuqgaVtH@V1=aPSb=8S2_+VSt7&`>Npi8u<A(YA$
zSylql+TO`L7=luvi!Umw^Za@K(#=286u<@d0c6Lq&e)_;aq}|dBQpL9Fg*dM-}rVB
zELN#g!a)QOa$;=gI$VA|tQ_uyKK{E>ppqcw@E2w9N_p&0xb~R#)%1ti4FPajxbY-1
z`?h|sh=2P=IfUl204J5++PDGOF9%w&R4e1?U5n>ok8_62bh`Tm=`1h9@x@+HbojaJ
z_&`ySY6xj)tXB~7m40x4<p?C?kF}a0y6d$uS?dH*)Z1S<G~gw~2WPzpNA(mr1=&o*
zVb64N8VZZn(E<Z37k^026Q|y&mrm<9$6FbFudk=(+0-75=D`xN`>`g?3@;aTw$dD)
zUhG9mH;KL=>L%kXtU{~k0>7vstm?LZSCY#b^b4ZvYtTPdG$jg4ZbjAZiHQzge9fWs
z5}}{Ym_DKFu{LxwUU8nEl?~`iG2TowrPTYafIFh_!Y%O^n(%AGqNNgoH=xG2!t61(
z)0myzv3y0EUpmOK*3W!XB*@FqiNyX1?I$QiBg3kv%4~Hb$Qrvu{Hur=y4prWEkfxu
zL^&TSoQNI7O#x}-xpta2smLq!aW=f~PXxLoJ%+Vs&a;AlW0|55!J>vA>(6K+Kyrz!
zUW(y6gxNa+kaGV?Ld1^njjEib1Otrif?U>{U@;q#B?OlznkKTKlgjKuOVNaEt+<gh
ziSPZg<vw{W{ei43H|7@x=(BR^Ff|}$jN^m0QN?}L5ly6$wI*8B@mg41Y#jLwoKjwm
z>^ujLAGZ$WjgPu5Z2Ao{=D2*}3{o}^9xU8xTJ!u!H9^0~Mcfd$UPO{>F+)mAV}AYI
zpZ%~bsyEGTp;>H16&EF@Lushfccar_EOTPXNPv&^(3vKVMQUFu{|=fP8jvr8^^nIJ
zMaYv~g|d+@#AYM7PZplAf&v#-<03HI_$s>rf-a&P^3ju$KF4~))nO&uvo$LTO0*NZ
zyy6X&-2(30?UshR?OE{Hc#u@aR(VQ+psT=uXhGId{b%knWspd#;eF-t70by}lke@8
z0sl4#xN~S&C}>H1x`%l!;p$$9N<9#EzLl54z#<0S60=pn9-G9O3K@vMmocgUgW(JH
znv04ch013_ngePkN&e%=grGY%T-e=^hxVIdSKex%d7!ngtw}Z+%T}!bv^1nV-U`1!
z5gFPM>ZLR=t|@PVJQ=9C2nHFl8)0toYBLhQ(<+b|Z?66xZ6a|7QbD2tA?JqivfOp!
zhwac}Gt_VACiGDUAt33w(4I}igB`fU6#%|CgD+}~DTn@Hy%3h0$^e<MdZzG9&a4aj
zSP``~{7Ms^7)&wQJ`m8xw6~`}gcY?%ls@qG*MfXqBnh}Wko_*K%E)+Mfun<$^UDq}
zTYdd4Cj`5b*zl|j1ibs4Z>%IxE_-c1UX4w5ELAmCRFpqLn=&A?Izy1c%NBC_H=$zT
zyWvfscnqi44eY>$kiW@1jqSjX08x{vS?wojSa-#l@uO(M^2LP;;UG+N|9B{=EPJ0V
zZM3Y<aCwjZRKh8rMUE46F?f3I1oNW1-JI-nd(b&tafQ|BNQ7R*rnVi9)+><4R;3^G
z3GDQO*Nz?sLPa|~F8cjP<hjD=wQV$SOgd+PndS(4bpcAwNE&AVvPNpsRY5UvUs{_l
zk)^)*l@SKsZg+!OWK=sR26bPdT5e8styw6tdg-`kbD`n_EOiL?9q#lww51JEgA;sW
z#4E7IutwZwv-cG0=s1G&K=@e0$B$@<_>LCg`fn{vyyX}zn=|GaM+5vC#V7<AhJc4Y
zhp7P!TP+ORGu-r~!b-j3nT^o-9{&r>lBwl8KD*^!_OlujZSezmGkKrnPhf5wG%*o(
zh&;l4Yw&#r(~p2Ccg_&v394*#0YpPO>23a(TQpSW44gjv%(f%hBHFo(PvWGST$CID
z*2tfh9%Mfxu<#sv?mj51%yB{*>IoZeh>@u|0E&s#*n@sE;bY#2ZE?<^nAYU*ni}a_
zfk2McR%h6Sid}S9B%O@jl6UF_^`H#beCEiBP46EZTQTF=#YZxHT{6VtR4!C)E=1$2
zssgc?olH+;bO8$%h||9bYB_HYEp^_ICWmF`TUF5KO9z%Ws-ZbQ*-v!`P(Znuaxsh9
zVlGtV_P8u(`_p(c7(0ki>)Cb8POJBbYqPe7qYH3)kiE4wz3HJ0y{cTntWiwPIgM-6
zqPS36YB!W@9ZGLln;ZP2UQ7QJti9KcG1zc@Gk=j#d`5GBuIZ~OF#fhJnLUKk!ek*3
z+eziko$6@v)}Q8$KELHl;hP&|HC>SCziWuCNzh0t5bm%WQ96fd{LW}OJFWbg*5$ju
z^X~kgcLBQJtt?#_Cu{&Z{)2#;H>U^EA~XTU${`pl#RE7xm3~K+24>{Z;foF4@*f$C
z8dRcDS);?yw?=n!E@bDsgkJD$&PZ!py_TI_?^k=kX?gZbvO!^MnM56a9bHFHdH>IU
zQcT#@zR{fibz8^K8BF7bl}0hyMVh`=u|>(5f$^rA-p+KzGI@X~=e*C2c`8xC*9~!^
zAsWv{i)`MyGnunA^Lv96d|>~<5yEm0cNzBGsAp}Iu>41e$BR>Dn~B`PK<O^jWgJe!
zSjp}aBrh`=u{?WFF8s5c<*tcPy|OT{FIY?Vn=&=<fsqLx)Y+rs-<$yLQGdvZ5Zh>P
z@W<l_kc>;$;il_U>U+b+Cj<C-$cPb&8{!$453%bhH{X=mV{x1tNW;c8AA-X!`uLIv
z#!j0JzMD+YCKuaAcZNEgIw^WXDWwhZ-SAF>x;N3+R*&HSL|a0BuFFy}F-N13j@gTU
zZi|@dshso7D0^uwpnJ)s=*I1#dJI}lBih4w0)FEvwlST)9ye@?Mh|`28EB{xr5SuQ
z`mC+clJCE;8(D#q$+S+0x@=2ZDDK~-b~2yPr|VL3S-}rzGglB>OM3kA1>g?`I=t?L
zV*<#X(w<I?UNctaY^?2Gk$+X4NBPQ;tf>XxkXhmjp&O(A+_YHV8+dcRX|i;JZuy%l
zBNrMHEETM#m0Y8{_IpKR1w)`>Zw?K?AoT+us%pa<D7~3JoaXWEEX3-F(d6czxoZ$~
z9r2nip&ooM7_h|38E0tL85-uBWo9=uF?f0Ya$XyKQ6pR($n(g1IrhDPuA#IacY5cx
z)ucN(ofER{hxtAeNL0Io9`H%@e8&wKyC17~9e)M6fVpk2CAA6ADM$3cKJ2B!jy?Yo
zovGoY4MOwNKb8FH-n;~lsoq~-U-wsFUG;x@+Vj|eF<?(5iqWaDwxRtDl^MK)rJ%2X
zI$($$cI4J=Nx5%G&+PGcvWMzJSS*dmc`0;Jd#EYdQ8s-R67aYcXJNLa64?CV=RPaM
zw#+;r(`+cIGr%J}kXj+@TbFFn*SBR=RgA-^YuR7kE=7OwgNc^v55@0^ZTCX@ijI-F
zGChA{D&Qv-CjsEtJ>d9-O|eW;eQmWX*u^@<T(U;SaSvwy)>2u8S&n3@5O`r>&d55y
zaRyt9jvt9~?T;M3QGr|Hx_p)^8HvNjbZ^vO^kp(PC@Od*fqaozf?39ge8(ABk`_p@
zW)sHjonv^xZ44Yrf47M~F%MWin_E*QmAow-MyMEosdD`M0q2~myW}xgr{Azei4%f!
zc>~oGWw4u6alC{-{Q1;!VP;`kLbmLWU0rZo#0)(`*&M74w%oGdx9mN0PwT<iMkZ~7
z*npR-(x)VxsRy}s6<e*YNw7<y0rV#6Dz1HOg|6@#dXY8oCw9;Cb0(5B0CRs-P5e_h
zb#JMA!jCx5_+p1=g%RlwjYvMc3n^t#`XrTZCT`9+j&w4_QA_bO29wO5%`zC(q06|b
zoAz_hi`N8e4@n#cf7>WP-C96jmE@*xRxdZfYa}imO<<*0$xR;_ry~6NtEc9KR1*y6
zHQV}<o}>kMgZ#jPucv=XhkyEn-<DuarQRvFN)V?V^4diK2caxVTZNKjM~K?*S_%IY
zo4!y$JEqV1m3v3kF%))utmhHJ_KsV{aS}EXd8Iwv66}HpSwvn)bFeH2(-PVl_PF_a
zt!TFs-got%l*sYV=lClxzKhPir1-kcJvAg-{YAce`;UZx!VJ|6^x~+jM(=jXd=J)u
z2%03mzxtLUJ&TmxfBNvMbX#&_u9IOng;I)%0C8SE-?ORzcqhwTw*F<84m~>h13F1O
zqym+0-fGpW8lw0j;RHT!v+3ZWaF#d0MdO}o4L1w{qop&nPkbrSM{_Y7VI99pHf*l!
z*D_`8R$FQi@7lWr7#9^#qjvV7;iK5f842s4A=-h-z$CbeKbr<6iMZC6G-)}%;ApRW
zot1injG=A<ulnnyKVP(?QN4iG2acLwVIP{&;m?w#f4V$7LBR7o=y63Q-4uGVE~cZ=
z7Ys8p*lLDRVQ`lYX@_!Uj1ehTGX&HQ=a7S~_|rl@dANCnC_nUTIp0yd+X0!Qa<4wR
z7<Pz3Aob99Q8`<FeXWc>z%gP0X0Sb^p4J*jy45cEO?yzDI&J1MrghZ_RBK4>ka>yk
z<(fBo)dThUnl2dAY9Omos57$ei__-?&%+gyDN<N9-MKUIpy|4jFwgjx(+!Uf&;$S>
zZ{%^mY-Cas;gPKXdjoeqth|3Jj}%5JS<a3GKWbNt*SH92I8G0-!p7o9Zc$rRZ#HRr
zqeVvH4NpLwV5f)jGokJ`-vaZ&cVjhYR~vNlJ?{223XR35b%v1%Xo$84i4Zhw;?BP1
znrj@dgUxSt8E1*5M3ItqPSnfjsSB48)?Ml`iqLA73*F{2oLKJORqsKQ=)S|8T!8UF
zDP85hsQm38smn6Dn_@2pFOT2V+VXex)ediXo!Y3LNLE-7k1SX#tq;(4Vv>R;8QVJZ
z3f+aXsr{9A{cTehx?@Qs1V8JRbP*`s`|~@1?_Q#6Wu*mYgF^BQx1x_?KiC2@{c7jM
zB2_cQ(7VNQ&HM#v51bFfszeoL?1Ik?uD)J+=gUT%4$L*9{ye$8x+%d$;iE=|$nOja
zB$v5!cV7Z?%l5Tme5#5S=FKtnNn3H-X9RYs2Jh*TNs2Gc%e|4!buX7zyaW`-LEKAn
zK+?WuPfPT;Mt0fi5cm+6jZbgHIqt4w<E8Gt`cZ4GWL-6BatOA<EHRG<f?^ArGyjX&
ztC@m1@Mx6!*P46&OIxC@rLXrF(gg8?v&RoidXg#t&%`F$IP3W+o7^zf)7~`;gezYY
zAEmJoiRzsN>gKgDf>h*Qw$|q}P6udlh`ZAt#zdObgJGw@+6ZkWU^k^p0k}4C!=-^b
ze$7H~9HBciP$dPt(iyV}oQ6owv@<5*g|iQzduw=j#VfBlSTO^IqkjXb+)6Ok8vTqI
zc=1xp69X?MC!GWm-@2LAFvAev3zfwTH9P(M=?5^|0DhkF>jt><tsO`Dhz$F~{!pI6
zkc}ez_bcxZb@Ls^ucsf!((3x|NP9_;-WU<^!uc5-vf9y(72Tixp7@Tx8Xe(TOt+j^
zf<WqbTtrpOHP{@Sss|;8p<<KQk@M&J9mr62eG<GZA&H7?S=Z79gnr?sL|D6%4BWgt
z!Y9BeWRu@Bu)>kiAmkIl6Rz%wq}3cmOs>_AG4y`yOsczUkv|;pjgFp$jXa|$ZrOk(
zM7J1vB?w=<=`-1GC*a%>O$^!(5Ved|a+d}bt<aiY4dKX04erAmZY2kS`!`COks)Tj
zOz9-YA>9ySQ2vl(=Lu>p=*weib3~4ea6O1@oub*!)Fz>Mc6ZtkGZZojiFRfVSmG&G
zK?6axM&x6$WeLZnY`eX0LBk#UFA=qB!YqO@n6+Vfn`e6YiPq^&Tl|C^5J6{_GW_(l
zVi+0Us>gS@3Vg}fg@ea_tiG&2cl-N!B<Mm*8ntNUvn!uG!nS@`yKzLSgxzg6Bl=!L
z#y2_jS1gO0XQb|w=%^HKFfJUTv!h({XUF48PHc76v<ofCL=ZY!5McmP@8)fmKY*1(
zGVEXh&=A|jQ#TNTfkw=y3co%3FRl?S?xjC90NW?v(sPi_BR|!z5*(CkH*Gx_C|;{J
zl*LQd_QZ5i2ob^I=DEF0(rA!@#~Pbkf%gI0ru(dR?aSA56KWdQOJHlz2ws^VSe@H;
z=e5Dw<$BG8i9U~Zs0W9(lBkR8A~oug!hL{t8bm1&zNm1}vTl+sYxtTIS^bnW2>r!c
zyn0|fQy*vem|K@lW3|mwF<1;q)fa`2nQoJCR`b&;!f-RUa)>IuDB75X7+o!3C|^pP
zx?Rltp}9`L(nHMMh0rjydWQyfWJ?Qa5}4X&ob@CoC|`E2-JKpoaOix)>zh+;F!|w~
zb{#Sp5rGzbtV|Fu<I_2f<S|&WyiNQ(Ax2xw4{WQEtO`L^2o=sBihzv<2q0BT;T2n$
z@=>ty^D*2i`zf_9)th2T;LD@pQ>^vy(V)*4k!(;fd)a@W)0yWwMkd_RT0g4fhEKOJ
zz4MHZo5xHEiC|hxj%dmV_cN(MGRT#yT58Y{g#UT_u+n0;g-IdP5&N#$XZ;L>%PbA6
z_?9nbEFsVuwsN{1sZ{T+x-xd7JUzfnq*zp&(4xuKi?>v_@$Ce`c7*YOI{cXm3f?t<
z4Pol>8rb{9cV`tqJm&8ZAU$5bRg{5r-LgemY9%6|4zh@*1bL32{BZI>cf^*S;0@DQ
zOqa9lD(et$)?AN~R0WQ7K%8*dM(-eLS)9Yfn$yYyo3t|AJdE20aK7jCWDP3AD}KUd
zJ%T<1sRR1dWq*pP>Pz9``{S>2<}8%H1Ns%NMS@bEmn;EV0@%U}d@8<Hq>tDi*xWnP
zTE^yllcZupd~>Vl?OVg%;Uo7h2h<s+80V3cS}oyde=Cb(^caz82Kn4y2Aac%fT`M!
z37jkaO8jklOU^C>FwYNIiBfj`++%N>(+abH7=pd<B?LWt=^$2v1IMr-sV$JSH~g8D
zT49m)dAtdhHGdM_{Nbq9FW0I>EyvpKx<>)+ezaS|v}^L#2{*+F=+r*PU+`1-r5mN&
zGxaTU-N~%0vzv%g*dE9Y=ov}W^ZLWu_jy$88e>kKW-V46Cdbt+&09e+@Ra;<i>ek9
zQsUp+MH7v^5UKG%!i?~=0WW83oEJ{1@7Hfw&O>_!nI}q0@fOpdlu_ygkLV(j856H!
z!^iF`>ff#9g607i&Em$YT4e2C;v+8}cI@xLI4aR7G&*{u+J~~<+faQhv~19{x`OY$
z&P|w-TW_@_L{?RYM1+?3J7K$)`lw(U#cYhD2sYOGi@z8H-zTGQwb5B}u{r_zFPuxY
z)0-blQ>^+?WEVWEH7lBu$XJZqg~Ik92-e(!v{KbQ;&~q^m$ob!CR0~hg^m?$q>->Z
z&$k7Ui+2e|{*A#F!sUDus4A$yS!T``cI!uOPSE-Cm#(fXgJoL}m0ZqfwJtw3WI2qr
zJ^!FY$Em(%OzZBD`6>f-ue<Ug`g^$x?&77b=StKe^}1O_8dhzeG|wIM?5?1(`^P)#
z*G00$+?X(fxeIR2CB~OAtHuZLgsHG5P@F0{6yKdbv6iDA?w_@V&a03>WdlasJr>EO
zhbTNM<JhT`GT!Kf#VGH3>B_an{@?q}XAD;4o+Z+b^Z92emdiHMh9@9GMt5{pYpol^
zNuFGN--;qr)1hwl3^H^(QK(9(^_31w`IdL}WiPj~Zy^MPHto!VORqqhzEfXE2l$Ou
zotn+>YCi7S<6Ru@inP7c#C&6U0|4)QRD@tif87=gEt+47ZD_j!+6@3{*$U!?_J6W4
z{zpDY3C;tqUnM@^zP1Nz$9*5VwEu6GLr}@e&gYl!A1_*;AJkiK8#4E=oqHw3MwYv?
zB`<mZZs5PZ{vVA#S-PRWQ3s+4@Mq;vq`L?2l=aNy9g2QLm?lk4^KM<#7$kjl*@q+7
zd8fO+LGRo0{LIcVSx30uSGb#gPXXJ1p-(K|$^&{HHSl8<hbPx$vU{FjCfe7^PSj}=
z82sRz?y=s?wQ<$~w;>Aha7Q=-ayl1;bT@T04*LA~$oX+V43V5RrSyF9g5?g&fOVah
zPse<ewQaGtm77#8XQzyp>hw|C&}y^h-}Lq<{qVBN!Px25yIV3g3jbzcgC@fduPmqi
zs&I(!GhCpMyw$3UH?`?5ncC~Vwe#7=I(~Zn8Qy5`FMooq;BbZJ^9oapksnFva|%%)
zaEof@bL+EL>lQq=3*Fy1EPwU*IlT;2N;WxS_}n&|Akk9LtfGAx`*29LMeJ0~-dX&^
z^-;45%X&}kLFsm~QWLQErAAA5S>WS>cEjNl_Y8}6=_6SGtgGkN%@cd!m}ZDNJNIqX
z0@J4E?|MJg^S#uE)$39KKWq1oa!xJTH@D?lJdK4qyUtsHYy#z-XVzUnRMtUwOpMwl
zrwJFI-m^*yAv)v{icqZTrlvB~Gx=Z=0DMCHiTqd31CsNr(@`-Og_jRe*ZQWKGAPzZ
zbb_85>45jG56fk1iv)k^2cQ14PPgT=3B8U{p4nJ~^mVPw<$=PAJwt?4JSux_)cM7q
z?}W^g$_!P4*8@$K5<#1FHrqi;zYeg%YbJLivHs9A#!o1nA6~}em5g`?gzYN0PLE2f
z=X)R}U<oqId74vqaX+Ext?oY^6YOs=GptlBY5buqzg&}i8~;QKI%ixmuljZQN@Cmi
zgm3QAHgD~kwYK3Ssvk_#l;JXA9#|9Q6_BrRsF)|ll6B;Ax?D@r!hh=}`?8Ben=K&2
z^^4QfEL+8#-)CK2m(Q&=S)6xto^nWKMDwFFqaIT&zYL*m#Kbly6Pom6!tKtSAnDSp
z>TQNkKb4`sNXm6D^8mIYoWMzRo-3H++9<nWrlHg)<@f~mDDEWdefNq{UpXC(sMGLI
zT=+ptx9$^%3g#`cs6O#F>gyNk;Rbg~$=uafxAZzA*LJC!pbL8=H=pe1UHAc!FUlRZ
zZWcT}M4vF)vPOpXS#J|8rSimHZNJ<bqOGo~vE(139U5;MBr(6Ab45$LLR(-^{hqlf
zRC?VKTMOMY=xI&BrclDa7F9POh<xAuj3wHu1$dYB4c@2zc_ZgpV}EZ^JO3a)B~S5K
zNZY0_=ot;<nR2vFYI=hF=<KjxZXPzV+BfNu{`k|dk1=(=DC(_e)cdK?)35A)Rm6YU
z!VLSd*&1N89d82^7nDk-`B-gGg$9uV*<1aV=UNJ&;$Okv30pc<lz3eR<uB;17xb!*
zcFVuUNWB}J7De1e*X;0jb+ilKEVqc;Sm*IO_qn^iC+5Y6RFU10KTVl1ejaI{H1!R{
zEn_D8JXMlO-gsJkKpjziez`UHOn;JqH7SP36RTmB*whNz*`!SHJ`<g~58myHs+=;h
zJI;Kzp=sLGH>?)BEK-oYt9=jMw@G25_eeS6Wqx_&()7jyTvWF$!q?>f#r@9TP0cvR
z7{1B67eozS%dY3}M3u2AI_z`c(?5n&XBO$=kTu?foiN1EgP$Qw@WFN#@L7AcZqm8e
z$^3EVc#p53ousjYZiB-9VC5pv+Q1UoeeKXXACl(rNi||!8T-KfX5@3>M>x~fsVv9K
zv3tXFpXgU-Yl(g}hPR%e0MwJbt2zTte>he9Tz(Ozg>3@&ye>RgYr$+$4=1Jlk?oUc
zfYCEL2P8@0-PxWr{@qflit#q4WNA=a;&mhEwtSN@qlTUSS6k^%1aUMkjIH_lO^(3X
zU!IDY+QSs4Vo|$hz-7`rSZWgc*uB*G=_R^&2monjs$X^NvN0-P3}<$eX6Oz3fSPT(
z4!)8kL3=|)XGP1Wb)U$f`Hix8;`<d>v{0vHs`U>oZ^~C~j5&nwJJbZ3grCFH&KrMW
zoQ@##0jLVzyA!fBc_}!wnp@i<Z{yR>dg=uyJ;IBh)7=i!a@!mz?-*Yd0CoL>HMb>Q
z2RU#0ADo0C{TML{?$216VRq4#OKy8pXK#8Qe)q0|lK~pz7o;f3!qvMu)9!C;&f`fJ
zr%UcAK{-C?Chh(wPsc-4FLRqeD&iY5bLx2AN9(y{F*SdiWBN32>)P*0N`L$ig*5Zd
z?)xpADNMf24b%?RwvOv|ifKtypD+kv_`E|*7VsE301%lLxojPHKNpQ4NqC31w3nz#
znUskL<o_N2l38E?)Qem>j_quMSo>|&G5$lY_HAh*S)jeTwfP|SkScT}d$3h>`~V9f
zrnR-Cq<JEb9_U;<pW>t#olvl)+xfIVEw7*pKZhT7nczZA>pl-6izqT2^D7b^Mjr0A
zmR^0vb*1-spOH4>8o-WSK9k2zX+kCqc-EW>2rQp%=Ax>vxQ>`j!l{4FT=$PwEbNh>
zr?OcGW~_&k^xB(83xCv&;Jz>Fz-+I(Y#F^RA!H1oT){?poWJS5%>@pOb?)XY%gZ7h
z)eY=trwT`5F=g=W_Ne>vyclvV-wcvX7_4bk2_X&lyI9lvzY*FU5Ypg=Z57{D;M6`>
zp0k6z;+ABl`<;<yTpfOEP8M*Rsy=yu-ql>QCz`f^5hs`4^W`q){Lp7y(OV1LG_qWb
zq+nvac1@gijS)^zr8K_}Cw&BVdiWzdQKUw(&pys5DmD$#H7&2)&#07H0<F20<hrcN
z+6HXilF7EQ&*WbpciDTun;w~!tO9w?j)dm9ck=Nza!Ks(=L{o6MaWAbK7G3EdUr=3
zau02NzWuNjk=3fh2uIEjI%l6A>>)l=+4rh`!j}v)i8@By91||?pV7nM6Um$3r-3w}
zm@2&I)#Y05g98^FTCKLejCbTyiC$snj&mB?rxiO`L``^x-*G}w*tA}Ic$S+}!_(D^
zwHdFk#yZ^aAoB4j26w5$)=oh#Vi>MO8!AE8c8Mmt(RGGsZRPvBZou6yC!L%FY0>2y
z#2B?=`|#f_n%ZTYZ+i4b7nk${n;bpfo3r)vo$~@yVn4=5{{k?3oUz#y{1$YZVMKiF
zh?35erS?PzzVjkz@N2HgE5e4e3W#0})+lVcC(MoczU>)sVi-y0OMIMj7dN}C9MA84
z;w1!?IHxpH6OiAn|Dd$N1{h(WY#igutrl)ev1q&Mo2&(YVo${>o$tlSb3USjokTsB
ztw(dl^Y%1sFlif2e1cE?1$fA;(-}LAJi&&1z%M>g6<q!x82jO~+4X57^EjPpynkP_
zJvH_BIOo%MEQ(6w9sBA<<8l45;bFLSAQ5)!1mB#tv`x+)CwezCbFdjy=dc?hY*T@P
zhT-$fBDhCOp%7o28|HIl%h<qZ`tCckGABT^MxN)M{jwk=?^q}{w%SR&=kfX$doRec
z^YV}I-?b}ll8%jloAoaPb#lK2vh|DJZ=bI5`zDxA)Nm}2ikT@sHh;2<LvoJ)&~AJ?
zWJ$iNqS>tB#&rqQJr(Vh$KT3#Ro=-=6~z1f6p~VVQMqYP!=&t*m|)QJYcp%rfv~AB
zBl1v2VVY!1VvW!nCxtiZ4ewVQ7b(G+&dEDjA2~*3=L@o})_LyllXs_Kj^8}51R1pp
z?_TOVLtaO3BG=H`xNb?x^@D%jsC?#orlLx_Y&u@6ZX9f4c^={RqEJS*a*o1`dpXux
zOuhQK`ew${c;8$ah<h|-)Q}jj9igwn)YeB#NF~F1d<Yym+DvjX4csmg4^Gr3NHm?F
zno1aa8j!B6xLwQpMXuq+IujhFt1vS0b|)2Ou_ZZ3-7N4LUao(49coR~@LQV&F%ff4
z77}k+mNY83&Tdqd<sU32U>6{~^?D~oCCi9*0Y1)R_7ZPZp-&d2;z_<;r9FB$Pv0y=
z9#lCE$rGUOeUa=E=XFmGtMU=PR{7Rz3a&vCySe^TXN%F~b{a~d=!ld5<+n#P<(w4$
z2hU_wXSae0a)MGaPtVNC*k^h^5A?VYL5{U&LetatF`sK>1d|-ts=Td|yDkgK(*(>;
zJW>Ljwz;P*cqfM~&i52L8(Rh+2#e*k>!s(EDT2GJN8GMGdDm0gU*EfY>5@3RHTRY`
zHI8zYH%sCIQ5~Z%#ZkQ7`i*=V2TUe99;rYZ3jS!P--YDV@X;R($EnnwqSG@}*opF-
z1qGPX0%20<{9~B4PPc^z<WM6Y#9;0B;Lp>GGk=kslkAfnLbfYR7z&_=0xNK@e-p~h
zdp0zIoj=?q$#Y_euJ2)(`dq>FroHVEPLT|#rvM^?KWp5wI;L=3=|)mHhFA_%RAYuy
zC)HL@`oQV3{c%9?4Tvy667rz2C=j(qn(Bp|DBSIl<w;-+*twriKCxF`d;B`#Ie+uL
z<|qt>55Vh`t!V^x700&Q(}qk3!EpJc_e;hrXu9XsBr<hR5=SRDXZ?dmb44rKzfBJ0
z03B4ZrP`G8d_Rkhzu{EEJxCh=@*(=Z3K#+MC%Zgh(y0ixl$-K<gOO&t<0lYME@yn!
zP46{q?<2Yb4r#icAd)oW8a4=vjPG($5q<aELP~yxU!h7a^;TgvfC0%1Qmw_4Q#T?F
zL?>8Sk`Vz6KFzve@|-+r_en&(uH+Qt)V=tL-Ko&%$qf=-U8`gMpUT{Xjt(jI$qfoj
z+MPBf9|m#=V)o&nz1Q2rUtEdYzvpO9;&C44C`zMWN?@9RKzOiBqN!?yycUnmKgV2^
z@`(Tq*O;koLgGF;+hlc9XV~50+ZYyw4|Gu*7ceq$^Tq;ZOcVf}Ktlq5Xo{Uu_P~J(
zty_rPxE^jzLMmBC+gW1v2VUmf!@i<-c0Z^{k65?=nIX=Gl3TL757>ED?W1Bg0O6?2
z)j*Smu@1XsrHh^z=?W-^Fr%rvAd|Xs=jIGNsA;GF%LNC<drqrOXy<BDsv5&+H#We<
zF7%ORxy|k6g52<DWw}Fu1-=1mKwaj#Uv5HQ!>$8)m`F5<#2|QCZQ19S{SV+X1DKZ^
z%WG#0JQ(4aMz>@C&4y@*2I|USpK0ICabgh(ZTuZ;-H*Z$#8Q*`zN6H4w?iQAWTdJ;
z55#Q2ABPo+Rgsu=3<u6V%Kl<h5_qw3t^$zjZrga%#CI}I$9kQ8C*+wSg8EqKRN&^h
zn5zJf=XQ7oR=B}kWSCTT;5$&1;@m7X1%{HE!C3m$1o0?sis(orPjE%;z~jBCqnVB0
zqD_ICmjBig+jdRLwQ2G2F`Z)vqv_+a#vd{!g|L0KzLhVz^p{&FBA_9xjz(0)qjyuH
zcnm!=$M0VIK*EOfKkouOY=MfAAJbwYsJKbu<d~?Iv2d<#W%N`lHiYkdgWkPnPGwLd
zgysTVA1EaIT-);IfElJ7yy@W%hWCJCH_k^>)Y4zr55EMt9HJTH8bpbC$qsoJ>OD?s
zvd?%Fi7g!}pzQ2J+kaRsEPy?}5n31dS|f?qEc$l2S$En-uiTvNuis=lra&|#eZP>n
zO`uC!hd-pNfy|Wx<+IOC?h!gEtJ%f`FiD8(xn^AtMP+rSndj))+oElRD(sZq{9cIv
z5`94U^>!NqGt)k;|G8u|i?c6Jx<4(7OzsJuU<|dlu5<Ufe&C807a|hdR$6L@&k1TC
zak~SyWk5zoUF>$7cIfvALkc?<JyB@l@4k#f+h@c|8Bdbe(hwJ_h}v53$d`(YN5X|W
z!ik_X2Gh4wY+lKR9RpF<11L`iCP;HktKKwro&$+pnmctIFTjxO!}da<QFHw(mziBM
zt52vV?GRVd7$1MYiD+t7U-mbp=`tn|X6tw@A#9eAY-~!z+=E#e2wto#6$#E*=^siP
zJy-CG?Mlj4Ds$qWTQA30e<{`=vEqi<rcI%a_`aP%DLDTpAO^eZF=lUD5|n4_PQjI>
zfe>G9rzpbeqrT<mJ+aYKz;-T8+utF%p$te^|H?19nn91Fte4mDl2*PKfY<gpH=O85
z^=w`w=o$HGp?q-6B}Q2qfw55M&~eEA2BLM0gwo82YSentY!AW=c_t<Qnk@E7o%_~x
zFV}CIAT}gdb>GV)m4X{`+NiT3<S%hXAL!BtY`lxh8Dvp?h_DWq_47ylF1~ITy2J?$
z6^B33*>3i9g3lo?RjAxnTi8Y5`6kmhgXfV{MxU_!lq8j5^OtLMxz*gn3<Fjl1S|cQ
zYesnjkdz|HlWHyrg*iIKc++<IZy<&MI#8sm$3K`k2j$*Rr=6?mW}~Vvc#ixP48OVf
zAC1@4o-3YM<S=eb<BvUN>}?aJ>+c^hh(8@q!Gu(JNfHV~N{N9H)6z_*WMy?%u_z6D
z?W0Xg1~qQ`?b(`;uPZTW`Hd7K=bFN*fo(rpS8UXwLI3Isd0ZeYPmBsGBzKydP?8;p
zVYG>k?B5Gv<jB}|Tnw4f<!6nKF|yisSw@_U>>M{fK_44KxnyZj6&P;xTBxFOem!)+
zBqa8uF3#)NplYF$VSV7VUJW=QzHCFd+(E$Po%?;pvz64;%ZHMbeKbpistX=y7Et#_
zE}5bOl(2Xb(C_@q*+5<s=Q09@J4oNSJMpyl$GXRIR=?0Wsf=-8a@3m1gT6CR9%bp0
zQSP^NTEa7cOPg~574{W!<BB@z`!-Q_-ra5lkag!AJM!B?D`(|AsVlI&j))7|1X0by
zlCj?RCzd_WGr2Ufc~R5V69_{}Oc>9!T+Ly4VXYf6*{9+XWpX9OweRhYCbPTKs!(S+
z&R$;_Jfg?Ohz<F*8y8fzhrWc^xL{GBjwpFruGZ^Cc;*_5_nV1t4tN>jUrmx(#^>8`
zglWNE!htckhf^CP6`>f`Q6iir!%^oH&2|K{Y>e=^;JR325}3tvJj%?=U}oz!zZu<Z
zK1rt+-0WvDiy@kfawT0f5)QZq{ko)RNLmJ2$2pXM@ZoF_Uo<1yZQOX`z`AP1)9gQi
zv`hIAykPk3J&1QO6t~=V(kkf|-09Bb_j_{?YN#Zq*!BSl@U6<O(1gsBldfI?dgwZH
zx602CGxHbm;{s+IB0z$po#qjrebM6-AJ9iXqSs{-*0}pz(Bb1a9wSTBGpZ_|@9ju@
z^~NAw6nYt=dP(f4_?Ka1gmGrl%edqfBMr~5L1Fgh_U?@u?6>>Zer?-&-PL=PG1Qb=
z9&lgT8_XFJwslkEgJh>0wGC_5qB6|K#C{N{Mv8i!ty+WS?#bO1b7{IK#l6+>jeHr*
zkW9)ks(1CKo{Cwv%|cfwGQrzI5r5pn*BIakD><U!duPkkjBeC2aJ+yCamS)+At9A$
zlCgT~f}C+%&Da^6LsTRRaS(0Aw(J#VdQaqW2)}WdVQE(0dph+s@#puM(|Q?n@XtxI
zletckOZl9YR&&NIPaNQ57rN}1tIUbj-dN|rJfi_^0^N2>4>UL!zZ2H2Tb5q*9MHNu
zL}JZ+ktWy$wC_(#>NN>1!>$CNyX??#a#*bp_C{84X3=ynDmf<Y_=&mJ@z|u$rMTSD
z#@B}IX_7zgGnhAq+7nq)ldh~e8W(5sWwLJtDq)7%i~n{I*^~J>G2~^B;!-ZzKG)Cs
z5N+04F3#nmzm+=7q)je~@3HDtwGFwHC2(MSVl>`X&?+5GbK#p{Fz^xEA6D`YmZnnl
zHJ?Rx8Kg>oV^ZgYGz4#5*7$t%d)%*k!z2ElIU>95NVfB+j;cGo0)F*ZFC6FFwjv$j
z1NQTc|AEV?iWxfptn+qgfot>m`Xg9x$oF&2|F#4Wi1&V^0Qdd>V8I8EVxhng{x@#?
zU#xqMFgh5`Kjiy=^S>vG#6QKS|GU57uDSm22mc$9{UfXYpNZ5B^#u%666;p-1dan(
z2y`V_)G?%8H)Od6uU*pj%rU%XH}g76@XD63Er1ECY5^O)x7ib^nmK>E!ZaKDg|n#^
zB9CIt&Okp9Dg-2}GoEQ)QnwTsUVs65Hy+uXjiJ{O>0jX2S;eV(Enh8C8JW5>1g4l3
zPB`9Sd(8{DY9KCKK72}f{5|42iPf|L-%A$cYV}SU6r}f!APBGf*{l^yx6~faDK+|N
zoi7>4p-FnL4&;Q+@bV*WgtyTeEdJuRl24#Dvb^Y_ITe71C`Iy387nI9D;um0j+zwP
zb=x^dnB_~DfKCfX-#ydLcJ<nUJECe!phwPSl}tYsvsr<J>jnPwID$nM{?PIE>$Z0V
zW%3Z(PpgKtuc1r?TA}im6Cpg8LwZoRj7&7`;1ptYn+;F!xV;jPow+co;;bk77zk<(
zlS{C(sC7&ED4fyaCOM1nR|68CP?bSUQ<t~79YHw$2CuX55;kmPNBZSmmIYRfBnT+1
zZyPK#$SQyHVUQs2>=D<HYZr;x#UDI%1hl^5L?<|ixy<Qr=*2P=F#L1$(4JF`QH;F0
z1N(Hzi7i@~@ifN01SQh48RUrv*3=ywwrmQ8W2cDXtKI3#_Ihh*uCTN#`s)PlTL$YK
z3W<Wq2FHnVma8|oz%SF5mR39@y!8D|9)76RReMIJJJv#Fp;G31{xMG?Fdn!%V;}yN
zW8qY8NAgzXkYby#rk`pg{zmGW6iY;ojGLf6$2{UQ5CRVly;V`cvA>egC-sTB^1e5Q
zF{?H_2RGh1!H^%-RyVs0wSK;<f#I2{XXOK}Ye6Ih5jad*fqE62PWJwU?d0Pq<1Wx0
z)XePkc(Gyddr3k{aAQk~fa)yJ4jID{-ui=vN2~`@!^vH*v4ods@HcF>gTBGio~S2_
zuIaLB*z4fKo=?v!*-GW^<V-<)qIB1=QICwPpxQN)c%>0nBT^M5u-lSd`VXR0C4|D6
zL4fI&cNd&>Oa1GiGa@@|xL$s(DpI<&W5Dc6{{pPXP0B^^npJs#d9$PyoDhPP&Y!65
zE-BF`G%OYD){S7&y24@^7T1`@WuY2cOLJ05CJji-`kpDCEyaak7fj}**sJf`!&a6Y
zg5TMhig=50bYWj>UP;x8@VX-^!g<WzUB3VhY~S9p8)>X`Ed$R|z?j(kWR`0U?=uu=
z^yY9CbU;@xc%Bzi;RMtK{;mt3HQ;I&RH?w-V{5FvXTtWue7WBJ>dus_pk=ag70chd
z*OpQ>XwSzq=-gpSFbp(D)w*K}z$GAA2b4cxWn~s!4m2@}^7t;OS1t~=3sblLTD{U!
zCW?3Ln2nsP)N*>P3}z1xN-koTrFi*9zz;Tn`_1O(tWBOj+{gn{mTn2}9|EmPlB*T!
zg>Z8LdIA3ZVX4GL&f<_NbNLx)fnmpoP5Lr;T#DZwr!>AojiXFemxTlvb27<r^Do&}
z9%B%8>eN>4fe|=z&p+DTF<>ed3N|kM8-K+vpzxYR?Kpd8$u7%UGBdec4R{q-4mqw;
zAhs}7VQ(ZcR>MfaPz|q!R9-`>GUWN8dj96w;N!OttZ}Z@3%`H@F4=B%myNv;w(umI
z>&Q7Wt^iaMCB*yJpv2;41#6IN<6vg`oFOF*K94SV*b%#MH?>tPMUs#E*ySEn&uH_R
zsEC`dcacNC-nrvcU?P2$tm-L|vb0))?b7^5lP6`l`y)<qHvnbe#iwzJH)thjr&)H1
z=a8%cj%muA($jcr(yUJ>R1&Ko5~{y3x{21i-7m*-dwZKJDJ$mhJe`D2!E%j(l(VIo
zLzv%IVYN#MoBK6H*d@sQ2a2)IOIZh!$VOB^6$5kxa>at1j;&+1Z?M2l@1^2vWi$jj
ziQbw9;}I{UF*fdg-l@a#gfcA>FQxJ=M%T?g=(QyjJs1QrlSBsvuBe7W5S&?B>UuB?
zDs})pSU>QERjq{u<aq;!(3{928+B6$8~ss3r69Yw1}AyYic*)*E=}pxAuj~$7U~)U
zE-FkFZU&6jOOdn4j4At^g(aI1T^b1(uz&yJ==6IYP2FMA?u*x7nVQg3AF)Hg^nbL?
zx^Z4rUUDWp&hk!!_u#BDa*q?DeT;s$AawjAOi-UznJ*DG8DN#L!bxV;0$-yIS)jJ1
zUP36)=~a(f)a!0pM-w~@ft^vv^DJz0c{Ky+NIp?jT!cku-GsH+O)1GRUg)ZaEEy>;
zEIDuvs*Q|<dfsr(MTJ<Q1qUHY9(3ln!rvS+U=N$l{*}1-Z#?E4D}9PQWSN+XhiaEg
zC(Tmr!t_fmNi#uG7zieIhqWX}05z^mp|xSdFxTBzM1w76+y)-QxUbg`*>nl1-=>{?
zs&J+-ik%vx?A~}@Y)^iK>5MG>5AxnJtgWqE8%A0vrFe@=p-7=<aW949g#s;3a1Bmz
z2`ye6idzLrX@OwD-Q7uW2mw-D0t5|j_Bqe_&e`97&hPifyDqK})|x)%81tUv9?V7U
zdu(dl6e0hhq|a^TNLEAXN9Dm4UwZqwDQd-<uk|%ona*ow%>^AaVRxOYWl(`KkeK1b
zq3G(WitMFmEqmlrWy0fE22F2<Ewk}UD}&y*Lyha3??S$^O}MwT2BK%(^JevFw;yS?
zXg6ZdR63r0+&g_%C%ymK;U~#MVixh3(@+S$;Q>#8$MECU*O}95P80zhxYmLufkVW=
zYFFfG090j!GIh_NG;Hs5oZ&^sE4m!W*E+i2*aHmr{@`D~KWLs8H0p>yF7%y#(q%&e
z8~FZNrL+mByYaP+RAz$W`yCf|8Eh2yidhj=n%ZlhH@!;XTOD2iv!yj3zdXCf1l`w-
z0U!W~d@`PABA(&Fwo7dQ$>#Ua{>7HkRu%&KWhQzl`WndW(^s&opAk~w+&&d;B*02%
z(uc6^)#6VK@k>4ll|(x8Hnd>!I)+RLuG%jhsKvac*i&~JXuBtkaVy&iD>QxFI1YPs
z8Ur>m>%h3iT2=Frz=@F7$JV8n@#__X9n*w|$*q>p-Y&Y@?wR0Qkho5>%h$)x5VTBp
zi$rCXG5=^n2LKtUen5h)>2Ku&ed*clJy(1&Ib%BZ1}oy@@yZA}yHL7e+1yx0Pd|-j
zzR47hIiRsdPx(5ev#GMuvTZpS9vvUQ6c4ddv;=ZIa~4u!ddQvgkSxYJ*^5k&_Q8JZ
z!S=81#{6gKS06|$#wG0BMFcN{msxj80-h6(-3|=jrkc&grT<hBE;W6=Z$dmuIY<9H
z-dbeMznxVbY6#5djM)kIDmDq$qqH3Q{dNb*txOv4*Wg`VsO2o4U(FjXTo|7SnG$ic
zkZpIdWaX!J%$JJD8L5&I?PGwv+7??rv44MhXW4P_qA+w0^k>3k@+`nRKH$cRjXiH*
z`BUN}iiT$-58?&or5omE$SMB18tf$-6soKXaJ~o~$*w_SUY93F|JT(&a@Eh(<~{;z
zyBS$&D?(0Dx3m58LO*uF<z@3s?Zh4_vCyU&upv6B0bV1>xb3cS++nNUp~!tHKErYY
z7nL8PFZkj@STr$)pJXJf{Nwz7$^70eM1fc$OH2xM1ZwAguLHUXoCe%Hy9dTl?U%Tq
z5k6#=JCEqyZtsMW(J1#zGarZ;NKQOm9&ecH#lE?mWz~&ms#)KpFd&bQeo{Jd^<h9v
zZ1o-Fa{Vsk>5@trk|KNW%6*N#p@?RH+Pp`Ij`R%UrRz%LHN!%F5|p!~J$7_l*SMuF
zr_f9HrSpzP7VMDJjWr;cK-zahyP-+eM2+O2W5D-K;HmWNX5h+Z=em>w2m|}cawXK!
zJrUn#J=V7~r;bS;(+#{3AhKWfA44{zpOi-(UG_TUU(!t+KkUwF<5TW14w;G2qXyIo
zdX>oz@8bbho@U`|Q{{qE9rhon-E^k+lqZTehfVi~RFr0jFJA*|*YWljvN=Xw;3m}z
z1EZKEb1CwgS{Y5B%G#ap(}1fnc8%28KgR{K;#p$PukyQHcC7aLZsA(u=-`Xl(eOeq
zE;m?YW=__6$-de#U|4$SaP+vRP4P(i4318)Vi2Y813hS$mtw50E{3%9C63a`wGd(u
zLR~gn$2;GaJ))J-_~8xk)8}ncuT}<gcFCq>W-tS%$8BXnQ5h*j(8~fg=Y>d7(h!%(
z;q9E7GG=?hTi%wrCw`nmp6|2kzTc|b4slf`B&xAZZcx=|gLXdo*u@Q{Mp(`>7JP@;
zY7rKh+`MZQC=g3m;FYh5oT8@m^6i^GHMDj|<*l>a%8Yoa5NTnSY{B+tvM~!2+Vv6&
zc6}7|NB~!^Tt~F|`nYPs#x*NI*5I2?E_|YuO*^jh_Ch#nx-TSkb&)|N;pR^^Qv1OF
z0oRIJjf8^41$57OJ8(@}@NI=m^|PTNxWXuJ7i-8Mob`MP5jh7Lx?Ryjkga*2Q9VN5
z8*L?M2Q${@9y4#wR*f0na0NNrce1p?1>dz^x5T9WkUF&7ULq^a$dV{@!Cc80YMi?x
zkvR8Kv2WxK&d9jt0V=X^9T}Uf6hcRnxmz`_=4DKX3%<#=qzVG8d-JW?mcA?p-*a%u
z$med=TIfWB9o&|WBvKBFSB5-lT89_JC7|_rjKGUh{2PMSoc4>R=G(^f#jt`<&4k_w
zmgLU@t~48`@hS5d95vg!VEfGaW+jQhPeB{J1+zq6?<8t*fUmm}_=k$oMj;35IB9!h
z7CGKd0-&p3NX-M8j|)OC-U2wwc_%Q}U>ywVHq6g^UrXd7a&HI1IGc^+LkS+_%8hiK
z-HojF{-j0Yb`p|XFQP4YL9Mp6hK8P<7{sI(-y3D&FanNHU^euKKoqU*40^a%+n^Bx
zGNTaR0D&Q<tLRuS5pB%-n;Xb}l*4=37oWFG1cu6+hs&KJ2#22{vTksi-vvg%vNnTS
zV*|z%aq2sG=F&18%q5l0&nPZYENqpyQv=+b*N>c963r#bm3%fhJclw?^l&emAB;f~
z1(`~Ze1ao3{7|(%o@GTOL-O7v0><Z#cpWXX;3CiSum!b5c~#W9Hc_#*>6vH1n039&
z((KQeHe2!2I&=1H$)47i5jQtP#pxtm=?@}mHpagcedik~7<T0dJol$}Yg3i**d6q0
zl*1fq%`@NPZM0o3HSyPmjC%2ECm8FReb4=cu6R>=^tF>HsL%7;mE(_ZTAd|SN?ygG
z&|IDt)IC>Z`=!j)QF8dpgQr|hio;rrNOL)TI9SU#M9Q}Qy>NZ^h|W$(rtxj+jx8tY
zA0ye(uaj)rb=af8W*oA_w*`(;vWi8{HcqXliV!t?7``k)3(%PO;gZ&;Q<wyB#nN^C
zJ5A!V0`|HSwvgcMPiH^k;51Uc&#6j18jY;@=dbz7<gIvv+x=I2nk=3b<-ik}*+rXU
z-5o!!F6JDf8qx}S512T-ZoQO*xjHmHAap=mV_W_00*vmGTa24+!zAO9C>OcP1zlqJ
zXX#PX5b<7#=|l|q$IC~^_kp!3^s=Exdeb47_P+H!;&vLWmgpCe<>ky68_#GVt?J@8
zg;^}?uMSwFpJW%Cb8rh-@kz|dFcsXm{JK5WzQ>g?nFCoAUr*X$K=w3<MJ`I771~Qy
z-*@Aq{isn(Qzs!JeWthLANO>DEL6=(H%-r3AP;F-otHOvb<d?=3BeAcVY2<^^*I0r
zKOK9jd2`og#xX#TkhXz5gK9(GzG_^#1U-J~k+qg%6AN~w1rk1Ax_*f!4Z-^BVlDGV
znrn!mRj~GI(rZU6NsmN~SeV`)j+UE{T;~tHF$I)h!0hOlhley_KeJyW<dK0t1){j5
z<*a~ilwh$<^mWP>QkyffpV}M_L`OXYrWbS*3Xuh+AN$_2tT_=`gBLl6Rzjku+a)?q
zh(4CoiObkkpKef+y1ljxI6F>739b%Ti%XAXFZ42)w5qyXvcq>vpoSBtM}&Q9ZlF+$
zGxm<H3`Z27xlHC^HcAA!sXu48vRU1PhpB7xc2F_^)EAW%JaJg~;H|e~%S5%ZuBI&U
zS?*=8i_6v7-WrShx3W}7d6-g`KJO(kbv+$WSZ!qc^p<xiOK}=;ca(!GQ^U&xKrKEn
z1}bnqnt3+50{=0avt<|9n0JuDWfWcBEU>I5$tytX&ev=QpKg)R3d-2MVd-_;`G$e6
zkc0Ux_?>N#SktrY5&p9|e5Eel!8^hIn6?2zm(K{flXoRin_?Mand|||HkZpiJOFqZ
z&Z7J^rH1CoZdbFBBl4G${WS?q3~fDSd+AH~bh<7r*K;TLx+a8}56n4+-=l7}0oN13
z)8?4b%S@Qf!T_`1eFl{~dVMi8Tz`hRuIIxJ`K}&m70C76G<Qx}6WyM1D9VbkA2|6#
z&}!LCXbRg#*ih9of$Cy?B(89$VE`G9+YDgy^p6Ntgq#!hxN?>plYr#wV_;2=>)FY*
zFwW*Ld5cbgYMH}3P)C=hyz*SO>f`%c5IuvMroNFw$g*tnJAq@n;Ft?{uigYn-oQ|B
z!=am1X6rL>im#gO5n*jAXkGa8?a*!DTrSg>8>%)1)O8MR-;)*uuKLQQI8IPVTaVBF
zdJc+EsJN}?tHO|Klx+{o>Lb!spA((HQ*Z@@<hZ?rck8IDu#V-&b;d0$&tW4l{gfB6
z6gkGMiE0~@Bhzp@@`FEEAn>{d^$>b2Tsh?Xr#T$SP=~46{y9H~nT*wFwP~{2`g`VZ
z@a(Ri`bV*XvZ5L@E_Q?k%^HGC@4aEh@G2}w#1xna>||bBGutx|>Jf6+=iyv`IkC@G
z<>K|;WqTTjzY6;?AaI>jZuin{-}2Hkv64C$St{8T)cWGhEN<um@1z)7b`oeHr?HT_
zwchG1z{OC8OphO#{q-K-jK=d5$*MH9y05>c5n`1rA(ehkyGBa;aCD=g@MM`w)kRu$
zgRm{p1$uO)d{nz@XFztHf?~oP6i8~8cS1>LVfIHdi3RgFZx`yQ1hAOM-xfjDBsZtX
z($PyXF|78j2j%+(a_|B2@cH3@Y~(8t4M<8-^I(wdoTWhPeBXu;Y|GNIM{l<mMFZLz
zVVP5LTF1faU*}dG>jOWZ#VWTT;@h1?q(t-eK||Z`?x05@&pj@hze3V2awq?AwDH=r
zzLQnCy^!BzISH|r{bBdfE6D3~MUVbMM#qOZCZmId$41|rtLa+rQWrf@5SUxK(q}9&
z;&DZ=F}f7qV5vz7CwL__dduqiegle1+#uXKnap)6_59V)FxD;S$eY(rL@cnf;qku7
z5%)`U|7&&KyY#_i@H?+`fA_(UMWt!w!{j9r7DSc25s8NO@hZ97-$LLYa;ZdB-<9Vc
zE-@1ziU)QB@!A429pEyX@dwu6N}O?*gDw_EE`GkUjQAqrnkL8eH3seLa|s!!SofS=
zjP>cG)%n<o#d6l>WeR7{jmNS_$9G&1(}B#rB!h)K_Bif`e}-h0msx&LYw7NU@)7=O
zKFZUY|K?J2A+4%vILgw;w8JU~@nbFXl;kpK-(l(5+l>-$_3d(L{;X!(zv~z6zR83g
zUvW$V6B!m^1@wi9XqjHTIJ0XoQ#?08laJ<yZZtjF$`w0&oP=X9H0<mXQQkG+oy08r
zqPg)!jR=XUA<iF(`Urq-SJ}F@scVnQl*B{bik3c(Kf7$D&QD^ymP3ZDX54wU*%w1H
zH>KsDc-RZb6*ji}abI_MG$(wJo*T}oY9apif%J#EiQqF1WvQ>iBpQ$*!d1Xm8jyQ#
zp>a5(tSSuO-DY+&S%@*u`rg&Wf5ifb%QSPqo47vsk}M-MS>i{-?(79ZhbRyDa_8A@
z#QKu2^>#M6cHB~CA7OV64)2-n^l|a$wFCJr)VUnB_rA!ZX}Rmder9ozYUv(?(C*l>
z60P!((FGv{(_M>)UJEmon@x~8r!)0d;rn$N4-;BUTE~SQ2^aS6Kj9WIKN1a;H95Mh
zv+Uw5sl19AitQ#wI%*sFoLm%2MM*TSPyOVrE@EPyxAHI-E9_%=Y}ABddk(gzt!rUf
z_(`#x&)F)FQ2E|wb@fZ1l67><_;UyN2i9|p<4}`z@6K)?18v`YY|G+@ke@{al61Cz
zgbxp?&rwtctuzS8<?M27j5C#Ze|o|M75mN+G#c1E<(DV%u5cMe9%GKv-e`2aEorLv
zL%6zNc~?;c@#vGsr^n_E5*VUq^&Q)hd#CYT4!_zR8uPXe1D^br?!xt<o3&f*87Uur
zeN52j;mIVT-TJ^(^OJM~sji|lk+AT2SCgej3KpOKaGhN!NHYH5^-A$c_;T|`CB?aR
zwNZ9gAU}Un+x=p%dmwk|UJV7xWZA=qX$LpV&Ot=Z+55S+=+M58bnxjDwwnms-UUmj
z8tM!xs}%Cd0mGPa_OK*5IFvP4wgqt34-GF6>-%C_Ozh{$aII68qkgOP@<B$z{qMu%
z@#Tu)7cM)tu3iyA_1*PNAlXgmx942ZM5wMEkM$YkA#h4!ll@})9)1H?v~yTmKda_g
zISUesJOWNJ)Jh`HcSn4zLu0;Fl@Q}F;nO`Jla3siJ^s`HY+OKh*E74SuwdH}Ud;ve
zTiRppJz(Y}aw@Eoe<2Z=?PvKhKi5`U;4_1}-Q8n3=bi8mu4H=qk0E;@omr_2Xzd@O
zVk5i$xr5%A$1cK>1j^Enh2L0|ERH$7u?wUrlrf}T><kiop=<oOu_9#R+_AaU1q*Fq
z$hoe)ap^Y}<s#jk!;bm15!PO1%e#{ed7|9k7W+4DjNXO`eyQ6A_d%Tn2b@JTcEwg^
z(688^ff350Sj(P_ic)osrpgeICS{!bt#TypBtY6EoF_rMDYk{X-xN<rd$%*5vlC5Q
z_npkmk4Eibs8?>mics7Ryw*FFxrv2A%wtkA4xuBSbjC|~`E{F&T)z%MJ3B}6`}I%?
zY`_yq+bbcm*tR5SU{Qr_FM?LyOJ(5_P*X(m-Hcx`Omg@bCvQ{)vGitV_$Wm@A^s&6
zIhVB#<G6krUhZ>fJ{1pt!TAcPCBE1LgpdZb+?PhBJ2^-k?QcbTL{G|!NF6+BK0Hj9
zz(b^ziPE9YJlSs_#@8prcI}KveynB6=y~J1Ra}3{<_`*o*CKHY%iEMEp<Z(yT83Qj
zBq5fir<X>|V`RrsqeSUH-OfS2#f71)_=&VbEA(AI^fx&(-&5DQJ**p#l>`lRdLdSA
z?ibeX;{%B@JSDamX`^auYy%TDJ`0fZJ;05;2~MB8-^_Tjx*PI&5RGB#eJhsh8_0xJ
z`=*#d4O)b$w`*&bXRBrDPzT-qM$se1;u67SqF3TQF_mF=PU(*Zha=8C-S019EoRd6
z(47~9E*lDVxt5#1_=79EtiMwqa4{4JI4rzsxneCXFcSdgTmcE_5Uq>fYXZxnTIEZE
zGT0ZizoW=47ie)}3?7@wC9u#-O<mDB1ZG24WvV?ileD3TA*FX%DT~X1=8tH#Hs*4+
zHV>eip(AEEfAE_eZayqtG#woKnK!rUG?!cch1Xkc1QF{}N0Gh?JQ_|m@|<a$fZWGR
z6>;81^kGZ68pm%+C(=G}slZL#6g~XJwQBb+kxa`Gsm~UddzZpr&R3NUaV(g%m%Dx~
z^Z1k}JMOH3`aZ)t`GnAs@#^UEs26oa=$jNy@@<3L%Gi2KrPesa2EEE(D&)XN=2CgR
z{5w*)`w<!`SeF(6WBqmwJ$+u+@RF6kx@#Bdjv>N<uDx3p`W=2iZ+{7K+}&O%%q>5~
zlu~QHv`aP~`@VW959Y<pXOS%Ac?Bh`6aM5BvKRA6jv#wL>o+QNy&&NjY~XwG(F747
zvDaebBR@EwcpDu)M1%9AmW31W%&Pg}RvsBR<6yTkvxDDZ|IQbAx4AhIy)8z!Y>H-V
zY**q9c~i%mWeH}$7`5&Knfb+<S*Pz8ak9jO#=>mGn;g5EA<x^2J{Cz6k1J&&<XDZa
zV()4_a~6}zy<+WBv<bp2-Oi2c;4RbT^LP6!HO<d~nJu1qZ-BWluW<I6N<ezSEpC{P
zG+`Cl1d<Hbz2;I_>j&S+S&K+O0#4HBgjDWUWghiEst0!Udy?rpU}47x3W4ZLmX`q#
zx~d)ni!c;Az{tnG=_wbheRffk9j*oSkV4;^yO_)UpOd0=h*j#wqbMIsM)cry5Sn@t
z5mj=1E?-lg6GRwtOB-xLqTN6QI&AE)aL_jDZQ{)*Q%RNk*y<4tPWBKd#ikG=Rnw<&
z1v0GA;5)*q>c*0W-c#`<u7YbXSH+>a%|^5wvdvdj8dEMiJu6jFel8L2o9vq}wbl=7
z@f7?r%X?WWe-p}}2d9i4c>KCLz7Bg`0icHYG4{)87RcXwG5WBg_ld)bVs%9Z`@41*
z-Zg?(&!2&lP?3KI<Wx-p9)K^z)T%VUBOWXCF>6w;&e%B?Tyrwqg;uQeekrh(bZX}m
zz+ogLASv}c7_D|qYeK}d{g(^#fBbIdGRArOPxqxK*(BbJ|8Rx=?ISrbi5lnA|H%HY
zuZv+dIB%5xaji2K-VOa{;cVzHBs4fa-st76fPa=1-5==QgV)}QQ3yHGXd8x3?B(dN
z{H+!L`9}k;0WZ#bS(ZMJ^M2xjuV9e7@HYFusSH*Lr*6l|N1{55+g_-s!178H|4pTf
zp}+ZY?x+?E2Vq|?#g;R&RC4)mQuUM6Z{ocO4Kn$V2-=MG3)$A1R{2M-m6TlDy=idp
z-lI>BI*a8J5sTySf733{oTbB0tsP(aWKRC*)m8KVyCh*JsUiR7KKcs-07pVxdPD<+
zMl2Z7{JYUn5U;^;xAjjx!1Rc#rL6nF{?~@;!+3&_V;CwO+QpdvwbSi()g9)TEl?o<
z{J##(HeUPF`xpBKw?aj_M_SN-TVW`A|Krp9swv?I!ANZ*)69i9shH?Rz1#}*U+2^S
z7iucE3bO!Oaz1VAu+3BX;a&;LtoKK+ajq{PYSs9}0)u8!t-mK>{h3M{Q%YEOuSDyn
z^qLbzwajbXXJr{9k?#gQRJj|E$`DU@Re&12;ojfa8}^Sp{E6Zet9&-tys5;o;KoMX
z&GH+MYempkZ=s&voas)wUmXueL=7eKIPNn!9r22NXw2%9!X4P<neHMPb72r&OEYrD
zo_p8`7XvJ>hFWY(c=&JpTB%gCN(qRtICe=1aO|>nRW=2%8B1uJg7MDML6|EFyuJ^P
zUz~8O8%`VY3w1bhr8TN*KgJ6*0Gway@=@^T65^{t7I|vZooq7sdz3urz+%LPqmL{h
zq>#dw5?&t*0n3Mt6&|WJGOsBYrQ3#O%p$wZz8jl8cGS-TzkuKtZEoClctEBbwZ-_C
z546CvGJT%44o}7T`luhwkaSLx#1`@t^Eci8a6+nnwAEkZ=>vkxGQi!j617^t0;n0o
zdV0XG*Tf7e#J<tLuAKCv8XKsl&h3s-FC6Nt;Eb=N2=TV+@==C9r+RDTrO0#Ci%u93
zv?YNW4leTPgm{%gW(6N-OTIESYFs=0S%cF{{w|%!PKz>wrc&p{O#53QJ8o09Uu7>D
zD-0tL#F6YDEH3X%Y+rkiv1CwJ>by59@D;CkAsRY0MA2grQ-5EBT8mdfw((e*-auav
zkX=~S4De~Z7rRW^VVrG+pTPHmZ+4Z_+4_3^<Wk~pJ3Fpc#Y=8AD1xtsL?2<hSn;y)
zeQx&%UXrnNEJJ#<OR75RL4q1ZH%IosoqRo$TT`xbxTsiuta6Mkf2rjM&-eJ&zopyp
zs9q$s=g~u!_$Q7<M(3S?xn40?L78Sd#j`X&Jle+du#SHm>GdkpL|BAP(b*DW>F2eq
z3l=A$;i?$f79Goan;s)mP+z5_D=;O4;;HpVpL!P&XqI-DW=p1g8B`awr0=StqU6hc
zVw~5u7D$X7qfFbZg!ceXXPdj6B=BcKJ*fZ%Va5%bf3(az+`Mu$3I`8X?`&Wh202Ap
zULmFe4*UB!!^){5d(Q!RrxKoBj}|v6p~}-;lKS*+Nwb_I=Q8<bt?Tt{uBgF4o)=xc
zxxmjx_jW{5fMdk2;y_kGciu`~oFiF5{lzzw$!}woQ1)t08_xg<8nLv%O3PID%R6dC
zqNAoR?HBK!LRqWIuCAPWbj5{g0UWuKYNlpO=eab!KRPbND-~|U#1C&Yc!$droQejG
zA*$0qpPNN~20ryC?zU9Y{awo9_C3_k_N;2iqVj%+dKzS#Qtjb!FC#BhjFxcj9#|4&
z1lnYO{ZqS|UZ5Op-{bJXswO(8)Xb;UJg1%Ao08#;Gh-3GmY!oiX9IgJ&h@r`t^*>_
z!io<Wc0X1pJ~FrfHE818`_jt7qxzXAaX0YQS`Ugn?NR+Y#ukxT4{++y^hoH|wq_*b
zy(5$h`0&b&4xHYJmumY{%yQ&6IY%{ZpXr%@Mpe8NR@A-POJDU{eIewVgG66f+QrgK
zGDEj0MN$bWrC(F3u*)9|@sdYj=`!~6T2FzI{^#bErF?0Q`c(Q$QmM(7MWhCWo~1n`
z>V?mWp@{Zb+O~jvyP9FmpajGyFkB>)Xp}<fx8G^WkX1iPEPJhjY_@&n870oC18y;(
zO$NH1YJeq5r`O{zpU1N=pGfN5<Hc7g`&u)X)<7iYKu&}FV${}bUP+|(=7}lx-8w`A
z+jwd`_}$Z;`uyyqzv>l$iypQ2hTh#<qujjKaeipwQe}zGFD)Y%$nS%y8_3#Ki&|xt
zuxIwX>!g3H8Rm%ph>I;(_O&UXnby3UedhT@r{vn;eH<OU;&jU%Ble$Vx|)>Bxxjd^
z_!^9}0+8OV2F{SQ%(9gp;zmC;Dkce4f7|IORmj8T(Wx(={VTsGo<~V7Pu(y5?cTfX
z2}aB35lD6#q^%mhxZ&L4wIyXWEmRVjUedO5{fF-%E>-Pmd$ngv*a5}SAg|bc6AvG(
z$3p}fO~a+C9vi;JsN+%lD6rD<2i^yR9{|UfZ;L4n0Xbn1P9+VbP|4!ySkwD__Pn-u
zz9vaXyP!#onqJp5O-U4MD#yLB(E3It3E#`Az|{CB-!@91zH<(zk52pDotX#EWNK@N
zKe=o~f3rbcXTb~~oB0!p*Prutg2&zvJ85R|I!;{zkRJ#1K~Pn5sJ4H+q-u6}ZPjwY
zyDrC4?i6mOo^0s%0MH(zYt7N4#TtsfooB-Y@91+f1LHz%i3K>)8$~Vld!>&zoGGn;
z{}vfzX|Wn5CD$r=%+D9PL!sFU0mvEQE#Cjkg=<*Cz&Vu8(WcNXV*%$GYj>uk0`pU9
z71T4mZscGi?jc~zizU*~%j~81vhC;V-I0W@BO3yMm1>%T{y{A(_o*_=UpVlH07-j(
z4wTp$0<)j>Mbz{H)g(36YRfVj{*qJI)}{$>MA_o5Q?n)|XW=hog%;2>?^j+Q5$s-S
zIK<*(&%DOlkHSiJ_d#XN%QOKU$F>)#8fj~xF?LyLKecQQ@;?|-b<q)f6N_V%yitZZ
zNz;Sag|&O*nSFwX{|EO=_HGJ8jYY#J##u#wI)9fdiZ&+=ky<u7y9aAL;ii<-D{EMI
z$D#CWioA~oJk%89($n3*He9Uk7w`YAFb!qG6V*Mz_%-mNs<cr5ujNsf`z58dz`Byw
zF!WG=<*TXNsT(na2rOweylCi>zTzB+jE<3<3W_bNNHZyTpat=~1R{+Q-r`XV<!hpw
zLSo;tcy*kxu^gqkL}S*wcqNdjO!oHKdvy(Ql?-TUtgXll_iVn9yyP~%nXGCtpo*+Z
z_>1#@w(q%@m-pEKoFdWu=|9;2H6>{*`yaHYq_nS#ulQtmOv}|r^gprOzLAz|((Wu5
zj)P`UqMr+*=j^{R7&;{}=$IH5bhEH>ePa^PN>Yat0*<}?Hz06*)bom<4zpu&q@Y7f
z{)z^KnLu5e996Qh`4-i_B%Fb0gs9JJOkgT+e?a(LAa@vC?goW5e^robJrkYJ4@|eu
zk*qba?qvV^d9i^dRBX<t3N5Q)cR+=;Onzer+6N(VyRBSmPQbDP>(&ni)3q<IP;CiV
zURcEUGI+YN$I*=|_%^r(gdyJ5f}mbxeq~9FsP8-o>OY=UeVO!vb$7GiglXNhT8S^_
zQ28un4n?aKNOrWn_O9+Uc|idE$NuQDPt~cY$&<J}2mbyfb?%l`_sP{g`(@n6w|@lI
zPTji1fEg|yt`w3a4@EH0menFvr$>`bEUdB^90^i2f5aXEsC=&k6pE!+DWM(<f!xi9
z-rst-cL^52t>gQL+Ltn|i-_Ya{tShc@tX8Jx~|)*9;C1tzxE^3Uop0Fp_LD>cgo}n
zc6{iDB$S}ffs-~YGB4r61#DNVVVObs!_)q)lY2))!OUZI(kFhA)~CN$#-0-vbgWd-
z$lNyWHR<Uh$aMYoGl)17Q^Q{m|8NWPMmETr4D8>MFSi>4b3Rol1i+p=%)j7?p8_}E
z?kyfrCqzxnX=4TitV|ON^?O<7v=^dUHETp7%FIFdg|)@4AE$((Npk#xpVWX(U?wJY
zt`)Ky<#BM78NWHp;kjnUP?&#|y4BnoLx_Dp0=2|Xw!s9o(h&Y&KTpyoy!ff>>t{>8
zs!|)&dO=8bCoxOkN-GnC^2xjOi-IUwIoO8h(*p}WO#*{9z*G~<6Ug4<UX5huij8hN
zRTog`b*edAW)Lw$6i7gfL9l`JSVb;o0a65c75s3GZ2t>FZ{=}jKwxle5gDGF1I--%
z7<O|T^K1y~a8$)|+%Qq48b3svG3;n=1&<huhuhpfkB;4tc66ICB;Yav{adFt%t`Ck
z_OJpV8sDUqo7_Ll2`lr2(WWbE6cgV%0iPx8k4bhzpKsWx?k#BaA>~w@#+aQSPE<E`
zyO9M1H7wYgkA8LJtA@?K`n8t@iXQN_*m>mazLBmb-gWMBp4WsEt!`i0;azKluC9?i
zdbQZ<qEyZ0=4iHWhdE@fBFyhCM0L+lYW;YDo#TbRvYHb9&egwtffD>%ONq(LJ}l>s
z&a>>*NDp1>DAnv!lD^P`^ySlgTYFK))|CMf3FhZ66+|A6m`^uqbw%)MGyawO@dBnn
z$wCtB<N!e+?`q{2=()rw&y|Y1V1zPqu)WDWx=|6{4+TGVSZ<{*5`$>DGPtCLdlc!0
z+(-g~zAh9EiRL;PcLE=@9czaWf&931JV-l7R>b;NsxpQ<iUmW`{apORg3aXXHKkPG
zS>QG5fauM!k2|Td*D{z@Kj7~YhI)bnA1!<N5+;@Yeb6U&jEF0YJWDnaP8J854{AZz
zc_(9u!;^g#)n6RV%gOr894j;Qi7rVh7ZF-f=6l{3o{2+!(xz4X^@eg#_;I{KB2wCP
zp3(3Vuk=(T_$=Vssi3b|iitBP-Cz9mWoy`b%%DJO;4#s+A`SwcS8`Cx^D^R_FO_fN
zcp>ptBc~pLb(2REAkAid_7dk~nUPa>!rp$S%X?Z<D$URT+NUVyp2<DGDqs_BEO5Vk
z$-f>W06$G2&!A?VzrI=-@w_ivh?Y?fxUyV3rgVkjVCAxkPNd*_{jufjX4>X>z2KbE
zwKa_08EYJh5fR`inNv#ViX~(cS=;o^@o7k+wr_*X)V^n9@}b2t9ER2ZNyO31;-P_n
zmk#g&v8Y=|jc0bX=cF6mRr3?C_Z79rF|lpYF!2|<zbmVaHpK~LLrP#9A54J&-x&`p
z9*whclpj8#j4xL_Q-0YI8#@GdTZ-~9O&L74Ge9Oh&l0P;l2>vX!cJ03*q~qV_xHj3
z=H&eOTt?3Tw~G6?TCm4kW@M?%pd|PGhg!9%yJ0Ln4D)ItUe+l5;+BlF#2Y<Twv<bH
zOg~^g|D3I3mpl3!3F1`4jP!PCwa4E`w(=w(GSRp3x|$fFMqH|#PQelaSwR?F8TUmJ
z<&yFKXe-!vyC!JhDhL()X6=|@KW{je^}arGgPaFeZ?;~5;aE(-5MPwNZuAB8D?(&Z
z@8X15W?Ol$4vW<inb`P8t-=T}&|sT0xrBw~g+}lSA}*toWl^n<>(=BWvB@KQwCt<1
z4~dkyAs3ei+DI%l{Av0RZpxR06>(+keJq`#*5cVyU$JV4(JAPIT?zC~<cJQ1Nk)NY
zca9;;`g2NCRrDtLyb{-_ulksf*n>B5l!HsbHY_30XW$^By5gup(@lzB*UD{u1_7p#
zdXZ`4RRL)#fm%qMzd{$*X8cnw=gm+L#;y99gCZA`wI^gwt4GSQ;aMck#fYg}*@?+J
zYAqiZBvtKTf3dloJfq8xyx=*uEg}(IZQ<e-EvLI<W8)VWpbznEYio<%eIIeMg0+(a
z&!<GjS`rqjA)@vU2OP_f4`$(=)pw)Z9@HEkvADX)(gZf;A4+%0nQwXcf9<lAu=Oj;
zu8?&(PXlC_AE5mt4&KD`4Snrds?MlI7I2|=h3oV3l&Xb;o);YM1Tz=xI`qtxD8IB2
zAAMi!!^rA;{=K{CRmzvhxo*DNn`C^OgFsSMW1y((s&pGwukT*l<_=e48`_%|Rs|Qp
z`o8)DtxXfB&n>Z3Vq8XJ$(ZN-eH&61wySZ)T}DQ0-YJEqKzo{Z4Du235?TQ#^Ldjq
zeoI&e@N8{;zN2BHyf=xyi3MbFbtQ$oEH0cn4-2zl2?}lHvvw9Z5!|hVHy!>_?Nl`3
z4RaaaEAp>#NfHx}06uX2i^3zsI$7g(mfte1Zx%-SU|D5r-HVa(K|W+53QH=She<<+
zn&C@jXLfRad#3>^*bs`N`9Q()SXlL4kOP<NS9IgSYnP=calucij$P~0{ks9Q;w!E2
z%7v-O;sp|c_W{l0TOOd%uP`rKbu-?;#_{Z32}3Dazw)k)2(82?l2HmPHQVAMN@o~F
zoBcKx&=7Ty^=yH+U+ofn&L=~54k&;;!d81(2Zxxq1^a%2&1qv7Scwwe-s>C2OX;~<
z{l(Kn__+}dcIYOXn0xD&Ibr+Mf}~nM;!pl4<d!R7=l2fd<7$*bt}{GRu5+c44D9w0
z+g9$A8s>F7@q{{8n*AZkoyt?CuW6>VnsoU<x1O0)7t%29s9gMoDVjvtT<<?eBv=PM
z#3gPRu2_iP?bzEc91)V4**~7p%1~Kgto>0=<tE#8eq&wltJB<IhF|E*+;Q%B9!Tld
zi?ttf5it4r?LJemH_2Ww9@kCBHf*t$zZq`kCu?F7OEe#Nbk%nlP}f>KHn1!45hfe%
zA*)#)w)>%2Fl#ZgPl}+Ie`^Tv=VUhd`2)WS`M}B`gWe=>)&shH7hnI@?~hcvxWkTQ
zu3P0M$r#yMKrF7>Wd4NA0unMSTb}H0v?pLDx02)=zm_MN#OU#n^?GJ3t|}zx6K~@2
z{jB$=lVv#qCK>G!tSJ??&=x-pmCZ4OnDlw@$3600twmd2KL8xuD}m>g`EQUMHbx#2
z_F++%Q27GA{Mm!B2G0=Ke&*2HpChr1${NdA`<C4aF|0&cciQ*kx}p|Lg6BV!Dh`fI
z^?g-bXATLUK65Dy$|&RZtNrf1h$-#z?j!~cF<*D&m9--HvCsk=1KPXCh=arz)WV+A
ziC3=ebn+OmB8C$u<ac9TCS=p<<@L-jR$?pYS-CRWy7{4Nd9|o{)A;6&PzZHhAaMy?
zR$}huZvn>n7?|uGgMsh6gEhT=*g1$DiYjTuI8u_azjkfC!6!=AD5@F%hP$iFC6Y#p
zQEnbSc~EyL1O^veKS@jht?Mk7yUxpW<q!9qFC}~THyRL$VJWHwwS$0r#p5?2ZH*?q
zw8f__WCuNa^z*yhi$a`vO-aOa4S-FzmL%dMH~z$cI>9YoZ1CywA-|?nZ0b%o!)yYq
zZ7*_j4GX{Mml9#skGQ5H7nZTu8?k!9fHCu47F`5c0?sB#Gwc_bq-47c>E+(KpBqMX
zq4r44@>yLh*n@@aNH<89`0M9*F||UTWU@+5Fs<B|3`_a&fsaF|J^n-*MuPH*WI9+E
z%dD1@#cm`rzb*$R{1GM}IaPac6jDql5D%*(adT(nTEXqbc4%PtmRze9h($z05?eAM
zEB7$zW)?EonE+zV2>Pzcqhg0lKDC#0RguTW&XnaQS09PCNiS^YgfOct=#nb7!dn-6
zX6dyVsuI5T3Cp}!Tejgvtm0u$@Trt%pcgw{CHt>f0CuEQ#)jsZJ;`14JTaH92^xOt
z(pC8UU<23$3CWL|54l$ibVqG_<XnVFAJ**x3J9AbT(7^H`7Isr5xT=XWpiKB%n@HO
zAd1|tl>6hBK#FeSA@{*<44BmfVnnALb`#-oe~CfP9L3c2wbaWf<a?LPsdgd_e>(bt
zV5}Ybxyy@Y^Ia4wQfq8=-Shm(lXVvV&(ouM5-xWmgSzx+mO)P!r1qV1-<QuU*T(PE
zJH~bFFuB~gcM^V1-Fe({K%`lY9~U-I1IGj{)?}fpP5SHkvCHVrO|pl~^Cp(A9p+Z2
zm19@em2{~0YJfz8&ow*`i|84cb37~pH`lz%k@IGxb|kGIR8Wz~1tz)0eNWKnD(#<>
z2V|QjmB%(6womUS2-V9j6{7c9(e3!>)(h>Z*<>-SjX88K?g}<i+@?Q8OA#K{F?HJ<
z=U}b}_Wp1{yvbZm4gw8x4m(ijx{d$dPX2%2(1Lt85C7@MZy@|DMCKn?zpN@T&Z~cJ
zI`<X+Z?HL!|DSLAvYo*TrS-6OP9gY-baxtuGvo5KeGi1bBgQv+fmoAojPeffZEDWd
zFnUm#73+fRp12ACnHsYh%cx8mhIKFhF(R_prYO^F1ACQ&)M+XC1g3(nTt3BC4^QhG
z#XBhxG3JSf6Y;{nJiN<T#;23mrT^$R2bM8jn`9GHbNxl?>DAf8?L6D9eEJt4un!~n
z%gu>`4XyPMrF>DQ3_6@=C>w=O4`DZ0VrtO_H0@Xl!VKL4?=)IlW2`m8|03y8<E=f{
zuIb<p$rhxj!30+&SI`^r9$Ik_Sa8(vMy7Yu4hl8bq$hc=`pH^KR?V_=0*nhij+x>h
zO2ov5#P-=l0tO^d<55v?uzajO(p{Hsjlp3hkJ3ON8{e87PEz_>FvslcYYPVNZySy&
zPrH?0<D?8y!n;jo*_YSnm+n(9lv0f5adHbJcT)_^-|v`SU{~0Slw&oOP%+JYLgXX?
zc|L-2v&xwvN#yHuud;+$1FNzsD2*N*Ry)1S0cMf%4(Yt?GPe5|+t)g!`t+5>YgGy9
zSg$pvwVu0c>Jb5WV0|hn)2@rhmaNS_%{@i0oEekf8n|8mZX8J&d=8)XNO^iE@H?U)
zFee_b6FX{<cMf6|9-6w(CS=eAXZTPs$Lal}N&jz?c<g0B0pqL=sCYu!Z=xRwR2dl+
zYMPTj$7N81@36sioz8wjkvKnI8pgn#Q7IA+DCvm|f&L}gZ`_T+0+G~f(SDHGZ1c~9
z>_wC$#BNYt4a?4fAGREgXQ%X|1*}GSeUT&RY3M=+b}DrmIhR$<y=(ZMVw~mR@v*+@
zfm$>P{@U6DPJa*z(xht@qY+yMtTpad#+}%f5xZXz&k$Q(Uv(!zQJ2pWk!i732DNt;
zZ2QfbE}YRl(GY3NSyLKG)0fD#lU~$6scG>o>#+Kl;%JQIQcZZRx2%|UK0EebQnRx-
zVyvhuu)-eSHy*u1R0(m9r!cH>CuJa*ed>3xGUq}LQ&)<mVh0XovTq)!xDoW?c&K9I
z?fq=29hUSEX-1{5%caN~V6$Honx(WeN3Fg`7gnC-d&-MBQ)W=#Gb<=G>UxU}n&%Q&
z?-U#Tu1Rp{#Ao@uBAh7K&G=L0hNdvQM_p6YHBHnRIO<bs_u&qZ@7YmANNN4cn?FK0
zDV{OL&XOz5w7>O6!h2^twZ%Hbe%(oRTN;-+@u`NUOg~d1M#c<j<e242_eRQ(MHE7Z
zzC{qF7!xrp&E+EZtim{twq?R!JI^}Y$P3h6ItAi;;WI=idD%y+qh{^$Y+TZ?^dh-R
zS>SY&h!u8o+8GPj0hlH5rP12q`L)ms(@2G@{g$Z^qyb0?RDZTsrN^BZ=8dmQpwiN^
z`Yk%>v2zof+%<jL8>6OKBNN|3ZtojbFOul^R&_gv<-j*PUJ<Kuz1qnlzbyD_8<bJI
zz}W!*zn1mcFIsPN-A0;HX&DHVV)YK0kEO+&$u$9)89nd_#=#h95R&I8pw;oBX}hO+
z-r0e>CtWXNUF8R!1ADrzmEZ5=QZiiou<$#o+wrFuUV!O03iEtn$%UHfa+D!kwmgi!
zz<GPNGA-)<5ABqo(5os3|NC5NCSx4EGn5l+>>mCesX!lK&9S&GJ<YMW*!`GT)E!|i
zx7{0^=|K*t#;kH$y)k>X{8V<Vi{S026#Y@m<h8PB<uGBoI56FFkw>66efIda+Sh(8
z-3)EH(iSMym4_@r62<TLj+2CD5`M%{`atQD{8?!s#a1nDefFJ}WLlx=2F;G>Z77Rg
z+NWPB5|7f&()uf(9X~dy?MY%I!gGit`GO2|TY)bM7X+Uz`%8?twi9G0&9?7_VM6jl
zz4UKKFW;SlVSpXXzESCqyDM%F-*&~-N!>kVJa&!3Z1e0Hm_LyPeP~`<Gd4c#((fob
zxvV$2#-0=^b+QbyJ-qhd$|It=d$aeuFk^XkFDuCI*e7iH3jC&qA?&mS#<pM_0ifAW
zYB?O+>GgjX=ni=IkgkO+#PddTZECL+Rd(9362|-ekBBNvAju6JygLg3xHk~4xY@+L
z!i_IoUD8r$yXObV(;Iq>zr6ME;oa%#kK$5yHv|!$;8o-M{yHzpl+ib2{U&^u2IVj0
z5mr(X#R3tBT6JGQu@LUxZb<-t`$N=-l`p6^U$kxu9Q<N=e)+Uv_s8mbSmIK8b*@jf
z!j&T7%7*%>Pk36P^yJ#HoIcoecF$2Yg<WlO{Ju*1Pv#)H(9sKdPGIiJ2AvxA_FqrN
z4kEz8JLw0v6TY{HwITTiI2!-?<X@j6;mvr4Df@zHm2!bv5Zb(b6s91ts#<?}kDDaz
zG4IbfwEv%sIzM!vb!p!36#Lr^GrE=mga~O!*pct8?;7;g$FYlSTeUiEsesVY>nAlC
z1^5>0$8F!L&g>bVnXiF!pGwG(&;SnwmWljrMIS$Vtt)-xJHUhbEFPj{dnzxe3o2+S
zdLko)p=8b3Jz25fnuXt%@m5z~+kR};zL+PYZ8;I}l{9hmeRSvjCQEV12^jy)yxdU#
z2597H)w|WRe)Z&jb0z8RwrM|*;n8Hw_i3$b7o1-Px1YCX<SlEedy<d2ndyVGJ>9b)
zCy$oDjFD02{}A`jt@p^^o&l-&UqO~M-!(ot;Y^qS1Z3~;(&6;3+^l3SVTAXyUSEuM
zg%LAh#$b4usf$oM5tW-m@I--xX0ulUHuG5j8H1*Z*Y(<ToqrRG2V58OqV54<A_h;M
zQBo0f#q{epNWJ8&r^QJb%VV>)lpwH8W5Q840qdFF=<N@f!`);*3!6k3z*bj4<j3>S
zr!514j*OS@j&0(SpWQ}QqT4iH#+c_`i&>HjGMGOh;Xjs#)Xp}b(E|t}YzIlKS+1+i
zM&HhxZOtkeKO39MlT}PAj%&Hq=5}fAj18<i-=6r0caC%;ygt8g8cQ}|6p%7{Z5`g?
z(z|kD&b`+>&@t%&z`hCDV9M7}o&P9x{}sl?KRM63?QZ*d;4R?>Q2z05osYQDmKW66
z|HIkZ)OwNH+UDKfu^TEaeSyPSfS;dshcOi<gC)ZB=iT?k`_{<w1Kn_;nctw7-ly`<
z=jPz<7wp;zpfvs4F~hyKA!&~YmJzv^zMBdud3Cf6*Wh5S&!PJ#H#Hepk$B^U#?)cB
zdVK-N0a>Py)7FmGT{6(a!)>1FwH9oO!j=<W1*q+A7)agwR68rq?xJVFZECpANUPja
zvaQp%`mi@^puYz+(4Pj<m<1mg=TF*~Ku&n`dbkFD#fG;?rP|*Lsi+(HZCZ3#UlDkp
z$Ya#nd-9U@Pw;E>0gv+}L=5Z7F@&5=-lU%sj;`0i6<HV}h8whhi_8dciflYDI4ODP
zHga$q#J+Djx1>}wrpEkkHei({>78AS)XrNdsC|0lHpg7T*ximU<>%@cxLsbFLFW#~
zwhD$m??NRyCxccVtsR|1WB7ok0kkfZ@D^Q}jFB!sg<ZgBE#)#LUjy>L2kiyMnO<6V
z4!1xnDsbOm6;n;pK7hm4vvO!g{P*teS$L!-IW_x6u57WfX>ij^a_U1Nl29Z6NUUVz
zOwU2l^9J{@`RMii9Y4O`me`XOz#|vu*PRcDkKvK2Dqij2@I%mFAFAWtd8iXH+KD*n
zJEK|;R)>h++os8kvy(6ij!x1jEjDo`pt$%WCl#wrX(Rh>zjvRLXNdx@Y9fu{RA1@Q
zAcMRe+egBT)t)DUDeZ}1>WW9n?>1fI;-z3u&987h#%=hG)|5^ydpkWwy#=cD3*X!C
z7YpCgZj>2tyA&*quop=y1F#|wUWu3b1-S8LCo!aDnr3Lf^`&TKc}ZdLXC(iJFq%Bg
z#;T&#B`Qt{k^56lNf?aMLcY5c8Xjii$_t7VVXRz!?|c8{F<F_;E5jde15;xd^XCmJ
zWN<BM-F;fV{N%8ic{Q)0oocX&9eLb)VM$MB^txa5t8d{4=p;6;6t|o|#131%w}-g!
z=ly&Bj>o-H)F&oyp>5i_fnoT1cgL7``ai&+c*}-l{X=>uoZ6GdA^AV6d;23mxa~F~
zUHDHDh31JQb%|!5`us7-a4U}o-X>3OYU6;O7By)|)fSAd>DkQqW7S<a0A3)h7oit>
zcZ}sDnbYCeDCwv|)gWV88u&#fd7VDDTdVTqUR6@5d%(I!?UeVgB@(Cx;(hVKcL>s-
zXW;2ZJj;Pk_9PSQ@6o9|Nk#h$M;zs`TXUr{@mX;XFY`0-SVs=*v9_`r*8yE=uw&}r
zB^vkVop)Mq$KLFiZ@4#TiXJ4}9>t&X&iQ9p?a6;23BCNdow#3IeJHUt%ld6!w08wl
zkX71V60xStbb~m+g;>4uLM4vpNh>b4s{14QJ_pF!Avl<sgCk?}$m%np^@5F>ZkxFC
zfdF<_z_*&y`|o{3UvHtPy5Cb5e3NgCs`t<T+4S1)Zo+Uo;)<9@XgV<L>nqx+9JyD4
zW3lWe_H+&B0x>2ubn(5CZzOxeo^Ulx2gj;|yMW&Vw?<_(LGJ}DeN)-4srr4y-vtK%
zaC!f=nEh)>Bd@|L#(%7C|N1OswZZ@LKX0(BDfSb*_*<@&$Rv``zvW^l`RZ;TXkCYV
z{P%>IkDakB&0W2hX-oN0e=qft_#UplKtb`$PrZjdu45{HFE98$T>JL>l;cDx%o*?V
z-*P(M3unO;0Kgh8wXgD#O`<G+ukwUD{Pu$QCS-8*jME6Z5`Xd6qW<4Mh1&2o2b??o
zsJOg&&Kr@NX5^Fq@f4d<E+Wwd=g!mPk9m1~lK?a-H}QXFXIHy?C|8L~9{TM|YxV2m
zsmt6S|Ns2)CgL~%nA5bQ=+47eM@IkbS-~${@-I`a|E6nOq=zL8IJf~V{HOEO{>dPH
zRPx^|FX%M?iOsyCUE=1JY(Hl&^pDP8M_>!6D%CEleI>8qh7S6B{>kFQ45z<(7k3Gz
zb_54X|Fie!$cwJKcTe+NlzV&KU_tC#<rzP4Pk}=I`r~`oMOQTk#;+*GwH2Yud{fIY
zPd_EH(EJd^DkkL_!ySLjg_iE5%cCD8YrTY@VXcHOCS|l|-6sAJ<eHJGR%7)QH)pTo
z{PB*?{56qIV0<Bl-_k9Lt>1B7bkuwK@q==m9k(i~X`;?|gACvo$&{i32Uh8mdEXv8
zfmeRk>F&Mm1jH-T9-eV(nm+aOVyazukxy}2>jt&E3GOU?tzYPj7WV^trK|fTKieXe
zK2dWzcoWW%av7-9vS&7|)98g=xCTa>#J8V1fhWf9*Ulf>(-2sfi+wT2G0G@B<9w4e
zz0A4mf`2!!xhF7B+)x{wL@j$9qTq6czlI^RL^5>(T<RuMeYADReSj>+W6Ny2u2a0L
zX)~c3-hP2kESi|e*Mq-}a$;>vG8Yy52pLX)Y;Nq5m(rb@Pwy__)RyzjLj10#vm+0R
zq_IcbEF1QHEY+9>aP$Y7mJnWFY<$cOogpZsyW<#Ef5e;yT;bYvq4<FX)X6YEEF{X9
zLamop=8MHN!sM#N2>e0fQ=ffaEjFq<qWad0xKl0(@7z#Bm~$7x56qcmo`?m8RqD9K
zV(l0Ax9IvKTm@*M`N`})7-e_$DEoq=ypV+cRe=qXiDySo{45~l@vgJZyniYi4=%a4
z_(1i?naz>lxd7rz$vZ|l^&$Dk^XPlkq&oH_p@4~voBbS-s@KoUJ-NXY8*r>GbOnSj
z=Ach{9@T+KN7tnp!k1h+LvobKMqQRlzw3*KKBk>sgzh92cb}rL#nq_JjfzPuTvEU;
z4UDvRYC6NQ%@_oxvd4Ff-6?ERPR7m-A)OsckHGkP|K#R-^@h*R2WCQbP2DY}Y2!wO
zQ!XV3fh!Et>c&x4?sk3<-~767Nx&_F-;S|0Em)99po66M*uhhP&x>f{eZeuD_c}!0
ze;7ewl)rR->sfT^)stE%J@+4&-!D_(FXaar!TTv^!UXWgC2#v|;!#a9|E1HH7cGYn
zx25A7`OAHlW$0Lwi#gvZd{h4KCgWu8Q>Z)+ZBAxm*$0F~L)j7gmezAIIkoCa)Lifj
zTAfy;kUAj?`#r_u%U*l5k@<^PxbS`nYi&@DyK^LF{;Em#|FHK~QE`3WmM|nZ!3hM1
z;1-<V?ykY9;BLX4KnM;APJqIKy9W>M6kbT6fWp0Srz`jNzrSD0{hsc9>CtbD8l1Cb
zt-aRVbFZ@}y<$#YkX`)zVcLS&4H+*nkmj%%eTJg$yDVu$P~6=VBwxHKhqICZZkr7*
zd(<bCxGg{~+NjJ5dr!Nx{%?qZPY8@@{ZqB1c;ud1c7D@!=EXpQGy?3>LHK8zfz27A
zQABVAoTRlZRP1SxGzI4St{u$AKT{g`{-|k|UT{f_r~C*7b}75BlTcG1SKtwo@Mwmg
zjODyVmwI<WY6%mFD@-fl=u+uw7bpP-1vYD^c$pZQ<ncZ8Hoojq^S?8LugnqvZMMM3
zfb?_ZJdil5HM6gB)DlK_nF7eKJ>!0yCO7MnouxW7cv|g&Z??nAMe;QbG1Y79ul0r!
znR|k=OCWzz9zrc+E%)CbJTCbYn(`EbG+ivkGY3%|bW&9au8cMd(Y@lI#p0y9luUsP
zhjVCklpRyFj#PK5hVJSbNO*<w$mUX}=?jBX({yGCWCl4Eyo2-WCzC(?W8X$HH?|8(
zcsB?60tYrQXRn@FDi;SMmmR(sB*Br}A^9_QWeRcRlEW9f>>%Ix`9Cv<hj6*VU7Gw=
z6wR)QC(GDsbt>z(N(;k&4Kbm|Hyw_Y_klk!LAah(c$fU>D8v+s-f>qKy|dR3_;H2R
z-%_?CTMsN#kPZC-<~ujuy9rcpolV~!zTb_fu)d!>nbh~AjE-fh)PCnj&J+g|XA|$a
zU!iEKJC6IYvUnfY-TBVx7e`Xa>Tb8B)8Rcz<t?M)kDaJ#3Ze6ay4yAJ+?Eq7_$ug`
zU@)+P_Z8|$^>E<9ppJL7+_&P%(DMcGL!|G?GV2{lWBbiehT^yD$mKM2FJXf{o@t=p
zw}FvS-c-}K6v8so?Fy#ApDia2hkwo~TI+GX&#uxPHpBZ^2)CH9f2Z2FYxEf*vsEj4
zl69)1z^wWscCR~8zk++acVg;JkoV}($u;laFs@*o<@U|5L-J}brY~~<$nZ`qje@kq
zADotyK^kAU6rq6_Qv0lc;-+V#ivetH`z+caEhWjJw887svsuZf_7*ur4+gbq{U?_3
zmSVK%tdcg#5CF<`n&-^|{3_|^t3^HMgPoNo7ldTG6*)5n-g+s!J<j!;F=vqV4H9mj
z)yTd2_py`ZqeX?g*ew{^-jLXmyZR*1KeVA$0gqvG_|3{9s$FG3`0?1w%`H2G=o%TD
zGcmZV<y#6-2}(GgMH3Kh{r&k2$qZKuiR<x9BztehgHe5w19W5!hl&LNV6o^u(@_iK
zZX5-T`oOR<^rA>@diDHm$6gYA8;2oPXL)Ar6n|VCiQhw=OWH5OD+F4d7C?~IDSGeq
zWnzpLz7`W1p>0`(9`N!ebU$bWsweL~nt@g7+sL7j`iARv1^-*$yE%G<=n@&14a&cQ
zoA14n**Xp6ZRCjb73I*;nR>o>Tp%1}MC`>;y-kq!W)-#5+Zq;t`>-uY%&ovWdRid+
zmSf8B7NeuoAKpPiNtF-4Jf3qj_#=w)R|sL!b1w_sus<-Q$FtxfZEB`m9g+ym;LKDX
zFRmaJ3QJJNO_PcH1ycAYSMXxVT*zNd8~7wnO!HW#2%?#v`6RC{NG`c2Ol6431_e@N
zMvi8cP1H1tsPRS8pbfFqzXx*FkFX&%DeoT7RI^QI_=Q}J6~G6rA@lldnr^Sp4ekz8
z*}mUINM2HNsLlWbzg!(!z#B=Fw~~6nkk752P|Zn^fqJ31WKlKG1&expzN%OU^+
zv*r&iHiwZA9=~sOcRGS%EVnYqKw*1Bh<`1dA*T8Ph@6W*@JBm)`=eE8RLun-E90Vk
z^y7U_=kL0ao;8)AZDsh%fHxHFlk9OOaFnvM$0nlW_89P%Rdpsq!0oXjIQ%VX79=5@
z9+<N*=#!3%;LEM({Q$&?V?8qeI=#adyQ{LI?r_W{G;=+}!SRCTaq1CHo)p5jVDP`h
zDO6k@(EhrH+NwMr11=I0(jiBjyPxIf(>v1S6ze?|U@njQk02|&9$T>1EFkND!_72g
z2=xC`HX)VbM^O6z@G2a?@qTfHd;UNF$qsE1f$5(=C{F*U0r~%1I@I4Lil{g}ID}cZ
zBqWLG>u%F*!xMlpRG+-MFL-BE-X=<>m{ZGn@XceM$>k9#<X63<#?LQyG1C*vgfO(i
zrDZr+M34k(*|#e%xZrwM!T0V67;EDb40_WH2H#(jz_HcYgwYqfxRT}$WfCRt)RJb^
zmMc=(Z{K)1@U7XIq!P2;;SgB9?wr!BOHx8@p&C*#ex(~KLd$_6<eT$;$*7{39U+vB
z-p(XHQeM%J811fdMMbmK0?8ohr)SjJygJ%FR{ko#B#@wp4~}<ehqU3`!AD16A}>h$
z3%cU{3$rd?+6l63_Idb$7d0c3*2cW<h_H>KVH(<f_KK4FlCD{oV&N4*G<CZsk0hnW
zQcxUcWzsB!RwWZjR&JqOvnsnV7oO{1Y9P7NJK^KjBwdGua-L{Zbs!?itTdX8tG5y$
z`9}QoG^>f<GRc3!{zl+A&*aqk!c1hr<~4~~m+Ew<l<z%X;uUge1(pPlvLJZHtP-4W
zWT`<YV|O&bW1T#o%jCtSZ09Z0fu(fW9MwujR~duk>*6{wbrW)ShofNcf@ccR3%ASu
zbmV#VOKH}CcF2!#ledmi+`dv9q_t)}{-Fub(E?7AS#bFVAGX>6di2A9DeJy*LHkul
zclyDBX1Zt8vL3U`<CkA;Y76CvldRkm-UqYwl$9d%)U+yO@4@xQd?>|9)5O}dEBQLX
zHZ@>-lea_ulQ<{`^NlA2%^%R~x^HEYk2UPuQNQ$M^MwY~s5M55L0l@PN3=0e{9AwC
zB5qYOZ(QBQxlNno4KafuG)fMA*}D0w3>vtN)mp25SfL>$f$(pEg)db8RSU58Fwj{s
zD-_)E0Rsc0s)&7c-!TyY2+MDfD}AE$^|rvRAM$3Eexye-4*u~MpEjYq(<^J{1JbQ1
z5~W?&`fc?%t4sfl0bf4|)X7|~jUsqVJaZX9AKrofwVMqJ;s`i0Y7Tf#=)8|;zC8cS
zJpVIB{G=g742ECr$>nME68=HG5ma$H+tM*0(Ym+Xmwqq)8(p$G@Z<jbpd=p<Wl)-|
zoRH{5gf8hQAkS;V<y?x;l|xVww<ZHAw&MabS#4E$>fpNzq<!f<Qv&DR5SWR=H}Yb`
zU}MXQq4QQX6y{bJho#UVC!vH*Wxi4Uq$?M^PH(ypwXn==n^t6y?Vo&f|0D*$;lX~*
z+|jP?U>&^&c@`zINxS*P+xyvcsBNYD^8Nbmt2rJHJPrz)<xYjIi?G<h%kls#bE@kj
z%Zg&BCE4J#`V=oc0YTZ@PHmN^amX5i70^6!Ph{=UHfwz5`CH)#8;34~KHZK~7SGvk
zL9EC%n{1nOQ;RQ;hVPdt#!f}<t{jhpw}k|ZZ=TD8!}Dioi~x|ve<DtB^<bhH&eX;W
z0>~Ya1xR!dvzvEtKd>p0x$5|y^s*PSz4e{>n03kVHFI{UTha48WMlK@1&)Wsc66lj
z-r9ci{!5N;hr;N-YO4<1kIP--d5f<pF$M+aIbdqwE^uoMJr}LI`1lPtI0M#k57)3l
zwm(84W<1WTJxL>h=iBEXOSpvigdBbT#ab9!5HwT7G=kPQ<L&6eL@Ux9I)mwh6FeDz
zh_gJwnCZG>-`>%?@e=^~yoX8|LXI2@LAsxAI14~xy$_fius`+@9m#f#1Qp|AO4ohC
z{b}_Kj7b;<KrMjj$A`4wsPEd>ux#`1T5Y}IreSZ<qKb)00{4xOG*zZtv;^J14g;R~
zI+xh$rziHdNIa``W=-B@Wm|1V4QAZ-SJx6I-){K;Yrs2UFxrNMCx&`j<%heWSBmpn
z_RP7TAc_pZ2k(7<8g0msYEPJTi2qs&+RRIfBEr$U@c%+<RrGvy!qrw2L7XUv?Jd1}
z?ED<NsL;zS=o&J`AwJks4?3Tei@4u#Gn5@l>9^UXx_6>v(*jn;&W!ubrfrvqJqgf)
zM5O(mL}YDT`8dsF1cb(KFuSCuPNWjGeQ(Sdx?rpqtAS~7&7aHGm_9Bjz&~S(7_iyb
zBT2ceQ2Ld=Dwd+i>t>le-T&LpIRk;@kla$y!`4oE;32wNEXS!shvG!j`#mC(<g(<e
zY|FzINUry~=SCs2T;~J%91qTv?-5_<%7pt6g_ze_>Pc7NRm$?6Z_vMyW?zj6@y5qB
z12@e^1{2)<yr}M#pty10)X4KX9z<9PPz5!BmJII*=(q}dBbcegq<AlL9`RMOzOhji
zG+ZNnL0n&WlcIgCxzv_M4I?(D3$gtaeO7w^Ch&qQ{@o!|!YZ_On(h(y5{?*$KEuMu
z+0;3S=#l0yXmP}QHrETtvsZfaMmFOFBwLi~0=`U7n}yOb7#QY%dVVBd3jSiTWmjg_
z9%G>7{WVt$UWH(NF*TTOvIr7Q!mOYx*c@g$+OlC*rQF$Fpu9>MKu`Q)2@xA%RL56U
z59U=%)>!iAfKB6t3NU2k6yN=RDdX<mmYmUW-!rAjQmm#KbVGw(NxtFg8*OiwxH_vS
z6a%Vpxuc9vE`AfG^d*HQWd^6N;e<0{z*=8T&ZRGE2j?3b1r2HT+sL;gIDbH6-O}V)
z#t%cSj4LUjk?@9DY}B+nb|@)xmIeci>8s!UJu>)s{3=0(^5eTI{M3=YQRUd|pvW+u
zQSu5kE|L^fvqbzt>EbAxRP+t*^-ZL?d#aBclV)lY6*a{ocRUPl17R}S+Q|tN77rIB
z@hWUJUEJNaa9aNX;j!Olk*BFAZ-kdXGsD&~b|i{GM=fh@><{8q>Hs2I&$nC=IbR5K
zPbq!qrwmIF##{u|_QGYys*ldnmP+9Y@rQD{`)i(0cduqf{RIVE#>}<o%O{d1#HF}>
zaTo`#>yl$}lkqohk>#iYV>^`Z1qHZ!9gSP?SWU&G>Rdx|h158;%r@5}NW{Hx2Cc6@
z{S5H$s*rC@`a(bu=6UcLb@9C*==*mPh7FcSSlC4x(A?RPV^D58BU<ST6q<}p314r4
zUs5tJrE&37%ad_(7_yA1$w_Dy+tNmQaR_wE+tRWc$nqnn*J)$3#CkWrL&+&JtZ7K-
zOU7n0{L^a}Q3bWK%}RNt-h4^EI<mOvLr>MiK0V!iyLIYUuAhvn_D)0WiazwG46m5l
z?XKnsqngCWdq_>~e-dl_e+$_BuY==xKFy4ON%cqLvn)sd%!vaVMR5U;Q6nE5Rt#yw
zj}S+|oghCbcXoqfC8Pk2XIhyUeIvbn^~ScMLj8FOIrNNa6WO;-^<>3+uewt6VrFGn
zy|s5U4rQ<HB=LQw@gL*DDTnC^>5ci<aFhwPKj)UMRump(YBdqE0Tc(pUJ%<mH2A-F
zs8VRV0CPyK7Ax@Fk&$^od?S?gz$I7v9%qik^~uMtWUG~%+~Tgvt2K?fMsBx>4j%c<
zP_`U`)s}p6Vk$kG?%1O%)Pp@!);hTAva=IVM(!`btBfv367!79OZnGV!j{WMZAs}K
zs7!`oJMFd(G0F#NG?g=Z(ptd@Z%HgXcSI_kvz=45Jp4T%0WopqX_1C}{P#y>ZgrKc
zlVyp`2ZX+ACr92TCY^m-cPN_GZZRYZfB;G044^~ldi?$HJLC1aJ5nQqw3;@>(kp&h
z<Mqn{-KKg<LFUCqEJK&<Eu|w;*=9A`14Io==Dn@vp?xq@U2^->&-;mGUJk(ap9z$M
zjw?`l)J1>N`~8b|#<_|^{R0f+rb%91)XpXhkO%kqCGhUWKJQs_`&_>FabW)2o$({x
z`RRFT1x<|$!=HMf@8I`Le$0=To~_k6TMzRrjiyN-whbb6JcKW*O$*QGGY=grK2<GR
z*F)$J9r0Yc)RqK5{vCY$cNEc_g<Xw`_j6$V<zGq#^}Sy69eAv5Uf)HOnT1r*T0Km~
zl|9y9PPkN0s~)ta7hWumWX5&3&W-9jIZo|2*jbz8Z(D`NQPs|wZUwZeiYgChs@|nl
zOAsbq`@Ih%6qH?}@3>&_wTNk7&g*JYx;`YEJPJCCPzFfAC$P;UDmS^t7sf&?blGhn
z>m8$Y#CB9M-tQb2*C732^JjT^LqU4nC2Q7^<237;WNy3q>Ov7bWkx^cK24t!4-GWK
zw*l`JAH$I)5NjHC)|{}6t8`o(L=;<-*xsF3T#kgV0Kfi)u2_|>mQnlNX^6X7&lYt9
zw<f@Uoqjw%nBFWk!&Q=~0<Upg4t9KzYyOFK;YU<VFP;!SI-2h{p~OQsn+D(v@9@G6
z9bBpsM9HLhXRPwhQxN7DH<clFWGK@`tQC(DKSB7=tiBac^SWjZ9U(F?tE2eG*wM*1
zn@OjEoHy@dF(v{6lbD7V4OjzB_Y}Mv_AN(N?+vs3%rckezBuzRpkrdI_Xc0<O!#Nk
z1a_>lSOlsCS!GV`B{vXHI>=Iy@kF~bq_vU<wyWVvxz9ZQywJAbVOD>N<MYe@=FIw5
zS-Pkkxhn|JtwUeD81{h_mtL^`UWh!PL{S`TcUC+o5VIh3YBAYAr*umF<x-=%DXO@G
zB)1&5=c|yQBhhG6C-8QDo4OsH#g<G=0bnT)jBW=uA4m-4VMT_B7)%}#RWv(`iHn`F
zl6^6gEGJ5`=^h-2-N;G;EEaL6e!qM%Juz7AKfPiJv&{^{;d_&#S_)7est@Ua3Idjl
zT>VPdq_Cy~G_=oAg|lu8Rhz2lerDrFCsa2^-M*90sn8}F-Q|<M6YbiO@Xnc57#Qhb
zKlZoFWOdPXsZeDk{uFfbJ<C0F@>tpwDORnRJm6RNA#gR0Cn<p-&AsT_L*TN9|I%TY
z@fh;6ZK~%xf3fKwhNG_+OEkXXmd`rfX*gSK@Jh|%++`eUrgY;&*sM#{%%5mGFm3b$
z6kNaWlY`pXD(hLMzVx`sL$DV^t@H^y%h~lIgQ6->gF*Qv_1c)$1UR1f&?9d#tOBh>
zt%!|s&vnfsDZ6o#09*<0?_uY=YcV5>10|_02duqgJS61v2qAL+PZ3k4Vn%7B2+8@O
zE|45|Thi5bFZahq_p;%Q@7R+1f6mxyMCNTy0#&boj<rv~=FX>fY${>j(QAm<{fN_}
zi1z?|4@T;@^*t-Rm!c?Dd*c(f1Ub=I?C*Nik9D8i9pAiG8M{t5RAY|Y%z0DKgb|wi
zanCj)>h{SdJCv6<@zcbnc|u3=QnEZMA0Y~-$+$1PzM?N;0(s-pL{Hv3ZDmm<+C|2J
zS8q$=x5lUrIfb%==9v|w=}RpSgKPR19F>w^*;nCr2gV@ZGn(;|kZ*^Zbr0zb{CRqp
zQmbC|J~VWCD0fwk!35%Q9jzpogvlHuFRa6e(x?&r(T<$ba3X`OLy$ifBlXyhhd&Kp
zt{>i%RJ!=2nR%zY@#E%>%APZ~(27k^a71Y#W%uPB2?r{9dZ<PHw}9Q92HK>+t1L%N
z(qS}bSW?WHty{;u^YQb~Hr3?KSkU#Nd;a71M!PDEgp;t^=`}J51f{vy3{lad>kDTC
z9y1+{>@j_z`JzoBhQPiBkC8JNzr{>ozK@1GL1AbC4<c$21`F&XfCx#0bX?G<wSPBa
z0DhNnV!i9n-r2KCkjhVvP$p8|!Ky#Y3~JG2Aio&s>2IlS9pwCj4kdWahhq~6K7DR`
zN#FveVUmpFdXEZet*1@Hq|V4t*P38M5fOd)AQp_Z&4AxiiwW^mIY&Gl(`{}~FvX2u
zA3t$f9ZOu-r#FdN{6AvupPVDse<64yJh?)GNBEyPdesy)zO^!shAOGpL4$=ksI#re
z;_WXkO%ah0Hhm>5So6T|6ok9so*EjG9}<U2|6_S-=lfS)!aF+!3m)qu;|x<94HsX{
zni;H{_1Cr@VK{ME3++1jwRkfw!DlBqTO=OKG)8I44c>o>l&Dg#q^dei@^oW{@51i(
ztxk`vf6vKDGLlue`QlBMMmPdgcUSXQ5vC6==bc(R_l{GpUy>SekrBJhOa16=WPqP?
zmw1Zt@?-w;>Qr$MW&pY1@kAW4(*H0*H@Hs9;VEoW^k+1`w~yNPl5*<^`oU~mD$N=m
zbd^^d7uo1;b$4_+u#k}C6?)L<<^IgL;&c`y;m&G))Nv-iZ`DR{nlu+`O?o<$He)(8
zVfeyVLi};_k{Oh{91CwQVDub+&v|w!s^v;D<4R2ih_zpk(G$Lud+Y}W*XM^cP`KWX
zV`Sfr*HRTjFc;eOi-L~kZ5XDBC4s2C{1)>Nj-*0F!Z7fAqtgk7**0Wfv68eP(&;i;
zN0WCeX9gT6s2=-xWILnH#*m*J`S~GgwASZZQFo1-0|&m?J5713kGnS?*h52ee2LO8
zUy_ocf2QY4eo^&&-uj=D@ox{1{wI11UT}hT^D?A1s&tuDAU-KaR3Y)}?7wIHztq_U
zjigm_En}yw0KfFAgQkZOuAI!wXsqZHEAD`$k)CsGHSkt;qflCdaC7`fJ(esUq$&FN
zqOgNek)s3V`}6I}?FRSo2$vjX0AA>~iY6u8G=<c74bgK<BuRy)4nK>VT_}VaxG249
zr{$5EL+*9UwB?^e4K&UT9(OTH6?5Dmes+jdwrunqx<`l=JqM!=+MqVhIUs5X!U%_-
zw2{nIOuMe4V2K^qS}oV;J>M1U#z%9Mv7cM0NoX^KP$aspC}pdgSkEU{%F(1oP>+iP
zKxt^@&@h{R&U}yESA_*Q1j+J4&DC0jxE}`J_&<>Kx!)^Px4V!2vQU2%f}P>rW$NWZ
zI=1xpoN>pG%`U!IW6fuIQA}e`UEt5%=P&}7<yV*Tb{lDwBCW5vVAJreK-16xS<V}t
zHyM^2{g7~KIu|Et@ker=#A95yDBE=e0*;`Fr^Xn<%B48>Z>3#l*L&4DU2dy*I&vg|
zXL?a`#lvl2kct?;wbb74uw-=+MC9fGVDd5l`ld)y*TeOYkZSZ5wk7lW0-FWsSv6E3
zZ<m*O8e0|@|HEL$&4>%ul`DK1k$7)TRTGRK74J4O3&nw$$@b3mzTgf234(ho)wJ;Q
zQe4-`)GYT**A<Q;98NbMgte)bLv_@<Wa@m<^OuX@$VtrB_sjJ62qvoF*nDuymSER~
zETP(%rr;A&A+3!9M;1jpzmW9SDkC%1J^o%e*g2M+A}I3v3`gRTF_{P^QUBl|d1ju+
z4KM9=mxSMz$&g=;@;iV`L_&7^*1bzk;1;~>7xMS)<tN8iCYnWej;TMCJ!eB}jns4%
zu}Ozr2|o5P+TNu8Fgi?BbUEk87*0vwFP67Vu#sAj{r2=EdVZvBz}M+j0-2=qFyeyr
z)8cB?v0%%9_D``s>j-AP7pZOqwIqp!y?yomV&$dE(&Zkzys$A0NkT|~(+T=UOCZwL
zejIpXv4buq0`67LBpjshuOGyecsnjK#tm9$km2W}T<UcJv8Tn2xL^9TW5O<Xi$Mv9
zVRF|UgHdVN&tt!aSE0$N33zmcAlIwuQ?xQ@@)eNo$D(w)4oY>3ykcqsE(baF%^r_A
zE`Ge4hS=E<$aoOW0PLOG<?mUAE*Q?3u*62m6_t?tC%amjpDtYv6pX+a(d1o#-bilT
zK3-WX(x)>#N$SypjS6;Kl=(TD5z{LSYzbN|JGI?B8<!kC=p(O3BGd`St8kO6tKB3y
z^^38^6h!LJc%Tf*@`^;*xH<!3Gq=iZ>}XOwQ_N+>K-!W&BDAkq3Ut&(l{vZ5yK@8S
zV@Jb87dIqB_Zf2-Q8cBv;fKOo!0w#1V{S>iWcXYbruaK|eyGA_C(muTzL6go(2k+I
zKcTINg_JZV#2HlKgN|tgT}L|Rzx_Wl<DY5$f1EA;)wH3c8}y{|dCJW2gy^o!U8Co=
zO$kJ1(n*BUDMUvoQlzpOm1i`5s2haB`X>7N{P_9JMQibRgxk0%6OQ8E$ok9!7tsw&
ztF0F273#3}-!kLTNq$3cg9HNRR&-5lKCIr$^O4yiQX}K*cuRMIf=dE)AN`}o0!|*!
zwv_)@$wGeWjQ>#n(NTgMQRDw8{z&s>A;j{<e<TfAi#UuV@xM0o-z6r^U!GNZBK*6g
z<e%5^aVcc|CL`;Ha!?*nKc9nsJ=k^RR=5A?fe?7GzPIK5XaFj+0IBEOo*l;Go}3z^
z78~}B#ck+d#DU0Bn?Ljc$@)v#Dp^j(5}@-+KGu&W!QeS%ff#Lpl!vxdf(_#De1Yi2
z0tq`pVDZtoj~wGwLd}Qf*mKiv<^A4|@CxM~U``)1dyp*`X6>;rrCMW12j}PeiI%mN
zxFF~T-Lde|cho8iiyb-0`JWf0TPiO>O=5j(;R^!ot??OT-OS@q18XRc_MmC{?{e^I
zh4&H{akXiGmPmvVd&j-Y62ttaY}VNepi}S=AFg@!Cko?^SivBoYE!3x?UrA1SQdx8
z$5^-6pr+a#x!M@PcJEIN&AGp@ovrZvgUE}oR-J&jT?!)Q@myX-Q}P~%TcB4JrK54R
zOTtcBJL`+{BX5p77GmPACixtw>>s2m_(1M%`>$giT-G!0E<vgk(0z1~ju$}i%~-7~
zQSqrGCTP;-I%!Q`pwM?H9Kr+Q=<67&KM)8f`9t<83~04MSfu%94oX*?HTa`r-quCr
zx~Xf=&@-!$e11$<ME;DFEb4_1f>U;&_hm#_-l@r>D9smrb;IShjLZ!E1I~XLxL1qT
z>B~L5f<Wx)W!A_IsXX03)kUfP5OiIXdapcHc0Hb%Jo-~DVT4p3tx(OYW771sQiBEm
zyWjXcw}0eO5<!>XU5>}Uo)>ugPk&nQ5o<sXspC$?-7sZf{g<?9A8LUdThI4_@~!=I
z9)Z^+7U=(YhQE`-|M<D(sRPI9uiyST04iR5Y8M&|)ke)xS{p(zG$IGRv@u6NC}c(W
zWtePuh1*#4k_i~0o<lEm9r&UV{q%Nf$xt!iqM!uEgkV}3FDMo-szu$fMQ!LlSk_hE
zbxX0ImH<s9{vcNFDWWKPkKpz5b$@N7%5%u%0~fD_*_=0(vWFY>Qgn3Bb&N$;6}wi{
zRo?rfL2Q0J0};6zS5jW8#);w*Pm%bImK$ZgsQ6%tV5`_wi7b#+GoIdK`y1g=^XE5D
z*4Yo1A<FUu`krxNT;auuq!ftAA=PY4?V2#A7O=%4!Ne!d`S`d|z4J8Y1(fGsx)$C#
zP{Wli2j5Hwl!fMn>(AXZJ^o(MgCjXN<Jc)K`|9>k9$|m!JWkkPaJIztY1zp<M`yg-
z6VuhhLKz4(j2XLsd69JF^aK0gXd$!p5viQLaA7ycnq838`QRQ$*4j7i^(h%}*-1|I
z*6nD2%&akHvRsG%0~b*3a%wx_EagG=-FzB?g3qY|R_p5o1lN%;j))mL_?;aA!$cyw
z$jc0dz^ww8N4=&6*|(&dYzcZJ?>JJ7V^?22IXN-GsIO9x4|!&0rz1+6#Ud&8BpS<P
z7`mi6gA+B}G4s}Yk@#1dS*+Uv;tY?MZB6V>%bOQw#Hm&Tknhd0cg(RvhNN-u{&ZsT
zrL>z4eJY^{M1H)qa3I_o-!Mpe3Vi*e6)Ty{E60Lm%s7_QQ%t3C?KvZuW6{25>tlxH
zJc?&vRS<LuBfJ!$z8lgLeIymt=fYy4mR0=F%kjAsa^)=)QGRh;iXSaZW&P_UX6N@|
z3g-utLK{8Lt#kWs`R)af)Sb6RfTTw1Xc|$JX;4^j!1nzYa?B9;B{;-FUx%sgc%7*i
znDm$v(?$Bb(x0$tz;rI%!?Oq+hc~>2Y~jUBA2akD%V*2Q<s71*WG0(X_jB@jhjd_1
z;g4R(CPk5pn_0z0z>5wjTC_I3(4A*!I8TWe^79&XEhD9TTtDYTc5%=K($&R9L?@NF
ziV+U6-ElPm8o)*8-IGA~g94g1`1t19OT1a3XvX+f+{A@wAe8M+EFAcRro>ct){;af
z-{ndO-Ee4rbOv9s97zkoDlX5_F5^`JXav!MbVlsF&zA?QeTS0UM+v!}jsV@xgK4hg
z;9HJ2hMn8n@Jkh%Bf!{?7)k<7TJQKg<BxWpfjhPhtm2yW?MpC(@XCa$<XUSEi?%ek
zWhI1wv#t8Wk^|mb!fo~h4vvM`S<%=WAx>jJmi@{3E<0)_YLWfFY5}G$eDTws$#cZQ
zHw=+bP$NZQbV?%8qGFOqaqnxYOxw!u4}0hTl+TF$Ogei1LCoj}Hm5uM!;HeAwt#c%
zr<&f$UV20Rxf8E+Z<ER&8?$>hXimZ8^goOnD{r!&zla9DCO=PwV9AjCdF$Gd&B7x7
z7^NC0U)Bw5=~=uYuc&Bft9p5vn1Htf_$c4hjRhw1H%lCZJTSB|f(T5+9=fvk)<-&b
zN<3Uln0MwzM1r197Q};pB&#EeJpDRJ%W#0i`}K<P%Bipa95}_-HCNU}StZ@FNVjxt
z(Yn3?BBuqjSnFYJd|76~dJs|*P-gOu-R*lnIrbKqj~l#g5Gvffp30i8R|OY6(`{Eu
ziIX~SJM`mMubVU6lNKuKDLpA-nm~5T&gqcU;!99iG9QG>zTGp5%bHIp*E=cB6^rWf
z2A6@xziZE`1yR|oUsVS?lT}PLcF5&h1vc5l5J;Yi;Im(q9Y{DgB@92{HB^EsSg4G%
z@*Xd{patjPNI@c9oOhP~w-XjQ(lJG_hJnxDM-pnE<2c+j%5ev_Y=0(nxd=x#+ntQK
z7~Jx_*?O-bEX7279zfIld!kisC7i1l3i@<a=5`}oW&+Z3SxqQLMq_VRwA@G5Y5Q8;
zmO1nkc!P@ayNNmQ=s+J_6(k^hEKL}_$a@VY5z#Td9jkqPMu4c+^{oX8iLp^#Lh)?%
zo;|is_tq`NwK*K2UcG|!U=~)n)UcNgc7?kgA(?va!L7;2wuIu*>OFso42I}8d^oM;
zq|CKSlH}aTW%Ise=jrK>iKr3IkyT{hu@(7UyZGGu9AQFWW2-5lr?Yx0FL=Kz{&6px
z{5xv&&ZnEW=Mmt8ui>MHM(;yfALW@5_IjZq+1428T_Y8==`}2wbh(J=mWsg!t&9bO
z5;7(nP~dZ}XdPi#(kv#^u_Z}ZtY&%V|HUTwMcLQ+KQf1~LD;2-nL0=-@ODt20Lrv_
zw|w24Z%+xcPFm^Gk-%gdJd|wQh2B8;*Y&+q0V5tHj{X5r5(^FwlR}pBVa5V26fV2<
z-;5vS()SOZvD`Naoqb`ckjn`2rayq9Qjk91R}&64ldb(sO~0lC)<E=Stznb6y4hQf
z+I;Eq5ytqyFnKw(n)=pA)O5;+{?(m~#dc!8DUXg^Xx;rGSdbG&a2FQWIUcNbv}1S@
z=Cin(Vt&_{<_dMk<x`w^2Ln~9euEjmB3>;M3b2E12X5cx>-dSsXRLm<*G1oo$IrOM
zL+HNu!e4E~1Sqq5(SWAYo?na^T|>cb9yZw%PIqN69r_cCedF}f`daVX?Mf%#8inhL
zr84quzxs3s3y(XqfL^MX!LD2G)$KAzM^qnUfqgTh%@5Ah8<W!joAWR$UdUZrst%@@
zlAw5obC|@WiQTBZUv#}qPM`Sc8zqspec>?iJN$b_(6qxGSBN5U(8YI%d8a#C0NCYZ
z5c2d+ogMfwr%hpM&SyB<?YR3Hs5NueV!ixpOLl)%{j^7DxR7-Dof?d!ZK_(Z+P~fS
zlTUe*UssDg8mpj;+kp!LZbrc8p)qA9=hu8h^tY^Cq4R%chl1O$gywg5BkE3$zq(GF
z?`R{xMMw4Y1@FnmopaDLflyr!a~}K92AvF1Jou?f0u-i#T&-2D&Ww|n=AW?`>snX1
zt2@Jk?g}%H6FPc{U5Bsc78|;xTq(AX<Dv(KGV$)N=g<a7VZQIu`Qk>mAl=a?QXT8S
zaJ_QZwYe^9@UEg^YQ&O_ra;h5y$VhRFxh=~^Lg;|9<<>d1pENqmd`j44M}u1Vb*TD
zU~k`kd!uq!jQKDFV-6?_t@{&^!7cj4;No6*Z01uwc;)pg+x}#bb9xz;Ca+-p04n)_
zjGQ*PD%wXGOs=lW3Cflie<Hd%95aCgdq)hOTF5R%e1JT+oEc{AVVXHYvCqz<<y(7m
zgI!W|2Jf7zi3h{j&R+Qq+DLQ+b-1Q&mjVRigErclAFDWar#tGEo;xpn$J09q<>o}{
z9xo39k#MXr@6#&+$b-R`ow>zeTN3epckx?gLj={&%H2rv{@&c@r`4+(Rq&%WlBf0l
zYJ70+aSyAiuy5W8dcb%#!ml~}R5wax!33hqBwzPfKzRzug`)>?)vQS;@5vwT9L?1W
zh#BvZTrCS6V#qT+6Gc00oSZNQtRZ{`Iv@7SvOw;|t~NI#eqGNq{@bqUL_3&iSWpy4
zje_@eB}bTxHV+V$3FG9?k5fW%>js>TTk2o@pGVNv`@v*nfux2atLbB@c652!=IH%Y
z5~XAK<RTI^4hLmj4yvXyB;uIwV?8VNsr(Kk5BnGh+v2;pm2zE4$Mr9DTVJ{zCp6`=
zu-IH6PwXLQN#N7gzGj_P%oUQ;5QT?h!N}EHIsu|?iNw>x(cf<r*?&vdVo!HcL-)#&
z-e-Pkbtw5UIA=jgG>NAnKnQ+=@!k)!I?ZJD!v`E9!$&+Xd7~PU2$y#z(z=9R^148^
zvi7~OG-V8nuQ7o!BqWB~Uw@x|=n?i>Gvr5lF{=*T%<fwA&9xVk3T#7j5pUmbb8d{}
z3&IYv4@tU0Uc|H8$b(CHfVX|Ds$kD}_Hrizp~c7=N2F$f(c?`Cv{7>}i$n5Za}QQk
z-j_7%n02hvD#6slI7(0n+C@qo5ijN%X-%|^b-&!Z-Y+OreURyx{99M2q|FE2E7M1e
z`y+0}oDqc;tECHpvXB&v0%v=vaTzyOlq|hk3(?tnoR5WHHR41*_ZYnB$1JIR)xAm~
zbL?-fJ9Zg>!E-^n<SJ}$d(?__rB7+I?zaO5&n+s<H=0uEw_FNs4-FZ9@izKgBi+=c
z2Ii+Y`nqDXi*Xq}@JHeF3eXw;akkJ&IeWWP=tq;7&07PMn-`Q@L)4dJy*K#d?^84z
z5nhi+98`{lcPLg27r8sc`FK6j<B?;sN%vRSUmSG?g;jGxBkV)%mkUa|(MCRPW!<9V
z7y43m=HgwG4xed1Q4jJsy%DDB#o}ks6$m%LpB2=)r$Gl>O}lda$n^2O!R^#xjV@cC
zcPJ?o_y7~>Tz!QIw#wAr-m8l`_UY3}I3db5nuu*md^sx=pKC}j)dKbDS4ObSc(uOb
zKRM0Z?neO~QCF{D@ZnxTF<Ewy!3WC+&+NS2WLyP=Hd|?D$J0=FW&^#;Hq5@Z=3R0c
z9^Q=`TO7W!KAs<L_3q1b6hiU*;Rm*P(O**5$IZVO6eh8I_YBxlN=$6bT>DL1KGM8v
z<;Co(;q(AdZJ{jG!DMU}jpYb>wAL2dv^l>R6{*}r94dEv=YjQcbrO5O_KiN#D_-YW
zCt>%DUaO?z+>w&?vm^R6x4G~fC2k)>T%Ja(j&sF;ElqPMU~tApvqocvM5@hu-N9Ze
zL5}Mkz&cP(x240ihi2KPqV-8t-dO3}8y=}|P3k%>b-Nl(+1crTae}3NmZfcn=XICM
zsvoeL_`R0VLw-Q$SL~+Ya7Uo?$$3AOC_n{(Hp5ksWb4dJ9gn<)EH3CTEO*9fxcL>`
zpJ0dN*8>X<^EXp5@PsV>*!4-z^3|!5U|YlQwT*HS^Q^)Z$nNu#F9^4g9S5r1Sdn}#
zg43oN?)Z6k3_qKeM`WYbgXLXDbnmj&RuOtg38^!+WXC17S3AyJ8E(cCr3R8LO^qGF
z)IHU2(=n(4|463GdaWA+LRp#ovP8CH_2;ICipV@xV?TOH<{dYAp6S;K5({}24skm7
zcR&OI7sirOVf5$ObR?Ypw0M|vM~_dE>_MtfWoE25bE6mO`wY<^{^Pn}#Otzw9v$(|
zIc4s)`}mbRfQ`JKHzXj26W@|4Bo1r}8Rh1@J5p3RzdOQS;?5purx5yFVY0%xIS9L*
zY`NYJ%f{@LrvqVs)1pP1PrkLLmx!7k#gp;n?~dE_x{XXYs);O?ogBko^|09ZlB$6F
z<_$GGYyzz`1NXMDz^RhZ0jvB)(>W9al#&y7DGAQ~#64cHXC}Wm1%qCE&s7ACU$6es
zB-enNg&J_MZM<-r>O!H>$m660FSe>nkwBS8jGRm;!r@C2_GmF$<BtK$MiTd31jAAO
z+^J)qrv9S$UEPmKlhrKAgyEs$zdfFvE4G@(&P7})PzE~t%?{o^(xi~E{|48I&PGsv
znO8*+@WUI;8d&CzkdOM%zoEb^ucJLl7=GT0syv|@1^^M;ZO$!GN!FGlrM~0kFUgfF
zK#GV!cY?@0PFr0xL*z1a$kYp*kfP<O_a__c<v9Ncm>Lfj2<*tuaSM3+OrbFz%89dV
z=aw9}t9&T;iB*+9ZK9MIuL@Ea{^j+;;qI(V<S0%j$9t6P&@V4d_fvu7QiQ|LWzs<O
z`YLlW#KTH|PHj}Z6!kcu_28Iq{4Pe-j;8Kf_|-V!SL}6=X|VoMBfB4_3rs@NO1*&#
zZ_S83dkmXA1endbMQ(#ptk|o(CliJdL0oKiP3QscbzLt)Bv~s(9Gm6zTQ=h@2&wi8
zM79I)E4$wFKLkE5@bEDGXi_Uq>1@_G+=k6={LDFP&ndlz`R{op0q@If;dvN{SVGGQ
z^~C>k`$lQ)PFm@M8L9i7Q~`V2_;As!$%ZO}sIdKw^ubAR2j@{Q6~<JfDkE;oYU=A4
zx2nxHv{zJpX?rG@+3^GrtCK_7V|T)`szJfmtH%(yNFwe&76Dz-d0GpoV{At4dsYFL
z+1Cupg@l<o{a3&1Usb@bl5%;)@qvN6S?1*7tvZK#d|Vkev1eK)gIT#)mPV6X0@`U>
z)y9^o^rd~tXX?CcVRR4g0(pyBP|mZf*B_G&oK_FV%A_Gf*89%nbFRA&u_Ki`E|63r
z55*?KgI0h(t9>%@BhxbW8r_FxBcPSJv8dbfN!bs1UW=L94mpwfr#ps}(2<AqoQWe{
z^F2AT^@ZFt-XdaDE*_i$`f5mhjdILIc5ntN;jYoK4HK|OAIu*u&Zo94a&LjHt5!ol
z`>XnP>q<;X8YjQe$&lG%Z^y08-3Kw7VQzhRnhg<ocB^yEcV%mkK25W~`=(TT<WipN
zhgU(fcyyc3i<6PG?y6AK&UyYGRBYZQ_THXfuCS_;wl=xX+osSWcWQ6;zQYvUKixGZ
zn$(2uwu@TD7OszC5*79zS&leekuJYxe%}6rre+6R<_Mn>C{C<A@Mg2yV=03e&B?o(
z;5hJ41O(x%SAy?D)e0q&Y}_S2o66GUN-R*VDHSD6W9=kq56|F#@^O@-Jqr7Azy(<v
z84W{4^Wvi%s*5=<(LJ4&eJJIG!H}<r56CZaE!*gLZ_2}s0AXXZ{1;d9nsFlCJtXb)
zRM8Ik^aM!F*mz5w6b|+na2BT7*mBpjIX>qS_utvz|BYzwI~w}`p^+2OL;j=ZQ0NE{
z!vD{2jl@!5_!nRJPg?TdUPbmz1%dECc<Fx)PkuWugz%T7tN<)0QXHyyu4|KmUipOV
z32nw<=&=eG*&{oGOqQvGB`>TfX(^5wbNf$i58Vg(N`IWQec<2*h;|L&1-aq{^BhHW
z@XDhio}QkHxU>M>V>u{fn~A^zO0bx>1dCD*j*P<^@84m=b%)HCUZ?ICsIYZJA1peM
zZ`~8j;y3g)&t&|fN|U1y0rh$v=)@WnMGuxN%Nm0Wp$h#SY@5opEw{8X591}#O#$QH
zZXz-Cfypgr%w^WjmAwxVLiqQ?17Fc^wOE~+4}^9<0FAd^ozJ;DkmvmZhX*JwZM50N
zoV4b#pVydVYXwH{O=r#?(Tji#pb8MGE}}x_y70~mi_y#T-{-C(Nk1}d%6wwW_!6L`
z4isg-B1)rk*vF}8@D8Q4U;4Z!k0`t&Y#A|#zJxVmp9gA1qSRd!Qqd4Ox-?i(`uuw1
z<?BoChd0Aq>6yDh$)6qwv@9W$o%(%_fHOMBJjbYCV)|1*NB;m9(%f|8{G&ix<|R98
zpe)D}K3F1$w2XHY04Q2NM|4eBRNmn^QUenchR>QGc2jx1M#pdSBktA|zyYWDw`CLM
z)C*0bGGb=383&5|U-FL5cW9-q9AbR!ED-?yB7+K3>sl|nINE^cw@=o7(>UK>U2d$T
zvwE^@mg}Wg1B;zGD&3MdOcT+(rcxJF8QSf1gI2vfMh0FHC|8?1uGK2AElg-!`SQ{u
zV2mDB$L;B)S7#OECJYJEAE*DjhxR!+se7Fb9|z8w#N7IkgnZjBHYXtmCDhpJlq`Ib
zKHA9>KY!Cv+FKHxAQeb>igv4&?iCIz)MJ`w<jfv1jJW+Q^J3i5IVk$`w#|?pho-lZ
ztr9{PX}_1l8?Owwh0@i0E2Up|A#i_Zf7SfNU0IvG$ztQ?c3QqZAXt*1!S#u4PX7Jo
z&(K^BN%ehPf8pSy@FiK5;^;Eln^|1AT+z8P+=V8O;<DF>4~&L#%;lG>6JK3vaP3SU
ziIi>79)dJx2i;6Cd*4~K-5EyQb$ZfW^ZYqOf3UZ0Ir@NRk8TWN_yNi{@@3gRG>l+A
zHlqPVca9Y}x;*N_?Npn^Qno)>jKWM-{|nfw16%ldWQ5$n<ODMjS-1zmjclwKdq2ww
z+{tAkB~X@UCewBxFa4EfSC-R#=-x2kWhV5RWt$uW_sph*RFIcXVJ-{u?I<HuiFdN+
zs!2WR)umR2`o}h*AokF*n~ZmqgD^+yQO0>89XA}Z5ixb60AkNJg}JdjX2chrOv-m}
z^oUIt76=ds+DJx%l4WYb;4?uQ*e?Lw;WiReTud)=9GmnZ*Nena?}PLm$Ai|j!*nWr
zRoX&}P|$5Pi<DhH1s%J*e?a;Tce#dvlLBXU&3dBAWNJjfmw+Rdw4E+?0)&2(diV<T
zpr`8dbc!B8TEn|4EH&^QIOEXdE~?4ehugDL=B2?`BQZ&3M^xoYH<e;3iI`j##O;w{
zNzHV9q1Jh`Xn%#bv;7~h1n;I3xB-GOdmmuTGy(RwD=Zq#F=M)FE(`7~l#Vul9q}HR
zN|`nGXz9APw5?&ta3NSg{F)EG-TfcF*T$007td}AEr+_S=>>``araqN8u>O@v2u)?
zB%_Cj%eQ9R)ZL&|$3BQ52rr9|&Zj2Kr1{2XOryisOwEc=yaT&<c=4Kc7*q6D=npb;
zzH09e`u=-MT~@6Z46u)kVT(HpkjqR*hex`q9Zu~aOprVLDFh7CAN%sUyeh?C=?42(
zHhvME&z%QA|F~EPy<XbzUs%0jy(q%55Ykms4bDj9iaHQ1XLw<6kNX;Huzvp1SC7xV
zo%;+GK2m}!n>S-(p!q}j#g$t@xF>j3cA6b9@W<zKO&mZ0mb}kzrz9}_g}Q#bjxsXs
z7U)zWN8i1Pci8YDT4M*CzhsJzKH6nfpRLwwGTPbSxE2t)qe58tb3xc99F+Z3H0UOp
znKT2C($pab>o=B_0OHoxS0{@wVNWUh1P_**5H3AIRgr~m_{$e&FX*ltgHyEv>!wm$
zh~jOk0tSn}YG}BLxUW?EIJ9}jt3N<g2ov&I#O`<!>Wk0M4{h&xTN@*tw=^|sjI+J(
zqRKYpwIAc=UK+pL6OYCzGoNs^zN!`!zJJZjH?%szQB!%Dbn%m_NTW?olFBYiy(&B+
zCN#PTh3qK92yo5nrp89Gb+egMr{5;yAAkcNvxLoAkYn>iz!uvN(%+_bqog}YfH(xl
z$b(3I2t4S$n+Oi$sRz(d9-6!C?CdEV7cd5<kH$b`wO=e$#UHR&qr`*)uwJg7{jz$b
z432l;?!3T&blkKdT-e0J`=Sl+OTKfO0N=TlV{zxch<$_E=gcK%8nKeaGsRWlEw}p?
zPs4zBT^rQ>;wPnfp5N!&{qW3zk7XOu+F@}ie@rY^62{LO%d<d_6ltFzbd>w%>VrGG
zDys=V7EUc6*jEQdsIj&N%+5^Tv@}Q*pxk$-#kccjxNE2u#_^&CK7sHM^4gClf!j!M
zT<Y?L4`q`L+cn~Ao&E@QtdA}s<WF#Qx7G9q0{DW5CoOBe!Ri^{LR`^Z26Of|@`zvM
z#-Kd=?VG2q=al{>`X6t@9Gz|j;4a<^j~meOUz=q%AGT`rK1GP9pz0?WgRJXrNQ;Y}
zL)(Mm;`_v|yn;T%!|iYiNNZA_E{EngvUMKuFX%CcGQH8QN_JP7J?ndrB$PEc_!g9B
zG8UL!y1^yYNtgKpxH;4Il!)s0ByYH}jEHuN|2NM0dL2-4B?kt@6$A#eY_oxhy5twr
z>35aun$OQg+Ab4(Yj$#=+R_M?7DkJZ$K?RW`y6~6uUz;gH8D4?V|o+ZEf+#XCNz|`
z7A|0yv$094x@PEk`N+}HgNQoTqu65xZAo-+Tz&^n;T;k@*^g|$nv&r07w$R*dJ=3n
zmuJL~=iCsKS;u?)%IV1N>SpwMAogl_CZz#jgv{SyQ!YPL-z+qV)8OqJLL*Q9=&uis
zRzF?Z#vN@=sdf#X_X2(jd)n%@6h}OTNC!R>RKl_y2ugn0u<sSXm#8+MJDpTE`R)^=
zi13B{q3!cn*(aQf2W{Y+8_1rbXuI6^PVv_k@o=YsV;Q7CH$QEh^W7o{v@|amvdfr{
z7>7;-di98hbNTp=2Jk7afascZkdN%Wt35apb*ZHqN_AG=_7vTb4WOpnN6KXu`n|6D
zj0wVemWP8Cu)s!6_Kx6Kih8|<D7>n`F5cP7=HBLZHV|N4bDM@{@0RP9bPf9)JPv*N
z{B`w$)`7h7m)<X+>SDTO(E|~m`{$xvgf?NZ!Rn(s);k{g5Aj`*T~U9>No(|U`BnId
zA=T8XC;)$%&o0VoXR-2`SpC(5QEMU&LY%Jidp^-&_}p0fFBiicrugFZ|HQy!d8b!O
z>BfPNfCt0L&id1H#?IGg+KeHDANWYQthdx4=SPdb)2;B;A1tt+lMT&C-iMAQLwah`
zhU6j^A-4%Phd}tHfPzvVZw(XCJYf%~#DaBOfSHW;VEELKEiNur!;vK9P0ml8^IE6p
zs6o`HaV09w-58$9hgVd~Pd2Ztb9$Q}aQo#r0fK`8m*lD1^kge6g4nP;)?M1z*Fh@d
zhIEI@?zuZTr+@tjzIs`$457llCC`Af&9!I6$u#}pmlfV$O&9=eV>)|SynGVg``<Qs
z`Y4kA9mL;c^gl=n(J~u`x|KD3-Wg7!cPJJ1{8qF}{nf<0)(?G<4~{SuQQ8nEXA$6@
zQW8yN6DC$jdS$h%a7ar&^l_no3+!=Y7j+_|N%|6~^DAL{7Z)K_+vvL7CNhro^)Wpe
z*+6l4$q&J^Mpj$<Z;p<_d#4P9Izj}fwoymF-BjUu^8FaW20ZNV4KciZdexdyoC`40
zeZN-Jitg5o9Egr$1Z3P*)+kfgs#4$3miA!8!PoC>A-S@#jmG}idJufsWn>q3Lic*V
z|6D|AfDp4WhEB?)p|6Pms{wy<!_m&~hsV#HI<E0Vv8H*N7=IPE1v#PisBavy<Hpq0
z0GhR!Dg0$Q?%9~Kn_V@Ew`40{_jEF<fq;W=CkhKW3~&T5GWpd^Z%7xU#bu)q?!N~j
z@2(DqIp8JcId61hM_)!^&3uT7lhzHmcrPl+wxBdOra3z(@%ie8vge5xK@yNSzn~@p
zTGW6NBd5T7V)+YI7p6Tsn9kDq8GtpDAtb?2N1_WOz_x^YOs$EusS*7#DRLYf2Nw;k
zDFwSVc_fB#Nb2o(r$9OIZHwreC}v6>e?d9OR^+rTG0XBbxvWc#kf*<^%Dk*ZxS6HS
z9)3i6R3rx(nqJCAwv85ljYM6BTM#ikd|x*D5*<J99Y3F_+Ct7i@mIw!LW<#jOe8a`
zE_(}TI{or}IfcyJk${*Q9V(luPpLKH(t7)niR+ZS^b)+2QVs!(McUmp$_^Aw3%}uk
z{2Q95#e!Azoq(r*)b<cAUN8Yd5WAAzJ{&$(z)-$mc{r(m^@lewHB0@PmhrF4#-J|S
z!c7IYW;5pQ_FF`DHFrzbLKTKOxEcx$PP1E=ZP+HxqhY#FO&HvZnjAO*qKvn%+y*Uu
z3IFcY+RIBghFt5y)GrU^r3-ly2VSae4s!ju>>z*aV94#aO*mP9I))(T(sw3>5WxWr
zQT<j@Lvr&B5k!@4GPVdQ^2SI3Nw}r;=B3i*Fkk$WO+oJ3jMQI{Z8D6Cc8CA7H?uSY
zGnS(x)OpKJ{m{aFhM+;FIjI~-PijlQshk9^W@99`GB5ijtRqW{cW4RtZ+yLFP+U#a
zu1yjoc!E2@-Q5Z9?(XjHkf4LROK^Ah;O_43?hfD1bIz&r<E!^nP4Ne2*t>i6TK9de
zp4|(!+fAg^dQio3Y){L|6|_ECp&WvQv*L<Ie)qjP5_jSnJFS5JA5GjSFc>d$MqO8)
zOwuEN;m#O(fj0XdyQOpN0vfs5AXViM9L&-v*8JJokbMlOAz<>m_7BWJHa(TXz=dQh
zW$6`oeI6-}fq9rg+K7~eN18MMru(rTBkpFS-@(EX_uX?Eg%20xO=ay$%E&*NWK`i5
zS{k+?*mw<;NsN?1G{%@Q6bqtT6TB{aX^A_EV;B7h%OB?u`o74Tn!F;wGz)Zq56RKB
z)DP|jUPPCs6X2c`p_%2<bcB^~h<X~(`OpXedfH;V-1#v!W`X|y>;<?GSA4ay@c3PD
z$e(+L;Jo981{A{ysUvWiGKQZyOzwUoup6X!R%DpxeRdLnGhUNpE9-kkz0N<Jz#%xc
z6_NL<BXcoaMl~5GBcH8^)v=lQt%1{tMSx}ojJbZ@CDK3Nce$acXcUcW-5F{_1Ju!I
zIOkfq)<Ll?5}*R1oFEem@&GG7KAd8C=jGB|f1m0A3;9RjBN*W?>h>eiuP;pC-r_Y=
z`c706z0;Od1G=O&#(R&JVHYCGZ$PCff7y{$_F@(dR(TEuGLu1<3mPSQQ_JILU}!vB
zJ0_Feu#dZ3oKpz}0ehX=>_^7t-KQN7_rMtch5>{P2(ktUn<3?KMok~=EUfOj5hvn%
zd+RGG;Ga%!jj2_P3vH1H)xVRi^nWh58s?TETofNn%HwEBs9_jUx>DQGFpy%!BZh8}
zZ-lAq1vCPs<DO#tv7#kz$0Gml4kN?8`E!xT7Mg18h5BTg45J1wB*dgIYn{@-qIr@a
zD>){^#w{=y&DMN!{s>(K@{4tORvwvi2(W~ljF1DDzvc!erBHR!a3I*LQWOPvLAx~g
z2BkLh_ijj0u8PXzn=*eeHfR>oIp$q$`Zy*3<`v*x{8a!>alCW#p0He!69a;yUwsyL
zD&+@K8}($r(G`Yc!VK|m?bwNLu0sEd=WJ^~DMwh*_uJKcS#h4ijbrMgh@^{@bDeiR
zc+$zG__W>S!Wjnz2B>DhhmgXEQp}Sjleb9#ZmwF7s@RTCdg|w^MRMP<uta}<GFV8?
zA4*<ezL2_rwAtfesipb)$uJP#aFOKIBH-74;gLA5KqIWsn>vcHeutNfV0k9NJfE;U
z^z(Kd+5NUVLdAg`t4T6#dtdLgJkErXIDRe%r;w!;R1&ZO24E8CNhKVe^IM%{Sy*`#
zoMPOwpc4%YnUs`iO)Z^Av>@#qo@&>8n}hw1mVZkTp%W{cgHw!oGW?|ZaGp@HZ)bP7
z(4Hr_dD#4!G3pzlGtwmq)<Cu;gSwZupcI%AN`zaLVg8v5cN#!qh@S8nv^f#@<?$`+
z*0S-2@61z)1eL}hqzc;-v%@??bYW+3a0$O3cRSMni2!(mDq2UFev;bsKnN7qP=AAi
z_0>zBfcIj>&oK?HlSyCF(41I=2f#zwY314zc^oOz()lDwcn7IfMbt1zbJl?V(8)MD
z{mY)A45A$_?&Uq`4_~ve2yxGQDuXV~%A|PJeKL-=?SYkGbmhz$`vQVmDHfF=?rARy
z?YBCzcCvii_!<G8ny9K&y6t{`Exn$UOwkSk1eq$`WQIqySyh5sRYt|`+oDXyZ{8aH
z3KE02Ys>EnTRL7AJN3EBj+xEF6a}HSn@yxQ3AIkd59K(%XIDHCszpZ&^KVy$<doli
zZ%R50es&J`PL~24Ok#1P*Y1Q&EcgXb{QoS6TUADNx1`e8DwZhSISW7@fHH8UhaG7e
zSE+4WcgYyItPBKG;<4es`hHyqg3tgea#6A)^(?JHGe9#wGCy}<6x)3aA72Ak1HhEg
zco{t|PRG4QK=9de%i)$xVy%de4FjosD38Qm)XO4TsOrW8VPh(VymNSW`UK@<Cs`)&
zWtmSB$l-~L2$Uv~z%>nRpI~oyMMu*~vvRHuPPozbJ%jH^ieip+hUlY{3=o`C3d>K<
zexs?zRg*zoa4cSt1d2%a<Fu@uR+v#!()^=~hXSHL2TRT`>IO|yd%^GrR6yx0Ez%cg
z>zuD-?r6M=qTE4yS%_U@Zb$xa@Y+^NQrlnMXX)kq3NMk!EnSo7#05+nybk8mo2E@?
z%_D9@gHQVYA^nO53m`OFw2iV|K+BLRRXgsq1=?GCmLE_v`E0i81x8ZxtR_Ge$+a4f
z{_3i~Y%Bsxw^MdQ*3#SmpOJ_4x>>!@<4n*@1TfuH`R5@!x>PJeG%HKu*5eb;d}uS`
z)~LvL0$|l+xLg&SvCT^>zhalz0L}>$xsCT-SmcIo-QmUTUG((i8NPv;C-4jlFc9$(
znEO<wz8$|hWdE;Kx6w4|@Q86M3Nx!JzoLw?Lu6|H5E;|@0Qpuk#2?4~bLdQ}91(j;
zw#yw#__<?C!i2N!*TapW9Ig5)(GA-EY-AtdpzSG5nILjNXD7%`C3yKcxiKvrs=Qi+
zKo}afud9{WqB<rr)%uPB(t6e5WiQayQFnmv57*r{>8b3!G**uN(gU{Mu}M(ADhKaV
zA@%BZ{btFwVcR7#Cz7XlP)o6<aB7Z!e8iEaj$=TRqOf;(*v-b~?C6K*3~Lsa0r3YR
z?$8@hMJT65t9c{O%08gQ|F73>T-B-Fr#K{H77hC6jB_3V!zh&`GYE9=wAoaB(i%-W
zMuE_8T;<jFid&!6pWgC8%C`Beksf#q!EI~Uroo`wI{dV{VGnRA+hw##11;@GULdUU
zx_JPcC{z^ZbvGQ;lo)cCZ?XTst@XN#r%tB`<DxJ?53Zk;e|NAe_AO9gp7_=e0XeOW
z;fZWCQFh&-b(g(&ZPx=qnP~u#_5&}hUb2x|u|!sH(YQqcQ9)a$y3JsmQQ2$f0Ou^j
zktYp^`1I6wSfDK1ET7N#|Cj8wPr&}e_PuMhc~RMvSTt8>z(i9Y5iMft1?F*OYeG0A
zFcJ5xIg#T3_|r#@sR+>Q-Mh~10~$czRCommRmB168^}3>%HwM;pHD;_4A+8Evsw%P
z<=GXM<Vvj{nE%;@%0P<wfd$U8Rtw?}B8z=)nkjvNoB_}(eAkya&bn`6OzMgTtt3e-
z&|_)(6$S9dfY1Y?q0-sRRd^{~zu}cNNIQ@1jh^NTchM#za5%6YLHhOc6?c-?+3~f6
z$I92|ZIs(mpuj2d!7kLMP-N+e<CEZ>{vq=YQTvmi^)wo!wkBs~1qe`WPMK=n4u@Ca
zSLxA28NJ05Ux%{^gARvJ3$;K-DI}^+Dyiph=5=2Qo(*`UR~mB%Myx2gwvNPQ25%!U
zSg^ujKhq667_b!79U|4dGL);5uE(E%Nt)?nJ5R9$sp&fGF_9MmWb|KR0u+`~*77%7
zPB*&Poo{@$sD_CE?w2z1JD>Fjypw}>YHD-Ue73jVzxj|*`LAKpi>oLn20`eRXU%mZ
zk^-BLiL$A8gLWJ_g#9EB)<C*^7@qeee$ME4M_AB$g)DHgW_3gi(t`q;765LJ>BE<A
zl<f7-sOxhD#H9nngR~QuEFkx!V?H?~%t$4z8~k~~?0$y}TxRJ{(osJtSykyv-U7(m
z;`A-b*c1T(AsoCd+_!gVfmVbX0N*RTV1%a@m)<RREf<n$vU+d0JrsgNz-zStrc%Su
zK0o&zTWc=ew3s(&M;(rt54K2Y4Dhr>c&A31Cv=rKzMp6$WS{916^)5M1idVhfnEWS
zG*!`b{uqo`l&1Y<o42++A#*FV3DPDwodA>x1$E)D(-pkc0$|6%V)zqbn=7*^I}%#7
z(?3W7?WpREj2&#QiZRwX2oxGX3sNi5F#t;KUv6JU_M|_fGz?5)NN40);q})~0s>Q^
z77c5lSX)2mky7&ca9L%aRpJgh^iI4b_F*Q@C%J|}axQ(~PKz_{VJ2pA0I1na`Hg@R
zCPO{XHyCwrz6wn6{-X&&niy#lvH$_PE40`rpC6E8ALYNPWzQ6z8w3D2ATc4+vU;2-
zrO<Q*f94WZNByB2Pg$z`2G9J@s{BPixPEP%Di`LSU!Vj?x?Cg7uBGabNaJ_dIz|PK
z0i`R9R3yEoyEq^t0RKP#nHW&HA>cLPs5*HzG*eg`!mO;LeXDE$dta<53#C0zKUo55
zZGXeOY&6_mO<okEI(N_+7O+bHa4?F^X>1O6H%`Ta0pxdbw%-ErX*pqUN6E0kD|VGr
z3u3`OB4Gi<CK?15HPt<$G{yl8nt8-pus(Wo&7-!40hj7ASIy&>nreOE7Zh`a$&`4x
zK#RrMxi@HAzttJ0{`<ckR3zM16X^n)ub<7HYGh|B%S1vT`k)Z&4AutdcYl2!z8<?V
z0MI}E)p|nt*S*rpN{c}EkpeUuK*9iJu2vGE$Bv33oJ+y@cfWuv@g<ZD@X7!j(k`}D
zz5X;EGX?|^pq<+1mA_*r2m9-zsu$@1K!60z%(%rV{VG3~J@X%a@9toRt(OEMU^>ks
z8Bj6+;{!5Q@9(qHYoZf6eAhRP>C8n*=IFk<5kLzfv?LS)hzAIApgW|BuKNej?DxHR
zRs(97VF-<{j1{I#s=UrC$|V7A17h#Nbpx(5F{@Efz{l%zC{J(k{u`WY^C0a4KF0vF
zKPdpm19HQmBt8KUDr)$q#2-EN6~odtNjp{A{y;PW5oO4MT0z~$56}rKD~y3!b^s+0
z*@rxk05!pF3l*^K;;bzZsT{I>>3x;LfRWAv+}YCg3IG(p16<wW2?KWS1`s^oCby|L
zB7ubfG&2@jI7?T>d5LseB>?*l7(`%6YbkGN?J+y<%`|#h{%&d<g!!+T+hq7jIhQ7W
zQ_qE|b$X2%%FyVuJf7A1=(-d>g>Qh=kKsrq^xzuJ!$_mACGYhJFX-J`dnkYoD#_W}
zVpR9S?V&}2H#(9!HFca36$^X`fzU9Ix-e1Pw+E!W67w?h%b_Hnh@+ERdFfRJbz^@l
zuLOWhw<rE+y6=~w18iw5=AS+K>1lv?B}WF5BTGxmDxh-ao8kvBgtY7Nx3siO!qbT|
zXXRd=zjxG>{nP*(U0P&4R{6B1C8YQzue-f8vQw?f|9$j3AhwWiNk9<GhZ5@(1qB1*
zfiIWE+4U9Ulx*|mG#!@tHk<#*kd4Rh^;>;_;Q&T)tvGmbdX>G=rcBe}DIr#%4ZlE9
zzwDARF*lde2hht_#aF2*PHxEFN1{hExWwO239$yJfyT=DEWk(3l?Tk3Cu{mynt4{r
zf!es)f>vyYA2uhcCKKUDCjs<@($aVZJiX9;SfnJau-{S!08{~4ap4b0>E6%QmRiWw
z@*J7s&Aifm=|tEoB6EY;E0TVqi%BMmGr*Hs@kz1F2=J_Ofr19S^~w^sWSJI4s3S=}
z3BuXHlClZY2xm)Bb$1j9JQXYokp6S2q5zEoa6jGKSHRN#?f_pa_6Ke^)~zZIKnsz*
zI*_SZ1jy?Fx?vT-ctY-ZuqyT$9;)CeU&%LyNzt}P?qD&CLLQB9w|H5reuZLmDnyt{
z;{^<3qeJ?A0V2?*1LncT63F#y!^qbp#Tu~jiDaV%J!yce&j#`^FH2oXM=I72WXwN6
z@Q?TU5Wer%R|GBo0a|WrcCTZK%%t|*hI@}EVyD2B0u~48IOG7g2a-bvIAHrx57ohn
zhZ2J+1!f$3rZ{snu!?bj_?k3e7KU0}ePsqr#mEO95XV-)Xdaz-ixTjD?V(^w^@Wy(
z2O5v4j*%rCP;P)7oJw<#FBk-{lz^y%FA@H?^S;7Bo8hQ3)tUT8`ZJhMG$MWJy1Wl?
zfa1GdqeqYod_I)M)nP$9lg2L~d;2zI_kgt^U<Gb$m<n{Y4w?gA#evLjK+7lPWyni|
zW|a>Fs5da1p$MNQ(Kf_=_rC&K%~C5&vsF9t3wLbxdBSRuZdfeHs(|f{d~L}85;zqp
z>zA|E!LNspne$bjDNiEkdl`uwkNiW@=D|pup|wO@E>P`KTiU%xT>Vbjj*;tzw5ueJ
zt7eR`X5TTIF$-F?KNvJpyA~Mw-!tdqmrp2SAz<QDHR?`2y2R@oOpfDv$4nFH{vq~X
z4+@P6#nFNXu@%92Sd1c4@ILWK_$?#b*SBnP3V3|kd+(zz_n?B`qPjFkbu_@iWAVyP
zW2|(HFl}32zb|PVomn5-_Fhcu?ZZq!2LIUvE)+p6yzu#NJ?^O)O!Qx8ds3MrwDo#9
zSovEaC?r-3eYmCJF{xVXbS;G3R%|BV?BuO|$7G@&H4yv09J-8;4f@T!6UsvR%8p+u
zzF?H=AC$sqoG&bFp+;HjR9yivJ9Y$-zkBP{|GMBBF@+`K@;5k;=bPK%5-$jiAY>1?
z5H+)E9Am_CFqH}QSF^`w5f+(+4@1TgVd3>iI;{E|N`gtH(n8&^+FP2Ck2B2lxs4U2
z(l8{n)s)8jy)CmBK{Rc+g~ZALeA#pJn*-Z)KDjK2R$IXhd#@E2?f>^^l8|GZGm
zkTbxamS(3X4Y!FRi%&>ojeUzY^GF==iAegYcM>8rMks65ztqeug;ydUJl7?WX9>h{
zgHK0VZNIO?W}1dWM>@DnN->iBwgwj2thc%S$>Y-kr9Pa?W*(V634~Rcl|}ry)%OL5
zujI5lL6Wm})nIIz-I(HACKBv39CkSI-RzmY`bI>7R{8R<J7NvL6vgzzAAaCcrM1%3
z3{(v;{V82ser*JwQo`5HW$ig4?*l<gc^2T7<mf|@h=jFEq0gX@4peIgn{_0m8%N{g
z3dR@}6n+T@cf>G6Rx1{lI=OwDP7&KFka@R}?Ip+Ki)3`K@w8cBt}3LhIByxtkusR|
zFZo5B^*XW=B;}#dLn~fs=`Qlqz(Z|-l_c48zT%YKkQYC$Sv<$kR3Z<gXx?i+3rk!}
z(Si^m&zOy&G-&{#@Qb3q-^sz`>qMS7Z?Yco-nR(%qcKzH6Y+i~$f>KD1^qOntD@lF
zydg21en(;H>yd{HS_2}Sf;Tr5fs1}gQeHMu(HxcjyqLzEvweU63@O=cxj=eT8X6<J
zm1<%fN7+9v(vwIIu<iYbkY!3?!IXW8Ar3r$MNw=fB$|O~*!d=71npjki5H(dKA|0a
z!f&zauO16>$x2U*D9r<&7TDY_hDX=L1n)G>;BZ9_Wd)VRREvL`26=SL9GKb8PB#A2
zhE>MJAesjgIm?jAxCM?lE*=s=A8N!t#=e|T<(^%J7G%wVv19%m3(0lO8uVoiBB}c`
z?s3tQep_7S4|R_CY2sMJ%p%=T1y?o+_7D!v&@{TyPCrR83jw0}gT5WEW(Efeh!r32
zfz?tZB0Qs%nMf{o?kHqjp#*0UDDLF@?w4F#3i+MLsU~hQ+X?#iH;ULNFquh7*Ztl1
zLJEW}q1f02ed3Z$MdLcP$Y{2%#ei|QdC8&F6-gcLV}gs6Kk+M3yYS=hHhs_4igBh^
znzY6AEP0BQ<8S9wdofc6io3&+(|ihq@^TDlW*eo}maw8GE37g1v2)5TlJ;;^J(GE1
z`jX%UOaZqkh9kM@(+v#xicd2uiK&-*yT&-00#h=XqBB#}LEdl!cVla4$Zt4Mb;twD
z@vylm;Q1Y2k~`u)rQQzOi!gGyh%p@`^ODGtvI=c2Qq>xdR@+c}!X=@Ay_v_^b?Nuo
zrO<0DH29yjq&FC{lUfwu^E7zwb6T3jtfqKRi5F5hyBY=&C4c$Co+4?R{Kr6s^!wfB
zRTg*0HW6pw?Fl08VuO&usQ)_82Z*Y?r^|}&#}qylW1`L@2DV9j#Gd-sJq+AX$J<Gi
z%|5<8!YDF^QyfD@9|@O$oj_t?rU5$z!kDZjSxJ-Bt>j)elXvG`2SydIbcK(f#J*3f
zF1xWz3-u7k`)Wuj;t8Pj<^vKxLlAhg7$S1Ykb7lmAxy!}tc3clM1nsQCnla_%-8S9
zF8WrfX3GZ5`(13>#OPA7<R~GL>8p?swSO1uqSRc@Jzy{c4Ma>F_{q3<Zd{$LKmr=s
zdUpnpzr~)10!S?#qO-L^{b|8vaV!l3sh6e<QIy_{T45<5cV<o}IN{8YHe<2y^YU>1
z!lkbU&8s20P=v;fDukP+jAEQR(gfKZ(Q*=(J=xrwX3F0_62uF@(aZ$zrZ*+IXpbI#
z4eZHaPhnPZf?H^*^FViARlRuHUGfxLiDrOdta$OLG)7X9*LRqd<h9OVWhs|Dj6;@!
zh??TFh$c5v8-IPtQHsg!&&aFb8LZCza*Mc^JnTX7OcJ?=gi;RI!vL-R*_+Z|H38#n
zX7Gh%7DTNht?#ueN`27dAoH=#jy+~T$+PJ_#x$gcc+NdxT1c5U7xyq!cxet}LmcQ@
z<G&t5U7aq_mEP%HVigUIYa$&e!I@^%#3t?ZH;nWIhfSd<c3iL9fU!_6a~NrQ2QHEW
zMG9iVC`N$1|F9|eDm?1da{fF1-4Ag+O(eXHA#YAhXL9K%gDi))Pj-dDpaTV*I$VKB
zNytO0U3y4Ff&tN=3SeTg4^vrGSD_X=1KA7WiZ~a29LDbbw+v-YIKni4-BQo%K%IJu
zPJbE?4XCSRi<{+dj}Ww!MaSTYGD57`pSbrY>0>t#6TQWCM!>8TNt2ZJS0`2qgcy%G
zD)&-^K~7R>wQZ`!HNlau^t>#RzTAS$9hXl&@z9ALGvMhshEv4aPy=8Ia;v=|d19yz
z;StYYJ$r6V9Ew8w>^|p!Qr?texZz}0gpdm>iCD_zLecGB5H^p=bWfX9pt%@61=%li
zSJp^@M&pNtDiM|W+-c&7Q`#2x^=hKh8Tuyk1Z$e?UKF0`QC-qu`tp69k`L_zwWh*+
zmxW5iqS+k6Bi$~Q(yeTrqy~^3V+N3kVM+Q6ybF19_qHjI#3R%?(QPMFcwDi9f(@tx
z^IZiPA1uPU8ZgF4IpCzYXd$43tmB++<1JHm2{{l1LEumhtUCpg0u16KJMoZ+J9#D^
z?{*P5DpugL#F6oaoQa0A<6TseVt>WV-(PTsXvcf!3hs&2A>y}tt?tqdJ{kVOUvOp+
z6|r+r*ILlN)Y&mGLP0U$;YFDM-M2<d7qO3@*a}k=rr&$HmPAf}!y_5AW0ZFC*m|QY
ze~@o5G5Ns82xoK|_19mjebH@6CgaS>72?O5_#qCn6Xz3}C@xD2Pw&_^R*R5(n8MKk
z$w|QAY9_+P@XL~Vm8r~7JkJ2{miM5J6|z8am&kiJ!aS$Pv_sX(4u;tR(<{^A{_M<%
z(J+VN(=21)<}LA5-OyoP5Dr7tuuKD6EIoNZEHNxc4pCuc5B1<~9E{?d4oq>1<uB{u
zJOfIIc5EX02aX?c7`Y4N-(9q{{_F}kcmzyu5|s5WQouK2W`*0cU&wJ!or<%F*cZVz
zx;c2N@AND-i?6zYy48Z-)Po5YvUi(qv0!m@BSDFHf3HqOS!>Ck;utizV*VbU^bDa_
z7Z%!Ogq5mpFXSq*zqwN~<;TjBAJn(q<^^BvymK&&s_wPXzTB)4hxOdbVi)o#wn@a`
zZa?cU`=bdn!rhaC?nmR#XpEqbC`1%VbHyV<jeeQy{_YYFyShEPig}uk!GJP!Kx#Cl
zD<Ui8N-;m(C9Y_{^zJki-&MfcY1cuyyJdBW50W#c69k>MEle$)<vqx^G6?^z$G}NW
zZod~rf>cqDb@SQhkfMV!CJfw*H^~zAbT~@E^WB*?KN;*-R@lM9tMuP@HvK?&Y76H;
z_5T%_@{viHn6e*YY{*V4tT*{<_RwA;6k&J>@ocVGKGGECw8}abAv{&MMo=5&IzQ!e
z11KZ4+vM{?QepBSLnh;*{#dV6Bue_y1G{4rYfTJ))LEvwtPMC0@`AHH{FmH9D6WKl
zoYR3JomIhD&HYg(d){I~Dkf_hVpIwDlm*Y!Dru0F*ZYfr-?&$N^f_Z92swlOrrBF+
zSRvOWT02731)osai9C>aw<hkYE(bS<PX0#`8K%s5@8t_431b&=wbd^(r_2)-9Rh4s
zihE~+rvshF(dtb1ji~lv`#;eN$)^voDe2g)6E8U3pifo<Mt0M@XPIY8H+}5aLFlrh
z{aguoX2uw}D)!xIF6uBdL&#@bp5eL4F+*g^2JJf2*(rRPCt~pqNPV|IOpd`~KX$ob
z3!_O>BW8{yq%uP6Z^1yIcMf|2zpY{nC`W|2eW~WzTa5fJguL4!ysa{jJPsLj>QBl<
zL)HGDak3(a!4e42D}9*<(Xr*90i_+e-4l170&;93Gx$6+6W_uK$X7=v@It%q1DD!j
z5m`QPyGWGhC*c#VwDB;ESNh)Tu3ajV{FxXgvxSVzo3WfdwISM&$T1l3dvmwAQ4V=@
zkg-9o!`-1$-=$Z=3%}&?V2LU+^fQr&6e>C08HF1f$r12Oa3ba6T~DKsm`|MEaQbCn
z41`8ysQAcWTMx>V-efX``}K94em7_)|DrhmdtkP?pQ)db>CAb+0FLaSfi0C4BSS@2
zQ`AC0Fjl+&3vZsH0LRb~w@C~3ZrsCf8wNp(oGe=Of=_eK95yM$3e_9~ahhVt_}F^3
z<D_{KDAq~YZa9WTuw;CW@Afz1;RbxH_Fs>8;~-jU(3)7EkT1G*(p2eNd1~Gc6<?ww
zm?i9CWWu>kO=|#kA{HCK!2yaJ8SC^LS(S=ME3&lm6htrc^fsj<{`%VvR8nqhVqo<s
zpepzce2clH<T}%JZD_u{adOQ&-S=sDfg(Wyy+y;esbpA!_l@geYJRyp3jdWtmiulr
zup;)7bbfw$xWT&d>fhUvKF&X?EFbVf`$i1i{;ICruef$b(Rp1-CMmB3NCpq&UOjcb
z;_y}`9q4+cWWAqm!D{ET0jF(Pt=Gl4u3x#NreZzK@O{8;gTNcCv%L2u(@!gV7VcTW
zLIpejjw5JqVzs;1H_~lc+csx3u686$_+Cjm;heXXof9Z4=XX>jcD|+2O)ojei~7E^
z=VH`p{GYu5P>9Z2hZfMbjHv}zP@Nx`%gY+>?gKJDkGW8^P6_LBU3z|=7w&wjHtW5)
zQ9NBU6Wo%#S$r?yzb_%bLin#bPOND)yCd^?AAZmD*;kjknSK`2BdDRixo>O_T(N0}
z7LR?r0lw`8gYIbHXSWn<oMOk`i~fG?n5STl=X1+&f4b(HU3wP;1ANv4C~CUHh=I@Z
zQ6BiLZEPLoChozSQSA#KGBzz=nkB8siMKxVe^*6syN?lUr!N-U8`~H5*gL>0U~LWm
zadmca-5o}!duY+FFT}kw?Xt46;n{x_^|tG2d-&Y@?eFg&Ux8Or40c6p;#eB10yT>J
zKKcq>n%_e)CFOAQ_3&7kZevhKhmT0k6s$JMmULN7&hNXEg&Ox6$Vl7^0!~(+CE~%1
z<%Y<}p8w`elo)Ai?V>HNp+P0m1^?jhFMsqF3EQTiXOD{JXUdY&Qubx<@FO=Z@czJ!
zqT>=uR@>FT$8)9Zt4n>oy}iZ>uTXNjx&xwo*92hqr%MAdxuSG-5dm!2(ld)LFV@t$
z4e>EC@;Xb7?j%xRZO$uaW!kl=@9&>tUddkj+i%8QN6&p%vOM2wLIcaRn$xAR7m(o~
zm)?eTtX^()d?B{jvP(*0WxY;`0*LhQ@wtUW2mTAIRjvy!D9M#1yHmFb;{<L(EwA|s
z^|AhllPTtiw->rf_F#ZSbecOm*K67LktWh<4#(uKNlEpN^Ujb6^)dx42*Q1$1{O#r
z8gLk&?P&Ypb7LFw0=fM0MpPWPyFUTJOdid4!=Uyn`@3_EjK#X~e2Fx(2ile*H^wh?
zb6UE|wCKaSm9J|~bVs`HU|e{C?k%mh8X37c`Ms9}e8u+OF6`?Z#&smKmNaU~rxTs*
zifjveF3@O0m-UK4$6w>q;@L33OW{3SgBZe?*06&298{hv{yItCq98KKH-i(kq1aln
zX}IXQ9RA2s5KZ`?gd-EKnfy5fnkbgcR~SL^MvWz|rg}I9o#n~nH(^$4Nb4R$nXbNn
z;t(n|#XMVQ;Rf2~zqx!H_atwXnU$>!N@)M`H~7Drk=gjXpKY9)9~4aDa;H3hpP45{
zV8zRl&g&2w+fK1gi#!6~rMG@&tU$0b9>vMatxD6+eW=?L-i29ix11zKTmzoRcG@h+
zVq$3{;y`_SE3Xt=RC;Wu9enSG%`(LJ943m%6kL33{WHNkFM_vbf9S64-lkxDX7qg(
zA@+V&%6m#ZkIuCIgEA913ut`EPgc-|R@9niz^yt8D>sqVn*xpXCBqxMp$*&7BILUF
z`@9>O+I*1%U&QWlDS?xZAaREB{GUfRGMjmRhy9Cz=DwNnxnJenDL#H>8QuHauy!r9
zO&WeqFTwP~s$1et-_M_DV^g?xzBR#4O{$-5gwwi1C*AEpBQl(`UWR+I735{vuQPUz
zn%aP>D%$ZxD>wTxKXZ9ML~Ey*uD{Rd3Oe8te%TmEi>eXR?MmFF;ii9sw=t)yYVIOY
z4!FRN7egh<JN|ZDL&-cRX(Sqh?Qd)B`TX7B{*a@Qg*vHLNjIY6(agpxFwe?iP;2a`
z_X*}ft4(oIxg}wlrosrZGdJ_rwM13y>2WQh?0(jA1Y1W^h_$`xQt(oOH=1l!5<I-{
z{6gz%p5p!>@vaOU9r~=+dh2qR3$aU7ceObtd`7}Le8BKZ$b3;a<`<O6^8yqdk&b6b
z00gY}{=Ai<^Whkz*>Xi%sf*3E;7cj|!1crWxFe5EN@w|P9Bya%$OA`*z@&9dfX*c&
zJiT3TnPDZ%Cp3yxINh4Eyy=XS@{EqN?~AFsu9jD}-pfD|gaEbg)OZp1dG(tI6p;vB
zj|d$mnR|tAL6}Oh6?|aJN%fmEco2S1+@Vmzc&FsUU56#P;)uAVXOrg)xG!Hz0<9%#
z1DR9*Qv>yUv$r%RcH(@D{P5F6B)hgUKG_yWaJH=0*GN8L`i?A(dQf>xs~hK?VX7rm
zb2dCT%<Bm)$O$^H<cEYK$2xy<%Ny360+D||Lr{2TZAF1rae=a~L~docz^?q~)Lz;r
z7laYj_KF5?*Nx^zX$S4khO)@wPI4FK`~90M?v!^wN;QI-8~R8TfBP=GJ2VAU4H!F@
z_q3mEWx0sqq{Ye+M_`r5G$y>U(rIgU=}j1-aApl$&&Q};GsG*T^PfVr9m#Y%aIgh!
zbcqG_nHBL<&P6dyf<z-H(cJN`x_un-X|cGi%N}$`)-}-?e0J?ABjKmdwR_?@WU}nR
zE9XVzEDOxYl4(0?^j<6yQNAjt=8<ws!H?sz5|9TkA+<$_GYwZwxh~g#C~OuHNbFe-
z)@}HcO$GYinscYjTsV%;5l%>`ys+LV+>4(tG52L^QE`P~CtsIyp0r_WaC>d0N~qA9
z|LMfo3fyMFDKpE*U~d#ss%I8A+6*34$;AG>B1Zy9ug*Wsgd?6R9Jex(ZOp_|Ny)&m
z&gW_ob8LMjhX#p~mhS9V)>4g5%VLu^#qw!t``VJ~W-we$*eDsn`BBl)rReEoy9v$u
z#Yv6UWqHlTF%{5(!MWf2Ro|D?7sxnBEf_@1cJ))Cr~c3gBpA8uHQWGuZsGLby8U`9
zV&XQuD~j#abT4wmkTy8zZ#*-Wr4;o@ES#A#BvBmbctX@2$d$ByQncaltbuHgQ9nnH
zNo<f}VO4DH!Q(Nu`ut|$<^Nkwu&s?y(QD)bMb%yk?M4}*vY6LsMR3=<U^WO*w4g!W
zER2Baf^=0zTtt_n7a3!;m93Ddot2X;pNuy>L{i5wgLYDCc1N9CxY*Ui5kAx{sHszS
z?aU@$){ZvB@D=xkH8dvOJCPHg4dQ}7k!E|{w8=1{XCunWFxnV6Q@hJu38F&h)Ocmt
zL4~f^a2essKa(PI&No$Rchv*!TK4QO_J(Zx$-iuSI>;L_z}KIc$bFV@>4b<AA`y=R
zZS2l+9BZ-2)<x6LniJ>)DY2H9(W}0I96MrmPpu#03O(~KS_IlXoNK6B>v#*lac2Rg
zsXjKn9AjBvr@Grr8h84T5lK#X;d&*|bwq7Gy1W*xDqo4;?>6i}-aDQcs3~k+BXSY<
zGa@xmWmD0HQEBm3t?%)186ilIrgJkpt!vuji_WV~XK{$_Uox?i_&d$CR<+$779;$w
zK9D1cNE2|4E`vwDD@Yl?HMn*Cc3UUEwZh+pldHx#4D(?!8wv?YB`MVxC5Jk88Cm1`
zG&S29YaciJSXtQJnwWc4NxfLX-z=_d`F-ue;dJkEl~vLwmFJB4MDlR(g487Ah3CDm
zKc<~{;f>8oHsJ4I++*&+hGkQbO!@p^(BFEsu7en_FH)t}+?2kTWj7T?3y+&D-&JUA
zh-EP8_J<eoDGIBePX|)0P$yxC0z~k0<xBF0YQd5HiSp5(-b!(qg}0O5stna2l?7T|
zpk)XLriWN9Y7=ni@Jza~I9_h*V87SLj{U;EdK97m(pvQO*lXb(#heg!ykJ=Fd-T)k
zJfliuBcnV_4Ucxu-yA+}%M~ybB59*TA1{%_hz^>71Q-`I{M~U>+L0119lTh#mWSQ?
zvf&H67o@MATWOvoqsaDUrLT>XlRjJ%<%4k^uT<<y%+1Fq<iaYsg{_yeXZ#IeR@WQ+
zf^TR#W%z~c>(b_R*lU+a^GREZ78qkZ;{MD08MX{2-d}iVV3yF|<EVDW#At7mBBSjb
z>V(%NYi5OOW+UqP!(|aQtXi04CL-ypKj9w5W5X;dHxb%>+0%fREqK34Z<y;NuYJUm
zP{y88;0b8`zy;lSHebZOYT<iuA@6?;!-_;MqJLa^qzwAawmrD~`=*$YIllC@%li8=
zI-LRag`sClP*>l_UHUwgCd6wTT~1+Sm6CO~j-~!8)(;Bjn<;75(B13ns@^bvrk6h+
z(o_{ka(q}@q1~vbqr5$JEWd`9;@~y8C5x5Qil7h3OT7attl(SgU8CiW(f&&yFT-F9
zWkuO#C6RB2msyImJ0po(O{RFY3N`R;#k5|ETx!(H3Wcot3I>BqW1_5Si;*K#M3>i>
zLkBf9(g>1NdW?vjhjCRnAme9F3#xf~ZR|X{_9TT{a`*NOgFX=)8h-Z)wwR31)|q>=
z=F+=PNij53Ll39@7#u+w4yNaC+}7a-4T_}hW|wq*y7R+Z#(0dL_{P%J2@>^*f`J)Q
zDah2c!0|6E-A+D;iChzo9lP7`>eQXAf@K>cYK^|*yD<4dD&7>v5~y@WzL9rStGe1|
zfRTl1h}Q7Im#`20JBWOEbt8*nF~6*Sw8HOt^^J`h+h`WLCbupIz~&cWPSV?{eVq7B
z6{!XLX!j8kZG^UlZbzw(8kHL1TVWRiY21GuyjhGLNT5wWk9a0;tZh#-!LPiKL`si*
zo4;DE0{0{y3Iz$r8~?sdY5^3OUA`lB>0M+EVoV+sNlln=gz;<Ekf*&8Uj_FEmA&+@
zzB#O{b{YXiN9-O<e7fBc3=IXBd1vv+KoyL9tId~qD&?|E>s|-x=e090y$nI`uXtRv
zSFeAEEY-ay_sU1Kq&!LIpO1fBE<iSZH}A~Mh_FYJ^+sKwj;ZYUWw4|%yNwaEl?7G>
z$cT~}<M2ey@L)-lR-Nu4DaEZdq4E!pmL1(b7wHOEYof)fpbX_)l>V0D&^`0~z?`%l
zIUhQL38Z0fOK}uVF-+~7h6(ICH5c#Hp-*%}MZp3d=d7zpP4JyKyZNJW7+SCR;rm#P
z@+HdwR{NKTe34|1_f5-Q#73VTqRV3*Vy{jGV4Sw07p`e;I%>x?G2`7o5`I=_$O~4r
zMXtS}@re)js$}tq5OU0e*#8R7vSW>TaP2veOc>*D<zicF3~M1ojlAv-j=q!9m5We3
zAP+s7<CPHuB|6PFrY(;5q{V4ypuWpG;0+UL-PDX$^)d6wp0rroE>oALU!Ne&oKO%^
z&U%D;UiDk-h&$~IeO60%0VP;#FiGYSmpgTNk;9@KqR!l&=8I{Csj>u3c}GG(X`=n+
zF{gs87d#hgp`1NLbLz<yjA4L?Z=?aKT0y0u`%^uR6$L}aISJn~s1d+<bb_kGm?ONB
zxf-(8ZkT<cj2<O0yNKLlsM8`!;yY&ge9)eu`FmME)K7_ZxC72IpS~m~-l;3CaWM2=
zue#bQOW?8vt#@_3(&m`(1wK{$LXT1C^Kg*1?Ymd`e*E+)Q;Ds{+Xo-aJhtCrBcaY6
zy?W%4_CovYTIfo}1&wX9=oizt=ZOS<V`aIXxh4EtJ|!NGrmUztOY@xRhEMFtX4<l}
znE}@)l2J`US4Z+rWDck3H1wYBkRHDCc+9auG38IY`-~{+xy_B?7Nw~aj^UZp%_Kdl
zrFVeJ<f`2D8KR*Q5i?8RrgtVe+Uy>9$NoO?RhnD$TL#w%Z*S)6O#mfAm348LsQuFb
zF5}SlC<E$sQ(38Fvw6&j;TjTDzcC8)@EoNL+{VyVz4C3|LoDAVEVV67@qVjZ81m!x
z%M=q2c>K2u$`AVB`J|a79<l5l4)Wb%*wp}7BgwZzn@E}%>u_pXQ}SkfOOJpcuMp)9
zX}~)%kSydZ`=S!aV)G1L0EO<lpD;!Pc_9&lo1JYRHJ*t2!G=FUDH0ViwyJR4TvJ;{
z=<}Z+U%TBY6{+0MJV|BWkkS>tGkJt7;)@y^2}odCdaGVl-JU@tVJdacnva1i)Aeo2
zEK_MSLgG{txg}`}-zSJxBk$ZyXj&`O773NfLJb*ruq6=WhkHznjx@$O`?F%scamy~
z-?RjjNKvLr-d=E|aOaJ1!fQ|kY3lVJ_FOL*#+P@;aY8P|v2B##{5Ul))Ef{>^4MgV
zIM7*=Z7+lnkKO9^P7V;5tPS~7h`Aq?HGVCoLpjH}e8=*t;1SUM`AHf`a?)K1a*F*$
zXrig7+Vr33bCjx<eIX`H@q|rapyZU$eR~U9>cz?yJPp=3l_hviK5Jg0zQJJ)>}zMz
zz7)@-PLlBX_uK1WL$SP#iB$xWv3<>pw)ia;DH%A&@7R$^;rJWc5xjFgKonZBX2(bt
zM4@7SSLc@KP-hoC3o^o2R-jO|R6ExmsXfa2ggI#x55p&+32}<`1Nc77L!DkTDb#6?
zTdJuIz5o0Vnoo<9)K^v5*-d3=`35%M&LRxR9K<H@{M(>miNS&r98@0JE=NXlciCSA
zk01!dn`%Ug9lrEEW80%5+HAaurEm14P*ThmMLZ^ao6BbJpoB>OD!fMtcQon?tDbb|
z@I^?}7>OgeqNqF17WZ*nzpnzeLfGu~yHP)dYKhv>S~#V5T;E<*J|?N-r5;%xar9>_
z2SadaSfp4c>IZ8h*YoC?`zxlaVIPZn($5mfsLHrfgOi#ixtp!~;WrUa%=hTdK~Siq
z(`c<^R4LOVTcb}gU`2r#9>-r(6%F!jkwwFxF5RKjbj}<E(?&)FX?g5Hs)i4E5c9Ks
zM=iI#rsKAud2(mwe+ZjmwIW2y^tL2Ct8Xn}_D-g*1<jfXh>4kpdXT(HBu3zG-DP+~
zz-LL7IQ-5~$i*3!+e^vkX6coy7;56cr%6=Y*s7>rsF4_6HF)MaN)Hn|f_BEUzz*S;
zge>5#1O*ab7Ynt%D2W@!xIzfCiH_pzofgNU^R1``$~S)^Y6JK7;L|u}Pd~7SlFMaL
zD4k>2JKOGOO|p&GpuoVcZigj12Md2;LBQZ7@@L#JPfH5!k@8$H$dEj*JK2<W`QaGt
zhhq|P0i!d{Eo}X*!piQ-;?mZWUqIm6e6XH?J4Rxd%Q<zouXlo>dX(y=SQ%S=w~|N|
z^7~r8qIK?%tS2>3>lGa=_!!QG1`bw_nlCI%!a~J*`Myka##uQepH<mfr8-H`cIE#t
zz-cD^eL(ah{uM4H7g;sl8Av3FjUkN(Dc(Kx8F+%t&#|a&=`rvko|<taZ<hR!pEZn<
zLDWc$p{Kdst(m_$M26U9nia+zHaH{cR{qZJ>c8?pr{0nHjr_64(Jm%FR1dd`Wb_W&
zhZvC>K86q9KEgfrG@>hiX4KLmEx%fD)@f5~s)1VB;Lo<FsyOZ9YCcfa&`Hq{Z?$E4
zxu&bC)-hpLhg@=j$!5Cn2dnlQ&diC*<zZ}Rh<huBM9S7+ceFXDrePCOdxBnsz~Tv;
zG^QIUM=yeI(pvbe1A&lrEnn4=2ScWu=g|_gcg?9J-sb?n>L`|nI7k5ebupqvaspBX
z;SQxGg;*F`y6CkSoF~ue;NdAJ%-Rkl)D8<uNy&H%?-1qY5Iho&XR04IfOf)3i=!Qv
zrg!x?Cz*I2-a9zKs7kwtvaDqJp%p6|w<&Z5Ck!H9GqZNoSVgK51o}jbxWd?cz~&N)
z<+0TjTu3}s9|v7|qHwW)5(&v!3dW)oz<L&|V3W^a4<@Q(P*srBzIg_O8UrbDHzVby
zu0jQA9&M)ufgP3%OobWCXovDB8D`Y`Nf$UzxP;EQ8;8_Hcw$d8I4ly?tU%3?C06;z
z;saQ6v2HQ0^fQuDA3rR))(YWO<x?NZQh2#4HG34H;Ov3E?v$Wy=crh>!GIgbkskO9
z4*r+AS8|VMav!ci#tRS5o660*UN!YI2V-O>DKT=GO+O_}ejP2)Y=%ip+vgGHkW~Ld
z9;$KCB&U)(PPCWA9cb>ZZ5T246ACp`-lRnsxD!q~VrL0OTBu-nk24n-<*<oj@6Z^q
zBM4^hvTN&{##v}y(48|v8kTlPv#NpC=hc*?0aTwP>=ixFPFOM|FlBZ9K-(`@`)D~=
z+)KF=J6NpwmyF?hTs0gw_CetSrKS`f?kq=}#NrfRV#7sgiB8pnc8A$T_~0{DW-l2F
zEF6eWDFl!iPu^5-6rVPR7QaoVQasqO8@4i6N;mcQqI0|n<V#@f9gjkRxA#5iLh3Ra
zB?yeMEOuCM59yXCfh((gr7)(RQk93n<%gC$%)C^;C<ZGKpoF5qZ}&$cf({czR_^sT
zuCntvA0kbrNLTXv^UC~*f)RPf9tNjzb65mj+nMA2_(>;GK6QjYp@^D!(ukG8CX(Jf
zBJ(a$NT8eQ3Gm^4*mV>lT3A#lR*po0Ol^nC#Byy&xGvF|KA>##uyGJ2jI!wlEtrIT
z?BpR4xk8MGES0ha&x@3r{GNkznB*-BBAt)4fpNX0@?T+I`%?n(Sx*f$4aYBBs(FLl
zFSw^#kTV-n*#^ip34IAByyry=sp?40p*+&<$@wS>W%zS2-{jK19$Sa+Ws#J{&-|gw
zQL7H$Qb|}H&yuHOBO+k1fsgCXTWw5@&G=z>OP8!sY(X1AbuHBTDpu*Pmq3yNT+f}x
z2#0lHSb}^*tk;{g8!Dd|W7C1_O{oM{`LmS5%5;r9VLdJ1#DZd36O86PEqfN>PeO9X
zzcABNJtylQMnn;0Ght*if+-3=b4|aK)L>lOSD4jJqF`{61QapHZzWQ94^bwzSpIX+
zw9>yk@4^3uX#H$23~bN~UB&{5)A;RZOa3(1bt9F|5xp{yvP2=%ij2U%O|_O=a+$vq
zSHWAN-IB)nL7kv@VV=Kh0^(p)Wo=Tc7%}rWZ#U5XsxsFCM@oVMP5no!ynv$#rnn%T
zz#WBXKKQJ>07E_EtoV=AJ3<8yvZD2*hL{^JDY|tYGfx0BgP(!eV;$#k8&xvpDdCTo
z;yI0wfQ8AKW3rXz46N2)BG%srByq*W;@wpR#R&;;f4j%x6PuITXtx)M@j5?K$z(C#
zz-g7T5VcZj1>mD(F^TuhQ<2YZ1f=4MWG%bWZdF1ACNM7F?uDoHl$GH9toAw`Yn1<*
z`IKCZf^p}*7;|NS8#FHOpPMg?Qlc?o-mcMrNP`byL>aGPQ72Mm4Ywhc^Gi35-EBJ|
zIIomEjH;`uc$igQlhSlJQ*RZE_ds*c#hYaZWwM*mGR+I+G=e>oQOo_^FNmN=Nld>J
zb2kAmsY5)j;%M2?lhzUL^mdddNcDcEh$d$j1p|terF$ACDRObukmk@{UGqet#<Zfd
zvY~XUHYD}<guTfvJTGTk1JXoqU!$N<w!^nb!P5M`@`tKEQsuDXamtbMV&S~p(Mkw$
zu~>z0?!1U_U&5&+oIY9{I5GwB9pDq<n0$Y?IZDDzRFqNjVu+4H1?S*YVG<b3l+oLx
z0ZspT2{j7Sq+8f<$<35em<IW>_1^}>mon$uUjBKEjCe~U3Kv-09XhLw=)TT=0#~OC
zS#_%0$)`kY#j)7-N1<4^D`8tWnX<&CA&J;O*zZwvoGj)b0ZlU4eXOAd+)ka$I{~1x
z6tpoqRb-^=GGXddBfDwGNrWYT<}S_1NYjt2@f#SI!{s}ptOlMuQT4Ioe931Qce|Bu
zs}P8o7fR58Qdd)jEjOm!NogKX2*g)WY6X|!sPlG+)+kRo9`uN;P!lkDd8iW8d&Z_v
z(gI~15zxQGg>pDXXd*51`+8@Fn<%7X@lzdv*PFard_kU;1nA4L_t`GxF@}`{ku#5M
z#!5Kt3kJEfKRoJsJ%&nbfYWZ-#BH_(vh80|7N2J*1Rs(?#tqJ(@b#3aD%xvs{t->d
z>SxL_e%SXzIy<KJ16-(tWOusxbL~F;1!G13k#j4izEcw`*(b~c+&d39l8Y^|EGcI5
zYC0XkVO3|Fi#{CKdir0)Lx*na=h_1yQ(>oZi)sf|9S9xG_=EJwS(VV)lz78AbIfR(
z)!JbY&z9^Fx(GF+t&!*3KUGvCzTX8`jTxg1qx0If5nP2YhPi2FBy>pMCbSfN?Mfs?
zuiU@BQ<cF}tn5Fiv<YgL*Yn!fda!6mmZpJ|lHHMjX+X<S>%f$)z?W^n7mvLZ<D~LV
z12xPt#ttfSKQp3)f^%DMwj{fJt~{$*>`Ybg=3y|0Axg_%O+FGkSh^KV=I6J}8Ew4}
zKf4aXA*_dtWRbPfxmLI^7m4GB8@thYALAzmgZZ!Bs0Hajl7xT88>}-xYo8$>H-5Os
z6dz@0&c$it;PiUEicfdY0G$-ME*`c%G(uqQ?&IEL?CJ|zdrMl7EA7tjarwh!F^SoJ
z>mkRj8U*hNDRGMVZ6L*N3vPg><K}wLou^h?3{9)UJEcLkscDI}ml0vZ?Va$=>{eBY
zUOSUr3v9exRp;lUGlMldvw@x2?sc1K9WI9W;_FSV=}`>-w{@_ioDm|ZiOQt9Fkl11
zSM=^*`u<;fk`C^~fmYv*y@y8Le_hl6)jR$F|3v4B0ei$&QC9CV^?gvJw$^pXE@iCT
zcp3H9YTp}%)rSB14g7z4ZBYw--n3OCM=|JmuAf!P@@{*9)n7}UV`gIodeZ-Twg1!0
zG+XBoMvIpBrZ~#6Z$AQ7W~2E8wc~2+;uX^J_C#J@XFQnq(nmuxb;>y-W{RBeaq0^k
z0{jF;<K29kdaDU4l`8cPsL-cN*5`iEeB^1f-F0SpI(*~8kdSq#CyKkj<4dr~NKwgf
zFyG@!nQkkj`0tMQpTOta#*OTmj<j7=xL!3hT4mH)4cI^Kn!b-<Z@B$WH~I5hKv;+v
zn3q5I?comh$My&qwa#pR>&6oTp4%lskf;b)$N(~WmCM1nGkWI!mD^)2-;-c02nF%|
zR^Mu1OWU<AV&m=B+v}XJGsIZE)|jnK*T2Ag@cK1F%ICNv>W`-r-qRCuQ^kmN<9aU{
z@4bG|`>~{y^uVV_;I3-**RalgHw%qw)Yn}XS|4}#p1<&Z^wCYw4qQCMh!H=HGpZox
z|DU}855BkkR$%{+8<Ffy<NRcIma43zn&D2aGhn}1?Z`&b!*1AY<`iz`1P7Ki!FIE<
zh>@W!dO$-*#n}0gB3o1E4Gp&9`}7&;>mSt4Y`l%T+RY`TJx1An_#v*|Xfn?(TrGV(
z7sgAFy!!Axc%KRaK@@Jy<%D~-pSJP5ZuL2H<Km#g6?jtj-bhgyIa$)k%`DTz^(@oM
z!EAP0Tez&(t$$s8g92s3V&lh;25Y)SkskWT@dqCv(*^le<dokcLz@r0j`+Zw=kC}o
zSS@|L`F>#f*?R5a?`zL5yq{?j#}zN_<uPIUd@yAhy|vW88obbbTCB6@>GXOh(OIsB
z^&1+Z;Qc`N?N6j)t363*XIfc4A&DDvD9Uy0@__xHg7o(ft;qFRXurla2ciU2R8tys
zS>p9e@Y>dm7aPvj$m^Xf-d^VgtF1HJ_6%GuSElc2=`_f2;E#`@yno-n1TVg?!UBT=
z_P+b^^o3b^hijt~Us}=qqIF|Pl-CXM<GuXiv)$R8YPH86jLWG<U`z}-+o~P>MeB(+
zu=B&f%Ih;2u=xU*acZBetYF~SZ?BHOV{F<{3#D1r>%m-6qff4!ai9L4+;vpklOYXp
zeE6_@r%xOE{0$TU{rDChE@C<o9wAH5%G>pPXDLex_UyHL6Sz=s^Z$sCkuD%UE{jbY
zfNdccJccP#)z!L^7Y=kCZDB9?kI-O(Jw&g2A20tOdvEy_*RyR6LkJK&XmFC??ry<@
zdvFNu4#6P_1eZYL?(Xg+SmW;Q?$UVovw!!!_kGTXbM9a8eCRQHY^_>VHP@_FbJwCR
z*KILTdiC}F+l_~J9v9nLLR?SFb9#=2jg7%wkH=jZihI{yek-?Ug7<fhqw!vQi~IL|
z5Th}gSqu!U>n5YeS6T)wDIw%u=--=G{7D+Pxu2)I{4)@c1(jy3X^mj>T>%EC3(dyN
zh1Qb~sl9((Kgi<PS1<$pYNWBE$z{jt&d$V0Amc9=^FJZ_`7`0VT{7?o9L#Z+p8abz
zO`XB@m(h1`Lb?@{l&+>j$j#S(NL00n{O4wVlN876{_5*|tb?Ji7wx=}&u#nk^{-d!
zuuX)nME97;$T28vV&YBZF-};JGUMGvUd;HzU2TCpAw^oUA{603z$M-%;K^@BLkl-8
zUG}}3K~ngY3y;0Nvbs7#qWP$7x?X#8l;lVLwCB50EfCVU6cx^1@NuQ_dFRUxT2<G(
z&ekPj=u;Mq7fxC_&-Zk_le>t#!|%mzr8>If2`g@RxA|DuLrqUzk%@;}&*8iiLGq(<
zqv+kM*w4v6YcLTJ=p+{VV?kFMNP+%8BJNM87B6*LxK~0CH+5XxJQH(EE=%<kYq0kz
z3Xpk2<azgZ3hkWerej@<3`5oq4m%`94<Y<rg3%IGI5a%`VWp+M09e)9^tk|tM@ST9
zl1fnfaXgVVB4}{WW#7OO=&3ifqQS<=&j*sq!Bz7M_pbwPXtVMPTEg@kJ7Za2W{oap
z9iNc_WPt~uBOoTmZzs?f;vp(kqG2))q<tj!8{_dhN;$2AiE@+eml?rmjG$RrmcyD_
z>4`%BTebMTrh#fbZ$K%opRyjPHdU-3P2eow;2(c;(6q}Q%<K4p5<hes!A^ggi(J-V
z)ihTmz#evC!prI#_-VXj^q_H|uPC28U+DV*Rh(Ly8d--;YXuZg4Qenb%9KV#_Zav-
zRKbuPR~y}V;GJv)#$b;ysI$@oE9@4wVRd(jn8eFXR;10O`h|f1Tma=i_cMLeow|1|
z&-kc){1PwU0DRodXk+~vMZm+yrC?ydtQduXmZ&o`H*~qyxws>$si}%@u9Om{9XS;4
zlA1<r84w}a{W;EUPQZ7gDIh|TWbX2XLgY`)w($OJ^jzZJw7;u9vaF%!3kh#|UTy79
zvNj7S4|z;T;$9)Yc_OPVc>kJ%gIKuA`+qATgCy2>T1jw!3k)o3uBwLyh~Z9tRgV4q
z$4A+<xNv5@bgXaIfU``v1p2ScPd$am{U0at;C5WS5?Ss4X|0V(u15O~M1%Q#ru`b)
z%)M6I%Dw5VN4^+gC|;|UB>)rSsc`qu_0|YIht3~G0hgrHm#@pTD&Vp;I1Ua^5iD>}
z7>9-}_fe2uG1=Hsg9h0E+m_V)5PjMf-Eo@7#w0uHbtbKBH^JqeofW#AVYS$QItV%M
zb0&pQ@JX6VAUOIAf5G%1&v{x<nA!y3;$qHT&&wLD)Y|6rCPbor+kQDYw9j+%S5m-j
z6!ZTwT(EIbz8CSWcm?%j$gsF^j}6o^LH7b)j6|;f`gq>8t^LrNPl=}W+$<ogGe3h}
zD&JoqpKqE{bS4d!SU1A15Gy+#OZkUo3#OijwO~=eI5iaUJYS^((E}q_2++fw{yeTM
zsoaAsJ4nY-0VexpujiPO!Nbozn&o$P4cGos(NqVp9m}Fsw{7<o;759NutNcTC<BoF
zVZX{Zu&K+P<lJkX?EXcha2v(n2^Ql8jqmLMx@-IKSd;5$BIDrh)yZnW_j7+3bn!4o
z1OmLK4v@lCc37ZQZcogM4giD!uFyPVpMc;$ay+qSuF%(ii~@yy?ykeuA5Pk*ACZyv
zGa!TG&q@5ib9+ShEwNy}YvzC;5PE6U*|_o87IS;i()GNRzMi=XeYW0>COzF_?Yb!7
z?-G8v>F%KDr=e{*pAM+rxY2sKfFeI^zF=k6C;-kLaH^R#BWr%h3vK6rh=zrrJDmUg
zEGt0&nMro<N_S4+|7VU0c)QJ8peg_Js05U7C^!;u|3w2XTFb+^s)aOUx(DiPTayZM
zbeDc=Qbh`}_D1K(qVmpHegfgQ0-uLkn7(TrN@^Ib5CHTh3Kn+c=L6G*c{QD~GKrfh
zl|m`GJqBq%;YbNtCW*KKNuZHvm&{gUg2>UZ#<yF1qSQw1`JK%nDS>e}{DUkGYbLE0
zzl5ylSR5Z@3^l}kaV@=Og$9jM{W@*ngN=(iKYp4i9@{i%0Oah3Kl(=L;)(aXv|IM~
zYxtk<KV8two~W-@B<dbbvR{49n;giisc_~qF|6o0Zs@_bg}wV0L<Vs<sq9!YsOo@t
zIz)XbQ!R6ko;gahTSIopR{7OB-kt?!MK!~H?fISAo_A+`-~0n3XO^VM__dG!I4FT7
zOC)hqPGl>iFSHKXKq*Ra=a|0A;1s8b;#NnLc=&r@55c;(@%J)Y>~ogj8!wtH!IT0r
z2E(>3kUM&&$O(sl+2!e-fQDeWRZ_h23HiEjV@$fGOUGMhXX>r6FvGL}fBGFck=$|4
zhdDF^eSH)!5Bl7N6$9SQ5&twxOrOY`G~cHV%-zN<PH-LRqGR0`<%hKb79+k4WACh<
zDDjbT2PBGH`iONl(?9}c#n$u`r-!vZ7RH7kPGke%{g+U=frs5tbyG}BT7BIgd#vlH
zzB5Jbl!fpJM1i6}!POXdI|_a~<UcH_`?-ntu{t2$$9B46biNpd%zB4Y9LGG)^f@N8
zW}QYETg;pKvy)u(1`WGTlLK#P+LOR3c*uK)q-{&p-h@}9Eh}=Q?r#>2$VxWqWHR!P
z)lPqfYAIQq>8%-!A<g)em~LtHG;FAo+^s(iV>_}2`|gI3`aCfM4_-^g;^W=CT)~d{
z<<fq9))i(PQH15Aa9;NSd00z6E-)o3ge;{E2~JtEYsG1+K6?{>x9#E0I;B-jy&lH?
zB|T)RY<9M@|HT$XS;<?-Z($c%%hv(Vv~}{V#I_Q#Dt*YsYGJQiTIAAJ8FDn&(AJ-;
ztn9TyC)r*{IJF>UA9nBQwQH7`^1|~F#SX*8Wf@oUsYmEju6(M0)ko36_i0*7Uq7nZ
za6y>g7ko6TD#Uy0>|8Lf?B^mt32ZWFTUWKdlK|3zU}!(zw~mjRjNBNetp~KmFg}EO
zBz=@dW-?5^EOy7QGPo`A@EG@vwf)PRbqsDqXQ>Ec(!@37PcesO_r+A&vz+%OgrYIF
zvGOwb`<cQ2SHjJ2J1P!A`*&pK4SKsBOCP6So*s7AD49GMo^#zm#=Ky4%8)^A%u}j(
z#P3}eGBRW$0SsV$`n}m<5g7=-@17*ma?tUwP#4DCI@&roE`$cgcW~Z~f3IbPl%m`4
zNKhK0w@P}*jfCgHHp0}Q+_YGH&ttg!`e=alh@momb6dQ#2jBZFqb*aGY_F$};WIuv
z?`01sjqeQzAn}H?(}GEX7Y-*hBxmX0g-x%<z8~X5Qh&1RgqTqT>VOh#x>d%u(2DAS
z5zzEbT>4SLkS}%fz(3?#9y^2X;}Z2QZW}m4hBNX;CwtR^eGf3b5mXWrgUh#1Wo!p%
ztXOvp0>;5gc^5_W<#sowo*ZGG{iKhnV!ewfN$)I#Dl7VY&()G#L%auFr`f3ZS^Re)
zkTfB|(@VXg+K~hkwE}*Rfn9k^7q}I=UQ!bG`;|ea<59PHW;iz{LW<h^sT9(Pgk&|w
zf|W16hYD4qX$CRYh(}}G&FL}4PjUos;YKaAw8CRb&7O^aNnGJ+G5LI<Zi!{K><`}J
z@m8H~G~m(OwcPO4dN?OK+p8Pn7-h22uW3l{v~rOXJ-!JSo}Fi2g~c>&t{~qTr9#NR
z`8o*n+=!5j^LRSa@60s!yc;)bskS?9eR6c0*aFE2Wu{zt22gi5;AEyjJg<H?m9(q-
z*mFephHuUHG=2@}>a$d~2*XvZ$g@JUQTMMTvB~5=xlC~7|8h)k8x?=X;w|pPXL+?!
z4?c8GVw(hd_iBmWV`a?&@Sejv^u1w}D1({A(3qq;T$ZfV%jR14Jy8Lh>!CsPhDwu0
zJay|LU#LbRV2;Bwg9bRK_ogmqDy+OVX-HaEz~(Srh7vm0V?Efs5$t8NL8f__M*KiZ
znSQgHI494l(RWJw1M|3R$!NtMqDOW9ZjI&_Dcd?6(PF%6T+eGg?`0r&1e0q9hT}Zi
zC8wP$#rL<dh~LFSS7mM4B4ci}E9z@@>zeC-MoYLg^M9YE6DaMo^Gwl}xf9mVaxbBh
zYF(ndQcom7(p>*k!m~ML?lytc7$BFUS65QeD%pwR{bW0>sW{~PBN=N<48aS7zv57M
zc^(urQL?OE86qcgFQkE8K&4zx`H}1^qIvq9kj2c>f}V1zpTH$fj!m{(^RB-5sd_XT
zVZVW69<N%fGx=x_LV~so*Ayz_BvVZ%>}zoXkM{YPdKm;%KR-G6QWZdJ&+$&5T1CVr
zkICBRZN1`=EsN@e;Jl>F%rpn@i+p)Mn%mS|<5Ps&Cw?6;xBkb5TjB*Vc!>f+>o>vT
z^8!|gQA%MZ`3I#m#8SWUWHr*X?M0M?>FX?5ux%*(o_uAk3pELP^r)Ju<Z~yYzH_aA
zm&7h+*=_zL9{vTz**QAjFnbw;j&AfemnL80=!;511;{Vz5!IGDCT{(`zD|+qX%SmF
zz56lMu=zkx{7*!o=^S+fN|vO?6N#o*Zt>X(d;Z)l89X{~9N!*ZC_RXL<054EYrpbI
zk7(sl5&iZ)@ET8wn|V^xeoKjDTvI~>dA$ASkET<P?grA?O{P=bpL({razw=P9f;pk
zyp!NX3u^d6k!D(Bw3aQ*({tGCyA=fJ@G9#2eLbV_`fPAn2y%!fo)TCwUbEYDNSGA4
zX(s(T5csjQ*z1&!z%VF4HcO+0Rg%K)um2Z~#bI#{MPZ2%LE}|g@kC#xbdW9o91@4h
z(bRk-8)?Xv9mQYWP|d>)?#4z+rRw0ES2!7L6oz>jT%q0ZeZ=ie7HVh2R33iJU)c0G
zKgSnhLzX@ls2@3p<qIme*?^>vv(?LsgXi0SkR`H#2#6JPO~2%N4gK=Rd78_3d9=pX
zvkcP+^fXP=ZIC0}lMN*34FAVS2bjc)zvD{~lT@*+Nb)LW+idbaBKRp&{5}fuZT=)&
zB2<mFq&QL>J>C2xw!Y@)RY<7^+^iYdXKO<E^S!@SRX|p-&oA9`QBV4*@_zBe1o6Z-
z?rJyQiMK6Ge|3X^+U#>`)Da035H`(;+qY$ZGX7=GMQqfNf>E_qhMB;kPyfS`sCVd~
z!Cx%0^joMz$d@e@t}loAqcRM7`L7tP!}Nb|G-c!ebgbMY(mU4WG2QrLaMcuiP##Q}
zz+yd|&#|t|Igz|eBR!wr5m&HMb9rRnt}BUW$CCA<8<R1gZpk|0VT1U`k&0DM-%qja
zOcr*v>a`!di8<J};;iO3mBXVOIHS9-WINz=O=%3;m--A1B8W-WW3;39iW_lX)kWCM
zPtvL3PoO)#E#@duvMM2t_^T)oL9EHTRmRh2mEf%_7W=jgY1K19MU&i9@g*_J;LB@!
z+u?N+{2BsgwHQW8n<N@8_elKsi@Eh)`<32Eic`b3tS`Ej)akSu@Vvk*u<aKl=`yb3
zZo@vRz&J8pwwYi{!*9zZddn`j`E8tiQ1cHl=-k`Y*|&95eP#CRNDIa_^CDKG-sxa=
zeend9TMQOFE>~`=5Q^SEhjL`ZA`Qj8abBSKn^@D|pWNyC81wy8^5K`%2=^3)mh^a8
ze7KV8;z{N*yk`tf^K87`<k$@>!m=o`!y~e=E^PFkv?ZkYfbsn6zEzQ$-oz9hDQR92
zr+|L_x<ME2S`*=^Qg2?1Y{;3s=n%U~;DE1Jg+4k9Y-<QUBVcn=D8-v~%h;X6&qaJU
zYn|;bmn-N75<;S(XTPnN#GzkemTJBb5x6>}i*V6>FX!X_QR&z%4nH>^mF8|%-{T^U
zMz-EPFR@fTH1V*QK<<2us_68cQ`tLqjnG)6j{b8Ay8be4YspzTudZxa;xbR*Yeyfa
z;l{%KiBU^>K5~l|2_sG#dhhs04#{6|ZA;WFX2o}wR*mZIOLH~n87{98Be|$F2qHEW
z1zHtxikN25g-C|TKJx}ww_<01A=wBi4@T3No@(6AKkXGL%+%1vct@{;qSTXp28hAs
zYoo=+8t6x{Che#W;;`(cy!=B8t0oF~Eb`Cw5!$Bu%R)RWDopK*(va0m{u#FgDH@YG
zd)|w12m+VLuvSU29$rsJMfEmYmk+byeHTcx;!(v7kxDods7-Z2u`~-}tZNej=FEn<
zOLkLHB>ATdR*L1iZn@My>m|!6hh*W~>?bRN^G}Ia>qnzDrZSvsvhb{fdC5J6K?C{f
z4PiYMw~VtWqxECD^^)1&CYc7mxrJ+WgkA;Lv;k?gwMx!*-4pZp;~bE2=@v8WO-q*r
z030$+dQY)6x0HX(&v32tY$M*dX<C#z_5Ai!_f-Llp}lRn1C4$98AtK!bt}M$NCxGj
zR+f|SB^J81m8lYI+U3}aV086*+f8Vm3O+<3Pg;E>QwvRUdf|w*eoKPN*%DmTBr-E!
z#FQ%@UAv2eFP^R&GA0M}8$O<GI<Kl|j!t=`%ecAH6!`e1T7`E2*pF>Rs1XGxxk_Rd
zxgbjLq18qIy}D0~RZd_u0y>_OO0TV959S;j&ttfY8r5{CLVYH!{g(+@h0<5gZe)}k
zDhxRw9(pOStzg-DiyTmBt<Ikbq#sw40y+3&*fi!d2>n{tCMvoE8%@kSmMh*~-7fl{
zR%yXadJCss9J?OQsk~yM{yJ(I)dZ|+%JNJNuO3%}^40sfS@Y06h9NJHflsU~NX-qH
z6}ulw?bF_`w^2;rlJ2&k`yWE@vQpftb2);etqU;ZoHM`Z(yF8~9wQ}|=yWk=sul1E
zC>`8m%5O3mtV)TRv<Q*_p3`x<QmkUv^mWt4NCiJJ<^Gh;Xl<rIa|4fGMg04$f^@#L
z;pN&lSJ*%+v|E?HT{h}d`9PLRVjrShjHe}sR1!QTTw~ILaIHz__Q{@$WVBwoMo>9V
z_oMyoQ6&Dn({df1C7DDKAyU02NKGx9j6QevIDeEwAWCKFy~0p^9wLBgQtFDRK^df~
z9|EIWH9#DN#&<{WQor!Y|HhlV8@#i@ur^aMaK9wc1DoZXMg71RQj2~tRV+$Y1t~lj
zFa?k?miiU1x@$ztdd5cEuxlyP=YNo`4u0o&I;>CfSYyqEiNhGh@SvTL43UFnJg4h~
zq0H5m?|MJ*SCOyqQdmmif0|Y((+-wptib(+bY9*ag-Wr5V)9GZ{`Xw#%!_;ELQz`d
zq^|2a)s^s<Eq(@XvbKt31FXx`dDU;COAA8OYgNoYeJN%qKF~jUHk@U&y5DKn3+-c9
z_L&_4K)#3gJviD~U>Hu4AwPa6%<XeAx0d__D#qM-iogYKpdP+r`NE7uQm&Bdt<a_o
zeucNJ_VW3yxsu23*te+spU12dHvPRp2hF@=s33-@3u3vu{u+R3Ei{YA>xefn+{-56
z*xr{$P0<_c7LkqOmOg<WWmn2<S6dZ{?O%}NL9YPMze?>sdQ`bN$70-XKXP#CN=7VI
zPMO%Pqs#T@)%eY2?i2Xe>MT^z)W0FKpf$OGjIK6{1;Ied+&I*NS1REZ09)yCIk{2%
z1;0Pjm&%4MdraCP*7Imr23!QFjlxYgzDY%DSok<MiR>zkl@m%&GaTd}5kBXMk9f%}
z)WIz}nUqS^M|$i^lq();a5`ZydddXO!7VS2jt!O^%1tc?9U1zy>HLs3Q)w<@;eMrE
zEVfr!N@kTMr}o}{ol5^8o3bmZ2wl!IYM5_T6MZ&fJ=HoR;S57y_j|MICop`Nnj7e&
z95o%;q$<YHUsSO=iQ2QfEpgQ5p*GG$`=F*bQ&WCV6>aXeJT;Cu$a-MoA6Giz4pVBk
zGEh;!`10a#t+q(A`08%dVE<>tDcvHCX`JCrQhyVthS!Q+-}jBE8;~3A9Eo`<cKwGG
z)9M#KE=N`~5TQ0b9#%>xC$Ko%sOlW#Ey~p}RM<b!2I2|QWjD~wVFTz+Y)!^Fqt7Z)
zir(s6rF6(9rX`sFMN-g>o*J{_X7XwGllwpAkkuQRQ)lphSJn{;=JBhxRWTy%cSYZ7
zfU6_D9a>(}H4I?xEx%vxZ&l*O68JEOeZ9#IJh;`V#cE^8&bv&xJvOULdq+t`0)&$<
zBEPA4bc``2?Qcq6<71g_PQyIW(Nf)VL3}G7+YF$*$)|0wtrb^|jj2u6?AK`B0;L6}
zI7#NzlD}$v%V||KeadszQgvAEiJSfU$x4k7`LJj@u#{a7jc>2-v?Fm~RHm_=$)i@Q
zDJ$qD;Y|6>hsNd%bC^5xYu`UmtVm@weIO|Vsn&nz9)kw_8HtL`fRe%eESUeiEKhMs
zg!0d(@Be+eH{|YD)$?Bj7<9GkUNoQp3^D??zdZ4$xJttLzgPXszlErO=IkbS9Faz_
ztuEx}){VoFI;LRWgCi_|kH_Dx@y`2fDD3C%Fibdm-JQ(mkB}df^__Rh<cHNuY|i~L
zTS~%WYWjbYyM{j8y^-mt(?94`V%J%`B129>0d`eBgK-3C!rr;fK&lG-Xzn;dR*0Ym
zr|WOAO$nW51bc#ff)bpffo;7|I<jm;zEW&8A$5%@rv^yS>rj4J<4XinL_Z6!9kfep
zH4rrw4QT7E9X>~@+!-2d66rv6Ooe7))@!HFkqw^C!!1GyeqpNn?*rx^^;%0Yg7<>%
zs7%j3?ESPY?~BN{ds(!y+@gXUa0yu9>c2erRhh!xL6hrSk=_|ujtC8eoq+TYo_~th
zzEtr>Yv-kp2}x)*rJz+#sA$45k!GF;{9?P+8J#JK!Pp`bkw2z|VRfBCk!RvK&fNkU
zclGV<Ad9BH78{K1lm1b=p4a09=XID)y@G-8?Tati1Uy~B_lk2M?zKbc&e%5sp7A|H
zi<pj>E_>)9;4sPk2yC4wWIH|y1sG4SV+u?QrTL_wL67ma@IS6Tp?Kj&_(2FQYsb|>
zfebvIcYmQu86Tf!Kno$X=RPADsN-GA?bXqo4_uq0yt3gD2Tp)?t9_gGD+CLq@g*#Y
z<z*=PrOX18;*TJNv=<w>^xRpXE_&*ssfql8r7dKFSfAA2a-5<DXY|tH&Jyco4EvoS
zIoHF@pSk%=p>RfnzOFM!tpB6l>$-ECUB_I(Tg7Gc6iQM^wDX^o9dLys=pdQs93A<z
z2berEN%Bqz`nt#d8Z~@^Ga`OQFtmjt{<2?cSVpfnY})AJ+i>Elt26SmFS^Lk?cN=H
zXJsBNYlFD;aGlo3UPD-OcP%IZL#g6{rso>`O%alS^hR(&ApyI)Bl`enWc1<wi>o~8
zb?o@r)>i1)KU^5%zh(vd{Y7{C*vEoz!#A$)^Y!#zS3#KoArk4`7`si3`sc(Htg0ID
z(m2q-3#OduqNSJX0a25MgT|P+hu=MTA7hpaS3&8+9Rqr&6sk&dp^H#_IL<Tc%#n`X
z^#&MOL%yVVB&S#ENJ<jQ8VuXn3uS8+R=2;n-x(e)F-D{1nG=xG)oD__cG$sk=d8Fe
z9qBVoC9q?12jEsFP;34zqUZT-%tdhVS>GF|CVwI1g!f`u5lG}-+`l(pD)Jc{|9`Xq
z7^yu9-e5wHLeHXWbsOujo-=kO%gj9+r%ds7-y*UAJX%g=7nY}w0jf{?9rKn#8){Mf
z7g(4J=z2(Fn8Qs3w$P%6*S962R5o8w_Q&#;(u|4(O`=?S1%KIk>z{QhdT!y(O~7ln
zj(N_0?aufhnyW(=Ht8(sbO;`4E?PW~R99wX`jzDMu_=b=Tu07%<fK^9%Nrf`GG9&1
zEppfyWUl>Of9nW;KefSI(jp>{oZaRT3Q>k(RNVwb6_(b$H7rW?cRCFj!7l6oow!Rs
z7++UWXwRvq6nS#o;RL9?fIUR&urfZnUP=8N7Nw?VmAJYLFaURy*W~dbM^_h(@2NMO
zPoZM4R7&K$+~3+}3HL|(5S#zmJxjY^ZfCI&G$P{gvi@V%Rd;O~(3E&(amE%Uv9EF}
zu(`5)@b%c1sf+C%<0Jl69mE4Z*)5vmkgOoJ+?AQH%ls+i;pC5&tou;fDFv=8Ou#1Z
zp*y=TpxI;J&x3OIxjM&lR_M<KFI4Yn9ZDfPG^|&8+j0`IGXkHOo3^u!80>%E^`RLe
z4`E46cCk1pB>R%ZFW1&0UhkW*nYdT*gVmX_xz~kjSPZ<g*r|S#!g|@W=b;nzF3jyy
z7XQHI2(H<vhy!Ahbv{1LDv{v{%--K6SGJ|@bGu7aY~dbS;QGex9G12`0dJGe(#Ew!
zmvXVMq1Ij37!6Mq`ZLRqv#SWm-MR1D-4^dB`qp=bQXy^UxB0=RQ+F+*;`k0%xz0bP
z{S={-Brz1CYu^~I(Iw`{Q7>^KCPzaY0%ihXSD&g_IX$#T!`Xei?41rbt|e38x28T`
zUL`OulerZ>98EOcgN8(NN^Eb^Botb~$u2S1pt*G+7aynP`+g|2{!!UFe-;^HNhEQN
z1t!B3K!RYhI%aPKu=K_&b&BuY;xe}@>n=SX#}iQw2MP8>dZga9RjHFv^b7kYnjJEj
zCdAXbrrPjq?XB^Q_0kK7!Z@WtA23^FYB17xm#S*k2d`Rw>^nzcvpgWK-36ckM*9;s
z0*chmYoB{WgX<K!4+)~|Y3FQ~v>{s1V!dd;#<{H@S@wt`gD+fFSMZ@tL941JX8yfK
z`z;-4_S1~~w^h||4XJ7Rmw3ZCy@-3*E?Hk*=T=k|u3(_EhH(0)cUpG#ZXClRlOVGx
zRexEvMDz_@ogQ9dMfZI7lbC|$Lt9|SM(|3OKw}W>y9kO_W_5dz(2Eia^!(X}hCUiW
z?~1><4}Y`d9FOIEqNf5zP16A<GJu##+A-Jq7e?!s&Z6@$Zr|CxI2feRcm7)^pVd`@
zaYQGWFdS+xy!*6BG~#O1+jVccb&D9vdzS$s;pZ{(XwI+t?E}60JS=ZEKbLZI$CjG5
zSMe7yt|qXzkAI~PG1`-M(KN=TFCZ;_a?kGUHRT*``>a>et`&ii#~_iPLNh1nrPAtP
zjba<^!^pt=1N4vAQV+TFCc$=Jkn*ix?0jl~6t{tlef!$P-)*-&`dW+V1kWF{p8m@m
z0(=0TbM7g`#So;#ROYa=)gjz;-Mep1ar`ZyIDg*qLnkzhUa2!OW*fLpcSmTd8_P2C
zdVVxFbzKl16Z1&*Z6DBYZcuKSFM7o2<T7L5JQMoTe#q#bc;o7SKsa{8|LoBzqr3i?
zdZBEVWjT|wkuj!hO@;qhE;eSCNo@(a1VKE$xhWtg6}e_v46F^8jf(ngG*S|@PdHvZ
zQ3_V+59jQSE^N!J8P_WN#f}{Z$m%YpI(jiY;ZQ>SSzbG;d5F7YL6J7Ms3Bt&mm2Fu
zEjwLO7r&~ebUQz}-MT~AM$R!|MZR4RmKt05O6HhX%RN~;spr7R!YU~&h;k87@F^m6
zrO`omtj1e6nIPR1C4Lxx_`7C{_vEWl7S8FDAxiq-^PGLTUoLrVwH-(bTePb(D5H|v
z>n^)vK4w2hkuLYsh;z5d8}8Uv7XDFp_tv^j()0L~%&02{*LI!i>@ThvGV*=zk_TZe
zWo10NMOlm}4+)nHFxcwmiMBh=)+awR0={=XL{VfB=DA%PVK9m{{lP+JeB&f5_xc+4
zF2n9rP;~Xq{MD*COS=+h1&7_~b_+Bq1*_7P(l-DM2FMi=I@8sqknLq)fbsKnAW|2%
z75gLM?o*x3(0K6pR;F((L~mWJVMy-^aBxJBWz(4{)TVUvN5e;i+yZ%YJR4F}8NwmO
zi^mI6$PfChf-<vfL`)wrkq74Wmwm9ObO`67De0AxYffD+5ao`xF}4B$bjxV{8vLyI
z0{hBrXfl$y;|W12EtT6%NGciciuO1~4e8j9T*7Uie1NxacxY&XT@I}8m%`36_}tL!
zf3w&5njh%tW3f2#d-`mf25wftB)Ul)-|BmLLLVvjJ^8l5PBB7*?=r5<(30-YFaq~@
zs<_v}STX!TCha5Qo`Hql7iFKoWS-W4cA7%Itu=yC{U_C)!-&lfJYbBvRZz#?0&11C
z>m>Cv6zPn>*F*ccFr<kxqv*{mi+wOw)alu+-C13;MQ|GQDSo=!uZgpMF9KK`^^No|
zYl+>y@8s^!bbR)hfgT0Cr?0Y0Nfka5{!4BvdRyNeb@3|D=mj~bY0cQIrPtUrT#}Ya
zgk7mf@8ggWxQxfM@SM|an&*0wyH+|xy+uNyh4sSUBT?@!H*slrU{A5|-^CT$7J_!L
z9M?Gf)BCQrebMbU)A<qsvdCMZyyFi5@{<P`z>;Qo9^044%#C`vIq`C(zMSE{TeC2|
z4m-Ify^6lbc!as{yPn@)H>+o5U-vhuW)K(0+}Sp)9=h^TZjbO$cXHnoz=l~BuPU&#
zvZC!GO)Ue5%lqW9NNrQBAWIi_SbU_uk=kr37hOp>iP2r+8xZVKqxZ~?nY?2Fp0v;s
zxnsFddS*3vh9N*+cX8i5J-0syqh3z=?OQiaJ7PKuTyxRTxK85B4Za-5WBB?H?DYA&
zf@>VIHz?kj&MCKxJ7xJwckvGlj_{Lm4%<1U^zXp0lqkP7fC^d&_z-afm&AB7#n@;p
zP&Y&6qlv%^w&z80@!Fx47mGho`qTFZWZ=);*PA)B{CkB*MJ&sC3r{*n{rBOr20A}I
zZTMH7*#E9v``Wt2F~U~hnXu@y1<hL{Dr^VoU$5s8R;_qs?uGiKQm&ymynd$>k@Q)%
zy5GVHD?`kK_;u6Tb5t^#4{0r&%A1<DRWd|qDFK4W7UQqGW#j=cnv)hO&sPl2Gjf`C
z+n>X13h6aOD~GgRGZ#y2=p#_<l|b4LH|&vV=sG?rS4u97?Y-Y#GOV@Hbt%)wgnK+l
zV7wVmr1mQ*JINjd)79*5tbB{6=ay5#(t`HYIHS%_aQv>{|KUY#FQyFTJ<@EZBm@Ax
zcM@*?{$mi{@~6LT=(X=}{l;pd`e6ROjNRa_mrs5(`~&qKalA{S-G4si`Rs3z=HcPf
zRlkS}yMBFbX%}4I!2RIIdw!u~!W+@~%px)OhI5VH0r6c^hd-PSB^a1z1(!l9d-}a;
za03I!xX8++dg`oVBDa)8!!9Gbz77e>_Kz5UQ#`{M#9#toS;;JOCme=(*Z)(eF3b6j
z4XzcT(3lOx{|kT7X<olDY_QoU8DT8&-Ik6|zY~N)+O~oE<b*ebFI9M0+$}@xS?sGd
zF%3{lfWzXD{WZA!-Qe;1?#8;-a$Mx1SulrKQDgd#AV3fNGV9}H)^P(DDYvyLw-5MM
z#O|NZvr-Ow_}LI*$DLd<Qv5%EYeaHDLtw*5GhNeBd$B5@sKpqwVD1!p_B1w_`}{Dy
zK+S)J=gb|Wqx-@9`qiiTk?M7kfq0)+xM-n^rl-5vf)aX7Y?L6k5HvVN1s7@2qPp>m
z;jyy2ilhnKL<0mvLN>2&A$9Vz+Si>~99?(s{nz2^mvluH6lyI*E$hZFBOqm88>)lB
z&M`pl104lV87nGHn?O{*8|&v`QA+!GLRVU%IxN?<dwyp0Xv+}(t;}xmOSElGbZ^iS
zB}9Z~#roB$>DM@RR|UT5W=dt}+t81xejz_9N5x!GjqEetP}EBE?fLy=T=gg>rfweZ
zV7EKnm4u+IokHUUI;bjVyRaM|NiSCr<8;YiNZ1aX`=fL7SdeqgGs)+ko#wJ{KFkN8
zFZ4t%hJ?C(A6(X3x^7EBLSLlGqNpdV_sIO(j86w`wRlS^B&N@b{m!Ckh8(caII-oD
z@LznGy^S?%fAW=5oN^)^*cL0rVa+Z^cxdoMJ((MXhtM1_GUG_xF7(y;5BI={lhi=e
zr_1qh`omt?_#|504$QQ9PY}0jBtOWk!ruWY>GPe^*K^TVyEs0nJfJhx=MPfOT#}AV
z0)&kajx#0WUIN=mP0Ut)CG+T5PCteo@`@5l$-48e4SqB$lzhRi`<~Nh&MW$9SB8OR
zE~0RtHB|=LSn>8+`e@ft^t>@K%oa{8=HpPm@9K+1-bpUo9E*WZ7noVpy3KE9xf|V{
z+rH28Msf;Bt^e+EMq@DO(|_q<P8>*iiELA+!{d#-j%nYpq%pm^`rfAjpnn&(RpkxV
zRtyDgNfkCr=Xl|UO;5bXPlrcf1;`46b5rU3-m^)uRVK_m(*8N}9$o@>`ukzzgSOwD
zcfQUW69u2|C%9cw+p__f%QMHSmOsU8u^Q^%u$x7T$k%p$hS_unxds*L<Oy7&h|j;L
zXO2Cki}toGp*hCe#t*iSm|hG-*BQTZxWfp)WWe4aHe`kHhG<6+9Oh#Lmv^L(3J(h+
z56>K@*4N5aR`6x`1v5eg%W2Ef#zy~q9THyi`FaSC#;k~5*uJetQ<5GC7NcJ%h@h!1
zH)KkGt>7GG33;~Zukkrm1-xDdavAyG->YdSq}6}cjD1Ak(N*@+@ZKqAKi=g4$)uUr
zzH<+>tVBF(o*I2Z&5Sj!^7~G7j@M-sfE5rTMZp&0`SIyhfIXCvd_f1H5uWu{k93&W
zQ;^AsarXJxMcfufCBQ3Q+u6M9l(<<9UIJs^NjkEhY>2E#(B>GLRL62`j(mk6j>ODd
z(2P$tbSY+$n~kLOvJ0c5yA<}6jJ(d|mt}`$SGY!@Bd4R#q*Jbt&nbJ+ULs>TU4B<I
zA!%;1NP#G+e6H0?ZTjF7)U&2KB%8?{`D9pXnC20rg2+Tna6rf@7|JOSWt{waZk*yR
zyYGk&=G=rxEm(ALq9CBJ<Rz8v#R*ppF45Ta<29B1CUfG)W3dnTa6a~dMXoePzLAj5
z6gn1UR9h?2#k#D5%hiB->dK#M;!huZSUb;5;3^L}PBUMldq&=-f!5F~>c?(3x+R8r
zcZu3LtT<|?i=@9ly!D?FjpV=P+I&Wg92!bA9y=w~W2`Ve&}#-+hBkcEQeMo8=1Ac0
z>&y9lvEi&S(a(u`(gY@QEdqiQ>{fjvp2&O$zCT->!WyiI4opZ`e9HX*M%s*T7@W_$
zC$J2ACAz-+ekn`B{Rlk~LCHL8?{=(gMtxgp#DRq8Dcu>V!Wf9&&ibG={pGTQXg{Sm
zMc*v#y<tN~2lr*1?COxe#Ae12E&om7lMPt@+OP9L3+7buE3B`&AqK?*$w+u<<1E|u
zLanQ>1g!m4(5zrYn*1CqW)lc>IXx6DJ2b76TMh}2?Vjmzu{&YJ0PX-`F4O*aJ5W2g
z5AXx3`x_EXqp3)oMB6nceZ#BqY8zLel{agXmKUtg-%sZ}qXdxo4-{M}I~At6or=bC
zbTuCd1V%iIbfPWTV}#^4v}8w`2I96A&Op!YFMdbCgZ`G^l_44Lzv!%;cEXcHd_eCp
ziD`(DIKwU-6c9Lys>>Yc_@n2eUfN@iv)so`@J|nQtT$={;da_*arb9;gh-0MSCwR)
z=QvZyK_5MZujQ_%@HwGd@@@|?6IB{nfrA_v*FHg<>8klQ#QiO8JsJlHb);7V6`Ez5
z_qrsuu{&K)BqE1NDR)ic%$))DzC*NBh_P(O2>?~SW}t7o0%HXTz;mynV%k~7mLnPW
zKm0+~3o-S7oQJr*ye7<+mLG(#zvx{)ZWsc*ft?jfI9uZCNWSq+BygeEY1nfbh+7aZ
zwhk5x3|K*4JDweGkqiAO#djVRcsg+|?7-E(TBkCVRPYAkkm+X3azI2Io*BI-dz^Hq
zWJUhjY8$>^#P++ynrh?>Wx7}9H&>$hbW>PA)LT@$#>>u~;R!Dx-*Y|r@1H+Bq)KMv
zw~cKtEyi*DP$6sXuI*!pDjO3C(Am#&rb)-kC#twalV2`sHtT&zrQ}2mw~R|J{iip2
zO&d?eN=u4}DF<VPpEyplVAiZPPUGw$<-(DRAUF4?F_>F-Mn7j=`~1s;glU^kR2#@F
zQshhSB64&WIzr(Jm{IRgDLSt!Y5N{3x(X4i5}p(3U!@;e`1y7vrRcMb?M(%Bq@FsJ
zfQjETpE0AnzFr@@9{}-OaSXPDIDQr*#+F#hX5(sd;)!|GDw+yC7JtYsmIV83+P@Ee
z<hT7z>&kjH{Suz22fm;TW)M-{$vl7g1QtevlgU8gkd9Vb^0Z=@=(PZ7Q>2H*^!{9?
z^U9&C+jtl1DiRjig@14P-<M`nnT7L|_IBk<)^%>r4P5V-uLI=9Q^Nh;?L*$Rj~ng8
zNPbA|Twk59*fS>_3sTQ#)aCNj^HV;=b+05cI*_nDr6Z8KM90N=i#knPVz>xgqn~^D
z4|>OSoePbA6Qb-%<_t4mD5u6UOn1mhfuM_Ue|7c_*|jZ56{V-;lEDI5+}Wj#AMF02
z9ZdDx$wOT7!Q4>Fq{(O-So4fTh%~{3s1vFeD+ONcEuSogw$;NAV=H(Cj(CD=B6+l)
zBR)BH`Y(_hkEG;h%(1f#q`r`Ti&0<pNlpt1C#)c*93|Z*eXph`BhfQwSh%uX9^@LY
z-QlQ6*m&o%Agm;!+n;?`9g#rNPn0IcJ;%Oxx*pn64gP=z_v>vJKM*y@{Qs8}{$;V?
zZ5N8jv!L|99{HbzhkuS;o@X%qSF){tcfq$0I&gIV?gVh`MEoBj{x8cI|8m{Jum2Vm
z{>!odDO(NN;Hyvn{<gO!@8STh*1vyfFALQ_75%qoLxHdR;86d=L)(7;zY6|atDT5L
zWc*%<zp2rion{D&=JlKV|CDV9fmjVRCFLf~DKcwKbHuy6q4M%n6%2(+)@4W!p4dk2
zO)t??;n6m16#A5Nla0%8@t}`>lT6`w7=)h~D^w%3tP0$@j>Z~)oK?SF`=dq1b(k@(
zj^{ebty>yhwWI<1XfP`l>+2*!<>nT#YlUgl&?KB>qt?yMNhKhi2XEz&E-%ycb-Rw)
z{FF<Up^UD}eck0%uWcRGtk6Mdq74u?T)BIM4-&|Z-#IL%iN!qktyGPV6B1i0u;u<t
z@^;rT8<gx@|B0D?l^@n-EitJ{Mv#~0HZ$jnQLoI;@)a%^9(B~F{u4io!x~m9plFyq
zO!Qk}w*SJWfHqs{@SvyFppCL`Rirp$0F$ra29!#nKpZ%&tB6=}>(|563kkV=+&GAb
zvG<pbMp3X5h<Q1`f<u>aO6f(*W5Wo63@LwBQv+}6=yiGF#RhDG@&s2Uzg)5|9<(LV
zgltumx~OvLeS8`1&#fZqtWfKnQLiISyP0SYC^h8e&9+Frw?*s<G2{jrD)MVeRy{RP
z6|>t$Lgo)dvJGEZVblc6lS$S0NHnM^vQNE^!m&=a(-XxD|MW2OOY2e-Iqvl=18$X}
z1fvwYVL@8so2788T>$~BKskD_RQ()@eWeAxU2L(}7t@Z(%G;#{X<bWUoxFl0E3R0G
zj`CIWsQ(ss)RmqaZdzQ6bpB7~{rN8#UvhPPXT81(PN-od=IXs3k|CkDQdt#nhSHR*
ze=4GYV7_>MW?($v%P00|!6tDtL3sJFTvBPHE;}FQ9_sx$ugLwp)4MEhK)8Qf<KgqP
z9JiQtSiPnUSP!-zi&|1-KDQ;+9p#e?Y8;RObc{5In0jdzYUb>%HL8usNmXSVzFO?0
z#La?C>vH)h3DOm;aXdrIiOe(byZNC-#F7kD6-j@p*`H}DX0i#ZCQhv4bkS9-gl$TY
z00ARsNn9>np9tC>$FZzr*&)t5wW;i=oR4zO<ipixq^uOIswlock$N3i$?5xtxuykT
zuXr=`@+C@TQHDM5*;-P*0I|gzhKJz#>Gy_dQV)tMZFQaWU9!3cJu!BPM-%86&-dSK
zA)c>x)s;}`@WR%X&ez9n^7+>_ceXRME_PL>8<u)^%{9tU2O(gC2!gO1<U4Db2D(05
zuIUQDzC5~q#(V)kD>6;utI6%#)mzCrV(a)LoHcJ`GPi|FLoJZI`02EwAe<zlmTa1N
zYboHQ009QdyrnF$l-mOJOzN&xw6q%K9Q!Dju?W+&qhU@0&sjYE9Cm^k+4{-bVitP(
ziu%9$PQM&VWS4X$qzuFhlCi2LRnttB5BeH^d4_Fd0?8PoESPm^R5BE^zy;Lk6=K4V
zFE`G)>Qe1I^17_G(j&J<s_ewp_z0%U&fsrLVSA@aa$~PtL)_cn(uc>=lvaU)Ka@_d
zd~0)9+O%*Ab27EVKMs{vGv+pYzb5A6WEoJ}H+u20W{J3Jjd2#~+yg&h=Fg%r?p~Rp
z5yfl-mA_?PrhTECI6%!}lHRx>&sKoSS$}^!st3h?ESq`xT|Y^~G9pgAUB&`5w<(|1
z`w4oqNrB3oV%h@1r)YOcZhGUj+m&x3y6^3H!!k70hQH5PG8^W*%-=cK7Ga=Ha%9xs
z{^9G$(=7*=bqIYwvoGqwly%4G;reE8g!DaJcG*3*fU#Z>8;zSz*mL0Wvsy+i;aYy|
zoa*By0!X^FWsmW(Ji}<%p;RY=tJHup-3O~dQ6Eu7nGukqfgnf5xu#c_A7{<{z~I+s
zk$yN^Ar5T>)@u%oNGBP0v+_Sa*6M1P&8x1`c&9y8fEww*h^?(rbL_H!sJ2Flkf(&2
zz_9+)3<D@$NTW@zoYKY`k+TscK1OjwPM=z%j%!NbP<o7^#ULQ>*V=uIVLSi6QMcoc
ztOC}o^@J=XLHF@%>*|ssMcEQs{Ezq(lgs;U*&bc>8d0}Y%8d3mA~VDpy}&9&+)`fD
zgJ@!0P=3ASypd%e`b8rTNHABmUL!-T9k9Icl!si+o;Yto;@LB_?TAs%Inw6#N?l+{
zgr=mxoWnWsbBL9<Da=K`*>(LyvJ@W}SlvBOT`JaJKE7zzY?%}f%CO7vhH6#XFNcV+
z(%KRd7!zm5g_@nxSl0zrE$wWtHFIirNUO2tYRG+k4dB#;CP@K)L;kRNKv=@SDp%!{
z?xY+r#nu!k{bl^ATmlG69B?JdPO+3~ewf5=>XvuYu$5P8$ovu-CQ{6MH>q&U7u<H3
zsaioSYmPX}bbi0&w_?>Lq09WXP~@0d%1nas*~zpn^tY~lcXualmhi|`2w{@JHU2G^
z<ZMK*;aO)*pLZNsD<jHB7S|>VkknpdF9-T)0^iZ}w4SH;uFjc^K17?GLZ|7mo*kx2
zaDnynytBsL<6X`<t3qxK(zm-l?#z+I8o0=0-W~d0)+I^m6CJ)P&~G9GpXsH!h4SW`
zXi2U`bZvcSH>I6^oJsTh%@zRI^m*7#iv`3p$LrE+&T+Zb=qeqtiD9$c3!<3FXE3I3
zl@hM-y(pQJluwd9Zp&kxww7BeiN)L3xjNdhIpG7hXZW^D3e#hYiu+w9SkQ0OEJ3vY
z`W8=?DD&O1Fj(E<pxqr56C~7s{_X$vzLFN$;k3=1<ytj<`(tK`*ZqRQ-BHhWd`0Zd
zqfmiwx1M7T0!5?sk*+n_ex<KsyGusBYPWJ~{48;p(wku(>!$CRzBpuc<W&>0dXfy;
z^5P$?<IwhtdyhR_#7dM73r_|4wW9j941gf*;auY9bROv<CHY?~s4R|y)Z$gjw+iR6
z{adN=Hfko+sp~f_HVI+#4mo*n<i_cjs`7%|TXLDL>bo5S`6QzL>u!3C)+&)nfL7p8
zqVk6{u2mqKffV<m7YvWCxIeX8_x+{rfs<`}{6UbnZeN7fnT@3&NlPWc(TD4XyKr|&
ze)_LOa}vk#9Ozfr?d!57=hePHeg^+6)_p%%-xP&tim$2|-fWSm^3ajo%%@CTZ{3Tp
zATfDjCd+_|?*<ixoN<ms9}8VP!!%Y`nq1)~;95zndZl??-V;)VYvjI?jg7NojO-ck
zmWFt?9~1esuVv_rM_Q8mczl~9AJ?@^upIp#Er6k|y-p()BZz>SJroZiIo8zDZR`>f
zg+K0JY!RsFCh`5vE34xdzgK4!l|H2o88?}CUve5`meS1Q9-s<;LqSp{@crpRJV|?{
z*1Uc3>~dq{uh&7q%G;?Wi}|Ls#U%%4{tD>_ArAS-+r4(YyfHX0QNKR-)Oz0GP{B`1
zM#S=u4{mIjBhYCbv?m?9J<7hac^>_FG8xm$<5tlRLCp!}x^<Z~u|Lub(Epn$0~-(Z
z+H7Im!K)kuAhAvdfaIzEm9RUmcTjlWho8d^e2h9Ov+kl-;hknvZ_j5cf4$vGe1Dq$
z-(>vx#H}7xQy=w{>MgOxrF;8r{lBvG=X6gn4&3gTws>g3&<^JR(ECdWAqo(u^4yUz
zHhIlT>B{sk?7eqrzI}fK<E7n$2_BeYKuhF1VK0(R_I(+dA6{O-7gMfu6)3zsIO`n|
zPJReqd;}lnjlgNi->u>=mUK;+1qUgF!v%$W7e!2;AlO}}Z>5=nXP0G)>w~%%UGbvH
z5qsX(nxkDxguJOJ?N7_3hAr7@+uLvO{l!Gb&1;XWRrY<Q8Ak=1Vl*@ICn{KXqm5Dt
zb@CIY#uECzx+r$mFn`fc%Vuly*?$8esI+XL^WxRqFCI4*v`w}GEKvokQJeHh_VVlf
z>alc;vA-`A)U94<{xpHWt>p!XWfd9Ehn${ylvY3?a(?9YG~wztgo|4ur<{v6ssP!h
za`kxBpEt2g-Q8mYZZ+F4ox4eRRjy)VvU64UI$?xgWWZad$(Hf)Z>{ZmnOKXQAn0qo
zJ6A@@);Oz^lD;qzCt*~}aWy`#qpMSK8maTnw1d4-pmt_PUijs&;4XVY!p^L^L#1nw
zkp`jn-~-=GD|xiAKxFpeVJB(nL6JMAvx$rD{nXi6ypDYb@q`3CUA;l!Ymhux#Zal<
z3ps4HZ7@qncJsvRd4_2dgXHsy`@0m{tlqTr*9D;ttI}65=RIU3lus+J%{@Iqy!Icx
z&V}=#n`!s1C4Kt^wLiO4I6b#<BdY3_F|pSt(QD|G0z}99A{ILQ4(z@?Tzi11DVix9
zP2yI{s&aiPzW27RNtKtNXvXQ+SMc(PK~HqbacLcV#d?gp8yfIL@82^J_uJSOIo%MR
z4GcuF4ln1!Q)1J4t^4u4AmAxwQnh<?Q)7v~o_yme@=XT$kT&!EO7)NXHQ@yV1H9<<
zplc{kFBV0AdGY(F?(6$QbfAepnQbGz3Ay0ZnTmOVb99S8h^EPH`({e;4&-x4T@Lx?
z#v;>P6B~z20XO~r?io$H!aiA3GhWC^G^l$ghP+R}0~0?6<a_?~M)S*WpIglpjqr<9
z?%r>wl7g77>rhNd%3fVvz)-KgbFzg_&&NzJM<gk!Ll4KlQbqji+?}Cc`7sQl(r}KO
zGP*|`nNnkmvK1tX4&mgHD{lX}Vp|m_>g<x!X+VRQyMs$X+wO005ikx;OQC+rqu7j!
zHeBBC#2{9Qr2{&mq}7XVdn~&Pt+Gi|<InxoY8QRpSURaD3Z!YEwrD}6rS#sgCtwKt
z9XE4sZ5@2xq3uoe$i|UsrljN(7Upf6(dwgPrERQ3Bu!em+?FKfQE>wsN0!PxFXH{0
z<GGTFe(PZM-v6tpg%tG{8Nn9IEQI&%*0?pRZJ{%>o1+D?u#FJHu-7#mZGi%4BYt<-
zz<rZsid2|ny&j}$MlS-f7$l+R%Msz=OB&jgr@yAXjpH#W@!e<Vv}#&1kiMkqwze>b
zb?L22LHGga4*wN6Iww(}P_v8<wmWWk-EvWA;T#geE+_w4^pek;$NiyGUM}oxNp|ZZ
z=hytn-0m?{klz2`ea)UlWG>aW0E)#dp$Ji44~bzX;gVXl@bjm)54f~WPspKMTyML!
z*({{^;TA2kdd^O|%jt>^&e~FZZd9KSQ_}R0%JR0H0aW=w{P*nOPiX&7fm(A;xVaWI
zJf*CD`O!E&-{gZidp?OkJf#dnEBSEIN7Zf&jQbKDRLe0HKKInP7{5pr1G2Rd^tr{l
ze>$Z<TQiFaYFQgvSx4X&RsdmLE;^i}S$qqaqUiqRoVMNu-;gGdmorc`p6cc0Jg?%_
zWrt4~SvxYPdd%u%7;%i)9y2mND)w?$Os>bO`{;fT5n(SG37CYUM*s+A*>5$iE7g}+
z=d8e$R0U8bmod*H8#>0m*>ty+UGrg;_9B{Fyf!oP#uiTMWY2<OhLQ;cVMmx`IQGep
zs$Km-2(q33#1ziUE(>|scYZ|v4BWHSI4A^rzA}y0i?YU9t_uOc&~^QVC1;2s?%anp
zr`3I<xfvHUB78+>VU1E_AQ)!1L|UJrEF3kq|7Lau_)<pP_l~^1g`Y!N=!X&A+vOI}
zLgp%*S(fNe-v7hiTL;zkHSK~B+}$-m@Zc8gBzS_my9Ot?TW|}(-GT*ocXtTE-5m}b
z<lwV^-^_jQTi<){R87^)f3vHoI@H-`@4b3;_w#i3+N%M*ZcNT!x#-SFdkaahL{?W%
z@R(}raRNNi8v{PsdMDbMWQv}}_D7oaaUKOjkYC=%%dm?+R+b@qhhoQbVFiZ^^Dt92
zwI9tpnE5d=*ewqnHCH`TM2<sJ!JS_nvGp1C8ueW)(ITFaIDRw3IEG0LQwfExt-sV|
zy*Bmebwn=S(wiXN!v63*{#xGH_-@R%OjYu8uj>g4KaY}tCoTb3#M!dE%a9ukY86$1
z<1M+=XHLnxb=ilSp`hm8FT_+~VyWtL_+1y2wbBN?vW4M%{Or*YtMSTL(s5hYcX(fI
ze;w+(Z%s`86mml^?tshZH1T%Bv-v7TLLY)g#7GWOxLWr{Uf|y>{0A=zff3Ev=^mI$
zu-(^L);vBiKJKy`B@*u3VkrrDW4c>M{z0OWNTPF+mU5Doo{5!zX;_E=ye<GN1@$~7
z2a2HJ$vrlqUykB%m%iR8{Pyh+kX+b>fsC|)<HPRrL*&lBCM8M9#8rP#Ub=O-)b!FD
z1w(egEy653Yj=g%6h$UTG}&T2oH&N}o(v6lUZHBuW3cqy<Zp9syRQa<gra6Quo!W}
z9BBVEqlh7D=pu!f0&x6wRP=_4OHnk0d%BC}I3;o*oJ#O}&eYDeo<vA5e#fK!_k~LF
zjL6=I1i(ZF+3^`Y58sL<5$wk0Q2SrO3#I@n&pUG0LFBB?t6<6R40?LI_23nG-ilZE
z8y7|-<?T(@1-3y`td%pjMJ;hzUGQlXyUIgN9L{guut_4v(9mJgSK)!XHR%#J48g2{
z^OFGzsv&C|1{FhTmjeZ{GL0=4*LdIiy>~;qXpsK<4_kA88iN&zYrMRwCv%1<a<sT^
z(Xj#aw4E{#YbYZ!sGH2D^}qUy5#tXdk<o~>^Mlt0?CiNu>1if8NDe!G%uJKfl$)6E
zc+0xgD!A=^YgDO?@Nw(#GMBxKk)hL_J*_yqKSzG8ne$;KcN;~_^(5kAP4+ge_!8o2
zBIF+5yu$&r4uxMzG3+5DV_3%Tr^;Lx1=MBx`Wlck%}Pd)K&Y{bfOlwbS{)rGC0(mI
z+A<Gg<_FoAy%fORG2Mg1I2OtW?Jwy3Pl+$@Yks?}8DZ?6l0@2f(F$xtF#6F)t!yX)
zWv%ul65KFMoVBGOPzKHO^=va-aLRnfI(A&V&q7<oxpm8L=Uh9YEUbtAaXiKC=;#1D
zB<Gx)6TnMgNohMWcz}4NZ<j1<ydQ(??BQqW>Mbw|a~>pN%Igl8GKI*<A5jZ>jm2!N
zBN}*A1^g);EZnwQ-+-kC88d~#=H?`4vfQ^~V~q??dlV<ucYnURFq6}(>B`odBh9x!
zpe@2PMfBT9cW_2c?kWPlr0ym`&==qSblqxpH&p>wuROcxt`q(KM!OIt<l^F#@trul
zE{KB3N>koAw%dgy;7KF_HgYNAtsIZ6;Z6j&a>i%q5N8>4&9V?9xlruwb4YU-KW7NH
z0Dd%D9HqD=HvrT9=@}6dDeRDIZY|5nz_w=5Qmg3Uq8K{RQ(l=U(fT;6RkcvtFqN<G
zOe%MyJJyKzx#Sc(g1{QlNRD8{tgorbASuV#n9TY!Wp4)ArPQDd=pKIkdK`ly&VWD9
zk)yGsiuR^M8o5eecKdb(h5s*t4RtC0ku=piHPv8kgFqqC*tI3kI$2-!=$|V2D~%?J
z$v-{!NodIUeh4^Hhg<HVEk47UWj}D{@s@>OtjB2U%fI~nidw7?XpXbq9N*s){M~xN
zsG(6=z!SBuHmG0U-I#7$>tHeg<V0S;`1!QkkH*z?*T4?}5aw_wqKWt2#~zdYi<fNR
zpL}_F;KxA+z%-jQHJIGp8uy#vgTOErwT#2@!0fq$uWpz08X|X*Kr$t@xMCpl_j18G
zBLNpkCpOY?`Z{=VQvYZ%0LH*d_^xZzkVcKC(xa0*gjMRF|C9(o1>jKX081>41kYn8
zHL1Y))ID>+mAq|rW@yHoZ*lerFb?IA9SiQSoo{@1N21e_4$T%JmK-tS0%Y?j6)?qt
ztgeQ=F(YSY3O5Os`O!SL6k#9(qt?^Ak*)uoxy<nq@gu^XjuCO{-d;b}DbkrL;1A<a
z)Rcn&f<|+5^U1NMj4TTdaM!i~`Z>W_C@NM5@O7094@zw#-R?|GZmtm4_<CQW5xTka
zQ|2c2GQOu>ZV#3bhg-fQtT<v4H`dTVgmbH?nau8i8~?*)@_1{~j|adciaD_?;vY4h
zFIMUpdAv6JAP6c-z!&5Bzt}eRpW4_a521fnNa(j7Zu;FWdZ+f3q~r1qumsfFLZOxV
zB_8Mm5kI}pKe&i0mb9iHP7?N(X&Ae?CV<>d<sVf)1@+#ZeV?x!nA-Dj7FY})2rt)_
zw!IKtjX)`%WRnTh*7;&Pg!zq~vL{2sD3qhDOon_h&i9h$-gGF?$cO@f(H3dSkoyJ?
z0+dNvk2As3N_$~K2AEJAn~3w(h%}iWao;cZ_l*v(zW>$zTwjC3Kq^@(%S60&a}s{G
zE?M&1dZtlBk^LMB&rdsEE(cfeD-rtAk$MR)z~-h)R^G3bse+A)5cqB=a2AnSTh4Yv
zbX?-ihSN+Z(kKvlFtpmS_&rpV-8OY9*}dOd=G`^)CCEXanaGGZTMq%=`cY6!@ILo!
zk3bneCfycF2TZfNO|Zt>Yr2Kp!jt6k8A<6aURwcPTN_ecc{H@sx5NtH-4?yAjjZn1
zcP&}&t%e4H;NTLO?=s>d5pYOgG{Q87?Edhf2tGa)7Y?meW;)q!h+{E5Cu_#yxjvw6
zR#ScF=O(2`b&(PPWPdLCjY#WbC+=&vqD9me%-2xGZ{?kWTy70x4v$tM-$PSy(cElW
zV3y;bP1bsvhd;^_Obxv~(^?|}BBPoZK6~3wm00HK-;a+u#OSsA`eo=39-lEhX1RBT
z*=1O(v_q6tqPzOv>{N_~4<J`niR%&`|78L&4Pf`kRfcigf{`hcKaisTsL<#(;Tq^h
z84j~Q`EqnFe<?2{arpKV=m=<>n2ftt8`8ht0b}NW#k$;xQe`Op)D2s2&V9FbfumC=
zSh6tiIkV?&G}UYBbgOIs-MJ(gDnSgc-Fo2MTnsmlOmm}K9_2F8@^F_c>wwQs7iAOU
zHXCCs4ATMA?>4^y21QPp!T|GID-qn<5OsIKYIwKt8BpQybK4$PmN2dFtFr7Ho49(0
zQmhK)Uxff7us+NvWsY!}C4{mt|Hi1PaSI*YC^;EmD{mqF3DEK%@ZFB7N@vz;!hu3O
z;Q%|W^+c0I3N}nXZVy@W#R26o=xu!TnW9vqiG42ml?yJC+7<#ZAZ{xf9xXi{Pz=6{
zZZe8KGAg*B6B;0{sKD>E+ro?+&l%!{(bCr^o)S^4|0)E81x}VlPGQnONIEQfgWq|8
z3t)eLr3kdc36KWz`x%E2v8dzShGB*%<r-q>m^<>;^0MR}q73L(T&oDnhk=p%yS9pg
z?Ezb_jYfg9=eMe^$uEAy-<lf<Y2&-kk0j0a!)@2TxIpN~&2ZbgKr({7p~w!40fvn5
zJCjLbOuc2A2%tqJ)TD5{rOqU1ti_WUY*`o8O&^sBUBY*MTRH&Z0W2f(qs+qk=}O~@
z(T_#mosmj?cg12gOSd}hSbA}Eiuw})Fhi<EKoh>aKsm>G%a{Z`7@?Vse1B~t!Gjn*
zHnCBB7|~lZMoz;V;k-@QIvt35L~Z^@yH2!}w5_G_&~;zp%SH7Z#4}HiG+XQhpc$?{
zJuHoyzPzm^+LmiioK$%u=4r<D4g#&RM02J6mauN|q>-!ZT4HUxJ~J-8n)c^kPP9PK
z3p}cz|9cpZf*q~3wNHA+2MV<yPx^3{^6!^5D#Rl+05a@OWlHMK=W{CO)!`LC(wq+?
z=N&Vfc;9Zu#+U&3{WjdVW2O-gb!%e6IO~IWvlHHBGGTiu)9R&@nyxN3khlJ%OE`H*
ze7;Crqu++Spq9Y?Wji_+5v3L46eec*!zet7!T$-~LLOsgpW^#(8>KDmmV>z579St`
zWWWa6-^uW`e_6hpX{1Pen=zohk3<bT|8jknqc4R;K4bXBcR7v;vJ4pF2t>{a-$xD;
z&h{D3zcU4g1gS7Gj1s>(DftNpGSc7R#}CQ_Q-Qp{SQ$3A-pC<6)$zcLUh(4rh?OeK
z`Z8p<fZ<5wOElWOYqb-SV*C!`4W3&Myb>{y7XoZHN{CNk>&eq4FCbnHu(3}reTd(P
zOO&v&A{*+lwYw-`+`s*7p?cwv@o-pHD9OjQ)7>@j3^}|#>v!YXD&!`PkoY0f))BEd
z|GqgvwI_}^DG+egQsx9RXYvyKw$xR;07pq7Pw3)QlsjE>mM`v|I&dm3n(hSUP=ip$
zqC3Zb|EAurHXkeOn%b>i7P5yf*sE#a+AJ%W&00LQ@T3*{w{w}s2H>#Mv|C_&7b}Uy
z6jPt;xtUu;4&WD!m|K%Csr`94U$JZ80lOhn6%E*`K)5H59mDG=i`A4N&p&jxW_kHi
zl}W7O6IDvwV2zNVBaE0apzVgoWAqF)IxENv3pgtsWO8^8Ug_<596L^cYDn-qNZpf`
zRm2Pz$xG<E<=P=RSfiG8h(L{Z*F{p_4gBK~`*1Sz-ad&14u{6zQwZZne~6<TA|q+3
zU%wvY3=#Xz&!czKK0ZYR{?lV3dA_HqytC62=P_~bX1c)ETNY%d=x&?YTdG<T*h9vT
z#D#D1d8GFe;D-wS8yIm-dPGGe{eMWwYLJk?ko%Am)b8W+!L-coLn`W8Ct4`wB$z|?
zd%nF$QG_z->S-?tu>_0hjEBL!(cN{fg(XT=6>XzCzs}>L{u){vauoot3v6lv(Of~E
zmS$@@02|Ni%?q)aEU%<lZigd?Lg2h|K48Cb_(y>Pw;s_h;oAIRdf&SA<U2m#3z9uC
zI?wp*oO+fkotUEpe3t{J%q9|8x7}y;QI{}8wU{vGgLs(+9;lkiJRN2i9j}dnm}~?4
z3&Q7^41Qo+aG0{TjMI2p4{nw%8r>kA@%s%K75>Pm2>TT&A4}P4L@rK+{gGl0<}&1k
z`TVS;*>A)(jDWueQZ%2K2%-ztc|=v?8UM*qu(HtBdOCyQeqW-L*I5Q@d3<G>Ia`Ua
zU5akBJDpS*KOQyeRMQE{nNYFn2vAlLvmGPM7joOYzRA-MlkuGngD$bwZ9MShrK^8>
zZ7@d@zyuid`ZfOGA+Zz}$!T3VI(t^n`Gw(w$NI31Ru?5LdFOVG4X8N~p%Mr$88by=
zvZr2Ei%6beAo8w%j>x(X&-Ui2T=mXL9nfh0J%Rjhs}90-1A)Ds|1HP+@8_}n`2WBA
zH(C8_{mcJXa#=T0Pa=4lcks!c$RKraQ%=6lFocpfQ3yKLMg4E4XuB9df3ouC=G3(|
zos!bZ)+I^pKa=|u<%YW6|2IEEUORpBq<R@L^-B79ccy*&tZ67Tn)0%LVQ&<%PC~G^
z=W{#T_0f0DD}RPDSiH~YnQWt|%HL?l?E=U?>!fEacG~^3IVSps_jQfrg~L*Ng#V+1
z-(&6b>ztu~$bta~@gpPtHqhyrXld?Gw2EU*aUA>egyyh?Gt8>%fwA9SU5-v|rsb6W
zrsC^U)WTNy|9Sz@_B<U*z`_XVWz)&P<I)_4&)#aY=MgmC)b$vLj*j_;Lqx<GPuBgW
z-r%J}hOYPGb^W#fR4(#D^vYLw;~+3@b;#i<YU;A<<?;3VNh{jQ`TCrk0Qn$Ca_5NI
z=g*wjbWeML^IGusdL(>#PE+*#xh9lxzu!dlHfscY8AiG3?tfr-8TNV<2P@$FK<=OX
zFedp}q1zRUB>M2yK;Y_%m#XdGODg|T5V3MP&;m-p`w4k9P5pc2_Rb5)Xjzr%(k(#~
z86~B$hX;HBuy<%XA6M@^nw_k@Ln`9y6<XKizI+Uf@Z%E><?{;`(YinQ1<M)SZP59Y
zR0TOQe1LDT(SYc_Lb?WOS#cxaa~|iWfY#1Z!czxgx<dZI#=`nC-LiDJ!q=P9i;b>m
zNd$Xlbsw_59vFmguIr*kl?)69L9a<u(@KVM=GvJ8o}^QdWD%*A#{0G^zhDT_@-!qZ
zsL>qqkIJny*=?P+prNh2EWAGY!+;CfUzUD8GAqzu3Oj7s`^zr1SrSckfpOV8u1I0<
z@DiT3x*mNoDHex-n3Ow3!Fm7mUeX^j9Za<cjZn=5(gsV<|7#!94?_4yP<NdN`f3_#
zPk+yfEG^h%dA!?1@4>;DUDJc^l=sr-0zTnbzF^<>tK$=3PiI!!1Xkwl1;6{*Okvkh
z%N(H~b8VQX^{yAq?|sHE=fE3c<d~P>;5*1z&i%WW2ZvY6fE5R!;n=ekkG+NhbWBX=
z1^uvls5qA6?J9zl)DVo)m&>*;h%VhWO6O}ZRs7gq%j23mZj@w|*17Y6F440;TNWZU
zVU+&tB}0NFu#2?$2K)Q>d5rN!j#eI94e>Laz(p1t%0d=mDZ5S`*4;1G)HKyE{hth9
zpVLQa`lm)ID2t|-Lxny>IILJzC<mk{nBLupGk)jh`EH0@>3Zf3P2-vDhVf%-yA~}j
zs$O2<+;+<B^m}Ug&^}VSDZeo!J7MhZzOz>gRZZ~V=C<1Cs)wt{Gcs;)-%N}Kc)X#X
zjpEr3kf>y<M>O{=@bN@@+?I6VixT1u$yQ{Z&jyMa!Huk|GdS&OvhF;th)6YTiJO*x
zJZNbbE9?N2ghJeJ|LPRXrWtj$kWW78E3*@5?i8zN9q2kCB7qN95I_k%);2@oQef56
z5a&mYWjAn<<@A^L<U_UfK%K7V`1s9KUq+gNg%Fjc8Gcy-oVyEtyR19^4Lv<TX5JDG
zkdRBXefm=mq|VotLO{GB+&s}aWBSWERWl*NZVv1Q^fBCAwTC-CD}(=)ARM3|lb&M8
z^gcqr9k6#1r8<Sfa&SxT4MA(}qeWZz0ZJJ>KJE8c&2M$HZ9gL67=JevCBzv4TpA~8
z8y$UnYBK0V@BV4cY_gmBynp*-iGmER^5(wt3}gL}@%;L43>yv1@+@?T$IXUJVt>kk
z?REAQtnrJtnbOVBe;V^E6(|KG_B|=W6p+7%-{{N9=sQ2^%WC~_@O%cQXHZ&}7Wza{
zjf6;z`$>}K(1jF~Avx7VqT$ZJzbmkQeJw_IR5+_e#P)F*h&?_E@%hn<Z;7!m;|@}g
z0l}yrU)0=MOiubgnNT44OF^0ExFy_qwrN?{^MQh#&Yz9gUGTZ1d1vQPkMe9$i-IC`
zU?lf!az9y=dI+d;^poYl&$c%vY_$r|ihGputpCMdzDu$7CJ4B#@n$nP4i;t?V818x
z>9C2p70Jw|tUuU^Mc*m5=<e#Zt8g%0zPpqf0?WY(Q?cg^0eo$2Z#s?V9LzC#cWF^H
z;|2>#Xjzi%wC=RqiH*&tKsvkkD8<*%B(k^0^tT{&7q%LEmP0JA!pfT1>MZrW+y5i*
z{LDDpog<H6iqY47bizP|vd`fs7{i7i^a3z;pCKUpSl^}7@J3?B0m{JL?zTYL6xHsG
zap1{MzP<Ewz4?xu9P3|kcWdjBm0v6$4^Yuo^wAu?p}IR_XHDeV*(^c!0CQh{B>-wh
z!x@?V=cj{=3ftFMxo5iG<KS7VmEQBC9>qei2oWf$_x!X!tIMWK=tD@O-IE+?S((k!
z#i7P8eIjWz3hn=~7Qn#-*F@Q+GXSgk8E!W!62>tU@zPlvXf%cyXVlQNb$ho{%---0
zFh(i5Abz1X4ui&nlW-xC7=DWOeR&1M9c%paLt+EAu(hQ?q1R3!;qPqt)$F(V5JCel
zpb9|h>?N2g)Ixc_C=e2h8NH<Ak_U5+3zE4OUX2*ZDXskQ?AuIR;K5xtX1}8#$I8qf
zFln(h;au*%NCuH;tAlg5Mzyp;&P>vuS{*@&)<WN4&h?F%-9XEZiX3nn8VY)meH7^H
zGt$-cBOiD_oR!{tc!D+fHsp<hB9&E0cFxMzg&rVy2@9=`Ob`f$Lo<~I<qVOK^a%P>
z@+b99SxmPGpM$}@uIKV{+m^K+cnN@-ZYs#{6mpl0FkJ!L9N3XLSOfapLqUrAG6?}{
zaF<2&m)vPan9Fk^{35|30`;e<pNMQT17U$w0;=H$dZ@*hM~@ca^+S%ARdQBHu!#eG
zYH(X!=_c@c_-3)n!nf8cvaBgUn+%S3ii?RR?lf`F1e;6o_cy5psm2L1`705S2lX`G
zhO@Z1vGx;az@HUBKAc90K##`FO=&Exu-SQuu08AOw9Y6zW>ulikcFg!$)>X`qfp18
zq3KGXr$ET1iZfJow#UQe<)Pnce`vvNnlLVI=lW)IVlp4V_U2%2o&bI>(rAqR1JLb2
zl!xjpgfqXeih(@=pwhuMkvTT?XPb&s@Moi3oa|@KJ$j5O!B*&56Wiz*`93*h4>3ZR
zfR%;JYp4m}M(<QP76j_1A8f@&Z%@i0aq?nJZo)_@<D}A0c5n8QuK|^u<%J}bE@2{f
zTiec@YOn}ZTQL^!Hp<*E@-lNS1yNfu8Aya^e3l%^CnjfX%f+yNYTxnUaV9%1CmZb{
z<1abPiym|M+b(4gJwx!bm#CYAA~wt9LG|O~OmfWHY_J>oQw#p&rV)0Oqu$e!cIJAN
z6-%55&-cb;%6}r*acP<b7v_4qt*JsMDe0LOVZFRTkGi@~wvX}<YIm!{!66WgKVOJ%
z4TBX`ZA%iFGrUh(fY66Pu5s$Yob;aa&gN>-%~-^R0#Hit`5BtY28-EvGso2F5y~>o
zP)`<d7tcHy1;%lt-PYs~P#rO7=ifK41aQW+lszN!W836Yx}?x3$K~Kqu$MOSyo=A+
zASE&K@liL-_dba4wg)x~Ic2Yvjrcp<Ll?mkha+nGEp#MPWWc$wCrEkAj)w?ixm!Vh
zADAyU?pG;NM7tl{m3G`WlujxD_G7eA$L-;KG;pGK!DF~%>6|jC5FE_^ZvC5x)fCn|
zZD=U?piA{`!l@#D%jk4x5vcn%Lrk-is0A9)&o`fH_Ru^EOj8C!;NU*x?6(3bMlCyv
zX7QAH2H4e0&hkkG^x1wA@^pW@H*ICyhYQj{3r}g<r%`Jqj`)_gti(i<cMbrY2xbf{
zc)j)8jE&pDjLWrq{G;>Rd@EL;IftJYVeBu1zuRs4Hu=l7TD2axVCZWyXE8r-i0jrH
z9Rx*lmv!6&&!MjrdWO%>yLb_s=`8pkH@XFVMZQu2j-#N9U8`@>!pY0Kha>1i_bGa-
z65v<2Ld86so%!4PH7A<v+DATv_3Gz8iS}g-S;Bof?Nuxeu1TlpA0b)>%h4~cyE-Bw
z0hO%wG<-L2Bzt9?bFibMeJ#mpr>GL*c|W{?pcdIwj!+!noZYOwNDoXnF#B&DL&L!9
z;UxQzMZmxYMyL;{+gubvCik3L+4E<=a#UsB7J)DKxq5W|EH3tXQIA)R$g#zWc<t<1
z*_hWSITWT+Emaz<IO@BT#|%`RysV?huQXlB=J-G(SI}dk!N4Z-dp64#FTUQ;-UI&_
z&?A+<r}p+{XsFjq)9c;_`0wo0)r5mTR=~r<&C??YD$@CI<8XNjkq+3z44`@jBY8fJ
z6yHzYy*O=MY<N$t2|rCgKMjE%5<<*6y*D8ClwM%kr+tUl^Io?vr6FzC>gpO`hSxn1
z_?b$05+r&{3nL1Bxx0!8quQK$-L4XFeK_si0D3VzF@gLZv^$@RrrLih3O~UB^wZw&
z!2LB~6&Ej?-S=|J<DdJk7sS2ydL|OR#J*^SaLmqft7&Q`jF@?xhY;QO#AWL-z7oAY
zbvf-0q}(s_cXoi6yB>o#-sP}^{a>2K5}sbb8_%LUJ9tk6ua66w>TYk>W7Tcux%s&M
zEtq3Tj@OcJ-x7sEeV*Dhe{lE~@$cdH_F3DVxaOqv<IG*Bmp<s?-~ls$NN2_Ka!<1b
za=Vi314@~KH26brg|3!@A=>*e&(FbD6-@8nx7fB_yl!4fzHZdE#LvXngzSO>&?@YF
zj)_1D@~*r)cs&&G4fAS=DRu5sZ1WXm1FrgCWZBOAcLotHEG&rLy>?u>YnIj5e{VY%
z7{3J4V{$KTKkE)ARTE{v<IYn+7a`A0+PllmXDa%n^qpbZs@#U6M-wb>w~Jlq$gEkN
zHs!;jw6?nH1Qfb!{%B*n+TGgF-~(BS(^LKM?*sgwv9(FzLyH0oyvNlL$arm8hxGV_
z&HI9?^IrcC|0ZFn&WSnR#c9)OXj6WE=FQ2%)j#n_{(|pDG4PC+5h!AL*g3DORnzr$
zEO5cWK&zx30k+VuthR<4I${=SE9M3mh4`=A3;qNvZqUJ}EcB{@ka;(@`bPGr*Vo6Z
zjk>R@v_KTR1q=iRyjSIuWca){1=j7l^kf~|Uc5|@pq&|;MCs1f7ph^6ns2-vTgThZ
zx(bYmOg3Y}D`hl1(tSeyvu&MW=RPsxUle}SgRIa)_nQ|3)m{^v8Jhi$vivYawmKnW
zM4iG+erf^wE5~G5Qy7g{XV&)PL?d|}N>1kyHl9ezhTYtB`ui_+>P%idncCaVKc!Ol
zJ>E6M(-@3JbM2bV5JU5DRa~)Dd0tp&#s`c{D@rxMuy7AN<TEL^MBJLHI5uz1`bow?
zqmy@29P<R0wV+7V=Ig_ewnzf=(Cz#)=a={Qz3-&H%h#yz4UBqx;O@=Mdi$HF!5FF~
znMyU;T+a^KZQ$Zr37kVUnB{2S?PG>Rok%xX^s$Ov%KT($6TK?uv2;eoY(uy6q5UVt
z+cU#f0>{Q%-{ujoNOnKGJqsmBTzECJ+t2lDk}vE&@k;jCd+dI;3wDxaOq{gWOV0H~
zgrLy!UxWFT?}j`a5|{EA-FnPe(y^GK*`YWXl`7>JEcS}b7QzBaH)+?oC==2IPYqRR
z5`ij-`AeZi3QPv`9M{Q^d`>`+7V_<pkBC%T`JrwY&ep&JwK`v&dpRo)jbO-N#dty4
zuqI``(t5kqicC=o6j(WblrhU7v0pXjm5U_c9ddG&T*RxqnM0CpUjaoyrQKE4ww&>x
z?vqfA&c>5Wt?Z$%_^L9EkZE8%<X!bg4Nmc%10&UVqG;t0zIwAur;ePL@c5Is1Cyam
zjj>^9Y)a%3h=^vFIkZnyJc&)$UUv4@1m!3K8Hd2s)q~<Oa(yEHIxf{kS}W@>`Qqu1
z6_LJ6hS5}t2o);zReh7`bV$pOyNCgvlM_2;H#31t70$<^-c80<Ui4h6@Lq~{^p<s^
z4By6{ti)|_t^6j`c=Y--w`@{jUfPgy`0~cWH0upY9ArmQ&We<3$QTTwt|(~IqVQut
zio>l>L!=k=Un*KArkiyxcTw9AH&~<hMXAw!++PHKlcN4@FspG!tdRb4X&W32O=KR(
z4V@?W)-LNSed2DxSn8BYhQ{s4*HUda@bIRPmrrBftogl%%Mo8;sjg|lQev?)$LOge
zN9S)Whdidv6w?;k@n*c{MO&Nd5B0^}ECtivt;j+)j^2MZMGEO(Jq0}E2G0dkS|>cE
zOfaSWwR#*_CQvrtQH_d6Dr%%>PFf41iQ<h3cd-8AO+xis3P>+|A`Ruugz(ZbuA6o*
z2&d{4y*A+8?Eb#!RX}u}>)$=;FOD-790N^WXVs!jP@ul7hoR>>Y;pD}s+RKRa0Jzu
z7KfIj01xg>m0EP678HLA{~{Oeg@8p0Hm4fYlo4SOtHY9Eo*=0pmcU_mUc7nH9(>G?
zcI_YO{QA_klTD&xiWBKoU^|>HfNB6SzS-ijl7Bs-bL*kY`$;ahE<Ug5%=5?!lfmpW
ze?G}^VkJ09KRgxM(jlWvpmD90UU#3HgE}Nkie<(DLENvQqvF_f@J;<#Bl;#I@-3OB
z3lWbjXcg%7bxREH*=zeXMt*6mATIOii;aQh5$JXT{)m^Vn|UqvN+dC*xcuUTznz`C
z!6NKWxq4)t`Mx8|i1&0qL1ubR!B-x6_hOmER`hIklX8~SA<^%H2-z^F1mDI(F_-(!
zP+zS5?DbkYKgyfJa-TY%4-ufPBv&vgX<I6%qH(T}?3iZv;DhxdWS=X^k4CT)o|Vhx
z!f^{S_xt@=*%gajv&m=6_0Jp^vOXA`p>r;h(57R};A@^L68o6{=~GF!(?u({sTY_d
zHCHuEEP?a=$dAZt47F7cznJo<Crp0raWL!DU(L@wWbO*Z6V05dHthL~u|~=zdyQe=
z{HDx|Y-!FN4qLtfFGp00Zo$2w_^I{x-}JsO3F?vDFIT?cib^dTK1TVdAllA+L#8j*
zK`oJ7xd&Q5|7s<^x1!|O%;+?M*FwP};%~5M2vQ^lH)EYKMvhX<^ByoVJ0FRr)A5oD
zS1uzzmRpbGvEKq#$0b7iu86mZ1baEly#JcyFp7&G$M&`7kD=Kp$&S^nww$=CD!eY{
zL_9OGi$S>YROr`eW;Y_HRy0SRlZ0sRRsXgLkA90*Zn0V+DFm=7hBp2MJ8wxGV<+EU
zJcNhUS)SK=;*9>ZbyU_~p^x8Zv69CV<3LGvu)QADm<YjCia|N*>lAK=h_uO-d17Zx
zTyw0=#2TU&X)Dil)$~Ksbow2$OlhtinFJ2gw<Sm(Np0O*spTEHn;}L?2!c4I1Z#~k
zlVCco<6g*`A)H7mf$fnh)8g0rK`YJ7)ziY7?e)iEk}l%6Zk6G8B#~4eqz=K7<)cME
zcuDLr_N|3>!oo?N=reJ=&Zr5SI@M;UGyB-T3AocMShI>H<$1W5enKALFAn~!aT6*$
zkc4TThF!L@v&)-FT>{q@g)XXLd}SD_S2-P1^V#N{(<tH14lu1<kX&5GG^(oWlf46h
zPMiXz&ZzNYZz2O0QXw&)2jWow9jcx-f*hg5ZAps$;|c<u1?%b4kIK>e?<lTE%J}w{
zJg6rey))x)1>E=~(9-)*DIvklNo{Xpn~>!Tg4~&)*;LVCc;=X8q!2ZSYAfH)#A^xg
z;;~F`fuACxkSP~0_FHa>w}BQZLA&ed1)cfs9JSTYzJ*==;j=^ct4==px#Z0@74mLb
z10m%joA<c}ew~BAhW@nk9PDJISEwtwdV~hPaKtwX2z{fsJQ#93F-6QU?O`6mE8U}7
z(vIB62KzZL$rEPIfNQa^SUl8h9vWy~i!Rt2&W$tky`Ge!m5d{Q_FNb(*D?C;P*PFN
zq+_7oGEHz+9BRRyK~-q1=HIw83jF3hBbLBr!|<+4)DXJPmgAZsyVj*r`lR<;y(-$t
zQNm_EuVqb*Jd28GFjBv9J)VZ2$fTZD!RzV|?Pk+z8V$&jYl-R|Xd}lLhRV&7B;5L;
z&D3nasjCbsZnbf6bLzZ^ua$JY)XpDO=(DVAIvyjvgs*<8z$isA5tWup`%rjiL88jW
zt&cP6Ws=A=!7VyIF*LmpvW1uQ)<EvD{=j!Xxl~2ciZB1Zq_jlKD?##w3QGqcY87iG
zc`UzBhFA8^uq8!oNdabimLXyylTMGgV_x;8(KglLQE&5u?}?$|80W#klP-bJ*B1e6
zQpjKXi7lIfw4q=JTcY$iD=B3b3;A;V34>gno~HdcyNP7-y%Sqhl7y%#B6eHdn#zyr
zS(09?a@zjU^06l9RuyiGUI*m*Rrm?ns0FKxC%t4XwD+)-S<hVBpo!nVEOrnx4$rk?
zUrRE_<x*RJm`Ja7D}m)kE!9gE&ywG?F6ZAA(%luxKln-(hvHX@T7&brf`?Q^vowj3
z2kZ9@!*ZXB$-QBr-dMMc!?~WHSuEDCJC-EL0B+H%U=Ne)aVRZupu_oOj*D+^47x)J
zfH8dCcZa4JLS4T^vg`ki1Ksm<_%^K<SaPMY!Uc;zg#mGidDHzt*Ml}6-(N`1E3}Jk
zmC5n7QzWVpXhoHnsOLA-Q+9Jd^^AJsdU8JtdL27Src3)$Na%}{pgNl(a(ox}dgxzV
z(E+LLs;JT(T7>qyGKFDlmgVU=<qAu6Dd^oA@NCst$YzfOBA3o^VD^~!eV8ua?EmIT
zCWX%5h+1aLUv&B2&58h?jfeOnuae;wHVY(KKU(7>L8LtVC%a^5c`8?F41UcAoyq$r
z9szA7R1kiF(z2M({%-#By8wEGbzMZex3r(QY6Gj|`>T0*$<WM-#K+O42@#SPB=O67
z5x-*W62*BGgpT}n#91-f%bY+!vol60d>j1yd4Pq4K%yuadm&P%@GCqUtFIs-g*4Zj
zlM@P>uaXx7!vswm<m{G%?;UCQ6&~cJcL*K2e_PQn-64-+bdp!}M9u%ub><(b1h(YW
zr%bNEAW|lXO}L~my6pAk$tx{y8lFrCG&&bskL3{^T7%~BD}`QuBDr}ab!XN9W<OXx
z3>}ID5|F}>2;@a=d=qs6c^Di~5|BKJcsE}o|4G5d%i$8mFhS)Sb})ktfr7c0ZC#@C
z7@Tj{K*ErD^w!HXs=0ew$&`_F#;o#Xu(jzIc6!zq1m2IqT-3_PDs8-RsvJtckxfNm
zEuvt*cDQ6Xvy^EnRc#>z3I38R(Xv!Fu$lE2{G<~?6w8!+t<~{C@gjoenCp|SI)~Bk
zVbhy=VG)L(zhD!v3oQ^b3MR;>M6o_9_ZK@QP3bv@LZpK@wMt<poCeAZ4r}ovILd!t
zm8_Y}c+r1)wk0M{72}L98t1W!Wf?r>w|?Bi|Kwk%*;82-;{z8ve|@s)pbI1W)nMT`
zGU~fQd8pV|G`A7L#0+UN<89R{!y8KjMJI$9R9ggAs<QSy5>ir(a3%~t@bkMb`qq2|
z!^LvrDZG{Qe@T^M=(Vhy8&)p(S#Kzk)P&ULbPt<+EF*B^{>)*vEXg{kSx<!i8oKt%
z*pdrxxy;2(3}s#<w#jdB{P}@5rYx`aZ?Solct$BnGq$vGHD7*n1ytpYl6;M#X>gPw
z{x9+~UZY-3Mjj_xE!*^OK8O?u>wj)0U?P>s)zUFEwKtZ1<)s44cpnqDaqv~gY)?XD
znG(;t(-mE?_J$&gCXLs>&pnbNv=R8ANn%dKpp6*%b0V~19NzFee^g+iLCFbS`=DZK
zO#kTPll!K5BEX?Cm+u|=UnWM)$o2a%KHQtqwdA21g4yTz`-(T}H-y<$j8qZRKj5T$
zKIi(xU|5B+SuuW;k<_BHGn?;lX-FR8Jvp(ZNiJ%lm7*+(VLZ-{EU!v)$mQrukGvs_
zPyCE5P_n5Dd1T}ya8(YD_>r&q9uyGHVg<AAjPbl2J}-wur=iHYVVAEC-{#@Wk$p`e
z#}EU89;*9K-k2~-HUwdG?}0q^Jdg45;np0yDt@o$|5`To5_kiT5}=D7Q%Ddx_2p=i
zBjARan3g>nCrX6*gq5ad07HuPNY&uv2PVSEppef*u0}gKq*TxF;|XycEe<QSNxv4^
z`p5ZH)9$0Z++xP}S6`Q{XR@zI&f>6HKH6#3@PECEVZs91sx!w8%JUIP%O4}s(&+`G
zPnYq27I$L&3RZa-t!Vtru>mrJKO353w3-MHwcru|!db0H!EM?c(SYX6xA!RVM<4!c
zB)@?i0t(G6kh5>uPsV*4{>Z~5AmUFxLz9ypLpU0cH&3mg4ol8~ANQEfIZ-?EwFy`C
zlM2E}Vbw&_##XcRfMC=gbw2S56cq<s8|$A*m;*^{A<-xjC~zkP5}HfRMHW6><Yv(@
zWCbe`C5+5v9EnAY`Pv&sc=BY|x8I#C#nn;~S4<H<u$N55WAn_l(BTrse-7s2vH4pl
zIT2d&NmlH#;%gj>3%?rUV$a;2c>-znHE+h;>(sh$nD#dUxAj+z<{HSa>`dn$7Cb+t
z&?9ANQ38K_gpV@gAP(*@sv;V?S++P^YjD3?<88ZLwy`yZGx#y`IhFA(Q1AULlTI#l
z;Nx5fc~KT=YAg1ZqAj(Q5_uAx;*R^PUi@1jM3rG(@G_29cT$CtbMgC#)M(5K_?Uku
z*~_tA8U0q$zoNBUC)aEM?)iW@1u8Nl{e;ymPn>Ma%(<Lk^h=0MwlMG{fC@LI?dgN#
zXz_(|PtfFL*c@<%9g`BdE3s|P^3VD9Mu`uWs^Y67&*2Du;vXkWsNBuAzZ(9~5{pp}
z<1Y=aXEWF89H+It7I349F#j|d81pr1_m{hJ<J(;0h^hRyeS%!kI*!vHCxiCk^$ZVw
zR8NY)RF6a=Eu1&Lb;`bIL89$LMMq<eN(gl>GBj<0Pb*Yirb@V1{T1f#W-yd)2YDZL
zMo7*P%Er2H(Bx8%sP$>J2>SQPTwYX`pNJ{C`8wyz4uKKu<?}OT=v!8{^XYvehR>^U
zbt^#*aqP0cXh{hlS(ntd`+va_DsY*+aK$}U%i2?)*DAt{1q@nXfr8GBQ&x}KSv!)Z
z;mI5Q5C>_{Y=2{r0FD0mglG1;xNS`3`gb%|+v`gPt4(~<f-K`jzb@(p*>B@&7r@!f
z-a;m^QxG*)4pLG=^0u%YD=1#I@=l`ly<N!|`Om4p%<m8`m!A1y50Q41oQ+lz98BrH
z7cfS5E)x5OxYBP9wM*WjbD|iKC*6M!(Q0m$yNKYy-sqA~4zWY8IIxQH`rs5mzhV-F
zz%Xd|*PhE{rWLj2B|I;-^nSkGoU$cxOv_>0B>~>X7OiZlAeHahjX^*p-&pb+#V7nt
zLjQ2fTzXiR<j38!@Z`qki|Ubv#l#w?v+!98kqjU`gd&43$kWt~JGLE`K$#^tw4JV#
zfwvc)JSy>co`%~kHnvUB{RbkW`bs_+d_n`K__0A49sAue*m^-7!ZbwoHcjZE{V~%{
zUaP#<F}5=5Ll0tdj`Bg3-L-aoqa4qXA6u<agE<b#{`h@S#L1U3`n7^|Pf;|D<Pj|S
zXCsAHwqIYhLx$_pHjK3_wHLc1<odT`-U;4XGQ~Dy3;Y#=EdK0>M3M%{2a-3SXf<-s
z)in$|WAPfr^xFm4%o1XP!x1pPMit8X<^%m(pLxcTgT#pk%rk;8AbN{g4W7dO=b=mR
z)k8t%Q>uvLWPJ<o7vG?3gLg^Jl#FWwzdRF;Lbf^8mwFhe-O=}rKA#{w`8`~zGba_x
z-FSn+IDXeVva_D|J~}JQ+U?D0dt=GjyyNL}Wdd&7X0G6zlh)?Rzd2^tw>O;r26!yV
z>}6~=RwYA%aj1V*{I=xv8X%}I_p>>Bhc!Fw$HMa!`XYN5YRm)ql6b1bQ)Ua3EU6%W
z$~Y0x=XF-S>VU1SsMmMsIRE{Kmz6YxAzzqV>x0XAq_NSc+BNLD>5!$!s(UzIcuf*n
zINcBKx@8fD#yHF@%}o@=oP4HF@VOi_gv-@pto_>Ob*tyiJ0cOZ;?NeNW_`>EMS8ti
zX}=;^xO};L#>Mj6(f##@$-e7MeM$%h{NBL}u4&Z=ZsX;=56EoR*_~LxFcZY>Fl--*
zk&zy!E%03CO)RIA&X*fVvZa_BIflF1XhOC&p0KBcA>>nMAejNkwg7#CFm%E}v&nlw
zXyXgMr0{c{rFNqWQ&r~<AJ90qKfYiViNd0eByu3?wZ-+nj~53RFdPiSlK&?5BNln<
z-x+~oYGf$;>zV&mltWjJ_k}0Mky>O=p0OYJ&~%Oi%jX=e9ddRDId|_&dI|hQ`S60Z
z=)KmUddn>+IB57hvElTpd%kn#;LA-*TOfQ5rd@S>UazC{p6LxaZM(4mlCD5o*e&3%
zwDN?Cc-Q^)nF<a_KRh;{sq<?t01aVTT8;)auTGtjpofMp7z_%HV1EBctks$ejdSn5
z6PaR_>)Tt1|I5Y3?DDd~+Ih<}Qp>@4o5w)54I_Re&;oi2x(fnvyf;WTj~_8(Opas+
zTStcf_hVe26vhMg>3XEtbE?Y6anQ9`N5@j{Zj1_g!{qOO#A0|LYk2<<{YXqfvE%<d
z_bPI78AmQM+4b57N~=Bf#uDLrzna%44eg72I$#>>yg#e+f3B|T%DL-&6}Qy)UoNJC
z3g`H3LJrU&<8_Zfm+)uc)2=FLXQbHgwta|uP_E67=0A=Up+^<zj}rfR7zkj%;Qz<W
z;iLugz$pFa*=Cr3U_Som+3_D_;9w;G^Yndc;NtNuW2w*$y|u3YE~#PJO@m|P@bGr4
zWY;aEBklQ+m8SnZCl@w=hO0@QGM8T$4wm;&@xR|?ALgHXjD;bdNBp-R4^QFzzcuLe
zk6*I?$6A2@#{&3IdHrvt`;FZdUP5C3cV*@l45crE&9Y#ZE$4J{ex@@@pNB6@G>L#c
zwSTg&PJGx7#?N7jcRZpDwBl8;)g0a7oY&E<=plrAPvcdNaM3V!-PLLL0@)FVce1}&
zQ^a=T9ISv&Fr`mUw*EAsh>UOyw1)#Qcn4(ZxRPCH3zJ4PQCpLAg`6pjnO^c<zQI(I
z(eSghyJ<i2awTNVqfPMp7e+sd@{qHV4))gRvE~NlQOUdssY5EumYxY)&xY?qKyG)A
z3v{U3B}NnR;pTm#P!shlBXpQ*gGGyQaUJ_8kfG+8oN%W8`3C8oFdiqVMo|T1l#q5U
z0t&Qqu;E7}=Ta)CM2N(xS4R(in^*ufh=xIm&a>4{y$qn_Fv=YW+WxDnv6t*sYGlP3
z`iZsO*&s&t0)E=ETTYOOg)yG7F@e<2{>!4o3w|uf&52eZ0yZ=zuP<mzmN04w$H<9X
zeCh~|>eGjd>5X(6{>@&f2+2VH1aYXq7~CD#yxq39#@iPM{9Y}?pNzHOG1U#cq@NQ;
zx~N?yvv`3vMm)9PFMM!zOi$l@&T-z}J4W+e!0(lye`Kl#>fwx(Q=3WKj`zck`L@K!
z=%3*Cx;c1EOq=LS@hMt|3_+ggID{=6m>~SjXCHKI9l|gS=!`a3Aj$WSRS}TiJBVH}
zlGnm_v!&?q<5WYaEL!o>wbPLzeM09ZjAY`{W;o*&Lk^^-pOAiJx}1!9hUMMjCk093
zYilJ-c5%WJMjO<ob~<ve74%j|WkfQfE54siM8mxQl=5x8l#Q$}zsxM*<g?P`QtBlI
zLB6tJhiYO0O?(7oW2Su~f-j}UPaMDByri=mtmj1IPszOK;v$4zqzXgM=k#F+h5$}l
zHng)l`k;wBYH;c#Ag~7_=RkqjijHpTH2iTu&|9#sPk5vkMIi+qcaN=HHgPaCkhb4E
z8scBqF039vi-T^m1veOASbHe=lb-(vb;+VccS+zgPU3;mJR$KREGjYTPCO>F8}Ei_
z-ILI2!Y`iFZ788WqR+O!DI7|Gbqfy%Y=?o+y;Z(XQkj-h%W_m4>@-z?IY}r((tTgI
zNj-m{uA^h%Gkv8lu;tz1C3*|TWY}bpZSG1f^ayQuVOd`9?psCOLIe!Qrv~A#X)r>p
zCSipT(Mj8gXk&QsVgy>}y+ENbN;uQxJgai+2a+R&%@o_wmI!yhjELE->*d)oO$j{(
zLB7&VH70@RTRO)CRVuzJGw%7ET%KcW3CBNL*LVw((q0Mp(iCE#^)M>+Lr5E=w4|^q
zJr`lf&%Tj)x)$TG?j~?_9*BdSP0|A~2su=mTyPl=e~{dZCaOMS3@P4RnX9^snA58g
zp3hH#Ay5t8b^gAvPPN~&@AGzRoO|-Ye57#ZH^YX>WX4A0uoj5mV2Bc|U^Ov!=+&<S
z=gP|f9CU^lbvv}nQNF$`Cf2ddl<m7KMn`Hc`~a)iOdGRYcyhYQDA?Y}2`$5*=_YU_
z9s_O&VWBe7*rFP$Qwt|`<wA2XN+<<)p~AvZ6UXLV|BWcz9*kF8LUiJG`#4O2a(`*$
zK(|NM7}k3QSE1`)c<B4&%v$jMo@|6JnlyisC5^^EjLDdGf9~@<FfZ6ysW(+7b|6cS
zko>-L@OfXsC`2@t|GVxvg@a^!vlyCS1mx_A+?Gt>J$Y3E2Dj51Fp;^ul77P#qBhxq
z&pT(|oZtD%cS_GWL*73q6pmog6(tcM38~;qk&A6IUh;if??G=({D$Y+fa4!ft4c>D
z|5quB;PA{j;tOJxnW@}r@{$f`b3&!&hl`(7;WE4Af&slA*(T`dh+w(5W}*ALOoIWY
zh^(etsSxj5>FBqxi5$B*3<Fy50!e{PXL0tW4o`NDmocOQ3VQP$Je;W%=B6_?Bww@=
zhb}%c9dJ$mqzCP4jLFdU-!++Ow+9?|zat<f9&}vPnrpnzHA$dyA+eIzT)~3P2lSKO
z{s265q7FNal!%BIO<eT-;#z?d{}wO(2eLSc)%)e$M?8FnhCK53(k}O**y3hJC1zWV
z3?yVOOh|<92RVWyt*D7e1x12Av#)n>nk+LF4cu^tVUhBoX&6((M7RTd(R<h}yv=(r
z%_5gH8yk-8t3IQ(%`PG+>{+ni1|9#%F$*COn);%!t|*CSIl#Ureh_fi{X6y<c^5r1
z61`yO886U*bCu*kF<}<58_<sJ67F}P6RvK@pqc)^-;ApkK1oOjtdlBVz4!9g|Dy4o
zpOTGe@Lr>e|I#u0_}dXe=r=wK;i)Tzfjn#yOnh3zH{kK5DyOIH0L#EiAkg?=ddNXI
zgYN68p+%>Bn5uSg+>goT*CvH2&X1kB(|!<So#n40$h4;yDzqb7FdO<|ZBWSF1JR3v
z*k)V!iz!=y$mhs}Q8$>Rpb0DgpDL3UbhU@n=kA!ygu!-9?fH0j>Zw1(ct%UFjD7Gx
zS?{P2Ln&|>W}$n{<XXg2Hn!V7YzLqGyMk*HJhCS~n+|y9M9dCpSqbRHPJWOIxbzjV
z%zgB{s^%gr7<3#B(lc|iKx;b#G*;2Lym?}@6de$IaF?8Rv}I{jN=5b@Me(0B?X{fn
z;jzz<p9I%#O@_x-|0m-Z*LHK`Oq;7h%w4_J@AX-E7aRXCDl#nVexiSu<@_9M2>n}5
zsi+DQ^{?;#^<f-X9f!KP|Bi+2Z>6M!0pWEv-hYt>&r3OLSPgwR7+$m#qq&KA&yN4e
zehj487GXZh$1u^PaLdE8{ukrB`m?B=Bj<)6@Xep!*ua(bU(6%zvGJnyHlK0|_JugN
zvC~Pz{j@dW{~C;)K3vEXu|0(J&x$>D-5GuRq$#!sUf?P|M%9bDwc))~=KWassAvAu
zG~_}!p1#JG7Q@{iSty=nJkF?nJCS;1PV+~{vBFQ5#+|$D@7+0#8Gl1<Wb9foi%ZjT
ztIQvqR^c1GH@>%dh$s^qUG))<(YM&ry7I5bB~>sSifCh;j~gmXWP6viJTyoo(QGYJ
zr(!pGzE3;jk6&tuGpbK%CEJLct%vVwLXgv16YbbbGIKkJGew33V@q20?@csM*l+|&
zs}aMlyz4lVQaP!7Ge%E(;{Rgfp3SAX`sMbw5DKJEqL4h*N{6`34|ZwIJ>I~d*I<l0
zK|_FU;WU5CwJQ+x_mkSFIKC13Vj5}sI`r{x`}U2|2WSUF&3M12+#k(z+O|i9-gtxj
zZ+TvN+<xsv-u`H9exk}w_^P85#O^HWvS;fH2}@3BT6|kZLs6}WZWF?QG;?3}-8XMC
zyA#oj*y7kL1>|K%&f8-id@tU$d}G;?p;9)6<`2=<n6U6JtjmAfb8AnwWPiA1ufLu@
zDP4M$S{<+f3t=hB&!)f@e`Vs;+>mrz$btq_q;3-<I+7A{UcC1YK=JAAMTQbM;?gBM
z<oj)SSF7;C2WM?In>B7o07B&)#Kv?BI@@N`k-Tm7?*4`x)>cz{!?XT;LcG$0KNS83
ztLZBzWc>;0yXo`ymMx}>KYxo(hHbk@+WcO|FK&rODl-(VcPL2>AU0-Io*01e6dqHS
zyl6E9qc%KU9xzw&OK%ob?tZ&({heYUr$|k9*SB&r<zC8qELn2KGFVWKHf*t7w}FWu
zSSe}SI|JlRM}FFyXgSHSPs*Y%Yq1jZcMXi@ktNL;h(zD3UmJLu?JdKqEiJGdSK2?7
zpx+hnu(Ba-a!E>4C6`Jys9*il)8P?$kUg$=jLXdqk9VO9fkmGTgw`nj?TqdCt^=uR
zhoeZZ8+dB3S%YH-U!tT~aVG<T&(B#G3^isH@tX9bQwpL#b(%7kPd3EP<hwXTV)QSW
zab@|p?lLnup8?e>*k4{SUYMjk6y;;N{9rg(>qt~-vc4kvCQwn0c)+7DiZm?hZH)yH
zpuvjyp1cNzWbjF{6ouLeS`ERca{#4_TUVXi<MqCvZLsGz_9Ip$k0KI8;S#dXyKUv)
zylXOP=Q{uCYF4=_{q2STTJc!6Qru*qIKgsGJBnfa<-;G{N`@A{yTq$DZhw-cvUBQ=
z8^m@zMTLkIZhs8k6@#d(%ATnP+`U%IJ)e`ij+AAxb-CN+1qE4~k%5zN=6Crsouy<S
zP);90{+Z=wND2<2bL#eCdH;b=A)lT&U=A{@24Ne1hxwF`Ce>saBc~n&H;?PV{|9kz
z85CEvt&0*55+Z2u1b2c<fFwY02=4CMXk$SV0t9ym9^BovgG(bp8))3UY22EAEBo!e
z&rY3ts@}VGt8RXfRjX&O9&5}Y-}t62!e79Gvkou$fbHKmYPlD#jihLTojQrjzm3<#
z&5|N|?QH2-!c>pB=4fimJ?Z-i;O$=>a){>ePu<!mn~k3L6nBke#O54q)Gtr4|GF<S
z4c0|3T*-9IapdubG+XjKDMwKmIpU~Lq*zi`kSp7izjmJ7pPqU6^>%7TVAM2ukf1zk
z`QYiWrupDO!vn}UzLJ1wX++d)MG(+a;-09<xJcnIJVI|gjZ*K=*dXYtcX3&IX^9&?
zKcpbN2bRBgTHltkR>b^R*5rp4Q>U)=!-^MHJ7=gD!DghZN6YRrtj^t~Ia7M!@lwxm
z&sFQCL9OY7YHr3_gh0R<ZlN_TZwsB6)l=EoO44(Sns*iQ8DV@4`QX{flQU%yo@Pfy
zwi9+<yZ%s_w7gvbrk40#H$fgWEw(=d${#x;b6-#NDNE2R#T$-`G;>=FjvrWg3RYk)
zo_>N2czjxuH{uoXPI@{_@BTJ<eN@{DVQN#MSp@XUc+I!HTDRDFAMEx|L`sZqRtqLX
z@0{lCIW^|rXL&sF={3Sg&Fh}fKlhB;E$a|Etq!gz21hM`@c{K7Hb=h4k6v(uitDUf
z8q~6-Yqli?dPC$?NvK&i95&ghYdzSvOUt`#gdWq;)Gh<j9n?_@q9e8~uS?Y4CW&Eo
z`{a9l{rE~}^=u_^0PK*Q@Y|vHx}b5Hm21<a89~!(#@;TaSUZ0%H)C^<XO19s*dbau
zCxdL+8Rx@eQhUs^oP4s#d<CS;rOM_MP<DtO8S}@F1ldxKvfdarvORG_kpy?!U}ASp
ziufh(>MI)0e9Zct5#yHcHuu5vTAr9CD>}WQ^(Yw@rP-93tbXSOls9_T;fejK@z*cB
zCARl!0S{`tl4&`+w;FIiC|RQ=eWGc8bsMzO=t@*h?L3**@FWyd@`@^TV&wF$*0vxg
z+#ZPKltp=k6CrlQp%*^s?N+*S<KB><W<vp`Tz4OD%j!P8%79tY1NurAVf|faqSI-j
z<Vuzudr#6NW)SU>j5u{`qM%H5kDb2Qe)#z+i*7K?<9;arVqu*l{>QwDO%>Ae?6Y_f
z4O7@N0_W)&Pfnrp=y{YjlO#pT<N9qu>+`l~!=L1<@zSMm3C~rEz1h5-<CF!@dxhb!
z9z?-%5}ael;QH0x)dTTMY1dyw)X_b9H8EGsKJm$|=iOG;n<9Yo_agYeF>dDD#Vq*X
z&7`>*^0B_!O3oxl-*qA?rE5M0a|(gF5yF>a!zi;py_)1e7)mEKFyPzfR&Sbfj27Tm
zImm;j;js14X94`!8p^)c?Drn3Fd5$<>y}g#^7v2`3HvW1PK<M_a`+ImUZ;BB8Z#Tn
z=fRbvL`vwLr_a`df_ol1Y_R4Y<kcIS?F5IIi2DSr9m#XgUmLZY`HQcA;9)fp%Hh?Q
zH1B1dPMS?r1@Qq#vWsSTl%wsc${XsFBb0Z`GsmuM=X%`&XLT0dd+vr4<2}c9ZQU}t
zS2AC5nz|cFo;-iMihcIr-Sg;-N8q~Xu9Z_7!tzefJ?tOS`=1k5qjcQ}3gdO!tB^Y9
zT&<0Uf>6iG-Obo}PvPZHMC$N6{Mk6m+$^sforHSM8AML00y}C@{KpBvaXrxv6`?Ov
zC|QHLs)l|zr5jfGKdWHDZU$5IyEeb^@fZNB=Os)Gl+vi%W;^Y?e7pJV8wFUN`XJ^B
zjfQ>h=cVhMse=xgoL`QH%JXw+kkqvu=9MpR(j+l_t#uPXbWCA#rYi~Z#>-f99#BkY
z|BLErIs}kft%ul{def=MEG_buU#V9Gc2&tw&jc1S+mhB-CJol5Fo+=4I&Q&XPc{8g
zs9HVPgD+(<n-o4(+0q$&Gd;FewsBJuxey~x)_qe^p`taWb7uS%%LjgEc&fyxQThe6
zeuYi<J;y|M=zc<-ven$iWKR77?fDku+3cCw_zz<XYiX8wDRA$Q>g-dcS#)1+u;l#b
znBsDb@~#u>>AB5e0}3}k7xH|=W;tm9N-*n*8DE+MXzi2x4yj{u%}P!3j1P7E)IB8C
zYYcc#q24>dbhB<a);mS-{?cm*nYfz2@6GCtQ5TSAM^kGoH0nS};YhD;y;=(Z4WwWC
z7_oX&;K>;&<J)6Bhlq3}o1To*x;jg5(&hZE@d#g1yI#dtKnE2nNiRF^HyU-!4f<EI
z_O>TMz71~dQk^hTqPJ^Wc0W{)^Ux&NQ?+gj9*Mg9F^Jx^@J<;PRzQ6Q%!!U^gF*Os
z9$?C~Od6%<B#Na#vtqZ8GF~x7eArr@;EmR<O|!I(IvS;?>!7dBfgySjLII1eIEz&d
z);oobE9yW3U3Md*WATgci_&fcJQVYhP_W!LArR&^tJh1fi`docfh9I=X-}yBUXaN9
zu8F|kYsqQiovjyoy(7gqZX$bm-N0k%YQdW|u1bwkx%Q#~8J|6RPGT+f#XRRHpFPw)
z$#hEyR~aWZ*i;!=k>9awNNoro`ja~roym5$5rdi^Q;Sot9^|OocUJ8)UP?M`)0N1v
z#^Wi<x4dCxW@87nBzGu6K63N==?O)49m#7(2<2`p59=sc|4DssANrFwV5@hL^O^Wh
z#(YUDq^@+qJGe0$r?6vA1473`c#nd)#s245AU({wvnu3)XPcaw5@fF1w`qys?b5KQ
zg_=jq(OF}jXPlP-E8MJfjhsH`^5x4z-r%CZt`*mu`bDW*$qZ_;?H=8a_w4qQKXSs$
z>52D#*~~w^Y_N~fZpLGe?y?tpSDP{kmM)zOL^9o{M*7i09anDJt1z#$S2sr-rAp==
zD%|xgz!So81fmrHxXzIL^V&R5a`;-m9*t&%q2HOd#CBH!!6sI_^2yg3uVhwuHxO-y
z2<nF$L|8Wa(OcE`#q?6OBpo(cU$q`s%We6a=_OOzO!M~Bl1h~=C~%J*-z;ywz%CIF
znV3!cArmbF<w{htng-6AAG0Z=>3gCqaOKV;+=#8QJZSL2rRS;X+$B`Hv@OQ40J|7P
zrJfhDcEmazJP{}82_)D;p5{LETCryL^VB?An#8i+6xA0FL~PWfExZ)pg#^^?U%0h%
zDyN(o`fjd%YdN_ilVE0Iu_Fb7Jf~OTGx_1r5Oj1a$?x13omSMY*YssCB85@pa*4Ju
zm!7aJ%=fX!3{H58VXlod+u{e~cA1co&zobdy=)jjRG28*M=rHGsuv}!g_JHesf>lG
zK>UC>OKkGX34_vYSw^|IoQmwACuM?SXM&xo)#~ZGiD>(dU3-}oRiiuWLDu9NInf7p
zHZ;rPhbSo%@~UFS5vSNcCRe`~zAIJTUR62nihR1MVJ~D;;o#!8GQOB>c`htw-0wL?
zdfs&p%|c{e+T-rCQkNrasO3%`#GdUYtf_D}zcT40Pxzb3y|{hJFGQlm`)1uCi^#&$
zX5_VYLF$#(=|ZC{OpN%&iJM$w#z99sV9(TK2kjOu^IkKASpyTZU-kGRyRs%HDcQCA
z@%ssMR=7O{<+XYH%~C4HTvwrV{5;`~f^y~+3U32EFeO}|@-deBJh>&BRziNFzXeSU
z0^`eGRqsEPqB#`AHCUUt4{WVDw(Dk~E{auHWeY^D7|jN(^8H_Clb#?mQmNAUfO)T&
zuXbX@i6IL0j_&#Cv^k|D3u*OnCs*^T5yJ0(F9rfWuL*#Mz~cNpSQ;b@h-dg2R~j9a
zclLU^8)SL?R@E~M+-^DNx_3F&SF{!twp}&8ya9lDemqS7w}&7bSROBF$V-|<7xUfP
z^^X;XMg(d+3=Vi>4@$ZJL;@sQV?!dvvM3Iy7EAQ@NLKJXk-9kV)j{tQ;g+dQwv1zE
z-PF8Xz`?{1f-*Wu4tnl8dg+^Y)?T1k?|YZ3k`|FAo@}&Up!OYB@-^*4x;o+=(dW^<
zs~kYYEuyVn)@BOw#osjA){e0O1I2^S=JS#)@<d=-)OXZ;q)GUh+Bj!jIF~OYplps?
z*<H+PTcHF0#U`S9E6iv4*@gwNl^`^E;I`<kJm9u^5DecsA1ROLGb5D$#U=dLmA?J|
zv2mY&lMnyC_#cNo{{Igu{rj2!zWBfO@L!>}?ZBedLyg7Wz|&neS5@;98%lR}*nZ=N
zvZUC5Y4%@N=|K-M^i5q|UH4j?9)%LJHB2ZajS|4?D~)HyWXYfW`+fd>;X!eb8}Vy>
z<sRaiwrrtww`AA%<^u2Y9;Jxma}yV%K^yB8a*!8xv3eO*yv*q646tr!(89{jCga6H
zoyPoN#J<i<l?OHYsuXb)!0zoja1~BEbu2`Adunx!hC|Tw6zG1%I}Un@Zfz){2B%&y
z3${YK^|}fAP!S)i+u9vo=K&1<4geBH%g}BgY_A)LoJ-(R3h-Xjv;VhY#v{(GJATz`
zP=o7-kvE5u7$thu$zBHMBqG<B?BffgcYgi;eMlQNf62A6w;R(>1G$Dn+WQU}GSDj;
zkF&pf>0ev>oT3oxi|FNyXZPRp><u&5$4b#u>?etGY8e<j>biF;ldJ4{S(q@jgNC3=
zLFStTaF3;G0~%mOQxD%<CkV9>bg?RWS#AB}dhzIE*{lbx!Nr&cztagn$YY$;ebH;7
z8*y_>xvUrx`ET~^Ul)Dr8=IaOOMt?(8kXi%kRO%0Ohj)n2}Xo08p_J-o0^*L^qF!Z
zs}AgO$=<#-afer4@=;!x@s=1o`X#oAzGHAT04v@_c+cB3oFx}uT!^B8!=TYAnoXu}
zC~xW^V!6t%QihHe+#vo>c4039pFes^xlDPBRGGZvK4Y7}H+QYg3!3e_@sP*Il4t|=
z$V8SR-g498%FBnhZ0dn5PK(r3nFxT58D4g9yxkS>Oz))^IeiI;684gkaZBy3&;C7x
z8FVB3Z9UqMXU`X%;fwjJ_x38A>!7%-E`bB5*x(viO4MAob{+Y4AUjeR_nj{q!}o^L
zC-KkB10;R=eZ$VHS~9`U-w?t#_b@0f_J}U}C~ql9c}rH8GN~=*|J&f~_deZoXJL%P
zDII&$+ZK-zQhQ7HAHc>69L2pp$KK~oOG`!~=iThb0~&kFWkTo3=}6xXAOl|>IXn3$
zkdsi*>UM@ORO~6*PbT;7FA%!<X|xp0MVF}PyVuK3i>>9WM2kM6zUat|`-TWtSd|^r
zoWkn{Q&6y}VL4Fa@K1L;Uq2n2AIJ7H(*kxs@rkmfgKD&$QFR*!r+}K$T?Ac+-4UV=
zmII3>jbs?xE>{oIeC_O><SJj5>;h9D3j58sxAuzIb`Q35oAVQy%jcBHF{WXnzrISJ
z|HgR#Y;1JYb%X7{k7~0qi^A?<Ku*(w8qh=MX<Tv^Zs#RWZ|_|!M^bEl8S%N}S9gYc
zdwA}fQmVtN_!2`?MWvHk^mEvV_cC<StiE-}JUN93Nr^rgDfYkQ+{OVloY&r<M0<?d
zj$4B+55yMBeR9i7Thm0*Bb3XtHsIs+a*@Zs$gk~lJw{*Oo}({ye9i`)R~@h3_+I<o
zt~^NVElU$RMETvbz`^+}GGj{!bG?Waz49hx)0Gm**j$$M>4Q<+&|#RE2z(YbM9uO{
zmfzP-kieJVo`7kbABX#*cqy;=ARUx93qhB&{)=lscdj+wdsK7}PlBzcAdZ^YwpaWW
z-C=$Sm(sUI58~B$q?f|8_+dmNx6*gUFeQOYu8wxLFj4Oc+c?S|gh&LYzg2H(;7ih9
z2oN<1DZlI*_19a%ESW|UJ`)^(GKCyJyZv-N;T|u0Ax!^9N&E-tH--{@e}|vsi<Owr
z2R9Ma_wVg%D)p^tb^DKs>>aF%U$$HAet&(_rUI071*xLm;$Nfn<%nMpejHvMNP{>u
zkV1|hD=}(#YozzzH~ZSAUg84Y3F&*b5qPK|BB4p#{cUu6>%+69+P7jH)6yK%0(SbY
zz5pCteQYEk+V=j`k{I$8cEs}DmlU80rW#UyN0zJQzT=<uopKoIy`WhB#22qJV#xI1
z`|z7bH)ME=3KvfQnhT&7avxP17bzONxahfH1*%J0A%1`EDCclc{&+;K-I84&{=TDq
zc-!iV)3D|Bdjsqrk5$;h*zd0<GG@3Z8|GN7EF^y~;Qx^8Dhd!l<~Jx}T({*~%GYG1
zip4$8;++5mJ$4pDX8q~}2|t4%K0gg>cc%r4ndj?4MTpt)j!~bl3$gb*g{!^Ve3Be;
z5|4uEE9a7YpS_58x{o5~0#OEh%SIu+Y=64;03{YzJm-5~A$@b-&!6%1PWW}DlpOB^
ztX!UQm%GIXK*-`<Ir>agWfUmi<x*-pY*u_m@G<|bVh{T&AQACF#3BL^2|F%A<UPln
zKazQMnaNR`bf5YN`r|h}3Hi^lBNo+xN$-Hp{Ns+Xnubd~Nz7xh>f}gq6My&ppzr1k
zke86KMfg{wxq|5JCnAO3S6|y@4^8Mz9z#83Br^bl?N!r#s~h9*M?m{Wm_f}{sf}U6
z8WNY${PL+vJ?GatmC7^0Ex0S!!V}>za8!YE*zV)pH(%QbS)fT(-u$znFrTOb{y@x5
zeqXW?qT*f(_7r)6o^3G?+78{+;V?d7@-|JMb4Qlm08{f>!SdC|N<QvaAL(d{;tFc^
zdUu5a3fJ<38h-)6@CLBC?#fV~z0h`dv#;%E4XG-FM#FEgyLIv7om3Ux%5%(V4K-)U
zJck?;FB@)7xLM$@xMlph?{n$_xCG4On_)qAvVI9e7eYXqe=eLck>h=Ca)dlBu8E8X
zQ|U;zSue*A{=nexPX0g#V}3L*-?Pk$0+@@jQF?RSrNr+cjW2f}8`)!&uRfzYSyrB;
zlKUq58@EiBW7f@T#o<oUuTF%Z%vyY~OyYQRS6F_uiDSA-l9b}IgPZOY^ohdEYqI^&
z^6u$(b5rVUOXqcljlqn|Uo#c<RG;*)_j>5tWB5|>nE3(&5Sqg^CP42I@AracA~Bh&
zo%85k7vUon-ahEx3p2X8twXxF^wa^S^)u}VNpCT{%l6f<;SsZ%7(di?D)HVb0j8%8
zNBrG9-PEZAzQhNO?{5z9z6Att_P9&jEW{dSDBVIMpPkh9fRN++l+>;6^S!He5z?;6
z0uGs6J5hif&+;qKr6MK+@d>o<PgA`I$)+aI>#5*f+;4mZ42D>gE1182T~3^60Nu3o
zWsT@@9ktx3lHvFr0R#K}gp_Irwf>M>TCP2aTztS4SGifkKsE3*qQ25K<CJ1hn(n|C
zDTw<#>_lPv-s6X}uW<5(cbBEueK^J9B98$p??Q+^u$V~8>L70a1nWBc613=y1+AkW
z%LzX8(y1KR8fFQS$ej}x{2c6Rgm-lzHm@=e^4?cAHE|e!ff<N;E^`lEFlUjS9gp>U
zGZQCZM^$DxJ_u>KTw)jj8+Be>==E#Kru?<ud^mr{(Ztnf{;5b9nBibN^h21zD<G!I
zz~r9r%|<TzFeqfKo)!W+c2I78MYewrPRpOI%bx`E_~B4+Ok1Ck#6ZIo-fef#iEMb5
zg`vWhTKj~SY`c9|@FNGmv(Rk-emu5ghRl{h%<UKKp*U=y2zs?;BWxjHJRJO$KRnev
zq<~v}$V<NQUG6@{vXt+fo2xc45DZD{<-gr|7z%VK;0qfkUUS4@8e#S9VLd@e8G!fb
zfgiRuET|B73n5)<QKLmY{bA<;X`cNE3lp3kW)*=;93Q>j9A^r@>XSuJVfxD6ha)0A
zD`tbnZxan#Y)?xL_~X+T`mUT@dfPM{Q471~x*9C;7f{G8apN4e%}J~0Ih)8~o54!9
z?<L>(!>46hts$Zptxu1iqAv<}Ch8K*kd=^I1_Q6dlu0YUe}JlM9=D?}Pj7`<q{pA4
zcgkCW0<Z}WqtPdulV-Ane8>5em-l=SPj9KcQFxe!`YL!D`%N+H(8Qw<Vc6jokel()
zH%acI2w!#^jq;UF$TekN7Xp1diCz<0^!_oTK;q)g`{c>#O+O_{1epKW*L%B+#}#-v
zZQvUR`g|<(d0EyVt?Kh`g1oNVKF{a#WrPY#Lc?Zl`9>4)e8r3#Zc+D>HyA#<e4^sV
zSphfmy`s{YpO&7KLsG(bVWTj<Pr_(m$|wZ&`;n!!wef8W-RH$k#apOvxk4|CQzkdN
z)_aiKb#6r9Kj3Izs+k2*tQN`|W58~acCcP=h4{awx3##`FSs5&9TUC?P#oGp`jkO1
z9v=c5B18hyTB|%K2ds8QP`@?}4woQAoy5*l?4RT*ZXR)ffQhZ@EUPBIs8L2}HbYis
zW_sJLk1lN88{ti)!w1T0&GwHNX8SDq3%qB2(G%!-6hMTAQbDWS(d-b?k>kt3i$olQ
z@BTp%4u`*^cNTrmk&0D5C_$hmDHg;yTt%NSE}%j|oBoh15vF)7QST!w<^G8{o1<pP
zDKPu~3~7PhXaBPS9H`;Xwh4<ULG&s5njwwkKEr>jBN8h!uJ8C^pb=k2kRXvm=mcz1
zEzx^5Wid1Nbp0H9{JHI&mn1zUuo_C#=Er%<HF=3%YY6(p0^KSoe>mCfCXWepPC*8D
z`~mi9+j(248nebB5&9ZrZWpz-wY3%J#lgYf<$FzuwvqWEdOa?>au{ub0nGB1Nnq9M
zFhHI5VoU?GR)ZqRRw1{Ekq(>t^~iJdMZZ>XVqfD$Z836(&o7o{{PY=^AKAayC&N*<
zBJy?5iPK;775w}j%kE`<pp<rCE@p}wi^D~SL*JLN6Fa6_QlyRgazc~TdW3P(c7|C-
z;)2Ai);j?*(}J6cn$sSID1%eaRAu)5(awfkeOnXFl)A>esSQ(s6=X+O<+wdpsEz&W
zeU_Dr!(L?e-Gm1K2L#X;hK<r#LFrVBF68!wL3_Uy^T<YK8lHJ6vEF9t4?1AljKx(u
z>Mr=H>=gENPZJ*GNoZOnR(Rozb9rU>vJXQdsH%U=9)OG=(6ZQg{@|Dq_gHrQA)YXs
zFrtyD(^D~D)iCnrM&dCw9wiN(84hIm-E()EuNFU=R#(R0hc+6Y+-}A-jXVY%#p)wQ
z)j+`8GZfyuVZr0bj`R2GR{oBo)caTo5$g2_2CtyDsjlwXh(+BoNgS0+3e|O&C>>&b
zg8MW);B7+amJoEkC2HrzZlP=$?e~zAcprBpwZoTq61SDfGTL|{8S4`&!o*Rr`}YeU
z#F2tAqF#EE(0|Y8#F?iHgtetyMT|tV^q5JQG+rc`#if8_2FYIugNf;%SDLCsoAMf0
zuy$K2V)=_j=}TEH)Y3TjN^Ro4r+a@?Cx0>GYSbk8W<837Co*~x@w06en<i8R!fON4
z{H~YynuOt5?a)G;f^>6m&k11ytd$ThGe~Tn(~{A)ViM*dnGXWJ1e@CGtAUQQTb4&}
zx_v2P&ORrP>GU^*T9IDAc%BG5NsFJYjY}ze9SwTL1IW~Jp`xPyI65{@84@ou?@0o`
z&25^4mgS69_?s@@5B|UmCRVh0PM6%r+8TOtSWOd>@9_1l>#=Z|$10Vgx0wj_d_RAx
zj-t`%5RD8u4OVHx8N<|y=L<ODs<6{tCxQ?4cgn#}^!G`uy&$2hifMH0j@RZ{2TC2;
z-k*}vtb&r2HO5lsW$mo@Yh3j?K1kk@hN<H|Pu?DPpHl)=vsfWlKlioEa)f^D(d(&u
z0BLXkf`0#|;jOQ&TA`j|t`D|-qnoxCxU6`htaGI77wdT}Maqvts>PP4-}`XihLA@n
zM%uB+9GCYL>nNJ+=HIc@RL-;tmc-Yt5mwQeZ=0wIm6`vzBh>blAi=So?W=7}fiwO%
zUVb4}?RS}d!LXFPy7Np%WYvxezs-h;li8NyFP7HXDV{d}z_o|AURQhg-rbaAyk~0s
z*?TL71<p%+2{i7eEGwhBQr&}}lp6ei*NbsXVSLbvqmkZN(A!4*RUA(rMaOiSQUJQX
zF^f%1+ptJ@Q+Gr0M+TjPcFsHRon8e<@Wqe_$yNRS3dj$)X<SQ~<4ISnUGfy;x9>WG
z2W|@|85K|mXt;3H$=V}}rQ@2V)^r@^-=x-JS5#qkWTSpWzwq&7Y&3IIft?b}Ir-MI
zS=nM+sNtP;XlLEt9WAU@R66~jNn<&zKXP={oL3r}#cAS$T}u6QDP{Zoe5a&4AxvK6
zD?6@1G9!xR1xslwfm~OCTU^Vclk|#XrZ_mclHfdw0UW!JzB84ZY<+o`V>*l#LR3E{
zE04~eT<^a0;!yu8Ug;`>Dx7GDrK%1aTvBvsd2gM<#G<XzQMJ6LNI%<+jeYhaC-Kwb
zr5OvJVY!~4UdC%ORwIS(@%8v*@N)qk8ku<-Y<jxa@lFovyvw$l-_8#m8HV2mOkfKt
znnZ(%s_bqbRqmhEoUZ?jJh<t^Nyu01Z2kRZzLvK*hH#NA+!D!li!zAwpe;+I-Atp|
z%y7SbvJpi!D(<mh|C(OR8>W%r>o1U_3rXUreFi6`<|0Xazj8c}IX!XaQ%_Q-+#EsG
ziZ8?xP2oWT{Z=v^ou9{;v-&RZ#_(<rT)Z`z2cK|WTyImuvE}*RgO{nau2Is@-p<@H
z5^ppjeea1GY)slehYUW;jq{(^jcm?+i4ImD{5_~b-p0Xe<@U-3nD4>cXx#HmNtF&!
z?`(TP`DQRR&j4mw{niR8#kOYH|IE=58djz^($nKMs^|Xw4dNo~w6~F|O8H`jKsC88
zmP|;_bMG)WVyuKz3&k7L7;74sUXTDk8mc#bBoT9I(hwiK{V?>n?3EmE%uAn#^<$EC
zqZa*nuQNg8M-$?{MTvR=8G2s`;`?z?_+xAb1hSPZ*l46WRXS~Lg*#<*Y);$muF8rj
z4{W(4EpmJ0d0n{Q`3lr3BHql%d?c>Q*ca(|g%zocm0wTGBK<2U8Sk?<&642^hf6dZ
zx%*<*N($729k|Ube04)Q_@<H8?YRJ>wO5-zp6(4}0z1BF>kpYuTr$v)!9&Gj=<(3?
zFLBdHzN&Vz_Zj&1e$G{d16g3@!X&4PB9t$#z+kN!4VRyE$n9Y+<hIW*TV<Ypf5$=|
z4{Q=>*&sE+sl<je?->(77}GuVay9Yusd`5&>KpyTSMJFDH9|&CdoCMN1pPkS_Yq+-
zvw{<r`R*XhtV%2RoGz%kFM!&tldC&ahp@y^<9F1;2h-0FLlp-#=~JOMC(2Gx1KGOs
zM>L_VD(RJtEP(QBiy2v3z692dEZs+XACHIjvp4!DFCbPEyehIA<~gcVaLT>;W^Q|>
zfMe`ANh>Vvl5DdT&ofyrGie^CaE$(%K7y&h&pG-#q)U2m4i478<p+Lm`uHaI8N{5x
zg!v$I8$jqaL!|HyJt0aGt{=B|m3Bijdgh2jzv=bXT~g94^Y&~KMvNI;F}4OEMl>Tl
zH}8@z{&M^H`ifnCygM}Sj^DdN)r|~Ml-3_Odv|GRnfhj{mwlw?cB&3+zj)4+z^)@B
zX#oo}0+!lzSK`qZ`=fPX-L9#OMxNJKz>ce2z;dJP)@;1*`6spJBoCYe>rovk4c40k
zpI1~emjAutXec<AbrPadh#yMGy0$f1jR5_7*%8GeX>a&Qp=$aT3hS%ihqiZ|0?xg)
zeYT-7Pwu-GzflORLEZb2iaCuNf5oaw!n>N#l*GMzB`~AO&3_mom?G&k8V3JX=;2;l
zT#<!EhZy9?;8<nLqpQ3Nb;~Qg$C2O<YY6_9*1_t6(W*%ZP)`ppmG9%zDfCrZTT0pG
zu~mk|QfG9Kh*B2keGxxqk}_TDdfzhpgo|P%*}%WG1hd%e@y7v`VVl=%(k}uUDyOR2
z^)*A|<&BRBI<&|zwk#5sCQI}4&b>6JCK^;%*Gvxqc9eEEkEPx(fcQ-Nkgvzs>004#
z!*CsOFpeaP&8$6YR?R(Jr`?e%`RXmPkVGR=MmbxO$~VHT*u;e0%8%nU2@}%TOM0A_
zMNxZiVD=_-C<&F}tR4P@bd0cm0`8*vI^7#d+2So7qk!ky&!gI*)bMuaj>`S*0C!hX
zhZZB{geM!Ep5OiY`j=oOcd2;5&*(4EGtI{r1v_5oRfMxSl!oTjyBBBPi$q!v2<~ez
zVI62ET9>NZwaz2#K)$C_@Y#jgZvEm?JCM!%C#Cmy_@0>(DMjgl?_x^#r4Nd1-rPRM
z4U?vN7h|-td+C1Yz9o=xjPhxVO%iZ!J5E9rrZH5LQCz|!l~OwD6~FVEHgXckAgFsQ
zr3J^0m|Mj(3zL*4<C;`oNe8}yj(*HMX><Ap&0&&eQyE?$#~)%=sVR`*WSPM&zq{X4
zFcf{hv$k%T$V3fgzGk6Y*;(aKiQCxAim-}Z=T;#ND}H=X;?^t{DA@QU+{R?b5N~rz
zQ$ez5Ojj@d;DEW)LFP1-G&X77y4dgv<E8q~Gud96=0n@>cKig={b~#*hI3$OeCDSV
zea_Bxr>o;!6FjDb&g^#^&je5poE{$0szLW_C!KQ?;R)t4MXVS#@{u35y-lZ|3*3{4
z$<{W9s|sAZd~%<^Go$u?1~>Bcf^NXrbw8UvEu+ua;7oN|$8teUeNI0XMr#>a?Qw*1
z(8Q?g+2l8D)fX3<@$Gu26!38q>z^7|EWaB@;^h15XcM!;1?!WFMr6=qS>E+I1wyi!
zyxwYva<LfQiTGRx>&RKr4NU8DMwXeLpqL0f9n!$p3&U9ExDnmCHNi~TCYqz4ZcOly
zYfL}he5z}d3N-7p&0wPLWNw}#@=DOgEq`68XDR?Mz>QQfhvQ37E4?hcD%TknPq6qd
zm=h5XkJhp<pi*WK%iCio#2|@*adQ`ji#@tqC&Ha$hz*l>*AtL&JB_oOR<1C+w#TgN
zEOxNAc)%4n0A{h|*{m9o(A&l`4OgCxK40gs#zwN_Yh<4II6Vw8yMEf+V@o=`y2*%s
zE*-O65kF5)We}i35WqH>kSl1tiXT62z|g5jEeTEzJHeNTL=ks6<w7%fK3E0F(vYJ&
zra5q4+Nqd<2ckMg>6+KLOc{nI_0r&SrPb_Yr7`0fWp7tq3)9)X1RHUlAd{KG>B<;}
zq6&G-^kY~qUk6}9vjwK>Oj8sB#vaK8$_iS`fNeZ8ld#NiqV&ko@gKY3(g~f+k_9K&
zaAyMRamn<fAt&NI-Gu^9p#U?VWo}C8{i09WAn8C`9ZtBX*+xyY9i?wdh`STPsp%v@
zm3Ij|%x47b)1n-VRmwSukMq(gsc=9(g(Z=?b@;YuNnh>0TR~1b<u`Y7W%kI-<-)wF
zNlojY;6kn3rlx`YK2-+gyeSp)Pf)Nsfy}NJ+9H{T$hE#0Rry^jXBIPTWCSj3N{Sq@
zg8~+y&6l){l^X|Z`#DzKIj}>lKB})chwna-R`?E{S$j<SF(6DeF$~wvG|vm4mzJ7I
z;_VDp?rGG^!ZH-u?ImM})j*$Ibhd_<L>+#g{nvgTvAk6C<k|cPCG~+QPm_*Orp9}4
zQxaSrm!_Q*W9y%=-(U!<ih8JGu^>+UpJrAL#bLki)D{Ixb^_;K%eDjkdp`s(O}_fE
zDNe$3%s#IUBJ9i>3a4+(>nD^Uq2W0$%3UeUie~$fj`NHORe}KoxIGzzEE;(o3v9ha
z1rEthE(5Bc?TP>&q^Kfu-9&|P*AKGUnMts0JLC>;_%~TK#BJ;lo@>3qXH_ZmHEA8n
zOA2J`<F4%oSsHw@-oC_*q?UbICUTVYpmZnqr14YALKT_*%!W)ML0S^4Me+wb&%SLp
zs26ZlwTUyC!G)3q4XsXl*cTb*t9K!){mWHx(Rk|Mkqc~urA)-vm;Mty$U6Mf{z19u
znKyb>w6eLqvpWR^&8?}KfxfATy?F)oM#WhJu=)ePyyYl(bmra!QpOFoFf>NV_O@=T
z%X(6S9X@1N5Cs3ERdb6-Oo@d?19nJJwO26^rnuWmCug3(toLPb`AF5aytNI)3L?o9
zQ*sk}FpI0wley}4Ta#*sKS)*-Hx^!rXr+-36#D|sUMrkseHoa_CcC{)J+Mct+<$8#
zYRD-!V_x*tE~sCpt@4b7shd$U5?lqnsR}JJBTdLUkh_G3tHZrVdUH*Q1H-_DoX%rr
z52!6pwV+Y2f|9cKYJz24UE^pjUrXYDk1LQWR$E+j{0>w79iuK#l9SS$g>m@f<|}pb
z^?=RJWOeg`r-c=;HRf+p?8UK*N|Tc%)Z*`~Zp4;@&zFZM2<CctDQpA6D19_}!Tr;~
zAe_=IeU2@WElEU;B9gu=O#;+fd`z$KyY>9Dd?+D=S}{a2Q7!{KaQ7E!SxRXkk^E@0
zY(im9jhxs+n2!<>PGF7Kh`w}fz_<OX2n}exw5&>by0EI6T5dLN?#_5~e%pB($Z9-q
zXto7dV4eBNF2cT^uoY#$*C1rytd(H49UGoeB)d7>bThU+TjdH&pJRS;<8rJXuc5k?
z`3Nx%J>CFzc~m7*Y?f+l6!1iaOIV;Z?_vDMzh8}EcYop8&NS)i)Y&V#kkAtZi=RNz
zmudEpQ#w82U71|>w@i$z;~{D`#&4VrJk;vvJ!21A_RR?>Sx2v-8kS?<93A&nRaJL_
z^KXk%?0;nIfZFue*9km*XC5VUdMNvQZ<<=z^mG&TbKvN4dKa-J`P{%ryCr2$vMJGW
zF)rO^T-@HX5|!RW`VMVuElZ!hn>>nd4<U+HBMT_c9d=(>E*INW>Wl_s_o|YSQRVH)
zCS?29m(_Z!X+S6D-s@QMWTNUkJR)`%rrEcrb92+Th8zW=H%P{Lca*KL!{WSC^Sq$8
za*>6ByELiDf_q>j$A7#(1_sR|ER2+PNm7wfy+!wm1@G%K$^y4V7~&c{hlaEnX6hSM
zR4ni^F)^K9;Q88Dpwlw;ry_l?jvMS`7tucz=FZ54PRE0MpV#m2yOHXnDhe7j-4?G)
zAD;Uj1T-F>bj*3shtHgB6WU$5K&4H1Ker&`0zL;FW1mbTGWg9D{*vYQ?2q8T?_gl0
zieO>f`Ky)G2YigXf8B}t#(;7EuRBzjN*Ir+1ZKdyd)WW^fhCPa79NHlmoafK_AySj
z$Cc&5*?&IWuln5`D!;W}EbssP*2Rkz!e1CM8nPvWy*s}D`3b*d<z1jX*!YXSO6h&6
zQOf`PFb3m+ABOf*tn>f8c}UyA>i3^NiGlI|`Gx-TIT-I=WMMq}>y96d_#LXh?qEoN
zzx(d*J26pQqnH@)IJr7_oG!3vf+PdUgBPu|oUYIMqi8LDeu|kd;I)IRhtt}f%`iny
z_=deKiW)bd;?jLJGwNybb@atcCkA5HQc*W(O!1B%*)26E_3dmMt=dPs{im9OBz0{Z
zWRN1UN2XdnGhp#L8nGDi0;QaqcX{K~)O(50M0&}JoP!3Yo2roFoOh~qcF=-YsE@#n
zq^bDbtG6$k2iYHVRCzDQ3Y+E+y?o0AcY#)9#n>^kC5<?TJX|jkU}26U5yh1@i{YJ7
zW;kP*2$7mKo+(ODy$bM-Svw)<1H02ei^j{10@ll}<RTmz6|QtEZMC{5Iaze-L|r`Z
ziN5cvQ^T*T?F2E}!iIQ~F6|iEX-doM;)!DJhI?)EG*+lF8t@w!6$`$><mQsBP%TUL
zcltJ9oiS1x_VP*JONJ<#m!q?=WFx6Sg1CI+ltHN>%+b0*DcAjLgN6?=rdolY*A*nF
z^FFkhl(WMVUY<_&1Ps{JORz{hjfa-uQh0&*rzF?GNzqLs<i(s+opm;=DN$wtYmtP-
zT@a<b<ublJ!5WQBOEXk2yiVqIUEP(<qRxv~)(_Sq@udS*)bDNfn3@L`uayZ<S^N|a
zoUGaG7U$e6Aj^EwSajZN$4AZMboCVaoGsCa935t+`Kqz_yze<%s4#Bk&&-=NMyyLR
zM@P2BR{|esIkVo+RGDf#vNiw#wc8Jez1G2sophD#OM)sWO1Z138X@-}JGbcZepG3b
z-NWo$tdBBg#kZRvpXr!!Zq?^@m#`m3Qqctl-EEQhl{c&lzOs%05;%gPwRmR)UaL7h
zjOZFx$#A|rwiywo*N0Bmk+o+#X&}PwurB2$<M;`}yL`=EzR9eY34HQuv4fU9?u*c@
zLeaO`-}?h21&Vj*2tx*nuw>f5<J&dCw9PEb<vLk|lC&)hT=~lvU_Z@d-$SS9GIGCG
zlzyK-%7_#jRlZzPD)?g4U}~ZHUH~QGq=pFg0)bcu9m410h0H!$n@-n`yx}5=g1S|l
z&7!)FlgNdA(gne|RV-p=LBdCBf@b7ir7!DfHRRLD+Q(*(wYp-|HWHtQ$z+;rRu?Fy
z4~&?8L<u6b$Lj0~nW%~zK7uEc41|25S#@ruzJz#FPR<b3-W1{ND@~N4t7W9SrqX}r
zGBHQPo+@2ZUsl=4J}*u`&!sUZu0?E?C8M0KB5OK!xn=(t*u&7(I>E1!W+Vsfy4jQZ
z)q><frrLM<>j*HDi$r$fjC_`L2^nIb4l-3{#6`mF0ys01Usm5_D!){hGQYqcNh)A%
z<}B!2#$p`fhEdhoN|gA?+$CM-XmLHerhb6+aTi^}OHL8`ueks(Dspb3O7X{)UGIu;
z$tZIu=X_Vou9$DqH&EX97+fDoTrau4d_5!3K~Qxspm&4TW-`4Vo2K(Qk<G`VwG*O&
zunS|j^*}1893vU`Y97y!L+#w8m#Mga%-*cWf)k{~P`sz|oI$ccj?MSW;3tWjYxRi)
zJ@S}%12&;)?hI1fnJShHrmsRIEoH&lK59W{&^ppv+hn3D(}7P~6A5Y~WDd^N0EqI3
z6NDrTsP86G77&!SvK0$xZqDY`I%ptbxjyCK8L+USpC?6=B#mBNib!r1<P_i(pQYmC
zCKtWkQ9i-;1yd`tT2x4VJeh7}HtrYHJ`Tvw)4QfHpAxR0zOC3>v7U*)oxSX@HiSRF
zq?TL}VO9^qWiW=JlaF=AdU<M(gWK<a{)hutE5f9-KJ=<F-8`X8u=CtWJC`vx?_$q~
zyc;&WSsdDCk1~SSE~3(YJ!gt-HizIaGtAVLe9ugNk*}M6v6)J($)s`GHru;9P1d5e
za^11n@!};|^<&L4!uxd)lc~mqnPh3Tpx886^?9-O`j;Ur@ems6Ov@zejz?xZ4nu10
zw)Sn@cbre27k_M%YO@t8vMd%7oz(YgE_UmG&IHnPCIPFOH+)Dhns;0j`k|De3v|TC
zV$-cI4>8&FfwW4*!)zt7Hm4w*&Qgc1=_}_t)=u0$z1Hgs9#vq|sh`SVwNK!hGcPAD
zWm?7V2tkAr5Rl8T`ka1LSq~qepb4x{+(MQLEuLkuUS-DOG$P9-k0w4tW?PuP0WGOm
zuZWS$-dMFX=~>PM3P{qW688D_Oe}q}E+l_)PE^g{M8wfqi*5`)e}nkwjz8w`<A3I&
z;e~FEL4oaCiE<Hm4vGhk<=2?j#G^5e;h?bOjr*t3aO(ax6}7NROkBde^hZw{ol>lS
zemB{mngFAUlhAJ}i~#si42-3{AB9F9DmI{f0tU8EHOV$C|7_V0+A45loz0lZYW+Lm
z`q90=L-o%_|NQTNktm-%@cX+B%_}Sn9nGM#{gWj=gc799`$d{e>j4B|FHbDWJ8JcJ
zV}&nwe-S36r<X-j2Qtz(i)E=4!VWXVCQ4;D#q<9bhyMWpN-S9qnm26(L4ySk&&&0I
zm3C-U65(KhM-6mfOas*TUbx9Ews&<jv-8sPwmZ>v$#B7{TS{45yQmF(UAi;hoS&+Y
zk4r9CvZ?Q`6@<G9+jiUY$UY5kM*RMR&;YRY-Hwvm!LO`s2g`M5LMg_D0}++(izHaz
zB;p>Y{#*9VhX5w*yWu0+^^Y&6m2UMqotV_nD@@!R(pbn8i^&-qtGaIvbFZ5}6afkX
zO4QP;_(V0!1siPqT9%<?kH$6+#QYUf(YH*BW~_i%niEG1jqF4W7$qk|45N!|tX*!q
zkR*bp4rzDaMfPUzZtAjTj1bk&@5+z18bDMs_Ge4a6WBC*=g9Qlo_?6WKK%#699@fi
z@|T#YvJ4ouuP-kjRH|*Z=!5p_zTUN)ughL=A2-_ha@|(u<9#g^vtZEnp`2Zl818$j
zreQrdJ$eQ46)D%TZ&FATa-Hx;pNT#2f1jI|w{su_Df035VkW7rvFlYpp(sasLUxr^
zRCum;l1TV}LT2ls%JSigh?aU)M9aC4F0PiAohyvIzQ`hy0_&mw)kUm)$j9s_5pVFZ
zwv#LjCZ_C!Oeu@;aUO^@q1&2yEDe3_7OF~>-uwgdPqgVJ+^&lHmW@@-uJlFnepUuS
z>gQb2I-)Bg@CkeD&adxzUPK#oM<nKih2hAzI#h9vjjL&?O)hEvr2?v%q+o?5=+2dZ
z2J?&Ht$5f*@lxM&J;xcG^$pYB?iy*?ahN?s0$#WO+nK$k#E^qQ`l{Uxk~=p?CVV`n
z+scjLd9nr5s9*F{*=vx->#^$l`Q^$1^-vg9ZxQEqQBjsAESn~{(PNC$`THE$1-A*j
zo%I03HfD3Ys9LTs92c`9V6^~LPLTv>(jf#hxj8wh>Vt<yVztNB^N!tKr!N7E>q+DL
zSU#A-9+Yk0YMr)frxPuu>H|UN?Ei(~-&L7B_;p{a(KBtM?WTxk+i>9(>O|D%XS@Hy
zqKX23!q|j_yr=4Aqs-;Qk#;E}`>)qGHnt!Xm+@E;6J_5w<&oBBlU%3uz7sty^)5*X
zEs(eB-`qIZ6(LvMog{qVunkD1RE*1ZSZI7oY_{m_Hv4RY)p2F&6i{JZp{&A9Ci;x9
zf~aSkT(7OVP0h_2L?Q1MZ+i4dxGNwm(5Zw{_=)exTS~W-r9Y-a%I94a$>&q_ay)+u
z-Uds_I+$;=zWDtrncpZifkiLxu#)$$Zu(i91?=nrcqiF4!vE3J{yIA|bMuK{ox_cl
z1~g)^JTd|wzb6DuV+B7WZ@TQUiH}Vy&03kg`vG1z$;K?3EgAj^R+a{org2G*EvSW7
zucDHYx9T_Njd&UdLh*xxIPwC%S-hWbf<HegX@Gr_OR3=w7Y5ac=tAB@eD)kq_s4bk
zblt*>ii%HHWNQPWVFq5|%B}js$9tjlL#r!=L_5h^T(r`ZWs5&G^mTr3z4=S}cU1sK
zKxPOB`v<sXequO4!)u|+_4axvC$YZY;r3byoLpS0s=6%s<Aq)aZDRh9a6bW2LePcZ
zVA(-DZ}oya18#0!(3H0k=FE#DcJ6~x2>4*s>TXNhIUbA&c#bX1Uieto&e32Y^G{dR
z1mch|Nbihhr{wALYaK6Y<^V4U=&NtL#JdT+^P{*U%UCOcR$9B^f*d%j?mg=2Z0b*q
z?2Cpep7gO(#*S<Xts?)5u)i83?*5`t(bLOr5eqs!q=r~?Y?J~1z98(i1))UA+vff{
z?g<MCMPA+v^c6I(7@EvGpOf#5wx;-?d-DN4!h80O+T<iB4z<0G{(@2XOJU_0ZjZ71
zjk;`lLS6}gmoT#(L<jk1WuR`9*2#4wM(gYdh$z(@kxD>B5`%c0Y&Mx*skXKmF29U?
z`j>KtUGXvKlI}e^cIgkZ+L_I*#w8yv=gZR=p_r|;0F_*fY`Y9>J9Ks96B1bUEq&RU
zs+BE?V&me<R#4LSwr@lVhEoXWj=a6j5`ouVhZ%$SXR}AMKlX8iP45KF2#~AlE4`BT
zJP1fUox^7DB|q*&cmw_?9*MdQo^BA_*qFHH|B<I<LHd_WHFSSu;v;O!MVj<(wO>Or
zB|;ZRmv8Pq*;>@Fuh%Ib@>PjhD@IzqgS3%|9=vVTWBWsGyrodo6$)MO)Yw!^{pp3c
zOeSRYkxa*r1lXbdxCefGx_mk+?8#Yfi7YD8R!M<4Wvyl)4&t!`VJ}kh<DQOsZl5_~
z@7*sTOQa5>&=(>?^47sm|A$cdXk>S{3j1$0E_PV5>u#;dCY!6}qCOsdm>L!F#Oza1
zYZsxv#E%MNz>wnv-V$<?vABJv0~(vGE>jKZ&5jx4EY+-~Bjh@d*KmcG@@)^p?bVT^
zFIpj=1L)+5iC_v661nN_JLOH=fdWmZ8@p=y`kCj6-ai>*>3OEz6u@=nx}T$gP)IwT
zzZ0{qVsaf4nXn}(daIRfOjB4@Bm<}wa~`}K?zz?O1suFs>wpx}K5HNNC`t%|Gy}!1
z&D`{xKfy?@Q`P7Fm-L(C#kdTE-&^3n#HV`sGR9VVXFAs+mUmAY$!q0>;lS3vKOrCR
zz#~}UAM903Lei|#LVBWX%-u(c4BFlt?Jxgn@4sf6FG2+9=OJ5@7AZBN)zUhD(OnoA
ztY7cG`|Ia0V*Zyn=P&>HpBRd$7Te}CgQ~i}Ogi;}g)>hoyimuP=Ltsd^pJjr1#yh&
zM04c1z@H!HcdpC#4?B^QR@Mg7zudmT_!mE;paI}ILc+B_Ffhsiz=KGUM}NLx(JuD{
znxg$m8VjF@L3@aY@Xt*Q48Ju#mx%_#{YV{*zhCixAcOwjZ~gP{|1Nj(zpbj9m>(Y9
zxqw|<zexL7JYHy0M&Z4NW@Ik5%ToDE)rYjJaxQl_lcn)`tp+af;!{$n>!GF80Rh^O
zWxQ6@aEvD~)2kNFZFHmgkI;hZ#5?V|*%S56Dbo|&{X;@3^7@d}gO>b!u5R7ha{qWS
zjEE)74zgO8!n1x8tX<{On4WPhlg3Mlo-cP<`t@G;?E32ZZ8Us;s-}?0a=ob55nc{}
zI68T<^17s?qf77EW;e*~>5Z<Ms%lSnZhrk}!*P%0Nx}0SAahNX=#@jz$1|~KNjs-I
ze_|jIt%Z?m2B*j#dB^MOYN4$S(lRxKzVJG@SylD3DnJWoyYBLJb$f5Z&uxH^<~qAr
z>701GSs&SJM=EF<RhGn~LhkHrs+DkryT03<xB*~+OfHi1icEjk%UrIBLX|Rra+Z;S
zo-B-+T^V-oXv(!me+cL>((@Wp^SnZgq@KL_OQn8!U08nf7gsR*KVV2rU}k2v9^ogl
z&>*dluL2us)CM9`^Mc7k+^zDMG3C?kc@_Km=JvWLoxk_AxD<u*`!^Lofx$uJA_wAJ
zd5ZDl%fCrMY!P<N0IWNWXJ%IcqCBhTF1g(zGAZy0c?m-J#1E(QyU^{QyxJsCJ(2Im
zo@Bn7&IwS**#My2o!@j3Jl%hEpf-{|e>xm<&U3yF!^e5(3QIGX{Q2eHim{6F+oF=9
zv5sy;7fPqqTu)cGzD&<$QP{12*<owl;~0o5%>xI&^%s4X^m9F1^MwzN=_6qNi!GAd
zc4IZ*c}Y*uwo~K0z4z$`DYs+CbVFMri(VO!7P$HK|8fIM+!74;!CbC9tB#Ax**fo_
zKkRZp;wjmpTYvX##d54*FAk}orXAdT2Yq{)deE?ME}zb=FPm6eP>%uz4R)sAXe1g4
z>V+=)lq(6SX}}n)C0xf(olgL&Sfu)kGzpjWH?U8g&axCuU1>{;@(AQ2WWm7a?c&ak
zwR{FRnofbZc+y5B%+A-l{R5C4GX-4-C#j#J*Wyq>TKiz~F>)e^(z^t6b-1|N7r<9L
z6K^#B<hq1%O(6fbo{6EZh@RT)1jN>Q_YP~?%E*KaK$%W<c0s8%BiMOzG0^o11O-~f
zmKM@_eYP94cwP6$sDau-sMN81@i5rQsp|CVYe1EqPE3vUsi5%DVWcZinrK&pz4tB%
zil;1x_lf~_K0w>yLo}^6<Ycf!JpUgG>}RGGD~p?@+!q+*G8Jm*`6rb18<)8A(;5`*
zHIqB82A~bJ_fa`e8#rE*HlYyT95)NIs+5^E$|nbz*(M93J4snEO3KQn&Z?aBI8Blh
zyel}%epS2*w6eUcQcDKH1<6}WzdE}`j{%c51}-kK`6lpaOfGb_PV~G(*43ELel}}+
zoJ|?Vrl-)HG18c-Fz<k}CFlQ<-|BS^)4})=KdBxA<p1YzLaYE)b^WrF6JEzkib%<B
zHA%%8&_~av*U{gZn2U|!V8l$ZC3jItT~ei8IJp}G{vUiF7Rt=TrcU+jleV@tFmzs3
z<A(5gLeBa9?l1PFSJ`wlf)D;^LQ306GOtKjk5<~sPIAD`Ro3C-KP@ckbHjs*oI9v%
zCTd|lVBkSB{{mt9$HxHaWMjbl3&kR>Uj3`c3DRAs#*^6_zW9XXBA^r;FF2cLOI*;r
zNC3thT=)01!iRUosRN(J8xaGeKyXGAXXh$f>D*j1i-|;<d8gyk0Vz<nMY^{Vt|Usa
zTt35Iou6zfER;QD9t_)+SF%^AJEQCi4V6c_1Be9(xTSxFvC8>!7kKOK78!x^(?+3T
z=x9rcdd-2q#(UF(eyFi>l|tC5VQg&7Wyw%K&HK?YZlOu<Tr9<3X%0g^m;i(JbW8sl
z`xtPAaFLB;INBZ<NQyb>?~zaDc1)#HO#VrWU@yl*fSYSU=sVt3NEP{14s`^ZUf#{k
z*VwaV7eO<49U%mjZr_H~mQV5glF){3w+fTnnaOTuDlSshQga=cHWGWpOcHK=oKTuS
z*}M5igFvCSz3-N|1Jp=Q$bOeEGytPzo<0Z6O_zbYCtJQ~IRvYRL1)ktHEHQW!j?t`
zwy9M|oBxNi_ke10>$-(wKcc{~015&Y5a~*lj-p`b9qB4P^co0N5fBm30MdJ}p?8SN
z0R*HIA~p0D0#ZT=?e3iNeuww_?-+Odi~-3=p1Pm4SDACJ4LKkXd6_<2#g%d?j8v`;
z;Er}}mbKBya<66%;u1zi$H*u6Eafto9$fu^Z(3|3GQ9Fl=f4h*?n)H!`uV%->Yx^Q
z4j53_Zm!p3@qZ9Te5J`d)w&YWKwizT;c(3De$jn-MjA>OHiIJN6%-h}e}D<VewD(i
zvzCa=P!Y2I+&$k9Cc#oD{NW8`2A#2w7AOpb4Lp~$Ae2q2^6%vo#*K^oJ%U`CGx=wn
zl8idd^<a5kl1J>*fmGKsCYu5+r?WaaxHu#WC0W`8l*SOi6nGz|d7!u-n(cqO^ArRw
zr<g3edY`bf)buG;I77Yp!S4*hwu4I*%uCtSM~|g;eZws=ORn^M?kOuZ_%Ko1xmm`~
zIA&E~Gb=TFRGFDsi^MY*RN`85TO@o&J!;*yxI!7kWF?c0^R-c0N}(ua13>=Sz8|Tw
zhqsp;%R;0zUYwC;L=&y++(LQB$@qbMN4uci^1{zbWMAn|TR(={C<k5L8p-D+^3gwH
zvH(bwTToE1cF^f*`?~SqOGB+&7gxi5cv4!gWqQ~KqJiR!3|lBvNeNRZY{;A3jiNI~
zRHzYrdMUAn=~RS~MQpf)Py0``J3)9dkO5pA7R=#+KhB$wLsWmEFz#m6NM>e-_8+g3
z5u~H3u!B}~l8&-+C9tP#H(gy0?)-z1&PEbvoLlc1(umYyCbxd&;`<KX-trKh9%}T_
z$0YZ4D}sny{#-xfHu=zb7;8+Sz7DzaZ1lkrK)qIHgqW?Ut}b04Mw`gWlGt1SYreb&
z^Mz3~ffu?onro6i6-3RDR%y$Rc_1tY_ae^yMB5qGeJ8kCgRDmdv7ZkR1}oiI09VOt
zX=w@Pb#v=iS$7N);>4}ql$H*2T%{0Rde~9Ql0JD5_;Nvi8<*}zDBVQO&$m!4DP!L{
zn#=;PGR^F*<GZ)POstYRQX|(NNS7mTIDQRxz(Ko54&7F|HAeAASu9qa!$dORg13YQ
zlTUfQ1$y~&x&4t3R2|*ezzQ|8e@{;Rg}iHZlWq1jC!akb*QGeg!?)sgRdWXX_z$hN
zBb5{(?}@#EDw}n!M;T+DTd?7$boaQq1$nwWY|L6LGaLhA|Mrc4y*N}Nea?I-D5#{D
zw0SpJw3qW$Yg^fD-l9$ugbuazKzu+A>fVju1OcD@pAP`GL^?ztY0}U{l@Ndv_jO;6
zrWbLF8P~5Z_&R;((>q!ogd91;2&7^hSP(b%vP$yVN|x$d)-_hU=d!MKAURKcIx?a^
z_3Ht-0w>}}<m`D6`pmXF8ND@Ly&W4y;WS^pi|mt)$RGwds-lNSkjaRcizTX1zpAzQ
ztp6ihT%$&RmQe~wG&xs$UJCvZKNZEo$j$zVg}5-T1nR=N?d6aVF^8KfXXGd_B__&=
zLR~JSfa)DU*)x+j0bb{Wt$lnfI~1fhFc<yfqaj2djkYVT)}(<80JpT$sVgYphc&HI
zFEnI!M1D#tvLE-XGBCbpS_jzvTAwaV$q^aQq+;fI$9Wh53;9WP<A+8v$TNvqApFmA
zYFiz>_6KFRz*A*4hn*UL8Wif4DVuaNux2;)=9qABa$@ds7mj<w$9+P&82IHmwF`p_
zTi{vRMMn8%%mp9z-~b_Iv6}B1)2l()a#g0)KbAnaL|C;*9e})nhDy$5=2<F>)u2vM
zPfYKf=~rZ|@YRkg$b$Z`!+TD=08=*y%q}`Qypw1YrqVVMtzpvQjrq|bXy&vp9~SsB
zM`hIrJXTjb_R+<kOP}N@Lg*3ij*UqQl=XnZj|bFwk%?oE03L<i{HCz6bk#zz1?V%>
zx9toC5d*(TtQ$-99v+;d`Ry#GL0^aLr<3i*IdZqQ(6O=C^~zS75F<mC-r1zB#lp6T
zy9;Bi9m>P=;E7}5%!m}_Tcd5^u_%xo-eY4Et2{a0wmjUR)1PgaF`8gQLrc?IiF4{F
zlBXRo??_{ZCpyNiW}}Xvpnc-wD~%+>7JQWaZX+WJ2~@Sq$VFsEW{0HZChTpp{^nL4
zX_!#8Bz1JK2ga~){{&k0B17IS|8ubBcEJ*wYv(|?;p9a>@!VPop}7)vT~HldFk)*r
z`W0dK;6$NbgR;h5m4Sop@j?*SKlLvT%xtjTyK~17RWEc@-a1yNqXIILXBu6}Ij7F8
zU!@2VF#UDY6+is6L#H6GY}gc)G^VAdn>#g?o=rTCn|_IvCjs4uhd9&N8YZ4ND5g*j
z>#XgUJR}!TRsd`OvC3+~?~so%i{e5EIXfZFrB$&)r&|-*Mk*^1N(X2B^07WSu{s8M
zz;!2=LL6GV3<0EACL2!QyKuw*Ys$EfMQ{;%D^9lAe!Nr(AYtU#DYn|cJmS-PW?lj5
z<m}`kBVh|{l1CfzaDEB($Jlt4F-~#`TNTb?HAV_ou7fxLw?q-}jVy18bH&g&i6nAx
zb7JuqXj_;ppomRhAC$Fu=YWTa?BbiY2v9JQ9S|^(^2zB?N87`VAv#TUy}V!m>n?nJ
zL?L%uRL>wy>WIkfShZujT<v?*%H~QoN4w=~N|L^BSaQn?#8+w@$Y3Ui6ndhZ>Cf`}
zKL+`bA&^BMDeAU3ltJmKUavH4BxBwkmkUA;cBs*6gSKIP=?!h$q``K>syKj|JME8R
zCKUCu2L?0`wy@y~Knwyhjl3t>{<O%E+;{}#wYUWV^1*Lf1u(|uZ4GoSW}K;PvrQx^
z|8ri4A_WB{iT|)fhvQ}Q_v&5~?gUa|1}<cmTk_2330g~Pf;k@*g%Rq_Etdyon<ITQ
z2c+&SqBQ@xa`GGF+fZZ3&yNl)1t3+qlGq~h5)<btiYR}A{Sg<tpsB~$FW~xh21<G{
z3)B4^wcyfdupHR#8&S{u@1vxAN&s-X&2^#;Coif8ZR8Gp`m{M4^Lf$X`=Uepeah=l
zRR*gLq}kAW+1sTB3s4hdUz1N~w^p%_W4%4f?QzOLT+{PLn2t>}ikG?Rp;3&*z_xPF
z>A1RMYLf{r7Vi@ibma7kw_ZTS$9JMU%(>RUNz|Y-QfuSBY5Plgsw$aJ&z^sJR{ixS
zCA0lNV7PGQ3pZn`=4LtK9~!<&F7#k6<9}bc*ZGRPlV2*BeKG<<a3X|h#eT)wDKxK>
zs9`A=K9cjq$-?&Fe29?6G5N|Dk%duoH-UY5v_k*u3sqE?@5ToJQnyW7GP-`G3cAzF
z4&m;;A9f@bJ#tKZ6ZgDNr;!X!=vNUQ<_js_S;XIDtw?f&SbTURaqOysRUh1KyRYBc
z)T7sOEO&%F6y#|6_oif#4Al3+#FK^b!jPg^*Gwnpve`9vP4|pvtG>nZcqtMK?Z!0o
zLA)q@=^hkhxISp(pp9_LwGoBQp(2Tr>g2;6PYxc*Jhr3t@M_7Hu<N-%LIy?eU{+lp
zqEnu>{8`QX=Zw#ou0oQsHbDK^Dcq&z;ZdWF(nYoBagfOpVFNLRI=59V509r@Zn#5#
zgva<v4{^KMIdgm(`#ea$?0eJEyXo=@9hV>e!&&}yVCm9y$S;A1iOPoFm|0P$3{|xE
zKKs+#m!N&;I0x_X-;tq-igH|=ba;M_Y7UA7fUX=E!2#cc%^-Sy#M^OidtHTW>Xcx@
zX#FiP_TWc)UB2yey2d%Edhe$~<dMUhEB_RtWx|O4NC1wV{L+8lZ2h#ESp|l_yWuM5
z;oc}6`f4QD>?Q>I<dJ5`(UOLqUKWsyx!H~g>AZ*Nsp>-2(lp3N_scNLid#1ye1<$Z
z%lcm1IrN_~%I*LVRa=Hab{NI!lK{R?;umXlW&`9;PEK`oF#oYc7~(Zld~(9-`;Q;0
zq20Iz_vjASf`#sMkV9AxKaJ4R)6Q~b_8I-4`}c*D&kKGUFW!TQQ_j*vEsf>NrB8bj
zmy3>eCS|~L1xh!@A7U9;+;gtvBlk<GX1Y!fZ@7lJ)*2u)RPVWL1s^}V9wP-rMdszp
zPwMLGAPuo63^!~?t^tNIU21>TSeaf!S2q*X#jcfA_R*OR4-2N!SuGNd&3Q-)k?%^P
zG@ovXQz59y1n`{E9RkzLwO>!|b=?2ZZ@2O6<tHcSQUKIl!HuV!08%@I_U`e!(c{$y
zck@;CnLU@6KDIkHli7P!pXA<ej?u*C_#R1;^S>lt$D~2;>dEq9gX#rCz%V?<)z4FT
z7dwUSzxI<2yOzZ?R<{&8Zv@Nw^w$vm^B2i@#xL877)Rvb5Y5XsH_8k76Zp_1<Ju3m
zG5tJ;5gGK!Dk|c;!iHjdoFGJp%({pIJRp^>tN7S-gnWtizt#3RBg_yXWYv^()jpyU
zn;^b&^}6{wj&i7y>x|56z+8K=&$umReS1^?H;WDJP$b;5@9toQzV=tRS|}Ma@!{`F
zn^-pVYEFn)G2UD%QyMm2*J%(u(NTFh@o|JD_w+xP|JQ2*{TV+bD0R{_EHC$3vixP;
z|FenSZ{TY#&b((VkqZJ{@+_B;W<YlLKi`9QUb+vjBz~#6eL46ZeJEf|+8ko|LY>h<
zeWlz(Y6_n$p9obQ{bRp>?&!xGWzGrtK^Q2ki974(d~(&9{ISEFjmo*}i?PUz@?~Yd
zKEi2>e_UHz_7DZ+!iDv?d@IPa>QUt}_hhoE`IWuM3Y=9c{K#_V{e`QAbM$}1+0yF_
zqA-JyE%*v<{%^$k&(nfG1CLJJ{OA2&um7*&gX2E(|G#YTmqLa<mIu_mFVgxS-f2rX
z&YuVXMJW0S$X#F(n-ku_%F`cMoL92fzjzyC1$7K#|BBlGxdE2zK)mn}s~9V;2Ff^d
zEluquG2D@3T0it1S)@x&&g6V{89+2k={l4W-^yuk?;w^XOU3yu>qP^22)0vv3juH2
zE3pL?1R$c*&r;H-Xxis-<Z}?)T2#p3sHv{@=o+A*VL;I!==S5D>&hq;WG`iU73O2m
zy&C`}V?z9r)cu_4$-JYScpwvWu}`qfJkP`~jH^R^Jk_gVr%xO>jR67wv;Vnl@TQ05
z`H5c_K*E*^U|%zHQ_Icomx7hg6^@!kD`SBWmP}$JGeB`nYD4>~fb^}u$k4NVp%AWV
zU_eIR`V8Ttdh<IBsN{4`p7_+!(F9qdMO>C$%_2hgv*GF?sAm@%Jy(!TlRZF)Y&|&z
zj~YvDAQ$)YL`1iH!oRNt634^|N&2$rl4qPOsdxBQo~*2H31ut{7q*jk(Np^V%;luT
ztypK2aUM4R^}!w)m@pZEc}ddH(BPB18&f1aw-`ripC29iuUe9YA3vO(X)c~MSwnm7
zs~|JTc`8U1#m$n*Jn%4x29*Rc0rwt1(LLT6J}6JL26?-@oSaA&!dR8D)c$ytpw*On
zLzT-=EOj-FeD@D|7m6<;tT)bcs%j)I2+VA`n~5@>?`XbHnOV^miPI^xWa*napqJf9
z=J&>Ey_eOuvte7$$Y>|5v1F`1DdkqcXwh^Ix5kPGo%BimmO7DpDqa_+SNm;J^|6yx
z?!4*4tD0X{t+#w6qvJmF1VwZ-KMB!O+s)&+k{D>fQ)R%;e!i_)fKgtd+wxN*ERVxX
zx0iNN0X9^USf<W5BcFRv5_?BNrt;1M?DNzYs$Zo$8d?Y01SMo9rKKkwa+>#)*RtAY
z?GBxFZ!^t*re~n14Sh0`pDXV|k7NmIVz<|0z-a1xx0`4!>Fh)G=MI!_niKA*VW+tt
z2qna!vhtqJGcPq6JnOe^E14Y7du5dM09=$f`#5cP^Q;pmt6<^K9&|wPKGr<Ax6j#-
zjkIq+9yfnNZ{z3hs|M9`^5Eo^Q6Fn)qDRuN+QZ7WKIM3%f4;^5OT2(e&-x-P$`<qR
zWkLP;RqvhN)}@}*hO-wJurHRuT+0o<(#kF&VHgz^HQeBdGV<C5SO{aXrDNS#?Q9la
zAjMazqVtC;U{eQuHysVWM!&pPJ6UTj;IwnU!Ztp)wA884piKD!T>~Q)$IRrZ31HY9
zLHr;du>F_{YYUKi6z)0r1HlC>ai2saq%#!N$tH@rY2N(3CoD0Uo!)jlVjrkE$O85D
z&Q5J}*+76|mG2a8m%qZqRadY2&g;uV*A{$}z4W$mo>jGQd|p)kAURCgxU8HaZDXh#
z_iVY1OXLA)|H@`hhI%|3pqcn!w&cN92B}cnqt0|G9gn90OuMu8EM$;{jJ=Sd3Bb{S
z%6PcW9-VZ!$G^TChW6M7A_idHzWrBJBn!I|#jIMIH9!Wq*=g@B>?Sr2C|Vf!ggnDw
zFt6d=A)bTj*m)Ys<|nOh&zqRVWoZ|PAMTv?e{`@Kp@Iz_D}yJ4dN^~u0Rh_!OJuXH
zy5)`79yp`Xg>Bcej`}p7H6HU}(*=x>mNt{G^GJgQ3Y7Du!{$k9=)$3oYF92No^()V
z7ePF!tMo!H2t<yRgsAuTU-=x;5ta73sl~E8tdoP29k%9^$OA9it{_HK<mKH12EG(A
zk{Kil$H#tbyuv=QZM7knA4lv6<oBQ=;3|~lsF*D)6X^5haE~~|^kP05=q-upYC1gg
zgxlGH$$?E@y=i7%VPA5nieAd2upKHE?X|jB=6%j}xsU%=O@_OM{<8CSHeRQ4;fabu
z7{P$X%`|VgUke#I=$0|6UG;Y6Q*xE@xosPjXE=3RHTBg)8P}O#K8zAb+|&VlWn4}q
zu~x`aD%d?iA7*3u{Hbnmr9J6>uhlc8`bdWLm!4kIW;%xA%mMeHy{x$y<u#!!-Mwc6
zM0bS`l-ZTe8-cWmzB$y+x0Fz<t$`ZW!5*<K-)(S@3_Ulk*9`bZ;HtRSoxa#|@|ELu
zUWQ0@8N{>7Ds;OhE^MGZtdq`JcACi4J5R>tJ8w(Ae{%bLtiDz9n3}{MrQM+9WkLIa
z%O)~~&XcYKSF6wbI^XhvJ{S0HGT)&hY_A3AjFt~fx@;?b*ZT!F+B;k}Epz91pJ#yY
z`X|1g5;Xml^85p!IJ*xwf<%3%+-U$Vb1=)PrLLX_m?CxaxwWP3RY?~#PyLJ1t>v-o
zU3;#+)O|0*gTMMI^U3{`t6aKw#fH8={_XeQ_K?!gO)qI)eRSFpRai;X*3YjTCzVwY
zE4RkMN%U<ihMNcRMXq#oss=yd-x_(LH*ei)Gwo(6J2?F9;>Ghr&#qlI?ryz9P4w#C
zPUzn4Vxi`16q%39*B|Y){r>Udm}fV9W(`s9I8Ey@%GBBDm33G7*U?9_KYpbmCJtBG
zA3S&zFEkHxUeTt!)?<7}?xMAjxz6>6Hx0LaR>j_jg#l5;-sbFkEsD~!iwsp`1&>bk
z#}lc`hFoylE66s}zBpHga@V~uDoi6BcLFmoz#+liiIU*v)(Q;`r59P>r2h7H$Nj0C
z+~P!PQ)A_-&-_T7s2P&n1Ty<RO4McNoowihqPinOJa-YV#pBAN1n*e4mGN{?ckg`j
z#d256R^i-20K!(W)5O|7R%vL``k@cNIAe4C3<5U|XJ_3on5u)<CW?yX^+u>LI=ZT%
zZ3DYH;$5wuoSVbwab(-*devdkm?zDR8_U~=9rVJo=4HBd+bj6Z(PL4wmPrhfWLok(
z@_}Va<?1LP1<@}fz=33EKXmtHb=LTJ1C%*0F!1cbWN@}uU46s)Fhiqi49ggzpD3^R
ztY?qeYq@GVnOk6hZYR!lQGk*mE|<;-p=8WuyoldusB0{Y<<D9o!0Gln8AjM-c^mg@
zwF`CG7B^R1rdz}NYboQs4rsX(c$@Aou1QHOipdU|@-2Q!7S9<Nm~oSeu!>e3tS(8{
zch{aXw+*6V?qyCUg;cKooV6aQ+xA=8T}cEdbzu~D`2z*kI!oAE5lg{IY`r!1l!JBi
z-cxN!d0qOP%c;g-35QO;{jI6O4DXh?EyZ~8hOk>CkfDSnW@`-D*At7DmX<X2v{$V6
zhE#kqQe*uOjMDo}J`6^v90BjBBzDp$kH{d5NBR18VSV=9g8|zlYDU%8GwyvZPJ$D6
z0EKdhs3E%W+1CP7U!Nl7mlqhgc|MIl&@cTp>>&?T?gK9Ojt-~T6`z_{Q6M;>YMjlj
zfA-uWxeU?ixr})g;)>f=ccFdMt9wc68c1S1PO9h%8}=n%EMTMdpR6N@%;^1!`thu3
zO(f~=(${k<8b&^^0ih79TzD*{^^<agh+DPK+z1E^-1+9C9IYU?z8)o#vHRc)u$ePw
z&MXe^I-8XUnBD8Xcm!1v8;XjG-U~Ql**((%Dthj^hHt$4Y)~;cLGAhT7;q|51+$dV
z)CI7qT$zxN5C`AwZ&+Ho5xIkYlGS|CSR-UCmjr`Z4_xBUGEM_GCA>#*{=8MmQS9VH
zD2J#|{o=yUm>6^K1|yV_57F4ima)+L=(P+51x4S&r#mdr>6Q=LfWz4$h^5SAu$(z}
zP62tA@<s{?5oGMgjh@Ic-sy&w<kXMylnoUO3`Ckc6NbyGhDdi!!y_;-%v4#JwNZt6
zVMbQ^2mLGb^l5Q%mBtZxx-K09n*vG0t*7zc`c51<)d&KDFj|q_d3febtETa)>Z`}~
zjn4Co6lDQ|p3DmB@=nOK^z^IDaFS*9uj2d7t<CI`lFCB*IVcGp9&I(XPx8tuBe{ad
zrj2D7BSfX0UQ9MrIs2Q9kW##;E|N?--12V{l=MsxvKyYl?~&r|If0+i>1<Tg$W|n~
zjLq>WYAxf{j{`3I_L`a*TPq{Dr36Pu@2gj?6j~guXFD5&YxS-ThI_e+Ci2CrtbP}^
zoPa?4O1`C4(v1Zkk%+6VJI{>Dw%C~VltK=;X+z8nMU}3!Sg=31Wpp<}H}K`Xd*6o(
zP{4uqngskg+S79eK-Gbv7jk(gG;gOR+agUv%ZLSnX4ej-kVrwh^1*_!lz{<LMhUkc
zRSaY7r~3Br>$>~`;Jm<lwaq#yz$i)*JB`M@<{L5^gULeV-Hjmq0<a)#J%;4}#(&1r
z(z2ra<JaqP9*Y8uZ5`O`Wba-W&bj8GQeRWcFg@DHr=hcv?9s}25{oR7Js*j=E%#x7
z{SRjL6MOundNjzBjI9+Xx%GBx`c;MXwvn;EmTQI2-d-yipwic$x5hP^L?3PkXM?(-
z?|(fBNQyG^5rb*P67bWb%ZRQWE;QtYcY_StbNM{*#RdRw58z1_l+n05yP)EGm=D%b
zWhonCiD+X9Zym8I;1?7UHLKEDh(#ZiHRE^B$c+>n6ez|&mWg*Wbgmn-9UqChedqS(
z#Hno#E^$Tc!6K}4H@d(i;OYFu3zZq^6?C+;x%Hz!8;~{a)32^kJ`>TcY2ftO%`+g-
z%p5kdmH0k9r2jMb$o@(LY-_gG?yERqyi5h4|IbI<mIo3zQ9Yv~Wg~{GxxdDNyHUbm
zI^pYLYQv<pMx$_@yVIlb4lT`Fy@v4dTXHh~NrxR}gd)7+b^o&*CaN06>#^B~J<Fsr
z-`$8Uf>a9b@bEmC`3vtmcrbI^`W38IiyjSohe6~B0#?}CE4Q4i@R1WreyFx2M%q^7
z4v=L9<NfHEX%n#1r~Do~=%tr{y&WtX$q5v4(-4c+XgzBD)o0xqHhr+$&7?l}@hjiO
z^dTyz5%_Z77Gs?NY{mg7J`-!V;zSV3YuV&!2A<*ND^vQ3RpsT1ZBTD51NFXFeg*s#
zAz=#0%Tv>}ruAF3c9Vkd!^5%NzPp?>(FqF!*AjCZ7GvfvQa)Qn6&K&P1!@3w4R<~e
zSJP#~K7zPAND%YhK7Hnl)z|%+_OV(+-xbqhWu!g4<=3zK#xIKQ+@Pcc9>{Jjj$1%;
z+xN&WAYf>7fZumptf>0Vdj)yG{-6Y{ller*9&9lRJFZzX#sghS9#nbs=#i*H%UTCA
zGFm4kZMu1BL^xIB_@rN8njiNm2#OK)NaINsk%4>fxe<ed7YDBCnzZJWl&Dz`+FH8V
zg052vYU=)-Wyi)!5J}AV5`A32D5)EIH+-|n4i6gx(y6KGY0$SCO$w#+m94hztfl0O
zRU{2}3Jb(z?|)D(oc%R<CoivP5Rm(zI$g954<V%Q><pd?o=>-b^=8zrii0EH%hsR&
zY=!&&YK=?QR>L0EJuWWAh`V9YV@9dpB3jR_j`*qu2cM@;c%WKQQ88R~<hPpi;n6A6
z3O%5SAXxKXl*wgpBNS;N!6Qzotra!&Xf^|GdA)8h35&|g0&Y*ObJ*YCp9t)poGkI8
zw*o=j#iZ7Dk^LO+XOztrkkhlGu=(KqKxM?__xk;lKr%(yd1F~P5c5h+(WRWD+uin9
zcsgwqjyur<v!#e>Zf#xkgnKKX>JPK1m_++hAMl-}9GJYb%e`{5+HLIYpBWoSK>|bL
zX=u&+w^tf9Fc?9gu*z=l=n#a>{Py&zENu5Zrg0Ah7D~$N38@UU+&v#}T?do2C~}4+
z^u_}Nl_R1q8AT^2dO-XUcVAKgc=O>J>u@gGkU;9r$cD7XU>jyeA0`kelf4@aF3fdf
zx_$QV*tU&jrA*umWfFW$TUOIFU4bq6cAK*6TE^Z=jY8elvKGt<Df{8t-KvrP!V3GJ
z<+{RS#j00I%OazVC#-ShtU6d@VPg}yaF|=<`lma)P7moL`6U$q4apR)pDq`s>i)z%
zXwU#4r_IdV7~+L_JYBeWhEjZ0UQv<qsdE|m;3+eCZ0Zz_`z}9E1^CJ&*1Y2R=9hzQ
zz=msyxvgl&bY`8|m?srsF&L}G>}24iWy89=kSxSioC-2BR^&NYQ#>ypwc#mMDF<93
zP+Rnu=SH}8kHKO1N=A?q#Ua{Qp32HDgjV4QJ?_UGK&=i?SSPVReE1yhb)fBC?-C0#
zHtfpYV;Pxp>+K|3)a)$X!V{}lzH=7e?_b0W4laUnwD*m5(cs%<M_~FoTCP!&dh5C4
z9;>Ht-az$@*hKs0kJ#0yf#702@NSZWvkE_*nM`ssh{%vSn$_)07M2saQYyH*#5wCA
zu;A`kGFXTkSe*ZLc#b(A%qzf20kkM9r(m=QkkTEX4cr?;lgLyhh-%!*0=`+m#xjeP
zc{hK;G8Lpm*T)>jDoURZ9|_?t2~km696TbaAaaSiFKqT!uqj98fx}e)l+>$%h2kV!
ziA2O<rcQ;!p+RTD11+;zH(TIT#NeyFr?zqZsjj@bwTf)=$_sHd&Y<-nm(Mh(pxz*V
zu)uy}ykU>pa(JNaeF$M%Y)Y8Zx&RpPN3z{1f@)%;F1w(9ln$b(vWjA$HN?XYkZo7u
z2aWXxCm|^;2e=Rv`9KkDvz9aOUOLqpa7d_Sq4y6&Y)8v8o2HM=nFYxP4v2fZRvE8`
zx_a8UJHEk<c(i4>U5Pkol{yH(`7(>UKM$8U)Qq3j*4AFkwKyC%M{m>mW`N%TSn}S{
zK`SH7c)iU+5Bzox<9y(b@kH_M{SL&+ZnT(YNdGL%u@JqR+!|kT(0z1Rg4q?*+w1l@
zH0G|9+WghsEz(*`WT>z^XfY>Ppy@#H9GKfPab6L!?HU_|C4Vl!6+^2=tDlh{Qq{Nz
z8dIxR>e&BT$XYBqJ4teT&uJ}4p&S%3iS%^;wz4k)CKN6w<~uyYwjQL*!sb3$(UsGt
z+yTW5@G_Z>CbDO0ul{>`h=eNHD#6uie(g1R&2!2UQ@84y^7E%;9G?hVPv)lhxO20w
z=Z?X@cajz(?%eSUr5z^MUEerS`57}$T3h&HLA_(fck#MxcbhDtfytxZsK!9qLn28!
z+MntkpojZZq;x(C-XkA0&`~tTqI@$V<$}xh&c6@+#baUZ#FsKgNo2N;k*@}k7)CW_
z@j&0K^112emz40AF{VN+5g3-}5%#JMngwLf3DENNC`Dmm_h%~gzmH@52t3_tw*y)4
zU%h=)I19cHObmx&5ihTm#uTo*i!YykhYI9}pIGktQ~LeSo_-;!-6Hc_H?~%!Qb29i
zKwOzp23dTYP07~IFC4qltuIDV_|tk8*Rs((bH}v2en83=@d_mPy<LW$UDg*PdZ(Q~
z`XELSer|V)|Dlb^r}0Dmj(ml@sIiK;*kF5{<?x$(_WqcOLV8Y-2b#A$J87qI{X*Dc
z75XUMcn@pe@JJ@`V&5I~PN?xnM({L{0ikq=M5-YIjb5WuDA0+_<F^Y(?15-^k+j(D
zi?L~xI+(<gEZg|Jch<7U%5VR3!+*X?t2}`eUr|Ty!ARw8dX2A?;=S=pOT?M}?)Fy?
z@>LYe@6MUxvlwW<Pg?f0bk@+N=pEnPp1k|kvRDcyDkif6FO*wSQi5q@LdkjTo<U~#
zBZX|*F}%KqCZXL-3XtCyCb#TxR~Ih(g~*$j$fPKIWQmHpnwK_8U%Z<bN-0|G84t_P
zKgoxBd2LKm$I#Zbs@su=?LCF2p&D)R?x|*iV+X+vj=z!UL5(^4tHB}bP1D}VE&hea
z4QvebH^wAeV=1Us``6MR(YYuK%)5!5G6ed8cB3L$IdADXJBI<oVkx-r!zG^Y;uc-o
zQAdflS&Zf~9zxaI))LhZO8g=xbI`@8gCUR%tW&RiPM5rovuF*UZ8OpvC2X9Y5;4xH
zm>f;$8k7=i__zeicYU!fB*ObEnZY`bqwZkdJLQne>(*dkB}P;0Z0%{u3yWvzoNO&e
z0(K0tB=VMLMF)zf-(c0vc+oedR>$ltq?7p3@n>2B4huT$S0Ydz_J=o8gi(}-Jb}2H
zFy@X3?DYBXqqAnLu!Xs9?o-5?-hk%}aRwK0a)z%VFW79x>$MiGa`o=}h8g)zk5S=_
z)znhIzx!pEbMZ@nDKF=Fm-7(4-xrQv<B14OHJ>R?{?!0&W1VzTZJcy;FDG$kTUANx
z>6f{c!qqluNeu%pdWIXYi#_KyfXW<uSW_nq*3efJig=t3V}*vj=c}=9%<FE-&!h1C
zVvfqsRX6MmptLc#*Wh8h@DNPKk5a3miIfdn^WOGG45ESEk86#{Z+KO~9j?Q_U3iV_
zY($(S#%<4S>wzzZX}{y@7#N8s_Ch+w3c*c}HC<ge3I3yVa~eo|24{k&qkd<`rg|#f
zjd*2ZqGj~;NcA{@>cVv5i}G)v;+0eQ+-9Vfi>K?fuRyikXJ-|GlFR7FThGUi4p#5|
zpv{}*95hIQ9iLham*gODBo>dEqW-)TUxiBpa*_PrMVpf3PTIU6X3}fmGMDg-qK;{r
zM_z|CfF<g|w$3`WnZlBq%b#<Bf=xL(e`N6ZK^J0wgCqrY){F=ck->3rA`Jk?jYGUa
zB~R>WLfg0|IC%zq`(^`nHbP;sNpVQ7ru&)(UTF&WWBd<f)9c0wuOtwYyjk!5!6{N>
zpWXVsLy7}b>(~3Ub}CeUv;hUVx{=Y>-OuNb&BX<?pC^8alS|RSP<??1Z2D1JT8Z=~
zdtrv4>rDmwBe8HH=NS!6v}KQ<i_(v_zCfj#M5bc_1k&Sq39@|BzH#F<fYN-7OvE>^
z^DY}o7X-o4K=BTP!75&1L>q`*Q#^T>kDa_H{u;6#IUvm1pm2_g>O7%utybgNJMJD<
z7m=WJo*q_6gWfB3DH=LQabfa)E}z>5by63dS)gJ2W=`0dbJhR2rH8)JhToN50U6$3
zT7ZTe44%1>_$s=~VyUkeaSzpcW#MwIroH|#mk&u1Iq8QKA0MZQ!Q0}GKNtZyh$?b~
z0G>Cc%XPzVcw+mk)0x_f$1Y!5_zvVA(0zZObvs-8foIZ5Y}V|Y^s!AAh+MCECSFat
z)yz27nSrnbVqOeFS*Q=cw)6NygBhF5vCXCNwcL=Hm;qE>;90+rlT!2KZ&NnvzdG{W
zPGf<-T!Zr6-cbnsd!HV)F<J;05641L>WP~Zmm^4P&VMzW`?t+$<`k+UXCA7tz$wut
zf8W}|u54IGP09u_!P)Nk-TG0VkpBCif4?AZp7cBZ$)@z_KYP^w=hKDPzx~(hH9e;;
zLH=Qx|Jp9)+-=A|t@3|;B2E3|zZZL9dI!1rKZhm1llc9F^d`kdQKPz4^>*EBo^))i
zr0shqpM}@b(Z?=5<;-VkI){>2A=lEC9a15=+2j=UVR^G8+-q7}sm`?Y5RXM;C+~RJ
zhFO&jZC1pibo9`g&CShurFnX1&d3o-@v94e1ossgee>8G=~ww~3KL-<eU(!HO1f3!
ziT8SRSFdL1@@HFvW+ETL#(ZaQWZM-QMw-l8k|?97%M;%E{W#EhuuppW<SnO#S7m8S
zXz$0b{^jO`vj@{hMmDa>`j$C|_}*l84i0q{i32#u)|TKB0N1J-cWF3F^-M?gqZTWh
zJaJM^p~9{y%iN^-k0<!85j!>Xz}0}aZ*wwc-qQM(KLMo^iO7&Fp-cu*NO$k%j2fj1
zT4VA4;h%*VcysO3?t&T(r2X>B%6(Evo@yX9wsvH+?QHOF<r(v8a*YGS2YOA$miUhi
zpTo;b5OIaU>IWR0ygD|?k^o7|(k;$U52WlIs9LR7`t<3NC^8!5${@rp${`;c>reb8
zL%krybP$!7Z(W;1D$;oFQm+e9B6mSB_`&qx&9nSw58{c3Q{i5=+g{tsp^OspKof~w
z62wQw#H6h0nh?{Oy^N36I*dE;L=wqJ$T_em&O4T6I^k<;nK$U{GAkR>t2Zxs?sBxf
zs{K4hNg>+1XgE$!DK;V9-gg5J%JJS6iR|Q;F!@rZ>LVsy{{qnJTnAe51qGao8-4If
zWPWjJ3C{`z&Bi_Cn`mZb+g$52YriI-dI6Ql2JL+8Cyc9p7AwOyY@pydS2@C24&Z1|
zc7z@`Lx)edAns>Dd;y)?iygC^f)6a}o$i_@>Ogt`HV&SV_Ooa2lDt=*g5KR;I5&X)
z069>5ugMpw3F0bzeD7wDS-HYi^@HKMRlfwDXjEZZnmjL>6exM*6o4ws%94%luGNfS
zQ|bYu<4pjG>1%SS#!6Hg@RcBy3C(u^Tysd*q{3&~T&ev!ZkYtI+Q<Q6HPC#N*Jw~8
zk1Pj>>cNUmI7r)beNKGA(;B2tOuUfT+KC`D8z>18kv2LYn#ngeE<yy6ak%ij{QL+O
ziT$2awKj7f%ujp)%WHKb<HDXw(p<KIdd?d)m?!$U<F*H+3X|8@FARGf)80XNE}rzW
zhyemxatR8q3d|!8e&=5vp;a#q4ai8sG;1;m<iEMqGjBh8t?QO~_K`A|<{0S6F*dNR
zXKlI}u|e1_2UHE(=T9h-H#xHoeIzsHI?qTiW*!;F%-frObf6DK0{(X~Fo0eR5eYD1
z(<&Xy**I6yL=)?LMb=8pQE4ncxeNp~UwV5xN1-8F&HpTw>XRqcK5NNntBSCkoM%J@
zrQXu+#!YKq058Q8JI88e$itn>&Yoygg?ZJhBy5xokP<M7GnyeU0Dr>*dYbyMgFk=X
z^L!d``}Q|)Td4OHvJM*5g3Y{sY|wy!r;Wbh@81uAEEx!7L&yj{gOJ@|r!Y_79|>!n
z<HojDtE0$rXDpMJwl*33Kkh@nJk9W)PxN_e8qSbky6yz3cS|xV`Qe?a+u%>-aA9D-
z^b8EdmFsf-mE*qeeD)`(BasOiqqZXzfIVw&ZdnT29!ozzT^2{I(&Hvo*?Y?X=*jA!
zrBknN(&e{hoUcQ;xngpje-IUZY+Gue_uFIc>bWIt_*;BVQH<|o#RmA_e<1*Qs&FET
zg8W34-$GGi?&AkH%zg9CXV(K$31V&)L&Ct0L9><J>KBd!Z^9r0&Gkz~^Tu9U=FS~B
zN&PC@Xdqsui8kbDx)0WONxdbn_zwH2zW+pRiHU66s}jqmEbhXtAz(fL+M!}@+qMQR
zkyR|Y#<12?p@hV}$h4_fnM4J`FY#jzzn1#n?3WD%5)d&^au0x0i-lGH+^BNs5Zhf@
z03yI_Y;1In^zHOmprquZgY+;a&PW#SLx<qQ>FgsFtTDW>a>gzK|G~sAGplEU*Ow>;
z9>wl38J=$nG+zpcHtMJA&>;<@;n*rc=N8R|gOw>E>)^2Wefu8H`@>-~fY91XG!*hp
zj=5$iiC2&M9EFcQIuOIHxB<wDv?|98gwas>)#R+aC770$PSp^6;QJ(4m~Oby!~)P9
zd;-qzvJ~5Ukt&iJK3~-+C0eOIWtz<*b}NKk@Jhl9!bb-eyla=80ivaD20pi63u^QM
z_#iR^*zuU}>KHv>6^-osoqZ-Vb%3Uj2l}4SFNf6h9;8=L=$IiQXs%!HovbK_uV}n~
z|2`^FFso_f*hnr&E<@b=BHKhLC@I$h4uIPM_r&S^Y(D%16@Y{Y*X1fJ?}B&%*e%d%
z$YZ>tf@~+a+5Rkl>R93}V6*VieA|0#mID1NktLLoUk{+?#yiR|(RM)qYl8VOQotr6
z0>v6RSKzRJoF6h2J^CZTCgAO9d<d@&iP2VHSC=)rW5_%ddDXlux2UMG(+7Ys{h%i;
zvd*>cy@GPN^)i2~;$3oUIR}?$kuYet>^y7@kIpG-R5p)*lKpa1kQuCvEjhPwd=h_!
z{_<_m*jBevFViKPJ?^#VU1(Hi1gw5(+;c^=q2+!&i=Zd3h#^mnfn7l0TNif;vW?~C
z<tcX_Oa4>#icd{I$?$ORq9(xXtTfmZkRXH!m^xIqScLbtxG_Z`vWAEC0)}h|OJZcQ
zNBw2*nmKlTBR(utNdb>t&Ka4SLYf~;Pm%83y{p%KIIF_l1*rFY08tkXmRWBz0k^AR
zJLoOB!Ny9AVj)Pm7k7Geu?h%ihEf);7Fz<!dW2gT0E<Sb#zXDwxf<Z!$o>n^q^&}G
zVGLq=aGwo4X|s%%xK$S-2-ajU<lp811EoYkQMy#dtNjeQ#7G6k&b!eo6D-X!SBEZ1
zeY7!W1V9=&YlRgdKz_?3zkU0*KUbp1RjS9Gmv1iy@u=MD@|L$XOnmd^&DE<{2UZxl
zk0k&uC?;mehv48r@Q6Uat*oGcfD7ycAQ{f;7Lr<Qg@x;q_$Bi-PDlfvsE&*nritT=
zjTbW6L^O03ft~{>a%iBDl7WFCv2=uU^9Sy5Xo|T1CY?Z^T{bLsluJzTE^w4{7iVgF
zxb50>A2j~r)RHA1knm+k*-mv08XJo3u{ZNEc|g*Pi#M#BuY>1R7AbEZ{JI706LdzY
z*ED>C53I#iFqC_A2!Wnfe^9y>f#{hDYOUoh3Mg2=Ol-fU13=5<aq!7yA4b+GnawJ-
zuE{344%mA=u4l>3%OgyOcMp!&s62>QTt_s_6ZMXbz=EEHB;=v%`jc73?}IU)?rctR
z9xZ8p!4IsVdFwaz%1o?%56glgMNEq?Hd#JMz{VY%qq34l_;%Kj<UvuhSGXnN{;QZ}
z*R5F$=#RmAFVBp8>gKq)xiy^BjcKAIb82g|WCY;>TV2N5<C}U!6?}Gm{Tt;@tyPD6
zw{EQ@U{?>T>|qWqH-Q&}W<O5I-p6;sO-pus>r5}wr?f>lZ7y0E=cG);{K<lf!B@8M
zBRxGSV|ClG!5@qAN>FWp!-;A?XMxfSarN8F$>w7?piT)F%bD6bfUoZJ+dT9!2?O*S
zAXIQ`7FaM4=Pz8)kdg7{{cudwee>6FRp#i^Wz834+7(xa*QXb_-_T@Y8p@d7(jr@^
zTa`LgA>p~rDK4%!UDoRAK3X;IqXHZRPvQeDFtv6JLx4H&p?#~SrsKa^w^D>?K7XFk
zdq?ix+Dw-n^6K?VsUUWdL;3bf10fFnFSe!5{92(+H1;X58KC9^WTSxh(pB^psIWN&
zSTYQqfmV!tzaZUFWp|EXOlEZSN?@V(!i`<zUYy3~jeV>50p&o?ZKj@fjJ3BnCnpC6
z?JE`AHC*!G)-5CDY=u^pvnzlLPP@V;FGxVP`5fH+z|00H6LJSHO!PhJ%U6GTQI~fh
z>Z@#bFtOD&q-lI@A<rBamq-|XNmTq;;pn&0KMjY7YdAO*lZy_L`ywA`ekmH;HoHil
z(<Ha9FaFq_P)ix-y{o8nEjg2tK^UIHl--d@XY65E<$FNuJ>MnfvheE?BQK9ONIcF`
zK8t3_WMNzUBUl~Umw$u+%aN|27gNvqrYOZblSjjD*>dQyEopv(w{56vT1VKni_9tt
zzV!N>mUg=W<QuwNw3N(o>|l~69pa@#%frH8tNZ7unEJvJ>;7u2ia1wcu__4?HW*Oe
z5R$9``(Nudm<htSj$_+-fDr;(0FdMAj?iF03|a$L&u5!^*EAMD8pw<xogYX5lyLpH
z3zdKNt{*_)(JbCu&Fezp_L3t(3X;85c4(#W{h3FF1`XQ8FhnTme<C*ty^pfqH)^Gp
z)XN*OYfz&;LZ2&lR4775kYMc@HvqRYS5=0SQ;@_+z;U_E2fvJLhZFL@UiXh2mr?>^
zkR;qFIVMSpDIf>H<Aeff(+lvwA<3Zvv_-A_8bwA9RG~l^Jf?=8nCi_x>6dLFxW~uJ
z$|`h0E>XeK@*Sv8`ll*DzL$89TVrnGgEHf(1Q3xxj_cA98WI^7S3s7Fq?O8pK^sJm
zd$q3aU=r)LK)l|lCW}mz5<YS_nW{TF1|P1CDh!vwjf3V6HjI-nAZ>``zokzuRjK6s
zfdI&QjHJZ)m;XyX+}9zOG=i^wsv3%#<b}~iFAVuMrs(VclqhBeTqF|APtb8!vDx44
zO^8-TKS4&ud`yWiGXA;TsAWC4d(tn2)?gZ5zdG^Z5nvFXA<ONWTBy!Jus|!&BAY#D
z;89bQ=vz@h8`4goLCiN$lL3AoZ=C|Ybdc{m&;J;#*q9@uus{h&oZ>|75k~;&c6b;*
zFKEA_RSVlPE7j3l>a4=`w<<^x@zF`+#==~+gD5aXuCPMc<U96H`VkUG?q6hOV_hoi
zj)pHBL%$&lMu99S7NQb#wePjR%;P$jdS_jCg1A{I<JeS1cjNQaZnS~pqa|r9scF?W
z6J$6CQ&PVD<~U{v2NcWV_R(JU3J;#sYCYq8El?>CBjlK(#*8j4IPuU2Lk~0!G<)3S
z$jOa#h+<_Ug;MezsM8E@MBQDfFEoF<@GrY{%S%rSjgEd5($%?0YD9;*U2?2V;CP7O
z>M@%*`Mk`$0n9EPMMaksg~h15vx+^hWrBV?Iu-!vVwrqT7xzBa2TSG{xzD;rMJ6o#
ze112Ai#A^|@K)1;$j~zs_D(n!bW5<4fg~rMYVd3p^M<0WKuWzCK3^+zYfnqqLAsVN
z=;GkCFG92@^H@suP=U;t5Pp3UZ&)YA=b_*(3z8v29A(Y7n=IMY-5vQaM*IjgEsdK2
z0FqnHNyEM~+g*Fiu!WCb$-qZ2C^&#uE4Nr}l#cG>Lscy%j}g5`#-@)YIP<AGXCT-j
zTn?CQ!;^c8fxRY8z7_yCq%8yMS!cJdp5AF<mKaSvBy+GJp|;DL%O@udU=q?U@O5?G
zk9&;BeuEej{5SOgak#W3GC!KsG=6Yy0K}ZCal_HvU&$=7+=EQahhmxRr+?-<B}(Hj
zPH$qH#qa+6rhA$ZUg|dUMG}DJ?e++L$%pWw>Qey#T#w!9t$%O$i{`PqBJ$;uVR&j)
z;jxoLdTv0EFW&|E|3akz{?pShPW;0@%Gf~>4OY857)JZxDe3u4KQiq$+wTv9g)(@d
zAn=2M|89rsS&y<V2lbCJ+J4L@|16~<mMldv-#A#LMh^mHc(Ey5ijeh>ef@o<h}$Rq
z5<enGK{Kh+;5v=<@Up&+4vH<=N1HZ8*ZWl(8R`Vgvo!DkLUvsZ5X|we8N{71$jV!Z
zJ*tIvPILBq_i{S_COZHZ`i@L?M5st{dOpnrnP>wf>c95o1g9?c*wEJHHs_9!%GTJ{
z(bT-hce_3$|M-SpQNKf$PygT0aJ60~%rUEnG*bKqaeJ()`cYD^RB`LQydm1sC*NBs
z5X#>6rw-#vY3tJAs;}J3N>!kHo;#`a&+0~hjmbhNKsHmd!MN(}e^vM1*wQCUPXt91
zxwp!O9s`(2>LV7f1t@R8^e!cLqtm?is7a)11D;sL&axQVkf(t!zb`tuj)?x-7y$5h
z<;0hbVi7o4F^x|RIMv5>@Kx&)6(!q3Cd5v3VVH&;N)>=YU}u0H4Qps1&8`K$4^eTk
z8as9zzohUxe%<{H_6?{%-nz50Z4)-HdF;xW3B?PV41>%fSVR|y5!*XBAL96q4@<*E
zP>E)LOPd}pb_@YLR(@$-o`#7Hu0#PWMY$MPQwG4l+_b67l=L5qz)^;a*{k}ek3R_n
z?@2$yiB7m=573QtTYs=PBqi82TBU-cb(mk!Qc^0^F3`)a8rL0ts`X!cUXelSAW2yn
z@Wn4*Z>1WF8o?>3C_cShlE}X0DZwNtAbQg9Y)?MxU808LrB-LM2avXiwehBjlaxcw
zd^f%eSMu@W{QH=G8}K1ah+f0tq&~0PqE`bO(Tz#mNCj}8>i?Ta{0$=SRA&gDCU7|l
z?hSy5E&4B{6|!J{8o~;4{+iJ)zy`4z*k#u*{};B~ki^<^#()bfY6KP&;4;rwyc=;*
zgG}y9Fd4rqsaiHY4h35pR-J)di6FURdms*K6Z1F7&(I}F_@T1wlSmeJE=j2DY2VyS
ze?M8#ypfZBYBKhJ!QC9&`w9nk&o3|Vk-tu9b*hyB1<yy8q0&a~e9VoB_kZt|a^Z&i
z%o}B_b63r=GywvU9IHHa6XHc<vp@YyI{MGeC6NC&hGC(8ZJR*x{eRu@|1+BX_vQTk
zg1D)WaslGEqDmi`m3}MMDfW#X!Gb7ouen6WG4u>w&Yw5+&n_v}ze7&<fPboco!iJt
zAONPYX~QbA#raKlaOoa#?wM_HqdLDra&&Y5&&is9_94IeN#BC}^v`Gs$myL`gErSZ
zmTxdNh}25`YEBdR@T8%<bd;V(1+PsS{RfBUhZh7Z=dM826WP`4nzHlyE~8Z86{Y!-
znjg2PJhYo1opQM;Am{q|9U#}8wK&s#8ZI+XC-)*&q?7uu-iFHCJmd;#&An!!Rwl`$
zt7#P)-JzM4_VJ>K;v~g(WU|SVZ^xFJuigQ!>QPbr=af9rsw90UuAa=(KWK&=<%2H_
zt{O{SUQj>TVs+X|{|_??XC2t{F{i}q$|Kh$4Fx}6qbxM!j=tog20+He6mEUjxiB)^
z*mXJ8zjC?kq~AR4T&cCRZIRm(1CD6+Xa3fMoDFvI#>xA|mxh^z9w8%Vexe+(HODXK
zOLH(}Lia9p_~3+Q{Ht$;>+PJL6$+l~*5xnlwNe!7EI%+^D-sb|%PWL0w!tr)G*k|s
z|CCz)QH`T2E3chC{{_6^wjo=j5n5@I$+PS*>htLu%6yWDTi)vR*AJ_>@`|^#kS1s(
z!xmh>;mp^j8NN3HUrH*Pwg=<V#3bwsYfQGy2kbOu39_~yk~qU(+PRH!myrZ$IqIgG
zl(EO?l+#Fd^8C6eQ|l<3mK7eRX8BP!C}b%TK6*>P7+TiS4C^g_ay{WB2b3O3pOvQ<
z4G=(jB<zBVh=gAHr8K2a8!lfy-zhU#GOjXwP;`5!uuYGyLOS5l_0H7?0RTK0-J%_7
z8tattH_RQ4OQW#H*(Ijq1&`)tN3Q?dup}6Pj}-9M{BV|KsJtpgd$ow?6Lw_4PUyOx
z1N04_x=kFPxL&@S_UXzV2!H&=%$A^g+tN{i+tk?NatDH#9^FDIqQJW{IIsK<YZqd+
zybIG>g;<4J%_kj8ogJ5E%^Ny7*)8x+Prh371VjXB*tK-5zfiDF@e!ZL*qG?CB*Cxp
z5+w4fji8l{YzAz}&}UK#>f~9Eu+R!_6hkMM20noujLVA{8jU@?qjqq`mbBryo9DU{
zl)6$=nG9oOlBk-AL2-)+J?C#Fs(G1y6gSGvNPDCp6xnVxy-tjk6qUWgm%jNE9cFnr
z<8GGmHn)CwQCL}_g*tekr~LlYAbm4I1`d=PY_WxpCQ5jRSZDt-Zi_L$zxQ%bY0#5A
zmA6{NP$q(ucD7l3$1t%#HY14rC#-K}U+-yvK8E*JMzc@V4|(-sOa>M$UNkgR0%pKx
zj7?v!deFGB^J7ZoqmRQWGpc6P!RpDsg?xL=m6zviHMSyeE2cRHlZXhCvnDNE;~p`{
z3f<cew=xot0QdNqI8e6>b3lMWS!_S=C?7(t?eI}IszRw%3^lBqT2|^p_oSjG{QaAA
zoS84f$@pIZtmoJ8tRADcKh@hXY|bz*3smfz#llXT;R)<h-bj{N&Bk<?bHUbI!`skd
zI!Wl>H#RE=0wPna5{jhoKgY>y8o8wOsnmmm1{agPNwVd`Z|-&Dqrw6&YP@vFTKb_z
z;XD#G_j5(e{HNVwKKCb`c3ZIMS@Iv_>`;3B3eB~yeI9S2z*`3!2==OH2rskw7E6JE
zoZI)HRL<2iHisuYSp{Qk@`U9$20Aa{{%wZ~9uH4GSJ!*|?(rMNVR(5_^Uau%Y0TGe
zqC;;w<%6v$>Io8hY!Og;iTmGB_D6sG9uTDGlC!W`<bEpRv1>;GOTcrb;c9>~oy)&^
z<s}Boqw*Ou<@<YXvyJ6=hC8rYOBJ6`w^>Y%poW~R^<1^&x_jo>ZpE;MYJPRjO+$Ft
zosgi6VV9?F1$j%O!*12G%F#O0K~G@H$_8~wTIOg6EljPZjr{<)pn!mws3G2h=Eh4x
zWp?P>y1kRyj9djePYV_Nk+yMyMd8NinV&BVg}K!xQ_W{-#3DV5^|Z@UIobZ;5D}`z
zAcR)(HPbuNZ&_eoTvL~~DW>SS(O5i*xiTF&)SY#wnX2%oolbDWBa^dFIXJlPi~Mx5
zE;W6Y8#cymMD47@Ubz<2{zl!~)`}=L&|Lx+BG;{xoz9c|+-aEmeEtDtK3MZi-f<aI
zofjVEz*nvFN~yo=*L4bbl(i{gULN)nl{WD;uW|=XRrf918f}|4m9R^1w2_%+&^&xe
zQX$}xzsjHqc45hkVts(KPdCp|i`m)$)B4^i%4N~{!NtWIhK!N2%!SVznYP2%A1E2O
zjAa#mif&tW<)|$2cqardZZz;Ir-}VQ`%i`_3kKJusAH849jzEdUjCgGMQoIgSohHU
zK*i3z{IDA+Y*>?EoJdx9-(DcBv#7uIB=ssYPk3v7;W?Gncgd*l|7ZaW;JVrc9M*{l
zJ80@?Qr6o^upenoi=FJFjP1(yIx$iYo^*4w+f<glP#7fSY8nO0oMELhj_)974Ng1@
zrT@Lx=A1;h9Kw0lk&)AOn6bx(Z?jN@o&Gaz(d@`_VV9NAb#mZJO+!HYR_gUSyTbZ`
zZy<2z*bd@oZ3eUI<-Kcb4AeD>@#AL>qTB|HUI&;_BAh)+><yqC_j0k{a)aCHZ{&U}
zy)s|Wwz$6ZeGNgWSlUMWN_(u0#IMnGmELCXwMn*r47=kk4j%h~sfz4Y{)@W-lKkp6
zwW4fBbsN^3ZCU+wx0P?uX9hQ=mz!Gz2rWxA=#NHCn*ggb2*G~d!QhxG<}9JeP7RMP
zNg;S9u!mWJH>N}(jy}UI7`%Xmeb~d`(@GY7p;7j$gaBm5{b)nSuJpzS^8S6MYeJS!
z2+@3r;`bhQmdIGl$tV=&a$LNgR*)Bd6>me)!73X5{iDt~o9U?0YU4-aXPZUB+m~J0
zzw56G%UL`*tDtoluSxpmi~EjbI?yCFXEQQyTQ1y>I!&#r-3;u6wqwe6cTaR!x4_B_
z1p+Dxq4C!3NtbFyndXD8H06S0QkUvqAXTNH{y%Je2RK~axA!CxC4wMPq9$4ro#-uH
zh>$2zMhg;A2csK?=+PpgMT-a`7(tlPMjtJR-s@n(V2nD<;5+Ykzx)55`{tg9hjE;J
zc3JDU_FliW*Omj^_l_30bpwOB%VIGXos>_8H;CUq59glH<yM$oRI+&5{5^>7+O48D
zhRa>$NBeXya%hWFAeH16xW2*X9~Gndq(7fuLtSRAMi5&$!<U0qWC1qsirmwXLUra*
zwlVix6dy+(8hX6kHCcTCE~<{VYJ9l4mH5q@UnK3b4U;z2Tp@4R{-{9L4Hh(tZoaWE
zbH62D;b&nViPIL%+V=fQ3!Z`ul+qN-3Cii1ax$CM<lT%_QXCO3>?tm_KgaL1X0gWV
zFcP)vwxOzzEnd|rPka$ho$=9a6wkuP^Yu4><zk-M8z%Xi@`IBLAnVYZI<FI+un}tJ
ze41wqq`6|HOdH~OuAW}bsMD4>Uwp$i>H`nij>9smLCH?*Hz>U37Hpx>z=kW~mZay;
zt8tgk`|+&Z+KGGuuhadsqvx>Gb+hg|(BMRbevQsuWsVf_A<q8M_r|T?jk^UZ_aEpu
zc;|FT_1LDVDE!~_9RH2Fz(f=zWHcgwe#K1Xg8a7Kr42(Xt;r;-;#j+kM_)DnG4m{6
zLND{Y9$Bj&QaPKg+~N&bsHc29L(b-!wmYA#oK%>r1;yG*GIO6_O}qTPQz}MPBs9XF
zPZB0kcsA69CtL;SE(xS&?%mgMyc`l*-=U}xN%aT!$>9!q^mU-@-sSyHXGhD_3A-Ck
zZebAx;iAGV*OP!VY>Zz(r<@cG4J8BO3^;GfTY-L9!J__wN$Uc^z=|Rrs7SjLfa*^<
zc}t&?i+}pTljltE-p1XQ8-_vU@bQhjy?kZ25?ylt`15BUW=$+heuiDzP+6Os<-L}D
zwdN^Xja~9X@x~Pf;S9l><5Ne^f0t@O*Pq4ZCwPZH9m@S(u3a`)@G+u6rVEQVu#JCS
zDr5#2xQRuMxJPeNsX<O$`9JtqlORuwRYhHHu73iYC=ZqOZWJ%t8kw+CCU@=wr$2t;
zFMFvE6nHd3*saT68!sNm>`aaPk8_bw?`c8b9gs>|;+r}?+By%0@@^F=HqcKY>aSYJ
zBNAjj+)obptSvwj)`L%;BQ_SE4Z<H7I>PC%mDff9XFd2xIM7`)Q<Qr?vM1r9W>N|m
zvv+jJHK$aOs(@I7yUo;vxXeIZo>M334A12-)a$Xxf+|VSx7PvvTGf{-UDGFwfwjYi
zqJCb&sB!Vn!6GtSK~(?)`GTX|%Qqj79++HVb9@oj`=zSxIAK)n#pLHkAWO~qwZwnW
zO|-&R&ih%IrP+%m<bFyE%rybozNd6mE$afXrLBXkt)C^NoroA@BM*PdyRuZ2y2sry
zULC2$8&`68Z?&LQ*_o@j;iZ9RoBX4RWWGPAl|Hza-gY#B2aSc_HzxxF=0-{b`Fhb4
zviq?IM8<0AkS6-DrXD9?YwLE{ITi|a3t74TWls+yh8p{W!*<=;d29aF54DUA?1W|>
z=Z2$J!@&%pR9OUWSPV8s3hg~c)(Q0zc2sx9zk|K{ik8<4i!0=VUQik5iDxbsP;jXX
z0xpt)XBIs+H9(LfDpDPsS@JAplGzux*<(fVc?8)o8L~mMu4dh#^x~RE#$UkJw*^>$
ztpNYyGSxH#(Q+$U<eJBgOHX3hc{5x|hLUacOVtT>H(y+je<~KJK2}2-m(6LKQGCBL
z?3@g9Sgp*n2TrO;*;kUEs8#ruMZ|#pZyE`xA04vBtZuq$oZiDMXC9VpOvG^Yd#&s=
z=D<DQbuisV%MoZcr<yP!7csZt`aVLN4Y?VJ{d0UL)6%cEai`py;i}sdRpjRS62?VI
zwf`BtxE}8>$vT3<e*aF~@y;~#zCzjTRQpDqGH~iND#ah)0qPhS=0(u+y*X-__mbv`
zmC~P?`#_r9xH!g+?tFdf&btazppjK@w~&rUihp_~EUnX>LaZUFp~9x7mzo9K2!!S_
z^-Gc`<f^jj*TkHte_frT;mBD!lp$IEl`p=Ohy7SGl_+`jMU|`Hdbr=@pgI9v%m*EP
zvIgtla`9n!uXPC2-+XAXsgUVP=U3Rit7BdxaNw~>H$!9pmHU{&@9P}BgLBEwq~4Ns
z`Udvl3ASgY6A_+q;$n}@b!A>&RcDyaq8~?J>v7KQOB}*keMUiVKT-2k4gBW!d?s<|
zmCVfq3tN{|Pe!$xbE8TZmh+L<;`a;bD9IYx?b>a`t_$D@+1UMjMQKwP5=FA9pc#Yr
zSm_OPS&h*UD9%;;^y+Ix4slplSE%|n<%LMQY|Yprb9w1*T+h40KR&m`#(N&h3c93!
zN;RF9H&+R6$K|j0KJCb<`zE_^>mvoxqdml_7u)I+#H~(NrL@_bv3&hkcSUuIVvNYt
zO%{1Kl`Gh-P;%ou?<K7$vUOI0{wd9ymsK>UNDDr(9yS&4q@@Nkz7(r9pV$iVD&vwW
zK$6^&&r2TEuN=8@$3q$v=YIBU1VKL$P(JVBClS}OmCWs`pB0wLc2y^e!TnF8uQcvS
zSY24{6dqU3T*xQ-<}T=<`>gdoAm=7J%%yEgG#*;nZ`VT>@s3Zn?nAhio1$5=LwNFB
z1YrqqU&pojj<%fBr?ToZ_1_p5AHGKt*6jK*^4TgCANoNGpE8T3Cv#R*HaQo6cd;C2
zX;oje|Is7(ruvyZ_Rc32Z=dZgXxss|EA;%VyS+wVGdr&n<?CkOfY@Bn0e^?ED7U&1
zriN76p6I0-K<D6^%#$B^lgy)<7$9)G*XlV6oCrUe_Sm&xB_D;@dNsH?IK<AGqX{Kk
z$F6W9xcOb^bp#a*_EpBs%#O~vYASXWbhC1IPd{#Vvj6N)i$By6luwLanRp&#=RSux
z(u%i3;A+;3S7}8$9}X>jS$(35wa<w;W&IyI4tz*V#eEN?n$MH`C{f$LWS&JO@|}H#
zX%U%uJVI4S8pIA6eot``Fa?@=!$|x<L%!YWbf{>DH{g$5`sV(*!h&K3NVy7j+%8<>
zgv^@xsi@rPc8{9E?hkkV)*a25kD}fWSE+nYOro0<rt9-;-KXZ%RGW)duLj6hL$RaY
z+R0H&@J+?4qwR%wJ>v;@eOo+(>AtIL;j?%WzASrj^?vvcCg8XQ2WQF>lO&Ui(yvyA
zE88BgKJhPJ03hl0Y5Dcg87<ej87;8zJgb#bdz(y-V9<(Nh{uQr-gIm}7ErkxMglFJ
z+rvt@%2Si*#sJ^9O|sLqd7;J|t1Y0M0N?Gm5Os4GE#u|Un*NHsU8zJZw;NE_p=bur
zF+DfjdLhU622axkvczKiq23T%pU3S$9=)a6PV?XWvn^Dy!;H7r>neAp$lF$57psUm
zWHs*2=?jX~{aUS~+X|c|ymqP$S*aCBeOmt)nc4CIt2t(_F%ksXUOM9iLr2*{eemP)
zy*k|We_y3G{3ptPen<3*Gmwsu$Iee$!F?MlQ{za7!~U5Ujwy|PZYlbEs%2FH=a0${
zobAw;iHZGuR)nu5?;co_27D5v)s2=V(Xf;A85KdgLP$6M02QG|zcaY-pW#qvQH^fv
zjd@crk2`a<Ol9;n#-B6q2y4!g>W?Hc>q&RnqntKt8tvAPb8rK5<7AnZ|B{+S+4}WM
z>AeuD&mK!bUs@8qbQAxh>7Ds?8y(qaKJPTaYv+S<j+Z*jt`dyMbk@O|`|PXuEzULF
z+sFIuA{Wdk2N$6Idoy2?)aO*+0~PM{yEw|2T^U6fRO;yQ$J!Ym3nF_@Ha@ip{ku;p
z*btL?pd9y8tlv8v?9su(v5;=x@Yv0V6{DNeLtTN$1NWhQM#e0$ofcvYoYxF--l?X^
zI3^mPW&g*u)D1OiJag$YoAM@HwZG14gsy2}NG@C59R2mj_wb>Q!r2|8+KTMeGU>(L
zX&}jTp{d3gLvoY9i2YNcv#rJZo@cfOebf8ZZ<qJo{7|)&*`79H(XpP{K+^qpjj!Y9
zzgvN5rIMKr2U-0hYFSmcTYBK+3~2mdTadN^4Whdrrp(S_giF^rqMbl**pa<$Qj`+s
zNO%GK``QsBCZ5KVu2y{3MN=9KN2V9?teP#`wrjD>k5;8c$mv;3-35B;e`|EjiUlh@
z=nP}_clRyBjSumqsRcLntg>QG`(9uCSit!H*Zzk$cralpsziph>;TF*-SW0!gk)GI
z{nPbj?tjMf|NZh9U$TjmEn`_OqQYl|rAXg#SZm+Yjp+O&g=vG}WDBaCb}tga=hZ6L
zr*8kdn;#&~js4;RNoty;TM-0*4p(`rUuX9o#lfpy=>3oJ06tjW-KD<MH1UM&xs|+T
zro#8{Mlmxx6r2zKt9!3sH@+5lmh|dqnd!tSK1;E;>~z-Zw)3g|+{vSA7tZC>puIeX
zDc3p&CQzT!t>mk+JAUYstt3@aTp{}bIh%k=5a)_KY_~dmYyjgs>euLIM{nOa-O7(N
z#)bX6iVq$vegAqsIMv$sM?#%zV|JO^_R_-x@zi3<Nr!qIA;8mT1+`HyQ;*|0&@USG
zr4+G`@>l&`V-N8`kzmPYdP$yThJ)!xGVa8^q*2p&UXAu#0W55%>zHzq1N&5ScwJ%c
zhZBSTt-nhcyCn{)KR4%pbpGrdZUchGe~8b?>Q8d&`Kf61_kV9|e*r4ZzZHl++G3p!
z+Oa(Hud^I2&$j=wx@qT)_@Q0&mrY<!G(P*!E4u97O2Q7AS5~%RQ(&ur=0G*9wCZ@r
zc*I>-bdZR~_A%jFhAy#6Guu3G_}#xHC#tjRh6&-Ed1)pTwi&}jR3ik?2l%+;S~8{1
zNG53!llVvynNY)9a>9l7ha{_GH6XSDJS`G?UA4?l{PiUghb?XEax2?kf{<t|0o3iH
zo@!vQ@r*yuLBok|wuYPCMBux889oXG?%VA48n<m%=@2Weh+{)0T-<vZyRf_D{A5V{
z@6IvS0HjAI@CeHEdv2_K%bFk*>Ap*Hfc0Y$DhSAU&{SL<U29Zchn|J2%wOXDVCia}
z0?4u;sj%>Hg_T^hS1|Jn7n6D!Biqln^n1X4_9()5Al(T!+;$r*&j`iG)g7)RktLPP
z|7sI+jkm)YYkJtq&S~4MMTVqkti};(ZX4}`j-PU!d@7vv#pKq`yh8S(Y|d%nsG$De
zqKUW(%+@(1qbsYfY>)}+q*Dc#VnZj&Tq9+nU;}<qL^jTh(64F9p)BKG)SrfPNKSlk
z`@58-?&VUSvxswdYi%2ne(R4-9{x3|QIBQCLFXzOrrhvZQEcva3Vg`^BKQ9;>P5Qx
znTvvxN1eOQ`7+0Y{vyj2uL6ADJL4?~c2V^<)BCplB|a?w^&CQbuT%e)@tMw9a*`m+
ziE`oN|CG11mUljwev)SY{VI39YT$b_cNVUcF%>P*mGhh*TmtR<tLF<n@LAk{cd02=
z1gW_EqRaU7V6cMg#Khp%e;2wmxu74I)*OI6g&*P5x}*Px8C#z6+2LWQ99QO0OrBuD
zRSKZ&s~WXtcVnVT>Y5j%lcE15M5)a^qg{45&^Mh})+@su|5-gH7SFY_fvP_-!7hDn
z$Scl2Lv!~_{z7z=xz;2!>!S+8$-cUdmCB}P?t)M1)>fp-V>FcS0b6R>ouwC-t5+cc
z$B00|&GE3}VfUT5j-!BxElm~o5SpEl6%Z^G-by?BjK+i}-weua!O7y|a-Ird9{Ky8
za!6w-I~>Q4Fbwr*SV)`Sm#%N-M(9Us(*`RHGFyd)aJzngy#2J!ExS1dpY~gVHv<Kg
zoxQ_V{Vd#U_!2^4T-WX3#Wzj&@x;*h%O{At^#c;-HPsweqUXwH-i;**eUh-fKDAp%
zRlg|~J=GzVe$Ms#xG$wQ@0<dJUpcE|<od64zR4ug)Dd0v!NR+dg4<R%EM-3m9DZ^c
z*XJ>&lipYD*9Nn|HTjyWPhMe7%0KG4wbB9|v7R`Q%rlh2zt@L<jP|E*F10;}jFmr5
zMLAHNj9PvXkbRaQ!Mjq`t&rBt-OM16_AwztqD<>&pRBKi>4&nZc!l8*2R-I>-nQ(t
z@D<JD3SX5YqJr0b;gAfAj2)-J6v1P3#t&H`=iJ(l?BJ%|>IDd1XnZhIO^Jrle6V9~
z`pM#^c$nl<@#UeW)?H+h|B%GGA)U6Ijulrl_|Y}a`FC<rVn2npkbCV1c8H53mt-fk
zwi)W8;(Hg@m@VO4C!;}>xS4SSKes2ZLY^F*8<YIFksGdmtXc#ako)5#Fg_ZIEGoPy
zZu~X=GEjZrP^I-Q%BM#>)76Rd80I&6iM2i|-d`@w9k6Z7dt)K^Ry)@5nhUQPTqAcb
zMytp#5V)JdnTr%QFuIq`$m@r^z~2@jYt6hC-xDY^R6uvT?a}ttro2Xl&Zj)j^m<l$
z|5iw2Lr&Y?{Rfc+><2rT+Q-x7%qmtPj^}1LDI4^D__pcA!^0_>$J15mZ%!ZB)+!Hu
zoh?^zMH=EoU0WNt`+^<xKESZLnbN3L{T}QaA&8S?bRCj~H<KD~X0~E!D13J@CEWW>
z731<~6@5P-PCpADql0K9exhpRzE8nISNq=A2XGb0hH^SXe09OiLNLlxwQ~WHi>gN7
z7TnUBM1Yaz|A|V+;3VDq<te$0>qJGK*_2t-MeRe>M7Is9tFGGmXv!VC13IVR(sXgD
zNn*qRXrDEK-!AECe~rTE!yi8McRg6cQrz~1%uBs+jBUs7N(0X+I5%bYS(UpCqjc6f
z=7uYO_ZOEqP@11|ZDDGqYA$yvthed1V^6H<V>H-(feJmImb$k)Xox?HPHSqNQ+cpB
zD!mR1Uvfe&e1AFJ?|KaaWdcSFr7ylYRD)7m21bHwtw}dkiheJ8wfjAwJ;JN)$L{^}
z30{fy_9_Dd6YaR+-Rky(Jk+Jv5?7GVeHh#Np-Wpnd#uy7%mFYq_D5;5LZ0#=jd?lN
zPJ9T%$nINJJb?-GDbn?5dGpJ1aRTP<GISK4K$6N)D$jw5+)X~&!Mkg6Q@OHn8fmSb
z_U=jTF??hhRG8+G#n{5yb5+KX#h`gV?9;=qa{&S-F&dNM$b6;%=WtsF(UEMyz0E+B
z4UdX(e9o@nokgPzV#Fl299cOT_m2!Q{Op)2X`lKX*BEe#h1a$EZR_0@5kG==z4?WJ
z+Di$^)5;NJN$uhCLF#8ik-16M{ZfJr5hrD<-Ivxm+opbpTYRH84n-D4T91K9yx!=F
z>Wv?%0-TqoYtIZ#@gj3lITOsU+{|g?$&j{iplk%XTbO-Y?A}YU6vf?LD-HOp77i{}
zFNADW@3O3OxAASXalPr-wJSHsAKjy)VuFCJrcXT>FBoxK%V>Qln_~=_lahDDC<D-9
zTPqXu%4FMXO1j2y`c&CTRT1SPaIzVKDzmqM!Q~I`_&bQ*Jgjbbzg=Fq!vP)T!9QMr
ztO|`Q#HuL|4a(C6BlBtCPLg&k<M+gQA)K4RZVWjyPnP1mnd|+bluTnusrYG|g0i^z
z+H1ca+pF^ZPImCK^tr2A_`$aoKnC}xgz(kw4?(9-AM9Lts}@SKwGc1s(J}rsTvyhy
zTs&?gj`o>2*X+VYtgS@(k4w$+)jP6U>al?l*pgPhjqBqI7k~dw*@H|5E2PB`H08pw
z0WDVu!O89JeK^>NaK64dlqILJu<<9M2jrCd`;*fqgdjkibO|HtS=H>ln14wP6?El_
z!4+)x{HiKFi{sSM{M6PkB~^19NQ9)S2mRUZBEs*xA+Hm_V$6UgmK&|8hK^-lKaAi>
zGrubPJHD~1YU3V``<(i!j|I)kFd!{x#(5v-wp}~y$?$~VYIU4%Ue@@8%&>qCoGTiy
zDaErIvG2PzSxRc<dC*}m%|Qvg?<<hyULSr{c<V75t86~38t)OY+WWzON7r1g#mLdU
zVeo7h+55gtt~lxf_G9UMoYx`*-{F}Lp6##UfcNu9Pbn~GXN&BmsG*?QaWA&yy;;|}
zLOL<fc_+hD|BPBX1Rpgv5#__h5h`kEw(ndMruV}q%6g7=e}2#;4q25jU%)(Ve%?5y
zLoxFV{Juz$&$@uB;uTzBGxy2n-Fk$WclFO_NFkzv59_pigqk{2Zd^wSWI8yjZ2tQ-
zOXC!4dsRm7*A(~X;J4QOmu5}IT6{6f3FqsY1ol=e=F{`Nn4RqK@eP3S54X>G;{daK
za-Wsi@4R%)<ymW*AUJpjf{t|1OB|OC8_Ni{UT!b#D%d)>h@oPdG2P-ykP9@e&t<GP
zKT<uzK?&hLxuaqU_Bae2JaYn^jf1!-yoGIL;Vlq}LFTe1IjHyXt+jEj2k)BF*G~=k
zhVu9xZ_EdfGW@svE7G8SV_d6ET<cAODSkv}dEVdBS*z`MyAO0gv9s2UNsb$DJ>Wk@
zE$?7+3h@rXKNA{rMbSREEo$gFu)lN>sP8I}YE}cSAi$HR1N|g<xoj-EWo|OYLGNY#
zj!*Q{?hd?@283Ib&6ilIGTE?rU*~$$ob;U^khCQRu9Ra*Kw%nfK}lPZYA$1ES$#Yk
z5+!tup@%>G2R<o6Agk?jzpv;i#(RZ~9j#=~CB^Lo85b<SEc>WMBGu7IZ4^b+kAL`4
zdFJGvC6s=w-`7&Lxw3%?>aD(iu|S-YDcBwha{!=jz*WbGjjd-KG&B>TF&UTouIHX)
zs_^&#(@nnEW;+94stw=!O4CY(cB+ymE}xILw@<ap?h1!+b}lgaEF#|WhH43K^$i<1
z{o?8dl)Rm*bAiAawlmU@%BpF975<HCD673><an!zYqM=`JUsOYmT=XiI?zoiGa9?~
zNQEnJ(;-I0s;?%cK<&y{A<rr*@Z;8Et&go9Hi*Ze{;3-AUJ#X~aP$}-uCzU{9wkO!
zy(gF7BS`f&D>TjgdlAp5Om;MWRJ?(t*1HiW8aI}F4Y2k^wNcov`ocn7+u_t#a84sX
z;{MZH%(&(F;qPtb-1ZlRliH@HLluUZkqePva6IQ+E53U^&fA7<=VSEL<`kRoX5T`B
zw>YE6eb{M3Gq<PMh7$}xV0!pg3Cyb?yIaEIewkXh#&8Jo7t~dG(6?1dN7<wb*DXj@
zKP&$>ZYFIhdFqjxfw13wrC(RoLt}_cOIwl3(rPZ}>Zlh3#@F_ZthdNhTk@s#OoJEt
zb{4g&N*jp$yD64B9eziHsm-3RwmNF1Cx&e-Tx6p2_opUYwmmOtB+ZyU_I0-yN^hPU
z;5#Mjq)j}S-gnlsQ4LMKu;o8R)T~Q6>T7U0ml3cC&k5{jQMZJs%(~fUcD#s|bq9Cu
zA|<QC%ihp2l@&6;QLIUpt-I8`mH}6Rxy8vdiOnZ_)#$P}FebRBhNZo~(NRc`dtW>K
zT^|M;|H;K8=li&JCRn=8b^dWRj<F$o-aWuv|Blv}rklW23~^ch;re<ulPHz7>MQoD
zTtb$(H#r`s@E&{t0tf(d80YbNVK#1QOS!*DcIYllvft4bL0u*r$GI7GdT`eE5ne)y
zbU%wk<z&u#j(0Yq4^_gq+A{U~eEI|Y6dOitpFA2%Snzti=suFz=c|fBHdF=Po16Et
z2d3@BNw)38Gb~TA_4cwgyZ4k+x}_ktdYO>7nCcXTkF9g#4Yv`4Nq|_&#R{DV2M~`Y
zGd5GTgr(G6NVYS@gCaZ1*!|M6<metzJ=2u`js<wz%sqQ&^ZU0R9nA}HsAPfoh@yJc
zBoU|pJNyQ(!M^c=e|nR($4wTMqp=gquYqp^ThrnVGv{99Njwpv^=87rDPFHn1Q%Fi
zJ=<N1pcpbGQMdktTlh-bac$r2*$6~Ib&0}nw68{{DM#w`c<9&&DyK1@y|JvpA*hAW
zwaDZ%Uxh^Q+0@FOn<9f}3*X%AkLlP_@b18BIJNoC=d)w@s3(D@5H-!n9|~kC<r4Mi
z{gnRltORLl{WB*i==PvzJyYQkB&KiGdoOFR;%~rFvf0I|B9eeaQkvu2rOzZi#kjO*
zWA;(NgheSp>}#1C0y)V(nS(YpnfDG+O4OlrbQcIP1mQ9K6!8XMH5X$~@NcL7wu3+(
zj8elasviQps2XjQj@(urBFYYCn-3k0;U9NEpYp7drv0bPj-3BdD~N!ieF+AyjfJd-
zc>dVcyefP`<p@6*RQ=)b1cMD~urFnJ?3=bJ)|{gLtA!)-pMjY2v~iah+C|z%Wc1J`
zU1Y;-t9*Wl<-}OM>B-@YtckFgvKRT=f8Y@}c=pcUYtO{u_iUhUp}K=ApPfpL9Rts$
z(XFb3zaKyQ=FI1kI}&4k_)Wj3L;JI{ywo&D4o+Y)q$cLCm3RH%d3;u|(*SvP$JMp=
zC0|2>+7kVFOW5iqKJt)8p`k>X&!9wCAJ5cbHX%sE>Zn^o&iQnyP`y<|V=vtc@U;K?
z>J-hG1&!Ni><~`GA$~)6C68T>9MaO<*6-Fi>#5OR7>+zwIjdD?SF^vrr~CHbk4Hsi
z-iLskbA8aq>eY~`Zo=t;rr)1Lh?wi?G_LUu$3q6P>s)m*;i&IGM)mgNeEsFF!kmQ!
zjeUC)C-#HGA~?D1Mshc|FHy+yw07)+69hMb+us2<k|XOk&<gGZCxUQl4{=wve|xI&
z;AJKh<UCOyokM2Q=q)?JK#2QDba<R?p5d)4=>7Rt2FN<OnQ&@}Xbh6nAP$|nyH5o)
z9$WlV60wz!<PJL7qX|mmf{aJv{3r{A6DCm54v_7xgZGqe?qt(v3*l3r3_lb&Ex6g+
zua3-7K%ej)9r-KyZQpJ@4&YnaZEIT$isqlEK@)N+3AGJBkw`i7j2!Zzl^tcN=n0zX
z@3AdAX9gO-S5`wFsdt-ik7$SA87hQpK8iT(;yZn=8=JFad%7d}hk(bT6&geO`WS<b
z;S}9gbe+$JowG9)Fnq?Vz)gm-0C5entj0s-WFWsP>6LtZoX3HM&vtEqWh;9l<xXq)
z>DuYhwST$*M(km712%CI<B<*Fhli?Gqz=haQsxPT72ZpMWsjj(Mks@kLFa>6u_BVb
zzcwvYCVsld(xCVGK?HwtAgqIoj+;Bm(lD9Ek<kZr6FxJR>Xd;5X<6AGrsMeo=5Xdl
zBDk?<eSP}f(BG=Ct(|{9SbWiscg7+MwnG$=s3nl+$S%Y9YkWt`R(ZS;Aj4v#*;6#c
z{s55$<P|$<4;0`c=AIJ9(lat><|FV2Kn@s^WFslJJv>}t29>ZeRef6=MA<z(QaRb&
zu>#E}YVu*(gHAO5X+OFIXa;N2<#!K3Kw7~1FSghJDeDCP>({TdKb$(eA1Pc^Sg*7{
z&_xr+u`uAcUJ!{gH`mVuo6{)NFA_s{0lV@4ef9N&0e7WnL?kuc+u(_biI~m0m#tD{
zE5#ou64N~2<|)hb%jNb5eG!55;fsrl8o@Ah)AY1?EBO3Q_;nYTxSWCWX*vB06!KPL
z-tN9<87Wi3DEFDUIfX__MMYCM$zi-hImYWrj!Jw<yOO2})Bge+F+*pY*p$5d-`;rl
zXKk(a54vXGd;9QStGtu=vd8il`rVSM9#>h6`R!TlDs7qc+CM?1-uw3&WwQfL&b!te
zKEgaw_&u7kEDnLGMKba@m|VX3w_M^1^+3qS91*|zddnTK<A4YI^1CJFNv>*6+sB1N
zI{oGphPSETEAv8OCnd<USGRs0m7F<VOy!+V$#7i~30l{rV>zRlz>5=rZxg+={)VZ}
zk;n8P@So;L-es4AhG?*BZAb?%+{QgXIO#QZ{6;@P1Jp!h#X3*bhV#?Xvd{V+M2yYc
zzeG}PJPkL}0$-;cud!7DayXk#zYt3SP~)rq&F>?(nkwiy@O#SHxPh6Oku8V09^xzn
zE|Q>n#KCtwe(J2{`*%ajZvUfz@MDG3n81L8tyYa)D%IVsyn52$-6Q{i8emUIkzp}x
zi~Yn3q)zN=T(F0)6p~hRNMn>^4T``q)p0%61+uVepN$2+olN~_53+rRvYqVx{Ua0_
ziC#dx)DMgL#ZFGvq5V543JuxtL#nwRte$EY4|1kEB5|R>jge33LX%JZ!IbaPK!>9D
zx?+;SP-=3u;jQ+Sr(6?2JC{Aur7ostW&)JJ-|poX|8!)hQimi(qw$vy97jfN)~eO7
z#ZRSW<lwdz@?cAPDYKqF>m<6MJ%ykQg8ym2VJ;s<{Z{g=1UO#ngk%dJ9o$ttX-k^k
z`Be93WA(L-&4)Qk)|{33FPg2kB)y#`<HkO5g}KiC4Y1!zI*vrj34F^$>12`*+xh4A
zd~1To5W`p{guts#9D}z~o$l`SXipjQlCGXEk{cu8r$ZWOl=!>ryzs@FL3p2&`HGgm
z9rk`gzX7R+_7RC1r)>`prcPVVob*T@Bk4DH=HMF&8P&%CD&;#&!H?$Q7W*VBNwnv{
z!*ISsY{E{W#))rhHQz~7g2KiD8BQ7~aLGB^o!c(bJKcCV1al>UAPt14uoZ{~(if)Z
zqd*Aguk#(rLRLEA_S^e~{uklguHVxQF5wL|B?yW+5wa~VGB9rW<eauFR<EXW{sR`E
z%NN@Bv_Ji2>^hC^(}Q01W~r;)<#+@m_`pSK%vCTPh&f~`!FLWx1cU%eX(RcIJpVVY
zJXm@Ms9evdjLc6fd3<C=NFAXU_NOuEk`qt(y^xL^-w%30Yw%K^?Y|y;fVf}=@?`+_
zVM(2CYo7G_i(ucPPv!++$IY6fDx80dys@}A=yGCl64d%vEoS+h+`1w2+VdSqVr#PF
z1DZlN=oQ+T?!4k*EOt_V{P4m!#&*W}JebbCebVo2@SECm!KGn61cR6#BDnIyTM`w-
zm$~h|Ms8ha5tFdYEV0EF+V4+r`0f$X)i7p>iU(LI>hQvY%}iT*ls4^x%-DDH*7*L#
z3svp-rL}7VI&_i?_e&4cdS2>XaXY+#nOyd9n|EX$SXK2#48ka$iBebQ*W`ob`@Vsv
zox^F}MppiD8yh2R6Z#js<uaj}2VDs8x-%zky28E0Wa)>&L-P&Q{tg*)wd=|<RN+_q
z!0O=`<uB>iUDH*G?-myW)`TyhV1654yciv??UT5xpEdWUl+!g9U356FV%2OC*<%dm
z-Pf-9&BF7kCD}2~ul&kO<9omhSWRQWot5-BE}pD*Q?{L=O2%%lFSfGJiX^$MM3l3v
zZZNTTkNbyZNOnxIcgq=<yiEAjwig9OY4a!;v?a50^eKtp-DHKD#K4b`pNb<;=`7VO
zHfFFu<Iv*B@vg;+y<d)%1`oH?rgfwdjXkA}8r9EQb*{MYBuC&NMy75_e$F)!qN7oa
z{qx`VQXajGnxYlnx|)|QiuI%JbLc>MFa+F65Pi{XM*GzKAiiIY+Pq{DgL6+-rbce5
zcKA6-i*cpC8YrJw2^$e;X&ri2(m^btMm|IG=EO_|7%OI!vG8QX_0(5Muw3Ut3ONn*
zIM&s3_bGjg9A|}&?^Kl=hq$}@IqCgscd6`L%r{fjseA^X7&?C=r>j-MwT>^tv|0jL
zxXs`dCu-<6ba&cPqq0|d{Q<A`ljXM=H}75^3yqoT-b3+~PeUzB77O0FH=wlplZ!P`
zH8KNTe&5RUgIq&oUwv@DZ9XIy1FsEx>JQx6-@5(LNxV@+;;f1RCKd<g+G-z1F<s%w
z2-Th@*vM`B<-Su>V@f`<vEUigc-16V-Bo;Tp{R25-RRxiN1mPqx#HMcd#DZ5?;U|n
zW)#(_Cb7LN+^q*9EBrs>4#Z;Jph6|k;uqd$ITq1EBmTf-bTFJv#hAJuJc1wOhNwQW
zOX!z~%|#j2SW0*hWs}NEwKd$7FdRSP|7_iMMaPzErj6X_SYsZ@T}5T9Lmd;_b(h?e
zFB!Z38uOOn-ErHyWXn_itUW_4+bC7Zd}*<~Jbmz<uY@|n+^U2ou5w(TLHHf2Mak>6
zoY^1x5?A~*n`2tn_3CGFkvELxs(aiDdS5WGlQB-$W2XeY5oh~?8-Lroda=1(;>l?h
zkF7z_;oZV-4cJ|C6&P+ot-o2Qsd<xI%5gb$(v30di*M$^boqddnkzCpxj!#y2Q{(1
z$gyLy&5nuf-z_pnzZG%?XEXEWwD!qc85lFU?_Bv}aA<V1-#qti#)Th;w@f0lBtqO7
zHpWGhMa5(DQ=L$M&SE=D<B|Fux@@4xn`1vJue>3WKE^DTSN!SxHJKs+s*3v4u4SGZ
zk#(bO4;)itW}aGe&ueuyJmt;Hi-*(1AQjUnd}(bQzU1B&f~Pt!y_(8&x_)NwcY0gj
z<fuPO<`h^C&n$`CyL$R|-@=Nh$y{qP*r#^}Cg(b3*=+l+ZxYq$?{Pwbx)d{)#o!Xw
zqy46J9A0KScr8{Gu3!(p$~hiL^y>Y>Y!v^t0nae`1XmI{jL_xr6-f`xi$-sIyB5B4
z&t_y_%hQg^a?U=j1jKcx!Mi$^j-m)bg;($UVV<)YUH_6NOTUv<s^Sx1BeB_0@|^2b
z9~i0E+D@Nx`T%77f`vXy_|fq?qr36Nk8np-q`~*H8Ir2I6~8i1LgZAQED`#<#lFWh
zGV3O5K8_39YmBY77B~Nd^rm;>V|=>Mw^ZoCqhb_nb6ip@{%*=$WQFaZ^6=I`7sbYR
zc=50h+HrafA-SPLrybNYc~!z=oV9YC(gUt^my;1Q6ur+SkGwr^2>7W1mhw~Ps%pvT
zJs?O_Ls1n|M9eE@E&IRrs8cjJmFIfpp$o_`ep=5^v7+zhe~3rs8W;pdCQFdW_8N!f
zDHx<91jNnF3oiZq@Ii2Rqp&Fzj_f4)ANsN{|46Dv7c_YkDP7}4E_3_;vyT_8-^D@0
z8RZX_%nB4h|3&MY^Gbq6lHxS57U>)!x4tF(|HvHU#z2L+@&aP%B(M%nbh6l<AxqKm
z4UJH`pawCtdbpy%!-ugn&2e0``;4VG%zbL_?D6GvT)_(xCf_wy(da*E_agmL{CNM@
z9Y%Obu~P7FeYj($^;g!Rr0^!Ec-!iGz};vthMy;LL)&XTkrSY4tOKlI>l4lDuycdz
z&7(nY^v+rrYNv%ou0FQqelR3dzi)Tn|4hqqA+bWMeqd#Hvg`W52^|YERwDlu`ktqA
zO51*r`JmN+7r1C^!oYXqX8kgP$C`WmP{vX|`pNAM9(C`Jf9$f`X23+~%}n;jYlF+h
z4?14E<tsN=qYH2?R~#?-rg=g~@WL^}R+;>MtvBrYtmNuo$!}ncjVdu?$)@swqWW~r
z!CKU)<e<BC&+?8A`d`{VN!zJUYv<gmmH9ebVNc}rdw4<IyCQfj*#cHD_BNW^8}Z;c
zx7C+QsXoN}OQ?b1=47q2lAF?(<3j0>1cl#>8$~G*fy7Snfo6v_EA`;k#GnrKU(XQv
z&rx01T@kSW|HxIl2(x5}7QQf!1(*UJaBYECOoli!TI#M1LCP~f*91^-z`@7XLwiB)
z3oCR!q%oIXhR(buK0_SKv;wes*%H=;7dS39_XTCbM+1u^hI3S|q)HZh_3~UYbIgIW
z0fwltApzhVj71P^l{>F=WXhd6>n2-(Oy%$+LF&(<PXWeCD$qqdu9Y_%a|I6&3cfYF
z-94K<%%<x_s)(yL`gARY9a?~Meznh%GM;<^XwA(ZjPS80{o(!`OgMA1S1Uu^6e|=U
zFtw&-BD9Xiwbc+wlUMOT&I7T5CrN^Y=o~8r1B_CnkN1>t#RJXPQH;8Yu-3i^tVOQ&
zMsus6Rh5I$HrUfMVrw7|waOAbMrh*OthYXfKm7IprR_4|O>l&^UtKT|iDJ}QWbm0M
zaG0POV4OO3<pAU+n$|Bc*0bRjc7U-#yzMcT8eMQ?1_2yX3^z%}@ctKUrL`&rE#H|h
z#b|Cvm!lXDcxNF$%6S?1R3ZTX_gIJD@9baL<-y-6CX+>M%o`dET_Cu<2FMl0oAgBC
zC)Gx@(z+cp;DQ8*;cL7q>P%46x8ivqK>~^ym+*6a*vZN-@z1>A$z%2|1%jg~s<_!i
z(^Cc=B#gwki1Jfb+`nl8UaN%|Rza%<&Y_7#ie*AneE-7smA81XPmVE5S(f-+H1)<!
zKx$~KnSRzEze(u3lqJ!&(RSYT%bnZL+whPW{O9oMEI=-T@aLN2@_cJ9OT@Z)#V8dr
zy%LET=2N=2J-d<Sd^WXgUVAS>m8(JOuBT1>QjJTsO%`As0Xm%8BTeV+3y=J(1GzWX
zn0PT<>-_Fy3<~0wUmW%zDIB%AH}>Uj)#b9RP6Kjx%riRTBW1e%i;FIsHKcM$%{N7A
zo>6UmXLL@l&L-3f!5q2E8r6byTfglc*0boJVwm4n_Lu4xp8R>PBhxfB9_RA{jnIId
zE0g6xO5cyky0z$K$2Mh-IkOivsSrQ!g>3hN*p-Xmq-RWZysv3Qq`ZOgDwsCmArOu+
zkIGl(Mbt&0GGx0;BrxsE5TcB_|6qgT?ifsw-m#hii0^ys@?;DmE~d%_*m^~HoOdD&
z+n16MdP%88MAy1jmQ>nThciGv&@3QQPr45YnWNvW0)A32u{b$%Z*Jy=W`}yu>xkmX
zB;xp+nutL63uR>tveV_L08dZ0?_N3*4JLdw6D)$E1!P`zwRrInty#46ECj|o$&wbI
zFaPMchi2=MGSO#Ofg8VaYkZXj$aaDi5WmEe(eCu*WvwjHOI<ZwVKBU_x*BcHfG}+B
zHQu7}C=FnI!9sWtvNe$uCaL?fq_xpKY`%6OrCM4?yrnM%D_tbs;G~RR*Omj6jjno+
zIj-MUEao^jrTzIg{eVAGUEq$CqftN!knz|J?j}OtZB_hU8DAUTsbg-EKHj&!rjVg8
zU6KxvY11?UfaWSz?fA!X$)P~jB1ZTV$>y3vmx8cLM3t9mO}zN%C;}3Av+pMp;C!{R
zH-BvBzGL+LK!XW-1t56f2TkvAOjhwn#+GeUlQE6I>Zcv05?fQgqlV6B;uhAJF=mvR
z(9)Q;GcO;ImJ|UWd<6RW@j@#}APe|nShhVL5J<DT<92^qy$V2f=L&~SVc0r{V12vy
z`h+bvCWJw}s*IbzJdY1zd$7G)9N|t1+<5<H2Kq{`PcpX0yaw=;exXG>d;(k24^)a2
zX?Sw5D*=={U=6Z-7)kmQd6f@X!+j+J7$ywPPdn4Wo%lw8^2R3;G`zEM1pVQA>s_h{
zllHkvL&gaK*|5SLLC6#e)oY@8+ei?$t=MG@iNm3`jm&Fi6?vbPus&TXPkP8Q#?;|N
z{Z7s~A8j%QV9>rH@4YGWxfl1oM5~@W#w{v#FUyR!Z#2&CV~D*~Qt0Xdi-pRIKqR9C
z9Ly)2#Luz*=BPIXkE*^rJI1)6O8ZDzYxc*>hYJxjh4Ufo&26m5(f9~1>SxnBs!e}f
z)`V-2HLPon*Id5@H<mKNZ*T6(wYxiJZ|!n7&t9t!k9JWp&gE5lyo?QLPwb1Z{SrFl
z>!JnpI*FbE_2-Y+S1QVhx49ir_m{1Ig)i>*J&U4uW9dTWg*B|Kyjen@ZpnWY`qYWd
zY@JhxO?7p7=)7}<f)I<|xtOW!6fL~%o5fl03ALRBCY)@J=TvC60xHw^(P-}q+QinC
zo^vWkSi11aerN%Q-q-5FjFyW49`z>EBEYrOJq3X>u6M`+S;VttPwM4?3AHc4BD@Zd
zD|}N+Yo!FjG##opy68|;*Lj-=HE&f8ihlts9DyMc6_2lzVu%x-O;%cuw5tV>u9T-U
zs2UIRh_7Z-OT{UUX@<d#MfSafs^SrBcpv@ge)aALMf>?RkB6G}QM$NmN&MqVQFN-#
zMHeDpf4(YjZUCkOAmqo85XFLu1VriRME`m7!J2UC^-IvrPd71HSD4<;g!<Opehy3@
zWb}UI)3B1lXbPkB&-g}I&pp0>xS=igC7mVOBD6HeT$#0_Wie@zO7g6=e}yQbZWOGT
z*kQ-glTuzRu6&z^J&w9w=Q@JQY=Vx)v5l3#xi7@`ECzfWq6JQfsz^fFFbrk4@n*!~
z(a<$y&%)ZZFy;q{1=C_SCzM&k>LI<s840VnofpbYY)<oHLB_V(-)Pjm3x4{-ZmUOD
z##^@zHYJp55<y3%WbqSuKt%V4zb_;G=ULSPS!e*}>0I3radcrKP|3MGLfkOL<pAM7
zUHNLd%kRtmAz&om_Nw7rIxIZ6Uxq0dw2X*H8<|-9sw3D~vLviUF3fWy%scKp>9I8a
zE-|b@4Yy?q(qprCslTx{tlGS<{#HDYp#xY{zQ4dW^c<4WhVNR;sRAk98Uq#`mS7u)
zMrW2XWa-OBf92q&R)848t?In};~U(#WiI4T16EGsQJ2&89B=h3C?g&H3TKm{t3R2x
zv2uEf<?HXDDsSaU8`?Qn7r6zl_w9VK_I}HqHC(QNuUcP3SqTUW!+O11j2<$2qcN(+
z6zQOd+}gS=vD)$KI3geI=joCCr}N@T--)mOPqxMH*MfX(vjEY~KUu%Y0tDCrJQBzs
zea{B3KUti73Cx&*ub7aLxG4QkcQp9&l$P29PJ-{uUeI$usM`3b=8t@2zIzEZ_*=^W
zC(G24R96)j8}jhhOZjLz#symG8o}9>%@o{u<5E_%6cCt~V7)Ur&4lAdT*c5Y$?pR*
ztY<E+9TpN9WtVs}9TgRTpXl4+U4qynXS{wO8&4*N?~RCGE48_rpsI2VK-Z01xykDy
zecoJ!e-)T5V`F*ELTb(OrH}b+Q5CNx6a`SHwhGae!ni!5@AJwoLGbK}&xkU~vVMG4
znZh8-XKXHsxO#7%)p;EQZq>L7ICd6tZ~iK;QG|ParA2VH2vWs5^e*1OaZ7lP51psv
za!1S6vRAX5Wdz1|09dqr57QWAK7DEZ`;$gm^|CU;6|0dl`#X9rtNkIH@waD^V_0Qp
zvVMcX!Le^966qq{A1;C;%0#!rDmnIb=wz)NoJGA~U!CMwND|rV_|((-$MZ~p4IB8I
zd|blMd*TaNAAn9(#^RmDoBQt8_lee7QM>bSKfOXLzxQ)p$s~HK)009YI(yj!FnsP@
z`Qa`X!3bMjoO^cjob2ymd9$b9h}f(s+{3N<0{3?>q6*%HkwvbqF#e;%B<K3iZ@%R%
zm?l`JHSytv-Djlw?s>vetiMtvDd7?x>9)m;u?A2}ukxQzeVmoAhq5V*@T!JsOpavE
zVQ_AVLL4p@7Hqw#MBFX$GCpwE3p^m*KnRR)Q5E<i5CS-0$h@N`xyYZb$LWBUEWM=e
z_tqK;xlev2HzC!MBhH%7I!_*V<Z1a<Wy?K!mxDR0yCd*@<rG}<=^|vcb#5Ti=f&Zh
zgkM0|QmlZgdgQ7O5C*2%VNdMirSB3X9#xQLT-;|&5>DR_&JEvRRE4kGJdta`R-Us}
zd+N94F<ePkKf516Oc8zRYex+mk~*n<;aK@<jaups5D%Ka*B>}@;8sv&>n+O6oB0m+
z9>G<x4+L0(kIejCs||&;McibUBcR2cq42)L$=@yIoa>1WZyNxZy9jt5c*``LRGa!z
z7B!Bx)ZR9<0u}>6ILq={ld-T49RHSE)rw3D+jo?J77M@>k3xKM&(8rdPAy<O>?ejE
zH0GvN_m=}TD(dGuw*R-D{2w>X!+(PQ7dmF?77%Rzw~O<-ajta(tDE@YKWyJ78sAs2
zAuC%++3lwkdOiOC+fn}W2J_;dpdan;@>v?mWccupQcI?>n{s~_>rp!RkCS@zAHGqX
zJeAU$L8>|38lsR-NjtF9Q{v3}I66z?a7Y7NYz3zcCY;z4u%j7}>q%X1yW`R)TLOHx
zHZ~R4Vw2Ye{w{n>9n!$MrM+9GzSa6s`_;*InR<JYo*GqCBk6>&fH0nE)|fth800am
zRQcbr00&Gbv~vfC<$T8)T=C#8^}h?<?*6O*s!k(b1Q^3N<KFY_p<xSy_Q5?sc3gn5
zw4@wt{pR<X4dQ|y8s{vYNCt5Rvx;C2^MSJx%^M+-dg^<>-tGS{6qGTeILNuY+BpS&
zkfbsCV*?j-!p^DT=SZAmQlq?SC5&y%gtiiwY0$MO7Z1lz0MD4m^~iU-$o|F8geV~9
z-;RH#c|$E@Rk!NQJX^KqsEm>57HOAR`1|4BlX;=ZDW#S@i#$P9I588ny>~%!S#M{Z
zLU$e47eGmtSUrC+U!^9%ng%VM86eD*Ioa>)mr@%uDik=V?6d3o(iSY~pfRv3#@Cs%
zm|nw*PpPhS>GX7m_6yg3FtsicqjPEVEbYuQph+)E9+eEajw`PzO;GVie%Pg&HmxaP
z8BsDxo1ds*{g`*8T;SG~5sAo?%yPPQp!e~$Uy<9V*7hQ=CmEgLo+6dggO$;VItu%l
zFM47Q4PekCS|hG(+@GpS0fkO+wTw@b3Hd*|pB&Qd<Mb@0mqR*hjtrD66;K&p4;m-x
zcKEgWWE%-{-L++CLUuO!>}x4dROo!j`7R-2a}TXek1A2|<!6s`-ApQ$lUpySEYU`8
z{Agely?4RItxCD1=Yk~XEQ9~6vY$(ie!=Xt!Jy7F)KEPiJu4%N1K^12!qacrExWt(
zpH|a1c`545tY-{g+c%^qLsrP@hfCxL{;g3Qf>1tw!>uSg|Fw@dB3R+@&-imjK~1`=
zn1b3Dx4Fb_)0?|2&F(`Udf#iKC3gN6BPZDk-XD1Cf<ZX#y!MKFvbWBd^245=8@D?z
zra8z8-b7A#C$q<ldGaM$l+G#0z9lKl^f>pwEPuXaqjrC2b<_Mz$OEaC`=LYJ+HsKR
zAy12poYd5G$cj%1?79N^*YHiu7l*f)6^?rd1FWL}Qq+c4FOGNdvE$p{;u|fWMqZ;6
zaf=eD#&=)uOY1V~LOrbgpKukwktaPrxxh0vJN#&xjxa1@?wW!!?ANb0Q!&1iNE5{Z
zLYh`iED|U)jR*1gY1OOzO-pTxiEe4|qkvPlo?4<WMXd}_$8G-fSca{hO8c~Erlbu+
z2W4SUHLhY3;Z~JBl|&a@NNs<(J)LBd-CAqfKJ|EC)GOn*3sgX{(T5e3_b7_0>p9wE
zi}T$KwNY`L*HGvVH2I3hYZpnv7iVApbt%0BUJT8s%E$?>-8HFpWZ`U`;2A0DU%1!(
zQdhWaN-YXD%s`%MkvetV86$Fc=z=TA+&b|@_5+P*Y(Pnd+<Kgo!_cZtt^C;48jGFJ
zyRt)t<2l^<FEc)tzCPXg#{6a4rJ}8mhVanUUh%^cdm-<NiT^w2LT!62@0kXgeB3Q3
z&j}v7V5{r^mMLhL5B?Z`cf@hXTZ|1HZUI%hR5Ri$)(csOUl=>M;mLLEze3`z#(Ojv
zP0P^*^KoflSo3IBQEYydv!}2kqc5+0K7wx4=?>Ep+dA*6E6-*m-mo7%>M@Idh%msh
zsN{T>;)vfFAm+hfi;Be?GA@NvOmuCrm&!h}otf#dJE$E{fCeU*BXxc9W?t3GTZ~OS
zE}2?5*Yqq*<Pqcc^pcj&Sb?v>ub<1FrjE&J5Ogc`gUvi-%(Y)M0+)Yb5K-4u<@}>?
z!n!uKY<PtkE9mR2Pa`1~ZtyEwk+%QbvdZ=gOZHAr8QRc*m^mi<7AYpIPO6;TSXohj
z+2uGm`S_C$jA`z`)HM}oM#a;QAQb)ZnwRs`0_4>Av^Qm`>7X(mfT&Q+9<cpi=w!Ky
z&jV3g>tL#QNs#U0ffa|$S?X%s{1NY&vte=k)D1{lH5hk)J5ddE?@Z`#%r<Dg8RKz)
z_KdjoA|@w=jWoF!Q)gqC9<ZF@RJMa{F{;R#tSb`>&~JAndbo#_S-~BKa0+YIikI4U
z0?&q6KJ^M@>VVR5-F<r|6x3k;Z0z8NDmx!vDCnBN<3B0O)i_tbBw|B_T57U)^jsZ<
z2O+nMsaxj}A6wk(maziE+@A~vw2k}o{WLE$PB{ChjPd+g5_+yU06oR`Qr;$vmipqw
zKXW2=T=0NGwyS>VoW`~s`~Eqi`~&aq!^vA#R<LMMq~HNZ2q@b6+}MHC*)wO)xxtuv
zpw*)SbA|Z5wdWw9h?Ylgc~x}k>hZW4;*Cy*x>l!UJk0iRMD36z7WShC-7(QEGiMSM
ze)#(3Pz4e{HuL6w1Xc4>mwO!tZVE?U3wPeXf9bq-V2XftxI7B3@(u?ugM`^GCZH=i
zB?i1)&>3}|eB#HRLpbU)*B?baYiin4dwV9&M`K$}b&k;S)V1xQ`CR+<^vPVr#n9g>
z!_5ayMMg_j=Eu$1Ifr-D(`vI@ff~S3yE4Ta-370n3p^0xp!WBo@$U+0Ag6`68^Buk
zEx7yf2MyOBy|?B^6+yfe4rGR^4zbB&q@k)Yf}7sM&4m0H(F8=!9Ls&6#f!~S*@Odp
zJT_oFyOLS2J|FG;lPdtu#qS(O>pI-ZC4H*05qhPT!BWh8*$LUE_Ru@Sxe(J>yMJF|
z`gQgDZ_H~|`)84Ln&$(`<KsxV4aAEZk7Ctr!sKnPDNMd=sPz@`xiNO&TUp!P2tJ-b
zq$@pMy8C~~`|5`%w|`v_6a)kb0ZAoQx*H4xr9<iN?ruatT2Q(~1Vl<e7+~lcdZe2H
zh8|!D>AH*kJ$s*h?mhp(<p+6XV$~;}=lQI6z2C5#_2+@Ce7->;1uu}?We>c}Tt$1)
zg1fPa5)9KsItp8~O=R9Xad+vNyqD7V&D^gNU@@@{us-EC9prCBR}OLRe`*zIOork%
z^O?#XPaN|Q7IfamzVW7*9HHOT4DXRAgZj{FEIBB`aJ+5)+|5oA@795~TJgVpaCg4H
z=ZO>t$Ikk?dIn_*16}lQXS=!Yr8@<-Wt9$N!XIgtYr8**^SFG};#Qd60v^({Un|cp
zi066mRCDRs9?Dg@LP|t#1ef$qSc-h?PE`j*;WTaRHx@6@PPY>Qm}~75Q7#G!VMAT4
zd7(q8?%cMs`;kmOw<}6`uY49{(`Cd+;^(@2uMS>hFBf!pA2Q2O62L?>^_NlOhKpQt
zZ{q4o`=3`fkC!${&lP_ngo#M-_pVwDa8oH78+}}V<6SEZ&o+l|{c(uceByY3YQ?@2
zQrnJ;=*ck6)3;!eY_BARy$<^-oYfnu6P7HuU$0X~*lcM@w~$)S7!4abJ}$2AIY-D#
zjlS|{peD@gJf3KGrlX&O`!T|EMuZh%OP0&b<mopX`HzG@l|KVU`gIcg33c8*1uqv*
z1}(v(X%`rgvY?HLtTxaFJw>TYbw&L!l%=BRysC}f^1)Dgp{<~#G!x|tz5aRJx92=r
z>EZ+3!Pd`%9|%C(@I#C0wd%(LwGX4t=vKSqL!zbC8z^#g%`c61S=k}IVzu%NVk_kf
z7k<_VWN~#b$f%$2U|xAOUD?H#8Snw|R#HZs`_b*?63KZQ@qtY~Nud_OrK?URMrmBz
zzR_)kw{QHhm^o3sd=i+Z`n=fB>7w5>o5>9OHubIH-G^X62=9FaA2t07k`yMAlm%WX
z)Zu8%hPJciy|nQNPd&bCICbO|Xmj#obj8j)RjM~@*vg*)nnHr%8Em_ce}2A6?YEAf
z%^Lq|XXPYHK~c$Rubt;|x#^^7ZeJAORC=avqt6!+stGxd<V#2p16iwj?uqNicj>c*
zW%_XUuS(9N7hWZpVyTNuc%$N>a`Ps;i#^lQCJO{Y#l>#8Mu=g*q)Ms*?f(|G&4
zc$DATcn(=*BW||!3GM0GE<M)exNPlX1k>VNTRl*xZ7RsqH=G-;<&a2;`X&sFC0v%s
zxhj5D&pr{k{y6qq=kEF3i&-4+5rz08M%q~C2O=`<xaBiX_DZURv_Kj$S$bd@w%U~@
zkY(tT8W%&m?Bx?#ZCVa|Y^MEo7L9~-zWy?>36mCXxDw4O1uOkx_z7FF`xEr#G6Vfx
zV#anizmKef+v;UWW$NWRegTJr0TMnXewo2e(ID>~pa!Mc`&n6)c|q=MIEVtNMRu6f
z&uf!=zhAt!Bw^c<f!g5@YG{G7w}ck~*I$wI&ossQ<mdH=`_A-|vIgA6#VE3N>5;N?
zO6BrBU`Q>%<GJ<ISI4;)7Irt=odb*)u_c`pUk5R~kGb0xkByUHijDj8H#;|?3%RgA
znJSwSduP|hIylzU`H{eYEG*8YD%G^)?ZThml3A1zuL`G^sC&!y00wmJR;8n#ECZ9S
zOCY@{&d;qlE2wRqtYt@jKZ5Bxid60r4`^rz15Z`|#45s9AD3^cRwXJf<ovEcmIw?z
zqonvQ0p)pXyR#(E$z7>F!xf*R8sHW#h?gM`ElAro<zs)2u)E}egYlu)Hhl7Dqz$a6
zcaBq))YieZixQC=nlT7AU(XgP{YrRkZ%S4_@Eg$2r=t%0&lfh7&}m(<l#xszOy$`?
zQXQABkwol4R*s_daJ`<+{Bw@Bvc9Z^7mPSo6D(kYfh7y$sGDXKm8z^)i=<E?O-W%|
z>z}<FDxS}7Z~iHb=(2x=>6Cn2a0P2x;kxQfK3S8^TB&UMV@(xvYn4l{m772u?h8i&
zIw}5&;0A%Lx}U`<88DX}|LidPJ<HU&uE)>XzSvbAAD()Hv`ZBW!kw!EEKFEedyzHN
zmg%{wNe(wFPQJI96@AEp@bK<4&JkWeF3<S1QGn#$0Rv?*!8$T;0?u-cLwj+gqLID;
zJ)`<b%91G1y+;TxdMc4-5V_md7ri(|DhJ@b@*^W6enw0$ROzVa5ZR04`{D*{T*h@8
zfmHL$8WG8n`vBh)yhn-))@K8_l>IiTMhEW9EzoJ&b|#WQXZb|3D`$i<_-W9pYjTYq
ze8}CtGxxC)fr2r1wAd3%ZX<58*L$_8e4mTS8`0_;f+usQ9b#Fxq?ZCdP(6&zXFaK^
zB^q?$up2j?E#7ZCH^+D$_N-ACIgMIPbtF%TAp-stiSqReyddF9Ua_8JX>y)_ypq63
z^D1@^SStWOuA^@UG?MPeuz35KNwMM-)T;E(INz9;_WKwYO>6FlG?@CO2nJOCBc9ES
zC`CSc57@9*k0`;@+UX@Z|AwZSJ({@*FDEf7#a&H8$>B0tsWNt+bjp>57VT^5P!(?~
zD%pOyMCquZ=~XKnSIHB)qn;b?W#yDsuN{cE9ki?Ro)l~N0u{V0TI`_k_DRL`eaw)M
zLLNz>vq$Ddp~%{iVnV5jn(Ui*Lc&b<whn|+9myyom_$C(Fw;fu^j8UamGi@GpnVjF
z5}m<IkN6@UO7eQVfAgq^dh}jE3N)G+cV}mw09YNc?QLO9C#JMTV2=?F3d<87bs1$A
zHA7l~$xURFBF2ne;bBB~1>ar3f$6*phU9;3(u3~}FWyL1iV4TD1YJ&&uOcW*B7f|W
z{!2e9cl|XTRf@52#{AMeSx!(3$jkL_>SYFscSHT-c#<j2+0(v`*f*lfCt<nPl6?M=
zFxf*Srh8GFA77dEyqPjC^)mnyPFfpbAY<E&=a^L;KP&a*!oT}qx^Qo6^CoX59z2$Q
z^{LC}gwZ826or=6QflO#osl(TkfT4`eBozy{%FuB1;2`&i(>V47(>SWIoRMI>}1rW
zm>dQH^)pFvpOl<eU#})dPkpZP6yj%1#M9<TUU=o!O&QWLHOqH+bB>SSk7YVpT6(Rv
z@Q5p!+^eCY&&)0eB-MiOLt^|ekqoL;C!36Slk+Dt%yn6cAT_M*1Ss7Zm{Gkk|5Ug$
zb>ZSC@zJI*Iez~;k_GeYFFPI?c_GgGi)i`?c>H+!O8{js8hl4wiuydqxPHzR?yeVx
zPbO0u7rtHe1w*n)6i%R8-Ce-il#f=angX;UB??S&Vv?41a^aM{644e?%O_!3W8I$>
zg9<YQ1l}fo|1_8^;h(ejQjZPPd2_l1)o#oSo{nD~63scEEFLglb_O0A&UvJ+diz}v
zHMNXi?NB?O9ih#y;T+EE2X`M3M>{bIw3yjpI5JwVPEYP*Ok~ODt8aPE%Q8OhWUvMZ
zly=<0&;_9KW;8SD083cvRLWfEuEecYp3Yorh4F`G*by^i>`loacSlTM6ogys`ei2#
zcO2>&_&2*S<uATye@Dy?Z`spZj1i}>z0Zsx`4qgh`}l#SAl-#<7EG;ZMEUsk-T8wa
zJm3@n#q3(8#lMO?k;yW9)MMrs^pp4-N|fMbLbNOnWmNvzYhqx$0JZ2LSc#+U#Or*n
zqGN2k%l|M{mX1;@{?uQ<sqK}+_^i)C#(nJ<BElhCA}OVczlgWFAQH*T8mSaf`xt(9
z=Z{EtG1ICekWg`P;gIk+)Y-S^surX#`j}<AeieZ{$T-OKBA$r@#&9o^<o2BRhX?l}
zE4X-`Gmv3^nXz{+iMKq^zf>|4qQr%X)H%mtXxVB$O1V>)6|#C}HuZKT-CJYITy238
z_}`eAd#lXHL956XCz|1^*P&mdWz?fzsxyVCrg<cTY2YZpr}ixGo|w<ajyZZlhRaK?
zcvOaCUyy8fwdg^I8}G}6K@dtxBd<K<%)t(49xvSe6oaiX?fTe_`kXuPW>7dzL24us
z=5Mz;n%9c+NK;?VU89uA<GW`hZEg~l$h>kUVn*q4*ZUFD0Hio-e)haa6qXuGL^9*`
zTeniQ*mL7|4b=333m1qpg~#(T=hztR<2eqVxET8ImC)9=E7UjN0a~W=KF28|Y}i;h
z0o<|{hJ-3~-U7fFG`z52S|`YQD}QHwbkyuo>l?d|k)y2#2tzz02D0T>KJV6#A4q-_
zqUNVj2l4_dCHD2A@NmBzuj0{9AN2^OcbtLhC;_Zkm~F>%pZ>x9$O?zC2+C(5SejxJ
zL_ZDsPRu}w%bQO5)i}K&z>xnnFIisYV1D$RF@l|vJ4KPe@${^E*1Saj=CwK?%Rn|s
z@<MC!18ZHD?5MSCX`NiYI*2?bh9;E5RlB>Ozr1v`Ur-_-KY{PgdG*dsxU){%lA8o)
z@%@ck^8nDJzelSw_Pw#g2s^}7fhbPJR?$MaDoOyr3Rgoa%={tfcjK7*ewZXg`>Des
z2eFd&nt+rj%vn4cSZiAd-*KZdNK%xseB_FJREmSn9Jd&pI~(|0f`XStz{0nO#2@K*
zVt_Zey5(|s<tiWRoSYug;u5Rg_yW@SrUB5#hB$Y*$}T>-j+Q*eE=3#rIx?dbI&ml?
zSnfyVo&5RpSW@IbB~ZI|y4{7ABnKBOUN}3==R;gH{T3Ix?$96AtXWKcO`!nyp<i1^
zfL;2cQk;6s(}i1Zs!Mc*S{w$PoI3|7F8fozKoe|?K*qsR8_$&JJ3j$&n&1}Z6Nl{v
zKdw*jjQ%MeZuLk(l*W{sW)-3waH}a>Ypr6g@Zjp|;@?8Ro@Wh&;mEMJ>N;MIlN)~f
zJ)7n|f6N(sHD<N5t`hCV-3N$bYWq&Ud)df%1eLLgOv**KY5=l%SXhX;*+T4XF|>x2
zgMZIYDlkF{@oI=-Qhs1u<K%QvUa!`?{ikf$;R15l19~A?@GQuSyDpE;yHR<d_#k8%
z;K}R^puD7r4>tLZ78B}^clu`ECdK$~WnkKXNZNyM{0IP^6#yi;5$rP_C;b<7(vRdA
zC@lneS`nhVg^DQ9hp#Q%<woxZt<vCd$xcfPrK4nuKGA$zgtPXzpv>4gPzgNi@5G;#
zpr-tUJmaNwPS^9{%7hC0zvl^(P`n8C-!4!9B6#o7Gtouw4DMb_Lf;VZ4Q2rm=@4U*
z7VhB)<I8>Bjrju$XJlSHMm_v;Qmv?~spb@mbBqry=B;8@;4TqfnP`6Lv~I33tt+bt
z5%*LNd_uL^L7z=0)?S!1dAVA>3cMUUD23R9JAPhsRkD5cLiP-`Bw8gFOOaOEPhVEw
znO)itAREZ#(!^dR=jVBGo<!LSQ1#-%aBU%ijQ`!YR-Q11+n1;DmcCr)Z8Bfgv0z7n
z1}+M_z`H+%3}0*s*ni45om0J!KYNRdymYThcQK0rs#vX#G}J$PNMUnd6ta%5odqpy
z0#W2XF6`tO;AwuJost8+@Sn!qpD0^#vAs<GJSY%cWcclSf^f+Up!Ji|Lm>M%o?JN`
zSo)m2SfW$=%htuY$HWBWG(G_xM#{iDQRqq0#lQ!YFYwI##fE|2xIN8d0PZdkr^mnk
z+{*5fP=_cc>2%DEd;`EjrKaapu{$Seb!WX$oj{gksZX>i@C14h4v~~BMbD15>BSSh
z3o{*ZVJ{U}IgwfothY|xss#l5*X-P9fxsjYh;^ho?8s`?SVeCsq<vo^?K%WGt4r*y
zR|&NRKgdpxNkEUC<mPGCsOX}a%oa~1_0g9NZVVNgRhgU|utgV%Ua6WL%k-oJ5`;hC
zr5V~t016D)^wC_N{&CesqAYtJJ+tJKN{LC+%UAj*_Z{D86xu1^Zy<-=X(c6{qx~a{
z4Y;`Wl}r5`hY{0_^3&zpYb|A6I1%eppP?KB<73Cj<?++(GipZLPC*MUARC@29iHb?
zWa}fQDWB;`e>pO7Y7^YuJ%J@RO~<jG{@HjlYGdkHFYX64kt^5XC>lD+q_g?PATAeH
z(Ne{3n_V00c*oL)!K)b<ta<*A%;G6f;Pzddv))avaY4kBV@;?Sph+YcV6Wv;uqs4s
zsb5r5GLE3%xhjj%b_~Rak9aR@=bkv{RQJF-<D4Ifbh`O{z^@{}0pk2~;ysJZ0YV9O
zDqNUwR@UO5-|SrEtJ=1jsUPxAx{g&#$6WQkn*-^1LE0xWa&!e2%9bwUKgA}=?-B$i
z;)zB&-~}MVx%fCGRVgJ36%|ophQRx?jH}+Rm`KRa6{FqakKP+K&ja+sDnoP#Pi9<q
zCS-;cI%II2eafZIl@hL3Gx%mvg?18e5F2*4$D3sVC>6kFpp}3`EE?PKDHq4PvaZtw
z`YK1PV{imJj$jMfu_(|Q{D8Fl*%wV1#P-_<E$+UB5#a(HOQF&`w%Ay=fG}0NE3-%i
z5@FwO8W3#|z4}a!+uOv+v3ab%jBKZLK-Xa;!`&NMd&pvnrRPxWTAs8}8$n7CM#^4|
z2cgw|I|uaJrEcBWb<2DuVj>wWyL&ZLT?eJMFDm<l@btcy4A#dHISgoy3nyin0Ac{B
z!12K8O_G*Zfobhw_7#9AlPd!7Y@GM@>xBK@+78X6QcS6>X}SNf30j`Om0hCSioatY
zm24%VFDdSkvhd1fcsJ^1o1Wu6^l=7FER(P#cgK2*b{1TpVZ4wpnPvefH+uGfT7prg
zDiWx#33}x?9h4rTWKzonT#A47UUPrOcrrlm3a43s7I0BfT*XH8$ExY*c@PHR1VHNo
zF-Wh}5~Ocsg{&`I6B#A!rr0+!=+exE6Xw%XUEh=(`m-6rs<f>WPY>37Dqpp&@BD5i
z5UEtEon~?z0{{;o|H>&&b#7_`*@Wm0wR!{X$?6K=_h44N0TGg~VjrjS4il82)3aKt
z)#^>aeF=mUd0w(*0Q<L5<#p<?p-%u+$JN>IJzu~BX_@AA%&-SXBQzl1=|L9=cf5CF
zU27W$5<5?!j&B4-_<qHKI>}UN2jC>vR%YXrR57{;R1;tqmifxW1eAab)o8)z#`pB?
zm%P1xwt_@1F65nnHiI7(_~PTd1Fvg4Bnp3~G7U8OY<oZxRl*QQ2nSv=K}+#z%h!2H
z#U}a_)-V`DW1@FRDQX801EB5HRyb#SyDU9Gmh+=BsQ{&@-h!e-ogPeF`TP2pQiZFN
zf@xa@G8U?YgtVL=Xix*Ar=k?L+CL7rn>;pIkc0ejahssFG9TJ1zz$ztY9_0UFb>FQ
zk{c*?=9j2tyf~}2Rpq9BQzCdR61n{_6MM_ha#vQJv|guiphG9=T4DmEc<orV@l(Uo
zt~EtIySo!(wyg?&05H(zA$98Ocy7+bm>!&;GJ_}vXq|8h4N}fD6ThmjR9Vu`VG;X9
zJT_YiFU7|FYXY^amDYQ}E?+W~KLK|ssyB81-tYueC*sy?H=cM`4#daUPDFKhC+RQ)
zcMunG`4DTntLB_l;eJ)nsBgC-l5jnL?~z#Q^?DhWz}$_Pp3QEnKtzGLrp6HQ0Q1yg
zlpQGE2X+qXRi&$Two;3O3nIYN0oVnfS?$}QIU{8Rl_`K>?oQPR)ilDEPesRdJM7Q4
zeT}(k0%6CBh0}{#J{1VodOgD?r0Trc>e~|XsFEAmDZ(vaS;15RmDA1-IJ?61y4r@|
zdW|3urTSRHYW3i}_o>Yu&@fIH4o>7Ft^#26+u=Yw;!CGA0R-(i$8enO?3yXdb7DZo
zt(x?;D$>OC&1&VxsYHMR2t*(AK$rL9J@Nnt@Z616trR1xyThCL1Q;v3AbzoZw4bk!
zRZ?Z*J-a|(lW>^_&DXK!mSy22ZGeG6=dQjapSz8^_!<ZwZZihrRwb40WOK7-oIFEu
zEq-kKxAIX=z-~Wbsb4fAb6>cE4tH$_HzINR`~8<~W9IgkS!D@%sT6@nP^Ee#PPTIc
zXk;~hn3#z12*{p91hAu;u`a%3%4H{?46XqQrMzi|`?0F!I>ps&^=fqkUi|%fE}ucn
zUFh491jO!#M!05ARB^f(A2@G;Y`MLe4CsrjmSaHI4e1_wb$j*iy#PFtk^#{w54sLC
z;{79tv*^THb)bDw9k#x_m^+|?%Gc&v{7bH0qiz7`*YIfi1sDI!dTmxR?&ITQo&j0j
zrM@~63`8y{&&H1&=kG5YxkMMh2tQ4xblL&69O!UBO#~H@y3tzXV|VoXaOJk`o!oPC
zATGiAkg`DIH>wkKjUbkrWwG6QQl_av8O!ADu_z39*E>ustdJ_DP_`KN8GAf#=@^x(
z^QdqJ5YCQ|0XoBl?RTZSgS-#uGdC&yV%6SIsWL`}BEkMr*E4qbf^MoS!F$sTs<Z_s
zn;X5ZHCnyb^AD6_$wm?kbn{c;#^n@A-uk!&CldN>pls!?-<A~h`22wwlvQiXZU+@@
zUDH_rk&Z(nNTXj4OGiJ5H~PSK;2xk>(+S~q+&M{=z1H1;-vfiZY=6X+xMD3bYfmd+
z4MfJ&1(6jJM!g^Ag?8drS+PX+q5&^Jv643X6F}7AxAMIm_L!pj=U(N?sv!ZHrF_<A
zl#{yps?PID+iD3diV>LGmv8IHq_S;kXp*w*aU082UAt!%q<2ICfN3PJ_x?<Pv>O#U
z`qie&+hX$(=z<(n;gQi!P7G?FXisz7R!w)(W_Rmg-_K%$3?xfaAm421D74@50^)Jy
zGUvGN%K=SIK(G4^Q$!S7?C#=+fs9}qBTfrzoeuML5~p18@=1t`rR_NK&D~P!yyfL_
zE*xR>UZJ2zMYttgh7bsOJ0{Kn%1jta9XKaDYsP!Cw?M5HNNPJvYY%%gqR=IVW{(o)
zfy;w1NIR}^W1dzxg=#e-9_w=nxm2J2!c&)MqBvTM5`E)BZ6=h6xTH(a2jX_Ot+OnQ
z8c-`T)HP*FTQWt;-p6Cz*^R({u_P+MIacW><r)8d7gu%&`K>kX6KVuB=#tYIwmvm|
z_>scsiQq(;EJC;GmGIrgoEB>UQ0~%$jO}b<6v?>$nsc-gdihZ4H}`3i<EX>1i$ZJx
z>j|)99gi{&CvK6>N+b+WSB<G<{Ua4q2sJLu+wK^H52&hR6<S%n>pA}7{KEZemC~a&
zXsxTc1Br^v>tdIag;gv_rAtfA{Fpw&j)t}?k36U|%1U^EhPWGnjFwgdbx~kD%No|Y
zWw#mQnLv#v$HEi1vm?{VQS4}<Pwdb8OQ5#{1pN!AbjcgqGy0OgJD^NwLDB|f_wh85
zYvl-}sA^NyO+lfKfG7X4gb&1<^uub9*RXOy+TpL0W(=agrOM`tOSWF!m<OdJP;0qX
zee>4UX%u33Xuy_p!d06hFpG_u5^yef#scUtP>t^u^#gtfm<yoFCAk=LABX{$tJl$4
z0^SdkIdm;dS{NO24`X}{sudr$yhpYMfS)OZeMM6?tqULKOD@Pki?N#}ihtYxS?uZt
z)h@|qX}h4LVdi_0s%wG(F#!n{l-G8FRY*#mJpy?bL|3qhz~C{5%9?|+0{X8wO^T(S
zL6y;hO9{#_7d2xsk}E)wb0uX3vM_Jv4-yP|41mDF77LIzyDcpTocxR;fzFg<y1U;)
z0^%hoSh!et(O=h4UfJ9jKcbAv1L?H**QUKS^cK&x&I_8yv&giA%8k2LcZPJv0d3%V
zp|dfB=@cC|#KLYoyM`|o=T=Q<FQ!@Q^Z7-(HL-x-Gc%fv#F}1qjN88DNqIB&0L)E}
z`kOOjM0U+Bo!!OVT=VoUx_2X<+v7rz*0~oP+_<YxBu)?c-QK9<rEIubGTySzEOdNw
zYvB3qua5eq4bMptzaG-G6Czf4w`;tX1&cLy6mgz+j_LpSm1DEWN#=}<p9{Cghx`t*
z!SSq(MpQVQq?2b^FBotwBnsx=dd&Zjq`k-k_n6M0T=C0PXH2LNTD|l(2fi5DOkLB-
zN>asf6`?^+sF-1Uj2*h92QK71Em+_0fz=qUyEV@`4Y_bA8)-#rdC4<>_t=S<t6doH
zpK}~uU*y!>NrTWdh24ILh3?#K^iB`G6)@Va{fMNdfA3Oo!vd{UM;DOmy9&!U4K}{}
zf$})GWEE!L1fA@FWEl8+O(**O)SJn}Gg#lI79S<kD{i`t{k^orL`Y~nN5MZ>K19DF
zlj^WXx3@Gb{rxRAVN==r5&4uXuVD6-2Ji(7e7)-A*Bn)XGGD0|tRB2aLAr(Z-6ByF
z2S*y3h$Q5~y^UiIRCSYOesy!GKysqbEw*%Y6US(bM`6-1`|^uluG;k)5lS=0c^)x3
z93hFls?^+F<X_1|1vZy&9QNP7SsJcse>4DBi+U>jK~u<?9PjSboX^s78g^@g>*^R{
zia_R3zJI{nrATVujS?I3v;3#FRaHHwL5ey09Xh?k#RL^eaVieUJ3TLFf?6WoW7H~N
z75r>m_Jn!tx;WmntFPJ+jNKYUQc%?-3I<zI3Q=y3?w7_S7a`w@3Gcteb<~5~;grpE
z7P&hm_jtLJrm;9QzJ5T}F60C2c1-a)5yieCJa{+Q>)VlT;&e?`oFn$m7^=bc>oc_2
zJE2i{X^5u7f*u#?BgYcM0O4mv3Fdhj8`)fm&p&Uf1ijOS8F2o7(%E&-R4Xeaoo${U
zuO>Z#$j@(jYt~e-=UnNilzAUd>1(qNM^r<jcY{T7uZ<))LGa;^eYOikIWf0aKw%7@
z>b#0ghJOopDgC#FrJXUkB`R-&n0q`jY(QS^wd2MXChKE~(YYn2S+hkQ7EpC(@V9+g
z`Lf=Js6_s|Um7!7=5R^0awz-NBCGt~>+r>?ui1;)axD7L#rTEA+NyuohsEVvaaok>
z=9M7yt&h_1j*aSS*zSqD`hN01x1%i^?T88AEuuY#SYrwFSi79y>cd+aHJann-lVaa
zAr+|-=XTj9<nMB*mz0{U&Mo+?`qZRSr4Y71_v+^y7#cg9)_3+fm^?aHTun=N4BB!K
zxtgh>GAIeDRB8VZwkt}0ENNiln(4$RaDJ-0TBd0oj6y;j>vU@9wtZ;RW;qbZamD(4
z_p?)Sl!HjV_mQZDuu@fB<l7$gGJ=W(BK4xOqk$sb8|n`EVTi3vM)0B5PJ1*(s7Zzp
z8y35u>#s^}J7}f#iOboOmN$us7MQ*z5F&y3O_%WISE<7<&V;EV7f<Or98x_*<T;u;
zsY`AISl(|!E>1O^x*40M;|N{a-<v$fsa_bo4K2P?RmkMK+Gfr<@56c0!|QQUf}Es_
zR%4guF}asfbwijXPX<j>Kf4Rv5}Djwd@DBZb*xzFX)rf*ix(-F=#qx2y_;F(ey$X%
z(PLEg`HM#HZ+*CWDf=hdpO340Cwd`%ai|3$g>w^pA)OXJ#L0k5ZeP_N)vQjeiDkso
zVg*x<5Nk*w8S<h+c=X2M^v!oP6iq`dMed*|8MHj#=DRIa&#F%?q|0bTIe*e1_zsN}
z;|!>cfv9zO2tv^HYUxjzjx}hfzN;JT=HnD3g&5i`3OpAo(VVTwf@+}r3s9yAmn>+J
zz<250FdvjyKW{+ItKYc9AGFQ3Zc@d*hNk4%S9QBlSc$Tf91yewJu6M<jLPFFD>X%|
zgkm-szgtBj^0N@XKTPXc>oxD{ZJ=x?w)zjM(%w~$X(#oJTa)okWIkb@$LXZnO}#~Q
zr;nRF7U++^@F$jUc;q#IO<%yldRrfX1l*Mdk21v}xkCffv?SU#J^kG>CLE?;v;Zj%
zFhp6@_PEdW$!ggY4-YiBORULEjr3p@FA;2vzOC1Zclg%kf_eg|i-n{t9}TnS>tR9|
zH-$rBalINo!e!?j=J}XRcG=Wq#hiZaplqqf+yPBla)McpHNb4!YFR!!oj#neMs1Mn
z<PRNGZ8#QW68L-XZ?;Z5!XcS+wFW{tlUoM@=x1kx{UbY1wiM4+mN$-1$PFF@nTlRi
zXBs*zj6=QT@15Hecyj#It6yBP4NZ1jgmBDskT~$ZGepfpI5yMMo*R_L8aDOw*n8~a
zaon)SAIA|Yue}_lDk>)OX;>dBUA5N0=_X_NT&I|!hY3|@Ugabad^gf#s~hKB6s2he
z7S<DNAne|v#~4kJbWAC${M|L!b+j?q4|Wth(ZE0RLHpXx&!|xce!s@~lN6EjSZ|AD
zDzjSjdUr`gNIr2~gM2IgZVcl(hx^c>qHxHVC58eummv(ZM$yltrA|R0fk|DrZJ?<N
z9%GWB$D<@hlj|!W+l8xUTbyv;sZewL=h!}3dM{G~M^FEa!wnAms95l5l(miz7)|C#
zbLxp*`}(;TWo2(r^Q9WQTads96#UOG5D?X_-}Z^$?|oulZR3}_UoR665<_i3q8!Hy
zq3pKp`q?O#2FdrDHZOj!jaE(nLT)D^%6A+mg_=x};|koFbE`t;p(lytoYeLTu&oZ;
zwAp~bhx3kq{B3CKRlY#7wieB8f4e)84$g(yK{T7>Ic^wZiJgymBa?N#?C}|?riL!j
zOcHo822x*Z6FLsW<?@({3duDD!`0bT*gEGL10A7Wr~?B%_|fhhOR3c@M)vVC5^AfD
zkGbrv<wr?4$MK0<*QsXNMC>4*2fep!{T$0<ftCTN`-Kc{uWNTI3R~mJuVR+fQlZ<i
zZ@k>Rq;Nu1#r=M0#Aq;A=3R|*xmJgvylcAG@?Z9U?|y#3TQY%BHTrf_qYy!1pLd6M
z%Nx|KJXqwpjsC<XHYC9yIwL(rW<!DpJyeIM9R97bp@w{jABOhSmW{uND<?`P$`L5#
z&pHjn>!gnKP^?E2^4KQ#o>Z}GJX9{Pl;$_c)zQn-?RAX25KaW)f;n%3(4m@8@;Hw?
zV7C3h{|5HyZWpwg*mM_VQO2QSN4z6(UOj6&d_Pz*=c$l-Na-C~YTw)e?`84;{$I<U
zN0Ex>N&f1(<?jrw9c@f!J|x<$e|kIJaQJ$9iiE!W`QRl_$?_QYwE2&`c3s#ny)W@U
zyZL=q6Km@lm*<e?HVtS8t%e$<(qQV__O*s{DvvrvH2pI>WU?$cq02(Lzc|&S^1C!N
zPmkhW;+-$~kox|)d#yj=zu!IQWZUB1QcJR6?!`BItqcL&E-C)iJK}dK4!5Bhu*VcX
z9gh-Ril(7sVg^zx*uQTJX_x9Ti%c*i8@6*gNaiPYKAFsUS|;^N=gX9qxqoxcNFTEF
z-tG3frQ(+~h=R904jbc`UMDg8?Wj@qvYG8F)ww*$dak4pP3xmm^KmxaDH$O?(^=1G
z#IZ)W2J<wUROf`o<A^pPNk{+Oh1#=r#uhn4QZPXn(AMslA}k~&W|emQw(>|k$J}0Y
z{K5u@i=@4l{H6~)Wo;ro>)x|64*H)oGg`O|&(42D7}_G|PB*`gcJ$hpI2zQxZoUbv
z^e<;`vOk#0btI*S_TAj^-&;oM4pga0ww~-^2tZ%>=22*JVp_J{yp<#dkh!@qjrn^F
zjslt7q>i5DS{u(Db>>!ef$O~&3?Gp0->c*&@@{w>j8VFx(^fq<OJI0f8+yzi*()`4
zr`-@x)Nu&%U`l?hlkaJ-_w?EM>>g@%&Rwj&YSy>X#WGc4>YJK=$CLVd<8_0uUoTXX
zirGFF-S6`wgf<9B;sv3e6-8CfJ-Ey71OKGwsC&c_6zAxBygZh4n60@rZ~nsw+sUy`
zxh_%Hgr<Wf*vw3xW|9lC+bUE7(O92li`<&qSBAAYdN@}K)GXh(GK3uYidM_dd%zvh
zD(J@1S(;=_ok2qR<)Zh)0#AzA+;sE$?d9gveB-6L<x1<{>u+pLi{CWRvN4UNTLPr@
zlMLB7Utx9RFL@N%2FlMj=beOi0@!8X+QMDEs$N}j3~pvAs$cf4_sbo3uyWk8N34-@
z3gVC86*+cHVBIQli99g*{A09}gM3#axwuw<Zp5YT*BBqfKVX$S?GL@Hn|-~Vy*kui
z3_zTK+KbXvS6A+txE8mv6|kd}YE*z%Bp2byKJGLA!G*6W5ADUqr3)+h#Y2ww+t1Pc
z)PH@d!QWOZ9l1YAHCJ+mIsxc1pfajbYg+>O!*gd!@Unk5$by0ku~v1DJhG;oJ7D;I
z>~Pb#SBC{Uz}B%r=9m^=3eo=Xr~Mh#PQAgbyWY&=bQ7hEt6{B%L7UyuV}4&}#EKkw
znSPEKrFrdv!D)|Bv}3Piy(XsMkG7G`7W-KzesaXvR>TK8XU0>fG7?OL@!6+g*QJVS
zA@@($x6X7m2ug~yrLsy^*iZW+t0*2X@lIDuoXqo(sJ9bc@?xy)c{-=}is>=@a|cO0
z^e8I(?7&m+vR~>l#Qvi3NN_XmLzlzK+0n{h&(G?@GP_zpLVbMybt}u><!t5SAQ-C!
z_q%DaU-=)pEG{MB!>Bu#o@d0jSPNFSx03%l!^tk!&BtjPQXp79b4N;4#t2!j5UWi3
z3iXoxpZ2|fYU{gBL%iW682hyXH;!y{l!)V}Crj;`kDRuK$eOQPtr*V`zMz~wNy}RG
zwIS`*=OGc?mA->f`LV-(!|$JyG~yf3?dveya7Ku}Mj;vG;%s}8thp4mXp5&vT$_C(
zc<bb^)9>~$Yy_zP=WvMLd7ut=T@gpAO+LocP3Sph4AJVp>OI#-J9$JbhqC?OAILkI
zhQTRH^}gtIflVXVvw>+>1OorvNa-L9D<h8oj4z&6waZ-itOuV)+~u8_&%AvBy%%e>
z{h-ZAtz=-Tn(+nQ=YKsD!pxCviMRf8Ls{7AShz(c#ip=%ade*_J=iCT*^HBtCgG*r
z`1JohY)ZNsA(4JV>wkG+n--AiYqR*DKbz0wFv$K{fPXF~L%=?Pe=YKVT~^>c#{6Gy
z^D-m)Pp$HQe)Wegx^LM3+rgR=uWtWuH--6r{AZf~-T%V}|DSaHMo`mm=-q#o;Gc{5
zBU#4(ed%(jubk(*UWjQE{{7~7{l_#BBoWe{b(Aj2>_t)b?%?X5b)1&_*B8}eO>Zm;
zA1YD2*nWqckh}c*2{Q3G3IAG$xtHUc^K}E)?+QYRy7gaQ{k=L%St@^(sQ+5_E?*-L
zQLik>aL31^mVNQQ*o3Qp-q(`%-*ymx)DGMHwd2o>xqY({ZS0`~M*Z^d?i1Hq{>$$6
z=uR5rCfDcB<ep6T{}GG$FZ1fc3H^5map11&X+%mcsl)`l0({}d*q(KdrdyBv>ny-4
z;QF7@AL&^Y;6M2LzZK&(r6K0-5i+J?`k$dBF`dWDT$e1lrpV8cHyaOF1EnFh+ivCC
zk)s06`j!Pz5C8t0i=oxO1|v2<K|V&OloZAK$#0?8(Adq6CTOtSxp8N?>~0jZM~#{1
zz($&IY{PKJKMPyy_^%<lgxgKGR+nUQcAh245$$vsp5cYOXiJx*r=eQGZ@RpvR``c7
zadTj66!H9@*>rgH-#)m+sijXkJ}IU?KzJu^CzLBN^o`GabuZ4XTPKNzXB<NzZ4muN
z>e|@K0$J04+`Lx(Up`+epFi2@ItzLLrJ_w=(hS&wykbf~*BJ07u0%B*>Jr37BMK5`
zMd0o{{ny6wcQ?#r|J4mM>J$rV_`XLSrK7E%NhI8VT?kAp+WJCEe4y-n!_dUe1SqbX
z1KwE$OHzL?_Q_NH8&{;wzc~ZbkIK%+3>Vi18TIPiyoXoA3F*G*lx(RKdX(wL^}?hM
zi6u|ESDk3WpXbzsFhiRfcTNmfQZ2Jq4ul54{v2@BE40C{0(v5YEgtvOs6c+ts8;h}
zk0R@WGW+CD!H&O{A>97#@UP>MmM&fN!&M$B7*Tli9QM;P(JBA78$s+hDM3T+waNBS
zV*2MCeh3)<^l{_A?Q)U_L|tL0r5sw$9x>njO}5W+%von0@I$Gyt)>ge@%2AmCk16Y
znwJ}hiI$-)PQ*#vRdB6;<$3?rB6{ciC=j<<Gw_HWEb$3V)7dCu)wAjJOiZu!Y*qJa
z0pYy4MjV?ZK73^d26J_u9gwITxbf*cJ@5?YyzFWkfmmPcqC<ub7yB)|Bq->f$ri61
z?(jC6=|y#-FWZ{nWdZB>=Sakr<#|ZppHchS>c4ztlYo(uiCliOhp8E-BW9_ofqSds
z?-<?HKsL30<DBT~0)b4H+_&z8$&}lNAArKkPA<i<VZdr&;CPcgWZDf%-mig@{QQSQ
z%I6ZEnm&yagsK~{ahJ;@k@DU(zl2?#cy8NFUu|alB}$7A7+#>R)@dDcXB|w_So&8k
zBsh<T4bO_MZX2GIAj~X$(wu~)B@O4#L!-`TuMV5euBM^;HQzTb;*e#S787R5m`DDy
zEgZvYe0szDRd!W%2E#?x+DX+2az}0Lm9l<?GHY!Qjw*J6egRiag_b*aq{+;QU*4j$
zIJw{v5Xkvj6TNnJ;*@%lf!-4dcg>eQUN~cxy4sv>^x9}lJwX(@)f7r7$N0vUtpvDs
zdaU^Iy?psHrGFLTuXyC3Ps!&)mOYTxUkq=m`R!JQJJ$TftlFDe-s;5;ugTA4Q_=Ed
z)R;t=vhfOYDqm&4^k%*hLm=lFpI3R_Towl9cb>8n>$rjggYa2O>4P#;?ukh5i6|Z|
zW&Js(-a?*S+1;)W#+KrjoBLe5Cls#YhqeC+D$Ac{E!x!Hd??FqDp0Sr(T>D>d;ZVa
zyzj#CF%aG;cL%gDb!{!{L~4q(b9v&%Xc6ym&a*d#H2pt`<Fcx*Scj#SKdomQ<lD{O
zBPZt#Y<ceIuP7}ms}6Cnr=xV^D$#|Qt?j6O%cFfVTywB7)&iS8d*=e(IuzsjewFC3
z^CE=uXoOU&L?b}{H8UGqde5M$+m;Ld{%XMMWI9M@RTC>Z-Id3uuZH)aNIO^59W@5}
ze$HUIbYD=NG91d;zOWqSsY>2CBcX-$Lxqe^-UpujeXorp`5W%fwYntXdT|5ltxh3J
zv6uIB)wK#a0$^iM+RD?n?GAV?UM^g>La;W;f57ARRq~g|2QptXp%I1*jaSW>^F3EH
ze?`uCwzyH9=$x*g?s52Vp`JVY%TTS7jx42r;?Yv%{NR&h&RxjI)YO;gY4ig|<aaj7
z%R~z82A>ifKDNxfniOYv<GhbyiO$=(fPDcSh)uUET)g9VU<{a(#>U2p;<2d2mABSU
zucmn@38L?xu_La2PTZqE+%Y;oFP(M#5&r(YmuP6zhF2d_xZeNv>(dXvzezz}au61G
z7wY(PPT~a-(o6j`)P@`V4i+JMYo}<i1(hSiSQ&-V6WsOM(IqnXbbmG1lnu+GM8((A
z5{ApJ+`A5j@$zLg6YpAj@pA{4*58y;b##zNMsfU)u`MhYMo<fWR}RGsrh)gnvt44X
zCPm37ak9Bb`=aSy$+?hRWge<Vn6`!AG47~%xj9uHnt3~2>z5N<gX7l1*h*<ImR9w#
z7(`HiQ7n1gZ)k{+$L=_nMk-6bfKX_#?yWoo861r1_ou~7u*9%`?%S5#x%oXurDIp<
zd;v_drBQ#nP$!R9z~&7rvq;3}B_;#^6F5VgF%i+q*-gVrb0mCodYJ~9T5c-0RAIFs
zp5S?<lfVCnsP*}aaxHd;jzA20fAN^tpAcMCSCiZ6+p~2)oi6pN9kpRu?e@h6jqhb;
z2|aCxD*%P}6PO;g=#QkoFS)sBPa)<#r2CDDb$8-BvBRz>BB1S{3Q@6ZX#r1D>LnjY
zoYUYkZ2kT43S+)pmGRYUC6oOy0SXRQ<)61kv73c_qjYCdvY$^3c<Q5xJX4izu9%90
z-^xYD`}p3lZn$D+seUi8pc0q&?GW+W=1O|Vr{rOoMIeUKSZ&<M<?9K>mWhkj*R&s{
zH-ea4sIRp9P)?Lxc6{+6{*aGTQV}8kLZ60<mm*^dFFag|FTzuD#azFy;;rg6j3;QG
zabmbo-&vjCntrvFEv<vDl4&oA!N4FVdM*9z4Ov)6#(@rY{so2yb?Ck6&%ze5u{b!`
zTUV~rt0MtouTclC5!Jx2WF1oFH@2GUTzble8l<-a(donI)@nFR&YNLJGm8VZ*&o`R
zjEj<za4b(}u9vzfKt}<k!I3Jq%10ttW;!0r!G3H(c!zp~i$@9kU3;gW#)HX@X??t(
zT}PPcDk!Meo$4=Gvxu4W_4&om)t4rfI^|^dd_?d3oulK~B4KzKp6E$=3k&ip+flP~
zHZFsaazoz(=1%d;M+?R|c2o6<is_KMwD(??83dAJ5YqCdBpdo=1vO)kb7}YNB^G@s
zeiH#<i7s%M6*W&25@QXINf$z*Q1WLU_Or1U=w5iu@=EOHbaJ22KKVzFwYylS2g|db
zbJ#|xu`R#d<*P@J9zEb=i%PAu;By_b3($sO1_gWmvE2L=M@mA~3ZezOVJ+UL?sh30
zsV;pioX=a&6V=0qhHvBI8XXp|n_uc=4$OMawtR~q{chMq=X<zq+VMkxlHiul(Z;$B
zDaCY#a(=yX!>PMA#F!Mpc^+(1l*lR*x7hORNdK4Kr2`o8&}y}TnYsD?fe`*gwY`{P
zy5Pc!5-rB@Arw8C^piFyJE_cUrFgRMWCJCi1Ye%_bfd+XEHTU#{<(60I&?D&?^c(+
z_z;Hy2z*}YJ5u#C_j=d(@zVqOG1q%>=Y+6AN`hwcb1`?|$EhE`b<aB){UpDeAe{QK
zy&L-*hueYF%mq&0sE<+4<m;U|Ys&oq#<IBw6*~2L3}vk-op|pyv--&WonF1TSu#1L
zO;;M*D`HlvEBxVI>F8Q$`B_-JHxia-Kr|fDFrk>Eb-|#|6W$gi@LhCn<%O3v{=gJs
zg}Nt4SJUXW_0y-PYVQvFTa6$Iqlfl*432J9`#kim+tjgbMs6K=4AYn@WW*>z3+GTO
zFA6ggrwO_KD;sh;;(^aZETr0nB-7Q7s}to4X+lv8>rMBpgsg`?Jf>G*$+sp=_;gh1
z)34)K->^QmOipCZ`Iq4(%Nyt1hi*+>1{UC(Lna@w^ZWkW_F`or7+Hr1<@=I$xJPpm
zACCuwDqTjY=FXh(tgXX)#4W<pr-mWQT(PSwye3OlV~!Kp!kRQQ4VGm^v-9D%;xjW9
zo5Z%AOP)?t$_lYG?X$$i(hPLTW<MG}4=19>{Nht(Q##RP=oQu(L&fXgsbCOr;I8Sn
zodN%V_T$3<;nlQz$o$XNG_`AEDusd1sQrw{)mhrd-*+XmL}%~$nzeIxM$>iJH~8!=
zXVjYaK*W8{GOtWqQDsBhVby)<XE#yfj%H)4hChA!L+z<A%EGTA;^K(PE?4hZ^qu$J
za~%2>k2)AO&2B~@;>wk!;P23lI2lD^rFzvV&|8E~C{G@NxQLuClZEUTvY=e~VaKV$
z)dj-~mD#QELo_&vV{xLKZ#P5ZdHqo6_G!?&H6^XqG?$t)q7)a8yN9UsV{=CjTZ)uJ
ziOCX!oCl{CkmK&UkY>yrXeL{=c5B-F&S9I>1?!X2K!Tv3h6uM31Is?;hi>Dt8CLu^
zLaKs(>u<9)EDL&3+N#`n?pd$cm8Gn=%Lws%8PFe~2RXP2IV8+}+6t+ErP6bS(PeXj
zcT#l<a{HaZUI&;P!~LK?4pK&UenbfT#FIQ(x2&fNevB}R@J*`sIU=q$Wx!b3kZ75I
zX>>ZIR%U4dC2H<{^+)-FdwJVA>+10Kw}V*)rJ4OV+B0{Y$PKc7sMETRmSX3Npq*_)
zsg5mJ0mO)!?Iq&!eW-C|wX@eD<ZD~#nv7FrJ8rT_T~^zXRHpaVByVu|_79cdyxt@}
zvkAta{YkxAzw&8IoOyNyA|hl_En;V+W>s(Z{&4>9LZl+a0{dUwW=T#%<s*8X>~EqY
zz4T<M{2oS2tvSk{==_u6SP3f=Q%bPs;p&-V(@Mu`Hs^kd1k|$)fymC2PliTbVL%p}
zE-y}13e{d}T-A@+pDoanb36o~YGd?AlJ})4j29CZAK$%02V&Q~i>$rQj<=^h?^04e
zh11OrSZ_9wUt!V+?@$Y-$L#Fv@UH68T~q{M3At~y)}J6uAd_n(h*WF(z~+&-ImWNi
z<6q_dEPrU0b;f2!Fh?Gfka~i`mYqtrgzr%;=*!-IKNDK&=^HbqWH}2D@<tX?Q&=?1
zlO7Q`eMg{X<=V0*N#n`=`bcGyhz|>E{d^)b2V%vSst#M{L}l#`1Yf!9YuPlk$^{Hw
zij;m8&929qcPb^&^3uoGgu3cCa5l-%|J0rV?c5k54^E~?`ZDF<T=tblbC*8oN_Dlm
zF5tecJ{Vs<a^S3&Wa_whABXlBkA|=>5vgCUUUe!0xtbNr9XXF`O*0X%g}*(=G2#~H
z&V1zp&3g8Dz-00K$3RIiep`A-OZabWOJ}`+1Kmv5@p@W<yD6vO={U(a{U&KcO@{Ok
zM0x)}qrg6Wu|`qRF;ND5rQ9_24K4vYkF3&wGXy;6VP<9qUkdZ@wDw=WYgxSAZaxh%
z^xRy^I=g!Ii6w?o(hWCL)MGveXS+aYYl^_fs>?f0vqa|}Mzw>yuxWZqz9I$~78ZKV
zaPheCyy8mwkp88S_T(fuU@&r=n4DZr&Sl8@>`jTz@=RxJ<Wl^?i`ty61L>DCi(6Zv
zEk@1b<B-0SB+Fb_G$S{65Ko+xTv?XH^@@hw*%)rnG~C;VJIbd3*T#6IDB;y9=v>}^
zn~*TyyE0`sfFmk0asZ7z(Dq(v&cngf-<C+!qdo%IJdgF>nL}uaX3>|C6YRVJ7|m>I
zM$PIY!qOK$apuxo4R_S%Ffm#^l-l*G#8{~>@SCdIeqxWq7D(50S_qqab)LRj6w9zW
ze<Ujw5C%EnUbubraK3AgPjRbrh5q&&iyBVGrAYOyQI};`(Ya8VGWAi#3Z{|Jtqk{o
zkX*`a^+GMSk>&+(KacPuJlil$RdUJDR7?1*5E+3aS@?*Q|0Q8KDsSD3VIs2py%5&?
zn-a|VpYxu6I1?NxMTZX_E!*PdI$(!mT`?9lJB_XPhMwRzBFkAj4SwxLMn&x&9(pvy
z;S#a(Grey(OMnl_|KphdxmY^)s>j$Niim(x@@zQkRDUq-3IAk`D7BBU_l1WhCs)pY
zov@KGO;`6Wz|i`2EE`||Ep6XLQv9S+U+x*iaG0i~xOXo;-S?dktCFaFHt=sQ_nEO0
zoxN=_N0o=%9|feYTEBTBkWe&5IiIn5p?{gh$%%!5&$7Ur&zXMX`MJ7DnxOI0hrr!a
zYErl@4L$v8t`S6RU}EK!%q`%<z!7&(Hbt0Yb>S{>{qM8BheH=*jpE`<^aN|;$qxF+
z+H!v}gY+Hp>_AoC<RaDP_+@MwZ6&46$|dk9G$cnBtIOyMjN_#Oq`gwXm!SiUjAC?j
z_-5Lopv>BKAvQW)(W5AtTXS8;0(!L5KPNNN<13$uFoMkgydR;r4<DuzTHbtQ{+@3}
zH<l0K8c2~+okAKDZd^J?3PdBC&Zjqql0Q$udl%Vl4!h-<O?vT}sVAal=cx7xRcO@c
z!`q&!b_i1Ls|Cbeg)!GZ#z>m4^GYW&<<9WE94CPx8cS6N<Gp-+O(ByKpa1WpyYym8
zQ*Ja%*-h5P_@5oRzL*x|w;wvL$H$1H>Be|a>dm*(FLg2YSvSCtmYa!x=_|SN+airb
z!4h3H6_w_+>_F!ZzRHS1J&`mpX%*(EYBd-Sr5tC0fJ-~~lefiPAL8RG&DJZTlJ-X#
zq!4{!G3gQ3Q}4Uv?!lpfCJC<8V;iIJ`U_W?PE{A>j0f!eBc?As>6J4FVCn^YZe2#5
z;*ycR+l`gdoCVOpQkEPsmUzuogW|?*T}~YISv0cTlt<S6%TG1k)gG^#vT}cpDwfrq
z+bC%a;7-B;bT7v{R4M7A3<nG~?nv|t{1K#Yp_{@*cAM366d3lS)$pf|9<x&2LB1&o
zpky?`#GT`7_-rsRigo9Z!7gCo*gVmIpT^1SChK!*6*6y$v7jEXlDCB+Jd%?}+yMI%
zn%+DgI?#421pz%i+mS_Fyl=ShZ|&x9A@pr~uR1Bw{qqO58s}YK&d|<LvM*n%6<OTD
z!^hYDwFKLz^PJqfr^VjkhN4n#yLF(y3m}B(%m`i3JvPmKUAZS5y6uAnM<osySXl7J
zJFBT9KeR)-(e<xiy~6F)EH=??^%yTtEYd)3ZS_yE$|kQJ9i2qtMd9AT=NRM^eL_++
z+H;8raP8QJu5v1{sHg66lua2R(%wpOtRT3=)fcrYDObk^UPC>yk81W;0((JF=8uZV
z32c&dfH|zsbRUR5T+sZQn=996ICi$8C&9gDz|h*Jk1kNilzH|nc)FfdSrn2`qS`RU
z+`sH1LL(Jtx+U3cyyFp-UF=Cu+58#W%R%?ck!7M};D*4}s~eXZ85T#o1>9c~Q4K8)
zE$9^214Z3X8wY17PmQ$X!!rwvkCT@dBruxTUUj#1eZL#%ce!Mt0&O(2#4nuZv>~*Y
zY)#SG&Xgft_B%gy&)sS@kGI&Io*HZg&o3*^w|Ea62VVtaxtPF25CwhEX1=;=8U=Yq
zQPD_Q83NZ1(Dg!1NGVR`KUJvzq?!1=K2tV~Fl-!BD)JtDcq|ey3+FWz(OV(GVRumK
zyLG5zn{A<`)pY%>xYZRBfTn*x3`ym6pSlmbFdiiFIoMPJz#v5cNm|XqG;hB1)w_~X
zX=RxhQ9^-{t&rZb50720+T|LAYAW-_(&g!l{%9ABTPSpfC)f34bz3bR>~bMdwqISA
zdqzrh(}4weY$_}zUMV=)ik&nNtvjjvPW^J|h>L1uHa;t(^!NJ!oG1ectKHvzR$7fS
zJY+S=5PM=6_*zcx3G`@lIXA8#-s4fG&%wrC;HNngOB0gOqPMx)3Z$DJ(<1ul5S(F*
zn(~V~3Ymw75L?|W>6b5!`R{u>{r0~WlyHdXTI~eYi!>B%MvGex4L!NtrwBC9`uf<Z
z1nioRH4T;~R}Do`6WIf&88Dfc`O##9FUn|xjQ`Wtkw-(dhv_l0L{nmlv6Ygdc-9$A
zG9K|lq<Se4Eyy-y8G|X#$xb67(U4&{nmO5xZH%4piYFP_Mnr^cGnV1qnW9(E_s>1|
zoO{pxe&6@qZ~uKiJ#;)^AooI{?P<&pme(1o_Uym;QnXy!*vx^mnGbiJ0n#eK$~fBp
zEkhPPoORJ$KMc^VE`D{39{E$LBbDmwi(MsgQSKdOM`UlY-pJoEe{XT~f`rO-C&0em
zT#mFQ`%Y?mFa|K7&Ri#f0gRslz#nj?%n42WF<gI>tcbn%`RRIEwcW`v_Cw3y0}Y17
zuRuLC2bDwD`>6dZ5E0R}Sa-t70w_eznc3vgRCw7kJsqW&oeE{iUN!f&@@caz4<SJ|
z>aYegjg`Lvo;pX1RQC$9pmrt>D{*`>Yq5_Pr-@1miAMT~dy=cgD@tYg-=@u~>X<D=
zR-9<cqHWBgEV`?to~iBLSipBs27RQSf3%v>Obtgz*S12x!pk2pgu0b-K0F=Lw-M}L
z#O0AJ4MCv1{dY$@910y^IFb8m8*(5FcucvU<_ITt^Nm^-QdyCb&!w80hmBnNvNe~-
z0s&(te@LgkD|fTp@?F`O>W%_l@L<3#b)&Et=gAVj+Pd1?0CkJPqZ(2@$EuhJ1|V{j
z;%ijLgNGkSO!qU)p%_;4+dy@;P2_P2m`_<miF@CpCu!|rHreI$!1V@c8SI>kwY6Ah
zwvYL=^cy@7gcYD)YsES{C&0q+cv-()(x>o!0q3apS5XCblhx*b4`5b>Mn`8V(#0VD
zvV(1_iRnW7#xZuDj1l;Vm(C@25-EgtVAVmWrDfISpB(u#xZevyb^uQ$YtFyKqtAkr
zqB-7MB}JndR_d+$06DNHDaUHoJSF3C-U;u=+Q;cx3vT{dDbyi&0Fx5hV4lbr`C=4R
zc8Oj}<;5ZpV<%X#m}zV?iQQ`J)(rQFl<6-|ge|Sg2DeeKt&2@egt>qv&DT`4oiq+S
zHE=o{5z)~-2INym2LwQcjw6RZzazERz*p`DJ$pxhnjwf^(rA<y4ub|&{lR`QGl7ju
zeS1pg=U6>0t-w6YMfWny5EN-_QZTz}thZEu>0$*CMGQ}(g-=4#-IDw-*Ed*`=ub;E
zs^igr3FlN6iDkeC8uM7<4f2aebEM-`(v_>5w9eh|mIQJGW(&e+Q~5x%s%^Ez8n!OE
zELh_~?tDI|dkC4Ck(}q-R51&|<;eh^;B1m_+^~B`x(>mB=(<P*j9*XwIDVQ?yWaag
zMn3f=Q;z7`&nqfGYr$VFkNT=Dk1_DB5*^T<OkG9y3g@l_2?WNu*Y4>AT#Sc~<W7Dk
zdm&zv3kkP9Vfn(9(Gbh5@`jir%+OLW-saO%??G(J;<9ArdLvR@J+`ylR+wJoYHaq|
za3EEuWg7v!!6VJ1`B`JmOIbldvJr4{+m>d)32VB;f+PBAz*M-Op;?h5u*OK$62KR(
z8J4>jcH#VHjdL>&z~$lNh<kT~DZ90(J5f^lr_Ujem)uuUp2CMFsSI|x{PGWkhhIS{
z1BWxA+NHa3HpK!VnaMQ?{Kz|)mae1!e^dfg!)ujkaZrr7Sv&mi^;7HXF2SEM=H4|g
z+wgMsxc>sCx`b^HsYID>J79J6kq~E#>aWl$zi+!ra)Ob;a|>=afnLq97j`m3W9vo2
zl!)SvZ%2ioo^UX`q6TCj(J+3>PS-Z~DIiQHh>M1`vbSC!gA}W~K6yD<fjO&a)@-ol
z$-(ZF0OO(zH}t`jxd~lv&U=#sU<&LuKc`6m=N3Qm(sK)Eg0vDv;MI`=Zt`DDeRdVW
z{KGbPqW4eEhdvf~k$G}{&;y5>(cr&N=k`5(q`%;B9f>6C#v+2ZxMGT4O7fiB&m$sQ
zB*aCm*!UaadLUtTqxjYz)M`b9&clO5-|XP_DD+5tqu7<l(jHj03Ltsk;o_@!_y(-w
z9(Unn6nV#~YY4367`LR86w!Xur<%n!39((+s^nM)?_X!lGwBFVPZBR@z${8-8S9%-
Uw-FSi-^T;M8Dk41`LsjGzk<AC{{R30

literal 0
HcmV?d00001

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/score\350\256\241\347\256\227.png" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/score\350\256\241\347\256\227.png"
new file mode 100644
index 0000000000000000000000000000000000000000..bbae9015f24a0d9a27bb5cf037f173ee52422845
GIT binary patch
literal 77871
zcmZsDbzGEP*R>!>2uPRI&<zSG-3@{?NGb@@A)V44(kLMyUDDm%QX<_Fl0)}*2EE_s
zexLXIiyzF)HK+F8Yp=Bq!7ml0&{2s|@7=qHE+Z|jeD5AY*S&jiwaD<`FI!Y4^!M&z
z-jfj*QFVsfp4%)~n}`$gycTc@iyW;&Kw>8>4N{RV`~X+^#y<*k$1a26Wiqy({1Ai_
zBa4<)g!e52>gc<QuxJYPO39!_zqfhxlDN#F23HrkoJzO%yh^N#96g)_O={Lt-S%qN
zXX``O96f&Mxw-CDPR{=>ALco|y)3-JgZCGPyhO(RuOEhx{o5UtTjK}bTkrlqFOT^P
z`^S!PkxzV%^AQfu#H2?+^%ni|&ks#JWL4=}#_ZY$THwLHSh>mnc@6f?1FYOAJNNB;
zqd!k>F~Z9I@t@BdY+-_zx3a(Ts{S*q0YP{!pTEzT@Bv?4@HX2#`QHOS68mCw_d7GP
zH!1@Gf<7MW{~8iK0<FJrr3$h<6+E7@!b+iP+JC<A@3;-(u{0$-N}}xWUO1vx{xg?<
z&8$<x;6$vivoq6->A#+R_bfsp@Xb;I`M3lT!V~{@MVTIgmo<L4|9FS$-6{HIi{ZbX
zAu0Stv=2*|G45`j*+kx3+W%ho{y;G8{y>jP9oV9@*in3QJKY64t58Q0KJ56pr1^$-
zL-1<9{UA_@=HWkkfd~5#la;Kw@ZH*=2^-ASIb816?_o)ilsfKAOOd%4eNYd+q`3FI
z!g!GQd*ksBZ4XXaPJDPh%b?LL!#KTVQ8kahyYvohDi5(jmfGC_wyE}8?mtSlX+wDu
zQQ~#|yH%AJ6@IVj(keg2u8*OYfa4kbv5%p}M!o?#f-Gz!`$J$8K=tC%yZbx>)>#`t
zQ&f@1dB46wQz8!*_u_~9qI8B(=R$i6ge}DGpWoG5MGdO3a_39L5g5Pf1ed=uWCUxL
zBw=v6JGY+verVJ~dv7g4KOnp&7oANeB!DW4+)?2AXe`iNeK!cX0-u|PjhnUk3JO-q
zs^cc%)!jUp$h|*GX+qX|UdFw-`-5EA2WRDPlb&66wvPu^puu#hH8or%XVZGlOa6o_
zrQKwY<+C128KCOh6zbMHUY_hcB!Bg1dt35F!IrJ1?QV(OJqgo>qxN<Cg8>{x{t>;!
zC}#Y(I`*@+-HkUFzYpG4QAP4U6&tjWE`I#k_mQCUg2!pxa!LIzue;P(CZ8IV;^UTO
z^;~?M0a}P3KW@d^NB3VF!(KP<kET36MYRiuA%B>M_v2To4MyQw$&%#3V}}dXu|s&x
zo01Q&t%V!K`y!-_DFBVVtBv1B>%{)>*PuzVZ5JYe&bPKNdX5GK_rU@9whW8APDNXE
zn<rSgb*SE1VvARY!%n7$M`MbQBI!sw5D7D55~(k*whBsUy|(gl1kYDv$aF)jW^1rw
zj=qaMD$*+dHN>|!TVXn8lIB6%x!`^{h(v1t^0J@v4V?zBcD0qq$&^Xnoa^tRi`&WO
z8?N8BtU}-!C3Rc*<->w*!>Ix@RhDD%Qex66TCeq7H?uajw5=N}J+EEwA0W5r-(E~6
zQkPcG3SDn1dI$5FtYqP{Jyd>iPllC%Isd7Y{Zj&}4}nF6-DGZPRh&zOL4|l?W9L3A
z-biF0zD+>3yK(<!voKhW$o!(p3uevkhCHAiIrkQYQT*-hqrlUv!RHGk#ov1cHowLw
zQ;wtDJn}jd>tdu9C0HVV5PUi$aH<nFq+{1hlV}*hBN@W0Pt~vz!H2BWkwl>93=zEC
z6W`S-HXa#|)6nDLS5UWsM_3&Zv;D;SAXB5|{Bq$oLTEXN3WS)TFE*;Hc$>9p)J5mK
z^Unvg`qx_;0+cVMeJPkvy803OmjcN#D3^ExF!QADMb?5tnn89Ii>c|FXT5+{uA#!5
z(g)_&9+oILwu7hWSef`Y|1H77kR+{-1(rVyda*D|Yd+ZF&>Vd#f6|x!e(G%kiiOVW
z*P@gHD`6bcJrWFLBh^jEEDY0)Zq>d|ALx2rA7d483%iM@d0yn<&N;7yE8{{ae_fw0
zBxF2eg(j}bqL(nfkiy+Mn0MO?|0OPU;XbMBq-)cPxa#`&#Yzlqd&9KOXvAnn6x;Z$
z{DZ?0De~G*>`NjG={I4{kRz7rS)V77JS`37*VB_tv;|^L3m(snkge-BX}Jtv{<xnj
z_DEn{?OTsJGUG4YRTWJ=*PY5q$!+2FEo}xWA2Ql^X8JaXg1W~cfg_R{Om}nqf=L(w
zwzK^pZ}x5p;z_(iS?i=9S1)+h+bdubjbc?>%s24Bf9C%bBXv*aCX@WShw7qFqlr@D
zeB{OJ95D_VIfzni3}<bSDh=V$0LQTOh-pp=?TfnQ5GEf2&N`2f4SH(jgJ|5^DV)$c
z0tb07aEKi~+h|D};O|&Xl}Q|h9<fj>!#xgu+=KfoZC4(lxanfcPXY@;p1~DF31vgr
z6t&0`E=K;{nic*B-qS}*m54-6(Pv)^yZzT8fh5p={Tt<YjON23K1&+Ee$j4HCmrkh
zfov*!7jt{kCG_XBQ5y`{HX})G1SEBg0nNWYnIEY<3jJH50;pa>)@<AvtM5uumPE^o
zpqElcl%s@MS;*&4Z|w6XQs+2O`=%oVFE(;d!k^D0XEzFlaC1<S)RMdJd*iMlrd;?V
zQ7=<gzfb(iG0cD1ibQS?r5#}!QA%x&3(-TN>NG80Om$vHak8CalAu)_$uci#d{(S{
z{ET`e=ED#nThrNcl>7Oi&|K~N^`WR_haXz4Mpy1v2i<|Iji>VuX@%&nX5SA>T&Fxj
zQ}GQT>P9ZmdD+cXA{9IKm3e~`-HUX{AZ(~Y>C+4Qfo*-y^R-qJvxXhaw*A%U(^nGG
z@|g7*9Oq!>TgTp0uUUB6>s1BV?hf};*ik1+F8EUtFDMrGcWA`}LWwt%SEF{2SrtM#
zo?nylIZDSoMU@REWH_sN(!O5JRBgSm`(54+?4g{}%^TTy%U|mnZYz`*pkzeMaL{=+
z<deCqVxjdZhDf%^_@MoGn+;i{Q-tRa-x^l*gyNvNUhY(9$)K2a#EZVO_1BrhKQxra
z9tb4&OhoXwoLPv^IH7wjka2~4@Vxo_MI=j}mTIl#%(H8V(;ySN>sp+K4-cqhJ|9CE
zXTHT4)zTZ9Ce-;}U_LW^_nI&r|8%`oj>9f>nt%jklXQs1Tpbs)R2tZ78i_BAYzW50
z03?{WUgRIbGS>`p$C7qz;=#cF0kWwd<&KHeEi|Fo2rD7{O`N-Zw?FjBGG!#=$BZ)%
z(skn;lSumZ#Pu|<`4vz|PCzCi^UXz}drCVCLU1kG{E3j!?<e$B+H}w$h<^!Pnme>u
zx4ta+d$Q(5FI`%T%UJdcUXt)GLSH0Z_9U=EA6reFld(m9r8vzqX!!HkB9S>$pkS}6
zU|PrtWPo(BurCsbM7+V?!(|<CO6Cilj<JjqR%CbCtz9Q`{P~(JAw|#qK*9r4S1tzn
z0wEZq+euc-^b?5-!dBv!O|RBg1s&Bb%O>I$g1emkZCLOU7r7B^_*AHj%-zXPM1U8K
z;C5e6!0vv>#nv553O&7^{*3=IJ(~>_bua<tFF$XlqOb6XPK_zk>i3$o5ekgO%K(m9
zq7Ru~4Iwv@5Km`>#7nJmlU;$AnXX<=Tpa%vy4c8+BKJ_m_wS848`NZ-x_n$U;yRDy
zYO!Y^5iDxkw@NP=i1ny_FbXlS*hnIMCvq?uM7e#`M>4w}NaB^vQ<RY3HHmEAG>B?>
zI4jdy#Icj98oCa6vtN*0P%VeW-fJm$WzK~aACO)gmw1sbIu{w7a7-FP6T_Y*#Yl1b
zvvDu0g}%jpP!b~nHW>yKJ+Km|qspwU<uvc0QJ=`-M=3JfGklI9$1bwCL#z0m>kn^@
z8_$1z7HtCMQtd0<Q{2NE+Pafz^M&(uat|Jmcd9I=L{(bRA3v+H!GRMzTl6JE!E-wH
zliI?f+-mqHWgxHhF2AObVkur&njuozJ%<IlmW5wu7%AZR(nQbWSZ#!=Fbv)*Kpsby
z&uJF{Lr}WMB+S_?mfUp%l5Q%EJ5^=**-fsNlD_h7jdZO<oR4zs<|0;?VVgo(Lx+X^
zqV~dtx|{jMK{q)LC7j9Z0&32RU<@XaM5x;1bOy?oX+s1g7bF~K^$XWS^dWx}+XDzl
zY!pVag1qoquts5pV!uHDTDCM)==KTWm3yyk1IuVFvROF-KV=J3NyF&cXD04d&+Y8b
zR(9{KCic^1L|)?7zJWw4CBtK%<OceYyr!={peB8bC|V{8rzk2&GcL(hH+~7H+D0LU
zBhV&m#*BPFtH)BsFxI>$R*Ve$<+}EL?9M#Ir5CdFCbCKVc>Kmcpikd4*EnB+>S4E$
z9%yU=ZPLh<r$^&ki&bfO(}Lu%J8ZIn*zK~OVuHMEq|uj{@wX~B!A>qgJT^W&6P`Em
zWCaDq*WfXio1hIs0dqE=Stkddx+jb}0utg}JeuEY`3Jh+Jb*0z;4OnC?dy{nwln#p
zico!nn{5*zs<2CWw5{2+hdqe%Z%o-}DedqbLfhT+dHrZfy**YH&JY8C{-h~~@I%iP
z6_@_061=bhz<njV!~P(24wUL}6u4j8uyZ&)pAP+UGV|emg3d89;v69@W-}s9$D4|s
z%ZG63)64xC8v3BCI9>O;y<9-?4VX;}Y~##%{LJEeBP^y~{cXa&6M)Qg*{0m~rv~H0
zRkITJG|zjJ`$C^9#11x72eW;`@jk&|@zOFGQuf7Dmr%H>iaMo88L6{w{#{)EHDmBh
z=JQpO&Zk|LnX?$9Dtb@$-i?ml@`=MO%eyGR5d&yCSVR~7E+OnUBU{{FJrYf|?IOBm
z`gD8ixHrF(_<4=j>nmif^N8|Z-3i7HfM}Z&L`u-vc`qu);kxM<ML|<!BJb)SkG=Gc
zBlvcMB69=}2RJx=1Kbt4aA{MAN?KJEcqCK{2ihMAF|)wh2{4@v3Gl^w^0LZz(KkxD
zf3^~7-ooPmseW$Vd|f3QP2wKIX3P!G)_9~`!2ZA<zSraQM!Vi|C(`R`rK&2fZ=F>4
zCM(JDtEfGe@lEvip%gxjQ&>PiIr=(s`l_wwcr&ll)~qxZz~&e{gLSvhe&xBsjyw~J
z_u064vPyT*1rneH{%`?glZV4XX>Z+xLKC&XO!vDUTYr5?5qq@}MsMFA&yv1y&NGHt
z+`#Geka5A6#Ovx{hus=<ZqA_ldo&YiRPy2SuqM&tbk2#RIO3!FEATbw@SVkGuqJ9^
zOy3t)TYIS%RkviN>TMM|!=!eq;iQ5U_Ue8YgC1pcVj7f>Y>D-oezxY>6@V3(LyeAK
zKGdaSCBE-BG@KNn)Gm+vy82>DauJEd=AO*=w+XuO>Bg55vb7x7GyE5k&^OOk%-a40
zhB3Z(=bp2FY-ptE#GmAbi2zltiz?aQnQyFbL^gYL>hT$00M-c<Ypv=zOsFmQI{f}I
z2Lg<&v$~AZV83R$_owzo(?Pg(*z?uXc{euH;tx>_51#Ueb2nV${(c=uiZ6k|yYc0*
zRSm>>^C4RU1v)PP;L9^{2Z?#rguZ$<9i&aBZz{y}T)t_d>IL~>+<=L;3!TospF$;d
zRLEuFujQoXU9=VNdslm%zKGQy|EEzbfCC3U!6Lq3`R>GDK%#O+!Qsi$znOiSvjY%P
z)r8yega(SXOn<15)9&o~w0X&z6wa{F?bY8_hoM)vnx%&FJSWvPpLFu(a&JK}VyGF~
z#}q4}LS*f%lShpmdWyAk?~%aKcW2O3sXTsub937Ks8*GkZYOGgSUW8D+bLo<Xbt+`
zAbRdFMMk(h<eMzi5)&Cq^EgSu^*S7U{y9QNJp@*t4q`;7Gcu=RS33=JtbNH=T%Rp(
ziSW|=0a6o`Ak*fjTo60^3YYq~AYcWlY>GBrJNG~W?L8SfGO5nKh>&=mEs4s09ELSv
z-+6A%R!SP6vaf|bhj&&Yg(3>$QcxL-n{O^0!*=3cG3~7Zl;t_Zvt9!grOXOalRWRS
zwN@*qft!{f(8cvQfl?CYExzCmWUXDrjuf%Fot9fn&~ueJ=m{C8GT-)SO;6RqeR<rM
z87Dp^Jtv0Bcc2#}KIXDiLgDf~f|QTn)tYjMx^(xM#m~hbvEqaA_m?(|dS5zbvDJ0`
z)8bXY@?D0KJ4hY!0`MI3BI~9(%D$RC)^*&Nr_qLWW`Qv(T!?hrX;-Usgt2fF8ck5V
zNT$zzl$59h>O1Un(Yj7^4n<$t_TE@tUmV|a2XN6lt#w$V`C_8!f=PxRyPtlfKlF^%
zY5}g#?)s;3siUnf6VZ{6Z-f(xvi(6Pf!mrl%2V?s3A>f>=dwyy=<)-Xd(CVTtdJ}b
zlT?@NetE>q`>vul9XN_n2EXJ$%W8q^^1G<oC$HiUwL}4$@u%tZ*>}IuNKxEEyAzs@
zvS_Tg(;j0EN^-9GrQ~ci*z*Foa;xR>BsIG*qCpJ@r^<fp#dGw%Jn7A$#sxl)?b2>l
zQ68$^hF>%~d8?KZ4>cZdI|MKnOrUvA7VDB$I^(7x#jtVfm_|FVrIQT=rrp{xhaMAl
zgspgLe)(*C1-c>eUC<5T%YQabJ}19DAlLe;M(97Rc>c@jofLx_tb=}uT!;-z9yB<*
z?aUx~<Owgbd7aB08A7XlZ1pBDPMb~z80?H94X=qzl9zUazUC6Z(a+iSv#eUtQl)mR
zO9iQja?fy1&4;3`AR^6#vhN*F>PLolpiww&|HMkU7;<XD2|EI$NK{7H){CeaY3zg~
zkHItj+hhH;3>}ZNbLAW4v9#N(_49a^>$_l7=Lala@fkxg96sQ}a3N9YJwKaKIZh0q
zv~+vgyPzrI?r>_-<95VhDqB=h6u_fjJ4Hd~5VLh3Aw*t%7;%L%b>*GgpbG0@GJk=p
z!9!@eCsjEX#(|#|Pp#az=<{&d7BXVc(YQrqzsYEBr^<3yl-S>bQ^)fbM|q9`0Y9MS
z{jh-3`C38=;5`<8Edb`o8AQaa!=&!Rf_hmtcz9NmjQHLKD+%1M$IlnsRr<(^>$VsN
zz(l#vhu%4<(+h1RHp3alb)suu8TA*U8RcIjv2jCV+Rx>Bf)7Cz$sBIuv9bh|6WQV3
z8vpIpA(?gkPSut7lrILa^u6B8HAW5<`0;+$`T<1RH5eeqHzBP4M14p}kbh*j^<+#q
zpuO?^$pbDEuP<<2XXQ*G^v-h*lX<V5TMs~&|K4Hk$GeB#6W*Y?E-*|rk1e?8{YdEQ
zyO!C`bfwe0OHPam-*SS!r_1jjMUy)3UobIyYGL;DC!tjDE)q(;?zW0w6lR$F_PQOU
zJHEcx_HB(R`M;oMOhOllLXS937AlhS;aKHNpomf#pQ7Dcu@c(rx^)n}Z=r0rKEpyc
zwjg`O02XXck^caJhf=HDcyP(>7A1Za8aQwXFkat7`O>U_3(qxWbgJkU2k}7~wL|p{
zKOBX{`O0YW`(39rAB5X?qjDLMpL}Skvc_sVkY&(RMINArpV&mC0QnBSr69w<Wt;-@
zz(_K>dfv6*Yzd5Xrrad*`O(N=i~=u1qHF?|H}^0PvO>)xqCS9vC&)dHux6roWq%!b
zv|SUG@MtJ6BWyZ%b~8CFdLv1;f@V|mohYuU^w;8{hQ3yLo=xhFS?(8NLWhWf{)-Qk
z6A(;GIhSdp5K!hC;aZ(fAveK7myo%#w{m?iLWpN8RH#-jo>|X_onpU$@g7|(!=5J`
zr8vR(x`B*7X>4m*M>%PpybEQa1;g+!HnI}*h`$VK)zVP8qH%3#-IO$*yt*gq3>vHx
zuCN2yyl=T6cc??)V=x#rARSz*q6YBDvu=Bh5jrDLgrLln9A(dE#>C*n#o!Azm;153
z!beE{nyGt;J<u_FmAtg-h5zo2p-x5sbTaHmiM|&o1|;tLt$<c>tB~+{W;II~85a?B
zZ;v`(<a|WkJ3}f~BO)KW(d|HFV=j=AS|mZoPNvV0VdSf1&$0gyXeOm+6}vK*AtM=8
zqTCCdLhnC&l^Z#ugA-Jn`e^c_Tp+EnI23;+H#4G-An1Mkvg0z<E~muy_x5`t9T6y7
zl}iV*5|(kWfMyg2fJ}{hGZUls$Qaohz)(Inw|SR!8gltr#a~YGqMDnlSUuzg$?jub
z7h`D|BfYoGFDv-!^K3fbCfvAj`5r`cO%xM6r`~m`vYe5G)30#F>?xVVASx3+0Q99j
zTo6&QWT5eaTbqJNLQL3B2ve~$NTes*Q)9iPM`yqFZ+Xk7zbmORcmWWli@Q!`+@Oq#
z4>CZ@7Hnh_JbPSQaRDFhi(qVm+#w@K$ad5C%?Tr-TVbnX#bRmq?Ly-%JBS((tTVi3
z2S1erp*>WM`lo_?b{sSijX1`M_`o=IRyj*8YNBphyQ~auuA+@cE?Wb<F=<~u4+yuU
z;xEhBMn@&1ePFFGq)0GGe5Yg}v%LHpWNS2;EG9B<m%07-`3LhN9UFS3Y}_PNLJZ~r
z8Tww8&wd4*zMsZysUgx@P3mAF1kLGF?Ya3d#MYE3tm6F+j~>G@tc3o1CSffgo%dq}
zt~b`a8w@r913yn82`iWIe92f~cd@C4pX=Q0*@i;b^Ww!sbW}meOG}}%)<^6^YDLLk
zPrJf~l%z<r<YXmm&0dX=i_m&zGh}#(^(cN{q7w$BNzECZG%n}GYkPQ)0$NmWEfI?A
ziEeT)e%Z7}r=VP@&qLOC0!FbkJjdfIE<C&kug9Yv$>im|*ZZx(^r?rpGo#lj@rsBM
z`X{(?;S2C#D$36IWQztaJZMsS4jQhY{qyd_E;ws6N7I$&66UC6_TNPSolqD>_1kno
zPsOIq=3<k+88pkNeGW#wLQ{09szuL_+)lQ!g!?2=qxm#oqJiv3gi0@~2=N3E3MJ1P
z$~YZ1e`s4D2ZhPeB+}4KrteCu%gKh17WiLf9V>0IrASD9VPr;d#c~FWkE_224W%#^
zRa%!9G$YTK4w4{8w-!2dm3DjR6)-;K+1fPsw2O)BYkD87<G_mq375$;#QZ}6yf^vv
zhSU!C9A@3vdoxpimS(q8tJfy7;;siL7Yn!73t}63Oo`IE9K6>Ew?7Rd{h_wqq;&RZ
z>CSAQW<YP!k%b|nmzCLr=K?5tC8m%sn5e`m-W}Ip`(@Cgj{)@XGxk8lrGW7a>~!#B
zc`Zb)V%=ULkaii_18O-^MR=p5Nlka|>n|YmretB=i31ENZY7@*AIpjfhdF^HRh4Nc
zXk%u(H;EcUpve2ZBh$PEd7l3lUA?mC2I>$O)dQaUrVnwBve-Nr2scC7n&{QoAJ0{q
z6QGR#=p@i*ei%6`kdeaYv>a1hvmCMt27MCtOzO(Eleu2_7syfL)#uU54=t)g2*9C~
zl>lj$Z2wc(W3)8p&X`%7c626(p7#+2xS&xp@Q3jQsGq+R4{b>-f(kJW@+zNa4*sd9
z)$J?DJl+}z#$XiQ--YEqA5jUYKGm&8%ke*`6r#IK*>k7!YcyTIlz7!E91Ha)(4!|6
z&-ku-BB}peRD~vEVNus1o1eOXJ!Oy1E=R}fJDO|gNcB2kb>b*Tn`&iycjP>dO*DQw
zg+X+xt#lV`JEQgM9W57TA><5u1<jQHaEr=c^`Zn0I&j|f?Ds13qns;qYB=D-hlUDI
zs3ZSJj%cx<pulhkt4g9iy{iXE^_>JGBLQ&sM=1`p*ExzI!$$xc4I3!FoILmFKOrE+
z;M@Mj?C%1jBt>2kc{*RLklzx18V!H^eCr#>E`S;)%n|y28=$#JfF&rf&+n0KJ^(;6
zXtg2(wIwFp^$4>H(=ZDQ$d?8{F3hLp=@!st!xlBh{9Bf(Y=A!Ha?XO?pYLyEi6(Os
zD0C%4e006%C6p*)>YmJHrzmuNWbkw_vMSo%Gxg?tJ&g{{%nN~%I)(GrjI#Fggct@-
zheIEbR0<ENDV5Xh=swZt`{O$GFwv57`f6aDogEapc@@Y^6ToFXQ2VLj<>>DlAY*lY
zN}3=+{oK>PK;w1KwO0vLHm=2Fv&Xy)eK0b2NhZFyw$SB>JRw3zp9Ha+3GeNl5HJx8
z;^r{rbd(Jtn-q?%xe?yl?Hq^{m65Q<T#)Sjz?0t7x6_g3p2r3HmJWa~NFhjEO;Itu
z%nWM3P5tajX|F01jq^=qGM+^nAx5EBbZy|3@~^5HtH&#19<nT^CHYW*E)#VT#3Mlt
zbOXq)aeIX$l6x5$wx5skSC;r#ZUCh>h*3eVK~$out|kw|#-cE6>>tzQ8qnY>HBE-$
zs`PH98kSKTQHxe5H1r3%6^wrBd7|Dbvm~DFs_&-=;bXVS3jvC5RU+CJi89O`FioIQ
ziVjAMnF@S^>J4p>u>6M$00YuwMi95nkb1Hz0;INKfm55INWlyGR6m{FH(cY@sS$x*
zpsjetRdNVD_`Gms3|)!qTkeEu#Q&60uGKi-5B*Ox%z%(Hw+}%-=OZcR3`{r{6~Kd@
z8i+vw>BdNCx?YmV51L{>5wpbdIFA#Qxi!TSlKqeU$o5ECvMgSc1xT>d<$-*)Owf!e
z5CT~#q2K&6Y2=J;S4%A?<z7-SQg{o_fjrrm=bN>PF$F3V;u6)^B3QShP6vbQLoOvs
zC71}ysHNAGZ_oy=_1KlygNLM)=MbWiU>a$wI3w@zGXI(M4EdbtZE80d>@m>M0wS3D
z6mZxY_C<|k9E5rOrpp@lwi~5{K&;JR<;nKQ(kBnihZH{&JTt%+`|?3Kx|v9>rh${X
zN`F3Lu>4q=4MCj?{=Fv;|HzX2Zo{F%NG72&b&?%FO05nrgFfo79HDDd^%>8PsRV<H
zg*uK?Z`*oQ+ru<uTVV`_W(olOwY=+%l8vI6K?b7*R)N^*2}&rv;`(F<Js;On-F90m
z#&4*HvtNA1{aEjD>9{ln76H~*xXHsZkt%lN?RnZQQBe>bIHjYQA%tE)LwJy8+EI7#
z^ZbD^GGE^P7t&!8YpoBR{QbA++<zsRh|+VlRGLp>F9VKV28n8=aVx)I+1UuWYgCqG
zTAE(*QKkOtPpNupit3c`T-g$0Up#W2$!0ZRuD5%Ikab3-P*O(<oDf1tKPCFV{B6f~
z45V_NhIhz312*6I&taSu<`ztI!gSC6pQB<1<K8Bot#HkQqrxGApcJtB(yjwbswR_f
zHA)O<qO(ykM*@8?S=6QNXPrX<C2FyL^9?|#uKTS>Fm(@)*2SUG<?xp?Zf9PMT~`b%
ze;QDfc+;a`Gv)4^NWLHbNlbt3ld$=ZLM(c0Agj?_E*>;p?v4!uDH%V~s^Opuoa=W$
zBA(Q9E0VJB#M7{+_EVAgWtD%Sk^*x}@L+9R4Ncj2SU1(Y9eJfsNAgyYXAQR)O)=(1
z@X}zNmJO~^3I|%5&tI!KVqT-3$j_D{C`7{S)t{!)vhlj)Q397910swsHNb;aImYB8
zVkoKz5Nj}ONU)@Wf0O9gsYW0JP!R=IpOvjmW&Y_9Drg*-1~GXpbGuP8-^&N<v;j3`
zQsT{)AwTNY_?R0$7Mq0p<4(TjBg{61{Akc1Y(%Gu$72yC7(jt?f^|QTT{(P9d7hBl
zfG0c;jw0P=^`le(HCYyrAS&#qOmXSs8)GJ5EF=&)*8}@57!Q?_^WRxU;Y28ppPB9A
z;zTdyIa%t#1AMKHHZ_9FRIF7V&{pU0n^$Xg1@wqq9eKw1slIE@NXIq}%38OwCT8v+
zQ1=zV!97>B85e`Fiy{BqOA*y!L@m|nTVv}EqDE9L3kZ{#0-SoZajf0+%V8iz3+4tf
zpZeuR9&F&b>3j4kS$>_~D)@n=JVqlBtSo7y<0Au&Fy{Xy_4;;2=!S~=e1r|i0?TEp
z_X_18TJxOU@GU5(t5q&Hwy{em97{tc@0tBvq*b26brE^!ac*S4gV!Y;^_xJb6ESy`
zZk;b|suKTMAVb-(DKSj*xiBg7s=*b^{sFq53Mp<;r9>guZbcm0(u?I}qvFdSUm#jq
zBtjjn#^`?!tjY&rhu7G!UK}O1ARHE__6U?Zsyj(0sSFYj+eV^p?&^eH-QKw=*&M%)
zYS-l-8oV-n&Tbu&Qs)o0gVY8qrU?&TCoafGx;n4>A5j%E>{X?@Lwl>9^&d`|r0K@?
zR24Zsz1p3t&rE0YJP2)7g|Vn$Suv5Bm8FT|)b4(55y!Nt3eDXP;}{BhKzhA0Rt35@
zS83?5vFFO;1>{~%7#26O7wdRDU^E(3Tv_?HKfWeAXfYn4c9I6c6cvo6(1$4%x_6cW
zY&84cqktLs8Sx1Lj@{2R>Yzq>Z2mchi|-`H`Ni)NYx08xUB{`gt{C1lb4OXh<1cc5
z)FXKmn4zgxT>jhR2~`ODC33^jsN9pFE3R7tefKBSE<Zg0KP6grdvjUO(ksM@-L@us
z$o{Fc&&ZXMWb}I_h_zmt1cRrv-I>S~$2-fRY$rim=4ryh2cgWnKdP+eP|2gOI=BSy
zOA|PeDXk;L$f%=~<_<~2h=^}>>Nh*qUiBWyEQ0g>tL6z)TkgkIfA}mIrB35t4J1wc
z0O2^Y55i=>51pQ-dR}aP58ps0Pl!(hF$S_{Eg;62J*PJ1^e>c;mH>c5F@`sFcX*+5
z6q9yE{mBggAKze3wg`BEz+YF>CiI2VN5B<RUaAk8bu7hYz`HV__dO>jqiRUT<`_$(
z$vPu(m>)c;S`*!i*S3N<u`s-lm6Pi&WS7L%9ehOGVVDJW5vI8T79Jb<4j3CEcTCbc
zT&%_@#%IK?CQUdbf0zD&LYKV1T|S%&Qs)`X*OB~)#PD7H8z{l1k@Okk;uIijAA|s^
z*ZmoK+Ny7!*!x8a0pN!caa!^xSpHZ;S)9L5zwmC+7Xwhc2I`K){X5^<G5TLSHgTMT
z8r{W5`fwjstG9%oo+lYX4H0+T{E{9^IMn=EJeaMB2EHmEW`Cd&HvmLl+Rk;tlTldC
z(Ad|ias{Z(|C4!E7zc+_ZQ_-%>Fl9=_G7?08ei2JK05`dnmTPkkOmHykYPpRX*hpX
zAek#Nv8geUhB)H6WIGktcR0-EGE-oqYLhS|!4*=e0n#D^cql0-4&+ZnIIi;+K@sDS
z9JwTetioDyU>^DeGY@eMdxkfHLDGhQSnQYxSD)!;jl{`5Q80~jSAVaB=?SJ^Mt1n_
z*`e(*`}qsw0}f2P#bO&@So8H{m%6b(=u$rN-2i!?vK{IMj0Dsp@i-d!pXiDL5uNlB
zR|OVpE!3Dn+O076NaSM7yGm24rdLTmFw*AyO#l_9ei2$qV)vqk8nKW&URr{c5EZFL
zO`$n}ZTF@cT*}#*cR2(yk}$d)cds#uFnOER%*7>i5w>b!V1x+<Mt&rf*upxFrw|wf
zmgu)dc;p#v>glJL3G*|Neq3UwTtg(iPL{gL&!}<c2Vf)o0{GoxtL#BIx!NoB7DJs9
zWsC&~LnHADr%#cMzrD+Xyj%gaARcH?3GjyU3-9)_7L$9YVoWsa8uzU<ZOpHd6|>@q
z9$~_dyiQj7m(!Dy%s-W8fHS9n-&9JFDH2Pkb8mNbVSD8KbiwO39c&x4m##4<XaV1f
zg~f_1Z;&#nWWzCx_YBsn0jP=zU;bU*Ken7*ZlW8O1x1i+8f^^?g?Do31r)-T;~75j
zf!@M5&t3sxn(w*N7-4lRSfUxQs)IM6SLXBTemnFdM;zi?aFw37{qwPPgYEV4q}#v(
ze#Bx7!eB+#+DzI-Y)u~XAEgW?0p1eCT<G6L5RRBZdjR^hT^dyido781^StB0lWzH%
z`{gsq+I`zs&gq@^AMsP9hRGTqh5BRZycJLJ!tMS7AlMMBgj%lBB<L)E%yQ{~eNX2M
zx!9o~eSnJT-J|`5Ljl%jA?_T006)<$YVZTAC#`9KFD*Dc_xjlvNm$u89id0AsIFhR
zwryi`UwR0LyN^`}@f^g-4HsJl7YnC{qX+rRCo|SLA9yFfaafujoaF4~hF8ESGnnWn
z+E)f!K}}JF{H-SBWS3t_yDmo8!?#zF_R&W`a5H)j{Bx4fPBf`SAQf8d%}&qME+Nk7
z#EH+YH5J4SW0KoCc>0$e9%xMN5%CHFZkDGwsj6aUrn+1BW@q7sCGpP|(Ifn~FblJx
z7%vaA7|=>$VT->{5fUJ8lf9(;X4oB}K97<$T?lc%>DmkPEVTjgiG^(<zB6A*x(7gI
zawJTVfr7b23f<Ir*JzTh%#`2JQXo2sVX7k6fL(Ygm|!-B_gPa?XvxWASGhU4YNH*t
z?A3K^q)NG4Abs&U?={yHjLe*cCjq(74w^PiIzEijGsE}XDJ<os*-IpNVa=WT(}k|^
z$71id^+Us#FF_(KV?WQ^YzP@gFOA2A+52{KTvpmMRyP1C5$8zXGG(rVboz=wE|ZPa
z${N*`ubg1cW66ESvf96?Ovc`5<YSf+G&Xl(#l{u%I`mgp>q39q)tAUeuu{?VE%w{p
z&H#H;xKwf{IS`|=Ec5cN^Kw$nQctG5Zm(y$G4}R%kay>MJ8|oPBQLpGw?rbBg79Y7
zsd<K>%!hpDausVu#)i~kT<IP^V02Emrw)N82ggG3^s;q8{i0jwrnI^}gG;o_9i*u@
zJ{=aV;;`X7h}lg3;V>dW0Qar3n*8>xTZ=ytXrJyt7Gj+TLfaJJlH_x}Ce$=ahtb`<
zZW!$w&c5LE^t%Ei2Yg{5Uwlu-5lK2I*eWa^ijI&{%fFk+e`p?%J=wF6XMxqu^+y*Y
zSwaM8O-a30AfJQu_XfH#$_=|QN@FzXn~>=q0A9`Wo6DiG$!Qd~8Q+nUY*CMtlRj2V
z#|CigA`UDfWi4W3cq6L|`x;5IQTxiiD8DMM-<8$2T)zhLg8?@CldcUR8&BY=eL!xy
z=M-@HY7$s=hUYVae%mX(WkPYurd<OJ=+PP2x6UF{&vqQ>s>rALvbUj$9Av;Gp8Alr
zX0fcFb!`1C?`zZ}*%g(Nqxbf{oN`T1F>{q*Y1tWuy2z~lPST&!X4*8TmJX|+21V<g
z?)x|=jTJH(?V=2(6rlMR$5mXxAlWOM=y|{h)lxNu_PsyxMCA0^1CCRnxXQ9#HI^^a
zPkgWveY(_tF6}j*>K#xo;c4hVB;Z*XWRLdCMdbmUVH=zWsF(I!?*VsVR$P64cw}Vu
zw-N`evg+|#V4jeHg7mB?Pj*&4YZof@JO<zbpkX%4DqYqtf3~XIjFw|fH!SKMee#UC
zr+ZlDRdB8v`g_I%QIKln3BAp%@?o))pPk7ID?A0ra5?J*B~8;G{LMybXr#<0pEj<x
z2sG$=M<Itq)K~H^bf;}nk?d)h;U+_{HBOf$-&_QQErUiio+(V`J2W)teC8Rhv42=w
zls56`p*o9;w~qD43f7P5wDhaj>jCJG30Z<K<ryx1J#S_}zq1d%af%_r-b}kWqRad6
zjLlDgHU+?>)&)R*wGI$``O5JR7m(!2CLFkSTJ%y~>2H>#VCVmemhw-giAV4T`uEH6
z0{~^}By^%5;Om$^5M8*vIV~^4<gI`7s6g2Z?<rPbn*FE@&k3-Bvz<*~26juY%eP16
zrhtR4d<5{HtdgfQmlt7gk)X?WASMuiq)vhphKHMIvOx%yPcq*>udhGgrgxAaB1lh(
z41F}p=5jZh0+{_>V_<ty7}jB(U}8pFPFKh@ueik}nq~rQLR%M$<Z5zSLs56O97<I%
ziUCi7DsyvyhRue7tlOPrFOP*VS_l1gvYC@cD|Nm!aRzUKR6Xwo)e+fwo9*gD2150-
z(CvM?_GVF%3uTSiZGJX6ng6yD=^-<d!mGUpmlrT=0Q*Oh(D^tc;+8Ijm{m|dm+$-(
zvb?=(H-X3GqQnLdprt<Y{YH?ot!HbVSoi99r2w?&mtOv9tOW`FfJj4LcDX$JTS&c>
zEw?A|&DT5aWjg|XQg#GL)ICcK$3Z6xUS2;(JCJ~=rVISLyMR|(J^&{n+kH8h?pL6N
z*nBpwj?z;on)uA@@?FLLuX6AXD-pp9P&)6jPpzpjS=o4+y1n(DCpV{*1B;v8NJ(bZ
z40Vn>G(!);PYg@A2{7F-^FDBO?{8#Ac9G9e@PO|83k)Qj;_IkISFo0-zFD*w=%ly-
zJxNpss1uqYnxwOui*QIJi%*2`>w(eQE*db$x*^h0gk)@1ez%~7zgYBr<m!YI1Me=O
zSz?+omf(5#d10d$P#Vw|vig<4({zWK5ddy_fbka%S|D>z&x1PP6Fy^AZ#l&~QnGZF
z|GE6^6PGzjhb|>(Ro>R+4`0hNmso_nA`pXr5!fNuW^I%7CY>AQ9*h})pnGzuywdQz
ze9|nPBoxAwQ4cU;Is&i!GOZwjVIRzQcA5-<y95HxtTw6^QNGWzOl0g{%?d_jB5=9!
z1GJskbOdatwaK_Y%(6;EQT~4mtoZ<;7!*yZp9RHreG-FDIfcITUgggK)sX<WS<NSl
z2R}dO+jvB;w0tWejE2JWWoNb)N0IRwEU)xE&4X3J%WbLw-v@nZr1)(v;_>%8d{(1B
zV2mw4=!y+AmTn;vBI}nG6A?=nE+!L&ywg@&mE58nN6+bbFFio?>CaNVM5YD{M~e%i
z<UpZLM&@p-8{q}8jMOZloczViF7-;D=T}afT>DNc8^EwW<kMw-6vCsy4}bkYvkYJq
zdW2uFt06r1^0)7PqxJnjWpP!r1~`ChNd9c2+bIyp9ve<p8!)mUsI$W#s4W2}!a^Jr
zmCP93QiAHGFB3vM5(v%&q*Jp$_geH2Fodr5fiia<L=~|y_^E-qc`JMKYJ;LF7gMtp
znIBYq5V5~d$qdkoV0hwzH{QfY5Z&YUKMY{e{R0bA4dXmxl3I!`>}vbatB$ibyP!za
zKnwF{gUca9PI4LZ3ysXXQ!~H`&m{0Qd9ycf2;lAizG}oi2P-FYhvoUM8oqzs-v7FB
zqXna-v_6+^Swq;-kyXiI$nCDU{`~>-i`EhZ$lm>3GY@&Z=HKtol!Y;~vi1eL|9r~8
z9znMJp9dt0fR~e7WeQmTGq4sY<;$YKzq-R*=V??*w|5?z|M^}`7v%9E*dheO8lqsh
z;;p|2?>}E{fQuk&@(lL!v-KiZU06HwzbD>e2l-y14S9QCvh=@R`SZR*5isxbFO?*9
z6UKj2{l9;xz|gS@AJ;_b6o~%23qk*Se&I9Nv@JRGR$vQy8!CLQ@xMm!jUKjY^z7<1
zf4^JKUwA_MKW`Ze15Y30=bNbhpJj{3%x(Ic1^%JC^~C;l!D=ExIK268UBS&J(Y{X6
zvj20v37_0MzZ5gk4z}h{l>FU0vEtviq<}Q?jn_@<kNV{B@9F*>X6T2Ouz=W5c4&Ua
zKU?^(qateH4V#wmy@Io3<XBj3|KF=#s0mw@QWfp{ObEXsh2#GH(YqLfU6nd)FR1v_
zSz#dL6a0H?<Pl*sODCr)-QR6iXNBkb6P>WLKm+sW%cjl||Le}y0Fw9U-y@y)9DI7{
zaT(+P9@V#5WYtUXE$l3?C}81Gy8iD`-7&m)u!G3t?qlx!?-Bi-URXd2_bWW*2;Wz6
z|9zwEZloW;XN5H}@}K>mQwsV=57=&8^Aut+bHS^5{hu{MgXJB#mRJ}qVT|Ry_5Odo
z19ojM%C5{=!~E~CyoLWESHK^>QNmD_guEg8|Lm3~FGBt=K)&8xUGosSJ%!yU^1+(K
zeCRD9W)KlI8uAKwB|$BH?&=$hr(|<PS@a87YzX>^#`;3_nGy8B-4$R^`Bn`in|hY0
zy+LB#H(P3yq{`yIk;vWIxmYliwBwl<+4(v3en0Sk5Etjq8^>nTh-Ow0XO1;47QYI!
zd;y1!^o%m;e6k+6hyY6eUeCI_s`%Q;CJ$G)x`9&4$VWmba9^8l8*0>VBcg}2BCs%g
zSP8estdf^V%UV%;oTXZfjT_mn^8Us94ty;{xvJ62A4I>Eh-=Y;q4bNt?}aEg<}n++
zw2W6jnqsbQ>s+fuPKLZ_SppFCtKnDQPd^mpoNK%YRjXF}eOvgo-`?BC87!YgRRsyT
z9keVs_xZrZB%h?zai5)fRRZ$|Q}>pi{~~!x`stm*3)BXB{dM*ado&obtbNRaQrG9)
zK}zCC4@@Olji#4J;xDw9mZ2vP!!txCK6S)xIjkcyEC37#Gy8+<N<;`RN|MVcQBREH
zjHa-@CI2YYGR!L7l-LXNx7<>51D(Gd(g^TQU7j}I>h5Zpq|l_0je0GDYi}o@cRJ{B
z<#KOKq-`Yr$X(Vn2cvp`ohcsxk`imyC7=+LJN#B>XkyEo8UR*}6qw9$!H9-EeFb!1
z)Hv&(-yrsy6dHV#rq5;36}=8Z^^*bn=Wd_xvGh>C20R631zNb?bz(^7>EiJO*Tm{P
zfB?b0VUQh2f>nqsz}Y!AR>0%dcBP3ZcLUmoOU7>3SV>kBiF4oyddW^5nrPNpMqHtx
zV7vocN&ujq-m)9%{9tpv*vtvsD6C=*H0-j*3){$4e=Nu||FfYbPEe>kJHFIoV+JT|
zoRE@&(vctnJ8`Xy!<l!wgi`&)OA_aG%~xw|9{q>jc1o*dNuRBorf$k`z4a?PPbIZe
zR5dCEDQ!nmmJiN=Pj)|-p3H3wZyUDiZl(0S@>Kh4Y+gy3_5s|W+9MSOmLd^;{0VMe
zrUx{cN6*3?0fMqM=m~Q}lUeBelfgk5-YVObu9v>Uu?^Adaoj(!Ho-X@@!FWOlRWJ_
z3fE`_9YW4%)6uw!qW={T*d7{psD4{VP~=%r`&c9rF3l`HwbbEfp<zEHcxhoe3N8W#
zB}l+=WL5y~_jhfSB(K`hptYxpmpF0Q;B@OS!u7h>O^zqbU!9X|hnk(xbX;Uz*#<g@
zs2nKj0N+DM2ep)0cv#JTUU>CM_&EGWJpk%#!@8V;yn<-18uvU(Vk)D_Tu@#|;tulk
z+XLUk@L75{e>NPFF0ttQh<D#H1CqZH1t=4TYmP*;<5Ac0p}t8**4ZX<8Os@uRYBVU
zo0wMR$+~%Nx%uFCXL4T(f31o<o5%HUeQW>Z!=mbWS;}um+1%OfD+obUZ;RV=dku$2
zve+*drDV?0&6q}ipP(=dl0U~f0yhmOdd@SQNocYPgM9J|MWZuJg%@j9A`Z)@rX8hU
z&E}3qV1Eoglq%o23HEQX_B?$2!f?C$Hg=MKRW$3_lfFed(Pi`E+AgcQmLEHnrA3m&
zl|=9OC5f$nq4AJWsj<fbB<<X-kJ}i%95X=0bs#AXbb0p3V*PQR4H^=E442oE8TpW<
zHwb>Ok|wpaOH|t(fWYOyWJ(jv;o-O;<SV-dL9;HmU6DkM+O*JszbC47kbR#c^yt@T
z>#eFR#uP^QJwX2F%BeS^PlzuPNN>sXY033s(hlX_na<m(94-vyMzz}jT$Pi8c;j;z
zx5U;m>%nuNYPFTdQ-xEGRh=LzErk#B?KNz?w&o4-dM8KZtPg*aK`1lxBV+&9DZu#8
zxZ21VctOVypfA+4-}Z3IUQ<=D3e7xh%35Wyg17O?Im_M+8FmM_5j~ue$gZu*L;{^~
z^~icbCZte)hpp-nr@!qh$EQg^S-=|pU8WgtaG@!^5NtTX4?(0pJD|Pjp^Btd<!4}2
zx=$)D@^JuN607ZddJC=Cyo_aLsHf)&U6PJ+MbCA%5vCYjEFUUlV0?dtu;sU14g>j+
zj>}pc2J%4H#RcF%*mZc`w<0f5|DX@G%|=-H_!3{R7ZdrT7%r29E&K?ayK*x9SBNdY
z3PK+GYsqwp(>jIxu=0d2a92uo6+yfnqOw6b)~G7%c??N0>4jdX+1aQvlIf#K_u%E0
z;+%!~LWcC+b}IGQv;ybn-tcyGh`kVp^p{mwJs}OD5V6Fe;)4sFNVbCA6e6T!xq>G!
zs-vTl$d?ERQ}n@m&6^j+Bpn2Jg>1T$n|jd93do)cURz@?9Mu3aVr42-+@uNjelwsU
zk!sya{N5<N|6=@&ZDg}T?hOYm=OnPrp=1NM%%<GDI7YfiiYZg;?S7!p5$gRFXnA!H
zsnaY(Zrw$DfI0fUpOx2+xxR0X<uLw6%WK6apP}6UZ~>ItQCu7HVsNDhV%bwq4B~n+
zFvZhe<&Yi9nGCguE05zp&#z#jOX7O*B^K><+k?U>^_<fpSsfjY9+yU<5u^tV;K~})
zX~1KveeQpbkXcZ-bb`mu?Wxyob;M>^T+pW65_t{rYm}J{IZKfl4}Ci_?ycNR*PS93
zK9|&Wr`rKL-WQK67jw$pm2(<Zm+DlSP>JH(J-6d_Xjusjw!YhtomuQP!^{fL7;F*G
zV{wFt1S*=Zz!eu@!hGb_ldnQntHWB*btok10_@lN;1a+YOl|X%xjKQdA*sm`I#B5+
zBF|)x*}noqs+6{$g<aGR%f)$kn#El953JJ})Jv%Sy9@<Z8M$KmBDTzO{R$|LOEw~B
z0IpOS&oRR`Ff$fD$t&@l!1+|BjJ{$a7G(bvZKAY~s+l12Bku<+*_DxlffaYtk;Yvl
z@~gddi==mQVT|(;=w(^wC+@h{f{-=(&TJj2@!g5g`zp!FUllYgI&UE6eSAXRyIC4#
z1~7r=-2rgOV;FaEPH?M}eb2mZC*}j9vNNXO5g-Ji`3${vpsY2^p_c<<ua+OHWOX}^
zw>6Q$42SBv8s}X^$vF_w9wyhsey<*3qru&yxsNi5V@hwMc0icNK%q#7CyI!Y5X&4w
z5t9W;r6=QXaHN(}565*OZIL;q>13dua&25zE5H7-#{e!w_y)L|;?m_BvPN+B?OO;X
zVx$0Sdx+PsE(c{`wU>w&k`l92#w%dtN3C$k8cOE8=nWLWLEBZ}!=te1P46JK&$gu$
zDMQnm4w%wOlYyxEwuc+0_3;(x3H+KAC0w(~WDFoy$~tta;_91an2~dh`|6e+6%7en
zVSXEtHBlOi97A|7aAkx@iC6!P9JorxDVb?F{~kXu41?PLCbrPSWR|^CQ_vu|Tb{_$
zKGC`&<iMV=iz0&^H7`V~tW+tM?u#K~`yrqH(-n3yIp%@hhsa}d)sGv{Jx;nB(RlV;
zpXcXOusyv{pI{;>Q60}LB9zF6kn>einW(iAI*-SbW0|aO`x#@q!}A7uzf83;E6sTl
z+4jw^Do}xqtdpZ_?Kr0N=NFkmc+;p;R~^>-$fRw~BP}kB%TJ^gYqUC3n52!a0XuDd
z-4K(1N>IgaLWd3sRQt5oeA`sPmM6PFSk&I$Hf){EX8dgQ-N(!DZU{rl%iRq>cIB2X
zf|SDSr~xJ}Wj&32T<Wa$;Or&p)<D616`uoX`iE%<?rk(#YtXvF{`v-`lR^n1&$KA)
zK2u8mSmT<Z*n!@c>f){Ast}o|HKZv>Y{F(D#H@_DGq*g(tS!Gv3)Frghq_U8DS9bw
z?90FpO_|NS<V6ChL*q8Dy+0(VB_;|E1R+>P#Fepd#AiLfxGEg;jTQTd*FlN{^;PJi
zb~hKdQXs_PaN2Q{9^)0xHN-&j4uBA1q)!mmo{6lU#o_XvBPT~OyYV_A4J^`Uasz^;
z93=`>_f4_(eK|xl10+&Ku_5j}bjahdHTl-jn_fjvACSYa61BAS05xdbb_D)wfSD4x
zMT<Mu@?)8(_5vxtgCGRsQECy2g{*A=LrrX-DD?k8^)Fh=Ay!0IEa?3hfpM{=(acz_
zG6STDjKxULBPuz?V#72%S766o_RUCjA+^HA1%P;W$@k4Jw1cSatzydfG2Gb`_bvld
zvBYl)Fk1XIuK5(<4tIoWM9%zWNN9%is^s*?urjlZ9?l%r+x2a-%k7ERS!jI0{^i5p
zjS4V-vpjW?@syZlQcxiw{nk-PMVk^G<)mDj{8=k|p|?+pHy@Q4I<h?e2-tp=9HsLM
z49La3ZVuBjEebHBpM`UWblkktN!<K;*7TvNw2obLcalmrxINvMGMLnFxC7w;#h82#
zFW(-s)2&gDw6aoSxl38IGs$mu)29)!goSHE(&{xeDC^k0nDxr)tk3)5%hy*@7E~+0
zu4Xm72z+iZL2Q@2=zxP7jq!xNRu8{i>{CPW{^v;vj0fqnubxW1fZd{u@Qmu#$|-Fu
zR_h&#N#qst31G0HHhguKl?_r>^s`M1>Iy>9A2Rm$IK*G5GpD~CimeQ`vC!pn+&1Va
zolk}&GmCYLFeXmK^NMN}uml&+NEMLMj;eD9<wz2B4(qb79Yy<n-{n>p<%=Rno-gkF
z8ns`9QJJWSbajzYNJDeeFIzreVRw6d`hG0&?(S8<;=PVa|8Le??n$_)=Ra-HOUGf*
zIq=ojp_<iQixu0pyllhn!Pu`mV(w=C9ju>uSq!VA9zQ_~NJ9Mj!(hna)kA-ZEHrmy
ze<rlGc~vG+al0bG?g^MGQIkezJ@zx!>c;I*jE4F<xu&8$9h+yrH&TGS&n_~Ap9r%}
zOt&7<P=$yEPmC+Qv}EG3*we6V%<nJquU@4V3uNC)I<y)4jl-Kt&6s>h$-9NJ2FYvh
zNH*$3&l8E~K)r9wa_+A!!E9-+^L&*zLi3jb`$46Ap8|UjF_agOt7+7#cp)yWJ+jas
zsO`mM)5Ij}>LNM><*DS($AySlh5QG#^QUrwI4VYIrP!vtujLX_k|i@9HHz67gMU#G
z^R$O^NRO6w+F*t`lEsz_OG=At$8ljx9)&#qm-OIPh8&TRC#dregNq=ng$OK=;G>=D
zg)Z*Vif1jKv3_MS4V#MDx|%vHKdy2l`XqzPOx~A_0`4ap<9D4fsznZIT9IH`$TZBw
zS0T8gvN$wYS(J^cn_wyy6*VMG&hRv{rp)^kiKm@wMqlCF>ypc$xQA8pTY$F2Tq7f3
z`EuayfDI^92$Ovkl{fEQiZLC1lo8b-!ZU1-p+Sc7Hfx%iXveW3x4Y_o_CT&oVtApR
zvhS>3g2c~rL97)5yK*Bf=BV@6@t<f(mAjmi36@3^-dv5dd*C7C<$O<ouYB?OnN~)T
zkCZvrkY$nk@&1p`ar}O7He~}hjDBdRdCe`jJ>_`gyF?H@1nd;}Xz%4wpDaCOm*MxB
z0*<i9GTaF=3P_P%UYj|RMf2dl7vRKvJq+kFmgZ)zAEIEhI1@3|J)jQ{{b)bvovIxX
zYXF*BF$@HLA76Z5?%A)-q0BE(0-C=}HD#<sO5=y3@nCr2XBOGUnk5lQr_5Py4SfvR
zSXUBQNVoH+ktY_l@{8S(JHeZ&QePE3`Dr+U>WQhu-fV}b^xX~N%BTlUH1qpdpv4oP
zfhKKHsWYGx)4z3)j(IbB3OMyBZhr$$kS63}exe9IQgo(v9mjc?to9i<MlR8)ygSma
z=5=wqj6hF5Q89ikXTqnjqfh3=BdE{c>KZb~{H)r+Rp8h`36RRbG?Jj2tord3#}4zO
zdE_miqD7a06+O;B*MYjc=>(k)Ez<lA4;g`iI^09mNQu{|?3q!UnHCI#c_g1=*T=W%
z!!~AlvyA^ATW1{?)!MFMMY_9d=<Xc4yQRCkLt5$XPU#MbA(WDCX(>UvB?Lr5&zc?I
z-rqUrFMl{QE8g|4dY=2f?vu;APp^*>>Tgr^6SrFRK{2>(`IRv{BB$}dV|QDG?|wQ9
z`@jTI3bnV-KrpT_H7ecw-Q{s$AW(w7Br_K{yRO>G`7Xm0bPbRMOme|@loL@W&see$
zyYb7bmOcm@iX~kFYT*z$|AT^8{zCSaXi<U;yjD{R2#l>)!o7^mUS*${+IM%nnQ&rv
z!j0kId<$kE6w<9*;|UXoWiG$dW``610kmf?-!qs!PiK*)k2Mp9PvMa9Et%emh2=z5
zAMHnA#NJi5vqkl&qiS6nwk&IEQryv~L4?SZ0T28>B2r6I0_|;b&A2?8@<-)$3dAW4
zpH?&uh9oM-9s?0=k`uRs^4?{+`86jB?i1Qa`FvYnIYAZz-w05__$}^7+vKKOG_nc?
zK$60I8(?KzHXUtw+2N2({aUM5%v0tudQ{L7KDxPugBjT@%<j;Mg4KiXxXH_%Aziy}
zvDTY9(_3^Q@|iregp@d|RTmPMCFyP&^4VA@4x^txflt0XjPe!68%xs`oS!Cq5fSB7
z<#XV0Z_?e0N{cB{mQ(AJ<9Gn?yH_*z+0S}%Eg_sC*OaqzzL}ooLnn34ukKKYa&u~k
z^>mzD+94tqviD|yT`~Upbu|rgalj+RmeA&O;iDgLn>>KFH&mi$G~no&nw>^U7y0sP
zfZ`YCz77R9B4oJ@@@>WsD1J-;*F_vuWU>T<Ux$EfJ{B^$491ZrZ@pR)H64i+$I>(R
zXZ*GnuqQ>cQ7Y&1pv^hLwX}?l#2Z&IBP1XvUOUj!sJlyMF+A<#A(h5O!uPp*9N9o(
zSHk^9nlPy#y`rw{F#eYp!~0wZtqBgxVd3cQ9@PUwxFiWoK8Lty-J^$U%!<&=lX=<M
zF=Z7}`jsX_AptmM<JID&kK@Q{+4Q8?k;RhM^C!dlw4y&Gp3lz025Lre!~nC6A5Kcb
zavblYC$IdqWR#r81*MUugWTtKP)I%-8a+5@B68+~D&6yB64BoZEdr{^%hrh^w|c&@
ztmdK3Oz~uB=xt05c*I2~;tuu`l8mll5q8Go-HsNB8bwQ*>we3mDV7;5-xJlis3h$b
zaY6bl7BWn_h9mw8slylV_16I5NB#{97PqFnJrUtSw6tH$J=h;|=ELh>ol$R}f$ol}
z<AF-SEj_i30WoQ;{E8X7c(T5v8zo#*AnwrXzBgKJANL9<WRYM*U^JIu75MTfcen?V
z6SWP*I=Vg7j+n&OSLu>t-=J2mFG!gppV81WVT&jr(S^?pAPlj?&KL>kOJAyLOckVg
z84BvH%8|Tr<_ZeR-`~oos6^cZnm%hp)`&73_~SX?ZpfvTkb5<EbTLbAtWnEr;HH&#
z=xg-;WvPx<oJnAfVw~yEaosWnr-2kI3eHK?)FQ_?vF0CYNl?=pZs?)eHi;0S)j?*V
z=+~s0qdB0#Ntha{L1MkDQPVE3c0J-V{R>G*+8nVdbB3aXdO}sL-t}oWPJZcXsoqfA
zPQ3*EBYbmK%6jm_*_?pG0a69p^j@Mov(p&~WWuo8-hu;Y)d9le(#Aks^WejDu8sIs
zX{zv+Wh6#e!Hvtl0&ArN68HWz5{HyruW?~nVVm*&v9Rq024-?0MiqU9ZQ6n(T4R-h
zjB&S}EsFJ<a#srLciq7oCtVuI-vCdaWvzR*u(D%&pnY~!?P{BWIg-kKJ%?=Wz)QX*
zNq@faOJv5)w7D3i`gm1^Dx$h9vh1R}g`f`(76?h}@_45K!(WrVy)gxKWZWky$*xHI
zVM)&$0TaPt>fYEMRb2YuKAc;NrF3Q`qd4$lxUW>_E|2xi7)L5vva;h<H6`-){Vuoz
zxITf>hf7~#ImF4B+aE`LDgqlm5S7iR9Bf9B^r(3&Cj1c4lt<FZ1XF2{H!`EJXHdY$
zvEL<_fF}4NOg35`C@1<%7OOux{5Y1;Ih<*~Bhv>vIK;J}M5XG7Q{^y3tNBeNw||*Y
zw|#KRAjtY<lrHjjE#T8y{jC$Ykxn22gv&O`eB1-x4O`t#*@}OI1GWj_z=XF~)f%;j
z>CBL;mRF&8h!+XtKS%|?eU+dk6OcXQsq&pnBTOs$M3HR9NS^9TPf!57ZgaD4ijc|p
zBHfVfS@3l!MLFyXSl;%=!#<^E1}{J}1GWvDkR2UGa~O1RCM-t6K@fTbbS4<G%Bg5Q
zKirlSu_swzJP9U11f&LxIlNS*qkvzYPQJA9#r`TbNsy5uYyP&D_A%DHVw8FyALAzm
z9-<7&w_kFxk|a1u55D-8eLv{#ZN;J(AD`97I#Qk->izxDJO7Cq%LFo)nZCDKR_~@N
zYH(ifJRKT_%W@#dh|V13XqNCig-EV?M@D&bC^Zj}M$1oNV0wV1I>hlh^co>$v4H0%
zq)~U;P?hhq>FaJR{a-Y3N@vV??EM1EMNYK82#o`e^@!6($shzQ_db9eC_*f2h;@)L
zB6wlIvTP;F>U3e6P$WM6Dc*RI3qO20@VR?D5<j{*TP}rat|z5VJ{9ZG(?&iVf$m48
z)~XAV%{y6+usPU~l8S+=Gc442d+TI{UqIDV?f8*WdzYa1E`q?#7uK9a<?$-*<JB^*
zq$-*xpm~=M+ZN#x-kDVx=)tTYPsN&#c)JNkTOmETe(%-%l6}AcQBd~KJoFT0-Qc2t
zN4CJ!?`-#0v|n(Ns~KkZ>ximK{x`TG!!W2Pmz+A`6TSj)%N$*vy1_T1YIGb)I<h+`
zX}^(4iO1DEPX6*rx}=LigrpSPv2ctDQ6TZn)K(M@SuP(rJx6)h=#~(!yFv+{GByEq
zW(UBe0xY5OXi!nd8}ph~ZrI8qzQUIhS)i0XE7_`G$=!Es&M^lb`=u2m=Uv|u$ItXg
zFJJXO7>hXZW<`GrE5cdE=H--yJagCE&A>E4o8d4G58;sC*-i1tUC>O#;KD+NQXq=I
zwp!&KDZAU<s{BNc%DC`%n0Y=-DXy}2XDD@Pr+X1>9Ct~0l#gyuP`r}okwr}FnxsYk
zfK6uX{p~1Eh#m9_Kvb!Ub6x=ijn^jDQ}U0QVJJ<i>kZ^zEO(_wVKtOA4Fk0z29Xyz
zCkd^F>)DNCo1p?3`qM^IE}kissTG%eYyqli2`s+8Dh{Y0!H}=W7jXMqyNd#?Ck})N
zIhL6^g-e}q{vwTx>#W{KT1!#D=3xcLy0jJfS#zsMNK^kiZ6*i(M^AZ(YOqdtJJM;w
z)-2EMqd?|Jrk=d8P5Sih8w^8wF*{G{B`(p0GG>pnwQt>D3Cu_u_b_0ojRp3Yer3J>
znN_D^Nb01H_{9aOjHq77%EO+;(AXj2HvpYNBa?B;63$rBz$<Yb9>Z$MqHFTgmr#rd
z1T3H+d#&6QlIj8twbZUrl%FrlZ{0IjG}pQX1}1-~W3efT6tyWpqy_TpjF=nBuT=+7
z*j%8x=Bz5Hq6FPWovF<H%tQh4k;O9+&YeuTmUTNNF?pET=|@^VQm?t;)>9n;`zgcB
zE5_r3q8TOaC{ho^w2d}r!w)1}^j{H$7^{u}!B;21fz)$#+&#`on-n55wl2?(z`>mS
z;#R_42CCfDQ|V%(<4Iu4-oRB4KQmM{$r|Fau@$T8ms6y#3OfH(^=48=PN$4UolFu}
zvW0)|+qPt5(D%jKOzaoZKLz*(Ub(x+ee=GdXLSGsl7N~{7Em@c{FG^2@tbg$fGoA8
ziIfgvQ27kV+Jnms)|!Y12)q)dSpon#U*e4_eQ}ZMdAR_VWK+4^V1>Te+A81&m)kXx
zv}nhHaaX1pEjG>FowRk=RPS^dTpvj+zv`%y-zzu!`V5x$m8=RE*e^@a>PGg-rxdu#
zcB8?$i0?$~g-SZ~+^OhE5Z=-FK5O#`=GdL1v_0f?T#}}cC#9@g>s#50wt@&P?J@#&
zgd5Y}yyGx4-z>0qG)8t~y~m`e7HhYxj27-S^J95L6yF{l){r&0YVRPSA^6)^U2lgH
zl^+HuQK^J>i9{Auma*eFz+-p9%fOi5eoas|WhRyx1yzlMtzS;VX1XniHQ~mu5?dE{
z1LbGbn08JFDNoG7mcHpp@YFlzb44Tl-u54^3Hd&&_D*BD+gbU#hb^y%ON~9Wt7lMH
zHt2-O1-Yox34A-Y=(QujVlEO~d^tY^8Mn72RK(wa6a<mqsh@Q7(M!Lv01;ZvUNU|4
z!1ublP*vBT5VqmyjkHa`34MuWHJ<$xtN(z`Pg;i+(l3wZ2D18sq_oh7u7Y5LX%A&n
z)&~tYXV_ppT`GPqU1hK|;My_&ZopHYQ?9Bn@$(1@UZkUOR^(QiLLLy%w^wrh1oyH6
zd&qWYtq#=H5gldESRW)`J?e1o=}R~mBeO&^KTFLf6gr<<<N|2rZy5*e9#)yiI!mP6
z2y7J6q%Tp4k1mYEC2+ZzqcCtX)RSvs;cS+o<0H&A7?Yao)kz|ZvW|X2XD8Fdh+gtH
zq@Q-HZXy<7>5$JEsR3Doi_zqQCMKV&^EeIn;YVFmkPC`{)+GW?#pua)ryV&t4oxlZ
zs!5@w-EGB&wtb@IQ9+}LHTI5VwoLKR&HDnjAiyF;HnuIAh^y(Z<&rB$Gak1#<hN*Z
zDa6DHSMH<~-`uT8<uq3PIoNXNs6%jKZCg#Hjr<2|qS3LMAYI}_BCan~k{qsuNFa3u
zrD7PxXV`}@PSoA7dNNi$mc+4H^i03Tl@EpDI?a0ShS=d-yzB$zrJmd|TUp4|3}J!S
zGNk_eTxuo(gbQL0sG7Dkx@u8FZ{!Wqz%x_)XNH~5@$YEw*NKoGhiKZ+9tq1dzMo`-
z_^ZFqTQ2sdK5k7JwGPfymM?QM6F>vwqwMNc6R(>3T?$ZfzUv}7WW_$Pc~Yi2sv@!}
z8fkgao3{32Hcuu%oPHuaHA&)@MEm)srUk}bM4CiSMZ{<BMq&%f*my2vb?V#ws*ltP
zm*VPOwAel}AAL@|sN!JNN_=?+e`Q3FgJNr_;(}ARn>EejMXms}SkmftMhp4{&7%Th
zRArz2n771Xd%r{=tgmu+Y2R*1s=Q@{L7t5m4X!WV@TU-gLAU9)_|fLWxyZ4MW@M-5
znPL<$pVafdKqxBtEnOMmGZ-hZf&;>31D!C$kem9Z6+g#vo2HX_=wl(zO;;$-6$5Ur
zWqu?Czv<;^hFyZ}OXfU1MjYpAvc?s!Rgk-~?ayqD=(oG(24Fxhh1beZ#x2p592Jhq
z%^53EU^MzO!SmbQq+P_r*=PX8QMd=aQUNIGrSCq=Zbjvj2;6jtlz5j*SIdt1;XIFi
zQv;s)Czec+0JzD^D<-dG{G&J;XCLz3r!&|2k5Imk{W8_Yl3jowa_n`g^8}z^{8fV%
zY_hxUpZqJ;|BFO|s-`ILo)W6^9e7VkeR1eKp)qDNfJr^^5U@`Yn8&3^O_lS#%!N5Y
z0x%D1zRMEWTt=L)`%=ig_b%Z6&Ipe@1t$jbQK;QLv1Jwtcj=ThC+TBhe7a0}9e@AD
z)r>s&&Kza!*45sRG$LLXD>t*7+MeAV&4cw}g9UJal7(he(*y$L!16~Fj_)*bbDE8$
zW%V$x@Uzl14mRpIapatX0s!q~voL=M#x;k01Nuo=^|b(B3_YAjY?z&w5~A9aKJ~BM
zkm@a<+|bi1)m#^vDGvsd6ZxEu)RnZDkDg(+xLX||iujAjioy@ci7#UAroPRA;0C3&
z^=I<mX!yXe+m#ge*nj)jEylB{`+7gT!sM-r^z!lRUk9%}P}+V9I?FOP87V#TfgB>j
z?)0dY+e>GH1;*LLKh`(}4M7Ex*L}aN?NRhw7zpWT{CA54oRv11zK*c`UW%Ihw#v^7
zzX5u2o_t}*qJp>!I=5wPExQ7yj5>ob;N;~%4OT}!kkDuRq{f{q{lSlYFX~W}d!pgx
z@X1T|NQ2{e9yOs}z&k9YA4U$4lgLlNcrbddahhCH%3z8_yW9sspG|y^;1+oyQSSF2
zpL}+^le{E5)5@olmFgAI@pq<Pl81@-PfdX+g|{=$hzW6LsT)t8=o%-Czr%N_pYTxG
z#JRRIg)R&8Haq~z>Kg$*PVXqvp+)>CzsGzlPxmFoZ6-XdOGTkUPGavUU|bVkp*Gfj
zZJw70mHT;=kO8F=L8do!_hgaYvhK?C%oKPS1rm&F*3oSZ>So(4LU+nvD7L5k^MfzK
zqEp67stqO?(sd#4IlC4B;A=_0FBuibyqfD&68M-wj+CyvW1W^ZVHmFTg7^LLZusY_
zay-isj7nd%VfNOEVBZwJA3$Tw!{5@Zp5rI~?JsXpjc%n0Tnh`!0k7T0o_i+q11ZYL
zCHOO3e0ipLV(~#0x9ED%n71Jt3XX$)cNn1#ac&N>+!q!<9x`FN3aQJKvM15sI1xD~
zW+{OjkJQOwu3Am_yM|Quc&P}@KW~<Qs#74{1B!4qnaD{e4g=|Inyq}s!n6`x{?yz6
z=1&f{&l$NLW2D>SDs+~wMgo87Myv{ITJ@(#<?+$wxKU&_2x_^e)cgjV!qvEfn0f@^
z=$?W;S3Oo60Wn139<Dm_{d7lzmh@AmKM!;*g9KNA!fljW3(FMt=pm;|cYbz`x{%h!
zIbpQ90~?~)2gkR%oX6-x9pLI8?)xb-pypTP0o`a*JysZSSBD6Yo{Vecn{oT%tkHlB
z-OljKyjPS0H>>xY17lu|roDxG?G77f{-i-0jO^5Vx=|hkdc)H!{_}M1po4Y+Nw0Mq
z79ZmwujXyeFCM0F-b`_%W<3vI&MofrII-#E-?G0yq%vtwApbs{OgBQ3t10R;mCLI-
zbVoU)&)yRhb3SXol9)!GCq?MBge(YG4u;ZllL@m7pDRZ>F<>V9>=sbR8etlHPVr`=
z>j#l#oivr=B*pD{m$oSzI%#!8PXhC4YB9zKc&~_lGL+YDW78qweFSVZY8)xeH+oTt
zh#;bLaGU`4*jX*nY}`7!4DN16DMmrv?d(Hu)jDH-MvG+=Z49Kz_7i!rac!4$U{8I#
ztE((C|GHb)^jGi6)M3~m-vnlIZamQ~Aq~>8XsfrDhes&ja*G$9q(=;>S+5U-Hjfrs
zaFbAv0NbRlCwb|mM8U?Dw8A<XnY|(hIg%kZBhqe%<?YOPnk^`>s-@XcA>j46Ow!}a
zR4sO1NVz1JkN#-8bZRH7r9d8^GnK{~6cDX!8asfnb96Im<A+OuO^@e7li}UyYRDMR
zFYt4!DqW2P2j8)wvfeUo?yVkT{s5Gyg~ufbuiRgG`L;f0IDM$`ID!l3YzA=EbXR3g
zB-rAcNYydZ2q)biy=zErweK)J8y<_)4+aW18w`f*sPPlt3HDe`Jbc6bi(%XJL|}F{
z;~BvbLiv2bs;T?cc9Zz`Qp1Z%oJpYb2lNweoxf`VvW(5ExC=+GIj9sqqqn8Pc50hA
zh|+mg8|Gx(e_wQPd6M#X>2Xq?+*pqwZu!wevKbIhmoGUEH?dIpE&%0JQ+3<-fPbUW
zJKGn1$?GH|Xq?0udl$C<MDfyYB#{hXfGoWsTUfpT=&|N$<NXc<+H_a%yP7`tbTvr9
zQ15!L9bN*g<7Um#6hq=cZ=uj_=K`idN_wj5<#^M6niUv>7;<7s%);ElL7-eh<2p90
zI|V<dlYtybIdQo$;~_J;=x4$f^I;9*m<6G(l~OwP7IKCpHEl^2;jzJqY07sVqUB9U
z_YUmRB2}biB*i8zmp@iH{BqF8C4Cyw4nAx+w_PNBk4#k6TAtj}tFt^Yg5c5Nw8rWY
zY|*$>9~y-AzC997@hT`uv=v?V#2;J0aIVw$7jToa<LqIGP(bcvDCj*@au>yUyq<E7
zsF%_Z6T>FadjLg&JVc`jO5K3Nqm-3<rvt*jR$eJ;yR(D+RMr8XQUU7;$k#MH<6Ta+
zGWsRj`_VBB7lJp!TzT^p$c1Ev16t9-t4F+k)Wf2$<13j-*25@qx0I@Ho)1B^y?uE{
z;?;hBOrhn*D@e*FYDtZbi=O3t<8w^pCLOe2mB1uI92`tWya~d)5?*k+K&Qx=i1PiA
zk1l*daBfdLTAc%hS=Ny!3CDmJ&AO#sF$BIoyFxENF>6RE=XT#K=YI4=)E!?<nw?DP
z&v}JRtNY<2s=|u+Jw^PsR60TSo|)MXQc2YY6ekN>Jbr?$VL8(DpxBCN$=Gci1;Tlz
zf?#Zdqus7ZWib;$!JG>w^2)T+pvS@<!I0K@^BX~p_~2kIspK6T_9=cQW*yYxg<5_S
z-B>w&u_UGr&DeKR>!n7LJ^Wn_S`zZB{4uzp3)LEd<el2J+%&t9QqMmP)FPx&Yz@B}
z1;w_MRJ>F-3i+d!hFewK)}iC;YtMXoQGz5i3bc>2Jnk>lng_WYLGk|00j0iQB%CGY
zSVqYnmgH8js!23grMp$-V@=1g-72kABb(dsoOrdBqK$|Q*Wfb`UuAdEeDrF*(s0k%
zy?O6|kdjlel9#7bJH4^4lZs>SV<;$kQ2m{iSC)dA{x$g<57SU=!2?=j{Ma8Nxjlf}
ztc*2ex*vzU2*8)OPUX^k;*{}Rm*7a*&uOM^UtqkDxKGk|@VF2E1elY*gU|#qOF_sf
zL%7v9E@n%rAJYAQGW-IXwhjEX8R4riTIkas$Rn|Cmghpp(n|>>ui=|~r~~F0a0ANl
z7MN_Gx|O?vsm1&+Fa{6&E8C3HGllnx&k64?eBHNO@d?E%;S0Y5u>{m!-;SP=`TU|5
z_Ica15Csve!$wd+H{60MLR6%H2M(%jWmxmlv=G9?=h|^zYxB!GWmT}D9q@LeB;RRa
zT_J*7o33kvoIkOwXS+9iTRwRXVM#w>=q>4D`I7Ba8aJQ2$mC>~;3wa`|7E-6yreK!
z_w5{Dy^t5OqB_-)&}lL{N7!jnp;r0^D{t6_jh5hHRF*l|Hh?7kGVQ9`qGROnQ%U~j
z<T+po86?jyV=W|!AAhLuIC_KFr6_3cwXoWYEuHZ3f*c@s6cLDT%y_5o%zh;Q4sACn
z%0a-Tb&))B(H!q9+pZ<U$vmPx(uqqj?}2&YJGoFj>=uc+R&;tTDlM1kU&WeSC)&X}
zbCSDqD0^T*7ao`ABc*7|&hDPJ{bo9^3_2Uwn3f%CGBI<UZaEKzP^Lg-zLnVQY0BoC
zRGd%Wy&3eiQRyiRhbPW`PXo)=uA|nSeL3cHzyX5cM|rfXKSUZ9PwFD2g%%={j}9MF
zvk_u6Z)xHov%0L|xc)~_D$dLG`w^N5nVa!?Pw-YAhms2AD&P^{3}cMyr+d&8n1V8P
zUD=A!1Oi;j&NPqrq_k+3@)i{(okOa30)&<^{DYHMd1jBChnfm=;8titkMfX(ars-E
z*~TbisT0AxBJ#GxyO$}jB1G8t;{`IYo7eN-0hikQa~3yBNdc1**M?997SpQXY=y~c
zL+Od)@{5B40){pcrNSwJ&^puJjX2MnDp2%mpTw=PpF}}!sXp;vFYk#Ra5YFG16s{x
zY`F?s8&cL6+invGTr@2)VC*?2Quo34i;cm&RWD2)(8qJ%tz$>y%jkcMPfgTyn9kJ}
zI#8Tcob2#Ey9D}}O?7p=kI*WulF%-l*yf8^o?QbWMg=-k1T8&>@iw3>v;4Eo`fKd<
zlP~d4#MV}(IN#u%nLFkA%8{-^AfTWgIKkU*-n-YOwn>5?X@bi0d5jXGUtCIbF~2>B
zkTsr^=;5D#ENJdlKUC=jV6d3Y`GCQY&4h6k#`iQEv5lU<*nJ4^GEM9b9rDHT=iMIe
zIe~%s#EFT{-b+gXDTc?w2oC66X`XH(vW&{(ag1L2?yee%(TZA~!)OSyQ2;I=ZS=jj
z^6la#5GDS24oFYos%8w*-EDjC7az)p7vFTI_f-KO_)-^q5Ko-VWEz1C*?GQ3vdp5+
z#)O<iMRl7ho97Q7;fVKEL$Uc%@)SE*`oRm5*snj)(}YVQa$RO=97bQ41&v{<+YI@!
zQABaS#|r*ve_EENO59K<7KFD=Jt(Y8pY@9Rs2)Fix9N3XI7(V73mF8YvxXeiUw=)j
znpW|`annfBoF<|Onz%w|hZsSe9CbK~4B1mu3GaR`ICJu1V&q5pU4BN6vnSq>(o*o`
z9l&E<&@IOEusGYHEL3~fOBLsmJWY7@PS54bazI?moZf8aJ9Kq7i?4AkZYW{^a7F%}
zh@|V)bCYWH<6=y<A45HH1(-0?1;TvkupyZ-kQq>C?$-bzkJ=-EIrZb#fGtfO<NA7v
zy05^p><;T_6EHv9IVeF0C+JV{e*;owBF681pH~t3JyMYNu3KGlowBmgIq3<=+CXu&
zX^{A&rcwwQp|y9CZjG}ws7tDTp!)}JW`_TA?_RvVQFBG|PlG`RPkOSQr(mFH_5Bl-
zk5h>DWo8`kuEbMEmNsE~JO{Y$PP@oHA9H$?-(~s+H<EP7S+hzVh<wOE7NL^mV!uB8
z{)k+oGoB;#Vz9wH;hjhZi2$$TTS<0EkmFCcogn^nb)c8_(A*~YS}yrX%O{`X0#rus
zFrhTf;Wzc|in`Bej?G(YT=-|s1p11W$}Zq)pGgUTJpo8%hkMTf!Mo2<<Ru{Fts(Wv
zM5v&>ER-WjqK2azYew2oNKe?#&>jE)dDiBw8-TZu4VhPY2KqcnzNGmGD5$Vgl?NVv
z$VyZvD+hDkLYo5U$x7Y><K~AsjxkVlHJcdk0ZTc(!S9t%!EL}pG4sq7wMN+YDM06&
zh7CcZs@H3{SNpi0&~qA68r_+R4ti?hKO`jk+Cn<%hW#c&5~U~j3TC%EMC4rdn|E$&
zmM8%BJsbek{7Fc4B^&6;gwdAn>=Gso_NlouCUB`O(_keR%y~#DMzK1uitmxo*=4P^
z8T{bvIUY}Thrf|?$DG^egk@+30b(JjP9$yILbR>jGwURij{GhZJdCGuBpSWz6?60Z
zHh<mnv%ew=llc)w@kp6b8QU~}FgEOc8SThQ@Ih&m5FNG-PG@UYXG$_#<nw}3Q^9wp
zZvBS(ukZsi8VbJ@3>SSyClG@(^7_4+z39XkX_O+)uC@#jqi`lT-Md@IKmVef(M!=s
zL?+Y1OiIy2^rMF$89aoyDcQ!q9PmZ;j7^`08H>a&h~ZHsJHBQUqDgVceaAj@YB+t5
z?*PD{#cf4*&J4rdOv=B0rG2U7pobf~xe{9g<q(;1dE@Yc^;h*dh~8WTFqr<ZU6Rbc
z6Hg}I7v9Z=LH{_5;=ggSQaFigT6Wr+($I`76p`p2=82wQkc?%%`h5+!A$+oVIb?gK
zJDq$fBfZ#$D<E%ZexW+LUb+V4VkOaW1dXwH)}NZL);{}<3x4mp-&s7UCR1gEnhNfY
z2VJKI&!NhJ_C6sTsY~T@lz;~1s{{kl0TVsBLmEwEy*c4v(_tm$=I(JA+5E0WX6}2C
z_2#XB7E1io<#g}QDOZ1QSR0cU(5pV-GIAVXf29pJt1x&*iW&A}w3pkc8^+pI-m*D|
zg?9vfXImOwa2NA%2Z)0?@`>?y?NDToCbu^eoy!Wk^duH3p!BxzFQx29ua=_?f3p~J
zIt)Na1`qefB{!Z(B>AoZR_1+s?^8V}c@Rl^)1_kz{tF!^tjyI+<azSS^Os(lC(6C0
z9EY=g)Q)#Egm31HyF;#^O80|o!6RtLkW30uHutT;R+}(g&iYrOD@;WwCadow31-rS
zBz(Uw*ecse^EG+EDJMft#P9pY-yYA-s&b6LIFty~W^K}3{vV)_EQXWU9G(>>oXyA%
zP8)?GM0o<V&zM9HPWT%#SEC1C%vg!x<JxRl4f*Zh+(7v_or#iE)=VaZA2}q81mK6@
z#pOThFct7^V-!RRnvk_P%jsTvrfOq?fm*<bF=d~|@_9Gi!OMYPofRg{Lk!H1-=<<U
zQ73Oge^u3-rRtj<v>9$BRo+am>Br_BAF5wiq_{hjvq!L^{ici9s%n#x714I=#X)(F
za6{%{3KEtQ9YvD+#!wx@K_*e`YCK|DhAX+DFx63OZzF|n_$t)~-&`1byg6S5Avk2=
zCtn9E9g`K)?3g@)cg7X(XQNUv!k~GnCoe0D5S`>S5kFIav$k1-Pv!JlEo8^I=;Yg3
z41piWXz<g%U@S*jmCiM}@@6`xCkoUZZ>R56_tFRzk>viEy_Dm9P1OCni|-_2V{~MZ
zmM=6bEo~be(MZd>^}F&lZxw?LTlZU)qiAAuIk9T+s<;DCqMj-bFnIR|0R_jBq&(Pu
zoll6%d$&NY;AjSD{o5K;3(_3s-2>=5Q*57w>$lUeb_(5WwlD*ff#f8Ujc494rSpTN
zpz?W)A6s5Su5&hl-RH^K9m%QaaF%AACDi4l0`hd?i3IqOi{-3z6I#^g272%MoG{b*
zzj&o-&N{;p)uuH#hM25rhOvM(1XYA%Xcs={>Wg{IMw8TMSI(m)v61L^DTX#@Tu@Kv
zGMe*yxd6Dy^l(+fe96?YZLrjccx$2l?+Mgj6d&RzO-K;^?%?3fzSOM8F$dE$gBjCF
zzoY^<1u(pn9HYJF9ad7omen#fVcJ@%pLsdWy4)sGDue~rZ{T`W@&s%z{ob}~y_SUO
zO|n{{vQB<D7cEgY31AetI~=bbW137HIce-AXUPWemZ)_3sS&O+sMR4jan=@cq$Qd9
zP2jEyg)tbS?<B_BN#!38u2o;vZE`pvQ<ex%4R+fK+w%#HM1w^5U+s(Ft|q|~HS5|B
zRB|aMq#moiw7$(52WSs=x)!@_YaH+<p?hv^8hIw}a-j;yJ={|gf7b%`<r|{YLybPa
zgO5YPZnW4gy~);)-_z>2U7-;N^NhPm9sqO=WE(#%S4`LhPOXAMOoNGT5`Q^oizNS&
zTAf#wG7UDPvcip;Iwjm{N(F7h0aks9XiE{>nefB1HYzKZ)>v%$7BMTB-E?}s&JOR6
z*SeLjT|&}+(3V;<*+dbP6k$-2D*uAT4JJlhoz*S~S;TB9o+<7*;9nX#0i%w<FEDa#
z%Eisv3o2gvzCUJ|9>_*KAhn^Ms98w6epE(Wm{d5W)Wz>#2U5nGIV7k!QPa{ARV7!J
z%j!_pLVC=Vd5Y_1PxPgy6#R|xVU+uFH)TlsFKXZ;(@l;!5Wiu-)_j#a?U5wI*=eF}
zlea6~8x<HC)l_^|0e=&2b);G8iGYaU^}Lv0S>k@+jTM@`E~7(nqmyr1&wVpPcm>|*
zlES@%11D6<2$nqbe!rFSu5_O>u^4k7PPpW{e?{j|uhk<UriVKXk2Ve>QRgZZIsB8V
zRy6|AmD3X$xjB3~Kst)=o76Lk;1C$C=eoCA)qqjx{4jGcDE!0Eq~CW*Iykt`ZI}mJ
z+LM3WnjWYm8?|ia)t>H-@jVkCC8<~%N(0BtifEZp)%m%!ziMLOML3(%!8?958{5bi
zjaexZHq}IKdfP_%{oSMjssxywODMm9zf}AXHxS4r5K5gcE$m~;b7|M!kafXq2^Urf
zh7C%`7-6yow-Rnv46x+Sjd4sA8NDkQFR+(iSCX&H5fJ|-vQ($C>U4XX2@zO~K$_!c
z+z&0qG(+fLPcJnz#JnpOqk*fS@-+B>Uky_EI&kE!@>r=iyT)eHf4pPPQ=^+n{|cAd
zBByEo(lQnk97>qD8czkLpuhkTsj|!fZle#5n|%qy5b7H{52-o7Mgn~FwNWzKa?*16
za@@!T%hF;qlMWrNNT5tN^(#5lM$CY;m3*zanp6h9oY+~(+t=8-PgM~dQ}Xt-<uh6v
zoE-b&gYNmNf2eI5!<{@UloSn!59rS@t$*?9B%>P=<71Iw3-$2xPQ%BC6i{25lLU<g
zOMv9#9PAif;EYvU`y(#$HEk0o%{Gy}C-v?)1zzC_)FxUHQPfKpO6%TyES#9T&rO(@
zdE;cniA6p&Tl76|l>xQ#&MwlcBi9H5r~l^iYMNU2(`pXjs&U4e2HFTk0nQu>a#;Fe
zKgZPKRT-Ct@;O*IvLD$9x7<n^J&IY*oo$Gu4lRKzd<h_HNTI#4L-b6gRrbT>d0DPH
z{+-s7%=0roP1B6zoWJomly}{`nCPFjL^&J`2U|%r4UW~{K(fH*n71#N`jYgLgIEOu
zEX}@J;_oXndhk}sIlsHmYXuma;|)(U_JM5I0J&${L8|iHIo^Iss|e5093yuM%J@>%
z9j(DRrVJF|Li+-A1ND<3ZrZmX1iIRVvdZciQpu$k7<1TpTkc(Z#F`Fm%>r<$X|LOl
zU@%3^_+?W`5ouTh<n&VYSg1Uam#|_qOEFbw28m?^S;B*pe|ZUGDOf?O*8q!JaF>^|
z1$ptj9~Nf6pw{&Cg~V=G>R`Ub!O00iCl;HwT0ilM+DJU`*7)1-k16_lfcgkE`C*B%
z7D|>J14;`bCdus4vJ9jLHEs6h*&Zx%U^J^2HgEYJ4&$vI)6jd;h-Fr3g3>J51Y)z%
z5r5!60zS9>)-8rYUW~b3=u^EIILEv!V=p4TXAB}qL@qSTJdqY&aX-K%qPg>p!-Gg{
z=MgaHoM@zMwUyaKqA}oCMaAYxFvm$>QV#caIlgWDRu_~s+pg;HO`R<;UM{S}MxO}L
zW@Rk?&1Xh9b78iCEO_re1qJ%qu6uy3t%?muw}V%5fZIT}e(oU6?c^!r?izn!@{LPE
zcrOI(wd`dm9luhlt*qkGFrs%fS(<f9umtmdFVhg*vO|eQrkD9vH^CHjA3vUU%m)=e
zOl9nfOUXTW0xKtd3l?pBZp*==8ZPKOU~`9b8L81C&*ltO4?nIsrbYY~Gi_S5dKRhn
zO((Z_HohmeDB|hB8tUtmkyyQoak3KJmz|1TuuwXhPuic8Pfbt2mvNaF)m&9G`HI1=
zO0gh4O(*;FlsoM_8Y95{)nb6P(n-He!dABNCbTi1PS;#&O{rpS>=eOIh<cMnh1~YG
zG6>+6f>TY%kVmniC`Kq73F}Qx@HewoU0vIekqv%2FW5F%;&_-fE2@A;ILr3yru0=P
z3@o^}+%;{Nh}ew;Ri<afVJPTFdn~=eA+o14Je{gC+i}b~GbLJ%V!0$K;LX5MP5h1l
ze@ASnj?AV(aO_B0qVT|0?189)mL&2;cMFO+&{P5sFZH+;<CpPqzzC8@Rl0Uv!o$(i
zABB%m@jl#sZu@d)cc})j;&nRWAAvlGA<@zo>JX=}^Ss;&6G4y`|7@2W{pUP7avKf#
z&1Mg+^RCf@G8#hc1ClN_!6&`2$nU+~R9}>w3|}+&&e`IGA3ZU<g|pU|K`_dF>{u~)
zbdl@J?)D`6K9a4x79#sfqahT-B7E9eYM=IgE0P4<4>XcF0wf)mj{9I&Mup!Ww}b0=
zZC@nTmiMSg>ubZM9Rc$3goCQJ;8f)|zRyrhrJUz4so1CanllTW(4dFkJ!f#Jq;8&k
z{IWSM9oBL}b<GTE)%q<NFWvYtVHuSTg1Ik=$0X%@ARs3qS{1R2$Iv2Ewe0<Dtr|m|
z<n`bswHH*p)zMKlT@`3z@0d^!>XvQOxQ51<Fv#PHMg{x%gD@`2A*y}4TzPteJ-5_=
zig`M+68$%|RS3Z=;%Rdv)w+b4EoLnj0}9|#JPGC*yT>FtppAbg2x49dl>?Fqa^<e7
zI$1|Ms*y75=C&5w%5|7rw~7eA0O!Lr@cKG~5-7RB4-r3|N#>DQQu^uy#|oC2x-Y_8
zcxZ?c%yADd#t)}QS_M+ZLhp}eVP5HB6kKXPX&5RUze;>AUnz;r=Y&xtTxIOW`SQ()
zA*TF5lD7lNHKF6N(B{u+%82r`ktp=l`vMxL<hs#qhrNl>6Qsu}7~f)%E_&7ciWhtF
z+31oLE_?wK9+ao%8qp&s^}V_t0C{KmC_8OlZIg|$L=3k|e}Yf$1%YkqtI>WrQ2rI<
zJlzuep~sjDyznzxKp{1Bw6q#FmoW~+ZfoHD2`2=geXgQX={WNZ<gAtX+on}(i7z>A
z%GRL=_pOMV&7@kpZ2h3MnXhL0^)0ufmHKU;rLQrT{F+V|H}Py#D_mCeXPkhy<5Rd*
z0hJ0m*b#5OAoC7IKhT2s^{bb-cH?B6Az{8zvM**uWXr$wICOD_c@dfgpoSIS<|c$f
z5&DfJSr;`BU5{DCp$air2K`e$(hpR=kd-eSaZ1N8xeaJd+DSBtJMH(0u@O_77%Qew
z6HH#9SUw?g`&Z*N9=RPY#=)ggd)r&8ctnV+nsWhk1l<F<-LIXtp9=lpp6Ng|qva-Y
zh2od0y~DecSjO|GC$RmQbKXeP&n`dlDC0hble`_pK$l?W;f^RnBK~kwXNxP#EFVW^
zjZ<@gn5OTWN*RR;cQa4zcYMuNL<px0ft3A{Sag_Gh`6GRIDFA(4}XKs;RRnF`aIyB
zD5|C&*e~#~&&1B&aR})vP&$I_emr#dNBreuagn%Do}cnlb>vVbzOvfV=l#}xnGNzn
zops($QAi%0G|_C%KE#!UPJ`YXA{Ac6twN)KOqE1dpek}BFonnwz#ZL>-JgST{O(9L
z%;SLdT=y%qPUUM1lIWc*&yeA<FdVZH*zhHLJ-M-2iZ1Gz700O$oOXtpTW*!gF1QF5
z)HVS&;l;OSOm8Ponr%C4_Pjgo%7qgpW2^L2G|~yGN<o&KN!>S?CFqoYt$L#-(|C@)
z%;;8B8^Iqd`t7odSPa%%Jojy<jPBV>Uw|DK3_zlFh<j*HDuY;h0Kx_s@nW@#I)BpA
zDA{1v4(KOAq1Zq>+@aPNIqMA9$DYgcWj#Rs=qsV7r1|QdHB@W28sK++(hVUH8GdyP
zElv{d+@7teiT<>@@-^D<BD&Duc&>ZG9StBrS~%@Zjyo<c73n^lsG!FD;Ai~kn<jBs
z@?xh6ibS-5dCs2(+%6l;=3;Nhg>~J@sBJ9nYtjIa2S+KQFkpl(83*)ib%yQVm`#7i
zu9X2%rsqXr=mtOv8eo3?reZ3%MO3h|xOjF%=nT+fI58IE0D6qx5aJ+kUM7L+EI1qv
z|G<POBR(g{*ys<4YX^jyuE&t$ah{D6=Z!AEdohQq0hx%tbZB;qJdTFqA72M>UY9TM
zUIWo|Sq(8YP=4osFgvt=V^WRzU~9G{&Gs?x=HLD*`vf?J4*)h22Ndr5AKuC&CX~Q3
zW2LQ&5|R-qo<NVe3@Aly1Aar9{h+@ynf`?Q=tJo{`Z?9dwtxJ7B1RnYUko2pfSPiO
zqVzX#3JA53DZ2m9H<Ymf)DnG+59E)W?7udl@3&M@#~(NyooB#?G`vx$1N}FjhJ_VI
zI{;W8Ds-<TM$^0gy{mux(gq{s_%a;IT<L$K>-q1U`)f&cq_Ftdnhm754mAJIr?@bd
z(2*r9nzBgwUnG}5H!=>IOYvt=8w+X((|>m8ucc3-LPv1&rM4VEk}<o-tRnw6!sp#8
zbi*Kqe{+C<RH20Ce|7;t(*Y_?*rl4k!mpqMJuFi+{?{TVv7yVJbJ8~bH>D<<4z`B<
zk3Bin0$CIN8M6J)o<ZZRf&3?cUKEk86yfx5gwfw$>U+e>Zo&o@y&SIn>fek0YZ_&f
z!|J?}7ZmxfF7p4tbIjnO`vgx9WC?W2{_o+yg_(yEgXa0^7fYc)tg2uCP^6&aSw{@z
z4e@Fjzxa~@L=ckt?*smh1Nt#xu=e}E2PqpBW>N9)3CocIB?4v4YZm;KQ3&1O6`3Oc
zKf}`<pp#^gXN`C1Z$#6dGZ#MuikUO}2qhtzAu@ep%=)_)P#gN+-vs{BbAYY+We=rq
z+85*;>;Ap!|C}<oFj`6CZ<9TaAB@PLv0wjZ0olmFKIF&R6Mn$a{*&JM-(RtYhUHkp
z_U4)xBQ5|*Q~&*G#vj@AuBwjGz|nfA4qF3_a{Fh9`l6vDK&4);^XGqZBp|+g`_D$0
z`9Y73-(};{f0nx#DxUBc+{;V`3V4srUm||_?}hGAtJD7RbwV;=I?9dI@cie;e{I{5
zWRbrXB8w0(9oZgmUn2ao!aoOC(rl0VohCFtUa??>`2QO-rNq8iRK_e)h{IpM`0sz9
zXFp7_9T8Sl#s6Nnq*)|%pCaiS-<u#R<A*pv(QE(SfJ0!XtO-*5{{U~9-C|W?{WDy^
zbs7Xc8kjSS&`^p$KA6D>N&U|`+L8dh3y8d4eE8=uWaGjts{ZwD3gFkLFBD_`y+Yu=
zKooKey(%fALoaGk-jKi2K>u~5e#eW8OhNP5rXO*ap8faF|M*MKANIZ-*z(s#ihohm
z|MhF&*aOKlz>kq;Js4i*{e9y8`^pUi{kWe+o1zlfn1AuL{@BNNvCuKYQwv7x#?*60
zT=4zxQN<?#wwr)twx;pVu`~nH)6xD}2?sTFl2PMmfBDY@ufPd8*8X!$`i^Cxr<ura
z-ur*XOiu)HLH>`G=)piwVHow=@qa%3<A9lg5MQoHn)yHhSU&qT{{MHc=Og=S&eed+
zmJ7_ox2IPCCzsVvsb!@P0RD6m6bAw9qAPonk3f3T9q<J3<MtQ;v&M;9`g_V{0e4_k
z5$aWV6&}SnAU6{8B{a6PM}(8=j7phdhN6rVIpw$#U4Ex2Ep<ZMnfWs;l0!t?^hxg$
z(4usmj|Iq!F{P9<hDMC70YAQN)FmJ4v9qM`%I5l73^Qyn_IH)KT1tJces^BFA}9VW
z&#>PQ#9>!4Np0cuy@oYipIr$tc={!?ojU(MOHVX1ORO;zsTHSpkgsi@-bjdXP(|v`
zaJTUi|G3XFUbg1Q<Y_3?Znh|r=i>yFtvLr`JuXuT1On}(yqFv-u*_%x_SH25-MU{~
zt!tc_H04>IP-(wDkbsT5O?%wt`AkWsOSp7*Tp_cnp0?fx(00@3yM@yA04)y1%BcKR
zKyFwCyv=#<-;M&c)?Z&<Z9W0^#~Ff2Xll(EkimtUPe=3#C?y<=zoFNco*Ue#C?tJJ
z|AM>#-Wzz>`St+_)-dtiis-2)Bp9jB=;;jq>JOy8Km|09fk!}B`}r3o_HLfd(48at
z!t6^2J0)mT54j4o*d3)$r77kQ5Nn5PH@SFTp;TTPye-s>H|1#YIWyP@#tInfbngYy
zxPv#r&TQL6Uw=#)cc@RXBH72YMQb&CV3}`5>fwHX=7obh!F3DK3yeYs)Vc$@7kp{!
z&Iz;f3GJ>w_95c%-g)177)nppVE>KS(x(S{bJI|9a1fqO<;*>HEVs>Fohqy>5$^$h
zaGyo{Bdd8wf~MQJcf2EG{w+_^dw?Hl6Ge?^c%FB)@0m`K|E0YjlS_J!rJj<Fkzc-D
zZF=MoKtZ~7iL18qy=VxuI4}sxOWQBMh+q#4`32ZqtdFi|03JE=Ztw4>^OMhw2S*i^
zmIDF(1ab67(dlK~UgM0Ft&(2>YWWiYL{5GYf@yHc_o~_tUKk7o3f)_>u^=!8Z|%;Y
zvJLQmX}Sj=CG)by`5i))a&!fP&{(6YPw|fcvuQw?-rc3ta1?Xui1f>0ce84N>fwt)
zB)k>|nrCJMOFi8c?UJBDfQS7k_WQe7x4x%hK1#h-dO)UgtyIBJFc4vrz&4s#p?e4z
zQF2=9fqXm;IJx4suMCd>QHt0hP5|H}Ei-n2InezAfv0euLH8U2oN$VWLzFUmC}bn!
zw%k{Y)={>gxG+i52f?rmcSz7#o)L_WA?ZHcWj+uI%Eg(OD+lo$M8n^tMWEI!VHq$!
z@M46qkhT#F7Dq@40EDG}yMY%Mf<P6hL$gIyX}7hKC=-uFGqRj(9j@dPTvYuJ6~FII
zC+CLdh&zV>ZsfQAQ?%0fm)Xnq`I1+Ywx>@A%Ijmf8FC-raX+;dICh2s(ULG-X1hms
z(vvU>v`$=OecHZ2HeIC^q7=^4-H4crcZWxRCJ+OTo&>~1=+k@AkV*pt&kCHd6tK}M
zj3owP+4{I;hRnFSHRpnFe7XouiSJ}Oja}p3ug&o1$%g~KgYF*1u&;m46S?Q?J9q1p
zHRH?69_MP%!Z*|gF%HXAW<eF;lD2RabE=9Hc}8Hm_jo?O$Vzf7W%EX&9xSU+WU3|?
zcpx={HJ1h^kKeVe0dg3AY!#_5U<{<jpz4W0)3l^ObYnkL4KMBzTmz-R@cNZiuRal*
zXg9oyzfAmbga`MQ?9ml)F%3L$Fnp#fx9Qa3IS<jN6tt?;9h7z^#S%q|+@)inC4@to
z1?&JA47*>F<rqb)cXTlT92fZQx2K}mWfC#uh*GV^WVhKh1a12t7Xuh0sPr>Oui5=R
ze&+1ox>4rJ8W92{+bRjHnz68wnZ-@0$B`d4Cu7UglUN5%a%a$=S1umyjWCz5NinDv
zb~{fon&*erc!%q(2df}l8lMVQ5u~-6pr};ii7RdWb!1h!52<gRcZL&4uYpL7lPS9y
zpvjpXuvI7$AiDP8dN5r;MZx2b$Z3H<SGqrXBK{h%WjiL4>_L9&6t0FumeFD(Ury%{
zlb!1eF6kh=1s@m77Vlh_76wY~V`oc$G{I(irjt+Ys8OPr$eV9p2H#*K*WgQ9I49lZ
zR=e*eo^2(Td2$KV#B#j6JkU8bqm<wyDViuBl$(hd_LftymBr-}U1MnEOC{(Fya4TO
z`zCo(N@~&9^0=NU!bZlKF+J0QbfXgla^Ip>b{Y9ws7;Y=`5|>IqZYU-g336+Q@Uvp
z(#+lVFL}FMyUQi}Q%moxG__hjuX*EWNO=|IUH<fB3SX_ORB{)G;*1<uByE7vsHcw)
zpw~AM94lZC0(gLYj@?$Wp&hwZ^E(IgH?M!YmeUil|0Z?#uGkCY$E{!@%mS?GdBeQy
zj~kryjEFcP<t<kL7Oq}y1wH1{pN#s`JTh-3F2Wiz9*s2664dJRhUW051J+xzUNCuf
z|1(}_EC`>pGW*J)v-0k?VSu`6i4^D%I2M@3*K@#)fQhSV=mQe!6r;k>d-P{U)|Fz(
z?q2q~Dzz6kjju&0R7|gGlab~j4s#&AU--e|O@M6f_lGO72Pfd(W8QA!Ewh9y&(Hp`
zwy!#(Tyk8al8!}G3&nCF^~H5_jahg!ANKp(q&q~0b64*bj)uoa3WaYGVfi13>BMhQ
zpBpOCgstwmMfg-Z;nv%95$?E+b(+1RL%$b>pncHRiurm000BDz<^{;+#RKpr>jea)
zzS^dcs7M%BAtOun_@UZ&_IcOINsi1r{R8zZ3R$p<88k(t!nt;D%L!jOfqb}Y^a)^p
zmsuv~EZAX-)nd1$r-D#b22t9<)5_rreIGs7$@CBpXsErbkaf{%-+a=C<x)uGx-Kx&
z41LZ8>Tj2BE*m5+xGgKtK}{A8H&J*;)wS?*a-mgPl8wKClamZ4#ckFhf&a&TIeQKY
z+=|-fI;N)lTsP<%5I3^%a7ry$MdV(`nLbjzIHA>-AMtOIP0_1~O;Xj#zY(I>?=Af-
z7?BH~o#+A-H^`=w_?HEWIJ&xeNUW>TKaThJ0-Qp|cLx-|E~2uF9&HLFXk~N`TCh{r
zh|oVTuvP7jg?X^Yf#nnu7D~|X0o_53JhS}{e`oO$e-JQI8E46tt3vxU0ckV+OWc)o
za7|j38Plc4jh{I^^6Hbdv}41ZBapG#7|U;n8zWKaX+QR5lE!qDA+lKX_ml}lXhf5}
zN7}8G(<pp{PJ(mdFk156K<suCpkpPxrD;7dkjOT%IJiE$O5}mMLK`LJyT&!N)lUgP
z6orYpOvej4@l^B4Nd?W5bxhf$@~5~a2=iMX<}d{8`{)hUfxBW@j@$T~fmAtV{eE8&
z9=A8|-VS-J?s!|mlmicu45MwF2pu3?w8x*HUaOM%koNOIn1%YAJE4I>$x#kBSy=}t
zcR}4`<s?9b*Y#8({b%#E4wh*K)@MKc1_Q9z{cKWMfoT>};2Rvn>g2Hx$_|~<m!p}(
zVIz*-9I*)Dp@jOQRt12Kh;YlU6>o&t5=1H;`#loC0e!`nI)EB}I5ZS4CJJG69-TA+
z6-VS@ycD_lMK?51y$_f1opLUw?3N*rJ&sc>3fC#&q(?jFo8F&hg*-NsZ&p^&GgO^2
z<4cBSzE=<`=UTM&2B|;oD;c84oJbpPCBxFF@Tk6|sfQs|mg=7~<_uK!sW^tHgpPTk
zOh@tC_E_K^vuR2Msfm^VyX+@CiUJnSVL|$YU*=u2d4*;BN&&B{6pCEB+*`L3z21}$
zmGUSjx}9RAgNv3~mJ1ctS+te&H#ol2qR&_BGHz(bvN)$jY`y1HppideRIq{Zv@IiQ
z(1V#xOK)`r?NxMT#UM&tX0CF-4|32LMIlph%TOTK5cS|Cx6RFOkt34XObU3m5-Wcd
zdj{}H(0F{!KB!$x<PCdwi|W1VJUm4qt*|b{M((Ehy!Bl4I>1v(Qcha}fS{Nl9w?76
z^V&k*=MJw-EI*Y-@-fpm6R$DHpn3HOMQGxb20{kuqp+I$>_li?==zzjj&CnV$*8S`
z*DLhnnvL{1EQQV(M4jy;(oqVq8bM@VBh`7#HdP8X(p*fkeGP>NZ=|)gQo%LDYx9R{
zY}0@ac6PAu`z7F|Q4}SdC9@f=741ws*D3uh-2HWT3dC!58I~(bKD!A-FVMlX##Q`X
z3pfOLnGl>gfbZ@lN^?1ZbGb5<EPu0Nvy^(1q`8eJzzbY8;#cj^L3h%W{K8~_*y6$e
zkz6fUSRo5`na}1|5byxsgn&ql6;x&%#fnUr>n61;!CCC2?V`R55L(9Xr1TwZIxXd3
z15;N((u_Iq@WjKl^j%%A%G*SQNuNceY;*2*KKFG3%Kav(x3Fd<<{QNvc=u$TazFAN
z4waP2QoE(e@3S|IQW4E0ZRL-NO&f-|nx0HXyj)d1>OMCS(c+AEi!Xw>M3iB3D$k$H
zUdP9c!FgWK${}zIpE(L=lH(x9me*h_%SV=h6<V^oW?NK&h`Nf=D{@RrmT*=qs)KqR
z!mhClNLeeOY~w7W44Ti>6Cp2;I(sUi1f?o*Mdza*d5u~CiROXxz!~7Cp2G2>fs2uN
zN{+o#65F0~%o(`nyrDfX`vYsW(^|l&u!d04f2uI3TRqQZBk>K0k=H(Q+fLM@JVAy)
zWlP8v`~_t+V!NB<z9la>dp|bDr%P10-*Wl<CbY@JOI1FLZ|e3JGE*rJL?+V3^&i@p
zqxUSPIdn9A$i+pXX5V=DAGXdqu8t-7+X?O#+!EZ~-Q6L$yF-BB?(XjH?vMb%-Q9w_
zL(mY&J14vM?%nsd|MB6R>C;_ZJv~!h^?c)R-YM|T?1r)8l=2ywlb5qwNm{;dkl|vP
zXmygN;;B`R$aBY-sei%@|9K}TG_dOs)%1O@ZBj3SdySyx`_mrxsb_J7L(p9dcA|*`
z+?y;16=qe0-$otve7-;&+WVPYtK~E_rfD~Tj`tbmlh)D?AicA)!n^zlP=|~lP30rB
z)SU<r_nIw?<(EDpV_hG&8D(09%Q~Mh)6vOS>Y6&n7a<y1v#Vc3iSRf=o*AKSp3$*t
zsD(~I;+RRHBmiGCj^%R!YOw2}-yp_AkFW*OQ)X$!HG)OV3Fe(+;JPcK=f96tzTDEZ
z7wxTJ-UEWoK#54G*Q<=(HRXg96XYnl*xu$7uX$4%H+HdbTiiS~%W;bK_}BTebVTE6
z5h@TdcrQ@vdV^bDX;QG<xPWLHa4H5-ZpT<N(lHn8_zL$kvYTmigolF!uT$+!xL~4k
zDi}NiBDb>1NOT<Kr#xDMxlS%in1ev89K!Zid2VpkQWS)SZ$a6&Ii=O9F)T%!sUOF*
zLQq_>Ac38Z*g9_E*a_F^%P2^hK8Q|CWC-6$mIk3~H$ue8K}8N`DR3w2FzJ2l?4sLV
z0s?XBk3!dROf!Fk*;a}7Kze>37Nj~(zq>deK(BJ8T$fiIM{UK4jd!frLm(%#wN(oL
zIFzshC6$3DTdPm7q3S#tnKlZlwS$S}F5pukly4ynECvymkHbu?8ChS_RdSK!SWC?$
z4C|qm%Uvo6V}Jiz%2~Ez80a|ql)29YTX8L%!so_=awEwlQWCl2fUEr6fLh;r1}q`M
zc3exDY~=oC&CXvYSuXZp$LP~ZicH?Gvr44Mp94(Sz0#_YZQVojfm=Ao;8COb!WN=p
z{OiUr<p?)Yd}UFN&j-M%yBVgEC=Kse9C3Ux1uI*=JsdUvPFsOnW<!V2v)Zso`}b{D
z0%58=qu5q3?RMI|yR^Pd2!6$)>L=x`@wZ%&l+OS@P@2c&=j~=^x`l+=bHxV9YQ1r2
z6G?1r?;yp;{fvD|B&%2<tHIB33Deu;e&xC>@Or&@7V!3c?7Q?<8ZjEZ$623_DFsY8
z%V41+!*)VS10xch77fMpm^?&j@xieBZYO!z-yjn8BrNM;Zukc#3sX)0bYM+N7Ddyv
zf-%Iy+rxxcfa{}*Fh9u<%S|+jfHa`$31TLfDo3GbWIH~0{DIkr(jXsl4G`fNJCZ)9
zwV_<V>?(0`<3qRGYAadw&lg{ZWr)o8^E%$W1*4X=kuX75^6H)EqI7x|+7J>|*s14`
z3s%&gP1=R~*oXlgxz(8gOFpN!nXD-@dbRAdc=?hfnMUUM%wBr)0TNI2@^e!{7OpJK
z<vuhU)7y48<}5Udl{I{<9*pi$y~Ov?$ExsH)DsUyj~}cV@7zPS`@bl2rMXOm^rg1+
z>V5mtcFG|uW*+kEgO1Mp98PxDkZ?K%B*~qfuHbA!bnLpJAOENEfQf8j?Fiwx5hE?u
z<7rWOzpGaD5aG-Ue_l#{OJa_H;dj0ifp^X?`83SmGVTaU1PJZn<LTOB2otm2U-~?J
z#fFahbss&H_bb26P3L~TUz4C7&&)iF@H!>GNn}IMne?3}{(vQenmMM%R9)QcP2&kS
zL&W&oP}3fbT<<mV)Kf9Bak&#9btP~SW9Xc?Sf!sNzpARpw16mkds?LiV{yDeWp*rz
zqh`YclYT})uzZLuo`v4*GnixOJiEtru=ODS_8?o!2q>+k(^(Iw0>JOw{Wu<f+7L4d
zP(T!jk&>X&)x>9W%yu+NaumZp76pyey~`z?d|RRsU&NO42v)>NP0Ya`pAi?X6>4*z
z&q9g#QdLo__1W(LqRS(v6z6q@J-mud?nji}Ubl7Ka2twL2V@>2cPD`U6-hd0J$Wym
zy<^AuHVwUfiT#^BElp4wDR76PEN=`mj(9W6yl1<d7E;uKAD$RWHmJ)f(lk-80OSko
z+w%4~#KG6J3?F%UVG1HkFwD-(yNAjPmk>Am;EYg;y5mUYw*S;~T#wbeGYclg5Vn<4
zdF|)PXbI=oa3dk?T!bZ8CJc1Eaw-8pwL~UAKN1rZkVGPlZc-aB*nA~3r-d8VW^8uH
z;0!5L<Q-niO$!jRlt(m+lOHbZbuaGoMLGdd-6@+__-~_2T$7BDWka`i<%&t5+i8q`
z-ZT9A5Ml3*FuIQuqEfrLRh!UD$0<_;@lN<<2zn(3+S#DX(SBzqLV2R)yU>8~!}}y2
zoyiF0p0Cb6d1Oj(p$AFJ+$L~lMJ%ZD5?NeMR%{Dr{T8(!VY`a&Nhl1#9WC$C0I(Ct
zv-UhbIIEF@U%mge1$~e)!elICd~2u`YgSskfrHc8g36{zTf1CiEdul7DCm>C9P4qc
z21_%U<EmATBQCabhSC^=$7R-`R}`Vwm5_Dv^!4elA5%lOXpMDQ!(xzdU(l2yBuIpi
ztYRF}QaE=snlweJO!<@(#oo>l5lkZrTO~OwP$AYIN78=@>tKlHtLbWP7Up4%v?1gB
zB>;Q2iG!*cbiP<q;j4W9@MaI`+VllIRr(;gqav2;Anj3B{g9SyEUZeQUf+lJyJcnv
zS!R$#vSc6tG1gD(0mz5NGOK{%xd|Y0C^7pX5nEBBXKMvx4IZ+bY(h;n7}?PSsYFgS
zSZgH9Yf^p_?vQq)(jXre?67DffC`lrYKk-<{PT1@+XmQer$$Ubr{%Ae`Zx-(YfimU
zp&#=?leJ3W>P1B1tU)~z6(3=+5(clqwkC-|%8exTZ`!HYt1ze(c1%iO-7*M<U`8z|
zF33#i;0NN93}ctRB%^hqZ`CJD+y3r51}Yhb570XRB=`>pqbL^wZO`z&SpM68IbE;i
zq`F6A@zEO^)pM`ZkTvR-@5+Rmoz6yR@L3i<YgLH(tj6OgY?|U6SvDnx%jIULZ^u4F
zB#KZ!FxM+*@tH8?3&nh_Ne{*#;Pv*5Yk1Ua5OH85D(!=Z{1%t=O*(?^Mnd4Hk~Aaz
z{=6@6a=RV)>U)3Dx&O6d70>M%GKuF$?G-9ja4%uP==$dXurWuHU^7j$fY}k4rohl7
z7?!co4-Kpcr1PT*1n~s<c`o^CMoH)vsJ8EaGb2BNdGJ4d2h@wTPFsCKPrWup#2f1e
z@twxGz5!z4hZ?6}YSu<18|Xd*(<3r?<A5P!puA?(!ZG_Tk_(6<LT75ppGs4CAn)~N
zI9s|PjAt`O+BFGxo)SCSM<J1{IhNHQ6q`dOGRFw{)WH~?g7yBe<(><wvtO$5&3Cb@
z&=O5Ct5ba-1%L3MBw2q%E-ikdE`Czh=CH7Cn6l(9MkheaGqna@=xRN47=vu)^<PfZ
zVXsk8iJ|9qMY00p)Iq6vxZGJyXs+wC!}0hcD`C5E{e*XSKyZLnuBq3J^rM(RZb`fZ
z_?4tTu4&(#w8W0sV4jO0-c%%k%IPtMz!R(?$=XQgoVZm&anz5m1dJEWFEbZ)`m4Ne
zD;WkFimO|c{3+_7?mO<~S&3l%SD$5SjFf0m@{@5db7_rU#bYQ3Ppg_K*MEQh>NS%7
zmd2cspt5^aO5vq~mEzXXe_cXyarJ>4fy2T)gd7kNh+uN_w6!5j{*p+@iN#kVZ|gjW
zb8{l@A=d~#uY8h;@Cfib=iY`B`rHW26?JnA`Xv^2KpqLu_ATZFRM)y>t%!-DiLn%6
zrft6+UJyzhO6o{`<rQ;kX!^7l5aLK>+Avz25sB`W)S*O|$C-(8sFcR+t-|OnfBIo>
z7GQdLJIYv#3cd3YyFe9?lC5V^2s~o4G^gFGx5eQ6rD0eOf?U_dmURLVZ1o!9lvf~B
zG6{A~*?Ra|(-IMEJ9F&2&dsr#bTL|=JLIr}UeGdTx>yNWnSJ!VC3>aVXF=kfeZSWD
z^>WznHOOIgkA1%aaxjFw^ZMu^`dvDfd7pe#mckP1Z21)S01XoXZcdxHl0J2B4GmEM
zZivrf1LiNJx)Ue^3G0*akq;=t`D~$Fw*%zybm#`Yp9WqS7Rc*MI}{1a*buOodTkmR
z4Z3rxzaJWaXWvUggMF|{h{jQxk?YpgxphIYw<)!UnT5c=ZjWXEISOT;xq3YjdIwb1
zqKwl)lyJD6H!o>np{a_G#dyLy+6RGi-Q>%z3_lf67pd;W1WNu)FtbwOc4A3OvMnpu
zZCxkwhMwLEI^l6F-RT1&4qt+g_J4Sxl<f8~NZ5X!ljA_0d>7<ytg!Y-=iPPMwMf7m
zOY^$jL}{<MT5Z{EH!WZJq9b3L#J^9$yONF+(gOiQ@&x2m;*})VWV-qLso&N6P1Dg4
zvTzcuzQ6l9>Dt9|G)9u&QdYjL<jIO{MH<}?kF-kYX}0UmH0vUn7Q}lK#~BbWTHOUp
zp}!LlldVb#cq>bY>Z+aY5UUtUDU8a~$1tLgA$yKG!zXS<<@moZ;=mWY+=E@8kd!#E
zw>K(7n`Vn<fGFq1>?c5pjr&;tDgaBvC|*ljDfZpZh~5=PkuN|MNV;%W1=pJ_oCz3$
zk!q80DmZ1xoLyGOCf01#G(aOFa~^wzz8gp{b6rK3kt&TGo*xG@Okmf*?P4lt$l6a8
zx#)Vfa|46Gt#g*!TidX^veis%IgfAwPV?MNR4fUIF1yIuZ?FSG3cYg*xajYUcH?}v
zz~3v<lhtLGJC+|r1vpLa_j!6x4kD7XId3n7tj`W4Mb2(-<0B;I;toW?em2OWfQiMU
zPlnN-LDi+p>e+J`m=zQ1qpW@GTdpC%?~AHx)1D`eHA*}nR*oD@P4nl^7KyTH8n=+R
zM09^&TIjs_DZGI0z`{pxWSbZhr(>&7l-#s$68cw2{a>+wM{0t%K}fyG!{wx##wqJ!
zJD^&vJdk8dY~mhY!;!=xpeB6Fof#9T@MT#o7h<K_MPIl~q7=EAWs);+-FO)NKuTf;
zKXoJwVHuUNhCuR2&_HP&59p!=cW=h6kJt-Na|ruK^ZAdka$s%8TO#CQX#8YByF}A?
zAC>&Try14OrGj8#Un_(GA!wVR8t(X8vWa=Zyjf{P`7)m?H0sLmM;C_N!=W1Zt0(g6
zMd!(obJ$+sED$?1sSClgk!`_6Y?afZ*1{Kgp+sd=Cj-&iQ6>=Y$cpUEsJKx*!G#hH
zciP(lV<=&Vl%O7|_rxE?h^*2S?cC_Oe?}BBa9R9bew;W+K48wxIm;bZ;dWzWH;fD+
zJd`?fdPEF^tTvGteUW{75nl-w8jVh-6posk_ZXpW_h%%i9ZqFxj{dMEKSjftFh^k6
z;b8$6oqxQmP}|MaB!(28Mala%jdUJG<;LHP2!ZGGd;w*Vt(7{J+|Y^$i<7HP`+BWg
zj1_!(4y(Czz!E&xHR}wZfWx8q!DfCIis3a=Q5gb%m-l9v#57iU2){mSVkTiBf4o@t
zGiQ5b{I8w|`7sH*xZCq)00=ihm|g2W^8;9O+ODJ)$yI5qY1m8U@yiELTxuHBs4jbh
zG7C^}zYc+-5$q3^L}fU$f)`ARoJNQkiYy||)<GM*`)Gx=3nOI-ZjJVIocUNI2VDh_
zU9NyuQdOC3beFw#4S^f{X8`Xbg77WOg1&%t+cFZJICiyd{u3=PO2Yjv0zcgOx}TWV
zplxp3kp%w@D*TW(3`Zd*5#)&C*{5S(>LIk=>5kl@ZA_VGrQv<{dSe0HOf1j|c3cHw
zSnE)dA{Cyj<gNJZHgSuf#v2>%85>iVeIR6TK#7V8p`cn2#5jh_Px{$l()UE&vE!80
zh5&YKk7#%*kr8h~2o~|JKjQ<+xK_XR>Vn0(VuMv2Up&Hm)@qEWu;5)Ic{)YvV^1Yn
zEho;5bW&omu^CK=!!r(neKZ|(u-H*E2S4cDp_~8-f&<!H;iXC_<J~Bx_TGJ`&tv_m
zmb7XW6*9zrl|%2vtwk`pu8r}unQiX$XX*4Sh!wG!6*~_9GDBgy!cPvdCYB>UpB;se
z1t@c?ewJ2Ic}hlU-0Hc^l&8F{<ze_vl&+vq`?Tp>wb?R4j*h6l9yITDsKiW&w$fMi
zT$pCroM~z&^W?;{yYI4h6$d<#4##PwZ6xNJF#4SrgM?|B9xs2tUSba=Gn&j0oDd$4
zsFOcL<?O^|JoJirMak?z!Nt=&Da=BZdVYfh-pn>(4F}V8Mdrc_M3FFBnN|u<^GJGZ
z1(PG=59}n-<EX49!;s4trqK=^c4z&*e*_Dd3aMO+$L6@fnYe0VbKE~+^_wXSo?dsm
z0w<QZHgp<dAa3h1&neoE%wR|W9X?O7cG6ze6Fd^?`qHfwkjo*NTN1sHYmT;M_xWeg
zG|<b!eM`!US2LuWH7Q>KuZY`JZBgEZY1d`Bkr&eH&A3zLmV2xVZmg!|9$743S=raG
zO2co5&B*Aq%uw300ZTnU_$kzd6Fy$mZjc4YleUDD?cTeSP+bg&QIK?e4(n<G6b#$Z
zW|lDfB!5#A33NLHhtOqkBmT#X8z?be-)Fl6LfZBS8;qP7TTndal_ail5I1j2IH;hD
zMptIujR+1ElCDg^fs5FMYX+w!x89=rhD1@evYsK-7$)+y<eKsBgptr*rxE&|YzkOs
zB)TAx5v~7RqxE6@dT+PGlKV8CXLS>g%tLOlwuI)fzEwA8^-#`Lt>P@<xulfkbr^6T
z6XUz_2zUxR*igZnYHDup6`%ZjK(6X&zw(O3j-#f5v~pRe$oR^7!a6?N-|M^F(FOzG
zCw+UDJ&J9DIKz9}<NbpA=A@0cTTJc;h#c#03@=@s2>0y2RDAypVC9Q%j}?3~Vm$pG
z8n8|2QIMP^ct#MO9?!s7OIQt^bi@*zGA4;3Rr5^<ZnH1qDRq1G0Mg@ETWs@Oz6xPS
z($j?G8w>OBQad~@1W83Z8L51}5Pabnueqs09-*&!v_=IhsQ{e8z^JkwM_eCc<5uah
z3m^Wx#F^oO5l3;n*Zspo+6LmR;?g;#ZL6{)oht*T>>M^ix48)XIrk~pA+0)_2$s$#
ze>$n2P%P8o>r3H|FgqX{pe+0(kvlQ_C(@SBH%!Zfm;;%m9`|5x4ZLyfxb<*U4=S9-
zL%t`DL=$wEVvDNN9{QM7sN#=w)!4|5p~V~oD=_SqF4lNaQA_;UMIGnGT$(bJ2o=qZ
zXLc!ulq{sM`=#r5Yd(}DjzY}2l+mJVFv+V8ePP}ws4KK!*@$b#lqa)F!TEuhVw2=p
zX61J#2*9IPt>n0p>>$F?fWj+F;5kLOR&`Oqwc)tYm9>97JT6j9j-)ahB~Vl#-X+2L
z=&y%{9>?^fKcGF#9<kH?Wx-mvm7Kw6uM&?-mX)hQl~EH5PaS5Q*PUvY)aqNTc)EFn
za2m!^_Q8fS{Aav92J2`qtblv1-4(t4_9RD31>rsA>}*9KGl#+jc<yUo>9FbTaMo~m
z*9CA6=8cun5v1?Q8g?JJxbz7+OR^kVtP?pMkBd_?0*u*IpHTNFK12t<sBpi2M~Eu@
z2AydM-&ON!B?4pz=%=}4&Ec=Nbe6@$IL*Tc&!dg*=zP988bo=H<-L)7S3+${d$d}x
zH3fk8x%?ztrCm|SNUxpNroVbBeC;w31Ib8Rp*6VFfHfoml~vjeSov4xqfBU0G|wct
z0b=&(+4tDwy<H|fmC#EJ#;5zy96~DWsPjSCy{{3wG}gIdDjLBC%y~MEF+3M-{_Dmp
zZ5Cy6J=A{QIfLqbi(cIt=zMTgdnvT~uh)yHoTP5%?fZkW2^hE0S{MwYp;){(V8e`<
zN*+Zml7`i_qh~hpI?v=p&Jt1kBU-&BD9ySw-}`LvzYml%uE~EHW?s>!LpN7A9`l~8
z*%olyvyDfq&ddTH^?jGDsE)%5$&YVZz;0y9=@8AznIR43Xfdorqksc%gEzDgXDSeg
ztr{z+{PRAj<gvTT2ICNN{-)RmOk_lvk3ini5Gxary_Kj;$A^+LsYEl;ch@@SkFEom
zF<M2b;w4fAHfT{2y*d4F!?QT#c9}%w4A3dhj3%eO1TnmRqKvAS*Oyoc+AUulhnwUj
zO3bwbXwK`Zx6qAA5XC)=_)Z7RTnWXCEoHznYKK82CQ|Xid7Z|-h$;U!9t((Kzb*cE
zA%{ULho-Nv;DI{c-Q+a%!C8%~^ki}_L>3GLgOgO_#Yw`;0%pIRR?a0BCo3?J^SjQq
zex1nhTe^IAm%~j*@4EE9B*A*<lpPpPl%tGC@gk$1X(D0@OiVEOrX*jYW6sj@s}P-E
zR}T=ylGo&T9Zt4mAHAO&dkmCm$HD&gLt|wa^~mCuQln|i``G~(y1Ig?-HP+T^uv1h
z>4{62hxS3tKH79fKfi&u@OW{xYOta7(zb3_^W88r?CvO&sTi$hg$5(`hUBKfYT9>o
z$~`_lGg&+P6zza*@mWf%b0!3~J!4@jl`o$8SBb?B6666twff#c;IbWgWEXdAU0kfq
z6>Jtq^)awvL%xXs_=2LM?p>@ci3<E&hCMFNfL_@26gskoqd2Xr6K^cvwW>$~`{O{7
z*ycEk<4>l9A&28r33QjHy%ZiR-@~ZOUgTe`l^Z}#-I_vwNW$m%uugVo#->=w=>jjo
z1yb*a5rN*NUxvz|it{^VrP(f8Ur1J98?KAJ$T^L5Zr=%bv!~B!wn}f!ibD82tNeD!
zP+9YWRUI*7iX8r~rPI7`9x?&bkd3!rUECnW*h|~o<Jc#<(N^{_JfJY|<u{$C{j1-N
zbe(%dq#w=&RM@_+%ne9Q+-+EV4PjcTU3b3ce@4F4qrfA~5%x(g)-nBwNk2+QNoFV7
z77@(1B*oph=r^%?dKUEyRKw6ao{E^;Lyw8y!elWOYbaq|sxq4Oo?oqM!Ih`-rr9>R
z<vbN}b-`u5LX~a5WEa4KPPNhA$3}13=mC*;ag#kv@nFr0GjX>OvQ)*yGR_d)_XNCz
zL;TrVaAn@ly~q~Tk+m`GB>)RTMf*6KIZ2??cqjL<pQ3{aBZ`=#m+`53><YT+7Lyv@
zBtUq?0JEWsRwwgW(C`W-9jFV~$evOyRM>{HI*<3(tmKK7$vo;K)4~X0kiugO6C!c=
z(FT+_5EMLY`U-5Pfi|aiU(6W?oS3o2p#t3j5$f^0rT0k`;gz1JwOLi?!SmDD+^H;^
z?=h90VaLVb;#p=Nn&B>Ik55(-`(;HUXW_5u%lBv!81CtPf9#cT+E1t(RaTw0WkH%w
zQzL7|$5B+_`~n_M<NX69xIyoAWrn=$M!BC*y)T?{BM*p!i(8jjH;t&2!h<<kd<jw9
z(V=ecmw`Bd^*%_J?3Rtcg?Q7*a!SiVysE3X|5lV3EH3MIUZ5pt(@%@xP~gthGzE9t
zOb_qan+n#VK18!Zwr8#KXeu>^2yP>d5}NcRM(PcIS@e6cJI(~Z<L`(XN%MXTC5C3h
z+C&a_BJY1r15qUlaU#`TTDHi{A+?N;y;}v$8Aiv%!NY|u1GCaS((7T_VyX4>7s4xD
zd@Wz+=s$Z{eBQP?dE@w?ValCU4lC^v%9s4k{ifBa-I_!$uGlhRFzyl;r_M>=^|Xr`
z<D#jdCm7h7grN98_eWpTlh?b>+c8-*)`hZWE1OAQ)H+07KMJy7M!3e2g=S#%B7{wM
zc3^2CZzY7yC+3GLVnA9YG3EaXqjEj;jlV4(Gy3!Pw2)ET`zca&zQ75H-wBP=dab{^
zfR%GP%kcWX&%B3+&XzRPrE|J7);pNAU~iAIJOXX?jw$=T(R~k42=e`{Q|#F?6e#oS
zYF37U?X?F&3irF?7@h07L+DbI9P`LBWX&t)<ZD59qPv^?ZPGOaRv~Z8EWGYYHu*-(
z*qf2`8|(!3HgQk*%KY8+i9Si25uYVdmf}eHG@6>W-nF-ZOwB0@Ew=l3OF*98lN3on
zAXC)HnTDCI&H^bq?sp)*q2r1QOFpAhgu+Flq}};h1f$0@q1_T`PcQ6!hhkCV+-;OO
zZ=ans1C)UfMXDcIyOsm^mU*@_=17{^x<OS(EvfQp6vd<sF}RhI#RvOXIU*OA>R*7=
zW+6kDi9fJ8(G6LT{LX_%?S7V9>WPN_Fd2z3?nvkGh9&QHD84~?Md**mH0MeT(5jXg
z$U%c+3&4f1a3-dAl<|~M=xayOA(HVbp2$!W?<)lO4=j?TQ#^taoibt<@M-8sk<xy_
zE&@PebKtk>L|IO=<gj%<&I*ZlNfJ47Yg1sQ`7~7i>R(*s$*#ZZIOJn!eqq27o0YEF
zc<eOIXmtoJ{VNubcqn0}hCgmroq}t=sHazf2|2p*Dm(`WHyCrk-Hf!&TncdyH@9Gg
zP;9bU2^oBkecw_iRO9w77{Rp3NhtHWv-miv;iVX-cBlWS2gO!^V!~mFU<tbUi1fgb
z{9zslJH?%XP_~*OADi_Vr`1WlSAyi|n(e)I@%nUtG~7T`Ts<F4emL*;>kbc+wP79`
zyRIiQC~v@>m;x;mY_fG;3}ZetI$P83Es=<!c$Ix$%gQFl$e#@_6AkR{cZxdHf8Ez7
zebVHoFntY&zaif?{+tn%or4|Uts;2lrY)3Pw-mooX4?MJLAi^7Hhal-=pO#QwuB|^
zgA~cTOJ7H`8_7CMN3OBz<cXa?3jQH?nq63Wak*LScS+4(MzIm{`)8q;NET_Kd#fbH
zt(B9UhcdE?_7j6ZSvrD^pI(YSvLy`Q^rel5A!Y7XkmpYytvgL6v(XV#iaEs#??s8)
z-l6GSl=JPIPx`OJU$5IZH!6Ef@+zEDFkq%9D{r{%Mq?grcwVykiCM*KORKN>JP`{(
zjDl2g&4`#(E>Knyz`#~(uF@Pl6oTrjlZ<Cm3B`!~X`WO2KJlX6NDfwKh-&=EnIS>m
z20&mZFYHvCMXN=MZl6cj2CO=3c8t@+)0aSzMuY|bR{$uaR{be|T?9xV|2q4;r!VKa
zPcR3BmTy@#wMM4~HcHq-id7NQ_y<lZi?l&*R#n>?3hwTiB+jm~+iDyhQ5{vHu}YdK
zJAd!pzVYU%DBOBivU!XV+%Y5vc$mHanB}hcRXw)d^sW;g_hpy#n`Why3LZ?OP}{|d
zd2Th=GPFbBr$`70RJ8ClZPo=2`>&Qasvm2q79Fr#N$R!wU4m6IGr`S5II&GA&vd5E
zyg)JYCfLBFcLU7}JEo0P*IbroQ!NO#xLt~52VY9HoFCle=8bsKcBF2U;A8PUxpK35
zCe<^7V0PnC6#fLtg<~<p_fsTI-1ZV%<Dw4-K0u#HnNX~Ax7V(@)f<l=VDJ~O6<x~H
zBv_kjA&%Mu8pZ}4K$>_bKmJGCD+?LzpF(|EqiRJ0JanJIvQ#>jS~c__!uY+$Z`=h;
z7Lf)d2;GB9^1om%`!Klmz^_k|T+ddv`n68$nHnNM@q<E&oz>j}&nr{p2uzOSOhbF2
z-$9vMXt-$hGeEdSIF^Mp0Gk3b@R&l9SwuK?)dm{DF$cC8O3Dq;W)|53QWRevEn)>e
zg*YVlhxdW9%4X{M?iwhqDgzRFNIH>+Uq_3ECnA5`Ui2_0ESKquh8%u9@-yj};gE>3
zL_)tQtB7AXJd7O1in5?r4$F!!yW0n)c8^W`fOU_2$t->tj(*zqyTSBJMA&nmx58-X
z3h3i)HXO%p!cS4)Dx9l$?2gE&9`QFo6NE^~#yh_c8H$=afJ@wjf!UP4*mG_^-vrB(
zBzptrdXK(&b<s=}i=!9c%HFaJ()3nKOZo+jAey(H+e;ehJ$Z5Ub_Hox#G5Zy<xW}6
z5v@sNyr9fRTQUu@w5IPBE((<2dWuh_5X9P<s@oibRC*xDqe44et>Fnqz7#)O?`-FN
zJ~n}Sl%5nQn=t<UG6Y73Ku^6Qq-GARw9oe}E1dbyIX~xIiCsvks%Q2dy=7q1dK(f0
zLM0G)nI2}7;_ZuBLlDUK3-V7qlo&L<55wxkqBF1gur7Pl&grTa;E-Bu6*|kK-LY&8
ztJ$ER=W-Tvat38|7t~6BTdp_&MTVs?;b+<M9jfdC=-s1UtabSJyMSs8O>LrVzqT`z
zBF&68m&h7nlljlpwTk1U*)%{cn;J_l3j2gGD^Vnzg2=TTBZQH!eZ@^P=ezZS0?#?C
z94GxCiJa<2mr5T;1N9_nYz$e(E$SdiJ~dq0+xk9SqULq1=FzLRW)Y=@H1^D;<w<sF
z5Lr({1_d?YrcKwh#<X{C5i!Doe{LWq3`gAZ{i=ZrZdberU(JsErh>GILro<A%r78c
z+y{7nMDrLQ%K_eQ4uJ%+1fKXkiM2KFFo^IvOf7_FC1VQhjLN-uByOP4z{D@>yt&q!
zj)KX}Asv5mr#FH~ia(_<q!F}FXVgm4hp|5AhA|Sptg1m78(2-OlH<k&>SF+hihO3M
z6JepxK9K%BaY4Lk*o_C%ehEDs(8idWoSe)0pRU3_^pTT`5{>hUVt#-0EXbYz-W?T(
zN&`|L?_7edwZa;P2+rm-{L&a^m{lGy=x0#8cmnw5$7VRqo%JU(=^iHxmC*K>JVQFa
zE&AehqSe8xG*ny{7k`P|LR)%Sg%&|DgAFTKzg_~O?0zQ9t3(smwfS<#>f=`1ph)S|
z%a1E9hRV1(7APB%j9r(enO7#I?nX%N3c~4LoL2EF_uN#CVI~hUZKm+2xFiThE%M{c
zWLa?JRj^wdNrn)ZO6VSwB(ZpL`+>`TUJ8@JLqCNVFduudx;``^&Hqfa3#?JMC#G^F
zz7=ioaJm*49ei)L%j}?W>@5+cVHz8a`7-#f3fSWg+GuRN6F;j1a$2(GJ0~gN-^7?h
z*a^oA^11ymo%g0$37ZGaf(Az>9tnM|eF>z`{n)M97#r>+W(si}5VzRR9kj8>`eP<I
z!;FLQ<`6l4{n+f3`H|j9i^Li3m5biFJ5I&c#Frdtu@-n=5Ys}t2?Y+XBHi-#u-Gzd
z&6os@h)sQv+$Jj7PG2d5F)A{N8qVFO81b?%X_ed(0dPKG2XtvmecT|$HvXaJe1rZG
zZ`DRG5Wc{FU=>qlBW25IPZpL~9u+!my?b|YgdHV(8Rys608+f^5rvl*Dcj(&bT>oT
zAWy5|H$i@ST=hKP?0_g-=ZLbhh@HWx41If%R6$~9N=01OyF~T7H1RxZ-jeoc!L6B<
z%df2q6sa#Q^dISn;o~(m2qseu8I&*f1CqQ#MD<7G4t`F&AzqQT{FN0dC(dX_vCQSa
z&Kb?s2_96)eoOn74gSz;c66g&lhKaOECRND&VW`kYFC|TupL^3`|dgjoYC%>Ae6ti
zt-Z#`<m=~-UVFU7u^4f>zK2frPk1PT2;KeAH@p>E!F~rt>7_ym-y&A}6c2#{XlXOP
zA4~ZKbq?*F+cS*Vdjc+R30ExiDvY4|r6g}W-+I3x%_3S`zlaH#7#QkhpvxvV0%J2)
zVJ8-N9T7ky?;<K}wq5Yh9qTe68+OA_y!PaJ2GXpWvc<R<-VTj^B_GjOVU?&|lO0M5
z=Ami0ezVT-w)cZdRNe6biJvD?S!57=9;}1jcKDO_aiq79a@jaq-{#K~8BS*!6fH;%
z?ym|7#1i5Pq?O2p-W+<JBk~rG@)T|<HU_!VA-a1f;QP96tY{+p&VgMf7J(eYoxNJs
z>u{@2*d-jzObj`XpKIH*3xcRy>jTe7V%LOi8o=T9TjDYmE~oB6{4o~G6&cnCoXhW`
z-}HKr2qb^5+9I!ZUV9Hu6(@z9_JtV-PVFlf_b0-XTVbJkk!1n8Idw}Z86To}=<_Q^
z)b^~7r;B0%!jGyPJ5f_)zfO;C-jXPKO2dtT>Yh9gnU5^qxQy4O<&82Yzfy)PA#hR;
zpoQ#rlf>;tTZn6&{YDT?VB|$Hh2@i^8gvWuOg>FVG6a#1!<DDeB{0}4+Cs#MK}8pp
z2jkH2L?Ko^6h|%6T$!aHb=i+GhxI4%YmZGCP&gL}<o&8BRa4h<-;d(CtIoQ(QGge+
zh!f^D??WC&d~=px6%z@GY$HUSi~p=P`2Iy~zd&G$Wzqii{&6=Jd=$dDKeeB~%LM$(
z;fm1o9F;!WX8DYR&dRV$Xg;^D<oHLMfxJw+FH*;IHI5j}*VwIwcat`PX6q>?LN}KO
zR=;e!gWM8|(OHie7NWo!lb%8}LwVWJ$Yj?G>tVsbM<pCv`}}!hwaz&LyaY()GgFy)
zTT<=2TkYYko9|r4dG4Ae6G!OBgB2<qw;MCkCXN7T?yGDc1pX&m8MpcPeDd`>vM16f
z!HQG<H}Vx?7M|oQR~Fh12IGD<KQQC7;MW61`!KPe1=!ABCFm`%10N4Ev8@}hwC`Hh
z>^uiqa)kU%s8$@ft@a!mV6A8B8_>(19z#~rR$Z4g40w@tCn?l2W87g~jlKtC>0xJo
zCg@5QH?lcL1?SG=^8L>6R=785dOON4vKZc9KFu^YV4C7!Hyby_FL<QOLLSX@8~#R^
zEV!i$tXvO5^<K>0FvS!ux=AazfFY%(0Ab{)R@EvPN|GC~k+Ni3`(1jYP|%_t|5WLS
zESEi<43yNBmR!&^jNzI~i8TP%Fq|qYuAnL%DNrt*$)!bH&>d4UjicOcN`2}p@(HSJ
zb@XOS%`wVncp^O^a~TCa?x6Y9;KvnT?uJ{tV}3_!h{cENSXvB&b~N5#mQs@BI_iB>
zg|)2-|B8}b;XDFqStqXc$wqXEoEYiWf9cV3^=4EGO}vG}LgBulHb_Q!G%EQ}h$|@e
zljX>B_MByK`eswTc<6za9BWfNghw@Pi@cN?snZN~kbb@DJV=SsHT#!_8L70r)Ol9U
zYH+zXVLm3>haKDXd2K@y4%g6>TTbm6!%I;UqCL@c(ls_X>2k5CR_5P6rzgik)@^gL
zF2=}_a@M1YkYXOONK%9<2MjI<K2lTyaNmw{p+?O)%7q*8RU$`$ke{dgLkDsr_l35X
zjq(#v-S@_mvU1mm=vGwIisPOJR{It!nJX}OQVdS+by@_unB-k5&X-Nn=-apc45kgj
zjbBj}c4d+Xpe8AXRPoZ{aDC?(<gS@AYCRSzj~IwUqg8rf->SY-co&+R$ox{0Pj_rF
zQ=ZIbgYc}SX&p$RDJ_nnBEweLZF5T@bGl5<UtZ^r(z|j4mB&SS!>NMAsyY1$+|l#G
z^IAyu^2|@S9n&%(iZtmxM4g^B&WiXxAQ+c2#dze|;3(!;z1?>~Pc)gr^RpoQ%+QVX
z<GVBPJTCILg!nY9az(Zf&r8xugmW_z?`^v};JL7DWbyPx2$WhT2wc*pLFFXFrur*b
zHFbW(CZbNYY@U}|{niDA;^*)8WErYii;k|8OPZv!&<zfKZK^c8{fKEQSs4;Kpopgq
zQA5SL>n9*T*%KZcBvI;~c2bbUKb5IkjropeJ8+^@U@T^h<(-mbQj-GZV_>yDe8k?6
zeBR;FP~vT2^T2WR)l4B<16;^V#?gGHkOXY_cUv2A$(|Q~(Mr`Ya^`Z?q#r&hKErNM
zu679@@AiS?UvqD0{$19n$szvxdPYyZFDAlMR>V<5eD7xq-d-67idsiIHLBrb-+i+<
zXC9%0<{4=2w-|bm%E|IluEq;?3wW$HNamBA^t6q3`@nHPmxvB&>Jui?Cw`9jVOOq=
zN49s>q%Eg|>a7Vy!2N}+3r%NLlCoWKVBp50k38?sUAE%CVgUrQ{p7BmF0PmRk@(-c
za3yM{_@5|l@XNdhcYehYu0^!es3D;}cTh;kZ|?dD0H9iEv=YhH_rn91??&5`;e|qy
z+-773xmkY{)+u{<Ngf=32;C!njnTV|*di|w^X*Ea^#)`GoxiBeRp=8Wyd`kxM+KK6
zWNY#Qe~?Uuk05c^F4G;5)xmfDwk>d6oD6!OV#d-lk)#*YX<Z1}9${AIm6J4z@~slz
zI0BH3j?|n)oYZ@4L(8<gZ?QAwozERhxSrLfUEJ;)a}gtK^M2a~Xm;Y0vkvnyTXQ16
z(hKkX_;b7*arpuH8sew2+WtvD;-a&S9QpZkfW)Nb;Y*ynLe|9a`7MgM%sD7>#2Dy^
z+_#(Ux?#!a-S)!*Ht8kS(k;o^gqS_!%oHl_RlLlQgx)Z9`9LnfFjNR{j$odocX^4_
z&c8x(k*r!Ye?%=XT|Bg_n+*xKoFJ7vEm^f>F`mNckQ>iBo(2-9M_=bAWijdzZJTtz
zRd_vrd&>8SaeyVC&P6VSjXPQvr-j>5FYwEV^C;2UZW`C`Uq8rnxmC&bgmXl+AY;KD
zlj`gN-Se{WC1|&?FX!_<N&znmz)gX5F_iR|5Ni$f$OTE<8f_{epp&e0S<@2Rj92Eq
zR|f2VWF@6*c3-2l6Rr&XA>Z|`1#Wy^lg3{l#8;drA9L;4B>;Elr>NtKyX@V)^!@GG
z*)58L=F(Bv(`t>~HG>3sZsxnmoUCkjILo=Q8a?_d#Ga67;E253?Xpx56?|#6{~>=d
zk%U6XumL}7B8%TOKKt3UD0+L0rtI~06bqb6u`xY&Co-m8^tBeKlfWMOh8Dl+aax^r
z4;J2ExU{S+m#foJ3hnqRN)%3=JDd~=6Y>U2HhVLTe>f&{_;({pfSI-j+L+1H;7?M7
zFRV83IMjhZ_Gn!To<Ca9M~3vrL=q4Xp5lt;-Yi$K<_uA*GReH-67J|-${I>wAlChw
zwSuHr45T;&+OCCb4!Zga`hPZoQ}d$f2jd}PkFRthJd|mN{#C6-A`f$jgMuw#{LBxf
zg{w-$tD<J>^CochdP)WH#p||eq+62EJ#VeuecwUp7!izM9^I{WC`UHvD|_%D47w*a
zVdtP73G6QBPF-9ILduj+J*AzaI1g7mt9k_81o8woDXRl4zggzov6KNmEfWD+Yqvzs
zFFq+IBJl$mt7G-b4WFsQPiN|(a`_^zovXaa={}}CVn*ZSV8q2k{*Y)&t5I$rp}6P+
zsl)une*{y#d+WIe9vDgJyw?HMtdvg9LpU@#7*70q{gA&@f(EHOorGlc3{N=<yhomx
z<G{95!%T-^7_;wT56Si%rDWuql<18W3`A@S_N!cV8hDT&qA8@3d^#<qMDs3G<JJj1
z_@cKX&s=LVX9XW;-nE$xnjBQ;9-pz^8^j8ipG@zOgMt@lfI?VLpupM#-8otQohga-
z0(<M%Rz8bX`g;l$s(qK5$ZQh)(6jd@>YT9T&v`bk0|XQpEn<}ToHe)2bc-!Bk_CZI
zWh+rdFMa)TG4XIYUN<8vO;0=2%VIu&PG-})$bhgEUx$9{Gxe^UWd8QB5W$wtW8ZCf
z{XCL;rXvSb|1lv-N(0Peggw^Ai&3G49D!bFFqu%H(4jA$GaT9V{5I}Y%&M&MLsI)k
z2-0hBD<WO`1>GOs%Wb-w#Z-QlL#!?UfZCbET2-@f30E9v;xgZ39avkYVv3b8dKuUj
zVo~gd89lbCJIHYh{FzH{25fjI{c#~Q1mz|@`R-X+;`tC_VO=+>!X<~t7~WT5?h}Cs
zyqQ@eEtd4A<J=G<Vxux!-m7zz{li{Twn|uXc2`limAejRHr3mH$Lc8)^SE!msUA4&
zq7C!vr>!mrgovAFoNW!6WKlZ89L+d;-Ng@aq8J%U$aH?-L-MG%5|3f5hEqTT%L@^_
zlHB+l!KDH7NrncfWwI5*lIt<;{=!i49kX4Xk@#NzfthRGD>?ps#6p);$3y4kxo@t@
zE#Ysb4Fb5;#J)9wUTLa_DJ`4dlA=!Af;!7SbJ~qZh%Z3x&k05h>R>|j$xEeu;LF?l
z*N1Ppd$)564+OI>%%cUFOJ6o02RS;3iND>9(#R>MP=?-`-Vj{6K|?O4EvWGyqbdwJ
z%h8k{1N$3O9m=HQ&kUuq3ILz6tOViWf!x4=6o$aV9VxX-Pu80~RTm4yHk1@zAAnf8
zN}X0xDE3KFHX0__?`XjneWFigBp-AN`O=F$%OB5ZiY?AxkDbE<cY$M+xSqB!YO=V|
z9Hgl{T`JQ{@PSJ-FqgthG`;DTR~rE|o)#%aJ&E`a9CUD`M77R^>w6n<@r-4#1fzs(
zviwp3l<)Yjy1{3NsHj2s6@v%V;mc@G)Pum}yrBLXevR|jxQZcApUEI9GQq?A9?zty
z)V|t&>B;c#pC!pxXuHTF6-!o5LH?W0@-<ZOb+R82#<}@^+Tii0RRF((<v*(r?JrHW
zBvkqNZ)ynu($+6pKYRasU2AILZ#K{}D#Sf!@p?R0^2}eX5d(eVD*(KG4yxtL&jzw%
zmUNsDOF%5IKhPD<BbblU?$hd@9tQRK{tbb#LG?H1JJtN{cJd8C(D!LKsZIR-yDxr%
z#YxYG7*+_u-RvZa2me3E<-1Ux^62-Tc8@=tBm-Rk1Wd61a8y3!p_V1x2r}p+f<R_}
zjD}K{^Pe^hqJY+oWu=w>1Nec)^T+l6H)o}WNRev4Mo5?7Z(7ZtrV&uf4uRqRHI%wR
z{r_D4=L=_0U<(pNjNw4ziFHZ;3qceZ(QOy=73jI>*Ps8k`#+gFTXDj`u;}Ok<@bM?
z?Y};}5(7T$b3I4?mcA7Ax267`fJ|V_j+t-iKq#&{0+^)ezdrdF@h8w9=oYkbwpG%<
z`l=3LrSs>nKW_0!g9I3}VdpQlr+;782kW0<{cji#QQZqLeeEx4DFgp(v?WBa`49YO
ztNR0RwZCQsuEk$h{9)>Jzl!9Ee}uum08M_Ir{Ki5zx?4Zx8Ox7)4YHIOIE&ssWSZs
zjO*{Y>U-;-Ek%V4y=N8kpWglr?7_bWwY>aQ@a^CK`rFHcyGuZwrgB`UL0R^O__Id-
zX-|V(&@}UF$KisQL-@Eb^R$2eDlke;pleAtR~m;IrvDmKH!+0tAKcZyU_jb}psDgx
zm;2957;vDLb^W`I`}d$hePb&}{ihRsnno?V2Ebh1hJnCrgia_giT>yEf6WsU7#S4=
zVA<j%?Y>&A|Au(|8Cx9>%-NV8b@6)Ym~!C%a-t8Q$+)FX{$J)4HlPCfMIT&-e$NBO
z*7eWX{L}Ge8VI0en!U<R;jK@ic?UZG<>H^hKqez<$z#j@b7c#(f5Si4P;fUj$l`F)
z#VUVYcBT(uW%Ad=brTW;mlFo)>;7k0-P^%}6D@z-LIfGO`yh9+Rpb2AWQs6082(nE
zrA7m~Izlj6>%V*qG#h6U5LU9Dk$}xSBYp4wF-d>_@>_9(>ZF@DQ;i}2<5OV0oP#=k
z9(VCBkS*%}7=;qT1s~{|EB?+2aGwkjT=-A-|K$tN7@(e?j4reRvuVJ9QpW$^vkI`-
zl9gW>H~|HdKOVTHD3o`g^p}m$!M=jl=2w5myMO#I616Pj-#q~b-L9bBuK$fJ{I`dn
z(St4g3qK_S%o#qh^3-3W{a2@gyL%+^%mCw{fjkjydO`00*bP9c!Qk6y0*!mVSl#}A
zm;W&=5m>0EbbF|z`qKaS6cAemR^Kq)F%e6ChUEWg{mc{OOAj(1!D%7x1;E&P|J$&A
ze~h-2T<Z>65ZmE`6Mye7AS2m=?sm4pjZuF)!&<Q5%HLQ#Brsr9gh8qmO#iv@cl(Iv
z4gTS?bsK_A7J9w2H2=?;;9N&3djh`S?E<>nxsYhx|8vFvnHqyrn51CvfbQO;8<2Ju
zAR5w&VSdd3?E1OCpK@tVF0)oaDH{Oh=?<hdG3m_IQuQ@ao*RT`!cniLbVvK<t67w_
z`nKaMqY7U}(h42!GLR{Gwb<IF+Ait8?g)CkdD(dA3T0S5gweR*T;qRlj>rJBoNowK
zEN4!T4E_`m3rc2HpO5OZS)YmMtST@Y5Aj!1+Mp+Rg!j`v{-{keFu50}48PZ<CP<F;
z+jIN=M&38OAbKL=398*f?}wS(1ipFAds=W?n!Z!$JO(HwhV?YpSpxUH_ztiTJiK)K
zWgP;RTFqt};%=pC+@Z`{?i`nOhaoEUCU&)5-E|w<O>x*EytXW<YZ2YrwaAb}IUDXK
z41DqlN8_!=Gfckl-zJp{!DpGMTAfn0mIh_1w;j~1jFa)*U4=9?67cnfy0>#W*}u)D
zxfTzy8Qe!?HUKNeR!SD3d|)B_C66-3CxqydkN=US@($pI9E1w|+6GLy)YHwX882%)
z?Q*;;$MOA-l*dja1EI}vF=DH6orLROv4Da)i<0^k7p2%eVxO5hkh%k4=7EDYKq&<e
zq{T!zq#9pm!_@^aTwa!ZU(`ue2}58wDRw+p##+)~oUnK5I#n<c^q)r6zp(nbex+p@
z*aFEHbPlqs(7YNCHfyKCVEsI&G88a{-%2C^0Xc;capxV|7e7&v*dS$02}Hz!<;mYM
z*S1GD%X9EW{7z&a!oGJydKCNNeUu+6>chGMnXirBE1SR+_5}5GW-j2VY$$Fa3CdHN
z<bOT`YHYm3ZSg7tN@wgMCqOl;0K(%a*5|hHkoS;l&FGU&Q9HMp&exAdKakc98YL<_
znrdx7ZehzjTt)>PumnB<2Ra4|XZxUsC*x)XpPNL0B%M2WgFyyHez`Ntb+?t`*CO3+
zWMKHCX-?h~NG8gpXRd>O?=>Cuxtcwpm~1m{iAycYw*01r@~RVb{@$JH08~2)r1ksK
z5Y%6CvJ`?K12is+Ry3S}x(4Ys0~pa(Vp>fuF@)Fd2afzJcl0o&D$=liY*3>`#!^HO
zKvv`LXS}E>TjsS5?f2%hq?)feHe$=cj*2+TQ$F{N%^pi8VmpY|nkB!=6}+Xxb#9V)
zq-k)W)nm7Le^=XkM{35{S<<5TU=%RKmpu6+dkJz$fi!_g1oo3udOx-o!i*15K@XaB
zYM+Xwyt~QrGJ<S?bW<AMPD1nn@~8*E8&k!Qv1%v8m8)^VKmoxkdc^4sH&ab)MfSVn
z(p>%)_a$A|<BRWrvGYc#V|9i(MXn<pj-}BBEQ>i8%OJ1X)~T~+9co%bWYV24j{tf`
z4%2DyTI@FQrZ?I+o`($1vRRO3>XkMF!`CxralMBxOHy(ayMyC!K#EcJ>KFYB9qGZ>
z$#}`(wg=|X7b)Oz_I(QXAT=DXEeC~#J^lA@EN~eNxODdDi}OA}a=$v#aKIM?fgkBp
zO)Jo<mAnz20AHSJqP1h6&fcvLkC8ptyOp>)jU`Ng7H<zI?~a0X=*7$?1`;8TvR3C2
z7}&3QB<l7px0*LyiF45t0dVws)Mh(*EAqEiGMfnxu~GV6G2z>j)E|6VTc4$$YFs<V
zpctzn8M>C^{VYx5tD=X1$n*XvBITOT_g4PNQt?aoDNo02t=0z+xb86Wx3P<E;0);H
zHLb;U)4TNweOpah{l{w;hKf!dRzUnZ*qtyU><oa`+38MBD;PJH&JT^T%^Cuf(gzFE
z#tu$!?rdAXmtd%{92c}Af4*%PeG&9xdVPz!Dj$U}s^u_Vk|l<J){K$)V3lPOqC@Vn
zzu<1yGHfH96Ujp`ulMt+7$Tx8^&qrGmVv<s9AX{2>%O_x62|;6dN^v6xLtW2Al^lB
z9t;T--vE6OPHq!IY{Vo0%35`Zo7DOaSbAZsE-x3Ejm9Ive+yVIHY_G>$5~Uh?XAgm
zF_D0gq8CA6skEpPs(6roOdd+Y_d|g_P!ujXQ}P<*03*+5MjHx}=*R7~$@Zccfz0lC
z%41qlkBI`RxAm7f%B379t)BN8eBB#TK9yeT$&+<PjW}mfU}x4Q?XDK1sVufw5c>O8
zJ#BWdH3K>M=1KRFVG%gf`>Fbb;clFb0g3b!<UdU44TnIi!uq*U-@)vG-%gsg89$<A
zNAwDv0I`fn$u#kBd8_kL@+@<Ze)<9XDdpx&6`GMAlR0Ay<7M`{(Sxz0S`AQjxfJVG
zZTPy+>BA2f?J>@DF)&?16|gMDyh&qeMA-PU%&SZ#svl{}{Uv>PKPYoNbsijRM)YPr
zOIxzC3m%Y<wK7QxKV1}2+v^bPbb7s<>R-w;mb5gkiW47;gt}r!bH3~y=yONCljq#V
zi2Th1WQGo5;uGo^Qdz>h)1E=}EWc51JrKVxO^PZ!FhM<u`Lq>-i&jus^M(3T=a1~m
z+p=-v(Ny489F+UKet6S{lSYh0lB(l4@*x5>1V4@#as$&*HG`BJU082;B?uhtGk}d2
z-E2~i4X;<*w<AW$-TQ58y*AwPY4Fn!fvX9F0PrU4z{OCl-0V9V$ce_|y4EPY(j^uY
z;xuClIY(I@AfudEH5rCB9y9XzBqbSU9LF%At<0Vqe&_`(Nxl6SbRNEAZ-=el=5n?8
zdL3H0BUEnb!nudPfyluFr-F@gPCg*3d|a|=!8{!KL|bNMh0T<L;t>9Ztiu!i<VV`9
zkYL$(p>gjndqUxPmeZu+G|lm~Rw8!GDoG5dFc<lRlZl&Lx;8omej?A87W&BD3Q09z
zrqM|#lbN}hyzwoZES$AGsoxf(s~5PYzR@xlCJ30;Iv7&uNBCg<`;LhtJ~-pUcT>uM
zhPhAi;IE=FDa$s9+=m=_7^BkCe|yS#GS;=7;!{0pBs0INSRjPz*M@I{3=Np=@GX8J
zq2ux<$&3#-`|olegXmpi@bw0#x-=ksqs01aV*<&I9fzQt3n|~UOEIct>EsP^%*LUF
ze}VnBQzv0cs1mUp)WH<1Hog0{AegEc@e{zY$_yuP?t#QI5WaXZ{E5XL4@!*dWb9B!
zEtj)|ZhsdQ&fByj9?_hHKd(d(@%0z5E=CHO6D&4>y9Q~bWM$c1@e@)))`&#cn7(bo
z$g2MpKy5tviv44q@->MWZeoNu>JV{VsgLqfC!K>{`o}_K`L-t*16M<FG7Rm;j%>mK
zdI}l^?|IXl3-K4a^PeB6qwK-8vmjH)Obe2^9Mim4#`H>(LZ=<jo2$>=8VgnP&3Sim
zM)`_11g<Y$SOQVoj)FSIj=+ayD?2M1_DxMae9x`*X)KJVF^6nWv*XX&{vTUs9Z*HL
zwteaD?v(B>>F$v3?%F6FA}!sp>68YM5b5sjZUO0#5YTV-bI$QO&wIW<VfM_rXU)AP
z*1E3WUG=5jEFr*|#KK<=T4N}k8rwAXN`S%8P$+VllSoVqsqs=P_oI&EYpPGLdPJBW
zH=rcBr0iGIAFT1992o6i{$kfI>Z02x8WF8el}wf}8MBqpexv!d6!VOs#bi}Lbyb2r
zv;q;RQ<mn@Z7bwe-VG8qpZ{u~<MS=^W-x_df!Npa0w6Amu@etiXKNu6>&j5S;M;|F
z3gx~@&U>OA;TEdk8ni?9{Ysc{S&WYuQh`UY_&h&5OtNT(;d>&Ni$#9OmFV*W6j=Wt
zJ-i=E3HB#6$1F7#Y{QFnL>uWd8L9OdcT9O@&&-k_fPd;Z_Rgj!u>yYzzWS#ejmT~c
zxBMQ5Z+DEs7rKUXf9w7#`Kn7>o7G8}?~eUV*A6vdx3$Vy1{=Z{lw=|^oTwv+wMJq0
zZN1T@cDvVvPqR9mSN>PA!@;QuSzMmC!}}s3%c~%BMSHytDdaf9%0lVF&x_oI=pXE*
z-&g4*^VnSNqwPD>QotL2wLoQA9afq7T}MIhc*R+i+5~fz=-Gd5_O>ybRJC08sNvNL
zMk48&<_iOBd<d^dDA$rgD-=|~Dui!$5l(%@q$}4|$iq*<o!ngc$-UByb%?A?FzWI#
z6M}_I57(AGWFcrRERRyNBH0tP?{F-UQmtR4$3qI)HJtb^H6l|i6Nu*8{zK=b1*tUE
za}sJ?$ZJNg?Mfw-U5nPle7eLQA5<cbWHm00<I&*;^B<IA<-?_HKVJ=X%bTt86%9Ts
z__PWIQXt!E53u+Je}3GtPB$KSX#yDL=__@^K)s&AH~nVQxjzAZr^;+*Wid^YauO3V
zU@1MZVl5z{4NHI`*+zArvzs_98zC(!kCtdUnVy?tGB$tZB93j3uS0J~@}zmC^7d`k
zqp1hTb{x^xW=}FJMESZb03eeIelIJ!uNZ_ZNs?Gj0a|dBXwg#dFag1P6(xMI^<`!l
zrwV)_ofDMvTAhMbR8$vNm&}AEyOrltUzx#b32GK;6<UXJF-~6^9|g0xVI&88*$Q2a
zkI%V?Y@<RO;8u+vk9*mpYEzMp?|hS*Rs2@r^ox0NgI!D%CJuHxO*sfVxFtmsrhB@l
zdGy1xtle+STj|I)_`|U6-Ejx6IDd7|2b4=HG9q*GR7jB%1xZ!mB$;CExt}v}D_c%e
z(vcdfntYvF6&k)y@lje>Od-Y1P4}|zUYg!~Z7x5>EIPjJW<piiB+$hYkPmHRKvk^F
zzm;M4O$3Rv=9{%GVy>LE-)&S?VdSzs!#w*MkwMkWuQ`ngytSyP{ZO1LS)qM#xMSyK
zb}QBra{Dk3yc>PC8*7;;Z91S?rY~XTpuGELlH~rAVGc04YLKU}QG4D1kK!<meYC`6
z!sNOlrW7dyqw(E*(omN1=A6*P1kOHh^3+pXiD4OxB%(rY%5GaOUv|6+=|H|8Gv^Z9
ziLg!1gFa5U4G0n&G}+p`jn7@)<9pD^w(K-UgaYCB?3K7|0`Zl_f)&qzKaMz_I6gB-
zGGu~reG}v%OiO_o0#^Il>{wZ$@P3K*^24Fp0=B9N>P&HJxYc{|FzIRd&nU1~y@ra8
z`!8BKPFOuZy@5f;wfU;#05xz=d+@MFFGds)mu@-XsD&*;-*?k@;9;>c{anaRVr#Z+
z#<NX60yWN_-30U8U1TyNtYX4nc1jUTMG+CBSlLl=DQov0=6ihV?L02GI*FKMt^NQ9
zI`bm~3}43Md8aAI6Y!z>@k39>lE_hUvb2o03Poqi60RpK1sI@uc<A8A*R)-Ykm7I)
zs7w8H-ot8?b3FZ&s2&4PpOg9$w=8RAyO%|Q@e~JzLZGTWwJYdQih}wi3Nwe}N7jeu
z4>N#`CFcsw;9-V7Na-Ty+@D@_4#VuiM+muk&f&-Qa;Av^%ZyhXIRFIn?5BnUxeyRj
z*td)}QJfA%GqX>EVLbqgTGDEX_uiNdx0PA!)w;qmZN_Eg<wL>1$@iDK3U>G>Nn*7d
zo|?1NiY!_`0Cq6dSm8{!nNUc6lYC3MW1%^_{>aCu2e$bWyc>&zb}Kv+cH%<q6oR8K
zw%5%veK|9Ax{<Uc8<V1sPHm}g-TQc;;pnAR>T3xNyS$6LM}YCHEJ$bATe7gGHE-$+
z$*x#2HKLDhuWHj)7~++%OrGM`NT_3uJ64f=XiPs0mGdf0CRDC#I7}NlsmZC#Jbv^j
z7^cr`FfPGe%@N?+Ev`4FyxMf@MN4kPLOzdr;?tnbkWZoIOH7D^m&k=(F2+~I6D?L4
zTDo@|N>`IC5RGU2jFRH>+CkF!_|3Eo!Lk6WC2mc-h36+6wrr!CBsPy~&-zx>CB+{u
zyL7hoFiEZT97h|QzE`#Bm1H=JfL+mv>mG2g0R_cfoSZ2?NhqgXKn(ZtP_6jRZls6D
z&aQr0+pI4Qldh<wJO>RL>~=sHE{vtaJcYogi~j(1l3w4Fc)8Vp&14Pf6P8mY%j8@_
z-W{SawGUm3iZg*(A2Mfm1{^|s>^Fjm9lmkS-CqwS>;X~Po23L#s1E{GFq#~E++4K&
zYNC?pp37U)i#>;y@Y=g_t9&Dj0V-%Zb?rcC;<oW9Nr3T*SPwqr6Tw*A_|)3if93*I
zR@)Zi%O8@g`B<rW&!PPjx_D;Fw>I4p>e0WtfOSM*HxRGOaN8BIy;BFGLIwP3x--@D
z<N8Q>k$#v<kkY-rmBh+Y=WvoFmjS8UDJU+h(MZe^yS{w*#)ej|0H1P%%UwPds<E1!
zG`O1CgoZ(m@h;0G@L_)qth>x-876NaZZGz<OndA9CdJ=3SAeZC7qu!ezg%&dmhjG|
zN4cBkC-6c9$ii@S%~kAlRA&Gr_$x+J+E4|Xp8#<frpJ=E#@aXzFh4pZG*@2I?pf}e
zSPXKxTssz0$yQ|=Eia5LPw``^8wq>cXk}Qvq5V=<OAs<=HSh`ymu#I`Z^9$Za+uRt
z`dMf<m$ibe@K&cp*KS~Z<<oZ!b(g`q!6_V+U*=tNXFiOYB%Z(6hKkQGan2=98c*a#
zRP$d^+^S-9@uYip!%#_`CMmILxG3M^s^wBS_I&}#%FWy4PF^hx<L=2zP8FICL0p0|
zfz51*2rkL5R@#t$Y^2C^|KP;7rd3qxC9#c!Zp?N0dPLVPLfck}{pUql6DO%Ud7Rp=
zfynyI1!bHo?DyU8U#5Pf9G<4|A_)f?4cXmP+5x8ajTT3W&4u5nd6UQC6J@-^kTtt%
zWJD2IxjH;RhJe*VJm<Z2eHsVyo7+(Da|0ehc_q(@<RH0`@cqm;*)Esj*aU<m5pu(Q
zua{GA!2zfD3kkIDFD&~vf7E!H6DzO_()x_v=*=z^%>M2(n%#VdWytlq2vX!s=aX4n
z6Meu)&m!=V-BzSAm*q&t%O!8z+c!9z_HCF3aX%JKd+Ca;XUrvkvx?Qi9}zEGi#ZS1
zruWV|KBT7PXR?6ar8*l7%bvz^R^!w)t(!fG;&KF>i*ngX7->8vw91$EbdLeAlyJtE
z{;Upa76GylTjdLX7d)MpNoa}%pQ(6<-*MS+CL^m5Ha3ntnA%R}f%P2fPHo9ITXh`~
z^IFfojt)>}b{eXSL$vuR(ejiTAJgD@P)tgAKB)OEFVJQn#!^tM4CKZ7Ty!INbkv{t
zI~Ngiygw1IznNw!gxTPzz8S#U@-O9&1~lFVJmXL_s0uT%%O-3Kkt+CK3HUbLsUJvr
zun@gVjb7_HfnOD+#)Zby&C#As(^?J;!5O77-ZlE##*UkODQffaaYxad=%o^T_`V){
zx9$?RCIH9zg4jn>@iQf<x`~Fz%88tkp!$+nd`KP@g$P(OTqps99L~exS?JcBVsqgT
zz+@BL7C=hk*w=HiwmTTo45X3<hVuzntRI{d5zU-yLPyq#tH={|YEsF)>C=uGDiV&1
zFqQ6DVXdIPTc4NBqC%eYy@Fr=9#P*B4jxRwV!AGyD{iinF-dv%Zs_Paeaoik=(v8A
z%|n>rr@#0Gc(UL~8$7gdXMOi09%h4xKSZj+e}lBa;Bq>{eu7ln;b8a?j=LmTx9n`z
zragrvmd;N;x+w$A?)8GBN$`YAl0QoV`p(kKV8cusMRo16K9|XNJXTPtRvm+8AF$Z@
z<Nn<LK$ulT$nBvnI#^}g4+r7oCXIi_RXXx&XtFz!KoUW8U=CG~r;!-bbY9wtivM-)
zrmQ@d8qblV%s6(d*q0}cL#*M@gG5*w-u*UWvJ?+|oS7dG<$i`4wZ;y2Za$Np{Dqmd
z@y*qB-f#c>+F@#Jo1Dex2z7~4MkNQ#9Qhenw&b7hLuyFSr|qrTU{2k;2!D$p8HLbE
zy3oi~7|_^bD~B4P04Xxsa^?vRXDpdrDZHgc8l!A8ezA$~ipR(2m0953M$XCYP;n?C
zGy8io)+pwuPzSR*8}(|HF8TFZ+_klsA&6XTA&S;SX^<vBL)`EwV^4BT2qXlpeqWse
z<f1Pit;q1*eB4;1CCqC>eAL~AMjmdO0z`0VM5BY+PL|0l_@86ifDc67LO>jPHzF&3
zJ;2Gak_R!?c^CK1QU(2}G<_wAO2um)3GTnLQK~e0C(l9(r>w@NEU1b&D{bl-2n~gh
zW#m}1WZ-U0X8#s2p*WVLLu@QnvBMb<*EpYcH+MA;k0f><-Dt+!sQtGjU|qh*MDJ4v
z+prh)4PIJ=PG60GMPqRxhxkuNo4n~kqBp~we9iYhJPfgpG`#J(MEI$jI*$M~!;x{Z
zQ1oQi`knn59j%>rJWZ8My?)D0#p%pI{S0|#0Wi@OzJluRI>E@oL*^)S(ba7^Yt+}^
zNBi3geL|e-=km&lo8QPY2O<ekv7dwAUU2MYhC5)cD`+zf0*s{DgDDSGYS!ebV5c?X
zO?6VY%47v(#ik@CW(K`(pbA9I)mzDXASw>WJ8ypPyL%@oYp`A8s1+Vdvg+!c_LG6t
zHkbFucR7XC58xjn8t`z|v?orx52ITjtWc9vrpzDc9FdCZEZA2|Y=WVWzQWHcUUzJF
zKCgEEh>x9gBp%(a)8cDM{Q)g+x5L_&zINpL(Q-h#1u19CJJ+?=+?O~szq=^T*Sbx9
z2e&lbcB;*$5Hp8!I(>!6r@|=E%H7k2{9yEERn6HXDAi#1p$kD&eAddDTKGLs8o>D1
zUMCzAOPPTpH8+o5etC;^QpFaD0&!wGqDrCZ=WoM_hXB9;b(%6MC1PV=<a%Wql%Xk9
z9hRiN$*Jr4g3>k|BtG5X1Q@$;J&P<YSCP0=SXGDslGTR^>aPd6eyB^{@D$`ZZrrtc
z;0P7)zOxfZwDGpsPV#*ot+vVOY=J^>tA{=olvqA6hALG+SHtv5f$$D!y)rSQrVxsi
zOFA}spkIcS@|!JgjGiy^#eKXodmx0ydB|GNZ`Yjhyp>LwOJIJqq?y{m7VN@G9rkXA
zLw$#v45A=%-h{;<rNZ0I{sMr~SkufM_a)8R5w-PQl%kNf+qDpDYk8*AXQ`95OR8f~
zW@jNtLy;?lno64LHZ&Dx6-{A=SV3)rf2Aoi-u|#xW)kw{r;iLDaT$DefvIX2p~Nap
z8(N-q#1^BXH19G4=>b<N-<5p=*UrcF^B9$nCidp)ep=yjW(_+8#&4zp0pJnVGWtk}
zyx^nh<H}-<@O@}NNk~g$qYAbU9gP@&e)-^Gj1p(EWaK937tu%r-F{3Q6A2YOy^_mh
zJK71LypYo3D9;?Kt*}?+@(egjga$(?f9EN9;o=yLK}#LKra;+)QP#P0|N5|jxGXW|
z^5i8hJl|?jb<W)}L<R|V3&&EMis_kQs!YS|_BnaO4-!YxA??ZtldhzTjB0S`9?Gvv
zAgRb6DGDD^AV%P+6v-bCB<B9@6s4N8IPw;0Vu7u?O%m;ML&FrhFJ_b=+th9U44rVn
z))Kh5&HZds7<I0&4o8Z3si8017Er|zXrTWFswk{Jj@*++?0q7V65;kA>5g|kSRB%7
zESuBd>yPE%5LlT0fl9QIAN3C3u_=E1B|9-g)&1c{&+$F^Gflh*C{>JT3pYQe=~$)m
zjRw<DVq35Uer8IB&*N$HqX=caU?NcT;0RT|7|%r7##|?r@?6r|`mz$U79Qkg)}6S$
z#Zj?vo6wfsynKqut<2aTFhRUhz~<FmWmsz%7q`;RWZ2w~qHVcjJ~tL%)bF+Lw>-Hs
zfgpA-A#Myz*;<avXXn4i&0(%>n-Qhjt5;2I;#Nq28CE}iDRQM9yFz4}Epf(al{c0g
z2Y|Yql_X(IUjCd4p8QUdv|nN+gx?e9RQ}BXc}HUw!QPPI9lsBuBeDM$sRFZWz;_B6
z!Y6$W<TxD3FFq~k_(=(d`9$06AK(Ktb_J2t9$HKts3u@w+Y%$cWyl)DBBU9JQWo^b
z8v->$dV4ec;^C2Ia_*OK%!>|Qo%`4Xd)wo%nf;Oz@sxyl%=@li^PAj7CgF0jY3bXD
zQLS~<cspd?v{^Kp+N^;mt=A&aaIo~~@<rku9MLJftr!A&Y>*K>ABObs=>y*Zq-q~p
z#%I<K-yKM%B`lj8(AEv=ao`+U=WE>Y8^U2^1(KwGnhY)%tC2>%cf_ISy-6{o%L=9L
zOfp!*h<_c+Z_HjG?wr&Xi;TFDsKSIAN*4#;(@5U0eD~TuX%77xIepfuj>CT1lxNIL
zFyU*?({_?T@!fYesR+f#5|!HIt0bHYd;%Z0)_&r>O1mA(Atl9M`K~8wZaWPf^&LA&
z!GWH@Mw3c1>Uiez#3SrB3`Qs1f&a%Zid=QPFp(Nfu1chc$)x3J{6w;5-YZUb4T&va
zb#1ibC>lK?5esS=>`R)g^M?B1);#6s{u82@lm<>h7W!lhKorP%Y>Maqk!8I?*F#zB
z(QK}WOn`}K3o%IqI<yNSf7Tzal#$%l@J~a2pF=jS=3uMooo+>GD2<axL8wJtJ(Q34
zer$$r4jy)I3$vx>_vG&fv2E7MFDZ?5obrtkGy7}Bp6|mM_X|a<(q2tz@w7<d#J#K@
z87c8RQ1n}Mu_w1_zR-<Gi)|-zFONyHj++3>#T|e50f;Que0QiWDgvdnc!0Dm!<Y)Q
zp=xH8U;*5r7k-zh64h4Skp9Msnp`R&@1x9}7(y{lG!`Yp;>%gn8<<RG1s$|{dhDy*
z(CW58_}5~`Cbq}^D4tho3XVS2FSNP8<8jdzj0^brcuS%<KU`&8iT&CY>&X`Uq4rj*
zX#qEsoGk)r{b4Sm79-$L3NY`trpuha+-`TZMCh~&Fr2sm?wpARYN(37e%~E`*m)%g
zT;L1Vtw6=*Oa&`O@a~J!lXNee1;@<JODqR^f_062k4hXqdn$-Fd%M+l65LFRLhhHp
zu7Bs=HYf)t8<vV{D4c#TnOg7hCJZbQq!=v%u~4D9lZ*%}`rp$~JVe%>9refIyT0Tj
zckV?i>kdnN^-cMcH5WmvEC&?JqvgnEQ7L7?+xREQ6i$#w5OzXVhsLGCV#@s<L<eAY
zn9rG0#`-KshCEH0GLV+nnV~QPuL2(rN)5J3mD78;cXfERY`XFb_#Qk8^IoERW0g>f
z!3j==_2-ZEsSl%Hoj^pRi5|@?7yM}C$!!*|G<=-3)%Kbj`Rg@N-*|V?!&fN;sCdiw
zv}#$jyD_3bruL+<%C)3+t}Tf5FOHqT($KBGVW?p%-A0fub3}%D7^C?N4!x)Pa<RLc
zJEa2}c6Z*T=L6}C_n6PEeA__R(=yEBzR3-7#fq|_flt1uH5}+^-T1+_i`v<*xo55z
zW6-aftz#EE_@UYH6qhoqmX0MdR5DNX(kzrpx$RvD6^F-x2D5GW6YcazGujrDl{S+N
ziEuX<WONG@;h}zZS@g4gk=@No(0mGrLzILr38)6MY<^pZTW*~7rN}wY{Ni<M*yKYM
zEvb`ikoVO=DWPp<5ekC~le-bsaNtzBb3SRpxKtj87*<?CEo(Sd76Q)5<o>Ogx7#Rd
z^OSiT73ug}B^Zyl$joQ|nG4vkgF_ZD)1e~y(*6^u#8p&(4WVnru2#JFRHL=zRRF2_
z5+XBOhE*#8uZ){KH299n)qE&mxx1z+jc|hPnHm2hy&Rc@CZ1Zcrqy_W39x5b{Bg1E
z8}GB_ecClCU<ue&NaXpf!i|=S`V)5w$6W1{wfD)i-%iE5NT6nxt{am{_u+xU!B3v3
z>6*oI2#>aH#tS}oOZGiK4LvFRJK!0eHeBcP*68MQ-g6a^!s{mYyZ%cHQ&T@B=$z+6
z8&@oD?BQbX=uMUr$5duI+jk^486YqB{4f3)jW^-aowyvYE*sX`*;@9nM=I8q40M*?
z!#n^*q$&Vl!Dboiay1*zx2rxJHR*c&VJKKbXMGg(`;xseheV3x&r$=e4d%6!U6QWi
zf&_%5LooH<Zo&GKzvs#*cjpI0;N0MKX-CtQI$;(+Wx=f20X19qwE0kSJaV6s3z$9S
z#tXWX-X6d^wJ7LN`7LYK$b8l7zoII&lyL6yDdf(0YjPOUyvcGUMe+pk4W@z)SAX~y
zRTIlvQCf|IL9Uw|#+iEtw7%}C)PorExs%uY`N3p&Li*}7c|wKn?0cYk6AvTJMj9h4
zv#RKj5YJS<p`dy2XkGcdkD81stwK_nKxzh<O;Y*rMVES9S#c_jR2h@%k5Z_>8&Kkd
zuNUQ5>GnX$Qcs+~6btFvs83c;E(b|z3VJ+hx>V?LuV!)FU^EK)$3R+kKm&5pu286)
z{L6~fyxzLzTV267n<<2xHpBh9LJZ;9lTZGIR;H_Ti)m!C;5*?*v{lh_&NplTp-aIK
z;Ynn<a@whc(>w9o^g^ROC(Ov^uwAF_f<w+(EQTe1vgzK*o<tY5Lb8O<ZY)1Cb!bQ)
z5v2sad<$Fb;Be2@$PD?Tx(w?cnOg=y`=N}v4tsm2Ua`nS;d8FvnSOEShc&}?zVuuw
z(~^f`^z>blig&xAU1Q0*+~|w`!BKP+qe!2Ykq3s(Qg%Y(R>DMsnt-X_2c&{tO+M%k
zK&jGjr0HLIDH3rCYeTQ-g0A1ax6}-!N+HW+8D3yycVD||c5yhy7vP|wRtDY?Wi9Th
zKg7Mwdo2Kfg&8BeBtH$YyZG_SVvigXCioIzDKL955Gv<UBHf>G>>x&#lBJv9vEyhD
z8&nenh}171(1^h~!2AT54YC%~kH=UFhZ&Y}STm_UPWwySDr<|6rFF==eWbg?aZN&G
zui>n;CkXDG>~_6kL)i%|qRR%eTs!p7b>vEIPf16L#wCw~62HZL<~hd@B~b@;L}Gi;
zGS#3Z3!N?euFRsRXgl4(bDZ9)UQV_N56eWCdHhA>vP&kB5!{-xf^{uTKf+l3UJDhs
z$%K{>@Wo877A)wdGIYOW5a7|u0tC=Z=UHVk-V6G+muNNpEJ%5}w|%Eb>6+lfulBMt
ziW)tc^JVST_?lq1E;rCPf8;9vnp;RXgZX$L9FG>C95=72HaD#|GW5_Jt0VwZhPZac
zscwk$@g)pR7~e7{%-({`<Fgz02!F0lqfB_WUR@%Z0xs{ee2RmL5#9g|vEQzEJ$<?d
z%&OE1N6ASLj=jx|h-|Uy%u6Hmtz7>-6VAxY*Mx|0aCz?8dR(ODUT*8n<$JG(5~G}u
z{2A<#o<-tDzbTh)g?)IH9b7^?$GB0cINKM>_GJq?5z%#(!EdIRXqtADV|BmQo-arK
zW@e~{<~1ME0FAkthzYeVPTU|SZx*S}nz8gDJ6P3=gRj24>{FUf0Apx(6Md3N!DT5#
zZ{TNzCXf+(1e73DGGr#VsjmMm+NRUX$|MFX_Zv2FY_`cP`7}y6NB39ym%UN9KSPOG
z<$!NIiN^3=n@q_vYPI>nj^zXem{IV+fFgt-bI7Dc(RcHUvOH%|6@f5ct-3fh0~*_i
zm)HYFFI{(Oi%^ttB|~c@;)FY)yA5VDJ%xwxYt@CguyCrMEkO{D#JS(*qsVPa0$XX+
z(~cFnmh34o6KzP2P$EtS_ep8bH6BZd;3o2ptEV6wDYG_lSH`%`lK9gs309_diy7LS
z-q}kwo^Y}rM+>s+3@*v5TmkOU=$$6>vY&tg$W~@8)<6n<vqw}GWW5u<j;muH6ZEcF
zj=ri%)zDu}Rx+-^fLQ8e_VPOhEmU52$=(om0D%R#ZsZi-wWB6Ryh-0*_vjLs6Al@3
zm)%j32)Y2EJ-?q?^9MxoW7K<t+g|+w5d_iBlOnum0(cZ<V5=_a?i;U678Q|Bb|mhY
zwrTH`*<TM<5w|I19}9zG&rBTRLX9o~tWA+w*ft?h5iLo)0wtQcZfU~kb@g7v!PL1z
z>S_v)LVd9u+ujdHBPEZ-kGccLg`05E-rzC}b#mJ><JcHmS?3TYC?n5mDsiy;EG$Yv
zreswPA^_t;8+VovnJzx%bv~^H7+5B_{eD#Pwphexrw?JcVD;131SjE4bOf{8cP10D
z{uuVXDO{bOxP6L$Bwwly@!kM0xu50R{r!B>EeH3WKG90Z41A6?j;5?a8g~u)9uHoj
z{iak_;Vc$!Ssz5odL|_~_Jt2yHldt%q;{S$3wD>V=-a8TK}o}b%-4%(MFg1dK@%ex
z-AhBe0LN;WHNm@h7(bc$;1!MC75lpbuV$N(UtbsE(w=ZBvV`Sov<x}V!uh!wwX<aO
z$dhFBt(`=uMp%C<!$!K{q4<^2c!5qi!eXdCl1+M*sQYeClhSN%hBneEB*SPRi%Nbd
zKxTZtXKjdufl92HWO}7}hT{ahSfTauzF(OkAo_ht@q2q9ZBxFWQ);hthk9nxYbuc>
zcgU3%=0S(t*gez*Up46aBwef0epW6SRFHP`kVXbKwW%=uJgf&*4)}uns{BLgBDIgQ
zLNKxBJR)w5%LpA2_fz&ka~?&JcxPAuY97_vyDiG`vtj>bqTg&U<)}>)^U;=kcD5_^
zk{?##F3lfktA6Ux<&H~w-!9v(YGrWlKZkt_X`G1v%|iT9pTjyk8h5}n{>WAMv?sCo
z)H1F+__r+qW~ojn0kX7fJV?;L=tbqo&k8gML>z<08YG8D?Zog%?HBAG9sRL4VqP(@
z&F<oZ6<6~c<l{Jy$$^3e@7<DCC;PXDh9m~Q#)oH%O?!*?Bc~;vJ1C0n)}NEt6PFZO
zE1lAZf4xG<>54>J)-h}7BoDaJSe4*Y<(i5rotax&L@|l~*?+=7rjvf{EC)(XnIT>c
z4Nszw8f1BSmgFTr(dmO-YKg{On`!Wg<|nuHYU}`qIu6T9>1k&0%3je$3WYCL_{QVL
z-Fq2@*#T9zZuY`tHPK}g@RJ{FH4;tPD>jR@7iF<t;3MxwJ}FM`<LYK$q5J~?cZh~#
zHW%_-`LO4ns)>V@R~kZem64z&g8I~3rOAhKdb-F|UB$vXTAf{d3--C{c?Rmhs5XH%
ziAS^G4o1Fs8q%8dD@ekpSIDF>R@Pr`J1q*ph^YR`1&CPE2$_<}JmLNXLw5VB4>&4E
zH94L&xcGCAaYL#4`mW`d2+>vY`0}rb4w%}Wg*PAKh?a#<tv?K+4QVSdw||i|Pt(Nq
zUc|yOV<aLE0l4rQ`E4onEk^u<;A9uZi5Ol4?7?M#0IoXJXynD?j=R%c|Ayb5x#&VC
z5G3zJ-EAbFr;#7Rpe}Wv8zZ-tOXVXFV0v!*=Jh8jXI9X4TGw4lYyOvMluOO$c~*>#
zLLs?<In8Jp8yRv#;43GvIkm7inVw+O=IFB`doBaUQ06hYh|3x8nYWpW1^x-4Z89q1
zv8wsUVH7xH!~hc=RX~iuW|rI-s*(R0gV8kYisq0gcPV4>6~*JyAhp8sB(<UD>rYi9
z3^ygKN;f$f*DWQ+!-#5!SrG@f%$he34zqci>@5Rs=RDag9$!^-Sh0m(of<WM6ZKS0
zGUMH>{T6FPlBE1Taf<wgY)ON9?l9U)+|$$#szLan7>d*--9yVHR%gaQxTN7?A;Wp_
zAX6*ouKsuC(prJD?oUr}uPfujf*O%|8j}|08sp?FMcz|#2D}q0j0%FuYQrmTb;FxO
zQKqYk#>=*Dkj_!Rpahp18@_0~ov6;|R?>sy6tQk!KoY6EfGxzEC2<rB6Bu4-zXvF3
z3IS@J;JRAVGt=&O8<{+Y12gp6r%qr^iYak%a)h&cwty?Gbsh`!j+`e+78RYiX-wM6
z$>CM9=@A`PYD&Sq*v6gJ{<}Av_NRbc(u4;Pim62qrSt9TefiqAL3AD|GtOvB?J^0G
zuC@q`4<mAH$URRlIq+`6jbPB-Z0;DS?!O)#%tI||k$9{g9m0_I0m%5^xB?TPB5AUt
z!-LVA(9&T8G+nCNu`2^PcFTa=HI(G$z0|zX2)*`{H9kFcHh|BtUy`!PL=ZFpWR_si
zTSr;KH~p6k?}ma?y?q{;E_~Agsb-L}#Xar*Vg{9nzS7bl=Ct-&Zo!ui%-?{iKhFW0
z?DUvTvKMkV1-+-AKMI6b7}Jo5;dsfeq9mvPLLJzlt9+AuM&kqUFA*XH-K_b(0U~W1
zPV*!qRpswXmhm}NFJqu{jOoaQ@3D;HL!UW9iP_{BNTQxTFxR*lTy}LQ4Re<a8tnjt
z=!h;mgeayrRe{7!Gydx)(%2&)rCaUo(;c98n7cnRx1_o9z*M9oZ{S$5b8Rw4bs~l$
zeR?<MugA{ctss(NeG12C2_gl>NTiUB>URuDG9`pB>mIo>3&2-^0K^f)!S2b1sroPx
z_t+op_Q%r7a$QRG3H9zOGk<aKcTh+BVRgFZS1UtrM<>iUwJA$tca$eU?X11=&OP-v
zpvsQc-%lxN@<=gFRHP)l>CqRw%Yb*_j&H)DQ87nVssz^1=6=fyO-P&x<P{qkt$hdP
z)Jqb_lek;OdItS@O8RyyOt2eYu?QyFSu6~gVND93izw4*hm!9PSm2b>*D$;l0jesr
zxUdg)?vkCW7Xq5<1maS9gcy>KkJ7H6-=4a>a-p`PGkV{!NgP)eGbK3aw22~$KyD-M
zi6$KCR3Af>z$q{px_&7S4_^I2sCy8K4$p^Hn@2aiFkG<n8G{(6G4t}N5!<hTx0!TX
zF#K4<yTnyTk>{R7X$@oUj18cF6>n>}>AMRhF=X{+t(Ztw(>q#W9nAuEa1>i`dmLkB
zq4?+F@<g{Q;HfpR3zC``2S@voCPb?mvL_SIO^D6#8yLob?T+Yz?NivllNLGSuf}d_
z68<(r+^)e|idFPFH6`ig2bpD#zO$)usWvB!?g{H*?jPI8k;hLAt81lq?7vV@yx~p<
zBnH3U;*Fu%gx^PPsQAmu;R(+zP-k4k=H82-Km=;ORDm*84Khk%Wh-36DGcn?B&M)9
zjaTMYi3WQq)~Q^vP1e+6hb3f*u`OToYDKMshd|M~I~jQT?b{!~YzEn$;-MkmWb-`#
z@B$!3&=dfc6sd-iI$q<Uo^MYs3n#r%1CHPLfR?B8u00oNTwfJokE&LN?H8>>6<6P6
z?YjOe0X9oH>cxtaD#Sr+(P844E-s*iz|SH#bzbi`H;4kEg{1i$$#WPin+bIDaRNJt
zS~}JbR#)HV-TZ!C=dn{OFi!~VqW)4CtZ4bXuU-UH0<&&x71!~pPyd>&)-!`bP@1B@
zQu_THt=<<PEjy0K`<8~=?h4N@gn{XbP&^at_?z}be~jUT8y5}{yxG~pJjtWCHr|}0
zDkyX+2Y|1YKyP&yJ_gNZG<JBv2JK!k*r`f<CWyCQT=k5|0zp5Ah)ny9XJ)o66~F5R
zsJJS>1*j)@y2~vp7b)=*y0k#1n+Bzz<!!V0@5?zcMg?;0AQ@&6xN@3Jd8wM3O)C}P
z`zF$B^$mcs2?}HZtiXi0S3JXf)6!;dKNZ;|?B<C$(O4}Y&MJBt2wqf`hFe-h@Pg9p
zzrz9e9NxZ<0D*K9{-nHiJ%j)Gml<E%25!*xSFg9<SNVQT2%o2T)3oLzHTUQhxqFlq
zX6scq`9wzX6r^6+uaIBr_ZE_5>^}mQd-i5z0%+Skh<ZF4vQD^RE=Qf^Fy712gCX`X
ztS0oh-Ma}mB`A)_ay8Vi<{?bHr&u)uQ5D1oD&81oFephNWK^qS4{vupu|nd=E~Za`
zqLdw>(|O;;90M#&)=To6QDYS$*TK2VJ14n^W<M~FwpCg5<HtdW0V9|vp7D_+i&=GO
zLh;mpm{<=xz0iLCUiLm#iPK;C=|eYV)!v8bH)T{A(W~5h3?KPaui;ge3iv45w*)*w
z^^Y$U*>MM88MK3lo?BCRT52k^BhQ1qa-9ZBN?B?Un|8?Eqm4B@>^mRjwgF1Sh*=YJ
z*8o@(_F9FplJd0y3rVOyiTzcRtiWN<@L9bTs$?^aZnaF_8yjtyof^6xK^lTF`t}sp
zI>vcOUEZP=o$}P}2N7<2e^xD7M=v|Bhh&<)tVAhRx%hV;>u(pU?TVj$J%bz-4Hdwx
z$z3nUZt1HGT9qm1u@0~d6SPAxqFj4!7}ZxXtQIqRGVZgsiRxzFx>SjS?2IN<+;U5`
z6dy;vmX<hT4rD!sr#E{^a&NzDqdefzP{MP|34%dToUci}g;!JY5}CKWXHC0;BIx*4
z+W)xRfCH;@xcr@*pYjpG0_N)HV2$Otu6o(m)Ih*cu(b)(h__28Y)G5;CUiYlUZ;6E
zP-x|Tuo}qP2hE<BWa=Iw1*Bq>a`G<5d&Dzi#)FbCK44nx`f&&rFri{>lh_~5<^>`p
zT3Sx+YU>*B7gmi(Iwm#KgRR~Q{07Rfi<rRC+-PbsTZP(}2v?(;T?muEkvypzD3t^I
z4!1!#Pu-!tTV4@;!3&8TBTy^5O&q1Y=vMG`I4<Hf_ka_pEc|)VFP@7Z*2G_4&n|c2
z+-T`*C@=f-H~!9}(~en<%C@t8CTtEb_ay%+9HNU?QSeVwbJ%=8vV;SE<22RRXRn!<
zIaX>Ba20`Ykt(96+-Tg{fxFZav~xC4$3rAa@^O4dgy}7K6jr>x<J5NaEvHYH6r;Ps
zHRj~r#jP&+UB%dW`ggKlI^8Ydo79{=wQU;6ivIH`I@i_<3r+Peig(>%>fghF1h2W?
zmsi$y5Em~P_$mvi8}3mNyRb_nS@L9MdXu-S6XFCkHVuD4HmcB}!fD3(ra)=jKWAQ(
zELLeV-?VPN@q5p-by#tq%__vH4NJ#}4vNG03;Ocb>3o)+gQd{IGG{TY$rCN7#pKtC
zoHsKHaW;$L*#w1ox|XVjHuEzIrsI)cnu%fx8N3?PfwlLi;Bgy&yn&%>(P=B=PFrxs
zsPbw4i1NBKHSUrM_tVTwR?(qy&(0aGgN(Vl?`Qi=#IWd4f@T!|ID@MI$$XA5C_!Zp
zijY)mUZ;&O5$V_M{SMS6d!-XfUm}HvplzjmKkuB}a0Sj7E5od14;YVU-D6s(+iUyD
zw`gDJhVY{r_8U7huP(*nP~O&ACP`o-<y*j>cLHm)$*wO{%s}lIX^#H=a}IC0Uc>M?
zsVE2YA4l(kEEcC4so_voWh76xK8V+Uy(D$2JPgrw<8<E)!xtnMKD#k7(ZhXuw>CsI
zAcz2p4>g*T(Pt<3tB=!}mAvb@sA6@XcGlawsGoIbZm+uqxV$uJgD)MOG)Z)^R*I7|
zE^fPjIIj7_L=A1xw1o#No%_F#_JrMT!pu6+bKl1wewn|n&pGo>2Htt*q?Tc+rJakO
z>W)lvxO<Es1ax{4w>EGWS-884(Sfx~XDu_pygz5};449Xy%<x1)?wvf*NIY5je;D@
zA~#CE+ze+hFG5pH%cM#2<VG_E9hO<qxQ4%&5+-o>Ig>4BR!4D3R|rnP-==S(qM_QC
zO?F_cIF`i>+9MhYXY0aU07Snik=BntEavZDTL(+)3rk5RktI5;6M1bFMRWi|*`I+}
zP*JV7kzak0S_ZAAL|Q*JjvL0=p9ov&J`aJHuS{@rcP<WIO8Ol;UGHwvH}VCtv5I)E
zM)X}T11eEwRs_?=U6F6P%bbR^&7SycP|`Mj9Lq%nRadW8B~p(Qlyp5y?3*@Ob*)DX
zuy#n6vqC@hM!r&FE%mOt3+H(Y*5{~9iwo6Su|Kv~HMiWUteK6|cvW*hy1Lu3_S^?$
zsnvLMpas-(&~X_0&R{FQ#52qA#fx=5z7(5Iz2}8`3w34~93DseJ9(T>lG&W7=O#4E
z%`ZP<AHe9=fpT`a+uWmV$2)SaW`fbQ6kV|R_s`DiXI$uBSPi?jp7ME_BGW1<koW5n
z`(!)y7`&-MgLAm@8ylcKym6L^mHvtmcoEW57M*C&;VAT=#B`MICJt}I(LOc*;n>e~
ze*g(nn-Zk_RT(j#JkgLQ-JvD6=A{8&21nnup59zelVzdPQ@BHyLrWE-SszWio6+t1
zQJ75HFo}I5vW#h!zRWV0Oi;BZGwfJLYGn45C35>(F-YY|Nf*ESCC+*3VB5*)_trB#
z>*Mz<eZ3fBC$i|DJ_RO!VK|n-orUer#0vQYR<Z4}vshnr=#exaN;OgU<>b>4)*4VJ
zNiZjjY2@Rm#vGv8F3b?_2u5@A-7`j<jqJ76x>y1zj4~%QA>FnQCvd_&M-v!=KK@Zs
z26qY{Bu8vSVxTezegU*jO_#*A&JYL1Ep{%u-?AT5E%)=D_UOT;z>?g^6d=D28E*AK
z{h-=m;)}Xp%X~x~&$u2ajlrBX*by_O-J4@M{d1<7aV->-mtd2@vi*)FGXbyc$PoS5
zA~xHzdwF}^OG1z&S;A0(^$~w@r2YwjgQ1azyCuDun=MAyn&up0ASBW2aaI5N`$zQ>
zKN+?M6eIh7jSZ{Whk?Da=PKBa#Blfpa!y%9o^u2!bczg2u2F+%Ku7(uj`3Y-T3o)0
zDRb`&(M>wj9G>Zg%kGp8LN&t}IZ18YG9tH%6!On&*CqWWW5x&bnxk~qu(D9d<|3k}
z<-cE|`qn!RSS3`n*#gF2?D@uZ73k9aGrtBm!27~K-1+Oclc`xUUhLOB{J2lm;a7at
z6}eN>8=uwHjizq-8CdD)6D@gkg~79nLyxZ@OXxtlTtnfkjc-dK@Mb7W(@ln|%n=<1
zP)3AUn<t_<Ld?$iF*;`&o}3}%iSF0~adM??<RG3_jwP~7<&s6KF(J2AL)FCPh0z5Y
zbuX~1INvnzFsvHvp6;6Kf>D5nHStTfd?As#y_`yD^*)^TeVDu+w`%@&2)qpzl1?xf
z`ZH!1s=T<_)C={W)YDTxD8V6O`Gn@C#GT>bFREE#pRB3H@s*97R^I&7dHlc+H+%5<
zvLQS*r0c!~r<s#i?qlXH&7uRIVlr8K_Qz;v_6Ww3oV58=Wst9uW301*<Wh^tlG5_t
zbqU=I=akMCA@X)(GP2qr!nu)Uwef(;pfEKqDbke@Dla$`1@2}c(Cw=aEG>e|?WLCO
zD)eCJ(7-_}u%P^&*e%{7|J}Er1H+}oJ)FZ4Z>F_CB;As2(MIeNlRRXf_<Z(~OIF?}
zF5jcgze8K)0C%Xu2}>kM>+$f>6q}9E8dOX2Bq~%FX1j9TNDa{V-$^G`bhMp!JXW!D
z_=U>s9kToko}6>h0U1;}87G9jnET-Q<trUxObm;2|J7HOrMZ!^oA)|LGRc?-e+?ST
zJ?1nctRDz%k7x+Ti($I9T5W(+kt}0B86CAo{!G(xFB0KHaU7In-`_*$Jr`qUcx?Ud
zNx(psTONmvE$wrclLK~TIjT&A?w-cc7E*CQZ#Ji8!V!(S)Qeeupu(RD`V@L#Gc5Cr
zgI{=9Q|rG52N!KSD)CRY3PG``r!+$sd#r)ndH*er_UjvwaajTrZ>lUy(3JE?I0VaR
z@D0rvQ%M~KNTR^?LJ<=Sr0a3B9NHJ?K7+Dy&q692|6JFbW!aIdPsbPLNOK|qBk)3^
zkB;UQlo30&l2GDo18i4wOXy+W!RuJ1MG0h~lJcmg0oUlGO5I!evJcR_Fns!pxI&^j
zOZrL*QX^|`%jw|w7$-3!_%DgV&??3lM$HqzB(mhTBf+K2jp4^h_f9R3CM=|K76*U~
zx(M{P9&BUI%Zdg<IgkOzuZO!09@(pL=RUzU66x2^`{6pTRgJyeBhG1hNFgvG%4;q!
zpN0bajpCER=*W(KnOVS(D~@v^CS6+Zg|cd0F2DuMim-zbHO>x=o(R0QCF47Er7f3T
z6qaoMyD3_#oFee^iZq#5zVaLgFRFloJD7Ex<r;h{1B{1Z66?#=4hjn|*Qu&ehlfZ6
z9gwEC<mBb0`O>8XKa*`w%ZEbu_C^3aaU`jFWUAb;yjU$(Yf~ML<h?ac)NZCmn9o?H
z5TOeZ&x@1}X(}`-WvB|X!5gb34t@||1(Qq*ua1(z%G^d8XcYN;ZsFz4kO|kN(tV?{
z?C(RYK=OiMN7X1=TAfRaT(bK_9)M8VQDasYihRj-{1U;V%s8HXaQU&RAp=kaNB6C3
z%)?vGGYQ+Sd0yR$n^M+sSU$^OGe>}QxAF(ZLIi~Zlr~zwFdd)H9rV3VF%U|oJxplz
z83nbVZ*;-pn)my$@Eyq6jqKzE^faRaub+emyC)9^dFRj{wSI;p_yF%o@i5r1%lxz;
znz=rg!K{@gt@<|ui`2n^niNSB@R%wKX>OLVqz#cC=qzqBbHcgFGfb#+BA}MFp37Jl
z;Gv2BEs-A?D+ALce@uUL->D2r#<|`p$h~>~7f>3m_jCE0E^P*3d^w}knQfB^JJIn}
ziU8x|={aXmW%)XZJh9-XA=95T=}CQ*HsX*ewbL6!bGFyvc`Wql8R}D*tPkRw-ywLV
z`E&(YhK?u>$r40a^6jrBc6`fLUwX+e|Dy}w%AbZkdSZv0K2l)7f6#ck1}OO2kGIX8
z>n9u?Fd1<0<|il}BM~a5a&D9q$(SnlkopQyE~%H1>Qc2>P@-Kb&T5`SWMoM0uUyv5
zHkt#ZK{iKoMhu2V>i{wqDB!2X@r>GagVdb96m!{Ie3D+JlG)tV_Nu1G4g1)F*Msp5
z%HoKx&u)}{j(4K)F(6-utNrni8kpWelRrBfZE{21I&37#Sr`pPsfxy6FCUAJ(BWPr
zX51^O*1Q?miZjmV6^JA^g(GCPdb|Y@T$ut6gs%gu>p*ggfB-kdH(uKKfo~pRc3TGh
z`352nnuW5<3xR+j*9iKD1#hDT2Np>u(I=<9Sn`0YyWgifmprxYw+89GurKWl)4Q1v
zMUyQtI|2cQKpYyMbtbkWxyr~1yv;Dl5BAb5Oc)c(Cgpo5W4CTNO(U9SvKbUOwmmEN
z4n49&9~-BXJ0!u0G3tZbOAjWKIgx4#+QoY1?9hY=yZTMG1NbH!AZdSeU&H_E+f`U2
zJAJvDg;RySRi+_pz$vU?o9jcT^`6cY)dv^-4S%^9V?*Ao&TtBB2H3iZ^W>`nLGz`+
zCnj|B&}6HH6ZKsokq}-SEcH;lE4*RRZNppCJ1#y?k+@Z2`VUVYd5_uTpO;Zx$&qi-
z7zb1~tB5%N&5^Tv`lX!F<cTQ78cL4@r7lJ3-Ox7?e)zgY7jfxm0Xizi+5U@>NGAo}
z$G3F)-3qS7u3Bz_Bo5-q!pxE?KiMuG6q`<ef59I$q@`HZKimxRp;5isA?Va28p-N8
z^kyV6S2cB~iC5I%Y4lme-jGAx`<&r_1r(3><S!%&!A%&HP3y7{yk{jH%koc}emjL9
zdqnf;J&?ab@G8=l^mzctL9%520PjRTSx+y_ZbpuKHJgmwnvF`;F5$_R<6R(!<*!|e
z-`-$mB3N`XeNHcHw-d{lS<V2m?ry|EtB#mR@1D0O8F68=>K@vam?|C-SxvG%xK4v;
zEeS&uVl9V-(b9onGj}UJBiR8JY7H3)%2;=i7A%XV^d&1F0z}087m*!R6W+!PNqVB5
z>SOgWreU;_=!|E$J%g+SjX9qtp?^=m9m+ftb(Uh~;$<XY7_(;GA+glDecwWep|!hW
zvh(tNVsrWs{P`z(PZsq}*jg<~z3CX3T462OfrdgX_IJf2w{iD2UL<2(xcp5YQI+F_
zYRDMa17J%{>L2qt<4TlRZ;+uyr-(TbWm~^2BC+%6qth-(_|==PN{<yxMM~d;<!0Vw
z;01jtTP|rGP9lSssz!PBeldgo(+|=9;C?XM2{hFxS5$0iFY!Ylouw?%cHh1To>s|x
za7CtpdURF*LLq<@%q8w2AEMJ7DRETpCQZX`EWv>m5{}yluOC$Sn9bQQ?>O<=(ZzD<
z?Wy}fk3Zhx0a4X><@h3T@uyGR<u=(2+L%+j23f2F!I&t*2H)Y1+^0~oV@b^o@N#Kk
z#nFPd65Eb&KLaUMumCOSvO2>;9K$_~%myLcd7nnE9GUPDR;(FTK1t}Jj)-|J|H5Tb
zprPOZC^B>YmZ5cC%_f>ZpBT7P_%($Ar(KFX`jH>;1Az9x$>UN5Nc<>?Xge=}dPJR7
z-+@{7i^~rnHT+TMfu0{dJ{$6fMVAxKXs08Y1`fD?rBEKdXH{cM=-4{tK~{`oqoT*l
z5REBeFYVO9Sfy<tPr|jONFYV#Xq$H0hVr6jaUbm8Tlb&wL|x+j@G>1gSre_En5hXC
zC!$p1UDS-!Pt$dh*(B%L!pyZQtu`2#){B7qRm5cFkSxmKBHxp_9nL|%NWE@fM%uuY
zeynqkK;@m1*c_-bWvO=J))sszBJ1-OLAA0vrPRJ&Hu@yTClqZAv3bUL6Cbhey)65Q
zj}#Z*9Lk_bBUBrA2#;y;@RDjWAz_%Tis+;_4*jn&7=gN}uT#7E6+u?B!EgvP%_#3=
zXP8eIYE5Cc$9*|%QkuS@P-|uA=i4>SDEkZzwDSuigk{PDz`!X-T7X*Qc27xZQWPMt
z=`Z@~VJ~@gDVy1wVmp;8@7^k&OUl(e;Lr}|QZxGrP)?=TDk|oby85lITEzfGRDODO
zJ{o`g(yRA<C^Ec8nlSXz^BTZyLsSv;YBbelcz>aZ>V%;2_;m)qyUng?Y%Xc+Myx8H
z9z5tMw&dv^P{vjd7$A~XAnwuAK1yjQ9`)P)u9z7e0t&sooAQb>iHkSFIUkasUA4j-
z28L8>_(nhl+OUYiq4Kxn>GJyjrlkeE2Z%<GR#XmzVO?ib>i>c0{lT39bfl7$)iA4n
zA(G4<u^6KN8`*T?0f8|cpuFy7{uf$KfDmzx;SU_C7Y9PHntWTU|A%cA!4A!P{cmKG
zvor+81p8U}4~*%rV>{79TUP=>0(2(;G4-3b0_$%W)4w28k7U8bo8}NQSH$VeV&9+k
z{)2)7AY*2J2S$whw>$a^{{v$JC~PqRg$*P}&>Hh6uD{R9jRAsjDro!4+x!-jVf{bH
z{1-<J_)CujIH|?(Dq^6Mx=d))pZ|G`69Pz=urq7y{zZP;QU`cvum6T}MF>C$Q#&X|
zv;Xcf23Q@b{Dwb4sz4&cnXC8*R?9C!{KsF+v@THy$}gaSkT>O@aY>;BgH@>94*v?h
zbAY$^pPM0J1B5ZxNTAC|pmoRpH22>>yQMB+`3KnQuM>crH#!n<P9n5v?gw-lY5&vP
zz_B4*tmseWV96~rD)l1Pf91lT-95>G9`ho*`u_ie7fQiFawIji%I7}>GFWKq&wuBr
z4Fo`lpIgE5_d!G6i^u-^7Z5C33W#03_8)9Cfo!a@76|`~Dio3<Qx*%a|L+-}%H&t!
zL61WACX&?tzsyC*rTrym7B8y@kmaCE*Z*$<LT&&Tkie38tN8v-o&UlG0td!Ml!@Si
z=57A(KK?6=B|%7v2s({|0J>RKgt+}bGl*0G(%WuIIxv4>5Y5i9%CP<nNT4xHU^6PZ
z_Mey{=%9Hk|I>(>93(8cZPwlY4y)>e_|ad3%ncLRd_`Gf5Zm&17Xv|=u>+7n$dv_9
zeAXd_8UH!lpR5;{10ZK7JdhE|(_{UAiHnd+ErlQKtjvg^ALC&0-%bCCJQ={c2_lVB
zu~#Ae3H|RM)}cXId1#brQIcl#SPZ~@=O5&^u3r#v+jA=)Z#@v2fF0tT_kZW1zzNXb
zS9OM0V7Gtm)-{`}L-?oZu&!GOz3_HRPv`GEFtf#EsQh;`kHkO{-u<|m+xmMu(2M}$
zoIKDBa617S?{%MdocL#05&Y1+kCJAqkStowXOa5n*1!I8;)i}*Hv#Uk0bvgQ|DWwA
zhQQT4Nh@9YpB{(w>08K=$r=>Bep9N`{GrtSi80_Wo~ap@17cWLL9F<A%|EW`))b`A
zq`aIRZT@PwDq8&Lzca0l4!D{%NP6Xe{`m+MOf2=Eey<aQG#*qX6<hfCVbQf=9Zdc@
z8tDuK(<jgFs`AfP&ECQ)LI&r89MJd=&Px3MIqAA)#MkGK@Q@xK*PIjob2fkX0C0U-
z)EF^VVjSB3XK%<DYlRH!R@j@fH(Kp~U*ta-Q<nwVee-cI2yhPG;iDZxn0pHJIX8*%
z{(BIdEOm}?GD&Z_+OOuaLC2bwZvZUp7~;B=K~~zg85x3Dwo1Sx%o+VbzYc<K$Ag{7
zAjMtDPMgm^-RH>WpZw{te@7Sb0$NE^5os(v_vxIV^S-}uc>}=Q0|%@XyNps}(LCHw
z`s7@^0lKFLzVRc}ex(n(KLEJjYZ8DX42j2mo(X$6H`B!bvL9jwyIBOQf>~H61$JW~
z)MwD5$*d5>A@CEB^R1yoLnyAs;`v|y++Y9dLA}tfq?9Ye&7V33xquVae876<^MUAH
z3%ma|%3vc*l$8^Sy_x?}k!wJ`sRN`MUFvZX?Bmn*<l=}G``1HiR0Bt$oAe}7?eFrZ
zKn1IIuA6KC90&MyFgBVMd%JM|{fv&mVo>-`h5!^uF5tvRNL<^d-~B}|v*)l(+PFWI
z`!(U!q1BR-0sffYw)c|redgk8lJWUbtD#B%Z`}W{tTT^?dVAwIlI<#rY~9NxOQS*t
zNoGivX2@8wCZnXTrF2C~mR#a?tD9!VHkL{DE!&k}V(3<CB+8x&Y3vndFlG$%J4bHo
zUzpc;&3Vl^=Q+>w`JU(f!R;O<bsf@w2Am63<j2AFrR9<nsAAgA9(>eN+X-HFl^9!0
z(<+yp1i)pR%FeCLdjWf#xPZ&thfWCF{uxfXeyf1~`_)CWZw-Wpgb~Fj7*DnFdaqP{
z{jTpGN#S`+Qvze8%}iY92jG|S+tH-)<zLCg$$(Ovj8FpXLI~^&7f;WMY$4wS&SR5M
z8_<C*Z%qT<I3W1Q4~qN#k3=6mk^Z}oK4uDde8rDroTFa)bVwBgcA$hf#(eL=<Qw5b
zQk08&U&zF8Za&YEmr(R&l0z*6@E$_}Tp|y^R@luNYk7|5x4D4jQfhvJ5@Aq{@$=QF
zwq$fQl|5?VX5Bo$M`1)5n@AOG?bs)Q>O4fmk$<IK=x1iuQ0Gch_(9Kq#ObuG=w<r5
zwXz;w0Y%VZ>As7d86-BKSa;hVrAKnc88{pJFSE6>Rh~<3&4c7xP_uLiE5~5P^akj_
zHe+lP-7G&dtSCGrro_ShT%T{gFM9c}-q4*iuy-gL$%Oh-Gf~>-Cmx^jHSlBSoP{@X
z)9Oz21J^!WUdMsnZ8!!xb;RmVpHhgLQxBo1?s^kUEV5FHkQPaJDEn=O2L{p@y>Bo<
ze%b@ezWySL!s*QCq(Xms)>h1B#Bz=44x_iRQX~`#fVmX$0Dx1LC~c1(O+TXto$3;=
zJyEMs#{aaXEnD|;vGgoJJ!WZoXR^=twV~#`Ys$_|u;dP*5wGe%FTU&Zj385n>0rxq
z)2QhJS^NRfxuD@hm!mQ_q7O$c&H2G8cX1Ytzx5^htnRDceS#wn&FmwzZg+w~+3te5
z!VmMW0ySFl4O;%11v<}Rsb+vHVV=ok{%kV_8y*+s#{7@O9tfIwBufO#oMPV`sSy?6
z)TK&OgmI{dFpe?5_H&>dK#rwP<t|vliU4iAi0}9r$)Ir$et=%<-A`RhG5r8AK=VSW
z7u&SHulNkR{%X4)w)A3>+DFme8JPCt4jHo4=}R9RUk&2%$IcKyI3ph`Pd*I57o%pt
z=P<wFV<C?o>AOrC41or(5w}|K-L#bKax5agj!v-d4M_FeIPMhrI&{-wFejV=KxKuB
zqRH(rjwLy9dM`X!w`=>!AX90-lpkX8=>*AMo!r@LTB4qj)qz1xIOCazImTv`zJ5kh
z%^-|@<#DOXVjo7$%4#QyqVoiunq~b=?vkz4mRM+ZE-(U?{+L6wWw3wuI`-lnU>-i+
zuZt>eMX@}{jIO=z7v5fy%`{r1IC~RpYo>?{jFkN3c0%|}VM%-Pp@HKeT4y|SJQ!wP
z!jfBw-;J%Y!@vds`%Q2W6OKXuy3`GALgKfK@>4(7PG$mpUi)san2xtA&m-)JQq3QA
z{OzgR{D!1*)9u38p!2G4Ayv%-OTb810~}H}3*(*C3d}?ehI^5ndJp7iFv^bqV<pt{
zOrzQ?ZU)%Kt&0g>r?gv>`f0WRyZxI=s@zc|AjP4%63u-pu{ad^Our``ct<v`6MFIl
zK2FJ+3eZKhq~wrcZ!zM@y;BWKXAd+PF*jn$>LyV*Ak>hp`%?(=y7!Jc%P%@~9J}a(
zj|kzVr57%-!O=DlMfmML&!RLD*w_z(`zBki3wiJNkDzoG-Kd)dMw<~d$EbpN@I=ke
z>Cx(G`Ln5@-;@8ovPS-WUJpo>&lmus@E8xypmyt)+S#~Wb@%gaOFfz7o~@6AA8xQ(
zSg_P$zsB|t$0%<3a|0sX+?HG$<~*J~1p13_?^fXQzsgD2=;DNwuFgw1vVZMV120lZ
zjyH<udJw%63gIS3D?Ck~QMLH*a{(ZkDRNL^^-rF;msYNE$m{@QNJbe<EQfYExzMhN
zgdg{q%%-Sk0sqcc4fKF9({O8q3m3Ryo9<=oF6djZ?^oN~W$cz4yukzquZ};P_oE)-
z7~kV(&o=)J#&JNhAw=M&cG965N13lv4MVzQfhY(|jq6o(k{vi<1~=iw%Xby9?dgeC
z1G3f7J8?F^Yp<<7?{Z+_zm0>M*Eb#<m=#|H*L2q}meRDN*w*3HD@V;LeGFr#teeqo
zr%WYKvvz@_Eyy+!l11uc0Uz4>@=>Q3@>hvD-*02XCnBA_5{d$65I&R#u{d>P$}or|
z>0H;sSmqx;X~}hl*))x%^=q1Rf&+@`AQkIoL2+x|oDdiWe54yHFflN@Py(2PXZzcp
zYA_`d!Y3Rlequ*F=PQ#pXyvL#mR)!=Nge?`Veeh{i<&+dLoi(GC!D}|KvSlx(UKub
z=7*RJCghL^H`L_6F);@Y+~W~5hh0=6D(c?GZLTFpJVS%qhq_Hbu2_Cw)s`f2y{ZFd
zOEit)cIZJQOpF$N>wq0%k$dX$e4CYp^Eg%+BQs;!*Vj9)0?$8ZX)uvnlmf?gQ0WFd
ztFu8w1(z1E0I78pqtT~;7>a(jlSwZ0G2r+o!uV;6nmafk;d2#Urny6CswjkEFW8eo
zNTwsVS6gm0L5ykAw-q?ApnEIts9sqK%0<g3jL+ZxM)O#W^bv85cXCN>Vck{q!|lcc
zL%NRv%vQ$+D~6ujkvq>ixYIfjp|g4M-3qZ8IKmOH#}=L>blC3M(V&R%=62?z=Rm{w
zW^PU)@IV`fDGHf2{74qhZSGoL1Q}Fm*6Zp*f9uLW0jic#s9R2(DKLYV%}5rD8=@5)
zzdM#s8!DHdM5sOQ=XBRJ&*)^q={+)^qbKK<O8<bvEn}1^YQlt``i~(nIZ-?}$iI8q
zhH>1ISPg@^n+W@={P9Y7>WK2k_*VB3YzR0q#2-5i3t!#|w;YH;r0>Lx9b3vX5W`8r
zRZeTrX8pjArh)<aL0dBJy{~*q_eEP<nk{64RSaUt+y{HYctt+gn%Yb6#G}C0thlJv
zzzuSn1eY|fP;wxQ^L-saa|XcAZii_r_t%eZCo=R4pZ|of8dPFlZH2e$0O49cYG|x}
zDneUQrR>wR#KfCk``?!mV}6@JycrhZ#_*YfXM@_bHJ;;eb3Snj<lyJP;|F>m8j0PK
zn+%hNoG6%w;DH&X<w-j%7IU1cQuNDn;Hw#d83gvbC&gomLX_r~X}Cy-Z39TzYjz(r
zA9falSj0W89Z|ktd*z#1soEP`RqVP~OUm^~c=FGW9#g2|SQ+b#OwB&|dwLIc08GuX
zV0I#C7-3k7_HXW=j_aO+^FBTdVbkn}I|@sH*Qo-YiISYC*p6gVC83>ZHy*kwY~R!M
zTAGCH40>c?1mUeIVKi$`2ww5<+mR91#-g(7EjVglxidUiAH7=1zKU-2z`k5gxA6{t
z?_S?AACp(S>!l47JaBwrhgJ`5p0fO#*;HXp<i|{W1f5pVrxPCo0DuES{n9<aR&sfv
zG`UbPw(V94E-^f4St>UmRUr(*Ez?znXeRR&*XxHil_j`qj4uJ$GS2lYoeHd&(~Xh^
zZ)F4)<=`HDbYyXU7<oJ>Mejx$`xq-<gQwN%ZemUznM`Vb6t5KNdP^<Ty`q#qJF*3>
zgPq8^w7q1}XU?rHEQ#x;`I|VIB`D^0N?V@kPTNn!H7|u@PZ||nMLs0)Loiu+?%;)z
z;0?%qJK~|G_`Z{+Wo5HhzK+%_$&VLGaJ|WgyCjBEHt*O0??0)Wt}@P5!~+`s4|~Os
zPJ1>y+0o28X6xo%FV9}iIkh7V>aAWapzJzIFv9N{HQ8u;R<-ni=GFls*}w?swnTR&
zXuTMxYe@!4_*sXSPqps72k_scWliMY<8F%4fmr*-ei=li0sr+2XWCH7bU+VB12r91
zgfQfFt5D9+ejF=sPLYADXw#l^fC-~lh~Tk7O|+$lT=J&&^fnoWi|b<<RZvZYzHz12
z(!mAz#0GsVdc10Mq3kKAc3Wx=&9EdAWZw9YE~kY|^~dAHkzYbz@8n^eLr4B48NZaH
z6A4{QcV&d!zMV9<n1!W5y}<x&ly}N*{AQesY%_KoiQyEftMdAKsCii%d5vo%wlC3I
z`dd#ignVPyFzlTjYCa0L-7n9&e-0PM28XMT9doNjj(VzKk4lN2o~DcfiaY%PFulwE
z@i~&&VxfB;U~;NJOC8HvcXTfFpF2Masg25)8_lufhq~h_+(6?aPL<}JF8a!M>nt4A
zs6B2OJ$4%}=@i$vyD%X@Z*h&g-@}3gXC(RgMN=G-)2T`vB=0Qv>yo9vPn{4GkfyIJ
zRlAQ@jGeZ;Ir6>NLdoCYgYJgCYhRTJLE>IC`E@!!4}BI`;i7(gtGPxzTHuR9I%u~d
zlZ5#Ae-buUQlxTGfv&tyG0AHOP-r}Uy6v!{#(fsVdbLG54Z#xiGJ(=kr0b<G%E?MV
z0s99C$!GS&LcvYa^ELTLg9&MYN#*j}XLCuDD$bnMp?&)UB$Bx#-nxgLT>PBd7ZLft
zH#Oe31R_gJeWX4Igm%u40uiq*@|QF|zE~Z;*HM%B#(#(Io8I2E7ga1SpfY^%mia|K
zegQ!d$zM(3FPr&q1Z4-m@K*ooHNl~^=FJPTZaVnJxX*7CbO$eXx7|xK&A(VIy87RO
rR54>dK0Y4LpUUS8ei4Fo@fyAj-R+)ZinMd!6+VoGt@&*;uRs0|hlUqc

literal 0
HcmV?d00001

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\216\222\350\241\214\346\246\234.png" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\216\222\350\241\214\346\246\234.png"
new file mode 100644
index 0000000000000000000000000000000000000000..00a37c31246654696ba8e85b2047358e05424020
GIT binary patch
literal 144470
zcmd3MRX~*Q*Ddx>5csKtfPeu=BQYQ?A|TyeA`IQ#s0gU^z|h?=Fm#uIf^-cG9nwR0
z!+A#ipK~t0tM7alXKol~-g)0A_Fil4wVy8vauWEs<hVFEIQWupUMu0?+~CK-xyJv`
zP4J0d+-pnl=eolyNtJ)V%kv+j&)|PDM==dYWgBBh7Xy1Jj)}F66_mxn&>jl4b}+SZ
z+`QH(1TGRZQ_*k~wTBuwn%P)CQZciFf)8<UIG%Gr7OKnG*%uQ&vp-*q&U~~Olf0Oa
zJYV(x5&Mh9=n9S(3!nWV&zG`c^UBHp;@~{Ok$f$p;+n8F>82C8KXtjy6WI07au|6S
zL&|N<ui>|yYWlfQbwefja+zvzNnHX^Xf{`8K{7NkeT6sHDcbDWc)#32l-cavPQmM(
z*GWml9^4?)Oi+xPrv6S-GWUq1J}LATAr!gpz;4^!b$Y_enh9Q4e~%UCy>U-@F5aK_
zI5-lI)~^3~KST5n|CK*)U)*Sx`+I@hhobv`-o97<pS@IdtimdOxIjC*xw$!ubs_DI
zl+*xHx7P7wZv}6xx5jCe=>Gk=n6y6+S77@+1^h|ZsX8~UO6x@H`h7LoM80yl@exaI
z!L4uCIyyTCVYJdj|Nh(8*P@=2%FQ$FS!&!J=OofrqhZM%pX7aFR&G71<*?Wl9v`or
zSaG`DMqTcCY>!XLr{a5yupxz<P9`RWlHYk*`0_ZE&v98H+e$8xFSFbNo#%15ff+7{
zKi`HkshhX><JX*S)DjaDt7dz_qoT}PgUI;Yb}}yZ3w<EV4>5D?&yrE0!WTKcLZ>q7
zx=yOThl2=pHeI&`_z4D4_)vrL&zvyL=dZZs<m86^@hSGs52r0=zI~{f^`|hOTwI9O
zACA`dW{@N>gJ|f9T6ubUvL0!WO552LSJ+ImO5)zW-TyRHKxvhmJ6^W1{va-dN-%G0
zYfBu##L6lso?iSaMw69~Ps^r$Kg!d~tK7Jo{B%wt<YE88`nc29jELCl*Ey;PwQ{P}
zK(>ayIEcmkhPO!L-?cuvX7>iBqNJ22LG7h7UTITUO#JKO{B-}Y@pAHRKt9d$=L&{x
zAqCEEZ{EB~e^^;ue|d3A&Se@lR%xR@=Wf$@UI9ikmQzkdLSlY)w2c|a3=}%w8<gs_
zsoNGW(5{rbcJoe-{ps48<@s*%C92G9&|E;y*|~y*O>ZqM?oSAAwape8JJh;t4#(Rx
z<Vr@+sn+%!ZO=I^c9BU?mV5-$Sj~*qGmrG*p+yYk@j5J;_NR#rAsxWnyms23krwCW
zIjs)MCi`5t1+RJ@FGSazAB+#BON5%EV|i?nc6N3i>QepLhJ*xBI(iQBU|7b=VD{?I
z@TaQCQls{JR%2!D4c1mWySwFHr_Pz#+5BnWjdQKWE6k@G>MN|qvp`&^Rm+)~<t}!`
zrl+Mn(e+p_C#`NiVJO@jQrB0tv$KP*T8@^8t)dInqgi$5XJ-{xh4Aq3_B+`ce_C>*
zt3#Bt<;<oVy!LUYq@|_d;o+%Xrw8rticxL=tgtT~0a7C8nyRWdu$zQb`KyJEdf9S`
z1M}?>Oa;-2iB`$qmTi1awKLay%bix$@X5K}&_zh{Ou43Cy?T}C;lupn-6j1ux!qF7
zy+2I3Fry~Pa2ghkqLhx|5GJ*J1y9eqCr_TB8htLMIti%+a`fumS3uzA?`U8LT&P#S
z*Ggf7(ohEPP=)%!k^>*E-9mvdK1G8hrI7@2fxU<eq!VMW$<fv<E<V1hy?u$$*;b2G
zThXYc`}sbaR&p>)HUeR>-<Z^Rae_$l+>zX0A6F+7J}bh`$YZl!ecE$ZwtW1dfUAPM
z!1gC>RtxU+iPkJcY47a~%CCYwAF@_fQOR(b_EH!1ztd0YGLafU<^D!dQPF+YkMtkG
zKk>rKPGB~=vbdOaIOQRaL?Tblc6-ac&rfhWlie1{D22UsK+yGh9ZssRdk^Pprh~6m
z#%&r^B_+R{Z(qAbV7fEk?i82PHoCVmC;|RgadvjjM6NA_k_C&`AN>jp){=Pr`jh)U
zhE{TCZ%+lhKS!|MCy<085C}c@724|B+SU5QDP?hSaleJlhEpU+0dqnzM5)K2buhKC
zp76y{x{i*H<9az70F`tAAvG99>E1@|rrH=C1fn|E7K)Y^KBtkFm)CWfbTS{z3g)(%
z%72#R`brH^-eD*R63Bfi(FLHR+5YCV{!{TF1rQ8n-skRumuI_9d^UeW<{<$^X{kwX
zQXH2VY|3>GUpD5s2tY5@&L|e*`}fm*E>EYy1W^2&b?F~JUOnXoyl`-tot?c3Ruwj6
zu4-zUL&B_)vbALkLVk7A=dw16MLR1~IvRrw2e2JXl$U6Ip91H13D_j`B^0tn<#N~T
zBugJR0=&!jK3RDtCLv+5HPdvk&DyE$A(!x?c=&b@D@xm@KC`Y)V1E@Yzdl}x2LGxq
zS@n7)@w;r~E2fK^jg~-(Tb~HJZly%A=_5ggsg2{RGOm8S_!Y>OS5%|`cD4|U(10=!
zIXQ2py$%xryl1*hxk;dPT@pbsuZ`EA?R0i@cMkzHEd-#ED;7us%b@l-Qv^#g3<8*u
zl8K2)IyW~LGngHU&08=zDaLsCMQ(P|09gMFd<t%r(KM1ji%B|tqkK(3I5J+)qk<2N
zQZ#_T3~XBo2=2AwHX2E``aKw*6GmmdVG2Ys)Wie<FlVgFE*%@j<>o_DXg!aDC|2D(
zkBw>x7?;b&1n#V6^~8jZ)ksllM<JjZou6jh)!bHNBiFvE>$!>j`0-<`#tAWu)I}hW
zHBy9vdkXi}HH3YRJN1bkJjemyV|ISBPuxnGDBxC7S6`2qJOHQ=HEKDWe6jaT@NmMx
ze6psTrhRf!_wuyy(jxiM>V)+UKv6XvorD^fO=Nq-vvFTJ$(rNEcnGv3k}-5)X(@Yo
z*%Z5O2Pw*o7VQF-BhMw30N=Fu0G)z;+0qA;uSL?>*w~z}>R%3oJr?D%nG&EBbXPe&
z+*H@l(107dC0XoahF}?B8&`HZ*$T_f0FtOUI+iIZDV2c?n6z!dCysL2oYK8KK{w`2
z99D~{Iyiw{Gg}`kAFKDM6grx@=CwbhZtvtY1Q>~)ho>@&wNUTQP2Vip_$(|B?Jjl?
ze!2gQh?=_a=g&9z4__3uT2)$(K9MA6fA<=0*f0fhTdU5s7#n}YBqSY!gUZ^K)^HjS
zZiO9ICkN{Yu*~$&o@Ebk5E2r)xw<ZQ#d7l5&wmC0KZMlfFE<+$hvg-P{S`)}$)YWC
zxt1kqr<;xX$=)ZqASPLJ^YQko3s!p6wY2i)=Zz$xlL?4{Oli{7OG0WP9fmc_p*%Qv
zlE;T3{V_eyZLvuJS_iE2hm$V)!G_v;9qtEfx%?YdqWV7P<%_+^Y#`a|x0><FtzFhe
z*d>(|6cpf{f&v1lVnd1b=|+7rGBPZ8dR<M+d$nf|;sXjdSV|QQ*gF`vSq0-Nw69G|
zOVhHzidr0;Z$D<Yij)~Cr0pkboUl3Y{mcFRY~l0VVJq`EHwq{q1upA}7c6>p8VwB%
zaWCxNDMves(8<PSuxMA%^Yd$m3Z3Kttw6Ews*3NU5COOnQ`|-sF=Rdj0xR93uwD)P
zj?eDbCs;(pe!m1YL@D$4gK<nhi{IuyKRer7$xb#uJy>`D_2rql?(~lzVu}1N!wp`i
zjsOf8(F``n47shv!ikg1{pkz8ex(%^J$G|=AFlCsBfS7rs9G(+&tE$QiHwXydBo?%
z$k3fPULIbSxoqkMKVVOj8`rV>L5$ro5LAvoh}BiJyn!0X%*v92c@1RkkR&ekUU(zH
zu+`ixk9QV2+S~g}dxa2Sp&FY@^u3S${tDTddxNIk3^ujv-1U0{(mUJRxq~O^;=wQ)
zudSwAC&x?4Ow9Zai$TU+F^?s2@$i&;lLRTXEjfPya*?2N%jgll=Hugo#rwWgQGn%$
zGEWejPC^v`M0Ip^;c;<!K+y8ps7C+{4JPBv>S6J?g-=c+d9s`ynrE3?ZNDI;S)e8+
zCN|nK6K`E33EKl|4+D4yUh(D2m)sIO54WwUdXGb?&b+w#zfw_l6}G}JWHvzK=kJfj
z!tn6$IDQu-2PbDaxSd}>023XZIG{bPDq911MA!Vn0s|dg)>z@l$OzpFMAi}plZVAy
zKR>^d)9Fj(uV3#ZVE{GK!~)9zSMY6ic6ZBu`0#;|m37!8*((bMyO${BrFDFK9IWb1
z3t&$*luED+fE(`8Q%T{oEnh$pkJ#7>adpo;Kum*;jO{j-O?aUwDJgjX#d6(ZPa>0V
zlfsGyC7)xM3imZnH;dtc0VS`?^TQ!I{~Nyp=Y7^7{p1KmGe}h+NcvJld>xnj9s^OY
z3ZkK$&Bet9eR)28nJb?xq-xy=!T_7gW9O9?=pG@(Zt89zfYV}P9u#PniVL6bMIftw
z;R0fjr=p^AKNvOH?_erKx#<HkjRv_^2$*06NXc+8>-12eJRl)*?)J{OSX*1iahil+
z)1X)QQfIuvDk3>K8I%hLYokxo`*YOwy;QGUxk6mwv630R_x-*+y^xR|mVAK<a@v`f
zghHVV0s=Z<?Y@SDAd9*ULj|+2Ful+LAFFjK1cD(Gpob)u@4I=X^8poYrzLRy$s5e#
z$L5}ynYUOz2TWm=Sh>BkqjY(3PMbRr6VcSvG*)aFFy(z}g`MH{_IAxq2b0roH_s|a
zOJ!GA*K!D|b$q<G2f_{E;o$+2ez~NR6@hm4_V)hz`Eyuuvc6wit>a1>t#s6K^5t3f
z!RB-!P?)(u4#WZe@QUA23tB+dU>GH#rSZwD7pI%((NdFg=XI^jjEtO&jMq1A+<;}M
z6=*R$dGZR(kho%{FVz<-8UZ=GC-&m=OeicIrKs!J`AvCj3KnP-zbY;+#!ma}{2bzW
zWIO3|d5)Q=cF454dhHq(GFAY#E6B(M!C){Thf{A+dn(+glfl*>M0f8RP|vBr008aJ
z22f)L<%Iz%*Y543^;gF%3WtgeS{wo6n*o^yFj3lX?o&Vj`qb^(pFlFKiX3+NYW`<g
zajLHCau_=Qt#-fp{m)jdf1dF_T?_v=?rLkZ>_2Eymrp2s@;$tqtLB!3P|9ejo;il&
z?=o;*jUE6mf6I~F&hilCLfQ+@xBAM_f4@z+%T+~j>id-1)`*f+%psrb&*v{ExvHSg
z{$8Ua6_fJMpZEVCO!)tK9}lxwQvh}X*{r0emjrkpS~}`@u%<~&O3LG8hNHj)vZ&m&
z|J6-jk*TREb3o?6Xkd|-@6BSx$F9PA;fKFhG|;QBx%m~y{1rg4jP&$j{iczPjg3L#
zhJI`~RgnPPPa795d%L<+tgVrCb#*8pF94L@_52-@w&6OcqQQ5?%b?N(^brAm3Dnm7
z+?+!(0U-{lAJE|7XZ8*b@YvW~P%7$R5wzXN95e8HCv440Y1hg|ErCA~R#H+zL`0O4
zl9JLYM1!MqFJBD-7y%B0JrNcbcG{TGyy=^o+xh#c-xwh7BeCF)K({!pjf98Y|3<Cm
zwj7(3gain5e0s{`B>DUPFA~M^`6D7C;E9O^AOQz-DRERt+*^PAz?Qza1O(O?`#<Ag
z48G?a6&bk<YzjpYkq_22D|8Z@R5)?vJKqLDg#$!L1Oj299R1r}darEjI-&_I1woI4
zpFkKG7#M_1KyY3J?^v%tlTShcEhsT_8++sT*Io#$$-H@^U}wiJ@(c$hA|>?|RO>|K
z<YTBmdZTZRw~Rs2-kt;dwJ%t{uYrN^goKIz^ELBYS65e=y!dsT914KpU>b}--+<Z6
z>Jt1Oe0Q8^vs`?47Irb#$5h4B`|s9Xe=mFoTnxC3_Sv&e<5Gwd@9%55v<b*ccR|Sm
z4-Aw}uf)hmN-6>fQ&LtY29i7_<&mTksNpI0t8i@DY3S(k%F0yGIu5eR(ZARIW$C<k
zDu0m%b4N$VfGp4C{Ac$(E{kDipt)oi7#LJp32|`;g2*_z-1nBn5m#`?wG|Y?Dy%2-
zKylS+#PPoe!B09-vI~|PYh9(6{T@W4_xajm6L7g;-<>;mP(UcJ6$o}lUv9M!9*`vC
z^i5z#Q&pt_cESP!EKu+=b!MesV`gSXfxA;=Qn-P6ob0bY%gv8J0icS&XIEBIS}PFt
z{rIuWab>`Gy8IlKtK8Ass}59Md`+Y{R``O@cXM-tI5E78DAg*!mMee}=2uomQ47DH
z(U)J<oxtm0EXpGUw3=3hWv=u37;jO=SCJ*~jPkclSC5a5RB9VSLqj*}cIB$8s($-z
zK%Hk46%~n_%~qTAQh5RMpyuLa705&-15J>vpc1$^-}LD;FV6#V1Yf8JuyuZSJde6t
z0>;S11f;Gy7#02r$Adc{br3EEn~j$ZC8ec9)ecMUHcXPB78n9H8&)@BjW7JMTQkfP
zlbzzE?Y+HdXteI|_n|+duhZ#^Nqi=kpwJdVNvRq*8N+T!gOxm@^;)JqG3=r+;CRRY
zStkxC-PzS@sT2GjNZd$}NWjpVbX!V1fC3rE$jn^kGhOfD0xFG>Mq&589FVMe6NRIe
zQ6QbvNS*!t{Mv_xhAz%QIqS5y+<!B<6IdIMfwVk5?C<Xf>NU4a&(^j`FsMNR+)oN*
zgstrsbs8~~0;!OYkd&O98UV7m8lvCBj)UI}NCfH*pi$1hA-u1SU%v)SZ>6BVN5*MP
z540gWJ3H74&2)J!$Gv4mP-nJ}k8A7JIv4zFyfNL__$r3%;X@q*H*s-)?~CIl^20_8
zG}1g8Yem4REJjN*L7{<hz{RTXU%!6k=`60SnAzy6fp7>43gWBeDv$-%4=^k&FMaIo
z?aNp79J(Qx$=XWM5Ih}Bka#*SF0P7(#v66#-=oE(HW@7CDrY~Wa+HE{7D_`W(G=z@
z1DP09ZUnXx4Dd_j<>f8WD!}3+4b<3ptTtY10Bhdl;vE4cKm!kEr~wX+)ae~ULUo{o
z;BDg|4u8$fL98dLa0_@W<}es7L%eq2s;mMCMSc5lZGG~rvp1LV;^G1W=6f!{!NI{;
zzR1GJnCIZN;C;M6Y6Yx10C5Es`&z)$0|}Bnm@Utl^z1?bj2|GhBL>Cr*A>T(-$YMO
zFL=(BTLs{gcqo;Z$eh*98#i)orW+14CnyaLHzu(NeSQjpt=))&ftgtb<_N01>Y5t0
zqu7VQj{>?I0KGN*wCm^3oC8;^*6MgQgee0SHDI58zEHgI$VlJ^_2SMNx%Fu#e{%py
z2%rw2Dgx|A-kBg)tshSN6oPRgfMUkx3)|WC_r~Ab+9W_S1lueBMpPV=p_w91Vd3f7
zh<0WGJFtfw#R>qjARv%`eE<H!$oL>h&;x^2S)jNqvzZpcI^r7}8(2@9#;iLI0#ivv
zI9XcehX5mM;X+DEstk~$R;czM-`%@+4~q6ZfvhbCe2SQ?sLM11hy>)GML{~?S~D{<
z?68>1xB&3y)2x6hr+lO3T`@g3x0*yKmFiMYBA~0Z!o~|Nuamvpt%Pw)?gITrJ&=6L
zZ{Mb1eUt#F#9MKA)YQp=92?JnhfhyNlR<X3aYVnVqY;a<)9q<YhhhQ$c$wK#$%Czd
z>1lmX2ptqTtc?^OSg-LruZ7V}0)9Cl&%?B`N`jbIO&0Q6n<gb9!X^gxNk&E)HO}ki
z`o`s2P%d(Kd^`f6OI@qGmzNimBbHG>O+n$U4Rj39IN2S4zTSfburaxoYh4a-`EQOJ
zuVsdgfp&&RM`w>)*J8`4j;=0LYcK`yAKihUBm?v~pY6<hVB8}CcuqPF%3}J{B@!{$
z62iNUEim_fPs&t{{Gl`0hl>8@9ux{CK3bTbIBs5AM^URXY+jm@@@-1CzQ%fOiQn_6
z2qYRaFfk_W+9-GS_u(qsdZu7^z`ihc9q2{u*l~9PTW1wf*Z_-;js~fRDm9V!u;@5H
zJw(?ug4z!r8A&fJ+}Ql!BT1l`l4h7YaKdo!KTq5o*0uq&kFQ^?2ei6Tv&u?KN7n(&
zB!pUl5`c!Wa*J$WqUC`#bmCM0Mh3i5k504S!&l(JC*ThZ45XxT=1V(u=amRSYr=t{
zrUDF}(#lsXo2_~qAPd1dyeuEDt*J3z?n}k`kN^chU8#mGIAr6wfrmT<EGjr0j&%yL
z#hi$UNX>RD1+65=335Ki9H7W_v$YD;5LgozOl8u(ljQ{KhfK1Z`BPXc7#n9@fC|?f
z7}5uCD+SgfJDR7N3hPS7yb`7dRA9i2OzrBDS)Zs@7Ef24ot>=#v&29c?$6|d)QYe9
z(+c~IropKKYdRH{vA_qg6FL3*<x4nCKzC~rkWk;hzxMb%RsxmQ1(sC}aLyp0ZpB?i
zg1E(&iZv<Hd0Ys4kgdRQHQao{3ik^DRdPR=Dz>Qt9LGGL%M09p{^`SVSYMw4pa|59
zxs&RzU%z7gT(EZ*pcppqPxH$tiS1cs0`=wPaj#EgRFts~)>qC0C0Q;YcF@nsWnW`f
zS5Qc42_R%*XIBKM3Cjqf<j(_iXYb-tXs<*Es&vK{rEi)LwNAI+JC&g6+ad>)?&$+q
zh>}pCyNpcN4OwQ;Ycot9npDJEkH9O%+Gi#`3H<<U=Ye+sL@b}xG??*eOwtl&yt43I
z0PHmoUHL~xu2?Gsw6)YUG-9x9AMs3{@$q93;IV?9kV5868KZHj<|W{6Y=T_5b~6VM
z(~w+@3Mp`tb`B2IKwN$N_z~-CLye4b04OjrF!X~c0~Mhn6rmXSBP}wL2)I0qOiT!+
zkvPD!dwYB4p_2vLJm<ZaXY#%lx8f)eLs|=vKgFO*Apqw<5|)Ql&Qb#{gu?m2A^;eM
z0F(n9_q9{|$W>&moNkR{|7=S@xz#xL=^XGe(HoPsOa(sPJO^0g6*L!AH~87u<gr}~
zY}pJTb~O)yEa`pe4T^qX+{%Ey8xZEe0tzT|-#5o1CXjUb_C5gChrkYCT`OJZQ7AdL
zWsKLyOuU8JSp#a{GSB>8&{E6=wKdk02b}rp)hkA;!|BTlbpS*F(Nq8;I?eyQ3kIf!
zh0OgJK*UaO)?YfjE7w|QDgNE<07cr1HLeGDLh@UXfN(&Dm)uI)U+)nD!qNlL=kK4X
zn#&ql3!04!8yiTFMhI||-(~Xxun{_b{#2;6o@BJDHE6jB=8!WmcZ-k;)#Q7N79PC5
zGCTViU<<Ym0$d5)?MQ4$JQnp=ITir2PW%?ZL+r1yyYu)t11NrIs%z2bxx(PWKNVFZ
zfKLKSzU8&iQXpEw3JVL*&dzAy<!2M+7SX_M$dh6j0E!W$0=7y4e29sV7Ygq;%~y6g
zI|*UKeQ^yjG-is{{1dFIAXtI(ys)^~(bm=$zE4k2{|0M7fq`W}JEJ6(60h}YmKvJ@
z1cPOOE(Kp15(C&t3KSRz!?B=;rwEu>73h?zsuz^J_OC%}2h?glnO23>i{v-1U;hbY
z5G^e&M|}6eJ}8#4_A&?y1i&IGd3iLky3S)@LjgaTk(;|>EDq!W)`b}Lg@GVm0eA+q
z<{$tptV}J8N?KW20k9BAlM4V63uU12_}#BrMfW!7fo545Xu|?Llm@Ixe4t46`GG+|
z^?%L+T+QLPvDwUG{k_5OjWa<I1Jun+c++|pqz91WN?6g!Y0{$tL_5GFMJwrUkQm^@
zRw0Id&{6B7cN(JM;80@GD(k~GpZE40i2=n43+sdCHnL*>-La7nNJ^}y1U8D8oIH<&
zMN1`Sp)Mum<?3)j1fUAwTcXM>)PI}n?mbp_ofqKh*vbhMuYhNUAZQ)sY<A&Jr<DP*
zXkF)6t#b2B07av|BdoeLIY5>l>;uh{0{)=~O=zH{hE&ycRhJBLa_BiZOFQCees5`t
zJ75ZcPgcpXb?iY@f?8lHRk5|qeG<6*TNR=ERVv&L9&VHi1+Owo>}UWWtZ$Jd0mZ5+
zWL>Ob#6>=Y#i7^y06u`oRspFBpKJ|%$eT7<>%zpzsSJvwYU{>xXOHOJow=&2s-K`3
z1wEo{pleNjw6>Z9I~TaRgP=vrqOOj{Jr(dcu!I|ac6W1Y2Uo%zLB|^RJ~3$Kpj~@z
z`ZiIG#C02EZA*?=8K7ugGM%12eTps3L5KhiUE4~MSY2JMI_%y6Of4<+l>}L=sGFPq
zUH!esKwhi?xB@w$o$d%_Q>W5v{r-I!G|jLsSiQyR@oF$=l}}9y($b0nsylF^yz2^T
z?G<eDNJ@qW1O!M)OP2#lhIJc8iVZ<67OgV;wM06a6;ud{@7|>~TwZv9B+3KJ4XkWM
zFgqYw_5kkD^YUtfSWp0;3<3TJ*8sgJ2Vz}HRu)ckO7j}KP&Wx4=74_GkG3`o&>;L8
z8VY(cle8fJJrc3--;>DCcI1h$|MmSg)`MhaWvyAS7|%-(_|eiL3Q`FL&RC=aTi6V+
zxjFDo;isKoRNzQMItrJ|1Qgt$a?ex8!ydN*eWWV}*(%)LhQ`Lm>3)d=n<9*=BP
zyGurWHC6P%PFR_Kqc;Jb0BC_=n?`_Fu+#^Z6HxPDET;*o!OH{GH~@5VUcKlCa{}rF
z0~Q4&FQ4~Wl{GkBAPG%1jpw!^x_3{>Ky4pT5eB#{d|oGISe)&4-kh6D7xFrx2WlLY
zZ^(Z^`N8r6(5Pmi!+4`lBd9@DR^8j%+p)@erpedBvu7Mg9&q+yJb*Yy4nXq|XqRh+
zo?@-i8iE-akp9>t&tm;eJ@1X%GRwd#m4Xxi2Jndi2*F=fOJgkzP-bU?#)kOHKvHz{
zGmz<gy(CjH1#dG;HkLsvU_a>&a31=okO>q?Al|^#$DeUBvarBMEs?tyy$QU^3_&*(
zY5|o2o42pz_lcbM?lmbz$}nv00wORMKzvAEduk890Ja%rP$!R5h5D}?038ecOk)81
z3czIohS5`uUDXcjItVD*<?FWXoxrsUAfSw^x$}F^;xh59M;RosGzhxatfRw;IDHTQ
zE0@EuiG@G2<cl%ba1iXPpC_+;qm~0t03`zwL=le@!|(e6Gvziv{!L##=e3;g-@i*&
z@bly7<bh#;suCPoVD@e5`;U!a$x8#ej-VdKQXx1@5i&-LbM_cK8v~4eEQbGYhh?z{
zEW<}%--U&ROrXsqHb`(1uno_Ni3xDe|H~KRT8qrEo~*%EAt1Z#_K;V;<$edw7O-so
zFO$R;mmoIy+!JKV324ly0CfdCAs}@nsITJ0v4GPi;2GEj$7(Dmrz5XFZs|oBIqe#A
zG>}BV2Edw1phypPdVpif4T1zXjM+ei!jP-KSK)<5x_=@p5`YK*X<e{^JBNqa@MrIZ
zX%Iu^jZ`;p-Qpy>`}@k^l>QbuNzib{7L&{*fxuY5EqLY3ATl5UxJLlTl|fKKpFfDJ
zVg7x6!hqOq{!UG15-it&8W2E2cxWhLK|Ib2O2iN&7(B>!Ahwr5msTC@vk^yZTn*Qs
zpN@&)^LJvY4s^(|1_(H7^r!L#n9JmO<y&T#dHL9-AkR!!cQ?>-(V!Z51{fT)@VM>%
zICE$0T#_Sh*diy9&q)eA9t&69#)F`%psJ%Y5edEWGO-G*9O%UWp9U-b0L7tzJ$3cZ
zX|uSO5dn$62ZMt%^AA#1_wRe07gr7b<4I?~9Qjk#;(TNH_dh1TE$*lP{4U}D(@U#(
z4zA$JYeUP&92YIazxI&iR%K)q6aJk`N_bUuO-H?7t$<clrzzLZ(#+yq=@JJ978Vw#
zJvneX2+9iwI_j01dqpH79w8+yrK+Nn0lODK{m0P6`6l)YgL&e@<5FE!lT%jp`i5Wj
z!umRcvV{#W*tpEB2nhLi@o+(L2e3z;m^3;kr)ng;_ST;Tw!LlV;t&CJ+s-cD!rY+L
zlwOjoTA*l4PeKF<adgzr$st{no+Kxf^`EmgI6A)s0I?`a7EkEtpcXllBTzF%x4+66
zEp-5vru~t5{qJUcqwb0YTd$<^L|t7y@n??}jQrt13FlIV7%MCCQ^McfHQF*w1>6}p
z)x7&r+j)_cD6gvQQ_6o19n!QrFXn#*$D{NWTopq-L<~%TJ7e##JHLV1$Z`LBix*d1
z)OmnLY;W34M=yV@DPGdrp@Gwu|NYvQ)mH*f5*C^}B*r@hj&yYOvZ>hFeMvX}JD-`?
zUjwBhlYO>pN_eWECbMRU{9#T$ZI$`uxc@xhHld3PSrFBZrJ3cPA*2eLJ!Bq%7{AeL
z^Y`%Nw=dOI)g9*Lx{-#E7X<<i6>l<WNo{Nu|Ng#P-b^yLC!zjDeG7#^uzUaLlj{CI
zD2{{k694PhKo!UE=AN6`@E^3ajH&-=z`Q31l#y38|4YVRZcyg$t^V5wGV7KtOJZ0`
zAWmdCUGJi*Q*r}wF1}eBuTNBI-d=Rqd5p(g!WG>({v>b!rW}uU%KA1?hq=MZp&g<b
zm6nCPdgM!A@-v|&n?9g;)CjS0qH5$Yamtj$ld{2VtjYR8m{hxoR^j^%dmGGER)!S>
z*Q(#45!q3Vp4ao&{thIX&MX6*4wVRWq>p*ClhWFr2>G!UVSV`lCH{87>J>|=(_fO4
zoa|azl9!5mWm6oGW-~5^c(Hc7?GC>2Z|mA840|>I#$t}+%|`bUZcfzdG;j`^{rFBr
zoK(x#KAKuhNI3Yh<r>#z{N><hx&5hY@5hK@%eTuL&BVqVF+*+5OuCm{6idt7cGL7n
za3^;k$2XK0Ay!hZn?m^K0b)<1yfW3(dukt;ytY{TwmoLod26i5HpkPV7O9~G4-2D{
zu9xe`dEaMy12y~3dX!K8X#g9H7vof*#jVzaU(+=bxiu9Lx9>nG8_O=2Ds~FD{rSW~
zHXgN5qh+MHR=2Gc)~_uu8=vz=zdfqz_0vsHIn5T~p}f&+7%$yc>I4>5>d=jB6PPtx
z!h#fYCfs{-ll!S8|F2(}7S`%Us)a)pDg~YfFtt-T6heMKoS=<q<#n*nV~&bityEnv
zi-tv_?-tt`D^gX)X<27Z@2}ob8Y7*9I1cy_A20INLj<-y_TW`1M2)PFpDZ03#SMQu
zY*u9`p||RFUhmpbMt!4FZ6MHix3s$LmrKrJRUv+TiP69&jv3-j@SNdMnM6Th{34`x
zWt%W_wpV`e^YaKJu;7OTgYPQ8Xl<`>?c3K7Dw0O8Z!yYJo@O}fT*?=?uAd#g;imQ3
z$=OJX$h^DhnZ;0vIuGdH_4ntn(ME<EDnScY@n0WC;T|O`lHYZkwRsMF@l0AehFOIu
z8T2$V2f1{fc5#qoY(A_}a8z=3I6}8O<x0CSyhqhS-xx-Pdr&Y=%)1`AP0qJX;n$^+
z9s1CTvhb@LoQfLQWB#lt7ikHGnL3MFx50;kI}<nk<g~M5U9NNMQ3eNvvMv>AmPQsi
zm6p_~T;y2rhi80Zxy_MSB#G}?!!k~?5=)u}A9l(YBA@#tl1&5k$*v|0MPm|fT5s~J
z*c4p8Y#ERCvKX&@^VB98IU?mkg8$W%R-8H*A^k`RPdiVoflCjsf{YmAQL;g|e4Sik
z+YCx;t=>#rtrX5!cXgCN!eUXn6TV22)|bJy*P7r^!Fj^8)0<8-x?R`U{ZZwI_1EZb
zMChwJ<#PMkTB#0U7q^f1x?sE;c=g+3Nng=An+V8!lLyh*;$Z;uZRx$EEiox1TJqTZ
zdq6zbWrfz1DY0f`(FH$kXGYGrJ5$OB5tFe>hrO-@Q%&nt7JW8`^_G<ykCVkX54^HQ
z3F%?9r(uutBdx$_@9dF-$L^TQr5}B{mHs+51E|7ii#h!h(--L0#mq;Nh-nXbCYLP9
zhI)wc&TFv>VVlG!fzICBS!yOa1+#M-iaEKKweU?Ab*1=m$n1x$65EzKQEHzOJ+tE_
zt4)V;|DY4ATKA%LU1V-)l=_*66A$BFDjGt|EJ<BB@l-low&H4F2}g-hPlwWQu}UU1
z@-T=!2OSjF@j)(Jtk_0w07W=AWKoeU(}BWNAFgaeAk)py#aK`-o#I1BF0lvePgpM8
zOM~?0j2xLfm}j!CeBBEk#>lHN99Mk7J;<Bxz0%W>Rb~@y0ZkJLCk#?5boU{sWN_>D
z(Asv`a0{)B)SsshuFg#2KU)`{qbSxSrsCG)RZ!L=qqWjO;s?k@m(O#WFmxv{Iz(w|
zwkbhbF9%m%NKx<Sc@8#5JuJ}d?d=z@EO!5nrth3ua9=z)mdYtk`Bzq>T8{IHMU}0F
z^YXb5i@N1ILa`2wgP7calj&j6Kz;30V(rT`z4H2vH1V6$|N7m0u(usB2-(e2sJGkP
z{NT@W&R@U1$f(IDn0~Qe{cX1{waKz$tVYMCYy7b?nhJL-Z$U7-JO5LCF0P`_B&Ft$
z?v5v=U{}V=*DKG*ahXfz=NUWSh0~&+NXun)@seovScsDeuq)IXqaGuQGtKw&4dQBY
zvs`BB@HLNoqXZNxDbf};nbeS<)tYmtoN9RAehC~NusKx6{Susb%e<?87T+d6?0#Y9
z+|71@)9E;2R-Fo22QH^3W9Sfx`04&|<nbWUXjard>(%C-!0-3?^$pv)74yYl&@bZt
zEV<qb$os_S-IhapR5MB@Roi6s+bYyU+jRaa37I2)y=EdxkWMdAqOpD&Uz&Jcz1u6W
zrn`;xH*&R(o6p_P0#zgjT361AUewpXWxr}2`3mLD{lI-mxv<vvtG(9@S|fR7Qnus7
zt+Yc3Hfkgnq^>1G$gq<nWsuF{Rmx2_V)PG1ik`n~x7PE<uM|&VFm<z)=3S~?3`=8E
zB~w|q8!3YegDdJ)_rrVizS^>4+VF}CVp%W#(t&%sagrEe_v)@8%j)fFVTP^hxC7lR
zbQyKoJ0-UmL*+3vo7udfbmXvC+ElWj)D^volDET%fUMVrgUUP!F05k4R-WEr;;ktA
zN7^v8vpMRqm^s8_&Q`{{v+ggCUyD*bNclmLUhcZ$LkvkAPTYSdJ@Cup#C*8otIbFJ
zhR3*}HR^^J9BQI9JOg-MJp?Qdcw`(5tzhlybfU;t5)({~EH;?nV5NEeXBMwrB2*F(
z84&5RqopZaubIOlhn4{`Ey{;EwU;^vbfZzln$(wu4bDlI-VNP2eiMq9MWtywjSjv+
zqWL9ID|Fj#PxZ<S6qfq!iT3H0xaS@@?z#@;;PfEzz;+!WkC^d~hJ!G1)``cBx2F0L
zhVDOSq8QsB-3V3P<0(}4p0fhn;DSFmFecTdk$Gb`4a&Vg9MpK{sQw0u_!`%*kbet2
zJ-$pGInx*LuXbl8mY$d|r+u}#xMOvd_#Qd(u@6=DU6)Vj^ZNPBw4Z77`+;08(D%l#
zClPx4f*+m)!qggbZ|JLd6W?aV6x^*u4d7OjRD4vSrHP~69V!{IhwtylYZYxDQe4gA
zzNu-S(RVA$tb-*LcllvXnB0dQ!NyeOU-p86_As?1OaVh@HbY*$NR*@G&4@HTWKDqh
z*fT<lZw^ECqm`SQw>y_iS6Jh(QYGZyo$Ne(r?5<!PISCM<8`)ZKj}5dS$41PU%kiV
zyB*VT+hrT6v1l4;#kZs7-BEc%4%rq^aY8}JW+6s(>D$L8e1e%FQBk_Ol*g385Tr5x
zLE@CFwc<X{sa5RPbJq|OOGMqzw>xls3hG$TLgjhcE4cq`McZiWAD3@O*t|@6^-&Zv
zMBi=O!d39qTRe4Xa;wbJUG5syXXMdafQ)#_s0n;x8=+BR?Z~L7?R9TLCmO29`p|`!
zT;0hiZ912D=G*;R0-Lo`f`fN>PE3Dp`THd{RLTrWB%dsA2V`69?*$l{p2D+G?kw~c
zD{)a9&m6ib%~{EXD&CZ=1rN6&4m&NUQfUH>#!r$KStkmAZjcnXo>F8kNLM7#WD+Ny
zb{a>poKtky2&)F-!zRvAm&ZPsT*iA+yRE$L^To$wnnY=j=ls<imZsC@0scCr$=(ze
zK8Ea?w9dPfKe1WSm912azehqa*vAtbSG~v?e4G1?m~pHy`y18zOFNz%^e2}NEn6bZ
zvTZ*g8~u`>cq9&H)a0xwnW%J=w&F5xpBx?cyBkCEJfBLeY*6HR@0xOM_I;J-?qsIE
zQD&xjcz!L6X7qryA9W-5tGjpPIeTXkZCuR@Rneq+iKrWM5T4;D?T?h__4hBzQg<M4
z7J~;NJH_X?UTZf5?OTq~YViZ_8u|$WMP@7a)!t})P9qF0sFL!kEtZDe*;##<`cvE7
z0%EN^HcQKqXFp<wOO<jv!mF38tv9PIc++e_PdgYLhdZIoI74?i?b`!kGh6#kD|JAb
zr@kbQ7^%s6q)KriC^77`v^kVEPZ_IZ*S!>e*8Z^jWQ=*bsnDXsO;^=3ozA?AW!iMu
zqtH}D<&!Zgor$932BZ0&f?wPnSa(`k%l6lEqmWkb5*<<F#`NPk))}M+q^o#y<y<5F
zs()kpvfm+n%Fid#icxAkUh+&lWYB7}$c+u_JpH%7td-`2oErCYiWm&09^rebLk)}9
zW%DM2JEJn^EfFeVd7ld6p{BbY>a3!NigA}>qx;|P@k>0V4vMDT(Ud)QAUv*EN$p-7
zU0Qo4DJ`)-Bx7L{{D3nLGfVxo>ucg+PAdS5IhMh_;UJF=Ei)2Hm~6*LYA9)!;@a`}
z*Z@(yb;OzZ3mdF*;?|{pMnS}DHO5Ol)9Fn&ODKm5gGZF@lpmAdYfup78{WV5b()OG
zvyC2q#R%37%gU&t9IIx~a^MlO*bw$N?^rE`%tkGa2P0GOB(`GC%L|$Ef-{0t-95L`
zrNWH5B~_an%A{JNH4ZX5l#C(>elR&lh}~3v|El&@$NUB&$Bdd@3T7#J;yQ5dPRFkB
zG)o};XUGQ*Py07ddnS9cmbl)cZu${WW!@bsYc8c_MGq3n4|EtlCf{CsC0n(ffcAeE
z2vcVpm`Go093C~2lnKkMS~M9bu?Ww)Ff3(G+~^1LJ$Gx&YT2=tPN3>F1?h#TL-amc
zB7(d)K0AVu;>6<1K#$k+ZSQ>z6}QP7E&LWuHkX7+RdN)@pVUux7u|fHj^O?(cN{n`
zpZ;_vYZ@$#&oQx6!p&^6+@}b$G^TA^S6A~M^&pR@mA$Ra-8m-oGPS9(z}~e!s=>Kt
zAedMCgaArBq&DX^A%k1Rs&i$(U=R_I^i?mLf#(5@c3xD-Mt1BmEWdHc=c4ZkuOx>y
z?dlhr^@%b~vne`1$IZw3Z1DX%C${i`u|ss1U+>SIX5$-W+lntyTr6MmQvK6Zyack6
zla8w!e()q2Q|0pd)6P<`#8hGqxjfa&NH%5Rv+xh3RVMMUMpwB|87<`e6M(#K5;nmI
z%r9%@od9`rqcmAus!Sd|Y6<lmObn03k5-!FDGo)^o@(<^OEpbs#E@3o_&<w)!nX~&
zCyl~uLWU@88uj*F-;u>bT+W`vLY3yrT)2KH`i5Nbc{DtsvOYpqs4%|i`+mg!kjcOV
zb|xe}(59cPhT6p=Zg2=$Y&g#m$0WNm&hU39-e;NI4iSpuPO(?eEYe4)+eCLWy;2Ns
z^!leF!`!5qml~)06t4Pcw9BK+Fi<jUCn|1gff@%vU-2_*WW^^h)$j0_QUCs(CQ-=j
zaq7XQ)k?qn310j>zlyS+hwf9%aY$#KT}W$)e%hylL(_{}g%+=$A(**+Wez4aA`f<r
zf2h@*T0|Vj$mY%Nf4_H1C@5$FXQtfj*JcaNI~$uD5`CJKm;={dve=e(AUa-I{mR$K
z#AQBA_u566UAv|Cc9Z3o&i%a+NiE*<$7Ey3LG44n60sCnv)hs!ObUh@EEL*GHzQ;?
z%)X};5{120DAZ~>t0^5*Du0J5l48;Dsnm36vx+iOLe)|4UHC3J!e#W3xe<#NCT)sZ
zCARMbwf;@)OiRS09{TAP53RAPn9_ftn0hnNQf%o{b>j`B*;3^&YXyCEE7b-!Ewh)N
zvalq4r2@T=sp>Gqr}(%zr0f%{wzjtr2dS)lwKP5W&4BUZrwnf6p{6V^j|LN{M?3Ai
z-K}dDr{Xj6q53OBO?9^-@=Gie3nk2#1>auS!o<CEN|N4P#-4iwE<Oo-ce$OBh3ryA
z1@&^xP5IyWS<%uD6)_Pf40iK~H6}QSE`Q09uzL~?uhT6)hTG(fbj2zu;-hPWRO3$f
z9<G=9nKm=ASsWd=psV7Ej-yGrjJwaZYG!y@8fHrwDdJL`uT%yNgcXh1G>q9hoBjxh
zx6u0nsQlgY{fl~b71@Dwi2b*SA-!9aKcb5DI3;8UHiM_rAx@5EsB^M{lf9{u0DGl!
zLxlHbf?|V}ocTgj_6D7p+{H@-ow|ndqHJgRliEu0JHpwjW+E^(pDN9)$EtT0R3(ND
zS0RV9l4BZ^|N8O<?j2nVq!F#k^4x_d;c~vYkvn#@@zAPD=%ou=To&>4xKWOk#s_I4
zgKV0{9J3SCg%_Fwjr|i#Q9n86o$C1*byCrXVVl0aZ(v>1asz$rOM1)r9FBFmoI(Z0
z54m;;SQ2*=jJ@x%8KP}(P>-huvrXiNYi^oelwR}pd1@&+(7`*mzgOZq$~T#Qyji!w
zHJ7GY(bNA^xL5rZ{v4modGbLVV=j6(S*;48lUOFJPk*41a?&VJ)Escz;)5!;<f3I@
zz8J*G4qImlbM5*akC`5*nrfun`xUhu?!g>)+>3_Lbs6dR)|rJTc^ijtXT=lou29xP
z`nL;g-V4{fHz*|*>`DnMdP!!FjB1_CuC@t9Ye;70pvy7vJl|`bJ*EXC#t|~&9?K?Q
zv@_azZ3-Sq#xGi|WgR#A=npD9CRlFO6A9NW-s~CdxWnK$@M5251Kl&CMMnfP!VOYb
zwz094NqkD=7O;mZ9puVLS1m5xM2*By#a;GW`n`dv6AJE#pVt<T1M$G=lUSIDi&ox6
z9N8-Ab>jOB1=&$>+4tWQyikvr(mKjdBR|T*gWl1VFk9TC96O$|FSzjZg>6>7SGxt*
zfR=00x<tqbEm5L*bQB?za2uTT#|r2A=TfPvWor{hV-Z>jB&iIA@@j?L;>UhRmlsWA
zb(#*zo^ni9*Y+>~E8ZM$MjrDE?luxyv{@ya4KX>9O;2RJFS5laa|umK;+AR>+t%8e
zNj$0Qw)EP&9IZVq%E7JMzT7--*E6Xb)eo515Set+IAM)AaCn0!WA#%u<CfwkezERd
z9%3y~9qK+>_^V8@?(4<G{rH~r-*Gc^Z}5mtjK_3n)8m~PcRT*za7J`x*^P}X>hNz|
zTs%lhe*J^vqz;K-B3ta@?dpo<U2+g2Br|aB9LX-Z$Yf$MMB9-(yC1FP&n*yD=cKE1
z2j^uI2WK$tg0P9z?2})f2ICmhYE!QQPZgQ__eP@%OJ<@2N8xAM%ukKp(>zdUqbn|b
zQ4`qJqoamo5lO87!LRTNVo;rXT*(*W6{t~^NxnPhF(S~&Yg?!sajq|OT>k-b%j51_
z93sQ#Y8G6JX7Z&*AMY-G`0@Ig`u*xTXw-xER~UzBJ&<)l=kI7W@konTMd4dRy~P=|
z*RL5~zb*g$U0|<COY<?kcJk9y!kdwvl{weN=t|N^TOREzkD`cpDoHUP>#WlVSG2^W
zJyUpC;~{RdPZ_O<QTns$5}h;6sfVR>m!UMCqmD8T28^r8A+)ABUmocvlE188Wv|$;
z2&%ttBYaWSR+>vj65wWOq4Z(Zm6bwi*Va<-O%b6u>OW@zUTKj_XE_PdAln)Zf1c-$
z?n>O_gF36h;0bcjw@>1uCElKW_*#wVy}Xj*D)bKhOXXm^R>*@r#-zwy_hA&p)~e<!
z9@kS7zX#pR1pIkdCPY6LYZX4OpGsFuD*p^W#T2tni7xIi93`_Dw0snsw>+lB^VBok
zEP1=f<DeOvAZ5l9lg3?Qr8}W&44FtdyIM4H|LcG=P7^I9dc{0JHkvo6#g{#oF)H;Y
zD?f2YzufT$+_Kr??-PRuo&Dnqllb(=6RKNI%hZddbPN$Y)Z(QVGvbR=S5&J5HNv#M
zckrfAd4}Y~W9Yq&?=m?s_22Y^6?xoZqVqZI$o_Psb#(LXPHvW?H20}~d-g=CVI9*|
zyiH}jq^^w9&KN3Uk)z4fPa)Qfg?+2^Ud2W6p?Cbk+L8@d&My)VHS0qL_I|8kii;1*
zql+sS+`aBwRE$5S4<UB&&Gp$QmC@H8-F^A^TgeebN~vz)ozx7Gnu!Oem~PvEW-Wpo
zVosJS?pygJwKxu+5FaY)LdeZkHYhvB??%J&CJO~Mw%#C!I^f!BmiSTgjwfE1ur%d)
zgeIXXtjI3VlU46&I&b772=$WRuBmlOF)Uw|Y(#<<_OC`elF~17nCorqF>@?~@^gk1
zFO6nog3bXo@>?oJu}n<U#56O_BJ1Yr5~I8cbolNX0l#a?+y36-$&8ec5R-FEt|=Sq
zyLE$Wit;uz#;(-$?m~>M;i(TOAdX^I^&>gx*)GU8KOAPm7E>3wantO#Lxt^bZSS4z
zxU&53k66ClBHXCB%+}Ku@A1EB6II#!QbstCcF}s%m%K%JgoI#<xmSgIoQMt@)MWIY
z+|hRau3+7%-imoPhckX#g9-`F(%eH5S!2EUx0ca_|H?E+QNA#whN3r)W!yKfuW~Ba
zc`5#YD1J09dXoD2DMIR}+wAr5b$@YJ?N9UbB#Pz;api7}XKfK%@yL?5jtL9L^J_ha
zQ|j#Z&4V8?8j!JDjc2&(s1OUu_u!ot*X!SEc~vYINIP4M^svF0FpeCb-oYhMnc2JD
zN-hLnamaSO9L=dMS|9H~e!Le_rylT_KIo^!J;~3Z%TDhnp3}eocl9k7K5CfAhpR)u
ziwLFdTP$blLp8@2)$>u6w#vV)#4-QM={D}#v^sleT|xZ1fIM`Tj#~vH$Rk|myHXmj
zWp^C+rF~?!{#@p^=E^aOP?l!m-5yov>KN2rIiA08!}11ir+4?w`1mj=Td9@z5;LYk
zW)Pb#f~9KkVs4CxC4=_Jnd?YpVdH9L0)UH6`WO?-jCo||v+HG<Hz$T$igPyVi-b!9
zG#MtXWRJudo!bnQg04?m(d4Yeh;PS$Y+GFCM|iMJ=y;LUh8XH|>_#ojJs|0>sLLK#
zl_zUSHok0VV%Ck4>zO-LJTb?O?WK7~>3)MhsHygxZj)3z9<PV|#?9vw@92zPX=IzN
zt@8|Oi>RqmyoY`&mgRV(9NN>$J~gb%Gvm$N)WceB=dBUU!TexwNGJAC@wU%9LRxlj
z4RLPxt?P0IPT{d(q)cxeuiXfwHWD>})R1>guxM+r&viWA=izg5wU9>|BX?AcyK@gV
z?)X2CS}Z?UV-|9by|q_m=kd~IG?zF1{($HECA+zk?+e{)j^3~9<tposBWH^(y%`>`
zJdaGtvY88OJ5=D{OFmD&{+2IE&e5H_qV%o%BR{Lvd$hiipI3j(WznsQ3=^rPGhNZ3
z)|AsbAUdOjs+0sl8WGuLkD@;ov=a$ji-Hy^?!2bom~)$dSS9DjL`6A%AQ8Kh-MP<+
z>hJaXG~8lPzwy&#Ay44p^?xHu61GwqPg*EOeI!V4*}Dr?Os@Xq)4A39&y!k=yh}w;
zw>(stlPjujVX7euQ^s`=TG#bq_T?qF@>2X*mVn}?2%P4nXB_Ql$;<*}A<Jj2IjZ?p
zkxl-;aP1c-BxE}}=OtTDRR?vPUO*JY@x>uHT7n02(8zI$96Oa?eX24oXSRe2inq$F
z0<Yy<iSb;`;f<6XScR%sd41WBrU;p{OXo!t*PA`cK#1r?FNl{We=|l%RL{50qWZSm
zJUGI8>m+zmdp%<7DW7Gyh!nD<&lI^(S6Ib{7MQ({F<YdPN?JBHQ@?Rzw1AlVvA?8D
zW3AWiEaUNxxO`kxySB#mXEL$=Vx((q+EJ2x;+@I~e;;WpFUm}drJ_fb<X)x^9PVem
z%M<=kFkY4wCGpt4eQ;SjIc=t10G6fpYvh6h0o~bxC_N7jeuS#LIv#H`V|i!SxObUV
zYS5?Y<NDC@dk!uf)l%gK{@5M#^johB>X`8f{h_%ufpzt+@s0Yv3<A!%LbL&YSn#?A
z1Q~r$G|x(0W^PY%@v&TXQ}&1Hr!gt1N);t7jb_Xw|C<LxC$WX+@^p5gS%$U$9Femh
z4=XmXCGK3iReTfjvJGirrA?pu_M!1>7ZrstJ&{}%zj}Vy7_H!Pu$;LiF}*Ji`m~C?
zKDvFP25)>!zi39|d90uI=Q<sWyP<&{zwVhk&a`-{^|af#n~Q3CZQaP%WzCp-5r!Tp
z3{UOXNaO3G1Vr;jK6A$M`R(J*y4b;|hV25bY+7y1Y0fG#R!DQxO{3CBtni&u+^C#|
zF1!zMbI}a(H#fz;L&M;T;%+CsvG(>)-A3&aOAz1p9~Z_i<DJCb<EF96ESV*z{>c{e
z$w~hHDEH~~-1^-HAt;8pGTO{Vr!PvP<yem_&i&Tx)3fIxRncK=6h-Q4j_mHQo9ta4
zh_tdAd+X$9y<LB&wt*0*=JVTk>%BX~P^#4QAnALLXzElTZouFdBySUjl7H+_(m(6{
zJjF-@$9uid^N-^cHrvZD2$`RpFz|<0`xxy*%0d+iB<I&dth`v`-5rtnwU~gqjd`NX
zr-Uy`eB?$N{!wFA9F5gNWs$x<wS09hbJy-^yi$Z$i^ipU{0`yJgxNsC)UtTv)=i2u
zX<H0q6q#AR-d&#|!3PaiWrsT&H7_Vmb6yXJBHwm5FnQuYV%#VObKG6oDU#aT;>J}D
zTym*AtdyNqt?(hNl+u+@Iowvo)b-jmH^&KU$~L^v63`{~_X`h8Y=s4!>f>mAruZ8A
zAw@!Ige3ExIsSp(-J#WyeVLn@LT0ZQTT5sRaixc=b9gU=+!dZn8AA9g=YMVC({a2`
z->VNH%u_D+5-P1YESj3townS<`^NE@sGNa9*F{y?L0kg=4xMS(#|yti@_?J7-5UNm
z^r9bcR}UqpXxALo*{K$}DRN|*T+zQT0&w;t2UkoO!LzKXuFu&&8C?wAQ{oX7!OqSF
zTHld`#IMxa*v%f8mcvnJHf%^AA65BoJ03de=X9zf=c!-GA2eagUhn1WKW93%cgvFX
zY|%K!NJj=~wGHI+DBQ|@FwePzF&!c@T9|#=L{2rl@kobX^^4sB0poy{bShAz`8o^r
zJIU3CPuXR!kglr;F;VNvh@9{>w{1M)5b&$p@*7d2rr|@s>~fIp#?&8fO#GUaa&<I!
zW-}h%;I72yxP~kBmVQ{<X?W47$Y*CkIvD<qET0Y68vnCSK8=YgpPW8%&Fx7hmu$NX
z`r>hWpZbc=<?=>KL#A|(y%|NWIVNnX&YkV-oPItn4bAb)zS2o)jJo2T(|~(%AEr_N
z;{C@Xi7y*Sw5r6@2>N!9;IM-Bd_l`^%WScUULx#q?D$u8<Pyj=Je?ALG(r~+%x#i~
zO@7@b%fsR8xhBufBvd$Qmxk~J;!PcRt+cy@X5a6qo3T>Lbo?*&-YP1tsCo3nT@oa?
zyE_CA5(w@V+}+(ZxCD0z?gR)fjk~+MTLX<l!}K?E@Bh2=I?waMS`BBNKGfM&yKL95
zGL+5zTM;8)CVGTEKG_dr*PoQT!6;VlDZqe`l2dcLALGL>CWU<TuJs|++YozUMeVYQ
zYBqIa@H9KEsrtJ+GWFNb*MLz{ypm~Vi+&EiC3CjKHuk=3<v$(t+jmSy_NxO~cy#rE
z%bhP;1Vnq|Yr=S9L1G*x^$0~rN6Af0gtlYmYr;?X^{g9Wv$3WAyGcaH!Rnf6gzRq1
z5emrUw2wTT9jHsb+?CC27WU30XRx{%f|}m}R%J(_F9v4j)?aNIXJJC>(bU+{R8<@~
zv75}O=0#;JmL~4TMMBbL57L(D*Z9IytD5)J!5tWik*scfl9gPLfaO6-plkTR9ot{*
z=AO>V4u2&<Sy?;I*@ZZ9l0Zfw)J2b~D5^foC!p?~9O|Pv-;$K%k%s+^9>0p`V$IjO
zgGbM>eO7eBv^Q2LWbfn0NIN^_yaDAc3Yn}gU3RfynW_&0KB?SMyA|4S1B1reG&Qu-
z>Mi}#jHLPlhq!^>K*$~E(vV2>mS1A+A+V;%Ymtq?p|RS9Z>MO73@d!FEHZ@USu#%X
zO;KO__dHG~!%)fKGvsipi7gzvi9dDLbv;6gIW0G)RTB*kv6b(r81$gLnXAk&rUTVM
zg#N|%ICpgtR^yMVXB@KTB4}A7IArAdun+ZL?bs^<QvHEOlx}9<F#YKO{&#B6)Ppo?
zS40XHN8-#8ih7It^ub)(oP7CI*|)2Kj9Ivz6(^QT%I3=|;Rr^p*CNfygdD-?<Ha-1
z{!a+f4|pod+LDVpP^rHO7$#y+2FtDV^>XcUshPq~h!6s?ZLAqVygVfe`g2Zsb2A8L
zSbRKy67{}7Rg}y}`W~Z+{!rvLQwo*zIYC50|4s?zhp7w7@fzz&&=@P%Kew}cqqL);
zCNqD*M{rW4xkn-Id*VH|I=3#Pn#Iu`z)5Ld(&k&E+PyQZfW4eV=c=F%P){VXT<}(h
z-ISxm5;juaH}3xCPcGN|y+E{<@Yt9H>$hI^o*m)YiOlKh&91H-!k&#jz~6w7Xy@Dc
z6nSN3KX<+NK+57?^jf}^zs$*5i86KuE{?CN#9eu}3Na(kD?ybApKhNp7%lixnIBy)
zx2^FmFFyPkA9&|is@^-W$}q<CWthE-CC0<VK2r;cG$|mYN*yviuW+Z1{L^3wR)lgt
zKsypnl6D><N+GSIx^-F2UG9_a<#eTDDuwsr5{$_=I@~SYen#K|-&mno949uO!zES-
zBO1CbzplhD7`>;q_=pD{qc8Cia_GvJi)`M3&PP`{c_$1dgJFAuZv6}DvqBCKN6*~E
ziiw7-ZVk4@4wP}vu>%WX;183+y_KMlKj1FGc90xEf8f*?pzDmsYE0?@b(tyXUV;Y!
zJNbriUd<gx<Kef+uIV=pD7;Sas`XpsUI?e2EPYkJ6^0uOeFkZy<yM%VCaGv-4R2aO
z4~Te_t?v(9qlv3)LF03`HIDGi-To>kia~`W*_ZJb%cP%j$cs>Y*!iYi{xHtJym5&0
zdtaQWSan5SMd9}U;b=}fOgd>ndRh2rer4?^sLs&-s9o+>6tgBlQ!*Re|6G%58l8A;
zILKwa?I$aTsmeu{@%w0-fJ(c**4iok9K9iL#O^fPD(CkP@h36O?hJ)fDpMa)zvEe_
zZmr-+;i$0H-f~jB!B;|~6C7r#evM9Y0iNfWjC6xwgmT!<1Qo-)sp<h<SVOk{+W|v*
zoiDLi_~S>S)8B41W~MI+P6BG0E@P`EU1zEpw6pI_U@Ole7=)s_9H|Tc{7CA<bs`<7
zJ9g1<w|66)qWzCM5-J|0LmX`}&+l>~1aV=iGnU*)fsFm`t+9l*LULH@#GCr^&)w19
z>Q1kw%W@8GeDu-KJ{P@#1V4G;r;C&BO@6hY=bO;lc%vX=b2nIs{y4Pk^2K+*;N;~)
z*W*D`?Uf$n<oY$S=$w(WI$&O+W~#<<Jk&(`p<iuexu#~8+l99uCOp8rB{6!fQ0?zd
zzSM)^TdV}7Vm<wdYa@GA=VgKtB`cr=PLamgK}0%l)Ku#kjk8Q&e=y3zHqhY+l4qKM
ztvlQ8{&G9JS7G|01|UEEjUcimOih0S|4{qzAln#gbC&F}@*R?s?`0drmtROqQG|_l
zOD9?1#ga7?{c*Z`rfB%$V^6?8YJT9>q5#>mym?C0EKXXI<3Y*-s9!AAc<8exgr`4a
z!|`Z=y-+O9!5Mt^eHK*`0H%zWN)$$YDcTy{Ulm55Uoku5xj36axWozZpj^)XynUZ1
za*ZYu*w2I-zfRS0Js?AgMpyzV>Z-`2KkYH14~SZ!X?o5NQYT0)!V9uagl(8w^FwKd
zzGXKc#f6F9z<j~U(a{&mk?waU&u+?h`y2^sEf=Vf!0Srn(G_U4I5LS0%q&d#Ye4FC
zzHnhMaHl{BdgwZ0L7#<qtq*AojGlB2eX70T;!x<v6#_XV;jLC>W99s)Y5Xz+u};3P
z<vif2>mnoZ@Y58Ld$}(cjYSBVDmK|6*&>X)MB9gNLsu;|j9{_S&XeiphV(TB7{n55
zo7%n0CZv<HN@#YSD;l9ri2~V{{S}O-%JNA>qi(E|i!OmNPfN9}hf-tXfeATk3f@AR
z)-98Ac-M6*k1ptlA`JEdl4;X^x=YGgD>9oP)M0Bpf#t@8O6Wx)gsR2ZkocdA@4`m^
zhPc=|BU9_l)QGvl+k{Epc-I)VyHZ^~7_|3coisJ~?%Lw$$Bj~0QF?vc>O;_-6Ourr
z8Bz#4b`D2X;SGEDsMRM9{F{nX6Z!SK`3$*SQvNu=?V}<rH;4lr1d?YJkP4}O#NKi)
zA6oVCcn<2VYqG3>6L?iN6=p?O26yI9W?}o*V2B9t)Y+Nz=dW$)466XUloWD_5EhN9
zX36Q%&Xqbu|1ODC6f}I1=b}Qo^ddbmrM?UH<@oZ&SkNGXv{n`PcVC=_voj`^qUhJB
z7+JCGZ%MCqJn=CFZ-3*#y7T;V4#~L-=Rx!FAk?2BxzTmVF+3Bxqyd^mJzCZzSo^lL
zXE)j&Qxmk7%S4i;-|w#B&c50|AE@Wj-37;m?JJ;Nx5W1z$UTNyw)<B5$7H1Hg^XL#
z{&=KyVTY3EdbpmAy-Hsg!^7t2fH|+7A}E?n4oEfb)sBG93U<My?;rX^O1eaWEmL;Z
z3Jp2oUU53Am12O#g$;EU7MJ7XF5)#K!fdQ%4ZC|&zxkke^}~Q~wv138XE?zsJr`*!
zFP&8v0kjv>p4WXvH1r}0nAm9U8rdYgH$1^t$L+^UwTle+U9Hx&g0PR%+|?aXeuQe^
zBdjr;i%+%x*YCGd#Wc>*WM;^rDo_0HzYmQvgtqikEJXnaI7h!)0KaSEt#WA!;D#D5
zPhyvlk3&s#>i@<<PFPj1;*Cu`8Q~<v+$xNkby@agHAkN+ATpF9s*d;uP2u3qpZL~q
zl>`|?P%T5Tte0Zlj)+sDl9~r9if~{jA)OAY%|;nWPx|sXP5R(bubgsa%#3Is#bcOm
z!O<~UQXB532S^j9>XMl+c?cF5wKK(%Yr$*887s{H@YVjufU*s`j&;Pm+~-8$EopcT
zBH<DGRTz6TH(`(_?;)+C?B{mJ=Drc<F%~oKM9?3J2EbDT+h0sfmOU*jJXCB38R(9T
zN#m&ne8x8$viU`rO=1dK7$dEH@uJgvD8!8Wv^R#P%n=u7G*?hs)#M|*mi?@5Lc6e*
zV{Fy4y+IpUM95L-shgzhSXbQeC+1r}4f^r-b~KO5KhDg#ntdS`gVjx)vKDNd2X4|7
zI@4MvPpe-{#Ir>lQ+l<B;(Pn>mmFgyR+m!-x3r~ivJh9V75Oli>V@#xC%vN@{t2?{
z!OVx@EU~GNL@6O|SOIZWoSvvQHUkV#zS_Gxso0J(s!LBBZ5`AivRO~r<iw#^B#Y+W
z`fU*2XJPO?%RLM(;^WMI?`iXmHdMo$wjf2X>3!`wt>zEzm`AmuK48VRnCJ5wE6<3^
zB9rb7v}^z!VoS2{xd$#6AW=$iYJ+4cQg<KKG<;wlmC{Yo)pulHqcGT^7%37&$FmB=
zElLD-^9AB1Rxd<XC9v5OVi-H(73X)T`oB$J4t(aRtU6Zd_g;%X;=)(MyH9(;Bm7BK
z9{R?vCx;MqlJp?M<AFzy9l}vjkZ=EO9vUpfGtA`V;Wy<Uid>KI$qn@+4&5rQVB>5%
z0C4v0LmXtmK-lq~qlWNY1oJY8z@0#(;0ha1E(KsEwI{D!d-v#?UDwqC7?q7_Za=#|
zmHi2O%C56gjuGIE4;%eOjeD%}veLDguKI#sFQw4uwnGkocmwyGW+YgJ#kq?SE22es
zp$^3t-osEm9XXi!LjUSqlwVa`@Z_Fj=m97;&{gOvbH7g9l{SscBA7Mzt_ZlHx;XA}
zN=Q^(Mi+AUhf4TFr~<B~<+|^fbpM*1gQ*}n*@<Vr?KoQ57D_{e&9OR8KSF;HdCdDD
zGAm&9qmIQ`!bW7mJAQFf;iqGFU8lp|PkP3sUqm`CO7janF{x%Wj^ePfrr0_kUL>dn
z3sEw<-^~hoeeJ*#CMblk?3)Yk=w%%jUwPn*M#<d2sa+UDl0;;X0IxH?NGN?SepY?I
ztNH`fLBF8FUKU9;NtO{B5|vey+{hS82;y9#3omSscEU*)`@t-i7kP4%p<}%)E=fZj
zKxM1=j>j~Q*ErM~dRpt2cC^N!xIXf-y3GEyWA!f*7Ab0x5Kn#xpJ{z;`)!1Be~U7$
zupq!5C&Z$-o^*z!Cb@m1JshOYb8qM!xFk|74f0!bkm;%?9lk-aQ=pqd0IdIw<;Hq>
zay{6DO{@Qt;ir$omh(-+9B=-IpiOii)AASlF0s~%x(XuzI_3QPNEW1fxFplzpAWi!
zkRed+URYr2k+I^V0eZQflQ>SQZ&C{4C|(Gae&(i2yEc|Wn>VfR&X}0mCZo2Ie!Wnn
zabmP>0ROlqlQ{W7)tNk+a-u-5FJE)<qr*+vXCB#EzqNu*$$V+A0Ym3{l{q7G^Ug8#
znq@Pk)bpUoM8ErWY<wjZiZY$<^RtE8NB_*RPexv3i#iik?p%xo){&BZGH$6+)V737
z=VPI~+?NhVD3$5%BNhTj3BlCywFS;tefi&fkdSY&z(l_}g3^%d-LgTK5$0^0iIT2p
z-*pMMhNf!f>pn*5w&Rr^EpWEHUSpp!=#;8G%Vax94t9{<zp48R2`nF^yr8S@h}ktE
zT5q-x_exoAiq}wc5|EPkr(8ss7|duvNtRGT&YNBQU3%snQGCAC=%IvRrD9(wPf7EI
zh+ctUl|^jy$j)(G-J}%0sIqrr{ecj<AE5}BH{F8Uh;36=)7TLqlLu2o6`4#RcSJ?s
zjK$V{C4RjZKh9kH*r;4rFrm~J)&yJuKZ`C_SLA0%fuT-WMC&bT>ksbKIU1^{4zw=f
z1Jw(%0{!p9eYp8H6=O6URJ(fRwS=?EKm~GOlP*4dff3Bb;h+}@+`mO_geQ7Cuee!D
z0pAi`lKHg9rqk*@ZQWWzxuH+&=R&c1?*8Gi`x_4tmKT5>A3ZfB_A>c)$(fd?$_&6r
z$1?=gWVXdL(7Q1S^qYB#U~>4b=A@VS`4#lg`l;AK0rg&Pmbfl#1ndLd$?zlGoSFh7
zHCm0X5lbrqKW?GK-`4twf=$V|3s#@ZkcR;Cu8Tn)_0Tw6LsvfW(X~8l43nF(Q*>&`
zQ7G`Q$aWeCNA0-s$!gkUhLfF_qh@jbnTyz+bT!d+A+O1tD0$v_*Ru@5SB|&sQMN9l
z3gA=h6`>ko!E<*KSR-}>?B^$OO3aTe#)x2CuKt+>lH$2PMha2=Yk$ZBz^9MS+?8R7
z=u&?$l$E9BX^g+?ErSJx2@iiFmqZA>i;JDoYRQReDLkJ$?(o*x;L(-WjnSQFVzU_X
zA$)L1PF}q<*8Zx?Y$N&z;(EY47)UV1D~Odf1Gcw5gzna6g_u8ld5B*^ZXb`FTQ@k^
zc5(T!l#5C?@_f(6@OA!S=m9pmRl7I*X~|C1yiY3rVE8jP_dj<|6VcJRP8WoUd0rPY
z7qzzHviQA0(5fGydIIWjg#bQ3JgC4RVAcDq-kinckOElIWxXbaHPlY8os+O%io|)m
zb>8K3J8y+WI!fo`ORH02T%BN(j1KYga72C;l#<nP0MwCY>CiGPx(B4WvI7>*S9p#^
zc%O<@($>EVMvvcEt?*77kD+dwdF0LQmg%!q!c7?sHgGbg<0dw7W!<-Uv|#bwAFkTb
zdwk}8)xjF_xX3PYa~7#FAtYFH=4B%^heSi}lJvVvwV`ce%3QSH9G^uvjq*TPWRuV{
z&Lrby3&K$uh-+!=kLp8Pc>zb{@Q4?m@FUzMkH5ulX)A%hv&UXPl3qvErE}tg=(&h%
zA>TlSC3WA6`+hzsO>=Uk3<R)&Xly$z3961YKXr8xh7WRj{(P-A1*?DNzm%O?Q!>?}
z+m<>wFj6OjH!bkownlyr+|D1ln9rFWS|=>}8~HLjD~aI<dcUfO?Sfs-_QnAq4q1QJ
zZf~`!F?3P|s<6LS<#H7bE`nPB+Urk@nczWk0u%lPr(CK$*aG4uPulQ@WLDa~S*%q*
z|Nc~u{^)>NA10*mboV#BU*ZVE7%%P7lq#{6eMLSd#Y>IY?)eUtXbP*Wd`(6W8zsv)
z&g3Ko|L?VSZdzUS-QZPD#pWA(+8<>-9kCLoms{)wUl{_zwNq+<T(^aZX0a2s565AX
zLT_+0&^!TkJ!W-fZDYZKP_Wl1?m^S)8No}bTlZZO7V?BCIWjPvnL+TpsUy;TYvz>?
z5cw*a^1MEh3>;$1PfWL4P5!a{`mu~&Df+(sM~ui{tZK~iIGa%HW>z~6cD-xs2_M-o
zW2ke?&{WV5=D3D&Q0M!nydM+$YV-za3QJ6C$o`$#KyvTR0iJp1yxIZH?Ypa4{jWs5
z8*k@-J~Jz{rk~m->ztFty#5l(DN<b#YV7x&+b=v@LH;M^X>_^3R1REN?hMw4O)djN
zCj^ut=w_n@^w$K0O-G(QnCPOC;i}CTSux^1VxpX^M>B3iiC=ZU4D$FQ^Yb9v9Q^NI
z0HU*>Eu%#$py2QTSl~3)#l;ln@bBvgQLM$B)n`RsnNwg(TPRHET6!tbn?uKgj3Iaw
zs^orP#gd&O8A$i<25!m+DU&1j6dBcbJAJ{3(&e5EIePi$%gX9pV6R1X@HMYL^R-I7
zNOyGGA;4@II|dnLMM9hpu~|H_E>XANz#XRFl5k3?K}QYrIeuu}X*s`OXNIVz#E~?P
zu_Ih6f=#{Os5)W5gv8?+Qzy`s{ts2h>8khAmOFq93MZs6Em7J0+eaorQ-^>zrWI*w
z!Sz>@H2G70iDRGn%GJy~`6XXQX-&XH^Ydov@!j9IN_AO%#@+M1j%pF7k2;dF@D`jO
z@r2#bC~1hLCYuIPEOe<k^}gCJp0-O8H*01ncNNu&G3W02O+<Yq2zo!&{vAx5U#uB~
zS<7&LF@?(n>XA1lXch%#OowY!tk9duT%lIC$`e><jo0}xl9fTu)eKXzyRRX-osqa(
zmE%XeY_<lUIASM+Lt=Ln7tRp@GelFk^L+iSES1s><oqw;^_^k4rv%y{RU}73)#7{x
zQwbM@`z4&E3C#X-ICg7A=CoToUN(inR+hMo#LKdJ-E`6Mq-&(3Dbyl2dl9C5q;Cpx
zb~aVY5iF0$nqS(L`}sH>s~^Fp#KNf~yO7f5h}q?e5$7?SuP7POWs1X3=~Equev`Sd
z%>((=ItBC535>Pp3U$MA+Uzo`{f^+oQ1gvXVVCQGwYsi<&Y=WSUmI8XZE!)fa7HqI
zaqR;lKcj|!+1oNQ8CFvrPqJK0Ms_t+QzxM6YH||ApW+u`WgAps5azB-pR!X!?W6l8
z`S;IW;y6wZ{|T#qN#rPp%@PS*r#-^&*VexyZLfog7qUY{nT@TV@VBFIMI9Si%TusE
zaFuiy`jWh?a0fs5!)$Ea&X%iy{`8M@JwM}$AzCt}1aem5(A@2Z$>Ey1a=&&LRlS;!
zSe@|5`-jJ$_;~V#;+cE(BHQ9+V<zEXRFdU~vK4k0L?(yF_p!TcE}WY4w%D#siMC4q
z2DXHpV^D=lMc9O*F;-nN$b|FrFiMm+<Vh^0jGjv)s3Gq|%R}dQJ%SJXF}K{#(?wav
zVz2MwD|+qUQ&0}KiMx~pe`3QdeowG+broGFDwfWq;CLh;CZ;KnD(v41$=WO0o(QMz
z`^mN1M6cw$tCZz#;9K>saR2LhFRQ6hI20y~&&nI0T<9^X4u%ql--I-@=*J0oquL5Q
z)Z-KO&bddaPfl1|hM2v9@~J92QqU9P8-)_k(T7MeS?{@07kp3Jsx^>iyJ8kEWDPd%
zB;XI}@qwKCO2Uw6q}<M-##5%>8w1~{>})c8$)H^nsplhD+Y$CH@2MP4LC(zG)BK_g
zrT%__aeyx)j71R2>FV5p1hRF`tq5LnuFnyrZ=9sJzSyzzq?8Cg6tF$<@Snzv5Iz%D
z(Uk`gwQNaD(ry*>91Vg-8>}-5oBsxQ;DLqQ@tR@HzbYdEnqmU-_H6&e##({}1VLYp
z8;#Gb9uhc<FlJ_Lx+IihlZ)yCt8f#-?H9koXg0OkD`QHi2nl3eUA_F6-!Jr=_e;B~
z>c=OQE<PsjXr89q8$_c#9>BggRLkV9JO&<asY=l{*tblD47fEU<@enVlSpE~3OU@>
zo!tCbF(0=$?Nc9{t_&-)H$L6e&*|w<RDSu%)9$j)x6OC2JN~`*;xkLKsV)QD_mX-M
zp6-fcUnaGMMmrJx>CVg5iS{zMaVM%VNI1+O$eIxCUj3(}B#1-k*Mz(K0L^hSH%ER`
z|3tYDiDEiNXdlv}+I2Jes<_1_@t#K+w*OZW^YYJA?K3xN*jE!)YKmX!V`-*Kl|fbl
zB~@2NcbQNjFd9z&VWAuY@}-+Wlzy>cbt@y$YtpONdwzXmT<n-Ce`bGy`vu%+GT{1q
z9#$utyeA0rq;<vIlpGgkgpKfLo|Aumu;&-UpRNe?N)Bc_Uoe?m#!|Xm#hY){6-J?9
zQkAn!i9+sM)-73jtdjIV)95x}e_?n*@=|$zRhl8e=j&0FyVnHLY40HP_ae?#E9&?`
zIqRH)XCjC?<U29<^AM%guI++(nH|zP(?BC*MsDT)b;_Fk-zjK(SvR*Zx2CB3^IKrK
z3BH-d)~&Xz<#8g&k9oJk9)W-CoVNCtbkWg@m|cX@j*Xkj{hnjgx>mxxPQ?7~E$#1o
z8M@KW(uq9ykE2-eels{+Ga`kZ|6)f97+CNUuC3I+9H<3w1eIqh*q3%nK+nY6n=CDe
zFR=RBp0|qB!j`-?f}5YSS#^2&>ukfA*L&TL1vIC+G1m_`T7B0&^*mKvTYrc2`9)*;
zB6cwpnT&2Ei%;lFkw>*P#*D}sN-Mq;;S^WF34c<U=vT`$Bau)Z?y)rG(ZNJ7xxCOU
zDPx`*Lg*|*?)ut5hBI`=$NKlh<<WLwhHzuYwAGmPP4+J~iS#b5$I^*YbL^PZFJ-{F
z)8EQ<hRI9wJaJJZ57l*rgAW&0qOVKN+)Ba3q|#gST<x7Eb~(SrEh=+Q7AU_=XsLks
zu(mWZr)ePdu+e(=n-0YTYN^O7<5r%K5~4|n)tpa|&j^P%vTGFlu2ZBNXvJWvL|K<^
ze$g(Ruk%Q!wv|%6qvuB#iud?Wl6O{Y72on=cO;FW6|<z__8HrXWG()r)`mMl(fM||
zf}pq{(0KUI`PXKVu@o=~mo$2YDX#|!`^7gw4N0+6X)98a;~UBx(qEt4cg)ky^02=C
zTKw*a?JX+}=FD%hB~o@$aqq!6%b1dYal_MKD{Z{Z^mBBq$<$rRT)NZCXQ8lG6i6-K
zGFod(aLQRU*5;n?uRAUVzW?amc45;Ll>>&yKRuaZAWIOPZ4U;wiqFKdq-ayF-1~{V
zVgke#zr%_>4m6l}5WCVLwzRccedB>Q?HZo(x!&Hqghenxi6VDVW-mS3*B>O4d%|JQ
z`q?Z&KqxhFOI=M8k?^!BHKU;L@nE$w7a*qEc9r|@^*?3ycoUwM={-oYT3;{%zHJe|
zZ+@ji_;U=<r4-M>+*OmcWOX#fg096dp)T6_H(6z+$qdgKTaI*x8xhC@oVX~Q%Ff53
z_`?KvZme#g{zoXc$^DxtdI|6x|LPGaRB&y=1FutIw_*^CN|k8M_L~6M^|x|WHS=DQ
z;6;HEMW7b5%9%Ih<y$sX$O#X`nbiiW>S8<kp_PlG^axR=Tw8N>uBPg%Kr|B^6@BgJ
z;sRo5(b&5~CO7+uWlQQCTi&ros?4Z;^cGzB4d^dtzqs(4g&Zi9SduVJUd;cy67Eym
zmAR~xBhB$IEIZ(`m=S$)>0kpj#TU7Veb@Ud>W8#n4WLH?9OCkx8&Z5=yy<!)L)Z#e
zIpDwnfQf-(D)a7KU1vvARsgpynA#cv&vkmk0@-?{Y!5>cqWN7Y<ib6O|MGkav^_um
z3+ZkW)1&nW{fzs6w`sx|@ia}dFKvxli0LsNvO*5u11%>1L3uG}<z{n^Fiu$Ze;l+t
zY}o}2ArRbr>>8;L#+C9E=r1VK+7+8gvvsFyaB&dSeI_(T-1c!<D&2d4E~`PqgLcb1
zE#7CIHt%0^ixyo_ks3^F*|!EdcNj?b?(rY2h-u*5v7MDWxAqLni!Wq2HaG08579x5
zhcRelW%(jjW;pN^o}}7&S;4_mh3a-!{miqZ$@me|lHM|#H4>cyb?2J}aQsh|dnd*Q
zMHR1c9D1w(O5OoIz>SFFaa(EO0%5GtW!B;&s%!pyHS>+ju{ltgyXtYUkW|+EFin~O
zWWt`vwXc1ftvyjo*>GE>a-~RN+?T51;s$Y|o7fVl%C^fEN_$6w>G7CGWTyK$tWefT
zGi6gaaQ0{Dw6pcY=V3cv;AVLUE)z1r{o3suznt=IPy+JB6a$dsk2J|4X|0z@M$Lg1
z;CV%Sw1xYwI+oq9gM*`bfKWnH1V|h86aHj#dL9*{vRIR3^52O$YpaUgB@X=T1jm0i
z4SUCpA0f~h(3i1~CT->`O&a#6ma;3Ir{t`IK_2ZPb04$D#i!xc8Wdza^yS@zJSE@N
zb7vt%z``p42Ax~K$?EGd%bm9HqFWN6lnjlYQ*pgF@iN`IXDe-3aKyUzFeDzYYW1T}
zEGKx@ZXLv(xS95kQF|}nR2N*Q?cEu{^d(yS$C&MfQs+caq3NjP1ateu5Hdx3AsRZB
zjpE9^ZZAu(ykSOINd&`ReHH0!75{I~iJ8h(bAx6e_CDXYwwmB4#D|8kraxIBv3A4?
zqJ(tCpST_fF-NjquC0J6MxrSS-ZdCD>>T5J0$OCP(`q{J0z7coscA!5ZT2sfU?(&c
zmCwrN!#dkgke=i%>>)_!D{vR2=_Z3rZnG&TdH?VpDz20CI!p>I#K`YUie+FPJ#ik?
z-t$z(*^XKFkJZKl_N+P3{Gm3xum+TKq+|^K5q1P5L^GJ7iP2@V5^tCd3D~1?$eGmt
z6F#{=X`QmkD781!m2y4pO8lK^eUY0PepT6TU_1OAg?+p#pmz@`R&W>S&&;p|@7YpX
zspD5Nq>QZl27aDfUsoP>o*PYkqa-Gz7gN(toN43M2Pnr~I>G;(!O<)(O`IwHr`_ex
zI1gMde{03CgvFN*HzpvNOrp`!u4RJyRj{f{pXGl)o=rPYA0WPgyXon#<O#az#sgWD
z-aDW|`kDT6#D>L@1aNd_Ujz>R7`foSwRpUHLJ>>G*5P>OJm=k#iX4;#(`_*a?3W`|
zW72xqEEe!B>=S;H6~DsJly~r9LzMdcn?y{!G`w*s9+WJQqewUP0<yy6(aQ>d_zsJ9
zJ?v)7enuV`(%0{kpSZjMSk?%B2%r$OXLXAW8=%ADs_YLqciyiS_;EVSNa3og-ALJ7
z@*RK7NW4s;r1Z`|=lE<EiJbZ^fOb-bau-Ec)o+=Aj=I5Kd_4djC2duyu{40GgzkJ@
z<AsqA{uAe}0WFJPdy{=M);BYg?88Hd78}ptr*+@4GQ~pYvQ2f2ISJ*<p*vh0wq5vO
zQ{n-QM*FOQ^%76E+H}WBP1a1W<5ij2P-EBHOOC%}0Tm=aU+K%S(;7WJF=3-MH;1d^
zVWa)ft_P7(445VOzO%C#qtJrk?(?Af(u+S4ltV#FmUc=WBnebvq$|Jpz^A%jvS|`5
z^lqU7+Qav?kuel#tok5%M3YQB+_nDi&hSM%X^TtRn!en`j-+NE-IR8D94zLSXXRFD
zdJgUFKqGPS3F2o*#{wCWFH#5pDyXT$^wk8?3xmcr7dVflkfX_sy)V$`HBWs%-XUl-
ze;(AOT>gmO>rLUCo3wZWr1pMh;fb|59^$gfSqBu_0~8_{+jv=<Zsq!a{)$BU3ptgc
zkWyUMD@WVK3K8sIH_s*jOG12rOyVCxaX(vbbJ@>3EOKw9e8+^RmQ@w}u%(H;e3}W2
z`VtbCent!f%VNv3i1d7S2QK>aJz8CVT2vyp=7`)=h`eE%A^ascjqa3geTXGpPL}S!
z=_YC6z>oRNoe;7%S^uy0l-}_8^_no=)GZyg#pX?sEVN@&0J=Y1@A3Y}Uo|9(uutOu
z&ktvPfBoL6kK^__vH#<@6x|3|7dZdxlK)(Mhx-42{(s9uWVzuQicG@LKH`6i5+bL<
z?KwbNlb+gNP+Im!QC+hH62$=e9@$l3z<f6O|B`uOzV+MfxlTe#J?ej71H}tGvIoll
z|M`Dp`Ts>8(T+K<b<(4XS>F#1P6y~LT~8!h9`W)(o#40R0qO5p1dTZ%YFCaP62JcE
zYX;CKkF&R+ZZ?ks*s^snS&E;8@DW~j-^Z=rC@nai>5l2OKTTEZHyK^`PdgC;BCs=`
zT1@i%-noMt1Et3#LVW?0)86q80ZDe!<WUvl8BeDD_pe3&qg@mOeBzvfgM8(|Ee0c{
zPn~4RT-%2r*L=`oX3Nv2e?Y4e?q7F9mhgIpl>FzM>FF*xkAZJn{yYht{@xcXuR-BV
zKNk^kfD=Ve&t1d<U>>u`0FR`Q`%^QVJgHli|Na}@q)Dp`-TYUsIev9t95N@WBcx5C
zS_d(M*5-jOYxYKd6U;-~Fgp?X(6nWjWpCZCa8mLEK-netamj#38-DC*c+Tf;rk}T!
zt)jj$opDOHy%NC_K3;I4=1SQAV+C5TsNtx(`APY%>YxA=PvC+}wv8PR*Z{QdLyI-s
zoYy#`y?&}%8~1>IIxL>I$`;N48kCKl;#hq>L<gIXoZh}sPfoL)NBpk`9+Y@yu6K2l
z@SEtJ;0xwHVD{kBzQdCj8!8#$^pkVVhk}xzEbK9Z0Z-=5{tz~4^3-~M8-Jh^f8wJO
z9hi+IdS&yN|K-i4G@&Ck8}K3QfI&EY2Two@{^!0j`{&?CUs2w{i6sf|<J<Z{P1fL3
zLH6_3qvO$%bEB8bwhW<={S|0+y?9uX(-yS8pox2ZP}kSyI`XZX&k3nTT6px&(S55e
zgEr3800Cg%3N}adi8{GI>%Furp*c|gzp6T0M%}1($I`<0jo%g|vk|mA9Q?$SbEG^a
zjzJ<cwiN0sL+pN!`LXys->LDy8sNI_*7tT{KW9n{GSh)IGP+iR-F;ifIqp=rSwE)#
z0MXLE_z$+JIXd4J2e<OZ@T!|*^O;f%2EzNjA-!=dkg}13y*MHmA6>2X%^Z>2kBQ0D
ziT9LQaufvzMiF2`0|np!(>DBI8-c`lga@(>*#+DQd943=hoAiAI0e6xxu|upB@Rr|
z-DI~%wM_X9YFUcxZfpbglC`i1Zmc<!N{^h^E{9vUuS(w7e09DqL5b!LE{OsLbsbyY
zTSVO{1JGz~HYv7zURlOaRq;LLw9f5zE|0Oi8>)kQ2z0&A9SxnQZjg^fK5}l|Rbq)?
z&<#DfC{G%T>}0in+&s@xci%$i2j27NF9b{hrn?49BhYd-I&&0AIDZhulNIj%(Gz*v
z=*XM?cwp7FL<?8*wqq2ETL1K8!u61t#V;u;fZgj1_*H%Oj!&59vuIoA|2D1SNY0y(
z?d>j?O$lOxKOLj*Yv-knw36|8u5*9-%9A*N1dEP4NDOYpJWdYuNvUDW3)_u45J_Jl
z{B(a~x;F;ybhpA*5#@1fJ9&<VxveC3I!YFLzK&e_cCh0-0JiuNoKyO=*s_+M_dy-5
zK(voPE3`l2$lds5kYjCF;c|3dwUqULbpHlDNr-FKlhEdQ$CU4>)%)?+_RRs4ec<WQ
z9Y4@tajD(dY#Dll$)IGPMe!XMhAeyHQr{xLV|cPnjm(J@NhmNyp#s=U6n5}_!EFeQ
zPL_9+W9c$};Mt!5G?R?De*%B~O%dSKzU$h3OYWQcUyH^k*~eKp-MX1Szk{%G{8h&}
zWi1|9OBQRf;U66PG%ccM+0QS67{1{G!cB|pcz$zrjCwhaZ}B<@P4zDK)bF=?n;E@Y
zU&D4F&Ap%vkozY6AGcN%=PsPa4{+iN5QRcevOLdk;k=3FWT|t>4UlLhuzI?1B^2_C
zK)!d7ad=k}?df?c?z&LI7u!C_`tRNr=@@-?f#W~y2)EA?aDAhKvB<6W{_bBtR&W#h
zj=Z2rycarMYbFC5OyS81!{?aVHP|L%M}=Z^&0%y6-GbPCwjS~2zw3GPw^7-NVEAYG
zJdVWPym_;_7m^4O`#AFU51+`x<YBJ!)uJ5WIE9F4-bkOJ(!jMb?@bJ+R|o&jsrAmu
z%a0)?14U;(*u*z+%g?wb1-pxMI^E~gEW0{Nuylc<F*lEF#XJ2~u^}(;SYD{yU$+k5
z(zPD{M<8L4Zl3?Fw~C(3)dLn1#9^*TVrN0mc%td2nRR5eXzKmxGJfFc`FmPGtZtUv
zmMK*^i{8}J2l2@70$l<Z&}3Mhz`!UVcpI~wLHUtr_%=sD5_}*uh<arfcBnX%Ks238
z1=HrL0_~XZY|5@@?)=K)iN)j2-I?AyFze}Jd6K)~{gK=WzNhZ|uYCRJWFU{k(r-j-
zNqQK%+>+##PljH$b#$4q2{-k_7`R@>V8~x<-oP&s02iM(GT<DHW{n5M!|rsAG$b8+
z{LmXdzU_cSebiRec^_de8AlTGTG!MHNqMwRxQINB)6!xf_#tqb9eB~k6N#;V2a98^
z8a}i7=11>(D6u$_iH^3B#nN#!rYJGGM!m>@nlW}CoUt>%x*+j+qAd|3aG7jK(#E5i
z8I1$qGBP|OZ$+6+3%B_d12=jmMG0Ua#c;*}&)|3XV^MGls^L56EbU`i^vM^@&-zy6
z$cK9Eu--QPG_2u}^@Q=fEc*PqEpj-oG=1aQh4%<3d<2U=4{uL7zwe2jEOr5&svp;#
zYiiv;E=K|Hy}w-ZU3O7dp$G3Dr`bgJ{lIu#Ys0lCzQ=|iJHTU+!?(fn8ycDczWwNj
z^z&!o*;&!9ZKC|8K*$WFmb&{_Rj6yowAkAzK@Gi5uRmIk5GZTRg1Xk@SvRYeFtAf?
z5JHmk%$oQQLURv7C*2<hd~`vsI!IP}0+cB_L+B^xo#uuKKrn<}@cy{StnGP34DrbC
z?jJUwHWo-HjJ$7w^zUv$*^II}ys}cuNEF?I@4Hv)uv4gx7LCr^A0=gKZ9vQq_)!oY
z`_t=I{)hJsRs6fPsPy*pKl3&SNHNy(Y`O>^?};qk@D#5}otS=an$G6=70}5C9!`+v
za6V(peEQfNc<lIjly7KMw=!K-lIr)8kD2(-ic!Djj4Z$VyUdL(aZBKm8~Fzr8YzD;
z1E%N*$-6?r+o5!4yE&@#^!>lYmQT%2@Cj(Z!ELJU-1mD-5n<=wTJLVLYiauvK+ryi
znzz>)+(f|Tc9hm#_eVooiyWrY(}1*|mB?qJuVL#z@9Pgs;MIO1Bj_UmAg<IyA3ByX
z6N2*PHy;JVn>uG4_dGYx<xsWWa`WZT+L(h!$9lGn=X!ItW6gFAFQC@EJ=LVtx-`=w
z2pXB1vVfBjCnK~k64uM@4cr^Ifrcde^C3Wl6uis7oYRJpYde)W?N@u=e^Ac>=fh`r
zwU#|FRzg;9o!;o6`z_yC&wbM!4QMJn>iu<8?+I^9*4bypn&NC}+j;mmwg$42_glOE
zf&U#!T@4fUU6ypKtsi}yZb!5#1T%omYl*i=vVQWV_cw<cz9=_-Ou*2`DEnjWQB%5>
zk>hFFKHqPGgw4?%R?-Iz+=in#Jnc+N(<jG(@i<tUI@9E&0lsy=?VA~i=y_4&)iHN7
zVAW}HOWZZZGW+wIF&tm_*L9ce3PwuTOBBt3JTd3Xr%O_?w-q7+uCL{T=TFKn;s!bC
z1BoZPnXa$wFj<#?A0WN&71cWJA@}^XwE4TFJc3&`P4=BRTdoG<rb6Ub21gqUn1D5B
zbq&d*GGu1m#<=!RS?O`*^?V%O>Hu2PlbPDM>L)9nQTInV)ZY4KuTDp~*RM_mVP*L~
zN$?+;3G4b`XhITy0tk-|Kbcm+VNQgPE<!L+I0iCZ8OpBtO6X+;*y{P<udwDYKDGxz
z8191Z^P!FK5&G-!dkx=j`4IGctV5e2q>Mj~2ZN-qD|LZ^s;Vts8=h}Z&V`5=J!h7O
zF>J|M7u)|%QAz!F0-7x?lL5YL!ai`kuPuMqFNL0Zz|~+kbkr^H^a?dct7euS2-JaK
zhAnBe5qMlT{n6#{N|qN89SC?vFgGvp^7iI<i~c<EF%1LJx?V&eXc-_R=b(rIwb^0V
z)75ARn#qxd%r%#Y0UGA7Us`PGejK$~=I@4(;&Hz|t^u~4rA(qKhYXa=6=9}DUf8E=
z{%Sd%_<6iiQM#~Rqes!aXXRxm1pMT3z8WRx9}Z4f%va=jRMpy*`qK4Gnu*xM)=C!Y
z``jNOnCUIU?6auYd+Eu=zjnlLglr`6<JCz7SoZZV+<jYgS>*rj1u%8WKp*o+%uVeb
zYf)gv+!(zsc&X!M7+oD1D`EP<Bp_3;+!mLL`lX>_gq&O`{F)S@wvU;!QM0k2km?a^
zVdoySAQ%+ML~E4d^W&J8#7<7WWacsg@EvpVn7IY@buAF|60N7#Efyvnf%Ogf;X)5Z
z4a=(whxBEVKEyU8$-ez+Q?WtRV5CstJAN-mY}C0@K}SX`+frhi0|o9D?-ijhrLvcm
zEzHMjzB`U)u;IzKmq&K`dPgcpEi51#+_`@X)?&>sSALI-q&XD!kVrz_zrAS4Qy`kZ
zTP-W|1b|v$BAvF1Pa#rHleh}7xhd;904s{~OIm2KG{<|xCxJAxo52@#UY5Ls*~1q3
zE6=2!nl~$aeM$nKNtM0??oG?I!@u5*BwCh4nhD#Q^3iH(jUv<3<PU_k(bXRxn^qwI
zVQdIZgZXDGVTUyhH)HtD6zhZiG`Zux_!qb9<6arCA{-4wuNYbyk3=Dp5cWe@|44p+
z98j%K@4V1k0)Y80UaxkSm!)!$7PwDRa^ud`7;Z4k{ocO83zS@rp~}3vF(9A2R=lIv
z@`$!Jb8l5PnRn(qxRUM&DTkwm?ub#HSlX`40X#AN&fo?3Sl;k}WrFPOzp&g@K84RK
zdMJL*B%3ZdIB{%q6qlSd>@EI!D15{sU$|mV!Oj^z14X~y7<Z8ZXV8e}G@jeS$^8Si
zP8b&#m&5o?8=?%ZrzI+S1cA8gEtZWC-P+*wCp7fSlyti@cfSOIKm>dqXc77mnE~ha
zD|VaD7mZCqFC)Z|x#B(gp4)269`mJat;T2f#4ZADHVamedG$&zs+8xVy2A>5iWD^N
zsJY;`vo+vmBT<m;P&^fcE-NP5;f`wcK-|4?2QPSg>$OY9#<S85;hy9@flaBYnT1cc
zcdvK*-@wlRj>r7EW`@^0K+N{wr3aV&%U!#@lJGMeTh|i|2S>f0$Qw#GxcN5?S?6?X
zy{R_nQ~+Wy6xIx!#4sK5Z{(^2G$FX3NwSY?ZpiTG&PFGW<NUyPnb*XeoPiufGIB01
z*dSCNy`B0^H6@W=Fu?!s;<2Xy&IMcn*6W*&-hI{gm7Cq9`WB}s|If(3EISM-DMkI$
z@;D&v<i-)F<j<lq)Yt<(V!Mlm7xIXPb$Vh)(4p5dnCwg<|E_kQW|rU#{7p%+MP9<*
z{T)gixj4|5YuU7u_j-^>#`QgyEiBtdbe~b`{2St?Mb?LUbP+TG`Rg)AbAb&;rWCZA
zeTet{sj0!o34z$Yw~V}aVmpz7ZGo~1OW!mo&Og8?<1Qlm$#=mUpYg(sS55@|nEi$C
z+1P?7(2b&EX=E`cj+q<Tqs;S<#1G&|Ulv!eb@~uFwXesri!bGk*FC9_FnHGdHnU&n
zS2Hx>gr%H@h4~iae)YnNS0>~pja;?PoL#aC5^2>H&7he)Rlh*1;dzsF_t~DSJ)4aG
zc!@`Gc#knpO>U5&U3qa{Y|7)xt5SO59&*(9oUIiY-ZeHf<E-UPydB|TM?UJX$7e<R
zxJjsneG)~%xR9Uz;Ym`F?b^1Ngb$Ol`8;ogZhn%Xj43FOvV7QxjMT-PrHna>c=V38
zRJ|Hrr6x)nXH9r#I}g#q#>l$m0Lb0PcGk&`(pJu#O|O*9dp{=VxSVmb5hA!sE$@k1
z-n&gzGg`xVAiiCp<Keq*#G<)z_N$uC-J0Lr605oCHX{CZ+08dD21Ubh+76j-8&=`>
zbJbARM6N)Atrsnr)riW*ctlTDYL?>zIr<#C&L->5fBmkcS137%xjQ8C6DP<6h<u)p
zoD7P(y<={Oy+r}z!ppf&pL<i2d&WQXq>ZHzerZX4+9IIe9Lh>aqQCki|0dj+sYkTn
z-to6CbKI7!tiePhu+Q>LWIoTp`}8|R3qc2JN}!Hy(Ow7%5V0pqNz)a45Y>4*7sKpz
zyor4YAvo^sQNBMKI<oJlb3r?OJz}T!U=Q?qk#XqoM}UmYhuQH=5b3KglGb_HmDUqJ
z%4#=VKzQ2n_oM){onN(ILbyTjJEhqUM?={T4)^@b2AxP`WMmGVr={)o<7qUI*)Lob
z<>ino%Ieo0j;rc;ef_`?9udB%0~&;#fksL!p`9<%`LPwK^wK??Kz0oH(4T;im)q{<
z=%F4inT}G^4XK4FLzG1HwX{MZJeefXkJE(C=b>|<_W?Di;$pu+IkxVe?|AwHLkdw4
zNF;8M<;Sg-Lv8>UlUB9SuAn2RO2_lxN&;QJS5CU&3nByoZZXxLa65GVabB-~`I=i{
zR2FU>XVO8EKkMqnv^td*W=Vi|X5Aed8Tk*-)llkQ!ecskU_nX=V|BEtqd*r;YapCA
zW9%4@KZS`BeZ`Tkr8P6cwIyaVlK{u`lYo~~m$llLzKA&5F-%#PxzW1ZOt%Ymn%9KS
z#GcrX;AeuXY+|_rM=lZ%+y<8U_)}c@g`F8}!=$cI0vB^M=+>34Ma397r1sp4CxK9t
zmelUh*Cuv<)qt~DheR|Ito~~iXY|le{k{`C8~B#wq}wdRt97r<>}@<{WoYO>pTia~
z;oOmFg$7_szT>1U8AKvLcfHI$o)Hm{X|Smcjf*=;lEJVuFwz&!ZT5{4JFD)&^@~E;
z$}L|oBp4>|-QiY`d+g%*qnp0?W+__l(K8=>`Xb&ZLFBjEtcNdVh4)om@4CNwn#nS)
z>i5DwJ^5`U69gZmzK&e8I$g1!6=F{(QD|4QFPN$ck0eT$#NK8XPGXN%CX`CCL`ARP
zY6-E>G(4K^x3#)bWOT0tOG7(1#N|UG2}J((x?oE^2J3Nf<8S;rxD=Xk>yoe(F|-@$
zS*#$_io00$E6^+^NHwSV3P$9#INE>Poz2v9uiqXuT!_9|D!p-OpZ0b#Y3_aIg@NE7
z-_c?E3JI_Tqit)EY}qZCd%>-(h{{(5#k3u7%3@z0RR2oG9z?&4cX;U_kq9Kwi+(km
zE}WT<W_1~3#lSpXyV-vR?<+0A3qVW7brV9?^W40W{4b9-y-o<-1`YvX5nkYr58-Pr
zpe^S1)+NpFS_Dc%Ljx@)X1_P$h#ScbCbq5gWt)KD6ulV_M;%en_u*{u>>a|W0PQxI
z$C3!viVYn!S2NefCeT54vJbvMXgs|g{`t>4>wPfo9_K@Qk0M6<rILkg9o4JN_lp)O
zTSAv<?e?Ip7e0rU?Ctl5UWCCkGA}2LFEPJ<0Xr2gWz+I&bzplOuiDF4XOI`wz>q&G
zDr74ut|d%Npj~wz8#FYyK7qKV_ZBA>9@icP)lycGgD__kYUw(fZ9GfB7ar1*2RX-u
z?83h~8AGpq3^5e62&SmM1&Kn(EfE(>(80jG1^0i&<WajE7y2gl2R2su<=&bbevB<8
zpo`he0#{XekFqibK9)Ru`i10Xi-$$mD~Jd4a<H51<g<)dLbzcP3m}sG-@4RX_qe6u
zcaUv)CQVy$z<$tGe`J)<=!^F4{1Si0AHE5@_D!K!kTSec|8mC*M|I6|*DhJmWRU-%
zIWhlGRDSKGEaRBM^^NEa{0!E_O!k#|d6`JF3juB8Mlc9T8`dsoyU$k9`;ByEC`*HY
z_%_byHY3AgIKeU9LEA}L#UKjVeSf{0v8WmaEgRQ%|1mW`N1_93lo<|iVtm)c(}l0+
zeXgB((R0F|b|dj~<cGtH?0=RrV+@F)G9*YOE<m}bK-!BsI3K6M5onNeWdyV4vds^m
zm0Meno&-2gqgU(OYts*$3dNox6tRAV>=?Jm*)il^3-k!qnn5`*fRS_W@_ZFn{Y+x3
zLEE2PTWg#K)n%_ah;$Z2Pp^>J?&2pEefVh0F!!eOKvW~~^}{fLmeS?Gi2ygqQ7h1o
zX#0Gs#rPiaLC{%q5db@S+P=T+=`vggv}@QLA4hFjR{&#iRWKMjkDMORV(O$9)sgn+
zzds>jVFS#D*ev%hFE`D8dV@`2xmgX>vR79-Jb{{zfzJ`+^h`{LN7nO1c7TufZIk;)
z$3+eV%y<e0{*qe5nnQ*nZVrym1u{X)>+9JnMN*(y>2V0EwoEBs#P0(v0_F2yN4Rmh
z+2M2aT}34Z9tq0padWUnVEI2w?JnC0J%5T(oSM>Q=<6l5BLf^D0F6|c4Yg-I1o-^T
zdo`+%sVSbMw73}UBy6!RbNj4vjGh73Pv~vUQQL;JaGbt2{oUNF<tliQnXOkxg~ZAR
zEB2=8^9sRNJ@L`qBUM2K*=E{Af20t$n4*@nFK^EkF2B1T{Fe?gwO=r3N-YR&bnASJ
zhQFDxHt&qd?$po_jjjXtX1XCRG+%ydhdc~x9L-)awz~Yo<;Hhpgy!D`*yq{aHs0lM
zWylEM&HOrlw;1G)hlIJCec_$x<S)57_ecV-qRe3g(NKSKU6)GYx>PVs5c>wue(~U{
z4;0G(e%`Ed;il5S6lLy6_uGF(c3VsVRbfOa_LQ8DCI-vyE+Nk9<zwF55<E=A;6ZfP
z`sr4cH!1VDXHnI{rN-h)kW@fSCn{97hp%VJ%ntt@TDz-PLp7u^sd>{=K)Y($rq*ST
z-qx><-I>&F2c~*S@g63(C)nQa;XA8EI(}_CC0fjBv6mL>L!YK*=ND-9*zo1W(SJ6b
z{|T@m?te~(z0o&gRT$KF;q;a-w12I2RjNc#C+75)*!0DYj6-KuGTM*pL4wco(B)sX
zrM~Z*EJzc79fvYxTGABZ8#|RzRNRAbtk4L#TOD-mcWNxBOV3ln!Zi_(yTK3Z$)th|
zH*!#{9d^p0|8WlnP3nKAWequAb%thgIhgha9=RSwMhuqNLVSft2rInZZuVylx*Lq!
z1IQL~{=F@B4%q$rg^i7kwez}<w(fMXiVx}+4?%U)VD!V)=6vzx)#f5{P_Wn}o?7MN
znyY;U`#h#tW-cXUbwSSICpD45gb&OC-dk2#chNdW#62t!WUQ%9i2^BOdYYIr*>2@e
zNM$9BxET1Q^;;{UVg~zRyMlMuq1h|w`K&>}+f-HzSds3u(p7r{lD|**Mx>?H4VhFO
zC^B;6gLfD-v;wUfG|&edJ7YSNA(+XlDknW~Z!d~F=`DHmiVH{QTY{0hE$@WO>mL^8
zd%m4%PX?R5&)@Z@W07H%{x9y{DLSvN?-!13r*YDtv5ls&o2E^}E4FRhHXFCGZQHhO
zd#^t4eLs62d`I8W9(x{TjI4F7xz_w!Ki=M1dg<Njo-Y>!0~6o^YDh2N<CTq?<3Cbe
zc&^@A<dwerUzmA~GWE}t_&FkM&({J&4LE;zx!CMl=o^4WBjtbftg}{k&bZ~OBXNz`
z>x_9t)u8Uj-szkw()(-i#ia$pe45{*9<w*6;8}+6%*7*Is#&trnUW|e|FjNjaB*0D
z8WJDZS+QM~L$V~>u3UVMEcE>TT~}ATl3mV#vxVWo5$exS5x4a68Q&Ub7p%X|$q*%n
z4Z%<UxnX_5iin+^FBb?@FFBUCWd^u+JwNh@nBlV6Q;RHNP819lj}aHqZ3Uv%1f*qh
z22itaw@c-=ckQ#TxmlJwYRb1?o8VKH2Uc1l$8)T<B26gx^(~ueG~S8Nt(F2k+7T<z
zc#q@_!$XSPh_8>a<=)SvKJHV+wm50GI@rmbXjzb6qp9SO!=23p3gxC_)Gv=_z28WH
zdGcKUbXX1xZG$vs-fc>pJyHN*D<AKc6A{bZ?^m5#?v9E9n=s)w7ki04sKN=AF)1v~
zGW}cq>EDp2)_dykl>hF_7H+%ogeQQ@Xufj6Q*E>!rIWshy0ynD?X-Lf1>=fB>xfuX
zCjPkUH0#wb6-RZQw4b>-%B5ulh|{@J2Uj2*^tQiyRI;F$TV`1=IN!FOlA~Q19GtFd
zh`#t~nOhX?vN8fe6ps@oD(kcOo0~rMOWZty9{jnfwlYpkM!az4GAY01M_jHrY!hNO
zmR0X0VF}dE%q3X#>D347=QuuPzTBbh?r`D$NSKBw9nLsIyMfFvOQ4FID2hMGW+Q_N
z>nx@PS&&rLG|!mIWygc%)Uzec^8hL4ar1@L@Z&zfOgeq0dFz+1Fxvf=9{1%a(g&*k
z``}#4aH`mOjLU%ZObtA>=5<BW5yQhRQLq2QuNzcOYkjZIs{6T^`^)AMIQL}roKg!R
z)M??prwFVjpEGnM5)6@K<n+fw_ev82z-wfpHC!EKn5=8M<D0@56a*8$hs)`L+}vQS
zszw}KjuoBrMhmQC=Ti0D(ZKS|kOU<N8{3;tcZue6EJk>1W$A?D5ejEJ;XkGfnh(k9
zc52}6wXzBPl(sx<o2~)ok2mZ;!{N==`=l;Q0<;O!*<GeLGWS#E@zx#0moD?2$tzrs
z5rLH})&q=Qkph3|y2ru%Y<MG%3Gr7yTv@2aaE(XI^u*EST;0EhBDeqytKkS-pg4@M
z4X>G?bJWhwy)zeS^mAF5aRo7r_aW!8!56@(QqUd^SZyGy+Y4ef<JRlzkLD6e;3#kZ
zB6R`=;of1tQ}GDE^)3fRU3|@!T3Q91IsIu|g2&s8cAcGsONV6u)Gt%<n-{~78UW9&
z!t1aVef_n%U%>r=dAe}bzq>LE`n+X1yXy(SUj{(QP&J$EcLxhI4X1XFXwST!m$jVF
z+rt|5=46$<u1x&4UH<rc7B~;l2)N$R04RZ-x|c8XJKO4-$=NL}gx;R7&woRN&WF9Y
zcuu3U;Y3_0-sSD`db9l`)6Gs+fIlbReA{^UI)(+!CnmZRZi{UgT{GMRN2I|OQ7=eg
z48?X<UtxHNFjrWT+etgLs;Bl+Wi%r{$*M_c!J07GQ&Lo7GC$OdaC|s1x#6DOwQJfS
zuRB^v_h;k?NkO6WYkk~GyuEY<>%;r)yx-M$xYiET+1o0Vby~dn`oIt>rlk5i3VbWE
zVPE#qXOMc?vxqvTvxg2Xr9dFdc3B$UN-I_JxA-34M#RONJNK97m0>&uXcdis5-sj5
z4#&8T&C<a@b~s`nVRg?5AA;0dUQnrZ0%h1h)Y8E0H&R}OH{I@9x>T1QbT~k25ro*(
z!@S+5j=xfBIG(K|dYd{PW>py5BAD_E9TTp$Vc>a3Zf!{=SZhXZg{_s-u0QXifY1EC
zh5jU<6z^AO+@ZA|ZRnUtIa<+pZz&A5r_Hxx)8mh{wb#=cyyk+NwgeUZ^g>jggkH?_
zV2;~IC`_f>se~irR-HbuGLMxH_LW>+bSEzd3`6n$gmqeSU$69M-J6fnR;^4<2$3oo
z|2@A#m%Mp->-)))nvc_6{7whaqr5q%7c5%R&Ggk7(Q4Y+XWDHHvcg}hKYU$O&;O31
z;Z6em7|t?T=i8MK{Emf`ZvOyW&IkJaZyAG+3p5?}x(ceJ&+r6Npw!Qj<$U;S-C#%%
zFsnlz`^(Pn-41yc=E-CSxE!%`YBc6lr`se<{Nn`W?_mcA!w<*nZ)@6Fqsp%%-__KX
zb{zhama4M&dFt<_2qFR3_+_g!dFi2_E01Z-$-K2X%0_^URjste22f>l-@U=XprD~I
z=UN_jcK5D&`;g5SuN+oqoD>o!#^V7Q3gDef-k;%GZ*&MpmGh}oug<;p)Z80odPEZY
z2smE`;5zS(dznsn0zSDb5`9+W-~YTm6P>z_>=}>BqoecKtuP`?PtmY5Xb|#66OUqH
zRzmeMyWC~D99ntpRW7pOPtPN$gnV%LF|2jRt5)d<^GBoML%GFyU?>9SMJWiiG(wY$
zUfI31gMZ7kNjz0Gq<9F*>iy{6s>RKV_xGaw0s81y5B1l_D^|Dzu26!4a8fkt#osRR
zGtJ3r4y(->I|cblz5zv(^D}_c*M!#n@BoK2EwAPZE@!%!iZYehGFW8}G+)E7*q#m+
zHbdggX+NQGqaTzkN`Ka8*YVtVVod4^<;;ZLpdTo#K!7064LO*y7FD)_Y-7sF!=R~;
ztIS7hp}^yCRB8Pb3IqOHt}9!E9Xchon$OxO^drbNZeRh{Go}*hWbFWR=~Q<;^NWVc
z0nu!^jt(XhU2E<WI$S&!_NF<ZmC3G0p`g~gJYI4#ww%)KC@ro;6a)?(j-OKVxjl;E
zLyd&<TW|YiPr6&9fhY|O8nZni#Syg9NEeo%)Tgck9$&MZT3NqzG7Ca0iw{g3@{c^I
z9Tgb7*`VxdN4qFbq!WV<pD$PKR=-l4btu`-TrCN)5rVEIM;CscH;$2NcR-za9+B|B
z@Ao%a%|YcfDS4dZueyT^n2?;ew0aBizheE(WAL3b_SI8iRx018hVB^{$oXXe0LWky
z-Q8Gc+x<~?_nJ;ugGWDldNs$^wVWrnfiizxQO)e?#H1wLI4T7gQl3x0E!`|AZwBYJ
z)&tqhNow));|gU>s>&s_QBeqKI!`T|ye&*j94e{-pkCy<@k|dPzH6uI$71%iR4$UR
z-md&4@me$G{l)D6sA!%s{#drK4yN>EPwMwAL;Jc-H}0?+;5-Ku!%5s@=bd5Eu(W-;
zpJ}YwT_-qDziMc}h=cinw`FRrf75D$K{tk;RvhsO4SXKy7Q-ThYJvh>_=X^==eoR#
zTnOJ>y)^#xVV;qal(i*E%}y92eC*7-dizxpzVPZAh?XYwh1^Tu3z)eLw33cZKl#X6
zp}J+MyHGCG_$|Qy48^aHvf!0#C~cuvarZVv_)^2~DQ7zG&P?BCZ~P*9v>R5=wZX{}
zm15l^7J^s-gRw%x9J;Bn5di)fT7G|Xn*S3I-3U2r+fpByIZ}AU1)9H5sHy|hL2$3F
zXWX;*=C+Ik6SG{kiK79*IiI%nDX}D&b#_CMYm>P^!aA<fH)zjQzY8wz*5}Cj1TYd<
zd=tkJMwW1r+48l<G%Ujc?Q#gab5oG(4e?o;=Jk3OPOK)lAqZkzz!~-a)uLBwTsrS|
ziFIv>8}xysxOCiw;ZS?y_?L6E>aK|g8Kl`yFZ$%~+^B9eHwM~=Q^+SK`e$5ltyjE(
zH{bKFgK8{K3rE`SEg>MxOJ1BEG&_GiF%}I<u4HDc>TAu-mCjn2Lgd^Eyw1eYyE>62
zX>tZr$|Tjgc*qskJ^}uFt3d@06UhHB(UfN`73gJ@Erm~ob$5q^gjiIyyWJmMuO}wV
z>T8{HDg^ifDTmMR;E>7pbL}=jgy%)5JoC~)ayiZSuS*e_jAbSBu^9^jrmr?fjeTRK
z&nPSJ+ZK`ZOKta2fg;E71vNWAlppz(<Eh!jg(>kuZ`~J6Uvl{RjIPmUCFadmdTY|C
z65l{A<BU6c1Qd}DF?2cHUZ!zshcKdPKHp?vTG<;r8{=gJ(DN`N{qh|9^aTIcFH+bl
zywBUmhf*c^$8CN-v!>q5A=9dpJNekKgx|sT)k~=%f-Q0HfCj9JHZ(S8KyxR5UI$=}
zmh37Es|>d}Q8=DBjT6uE8?eF=zwjFk*K&IzyEiXfKufpLSzbXe@_+AW`_^Gv{09Cj
z7upMP!}jYGx7&cGPxTGjMp$h%t;7pfiWq+0nYHjZo^C1@6-0}BrDHXOd}up^No?)O
zOPM#Mr2=?@Qo}h|BH}aV5|PGZ<H9l1`<|m>bese{P8YeK+1!di#B<~!>2=)f?DR~(
zhRAonaAnBlUYl?dwnUbQw=j~mC@`>gxWLE{!w(G?a~a8!W;Hf{W8&;{bNl=q{LwO3
zsSyjq1gF*sUoTn_UtG}^W{cYHOVft!ytE{j-ugCWF$=qW5HBnY>!OC+ViM9qNsXbl
z%i(6Pmr-c|tm4D^71=`h4AnMWgSe9+H9wesG&F6!Rik6;#t)#BEcdDp-gL*Mm+%+o
zT<aV8xLVpB`AJukLs4m^gdj!5&i9t9*nUzSadBQSMULrNGrC_<crKK!021_cW9DCi
zO0&xB1ET8l`tJ!pLGT^aStX_wtql#qJv~ez!QnS}k}xV4S&tv7scn`|+3;8aOe3c^
zB&7-^Pn91i2#$M7U$=DBx;r7RoNy){BrdA^PP;0pgI*vCgS77D144>dx(6FakN#wk
z3vvB;9*`aE-o+l-==@chEO<<AwQb9ZEqrkSv=y0Ta7_Dnm@)P`h>T|Ud*d;lK?P`)
z=qkxO+?ri?aL&DO`TULO4d)1aKQxE#7Jpyxf!f$O9v7sr)4EL~9#Zdr9%kj<0e<^n
zg37FyXemM35dA@{56yvASozU}_nY|O+ZO;pgR-#*B<0`rQSil@f)Wqv9<g_6CzgBA
z;oN}2UR&tJI`(DqTkI+I8C%fRTqQ6zg3&iP6yf)yuHm=zmT0f?f!<j?JT^m3EzQwJ
z{DSJ}6N`Wb)9URNd0Q?9gI(D3n2N?V$Dd?r^X5I4YKH`T6`Sn+bLp_!haAyu0v#i&
zqeF6zD6_=Pe}Zd%9RKM*o@h8ini>0rbV2|XRA0H(9kG)z_ae8-h!7{Ft6BxwbZewd
z!hm@>z1`JDiQH_0CfLw8ZsbS&ngh?7Z9Oefvc6a3rp$(_Utwqg!&t7Z8^}&8I5YGt
zA9uX=VQ8aBn>&8U$V}-Iec_XujC<5Q7f<%{=fHqv`I}R$bG{&3LA~91`S6S4Wac6h
z@ieU+xIWqBjtB8~xyY!3ASOKF*Wh+X2)#gwnj<ELDc=84ska$Jc9~Lxbq?+gTLNqE
z+zDGt=idP~?on9y4h5dnHzm{o#>6~4OtY9I-8B_~HOD7z^MsJ-0fGZ{-XNaOu`j6p
z7}%T5(&4f|BH{|QURCpC1`?ggtS%T}B%U?CfGwA5VOFW~Fk8ZBotR+LU+x)FT~fk3
zZF5P!=KQJ!*l8@cfdt1(OzMdpBB*bPfJ9V%yfkfcJe@nKY=1uE%XHte*&A^|3)1si
zz1!LOxTg7l0?F7KPbVG@CI9_P7UyZ=H2-s5@hFXjMnLOaXUS#%zqJ4*e4cVfLPBqE
zFY7Axbxqk*Z?EfY+jZ*T4bNw8msh5)Z1g~Y5miA*EQwLm<&4SebyI!=>6;Jvqj1W4
zLB&SGF7pu!aT*q2KBaCSQ?}I12(P><FrUUMixQ(0Ts&1@(*(OjQ(H!sJY(!XP{qXS
zdgVxnV(w^`pK9$XqIr-gtgN(4U360X6f|3Rjp!M}bUE>{@vqrR5(8Udv{fG0lS&kt
zUt@_l@IDGpD|v70IX?7}O=Amd=)g2O8_^iKzP5u;amTGLLz;Y&xvJh=iu<y}h)Kbp
z=u;ZZnd%m&QvIt5;?zGzuGrL9YP0XL<?dI8_vZc82J)=fr*L8xS>i0QPB(n|=l$uD
z6MdiV*<*9HC(!tVP9bP?uKb)MJgT~y)P}=Kp!!2tmd|lCJf44KN>^~$hm52{QaVaM
z40EK2A38Z%nC)-#%z~CoVHiMWafNdZ%BkD>H22~2?aw~%gX0v!Y4!%2t7Bpa#*B_z
z5d_r=o&+(bf-9daWI7aS6mUGKey#DMRI|yNa=pn|Boy!y8QspTSe1zf?=^H;&kN!$
zxA>U418f`v2mOnY)GaBHa*nu`#LQSgUh6$>6&KWWHv~59y`ly}XSWfn%h&0l^HLsE
zUJ2&ACiIi!`j`=a+yM<8%8H<b$TANsVrF<edpa(pa{O`-&QoMtrt@=J$++xB?Fbb8
zs<}P)$;Mag&uAjlpy`6}ACECKqZAf0==L85Ho)Z&F=wl8K6LIBS2@4NY0SMGo6Ol)
zp%ZS1^tOGG{!?<3!5+b~i7e!&?y$~6Bv<Z6@87TRQSe-D&MI~MI&H0ce7^S=LI+*&
z=DcQi?`>JFQqy_-e%0A_L_A(+c%xy@?^ihLD@c=NXR~GJ-wvkZE-*fRedt*xM!N-%
z{7IMaotz{^^uH<wLh{EKT#psi;YfI5oS)t2j3_T&l(f)Rx`RSvzeuCZpJXxjibT)m
zaa&*?+iE-@H)XQCVjsSc$z`(tVr+2U4c;VP{+mi|^8JD;bVZXE7pKzpA|m)~P9!WW
ze6mpKH|FU%^4$JDBiA}MJZi8PzHHsRdNMGa`*VPapP0|<9<ZR`@_;b<%iATj^Vy<N
zFcE)n+v^i}bb5U7=v>C*^?KQG9#F%F{`RJs%xz-Z{Kg<(z=5CtH&zhAwXWM|F`eSP
z>P*hm=&#A=q?<~GNiDAngr;2GSApsxgdjbk_v<4ec({w3?H+|F-;7NA8_J@0<}G-E
z*V__L`4z@c$MgIWklN1Vd1w(cct6Otp5%9NSI{#J<$97%Q`A@GXqm{Ltq$S557X5@
zzxs)lq<M`agtn^=UyY!dv)uY-7{w+9ATre`UQWU!(uaN27pLd!mR2!m_mL`oxX8sK
zH|*141FH^Z&??x@rltVs>mWW0Jr43&L*@A<_x!>KD$Bg2NpqooU&LByOCrL8%3EiH
zgmAD|7!7TywCRK8=v~JqTjP3DBuCoR!0PUXO4QdJnDCuH#d>OQUk-N_w2h%Q6T#dC
zu+6#5kusmu=~snx_0(|q+W4eD8w)-4MH-r?2bKH+ta!mWzY7&?s;5O3;KDm@MZFXK
zf`aJOOzAstOm{zgenW)$s9@vo7eUIPdyf9LTPJ@M`K>Uy5dA%@Z@v)|2Y7*fQQFo;
zx1zmvNDyB{Ol*dHt+X8s3v|es`O~!vg;nk`A$#FI^o6^<F4pKk{#L)PIA?={BN=%3
zG|$XWTXcv!GY5ErxDJ?6Z#;=L$YxS&u|coRkWe>A{mGZPAtWm_N=-6CR6-9(knL*&
zxP!m-&uzahQYN?A*l2Lr;%zeMmkdQE9Y2$6)B*7|akSYfrY|6zu707eEJitl?ZA($
z^}F91#-%TMa~oJZu|&E`BQlHCT-HtYn);Wb2Au@FEk>7)j^8R0Hrif8@NJO0pE*!2
z>R;QGs>}#_*U&{Z?T{}cylD6UREG{hbZego`OLE)<fR+SwbYxNZ%|Je>-I+K{px9b
zh=up*M(_qB<;0}c+c0elv0|C6a57(^M5tH$k~bp|XdMpwcVOR#idvT?<z&e8WMBe<
zio@`GwVW{Za`<mXYomzn5X?O!tb4h!DW87bq9Erw8LNa@G%d?Rt@Apajl}w+x&Z93
z(6DKDce>mk|4Hd_boCw$*U3e*dFYQcBZ+6P)H+brs-W>6C3K0^>K0Vxd`aFWFOv(n
zf@d$ItNafWA{y4h4vJYoj<%(x#rfh_(7{UMZD)?m(_QY%#U{-8e7e>Pc{1I0xvBLU
zn0mcucXp>=<9?=_y8COMRGiDZU;fMbdu#9B&L|Qg-e}R}&+7xd_Unf?F)^`|mBzTV
z!qqN7@2wu$nT`R)wr{y`$(cZP;N^DVRhIy?*bM@_>G|7$f~p*+!P6nmms7&g66$mE
zy5gX1uCWt)zCpDc{&Isr=HpA(a-M{du`7BUAuoiK46p*f_y8t0re5uv*C^IojNy>B
ze3g`^0g-SN7VCoc(QB)M4L(CoRJqWZC8n#~oj>MT0feIevrJFd&$%foMGYLKpD)kH
zdzwk~_5DW1L~-7^pmhsa7#J&b42_;J*4CVp4?fj1SN+T#8OxFsFQpTTK~_bP<w^$m
z_=~A9SI<}R*Sof>noyKjff;C(>$*teKc->YG?a=a8dctPYh18ah-t`c__MbC^I%c=
zB;SlF&lY;90I_dE;#MCXA3k`OQllraYgAVssCywXVZKMhkyE6O+t_38R3srDsxJ3v
z3QKw7xOsF1BbCfo429%Ey)<)X_I*EBS>La|-#{Pd8@}Hvf|?%gEL$N&;fh;y@tn`}
z*VG1=9lBCdxwVn^Tl9HcPHUrT?`%)sF;|n3A8r4F_t{+Z+5-pue^$(#B4BWT6<^@a
zeA(w7iM3B0_eiOFQ)=`Og@b_+&L!T-%9XbT17erA=gd|{P}PRw%2ssiz5Q0D2J^j6
zmoudvrXNzVD97|{zDZCC+Tv0OK+~?uUkPROo23ylv=_3)^x9#jty&OF8k@HKM9U>{
zj{nU7WiG&SU$T{tn^do~>-5i7(1D#SWvG7ZVC>(7S&sB<PbeuNls(tn0RaWa_IDpL
zn(jXCK%jPZNy9cH3;`9JAM@DUp|#<4J9;^*uIv`nK?Eo46#=(o7sIvYtBT?pUZ_OZ
zSCki1li|l2dyU%(VtMAW{gXVg4P-hGZW<S>9YXXKATq3fAOBeS71%(ERA&KnOG{J7
z_1R2sOX8Bf-NZt_v+g&QxZf*=gNz8$NQ~i#onHj0w4I*;c~2_-#OUbZm7K|85kVY-
z1gFuvWMyk3>YR#YsoiY<5tGTHU?19S@R5~B0+tSPiyC9K;d5-z_G3^l>ltrcMVSOZ
z<WS`n{8Dqq$jiac&gSMHlcJ!zUN{0Y{~??_BASIyrz52L4@kr<c>r~ay{WCeF(-M`
z6|#Kd%|7cYX4H<4EUz!L&I+Lr3To|#z$Y^7CM8+%*6D0_`Y6ChPi<rbFWhemwDV)Q
zD3TU)T2a1lcyr#jlxXAjQdu(xF+9rWz-F`<M^n;A+URh2oC>9eB(jcAPenCa?-WE=
z^+V4|>ORA6EzgRtEL3pbEB#B-{D8Bzl!1=+B`>Bsum=Mos0U*an{d4x+8sU=@AJEB
zLcgZIj&tW@pw#6UAuk9|>g%bwV|Dn7)fEfm7TmGr=McnsmV*O>4T6cFin9tbC)~)T
zZWsIo@~4gWM;u#(C-yVtVOiM5^_{J*jtkq^xF+oWe47Qe3|rL<4&E`5xuRn3j9Qr_
zV2fg3N4{IO*?f#q-n9)4{-0ID1<jZLnu78zv9)_^6ox_s2JLV?X3JMs1N64=LIARr
zDApi!6zR9ZxtxOSg#taJ-D~0oLT9&lYp$m)FrF$XASOc_d{2gX^Caq|_8u0G?Bkxp
ze9!pAA;DI07fxjv9Ac7uVj+^G2_52|x!YxX_`Tp&7tyL*b$Dq7T=v-O>SPS(4Gqh}
zHO-4rwsVpve9~#ea3wn%Ga%yMPnf6q>F?=^b?%>6a1Z)eJZNex&m$x}?(Fb&2qk7}
z65x>WemmQuX@f#;OqHDHpR;^QzFBAvMr6-6W3SG9<72FcM%<wJlrwUwpz@g-i3yRc
zn+jvgF}AjF4ZJiayqN>zuXivs=h*0tOk$bc5v+goz@1adSy#|97MM@Mwt+i+u)eaC
zVM)ysq7|z?r6J&eDdEVAg7k&)(C^X$!K~e4X$TIlW7kGU1se+cA7<fUKyZ|9wsmjy
zcagEPR?SV}{UfBzWPSD$S^`nghm;-V^*Tp|ZD5cL%BavplUgZKX-m!OT5?cDYM^Z5
zeK7hC%WK2;X^g3FmM2=!r|ooJXIl6P7{`!lhT*s>y5*|Md@$bg$!4d$)~8p-jnz$@
zP?$Igi-E#x+E)4WvZ6f3cczwIsS3Ue38p7d=T<P{8tq>;2F{$QF#~I7Psu-FVGuI}
znY|-nJjo2>6M1lq$P?EH$UZHe$X@pY=L<vI0%L3JaQBpILT!Mi!6b6=ucUJWnKH<2
zT#CcPb@^k>rVM=#_9z#Eu!`Q5vC~}Bf^j`<0``Pvl*G#;*sv6%O6&P}WAg~Vm^HF+
zw-@9-j%OG~psR+D$?C>nivpFUENSlWU-|M#_V=Xwhox`g7y0pFeOk@&eu#Z!OzpqQ
zpqfjb439MC4#?1B9Fo@1uG~5|^QH`+tRFS>b|UHeAWOvc>FLRCs=Xe~#=#3>nF4ak
zKDzD3Vb25P?KN^b(ncAX4bxwXh9TsqYEXVPuAsHwxHY-rIV_q3iCyvE4OD&u*fps>
zA{wMkJfb_Oey!3x9gzS2{j6<K#Nawq*U{q~q8Rx-^>glSmt@xm4!cv0VK1SfE{9FY
zvT%eg`|gM`fe`xla}3TMk%d_&mg6Sl@gMW8C88iF-0D9dqqV-;mN#>p$~eotK=hcJ
zKgmtcG!n(K7gxO#uIc3eEFo^-fIJ#CcdXR9U`Z%rwJg!t^*_|FGsjkpb#&4v(my3@
z?#{}MRzV<0lkObBL3r?!jMdsPT(7Cf%NxMu7HSM1VoNwhU#$tH)c=NS5y9iJ!%+%K
z|LAPG5Zv}iMtt0wE;h^=ebqUv@4*%mboN0aIN!wXx>+>QY5@@kG4N2^EEKXu<f5L2
z;`={bxD!;MU06dme}?Q{!5-s&F4?#0K(L)-LG2rY<7J*PZuIjJ9V{I2iH-HmRva~|
zXG(T&ds2@|B7#)cDy>&kAx54_3xbJgv<LA;0EUEs*<2K~l=W6-%$UU<d4Q;f?|oB+
z#%lf{O7rP`;bOD@<Oo|%o9*1-;Yq}XQ41Su(;i$Un9$DQ8hcgjU!lL6{{LYpWrnbM
z;TqIQevzj}tbRi;mPN8M5D!iNR3?3G?j-y_w~nzq-}8-0(krH9OpPIx=)}}0rB^=%
zBiR2d6(xbeW&i#SB;hf!PW~kj^cXwezd_x9maROX5C4@@qGRv~j{m)FfJe9!$UXfl
z45$GO_3vHe`%r(=r~j;-^#9}${t?Lk_dM*tnU~3t(7yJk(7!ts6_p=0f#$Q&WH_Bg
z{rAn#c_fG!nfNHqb4^bE`-1;3f0zH?C!Vi~?Zp$dpP4fJCnLeA5OQ}}%K4Ee(~UY~
zD%YqG@!yYU>-g80DT>XJyLP@i#8ovAa_bz+XA&5!KL4m_Pvw`9e>l?W#XFrP-vRQ3
zS^xR(=^O#KO`*^_m2IthG3ide<mV-2p052vR>3d)4*bhRXgs(NNSWQuW=uM6NzQ%H
zS^wt+fb5-Obcbtwas4vx$o22fdiJqLXnNu$h<zOQN<O^w0MIfe=EH(m+&p<Rm9Oa6
z_@k(;2*p+AuerzT={(9aM{`OH_k3{+s}Bwjk(%@9_VL^Ndo_lMyfhuI99W0z+0HY~
z@EvdM_be@!`LC64$74q9xIn$NsFYtzx+MSo2u1;bCvZmlNq2Vp66tw1jDAT?YW_2c
zv=bHE{m)BTmqyQy>e$okLN@aQBL4cE<PJI`7PAvd!WVbO>7c0X!Epi@EIPJ5R)Yfr
zSuaEM!DbJ*Q=*>Enja%>FKT5?LOzCO<UerYE)KBH9)4o9{TwrV+F&ddj8ncpH%4C^
zJXX#6(Ifo7nRE;Rnr|2OYe*||Jhd{mjk}lNBQ)$MHI5+TtZ&M#yh*pc9HT?^)t-y4
zD4;i}ti@|7Y3HzdjM#;@Lrl>m>gz47bptmGuxkE{#wbCluCvd5ua9h@C^#|g6j^?-
z^%vvIffz@2(b5W?W4ZYs+xat8->WcR2bx=+2fjAdI9>+U-6;HfAY;6L_p;GFA+dk8
zv(VAx1q*Uvx|ctI=v9|wDhWYzjNDs;;=7w2#~^Xun$^@TU^Za>ZmBu$C~;diHRF_0
zUl-qD!o+%he+l(8!wFnmANO$LWn?22=RYc42MozNj4G!#-s4Yvb0PGxP|dGHe#QKU
z*<-2|phNRkA14CZUi~giDk|wb++c+bKEa~?_;R`yDfqP^wDCRZ^EY|)d_iGRiJ))p
zGqALDJSuTzChO@vj!o9D2fs>@RFuEf+^J@^7@^;V8D3s$S(44{@06S7{CMR{fDHSG
zU?tuDW;wvx9v;|Hz@BEzi}R`+<%V<fOT2V=r%ANLSiBkHWz3hQ)GP{X2#bbiN_IBh
zzGc&CLN?d^fd%lyS)T7s&6!{w84b>0?=ifz?3cS@OV6_Y%MN(PADYuS^3yU6?zW4I
zTCU&)*AzQ1Umx|Y#`iZm-Pc-wt+m*=A1t*ZT1Wf!!m2x6Y@M2)<BemMkCC;>d@9cL
zOl~x>P}i=${oh}Q1QIwI`rmmp6hwTQc$^kEm$<XTjr2@ZZW1=Fb%OpV_<fY|u{E6d
zZf1WnoFWa!md~$FJjSo0grag~4qY?dZ^Wi#6yB!l?*Dt(o=8|H@zp)BS&W(&zp$Pi
zpF@71evfT?AQ>4pUTBgTp+CzYl)T*H8F@Q)4K<5nmzhp+8)hzRG6MO1E*PIs9cf}k
zyC8Gj6LNO6B~8c@8g4D8oNjV9T7IJbFRNwwX4ua8M`rZiyj8sJ7Q;ofg&wx2j@7dK
z(r_GiZ34|0JB*GcS)UhQC$r_n|4`5p`#MkB#fXw?Hb0!)%AJyvt9dxJ;+vkqq%*gk
zwfnRo9s4dA<mJBDWNno8e>_JsNa*b{%8Ne%H<4%faO-@2Z1s_)D9`>Lr4E<4yI$SM
z9Sg<MFZcV|+R&XXK~h9kR|O&;=fck>zKqkd&t-T*0SkzHU-rX7+IbVR$G)Z}brniE
zx-&K=25DvVcTY-u;>Hkt@wd@1FXi#S)h)}kXV?@BqIt69Q@@sh|KLc0+I~fB#cvNS
zybqa>zvHe~xGi?M!oG*WJOA>2yl~2@D>A&O!<4;%-&28}G;yoH+qg8<Zn*YS6c}~I
zm@d{!$gR6!#4_e+wUe&9Mf;e!HMNbG(ZVvuCuLWUD51zwwv9u-Ml>dgsy=Mb`*8Nm
zI(B{8{j7gKNlPszwyreW>AU+9mldRXw($%pad~($8g*Q<E75Pix$skVaPykErIkgk
zMSF?+ZLH|1jpvX1#r5nDCZTVZxMLo@ghi|um3Y*T?t<Ohqo-Ys<DFzoLN^C@(?3BV
zc|$`gK|w(k`oy?8A?pn=7Z(>On9P>UzBZ5Ng*Feq)h35-W=~?=u;o)K3JSlm48HBt
zQ;(^;X2;X5WvljH!|Ki5+iqy~vG4SkEkFeYiBJOW*(aKupORnQ?)Oi4)Q0X*-#WXa
zg!=!i))GL5WOAl?y?CHpb`I@eqMP7a;_?iGwssW|;hOJQ&YBs+6N!yRwnRZ*n-*2P
zQ9Ers(+5@-zieDkWI*mIwd=Jr-~<#T&b}r|Gc9I6&=1WLtKBtu#eQ(eF$U}3)N-*g
za>B{Hy0tD#J^30rdsI(%+c1cZJNq^VE~>fOJ@VKz(>%E{$Rw7M7w0+VCPXLGrSKZC
z$7m7=nm4z}=*&+2_-Kf|`Df7(N)&_3krs3N%b)UIobpc0FJ&IF{bh)q)7vfA$Mr66
zqoR>WO&R?_WZA5jt=(=PMo{C9C4|pxeg4Zpw2f>Ad)1&zG9^!mW}ScSZ)k^oomQGi
zDWw7o+&Sl(G%SzSP)ZU2dtjktZO4Mbc7eM9gexJ5NP=SVWFAdIWH(L=a@FB^G(?Oj
znH+I_VbpD&djS%UH?uM-RlJB276eQLliP(Xw?}Wt<-44b?KptBPwx?uG>?tx(ycc_
zyz62<o3T7tCO0iL*CgiNz?k8Cm1?2c9ZX;csU+b)yXqu=X!UUlq3ryd%I%x&oTU@T
z`^mRmQnOc#ME7)g#ZwMrcyiYs(Spc`K4l?C+Yj5P@q)t{kLD_O;3QUK{fA|TYLe`^
z)T><wZ*xbNF0}W<(=KDhwA!=9<JIm8q}#;K!6CRA{3_J5($dloujgLlEiTvA&BgCQ
zC?D8^|2Fi?rm*HqJ)bVuUk_`&#m2{Dfi*duk6|kCcUl2;hg25zma9V|r~*Ar4#!~3
zNPaOC5?5V_Ola8Hv)MrXbu&CN^6f%DI3&!)DA)RbTB3n2`L*pF`oCTBI7FTBu!EBw
zUH0}~pS5bDMKPBrpfJ^Mn{{C6oYbFE(&kdrRMjk&m7DhpEYi$QfAS_5w9mw@V#{ds
zmVXN^uKq-6&Pq&$U!{|P4tuG*cMt2^fka)4*_ss<<mF6K_MDn_+=QC-%=-`=5dl<}
z&+Ptg8%^V4b2$1g__%|q!T7VfQ#Gi&bLO{_%lqrSYNal8{i>sS+%m+;dV7}V>l2H|
zlRe0JGoWd=+GyNxGDnP%&oMkZn>02qPPNWlwL`C5rP8-B%QF>7_CrkMN}zf^4r{TR
z&C&DP8@~oM937d0JZ^tvx*mOEY(A-?8lRuXmH`0NUhcaO_1xLdai$EdcJMSq-fccW
zibPeKDRJKIM%nET!g01`2$yVSoc2iFwFyR;lzT`UV8#&zpr-Z>k1OOwz3q|XX54>f
zR87m6U`t^c9EIimefKp!O++{#rJ-_AwEhN+(43OWg|HwOd<_Yi%dKu_yQB_H`T4QE
zOZaX}D5iEEag%ck(^aR8UsO?;jHg^k#^#XSlZ6vT#-ihT1b#(=t`ke@rNWu~Sc}4U
zM3fUZCFVTIlqnJww5wo-a0y_heI{)6Z6&0o?P#P38=2M_8IUBex_d3}5*|~R8L+`q
zfz4xP{DoUkC1Ac3t#CFaE4#@i8CCuHs}Xsb6eYvTvDXM2NTb=VGAR`v@tlxb3j~|S
zxwDN@$6~4PP6v?zQ)V&jay!beF>6S!hgqrm>*X^z^s;hghfT0ld$Q=a@l^koF^xoe
zwa8W^dPSR$cPM@oC?{er^eN{X69Utc3KetEp)Jq0Dp!V_DBZpcf-#_9AZejmr3MrD
z6rwKYoy3Mf62hY~D}r+5t5Q9LD4hj7zhv5X2XPlN7ZTo!K()#loJ3mOMys}ws|-uo
zRr8*AW(=i9Ki^iy6R%fH%WX2l@TlL7g*CNrwH-90Myo;c2Ps_#WWZUdGp50mm0HC%
z7J!vghBlCS!1F1E>N%0Q(6O?7AtBd`zI_>sk(8lL-zit`9YPM6@xM-Ww)|}y`@Ba-
zzkUc~_t6HPa|DDN0F82T%=#NCv$JNht{S`xvB+xQj)7}sr+iHRk&|mL6L{^~FD@+5
z&8PaUmfCN#Sg~%YW9ObO;Nvcq&#s)OY!xU{1`X>$0w&T@Qt-cj|8~C`q}H$-<p?L_
ziB%AM^!Iw%CRuSkF5TJRkBg4}TJK?F!@{-ZOhGN5zELRj`f0V<$>{azRyIp81BHl}
z*<z`t$%4yjZDgTR@6wq+ub^P6-OH=aa#dPh9tF@TMnGx2{h@7Hz@`r*EErJ4f#e5^
zY8gFlx!T0>NWf{eCJRi_O!#?2`EM=2-{%w13prnqvKfu(?+zs}WCj6Q$%~Wf@vE&c
zrf;gMi(7ZKiHw@k4h|en>mE+4RT3l~@r`!-tj-tOmJORQg6}VO$|WkD5nOAMmX?gb
zhVPf}s)iZ2_l4ehfTp*T6Z5^16cJBP!Sj}D6rfGjl!XLPo%|OakPD}dK<~*(l{e5L
zAe5y}wK93M?nDMOU(4e#ZeQ+m*_PY`*${@f@P_K2wR(=3ua{qmz0RRog_;6{kj|e*
z4Zn--G?xvDtt}A}{!os!lsCCNZrxmw9}8k^V2xmbj~Q*(hDa{yS&bIg@cyxu^J9U_
z_wi3>&pkUh%oqdR;M#F-EOQ9HkR17yIUV%2q@WCF<m0>F5A)FCdxy3^=~vq@_E8iL
zC)_vmbaE`x*Mn+W!S(9oaRnKOFk23>t=5dHcZ64jmw6KNSmX+mRCT;a^nR?)2ZbR<
ze$gR4%U#ZI`Gj>|Dw0<}LR9=@Mf(bw83lpj+9lO0Hr%dlJof~S&mJt+ry2@n-5stT
zwB4@tIbb9Vq;7J=Whjby{12pul}2<HwYdFFLM-p)k0p<8UODG-J+6InZ(7wANh!gN
z1~B&#{PSN%tKG1#ardH|snXtr(!VR+BjY}j>}(WOMLUnvQz8EzthV`>HS(CUP+vmH
z#gbY>2l*0P2O`*6?$@O-sbT0qfZTV5ZS+1}^0`4SRoxpJCpRlhE1C&%zGw#k(67!k
zd#{=HHfol6*jWRUC+g<g4S%@N3=IC7FeCH)a<%xvUfNrws{Gke9vozc<iyV%u~U@F
zz~7w2=B2|3Y$O&dO<vYmeLP0;I^2(^<?4zSfuEz46jLN)=NuUgE|1E(-m#ECMuWG#
zff{;wk<~DM=W+;9hGwBE!IzA4tDC8$En~cPrI{(|#KKaxj-A*!`%ZO4W>agwTH6Ag
zhX)#|g^zgnafNC8o~Jw~dIkB>*;h}J1mEk7qOrz!t;OXp+r8{@wQq({XKug>72!XV
z^6{my==CvMt?>iYJE`rKG~Bz@8#XvF0Jae$LA;r2R<;RD8rjZtJKKzvVFcIi3&&3v
z^hgEBSv-nZGfCXS+iEbDQZ|FP*7C8_4;DO`+lCzA9P?W%l_vX7WMpJ3HvNsHYkz(e
zC$pPiBqSuP00n5=geZbf2f5zQ@4&1c!<~T`?hJVhOw7aiauTcdSI4ma+edeL^}29j
zKNy=)j(M=JYHCA(JrEocLdM0F6dj$qplohVufTr^<@I)*^>|zEh1^kRv7F53#3=N#
ziMj}wp_`i=noyZuZ!T1_$)A;#ajaS{VdAJ2EICQ44M(>g_p_+k+2jB6aFUiM&xT_V
z3eg{v(q|56z8}coyy6%_{1LyewYYH)pbkC^2X9t(c%B3>%5$c8yVv`pxsB_-Rql!i
zzR+B>*E0S7Q8>_Rw6T=Z`vk$1=+8a{Q!}yJycFTF?TS!L|FPiXtblP=k}0Yf-$}zC
zh92b5ViQg6hEFTUJ*2UnTs`wZ&#tV%7Jeoc^sGFB5M6FrhH(OHJ*=#bsT2hR90CO`
zZ{!X<=2+_kcrrzBNF>+_x6JT(A_HXobO<b4YmNo7ntCczR3{D!@jTgz&$`rc!@Xrc
z2qp#+1{1ePrnoQ}0Ml#vWs?^kY&JWb+}{Y>ugKhwLfxp<T8=4fXf6_jX$2mohdO54
zv7J4!8DJUm5w()TIL0JIj+|!>N`erp$<#CW(g6LAoaqK{VZ&g0h9Yz3m*Ur5?eW6=
zhuZ13<awCKyG!21Os!y@NF8f;3p#1cxi(0t53yazJ8j5AWjwTz2XNV4a;g@H1Dvz%
zI8}DqbjMB&EQ^70u5#H?aJMx%=1oG&^1`Ze8F<c*jWhI>gB%V$s-HVIacJE->^@Gl
z4B2w+>RjfnK68(cjgi(48pgx;i%)TJhFoIC62$8%IT5_(%39x`l`VnT_tn?O_SHbA
zy*l;VU83tfJ+2}d$stnGlxq-Yt<5xyR)Q&1=Slmw>3<JRBomW7s36+D(NteuR6>Hy
z@kAvPV3VfL)Ez~Nxp46CI{RZ84#O=Vrr{(eYF5@Npj(+^b>Y1I-IJD<_6Q>PZyPlM
zXSf8hOP+8xCMg30vNzCXz@g%KFhK_93MitZm!IFiSGeAqTDCpz!7xzWpDcC*y@-pg
zZd@`+j22b-6yDQ^J5w@DyQ@y?20$PIS*?n*uukky(`ofb{mJZ4t^B5@RIS&epAJ6q
z-YINGx}0e)+YwwvM1!_~vjrxb###DzQ<|?vW9f_9X?uV^A$HyX4-E@@Ts`-k*Pp+a
zVQP)fdOa~JeX_vyQ*(qMV-ea!J%7u4v}h}60j3718$j#7uSWJ0iaZ_k;53iB3JvRd
zw81Lb^ZsUl7FXA9KwNyQByj!6^6;nH1mCEA4K_#z*(@C&GtrvPFx`1bLHoPkry@g$
zgezLOYBiHyce*P9!_Z)A59#c^kNY!TaL0se$ZVD^*a>)Faa)+>6O};FRAzWOqY(^R
zyIpOiU{+tT21Dw=G~1EWm*--BLMZ<J^zqJob$H=-sBMck4?mLKHY<raB|Q@N)~}YK
z4rbFU>H3O5y+&~R2B=~VByQoxLPMk!>7Oi=KeSJAMap5A8mY*0^nx{x^KLfChKh45
z=AH#`ua`*I1v5R=EBQKOqeV6D?RIBo*kW$nFkYRCuD*i6p@J0MK3SQ)n#FGRwdr7@
zbW49bs=!fqsny_YZ9t9R>Xp?<FWrQNjZ0{oR7}3(wIQbO59Z{;?=e3j3}NrbNE<8R
z#w;KO^Q)q&XV;04(UZ=$WWCL3rB1i<;43$4RO}-wB#`CYdg6F`(5{?pug8D3qz)Qt
z=?UVR6>C3y-gOb!Ct@Hs5&;Lv@E=m-K%%yi%cZg!ASe~cUaAvgVTl6P$8BNqVx0v6
zaTK+7@o<`DQ%FCFS2}}wU35&0u(frCR{hnJ2_$=FTS1+Hk<zv!Ocg+R!1#(e(hWF{
zQt9rlF0}-4U0rg(9XGws`|P6@Z@E|n-NDpBKu9<OP`vYb>pcx4V}Vw?r~73uE|`YP
zZi1)De5PM)tReLmFm1p#Dt#*b_3Ky5(@`;@eUsYCo$l8k=3|4eCrfoC2O{ROc@UZ8
z<om@-BEs@Gj|SS2p&c{>l;dX|MB0R`*#&OLfg0(vF(30$sAv~Dw(Y+*j0e<n$k}|L
z3UqTKh9(Y{#tSU1BNGX#NUh{69<oW|lAr0Yn~A`!@Yl4x>jewnw&aINwyr!bmvJ`O
z%o@;0@mpQ|LPr>*7MLay$=V^UFS^jEY-ZyZa%{NqU`8b~hr4m*Y;!)WH+x)6Q&eOU
zv$6sUQO=finuVY2$(`)u(}zVzr|n@$ChkoBd(eZ0fLP-h!G>F4w9fJv>w~yNhg%Kd
zd5Sn)@k)FWq#>eCCzWic;YjCau|}^|K|6^0X2J%4W*Lc&?3NKZ%85nOe7jugZ%WAt
z71|d~tIkJGrHw^^e4Jxh!LnTTBcOW?>Zy599N|`z92=rP!KN}7!59aU+fwW8Zyck5
zW>3!`9<lk9r(xiOfE$_fCEdWMFThf4vE=TXV+kR%-YsHJGQeVo;Xccnfr<e7q46*e
z#&12=S%L9%uBf=JZW#Z2@~V8Pw$x&)eK&@3+I~$G`#N1@3w!st6wB2vZ(EH~`vBuJ
z&&9iae#rhHvq2$<x?`n~`qYa=o=^0)H46kHsleEW{@+T4c<jbsSF4sf)U!9zN0r%j
z`hl52Yr0Fd;GJgWt#zGOfK3gEBA}-ps6RT>+3E?=H#D?8n)yvNtp=!m0k^vnGX}Nd
zT0m$qYBtsIOe}s6o9Yfi#l*tu0cwxV>J==)W|weE0ARma7?<Pab}u2Ym>33yk28tu
zPY&NIs_~!S?;x0V2r2ZPVVqE=oPk8x{_yi6)tq_RW9UA^5T)5DZDI>+5>NGh)zH};
zVx9BdUTpPp7^Pr9yfNbUger$#4IKKRoa`>4*FDs;k3#Ke6bpSD{zY~om$2#~jZI8c
z8s_WYyqi6#h7cA~_x-bJbUhbNPavBVF)zS1)RimUHHq(5p?9oQdg;bTbUK&1n(&3i
zQn%JJM8-f&1qUBt<_XxQXBcAm&RYE*tZ^TU-+PqUNOZ?qD0g^C4b1^;^;G=fk|jS1
zArhn7lm@a((T7^F+pUGF#1~df7K(2n=Z3zyZYs6b;MYGbtyJy^DsPA---$O?n-g06
zZr4|k2$wl^?!6nGkQ9G?c4mk4G7e+~;)@x`N3~8`#EoJwHx=lZs@KiPR+KPQj4aRf
zk1Wt>hvpD}hA!vQWG`|6(^kFzp~|}j0#&p(Zb5%%D$GuV$}Y<}*d&TkJ7G19!2tn3
zS1&c|BC_CE>+=u!E5U(oY%H;FrsOf58;|rsV%NAAdpP@#tMy}e&aE&|4rPzb6O#R7
zhP>c~v@PuTY9_Nh7>@|w-9C+IDzw4oU|@t;{#tgiyFG0w-wnjI!ba2W+nK+b5~a2q
z?XMX^ptuNXF+5MQ@ET)_Tu{wA?l_1MJ-X!_5eA`HjOcg1$W{@BI%bvw@rq36EqJiA
z*88=l-P+EvC(SZ5VEE2trCvXdUq%oriD9QNw0}_0^QgV&#fC7TodK_JAMm}jAFp;-
z+|FA+Qc=OkI=#QWny)m(0AWX>xWnVutC{a%--5a)^}VYcWAg!0gq#_*e+L}26u@Gg
zvDm+UeCQ^51DipjzR=9KJdxwBJB2=L6^eV+l(-1mXA7Qp{tA#_zqQ_abXfI=19XwA
z2~kt3u@$kiJqTNn3;y|ziL5mDa=_rR{xPgegPdaC>=@4Kcf<+h3+}o&FSHTJKy(tt
zci|q<nF^ac4-MJ5NLifkSQzVkNYnUCL2n5%!uZMCL2>rGjj;AWbEHTGC1b!PMce_2
zL6>Iq@KC)S&C9WRNl0>U9#pHlv^Gcr#i}(UXc<u33{s-AUW2SFSi4sF*+{Yiy1`MA
z-nqS4SlpjpjELP}ZpeTCR@IKeshT|tv<yLtyd_bZpsB5P48tUkl`=7*I_#sL%!`oA
zGqw?2MBLvdULi4)9GyUO;UmNoR{V%so-w;By|rL-LJN}b97;+)CzO?KfZxsw^DTAM
zn5EJ!Bk^>s)AUn<@eYXu7ivvoPVs&!m14T86>hU=;eKyS=%U4}^Y0v$Z_@idkE-u%
zW3D20x4YV)Cpk?5om%Pz?u@`0(>xz$FhTUOe!;6X!Jrx#p@k30o6PJ07p)WrkGa8-
z`UElgWcSh52EXAdF|Khf<ZuXUN}`v2S+jZZ$q{_sb~{{g`iO_-qBWOu!~V}x&AYVm
zrRQS7ev2tnZ(4Pw+elAdJ`^#X`q;1^s^DKIz5St=so}DZ;5cy}X6GnMOqGs?CRm2C
z5h|*GTE7X1J&+0rWH7bA#A4GavnT7k-W*O7a$9>}Tv&Tvr>$ELGt_TfIy>*B7^&2m
z;{y{Lb+`K>kf#vYF`woszlKds7b@5t&3qc!>$oog!r+h`rC-3G?$5_dRIC4bDiq@3
z8+&P1Fxo(K>dUE>S1wRey<#^mpmoXg2xM3BI|jRC3)U`~30Bc%-A@e)K*h@ck$?|V
z@}4ocU6_N(f+0PYMkEZdl<4bxagF8sarI<TI+YyNpeCnsqiI3N;|Ug|_KCx>VgW$+
z1P~yGh&(+z-LoxAKKIg(n5|Kn$q@6n9hV}8k~iLw5rSbVQg{wQa{7TP>zaIy8ASSG
z6i*G$(i~5g@1i+LrB)GoqZEPSnBHQVl8Weu>1C<5DnN~VYq-M(E$;MSc)_B|OAGQH
z`ue!-(}Lb)7-c|gn0FII9M#EOviLtc9YguicvwRY`UFD$2w+)^{L1c<U3t(?O^?cv
zqAKyDy8K#^uv=_5u)bqnF^3`4B1}3k!<G^rG@{=gCpK0`NO;S27Nf$nV49<3Er&>G
z^}{=9%G65kXWxV!Qw{FzU)v5lXrXNy{+A6>WRx)#ja6SFZ$Q5ic(<S4vf3IbMP_a2
z$IX#Izz*EIKw3FKb3{N$Fp6)XuOa>cr~I&#F-@3=#fjNRG2~wVb<)Kwz3j>s;sDP)
zM=ASN-6K+zk+I$bySyzUikdy}hHpC9<V;T#@%Hpf*>1s7*+XE`v}dac5|R}qSca@m
z7a_w6)AB>Q5#bmRkz)*&*15@^8~w<{@C=yb{apcwQj(Gz0k|OJin6-qJ3y?x-0?Xa
zVeRhj%0<?@Jl&XBHlK0=E^Qw$_ThZ~PqkiuWQkUL=3he|h+%_GR2wNI(yM;a(IHFY
zwAjjee=*XWzyHkk%S+Q#JzhRbuvIv03b6X7fKV{tA-_Ydlsme-S^{3n>v>3U@CJak
zAppeP^!-jrJZ>hveRj_g9Q3SCuFZ+twn1xxV3(){S6!=>ABvSkcOc?PX!pKitLVXE
zOdGpXAOC<1frpK3u>J{w9k63L(61Ls)|gN&#?{YV6?U5COsY#NB`4<cC0DY|3<nXE
z+OJeitUYB-K-H1D^Nwa&C5jG=h%pY7$DCX6xvBNOw<a4KLL9EH5s^Stsda{bG(h9H
zDwH)|c`E7h@hHRu0vJSb->n=~G3UTP<x#3+>*Fp9Ax@}us)u^0ql1*NLf}1K)t2za
zPD@tzy6g>D^)SZn4MzMOI=G1+Mj>$c@3~Rsh$Y-nO|STdysB92#zm(Jg@%Wj_=mTr
z*(AfN!Em;els-VI^95=8!b4JYfLVwqBu&F;qikbf;=caJV`z0bp@o(?(T2G*Hjv^_
zt<cOG>rvz^qS0+`ooPQd{a@_8Wn7fs+CM5HAWBF{hje$RQqqV>cY}0;lyrADNJ*D;
zcY}a*w=@i$a~6C5f6w0M&G~$u^Y)y1qcQ^b%&dE@>-yFueK8j4ZNBr(ju1Xtq?eh#
z$iMklo`qBWi|RZ!-4_IjCvh`FW~(vG((TITJC0x-Mb%NK4pTJL7nPFB)i>Ql!Z9o#
zGoGkhRlRxp>Wk~c{wEF>I^-<2w?h<<7tQ{6MzWAn_tE0ZPZcbI!sY(m(E}s!UU?VV
zUJs7OV;QoML;@~5>CSuVq|<xjIS6mKXVsLI*|wAqx#IwgBFM_hs@mklFr(uoz|?xJ
z-@ZXbMFlVib2vD-DaWhPp`l_8Z9a@|;DPVC)}yfS!=cj;(Q_a0wIE+9r*YoPqBQ4Y
z<g}&Y3fTU5Kn?`-cBW0W$`H}GI@!FeVViHM2;)yx^}DN&{CvA|Zy#UlLyuXNg%ZX6
zMx%oI{dpD$#|)1Spb%v~E46rU+un9J65=L|tSF^18H0EoeJ57pmdN2%oCy;-vb?#(
zPreiB3p5bnKPMgG#%?GivpyW%BSnNLVJD)!d9`crntgHYU$XN}hPn$Warb))#A;HB
zYG-!Dox%(%#n%~=`c{C50r_j)S7Z=7x&w#x)3o;XiB-5!Hygei6j;Rr=Jkg?U+>9Q
z!jD)s%&85iMt3{qrdaZ_8gL3`!znCf{&V@VNnK4Hz3vAeWoCQB$aV+w_p@mJNbQ8s
zLtgVI8BF3PszUQ-L{%eSmSBEX-utmw!SOOh*>kN^3od~u+Aq2NPHn#&xxDrKlf95|
zYd8{iQ!UnNhZH0#iL$&!Jl>n8-Zq&{4-vI1%y$ri!EVYlmvXz~{01mvU!6Q(9`VxO
zHipXb?{BkO5~`MaeMdkz_}b;MFQ`vi`aQNPhBd7ImaWXqmbfgraOv&ZN=t3cj>M)u
zYgz-ntHA5P8R7eG8W}wFEQ{%{VjfMaJI%Uu(aLOb33mO56qOn+sR;@JLrN(5)TE`z
zzJ|Xi9;)ENGtVO4R(aInf4`sEiGnsl8C|`;1@RAeLmf~Q18VJRZ;mTb1+Tvk#8Mg5
zs1FPb6sFqzY@OHLp~`e$ud@OG$1_c*IU~<~BDVt;uqSoa%xDjRz_&Wi$Z|D4n6YHG
zo!)6R&=3H35e#gEPTu_6yz%jIBqIJ<YmY;oc@QE3g1sGm9YJm}%yR@kK9HyS9hjSW
z9ci`8flUY7lVT$(Au&<soIUl&C+*UZ7W!r=6@Tk4j>nPq@ZJr^ZVn|l7qN#4ta^~@
zQ4>E~b?cTnudwsWNku+iQ*KO_P5OpJ?ka<`+)GS07Pm2*@G-kjYVkz$o4?vF{`6<_
zRQ+1$q&FHt*Q$ArdRqPSu5bZ2OIhcYV6$!nQ;rSW&FYJ+WWBC+JogK(2%=6$Oqp73
z7|Out<I*QqhKaPeKh&|G{3vyG<rVs~*koq5RY%2bj;u$d>Z$lR;)`eLm42LG3qK!b
zo2C+Czg-XPM%zuSMKnKW?o~L#4&-VMtgvQmw+wz~7KB{P(WunL<U-hBXzBD}46~PF
zA*Ea;<GEecm=hdPs;FU3qh4#XP%TlzCKmSk3AtV=A#d^gP<D%0JqrdgzCXpwFiuVU
z&89x;%IcQ>*D923+`n`ArnCw0SqFyM*sS)*(eX7ik@>B-v`I)yv<b~DAIwWca6AHY
zqm9p?-?kWPV$GfPYAIqcEqlI(W=V@`9BCLgiuJw0+RVD1)L-n%8J*IwrV$fkd>QdB
zzZ@!a%8)@89XiadZ*sG~H5;iOlzKuETV`!MyrVo__u6xm&3~M$5j)>U_|;XszUaap
z#4rv^ag{yG1c_w{|8Qabp-Q|s^Nd%V)#eLcw5R<W+<KuA3Mh#v1^J{9X;^=L??M9a
z&CN}2K>-FH-r(JF+wn|el`=ptl#Amp_GdOOzTt<-HaH!se}^}4e@?IdD+n+$b{A7B
zXB`N{>^$}^Mn)8`u^A2j{8TtUUTN!S1jroBA4iq)bVQHtAT(L0RA|wkT56Bm-GZAA
zRL|AA$6kDE!>#}jiPu#7p<;5&uF`1et-gT)EPM<MD;wLu(9jbSt_MP`u?3Db>a4b=
zHLSOC4*8hA&3jYNBHGvq9go)go6$)JC!MgGL(@K#r2GgC(!|_eKJ6CUjA?RaP)l`V
zvGpt6miCGdy>BZos2CKLnP9#R@S$Okh^w@H4F1A1;IyfrO4zfiupv=G_+r^+V|G8^
zz72;IUqJes;836T^Mq^Bb$p?7_|>^ok!cjv{J*wgsO*=YYXUn*=mzGe5mkO;9@PsL
z<-d-bkz8(+<wkmaDzaWlJ4^EkgJ0){cF6f#1^Hsfui(dyR^Dt*_(v6T*VI?4XMnY`
zk}kuAWY34PStA!i7JoAYdq2>Jqf`c#!dd-|?ye|)j&K98+1ws@Ra<h@nTY3uWE&E=
zsh_XbO<dA237;=0DLiW37o}}J2mpkTVt4zCeoA<q3(aDEz#@8(6yZm^tmJYAPW19%
zQfm*veWAbdo3o-nhPBf)fGp!|ZUsy0tVu>--o-NN_zhDJbhuA1DM*q)qP_n8n>1eS
zh^AlP<LTVcT;-*f>TAo=)k+puLC=2s(Yd(6sV?tvynZpY5?Pk)cX3vq_9$ahXvN>P
zFnk?+!^+xH3wTt+EQY|679K;d*-%6gOVuK+&2ly2%MZ*5%_cSL1i_jz8NyZCW8~$3
zWQMPGqF6}V1+ta#_cT(=w>-yODM{hU(H}m%gKID!s9E}8SAsS8s$aXdXge9A4&|Ll
zxiebGIQt<X6aFeHC{Z})`@JuGOF%vgqg;Fq1@K-uVr-OX)J6FD`BBz!q>jjB@FxS%
zB%<lKHFZYY4H=|-R2dDa*4w-TsLAH#f_3?V=TwmrKz76b{x!9;v#UR!kWQ7k17Xm^
zDIB3S<}+YS!ttMQ^JIY3q*}KQ0FZbDJn5ZpA`ImPE822z7KDirj|p56j0xV+BO)S(
z;c<kJeT?{%o`^u?@;!?LI_Q2d%koq|xVUhS2|@2w61)<JF5>f+l6=zJmuCv1PlY(D
zKgX4&^(xU7+*mt|5C|@wBh8qVk)lP{B8KY?UmJ!z#)NuoX&#u<i;DY=k86Zf54ZE0
zqSvkb?#2B9!)#+UX!!UQVH%%kHF{xVR3Hf7>e8ZaqgQHT^%({Rhq6NdZQJ(rFcL3i
zLvW(y>(iWG4Cov>WW_VUCtN`^DH~!+Q<n3_Q3QV1H)VI3Vuh{@HL==IdIUDzWXCRJ
zzG2ZTr?G6qcwIa%cHPCF;Ewj~tcu=|ah`!czC6}@a!F|Vv}wMI?}XQu;ujbN5sU_E
z<Q62Z8|x91fujiWWxmj`4;MNwku`73)!IPo`H9yIHbp4QmO0W9-`w~+vCxk~HMf!O
z+%T%VzgTvhCXnInE!AOder*!!n9Wl&Q975oM}XFP6}%SSkR<s1CzsDyUp$PeCt7m!
zNQwW<lu-dKy^!91y^Oob)BScXt8u*c*5qs1q8FS`j>O|Jd&pa(Xy~OTL`8!cKO-f<
zYH3}P7(Yu)vv@SRU+nYYJ1<8{9u!z+YE8Dp!0Bsi<CaGlW7)86(bOYn%K?ei9ezbo
z(p`fqo4=NP#oB2i%0F>wn{mGLFs9pf;Rz|$LOct1nyp3E)$w#ZZz!J4XgO=OTLO+x
zfh+sBL8O4Vwl)z+Eo9uEDK9V?%eV(bTt1*9nlGj`9JR_{zkZzzh`gz(DNG=saXhvf
z-m(L#%nxmz4=#X1bloc~z+u*XW^2pw;igohZlcA_Ie|e#c_zjVjNx!NoUgH>A1jwO
zSC0z_3_QEtFSoxtth1k#<+7bpQaPP7$<)Jv7rGgc@9*z_S-)~HSA7E+Vp98Mil)4;
zVoyxKX@LW{y>68lCz4lA!m@1V?`Fmq)6moJuCj|q6N36(E4x=Z(stlu+iV3ioZqf-
zYX;Of6POt&N=p9XSW?eY{$x_3z|!py+&z))lpS306X%DU&(w6OvD2tJR$l$6=o>ae
zk}h?YhhKNq&ow((x+9Zf%T52$0^rIHqNaMHb6v=gZgeo)aarl#9gz~`QI?FKhNVV{
z+qoMKwTZL}`E>p@s<tgkmgUv5%Con0MTsQy*tSPntPGZQy}>-NSHvX}TfFF!z>OA_
zsS__(5$9oa{`f;b7RO0zB_YtvbQn|Iunp(0-d4;!f6}n1_(u#Ss~^^p;xyl+Y3#n}
zhf5n@k1#kOlwE$lH)9d)iDvyOlS5)jmcTtxFw&mV(wZ-)q#6{mKeoDNM&4Jmp2xZh
zz!vG|kD4x}EZtYql&uXuSVSLhhj%0eRNF7_4-e+e`qW_ZgSbEUu6$5;Z-GoC+|Fn=
zSmzpG(|P0=Fgu?SqyCm>Gn!mv3ct(O7|^Z@_8~<$T9wc*lOlI@cy$V69BX~SgKp+K
zy4DgDj6yasF<_le^9!3ZRfqWS8?WA>UKws^OX*e9`}y-U6H-V3lCi2?|8koAnjlvJ
znMF4$j@;f~Q7G!PWxbB}wS7jg^taidV9xt*1d_U(sigx}Q#E^IL`2}A2XZgxC#@2v
z)uM!%8BN{tH@eAu*$uFc$`_6{2Vxg(x{w)&+7t`yx1cisH}|+b*8}VB$-5pD^fF&A
zeQ-FLZNK+<>Cc}e0MP-I89Ltl?NK}QLI(;qV=><ij;#>7GF?DCBYSeS0|MnKz}w}d
z10i(@nxJ9zj*c$0?S6~N{dz4BiGVu_q{p%VZtZ&?6AjH;^_bvBS<}&a>)nz0h4t6L
zNYC>eO4xjPp$B$^bRGHqQQ~WN!Ihf>m*2bVgFpL2(~1AbyI^M0TBiDAKI<P@aF9gP
zf15FWz5Xvl$N%Cf!nR!`i97u-t}J2g`+%H~$uH*;`bTdIdC1l+(BuyYT};d{nrdwy
zAMO96Hr&t?unvmme?K5fz+}Kk6dPZk%Jz~Dl>m&xBFcUb@*w~Ha5n;?;Lv_iLks@s
z6$j3LgyEZRk{Fl(OZfNWe{s40?_T8p2fvGu$<X9V;kaio@Sk@%s2{EYtQd|(_e#jS
ze?C6a!#}|xkn%0=y<W!ueC@w|LlcbuZ3jC1|DXCVbJYLp-O%xlJ<z#&J~;u8K6iuQ
z5YZgMa>Zo0;T?-y=W32;&?`9H1n0E;u+*|RE0i|6#u*^zp5}a&?l+ldaX-o4+x>B3
z?ho!lkXf_ViKUFlKX+0qvghjAjr&gZ#)pDXx*Sp_C(q5MBSV_wQMrv#I48O1?^MpS
z_bVUbl%J{G`z1na!wP5oo>5gotzL*^G8+A@iMLo(m+Hqt^K;i8WFaw}CD4|82nx(x
z>}N42CY8&O%~=smC>=7?Z?jg!`)7qrq?OYp{7gu<LrXV=LJDQHbPM()q3waAQncwP
zntgP(&1~_rV=G&yRBUq^<qBZey_>7ICh?kG|1q(<{&Sxg%rbc+XXTOLmT7N0WS;r?
zCJ%Nd5k(ogqB?uU$W*>4?OU6?62tz_i`-1jd)kMv++6Q!ETlxrHr5IX?3S=<N{K}N
zo^2G$$Re)ICOFU4m+J+cG~Q+E?NtP&Kt+)#Hg?Ef!Zb!aV%Z-BQM@3@MKu#@i@lDZ
znrky*>XSPw{jax?=U*EkpPhwJ<`Ewvp?sD302}|K+(SIo+<wtqZ!46%d+LR-NYH9g
zy~L;^JnFkDXY%OxGxfzmB{S0e<UdfOwe-BiDB=qW=~`A2{`<<A!!_Bd<cYRsC0BD_
z-z9_}SZg*4Z!}oB5bHP1=*%WZ{TAQQc0z&?e!08o!sF-g;$B_RQ)A*E2W+%;3@|;6
z`}qY-a$pA4`zOw{48JtD*D(Jj{m<FZP|t1*-X1Lj-1#=Tg9VAp41*O1tN(fP+{8QN
z8eD|+B~!uBpGhz=@(#C3NNDEU*`~JzksSVD`WffvqNA06K3oXaCoZRkpSO#ZQNsd^
zR%$2D4fIzi1Q~szCi`Zg<t=d;M}tB8d^2Ir|Kk$0i!L-WF({D#+&C8TpyNhUdY#X9
zMu>_2Vg5ecz46f~ALV$k(kJrTbVc%3CaJ3`@)8o+SC+FsC)=dW5`{)HGfvgK_Tv4!
zwMxxIRv&K_hYM*wkidf!9-3?4XRU%q3Sn^E<~BmORJ6NK9qBFHpg4g33VNXE<8j<h
z9SL>6!`0?0^Oe&y(hl1zzt)}7=BME3`>M3eZ<Qv8ol&H0p}Ezu<*;7H%t#+eirxRy
zW`p*^rj?>n_D;p^G_KL6E?-Is{3WAwzjSud?T`Oi7DnAiM=3^`)WOgAL%TCY^Px?f
z#f2D_jVl8n@Liz;pyL4GG@rVJ-i&!&UvETe)Er{#xIcaUXGuwO6uoD`7ZnwQx&<E3
zW=AIBo}M01rv?If=&?}`Gm_KsvaGG`sq3Lv<wH9l4o6M;eC4oJ#Kp7^?%nO>h`S!A
zH)PUVq9Uctu>;<CU9sJ-H57)`ljb(Gh>7Obj1rjib<*7HOQOLZd4vne;l59tckFXI
zS@&quHLJer&sc1<!Hq;ZnmDI4YH~GM3pFfTlAW8ob4-M?`7{-bte0gk{rzG+shHW?
z8|MZcK2n64xDgWCsZlvLd5;lDyKtg$<jNje6fJ*pJGt?9u5j&Ek%CY4cH)eFxgxW1
z(+z38Oh>TuA?NmYp{>t^r;;0ura&efzEp5BbEp|v4l~DmH#Hr($;GQG=f=^NPXT>V
zc0-3@_?T+d9yJJescl>FmKnyV*c{uo#^D!2;_0;5m#9_!^SXUXcce390dXvKu?aZA
z4dXjDV<5D?xh|pEqo$7ogM&Q${B)q(kBdJ<L0Mwdt+;<~EthlIU^lV*T4O@8Dmite
z&b`gmiBa!J2f`iC9ISRlu4h^G3|2|53VW2IxL*qL2X~RBeZmwLc4pK<`GF(QXQG;e
zc2;WmH~T~%CEAp}g2l!o5kkXpso^0fLPj+)&9K60Zdj&-R9c>%R*9Wrg)LU`^t|nq
zvXedeDOVVohXeogs<`CwkkJUA+9n}C_6LrI7|Avd0!fu3Im!K+Li(D@v}tCW<59h-
zJ&S{ncQJzYcr*$lN}JD$S}<F=GU4{qXM2a&y(>BK678%d1?P&Z2k303mP}-p^d<1p
zN9%Kc!e23O=yIOz7_Gg;L=BfRm=3f3)5hV7x*dHnif4C~-ZZ9uM-dHK*^oS%x#lIq
z7Ex2fjTE@VNls4wqB5iT9s?cS<)E^!ek(!azILU_nMvqwZmfT73?U1o)q`3$5KMnd
z^jtUs3lx{lxI(WB{+SFV2Eb~EQ@J9G%Uc^eY7tNfvH-ydb2bZ<(qQ!x+N6)|ka2Rp
z4ej}ySa1t8(Zj%fa08|(+B9f_eTUtZkpKlju|RGMFi<8vAy|t}3zo80uPeNuo?x_9
zA0Pu<*Zi^f62>wFQpG~w+5$Ae$HzxPPR@9)$_OU1MZf1&hUcvr2*7{+9#9n5TTEU9
zfcboy2Upz}n3A;z1f5A-*4kQGjiCR0;_Ue^{wUr9S*mHp1F$E2X=49A*>`Qa%#b^>
z++SH%8J5HxCwQ>a<%nO|cEM~Q7>EkJrCS$i+YJi|a=#*kI#7fvV|`fB7)biW^czvP
zHSGh&QNbiO)q`^h?K|s{JIwT2{X_H<#HQXt^kwMA27jiOaAK?EBKxWx^<7dkbg4fx
zvxM|~%5`!6N;9c)Z&q)yAZwv!181pCkA!`*>VxL;pX`X^Z774r$6(iIS1UZ7Lce$2
zGaFC^<fdW%Iw_(1-a4*bw-Ro8H2UV*dk&n+m7jH=SdC=Tn{kCrxKDfh^~(UcYv?4u
zUCA@IUb79*!41XRcI#O6y46Sd`1)_sd*WL6smX-Y{o_vHe_vhsU+#oO2V+_SL#9Uz
zHMf>GR%C+N*xQ35m{^!GXSdauiE<m*1_l)`U~uYd8Fr?)&TN!YgTrLIjr}38aF`Yv
zJ<t2zQ9kISOA45KM$aS<#fS}6CVErQMszRh*>roiP)Aan5fU_-;O!z#=lfPb9=04S
z%tQo_bn!RDd%1}JcDL79BJ+sBProzE?Y>1I`G52@aJOYWx*l8C59p*1KrF3>1vVrO
z%>5zyubdVJzAd%i7t@eumXZOe(*q32J$CQ~5il?CjmtwjpCep4CVzn+)15cf&>}p=
z1d%jto2w<f8qiK;O))rb0>E=L5EvUr{So1iQlHMR`qOLoN;htPK)WU6VCg6YZiK&x
zqtA^bv1A0;Acm6+q3@}%Q0$h~g(JG!c}e_j5nGt!5A`Mk-0c#L^<QOv__o+!?qj<c
z-uFz}0o~V94;f<dD$>fP^z=ZxM>^ErM%laPtvvK4hEc@xS65q=_p&cj1%q7#8nwPq
zP*4El0YF1CZGc9Z*=|$f$;<QPIked?t&7Rblou8jUb+Ma$D+$dcpA5@Pv+x!fy41~
zI)JX(fT8DUa@;So$-Ua8W>+>q@b+w+Mx~@k0ny4_qaz&;4^J4ePzEp(Oy&(Wa7Du+
zpgdzX9>#%3C0ZoO1V)C3z!mU)pq;%3ztW=Oezn7QIE^PZo=&CKg-9-qy90EpL!nHq
zv-#5n0NHsGkpjH2$!abqQgCpw!5)|2g_&#71{Ms0<DpY7Sq0EHIZ#&vuhgBT+niD2
zDd3F>LVJXroH*^bhu;YaWjZgr2Fj(USY?37McgNXTDe?%1WQG;1L!G&77n+;AFI0s
znUwteX}gmJGnQ?-9nU>uJl%@DR^U$43t$7KMxAG@(+3H?PPDK?%FPGaT2UjXH7dPo
zto5zk9m1{z&J?z|5`@sAoi_ZsuBKrJTK)Wp?=nVt7W1_P_}*SmU|_fppxz5*<`g9#
z)-s&rQITJl-OUNF^kp>Wwcd+icmSlcl^D8BEO@m$zopdLbu9Q-T2A6}WcaJJXOVk^
zv&n4?2D-bDXTIk*lXR!M$J12enTOG$Z+BR&PLaorelK+P4jdovz#PK-#k&3Qxz>SL
z;!B3Xy9wu_fs1Mn>&NQHTg6t9xwb3WbpE-k6Q~84$78m}4V~L^JF6UM9jJbLSKE|2
z|5H1q-0C7et)y-{^su!jQub0d1ER5#GA8#Yufb*7aAmWw?V|MlSF?78{B;OaJ9X4)
zCD8-`2Zc-?soEJwsr;@*){h81tr!;&jmJ}|m2xWiOIgo**_DiM13`bdx9DzG#9DV_
zsqT#1YKnWRJoiEmIkxP2rCUoG+wiu|(rI}#-hwNC{HvMHd(q>x@<T1wBbNBO#rn|u
z*<Je1Bb8t&&5EK|Td#`y5353J>uGBbAdkNs@!VLnzJ=uUv<8BI3EWS7(>Xo;7RcW`
zbu|$NUS4F&?!rZ8_3OxEaP^1aWdvg1)Do2?t|(Hpmex2j)OABsNGDYBhd!qc79PhP
z4>k9+&Vj3>l+W0^yO0j`2+c@4NWE`&U%3^rq)(@kcHh@Y(RLO>%DvAtIlB~E@X(o-
ze(Wxw^JvTXe4OzmwGQerz~upPSozW1zMC$cHCwkm2d+cW^U$s7vE_Ogab?-r)9rHE
zN#f#=G*!@}5%uH&I2=xuR8(ddbefz>QBRh@$%S0o2GCCT@7Aa5J&r>W1U&2ebJYb*
z?CgJ3KrvY7zS{0BSUq<zG&*V()eDX(_wUC*qi9ue4>V;bFs(*I2{pT1hK7dM;GT8|
zar1>5wRT%lIzSul@%<QJ@#jFS?I_s3W7lwh$wW&F0k`GNGaY?>Fgc_<Sg^fLzbC{v
zQbSAYFf=0K5*d&Er&aRM76fqcH~RYe?(y+E`l@XYht}YS6&7R^X<UwfB))mJg6efG
zvf}lqsFLkbAeRma$LAVFxCTzXDdV#zK{@d%KA)3?ul=!IgF9fhF2uyeZ{;$}T-Ru5
zXiok8{M;=oMZ@P1{mzL#^a@}k^RoHw@1c&l^nB{o#D*!*0;K%xzwWlnHOxCM`t?%O
z$Vp2>cwC{qxm&HVxXKOR@B^XA@Z8jZ@q$9sFq4RCw@LBK$>3sZJXx8gzIjF@JU1?~
z#t#Fv;qIE&)FjTqsT&R*-_+Qa*751?lH%z&?V8QeotFf|tis(~9R3Q+KwPYDkCBe>
z=Bu^M$K}o&oNzpN+fSJmUD!Q$u6(ZAm8Xww462Z+E>fPA;`#DoU}cJ~Ux4FgO2?X%
zv{4A{!zb5YX&4M~*jLMkvqL0y?Yirosk8kyiuE(WAIN>JgQ%*shwv36T>_}YKGei^
zjtw$Di?m0L$kB};T&}p&j>28|z1FSKzA@_y=2UT0JGf+PkPe7gAghn4>BmY$aCD#{
z_?JzGqtdRBW$4>+&7TJJMzfM0Sro5ResQB#%@li7z+lPgjm248DH^}7dg&I8{SC?R
zL}_O)uJ9v&p-4{FU`K|$bSi;Y#ZzNqJ9}8qJkG^ZseR?d#v)k=eszm8cF)$c|Ckpl
zr}jCF_pEZf;t-$6<#UVCVS!-$7*TyG+Eao32f0|R1*H~dy-!DF_m|;0)GLCPw9=OJ
z1>gG)dofcxbJfOJOa{4cHO5R_;6O!8KaK5DNuC?c*^Xy}{~v>h(LV-}Y<Pq1Z-axt
zH%Z@|$Ddqr89`=sei#;7?#zV3I!RZ{q>Lnm((v=XsBhviBRC?$7+|XWpeRIc<ckCZ
zQdHC%r=vx+y5aqcD)$>ZQZh1N6#A*L4B-Exe;XoHfpzs?ljZ(xiSPqsLb$k%0SWQ!
zNpN0q530Xs=irE$x+(wpQ}<$TYSH~{1X*3?sdNRF-71h(4P17EMu}R}hIn!Xu}^|0
z;I~TzzF#iWf%!Bb9soC*%2E~348#MOsOLqo4jQ_h6?N}_j1ed4|8YiypPYctpE5QW
zJk2?qPqVhR28KWd0X0lI5c=g>xr&Ou02%A`nq`e=>PkvIvRq4lidLLhe&X|e=g0_4
zOt@hZa>Z2HWywO-65%i|00<n-oYeMuc$_0{_jzXI<^JVT8fdLtpkC4>UkP#$xG=~|
zX|sR+e8G!MLh}d-Fwj>=h&tENn!Rol!%QGB5shTD+2%fCS_B1Z##-s2wPVPSn)1qg
zoF7z6J%3xhJte5VJ^tO{KO=*3^hFYFU*o>4b$D05rDwC*Gzyfb6^MXXrE$dZ4lH5U
zBa9O7eFo)-$Fia=?l9yu_*&NdrJ=%_t{op!%<f0&?oop{&sHDuOO!6WX>JL_PIRlD
zFbw${#IUd2N(vF!%z=S=R_^z@bb5O2XsBCC@Bs>Pu2{brhP~+PP!e@(0$`}9uhOix
z1N*Ncz3mrxAyOo?9RA%IBhCYtZ%e7@#=>_8*?W<Z`57(qij!h!D=tH`Rebe+%7}{g
zy=j=1-0cvFQ~9We{7Gg0SKEJ?ACmO`Z{~;8%HBK{iZXx2&l)p6^xa`;qG2oW@bOzz
z%*^8rp^{y~5<~M}5@amt3X+2GID89y#u7xkJUm_L=}7L5o%pwLCO%Se^2CYloLD8<
zZj}B+9G$i@PAh3ZNG{Ryn_BiQqN5Q~N9j;5$!7S2X^$z1K}<sPn^&OR!zf=Sxf8^6
z-X|cy!6|5IX<;Y9&i>0=5g3d?b!+Jb7^_o&3{(RV0FY6=L_xXSSJoD?%=$3@Y4VW3
z)NXO~{Zk19<a>B82w;qdlRl`bsuoX3$jBJZlxaEMd@aqBPDDpX?<90yRH(BUN?>^A
z3A4Pqsw*Cj*8@79kHF>%Y^@uhN_)0CNLPL?-E>J17Z=B#8o1UKh(J()c|T@<yxign
zOc~sN9cs3Bw3urSETnoot1>ft*;3^u936l(es3NkNJ@%6o6HJHU|4@Z|BJp9sOWke
zRb$3QQTlb&F{DeQYE~V7oUD%}tD!Q6c8P!y_EE^qnHXCs8R~Uk^Y^<PtDy}1Y+M$;
zsf+Kc1k)dqv<ZI$^2+9BOot$p#X1cOOSMf?#!Y6aYSZGD#(Q`Pn_3~EN$Y!A;as_&
z5NY>5A=8kBcPiXJc#~ZDm^Dcs?`*6QXV|CPm*I{kvz_r|?%!U+qdB;SljPPheehOp
z?|BS?926`O++1?I+^ss%ug04Yt!J-12G0nMmb6)a<UIAXqs%)Duu~Csc9(uVun;{^
z#`L<dxK}}`&$)p4ZVEHv&2UBZ!^?@p9pkYxN8ZAPrKB!{;6r_HeX0hYxlQa0A?%^p
z)Q&)DHXAgFFVHeacq7_{?S;A9c<<drkH2%Dw&LG<UMI_b`s`Qy)0h}(@uQZ@v(Ocz
zq~`XfaAF3Sv2jv<4mN3NO-w8p<JX>olrn<G!-DT^(aKV$%;llqhm>Au24sqj&F6)5
z(og4`82k!fPe_VLLgpGRuO1jF&$(@&O8g@~VVjWr(iC-0W5RZ`{^ZNhFIH+=2ixbS
zmNgTNy^aQu>h&F1qQuwEgp(p%u6Hq{nqu$wd3|Uo*&ZJ~-=mG5#hW<qR3BHsSeC!5
zu!Q}P%!)cXjQKWyOJw0C{N>9h;%3-CP5L6OfHITV^=(4KfRt>9_Df@pXX47dxDlY0
zMVC?Sx_fhE9d##(el}yg=HO=Ik^TYFG+>CO@iG!g2%B2q0k$lMs(mwjXa7sgN_tmX
z-F9phOH+i$OkdV&u}3|#UJGT#w26-<9xg6!TtWgmF0NWb3F(QliVY(<`HSx&fhg(D
zD_Afc;2coV@LX+&88&8fYDx$95UbT9cT7wSHA5*B|Ir8T^~+bUiiY2o{+@kL^}W2@
zIu<tl!YGbN&>aU1&zrBa)-g{F37ZO$84A%C!}_pX1c5-Fy+mrR5EiDA&oJDdE(L+`
zQ#V()G=CS2k^4Sp?#x6|Mc0x=#=dW_*tuRM#G*n+yJ3modmPvw*y0-5SV%azt*xeG
zJnp@w%G~Ji{&1l}DU>m?C(lO1;cTF4frE^~$;2~ii@IuFNM=5PK&)e(<+sr%MGe<1
zz^M6AdQFw`_@1J=%BZ&yb6JgG%%07J`IpF42BFDJpQQV@?$mxWXBO&yd<fU+V3!vu
zn*%R#GjHqV8#j3(nrdjb&qY%|AG}MTT_16#!L4k{TI(F$0F{9(yc`pLWt}M6U)Upo
z!e$mG*fu@3Bo@}(OVlBBDqP}Gs;TJEY8>XdpyVV8;ZmtdrMQNHnuP%-HVj*l7cj^a
zG<T|q`^#vPGjkS8Nn$b*@dwGB^qfYUzKP)#I=2STWeC4b)M&xux^YF@lK5Q9VR?V?
zU1Zp|t?wW}3n)c?Pc7hOMme;m8|y)`rbIjH);dZe(D&WxogZA$w_jgQqMB?eE`>4l
z>Q;pP+7Mf7V)>l~d1_x|q~S^HZi(qfwbA(hsiufO<0>bG9^g&ZjLTfNRmQ7j2kg86
zE)=$z$3g=L%jEovILx&|s)2gR1$-iYiudVrGnCCE3^2YJJ>oU=zrGIg8y1xJ9KkOz
zDELiZH0AvM0J)>Fc$g1fcs4J_a=Bb7TF4#EKINYw9nK@q6tz=)LBH@R<8W(0l8R#I
zAcXT>cZ7TiXUXCwanG!q0FE_NXR8CsWgeCr3kD}?wC%X}PFZzKbq;PwwOS2#6U0Sr
zflW33t?c@P!G7AS9NLYb^cwuL4)hOgR^RGR6af^x0a{o1)nzwr?WyB`v;fDYn%Zaw
zCA^!BKAD-sz-7m8U2*oL!U3QcP-jmUIXo?Qej-#Lttrv0H614fCjYJwEV_E&jw`Mi
z8yX^GU<e1zumS7CFB5SO+`NW{h6{LVx)6Fou<E}9XG;2$W4Zb9b{|+)lu<;4?6;A;
z9`<#-A-gXay8&WA6+t902%tH9rSXT)pDL;YV&ha=zrv-xc%>}Vr^qCBGA?cV;_nhV
znpxunSuVqURNCZsh8R9^q&$r`^-g(}<UGD`kpkVP*wEfd-+iJC?fN@K!6n#Mt{%3Z
zB3%Y6((;L)?C?z2!kr1Pl!!AS)Sm_ltL04v{0V4Y3}+*?a7FXMf1G&iW>kiHP?ncS
zuSGJ|Qn^eE_+^BT#2v|#?QRhKwL#bj44fv;D&4in%)*qiS7kj$=5=sM;cH~q%lLGY
zxiP9Pf2)m}&|lg8N5HeissM%wL-%{M{4EvD3wSw_?fGw)w(P00DA?B@#|zGGXwy?G
zE<&lSb_74O#wKZ{$%-Nf$rK0>EJn*Ize^&6N7u5Ym!`HQ`_b>ZFj~}5ZF8|3QiFw4
zX2Hs8&(dtsja2e8&647)D1YX$hUj!)IV)Wos-C4R4pq9LrQIq$8``uNNj7q~<<?FQ
zwM~Vcf#8eoH*O+JA`^6k0;w>sRC&eNm}$e_XOdnX_LD3{f0!sH3Ns59MZk^2dq$}~
zolI@eY53!wx420~c7B=LQa27p+p(_v8b(uHdF31#el$#symX|`!IB<Pq;oGVpp*Nq
z+)i`3&grut%4Uk&M$t$iwVfcg6C=kFe@g{QgH4GvRU48iwz$j+HLrTM>B>&7bw8=&
zvboDqYNwy}^x!4WrK};APoWkLS~iT01F@PK2o4+4b*?iHD#Ka(wBEY~?`|Qw#GB^b
zB3zlU34Cv)F?A-)UfJ>O1Gck!ZW4b-ji>HHZR#0B!y<*Ob-yr1m`&zKXJm}RqVl(c
zJRHr2`#@sPXFoZyjo_Zfm^K}-E?POa+&!!X5=#K10>F(7K;t|`TawA^81is;HCbz^
z@puv_WCL_P8%|BGM-5>6^+io$)_ZPdW;S)HqC#<=DW!#*2in#jzkXp>Hyj;@0-+H6
zV6`v|nBFlMhRc@3V^8S|kGu{flyD?$Y_VHIi5S3x+8vCMx>!dcopACL--CwM0V00J
zHxHTK`X7IN8RQWWAF;FgM-uXR2l8hF8tGq+;mU5uMjU2&zlcmq`nil<C|tgdFwjZF
z)#`{1sWhp?UV`7Z%qt{#%w1{_VJPVMjAZF+@1q=cOanxRpd&Ne_F<H7te`GZ-c{UW
z*H+5>A&Mb;)8*#GkvOxOukq`fN$4$?xXyBTKMEE?r15?1*qWdn>am5!$eutB;^Cpj
za$?Wxe(r7Q^UZNlPEM9d0-1ZZ<2R^rZ?~f^JLrbfGJEN&JO2va5eZ`*TV(Jzl<1iB
zZD9z553I#La%vqz;(UoM2p-8<FyfEl+fv7<=1n|axrm(0myWO|)5>;>Z@9Tre)oo@
zum--lYDXF#dK**4;`^sXY1(xz2v@VFjj|*f!)3N_@JC+Z?97hJLOg}OZq55L8tQ=V
zz>yO57~#EyFLuS8R@jb@-xKcL!BHY94T7X(uKSbD;5)S3>Jkt(y80mf9xU-aHKado
z-_#nuwV&pbc%Tmq;uAi-q#4UUkNbnBOMkY?@}2Vnfi1PFWx%j=6Ap(+Zi-e@U)N<2
zgQi*55qe<96XR;1P3WA;jP6PJQB|OWGS=&bLBU7nw<iR)QIC?WG!dBF<tS^X3rzZ}
zk|`~(c@pv5GPGZA@F83hp#BOcwY`1u<9=+J;%VYsRQIs+p<&5Tmd=ig@Gw!8%BaEe
ztvfxMK{DsN?E{bT_8NiYnPn-HM?T|wA|mN}I8*!&ov@0*W*EY%?|WKk(`HYDT{PL$
z;tfppA8{<Q9bW|U-{iI?b{A>-w!?6+if_f-0DyBC0HI)BFE(&_+0{5=b%BuoVlDp0
zU%%ph{P^*0XMrD>ascc}rBQ3K1~kPb@l$}^ke)8n8UncI2Ot2B{Rw&p7OzM5s_JUB
zHcx?tTFYXBTTcLx3SO_mf)06iJf|<xb4gY~N8AiF0A@2~gezW9kMsp4`f}cbhaPMl
zauSm7fO(8I$#BhAzk8C-JHjX6$0Q{5f?)pBv$LmLFFu$OirC07U0_7#6DTmas1~~`
z15^AqFIr`%8Pxt0Z&XGZs|4OSbdivj7ZH^r?Xd;*JH3DOq|F~MDPo7}lq_q$B($Pz
zbtd}}y8Nihguvg&qisI=p4+qe#YxY~%IA#uxz5xqqH(3?=_WS;<K@=WHc47dk)K}0
zZI@7*DG@DeiN|tt*Pp7TPc>o>{+5h9d#|G-<y>OyC`ZVoo&vXXS(Cei_hOkj&9nCP
zovUHhY!Dtxz_krq?~GIJ_6*BTX2G+*1Ok3ae1#TXPQlgjUh&~;++oiPCVoBxS`7FE
z)WSM(R1@7&`Y#yMR;Mx#T^7b|Czz-!*z%%$?NnhJ0e=`t^-w}c^={9aFCl0A90Jkx
zHm`NxttY4&IZ9EGdC}3;akFUQcb#bOAd<$pQqN{aBCx#^=q$!=hg7{mN0TF2<z3JH
z%l_-K2?+x}px|<`UOYHZ5nlIlqw4lm7Q2yKEL<Ds`++$U(;83T9Y2^~z4{F$)8^-!
zC8TDayrZmi3OUC`LT*j)_McwZ=3oSUeLZ@JyNSv5l`UEw&z@21uj{<37{Zlp#fy@3
zH*{r#R(12l48Iz<R>yT-7|fmZ*$f>|C@zuK7EBhRd7UcA)L!l1v3S_CaLP<XG%q-z
z^&Snuw#qTDN4BYCs;juXZx48VC$tltw$)33H|B?iri#hau(|AUrS)Wvov$(Pi=mLF
zWkaL;Wj=%QlW$f{RrO_BYG73nAgQC0lKKbZ>2_Dzgvy&CQt5q>!NG+{4$3DG$U4v$
zBMk1Km3~q{6nOV88?pRk{bxTln49TP>Iee^gZcLDXJzI6U3@T)u+&0v>i^h&z*tP(
z$NQV45#w*?f5$gV<Un2-$P3{1xMc^nGQlT(X>4uVY}}mM+NPvjt%X=I-mye+yYBOa
zc5M@wzAyY=M00ng26G|=E#1vg8LZbrdyvwovo{@RzPx^y5L0)g$Qmi~H(dnLTqd2{
z-i`OgP3I$jRV0Cl>8E%S;{h}XM01r;T;li5B#j5d+h#gxl=zqyPmU)3_6+BGlU2SN
zuG!01Id`qkXO_D7tOX^wRz6D!EW6Cy%l_^ZzAJWz;VSDqi)W94Dyb!_xjY*S;Nh>$
zzWjEzmn@@(JpHJc%)&?zn&_>d-}yxa>#WvzuUP+xrvl5Z%)8KDh?J51=bk~w{Ggbf
zF{Ahb^}02p^(sO-qP>QfMD@n?v}pf{%a8I>Xi`3>vO0Ew?wFmyQQCX=xGMe4H1|Gb
zJL&JR#Ys0OMf1pZF|JNhth4J$EFL7{k~F{3tLK`gT@Ak)<>z7J1auU#Vmux`uZWS=
z^NuAY%T<5>*v%!UNug_o9i)>^m5bPxZ*goy@%w9l&vTSUe$gA1o8MfoY<WZ54at71
zKYVP8&K$jWR_wn}4Z`6HM&$a#1}B#6ZF_t0`=&_!PR|n7eLuh7;JcVgVp<<unsm6J
zJW~>kxa-exLn|%5h52Gf*R>{e*o|K>6hxXt?M%-!MufE~g`O{H=(!16U7_Nr;vJZq
z+uw+Us}C+3GNkt1Gcd}`^u};2D2m-<PM4^4d0Z_8PJN}|;Haru{iKA|F$z3qKc@fX
zyRZS;-zoL0rj>_?o3ov#0lx`MI?J0k9CRVb>4F|sv$LQ<1naYiZPkTF$uKS;AYh_E
zK2v?4ZaL?P1q0D`U-toXjr@o3jn()MIFNl}0NEUH;V0T2O<%~p+64qycTbOkrsgn!
zR{AXKmZswp9VLM+1GB_4L`PmjY)7rw<*;_o>pJ$vbrY*=O9kGhVVMCfn)0lrNT9WS
z&BL``itMx$eT<%A_}#HamX@23yZNRR(ZayqAFa3)vN4^cpFVZ*M=f0^%W0C^5&Tka
z&c#TX3B1lRr$#gmd$QIGb9ov3tO?~iF5RX9+A3Qw3nF6^J(52AygV}P!MC2jirD#}
z`9}uAB#<GZZU_JIo-CMnIsWoP^e?6ny#Y4)%2ic<FE?BM4kg$OfQ|^RKc6?aXp6f4
zbffe8G=O_u_$xFBg2^r6%*_%gibnXg2M_)Ie(X&ylr8wrx@rC*wgwMB_Rr5fi|S1w
zuF0us3@B-;LG|yNrQRh{GHY@xm(mkU@iRsQonihw+02iv>OlNZ+1=;&Th}v^FZ7R+
zNgyOLsO+(E(_{>9KGG%Rt3Mvn*argqDjT)Vpw@RFY6Y`c-u&bW7Gg`}w6<$h=yZfd
zC@7&f*YGU2!9zuOJ}<VCDOJ;%eLBP&#)t^zXr#kGIcT)@8|)pZl!qcCvS)M$2PyCN
zpNIXn_wNhq^f27Qe}gGL8?W?SVue@r{F0R5BZFULMz7zaqx`Y1yE)a(5+;h1oZ(&&
z7*LOhJxbbc(J@x?zJUEecr>p!qd+JE+=Fz+?T+=|Ny);B3aZye3s$LrPlvS4y)ZJq
zAO7AamfiQAQwc9IJ?rmJeADo}kvW+?bSkw}LKohiH%g8~NmUq_n@!9W=(Eemi~F=^
zAUrHM*YO6Y7Kd_)Sl;DOo!c&Y?$8OKc&3y2$eBRWP<k-<3Xn#srOKcV2<_V_<m3XW
zD0DDIBVaNkU{RUvw`HEPDN{Ji_ex_t*SitH0v(EaC-N8U*)@(&eCzuyn~M;szo7gr
za-O}<`y8!s+b8rGtpg^z*G#VfNc_ny0+?Lo*G|0Ya!)#Fz%=b6k;9TxP((*ZcYq>!
zccIQY=kANDYCpi#X{f1J59?OSl-LW48d>F-uqX^*F|nhrYJ3|a`K3NjH)K}!Igy`2
zi87^$O8@edFnP<Smc93UE?$X0=C92}VWsJR1d-g@UurApxIkvJA?~Myni9$*;cvuT
z9h1_Xt@1|6TD3NnndD-(w<~J)MUAlLH18GOkDVN{MtKruIhdnclOO-6SkL}R6*=YQ
zl_?X+6-~Cjt4(HK{ri;0?5@3S>=XRl9^$J6BE1=t<V~dr_$$im{A^Ke6dq#AO2hrR
zcYi;!y4;PG#}S*KjQFL8n-i_nW#qQCj1DKp^mE%W4prIg-;tN{J=cO~`K57Ft_c-%
zwi(+*aINTV+ifEftzdFN_{iot;ky|7CfLiITF@?uELCSm{)}y`WqX3maM;VAt2RM6
zP~r!lS?B0BZLT%JDcOek1O77Ai`T2fyiwQDr`-<K`d?uH9H*lE>TJevqwHFfOVhAa
zmeQyk1}59DuAE4nU9)K}Zp+6>Yj>tc-CBqLV^#lP>0R|)3s-!c3|IbdRdMkQs<SHc
z%rLCNO+4iv)-C=O@PWB*{c;-Ow`di}K~(v^v=!2h38O6CQ?}vcNJo<2ab>*Mx+8+9
ztls0ijSc>Ok})e2)%MQ+553CPjRhvJYE7?q-;j7_YxWk5g!D@V0gKhzYNzn@d;P3o
zk;h&!`uSBKC$#g|0?s2iNA6WskTH(}i-}9h=Z*0y`J?hWlDEr{R$PCzl3SAG19S|%
z7Pkz1tNl^RttE2SSHphc?%%sV)ZP|kQnT53?2|fQxPK<t(Plf+{G6D-YU7)ysKQIT
zFEd(Dxbs<Ybds{jwI!KP^bOIo-sSjQi+%xTRC-%|oyyvKypSW6<vPxH=gop|LaJ3S
z6?o^c+NuZnH%;itc)H*owRYzU@8P$24s^FBFyAt0iz%-)G-bJssViGKchsPX`hQs*
zMoPmyb2U)o%=m7Kek@JWse_8nSd*5PHrUBstWut@Qm%bb3;a`yfD!`66kAYqD17<S
z4Y(cE3SAicoly!P_XlxyUuxxn4rwrs#xEcsCM9JM)anYyt*FF;!%uxDrzPGBD5AC?
zVPQziZpQ*o_iDR0toJQlHWZVVRcfE_5@?<x%gdR7FN)uMx&&qw;54P~I>1>FlBjYR
zdIMr7fj<S~m*ql2qV~7K@rN7GDB}paa|1b+(a~Z9fO0rzA3?uhao*}hzeW2TrcPkM
zzU?NhyNzjcxvbcw^&@j>#G%pgBc$Er<Zl~frinL4bq9{xKDNe&(!N#$m(So83k)T5
zTJD%fhS-WY%S9c=AXTIlE9_2yYx6JW#NVg#F|%#EBD9(d*pU0q=RYWgMrTJky!>4I
zwKzl7%qZ81JpAh619qqSTIiCK?I-oxzZi#^nX0aO<$GbL17x~xCu*EB19uYt=T}w5
z<iLX|Nph1Lmiy((EgjmY!#$&i|Kl?QVp;M=sV$MKQL$lnE`)`jlH@2~vO}SWi63LQ
zZ%|mhn*4sLT}54{2z3yaxM1%@+vmn*y^}w45e^^R&igp~i#D9>1)7ev-jC0AZ-sht
zRh-8K*Q}e*RM}e@+MBwV^W_Ig?t0j@llgsSn6CT{`e@+X@8ov86?P)F7AP_hdM(zo
zZ(h4lzYG5@r|ixxx17Nrarcq!p;D9-pUu6M(w;h}U~y=aN}Foq5TX7sR@a9KVWzbi
z(q6z5cRI;vvodjiJsX6J)z@05*sso$O2vtQ{u@37g>T5~{AM<&#lt1s=M6j@-@zuP
zZ>crryNf01NL~p~OCnsVPMo`0Onsjs_pfhFsxM<5G%4|IyA1ZIJ-<A-$lXazA6@7M
z%aD|5ba7d4yoWEI94xkoB*<N-fovr6bw}9@S!rt|j}-C9R~3b0rFxmmVQy~FR=)AP
z<6fr?4NF1{C6u%n<^Rk@e!3VSraDZtNYqXjl@x93r)+4B%Xn}MA_mJ6M3?w{yCuvg
zYY}8%6v)G7!EuTbqvIMA$7xJ`srzjE<i+}3`a|pI?cP@Q;6ICyZ?khl#*mY$`SKdO
zZ=v6*#UuLjbt}`)UF9LJ-;ssNNnb^zdAOiqj{Mc&NX$yhETjSV^w?y6nwO)NwPh!c
zAZtKc-Gr?>H%1@pYi}6OU+0)nI;OheY3)fcCgty#*w#eQgGSOn7Lg^-MHmi0DyzS3
zvPpdQ0$qT%j1_-!F`M`l*7<CH{>%3LvNuD&PcQ~1fz5iQUo7*sW+ep~SNsjsVHKXb
zp9BnRi1~WCLKWuMm;c7i+ul~%($>~Cz5~=q&6T5^q1a4i^wj51r6RaTK~y9eIBBIn
zeR}g`WovD1ZH7MH#fy|zS6Ab5SrHf+8EtNEenB}d8`yO~fAi)&0G*yvSzIpXOm?SB
z@d4-i%wqmmFkN|b3~;h%vBuhg^dL>oo2__z*T*k-u?q75&U)&rK)7WWI8A|<=(Uc=
z1y+FGNwLmD(X#I5Q_M8jbNayLc~7ZBVb4>ZsKe<}4VWZ;S5_dLDVs6|wP^F@ym|B4
zC|~o<cIy4nzzWwls8E1U`m<6W+^gxG`@LYVv+*CO*SZ4;CG%eBjFGyF7+zajg}yI8
z<({>LA)4=G_^DUfSPYZ2sqokA(Xi(Z%N}KTxX|m>P?f;9%e>!VE;0XHxp0MMV1A3r
zAzdIS>O4qdBu%T4BNA*W^&X;sQ9dkknQ~LItRy;V0)I|fag5V@+171e;+l;3s>*IX
z)*(g+rOU|~Q#sZ@>gUVnoaOp+R8GQF;)!F4M%*{<v9Q=PZ!pla-tR!)K)oZ+Iw094
z+RfkN9Fq^`o&B;HAL>meT#oQ2kXKUdU5SRsjvbhzZGG{mWj4b(esM(;W_#3SL1+(5
z5)r8Q`}m=7<P_zUleS3{C`&jFlZ1;Lwx$<*I}G>1Y4_OMj@MMuf;VwtOHI_Ls1$aK
z8$j+B)9)0HV|-*l!A*)@hSK?Y%klcMKCq1eUv{~6s^rl@AhV2a>;#}283@EuC4v^(
zo}B8p>bp320dsA)_a57FwlWWjNTURNmof;Khg-NA)1$yENTlL%P552GZ*mG<W5@J%
zX}0olZFBn>Z@@gaD#0KJ$M(6sN4)<eRanqkSJ1<jtJM$Sxwn(oMUK>4Ev*SfS5n;8
zTgmXyeNE-mZ-WdIm(Oy8HcNvv4_L6hmX0AAu){R}UK+jqg+!nRm`BwL|6W$769&=1
z>7>jtbh*yBj}>L=L(^J3n{PEL{GId0+PU>Ri}}g9k`a-Rdz1}-81UIo)-kOhdc&f-
zcE=1BAv;yIIoZ;Z_yt+PHW;aXsX;)@=bU+i{<*I|UTRr&A$e?6%e0e|BzlKzV_;dY
z&#Df1#CzLiB%m(hkzwPSJ`)nMe2FPe8PmTz@!PWAbXF)cQNmwnTPFkoIfpRuftmT-
zFM+_tE`L=%KMbz*xB5B4MbJ?*e7Q~orr&J#0{d(Sh8~COq8j*X4|iyM)F~B3b4g%q
zr?@~nnYYQp_O|}@+)G0KgxQYqsaK@9NfiNpzgORI*uOuVAs3PYka0^_5#5(kqUC&9
z{HIi8AVri0A`qZ=0&%c}Aj;kU@!?LTmd_PJ4hxUOeSF%|GUhP@!fCwZZETodzomZy
zguj1(24rKyz^tmwgKY|^a&2vG{>xbdpqaczUEMEl%UEAu|JqkP_T+vB&0leg)%SEh
zXFyL5fd7KG_y~XwDhdjJ;2#VHY3q|9GbaEEw=-5=P;KWll7OcZXh?bh#MS_I6a0si
zXJz&4ub$MEx~X#Myf9m5J{jeQJ?EhSO-XBrLPp)>YnXoPQh7%<G7>p(z@BF4UY{g;
zbK$ouhhHDDxxv|vZrC+nbflxbT<aNzOX%Um$c8#c+}eHMII4bF0-Mj8*Op8=Hn-sl
z8MtP~Z9<dlOj5Nl5gKY9`7mp-rSnt34g2(-Znr)KlJNyY;sCIkpOS-ZZ0}}tD<*9e
zqmDZ_jNbh=&T|~mRaEp%?@^;qafL}v|MVE=++h)OK&f9KASY~c)?nXf1xy$pgKCm(
z__o>382Cy9W!9pj^^nJ~{u#mn<?t+0{lUt&)fU(nY+#R)uRZQp$MGR_=%mgc-PmgN
zR6VimqMUDN^H@iK?HRLLXD~j7#^Z}>?!-SJ{E6B=x`y6=vm`~HrCP$1Zhz5>ujlN~
zue}a_X;XF&;ro;*GgFWO|3O?%z@t_5;fxi7(*j+uYmYmKFZ9L>#5H<xU&%)_RCK--
zilO52J*USO$Hmcx@$*Yzxy+5RyYyg@Q%Brgmmj}W%AF1EUyAWA9u`)$eOowW*FNOO
z@r!so8=D>5=l^)(*x6n=4SK;{&Fs2gd$L|W_kEujAH~<n;->a;pZMc!9=9$Xo3XrL
zLk6>cqmJ(slE1cc=SR67C+RNO)|QbEFTRYZpR=O<VMw^{ki|W+jB5Kvd|581dHa8{
z_f}DHZBg4MPVfX0ToN=$@Zgf*?(Po3-Q9x*3GM`U4esvlP(b1CRz+3U`TCsx{~mqc
z7d@IWxM4$$V%3^!&pG#e-v^>EmgC^U>)DtCB%l~@!^LDF06Et`=jRA{{_PC^<!&*C
z*yXNKHBrr_L+#;kZs%O_IxYV&q*gx-nU>Hju*TN^7h{u<O9SJy?9#gcJuej})_1Er
zr%EoQ?XeVv9=?t;>7>S@;x)tLwSbnch;J2%Tk6?s1N0X{8343kzDKI!pcrAG75*s)
zj)@(h!8b@#7Hs<oyb@|%$Qr#6zYIKf<3@Xj-_z6TB^3QP^Jfc9%F)cw2h}#U*LEXw
zd*@|ZHcOxv%6U777AJS%*5VSps+Y6e@9sLyk9G4E*!vEoz#ayR&MfByhVFk+I8rV!
zfuWw*mmg#^3AMk<Z$*h)Fm9Z(s|&sIL-xAOwi!@VmENP7O5mwlhUWy?IPlTCO_;8}
zC4t?dDr&aPX;};C^#4v<o=Y$;)i;cf!^J0U^nX2UxjU$L=d!h{xSaI0V@e`FM}%lN
z9wsHU#i*Y)IHuc*dC>Ru1fzwhrqK(gO%-S#|8b9_u^;#%CKd}!Fbt+Cjr#}?KLLpE
zFJ=VV*9|n$m;1a*(>-8e>)HKJW1?HfB@iL)3Ina@PkH{oDkx_(nCg*0QU6v{1OIcm
z4I)mp3FAAe;Y3!Rj-9b;I3Z(d0j@AD-F~dt4fTnv-_8)oOBlz^g;-p&S8afweI2Mm
zZ7VJH`y_bbSAyd82&?Y>cG+swxsF3UH;c(qJ*KlDejkd0k%}H?^VlC1j^-}Y^jSuf
z;Wn5nPzig01z%Z$6ZHnMYQTqQicR5=jiPA|mvgV+0RZ#vDt+{c_76(iyCgXx(nvH|
zzng8qqoTbiAy?*s7PT55zYRNzZntC6L;fiYW|!6X@y|JZJ-*Or7FI0dy%`znZ=(Lb
zBW_HqIlZ!f|JS)J7)}eyd%(I&2T;^2$t^B!RyLyZy4G>N$tD~WXYxM(n1bK)gBuw@
z{G@MKR5vO!FoI+PL)W-*M*qf`Kiw$t(PVe%NR_c?(`p}N`=Q44&duB!;Ry+s%@=pa
zuVM6wIz4%keqwauqxT#pt(X?`cc`3J`@ma3!Nm?#!iiaz+XBdij62H`+@s;fdBbjO
zXU1ChIe-Eg#gQ{}+GSJhjxS*Lql*-$(<eBT>MO0?$$L3<$F&f~;keg8R3>AwFw~j0
zs^3S(pJI>F{|#-h9_#~z@>adJ7H=xz9U=7qO$-%;rr6<dsu5e&CaEfy@L2FPbUz#C
zTYPqH*g~~l`i5#{hc!TqD?k0+X~5NfLochi*?3^%^}TW(HoLp`n~F5eJKVeJIo|M=
zDYwV|P8*@PSQxsSK<L-g6@~KbhGRklBPtH9oPm8khX?Qk-_O6q;_Sa!0Eq3T%)WM^
zlbczrH#+GbX89ri9#JDR1k;PiPZI@t|Izo#<a-sK0m_N;H!iAD_GT`+vCvGBUC)zP
z!g_-40UICD^cyRL<wm62GE)3bf%vSbaP$4vuwuT$tjfCSJS$u}$(@as$Yl$@A1w;r
z>RU(;Wc@3~hYT%-QIN%P$2y+|pe^fSnBVK+K<Ln0&mcB2MSqmj27$D88rJ65!wJJA
z8EXa2AxIh=1IkqFt(RR8nP%o&lgn%oy_z24!HWMRg;OxLFC?0<rTa@Tm%rA8dZ9gS
zWwvP~_af=8?e>JsZ~`Es7UK8jhpHb*cN_bc@olwqcmtHy+=)FuO$yprjLJn6oxXUi
z@@%2+z3_D86G#Rm-%f{+=mH8Lt&hctCs#(t_|snwFcbg&Nz7PGLVHK>HtTLLsf#Jw
zSCtU$uLPcPUH!UH{9O^_@tQx=&i&-3_5m%wDIRaw{h}(j<GIszUucz!q+G=LAM)zW
zu|2?V|JhGk@BibSKCi-nUCi0I1gj6OOP5Z)KWwPh2x^Zv)Bn7eWV^3J%aTRsG{%bm
z#Kjt&T%=ovO(Z+fWbyg-*S^#C(~`_npr?u5Hf))bgIRYaVS_lT%ik4tu9HO=89yZi
z?R8{Nt+DK~s;lu2=Pg`0mWQD=l*P4Og{_*MKMvuO2RAuGX%Rq*+dGi2#6Hf_>0gu7
z2Uv~0eP6>OnZ2Oajn7L{^+m@#>!BR(YD}{wlV+A*D1n7IXg0z@rS9@lmz;V$DKrU*
zrf=CPbB=5#)85cQ9;}=J3YN2THfhctuA4-K67O0`RJGi;!4ad=ax~ptzdzBoNrV}d
z9p3+od>omMfN4svKkz%*>J5%Hx!*9-Cfdq<JZTIvCY+g*8z<KGmN3SrcbKBOhYLj#
z!!ySy)Hs{@T<}qtU^pC>r?wOMOt0oyHJO6HC4i7<8;EQoat0*K>$Z*uTwnB(@Fb&I
zJHV`&Hj+mI3jCwd(UCr`xJ!YutPXoL7rDwtmb01CX77=3{Qz18A#!Pvv^x*B*j<)%
z$5<T2sJM|<ay<<b@=J8m)OL<0u<|jq_md@*mbS7JH_f#%Qd3w@EOB+>&pMSgs-d6l
zfZ#Q)(>-~Q=tBexn~$i?E`?8Mi_KTX{~QQ7<q(DUcdz%7>KwV^1z(tMQ&vYG)83rn
z%P%xplyZf(Q(Bp;QKT4XL=D4aZ_E|A%QvK>Mzc09Qi5>Jy*~w5nJO>5NaQsrhXnfi
zsrpz6R<!SKm=ejVPlkDvBj=GV0vIil!meg<INZbt{eSB->~2FutVpDcPBTzhS!FP-
zjQi9iHTK1wMqezqr*sBwSm+X*21Ev0#D*Tq2@f17ufoyfNC?83K4yD093<>IF6n>h
zboO8-l{GNmTGInl(y^NBxyH)ra5R7SNb;10Lnb-625*>=B4b@B{~M2clB)mcf6axr
z!3-qa79#eM`zfE;#E`NmZH%!WF3THj;EOeAc`_6qG@1K+wAp@pjd-k#KI2d5gRRby
zjkYJ`XpdiAZ)D{{F>?kGsrq>}BSG&aCZEz%!)kX_Jv+`Rw|s>|7Q7nhiMfB)z1pR9
z_P985x7y=1W5XXJ6q~DE4Bu8Ef|kiE{rcWmx!M0MW@jDSR;!CpVc{rlSSJp9K680h
zdIasz_w2(vI<eAa&UuH|8Csl2PIiOE_0J&io^d=<h6#P;=L(oL-wH0fa_jsME2b?^
z%!gog3LU?z*(`sw)>oEj`Fyq*QlYF;CE2Ft97Li&ObWDu1S0n7V!M%%Z57q9yp4qI
z=Rl$twVjmtmW#GnENTA{yfJ5an)fQb!f(Ru#kRH6|5Vdu@&Cj?<FV9)KF<8`+B;ow
z&WZ{B_Uu%O{7sQM9@0PE_teLW8X0de&b4wzwT_Gy=U{6z5gOl-JMhUk%_*tV{)@Ia
zt(-Zt`=w(LJzRt0{jOM$AKg{tU_AGhPMF`lPFpw~uDMe})4ofVUEfcPufVnX$US`&
ze_FaYkvCTRv^Mg5tDy;wrbn$tg`+X6w!gX$uny8UTGoePgPlf7zplv}G?(}@_q5&d
zCmWXAm$<<67BRKP?99$kOoBMYEz6xX+l{~?7t)`=U1HJ#`!25Vf#K?%44lKHdYbHK
z-eacIl%B?)bt`hrXHehRKd}l|b?-RX5<*E3YzTtN9}bzOo3m#>g@sX+m_i-7wA**R
z%v4C5`GUKL5-?TfaDrE3Mjh+cFAyk5{M|5D3I1KWGHIBgJW@_ew2yfC0=9i`k4G;0
z$z!5+`3-2nw)#I$>=|+Lo+aN3+cS}M_Wi`JA#~o%7}seTSxXF?w=+URr#dL!5nE7(
zj^L6{FB}K)$cRK<p~m6YUh3A~N6~gv5rHSNR&t|&r9XSF2vNc_d7fZ86cUNXDjCN~
zT$zn*v@|Yr<!_26s9VllFc#Ct7khH-D!PV-1m*^m<nHF(=#6f6j;eA0Z7~uO<$he^
zT;)~T(3PLlbC%!u#_?<RjQB|)Rd6hf#+!X;k76@0*S7ET)*TZoFLCA)e}(?A=#F;T
zP`xRo8(&;l!$Egm$+#RihiAp(+Dmz1Z2J%UsCNDTtnQqD{3h~L$SmbnuAS)NvD;OV
zAN4XIXb%WS#W{F+T#kf*B?158Xn)}UTSy%aj;nb%6*8Pc3vWdY&LHU%O%+Srb;21e
z|M7YunxA9-mu1J~|ILGg36|&lC*7y~|Li4^af*hz)6zqe`~S=NLTpd+S-Dri#b$o+
z<bRo*aH<NwhjI?jvbz5l=kI>t|L;&_i~pT<{QqBqtwyUWy2sU)DEGUCwainxnuAPL
ztgk|&klz6odN>%V`n;i<9@(~z%i@Mi2OfV7hRv(i+3X>Z=vABlR`{7>@kV$P{zte1
zeytni@_Ca2U~XglT+od5by|C9f{x;_=r?Gbpz%kZ(++mZL{_iEVdWtS9CJ6JApXCv
zkznyHJ=@voL-ip}e>>YzzQ^hq)&Rf>(f^(c_Ew(pjuwH~20Gf`j()KXL@afmCgn~a
zXm%E)@mLDdwa4zoW%BERA^%0@2$bocs+j>2{yIq30*t5k&3Pqq*l?kO$Kr%qK>dB~
z{nM7YeMb)l1I(l%UO2u+XMyNjtf}L>EiD~}e2a0ctfv19sT-F<<1=TcCW=uKlA)P$
z_2+<SV%v?Vi}=d@TP{nI+zo)yX@z}`^x7|tMr*X0t1KA5;{W`*xh;eh!+KvBgA-!s
z%PD78ONXkgME=*JrT@RYM_d2+uDR!dCJ}$0ETm|BnVxSkXGCb*7zsRLt?oPngC1-s
zdIRCvLJbyA3hva!{LxWzF0)BrhGLbMA`fWn(Xq)2YFAqJ1o570W|MEbCi<NYKcRO$
z>aEm`36MgIeL>J8bqxzgaJwie-`<|*Xh1#HO|D6ce(KR>z(W_%75HPq?{L_9?B2%n
zvn%=T47a^N+a%1!&1GUVKP2L_t05R~+L51k&-ImaSvQH$v&)6+#oyytDCv(7Oa^TW
z=!(7UX0cTR6-stnK340;TEoUi0kwW0E|^^T+O+LSxTvx+2G+t;@&$Vqt{om6+=y%7
z!)8Fb@_<C)-<K{n_YMvWW*!LES*EghT@FOJ1b$uajWJdcv<pMY6v#~b@&3N$rptt9
z@f(s<^~rP0h#ZMmI@tO3sw!mA6vxsn+&)3~>jKfiRLvF#>{rgs9k*Y5om5m=M^Qi-
zYTCx03=y@rlv-F!%e*R`da~fS?5q#W{^ITUD2jLNS5yWWqKVMDeKFK>x?%{IOP{JD
zn>>7Qas29H!9rJ)?=aFO*y1zfch}W1bh9f8_SvUQv70D#yE7y2y!H0IZ>f7`oZ10)
zzs7P}lAZdB5ZzyeV79$%JX`tZit3E1%r)ooU;6Ps>^CQL?+zKk7z(mkXL9;eQTP0x
zOP@Zun|}@o-|YRh6?E7wA)|LD4ULQ!-u(W3kav^s*9G_HP^qr9VbTYIDY+#keFdH7
zhznrAcqca}SpTYH*)Tmq^kJV#di9&ZzK0W@rX=jw-yybK+@Mrz{=gSN<!&%JrnB8R
ztkydK)Uj~K6Ml!rrMz%6@eX8gL;Ipws_wd>pDvDc3hmM(5Qe*-c+9{0FkJ9!wX`k6
z!L&>pjV=`v{i)*y#tLiPdXE*uX`Dya*7lowryl2$z8wb`3;}aK+Q&j?3KCdd<WQ70
z_l>&1)FdudVzwMfx;wktDk=HlM?0hExqUEbyA{>!U>iP$)Q^PQz_i+js(&(-B=zQc
zOb%P0H%78}Q({t;_@AGp)Kdy3b(sG4IXH>TY_(f3YSWut!CTC8eB=`{c%bKx4_jzc
zT$)R})ExAJeHwI?b%#n?PTjU-QmIGzF~s=N-(o?V*M1~gkWp)h{&Tw7!AWU;8=bxg
zAi!{hLs59p2kMy$dz}6mCK6PH0Dv5Xyv}M>54@|Z8;_Y{&35BRJ=UFx!<OKbQPtul
zYY=R&a##oqy)*{E8?mP!1<b?2ke%Xf?xz}6=P(Z85$?k{AIQ_sljmV#`c%&YHX=&t
z?K-$?v(q4-X;TGgx#l3x5sK~fdHB^1{ivz`LFv*b0p{=7KVOp2=8kpQ357>PD?GLP
z9}A8)1B#(!`uV5ZiByJEC?vh)8Kmo=qUMST8{d8ba0YrnuMM(?Ay@r0;ODTzk<lp>
zn7(YC&1>vExz`<=PNUWA_I18M>C4zzYPz6}hIt+zE9)<4X4*)SqTnQp|8=zWc{gZk
zu&KG{_Tq7v&#-!PdTumJd?KqGu{t1)2>AZ!{kvynOm3f)?$PxhUWBxa%i?;&>cx~d
z5B|MViU7*T?V6`aC{{Ii*VZ?6G~#w^9{@;3<?->j8-BWweK;HcWVhqn;>%aqck(P^
zaJs_mUj3Y$Jv`}3<VUheg6@hC#;>|sJKZD9kOKd;Yc!J?2RM{p(o)?R*Z<zU!CaO+
zhf9M={jxGVv*Szd)pgY5-m6kqsT<IndGdQ%rU<v=;QigD7uG>JfS~ggtvQRILNJ1>
z_NpqN{?_NJ^TpfS@DvrWUw1n`$>jSr9*l;=%Fw4nZbUz2Iw^cmPq~hmiaIa2OFTh`
zeY2lKI(N`5wZ)Bdlev?BkGdd>UFX!R%jDED7H}nwWct7}x|%dR_4U%Z&xbE597z-B
z;5~)O`dX21a*C*zT>ZP{?dR%lb{wy=0L!f#Qh_SOR%S}V0QAxIr2bDJ$q9ax_zmH-
zic_?XF=!XCU3m9Cpc1Hr_Js2I5zK%W=4E)b-!;jk4VO{|JoXpY*Cl?xqACTN>k+p8
z%pI|a?W&QZo7Tz)_O=xwzv%`Ci>{~M#;f|vfx5epIN9xueUnBr0w{uedNF^NlO5U4
z-Rtg}0fV`<b{&n|DAO*o5!hO9SCY=n4#!&G=Xgh!{S9043-h{J`IRD_ugLlr`vm3%
z$T@n=BD8z&vE)=bAoop+x;K^Nz3(%!mluiFlf*j#_5_j&KLwKa77j*(bNjX~Tz7w#
z^Bh4ckUFL$U@Ec;@XD&%=JeC!?exl#)|5|pdITMI=?AsSrc?*_+5=#`R?<F$;fxY!
z$iJJxrNMF^0pH%$q>V4})9emX@`qYrgWH+j-#|Pe4O%P=A)ij7_bX88x!}q7$=QDK
z;QXT=g0s4SfvgVSp-t`_5=6HheL0wPc-@|CXiL!NXsL=H5p{C{CQg$w%pKD2CLg!V
z<idM5+feTZV}LmIhhK#pqWTL$m@AhUpO?E;yufvmKVb8PN#vfj_2)yexomtpnxr1>
ztzM2y%Z3zF-Q1SH9%|e+yoS$LID|`LYaGl$A*c_Ctgc$5u-UA|7q7t&ko&H1c@kk@
z1xxoHOF*`|-9KzZuNx1Jr=!B6Lq7`tryW#-jAu$0L!Tz;)skoezi%+hA5)m{XUtqt
z_@27K0D_T_pnP{4fCq!!P(Jt53}BJ;Wi*}E>eN&&HtV-nq&Yr9jjx+(EhepVUSBzE
z*HcD882CHq=lYk|FrS~v&}*Pp^O-43R5u-xRI=^ds+F2CmTTx4H@<E#1v@#2U%i=W
zk@;=-)O#{CEgSMb2R@~e=PAA(HbzEr23rF7gcQd6Tgq7;6}plxJ}5rI_WbpxrdDE8
z)(72c%elO7a^~b+GN9yu08>iO5kIiSQ!CwFSRC?%^x8y9EPqRxDX?BU&Jr5^)t#U?
z?yr-snNbx*>EiptD{URxVLuU~uceab-x6rjNFtDy>+*Z^-CAvtf_)_3ZX9etG{CQy
zm8*wA<H5&U7+q^Q9uNiD2IKdzS%ct!WvlSmQealO?U4y6YX1zpHQ+{7lEm}5g`MN}
z=_20qe$veryr<g7A6F7RxUD3?Zc{J#BoSk|_SE%T_+#uPyEf0EesdACxoGN1`l$Mc
z3pw+q{xFAB!|P165XNck{?_!rhT1(jF_;f`XvLsaOmpBi-<Cg+>VXvXc2V$X^Wg1z
z{j{(x#CJK@$%=!bW-_RD-E%L;_>euU3()IR(=pgwbMEV82B(R78e!hBH)`5{;zr7j
zaGa|}L#UN5o<3@sJ3M3VNT!_(I5(QiV7bzB7o>_<z<wbDnAMx;W9KkZ5<PzVy&ZaJ
z%XKsraDCZ99@AF+?mLK`+Xr&b(j{P=4?5@02F26}Xw6pxg6+n&cO0GWw<A0q2U;U=
zbI#(^Uf1}r&sTc$gYxw>Z+Qsc^Qf$*S#DF1$pUOho@W%1@^~WL#+RK&+EIZJz<|qg
z=*}KMnx%pD6r<U^{m5^Bb^3&zn?JQ{$cF^m*l4HCU>P{N)|=n<pnrq4&hGBG;rdnd
zB2&-n@_6&q8zITSVl)S{nJlXEF$W1HOc$Tk@;hx<+M6TXQT4F&GXV|7TUy_4_AZxN
z<4nA2;+{^NE$P&TX>o6lrr!X|GorJ2Y-U6MGmItQ`t)FYO^%~p`j~RhNR<PaZIj+X
z<%HRDxnp%E;m>Y~Lp{!#3GZ~+J@RT{e)?)1AgG;=+;T9#2)0%e_~**w7Ia$=w92BY
zw7qxFy|s4%&3qF$e*c+U?Ec_!T3}Eg@8Y0-eC%)lbPkMndRd@{WeRA2F^P0q<D=jJ
zs1!`w5NRAW=6^`Qa2{K{;rth*^GD0IR<j~HP1b{C9$VV2XTW&jUGATFyiU7F9&tqD
zkPZPDd06G};h&L_ze89%unCg83o>b`;W(z}!`)l*N0>twlR-zML8te8m97U|*K={8
zrbYLJTYkQ8n>$&R&hL^77~JZ5SXf|O*I^3ylP(vf#~v_NnF<wwjk2uNvW1MZIAHX+
zuz=IAyUF{b_Q$oaAifvU9e=lkDz)me%*TZ+o~r)dTnKN~_+ECyUOw_-Gwt<e97Y%3
z8^fH97luUozuNO%+O!k8rY|YoJ-37HM*Vff1$#uV$&wf8DQ~F!Zq>Pv)852Il?la+
z9$I@3itWbt=AQ|SmBjLik28j}GVgL-v=qH4QA2G+Th|_;4IYoi%8SRqVSsE_;5!so
z)--ocNf)J=KqSqp==Ua1a78W0#m&r_g4RtIO?Ect9v&5&Rh64#O!G9d#}(C9L5d`a
z1L)PidRV^aJIh#*Xc2|Nquzafg%B!{qaYEi>$e#$FS-q;y9galj70FdOux1|II5`|
z=!GvN4LR4lA1duze6-hK3MUsPC|CPn?lglCl(d`fYDW>wh>d|9wqI)ST`ar_q~yO8
zcRul!GBM?593U_hJX&~U{JoekZ~=wi{X>T<#Ph**c+seM<1Fw~f?T@QSuGOGx$V8`
z&}YXvIj`S{3&ad#UrLXD#-N@4qA$e6GL=b&*I`ogHCUIIXa9hL%<e45DeYp)k7WNo
zfC?B+ujCP2u1XMZU;Wqk5sT{6zO?_nC%UR6sD<!vlEH+q`3PYFtvD{8a4RrTsyiW+
zuxZ9*c<E`gtm=NVFt43CbX)fzf$${Y+My&9MRew~YpE7v6{klvXxXHMi7}*)4|%?E
zJ@Sp?_2b$kK?r?lTB{4ch4Bvr)>v#f>sqzt$8!5zU<KkC=nu*u;rIdxxZVH9QtESb
zu)P^i673L&-`EKDr4zB}n?l6EA!1u*jkiX^Q^TNWCrotX6L}SM@w)^G#gPFk%3=ko
z#@8$*Fk2j1cyQF_lCP6OmT>I(o)>1);qA7A|9FTCE$+(ZT=KjI1GFp{N+|c@9x(AC
zKq50dzi~lDd(kCZ&PN_=EX@<Y^65)FAX_Z5IwG@KXv44!TKT6Q&2{KwcoP(%cRKJ#
z8p=tf+7fR~W&X{Z1HKug&4J_x8qN()!a{q>1yte*$=a27uU3tt1C;|}{?~hB*yGv2
zlFi_f5;}2lg48&EQtW2iRXL+Ph)5n@*7(Y0>!T1c3CZ*L(ib>bl^A|BnxqXGiNc0)
z2t$vja-*Z8`D|+{T~YWW3_bK~YP##{63Ev7a+nTHU=t-3Nya(4=#fUCW8WSwVEuzA
z-;BG)k);WVz})Ga)Eyla{Pq;f?cVO-V@FgZ+-BMvW_P;WDK0CEfg{`jhP{Dp%W=)T
zS*w+&3;ss+dhqF1C)|b)0FeM8dt2`W+OW^4^SC+>{EGQ-YuXq3W=pUA0`^BAKYoO<
za-_=3E7deK3|0)h_oQlabOU}(Og&_Dn+@kKT|FMWV=_qS!|{b~5>SV}g}$EMPX-e3
z1=ywH{EYgGU4^<rY{%II7lp67K`B>_ax#y~k9vSe@yiDR!PK7W;YHT0XF>f}wtiGU
zUCb(X`imM~<4!bUvQ`hBK!y?2eR_r^2APpkc?~z}#rfF;PU%N^MLrr<cyErN#fOSG
z?=wsIY=>o?bWJRgl3Qd2QwJW@^6VFs4+4!(Aop}-D^yTnZ}6r+aT9VfbN4uU+b<#}
zwAO+Bzq-zs9JJ|@BFRg2YS!)i2f+qY!VZYae@k39t8<37W{{qM`=9=nBc=>}TL2e{
zwR%@Z;2Ot8F|ps;D1}ddbW&NaRD`4X7{lukaqszJ@M=-^HQDUYNaOYA32Cy!dJ)Sl
zfzn;V#Iuvxq49PlnGe<WPP9X6v}zscTK{)O!d+nc$8jrh!&WzsfTB4w;n=9))W6P7
zLiFAy7v8}<@k?I>k)NuU3hq%dPJZyowlmN%xVMviz~Vb!6zsrZAS`~3F>KD3tvY^g
zz1xAvTrcj_d~-7uw48RiHc+uEE>lKUG<4mX3YyBPE<w8@K8w~o?EbC^D!fNlSgXrb
z&OdAtDZokD@WYY4=qed^=jP_t(b2I2m=tf%0Djc2r(aT@I<I{}1HS|W3<e<vm|iQY
zTwPrq)mL`BuE1rlFE^UI#}Tp>EH6VLapw!m%h!ec`0gu^8Bv_V`*Zzou%>Zh2UwJ#
zy5g@M%tnF75iU<QaJj!f<mu^$?^Q2|RM<ZY=3Rb09iV~8>g&0@eG&O?YSC7sqN4AX
zK>lavsCl-E<13%=-AsQLFX-HLFX<9#y-c_^T)0ScC$lyy0*KJ-9@Bm3D>kY*3B8cT
z6N!ZC=r|x-^|_C2?d$qA&4#~XT6MU2>hF?`UdIEqh`-_SRIVS;6AU%s*V3)WoD>h|
z@qX_OLSU%e+{CU(yY*1^#EnTRc~`FLxS&&7%4*uK7OGN}51x5v`@~^yqBLBTY(Ex8
z;&i=fR#P<`#J`YQ@8QB=vn6FdzJ)O1MrIkjo8NnCombaovvl)3zhZme^RZFr!tpo0
zVJw2>*}{THy7G$R;x=64+IMDz?9>^fohM<nD-s1G5G`iZTuI#v7nF}zRdh8%cSSe9
z04<JECFR5)wM;Dlg3R?E9C6HVA<jQxb~%w>F8*m-r<5mIR9c*(O{n6lR@czv{mp-~
z0F}sKG|a0`vzET=!C^Ck)mbD(8v7W0;;J;pvAP9=EFZs&?oE9Secsh|_RqXkMp=ZT
zkyAgcn~KgrEu@PqiDjx11<L1Ye#&vHbN7Dgkp?zSO83$zx(~WNEHU)?@MH&}yR+k&
zgqOx*5`@Yp=u$p(*&;4jaeRz4KHP~Z=E$Oi#zQ&F&37+)GbJEKScHts5|c{KX+Nft
zGAQs_l`_-$xG<y}mLA!sNh#Tkrc|IlJAasQEzc>Qnk=1OGO9uCXtV-3SMApCA09eA
zoN3d_MYhydeYxWDeY#=>k;G1IZwSM@3}xf8J+>X-U=D_%*z1Z}JV6qL`L%h!d@i*`
z#&-EvuiP0dr`u%ydU#)Ua^88rhS|<?bz-?9c!HkpV8B;>iQ{EMC?ag%S9d<FF%Ro`
zY`%eGsv@ein~YM(dMY39GW0)Y!g^PF*_E!Mn30)D&*S>0(*Upos9>oX@0RIec9_?i
z2v1d|_rL+>?|Z!6!&+QraTx}yAy9zQ&H!?WH0L63oQM#D`&rE{hq3|!rzsEcHkQit
znpElm+PR~g7CdV?7i47k&X4=X{kf$-f!@{2WlKseHr(bV@Rjc`Ci80V^Lx;4wH<wi
zi`q{{O&<JgCCG>;7ocnfPA*Dy!oM{e)C(>Gf{{OMqUtVWsN9U78Aa0V;&R{Y@)sAO
zf#~^-N~ldaXzbSG-~R$>QaK4re{%9YP1F*`GhVj&%t%jLb444x?&j_8n=n26fdTU?
z?)y@B9OU_@gwKY_s*ZWd^g|=jxB1P<U|)<DB36CjKMr1WU$7a3Aiw%hSO+B8+hjX?
z3{HSx8z4_}CKmtA^MpzW`V$h}b&~j8)=|3@EYvOok$w9ji2*`S1tz1QK-&h64olt^
zkjU#db5Bl;i!8+f8g_C!$%nejG{iy9*SVlO4t#8jeO_6%0e*fiZqoH(7{+KP^NKK@
zng}Em!0%`@x;2rqp)x^71|uth1(_6%HnK*7o_m<lZ9d2~l<=M!<T4q7U^iJ?E%tZ-
z1>jUdRs>E-cjBQD6TPc;E)PMJDm|4wH&T|<=`07vW<rn7R9XF%Js7U8O+bMjY0<k&
zP4LyKn}*!m1b;IYnEiJ|pf+6-$F}S0m!cJWbQ1H@{6i~qmXql12#fGH$j@5$uM+AL
z0RTm5L@SXHR?HJ`;@NK4S^6|t10@K|{R6)1bFa9!3B;DIUbkxMYH4r3jSpz4Wb)~S
zIekr5)N};r0*w4UeTEedf75mK?KrXqY;9ny!}10gPp4uv1tJ7vPCW9TL1CtUi49Py
zlrKJS_I0NmkC+SZWE;|ZzAeL#ed4nm?yl)}?}}@%oZe}*3>aT)a;SZxqbEqo54fG=
z1|OSSYS{XMzofePbwf8@78bL2A*dP;<%OW%AA6@X76bJAGeIYB(jMPDLV7OeP~UtC
zJ^C<tUq`)pM{iT_qV;Gkm^5@ct5;;gO+ne+2QONf?p%0U6dCA3WGumu{(&JVDM^|b
z;G#5Z>~66eh|(XbRUjI^v#2J041keg%{UygG`y}#uBV8DzhF<aPT9{EAFHyUnD`6^
z?I$6sWht|UE4mW(yaN~?Dw<Z<AaD|zCZ-ATX1&{T7Xqzw%{?S1Wv3`?Ppd?tD}B4B
zqh$NJ3(TaK4T#nJg2#rXp_Eyby#GOUvA`(97q74h4n6!xSNiEL-poI7tfFKbt#gWC
zzawNUk#x-ZJDxY5az7~XU75=7?)1M6OrQIK>39?iB`VnG_dg&_0j*a0yBMY4L|4h#
z8^T>4ZOi1JvQ1bl9o8&lh#Nz<5*F&nE%j~OskS*7GFBo#A_$)F8YC*(OOumm@GGSj
zyROr(8c$K4LvL!mkl8HV?`RN@vCKiwPics`Xt(LR*00`~LX&{F1^gy`i(*6<cRrsQ
zvc7XX7Op#$is;o_ZVaRW_C;|z&rdHhcCVLx#M4R0X8vy@cj!2nK?Xy{^u>8AF;V#a
zElz}fWLk0frq#^FS4oaTG@;wK4$u4QWenqmwSDjg-oe8p*MbiHzJIdHO(Q-(F=2}8
zx^Pzg3>}bSP{CLM4N=s|lg3ZsEo7LIf(K$wEI!gU+lX?`Ldo^(vNqyfT0l8It;Yu~
z80(6;?$x+I45RB~Gc!ptzK2-Va=Y?hz>H(GJ%@d*trhBbHs#6qDs%Zg;%d5H(1p8u
zD}7IXSWG*)JHe)=b4_odox_RkDl46bHDp6Bi)gRwy9|YQFmr<}K_C3sp?jZ{lGKJP
zds9;?xKF^L>yfBo&mW5z6IuQk92QeZ+ewP(t4%I+{~&3nC|E9To9xyf2n;+Sk1zq%
z^$Jk<^7i^w942JANA&_KUS1bHQv2Uu*-A*Em%J6Zqu~D1D?RS1)E;~8gE2D6VE>pG
zOn*>(5fTGC#vY36KP^X@Wc8j)3%WL<Sk>L`(+y6AOU<V`8y3`k6{U@d9C12Y0w;71
z*RObR#th<tmAjdZ9J)SWUQ^4pi2Z$-B4(ujT;G%Tma-pV53%$}YL?>%1o7y^4JmUi
zBf|<UJ3ZL>MSN*5CDBk%UM!fyGylj`8~iK0bDe!eC|NwBt6r3fABC4w0{6lnQ+~+D
zjC_O&q}A@Xi8(NEf8L)~MV)=0iD5<fejtlNz8j30B{bFj;5$U^1jUn{DKO#B5Iu-<
zMQ;GmytYZUpE06*jtD!8Ej#;Kfi(1oRk$1oFk7z^oig&j@+EI$d=PP|o{9X4WCy~t
zagG{==pyI4h_)e}E}jT97Z<BK>BtTgcn#>CjKf8wXPqI~l1t#07<dc8g2G^XVh`*k
zkhjK0u{N1<M|$(Oh~uoyClXC5Z*a;I^r-BvBYIV(s)r&McEo5dMR=PM#)jjl6+JpB
zN7q0d(%2nm_I%bA023fjP}$k>(M+Q3UKkE#{aP|ykl<K#sIdU$sByhi8cEwyTsFLX
z?=Ly-sH=n1{l*!6*`C%}u5-Im>rK^q`Dgd^Z!%ILiqed{)<B3RsKHzkKa7`*1IbUs
z;p90QXO_jl7v@*=JVe~r@38059njTp3V}?4g4){Jn1{1ryUoW>kV+@j+5_ek$?NZX
zD1uG)!vDD8Lm><{9<EgI@bF+~_h9q9dOH+LffHy@W4ZaF#xOSi%ws7WffL|teB554
za0E!XV3*n4+$7?=Beu3tY!$drJ9UTV6Qp)RAJQ$+-e4LnVz9v4Kj5~02qU@QhPhEI
z9WO|Dw>o*y%Fl20>H_9L4Dpk7-D`Kw^Iezi@G4EuL+&=Z#=<>*FB0irV#PLCa&;@*
zG2+tg8v}BPcV}+r7)YG4D~^hy;r{*iF-_S|B#on+;Cq?{$uOhMH@3I<7Dp}bl}ta{
zX2I90-n$`MM`zc+f=aFCaNltkO~dDe-v~ri^jc1c^)dS#rxK2E!ha9V;B#|upJ%7!
zL?UBYJiuF7|FAKCM3vrf#@%rT0$kmi(^Dhaz<<fVR&7V&?Z6t?xF!tHy7%#KHKE5a
z$(FTtQ|9=2-XFmxIcU#i3rk4{b0b?ZSjdo0qzw-YJxp8Zk3@D`{p^{7R_GZ^duYYc
zP=%Vp`?@R5ebUKd>3<?O@cGp(?&iHPVqkMlJ<mfLRAJUvNVXb`4^vGKCo-!!Z3%rk
ztOJ|UHv_WBdTQlSLe$HZxI9j@v*BB>ewXv$bGccQtgML+PpYiyyu6mRbU#bpfgDkd
z<efS>9Owh0xXJeXQNL7Lrd%k68rc-EVW=xE&xDaNuM*xB&w<L{w%yR(fhx&jz{Jr-
zNkdD{?J%QRQztt6rjOFr>P<FRXFt22IEA!{-rFea;@Z-B{Uj>n%_r)MfD>`ZRVJO+
zRW-Fp1OoZ@0mbO#$|=UPO%JZCijrKDqZu1dk9i-@CJV}DqP4Zzj;=17+J@#Oly8C2
zG#fDeG}a@)1INex(&T0jy7Nq+(JBxyk-UNtG{t3aZ$FX4ff@+zh>rDK<IA6xR=DWr
zIHQ1-^DYNd=Qmp4jNqD@xO8;d@EfWtTu}HC5fJ!n4V%wC2?;}JP@x7RZTDqXoW%Am
zpBksNvuS7sCvpS$52tf2Utj(b!8|l_<1QT%SJBQk>m$7L{Lk%T^w#gMPY8<heBdR(
zcYZRO(@#q^if=xlU~*##bt!qZ1Tv^APE<bG2gfipD{?j1+t6c$1WV~9ipV(#IW}jB
ztHXO>88CNAxh`8N1-hJQFSI$M3%R+2nS2j_paYKE0w5WZqmrUW1$0{MW%};v)<sX3
zMV#A;69XT$Ef`-UX1DiI<TnM+&f}MlG114*tcIwQ2xmoq^@~PNHH4psS*UeO;R1cS
z&vcviemy9l9MFErzr8f>6@vzRNt;W3mb78fRTuAsP0&*hYDO#}mY>~o-ZwwTJY6op
zdQzreeneQq_z(?G!__HsIii9t_>Pi=$7_q)*LQuJSZeC~dLTfm1cwhmnr0c+p;OgL
zT!?<oUs8IIrpN9{XwRT?X}_4OeMBZ-#b_Iq)n@Akrgf%wpAapIi{c}n8yRw@>9(6z
zsHKwj`J(XDWu}q?Ht$6k=Dgg!_y{eoPVl5tJxzA`YsAF`FAG}ZOIY|Eva*LV)x4Pt
zw$$h8`5DmLY$9B3KciT=_P-%TN|9*49q)AD7(E65ICGQV#{XvY?XdVWx=uf9W3tOj
za`x>PR~barnJ28E*RbR)HCBc0-_rdwb6zy&eovYyUpgtWr*bE<+gUhCtuHf{Vg?G$
z&GpIoZ;nf*{B#ft&IMl=WahZ+_J&C4u2SvRL3)O&6>bnnmP6~4-I~v3(ayoaP}UyB
z!;i`UbzO;LtE8VJBR}=YDfkwi!LsqB#$YUNSI^43k$2lBrUehxZ6^${V{-Z?S1`1H
zXM10`bFO5$tiC>x_wVM|@sPxp&ZDKpRK{XziJQ1oE#6aau;JCyKsDL&>G<E<1JvQG
z>?bPxVPc}|kUD611+07z5-Tb<B3Omhg|rHmCxpw#%veQ4mFsB{ljBk%GU#Z9-syG+
z`GWvWc$=@kp|+IeenaN6cugtRYVX4_7YjVGDQ(v<5!%JFvc|126azmAr2K;W5|oXO
zS-xa~F67d4rA1l9mf%=47*s?Zk1i5<{41eoPR%WN4WIDFPtc(Hg6h2JY<<2h)Lr8)
zY`={e*3LOwJ=Z(n3*gJ|oM6{b#BccO#zEtz6P%P<XXGrrfN(Yr?&qBbK*mdus98)6
zN_0A`tFYFIW;{W=G={X8daz8F0+wQ|2U|y)k0w$to6B!>%wW$V;Fm*;``=Zc|KhcU
zwk59NUlElL|2<#6i21!S>5!<s^Nri-od?Y$JL4&XVGLeLl99mj{wGOwwpJPq&LJt<
zbnrFByBV66c_7%%$!JQ!T(m2E0EF*m{ie{QrmPnp$qF?;Z(G=#{io}BOEsT@Qlf6u
zBwb`FMT}Sp$NlclpO@_SX7KBP(H`97x#|I_iUZ>EHNMz(U)7WJRE_B15ABn2GQ-<l
zYg!?%Osrb#%Y-N07YxW%OWN3d_71@cm(KD{UpaR=AQgHY(>xQrW(PXoDHL~;6>8Ft
zG{eG>ne?uxP_S+eUSlShSAW~liis7>KPNS@U_|uK=5y`m2u`tt+S_Xa@n8Z*hnu-_
z0>XfPEZ<+P4eR#X^78VPelA$o*MHx#+v`OPM@2`&y|L4;-daKvd??N&Vlx>q7d#wa
zOi_MWnD5^80e}orhaI|}EriClhI0hl`H=W4$}35EuDFATVw{MiP*z}Vfahngjc4GX
zT*mvwQ=9UtCa3kPhSKWl-TSW&H?XeN-Rn<%c-%82|0|&36(YfUg>b3KFzWD|7{5S6
zY3>IoW_bvTddTz#zrV^(TJg8qq98sSKc2SPxRZSGTMFoYZd9SNIB@fL&{o;06g?wg
zKXVx$6^je?=^2_Cl@JY8gO%>=!c2oZctdAM%WgBI2TWW3bQsZ@K`pr(vL@W{%7=kp
z)8bdJh(Czsp(ufg$gD+NDw9L_J@EE&<*aN%qQj#|*|8UUW~>T^Pn9{}zH@M7D-A(!
zO76pB=%<yZtCQt+M?L5dOH)Ty74uA$U=)GdE~R$R<ZF%HXFXEbm74;X-kY)cG{`u4
zA|<@Fcf8&i{VqNQ+SBQM8^FkxdPKPCXiJfvyP_;7P3@-Yv1I&ZfBe8WhKnyGW@UWT
zm+m*%8mst^kG5~pPCW^(+f3>+Drauhu=|{d_zPF~GfUfy){yZi3==_W`#DV{1^_`d
zyZ)kFHU=2cCEDjfDA4xRtZ$fakwu69v_tryUAyQ1t=Z`l+NUWW&Zg$HP6~XWBgb3r
z(Y=X!HJi`UhQle8{6agotq%UvhQy`r7`f_wk##7X_QWws7TYI<MxxS!gEShx^4tk4
zOR=3HvJ-~fN+hRs>Zyrcc7#E~_BKmJD(cC-5AjUf79R&V2eijM<e1P*_4GzY1<KK7
z>rO=@Gif$g#910kS0LIU&lRy|*zLKz`Bk}2XBPE;EtTfrgz$#g>ZiYEesDGICp=eI
zSPil>Ge{LNX(5j#DTTGlHtU90N7!5i;Hz9}fxf4JlGUJ0;i1~MzES8SP$LqFbKe4;
z@eWQkvpPRbHtah&bh4%U&Zn5LkoHGoScf=+C|<g2xywYTqSWXjYZdP7>0`*%6-oQ1
zX5sABd^^lv@V47>R575B#GxxJUSs?K>J!fm&<ZiAIgI!_-ChfQI6oGs2OCRd%9&kk
zzyzkEZDb&r>eOPNCD#mwJ-6LR4)5;fh~^X)j+Z9^n8S{4R+<OeZ4<gXehAk0&gPi8
zDgBl8jix>OoB#-p05k%CNA6E`J#W9W2b-*Wni7ZG2!N_=Y_utKyu9%F-`}Y|J!2iL
z9{Mkzp7O$MwMH8p3c)LE1`r@46GVD!U$BnxTX6O+K&0x+$Mg8I_vOY+{?tYD-0Pd<
z;)}g%23tDvj$?&P*xfN6WKH3T8z#xWw1N@acn{NwvF&}L&(%Lo*_=WgOf(;y*<t?Y
zdM2$7+RpJ`;p}LRZF!MqU6RnwUE{(PRi5=T!bW^FS%(*1QS}$?2XB58-D;|<EiUym
z)lI~ms%dpJ^^QNV8y1&Lc=-Ras#9&gSFB{j-hODd@3A|CxN=l%mGwBD3NpoYcF5(1
zE6!^-b-`_((Y3HBSj+v?_~JfR&;Ge=-bf~AwQk%#bI6&U8&(++_lP`X(2ZQwtFG5O
z8#3{hb+q-zFk8@%JK%f=7Qf}$;ldTFGV{^KKK0KGv53t1w58<J7u7#5)d$u^J6B!4
zS2$t#EMOb<-B>Ipbx5p$WQz($Jjj?<xm|&bsruTo=BcZl8S2(%U(@Zzc`?g!l+xX-
zv;$VXc#n%^IiswY=xi*pxdy-W-DxccJ+_&+S8B`ptG>M@_X<o4q(%4mlW6Du0IVAE
z49c~16`WYS*^oV52j3ELXAg|hN%1RvTWcsf`-=J9NljJ#bi#B#jE;j&WlI!=VKGa5
z%`oZ*T<Mf)f6eX6j2B(}#{;H-K%S7`gEp-LIl(>>pRv{UgC7srH^X_if6CT^&tp=y
zS3Q0i&24uWZpf{{Gs0nsXL>bTxh!L_PHZPcB(gGmeP8^V&`5P5_6KZ(RmUtC@HNf8
zEtbcHVO4VOxnGsS^v%_uQHx&pw)^Wpt*PKd(a0_p1WIWcrV?)(_*Mr&d8a^lLvH@X
zrKJgDKh9AOmRY3w_PYC7YMK<))~2+L1wdkEu{ie;X|s+cjof4!^}#@+f3XtCq3b^U
zBAhyrPJ6|%Hn340+4lZgQWmQj1rAwMkNtYtBgcV3J*<6%@1Vz0gdGh?4NILS)3e3o
zC#B)+<6@m!r^7wC^{qbN9ht_vunNaN0sy^Te%eR(-a?==jt)OxE$HE27N*YX&n$H2
z*V&DYjaS;@$jexuuJO9~0I%S{z;FMUr?cdMX{Lp4b!jE!KJvgksKFhGPaqtculLy)
z>B*O!so^y>6n`gk=4)yOi|%L(7p5Ip9jf(*climN*Q<WU5b>w52ti5J)mzV-S2t0p
z0${<nu#mN<Skn=%unh8g9=VC2|ICBb1IxSyVBZVoxjvWexv2&a)tccm7TmM5pc3dC
z+QF?DKqL$=U3XM47~}<wc)x`3J}F&YW(I7~(VA0M{HGRSIPa}Ub$bUlSwvBwA3q1{
zB{QYHGWySaW_E7DJF%hA>+2U-=G26{wmuaEi;M?*s`Y|kUHWGlJT|c=&`|y0_eIUc
ztlO_fgb3F9G=~1zzy*xo<82;8GX5|4H{C)Qhw5h!I%jTiapM^Rjf4WUbFIw6_QZF~
zFY}EY5$Bhv##--xD$KG|QpN;^w2zK3_1poiEW@*5Rl+wmu5h-&72e1~mEW&?(<P}M
z0+)HL_|RQ)$~Aes+}R|wPOkWk0u!dBJ1L+Dd|nPy2{;+2SXwomqb&2AH-;5LtRQr%
z<r(+L<!j!y4Pn2P;8Bby^AdQTFtYvfN3ifD6+d?_m7!Ld#(hkjG%Xi%%C@Y3Ye627
z^?;d5_G&>k59G(-4u#m!h?WRQsmx&Zh&2W+IG;FmylCKCVID^Av|mX0wbK;5Ok6q}
zFrLj5<)>4fSPMhI%x;hG9S@;MWOVM1H?ISKGmA{w%^OPB7uVf3UoD&c4}r60dG)iQ
zy^`=5N*xi#oP2thI6aF;-cN$}!~&#AWp?E5u*#<73D0RbDU>#io}NeSyE43a6N4vF
zq*R)$v+b`+Y{DM3_pTCDn;sPi9SAI3RL-yy%Qwalp!97DA|N9(UZ-Zm787S_peK+d
zw-$ZLXx~K`GejPtQN6h1_94&e4QD#*$_EMT^XKq=1_v@K`R`WYEn1h0>(8eIeOde6
zR1h(G<gkd(LV7X#oS)%84Q%q|B56Jr>oaeq_9%aMj1>;4Zd;fVf%{BoqVfbllU4(F
zBcNko(DPbIiibEk<GLi8=r4@f4Ns<iGkr@df7)>@;C?1pNvb;5`wF7}T-~Em^42Pf
zn$m)pE&p5y=iZ`A-|cJHGBMi@ppHn}W4aRd;0V@H5LYyc836(}5R*w8jQS}|J*kze
zE(i4wG7B@NC*+?{huO`GIceJFBnTzh`*`Yy{lZ@iF}u8cd~Q?k$OJ$W4x_8kcQ8=+
zL^eqD0yd*cs|d-(0DaB<aS&iqtak&3nJioz9E=(Y4Mx6`*z#eCRYb>tHE_0bd@e1Y
zsp4Vmtf^w5*x1;X*48?3g5=f1g&)J+%Jy4g#oP<Wt_xo%2*o}wb8W{9(f8#TYAPaz
zgd<SfI1GeA<N8yo)}L;3mz`LlAB~N71{B-Fh65m><F~q18bVO8i@)nI!iA2rM@=qX
z&#zw57G8Vncu)-K*K$SEhtx|ySkE4yHx*>4t^;c`jrrqIK|m|m)%(w}gRW0+isBD@
z_eUFPK651(-@ddlO^KKMI%>+r3b6ZVaxJ<clz*7XzfM<HDf1=I*y;^ie*f&<!NG-(
z0cLbeXKf62*1hK3)~ej7^q1ZS)kwRIp$sP3n%qLz{VX0pzDwzI^JVFh300L7Z-a5<
zOT{p&mk@?Sw7bRnO7!ZL34=Nw1u$U%&X%|4_J&!?cMsl#&%DKL^vt$SwH~PdkWTj5
zpv$T=gEc@U&d-z61(z02U@fw`1mps1|2!HPi{^`LATp+L4Dnj59uR1&9`$GK3sX(y
z%_!9;sGnk7Wjw&)&?39(%+2DTHJW4zyes5Zd+jEb$>`JiRcd@WL~7gV`HBFf0A8<F
zr}(3N3VkaAdsgw*B#+OlKKwNNxaP@=x<U0$nET7nPdv6~*!0bz^4!HISiwVdtgS|}
z+sg6@H>q71E%!BHqTAMDW#+v}xCdUH4bveI2O~>1TOlPeQ+;hbdKUNk^(R?yHnAy_
zEIBr{3y_xK%w3He)@|c}a`lDZvC<GTjDWj0SNI(?&y&x+lCXRxc@Rvr>#c<H{oVWO
zq4AAmBwoiAKqf7LT%{UWy>cB<Z3Fn>^Fv29Z7UvC#lc=dvlO2l63+Cry`s?ObL-LF
zMXKRz^qh%5GZ%;v*e>L)MO+IG-`p^Itmg8_*Ch`utAL!RtZ}o-hK&&ueclrR)BQ^Y
zg9}<R?h?qp(`4B{*p<)G7e5<#_u0!(T1b8MVRU|yBM{rplX=gL6@FzU0&Z{3uwbO|
z8cg+k_wTyF7LPd>iC?_CeN0qxB$*l)Q$9J1p{R1MRln+s1?VMqFBi?UH}TJs$h3sX
z0j=`2HJ|R1eW(w9>l;r#)=OxhUUbR~Ke@&Ao&gi11836n3&pAp4&}f$<=7E{*lwp{
zK0=BhMKI{&odjO-)K?^3HPQ!S>x%NaR<C@w64Tn-nBU;4Qz?F{us|vNKh!XT!3C43
z?uZZJm|F^)li>u4Z<7RC4{d~0Clg+Y_HsH}Mx8%nU>F(6MeJ)&Hp^J6u>{|>ksDxq
zXm1Vd^x<RX=cYJqD{1u;Bj*2rT3>4QI@;q&DW{+63G@C_RxUDXS&ZyfL5pH39-Wsy
zs~#%tBwQ~kx)X}f_-7OS&rT|EpkATxLS}d<Y}bDDM=`g4<8slu;`HGghd@s5#^S(w
zS^MgZwcye^K>}fZcdgupa$Wp6MY{6PBpq9UU*3EWuMbE1FsEFrVEPP(TMud4BQ)#`
z554!c#BjrZNCtB7<-Lz2hov`l5X^Fhl~!=Hf;)3`a&4lm#%@JnwX~|L*Qt0@)p6b=
zBc)G&y32p3Z6RsU4Cb=<K!8n1EU`QB^N94fnp&gq)W8C7-WPXP=hATUTjfsCw8x6&
z^b(y>`nmhQ|7HRFZXa!V4-_{<r_9ezMpc|SI2?WFU%dSdWJEDmWEw`mZKi}OrjZ|M
zJ2p@Eis|gNr(@&vloiho7KP1SRWD@*EtrS<SaD+@PnhXryQs9lqRSo9Em~X^{~=z$
zxQ#b#+~<E54fdd$9{23M=8E>h2|{x*pW_qKJG^t*=eWi9e_gjOts8|S=^8nsHy*im
z`R^>L2|uXWkn3_^3<-QrM0P1avo-wFhMb|0QT8MvDb?Qd2PlS}{?C$9Cvn#4&nOa!
z<@Np_?0r>NU0ate1Pc({Jp_UWg1fszaCe8`?(XjH?(PuW-QC^Y;jUDxbH1u}|3W?N
zhqYT<Tjm^VNFRNU9(sH$pHWG6aNjp1P!D-9J3~sw6(#K7itpQr5FiU#$L9H|5-%7S
z{*9P*gq&seZp3P!#WIN*W-N^!n|$mN+(1W>JdpSuX;FUFwyHK>RgU*cX&f<k`I(JT
z2`##wbfP(|6YnOQYBas!$qxz{tE6+WY)?$)gqEOGpq&YQMh1f8720ZKGoLi0XL@g=
zA$%RH4ewjF)IVm!r;hbs3ou6px+9WC_vP#^I;?DMpnsyA`e9Zbr|+~0kuRdF=>(Oh
zPth4S)b#7`DfJvS8rq8l?t-B!0g15wNiLS-f<DZ|Zr_fhGVu|@v+p>P2<wvSD>cr{
z5IeJ|_+b|s``M8ukf8)AVU;?cjY=u2?Psp#rVi6JUQ=9b&nOyl`;t;Y%65ORk5;*&
zBQbg)O|&<B23g6LS8SIH@o<1aiceiKrLA^4a^q_SM?$OX7eO{G#)QhuHNTlD8!A5<
z6U+)#84mHI6<GV!boi?%gZtujXUcfM1M(6CGBt?~*4(HSYgQw)xYZ@&zxN&Z|8IXQ
zKea4EZN9QvoK~*j7ycNRMJ4IkyoL{@&uL9T1N;i82z@GnwmVRZ1M6bH!jLk@14|qz
zY8s=Z85$>~&KG~Ahw2@Zj2?7(>F67-{Cuoeb6CsKT~8bq|DLf1#lGeSom?LT9t(;3
zBcP$5v;V6dfUH5?%WQtu>970hwhH3Px<o+GUD1ntI<=N{mR4MF8iY2HtvVZf@cEXm
zR&Pu-B_R)yS=|?~s}*`!WBhmlUAxZ#0mZ1Db;)(%9b};eD68a~#k4}P+Tb$P5Ypde
z^&H!&-rB&_1VQ~taW_!}1gt<ZJl+LZ5ze0}Tp$r?i88vC9;z-T^@Rp-WL|xO#6BPQ
z#iIVZCQWpJ6M5eu#~-JfD9O>-#U~eb9aE{Wsn0dd-4>zs*IGa{Ru%K9etC|@=4ZI^
za67Us9b=+WWPd!O{o;TGu3$#C+Lquxy6X)==u{l<W|NhXMF5U;3<bCUiR*n(qKY<U
z^Fr&{Eus=$R(lTl9!x95)(kqM{_}?~4mpc?zr1JKkBig(Q@o*V!7EHgFOn2zTT17h
z^tH059&2RuAH+IsouS(0KEK_lpm#LVYFm;{&yG+ttPoiUlUEM<b*DAiz~rDm6LrV+
z&3E%BS>6iR4hQVK=4a+^vwzX!**Fe_`PcAo1Hjq)8U^yRjfpzdv?lY<cLiYzRx&p`
zlrh%k&(QRZ61Oo?`L=hX$IHgX>j@c!bPI*$YDfZ4)1CcR@iwfnTwBeGpY;)*?udd@
zaBZsLi$V=fP3_?=+u}g2#hYlx`|Qa(z#-vt?s<>~&UC;5y*+J9!aW6ezIquIMQ;t9
z?j{`#GY^OMGVEb^RJoToa$>mhyd+hbPeCfTKS7}ytIx+inR|_+io7W2l>lk{bgt^~
z(#!ruageCtdYSQ;^+XOr1t_NLepiS^NUUJ-=1e~;cV?_=YO;<mtx`~E@k!hhg}3c%
zNv$fw8yORRs2<fU)5%X1|N6eE$iE>&1Hh+{rjj5Xzq}QP50DRXuaEIkB)n3=Nc#{H
zW<Yb!p5J1z`?6YVRgt(NU<L9iBkz3UaV`j`0aMY>3?U}_mZvw9%O&?2Q}5em4{k_j
z6$&l`^3J9^z~>F`u}Zk!f&bJWNXrMi=!!q2^Pc2xfcTpG^HWh<6%eU)H!41v^Y;Yd
z$W3(d3(BX34D3SHG3bzwXHRlPd>@G^UVWl=MK~>F;rP;XcThCQ%_Df${CoVc48fGY
z>!%^=@Pfa-=h-$EJeIIs+?0XUv_>=b-m1!FsP;M?N`xU!9~RD#LaxeBcI2@k+=jx$
zoN%??TWi%-YeNUIEV+MUr;^WKQxL}jG@^n9^a`~L8KK0j^GEmrZ;!RKSH%A<86QO;
z5;kkC4t#?yu_=T*x*gyR(~=9oo!M2e(eHi8Tw&28Kh%u1rFeb{><QSbgGo1aC;^^~
zLcFN<h{O=pM%OP!!)HSRb&wkI**nn9h6GQ-kuz>RB0`#_TboyU$^neBh6H@ZhgBJs
z&3NIoA>?}Q&CkZIi4p{3Z7%`*UZ5fKHdT|}1!f`S!;^@}1WqYImB&+>g`jgjPdt#W
z((2D`3DPxHb`M!o=~17FxYB*Qe@M<ME?G4}W#OGh>J;IREB|yQBU|tJ84le&l8Y)R
zd)YECQZ>|>Sn2amFeL2Vi`FhYbUc_~)O#uMfe94LU%RS=bLi(gzyC5UW%jw<o~_HH
zou!{W=n<<rt}VW2y*)on4RM|@cJh4=se_|j=T@0WmD$Kk;c8Zg;kf0?+}b@nM}Rgq
zqhEVwg4`0S51|_jZT<UVgn{;gKT?k6$^E%_)|B7&Ma?!@%1Pl(EDr=h__RrJr_Z=C
zxcq40vFw`L)%77s>zpLNwp#fw1Tg5xPAR@djPtB-?r4?!E-_87M#Nt(EDCRHXJ6fa
za8FS5fg1Nq_8l&`oF==NJD^GD(kQh$rNG;wtJR+RjY*Q&iO@)s{^uDG2g2W*bbU;x
zGvkv5_X&;bSE`rBXq9~)fMk1wM4gc5*hA_>0lg)dhc~M2cqp<TZh4lnt~+3J$`ZjV
z#25As4haN+;Bu(+>R;CjUqYb9^m)@jA)xll!cSQlBaWb_FzeUSCHZ*{C3~ICFt;nR
za0sMzHE40}=YA5~;DTms7L{fnnG!*hi_>s4Qm-;R+PT7SN-*y4%vp^&eS*5X_HX)D
z(eF5`kIX5Q1CK9t2}cM{0YcHB7sz7stEZ|7gk!CRYvkr$Y;*uGX}omu#8dKL9~ZR#
zcR!Uh1FVn>PORN*ty8y+_zLSF8)j80Z0H9$rt1w0>qXBnF)Zq2R5hXbiJIL#%ohy#
z5AbGJ@VwyjZPBy0rH$38o(!<!@K~kzgXml3>f-^TX~ceuHD+L)CVwcH_!Jl2`i7)R
zIlyYuPf<&g44J*5m}3Qsh)piOomi1DIHIbV6uUW{LeVKbn4&KAD8cOVkl~3?@=}E@
z%Tkr*1@Dl6NdoS!m|%=V8)jCi-0hVS(IQ>eM=n>&KR?-V`8^rvEm;Se=_#6=b8f2x
z3bqOOMjxx-S8&l(@+>H`oSiu2_|v^j>d!U=6=Y>uR}MF+V1^Zt5C4s80w1w|mjRof
z*cx&HKxgY%ei96KpV}bg>s8IHh;-L%AkBP9Y4{copt#JQJpOAc(zVOSUfJG<ZY_`#
z3=4fVn_EBY0vh0VDDFe}-R84&dYw?%1oO8-Ip`epqbwb=u<*y)`Wop>;u;taWPHhW
z$Rh*eV^WjH*VE0&pICzY2!0R~&Jyf}#h*4RR=+7P<UfaG;`bEk7}u-1wYyKj;{_bC
z#dqTs$1$18%%b2v_4=^svvg=C!n_tM+sLUj)}sazX7mOJjEo1%mpkiws6%7Jg{bnI
z=)kF4@T}B-HYR{9&;Ca0Th0z!1<_mMB#K-x|3#sXKQuAbw+3PUs=u0$g^MPi7bkN5
z=K6n{KOVG0h-G4+x1T{|v$@P|KLLHq`9(pNNoL<+w&20K#GR5N?hg=EK0Olp<(0nX
zRR(Z)OV6s1cBHn~urT_k+`6J9ad??u++j@w6W<BDC$Uk_L{Jd<x-|#|Jfr#hBi7>L
zRD$p>B6A*fQ&{5tPR6spOd~m!XLy-?ntU=*Hxg@LDYA0ZCjU`T!Yi1IKxln$C-hKf
zhnIE50`Zi;_}ZP+sQv8Ev%M$N=9cY301>Z<odUv;v|V^tfo|%UVjh<emkUX1yYc!6
zzfh>o9^I=(v81No|4TObe)%V|W$?f(L2~iG`3+*a+eIqScH5tBwZBuykI&}MZ>@Qx
zu@e%!lSx~OE&+qW##kF`oyy6{P?D**r8pI$+&{iCzJs*GmN}u4Zv@xIye4RQeF+1b
z<0n#J@&CthZcPuxGO9D5*5I_>rsz9_Tx2u=YJ54_(xL&s#WGLSG|??N`yG}_LNdL7
zmw>$*MezV6um27|?_h2?IynD6vcS7d+dy3Y*FzEdVIuzD_5WUv)sJ{zJV{eejlh4G
z5m>KR@%Mb~!Vhi9|NXwM7u%ZhM|qlXLN)CFSY}9YXovq~g8+Ht|E~YP^$5Ah+sCFi
z6vrF;$liHlHm*2%HBT)reg54J`H%VfJx+Y<lDaA{E~S&3Ta=i)_di7A6mL-KYNiNz
zR}@%l7RObO9n0P69N1J_(EGj8*c_7UIYRQ1d;T%<(Kg<vJ(jeg!3Puu@dV8-ZI#ci
zZ6i%S6!gNdwM+GkIJP@zTuaStHJK4Y1RYU#nT`HTsY4$cxX)A895Y9Z##{{<{?>9T
z397xF2XjdO{H9(<|Ci2}PrZMpUk$jHAPmdw3yVricjD$or?z``(NJaQ4u!AT169s0
z{4pl&Dxh!fjgr|dQ8LIYtIy(FF;CV4ZB-nu0Unye9U?FNdS`4;<!EnZkEIK}y&~mU
zhRZEKC%ZZK^E_R`TXR^!Z{gc-kA;<2%WHa9=8ozBx3}K)yCY}fSg}LbMi(um%ExUu
zuI9~`RG3kD1yx$LQDqH-7#E={iQi0}VV9?`A|KWdJN>CUKRYg|Cf-TXo9bbIy0{=`
z)I}X0Bvv0go+RTm`&e0KEsYiwD#hH((hoVh*g#Z@{hPk>`K0G?GwQ8(#570hJAUcL
z%2BrZ2<;UVL}4BaGV&(KI3HtR_IV%2P<fEYs`Y5~JER#i7k3S4`V+c?x7Ly(DOi}b
zBypc8vye1iLOS*c$Arv0<eX3)xo7?OR21ROAdYR@gYj$km=4Gb%h@2_w=myau{_w7
z0-GPn6b`bSRSOSIxg!?j6$I8dLN0Nc(E;!ik9_-3n_lvu|Cf?Uzq&I_+pXWgs))QK
zlL>)y%<H1YD6W}+B?I04b2TPdrJKkUjUm>=YG<elO{Ml;7#Y)=POt5#dcwGh;C<`K
zlDrkh7PG7FdP_{aTIuJs8|;ZyE=8a8fA-t2DF^V7&(Wm)?3zUEqhhfO+PAw`#?y2W
zB<1>dwivaPCv9n19`@m5o9Du-v8ju*O;P?d+#@lCfWpR|{--r>TYFRs&uF8K(iW$X
zevL8@9SkR-uH|M1u(5<5ea}TdqnC~S61p-CF|m}T_!OaxB{A9GXtT*n_pf5`uAj~|
zF^#DADq0t+FV3X~(o2ax{QXC%IK;H4L(4ph7^{{ycIw+h-lxh|TVnGU7NQ%zrsaow
z6-Gl;p6c_qzKs^w4ssn2twJeQ{h3VkEF4*Q-J_oGObx7uPQiiF!`*_0I8k{rIoUY~
z6Og3u4y37}>lv+RoG$tn#llUWnxO`*>5_BN7Me^%eEP4S40<Q+lA5aI02@F?W0`;I
zNfg$&*g5wvbfuRJFP{>TCK{S3thGHQInLQm^Ogd6;6cw*QHh$Dn7S}Tdedjhpw}d#
zaVOW~%znDL8-*CX_@OuNT2VQ$_#OyQUn?v1=b{_np7Pq$HXIas;iL?=h&v;e0>=!e
zGt~#X6K}3kZ?Ie6+v2P`DH${ojwAAGDc-DF_s+&<(FpQOV3;^NV`>~mKH3BNs=t1R
z-$4r^W%6tYRgkXiU;Ul>)NR{A=IQv7Su!Z4A;<(Glpv)_*7ZLiE2O_U`++*1Ff4+e
z#SfV!bJo#fL*p9M=(p$t|IzV%nsSCrvx4&(jv8(bds?EKXRB8!e3DL<q|2&5u)N*-
zda24!F}5adH|N4Ck4+gC6W?<!-P;yM7T<Q9u4XKq_6JQ3`#6S+p=yU-BvEnIMQ`io
z9-r(>$d}yW?jA;UhguE89Lw|$mZca4UtHT7)>5Rc8Msr;ZSf2k$2O0?&~-en+{xRo
zrE64jja1y@^fRK^55z+|eA+x0T@?>pdeYWP$Y-{_-BVe8Y<ik*{ZANDCjs)Yce(n|
zcw6+Cb=Gf*)iKRj)b|(EM7GoPxbK&V<B-hZ!YUuTVl{6$pGh-wgjk-B)eGj!^j3(*
zM<+FD^eh#OsJmAWg=d{|YsqEuu})Yyt%)EVm#@f{2$HYbp0FC)*WGXO-ti;*zbSs$
zp_i(8-ll<ii=I?5y~W(!<fs@ZORrN}r@1&%Pf5GhP<Z0Nt~yo|?^8>0CJ!{A>M?`;
zCy0JA|GN>SRwNVV8w1m@$?2_5``!vkmL36f2tpdCL7Gav$AS0;i8>G`nHi#L-0Ax*
zY$=>?lPA7vVBYq-oh@PUtx&cL%LbQjT!uUJWt~qyDYeA(R>G1QW&#aP$7{7Y6$9pG
z2E%jTR6UvG4vGhg9McOA4=e>XED|5z@ewZJjgBZMqHvXk?_tGhz&T&auE>ten5-XX
z_uwZ@FT<(YxRa~bH!+%hEC_z&H4~<*a8VBZKHbCgux@MG!>ZxFCbm+inLqo_Us8ws
zy8%1r1B;^8VHPh>BNU3ReDQ>7O{qK7knE{<sNtGJ)b>C1^tRKm(ioS@Bw9l6e3fd<
z#M5fMr@H3KoVX&ZM;vQxu-hIvya=`)EVnP$#<Z7u)t?u`-_-H;rN<bc#tq84P{tLk
z>+X!&?NP+>Ri#z<SL_vK^}YAkD3h^nmiw;SZ%5sKb5fE#-F-ND6&`5!3-uv6y03j0
zU99h<mfZAybQV|{<CyV%r-R0>^Ij_W4FdMduRGEQHIGb?EFx>$7_T?`C%mRIDBb|?
z!xn_)9~YIDiqCP;KC{tm$OUl(;Dj(s-o`UXK8S=W68U2ZlQv>c?^8^-F??M<@HZY@
z_+0olzgbCldqmd{-kdw3+i;*1WVSR1a_sw1FW{{HBo1^Jh8F);t&RQF0QE_}Ql;1D
zICyuJJ0=eXgHp@6!oJ|_iB4iSfc&mYs)a-cCq<y%l^I0CGDB=UD6dw+sB+)SoxJqE
zZ;ElKe{=m;>3^^oj7R{B=~L3+J^Dc1**MXCXw*o)y<~~0VC+8dh)@8t2A$_jAF<WK
zOn&O>ecmLUyi>*5!5`P#Zi~h&37x>Lr2i{ewsv`RXQ|Qc<BXF6wOXRyM3IvwJ_p)x
zQJ!TMQqJY1bcMF^0C|`a$2T31VP%2m-o_;FsWg0FzCsSiWA7ijJqGo4Gf<UmO?8y^
z6K>HbvBsNZLVwYR2=jOo*RJjGTV??*1Kfm>?gN}pV{nXWl-nuu>g8EYEkOz|9-FNH
ziOhArK$B{qgyoNsTIkYya_E{CsvNH0vEzl|U@P3o9ZK`ox@cY==4LDy?q_ha_leda
z=Nk~;%pZnZc#<R;TB$>?2RQHCVsB$jEJWlzkhZ6-VCEaEEmz@N1<}NB->R1i+n!g}
zn(h@)9$<?q{6pFWQ7=M?42}_h{jRAt*Beg`2RmtU-6RfyqRBI!o2A#-Zgv^~D}w)R
zw6#lECgJ794$|}Ir!JS7u!Kd?QnTMS2_rkkd$S3EcuJZ&Gz`|YFLwOU?K*Rm<FLYN
zUD#7^gB1NY>LOZ10cABym#$6VTK6yMn?7siy((=Nl7=|qUsUA8M?0UDE&nLjC*<4G
zv1-$NKXRm0d3w}dF56$6e0tq082Q5A<a*75yhCz*xa;5md6D8bhAn)R0)=Ihl_VP6
z7Z=l;%Rj+{_JWLmK>*C{1%WF0v&bIY3j_*`e||%+G~Z~X`=@QX={ravsd1hCSejb%
zHLay~d+21BpYUmeqc5Z}tg}Kf9b7Q;b8$Z2FzWYUc^u6NIT)lds8DF<6J&4=Ea#0U
zgLtfdP#bpEf8H^fdr9fexe;VET*H33;=|$(>TK<k*v^+CRQ+CO`TaCT0be)xIlPrA
zHj#U2Y&g(D-m^M{!xiLr+92_YKjYZZ;SOl;QDBw!m_vi*58bWB=cgUC3=W$C>1|e5
zujyBHA$XkbJfl!(H?>}$mB63+=G2rA5M(Y6$c-FTs7kIlQ1(eM4V7x4rPyK%975xr
z7;>_%#S^^$8r8=sknFHY8G)f>YV^k(5PUy~h+eFaG;f!*fT(b^#Y(=y%v1nl-mf|g
z9(^WXtA_$U``eqP)*mBD-9#v+N`td`ur!fFF}E4@-6moZuJ)+n=Lx5CW>~Uk60Z@>
zFiH^XIggMHx440I8VgRa4l@`ztx-K4A^zv=!l3wjuqnggMrmP!N=a+X-?*dNq25gi
z&bii*=FABFhD`$G@z6r%L_&1C7<asDA$a1>=VJj5GN2eM;*{7G+!4LRI7ET@QJ7@9
zU0?_`+LCddDF4=ELGWL@q^yJGZU1xxwL+0jeR$`)%`?WP4UzXchbX_O@V8!SllN7C
zXh<?g%}(Y8QDB^`YY6`VFe5!RX}HJ}&UnF(dO!bK(wONdO2zSJ|LB=ek(3Uvv=gwh
z)CWxG4D{^!Wsp7^u6{mOM$yKDpVf}!;uohAO@*P*>LN_K!UQxS6$!Wr=BwlP6XUOq
z_ear~6=V8anW;1%C!S+r49+de+ZCCEBLr{p1`jyo?#zkNdKfb5@x8r;_y+p5dGjph
zwTjmZk(wm*6>;0I9P{q`iQ50E^r>Nh*b-XMGAwP+*lQ2@u@4c3^M+a45u~@fWA=1>
z14rd*O^5=QPnGgOC%B=A^4J!%Z3!3F=5CeeEUj&eFW%T6&B_@__#6PX@TuOY+aLgm
zKKbKPB!*2%P8Px%k|?+xxV{zm()A6UtMz=;?tU3W@P>r$JhD%IgCo=6f=aL4ot<U_
zN%(5}VDKg9$nMd!q?FawU?qYJUj~ngg!y|$Cb`hLCd#7;BqXJn%EK?}ooCFWk3YZE
zzwA#TFhwz!cQ}~#H9}oBvRuJ@fPR{@#*7X7u)CA^7YVU85%K*tRb8S^mHviFDK0I8
zXSG!lqPI&EBZAEl>tBgUNE8@6A28clfueHTFQJZtK|9_Mk+%$n+r$+39Sd%)w^U5!
zWnb)d9ydYK#F)p<;F|`!lFY3eJZnC`h?w!`)Hy|~&Dw8@?qI$IA~|LUEMH<Vt{1T}
zoa+aQiXcn4^rmiGolkqfw8wPLUSKgzK&q||ug}6(F{DF0m|;n%vlbENcrhxz=3##6
z6~rO$g6G1IGHy+UkX~Y$CnjRxGWZ-^4>gFhkqPyoK4qHOD#^WbKA8DMLSs*N54Fen
zHNRfY0wS}$Gb(XIMR+Qr-{e^X%kAP5`lxH1gg7oZd{kcG@mS;(H#vgZMMW>@99n)8
z2O19yN;!}!9b{8W92$QotrxmYv9%~erbhUMl{=7Bq#ja>vb|jdy~V8>9AnIrY4x6I
zz=%uyxCSk97{Dd#JMA6WFeekmlOIyiiezCvfTt96^9zFnmF&RDoH=Z5+}ZZgyD}dc
z2C}H^h$_?7dWogeb3~zA*jL_RI{TtvYulrz<bzsC(-vawiz-7*%AIt6NlD1`^siAv
zdO-4%{oo#uPv+#xjP!VWPPbk8c-#2c119bOYK+@EJM7zbp4Z36$B!!;??M;r?akN2
zloQQwf2EtYdIEvtlnCi;_G}jmatiUlC;k<Dak)FTCoEhA_$1GO&@D#I4sT}Wi9~+{
zS0;gURnmaaYWFwCID2R!XXg`0{B~JBh68z=EMrZ0*hYM?V_!$u%=CpeZo_}L0Kt+Q
z^YC+V`!t)?2RDnF?0(5K)H+>fk_Dv;iikx)Jz17tOd%gnxiM<2<{Vx@xuH6sWn^)L
z7fxbDrX9c28X&`JImdN{S1Kt;+Vk=Ij{whv^g-2SYBZIbH3?B=`1ON|g4p`0`n8xe
z2x2B(@Y4kMGGcF-O*x)(TWtx+ro+{oBk6dx21h@auc*n*+dHT!Sefge{=5fD5iHv1
zV7c@Cb9OqlHbq8nnh@n?)9-?rFRW8`*q#Rhzmp!xU<vSt@+p2$c!bFNxSpbx{phwY
z!&r(uh|vkTu4moN-|b;EU)=gGuhc1`R})RELxH7eaYM6g`uU+ip=@Kfdtot+a!O)3
zX-FS|c?84YYDNa5|7MG_tz|0Ec~#vooOy;$AQ%5erl!Ly2al^8ZY1s;eR{w!*17g5
zEz_WW)z|&ZG{oUa76*ewbuh^eLj^3{;)AFBO1AD60%qhkZz8QHu47p-)$cyO>b(b!
z<=!7V!T9G|n&`cs9hH%DQx+s{+ugLyD)iB)iyh4Td?5O_Z@a4Fu$HW!3-##_w!TRG
z@cqf?p`MBUt7*u;vp`415f*Fb{2Tq;;ytFxz+-4bq@^%nDHm1O%44-~mz}L*+(f&W
z)rlHMV>cdFtK?DOnC`he<#ND{90K~)u(-d|92rh3T{1f<Z0$bo3gh(hU^27sELr}w
zzcIPnB<~i~33>3V8vOIRiT*0B_wD+Rckhbp?Z=|196WrV@Eu9LUN(l<)0JPe4QRye
zf3Eg73<@`UzOC+s!`3H3MtpVsB{Z;VG>e$=!;I{4Fu3w1*&1Et$Ih9soTlbDpkpMe
zsfllIf2Z{d^#$_}a6ObK8mnrdrK&ny>2XJyCPfMYu<sae4kiv7JKroyEdbW~2rv?7
zvCaeqBqS#)BV$~grhS$|<QZVXMa9LT-r>I79`$5!y5>}{0|az(a&qJP<v~`)dV>*k
z!|}ACL@kftjf(+oV|L8VJC|fuTN2Nw5iR9*7sk%lJ(`tO^i4oV@%DTzRT$sdPcRf+
zFkvE<86efW#fF5y=<4c<NJ|?5QmL|9FX7DVo9T}0TO%}`;J{?c%H4*Q-+EIyf|%h4
z5l;Y{@M>C|=I={ZQ<K}v_4csK7lCV#euATC!sEWya77EyFVc5lDuJ>33ZkAlG&;o8
z!sKpLA)y;2V6<FV>phCW9JO?gw=I%<O@~v^w%UKyqHByTabm80*(1JqX~m9^IhhwP
zZ3LD)k-4*q{A=*3vNL8TL(lQ{YMcVz*ZwC4p;ql18_u9xPha)N&&m3+lq?{FNEys!
zmfW2ZcmBOi;Dc=?Ti4)~!`bx=Vc{;S=*zOOAKgh5R1g?HtLaj-8tEDt+U=;o5IO#}
zU3e!^6YZnnVNmUYq>NFgy&0oL4z$gcgGKQmo8&bj%^QR0nWpy%+Eb2i3!UIr#7NyG
zj^{v@^hx~23j7!>t+>548~;a7`-Jb?+rw9D_$6|S4|VgF&U-e>1~Udf1XnA9TWzfd
zKdXagiCV!Ih@$O{YJAt^koSEFx?iflbGJ95Csr<4FBeRH7Ub#9^Y1bLUS)@InsW^@
zL69aY7mg#yu_qiBJ8dduP#!3ZrSg`tW+Ftu;kH2xV~l0wD!0$79TuOe=|fEjAe;)b
z;E`q^g_qsG*Y<p1y@#@Kc53^MEMtDGzEc_B@^f8!!|=)lK_?t(1cZK#Y~bX6b4B(~
zjNVU2!y9e$qPCcyFa_R6w^Pgn2yC_9k0b~A?^;n&Fw46PkjeIA_yT0XLk#B?#HyHV
zQhmSpDNXaC1{`R%z&TZuN53u;))jg&CoEmfdCi;WM)GE_p7hsF@V>idx$0v#^y{~@
zAf87i)VhTd^u>woS8-vUyq;25?<_Z^S*2yD?Bqn|6ELh#Y9E{B8<{8Z8~k?5#E4NU
z)svFK?l;c7aac2w_sRc`>LC+!BeD`Lfn~`ZK_ndU)8*E<<MmqnciWZFU+OvV5DaO3
zJS&~I-ADnC7@%JkWZvod;qkk94?Ti_dmNC!9{8(B)*p`Jyc;ciI9rU;+}s>((P+NZ
z4XBMl`}2k2bEa^)KUZ0-Fr7NEqmW2wPh@b8(cyPK?c;y|w+&?R`lh5GEYV2<_wWIO
zAuBuIU26BB5b$@e1_+GCQWy{12>yPR?F8q5S!FxG!aNYOV$<sjL7SbMv%YA55K4Gk
z>3qYXX}kOcXbkH<UT(Poqn67w+f&qjH~YuLAXjue9}hO1%$M%~p5e!Wx-DE~`#m;E
zY_LM9VsvKa#MRCKITaPh$w?-geTdfkbz+%DYXb092}`v`C|@9-ZfB&O0eMAH2?^jY
z;+s6&*%3oKFcrb$|IvZ~vel~1r{L(6Xucr&?G|i(n3g6GimtN}=Q~bPUevI8`7gYr
z6?#*R@E4TX#=?kwX+vcy;nHh8PFybjO3Y2wD{Hr%3yL4U4XcvTYm&-(I>V>x(>OH(
z{I1u|1}4zs(&8fGL;W_<sSeQAii{q(IHyi#w|Q#`Q*0-B)DlYt-*7=8ZWghCBj6H{
z%2o?Dn(hPMdZD4W>n4YQUq5)yLlowbUcTL+HSp>nbjsO&*+tV-zA(6;M#%OkF8{=P
zEQ=wS*#u3hJcMs%=KiYpJ7k{y(vdEj)|F`~yle8WjicWT3ro#=t_JQx$xStKG+2_}
zpLsM<$6AAp$EZ+%>$H9MkHZC>DWaOLw7g>h*_dZUjSrZDPwG#5nf|6?a4btYqBYX<
zWCEI6Zaa9qrn_Mq-bp5(>C2eu^!0ma^9=ojzGr;vpxQ0^c%@^_gB2TzZBJp#Ie7d4
z1>t;e7Ol#Z5fwvo!ezBsx3NeEYwRhZk@MF=##p9#+&j#Ua7{6g8_Bn$?++~2)H*b&
z+zmT&;d#;%-&?P^WHI_{@6j~ZcJuTXK9}W6l6kq<F{g4r_!6dL)lTS(7u=tza6>wn
z_ppzyx^QR=I>EA#uO1Sv{)jhemc-BS$1%ifI%R2Ns)<z#l$>zREQZ~B-h{hmdXbsZ
z#WLP|#K-6`CjAD)sQ+@|z&w@fK<-!X!VDLR7zkxRthdQ}$LT0P5eORgwP$Z0+#?A;
zh)4^$@(+VaGd2PB?A;;K%-#2wi-Qo#3J`5z&iCy6{4d}>Ky#O9!1xO!wN^(OAowek
zK)~UI=v6FNNN0aM>!8&Wf<_q>8mb3~{ze-BI)f6aOyMY`QdSRZjyrcQPp7pLIp-e_
zogeAIK&-8sClN`>e8-FRGS&Lyv`W|KTNA|!^&ud5HDC4PfPM!5)OI^9u1tk4CMFgX
zTKaL95fjrbC(YUC`8wcv`LglBWfO*><%T3EOGZt-ul2sC1(L*K{jG7sgPp+r770HR
zDztBUpA{7yy~g28wLg+zqC}ze+lBjA#IJxJA#i83O0A(_K_3Px>Q5_~W!B1aqQTt`
z$hnOtXnFpHyu5vtzwH8#Dw&@_F)>|r%@g#VLg?<x6n|7O`cG`OgtL`QDGCM+v8haQ
zt4#=%TT<WR@5FtzE_yZKm*W_owQTn(E115g-=ade{ay{sm83x3{v~^LL7|_^9_GM9
zlI)%f!DM>0x)x@@gDy9qnhHUDx1jdZgz%Kx&JbHioc1co+$0Zvgt(`~@>);2D{2*r
z|4SN+;?R}ckNZ;Ct###~n<O8DQSe`I<zNhi8=7qcJh5~+PpAW!aQlQrRVIYHYoV1j
zd)2;8A<WywQ!y#g0V*E2A+<6HY-Fpaf@c~1O|l8|GgEV4Gky_J2tmR%GNy8s+BMRC
z8w0oJ;?PQ*;jk>_+^J;DPxa5{|0PO5U$?}dgzP#sQ0Ctv=A$4<XDNk8>U0X;d%U7x
z)W`)Z?@N94=_EYB{nXAyh7lcdN#gn|Z=J8ty9?!fwPUn_nN4QU)FgdAhKqaz?zL@B
zwq2CVEn;AI7PH#n((T5D07^mzHp~9OWOaf{3<lGB(3Fl?LkyR*&5g<@Y_J4<FvMPr
zqC5pDL)UZq!U&Yt3iO#O^4IgHgQvATUd#zVipQz7C?+-*i`#=kn)}VJ_GnzMKa7x?
zIzpq>K`_C;N9X$Hh88<Knul~&TJK}b(t}??L4o+Es+`<k!CB|+VA2!_Hc&eHlLN!^
z^>%>$%pV93YV~%v(D?FNT2i9i>hPm+%|Wd;Y^dG!fzEW6?EZA=ugO0?O?a;F)4FjH
zC~4TE0sdSM5Q-n8Q<*5B&ArWVJrAaG1ss1d&`_KbCI|;-1I71zwPwJE`&%|i>U|$+
zX<-BAbPWBS_$`uEJ5}De4)X`FPwRPG5eW%>Af0P_Id27iwO5z+S4Kv(&J6-*t?YOA
zW_jW=C?3Xv4$eQcExN4@rTP`ngaU!TeugLa&{xB0A<Sv7q?YXAUPJKG>YbHazWe_&
zE10;$R(Ey7r9jXV_uFI!?KZAb9UG&cN9AmfktECKa)Y9jjiqtmbthbwIWIWW!ypXz
zKdmmFqNmrZQx=jWI<~!W;7m}Qd>A7xv^-%XE%8Y!0u#*THIX&rQg4Xn6Gq%?k*Em}
z_yBdRDT(=+B|4jJqEgDG7%2sX3n#0R%NI-J*Xd9-g~ca2ZfNJ~c^Tp<RYEW@An4Ss
z`is;^H(U``x;*dtLq2FOd!ha{kHQ$`&iE3kgd;+AAh|&O49jzhI7y_HQVi+a9ZP&R
ziWk15d2FUEBiupZOg6hxZK4Re@yn;&DZhJ*o;|QLFNQLuku}LW-+qDRSrvh?!jS1e
zfvKY@O2nd)-CgqW)zDcKzYl4DCykSd3y28jK5Zh`fU~j{0a(?#6RuoL(M+z)u>Omz
z?8DUb@UX^EGeDepG>JBM?m8(66DV<wmvs#XBJlk9vWrv_0f|#eKs8kxFaSUb84(9(
z#Z=XKMh~pcTx%w2X-RQy-1x78vY?<~tryX(QT(wTFnE@TkuiR))lqo7v^~cl?)O>?
zT9$$rGZRw+prC#U2%6UbNsY_Hxuk@|mv)!CZ<t?fu6IRS?2eF^xDY>sSe?}`WZdBj
zg<&GqOOXqBW*&?t^9vqM6X%E+8u|VK<won}%?dj3uw~xq-;g}-tlX}(EcHB<1Evso
zI*FOHcjC^=BbR}1=e~ZHXRtdd0He5q>XE@HXv;yK%(9Hf%gvfO2^Rj2NhMtVvbFHF
z*BNj;4H(;0P~k&f$JAY0*A3~}Sb~q29&Ut_n65e86V#JBmA;H-c;K`#Y-Q3nCi^-x
z_mN_k3*?v;pc4_u>IS^sKl;TI!ST|wJ(ev-MkiCg#>aaJg;pG4D!s8V$8VO>$1u4d
z7AVI@O@ta{W?n8~a|d~fb!iBiFQNqe=A2S!u`+kufvtbcvYXDHA%lAenqSh-*5mE=
zQgrM~<}rc_>ohd=Y2wz9luZulPwCGpkUKNu1Lw~ovocl#b+-JmU;Y3?X8ocTUMzxM
zH0^)M#SU>nZFt2#LcV%zve8`3%fWt+EFhhk`r`~?6Cjd{ZJxY=%T7Kp2$_yhK;rnu
zF#0{9Ue0HVUE;!_z&KS^4(D2BhIDLvrII9&u)B2vlkvs9P5^TWgVYG!k2~v}sC!{I
z#yR-9b2hML#lrGeY!?bXjEdc6_cO?XhNDL5+1UOwGWdqa%Nc9r8YW`fQ9)|TkH52z
zvBMz?XNKM$HYR3^<%UK^c6NtiE%42N1oQTAiYVsCa#6dtRFUOpyAJR120AeW0xk)_
zN`N#{veX@$UNO~B_5FFFbN_HEchzCVqDh^4cx1#G@Mg|Wdx=L2l>|KjaO{;grt%sZ
zW8>-UM^m}NwVjnW&RQO1L#N<H<$pfP2|ic-)}%hcPW*626VD_V%`H`|sWJ()0svTy
zKx`6qFo7XHFn{Wlaj)TE3bZ>^zaEO--rQhh@MGM{Y*?!r3iZrnhw2H9AoOl6OT#yG
z&$o}ncp2sSs&8Eq{hX|0g~&pjUXd~3@IHs4IJ1TB+%=}JEb~+Y=hxenpHNYb9uA_`
z%6eKtEY%3_0}^UG&IF^M4TkMaBwprFf8i7$n^s-`2h`wk^H;2J>lljzi?-LQv6MbD
zFu@j(%s}zye_wp?n_8|J5}D@<kF+S&%KE8IWaMpEjV6g}D_hs4l4-7R-ruplR>c=;
z?*{)6Zeh%VS~@UYi80KDz~0y*%FH~3P3c%Ww&jb7lAvW58dgle@s1rSrz)_n3T4po
zl_IO&Fxpn@W?#TmA!-;~rrqJXJzRXPsbQrWA*nmeb$TMsz-07O`L#^lG74*eYoc$?
zHiHd*3UYm(z;1nq?T%I8pE%wUl@^a+0wGR75kQf^*y78X84Ep$Zg}3t1LD?)OKNxd
zQ(FrocTMDcg5)^3S}hhM9^-ln%z3pQ4O}j<5tC=rpZwN|L8ljMpzgSum6Ha&-1J2d
zN??X$M?q2hfH<*ZV{<%r8`nMn&^PD1IVGT8FrPbm1;m>DdUQx6k`PrAfMR32FZAhg
zCsNIT6_{b_wY{y2@4SnIx;HJ<y)sO8=hANBAY*A>V+T|*gEgcXPCgeMFA)H`pwqZQ
zH1cwHBCMmMlQ+Nr^t?mxF<7lPkTQ4DHcPw+E#n#nwxI~3f%SUHhc1!Xi`bsI?TAlw
z5ZOnS@{UQKsuZIaZ3Q6S>JrD~^<1#@LfN))P-)>>#8UHfd>U`}cE4oU1fP7e$;fz=
z<K^kDh-7@5m;g5c)}kgt5F<vzDSy5pIfTx9-<W`ykug#SY9Y}s=i4-YAgf%Ui<>a!
z=W%dFbb^YQ3qn!ECB{Y3Yn&$ZY!Anc9@)I4yj_g2dMl+TPy(T6{{7`+fH!!AcO6G(
zc8T?_ex`0nKSDa+JUKQezY-C57RnvBh!-JZqP>WgWLlWP#2od$`b-Xrw@+ZdC}5A9
zxK>!bU*3(KX`2J47rPoxJU3DW3KU^Q47Z6}wHs%yd9=C5heW80fgyCX^9-1K#i3b#
zt=*~_XhSiY{XyQP0gt`%c^aF0huiLzQFcX!yh=v^$p%CA(L~xsf=y1VvAUHF5+_-2
zMzl2>>d{fIj*VcT!fpM;(6|Zyn@U!1ib%T!E)gVgZsk?Y&5j$41FLd}SfsU~;mPk?
z5Wg$;YB=d|ALl<`P2#?JBt-@0Jlt6O*Dan8nIh>cUA5w|jC5zw>YkYiIhNpF;3K?O
zI@HvT1{7Bh$wEam3h7j)KPa(&Hl@;+7Bl0*r(SXw>R*X80MehhxVX#99t7KwAB_SD
z1l%6DSt>t<g=lGM)vf!`{P-*uC|EpRS(cWT+<+PYD1BR00`HcWRRKaJ4l)YLQxC!W
z9{?`<?9WG~-V_y2sYGY7%m9o+NyVd4CA}5@t}0|=Vex}hIzu2gIAa@$fB<3E$e(Gs
z)(8PmZy>a`ww6d^iN@!2xlHVQ^#_u&8q+z747cl{8?HtKu1BwfOi%pOvHmvan{n5V
z2!M6w)r4yEt_LRiPE04vU}%M%^mr?_?Zc1xV2I>QkYMu=%QSQj%q9yJk&SyM4>C<!
zR+O7in6+}}526+`3IdfR&F}D#x4&`m2B~M%i2zI2Lu;wyh(p9keoila7uMCKV2zX@
zs#ohdu+_$HIJt4!(jbkHgbygtXSDSCbhl>BJw#*);38>58b9ca$0_DTmfdYd!Dgcn
z65G9A!>M7o4n-@Ilam+_smd)+^Q)zF#&2&$YBAJ~zfuvjb#IRW@`IjO4eaH{x~R8c
zg;%Z0oKI$FEX>RoI;HjDU)zTuz#q)@Z@AFjqO8M=?REGVknXEp6EV<;j%E&iexGh&
zjh9!GPH2mZG+H5LfqMAX9{35HJ{e&`u+{_3Ukz_jJQbI+U*Nfjp7()(;azpIK%dr(
z7k#!Eyptl^$^vv;@}heGxq0_X8N#}83~3a7EqBP03(1H|>UbQK{IC}JpV|@LB@x&L
zfv;0S_dT^D(m(ZR89JKc!(44XLuq9djSh`lPSXk=aY)V)I-Fng`o!}x`OfB`V^E*=
zyldhMYNG92;uaJzUg+a5%l(OYNCSs6*&R*eGktijKSqLbwxlZDUE{C|t{r4BW3y`m
zEdqsl8iBc#HxX^WsGvtU##XO+2@7i^=<ED=>8!Oo`UMGKvN?jGDdlEq6&kX#Q>LA<
zq|(1sRF(|e#{fI@0Fcufy#aWSo79cE;p5ZQW_ldv-@7HH6*-n0&)*I?U2e6XuJ@cc
z@R=;vNB>rv$<O%ZD9`!Rib5tpl7F<`&JJ`!+-Kjg6PY_A|29Q(g(C}ca(sTMHCO>>
zo5Ai>l7R|Xf4$8aGhE2yu2if3Wfd^+=7S8^$s@bJV9?^?O<C8hcOH<iRU|}9;7;k5
zxEO))pAQlTn~HI-gli2u!VvcM<r`0Z2VvZruYWuK>c$7LJ!M+Sb7TjBZNZTxR1W4u
zhg*Q&#8()e@|~i$Pu|&>JJ6ITbDBV0_a;;18RvOnfsOr`)G22#gQDYy^RC5Zj%m5R
z^jNQr_Y!f)yuk5e>tKzk%meGvQE8<;&SSqYP(<RCkDYk+8`G|oVR~;OamPnvr^7sV
zsEj`Ml(s#;e)3@EgjI9=I^TUrD_MY|<V?1-OPp67#WlVB4pRPDn5T%_?TUJ+)92uR
zGitDBN?B|O=jV(2+%d#`2~9G6kvvjr+15*MYeZ||hXpZN$cW&g{`klqiV4xq=u7Tc
zphxVtq9RmvHH-RVzhmBT4;KX07j4BJChld?lfiADrM+D8Q#u?2$+NN~Pt@Cbg2vv7
zzZl*rycd%J;#;fup`iL##-JN%OW95MY|(Mu<wtac*<(JZv{vYxE+!Is<U?>7m%JFp
zVV@iq>vJ}NBzSzwfmKv0wTN0=iQIF&>#;H4s>Ew9qMfs8SHO2AGh5;VdHMkWfd}b-
zV>$*l9+7iVMN<=g))Cb52xwaYIe(Q-4`cuwPCtOrq2Jxj%*+sZkN<Us5-AMOMU_O$
zMb#)S4i5Lu>N*3)qh#qE&Nij?kiOp6*EYw;s?1a@4yVfcI9`lq^H#TqvIWwa_|SHI
z?)#?zKL3SBQ&Y2bVr0+8dH~PS>UfrX-Sgv35pSJ_ap;sI@v*}0?8|`1=?x-~*h5yk
zgt@?hf5L?c5IvjU9mrCR*wBIwfzIW^*ZpHW3&u4Z8JDY?>Y$E*!lAxVyk^ew!6wW1
zstvb?f-32DqV1}ezJy)hfy%#&QiA--k=dx&3jJMq(1k3}PR^CPox(KSXZjk{rH~RE
z+X=6bPF}6kU`#ZN9$PX|QI9`(W|9?BULib2qdZIW6tYiBT^Mb2r$(f;VyNh)KX?Rc
zGh17S8lrrBN^rfCQ$VxK|8&51$oQc?cq9gUUmp*#a4RH%z+Xy3W}6+1RCaE8{-xh^
zY5;=#aWyn0No@YGfEVHJ(}32<jhrkyxe+!#ql}%4UG-*V*=}!h%7*=~$=>=8Tl~uA
zPv1?GP-Nat*SARgGHz3UT>s=%PXeInJIiXfzg6lNoAZg@)obtN;vIQ>c2HW#uz-Xy
z0aQB@6gCcAZuFdn3g3Plp7_KZ#~ZRMZ);_EdXhSLL(Tv4FXHDj(%c3>uM(J;a_Zok
zw<RMN-Zf6s`C2pA!s&Vsa(R>a(KgpYn&}p^H<D;OBguSq!2R*mc4EfFG{<UxEK%5x
zMyZI1f`Y=h=|szD7)L@%N^d-wrFep~kqd;e=}N6mL*S&-8f+Q>=+5(4G~@k&V_x;j
zA&Z8;JFq#NB&4IKFD@zBZhu(cxnHrg{1DV21_pZpN&B4+exE*_t&Ldh>9}b`d<%;V
ztmr(BNF3`aaCF(4oXzF?gMhW`<tc0yN$P~3_wd8ceeq<E&))HK{#wkBB;B^t4^!?g
zB$BY`7H`nu;a5bnu4!?A7-~Iaf_!J}wYdcnUjiXHW4K31;H}4#lkr<p(Qnni?JmPN
zt6yT@msS7%_rYyjRb&kJo=&A0k)w-{zABr{#uKYD^x>2VJAar@MJoK|7B?<22hU2G
zv(nxcf82t1Lo1R-oHxpSf=p8%0w>*|7&G&V($RY6t#u38S|R1txz#50QFVk!exa0f
zr7;x7hdgRIz6J*(hhJyNKI2n~?;{wY%qV7Mh-G^%cCSJmsAD|j$*hF=SWQJ7EViel
z|EmA)n+Z+8P;K&bqw{?6vwM^_oN>jhAe&O@9)8+5NWad}7?|+Z*T8q?FU$I2;aHZ8
z9;+)Fey)YFrlxd0$5M!w5VBKxZ3TDk;+)&MMk<&U;zm&nVn`S{)L{=Z9M?=wGLlE5
zPGqoVbwXU}sCH}+#}vh79K0hS22C8w8;`Q3`L<zv2LvqZAVpVlHOG=LBi)CV8U6gR
za=A=<t@008SeA(iGe^Yi?5Tc!-~ia)SvOo^9+#350*q(f=?@2H4;t2!Gn+4hOqTVN
z{s9i948hX0@pO8;Jpz4zl~YKM?Ox~vYK`F}TD`-WLg`xjR>up-APFVQQ$H@7-9e_p
zF)gZato#YC7b@<Tr3uyY&b<V+zg<WiCUax--rJWq^R<?EAfXfa+Dv<h^&{)(;YC(~
z(NyUn0Vo`s@8;E@!9P){HIymL<Pn5&<bo6}O0L&NUM-#uD7Ua8s;IYH;!U(G3kY_L
zoOjF!Ef8Fx3fh?}6y1%T+bCd=I8FL4V3?jHe>newR`Snni%{KwQwt*S_sO=T7=qrU
zT;dTmaeq*&UzMC1{=~rulaTr>+_G#c@pfdXC?kG`YVk1798`3*qu!a8<D1n?h_*6^
zERU3!vG0PT2v(GjL$<p8eb%Y6<Z8!Pci-qAF2Jjk-kTAtg>l{%@1hy$g$8>ECJGiS
zLTtQE0)digRfnj+4)<32SKKeib{I9Uaudl5zkHjR(PflNd<d~l7_oV}w9g_%K{pck
zPS3S>GjbAI$)0BgS^9gk2#3n)RZQJjxEgj8?()`{u*E~=<XhB+Ff#?`{m?(wn0$86
zd3=I-^=byOQv+TdP!FkYrRVtu1*Zn!-NG{ufCh$zf7~OF8J?JF<oriKpx7Cz2R;Cc
zX?OGJmHyd$ir{A4?K!(of09y>6TK{ojA1mh!ezZFr#!WlscP{13d32x9{}}<g10f<
z=kBTOc`&&1id|j<^I~3_jP=vAhwtGGVly;>L$!_t;Ztd-RbnQq!(nN-`68O%iMebG
zSK!izKT2hZwkW490&32i{A<KsTk|^qVwBMMt-1Ny@}mc8?tpo1&UZh(H*DN>U=S~#
zfddSe15$s`*gOgx-p}>fuN91zv7MBD4u4Q>3JVxLjTd_Jn}JfaHEwu#xaq861>|uz
zRvPoCHc#q`DFZeT(WDFvzCQp1XyW7J)41GI054btWN9uyZ&|}}4Q;6-E6ZzjRUH6v
zt5?p{?dKFM@Lw*sdPdXP6M>%B&6?weU;-dDv+jDq`FwXW5QIeJ)xe&u1q0Z%Mf*KJ
z0%vc$lIGCgj$C1;2h1M;(jdARDO0Hp3NHTeaRee~R8-U-V0_$o=Ev*7AnmIMHv-2c
z2v6+;eag;2B>nGI7BO*gl@>d4R#sN-M{M>GeTPqnftQCftf<msJJr#mRP9p%gcT`X
zD}CN2#04F-e#y*Yss|82|3&leQ>f!nG97cGo$sfT%$WR%XitWCl@BJk6Nt4S`B$~|
z?LGy~n2TiT@haG)f@Q1U&Ve&YHzH~pG&JhQ6E^Qnvw{;Z`}AfaFRcg-<fnZd=Cl!h
zRQnkdPfQFgWL77b51Vn%G)(q{tEF07KN7vkBln}`^ImihEuY#he`6J}8%EK06JuF#
z$H;vt){op6$yT6-(M*-r@qsMqF7oP6k({3-$sAEeIdathb#F*Gu0XS?UMiT&!~uR}
zNaX5vI|@Qb-2x7~Ly{+VF9lt}NVBP0*=IaC#ftk!sUnAfCA>9&*)5@z1GPiRs}SZ-
zp!?O%q8grz?^(ywC}vJiBWtA6yvr0qW*>^9$`nJG@7oBBwWU+M^GY}-a@KWK=ew{`
zS#X%>&bNAQ#)ZOIP8S++CMPmgj?5fz&Vt`pX`dfI&l^`urHC1p?vqh?96&N90NuJ#
zlG;iE%02|t+&+gD@Lvej1I_NNu#E?~XF5E>-#ds1QRUcXats_5oEINuMcqmFt&gU_
zkPhhIXTOd=Xz4s<jOTGFZ`g^fFvphZ1S)sT4PA3Lk<;1pLqlCMgOlEY)ePa!V(KgW
zgvS*Q7!lt};K;B>)0rONw@2KFMod{xUR}u}$=QF-)(>ACi3xW=|5*1g4n+4y0f&Ev
zHY?ZPK+Bi9sHRJ6eT~v4s;>k>!h!VNvdTLt2BDXiZ(AX5T}khP{&C}bKNShWRdnOp
z&^1QduG!t;XXG!SWJy8>^v@V@z86s8%y{o-PgEPnP+#jPGmRYf8aqJbNNf-4D!Zl-
zNOsO*hDS%kz%hXSGp>E=T&a>UHVo&}ZY&Ef{Cv6E0MM`o06ZwqkC%<VbSvkNw+AA*
zSx$Texk9P@ii(Junwr7e^8~=)fqWBCVY~ri#?2`G#AKtTR+%Ct$V!97&T(m_AwUd?
zmKX*e%gK7P1(6>U;`y9d`N6@#e>)JX_6r)TE+^%yR=^Zf07on;8SV{6NrfPSg9HPb
z<81VXquQ^JS3sTPa(_Bhq0tJ=crlnQk{wNBy(v40OHCaCU`Eg+R$If_Gb4b=67yZ$
z<4?R6CUDFFQFwayyZPjbatlkO{r$va7%^mI`b0NtvqXEdeYDz@p4?>NEvd>$fBl@d
zB@fibE3mosPv%^UbT`$o&$H$QZ$uEFdJMOR*$2JC-%;I+yX>?mTm(Z|=K^$Gkxsa6
zV%$7c-y)&sCeAIWeQt@@sF)qCqwS7;Dgv=!pZCXPzoCZp*e2!H%hmM-8)-$U`eLbV
zAcf(lE*}S$7dVn2aOM;q#*Y0z>fSN1u5Jz6jnkNo)0mCzG-=$RNgCU>t;V*|*tTsn
zwryKyW`FyA_jmrCzi0kS)|v}r4Lsv{?&}_nBdrj6;mVd`WFF>FY<}(`eUYMOsogL8
z(3>=CgY_B)&Pa(l9cvPM50m=2DU&RQ(sMrj#(d8nG$Mhr%l=QUCKxep-fJtgaxy=s
zxnotlBnPzidLNtM_RF8(WjrLHRAQM@WfH?DwXyjV_n$cz)lWKi?>69!8Oe+o3}%d{
zV(QA?Uy-0GuQR?`Ng4H8>^~wCk$-6$^4*mnS(4tb%4|HL3ShCtAdRhGjd-nDjl;i@
zHLHx`@}y2CF~`aY3*MNwe5t}P-|mmO46TQjY8lAP?n+7vE0X)8bRV!y<tlfs;%!li
znP^AB?ZkOp!@x=|H(<!@@*C1(nlXzpmeT{Ve=L>iQJ4Np1%VV2oJs5%iJkMYh;{;C
zjTB+SoQf~KqDfyoE_78$r5`3?HoE~y=sqK(Y}GZ#ji1WXg@*Y}Sip+SZthSArzaV3
z457Gfe@v3ck27BkIQ~i?aFYa`%Va<ENp5WH9F5<&weR4-gw5`XreJOthxgK`#jEz7
z4=tz1xL$j*)`RCGVp=qhP|d6<>Ob2Afq{W>f4Lms0s1vMCMMb^h{o%2?xC?UgM~_+
z=+oko5^O&A2TrYyABl9)@4$f#UNUlWs9zOJH?Al62ecU&?Diyq!rY}8?cd^JYCtIE
z47`}Qn6Vf1kk{+H7a)KvUOcYbT{5fq=+tB7xEoep&$vJ`UmwVW8LfIv<%$Ef=Bsh;
zbKo#-$EW-A2_O<n;`j0_nfKc8fyUx<=cs7Cf(4Ej)p`V~*U};)P#tf#70C=bf-^-|
zQX0T&7ly-nRJsgkx1LLn=@T;30e#Ai|B3V8{e7<3s+<W?bxTh%!-AuMDnr&~cZ-!L
z=^}{8hfhUWr#gb`BA^d+`F~beIm_A;wBiZQH;126O;%xAQx#>=Prl#f{(`Rp(P+$W
zfsq$Pem0ws-3T8rHf_W%CUS2le3I?h4jxJnOJ}NE8F<^cRIG68HZyvBVuH`fXDqL%
zpN;jW!{H^LcG1sOY#Q>~1Y_d<!{^+xrErx7i(QjR$SrKQcivw__I$~ypM)-0zT0{c
z5&OW{G@7g}U}*8L4n3Ph{Xdeikj#img84h9I8YO1w9g0jkgJkI$}C1M5H8;{>*ra2
zHal#G1W-8rsW@$YOvv$s75(k^k{8im#we3AlD2vK{kvTchpka=juN4|VrOV5zP;Jp
zqDo)GEwAQl@BH4e8~bjRvwYm=d4sBr$%st8^<N9eS-to|W#Fv+-kd6aPx0y9)lavk
zbziO#Rtr}Yg#OgDS2|Tnhvp53dw5{?QXa$x#w5^8fBsjad^rv5fG%T^mz26w?%}MV
zeuW9}?tNt8YLx>FbPo?OLnpP{IT1po``+21%cFv8Q)v&R@w=PyO2gB;@s_s9u`(q^
zY~#kqLxRJK&?+6dnRAy$PkxQqluZmwUSWNEC`|l5ur#P&#wApVGm}L4Ji>!#MIyFD
zRz<OQhVw0N>NwZfuV69kvz5uQN9))ddUwriM8T#E`BM~*g+$K3NOP}D*atlVL?f=}
zk*#$kGp-uuH&DE$3xU_uU{~xepK`Qu_`;tpYch(C<vKnKG=T574__Y$mS_FImT-*=
z>!q_<Jz%B@9X*!WD7Te&Y(ygIH`0aWzI=~N95xOt;rjhtLU=ip9Br90rX?L%<;0Y5
zCWZ4l!*Lkq3t+!<JLK~fPcWn)v?D%q{uh>`dY5w>i}bWJENFGCuNNw#9VJ6<SbC(}
z(G-K0v*ueTGk(IY)rFOQTpj-xPvs8ip79HrF9S4{N8_bnB};yT@%a{Hn`@qBZ;(NE
zQGcXuJm&SMatx7+A%6UzT-o8TsA1FzUen(M`;F9MlXzUYVi;^cePYdciS=X{m5j`~
z0z2!Orlt;oMw7d0E`nKCSnoS-9>h}mZ^Qk5Faol}lQz06%KhV=O!Y(6Vt(1i+`nNI
z538h-1n(%1o48y0RR=wJ65*#8*?+YOw_ZqWx@5el%&<!8$S}kjSmrnyER0<h*s6Rn
zh<|Q-<hjuVdSIP;R@Ql{wBq^!{hy~3=lZ5kj#E=c`WW@ScIl12<7pCm&Y$gcY0ng{
zRa$rPRB{k*dW;yIO8x)nf!f8HJpTi&{6^*%09u9rOSpe&GE(&7^VHo7pUV-%o1cNR
z=`Tcl?qD3tLLZgwf@<>N+*2&bRaU62P~H>!=PE>I5tz5bD#V6Kc{@X?heEgkI({xQ
zF1FacQNDsBTu>KP{OuEJk7t;|PDZ4bHX?KI@l_TvJ!z6@_$<^~!>Bxy9Q4j2hAt+G
zaX+2Es$z9GK{7Ldz{x$s=wER3)8#f`8uD=2>y%@3{{!~`B8xo8?p8M~8j`lTs$of>
zi-RE*Mysv!D%^2p&ZMk{nic9Pa+tUbV1{_(Pu3KIb-qC@eovV{p}FK-T(36VUn~?D
zzKcT<5eSCad~~PpY6nuS0T#9hPwj{LB11R`(CQ2G7#loT{lE*ioJkI3$E*L3^?doy
zhZ&IF{!gCibY*2`E<~BP$VrlDENg4EHY8h{pqe62w~#8wI-d85FC=3onq=WFCEemo
zvb5(W(caW)VPCt3J1pOdO}>0_>`LZzVdROS&-LtNyuRq6fzD_MmZA%S85?OYhO=`a
zn=|g;PSnef2;dr-$nk$CCB**znPovBqdt(I8j@3$?ozyL`C<LIRRxdxoAb{jUvu06
z=n!4{7Ncfj*zObB%-NAI>fH3#MPDksY*NMGwQ&5O@EEQR^xbt61HmjF*W}+eL0Z^9
zqnfjYua3ZULpUx{aV+WLJnMYEO=D%pMcT>H_kiK+6TGu9S*Ufr<66(wKZSZyXNELW
z`<M-E%yUXx%RG;$yDHQnWEt{q1UXN7JvoLQoQ`|`vRwC>jsDNyZc+e%h3GO;2)`J?
zq;|k|Q+aULBUPTRVp5h$7|d-?bHvsbW;Nu;>e$OSI#N6_%8PYKgFEP~=wO!TxR&<R
z3^Mq5neq99!l`V_Gxj@P+}Z;vIl<j3!U35{;s?y$w!R4NVM_rgm^<UIs67xJj&|zv
zv6}g8&HW_;>c`iVH1dp*6U3)n2(XIP<awJKih^%T89Qr<0gOxIreten{^<BT_j!+e
z4clgBc`AQoim@$3J2L+_Qt%}KxP4pU(Z+&Sw?~Acy}vo`6eccRPFCSFm)XGzap+H4
zC;5jvwFM;K#VC>DA;A*UOj9z@OiaMbp6*^VpXV<!`YEAl9)VX=$W-kjGVC4<oU@zY
z@#%NBgzJGkI@dM!OqK815N^0!vmHlF3>If$hPQkq$Na+|rJdJK6G>bH%SJ=US8sDe
zypo-8*2f(ameC~r@9S5*5v<Xi=N3%Mt=SA2tn^n$3C`7DGQaOR$QAq#XMA7y53IEi
zsOJHC+~E8PG)wy=HKb_D|GFg~mknxmW^a1xSQ=H)N7?^tO7Cj7C?5LDzAc_8x@3;6
zvvX-)d%3<w=)BU+!dXhsxxGv4Yjo_3u&3|+1mVaX_>oz0eB&Iwp_7WYH@z4Z<Va!0
z!T7#+!II|Zay}SF92SXC@L~aVvvp#t8l=@ulO1U(v*@DYE|x1kTk3V~q5cMEx~Q(&
zB9#Sr)XIO|a(emcfsP*`?}Yb0|3%Q4VLh$ufbnnvzOGRzo)yW_5sXaQM-NbxL02LE
z&6)APY)Xs6ha5xT+R_90^^%Dhb?*t&jstmJ2#7_yhLc4$>k4T{Wc!V)!bWz3d)?PO
z*L_~gl#0^0{HS^)qx-ojdiY?q2J;xI<%!^o*x!nT+3&Cro`euI0M|TyaNcJ_2Fu1G
zjjrkO5JAlDUti!*MAo^`Slxer5B#1riM-Cz=mtOmM_>}I+FyA6(k|-{H0A+uQ~v~E
z!5-~Z{ep|xpVS!`Cq|b>e@9mNijur~5&xuzy<5?L!vUf<0r5(C`?TMhx$Do|P={_h
zreXE7{;F1(IV4S1Osa{YF;08(Q|5WTWKBvQd-_eG*>z+OrYy=7JVN7Ga5#d_Y!<y5
zMOibDi_iQ5KjDpaQ`sq)5R?1y$|Bg;sk$GBM4puG#7?0u`}xAY6p*YiePyw=gmtY!
z^xwYqI?L6@$>=#3Nq=AdLyE&_+}WnXSb<li!+0>hQ>*xR8Q*k?2YK45dGi;<6nQ^0
zA*x{(sR}5#Ob|UAZZCqDAS{;JRnY6t29j~iw}!aMVAvVz>xMA-Y4L7lRqh}gvrw8w
zwT%<@ru;P_CHeJl@b>=j(J!NX*UU}U9oc2{9L?$Pz98nXL4DCfx6kLxy)zq3J250N
zJ3@EuQx>mblNOHL50MsF;a++8PQCO6W9q%bW+6MScSf})={5*ODI8m^AwY?3VGzrT
z`<|+JVA%$Lzfs~lj{F&td~F^4`+cDGDH1Y#5ps#}x<()yWAaQp#uHM_EIW6|q~4)f
zA5J}^y&!pg0xQolYEBo6KCIm{Vz&M()SZ@5&tH)uIWkRXA124Am*3qWI5c`94XcwH
z|9+P(g24&tOdz@zbmTd>wo4ycnA|7nmH90(xKFYm9$6%cn?CiwUt6f<|Aei87|3@{
zU*uUYGvMDj|F<!_$q(ZH%X+7gMy&FGgBi%Vs{qLB|DKM#RSh7E{{oDkh<XhK{x4e!
zpnNj|Sj+wIy?-MoLjJ#9XXxO7NZ^0douE1f@Vor~>jV6K3_QW(!8j`f%2gk~$<}+7
z)BB}V!u{KT76!3JDJ?#V8sNs=VZjNB5JreDA{>(AOc`4ZWA3c(L5XIKD2zVYBsCEb
zaweI4031(2#6e~3B$6Bleha!QiYG+0<S6)nLWKWgp6#p2W%=&2sLA1ZzTD52`|HaM
zq;vKA$lhnu7$?4Yhu|RKN=v+k<)&raJ8MQn`R6+PBdwsI8Fy`r)$;L_|6DhpV7Y~b
z{(I$${#*bK{{8R8>)S2;&#fqjDCmwM|J)=NnN=zO&rL$E5dP`^+(SG?LV1Gz&t1`9
z1au1ixhd%X7jM<9a2P?8@zf`Ag*f+nrhQNQ%hv=tJz>v7$abcz;vj?x{yzB*Cs|os
z)o0cY_b7w+wy(8!|9tzaJh=s`)k3KCJDLo4cSx7O@9^AhYC})E@~u^?qQ-7%fyKmg
z{0(6gUL9y!K|1^W>*d;g%^rX1F#p?P(e&~E-`m_)XXaty_Ss-GnJ2C>=#y@MltCAq
z8r7-2?l`O?FLPk(G)Cp0U1HI?&*rav@Rk{Ogt!s43uOTM_G6#nt{IDtu}>S9Incmt
zSu3dU<BOi9%C5iJ;_4aE>-9i?PUkQ4!&Q!QWVXv#6dsvl?K3+faO;}YeVfEVv{LQz
ztwre-Ru>dn`I*1g5>qBh_dt{PPdmKJp*s#k{v032G-R+jlP?{*N3n6xGF@>gZ9O+U
zX(-c9?C{etGoyUh$5d;;v}r8f^<$00IMqLP#B5Ko6VU0?Qe@H_RX%&##c{;<5m4&c
zTs3nxy;COVixV{o?X20mfZCHu9`>U*?^HnR$-3w{QGG^d?hiBQYVN_OkNf42lzpnU
zcq`7yO6xZh(sQ!ncvsaQ8(*z6sT`2cJ2)&wM@{2*pa$6V{D5n8dBIck%<BB7Kj*ek
zp2S%yeTOVR-5hl>plCm`<;i>~e0!G8c$KEfOZ}q%Hm$Z?)_B77u9)#U3C0$;l>9z7
z5T}&5&8a?)y)K=Rjv8_N2<_si7K$_Kr^d;1BF^S1CsjA^w|_qCyp|G~POP|gPiTei
zqy(bUvsrhY{N5Vc{{BY8-FCZw{aW+p9-L{Zy|}G{PIdmzNIJxT<=*whvfq#8aofkz
z?U}Hw4AXh@;uBASR6m}ptjReVr=whL4=C;E%u*RqzDpN+y5oEPc7nx(Ij(~&Z0_u4
zbZyOQOFWX{>0L{NfQdPtb<Q(X!!PlBif8V{N_Jp2{SS~F8hhW7#o-Hp_TtNARAT2`
z@0c+1H0Aib%wAcyYwhvRsD2c5by^e3Ez!kh_UvuixGcReqljTErgvH9%P*`AsO=}c
z2`%p0e@l|n{&}dF;o(jo6{);UpN&&UN!P8EDzY8`joQy}w}i2Owwj3Yo*}Yz{gp%A
zJ}E?%yK3LX;TPWCwjb-?mH>}Mnd+0Wy?M=@602=4b#U?hXVrfvlWyrl$4w#R$%gQY
zawiCzD#o8`0xk*&)^9pSE-tUgw^uTN)T{OO5^9wrASqX1Y#iOw1%A_F<#ChI<^~}!
zJH)nl;(8dw)O5J05Y*JYvEFLedc;m-3EOU(5VhhUAof4A=*Qau@8roECx1pCXjd0^
zYg%#{|3v6c!x)_okzO}jh6$35W15Qih)A&)(*BJ@53{Z7iGVBsJ##KD6M?8l&GKGW
z6`V1FgLrBA(LA!K+=Tb0HNPGwMcvL>AKrt+X92sl@lcptRj`rc-}dchF)C08`P^9F
zq*~$YssGjGe7<wm@kFTs?Z+J=1rHORDWcuNvaZgb;+~VQ#m1<V0n7)fBx1P%>5tr(
z{#>^PmveL1`g4kv=U&$wl3J*zeyRvvR_JSq7i&Ll8VhG1hxu)68V~KxQU@1wJT^BJ
zuMlgU>HZnKQ>Pz;*XE(lfJSXu{|~`Cog$rvNCv^ly11TDi$a74w!RE$g4|q3RNvfG
zO{y7U&E_Sv%Aw_?>kOXdvmwW4EBfuOd36?%kh;%}nn^tiLvYH|whp%`8ayrbkS#cv
z;>DX6%OlPg9ifDd#0`0frVWBDD@^Fm$c2m1eZvhCyd5E{UzIPMzl;%&T?kDXN|V;t
z2gU!$Bt4(Do0qjCjf9@=XWCCLD-+cDx5;$T9VHszkO+EvXK<?b5XsJ(QNpkcNwz|&
zcpe<7s^l{acC&+Np&Qwmjfeec#7=+o)f=qfVfu~n`TkMFWnPQetf%rzk!k<y1dr5y
zL(`z4HXPt48pUA_p*Ij_Ahbm&veFQsu?8GWnkt&;(UB~YZHBG4y@kbP3h{lLxYEOk
ztb9pm?|n~Y5B^Y_!Ci|xIh0BK#3QM+>3eW6?r@L}+g3kK54$1T-5|z<!JI_3Cli!o
zg|_L|PmwU}kp<cB*!HWx6wRr#YuU2L9>4#yBIh2v@?Q3$$+$=3e+5^#32$#rr59~q
zGcDq@_0CSwS-D;bT|v|8^KrU%eiws`7B|tSQ|n79*pI@R)lZby<h#7zYrdRL;-Jar
z8(>j;oEtsAEc<HqIOFD>jfMg*)fPyhr8$02!-uG(@;KK~5beyyh(NHh;kGWb|L*@D
zE3<lD^Xey~v(eDF{YUFMPyvad{(aGI_Sesss>t?r&Hmwm>7W;TTwK_Y3+5n3=DlmJ
z3WLkeF`NDBnI+f=JgNe#ZlNiw=7>QC@3mg|)c!P+_sBI+0^u`azDS3Qwm_{NAddOD
z{TOHM#>Fg>$t!&>R^DQOXcx0x$g9!|y8nz<*qEksc&+lbzR1J<wbn&M_|0Gz66R%W
zg1?7u!2GDRN}%$D^lf@m{mbOBu6p`lVg;SphE?9nc5SeAO)%a1qX_c*x&#%s157@j
z2ZKkj#=xfs4p~Q>X#+#aM0b~YA}#QJw{<Tm$PtgYjMC0e%z9_n6<4LZf(&4u(rKkT
zF&86;(<+<7N>6I(0vqQP9FLLO#siif1`s3Ca;i2^(*_>Rk8PcV&R3*wDVpzcUhM7C
zKuY}qP>;eOAq|+bIW4^rygxvH<i~@<!%u5Z*Wd2gLqb9x`q|rkhGq^EL%#e0Dp?=8
z&cQ*k52FB+X8_Q#d3ms*!GO~N@(*-Zx(30C09(B`in(}MI6uHThf>bd{Err(@vJ?g
zB47!&o-TcGY@!kCdO`ZilP|wVLI?!~Cf>)(i;wg8+BzPIT<;Q0uFS%H<C8O`vl~9H
zliR16St};B(_HCHDN{N0u{#E^#)i_(bXvMr0DS6O5NkIe$3`<xVhs>zw<lV=m%68<
z)t_ZvY$TCae(0o$$yR0II2~wX4NMECbMIfY*yA`sn;J^e9hh1)TAJolPqiIf+{w35
zOk{w&Q!|QY<b?UboruY^NqSzUeiiDwaPA!EqQ|Z2F+oX7Y9`E9u6avBg7>P1xA5s_
zs|9|OTDtQvE?Y`fgrvb!*sqwUS2K5_Xjh-MMH;WXs0opAgQ4`Hvh$v(GdJ+}I|ZJ)
zC8<h3yJS?O%Yzs6Ey1R~BVY6iuZaDIxEKaI7&vm3)D9#ETj`Mpe;VM(D0EQuig#ay
z7I6HGaV`h!ANlFrq8r+v=dJNSHYe>w*_Y)88+p@r%(##sgm54tF=C$QxY6-=oRCxl
zNMoPU4|Cy7E=8u}?~|Je*H)WKU0Oh7w*N>eTS-#rxx&IQrrXyx6SlWK?zE-{wv_7V
zq0h;(M>CSt^}P|@H3Up}DSUx_Px`TTru$Mb!5f)t10eiIgyq-pVsw?&)l44}b)d2d
zST(+H@Z0W)ef(`WnlxGLAuEgWeo64|GkplL0Wgdk0C=@K!2f1bQV)%YrwuUf=8M&k
zAhAP25)!ZzJlF8R3H2T|^`a4YI)AeSEHt+PCEr4amse_JI;UMgZ?8~xb~a8$Q4t_r
zh<v=>zXo0o;1{{zjerX42iZU2^~Mai#FDQ+uiY<Mz*#?B=mfxHcYm-O03U|aK(Emp
zm#FC)3=sI;-Q8m1;+p__Z+yM-aStF2o1mhm2Fvn#e|vct;5P&cqc=yNpC-T@92{C-
zu10u(iA~*QH~zF;6Z7kj8IUEmUoQq^Bbv4XQ|<|?C0X)9x!hQJbXRlxQ4R0rwNq<>
zKYAkVJTT0ho*9wEx)~S1;sdSEouZ#pd>(yVa^a01{+ewpUvjCOH^PXhh&S6XOlBr-
zEow<_rmA2?^Ap+q8qf(F^hdw)SpTKr9pmf1qF@-4pNG6oeQ+ap@kBj$c02E-gKN@*
zcO!dpx3Lr_?2su;lcbxoT9HuTf!_K?Y<B;|cw3{>L5C=x2kjJ%G*6(_Y4!43cpA+Z
z<IgV_8_Qvx)dbz6j)4H=N-R(H)GKSqQPS0nh&F@2!9MS=4*F$Is6n;SJQjE`i~@+K
zv{~9#bSeSKfoBqvn3G<g1(25OIWF(3qWOl_ME+tQd02JxShEz#L+j4e@dOL(qG0+9
zdlF~oG|}{)TZ<IMN^!R_5XMun*TZuiLk?5J=yr2a`|Hrs@hxHe5W@NfQN=DvPMaSd
zw0BL|{tDSBF(x!1l*V1o9k|{a6M0{o+$aQHj|Hw3j2l_teS-N_$A)~U>?uLa%;%N!
zWpCEKwSZ{Y&`7y7m?<)K%g?{m13&vbGqi1ev0vkZ%-wRiH_&u0NXp`j_#MsmBpWcG
z5$5ECNT1LX^H`+3{H(TS?=`9%NWoYs6QxIH*?PVoqkub>J-TDaFZp|J-Mbm&jNdT!
zH3KPjR$%Etr+4;oK`AQI%#N+g8V;^tNzg?!7H$V9#7k!IBuDZ+1>ssY`iRGpUjlvf
z((C=+%l*D(z3<ST3DAJ`-vuJ#Z|$X8?Eys53ZNwYQ0bemwI)BGcVud|xmIa+`S#ZX
z==zuGd?^y-PYf<s`X74M2z>6ri3uZM&OKT0$Q+^PSJH9b8AJiemrfrB@S;==BS3~`
zJmGfUCMqrt^M0%KK5PCAh;LL{of!cd?)b)m!{G>BhW8fzqvBaqUVc{V;Rk3K<du{L
zkUTH^`Yn+7Ul>Lb>5TxgNivDS2f#PuJ{&6mm0Pl}FBsqv6QXDVR8J*5A9@H-KYtbm
zJU4*vN-13s3s85VfPqHs^f4t3js9l0KL9#`l($|Dvn}(!zg({**tiW&oV!z=m=_fX
zm1gpflot!=^SyzXq$>%928jj2zi`*Myg5*KO7@a*2HkSo#5b&RIahkOu6jz=vtZCk
z4j4jxl6x80clOLleQ{?e@zkz(?`t2G(ri!7;`C9urzPAquT$54SItmU@xERKi$P-+
z3o?IJC&rYXKYKQGbG?u1n1-IuWpvP2FVXHhUmERi^URK6dYDWvS{Pqj?++GKEF0`^
z^MF3XwOG(wYNyL&PF!j8ERFT*_E;A;F7<@L*uA)^xV~&w$4ia-yI>WT^H}Q22<?>B
z-o1o!*rzfhJEq?DP1e&x>r2e|&16|7biofBOD7Zt74hdu!KuI*uT}dn3F5Q(0+r)6
zpXp=c6*}HzAFFdiSRSZ^%UEnDi{tlwji&Rlc@D9b{Icl>s!{6PC=QXZ5Pqe{*=V2x
zzPN|@$u1b>Gc=an$2OZ%k9_?T89&*O>-4k9f_OYbqPZ<Ra9BhCoYp&uR0OGR=034l
zm#rf<vLQ0P3CSpC<>GthX3j-?!O+@yzsk<QWJmZL+0?JUJ@x#%+pJq+O>UsrB(&iI
z1|GB3vA%8W(JW1TqvsM{%&q>V%fH}*4{4gXPfzyQvPOmHLKHao7khlRsgx}&*7JbT
zy)xDZ@ywajqJ7r%%yU3ywQFmu@rdqvqTidzYd!di5ElEqaMzBcE1%@F-Xx0}u?c=$
zVR4Vhvu#l(Dl;hjp@O;i7%MfIuVDjU+`@j)6=k*%($hx*So_t-gTQ!UGdsV2bOzX|
zq0v!@jX!~3wgPZ1HvrRxO`zPrQ$0cl{;{S2kbF#H;y@ggJWk{!@`Zts5wq9ZOaG~8
z<QGV7KqN4l!OOj~U+H>l+TGKW%xb2%U7wbQw>O^t!@+?S2wJ+fFDuRF%h~SVkG0;1
z0WRjkwGGhse5@oXtY%!i=Kyl=ydBPt#cGPqzG4jyl9!kFp>`l8CH?vH=if$8aa~>F
zNZvcFj@MJG$3qz}*v_-n7L|5)P5?B-ce^_wBF}&G+Xl4DW24E;fHGp&spA<R@MUre
zoUhU!Le21eG9e}*ae3Gbbh+%qTsGgoepCinQ(-u6*2aotoRB%%?M122z2ok2qVH{|
zsD>yghbi9ckyMQFO+VgZMfdZVN2jEnWiflVZ8v**wFAM;^q84phGMHkeL9Q@0;!K1
zp7<C2THv!$W3g>!n~+*$gQ;>!Q#z0PJzB4SitGq}dTh5U9*;CTHVlyw-4v6ueDQUO
z0FAbRmzFeq8jUlIOf9cu95wa5(t8*VPvB;&?PhX@7-G+7Vl}X?sT?quBId&=ypnf(
zTDd!t;E$hloH|qOm;b?>yiz<<i=fbF{w5FF&hWKpS}x^gH{Ze?kK3g4GfXd%@l>Zi
z5uYbvelt_Vqi^lKWG*xeUbc2gsJ1?85nAlhC!&e_c7C@wIn41$KkM-H$@;&7HyhFT
z=AP@3Lyl;%R+Y8*)SjNOUM{RRRocNXx>Y$H#z^H%LAZP=xd8;!qu^rAjM$<_`X}Hi
zc=P&QZ1#lY@YJufM<l3VXbEPqVkx%Vhj--nHc81o!G2ZR)>}!DM_dP{Ud*>Yv{#g&
zsT#s^R2o9CM;X-6XB4(7j<V!vX>iQkQK^`XGT^vCN2^GC2f#dA^Wyi<uIX;rw|5vL
zAiH#bXlqa3Ph)+B0U!H$wyjLz#lF)W5*cZ<RBOWX_OKl;K+#Ftup;Q}%;9!_X0d_+
z52XSy?f_N=JTil%*8zk(fXdtyUPLC7Z|uV};^0qU3S>HniHQlw$#O%*;vH}lGUx|{
z9N_*6u=J%`9T{p?ZFMSrJ3iphNh~I)&-dp>w@0(5id<mMZJO<u;Z`4v1TYq+OJnLt
z5I*OvzrWq%Hnag@&Ii5N368{~R_JoM)t5A8=m{)blMJ!D`BJGM%9YvzdVgVVzrSMA
zYp#?^iK^3K!eP?w0NnlyAeA<EI(P<*#UAe(I`F6LjuDGb%#Xyr$h^6O=Xp?EolJ4U
zPYp8?6R&puzD*G3nJoB2+_&y2pA)91$)99SiePe?$28nuX*Sk)=f<gs(fbAbjU*3(
zF*Z)$>}_{JU&p%jvwSq>`J_acb4{`44)HNCF$-P93bJv7Cv7n&@wApvVM!{-OeH1G
z4Eb<jWtib?5N=D09z#n*7nXuO38Ag#_6U&~R+0|=rh@OxD!t91&4kQ2Z9>QMZ-kj{
zI&5|n%;S?45(S%$|CO0PH0bO%JS+h}s+>))uzq?8-`c`1>kL*JD{ljDeEk0K2!Nrz
zElx;%1yZCe&x~a`Kd!fzsu3b}h+J$#2@8s*<A1$Pdmk8QcMR6T6?il{w!j3ah}*Cv
zn{m;E{1&$pH}=r^rIgX#Db%}5ev5LLwkUCR;!oOet6bta-S>uJLiTt!Iv0ssPze7E
z9Agx}^p}1}T`XfoneDz^%IT{An_)=3Hlu)v0$HZVc#|A`PRa&9LQv3KH~&wu`y_yq
zx!xNytK+4i2?LA`Qs@|K-0m&Ge7>bBBmz6S29<hq?Bxb43TEb5pa4^gqqugn079Es
zVN!TvZnZXE|IvqlU7w*3$&6CGh^T0wJ$zsfa60XCb1UUfwa<CLz0k&=FYXSf13zqZ
z0K1P=U>C@!KZp#xcn2tOlx10<AWJJ0g>>4%4Huw|#GJBgyIV4I0W26ek8T~7>bVQ6
zYXmH)TqbDs#!wE-WO5CsC92of`ZkND@6l1BrIVh1KL}jelf6k0W`}DXOBp`hhC)$a
zKKlK{%q`7EWzY*Yc|@2Is_hemfb#TG!M&~zR(B>^zej*47^j7FKHd_`;K4B?@bjMk
zeknlE^%GVGOcN2GozRfbCY=^)@@i(cZ>M6UPr;zuF$Y|2dP_?F`(Ny>Drt_z#y&q3
zjKEp<Fy7^K^i1QNZZ2EG9DFv(UFmAAk*>o_G@AR(rMR(k_UBI10n6!y?k^RiV|%{3
zVOY?y@=$tgU=N5G+m}#+CCc!(?MTiGRr7GN;v4}D@_!ZuFSJ1)-N2J#<;3xEAQkv{
z$oiZ|RMi&`LEb=;B(2Rw5mibwwP%kx(HlJ=zB|9exTc>x_IBE@8jIK`bk9Lvwq=f_
zlxYT4*-ax<0XbG4tT<2GPyFOG;M?P44vQH-pO;Pe&z#mtD|$QA;zkeH^79W~re5%_
z<3KUCbaN(F_cacz@|C%Bw#Oju_bZQ(Akj17Q7LK3y`(Ujpb=>1Bgy}aQEBarC@L!t
z8fAL)7s}?~G_XQgS@_$rr6PC(5n=7L9y^5Z`pX`Bss+F64W(fXHo(UN24+Jii#0>T
z!&@~Id}jB`B}x^Bo!+3~v)KrO2wb3a9WPgO(CMcNabNg7(<&<&%A1dh6`xstm9qV+
z1WqEGLB3*h0QPGjTU%UOVufEPZgu4h_EkPFYw0ikxGj-UFjpI@wlE^74BPZkTH6>p
zhZpqEAST94LLqg=VMt}HlQpp}YGT*^sbQe)j@Xq2s)LfW{9w@G^gH>7&qT2=1dIvi
z=V*F`tU~XXSApyk?A(6QE)#n0HFasrqV!RNYis)(|57?Jn=ZpN&`N1XWP&N4I)PTT
zx#~S$_Keuuwq7=uFUj(#?ap3|FC6c+C{CZfk@oY|apuUvaalu~8rbnsP5Rk5V@I_y
zRMf79R5QHJII4z5Uv--ImQOEzo$IZ7RSb!B`t?s<C9S8#$A`@&O0jvG8S1uYr&_Gn
zpZV9id}6h+xtnKw$t}1}^Db7qpNg;mCD#Jp14m|};%8J|1DGHAnQ9QDz|!i1124m8
z#)|-Ly}gty^ERk5`s~Rq&j~UEdBKtHd_1~fRg1Q)nWvf@nzp`Qy`L574SM6ma9pm!
zQOgmX2EF||c5Jx~*}!P!LfZEl`AMykGP3m6KhF6T>)S6mu4E_sN)B1*m*?vtdRW~M
z88eIo$D)qgV3OSkw@+^qBRiG=__aLG_Jv|NLtxL|&DidKSXcqrMd(HHLUjTN!E9q2
z(TO?NPP2f$Jqtimow1zo008m?|2q%B;gu<JeRP~I?I>WeV7F8Q7P}93F0ZHQ%*Lmq
zlGHkW_Xh^G2FqQ*`e;DExzQ_hxrFl8ndcNL4k45VAHbWvv6&4oJ|e}BsZ8I{kZs(-
zb#D|q!{bgR%ahBe;;Vy&cS=vGmnT`ds`em|=-e-9t=dz_&`9df3ehNSw4jEY!+)!;
zkQ2%sVCG{#Dq-l6ys_NIFW&U)HFBu?sT6wI1llH<<XU$L+VhLQwqowDy5tor@=f*j
zs^AOx*X<?M;O6g8?cH_y7I(VM<ndu@sN)EJr{)t}IpVkK$P)>8S9eeC7kJ(RG9F}j
zNmUUNdUNU5gRQ&>1FVMb=M}sDO4pZcR9En>8Yea`3_;yoJjrpFj^?l4Ky1?yHxtM;
zrH7qI$7XpiXwFo0T57x2wf1NIJ^<<7k{xubavf`_TL|=?4K~DAC_GJ4g5uf@=ly1W
zy`%~VwO(Lkc(37QXj6o*KrYTTMvhZMmMbo3=nVr$kX~c3(ic4Yifrz*wi_B^rES;g
zI&)-=uY@5(hMr$l)qN;Wrl)Lr9cRG5A6wNKt@GkUpj+=pKbNO|!xUYB;C+1Ylp^x9
z13P4nN%hauIJSb~b0Qj#(S9wyVkeK~q24@+#B^Jb-<nTr&=LT6O#;Av>g7O`|6{jn
zx#9@`lir4QvTUE=;I@Hq9oXB}x~%}(_~_{9E+7PP(sVt{GyGaQKQ{+#lTiTi?*m{D
z)bnA$Mop~?_+PmLF)=`)G|XNuslvSKnv`VM*4HNhEERlB)9HO_P|&BWc4~BOKxs^)
zQvDn7b|aTg8w3n}-~dk_tq+e!?(^0jz)|i)l1%>++w|?%BFqflN@t42XX{(@wZRG9
zyE-H8iBd^RT-LxUyZNYFb&TX98K(U6M7J_xV&3E;oyJm>-1=E)J#Z3b{;t=&{552#
zJ<a?lmqb%|<IB%gclJ^8-_H5S>DoTc)7kR9JhRuFX89PL{IZ0#*^KM=(BN!s?H4Cx
z@(JTK(I^Zkt=Lg#!)5?;cC^wTYX*az-(|R1a>=!S=xac50OdKKr)MxsZI1GvHaGez
zbjl%u)N|EsxAWP;@LJT7NmPYF8YAXa4zV1-dg0DVa&$OXsF5Bk+P{tW(`a_Q5bdqz
z#^<}`HiM9<UE3%ni>G7=QXXrv&c;9U9XM2NXZZ~6eX`)N05*WM(k~w7xdo1I+kTv$
zT*=p>+SOz_7L5^w=!UO<vrd}uLhdo-m>5CbN_0uVGu@ELK!Z}_B;BD&K&4(!Mw{2i
zX$YOWg}Vss1|4=|+J^eSGYh`^e24JPN_?5G8N9~ol^WD{v>os=`hIF93v49S6A&Gs
z-Xh@!H8TC|MvhBc9AT1kmQn(nS#xTm>MHhrs(<$1j#w6N%6G{;mGFW)nb_=`Pk+#h
z9ZK^mM_uBO6`QlFrgEkYJxKD;b-46UqD#~IaOP8Qiisu>t=8=eF^pt|pD!;gG_G3*
z)-e$S1A~@pi@CYEtG08GNB|+W1M(~N#YZQrE$pf6R(rVX4^TaUNYL8Anx$`KB(JVM
zT5qxZFrxJ~#c|VC7XOz%;f`7Yn8~Ex-(DYq9OmSnC&F?nqTBmyg{QE&IZemYY{iFE
zJ+;mY=%mZ=2_1Ad^`9Qpxfhin{p5@oE|2!!nU5Plp+h-Vnq`86#GIJYH7a-PD{t}q
z9gs`eygy}U)Ah_8GmXlkcu(%R2V_F}&448ei`J=-QD4Q&DoA!=P=MNjQCXA=t+s26
ze^FV4d^n=vFnr6JP3(?J&aMof(7vN(E2P$mEU7)7rDOF(#J~0%8vfK$_xRJfCLX1y
z>>J4XB?uJI?rDwS+D$fGv4L#j&BPX@>nXmhG(pRU**7dx;Mi-t<}0T?_ENC&qK>qG
z&!#-IFn5xsgXbC0fPKKFol?O6Nm<yiOB<xI{{G4=tb@Bo>TI%rm-_n^2QAKp)4n>~
z#mq0S(J>L3Ma3oz56@QH<cl_ig&|Y~*2O7puF!1y)aL+#eFDYkZA9$|8Kbk>h(KGf
zKljNDXbHK$3_aNWIlvmz$sWGD$WuARUYoLwGgV4A`z5p2Up5fUyeVR4Ei@tV1C{w)
z*3_#mmQI1ETnNbk=>0Xl+DY!+9~S@|PzCU8AL4&tA2h2Ryzr6L{kE)Qq@wbA6^wBG
zlg$BGE&zIWvC)PEK+g8(JKh{p_hrPzP2jn3fBX9bwomX`o?j-R5WfB~($}}&#CD7f
z2$=$Q{`IajfGre|^#)k+MES(TMD}`<qYsZhz|YeCq6-@6JbG-YO`8F@?Q;t0S;OP5
ztgIlv?@xa?k0|9jY3UOC6yp?|nSEn+3JFr@@Sy2c8U3j`%;<$I+CffG7|!J0F2SG@
zTqE=itG7=Dgx-K+m?0*myzxt~SF2~9qnf!foNSQ0imzx|V=GAC4o{PJnj)>bKCe5H
ziU;bTa;x`D&1UE(E#Y&WI!_nnNzFdvegEHchwa+QelQ90Y~5_eZO)b{yAjtz=^2jz
z9T0N6<?Dl#F{6Tn2YB$pH&HSuavG&uvpULvruA=I#9e9%;(1+|Dd$q{OReR+&eGtb
z*UE3Pt#b$xw$8c&h|1H-)s$^EWH)r{_wj&T=!PF9cTTL}Cj|ulv-?#DpA9$q%7?Z&
zDX`Rp8f`S?etP|(6GYFf1A}h7wKr@((ff@XtR&O{9V_}@Wy}nG?31QkjX}mV(Vase
z@dDu4H_{YDg>W05MxPg3UX#kj$*#w1$*(CJFl@b;;lnh`Fl(5F)q>8z+lM8Jxp9Xz
zcZtsHXkc5B+4*8!rd#f|wMyRZ9;v8DN{DC@8W$1_rpdF*6tk)Jez|0MvP64gz00+I
z5LJLpDPV92*wW!*;+Yl`oxB|6M5^-E1D;?1Nnt8R@?H&+06*+Cuo}gF*u|Y*y`&((
zUAVtJ?uRccR%l8oD=Vk^Z2Qmrl*J(+m;jQoYP@R~IQW2t4}dwgFX{p14v;0p-)^@@
zeE>+A%<shqL_dJhisp@rSW9ZRG~wQnKrcw(s_=s;`BZ~=hKl@A_JRDykpTSBM`dBh
zmr0nJ!;_HQiLnp*2jf;M4T!Acr-pfNP#98m*^W9bcd$7Iq(DBt-|ytdP5v~#d2w+M
zSu$WSemS2Vhm}%_-apCZi!@|RpI=Mxw7^jzDvLD_b#)rU`Q=PNrBi5>dN|OQUs1*w
zt_`eQ5{)}4WA@gNg_a63A71a*E3fntZrr0m=;qKLyLKF+g_A8oP)Kk(R;TRb50sbW
zkE(_E1be<7KkEzGvyi+0diu&}{6!F3!4s^&C&@T_*2j(X#|y_)k-36^IYi{9>qIy|
zwa*T&g)+@$c+X&q=Q{4g3b%ElpFwxyUqy|x&tn_7(_%Gsbl>3;0O0bGVHuBfK&^5?
zH~15HPAOJFO<dz8)tFySR;Hd8oZv6_<M@(eu(%V)O%mAWaB&N|Ui~!tm}^zK0aWHx
zNjMwDIsz>+?FizfQfXqKFKxN7hnCW6!kfWfpNUfx`SGEcdcif?4L$XM>Kd&Llx~jg
z=P<l&4tU{=CW<<>PCc6tK$>Y1H;@MH_BX9CPfwoRpb*_RXp(2=Nc~h2H!xtO^=gMp
znC)0k&S3#N7}>Mu8X6oVWMYc>C@7T`f5in-hd^NzSYybs;Mr1BZ2^cJWNhyN;y=!x
zA4x$iFJ8dK$+G2ZppueOjqR?)hkGuN@1HxV@_4=jWkFQ3Qik!^>J2_cOzrrILoMO;
zcAaTAM3z|V`T~SihJb6R`ATDz%EICG<3hU!H^2#6Kn;PYsHgxSl<j`H6O-e~0-Wuu
zg+iJp<<Z=?b=ijK1Q6CoK&_Ie5jU6N0qDfoJ>O{5O1hIDdL;V;*v5kFdp~VBv{2IV
zb`)iWYiEt=l<=x*Aiq!Q@~nMhm&&z3E;ZHrE0QkTfm1)2*e!H&@${Lm4HB1%vN)4t
zjnXdmUIg(YFXC<|v`w9Ac>GCZK@v;&ap&{~pwPUQ0<Y7}uOfV13c;U0KcP};S>(To
z=Su&e@Nf<FYFHC;_adAcK2Vzy&x{G9h5dvc`>cxLpJ#}g{~k0NJck(A0VUID;|3NU
zn(BTxA@PzYJy-VCxLP^N$I6QCxZas)bhz@tQv-=x3S;Z^GU-zOVIaIQrF9-!qW*H+
zk{{RySvup9ODYE}1+)8Exgj>4KjZXh=8Q_MJ@JIdq*g8l`ZJs;w?|~7%8Qw~OvuPk
z=uLgn9n%geIma|XLp=(`XmMT#w|+WdxMcW>7+AJS%}BD*cX~Is?x$mDt~afEKX7;`
ziElvydw}VMb~rsPj&0fX+h;vpW}Mqh<yw@YJz(Oeg8<JmyR0=B&3uC<=-LH=cHlD7
zYapYgF}D`e5&R3qt23KH?6GNeltkzEw<mRV&>$S9ur8vB>96n2@b3PLtvAw)T?_h5
z=i{Ag6@DF88i!VZK||(^T%{$xKSc_S%RViskq4-!B)nub>znm#X$L4!;yczfJY55k
zB{8bfc4V?SeOo;j^n$C=uAx8?_U?9?lRe<PZ=?7bBQu_<4iVgANf2as@^S@u5m+CW
zHGox`Op}X5;4NCJ)hvzz47d~kk@G)vAr+RE?f{AG4-=ga>nQC`5X+9&yZ$=>-lkEm
z^nQ4724>brKt3>eRuxIWpYN#g^&c(3H~^nE0-gT>Ao123+bS!gfYFc+Oqm&}md!B6
zUSv0+fBt0Gw%M%%FuA(t<1WwbjQj@#3zEj;Qhn*5(r8Vjqf^FhCoBvJ@PuUK<N*M2
z{^YCi4p?%ybw#YkP;-OQpsCztUN1ha#+xg%Bq03I6PFAkY0dsLAB_Vw<{FRbr9TgH
zXNi1cPHtwvS0G||wSa`Lft&KT^J06{Gvq{1T%Rv0I7e&0MO`w0B64?PSB}?_C~__1
zH|J<)%^OvMmCwvG0WrxNTzfjyF{U|Xn55CrtH~dv!xA{TR;o}uCl>kh6ovP$d^+#v
zv3=#;5Q91JWKwpO-^3T2stFy6zPH@K&akNDYfeZSzkI1*m<cpTeZTN{1pyiR6ug1}
zJdnu|cxoiSHz4>*wJx5slc%0$mawf(;J^53#>@N6h(b^GIUbZ)>xTnq+HuX`{Y=G}
z`pdYj=auxaVR(<(SB;(4#*pqK2t!)UZ^IzWFlYs_M?=J3UWhV$>fWquzDG>-Z`8p5
zr<uiI3{VmkC40kDq?g|Gb{gT7Z>PaVU$+i0VGKk)Aja6H3}vj;bNg5blMkmuIPda9
zeU&n^HtuDWiOnt(XT6A6De{@^eAfbhm#^O&@oPavo^-kV?L_BKL78(rGwF%EzMqJ0
z>tK&)5!4de#Y3~n$&Wd@ih9<>kZHWCR3*;pkGvNGxsGD-{A(&L<Tr(s7Wd1?p)}`B
zG^{=X?^ye(R##%(Uas;;_O4Gg7_lVxrgxEm9*UItdM;vL4IPw$=TZN?OBf`pHmY<0
zcF1(GP^UPqq_|~nJ)j_(?+t5lIbHvKR4Sv6AYYrRz4~cmnu*CTY|^OYUhTHQ*qwVm
zrfn7aoA$YC#zIT|?GR2u&4H>{%}UhxXq9HQBiE4TnZ8AwzwyfhI0$)ZEK`zudiFTI
zysE0IULyyfcwk#@ecVe8hlBly7d{-1fh7BfW0f~3_`^pn32=ZOKL_CO{y+15eSJV3
zDiui00=~*PflduO*N;Kr;Rb_IL?1PQ53vC}0bg2fE-45wdv|?SQt!gGYMoRTZvyI8
zPj{!w9EtmVs1m{^fY;=QUHOOK@<%cRo6DXu)BOr2K*xI7q9Hmtd2!DZ7#SAod26~&
z3<{p_*Zl82fRoWEN#xs&FJ1?kf4Whh=f3bAent;&hf6}KgWBRQ-)NRq{^h|0!A_To
z-a&~{v09P^Rj#I%*xj4O$;Fa*Oa8Hzj&BB7y$4qB78c5qb#DHVuk$@d_4xy4YQPdY
zw=cTrr+5f0G95!G_{yKJ17Nmm(w(k>;GtBosXfvs<2-ARIk6AivB%&<Jp=9dbT9Cv
zeHbP5NwzvSA50b-YNfUbb1gdxJ5j8#?Bh<`e(pn7=8`h*-6lJuONd{>y3*M9sm6Uz
zIG5D=O}fjCb2QQJpU_nDa#c<atR~TLdDGAB2s$Xrr9gP}Mf6vNU8+<xc-118D?tnT
z4==mdN*+5JzwI)-k`N>9o-|KJaeW%klp)d86Pq)gbxq8*;-@}*xq=Dev9(LQAQVP$
z@t{a@x}&K%q?EGU@po)-0*sh<O(JN-CrZ|>pv<T)saQENQ#SLp-?#ZLFEo=<n{yT?
zHt`#@>?oM$os2q6RCg7N&t>*V@+>yrQL{SyN7@fOkh}#{Orj2m#7U(p8{SE(h9uS9
zdFCY0{GOfgTyNS}x;GvmMqg9tQ+yUhmR3W9mg~cPJMgj=wtIL8eRi;&qS%Ov95UH~
zjJTVgl%(NGQWyTO_~c6I!JQ)6Lv*i=-b4E>YFKUUem1`J<Y{@XG%ML$A0L>|F_FNY
zp3$oCd(*@xwIs-a`$V10t*+gbo{To>6$7ZGGLC$GhU@&_cJjmZv+Ld2*v2)NDxyX$
z!-7IR>we`o?6Hso_>sDH=yDfpi0TAxmwH`nQ(cYX?3<OshtK1QqlPtg+@;L|)3ZG!
zBqWd1nQwp~kf#eM0Z`G>?zXJI(tWt|0uBY?r>3T+NdR|Uu-|brWtVB-2_yiSjQ<h>
z0reOlg^&Su)qtwU2q;5^+xaLeD4<|q^aK8w!C~;PNEaP|tG}#*!t9CrX%mn%OlC2W
ztJz2R@ap}rgg$FIZ%4e(WYmKK&Km$!2wwqDxmXku5o^F9@dAkM>s{UeKe!9PU;P3g
z&@<i6nkaVmXELljj@2xn@&SuqU4SFs_jviKVY>&cZCd&RIhq9X%G=G)uDKKk3_n>u
zeQ}blTwmu{mo3>v$%fv>O8uP|2?QXZ(pHs=Z+uRmbQJ<=ypV3@6dk?($(8W<qE}Hh
z{K>eUuu8nb_0Mnt>A^M5_MQ@{*VL?<oBqWorM5MWkC=rgzn`>ZmH1^6riG3?kfNiF
z%PY>*Ytk^&Wc6t@g^Fi|I^)N#l+uJ&s=Wan4@>U%|6&czeSO2I3Kf*A=huV-j{DK`
z;#pWX97FfDm*3v?p&}%^b&LCtmn7Goec0~rn_w^etvT7I8SuW3znL0^7%qhQzv$cI
zcHk?z40<tRvWw@gXDsc}?<S@|vk;(~L0lx@^)ym{Q5y-~=SI7bJBie;Jnb$+$G<_B
z+ql!4@j<}|rCkfVwBc(ayN$$U#W4T=EZf8$sc!Tti1pS?iCVm}(;+WMfH4qB!r7jF
zsql|>SBnU!VRRp6c6XE2h&^@pm}Oh9+`>n+jK3NE{jpZOyW<KSTV)JdG)Iq$k`jdT
zlfTLRt3q94Hz*B!m<%D&?_<j(zqnxj%|F;M8d#P_8?@>@q33Zi_5(Tk5oJg`eiXcT
zU#*OgMQ0gbShasYa{iJVEpnvhrVqysD(k>zC<Iz)S$W00H>9BOmT!!CAEa;jnu_mP
zOEF+GXdW5qzjEEjBl!g?Ei|Y%B)00Z-cBwP*de7^QLDG~66d4a@1(1*Sji;xp0|G<
zm-OYf;8c!jI$3oXhF-siLHGqO)Xe*#m>4ZY9Xt_M(Rmx$VbGi3JGEM6Mb&+UTlQFB
z`lIvkvWTE#qh<WiYQqw_uejqlJ$(L817T)#OFz1Pii>(L5B(PAmy8DO<$ytRfHn|x
zDQ@oGH&lNbSUDl~sR)ee=T`47<j8{!ycrfI0fxZI#jh#iughyi{ejoKm~Vp+F;=xG
zt0iS3<k=QgN$u1Xk^hyZ4+`g2%G0G)o;7#*I>WNbDVvh(>;XlJc2)4=98@da+F9|p
zGcfgS=<Ne@=$;85Ub+_@#i{<KAX9j9Ddm8ID8AP8)z(M!bL-9X&wZ7_hYQ>q!D~Xb
zV**Po;A)bzXRMJ|-EQf>!ZQq#E2GHdQL)$|D}~KI{e$$l(*pbgT=xUsL4MFE$0n&$
zDl`do@I#V39<@o5RCR)P1*)Ga*gz;%ggI~6>(qhI0+(9Vt9x=;m?P=W%C)hGwmkUI
z^G25oW1?j~cwK3U0Ud<bCF-_7N>=YH-8&bsh3{DPR`q1DhTser6nQb0<>qe6Px}Hd
zWYTG(e#r6OBvc7OSa3t~=0sYV|FlV>ln$V1hZ0E#)v-2Vhc95G_IU6Ev9#5)X;{p6
z*2Cz-RkMo3i67GAw`VEJXhoLnUE3g}>cE;m!^dkb_5-OYhrgGmSmT1D+7?B|H2q@!
zK((kPh>VtB3DS2?(8<r!^f-KxuAXDy3~OTXD;(;F#!kgVmkW#7N+G=EyCp1)Z1L`Y
zS&HZi0{`@>*CSI#cm{u<&c#`wKc7?~PG%RKK4%s;k7{j2>SjmfS1_(C;l3Q7j>&dV
zYE%ucXCDJOKkjiti9Ki8vJuT|zIHsoLYi)4p>Eak>;kvAhp77{qpP&!F8bAy?~$x0
zDc+thlnMKx-spPEl;TYcx5$Z>&_aJj$%yaQe@8(y-3Kf*!69t$w^KiuJe%8>xdu|L
z6;k@y+LE4VI76~~(S`fffSX~H;;0w`b&KTUoW2)7LC7{{cWay(8$-CYM|uU?t+1>#
z_vxB#X(l~;R9><O#h<s&r%onl0MV7oxd5bs$|p<>5UQwfY^V!&J%z)!m|}>Zj;*R-
zW;I$vlNp?r6<BV(FZdIn&GmLAMQ{)CuQkUPtuyqKEz?OD)BJbX*z$eI>zTpw;-Cn*
zstESZB=C;G8sC$I=P>B!hLtfqDE=rwX3J^{t~J-8XVh97-|-lRH)zgK3){JKA-z)L
zN9<p(KwB28v3+WRKhAxA=qigxHiJ3M3Lu#0R@q6wTFB3fbq~iz6VIIx>%TSyHJeA$
z@4_FlKneubWAh2JbL0Jj%l6F3hBjym@&HbAA%C3le`qv!@sAHz!Xr0Ym4&^mu&SV8
zva#I$|5W!@L2<R+x+oeX1Pe}Z*Wm8%!Gc?GcXxMp4ess^2@Z|B6WlGhpUHpL|E+J|
zopZUXsOs*byQYm-#u$$TFEqO{QW)6|Ce8~|_)?LFMYbEGc)HPzvreE;PLMj7qheY;
z4l{h-<7?;ylh0pQtEvrl?g;k1<0VK|a7g1G7JISGA*Ur_P6d>=Z00M;K$gbq2;9lZ
zKMV<I%%pY6E7v|lQR&>VOmt%kK5mH?Xy?0e{SRz^=U5g}sn7$PZ`DH@_7;1n=J(VZ
zvNO*8hAs$X(*~=iH1zCEjYp?v-9;WW`ot`Wk6hq7^O2`{Agp`)`m%iZV%z<gy<+)<
z$bD7|V^og3N<GzVfa?;GiDFRkah>|7Mp{7m_3^#Pe^~-;G1T0Jn$R&^OP6-Y4)Fm6
zG~>)i6(6_gv_NHZa?g<Q!yPpF{m+RXf)xlZ($f2Gi(V0pUtl8vr?|6Ga*+OyO=aA@
zHbU=nU{*iPtS;MR<R6p@Kg3&-<4y?jThL(m+~>2SR?g6P8yI5_$w$L4CMjt{TvFJu
z>f1JP;UP^)p4WewhwXaW8sz6M{&CogIRpY}k$i&0A-(hT`$BJ?L@Ha%=uG*L``7I6
z-`ebCy0g+gqY9N9SC$vvHUORFF&+N@4vK0TV03yR@K@1sU(EN%aph+DR?&<|FE%Yn
z7P1Dp@sIpkpMb);U!TbT6Y9q~`e)Ib{Wk;6)YO=bk&+Kg{aG35ea07_>@RsI4WH^a
ziQeWgDFL14u(h48E$g?<*I`;Mo)l7^oz7c`pi?{>ANulJ_Kh8lW_VSQ^Gm$?{e@OM
zG0={_AXZ=TnjDlG$x1U^#agg)su=cvxfP{tWlLeT!Th6j$8+{_NSmHAZwy^wmlwP0
zZ=lC!!1VmZ_7MOO9$05+egB_r#11IGMyc&vQi{Q=M&3zjL7xl5Jdg6ri6O0)SlN=4
zmO)Zs80Zn}3qZlUGep+6B;DXkn6=Q%p74~VPK)?l5?MEo1kS!B(Roa<xKe(CkTl%$
zO9Bkn%;gno@F5zXU37uq>!?uA1!=s$^MLHP-gW*yFs9K+rxV&_1OnOL+_;>MQCW-5
zI^IemEQ!fAgU2MuPz(vSpq*P)$N!QJA8=aV>kTGdL($@$d{wmrPIos$3))$NlO?((
z;{T<2Lec9am!v%wcmz=${Fjy%_ci4IJ6-X=OYr~o6w&|Q5fS&m_5J$#Zk|Q$1)_hh
z&R%Dn=A@x7wf&gupDz}KaDD&orZArW<K&<3{ohWM{paHT|ImMTw@NGjT<AL2g0!W^
zQoQI+ODT6Rek+=rcb|OxpBVrpt_oNeF~9Z8sf^-CFAE=EW_?>$R}!3R;%UNYc8LA9
zz_1-IU9*-YZl)>(T2^GE#fcd6#--uA9~r3q@7<k749fbOq~c(RiL0YF;7>5FchrZ1
z&?sgrDS{wOA!oGIMr>|w!O`J?DT=;kTen^-^ZeV2bpB*3>ZDRR5IJ$)vnHN1VUWpL
z=E{eC;`Qxh&Bs$|ckTXd0Ul;qEkT7?dNSwJ$;>Qw4mg~|rl25vKBXvChHvl6991+y
zvv3^MQ6{@P(&eDF->`E6Cn&~VTJWe~G`T(SYaoFgGR#!dXGOC>!_+D(6*)~gxD9@x
zN9Np02rd_10wjDS_wl6T%z}kO$#pl~g(p$OS%}6$9>yJR&wXyjSFdxT@aL4xRcJwE
zsDJ+4eby)lJv~_Pn;8xIgwdSPU3t2|40`+*pKNr;<JLemfAk+lr#kNU*`6F-7@ka&
zuz9x-$qkF9?x%3@M?C#s_6C<O&uC;Oo%TgQYRpzxAEoHyI$O=Dhs*OU2{J*^oyS9M
zu9*YrQd(p%l*fzXJ|9`51fKh|ZN~LqWj;#MscMlJaWB+VcH!>R!1@#Z^XC`=>kOYF
z>p9J^$J2@c;@4*}ttdF~VcV2S|Mjfy(g63|m=bFb^XTe}X3WzZ_v6y6KwEdXoOxfK
z>lIrE5Jc48sg(rsO(81v2wfv{W4iK%?w-s7>z}rHK}-C@#AIE^MJ!L8HS1z#v)+oS
z-KVhS%n80?h&;UMa1W;$q=jps?XNpc_~n4n^|BROlH--uzi(V1eVAUc?Sm=ZsP_n}
zhfhxEGQV}+Uyr-L_aCp?Q4D2z?C&gT!N3VFr^GDINohZ_E28mdb4FP5u+ZSYXr^$4
zG&$y{U0vJPHLRJgs12mYuqkRRbyoc!hbaFB2p@pTYvW}j2=p7vbFaFL`Hcp1Z|iqu
z@Y)%rj7=b@>UDoNO``sM(bgRR=ygMJUwdB8-a1e=z;Qv@E8i5)p%e%9<GFP(5ncJ(
z;uL3~n%Ly@{LjFeXkJDBk3-qupq!x5SMO<1YZrdGua9n6QCx;$@i>;iMK@3w6XUOn
zp}veM$u%kkJxYbIyP~X^SS`0U*AC#lRtLdFTCYl(IbgK96zAY(gFJmFkQ~poVr&e5
zCz4buZMD1W5U}=)tsS@JFJ~eBkZ~=!ivF!z#s5zT0&jAefx}n%x^9W{{+^E?l#H+`
zIH{y?-Rqs|_`7^EUwWKvW$^nOd;IkQ4UdBh{AIR{ey|Z@Rnyu*otmnfpVL@lB0c_C
zR{4m;!!Y9Q+ul}sRwIn7{xgKm3uTzqS4*Sqc({YM`d9MF#V?#Q)<;<)Bd0F5(o#6X
zcm&G2!vooJe?r;tO$%;YA_n??oUVjIi2f@SVO2AMSX5PUAnE#cr*yq1hBHHJshuk3
zX+W0o)5D+D!->|1tM%A7zZ-3Zp!a5+W`V!=?<k8-TAGmOG#;&Pcib|;uD-{aoLEYw
z@v+U`%x-wrSQHBzSo5wvjav|Ro95&Js0B+|7Ka!f4ennD)-toxy@P|@&>w45_I89i
z8#0P+jmum3`D3hI)Y9Dy{Axip-u@0@8YkNJ*Qw2L!%0P49se@rUy3-W$pihdnEjfL
zJ|uw-H@0KvbCb0IGvz(t#ORDt4ruIkqGo5=QL{@ZN>@s=QesnkGB*<I-hO%C5-i4?
zQY1TruO?ri=+gZ&9Dz4l3dGOETAYnWi{ADWU%k<mYN9C4*5=!|-{6dQgr^goPJHNy
z+zI?%ET3PzuU+=zZ#QwY3{tLKpT&OO3t~iNgu5DrBgWJ2yc)wiUK$mwC5U-e(zxrQ
z-0*oeB4@{ooStE()!M^!QQjnSN&iVEs5ZX&Hr>Ym(%G7vE|yai4@p+2+%$aR=(R3j
z;PQm>zmu3CavQ0CR!>rXXTWaNN7|U@FO<7smr6?<eZe#~&QCo8**4+Rf+STpU+ao^
zf*!sh_4*)$elr-b(3GrAM73#EJ1J?6TNWUBOmrQ2H8ohNCDsBj8J#e?9vAcM6t8bH
zq-hQ4jT;!4h>^ztd-;PFZ!W#m{RGy?u34j{|9fKKQA*32ztO<L;}&OjL0<D?V|M3I
zW(6kqIVN=Yzt55$_dU#02L!NT_->}+S=5j7>YC!j-Y62!=&e3EOAQx@sI%8!-m~er
zRVLoIv&UOVZp{rv-=hJX-=sVWaRbj|u%z0pn9Jb;HCh>lC~`7J)FI0P%X9D%9H&(e
z@sWOnG19u9D<ktTO8WdM0biNipP6~b`zk&uZ&^eR*4eUx&Lz#pV6Cclf0aG(D9!gG
z{vWeIsz{Aeix5uf(i63V@ydnE>diyM=+YWFsJ1!CD8rtcx2w*OL^SHNtMO%L0co=H
zBEh%1R}5MuYog-;G(|<ng4I-iJ~C)LV9j|Y(z@&D8Iah$4b{NZWKWuQ-L@=>o=mvd
z+;uJEyM3Fm|G9)$HK5bl9xoOQ9(AupPo1GO)M@fTeS1a3F~h>llikl1xLg0aeJz&%
zIlZl^je5k5Ag9WVw$uT>{NC*u<IbJ|TwH7~O&-V5^^>eVJn8vb*5zFOnA%qg`Bn$k
zrWnHADNe&RtZ&=9u+D7kTg!i`PgUSMv2iRwl*@`zr+>8Sg0+U@&;K>_U?D>FiZ5iR
zzR=IlSkH|Xu+*J(MGjtIULHwjTu{jVNh0`0SjvuBFNb@XgqLILVl>(#9Ch}s#HwKC
zl@#R3mTasra|sSNOnTxK^w7Q;jk(7lvr;OKb1dpd{I8iW5F`oA@zt{VD+8aurc9?6
zP^{~Un1Rtg?wqM|o#&3M;VvCT??+=MCwO?Q<g?i8?S3h$I%$h(h{laqtcg7Dm^T4_
znz_%m+0B{4inK0>Dquji^3x-@($DZZj5V?A7$yPnCCR<V|H2A69`0x9cBV#*%G`GY
z=vFQP?Mw`>I!D%VIpnHH{HQ08U4{9$)XU_p5>eHc%@qnyn-Y~O!3EtYv^%HlZ<<?O
zWS+c<dSE_T{zcvRP8;-^ccz)tf&{PbVcuMHWj4z<&g25xU!z(!yoBc#7L-&}f~QOi
zbi2K^n*a^ox9ElA-FZ!$)4Kn#m;s(ML&g{>Y2Qz{M=MnLpm%gv58%Jl3gFV)Eqwst
zc0={OWKHW_)oLAHzMpxNiAhOzW2_4dfG5CLhmn*dSZ9c&H|C@tNSIFPRBI}pd*;y{
z?Aa_RsgHkAl~ONeOeQVpNN<Ag3k>g{!MN8e`*e{m_m+0nvz+_=v#9DVTc&^J{HO`K
zO&CdNxw{jm*Y(TW)3|*DfMm~9J6-jW9yT2<oi0q0oiAS&>%vCd|LnohdeEn^F!pNS
zr{p1XXbAnDC)Q#O7*enEl@U&C&JaYpb!o2I&I!?@?2t67iRJWSOEY$QVlR<GzN?b8
z+}Q2W^s&>di@P3X9>z9#4(^=7cxBc0v)fQfNXcWRp{$sUXY^sL4=!uxD=Y##n}&dQ
zZn>bsY%aZ48!$>>?)Se%CGZZY;!&w^A}rpmxu23jI9;ZOx~<yn|75*G88a8HJv6;!
zJR5gh*Olq7s$0C{2w)s#YjtThHK|-VV$i!CY_0FLlKE1j+2*Tt;1s(|K0=>!UfxdV
zA8uJj;&o%e4!a^@C|r2M?apB~-^>}LrOJ-)NJm{t$85q<Sz&<oOmbrEqr*3TFDJRR
zd}da=zo#9=`#St{wOL3|%jS*30_j73W)M6G5|ql`HiJei7%Ihgm~)YJx){?OO>z3i
zk?A~5Zr~3hWFy74GH`s4k3ofUJakIy%2>fBOkQH-d)~bF-1zZSK6k7-UvKp@+Y*Cj
zBDIb@`1{X)-J0W9>0sDn?(3g!Y^vShe>q22bgan~P}-TjP8~Fib4Wpd>F<Uf879`z
zR1Xy=VrvXqE#Chjd-ghm(xw6!x&c0^L1V@wXz<<OIDqi%JM|KfXFizis;G#3Cxs3k
zIzqfVTh?1HF}-f{zve5g092k@*Gqlp(`9I1Qb|KY!y5bD5jnwW0OT5(n0UbGM<g5!
z_>xSOJke{l2xw^Fm^ZfVW-h;*F}+iuw`2GR_OEhh4)wb*@MA*WS<=P)FJWUD>{o!x
z6N9m)FTiQ}fP_B^K(z<TtD6!X_YRA*_W;xL0su+d(!#;S#I)FAXE1AE0}xU%bzJ@G
z>TCt$USFQB2mmTKt#(_3rr?A;W#32=je4s++1%VF?itYEeCH&&0&u8@ahI+cBLY=_
zV5|%$bMjnP1O@L!d*EEYx#2d&OT|07$M=`dHHv%$A+pravMuV*{uc|d+yyK-_}XGB
zJNvljMox#F6=!nYlKF0)UQ`b8)63O2*imc|yFSnS=8e5v4N>AQ8HL(PYxgFLDs8?6
zEH7m3k?a!#Doo(k4UTw`r&)c0Dd8M{13oBPe><#sdAsPC{SM=QaawS%jb=rcb6`r?
zK{O8|OXllP9`zgmt_(S<24j|c26dKC55J)(&&wAR2_DB@2jAsoW3X~2rh>~5gA=&d
zK`=38|0TnUGK`<ZCGZ}Z($lMIw8iF5z!RofNijDEFG<u`!v!Ks=ftL(?9G<&+0Uj0
zCFvx8_4Ie+jF|97Kfuq9#Mk<itn~adKtPvdX=a)mit8&|X_c^84+C)k5nkSRoc!p(
zCW6=ue<*(eerD`mm^|~RpKaC2f;cD#&nZM6i=7~Be9g~vj1XDyeoWj!=V)6zUSXku
zsC}I;vh!zub0jomdX=UmoigwHqAUhA<T6J$r&dSD!4<(sl_ACBIh_clO0F)C`5ZPW
zKgQ-v*~&Id_~sI#-=ePGP>;?qNt`-L-d=P$z0&%ne~1Mp0=4A{*l?v>cCEu}o6Qws
z*9Kqq`J_Kq5RFqk$XOzdgR&2HFb_5Xd$p}<8u5qjOmgTD1lGTs<|G-WBpw?WQ{#=~
zzBC!|YlX@xT<!>C(+T!h4P`G;t!@W_W6O-SIOcpN^R+NRsmg)B@>){k`oJYDxO%=D
z*E|?vSQF3c4Eq=xLvo9%)$cZRsevK)i9lG2jnTNwzm#g^$I*!0$sHkuxkALsw2>2F
zT}&>U-Zx0#GiuemHUN;Z->vi>y5FAo0d_ONA4e`gdfHX`>D>z%fJNEgo~Yxy{sFH7
z9K``ID7TZ^!GpPSOrSeyYSwNtfe#Sg#2By)fkq$#Hd7z~#ZpjI>;d#HH~|y%_Q=6^
z)cJormjS4{^S)cD%naZ;Bd?(F`+Z<HuXqCnp~<xBJ;`*MZJv1JKY2kBU%Q5?eO@eZ
zaB$iz32FgsfKV<n|Kn7QM$^mlKEDazC>i;MGv01{;PPQVyWZ`}_}x|=a7BMtMJR6C
zcsX-<xQy|M)Tk^fDG4<KC2u-%xdGSlZZk5voz3s_li>9t<lQ|4KnlbGrhq80Q1^fb
zq3dQ4ri7f_UhN>hh1vTm=YH49bw`KN1sQLhJyy4pe_~qE$F>q2#bP{~KIs%I+hXrd
z5oiB|*Eim2Nx~P_pzyhEp_(m<d%z`MTOU4|agFP--(m6x_T0-JFrtc(7qbVCZMfkG
zbJ;bgpfg*z=$vuM37VB08GH>#^*4JYnxbVMJgP4r5!uPetb<>MP{$F<ot2cg#8lE!
z8%@EnF`0k)i(k1}Md9tkI8V>mGmz<2?o-2}sM@YJ^NwI+igK?Fh3(mj8ME^RLTmM(
zaC{N48sD6PVP@3n;7goAobO5x@i^yO&5u;$GO8MKJqcR;7Vtx#4Vzuutscibr>Y3?
z5eJKtvmI0+Q^+Tv{n<1m`o|G(7T{}IE%e)^1*=_LklPv6ATPR-u*zlqSJO&_Kh6v+
zvq?<bh1j7ipN`CLdZcjrEVl^bA7O?`tZuc{9#PNvL1<^}^?G@H=JL8u2}u<hd0EC_
zA!B}!yuD%tVtCczcgSQ<UbadYN+IHh-Nkp6MO;U(xxa3V$z>zre^bM*0Ymx48%{^`
z+d%KfA?&J8F=MeZ5AzGIUMZiSHZM~tuzOZKnB+2^IquqEr9J>tqr~l>3&O@|hWxiR
z&`$f^=Cy`qldfU{(qJ?ykF3V@`N&Bo`kG)(1*zxxm&~#2xU6dS-BA>0p94tk4_6qC
z#_F0Oh<&$58ksP6o7ffzHmbOi+4_<8XwWvbFx+|_URVKr9wQ)gz<l6c!2O-jxwB-F
zZ8Z!*3$QPaW=r4E^}s^X{LaEwYqrJ%TI*mufce3@ujxB5iwB4{Er&5yFj}3C)V)1p
zfIpbZWF#d`5C^6UC<Oq^lg-lmumCfj_&ElU#oAxKuvsn&czW{G8;_YUsmRO!T5WYO
zGc|p;ItR?&5WjFB-(0>c(WC*6;yTrs<#Qvzc5SJ_Oetls({>XAFo`ql3q^T9KG^+>
zH{e?yHf-2q-Taw=;5C?*jxGv7{wKdP)t<L7yTSedzpYTK^*{GI_xQr)7`n4_f!m#;
z>zxMFk`)2E8!&oc@V)OhQ}^Y8PXJyajm-)tgU!m^?cJLlAltsHfY55S)Q|}~0{4ig
z@dQ(wuNfrRQP>eU@Yl+cx-@9z+-Mw6zp9oHwkHK00V;NyDXsmQEO`uMEKv~zE6jwS
z@{s^Fph89Cvx;)+G{p}E;hYm#ilZJ)$ZY%`?N3Jyj=^h_j~J?1$%10(P&({qN-61~
zl(4GfNjP_vF(0?+>HE}4ex&o@l~-{elEaf8VJ9Y!U#DTy`N@GDx|0Yapw239mLF!a
z7H2xjx2yzA^JkTQmYqro4K>WYnah(s8CbS{D<LTn@K;&k+g7??K;`l4oDIq}C3o=W
zbf0{BR)^4KF~H)E$nLyG<<DS5^_rEI8qigUMuOWexjZeNxMkJL(i4pk`kofjCM~NP
zC3M+t>rLD%iy<Z!!4(YAA{9Ypqi8TDW8>l_dVSyJ#Q3B>9U95BSspvNZaKiS%U4IY
zJfEYN8SZBmFqvoDHX`2JZ@GsbHRDe3-2|;=UF67@Z6T<DjMa%yu_T>3TV^38!~fit
zsNWj<=7P2^*qM>JxGQt(?(}{O!B7rFs1aXpUldM_-5~ZEaz!U*_=61Oclqsn>XDZ!
zyd@Bk31|37pG1UU#f@89VNub$ttDUt-s(J)1(;dN0D%{RL}0x{4q|$EcsRIuOctOB
zEUtaKF6J-n>gpn6Se~8j1Ka~#{|-~E0G8x+bv+KUF<&^Rtv$YfM+XCxF&vsFy1Tl5
zR%$le1Kj>bP%4k}2^O%t>;MRG%f%YMbAUX}_yp(&Xt}I8Spo<ifFs)oGVDy05IY1^
zX*TLko4a4=a^*vTHLHVc60QcI|0`3Y`vCG9EkJyOal4#B0b(G~!1IOucB5NoDA~?d
zOA5G@>G#2|10Ep23H!<w_nJ3uT|H>%EDW8Wo1?UbGtO<fn~)UVT*{>K!bF5R!TS?>
zN}>i2N4RyJ_cw<~5jrasrc>L1k&^8FUcR-VoD&OPt>|%@mHyrsNrnxP{Sc?{gAI4Z
zha<TGL5h_y$?fuIg|H<0$ElC1EAN4ry|)vt503|4kWn{;zqW>?hpOR#aW$&Va?HW2
zdb#P^9cr&0<z$Spr*@nqS~tmp!?H%@uJBnvEQp9xP%Hnf4y^;mWsBoeuw<HAKK%li
zggX<yj+pUIdU#VHnLtTeTiXXh@Jf{#GcCz5jae_Eix>fEy<!i;ju0e;b<!W^cZDJu
z7};Eu&L83w9a7O)<w@@DyU>Td=S>^058XOp1!0x2C9*qXuT7u0qRR23XS&Q;!UTlz
zKFLmUoofN=Rfy2&8nW3{Zv(3BRPGz1@<zla;f)b#@Jv5Dfu+OWd)0Bk=j&riSiD$-
z3WSdr-v0PZ`cWYZ;hV4@lx&LbD0Iwu?=JMsh2!W#1ozx70P*A58_QTQ^U%`zk}F?A
z#sIS^)2GLP#d`b~NJ9DEm5l)A5E(mr3P7NCPFGoPvdPijm`q>7wgBA7o2=KQtgNis
zeMtiArC4!fn;&a5D!*{r7wmSl(P)SP=9V7dT+pshx2GV$dfE+O)p6QwK?T4d?Y%o+
zfU&R5x={=7uyS~DE$1Co4#W@;IAtjwJadVA`>00TBl2Rrd0T2~<$&~W*Qiu0OrwFx
zsT-oyhJE7;+Z;*>Bx;^vkR<iv%?y~+Jp@yrVVT7-82a)~w2eI(iv}GWB^1P*!o2q9
zK$`RL$S2q(;b9MFPa?RrPx9WA(dERx&))~!CqZFK@T5p5%69CwEunbgMhEQ3vRNMo
zz1i3G5-n3DFw&vEKf%{jYWFWUf2?WuQkbCY=rXY>aZqX)9w=jF`|HAW>6m0j;_0E>
zPSuo>PpX-g$OxN++UL^oC5C;G>T`wWg0hwkOBm{;Ku%GW5`mPLuiod9s?#5Ac9Xqb
zy*lz4wTN@hh^v%*ne8X8!tT=)QGJuDL<cwW-m4*BIK#f$Z79B?5SI7`#*m!v);f<-
zTWU(#y7gB;Z`#XW__+Inp)UJ9Ua(Hh*rjvC#;K4fMkRDK!`r$;46*WQabKzFo<Igb
zN7P0<9OM$7#S3N5<uDz{uR4FB|7IbW8*SpXJRGf!pUA=c;X8WBEf*F4+CzG*BZgQ}
zS!@hO&8^RAVs;$dBZz9rF#n>(|8_numHxY>1f0D01r~S%0=Ci7ydq=I?z=I$%h?wa
zu}IS;47d)_kK@=;n`E$jfPN1BQ#2HmO?T&uM)120Z=vG*-Vb0KrSW>Ov1LR#?2q@K
zSO8Aq2v`g`WST5_?S68;fc<6Dpx?L}HelD6MNCg04M-B@&KMA5m&eI~Bx>^{Vw)?0
zWDxB75zy!jE3B=JduQ+i1ULX*sp>Zak~MT5kXP3?7_^gcvqe<0!N`>pL=G$$ii^tt
zOM)*|D)JK{_--SRo2w(piBwl1paoGziEY^VS@{}#d=4~ZdkB37y?H#FV230@>3WB*
zXkTks;o_qJY73eMJo;Yl)NAXv_2GsSS0d~_{AL)IGG76NE9t|XIeNn!=d7ig+P|L3
z8~jWLK`P16rSZqnvh*tnbx}<F;V!(QN(E2oufJFo;fT%Ok%n4$@Ks4%<;$ILM;Y=Z
zN;@~?#9z=6Z!fh&nluKQ4HR4~r<-J=RSf6%#P-$&r74tW>88VHE&2aYMZq8BqZS)A
z9}`jWY?10cK1iSbFrDzI>i6x|?gYV9Jmy5%^&_F0aq&(`<Z`^QaT!i}$H_{EzfgD3
zstLV{4PW*3MOPu=!YmQw-*pHLinms~GwL3eQxy9ZEg%0$r4ciVevi(aCI;gY`scf*
z!ENOrJ4O1|x=$-rsFpNR8o0;PX;0fH^b2w+{A**kyj>S>G$&{(TB~uUq*t%2w@T+p
zVB>c#f2v?EHdoapNULS|^)VdIE$T(2b$PLoc>JHAHWXXfvls4~bVSXi5nE@zcB(Y|
z4Z;oKh7Fdf(1x}Q0+FM^U!K%YKwL};BQ;vaK!KL!TB9dTSpb8Zc97maps(W!lprMK
z<j#~2pMe=>sBRR<ax|NE_xI}oBj<I%Sl7Daa=`hNhKA-H!U^ON!tqsS?MXmn;m1%I
z`!XpS9s){MV56yw0q-nKV8hi-#LnJqg!35m5z}=yN!9nl9zu7(=XKvlQAugWx>FA2
z<eA)*Hs90>#e3TISv-#+4w#kZ@-kI>)FJFpJM%jpB0J(ZFwKMITX;K33v?9L^(0VH
z32BIhW$ulSr&?f2T2tX*N??ka*Ug1Cnxj7xaMrYcSzM1*z}SsnKlLs4pNNa8T#Rs3
zBIit4sTwT(!>bl~g)pfaV`RxC9Eo$0Qa^F_>t)@1-Bjz>mPb!am|<>ieJm?kz_sQF
z-*Z^ej4Qh@<NK%HZOKh^#~yh-9P}Bb1|dIWD)RMhz}>h<sMK|RUoW+$+Z!=TFD4`|
z+uv0l+7Ot;wYwTGls9lEuHgpN8bRaBx(ji9$}wdrLRtIB(eT)h#R-fZFoQ~-El6!}
zc@z%Hi9I59DP9B|uo55Yh?92mjS7O&Jv=wyP51+rGl#Fk4iOk7pjMrmEU9y!J8vu(
z=#0*6Q36rA0`)jWkFV^Y=hV(7thMhRq$8D<j<;e#0sc6>lSBy9L&{!TJ=-pu&sPk=
zGK_;d4k~bemW>8+As#b^u&j(?F^=TE{O%f*9bTRrLi(zAc5yL{TsJ5;yh)hzUWs7b
z^DDtA<8b~Jebwu!GDkz}&2hDbX#vmI3~3UCK+U_CE5@>4e))xN@NSq0@w*xWLd;EJ
zdgR^26QxZ&10n$x{AX<-1*HRa<5vLQ_ML{*WXbPwODoXcNJ+A_wbinhZb8o}`v~L#
zfMxGRKc;St+tr>k`g}%4#sd(p?-?s#Cr$>`%LuU$>O3Fp*{oNGfcyXmL8}Gb2B`y8
z>4wN@Ak7%h;tr8Q=j|sUA_9js34(af)(fRm01_}fxlHQ290ByN5CZQZK-N-RQBlzq
zm^M8;t<O8I5WfPuT#SeyfHKl%<gs(-(iqC1gVso3)!Wn@cdy`($}B`XPX+65+b9tq
zhz3K+G?rD=K}~573FcL>jto<#qv*O;IAGhQsW~RgWa78(MfRu_<bWCYwOo7~A2u=Q
zqj<?^z1p+whsTICc8p^Qne&c;vNd|C*KT-Peg^Tc#Iu6GzW=ge=Y%6qA4*G@<AW5x
zwcn|Mfd)qQS7Q}vGc6hwkbp|{XtC#pT&4>Gv4L+OwG&tCRR|IG7#8tM33KX=LD10x
zJSG20j=n#Y)}i)v0-p!?G8f(6>`;g-YjV4N+>vUHc}#{EJ^wsa9${azBBg|=8QhsW
z`raXSfzG(Ab|%i%wdk`<eUN{a+(~~G8#kHE_Q0Ty3mg2>vHpe~^suRsVWQNfUw;1y
zUsVnsZ_;H7bqfAMd&nIfTyG$ZnyE5Ys<9;;Mya?_0xo>bmKA1Iz~B&0I(^XsJTHj{
zmD`F~oVXxAeah0I6?NL3Z6>wJmz_>^-}2uAdp)8%B4?j(@TKM}^YlWZnybCy-L|<^
zrD5c|Y=Z*ghE?ndG9p+Nl`TcMk~6F-TtQ(5DFKXTOIX_hu6ziX*UWO-Lp^2#5zH>M
zYT}53?MJNPBjHr9i$Z6xbiebVE4)`E1ybbBp4HW!LY`FQxNAD&LSE=_(|3XlMcb{9
z%Q+kuKZY5LX*h22IVL?X_`G0R%vb!lzu4>m5ap-0qD2;f5`nWj5On*1xMf7^y0L=;
zGyn4@5~u40A^*$aH*O;*z#cebYfHbX{fsLZ5f2`qk3$ZDK=~aVnE=WB3h*0lH9GS6
zPqbD<P6J!H89>kS_4Te>+)K;QW5w+Joi`3xkUt-(c5eZ8<Q5CBz;E1UfBh{?Ois23
z3<UG1Ocy)cn7y8wib9%s&*lipc^zl9Kyk9TcC6%#Kh8P}!fb4Tj#U-@;TG|oto<>L
z2?|jxE*c?t<0DRi&O(kN1(?)+(=Sf-BJ38i{OEK#3fK)SUhF#}>^Bn~LuD)*=6oye
zeg?UyDSa^{D7mK+?v0z=-Ed+{H(;YjH=o8&yj^rlaG$tL&12#VZcNMG93v*VkWq{C
zoe-wvv3>flR;2I>6c>q=6YxJt`P1FLw0WmFWv<W3vxw@fnAss1(A~vlgXr}^K8Qe3
zgd6cO)_O$m=;A;ywpv8dAUzSpT-`N3qDyPE>2<hCbs3CMbWIw;LaMauxb<DRAi6H+
zn4_|}>&p{8g<w)xq*~GrNmrkjv?4A8607r2c*11SCxcj=9i?4Y+_on}D8CDqi#Kg)
zKkiU!){NE}E}7l4DUR>3o+o{HP^Tq_GHr!pUc^9UQX3m<%Wzc=^A+pZ=AEfs2G?NL
z%iyH;Rl@^RbCI5I;t7T0I{-#e20@68Wz5*1n<rtUw<r&Z+~-0aiVClI#v=rVo;<An
zo_m}5CY^+lllBK(*`iOW{Q8v4$*~3Fv@8*PgPA*CzFFD4z6Dg-ivSXO2cSGbmllfh
z@^lB22g6-nUhX&VM7h!iWP3n4<z%ropw?jUFF+RJI=W<LV$y)fLIH%pfRz;zwweS7
zr8MwoLc&adiL09klN8|IxMY?sSDfG0mM&j%C3MKj%1URo><$z%--R9tyslB)+}ui(
zsjb^joA6DX+O+7;TKDnCv-u|g+tvr4xAV8g6@rHsZ93xmdJ-S1CKr%^%U6e@g0LPD
zpew1Wq)X>s3IFU}2EjD%ZM8l?J$LWvM2J0U@tF|u+MVuC;iDsKc&k;vQ`)amTQY@w
zxrzr0&)yru)fwMuN5q3Dskzp+O0?=cmoJpzsASR&y#0XpZ>11V9S-QkZ4?i^@*sS)
z6&bmDD?OUq<3>>G;8UisLi$&)8XeTKdwi|1`0zlEJZ~`cF1(H^Vb^ru1ks9nb%p&8
z=+HdfIbaCtwAz_tg{b6TspNvV;L2W4`ic*RQ`t&EH;^l(<-acDkY+K&(z#)bb|>D+
z^1EoXx&ng5h+2!@udq#B<o3xt{)hH!&F-c0{ldlYGXAzyz~27UVwf9Cpp7bF6{cQe
zp*b>C`e5p*Jh*!ymoT?8SGPD%x6B)Iio(CSE1C-$A!{D&`=2dGSIMNw2hibATdu5>
ztGU;KR!eI^c-PMoYR{^%5zfRblhSfCUk*HX?LmD;RGa|tPrI3waKYy(Rxy?L9QKp4
z$aL8GeQhnOy)Xoxzj~4QBUQ=^3knJf|Hx~`=`8`ZfmO4OseEzp`!kpB6|)VXtMsnC
z`A+%rdEO@Q=;$&oSe$&gss#LkfTBZ7OG^p(APrT#LV9}o$WR;!_i0Rr*kCl?HqZ*S
zYCd1AH4wP=wXN3a!~)1-KvhmW*=W8(eI|baP?-y_sGvPMK7N<!Y6W<f20vHki2bup
zSrnUQ#HRLR|C-W&wmx}80m7rlh|~-#kqdcPTkgS*U#hHj_fwOi7euqFA2_+pLP+zi
zH0zJhE8DGv&rk!n8Xwh%ys4=@Cl><EV#R>&vrW^_$S01;>lrW|&tiz7-nmmI%_F(*
z3V0AsJ}+bu)s!Y$fI8RU_xu4K2NZ`vu?fe7Sq&CAfgE^UuMniNC~(hLLJ6_!1u`w3
z4f{+v{>NJEQ+@#uc-bp?09Rw<w1>pNvAukBTBLKD7ku5<XP(Ek9C0%GKQ?4mM_6@X
zigemlJwWS1z>3%_5v7vthDL5m<pOV6aM%v|w1k#v#+^x5=lA}4Nc)F|?E#U>W!f78
zu>s{z_JdzB#}_s^4{~@)`gztz1fqbx%=egcO~IH1FN*Jx*fZW^!i7cfx}PKl{Icq0
zp^4ICBljV3o3wS%gg;#24@cK#2F&K$rRPi$Zsya=hX_4=g-Uz``%O-${(1W+X|B|K
z9t1@3^DI@7v(Hsh4yMni*owkzDgQ2<1{^rKr&6vN0r!<ikUZp^4nI8V%cDiCyQC=#
zYT!{xxkb*lqh~7@GJQ?K84sic=joPw5CXdqgCZJv+g7F=1;EE8B`5z2v;#|(oQ(qK
z%tg!R6t%P>X+fv+MXR+2FaqtYeBkSzSF$P<YQKRwYXk^K<a#N0VPVMkb@A`cFtD{M
zSnDDYj|OXyQH@a+$_Ch5qh?pCQl$CUL6|y!xHlf;ieFsT-6-0fPxXP2JN3`v%Z>pQ
zpyHF0F<4^}^y-&vIljCXiX-uP+gA3KfhOr)_nQ~si2JTktIR@0{P(F{qt6A+YA&Ug
zt~zmf9m9A6pAOhfS`(%Hl0^DB^BZ(W*6;;EwdSSh<M2JtuIsZdkvJLYXAbq~L624)
ze&|-dk%z!UpXdeU`{ek^PoGr%Z<k&4zJAOBlWNS!i-JyMmrS)Lm{%NAH^LfV$PZLh
z5t{P;*1{l_fV6+h6E5d}gh7*hELtFPUvxa&k}{bV@+0<FdvPqPOU~}k9A&_I{8!Ky
z6TO{-d1C6nX6)AAGCag<GJBdWIi9IyW(de!{E2Rtf8u=ZRh&b1RJiVfwApDwiN8Fd
z={tjNbQ|qII)?{$frDPTTDt!ESyw5e$-_mT|I@POigI0Y`Qn{7ICRFx8r232H2Lus
zN@{14J61+}*ATPtp5|aKx`2wqB}13{6lOatE_4ZZ=Cql*VH}fi?ZpDzL}q$;ep`}q
zIz!o7kxu|h;;VIYv%{~t;h1ARX!}g?@;NNlcIhJmzKQ4wlBcT6f?F8WPm!6KvbTwj
zkw^IJKu3M#&NA25flu?cnUjAh!eHN;CmiRBQu=9^7K(@(I-4Yhy*Y*CczW2214g=g
z5mV)Eur%-%<6|i`AnTHIA*>7H_+GXK^jG0A?`sjv;6_|J)|J^c8IA?0Rs=Wkk$w+x
z*!<n}?B?)9k|Fp^k4$PN*;tTA?7Q8G*Ya!??)Lq}_~|#;bu*K32|krv1h5~GMIOAR
z<KVH+Xd*-FI2RI&>1V3x^%C3b%g%B|gG*?<;Vrf`QdU-OKpR+Ub)Wzgd}5YV3KcH_
zIs122;~4L2lB1)e_gX40o2A}ZIxA2h@df6!>OwC;L2#heN65;W1n5Ko+qbq<uWT--
z@b}EYbc(p^ek&sR6fq<Km}N5n5t$@FgbK2jCRg`UT;#Ocp!ZIkc+0z-At2ED0QA_v
zH~`yP&~csZrVt=tw=rsNVd3_8SPUf9djRS8?Dip(*CQFov=voU1^^M+p|P>3CARlS
z7vZRmT1H~8v(qKH+}>x{W>ZK#s7vULQ09J`&|m6^tnP?y1MF2nHi?rSTN7$Lz1O?^
z2vkgbF(F*C)J3M~x)zxg+Sa`m>HrR7YF=OQ!qO`a-GgQH@d1J<?|XNpgVKFb)aT~W
z?IjpmU2E#T{)LIS)UoNox;z=kFd|Z=t{E{u&EDA%<DiYjK>rr@?g5RIXmvd5wGVIc
z6XDI&46DpBtCx9{fwni2Tr`|B(F%>jqQ9$C6jDq4-LawN(;SBIXN<LOS!3<KVW8V!
zY^^px?5nU+lVN#7u1yGyvy@XV6Fw^)47aGRWeScL7*wn)$W4_Qp^=$0UQ6fvJ|~+j
z7~nBrQ!j-X7I6myL!FTn5md%J4`-!^SE=rw($k-hhc-OGv9)eg*irxs`g=-qzjQmk
zpO1&zx?ftm%8GY9%#-<arPjqya07;ib9C_&kr<mPm7Cc<OY%TmhHPdx*ES|<Ut1Kb
z0d5lOmj_p<xh)32c{`id79rEQRIz?mfd9hJ0c(bwWHkTc4yg17we#*6gn$;`plU+2
zK5ebga<L$@z1ey587nG)U$0AYZfaA3L`UW#Vlf}~yhM~ah+Z%ohtRHt5(A)8wr17H
zzYYF1!ylIg*`?r2mgeLR#8iJk*tQ?WCQ3F(X)AJ-8hX;j6a?#rT&YL2#KXN|jjJ~V
zPy0|RzO9TCJAQ4pt8C@`L-&s7_Qt#+Yz(Tb_qXVG4L0UM?T!%9_H|TQnXI3J{pTkw
zO%=ZywWD;3bDK{FH7j%Q7*n2zW5k8{^|{o=%PDPF)So_bmwCHCk_p{N64Z@I%%2=?
z61)1|Hdz`+-|DZNliCC&cs62u0*Cl*s~a4~Df6fkV$og%vHydBRHFtFJt(ZIp`xN9
z`Og-f%PLCa7*OKYgooz^8hwDI2_TLoW@pC;`2pb%(9@<_(D9%<UZ}Q7bJy0<83%HD
zxF8{5SoUQKonAcv=?wxtPwM_pkLdkupP#wHA^L0?;ET;R1n=4>Koie$xL~#B2c)E=
zB%Otu4&B-?r7SRT-gnJiPiNcjCG_T1`;UOI166pQEuh;qfFQ>Aay{b-48C{5J?}`D
zR*fDwAgT_b_xk)Wn$6#>lmQ@L>H(@e;qy~@)q{BR(;Kx&O}!{d(r)7cfZVw`=u{K=
zk!Qm*_laDwk!*2A#rf1<@$|RXHP-HTK8QK+vWff4xxj7|7CDb#K_$mouRYNz>?t43
zuKoxmMm^u;;JXvF4vU@(U-Zg?0n^5u72UquyUYSU*V^#nt?eN`Oz;o!KhW4+HsKHF
zV8*~m$Qu~z6iBbP?bR#jb4_Cs{%(V(JBiT>4>FsDH_ou1l>Juc`J0-Ym(^~k>I!>3
zXVBI!HIOJnJH@vM-?Ol3AN|CEUSuZicBn$pE{)f+y6gI4j!>F5n$8h+t?YEXVUrXs
zSo)#tNC~?n+4n<1pC49KO1jx3jFr!K3qLe%uGWcy_^$52cu4UQj>AHAlO8TBHuY_~
z)AJN+5_P1nV%;9>xBLY8tJ}h?zeFA*VTN!Pa&u5ompALxVmXkX&b(UqF8>HdOl6AR
z_Fby=&;It2jsNhX_xXCkUbb4gAz~(nJ)NPnfwa%GIFqWlLoCY7k(@K%z|)VWbh5Ze
zza#-`w)<31rQ{Z5`XPfcrrz;QdN27(TEK$gN1IAaTkpfoP8ZoH2@03E4<3ish<KI1
zwo@jHG=_eri3W=jaMNfaFU!YH;zXtBY=1AbxMOd7+E(si@W$_!aoDC@8zWMMUdMs+
zDnmbRku+3W9>ou}>nx?KSG%2-$SmwX;nwooS7u$#7&Da35Gk5l^v*0tFdTm)+w10_
zT-yG#Xh+~J`t7LY%fPqMU#rgM?FIdhznCIbU~YMzY`wO~d@;6dX$<JsO}Hw1JqAK5
zSbtz=`iXiPc=uyR4W!PLku0lR%E`#=04=?J0J>4!(16yxF#$;GLt@aX{Q~yen#-(g
zY)saxE%Y%P#ow}%-`;?RG$9#T&?j8BcglL+%pqV=&L^ErYo`9O0q7_KKr8?}BBZZ>
z7a;a_Yx5@N<>l4B-ER-bjx*V9^~ueF-hlQz1JDM<pwn<19t2Fz1A2Od-nE)*0okf|
zw*a7$3yA!x&u#+_i|Ox5{_pMr;`nX_D-&>k+K#IX07an`OJ>)LKgn+t<m5VydiyQ_
zq|9~`#B))wFev-sWtWd&GdtCO;P>*|Nbk${a=6vRG3FSPAF+lI>(mfYjLShd7hLd^
z_ce&yeXTNhWzqPZDGdl6>Jd_xiuXtM;c`t&#ccYGh(9LB#1v;f<d!yFt*CzFuoEW#
zh(GqbU=L*EJpXi&S+=cJAX($zb0c~mr9PUKxomxT#+UAg<+K)BG2cF{dr8K`P4rB6
z6ueN&Nydc9D=C*woq1Yg23?+MZl6p`K0}D&xLdpBA_t^`dY9MLv2_3DbW9G+g_DJk
z%5B!gDMMm&QbqXH&U+|Bd|5f}D`$#h><OhFXueS7H*f_%9-uz)am|XK(I!}_2Mv^I
zGdkaXGue!9P5u1Dcv7YHLCW1R3|z|FB%j!cn%SfQPOofwI89|7S5r#urERo=U|%l^
z{Zh$4@q9yEG)m@=D;`T$|96d0&tAevuNXa#70H@ygOPAG&H1ja5_b{$iPuGU(X!`Z
zd>CgJFfu5qOhkh+_}v6lcp7@XAThcD%|`{7mQW!RSE3gIiKx<b+2LFGP@vioO~m`%
zaN%Dd`HL|M0EoVuVET)zAKAmF6ZfGc;pi2Eso2S2Z2aEO@cC(ip?T~Cf%VT)lK2l%
zekS8@gVWWx!fMk@3@D*WKd{XE!Q^#A(CAvQihCG{QbZN`n#K_S&P$r&V_|jbk_q&$
z3x#)7W{lh=%dhMH*vwbwY2)ZzNz0-WB^`GyDdQAZzS*QzC8Pfp7H&U2Fm!T~z=Ley
zgPtS5Z>QLh-?0{n{!M*<&~^C*4S_!<f*%{ppSAG{4e8&I<o}lo@)f}2f!v)v@}W+~
zVJ@@5IcoyuXpltJbxV;nED#+?MD4beO>LN_tld`SV}k=aa{*Xfsj<w9yREe>pBBI}
zp9`6|5{VL83-da0J%=i5!j80~2G(Bn3T8VINwV+TKXe;jw>}lT9hel1jNRis?D!-i
zBmP>?G02gNFc-*}JxM3L?LwYJFqAK&a>p>C3@u(9DC_i-=8wq<cxH`GZExJhs5{|v
zI9_J6b7vy(h|iq;DJ!KahjF=F9+Xo0xBY*)M6f+c|9LFv=7fLf2Cvd^I&*vdFFR)s
z^wDwU{ari3{n@ib{T9^&_a}%zV}z0tyt1s8WqWF7DA7l3D{}2M+0Nj4zXBCFg+zyB
z9_8tdW*wQp%y|^JNO=06$UkeKJ$e!guCgB3$n~RD;&{09k>%i~ORDH=U%y20WqN2q
zp~@D-Fy)1{A@~Lkltlb&r2cov(7m4<Iw27@#pRGe+5s8##w!FoKd#8QjK%TnjXCB+
z24XHwitxNVonis3ysCBxPDy!6?ttuXTpG$~Y}Jzh?-Q?4oJyNdnPp?6UWd)2pN12=
zub8UOZ*%chiVk8=a{n}XN_9|c&EONc#KYaBL~4GB*}KfP_>#47!qbvW21CMM!m!aD
z>1NP~IiVoL%@S~uJN2a8L6&&#fO1f1JQR3ht(=`QiB)MRtHeOgzP;WM`Q4*!2&Yqm
z+nPwaI2DT-n!&se&G}zZ^hyQb=WUIG(2mA4e{~Lq<*_r^bl^+E{qd{@dnz`sD=u-z
zkWd;|q<|cP>cFSRZo^&v;^m<Y9i7hy1_9GtPa-QmZc7(GB5@nz5a*K}vfa6$zM0YA
zv4pLqk3M~yUbc8QZVm{W#q*lMk^oy_MGO&s7D&)1tKwLiSvSPZ5>v5w`ndk(%Z5&h
z3mO4gPH`SV+TM10oC2nz1S13jX1#Gz&F`3A8jo&i0zsemC9!A8D~$2rhBc;(vD)37
z|4jwZ<$tGbWX8(;aYV-&)Weha%`ZUqn8spRU)v!-$ia*A(<$w7a0d&<>UZW+JSvBS
z`l|g~%X$BDi<-i#z_eHMw~HgeSq?lpQ{7hoYtyHsBZaz|SqaFBQSAzoNeo=Kr;kf<
z=~8EOjN@bn{kw~UarxG(KW!K1G*zM<6XN)|J}&W>^tLl)m+3%CW~Z;G&n@p18b;y8
z(;+<lWOLuoLd3|~GKFSBRn&_8ixh)`mA>v2SB$Y)F;zjSo?o{iIy?`NNnvzNV&8HO
zgPq!CDOLXOv9TZl6oF{OZ@1On+%wE~>EY+wG+cbtek~&j8vO(%Ft=lBS#UT&bR#U|
zUY}B@H_xr+*dm5OCCzeL)V)H<g}{tWIyz=AU3!1bkU2z#?)HErb3x%WW-UP|n9%Vh
zpNkpK6OYfAtx|>kLQiPpks-Ao?>FjpHN>WynT*dLIf_D7vD}W^J~*vNNU2tf<PqfW
zt<SYbUz`^*tOL`&##ENeIoh@f>qd|$ugDmvzE@SxZJ=~<I%aV5fx=IOBn^p<(4wWQ
zu?FZ{{`1s&__6=<%tT0*CZ3Z|h9+)y(K_1Fh1o|>Gn3S}PujZWW2s%B{Wy3r47yWi
zx+kcZ#;(Pr4rYGD_}~4JJ9+gsl);u*qY{{Aw%p@s2!?k@>cHYapc=Y?$FZ_`7RPIm
z!|T;V0^eV$uw_@mG%=hvxT<;8B<Dmjj5o)Z&>ptv=-S6=SZGw=V3LzP>a(O3w_0>r
z#5No1nLH`F=jHYjVRjfqJkrrqxYggV9;8VdYNf%8&;T*+Yn*7hFzf`Y*FqNM67EKO
zDJu__6;~nO_lbhR2`>{CD9K65Ia!n4;P#|C8eGHw{pXs2Hyeb+i&R>SV>f&}CPy5Y
zON1V9ER-xTUWuxS%^xP*wF6Yp^Ap$-e=**+Lf8tE_afBd=S<2|<o9U@h|PJs(wJzW
zol_UzO=?ZUlu?GA#5SZ3-u~`%f(G4szz#9CC07YrFvOPOigdfeNm`NTzx5joxJ&rZ
za59fvzIKH+ZzwTVxWka?pzl2u8O%A6WPAr2o!0p^LcclaGOja}B~pif#zmWCWj$9y
zs$U3jJ(_A$Il10qC_2_1_qy`ahG~Bpp=9hc7g-xC{O~X44{{g%`*jh)o5ZB(X<`-<
zyUC(7h3Cy@ZzCA;vg}Q-1^Kr`MA$-;$*3RpMLGrJz-GebNocSvG(~|?LyaJAs>m1`
zX-wW$9$frdFJW;<cq<yrtN?0mIkIOVV&$T;(9qR;n~giMxTUM>kjpComdYDp<;=4~
z0*ic)$s#iMxJLviNZ(m5-ri3}r-D|)#CpDI3UTv<xUo8Q)RGZoNcDGMV>I9e#Cd%(
z<bBbs)5jMm@&ti8x+wfk$RdW@Br9fs?&W_2w1=PlJvx!$#;hFFA68obLuKk>GuDOo
z>s_H*#03*3R}wogfPu$_RL+7I(5HUvYpNs##_y1<^?+$?j=`;QCj53m>uAKhNz}gL
zx9j*sbENTUp(Y<JxCd&MStwW*O4CAVD9j|Vs!EI6)7w03#uVuKnW^TF#$^mLuD+{n
z>}uK_$(S!`qgh!|h1^Sd|D4-<6e$9C+mksE4J{O>FOfz{t$SFIQ108_wPY%)fm0qb
z<x0pEX~`($6q(~<<+MSxV$?IuN2(#&(}jNJ@s`^2RlzB5BqqW`K#LG7=2e;T;T-+K
zy&q+|Scc3Y5!)+dpClWAM7Dyp!$c<R#c_|Jb{@HB;><L-?XXq$Hx&mqg4WcEHwH)T
z$lJcl@XpFOBtY1~N$B$en3sQoq6-=zgWWGv7UQ9L%16iN2*S6OmcRt;r8|R9{F#DS
z*HZN_mJBizqgS^ok6B1jsxjuZCi8=m6f#5j!;dE-DE)ZwV+t}bMTs)h;&=s3_5RWW
zGKVeia7EzZcH#bX+%`5rgyhi0h}mnjA8cpRjfa0#cGv?N%}PDUAOgO(#1jAh$A=G8
zzE?h~1}nz_T_2e6N6AcP$?DX=N$HZGFzT=OgxESQ23vQXR>kYpdA`ydDf{d;1PFPc
zmncg}`5KIaU9;vWghk5#>!MyC-buwb*K_(KLqZ9rghYr1Q0TL+9DDG{mJmQ=i?R7=
z8R`ygJWExcly0xxemKiCNBdchHTFk-1tmU#-{XgLR&u5a@==VB<l5}i_+H=bI;XCa
zF5MwxE&I6acuI%O4SNL~16Wh1nI#CUj~bGTW;bN!g^ih*vd}s;LytSc16>K_BFsv#
z>n(qbUenUPv}5~50^3sFh(cj;{f(}G&+O`<H`j$wpYCZmc5YA-!Ie|VzlCd0Y5(t|
zd40P*!-D=n7z%MfSs+gvkSY|Uy#6MZPK=>~fuhcUk&L0BF4F69b9Qk&@#_~s+O^k)
z&tJQZhFWda)~jC$vxSuQHu&pp9IS&J6DkT{VZx_oMOVIT)h!})dufI{(&gD6egOjm
z!>YA$3S;Aba9fHP4-Gx!8L`6KwE)j6b9rFq38vu!2fo5&mkb5GfMG!!QM|u0z`Mk;
Wa!uZ=bQ8RPloXW{sS?uj|9=2f-XKf>

literal 0
HcmV?d00001

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\234\200\347\273\210\346\210\220\347\273\251.png" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/assets/\346\234\200\347\273\210\346\210\220\347\273\251.png"
new file mode 100644
index 0000000000000000000000000000000000000000..9e5b9d8e09c07abae7e363df705b15bfde44c9da
GIT binary patch
literal 420137
zcmaHS1y~%-(k{V*L$JkNLXZW5YjC#!!JQ4dXmCq#f(2V3BoKnTyF0<%1HlOn!SxQ`
zIsbpoz4r<8%<lBgbWe9zRad|Dc5j%fvJ55~2^t(69HyMC<ZC!MWL-Eo_%jq_;0}|%
z!Yv#ehN6{(gsPl`1eL0j{aY(r2ppVjSc2A5ZS^07sd}%e5m6*x$!{s)e8PPtkHY%?
zgeFv87CsQyOkA}rBOI;RK(h3E0To+)J$(60DRa2#K{yft#DHZcftUI{+jZah#`VUK
z-#!1IVgAjb1+S-Yo==Tl3`%i7>5W_KH<f_bUX4@N&Sd+BEDHx`VcZ7l&HA#nh2s7c
zdA=pFW*Y9hEm-h*)!pp9Q8-MM4g~jJQsN6+Jo+)!9RXaU4pqWB!mD`kMQ`0^jj-ug
zg|q=!;`&9M$uAZPJAc3ME?Z&{fuSGK!u4)@H?!cuMN;%updlu!hG1nNDdVPV!<UJ6
zHhXxe2jUC|%plLT^8c3ialLXANcJ07q#RrO^D?CL1?)~73%@_B?uPi@Lg3ErN53Y<
zkO^jiCya&b4e`r?`HU}2%kqU-M_zKC=J=g+aNlaRXY5Nl2Uw&m{rf1Yp7qwTTf9d-
z()O?;vwt5=aE#DF%@fEc5lc$UVXPicjrC@(^(K9N#oJJ*(-TK4d`-*|fky_SZu)ey
z2}Zr@gnW_Xt<!=ne_QhL3u2T!<@gZp5%LK~PkZ47U1%Q8Q^HU@TI#9ye6LgSKqAI*
zx=CG;WR)>YrU_-*uUPQp#Njorv~_}6_*`3kwNky=U3V>ENg)3B;B||xkbI{gVcf&(
z+mJG2VH#9M6IB^HN=go*a8-Pw%#Y-Z;dH2uFDRztHlB9ocdxjPEo?pU5_%O+`h9JX
zDk>4wi5bP(8xpRb6%A7Old-DlyFxH~Q)-zYdSQt3q?QV&^gaC4J9Z><1V=qL3EoEa
zQ$z%jPahPi;IR+}euZY<QJ^+<e-J-pl%%1eBms>qF~cK$y6240bXuWSc5#6tM(~kO
zAUh;l5yHW8T5kGoB*(cXM&pi1d?YHg((&Rw$69;-?)~`<)pg|ZD+bf|Y8XSk-?4RZ
z142J^s+zKo67shCINA%r8{V3@w9VFT8j~&UqDG_c1wIWd2*he1{INna%MR}OfR#%r
z^eOG+;7@>_m=doL-1&!XRbq=3<klx{L4O<^MENR6&Fc_+nJd~!vTtipn!UskT=5yv
z<cY<Oa%M0n^GtG<c{0BboSt8*U49mcx8GMCKD?}3f*7I9A1=dYOl?J7Nd$wxc6}~G
zZf%b;j?`v$l?@AdvygE|z{{s++ak)ZYH{)T^2|ITIbLv5z=hxE&FOl0_9ingt=q?U
z!HQp!6NXn>!NH+jaVTUmW%^(A7vNNV{WMByra5kx2m)uXBJ~cR!imUSU0wNJcL$L_
z`31thkmtLItT5?N2`7QGz(T_>hbIVO1|wc<eCzm`gB|gKzx2aVEu7mg%o{Nr6a+OY
z3c3Kfk!zOPA&4<a64FM1sVdr=_f_BEZIS0+abFPcB4B<aCPT<+^>BEmk5sd~{Tp#u
z4BH)Dgc?O$+!6(0R7Md;p_9T<Y$g<p8z@OtCd(NcY)*qK`C&kukSeS9MJD|0#|f$y
zs;4n#y*ioXbzuuK^`EqQW$+~?)yOAMs>Ee7Y~urhGmf?-x$wSzB>av(3bEwZ4ceF9
z88zGTIC(MmK}5!j<9(iq;sCyL>kJVny!~}4hGOtwn|vu}AzDGl=U>jv5iZ6*XP$lu
zed$2B52xkuJoP>LVcrFqH<RH9t<}>D)EiD;%7eBRTuVd=f8n<9R@w+gbKLA_A5kHJ
z<beo*aSmV?+(q0lNrutSTTjkSIXP1J;%eh-V`@X(<7bq;Wq0XgWUuJFVzHS%b~3KV
zeU#ggYmk$XO_cqbd2Y^KjyfAerl?G<&?~jZdCYcfa!h}Wcg*;ap(}PpD(pK7)63`m
zO!G`i@#yi&eT?~plb^RGB6HrVZ>sm_xlF`O2u@5}TU)=iPPOK+Hm&loUbps|9LoKv
zb^Wcl$S!9{Q$<rN7p%UN=c{p>ePu%u`Bv&rPS(h=Rlg;7<!Yr@<v9rkrVM6ku&P<H
zbe42eH+>}N9oo;>p>&n+pt1g~>))WQpP%uSn3akdjGUw1W^ITK?tE_2#bC84dR`<{
zl%?BHIb<z7$zY{t_0$Tz3S@0RE>e8^=>x|Av3~ITviD_Q9b-rHBqqwHmVcY?#Bbje
zj%SZ&6cp{`r{oG6)Vx0bWcL)$OqauBK!3w&Lu7+{qvx44Eg!8UEi59J)X}O_uqnp#
z&XcAIw@KJj>r`aB;w0e2`9y1*cWa5e6>}HE6eFA1mXptFMEJEu=DBQJZ*0u9^dhGu
z<{pL_rUi-JOIz+;ZcZLFj#!c+UIDua=m6Il?+}lnnfFX>f1t^V4GoWunN#N)u|i^(
z$$EL84x_ekc5rrjHrjYmHZ(hJpl}d(;N!qq(ogQ3&lQ6&2d0y1lFqf0v?D5#AZj*_
z{jsJ$st3yTEj;Fqza~^^#!k`{AQ$9Jc}@xKi0|m`2=Z`|il*f8X!2;;hE?0nlg&p~
zgZ5Mfc^g<7<Qve>a+?~OW_&YzuAjF*A6AQ1)66Iq%oTJXLnE^$-xuU0rzHy~BPBD6
zs3!W#>AAi&p3x#u$G<KBb*tYi+#}o2Ut3ta>FXMj`t|y^69faQc$idYM$*&xuF8#|
z&d#Z6x!Zwsv%0saYcISZq+y_GsHwlBop#VHpvR>eOYn13yia%BFXCVP(-pIS0umdv
znx{odS{lh4og3XeM9FQ0-*CBJ&xDA)74c3h5+QG9S>jq^Z6^24_VsRg(em_e<4*jZ
z{7!|h`it$5-B37O4xCQwhxcy!;C<Wu(N`?&bft8qtg?d7sdH(z3`=z!-a0@XR6_N7
z2R>z_2aX($B&1J!NDDnF37*E=c*Y~RD_QI5uJ0~&5rEQwq9ZxZ_fCJNc5-!JQde4@
zhV=U@e6Y34#81net-R5eE*!J(W-rVrdIWlgJ_?ifgr<<yFz-XXU)>yE%%k$4ehh?3
zdYW3BDy=kRwq|O|k*~@`rwQMAmy4D2)o}#H$0n82X1R%v1Xke*sQ-K&u3n;4nabz0
zGAoB1=fp_<Sut%Otx+`c6D+=(dGDp@N29I-rOhmR*@I^eNIQtbL3i!k?QAj8mI+&O
z!7APP5jn{s$ptSP%5Td1%ImxI9fK|)u9jyTyKeky{DRI`PH|4>YcYSuj<Mt`>G@;v
z)FW~f)5h4U5(yOqvptmz1gm5S+2*Nb*6`LNF9i-|S~N{EF2o#LT5G<Q|GKU0w7Sl*
zQA$*_|7yZ(_9KEZs4Ylzu+swVZS?@>S8l(l+u1L(dYe?vW1E!&+-VICcAJf#JSP9-
z{c+y7H(b-G%-<UH&qPu}k0cz@yEqK_L}?w){SneN4e|ibH4+)I7%o(s55%4-G{aC7
zh7tr4U<tm4FSI@PWi~1M{dV<WWrfhX>tiM6P-Z&j;-V8hohEr3&E@=%f}WycqYhp5
z@{Pf{A2!3VIvCVY#vn&~T8%<&v3Ak)_Ty<9bNP$nH^>_Mv&OY1KZ4yzgGj;9<0Xxn
zSxe{N8_{~5dUm#$v${qQP0i}a+9Kom)xEHJmHW=Sytg53QDfvruJW#~8a;2)a*uNJ
zW+AmU9;~_rxqEY8>#tXaKZx>9@lQa1b}XpZgLXA{ea;l-z|&_z2R5_Ai$>=K>m{UE
z>{#&=w5Kn(y-3PQ1W4jY=dpvyZLN;&1P)VH?FCbmQkb!SQ|S68dActzH`4~mct*=d
zyHfTGos~kI`uA2>qmB5VIe*_XzxeSo$6tk9bdcX?En+x%qr%VvJJl@BhVQ_)xp}ha
zwDrKQh0@a5zM$ooWBSqe&93aTvb_-SrGbDe|KV{Sb0hO+iL@cK1?I5VeLAs2&Kt;^
zx;W5udHklWNTp`;C-nkTU4ci<Zt8MVzNqc}%=q%6<CgEQ+o`9H*mk1$Lf&_>cU#Vz
z?LCz}5EaGL2_YAcxdnm+m!sK0i?733!=f&r`@GZ5U=R=3_qPAOWGeMY_$wF!zBXj_
zDZ2OEB%S7EZ5nMlU)*TAJt<ymy1U-S-o?%lW$^JnwwkjT+dKwW8+~pL_G=nXoStS}
z@4vWhfhSU>fO5jQSK-0wa>8M_+hr=Ne#O2<|1L0oMthi6x{WbwwD2T1;QoZ;=i$MI
z0GwA1f-%zR?>sokak$WP#H}O_3712bX}^69LK-1b+pxRX)I_X1CAF_|Eq}b+(c|-r
zjoq)oPt*cPqXQ!Jf$-uZL|g8yk`mktAdLct_=E%w2}nHwJ|a&@|2HlD<T)I|KlkC`
z;6kk65dWP=8Mr<ipMcLpnSWgoVuRtH0>5y9kNY?H|H+N4`wiiL((q@%GdMAI2{}37
zs&3{4f!H})+P_1&*--*+pgPFvIK#ma(m#Bj$i1dJ1l~VwrJ?;!TS-yC%pS^SVs38=
zVRMH%JiG@^$Xx(PLLu)=sNA8pcFqFs!ZiQn5CGB-x4|@2|73Y*BTS>Mq)H`W?*yUZ
zVPj`wrx8J;qM{OVGJh-ZT2lJo#erYKG?wq)IS7EkZf<UDZZFyFoh-l{{QUf2c1|!S
zCo7PH)!D=DorycEoipvfUh+Thk%Ty#IaxWpv$D6NdU&sislCfPVH%o;ivIWbS34o@
zR{yEV&iUWn0y+qOcmn2NV+a55yMdxY4|fGrt=u8DI+9jUKxRN4BD~yuLjUCd|DOD(
z#{Vj*{hyK?Je)lLUG%>m{l7&uogq#V_E4bCcOw7M*S`z@_rre|6aqhV{l7%<FFF5n
z7Z9`vnh^MZl_r8_9PecX=#k7yQbhx}0#^2LJn;g)p8x9#q!F<0Nc%I$;NV`t$w`W7
zxIfu#(sCfTfNgsG!hAB~bs+URo|U*SUXfT+Q^ie7(`{hD^<28gu5(}7en3K0c~@w-
zTS$4gUu0L>_QXy_D~0o?f%1z4bvgP3GO3T5TTHRvw@`6|sg}=glot;;9Z}zdTIrt<
z-pMf>FQypzZk#6%2;B=Ay=l6kWW%nZycQH}+4QTKt=7=gl!^%n4mO=188IFl9IQ5O
zgSWA<si?0PAf=$FCgru$wy?-CsB`MFTWqvq@?Mcj!Uwe+HPzS80btVh_gGf7`48NJ
zqM}IlPRG-2&7JM-kF_Wg+EH>nF(E<dAZd!vpFi6g8@Co4wNUgJG<(k?KYc2bnRuKI
z^+5ck=%d<Vn-{RSxR@J#Y?_p+x|Y&7raW6d!qY1V<y=C}VGO}jO1RtH^R;ZX-ef7|
zFQBZdW*weumua@1bWxge+5Y50<~uHMcH4W^z~<Ro*y4n7Um<#wp5$N(2O_*${7AsO
z`{5}?sb;|&BwoI;=;5z|7GCH3RTUP)gxB0X6}ZyjD53BFdPYba&(hwfu+E%Vl<kz<
z|0FB$T(i0FIkd&sId;GiV6UyItked?RTvo=Y0}so8rkd5?mrWQQG2<n5LxccsEkqF
z4`W3yqeo>WCntx!Sy2dHey8Z;<FkfP7IJggR8j&SYjNGv4Q0jEJw9Q)Bc8(5Wf&?t
zmRSpe1@Tx6n(^=k!b*}RC-i#4e5O3MFn1R2@pf8#yZj3Nuv~C-utg9CeTwl%xs3|)
z?Jqa!K%TF6DG)yD#H+R%&)%{>F$sjJh~7>tdhW0ay6$S9$oZ<kzp+Y->OGM`lzjEq
z6Akg$R3QQV2`y_v1rDAma=ZL7P22dBCx~V&@CsgD4W}n3_)W^OeDL|^pCBnBo6p#c
znuR)Hbd^-a@7&yEzM$}t1@OYp)13``6`bA}PP2EfFy`ydK#mDIEIiLS7f<aw@l`Hd
zALVR1#rZBbnt_zQexzvWF%bmY7HDv_jdr29(L`ZIn4TZpCK2jDN{2)gGJc4=lNbG2
z=bR&aHe+_S5)Ph=A{T5Y#aNDQ_F3zF=5;!$1--w!bv;_?!r}TIz>;yg&JjB?OhX0#
zx3eO?wb5m%vP<wTV9+xisl)cz+@!vg-+5uM2eUGin8@rIKaR}R^|cIF0$nv-0LwMU
zuVW4GV^8XV?^5BWQd}l|Ib2c%{;2-wJ@nWF824-RML)K}XFB4|0vSa!lgB7yn0%PR
z-v`3V*?+FTVoNf|VlB32cbO5lG29dHpR(}YW)FnvHn^V9?(kjM?9APpk#5y(iz{So
z=6R<F!3&*?NSs})M$apNoAAZaeO{un%0yFCO&fX*p6$(BfY|gyo0G5fx2N5R{l9~r
zJj40Rc_B>2UJyjsiH1?2IL>Xq*yt{bzQ`KNLS+6-3P*h3{2jy(*Br+}dtAKHJ-3wR
z8xElf(owJ(iUNtEj^rGQa46@Gf?)4ogZ|V>hhz$5xVpH@VfM>1P5l{=mE9N4h-l&9
z#d}j?3YNx<bw@uLbn*T~AEP1z;re-DU42Xb!#MwuF%6WoCUvH3&SU$<j38n6N~+4n
z!*+DqK*B2?_28zvs~zgfFQR^i;{CGmR}6nu7>z2!2OZW*<M^(bp?DL;$*vO<IZH8H
zu?b_nA%BPV2CYMZHUuQ<z>6bOF+Mp{^xxF^RNpcaxKR2Jb~j(5t1P}m!abs{J|vHM
z)jxqEYne>a!bC|9P0BlSpZ-`P5%5OTt**-MHg?eG7k^3%H1WJatikgFVdr8^Yk%|@
zkF;^rWQv^;&%%328{H?QS;o)&`mh5#WaQY?WSjj8hX(!odb&i&Zjb0)89AQ=)fMPg
z?ZA>aywLX%?GS~x782;jmXVtT{cgh1#gRIECu=H=E7a}Zb3o**lZ+POR!0?DdaTDe
zcbt52WY<<e-{eK#HS9|`29f&JMipTxp-&Wf<h^{3&x19RCihI%1sn~NTTW>gt8@mT
zp~Lqd{(2wIQYbN1>QFT3cQVi8Hjh44Z4!cca;kmZc5YyK31O&Q*e&1guEx;~marr?
za2P;mH9r}Zp{!y|b*IV3ns4^;I-$M7_4(*l;eCI-R1@$Nn^0<CqGgPTJ*a(K^{<wP
z^Fc@4cnXbm0xYf5iIcr!ZD)rSLmx&-q#HNt(pdee{V-l=kx__gxbMJ<Qm+v*_VJrU
zy%oLEn1VX&$Xs|l%mjylO)`-SGUM(s+I$RXj44{ZOPrp%>{r<9@fqtXr9Xwfi}D3F
zI%_3nc{Kv(mR35~NH)6Fvm_hf>O~uiy}l1ddYj8jGp8B$bE;`xxeIL>^ryVXO7~*2
zUfX#&ras3%7@@1(k;{Yro3qI;Eo_HVh0iL|ZU>Z8g>=fkzo2=fZ?iJSZ_rtPu%{qm
zH|H!dTuBNpk87A^c%)|59wq%Kv3gO)rAdrzI>l8)qz8*HnVptoHZnzJO|THKi_jDy
zNENKWz&?>G6#!|I5JF3<PgV5(97Ff>R@(c&AaVF;Lo-ycHz>f<!{n7`KB86os|PcY
zFTJ3-_t~10(YH~s3DM(Q!Y8Mfn`t#UX}$uB!{tuveOr1UuwiAi=-rIS&7p+X=2y3f
z^NL@yFvMZU{_kJ^<_I6|tBODLrc@SrZVZZ=<JiKm$~J%EQHdZMKoDT3`^@7aay53Q
z9md_&!?SK*>3%xjiPDN!rl2086fvJEL0efn#wq0R^1G#W<LKu%LR$x>9zQk;7pyuo
zUEXM}ip%(9g#4~*j`}mTx+34RY&1KFE@S)cI<}=7(wd)dBT;%3<*4`O^}XzE@b63a
z^Ra$#3z;}N4f^=`GP&#a1k-gF*w>`N?Vx%*N2LefCy#QGu*~;w4)5Xq2SJ>qQ*m_f
z*l#f+j2V06GX!po)Pbw}?`-9ngB0|9RetA$_J@IkCab;SOJO6gMM{`L6vY)<RPr|O
zNnp^bjY;fze{v(#bcloTh|%PjeMf-L)RN1n=MW`Ude-L2HCws<Sxvq90sG1++wAL`
zPrlF31ch32jU*EQs4>f2(rDV?x>r6mHH9<felWV+5whpCbe+2Qvw>GgAIIXgD2sG1
z_M`e>c!J7`q$;=dJZ?U>t&e*uyHXlG?JNsX8a?jJN{}C(9X7(r?;0o2ht#XRz#NoS
z4RLIY%d+~-P++gOGVXtRJ-4%iNqUmFh^^mZd&^c@x_<5in@~w|jn2D<fsH5*%rB>E
znhC~3)Rr10e#fbZltqhN^)9-vVhTF0NUE7D^(4l@+8@D2p|4VbxlIVuCzSw^u5w}3
znr(}PC*I?`#^e;KjnXYy8fW2D@b$H6Uuk+XMH<b((kkbgA=5?kg@_yX(Lgpz_Yki*
zs*Sz9Jql}qR!KFhUNw<;zw@v#Nd4%@4#vCWJ-+C5AS~$c>-~{|dT`{RhSwvArH_^>
zJxfR#?G(r)-p{3#twR5%OV=O{m1*CEONeXB`2w>(<Oh!PC4`to-JIItSQc>E+3<Rm
zmN7-zW$me++f4J-P`yyFR*7Ncc{wmkXDKAMcLcf7XPM<Be~f*^gV(B{%39W`Fe}fd
z3IRHcv(PG|eWrz@sg0v?DBI?o^}M)tCKfS~WrF;kCW#$NbUHsaW(tKus{!1nSKgV!
z=zNq_P{5px-VqnX)P2E<%1V|Y^;c*H$bdlt=%RV<@1*O`fC7>YeH00JJeC8lAoA>s
z{bK-(cX4r{Zl-EyuhKS3Q_*BFfn@W-udDp^mHVXdxVrl(`LCs!rCy0EWXURXq&A+T
zscFjIv{oS569f`S5@h3<iFxOhlwdCh9R$Yxf=6Y2#CX-IsnQ`YPTuYM$N?=?%&*S#
zD2dRDrTd=AYLfgy@lnLB5exK5eoRCE90?HSxfHh&f1G<^d~-lhy~<8Jo+{!a_0s$T
z3^A8@Kmh2^Ta^Ub>Ug$^7Q(-~@F&l5@qtPbX!6|uBLk_w!(vI}ZoE$;oT~-0#D!cB
z1sU=$9<fPD<1<_!yVs8I6ii>dzp%)YK>x2w0N(RL8IXQg%!~G9AknW_Lj8!&fb9E}
z4{A%Rf33_6*t|l}s~04Xbrt7H+{kxn_MHdZ<Dyv%AO82KC$!Z77*6ZO>6)D{6CO-V
z_hnz=Ut=V|MJ)I|cmO|_Y%-M?AU_lQ#PVZmqJozKn*Q<miAgoirxB;LiFcX_7V$2X
zkQv(<t;Y<DhJ6sINYNST`K6Z%AfqM=k!Ad6I%whwlPGOjRK*zBvxg%iff<ue9;;r}
z3jU?ErB0P)#@<3{{}k)|TXfw-N&fi;wz(gFo<f(x+;#`nR}<J8R#LJa`h`s(x{Jk(
zR6+pK91wkT6oRyzRX<n5J?nN$@E6Ga&kSTm$4(0bZ=#o=vzZN0cUS547U|je`JL+m
z`y3PeI?(giSXsX@UkY(qb_CqMl|X1FugqT-1L^5I<cS;SRp-}tYfHQv#Wx+-(~&p$
ztKq!m|6m2uy}MQLgOY0PhFTMmkd@yxi<$MmHc*FhH&0Djs1#F*dbQr&Y29<5{f<Ua
zgaIo05!*1|Lm>2K!7Z!4_=ol94P7O(l#-%12J>61Vqa}oBe0vMrX8oQ=2R?*UwBq$
zt{%9G5VSmM12pVK;tGp#S?AJzYQ}X5O}pv_K2SB=I#4Pr2Sm+^LHaVaNzgQ&eD&rX
z?<xhO&R*ZMPb0Ab9JC_G?Oq@`v{a^(f;m{%RH=CdH_MzcCi=84^bdT2ou$juL{c92
zijMlIn{EN@#%&o)Jxypw=&FTeOg!q}P5y(9gz;P;;o^_u#FCpHIN@K0EVn)gtKvNk
zS?PIf?Kguq_v^(?6<HcW%U)NQ2tK&-$7$ntM`)j-`g>$9ZpxSx=b}JqXG*DX-av$3
zR9wtf&CSg{H9v2|Yrpu#JM;XP?6Hy|DG7<GLM#FTLd-d}ABT4{>(78)4H48d5{!d{
ztIL8St4oBb^0Uzk{c*mf{GF7A{;WkQnBC4!vo$&8eec}TeH7*;haxQuWc?zIqwD$K
zV;?GbGrT08bb619c-r#JUD`^j+6C4ULX64DyV$)(qF7&AY@!Md*NS70ktOoB*HIr*
z%(U6#bGR;sEMWNdbJ>-qId>ZIa1r#C8u(U7G|t2L=gN604eh#92jx>r4YvC*i^5`P
zUQ0&THBl<FrbyMVQ>>l6TGKtsy9@)*o%=K2)p3k^6ubj_rjs%>b7{3lp}{O-wvy##
zf#}t6^*5KWNM4Ax6;rI1g4eXvwdnn+Xu^K#iCZT&#sR>aEVI7PL1>|GqYtAFrIa&n
zLs;pKB%PX`wz$2$rH+qasHOW5#HW+;`7;gqii*`I$M#4<UJ7)kC^?Yo(<mxnra_GQ
zeUA&Sa~m%E><NJX$&G^>TAQDTncQGK`v8Q+VYV+3=Vd2G*8^8x0=B<4TpB#?KK<mY
z;N2nRu38>>eev>kT@C9IKmL~BgB70c-FVu)(UU7_w}Zg<yj6!H$QAcDOLsY_<h+K)
z*{0%fR#0da3`ST0%#mtzsI0aeLyR1VM;SReJpzJ)dsnABN5nDajahYdvl|1+>WT;^
zA>6D?c9NusU+{U;;x-l03ikdTllGB+ZAu9OEojwQXt_v)59<7>W&L?7+F5B*<@|ZC
zI}6YC^jFH1&#_N~qi2l@`P6U!%!XR3q27QjC(ZPvM>@iwWITn=>RwP;RTTv1KjKdn
z3#0Qbm9qVza+$CgIr{S_6d<XNe6bt#wNZLFJOb_-J^tA4FVsmz6StR%dK5Y+9y^F>
zFZ)n;`s~eBT@2U<PIJ1zUwVq|`Zp|&a=H}bMYjdPwiprEDlg(a?|fk}7nRxa&3lWK
z3a1<yUh|*(Rb%@HW_5S76gv&>1Hs5wjRBJvA_wT(*myF<w2!p>g)%UdVae<+L@TJ$
zXGrLTGdGuiI#v-62s^Qb6mWRQ<Y%~ULERqB6?7bk*!XW?jU)a!7%PMALFlxYXJ%WB
zzS-nvU8KRaN11Y9flc0(W~X&|+h>No3i65Wm5Nah9?`MjMh&2vdV>fLLM9{G-0c=1
z{YE!NK4@ioJnk9u^-%yL;C7$!ldoavN^r=FQ2(S5x|RhHcQ1^nKdSxJKT8_p{xcS?
z;`~gap}uXw$RR7jtm|t{#fv%f`6XpPWtK|Qn$K(2$((xFQ2<`%$2?*T9T4M(vB>>;
zWxOoRruZb7s@*LRHygXb^VB+&CQChZ`A2(Syg6ASB8+QY+t5<Zk#>lxu@sETXUqSl
ze*(ax$pRgHojBHEExRI&mA0PUZyHaurYcXK>>TW)q_6kMELD(hV(a>CcHwB{TW`}<
zYGlLXKZ46%A)zRyjFelCN?-SD-I?L115O@=Z8d>0j3w|gW%@4sYn``u<1fg8gtnNu
z7@}P2^G3g*yi6P=9G#gJxpv=1^;idh86Bk%LmZ9^2Z4i$+U>1$%FV-A4I69uAVf4+
zH0bL7%Vn|ISQ3ZSBH0{f^r9?*j4VevLPrfkdf(XVq{REzYBg`3H?}vHV62Yfs)y7)
zP)+@8l`ONSK3}zpb~**8nm3=aRgzWT6)R9IUYjZ{|CFxteRI^vB>wkV;>ok;ajli8
ztZQJ8szZ-&_9Iu<$QSPO50fXR(j9&SIN{_iwFn}5qk2+xIFcIDGU5k%aMEEifjwSe
z@`)j+6V!>@`Kdn<3|g8LZ!84Z-U=Xc2o6CrMLTeA6AWVYSHM*aB?S{M;z=yH&mNM4
z30ImOmc=`j?od*q!WDOh15sMB?ddMcvkXt7V5roEioeu?U<SYtngG}>{-}|MaZZiO
zx_Ab7?%)>FXMe{WxhNm32y~yfwfOxmfv$ceS0n0#iKWR${O;7o(cbpXXRl`I8Kx+0
zxu5il98S0i&~E=MiBVlB$6^xhNO6RuhD=STgc=-^T8hHBx3?F}i(&YjOyUdBH|(+?
zhZ||4Y+?ypYO-KoUztK`j8qmB{%}H8G;GkAk~q2}BQ{agC?`LF>lk{9%4IyRBN=uV
zjYM3MG^cf4;=G)p$8G@Y$Flgvqf*u~kHxj?F`Ca2ew?jR^?Rd5zczhgC8L6sfLv`W
zicx)O^wKkpL+`d7L=MD~xUxV(Um3}^lpuo9AU=RPQ%>TE04@9xJfdAW<o6h0(XV9}
zsVI2}t>`-tkcZJ%<StnU3!0ab;cnJOoh+2yz=$*o)t<wBi=zNVDKwme<u&)F@uEe0
z6&Iu{Fn3GN8kE-o<UG%p;CKWrD~7d3;OIVoa$q}KXOdW?S3O8r$rdJF<z`SBcZa*{
zbC(({-xLR7(e*eo<$4xyTZqE?K!E<5^yl-J^>=Ug(l~rAh9MX1y*Sz~|Aj0qkWdmY
zkPwlWP?Br~@DwQgK<@fOQj=sDhGXzWT+N_1Wb|Ma?;FMI+tMs7E|-~$ffp?`zeZPF
zKFBC&8U##MvsP3O4$qWt-i?tE7Q3adm%C6%SYBE*c#P&??KI;ayc?Y`Nhra53~jJ`
zJYeKx?`QjWF>-Wbzh2$6#?d$B1WkD4pu_idg@abVzCZzcM8kk&h_U~NG^-BnxDo$P
z;x``YNYqcGVH+D8-$%r6+^ZeF1j6zr>wvGLl;hDmaj)(7p!o_c>=QizadH>U*g^Hu
zD!Pm=V3T!QV_v7RE)+iV#rsm0{BIF>iy@2po-+OM1+Shxf;KcB+&5~eB+fGVi%=@}
zuZPKQ*}ovMfE;mPbl+I?s#&tJ6jGUgWFU<Ou3iX(E2-9~D(RvL&yMT}h2FZCpc?{0
zzA1({J;WAKdR<ZZeVtN&ppCg*U9fr>{Z#o~SdxJ4-I+yEy<OC)J|Fa}jIYhK$X)tr
zby|5_GpWdE66NxWIZG(F?juu}r159KFB9j$<Gf_=tDyOkY1I*1BWGY>pm+~V#yI{t
z@@NW`-iBqSLDbdkQUpiz(XgnfC>+(;k=RCKU0vOFy<n~+=?EmTU@pGkqCi+H=Wo%8
z@NBFUuI4=QtrdBfuPR|$OXl1zgKyU7x95-xQfm5ic}2`vZCrKx^IfR(-}!pB3Si%M
z3!L3gT=|f+DYg$79^fW_8se;~9~pflU?Kp*`7FhV&<EkB9NG~!SNux$clXahO=Xm`
zH0l4?+b*a}LUKcn&sI$pPd6H9)q0HXi0h~O%=PfovH@FX^oajA+H`sq=t>%WrQ#Qz
ztf<`lq8d+-=JsY$=~XrT8$pW?EooV^+q#LmaV=?i&CnQhjsOwp4DpW8U5rcGO{=Fj
zdE+dkV=^>hEzqjxMFMRX?PEY$9f)2b_`V)0gd(eh?OZ14ge;*<9Z8GmX2J(=#V%^!
z{eJB&4fg{aXo6O{_9S2?*04Tm#$Q%~ER-qqMQ4A5k!<)urUDsR!1I8S_6aj!X)d1e
zPDi}IGZ}E&@iB_;3Z3j>pMwcgD(;wUbgq-Q!Aqu?f3z+&j4|J6&TsiE#9^N4SET;A
z-n&X}a=o39netlC#C&ITq*aHZouE>YsKm{X069KQr+tKMc%cLYxa6UnRUmkj1M_52
zKVP8DVmHrEIe*9mair0E<dk;N1}}M(pf7^m^Kn#gb#Y8Tr=&=j%fVwiu8p^QNsFUv
zFth+rK&T;%GC7bi`z9YPJqyyomQt~g&STMqFjZ=-)EJ#blAe`4bY-AXcXuhWBhoZq
zoaQ~eH0U&jT=J4WttGMdyyx|n9^P<x)T4ZB-vcv(rH;|j$cXd-DIHM-FAx-n)u>ji
z@0d~Q^_eP=&V*j7$L&og-b!mlko%aR4vtpn(@|hvNj-b^v?a)7&QIo(+<`?~K(N_t
z4lovt3*3Ow_;Vu4>45!Q*Gjjn<1@^@rkkF0Ap=r1$un3pw}B)QIU{6$E-|&_LEQ-(
z%4jJgwK&&1SdZfon%Oi2pQbj}cws-zrM*Q1cpS3{#P3hxQCH9o%-hBT-E!&d@w4y5
zPh=^WvPO*uEZVM-GDOt<%WxUT{pr&K9pp4j=9iWzm&P3fn;ciA8=o>L@iL@m#h-OH
zcRW~0lI9C8LnasJL_LG~*TCY0t;<~$Y&-akA8$XWjc#4X_9mMPM>fym=$OlmkDpBc
z_5SQ}b4y0nxFDW-$R)0U!gvUT{Y2{#L|G4bk{^?B$nJoP6H>#`j%7sme+vJUg2lZY
zAQ<6MTC&{o*1o^+Rir`d0fNPg*ry6;Ns@#sYtF}kCUcy=4c?bd?xXDMNx;50sFU7>
zyk#<3Lms8(`hj`gkD@B-Z(J<;Hh(B?f~)Mj+n0WpPX#}{D!i`9nDV$_ZK%HQ0jPnY
z>n>wU777Viq`2Q#&s6+?$G8E-2c{U;HIyNg%)NZgNHMzhLvLHW3DDZFAay@_>MXj?
zD_Un2bS6m4ffsTi+#c09AgfZT_~ozc?xQ$_YMU!v;blq`T)%5g#jjL(0F^;9>II)<
zsJOR07s$e>3jqZC^QN&1>M^w;d>An~F<h34Zc+HBSI^aD7`yPikBM~aAj~YkajBCg
zeSLo-+Vm0TCJ9)!Vy&brgYWRdRXlvx8iy1UlsHZMu9T|$*AIWd8_P3D&LPAZ;u^l5
z3uR?dl<T(74CyU*KU%@@ni-Ba*&I&8w*Smkr7a+WHI;@6kQ|)U`fj}JKK8&K29l-{
z)>&I<N@^Ls%AJ)Wc<N!sBlnq~JHKnSGZcDzE39Q;dE^F~tOa3&2f`u(?mqsVr&DBp
zXz44FFiN?#)^NHo7I&C&yu~&fBB7z|@a75pHJxkxN5+gvOSHIZragQ@d%{LT95-3N
z<@O+sNg@iiOfMjX<KB1lUCz;3P~ge-EA^1bp^e46K73=gN{q8>FCCJ?YG_>*)VCWZ
ze4Yp_AvURiHrhAJ+zNjR>)Pq_+1^f<0OWTKLFU9GQ(qqy`%Vz76D0=WLjXZE?qZJr
z#fQ?d#YGB3diQdwayoE)nPePYYi2EK;+wPe=@UV{Mx9t*%u6a2V5bC!Cb4=5hXJt%
zjja_46D#yU5zZIjJqLalha5J)*Xlb|ZE2~<n#w1gFPs$CebaaY#5wQ6Pu&cwhE~}s
zz1PJ1F98SPBPZT(P=ZhY_uT+A>}v28hL&Qbt<FJiIdm~W2cb-j*hr2+c@4nu5|=St
zYqcD1KafL|YdP}5)x=R_->01@-4IsC(U8S+%qc2jZPe|>Hi=~%oGj3ict@?w`p-0E
zee=Tgotov5)t`JDX*5;^Cvk<g3`lDb;Wm~U18h0{BSh+1*xywg+G>tEV4aSX!Rn8;
zg3K(At{!qEcpzTqd+Uk9DrZj3(vEulM*(e^Bb2t#N=#DBnEH<2`37l;Z2^ap2FoA)
z8pkAwGvc;TE}nXKy+2W6iyGXS0`hoGGN&8?cM4#PFpdW1bcG-o!#P7o#yb7E<<WW7
zNu7)Wh=Ntlvd;GsHX=?LKjNg7FBS8cB}TLs66)fxveyo$iHc_Lim%D42UA<KLqRL3
z{wUp-qmof{^;D@;!KDz0+)*4dxQ3B@#hhMQ$QZ{ycE(>z0l-0IJ$L*2`$^Ri?)=#k
zDQ`_Ri$=l#w;^fzxh4zP6oJl4>Na}}GO*M+dyjecu6x2n%hhk%?#sOWE%~PDHhTf|
zP~^4nV6xWnBT$Cs)3c;1arEvpE{GPc*dd+RmhAZZENOpZ&d<HEXBCi-;OukL&m4w^
zX;FYMU+Dm7jMbo?Yel+~+Gh%%&7&U&^c{@`-O-@Im9^0br67wF(O9lYb=ubIsXNgT
z)|ceQPxi<1f{=cHDw)Ltg=cKwN{vfF-Osv3E*<05yWxJ2!Ty?g)qABKfukscyI+)r
zn*$U(_M*@ee}L|_y(-O*_)q|D6Zd;$G)Wo@pbP-}Rm=oa85H+|sQj1716&PfYU6sv
zyOv#mDEn1=9{|t9zVJPBCS=nmYGHH?f{L(4th<jGAIj9qsYlS8FaVy33GY3C%9+dx
zUm<Ezgq&AVXLlByxdHo``sWyJe^=aR4C>sLYI?QQ`JTP6I`EqG6j`!pN>rXXd{x5C
zI>{rnMPbO+Y57Pviu2(&12FIjG4_AoJ`ZI>p_RLNE@PYykD8b!5aCup$3Y*)+IWw|
z!H5dBoein={}d}p*oenRQHDGWbNa*n6Z5yPyb-Vvl2`O)WW^`FzIYZ1CSyB_=l8Z~
zhqC6X4t;l{YItw1%x$lGRsx9oYtmB;c(}F>#J8JGZw)}47KB1(m5;;?p{-p1SVLnR
zQEc`n`EfUsP_UbvIEpFiIA0bwrKri+wHA(R>`W|3_t2a=%3CNDtos63c2HPPR%VjE
zO0l8WoUEFvwG60&UVp5Dde1fKQM@aGk?iaUSRnk|<8iqCeCp;_+d5`q(6dxl+|klW
z*<F~u7<ayhc(oUQToCXG`je#kXLmsuf`7M$R3zP8CiJ5qG>j5Kl&0TZTtPqtI({_w
z<{|FsD=XAjylbX8psXqiFS1(in5sWwN;z*upQ?4Jo8-C(KRD7QE-qh;vT^;>2eG+7
zY;e8YyYC4O4k^rU{&8OSoGCraY4TCpu#W-2|L?773|Y)ZrO1fwsUzvWWwN%cYeluF
z3$13Q`?xW{**9gZ4}ID9KmjS_?|jy`)iI3>=hgnx_q51v@ij=7q^N#zx?Y;}<Q#Nr
zbWmMPYp>mNmnONCm!BZhFI>V*{|ND|N=KF6rKX`f)B4G%K9g&#xl(C4_As_NabKmy
zM$iSmG+~6(B`P8O$%ojymZTRYeK|OP&a1cvE9&c*3$r~wX!2HTE3J|hoVvL82Fazp
zJ=nNZDcr1x<^;n&*ZcGK@7)$Vd1xp;<~7&_0LYQyJX+Gy1H^~}Cw(quS=m$hw?COR
z3#(l2iea8cq@T<d!XdrLUII5?S+&ajH7+Mi%Pg`J6)y4qPS{-d0DXgewyv7!fP60J
zV2%U2qFx}Z)U@f;H}VY=`>ert<-qvJZ^MvHQ<M9s7<Q;9reD{g;q$U8{;?Ig)&mgG
z;1;9Pys>yOz+j0#Wpm4ZmS?%qa^ja0kGQ9cq3@BRe8l{+;{en8omo<u69goz+C)A`
zqWkxuaCAu}sxA6(eUK&4_&YoWmL?xiwHs0a&DS3f@>-kwvB|pG_T8;?t!=B~??AB?
zg9Z=>vSx6(lRdCZS;zY&;p5+PzegEBepu+0U@eHC1+o)=kT818ZN5Uxeg%Y7_`iOS
zTt9U0x?(Zj<}qS0DeR%~ow{nN`!62)Lr_BV$Tk1~H3F;W{e}5<1cX4dCLUW+0Gj#Y
zUrX&TQm=y_M4gup9M4eIu^*)ne)%#2>$1KjV(WhG4`c<AjucjIQBl?Q_V(d(5)u;5
zH({&<Vg<5~h#~u7fz~RpBr_)jNO%pC+rxr`kqZh6D%#rGw6CwP&Gfc^I~_RZ;6Gv>
zlE%jm)u(=qYW!#O!#P1n(%8%lpSb1kot?N5r-K4<rbp0%MiNjp8iMA#muU}eFp>fA
zoG`Fa;_orN!~fVr--C&)FKPupv|azPN*4av)qrV{#ToEmOXmsHK)IJ%A0HX#=MSpz
z>d?s>paL|w?*q9pL}k5ePF~>@(!b$bkr2TU>_+2Vq4G&iK$M?H!|ApLuLrYY^1ktI
zdD>dQGEFJAc`vOe-|_5iVQB_=$<qI$h8GIoPDp$Tcm6naX!5vyRC*Xe3=d2FoyosL
zufO~wiVrPdarE`jAZVMrl2;#?w!PFW!r{{F!~=%=yfVi$<w-zI;usIXJE7fBhoI8g
zjW?t$!E(^k^Tu1X#hXabuDtySjpWKR_O>q^%$RyJMUnsa_-QquQ}v7#nL72TZ;_k6
zF$-!4*U_AK5XnDL6@Bg)mErLpsL}W@-}xv`-&%DN^{|GBH47h{3t%ggE`ZW(ZLF;e
zYig!}-QfHgvjEze8yUgT;}m3P$MK*fA*ug~A{Zbqkz#Lai?I<95YRn0Jq?-{JPjdM
z$H1sHNg1r3frfhwlRDDsw}tMc%&hx;H0ftmrB==)k3?M;XQm}{5i5mX;*6>lmMxC5
zU*az_%AL<$Ds+?*46A+}+rko26Xf563?UNKw=U4(kU{4y(;pfb$upv3SSvpXj$A6Y
z^J3O0%^;AJpz2&BtR>V{FXLB>%#1(gsY^OoREshCm^!D$qal_0#3g*JNXT6AazBls
zt?-kiMyJLMzK6`DCIOKJe=5d-sdvpslJzu2{krGU-efV-rdn@Fe^92>f`Y@IS(zJ^
z^Jtk1%ZDZ(s+Zjo2v9;48)x5E@p~%Eta5nvp6vA2URH74dHdq7zI*2}uTRn`;))bW
z{IgfcJd&>LId5|Wrd*~pOo8n-erIX5OTm|u&?+UoU?AFx>>XVV2RM?($hVww3H<|Z
zh|>yF<hh;S@TOqZ*Rp(3-7{@L;f%{#V;lu5i=kVJy$dYcKaOJ}rf_VM2Kx+<OXW+I
zqxEacD`+KjZOJWS8tYHbdOv4*q`P7p(|or3pC09Di(|3{=TjaUaL^K7KqdLJxBf;S
zZxqG?``wQ~(SJ_5ZTsu^0{)QC8HwyL-ThLV;h+(vcZ{A%jLuzi{>XbPq`Qb3x6*(Y
zf<N3eM>yY0j{8NvS%LT`Oe1-Zol<4RTTh=}G+xj*zB;F@Hr|PdSznIlMDxMgvma&?
zGFN39Je{#HFPfhCly38tR9TU?&e}8bKp4sT-qo)KowbAQEhgqAB-OsIq9FfFp>cb=
zHpZ0Hjqy(KT7Bl(GMY(0!@<B<XuLroY|$l=JPv*XrIQF_zFn71bNjpdOV!*>{mQtj
zUTu76S5ObvhL)FLo-1|jiGEDbpK5~0wt33S9HoTH0+VR;Wo~q7!PASzqI!XpE@DB3
zgrCGNsw>=FySHOngh?7&{Bh>L4wrI`+OaS@l2mnSN!mRwey}k`=04s2jx+j+k#qM_
z?zb|}U4x^3A3c=C$n~PLrT4m*LX9QI;(3A#r-y=2Zh?jyUKnE*`Fvq8`j<Str09Y3
zuPbcb5<ekd1+l1$CsOzEd%m85W~E5-f-=vNs%DRll_>11Hd(4VW9-$^V$KV1D(jC{
z8q|Fcy6!pyoyt_Dj|W<g=A;GKwlPy7kTci^)-EN1OZuvSU;2ibmmRmx9+zN#%g*Us
zi~cLRG@V|1>yY&cD{e*qa7q=jOVf=$DgCgnP{p-obI0|bQE&6Q_p9ESg3`J?zJA~$
zt?vWVcGje2td((&u}u~YG|@xc&k1JY_}1Cj9Khbg(-w|?`x5yLGq3%u<eqh_1T6^<
z<w``8v6-=YRO|SW3hrt<;`p2>-+!80)i$~Odb31QVRzc{tjC?>JG9e-6PKv0;Y6XC
ztaF&tlQex;u`wd+x8};VrH3+<dIhXOpN1EH!hfc+GMo#|9k}Z6Zi%oq#rWZUE#IOZ
z)%%D1%JqeKs?-!#h45wV`1>LIyr#&RG@5>>Onf5YGbnArJxglg*Td^jcR3T}$~c<D
zc(*x8*ll5a*TG5C@s$yliZ=@c{|jxTxOmW=q;XQFhzJSgO8sPwf;!2qS1|wT*GQd=
z5Hq8N8hkkX`HY+&EtQhB3*2<euV?V3^&Zed5)&}>;ak&u)fnS35}W;TeTE)&;w7@{
zo`#yXVgdXaB3jNd8|s@Tq7@9c@UUbDsVf^vJy1yR`m;yQ{ZVJN`+(%!Ki@Tgd*_=M
z9_A&99<N=`%CePSV3&4O8$t)K3G&tTsxy&TACeZv=4EL3{*9))V~&X8`&SW7)Sv5~
zpo|voBAv|+>W6~lYM)<U%p<(f^3uz9KBy1AvQ%G#j?By54;qrnL-QT9D_cCVwQ#h8
zk(>5Ty+~Oq!TzJe)=iPwu6^fX*fN6T&jW9L_-j<L>`JX|8e8~fwvj(%_?0dphuEQN
z1)BLmtIp*tWcbdo-D_))m{064l-|klhA>goTPAFx4oi#;E>S6S?aPNSf{EBKM7O@H
zr>YUX`KrVq0zp3XzETr8U=A2cJaXX7Q0?yCiQrKoJ@doUkxdY?pfQCZ94qP14!TB$
z5Q$pEyrrJqvgzzm5?p<MKXk{^?EjoU1_KtDb!VlE;?1;#NPQ{W!5=X#JtL!&S5P=m
z)BR$O=qRvt4U&DsJz><DJW08plGp#qIfu3Ft3dd%yT=bv2r)EWXmkl}{+%HOtL+_c
z^i-nwLY-^k6@o3T;jsFYZ#j(rnWtKGjNcrM*=Q$BgQt5tKBQ0?9ZJXCnHv+!qwu1`
zI9-T#4)#)E-NaVj(RsmgFYrpPP-cGE(M^|=A@pXpmQ&Zg9Y^&3(23z^Z_YhoI4H7O
z%(udwv${t9rEa0yeGSbY?|CNnY=QFXgh{ywrl1@3;6j(5eP&k__(D-!Z!$UGo+eGI
z=Xd0`hzn;oM!ZqcN$H(CFBPT359lt+?7!1kJGgZ&r0-jX40jj$k4{{>PYG1CeDVGW
zaUkTt5=iMZuB;7*(v_Jkyss)@Y3oUMX3J&zvO3;8y3zn@$vSQo-L6=p#y<XZEUSHG
z&f^N=^BA>@`K%p9t8}2y)@%0T!rUj+xl>zVY6;zp(>o(u$ZaUW??{3DoX@~8qO4vu
zq%*lsokX+;k9Q9&2-+An>-<8VGNu4cxiAZBjPKT+Sx>ROVT+$0ms79?t%d)gW0E3e
zG6Z|}I<6=mEfld{nlV*B8wyILBm}nF7hC+UZ0q#EPc8+EUxaYLB5Vnx+$&s@vc&o_
z)3Rc+<hZ&7PDS|VUPlBx#X9)pd*?pZSFCcA>K-3-tDiPKIVC0Rv9WUH{oV^?(fWDP
zXllQ9BLr2Uelgl66Lrlh-gb-eVTk7P0dO5D^GoR1=jS<iCSN|GUH`eRmHpBi8=b>d
zgpX}9_Stv5x*X9MoSs##cz{m|y=o@ZZ5-sPuGUr8OPS=o#J3dj(X&^HJ~ccjOPMaw
zOY`ft41^nBTSl25S-@YqPKFg<*PrS)y!Gyu9S))cU9N*EQVUj5$<=*#E9mJ^I&j{r
zS+r^`Ax}HO@m&+8)YYwZ&jeNG_B%x>ndG_|SuzuS!qFzt-O%3gM|!iVz7tcEVqh@k
z@jLePXRium>$azZdpLy@T2fuS=xOghzq21zJw5MCyF%|5Xm#c)Pb#&W0ma>ywn#j^
zJPBR<>JSldzS@pS!E3N!%RRxPtzZ_;J;f~z{5?qMcm1dp@0Hz5>&PqoXwus3`fL;}
ztrS$v-Gk<3A>^w<`%<u(maD|KALcjg{&85+;_j~-77C}XFll-H9a5PT){^JVh8aDS
zdCL}3f}h2CZ`v9E;O)sIHLxr1z}f9lX<Wy@!h(umkcW=w+_<eENF3u_o{`!glW!ml
z%e*D(mE;mxVzP)2xo0`4$`y?0yGup_OU<G0XzZ|^mPK%zDZP~h_6|&0+jQI0lBwtf
zczqCVDSJItNlA;lq_Y$H(#=2s@0!nri)asfKgf60gW*2x*78)AKb+-;&mKq3V)b6m
zjOTP)z00#H+Z6<D!&dhBP#b3w&hT1!Hxtt=LUxjy?aK3p$gAHVS0XCMXT0XhiuNv<
z-re1&oL+y*!G4(F`U156{KTeD+>j_B6r8pE401bsO?Ay5<wNU!eNHqj1dag@Y2Mj?
zN57OuUNa?jnvLcYo{Q(5yJ~$e8WG!zH+9Zv!ef@AZQi5ArJ|0R2KnJkaq3tgEZzK-
zl=v-1TL9NwK^n+EU|Irmz=su*y!omrO9UI#mp0=#*vf}Q>AR}m&9j|kM0*=UKX48j
zQqyUc39cLGJqB%t5LMU>`w65vbf#5x?CNuHZB2I`ej_J5Mgb9OOI|yeI2C$F8V4Bf
zLEfvB4I+PM@mq^I0-Bd63trzdv<rtE2~n{)D2WTJx+SMC4XY0skIa_hq<S;bzFZ1l
zXMVZrsJOE8`@~CBepB&mG{o8`NQs@Ce(}k$8BB%pDf(Unaxce(QD{lfQl0Dijpokv
zcOTE%vSY=ocwvhDZ_(GES-<q$kc2g_T<iQG^$0?`bm4mzU#kG}U3NQ`N(n#qBxBJ@
zu5j*qUqjr7bZts%vKyXz8{OmRwfk();9BBtO#JST=@VO<8-4NFUO^L+fs3b!@w0;K
z0+TP7jps!r1y7C;hA0?a633_%Z#JwlZCWQR)J0~yi7;)V17^vcx_X~7`zjydg7!jP
zDIhAXKK*We3v&~u&MBMHw|J>LfA&*;P6>{XdRu#c`E|h`=}W1xM8*=D#xC6C3tn6z
zJtaE=Ie+`-uNdr2h;NHV+Ks;`FLs5?^2&Dw$zt4K+||5k&rXQLIOzUGj_1v0A!qeW
zRQk?+`={up3R<9?Bc3QI0iXrphJcM=!!C_PR-5N>Be<&wNXOMHU%A-E673d@nXB!p
zs-?mUU6#dn9Lz4N&_TPq=^d`i<0>6CaI0teQ0$p<r4-w{V~B~+LA_HK+gAZd&e{G;
zkyYsX52@&j9H4HBCAr#O1^t;QJl}!P%7Ci?dNbw$Qa!!GJbJuAlFdqY0zm-pY5%N$
z(%0*8>#Y8YteIv2<D%9tqhh?i*bws|P9s`PrSs)b?A+uhuS6>U0fX`x!Xh1LL5+RQ
zloYz~{+%VW%#zQr0GnZiv+|cTq21Wi9otDP>`f>(NA#_hh^P%Hit_w4FlU=rWuLs_
zEd6V>({;6C<I&lS@7b3gk2w+1ch95pI+(K^;u3}wR?R)Scm2uHdA+<1C+d6u4*<YG
zKfjjETV|V=ZMLOzmqyiHy?)gWetyu-TsdPMJ=Z*3b=<tIUbs5KaU;jg+PQ1z?8Jo=
zcK+IVyU=;T$z?aQ)HOG6<t*HgBJ#90wOVVV3tN5@Ok3!_*=3izu7ucwki(MlUQWD~
z4ijz8+a+}!H|(D81+5Eh-kf<>7Y2tayV-Zsx_i6r+VyL8qc;=n;1K6Z58qVZRJakI
zXZ7i3*)v)WDcBMh{g0dGE6%2)r_*nWGlSV!Y*M~44%B)3Z>Vjwd0v0>Tjm#x0aecW
zdfaGt`As=1eI*??8b=H+(dNA=akIXlWkFG>BahS<@mzHd{r(x_O)NCMatiCgQu?Gt
zTZ1$0j<$kf31d3Of%fk9;GTp&&v_(4KC=JA*48xF+M3&}rNNC0jm_?Hlo__dK;_{v
zj*(}CL-Kv4J8O63nSAhXx|bq4Z-1Q6FrvU@-kkX^kT!UGa?#k^1!cJ0b=f+-ZA9KZ
z|M~H$-VMw2wHDTfmS_VWz)_7|F_b?E5b(a%eXZyrgb|dyv^BK_W3~-0*e3BO29!JC
zBYxtz*?%+g-_g@y*L>W`=R38B(dvQrg#n)a)#l||FlRxC$CQaSLSG3ZlJlXIw`IqF
zNNqY_4%%86vhl8QMx;YqxzT^a`(8(g(X<x~d5j4xq)8jZ(AnD9YV*C08a-e2UPpK0
z91#@$L>|5FFdmXdr>}_L{eDjq?bi3PFdmTidVgpDA$+c<H{aVtCd3x<<oWLL@#uQ*
z^+LY8BA@Ubt&J42aMw1}H=H;JaWB{G{j$|rGcHkwoC6TrC@13<<--8j>+OSc(-m)n
zaUGKK@9|-P_9(sWHF$i@F;>vd89P{MACYd<VP>et2-)6!HQESu%0hj!q_)Sl#f_IO
zUYE2j+5~CS7Ko4k<1K{Y{7G4}c5}_zyV|2aF%G5UBR$#}WPHnbMO!3Y#;s_F(Z(Q$
zz%Xbxy*<;<yZl^9J4YBOU8vub-^7M^(>a`Y5p4)V|D&I@`}mh1KT6}g7vCu(i}$3c
z&qZy1zu>o6l4k$_KmbWZK~(H{C?sQn=#;g8#%{*Gt6rCk-Eg4IFs5)WBVdXzA|2xo
z+~Gxgnd50K^z$0+k+wq`J>GxdLf@yoMg5n=)yg>N?SwYi<l`XiILmM53Cy@i-dJz+
z^~5+w-@fkWV16qKE~)&UAM!<eYpidK^DT8Z*V75nG5U#%I4-n=faY;fe(EvK13qRD
zFXe~F$c5kBmg425UC;G4M;e?X5NH^~$WQb`uGzf}_xb}hT<2Wwyd2j#<SWGuk8(JV
z%){irkyg~1rwi|eEeqq^mW?6su8DTd7|QrTz8OC_hjARsSmR$-mQ3U$WV8uSV}9XU
zhwHa&Y^jYha}J1eq^HGo8DkXV6?KtMgLvopwb#7ndC|5SeB92)GOm$sA5UU<@issm
z_VxEg+oN776XORAIG#2}y=`B;!yZ}taGZlV2OK_o*p8e#64&UIn|fa0oE}*DfURA$
zCce|QPh9-ePG3H4=dPaf^F;S3Z3S9dL_h>YV0sA5N}+|52MZ@GR$hu;@5ZO*=H__O
zI2SnShCQ8V2p<=HTs(83=uZY;_b1u=eVUhY1d()d#MsQs%$u5;VoSU!Kg6G2Zr<Zh
zA(tCBacC)@0CgdurF>qM2uT~iAB7g$N*WI~J^6%fd;U3FympN(T(!!L*Pq!%7g!Gb
z`qy^!z4vX<4e$+f=GcQTy=a@Ce#%yC+3a2jt#<jd&+OAZdu-27U$e`nPlxytLW{@k
z1JG@FqkS^{laEPt_5Ng+Z}Eib@9hi2KGQ(?pmE!A`eAeeF}`xaA7k}E)E)J=WbHZ^
z!<L755@HERS#wnqAN@}}rRs*20j~!<LoS^@WuG5D9G*&19~_!HuZ*4(;^F3bIO?;u
zzQLAl++b^VK4?3>_(fajf=j=9v3$CBuf6;IAB1?vv~zgQ)VUYchQ}WDw)K)NT)is9
zzKbVMx_ENLj=b|bJNmnKBk%QoetG0;uh@exzF-Sht&H~g`N6|>a_`6X$#34U^Df9V
z_<4>vDRVk6jg{Y-^n1G5#xQ{aXv32A_P~k<Y}b}uwt4yHD9f4lGxo+OzqWm+_uJX_
zvtbNdK7YAAy!Ih`X3Mj-WbTqM>K!_J$UZpnfw4V$?x<ILW~4|rz;`M)?E>S;YQN!I
zwQ!ZKn7<+n2^en?c@BJbz%F)P3PTd@iKhoyrh_-_GOk_Ru)vKxH>}*OZdkg}zc<E>
z=@mDiop<pG78g4%Mq47U+$1ANES|gAR(m<uFWnHw^K>YI$}3McR&j&f>Bf`&UN<K%
zo^%7#&A4_-*IN0ws1Nc@`u#4*VE9E8S?_g@u-52(2SW;i8TEYX(y8FiZ^Q?4>aw^L
z`gIr?JzfkAi`o|3#w8nFtXt|v|GA#lkQ;=%?K8h=KYIRX7y!AE_pB8Ac0S5t+_?`a
z-;@Obg?w-Ewz+1}T3bHfg?Be7ASlpAj+{Lb1|<Y82N~tTaLBRoK#m)tXkTj<uCXQa
zme^u9*zr^=h6(xtX~730BL>j3SI^p!^GBlHVI(1*s3+>mKTp>Z=skVRS{PI?IBr_D
z$u|0m;cbo^V>{gdh!JN0XZ!6_H=M*d*UOmN1X~QaYZiNdTd+F%8*P_%J))iqg`~8p
zE-+MKC_dx@EbSYk5A{0WPx!)e-Qsn&HrBNv{)E8h@uu;7EN-NW!H768lwR`upE!Tq
zj$b$)c}i`Ke57&|<0{k%Mo;?WQWuO7gEuYR6h>;|ri~mt^NC&X{+Nk3F3NcOW=Zj-
zvXj@f-o|Jz^izaq@)hA>@8#i%=4;ORv&)}(+@INJzG5V$Ey0WYVx**R&|WurKU>hc
zAdHx#LtCRP<t)@622;|)2zKhy=@3imU5Loi%ko5=h;B>Tme|@QYi*tPZ+J9#JE4r^
z39;#tkEM(mjPKMZV@dRZ0>^vNx>MO<f#J5Pp~?I7a$C7zrT5X*ApnziJg4$P3t|H8
z8^hrzXAXwYiqV=n;~!<j*o+Y|aacZYc^F{v)WqOTSRM#Lq{Rq&;o1c|?rn`ef2s3g
zh-lO|3*6$LZ&xl@742&EBJVQ`R(QVqea)WkAFm$a317yW9yh8}2RxWSdt|(!jYOVE
z$<y&aS$VwP*Dz-Jy3ET!c@UrKXtyy8;Y~Rh0vzpvHcp))0_2xGYuX6ykcTQ(Q64V~
z=OkDY7j?k{0_5?i8)i>kJ{4^>`d^vtHJ|oWKf?n<Ifj4wA?I4!NhY@7QR^%`3=#r;
zga<QV%lUxwW9pOeXQ3{~TC#SL^-byoacia5^VXGHZJ|F*QD4&#*B}==F2p&Ku_+nQ
zOXK9+x^Bt3cz}h685pM#!D)Ztq2(S7JRCva!{~qNviC=?zt8>r%Q@{<f6T?~bxvJ!
z?uz>Jv57W`aE8Fj_{nk9G4aBKdgQu{u>*k>k07_mc>B5)FMS~s-&(v)tn}-G4K4!A
zi))i4wlMZ({V;1IT`uU+PT<TrhP1erBpi7hkMpgREAxr8NRzzMN4Bom8vPTYnR1aQ
z#1HC<Ho*e|CoXy$@%BxgY4eG9^uJuTva%q+dl-)g5sG%WaoI-SE(`IK2ORLargbqM
za6V*QL9{q{=8)HKd-PG-SDZsUB=K%uzQs2BbtB>-=WfOm(&2i%bm3U?NPau7cgFa=
z@3VdJKu?{Ikubn^hLBHR{>npNv48XUzlr?O_TD)7hW+-?Z++bKw&3Tgh5lgGn#F7F
zr5!KXW9uIc!RNZy;fE(av^^*H*uk?0Lx`&JbQ6QpJn6d#h=2&(YXY-cXo;INzu-Y&
zX=(ANBIf!{pBqY9I7#HU@x}jy!NPA^dNH`&@F%TYZ0YuCSrAA{Pk0=YeoKb^HlNa^
z)t^R2*g@c7Nf%3m<-~)h4Dqs(27-&0$`OEbst>jpHK<R-vzfe*Z}N&z$Cf1tbF|CU
zW@lm<$lHA<v^ej$;dIutldaG03K3<=x^-^sUTs%gK)G<@q<!@2t9I}=Z-%D>o(7M6
z^(!IBtZ>hkrq;IbP&oF%2loCy|7!>{^)9lG5L&n}EUdh9W6%vj#E%EfyrpggT(TrQ
zXRe+<Zx>IVvi8qE503=A61ZVwDc*_|zTc@0jMr8s8dpkdyf{;sG#;k!<8AL0Up}tU
z!lwPOW<S$r;f;dF$J&P;vh|NV5@Hy?@eUmi6xm50Dk*Oqi{S`QB&Lh&zS8c(#a{d5
z&9|)UvOl5K>*Iz)3lYT+O6lg0kI#iVW`Dou1@EU7TesM{M;^8vFMQD!tzHv({B-w6
z_S=8_r|?uk#HnR^F+WFbdh$to=&N70#cS5s+@(vrEg`fVwgbQVwTpdkL>XyYPyF#8
z*~8uj5oGY#`Ru?ac4GH#+xz<KcJA;IpQ_BCtoO1Y;*{n!<(chhOw<1Hj!R)Gz2&Bj
z1w#)vcH3NJdDx$Tdtt{5Va(uW9fSMpAOFmDf4bXFUOE{@ueENR{ldmC*o%IHw%l(t
zc}n}hnFIE_qrbDgr}l>7pZIvvj~+i>+1Ljco}@%zd4Ah-p7ypdPBi;d*)$<Ume=?F
zJPfXv{OL)K!NABeR@~#UjswRzHtip4OWd&G9FZ9X*Vt2Ap0<bAJskN$V8{%2$L!5R
zZ-&S~drrm=1dc5$w%E=!JN*Xt@p$Sy6IycPP9br6P#QllbQR0%KVs;`nEv|S*F)su
zN#?$Sp@X_jVYx|;#W&;`V;V;2ZEmpIx#l6y_d{Wz@D>^ds^jO6+55-ekDG0bO&A=B
zD;vGZk0<W`F<@YP!srkN()lZG<-(Q4r%5pgTy=xS$EQBFw~zcbjJxEK{DgQz**xB{
zmQ;u2nevbxM!{#cJQI1x$ib7m2rCEO;QG$t-`Ytx3SvZt6-Ldh<W+IpS~n_eS+*ra
z35?MQ4e_+R$DPumY<(DG+>m?G4X!6Ied_gh%nqG790n_dKe9}_EN4majuv*jvX2&?
z--wT-sY48D7?qa0DEai}r|qfDPsWsgh))O=pS!{J)jh99T~ddsO{98U=KLPp@R)61
zwZp}>P2t&s!8)Zwd&m|)BFdZ@dTzSNg2C>W`+w=}`z^cfVp<sR+_*#?J-*>_dvyI{
z_RxAiAGp!3*F7kh#*6ev<3TFa7yW~LHn=hK^Q)h`;qkP+f9ySb@7TM(FWVpGcLt^G
zQY(TE0^F0EpYn2Tk0*8Uj-$PO;zsRXANW<&Cr|5Am$Z?zM4$G2tn$8rp&w%%;>Z#g
zMd`ETpOuC5G2BsZ`XFQDzEk_6y&&Z9#54;Au4W$}A8|w0a}PWhb+yR*3<4m>XZ4vl
z$06At-=B$u{&chNrXTY3JY(VBlY7G$hH(~y8gUR8>|1;q7{u~N);${IDaQHmaw-Td
zl#xD&&~)hYLvD0C5(dX}E-++FA?XxHVVt$R*qzI0$;U!Pq&8NvxA>L#S;!AhTVseu
zjCyeOgCYJd@iB#|Z{U{<w}hvi+;k%}20_~2Pd@rduttO>|HMn*#yfMHpGUU1ArnD_
z@9@FUO&I=>N7BbAN`E=)ZQ}TaW8uMh*4tk?*TncoXpf)#agN;KTa53Bj0lOG?^2n_
zXL0dk_!ncCx50WZ4~E{)u4HZdm-~Jh^}#umdM1sulJ8YMo^D&Y-P3x&<~Pp|V=Vc>
z;O)`ILj2SzZICwp!SN5mkc#n`eW~A5)+QHJFcj0acdp$TLJMsKj|*7GbX0{lay7LP
zZzsoGC_n2We0t{?c`Rosr_GENSJo1l>a8};Guu}?2d~c!{&2!#7xm}VXWl)uB*IeG
z5P`jYAUY*OIzj;b^rl}|G7gM)?x7CJ5B&*mA4Fs7{0p0&u%$lMVra+92SfGl6T9v0
z!*7QulIosg`3&=ycfM?2-1fy7JMb>U7>@9o!jfs)Gif0Xao+mW#m{{%j-U4HF3vxU
z@w9*H4v~Oy2+<H04|)9|CL$&vFtBDV`&QdVu%WLbsv-&<a&ZzdvBw4B5Ep1izA(Nn
z@IJ^j*h79z@P&;}ga;mdCTnk*c#^Ts+9*u%z<~wv@At8Z^Ce?niaYU)M8^12<T1+r
zvD!vzSMa0X(63>GXu(4VwC|<<z(;s$`n_iKcdr{bB6Q*zn0OrXhXs!LIi7e@y!os!
z4)Z$?qaaSu9=5F59M@2I&M>~m6uMsLv_bkh`KC?ocG2RO2Ywm#${0(#Kqy7%qF#4x
zea4>J^pveyl!^I_MIq>TxahZ$5s%6Lb3a7;p-wM#T#9SHpYHi-T$j*yNfGXhCG?Fy
z{K6mFzkTN4h9HT+{Ik73v!Cz(g<qfV3t<dCh(+Ic^c!*gFxS0(5zl^i<acq+u=n)d
zXp>x<j>aX=NS{SO1VrFo5t!9N3!(-m7{m`mmW4hg3nEB5@sw6V{sfEofpNamr;xeo
zQ^;VXM<8Lzhf5bSc(5So%%ATUpPp9!0;)6};>M$exUczp9Y`nTH6M5Bar#~!0qR6T
zOL?p+6^eFyUkWYWRxnEQg7Ix%e9j*J%9nf!r}beJzjEe`3oW182mkUz+xN>~7KN6_
z|KJ<8!#xUC`E*dc)8(R18+OD!4&MFlcf+FquY)QVNoa@uhXEtT3@%LZnCSEOZLD|i
zg{{x*av@-gE#I`k&K^BxXOA4Uqwl_Jr#{{ndBQ7!<0BrABNpPj(-Ie=S)h@j^WP>A
zUm?9X!sE`zlFv#NdCmWDhtGu^1fSsK;nVSHU;18*3J#tQ`_sM<TAXdK3)k@7{Pfed
z#XXK5aN%st&YdBYAaD_mC-*$TLYhS7Hz9r_!oCn%TvYG4aM2F`?sxXlPhYjG=gztK
zl1<&gm>y}C)B$ZjpBIlX-oH}c=+n%38^Kd$#|tmmBKMTyy(}j_+-+}t=R0=J>l~p4
zL5Athws^krR#~!sT?lYYSBKED_h&!z_4UYOLsPRo{ioly$G-86n34>Ss?+=S+p+iG
zx7|PaX-ps1=noU%Ep;cMrO348LzJhaj3q(t%2#gWxPjuv6V^Ld?+ha;Pr_lO!~np}
z=uh34^`09%`HaC2&y8Jf*!}WDUydi?8(k3L$-3Vje#;GjyIqv|i038a#-A1%{b`I6
zaBaYiJjStYp5`BY!9^`MWZ@+ePYJq_<-OzY*+0Df5AId7FP`=!KAzB~S&bPtO>UUB
zmtP94hBnROUnkasvcsq^ZrE8cN<6am5qoai^I^1xDTX(U85l+X<%55*J*W0~6U~e_
z+(;u#Aj&+s=}CKT>+@j*<Yt!K<?}N#_k}Ud!;uCHPfTKv{>Qif$$sr#M?A%hXoj~>
z+@u%URJutfZa$NLj0G5eA9VvTdEu#Xj24I8sD`obXZwC;zxm`hQHRtyoHKenU0fXo
zeGEJpHPX7$g_ilP^TR;J^c_r5bfxo({nkYm45B=(d#&ec<N*UWF^*Y(UcM-UhrxJ&
zaRP2Td^(I*9&!&e&p#p$W#I{Tp29>_z<A6P-P8~5ia1y?{$tR<_=*t*qZDz5G281V
z1Qbsbqc3q%PsEE?(*Dy2+(7?^9djcr1{&f{#)m|O`WST?tsVMraa^G}j(Xe}Kaw}v
zMG{)JF5enP-<Nj0XkXm^yyuGtMykRXc+!n^fBT#N&y6cT4?zLLPF9au!aOC<)2vT!
ze9|6V^Pn5H+oJ7KzS69592`kMgcdi*{NVk6_Imwk2moE)_7K)E{9{~v*4x&TZbU-d
zx_Pt5g?E37+S?7smoEGgrScCLjEIA=h9_!&@$oP07cR2WHmNHX;z;d=<M@^O!w8De
zi~9Y8$Ns>@hDXBVj<EzICmwe{c<%@H@#&AlFv$2u+h$4iLA+0RzkI?CtJ}S;Y<6Le
zIqWgAVd#VnX<}H%*iF9Zb383h8~o_xM-e}Ud-kPwS<%<N_Q==5GXXC^_Tf#D)#oiH
z=~Kpobh7?VJ`r@@buk;`?wbdH<DPPtB5#DjAj?zHOab$xxAPaa`T5VkGwl)k9azQ~
zR{oFneb}Gormw$y^j#mT_D4A=W2)!G7(S67KKPQ_{B6H^GWjt#h0Bh*^*72$o_xWu
z&(ufMSu*ghU9>Jl7WyX(`3RwczU<8TKIr_u`_4bv-@Nhvx)HrQ@*Z{L@#2ZZn1mq9
z_=bpsC!808%tsv(M{19h7vnHSW1eu|{pp8(uK73&$M9erVWG|#vv+OYWsm!V20Q&c
z6;lGa*b?<vC{qN^{UKRcagWqg*BpKG$hpJbu745j5dm$EPu)lUC<EzH-*}CZe?%P~
zKERk9V==yLF!7bR8O!K<hzmUE@Y<f&!h-?Pfjp;v&zP~)+a6vPPq-l+&sBulxjuDK
zEW}r=m#kf5fJVfA%O4ov{L5HOoaHRHwU3~1>&WQMRsJ_V-1<EoM;()Q1aS=LFF*KF
zJe;t?g$AaAW2}pQ<aI>83CosthiHMA%5@9<1y4;pOJU0jKi07p;=mC7#D*uL&N+81
zc6Kayv*5w;>jS^GAMgHgoF`~QEX0}e^5@V0x&6sge;Rqk%P>n5!PxnEI3DgxeT}gQ
zj}W|vxQ3!0X@i_^Xs7g74D1`+)8s+FPJ8x&XMLP~{+1Bw>631nrV$pYuP{DQcE;4V
zkG^9E{Cvjs$#oyY(|MS-vBJHCc=+O(t<T!aJ6;aqJ*EMUf@Ni+&DDE5h9P54ueben
z_trRb<@2}(`OwGM3)e1szB`NM8;>(llkZqcs$+PRwvh!a9OL2PMp>DD3lAdNG~)=~
zd<X}`O`l3*Ln<#~#%Uk3kGU7y+ehAx_6=kDdoe$S#IkGDI$P!fKI6*sK9;BJ52o_t
zUWTR=><f)K&SQQJ^p-z}@PqfhAJ>BPC4>^D)1=<%%bXLw?AO}}9z3)|9n)_?LO)1I
zU`RQ+K0)B3ZNKr!8}^;wekbadb0&R({>kT-hF1HNU4LT#{TKftuE!8se&lWS)sJ3{
za}*ELEMKrZ9{Bmzli%{|(C56L&5dhF^2|e1oX;+JTTj<IiM_TWAOa#VodhJb_z58i
zEr=k9AqX33+OBlc<st(Ql;i=y$&GyoB3=IS;*O4vSgFi(!N7h#b1ZSQrYXY^PVh!R
zoMB5C;>Od3W7tpJh&Xu6Ams2{=}C1u8_)fkHuWN*<=)mF?e@METDVa6e)+)j&)P#@
zdN~Z;cpqH;>`Zv5eDK2`g;xQd5qPvb_KmOGcK1+N<?qSBqXnUbcW%7<z3;m)bHYl6
z7A{WXLe>9ZXz%U5?%p%c*h63WvaNk^N4)dpvri7%sr~yS9pWNC82;mm+1oP_#MnSY
zV=XC+hfmW+aZ#RK9TNu!x}n6!A1=ltPX8q?o=IbrC8bH9ia0$!K1aS%_(GoIDM}YI
zh(9e6HhoFkI6S+#3owP>QyP5DM3F3?>GxD7%E12YrL=`|cpmT~p^gwYR_)kk>)fkl
z`R0u_;3A9Pby$6~&qwd$KjH*@5hAFM>mAoZprNec(b9g^#fyD*=<Ro`yWQt3_xGyA
zQYdGX-9Ljfv4At5iGwZqXmIhJcdl%B%%9Ts_OjTg4(q$w>q5(i_SWD3gPku3Ewu<O
zEptL>dH9vD+LCpCQ|6vdOk0Q0^6~3Gw~v4RizshBLd&0g%O3su*IW!-?Z>s)>HP;>
zX!*eQxaSq&%Mn^4--Z0A&y<n;r<L#ba*3OKr<#P82Uk7l9u|+;mv(+Bgcdv|FevcU
z+y}?rk0(UAIb=GTr`_=IrJV>Z{=~H3*zI?r1%t+i?)kFk)4g$Xh5-nJMZUbLd|AEu
zjk4dYW8h^j>Brm{{<M3B;B7%K;T(>j^5O9h?Vo=4-IxZ6X*Gy1FSJC<$aPcN1~;9g
zo&C@H8sSJc88`UMGFtv(N6DXPe;#3GZ1DFWc)7o_^DFMz^I~{RFkO*H=0>kRyXaHV
z@NSzQe)vOw0)0>92LlU+0jAZ0Ax3{hC5#00<yidqEVs83j~iX@e}v1XhVmDedOh$y
z9E>MF+5MAv0<p($(&350*?X(UQz*BWlNYUGT)=3|Q<xYezWVT2<K`Zb3nm0-iZ7<*
zdHv(p?Kf`B4MD=w3ZA~7m&%d{ggBm@=jq}QSlpP2;098RFgyv%yJ>ju!8=FZvDfy#
z8UhGU;ZqOE;4v2as8dHk88DJy1i<i!2M}!pj{^)37#5h{evd!pkLMLn>=uO<7jp>1
zilG}(0b>m9i>G{f=g_t5ex2FV;c?)BFdU*625P(xs9%gPr~QfD4~}_%FL-Np+~|ln
zb4{*TS*cC&17(YGic;k2HR{uOBSJ?x3l*N~s~B0y2jb4+!CNPr7apS+<=W)+K${{C
z>WXslr19Uq_21&j|HL6XK1&dx1%o`^X&8=pD))*T^Wnjht%T(`k22yVeO5dQct^oo
zJ`KscZpef`1|nF|ULNv#!dq**8w8o&rq|ix(Gus7k}{>Qypx0J^e~8Hl>FrLPr{21
zvG2`K-V9NUYs|>M=PNDnqs;Ux#6702Ltw$%Zo|^`(H6-QM(U4T0RGnx|1}K8JpE7I
zB%><^|5i6nVZh}{^cS|j7)EK{@xnA|cp>23hqwU~3~U(2Ld@{b7`o|?Ou6zae}b8*
zi>L$QN9aMwdcrv~Ml5Vu6m>+~4W_gLI*u<GHYpSDD#1I2zD7Hwj7;hG=D|0kuk%zg
z9tRlo7$-40Kj<PPA{Tv=Dbp}iGq#_1qa21+;$g*80*?T^LU_LdMsz&bPP>-{(+n~F
zP~ytAm^X6a?CFo>C##06E(qrVV;sracl_d^vVIVS^FLW!h^Gi+*;qoK#Au9n9Z&10
zen%h0(AU+|73IKK$U9OPzj(5oaql1B{>SiWL;Qm$*<f5kki~dNTO`l4P2SnTR7V)E
zF{mQsF!nL^62^Sm2u5S3I)n$_h&(wSUU-GFZS}Yrx0$|blTTrY2VpYKW+hG~O=Yzx
z5AXQELxMJn2*7lY#Lt*TU*mURQ3edUcotyXf5yjYydgL?8?SsyFMq!RQ`QZ58EA_L
zcXRys8*Wgi-M#+t&qFYxpE4G}obi@(1`jjvkOUqPc>8cJBOK>droST{Sm05FhXBTZ
z1irW3TM(g=>D9<TeTsHh+8-%B9Exqm<0lXN%fb?(&}KYeAy_gN`kJ*V1YO2Ci#p?c
z^|1G6u1lVD@sqKevXC!4KA29FsouyB`Ay!Hv@QBK{qa4&cG>4%mHW@^&!*P$eT6JU
zn&c=g<Q1l*i}=Sm@*y9`h&Q~BJRp6BzJj5fHi#e}=TF~987Vt){MmDVW`F#|HzOa2
z5=<XO`$N<t9p2wU-O`@n&XiBo1?Ayk6$ECy`Y^0Bz9N`xckd>|K7?!x{)|iXS;lN0
zw7|RbrW@)JFKOH4n=Ncf8_(lC9{0OP--|JUypm`7G}E<iTeU4X?{blnKFaxk{z4l?
zNT9tiZ5iVpd1h>5VVuEp72fZ=N7bj^4(W?5WAzJuCtt<uHIJKrag687TSbf!sV(t4
zc}@EX%DCU;ZGV#sarE^sKJdl3CZTVWZ#>8FF6JG7OihTeNIB_Wj0yB@&S7u+x$IN-
z#$k+S%%=Xb_L|w8c&OxAKi6`N%Cx%lIf_R+q(AV^6W+%{y*7B-OMIM1@I{>bxz{68
z1R`z_Z(eAjeZBnP%Q1!QGOw%i9p|Gy({zqSh@KW4X#2E<%Uu^kjQf>~eTe#{9xZtC
zeEX?y+n+!GXK_wz^!M@cu*+-i)w0(g{NcKhv39e2=KS#&zUgIqrYN+~*Ad@$`$Gc~
zTB5n>j|hms3?eWqg%<d85<s|MVY)D;4#S&;X_*i;5L*yHIH?kri!?+K_G2(kQ-2}I
zFum6mf64Vle>pMXxoF~~$LI9^mnA;^&mw<`Hu12KMw)tzIJhYzUh=}0aO5EgIC!^^
zA1zZSK;1}anL5Xb#6TaquZ0#DH4s{Me#wQFbv}==8?g~u@G98-qaVlAO_|Wr;zsU*
z(6TL?It-!Zn7_m2-R~o`e0oc0@$n%ohyoaP@MgK`LJNlX$G-73Tlerowr0nU5LtN7
z2A&ItfBSZLImGl&T*$Mg#W|0;_&)YV=s51yjetMe>3eP+OMS(Xw)r%&J-7Z6Chhla
zmYZCjvWWa|;T9&M_Rqz`O2Wq1lHUrc#f^&}7cr&0v9FMR>}M}&6!udV*it`C5w>9E
z3R}2xr7d>BB6~+mHkWbB0)M*QUrxVp)#?z)@P6s?as9%{Pwl!3-a`WjFVs(WylV&1
z;--7a3=J0QG36(*;QNStk)`Z6-nmof<NPXbBX}=`M~hDhh8PqcEeI{gGLM!dv}}Ir
zDW8_^tF~y}nu15m$#_r9zMucXSSce@hdur6Z`z~o`NH&bMWN+`4{h(OuSNKBgq8ws
z&;H1ID7K3f;2v&6aPYeA>35G7H+1aqDTE&NDQ6H`FeY&m%graA0!&Bpsn2Q66kqFo
zIt`wh{*oIcm(N?_28hOZBAO=yF=p;L`EeLaFn+|Xb4i(LZL#=<Ps{zL^vf<-@Dw>u
zY+~raa0wfxG{NZby?4JCPX&%9wB(V|j`L}z-b0vZ<7^A<D!!8euabN?!j>+D9EYI+
zqb!C>re0zWY<O`4%?d}1Wf*|)4*Ah;gqFQw^h^c?Zul|ebHl&ZSz=hu3oR*au#vBL
z&x7B*Z*=1+MoC_(jgWzmgK-0c-6x-Y;`N0`adv%`I1m@*C2jV@dW%na^IYMnRq}!n
zC|-o?`NxpPdpBO+H&STHjH~YP;rU={CcH|X^68lnS*R~WB%bcZ&;tjIFc>m0HuDtz
zYkOZSTWCr3l<I`|Fb==8{iT@FWTT4;b6u>$BLMHAJ~u3VTrg-L5{)LbJnx2Fp3=oY
zgwZ74XVC8N1am_whANEC^i9f!0I}T-POJT0E*QfwGT|xnj+X-g3WGY{3@M+fz7p${
z?`T(rwx9i0Xhpu^*!afhg6CE7xO{}U&ry$Fztjby8`B|u*~QLpJn{`U4Ej43ye>%&
zV-E(nGwq+-ci#F=82&NPr0eLE4n|81^Gp+iFhaX!>Ldgjrct8JWYhm;agshOMj#A=
zJpGShm-fTT@ffLymz62F$S3avz(^Oz7s``HLHY~zyWXF&M<Bo`$onEN@-P(~MleJ!
z%7y_x^`V3Xij;?c@L&oxp4P`Wi^mgfk~BD$e)vQ0kG!`AgBo#Rd`5tzEg_tc?^hmq
zC5&H;UmmTul~W-y)5h=;s&kgKAq=q?-59fIkJJlKqW|!NfA#Ih(GJNc{TsvHI)BP~
zo=^YO=1*EvN2Fh50$UH82@pR0nH$0ATTGQl-0)^9q@TMH9Z@l!NOuD)Wn=usBLXq;
z+fRPm%dpA~psn!~|0&)P;B)e0=tIzkFGAFgH9M@$r&OcOzvEB2hhfb{a)k9l%S?!~
z@1D76H;hSsOsc;C6d`HIx7syji30hi&;_zkUQUidoWQV*F%pkDrmuk~2VpRdspFhA
z=N$Sc22t7t?~Pbd;KCF)c!ePJ-7-vC7>f}OF-RjOU<AcDPMjEB5qp{PtKP3c5x(#i
zp$@5g`ZdN=p6KQQ1D<&2IE=pZ1%$(8h4;uXT_Wv1J+K#LjzT*hFErN=cx5hXUF`YT
z6h>Z*^-SsazPD+-E%4SvASWCPZ32<#@r{`W9^Ph%nv6jwE*_8iIeYbd7<U=_=x@A3
zgtpGPnR5QljigLH#W|F`A#%_L5e*r$UwQbI7~|&qds@!9z(TviYlRiDf-w@KHB$;=
zNT;oS;KI|p$KH*1JRCcBEZQ7tvyZao7us|zw3R|jDQ+WZ3QtS(e;%A;QarS2!Xrp5
z@N)6g|8oyKZx8zTz`2I^Yao;&oX}qB3y5%t3J733UDRY+NyIqL2XT({hX&qqk6Y5X
z)$g*ggS@8nIkz)b(C-j;U`5;DI0PWhle8z!F^IrF{^-ZiHpw#!;fRYdjW&(oi%^1x
z2i}nw<!N8!fxd{B4Cfif2gV=Hn>Bv#!dQjJAYQTO{CtDg7TzPYYsR=Qc>BOhGp0!M
zG5C^utPuA#e+Yo~w7|z1cp#kMErGy5o*3Kr_}GdkB5de~ac=i~BOVZzV|eHW0qJx1
zR6(G;>LN`F$2pR=&v}wQ!?h4XA#s1=`TgOCKMs*R3!lcgLYpYC4p;?Ve5Srr{qsGI
zLB+PjZ?qBr$A9*xV|Xb&eVwT@@is%ay4-y^9vb<`y@P1q=e*A#p23)N9qb54`)163
z-^DokJOV4{gzUUfuxI&kl=GWU{INfv@=S<hlm;&$+5y*0pZmF%c0qr@`;l`W?_WW@
z<eK=?3%p?dl%Ip$?&aqd#B8>Bh4Y{e*L7SI@;;XnE`lJWa!o>;^Mc=MSKiL(*Ii!E
zh;Uq&5Dt$`(uO%>3_{B{pZcbYeg9oNynxWcc=j`&W|s0YmN1qvKK`M5k>mYG-$I;1
zXkl#Px*IW*Jf!-dZnTJi2#COR5}36@3((Vv5-}ust8jD2#U3mB(}f5pO;(J>2r>vW
zh$38o;FZ$e-fovJT?$VaeoOHa9xoY07Cc&r55a^f!4OXnTlkLQA8!}J5HCUw@vtAU
z<+8sE0O3W;)Co`r5?ZFtaUwCKzC%At{fmCXXZFpsg*>JH&lFuu9aa=t)~~n4Iicl4
zJX-u68Mg^7;nC9S?`ZLtE924feSg9EiJV7^_ixhS1{cw!&fmq+?;Z^|ulLy2T~FK2
zFTZT7J&u)Ix7wLQKE=!@2O}N46T+j#^TjP7o*caBeTXs3Ka=m$g>)ECn2M^}kE`?X
zCF1Zc?|MP#8Fs;dICz%{27kg6ALS>H{zIf!jN8+RIQ@6VO^p1>(<a5~nTh-$a18Z(
z+&=GgzmI9e7w?$yOJe>LE*5zBTI(;$tgiK^v`Eu0$O%L~>KYqjKb*)Td5yGvjD;WR
zQ-*<FcdPI7Fg|7mW53{!@_0IUlHk3<`%CKmCG&)*eiym$!c=XWo_xX<tXSbMz4x2V
z<HtkT;>Gg_U<1w>4!vGSJ)Ipc++^p8Ec0YDyTm6>(j#8#7B40Q6F9T4=UPYPYw4zq
zE~Y(fJ70du7O!3F<#7X5OdU3g(BjjbA+!)~-m;}3@`Xps&qonj3LY)Ij|GpG<1V!9
z@u|)}KYVA87TOO&KVu|YRv5C-UpZf~@|h*=i!XQa55@uL=7pBW)@2?o+-xJXaFd!B
zTDSpSw|ITr)FLn-v>=9H^uVKqn==f7+?<Ujw4}0Ol;x&#iQlC2zLGC{To@5BreQGh
zo)9;?+`tp(cip3f>1{r5KNsykd9;9so6BGqP2aC6-0+Cu1><tC_AvAY!jUe<1?KeU
zGdJsmBVVKSbK+-VKmTCDaonh5(8VByaVHtIFqB|`I`2332q8bhqs4D*GZ7?A1Cbs=
z2?iZ*{z=~x$P$Wvo3(kKNT<zH9=PDSfH8HU8^{nmF!cP^nPWIP<%Xwt$!-{YNaxnV
zXJVvIc<TK{pMnLWCNDw9_(GY8pC{-MS-5#eXyNI3>X15*{CliCIf;RV=_a0b;|4eN
z82m2y4LJrGZqzYgU^IsVPkAD=yo%7`Mu&1dTJmj#I4}Sq{C)kAue%XwyEDn2h9_^S
zPIzLOC%Jj)Z(eAjjWeF@aKjOGMN6O^FpU}0$;6A?Q(Isx^KlXb6^5IK+}j94I%Q_6
z4m@+d_uluyxRuq=Eyk(+K%P3oV25#or&sG3XPqGn`}=&%LxjRO410{bVW=y#-|^aE
zGN$lO6%0mUVA}L_7*i=1?TWPEe#V8QzjL95cUeSV@N|efwGNC&&$|H=<Ljf|?>pUi
z^SfjIo+a<U82D)C|37>8;c!)TtqXrAb`b$ZEFee`1r-Fu-g}QG(Q}gAd%ydqefOT6
z#F&#<5{)f3Y#>$;6i~qe7DN#gEYa`zjkPCi5R8(T#PjV1`z`xjd#*Xh9COUIo@W&6
z&u8dj*Za<9At%f=0K9>%zLNt`mp=KdD4F*6PQyFNm7LP9^F(cks*xH<M2Rzi{~|+P
zeDkH!NpiI2TU#YeJxUv7XB|yW1k2?CMrS9LAMlMF5G~e<G%|9|-yRZVkZ2Lbz!r#1
zkdg;M!@HMU%@5FDjTn)7K;6#6HxVtl9`X+;enFyNq*GKkN^y26TIA*jn3-RZUS#3b
z)|38kpJ!{c18Q69KOpZMgh*^6{Rg6o8oVw2H#hXB>{rH5e(Llf6R)0ob;ul~&lj;f
z&lGs5#?UJP1EmpjE>Cz-VF(Mq8y8RadlFq??zKzK`edH`?(n<*v#n%X_au40TZ3$H
zG76+$WZBuSzm!d4V@m7N*LG;1*S$4n?)BZe<9Uw21MI8&05Ao(v+r!V>#H+D$H)YD
zP4BHa5twAkK1o?81(~#AB2?ZESPeJ^N;Ysg9nbg=4)}U>kv-(153V6?4s!VU)ZGC{
zt4(6u)|Z#LEaETY*l`}SkK~_gup4y5{%|@{I*~KXJo|c22-{8%Un~1r_XPo8$?Dvh
z=Y?FjEa2;ujA3GVpR18#UzO^uMK_Qk*&i4ZQtcP>Lw`h|C;*>s6cI#^$BiCWkv<|B
zFR;!+#JLX??ICgzDNB~@1e-2>=y~Y}!jG=AtJ31#7oe7xzP-EbbWN!dcZWXcmpO4b
z#~LFuw%n^7$W8Y>TjJ=UkH+rR+C^f~{T~J3W~UGyY&m^ZGfqSafZKW6H9se0e~|i*
z%%%OaUSE6nwcejoo8p1k!HNz^^ivg$@7u%HPXe=UMAq476$l4JBSy_9JEGpx2UWO!
zO&8IUC;>LmxCakoKzJe~5G2+GLeTsmQ|XL8i(Es3*k`Os5qSV_^G*Lmh}e_qu08<d
zXU6837pct<ks>8Tv5;XSyp3;U*854>^Xe5l;(RuHMf92eNufwSP4;yUkccrhp$PKy
znY{d6ly1o>U#xwyc8_SEN8Fklk)9$Yew$RL_R6{^iXM<f`=<@s(sYEr)t;3#XP+d;
zvM13E<8|$=pLb-rxo>UJFX!n8Qp3F{M;(zZWI#vBj-DV_$=zr5-F?7WAr{<Am|p}D
zvZNwDp1-@!(XVcA-m6F!QF2$zxuPPFT^j+xMp(<zy={y`iSx~?ezFhl1=*YTL%`6H
zT9X;SziXeJLA1E96B$hA4=#J4w`k>}BG-^xHRg=7TVYvc`S1G1h=8@v`+acTHKOJE
zB3csBlC@4we+s#FcWnIgxgRx8jcAds_so7oOA#&oNY!V^Zb-CH+2Jviz)%AJLK67@
zHPO=M!G`9r>1@9V?fyFWw*nCYn5nTcdUVoa^lPoS`I6$Rks%^fT02M?smYx0yT6HM
zac-->qFn$qf(0StGk$k%H|)-P`|Y=jbaAe>Tfe$q4!?dI36S5AX!$l0``gc@um5c#
zS|%e}k~-{-co@Hu8ZCbi(emK8Ct9+$5d-^W-5(fRGGff=5GHx}PgKjqSts_U%|E5m
zlqr6WtdT0}m8YL6!UcGbh;Se=wr<+o+maN9TQ_eik|74-E0bmp)P^}A;QN82lj0*O
zvef=SYV5U7Zq!43eHMb^<Jg;z^4gNf4RuY*)@U#5X6M6=HR49G_7Ne8rB2&#M8IbR
zP7x(}NXa;h6xp)1+LmtXGbEdoTiVste%pwNeL{X5H0F?s#y||!1A5LwvWys&{M`8&
z39>mhchma!iZmK|*w~Vx2#(J~ut>l5VQlP&A-~B~AG&LG<0t~hWn)kMtqRj0e_U_!
zoRd?B=i=Vsnd43A=j*9&qPEHF%a%nZ8|z$zPxIWWHg^_Een%vEjfkA}+5D2t50c8r
ze0`WYFQ0{U8hu2(?T(w?n|V%ZU!-_F0_kN$%S|`!B6ZklDRLj8C4|uEqz-E}TBHs8
zN<_=fM9Y&QTJHLKqQyM7{e>fenq|m;>lUzLuBFLrsWN@92P<><4>;OG?G8kX)L|zk
zqNPu?05w`cZ4dPxR6C+&dqm4Nq9qY6MYMcP>af!Nj173_fe`QyPy_%0rKGozp29;a
zZ$l5P9+tmCv}BDr+I@kaJ+P~Zz*7`hfS~YR2ZAV+`)Pr89(n;6fB_($W3ZiiW8jMi
z`UWL6Fo>tCv=2ZLAg{;`pswP%yw(Bvh@n3|{l|K+ZA42Ozw6bjTHetPxSqoUw3o5)
zpqC2b$~jk+TmbpfmOPN+-ar|7<M#`&54>sZYD3F)pA}gEHqQu{!-J80dA`=Tqi=`B
z>-vZQF`{q)x$1fp(GoeE^VWal0RX9L>BJM=n0cJP9Npm2>Aa}{T|`QlKPgRkiUN%Q
z8sC&?A#Y;`s3PDHTjxvXls<|^0K)LR1|T@p2)gH4sl&DtExbHCTPten$mrMj0w>Hr
zkXFi#vbO=HEIG0Oh>eNYyfh)F$2%6_J~m|Frv5k(7eESq+u<-}5w?Dl7Z8OW(givU
z40W)*o44?A-c-6y4?hn;Y>xWV-c4Gaz2g`CtNzArNZlvphyvo)4M2>iG?}=jS~$F&
z{~FN(P~-(Hss(s-Z))>A`r>1~Cs#fhFz1t!KOO3b7X8V+a=)WwTd!>$@A_+uoep;A
zckgch>hsP4I35<Sb1Bcxn|W?|t^+dxQV*r(5OPlRkhvn$ZMq#f)V!u+z%6TBWQ5}0
zmj)mP`fZLp=!o;^5U*>%;;oBst+Y*aNFTL}7yw?IZ=Un>rpzrct_q~b<6qjl1<6|v
zz!oXw0HOo&fp*rV6o3G&6#+%nw30sOrPP=K6av9ZhTC+;Z`%*5(?zt27TQ`m4LFoe
zYjz?Gc{+<6lDY{|B)uQ-+Su4pU@pLu&R!6ma4{)=0O;nEEkgVPjP;q_vX;iD(7H5c
z2sRM{>enF4=@B5W_6G|i2Ysf;h_KNIWdF_2NXY`+GoJ;B#+K4&g}tpYb5M3O{pnii
z=3RU2sCps5{8K|hSmWjwQN^qL;ehDu1$o#H>i%F*&(g1gWb?iLkwqWq6qQkRSz_Y=
znh4(Y(V=F;T;IWlBJdOe7dgUXofkTiP87s_A)BSGRAVc5WwoDJr`Eo4)!4eG>Z^X}
zC%ppf@+v<uWGF)Zyc5o=bqX9+gX7jjE3hMUUBA*;5jSimKv!gvXgYhCXrm`W_DT7K
zyl>+jxb(IkYdw~1I`Wo&Nb4kGOeDk^$DI-H^|NXWbOhi|r`Tn4W)2WpKbn7Stw-b#
z0C;Vpdf3*x0@ebUtru;#dx)&~t`B`X=ojnK+Osa~lZYwfskNKGtKs!=>NhDa|9<RM
zIh<q;jnBbGux(_)PEAN(=owqDj>KK@&=*zUK0=%31tHI7d(F?B8q<dCPxc=~qH)@j
zjFS#J%$Er;7Wx46z9x~X))eOj?HaE=6R7`TJ})BRq15fVGktK4?jEC>+u=*PmCxs%
zIAd|Nu@Tqj(Ld3uBF!3*kF+pmX#x>RYMyY^&}aRy7l~kMd%fr=`duyOVokPi5fNC&
z<v#A@?9<je^2u233+@}t<I1;Ir2ik4Q$RT_^=-HQrPqDgbp6)0adu?U=cU`)wh!5#
z>BH->HNRT$pRqj)idbqk*k}D~YsNli4(NvTX-FZuVov-`+EKb;zUYQ_*d*;}1CHZo
z1iHE@SEum%{3)kY1eALr1?WX(SjWh6cHZYj28v+Yz820bo28cA?^8>~y41e;B5!uy
zoSc%j?G1=7?JrMi%llG4j4s-rkV<S#8zYUP#Wn8!*th*Hb71V+Mra~F?OF6o6jPTn
zZcRm}b@wM7;hpB{(2hgfP4B$d`ur85g&h&m^4Ex#$g+r*ekAFTXeludzYHZXl)%53
z1cpS*Uzre_gTsdJrd6Z~5DLk1$RURm0OfsTh^Q73D2Nn?&pHQS)l!0~yQ1cbb49e+
zfNf4KQl_2TKJUAT7HP?RRuq7~w0`;f@c4ETAip8e^6g~yx1LR3zxPDT*;0pH5THA$
z!$N+%_j*^O<*Bcv4!gWJ^W1ZL<7cEs%fTXATIw(nE&bGCS@Qt?k+BVklyQ?M_r^_`
zoORu$xEgi%xJq#as0Y%292)@We}COuMINvj1?EStqBb_HeY40I1=OXxBJcL9hwXs-
z+L0;?pfB~=$g#0YW5!e@3ZenQgIL)ZTk}DPUC|r}9`7MakUOr4oNyph4h(rb=BOi!
zFwnN~xXylw7}+}!CRJ2BHcN`Ftsicx;_>gj+4WW1)yG76ARinfV=qT%>=S0pOzNu%
z^?pm^vTq2S(MiEbMw>#SY*@QC<kOmpPB}cX8<oEul=0{*c_BwuKl@w=nUF@YwfSK6
zKL#{c@_53w$FAqv`|P)W@2C(S$DACZ<>HHKF39oqr7xuDJ)$Kc@2S-yeON{Kq^++U
zd5>+$MiC-wbP;Dq9(!!>h^bRbE+g`u&l>B7$YAq^0H-0N$ZX<CC-p`hc36>JuZB!R
zw3thM+aYz>Wta5ECZfeyMcIgGsnlVAzQ4v+sl%>M>aZ)i)M5K2YR4E}PTiMhzb189
z(-r%XJ!Z<3DZP^;iz7k=iL$isAzNO1?X}+OTzBQlmAzL({Ba2IzN5ze6`}n7-tds-
z0d0_IkvhzS^B~azq-d$bJg_1YDs@=S1M>0M7SZzfUr8NCCT&arI=Q}G3n(%%=`%bi
zO1Y!3HQ<fM9dPC=L`!sR|2(9&2TR}-08n3!N`wtBSTFDSTlWCmD?sGKfPlyxo}5S|
zX-7OLi)`@lJJ?5(_B$zk761qc&*NG83F%BOopot})6#?ie1Jea0}%~UuJIgHEZ4(+
z1NpSqjwXxt``~x;0toc54|oxMaLv4H>%ko81N6N+Dcz7EYd=^UkID}U3<V5XiQ0B8
zwu+}WuUP;CF!qw9n*v|}a4augwbSEKD_s_n#ryJ#1GDz+6D|7TT;Po~HN4RE3(3M`
z6nJAk%z-*80B_(LS$|uiMgJW@7yah-Nk^nt0p^s~edgo%BaiP=<^-JQ9ZnuROaaUK
zV%{UhIsnB0Z@OhH+khJ<^yirEDu6Uzy~g0006ruQ@cGGCp6vZL1-N+_JI^(0KGWT@
zerd<=Qo&dw(zMZGgbP{JoaWj9{CDN~kIY2htc_Q;dKB7oI8QrGKiWLf2Ve~OOSK0W
zrVH<F%DwN_>}LVYcsr}lv3}FLrQcGA-5*emEJV&|v&olE710us^OO{Qro(qVe^>9(
zWsmlrT(KzN(bnF6{fI}OqgzD~$8NA4t$)?&fiKm5H-!W5_J54Hebzi8IYs82nZnMM
z@+<OIpqWghX#3++f9!ij^W5*&V!LL^J@1)I@>P$+e9>WZjquPuf(1E8FV)4k<;hzL
z9OLD#E&2>3LpoX;=O;ZK8+c+;Yaw}FO;p7lAxov_6S>6;-<l!^AfZ|pfM8%8Kzc9&
zP@Nfif)sJIzSl4PFPY}{)3)!3)hp&)UPOzihJEsRHc(^@@&o|^q$3pji|Dj-LsS6+
z^~0JJh2dP$On_jvi5#RWl+x&eq^sFCptA7;$$^Kzd*XLBPJ}7q&Vk58Je1wbXMJb4
z)l6br?FZ)5HIPtrTMdyl>sJ?fWBp5;<UIOQWJs<zB49qk{@MVfr-c9jAhT%z;Dt#E
z#s=xfHu0n{_6+xFoi2BKWOXuMteMj6%&BWt^k~kfC&q5QX999gclZf`!{+PD+>mS6
z&by|(+^u(W_h9NDAbrT*2kDhLZ0odLyS}HR`p~V5WKM`aA;*EUoCwqxu^&l8CdKA0
zi*BuG44<j{$>{N6v32&AOJ-kEkxBH;Jg~+0FMS~M{#xlI(x-Dw>DEN8%ZydjP1%if
zNV{b>VpHr}$0ceM*p7Ud6K4uUEV4lAwLhgmcBdm-W7DP2n^=87mIDCEKsLX?5hlQR
zpuW^~0Bma%C`%@;ORnTn#AkF<Egh-v*g+8@q8#aUvx8*DIe~s?OWW+B$iZ%%AeHwj
zn=hh8Y9qSK4x0}$d>}FrMJW=)0Z11C<lg0+^!>s(j3DYn8*mmun9{Kg+51XI><>BU
z0$EFC2S~Su?C*3P!Ra$w<7{OAIVFUGKD-pt>A{e}e@eSwN3_%hTqAl@Hmk2E1lINs
z<nrXhrc^4{d0E4vTj+kJvWl$fAfiP-*@EUgqCIId>32Ko8u^7?G%r%G9ec#Fxp%mx
z_H3VJ*B)B_P|c%SPakF;IjH!oYizeuZMAf|M7Nzox5;{HqDS0QAZ|r$In3WpkKR=G
zCG>-B9n?$bX#bEz<%&F`@9rb$2XeHt6MGdoQo517+<wmHxmP2zwqEFmHFkVbp<Xxt
z`iyU0ku|S|H2y<saY+Z}9?w2t+&)KEN2Lh-w4_KoH}%Ob454kUSR?kEJCh2IUXlr-
ziT-n{K?ozmkU_{5ZPPym?(NHE6Cy71@jT_;WpB<@`8mzkHQLyXe^*2c`yfT`J*lJO
z{?UD@$RurPcbo0Yd)gd4j6omFdm~z|JN3HW&(8k$N)cHREiH9eB0EJ}x;H^AvwLUc
zz7=V;Y3qkYwhvQ>4N7<T^H2gq3H&QbU`Vuli`lT@3=%B>C=W0;L_~`^Er^zu;>%{E
zHj9_qE<h>diWi~<;o=<c4@SN?x7|GWyYr+a^Bsxfdfp!%dnSRpaR}f^?kzozIco&j
zp0)bje@egU8~yOz!K+H&|J8MZzJ9NXmH^F2nim5A06+jqL_t)?hiI8~acZ<oPJY^u
zBkN!9B3h&lTl&C*+ag-#Ua3aQc`3S%XgRp|?&>v_I_!zxsnPOs5gTr9f#`>fA791U
zC#LBA(Ru9^pzw=4WCP4ay6nAgbTsctSta5G86sVl8&@~0KTL##x*_XAioCIWdGD3Q
zPZx;qCO1Z+?8sgL^AQR9JbuRU6|FHgh1a_%i-YQ#MF0f6m2zuSMT)Esxv)Gy{)<H@
zAzzBviOv-{9Q{MG96DiqZ_?~py~6{{1OG=LS^E8!s)NoVCy+t<DZ=D~x7H<{)v^?n
zUz+q=NuQX=h!$}&cEW_THMuuE^+=97J|1nk5ieVqv506v#E>1*<Ab+SH1f6AEB)8l
zNfSGgCs8BPs3}_i>NC&wRwQMcNFZ}S9>$OcYwh^mW2^pEL`xAZvopsbS~AB)w7h_5
zxhqjbh?YdOWX>AVQvK>$s;_=GX0n$O?1*X8ss_uE$4=`Vn0}E<&3nMFP48`})MFxu
z5LWa+q>;YA`b>zHU;idV%gVZ5BU+}PetPfZ%e&NJqY}|#T$MWP-urr@TC6dVMW_Gx
z`re!?FE27}Tk5c+n|tQ2M6@h_u_A@`$~C_X=+(yHv-;g4F6Yjj+q>kFOM1r~b4>5>
zT#L<NE0=}5TDo*;@2L<s&p-cs*-&;=JFTq&k^EhMtOw`lJdam>m6v-?(q*Xo(1?~-
zM6|>cPAV6mg9l+A=+X^I_f@IGLWW3d1N@L;=h;`|0UVtwV5<M%-u71yf`B6V;d$|0
zIrqu}Hmm`uOQf9GE8y2fAz*+W9+0Ks`LBgihrJL$W?f_uq9p(zuvXe159^Z-o0tdT
z83B|6z6TftzE(s_K%dSVx>pa`3I<E(z{C0Fj7KyGa2}YVZ)DC6bVofo*YB;4gH%Ct
zCe7)cJtOIN0uTa0o=J)dUfdr1ub6##rF8NDE_H<m_uUaKuCe`E{Ywsfs|Wbpn*ejA
ziBe+(C<&O7>Vn5LFb1FjOt6R5gMHfAL3c8oX#raRXHpW)N?TIU+!f%7XYBYf6XH2P
zt33Io+>ldTs+A&I0;B<g?1w-Y5B|W0qv8$kyR;a9U7!F`Y<fUnpxBl1%HC6=#kJ@o
z&vZHCr9%=q0JH|S@Wj;**D?3{hiJL$*}F5B%R8baZCGCpvM{$^wJxKZ+9#_v-}Q6G
zXD#s72e$EY2e7zCN3<+1qJ_=?n*bZF4L{R4edY-*4Hc3IDBaZ;5qZ<?A6ue-QhvM^
z@JP|%rP1rx0!*vJFz5l?=1`v-hytAAnM@Zh4FE)U>0p<x=`SUdlaHBGz&RcH)uU>(
z?4D?ml7}8E@(bJp9zK=&5ldcOQXtvB`*peB5rx)(C=`8ZeJcU)=F4{7?)C>?jm67c
ze3t!Ds;;sDN%sZx;W<w?fy;<9Aaa3Z{cBd6(dTx5M(5O1kXCM9$T87nicU+3#6Fk{
zb_7UEM~Y~Pjstb2PC|kJ1Gi-TAditPrzQfy{3B!K!B=Mqp$WA8H1`t7IzSm<N@}m`
zl17hbw}=rnZd$5E5de*7G3F+-?Q*FtwDh8EPxMclz-~kf{iaXsl{9K18>IMRtJq=f
z_?*686T(8uH1&&+L!w_6glORv@49R;LWVrp0cm#HTv|WUA>H=mZ5?r&HHL^KztMdl
z)4HC1<Z%`CHY(|pDmo!H*gA3`s*P1d3y^YU(toXpx4b<=y%0ybQ9yjUUE`E8^54(=
z_lo$S8zMaJ3^C1a0&J~?12ey_(dmDH#K_*5?HBCP;L+sTfKTK04>4n%w!e4p>JK|3
z9oarTDKi23QVbpyQcmQEwNUmw_OQvYK5HO3y7u~g$o#hVoWq#~ka<bS_|rqQ%*>jl
zvm6p0jmNm6)aDUkhVW6?{+NE9B6U4Pl_I~8E22(-x=0LT7paFdEtzBt+ty|NRyt;l
z5pvg|eEqWY^x_jQj;_q9y<J*CWEL_Gp@e8KERk8(q<R;=YeieyE&C8#@_2|a^3i{L
zEcxn7T`t!~ZX#Gj8UXo4*&$h)&R7c$Yfj_>M*&e8_WtKn-_1C+XWl1dEnGYQx>^f<
zW_P79BSSLL2PA<pa)xkSdo&%tAU0A2k%*a&Xt}4x&_%T5IUsZIFkj?~C>33Wustm=
zN0**#ZS65La&KWjpyz5HDMW4${U$_9HMY|iV>FNYKRN5xyxX6pv$ChUp8|Nh_p+|Y
zjSlIn!<>ADbnH)ivyY;x*gsKs?%B-2TXBqdxKFe!Nt7Rg^niU1tTm!f<gu&6^{t{$
za?hm?=61&TnMDw>Cq8D~{`#?B_lcIxwJ#YXEwbK33?XT!B+Vmd8vB`b!hyiPTM;c;
ztL|rNzfWY3^l-=xdmV=a_dBA5oGXngoA%Swep+NHyD0Ub6rWEes<2Ya<r%~MqH(b)
z_UXfuN>9qJD^lN#Ox!!&n>C5xe<O8i)WxBu$e$Zezp)%#IPX}ek3{#}cRiiRG&PZs
ziu+_gsJY1)$&h^5Pd~S#?Ey7!Ie##s1)(mDq4Z@^!n((^-q<aNcC`O_2rgqEoTIkZ
z+jSe!^5fI4@BRE7slyH~xr?+!;yjlKy)Jdwk>z015t(Pz8l?+rv{cQm)kz%|=L`4G
z?HX+?!|y{03?(p>z&}d@L!xCDlDDyV?R?mH)LUt3!$h`t57^ymv;e*R?(jDr!HsC~
zAT7lhQpWd>;^pl<=~<)<lUl3+TJ8GUaGvv}YVot1cJB|5J(ED)I7IG4qGiwIw<n*X
zuismu1tFK00`(_fdP(o_sZ&xXBk8;X^1r?6mENLTZ|yDl^Zi98AXZMf=IW}sade8<
z`yC+u(&JC`9>4jwy>+WrRkTMVS`H1daP)~M7FnV2Jz}Mz7y^U?*n#>#bW@x)FA_pp
ztiJ>#@F3uYX!!)uvSMZL^^gK97e3j0?~OMqomcL3bN*))xpKtxV|$YlH8J_**%f_)
zB(V+v`v@E_^-A`Qee-~Ud=PQ+*2}N-RxVmpq=?!oQhrsct@Lm7ktxKSzDu=rWNMqJ
zk>YU8N<&tJklHGUo&fcE8JoW$aS$!5pL?#p>x;39{urMWUdNwxR>`GEjf}BH(G1!>
z?HD`oK?F?~ZPT?Q6`FcD>e}f4iYFFrv+3I`XW7udooLY~4;AF1?u!&*GtWJzHzTi6
z;|>b}bV%l|KYt-skVj-8nuYv5@F82YV~%$wTF6d{x8u${qc``;D?<Vu(;Jn2r5oR}
z6uG~r_w3#G)c6rCXWsDB=-^c$)+W{G);zzYh`y)pxTEBb(Agc)BANx^e8w4P^sc}D
z`ig9^huN3dOXSKkA%Y%z?6KbB#f$%GL<_G&X-;@W^H2mJN@;|Q;L+{D3vla!Qau7G
zNSw!O6;L28&m%7_2r#v{JWi1*jc73+P0x4^0x4Y!sBnIO3n@VWhyd0X@{-EsM*&xk
z3*Z0*1g!sVfM}s_03P5TU>rCk!a|Ccx#@pFCs}hm{N+V=O%I_S<VD|j==QL>A%HAE
z@R8)CmI}f{xpRT99xgr10>Ns3OP@W&^Blh<(FDjQDLihCXR?QL5AOg{p0EG}dDFiM
z(UQ*Y_Gk~zQcEDe5MjJXpMU-NY8!x#6xrc1xYl3?!g$yh#W89`Y-r?ib<&}TVlci(
z;-Sxb5&(V5lv4{Ro;`VXdHsKdXvw(Az(0qFe{)y>D4Q67%Ul3Uq+LM<U6C3C0F^z7
zXaP`&q(QDoNpeA=3`8U-MviO)V2_EsrPu+i89#zZswt_@DxxL*8YEiuyWN3qvsd2C
z;qyQn@&=j#>*q(G0Ho)|>-C5f|CN3OxWZevQilP0Ba?iz<l^U6Ger^Unep_cyMQnx
zTtx_EZd`|)fGn<sII^}H_#q;|UN#79G`Z_5Juf0O?E^A-CQAvz8&>3j{_>vZwT$fh
zb^7~tL<>L?p`;)=aBImMOR8}1oA0g78cAeg#;_$d4S2LGxGZ{Ua|$yXkM%$g2zIcK
z`njvanA&-Chit4_sqBD@mAWpG7{J9RQlsUummV*B1DrEAWZVFAAe=VHK*}&OnUeYj
z?`(XlqFKn4?u(d1Od-9NWZg&&Lsu#SD>`B=@XkiINIS$Xu}O0He-zsR$h|uR9$->h
z9O;(Kk5q5~LcsO)iD<Fr5kjpl6Ptx}0qpUVud&mYosJq$^ul=Tc~dhUz~Lo{;$j~i
zKsR>AnDhtOZ0ru7A;X`Z@zWyrtY<(YkQPyDof{)klq}T2nUpotqQj(w6V(EIMs7VG
znbU9jqffleE9x*H?Fq-sswke%fuKp#sUPem`DvdH(g)<2)JOWQJ`>UhXlM<Yf9J55
z<o%0t{zvcR5JZ4wHdq7>?{!7TwKq1Y<#>jZ<Hm}ZN$pM%gn7BIvH%Bxz@}eh(|#UY
zOWtpKXzcbq`&7yB^q-dOrBP&SDs@=qQR*<CvFDT>+{HY${f*5vr##`=LPU$`I0eMb
z9pD~t{#^D;fUz}X4#|++5E&o>M$zpR>k?g?^VH!IQHoGBzt+P(kujaM4&RE-1AN(9
z5f17iwR<Q4vi)w>#93J*mlSC~Igvy!t$VS^3VkW!+C5*+yD;GMd5L_n$GaXK04}pv
zh+rxG0I~Kqw&1|nN;XZT0R0zjXDyr=!g5OX$Z?4ZsT6aOiN0>$vbp9(|D?<k6}C0@
zh26XP@tZ5fm$_~=)U2`V0~oVY95cw*9)S21ElSVWdF`1Sq$4tfom6)UStYtk{Vvfv
z?iUa|NEU0zeSia@H!*wT<=Inwmfq1{QC{@TTw0H639*ykmT1v8eJ?vk0{Pv%9}t~#
z-*D2z6N?xkyTdbPsST}1^Rpz8c<y7GGhjZx+hdy^W`8n1Bp2H%qDBAd=Ia?3l1f@K
z_dinnNiEf6Xx;c(U%&XmJy2Hzsk><RH+n1jhO7|OY!+L<uKwzgmO3oy<+6tq(US3s
z24qjz%#(A^W`CzI^ppPB4;wjQ?IY*Nn%*EaL^83jQrJ4rbBghukact888?=bkUom$
zW9NNFq!fLk+r;K`+CDV*nCzNVtQ<o00pVz`WItCYD#re#{hyunvx?Al|HwhYIh<##
zIWp#m;hw?0BI1yp6H&@>h>iR#^Ultgzc%k|pfy!BT4IxtVTX)3G-G_K#^^doLT$Fl
zHa3(ag1;Gu!`iDk&*01#9YY+c$92OQH}w9W^Zr+epo1%V@s337+?{&`DZ$(uxTiWU
z(Lf?H>Gh}(jrxwDe=MXY{a{<#o?90jei=$&D1o5_{>c&;5-q!!0GrX^%PBSo8xK-M
z+AxF*@CoSwm})gzkSuOUYBR~_j!Zfo9>F4EwCOxAWQ$Z_+7r!!XhGbx!v?1fH)Wz^
zoG%gv*)k+rnw0kJJK2)!kZ9R6`R&Q4=<@fKXi<X%;6Ei^{?k*C{LrJvR|>8VlE!QO
z>u>ayJ@im-)uMQ$1~dnB1KHJAIehBWieixB3fTR8A`jk66iippJsy6s9S0{}ln4$f
zrj865f;jn0!0v+*x#4d>`~43ZnR+Rk68Vu@4eQ^n`YGgzSYT5`xQOgfFGf_$#&_PW
zzcm{qI%9f3`bo1+>P<+L2r@^6iTWlYRYZSCw}ljv@=EHigJX|Gntb@~`qZC!sDSzp
z-g>L*qKJf0M@7N-iD^@Tb`dh4rkyRRjj|;v!an&pMJe+;kdW-8C_DJDvGx6+(PN4X
zdM7E>mfrtBZ`s2OY7EGfqauTeA*hfP(rJ+g;^xEjQ~EJ&?RQW>lBC;0KuAqSHk%Vo
zsJ;o?dr<y1CM1yeq$wl!r~h>Kwg`>_E+fAf*q|GEFf*RC)uj&GvxpXb_c|oR*<m3#
zPQ2i}-pupQt@#14h`!nQ&ibTbOEKpDyzA$Iqw-LZXe0ghev4>XkcgJIR*7f{DHDQY
z-w;V6hi060cBPa%XzZcA4R5?rWXp=j7xq?!$TQ#jq@K#Gi!biYJm>7*=%YfAL?5Lf
zdu{0pz2yrYsk$!SBgg7<#_#ai(xvF0{pOt1jJff~8++3dmD7k8HuKqMQ?T>tr+W()
zEa)v<xbPoNwCF2OZ4XmGKHwJ*cR&lzU=ON~EPtfD!Fio}_!Q0HAz8hT!$u~$G!KmG
z1}MtDEDypw54GoEvVmOo2>-nA!4_~*p1z4f@vwMrKn0$vfRTxbPPrjogfjw=YGZdq
zi-&WO1^@(wWO;O-7{CJHv@xEK-S@6`3Q$eXd#NTAjy@!*D)igKFz;BN%RGGLUzeUk
zKS6aKTzT{JEC+z_G)AUah#r7%e(dJ*mX+?~grpm~bk=17nocNE>Te-hnyknMNTMc+
z2jelRzoAe#?_a<_;6aU?y#aO^W1$JP4C5u8ej9HU+6{mwA2@j;lmIQa$Aew!5AOro
z5d}c3zecoJTg~fOdt_yPc?2Uqc+M*3{B4Mq27Vg1G3&SVMsx&FMBsojI;NHg5b(N?
zalkj%28>E^w>zSx^e}R1htIUb-^#Y<EroZD<E*5!>QerkTA;A#1~QjA3<%~k+K82@
z)HT2xp8@y-UXB=bM1jjv+W;g1FJ(`nTL?M;B47hx0!*d*=Ci9m!xNSqnkRo(tA<Pf
zB2qsAq1YEB%kM*&A^wh)9w<Z$@8WMlv`E*7I7GAn`1DU|BSm-{APSHKP6CD3##>pM
zGvFqI2srHZO%LspnK1*Vc)qhWS7bb*2!LGx<@*y&#f|{HT;F=67lQ{KlrqS=KQ-R>
zqFHzrKN*rn(Qq~hh<8<@p^Oc1y1horcA^Df*sbx@I!VeXbxVNF=N^A<fb|11yI&UA
z37kSYldE~rf1o#TTuLUN1F8Yw%n9O{cQ>*MkWYUHb*_!Y+94N!wjqiv0$!@mA_7HZ
z31Si8%MJn!^|#5=nm;Wv00y$%WC{dEvaoB9rQQy((0Z{Z0ss1sEYgmnEB5_qMQ|W<
zmZwe%vcq^LjGa&^66Z}jr8hY?j#Yf`gLf-cA@67XW(R@hY-Fp21f2IbDc96}c`2j;
z5SGreyA2@y`Pn}&p!D_iuh$-e1Ou$99b^8y*aLdTn_VO#!pk}VJc`8G)j>Ynj>vUT
ze`@XK9rkhmkON3tVDa^dOkz)flt5kf<hc+EY^-?#ir1PSupR9h?RvmwHj(py$O`fX
zLbK5bEa{2tGi-2|Rw>RQ*=NnG^kqnO(KVt;6fH*%u3WpafL~EX`iHo)4@>RWt;>$o
z5;=s-L3#tV%`e?!o20|LICh>bWuFkcq86pMWb>tNvcFxH{sN%)%DQO4u(7M}rQkZE
z<<XZOsdc(8(WN7@=FE>sFLOm_k<{$e*br9!W=~zed3}*-YGIfQ@}dJ$#vL58#eu~B
zQ_>sR*U5$~>C}ZMU04w<q69|->}G$+P>mFJi2Pi~9_YU0?2tU7gVdKnB&d}p3TNTU
zMcekRM$U<X;XEQ;+6kE_X`+rj>e!O2wX8NAeN@BB{mxlQQMfhs&>HzB5iQ0_e#l4q
zgN!nF`s;n^_m0jQ6N$?i0ug~^6q%`}jP<bQoi#-~`ApNZrgv$+<k^02f2U7Ka`R0l
zqI=8@(wq}WMK;7vuFL%od9#B)tG{&IKmA17StD$-{?_`<?`*GjPs*H#IvqFq@DTCu
zmd*Y{YW^&at!mL1Rw7-}hWia<FGrqZ4x3au`oZQ8YENfd%{e?~Tp~`~@37nUW9d4r
zLpF^qM!MQZM47PfH=KS$rNp!B?Hh;|_L9>DS-7@weHuGr?-Xe&qUFj&w2T_jMOzgy
zoqJu5EZRU=uxD(mHDT-sK{j&D`ZXmpZP^Fx2X-j;qZ>0GdWqnBEB6hEI(=wzHCN`6
z9c(pPM6`@ciaj>c{Q`SueC}J3ry`o9_Cx?_+d6kG=V+HrD`G6?(=m0H+>2Z_@2W)9
zj;KBK(b(e0+2ZUc>^s@IUt@FFhl64#I47}v_DcH*8%r<z8y-Um3?(p>z`ue7hD6K2
zEZ6{QV-DyB6mDL;47ZKf3pnq6KRXvF)zX3au8qM1(E_LfQq{)SCs~j!qF<yRYY{TY
z7#_toZteJtemGB^mi8JFErSyN_CGh-4vCg;FS|Ycgf;YiB3gDBryIAdb>RJ><Hr{`
zjsQ{j<D-r721?||`|H*fP>#fqqD$QpWC@V`;}8}bQ~Vqe;xmYw+zsZ5BRP!Onh>lb
z#*EJSV=Cg~lcWceN^Hu!xxI16q#n%CqG#6h5EIg2Nf!ku7b)TpC9>ti&DFM4TvBu)
zAVf_7@{u9ZVI47bYFz`tqE^h>mtG1nv#dxL=d0%-a^l$e^NUp4|DfcO_8{=s!iow}
z>qEU6ppmxK6cODs^^|!v2EadJMv5(@$6wr3vt?;}^byIGp4uW4W*py}aKg;UD&S}A
z+H?2Zo3vN=6%pZc<Byx(6A7fojZ|t1*{c`xQX(jPHwPj*CZzt2nkS#<?{BSI)mxPm
zVThf54>&L>$GW2SuDvxR%;MD0dHi>`l$^Xr7fb&AIry81mdxR{BkfI|JEu4O)YC$W
zoRo3Qtn_2ZtToRqsd@YO!=xVzAvZev0s`!qL>M8d><tA7C*tLm5E+l$Bz4%T$gcDB
z`(YkJ$<Vn6jEskQh?kF}dz&|GNUFG`P6?Ukpj(yNE;V=#I9OV^eL^^GC_)V>_)!RZ
z`$|Rs4D_8$^uhkqh?W~}xM2sPg}p?yNE?P|S(Hd2?|HGQJMVql<nnj=p*=hS1NB#$
z7zD(GfQCR_58Hqo4}TsO{p?|TWWW^<xBwgVJv@+mAoHN?0npEmdN9xV4iAJL1c5OV
z4?8*!xK|Y*?*U(-W`#)AX8{6AmG#qj7t4vgE22ev9{xRy@^}P50kw|GL-DxLhZPV9
zi~$t#Fm#@41D!<6NDm`5+HpyRW85NXc+o$&?12JDX<$8C=5PA#;U7pLxBXekWv#Bo
zs&%Uh00pXg_@6s<ZlxUYa62>UG^EVg6QV_b^wr!qFL5M`mj{0`<#`DdkP2x{9=w4F
z#@3>5*chMZtqX7h90TzHu}BJOl^#nJ&!c&Hze_tbyJV<Mso!=;9hNyNeJm(}^tT`S
z+g(fjkWuBCiD>yY5iO;M>9hMVhy3k87$7;)z_NN{{Tr3SNyHNmcO)E{04QAluMsWV
zd%WXM?b_OK06^6FxoDQ!L8sNclR1zS(elSccLC6y$G*_5O6L__MYQmyHV^a?h{<#P
z?f$yZzIk%K+9UvVWXA_7M$QZO&ZqAz0C7XW93Yi*fu#Uzkx0le>7?l2=B!0viO)#+
z1PDh;iNuip&sx;yuZw8W579!r`GLVac9ANsNgU?Zs}+myYAU>6QBMHC*OPt-=tCA{
z)MRL!J0IF{j`alWQ~v<5cSZ`(lNmdrP_c-HTT@3x!~!7S{P<ixUWdLRKcvI~f?d+D
z58!7--G9IEmI8{@T;U~*lwupYD1}6{EWS1U?}!#{u|N84%~?lY>S{<KX6=ZwVf%nz
zQoZqF_N*eJWn6${q%AqJmFfZTq6S8r8^yl8&zsrNX;}SOUz+aurR&SSb=@-)Ehv&p
z)QOreWk)j40P(@2^&27h^K*Vty2@@Z%X#brf<uJLiHZ0*ZTe|d$40~iuW0Fmj2RdU
zY~#@`jnos-aXPDpjd^G1t%FJ9CKV8efO|cE>qwSl8;G7FT}90R_>kVxE|H5UBGD{J
zY=nz(kT*R8PV(aaNmBj*g8`SwO!Zs<kUr0zh$3k=hF3qYcSOtT$QsxV6qEw3qFd<e
zpn!Jqvoo}3?AES5&sr3PCyEGB<!_4u%B#PDumf(|m+U#6%y*v2Zm%=f)~fznPu6!A
zX*H$RmD*B(dnBSLH*;?7uzMmnI)Xa&2=iGoVP{@T>M*thnI-bUx&hi71KFFSjnRAS
z@VTVj(qAMLKo|jmETj|WMcphBV4_S!N+>4J-Xp=JfD#!hA`bb9R20!64W~W3<Pv-L
z;BtFUB3D;FS!9uQ%TBrf5p8)+Y!myVJ?YczH_JjwJh$eV5G{#n%KSR?Ut}Mknj?Vq
zD}v*R5F%t~&!ty%MT#n*zBM^1`+~zg#_FWVQ|pQ>=@qaWp{PdM-qCAzSo$d7@`^Zu
zyz<s7aWv{`*=WOj(E<A`+j;u5(`(%!rU1=q1u0C9_&6`5Kl^@W>=hdc^tTW1E_GPb
zG5SIu^i547_g30xzv%WsA(@a#fPXc;^m$$8(pq^i(NE@5w3)qzuC#vF54-rnRyvR;
z+Gkru7tINAXWluo7#qS5S;Q7=yR)&~>Y+F6G=jstBMt38+Ox(F%2@PU%_sK3eF<H5
z|3=5HdFzwGGk!828Rr^y#+r~8)_g|=V&Afkqqpv3ko-kR<b&F!>xgs&l<|-Q@>6Op
zYmy@kGKO=Fs0{ZqQlU!i>3)*kLEu^^hzI&8bu3vMkH{@VsQIwAIXbYJ2pac<(uPeO
zJ1O^~KM8SoZat5(KkU4FOV>7ch*{|e-8JgJeawBZlz7jtS(5e^m+t#4y<khEKDBN{
zg<q5uyzXPPr$(8!+dP^}bD@r+@grLhWzyBMnf4=pcMndV$(x=a9Vci1vfix?(LeU2
zH$`Q}`NKLymRWz+lzon#nyYoWA7_h^qt%z9*Y#w03?(p>z)%AJd<hJRmVr57^npzx
zOcby8LaNwoyx+(bKiAD>Kq+-o5G<loM7C(d`3~P3(c-gyZxJmmGUc#<mB<z#w9jf!
zp8Q6#h@9~|l0{UD*N|u#n6y3kk(|kQNVM#M@b>T{bpLxxwDiX;sz5Y=G*1YWN{<yA
z@aacM_k-N9_G25=;^>OOCp}bt7wPiZmZY!9+OAYl?1NX|R`?8p2I+#J5V@fq$4M7m
zP=ta=i#J1xNVm1@;fH%KKlxOhk8IHfB16h4gn>Tzu5W6vG$Q203(hZML=?lDFRiEu
zoV6=fRD^@injhm*%jNj9&g>nR)KSu-jZEZ(Xoxk>K3D0_-bjrO<I!jBi&7aIP~Y$G
zuU!}LKj|PcR(*gX=$oI_Rv8td;Mh}7ty(vTs{InpC!%HPgAa!ESy53g;}dZ+@uU+g
zO&g-(lgJh+wCJ|misU(B+SE#cc5G_6i1;ziR~A3jTl|MVL~b7!xixj(yxxrS&gt#D
zzlfJFdJAv4Ej3SWDxyVSjJ4$7p9AmhaYPIG=Vkmz4sxDz`Q?dfIl4Dy{83foMqQLu
zPcE*a^rDaS39*KFnsM&gm45A@F&!!Q$(E0MFK0Z6mbYGhIYgD{$S7bQNY(AxKhZpN
z&)k>(MPH;(qkoQCXCdzBp!8|<PxO+AC9?DxeRZCGx5MwwZK=b~JoC&-9X569)NM6d
z*vpnW?9oRb{fDOxtMNstx)dzE@?Mm($pe`04iTk1I)NV^=m1s7kNx-UyypCzk!}NH
zn<r{(yP&Vg+k?1=M`?TjYzoyPQalLDzYQ>!&cMShP)JU3<VGV}c<rlYfj}W;=}5Hc
z!ImfP_yBVr_`UBT5dh=wKm_yRp9OUAfg|g>z@YQ;@QhgU0Lqj8p6Bnb`Ub}8bLP+k
ztUdr6dHw-?)Ik7bKAiL=04pAXa=S~XCJ#PBV>{7uXMm08y1vls{M+5(dJg?2Yd|Y-
z0vN#a@sofe9+2;O?%sg84;Nr6<&Fn-<OB6HRzQh|YY`>9J+DXvjt6d_6@W-VW8S(e
z*RCjV@0$777RV+AhZ+Jv9)LG+6o9I&lAsIpj{;_lr9Zd&>!Dp_MMWumi-?wOIug5K
z9Dv4I0sYPoxHUC}-GMg1A>#(H$jgp&P`ibE9Dsjy4*-q81Trvhj=^<VZw!1Pb5iHs
z4!}G-01uFp*X>D3w}Y4y;UOj7lP@nSLJo*YCZ&G?CZy%z!3{)`x<+1e<Rs8W3M>RA
zkVQZANi>W305};Z0E=G#;i*4V`}EP8qk9bi11156tR4O1al0s<wT~=+G_M7{vr;qY
z+!H#GR9Y5m0MWuroaZ&UYHRyi=oQHWh~v!)?6iJ5qNPix1F$hppDjSxddocXF6M3j
zRMMOPyMZ&!HVlY%habxxM%HA;Gv2%bF4aSrKV^R1gCmD{ajT=!(x$XV+S&lV8posn
zVC;i+fv|Bt`*F*{TdR;cfY~~fUXDH?T0{;N(GqYOk)SVwI?`mrbN-~P8{jSvY_`rd
z0mWS{rY_wSpcpZuZ#=`FdO3BrA`j6lbSA>?Tb=ya2GalD>;O>TSwJr9l}?ZgI|G~r
zTJpqyJgL9{%l4G^*=juvvpc^y`+w9NiWXcFa8HUVp5NBFsHMwN^X1sAC2|A&>EozG
zasW~h!`7~pP{2mIZk>^jNCs=#-yL)r*a!G*eeXt>UPTTfi;#gp`({tbM4gc*Ry~oO
zB9S2>sYGVXj(nx=u@>kn*)k|14Co@6iO#50rU62MnQWaXALJL}Q<@v|Lf#dnkUs40
z*g<Z&4p{=`Pm66wYyqQ<L0<s=fPaxE*1HIxec0GEzAJm;bKYl1L{83E=OcRp;&W_>
z3gnOi+SWJ{gC5fleP~1r0u||P?vZz7Es96K{j0gpb!tybKPo~Zdh^2{{-8YGS7co-
zU9&VcWMS=N$VU397`-TyOS6wuilXd!^y&7+w-u3x&_QDG@K+~k;-Ql&YK87_VlY4U
zB81_CArTk6_;Bf@XaTm%{el!?vnS3DIeumB@$`yKW4E78dP{_u)ME$~2YH!G^K3s=
z>&@O|-*gU}Lcgug0)wNs<T*2>DRSDlc=WTA^oFejcptR?fkm7kYRO!EH%=4)X$1B@
zxzB61Oh2@9L;&tf;viuU7m4^$QX|^`=)jW5z&gJmQ8nfVG0aAaI3Wvl!5V<BJ%_P5
ztRFg|U&tzifoNC6mwFu1Ss6dFs%m`17KpGVcL2RT4JgjGYNxeZjqmiObUT-A_j2qX
zqJn+Y=Lyks(Sc;_UdNt*+;G%qyC3Qru4jF7q%rsIAw;g2Ys9Sk2y?FQp8Ks2B+0@R
zQebw`T1XH!la0_8Qby#Y=sU#lewllzD4&jQxQ`M!t+rOHKZ4|uUa(SY<vw6^B6&sE
zsG~yeOOs9wxk%60dFw_hv(d3rZ)DA??}ccwPtp<l7^3Bq=$5_5JRsMQP}0WHGv^~%
z*+6rE_z<mg?YwKNSiI*9x@k?CTm93n{mlGm!?;ARiJXy=O*___w7ZCrc3(*coh!=B
zeGcLWF-@1;f0}FONb86+Y&wf<el7RR?$MB>2qEdd%wgLPeAbKJaf&hz{!VYL8TaVs
zSeyEfXf(I<!+cfkq}UGU8Rs_rC~X;j8A@O%fuRKcxe^!>Es;MXZ{zVoh#)?s2b0PR
zIP7SffuEh%B3Te7$PoaK8_;$*-;Jhd7OBG;(XxJha?^$^5vAhq>bJO-15xu{(%opw
zwY`u%es>)tjao26qNPcB&%W19MsB!<M9ZGZZ%;mDJ$$c;maO?00eAV`ja)q>L=xWL
z`N(<s^A1OywX1WTmG=v%&jaWuMaX>An~~H{b5i&m=`dmD%&Nb#W=S4q?z=zrMizEt
zPjtBKR{u(6b7H^PhfhM*j7ii4(&X4v=U3Fp-T`$tZFsM@GWAMCl6)9j^yyY!tliQ2
zEXv}T*(X=Dg+lHkCV>9x(7f{WGu1Z2NK^~ZAHjgEP{YN|z2DUzQ6yZS2%|jn(zoL?
zu9GjjtkRQ>Jmk=%Bum<>j0X|%=8G>@6o?dAC!CjB7&B*7#LycdDjxauud5GIpG{4q
zkBAphA)hA#L`tlOfBDNI5=JMI=(saZuLvHg&=3+&{Qj2Sg8%&0cS5w3{4-|PK~&AV
z`l{X;KmBQllu<=?y!2>7-yeItB9YdG*isvXYzY0}qz)qwY39uFh|!}Gm9#0cc(V7%
zZ*K00mb6_HRATg{lV`qm??-oP{=7e^Uw+TgX`(aiZ}rwLzqI`dsc#>epQxs*uMRML
z%rV<iUwTM=A$5S3L=UA6Q>#T3ruRz@Uw46BUiI(vW;fLSSZk!6Y<6zGe_kmhsqG{W
zvg+1T2NHNd2kf48{F!-hpH&`{0KrF+iUdf>3(&(Yz*<_UMzjD+e!uYdApts~h1X|!
z5axOw3;{Mkiz*5n01NrzK@uq7L9!k+Gk+f9eGZs@Qb0b}@X)<J=}e?jQEO#o3OEB~
z%Cj=p;jsv?K(YXB0iupOQdm^sW)J%x<or_-dBOWqYOzMN@b0|x**gksVng(`lT|hG
zl`nM0i}x~cLIjOm{RKuxCcs<(zTEWErZlhL`kdLyXI;aCdRM6Xgo+4JI2{1XyZDKK
zA&R1E#{;@_C{i~eF4TYlz&4`AypW%lcFpDB;d&15llPtwE%|WQMtuby9+31cmj)a@
zuRtC(bsk$0063u6;sBLAna2n8n>%GrQXyTj1JQzrk!GogmH-6q5gj}CyML+r2Ap+B
zeFFeSl+eHOo({Mtt=@|v75?<hT{Vw@FmnUgBNKq2g7S0;Spkp~odRH#t_J7{(BQ!e
zv~Z|P(AB=UpsvNc9|(y6TbT2Getifuz!0$Hu!9c`7<_u~vXECt#1mLFh3JDnF8*Ty
zIOg(9kyasl&WJa9NA`3?%Woe0O-Hn(4?oC$SnH9@^Iu<$Q^W^g*8z|$Vl^NOAZWzC
zBl;fg;{wQ}#!==67_@QA2bD5UIydVUuvX*Db#|6_XEQrf%eq>J*7k+*<ee8C2gsXO
z<9;9jCoggH0l3<st@K+wi!~?;$GXseHiH*3SwEcAFV>`Wi-5U2ser7<!>UHhl0^R`
zq6N_snE|`aV;zx)`T6|I&wFE&w#u3Zmd}fqG_nuqXs+v-Bq~F1^baVtE}qLkQGEw?
zOJ`L;R_5G+O4<1!lUnO()42dc>Bpocq))(D>sV9;QWsFCK1}!A6R-7JH+d#XU(Ef_
z&;CWN7paJ({=0X{z2(6V3|DvJC-KTQRzPfv!XqEAmRfC5h(!7)LJ!~#p0oDNxuXGe
z<YcaSO#|mVcUcQa1|*;qOj1K4TzFrb*L%_*#0a~}D<1iUQ18gpspUD{NHpnd-cD@*
zge6jkeFs!qGXTkM?R6>Lq_Gk)u{;E8^Azvz;M+v*ck;Fm_?f)T9byY{iHN3yhX-W0
zzIlryNXZ>AnNRlH+PUTK?PhoBDlnMt*yoBE2#G~k)Oi8oYLkA_2$n~m^hbg53qqv+
z`1Bw5)KEdjmQ3?EazQu&YR#u81lJ-9wnGXacGJ2gYx@Ho5QSwAq`UfpXtB=OX;D|A
zYt$(NK3^Ff`OzsqDz&oq>=6KG>q}Ij`bXyN!PKWfwAe>P+n5_fga`ytK|fCVHRP26
zK9geFQd9wh>63L|YuX?C0w6o}YykHkZB;)d5f)ZV#@FlysoTHk7Tbo@o;3ED{$AWg
zOo%$s7wObK{`8}YtUyqb2V%I9!HsY>W{3TsE?yctfy{P&I*Sk!orcge7xWY;-y&KN
zE$&^^1#3i0vsK3ApLs?2tDQx6)poPr9x~#P5JDXQZavdy*F{z!5D*VYV@?zzmcGf+
zz;tUgmoBo-xac85)7&8>?HeLm?ep44@aS*JJKweAzJlIQ&wi%95Mtzv*cfZwT(Ajc
zOY$=t#ZksR148SiH(x3{z|Lq3v0zR_2VOq=in7_9ZM1K0?CXe=4?`;E^3@M|Z!W#`
z-+Z&dbnMZrAN8k@mK+HVPN8@8bKINR6Od-^Ymg?KU=Y<(nps2IHBLIjK5=re?&$=D
zwC*b+G4;57M+VzFE<O3u>NiJ>ZcGRpkz|{)Un2Bg>wS`a!2N@@Pp6~>wI)TsxEIjh
zC*l~P9ufO#9oWaJ9#{yxak2l>FIuzjee_<=;ZU$wkvaAPY4)tMeSVmT`s}}=&5RF$
zU1<n&PvRI{Gu<1kOx~|0XVLBc=$~10y!*3;e;G<(D1o5_{<#tu5-kJrx2d_2Kzy{Q
z77;Cux*4&t=O*2Y=dUOhB#Ret3P?40^WwZgqD4E1l$K%)(0`~r{qd4V+-L`pq77tE
z>q8@25HvnNBw9#v&kyn)Mzrj?{PyIN*3b8yXrVWV1h;>FHYRu5<(ZhZ-i|t_&aE@+
zoU$LWE%llFZSd&Y>THZ?kgch6BGO^b<(F0ThR6<4C99uDwA@#bB_c3@`#xJovrV}M
z(nC~?NE0c;riEyk6v76XBW;+dl$W1+x@xNEzvD~ajV-T3^EoN1MAC@50RC@Ct&^w!
zbXQL;5b38x=pYObD8?j3l?WA)6jE008=JOoZsaSfBIkIJk+y3>(t=5erpC%av9qrv
z71}H5%ZraZ+CwC%&w}Wgd47nN8OQfF=UT77u&np+fBa`}-Ktgf-n3Ir>78)?`BftZ
z;qyXj<UH{I{+}ANx-p8z&%Wg1-iSoe>>Hx|iQoOc_t1a*@;fD3%!k+f>weTb``>Qt
z?VV^E@?QGq2daOfTHal~rfRK_l?W3l<EDocI`*_vE7E6FNcWEuxwtY!%cI+fmdtN|
zj!XaioLAGo(yLs@wc2^*L7I6iMfD}~<L~t{&;9ml;`?8!o&3#SV;@tSWzL*AMY7lr
z?N#>1*Is+Aw<hW1R%I?;ef8C{qyH?TrOBfK9fo2Y^$>5&p5SxTd67fsS=+6bZoK@J
zbJoDQ9x6SY0~-_smIM9FJh%f^c<cfyc>eR&_RtO#0_@0_--s3<>hA&y0Z89WUU(Bx
zgoo~!KL0WY9$vi=Isjr|2Cs1E7FiGg2SCV!Pr3wvCJ@DgIj{iH!sAnV4W9eJaS!o;
zNMNS(ksTiJd4vL5q`E<jAX<P~JEso2<C#B}S7XT`edHOvyQ9g?LpEUO+@x{2G~l|2
z{fc4;Frj~U1$^awZmb^u+kE)$(2o4{a@Eh9I<NY#n7Kl$Ko$k6fwsUVV5sy;Y69@$
zCu5+A)L|9T0`O2HBW?S)^Wj<m4EIpqmT2kP0=y%kzAP}+`~#a2Oh6#s>%cmopC|-!
zk$y!j3*ZV+gI-qZuy_}XXaPnl;0t7QG`V!w+y1rso^_oe6rmwq5n%C@$PBQvfAj`W
zqJD~+HqtoJQR8*ZMl47xBXx;l%4!|F`tGY$Zvv3SqqgaveCztcOO-b)ko&mQRXJ+R
z_yTSK(ZFsV$6bAoFKW%qOqx4H%Q=Zm0N(i<PkzxKK%|W!IRIfiuIZ@=l@`(R>qmcG
zeWxq#pG~28)PH0I(2LB?owgAsfLvf3FpbAIVp7^7vKHN-ElW>PV8Gq>^h#|%-Y#Q5
zk#hzf&@rj1@>%Qeq<~z~rqJyPX&aFM++G;44oC)+b*`gyudjdBHW^E|LU#d}?2zaK
z0GxC<><TY_Q5BcZ?ou%!S^!?sk}b_zUlUz2kH(VC+Ew!EY?rikJpV;1@u25vY`%b@
z<l->*(#_D@O<O+<kSw)aS2IbJ9<rdV7qV_Y*N5|kO#IUxFqUjZ6R4xY16^<43FvlP
z>K!0Fd4B`g>=_02<+?tPTp$OvPIzCld7=l9c2do-O&26e=8BM#BBkha71<5|$({jG
zMerapM7jV=t$Q|({p*OxZVzy6;2|&R5&H!kN4!Np)C@64)-j+_y&VOwXGOnX3pt=I
zwpD>;GVE$;oK(>-bXLkKas)zd{Ago+MvB0SCdv9IYak;cQ<M>ML4=aJU<fXD_@U?_
zAXuMwmUYQIQ~ZO>bNd_cQXyz-MIZEA|B(X5Pe)cH;?H|(1Ucx-&PVBX-n4cY5)nX?
zGM7kEMBtGjZ`2C|Cev*|IwHTGd;2=zvxpWc9&bp>OnZ|wPueC|a$wW+4FE6An7@sW
z-Eb{{Is(O9tP6=FGPA3-^I|^#MG>k<5bF&&gXk@yW#NCNouu$h)Sfy+oCMgomAS@4
z+245Mv&BV*HKHZsV0#|#+hdWh?1&mCY|r(lURS>(PI>Z+RHbtW3;N1C-%)GYY-D_8
zM{8fS4!0lV?QitVK8*-b+k@R?1FU-hy3|lWY4$#1(Rih#Vhik-zKcQ-4chKyT$dNW
zD2Vx9nJZBj%R)dPsTPFnV7n0l2x)--*{RQBtySu<%!N6%7mHkI>sjxcPSn1ZJ|3JJ
zAoNRGHRPgb4K|PtxGyn2ddY4{rL{OtCu)s}IML_Qt!mhp9gmWjL1#R<a_qwot8L>E
zL4#yr-$k9QdV5vY{+}zwm-&$5ihRwB>+3i9uu0O@Nj>G>M@p`_X{!+}2oNOvzS$pr
z25G%2YaH;u?Df?32zkf;Y1e-`=>FyE)S5GXHTcN&)sS!IXGzvX?KjQd>r4LW3*DCH
zl|5#MpH0LJ#~W)@Dn;v^Y(?$4Z$z9RJKayVxwL-NIAOn}Y-2;%Jt@(wv(CnSnfspS
zYVOS|f<|PS8eCEea=zg7p-<-DJtA_KZd-F~7rEHWkB?oMcKESX=TNj2Qbd|L^@rGG
zHqhSYzRLdMKG*%Qy{OW9<$jdzh*UeU@902gBDt(z>l~@d_T3k|y6nwmu@7(7Js5p*
zSUdKWZ8j_Fp}kK3bC;fZt2G$Can82$^4ad3%s|emIjk0kUxpGGN?<5~e+>x?iI#zR
zV94D}AU+hSZ@_RF@EGw1&LdyyAuyjsw(P1#i_bV`XQD+^3POb^u=Yi*IM4gigNa<R
z!H!Ka>Vp-hPG10DJ~NDH8JMiU;Yah_=F)fa7GZ=e`&T~nTVGp^Abkal>bskGI`^-7
z=v!M)KrXLVkLX{y-~T9i^4>*Se46?!NRk<6pWT~%*(F7g9G!Y3uZ0|0vt(&Sm8|SX
zv;gcqu+WP-`ZfpAqF#xj?Aky~Oqh9mMN=S6)Cu`G1ctO;1<-rC$l(D)dqo<AC^#@R
zP^9=mzHAD?yXe;2s&GBB1{tJKy(kXp)DS0!9zCHqCJ}ri$EE1~D4&f-Q*4~HXw}cO
zqrC&t?!K|1d+)bzZ%xRf)k#aX?BNCZ{%obunmkCfy#GNFLl6J*S4FgppFTZA&Ai?T
zDK>w|`0>3jqyNhi(emK`{lC4fn>JS+lj%u&HtV8z#m9ysejfj?-{tTB@tqPa(t5ep
z{A;i6o%L@wkY=S=69u&Fp@)0#gk<?(-CG^e5}QMQNFX&`5MeXVjaP7@akixH%F0Cp
zM2or2>mRj7=#rOx$Npo@+N<nQ`e48NDEjH;XYYG$)4|fD@86g9fYx~uepwHbC#G2Z
zWhoXvH)*>7Iv$ig_yaZo3R08sNCZ&*<g}j@2<+k01N_a8-&`OSqJ`X~duVgWliGE7
zh4T!Q|C{Ikp(DqXXSjUg075BfJVbf{Iw~)9fFhs(f&p;N(^JY5z@LX-57?qvJir4B
zczfTQ^MGD2#(R@zqt!ky=Si8xYgmK;KuqeMKlF<q+fN*T4c7;33?6N#@ov}0A5{H`
zYm?UN=tGXK@c>YfP7ej(_MN9WPgT0&q1xVMKLZK_urKR_%z!k2#_fx4FYp0b1hf^I
zAx)85D02fi0<aKH4X6d8d#G=umO1@OzcUy0P`?L6i!qx+<i+Rd|H+e1t|%UWlpOy6
zisb<jcxdZWdBX?XF>k;hzzqOKzoabTslWV<5H0x}nMfn2{him6F_5v(nLl8)6ghx(
zfLehy0W3rSsbhoqC=cL(hrkMl`7F>mp4JMdUp@D#nj<M;=rY2>cX9>-)|xiw{b>iZ
zj1B;2uExZ}*<8Hv<_qQddVfeAV12q?>juyXpaO_1?ksmdc^I==Ez*eqIRVCsH9Hs4
zLI;1j;FpyKr@WAJP4cLJ`3@KYx>{q_4-$)v8|XrAezvZF+dxL1!1@f(7h$H}2~d(g
zI>=W0?eH@o3&|on$mb>mEK<N8=~}>108d5i?|ABt)QebAwNPAp@5rq+rGMT7jv*AT
z$=m~v-_M-zmIrt)ik;wLY;F`5=DB=%iXEFz-q6n_h2QTN{Vo8NngDCkURN`xj|LWD
zK;O>`Xh)Xv^8<M0%{Q0Q5dq4K$C>~zOD*TS$QD4R$h?*chh96ZdB>n0ZnIqh*}0mC
z0s#ObYRw;DREnwFL$vVvM&@+7+Sw2GiEP<SbN`ETeqO*HyM`!5g0T&UCE7unM1b<c
z@zj1Wk#LAZ<Olgn7sXZ}!p$L~MGX$bR|8?qfq4UT9ucy9e8??8xB4b*D>7R@jEPPG
zVCjsMS<?8d4)It-O9*uUp>}zbOEaeC)wBSb)*z5k;qs^AO^#Tl!=;0nPjkvX@oqP-
zNSqBJVb&$x-0zZB@^>K(5QEwtlry<Ed)pl9%rSCDS|YMhJA>}O9>7>ck0>osO2A+`
zV(mJ1Zr6Kl9z}upS)C2?WVZp`)}1tL=27ZFX@5GPwTraWMn|-K5~Ah8KGBk>*Vs4*
zz?BX1V#5)7`h&0#wTR$QEMDZ5v{%{&67!x{{Q9Ge!F)-NX1=XGIs&LxFx+*3+0w`P
z&Vc~XY#spA_F-hDhy-InYCSBX<)skckr%SUTqCB{7Lv{p2ycH!eBK+v4=62-nFBCR
zpLUj|-<vGkd;#Ch6~G&5YVJ8c0MhA^=%bARrI7;W9r39zBHXm+^K_anAx%WK)RF76
z-G^pvT$(5o<T^5i4&3?79p(6e@IYd-%SbFXm`y;o0NR20(u>`hJx5>MTUZ17UgOJ}
zW;^xcfB^I&aumFulKo0e5OaWV^j*ym5ewFoKC}JoJ8+(i*&gkfcQW#GJN(WbaSU-i
z_dmvRerj)-bFyIH-QSRf`4zEKHaeeUvD@K2?YW;K3+q=(Rx+QG^<r(4v-K_t3Be(n
zYM<=0BCe#86bVBAR=>Tv<ba%^Bi4k-St-caRDBis$DVqXJ?pQt>Q8m+%i5pF%e>mF
zUVi80S_g<^DGses_kwJK`$qO`!q^G54(KnE4Z+Nouwh6igzfQ1&4?a}N_8*5j_y@<
zu_9Wc<Hk}%OYY?^4yi9%21(<5?Y{8F(wcwKuKIw`HYQ^;R`-JLCCu+<Ax1?e-u(D)
zs~?D0w#q$#C>E*XkcDJ!{@H_C3z2~~k<7>#`-7-y`e-hY*a%hnZ9UReZ8_Wz4jyLO
zv3<6?--lm^5*SKgD1q;r1cpS*PBUi1*cpU(iuW7HEu9wt3gC?d5siW*01`p%zCnZx
zg2!$&)M#m`!?XjS0_X#;TIwo*gOp<0@OkYvq6OjOJp_!17T0Ju;MyA=-(CW4{C1w>
zA<^>f<@jwrVg3AfAX@5%a;Je0K1gi)Q^xCPH;jJX<$3;)1L1+V5Yh5UK=|X&J}W7<
zE~(UEUm;pj_#MGPhg+L&Nb6d;E@DL8mBS}b?#;gRk`%U|(^K07F$3TiP4U^MLbtmI
z71yt$|FEE;vV(abDKaO&e-yj()E#$L<jKZ&lX5JCkUBJKxEwKMN=3dLnkX3L#{MBw
z_6bo_sl)1Fi)V5NJ`{-)n^tv3QbXjmM6s-X_W7zk^J=17CZ2d=?<+)$h?ZX^b=ZoP
zy(6c`^C}TBYM~U-62P=0TK*$M%ckC_!&9>$^=*o1Nevgo&SSs*ukV~_IdIg$RUc;V
zl}R^n-F1<F9zwD&6w#8pI`6&lX79sy)~C+O{za%s?I!iwq*=3iQ|Bi_=e%=!hfWyZ
z+Y++qWvRn%>ZcC#keXN1yY}6DZF>lD&Q8ZSJ#VLv)vt8QUT6Pm-&=c)B=sGbD7CKF
z002M$Nkl<Z<bAKEhr4R;yL`{Xb{+k1e%@VQ+8FBm0g>$t#sg^1P7MjEH%<<i?O_+#
z_f)_HDHQ<19yEDI1M@sI15!N@0}14K=h-bq3~zp*>t2x&@YX}A2k4PWA;Cjggn$$<
z00$5Fz((MQhw1GQaDdXhFQ2#h*pdfjqz|G6Ai;YW5F^3?xWW@yx(}ZLPyr2*71BI-
zI3}}`lX^>@e*lMm@TlZ*$h(-wrU&r?Lo&xD&jI=Re7YA<ts4&@8~AcjKv&Tw2gQS&
zHzhA*pte4~5+GE6fKKF1cgV$qJrGgau_`(npha3LsnPC!_MQS00rQH&dN5ae200{!
z4xkN~Cpv^jvj_{aGtbRCx2Z?XdA|4?Fo6Dk(}<SLM$N0wA){LjhDtXS;B`ho5D^H^
zrhqaM4)_aTWM2=6_piA{ib&}Jbl|Cu2mySFA~Amei`KWV=x=0AUXB(a0i-d1YH|RW
z05d$=%{MPyq)P#M(HHHRcj-X@m2?D%33M7E1x|oB1dBeDe)MHn7c^U0E2f`r110qV
zP!F7J(K9@b>8E)#PH9XKGRQmn2Pl_nW=jgLck|dqQ~}qd#d{|pp~xDMV1WN_U3Y1d
z$h~wfq)`K=t)rUb{LMjj#_44(SYsm5MDNJkkKkJUj%bBd0UPfuy=<TN+3tGAnSZ4t
zaaM@F9TM5j1Vp7RQ7LpCA+aDehXB9oDj=yGgV724VXYz|ke_ExKdbtoK9Y5Y0HwR;
zh+Kh|ys=#az$sz_I4Z(Sv;;7-bR_+=F8z%35zV6TG0+%5Y6RLvaJ?0s&^O~z9|O?n
ze8j360>ESdwTKa)DO;7cfof@{WLX32f3a@KM}K)(dm%)~7APzN2k0-d;M2c!&OZg@
z4mOgX0mZ*K=NGl^k;CTQ8a^m`byx@{vhY&;Tm)eetto0xX937SJZZGl{As|QHuTNh
zACj6#*4c4K9#<f)^;bpE<B{F=vkavUq#~U%uK>ce(NQU{=&U)iF369kIKutB8Qt~m
zjp{V8Z^#a0k?0mt3Tz9K4jC$H5QvHN6WIZ1|Lqeu_ildd=CVWdq#gS2|MvI3EMmx7
zLewCc)vRGdfv3Q2*R=*k7qmlvx3|&W?Y_H)c?3=aKc!nj3RJXK^p5SN3x7;1D&F|g
zH6b(2d7DdioBr|2zkdF8MLZ!y*pUwdLX#carjJMqpB2eS&SMkVgJ_{U)|Ec+a0kBL
zod`cfB$82tgZbtaZ(isY+opC0@Y_7IMfAh<T*tb+;q)7NSB9(v-n-s|%ey@Y`DYDD
zmxP41cZkG76tXY+Afg4ZFS@1elg3P5g9q7qk+Y+H_Z|`xkvM<qsU=r6))1br##XY+
zh*go9+Labm#2uhpq#NQ1S&6LIUwVaz(ih{gPpW@*O6<NqSB<EQ-~7M7`Gd|GAw>Ly
zI65E%>79K-B*;taUaB;3*1EJ;0C@KAFIn%^{{g)-S7gfuiKIf(nkV~SMeD?lObU@j
zhKFXn_9punqQ$=UYVIYZ-)j2OWJg*gNuF$s`vJCy-9z$C&w6!VWiPRpBM-DI)uZ!w
zH5dABEYeoe3DIl%Z0|Au)~QHO^Nt`zdW*m^Pp_|kBkjLf_Rl)1sFt+n^ZF#p%iK8E
z7)1lqF(joi)Ozi&*(gN4=bByz<-Uf@5V!U|dZoS#5*$fOugTASrh7mnuxlW!-w&~|
zam&WYXaC-zd7n*Tr@ER?`xQZl(CccnEbKj&$VxWXxY$OiOwIks5S!MneFT}rHY52x
zYm%40A*?4S%8brhZ|)DY?cVY?u~BTCdkF+o%>kLFFUS~saIJ-Hvgn9b^UCIm=DK9|
zB^B8<ZsfR{SLyj4UbY~8xvz9vq@kk`Ep08;n~|ww$JJMFZCi7LCzyfZF_gei0z(P>
zt4Ux;w0zBs*hB`8+O$KkwA5lee!b_$8c8Cp7cXB#iVef-sH1WN6)$h+c_C(;?}p#s
z{q8gGypvG-M6Eank#k7W-ZYP5B!KqY%jb94kOn(B{Q0j+z`PELmcJ^dzv~I~^}hqr
z@^=-=Kx5m9mPE9K3=z??TcTwkqNQw9^tssszx!EqjQT3GF1aXV%}JF;%zL6`K1%_0
z1=8zW4+=Znpzid0*}yMTD`jKv`TOp#Xp0ZuO%CCZzatMlq$k4Ugrxa8Iw`+Kr54q`
zAzvJyix5lkb=Ubz&ix`q-8~$(^kVxYngv-TqGio<&sTb_SDt>hB6xNrTHdeJVGI8A
z*S&QsS7l67`$WqHAtojisZ>PE|M|a_;%wBo!;+3G;N3+R^`zoDFhtAaeWC^VB-*EB
z6&dA+3_TBNc@F_3RhqPA(yfh~Jh`HHUVkB}8Im&Xjb$&E48GunAA(C8b_b-E&$}+-
z$7knUae0kfgw2xs@9!;7w9bZg>nd%V2%m~v>Ju$$@Kl6Oh?cR52qLqW7cUOc@|)gU
zt6r`-(#JNxCES0|m#@2K>D@o5AK&$5JTPY#OFw~?JQ1%y?fObX0^}iopvTj%K3fmX
z9y~oPd$9LV&I27lDxHo8Qo!ha&;Pkn#Q;MTs|9u-3%V2+ljD6jy?176zW{>){=5&j
z2Alw_NDbyQpGU8(83028h<PaXu<aZ~%iRH48qorrIx%1zuS5l9c`eGV?_r#`FtP%0
z%#+jMAz$h(UiJVSfWV4?F+lF+>A%kd&b$CU`p`f+B#_^MZO)q#@PhX?;z3Fm=j-Di
z7XP6<_j!Kv0xgg;pL5V3FP^c$_R|AM>I<*QWdZs`KLF_q#Lo4A8^B9ZBvJ?g6s8?H
ztx|vhV*svF3F(tDy1wyNziX{__i}WaOx;6$W1=N}M!p%B7vKfK2An)E>C*h};FT|$
z1n9T1@70Su11?H=CGrKx=Rnp0lYHmFx;a2H5C?bx$ZTuRcmMp2yZ}YOA5lv{e1w)c
zdNyDi;1v<rr6NO4bl&X%HQ=7~MF27&oU}WL7y1NDHb0HL(7$$lKQnYCy<9)tDIf_*
z|Hg*bOAi;sd!1fSizhL_MlG6y;?+x!7r*jk0dT-eWB4R<V_uMxb0(i$wRXrx-63)S
zwyIqM1hO8A0Lo{Y9JYZe%(oGC(^}oEbJ9Vdbq#fV0Nns;KsTZh*!o|ICIHU1@wU%Z
zBl$vS=m=fnVJ*cAVECNl&MD9ssU=n2t&46g@W}PZ#plcs;1Adal)NgDL)Igp)bI2I
zFe-f+V#EDDV2hm*frf<Ok!^0IntE{A1LfUreQw?oC#}d?dKa(i)68=!(gOIbz`8a&
zk|6230$7??hilV4z~+TX&DDXVhgQVGgUjwOJMrA==W9I!#>vE39DWAO0RoZBz_BY+
zL|j4cqeEyQwL}9*rDl8u=H)X!H#k3?Y`!QR`Jd<h&$2u08j?VbA7l_<`hyS_0DgoR
z8%M4PSrHwg1bA-yUHt<9Jphv@y*UF8i$tti4T@?fqR2I@+h)(CB?Rbp&taXX_8}p5
z*z&cB4B>fwZ|Wx@TJ(+1u{CT@MF7U$drw4==(WYG78hAW=MYCyDY8G(A<}JNsx(_c
zIBN~uy6~2YYOx-?O8=tE#_6D!1cy*LI(l<q)+!(xVF<icmxvD0K_D(5+8U=z)<m1P
zYHSx@+8Q#C(he&64VV>Kx#qn!sTZRDS5nl*u8J_xw*5}q`dUOwfL3ctM97TnCFaC_
zQ?kq2lDe=``ep4}3qFTr1$fh2c7h#e|BZnyyzR-`iaZd}XuRyJ`Y)UYkZmhcj|{m;
zrj=GMpfOzsVw-z9j%blS%vc-`_Q`(x!<$K0bYP0lBjH2_u?=TuP1~E*q|pat4#Lcs
z+q`I}$;{8~AP<Lo2XlvzK!ja$%C!{%XKWwGHX~L=xr*|6>;1R7w#hQYu{2Yn4^Hu#
zGQafZgOA^@eN%z-cQ?Ho9o$@`x^vXXu=gRk)%RlWrL<~M9duBX!o10+l#P{&OJo^G
ziYH@>=`=8(9ck-9-@V!xyH%gx`s+V`=-aj2zwf(W@9={UFM=P@cEZs!%dXH#Da1G#
zumcE12NO`=>*p@|L|>8eV*|!>t~fh-U{0j56Uk%m7bWT5h7D$W%*8i3wv9pG^J-&2
z9-Ta;i$JAYGonM<MDX9f__iXU5QFRj#~S40#UZth&6;yR*pZ|gOK#Su>$Lgrs{tjc
z_g%B<sO0yzbFU~WjBYK>J)4wWoHf`9dkX^9y%SP`eXeM`+-q<js0;%cJ9~=2LM|UQ
z`lz~Z5s7Lp*+BdL^5}wjpBz$Fw5l|$qe33DLr6RJ`LXDqbtJ0K*tfD*Su>&ttp#;|
z=<gwktVWvMvgno~&B(?2w~pHwYc4V+oJJfakF@EZ@pIIfoPA8B9NRK25o98ro(mzQ
zZ6vyTN9)mHZ}~?bU#C+|Z~oDJ8eVTGfuRKc`$=F(wCrY18QNVLTTv(gV`Pi^Dxy!^
z;EFg=U&VJXM%?>|mKM?CcN>bApWWy?SL!h&3PMHuh!$;2E2e$Kjc69<IDbgA>}Jxw
z;oIhQNVI%IVf~#>vmS;-%kG7_jcAcN4AD|W<98F$vJ0Z6T8@lkw9597w<wlbsa0~!
z>{*F8Ii{jqHopCKZ^PQPRp1`+;JV$Br-C}%IQM^U-|}D}0z<8ow_bU*B1c5J9GUjD
zt*>KK%SBtrg^eMNMZkO<yH-TYXYrg180C70O`KQ+(19uZeqe~|)z3ZGTb)Rj7m{A<
z8xbwJW<|7IxI3a{Y(=!3bkT(!(UNO?Ezy$EB2Pr%92L^&n3I#{<Khc<B3kb5y|H|G
z(yb+BKx{UmgnWqy(W3Uu$gyL4qmMkIB8cW)b!ClPWD=r9TC@#o5iRRO=pb5tSgGAy
zbK;37^`@ROw>J~f65?nZ(empMEieD`h?chZ?Ys{fx0j=RuV22$UpyQFYP^8&JpF%s
z+K;QI2I9nn^{PZ<@N@<k1L~w?0(yC1_J9m90rDXs)&)$F58Fd6P-At{dH_wOP4aLn
z5<yLmS(8o*;CVs-kIe<>A!U#_Kz9JmX94WU-*pfv9{%;s11b>Wo<7myL3mQ%E0LG3
zbPdkY-s+@*0kTy7>*y)~Nt^l%P~eR&%>|J5?GN9shk7Y#6m<q>t%|3*2V|gM17pm-
zT=^cRjR#nu=EiC9o)vLIM!;wu|A2E~381s-4mp##2X}qorOxyAgrtMvH48*??X?+~
z_Q_QAj5YvmKs}z9WI|qV1waS1D)P!p^^JU%OsaTphy<RPrN>#Z`rv1CYOVnjK!z*g
zy$Z-c?38zPWQ(v+bH%t3EdVQIkMuWHEH{9jXaZyfupRj&2Rd-Wm|cfAs#GFC5J0JX
z_YU%Pt~mmfnkVUxc(R&1*Q9&q&Da4V`p26YNQm51hlB?>y?i`iinLfQb&JoFn|}?2
zlX?cAb#{mcp4`BF<d=S|TDPk7n%=oLt+m~@g!C7XQU)(SYg^Id_i}A%&qOouVm&b+
zz9_JZlimiwL=Sj!`#YH!i|B2#SAbey%;({e4dOs82n2*^AZ?S4&y!C#5;nK%)UIq#
zH|Fd?eGn0;V67pgq-fC}M8r*x-BfAY=##$qT<K=6qJ8t`cY1^LIPZk>s+|)joLKV%
z^a7e9){sf1&p96yjb!n8WCcL$-_QEDq>Q_|bij3h%I4bq1M$5_e-LIOPUr}L7CGP?
zz#}=32c0&*QU$IHsExq6YTi{vngFt-4+4(gmGnS>wYKhBf6TY`fwhPT5qJo}CtpcX
z`RETFqW@$--j4d7AK48r1P~kaqkzePaJub%p7IK(^TuDBbcyb-E5awTXor5#0RX*-
z7JUXRngb*>eJ6`0v27y5ki@<tp^=%07SUJ;Z8i~LCkg?0_jY6?dW=0gD+D$okB7WI
z&>ui>f!5J8fWP%BGU(v+pB(^Rt2whnL`&wF?en~*=73Z<*Ur1Pyxr*zV)~BM^iYH0
z^?>;7q4bFAEg?|MC!zp3r0@D{EsGYpJ*2`1A;Gk7z0m^)8F|@P*cd?e>9Of-9veW1
z0rx!c0j$Q>Y|XCLM3a5{Zf^Z&XUSAO2jt#y8M_o}2nTWp-~x#0wizs&PaE<>vnOme
zFL=><Gh&a_=vWytM2#wIqdu6=wy_~i*@@GVhL4>%KWkFzGCIxXN~y-?(y2MeBr-1g
zH7!m6NNwq_)Y2FcQg2LjP_zT`gdE7jdNYszcILlTeII)oM+HQs)N%-kH?wZYkj+8D
z0)Wp+8bvl8xc_(v0uf5q0G)0&f$U0de#p_*MhPXtK|I;Z>_?)}s)kr}#aNL~(w9kt
z#ety_pXQisMFLGf>ex6)oL|M~>4-z>w%eb&z3i3G*dyuD+{lI9v_E>0oqG(2_UNT`
z?tZ{H5q0#Op4cDVi>!?eWCz(H_SN3D?_Npi=^y>;|0}G*{3pe5y`Ku@))QiQQr3e#
zANeyUYmK}_R6d$?ds3AmdFcsR*@vplv`tP(j~Nq=FS~_gLNw^RwXkUAqE1#tM6azm
zsKM>#OS4b@++?aPeXpZ0XXi~!#0N)(s*jTEB3A7c_b$0NI+6&(%nzG<)FDUaUhCwn
z4MfYa(Z_@8USdV=;Y1`go1azNEsX9;-Q`M#biT8n=5Bo)O+@s*8C_uO5GdquP;4JU
zP$UXlEPX89M`l}}qK&L=>uN>px`-Th+kK`qo}zdWi5z?EE23h|xhOLZH`2ekmp~pq
z{?g;MuI(v^aW;q3106+rnFrA)oNnB!n8!Pw{$p(1%DNwuW=@|oyzc0RH!$+<M{Awz
z5lZy*vVQ%#6Js9`uTt1r4^n5E?@IHUk2d}IN`v3g8+66dzPHOX8`{3N_lMty5*SKg
zD1kkdz>sL!h2;L4k!73_Ez*J^MUW#>f+0er_VTikd59gX(c<?(!!9M*yYUWpeyg(~
zrP#=LVT)*KZA(q&93+dVmm$%zi%I(yZ<&K3(ef<>_IEhdni>)<yBFj)C0Z0$U%7B$
z)msr+QTC*N)2MZFb_)soKRhYD5G_;Yo*dF;S_quYy|pV>7P0a2;-`uX(1v5sF4Y_P
z0!-B&k1|E)5htG{;sV)lQ0&vOsVy>Xeo}`dHQ50NkL+!H>+Om<S)B+K#KM-Gk6_U~
z5im%X=_&k<1QGo)=7__4uY_cI<>{w;FFg2gPwKEqiSYQ^h?ehyXvuigj5$(7%WM%X
zI}t5+-Q9b0*^5=E9$EB55iNb0715FhixFc+7uh13W&V$@ttcL{T5|sbz2yrQbg9GM
z4bf5&Esbc&18bs<rk*mdh!&~CkTa{E>_@cxlZlpk(9YNfb)$i>?R(Re@ALOg$2$@N
zSRzOKk52hfz_CmE`u|0Fe*%liggpH7IS-#67J23YEF8QarB?cQYZny)#FUBwfCH=t
zkV-Y?!5Sds;nPF@^0ji>ztRDuMNTMK9B=f1Zh*iaC(6ddcGYM}dat{a=1b}yz=?vU
zysZ^=7V!Y&km`bmtCzmkLvSAQ9f|_$H$ZoLv__YX2{<6qLPW^}NdW^al6uEjffx?J
z1#kthgJ1z90w+YttPb$c+xzw>Z!eGIE_F;E(sPg-8S0N~10n%x0CAv`zQ~0RsMJ1L
z7f_S$AV29e<TwY&D$cqU`4L%JkNQ%@$>UY~TnfdKC14GBrQh0eUEke90p_G`5ly40
zcOzN=cy~RQD7FEj#kI`;md`#eFinjBsYT8T@XLc8AOfTTYz!W*S3p+eEn2F;ihf_q
zqn(}vMC1v~Lmh}-1W|xP^V15w1J0%ejNutC0!Vr`as>JV8-d*BtmY^7)^!Vn%C%~q
z(nr7&VyjCX)`8?~JP-z81B~8~NCJJdz6ZgN))pPow(HPwky1RPfy(O8@K!(f`16Wr
zQP7#UcmslK?#)AhJ48%Nl}EM!sL#IWhz~>~(!jcSEMB`nq;`GRHreLT(Z2gx-}Jo^
zE!F`*^`co9RRoXq008{Y1-}Z|{l{9TI}j~tTYG?0B#W97K)@^JTv3rA04RNVD78ZX
zr%To>DLo~t2&)dCCmT9<a6bRPFZkc78}Z}%oA>DlJp%Ms<WdGhzX5ALL$}qlFkbp>
zEcAq|k*lJ8x_V15_O1=tapRdc)*3ReiX-dSeE|+dmH^ZMX=TTAZPzzX^c_epJrjBI
zj0bc-xb(pyNcGJ@p6w`-Eb9?4E;{GB`PUaP{AEam_v3+&Z~&l*Xg~%4?LF_1OPix6
z8$|HW&iq;FGcWe0YxLAw0@U9gG6EQkupN^$XTbJGv>=J~8@Yq<QV1SVd}QjesHJj6
zz&-YI<bET2uP5qCEgWmZdM5uyvY30HF&>e4NK@(U0GkyR)F)cH^Ny?(I6pn@hS<4T
zMUEr25Xn+(0rBa%I!om2XX(#G)Btubjm<{{ndiHoxvOex7@J5g_K^&X-)Hy!!QKVh
zBM8kC!U$;kVInQ8cXk4)i^w+?y4u!co3Gt{*S^F2n9Cc^xS=APq<}(%+#iAjNr%Ky
z^8=_ZRZw}?XEI%<5iM%gh>j9j#|ePVr<?XVM9V%u>_~2(G54F}P{5|CcO;?(`H3tx
z&us2*Q`d}BfPG<hz<V+i<;C+K(NCE^4oN2clDYrWbAPHRwRb}z?3;Z<B+Spx{&^7{
zQsde8I-=!~S{Lg=0BT#BE`{n5E$YmOVo>W#)Pb}{2px0I&a^e4|IG$@zkcr9A=hE!
zrHUg5^Cl94F4K|QvJWoJer=t2&wQ2RK<3VU1M=No!nuRah@cQnaLbdo^!K0-DoWv^
zq+C;5#yyIt8}r3RwEa}S^pg#2wsrr#_b;8YM<M{tBc~CzgxzOfk%J<{=!Vbx=V;??
zV<ihZDs`z;d&tZp*0cB1PlPa{O5eNnpdMUzk5p?rdoE&?T>|RU9s8Plpj9C;q`3O?
zlKYEXW%tYh-D%^hP7Qp~cKto??W)iI=4D)D#OVR4;C{ne=P=<t(aH;B|I9xkf)mHA
ziCsE91SH~W6}un%#_`1ckiL<#qg~t2`sAzi?5EB-<C99$_-XXSn<e4M*+Hh}+cn*f
z&`<X^+I9_Oh=^cOvPeOsJzGfr)_|A!M7oIxv`5`|#*GyPr!Ea0sTxHY-?NFxM!30#
z>muM>%k`~c`=s?oF38q<V~5!r`-Ah5Y)z(3Pke7jn-A|vEoM(TH+I#2<Gz&ca~yE5
zB2p1SSn?vM{xR5Id+)pWUVCD@m+z0Ejm`I(Ke{(Oh7uS`U?_pVQ36AvB?8>pxHp5l
zvuxXWjP&RfbZ$h8s1rm>tD!PTw1{RImlQUDeJ_W0M5*{pBT#tyIn-o9w1{X?keI=D
zp0s307HP(|-B{HphF`v|1jv6#w0v8s{cY!4Q$wO<_cGiS(V~z&;^F9-GkUMRu)Mcs
z$<oyP_;YW?;|o(rJwQfC*|JG)6x+>W>;)nOsWI-DV=BS|kUwF@@f9hvcEyU`i;q3l
zTN(mDt(Lu$3ZR|{w%;#To2}jY?$IFa|K+oIW5xC%8pecVKH<D`s=kOw5Cp`UXA_+g
zl0_=8O&c~;KPu%&(uO&tN|<@>xxKj|BNTp@+6%ezPa|5=C&L!eqA>lD711)gH(Tnk
zDgD%8OOrb6E~&$o_fv-@jRrgJCOk4B7c$u|gu8k$r(AP&?~EIMQVoBehomL<-Ct?Z
z-d*!r?}HFg>e~4Hme_LX%8r>crwFT==bjVt>8Re8jUQHewnu-n>l!WO_+1^=XnO#$
zzuC7$Roat)X7(b^1ok=6o4&XAJ3aZXdiLG6+r|wH;7#sfRmvYgkOxeE_s~l=j_yIZ
zQvrXM?g{}=juiPT5Wj(><x#l&kpgz*Ee99@46jP+9S`t4+3!pG6A$q~K0vU;wGcME
zKgXp{C&g<R5W~aL!!zJ!kZ73}Pfhh!cwO_V1>ORH3Oq?$#z1G=Jk|aea0s~D9_l*~
z?g!;*C<*~st8XGp4h%q2<BPuQ8(A6ih=7MYqAv|_qDV8)QW`6TUmuHCB;XgY<g<=^
zyZ|5l0piJ34wwSkv@|~z{gCz@)o(vAarp-r(P!W!5BN_5G?6tq=@$?8#|u<=rodGI
z5>m<EjNOC1??7(A1`l{C%f1rP5&%`S2GAJ507#R{il-*93m|b}(#44&0$!0F9U+$v
zZ0Y)||Kv;F2nB1%;XDA}*3`-}ULX&EvJoxqa{x1A5XI6WS$M!o&9W|l50HpAwy2a1
zn>JMY<!M`2$(J%y(aBs7S%z!@{9cokKkAPF;vY`E5rB<?>!N|YN5<rok(KN<=l6P<
z2|wo_!0_`BB-RX&Q5qs3Dspgh2pDot7wX)>Hq2waB3H64*#!u6STh|M*cGk^k~`Wp
z{oDYW=2h%>GV?om0(IrtS9JS=0D9K4=p^9wO^@ExyEDXq_0{IfXT4m5J^`cDOh6t<
zkHphg+8nhac;i}sBCo7<y6yVaKQfChkRR{lp9T26Cg2!5;c&id*1F8Oj_P;*PN(TA
zebJut^aa?4Y(XB<Z(z2X3!*C!Eh0RS4_)fG<l+wy|8zY50cghITG}FG_U+WEU40y+
zm;%?xBSedW%|LJ9APMtgZbuhc^_L=^*cd7E)Mt5b^ZFuJ?hYvTM0D7Ck#4HUgM6M$
z+Tmw?L8`K4$Pnr5jG26-?V^K#xCdW&D0XOLk;=#+wgDK8Bm|a{hqPS}FMqheJCSdq
zb>^o}&YPGNI6T7DenFZdUCsah$KHMT+i_I+qHmleE7+EE&N*jeY-7L-7?@#Tn7Q}O
z+|TE|zwD(6Z|(#b<}w3Awy^<caL!4VE$5^x2j1_yde_m`@i|JC4zg7(?X&mp-Cecn
zSF2WcuV1ZIFsMF`lbiF|8YQAeKA-j=O|3~_rvEh2q7OKOMUjb^;FMIqh!%^Bi@FhQ
zB)uVLC#?PMoKqu*@rGPCuE2rh`#~ZW-i$m*?RJeZo)8h7&#*xRAkg(Ao3G#LADhbA
zOV72X=~lyeTkr6vTb%WD8d)czkIsn-`f2O{n<6@aY$MT<d&<-k4X3O#dMRBU9df>L
zFS1VLCMT*jT8u@yt!!-*(FudAb37a2dWA$`bM8ze!@u42z3L~&OM@vjov1Wx)F2=A
z9h)pFgO2EXDGXuFsjN*>hmBY-F5(;E^V?j95DUgW8+_EENiQ4wEwT!U%Kki)MawH1
zBPruVNg}x6>e@y2@qa!AhtlKhIG`NQL)Cv^$|dV%aVU!>5IG_0PoI4+sp1f0`Vjuc
z1|31h8K3S0=<Ihs`xiAH4q1F?NS5y>BJ|%fwqD4XdZUO5V@`yf@qmb+ZzWj%>Qft8
zU1Qv5xG&K*eMg7W71bu!c2QtRmfJ&&)Z~SvbVHz(z0QG6C+l75ITtCf546Lz6B#B&
z71>3)Ap`Wu_wM{&-OI3BO+MF9a?xw4-{^~LK6&-w-A~_D3)72eL>j=jZQACz+1uC;
zTdq&!Bd>d(W>dYx+ROcd#oVQq(?=pM5RU4m^RB_JzuKlRzm~Oryjy>cjr@6Z`oZ9a
zEMUjkc(9jj6Y+@cd&y5%CZF|tjlIa@w<f!HePlw2v8K=HKdktK^t1jku|gyw*$>XO
z6=5s2;*Vl~=&Q*Y&a+<If4}a38@`gLQe;fse~M^HwAQ^z?dJZ{{e<%ApZ1IDvyRvp
zL3U=w<neW#Q9h!uO>}_mq(w7h8@{%te?Ds`-4|VBvXb@9E{?5mUD3aE#5I(C(RN$I
zslM#NKjif?QQBl?J~l6#zGxX{JlpWMwf5~nk$TY`fsVkZ9Dz=>q=Vaqul02^#<PuV
zJI?!#5-mq%L1r&eFJ3NiF471VPAkS7>EZ^*0~@37B5-6<(Qr(wOKLHZF9;WrFZRur
zt;ycsw~qk)I?=LyoOb6C^tBT$^CLQ&XgT+WEX4hl5G@y9G!iXw^sjvI;o<(D{wzex
z9ktep)Khf1y_61Tq=$v!k4iePOTU`%_e7CcWZvTJFFo;OrL(%{fBm4o9~65t+m^6J
zMUaG8LE4mTv0GARAqkvw)TyV8B+eC=*IFs>yz=Vs`$T>`dFNe4N^IJ=A*8~Rl1(%V
zB4hEgrNc#E`qFU4pMIkjtrr#Z-0wrQ{O)&SM9Yq>(XtyOTHqD|tBpvH<IXv!qG?0{
z9WzC=-1oyDRqC*HNgc*^vmFhy!ajH%5Zi8X`)j`at>K2h`RkBi;{$~Ldgzy#Dg4ya
z!&@)DG%Q}eY>;+MbdclEyz<K7tj}H@0_uh$uhzZ!R*06nhF|~Fzbt6#FzqSbN&BoD
zcE%ZJ3|CxnMWw!U{Ss-Z-OoJp%<$AxPYsVh{&>B&>)Nyq&adTr=Pw=-Jsff-azx7=
z{?$Z>NYzoZa`W(Ot3GcJ)Oj?_jyNL|u@EPahh)xv4~!n_%{1l+6Rq&8hkrHvdmL-7
z@6tXX7Nl(9#8L0x-uky$1pK?@$ah{j<nvJL!ITqH)CcFJC>V~Gt%;V0%gM)SV5T_d
zmzn&^Xb<*dM9BCs+j^j`f04ULi^#!&LT(!;K8Kk#PktWfE9V2Eg@Z^xn+b`;(H6vt
zwOPKJ1<+x~!TyUZy2?@M0sXl+PRntS_O$JTv>E1RQ*sb<cv%qr<dkJ?56*jA>7)<M
zd65sW=8W`!P9Dy34g+;KkF!LMc(XUvKbvS#T=jMIe41$a`eirIM6_@^a|D<HZ+3Gd
zT55Lc#DTyWdSe`@Qf~OY+Bk76rfId6eTjY?5Zg&sDM1`(D;<gK94YG(`6Q3J6eb*~
zW_tP^PVhynSX1SObFMEL>CvOBvW`t6S<FB;>y#aXsryUsW8EW$ZEThd!A=^Q?_Bep
zn$eC7xHUuzl1G^qQ8(+jIq0jL$m^XW&a70rVxv3iGbA|w&QJP0PUwobiF4RmIcfe(
zo1N2SX+(=M^d(22HXA?2&67!GhRnGA@!N_pb6)E=<){~yg?n_X#%A`*8E?!PBbS_d
zamjLH+Dk`YOT-R`_`m%2U*Zh@uWCm{l|;z;mv?n&A6$$(vvL14<IXI24qY>!rNn#U
zr6+6lGg-97*4m~F`=o~Xw#bjzlBTcfu{C*}10-YAnB<6N7aTY0$QackqDABq=O{;?
z=m7-KUw-y4w<cObaEd~+hQ~}qOXP7)hq1I@oND?AL8?CFhH-<C5FKZ1+ZxfLPYV0=
zkF}_7y5Oea{E!of)hAwlyrSx?$$@a-{3Umd-L$FNSEY^kC<jsV^*Fe#6Js3{PDc3L
z69Pe+8w)!>yz=45YZm^8<*;M-kl$=Fhv^$36~7n9yhtqa!wb>J@h>eNlEc{Lke4dX
z+C2J|-oO{hV(eXh`em^{*H$V=&R>rFXJ31EE24#cu}+BekbiUiU)Pv7HaV<+kjNXv
z2j?)yC_AYhQ5VQZkqwP#p{vTWw$FFtjHah_y#`P)BHKuW-dH4y`VdhfvGlnyid1En
z=ps9)-7vB>ovJUh{<Po7Mh<IKh}P1d7bGR#nxs@iM*SpFlJ`AxU+Dqaj8Esbe)L)Y
z8aGIP5o5@&&u6y1hz5PZcK+<4pOwxzkKM6phw`q;b&Bpumu0;V>8nKa{YO$2Ayvsp
zH^?T17$Qu)oW_VIv+a?N(wvF5u^vW64;^trBCWnvq!~=i-lv!Bxod%lJs681VN<V<
zJ^hQT{~~Ky96G%Gj%bW=LIko+S|fIrysdxgzBuK&{<&9hjb+>Qh2t*FwHGlv5+bM7
zegw-5(Gv2v@QD67pDt`6TE0@`qy82N_x+@nvt|wQO3J&>XB`vQcjTCBCRvmPm+Ag7
zR`qWYEwMqav5qw%!NyrW5-lNZ6VW0?oO_JLu><UbbZgExMqpJ$%cKsvrf-U931<36
z-*eC-KVL~S$gdLZ*N7JFqbvGCG^e(TWI-03a>OY$KJ<%7-k)bpJ31hRpm9GECL>YX
z+F1>l`qT8#@3xXVc7!dRmYdA(+u0b`{F`&FN3>X9O1~jmYHgWZgQa4#&pnj;Eo6|$
z4CL!C9?n`;NkM0PsGHrI#-5(&*Sbzl>SWXDf5WrM>b=?3)|bAI{h6_xdx`gQ{oIh}
zs57!Y)Mc?#;~IE3R$p4L%XLV4IZ>k~fsjf2{`$JVDe}wOE9{)qc_zIW2O@Sv?y~L8
zhB&9vr^Y6mc;Xsi;*au8CSY&0t@{5Dc_vJMYENrZ!@63OcEMZOCII}|m48<D_1xpn
ztH@V`r<9K-I*5c_5`xdx?)mD=9{K}wI&EB^nG6r-o_lU(obh~Pqa!c8@WSx?^NGlQ
zDiM&YR?UpBJtR^uwj<CH*ar~Ur-_z!lfwA63npVcbK%S9>CG`NoSaS;sz-*1X0Z=p
zf(1vOcsZ6qN17mI%nX(a%;tt%gi2%0ktdFCX~UeeV#SIgW*C3vy^^&Rq!4p186DrZ
zT|)qV`izs%yMEFK2(P_)BR5&;oOc_2Xldv6W}P5gCtBv$E@XxmGDOO$bFcech>S01
zA$t)m=a<v{#mAo*e*dE%Cl%H0wa!P3!1w^{CODs+zyE=8hUUR)QzG9)ot%5!bs<`=
zs&!4+xwlroGCc9y-wyZxB&m!-j%*Bh`-eoFjHw|;`4B4m9gv5$tS4f@dgZk!8J~-2
zIqlRUg3buhqMo&haIh%-Q+M4R^5p(uU2LIpZ0xCp>n&t|_BGd(Ob7>Qv{pXwK&{gv
zb=X3#(NcJ`p?PUr5iN;mIVEe(TzpefhlNO58#4Tbhm)Eo1k$QUr4Ew{Eslz`%as8Z
z#$^5!0VTrd%8*Fc{^eg*DzyU-UOYVh>)#HK2XC0HPXFnjN<Ef(t?P2`^`EQsYlyI=
zvH9wN;jjNW5iL(WJyZUk9^0iqk*V2}%MxjG(@k06{nS%yjj?u}do*j+JoL~*!@c(=
zb<Tqi&P*VHLt#-p{>l2|0Wun1@{1ra^IvXw3xl=?{v1*dnp&pIKga_oVr6+8Yn<U`
z;7aM>c+NTxlxBx>nsX+Y8R&tQ!<M6)6Ul==%s75I?956ui+ik-1L=Ut4D_pOabQjy
z4#Xl_QofnP966lxW<7Hrad>-BHsi4boyeWHY-2hiZ6z~uhO^8$oKocFgyG!rUHT7B
zFf)-kZrBfHh{WKGRfgHA8~}&_@^aSv=e_?~?J;|@5f$2G8z082n0naf(3Yy=n9N3o
z52w$x9_bsp>ET`3+Q<pQ*(BWt=N{)dM+he$EVRRGOkp%e2s9^vvXtMP5*5)Br?b>y
z>bDU09Z%g+*m5vf1I0{PP6f_IGH_IIEFoJ?h#cBh<7XmkH8g)XV_|YgW?Sp`b0fEP
z5p3k<JheD7tT}?o&%w8D^SaS76-N$74`-&?^wzZCoOnG^1a~FX4#ze}2<OSDGkSUK
z=@IGnchp_&_D>3uZ(a50McTp9tbfr<95Hvt+0|@8*|)NEIX`WruLuOQU^%lm(#q)=
zT#T8cqnl=^i$372g0W3`g;8XKAHqsMa>RIXWNEh)Db`z{gVOBLAG&A^jZU=Od#7C6
zSofG}tnSw%XVVo9XyX{6z=<xh#e&`Nefs5RD);=YnN?ep3DQR(bFQuF3Ei@2veZqY
zmJkwj3q}^HHpVSdZPUlLhA+sB6|x#1oTD|P+ZdR1Ppc0`+9s`(G*;&(8qLgn3zgFm
z>2%a<v3NQyy3EY{SF$dV^;txQnDKA+FdJjRW%7};Z5(6phY6c@Ll$QjjY6l4tCl*9
zQ`+okeM(H{z{6Jhl{RrcB0LZzhbL8&wKDDrA^Yn%sULpfks0S(!?8}D{HbvlN2D<~
zE&zUI5y;3;zeyh@0>K(BA_0ng3bxk9LC#3;C8~?VT@=LkCyQ9aiEfG}Fnihp+efGW
zIQA_V{m;Mo=OPiLTqKvialC$GyS^M-EIk)8XI0sfk!bnNW4{^R-8iOST)y~_^yBHn
zpJ#DrI<X?e3#`m?m(GnHwsyn*DN~<XuSL{^c7FEkYl;9uqSKjwz4Kp(f4Spd<JjZ0
z9LI==2<sM!%0Yr5vJe9F{IT?_ae?^e2$rsoW0Fo)N4W*5e#+mmyQX2U&&Hrlf745Z
z6@nX{X0ThA%QXu@j<~gs4SQF0<h-%`Dxw^%+UUCB8d7ToMJEt5Qn-m`;Y@uwsm+XG
z3uz-n=s9v8<_Ixq-b4WDYj(#1=<K*_i--_T{HsEKU3tb8Iro6-_d8>oZw<L-X8hxc
zy3vP=(w_4Zd0?#=Wz!YWOUAovq=o3M|Ag2w7OeThPKg*ogdUeQIYeLl?z#Ias>?WH
zcl1}=w7;CosSjbm9+1<_ebH3NKjU2FsT5zvu}yn^mWc;!H3Dy4`rlX*W#S&<sN6Fc
zyI;yV)^-`ATo>0GEhcB^Z{wZr!5ZNwx}j`M`dYU(yT_g(R75gZd+gM#@gt&&F20;H
z$l_jPb*?$C8P;rB`P#}#Uxt(&b>t5r$P*=+wp$~Pj<&ux9`&KIWBiLuVk433`dE3^
zIuU6`W~8gN?DP?AM6)>Fq!BXHMYcoqoi*+Bm;Nawp6IJ~Qtc)~>ksexKwtQ+-5-#f
zZhG0YNf|JAt#jRBN3J^as=5}NsDV^4(ZLvIV;z4+#-S91t|8V>v7XX{A^DI~ju(v)
zPtl|wwhTHEf}>7mGDxnK@8x;HI5I(@+D4}QBM&S+OSpcyCX(H;WH#>Tn+O*=b?v#=
z4woi1D;<Ze`-LB7L3#QOuZ97l_v@E_E$fPXq3-SA^FYYiyFw=2nkYd!kkMGOxE4#d
zXF@>{)S3K)9CnTf*ME(T5>dszBH~OY(l#>FeJ?n->o2_Px))nUpUye{>`G@V^2s&c
z{jJFYB3hKs#+cM$YyDMU_Q0QUylnCNm_+uu&wZ|l7LoflzP9I{d#*^9d+xbs@T{XR
z#&NL+gz80h1Udry1Oods(c+>|H{BD5y9)+m%h}}RyO*sU+fsW;@8!i&&kzealuiZV
z;?>S+p@~v~Ay1TN#xSBqREuM+)#Bo%Y{wv45Ht4q&2g`0f$0|cqU>(`3*ue+<D%P%
zmR*8wmrtUvooJaKzAZ${##(s(oa^!+niN*n@IbV@v1--u>a))c_x|S(hldj(AxcDQ
zD!S~}y+xada<NGZ#*uH$5$mX&^SNuw`M-JnhDz^c-4M|&ujR(_y-g`6c5f^xGju}`
zEXr$nX8J5WY(=FZ+q@yOgk#?jL5P;4PYcnKXqnS5zdX??No5fG_iV@nYri~|hzx16
zK8TEeaHASKYaN&KZ@9i9YosybxI^wBGM<ma{^4KVx^P6x#O@<J*mVv?xE*`uS;NT}
zUpQRwg&Qjx2wAl1k)-QLRLV<<;3%RcQ9jyJ*NWKj!asc=(#PwPU~<(rzd4E6Sy5O^
zkA@t3=HB}%O__(!rAHh-9CO-fS+C}c!)YNkRvdqPqJ9nz@wIk%`tEy&+yCjG#)y`z
zXW&6<PuqLt!@~6jp?|{-Hw=IESAR8}bygPXJoeZkcM;RSPXrQ@<+kY0omngB(-6^8
z17_+^J+=n(acqr0w9jzznFEZ&svNR;82`&_{<55^9z;2<|K*N<sRwtCbTc%i&GWEn
z7P#4zh>7MjmZE`!%h~{Dlq=tZ_92-y%0cmsIAA#JIAEQ_+0H4}h!)NP56v7Pw)Q|j
z^Z1|g%2Apz$BFNuwj6+QG;xw~xSC097QD1JW=)%!X(LC|Jq`k&Ibu1OI3PH|IPPye
z?+fJs<Sh4q?gb-`j8=CQIO`7?wRgmKlo<ZC<xPDY5@twpu$l>NCO-ngjO?~>wsk3P
z%L6>;3nGB5wwbc>xs_$FS*O<EuttkDX3Wa9UId4V*{x=eH=>1og#%n0E=}g?^2aue
z^=JY|73cP`aY$YshvqkuLW+}_Y*M$F`E6|;v*B%(Pe0BZ&K9H{N2c^-97AThADr2r
za@li){vx^TMeVfuM~o4Mskq=1g}|9N_M_i9qb&SvX7o3Lp>+y4gGDTHz*ujB9&$)-
z+_Z5^p9D(xOPtMSTXV4M6AQFkmxeCTDeWRp!=W0LKfbHqIW5{m>L89({mD_sc_d2X
zA8z}Ha@5jg#~1c_=S=04frC0YwUIMOM|xKE)0Onou^}v^q`NllR7a7IA%Qq_MM`ky
zTeoABbBk@Q?`bPpIo+kix$5++Ll&o-bz{t?M_|x1Bwo$JogA;Nv)gSSotfU>`m3-C
zF8W$q5pW#OW2DCgMOxjT`lM6hWaosmA(7fx_<GV0e(UOQMZU9&utGSBO0d{EXCRs3
zt$yQTOn-HLjrnqZ#qr27eqY9YrOpYlfEaZ=`HYv=pU#)&P^vO(MIm-!ed}YlR@8_^
zte;905+^VjwY%9Nax2gG_oI)Z4^EB4Q@So|4(L~U$%*-kq&;KX9(nQ6a_pHkPDhXe
z$RRf8<=8umL;pt{y844tT_oM5r(Ig4m{f5{D~^8=NTLNFd-3r&`d%MNq_j&E!x765
zA8x+nYe}E==Y{{FS!ZI5Xo+(?<G?I%`pu!p$;f^q$PpoD$BD?1+)@r%*F-8m&yB5*
zIhDVX#p00}7iFxyyM9d(5C5FCNdD=!|CDG+v%--jxgQAGbnQ7~8ZJ?ZNNjfAxI{pR
znsI$tpMBQUFeANmYYN+O4UTc#H_W};^remM=18?}5Sw?(k*8MMIAg7E#R1A0Z@d|Q
zBiYKNlu@ZN=m)zZN&zAG=OJzX`uhK#2#-Wq=Lk_mUYx3rC0aqvB14Zzw8m-2oK`8p
zkX70U!~5cF7o`Hr7hcagI?-9A!WFr;Z~)(sXbv`Af8LU@D_xTGVJl-dIij_v>}1jd
zUKqj!(TgbII5!sV3&~+jYlCYs{XAk4*(G(F@vR-A`9zE%4}TZi+WNk=T|N5AHQ#te
zA|ZiA6NGAylR6WdlIrc|OKvW*$MM#WaX(NI6Vb)jLU354s%`9yNg6kte?vvOl8qz$
ze}!Z*`NFlq^_|n6ov1z!{!J!;>s#$(OXvr}PPCAS9b?CuDvxG7iOBfR+-tawj<$8Q
zSr$>}aM$Mt#?G5;@Xopr{)xcR7knb_)lJ@5nl)vPNlHZovht<Yg7>-{_teWzBr@)f
zif9!*==#TQ7{fP)0HbfGM&I;1qCmRKpGV)2q0TqXj0dD#wQ$l-`qtXraMNal5@Nt-
zx`9}=c8#bQHqv=EgatW7yt!|nd*p^E+u{DpSVwA!dLjq?yD5=zNK$Lhxi>~Q+#W(1
zku(zRV}yIFC*4+hgk)e_Tq{KfyH>%GjUy|&scw@>=!mw`HEkd_Tj@U1#1F(IQV4N^
z5IJGR3B%(drHohoc}wg(ohec$W7}ufcO)V^g-|qE##prWl`*k4(QBejO}G)&b9o|@
zt~>X-y3a821=c?bi6av5rL1f8M%IyHFINO_Z9|SCO{6{~vv!SsvP>3En;d5X1yaab
zIL4QhZ6d`)1Q}zu#Xi7!A&3^3lh4a!W`2k-eJRA|*S}ur|ED8bMC?EI*ki*_fBMtm
z?z``<{&IfbIszSmj=-)Wuul>#?ZVic?T8K0CgnWKO@#;nWJn`h9FO$iR6>RzPZ(=2
zzd26S3GxIP;(Qwe?Hsd#8FOSyIref+TiCu4Eo4!)NEzijN4b7iHnN2itZyG50e#l`
z-h0i^=n*X+AJ}b_U@UZ^WqyF*!-0?V*qm_jQ<F06tgEjc&P+7QnO9v^5h80|e<MW3
zJ&BHZAnS!>q2V<nq3dCITx>oM7>Q(gKV^v0IEX!o-8nT;8W(=)3&YV#C3bM^3lap$
zvieD{r$X#xQO%99S!Bq|!uCrJJtXO~POoT{w^rw7J4DZmiR$n!^%Y`3x-09y93N*q
z(&F`mUB8s5jg?tT=B<}jkLj^;W2-z78)sZ~MXk%SEaZ*NIDFxehjV<0mYIl_3o3or
z`8VD;oRH9NQ6;M$eq^}qAOB~`daTr8A#E=F;ujMob3)nd$8P`iaLeER-LT=^H7Wnd
z;jGVIQz^=<0kb4z_#NN<-f-*R|3h$1c>U)0LO^A~(Yy{jCNqau96pjCk^it`j~R|Q
z>7-iE21&I!SiYHbUn?JaaCm$5%XR+y!Rfu&;dftqtroafCj4!%J0q#UkYpl}PCW0t
z;E>tKFRUt(1uh%bt|?Lre$tbjo`|5MPdlwrt9fYF_XwHW+Zru@B%<Z|>#rZa^PTTh
z3bART<^KEcAMU&FzTwxu{`GKI2(?cm(V{QQk+H=eGJe;``A2UN2uP2k;|xbONI_!8
zrUz&<kIh;(!yD0JHlu5d2WK;(IqRg);57E&>LFOV98UC=nHB1~e?S~l91><$DxZVi
zLo|nnoaP>wkpXQMCMUXw?UFt6Z4rI^#1Z3PIZomf=fv_LF3pY<HP2+Gzf>$gj1z%_
zm*ZL~&eP^<cfNDp+s9EOH4_I42k`}o8bC}u8E4DSGlSPl86MipwxxvmHh8veT5|hJ
z?$p<u%N+Dyzw8@vm|Rkk3Y@a=vo3`c860v&V5LZ<ay}w;IW#x`PR#6KImr=I79Qt>
z=LpaS4k1n`q!Y49dLk)++!vCIBZdR7ZPRkw*JOv4jpIvx^<zVjT^1+s*E8!I(Pra3
z|E~}mq6uCMZg7%{CMKjd&bQ_ml9EOW7{xxFg>RpZGoABfb#%+|)jRp%R4#IVmG^E)
zqt$T^nSBetHrw~~I6G^WY@C=JagLQni8I$a9~?@eY+C!wR+jcmq!h;%=PaG(<QA2~
z>4NOw(9+lKJPm1>l+MJ#ME{T@`f+?+SCI&>By|r5?>}b!6p<Ais2qD_;{fDv;%E}x
zqn}=V`_)RRqi$vDPxb42GsH!CaLQYQgOiQp>Q`BhfK%_0j7u1rp-tYlsmuL22Qo(@
z2mEDm%$apAs)U2c%zKVgPH@o+?fmIBxAMHt-rw-hTRyjO%W<nuZjNJK+9Y)%cD{el
z_me{H!O}ftX&)z<{;X)5;COW$aiTfaCSn5tWClNbLS{4hwGd_}B(;(VHj9{_eALOA
z*}pW7{O1SJJ^J2QTAwmt&~%CplM$w7$bThi(4;;zwnd0Y5yh#_ejv-*tY=tGw_SOS
z_#?~Vv3*D|j&2bZ_HlG`Ae!B7Tp&+4{>&=ZK2FAEA$ZIzM+)mp>FzlD5SsejY<Uh{
z_LBZQn|_u`(6|wy$GHdh6N820r7YxpWm`Q9a4Z_*h!(_;b%r<+Ie!t1m#6%zB9AC1
z&S?=aoY-`N{p0MVYtqKCiC4$Qwy3SAV^<yuNh-D5FB9=$bRbOFAP!H?PV)QCq5n>z
z8<A&hg_-3X-<;lTuRa}Z@km6IQxuprb2y7zDjvA|@7Q!mk+nyL@c3p@`iQJVdPx^m
z>r^D7ND8@CZ@!Rz44Gs6O>AGPo7wcMf8&h%gX(&QFgr0s6#D>=N7643Wr2C?*tnj-
z@AyQHh_tvelLm}U_E)6Dy=PdH&$ljYK|w`|C`CeVqI9H$P(->k0i{ZlE<N<F(p%_F
z0{o=+PUxTp2{lM3^xk_(AR%z#f1kbgd(Ph1yFa`i-f#0<xn?rYTKAfn^~_rLx?|I7
zU+De?urS>dxlbU@>^0=&q+?E>qL=vikS=G5{eFnBVb_K$3uyBpo`ToaXoNNDZArv{
zSuFF%=u@%UfqY_5UHK&C?<pkTC~EWYnjv=3)lg%@1MGJKt(Av~_(p+1UQqWv19>%O
z!%ZHpM%S-Aa>s_P19$@CjHN*8kecDM6-L>Wt1pg^6(ZHJS`psz&p>>+;SA<%Y-In=
z1rUZiUl@0sJ*^S6hqeU%RBxT~lz(!;sG9Q!xEhG6Jmp0};S;fTml`BUrEw9tE`ZVU
zwAN&7jLjr7MnKtth|;HFI6WWdWeuIn+zG#T!<};YBec0AQK3GeV|af&()0D9Ot$OQ
zz>A@~oku3$9l+ldj8#1>a>T3{geTcr_P%|9Nm~pMQ4*#2JzM!U&sLECGuWc_`qh0S
zImMPXx8q0SMOnLPINl^x3Uf=CrfG{9sEl=_l4s<8v$dPESq0UwO2q9+sjiI3mz8%|
zAu-(N;^B<V<njZ|zN#3o6d163+PM>Z>axw%7#7X^{e(4KJ`9=ztF1uV3~ioS&>3mB
zJq;&$PTRtnl_1!S9%KJT@dleyKK<<efT7yq>r#XM`z*Qsm@gyo%~#QH<eo+P`}*N$
ztX|ATzaih2ICQ|O5skL8L~K$?<%VN#eXqh}=g#i--r}<Oyk|j^eXn;N&|>D+KBbcx
zXpU%mI(!_;?{9uSJ=uY#%vG4K=E_$7)@|v3=)W@S8M~l!d<F&)_IgdRTR5I!ZE{{J
z&zY}htSSyp#{-*=mi^+H7VZOXkJ(<exj~znWdoV@C<w<0u@ur4bq2_hcKACB(ssA3
zBC?%IW{;;V!G}i~u8oK7V0!MnU8mCr({X1?o$k477P3vFT%$7@^#kPe{uI0iY~B_$
z;!B#680b=4Jl%%X%?t9R`JLcPG`4S`<^K0CJO(&x=(FaZ*UWXdCc|S0@mY-Wx)t~D
zoj9ee*(v1r9dU0YR$Zx3(ju~53peTw-SH_l*>ooUJPA(?_$_sI-^Qq-o^zL{axE{f
zKB%Lx?8$h4?|Yy$cLyVWd=LJBT<Qb4U%%W_wkO?_Pv9&yj`WjNs-J4pUOXZsCOfSf
z)Vr%tK1F2tth@a3iL_Lqmr;8c5&kTQP^MS@Ztfb&FwctJu>K;1;>pguzp&*18+`Bj
zTvLHGR=$6Y#c-Kj+??Hz@!>FdXW<H8&7viuP|q%f3N^)kOps80eLQ>7en8%Nf0$K$
zdYys;k>#=Oa*0adl82ywKH;2*H+aPQ$zVDxA1i?NWXlB&hlFY}7K?JQ+TvRe9)GuV
z%s88HG(QQhn-sKR4@WoKkkhOyGE$z&PDe?~YF7kqZ<!D7WNf8fHFOUS7jg3HVSNmG
zvN1xzfZi|xKT-K^rcB%B{g$0D=OEmS+aumw0?Vtle}G{m^)ztOnqZFJE~oJkwo3q<
zr*&#h67dPEp%5FI5R1_WXaU@3Qgn`IW{dz<t#YRYBGleOSjg+3>l2{G<R$(1H|x7A
zL#4#J!X@Or4)$c^XW%G5TfFP>mfKS|J{K~VF?zk#%yyqwnB|D2&x^^+#R^#E-&UNG
z8|A1hKnCM<ijF6DsP9IbP8@L{`507u#6IHwl9`{Gm9+xmB2W`RX-zo;{#IubT?bzn
z@Bhgz)sHQ=0y|p6GS7l)+znN#^6(_AkT0fOj*-%Y>`2TIyv??gZMjF9gG}hb7r4&;
z@$I^tBnVZG8`b)OmAMxZVCOu+)eu_$!ewJtmd9n^$(c<!m=GrKnhtQ&C-8Da;SE`n
z#J8A_&7cpdDK>~!iH5n{^ML9d$p#nJa5ooK&|NP)QV=P-)`}Zw=~N4_$ce+R3bU^M
zZZs*E7J2b)R1M?+og`3YiIQpW+;aKnJ+@Z-X#tzu^M*nEb7DGUkx{*;1TvkV%QG43
z<~2!6;tdWTMT2S73TSRr8jHw~_mcM7-`K_zXf1(TIroN-WrFaU4f|oJE99_7f+W;}
zm8c`(85son-nmU*{uXO5>73!vvj9yE#GATq?$RuU{VtZ%U|e9lhjJj)V67@mF3l}U
zO<hmiZxfQN4y~Gy0m_iQkl^^bT*M+u`6w=AHD7Ne0Ok2xMP6yF3;;7%pyZ_)pMHBd
zu1JOs>crrlgyzYdHd<um@h9Aanojt&9DSwF1Le)Dmi_*6e!881@4j?*oHA9D3z6T3
zfAX!j4AkyQ+YuCR_ho;6_BXL|e*LoA)mp{PmQ6}tICAk7_59WOo=6fD?g&kg-hKrP
zMeOs_zxAby*^0P&pAEPIz?7MXZ#rc3aB7?m%S!O6@g7G>|7kT!l-QosWHxxM9TxFQ
zdQ@sf|C{&&Xim_ti<IR!r*~28>~`H`iBBPb&oo(Jtt>gso<b&+IranL9!$IB_k{31
z1*~U!l*+sq9|0zNo0@9zXo(k}`ivNzz=o+8rw1NVwj!-$<e&bWJ#we-*3KJl+({Z4
z1o=<hrWM<$QNEM!2%K?~;`R|{%en%8CR|)03+60zrd}Z4$=t<T55>ivq>)vB?`FRB
zRTvM8k6)GBj&&4oa-dWz#JP@SSuOaZGj}33^92W&e$$y&SxI(ENOnkQobm{D3SvO8
zBr3b_PL%qrZ(>v_`6tR{V;-B+Khlz+5AV*ZMX7fU+^Y$;{|GWW-i(gxGRGZOvsye+
ziF^sfPmS(8?Wy7MpflS07S^LU!u-Qf2ENhsp>FYEI6bS3RD7Un{+-%Tb#1-a1NWDR
z#>Z)Yqkuc=CWtb_&fmc-cy3%Z%X6tPbytqaLOi#Sn>1}_^g_t<wZ|J!8Fq_Dy%_h?
z1^GJ(YT94ul8k09zPhc)Njnj|+egFNB}=b=u@WtP`9pzcz7Hdma}*74wNOxh(|i|^
zLdUfmj(aru8lfC)V8NsBmfIS}$e58ZRqZdnW5-QzU=K|>k-|PSC0Z7-;p}70&y(JO
z2C;R0y<%iIvj|bI`(5y!YHr-7hcDz7D>`yhtSy!+gg^-xdMy<IcDQh)=-;MJ*#LNs
z(8}?=vA2_K$&TuA(L8=lqLBZObn^7zQ8z<YVbF;Ao5{7OkrA(<%6X;PMXzV26I3{O
zzuhh=i-VX1x+_%w(zc|0R8+7s+Gx?q-`~EV^1kpab2jtYD6QhBct^WRUIej4(t}x}
zjMYUkiC#h(28`W7$e6<OEx)lQ>D&k5?8jr`RC^JNiQA8X*JCd<sb!ZLziLEid`0-w
zq$o43L_D;}iH~r&#nV9dGltn?$V`gH^H!vxyxD5;OV<)3_hgcJVF)O1pXRxtAf}FA
z;h+G7A1a5-H;y^C#m|Wr#byj2#$j%9I&@6F96lbe3iJrGWqDwlBVw36K-8sC?=Lxk
zL;ZExsfo}Axp#=5d%V*`!^^GF9Iu0it<vet=q>iX)U{u(i(5t%C!uQ>YW5^Ip4SyT
zJ>mOUz%aTEj;ZZk{J>87^|u6H1XI_5dq&^hOeM<=ghu`DQ_ogk>VO0Je$hV<q-PU$
z!zV}NlX;@mC4{wjsq18T<SRClF>a%MTAe=dZg3wBFzKKtjV5V+yBBo8IN!4Yla8{D
zmWz`vRVJgP=Hh&q(Gy!>Aohli<f&Chjxu>bNE2}JpO9RBlXquE)%cA3C|_(gZrhkB
z=E&kZE>8pKBhGC(aG@gXF+f)|`R#@B11m!|#pIX<O)`@X9wKd$@O-Aa{b)CZi3oQe
zJ`J4BYaYqdXrr(qs$SPOtUel=&w1s0RcN3x<pC=d^2w4fXR%(_6yTMkr6vk?f=h2z
z@E#;I%tcbH<b)qTjK0QQm-v#aC-(t!`mv%x$6N}B5d#;4GS@pU{nxlgVXz2T`6b!0
z(l-%H6<%8=)(@FWMOo7)Cc9PAM*^WP1}EK_94GM3%e6yKWm<7qaVp2eiiH1f7ZcXk
z$X01`;7HXPuBGuKLe6#zM1B&T%JVQI35vMaK*N&nbj~)o_?gz1TkJB$niYKRsOi+_
z^$?szpf|gk3BozM6_O=&>Tg$r_M7q;Kg9D^A7ksCT}F_sh9h2pQ>g4h^E$tMH#=8B
zz5EE=+wAPdp0~z*@+8JpV7@qy5z>*ohw*r!H~1yuHUdKTqhHS2rMx`_Z|w_q;*qko
zgG$hAqvaH$TpV5^a1aV!0JA0SB-#0{i;i;6iO|kbv$z#+C7*F5W<7m$o)B2I;acj#
zsFHO2dmut|=ek%9!n3hlVL*cEOay<;8``z3djT|&ftc8ewaZH<;9Ca1HfnTns36>A
zbEO%{x-kfqU-$ShjS<uukLUqsR5r1fyV_QB{=X;tFD^{_aB5>7TG;ry2sWf#C#oNy
zYV9dXD$nnTl#YfQY8H`t^fB>zzHb=Md0NKpr@CF8N2TE61MOQVr<)@a9Zd<&%nP@e
z#dC6zqJT1#`qsEnehn|-nk^C~;SGK@d?v5lsHXgbjmP1665{#hH~UN3&Xn7AftxGm
z$+=`2mWw3$;eDD4Ru8g-`D~%e`vX5jcII*sUK~$I2-iqdk8Xsyoi*b7oaFmAmT%PR
zMqIV=Xh8K=%0*YM4_4Y9XZ?-Vq$rvLOk&A_=YucIyr=Wy!1$;Yh;X7)6xpA<!sP{D
zRxC%YA*y7y!x9sDORK2#tF&J=TcQ~Zao#KWV$uZ=ttw?oEgcz%6Pvhs?~%r&!&*dz
zCg`f@jH*^Hh$;J>Ff?(Y>2yb7AMhlcm*HYQYhh47=pR5tCH=$==CYybW=SHRqJ5i{
zONnOcbZI_cp~_&kSY(Shhw&(Ig_0)X8NQ3M_j`<);3z7_y=1Ch;Zx?w(o15_?zVT~
zVq6%RM?K2}Lt&4TeuX6^%FQ&f=(WB{*&n#6M5cmTnOY)!$%&HNYRGqf)GL22{sbjV
z%hd8x>&jJ9H>y=Hrwl7k+{VZyzy*`G<RT}5a3NeHerOfaP3Mw||4i^3Ibn`25PqUw
z!(%mF#7Cw?woN_!as7JF_2mdZQPCjntJ#cU-V{2NLQiDV>l?J0?)}TVfS*0K>8eV$
zQZ({m#UP}ghPCHi4Tl<g<rg}}`(m4XUkx&_=DReVPgB_hSF^jMLK^(C`y4#9023`N
zC*BNn_dIE7+4R^)ZlN4z4JL^Khlq8ET2Gx<$E-sR(&t4Oy2%En%{1;_YmhjRSSGxO
zR!^7nFPL0U#K32^NQw%6hFlrsd<g*KUnAYFYMU(qSyct>A``%h1S9Obps0NTTn-g2
zdiVCR{yVDL-u#>TGKZJJDt@PiXYFoN()1t3@`A+EW4&MnBq8$Z7jsQtKs<&g1n!X<
z7KSwYwqk931nC!I#4(UXKVKkk)FI`pEr#6p$CYF0JBu24x8?4`tk;{Z9V}ryFr>8%
zqmn4Ju`z+z+C7A9#ELyc`pRK<x4VAnNS-#5GhSjqu-ZlHyM^*QG=#BM)&n2ik({MA
zW2O*@exHAbe(PJwK|&r@LnbOpND)P86|a;NY1yp!ylnGsyZqDAG~+Ku757p^9qjDi
zXatak`dL`~7P*>Nxi1>9!0CJW+om(iHd^?rt2Ur$pdk`lI}fpk35J9e&{P^AWov0q
zdgNh4Qs0Wj<@LvbH(Sj=;l=UZvb*yrmJkJzVKMA?nT4S)Qp{C%R1HZ^Fc-FXXG$sQ
zm72+`o^39P)fQHTe>OqB8z#3oUGiL%-Q=OC7KdH|A}0BSvX*P_r|ztTWnNV`6DxfE
znYp_WvYQh1cz&PZG!A!r@{vcn{#ywqDIWo>&93t8E$k~U1z+RzVw#qth7Y$-H|%Su
zIGyVQ^O>ewpKaadb#eYkL2AA?&FYcY%kgCIajQVA$Gvvq>-b>4Cd6{^Vi4{)ty9u}
zg?^!!N4U0!mK@?EUQ*7~lmD8kw_Mh)bMkdxEAjVD1uwz{cMuT~S|#@mlGMfx#%92y
zzX142>gK18>GfV><hPEXAmg^|X|)a^@V`E$hdX47BGcWgV0RyUEmEkgw_vD(DAqnR
zny%J$dNO_O??6<D%O+!PGUF4nbDr-}*KJvu3oB*=wSTYk+fjsxcFZ1nyKeg~WUgK1
zEtW9NHLZIr-=TT;v^41i^7xe@k4AEj&B%2FpDKVV^J94a?(2xleS}O&z1ivQO2%6h
z#L5rtzbY?zEMIJJd|QDgr-&4OOp;`8QgPa56Iuzz5W-8`;~x4RBU?Ys#nU9U>UPBl
zVL;5jw6(9~!!{F>wE1prEL!#Nh}8yIUmtOf+dTMbRA#Bgiq1a;?|%1@$}FZRsg36#
zL<_iW7i3gqduVG!L32G#;w5&k30Yaa<#}MNi{gkQ%SVX#QP+syPR=a$`pQ^g9RMCn
zfCDE@G8;6VN<FmYi<KF|n)I1Q495nWS!9vz`z;hgIlK-PAhxI;w>PN8crixF?O}e6
z(iRjX>+^V_M@B+gy}4$X<@sC?Uax?9rb#q<x3DVS#1SMsEIrW;WN{SV->tJ&1~hHa
zct|)qD${GJsgi^0vG-sCC1~j<+Ar$#wti+avh`uW6|TlJCfn6B&YkVAcV;{su+vil
z=Px_p%BFrllH;aH_3VlL{SqYx<oImTJZ_3J4g7&MWaKdB7mUW2U1A5=Q=}0wa>!`v
z!5@^HKgVxZz9a{90Um2dxVHIiin2}b`XBJO<F!#*kRO~24!cr|xY*#HU{FSt#Lo6o
zH_tCWWaI|f3sWB_?`L*z#b?F+06G#`ZkH#fOm1fB#=SM@0&&x89*8sf1miOAFFd~W
znisN}miTCW4V;{)0vx<2U{tvmz@N)e8?C>{TsmeRdACJ?9KX?>ai%|gZ3bP0(lyFt
zkD2@Y(IYIjo{Sr<3SWSy*UK%PQKO5wOSdNnrkXtgMQ^)!V#HqpRS^cIk<uEbY>ij@
z6P@%|Ri>9Yp`j=D)1I5}$XGDxFM6mkC@SUuQ|sM4W{zB0T^dk$<9gggMm_up!J>^U
z_umT0r%gcTS9F@CiV>@{6tI8ZJJt0dc66*ujeh)6`BsTW{cUQOB}<T~@E2ej@U6I|
zct=5~If5d<ilX0#ic5ZeVd%m=aixFPK{y>)x_a7(zWGr-XfQR>|1^$a&1^hQ`xc=J
zvxBv<Tem^$=aZ|THw>vBv@v(ASXde^dg6T|zB{u!LP>ZugmL{}B!(R|8YV=MAS5%A
zcSnYq%70xsd(PQ$;D@}F_07yLgBMZh)lP|HiZbd%iFBjZl%pHX`zJ-Xk48B*+c9|>
z>?!r{QY3hGenEqN-GNi`66pUU83aeC(#zeg4C9?96)E)rR}(jyd79DueY@V+Unun;
zN(#|q3DarTt&Dh00-LqhRz$L+Mc+5AWjg;DpLFZl=pQ&;YBCu?N_C~*K)muP=4y&V
z;)m>zPq7R@M(;X~BhI4xfV-hO1(%bDIZ^RoPnlGSJYM-MO1<3#P~DhUf6D8rQp$<`
zGk`ATlK6GAJmA!5M&X-2W&=Brzu_(_+@3<WhD^X&rF!;2#qY<;Ky-zwBX=9<CS_Dw
zDU<;#*1<bfM;xJJQEwV_h<;ek{-Ea$X?xKeOg95JFy5*Z()d7Au4TqnS962k{<%vO
z*xfCPIxOM%^H9tXr@Q~R(}-v9l^RT$5OnG?j6q8Y@@I1<E8(*R`~|mhlj#;K;+_hE
zCf>&!Tm4DPw5B4GPwBA-Ue|t0r2|+o3temJdcJZmiDmKHU%CrMJ8#@ca%HDEJiXZ+
zx^+_auo&N5y6lI|7V6Ygcgv+L62DZq^FPdYtK<g2BCO>f6CbEV61hA~?xT>6bUA&k
z?8>}w@s9O*PBVyXB^kGUV&(OzJhq}#y8J3~*5OgJ0%=tGN7E@*%13~0w*92xw*<=9
z4M~UvY+FVMv``+@lsFkZAH|sG)$_nAJ75>(F$mZjU+s`|<Un{kqG;RwN|agI>DuKu
zl^JXt=|x-5RKL93i9_3|QXaX+lo{m{&G=fm+v}X}mTX>B<&#C5Xj1Vafn0X|Q|~0a
zeMatoTQq--2rSX_OLlg1mUwjKI;GS(MX42mVecEF{ir0q&Z3B!;+#y~L3UK;vc&uM
zP=q|?*&OsXt^J)~+)`0Bi}^GuOWbYUDLQ-7l?8WZRA2SYMSTdG3fhwE^d6>R!(Bv<
zfEtv=LxeI=(lgKcX;G}(UuQ?)QcrL!>e<E5&;Uw<l^ME{G!wX@RAtggnZfP8^T*@X
z!#<V~TJcD30~9rT;|}}rL__yCC2ft$z@3y|(Q3pttTTe)fF$L@2eo8Vg9=T6r<q^#
zS@?H~@zr4~{VFT4<JgVp&vnI8^r6qbX8Ao?aq-Oy=Zsf=z-dTvFHTKT2*WZh{4rY?
z{B%x)bFQjc$*kY`0bR_ix}3n44bTe-p4U5Nk+mtQ$h%xEc0#cowc~XGeeRC0L*BT4
zVvwLjM$jh-Ew)tnjX$>)yDGUN<@D!Jrru#%@IM)ZH+Gy=U<de@eCV7w;|%<?qgIOu
zqgEbO63BG^{UV#Wdgu}FkB^_}^zHM!MV3R&8y4)b=B0eBKA+E!D!b!=Q?z40ZKIHp
zu`-Q(v;@Ap=S3b63B<RLEj4!RNfVvw?CcO+Vd)Qz?-Oe#B2yh%PKOsZOCB4KfEfap
z?=_f2kt8GldXIi4>!^)|zBG33IR98-aqKjrBhp&%Zpuhb;c;|C+!F+oo?RmaZ?0<o
zfMIgxkPI^|n2N`k`q2%#YIAjM5q4t~eVGZ*SK0bLeg2t}0lDxz4N%2|DsVU5f@;v!
zJ3)4vhJ)3S8vNCvisQN>f?A7{Ri<s`y8A&{m|nIIO!+|qXo5e1HKsf>sZ{mb%hg~{
zhDVZSuDb>{>z8u<>mA?R64XY%@{%R6F-*DIWQp*QadQaStljRt=YMeUhI(~atB4cf
zsqF=HQs`kkuW4V&`?dLMY8%BNlYAmb{uoI!ev67ybfnUNsz~4-5<75v%iQ2q;BQyN
zL#1qg?cV4^gP5r|GUxcY{EYgmfo_qV`)>H^Ajao9ua^Eyx26D?-Y3k#!Xe&2Z=%4T
z1X|9QqwTxXo())Ynf84X<}Vz|4iJqVz8ehdoMu4D?9H4rOG@xc{c(GfK0birmI@63
z2wEfEcEML;XV%L+M~ghXAsOvp{b=_1W5sP|%G*$KsGGIg8<d%=57L<Y&{~?28j<fu
zM7{tGVT>PNV*CzeJ^SZsIFPyVYHjn?vP-oQ&5vF8EsfPs)2y?5t6&WC0yF?w=hHaU
zU%m`yt2<Q#d%9h+J;~=<Y7#KYKKUx?2E65|sPxUmSDXvS&EhQ{3W=@5r1N;=-vDUK
z2G=1gQ^EdaIt%~`z&%R=dwuyop$2-2&&goiw`&Y_5l<VXEp@oKf?~Jojp&&21IbM6
z1s$A&Q%210WUjgPDc}BuVv!1e*gf%mQ9v+wp$54!D|hVjNm=jL7e`P3*s5d{Kxd63
zLoQxdY~=e0CWJ!vu}hR)_=#hm+!ePGqj$m$z&EA!@$J>}rDtnhs}aljwbl;<@@6I;
z`*PS9j5%D0hZ>^XB1C?^q;vy_VFfjq6K-Np&MD${HhlrM$}B#I#xjORr`#F&e4`la
z+itYO@byJ6ULiQmJv-`_Be|j!ZmVdT+Qbvvj=`yS`*etGt>Y4<xd`^HqVfknzjWVJ
zGuZ5<uE?kwYsNIfwlJSev0pTD=n?7eA*=te0fu{Y&m>^jAnklt2d1prb0HpJF7Gy^
zDRd}zU~yt{Hyky<$}iqf@x;7MnY`<JnY!gC@L(!-c9D;l=A%4&?47NGw&2KTw><5;
zS)f>0<Ih+pAmIX(GX28&L#;wCRVu=u9XQFtPMoOV9^&yvKBy+()Tis0xVmGB8~Yk{
zO46*pjcE!u$73lPe6?T;*K3uGgF@3S!PcTc?6X8wV}~6#mfMTJ&@bmW#aNT^Nxt`i
zD%Rd&Sz}yz4>eEicmE-ICi>y2Sin7tHSWwgzOTYC{Zba;A=RRhY-`+W9gW4=r+WD5
zeO2*w(`c%s0agidr5#MI6`NKCADIO^i7{2irer|mK+?yuaiHhY#M$6v5whmzW&Q`f
zHhK^Vq3pPcJCus?JI1;@zrkRW=~7|5`kq8>i%^c)t3Zk0+;KN1aw6VhoeDdPMJ%at
zL^h5~=VB8J%46{z<>>seU&--xqMb=K#!PC=uks^b)`TSpva=3|_T~6>CgVS?Dv7F|
zm2~!Q=TKKKxlf7cwg}rl7?!n4VeH=DegG@3Jo(x6lwqPI5vK9&N1>PcmBGiL)fWM|
z*DjA`gGtTE%YWyWPbMKBidF(4EL_dKH+5j9hRPVB>4#M8b;_FObqFid;KrTL%AsCg
z#WED?+fvVzCX^>r?A2b-Yq?ySBPLgGwLFYtHDe{ub;Z(Ic?K_LAj^6!bKmK<qhBkS
zM3QRBWW>m9{V@E*tM%@%f+TfHN@8O=uGk9Ky)*d@jIM&<EgC-r#X9Q`%=i35I;ldd
za^~x>4;(|>g`7vE8hn^~oKOd(EK2Yi3~MNe+gHv1SQnPK6Bnxir!{}}=C0uS)lfS-
z%D8o_``Jm$?Cp*xS$-9(1@sQl3o}ibs!sArwokaX@|ol~M`XJ=w9_FiHae+C#Kf3;
z0$NJ32zS{zuQl-``^|`*q@BE%Vbgmi)Ux><wSFvTAA-z{VH{YNQrz!+tb3OE(2@k9
zD#J`IsH3dkEjDfJl2!leRP);XysQ&j_U)3uFmSY{Lg(4f+enDBnH2h7d2vIP$tzr-
zJqGeh$t*|CEQiz3&{;F-Io}mjX>U!8UatS<@7`gPLGM<Bng;rCM>;FSv${1HX1}#I
zxeTHk2R1Tz;A39e?H_gh<K4Ar?t`mXftk!qb*EZ%L~46;nrDs!wlx;Vs_;6Z>HC?>
zp0#Aqwt2p_%-Y>lc#zJBo5A6nT-4yjZp4`))CO%)q3|ft;%H=cD%>n#yRU_PLL))Z
z!I-eqh=wuCK39(AXix2VAY725L*B(bY10+=&Wwa5x`$uW$J)#Z*}~RmK&37F@?K)$
zh-D~`#Iw4Hc;+7H;Lgeo)%?hJ7P-#5G#6jrY3JP8L_D_1NXTk|wvtfWJs{2JuG5Y3
zjLJ56Jo2ISxi7o73Tn-Zh<KLlofo8J%s9hXmu&K0YN(gM$xtDA+p%5O<J5$EoD=?E
z%D_v;_{4=bWVAk0UmRIE(-?f@DUET6eesQov=wG@KBkbS;UE_jKjo{wFCsn(5)Yc~
zT49L`1N`LdoJ7?rfhV7+ZsZR<z8FEr(emck2h9=)#%F*J3i1}ym(pQF<*t^&;pTM`
zj(Li)v|v$}*WQaB+2P*gT_X_XG&rS1q4Rxn%5X(Cd&Tgj<<lW?;q=j{yk59Vl8sSu
z{CDVUV8j41B0(ED)x(Q_r1Ux52|P5JgJvU_Kk@Ea|4`38sb+ATSOnm6w-$nI|6_c4
zTj#7anC*@e<1vlZu2a+i{qlzmq7xTWtoVByc&Hm3xLQ%q#a#yMo+`BiDy3xz$~aZ*
zZ%3c*K%9^(H{1r-U(wmL$3=HS^s9Ko?<(of+^?xm<BNa)oCC?2dAw~Ze=-lN{hh>b
z0qR~67ZE>xL$0F)&mI46f@IMUfiRu%MhCx*VuHMG`JOZBM9Yv5B~T3_>G*|{QUv?P
z^E0lE7dGXvWlURfXL@EUS7~nH^P_Bt>H3ik{K<$j-~)#Z^pmBeWpD8VV#`?vr2Ze1
z2Z^1nf)+Lb)Cv%IbRHzHU%vv?`2JYw+Ljp62{vby7?g(0f=3DfU|k~HURVLSwBa;l
zPb+Xs8JjN6W=4aE6o2%ghK){hTPb;<;b)kX78&B=yX&h=E$4RP{0Qj~paY6>Z41FF
zchU^4<c7<Dr(hEmmx&8Xb5pr-^lpwe8xy~4B5d6%S9)sXj9MmZDn+1$-Gx2Uj`}tJ
z3JT}NaB*3Gm+1s1tyRFrQ=093_E8~RtOmC|`*%pprD{J*kF_DLOCOE;MtvHa$6(?>
zllV(IfDE*zCuNs@U|~Fr=xLY4hFc24RMFyFC~2#-_6WwrInpBuv09PxQU>lpeOrEs
z4l7l4V223`2#I)Pvrd8QQFP)?nw|m-(cyj_R5(r2V`DF0ZNl6rG^^46Z=^R4dUUOd
zWIhZ%byXy$M->rD7+tYGC}Xma+sYA~&JBIaS$$T8Dqc?)yZ2@G^pD?}J#;jWtg=ev
z>j$y%n_e3;hMj{>(fDbSw27kCvG92X_w=#P?j;sFvorRiaV+Q7?p#<|%*;ez^4&N1
zePY9m!Jq1Zuq+1dWlnUYNu#@uLxa7<T$sU5<3zUZ<Ss_Ham6@iJeq{U@l%dS?`HiD
z&V2z}&d86bL*()CW&E6J|JuPp&*h7vSmELi`StCT4S(YB@$2g(1r*@esE<L#6o;q3
z)|4)i^>u(vaFb^b&O3NJ$BgCjJ2V_EODjJAQPaNF+Tyu>!6F`?*=`|LA7H#1D$*S5
zh84*{+J_5nH-*L3Zkf)tS`AEY_=|Z?MGtp(fFN-Hq8$39=ativc8wj6?RxfE!Z*94
z@}-5Xr<@SF1_^294v`&hj&+?Q?1h~Gu}RT=5jv8{ocgn9&dGsbC48!`+RLf*Pmadh
zv<MD-;ZWt)p~u3qA@9e5qg5-V9YcX;GPW*b#8w&h{!XMLFOLv%3)pHoYBMd9v+LHd
zv9doJH`wWE%eldjU^6F!tCM+=tFUm=>XV&Ncug#G{`0Ud2Vb6g>(Kx9Z3It#8&hLD
z{_^iSfAIPrD&8&fu&MRAGQKv!%J~`TunC2<pH5;}^j_4>?8;ykOhDzaYVnm#BioI?
z-iN1tqX9F+0VVHDh;60}D@EJ3U005Q7>b05440sGzWLC*PN(MvS%C28=u_p#p`i;2
zRJ-3qH0M<x_pRTn`6!6ls<jDG3rhUXV<Y|I!Iv*YK-=D*R-*$?gZM*;juB4<a;^HZ
zYs;u`rP4x#9NZb1zKFLUX>?*2D4%Yt8e64h&c(b6WU?vVSnNGu)7|xfvZ(}fJ~4#w
z92Hr;ogBET5it_*bl-Nc{bavnn?vkgOI7Xo{G=Tn>%kUW-U)si@nC99-EVa+zWBTP
zPvAe3OO~wV5~WO>dyWeHryyX_y;R*!w1z41%vHcy`{}Tp9gt2@i31Tve7??}(XB{6
z`s^BNVa-y+tdvJ=fE(!J4_5`yGy0}jt+?~!xu4MIeOfQIXz93+_KVj|rOW9y$YhWy
z{7A<I8Srw7&DktH)n;5Z%6!9b*gD#wmAUc4Y7!j>jwA1c3RuzSCOtg3c;krAAM}`C
zeYG<9+4%r#h1SK@8ZlJxTQk*F5G*uc26~FEFqhKH%5#drgdyX!?5@j7n~@p3!8LD8
z4vYxtqT-`A33<@x1<`qKaVJr1sX8Hn;O<HLH%$GCYqQV&$Of$Yh42i&92MP0<<|bl
z);Xj|`LXd!g+AEGs99Ff)eu;-wa@fur@To=|NK!&fmw;r{eDz|^97l`M{+iM;tizb
ziu{Nv=qoOD#Pd>TG=a{>ia>qvDT{eB;&NtU-K!}+po^30`A406juU!^Z&YdOOSBD|
z3@)gNbxiG(%I2h?f=*!L)=wC6VcxmIR6Jq}pu~@`OG-M4+g&}YwbMzm;&XUYM3~Zg
zM2dDq_8$uknS(z5i{PpafzCKzaah^-_^O_1nW}R&TR|MqdF*%m<j)Uu(6#MD4X!1r
zjnuD-nJ=d;a%OZr-ZwTH&hQ&`rhO=JGekEdtB1V#@Q_^xtvN(KzK11;OH9T{uln{v
zvN1jZyv4%P53%$IdzaPqpiVhHyt$q|QP_uyZ%iC=(iqTB^c6B;8%8z?m}W$DK&P2*
zjSYjz49zo&Hpz3VVuA6b>aTo@A6n)NXZ<Qq5$K4UvKf%SyMpq_%zJt?vP<2rbGMz(
zI0o`AgPWXl_lw?AoyXD4WueGvt#M{Uq+}mAh<3onx%wGSXR`YU^$(WcBD~FGApvX#
z>XS4ff37~!1$mi~BGqj8zJ2xCOu|X;NBX0ZIHLgP4aW-w1Gc-P=t6wQZ|mo+OOk~(
zY>n8qY4KC1k2jkyT&??u`y3D6V~ARByt5KX%)`2UjSJ+>Qbu*RrX6oB`J-#hHHxAI
z-WWKx{ixjvJM-T>C$4=MGVqSk09G+)@t}SvVcXWCXxF^wB_@FPR%y=@w7_xe-N{(R
zD+qRDhdRKLE9E`fNm1r?_E_C?`u(bjo{T9O+Q(!`t|Qa!b-q$C+sW%+>N$Wkp4&y)
zzT1bt`2$_g`}8`rsd_gzR(drK^|dWjBOM<0t#1U&{_n3P2wB3)BpWxJBYHX51jF2Q
z0Ts66+S>u6=*E=|UoSNfAwRKw&;DR^4f$SUp>c=gM-vrqD`cS(s*(d5u1)G`?v1XT
zrl3?%RO=W@e(9s(o(Gb9EAH&u@8!2m{SB|B3w`SW-xnj-Pa-Cy`$N;31s!3<J&WMl
z{P>dE%sH@sdr#^@S$UhD0abGacOF!?!%M6aa#kv0bL_OBhQCnTPvqBnvc{YR6^m5>
z7LgG?xrl14dgVd+QAxy_9vlAj84vnPeD7D3c#eOmyH;~y>`(6@O1kGX8ZP!>%v5YQ
zp<p_lIJxu%aaxi{CkQ6WZgBQTU)0AYFq6;!rgT}_?4J4ULPvXi0e?XDRRu#Ge|s9^
z1`F8zT><g!Cm#l&&g9rb+klyB?wAvb?Y2|zJ<qm;zLNN1ttl*W%lt%O0g#8p{4m;h
z$xxaHyGAshEpD-RA0L|3D(>s9!`_sL8d<Se^vKo6LXl_MSJ_ift@nNgbO#bOmV%JM
zaf^WF-OPQijlJ@iihv4!&%yws?QC&sH=#DAa-CzIww@Wl<89ZgN{2p%9a;vBy8gpf
zy|M><r)67TMR8dtp>;_B6ST-hz&xsAyrgowUnR(;@Tr((c6&q1jJS?sQeN*34uqo}
z!kc&bowjufr9)!_P?168?g>hCIYCGpsXqzppVGZJ!RvyHIz$;`gU-NMm?zS!5)UzF
zQuGsoC~b&20EAoBt>A8I!6Ea5jof;sPr?%FPCxaTOm)^7jaDg!Bn**$LaNp387J6{
zb;{HKCM#`huru-yodc7$fAV-z^D6BmZv8p=#XkXuI<TPe4uf;M1^2ma7meP8lO8p6
zBIr!CEoFQ+?4)Q-S8vDdV6=1ZS@u8t8(1Q-v%!Xv)-C0vgn^Z_PwYo*KSb-zDz%l2
zbv(}&EY(I+bUoo6TjYy)#m4qV(V_Qs;IiFhOoyLJ01e%av?aLm#8W}3UJ>QU#w2BX
zoP1f5=SUE<l)BRN!CrfxT8?<&*j|D-<GpxSwZ!JLpSwHya7#W(Lxz*~GNb0{k<-4K
z@zh>#j_cD2OKu^cP~6=w((u6KNxHqJ8gjG70w$x(6c8^a;-`1jVz%CR-HsR1pMFl)
z6Ge2a>2OHu8G58I?E|2Xt;8g9-vN2<wkoF%VJlGSoS6E7(c1GlnNFI*B){xZuBjN>
z*mOOqB%G+#OZ?HirJa@7#Y}pYVNbT`w4x8fvjO~O;gV5pJ*hF}z&LB0ZjCXL<E$9o
zTCrWS<J`(*PTsu$wRbCDDJ`FRw$>mz-dfwO1O8CwmxrX8NZP5YO+G<PZD%v@N$!-_
zg{ks2TQ*_0%XT<EGHEo<c^W0~kjq>i9$HS^T4VY&R2D?)<Cpa>$}3_5<&_$?*7%*_
zB(g&QhM%9QnUS!@o}+SRF3>}Kd3aI3>s=u%##re_ca|;6<I;0Y4LeK7)RlDtYq-Nu
zVL;V;9UfJg7+@pJ6iC*CI~5I>(CySj&QwZ$S}XjcJE8V)n>D9)+BFHI>*U8Zs)i>d
zevO87lalRWbkVC**mk`2LixhpCq(i0Ui)h@XSZbFWOD_h(3GKZo;Mqx@J7S>`agD(
z)`6yr<d*ywMca<x224o~8Wk5hTHKI#oa(f>Gcn^PXTc6RcdDR=WXFs=&oh@Y)=W{?
zgy|q=MdJKy`ZDQ=R%qv7V@AlclGPmOt5nV%@3HI)kkLLvg%{cf^kNP&A*|PIF&9Wj
z++6E$vecj;wq0#}rE}&Wr#U7Qt2)}67#q{&;Axke!U1H+sW;aP&9qyQzlQ1Y0Ug!5
zE4FibfNY;2-<(>9_6Q|GK*5NCFF??m^ZSJX=hg$cUc|b_6X0A}$(&oDyEVM$X5CsA
zAIkY;YyLy1^Lx*os=0q|uNK!f_>Q1Qf_I&b;K2`SPaE*nu^2m_4fyv@>GkHGIS0?A
zjdF-bNydpP<MJ4t$am~~8i&i9gFfMJ)?rO~Gf$=}wlgH+{Oja%PA)GzC5%w8uwMP;
z)fknfz4RJsB?q80cvWe>bchSXy5~=<p0>cOH;AsjQof@D`nKyycqX$un{+O5sI-`_
znP-p@Gy;0os#n`~`sryT&1~ont;S>pmYv-0a|b>=xwub1vp*TO{tmYuk4mT*xUkH&
z?Xpet&$#&$1?u1qy6p70=?1HNKSGxEvZ$etw&GYD?+?~gEsV8bD&@WAWSH(N-q!Z+
z6$;2bw*Md^3%ja;@PEI|0IWE&+P-jZ*#og2Of<xRMKbWBY?#$oXH!ZOXumQ)aL>7@
zodwRZ3~Wec(=Tlolp+NTDifmFj7GB=QSgY(xsSRmh0>61<OBs4kQ&ul4<4>CU3;Hj
z3<a6&Xm${cVvT8;tyFOM?%}Q8>9Q>ncOq$H5<*z}=y*=M2<Y?YXd5m=&l|;a-VyW_
ze?#$J&BVIwN5~ldS+#b-hGFxFtu$LR(AjR=9aGIFI%^t}B0B5&5Y!qH0J<<=_g>Jk
zPJ(UM+Jm7B)(}T4P0L~n|LolypcB|>9CiQsk<r%5O<|!13BwB=kD;+)^j;6{9k9*{
z+p|@{>((Q{ws5MFv{GhTxSb(w$s!|0H`Cf;lsSWY-B%|zc_*)MOK}{stTUzY&TIW9
zY)jT&0FM@Ei}tAhXx*9CMn^PnGza3rSUw(_h}A{zws>w}bT{~Vr9zjGeMFgYqa9g$
zom-y!<>e55JFh<wwMmiIEcGf6U?z%Z53=3oU#mHZdc&}jeL$L)R68w9Jiu|=hhY-?
zsnWvwe)<Z0S7Mq4WTXLdwwWK7g1ICI+3Y}51JKQ*oe6g6gPEkdEX`lWzdbu7V98Q_
z<L(Z3AL0#KW<O<D<@4f<GStmyG8}`pz_DAycc&L1C@lv=gNg~yd0p@e570_=$|%x)
zZEw>rY32x|qX%;K+t@7zs9*g8kF4t|*;~A@C;XXo5Ox;j0mB+ffv_fPY8h!;5{7gq
zy?+980&}Ec=%E(mOpyjcBx9r0t!01#KfA3P_+qB{B*oNm=m5m3`l7X0$TVXQ<LO_P
zx3@5iaqN{b_5?Zc`54ZDMmjE5vC76txLOpVab-70HFayn29l*DHWDB_gYNqVbeRax
zkl@4>^>cqH+;-_XJs4HkBMI9}w`*>EzwUnFckE_A+AwJXn*Do=MP~yQg!&)%2H!a_
z^;t_XOj}Deaw=&sD4+Z#GnN}Y-R!5`VdAOu58L6E0^xX;GvBetn@N0=21&@OxQ%3-
z-m|VE9K!C+nq4F~g2<1VfF^VxFcQLuNw{C=IiT0=kIUK?n}J!DTWs6eNP1Rw&XmHB
zDzclAS<VYE%Wpf-z1~Hr9nceY*}Od)iz{Rr!^$Bpf55^A&_86Z6F}(x4{L)B&WJae
zxZ*e?6OidvYuTKeDt5Sidp<>?$m+_r<m3RA(qf%9u|1zrI}NPspPt8LmE@%~@T$$c
zw^^q8ROG*lsR^(s0coRQmJYvmdj$8UF6Nb*uV{VMbB#y$E~aZim;*A0jihkMV8Ly_
zTqMe4?-Rp}gtJlSVE2z2CYzazqxtRW3^z|jSe6|IncP}o;iZ)2cPni#r5UuEO`f(l
zE`@6gjkog50v!kV`L^VN4>nzA679T_1&yy9K=o(NdFoRpxS}BxYG~SGW>~^A9}hBG
zNA|S|BQ#~|pkDJn@U=8k0*AtYAY4m{L8R^SxF5bfUPjCfFCQJzj`3MP`X=pVv%QP>
z%Wc8%CF~WK;nPR>fSkW)<_8PoS+)k=1AeiDo$?hySAQrs39H4wl4<|#aVm{s@xU(W
zH$)n{xzt6a2}tEN66G;&arZtM6bl!d_s(NAYHMuZS{hYJ4Ny__H3Hr8BO`3MyFdrO
zUO(<zgyGL2Tk`$CEavTZ|IeZS<DY(XMBQL5D7cgKRE?;gn8{z@<g03&B}%U$ud$V_
z+J<6NzCZhCLs|<B+n>nP&;zJOJ&)vid-s1o;Z2g=GLp9ID;#`%Ff-b0iTG33gX7;e
zM*m+1>?e828mgJ={4XJyY5tvtRICesP?g-jWXSP9$xx4SF+4r+=#F|P`uzvr*1!2n
zQsD$WpaHP<)%M2(aq;*g2sy<|f(Ii1rltS-Zf2&6pZPzcySc_1|Mg=v*T3ZQsv)ge
z@89H-{XfX%KlEy5{<rI|#aVk)^uJnOid=@&j3*`*{~ZGULrs4HQ7u5&N36_U_8WE;
z{<fAtc?1;2`Bc^XSLpm79sf}3(-IK~Lj}Y!hG^Emt;4Kgf;JItlT-gU=Kiman<PA>
ze{n<$_2c!U{1>8-f9L=If~fz5GWFfR(j`T<r?!*)OU^d`gPi}<1|9?xR78Du!=&FN
z<Zr7(`0uRKF;D;M@4qVOKRYsf3H_IG1-n&Q@&Em3_F#eTe}Kq;yz-A<%zr`tvw5<X
zhUH&ykNN*att+Fi>ur^0t^`iZ*L7xuNt0HlhutfkTj{<)j-2k+(_l(onSJ^&pr6&e
zJ2d7xjkjym8uWCo%Gzr(zctS^9j+M)=22sl7FMhDPikxySV*v-R-86b{qF!a74R2H
z7;}^=HiUWc`}ODBWVB?Ghd+E5g0Au5>Y1Vs>BR_@S&Uve)$$qE7ti0r5%Rp#c9{at
z&HZtM6`dH7V25ZLe!c;j^Tptw_9yFuXB}ewQuMuc1{)q9kzg~J&^lPOVF8QNVErRF
zukr)Cgd@GBV`1SZdY^MtFIKZG_8V+wTGABMDB*Os4Y1HlUA^~5H1n0K_L5j=QGgqx
z7!AwjVu#27b6xy*g=vxy0GTnabt3h7NW!5}qQNN3NNRPcvN~wNk>p1!VTXH&WOyFo
z6f_c<`m|^`>qu&F{kvLgy`%3g7EjJ}{Cq8C=q%F$J>wi+{SEkR!{1WXC*bx@#2(XY
z0bQmW9OoJs`21I@j5l0{3k?|5Ee))nvjf~g%c~z#xNx#mWAywY<jIHt3z~0HNSF`W
z|IPuf$Q>jZDCyL#q2B)rqIrHdOy+V4{%MC=x#?n~dgj^9rd2@affypa+>{}H&<26>
zbf2YOZ}oP1U!Qpuvj=@?$<<#jT7;_D+dB;#3=d^-Q2YRxJUqIP!!Iq6PFUQ-1y~={
zFI5EK+y6V8cOWO2nHtL&<L4sJaRmCaI1U2~LLY%n0yLAGh0hM65x?397BO?u-N%(>
z`(>)DKzA`k$4-M5kw;*b9$g*Ut&!d+83k>Cy%oK1usdJE>ZSZ%2)Uas=v@zid7SKe
zS4RPmxj4Xy9cuWrb>}pd(J_7MM9_8VPxgYMV~y2H6Us<wh}u?qzX-((g)xA1MAd8@
zF7p3)@=wh-zMqfbO(zjhf9<f-Hg{o^6!Ze0{_qd%dCYD_#p<<q<|i4GdrU>>K&Sr}
zsqKG}N+xN;$ZP{_&$szBGXMvB*00Ps&-gl%H$m(+8Ze&or-F{AK??ySKN<<zCm&+q
zj)VlUMpBEKEgydHJYRydC@}zeOj_LalPANxCP{ZD+dza&k681$nFIEyHMn_ae!knl
zF+7fzOxhAveSMRdEOVlYX`=lMeTeSoC(jghP+nAEH7e}GlLTH=u!?9qUv!M+1VFW{
zUO-!`9I4u-7H^YF*y@8fzr$<cdZhgj@m#FU%C`piw^KN%s>5(=6RjH6KFt|`L3~y<
z-BLmqVjUm3veovMoqjgEz{B%L%63$x)9}vVp!+yOoOL6mbisJeFSL^$W>M=?d}mUi
z^Ge%%qtOrV8(D_n9u_*9a0Hr%iJIR?)WNVZaH`sRuo^umW|23~xc7$46t-M#z7_QZ
zo(LzSCzCi|G(W*Utg$&DJ7_~2GAyi4{Vc__z&TUhrmlr?<k94Yk^r|&LHKu7a(c2U
z=WFv5TxPh~M3-)Pkd1z;tD}M*py~PxoQ0U5iOkfqtJ-~d#Qf+>bXY^?#6#x_KYRT;
zbM^L@wX7fB=qb@lj>>E`Qva^u6|nVrVIfJl+uLI=yyF*GmOy>fKGY*q;;CfmVv^L@
z?Ftfe)Cd1bT&iC5?@5#kTl;y^PCm~xL5ygeY?o?lMh{}IeuguVhVudIAtxRFuBR^p
zYRyNnODSF^+ylL9|2uB$Klhz~<0IjENHB(c#P)YSa!-J%$sX>s71B^C?^o@xqgH<a
znm*b)*VXg$>R4zKpKY720U3JAI3XxL+Y{U~(y}>^<eJk+4#Qiw(uM^89=2Wff$;B2
zOd>oo$2}BWRiw_9CQ{{rmyt{3HaWaD7dH)LHo8ub^bpWajQQ^)1R}E8TmNL59$q#i
zB_l~#OX007x5;{@UmZ@(pL@nQuIr<}qj&v+#1j&znJm?FgVXq#Ad+jzOX;H;^bo0&
z)9Gtue-#;lnY*j{_<ML5X2GvYAZ!FJFrTEn-Cl!`)<N#xu2USoVp&FOULOQ4-n=bT
z3{h#F4?4gv1^3{ua6XRHl5J^cx$*otu-L0lC&klwd3sEEL7biQ&vRj1EOPDFN3QOc
zmFT8uX$GH>NIEJTvKJ7LsOR=3D-)_DEyvf_`y#_8Oh&$FDv3b<(O^P|+5%Gw+b{Ur
zd+!vC6EdS^B-<`VRE?-Y{2sglu_%pl5%on`y$FleX!aIAE7l2DAW5d+(ri0DKfK%-
zBIobK_Z<fId@hJiBjsWY9C1M(*^n#o$_VB6v%vdZu`;;rI(W2mhj`oRgzE6j5Gf8p
z#w#sm^Mv}}xd2>_s-LY6-o!Dy<+r$m#}n_oe&#UMR^#piD@c8`^BGT0OD3`S&(b3F
zpicH5r>>zwjA*FCR6PQ5v)SxcxB@ZH^Vj#v_gY3=GRe0`-hzW7UKz+Z_&cwB7Pn(R
zq6`PnK{e?$B$vCELf~=ajPzk*6CHw*Cw;>3q_UjygUtCENvQ8Bx5sR+Qua$g<8l*)
zfqnF)uh|;j!$Ffp)Rg=b|6k80KjtAnq*d!(M91LfzMKCH#KV4^CvvFv(d{-W5=!W;
z-K?i%p6AW(kQN?q@1!9{`I8rq;*6RdW!*e{4U*@ad*~n3I`@=&Ss86Y0Rc1ZB%4Wu
z1HOq`qux{e6+59k_kcYOSD~OE_DLGKjVTVb6hZJ4bfGS@hwG(jVUyXWKuO;jN4PL%
z)`;v%K{btZ!#D`<vi@GLd&x(U)im`XA^33l@bWe}8TsYz2ArP1O~9o|Y}xFtm?x*{
zw^%{?02S*3{wQ_B`qf2q3+uwntPa7tQH7rPxICwvDYGP++&lMNd$g1u78wU_x%B;}
zA?GsomIcO{nqK@mTs8K3NPc~sYVPLnEXMH(-Ko-zlHI`F0oh^Vu0J(4P-%e2G53>5
zx5Hz3iM4OR?j_RKH-?W*u-<dk=n8^|quVatcDb4q`P#dzC+!owD8Fh8FT0f)OfF@q
zpR62pn}(NhhT+z6`v=to^@e~q)2+PY_7dP9xxruIxfG0%U!$EIkY_<U?&fm2_5Rr%
z;ZvoOv<isLaAk=#sQ*$Z+#si2BBZ|;9%T{7Fs~lcH^UN~)3kz&iEE869wRO?_1tpV
zE=VIqWCfsIrfLwRJGu5%(zw~f#i1a<a^KTo35|9u8(8US($Vjy%!2U>>`<5UNcdpd
zoaBeV>9r@WN7&Y?hn+r`V(^1(9R9Ft8xg%J?vGl6a}<3L^1r$qBIjB=mrDbPO5*!Q
z;RiX__`}1}@NgwPDZ^nwlBl{kke~QTQ@nwy-eZ<vTm4qMy~D*5a#GFq%XbRDm^I?>
zh}K3V0@2kL#9hDalyXud;^gGK0dKV!%jUgZz4ymAqQ+#~k}ER5OP5B(;gvEfc}X61
zb=$GXY5Xs?-ZH4|@aq;W6pFhS3+`T^xCJlnf#UA&QarfRQrxvAKya7h(%?=hR@~iA
z{`by#-+Rw{K4g9~`H-1slI&;gwf0_1=wN%Zgie)JNZ#BQ+TfuhN74b^?*ZDys)u6H
zDtY4eS-n0*S%zNvG380*csvQw=*~w&{QX)kV??4%#4f5gkAZkU#=6*4yUkk;uGnP0
zeAf15wb(i2!jyP;PKs*JYlFL?tUgXwKYE}o-Tq(ETj!h5pj2DC$+tLpRNJ`hV}BQo
zQncT%KcDPK)l#ld-Q|SiN=4Pg%y@jJpf#R>aW*_QQmJbZ+i++i*A-8OZ`od4eEtk3
zvg*nvq;l!Jo`?8d?g~d^{&-okx;+(1dczJ^mo8g^a`^t8F{RiRvQfZYB_|)Bu=;<6
zxNFI3QuvLIyxn_YQ<LCZz3hIm@cs9-n__*+cSpPXiQI&o^Qn&{ieO5jTk{TV?RPOL
z5-sv_-QGVYlvjMuSBK1Brq%W1=o2@7UGGxI--xZ~V*Exo+v#M=)PGq%#tHpb8G14&
zHSgEq<cx5B|A)xH9Orw>oH63PUpKK=A4OB7AO6-j{3KKDeyU$wzpbXGdicc^-TwL<
zLWJj~epG#&Bn$mF^?bA8^6{u;E2>{j)a>K&v@+YYo!_*uVnEXd!sAr(zpFnu)Gss<
z{mTbiI!U6uqTY|S(F;X_<nxS;FnaXOG?;;ha%IAm4r}Mmf1DouE)-#hbm$m(BwP%v
zpX}L?WMn&$y44<apQ(%%<4YBHyq*kfu%c~Nl`kU!e$SCQ{ofZhDqix_95Li>0_xN}
zU&bANQgPmoekJ(fzU=DjdAe3IEpxU0nwC!li~CKW=(UcufVCHR;z480$^4hJl8N>x
zG30su>`Lr5osxHblrtzy*u11V8W-K-!(yteaui>pki{Lf;8W7(U^4R<F$cA=|7GcB
zckru8zf;n)eamU<23EFAtOAGkbL#?5Nbq$&PC0e1?LbMl|6OEs^oHnpw)>OEi{1w%
zO^cwH4)UnO$yVo&!Lp85h4MPuquGb+6v5|zrlLhWRu7tW^+5H{X=US4tGEo|+AVk6
z%CWzNgy9}}#=cvTn}r!TIJe>+r>?hJK6;kQ{ekon;^KGQ8jM1~shohm)_#5)V?LuC
z=KMIdu8JevN3ELEyFI!LfzPr~g<nk%$8-HSA-DaTC*`M9x?N4~`d<khi##VqHaq@Y
zMsAMs{jBk(1sMk({@NVseSGu@3h@m(KJ0IgVSf0FY(nAOc2>68U=z}mNMI0A^3cBW
z;(WIhE>W#O1Cu#8HvGTA;X8U72z*^2fkKGhZvrRb@({fJC=UgQF1s8{;oyZapZD13
zTL9~866lU{eby^c31Q4lHwCNWB45*%s%0ktddZx@4qR5Or^g?2*5eywq2r#pI=-)$
zb$Gu+a&7Myxlgna1_$6>0z6mbFgmXLsnp$^DkEK_mPvRA?w>+=@{CZllL<FbUFa6b
zd5s*87kBw?+v?D=)xI%yPR#%^)vEotJ8`hl5xnIJvd+m*`<JyfDeUoa_i843PEA6r
zMm4=(D=VYxqBolLIo;q$7Wzr3_<QHF^<Kpkg5T?KX1%qMq1I0Dc1u%n2LHXdhjZ7I
zfjnD?{Agaf!J=y$5v-nt?<^@H?QVH<yt=lExfq8DIM(01D!lqg{d#{6&O~EeNzXO%
z7*?Bf58P`mHFz{%!eg6G`ijM55(sl0K+9qX8Ha5hoj-N?ZADF!$kI>)xoJ$+)n4f+
zS?m9#$VWHhCw%E))12s?G3*bfy}VMp_M_dx-Uq4zLRPA%nf9iR@*g@gB!Y}Yd>%S;
z$Lx|IguiH!Hw^SlgeWLG#o=T4_)_PKU+4pZc1A&T5gjuQN8|ryfqDBEdk;ollbK!*
zJ?Ra_-$>~U)nUg3cFQkI*N(MEBj=XD0+!RWtt5<d&kNU=J&Ang9~f*!F}eToWaZt>
zXU#q{JHb*y$&pA2rT4(&82(eNCdR+tFAa@=_^|=U-b%<qpNeS9O#@D@X+Y*Cz=)XN
zhF@bStEofWTebqgnE^iEZ0(*!uuiY`$U4e=PI{+)G&eDVRH<JWff!*WcpMU4uR}B-
zXY$}%M{s7zkv(usuvQwHqk9-FT7Qs|V(WMDbMt5U>3d>oh_8N=@BP_MoHM${S>)z7
zU<E~%wgEM;dBM@wr)6(HASxig)8{UT*2-AHQ?3+6*)WBiiP~@UjEN+~ujL3j1`h%2
zgfCbBwca^olYxqUTs_~;C9-PR?8J&xbr?$LPoz@)z~jP~Uk>p+GV&GCm59k)y6P?4
zEK1{~az-Vl_2m@4?_(u0rj7vnNeByVF!I8a5aCGE9gk!A>B(U#+SWktSnT%BL&kRP
z8#se4Za78wsI*xZOQQ9ai5i~Z;JNuV%3-q2nyqN()aCJUa}Vbk-XdRmNpEcYPipMw
z8ucg*7-w9GGWC(mQGvL&e?nlgMja{I7Jrh4%P>xEv%qBl%nNJ)fI5IhzQ1$5#Xm&9
zT#SRRHM=~`MqO7u?>UN)VHa|mm7YbV1slg9R~m;Trw_#Q3Ct_UL?>m4BPzNNTWfMQ
z!>UPT*Uo}Ett+uKE9IfKt^dln65(z$@m=SARd8?cBzS(zSXc0+Yj1*}E>gH`C0h}L
z-H(%}lkN#c=$qCfDR>T+ri)R#a|-}Cu|wlnGQ@lMbYCNOVKnaxKe+@Y)#J5`gOsmB
zP`ZWFU*9#S+(r`Ofsye_qso5#P53@0Nz&o~&xF~tzh(<fAr<;kSj|Im8Fn1q)nYQ*
z5QEn$96B-DO>)-Gs#B~M3Cwq2-#{1>j}vI-T0i~RqO-T#H*@Qv>Y(5Mm&a9PwG)n`
zs~x9pJINna)qUIdlBHI{*dSh1tL%;?x>JNfKz6gUS%UF*d+Iwa#iLK)ZF@gF1`sdR
zOsCN;3Yb&-NA2R}su;_!Na6>-=aY~Pj$Y7VHjTtj4)XK9{v>UrQIGC!2>}{AAX5sn
zMi^Pque~IE4!uZBsu*IJwgWDQfyd4ZE(gwwBRt>6Xc}OgE&eft7<S<TN5v-}VfD>k
zGIW5ln9La&$EPInm$KnVtGLWx=iS11#1d4_Ax}5B9JfSxVYaQDcpMuY7uoQPgzlif
z%`uv-4Mi&0;{X1zuLcjhUY43r!h2KhNg;W55#Fo8=*F920xmHg(c~*LT3)#AYzhb?
z@`di>BpSkbuEUFY%hjy%VMl2UG$jl}@!S-iu{-V@aMi?$Vm7^uuwhqkrFRCm3<#j^
z&~swwC&AZnH(2LV0>(X~rud}JkHwA8Tnm!@tJ(nm?Z~`xXcl*O7~zy2_UEAW#pTGC
zz=65{Qh;_ecg;87D}sN7bRTXs$J*k}h1ud6T#l~P>H^0kR$p-7ihG9Y$=-jF)BjDQ
z24MZi;;RVd{YdE38K}+EPBsf*0IY^q5D&RKdW?Fw3)TaHv0F9ew1_*`3nuySxCG{F
z`~BYJkI`0{9(4L2hcixl=fth<Ilc6mK!C12foC`gCl-Bb#4vLS7ont}5uyc2O_JJ&
zagIjq!5?1-$)m=cUk-@j?$zv55@(Wl97(Uc&HV^5;H{R)5wG9|Hok^+8=tT~%Is(2
z0nQaco#y8qQ?>xY@?Zt(4dsfMz=k;O^=M<3DI2e5UTEdS;*``~#CbD?77U%G-EzWh
zWhMPr;V0|K{a<x6Z1~84iPWXL;O0S~EGYwj_LW1V3cB;6b1ina+Cs6oQ`L{2k@7z4
z^pJPt$HQt2hOUtL!O%F8IbpFq27AfaR-z#*4O=@#nV%gw!}Kh@^AQToYC<TS;vWwu
zDKz4~D#t~|`zd|s*HUA6zgbF~h|bryeaEQRUy^fd8L72ig~MKx_d#Ng>49Nm#V<B%
z9lbb5Ul=wze6CY|x-ZeNM9!!aeZdr@c)PnIN5*IP0=M654W<RD%0UCkVE9b0n6QgZ
z;Dx9-4M<vaSD$<-eFI#!f8(N4uF8zs^iL_x2XQp>h0euzF4~ZY_qx=Tlt*I{yuY1u
zCAIH;BY2b}M8%hwd(WLUUk4Aph#IB>j?IO<On}}hrRT`*%kuLV$qGyDzf-mrmJdD#
z*o-cw3OjCsGwqcRxYm*|o!BU7Ew0JQNL<O~x8^JRy-8s8AAw^u@M7w|pRcVw#uFW3
zYY+@v(zc9bPLekX^t^jn+fZYLJWj(Y&ga3%JucDnzD+~D5QmmO0~qz<V>B7T$^&sw
zLoji~WFQdZEni}{8%1z(o&K{if094b^Z>@qx)TKKJ=^}AV2d~6zIAVcO6Gp=syLkK
zGv6)N@i18}i!FBZ*FyH&(JdK)kh}Lov(pcD_gWRCF+PKSx$jUPf(bf%`ObN(R-Vd!
zVc4|#;pzXSRqu-G5m$jiqlcTEmExRj=`m2GMaqP*aG7Yel6j32-8xXy_0n)CLp8Ch
zZBso~+@@1$Ku;&WFoJcV{8GG2cdDXaaMyRQ#XR!Zhf)HL0PKxcrQLt`+XNg7SR`v7
z6e+!@6&s`as;9(7RxiU}DX6+iOR)_FjBXlBu-nfs4T^U<Kp~rrKtoYK8(L78U>rQi
zn`?*^|F);*Ede=12u7NXKj*GH4{AA!GuYedxQrtB8xh2#`i~@4=>TRJqJ#Hl%lrVG
z36pbsYSL>kn0fxaYIU%6wk~m<6>2oOd=v+e?RlcXXjl;Daid7Y_D*bTZQixj%XF6*
zZ_VifQ*_gad>p=27R&Jrc5wI&@iM;5ShLqY(bqlr15KF8CkTe6hDXbeWfFI7e>kL;
zs=G-LtX`G-G#)9-7I5?UDL^xLWNnxcdu%cO$F>@|;nHMyXH$%l0kP(DeYOh~=ktMH
zEaCEN!CN+^%^jwEDqTiiMd1ppnBY~YS<q(L##j2F9U3^trl82=-wP6-j<Qn|TH15f
z@0T)3ZKPc_Ev3d-`6n}sg%mw&AL$jG?J*X-AjmQf{bjZ+lZf}4t|p=NmHPRSyD}>M
z@g*ROJJ}m0K8ZNj0`JY7%YE)7?T9MeL=a@Lch@jA`d(m&B-D2UC*)(t#q6RrL*t7U
zv$Qnj+|P7AAGX1V6y;K%`JTdn`$Wt4k6eidiU$0;VKX_|!hhbR4dZMHG<B??dn`6L
zdTWZso9TLebbg;!$y>cao4wVjxXg{G{x$GgEsF1FizuE78yxS7!daNd<M3;LV5^-(
z(XZW(R20U_p<A*dhW#th0XNGHtCv14i0W>CDz3#Yh38cw;B<|2F2m`OGu-`61THOV
z@~|SUMbAfs$-*t6y}ZYv0F&dcg`l>z-nGd=G)D{fNkt@@l(aWA>5$XAiO<!{=rN|n
zh~q}y8l|7%#NqsMd)4lu#dxK%26R{$QwR1BMvGuLSLP0VIt+i*gnnUiU<|gYZ!GL0
zoAgf-KYxxo6{#sSBARDpFoa!nm)iCU*J#IP&U`JgP3ij^{Z<o7(X9oOuJ^C$R4;W@
z$G@$xRV3hOFh(0Y{F=3<Z)ITBazCSzLRufBlb5<h^1bhoe4}|wI0GOaVj4^;WLd&u
zV4VNlR%vzNnMbrs6eYbQY?tY^ctQ}+_Yi(KmoGxZ;k`E7nC2y_zKybXFNsG<pJ$WZ
z$<;B^-IpLN8Xja_$EWLo8$(P;&!yxgO+|#ikXA!wHF5jcE%@mlpJnVsDM*3D7~Oc1
z=fK{f+gj<{&+M3o>}LRomxzP1URtD-f<r&riWOv>dc9mv7UQK~WJ@DImhvYy^TYB>
zAD$}`1%iCfe?Rink@D<|7$&oj(@+7Wh=@-;B?EgcxzrG55CBxPM%w!>3{4{vF*J;p
zF|#5))RN{EL)U`Y)OoeVOajct%8DptUh(w?LLF`!+EPdrtUnWcq+GL*A0m!u3YX`n
zhEDe=M&1YijUi?sgkaOXBJ^sc>c~KoNg|#A+tbXP(YAhT7aWqQ-@Lxn6hIo4z?l|>
zlH06OT4ALfyPU7Qcy;0FcU1L7S_@I%?X;|t_{R`WGx&@9BQw)7`-|)roH6?FYbXm$
zNh2i#0}gjwpOvapus#b4PcQJgCRGt&l(8*}=^tb2xHf8g$7GU6IK&SpCpI}0$^cys
zv-mb+6)aGx9b3$Q3uXC4-ptB35rK~YB#M#Rr1w=Mk%I{iPS!D2RRjhkDcjiss1FvQ
zvTOQejfcX~Ub<~3v;2Z~GOuMXQ6+zT-Tjh3<n)+hQh)%TLr+d7Ijb=;hUYx?uog%w
zx);3CZvDPs1##kONE=$out}TQWu=%P9k_VMP`GsGjrtY7n7}@-erHn3QARwfi90y=
zl1r6p!Ye;IIl!z))_+@0DA`@nItwj{>ztQB)@=IC8TtuwG`k0E=B6W^LJ-I#<opx0
zwwR1J*Vodwi_jP4HdCwW9z7y(P?cKQynh)%e;J>$WnL&gAI+iynW7Lkvf#vPRX2^z
zR_5TKsl~-tIA!U^Y{C6)bKEFne=bz|L`~mNo>V9PFuFX+#f%v)kBMUidaGYlxGHtd
z_lY&r?fTlxc7mC%RNXZ>X*+cuc_4Kz?SO47)YW`VvaF8EiJ}i|9%_x}Bs?6}6RBB-
zO(sN}j*(qYGj0|$K?6h{GD$nQs(-*>K^dg@nt-;V&X8h6^=r0Z)C^QR+FvFlpE0o5
zGrC+sVP8UhIsV<+$>ZdUh|{#*!CUI1BlH#xnILj)9jj-Q5C}v5=$)U3rJ$h3rFSp>
zV*X---a+{m=xwg|Bo^+{FF(&M;REKSDLeLfmTcQfrlj<r8bn#})*hFmIsp}NhGmLF
z(H*LwGtl@(L}Y$<FvNngY2rb(^NOos@fx_|mz1q=_Ywy3`8EioS)B3#bZz9oa&i~m
zf7)de+%S8f1-%69hh%%{+nH73aUi`cm{WS^P;h*f`=0#gB^UWLs;vry$00Kf#JD$5
z&<=sqcB1j>iJ65~G8__U^M=W|BtJiMYnt_MVr7OP_nA8%uR=`!1~LSjuvgb9(J@sx
z*zW&!y-2;-%UPUbg1M)yve3GZ#*SJV>QYs`(^Hf!zAj#%FBR;NwKlS!X<hV&v3CLj
zzfn)p_2;pb?eb*X8gRQwebW=<f*RRlC)kkp@2%*8HZVM2NHAv;Bv|kzJYLA?WOd4y
ze}haVRNp*H_?3V&_qp^tahbPsgmP&?;>%AO%fd$tEcV8EEQgSosi+6HYimkxAaKlN
z{^i$?etht=$^W;W{6E#jR~pZ?h=Jeji8%o$INchB)HjK`mRSt|0;r;LOGcboe5K$>
zrb8kf&XOv`MkPcX$K@22hbN5qhO{8U)r0`3lvoPX4gLv2IQM8dxwd}81?$Da^A7|n
z(0?6Fp`LH0#@mGHB*&``D!`dZd>-%=cfU)lF&OVV**FF5F?a^ySk&z+(Il0SITCX3
zXT&BebnR5;&MNPmIR_sMHg2_KsJo*I^Fkv6Z*f9T(@`BKi$ciGkr48&1{Hni<;f&A
z#<;PgGq#QF6C@wmcPj@qAgr39()0mRXF&+2w2=yW7^0kx6(Ao8n-25mZ2G{~6Eq)o
zjjOO<P<IfqNV8~M2JpzJsT<@U*W~V64I4Xd3r?{pvpg$MXPCoz{QYjovythUe*YeM
zPE8zJ0smmZG?g?jFXk`ax2D5##GP=WKef4YrV)wAS7@{#01ugIv_gb{fne$i62qII
zNYMh<pBq2)$=d{Ez*uBFng}k)YO)kk{zR<FIpv3f`qT2uG=Gw=7Z~VBCvCGYZ2omF
zPA+Eg_20F6^hMlXO0VTC5s!CQeYxXAZo{cD$WBK4o*#%uYHDktL%q)z#G?KszjgZT
zN|#r)a~jtfxkIBnJvLeDS(D${h?zuj=&dGtBy#U<qJtAHA)rw;uD}8Nhs_kJn>RrX
z+Do;nf|z(W>I=?{(H!%?#P=xfrPo(ku5zQ~;MfRM($X(a!vpoIKHK59{3&SVIMmoS
zp4dk1zP;MQxZZ?d7I0nNH6Qe+gN{on9!W(u0G^3X(+H5Oh;D8F**H_#NKp5IKL$+c
zPLR9UxT1hP9AEJ_<}s8U340GM=!a%Njsz6VZ_pReD=eGtb|UF1h3m#CbtC0i{{Wc%
zl)<nnq&M2=Jv5a|UR5f<a{(-oY6lZ9{~$C@`E7>zNuwRCdLbaymFFyK9@5Zd@CY?2
z^2qgjFJzZmIr~p=A(VDgTi7!BfPxLNeDnbc&r{I@MPQt4jFc9uX#XJcQm!pe+;@1G
zb6!;&S`ZNzOFT?Y$bqZh-u1LR|A#|%%)@LH-yJ%R+W&zBSD_r6Dzj+Er0iSqWIG;5
z;NE6Cd>gd4t*wn)id3UzA4!K(F#c$^<2a2rntxD;aBs}}s66DGyi!?_mGpH)GBe0H
zxv2C*4zKLkC=gae!I3MLYc}u)&x}O~MEt}!0z-sN*K^f~Ksg=lS1sY0C>^Pq(D<s~
zLMgP$oF?5AtfQ>#)EM*_LylT&S9=*zdh8@4@jQH=Si;tbTg~lDb9$W}ZDv&QI1Xzh
za0TQ*C1*5A${N8Mq@Q_&<Eey&<7-ICqHrn&iXwA(aCs;^&X;9bv?il6<0f>OLJU8%
zQxMIA1Yq^JMY65hstP(D|E>zpHZyVh%MB&?OI4(diy&Q2lhgFdN@m3#ON16x*?nEs
zlmfXF9G4jZR=WSdLZEbly>Jz3QarE5Ef7FrtUhb3PkP6;l7z-XRS1-rXNMVLe7Sda
zo#!IG{)X^ts{!A<-|NtS<C_1Cazdjcz7TK(y&(%R9klVLJA306hKO7gi6Vv6B$Jq2
z9N8{J2a|TC5l1~T>`@Ix4Z}lXUDW*npao)NT0(b#%G3lGxPsvg1ROYczaJ}NC8r3K
zq0EcQ0D{QABRxHZI_`CF3Wf^0J*C)tf#*k=HIA1AWGNb=-R?{83X_}qodWKnoyBuk
z*XJn6%jqkmgM87L3|7=W(`u2|{y4;1DI(>6l;M2inf?At`vIDO1^-O3lc3eiY_X(+
z^rtue(gg*kfLr;7u<r@v0tP?v!>$Q`@~tW+mCpTR(E`317nDWtnVZ2uoK2?4A+Yu<
zDd4U`M1&4`Gxu;*vk_ucz=yk$k319T{$ZH${VeQFb&-iOl0`1=!u8~2W#gxf28a*I
zD)VY}G8d?>$u~O@n7NG-V#G)@?KcRgF?ypuC8htG6;07AE>}^h$@;)}6KeaHGvwqX
z`p%XxtI_7#o<-Plso0wZm$#EREZhj?!wVk4pyA&K4XyI`ZU|Cu1WboTd93^8kMX~u
zz+JSlL9aA0bt4G1e(G1NvRx{jxXL^rYvH2g`D}|_S_-8YEe&9Fy4U7<%Q*56+cSc4
zy5`p2I8Q1@7?Ks<#~HPGq$XpeTMd#ze_J6}K#pbzjqFc_b3REYKB{>FC3<an3-W`~
zKEI`Oq8hJV>aa5Xh$Z!<04X3(IUYf}KrU_8q@tVK!x<inrhtKnH9`62nYiU#jh+cl
z>AR}(B2AWT%Q_(CeWWa!8SSe6N`(M>3s*b=K9)`lAwU?#JMg?T1*wK8wVlr<nd3-J
z%1@KV3TI%Rc`3E<_m>_^@kOpIiJ;&($<X0YOkgLB{TJ#U)@8r=TUkxBM!Cmu9g~dt
zTlS@H6<jyDl?v@({HB_qbRdJ~IH}n?jmo~(kq1CCb`|b{+;vnN*dx;;s0B4fvHtcu
z?aEJm{l}}cOEEQjMAXIX#d^1&4pjHqFFT4fNHOmeJ<V^OwG?4oNAI%uz*0v&))*f1
z)UBaE^CWEc25&M<K*3COpa>zMf->>joX-L3q*9ahWed++)MXuQB(Of8Z|7~;0{@=!
z#R9y)6<;g#a>k!>>*ohEinQod>n_=FY^3RXJb!kk19=Lk6Q@U^(J@B6-u`-J>R@iK
z!pEbiP|$ULWe@B=4cO6KO2y^SEbm|r`{Oh9!ug3*oCcHS2Wmj}E$KaiTVFA=#0czS
zw1%i2+mOmWYTmY+My^6qHyM?xe&<hRxEXu_h|4F~YJ2dM<i*%NQ`XFA?&Fw~bSIH!
zA3JGx4><aGsENEVyl+>|rYP0oN0DV^Weu7I>rm7ZEr)et7TLS%PN`ozv^4m%h@vj`
zo#tBe{+)A!a*r$NbY(l5ao|L3!&_Y1#aJ(IAJBzgO}{#Hyqg?WBz`4e<5(;{jGN~*
zQrCB5S6r@CS<lY?4I+7OvI{=4BdPP!w9lG?H+t)ERcg$dQ!g)@Xmg1rPDW<!$o1tS
z%k|3B;!a}YIb?I}vxzV)Rr26f7%}k+-5=DC|Mlo00K?0WL0TqKb!tLZ?NZ+cQXCD)
zx4^Lw*R$sYqX;=Q()8`4HT6{26KMjDd%q+!x%*cP@$V58gb*f>d$^Y)L!L;DU)Zat
zy5srkiXVPbfcDfCOI|sW6q>7AkaL}Zw9ee9@FxUkLb5$zk>lghB}A=N3$=`RMA-4h
zp>{<0=$;!SWpbh$oou(-A>{b*N2Yy6th;46>}c@~ULd82hnzGN>`G&*^d_hMj84%E
z826I}X-cE^N)<}c7>J!fF1R5F#VujD9jX{fvyv4M`9uio%YA?!!?Q(3yfw}w)TvR?
zmAtQ1-si92@iI}Ovf}T0m+l=>GtP>nf{zHUVS@xD<AUZfCA&_VS8h2h{DgsYKIox#
ziM@$cl3VZes3rn#BN>t7lAJaQy>Dy{d1vr^5Id7enEe$9#ZQj~6HFy(Bu$e5!gmGw
ze@;TQ83Xh}$OS^K>I|Im0?RE4IA(EpevYWH1Xf{^Xc^DWlL<+7+vQ6NBDkP$zX^4I
z7^q{qWiy$?ap*^5;E<Q?f8YBzp8oVI<B62M#bvp5g^&zo;pf67CuQxqmOOjam=gLI
z1zt!(7b50L5AHM?5wf2sBbPBdMhmM}wB+iQ0<vs=g8s}ZZ+C^Vv$s!H(Lwr!i|J=M
z6Po_Te7~VKY>axe??Zkr-B$O)`iHpY6}rO{n);gd7G=~m%3$nymTv`DUy_%u=N2|0
zVETDRhc8}{V!HioY@aqQfN|jWh8VaU_Fr?jlf{u<>oWL?)GQ#Es!j|*WsO){10qoA
zcnhl`$HSZTSy>anlRh0TDV=5KFCiO2e(sQ}?8|umkuJG^Xd&8$<v208OLhJ3sitWt
zjN@Wf|Nd!c*2m*utOv=DCdpbWMthN7O0E4FyQ69JwdQ!w*hfJdVR-|;3k1DT=@|wH
zSo03f(O|rO=MCc!s?pE=*RuND#s^kL=c|^^LeaJ)nwTQ35w=SN*gsZG?TH}Plj`%5
zC6h)ssd&BClx3&%i&<vmP4asvzK|F7hEK*<H5Uph3pq)=_4u^TGbq%N#VgG#Myo3D
zLX^is>rBF~k4wyHWQ#M;w&3Y39BO;NvhhMJ{&Kg~!=FE8xZBvROb=4`f1>wQMpr@<
z(tBC(cqM|oV;dAcy2v`4IH~Ouu2k@*+Ay0p>qo)CQ7&UgbD0QGo;UL{gAj6{q%Q_<
zZ6=sqrc9W{p=K#SYrfdxWpR6dBUD&EWy5>B?O`BlSEtR|SkEY8WlstwzW6E$9k0q|
z2m>ArR`2yCZi!5IV?MH{b0^a>+W<hzB3inxeHi>>6ibN{cxF7s*8bl<xTp4_p!Emq
z=M~DalYXO)<yW1b&bX=G1gFlr5b{LOq9CGLVcfU)(}9Lz(%2bvHc7Fh57I4x1nLdj
z#kQ>&u)ct`<X5AOCAHV{SIg^jzn#9UVt<?XN)lMO*W&)!#LwEkAw!AG3pu5aEFcI;
zfW}rnQ@zc@V%Fy043Ev0hqLwN+rER&>KULW=lrLTTE*Ip|ACA0IFWK}N4~p+A&a7x
z0mm@B>;A+BKS_fnA%E1c_oaJ6bJ%b>?otT;wdGSsBk`b0#;Z$C62S}D@2L=8303G7
z->|`)hpJd*EHPpwl1!^b@N@&lQKFxtALp1cTK!JXcWU+mrJir5OvDz}LM0+y8VIHc
z5xkaDg4OH_N(J!uV`|XyWR|tnD)@AeS7-=^5kez%_W#hxp_*1B75?^rCsK&vE&90p
zg8tXsAmVvCYp^)eq<@`Yx5f*V8gU(0xPo0h%(OT{GqQlAS1-)6!U)1C{j6!E>pZW8
z1iKCDmfKn}?}Zl-iCEOI;esh7@4e2;vRR@=Ydu*=PE#Fik@M_Z`pdxsbVJU;xL-3~
zYf?$2^NAUnY$dajNc+bR)r}TNIR{dorN4ad`4D77J&RkxFdD66&_$L`zFoig_ye4Y
z^T-IQ$o|5Qgx(UL#j~ajOvWrOwGOon`_#hPhdPHofQR5q=M7{hV$(w#m?|*Kkb_Vf
zb@S9UPOsLBL#}8grouKlvi~rb4tJe~pguN=C$jttKKRocVXEF^M5!RobmbV?^((b5
zL7aS_&K*M*N9xtB!8!{cAr-NH`(htRWGVC+>J8aNv;mW{Lr@1Skel=_!PAID*q8n1
z6#z&BV)#8RNIm@9gCs`S6tjC>!M3f7%DhR;ZsG%tMo-wi>F1ND3pN(^d9ri&&yUi?
zkVJLSgl`|EJap`wuzvggBM@*}9FtPl|4EW7zc|kmb$5^Eh7juNdFV~I4G4C&6(ch!
z@<{cPQzf5lM!Vb{Np&ci6_VkQ$J*-OO}IuGkqRbtUxIXv@Xyd89t_So&_)ZaJPf|W
zm8QL4ox&gjc~nU6c3nr!GR0xZ(5*vn_UR&Ubj`~0rkA<RRrYWk$U6*H;6)k3P~v&~
zA@pE;7wD9fC%mK}L$~J1DgB%1tFjP%-&=e_kv4K#FGzp|MNWQAs;EJ~l)JRE)LGM!
zgsCMd5I+L`+WUB8ru%gld>Q|Y<FaULk+8>fpZ|CD3reP=kp#dJfOi}azq>J+)xvo_
zzEOkGb^9q#bP1fP&~!!xRMv(*J)eezcmm_XUeo)Yq_y%oc=o&)g7b?z7#TME<}S^i
z3$I6~>ggy%{d!Qt3wJV<^d-i$IrKfHVkva~P#e~YX%-wgEs@8T+OlxHC)E8=U~52Z
zpD|{oe*Why2`dE0Fwqfhj65?si=R(64+?U>$BJGcM>~J0S@<;JUR+I=(-AmI$YsGc
z^sSVXHT{QtGbm)93Pk>Mt;PCTicG6gymR@?UZ3&$QzW6lVgk%(`1{O{QA7IFGRZA+
zBi@hk83(2I>KQ3gjvGv=s9~sclYW$GT+)pA91SIjQZQbZK2bbckE5Vjt_tsASs7z8
zs_mP<!Wjm47|Q+FDS{+FjnzX|q;HdG`5GCSE4+6T^4Y8~27X&=E$6hm8++B_Y#cue
zK${MQr~Smn^9v3Ww7K|X>m>o7@eyX+3xxu^wk{zWO%8s--5O&94xu_Hum6pV{tG9a
zAp!mXmFMa4*n6nrjo3vAl5Gs>jm;;CnzSJ34r%<TdD}>&cfBJookYPOE~s#B;2ky-
z$X}-VwTTA@UXiG~I_Ji}hnU9@qhcWkLl()5%L56}7x7&gdp-d!q3#Ao>dE*RBqs{g
zETu!-JDfD62^V3{kUJs2voC8xs_Moh2py;#zvIOSsu06^j7Zv0tJJb`>aWQxdR;D*
za50_N+lSZ-DlEQ)nI*qLGOwGWHB|{=*Z(1MSw~A-@mbJD_uKP>13E>L;NP$wAwPOJ
z`(%G3S4O1~0(lLStjZGjb)i5nb;X9>^^-T7R)Il*>OtX4-ovnLSlC-5RN*x8h=Ql3
zt)Wki(yloGzTUvFZpGkO7GioPaI=;Y>cP-M+!t5OB7zDOx?!B*va01e;(;<Xh6nOU
zY=ppz9Z#QCNY7;_A;)=}<0vNbkMdz*yNe>(w(5}YDMCmO`)`Sr_N?xa%hnDa)vjbN
zl_V+RA?a<5_+@_i?5)5&Q<;o%Nx>4_s$|*j*If4*ZP5Dn&+k|AbT+;V|BGiY1yrbW
zO4eV$2U>o%%$&7(mkl6zZ~WQp04p_D#x~ib#A;*xcSlG*D*c+-&o>{DP!7@9gbAs(
z4SL<t%`4n4iRq#NM9j^294;}oc)dN#MO3p|HBPf+xN1-oAw)ACIVDt7Q84~|e{jmj
zXaPrVQNSYL5<p1<0Lc842_44XB_JYcmD0z_QC>`<Txy#GE6-!JocUp-gb|Tlr#M7;
zA=&jkNd;%d{-aq??~mO9ZF#WoVH|FK+QKu$$>GWgV0N@6A@9~mV~libAfI(k9eBA*
z%-!sHxc;8MR_J$2<$yDd*k{)QCI<7(QrGA#GoDWT+D)@B=8-M%jF3*Lyt0Ux3~)R_
zk3@?Y?&5dodkE=raUj2lIA&f4^yteI&eJcTxrPmwCVtZ^mDT-H(@!-Q>HcmGW8MKb
zZX89xHCJ%?9LUEey!%7?TfG24z#u7ilq~{3`jT=^PqbFOfCmS+4L8Dc?7Q`G^C)_f
z(D9FfQ%()o#Zkl5_wSK5>tou`j@$Ks&3+t-@ya5Gn=F1LmSnC*`N*RtYgUO-QWkY}
z2dJV(Xjrj9=yStpinRDwO@?f0x@^v{m*n@qls<$Q1Et4)5jSiJv&_j$L(|D&^`C&s
zpXYq(EMNAbC6p8D1w<vT{pheSW_P_9N{0I48vJ?(BCBc_(-|kKPIKHVC8Z^MfJIih
zVCsVcmu*clK}RD}6EX3QvY=tB>HdzmilsI7v5zS-<+6(fQBpAJkn)5)YcJ^+$<!QY
zF{_dhCPN7x9Sn=3lYQREiJCfGB`&gaAXSDp9`)B1u4u+elb#5VDFy5v!o($Hct2zn
z4Q3_eW7hshJtT=;ct!K|)Ab*F^0B!aht0*-;!Kox4j<hAN)8vi(A;9$7adX_gb%*0
z^g3ES*QQP-N3LJF?WX+y30?m$F`WY^rezyszVrltD@PQC5!ZYQ<p&e3@0s=H&pI6S
zs@F@_CqEI5afe~f_ICwk+);&MGJzhXD?^(3<l!}jYjCNt;Uu5Fh@t)UN##wZ$dSr^
znIzXe#rTQBLRf{RMNX5NefKw0ay9^d#0FW<r_Y8L>JyllYi&`IP{}vRdsa2tB>bAZ
zrW<o5IrzzF`XBJ_%5#0!c(XZwiXAZZYC9&Qniq!wBqgl&@iDp@LN4=y%wCrp(W|IX
zZHUqnLmdb?7UL)UWqqu>uTpnjOkaaGexRYu{}B8e(Y}s$gm0uuz|nS?zkpPP7TbKZ
zR!hMy@@Yq=Ao~Gs$xHdX&0T5IwQBs;mK@<`aKbQS%c23fedb0H$sg`1=Z}F<jD`N<
zlg0Zk1A#SOcr9y6G)K&TmXO16T=*yo@BKKLlU7CaLOcrMmkxR?HsiV%Ofmzf2n36_
zHVWZsN+0qT-5BP*GjZfDWz;*`oO8Hec#H^Ge*53l;2~9TL3sM@?axD=V8htUOV<<j
zV^d*R<U%6xwlEu|e4z5QLALzEsFAcaE@pm-seq80`dzx>{hLC`5S&=%k0mH2ND+W-
zDvzXCy8{npOs=#3pT{jV%h(Ejh$HV<5Y~6z&j1Z1Q&3QN-3k=YP}`N!Tc|aiY0{-D
zBv9{U)>8<g+H7nSlB<yrWBoPq)NfhLNRB&u5~N)-uR%s2>w)?BuO=Ug$}}V1-MD32
z@gP@v7(~z)enuP~7mW0XYHbiAWMq~jWpilv)(w9+FG5WmX?jujH%;*hymdB=+y3Mi
zWr0S8K@RQ}s_k~(BivqVEgteRKIWOjyHqukKxLYT;&H^07_3x?CxR>b@Z+C2ag!<*
z0r^Kpzy0Lkzx=RNyETayJA1x|LM3e<X#k8mojFIjA2=OPX!lX?uA$5z8kcGjFEtJs
zA4x>&p;QTTtR>TcZr}VmZAgBBWm8b{FylJSsNC~dgYwAf&R$G%oIb+IJX9{k*<Qt!
z<FJ|=WgG=K2DO{%GtZ3{&15$)kG90;fJYtY_dT9=zbe3U-d=Eg<>sAfD**s??X2Sr
zFv2iGO|4{vS8OM-_h(U=_h(d(p8LxQDtdNiF_lW{(2|(URH@C7^q%MTppC4~1;QOW
zGQmJ2W<9uQJv}2(f=1uhtm#=mQP}y8;Y*0cJh`SX^!xjFhY)J)8vf}m`5o&@YCW#V
zi5!UM==50)_`O2DHRWRldkaLgy=V*IF}xL474-l?=zqaXecR4G`;pq^ei4-#mGYwT
z3Ota-@7zSl!8I~j(|gf%);TZ!^=ka7e(<m0qG5HM-m*P;aO(1&!rj8)ZGRbEcAo*B
zeC|U>=m4S!RSYZR$k<H3zef97-}HR-d`)L91()chpVH%>1WK||a<2b_yZH}75v@f1
z2utPhctYKl-!NT6o_rez`BW51UC6Wew$A;tPn>jpfi+Ffn|o=+;2LY~oO9l;Asl5U
z5bu|fP37ICm`n=9pJn>DYP_(%?{(3WhH(OGv62RF(tGUKb5@A|?2!&m%A6AX#>aIN
z3{X9=<O&PwRUme^{S**fnF{K*549X74(r{%TfE*`DC4Y?Y(iW(NWYE?q(E{*%SrX^
zQub-{_$n$OO+Yd!7{`Tpv$~deVfr|lktq_@zE~OlV&)+i9Ji$OCQ<24buFy{IA!H+
z9D@R$dm!%B&dJk33^T}*k`X?imO1=hUdkJBV94hLv;6wK7gle`L*zB=io^u^#eJ^D
zO^&o6(>68L^mbuGUtQwI9hfxi(}7n}{l5>52tuyIJUqs|PGKWHbfBZ}Y_397=9qKX
zeqy9sB{n26!_2(bgDD(_6CjL?xAv&JjG3hVNOX>nZ|v##ln<ByFm{YO^EBcLWbfrr
zLPbofGn0I4<&@p@MG_cy&<to5*8-@pT5%2e+HeLerHh%y2uGTVk46(Oi7~*7h3PxG
zAjs_Z5{tpP1@h1+EDB4E0W>76CeZez)aTy!JVmlMj#rQIU^Y}-F!FAGohJUL;MCIj
z>`dmaHdQ|-;G~qOWYdvx<Fc{Lu;d9Tp5hym7MVBcwqglC^2rWlcg@xWYOLeu=Q1Ii
z&moaoDA;6v2nTXCq1{xtm`{{b)a)JViBbURPswk(l%C~ak<*s>+)`E%YgCp7qI~<0
zwkdD5WYHy!6MsL*5}9wub1z*^*wZ<vgF}7zx8;>!(yw(G;r~RTZfVOxjzjeVG~WT%
z|EAdS)OBc8A*>=Db{30wFWoA2Z{2=*^`a8bd&IukyxZJzP$_IwcZjab5o6{5w(Hr<
zeyty;e+pwt-04M<A$2wWbxPpooO&2DtGV3sy^#nrN|^KSP?N8!8wDxTBNKgEqnXR<
zJ<hR#y3up(%<6)J(?n_pB@X&VXi?j3!<0BiweFAlQ|-WY>pV4|`PC3ZfUY?wUORC(
zrZ4!@X0N~mH~}M-?q+6K=d&J`mf0)`>|&2UMAJE~R;5>?I*>b=ayybH3B7?EIBz^O
zB^Ze#7-FlGE=6mzo0Lzz{tYfKTT(XZ(vRVw1VeHd$B&Cf;`%(azJh`?_v<%lQO17~
zwD%u~Ra>Gp$pXK2xx{vu8p+<vYQjyr8S8h53=&jyr2W}v-cNmq;_?nG-15%KAgjU~
z^og_0vO=fW?N;_G{IIXxS(7cIY#;egQNbW6pQZPCvW&9KITS1{28_{{r&xN&gN|Tv
z=MdP0vlse9Ok8WOs)<~K&YjgDg<}8L|Da7MLYXiS;Q#8~IO?DjTdBF}uSJnE09|Kh
zz<Z~D2yS>_A@1+trxZJKt=EAbFabwa#r6%q#}-i8nFsEPIRaFE&ympIr|!?VT~cZ}
z<?mO$7!Z+|ni|ZubUz<VR?CiC)x)CLTlYw#km!%_Sk8lvp0ECgNZLFJaVa|8g$6#D
z#AN$6UPY?7z*T<5tKr@E)+Yun>4GQ8>oDY{yt(TBT^#Wb7)9e>yfL>`L(Y-u{N-4d
z_+#qTDn5e=X0ebCMhwG<w5|tYG^OD}|1wCqesb+4Tl<z}Ek!LDr@f4oS=E(d+Ume#
zYGm+8ry;Q_OMGq?XZ!o&%}BXf5-M6v4Isi91IfY2TSdndyHV{|MT*(?gRHxwFm7XS
z<y%eYfScLdC`5#6Zr8~1s%){93$Nr_>ukjSXd+;Apm{vf-LZczk4yXzmGtVaUeC%k
z5QrhfOZ>7cXtp2n`W1PDh8!Zh;s6|@qT@P|hoz1epHwV}nhy<9NGeG}Kk4MtXshEK
z-~Anf7@$+JPXPjIx*v{0SO4$}X!8sIlq!)+Kz&#9TRq_jztYPnmd^d#mkC@vHG&KU
zD$PQrOVli&D%C|?fodz@krlC&*-BrpW@^c$!@4SS!28GGa2{g{qKGU~(Yh2%T4hVv
z2kC1qc1^?;Zv3WV1vsF+xSpU6wxp*b4)|ktn0iUq*yLvy+1-c@p_Q5Ypcu30V3mMU
z>8q0IOY+?+g#O|rRsl~$ONhzSl-EV#dn9En08BO*i$!v;4M4dC6SgoabJm;!lp(Ej
z@)#{XqYoGX?UbXkZ})w5h&{?O@q&@xv+0Ieq*Ip$FV%CKr@BuM3uDJO2R_+-^tgi<
z<S(^+-~wq25OSNb*ZuILtp&(UDP@E&V@e-Iuutg6;c|>uQj+t)E-7Von6`EgO7tp&
zY#*Ca55k?^7I)pW9~W%DCy2>pWhu0`x9`(86H*;uITYBf*$RL;SGr+Sr3Ym;k&mS5
z=yAS6FZHn)s}USiFF&^9xJfh(1<@dn8?!6Jc&QG#Q<y^@US6A5Y2)3re^s0f$%a5F
z=yTU{<(M%E-5|AAA$N~$-B}NBe~lC=#`R@;tV44c4B<)TunQUehLD65rV4uqMu69R
zwb@^=V*Lkl3$(x8E4NOIZjSOF10NtPE*GAvAS$Bkj0Lay_W=RxBBLq1T=Aru9EQ*a
zvx!jtvTwb+hX)DO0!H9WlCn<IdxPNx=}AID!J#SYuSEK3gIEX|wEGb=Kl%hvE3-HI
z8Zpe8!TGy!^?L-N9$NN9TC@AMY?bAPLyUW-Pa@{k*4nALl_K{N4HLc*5+eVl-~2aA
zVfi1HVo=lo0F=m7R#IZ$55mN3P{t%Kkd9#`LB!^fSBBzRtgI<L(ZG`;*VGR-Zok28
zu^vdaj+PB>&jb6uN1<3UOr7Tpi|^Jw8d@m)mCn+jNAy0hvj*@5NOam17LNz<z8ktb
z#&%%QY#Z^IW`osB<Rn)`_J~=&@wp-xZj$_PeW5THu42Fhj5`?}dWWea<0KPqAe)<F
zML4fyDE=~_%|)MCj(e71A<6J2l-F2587W(hO?{O%4EY+V>b+x$I#c{NBpB@&%Z@FL
ztIF^9$OD5o;_9xgut5aPD}wfNY&?!M!r%G#7h~4<{HjfAHdf`44_{uA$B+zJ5g+8B
zC1<>WgqGIiNN-~awHzuqFF>nYFzGz}bTTgwo5RA0iTXL|&r-%&9x5tjRg%XuGo{;G
zHOfK6U+k<4ajbuobp7%g<FklR&_9DO9K0TOvrEWzv4DWkn#9=5;zCKJ&;+&X4ZJ5+
ze#<SLm1OBP^!l{ZXmJasuiF@CWuKNWMwrYxYbpVj)0eef-yWo<l5zCyIi4m&G+)N~
zQ;`}6CTqKRy`~vboYP}9ivKfwB?gHUVXkgTcztQE*-88^_CA&#6?qt$s-eWc>MJcO
zBDi(vpi0rfVrY~rK0OYI@ob(GxSrMd9gkzC@tr>n2nzplppP7~7?dT6vwXF61GM{3
z(7pWTYoIY9IDuUAt2ASZP4p@Q@tw+)$u!Qq2oT@T`eW4fK-(Hykrh1L(Y@CwV>B{H
zW*foWNd^j4Op)fy%B8JLR9cc1(f6;DyTCwC07Q6JnY`4)?Qj{8q*|XZGcuds@#8r#
z{gBq^1`I&7tKaKYUpf}0n=JvA>z(zRD~6Bs|8Of@aY<Sc8KyN^^36NhcdyK6FROWv
zEuL~-4_a49Ofqexwa{3Bcf(Lld8UjOTH&MeyPIJO?1oeRry9hEGV}<fOs`g(D6~gD
zhOm4&=)b@TT_`<ptl*4^1th+9Q;9ziN2rsS`Wk%F-S455$H$9w$fJM>PQ(!i0*uwN
zWoBEBY%{RPj%DD+WS-uJxHuKc(Fl$idTa=kj-E;NCk*!rIGQx1TkItnOb67ok)pNs
zww1_*Cf`r=ZNv^3P}pa@-Em*pffTyQEc*Id=+{zkFGllpMcDeSqh>96d04u6OZnNR
z&&Udxu9W;FV^-!$XK1!auOZQ;s3N^+%jtCIG;->;Z#7ZJDtuGlsfo)dPj9<!Ofnnm
zUH_^jQZ8wudUq-{O6ifQu@FAH_4q%e*)Mcw+_<hqr6h2DC>)XbJnQov&(H=E34~~i
ztS?gJqehcarW2YJ@=KTD*~b@V_;wS?9}j2@HGJ1&8}&$=ujY{{&vsvam=<P^wj-OG
zJZ5?0pa}(|lG5=@YePxl!)O=>Xki;jh88`z@o0nizR{vRR1WjiY<-qHH>3;dE}@%g
zC<!n@&HwEMfD;|G5#oa2M#zXOurI9Xl`xlk4;4|lF>0(wYI7|b8BP|X#Y}WwvzFSk
z0ivhqQ|O?f{fXY6ylIO2%JHRJ2kDJ~!BuFNlr-*Bv|}h~R=s5m^2-ogs1Op0x7v#M
z{WvN>0UTQYYm^@3dEO{e(`wBK9AidavWjEie~UB_M8KiR@Eb*eo{(ajOX@Ssd%uW9
zZ2sk$deo=B_oPb@IpHDi^U=ePmoH?I!E*Qr>~+|Nw1*JQAUVR=56cYZn5*iHB|rL0
zJX;MDiL-|YVbaNz7jz&?m6S$8DIdhw>~!0}<Bb>RXKwzuPEAvN{TUJWzXkRN7x^I{
zlvyDnJ~b?kuNQHymNUd!Ros}V(=uODk|Ih@Dq^u~K9dN*^WUsT-nsRH!bJ-uYO<sB
zOz<ip?6S~!*n{nd0H{-bkTbD)ifvtw9Xj0sIF|SVC#$GbE|yqZ{XIsLXD15?(<1=|
z85LRQx9L(wiagj4xT#IsVQ{;+TW6fg(&q8<F;ASQ`wB%fp+lh}5pp2B;VEr7#?sIy
z1LJ>PYd(HztqJ6>(V8_KDp?J6L{b|&F1(&FvXFnr7>~gV@!JU2qRd_G-Z&c3XDU-F
zBw&q?b{gjbjs@n*+sVcy7OGW*PtThPH}}%!ohou9-IzSQ%@j2^`y|Soq-A39dU7lD
zQv{4ToGB7RytJzbyRg}@z4u)iDs23rbFcX>^3~|cL`mRPnpB`d2|eSd>%Up49o{91
zuJvm#$sJy+S;D0-F(%#}FFFYT2!aJ4${t`I<-&HE3SxWm?me}X7ZAl{p~P3P@75GB
zMNy4${d!Q+QZV3N^jm(I0-OYsk&oUGdlX3#RW6OlB3(}Jw~BKd9DbF?&Qhi|hA7GW
zx`3Cx{xCF6I;H1nS{<dWrNJ|}*=he*k#4B}rRCefRXawkpC9asv{-3Wt`gTnW-rP2
zm@kEHV)pJHg7;Q$J^qjJb2uya=PNDzMx{o*=;sjs%ZWq)(1Q@tg<$x2)MJ^K>IevZ
zO{Zx6EXAzNximB={1-<W5ihTpp(u65COA+8mTEX|a0@5q68yf~ri38`Fpk081&bI;
zZR;dc=lOp96f#OmCn&xY29D{o;23KZ_!b&i6{bu1L0%uJKh2rK6`cV>d&r;epUOz_
zGyrP_uP^wB8NGI4<MkXO^?nX5tPUK@(i}XBKn#SxN_ZUG5Wo}~u@SS4x;Qf&CL^7!
zGZiG%TVMRO;mAk!@+g<kRiu!4!5I4$>L1BdI#@bbBc^~wC}B-`^E8dZ2!ezh>>qKn
z@gpu*$f#QyiG>(fh4*K+JpCR_@K#A$uzSmNN&S#|?n4g(q(MBgZzosGMnax6=;b3*
z(A&EVa<RV=R7BH%p>H=SLeA#I|DlF&@Xon{4S@0!?QnuRh;TlJ!u-Ua3B-D|$nbhk
zUB3jfL?;7B9Bbx&a9$>yq;}!WA%NSgnNu=iFgvyaK=W+fsI4xLky!YRCp7{qFrv3B
zl8w#Q(kzx_bRvOCJAV}c69>tKY2*Xrj&{{mKc{=?8}Yqsn3v4KXjq=!*v6_NHE=pE
zu6S=GwtBP+kkH@41yr=9Zl#rfw5wrbzxt4%>jFPaK%(Qi;{hf6FocMlp>#aVXM^YH
zm_q$kJ>0AqB?i?_euI?6+b6}gw=98QX+4M1B+zPVFOiDxf+o$D$6Vn#bX8i&n+bLS
z*Id_<IADmGfk`n_EG09S8Ggk|fv1WA*1@On&ETo;uRKLNPji<vA(FxRFbJ(En*;X+
z)bxL`_f}DHv|+j^8VGJdg1ZHGw}wCnP6#ya1a}SY?iSnvB#k%j5ZtYC2*GLGo!!}c
z=A4=N|9yRK&JDd-EULQd^T*!0`m2n}$oGw*72gR-ti06HSYC#I4wnWEn?*OkoYfv-
z=1x*ho|bE^8#fcr;ux$yG0gkSVcKU*!^21E$Rlc?a)cE#A+>r2$QYMx!6TMJnB`Fm
zfbOP3!L|6VyMc0<0z#oj+Oc>M8<D0+1AO&8BCUi52uO;y>VAfv?#pZ>+vP>r>U;s9
zTa#$AxRW27PB-E8yVS+&-}|~8Qcj$?9pb}&bb9<E{h#&>*!Z9dg(-*H--X@=$2M{s
zUI=s}4iajzd8XitS}gM|rb#yFsUa97b)O@~A1w|HQ2Hr@19pm-M<emsn)`CyP5_GQ
z84?(UvP9%{+(EMenq}&6WQEA;@Tq8o3UpEuROp;3gs@g0a8h0%zZQ1C)Wl`82nLU)
z5Uh;(YyEm{{Y?(Jh+5MMeT3m4m%}+wl;-nzu1ZAOghV6Eu{M=Jt%AyuT#9UtIq)$<
z!Z6+o4V1_G1|Ht~_;+Jr%Z%((t<%(iAq@MDV3fL5*6@%NP4FbEpG>XW?HQ5@hvCZA
zs1csA$~PPA99^+~vW4cySq>qx{<prwntY?0gbM|UKHn9(y#`qj&OPXXyvj?vQ^Cl|
z5f9eo>Yz><9V%c;YUxhYU;!J9etph+0XO@|X-VMLmsv4ak-^N}e<Yr}<92yaOl-0a
zL0HkMG_OUUCg^}+QegQUCrOAg>2oR#Fl53Gpx97I{~myjR47aqg@TPHBWuPx*n&Z?
z+$<hW&8ga3tyPsRX5%u}vfmz11u1zBd<@PAJd|-l(&=puev6Shgiv7zx~bL?DdEF|
z@sFkUi<4X`3w!hcbcpdtjAM7)(fmg+r#%E8+n$$8q@7?P+a@e}C*(v@#o3tb2Ie9-
z?=}1@>d-2Dm`O85?n8~-hYmoN>!SfTt>4^3O?TryoT*b-%+m|0Yt42#i%ikXMOHEt
z0IXBj-`&P;drOTWl%Fp)OnkI+{yiHw`L3{JFd_}1SE$0OoR|$-y)MsI$NP~)h<;_S
z1sA4{`9e4TxwSJ%c0o3W5wXX1+bS?-2sS%OJbWD(R%JriH2_49Q$e9c-*k8S#Ot^r
zl>P%fu)l3?{2L;o_jme6v-8VP<Ms!ccPJlONZ2kCshnHiRi_VIAe#@nAsbKu{{;J*
zBv9AoF2v!p;ZChc@KD0GPlO(jc9=NV)b9l_a^Oe|<<tOg(a-<X40s(=&Zq{~Q_%^h
z2aGg@WfFqlH%t9-(8OGj#JwzaRF)*(qtvLAj7UH%a&G|j76(xcV&FQv>cSLoM~+0|
z5M#BeX$D!!sC2duw(;JyN?<0O*ws{>aOOMFH2uxpF*O<=P_J@^wMiCURgqO-bwEaq
zjRi(F7rU4r^Ay4tc!j{#6rJ$QLh~@Zd_!<HHxvLQ!)tJl)$%#`MUHQAQqu6&;;;xM
z(9by`p%v*3(w!QW@JD9qR=x;KXZMVAz=BguELVf1PkE2xyYj&)6PW(U4Pq#t2FRQ?
zRakCh3Vdwod=XOvt6Kh$v{oCQeU3Q8@A>m`EA!#?Oo=G1k_9P>aafBic?}ucLiJdx
z+X6Wt8nr=$fUPsd3U@!f0gE)J7K=w#adaxFEZ-KVhuI{T2^hp|KPvn&iuGP!=zD(z
z)hv)P_3P7Yw+w%&^i;p))h~Y}cHQ#i@I<5@Xp4C3P%FZI(6D*N0U4K55U6*J>ySOl
zN@&#qfPO&Y3>FoF7=!dX)=yVUCFf~!P2n{<*7FbP4HdIHZ=aJdZEZq~{5W7w^$Dh{
z0`*b>#c`*c56;Uc8PGYBU?F<k+!*OcMBBfR%db;1IzBZ6!&0%)G3&23?#ekrKEJhR
z1VW6+zc)~6t$ho3TX3j|>!y!%BaTW#WV~6I0#7!&Yk*_;Wzp}9|1wXo_D$1alpXPS
z=&Z|Ti`h;4>uKv2K0Mv;-IzHd0gQB|6!ulJw~uNcOrvsLGND1Sr8z34Acl0eT?r^Q
z@dI}Ai|?m&9A^Fj=jF1%R8Ui{H*Zh}p9A~-9s_6-;HJ1gcvP6shV{TmNIvvtU^Py(
z1cHuI=GCu;vT(}q1U{MV5<`$k&SJbUl?^+~?La)hPu}^X1-Z%`+EI9S5X>F>1b3+a
z!wgeoYFa?@MiG2?CkGhmS4tZ|N)6N}0_QRteT*X{U`rrNHSN#n!-p8x*OA8TahgYI
z66&UYT_a#q-_j8T&KDKO8F^_Q0D|&G%QN{`<&$@OAA0<ljjNTcJDukcmT4K3G&}H}
z!EIk&PpKjD+AreqKZ(=?w&Yl;v125XG3i*e-3HaGiTspeiKmtOWBW!3aVn^uut6h%
zMeea!*nD2n?tva+oIohxrc{x|nq+yr!s)i}k`^6yW|UZe-QKAaN4P`dJEmlu<=RV}
zaOv}!D;0+UhiMcA3E@W?8uM5Ah&XeCG;N>Gr?qM+I|N)4msBVHT?aex9WRX$oWhaM
z=y#h80$Mkhoo{Xnn%j*9tI&fBhD3aum;2l}MNqG6^_VdoQ2rWACm@x?C6b<ebLDLA
z^TuA9-G+%`^iljhg8ASPAlexWJ57vB5XhnfG!x$m3=F5)@6@KJjF^36E}$yn_d8A8
z0QeC=cs+qrHobEGzpmg&!ud!6R8JIq=C!!-;Q|hc^h*{Z64m!DZxF{(U%AAYi5`+a
znYyr#enEs$dUA>iyle72sFlxjz#lfLMY0FgUJmVFE<#$u+SqZj7~M}GUf#Xu$14r&
z_^!^+91C0$U;Q_{<3d>u3_7s;j=nksnr-;ttcTOA37s6m)DRsvC02odO_zo|fA<3#
z=P4g^UGPd?hj!K+{%H#f$>g%|*8V}poCz_u(GUVsDMg(}r*v-DpI)~$_GTEH_MA%1
zZP4Nhv8KwvoC;)YnNftaK;dN>C3B_Q)t>cGSXuFYY)S{~Ow90-k0Nc<wt`@4J*{<O
z6$gOg=6?TaWJFr(O}Uk^-t}VdMFoEB*b|U@o^n0!ZjH?DEA66mE&xKcUZfLz$#Y$A
zD6*USFDMrkY+NX?1&>miJZ3;Ee)&?Vd24_Hg`vNt4FQTEfn=+4P04r3RvWjrh4d7=
z6P;x6JFwJU!??}h*)AcNYhpcuK5z}5Q2Rtvk_^uXraumWIR@1;J|<&M!7PFk;)<2N
zMn~Q{Z{7eM7b_J2VV}A(yiMW=&%ckRq~c0l3o~n=(imFVb4i=l*_JMuI}N%WXek_I
zo=ry^-=j}No2gOhx#j-~Frrrbda>rqO#d-3=oWtrXuFo>0L*<IK^n9(>wDe>uee~p
z`whWl!6<RrG-~!;U8+1eE6)LABg(_uslqPq$c2Y!t9`8p9CE%|Sy~iKQ&0}iazlNt
z5NU(&eKP?Y>1Z0d4U@(^Dy0nlE9g6%Dz>HEe)iB#hAvbxgbSu1>y($0qdz5+-a8)`
zuC4@XH-V*{9N$7oi!oee;fr7elB3cCY2MOIO|xf#ri^1wdwQ&wrc30!C4one=Zd{=
z`T8++K0Cj|?w4O>itC@7;Ek@BdV6uDm@aEeu4SmDL-FQagEo&$N>lrKB;iJXMl^3H
z-%M$ECP$~$gtjZ+(TX_>69nGNl&S0+D1Z$L^fwzvY&OtVbh1llnOujN;Baw$TRZOM
zLZ-Py!c4tzC1y77lKQh-t~d{8cj9XqH`|{XfW+AF#yjK~)#18KSEDKP(Pk$Qnor|W
zbq69HGIvI<1_k!svjX}6jJsp$^=5_Q%1ZW_&sYVKErhBIaj`7d<;%r>%WQ=aex(y{
zmZ3v;;an1^Q_ea<pU+d}{qn~p;(|jslVtIg?ei1g&;r1td!bgaGDc*#;;yWXI1TzR
zIX!ySxF6rgvRvYR&A8X=$NCi^*X!YSu=4YJv)u_*zEqBbMbbm08ASb`RZvgH{RKup
zucaPz&utv0!T{iBKJ|Lr*CKHzsGpar7CK<anDF}%Eobmo3bgYRJ90VM?9|7Bmu9BX
z(6MlP*8|8MT=(aq2}a0oq#Ox)ntiXAu*!?A4=WtC&J(#}1sZ9oI2>o5t*^EUd)LDL
zn}Gj+lfbDk61eb7yhPi(+d?Wu47AhUm>!-JT%+IZqhxG2bVzu$6}=1bM=7KCcEQh)
zfDu%k(MwtIDI(Xa&w}jNjF62932gYC-pI7P@YM;isCp=Dr0iYEffF3t+zA0Au8I(4
zztjpE#A}#*H3Z-^$XMP2seM*pA?%+u6#^1i{m=~XYni28@G+i|<;?tb;$u$UpwpnC
zZIO8&xY-PGYNLS3x|cOcwvL(9IH7|K%*e%rR1vXk_)><>>Y6H=0iV#`5RlQ?4!i|@
znP<N`<H6<jN#)p#Iq!J1N&_ZZWisPgb{tUhnMaWa&uNUTJq?q>tYrtE3xdpN?S>Kb
z9ZZe&B+A)AIr6;XM|UN)Ge9O289-#vCKI3PuY|^8jX?esE`kH&k7O#8dp>UZ70hB5
z9hXVCvN$YTRYIq|z;bH-@Ex(%-Z%6CrIb_s@2(lei^%>6ic@DGHkjHWW6_pW8)X~i
zl|@&Elhepn+sbTAvjicI@JFDa4J?V4<=ur2-<2F)-dT7g6`7yH9Az9xh35C@9K<Aj
zPAHJddLz+hE%t1|#M<N$fkNafRJssjx~ztR+Pxs{ZUdW}A9HfKTrV+I((o>z*_}b*
zm^SfI_~OZ4O)#h2(g)|vkC+KwAm~PYJ$(Hvmd>KopB%!oSGh)62J;wcLbJiC%9WI$
z3kD8g85e`QHE3?6R5^NYnOb%Zg<n}Dce7Y?!}CM7#K`;NyFX&*%f$*Tv=TH^>ZL9l
ztM8!Ib2Q{0%<&@=6H`f-bky{FD87^ipGk9{Rvmg>D{T6m?o~|}?&jt#<~(E$cwvSe
zN-y)n?8d3BYf6#$S@9?Xg!yE^*4X@Ar|RxT-8U4pbsbe)y(UNM$A|S7tUPF&+?TH1
zqoRWw4ymDWnehn`P_F@&+o#PFRj?$OAhImLRzW}2ZUu*Cmdj)?f|Do9N+@Z;r$VTj
z!S%{@(PMPC#MA4`j-v7X-oA0+V4B_gMiS%m;1mv#EO%>idAHI)x0eM|Iz?FiL53EO
zSR2;^=9~mi3Q>CK_X{QCx!h024`)X-mS&~%PGk4ZC{u;m9O2_+XfVZw;MWc|YT)^W
zIr(>|mlPUjJPe`u%cyvCF&#?5PW4@_T9ub4(s@cV3Ny5*^TpaiJ*rRiHjb0|QV`Fq
zaiZCG>%KSox9aoy@)8ZQJrkAlUw6Em{s|)GQ{s5CM5f&aQ*wvE!ND7S?Y}ctzzJ;K
zhGDtTCJu@7+k*a%E%R<}jFBLDm;pHs4pCx{N}nmQI<jgLTkN2aZHa>4Hly5>%E2mn
zvpkwI>XH^BWhfXHo^BWhCnz#?KEnBRvPcI8`d-EY>44S!G!crbzOapmQlzwBYBXDE
ziUv%2=f+G6sZ1888xk;?XAF|-Nzdn+M{khwPWhxcnEU0&qMOvml##-Ofalk%+vw$z
zYi|%<#L)I$%33R4_a|o|l%RNNM!EU=$PFdm_`k>$S}fLm_O;@cE~p0ok`H~2z7y_6
z0}P9=;<2el9~=9EJX&)k@^Q{tLx|%;O4rwfN;f9A16b7K@KuZElSr-k&+`_#G?fJ2
zGM!fkKoHiV;GuWv866-Rp$&EwmM)71H8ijc)oO|?Hd>e%4GvX;DkMLqz)>_~;cT`X
zWs`+c>q~=mOtK$tlf;bYEMA5TlkrVx39(=h+b`LI2%>9pmKfDoNlm?E6XaC1kR*2v
zlWSuO5`INGo!;|peu^b~+{{TD8hGx&VrW^U>a8PAh)yN}37JT-cgTj10wM78L>`YU
z25hwy_%c<<37vaWV6ufuA3|ySBOD*?p7PDO5hi^1V-OUUYXG0c1zP%HJrMbuiTixx
zHjdo*1sghwg*6@3fPz_7`&5n9DhnQT-ZOmRM}QtQ8-RLw#b?!-f(2i-eRDs;T9z;*
ziTlaHAKx<?fjO4Pz@wAbazT;x4wV+PS^Xq^=JN4Ln1OG0s$#h}w!x-HELE49?}k4d
z4urB90wxi2`#@YOY*ALJ_9OY%HIXv!^4%1OZyf5b<FvHfw6j_bxo2Tv#SH%WBV<#a
z_E;5BUyBkbi{YFT<l3!V0kruB11g0_&j}(&MJnBvM5eeuL7YYovo~)ESJgT%Myfu>
z_Ip{_sJY=6f4_9SS<vJ8Y}B+D-$^8Oqi!{CDWPSqeN}+d`v-GQ-J&2*lvslu-Hscp
zBkQk3J!pk%a?A&ZZK`%ZZ`kW{h%xUGEi!F;V>^>0TzJt1lLeMNveX%(Qu2!WJvu;<
z8W26p5ONMXcv1no*kH$E5-}kl%nwY1Ckq%Ua5JuOF4y}UM^xEYXKfkkwYmH1ydnWi
zF*h{j@?NQ+_HJ;;apU05v0gt4fXgoPAzYrGhi9}hhn!FeNFawzZfAG&Wes0EZF|9Q
zF;>6Ho!ZrP6;+JJ;%)yp8plT;>d`z-n0hq}U)fNcJT6bdzIfEbr!y5O7ey99bG0aL
zXp2<t&pQAT?4Ma=D~X7p))Z9JNp6Pr`i<TH5$;+Fwj!yR_}O$#b$=_4)qf^(2`fhu
zYb8ZJ?qMCbEh1>zVycJTT4#{=&vm;VRkR*}E&*<u!3!#p4rGo<CYJ1S51@2@%#Sux
zq%)(^X>wtI6{{nZzzySl9{va(Ul*VDa*7W$+!he|AD+cL!AszH<%v6q0x%^A%hM^t
zVf@N3l!v~S3iy}ZKpZ#;Mj!=htzjDvlrNVDWJ!_DDEa_Yu6%>E_4?3*nGtd5PpLR#
zm9N-*)JLklc+5E|W=wfAzq?ry<U5YNZw2KF&oxjKo2=jb9>e&0j4CL1S{6f<HlF%9
z@?qSqrq9KoK`H_)W(A)Cw{2L6h-#@QI}F57T1-`H^cHbtTY4H}LBG!xW8e!5n&OQ_
zCOC+FeX+~@K01PMZgp<Wm{1ET^mVq&O5aYJql_<QrO``9CQk`frsYAT{MY>RfI%VF
zbNHlJNuiP++zfnlAKwjD?&k3_ysHa~unM3G+$zuEQ#3t#T_$|(`_leyP@D&WR$~h9
z6NW;FuE!R%CCVD959iKg&{X=|``t2?AXbH;!w=-aEGXLa)aQt-^MX`43>Y{#-@n-A
z;4esv{L0Ir%SSRdDH#1CsL-VDeb&`5t8v00egrmWV6|X_X=_Af`$iDnl;NEI3+Tzj
zHy3A?lKO%4a6x#fiWJ`MdySfl+i?G548o6Oz`Fv99z5HHg}1a`K$SeK2ZDrkj#PYz
z(UviZxN+N+%Ry4#@EDKcFX?8~AEX<nVX+&h-<4<@8i1$bV(b;T{k*%e0`Gnsnnk6_
zgUK{HCrAtu7;!MRTHsrkBhqAG0ww(VC)g?Os7LC9im@$J%_d1jI%s?p8EP>+U1A(`
zzGYMcS)C(cXDvl5CZ#j^xzxJGl0~aK?>)^Wij-z2vkGsoog16Ys}=&`rlqsXzJC;T
zyG=;+#qlKi<d8n7D89^G@<s*$t$?dm-Si0j#ZRYbV0O1iw~=I{Uy$1P$AR_OxJ_+P
z6OpD03`a+LpaEJ$S;63dxrlGNfDu33Zvz$Bprb8aDIA^BEFmx<3MQomp>OodCN>Rq
zx-LaQ!$<`rk3r|UK)U7Y4j7`1hT;gm5cO&Z+}@*fL{WrIB8e=R^imk^Gsu<uNcFGn
zJO8T8i{i!a=Wy9zj+Bul=(cLs@YMO*QfHLa9WTs?GJRf1urmZv3n?5+)MwLaPzOY3
zM*v1#aM-S@LLk?W>Q8`?dl_h~ZL_xw5-0-Q8T9e3kOErjd~ekKn}VWNTD@i`dXQK(
zKP;Jm04rD6=$EpZfg15Au3^yN&Jefl@ZeBEaW$QuZ7Tsk#TxN*^G4prZL{gdj!KLB
zkG1h!<&mlL9x##Y)+mqjj>&SL<N5RuTwvX&gSQWHEUBL@UhKT~dnT9%>&u`PpPN?P
z7Y_%Q?7<`()9ZPgvy_p99Yyivtj_9NJap(2Uj=p_kk)f*s)DU_nw`ysKdbI|IP~xQ
z8_fAE`MaDEsG0=$^jc!;7GACzPCeL28e{Slkt2zotT&#&p@w5iesrS8>5TZmgpwkG
zCsCDpaKZ;n6g)9Iz`s<CbrAb->Op683$O6>^i<R!LAboB&e#Mq3@<MZ>cQ$7uSB-z
zOterK@y*d94^}&w5_nQNn4yQDdz|8Lw-mQ7*APnMN>!(CqKX#%XevIkjN=WnA!1*0
z4{FioblQCxO*d@Frta;J;W9x$2!y*3V%GF7rufZneXg#_5-_p)Y_6`F(1@hS0?UXo
zr}CjPP<6>arqU6NiiDX|ndbl%xNjC*o|vMuS!v#kM`kov1~{chWY-QCSS0?+dDix?
zXHAHs30a|MQd527$|Ctay5x!+5AMy!3Xn0Clb|6pQmGo9CQB9aFySm5(groR>atNY
z;8eH@?vL>6X?&{pYSZO@`R-iKDp&Znj9emSxWjpmLDKsEt%#Ac)G5S>#K9uN*eRUB
zCC|hx*<F3k5GDt#JKl)W9T-ZmzR!r%(-ihf<7<qTq<27y)Fu8lQiB=SXY`BrMwHU8
z)G=15eXf}dF?}uKsQ8;KVsnM(<XKXvOg~A$`|RYlG3>@sHl7uzOR)hYhMu_OT_Vv?
z=j1izvx%6IyCu00zG$0gFg@hlIdrmt1~T0io+%+=ZZEf%g3F%2-VJEKPqc#XU94(<
z`xdbFR}dN&PijdAn88}?SLlW_c7mD6-H91DEq#qrk?-DTC7aoV=McLE37L`d+dqnN
zN=&2xa5RkI@1_gfdRiLl+0^6e(f0ZMj<X38we6!ae(rK;>w_c+IIlCA`-qVAB6p@{
zb$d__sYa`fWw!f7XYziH{~SMTqXG@ug^ZC$agSutNTaK#I899i{lOc%0tX~TlJU;Q
zdw<Cy5WjrN9u#zThiSXFf;aR6txP+1A^<W<T7{iL>zQ?|mJ^HKfs#dR^2`Jr>B_lD
zEc+1^K9E*6_xAPmP;tlSQ!(=;@Rxlmfk1n-b=|_k?JEDLpNrvV?aY3=`zOz0hxc}E
z-x|Sg+G?42bDeLscn2YBX7XM`38(3N{;=)-GmLi_p%Fxr^@AOJV_-P&Y<vz$nzBE2
z8x7ue)L@j|`-Y5f$k1vVfy&GSB-U!<?N2m1Y2oP>Z3JOyn3(3)4<WW4$_CaokE~mI
zLzLP^*XC}Y>W5#r)zPw~er#GV%-5>?8Y0~G)xjS(6tWynKhA=glUv=DJ-zz5HAY9@
z3M|W$q2|t105#GlirKoU<anZeDd#?g*XTp7(8HA}(PAyJpanJ|DWXPmd7<}r>|-~^
z_HqZYR>u106zj>KD}d$@H?=9e&N?0}3WNVl5&17pl09ssT<=2U!lDn|-wMziaWQ^}
zlSz?KWJ(O)CKCDYosB*qkx@ySoPR#v+9E!P7&!k%=?Ru%CnVtIvhwW2k0M8rpfH8&
zMF@R*EV}l4xTZTESm8wBg8TIt;1rIK6z@{7e#wXDdzspW;M*wq$Fz+>(2}Nn@P1oL
zb$tH3;th%%Qj!0xC{grBCJN2UY|IJP`GIIc&Vl2oo8*9cRDEOl-)WNf%|-8z*!?ug
zE#Jrs@C9D?DZh3`exRMj)D}y{BC+*2#%DTC@eW&IBp~2KrXP+kynemL&)T&Lwo$QK
zsa*`}TNxVlu@ItHher<632%SWDw|&ITj14EPLtO0FUOG&OX(KIzzbTT#KCGw_=bQ?
zkNP9s*9Jlt=n@-Rqox`zS^VyPlXCV(7P1}!l1Dv&N3-XVqs^aXjDvMOmMCTm^<`@3
zhKff~JrRF3EsOL+y#j1=bpec2XJeKFG-j_?igW5Tw!HGRuGN;o3k3l@*Cs=>>xc91
z{zZC>t?s&Tx%a4(!OKd<an=aiV+4yHN_YhU*6`-&F8zq+SWoo4L)sUYLccs=(P>!Z
zSj%pV{0lN5sp|mNG~?cZ`5#pmqd+W^Ff!o;B5k{;vimX-mD?GWgU!bv!JZ{_ngX3W
zNBcb&v}NucJaoZ`Oqdo)(M;Cv1b{xTc$N6;FwrVdle4g-fu9;^X4-JIDIP&6{&_B)
zqT<saU5W^f-b|XfrS`Us?=Lq?8@Z-sZ-Vc-;&-#C(TZbia>A3?pcP`osv2yW9QD#G
z%#F{*T#NcQ*66m}k*J&-YINpAyF_O4o8Q^eBrL0(g$c^U2Qvz25UFo-pPn4dyBP|%
zLn6`;w;FM$YV}_q@G-)~Wz42-#|bX|5rkr-Q#5#SD>7SDe;#81=+H>7_+4((d=5~r
z!f<Nv+_KU3m@&Dtz9j~oy5(q&x`;7r*mS6#QrmaR!;x>L0jB<MAdEIq6FKEH&xPOM
z3PW#h-~IWHKiiJO2CIry;0t*cjv&<OCro7kL~Amb{)9a~dBe`cM)Gn5jL(Ra7D{?E
z`uguqN|dqCyzg*6CVycng9ez=)KO74wv%mk6N-r5`-eGt6wCI&N=Fha&&bW_WsW%%
z{nrp2vx?PE!Op=`hZ9?Y=o+_LYTERXf|>B%t2ii)nJXJkc{sNPEG2rxoRnX;SS6VJ
zea<C^f&Yn~#HHc79pAv6`k@|PK?p+F0HN4;LvFqIvFP!Jow2VuKFoH*(ZCl`i}pzv
znsl-W>UID``CTtsUJI<}3GrXF-t|H==ePb#RL6fvJ-gVDa)E`&sHi}ETS^-LuLO*l
zbf^*sFn!qkqjntj@ekUvfKYfX<TeXk3*&Wpu#t+K2v<{H^Lo3%`#B5=Dr#L(L0^tv
z$Is#ye_%pTF7JgCX9}SyXa6!hhV3PQWJbUx)3$V3uBs*cf@nEAYLgs|XMWIoB%qo$
z4yt&YQ}XH%HkTxIlYTLr=EpK+qc(MqW!Tw(m@>Fd@MJ(z_hF7TO*d4J6vj^|5D=?N
z0?ui6uFD9gxScj0IbU)bmEZ#qUX@^aSy)k;(5on<lAt7-kGCo(5-}+%4_ZVO|0z>d
zwX5#Dk_|K`?HoioPaXAzVwS(z0g`V;Q~&;01FeN<wu$RXYMe~Tea)K}t=3&;4Mn9>
z%sQ+;B;K&_`aJvo+*ETuR^ZbJXFJ0J7QbF=^1W=pS>C<HR!Wfjf;4&B(UO-=0GB=5
z)f9zGu<)yMINjP`c(y@HB#d}~<#OH4dZ9lq{c&oiK=>==L=?IZiE24_YqaqhLrvBI
zY0kW6z<x+eXq%wuE0zfBtCD-ghlO&fu7b-ii;WW5;$>e&)!Xx@rTF?RQ74e@YG&T*
z8amubg+XGtdr&_OyVI9-F09z1F{85GE9Kyjm4UuT1W>J`K_gpK%46Gyje~-Mot$IF
z!uwI;8mpN(-Eg2N+t|Zd)HH;u)HSMIi<}{slcwni-+eSWqKFP5dnjn6s02Hx2<Uj$
zDJi<N{qfkTrq(e5G^pfG%8gY{E_$O9Qh8ZmENZ=P`;7<Y$3warIvTXm;O#V&z1pSz
z566>XnDEPDoB=qn?5j;Z6}8{#E`)_!V9<KCD=i~79j>W2gPQ^OZq1}XKjKhmD47eo
z@vnu&aUrssf>o`>2(Hp>=w-B<jPIEq|ETFU_FbkQCl<nzo10sK)Q<`snr8VA${Ql8
zkGbdjrFq|e1WW)cEuR*v!bL}2-S`b~aT3h>?)qxCSsL1yqZv7{idY9iYX6*{4yDES
zSkvc7cwB)%;L0$BL^{O!YwiIX8^bG2%pYjU<|lj(J#E!y2N&PP7k#ADd<Gh_LKiT%
z)i!`K$EolW7_i6#brLbKoRIF@p383}#*Rbzsx0Q{^EPNdBU<jT=@n=0zB_7vz(218
z4`RiLvTs%Y5#7g~l2%TICGCWU;^$$`Sbx_sq*BS4DelmPPV<{R%82|<<S;4yWc^KF
zskMjV%D0hXD`2Dj`Q7Y<ptkAP6N@LYDpFi3ot?<9>{IpGH|%?Fo#d>mt2*vks$Vrj
z%+p4_hE|TN{>}eMj4GN6K<_#;ji$6ig1ZguM=V692{}3w^IJJH^7pImL-~bfopu``
zA<e@`-yo&G5au+HgRE*=O+ObuYWfE{N?D;+-xaTTJQ^(#wb}9fXoSkip$rx!w&-+q
zfWTtHZXye-YrkT&tmSJ{;V9QSfLf?Hs?7@nnCA%zXkjsV+LL*3?;x6lu5%A9j`V?o
z$kQyv3&cX^wdE?+oh+YLk&a(Ks}l{seitQ3u}mXRn1Z9FJ`XsE{(x}Lqvr8DnUf5i
z<1kfz#kYJrW$t}v^azTM0dWe$N^HFOn`Vn|#YjROF5$@SB&<j83;X)NizmY68i~K2
zp=IuFR&-62gcV%x_WG3YBiM+t%4^x3_v&>qZdBH_ja?YY5pI_S<zN^$GPLHN=GKtL
z4W8fFC^(fM6<5=8WnMQ``|i;Ty~UKI-%}M?Ed!mk{K?_NRIoSdGVkFsm0naTN2sXK
z7QPhxMa&@x_0hw?L1Vo#bMEaHrZvqDyh*TlzY)_z+D~!k;Q*M}bUS)(qA*Dv{aS_~
zU(lF>S_e3hTYpwm7VtX=%=Jf+yoBAhY7bq?Pm{Xr`*}%3;32gons1ZS)weNBH~iUw
zsGoAfWp8z=q~tKBxhiU#f>Hv(0iwZFM&s{nM?{O!ns{=}$WnpPrT}nEJ&Gp=LATsS
zLfL2m<v@d-z|UsrH<^0}(@ABm^nRqN?r&kdq+C*ffm3UZ2FH$5BfGQ3)e0frPjRRF
zI6IeDF)5!C%s+>`agpy-1iO!z@4gxmP`}b|b)AaHB}{y`kd;ZEd1`7`?Vs!TixB&x
zs!6Jr5xmDVsx0WanmbsMyO91*f$0oX^<+u~g6<;RrgfL{>dO=JI(DcZ*Jm8xh4IYo
z%t7kIn=ZUAX9e&nxmgIrC?{xjQ`TN-a_Rf2KK)X*JppdBezU_L4~m@w1y+8E7Kl*=
zYM^xCL#C+(u@3wfX>o${pNE1Sz2|j$CoL4`udIA6@Yz(U-@X_z8_8=)cx@5X;k2%K
ziIHhwrS|j0(#T_#y5v-IU`Mz{#Aeo#W@+)PI=5kv@{Y<=+V<Lih(z#0ka9UbXm<R)
z73g?*m^k^hhF;ZUjVJPhvf>y4+o~{1KL#nj>R`;AORPMZJo(a};mpUGhj(9ybgK~g
zI-u%nWMV!a<BJYn)JEnIw4QXz@=$iU1SpjV3hrv(aZ&3WK%q+e=u^|hE-h+wj6%SU
z=X;xn%5by{z7fIpe+h?}%=-)BDI(1w5Tm8wU{q6~!($>ambY>)%59*$l|u5{>f4oX
zIsMw$6EAbHiR=Gv&F<y!#!94)?WHCob~5-06Q+y%k5LMf9Nokhh6aO`QFffdeLI3S
zO40(JGgG{&AxM>x)6yRhksrqhKXhw1SQ0|obO?0B%+s=h^X>O1(U7B(p?CnKW?j@l
z(^!B$`Ib^anK=5^k6hRe$l0gup@ANo(E(`-0}XGM7HN4%An+3byvc#h=Sgud!u^75
zj?0`5^PBXwqIXQmVrhv|G;~a_*~>pKa@7f191X8I8QQ26Bhn#CeX)u5onyCS40OK#
zhNE)V7YeZ`Cy3Z8oI(jkhES6Y(#!aY24lKOFBB{Oz>bN&?+$R{e6_>bSv`nwVBV9{
zS<`f|fBfNmU6rvZ8lw=CQ*oaB;iq@`#E!ZdPMXHnEu3Wp8Vp{Q6$m+sZCEhe?o?}^
zm96G7v1@`=#d0oE1z;9aSmW-H-~~N0Ty`Vey`{<vB?M`bLn`IL#bXF{(lW6wGB&7%
z1CS3tCcfBu;to*T62C8S&LSAD#2OLa>0VPIhvNsoHvWZ}G;jz3v$a#Yls!Az`Z)#J
zB_^4Wu8Ed%7E;*$PIB`dJ705ki*TI+qL6}CT13pr%bx?WFJ5h!+2n&q4}Vj&XLvv)
z1o!faDS;o5etw}0s+8Osl#InxM{y%Y;{>g0WAFRDSLHtCl$PEVHY00xsZQbf7!dp7
ziY58ML<yyTlqZ8TzL_07bd`;?wNl)}(-f%qSm~@V5BDznkN1chC8dRrd(PxrliZ<T
zZut*{@<f^%3s#97$UTTY?AQAhs_#20cV*T4VN_RnT(?|`WQ5Vl-1j!dGBh6a3#=g^
zJH2A_(!2C;$ZwRI?*sISuZ_uvmt870e{6e5bq75W|3bgOsfnMLIp81^0*KKg(-$;J
znpM4RTYvMbV|KBqWqJ0*{hqzGmg0(*h<+SV0HE}ly?A>hJ)DHb*T%-?NQ7KKQtG%u
zzm@-RwN>CS`9thQ)RRzJIMF{scO8-L@dHb9t>9N~5o0X<0KhI8iE-5NQzKmqxIoHk
zC&RT(rz#WpA9;5&91zL}=c<Osr&ijo71*BCBZlI-6=Vzpv2i{bTL40W(k*W2lHl^b
zLaI_{@zW_0YV{glG3Ebq`x94~g9^@C#f3L{6Es<-9Al-1GfzPIL2%qp+ZES#hy7j~
zmTScjVWmR_<viZ${Ce*y*k)|Ha6UEgoq$SGWs+1DC2$7|IqY>mesvt7QqA#NuD`m(
zAQwL3(;{{4T4l@PgQ`Pr;wKuM5cuu`S9rmwQ!rG-OBppK3Ww1u?$3ZGBU<(~Q^xBl
zgiPc+NwMFi+5_H6OVcmY6c?6mrScTjVX(+SoN`**S?1WP<%l;L+46TX1@@<L^UvDM
z{u+6S3^*@F7@1V57`~xx`FV7)D3*iB5yJguZd3D3e(F{GMaTnl{B9S1V6@&!w((Z-
zQqwZVBrWZf#7&fvFkkyxi&-u2*#0<6iJw?;o1P=vGGf*@4R@Q^c048urktW73ht|y
z;dV?BczP+V(ofX$Xl8_4C<8&Mi4i^hsB{%<&_)Q#oxdOhgnUAT*29%-VEL}3rle++
zIQc5)Lh*q?;TZpXp!#!rm~sEJN82nhG3(J=ulHj2S$0~73NKa?5He_l%M9nn0vQpf
z<(T7)!q-XUh5-~bV+!0EGLN$Yfq?_FcL``>z^aAFGmpSSTxwM+ZO;|EHXL%Y_v8t5
z(j!SD$ws*-{x@1wP78UyR_L`TJz?I$F#c=>Cv$1$%ZIlbvsPnQ5Tc#1ukRrKq`hoO
zg7wqhHqVfTS>X_*zS@L_b9KAbrTvZ0!u7Cr2<ZWVt!K?o-5QfnX5FwMwB)M@yVaXv
zjq7G-+8oF(t{LTk^YNOU0fk?DyWBr|ttSwc$WaoX8Q&c=+PH~bf17=`_SE*=JfJ4<
zbaw8DV?>c{<t3=S=D-FrWCR`)Re+wr2FbUOirwWKs-|+b`(1<i$Wa@dZu21XAt!AG
z`aqMPx^_T)l6i7;LK~m&Fr=o>8O&5J3e!gIuo*w>vYOkut{iVQ!qo>=L{XI@P9j~D
zKN$FaG8+vt_5P5089lr1tB+rIt)a^Kg#$7(QdeJMe;~0e@CtJEnEWl!^8;#`?lsTN
zp}!o@@9|*CABnt3hoKUoM{;Y?o}7xj{L_}XC+a3Sa&W(lIvnf$SoToIAv?ZC<j~rk
z0F_u>oX+@bJNoaO8ni=tYwtxC=cd@Bmk&0To|WT87l+B88m?=xkGD^X#peo4V-hAG
z3lQp9_NzD}FHjSWiEri{u2sa!PAVUO^Ca@tbmhjA9&x@2N%qOXWsLes61J=6myA!>
zw4S&R)=i~L@w=fp@3951-*bkbSWjD|e^Dl^Ab-f;<Q)}Mw+T%eWmE*8xFwM#xA>uI
zfQGc!8}#QHQg{;tGZ%EXeIJW{W>!=PIyAD?iN_gVR8Z7bCW{a7pExgYc(eW)xx(l}
zBWx>5C9)0__c8D}s#m3b=rKtgO)u#ebvw|!J#Ibe8BS$o^Pr(K^1nGD!W_qT|DUvd
zpdmSeGT7=voqt4$|JUz+r~^9j4}H5$V#ky($55e%CYVok`TD^QYVRg`bP`wqQ?%<N
zgM1l*eiS?It5nt2YL&VLm4{2(^X|TbE%tp+(^tj(I88Kr*;uzumbDZFK!tfrU9eK1
z*!374ENzebCC4KrT^c4SR7+4X9Uw9UI$yHuaFYZ|`W}b|N};bo>*p%2Pd@#5zF9i4
zKPerULSaACCZ(h*bp)uH4!=5*eDTWT+RM8XcRy@DC{d1^r@h{eYl&r2xGqqR$PtGs
z&-FcMF<z8~C!!vpdy$*0kuSw=ZZyy1K{ti!We#`S;|iX%1sPVgRy-Sb7YuAW<9$yW
z4WQ^x8&ZbqO#zrrPHT-<As-m$c2*DKt`_U=-5Y%?pyk4KH@jjY5cie=2_3h~dV#CH
z=gWE(e~0+ZEuoHa+vI1G>r7IA<f=_iA>CW-AjCi0emfaY$uBiX_-Z{pnl-}Ob-o0&
zq@g^mjvouZ<JR=}FJYba*g#QU)=Jo+ol6S2$rp_$X(xvXu~J!-W-A^L=NDa?wXbQe
z@+AV=A7suE*bLeP+n(>C1uZR0EF$dA{xG$a@zed)MmVKk^#sMcUM7#;UiVY~f5A&V
z<zb?!^Mrs)>s&IV+o>a%7k)R|2<A@cgmVeXyb)XpGWU30?Id`CqgeG&x@K8*_&Ws`
z_XlNRiR3lgTrxwL!UQ`8y59q-BAlO&5b&*l1J<&*F3C8#h($982n*pw=jo*WXrX3w
z7qrQ}-m*TAy`+`*U}<D&w~h#1zE6J}J~u|XI3RAdTK=?Y+uvO1*Rbl6yDtMHSmSSM
zA2OOHsx(Z8SI8a`n~y51>Xwe|*2Q;=8guIIT`m1Xl+k!|k9BUupd$x>{^uKz2z2r?
zkHoFG(Y*b3(am!1Lh)r(&$4~ajq$nt=xWPugyJSRZl-dl+|8<JK16Q1M1a@!b49dS
zH-nGaabD#K1<Vp!_dwYwF?r$Ja3$$>UCZOUou~l*_>GtOolfnqh_nyeXVUw%Zcs?I
z_1q7~y~%>f*;%tZ!?GtPY+;FZ*Iogkf0~jV4Po!tnJA;yKjuEZzBS+qsl8A1FR}D`
zQ_s1OzA<CHjt@cmu-Gz6fBhIwFCKud&~ZTpWQf830<8uY%0i4`jvI)v{)Yu6$%WZB
z3WzG2T_cq*%(k;<x?JO}M!z^uN<|M_X7WB=3e0tuK3%ZHzQ#`yi^n#`I<d_4rF{R}
zm^Ojo6F19qeIAtQH8rp$<@0N3HJ53B$o>6&wc|#QY9-l695xY;V_y1^yNlg1ehkmT
zcHk?`=jnflvdIGdWV<5M{6gnh{mEd91N}{%^yExr+RCP%A)5tK!AXPvH!JGQ_VY>%
zQUUhaacfk}DiCUL_~&re-SLp``mMcdCmq;CS-J=zF}Q4KNjhG5!OJ(1W2=tPi}vTT
zEa5yr_5;#Bb^8>hwnTBI7auI{#nA)l_POhZAGFt`@V&VB*SlY;b*BP!A7MK%L676?
zM6So9f8zw-2g0m5@d$1X7m0sO5XT!Huy30oaY$GsonP82f}!d7^Xo#jDxbYAaj8hO
z;#StWUkP8fgcA+;aGA1Fzl{JE9IG=bL2*<4uzB?#?JAmsdfIFvRLA?ZV@8(d74IX@
z?!F$rxb^>7Ie$Az!d}vVF<~3F(BIdU-D4Mvd+R8xc<gi%PblP{p)#1())S<vtp^I2
zmu&sn^d7>luimzDxbI{x>mP99_8&f2;+iDPVElKB5x49Ah`mX$fysb}`2MV|(Y}Y>
zVnrnoHUkf|b?R=3-YaOaOALPxwLCX+3_pXTr{d45c|H!9KlwjE>gg<Xm!*nBUu4>v
z>Y3d?FRb|Y!t_gP5YxYJusQD)EaN!7xpD8GFzo+hy#Je~+${C?Yy3W}5S4><_@sxS
zzr7=4w^+7Zd)foH$gyBb{Fr}S!M_dnf4r^#Qt;-l;}93-;UA%eTh00k{(Bvt#~iC6
zv7NQTe<(-(`$=@`6)gM1Ik>LzoS6D=b2Gw}oGYfsZ2w{@{;%23lJ+orF};AG4{JOw
zY`xp^<Ay!_|9`vxk6-5Iyfm0I?y$#un0@i#szK$N>)C??74;;-^X<05>GAfUA$Y$$
zyYu0fFez!k_0>~!S{CKu|E+&^BEUNI-J6ev*c;uj#@_Be^(OYg&t_v`LA$%V+t?Y)
zy52iDsNG#dXv8iWdoH70{fwHf*(D(|n7zjM6LImuy#2`kIt;Gczxq4siCzNR{&kYp
z%**qSUF>X<FKE!^{%^9jC1=L#;^Tq5R{v7Q=|30Vz)~bwAFX8&CI<{J<v#7&1(qEO
zJlwWQK#z+Q)>qIGM)eI0a*?0BC<Gmi_6`p*$jQmu51(K<{`JfH>gvhdeH|5*_^YP@
zO8>Su34UK&vX5WeyeKdN53}dVw5ts#+{G=C%j~)ilf40O3j)WKjdvIAxJLRYVqGho
zWSsdg^Inigu{SP61EVQ>^^(e`5El!DJj5XdN1Hs*6L-9qy5Vx$jF(hGVZpRa&Ch`D
zl;5U~R;dU%#wlwvhW~xI{lC9;%>?~TPv^1go&?#oKi#<mG2iG496e><`#?(_TJhnl
z-3qsr_3mBJZC>Uj!3$mrs=ybqi;sAhjjA?s`9kg3YV(>k`D_nnheYcayW%)FemHFV
zHU+;)6MEVtoMJ94>Ec_iKSr&1B|tqWX}v`SN+Haf%BJK4O7T1p(9s1g^czpWT|kh0
zg^dWV;O}|0!)$XVjEOK;n*iL>CWb$m6paJBhA4RAX#cl~r-|~frJ{fUk@oD);CBgP
z>CIA2`u!K$S61F4k5bU%3aoXA0K)ITcFng^9ccfV1#o*gAOhNFOtPXL{740i4@5?$
zR)JU|76-khGM^gDJ$9^XeLo?#@m|biXvPn*e!Z(L&^<Tn)3<AA;-@8*TBQ*Ty#dq5
zJ7uJ<ru-(s0LH{)7)MX+T`UaUKZe+B47Qmiz<EGm7{MR6Ojt>Z3$(e)4=R@F^+Te%
zS=d@frkw)23^u}5k~Z!}2A&qb;dJLgpCNorjC#X7LoGXGlPegLeq#FBh=(3LnaSaT
zl<{~?TPZKhD+E&>8k>zL$koii@UVHfO7!lv#<6>seAH&qoD0NDgTjEoChwG@(}sTm
z>HoI}mMZ=4%pBN&tik=v0LxaHIK$c<wQK10eNNW#2l0qsP{iNf_-gI9K4Bnr!SdDe
z;8!>VL)@MK_-t=|hJH&GxXQ~iF){Na<QE-z?=`Iv+UqsH6|U5;yGVy~-A|xi@^EB&
z$I2Jo$exHOaQR%1P9)Q4P?W$K)KygNTgg<}b~n7Q!R2hmR8ht`+9D#-Q-47u{gBEE
z|B+gk0xeB%mybwqy=I4%oul!Nl~h>sN&<h0T_5e}bCosLGJwVP#k+E<e1cRyt+2LM
zm0#0F_ZJ~RWA_SLaO7;WH~aR)*5Re|u>HpW)BCjRRygz53RD0*n&@oI2Cnh))>UKg
zog9%>-`%ID=g&7TCo7rznd7U+_CDEm7_XAKgd#EUr1HsdR(4y*nVoR(P-&5ZK3zk^
zEl-v{)#v@F!_K=BvtgU}(p%G>F9ASX^W`_tFVI1V`rtF2z2m(>5LC+2%{AxurWaJ}
zRc&!R3lu&fiE$9H56?N&8-6v+V{dz1C)WSuoBM%%_lDeGI#89002TQkAHVI9b9E+z
z)aU}SV`zyN<Y&inZ?aW%lVb0QHy4~;Op7m`ZpwX^3XW#QeH~+xuFLKGo|KoF)jaLV
zJE@*U!3b95`P&8NU-MsS?-soVa~iyG2@`6yGXSQ+Ey6!_Y8O}-KI`5UE4a7UOnqb5
z1M@s!FZ7uXJkGy#1umH@+T8~s^Ve9kkXB}K7&e6AUuku^q@TTYcn#OhCicGYQUk92
z(<I7d&0a}5$YM%b;JH!1)uOcHVVs>p!g!waEg29VnF0tS<&h)Q^Y5}mebSV21SYHr
zIkY;x+|Ty7PU;Fe#1UtS4X@vJI)a~!52r~~`<~vU$2MB8UVz?=*nRz!idgymg6Qxu
zvB>@I@=L4pb+%6{L2BB9>sVimY&58;Z3Rv7gWEA^-dGt{kK?2(aAL%YEpyINi%CCf
zYeR#tV3uVBS?fw%Wx45{*$}>K;pOF}1wVYc4I8<8!C6avgNJZ{RH%~<RkA+BPwDaX
zKilB{G)$>Rpd}udj+W$=M*~9WcVkv&W+zie|H-5S#z6!OYA^3BzpJbO)T2udGQ=O%
zB@KaW#CKnK>POsWmEgjQ50CEOaM`)o<~Katyg3Q4g!6u7Mbcu>tCn0Z{#FXzCUh4*
zm&I+Os!q1bw)!Bm4)mj;pFL05lz9Ed|2dTW$vx}p-~y-(v>sXWTqT6L4wdk2yG8z-
zXYy!u+B6=)N=*oSKr|)8Dzn2_Q?qAH$aNFtIkWIO%X=Oo1o=<(dcrqc``4=hkMS0S
zj}3_KLPN1D=xvg9jGm)-{p++UtGy;FYC@s&Ki8#CPC}>_BVl8Mm3d72_~PN3Ty&$V
zc3nhLT)KKA6G1Pt-MNI#AF7lkSR^+Pbrke_`s`@==+fiiXfzNph^0@9zhIiM1H49P
z=c#}L8jb=taF7X^>|&8Om;zrcu_;EsmX^m)0lz+dc}(3@w4Dy&))n-K8p~BP?~_H=
z(k?r(55)id+UK}-w_gc*&|#dTFXrB{@r_*AN$r85)wx1RBBN`Qg%t3P?b{T~#~6`+
z8#aHN?D%IF4<-<fAw`{JewaYGM0&$-p)?8O%a?fEFw9$cEBG)tU7)V+DthSV6Z%oj
z8&*1W(pP>l5#?JAV^^>LxH4fVT1&taRVrM~T-EqHB4_l%v0+y(_v&@Gkh|rFPPE!D
z8JEND_h3Tp?noVOt6mt#KyP@3Y+@1ETA!Z_$Zro=C%O@|b*_Rv?9Ny@mNC2l1gY_j
zilxfSHFnFk%X!a-xojbpZ_%uVo<ObeP3uToxhkcvex#4rQ0{$OxoGP!Bz=OG@;}%P
zO~)q1?Ltlr8E?p{G}f)eYrgmMKaRW@M;ojec?D?%r1=H`2O3YRFT$~X1o=f(k0-fp
zwkG(lVE=rRy5=I*6`f7j_l<)}arI&X{YU#D1$zLw<WizT!$N&ne+L(qtN)tCIPt3k
zpL{6n{{0?RPXTr%=(QJJx(D<}UPtSEErt-2zLX(;|5c#w#acYbx*y|)qJ_#K&2-(q
z5x>RbxKrNv^z;w#{@w;5(!tYHRC;sc$mMvFy5c3)-T5YWG~4n-0@F1ANpe&9s-uc<
z|2}-thnO?ge+^`<D`**2VaPL~#r}=jLDLJ&^b!@0HFL-7CyVKlKg~Aj++034p51Bv
z5SLwNx1IwI8=0?YFIo-{eosa}7T6gnQ6R~LgBF7DkB?JL=F`z&!WiLgv|t3Y@@WpA
zfe}@-L-|QM5?@Zx>BzOFx@VO|Uuoy}1pWZr-{|xt#-3sd{pFEpfx(H*s-1Dlc9Zr?
zWqJH@CDmqdWJ|A_F8AgKvPbN)u8TyQ5cY`kx4YeZGFY`6i}1i!6~$Y7(T2W@S{+Ym
zO1V4(WZ-Oga5mV^|2g^yRw_n(@_Z$J-6i-krbKaTO5HxTRCx&&)oC|!;^%;gR<q?O
zNfqc;CV@>0oviAc6jAqG_0W7ByyAOO8CmHuNmKN~+Ib(EKgUWje(*M5R+#bR_4yjz
zWKa85{dK_2(9#xA6bNVTlBa8TvE2?<%_Rk6OA2<_wUR$u{OrwTd=@%9$KqaJJbBv_
z=2gb)^+QuR=YT>R>UXVy!nS2A<bN9h{rd4H)>84z^9jfJ-0a3%=j?h}OTC;7a33fg
zEDc*LJqJ4;E_p;cOf7dFBJ24-C?c@^Oq*T$X3ldoMntseGGYl`M>>b=K`*;W>wJ}Y
zs&Al51=rh&4$u=8Zao6Jv>pk0_KUC6+`2vn<q&#Ow-HkhF!|cJrX1k+V*TG{-R?K2
zu!q+}V8c&w&BQT;!*7#^qktpAa3X$2<p>MRWrN42l1HGC#6^3RVaA!wLJKh?DQgPj
zw@SuzOj`CsGn8w06*~>%*ELi9<-*|g8$&$D$$|-uQ(!N<r-_SWRET;xcdMs~gtaS9
zER^rp!i=!5`hdg_sc6WDT=54_t#|$pO53Ly6{p5LVszTQk;x64RsL6}#>cqSGEWCL
zepZTxK4W6Q(*!F~x8c8mQM&#wC|+O;c(7G&rcF$(>0|u0_jfufIL$6q<y%I9l>@Qe
zvfjF)iC$W^Fp$$~)#lp<TS~H-K7*O%N$9LxOLj}}8;AEtYE8KfN#XLs>G}2Z3Z})m
z4ZMC1lK$$3Acr39_D4m@kr9$f8i%B(Jl~BqDa&oyL`ihx)wU{EmBj}|I|jSy2a)SV
zpG|M+FK-U+Buvj}vYJgY14iC%Z!@Xhf_!tscR|utiHvdrUhnar){7t9?dmReOjTk2
z9+>5fpLam_5C5Qw@TJ4@dQpzSq_VwFW&8LlpMT|hi*NhE;?3hy=oD-krP{0L6U$3^
zk95$fBxE)$9ruMua*>q(Mb}@(HTnMU1MrB^r6|oP0R=`%k5)nu1f)hwO<Hoq=uqj9
zkd`uNMmI>uDCv~$93A)k-S_?I|NT7O175tYojb4hQSYPsWp;l3Oj6TjSVvNP->Xmq
z?js6EICAOz3%Kx%HN!dA)`C%Cdes`#vJ~yH|EGsdA{h&8{VRFs5oW0<UvZ|5>bl1H
z8#BGq8gb?Kc6!PvLs4M#iNh<My4OXCBAC_hZ&NP!r5E9YWDHVHX`Y}OzG~2!U9Fzp
zn)D3h+w5m+h5wkP%y1vUIpm1z-}R_01r3`&^QBR=C)p=Nz6r0VxeH)P=fu^=W&==~
z_5b;$w>O%vx?o)r2kT3GrVGY-474ELdq=Y&Awpt8(*=t$y}|L`#Gz>QVje>8Ocww6
z(OZfR`-@U|)(XLec|9d$x_Bo|nLCjS9R4<JzG}k~1I)I1n7wvNOV0T309D?a`lg~`
z&no}OT*ZOC_H;z2Y#>neI?vV#831%Ue1C*>a88VyE>4b7yp_$OYA_u&r05S5n0m4%
zo%KCo%y#`DLYm>>_xl{GpVn6go}ID|r|whd-ws%%(-JW^O#}(7SDQC(2OA8MPy;OS
z4r8k=weO;94j#ZdKeeeAlt3OQYU5<<3g>ol3}Q-uH&7>Y5Yp})shS~7p>DAu>gqZ$
zq|K)<y-uonR(K4tDD85fH}|kIVdm9A9hV;df{kqjil9x{Gmt9umCt^+%-Ng*gdW8t
zZmX0p6uTAz(E)(h{v<%`X|k{&&>kCw5(bybvb$b|)fHOU(3W4osOa7G>rud@^k;<t
zs&oyifiKB1)t#<|(~&<Yi;iAJ7~9&G7wlY_zceiLPnLwGzI7bq)PwFJ33Y|6f?j87
zna>}ay{{d&&$5?*-QS)&cq8UxSXAp5l?2XULjN@ofJ(7ZJTSR4%enTl#A<1_M=dlc
zfx9O-gk66V`+;H#j7uumV8YD2_noR?SA<1?5AZ?vJG)hTrc-h=6$c#G&4lVB+xKs$
z7)k~iwYS;M$O`~;-#p<+NQxu5c3RcVctDc4>lFk_r!hWzc8&!$Jq;x@yv=2@mhF%4
z1WD{Z1c4X*xMPxJ7G`r_?u=#$VVABTrMr7KOUWCZq=G8p1GOKFhpAD-$99oA3j`!v
zO`@wTg(ve4d;4p+jrnHKmVBG(*K)Ws1Jq^Afe@7ip-l2)t|MLmsnoGcY>lx^xVxvv
zr9O?uagei+)R)RoS9k(|v&Z;@_VSm*c(WoS1(`a8zy(OEqkcxRG0ViuB8uGr?osqS
zEz=2+4U9=j76O4M>FFOyki;a-X6L`5gJth8Cv<J+EYKvxB+PNsB_NOJ2cxC-YQwHd
zavh_}DR_%yS*m(e+u;%=qJ}oAzubE}C>SeA3ppGQbdq;QLh5pj*F$10Mcqr0o#_Jj
z3x(VB@#a}4jH{QUnMckteta{WQd*o0=Vhk0G1dC`NMUvH7jqpiU5Dp$-XwgLDfsMV
z?l(-?<aG8o|HlESX()t@yBCfmKo;Hh#yDsdyC7j1>69>*7qdrx*;bcnaAe~jd?+sg
z67%6SK8*I867dc%Vo{Kn=)5<E?r_*5%Wo-m-zV?fDnC@j#Dr_j3W^_kNklxcLU_=C
zzGRN1u?uMAUUB2o#x+a=r|g6HQKjj*JD-b-oBE}xV3+rIwbMgJFs+g#HGRzqpTz%-
z*_a=}B_6{I<W%<)-?jD$+vZO?UTD2sz)^e%!}|D^%a=%zEm8rVZq)2s$G59rPe%T@
zL?t^^3jOT%S}i-Yn$QN1pmpiREH$6WDTJSJDDSK(=i)?`h(8$%3R12JFB$Mg{FGI7
zD3%Y;s0zFCp8bFi>pg;f7#3V$KvGETk_QJ+z0VDgpzk}HYlWK>G^(%s;;1uc0F2JT
zd!ehsdPdN7!rA^rjh}B(OyLYGUG?To!M>WLq!S)G3r^T{G!r9;-P%YcGV_5xT)%1R
zUIi?^jB>|p6lAs@_>h0JM%ehISapG(VaF7XcQn`EjZ^w<j*C;{LG80Q^XfbHc2&{q
zR8ibAvwC$`+@2S*l6%KPc7G5^+erIVerW2jMY;xRel~k8xq9^rsclQoN$%1JHh|ga
zaZU$~0uf`E)R))a-D*4S+;Fa*_m9OsIWsw()<6K>_XuhIwENtZ-0KnjApIb47oL})
zKrydTc(o1Fxloj9X@VA1N#6h5V-1a7@$2VMAHGp5OX5yONP7dtQX*WuL<R9Lx9P>O
zYqzu-CV`_P`3kZdiW*e4bJO1Ij40kfN?BSE$6CU#>uh=l^!7{=w;tk7@I+?f6{oJm
zp)GZp(!Iw#`AMJG1T?gfS16}rb>v&4Sme&l1B~`9qbf68u^T074+McnN%jt5UKGRa
z_0PuUVrwFOVK><U7Ddn6E#gl|$kqLfmhAL4Mtw=2bhAk%rF;zd^Lio*@Yoe4ftA9)
zK2$=lt7BlxO~suxJR@=4_N(9C);+X&#c8H6jeO?*`g%~;|Exl|##&kla~ifHU#|Q=
zWAFdX!mSoM%(DH6)$S>Uwy9uq;6-2#0gV>B1t*?dv;{-G%i8CT*u)(ioQIQ@>=qBM
zPbF&XW?C;aX8(=3jZcPMmfJ%C;JP5G-*%_u!^afZele<-B$3|3^3h2(3uW8Wsl(=a
zxKKJQuxBV)f>QslQt042FC>9202+WpJM3b!$?spUoVIEXko}P^{l}UIYA=+``}k=q
zPTrvU!K@Q?45+f#b^WQ^G((!c(0oDyb>B(Q>RQIJ<@@IG@BHfFyFuH=5?-TCPXe9=
zgnSw+u#ta9ND-Cwyf=pZfXEEryb>VnFi52Rox;FA$`&VO)r1zi8M2e21bMMpDIa$C
z%yr{wa<3~HB7kTxxwL24zq8}jGGpL_th?GI)Vw=ZE{9bA5To>NS)O)mjw+tVOn#ot
zij-Ug1j>MfHNsuR;1s^HX{r55Gq`J=bbO4n?#%{=*W;dxUh>!DBG6w7X<~@4!J$9g
zPl|sQtd|G?oDVJQeAgpSA#1Lr1aTRNrEk~2GKKp&P#Jb(+uQcrjpEWn5wB?YD4W)(
zAJHLnQ0Z&2*fj3Bzh+OGv(cR^K~&hvo5@kfe<rEB5Y23d)~WmsU&Bk&1f5E*Qn;sd
zmYq3QUXQ;o>MuCiC|0nO54vK&%7nEnUl2N=CyMIcEd;VZ4iY}81c5_d(<diqVS#4}
z8LP)cOG<|!ygn;v`{<<4Brx*<sB3U2nd%}UGDDpGg>@9tq}o390E-tAA->403|E9Q
zJy;blLbp}<E($um-^Zi68v$AEk0uInBysib9ofI%*{pW)YNoK>wF*qiYHAu)_$aV|
zs$r<Uzc`_i^t*~TZkj02?I{O}c_UY%CjM&@TRM75k!?jWFOc|T#q?*80Kr8Y--iSx
z;x4x66XH2sXiuBGO3V15!!Q4|#tn!pJ2g@*UNfROzD5m>%rk&rNO|MKyx@#XdBBOw
z?|5KA3ce(p&d)WAF2T(kusF>rh49-o1d0(m_GyyM@2}?zmHDMD)<Q`<ABIPtvfYjp
z|ASqLxi7=s1G-aq;H?3wwa-N3`-www5498N^Y$;@tCu>_pe|yI`)LLIrqcJsR6VuQ
zC^?l+5YA)EMtp*~#_1gps|`5=@@l?euP3SaIOx4w3=r^pPWYG920@q`g!d4amQy2H
zfW{B4iiq5YYdMn-CgZBDV#)851`sST!wumd`Di~mxp|7ueSfH&TIE}3r^QlAB}?A>
z_vi)y&{5A!Clt^TS?JLGfM}2<%j9{Igby;mz<<Y2<}b2c`HAr=Gn>6T7YOdw`1$a8
z>b^J1ph|0|$KnM_-DNXf#$#N4Z`zKtD*taL$XkJo>U5RP1V<?TC1QXk)F~m3i99w!
zDP|@WfuFYn@_I}dG?zP06g_BA)o0C1t*qbJGsGmldai5;JO*KH0<vhBx*dD<$(1`O
zdz+N2{kZvlBrKH1B)rjvee;I5J=$p~v>stit8xhqFSQ^ea98vhx4BFCCE_Jl-A^3u
z?3l1Ia4b>SJj)5j%{bc?I38T8Q^@@C`B}o+$Z=5d?UqJju2q<iM22LfQiOasVKu{I
zGh142{)`-37FOdAqCiolm`e$kyjct>CFDx*sVC|5Hc6Hklu4X-8AeKPUlK5GS^U6^
zXanPzu<ORnuXdnJ9N!4F!xDm&PJ?WY7M$*n%5zLTJxgB~@th2s^GS&=`W^OZr4^)O
z_?PS5G=;MR|En(b^an<79jRq8CE+L1AJ4>`o?iVYEH&4|+^&H5Gtu2cP%b%r80&Ef
z#SHP1JXRQ{?;66j`gU2$JzVR%(lyOj{<|^jV^|yDi!5==+1{I4>h1g^lKk86t~J=3
z)Y)r}#-Ozam{SY{Uorfp#fBK5_a(zP<<PqZ9+%_q&mUf##rRh@W%+KK{|un{hu7JX
zfeVityPf^uC!*DtE&djmC=-XJj{k5_V53r&Q?_z(iQ$t2+QxSTh|q4O2#C;3sp*Ly
zCjr-zPLMex4kR|na;?$nt?9nhNui8*D?nnvxUwtry4Od<7|$^q7w_G1-modhbN%&c
z#S0#Yv-cm7y0-x$HfO~AJ^UIjZYa{x+~M%!cEI@3prF&9>#J{fnJzx#&|i_)t#p{O
zlac`iZ}HZx54W{p=`2OSB}((cGsIaP<J?7+LimS1(GHu0J(KLU?||0n+iQ37J;%dR
z&5zYUK8TZS^n3Rc;V^nKI`!STwYm?yzxW!p08;j%ZsdtyUlJGV^4m9OQ<c){SyNg*
zXk?9~&t7ZHR%l{wZ+o4#H`5u`klC{3HBhx#ewd&9Jc)h5I$^t}ryn1nrw?U~&6Q4H
zw>+)`C^IC!2Vp6MH!nP)sF-bvm-tc>Iwuf<+mPcft6#cx9RIatUz{8bbX)c|mHbM%
z9)1vx?b6!A9oonTXgt!Pe0;TCkTF-+tq_Q{9AHQW?ZbtxVj-Pxy?vH9?#&XcC$5Z^
zX<s(wV#{>nO6!|G4r1l5y3@NoG@fMSqn_xb9R%*&UsUgvFrB9mea$A9d9rjJc_7+o
zu%MNhT#sSob)O1HYad!;A$>2`lFu+PlPmJaAQ@Qi9C7`bwJ7sv$G(Hd<X;Sgb<Re=
zURL{|jePvpjMDTqZAIN$nEVl~cL$4p4uwO#{lA*V4&u^zHUA$);r}NyEN#90cZ5$q
z$B{ZEA`6KQrawL-w=~a!r~{aLbAkLpns_+cT)k6?3*^;>Qq05)f;4%X<Oeh}&2C%q
z58_F3brQq10i-xcNO~B#w{TYA2oaPZ0SHNHatV!`+j&%N_f@A)4Ds44;gF<NO#LG}
z$;E1;e(f^*Iddv2to`~I(HCIZ)cNIx2YOdrG+GkYE6q7~!8be#b_V=Tt$SD3Xk${H
zU)sc>Al|zrDHTF99NK;YEDiNwf8Sj@G(|w3RBC#kP_r5d=si2X-wyMEkgpe+fN74|
z&aYYtqOq8sVz+K-s#AfU!JQL(naMx?cruqP@ASbxSv{jX*wlZ|gcSbE8EnyR5n4WE
zTBA#`!AWE#%$;ER;ye&5t9)6r171HP0ZCT9v|M`>XPyy7VKE7eA#A9=?uc2{ObEo4
zrkQ8^N(tIqGhG-OaVajBGgB0YS`N2lA8O5=BSYOqY}Q_-6?*+hm*}LN65?!qARo5e
zw!AVSR_i&zqUcYw^v!!h4WBiDHM7X8Pb0{Bg;Te#PzhVTdvzL%!nWSI{D3w+^QbO6
z)QD5If)<-~jA~q*WLt}XZ{~v;$2{Rkm}1C7i-EO-pNP922Ujf<A$TmqbVX;$PAYlI
zJF`t3W$06l*l8$JtJnnniCjtzCISsli~Lk&fYsVMmS5X*as;s5_T2N>3-Al!qR;gp
ze)W0$`~|*3gs@|y16)KTQyR>dc?;^>oy+MU!&lAghYX6klC5IVJcO1USvHmWH=4zB
z@Rr>jN>d{b=j+~e68W6Zd(Ky~2U2Z689tu2$(mMTLaa_6XY`0yKj4cr&Jrg5{h9cr
zIIp&~?-fz8a;baIF|h4(1d)B$IAOKRhhnhSMcK4aqMizK0fqvQ4`-~9@vR->HEDJ<
zo=<JGkIS|CZ6hE-^1rD<MYTWxSX|3u0L#&AuI=eS7oV*F6_)YT1sdE9AN;^fOz1l{
z!BnyOTxLo2N^NfitkA<Bsd*YGS7zL$hk2O@!TVo$!GXgN+5V253ubUxml7qcZe`)Q
zM;D6kic>9ZTXCJ$3wudmm?hg;=aRpKyOAmj@$i#Ix(S=bxuxh7CFpY_2%fi8F019v
z97MN!t>eoC8>z$<8`~IRoeJ{0ENI2^f#~x4QI1AP_pZqatsE22z0r6oATNg{K<1Qm
z9@_Q$`fDh$Lr&x-q@gFSUB5Qzh7S%gqgfY{!#+P&p`2$6PG^iHE~RdKOlCi~D*lS#
zLq2blF}cdw9GxNC$Bx<t$p!{iBv&I<HcrgL_E>jfL5hkZ>t$?|p^yUTDSXiF$6kj2
zeZ$^eR!q_DjQeC5{WsG_$UZihlv+3mg-hZ=2*^0A(?1$A6j2TH>5^x?*da~6xmk9q
zu3dm<Z>|fv*l>RF$*1rjp}@;y)g#PggJ-BwGpCm8FW2Edr|I8ny1oJc&$CKwUcZ5(
z>_nJxp(|1^!~~l)dI()@4?M|c>AsW}ssL207^MfHB@`A!Kt<zL;uCM*S+ZFc<60$>
z+tw^>EXsDD;;S}t`}ie_<>nH?E2D=LSlML)43JL*VA0omSl&IfIQDI>Km7D2a$nBG
zihe9#{t>Dd8E&erFEk;%L#Y9pc=c;SB~8BR15XSSHo@OmHjb7u+2xf-&^>5n)%NvP
zeKy^4-xc+9xvW)8mwdnEL8A~eOx*kQJA#+_@~E4sAv1;;nOC<r@>N#PtZ!h8(&4V?
znaJzj{Xm|Vp?yCJZg?oUU)j6pgz7|2dUN$uR#K<gPX-EAyQ2uNtv5c{IqsPe+IHy(
zsur}U9ZCNeAC-v>#73^FvEs2rYQPg>6Pw?5i3Z?zUpdi4ABw#HQvSFH+gSy~*{hr|
zcACw%MF|TIC%KJhpjncqM|v9L@y$5J1&Z80^cn8@`#biL><Iv@5N|uhRi3E?hV}l8
z6&Rq#h!Z5^pbC{%fW{*TmA)R$y1;@3O*j9bj~O1tH)1CW+IiP3^2P!(2Xm4G9PsgH
zgPT5X$5nQeVp=9nYaE+k{EAfYhla|kPSP`)HY{A($L;=TH#ca<C+~BPwvW-t*n97g
z_!WSUG(u*|DXrauQ-nYr4CC3q=r@~?xiTF%*^?O~U43in$9@pj;qlVzIgJpA6f+K1
zixa6<uS;R*!v`xMG5;_l=IgWRt%I|E3je`_EP=PEpXj`uh(}Oj2YA7LoE+kyFsn`v
zugg1nc`pGAEPM>R4D~rTo9aIFH(OC87Y|zqscZMa?T}Ht=e8L_ulb1=-yQwj(*5|n
zQ63VRx-U?ZEqNQYLR%`G$|-L-Xj0X_PQ(m`AH9sKY*p{=$L0IlkyZoeeT@8;=XL63
z1FBj+Dpz(DHFl2Rw{9BE)6P`DZ=Ts?nJ?+N&NFarH3D>D-Zg9D#yvR-A4?|+?Dj`0
z(B%H7{I0M*NNlimn=0ST=bD9T!N43y6m1M+qv;Sx!g51c^b`e^jp@bai7=U7T2O1H
zY|LRme!p9GYJLzRvFDoBJV}tCtFN`49c%Gk%7roK?;)J^1!ESL%iv+g%jP&xMu#C@
zN|*30v4wmVF7zVWkPF2iA3m5R)57OQ<E|B*go$VN9om{R_};rjJ3LYCI(f0!Rxe}1
zSNyAdjo4-iBhkz@rvCBc$%;k1yxjkzRpdUJZ_~C`pgZfinvRpyfhP6XzIl8rE5E>l
ziC6b4tDB!b{$I)(1yN)zHWu`Wmmr?`KwH2IGAemEmLWfxt8~p9at%sm0s!umd{5bd
z1<7B!NR>L4j(AaG2h$|Y=(9kH@jaQOxeuK?InM&3v7liEf1Llkn~$~~{wA!Tl65p7
zePX2Sr4+C0r;a}0C*8RlSsLP0S)X>rhdn6!u0uu~iB>g1gc5;3B;;|feQ9!2h2CDD
zhrQ{Di{IhmQvn#@F@X#genC{}9hKNA9b!Gpa(Cn++d{e!PX47*xKj4opy(WNzW%4(
zbKjd!d-s32QJsN-nMWMM*s(nUNp+4e8E{X<$pI7m+ck%m4L{ix+c*ug%{0B!lx<+D
zpp-=~zN_pWHcO9ja5B&=yu9LdV)#pJ@SI-%mtWY#vr@YTmw8<5j-_8V6f<^$IeU8N
z8yH_0Q|+?w5nbi2uBiNauGD;xHa5_B=Hcn3&Ujoj-gFVPL#{9O%;~{#d-m<^KL&!}
z$#RttVi`M}>opN$cv$$Cm3-sj#>%QeDQ|#9$fw*E%v`kXjbo*5zxY1Q|HP1d_9ci;
zK79~V{s^R_+`)Fhm}zz!G^8sT{jVF&mlvKC@hycg>w2M|7@m{gFM0b!CoRw)^Y?|3
z;ULq+6Gotu*{^5frJL2!;Z<#PY7x^r`J&T(sr5fqn*2g7j1pvt7k@;oww?SuCLc`L
zAO)qvp_=%^E`NU$&hj`FMPb{<P$Kh;xEvUbyLQla>(gOXCwpj&B>4Ao%v~lZk?uqc
zCH_K8rqUH&!2oeE^vozLtzn{c8;M~Pi#uWN@tW570jK}p=PU#}*ZALyUFc&u{;)ip
z($6am0_CU52j61XDz*4^MdZH@6Tkp!&XBWjQ8`pIzk5PNiJilxwZA|52by0Kbz@r+
zIwj@&^uG5V&<ser-g%xbtiO}Xh%9p)4U2n><F;NW8;_n<v<7q=qb-X%3H%y#6^&a=
zx)boRGFbwWI{P}ySF7|I86Pe(B`#kMPo_z!T;-3YmT6SHXebvQ)V7U4aFH3ljiIn!
zb?)WkacNy%u2@Z#1(C-OTQu7E?grutM^O1(RJQ+pNIvN<3j)JG0id-`vZ`o?+j`UN
zwK}GjLI!Z<yibU;`QqIcij1&_EqF4rCQ8@V|I*Nj2yipN@f7#*@=r2{xjY8`v^mPP
z_M;WY8=E4C4HmfcW;=Pm3OfKxg2FE<)&}n7OupNj=Hh|^E%x-vPjPF&`DkJI3Zf?;
z^vtCz@kc(1x8~X`%v3EhAvb42`HE{&`;)yc$|JDxsiPUH5T?Fcn18T#n4F%XP_fSF
z(AwavYX`L6FQwtJk1W045e2c5@jY$-7Iz&wY)(AUvSgwea)J$WxLwKGKjbi9AE`rD
zw7YH$@BgM1zB1{rTsaPLL8bG+mT7ySM^!(sQE)LU+@q@B67LqKa#`+kQnw4_{Y{h*
z>m?0t(tA=xnG;ltPTIof#A`=F200fOLo1TW9)vN|PfH6{p!nUm3$Pg1M63-y;X8se
zc)7FFUWOD@!NPv+8`QALtw5^fmDeZh*F&;uQKihL=T8WfOOia~Qe8<W?H89TtKB`D
z7-i;R8+4^xi$F}CIL0hx#PWYz2PlXjazwbW-c*uEahi1i6HVwej(q$GVJ`Us2TcTg
z@Cf%s04wG>DGG{_S~9%ah+ig5!I{EB=Ea%=7$Vh?pG1jc*`v&}MO+<OOvvQ0o|6m<
z;^u^v6YY4u4NfYey1V>yRcCyN*Ney!A8T2d7j~TN;P<13X)5zHGI)Ps!{$o_x7-Im
zoNM8m8eR@_4`&@6SM`9@A%<m8H1b~_tQWz3>o{|=_z^!hq%r+b04;zT0iWVWo)YtZ
z%1Er%MBTSyPo~c=sfni1Rq}y|4HM3aq6W{t)jC4t%~V2PLTT6=-Nn00KB5En0m>>N
z4rW60I`2o)4wth|GsO3&u56bOmXgC`G2|Bg8vnH21{sqF^J?}>lwQ|`3l&m~U(;}o
zoXUA7FB?URmEDx(x9z;qms)x3tZ70iRfy&X#Rq1V>6_uM1NhSFP`$Y?%`A?%>5F6~
z(%(8Ju=pT*B(0XO&_puTqt;Jf=r%RlAIh?>(x3M<3h=gncp&XyI}b?%JAJ{>cI(UY
zhZ3xdeUtZS)6B8ZbZPCK_5K8_U?1f8Gf#EB@MDi>sD2Crv|fZ6zcC`8kUPsf0%6;N
zbu*b<swjpqN_D3AYH6I3DlhCv@szXS+pca))t8}44R@EAwUVZ}6`ZDZ$;1QQW~H%n
zZj`0p_f}7)>+X!00hty$-<o4KEKM8rG?;7RW@gCwSmE1k*LRtP`FSA>;CvB;OTJE0
zvVQq-HNen=NzB3YK+M|IUaE)zoFOtHV~F$PLHB>6@AsO_vZ^hqzgw|iv=MYB`Dyll
zh(ypt1hVzMdC^mrs0tJdDkScX&nYlHK{&lntLCyCz(o#&_6i&MCKG1?PP8)}X5KAl
zS9H<T=l{p_v(uds6)hIxAy2h!m@lU9PbA(je$83P&VJeD8F_E)uvfj+;=hu~8W5Hy
zz9j7VbZQz4?}w#J!m2?i1iyz`{@tlcl~5vaUf-+LBiRdnroygRLjYrIYW!yxCVKM@
zvfiDLbmk~D`X>?W8JYnc(geLuV3HJj|0NkCZC_veS4AAZ+8jNJ7#YsoIHGvM>UDd9
z4PK>SIUrcB1%Tms6m$;hjX2*?u9FJkSkQ{)is5IJ1yOTc)3I+ygPWrrW($nCeCBt;
z8d;EmR3+CADlSfy2kuNaJwZuc*x(Z;Y$Oi_jOF~*UPZo#+fGe0Bju7<rGLxLEC1D$
zUhb#JXD91iJX1vHFcmL146Lmv_0PZpGh&1PvG3*WJ>Xa+)p_EbA_YLEk(lJ_$nhT2
z{Qzq{G4j#2r0pkJ#s%Rj)nvFd&3aehNRzCp2l!Ma$>1#ldi3pnW_`+RC(ZSicv7gv
zo-kW8oas^75#pSA4~<#1xKJ4L!l^WIxI5QK<v_(lro3XjYezUV_8k!DC6Zs?o-~kV
z?Os_pl|I8Y-|@NuUN_wle!J(2^&<2UPIVawe{_gG;y(>c{M#7!i(u=ccdII4naDyq
z#fJgPL2Q7BkuS%wB4agx<kYRi?gFg&I|+e6wueA{g{ed?TB%QlC@;Ux2tBpF+v8gQ
zKRY-GqI-VM0A9b>8D?|=5nev=#4FZK>@i$jMB^5_8WNEf2pQ6BtJ+EY`Z;1G#jDTh
zptQT47Adv2M5;-doVVR8-B$F??v;}XjnrGTLKeNOhR1!}L{?Q(81KNA2^|H6{0H|w
zi|7~Y@U*h#sYB;pJ(tsma%K~=exan2Pf(N)K>&EX&EnlLgBKD2^}C`xmww;uHZ9j5
z-Eu!q_ek)U>Pi7Xs+RoWSSkD(?6d5Do2!uZlUA4v3OR^u6uXj*fPnbl*~s?u&82mm
z^jeahJbOgzP`O{qJ@+>E>faLHWyILR+~2zQ{PuQbZ%>j_sXgys3fW>+y~>-MtFjXW
zf%#gB4o%3{M+Eh}d5rGA4Qh(<w}(HVBaL;Fud`By##Bc#4GZ$NFcU+QW{)Qq`P;kZ
z$Kk!g7_4y()PdS#T(H&p!<Zla$z*Yr2x8I*BTaeuU&J}w?=0ScdOKG_um6PM_se7b
zl;~oORY<8?ncXLc?(1$C$Y6&w=DQa_grIh0!KYtQ0eo~BnF8+Ln(`QU`AR0LZKGe8
z{9OcEI7GNq-Q4+mOts}w?~d1TOV=<IUZC%`&$8O2$8i2MA#@sv{x|U<AISL<Ga&^J
zybyHEllsFSVwTc9Olr>($5)EQwfep-H&lw=T|i6AWp>G1zVL~@LM?wZmd>FWRs{tw
z4=t)G@It7Fh@S}t%C^$CYO=BL1z{uSzz^!@D3LwPizB0$Iv^K~VR^$D$l}LVye{;6
z!O<1t@;8cCL~C8ldAh&i`&J&v@g|VKwp@BzHO`M{SQd#WfEp>-?B$&1QeJN{4}#Z9
zt{>ksAgLT1i~xbpf#4t>c$F9|)sf!slGu=ENe(6i@uKnZIdMNf#^L~2&<L4;LbfCh
z*A(p_UVSB^&ubaeZ-g3q5P0WVz)3ayYuSk7q4B2g!{!}8Xrkc;W>+z`3cl2nS>tT0
zP8SU*Im|qrlaO&81jYu#F-PtYHo`C=)uMYcQAFWb=rD=TG;`L{cyO+gW&!|kh32zM
z-vP0~mAX6$N~)o#LCg$(^H=F6+-G2ZI4izEqm=3B&Hl9SJ`KQImOp^l0<}afy9hpa
zOVyU_o#0J#aB4zMtq=aJCM4qYIGh6h6BMd3AMVZg&v-jN%`;wT&1Tn&nUfO;L#}~h
zCjYl!EBo|t^%+JK3lQ%WE<2UUAYuLT<xA!`HR_QWaV%~rRs3Wnmgz%SPSk2IwtibM
zP=gBcZZqQ0td-WKVp#F`<8u9Ts;ekL2odH_BrT7)d@nAG#zD>%Q~mo<yLkej7+hhm
zy`SI{Nadv3lER6?H*W}_P<`ZQAJ}fTY%T|iNmA9@&4Xx1J5ImDXr}Nc`gtPH?=}=&
z-qoFb<dw=Sg@>5p9#_YfhjmPEqmxKr;#*<sfe>hkX<q<UJaWSu>Fb1XJH5kvr+j@v
z$Df07_P5d-<>_`byrbifw|hZ!zqIO1j)689Y3W_Kh*LG?D-P2Z*BZcY@cF{>$S23(
z$8IZISDUi^1w61grHJ;I(qb^pO@6todUOC)C>bT3jha-#|GHb|h~3;mxaMohK*4zk
znIV^02V_XU{x`-MK#1IZE}t^bt<-NuM>=_QdfR($?wv7nhVF0LY)IJqK-Cp*IBFOj
zv4ckNExt+NM+5=)x~D&f3ZISD=uY=_R*BeMy=ZXWs^44i?`pm{D??OlW&i7I!N1Vn
zx<?FMKN11gn3|j%5S(tx6x^ZzBK~uFG{I0i%J+AQF_wp0GQvMVJ`k;f-d0b(;UEDS
z+QfKsCRrRatlA|_OxZPIKmsRmtkqL-ldBl;0=Y5PI^Q6^4~fA83H!|}MAlc|0?ZGx
zI9<lh@<nHl#;8vtoozDMbxJPi6dw<4yJLMoN<Jq1eqBvNTE>4~g4`lhPX)uz(9X7?
zSI<K7+r<`tT{XSBiYm{X<N4)xSW@h};gH6U`1+WLXk|n`nhG%3Bp*)732Q4;BTG*9
z$AW1R@Oi<Igk}jRs~A%^!)juGuJaas>Fg<nn`}q^*w1lQS>Wk@R1k4gZB+LzBS`2;
zgJ%r+yq$L#4(D9rkO_UL_)`wIDc4`X=b1GaJG=86SC>i9TEO&vz+bvdLEG?l_eSen
z-~u^~gqj7$%>%(uZvjfGzg}m^Xe{@bQ~(30<SIeM33jAW)e7bv)i-zKsdts<t+h1d
z%=T#VMKlDgy*`AFwd(eSNDu#|dmlN#z388?2JZBHx){u`0+A6}3de(onT*Wk*hq9E
zqAc6Af#gYVdk9z4VhE>#Hm1NkSjbKHIG3s-{pSkVR{6n6sbdw{Ypt}`TG?wWr3biR
zWZIoWE4eU%3Y3i>YHYj*&BhP)r%xAm?Lq^7KViJJ{O_Cq=PS%@{59HgTv?HXgfkJj
zA-tg_wZIue)u>GX6p2+LR=hdo)**Q%OWPjx0UYWYLa2(zaomt6t+jZ&cpZAQk@HA)
z4C6FsU&b-r?NYw|%2eA^Z-XOPp<#C$NRJ@mU0Mk5&wVLhp9FV`#R;J>;|9L&$<?Za
z#|3}Z_BpHf>|4-~Ica<b{7Sif-bWl(_6}~;c%dPKsr1N#VkiKA{{RLsk-zY}Y<3w`
zThEKY%w+$ZVe<eg+Ha-8B2x<*(g~tNx*Fax&9>jb)D&M^Jc|GD8vMO{dpIE8?2z`s
z9|IK86o#@?`(Q7e9Lmlrlx^3w(Xb%X@bNLm65N?71RQo*O)I8TuvirOYIoX2x*mgv
zWOxIsp$h$wwz3pHY#xH$;FUXhg0%O)FRfnlZ1Hr;TNT*KVviy;L+)?LRIWKSJu?$A
zK9`tK2SD=%mA#Stpv)Tk#`Ol5u1MNy-%FRYXA}hb%8LmbsH__8LiOq8j&MY-^oF3|
zVi?G8>A^iqj+D5;b9=s3U2~i=fKiIm{wHU%Mm{iana;80Wp5fcV{L5BT6?}-HSur-
z+A)TTZf&}|G2h5xXR@@$DyV2n%pIVfCgR{C<&u>wje7g>;eXK+8j;wj=dd>Ywysj^
zF+Q8IQnfK@+K;1==1<kdp&aVzn=PZ=Y^-e$A=(63iVe?J@w-_4n$rB*`q3f89X{;r
z%^I0JKeFbeEItbQ5@p+9?(Q+yr{E(mf|#U+FUK~qnKB*ABj3WKH@NG7zqNYGaiPlD
z(hAw3zC8aD6*#62NaK5qNojg+I~M<!z)DIs123grLBH62((-F<9#n9-ATQI~+ipB)
zg*f#1OCpFb8j*$AHv!w6ADm+l4K=TGEK>Q?B4<xWY$G^Ia3U*ik{HC$y*58O_9ugo
zvA`$q;^Zx!qkorOmegkJw-@IS4R5UHNnBWnz$>Y8i{uW*<8h;gDQ1oE0pa)i7yE#u
za(vql;G7w2UQ_FE5l5GUulQ)#^hH+78UK5@xQ=@2UlU1WLx=!j7<yr|yd$l-9FgVk
zn2buV`n546emcTj^mb8L_NR`_O`zEb9{1S!Bj@$sL&hcD7@D+PwGee76zc;|T;%(K
zK(1fMGBNPH8A?qy@$SV0KsPeM7$b|nu2-;nx|sPaZgBNeB;zEuHO*o!gM~HL+?{I6
ziYhTsgQ(cB9({W1Jyq(s=d@V~4CyU;Vw_w5WsO~=p`dSqH)~BCx;~+6czN+qQp<m&
zdBJa>dRN+fasS7;hgLL|cZcia@A5R7jTVbO2U3&3>~`^qvCatUJwE@-h`qOBkVeDN
z4_!3u|H=FF05iRYgO2bI<07SKr1`DO%x}y)zPE)f<5&o^$iAkZ)CPb<AZ-f3gCsXn
zG$C=20OoZ+(+Ue238@cxm2)Q>mS?G^vO@EHD4{A{u^TolAlu(F^w>^ZwtxQXwW~ZT
z1WPU=a(h0gJmg~LG;ML_)&*&Hx;BY^64fnhDrJ{c?0asUR*Jh_rAj5~%&ZBy)b&7I
z$Ykn+$SGWEA`_^4BReA;)+_Qw^kU5>e(LBx?Fpdn!TKZ9r~(2XhtZ0;O~n+yGLI=S
zt=}b_3{TH}`DWvA9eiAN(Y5^ra5snP!&47)?+T*+?1QwFQ#H}dd59T8eD6ZTk66ng
zOCKo=Ty;w;9v|Dc*@b^_>lHezhiH)bp{pQf(&ZDsUQguIw^E4gtB`w45*9y&P<rPB
zoe7($`Wno(b^?;#N#@LPD6PH>c6=@9Te!jRM;a+qX(daX(}4jB$R{DRvD<gqYwn>@
zX*(9b7Mo_9?_x~3HwMhZWg-cu<Qupb{QA2zejb0o8dSGa|L%`r%I3AY>_WV<bVX6F
zvjYZAGoM#grpNv~8?Lx652R|=yNttLOykxqD)X%3rRG|t2Z0F-(xnkI(I6??EW}wV
z;^wIPj<L^E$eVNU;0~AcKJLRx*VHa;LXq{We0R4qeIKP7W4np3wtI(@m@E^Ne?6<p
zH<Tnzz&)tgKyYlLIK_dl=&Riy%I@naPmq|y{mCEcS=YlzMhQR7*!6-nLZgEA81RK>
z*Q3Ebp26u-KQy<`Muz7NK20lT(K-SBnbZB5)Bd0K7Rv9a7IC;^WFE)+fpYTyLJMt=
zF(!Cb!8c|-tv$hoge<K9I+~E4$-oC8Q5NIfwQ|Ohoi|*`2tjP~8$b%hIu7bl<da8{
zER-B-hf#I_-bXVy$Ty-j(i^n}{y9?Zp9M0XpWV6ty&Zlmi^)eROxlUrR+Ri3JfGCh
zVI~+dacELKXZu5Tp@aQht;}euYrNs(<nwt7)D`2Vc-+h@%q2PW{Q(c$c*!hd(4?+C
z%N#R&PQCXEX3%?Dv=fzFWK2;gOOgA%=xwZ3>L9dyt_Pu25Vay|#em1ppc#Luwpo^N
zM}qCFmMLU-sWvf%i6iJ|@(WM+ZpHh=?OrY6pnzk4X4XX^x_BPP9d0$A@Tb2v9)Gp3
zAnDf$NOA&z4IgFNq2{w+wj3`Ws;<E_SucEP=Ll!3?oarn@cS<n!i$XSoqb0KplK|z
z&-LB}`aXxZ%)K3=UFy(W%N6%o%Fd92PX+eWLINL1o%g;}fP*1s0QQIDO)uav=j5&L
ztIa6)vSfUJ4=Gn_`!rWlwVlq{I44Ig?j4&oj*RUg{*F|<SrB&BEk5<atyuE<oBh&)
z<2n89gBb@Szrfe2qTRx#1zxN#XdG}dnv<0JfpYlG{@+pX4XN|=jk#0_|K`_Ub?m^&
zB|Qmijr)7qNXgBz=p8#MaK#q!cI0Jg>;oopnpVRmmqGsMj<lBY{vbMl$b{Pd?Oc;Q
z)4sB;3^kHMWOuJ-(E~*#OAH7T06dPy?0qryJqksz%63(4NuFs&3(T`1hB-g&E&4dr
zHJ#mHhvMzHPq~!j^zw4mArb;q?F^IWhZ`R$w|x3?HTU%BHvV$VVPU{FoVTdF`iyrL
zd6IH_i77|RutkP(gRgeC7aw5f6bZk=5JbONB4;qyW`Bb$v5EBk-N8L5oBg%c?td5u
zIrRf==#XYhZ5I(a!Te@4P=0$XnE){05y=Y%jZ8kmvOLo-*mB<&^Ii-FP?huw6Fqns
zVp^n?1xCl2ksVC)ACkNk8{#uKN?QK<JhN?^d|Z*CK^yCoY}locBj$>6M;LQ~a^?18
zNHY+aW_9-Q-<`C@n{_OVJpjC?E8z7xINWO|bIh;!OQK$F_=~YnVlsIE8~FI_sn$0y
zd293ZtY%nFc<Ntq{qHL`F851aQDnq;10$>}_pX@n3jO}z_ZegLNieyT!L4z2ezCE*
z&O*h;{SnjS3Ht({EVI)VO5`fB?5d<uYpzxB25g#nNpsbN3xi-k9iG9oQxqFbKjW$q
z#OQElBeg(S@T$bOiIQr7c!o#2%tmwz+%=ERh?U>wU1vb-3L32oBoAr0SKy^fYRzU>
zziG!DwDw5O39uEP8`r}0tPY#qwj+H;Hb3pvm$m$CWF;l?{NuM*@7`+{R%Q`lI=vK&
zr;h{MI)TvwkZO-Ht!3oovi*s52E*SdOigb${AVyBVMN=h2k})uTPWl?@na$q^33U9
zx|``sv0c6fIim#0T~HSLIXB2l+zJ9Qam6si(t}J*PM~wNo!=pIebKNv88ki|HLvQb
zA6?e$wZJi~#XH}taGcaW_UF%^Iwi~~DI1ClmA*Y4!wfG(Yx&3ix7)(kGX7VMKH1oY
z>KP61Y=DL(Ms}6fUe&Q_`DM(D1v<%+Ml5bLEWf){P&=4-ugRPBUryw+rqa_6)O>ah
zZu8Dky+?mv`p6R3V;m}c_s4u>*`qP1W!ioJ|6Krl7t!(w_ZKnuWrNrmH<AH%VwrA?
zNh+Te|BJ{q(|?3<`V-?3`w8@y+8#lcrE;41{mfvN9m+4-%CMd;LwvT)PFu$?bNVw&
zn#ol<xq|*!QTOd}X$+-;Or6m<sI$&)|8QLU{P}aZDd8%Ku79@3->D3Y_)w=E69@11
zCLMOE8QmPkho17JertRT{}LNvahc|X$(GDsFx|g5ygTgn@kaLSm)K}<Jg4g2Bcy%F
zdTP+0Oe?PT(*{Kqo3hm9CI5ceDu@%$3<pK(_d!D+oca2ZtkTGZ;OzO<M)iGvDdd@N
z#}TDtE50rOFMxi>@6HvUGUQC+YENYLX!4x#q1)LBZ0jNHUzBw(-~ZC)JnYBA0p@VR
zIt}%w_?^=Te0w97oDM-uh9p_<{=9`tug~sS3xnO)l%}tH3yD_P`5t>YyFEAc1gxOZ
zyIqu_v8F=Fd}w}Hh=pCt!Y+sbzj<zx|8L*4P~X}&qvx-Bnh*2Q#}I=x^V;KaJ}<Qt
z^5Y5$!N{mOEVHEt?O`Q|m!_oX29N1xzGk=Y9Tx-#Rl9?ekNq5?)<1;)({$xBOCf|(
znZlcqN~8|!gxhKuXgU0Xa2yDN4>BydA79>|IC(7h#5Erf0bbQL7am&X7|(1-bGeax
zlvmG869vX!HTw^A_g6-Htu{w<<IZ3LyAYBYE3cR(_Kw?2Pop!Rl7oe?E#U2_AMd|f
zll*Wp!zU<uW}M7#MRl0`OwTc9ZP%6O9tOJP8S?fp+1y`m?Ckm9I!Ec~BpRe1S0TB4
z^$iRC-i{zT>VY-SU+-sVX{D4J@d8Cmb`ploWRr3^>Fp=)#;VVU4Q72VimKe*tYox3
z{kn4r4ILbc2X&@u{V&4(%k)eURj6F5y~E>6bgBPdHoI5V@d5eJ%m0kAbE^J}vGW7#
zx;&VPTacaKgGcb$6!((~c-kJ6bSAK`8L-Hx9=}nUtUG!|jcvfs!LFQeg*Qd=M;{x;
z@{Bi(N>&wP{A2dq540N~J?ZBAl=Xk{$(=;0P;~2*L7IZGM$8I`)-V7AU(Po|Gl7|y
ztW!)ZRG26jh^x<Y5SnSfSymjxBYAh`DVkSFhL^Ep8IT`6)0rY|wP0j;_?H-@MWlf*
zZ@soPdGV}<zz#q7;7{eoE=FCtvtNbE_QBVZB%V4^Kr}Epf`bLhY@0?ijvwGgZr%G$
zCC33L46)o{vUj=3#zw8CCz;cc*!?0)AX`0Iw)0rjq{dz5`bYTnS7w0oKMf&crU5m?
zglRc8FKT=XmyVyi>-4|loWzz5>?W}g&v|mb5yOCP<d%KVLw@z3xya*({6|803VD(k
zhW+4JgYM?}{hZ%JyNer$1ad2kRQoEKy2->(L~RD#_SpoBI)Eti$@8HLshFs<NaF0Z
zm$!czD^8r4rhuW1@-p(_V81o1f0k)8)Wfan>FB4=>Ac?Z_B3#mGQqI?9&knTe@L)F
z{f+2#UADhESe$y9owVQ@*>2<_m8m2ITq)rGadTRKNu;F96QB*S8gpGr3K)Mm`T0iD
z=>MDpib}xm(f0GUyQS%p8<UaGF&U{1Or&woU4G^nF4S?8zfn}ZM-sNu6?JyGk#$Yy
z&yXD&f|l_^gf=yqfBN(EzAy2o%huDFaY(gVCe!`RdlP+bQCt(p(^KdeT13+G=<A-i
zTbIZI#IX=%ineIF@{N=H6rb_Rf6yknLiVIq|1yrCuv&d5iOKni#&iOw#>|1=x#TQ|
zY*&kD?Yi9E-V)>QtpAEn(LDISr5E)ys{HyE0PBv;Op*fz8&yS8O!%@&z^zQ9iz0H4
z=B3!aV!7W~nE5Ag)!4x)Cg!CG0Pu%P3w&1e5(B1iuBXNdzi<9$P%XeTyt^}GTLHf0
z5zkOee74-pmo*KXwq3JtJ&bN+QkFA$9G|sl{(~{QE|Bbt`gbIkmoo(iF(1K_ii%aJ
zOu_}v5_;q(gLW|yV#?+6XN7OVDxzq^_8Aww6GjmuN^^hJyGdVEy!F*HP*m}~Rld)F
zSr4bV<mD@Z{Fjb#BB5H1%h~cM11Tk^<K6LFEQEE>ccY(ob$F=H+%qM{FlRBqO^)<e
zKnMmhuR|$jBF^ZaQys@6jp}!uflgHXxO>z);*!nY@%vSZ{8j}D^fOoh<AfSgFBJl;
zt|m%blGqR%)l8TCVd;e@oBjUHn7!~0n;O;!N{;vDyBN0Si-(?+9x=-}2XW?GqK7&#
z5aG_+ZDtFz1D_;P<s_<WK-BAPkEeFQ<f-cS3T>ZrXv9oouBAPCRNc-}>x*4X4bfp}
ztc|EUc!yxGH(X(#Zl?J0)YC-IF%~Z}tbyp>oz0(J!QJZX&FGByrb)>w`Yuslrz1ok
zIW0l{Cw!$Z6Xrj<IzJG}JY{}i-Ly`#6r`R`kEt%D3HQ*`mW`{Qrpt9HQRCr5$C~A{
zhj!^|X_<AtJg%qfwmD03DR=W8O441ke6OWn-o-uWQL_A1AxSrbZFfp`w7}=@T>!WJ
zuT^89Th3vI|INjR*&3%Tg~AS^FUfaN%R5IUn%)1A1;%9b=7=C;*}$U&0ep)1YD-_6
zEC6BH1Ayv1Db7Rsy)t&3S6}sK%zRShuTNEHZ)g2eJhJ}Sjl_x*4hy<)2cQ%KlH^X+
zQKXS0f-fw@I7{);itg`xw<t)qDC`++k8I**PE5xvmXf;hyz~RutaK<Dt0}BWI(#Dl
z#0#t$M0<bOOU3=y39-#S;(rH-)w?qs&_R?=vneCu9vE#XBh}&M<zANyRLfO8I`U1;
zHabJM=7)T|Ob5$tU0ZZ6y#Qd3+&_lq1C=JCD~4L7+@-jd%$r<|<%R}vIxLMcma6#^
z*Mc{?<o-rq%QiHKH-O|72g3|_Czho};nPl_(8}sa$8B6Xh_RM)CMR|><Zk*@-Eh55
zRnmk_dr-qmUm%L`;6XMGNF;KyNp0MtrqO#Csg%=1<7%5+y_xVK)<2Bj2~I13J7Yho
zZ)}H`*Y561p6Amgi;}8O8vS&BXPkGaV-({-y!!YhtcGrFzIU>gAj<<bW-VFn5H8%i
z1B_v6bob?%_CcMB-1r+1&*9ibYybj)o9(V*xvm`I5E&>lH6cBF2L0Uw(goHrGPR@E
zIc-zP$&aTmv>dEDL0mkhnc$|;ufE_(L&>Ux(=zix@Lmufk-)^C6+0kI!Kd!H00lhn
z@}$V*DSZZ6P-~NOsn2;JQ#4g!72&YhJb$)3ZLzT$_~N!Zr+evDXX=05Mm;)9srcZJ
zadwSM!eL~hBG#nz3>{*B!+2;L=9co$d(GmxH)Ts*XPGu{-b9mDmF6JPMU!&<05Dqv
z(FF-+j;mRpIvDxkgv0u#0tFKNXcJ9xSz7aB2D!y7A7;EHZ>}Ps^ylm~`4(^&P=9ET
z1KtcGaUPa@*i(1|{v=@e)3UoyOh;ygAU+TaN2;_|!~)K7^RUVJHD3IimV4on;2Dn;
z`EZkD?`?%rd@fJh!n5x0ChPwAL}Xl>P=xkm_k0UWv^HDiPJ{GD=N`99*I>$3S5bu5
zd|^h+<(U<tv;uw6d_+e3YX@;I^)1+6l<+5i@;(q4fWl6R3|sdy>mn{F_B&7_QaStC
zXjcHA55@+E6IvAbimi@4qmdgP^=$DTxo;58P<8#<Y%_M$U9M3!82D|%Hjr6AM}Fq<
zqsU6%HpqlP)+gRjQq0zRs%lUuIZpGELE^qucDc>E3d`2>tX2+}%9|y8nX*sE6|5o8
z&ixG*G?@Hv$~&lBCXEuhX%`KQTuZC*xA~$&|KB|y|KGr44<=HE*^AMk6hBAm?45A7
z1s8PAYoh;iMPiFXe|J@rG2<PJ*xryU`BWdw-I|UIqczG>`Vc0}ki90|_bw$lPA5rM
zwECTMzS%+d31H<an9A+kCU)kV1O79bUDp}bk$NVSbiWo8!wTK@yQM!_145eUahPcX
z{Og5~W+uycM?xNXbY9ogjzqZl&x6UG*hvkCO1*|3!v&L*7f<gP7}7{L2)_45vt^uI
zTN=?cxJQ{>x^F?w0}GW#gRgTJIuZwNtrHULZdCelWf-ErFAhw(`~Ntdd}D!E_NF%M
zg9|BpRE;u-2mYkIn?97^5zbZ@9s{wiH7#McG(Z=nR=?`H2JV<9hQAH0GHr0YZn@C9
zG^m~t56!XF`N$hU|8=<e4sIHArSUcQ5nliGkjFONR_4)=dB4>8_wGL}2uNFxXF_(b
zzUeG2Jjp!77w=T_Pr8@X9~-G?Z4A$Y_0G+#)km4`knt@~=}*&}ztm@AS$E@2zK6~%
zzI1g4(l>_+=P;H$x*RYL@-!<S&h)<-bBMt|U)7LOi6$p)EneEsx95K7QS>;I?Y!IT
zU8W4|0i}Q>rNRSQaOE8ik|b}SwR=>EBZDz%BYXlOShzT1at7}z%JO!TbXiQ3+&*U<
zoESmZ5O;0P9^iSSna?L$c%v2zy$+5`cHG8GRfGFS-en6&_moS{thHm!fYfA4Gn>MU
zxqe9dm{H{Pa{QT?iDZu`hSvG~w6pP+PVc%(n3!?X5%>7&_MT&I=KpqN)wWuLP50d;
zzZxc5Y#{gUHZ3H}EtHtjRrqa~i$}#kop*k>`9A)S5ad;_O-E|(EqVLZ+}7O};}3k6
zyTb)N@Kc~{>_h^3nwsYKSEcv1GWVA##(L-IH|CPhHRwgT#%FUB$n1vgpE{7E@4lJ*
zm!wG2f^po2nU2XB`qC)$-bQK%d(d(2YXLBidyc)IO{#T<d|2?sr%Hb2$$*0_|KiyL
zLzgTmPbt;$taD-I6+{yz<QDKj!gABJ)U*L%F@i=|(5}*zrtNKz1qdkLzVLFXT9J5V
zYWf~B=V>d+=+k;Ddont?PO<M)cI4eKeH3@S76zdMJUy8xb`KDAy%faf(EHMm``SEz
zW)<C99Lbn$gBe(>Rr<~~F$<tvF<nuS;y#D5j2E`K8DEy0YZ^Fr+CnH5hZrDBO)_d|
z>8NpK?~HZMJ*I-&<yw&_?lE1cLXDxdLC*h&ueXegD*D=n1woKTS{gxeNa>UgX#^Qs
zI)-Lwlt$@pkd)40Xpm0nZiE5p2I=>BKmYsr@ZPu2C-}`AIOpuW_TFn<>sr?mM2s{&
zEvtLS`#5xuq}3Zu?EqkpRoVCZmX{;KM;@eH&Qo^FP1A75qQ&jyL2a$&s25f8=K5K_
zRlJMv3HQHH_Zq<Nz7!k}Gk>mT0pM~K;gEh{i!Sv%=W8Nb<LTP^CiCnMP&T}<&s$n%
zzbsVhy^c!$J%g5X-`nB*LnYv+pHt_Jz4siveJfuzGNWw^1R5?i8B5KxBa|X`voBxP
zm<$kZVLcYI1-)EWjLaJ6$p*@(#scJq(s*s{b|wm_WTWv+&~T|>c|hSMpwAmwW#XfQ
zYDFgMLh%awe*<ehpaA>lbbsWY$9o$IEbF)X6c*94HNL^51JG2^DQ8N@Ob7~<Yk?YM
zyF$qTpmuY@r7p%2_0`=61}jEE;ZZnw0QV-Oam8bxB1_N~D8IA?!WlQ*lnOX>NRtZB
zxiQGSyT1}QEAaQoGmqCkb5q5d<$(4y+GL)mJswtb?vijhvbjeCXZuc5EVa`)>Hof-
zgOH{E^3T)~zj~=gB9k_D=9!y)=K%(Im$Frq0b{K&cbllHpa?dl$iiCgyi53B<=*q#
z2$BybPO(Hx#X_F?_DpOaRNa#$l)f{zI5hgq13Gk7@Y2C`Nj>!}CR7jhLJ&5aKk4N^
zOWBOi|5g*955JFaO(6sbQ|~0jBzS=8c!{#2ml*V@HQ$0QiL<ab2#K+ZkX>n0MQJ-R
zC}4zwX0$<bky-W~#GZAqdOaQQP`A3e#^fOZ&l<JIq|Qhg&pV8KypR0-9eI(RE^*XI
zz%sD&pb~xf@)iGo5u5sZ2ub}dslyBYYNHwrtW{Y*_W#V-Wbu>HO1*mfG#^^mK<;5y
zo$zhjoz|T!4v%)@?romFLh<@m-A<)nU-SvZR0Us|M&<qE201@|YXG<SomNSD!1?#3
zNu_;$yhetJd^1)TCH!6a<g-lQ6XwH2Np|m6YGG{&22U|Hxfli)pEB<%NA@e9M~jX7
z4NaPkD*drIB<oZW_GkyQahHGXI3h*BTTp3L7m5ED5A9Tw^qcu6Z!}f5B<it>`Z@<^
zLFF3K`gd+WcHh2>&b{DJ=?Z$)r9B|~@`c<Oi|Jnr@vu|^;9)Bg5@qoLB#DT-2tWzX
z^JiY)cQvF=e<wZNwN9lh%i<*`c5jg@pAwSWfB9v9)e_hL+TQUlS$X8&TAGMEdSF5Q
z{a^pDzX84@CHs4I#K`y%K|b&>vDp#8R#55rZyWy)XXjPk->b(3y*vZ{8ro?*{{I<~
zf56rM{S?M#f3M~ilH*W~`7guZ2JQ|lWz&C+{C}oDBI57WvfIBA&;L$)zbX)P#LkL%
z{i6i#|6-J}0J(Ii|2IYJf03BUt_S`;Ynoj2znDk=QBULK2DV5tvdp|>MtOk4@JgHv
zxVz!mZ?1pM<bQ4`EBp8AJeTuRl)nt7O~zNi-EFBvLy)@)evrPLpuRM{8_fi*HktQY
z>>r5`9sS=2{qF(#PfaQil)VN<>h#?^mnE|b;N6|4NmFP4&k!&WdJtmJ|3Ef#8k1zC
z7}iKc0cX^qSbGv`;z2_G<r5NA0Q1G@6Y$Zas}lHaefHcBBnJ8xeEZwzgX+D~qu0~v
zolifXwvR?JxJyUpXQNTgF*O($EnJjdvF(PVki|F;*`KFftbDGFI05-5Rhv4@H#jBD
z_gd^m2fW9V`u98-&;6ZoI}KT>JQ<{j0#8rRr0ndYTc;X4sGFOvrR6q(FJ?q~H8C;q
zO;*QBgHu<Z-`SS({e;Nu=7ZVtpypFB{a!gmW}lVHVK3X=F13`aN{5E%5`#MhOtk+)
zNb}<N+*XQ%JHKb?)VtWw!(zN&9rnMrl*{+@bo&F8l%TrNWL;;WpuP=6M_Savzvn%|
z23QDK+`fVL&etXp^MQ4#A;+AM?|dIWeFDB%GRj-8IC=A#`0)<I%0zFA_iEG1cK^et
z%s0(4l;7lLB(irS-nXu^tiTu0s?>ehBDNK(TzNvYYF_5oNBdS*bBU(`=o9+E&8R6n
zN?JvPHqWKqEBZc*YWzt`Z#sCto?gdk`~xe9%9d9Xqs+pljc7Bv`mTkmH*U+PE0JoI
z{mn{M;g%u!_YJ>8{No-u*XD7%e=ix_0$_^YZtn5KZN0=sD;A#WDpuZYe(PF9BHL>6
z{_|f(g-POhTV(Z0_}8yX;YIcJvvW<&&A#GbG0e|#)!lN=c_Poe$@a{H(4Y+`Vx`D#
zOIG5<c!6jV&%&0bL`~PIqdGtI_NR$L%Wft-><1QnYm>`oZ^@&?vU>T{K9y~m?~|Sn
z4Kolj<YmL2cB-YALMnK`m_nRxVymMmy-Qc;1E|`<<zq?gOfYlIuVqDQ*_IH`jo$v!
z!z4<P6>GF8=t4yxCG^y;o(R2@rqVV^{msL_OdFsJYO&&dD_A|Zh)7B`Ds}mKWk-lr
z_&h1UHQVE}cyal|?E~cdrDiMIZ@Q8dIV2d6Mxc%jxqh?fp%sv1W|B;`!Khp_I54gK
z#;=7aDM?rE%+0plZ#j*_06tTyb3f0iO}hh#`oK?j6|^<fO*RO#Dr)QAZ>R6?UZ2fO
z&&;6NE@6mDXWk2bt3JOx|3d^bGBkh?X%>D^S17LI<JBEP3_%Zm$?U0pir#A+zoO{s
zJY+b#JC`eZ9H62a$h?t>HVmF`W{}s=<{j@<&9E|08;v_b%uJS7`GY`E#J-?yqjS%;
zlfGY8qRqn~-~gVyf1mW-(U;#%RER&b-PqvZV8w8YxR%g@{%8mPvgb`lx)<sTe`bbY
z(*uMX;__07h|d@7ubla=U&fyhGIem+WZryLzYx`0emM=-7IrL~#6X<Hp;?U!+$MFt
zqHuXb`7Z}2=f{RNjYPP`lUEe&`mY~R$%9R;Hz=M-KO&kgTU6QyRr~I8kscf&3s%B)
zsh2x(Qz|y;OXR+?Lt4z2UAa%1fb)%*Idlnq&vz<-c$r}%o9|QBG@Bt?18od<@j@C3
zZyJi~2&Ir2cAshkt(cLaFhE70gQ2=<zt=Rq?&-n#_;PQr+;7(2mKbY*z^7xQVAT5~
zI!j@dw&=WwBy3c>)-g8l^N{6;6;|KDdD}gXp2j`x8_}}&W7ZkQ@hE4|Y!qBPn6U_d
zLg;ffUbhwf^k0i2>70-zi4Wlh4pvngi|z|D*61<JY|QZ8R36Cb{=8wmym9&W(n<DA
zA;5_Iw$J94$Ja-#9I~6IT@cnx?=my;68=0SDVLF2YL@iQkFJ*27~PjG3@RqRDzVC_
zb6YttZ@x{cgIz8>1eU|)st|jW<xu$*P!KWZ*y)?HkcQCzvGe~&`<S%kuSQ!34f@hI
zLPEj@{010o>gsuf8Y5&XPmDY!cykAG<0kZc>bZZ=Nmf&1uqkP>f}x>?#>6a?-7&`x
znFW>^mtMa|G6`FL)kU+Nhn0=|o9vtUpK}UDB_BRn0xJ`s@zK=J6OkuW!+N(FARp^i
z<uqMe0#x<m&ke4-36y_+DG6U5@HOX*+NfK90krveOTGOvmCMr|WX#PCk2ViQ;oEu9
z`=>iR&muYm9(^U0>a#FKu(!$$_BK+RE$WOrv}^^`Ir?ixaeS^5+o5B*Vy*;IsaYl^
zd4lY$ZVp*KqG!_lOX=U04QWIrqR7b{D59dx6u{q>u{|q?eGdht>IF-67PUHfdS^d;
z8iYRFKLsQ=L!*ftl7P?W_Y+y|woVa;nL(Bl^H0$Ze)LAB^-UfYbW?btbK6@<*8xqn
z;?iY|T=?qYKaz$yVD3ig#^Qd>g9%55FIQkkQSe1E$fjANM{!@p8@EM7>ENXEPKbf+
z&U-6(|M;nuB@_6c%n?TA|58AvLFUYP@?p>nue_}6Ty%RfN!Wx!=^HBggrYnR<Vo_T
zf1Dn*)PYAZ84Cx;71ST}{MY8phMP?~)GD2C{t5e!|3gy-JPj&9z=vq&st5&igY{JR
zSW@?VcL@ILpx$n9N$~LC$r~g`yT_qh_|X}M+2=#}N8gp%yFYU9^NZOs=)eos8O!#Z
zhkVMvks!bjJ-pU<VioI7zIWW#>$xMl-pD08vp15&_o@9)VlresB-yUxhQw|n;$^er
zenaM5^$_$q@tO&btf}l7(&8w52m9i&D}^fG?DktF*C2@|^KZ>&v)II4U5quFgn1dy
z+b2Be*3TGzH6)`Qgoi&PzZG4`ZLc1*oSa{<@Ic8~rn9aGG(gg;&8k~8E9Qb<$w4L^
z{7ljK{U9DCcwoEnA#%Hpa1MbnmhpSnH}wrV6Is#XHJEbIx-L?ASwjIBhp4GYE~(_?
z_n(kJ{0>poM3~`rU1=KL=0HBC$s~Xm0Oj65=NVVuEGDnQ=jOwgadhl|-|iH$|FYes
z(&V}c{v!Oph=>2jtun^@FTPoItz^5QMX3gx-)BtYr|9vR{C7GB^J7N(U7p|M!#~~{
z>0~|oEFX&HO1P^L8p$^wCx;0sj65TZNbqRP-_uz9ob;NE$(kJZWrfQ|u@TXywo2sR
zRDh(RPV137<sNh@keKYZPRo6va#lPlC}>Ye-9dQeaWC^5qLq4}6!D(y1PQcJezM8z
zpX2~x{f6j%P7dhd>C?DDReyQEL4j5I`9Jv)&_t45&*GmHjtsgGc3-T(eF6#vSk8hF
z9HDTVX56~p7p;oOuOlqWXq2)<o+6R3u6i!^NoU2ogyJs7{?yCwP_S)s9ACT~`v}Xt
zX;KxB=*aed`PlkZTouh~k9{l-rUw*%s4+DVOAM(?3^^PC<(!94KyHBg%-JTAriSkN
zw;s`Wv4;=Fw!gn23`gy#&syzwxiY0>C5A9rl@2nKdlwh#5SJNnf<Ev;RcDZILp4Cw
znt1InVhEgaLs}LyKU<TBO`Fnx^ZnatmaF{1^Bf#Qv_T^AREmU(@pIC1MIZgL4B-4R
zRoX7bL>ZW^T;LjeDx>j-g9iN@1B5JZ12V>~@I0AucXm7ocf)H9iyXq}IMEJSsYSA{
zFlgR_YzTI_U2!jn-XKn0&wo$=VbFY|*|Hc=*8?Ob=Lr3i6b4tF5X19WR@cCkm|u8{
z!wN8{L-5;iVLq5v*BfJR!wVT_(WB*hKUmMrr#+`kI<eAq`257VBj#11-z@v$bw?=F
zk8><e7;%GzWkEoF$|JjB5@9ETdxjZKq1f_g^l{Tr{Bv@=H*^uOZKpp^9W~6+xc3|F
zi+m9_vEJnM<Kc$Oe(`!eqE?5;Rtn!_R#-NdtQ!G*glq9`5m0aJdN%96vcQW$86u}t
zJAI^%F0ee>gZ2bu&3S)hQgoG7K|Rn~D0{}!YiOuxanx#PQmP$Yk6`Uv%o3foyM^0c
zcIX@+gD4ffK51TmIEGJwkK11f4*q8S_dGDl`CAeLsqoJ@7>=mX1boKmF+lp9euFeb
zI^ygJfn;PgV?E}@6FQwrL*PVazo=7a)>i^Dr)CYr9&r+ww<zEx`C>oo_-cBlrx6NO
z!IvjCIX@ZaCHj2DcTfRkoQQq4Y6}3JLxah`^8Inovyv=rILc&5ko9%UWSKJaP_*U)
zAUjew)mmlW|E*_n81Ak#Kf*H#XR}vd9gg+>^jlEIis#daS>M#BI!B1TcMx`$rsbiF
zt=+x&V<nc=IL|?T;FoBgZ!*_*RlNiuAfUatd5hab^}}sG>ZlSmk5l;#C%)-Ogz!eA
z^Hl@D!X-l*`z54MigLh{W+@?l^V;|beARx^_s38wX~uLvvCjN5o}UF$ATczNXbyMr
zj)z=tmH3{!g-V9Hft|*cSy|}KNXHJia0HtEDU7AI#>IO4E-=aL9JylCt8~*VBVUNL
z^9ciK_{nD^c$E{?95TAKo!$BD^EJ-;AEbO0K<1Kd;5ccaXH8c08(`KhIKKmRFd^~H
z40S&nnO0Siw%8<wt|kv}4ST#O@4uM`f<Cb_x{?Zlcb`*U@-IHh432A9lzvK(3<au8
zzg<iePXf|}pBnhBI#eQ6>^uTDNr;)m-?iZfu%3CgGaQA#Zl+xWGs<%st*~JldUTZ7
zAjX{mxzlApQo>j*I?2gvWSsq5Ak;Gpi8<)qlDr>*sxV|TZO42_GWFf6`-sTrAb9W1
zTrURc=Cy|<yPid<-ybQ@yQz|F5n=CRfXU+vAtB;E7@R*_LH7-x_){D2y`ye~MI&nK
z;+^Yoq=YBLV{c7c)YUM_%a5AxeSy{1Q3!t#qo69^qWc--D+op@0<`lDuXPMzFN1}Y
za=Q2)FWucka9+Jhf0MN+?mEXPb!N5h_u`xCD%TSERX6E1&XcS&U!Jkn?Oo1^N?zAU
z>xfniD^JaD=I@`u>=tLtR_&TTQbTrOm2fw(?ul{-u|K^(@v_E0wM)@~+&gNP^;n#R
zFO3vkNTq9r=n+@cIwT+&Bd3=okw(z7_=!{n6ACsbih$(0zWo4iVF7&7bD@_sdEs>g
z*&6CA3QzFMmoJx^3r53=xMZ&VFgS6Mz-`&k)hm{n+afd&_ZWdt1O+JG#&W^j84z>b
zzXOf~CH_;A=usq4IEkQ@bpm5wb@`&*9yJo^!X06CvK=n==mVz=`7Nj>=vE@~zECf<
z6DMg`S7O4=xbymz-2QIk8C4Qi5fbMUn;6G7KgOJ{Lln>h#bo|C+Vfr3@5;3&gJisZ
z9}|fPNpuZO+Pn*m&mDBE$}abQLX{h}D)u}l8B|;CMMryg1bb`Q1lgW|FciO*I}Z|4
z>%CQXdbZYYTj_Oar%Gic)(GZg9A(cIgIMBFQ~_d)2R8*n7@Q1_xf?(5f2d3bpRG{G
zp!9bM#8Y{#dl6!asHTW5vy_w)b(VHwAUx66)>odeQVzyX7h0Ot8lUq<HaE9U=3pL}
z(S$Edr2ta4^~$f?e_~dcy70*j@weZ%QZ6LK=gC}biK|8dMNpbiNtav1S7U!^RaaP)
zvhBkR<YN?^M&;FK@3tGIkgD)-PCEfw;)IJy_-gwhQ1aB&elV`FP6Zkw6tY{aSBC3;
zLzwy}aUg*h_{-@fhBh)ve*d$BLKYqX0+=xZ5)2ERt_5+1E;wCe$FC5sv{xAIAA02#
zq5#HdP8(;g`jS9uYt$7ic45O8sJH4xN3%j&(;6Vfn970B;5-vtu)w-s%qeG*(s^<Z
zzzqhK>qP@#>gjJX*|$AFy9*+R_CzGm;5f3CwTQaI;^S%C!R;L!0M7qQxDlx4nqVLs
z%Uz<EtGBNMwciFh7gLa|F+n-7XK2Y$aj{<vCY#1&i~TG?RV}d`dC^~#5ZgXs(hoqp
zqdm^=$8tZOnxz*lvHc+a)?08Buifq)eQ<M&mi3vcYAxNBBQ~eq`Q=)n><UMnc14T&
zlkmITh-CDfMvoEK!DFEOaaRq6(^*Dhd!Z+M=G7eXSbv1(3hR?~h4*~6*UqEwHR)4g
zjMRQ#sqr}l&}fwtJr`K_oG7#;fxz~z=ZfK!S^c*?9beZ50d1Rw)cf1kqYC)ta0)M!
zMmeYDro>|Ex0t0>`r*PFt#nvTwdaPu@EXBJ;k_*h`6jQ+$n8)>GwV}_euup6Hx~`b
zJ;)Y826|NwGJv1?MFh-_X-rx7^YLC!nCU3p9<hup>q+T}^Am__wylt)GnX<41YOpD
zbJ-G4A$=Z)+m{_R4ue&57;W>fBglB{9`3IBG;KqUFt;)+1#@0s?X9Rj?~dIIxBWA+
zb^~MAKga)*$9}}brgUYct79P!h;v@+5?3<S9_Wspju+y1!dy|U8l~zy2>ul{kJbPH
z-m}zuv)_u|B}jst#EhiKcpZ+tPR9M_Wc!@oF0QNwAc4?{TbL?jV;s3V$AAV$jZp$~
z7t7)8)e}%x>CQcv{I88Db!vmu4HMprd-R<}YmKHcI7IRsD%2dPeyRa<5w~btzg}D$
z+)ifC)$p%Y0X=Ido!2g$&CZAy?|{k&R0eg9VN{S8^oh{vg+4PIYn?7@l3r#D)9dLG
zM~<j>MU??|9+i4C(7J(E70!t0$up!iWX?wCt+LFUL*d1XdffeeAU((S<jIq8K%Nlw
ziy_<^bGPLupyok2y2WksYpE#f<j0!sObKTg=u}LQ7B@BC`wF8)zsf>%@$(Z{--lAm
zpRmNEwv8&(qPXNru3Z;|NgKu~WesO<`XK70cOK(L2VDw}1=g5UV@Z>DnVdvgj4fLG
z5|cx+(hA2ON;JvmLFu<i43(`GwK`>Zb}XYFHNPn~)GSx1eJxr{y~<@ao>gT$*k@C&
zwsoaVTQp}4!O$o=&kFQLw+qtxdMex4DwD9an_Y`{pTj7`K9YaBw-r7{=v+S5UadU7
zr_^0ILY4_qlaSn`aUSVdqgYtBuU}j~<m9>XJLI_fejD97$#W#kvwVOvf*<kx^>#)c
zJzku|^A6uO+w2!;x}UKmpVh)!NB=fPDlN$>F+eNd7O{2P3L_r;Y1ph?l9X8SEWqk5
zFKnYLdOsjRrpT{IINM4=T0w)4U3)Q^P50waGDp4V*_Pm!U!>vZg5H;9zV{ciaJh}7
zHsKG+A!$|$^q76;hOx_^=L~f<P1>(@>p&kq*w=1FC*&d&3$5nme^=3fsbGga7ZNmh
z=qSkk|Ko=T6c0VOVD!Vr=0~NzyQ_>>x}s+2W|$@>bAK4P*IONv3Q}$pe%8mzC*GZ1
z&pM4U)V-@2ZEO<+(-_X39$gTea2ulpSdnpR1!~7!-@IW3J5G)2US+Vpr_rU6R8xJE
zA!02o_NO1mr**|8QbLH$_e@zw>RpELY$9dk_mTdS$4rRtqi-`CLLpWXnq%m1?BoCk
znsxHe>;mb#8^n;^R!O367#lvF0VSQi49=1^vd0!k8co>a*BWCG$*fvq#(eWA{P8S%
zFgy+)6!Rm7OOpfyL$v!0^^T;hLmh^~RqP3M62p!i3uV4y+2z1Zn%!#qMh?Y|EphhU
ze<24HHRny*+ur9;ln}|s3=BH`k`CYXuVU6ROr3~D(D9+i^1gF()vaHMrVF`Io3E<4
zC^8(O{LtQk1G&?tSBRhDm~FAYk!yf8?Od!J-k1;FwXix!FMRuG!hhkNa`S;6VN~7u
z#fkDX<k*I>;85_>?-&CC=k5Wj<J}g&<;py>6AJ!P?fLFTmxQaWujsX=a9u%HoKMXI
zC$gzhILTx$!*769%O94e*?cv(GTQEFu3j2OAs-)MWx^!$25}QA8GID*z9i7MuD>=|
zgSxspgn(}!)>-0vId2K~y@ldpdm0=B+I=tIx$Sfy4IshJoPFLH*Aq6>hPLX}IPq}g
zZe!P*5zFW4l^T{**fM64$p?7guju54@UYai@c&nv0X|d}Kz<85C!@*GKkQEE!jfZ%
z;^B+BTWENPBN~PmB@jict5%K2hm{-r;r42=Y6C}YzL1J<<ZX>H?&2#VZpLC=$F_ny
zju>`<%ejfx2U<fvJmCM`(KI|O;5@sqW&k_pZ`@THHA(M=yr-%Or(h{!tN_q*vyKzU
z;_`KL*QY3F?{~=oCi~|*R1aS{^}iF**&J<WFA~M_1Z#Yg5LilDkixAIf7HzPORcrf
zl=S~u|D4W>hPI!%vvrfp|BvxXly=37)3TWQe3UquD*~KwIABMQIO72Zb&c|SgKwBw
zwC{Eyxa~d0@Sy>KTu)Swd;LS!o27S(1xq*6IcJL5b;gR3CXJ}Z@kqyr`Fx(ZxsZY>
z=rJIoSgV1%(|($Tg4olVMp&&hVNWcKsq;&9MIEovdz>Up=(1F3-Axl})~2byoqvNs
ztwXz85v2-KF{msWfw9c1^5-pH4wel;Y-DcGu~FviVwsXjFHH(fT)(Mi`Ir?fmD9Go
ztOsKc`a9i0)E&pSWuCe7du`WVW|LX<b;X$F=A!=^ZSt@^Lu2SS|1#R81+8^H^Ks<S
zjeyP&KFJk|d6+InE>CCGH5xt$Ge64&ix&K0cC^2DJB<JjM8fMQr3^bhh{tE6`xIne
zFL0+P8q`1BW`vP1a?{08a`1TDZUdQs;cJUlOu_B#YaICb=H>Ph^URe(BU#!ZuL7;s
z@xOxp_pJsUCCdreD+bb^2y{5&#F3+Z=QNYM33j5Be?c3-6LT6AW<AZ3Ij^^g9)!LV
zjgrGg4xNkwjc8K6V4UCbt%+2&+X(W;`w~B!=;2L`@`E^nOfM>ep<T?9EqgN(I;p+I
zD95*j0%@OZ@z%IGgS`N)$t}{-^%HQ*b|YUaJ9cmUVMH515HRX5OOi5EJTht8BqRrl
zP&0{$blmk|Le)J*X((tQ^p}nWsI&w9f&rZ$cScvwy%o4ui>fq=*w_^nBn?UITnON_
zXy|C#uij&C)8NS)t5(Dx435VUg1*qaWMmkk2>!tWqGpbhvHh~}?$jd;TZ^TwI2xf5
zE0G}a;_XNbf8m~N^T7+G>QDDxIvaQzSh?oXJxmBMGSgQqtkN}v-oH6--s^}P`07I5
zyP1kcE9RX><4s^mepNGsk<=++T*;i?*e0_JD<JmB_j0P9ZP?%|)@ID5(e#X<GOE!4
zH6C3V-^+F>7;+0_O?jL~DAIGsY$Hf^2@)6=A5K&e#`wkgK%AYA>0I+wL|`s-3<*u1
zAc0e7T1B2lOKy6DQNaIc@v~TzYt;Fmug{%BZeePrN@~ROy;msS8n80m+TMva`EHz1
z&U-Uax$)EDs%1DNJ(DDdZ%R76md)nIv97(yDkYIn>c+aFl(m4N%g#*KzxBd7@jjgp
z#Ov1ca*s~$Q90iC#5}Jv#BFZNAP9$tTpdG@2g_gG?tdOn|Goc+)S{z|=jQZr^=+9L
zZ_<}0#B&5c3nW|@8sTwo*c!}b@8FU}C+&|R<i?T%*M$B&&obh2!CthMYmBQ@Q!sbx
zW<U;&F>6u1`p!KUb+C?SOc7w^DX>p<$-V^|eX&XGlA385wlT^5-I|g9i5$hmCiVJt
zkLQuf9)&<Ff~nMHn17YI^ol!1@P68zkQ0vr)6jEX`*CYlk=tnR+PI&~*|>`z>UCr&
z%-Vx?rS5KU|0bPA(qz4~a_*dnReC|5lI}gFrxwtoRgPUQJWRkLbT{C*IGt}=$~}kO
z`qe0Ae$}pJ-u;_W>5V`|i=W;{^Cj-90}FJ&tY#Ur8@N$=u)b0(=og8wsU*b=OHo1P
zGjroKaDCxh_3iKHDy=ID1?zxDcMtoTqy7^#ACWRdvN0W;Ka7pQJjE}_J_h>m$LO_$
z4hO{SBypqC1?f5N+rF=h1#8;oThZeF95lLD9L(S<$s2?E-F<=F96dVOfyvXI8^MH7
z<wQiZI>r!G@5_O&k8|V+v^W=bSt?{Oo02D;dvI@eEseQ%DO8*=H~aNBoet)u?&|f5
z0K<#jylAl+kRE(a5MM+}0cPeK>m?8H-dl{`2g}#I47MZpcXmm6uY*(1HH`}QvJP&&
z(+3niSou|){C)rE=G|{>JX>sQpS+}sJGBhT+I!I^Tzm2xElz;wzwP0FxH3O@{3Y-}
zo?Q`>gauCqerskWNW<vAGstC_<FN|##-#<12H~5!pc#zVdu4~cu=_5G4;pApu`@+u
zq(o7n125&2dd0V9TnmVxAg@tM4@o__La*m^$hOE077TP*j$Zq5OF?xQM)Ew4k5Tn(
z@P;SdWlmV?$3YA`olw7wM!BQ$nUU9cQu}>UqC6;Ia1$tR(`Ap@U@Uu!<0((vA91?N
zqVti>>)vgYRzscfqpX?y#tskE>xydep#u$ixJEl_5I2JnhB3t!M7`JIWFQ@=#XzUn
zFZ^_8eKVHgu*i0f)_1GzPwsnW2dwU)LqNW;)yU7>GQlj%tAD(+6^|+gWGp#Cw(EcA
zdFXz`YzJ-)OQI8Qv`r3s8I~lqg4-i><{c~nr4GUiN;p1IhX3fF7ASct(y)@_Xab|=
zMoUzCo8z3y1S2w46VfnCbS|0VFt9nBwhv2sk-5l_i&%WTIeMq;(%7*l<F+vuD83=#
z+oo=5DBK`RW^e?`gEGX0@Lj?ME&ZQvr{I>YmTd%Kn#%p;ww+<)7;r~gf6Wngz5uh1
zrxbAfA*JdZ<@TJs)op8@s;r*-s5@-Ul}O30$1&Z&$fh(YqGrM7Cz3KE_7#z$%64BQ
zGyIFGw5!1Zyf(S%U=ad>JK2V3wt7sm^SL&^kX6?9uJx^2Acg)fuI+zp#bW{Ra2XuB
zUjUmlBYiXDJA$anO^G~#qibHVb(25l+-8_jy#2&{6c9UsA*P`uRhTh#O@VlBaLqk4
zSRmOU+PPYk0(p&8dZ^3fs>=O@Lgc4(_?2)y&QdT^l!wZiwaQ}aAR^HO=U0EFR!Y?g
zkENGka&TqqFHgONgp5ozgXTo0%qxDv8Jh-qv?<p5{3e<y(l5WaEA(s$5+kyf=#!(e
ztKx;B^HI{UO^QnxrX<Sgvlor>-7?d2Lb$?wAGLzR65eHM(CA7P_A(tWahoPk7l3JN
zX}F-DZc-0!cWv<ny+!#IV#A30DQ`aeM&XeL_b_A$_+2!&LlM~d*d|?<K*6ekfKRu=
zOPYXug{uoH0_YA?dRgCQ_Bfr*tI~Y#goW>38J)ggGIFVlY-r@Hl&U3tr{86mZCH%p
z?PzS|{G{>VLxd4eaZc@-ENW2JEH;nhiv3REGip8?`hyUjJsVnaix#Ug1R@U|5AMr>
zc*fgU=d~Vp)W-TKmNyhZn5zs+QknZv5N@G<xh8t63(D7$1$HUQR5~=%ioy3Mv!;;U
zKN_rRw;?3qbF7s12mFLvWsQq`UyZMwHESsa=OKL~uB8t3B9%il{Gegff6n&t6dj(l
z(#qKSQs;b_lt&PSXAE1A(8sQ8M3kcMmJosow<E67PGK-!r1l}HW`7P8?-Fn77e&0U
zP697xoDb>tA2#X3Ct1vU^#uMFCL$Ci-i<susv2OWklCwO?%Cbz(_}kYXGr2#k&)cb
z+9!$!s&oFK`u3p8qVA`!UwHyKzsTxP2hevR+BqUdFGz&BiwGm!EvHoKsDLW<@l=VW
zeWVbi$MVap?6iTRo<JBXKYp4Mx+96-+3SVErD^!S9OcqYy8urwkq+!}L?2!%Wu`~3
z;(QuiRHP*qrKs~oB5D8rPKnVE{i*DPhciq9JjQs`d^15V^seNE<Tw0ZWyx0RombK@
zN8IiCiPqJw(aMsHump@)Rw5hLYb`6D>ko>FDJu%Zb@KzX7evSfTHPMH=82{5iiAn{
z0aj<(j~|sa*54Sl<nv9c4se{-dq)5T6dNh5U5zWTBjiIqE-h$%=b}HrxF*?zar$}f
z3ey1}V|lacGQArf8u&{bb7hc+V=?Y|+bVv{l#`mbd)CerMsRRBv1tqCN(d0kC%Oy?
z*`Lo<+(Y1QI^hQ*V<ycwP;l~y`?8jIYkLG@!`^KU_kZl}{~flj4am}jCHeKDZBycK
zP^1r>^azUONxqE?c#7TU<eXvv&>~)vlZ)zOZBO_P_&C4**p(Sr(-#|ABk)k2n!p~z
zDX>E0{0DJn&S0{oGnS!r^(Woqx&xh)7zjR$=csm`ZT-<yg^9dCc3&N)2(lvMH*NZ<
z5A^*g16%WGSE&8Ed@an8*hP!5d@B=>`&8yD0WFwA2)wwYu3A*>2_(vRbSf{Uk4d5^
zYY7&QUW1Q)<J(cq2j9dB#&3<*e>vMyB-7Z)N9pe3oOu<Az1}$In9=-F%BeuHj%xkw
zcS^Ljc`HWu!}DV!Jbo3}Sm;ugWau;p0Hc4pt%xjdPqzZ1$12{iV`G<G2KVgjes6_6
z`UA3JH)x?_Y6(0#l@BC`V{VgXLJ!eNU2)UC{6OwKW#6Ab$AIHsv~ZJrHN7<nWVG9o
ztrXlg-`{>%nc!PUn6YD7r||Y8pJ?ox2=j|5@v0jryDbcRe1Z5#W?(!-oG?FXSX861
zk#OE7?>Q(-1a?adE6y!zWgLYs#%|0W?3K?R41orfp$3Y{P?K^e<6U&W)!SlrmH)>Y
zeMf^QJtP{^Sb4_kD~sslNRZpnnZ8pb`-LL+@-O&6YaCXm>KT^c6ACV2ng5l-w_+-M
ze@b>c*Sq6+u)xa*t{Sqm+geSx>vl3m|7?T@dfH`ErO0IzXw?|_+66B+xXPq<R?Lz|
zo2}k@C>g9Im6xfLQXWVLzGUSXRPJ&A=sR#48fFz+6X{J0-V%FkN#HC^;PlHi5MjwR
z?^9VkSr3IC{&=fn_)Ua48QJVt#=Oc_3_q$H`rZqacRJrs!@~L-=L^)-z&e{6S8k|1
z{b66#vv5Q7nQm#pQ&gwr0O*0Bf0i?B6>4}Vz?qAH4d!VvAGRLh(9JiORiaj$`v(R1
zq;Yz~7c|#Vl<WJqV8VPh{O4nigdn<(Ar>MWWjSZqKGAp$0_cV)i)d&>9?#|8UpZ_e
z5-%K`lF6RCof+ckcsc4+H0vTSz&k*Xp&@y@&gb_o_Gb!rr)*?`kbJj-vRlpw(P9Q!
z0o+(SwN9yZg3WwLhrDt=8OW2b6Bt1CX}h~EQ-kE}!*}rX{s9gFC+H~4-$0+|bbvM~
zsu(@nX)KSAc65-G!QeFt=?J%^_glEc*D6NFmseYh3P+Q%&MkfTp%6lzxr*ZP%8`p9
zr;2(QOki9xj9UwoKyA~f-?{eP5v83590H|Jt$Yhx>c|j8ALll?WJ?F)^tk?TE8V8K
z3Im%e7mZJNLRbvXgU}Hogejc+Q$k&QM?ECmx#n(BR&U4YdIm;t#cOJBasT)tz*4V4
zDMw~KEhL7=OKfwEBSHlk4CNYYbzdnCttOjtkIf>+t4z8?_h(bAHOp=EGqpgkS}F0w
znZWGfR?G?RqRe0z>9yAz&Rm(V#96I=VZ=tGuGn;Byo3AOSuQV%{RT8jGr4u|8?6s8
zs;7NUa_fqgXVWVio4(iHht#KLfbtk4>YWBbnueMO-NP#!ZNto9KmFX`YFR{zDtycc
zouf@SuI6N9pLD^UO#Igy@~%AS3cM*zGF_;8ZpE))@ZnfHQnG<aFGq`gZ<c?a#ORIU
z^o0HD;cq-HsXtuI-=!BMmb!`m?fce#1$^JeSa-SXl?_5Ay@y*@b{AQkGdv^HL~^8$
zK}#$#G<YEE?-VMqJJ713SN3ktb2-n9@glJ7Uh?mt7fSRhnws%_r*hMh^R7m1_a5S$
zC_lu)lKf40KNO_=ou#+&g6RHE$EB$Mtp&ghZ(I3-2k5I4T2DX`B`+4K-DQTvn8E!6
z6R&!nqdYfhQ-ggAOi$K?4ePgRpmd=$qmR?c3;}aSJYKWfR1mZ-Vneo{o8rIr;{l-R
z0P)IEGb~GDYiYk}rzCY9HJDCtW{z!HQ`D&?GN#FVHQ7k`#Kn2L!kYkOqZdo)ISC{<
z)7{&X^8l&kNT4s??t;<pDp=OCmY!N3+ilM%<RP5nB?|>1pR(mhQ8YmPT4|l}S<fZl
zU~+0I7-ZIJHK+T|$f0lSGIgm~KE&kg_b!*A{HTm)pLtbhm(f-r|3$ZF{mt+;^{2q|
zQS^9jgnhAERFb=4Y0cyIGMHSrcuH1by~b&SNbk`t&-W7244Io@&P71+K;jQSWHA5b
zpR7HX0M4%^8hW%34=MDPx<Ybf6d(7CR9^STDv)v!vIk*F4M;khN`@5}v3~cStQJq#
z?|OY@9A0I@1g3FvoG>A8X40c^y%|D^-|ssC4z6c_lS2BvgRP^MKzfK$IyDe*Na7ji
z69757Bti-oqYvBCij%3e7}IqfUy#ER*^8CYBT?rk+zc>q5Irf66H0t?0F<j%!WglK
zO$p!2i#`|@!v{I9#m|}n;D>9j`C#XD1-Q|1vj`AMHeVF~GiLv-W%852tmoT;&bHl5
zqxsYN)tWQ~0!Wag)7w83AEI*^FezuUb5SKpJ56s+DaLb~XrU?xD|!0N*{l?4VZ;bU
zlRTW=AtTJm8_XT4lq$DRe_BC)8*afu<QpbGqZ}6P<@g+`@AqcIT?M!Eap36G+KM&T
zTy>pCo_g1%^`Z&jog!zTQ|v{hZd6hZ4&~NHshDI^ZIo2TceQv}v(kYDh_O<qG(kw=
zB+l{Mu$qGyt`@T#^ts;t@@(n;<GpXWa;Cb4DPwfi&-66A4Swaz6Mo9U*??Szu%sNS
z??oK{1h3u!;MI5oeW;Ppul?Q8^~8z|Ex<+iPCqFy5tqvVfX{SatQ2tgL$a@YA!n&)
zU#vRFG9^Pz$b|(q<8o0afOw+rXdvUGs$uC5Vc=erBwMz4Aa~9yx3}y0EPXTcb3o=P
z>$IHBYl9c`!U6&oM<<8x#jA?6%wClK%+jL|#$_FEs`_0wYd3PXXLnbYs`%1~xnMv+
z7>NCWHjAz)VZ;fvZWohw1EVxx?S=enm-5zv>8`H;TKyI-?#7FmkG3}eDrT)x%8E~H
z_v=e~JOz=z*V`~@xopm5cm+Phpe(#ydR1v3o04@otu%hR<lSE9#AErPdWM^j$y@AJ
zpCv62y3_sKsGi2G0T*lHzG-c3iTHJq=FG!`JN&p#XF3EN%bW={o3vmAN7>Ei9-Aaj
zYxk5+%q)-ZAZ!cYIb$WaUbw`vn`Ugz-}aEeFYep*o(h6Tu$7~dU~PFCZ>zb@2MN=K
z+)93GXOTtuxQ{u^oB%nH#Y3QzTz+rVSo~6#S9TjgOQ6(UXH3=gSXhwv*)_)P21hC6
z@Xon3+rUB(u3r~P{_=nTA7lhc%q`eYa&D3SU4>QWf&DM9(vr;IGRsOKC|LOZ4S_(R
zi}gqPJ`$V(!e`I8^R-nZGWwWXbOy&ex$r@AYkr5mF_8W9pW~hPcfV4N#5N-K)J}J4
zz!Hdc{$AR!q|m-2lfz9iAm9wmsFL{kD_`<`VkDLL%2xYbT&cWb0s(4b??dJ6;z+bO
zTsC;P$?G8?o|N8Mu2`L!&SZSA(wi!t1)n#Vp@#~*H+9%b2NrY94c@$TySShOuW)w8
zDQEEaEmYPCoH#jl$?6P!^YTA$U~mpR&9jl!E0}?qJG7dIuLAsVx?}3w_7&q$*HHsO
z@D#BTPrPMTMe#ALKnb#Zw0eee$)J%I!|@CiI|#GUat@uV_t~>na-!;;C3%yYE%&4O
z038`vk#Z24aEnts<#Nq*l*6h$2pd}kLw)t<2)j<}yA;vMlMjD(SyfsbBgae@0l|u9
zk#1>BTk*X&C5a45Nq^cbe!kR$S;c<h_)Nart!Xp6t#P}rbx1zrez-17<GX71`Oc%V
z-T}8ZSv#`E-LM?iqa-(f$AEyl)rKXB$S4U)D-}DBE&UQc?6b9<qWJgW7ajS7M;S)X
z9TargJr??dCJ3{gO{HCai-O$TX=H=#wLYmg#z7*5We`>%*81$Az|ae)0PbKT{#wSx
z>zt2F>F5-lL+5po8X-pFd3Ni5yBk?^R5(b-n7N4cD(|`GzvtiP7y+>g%nI^W2Dw^d
zBr)-*B`<%2S6fZr%<}TE)^ru(;=Hk0VKC1l-Xek0%NG0k%<eL6dQhQe?5pqcE|;Wf
z=6th}AXrS}ca<-d&@d6m;bGXK0&E^nGVG3w=GI3WOAIWp;R@b2O(ns2M)cpZYmEJs
z4PaxhcjPG<z$6Z<Z3R^(bl^6BtAU~MI91pfwWwy%M*vRTptd``^IWOy<@qk^K70ZZ
z&Y<6!TKMgYhEdFj9@x8deDO}sF2NexRc2_fQnvuuj?%?4=h~ziKk^AV1w_)-rR%e8
z6cDFYI|nOg@bv8_Oct00C48o1yYv%0m&RNr*x#I>dmQ{q#Oh<z>Qx7t0!r^MYm>48
ziMKCFl)})t0;S|T??AWbY`@Q<8_lP2Ye{sr!jjDM?A}t12jn95j)clu-mQ<*fW2ze
zOvRmRKfUYEJ^b%*?*H$0di3IZy#A83+1*(zqmr<@Ycm>-T#QVHtln7}PB7m|;s2&X
z9jj3ihiJt`9NvINgt^{2!I2u@g*}aVa!oK0@H8rORA4C%5=MniW+!!LoDQ1r#ScA@
zzWdYT;mL$T%5%)HNiNq*{GI8K*Z}!KVfipol-)4y!bppw<pse4(gLxxlD%V4Ffon}
z46mtF-gaQW&Q?@^Hl<ioWEGu-xYLz#9yWkuE9XR~-C4-WICO5V1o^eps9jp6g@;2A
zYxg?IoqBZ(=|iJ#^vc(BX@m}%;kNTCsXP=b5d)nh;H)~CRdXDtn}hQ%Kq+mH;vi3r
z<;?DTjo`BCSTL@f_ItgPFpX>-w1Nsj4{E@Hyr|Q}h+@`&h%oYE1SwB9R6`B+J_g=6
zee6>aZn_X2#SD~`xr_85F3;4+KWBPF>T;WONyX5>xILJ*+O`Lw7Ig{;rLh(Ap6QAn
zj|%ULVs`k6%~%Ya_M^^AVIC!A<fVMJ2UBT=Of?uXL+(e9$~j#sdXtSVE1$I9PF*y>
zVTWgwoi5Ki0@BPA``N>YcehdzH)O?$Vg&1TkVU>`$Mt}Cjxq&RjK6c;lL%B6u)c>A
z=9*hKLKeNI+m=0g#OB8;#!M*roVy{Svq5dHToW4sy!+4o<si`<cfUFwCIAVP_c0Nw
zCi=b4qL>cy@i$W_A&|vKd#U0XMh2x=PUuU0<`j2Y>qxC4H|XNq(uw?67Ap{&ksMAi
z5GNmRM$V|`oa@42uHKJl5z@i|K>p+S{{{KGQhW+S-EzSL`QlBAppbxwRWgPN$GIQ_
zAJdyWVoOzlU|z)a!{iQ+iliFc>rMclbVE8sYaTtehKV1Y_ZgRzM+M~5V`%0|?W_w-
z%W63l?ppmbrUV(eKtG71dl;ZXmcO<6N{+!W<9Q!KKo$d<uF+$-onvES(?GAl_|&c%
z)Z<^tAI27Y`m0-kHQ}ZpxvKxdzpiZCqCQfyEj1Y9`hC0Wm&k0M#iOIKm$I3#SVwbF
zQT_t`9jA*yz){{`>Ij5N*;3pyM3AjdZUOyq`oHs50Q?Y2y4PSCGky%re!3g|{21%a
z@3hfM({t4-AFv6?P~U3NNm5LYY&~9wZ&zBsD3(t5UR<_hdbVdOHjeCcyW}z|INs@W
z@~fv+T1f-ulPe;!>U)(*04lpseK}rZuiJS43xT26*3#J(dayEWw+ey4Kl!<Ji7n3A
z_sD1vTU**IW^9sc3c>%ivVq(m@88@XKB!2=v2BB?%ep_y?|OKgQbF)E=t@77)8DM)
ztt=H0daPCw_dXtIW0UC}PD2}fGHAoADU>@#<ZhR&;2nlsujgR*r4^71I<@*EI=no*
zNCQ5Ujbcn~?03Od=PdL~%V9|^rN4SbD27>pIk3{aSO8O0+1D>Bx635ULa@YQONPF!
zvU}V{gUSvP+evm$y-WMu{WHUDPajFMP)*MckBGmypk(r;wi*p`uDl}dvddo5X`TH0
z&Ku2QAYV-kC<iF6ceOQIMiKWWc{idF4G}W)(?k@0v16t-*a+$`rz%gz>#<?=ileG*
zU%X^(A-T4yV1Dz_)->LTm4%t;<V~Lf@iD2mZDD6I(e1))`K{NKJv!O0Od~R}XpQf!
zcA*w+H1_)3rQbs^<l)RH-%8L;CyRVgyaH5yrZ655$50FgoXHFtuo}$gX6Ji_RU?zk
zi|s7>Ttw1oMVX<#MP+4a)pRL~UC$tfuyfJR%LD7`Pd@V&+4e28$W?NGuENaLZx5Yv
zUP7P#Gu!5IitkbQg;<MO04{H*YEPKJ;$zYEr7vq`BV%Y)J^eV`oWcf=R-yP8|E6I>
zdtqU(yFA6E8OOW|A|67K|K*1QT>a--MmWR~@5!DT3=1deX<ereZZ3k#YT!s!<0h^K
ze=w4J_W4&X&F^37Q#ZfHtZP_Hz&ITi^LJ?|NKl@u+^cJB#OgnkNUCP))OfxaH4=d?
z%~q^~9U4N)0Xl@;(r_xD#i>~^2J2`7rA-t8eGRJmINf5o;u}@b2%YlIlti6q_^WjO
zMXum`A@{X!1VTYuoqUM%iAClac1icd@J)$%ZkfT1v@ZnHy$gm6&!O|y;@v~*k*iJj
zNg=Y#?<*Rb(59a^>ZsMkc$>$a_GtC@hN)}J5x(G7m9C$WI&3el!xsTdPgSWuGL<(L
zd0JgnM2nyB)$7~$S52iI9b8TXKE|QDw2}W&`H-(>!=yWB`7;ecRFfn%M!;?d2us0L
zuPp(j@}6Sv#UGIoKAim7mx{6W7U5eFRc=E=`vq=557Y2z=(XhcOkoi3P<{F3MUu8{
zv(x0lLgH-M(7U?~Hz>i&$ZMV$a4rC+{mVNNr$ot8$PHfaiVfDpstL|u+6O|EZNZxF
zrvm*GQ3TkQWH`K0b6sy|++S6T3Zvktz<d^qMXrA3@7t0$t4cjYw)y%^hzGjkfuw{L
zhc9TrH+>X#9uEpAMlwYm0G3kGw$f`8O318!_&RQaz&>1`;irbUz<BPhEVHgpbzHlY
zhW>aRkN1^s?WrL4WD{cV>lAemS!rqZ>*h(j2s;AFvV<zr#=3GT#jjB2(+a}|CPbfS
zd-AbKkqHeLP#-fRiK245-E^g`#JUpd$+cvLOeMn=@g+>ZtJd#6CrA)3P?3)7Cz}#z
zias`E_L-Z1dMmR=ak>Pev<bwxc!7<lN|3PTw}drMCjOST)22*|wu9y)jjoW4^J)WB
z*(T+gEwMEck|tPe;uekT>4~fU`bY)q6e%L4Sas#F1zLECGZR~Frh464_;vGR_UnB{
z^l;eB+5KCA{y>a$DA;64P1-r?(&>{}Qs6Y4`XgGL3>|$sSHT?d2zxcZ5gDZ2C$G0j
z$X-`(PkFJe@+VYj*}1%+FRhNBZyDChDhj>+ir&l@JTj$2dz(piB4ra@^Up<zF5-(W
zQY)?Wk|hHyP_HpU6m{+0wt-ooFj$Gq9M_K@b+r%&+RyhANR}r#RDzyD7;mp?eJh1F
z?(Xq_d?Aw_y61YiUn*h=pgGj<GbV!RxW1~_hZ&RQ_-_8TVZ>3hcV+(KV=EP3H}EzZ
zll&d#S7~Fyum-#niIi%U*w;a*lQNYNjW}FyNx9`Q?2^8C_}e*g%i5BBjp4&*GC1Ie
z5foz87&nC8(dCl{$<9VzvbBGmk=XSedq$?IFVqS7_Bn@r^-f#lc(?p=gnU+16{{Pc
zC2`7r&>5FLzFzNZ7xzJoEUcJ3KyMS}8r;t#bx5+D(`Iu&<L0=d(aanOx%7uEF8&GQ
znCfc6z8t=W#eZUj=!Zrq(Dr_~{bc9NP#3u#Y-S>0Q;KPKmm=wX^>h4^qy6P8I;B;E
zg06$zx^!qAqA-C?L$~e%NJ1=MhWT9EA7Kmo6>5sJseRzbZp3xrrs(_k`Q_J%E^;P~
zX&N5LdT`h>4B*j{D9-eW3SSAL>;{~Yqx@<{F`-T+1>6G`LK+#|15oENr!4OYU!SPI
z``{4`_!-)eKiGr{ZQPj}<}u?&R)aJljoU`Tu32*xoA}<@3Q#`i4;taaC!oWdScsXy
z6!U&@o3TJY-R(K}Ey=&17K`l5F9lcecX3Z8L+k2c3vSF}OROW2jZM~=v(E{h=?41F
zNM&WCL3A_e@|Htplst&qnTN(DkF3a{^X7Bn+lzoR`IX|6VPbfJe7CZ0L4p9CCIu!w
z8|+>immY(*XRhXgua5lJ;QKda8&9p=EleE^cZ`fqDTXpEpDNAJLPx4`QS{ocx!T4=
zu$$vSuM|UcJ?A;SH?CQ{x9e<5$@J_J>ug;7d!r13n}r+}-50@jLwvymw-wVk6nj-K
zWY+SrOvSpZ#AVGoUH`pfUHwh)cpJxOnA{ssVOh?#?&F3WJS<RbAJVKv=?U^{3h>Kw
zNW$=+9V)C?@k0>FClvFJn_}4Oh?JwZFf2Q00q>Jxn&*cx<X@y?(oi*<vIcMZmIQHp
z`3DPl8S!ybo2k?3#GN;tjM4{Gn)rRG`nlR6S1j2#PL+Gt+v@XJ@m`_cRmuW&i!jmA
z_@mjwHs&st$EE0FvG&O8X>qKI>nrh*D_E+dLvV1HUGL_Lbm7*i-bhwhQxh_|F4)Sv
zKqw0Dy3n)X@fds2yJ#6o(O|+j3e~M91ifU5jJ)id@2{f#*LLmygCO89lI!2^?!oZ=
zgHU0$qWMJZneR}3&2?WJ;<lMFQK9c)aC0%@3OD-6TnG@No_MDSr_KUz{zR}pvE^e0
z7XEYOdE}57`WV_)J+JIn<UY&&tw4sJ-`Ze(31o1@pnX0M*$xw~SgkXm_Fn1=_On}k
zFXsELyk+}gs@E0tm}0luRZ*{ZHOT=mmr7}ioLZLG#@&@JoK#2rR_UZJ=KI)Yv+lU$
zFL>JQRz_S;zX)>wipT{J^-x{&Ysb=Jma1_KMs4<hf)=b?)K0II%+@h;OLpjd@tMJl
zBciZWmo&{AYmZ>E%S!&=RuC8r3`1bB*_@{h-j8Bn^BTh;QP~2htk`GL`U2}t9Gc=&
zwVCo?+Agc&KOKZFC|{mJ@?CtRaceRi@*L*xY<&+-8y&762C+ICALNtDJOM2vTC4fM
z(*Do}HEyCyjbK7nW^K}@kIaUIbo*Z3(MJcFn^|u~FN~Yj>Hc||`e(lYk|9f$Z*GP9
zp55y7U0<RtUivm&3>#tOn*RorxIwsI8bxNV#rEFnGTtoYTdSheI;kc&XWja+)mVsM
zOZSN0o+s!(xyi~}gvM(L*AHCBBaCSCN!L5nD$bdL$hSkW!%sXELwM83{$(P85r0Pp
z%PPhD^#7yoEu-RWzJ0;q7Tn!~TL{v)y9alIySoKw+}+*X-4aM}cbDMq5PY8Z%$zf8
z&3pg%ex6UPUW@Lodb(=YuKkmZ4i9|mAT3Zz9OSe52n~w~AypG4vCSdrNv&VZ0>znz
z9a{aL<5><QkdJ#J!qR1Mw*mR|=Qa<NU)lhaCQLU1&^tdtmT70nUg0`znoyKcfk!mV
z1=9J$)c-f2o(C|Ty{kWj3%NqT2jf2pBLJggizR>x$QSy6@d=$-RM}weKk`Wfek6tg
z6I1+CCYDLU06b+rb&qEb5;57{x?jUWTji2KR4Da>FZ_>5D=YjzlYl1+2`@zo0N^FR
zE?^K|V}%3%`3d~j9Tbn?pV1Oz6{gU@|JVCi|Nrv79Jhkt`$jhRZ%F^Cu=zhzCk1A9
zJstKi2@O2ikFTtP)nZ6z%#4gpfq{^nt|uu#zlXnU@TvK^qRPhqc~b(Z$z>5huNY0_
z`8-`S7y|rWvB)dl1b!=FIQv7F+x@;Q%*-tb2^gK8HzlraZamA7+X<@752r4=|7kV<
zceY~=35+l&+$ct2pAz722oQ+g0Qgh>PmmSD|6y|Z|E!FEHo!~#f9(gj|L5h$|I_<~
z|M^GY3p@h>dXVWM9OXYB^HN@^{OGXcv4`)=xT-XFrAhDxjqe-CYNF?9<GhCu$d`ff
zpQfTH@XxA!G{#~gGWlmonFzrdGKZx=e7~*2{TcAy2A=}9r9e=fG%j#bkr!Myw#&*w
zdp^4V?DfCF^-x8CL6SmiV7<}(1DQ|7eghKmSWmy`t=yI<!zpWn5@FNxq>)ZOrE92b
z3Q5B@;?uqP%;SrT!-w?(VXeXtWKzEaeVdO>$glVZx)ZQ>hLJmZu4^X(*$*vy=c97v
z{DOJ*4_~6A$F}$*`DS2Ns>YvRq(YpyD!aS?u6OKl2X5yXvQ6FZ4Jaem>$E?9wn^I`
zxm-*8I5z<`Vmef=gwhX()%(7;#@#vIbMN#zOiJl*rXqM@|M;xo>{^WUx-cxhHR8Pj
z*W6ov&HW#WK9hL=JUC-0lJ#+-*x=w`byZapA|yn_s>a4y9v+??3lE2Z=>=d}3C=8K
za@l*W+7DlWluGYor=?fawY87a9&T^(yfrjZNppX>1>HG-;+G>MBZ+Uou2%1fOXuD(
zOfjpSobBq~-IeV~n+%D(n~NZy)a)))`o)u_aXc`@r1WLadaA|G@Z6Sw7V_i^?;`u9
zyqq;Z-07Lt3(q@cf%Q<W-pS2fqZMy|l)1Hhd-n6EtWKKA5xrE>yC5#|TGCh(mOGZv
z^)tSL-@_@Ap9Q8egRu<v1}G`D@nXde_i9k*)`UioKl!~bLoVw2du8^wyYxKlqIx6_
zMCdd~U%qBKD}~#2J!|{hc6I(z{PXqL_02fL@Hro?t@`cby)b=mjX<hjQ|zPd<b``*
z;NuPJnS#TuLal(?nqRIn6wJ0?%f);y#Hzx8bN0lAcL%uo<p!#mc<*0DwyouV;75TS
zKBa?l-p!}A=Q7ggg;(xlk2%gataG@<{~MNH1!4r#emUoUC-1;4LNv_I^rlzh$hMS>
z=~IfV*>x;;zC#--E;_gFF172ui@SF>s`g;rB!65O(c>C2!r-fF!p(<$-j0S@C<pgp
zoQGN16$#@0CD$HFsy!xjzaC{~e_7XKJ`=(RtlcM9UP-L`jZxq3hl7=0uP?cciS`)}
zS#Z`i4sv*?d6K^Jckf#F>m@7az7sUnPiyhVJ+q+~*sQ~nUTnAZBhicU(bIZ6YmuU)
zgs4yOBuVXX`o8w&HP{if=5pQ0?e$&rK{FE1g*KLB_z$6Hs1MULZgs{{^{F<;9h1m<
zr6{?^9RmYd9<+b>=1VCF;cOl-l;s?m@8~48SR^x-{rhwl6kXW|KkS-iKcrdP2$(IK
zxO{vy5oAHZO=)A=P7#hx?&5WFbB$rUa-8hsnX|`)jS+csm~GMRY&<{wcD^RGa#nKt
z=N_{^;q+l;VmtZdR;a$DZKpbQj0&5Y5+pa<y!NQSI`*JD{f45S<;|Avd+E0+OxejW
ze|m7^a5QO__3z+rj{8rnT(9>x^PV8x0)DHr%N-x!u(e>DtYzA4PbHFlTx@(ur(}Ki
zaT3_iJ@JjzpuhP65>{dStpY%&gn-o|KZQ&U7njarv|Hygj1k48>i+MM^Pgeou8^V=
z*Vog9wkED>Ea{GC*QVB;D?VK4)84(CYpYSuJ;TlMZ?SkBGPZ=K)TXBDs=D762?)=g
z<}15hr+UEAJH_6JyJMunxoxctGovl7O`6`amFrXSPQ&#}i`iUqi|0pMvm6)V=kPcW
zjNY3l_x@~mK`5TJ7;dk)lgaIi;_;B6hP<kZ5eaoQ#8^ZId@ZwaTxNj4BN}shI9zu*
za$!`NJKO!Lpn3asK=JX;NS&`h)iri5tGtDx{8<}s(3<HVsBwqAOJm!KMHH1gm&=?)
z>3Sp86zW@zlofXvW|%1J?ZNLs$hd}_)Ra`6@~3_psjmA4oRc|csK<iT(;V-`xr2y(
zWtKc$8RwPF5_)gjl7i*9N0TGti_)9vXO9EwT4Mwl1yiY8$YR+?jT^fWnf+78wy)<C
zomD;!>F4LX5=Spn#24K)4~zF*&R&@hN8ZR2KjY2TeMdKZZ-Us`y%9!B`Axzl_6!v)
zHZ2ccZA-6@-m3Jnx{dP}+YH78YgOF+$LwMk|L64n?|jt53^?Jaxo)|~*4j%ZkA{k0
z1f^ylT_%Xh+n-7A;*KwLLV2&w-;i#UE4}tF;CBAm%>=BmK^FMhW988XQ=&Lqm&EFM
z26HDwGBI|+BRh-l=P$)s1Eop|Cj~PJ`ZUY(pip}^_vPwL75%l5tivUf8&|pm=<}nV
zqnL|nOq!*oqdQKcdmB(s?Q%JP=h^Wi(!zqvZ~y$<1zCfWU6*~0=iihP-0^JAcUGA#
zCgF^mEint6ylsw!*&CgT6E`SypX_AXzjqk6)tKp5i;--pF2dM<^=C=Cuk=dicCV42
z5%BsWM&pW7wP`rmV%tbKzPP*8L%tU}_oV_b-6)u&#eHOVJSRH4vH8>0zA(RGcYs3F
z)Ch;V3{=L&C|uhc`s^ug@slk{1$2+IvYbBk`Yshq<qrh)q=qNaV+rq#q)4iL^Pp69
zXEvAK-?~tLycawToljpq$rdmu4d&61PWn3Gyb_X+DNB?W|J4VJR0n(IxL>Y$bh
zH~;JM0Lr4oM5w-4$NOg!Z=uZ7lE<Wdsm<rI+&JGY9=m~$gg{|eK0|O3p{_r(b*Wu6
zbbgT2{$^-|K6kYBCwnbI*PY)jo8O`*;-T1TGJ^uvbyeMLPk>F*4gLH>MF$U17^`^p
zJ`EX4Jkh9aAJ?hrX{V+r`El^G=<&vOZ46(a)$XtzFb>V*$OfG(riqT>cdvDN>D}BT
zxbLeehv&F<AK)ALJvG)Gq#k5D8OpUop2f6wz)J6dY4>h*QJQ)m?B)`kgY%7VJ^1Xc
z80`>4Nocy|CSS4-))%Rhw{`OeN`$Os119(0i0_ujeDq>Z9d8{~`1{q2`eN@!MAyn8
zf(b_ZM3z&02lf|U2X5K=Bs;%mURbq0je5y6cG<Sy=Z6)p8~${?P!v2Xd|yjlR!B2}
zX|G-6)VxYMe5c5PIP>FwIHe-Gsl4ReTakah87%^x-JNK!5M31Tt=`JIvXfqOP1HYF
zoIgx)p4luFGY-yiO|bVgrMEo9IHNGs4E{S?n&brXhuvuZr$V}HncUfU!ItayAEW-c
z=blT(6@?eCL0XaqtCPp(tjIu&l+1ku*!@#RF|%v_%C!dXg{=5$Uw2S6bz_LDs~6}c
zoy=EV=@jftDiR7rD)HC{==X>eiX<}UNN5FC#C`JQ91CD4zuNVu%kp#acIxV}I&Sb)
zregZ-AFG8>{KHaOdF&6}`yxQoAYl^jx1Gz08cQ|k3E*}LK8&6>=A0M7xNCH4^uc`-
zu>$plI;|Zy%i%cl)Tp@nEbsGz+XbIPn?>x``-gvfx)H6+@xK@2{zGKa8sC-XR5?zn
zzIY$M_jzk?=Hvu@f(b?xFK)2eU;&l?GzMCWaXRnGWcD}(yEZj1*I%@h+ipIhphJfz
zuP1^IFUg)dbzlA19z=RQag>5pqspCU^150Eq_g$m9*DzS7xhT5<?lRmD}e|CY*WyE
z?J`XCPi%Dkho_h;0=&hp`rq7p7lxVuR}7EYV-K4K7O(T}{X`KU;atf2y38YdzC59Z
zduqDuA)U9AtiI5k`7~i^-jkdZ-L!1IxHP6#X>%R;sPNJ6p|y9l{*2dWOeq9FUUs3Q
zFeN?jg~|lK&?R(Cpo{$J3d;mgN)QB`vLnl>*K?9JjR*S7$Kvr{phnw)iev#ros5jr
znfe{K?ZbLG&%y%;6-?8KzD0et+?+<Yc*gNm=H&q7%_aQ!D1|w;3scO_hn2(9Q0K4R
zg+(7LF(A~Hs@fU*s@3lFt=mqi@FrQ$=rQWKDq7U3zBR+AQnaoYm~VApnPhX-@PclB
zylqsnDQ-P(uq_f^Xq<}$T+(3#{HSW9R$&IM`A(cv8RrCb3*OfMt`p60kRue6q4-$u
z`E{}j6E`V7-=tMAb9i8f$BX@Ir*pJ|d??>{rC}N)%|wgR*7<{NU+qt#OEPUr7&c3v
zudO^JEDj$bq0%S#ZwwQ>FN&Z2rMY?c?Ku0_Z6iNI8i##h?X~`B%53fb?$({$#2<=O
zT#j^InyM?)(w(o3`xwi4gYmy&aC-hzFGliLEf!4pTbO2W7Jo|s=W`E->;G$g`Dc_M
z&+G=1q=C$SH1UYQsl+Q4Hx0=935q!Lak%xX4rxnF+XV`wAy9fD<_q~lJShoW_DC%q
z%=|<}uo%SD_m=8wJ_`>kt80<l&p24~xU0wPiY~Z3Md^XOHzJ%&#^!!UiX}0+!m%F5
z+(^bO{VtR|yADruyXy-+#l4TCwj@r1n)bh95h)M>i@i3OQc$L>evHRhWHxB<HSjSq
zzduWjHm;HRY3B6nnWy5}Jgh74<#unhNGa`LLSIBVJK3t}>ejM%^2UxzGA%2uOr}Q9
z4o!F_uTEK>!i%RDj*(l)Dld?hj&OUsPrc`cLFa9>b@vDJtsCje-|FnO=F#C#Lljx_
zadI{<75JY+<IEE0YY?0=3X2YHddaU3!+8HFH{H@Q5{!IM_y1Y*+>phfpu%A=P6yNR
zIXGYiJWB%Sb+$WPGXzjzDpkzKRJ<E5t>X9vLN*X}zeBBF6Nl4+@cfAmq6v1gN&QS{
z+|EyJMW2|r;Sqv&vK_37_X@jgkHysKoWtv3MbRO_ia}dKzN#f?Hrz+#zfh_S?6XzI
zT*6&CB8fbNX1nbxp!U%6o9m{@uVZ<9XF2K&W2Qr7&;zzw4x<-s7c%CbCYW6)wmbb_
z^h{>Z3E5pUFohC}_z2Y4R}e5Yn)&jG&x4C=bU`bPI?zNi#7RtNh_WRtW`mP(?X}&y
zyq1*C@g2%9LfjMnN7Vex`~n835kkw)kGKN<-fY+i9sAvzD+z#o0AWR`*5oota==i~
zoM?Q4QO5IwOpd|gfrp(BUjNK>Jg4clNG7#YsDsJa&8+vs$`$b{|1ZMHk2m-tt}6+@
zYo_Sw>0Dowr*-+9RKaVW8bU@oTMuVPU+<Q4DHlZeN4d9UWg~#%y-Z~}bF1<mMcy6$
zJXE(!a%i2{nN#y)hn|g1y>&HL)ty)9r0~emk;3~ke=@eI`fmX%@M=F@>};>*B$ng?
z6nvH)9pox+$Eo%zvUS&t87@n!CazfKyN>fA84T)^cKQ9Mj}p<0cP-Z!3M+c5(9;R!
zDnAMe8+B~$ng=Rv-S$>ot?e%`>Z^RT=~3}+<6VDABi!M8+vyBmqf<Y*<3uDxP0=TG
zfScE_#r;m}k&v0(q&0)s@dBJf*0aMWzU;mXHDD03dnL8{i<J+JW!Vmqufz}(6aH4Q
z8$1t&Eg!OTSwXg$6+9D&`n*|K50U7!T#X0nv#OM|B9i2O+%HOe64g=SWDD&uX>B|y
zadmN7EiS>~b(cQz<`VP2FZ;{R?$VFX@vmCk56p-k?9xp|f?*E#^}=;RE>TJB2<EJd
zk2`V3^s*p#!();xcVZ{{^{LvQ<T_+}2{#c=f+qa}&;bAS18q_i4f6sBxT;k@-8~wj
z^9UB{3GOvSsr>`}^^M&CwO7IYZhFjWSyL6;sBD3CG9|O&qtV)>c0Xb05y#-MKxk0w
zd)@(4`Bd%JL2sD$w&^6Uutok3z!3^R<2@_Nx{aeo`~0H4No?)p6Q3xKRD_AWt3eJu
z^97ybC1Nu?g{3zR>ZMGN-+hDF1`QvLK#%y0B7jU#qM|3&W+RfNt)f^35%s#`FLS~N
z$;R4gFa1WIy<2+v6wwZlBqXtO@`t_aMN6f9_6eJ;+dU^Ajj{{tjpt~fqcbrmn<bpP
z<4_Fcj8U}*M+&MdZg^e6cP&;VP{|lPEP^dY+iXn{n87j<qDs-~1#g`ae>3MTJHdV2
z<)UjltB`|FbXXSog9PT|CKEV9g#2dB`=b3bRH%Rz<ojMH@3}+xTU|6~<F*OQ!UGU5
zi+2WXUwhv}RWS$~F22lXk2r!2%kjRRj1(MS?PSYQDoVjhNHpD+#THXG@^OHJgIvC?
zkvjtvS)k!YQ7GDXMv5-N;l+zi)A2Di@;`Dn0c<ur-V+20*-lPO>x#-%!kj{RLR4eE
z+nzKC@U^ahc5wCe$Xq2*caF$_sn0H0UaM;gTXc7ZoQfz-bmM!k{czbw@1ZXP>3nn)
z8DxwO1olt4f~zh5FSFWC?SXBJ7n4R=_%oFbhwCoLrvq}C`_L3;0cZvGsfsxb{JFkn
z|B!x%@#<(?i0*10*X?bWS^UnOaS9L$_AXm9!Gl_zHJ$-H)Rnx?Tv8R22i)%@)7-2v
zd_gyNj7;x)Ec5S-o6~;rhFkr77Tm5Y5{hYLB33db1%QxPdLC$cA<FLtkeNEks1+NZ
zqRx6IRVyAWc|PmX$~lz8mY3IoW(QYx5elnID%M8GAQ2m`Mw4i?q^cvGCUW^!ElMXi
zmcyN+pB@&+Pt%9TbsT=S3mVI0%P$Hz+7!>vM>)+TbU4h2iYal4AtA;uoMcllYMZnU
zqmVAHQ2r?pbt~ucN91R=Nw;sR)I3po-PO+IHSUV8Cf+NOva-h+IEgvlHH#tRalVSk
z#=#k!{IVpDPRYdq1iT**opGvbNa&x%mnL#S!XB#J<SLBCRJ$A_`4s&9bT;^DLJ>r~
z>sxM<`dwD3U~ilSWO3;h&rW<nbYsjx^;N#*9PXBp5lI>3Ppb-GLjzuGb1pY=KNqCc
z#HPnv(|%4~LSvCb7WzXKP|lt7PKE99FM>=x!HI!IH7^4qq~$Rh?T$N}O^XY+Kl|W1
zZimyrM5XcSGl3Yb$Q_RVzAyeyr837AI>!<KW5CFPFa?D&6X8xEfti#Gn=MoU&KT}@
zp1V=CuoO;^GMT7SF4dn;>;AozZgr@`DxYYEEna7mI>@z`>VnTc1;4e`7(9IwTz3}i
zbUgbB|J@h>JO(UTWmy-2JEol2b3Jv``eSi(kikN%ss-6~gq%Uo+J2Sygg?;OBg;Ww
zvUN)nJHEj<^?<xQ=gTxs_frHDporCA#u`QNw6VsaIJqntcXNC9(-=bYf}bSIZ*}ky
z-H(eR8YGO_xN0<SKh7_%|9S6!m(2T&V6>Yby}R4L9hG5L6&50m$PrRd?6<6vo`k78
z!O{m7);AF(N!`(QTjwpYF5`EgHAp-&e)=s)HD$RMK+dPWFQ!3mH)bmx&1SZuN|OZ$
zcV%@LIiP(7+pS}z3PoBqG-`b)Rd$82glarQZ%23n66=(9Q9SLI>QR3R2}GXPOl0-P
z>wNx3!T~LlR;bhB!Oz{QjNqa0@s5-O#g<q{a2#7`^7Wh_)Kl8=2Rx-xtx?Nu36Z~N
zoDPwXc-{?k4b9hIQTW>uP9dfguB*1b`Eu}HFFNA6BS_)^cbR?BFw((RLq*{aWWLVe
z!7yI2P@6yq?vg9xwb5JtVW4T@XnZcr+RO8BEh~zR`vAqW&x19E=@4lUWLTU{nbeQ4
z@$k>%Fvnm<$|j;+@#Y~wq~|BeA=nN#6X%U%@g_nm0vmuppBge-svAAe6j&>?bEb3N
znaUQ#y`zOM=$vwJS)Tr78ew?c|E<v$pi3>ROdURk;59~_4xJ<(U$Q0e??W5&iOA%Z
zN8&y=ezDt^qp8DNG+u3a&MNK}nY)@hd+K3val*U~>tf%dxFOSuu``P~zrOz((&25b
z?rSrLEb+4XfkQU<T<`YD`X3#;;~wI1JrHlU!RXsN7LwKbD{bksw#z#9Ai92bn4j)q
zH_0sk?XHz;jOd=U=}sfW^U~P99Kj`Dg+BQA&8OAAWlY=iTSp4*ifWOH7z{e?hWpC6
zjI8(%{0ofYqkARlt33}%wrhTK6OkXj<K=%O`c{uc9QTE#Cd1Y`Rle1dg89?XV=UA0
z4Ux20KlhCwv`{D(_V}XR5*<2|cWhQ;cj(0OdA~V3cIbd~>*HsxrMBr$KxH90nl|OB
zWNd-4ymfJb{vZiHWQBk(H*J;vpxyK^ZzL!;k(`2&0HASsZ%5byX2SPn<1~|BR_0R(
z)gd9uU`6qa2%D!y#Ai>(adEjl1CF6F;1X2T%4|XqTI?gywl_D;SvXgZZZvEbA#8FR
zZ&GWtzB*@O5-pV=Q_4iz&iu8~`Ql#C-QZO-o~BKs0(aN$kZ6=4r^VUBtUFzB5ic8E
z?Y8aIXuIs%$7yXpgQsuXUgKsCjVyy}Q>Dv2uFjleW}PzWa2=&Sv%m=cbg^_BRk<B!
zM@*qZwvCeAR_2ylfA*{p{(Ty?MvTbRnXG@!J~H*+m8pF+kNt3*H~4C$zcDf(XRF<^
z?+(&M(gvZzDw(XZv_Uh_340meDn&c;gp77#tKn>y2J|UBEB3JVd^>OpU{+JH8j1U;
z%gtu);Fe=g9wa`T0$M&Uf!H0y8_uwl!2p{ad83aL+&><%1<9j6F==j@46Ge2f~Et%
zo@Irh5VdowFFSrqY=x#dOUQj4>lbOK^d5D0`eD}`v}x1C?)0NTgbr&ukf+eA&qL>S
zrBa>WFOVY#w1cpR5U<K>8o}GClS#l|ge4}O+?N`ENgPTzlEzM9tMp0eLv*(OMQ@#?
z=M=!_oFiT2qxneQwuG3xM5u;99y`Pl^yj<5)5Y^OgS&3xD5Rud3%h$JpKbg2qP>yu
zwVNtu<1Lb(NYdHU->-K^I_R)k=LAlcm$LqSUcxPH#7x^g5|y>olp|~5%ntlxDye;;
z`4kS%?Z-c|i4VvZW2DJHEIZ)?qO_bIG1=@P(m6rv_e*d0fs3cVnCP&p#Igw^xD5WG
zyo+~{9$C!~t19bp2yO7Gy@135WAd3P&XDk(OKh~&Jq)MD0=ejQerKSZI5L|kt-&In
zfe}75BynlDR`)lW;B5dO{}{Pcb`%yRjR>c@&k28CImmiBz|HtXBGL)AJbSEBFW2Xo
z{8w5(8YQ4=-u{-u>i4y1LWF;IVXg>t**rNZQEq6lS-`d705j}?P6Yl)IXSYINRxf|
z%|=h{SDTfB3fIOa$?_k9-H#DfD|Cskpylh-Ui#gYP?C%o(6DA*kj~0l$mx{&08xRX
zjc<2opU&mAtb&UO0g1*N-adnKu96+C-y!az1yY;n#F+P$Ck-hbTs^|kug1PMkzJ41
zCf-HZWA_4aW43QeeMZvF&9~<EODmi28xjkuyy=ZP9(KO(q;C2Zg|dS&9V=#8BJ<TR
z^Q#hs<Wov(+|v}-*Sj_AHH>{N4=k?t5R&^dsC4fioMV0Pi{Y(;l~uIl2n4l%_346U
zPi!MLVr%G~X?S9;I6<oj$-#jTa4%sP<>y7aWF`yeLFP8IF*_}?gZvZ{w6Tr8MN&9@
z_T6{a6yCP$d>W2X@dd$C8sHR>WC#*)CxtE!`KZHUVZ-+(A>?}32ZZNuH{|FLi(Skk
zv-+If#7WWRf1JfM<7i4mz?dVM8Vh6f?)p04jK!mbxbsgBr`S=;h5uH6YsROR31<~f
zhr-bi^AiG}V%67K4ziUi5zH&S@@hV~9P8T@BL2L;@T*@N80IwILc~0^I@oO5f)Ea}
zWwp1oeuy(c!|A-wb^LrVG(4KOh4&e!Ov7w`aoH+*bJ9SY>a$7I6AYW%D$l&z!rGWy
ze)0%s1;vVdTI0!Cc8lo&0?@*`?M&^NTvCY=f`gv&>8WAmW#$-kUl21}I(Hl@PrOim
zX}I1n>A~)nWK2}gD|e6~Hh`XDb2DKKup0Z_BuvnAwI<2z@f7cu{uhvyC|MZJD_aua
zZh$gJjrJ*yHr&BTTPUK4kDmJ|k+MLqw<s9Cy~1F(q48>PLFjErWIQkdh8p<}qmS`5
zAO(vl6+sU}EOLiOlf{roY}7Cx#LCOZ0>CC8D?^!+ont-~z&Jr3T-{=i+Mahbj}!(5
z$>jI%ejlj!@_BZ1hfLKNedNX$AFZtU`+aH%3rglMa|4k0Zqkn}3B(gc7_dNoXVn2y
z={!Ggw(q1UnSiGQFUel8Dj*XaNMH5|QDc9x+(o*wIDc2t)+OXM?w~=U@xh?S&ssmz
zK2V_dnNEL&J6$OrRiRsRC>~xq+U?Xlqe6)>b&@{CV*APRhQI|=p0vmLQ##$*ZaaLZ
zJIgzA*GHe614#y6Buv=qG!;{agx!5A7Do&G1ZN6`;!L!?w9Svuue_1x5wIJ8me*U2
zcm$3JbgCK46L`&)${0nQ7i%j4ga~?$iKILqLqTFlAlsbxMgOZQy-e(i2&zNwWnOFn
z#Fs~LjtQ9uiw$#?qNm|_$bpl$zi_DHFpqP9ZlUKf_T2@e-#>rc9kXT5F6$-Eapz#n
z$rYnW_h{%dP2%^7-W@Q^ra5J0MY2Wr!1Jj4*G@m^sf`8ts3J&6_NQD3R-p!J=doEg
z5ubSL12BBJXofX-vhUMlFrZDiaWE{)A?wGZfrq-@IeoY44rLAsBda1bIvgSwwO}8r
zUto|YX{r8tgJ{n9+GcGWvGrl0IlgdSaG5L#cXtDjF^nH*T8oO-exuK|7=FssGDb0k
zGQ}8SgBR<&WAj&HCtj<YcC(&nreh>=mNNwDuR*1#QFM~griPPcHo8?mZP+p)N4MzY
z4=5Q3&jlz;xR@&F4#aXALd{gFiph%0R}1z6xm_zrTsA)xdG+2Moryx@f>zt??@irS
zF)lk0uqlA$zwRDmo_Ksy|G?IbRNpBub@9S)lyEMed(mMTCXlxU2(56&(@g}uje3~k
zD}uWibvr+a546jUyJ;%CiecpqrBzI6#L<?~D%EFoddv%K)J?v<{kF5VRBO|Wr^7}x
z=FrX`^t+l7p~0zONZyokb8O<G<<rb0`in+LdG}F6Bj(mYbsl@fkt!H&(#iZl9-Yhl
ztT7rc>ldg<>02Bu8o$jl`-MH}QSi3$$28;JaPr)byCbrdX3IJhjPRuclt_<3PeDDd
zhgt-B#SY=gd+J;rYob-jVLpM-9rWT8^<PuZf@<%V7Whl6geyp$DyKH+jg(jl_KN2l
z&$#TFomb^GYI(^W_p~!Z*8=PCnfR<sQ*{}PlNd^+z$h)sIcO56qz0$;_AR70Zmi6$
zVuA6Bp>g&MMJb+V?Vg-9Eq&UmEFfdNLPCnKb}~OjCWu5*=F48UCeqOKPow6zODFWM
z-2Cg0@)C`fLhu;WHp#T~H9@NjjM?Ygb7b;E7f$lzKZiy|{?=J7=j7FWTQYs&H;=7b
zb^qIi#XfkJX)9^l?=I>5i{L#bRP#FdK7N4J4&`5@ky6wE^8^c{w7*bT2S|ieaJ>1Q
zWW4Z5DPcH72sUU;z;Vd5tFk>5GXvcK2)B@(MlL1Kx;c;K?(2eyRllI|bH620(t=o&
znhe<iS|j<B!@*B3Kqc6JZglbO9B%r$?hyG>!06QaV+=>Nkpc#7{4?Mewfg~Qx`1j4
z9J-zQR>t(VnBNoMK|eO<%TH|j$YMmY7j$TNDx`KocvCR}+%Q-g?cJ~tKF_Pow{Zkq
zCzmfdpp)%frP^BzCPWPm5o>zsb8_v=<9rFjS^~id;)6KsQ|?wrlH3##7#bdhZ9+r@
zb(#d2K~NENjDu6*6b?q60apC<bOj%?_Ml_n=*1qP80JYGjYld0urhk<)OtO+D3w%G
zpwFLs^=sYGo@9H|aLCo@sVOhC2!F&qJ4R?!W;^HlZ>VAaeu}Y364|@lm)}59<+eHh
z{^0JEFv1oZ)Gbf`W2kdxePi30O>UF4Db~>_oAj%oysx&^3E)TtH`Y@N5c}jcl0Rm9
z|Db$-rZDf>gxXvBsb*iK>Y)YhN8EOjZI@+ca9Mh95mkv4>f#OuWA`jJCGPfEt!XwJ
z1<p&2|HKn@66$jxo~ib6Nq#c0-w~_H7a$Lv*!xPO=j$wsqTtk~B<PA7XPOV;H(9q0
zBk&eQd7O9?8vECCh--fw5Kw=4bN;ev2CB)XfEmJfv3jt}9YnK@dM{gH<Gd)w1=`9@
zqeE7!I=AU@Dofw(TU>E^&C7{z9@6Y695PAaT%!fd@`vNBCwI<wj&!4uk;o{XQ4gX=
z5acv798A(tt=aK1Lk8{MN5%RN&Z5|kv7NKv(18ie@<np5vKA}^3^D&eeNN)zE^EV6
z%*}O@yUATu+Ppwx;=rH2-mn)Zigv>Y?qXeidz<7dFpU6&_)XZos)uz%(|vpb4Isb_
zSr3Xz+h?pB`=hT&u;tCpYlfiwPZD&bIO<C5PbZ+@2QQ;y5d!*M)zYk2G^9AfGC~xR
zPTElK*06wY8YXd_V5iyQj6BjsdzWPcfn_6qFaJi4q4G_}z!Kf?4FpMf)ozym_5x_}
z$#+^G%h^ToU6_}RMr8Z62S;A6<WQufXKc;D&oVYvZfsK(_B~$3LWsU972_MQ?R+Ge
zL_O8YJ>y?9+2PJ8iSclsguRFIRhTfo1zcxUX>!jelY*D6u*hc|)P9wR!C`^W<k1j!
z=IpNX$FevFZVI`{DLx<B=bkmxwnkJ5`VEI;g~4l#(jJWVX<=s7k&z;~)kb`5FL2PK
zic@SMH&w!sSLd!P$F$-*B`}_HC?4N<w<#445z=;KIV)XH1oC2nRPjFVq{~ofavyx7
z0Sm6zEk3GIkLL6)hS^W^+TCt_Ud>IeOYd)k>dz;oyuKfmXTAeRr!lv|Sq}Ac6w@w&
zGuWGUc2y8v?CYHViZC;-RQBvaaTTxAhQO0o?d#didJnfy-%Hx$4@&v8HqgInG$eos
zImKdF5(NVW4qo!Dt{>u$0@~a0e9DE`4=uSgHjyadu}e7YqK{TMQs`l+Vq3e5@ivC3
z4{NzeU(lG3a*9&&iJ)a^wXc<lvL2$e0FG$To=`RG0Z^F%x|V1u;03Wytn!MD`ec%e
zk7nr}veq0hXb(B})jkf(*8|dRTj>}JI;Vo*OO01dCMswDw^;vMT}mpdL6B@1wTSY#
zTm%lFa3==uiy#N#%FI7Y{JBEchLV=pr00N&@l&M)vr!npnIKt3g*zW`4+}G=p`cGn
z?)iM`nr?@J;}?X7hTV{UI?bg2PWzoI;73_y?W9P->x+3!QG1C}6c~k<tN<no-dmj4
zp#gWOJ^C7qe87+Llp&pHQ!i36Z7@U=_8}pe%V4f{s(HaX_j+YZCN{;I0;Wj)dtILG
zim$F}hFd;kz8?6C&fR%!HXhk8Twh6}Z5we&D>8vaq7waB)(|4{#gRLJAd;8Gz4^B2
z&#RT<grdZ3`_@e}11L8zY}>+6Yi9jd6o;5m1xrC+ocHKEU9Q0g0^|os&QPVSy&G<*
zlxGgJ?x&?*J??-rDm{(=oW|u*QXqmFrios)1C2d`$%mp*hdqpT57o!d;CF<0)LkN@
zQ+)R%RWvVW0}o+mJ+a7rx%wnfS&ZG6R`^Bri`Yb>Ze(TSyWWSdzQIW@;*m+@bFJ9o
zqoJhjfpmKxoIcmO>w@EtLsUhn#s9M8G%in8D<5{t?D0?6zGEZWEtckL7Y^i!zjIz*
zDd#@h#qonL`y^fyek-&90&UsJkc3!L-hHD%g$PZnY4d%m?PvIH2@%8<f3E54kh@w7
za-hM)<n(8K=X4)&Ls@usr(Gh~0@JMsoPkJxmMi{@x7S@~UFT4Ls4IJlr%b`cr~79|
zA2W}%4RDi)=jEsd&T8f4_|@(Ohf4ldSE$LO%7l+9^}+5G%`Yr5go($;#(~uv2D9Z#
zL=M<1;Y0ec6#w^=PrS9P2DG&Bl)+EXP7s4#ir(|mf2)~kNZ3U_!uiHu(y1-Cvk$aG
zhUl_kiV^6ls65(wE6B_{O$q#Ua!)yc4h}zMt%dbACTh6H3R9iwHIhvB!bW2SZw5l=
z>*pM(qL_~EIu@Q)Yv8Q#?6KfeL&65`h|+2Is4O+wdMh%d%FC7z#gCxtl8KY$&mq$^
z!^^@Vh`=-FBK6^SkqMj8wjx90k^hVUquc&{S%qpbIzgxNo!|)%knc<L)F(nWkE7!o
z+=@E0<M%HM@W8?I9wKv^6iUjk@ziU#I7GdBKYC@lKZtFyCd$nI!l0zpmztjjfjOrz
z7DT9W7Z({^UrNf?FlM{bEd(ALRr$dhbMvKSE8yEckV^4{2X%z_d3dg~&)pcvNB!GH
z)$bo-=`ga86GqY17An=mo_!oScO<zK{KIbr{0Y&fJS%CV!M56q*CV0{1d$NoC837P
zP|6BC&Lj@qgIIkmCgCP-#-OLZCD`fDWpM&ic#|SP1s8lYjR<e-a4y?)8-g-hd+LaT
zDFIId0txF*Qxurq7QIXvfy5*Ig8dZ>Oku@FNwc5>EXc)FaCqdzBKx>J|2cC$3tK@;
zA#y0RbJQ#~hm309V7(hCY!FgFDbdO4vl@&9Ymr=sb8;TzzQp0Ybo5qL%&3v<27>C0
zf-3cEq=tX^)?f|a@dl)-pA91=!%4(E95JML+NG2@A7puux!?iCd#K|;lfWeMCIT?Z
zL!*RH3YxZc1<@9`o&ps<BJt~phG^$ct#i@j2|Cx^JSd&Ms^alpsF}9IU3t)mw91BT
z)5{aFVqc65v}tVS&i-7ui3OQtMnSb0m8g-jV+mip%r0PXNXkp0hB4upOM0wxQS$IO
zA=F#o>_a0wHplQ02gOsB8q;jNX-7qfIDRC2jr%q^&NhzP=F!dT?4bMTEaB1+N-zwv
zN08i=jA@A+65f*wa4YIMuf%1h|FZ76-7`3GCU^*q2v*CMB9j8T8=*x0IKIx}`ijGJ
z&1yj_k|sJ%v5*!S_YDnlOt?c4^GD#5-H5P0WkKwvUfd_zj+U3Z*C-213oN5~8cA8$
z1WXdsWbB&bjN&P`#QkwsLNqgQDlg!~dH~r`<4?#<C)tWk&tw}TQiB}DdrWsdJ?8%?
zuECp}5{>mZcMe8I%yFh5Rdx}s`tItrfZ!NGQ7RHq``f;YxnPf9qx{O}?CqBQSs}ll
zOapTo(z5N3Ef$kj3{DOumffzk)7_Xa7Sr4)U=^V_7_Pau$x0ir+_phG6PHaIp_^*U
z>5uZVzT-Q*&m|@sOXEjlFhn@X;Pc0Xf0aUELf4{!Smrp!n1~t@qzzB$`c=XEI$}&c
ztb%J{yw#w$UNyf=49`OM?f#MZIkwk!p<9E$6cHpa1XZB_$X76#bVIp5`)uRq$Siw^
zGJXAQP}OvhqL?aTQiUndiJH}{*t^VvJ;<z~m-l0P|EFrxR925p#`A_gI+j(zg#hUt
zgi0WC7XYFx_jYxUV6H7n*UvDEPtlw_fTkO?+7*Y`M5US_t1UOQdNCVlYe_$kwJEXu
zLV<?O?;oItUgE7~QrB#SsmX&_j9!gh8PLV6>8D`^g?}x&DVLuj-q5#q*Y;(9j`D0B
zK<}hg-0*X)?=EFEGo|n!%Dm3?54Bh}DlHOer!7G0%qrE3GUbd}_DTyPvu+JpL7=<y
zow&UsVYaIdE9Fp;RW>M`6C$-A8mL#6!BbU?<K1=64<1%W$>9Fi73GH_5kLqfVf`^I
z{Dagej+7fuN*i?;QY%3!-r8wEcN@3qiJHL#i4Y~c$&rP#jWqNI55!NdP20~R)Si0+
z#@OMRD8+eI;Svd@hK9R<p8!fDBsoy^znrneMup<Mz@N{@j%EL`m&w{A;gM4?q0uga
z{VqgKGmt%Hq|(&&6oC=ZYiH`D4>%Wt<`6txc6C7b$w|CGI~Hyr;G<8g7}zT`QHe)p
z{obV1Hk)+w9)svhCRRC5*<~ES*VR=ymif>pfUXjyq|g0Bi(NTQC@ik@Pv17Bn@6Ib
z7BdZJyP^XcO#?JG<h^H0WV2sNQYr)a3sQ^bqZXM8eC)i(Q`2=x)+G@W8q-`rM9>Qh
zyVyWnBupOy2K@DhWQd8!Bm?n%@sC@yNrgHw0-P8bqC;D{_V$*IXQV!cneQ*SbYHPj
zRBVSmuNqp)eJHf~T$W~vTb{Uj9h=8Yn=7>*+d8n0BbrGi_1rFkh+U6(5oaJnw*f~8
zUF;ch9l%~{srCJ!S9iK^sq}@tsk;~Xg-l3^LYSMC<{#ckJlEb+^MgY$_wxu~!ddK~
z;X&nVFYJ;OhL$(JHn{IM>a+79_4Pu-lEwDu_!P|yQQWOs>$u1)-tx@Oy}{{so?Lpn
zRq^<gpU14(B%O;o`Hs@z96c^#{)v)he&EW@@_}EDUu-7ZSHvvOm<O%$9C%6CpRjB@
zJ87AeT7;hiqt-Yv55`;@YTd3)mWhmY-mW#G_d;u0M9TL}63(er2@pwEE@W`xJ;C2a
zf2*;TsGnW9ap2^iDd}-oOgK(p(=I0sOAgdKjk)4bp#BLtFn@9ymyC~1Ha<@3Et#I@
z19t{(1)dg;Ck;|jVT&O}WQnU7@TVdXOZdAZ5l&7Qn7>`u@ASp|*Gh49Kdg(gSgyHr
zeGNr{V#$$olfr&O{^DMPF;=U4^p5ifj>3|+Oeqjf3^99ptuwFX+CJ-x<k$2;*CG-U
zUK;Hrjh)zDT}wfK$AHi{3euss!UKLbp8Op6&EUX*25<{Obrv}|<0(BXee=yHo%h|6
zrfAIHzinJ0%eJYJLWghJOIRIsu{>6Oea>Se{e;h&`%rpF0Kki1@5lF49LAbp8fmmE
z`&2x0QtVMzHpO@27K6pE_;>|KpVArhhmAe1%pp44&F7>s<`w;wnqO9H7upES|NNy_
zw0nOXfV#ii?~mE>+-nzJhoY11^cC3$$kJ`T+g0z~A4j2=gsNV)tCR0f7E_m~g3&AE
zv`t6t(ECRK3+>~j(%ri_#WDpiY&KupQ3W#tBm8ER!cXM{YG%y@qwbrUpAsyJu^1?`
z>K<!Rdv2yo04SQ^EX2f!vGHoOnLiXFy109o(;0kehQ>X5pjeNqG2XIR<Rv038dg8a
z5ku<JJ~-TVgFHThTBOtu3Zxy6?J5~u{TGV!7Xi@TjPk=SzFW+3yiI+R%^e;U8LBG+
zlo8)=$mz6|O3*kdUmerW_xvWd^S1n?2(mEaKCcCN!RQ#@jUHjiFoxlT`b`<&MFL`1
z;)Q1d6QIK3WDrL3TEGfxPP75E^{N^8p%ppnWj8?=9fb64cW8h73Fg^t5(1*0%}WJ`
z!g@GU&>5()l*ZU>9Jdpq_})~j5_mg{MPfiOjHx0Lt7`~W8x#b;P{E0pVU14wO>gAO
z!ZsNi&hF&fZcl`?vTldBCi=DeCyiEIC_S{BJ`q#jsC#!UaP22%2qlSHKd79RKry#o
zy&PH!4TWvmQ55vSrHgLRH#Tc!9&<p}l(pZCE(-He%|U;b^t}J72k=AFn}(c*%ohu*
zSf1oR$j_AO$=SPaJ2%Q8!_?g5PYqr9Nf1o|BgsohKB)AQ8=}zFf7bs!w))-R{oK|a
zHRow)f2W`jlZnD!B_^f0chaeeC`siYis*((#Q=?j9#r}$?qYCIOgQHKlaug_NIj2L
z+<H;-_0R7OSf?f%)tHnZ6<lGm^+b!|WryVOr;poTgrV_GU_B*=@<rO3d=30;y~SJd
z?C?vop-)W-L<gmd5_vK}nKvt?B!BTll)h6g1&~w8Mh;OsoyBp%iv+!VF*cWTPo8Yk
zVi(IA8``5^Cg;Ffa^bP$wvNW4Qt4*(TYWDlq6&@mNaMOLlmnO~lLqKNXPmS~BKGB!
z6XL0x#JwlNM~Dv5JyrPqx+*gyj<miveP>)HFMvr1(iJw0@<Mb9;$g6|&YH?-F)5GM
zNUNR3{NzE)BjTGc5RhksGyFsM`ob@a{dk_}X(OdS$C@^NU@lTBVH}@=4G*fXfUTEY
zSgAQn&_%3hjT%*PKsUg~rJ%TFAtg*x_c?SkysFCan<Ze1?uFNM<F${_&1tssg{EBn
z?bw!7V@M2Mc8GDsN5<%ptfZ^AT<=Hm!x+mP`%7mqX?PWm@Z-}-?Q^fG=ISipS~(F9
zuiGgre5!F@e{TBP&2@JV>NP^f$6sI+O7J|dJ>%7-@{L){L*3VZcpvUg`|qOL9VkTI
zhgLX!V%tSD$}hTf+WNTD*EL!1$xcf<TkoAt_AjB=-f>i~1tQVHfjEE-(&=XK@=a>@
z1I>c>Z!OlJ_=4t7-0m-l4!-KX*{Y~#Ejruv0*M7MzsFpIStx(u5xO1r8?Tp<arJal
zNz^D@gc2y$q=Xw+79(mgs~F`(S63>f)o0$9&k=%AMzNU26}}Z+>UN*qbCJb8CE!2I
zdg+w&G3|c!l;jo#=Q{+{Ez>5zh>tEJl1hYeVXQnzYI+cw!pV~p>XC!-tnd>S6tgu?
zcBg=>X1Xoa<FNjH4)1Mm=gjST4SNy-Dj*S^NsEx3yWydUy|L<!{v>!m(9<uanV`RU
z)Z8j}`{ZqYpYVQ|>p7{@tW{5iws~OxL_qM`)|+yN`d`(J90cIza%7&?DCpwsZ+7>#
zmqw>bCR2eunSE)5_Q{4;MX51Q;3z35&fwvbq<F(`b3fwo?zk_z7Q*WyZESOlaqMy2
z;6Fd&(IZm)V9nwb*ggSS3Pu{#m;M*j0@Lq?{5EU=KKv+#<Vh-}AgH*VOMf!Zkw|FH
z1mXnL`3WwVB#~l=vf>SK+$4WKeEV}bAWtFANq6-Ei%O?jCAN)53iU}_748^ADqWd-
zH@WlAb@^9?lr0}7Rb78~hgc22lzLZ%kK1(@360%qD0Y6$2IRjX!v~OVtzD8JzyW=a
zzVawuPVFv3T8dKjH&4F>jD~r{JLL4bhWaVC!)Vxe9{#M+<a-1qECWdrb*{U3<kJ`P
zGaij*;C3|GpQOj<HHM9$!II@p;}Vdyxt0rhXt)UeB4&JYQViBKEY_a<X*z1~Mp3Y*
zm|hSpLJ)hJ=QvFWl%Vg8i$;W+FZ+65D;Yc%-k)=AE*4MD8djv3GXmFq-52*udLBfO
zPx-YM5y20#{x99te;+e*l)B}i=R*WvONrMmqu-CaT1g+(uUFvT_@9_FJ~bmBf9fld
zSDL_NC9lG|fmNGc*?gV28C8umEUT)-9-gA+D_)CZy0;mPQP`lxMHwMIorN9fsXnTx
zj6z`$9MUg084~XMEBu095gq4<W}jO0%VMsJj#%v+F$L&lc6Nh54`}t3(9v<M$L1^%
z9XVG^WR3b>(Ol=wwss%5y5ZL5R5zyVjHKSV!Y!8H6zQ~YvlTSCCJKA#7aJkVKWl|w
zAYgMU)6?dpDxj0cedl8^e;{?BMnWX6&1*&RP0DV%>w#it78W@|25Cx!4+&4R`z||W
zR<Q~3iyA1K0^Mz8cT47YXcP`ThxYlvq%v81UVgdni-Hx|XiVZQqMl}C(G%tJU`^2g
zSc%FF-$k76`-%qy4jhwXqxf1HY=W3LYxh{nsWo!#zg$YyD>v7pEUUFZ->7#k<jvA>
zqP3DZOP%77Ae-e<R<9bw*{p{H8rL^AVrS7qG1+zziFoV1-)fCEdwrHUOWC_e-eL}M
zSiY4`MOnQr@<yFJ-TrqY_#%PcQo*hI`iGvpLoC;u?}FYJ+nV#q*}nEr2_FDzUKsPV
z_r>^t?`ggHC-z|~&rK7vgo{m?#{@aA&*7yV0MSJ5f>;cfObDB*V)LN&epy3O1acaB
z>Ey>elX-6QqTjv1!A57!b5(H%$@^txZPn7NY2ab|2VVkM<Dja5t?hblaa!2QXEqV>
zS=<Q%(hZd<|A6zsvJk2D;{7^34h4a_q_3-@tIe6miX7Ta>)*x&_Qw}8pQQYDI+^5s
z7Ej`!zg9?U%O@-@j<Km2m4(Hd6>E#OUxi~4WxCCSk#I8^f8(7|8etL-;<Vt8BFB@u
zAy3`7(2u3ps3x-y1iMxMsqeN4rB+#c;OE_&@zB0|Uz+}K_S7h{VN9H=ehZBjSHIFH
zmi?xCVJqPZrW`%no42kcnj+5r0hV`n<G-<+_6X(^i<K>Mjq^zXHuaAQt5w7U4ktKV
z^9BhQ5)CIDD|e|KX49t#GKHLj=jQVG#xh0P<;G;^es>${#RZ*qcW%2O%TUO~=b4mS
z@Z69w-~Zn~KA~t11;vrWpIC*qzCUB}Z6T3NBH)N76m{*!G196KU_}LdsmG#U=BmWX
z%0rUE4#QFj5^gfgH3$`%$kR3I-?w;eSO~`n+Npotj%>yRG}If+$j;2%(=*gxVs8On
zEQ64kP-c@t5n@5zJs9N7=<u2~BZyu_U5(*NWrZztfO`Y&Ce6m_Is8X+NAxIQ;#ANA
zLJx9~R`@T6U^5FmbRnna>ufH)!)kXya2{WIJ3Wm&Mm}V^ie*Kxgx*b*MLho;1g+j*
zQ-R+{2%M5I;m2;Y+>_2p@J&HY0b^ty<F$W5AwOQivpf5lvCHgj?HZCaiu5UhF)WMW
z=Q1Y|PZRw<s+fghF%=n!41brOsfd5F%nCe!evnp{h^QZpBYA2T+!=sF8gjIq)Z1hz
zZYxXh0z=-4OEh(1lF2i=;<3x#PMoiV<Jja|q&GPjX%^pF$H~*8#$2or6Og6d`pqy+
zi%c<#GmO`~nFR0!f_-;S_~?{MEJ-}m1CI?}R(mU3kRwu<RYHdak@PWCRks?%xKob@
zKWssS6zK+u@z6K-3&ekI4vp3PPMVBFPd6uJ2%A!*Y~0mm&OA(-7kC&5zr|_jkZv?#
zeP?dXG`?Vj&V(k+<b@i%`+Y1(EMST36<chOBOZb_XF&E*Oon@b{g_mx=1luNe;n7<
zP3lz*RL+x>g36JFN4RtTK^yXv(9^2(3m0Nl!}*WzUyT2XW$;SXau!2T^WzHCOI$(#
zyw$2+DL7Tm)TS1t#w_)SZZA52ZMA+lovhf$M;U8Llx0|jWyJ7F2sxIz4F(!@Lbkrg
z{v|RZP;Ggri_8Jl(fdTCjpfXvYS9e_hQ;+CarM;@=XVfHXsoSTg#da<QJg5cdX2L~
z+3~cTP7j~zcrK04bH@9U=v^Scl!Ak-HSVAXih$3?HryvPCYGMR-~Ps{Y+mie^G!FH
zj;7Q}a~QPPuHv}K1}6&0I=<a5NW^uC9lS8Ytdn!)81Sqk&R~8h6&PIPweEW>Y~CIN
z(e0>u2MIm3ja;+pHLLE2f6&akRjRgKtcFD>?|*xEJyZWnq!RK)OWmK6iv)|9_KQ+L
zl7($zNmQQy(f0&a6)~hD7~!YsW2n6WwqmAdrxRsaz-jhbnbMLBdWpn68t2K784t=2
zHMs+fW%TNz7Z|stt}-*JN<P<;&Y%KLvf={M{hIU6y{Ny{myF5fl6yir5C8?oE_3qw
zBBR<Q;xMhK%P^e*_S!0U4ieJBUq);S4+EM!{rflSj7E)Qf4a7#tH>U6O^`n#t?BMu
z5*8!LB<GtJhtO!w!YwHfarK=ld?me|^(mI;aI6V=e>Lok7vr<DvP`xu4o+f0Ks_zY
z!LPzH3p!Q*=3L$?cVa&!RNR)-{hQ)!fMFh=J}XfLbjGVoDDe0sq*?1?F41JVh&y3z
z^R)l}i6z{Sn@rGI7Q^FQ2565el0tYhLEhK#m?l!YNMM9kye<UwL>O_z@?J8zIwRqT
zG3Lp3;T*KQJWeDlyd)N^$Y6_Y&8L7Lj^eU*A|$h-`~Bf5gV1(dW+0j@izC=9$;ZzS
zDm5U)h<pOFHu<eAgI}U?b(sZs<G5;yGK)+z_UL0~q{6OtFBmx=*p7OY!j;CJD{7R%
zQwX0i;7K`bWn#l{POllfB7DAzLUqzgxA_48F?R)$pQFFR2n^&{k+8@%q+>t^1U8bZ
z5LpnJTZqgZmy|KZ89I2hM^O+p_oZKuVA!^qSG^k1Szp_{tfg7E{sV}!+}81E?ua)$
zWgd-bv`8-9Qd3WfzOn-oQLd8TGP$ZH)TcQuy$;4;;OGCv)>}q30k{9d(p}O$22#=;
z5(5UK$LJy5B?6*!cOxlCj~EToASodYjub^&N<q4x-M{lZ=O6d;inABI**VvD*Y(NV
zVOE!}4DMe#vM8)e4mc`V=@0r}Lhjf#x8D@+w5I{n+<-VH&ARXQXC6*tb}t^3s8Z<H
zf4)s9dTQGD&3bbL**b#H9tX+3*zSF-DD%{m_rCw6ji)9`SmLS!sEbZwtax&Fr|43*
zS8Ze>x-ecxwh%MfdtD1_8S0eU&W&IzRgD)5<9?ppYeLIIvvzC4=C%OX8^9%Ia6@26
zVw~#BG{5v-GX~w;X4Mf=4|$e-SNf{r@~J(TK4nzmN21fW+=#0W=L4szOz#=0Re7q=
zp~x=IzW|u(yI=;<g%=!2z=c^OCig#rn-zsjSx=m;_U+g4{bmDC-fO9_nm8hg@-D}o
z1O1H{mofWM&;T8mymVE?;-*ecSoC?&h=bz6d}Pp$QgOaq1x4>t$|=4&{d7#f>Tpt_
z-1r|f2RmKVj!bjGZu<IqVHa<7g0O7rV0@RRc2>14IbuesWdJS_0I!~46}P{EswvS)
zE}A!Ws!fO=0IbvQY%S1XR|<5dZLjhv_>o4;yP`+`RK^SB0t)bFZ4jk*JhUem>c=;y
z-Zj2{WrUrZyM8$5)1LagLMSFIs&-ZxgqQ&-rF89+8=?6tDX$HsrPfQ8tMH9O4SOGR
zJDc^@ILG9E+rqzM#zbI*B9r1vVit}j=%S?oFLl#62^im<F}H5?k9xJT*YT1cdq5ve
zN$B?MPc@~-*&m<2*pg&<B&viqgqnn7Ge{LJxs*U@al)NUt^w>RW07o;U-rC#)XAu5
zvficoX2*&|JGLVRY{Jv2r^D}Fk+Up?YJCvA-5#(jI<(0PfW)VDf}9lhi!7u$_@L2&
zI42xF<J#6Xj*<DYA89M2Uoq~1NV2|Cs$gm}I&$cFdRl%W`blmvVU1+TCgqb*a$j5j
zwEj&yX2pG+R6Zi+T40xLS1R~l<T_e-n8fICYE<J#U)h-9tkaM1Y99?qqMskX2&zEz
zNi)4Y%!a{o6fuYV5*PQ?B)wmKrFf0agrU~=zy1_Z|Igz<Ww~|rv|uhg4Twdpai>J{
zHH%71F-a+idh{ti4U2MtRm8=0k}XV|c9hD^ix``v#at>2Aa)NcE&Eim9goR@pg9<O
z+rt{NDvI4V8<$mJeO?Lo)RCI~S2=U|ff=(lO@&8-8Uw>s0`{6h>zS{Xp8twx642bF
z5r5tH^MS29{@of>HdtNv7ZaG6=Zn8av*MURyg?<MGU!IHToF{PctjZ9i+X71eMShL
zVw}o1CW))!<5rlxD^4bEd3H>_fzO!Ahy9J7478!o`@ZM4RF?{$m5Gygo0>55gX?0B
zKThT-gog}W*!fCj*glO*HbaNN2N&a{(#h0*Z=SoPAU=h$gToN>UP-|DS3L0F+Jj1>
z3SsIFVPOqjGo}ZX-s7F+V{HI9;W!cy?pu|sAeoArAj8C;EogefQ>U#8wYw+p-ui)!
z3GAu=63Tvb7F{IQ>8!IM|M~JaD<M_*nelc=KI~P{ef6Vo6t1uEVwA1BR@#hJ0aPn7
z)5f+6!esy@w?#Ncd<3>ECSNKvc+MK9;R`pEE)XK(*dk=zyyz70_p?*##4-td{Lzof
z^Cq`;q|yF#p>b_m#Cp3+xFW9jzVvCB`!JbvMlti^;Fq|n($W>t1e%;5c^ZbTyXm@v
zdS%FPs&zX2&mt6Bs)^QI)^<|%$)2>1!t!G)IS2o`$V>>?%`Sn#0NqM<tUc9Z7V2rl
zvxI?4^>nZU6+H!{mi}1T@ckTWs6&tT1M3wR<g7P{Wzd`63THLP*-y>P-3r4Ki(o`X
z%svM~6<Z~S21nVq8Em6aDrqX_98-=rs?0t|F4g(Hsm?VR$HgS_rPxm}T{eXr>hM15
z=Sv2L5&!ZgD@*_wFVvYc{`5bnm8Ih6HU*fy{j*Ac@m!lt0k-aNI=g&q>raX(iqpUX
z554a&)L0Ymhgm13Rbh)`VH1J(2pUT5=l}z$p)#O{ooaX0=)lHps8rycQl)f|-f6gD
z-4oA(_Xmjy{sJDnKj?g9d8rb)xKlLBO(G-a>C;%5i~@><59NJVV78#v)B(awV(-3;
z#*-^g_gqzWIMgB|-URgd<lp{>^+_LZLw40F*B{69|FB+g@9UUnjh`Fy&99J_ovj3H
z)j+aua$1SJs2zoCNcaE%Ty=Mmz~8?b+=6{f+5)2~jLD_Rg~0F-a<gA^oZ=Mml&~oG
zHE*CBbtBHtn(=1q+;cS&npoz*0-<uHvG-LCuZ9wZl(tM*kJQvr3a4GYDA1C-TaU|r
z8JZ`;*+2aV4WbE*bQ^mq@x2?DlbmOK$1smDzEg%=CRJ<5fidvSpWnhN=99>T9Z4P(
zR5ibA$!RgA^1Wxtv&q6W+5hAEKyM7<=tp5T@$^}8k|HJsMmRn)Spx{K0bRtrHKu;T
zW=`iqHJTMT)_}?q^jK2H5QRCrT~Tvm1Y7i0o-v_(!c^389J+vTr<fqv+Zm64;^*s^
zmnl!KWM{n|+vO%GYRtIcEV%u*FS1;77o<UEnj`jLEUNx*Aw(<BtJD-fm<4p@$n%_k
z`Dn^$KIp6UB<s07x_K<#XYIKU9nKQLa7uwvQkJzwy|~I=*OW<onP$awEI#^&pkb5+
zCrxir@rkxs2T;qR^NWj>V)>zwMnfr#|25MW;{E{5?WnbXao+T>t)}3XluqpT_ML>^
zZ5`Kk8Ed5O-^f`Uueho*h;cjhY?A$r`po-V=NJ8BxjKe2Ga3qkiTyCLN^A7`jKg@d
z*RXA&8FrAjSpNZN`%U&Qx8Y*Q7b~AS<|_(?XW}#}L_9yB+!G<&)a>v&<+*CzLt)O?
z+NF>h64iJvmo6dJHflh0MnlJZx0hDua$ptfom!fT_Q%@Z`~0&0ja;=SEVrx21hUvm
z>O5c#zPUOOr^<X>@U^ABt+*O#@O5U{<$yeLiJ0{~jQPF;i#_~$8}=+MyvDyjusRdH
zFANOhcq*<`ql>U(-K6bWe2-Z(CMDO=&9fvKJgE+%2q3*^P!%MPAZvvXhlnt5lW@nF
z`<#IT8~GQyMYTiH+qvwPav_?0^^@7v^(;O2c^Thkw{(YN+kAO(;czEWR#Re&rG@lZ
za+wj~*4E)}VK&^c3ev}x)sxhLARCTUKviIPV2)ZjorZdtU*!mG`|cCEqpNm#4433(
zW1zfx?6}l)9Qwe@hEoz(;zO~DuL7>a4fFD55zV53>f^Ko%nOna4QmT2e`GxlZJFYn
z*vyy4GM^6~)b|GC%{D1(u<w5=a**a0IZ{Vg&eWuGrt;c?>n5OZA(wEzRDat6lX;By
zun1`#SGMHqj@Nf*WevMCAC5mzk+TMOOz*!YkHn+dc+~p9oF4Wi5X5^o&stAoEj$9j
z$za%tSee0#RMPjZPLu4t^)+7Nzs25V3jp|;@JRgqFgfBAqf5?~h$rIYwA^>Rt6)MC
ze3Cw|N?bguxcx;YDzUw8McC0WV(Zte_j()aU$ME8z4%et48G-D;od0C{#Fz))eN!d
zZ+~8MJZMZ~a?>1!RRVho@NlunVcahZz4Tw#8f)krheam<!V*E^;qI?>Tg|?|${CB4
z;X|!mpIoZ=Cqp&Rx%{yv>AGIu0~G={x{;;}A&r7H*@Spe#OKBhQInC9(|(C?hqF;O
z;KU9*h!$23+7MdaiP-twImN%@mA8Ko*nLvO6956x&g-POF09u5FW}+-g&|&O;!DlG
z&*SfLqo7jsQOXqz?)U(UIk8W)!j@d&028SY*b(`l$(k+6!(tuv-Y3i@mc*r6#mwg~
z0WJWjlUkKXY}OxG%rh|2{q4nT{LnT^ezKVLdzQ)TO>{_X$;I)J=|N}TlK3#oxE-;W
z>Dd^InIKY|Ch%FRiBp9uX$rwFuQl6}?F2zSciJ##&0PdQ0y)Znuu#mOt4H*Pr>4z>
zh60W?J<riWLIYN1(Iq4<G0~@0er(%BcDVZlD0XhwN<+QUX8lTu+>9N9p)V@bN~~p_
zJnU7~gD1?g;L1Q6nxMbu4f_=#HsW%>18!M=9wD4eoq0)PQ~EV3P=L8|5KlJ5^zCQu
za~RcGsRGegOf4D*93sp=sho&uScZ(-KBrF(@zRSTIxFQ&<y;9AAC+*7iB&M5AMidt
z<9$K^=60$8mC>$AEJbXxiWG=&wVhKOlknL^0>^-$1b#-fc9M~64Cd|xYY^g+{CJqg
z$4zydY%Z$U&iVy8)D7V++v4aU)$Cd75)^1pp>`Zv#Erf0oO4c}saCZvYX5$;$rK3B
z$^k}MDU7Aalr1Y2zuD;n>Bkl2uo=?cCS|7buy`{HTZgAPQfAsD`2!=+-Vre4UPEO}
zxc~1+D$P=rt)zT;IjtO7;{(TLoL19Qb9Yk{bMwXrBx|XUiAruQL<CxhGmx@tg_g`B
z^?83Mqvc$xD>ASpP#w8q%od{oNITEI90ts%D<tcS@v~yWI_uG8v?whYZ+TwXQAahU
zp2y!}ub@LhSK}|SN>?4*KmBlRS*uEVP+;JW5VC!xP7uBEEPpXETe{Iq!Z()k4kEd-
zlJu5dbs?nA{Oxb%cY7wmQUkl0`z{oD1NqaE-G${o@8IYz0i^zV_mnuSl!3IV@TP0<
z*w#0!2S5aMPA!jbwm>xG3i~zVd_xvvdcHAi0YRHhevq{=SEBT==NMJu4>QI#p7XGg
zB5A%hrqlBIu4PuA843(w8sbO&V55hCh8UFQZN|vY7ma<iD#@_WrUB#;tU0wO(Vq=-
zWP#*@l}iUsD3RP<vYjdZY3sJTav`-h2;BVZQW3cUeU0A1D(eVb((^SkTXv7>HQ8~O
zG3g_6RnjN+?5YIaN@eKT#7CR1Da)Kb2>e5$=AyxKwJAtvC`$2T(Tioau1a{{cIEz}
zmE?M;xhyOA{mXCSbv187_eY2m;<GZp%}-mox)PQE*YnHdUDIIrZf%E8d8*AoA2@2z
z6wmi79m^~2vG(nMe*F_C)}8x}2EV1*2PI5$Y*a6%EnVhv+nrR1f(xrY^KI1h-`Vkh
zwZaogeii8be_aGjVSjNkb(*(!J_wp&7vtksFoL3}xB)p&PQu5Y;m`m~AXvE01N_Us
z3IzXE+O~s!l7EEo3_?FC&dekW7M|hFiokoPCM!tUP130Nk?vFWAy-fEL(vQ+*^^ae
zrUB`#|5V>z6~#Qo{zmiI%E%L0>?45S$H~Wk(dbz~B_XHNzga-epGw6I$EQj8AU_$t
zj9sU0ufk6XAqn5DA?joYOl~$8zOlz{y3HmPvx#x#xa}x!qEIZiy1wbXHk;pHa^5GI
zQw6<*;&)qmsw<vDt5F)&;HK`o$9o?q6|hO3CUbcav-3>3V>}HNGZT@Yx7Q_0bp!~I
zpP_Yy$3j3!K2Yc~FZ!Y5ZS8y)z@FB*Q~tKX65b#y!_c@TJQ9u}2{R~$?Qw;3Qb;dY
ztV=_9T->2j1)k0OK_z<weqCG*7F9AFsD^&ym2mOBPflvgq=DKl<Sfb?p<M9*2%{VP
z7c8X_rz1Sx>4HdbNved^GjKPViKHTOh>coO;c8x$2q5xG!bpwbhc~c~&NA;NhPx>1
zYt8N=-1BW<4Rwc?l8*_%&4rAWAW4G~k4NQ(pB<}#ql@O_LI4|2L%F7D1PN>!EyMEG
zKVr)Oj5lB)ab`t=FWjh+h1A2`a!iQK$+SS{OHz*(`>CcvdB1u*WuZ_M@QBu=^<`ZO
zB6gIRN)jO@%QFxK=aVp29Z&&Aoh$FRH;Ot`29I%TevdYX@aN$LSrk>u`2^ODzl(hq
zKW8cLy|5sHfB=;fe^KC4Zw{OE`$L#BED9es1-0|%%ctqdO&8=(6_{N&#~%>a7`kB;
zC!>-xn@V|y=K~*JwfnfNPU_1up7_NEOieeZDP^*zzpg7k6-T!}2HZT0`c}EFT=Qz|
z((+?P29#2J$V-#A;S~4+^4-vQ&25Zu-^HM^!M|*UteVd59Io@R!)Yu%#bE0r@My8D
z?^iDNTg5zy1xn{e8g6m=em?^siYu%EKuxdG2jO9_JOX_t(^rYLyZ*t2#-qz9BEB@b
znyaVf>!&_RaUJwH^p21M7}ANu0VR-KaYba?m777yh?TK)$;6x^$f8IBwEXYjTvp{J
zDpHdK(frw2T;1gKS0Dd!D+52JTJF79nbCnDBjN|<!AWWKzEK@q0xOYDnth&F+%x!e
zR!6_}rF++_Rub2~)J?>nE}prVW~;%>4c9r}d8f2}?Ds_HM7H#R`H4|$d+Wpu9@MJ!
znlOcAs(tUH?>m#n`QSW{!?SrmVs$$WA<$HAaUIdyYQw@ZxE*N~S1jdl3T7p2pFTt8
zMa*af-3?4U5U?fjr&2iY83U_6K24AG|F!rh_q`*-KpX#aynCFUN<r)hwXc9*ldOb$
ze?=0n3p=af^Flpv{$XzuxMaV%d%cZ_aZ>NcUjeliMXpQrB+h1q;_x!S5<#O@wBb10
znpLxNhW<?gtC!h-!k*-yw|}x!mXlYQoI1@fY@h)y$-i@3sTlbpLpt%X%bvIuYws(&
zv&K5j`gkP<pIdM=g)7n{lkvqUKXOj$6H;;cd{Ml=2~k|O(syg9H(VT(Kqvho2`0Wv
ziLqSd%LQ^D`adeO8{i+Gx`x0kz)|7J^PbpM>Yf*KmXAuv!u>x=&3vN&Vv25;2V-h4
zhKIgY)I<}HG2DFWGbN>KP}0^D_vkTT1HKbe4Sy`W|5u2PQ<qx!si2}{@;U_4f#iYj
z3!3p$<3DA$I<p9UACY193rpSe1jU2&MyyKv&FwEKa%xiu!-U@^QixP!8WGyWzO57h
zkZ458ifclzsnn%aLRq}R@2!xPCG)@gLSlij_3M<w-y0<j9<kqt=CUJ5OL3>tCiB2<
zLGHJD-?p+z4|q2ywXA1Dpw2lw;gPw#lsgl8-@mW-(ONRhDy;|^Xp_5G4R|_Fg8D!l
zAR*#(L2pYq*T9awK6NoRXmUVC>GK;<uzOa?_R@koj4cV@9%uqs0bIyPIX>sSExj($
z4PPPWoOr&;D(Q_*CE0HM<I?I*>)AGV*UmXlfFR;oVlc#uYB~FeMB1rcQQU2RW~|XH
zgKqqcWv``}%EY)?$Hv)ZYjpVBVC32>0M(D`*ce{wmri2MBy`@c0xKF#=GpF&oL-!)
zJw95mITo6cu*QmL;+6O8txY+zyjyv%ZB}>vb*$xJCVH{UIa?MnsJtM$sau$Icn2kt
zuRu)m?zNEOT?|kf@KTvuHd?_<>b}F$_zX2@qr<~9%*_nQwkk1i<M=`|=tY}eXd&x}
z5oNh@K3WD|V&*Mp*1g7^g6K5A12Zv;$$(B4J6pCHT{$w$TXH&XKwp6jN8{^T2#-*P
z!0YDVH|#BVgt6K%Q@2&m3Cwe2M2}FO1KB_cJLJctXe|r2ZL;;xV@4d(DpnS6TRpw+
z4j%*arCzo3gmD@tHG3{P-;wJw8a4#4rLDZy+ahxB$Ifm{_*?ld{_KW)UHIh%@;3aa
z-Kc<wDtF}KqqFT`b6hF0>pn8@Ukl+vLi<w1o0kZ_h#ZY}tGr9{9m2Aw91w8FF^wOV
zb@)a?P^Edfa0*ONIJ*|De|Xu!Y2#{qWi=?j>@L=OLJ{&b`bwLjO(abr0$i2hTyUI6
z5{@rI^3>sB!V_o{{hav+G|dO)f*hZlU>3re%&g`Xq<8K9nG+`PfH)OA@=x{C8oBxW
z-|N2|di|dpdig~#;bhToPuMcP9Tc95NqkOliOt}I74{xNvEN2+5S<w2R4hT%n-vin
zW(WGJG$vH=>7<Mp7=4i{64%#H`nDscZ#^f&rG}{OzKjLf7brbDr`*7zCpyK9zW*t5
zKD(=WtDPN!Vrjys$+56^3HTP{zets>kYGO&d>l~>U}eYrjpYO0-crvpN@zCXCWMxY
zJ#W_`FK~V4+)ZUFN{#1&Wq_F~bmb^%?n+R8ssEaI&i=gcUaaQ!disxgGqf}cE`qRA
z5)bRdz`+6eg|D=3V-|HA$h$O`4U5bt1m1M!p{%iY8bx1n5Juu&rs7tRF^SC;ove1e
z*51OZH=N9)&3e32%kodi(C3Px-J8-mTq(^kRnrRlbe9rrr@@9|2~fD^JG6MzlWGiT
zTaO~a=fclWPoRP4iVQb;4AQE;P4y=l!m*9QHn|>AXf$wb^8gR&nRzE%hbhH0yH#8s
z(H1w(kJ8K~rd>?@r1d_S*-@Eh^UU#vMH>p5ShWNK##`@hBpF65^jDWw6)gx9wL5D0
zKYt%?_JVabSX3S~{R*g$j`HZPyAgrW@dPa%dv2DQ(9_pJqzRGD+xcrbRRsSAQS1ZR
z=Tu2MA2rGLbPvMc_*D8sOH&2hTOp}lJnC#Nt(AYqTh|^zJpBxbTbeN?gd1`%_^|>V
zvG5i?&ZB1KPPJs`imlUcR8B2J#QAPZSzV4;GkJekH#~kK0qV$$+SHkNQ<D(LmW3WX
zkNUPsyZGTfo=DR9!}{7?epzQkY>D6~oqxsO3hRThlt^8V+w29DcC9@mMdAKo?sz44
znedQo@m5z)s<`CB>}`I7KYJHK)yXOGTs3E^s&TLu|7NA^+rW6$BPfV8{GbvL{3+OR
z>D`r^l&_XE*{9Ip;MEtLnKaI=zQk;b{24Zr)GEW>dz+Y54212vxk9T3{Nh#G|7MNx
z1|EBQ&9QYpmDOR^HxK{&HXcYTl%zRbr0xE%XDONGF0P`$CH?nGoX9gdt|%HRma62v
zbk&%@atd5E7B8M9)-l;^u5^D=-I|#_ZVarKHh1KRSuZ%C>^&yf#y{7Dc37?bIluXG
z0tA46cAwm>9=~r8U<De+k0RZHeLX)t#kw!Ae>Rx8u+C+J;9*=B8+U|e%dTzAhRI9O
zRuSzV6l?>1t(TjRH{$oo-HIKi{wn-n{`nYKSavJ((z|~Y5c?sLxnXWvd)}o3c}mPA
zm3_X3&y*qcsUnd89WK(6;iS1^@cqCYwiZ~)Ak>@5uzU4EQ=@vps)`xCaUO(Zn_0_y
z2c!#jWZxD4_rTkt2i_~2K-1E{UDq-mf2LGE%q~oY%~wkZ`&|7qW-vMXDuh}!&+A?(
zH^A-qxvl|UD(4B&%Pd^AwVK8-^H`>vA~}SH%e3uHpTyhu3hzAE-uXm4S(2HZBauqk
zvtOu2YY+%lN8GbU3#aJAyC-bredwIirkJGchW6{Zi;@|_o(S`XVj)!%;v<!-$cYs=
zhx$Mki{1uLf_UWX6dB@ilsvhgL&PQA5;N!sn~KW;M^qYgFw65jO?RC6PUuXBX;4uz
zXwnAb%{AKFRlk~NqqsOfA9C~UctbEam}?f;6b9$`W~^6g?@U+G!$TM{Zy?5B-l=GB
zV+S3O!kOe-LW=W8jTL!Oklo<<fqtY;Kc27EG<;^pe7l0*r_5Ab0GKyC5BFmTiK1X7
z=T6l{JeD}%^Gsy*+-H@Qmh9lEj7*Wqi}69dil|I$?n^7QC_p0p@hIh1s3s0DAs3)b
zy=8f}bBTHzHA#<{{gWb}1-I>*OPx1m5J1URa?$jj2kBcV3P5GQ7(CO6hzc?g<1#4R
zO2p@f)Ovn$h^~_M_UU#p;|}HByjx5vDnSuTs%slDMJKU3p2Nn@SiY{7bcS_z+GG5<
za=pRV*1{Y4_#XFgCYw<k8A0fRrt_`m-bVrTB^Y&7nUEIa)SvN6CYJBl{j1M0oxx*5
zRJ^F;_0RJkzXqni^CWcxZmfi!0S%X8t9TiApUlgF<Ra%H2ToBgjsCm?n;Nze8FC|&
zk5Wgo7HE;Ma4#Rp1>JImchno3n%b|LG%@N4v@Q=9c};jeGkQPbn=hc&W#<}6%u}_}
z2Nw9EM)fr75{nCyH@&w3X`+Q@V{+{9H!SqOpFCi<-8trTGrna1*LK(Q(_d>QzZ39y
z8S^}myJ%Fp-*3Mpb>I230lhfToX^LAriiuV%&6I{!>K7vG!+9eRT4ZJ&9A046oxrO
z#qT*w8$Ea)E$^ZEc}U61!I$Kg9G}~?q|>3~NBYkGdlVbv*7)D1C-+0qG-e9TjxN##
z0X1d2QKa9n_@y9evG4pFZv$47&AL_w?sh)TIt05ixglnOB#T7j6;Xx><&n}F-^2|f
zN7>S}&EBz+{S9uFWJ>*q&=yZ>z!12yX6!k@ShOD{lNh$@nvXn5H#))-oNl{D(k!6V
zmf5|cn(h7uHn6)<?<q0U?(Z#~|ET$*5Zy*R>@9bhdowsfam8P<y=^F3dZU6I>seU!
zybU~EZ_qvn<?qP5OU@bTXdV%>0sQ;W@nwX-HkvS<ex0AfaZsh?oU%B4@)AsXB=dUM
zWhph9L|p$RC55gi4*tr7M*!+%I$hpF#(#7A|6~BspMHV<qc<W#&1KgKytTIsHoP+c
z@e(eks}hHEAax<E(M|T1;ju^wA_S3T&Ea_QAHR=1&=UqT|AIgIYlkr=p!@J6@0)GW
zl)pQ+G7EpZRG(<&JMA&#ji5E1^Ju|9)F0bcO)-t#3G+A_oF0PEcWARmtoG~MzGxEb
zk*>c{Aeg=)Zcz_FNO#sR+&Pul*3i{<$t0GhY>8r4>Ez6R-YCqeA9pVUrcvQgR;*NO
zop+}dkK0w{5YovG2dDzz<=g0Lh0;tB$XIxuMj_5ur6+P0t^|N#S04=bb2WedyXNq7
z!YL2`-Np5UklT0c=I?b^ZDSgBGGOfG@BM#{#!||@)8uIa$4BW7UfYV(T%$?*`;xYw
zLiQtlO>96)QaW`nher(koKz#X**uTRw_iCvc|uzBFEneF*5A5RLCIMRXS|Tyl?vW1
zjbD?$Uq9T2p-d(<V!+TsDzw?XkRJz!mx{esg%3)EqvUi`MZLc9Reyac%iR>-Bx*gS
zDHwfS-{WFO+#W%q@D#$>iy!kQj6-=lqAIpg`B!L~`%${>$Z7Csz=}Ycy5-$%2zOF;
z@Ry=oK6dCsVxbk!I6!gXp-`dCQ#o?}An3SgE)CSKWyp3j3sVx#4x&OhSab~MjYW$W
zn;nYeDO^xcWvxBKGDMjDohE-tE{NJ{q_R#_KS;AV^BD+5Ig$%`@*`&VEn3fP0_A=j
zh!4N;+na0H!t+3?j&o3e25T?Kc>18+pclpec<F_Nf+e5<uZuWm`R2zy^mu^<e3lfP
zf+gur9CwRiX-*@03Xh{%^Mx@2Ug`R?GcO1RH_3FL3@}x9&3UzG^Cm{Df2${p7Hnl|
z&1F|M%*HXqsv;T&&3uFlcvMQ=F_n?!y_6GSqGkJZIb&M=?cX2A07632I((Y26tXk_
zJ0{0^hZMGVm0y(zYxmN5wdw!u5P1(mu2mUCAvkV-WW6MqWPN4>As9C=qK$MT#WlaI
z$;2A755j{~hSXMl+IksmR&xpt3ErjRA2Fu%Hs*>UCV=M*Ti;GKE8v~=-abh{DxDb1
z1i(tiqXA%hkp!s-Vat+?pf1>5j7c*X=qg5vH**gyq*z+o*^R%TF67EzVw(?L?%egg
zx?FoD0mE*>D=8D|@$LLjz0+p>^!5o+EJ+s2V1R|nxT~vURL8erQ^LlGk*N>7dzV#7
zSi+bXRQ1M3?VKU$f8RttHgDc(O|PYn#k7D48~-(7th*uup<F}${Az=MYmzj(pwWsc
zGnVwbp1Y5`mX{;{yN2+8p#0-Qz%R5Rxau^Uu?O4d96Z{Ca=^ewA5?XJlbx&;?1(ss
zT8x)q3ajtz70&b4)X>BnagW*s9(sKJ>|S4tlDncicv(Q_XX3ycgm*dCEHHCyU*SGo
zsL8*C0wY_eu<?ob7QsF_3eCY_AFDmj(LExDv7Bf=d@c-jXpx2XD8dvK$s;SW@OV>T
zNU57_(xxJ5ULwr3#^|pkpFHnfT#ErWXhfcWh_>J7JAUf8aY8c%|F;F^S<JM#Iww<3
z@VJ)#@=GadM(dydUlss7eO_5lMi6O_KkzA7n32SLIzT101TjXKFWS=sG_O$|S9sLS
zMnijYC@uP0lnII@%)o}P(%^0IaTqM*NEp)@wWV&ooDysm_Ld!f@mI2ByZH~&Q>n8W
zycQxl^XlsA#0U}mxXU=8&Z7h<D;Kb2DX%kexFi?Y1^1-3y8h)V<ThN!NQwx9ek-b<
zi!n6er`>04ep%J_>oIjKEz8SoICtLY$;0oR+9Q?m1i@&(`W_n<A|E4w>ES(LO_+Hr
zo6uZ$i3r=#f_R#pq)O^?_ZNj_PePmh80?857M@oU5BnjpZdE%;RGcuPMpA)7X<I_1
zkHT$2!`4w-(oBSJgj3jSVVKeu(-*J;55h!khv*we2%Ug8^`o8qKF%FS5S<-f6)#Gt
zmVdG{ZYJF;9;?x^0_77R8vaG8NoFBSX&>9HyZj<CN<2!*;btt;Iu50T@U)e^+BUY%
zHjqlh^C9bLYLz0j!R5o-mx>k4fUt8Kxdv6f8d7+eGteYAOLO`vl%kV!G;&`dB!B(E
zWh@C=)zS!5QTKzFcuTqOic18Ozn?>R@!2LE2(<X0T`oTr8ziTsDtsJgttGDK;ZFV+
z+V{Z4NCuZn5^Pm18H&$$JyND>&}*~ZTa7Dk%Bg;1naa*Jy$>%3+;?fnfiZ5Ihl+yl
z`0(`7M#&fGdtOhkc%;WCSluOlElqUtL${!;)cp`E*(M|B*U)^c)gli7Rw{TMWWh|F
z`6+5xU&5c3SMA`7Yu{9x$Zg^IzF*mIG7xeU6yq&h?s&cKl|C=pPiv<Cb@sOg$MeJt
zX9c|9g4Y5uM^x6Po_9%%ZCZX!f$N9{m-Vd%33m9Dr5j-v;ssm&O7eXbW0{P^;yg{^
zQXg0#vggM3VqR?^vZb$Q=ehg3%K-^H{AX5`{m(>C>u{H9ra39-_}$3gHo94*eF@mx
zv?-!FjaT6BSNYckZVdFm)#JiNXtQGT#Mjs?WpQ%;mi5c3`G3ph$_`{e?7Xi6Bg-#x
z3ulcP(-~3;NfUq5{`*t^;HzDkH9c`O%q*UzaWw9(Heg(3nI4;ka8l<W*(G$2xlW^5
zBfDZ6!=0VD9`O5Cgo?d;onR}qxWsbVVVi%&`oC*|{%0@n<@gWx*PrtU9L%n5E*rTR
zGPG><ogx7~<AGP!574sr0Mg`%w+XO`-2tY)AJ^VbPSu})#i_IHSuL9ZrtN7m#J4^g
zf}8PKFT5@vb2{FXY(IDg<glBZqG3PcaNT2*x)>S9KLB{}4Hpfu{HMnvoD?>dN=*#(
zchV4H8SJb=<Xi-JX@WJPl1To#pyO>)21UKe>yI+jh~$lK1c?SV6&`lW@9aKl49$Y~
zGQ|HPti-|Nmw{(-auVn2Z@Y(j&vP!84y0#A#tm4+QP>oFpfQpCFhy)1dhRTDn8zd)
zn|0I@xXR<H-tnHo{T$}5gslrZ*M|4CK}}I`$D9|aSuw`|zrOW3e-^`G1B9E3XV74)
zfPv4pj5o|l`m=6)*7<?5TFLe;@0PrJRSH2g7qU#}=!feV^(g*mDg`VmR2JEm>;@CQ
z)y_*4VsmF40|?o^Oa-DGK|U~H2dx~TcRD$o;Gqd|*TmN{yhGYQPAFwA5GaLew?6<=
z$QrIG=+|ty23ENQ0XFI0I5a+kn8}KeGJ|a?1`26+Di?JF&U!g*4mWImzT^kfoI3Q{
z?8-s^1}z2d)d#a=-5<vkM_rntTaaIv;OQfn==nG5b3%JZvuu#g1(ePMS5Qt6w4v3F
zv_)z|rnD|0-sRw)9yUV7zaQbRusRp091A6jZkc%T<(yzvNlNlnl;`gthO!v|dwta)
zv31mqS|vCCM+1t&K!&#vL)6NuO%>m#XLo&(;uh%duQM4eE@rs54&XP-%N=y=y`!x@
zve|^Pq7c^F+b<z?FIofg$gaOSKBel$PIR35NKYeZ;WqZ;j2_*KlR&;&-Qs(5wg;{D
z(<pRax@?yywl>=jli@ARxJW=*qnwb?F|PQGQoF0bQ?@cpStj%#8u-^dsMe8>I!^KQ
z^ccivEiO(?5w;k^*ZpI&6#fJ0k93jTG|ML$K&L&;f8W?G^6SL->NKzpglFh`m!~vR
zy{xY(D8oF7Ksa@<NT?H>-0EIN-m6VVQ&#=O*wZV?&~JQ*P1f%kTaOsjeC`lwSe336
z_SS%f+oR61*M4srn%&)N?IFbt4D;u_3YAIaU$@q6+C5CrNdC>$GcsQ5S8FS!_NHb9
zbXEg@u}sDP9Nex2C2WRM$aawy5cFO<sg*m;wg3iGf`OcS@B*Q@R#B70K%2ynvz3zL
z1<ljJ8f599N!ew*%6QyyVF>wb(`k-Nvq}lh22UEzM9Vc_g*3^W_l5FssCHVN2jWG5
zxyty&SK9$pTxQmeTRlGij^`jvEKQOgR!gH-;o(^2@-|uh9t}1B(jQlzev6Rc)-+#4
zq^*ubRo#SI)%NN9g>CLPTMfT*wC6DHkb!eBFxW9D?G>Gu*VISpVhv{^N|3tDx@c$C
zkx_oFe&^4Z-M+K(yIjB>&jMKq$m8aJiH}o=!~TlRKJ8^Xpmif`IF<_D+aNM0=0@xm
zO$JLqsP`+6tHoduC*U|iC`vaBUtJuIuiB7uu{Zv7giH2?Ln)bs=36#LDH_(w>4Eyy
zcHMKh@1bQU@GaR5?cGNwKj<0Dq0~Y<np}+jL8%b$S~&GVrMZZSPR?D!<XMUR7Fq>I
zq~ucXZS>-!%{C>i0KWU=<oeZAez_g;<BMe!58_Uy`Q)HscT<<oo|+?yhn32zd<SpW
zz9V=~^E1hu8VFbHc-$8UX=a~NB)yl9|4-7QQopL!wj2dP77#8Pg-15ogXFG%GS9w8
z;h|;$%!}=&pOlu5tznEW8|2wcUvV?jExuz9j{RcKm^uSOhoF=x6%{~!l)w8)!1%De
z1~hU>ljUIM!L&)Gj<0xy$ALk}G6zG~pIZ<8Ab1eG`{|k`Q)6&$31Mz^y;mbV`%Am7
zM_CSO3@@>7ejTL+np7fZ^&)bk3H<BBt_~#+`~HVaL98o#`eVzDX!*=+PU6{&qVHow
z#;+yo&0$kEf6RMs5r9at#~QusS8jYM0}j2RB~O?7PGZ6g&P2ACeIHji!EjbX?_HCm
zr!J$-aVCmiG2$}t8|8uDq!u22OU2@{i)WN4u+l6IGr2@y+wdI(Uwmpf!}It~2cd60
z{Y=J3LhCIA=_QRpPwPta{`ILcazQr%1x@8ZW%c95QcK)@VR6Y5{#MS|C<ghv9U(?m
zGd&2*g1%oRXPHnCcZIg58u59r9R5z~Iu3qi7x(j1qnyh=RrTb6pAhKW*Q9rXZEpnd
zg^u-$-ik?#%*?_N`8<B^?OZ~u04}c_k9Q{s`I~fHFz{N{y+5}nz<YCbLzF<_U#Q)>
z_n24hV76tjG_gK|qCA>)_)bbu^u5(xbYMGnRAYfwQkE-V+?hY0f;wVgn%`&XB?W(3
z7K__0%YS&dUlD?Ddh7|G4(*6@wPEdN&9aXrnfL}BG3lok#e?{pg4Wda`D%>)xzUjI
zEv@qeSk<k$)uK(z3c=b{?2T!Ly~OqcqKSJg?0ko>Y2Wh3q6NjcsoZGMwfg#5U@-%K
z@4as}1t+Z_x=);2>eXty`$IC5#8O}hKrZUXIPPUOT!~k)rGv9<B5jk_z+)>Csd>SB
zZ{k=I<%5=_kl!cV(yJ>o);TWvys#s4p?bCFEqI$#-wz~LMUxZCu31`*8|U2O$ptl^
zZg{u@FVJ_Kc$w-Z1A#M+9**f7WfPU}oprI1VcaP>XTm^iS^EgwNXf-XlKQ#=esN83
zB{wS_DSW?ya=llj*$oD72?J{Pe;?-S)hh};6-S1(_tu?{E^VyDA9%Fy1f9jQS?3@7
zYxktg+0Ko+F9#{LZLU+6C4X<~8qNS!M&K&$jJ7ThwZ85T87);I7L|w4xLg$p!1?DA
zZUmJxpC;Q%LfFoW7pM-fH;D33&a)K(rzAie#`*bq>&{@`z7A3QF{<0uY3ae;TS_jS
z$RuDM8?RrW_O9$2&RoZ$Sw-^Ye?LQHF@5iHoz@e`+NhrBSc+ZGDJzDBLI_E>zL&qT
zsx-V(8pCBnXL=57#|L}nuT*Sy<#Aqo%TmPI(k{c~#}@g`n8$6>5U1O`t(omX7V|_(
zX7=A}x}Se%YrelJXl(7E2>kWn{PNpg_tvTwf@cvsob~Ht#q8hP{A`w%-CX4cfxPOc
z^J)Jv{b=&${H(F^%EO$Cpl&G7k?ZZL*ujs-4Zpmb>uu@yilXf*Q{pLAO$gsGDg|>4
zohK@mFUjm<f9Jx)r5?F1RZB8y6tf^qwIzBQ`9wm67&~kd+$q8PdhG7m(_cCz+d<WF
zHYy1a44AIAUOw4Pxn+EjGGQ5<m7tW=*y0>q7(`v8mZoe@IqW75H*nlytfiLlv(BTS
z@P9+p1PSY~*0>f2AqQ!CPUNZumEy(r-?|y!GhpU>XUG$};TaNlr8}A8qJy1Y=h_Ud
zBL6C_@Q-3agfP$1c9^FU8tWW~m2uu~-`@`MPqOLItG<}&qkFzt-f;H?Or)JDrfsKr
zzYH(q_)bY0bl-yHrSk?7tfX+|>u|sU&j9atdhTF+zXe2U!Q$iccX_>rEA9M1B9%%W
z6O8~2$loY_g_xz*-IatD0<Yguy;lhhXGd*(Ewn1*JMV&Rv9bStPvv$t7cjbKSa|vA
zrfit(+?4@Q(~k2Y8^``Ole-<R6jVSxaPZ8PEbeSR<dN6yX6@Yo^?ajqrcy$%dv|`I
zc*Qw`{Kj*Qf?yXj5vL-GBsASI>GGQx#@mhV|H9t?4P~ux*cJGM*FP#f)Yv*DhMOUD
zC8xVJwDl~Z)???m`tj!P>o-J8a(*RI(`<o%!d{}C4dz#~Xr@R(YW+S#db`zhtVSEA
z3u-;olVi9zr<r}atBlNfQ(Hv`V+g)bd(Ka?#Emn0a`c;k+RHf2M?aLiGQ>~?E1nV(
zD$G7Un5bTg_?cq^@((}7QmKSW<>v}iQcIu@2|`Nz)UkDCr4GV+%g$9Bgu1?FwSlO6
z`9i2$m>t_M9iiZbj9J2oYKvIHu@rj4S6?Q0$~f4g&|w-rx&Y2&RxqwSDA=H?@^)#=
z3n{-tzm7pK2Em2^;iQWn+s<rz{X+tlYTsDo$I%>ubQD)p<w<acQ=2eU+r^NrV`Vp4
zGpG{Qv;T0D7+&k;ZfBp)>}^qh5d`c5<If$V?N6%>?Rvyk$?Ie<3tO8lSUp}?Arhou
zX`Dd&NYCqEVX8{z?w@G{99q?vvmo5DB=}V)naYL+JxY6{<x$jC82W5$@tQZ$atwnz
z<|Bh13M=;Q>8kFP7uv8DvG`R{U1PA@yrhM2Z(dYenhO2TAs5|vaad0+%=|-3LWL?2
zd>`D2j-`nibV%d71#4txk_zShBC0dC%g%5>r8sxaF#u+uKVFlJq<|PU-_)j*d9#fV
zM=e*ae=~NCZw#vP3%v9;=Ss<^<)p0%ii@pPX+G`<^L|3T)a`dVa5}Bvks>Xb!!B>e
z^o<NpUeABz=zCiX7SQ2Z&k!ioF?{{scC2Y!%r*6rXE?D}@0H2L=V9yP#2|y7U*y<R
z>HLA}CHyWlvE)5ZJU8jW$tq0+cpB1CTv`g2MSoh>i7-N^REUBk4;DvtnpcVXZ`a6n
zCW2s$!fk;R4u1;J%H5O9mH|04!iP_!0Zl&(P*=y3aamkZ56TzgK3cBR`$fOu;tg&M
zbMQ%G*M~#hB)+00e_jVmKAWgt|0D^M(=TGh)UY4fRPh<~@tLBq!Q=eTvSDEq99*$l
ztSEjq`vsvgV`~K;O|<lgGzs>hIqr39QQHd0ZDv=zth94(smSVK+r;fh;3NdQW3m(d
zSolp-0V<!{BxM<94ZC)`G#8CJ>!mVh5+myXKJD`3<h|N|o^XJ~P~7$!cvxuz(vYMq
zlE;7bIAEd?>Q`1dX@*!-H7vy_;ZutMaD04+D#RR$j-e;~XX`knnIB8TqqL$Q!%vp5
z_P)e^KSKy60mVOh_|2}5L6RjXikJyIl({?SEF=1EgAu&nN0mYTkKO$jR-^ax!Ssz&
zO4@iQ(;n2EvG`UT&HYO5JFyJaJvk~_2jxX}3uS_EV9c2@nyVl*q@Qg<gLM@1{-6^j
z*}(O7*;WAytWWZU8Fjbgy@!g;ZrA4?T46R{k`^fM!@pvtG1;D1p9I4dvUT!FV=^R;
zn(=A(7wLE^be8d!LGfY`etgo{tR&UuJNSO)I2;SDTKyO4>Mm;|VRf4Cxg_1MsbOeT
zoZ&Hlx|>^4?!O+$?p0^r;e!2*Wper%G9p*dqUpTv#u41uctLM_T{#Ao_AEhZtr4Zu
z2*NZvp09`R54zn?dSr5XRHTI@&c8mu<PyEed6;p$sX^lcv=+W>-L+yqfx#P}GbBDf
zH*0$(-V_xo%=4vyYF6p^@s`)+v1IPmWP!vmmB4wDv9=uMveINSoPx=nS0e=;)@<^L
z>O~<B%x-vVHVYW<IyYj;Y%tbfZauv&N;tC!XAD0aH)yjf|1*uUAa(R9eM_*V1f&5d
z15B$%NXgQ{CR&AJD^wH_+r~#xxbu?I_Q;8akPZd0gu^Ea<(UB)!m0e3z&C)<pQ-<_
z1_r74f$(Em@R&6ZoSWrwDn_H!_qP^a$+MM&{imaG&Oicaxgw2LhJsIa?%>yN*#mw*
zEK_sK4LO|hP(0iaNQ`)y3j!$#-AvSn*FT+)DP|$UCemOMLR)xpkg(eKL*7+`Vr9P>
z>RZXt)hnndqNX6^Lwp*$`=SSTS=QX9^PNLi6>MlLhW;AM=<NRMA|7$Ndz1DgLviuF
zGB|6pMS9B@)|dYxfafFSPjaxaNE2}{n^uFIzDenOdIg7^wCZ2iT`BLP-x-qOdF@YJ
z;|}`#i%f$tXp6hSRV0c<WP<8`9y__b5-*gR)>}g}%q4zI&UQ@BQ>yW?5iezp7pc&J
zgoVqFHAw^b?~}~@7a=O+iS(k0PNrRN-G8BbD6W3$mAaleq8oHnh^n%T?n6VTwI<ki
z3q1ZC2zCP;=X1jGNsqw}h;UbIqNOXP!E9hG4L()DDLBDO^Tu8_>=h<qAVvwZWh?*=
zZ6?1e57L!fc?)hn9(s;W<{6zKbKm>c7Nh)x2@MWQ5&yk(TBxPuzf4&9He1lK`H_d5
zpLa>_?eJN{2A4u3!!ztS+xSd6ZWNeTW~xOS)E*2TD!%^rOkWM98TVS9r^cy%?bPuI
zaW>PE)h>$jQ_aYjaeY?zi-ZqsU#FmnN~7`_L>io0FtE+d3@u1yiC2_SNlYUUN$6b~
zysV4O3PJ<S{YQ}IR2e>AIIbbFVSajgmPo>x9aS|K_mp_&FOS?TkAid&a4$6bUu3f(
z_BPsVI7#>y`yadVGiLX5sq*S%ysf7^rMU(pc4+<gJY9jP;TShs7VqqH(FdA$(%Tn8
z$%&67QNiza@-+0srJwR^FscFy-?)~!D5&#P14to@;La+|3R_`6{HMplZhdNuSBlX;
zqMc@M=`<K&7TTsgS~$zuF_^Q!sDb|ao~DbqEpgddNc62_mID!ZoC&ES^OQd%8!6Ih
z+)TBQ<hSq;bEy<)c_~W8??Fq24jW5V?^aXpD=rA{&h2>H0?VG%zc1oVLTaOTI@p;T
zQH%j&rMkEOCOj_#GNk%MN8i7>1D!gb*pkbQHo~));52E+!+vs^3`R9_w#_jZucb;@
zDd=o%PS7s!GB2Z`b=Mq6iyiDED!A?$;rpW1{08_rO%j`!Nn7cMi66yg-S~Wu4B6KN
zki%b{=Z1p-@3mdHG`q^*PME$kd3MQiE*B(17TB|=)tss{|GMe4Np}f&PEth!i#=~G
zd6q1D6Q65$@{xD`B~ro%J!XO7MWOi8#foy^Kf3u{HCCPY#uR?GU?(jk1KW_s2lSYV
zp(M=-SlyOrgR2Mra-kk3u5TJt;-j}uxuPX4Ey#_kfBDx!vfwzo1b52KVvdB#KpuVQ
zycUff=!O@?5_HZ03Ykj_Dt@DHhWHuUpqLqk8E)7g{XYCHx8pGpIy5{}bbuV^POrHr
zzN_4O0Tu&l4g}PIvA3!DC+PRaAyGJKpI%>JSI0CdwHVP%*NR|9b_(qZI;bSzCqSZA
zO6;5P6L`~f@5}c0Do%;c$*}QU02y)!3uO|0V9SUWg}xw9CXLDT2Z;sz_|1yUpL(r=
zaovjim}yV-qtDw6T)1^_61v0e`W21?E=Bh%m`EL`B^lx{aVuTWyI(aT_=AhLxALa-
z13;>eu>{AEfI}<FK+q6q4DI_x>uZ^HQ>LRb_ZOTcoK0?{4GibDOQ~15{<+ccvWg+L
zS~cx?g$ua(r3+yaP|t+1?b)*RpJ9zS>^*VSvHYjz2g}KY9ZufrGEHRQ9AOWH$R}@u
z+i!9T8|zG<SrJPl$zqMEEzg&^mW)$Ew7Y|7cd}SH<}gMfaIsHyz0>;h&HoQfoJ(<~
zL&+a_vYtSU!^;K-z^*pXH<h5oDt-e(P!s$6T#^L2a4LXXw|zHjWsM?d&w+YIfP#NZ
zj<puEo!jAR183F?)GrB(L94<@LdlLrazrodocx{~vmBF8v7rTG<lPlpLi)D986b)Q
z?Jg0iUIYU;a}fX`<Y)YabQWHmsBu-|wl@Y$nm=eH!tX1|Zk_{lCkdXvA$wLVu5?q@
z19V8FnmJ`$%&gL0G-dauI&N}!8RL`B`Gbz9jr*+U{9UgxZ)0bruO6!r@yf=mU&*#o
zJ73gr9hp2aUWQKm12?gPp$`m7o|s99J%CURtJ4oC!Nhz<8g_;`YjdY^Q2>~$aQV3^
zWIM@=gsY-SHM5Z0uH)^HhwST#rS@cHs2o?w$%iLAo4YL=4_E`EU(@;e)-3my;6C{=
z24OEhJgz*?QgZ*5`VU%N#29uXibn=ne?2g#!S8!?4wd$^p%bv#D#DMvFARBw{0>b_
z^-An~#vL=o@`9=^!6Y=b3Aof<kMv+477L!zyer-z96Ab6Y5u+Pb=W7vZyh&lsh2I2
z;#_a1=SG9$MFwXKl?Cg4Z7dBHf37ZO+OMy|a?5`-n%!2Z2pZh@xD(Mh0p-q=-pwcb
zVz)2&W(lY^VPddR@l)%GiTgzH>cKl>14p|T@maIwPRo3lT*zFLlnJwk`xi-hi+J&G
zRDz-g5>Vc&CV1S%Ipc$s&M3#D;z4wbcO*h}aw)3Ny)y>EZoZUJrio}Wb>5vIn3s~_
zG#o9i?;)k7icHcJEZ~ysKhkTp&PVTJ@}i#+r;j5F4~=D{o1evkB!KR8Eu7HS!<Clt
zpRU(0Ge?f~ZWc26VOr<knDiqHLgk}o*~}sN;wE-DkxJE=kH!cGwiX(mqDrKm5!_~x
z4uuXID<tb8%P0L_dBi4yu!zLye;0m=j@}~k=zU?yGwkr2;n{mw*n`mz%Ws~NnZxnF
zUgTuPs?c!^#N6(_`P`10d76wICTMvf@kfdVAOp=hHvL9LUPI)L!V9x2QYA#z1_NNb
zE#g4C%nKqg{vze@YnA;(esq>aNbmC+Lq_a5Os^>Kv<E-pxNjv?W}hmO#YG}-fQb2$
z=NSU-D{GvOJW?V0!Bhl9kP;9qHk4os1N)0S(hs7m>QX<b(I1T&3FoxJqi^#6ifT~e
zWrc9V>Of8{VM)n&l+O(WVQrAMejr%Go>iQp=g7D=TaMS@asHyzx;{fXF`sb@&--G;
zgRfD{HXR%Zs#LrYdT2dp;xFtdKTH4d)Z7wTh_zGea0_4JDSU_jH-Esf@CHyzkKBgb
ztA5U8?P*f<)k4$a_mP%gBE?)ZJbAl5zWXH%H6bRaSFO#%UH7z5nV!O`sQELFQC2LA
z=y;;v)w)Nw5a2(N(kP_{Fs0$C-PFk6)CVIfP6>gyUiSAl=c+mdG+)OYgOjZfCcQ^w
z&J_N8U+KdS<cg!2*+?L=e7FYc*Oh!isfE1>b6X$ReaM>8-XgjTB-j|&1HyetPu?dV
z?9rB<*(U}g??sft*yhqHiXyDon3oOj-!p!19_sl&AsaNw{fD-tqWZIW$ktOIbx1aN
z?0E*|ix}b#xNc{ygT6Sq6X%t<)?UVN>ae;yw~M^h%&*FBIyj34`2S(=EuY)^mNso$
zW@d<)9WygiOo=&WW@cuHnIX1gW@ct)W@g81nHfjFQ}fpIo^$3en3`|es*+0DwRU%}
z?)$!$?X6~R32*nc+RT&`92DiVLo*tE+m;f&PH`n?Re<6mpft$RA8-gcte8`v2kS<n
znY4Fa*!|Lq#V)xi(iyccGYGjF6lk&$^+(k`s}EREVyH)<8AI!%5>!gO76f$cGu)hA
zhPypXbu@xq2KLfsydR_}WqW7D%c)hyAQ~gdBr0`ID@guIGUxiq^vfaV7s-zN3wyu9
zNm&SOFs2O7Y2nWPU?TO8ctz@j14p2SvcPw(TG7(7A<eeZL|P?3?M5NIX}TvGekngr
z;Y&))I=eP^#U_cU<F_Dvk=*&b26Y++yf9YIFqVjJ$vl%>Oa@stiSbtdcQK_H%zvtJ
zbp-2@m==Q|!IB`rEUOVvlN{-r?54<CbL{8GWuZB#i@Hh!^K`@I=#pH6acRH!F$<@1
zLzuZi<_?M#R3D%&7*Di*sR9j+n>;DDO69fwn|g*4ZFxPsdEJ-MCTok-=QquM>zdOT
z>zC|Lxb)v;<Nu5@4XoGZQm->v*xV}YhSV#;XwO0$JSry{IL(HeFJWOMxZ@bl(`tal
zjXx{en1w?bme?ap?SvLx(R#TwD8>s05g33Pev{i>(}HS9Y0-Ut)ga{ni}||P#Kf-T
zQ|KAjZBI0D;8j0n;9zyvOPRpodr}T7kVGvZ?O>-j$W`uvbGB~=F8+)j=|E;%LMIxk
zen{(u(K7Zf^erweu!mRh3>_n!0u3g*Y}P@8CMQZ|MHxfxYv+<hDM1xY4N9(3p^82#
z0;X1R7wi~Z;HCC{joGhoSY)a+O$ILUV+Cd+!_YSR70OAR0l-OVi(Fk{>0ov`Dg&=;
zOIV&_+5F8-*%yQ8(`ey3MEXTDl}c%;sKjE%5CJmOEoGlL2&K`~Jc;4@t{RA0fC1Qv
za9UMxeR-69pfetYQmK?x0=`wf{C4WgB>eJz^N`n%Bf2uI1#&G3ZeYSCM=?{{+CawC
zhXRFf)oC()1_vp<67g3LlHa}kBg8t}AhUUsOK1G&f@W!=r*ZmRiHK&-3{QE3CFK)@
zGzF-AqavXH!;*C_5D8y1?V0}dG{}ltMp7_kJ1{4HdxZ)~J@fR)ddr93dhMr?40e{7
zYamj=qWWTlhe%mLaD7?D19Iu`)tC0Tcm2Mf5grY^+I8c2++J0>rIU<!trAGYu<RpW
zL0|;-iO!Gp)PS(*EAM4>svEb`u=j)Ulu+qJs#X5noa*Z8W_GX3?$0hX#;<>YZObQX
ziM>F#|Bfh_<X4%LnyHCT5-QzQsxQn^*nv<Ru<-W#EH*9IgIG5~OhOxC-ojM{>_*<B
zUNcCqcQC6;WjXloRIH(TeX}_4FAwIZ`i#A)OkEaTe<Z7CZ_KZNWAIiR;mK8T9uK-k
zjFB&AIHT}Q4@`u{rHv%A33-F?qh`x=3tuo(shMq-d8T-vC`w}$DI`LxDqL+k3wLnw
zMXZ#*#(Rb*88wEY{D}w(JS{+u1b@Mukq8D~6-DiQ7*Q-o5hZ4|;0%wu_MXiC)Nntk
z_CsSlRqHE&%GZ?^1M<)i8=f*qg@co=!DS>;Rg_-jEqb@+@iKfBM-SHzaY$$>IiJJg
zP%HuVx+@4NgofOiBasQXBevRu^lWs@@)68KKWP)-I$#^}q3c)-aOm#@1DV8zqZAo3
zSo@<fRio>Ig-|t60b(GsVP1Ln788lGRJa7+TyX%?-!YpJ#+UePNA*KHtr_Cq1Fc5G
z-;p3&`!|G@6!c71AeBPH%@u|1*T%F@YpT0#sWsW<KtXOKd{JrfJ2RkE-6{~t(L4$^
zCVmnAP!$p5aJawt^GSa9AG=CN$v1=mIVcJwVZOZ!Kl+`Lv-`H}aP#Nhb<QE#KKF4x
z8A7Ur$&lnfVPZc~e;Ut2?h5#FHfxai+omNHo)~5>A^_ZyWGcYJIG5ONo24!Ko8B+n
z$nx<({i1`w%nqg?o-@n?)ZxUciKUj9Vk(42{g|?=ca6e#Ubnjvr6i;V4@nL-(Z63f
zNyr+(CeHT2U*1U5&x82!zvhLtF|2R<a1p4SHX}${C3#lsQYcZdxqo`1<G=6mU8cX`
z8h0$HBvQ*ao$+Stoo^%>9W_2!Sy}yv9!HUI7XY5wbdze8kg{s`dccq8$B=N~mSG2;
ziivBZmPp=<08uP$QFD%5zxCcFHm2P1p2DH^!8ft_p1PnG6hFW>%9bE0@iH$Fkyb7X
z$8)0+eKjHB#{20^BpQ6Dek4H~mv)+`q@j2TStxlFeFpz`oMBgoR;yq=t>n(AV1TR>
zcL3@LWtcOfgam787Xv5i%|Bx&Tc^Nz*l2+RmCFQ=30F8GCxJ!pfKOd`N8yU_wE)kk
z<HrxBs}Z+|{=~eqzpI{S2ofVmKG01V(-|q*5WfS}evN!B6_=`pYJz|sYr5(Y6j!N3
zqWZyrh7pEcBS`a|M4cpGq4o|O(ng<K0X`Ydh?d@D`6@SPlGA-(i0h3Jn+lcW>6;Jk
z)exg^w&1e9i`ojx$$-bOCxs5diX3W{K0xp(da{=+S78cs+|FA>V4gm`fWT^&KS49<
zMG#eCXIdL`Y8=t`?WjY4Us5@PYr#~565VGsfihQS8L3N_{j@D6&?J2hzw2?!YZv4m
zcK!aMT$8SQ<@Mj%m$H9tY9<i%TAm7Z+O7w~wBUPgdGYmJ(MJ=gwV(DT&qk+$d};Qc
z7Px#2hXJfFWJCKfjaE~y*~qZ|g%}KkBp&DT>f0cti)*S^BSfH&c6m$lSx%T4V68-j
zdpPyh5X$)3#TOp^OyV|vZx<DJfqWOs(xOIjIp|J0zeo;_5?<kQyGFz^21sk^t27{N
zL?N(1j4uua?ictN#~<AAeG`?zMJXf)-a>_UXsO7woJp)_Ln&ocZvizb^o;K`Ry>3v
zyK;VmIHa#StMoCx$qu&VL?EDz#f4f<-3Cj^RFS~9ZHQBpW?7+ohbz?~_GRb_9s7ZZ
z#H&QN1W9x!NzX#;$sfpIfIzB1VFFoz+KoqT?x!3eytXkzFw?EGzkuw;&uJ?qw?qfl
z^|~+L2S=li$C$wt4Ck{OipBt+r50)a$1)-ri>$!1e^ke-z<e|ed$&KeH}pBP&f20x
zx27ja1Y#U|U9*RXrd{C)^-4lSl(#Xl-PP=*esAp<dHI(~+of<r$GfUIheTvpvfihr
zc%)M3Af?F%v+DxSSMjfM@($0S`LR|SFJMYvQimwr$^b$U`gw3=Jx#0c;c*+GUFc29
zVxwRJ-J%a~(hpjH12t-1t+GLrv!t;G++Zb7udu#L4!CXypFa+tmB1ODWNcI<&$ftg
zZ>Mfo(8<<(Pn{6+{v%k0Y%bX6o($jy(hU|420gXMFv+Ky8wje)?)0{au8MYI6V=xe
ztvvOLwp~Mn+_eGE2ZdV5-S*d(ja1OZe)b!Mo!SV438yASGM|(?MJ4GKv_%wz0q)7P
zVON@uixw@9a$6aL3QF=!;UEzxgg+C=ytJw60PP*R6CR}=A>8K7c@j-YWC(S^cSc`1
z&NEUH(XwOy{t-~tD=|7o5cOBlQtU*F-E|*<cp-_#frK+UjKG_AKcwjIc{?2D_xX+d
zou)TQ7eT1h2m;w8sPQApoTiT5a6vh@gwNeytp40--=){Y^@PUuMtRvu3a*A^Q$5R@
zmTu9@e?bhDtR%4N&^pMYVbheRy@xZ)P8(x?gGFf8wpLJV#Pyq?gs8Z0IcOTJjW!zy
zfN=>&WAyy)gF6>s5PD{@*0ZTJG)m;|oF&N|O2C>2m2E;-iExpg);M*~+cNBt7n35z
z%9G_Oy^K5sBlsC-Yj=OTXi&9~V<cX5YgFlUG!?w5gru~7Ot;wE_H{3VSjU0>67B!s
zRjm*1amzA4;n(qg-a|m<A{u7u_11{)Wk!fG38<JF7K}K0{vZk#Xm+3mOn@Fvx4j!~
zJp&!S*&A*&UwG@`;~c2;1g>YRuhklfq6j*ASWjfZ*khPYJWQ1cY*=BU!Tyf=p{Jz;
zN>?X+wfo7~2shyAv&fRiimw>DVN7U+_5L}Qp#I}Ij!^91&=KqrXv=n$>1?wI$OVWv
z+|lK!!wI<t@8Ru*C8ggEI*KI3!6So-#otA0)q+p0#4$l*1?arOu?1|^^g~+__UPLF
zA$MMDM8-!^TOo$*vkmQPH1A5a=rjQ`%^12)jM5<zGX+;+1<bHys9T8!so6Ti*0pQ|
zYVkM%)lH;`HnB36Zz{6<S&w6~WrKA~Bfon4Y2kZ~PS1s9*!5J`DA7>(w!HLC3&{I4
zXqXpqh98_|#m5o+o^cGq!?0EOqWVF}8mz9y6dB<h<BbV@R4orbA|cmEGE`AXPem@v
z_%TkT0|gU{&L?4kJ_W=bDp*@Jcljw5U}^;rdL{t#wT%5Nm^2cUz7NiWx}Tw}q=TiL
ztI*+hS;JJcS)f<)ZxjVzoGlv`7D!z^(+w*~UgA56QU83YQB?O@=~i&kzIR6sCQs)I
zt`9{C)soqUL~|8Ny~kV+wMd!5vGzxrp5K4#?0gdcTsi*(1Gxqlk#D*K4er1Fm6}8d
zHZ9*XhY`-`)V4mJRrkHwS+Nc-UaY5EAx<|`*^07|pucGGttE2P_H}$of7nT|X=uFt
zd9AmoHR>^r`$YszJ%=;?GMQ1<3XQ(R!0|`$&NX2lZ1<vcFQ(YEK;!cKo8)?%+_5GS
zk*U_ZKc8z4PP_gOnX2_Nr2cL%8;4Gbj$T|7lrA>t!=AYY1FdDpb7iNH+Ko+i^%WZ1
zGm<^h_P!uNcR2*+9z9r|(>eWRSVunIkMNSFZvSozUH9)1dpLUDwF=oLsY&4ln0%gh
zC5xt{n?#n_x)AMe*Fe?x`2Y560YMtgye~3(A#(zH_A7Ip0<+l-*{=<;UbYr`maim_
zD<kWpkJls*Z>d#iU`8NA_t2q3=a{Bz-ho5z>)-RDv)?)_^;WrzVJwayiGe?6S@iju
zDs1t!wr%klwJ{7@rN!SWHG>zHvf2RJ0J6&>^LXBzpZBsQKjNw)IW1pQ;uz5OAJP!#
ziEXdUdabHy4i>a>YpbRf=nGJ-=Jw+k7js2H-b@n6#rfnB;+VmKO){6eAf>!a^KeVd
zGg<MW($3dDkk2AlrcwKAs+<g+Wf%okG>^tH^^1*6CVkD9VKl#RBvJ{l2Wl+D*h*6*
zPS_PEwrnFl4qM^?BBlO=Un5}feR_q+{(#r7j;Z8_`e6%%Mnyk<ci$GM{IPSfo1#7T
zbDSn3N9!SyKmgfx<^KUfY1FMKnZ?gd9R3lm1Z0AQVBaj`ix7Vzo-B|cPEZrB&qo`P
ztR#2bzcZf&AN`d5)~&w$29)?ij{em!wLUre%cr%kv$w}V3A5)=1GDNGyo)rMW=*tX
zshm5g5ITw>LO8OroRD0yZOD8c)3(|uNHvO%!T1s;3O9&m(PxQ5s3^m^cyD3O{xf<t
z*$0!UKM6g`sKh593L~8Md3a;+K0ggt$y9_ae(j=T1cY{XSu^N73VEg=#;fXsL5~`K
zW_VX9?C1GoH!~iKa3<^vdP`52Z4N3=)(laNe9mPsN2ogPfhK{qC7n1=U?_R2Q$;VB
zye9O8G|d<|c)NN&L|rQ%_{jO_;A7a`&S2Ls1;IbrYx6!^ZGF?y9xXr+x)ykEIuNeA
z51(Q$b%uEB;7d-mGQchl_bcS#MhQk}E!N;iPT!7a(v^OuQ>0|m`qY|*q)<(=%^|5!
z*Fx;`q*u7>oZ0}-i|&;cUmlQK@FmhH@)G54>o-t5ZKSOurYZk&0)vR6T_4BpF1FiZ
zYW~T{>M8=J=r-sj@~e7@MT6*l<F_-PQ0GCMf-3RJn^2)Q((~=^Z7J-SaQ96wpMsCK
zt^Xkx{%;pk77>V5=HYqRCvZXu`NVo5F3S{<=_0n%6P~;^*rDc0p0mK?`5Wp_#pjzK
z(a{51h733jV8xoe?B{KeCf0(0BFX7QZ%}f~N!iVc9uC@!9EJMr#i#N6-0+hIo7iuq
zM(yq;)3Gmrl?4~lU=<-6JbL$(Teiqb4dLon<hHFufH5j=!ChVzA|!MH+9|xOMxmYz
zPE_#}{Xl5*T3!27!Mxs@+e)@yUZyG2m&L9(TSE9;=gOarq<=AJno{|ggH`--bGl?!
zN6<K7K#+(>Sz3NQrjD;8w)^92`>P6tM`RqTt6{sj&Qd&zGUPTSh*e?s!W?1-9!l-G
zfhT{FsV-V~@kL%~Y2u_7gSS^CX~ZI$e(~I{7>qAb?E_u|WS%9~nr;-!KJ)LSSB!nD
z8cVQ)=bYm6&W7%-4vz?}c-f+uCU$coa*M2K)ZR&%J4q6Xp65*XI168ET?)H2%qR=X
z=2<P13fbzNgk8^dF_M)=8Y5CItt^q%($goXVSAi)opG0yUnq97r%sATCu34dA6ycM
zM7URyb$6tBqxZ7fKPO`%-A1LR{E3RpP6Qj<+IM5T$S31EsUQyBSXXLK1hNdIVfKDq
zB)}ck^uL+BUK&*NUmP3Nt<1V=-+D3#Jyp=SJ*V*DP6|ldta<IQCOa*-GPik0W(2eg
zCJ~+qFV~8;KYe{%Dii=Vc`R&k5+zkh7eMzn*OF-#%Pc+|rrUvt31b3t;lRyyF6*rh
z9`CzZ$=#N%lsShJZQ7O{l>c_b{Sg5zAUWkhoJ=pD(G%%(Ia9AU%=KrWZr9)Sm;EeK
zup+sJJH&Fy#RwOW%OhX(mZo<8L~~KczU`KyGU)LSk}`@pj3szPukvwcwGyl1ai+eI
zh$rAY4iG#UXr&QH=e4oe)!QGc;V(cdVXL+49=PXf91(XgY*_6AYk@uC>Z`AICO7Qp
zQ_EsiyqG)d8+gL`4OnmG`H?#H?Y<*ub7RH6vR^O;*pQvlw&Bd|8$jAUzz8V-(lG53
zTTGXfi-|mlkWtxw#dU8i9X(AEwypWXl6Wt@o)A#ho6|!YTN_TO)T`6(DM4FcN}fhW
zbm);%wRtb`lCVkon>?u0w1A(a5vMG<yd(oxnKK00^$u@k2jvz_#E>Hh`ga;_r~`A?
zncDSF4ozpfG5<HbGkpblR=pa@F_tAdaXB`M=n=YfyBK+&@=hB50_dDeLp0l2Jw!?V
z@7G|aagH~kE-{l0uDtD64gAeKosn!!Q<rt!&2B7|a86S`MRm7J2Hz_(r;zH7fJJ)*
zYvZjC4(AoJ4+XF1JKF8%zrD67`8GMuX_jV@RoC99h{yMTi8yzjx#~P$GdBVGfn(+m
znm07t1zO!~69_h!0d7YQhd#5pYmTYTb(xR3GucEtOwHmAQTSYp-&Ifh{t-n?Bx=@3
zE$4mftd3{2KAOrjNKxWhI`JGjBUsMd^d|mT`R~9Z_fO#AJsRxvjDo$QTmtzVVeN5>
z;D|u%1SsYYB|ZRT@m*A7o)F7U0T}zySwCb}O$xO+Y==-(r)y}thq%3**vJKZH2ie<
z7Rykv0|eb0>qMQ1@snGyMq0VMl<jwdej#WZKl5vQd?#~`A2s1NXpB0~3pPxxwS$~b
zbVjRBZ30#8ge&vWlW#4<+~leaQ_P13w*B7-(%vV5*P);m*cCmg{v}Tv=SEyv{H^D&
zB<BL;bPg-kRFXleSNx>5_yz+G87y@@rbS|L;HK1;864Q-;(z?nCuH&+@o_izo9=S4
zqsVFuhEBd$NNf=NS!W`9aEpE=BMddOc)_@K4MIcsRb)s-A<-BnAJcsUyi&A8+vuzH
zDz2lmxPh9TAlN=5*V}v-JIv0I2ZHLx*>q-T;lrwrF}yi^>tLnMro&P1<D-l((KQ7+
z6HoBGz{F}xVzy;@=&XE5a&)bhbz2RgLmzA-$%Ww5Li6tWXtF{1@(u3oldt!Q@mw}9
zAf^XHoFsF8LCb)SfdOg3BEX&|(J-;{XIXc>%rNXOfHbkHxoQ6M-%0zRs~)yUe+-7{
z^M%%Zln5|zdeG<*vlE6drYGoMs&Y5rSA-)ok#_*4u*#c3(}ZsCK>r#Y?n|=Z$nJec
zO&k8pH+bieU5DND+ws(dsrQG!9$O)6^;<-#3sOy<+`3xpn@{dsQSCefdajTJ0cK#}
zRZP$Lj82>MaVyh3xEm7x)rC0z0>W?IAyV7jGl>3^#j?LsYkCsjnl(x;McA}4LH2mz
zj$AlS)Fq<WYYD#^oV%=!HXjcn`lW{=7WeTxvrP1PyVPFV4!g|0T`;+14Z~#gSH1b@
z3$ayr%Dgx)y5ZeHyv89u{lop>{|%d?x*O6L?8PaC?R&LvJg&@Ra*^~O1?N={sNB5i
znRyPFrUgJYXwS~I*Gu}(Vli|^wm$y<1a9&~;DQi^S~pbjjB<5U;{TPb-5=2G^A1+C
z#fW=-$UuwA2k-!!2HK9n3Oi%`PrvqRf;3JF`(pKpAkB7qLx02Izi$xsKa8T_AvbR=
z&=(dynt`6Zt>FzX=l>*`{Tu0+AYkjCn7-A!>g5sfkKg$bbl}7P`y6OZ{$Frk0wn!=
zoTj0w^q!+1<9dKJEAU;rMu&;dZp2-GqBn4}i>_UWpu4uLye;^HUFAo$U5Fxu-2P9?
z<;j&4v@)Bm1bx^Ag#CA9yM5n96&2$*AA##!+U>3k4ULWWJIB<^klojf1qA~a*MAQu
z(+^)ks?SSUXh9zN%kw`yYM&qPNVgzq9hB{IwO+FqLd2%W2!;<-#p~UPQS;uOVL!h?
z=9?ehyKi2O$5BjAGg<%vc>Hm)DA*?rZuN80En3}yFHO!{<3!Bw=wg@tWv|`rA!Xb_
zW&W&UEVIY)|A^54LVE9f{PS#mu-uYoIY7eBfpSoLqSVefXn_y%Z)?Hx->hu{O5Gaj
zA>en%I;8#H;$Z*QPNlc4zx`czx#zaK^#%To1&(o6W2yG}KvxkL+uUO1<?>};G*!0;
z-0%*E_Z+_OCBp*kSXp0rVAavKHr$%;jXO?O7^NXzmaspIl3QP`g^aow=J2-BS7Y=J
z^VcvB>D<giLsv$JLJ2v&&Ad0}oV>2${MrbP-d853nS9N<+)Va3Zwnu$O>T|3w+emK
zIL|p^X0bol#Om>h4?Z3mbS_Z+j{nP1^iNs?ih-VFyX}mZFc4V`92Ydu1)n3|`f)WC
zo-B!2mJUK11`Nl5Ei;lc!4{o$_J3?HZF}#0)_Tplz(9$@5Kjt?FwfFGc@TemVNoWD
zA+aOtP4XOcHfX2z%1;h{Kx2ymjv53Dwt46q4RDGT{7^GWs`s&1=VJc4GxKxZ_T@T-
z@ZUqr|Go-<`6t5ID$sfJTsCy0t{mTa%4Zqg8pK>m6K@@E!7`(Obi4S)lB<C(m!P6i
zyuqyLoVVzHs(Q12#JB_cy4-D2x{YWyMsKkA0g^>J)AV(|bIrNc!8(QS!kguOn21Qr
z&yDldwJYzMD)`qflCAz4+%o3fVOMebs?kTPnN9Gstdkp;m<5qD{n>!^4#tP;T7um<
z+b8eyR%xri<f|&K9amHBXxm72VwH8#GW^3Ff?MZs6`u2~+Xu5&%1#bWq!hloCLRbd
zjsuNEhzH7;A>}`Q<nF$Cv|sYOajF{JAkkvfNIy9*nsH9Co8FQ(4u3mLT9rn=@pSEX
zjh|zAAm-;zLjHm-6Pg@~zg!in`n<fhVa*k-K9(dnn_?{X2)CWiW&5j3CZ8wIQWTf9
zj`fb=6sM}WF+`SHJ~;Wslg6XK@%fw8DbHjiVYbKbjPX@^etP=!K8XI^bi|+qw_G1L
z6!Yxe9v1D`t42MCZu{GpX#0GiCU}0}kH-ATnc4h)u-!QoHx_M+8E*Eef~VT&l+AYA
zy2L4n3%l~pXsX-(mwBTmcH)M8hb}@NEbn%!(Cw~F!X+Edjx2rf3JcFYpN#T5ydu`)
zNm|ZUmqooLpw43<r$Fmv_@_YaZH~>&{)7<*aPU5LxQP1QUgg7jWPgS5u&K&RpWm|}
zgj0y1=bmoscA4{#NuX=mYbee8Z<qbz1k%^o;l!-&SAs0Ae-AwF-~LHr>EZQE6VLL;
zn+@&RDa4_ZuP58<-yl<AK&ju2;|gjyJXv=jqa=u#dStEGP6$dNa<97MAaF5t6c7vf
zi@`jv$p=g2qS%$+jQky$CQ*vUSVYFFF<ppeZFbUDmuki`87`3D-1va;A^SncVkHBC
zlNcU${S9@1a{`NgLCC&4H%)OunKZ&VwwE$}A{&jw(vFjT`ng9#&Gc$Ka0R~Zk!}v}
z>k{4!qzkv)?cn)lo5N<{Mg@ZHAy?P7$;WkZ_7b+pTbE{}Lj^EvMhL(6D!pvo`4!uW
z5&fHkJeQw_{2#sG`=lr=WN8EjNd?S@&1aHl(~}nVq5u<+<mhi0v&C#4b6uH?$ZLrG
z^ZBa1<7JmZ_3Hv7;iEP&;9VlOq(I%N{LV0Ky`geqZygK_nO9m|M8(B*a*A$}#VF)T
zDl1Tpc(vB(ixFA&xl1M1oac3#?K=z-d=M?IctRBHK|ZZS9ee1@T0MxoF~7SPp6JR5
z!fz@27d{WD-BPC4gPR2Pg9J+b$EC^#UfVN{&yL)69+Bq@YrG~G5h^%Qf^a57L5a;4
z&Y2ghnH-I0VMEKV_Yq-bobAcSvCm5<B@e}>f>k0_ir>D5Uj=^0(`H$asydXt-+fsu
zCkt7Fha@jhxA5@574mAj!o>o$gWy3UnO}7GNV}S6i>WeE5>6@i5g0p(VT8=Pc<{a(
z+%UGAVWp$yuNhUf3vy@q9?i~H4f9Xx)6EUZ-SID@h+7b>);{S?Q>g1@5U)hG&-?QR
z1d|&DjlcbWytZ>qZrpcVPPxKMViP&r&hkwoCH@TpcQEUI9Ln2gN$po@6C6N5Q+sV$
zZH(r~Y^2lA0(P(K)cr_*^Tv)*3|f%iZG>@0wb?gd51Zr$9146{2zrI0COm!opE~Vd
zhgS9YfB+M+)HO;y6rB^dcWLjm{OgKON(p!mSAwgj(zh_ceCnp|aho0x24+hKyuu4q
zT?-xVOuSIdNb0v6>z6_q!u4UJ;OaZT2F66ck#*P7G&DA-+sEmj&afDrr+~H^we#v~
zDeq+3B4f#P*a{R+<_-vXIF|iO`Bs|Pz_2c~x3KPyA)?+<yVu4TeL5p!xrjnhTqPb>
zL?TXl)W3|g9jGHv@`shrdxZ*KcVSSWa7(8>fF%nktivzqeJFcHhUclwD!m6}t45#C
z`NMsMY-aGr8b=;7oFvZBU-jkn@$mbHV%hv`r8!8dD`gMi&_OPjo6`B*fJRawQdjui
z1M8T?*bO$KB&dGs@62R5Dg|J)@!Z=hOL%h*1Ty;F7XuFXTxDpST6P}o6+EkubV&$N
z>rw=jtG{XqON5$u7m>S3gKcGi@nQWHM+*(_Sr2U@-aCA^?4pq4Z)$pqSH7y|Z3;M3
zggkf;HY=zg??tUXFAx`nTUhD0nrg_HbRy;zF<D?<_cPuy-)6((1a37adjuwrg;BGU
zh!O3np5XxWUudH<wF#$@m=C38#0CzC-Z{Mj(NCpK6e8olfQa-V1RPL1cDQerL90JE
zIvsJn3)L*7b^&`LzoUhQ-@Vd$;EFk~jdnt#d1@qFi_6X&SuQy9vfHkSmRcSj%V<cG
z@6DhByXO~d1Z=ZWia=fK<YCnGcN;ZL_Tu~9yAOz6?;p`4@v8mHOGz~`R=J6Ptha1s
z6(P`|;@R`cc%#xe$iJtu@o5YCs6G*Mdf`~cSlL#834qXGcRh?^c3G`s)~3x&vsv%I
z&TQ{n4T2HvZ@ncG6SsvAvVBd$f3iMPrC8rFz;exUta4<`!)^0^h&nvWs4QGSb_{J)
z;w_kvN^PQ3TiE8of4o<HE^>-;Nu6qy%YR=ISiBp&2c3gKki`JEWWbewfV;-bwWJDs
zv8;-6#&q>@0cBI=v%RLO(y!6yMZ*P=`8G#L?bc1Ymm7p`uD4vAM|O9k-e*O?d|#dy
z!dK`9Ej)5AU3sI)ST4g*oK4b=+5R_8W9^+$yg}2t?O3a{umLTyF1u4~mLpG`BRG5=
zo9p|qKZN;C5nq_sHHc5Rtqp3-bc^%nG6n>B4g1mp-WEKneC=vfYqFmw)Z1_QG#>wR
zXuV?q+R8{A5275rh>Gld)1zN`BFVBn8PD_qZ$>$P>C0!D;<_75Uv6<}cA#%21Blj5
zyEdb|Jo}WQTwE?my7|1G>5zjB$u&4fPwso(f7ZHP$Sayi5?qEcPh2VJdqg5K=N$){
zQ=nv@U5<X#j;bn^s#FwcyZo@h`g)V3V*)LX7Q!`g#r*91GDR4Iq*Af=qwxC$4HIaJ
z*Xr}){K&KX>?@ybKrbOa?l;yJ>R8KmJxS{rhNZgCMFph*?``ZNC_Hwv(ZTTj$1B~A
zYs<TW+2_#`kNu*E&Pa0bpD!@Kx2;V0Ne)i>Xy2I;LS79Qf2t}|UuhpToSBC#fIoq$
zJ3K-9q*8Olf``=|pa>i5N?vE}g!IR}p=8BRg5!EMH4X~StEzNRK~4;z!@>A(>eiv@
z-~G)WZp!$RSQ&^2_NGbC>y}!2b>2JuJHf#>N6+`dX@?NzyO6J8{j{V{J6(@u3fh=R
z2(mZ8UweJU+y=)B%2==IHUffG%Y6Mk-*W(hVm5=7hc%z+Jlb$S6YL@a@C7XAz$H@V
zRNVInudcmXuYPGum-?0@S<svNZP%sm+i$24`Sr7SG_5gXNK+<v;>-aD#WIzd%$0AC
zv?iPjVJ)OX`->c*tCV5+B<lAhI5%V7B+5o??R(Yy%8`X6DUZ3CwKzUyc6KRSIM&DG
z+nSG!A}}jQ)YLYA<^7r_R@8+@FpWjbcVv4$7G{vBI}ORiQT1NZsgIpbRsMucNNfkv
z-x^Lxvz$ZLOmhb<_^UhA6+Q_buBTEq{J{N-WpmB*D>0D`FGwRA_*_bp?t8O9B@R6c
zGxD`rqPG9oDF?C1k<U$fAb79na*4s}RAwz=i02*ueH>!ddK@m&b+J^f<7Q$YX5-A&
zB?#-2z(%Y{l6s8_A*^zVs@vDNIwbUWblq-kuwAW$1M=6|>4R}wZ|H!fG#QQMdyz9d
zf`-34KTo(#wrS3&-uHsvn2Dk=P8NBXckUUtJI{Rg>(wGkM?+&XQg=TMFXHB!`kV|U
zLZ_8){Q7yGb-S!(s4}Ar*IP1Rj~a$lG`oheQQc4zQNwL;l;l*=`O?QV&}1=0IDfb<
z){o0sAd-IS4N0$jP`}7{vn_xv(2b4Lqs6ZNNo!eLq{3t~7Wo7G`(SOwXIGBT*VI-u
zLAD8TOceFrA9LGPac-un^V5QZ*Rp&sq1c*R2Z9^sY1p^D{WTc}ts`-}{(drJeB)>G
zY_Ev0cndMsD=MUmSy{zL1vUYNVRA_Yss@@h6ix%JL=27g!D=vpRA^@>#QNImPc6Qt
zB0;dNPq()c4<^N3pIo7pFtYm9L)9rxXZ={4lL((6U-#b_pEI$J&kR)_&>l~W*z*~A
z?hMPR!^Ls*cVF<2OuW2Xih`V~jEWQCiLZHXtSH-MuH(aCzBHkdq-zZfHaSnNH|;@6
z4-XDG+=Pc(p6_0x*N>&v!4*DQAp3;8bp}qgzIDCYndtm+xjFg}ET|nDEhAnwc4nv(
zHVzm8#6y9sx!7U!s2U`7J>88xpA56;zG(fTECE>lJ&OQ|JwSk{<1gM_5rLk*xVx3v
zl#3sg(9;Oqe|N^=U1SlKMOQh_^Q*8*h+hv72^|4#stxE#O|JI`nK%8MGV763LQ{Yz
z5;9G}DOI7L411hpn!dDFStP~Gdf-)n2k?=myE|C#^X3Jn5N{Js@$vN(wueqtPl&6j
zXnA#Q{m(BZ9e0~uHfk4J*T67!k<a%lHt>9M<d`~7q2#dFi1*G6VFK_kjX!=YC7jkS
zMB`^w_PFw<?GTJ09{o_gVYB1y&GSId@iOv|E`MTs(1kIU^|e|*U^xQEZH>Zcdn(n1
zwhuXObH6QefvcfT%+VAVflZpXdmDoPn0RXOd0kXndoUkze5ph_cQf8%dinx7bdNN5
z0uDB`i_fj%QLVc{rZK2uSLkh)ipUoz%%w$t(tFtBS{fVYkfTnTPD6v*esr{)Nx@xd
zMkdS(#Y`YT2bqybbA5BNEx@=UST4!}zS@@|lP(Fr?0oe~rFOAhKlR}Ya~_`W2iuZg
z15^ToNeeohm+x}O+lj8jYC#Sc7D4t@c(~*Ln=t_umdgl!>A#8k&Hsj!XaAS~sj{2S
zA0oToyJ8*Ff_{v7cIzg9IzkW2Tf}9N4)pe8WD)ke13Qq^10f9C#5C&bK6&LeMnkxt
zT=+_b^Inv%ZkOBgC8ZJE_RESe&M0!zUKwiXHRnpm_kqQ|a*cN9+vVgA-W_0B*&~GL
z&?`_7rRsM$q|rDJ`|xL!3#<JD$47F*k0pFGhC8<WAi<;rgcV4kUHPWnHs{x5JNzN;
zZ`CJuS#5Ul0d}vK+irW6BEROd)<&l$pHrb!9s@gN5Pau2?)cW{P95Y9304X7QHb2n
zt*a|NR1sDw?~t}n<rU!{L$7~j8DG}8tp8@Z&~TDZ$sLc{vp=u)b5eqxOGPBSFzB+n
zY&XxAwUfeG3v79NaO-|NPp<iVT8SlYh!kAGNQegxdmdo0{vj!FOvg{_R^=!nTKl^{
zoHkr1XUN*U$F)Yz7q%Un+r7dnz<cUM2#K^RH+<i`axH&F!JNORa192Fk~`~CHCv@^
zUf<h=91mc*m!UY_E9=bC0v*le#L1uRItOXL?IVwb5<ub#w)d)3=~~-9=sCru=_FQ(
z0q%cf++vUNg#nc)auq4vw{Rsbt$FakMQfSk4W1?u2d+NfY%%(y07(bfloj*s%)Hz!
z5Lzdk?M_G&9;aX$W+;YMn7yLvQr(l(3WLoKxN1DK9#WHg9Dn*k{QGu)D;I}zb2;GK
zv7&~?1%-s9RVU>-enq0#dq9%Eg8Q?lVw1ZW#x}#s`CCFFg>k+x?(bhb^U#dDdWzQ}
zZI+{!H7aI_ItiO+QxqA@{gS9SBcb%VIPpOYSw9Lk4=<CK7}|?gRQUv_735+5aE&l6
zyNtx*4T@+vW$H+AYDrpUa`;hY%P?|g{LyhvM<@QWn%bH#nw{%}OI(7Ne1&Ht3SUTp
z(b>+Il&Qu9(CprY6dBk9OUjAnK;yZrPv`u&iAIVow(&5T=M<MRFw2%*OkF3prtpmZ
zS9JQfaCJpae<cqF2KM>+^N|?u|0BwscZM2aNv9p)`^Tlk;W&<vp-Pc-Fu;HP-g;4X
zk#(`Ux|*4p`RLc_b8G5ar1R6o#f>Ki{<0GTKnjN{B|1Wm0C;%o>L6tlRlxndTJ^o9
zjggIYFEXs-&-SR(RKse$nUJGUt;t%0#nK)O3ek#_l2U@Kw6ueTg#{xIj|V@VufXR7
zPArjtBMPygbANxot-Zbdo5Saa&#Slh{zzMRczDds{QUf8g+{%#*O>cKpWC?|h?Au1
zDk&L?kBe(hPEOtg1Q|oE=Z}70SJd@S2f?DYr>5d^IURoYqAJSI&&S>;c8V(Da*2+N
ztT#L5<>WMVG5h{4cAi*^uSCn+yJIlJV|ah7)9c~QsDi836!|7BI=V6SRsHql^Z2w9
zSGiac58;s`E0JwZXEzY4%6pZS%kQ0+i;i#oN4nr-<C2~$l4MNVkaDs1Z7Cb0^qC1Z
zc1v3GmSbeItwVTesjaDKN4w{Fixs!$gpQ!|N~3m#)Agt{<G0$DVxw)u$;^s9o~j1d
z+Vah&Nt2YMmfBe_UTDYfg+%-(v3pO)vtp0J`**$XEL<z=`xmY*?R?92#}1n=r>*4&
zMMD8y%e>h0Bd+q%23W-jnS=sAFO(yjjGBGF=o!_yyvtx@Qz!K<i}1Yg>Fx|fXZc<Z
zL*FLIOpZ-CXgEb^PaaNIc|SYiq@?EH4yg+X>@@H9iU|yD^w+{%nEG_`y$_~Ty<6tv
z&(`|wZ9K4emhjvuu;gbe*Vk|4F=%aor0?2&6)N};l7cS-ncMe`_{SHn<S&(JMYR(_
zf(E$+W2T=Lt*xA@5ja2F69ywU5YLX#aH_YfDBc;VjvrY_DOYu7TS|r8M}{PB|I{-w
z(##H0QLKF6x?$i5%1)0v{kBqT?|Z)0Svz0dKdaZt^TjZ7(N0C<o7WNYn$qF5@7d(`
z>up@nse756aYze(Hg{P*o6RLnPcgw;#-dXznY3%!s8for)WuJ)up<`%rhh4JE#KbB
zdRVVI=!05zzDa8Ld~3lc!bN<UM@b!Vnz5VTfFMk-xOjfHb(Lt<dYM|UQg=A%^jUAz
zv9&+ydLG(rQ@I>!AoA&HYakV*Sv{C8cXk5FE1S@XEyeUGk2eAzs0xbAV+vb9t;4GO
zF)X+Gsx6nzwgYUt^;254YDW|f`Bao9=5uyGbFYaB^~vr(oR6tN?{2(0erCn(<uLDF
zdbic`_h)S$<M&O?S~@3IYvOO}?g9d)A+p(-#XMTp!@+wJ95LM6z;j14H=yJ4LiIob
zOYe1#`z%>XE;qBA-PaI^O#x0dn%T#2tSLdF)0Qu6ItP=^P^Xt0H>_NaZat?iu1Ab+
z&m(AD;zCGBR-L%(Tc_Bi+12FPXv;|z^eH;p*4NrCibi~tKi9w4v$)F{wV!>9JX1Zd
zKx|~-VVgf~GauLD<UD@O?Gj9vCm4Tm7L=&quA;+TBLr_+)3P>4@fpLV(`s6X38&;8
z!ct>W;d$qEb!l~qm<_gbrRyTJRoOScIQ`vJT~{bu*=h86(SEU{s@Ck5J+v;JJ}mbB
zju+{W*u=H+y5HT4>;0Cc*7ce7b{FbrM%wT)dmpQ9VUVoVTC%`U&gd$8FC`si^weII
z%y*$=)V}W1vIXRIADvvl4UdjCaEG>;zrFi1udV7?N0*Vgt4dGn5P68xw)gfu<xi25
zV|w;m@{G;=24ZwTZoF4jz1s{2`?l-Rsg`!<m|ctOdUVuQa}OfYwiAR{zU?|b!p1tj
zrS@b8fUDZN$!3^${^yi&ZA=t=x!Oe338Svlg{xY3`^wB|OooN)YmAP=rVqqLz!2}*
zw_$x+uYs|}?13lllr(}%M^@#H^9qd!hax(LG1ZJY9benDAxC=xA*a&8%r5tSLVl0M
z%x>@b!Qn}j3^Vm^)aGl-IpHD9vk4h#dlRR+%{-4_Ti-u1i-kOllr&p|r;jvYd=7gD
z>|CH<z2~{}gtb0hIK=7pu9{LQTd^RvhP2EyuN^ohCok)`dQ#@*T6S+DAzd{-Ev+m>
zt+5%l`$v{*L-4nooJ*CJxf8aNjny!ikI*VQpHs+Q{ORFevz9d|C+(jeS#*pu8`-~0
zd>1b>POUg*Bfq50i}mMMJzmuCc@~WVd7Vc&w$Ghyn!PHs2EttYDx9qA9zwH&(O}6@
zzmY@<;|4GT3?Tbp-&)9-fyIaoaoi^0VBp513lhA-2sv=u@xS)|t|wsdYEX~b;2f`o
z7}K3?bb2k{(>j111D^W!n6_51<APb)SXr~cG@mX!s2|P|{GoD!eSQf?2D{39pgy9S
z|4C5jvL~GgM>Y4@weMVuh+uJ`I(fyMsdKHAokW|-yVPPrFjt(bwIG`zL)DK*FQ*8U
z4ulaNcW^tnN)#S1NhqN3mbOhi*L}F)V=QhEm%c!g?ZIow;Th`tC^}RBM&{Or&P8#5
zNf*zvl5b=x9~;2E5d4c}R7XpGSaY3MDTHa5+Q(hU_uQCpjj#f<Q>Fkhdj#RM-*Q^B
zgwCWE(l5yPM~&}a_jll+dU|Dr)Lim1m{OsvC6e)II^S|aYd>#QVjN)<MU`|FM1%8Y
zhXvdk5#_ja)_k_Au0ni{_ir2FNZ-(eIn!3mcUSPFpK#7Q0_$K6aZyxL6fx3dT88{(
zc46dTIKuEGuyFSid4Tt&!4{VLpK{XSbR}t*rWWS~#wK{eYJOKWR;aL&q90lT*uUjh
zR{;l78`LKuL4Rsxx-6xq(}6d%NAMJ5*kJLr{s$fH4WIt)j5!BI!iZ~c+BqiAtXysf
zGR^k$>5SziBb&C?4SyefHj{dqC|0Ex%rG&(ifo<OX7Jewg9{_y8m&2RHtg{6@;0p{
z8^4lMKF>Ki(a(0<@26vj`Uap8{+PVD*%brG;6X9Q;I%l}xqk(V#@bf_Vkj`^0w4DO
zfIQT{TLV3A5#43dGxKZx28KR?ho<0QZisjK8Be}GkWW-Q=%}_u36)@l(kI}@i*83T
zLn9Ue>@G5YxnSe)7u<1#2U;Ufr`X<4arCHsM7|WtCa<K@Yqlu7XPwz{>9LWV_FZ%W
z{P5V#it*hw>pc?F$>T_MjPfOo8b4-81Gt-QkP}sqra}WrSP$L7`T+JnOYCHDn3`%~
za0k>o9Pdi}lLaGJ>;17`eyz;=3RksSseVzPcr%y+qtIDv-p<5XfeK?QTaj1OL*JK-
zP_y<93{bsYu5R8(ueDOGz?08lkBbI^^Vs;LBcv_3;?6pVn}|a&Eb>x)gJ6JwZtw;&
zo@YUtD}Hx!{!a0*S!q*cODsHIiUY0rM({Y@b?I)J6+?WR*Zl;pr!3FsTaqmmLrpQz
zMk(XjeQO%6#muEAESwgJk=o}7G)-yWSh(ogaO-`)Pj_nDH4FjA^=Cs_T&9)r7p(xN
zABU==*HJ$oi*(YiNZ7fX%t1fs2dI9#jHW3S>YtSPp^_&!)C2?)vy(vj@0R=EY-V1x
zC#%DlgJZgTi8Xq;DMy2I-Tsa{x{XHzX8=HeY31PzHRwIr77a;@zI|UVD|j14Sl;Ya
zR@>alSJSrpjMJMuox;bL+0X7;L$DM>3bV@aHNwDB2E+1470#mtCk12lmuCT#rxq{V
z1{`j-zNAApob#$5#bkGz3d88bvBVJZadw5Z6SV?_v8Nbn+^aRUX@$kRDFH6rha1Wj
zp-@t2kGS_@6?>T#mwTA8{So)10uR}!$MkV@BDb5Dk)D$>a`qlP^(toITmUs%8OMrq
z_5j|2g}f@Eg@e1nGu!D|&&B7q&Y0O}fKhL3^s!VOp})4KOx}<T*fB$z-rqw^;}!S3
z%>GlY7dLT#>ivW|!+paFU&$%XR;GsxH#4mAmvRhYLK}R>e?s4{e~*PlWB?)QoZudZ
zKos@9aF4Q!okQ@Igd9k6w>5~St4>O!M!}v?GvTWVuQ6ODK)Ie9-NQ+hG#JiUJ;E28
zek<W*^PE8lc7JxFV}s7)V4w%s#@$zG=rFNNb??>Lv(RciM!*0cq^2&TX>~j2J}2lE
zSHyOm&)*i2x?AZSlhSnR97_F$?|#TLb))KNU<_<X<~*b|2dVza<Y?a_6Zm94;u=C%
z`!-Lp##7o&E<4MAnU=P5ktJs*l)TFT>XF?F62mC3@yZLhq@UdhQ9RVqHMSGoJ01U~
zw*T}SMNg<eV9pOJCKMuWPJN8s{=tvD#=`CirTUXUuy|*6i~nf4z(gH9UWg(j5OdZ^
zW>#$>$mh%e&c{ck(<4#qYRFTqpnxOX4;QA!vx?>fr6%wDAIflPdj}X_d98AF9K(88
z_dm{xtFce-bL4W@6al(jhOGfS9aJhfZkG<x!}7RvRAAU-Zr74~;)oXYl*5bJHEQ|{
zwVjl?KXDQ!ZkH=Do|L}&lc>GBOe_*}7yP;p#{p=T@cGF6aiA4KeuzMK9f0mBRho#U
zXQ2ag1QV8$+itPH9`wf!@Me~@Y_e{u)$zN^u^GlK_y8k%cA}%FqByS36f~MZQ-^C1
zfbOA2haff*G;8Q#BwNk?Iz!7rw~oc2t^dyHt=4@HkkHS&YiMk(jaTH#*)~_lavLPT
zBQ&=CDpHotC3QHD)<EbFmjiYcfZM;v^Lfu@`0bn}87c}!IC!EjPoF?s7-0zA2eQ0G
z|AQaG#}w}Eq-)W#BI1zEtb)-$V_@+xc&(>{9Gc#mIA1VKkrh)IJ_Ob!0QhZ_C~A{y
z=XWh&WRbe)TgGW>@ybuTwUQ>QiNm8WsGiCboNH$y85V6`8}e}JLS<aHZoE<0EM8zK
zAutzhzMIUBazH2*AF7!=&$C~$pTxMY5Dzlp{%OJKTy3idCeV;g2;Z%EUDm)5{fk0T
zAecwj%G=!IqFce7DnzHFHi^KAnd2BQR?CehO(4voM2d~QX27P_TQCH{R{8Tmcb4T3
zSbE5!Bm`h>$_Htt?#SX>q+yz}rYRWZYx(-8qj2OUT^6-gg&p<l9?Vlz6`Aeb^U8{t
zkp$*JVf&)&*d`x<Gh%=pEQiEGuCsdk)LCtS|Jiz)=kcV9E%`2l1|SSJ1Gq}yp&RY|
zTrzC7xc3G}&|uWL^m+}^F7{8<0`RsC>XM#hQzU;k&<z=%lyUiE*i*7hU+ODGGIAu7
z$EAH{-@Mj7sNHJUGjU33V!Ju#DEojD>A+SteYxqE<4I#d%ueC2wt#&q<Cx`Rj4+d0
z_)8FIZhPYVI@reTTsBL@YyyT!j~MZbGczIhG_1$eVH7nLU<5H^c^ycmMv5&Pu8z;l
z4!9aXW`Z`IM18VXjY?Jq1cBAZWZI~9LNUW0lXcxC{;u@Y?D*-n2J2q$`>fsEQZ}+e
z-vDP{)^66bw<W~9cYz$E&U2@hY?#1FH>-oH(EFhurroh-Ndg0w>PpG^JtKh5Uy}nM
zLJOICa5Oc?%hDDzYnM#MA;v;Ab$*$o&fSip=P3E}C=n0_=8_kXEQNF8iD<gbo{|}F
zI8hp1e%OA%K?yAmE_J_2h(2#V<Kh}FLj3?AMV+c#<D?HPfX`@Ov6qc1fL1H!?f;cW
z)DZcEI&Tm6?{(lm!H^jX?P(N5nVd$4@T3Aq><OH}A&G#|1QUJZ)gHMeJnK{1{US}e
z6ksz~_;-?!5*J*lMcs&xXf_JlpMavw9W2W|IbvJVIaCQ+A4NYB#z#GNy8qywOqf6{
z=iuJ(sZNW{7WB#_{~_UwZpigq@#fSETytzJX+rxjx$7}zoB_)Giwog$6_j-pXDy5K
zIT;J!tJqx9U$%*q)q{im?_m!^REY|0Iqmb+QhC;R;XLRAY5Dw8KA>hSgj5e;X6Bn8
z7k@Q)|2yF;7>aN0FQ$X|r^1QwNYLaZrP_uj)F+xak(>o7VYpZ4>U`b~r^Eeixyi)K
zNBj#mL}mB6+2y*<1I2japzW{lnXju&PF@eVK_nsV<&9|4%-9!As$#(V0iiBmRkc9~
z5{T6@>)NlZ@Cb+$`7zB{>$aPn(}WKduN5~H-j&l^L+jY9Gzm`QTVC=xnxzSM!O#()
zJ@QqdKlX!c>#Pa+Wgx}k9#9KOc!ZuFv&V4@eU}NbL8{E+qU>kH8FsB?!bb1*hwdm4
zqDt#V<4)}&gU2?jpd}d|_E+JI^Ino~_V7^QOx$zc?1w2lvDu#fd-Vr-layeuoQNFC
zGoLy=#a{g7o{O4nq*ao*%GvL7@Fzz-H-D=MWoTCY8m(iu9=U?exSiA4H~-xL$VGun
zAq8X?(=Oa}v31P}cH3wEd=fq3O?;}eknhC88#JTiIykb$%CE(`3yHzkyEvykMyjsK
z5IF&(0QUfETt`=(%I3glYE;MvsERrmz$5y+l1|skx5e}U^U_il3z4znEY+?6bYCWn
zEb><h*9nt?Gksfd`Y$tNBo3@6%xmM^2P+aj@kIH4I8v;LsHmt=C%ne%yw$R--}Ry0
z?xqzClO@_Yb`%FoDqEp%7HqdK#*!IppTjCNH@n^wWch($)?nLUgC*qaFYd1&61?ZH
z08Q(TQlt!_8`w3BFA$b-X3CP&q*^C1Ph#HheF{b&V6YWc3{F;%4HY`77fOlf-%TU-
zarF^ylv2{yqkmkTESqU(2um54U&KncDXb?VGL8Ij`h3T+iV8qyvfs}dx?4O%R@+`U
zoLb&?WU$$UZXl6_dIH19*POPH$*CtU=tRzFIhE6HG4+SI(I=1x_mBGxNRq~py7Koq
zZ7D@Q>SpNWhXeDcpadRC)kRSVuJ_Tm7Vvu0pdN+!LxDm2<0ZV(f27*j>F&q=l)xg(
zdQ=&91g9&vlHH^Qm*AgUl-paHrP(R%RC6q<u`#TC9}TY^GUf5@S_Q5!eH(A34#1v7
z#NS?UF;+?N28hW-vToAeuRd^<A+<<-55vGW@8Q^N*~hbUZW%gj9^CkX?4K;<cYxdS
z+qh@BvWY11f3fwJVNu6lxA$*`b_nSKq?C{@DFGRg20;{%W@rSYJBO|Th5<<lr9?rx
z1nCAT>29RE>zV(3uKQf)Ip=;iueq2t-@VsotsR1ejWq;_Eg>ENXXDeHSaaVwk7EK~
z&0MvEe-(ZNqRIu)hUhZ|I+Vn<JleUdT_jNW8L71Zzbej97VxAr1C=yWT0X+kmNLta
zgc`?uzjiAt_=Qc;(g|vQ<sKt|&P@6@EpbJ3V7vRztw!RS$l5G`+n;zk{XqS{=hFY<
zaW)YF!>@emnOP`vlo9f|X>u(nJi(zJsEICT(j~mcNqWHaU;!({l`9n(T7F4-6|%1r
zu8_3%Q|>F_A~qE=na+%!!9TJQWqA*XH!}EV99&`~so{T1($D{vB!yTF78xSzVZ)O^
z1qd=pm~B}0E$Jq|d5dC^$Tn}l{xEpkG=8;!%+)tq>n+VVc$?V?n;#%wBan{<@u<v=
zcwosxuI=hPhcl+i{;qzKcV(A=c>HZcMgBEYdtJr#K1kA#OiPh!$)$Pj(yH;QSCZ%{
zRu5-DzlG0M|J;Sprv|pbKZg(3fgTs5cQ4*r3E}uZ(KYv&;<}d6WoZK;ZZh@~H>d+<
z*HTCvU)?@!LqM{8m!L%<j57%aqC=0tVNYk(dF*U=wZsLKRk12jzIv5f<wg!gq9Pz8
zc11J8cjy~*Giz=Rs)hLrHVV6~c{)GjN6E@X?8&>AjBl!Y-o}3}SQx|YjhP6pE~WPv
zTloD~oS5N5$DwJaXUoP7abfXFT=xUW!t7#Fz=4Yy1+d!u!uvzEoP`FXv54GXg;LI>
z?(1B~m!ASj6~kM^n(<cG^ZLel&6;GZ3u3rWu>8W<!$3SUH7Tl-^Y`})b6!f%-5wY`
z#`U7jg3Rk|X!~CTmi_pt{9HL69by#Q^P^=QZbCTq_4(FzSIOx(Akz8X+#n?}8G(Nl
ze%MjlY>PiW!Q-|4&;>ybBy6IMqcwt@)!eFn7*upls)lME$7q1a*S5;Yah0!LoZVXY
z1ax#E9;Fm*F-j&<{jQqweFcx?w^H*zs{`>|)HnOeJdO?(oB6s6FqkM&^8KVqe$bcf
zOJniY!`D5fBEtqNlaG5e+`U{K!MsHRfjrDDc~x#=I(YH?-M|W~SR8X_C8KzKyY2_j
zs~?6vM;ic-za4+TvRfNuhe3zKr9B_y7`WzEU)aO44Gm~ue}N>pQmnGXf9LgsIQ!vS
zk5NW#2%Oq%IqAj`C_ug5`BfNFXA1N6-SNtIh#CHRs8yV@qWeq%$)=-#b(9=54A!@s
z&5<km9`(sIfnX?bY9p@=B`UPZANYLm*qS*B;E=qCSv5J;FodRmj&puTEc)*t6K~=E
zP4upwiGQ|tf`Nj_cfC7y#W1Z;gCbd!FYshRNJYwY3@U)u!8YFcy?Do+jgwjkSO*H<
zb)ye0<aGBKrFC|zi}c~U+oTepq~1uE3011-RWU&6yAB(`-kn%%Y-BRe$TCt=(`)u`
z<aN#Yq{VrauNaCZ=VwpkA+K&TIFng1)k_5b@K|H?UzztCV^|XH<)?<auOn-bkpi4l
zow}D1OkQECRD#xsS5BO9y9H#{UyM0?G+_h2Wv}NcQh;v=k{-W5P5|~92~<X^kCh`#
zmb$rvpzI~Qd`Hqz0~Vdz_7%$0Pttn^Q>h*|)|TojCQn~07S`%ypndeq3K{prigs>n
zoBn4WrS?PTk0eD|pm%Rs36gSkh@*yu_{Fxou^kB`4h4DJ@(~oP`Ss@_Jc17k-T;pl
z6&0+qY$?TDUs)`-z0(&~Kr(x2bIb!gC)A5W#|Wz2{^hZJ9<bBAfv`FQ%%y)n?-hc8
zJHPF_$8Sk9;OjTKL%~jEUu58OC2)@QlOpOy1({f4><b#{%-c5!oU;!9x8yZ1Xkz=1
zz*Jn-ok?4Js->4=8&RhVPA$=-UQuMvO5gF<vM*slgIu9SQ~@t{`ZDn=o6TE%LfzsY
z9bX@BUvK=oy&i1cD2xOINCJ}zBiEvnT0Myalc;0HxqcQvXUdc#Y0=F+xRsi^&hD9H
zzdcPH8*G02cCfUd7QMjM!<ai;%G_)Zte5uuazkhn2|IJvEa&nzz&!kVlk6U6jCcRc
zFuMbx^<kDbN@5A!Z<D%IbCmq4cvJZP9(b`m2_X>mhg`Uv^r-~za<|YcWuu?v?Qx{)
zdYfBIpSOShAY!(44c0(u^(L~Ay&x+T*Tr>3>9hj$y1bICs8^<uFMjGdCz}N})2)6+
zX(=Z-NvtuubpB#7jDIe(^<!n)%}0u@Gw4&E4(7>?iz(-=X5I4)K43~EdKFl|0!rmC
zj?<F48Hk~pU^YhTZ$Wa$0W05y763sLqDgurpJ;nJZtt1#Io^I3sjgNgar=QQt6t%X
z2x9ZtQIQ}oy;xg>%$^VZ`BX`PE9mTB;6~pO2fXqd=;hl4?=9hMvh}0r8ixwm=SqG+
z(ywG(J#9=sv3y_P{Y?fU&99T+q(8nWtI57t|Lm^b(mqZBuW1bF0Hf03tfZ;9Wb%(s
z+HQl}0X77Sn7}FB>b~TPsch3y%J&8Ge}6d$YJae+r6>@ZrFy||NeEtaa=Ga=eAHi5
z&&hpOl<raww=fVDiDI1^yT=^y;txa72jj??X8A2=z2o6a8=2NSR@<Z^??5C~Y#40B
zq~i`Shvt{+H3=A{yh1YS2=~ng4D!5xG}3_t0UvB@xAd^jFjMy8^v`Eda?aNPMWYDv
zGL}yCSv9(NuYb(-4`h9$<M%a@YS3wm>Ee6yMIA!nE|3Z!DZZY_W5@<G@0N7VWT)?h
z=BZ8^J)%zGL$rCWw)Z0`58Ra+#nAl7?%rXmQeC^rZS~v-BGu5EL_9HlO)ZNfkM6Uz
zD5dZM6`yw%vBK9lwNKr*B(U}<xR&L7FpFdn2f=KdO|kVA!&5xOu$;iK<j~tPn^5`*
zfLLmXOfeh>2I(*FK~M$kFxh{8-}-HUlAK+=Z2cz02~^uEZc!gAUDMHhXlEO{;_tH^
zv40pBBQSQ^2qQsGAFm#l+3SX0|MB5k%Rn=Y%har=HaI^EbFr8=Vu_2lMHAOMt|eXj
z?t7glmkhoDRr155*UCVp2QA}yY!;yu+SeH`5@_0d8<+Izx8spIMKDV!8^BhDuJ?Yw
zsP^XujAitTg~$DL`s4Z7UG$#1#NbnH#rL>fveHev9#6@BbeV;jA>L&$(r#N2y*0BL
zq~lsobA{e<D-1rXA9T+W!d19$E(KAs9O@SBD5jPi%liF$*PQDUwa_p!)j6&6Cf<sH
zM|8$#)~lmbi3$I$<tWKI{v8ia)S*)#H@`g_m}fK&PXP|gzXm&@+-Djr8>`4m$d_F!
zFzr0w)jcw(%<{WZxGTfUsJLI!oJgOJ8cWQ}HdZnniTCGvh%86GGjEhSnen2TYkOAr
zcaD$hZQ!-z%-b?+L83}=L-y>oY$(r08ZwgN4@Q)OSpSJj8oikP)!r<IsPp`Dr~-Zu
zq%@9BeorQOcxWc^Bgf78;QGU<c1EGcY7?KNFY*5{YUYvRV<uU#+l~lB)TEs3s;tzO
z4@sq`G&O>XH|%m2i-%vkR_sjW*$`L^YbdBriV#XtSvvTS^-78w(jxczgm!6rA`&<x
z3QU@aT<mk`gJKjeIc`SfBdNGHiiNF39X$c%IpSoRN>6)|E$7j@K$esDa{g%y`t9<7
zJv#jYI@*2VbvVw$`?7PjRDyPOaKsT?zK>bOu{DY5P}LM|wO7Pv%)qbL8A!!iL2`Q-
z=k6g5;JC!(iuNpro7w;(nnea*WVZwpPL_V2#NX8`FYXEN$EmTpT(53CEL8@FF~6V*
z{q=)h9h$<Q^E~YFz0lB;%fxo<1cFu%usZR$aE>h|B!oMQ;JHvXpw<tPI3&6l^2pVP
zPXVRoTlRh|HJ+Jy=0dxlX{cwl2Ie+*NQ{YD(mdh$uSY0R;U{sd>0@fsQC^ZG)`~))
zJ7j9=E!HQ#IC9oLkR$v9({RWH4mY@W!`{c=y|$hq4Eq+#Hc(=3Q~jFg&E;(T=l6y6
zwbWzKi;#0XfNFbkZjUAP#9PzpxBr>l;(`E6Bp2&oapoY+n}pke=GN2)oBXs2{kop{
zym5JL>vTTLV~C^(*OV)#`+%E*2B9-tSoAW%&|B7k)HP|J!Q3k7_HSiv60Rw=Bb2P~
zn{=i?YkJt)iHjWrkGC3y4*GNa2SL3LUKKG)p<OEVlazk6`bgu>DSD9Ecrq9+#)v3O
zFcj>%d5xMd>wl8mF&P%`<h1{S?$wj$)bOSo1_4VHjv05CjR6+n5{?jOjt`<U?GN4S
ztLRhyeCgYc_kpe{>x3gFm0jQY9tMe&zM+PN1DO8zb)J<XCDm$Mt3lf$!-_INhf^E8
z8C=WB(si@Hn}4fs4ZM0OH2(<6t2C_x&0NikSRk*R)=9ISL{>pOG4|z)ML)Pa9^UG(
zb63V(RbPs<=Re>Lj#(z10oajE<0z<~F@+iDyNW%{4oUz^yoBJ5j5{IUqvbz3I8c*;
z!vn1$YHiAGtW&4&Iy~Z0>|N3<Ln#aWq3PftabEuQgJ~1hhwMazSib)zoau9Hsv9Wz
z^;~CZ$NxM7csAVk1lK75UOyp^Q|03aK_;Kf+!`I=6kX4*N^ieIdBAZdj|BD#OYU^&
zQYuD;Uv<G@NxjNF%mj~)ULXLj&+jrP4!6<GY$-#t-&O*l3;wnz=O^J0|6<3=b;~lB
zRcSY4a|bi0jx#33^`uNEeqvkrw%puKpN%8ZP3^;5+vI!vY$zzz>l1>C?7%Y&brIfj
z)bqZ;Jkej*SsgS4j%Fcda#$x6(2JVAPN$0A)mT-=5m{Vv*6I%{S=Bh|8TgA^350nv
zLZd%z-FTMDf~h_XrNm|+u=hV2m;+s3FGH*M^1?(CSlX_0UoWZ;1x$}GH*=@_3UzvI
zPzX__Y%cJXvQp+PHtZ+LEd#RmT<6^OAM`UBQARn%OWMXpk~*)tCd@F)L%5j+tq;vY
zoI$k$IVIeE!jaXKTlLDjIVxzAQaEw%qW}J3-?~#P5qLBlDyYVRmqf-yJI^VU-_|-J
zd19t#0zYFO4BJpy+9BLl(BT`UE1_R)LZw_gnDQ>ofhE>m4WkiBD(WB}k=L$?1c(F$
zwtJlHlNQYN<g45QHkuX75`<WTi<+|FPeZd#gzZ?5X`#%&0H3G%NUq!+%YRcs#sxpm
zThafI?a+#k{U0f`LSomajCe@8v<?VS)3HQ@^iUuyv8CM1!8%}16O4#9budsLP5|*3
zJT-G36jC{9yupdNNBV{Ljmje?!(e@*ohLKOW{<*QeKm5M!cFgyyE=I*8bGCaTBMMY
z5!pHm%_T$RlRaY552Y*YM5vc~R`D<@Ng_0MCCc%mnDglP69J@#_=x5LZe=rC{Hlk&
zGmS>m^evt!)@wiy;87$q3yL+y=~2fD#a49*7e}c1yvm_Ta-pv8aopUq*vl`OE}cjB
zdo}=zjtb(4LAyYyPD@e%E5bw-isn|%MRJx&eU-J&Aj{lHT*cuIbZWj~?+LQP*=0*5
z1QNfRvvY^AOB#yp{n+D&UR8aJW*J3A2F=NMjV@qGB&wjH!RIopD#z<udg)4wV3Esq
z!QcJA>}*N4a<0g@ShDREdp9q#?o9>Sed!N6%sVcO+*(9&7D$_!uM!0>KJOOW{bnzV
z4p;bnW&fs0xRg)vh<1`pcThWXM}73X)0&hHjN1Mrr8lWCx({u#M!tgbRPpO2YLZCm
z8HEl%&bFw}Z5usjqi3sb$w*4awOlJ4xJWCyIoLR(Bl3YvJxw7*Li}z04D(euW6R8<
zTO)kwokEq+hI*#uhNZN?y93G~KOB7L0Ez(0o(K+RHR%eXkHH^7DT>$7{S!)pb6Yea
z)?&z|#gXY2`8p2p)T}@@Ze+Rx>}gq-6L6If80X<;6kp^*w9c+dvi_z-(r93BmLq%d
zE6ADuJu2ZHjvY6Dtm0z2Yl?X@ph0av{Zu}Dwd9wN-Cm8NJeQgg?L`xJ!VTm4Ap(JT
znerP8Sat|<rCuaKyBqS<*v*4tb@bepHwnD#sshrc<DS87ER(!O77Lu$f6%9`gzrj?
z1W-8?Un`ypof(dS&9eH(4j*`lQmwY8QT*5jhuRx7Fu_&I?Z$1-szyo&6<vQA=s!y$
zZ@=OEZuGCWU98Zce!7zu_Mk{0msB6SwzAM6WrZ6M@QquY{93I4Q6hIjY|rsVtimk(
za(f65nd8M-;rz#D+hILYDV%_XH7s#qeCU8;Xt|4?kZaX)W09SOZ7_<GYnJnJWbwYM
zj&;FB?edT40tR<ItJ4PAeYQH1+TeG86q!~sj@ZA7<J_<J^ViM19-HY?oqrH~#bui4
z%ZUokYk1nl^PSVR3bGOLXJ~UEst(}_h4q=mQGo)tT4g|wKIPTzP3-^VZ<fcJs^a(%
z+dr9^^Ip}xr~+e>P_Z=gJ9qt9@l(<SMPRxBu@Am99##uHzdQOvlR*56iWB-EUSoyp
zP8JzNBF>OVX;~L=acKkMaUpWu{M2$4|M!o5NpmJW9$%W35_<IqbjAJWoviq~^Dy3+
zm<$v2`;`7CGoebib0C4w*xIx89t*do*Kg0y_e)$Yj+{9XlL?E6`#0IK^avY<1wbgA
zlC+L3Y9NnRsd3FRKdtFcmiuRSD+}iZmTJNk&KJZ7#$_6Y;8JCvQAfse<KcT4Bq>tJ
zF+Vm>v1+)B-};-weRISo5IKQ(Wx(Cq^Y<F)AKOoI+cH6wvb~K3yNp|&?=(K!&PzJ<
zy~7n3-U{33+n1JJBF?Z+#Ou%Q3GJ|9415HIRUnw9_G}rJezb`QfWwRC$+_&a!UFyW
zwQKOlmdUn|A>3`1$$EX(Z$+RC16MLJhN(lSS>|&>A#fNKZ|p62sMUM63}3A6oZsl+
z!(l&lK$)s(yRqlB^98R#z6iMe8Ma=&NdAI0tqVK0UL%ik$>ncT)z3<2kK3?jKr{?w
zHI&MCm;@^DYGw>q2zP_mi9u--{jz^hNIKYt)eNM0kZb+iYg!VQHkMJ;n1HY_Yi)R$
zhNl!a*7qi|I4%zC60^#qWX<x4e=_q4xodMo-LL3*bQ)N8>ug^rJ1=K5ZF`)4-OkL2
zU76ttSLl1NlI7}ehNnFti8Mz#EUhMafT<%MujK|yvC`+k2n1bnVm7vQU}O-1_>tnp
zP`}2}Cv0wyBP)*K!d}kQn`W{evvaVv=^{<Mc{a&HTP?HhV8(&yfN`a{t0R&)_AkKA
zVsSjGmcmmRSIz#^DKmQ^kVYAW&TbVAGvE6+l+9{M1ZbQmKplRMWV2lHCQv<@`iMY~
z>7Q^dBs?^(J$$__ltRq1^U=RB09EriFJf2}PK2QFSW1xfXU&#_;UG~QSb7w%wtCGt
z5mcS0%O6cmHqS2(7+;X-uOgJ}<J-gOH41Z2(KR)6ijKSex@OR3B;pVYx=cfJLb^i$
zfIA2MPNe~ACRro7P~V7o0dg;>fa3nrUlX%8W@&dCFUO?;qn<Q%<U3#>lune^8aR$R
z3`c4fnAgGO1^j6kJ+P-+F5rH?O7LbTM)w1cCebkB1=h$yoEfAzFxhSJm7$aqm1hdV
zrr+5XPL*US1ITnDa6V>}n4tMZfCn_x-9f`7TH6zJ)&%<4x@FG=?4jwUZ&NA&lfSNb
z^*vJ@YAdPkGi`qZ{w;`g=EOeXuYtz^YM^#vTW)7x88i``oJqK*Quf#}u!?B?A&j)F
zk=3ihvdk7WjRmwyu%QGWgC6<WQ{3vEYA}`$F+NfNv`o4+t@j*NL+0?9WQ=F+ZK}ob
z68bB~QqozYskc)u!U9VXfu(Uk;Q^`4U5r1oeCL2Z8%niH7{=eJ5trKt#L~DV4psvu
z{V!vwbG;);H+(*Ce@aUFg#)3g#o#7C<5w}Fxx4!L;btj4*u%Xq;G5h^{wpV#4dHk+
z>vdK{u<*6S<tsl>HEL2BrR(=)5~mN45aC0+CYzIKy+_{$XfD$kC!LzsaToH6xVKeV
zVKz923^Zm~m;8v}n$_VT4}}FokZSlAk~w%-YJH8n2dwLiP0Xz|H8a8YOho0X2)oSa
zIdV9#q>ouZ2tM}km6kq4LC$BngvVFyAA|Kh|FiND^jvG*<sP+ZBsZy8;!{|joS=1m
z+^7PSeBw_!c0T^h+x>lK=J^}#Qa+n~T^hxJxxclSd0XPdmEZU<tOIW4km58xENDls
z5mvkwH8&{2=YG6ah-K}j2J{uw5B!S#$<$e~+JR2ZG&f6h2PYB6&ALDrj-iie2_)4X
zgCb(FPl=&V0ySyic>WxftbpD+fwk^+F$^O2j`teV$|iGisU1UNzQSMMRf5a$s}D`s
z_b?&MDH+X@B~0HnwSUm<C(0z9I%7jUVtdi^FdW|792WhI$@jtkFtK9LxxgfX7D8V8
zTK(<AU@cYRS3Q!hl^{J&Hm80$!EAy|E`Y}XT7lhDgBS)61@%K=5bDmPlt{gBw@bq2
z&S3uT=)>tfWug*X?)-PB;hvbMnx@=wN3V%dD6B8KW6N#)@c`0+tX2^LLO3cs@6p_&
z-S8oc;@3RDW7dU~3UW!PL>e{6Y2nJEu-DWakmU}a>)~O$n786H6ig=H`t2tFoS*+i
z9HVX!g*;>t$R>!?LMLPRstCghHIIFj97838Jv-5`sAiNmD_i|1WEh7aR<`Vn28sC}
zDy7zNiM#lP*3!&fI|eRhf`}n2^qc3FHjDrX%lm}CLjCMGG(VqziepzUAgg9^25BGU
zq!vgFA4-}??@;qm-{>G?a$*4k<-tDD5_}j@ij1cfsz|#2*!JTS7e_x!8oG|8c3<r>
zr2i??lv=&gnI(0yo&<mmZ+5iu&0<OMwjfhmPOlyjEsCbbWujJcCEKd}OI8`hddTu2
zmXNH+bW1JBFJB<wo7K-#QsB>_`u#HR&eX>jj4U<x+JQ-O`U~Y{etLuqzM+akf3;K8
z^5V_K`^nMs{Dg28P&Wv>lp$3W2r(%`sFthU26I3119js)!PzV(10{8bw%}vooB8t(
z{$sov^luZUI^tX<I_~r~ljWaRDwQ7cE{!_C54>%45B`1ha>xrS(kJ%-uI*LIs>F7L
z=YK7c@J%pVZi5)}Mm}A_0>A0@*HwfMM5KOTVo>sfN_dVf`m}oyHa|A&>rCz>0mx2U
z!VBGJwBMhu<GyD)0BOJZ%OPM-@yfKlme;U+l4a8JgMolT1VJ?3x1B%j(%+LAE4)nv
zz52OUyHPw1CG8wp<N;6%{uy{W(<xPQ3JcTv&sYu7ok^^eP-)1{Gm@t-sw7^#@lxka
z(k6NFSfjXv+)Xk!!{z%A!_?&zW~xWG4sRXbTL8SNe1tox_ljp_yv@`w@@$UVa9^F^
z&WCm|7<KFixiN_9*E0UcWw`bD>*U!&FpcEZ0Tk3Ma|DknT`>Ry#6hTepNT-=5DWyu
zhe*nh@VLu{MSr(lL57I-Y^w&6IVdjMX7v9%d=Mx%G~_$8i{Zrl@23RbKVDp$II;`$
zdc%6KtcxK{iI{+$NCU6mH*MEHX3s=_7{(g4g^4jcBnnZN^eyC#k=?NLSaGBpaZ=&I
zTjYB_wG07(%!{oO<zE%NmV0bU1tAiB@7i8v2{=p=C*dwCy|xI{=V&6^_X+A!$uMtw
zvEVYFQRmub*fhqIaCLo@9C%4}^28$GJ;W_%-aoT#BtpaIz~RdPqkXYklfOc9k@;WL
z=9Ngcg{J7!p#cn=T;uFi=#^2*(}mw)+&CT3@`vJSpH&BrX`78yCljEawc6x;9X?0U
z<6@#i0^qFR=jSp0^~XKxe@$}tJBI!?m6wg6+>lON^H;!Chn{E+_GYP-kX{6r1zc~i
zj56}qSXG{Vh3^L&PKaR*lpEl|pvofm46r8Wo_GO~_@tbBZ>@b|rKBa2kimiC87X2u
z#@>p%31VO0DdXDz!y^4(EN=*DN)&zwOhuDId};k_a0ow1zyuRDnJ@8VD-M6wab|*>
zIXw!%f-aHzjSTz;Sd)j$9Ke@+de5S*DuMYoy#BnBo`WLf6X!FbgM8GWWhj`3kGf*Q
zFx`m(%u_<w;jl2zE~OZOhQ1!nIE0oKKnx;c)S0ANI0}t~oY|(NhvqB@VCb@LIuNl=
zxo+3<J*Jv)bgpL<>ggz`uR$#nr}CK!?dD`bR$7Y)lFGgN*IzeL$u8AqO4V|u@6O@k
zFSz`am0QI?gdkUev(U3P`V(kZ92U<s6)war&kw=eef}TtpXsTt8am+#r;;ur{$$Dv
ztdj_j?<ZF}@@K6!DQBV#n{qD$AkBn?|4O3g{!TlNk(B_HEM5oSTdrBv=LyzK$pO1k
zl5wf?Q~Rpps50_@G|O1^xC!1+d}w;YVnMbttNNki;(+(`sC!jU^~b`5M=@JC?;>*f
zU=J8*eC16i*sD3RI{e>Zrwmu?u=Ymsj3|0WQb6!iDoUr#7XZ&#zbh+`rS!SIpVMce
zJ=Pil{ggR4RN7%TZv1Xt^{ev{=p)l-?S*_O?LMk+@`Uim!abD?D8fac^uZ8h@k`ZR
zIh4~aX*Q1pF7eQI&>CaOkm@+Pha;$pN#I{8V}x7X5l{I)9PB=QyDVwE5_TCPO}c`c
zj|*eIgR!l!R(JI@`CF1X02!3%Sef#uLvnz}2Pq!^3}oGM;s4DT^I&Z?bI07ksHR||
zU*vGs2f)6>KSF-B;Ckg&+hk82BD^2J!b4D$98<uuA*QdH%5WzUAq=YIv5fqA@QC;q
zYkS`7Zl(MxkE8Fqn>KXIUhYMgsiq>}j{*SMLTWb{S+Q*9D^Ix=mTkMb564cj^Sl>z
zgb`kCl5jQkYGWJzsh{lr4?OA8QPv0*8|`YPbOHN=mB7ugtOtt}E^(I$=+qClS?JU?
z^qPm9aomR&PqM~YKa^h6TAh4|T7P`(q#Y~y6&oRti4w*}{r+J_2jiR>p}xE57a}34
zO#JkzPR_I)C@=a{Wcn;8l+M2K=18SRrK_8c-;v@~fc@4Wgy7Vp>Ww<pS%-=aX}8wz
z-^?gS&iFzvt3kv}Ktl`tqSMzFtxp?u3lSMdQ0sVwxCXHXSDm7D!-if0mO`++Qh5sg
z4{U(@os3^#GmWG&E}nv+!$Ni-H_IQ89vS&^Kb@ufIJc;TPts*83MTDzmo^iy{2c_$
z{l*j{ZyW=UwQmdVuorObVkiW}fR8{7z+5i$RyB&SBthN%vtWW-ZhZu3rcCn3Y=Qep
z6r)a6Rl=gKsLzKceqmHIt=1!>N7e+nP4K+t5e{-xxusI-Wjzq#G-q<h2QD~hX8x%Z
z`D;Lf(L{Cdg-LF6Zr$+o+Rr((Z>ISoe{B;_oyb+!_{urgTjP7xzv4#x_7Qa3AKxIc
zvezz0OkUUpAKjXKF4BFwH_lsJ>L28Vy%On!_b&A6PrR$4-uKm4S|l)z8?%eq6tUgk
zxQ(Ev0>t&+tT>$R5E$tqr9l3`650QmEcjpY1m}GG5X=*0dJ@XQzUf)w>BY}$eVIAv
zwcbJ?PTe#=$?(pn`5|wb?#=TeyzZcWoqXPDj^t=s-tilpqxSNt%>Ad)&L%yjjFY^H
zT~|v--lR;i%tJ@s0=`qlG&Pt{<r+Sxr?&zT*sLpag7NW^LF(*|sw-b8AA4k@M~7=L
z+2+W$m~-w!3$$3rq?e#PB>%Xe=|biORSYf^Wx!IX3Pzg?;QmuIj&H!JWGd{TUtpKM
zo`r$PT3Ld<##;uOF~@D>2u#RUSwag{!c)X&z0`v$FHdv>2|S9WW(i)`n|PG+cw{8V
zcB%)qKP%TSwTZI*z8iZb@#oj8_x<#eub)P<a1gjqHIwT+r9`{v-Tp969o322A}W1X
zvyVVeb-A+N5d0uHl?k#@RpK}84Zh{rUeUfkqPE|ZnE9l`E1X1lUFXaKpN1Eyjil-f
z+XK|-V@Y-Q%9;h@DiTIg6?`iTTEx5hkLkW*BD#-Lon!ru3dy?o^?q>=e_WHhnxQ=}
zk{*>R*SuBCYg5eqRHzO|EViMgOGd7mJJUPaAi#9W-ZC3atSkjLZ|;u>DR?L6DM2=#
z(!p_k<9O}u2n>Y9M`7FvMSCNHFfQUmZZ2GH?t8Lp3VMD8N!2lZPTw@c&k7vD6XLt^
zWX*`Decv(SDP2DDP(Q^2ByO&yb-WPVYZ@@fh{?v<G3g)cL(FcTpahECq?zGHYE9D2
z71tntcbxAe4A#|W2?Rc4@PEl)2NbU3F9HTQH`YzwW3acX%PYd`@u#K(f>crm6ps|x
z=ss!VNDMp_#o<fAx=Kk#NQ`!z<D7mbCAWn^n5fA9V~;@qDm*5{LVKbn7LtSIAe4r0
ze0-QuUvXVcME6FVTvf8c74SZ22#f)@ksKtrJMRGS=S?cG=<zR^?mL<p41;0ayqeV0
z|7JD^H>@p7OL0yK#=p!J*>){;)B5{#LN?GY9ckZ-i-l2ZL@^tYus-AatM>^cwp7+P
zepo!X+~1U*q^a<e9lX;mhsSz_pbpl3atrP*Ita`pW*^R&^aPoj>2p)(SiH0_3K>3n
zHq=^{R{Kz(SBYF7N2#3X=n!EDe#QJs{s7=UKSh9EnM!InzkiaGHX-PwpuICW#U`~N
zNWcNZmxeN&E3q7Cs5ctEGSw*~kyM9Qa2F*mB(Lvi3Apm>-Om}v(KMr&-q4lur4^HE
z&3as+-7f@!?VxrsB-Ol5O<w16P@cf|J*AxiXxR6<JaPLB^Ev;rkEvX}gF~+k16A2J
z+7*2|Aa<bke`Ms?`ESwYvloO9J5^A-6fT?TZKr#clRWR&`miqK`t5jSZ~ZT_FIE=%
zQ~%;vYW=!5K_+cr@w9w)vWIQZbt76rOlD3^-|~Fx7NX%tBw~+|EXHz(KCaeM&w_~%
zeH%^mG&^X9fj@uE@L4-Q#(=_m&Mx&=mHP9dLUtKeV>1)d|C5_|Ao!oz?3=YyIgu?6
z3_Fz%Muu+V{(Pcf%<tlaG>39i0i^R-VHeTN0U9v{5Rzu98@<}n^_Iq)j&RmjuJOU^
zBF)5ZixN6Ke!pzfk4*jmBO!g;3jeiN-|jlJP=*_Y!20@R?QF&^ijGVBvD8<h(FkSQ
zVLt9^;%)SogY8YOVp3f`l+H+064IZrl3VEh!736(yBzCHmdAqda)@-G|BHJ5_(L=!
zR*#i=v?V79nR@QV+6+=XFopYV$&P%o$k!1|DV?y3SHOHzrZd|G8Yx(Q@%<xqC0k%o
z*!}6mDqLJOJ>Q`77R@u|Z>+?Y@c12BqxUfzh-W)y6PR>SK$}9fpL`5CAtTU^9ZZ^i
z`<8^Je2P+e?<puvw5ApII5*~J1Pw@M#Bnv^<EwPV)-u0`T2^HvA1my~_+GeH9X;rY
z4}aCL8c(a9iAIhIVA<M>;=!Vmv&t7xiq7Lf{7dRXS<0E4=V5prGnkyO=eBl@V;e4(
z-naV&!J$?Mws|Nfed#y`9zX@<s4IS5@dM|C3Ph=sBIs5Q)1W+m&@MKT{v)|3Ne#!<
zh(uqN+6hK6YdhlHwS&!U60HarIU*?vf2h2hWs40*@9|13wiKBb@?FXsmg7x67~@v(
zGwt{WF+ePZv6LR8aqLQdy8@dp5{Stk66CC<!c>$ZQ{Z#xX$eW<>gKQT&u`gOALb5)
zwIj;~ROeD=y!n&MtE3uVD_VmL9up|m$8Wa!;JPccG$?!hN~C)j>C52Kx1MWi+C><E
zjG#?--x=~5KR(77MFm*Wk65Pn(&C8k<S{-7N1lo;k5M}@35~DiH)<w)foDVU&RPMz
zLd`>&W6!5MJo<Gz;J(Y3-er-~B=vDe@mFTC#^lGu&0;^SV*~3vk6f!NQ${CB0bQ>5
zyZFoUCNG0;4@b7XJf9RD4g6K6R+{E#zx{Tw+xyI9HSWH@BIQ_^SVcv}u<8oi60#A5
zbZXp~J&K4GVDESfHs1_2{B|tI@bdPnJMHfT4$s|frFu9e5W=)n`Mp||VE=*HLncU=
z-PrqUKF&>Es$9ZV-Hqg#WN`#lbBb-vb9ReLBo<E;V}B%f4}w`2ry@j9*<3A4KxHNF
zuF8oLNCGhA=y#6?gw0tiCmnQqvujjk-1am~t9UX)uhel{YY1qb{lWz-VuA21heBbF
z%AlzaQI{zatPhtEP{EYGg}ogiCXi#;u75;FpDU;md&Gnfzw%popmC4XaZMrjA1#TE
z?*U0sIB1R%@^$xz@1c(Y3;LtBx|~U{O^akul_n<}VfjY{XQ}Q$1kKT(C6&F9D~8OO
zi&kG~0C@xaGnoX-ZzEdkct%3A{Xd_MXVXpTL1dsxCKJ5HcIXxzRD_?zQZ0*R;NOI1
zS>gli2aiL;JRk_j3RdTs)7a(e<@R{;qx2iHezSYt|4~(J{!3MAotQTQsh~WAV7<w9
zp9;?@e%H!6Cd(yoYG{J{p6cq?DfU#5nWRMMe%(Jazs7R}VaX*S47W5CVk-%Mx?qfd
z5iN(QBHPG&3?sy_rg<`F(_BJEw~v^-+JnX0kK>%Uh2&F$B^wN~o4~dL)&JcEa6LJ3
z-=(u3NiS@qAV@MCJ>9h&$8OGMM>KxCJ^ehM<*PQkNg+`tuHL*)m+uorgU59phF335
zlj=7P^-@0+J<P4-FD&p)VU*<U&RSlczr}TRjrtOlh2!#<Y;Ky`4!T!+j`^u1f><tJ
zeEfWM1L={g!s&TGHn?SNi!DD%;?_`woq;apF!e{dcBSL=un=}BHP6SNK#_ROw7{di
zi-LuMp2@(AH`J@aN9)>Y92Z5ib_qB^a#%1;Ob(}tn&vw^5^=;C+%B+dc-J!c$LvXc
z?Ie(G`^$%cjgY{<!OV0_EjNby@Y@3OHg$ABZud8Km1jTfRCu|<l)!p#+aKz&h(Pk!
z5qjOSw|}sBl-mbCP2GOw!Gy>8_U3J~>Fo$iL$cIgYJCEbYn_(A+|P<wssKGT#RqE5
zsjxzEVTeVmkp%+>!V-Xc)-5i{G+A(h?jFOU89h!2%rc$&CuQb|<SAD;Uy9nZC^10f
z$1o}So#UnY+r3p|7JByslz=Lr$8V>O#MQH35I{K}R`Y<9lhT%+Tb?U)=*WEEk6j06
za;^XUDqwy+A&%D!sd%*5)|oIJT<0@(RmB#XoKx0)2*Q0I4Wr5(gIh1DS2(WVM)MT$
zZS4e7VOf3TRujuCWjxClfMf?UOInaquS>dpW#HF7yWhWe+*oz`yTJU$Zl^IYM9~{W
zD~NXx6?my#?^h4vr_%q_cjhh$3MpMte#7R1Tm8T-ckrvN{IxE1X#_2}cFDwLc`(3@
zzX$$|*%n{FQ)uDF|7jn?`208N(a;)i%Q7?WZ8d|(3vrBQ?7oBUAkTv#=xAvs1OhQE
zy9za9>S7w8yv(qBYgn==)^W+U5pq|vscWG=ImY$7%Kd{%6N!#F;$h0S1nzJe0j3Xp
z=v{zJVc=P6>T?)289b#|n%bK5RgfD;)|>cI3^h+3&8knPvG!BDhz2<$V97o_@cd;m
zcKI8Y>G)zBo4S>)pVhRJLE(x_hm^T@Aa<Rg%<^P5>JF|3HS2d@&Xkt#=3Wci7y7<H
zqnC^5OBrK7+f%4)gl&)eD~^(rGsb0C?(g$m=S!79D*F?P=!za9FhA|`tQ;6?NTG8%
zRR!efY-kt9TNbMGFL`W>q`IXZ=U_Ruv8`Y&p+v2&Wb7VewY|1n5hTm`mqK1K0oK!O
zdi`UgUi3Z0I^K6OsLreNZD31~hw4fUam}^$??C(DSLjXhZwZ*>`1cOAX(j}2UFD;m
zma4f;x6_#2z4TYfv&q-IlR91pOtr-Cssc(IZ|~$aY772*LG!=U*ZWVv%tK_9Ym26f
zN^w-Yp1QT&hYUQxV}m!PK&h-sgi{%rA9JTj(RP;7NV(d{avvM%E{Nf*Wy8Q4tJ0DE
z$@S69kzs9v9`?U1OuV{k^7ocDasu;d?BDNs_rd4dvWx|G7<W8Ufl`G-UXbm@(q*n|
z3Y2G`z?y@jN|v=pO_IddIoYp`muX6=mC}-cfi`8H?pm%WF<lVJFd`6JfACv1)hZOI
zgxeS$kLNS>a39-|wJ0XW^EVaMUmbduiA*<s)ywu$qw`x9{pkMFix;(O#3T=~NSG9U
z;wk`96nhu2buDK3l19;owKw}wDLzxM#EQe1u}F5F{enMMvae+EpXckxsz*r{9pQ|%
z9fAylyqRW2xRVl48);m<#Dg&hBBytGSr<fQ;IDT&bDDQgnmKPXSt36LQ|SNEcc{aW
zewl4?I9`L4bJpxf^?@6`PxN~{`^IZIk{h76w=iyMfTFps&lGR5NrOM2@ke5a_9s|P
z{SgRM+LjY?*M5(OB1Dnsmgns=(~ATSKXwEo+F&nFN%|o=T#DhKfyIee#yFWA1qP<x
zX`ifM(N|Fdqb@)Gn72V87r#=}`^1Kh@IdhxB8sXoV-1{K-X3Lox47&4IvYgz$b1#<
zJvI5p&~v>Z_!ws^9I0#XmAXDeh9!V%i&j_G{)~;g)d2qM4wVMN0j<g@l`PO(=s~U%
z%}cW!d9&5LM##D+vJ76{K|w>$fG)eu-tzw2kST^oOHZrGNblC2kct<No<ks8CNaBv
z3^Q)mm++Yi9h}(rNS-Zj&}0ylGSEu^s7o56n6PCDQiMRk;U~{$*z)r97nUae{z$2b
zkn^zYB2Ks$qC#$|HAIj*ncr;&AL_`9x=VWXKle8=y!URckY-c8xQMJdA%!<e->*DE
zHC8;|lmPq(?as)m?rx4FgYqw87*))+a#B(N>=uc&j2L12;Rcx=-q#*~crx}%7CyTb
zv_Ea(!s_8@kUS878P$iyGw}qOG<crr5fE=8b;sDP!7PXKCTvIZ*Nrf|92iMd5+ro*
zn+qhJ;5l~g?aT>1C11AsrRtZQp1OQgZbhRw^n?i;wOERX&!_WHrHXRKP@(j+=!2WK
zdaOYdZ50k*YVkI8?IoDmdY`f<%<tE6W&c7+{rJq)N3sZnLe#J6Fa=ky;gXM!O3_K(
z?@lJFccYP!>>^POW#Yv$#?;n@-08dHeCa>#be9jTU;0Dpif25ou#<!_=QFJAsB)*E
z>7zB@C5_QGIHn1S>A7^-%5hgK6;d0snNO1E+%BtK&lGJ`Zy)fRi4w}bChhe!#9&mA
zxKn<;XT&BILm1e@uSxw5#$Qm$rI`C`J<ZZC=~HwcAxyvAF*fk!F1_ELEr=&;_r=EZ
z_Gy!TaIHXRA^VI1pFe|aK<D=R-LA;Bko~s*JGgvz_Ma6?!U+d;*`3UU@sM-*>#s~2
z`K5}{Ue%Ch0uSg}_b8P#WkB+a4*9HXJobZ^;Ft;Dji&PjX1Xx*)MVb3<@Eih^~J@1
za(iz=;}+b}Tj7j;D%cSp@GE^)Q27_OowoRus!ng3HkOUBt(lpXgZ^4^sSX+)|GAM(
zOk@j`%3h)I_|&%5900CkC;OVX@A8NrOYuDpOyx_DU;Hx#@4nUWGe-UrB6NN*_yb#?
zjRpfr{fwcKo+Kmi{6Me}Lk=}otbE(*I;(EX3|78K`zOW7vD9vOHA;@<jUFzA_gh))
z8+jR=&_uNVx~mBml*+MSh4r0%B+X_WzG3>wAYme?jHxcb>?7{=4`zSn{q%0ee>~4d
zkDG&&Eq7#jKOv|$VC?7?wH5=R#%?sM7I|FIDP!=DkqQtnMOIQElR9Q<peojb_va<(
z#;eY`cn$GvBY)l04y-AW=aku@cfVMuNDr~>0_Q>$@C97}O7_*z!1wCO=<{3J<kA-p
z)(2hsuq!Fl0X@{Tsli{8(ZOc147ySVxN<py@@gZ<S3CA*>B;H4d=OMIqs-XXA28V>
zdBl&x8l+&z9ClWm=dtU11X|SR{_BY#)6tj%hYt8wwfR~dh?z>4TBms6I-{A!unrLW
zsJ@lJ6Z)A2I$Du8@nwVwt4Yg5)>w{_A=>DzOeS4E7!_u#vv#plY43Rx?zaUBwz4#6
z=s5xzo!Z6%-op3|ASmmLGzK)At&2y!K<#^aEoq4Y4R8j|VuF9Rk&xL6A;M$;R6exd
zo>ab>aMP=7#w3g{Y5Ut5lMeXCMW&gk#QhXfYPS~7*o&9nds<nOE;}t3Q&G)Dgo`43
z(wRUX{ERoHrgG!j;9JR<xJ?04Bf5TTIgZ1(bLCnG;D~7b?Q>2vtD|SQtlcMjTkHmD
z59xON2xdYVU+GvLved^H$c#M5CkQ)ZwEsbb-0h%ZCJkt8?9jo+d?y^{q5QIYeJ&QN
z$RMO{B-_HNaX@45M<25<I;@9D6bhI#R5Vh0)Ad9Yu*UDQ{ZV;i9c=8Ml7;#j<PL0J
zQf(a+%3~P-4;8c34Q{Mbik`d-V<arX%QUfLQgFZ75k7Yacy2c@RQdrRHe%T-5}KDz
zuv-it`+?-^^`n>Xi?Dabm(*jSXVb<<=8Syh#wyV>caAF>pEpaMdw<RQEAWd3))ye-
zpsS}(NvL{w;de#9-_uB87y<NUJ+YDCLPABdeeJereS0tUBzI|t&?n!W5Bu8yPa+=v
zdDzuIA_3d)IF-OzCBZXs$MP2X=U!TV^Q(F|RIKe?HZ`X6HnZzL=7o==kwnKofmT^R
z(QnnD%EhtP+J>>=oI|J?G7sK(sfRbXa(lVjb+)H<<%BkEjr)J_o4d~ck_y*v(q(6Z
z{*=7#An60W78jg1At6%K-Lp=U#jBtlDg<@?(~+xT_T}Lcj=#x_pJ(xU0)R)oe^3T^
zkslh5kyO3wd-5-iXRdx>Y$68GGBw5~VkjxpNjRhh^SmSNVf;y`;%d7m*<tM^iXs+@
zOG--L>+FEIdx&pentf8gG6jhHuu^{-dA%$pB9X|HW9`5=LzsVM_2DO*UrLCq1GR!Q
zX_Q7Hqw&<>h0Q@dWOQu$Gkm0rfgxlNrGJgF#2{9wm&?i6yCJe{4n+KK%fkh&{_Sml
znSY@IT}Wgk$|m*LFnhKa*XO?w@E6uM<%0XB8FtX)2iY<Y@Kt@%WY`{=>s5U9HQT09
zsoX@{m>n0dQA%{Ir07@%)>qBb$uPyP#EFPh6OgRsFs+u9NA>94Rh$<)#1jk2N0Z&3
z3**L5%Y3(ONq4CAEn|1)>Ls#nzQ*p<*k0??=1ZaMn-3o@zreie5L7WdbU$dRKQlJl
za2USjTKMYG(az~wb|hVxAUXd9Q@aRdNu>=I21^33;pW1Rq<;Ekpk3#+1oX7)l%)U)
z@^@BsxA&L&9!3nZx9AyV3dE}-&J%hDF6eyL&5nzQp&yXbhQ+h$z6mkyf?~@qmT9`U
zMl$g0F1Ita5c+EhbZh&;=lSCflv_pdnXn&n8f`qT40pc_y|jxC7^iztv7iZhf&tJO
zLFQ*3?o*PCc$mY;Pq*!@Y6K<$wLLEX;ph@kH_`RLNsE33)Be{ylw3*mHi4R6l$55k
ziM16H?wrKNlm1|N73|WnU)-bY`inbP0T+CotIYPu63D_%eADY3kQf!;?%+m8lW|1H
zcX5Wc9)Nnhw2V@`lQd`F--}SB<qg8HT%-K8ZBinalcDt;HOmH_d5HhdyZqD<a#5CV
ziv4D$P0Fg^8-WMMtHG=XV7_D*Q-;Oe;V}s&@t@4s4aP;E(!6kpJ6ozxA#;MDr~Ha=
z^p2Fb?ug+;OkR*LH4h2l^jcNkN0E}k@(n={?#@cN`Imzz=1Ag|Kmm&14<)2AxkSMc
zG_og%o2sA6R|@r5f|I6FdROdY{^szE^uDmg-vC98vbb<`&fmotI~iAa>Sju(PB8s(
zLo3^5?2$2vb)7@!nPNq;?%SpR#3HW3FSQm%Ez4f`YtOTym9g3ZRS^8ERHWc%yxdC%
z#v=EC`@08R;kba#PJBD<3!az>R;Lsw^lzKwkyAcLgTHQe2Od5=<arTgI<5#LTc>JL
z_*L8aED?ycG#4%?9Mv<*<^BL*xDM9+#Fj*HC!vo`e0S@^wIj7cj`*hz#-Cc3*H0F3
zg6qESZ2zL}r|+51$o7J5^XEa%|1*YUIbvwMHJS`YKU4bO52zauJP2lSaq%QcwhwXc
zBK_oihs}{<Jq%RFR3(>^-b+#oE`sV)ieVb=Nvd$KKO=lf^to+-9=!h|;r5?%>%{+0
z4gO;q|1UK_XFb2bOw&nvZ)sAQ37Z5GyJoMG_C^`E_Jt5Z=5^9r@xB-aP~edt`=#fP
z$6Z}Lfk|t;HAS+K)4V-we)f?zzEp~n%L+?zd5zrj@2s|PhGbfW3yMw%vT=s+a65K8
z*_57ge95W5o??4TWUP5mWRS{=7-MM80D-m@#pXyWLU%L82J~qHY>`P{o#gSET7+3I
z%{s7CIkWTRBK=jDjESvvKcMXnIh6Jm9Rf;r&uQpz<RP%W1~=PTK@cGVev$etQ|to<
zcfjH%NFfED;1c6A0SC2O#tsC&=G`N!$3d&(&Sgf*=1dC(3rqT^?^XAH$Q`}Mt+ZAH
zPU1@gvg^NbICS7K)&}7mSNZ+yaiYInB`+p713Ft{=(IL?*55h1ckyDm0sy9&4wJ^V
z0P9)iVR9AF=YF1*-4!_5dA>MBKgsZ-J966wfXeP1xnlx(2nIhSZxPeVNHGH_5U<+<
zy@2KszB;LOxjLv|YA8;pb)D1UR~>09<VQ4fBrl9+{X35c^GRBbjl)T6?FQ&67CBCm
zMqY}z2c=mviLaBr(Ju<c5RMNd-^9B;(z^PeNc@y#`}~KwH=0HCR_HMA#%p{0*K|z{
zgeh%B@W@g^NBa71rDmN7=Ue1SHa%YAK|#4^=}<uW-DN-osH%M~zJ|!?8K6^>p7v3?
zvV&1#a)G1{!+cFTsU9xf99-+z5q?lWuyJ!7adY(cIqo~?*^Ov;9sSGZ?G7|aBYq?E
zHeL3YdjFGMt)h>DC!dqj7WaDFaumqmHT8RG_dWk<Rok>l2MO;U{XujbjZfp)*QM~6
zYS~wv*FG1}d@+L|Kg9S*cv349ZrCFSUcd)5lLo4>ue=55;nV!u``>Qjnk3imCKT{q
z(;}Z=u0^_kz;8n`6OjA`1-msU0QhPU{2<EU^M1ELr9kt^6=z>B@+=NUZIT_xb;diX
zkB!!H(q?dd4fbVf7|&gLWS-P*<`GslPEwDbfJ#!WaZ|k{P+@6RYX+78BHS5jpH`gP
zcJ2U)rW>Ws@-e9t+scrN*{x~8^FTz>1CS}^0{1PZQ0T8I{aoTBgJgkl)cgbg+W0ZZ
z^eJDH(f-{gwQaHbR4kO49(M@zgF{@SQIk|2a=S_`@#5PqLKG7xd`07;;_UWUuL!Eb
zgP9{3y9R{o-hszZF2P={jT|P5nOyx4mNno?<Y%fwHuMe_Ga)g{*eGUljc5KP=7jp1
zq5l5O#%IM5*18%|ZrA&pA8xz~&Mv+++whEQ#TD%AQeA+F39`dR?BPxrr8yhS?MroA
zcbat-{Niw>w?_DXTA^a*l6ZlKe&vt+Waw|xv~In-eg0^Z;krXC&k;=g>Nmlsl0$+{
zDVKtNy1wdyA*Z1CL;qrN_4azDvxDn@R>f(sDonoco{3xw-_K(FuZPB9+^}XqpC=~w
zRT#T<uAj*&E9{YJUWri`wG3xW_oWW3&q8!`!XZ-Cj1^VzIP&EuzY&?i=O`&J?!P|z
za$T>Cq&P5t!M9s`oJzhjcNFjA7;DTq;60eauwM4dKN-g~IG3F^g>%(oP#z)dlofR+
z|BVTE_WL*YhPLT*AMH*{r}g~0ulU1uzp&6WKbVQ#N>MFO2~j;ELkI`ZVO&nV3a(z{
zt>a}xZp|6vvjCdh+U1k$^Kux(>2WN-Y$pwmT<A}^$UBwmv-C$)>pMlk9^o<6)y%_1
z0z#>-It_%Q=XW#r+F(N@Rp3LG_2XVv`fC5*`iI9^y7w6DDp=|_E;^S>H%rT4^}DD&
zI=N3DSLe+grYvM21GB_U{z}JyO2_)<iRlDtBtcN2ea@fB_sup4+p4Zdi!95He)9sA
z2HvaJ5$?w0ORx!DW8wv)Za*3kB*)`F1+sCJp5@l7-K>?EPGV<rM1rM&;Pp|>bZnz6
zNu>jam!`!vzyBau%*#_u5M14BbBy!?AB0%;50YEp45`0OGSvBv5B@*)-nuJ}uiF~M
z-KBxxjR$vkcXxMp2?QsNySqb>;1C>wySq!UV8Pwq{ypbC<2=tj_Y>T4|7jalHL7;)
zUN!ezbIrxndrQL0+QaCl_jjX}WI=U;pwo(Eq>3aViY*flF6+}c$Y`ODsM-1soh<wr
zflEP7>(HC{@>efBos<F#1UcpW+NV$Z^KRfO(&`ybd<;=uEiD!5H)qG$jnLU|%;T@z
zO9X@nT4>vQ<L+RkqyC0%S^gJ@pA13D5RBQdO(_dY?ae>5W2<3`52o%;n~lfn2T9XI
zzHF}>Pf7DU?DaZ>`v;61bS9W1GuTeO`Tb5GQ*{nx;!RDxySfySue9h)SO|d(_EWF%
zQ@xZu);9B%5u%atNbFpiosGL#K&1OV(2AdoK+zT_fv*t4y6B^T_&_mi!BIObo4IV<
zwX_@Xbgu^O1{muU;AUS`n3R1FZ2JgdwK6H2@mk?rK;{y+RrQj%|1k&I3?b$*Uh+o4
ziu=AyioU|v%_wAa%K_%&6|z5%X1%>u>p|Pe{QZJZzDh1NAu6l7Xy}N0W2N#YHeNJT
z=J@uA{({hxDm`=?^aKeF!q@${WXF?zH~)k#64d$0H5x92B>nN0{=V=-Rb)KlMFh*}
zmydNDK^Gp<0w35qFana%6HbY|5SB%<3<4rnA_!oyt;O`&U3mE(T=u}EbSIH_jy-t(
zdXSw2Azn1u7~G`(7Oo1la{cj24WTzjbq65ir!Z-PXhO(<vXRA_F$)R}R*XSpd@d$8
zb%l}e0Sij{IhCOp&4zlu^Lk!of&Z1!W-^1kSToBDq!?%y`HE(&jDR0@I{E!YPn$=>
zs5Zx>5H^S>#1M?=L&0h1P<rnN7fXT2Gg#oTbbAw(ed6sBJIpCQvy48;pPao_R9f^?
z_y0}*R`-wm&4SDY>Rh?TXY%*9v=+ff46;yAT4$%gjoq$<pUfx$c=@H5fUzgG3eDJ^
z^T+>@?aeynI2bT(vm25PD>Y8MNdR8O2-(#>C<KC&zNdqimMvB}V6E=B)cNqwZe<T)
zbQ@a%A2pG2X|HgB0EP|An2QkMDcxq7$=`2}*0Ynh!$w#g&wgwLp&eJUV*Qn^j8e!!
z!xkdQFJ`2aCXiAr(?bf`Aig=%D><Whx+j?Ro!8}#)H)|To&{@J_vXe>6ioS&svX5k
zZX0Me7RspNU$grDLl4`b9vO>@mcH?6DHAB^S-Di(@z&l=e;#ZMjAFf>xCC_%hA)b`
zYq3=7HALi%B-<e1S@q&blFLTO!<nTT1-!tCGE?TnYx$JNT7TbcA8U)qLXo0h?>nM5
zYQdf&6Q-;&9|2g_j{n$mq8`^tv|Macf%%hCMG{7{_z5OP*PG=U+!?UxKulUArqnnq
zB-&frZM?D^%c0aAbx;~3K)R&q{`K^iE*5DIbbS0jg`lZROo<YjFY_^gH!okQ@r<EQ
z{-zFDoIe(6vKbIJY4UNOgWC6A*z~UiZaaiCO7O{$$zmWG?X@B?8wX0QERkpUOtz?l
za9ks0q2%pQ24p9vxQ=VT&zHp%1{-52pp?vzM6wRG_CUjOjsDq5i0xh2Ad3AUGH*bM
zwoOx2G4V9G23rQ(>_R1z^&i0qXFR4@+$8jR<h9*g?n0kJ4-p@6&LTIO`~6{W_AV2;
z_O~*g+rBN0gC2n~;`iOkU`PEVPdvHokay{S+xKc}48dOKl3+?-mE|0h7V6x6ECmB#
zHC7IMDup!TD!HY}h~`VrMreAS4g}0->_J@eI6<g+Zrape4RY<>{R}Cxzux?HxDmfC
z<<zhXm+f%Jd_>dqArw@D)xKe~CkT-kj~XW!KdoEVkmlv58cbp%3u&Y3_HGSDxx}>S
zN=7d^5h}uz@TP?QxZA<ZqgQF90=>ZW`;$P3EopIs*#PBg%SFgK{b-rE+>9|+K`=LR
z9(f&+NzV?Rn?mG6?Z6boWq}gAH$q)2&HaSosH|$)PqU&<x@|OzNtv04TeP<LZocDp
zV@Sqtas$%wpQAHRklDkQk&h1BN;wDrT4x_nQM9oo-Len`JgmC<1aN%1fkSM>2TKx#
zS7&C1C3=E_SYz9I#wEB-5o5-tFnmMsZBR292CyrBWIGCK-5WwOv2)Tu=noB(4hovH
zQpb2ja3guXxtz8@GP1`SiVZ?bjY9{Mu^R>pIHrv{AHB5z#`lG$ODlz^5KW#ujtIVJ
z%{j$#^TiZOR8{!Sl=VgjKHc#>Y{Mpzjlvd#A#vOS@*f*FRImw1I@--u8OjMMJA0qa
zlM7Ta6;u;MCnk2#=k=EN7=#Jvk;=3t!fUsYAVx!#D!#bPVWb$qkZQ~RpRhaMI1a+A
z{TG<j<jl0QX6JIRkJs^^OoUR=&lX3)d=FRJW=#>8-%1R7Y-X>Lt?cxd3a8m&Kb4i0
zt$UKZ=lBi1iuVR(pR{|N5zn$R8vHNrr4K<unhZt{Cw50HLH*2R*CH$2*qhk@8kG!c
zCg}%11MgGbhz+E}l}NOqjbzOXb=vXxQRZE1Yuvp}<ossF86HGgE+yDaqflDjW5MX0
zlNF4Cw4*nFI1R3VeLl2fx8lwD<<0E=3ZUDFQvYEQL5=}DWw0z;(So~LyOh$Y4lkAI
z0iOP@&Y6~`F$AcBt$I}fqt*()!Cva9HWf*>0+H6QS|_P9Loh_OxUWnma0xu|Mlu%b
z^#;>g+?$K>0=TET21VLS_Fzv%W!{-0HoEli(b@{bQ4*Wg&ee%++an69;drpeFIf9Z
z?P`UkxS+Wn`!QGHoYyANP4Q&>a=*^zx}1AvhCTaBJC3N)JF#x)8w|7u*6S_jfv|hz
zLN<?($~Ar%ESt(;bS2|4x=mP<_FJ#`-v$ye@?6uXOFd71ofVM}wWtI*uR6SaJsc;l
zIbN_`UiOi`%HL%hH2GXp0S;5OKUsZ=<J0zNV6lkVwc!2j?le=!VVpFo5*~sAmJ`MK
zTinw-{2uFqslAKHh{GuHN8idy`e~)#CdJlS{FOCGd-QaR?tA}U1||M99Ce52W;YJ7
z47+a%Z-)^Xw_8`XGA#_IeUqz;xAhN_7?Vw@vH^R@)@we)l%Pb1!g`CIMWRm!$xR3g
z*d{ce+B$NR0tB?B;B*2PAP75MO>j)H(Ca+a$zZyv&<Doo$}8QD8*dx}C@IVE<zX|M
zy5rX20Ic^qLofQ`kxvkNRQ5{s^+7gxduU*!wR|aG_y+dJYvcdw!C{z;Vq)zg5b=pS
ze*9MUSz2pY4pMt0uvej1J?skp2dWDm7qkQU8)8yv6y(8x-Rn+Qqprw!&PgcL`qU`i
zQ90$|{io!Ji_HFoXxDpxriJ)7HzoL<zF2<EC|^&{pPQ1?KGY0=6^1gZQEySK@ErX<
zm&4c4i(_Y!?LjfOY$_qA?><c?F2Z2@?UD?v*lN-1Ei<cnn)jH%`gRyviwU^~P(YSN
zwwl^+&*Yi%({oa{+<Sa4Pr0*Y+3zO}ouYdDUpH&Fj1ISwIad8r^HUSgJ`K9@8h+`2
z<({$=Ogdbk)-p#afV~%!6-{H+1AFUCI~r)(niu@AR{2Lu2u2qRu~0iY815_7GOE>S
zIJoZ&GB&%A?)AHAFMCHcv6ypTw3BR32yZlI?~^4ng)Giu?c$(z7<5^Nt5tFU8i016
zgGSh%9+|nP?`AxdF8di7Y&5WB<k3r81o$oebBNE0Ho{#iIZIQvP%qyIKpyx#3){bb
z&1vp@W9Q!tqJxMhrY2sh3iuYgYsn8?i@PFO0hMlwMxLql;~9t?F0n@Pv`d6(;;L~+
z$eG#P(+GN3i1KcF-Gl$J;k;#7cnWU6Fx9DaA2sV{aP8#14v`QeA6~I3x{2twGYeKu
zy+7wCU0u(amhB@p-UNXc{akHeCHJT06!H97Ba37F-%pM68ymkjL{1!SL+{9qg!AQ%
z{`N$|JsppLnaU9zhJ{$Jl1LwR8e#Pvh15mDHlF=KsQ9#bts_Fm&7s%AEzy+R`$%cU
z82yrM+Cxfr>}`d(dI<Vzv)`P@4<|)B85=MUhLfUenE%@<_v1lwhX%dL1$aT)J2<sB
zP8WDR@$k;C)D0<OlIzlfJ!;P6-S&8ZRu)OZ=x~Z>oELbX|KbVZK#O<GlIJrW`U!6t
zTqxzDFV^FXFZ4?dCL$1_sZ;Hbkq^a?H~5p=V*n1J5TT9Fu<HZAn`Erq0O^y?oT(hB
z56!g3)$ewA#@SJ3-}_0SK=S2S8~8SmFRWgqIcGXKIc?dLP)#0L4PEvmouE^_)v7xB
z<@o0h29YTJ>v*zRly4APp~fU1STKmBYsy7~Snp~4SdT?}?8AQ|$0Z8>Nkj8cj$cMb
zssl;6Mi$L!HcFx4Q(T;yK`dWhv1=oLp<|DKSA4Bp>4wyUAt{ZjZj^}uK$HwD?9>=`
zx|y8pP_YxtTtB8CR|x8u#VB_@d4pS2T(ehjw9%Ns;)m3CzIo~9S8O>HJPX^K8vYhb
zB5C8csDsdxOXVp$%Uj|a4PR`F_I)4_k>H2y!<eF>=6cuQadW&C&Fob8tW}fJ_9_(&
z)draN@;N7q9c&+M$0k^<4SBk8WH2K_PjHuv^!hFO6a+8eCMO^B6Px}}CPra1k@Zvs
z%xnRo?`@7#E%3}=HfjvDv~O%?R~p(e!)0Cw+(*I~u@F+2Ni!VkbuN)11wsaAnGorR
z^xuNN0npc=S=77T+P)n_w~EKmZVyl9MHZTgq^yRMGB2UuXPT8x6DsGqc)r}r7s-D)
z<mN>F6UhpJsS@Uz3>FyU(iQ&FF~{nG4a^}t8st!ofpAGm)j%Q1G`q!V2j`X+tw+|-
zRyb%eQr$XlG4CJ>U=$!(vY4511$3Taa9EY}PEJsQfJ!maIs!zb5RS*;G-)zKyl^T;
zIS@r`qt%+uU*{p(P=Zwk&TsLIvs$ovP)#D?r5U-_(lQ|U4lDi=ejJlJ61n)b122e!
z((;<QAXP@;Fa*`up7f74u~YrJxVxDlb!AQfk(`<pUu&>jTAeoe_=opf5wl^bZ<u0R
z{;4SETFmVDW2)+o*8jBgMe|}rg{zGTYa3SXuz!y17%dwTqW2=3saR1)08$BD=gl9K
zsg}9WNW4W?$IFCd(h9Z|lJ)SsmBG>`VTLv<>tq~)R0x$!0zq-ufbEW%(bJg+{CMfM
z<4CAxq+v8Ba8w}l;RveQprjKL`;0A91*FTTb87+~>J`pD?)JnnkL!l}<qOsPsc6dB
z=2Bc}p3n%bb~m8H%TSDnWo+IKPT7abgE>QBG$DS&x;vTG-VnQ+=u}0E3UVPztW<hc
zVFfixoO7=Vp7Thl=|CqTuU}2-KKv{hjmxv;;DXDeX6NKCHs^IE6>1*%(%vKS8+j$<
zcKM%FR;s*MY{u@mn}!K2S)VK<=~WWi&D;LC^>^2*3ffUu{pBBdQTlfqe#<mFCugPE
zPTN=XL-!(pxX^R&{iCS{|GMcf+f~hbn<IZRMyt+w^)8r<Y&{vz1(_h*cnAqatNb@3
z^H}vbz9;7Q7sI;H%Lbw5@oW=S;^}<z$PmzLLSu7^ar}L;$|}>)i}S4FxzhBt8|(+H
z$w1Qj2BnLZ+iq8twRCR;t?0mD_wt&<!D@=+kiUKa4Ws{E+3{QC&5@zCLG+5`xuO3(
z@3w7|P6s4Urn&xrn}twXO3F&C=LGKxLlA>CtOe=J#e=6T#<M%b{rK;r&@RM}|5ar5
zx(~cehSO`^*%=_vLfS#3xzAy7PnsNL#_J_nF)<86-zUDb4EE`3{PhFFhnXp;jW*1u
z50M*vsn~BUmMMEVMX@bqx0FJS(ptQ&(|T8o5CZ-fp?3BynXpuS!-OdWTpS<u?U`|q
z6%S0z>xbM@e5@snX7LY!K_^>F=R(q^{I>u40a8;db3rL$d|JG4=GOwrIi-W^*dWbd
z%DKp25KS_CRZqMk$X8%rQT#y4FrVmS9@%p<s=mn^$>H-ULPj$?B348y0nW=yGSk!e
zJ^XJJs~qxJ3dh%neJY*WvdUN&UG4h$o~DdC)XJ<tEr(MAMhHgApR%vRrw;{)1(J8Q
z+kq;P4|ouI8fjEZciFvttTFcj3l}M@dN554TTu@aaO`SF9TcRpSdPofU}gp;=<C&l
zb9`4b7R{^b(^+l#_q9pGF1eU^z2qjK>z4bEMF`Fwpa?V8k>2J~3!kdAP)3~&!<{~S
zHf%F=l4kT^d61%7t4TLR5@bD!o?;q`)HuYb_oxA;!J0k#PkTxvo6HWkQ2n|q#dHb?
z<RzGH5gEf?CkTUxWoNo~68{!{6osHSmxB>iZj>1;TZ#kfl0RScWPPw4J4sbO;?dA#
z?3j#6s1KZ0eJ?Vl$K44S2ehbkLR5~>M%-~VCf(g(;WQCf#4DZJhh%>AbP}4;BwR`h
zMOy&L^k@4*Yh=!nUVUyR!Yvh!$k9_}h#pX)Qk1uKHzS!H>-27D@F_<8Assj>stu!P
zAOJD=K^Q%RE8uw|Y3A)+aBvtJJrpF0g|;I$*iH%Q4K^;H+zx%}^Qtp~STh;N_m+;f
zZA!~@l{-7>v>b#NZh?OZopk#;tYlBGKpOA`bz`p8vWjKsweRur&Eb6R%fK%Y5nM1v
ze$HZB`TI{J1-<T>$5W4Qn}UbvH!w3rt%`W+5Cv5-WV<KEZ+$_U&}(R<xClQlq`}--
zN(5}-*O~<#U5Y@`q|OQDDbcoX0Y_80PVxIwPA4k^#J7IaPqgN-8UE?W{ju#X!@@C1
z$4PTgCMLB3+qcB|v)pxz0PV>{1NB(2*P8`H2B01e90XTsmIj=t+J<Fz?0q}sWcF)y
zG|Xf_Q0b38U9!?%mfm^Q`jqyhwUW`*-}6Jdpc7J2FpjR^>n~|V=Bk>IuzCA5*=H-m
zro6tCL8%YgkhLkl5tOj7ui%Mb%6`@1Sci$)w<=pB8rX)Bf`4b2<nCCZZK&?bxt->j
zDcTmgkN+5v3TC7Sr0TsvNM3S9;uWx!JfE%j_MY<^Fb^p-CcSPR+J@w#-d)${v{I}^
z`w=!6KjuSNZK}6r!*V$U7O!<$*`S*Yw~aGW`+AAiuEyO%GoYZwM-kvTz{@~K!&)Ia
z<Z;T6qinQN6@c&|77S8L09VpN3`!0}&(@xfZGLJ|hSkqafhGf+0Ga#c&7%#cpYUF2
zF;qQB7IXOMkNqmn_^Ndwk2&ycj*q{0UXn9d<_+x}zDN%Yw2`IFOO6f<+88z;chU16
zzMF#EPZ8FI@_eTr&eopc3YIv4TNrpFz2AE2ilUc)*tUSg5|cr|kR@}Q8ZyaoZ`t&*
z%JU6un||y6+r>UFIM4oezHhsJ?%ESH%4-b+L<&|BDS+|Yo~~SzmWG!FcMw~t3Yp_A
zvs>P>|K+QltA1Mjg%SBdN{IAKicTVsz<q&U9@i-*pvHhhkO8X@CfGw2L@F_Uv`xZ}
zi@h>4ZBauxCNahP7g6~2vi~wm40j4g)ksgz?g;Y1J2f>`s>wykXyGpDg<~lG22|Qn
zKS8YjafZMH^*^ngFZU#NLMior$D24kx0zDGF~h)fS7VUTN<S8&uJ&1{$%KhyNdQ56
z713_2M$8rn!tJ`Pt7McgV|Y#(y5?%?Tk70*Qc&UT6vzGOUMC0Af|O`J#*ajfEpV^O
zzc3)gY4o094S%8kHUKv|-@X*YgBMb2ngBf2f38Ymhw2Lvzy_tQ-AU8w;j~ATZSgL*
z4za#>wR`Q7&nAJ6KAFX>n+cwG-6~=-+DkS~WaVNp1~$?h<jXXb*T{U`*^yLaRV);-
z8W|N<R^BS+>q(5=BgKV|v%o=4pz+39<<>AQA6X10#-J)E9l=(o!iBILb?oh%e_-H=
zH_u*>-(D7NiqIev9;T&oq8Y%IK&}iU!DZYbM>)?bz4)9I6hA^}DcjwPApM$g75d#S
zlLp=~w9h!v9ELFN^+8*TBoU(PE-y#<6FdN`nO)1F5ksX?_)DSS3G-qrmL8HDQUe5y
z(A2ZN^^~62&AK*Ic7Imc#FG+*0Z|lI3ikeBJ#VrW8k(OKe`^BBzn8vN$ITmSQar9M
zx6NHQE)WMkfo84nT>xnx7`#Ak0GN7<;<S%pq1c|<Uu_gu?|P<NcPM16rYKPo74?w*
zj4~2?<4dmRMOE0pAl*p!6CI}qS}b@QYDHbZu~H_o&Gb%LnB!nFIRa1#<*OCJi=%+d
zF+GwGtt;)#nAB@L-`Q5+qOu=bI6a{#mOD!VgKFB>lf_!_Q8k*G_$j-al497nPLXI6
z-E*V5@-s3toaum%4Djm+U}z&Et<Al^DaD5T0nz=}^^Per;up;uf@~?gIOKUeCM?w^
zExV*;$G@B1UpWm{zVt*O$*>M(b(<00U;!=WKVC_j#Tqn>%TYQ$o+B|SiktGUWDZ5$
zj7;6Fz%Fj0wFzS7(`342J}Mfl#`qsu(dm7Nx!l8C&RlbNXJop+4WixC`BeU<)aqD0
zeNhp_<og>F0l;DXlHv>xj5BwfSomwg7#2K~;vTjg-N+f3{&|8p3J?JZz*Y<M8F^>!
zegyj6OmY{>?des!L(yECU^1G75QALU8lZA?GSeB3^=04#=9@>h()Uw77UAvt+fz@L
zLyim8e1y^DGOsIei1TTW`P@yLM&PIHb1oWj-|Z5y=6`*xsqx;rzM}smqA!~==Jp(Y
zFHVOd&qE*&YxEDxxh*&5lHpb3vh5_gFIIqcZ@!j#R73lM0D%ySe;<vwR%1`4{!@e3
z%1R#gI$l`Uz7jBk+!y9wgQ21Skb^pmBXss1hbIA!XWRFA>x4{iO`@p@-}P#({VA>0
zzYPKLjB4Q1ec{NHS#|{AfP<Hep{88eT&3xcP!WF-EU1jO4CxUE@MlZ>js*|{jkv4L
zsFfiUsz+F>;t2Yv?DQKS36lq6lhNB`KP)>h5kk(cEI6EGW6>pR(QPA2jO!BubdnyM
z@59H!-8V-7*M~XPopkFZC^))eOgY+;Kt=V(no*!9li?4}kiGdy>Hg*}>)3^&S_TuT
z5w;PQGbYtI91V#W*auEpU3~MALRGaM5uRMERk4~BdF@f>%mj|3A9fthI3%hkRBi_x
z(JAItD!&Ve^r-sasL<_lkTqLMhGthH6FQ94txJq;uXrdzjWl(5`z93enW-w)cqm~}
zxtz5|*!v;|uPpi8C#@qT#)nA*iMG7=Vp+s_2S##1Z?t-3+A2*BOPmOv$Vaq$za^r3
zj{=DxrF$@4VE*;yy^HC+8wtqyl|Y`r_$qTK%?*O1=Z^K7uW>xv8{(?*JCwpVgM~#V
z??<>AjCFtAojL<HUswo2P9|+sQOjlOaC`1V#3<8R@l#dcL50^@U@#Gh`xxh;K@JT^
zrj_;A)k?EF=h__8+f8fxxLcfTh|aR9#R<8QPN<d_bCSMkoTMR*zqthDW6i%Gk@n$+
z?tQ{2lAO{?N@VsuVnoK~Y!j0hpUkQ|Vh-fw6uEo4`Oef^F-A1reQz~0?{z;&^JlbQ
zm9}iOXxWI9hUqh|7=ozG#^U@{IjM|D95x0#Tb#}aW*mhA9#YcDO{$yR1wtx3nywRG
zYSRV@TTf22iDWiUm5=|i_AeAll6(y#Cr`{i1kV0Q=aw-7e^DjN&;o_X7X{LWGwL$w
z(HKhdM~Q<TGC!<s7<AGm!sC!66R}ksIw$H(@mxN;{gi6;fVgHrBFVE6&;j=@;h|V+
zSBWeY0oyXpEj7Im(c&xTaAUVK0;y*C1vAn4{!PzQY=2&zvZ7uuq9ydU<K&lYjiX4H
z^IL-PK*+q5(49#Cs6B{PC`?vKcLcWx6(`VhNR`-Nh#VwM01EWbO76;Y?!j`;>eT!s
zuKhB|M~)=kaZpW2<ZGc65K;imuTM&G#!D<vbzAaa2l9q}tjkT=LUD-NllVooZvZxD
zgkQWSg$?|`5q_tgPz%67CZ+~P5HW#x`$y5Wq|6Y-&n&uGNXZdZOba&1{Z=nADZRB;
zDCoZQoVWLd8}j=y8EE@sg!$1(hM2axu=GjanOYkZ``?dcSOQ{m--lhr@1VVQEYL(&
z2b_`Bo641Bz={(FEJi$XJ<#Nx`%yJ%{{ad!EmQplwR|XR;XQ)9wtK}`gcyz@t$xf#
z661vE;T8SIt+MH!@LTZ$hnvn1V_)H?>`+!h%O}62PWM}yTt``Lln;qbsDM4rqaOdE
zBq}vEM!YKxV3BnUou>*V<{=7OTz7EDV(OGJA-MdASJ5cF?Q}VG6-k*}&UANVx}S)r
z!#@7-U6EqfCjKjo8{DMzTi8tyCl(`adw^JzNCQS?#P&C@nh~a)?tJ;Ko4aLW{fwfE
zZ#IzeFv~ER1|`7VpvJd6F&oUJ6K>Kf#2<I<DwQGfW81E`kAp$x@7Pp^GzLqwbW9{8
zZRmd`L(LfmR>GYb8d;QiMu%+zb8%2x>@Kp}f5+iqCs6sUDV5jC0u>NYL*0rD?=ZZO
zO8YPmJc3zF6v$n%)+LKhzQ*N7mpX`~UUj>rbT*l#5&ParRVNjbatxwQJ@V0!mUC}m
zQ+5TsI-hYD_WybKfh2i<_#CRm9JxM91HGC^69OYSjt6w-ibhy--xq6TXfSgCZ+P~a
zP@RBwS`zmpvfEA_?LvoN?5D#9Z3YJ{mf(p26tby-1`Q8-gJmkJ?He8v@DqeVIQ$H5
zyD=J{L3ZMp;&G{TNd2Mh`-Y(lS={zXi<0;)gq*0~wSpZ&h(^@~X-&vlt|dHBl8cUa
zuEqg#j?p-BAbmn=AXVn{S8s*k1VYRAs?ps|Z9M&m7K`p1T=sz!>KNuI!XfxOEIa(O
zGJn9BqdGpjTWx_wyY3JB%;$;DOzs~X4D&f(<?X&(0<Y>2v5+N|?ifQ|YJOScfn3d#
zRm|tdsSzj)eL1Jma;uQ^HJBxqcazgm<q<fQ{O{Sf7z8ljGoNOqNkh3)Ut(qgRA-@P
zLy8V&3dMKz>Ga6-ev7+YdGl*mQGe|HCxiq7FZ-|=Uo+w)kSURbQ`$ru37sb+&PXTD
zqJ>LVos}=g5{J;8M=*?ZTV*4LQ%7f^XR+dAJ%``w1zz@CuMP>_V6VeS->$+1Ed6}=
zV0yOxSB#|4+dmVj?wLO=Cmu-_{lk(|2jYt=?61k?GmrBYz~^Dx`*|KIC|B#8i_19Q
zXC|nrS9mE3;8WNQxu^HQkHR2o<N4}Ksb|As#BGV*tdi#<L93_2kTQ``Q+9r^vEbRN
zD1n7IQVgGdo%Tn68$E%%N*m`LkrN4ViJZYDVAsZqb+{vLmq#q-N{*+L(qm;p;SY;g
z{*}esWOS;R1Ji7uYgGazB};z0P4KmG6}P0)YUfik`fzt5Ujw1qHy7PCQ?HB=X^`kE
zQ*l6@n5~N>QkXiwm)-~mMS+^ZnFz5nny>$`sGmqpL7Hw6h9Su86L)=z(OOgy*hxQu
z7<3N&kS$%{Tb|D2Jir7&q6O5nRo#YbJM>)%HM{s_)H_7dS*ak4o~(SS8Y}o?LiC7o
zTqn<KjYh`|zgx7Dj=^|Z3h<$bp{L`r4p^MsOEV!>K*Bw-tn71{G-@0hNtbt<GG1xf
z&N&CsvHLC{JIrlg$}oene^*$I(8)xZRmf9Er|3{fBD<H~KciyumG#Xs|KchO46{-7
z-BB{6tRiLZ`{DP*M#zjnrNNoN4e(Mvnv%IARzpukE7^}_w{~>ECms74*<+iI3fiOK
z_3wCo5h=+^3~bOEx>=U$-5L;fSeDzaLh+YBFG^1<mITe=(CHy)rh(A(ZKBem)s{uX
z3dKx=1<a;WYzz*gP>O{qQ-*~OF+juhliOKiC7R1HG5{RlY2cGQ@}lQUR0N<x`TlKr
zyKl65U4kX<=P8`|^pm4tkCHt@Z(Lj0SBHiOc((@;ghHf{j1FIY<=Kl(u#Zy1pPIV)
zpw+v<XSrY1T7sRx4sr?AzQo}&i!~1ZcqQj;r?$il<OOaqE<#M&?5endEbI4lpT03?
z!GQJD^z`t-Pw^WwraAf;j3BhV<EQaH!7#KvZj?O-^S22`z=GTJx-hlcl$h~Kd=v;n
z&psu8|LCF$h5nXLTP~GeO*4nbbJgvu-}eMXS|S9=u=)HOhgO-`zSr4Z+=B};*|(f)
zS)kF7zL`gsovKiRjNVR;ban7r$OEbx#|9Q%ow(^Q8JIv#@>()W9Rx(&vJBpnaCBs5
zdCNv4IsB0VpGF#bfki<&pwnmTVbg;r6(tJZJ94EFPBEpFyvs=pM%zCuIKbLs8`bh^
zTqJE=$2f^`ZT<LgMzy8{aVl&+1s)T92Y6o}^t=Rsb9X1%YkYj^3D#<GJIo{9wCC)Q
z1Yu9U`3hQL2|DmJg<p-lyZMY4+lxd+33C^n>$6>9ADN1RA&^03m|=mZFvFBw7&Sv3
zKuG3@K)ys<bs_WCc$$E)K#>ZlIs3h+59EJzqAW|$kobhH!9@?4RDpWtG>sJND79`|
zjhl?z9Q6Zw0Gny--9BBvBOe@zLQ0g^(gRihFw4=1g|KoXAU0dc`UJ_4AE8+gR?<6|
zn7%rs8pZ-$?d%9~8qSLj3Pqp9b8lnW?55VrPytiC0ku++;~a*<t;+iyF$mCQl5wOe
z85-Brg>3ysfT3WjfM)df;r|AQ|H6nnq2fEC)OwGzC~acDeZ;~=+g)zP;t$SJbrUyW
z2C^Xq7a64yU=S=2VB|G=_NT}KavIf`&kvTe4Y|<}oI0Npm2+wJyh#_C+dN;6-pPA(
z_-5yX&z^m8Zot9sz39P@Xc+%Q%tpy1BI@_?Ji|1!kEn|UR?<mS-nDQ^4hpEhp-$}c
zw5Kv&_ZBayjkr8YCa5^SY#otwVE(*G_HUf|wx7N7&*hT}pPb=-GFcv_bI*<v&w2Rg
z5b;ixHbtvHvTh6TJcq-dQI=;+`zL*n+kpL(fDtq;qTT(6R;FR^QX<ChpH%YzfSeA7
z6^z;&o`U0-hL$akMK~nnoKe<g*Z~~n_jzTPsBQKI>{FH^NlJief{M%z<uK-CfDD$t
z*>;k0?#K5G6H=W;qYkUxwvk~~dh17X3HhVqbL`3@*+y&nqY*JoLFAa3;DugII4Qp`
zoZF5ZrO-+aL!?-c*GNT-9mX0Xw5GxXLogLWs<{c-FbD9DJ59wH!@IA)`4e)8c<u>P
zEr=D!o(^TJ>&_fQ50d80S|T#$`Mq?Nu+}mPR1DbnJEayc3%VxzJT+|tu&%2mx7W2=
z94K1Q(7vpdGM)jOKB1Xth~4^*>N{sRwlEtbYD>lty*(3l<*@;2;~<Y<e01>hdF{0b
zm6DINWEMTPllqD&D{E_58<0v-24MEK7+}%HXw=&|KJ&IB;2-itk)??5M2VH&k&IE&
zIsVx&pBJzek0QQFU5sEi9eupn!eQ*QTJ3NSw?$Sfk}M&AEUarVRa__E*G(LXxO9px
zh1wG!C~}1!PF`=;&uHKa-ThRW;91_S)&0@E^q^qq(RgiEWV719Dc&T36@LQTj2Rl1
z8-XuYmZ%=uHbo?z3NF|<Sd$bK4Y<0*9LMZ$L-ZNk05*}DSHvA_YK|q2x5Z9^zJDVE
zmJq-Y^G2&AyKWT?unndT5oN=<kLIjrEx&r@d=_(_%HcI_8|h3P(;c`_<Nsxe_lr}u
zy-T0+&@*(NB8K4wKHFwih1JBcZ$L-}J}DXd(6%Dy6y&WZ;H;NXt)jYIc>$V7g1+DE
zcHh;nAXC9Cz#pZ?mkBYVUL2aR9Rw`di4Ni*^GoKT=RD+OHwBxMSjtDs<XR;L;H4^u
zzB1F`{5>g2VW5>S=k_%<s+*za)!?cEByz{Ps6BlL@%UbSr=wNr+rYD(t8opsFhy~2
zq6!zjPk&pHK=}OJC541}-N>L0bq@<<LKqG$f|ZnP9P;US{*~=V*??C%^rs4RS5KTh
z_AK6Y>)T%$b=mkuO9%KkEK8dF-(CPRAQi6rNb<;HG49nV>*uf9%Q2hb%yQ@Ir3SRi
z3Kdw6-`PQ?6rg3bRpciP>Qwn)1ZFW0Rw)DP9zw7eH9P2rk6A71QhzuV9agQH0xxdJ
z_@J&$$1Ypay$qD!UB9uwCc9mf+EhjLLc#UpmxJ20hma840b(TpYsol$42}wQ0HHNT
zH7{2e>7!Xhf}mup>R7FF)aqG4Z5jD<$bfyK;K~=-&s2xf4Mx-ZRfPuk_}QqH`R;Kn
zQ@<VunfvsSUU*~&d}6-#Ji(7PNT0OkV$35QghRzXWyT;E??hAV{2PMniH_+(mTh9$
zc_ljxXPsgs>ZxM&)8266_WF!HzIPT$FFZgl`diNbRc6}o8n-(nHT$<r^9;48`c~fG
zg?bEz1>#763}DSf|BL5Q^Fh9EID|Jin(<e`RpPjBMy3Ue-Mmx`*EsWiadF#w-H&u7
z8k*lHIufjemibmmA$_1zgDA!3VyIAC=wF}Zo&rnrsKcvBGma>$-}Egxr%n+R35>x}
z?l6yMJa!!oXL@&w)yn+iJ=7a5o?do{NF!CwkT)jn8uSTTTA^cs@-{R<BS9K8(N57&
zB;~s1GwlVaIiCIdXc*+k0J81#QSfHS>7b;O#tj0(m+>uAxI_)Fi$0rgWBOentW&bS
z5V?~TXVm%Y?$)>{xe9#fOn}$X?8r<Ec^ll8K`h@KL_O!q%bq7kQJhw{jlWJO`U#6t
z7D$ol3Y0*5qtbw4c)#5VhHdhILNVVeD%u^|G_yi+_m#B4J_>P#PM1=HCQr^9A0q-0
zX;bI|Ceo!u=b-5JGMXsT(MT>$(6qt=3s8+r7R~Ruts>{dS?kANX|=7CjJc>Bff%3j
zJ9{aHO{}(sC5CHJ&z-7pB61wU!quWZl2RlI`QStSL=6G<(nB%^!ZuB&(bf2@<FK5M
zdi0$rchWm}7!1&WcWB8s2#Y)xV{+IF8BAzMjUod<NgxJ+bc)IZ=ShL#AwK<py!*T%
zCIiq+BCWs<T=Ww)!H~N}{mZt8kO~E6+;NiCElY)aSd7fHkxoxSq{{n&xF7SS=i%0i
z7j2Q5F#lE{*qewB7Yu)WOy7r4(~8SQe2#dAooQcnix21hbKxmMsX*GH&+*SOE;jqu
zyq7K0eWMr6`Xd6*axcx50S=)KP)Csm2TMZ}rSc9n>Ug}CBjFy$e-d4k$bVHzZn$hE
z(?q&jzMAE--V%C>Jr+<Fq3lWjEg(E%22mLMSG31vJ|k7RZVYu>u@Hl}ubxeg(l2`=
zquJ99Fb?{Hw<A(^{SLXw!M-}!OQ56!a-teM17{t#T7qvT@8*gl8jZ_XKgZhy{pAv=
zfJdHcvV{jC<ZaIt$lNg~y#(=m<V=Y!1XQ>~JiO0G<H2z^of>TtQ7M6x#^lW^^&&$e
zha6ENtv^_FX>mtH-M%;IO+7xlbTFw_qGCU#`R5@>g{_W!J3uZAmBh6yblI@sJ(T=S
zm$RgaDQcaP40F9_D!1Yax&L_uyE`T^DwNA`n2dnT`EyD-OzPmm+*P5WIkKuKe}Z_~
zVj!3&Cc$7rc7tFzYtQv(a(60sl7a2}ni!Ku#F%1TDD0)M^pmW9z)Rn_qD7LjpVuun
zA_ge%iF~#^8FoT7t>90Fu9kzvxCJJo<lc~uekiPu*7xdjQyUvBT?S6%60Icp1Fu%E
z#;GaNHPc=a8LC6e;GyVOy8YlxHdntxeng3L!m;Np51utgoiUE*q5Ndvp^h^>&L{GH
z1;paFCQ)4k^6NncOj*#|1f~q{2*tg)U*Vq$`%b*dlwV;v?An$$lk2FJ!kNgj>J`L%
z)N!0gct?~`Y0E~wlO$<-LUHa_S>62PLJxm83FG7m#&}1a*B{?VPgzu6<Nq^sAxHKg
z%Qp?oDg|s%SSm-L8?}udgmcwaF>GK+1Hqm8uHZ6B7dtPU3cGb^YOiD^w&uO6!plCH
zcQy5Uee`Y-S1Pv<8xU;s0Ag6%yvQc=j$k|LaSt?e(kBQUth-tA^Oj-Pb8LK!q|4&H
zY)R}IGTDi&`JeQ>`*J3EJX|k-FAQedjn7ZW6Gz_=S`M%rwAOiPFe@Ko_l+#zUuWIg
zAC|opI3eI^5#5*TUGMJ?M#>Aj-e2GH7uNsL*SC-U`37O#84;(zQ&wYAg38GAO*$rt
zUCMvgfQ%Aw3cY9<6+vTSvCP1G8b+VVlOcVlpskpNH|pgf73x(Hj-he5K*NwabS#TJ
zyz-FYz@2mJ!8OI&v1PG3>Alw}wQ8})Tm<#{Rj<m4_oOwCwYwkO@p_a%l|Q9^kQYy4
zpV9)_u}(w{C$cF+K!3CmxBW1wFDwye%Wdcn{{mJIGP4X+7$hfxmo7e2e>Z1LauH(c
z>?<l^Ux*^%ybl*}GQKE5ii_qt2I03(L_{M}rl-;FESVBMd|z3iOd-j2u*mDg_GN=<
zFyH>~LW<o^u{2Q#PC8ye5DI8lf_G8ADNc3Hr@4?Q_%_(Pj>W5wvX__vpWic!Qw$p6
z7%C1Y$OSYat|%cyAU02|P6Bzz;*NJli+mg|<}mnj@p<;OPlXC9+D|+sTws(ThSc#y
zSya}QmXVozVput=iij3S$yr-1>?X&?oVkr*u&+_U=QSZy(LZJ_u7m>yOlZ^+@@(k6
z%*K?2NZFFvT%opVs|6B;g3luO?0vU}*t0!fe-+M-6b&Mox198&mwIJenC`U#n(X}=
zkLY;BFxZ^R<45S+4Rkc3T_BbDNvrIaM7R8%k#F+1DFx2@v$|M}{DvEcw!^&sTH<|0
z+gjJaOp#*v^Acq6nc|MmfPTs~0zpiTvy8l=oh_4(>=zj|nbwGD$W0Cr6VIHgE=FF=
zjL97km7$Hl5@`_`5_)!i9?EH-BN7H)z#}`3bzOrQDP>>cv?6WlMI$s9c<Y;THsqCp
zD=SlJ;k0TnuaCn6YK|fwBohQW<=QI>sdAHdjA%m_nl2AEnNTit@WnZ%C@#=G$F5}2
zsx_nv=}5gb35&VMtC-O`=%|(`;EYQ83{`2tEHPG>i|h0vcXnilC-x8&pSB%{u!3wD
znGl(-)UVfO`JVwtNbc{w(5cfx2JkwBS?HMgeh0PWRXX(=a%bjF3(*!8n&piWKFJe_
zUa>8A`koT>X2Lenr8*L9KDt}}51VxXZ_g}71B?&I37@U+f}jdt(ZC)u@}GI9<fjZz
z{u~?4e7+a{pM2#%z=atGidkT`kJg>SJ}$~*r0w`1ELS@~V3>5~H9{%imPUAJFaG9n
zZ80X&bNK5-VagKe`cw_~8s_*H%`#_SFu}C)-vm?sf06KCo(4=S?Td&qr7B&j_uJ{p
zpqG^mdEiT+7aY-~ctyRZ4L)%ZmDLa_H{EbV*TB|0buxjf`goB?*$mKV41~)eMyi9^
zi9`AvR63-x{^0uNH3ea!`CwK3kZsIOkj9WU!L5DSTofn7U`r!2#Fq%4GH0}27ZcZ&
zzm8;;8=-#*I7JPpoIfiz%rQENX4Ep(BK!<b-5es%{A}qi;T2MG7tcEMm+2%92HTR?
zX+ft#b?A@~5h@MLlTKhd>--=If}>$E?<I*qU}@Q4`!wzfgh|3sC+siM<|e=l`bMH}
zL-XUy_b8r5`*hwjRp|^Y*(vU^LO${VVxK^(Y6J4pNj5q`C&F0>)u3Y1RHVv$OgEvT
zwcrF1PtiFq8Ndi)3UQ)`><DityQWWfEr;rYqt@w^d0@|EfN_KeuyZZfemT|MGQGk%
zresbaETLdyA1O3A8O!l_-C2)>@Or?2&rc)r_ZRt857r8O)L9(-Ao%+NHu73DBuX5s
zQnir8ZJVN;q?K@_h)|yZU1eXw;Kq-F>$M!N#VMB6p;i-U^B$dol+W41+D{r=9FoWu
z>d6{L%aV@;?P9p<G*l|ZIavwGL>>Eg%aTogCMRSg018IVsT}zU=gtO<f^#67YFA=M
z*~yA3@)s>*|9P;ph7+Li@#V$*i0ekf{?%WWbgwU9Ncf%PeDV%R^5WlV#teerZSjsB
zL(JKxRU<QMc)@eDFq(H9b7!bcq56>S#>y5?C<qZ16r1yK&!bnsmkCC_lm01H)%ZhO
z2I^!OF75zd21P4d*pPT3+L8E8X;^KKKv{Eb*P#gYS%iOO9T%-lF__%6qvspdQ?~x+
zBX>On1@mpPx;oY`y<s2(@z_3m*+ifwLJ901q)`I6@yo6^?&!kkdrg`S&1aHvf_cqC
z8VC8qD3=DVP)#KtDRhOpcq7qP3_aBK`eXtAhrg$<zsg)9`7$mL-0p+?nW9px4}!U;
z`lF(v>O_`@wQ&KdDghwOy^T(fV>O=2nG9q9yVQya@0rMMHwJm<p?Rcd(*H?^ff<ir
zQcftPCJ*+`SC=^9tj6sx>Yd8jd+}~)QI5A2P|U9y2tjSwjLNX4`RG<t{Ihe`D*-z2
zI}U%QZmz!_BckCE*asvJ_?UhFVSJc2es_OcVZ895fqT1=MUban5`EPZQk8c=G?N}*
zpJqK<MzcK0IoK^fA3(@DCmG0pNBe_eA5sF<M`VPb-TVqTzKA8RI`|A$_DiZ2NVRz#
zdXy(N#;RywxBwcI9L0J>GYaa+^HE`7OjpTR$OIWSGRG_fAIy=av?J)QH!+%>s#3@k
znD+Sz$T`uXvdVFe|2VrDl!-22Ty_B>nPi9~$Z&)>fl62hsX5zU?eYUKKReD<>5iE;
z^j`06U=+IEmGl)6^O~8SqrBnJ_cNb%VQRSzrA=u(GF6x&JZZ5xR^~qQkX*<=NjedT
zg>P!CaHMcj=7nnnZ$vC3L|{B~82HoVtPvfUhI9s(hw4;dsjC0L@TJu!o2Es!g{xpQ
z97|ty3(6{ztfb5-j%SD=iJ%@($(AYgz4Js8)G{YOFOTlz2ATNUg*4BXd|E+wjbq7M
zy?}2{?X6(U8?68NZa@cw8^tM+GK{PSQv0j92cE=o2{tYdEI?KWscRli25G$LP-!?v
z_b6RX<<vl5zTwf6_AJNrUh&7?2(Q{~FDoR(+0HrkOVw5M?{DtwHR$c4Ygg-8+NDX3
z5e&Y=g;og^o$ht}zc6^L;D)kD7)}XIX=)k&dj|C%k_^M8h$<KU$|^4Y3f|pMF?vOw
z0nfPObFyLHL&_v$`U~$8p0WGG(b94LOFYC1c*z*}`s_}5vj!%>$R7Mi&6nu~*s2(2
zcUgLr!WH;s=?N~TgA8aaea*U`6V^y`&p4#lmjb4+dzZYJU!F4FG>Gbd`#S^93W$cN
z;T{glhsR5}9tHz`P<Hrpc{GsUSy?W>d5>>8vHCTfo@spfia&MB(0Sg~oSAI6#i5RH
z)#Uc|%-pjw%iHgae~h-z1NU=2QsIb#VA1p$_g&%$1chFS1UV%E;^GDO0?io4Ph#Tx
z2|NT$4KDK`+^pZ<t`kr|ZT(Vb_19W0BLs<X81St}E8$Ug8d-!97_J!Dqsh-eTJM;g
z*_Mw{`w*mcU96+VP#_i@s6{+96m@?r64`Vwh(&nPRIS;h`>x~f(4C94gv8{*cwxo8
z5?sGG(S0!D8%yV%oD#(XLnk0bq3$bbL`sN7a6|~1=puBz=m;BOyoW=T5sEC{HA|}2
zMDxaHhmeelHVvu}q$<y<=V_f__SRVDT#Nk){Lm+gTRk=-AT@-{-SsW+u&9fvGPaKe
z2-e)zoRlyi&d4!<MY@4i?mYU<*T``?boRL@b~NKOb_le)Wb8Z1nE%4t+lSV8CDvww
zuag^zM3s3u`r6X|G#^d1q^MfZ++U~Y{8d~0A(m2n82@{jic+Xm8QR4-kCbn9+3RRU
zqpv+*lwQZuY>hn4y&kd`22y%?!EQ<e<*gTLpVx`^HO~;=yk3V>Yj|~6@8@Pyz$lZ$
zA-EH|(|5PAh7SG>a(SKXl#kdwZl?`=qkyM_I<NCJ7V9Y@3X_q^2FG<S3Oz7?g>C1l
z7-{O~`1dxbmSF!s^7}3@M))vbSl_2|W1d^uLHb9Tr*nL>e(Ee2S~t+E>;2sorUyI?
zhh2Tr^fPXhngrxQzmD4HPc2SDeY}Qdrr5n~R&b5aB?1BY>`bk#^}hs(XmiX2-|EfD
zE>9c$gKwH1XY!d&m+qO)l%njal0;7$6ux&e#c7*AVlr9~Ngztdh~-iGv%+T)IXk+o
z?d@1c-t-IYzee7>bg|PeSTRw0`F%Ag^IyRmx9+maG6}q&bCmoHB7bdBG3SNVoK(5P
z{fTlya9yQi!-~nCHY^#H=#*aa&5C9W`??pOxAf8q1<G5OgksUMa7QI`WHz&4qfM(n
zA|MJ!lt5xcXrxpcD6UdnfL|m;h)rzQRo1Ds)|`6()2kro7S~h0NWzZmXW~y%#|wuc
z_lM)*+m{)sZ^8=aLcq-%mqFf1zdz=KwpR*oZ*|q&j8@C}1{@&KK*!SF70a!M?|Dsr
z&g&C3dDMJArIWjC=h4nykLthjU;MpSH_no;i(?$502k-$x7#yh+XR>R8x|rW5{5OD
zf~!R}w6*oi>y(AxIEVziXCm<WVmv)-yq3e9zx<ad<G-~TuHd+GfM`w}DC6!ftRi2<
z-q9Tl?t5Ys_`w5ZaxO8x*D5g1IN?L1mq*oS^dJIE@Qvk?{@ddZi&9d->|w~Ibf_$!
z^9m>B{OiZnR7I8l!`jMs=!yH*k{41sWvBy$NlppVMw@iCHg(|VLhnV2x$iQ;w<9Tv
zj`C+qODll$KN<VpM#Y9)H_v|$N))SEQ*_Rh4mo{)ot12sYaAc9M6Bj&C5fbgTI6uJ
z5y{H+NaT3U-@X$sI`3<)J;b2alhn<bnGW9Ctqe8*qr%YShoBXHV+|`T>I?Jd^x3ZB
zD3AXvehB+^HnKxZf4zJ}0y)H^n{*TLFagRtv{l>sxQB=Q*0HCo2fJMSzQENwJaEcZ
zeQy~j(&XacUMXZyH3QnWp<`x7ry9%kls3gj#BC1z2pFHo@;z{80q)lRCzM`A;F(Q|
z4f_>Hj<It%5}DRZZ0BT_YeQf#(ZvY{0DoYi+PN&#YC0p2->cL%`FyZHEGn%=H7dCc
z7qh#J;i>OVSDMIukH^|F|4s{SYz;T8{J1|eS3YlGPMc@da^wQUbIweLRdLzOk7R6o
zZ5Z=Vk9KU{^I1OjK07<1Ww0{F<DcJ~Osf*#WX*yN`{fBn8cGN*{wBo5EOEus@l&r>
zl3jDaP{ZOrmg_N3s)x0?3gE}Tlb1mhmtqt<6645%ugq3Q;4^=9J^H=?zL4QfeGl-_
zXvh-I7cPcu^U2BQ-Zpv98grcq9s;#Jy)2VHv9|YrDi%3Z)Gp2MR6Nf@-TaxqiANN^
zWbv^!F+{P9xLS<5c?bMiHf&DN^7lDy$$l?UlWBeGZ`S=ecoZ+qC@^YGX$^H+o$9hq
zlxtU)pYRCjF1r4Mpb*gp5AYoB=k9aqSO{)~8XJNQ^24Ej&Tnu--~FG7T|ff^Y`b27
zfw7P>xP4ozrCOR@-rI9sjlKUFcV_<f&*3^}5G2f~2thL4m(L5!<7ZL-`t^TQoIUYq
zkTiM$9^xm&U$MY2O#rblc>6+PyvO{6lkXp`=VcQDO2i&Z_8GoI5=F;3!jQ>2#`>K$
z_y#}rQY0vrT;|zR%KN?bewpF_e~hgHCiu^m*>{>M91p$_w#VY%65~7Oe;NOE*#F%2
z-@o<jkb&>X&dyo2fb_ri`?mZ4?+3nN{~s6^W+Eu@FiOLK+dofdeD{LYj?k*ZHz8B;
z70lDf-<`t$WA+FA*SDUBupn&4ZNl5F%ZE_#r*H9{|Gc!d{Q0*m#Q)v)Ki<B5`)BKK
zjV;$i<p0bk^ss;S^YMz_A;t4u`juUFLP|d@x|tGt_jKwWBf|HWxOmz{sLU?DELhdQ
z$hgWC>>AG3^ztcGo{M@DPvPFfcgiE!`Fo9r@xaS$fB11*N%LwrWGC3$*X=oDA#o|y
z_V##7LjOe{nalEI#<zWi(QB;`v+?`yjJp;?my+sM3x^28x#bP7cKUSpj6iUF`8Uzv
z|9w8Wz<mAZ8oNS9k>4^fdfM3h9xqwU1`vFSxg2zCY;6%I!DVxv-1UU|(y^Z`Yw_2&
zww#)UQ?QwrBX0#f;nuIOuTgH`(Me#ImX@@9{YlQ@P7B=@8sdm}=UC!(q+jx-J32_+
z+sKP))ck#a7hN?R$Xc4W(P%!Fe9iOdy!Zl>2NKCK=knaM2%Ho3-yl8m{SIy{oqWbV
zDL1soL*Q#~R>l>Z`-g?^@9D~QS9h`es{h}mj5D#W!NT>i-*HCGB&_2xk^=98zOgEl
z$*h_BtKv8Rhpe}ZiYwZ-brU4GJ5++ZyIbMzZo%E%wQzR}?hq`v1_@5#?(PsExZT?A
zzSrJ6`~0oHYt>w1j^6v%Da7rL*#cb><cIdGAyA)<gHfo4X9^@Z%RrOzw+quy|FZ$p
zy)<jazU+jzKP7t)@=O&<M%*5hEk`EUYtd_WyiZf4J^$Z%^?#qSury=@8zpFo&xy<Z
zYT)O$T2AUaw>Z}_16pBzrDAa}GIuKf{2u|S?@#XH`E&3d)Ro-7-^dDiAyeJ&e+S=+
zBj|?7dLu5D+78m?5Um2$k=?ukcWd`KTgsI*ZCg-&$UT=^27NMhu&pZj4j^DB;vQnK
z^~)2z)in2XEib=kb1f|g^1d9;Ibj@$Nr<zreG~G+UtL?}$S);3F)`-f=fGcXpOGNo
z`r{hplhQF4a-tvKVqrLUf#$u?F1{h=)+4%%N<?cCI5_fGWdAX=#Y4kP3id*Nbtlsy
zoyS8gBI@-k*ZQgdCzYQa-=doohq6OE>xy%#PD(C{P$FNKj~vE5o5{h{cci+1r9Fbh
zzVJVne}s2#A`^kVP!#MF_7YQ1s-iAiC|z^+?&^aL+LE2eFQilzm)atK*;(k<=zFGp
zNVl@<|I5+phHjn!n81*W3%Uz`%kP`|viYw4L#7m&_wKG1e16ZyHN-l(S^2>AqsCD8
z@jcxRkj!4`+49|WkdmFclb0h@;@(RI7af22AEVVR&U?voV-D*}bMTq^aB{4cK>qTk
zo*w_{vDXrgq|8T_zb=~s^~DW;*xUbpqT-Mz`a~L^y$0C_I1L<&kE@9;ABDEhcodaR
z)uT**zA-Ta5Ll=2Sbft=cKIrCy}bM}ENkIdpRF39w7#hVr<d%rC*+E1F7%jV8TaWZ
zFD{U8-`}}k(01)hNx+@s{PB$O>%NTfzcy#d+I2@eWY1b7G{%R^9QN)t4^`@ey_Y>?
z#jgR`Tiah!N0Ys=xR-4kg?+csljsKq{;H5qwy=UgjVFdy4qZgL+1cmays`&>uim@7
zuqCDcWE!cPxyJ(7shTz&QSTNO_<Cps>;em}ajd^nq)Z&&tY4a03`fMH>U_jDL3FiK
zo{AXHs$rj>T0<r!n$Igf9|5~8la1CR=t(C1<blAX=QK-mnzdQCCHM*<tY97g5t1n{
zcVmmiPgK6Y&QGfz6WoasDGd_vVC_eOmR#$8;t#7agsh)H^Usbop6)|f<xzR*JS9qm
zc{QiL2A)b6LJ2Qy+TU|OCvTjU|Kh3C+j(bmU(3Ok`!VPIyD|GblfT=Re|4)eYF*5_
zyn4uG{K@iv>~B6xZNmjYj-?Y4N+nxLaO$};JGF*pR$Ib_0Lsc1sGRTDSHp%M=l1h&
zMMBOiM*6<40DQ<Ahp6OBE%NOeM8qAqjpDS94}C%xWEF}<_a>KGncD*lr@UV<{v1vD
zTd9`T$m1$S)1|i?>xaGSZPL@Mpfq_l`Y!^lMrf_Pqdh=SKEF&UPkE|sae|{WZ>A@=
z>R!B;7Ajn?Fm_-vS5oOb3h$U2itCc_Ma5+TE#+6QMj^&pwjMC2pJG=z!8FxJ|7ArM
zEyCLG!grzPj0UgU!+;p8_eK75;dH#<YH7Rq%N?J(?auzF@l2Ukf62-(27vJ$Xw1p)
z)*^|KyM%(k+T7%A!uXKS>^DpEK(xKQRMkPkq+YiKYrnTsU147&bSeurL6vEds%hjQ
zmu%ub`_Y;-Ic9kAvU0nM>2TN#>{FrFX_GRWsiak#hAG8#oa@nm!iseaXKq(qpuHdR
zKty3d3B`nAH=TSQJ2f^oUc_?__dwQHeldTAxD`GVH^UL4oVo|GbA1K+I1LD*Cl=&m
z^HM;$%E?`UmpnG4V4s6x0N7Zd8c@~F0Ot}=fpak+Kqrh-2la@T{@SNY1AdrT$pm_M
zP-#m4KgyW?<?B7Mi!|FEO%09j6WLc>h@C2B9gWw?C>xCD+!J^Fv1hoZWA4%IlXr~8
zRz<|INkbtnqh8deYh|t>te1*U^9F0e;TQHDwXHHbpi}tF2!!DsKO<yfi@K~^%JN^i
zjtmzSLrto1=2`U4gla}?LWD&8!pR+3mgi^Ih2P@M?QQ3@tKuS5>7Dgw*~)27=@)4A
z8hkV7@w+QK%ROE?(1?)vE17>c8dST-3A1smu4;5IkF|5#@w8Ac2iY=lG-2SaTG=bq
zb1^i<m?mG7&s)1Fv4vdE5e9>QvU4)KI+)7&z6?x{@sLHZ(&LYt0V;_7+VKcjO|%OF
zRxSPVn#5n8;=a4W#@3(=CB6{VYQ5%3pYFCo-ThK6D1K0~8GYdY|K*(iGo4PML5AZJ
z(>@eY{k^|~Ibn~4j_=FFTMvg8<ck?^*&Dc5<knbclMduPQ$m|t55%8YAP3jS<K2<;
zVH+0Xn`mv|@HWq2J9zHOG-fEGKdSUFa?sVeu7{Lkah)xK4plifZQy|9E3@U@BW+5`
zUr`q?j?Ip*mz%XH3UpI*4~bm&P3`g~0*n)h@dw?!5{&@nQFjc>tlG3gX;Sr+yso7$
z_kT=lL_Ru>rQAb5J>K}UX(C33AYcJiR<OzSP8}CB$-70t-q<ia`St+(l3+u8Q$T^_
zO&l^g$@2u{@)2#%=9z4dMjSbhj<8Hc&1;69(=a7#u{PpoT8PqZ`L4*2Pq6s*dSndo
zgdg(M@%eC^KTZjU<=`=|cD4G<-YowS$mcOFOCsSp*2tQatJVq(N2P>WL3e%SURVIF
zR%;Dxtny11#G)qwd+ipy+4V{L$STc`^f#>reE0Eb`Tq$q2i7e5@%K`trs)`PGaDct
zJzOF|q6NC0hsOzTyBaMno9Mdqj_2CsiRnFe7BqVGBJK&OOH)6P;Dp}R;S2g@N{#hL
z#gkb7a9z^q|7MV&^{9+lidPNWu<204qeJ_OdE(5A6+dw(g(qdfWn5j$nMl+mthkhw
zHJ)w*z>PQ^ll_S&cDLdxSrWugfp`y$WL#|;WodVQ_iOWe6>qC|?8T7MkD;+$N<4R-
z;y@H=G@z$gq}gFN9Gq;6qU5@t_`z4{-xko!wZ;7hQeq2<eHB`brLss?QRBjwt`Tb0
zMqKbwxf+bw91TCGJAY-O{(%44b+iQ?w-*YR&~&PC)u|VhXM99xY8KH$?{)C&eJx~f
z{j{M2)8sw4<<6szN*d<$&u9X)Py>5qa6^$0xp8Koe?@02*T1sd6$*aRir?(KM~_I-
z8}o_%pW+E77T|K`HURw=BUnSW72gb{rPE-@2eKRFoIhdJUtIu+?En!coL|-IE&1JB
z&(_23j6sQo@0T6vYg7iM4EI#_B`xLljVOAR8SkM87tMdBPOjVUp8waE!CUfo6mA?U
zcjU~qVu+gq%kmJWV~N*PT#(@CPoiut-Z1GrMH9Ogwc1XQ|EPlaDZc6k1xGofFQ#U`
zsc!?Ij$(_-n8$u5o*6py)AG>uVH&GdQZeY-FAwj0Nb?Z2j)7H_|F*H5_tF0W=|5`v
zeKC3+XN3ZZjuVYiZ3W9PwQ<aH;MvWA)7}24vTMWG=B=6V3q+VIEI-s5=dF1vz1=-M
zncG=CVc!{QM=HEu<s>(CqEH)_79;?xoh0&*@5>+3TIK$kP{CD$YNAASRmXlfJp-ly
z7I<cz$C52WI%@oUCVojB{jut+#d>zgVk;+0oy@@-%V1{55ns4Y*|~|r+KJXoMM{Le
z$G#KUt!pn7ZN3`CaLL?zWny4VbM;T=9aLMSAc!=pqEJ-#hXcMB1bh$J?D85tBv|65
zPq#H)Ma7v(fH+rnN{wEDitm(%+P!Z3(#!8JTRF01RZSxV{SOR?8ITBH|17tW)?*cd
zN1k}-$VLCyx>@k|c1GXYTt53uoT%6d=IrmXIPjmrX&AaHvDXB%u;oI7h{h96r_?Vg
zudHI?+sa09&z?7(SN+zNs;eq-CtcfJv&6ooQ#>D}He-iV3+?S}3;}!1${gOZR$c!5
z`aE7NfX8!Jas%Ax`b;)YhfJ2{i5|RU$hzb`<zv}tw69WVtH*`%{CO@v?r)?`qLrE!
z&I+WxLb=8C%q>a(m*!7o9Eb_2PQUhS$lE{|9Byh9_!WG)V!H=Jt194cliA{Ato^j<
ztJ*Wc0X*E@8Q59o-U`pIlCzPOG-qx%t@EF7uH_Dk2^Gj{x6{e2<y&)~Y-hfC41uqy
z+8d5jmTc|^gso@&a<He#5(*{cn5qq%>+}HSkcD0@EFnQpq8U6`2^tPJPChntahJCa
z{j{5XWcg(m5}*_dPhP6tX0U{9)4bWWf$p$n-RKHx#6TL#d0<r4*e1zW=HW_}A3;lz
zLmX&(6(A?%>CGIf;Mdh74&%R5*jfCFq3iwt{JH7c9n2ExE-B9&?i|7Jy^R!mDE=Ek
zS!z5Q*3eM|tB<hiaDx*A1(Lf_buF4FVK#8C8+Q$EB`LKb!}11BkCcSuo8*Zf@sD!b
zi$UN3R#AU#Wwz1l_>b{Lg*85kCTUS9@OM<MNwUm7o?x=RRd@>f?|NDFH2In)+OdSA
zSDMllg_3?E0UVp%#HWN;pfm2X{-3(EBeNYfiropB!_HyTK~2Wza#Yx1Cu-3g@nqqg
z#K|2InX2U)RYjfs4-RUsZ7zH9EOozF!}L!bqU;Nj3z<$CX~*ys_ki!(l~Ww)H50Ir
zxT^&WbxRJL2wHVp2tTvM5Gn^a8#>edI{)j&{`ZyiFA$6IddRc{K*19{Jchuct2pds
zn*1iKvnbuPo!Kj;@7xUxuYZ$4x|_j0!{;?E8UR>gzG$29ca>jK3CaW*IfK24FU<3q
z-=sbPAPT#1z#DJ!(}lQPVB*;t>pewVtwb0<p&Ad9JmzQ!mdmEx*~AJsrElpjj8`ea
zD4qP~W>oDEeF&rdctOE<^sEk~;sZrY%?p;zYpWCRTFxs_0t_Jzl$_znA^@sO4-Rk<
z;i+B2t!AEJXZlkP_N{&g$Gf(Pg5M-*%`O)gR|Ccd&kVPGPD*`&L1#pCDDb#|ty@E%
zQNZL8$6*Bx_;SM*rk|Zsm!qWug(~KpbR9?hQ%N`@f@|^#lxsRK=oeA*r1y%q#gan~
z(T_t70WzPBd8I?NH<}-u^fey~%=WcnY=(S(6m~DIfk@m;`t?pKsWI{YFvif>0tdqD
zZ{8jlDDVd1Gq1lo?9}bs4SVEXE*vpOvhd(B$&iKup_+#VvkoK&wRJGqASB5OEqKDn
zA1cKud)5!4{B&PIc5(wP8yME!FJlXz+CP!Pj6(6696@=E;im7L;vF5V-!X$$2#2j>
zaCNh)Oos<}<H*GouIDSsG6QP;W5jvn7uy$0Zfk~smZDZ=hHKdEsKd<m+t|d#(+223
zbn#}bI?yDf012eFWzT&Ram%}!s+s7`Dxd*(Hys$C|8X;JJY!pDQ*(r0>3ltnp*E8H
z=wZv#8`z7XlGw3FrHK8Rhr8zKm|C)}ATxAc@a7<cEJ!J<f3OxM8>^+n%ZzY|sB>|o
zXIAH+(Cz-b{`<hVWS;YSceoF^z2eonb&2NfMwxPvvPH!Y&8`QBOu8EkY95N|_bFi{
zt&%e9HurIOzC+C3!Cic_)+e!mDKFMuQkv+YID}G4SmT-8#(KAR^*3A%d_mqNX2TvF
zI_3>zYOn#?(Oq<?!0jjW$P?|;K2bx8WOMLD)Zt^kkU0>)j?Xz;Y@#BPcVta`WInQ~
zk2TQHJzwp|L-v~ZJVi-SG~yAsf{lrsBcBQfFRU^prholLM)Lc<Y4HfGRiwDi_3x3e
z7%emQvx97>tDb;}a-3Csk6IrQAr<DYVdx?9AtE$PghZMQx!*LI8PNvZ6$w6_N~tcQ
z9?x^F^<7_fFDmP3V1YGgk+=WkNCL8as2RT&eqK|>Aa%7=n9WpREh1x9s^Y87ftpcg
zRSgnpT*s|+KYc|R=;*obF*d<vp=ogfQ(f@{==WJ+nD6!1oKPC)oiY>F`I4MkoVz2-
zVgb`RD{6ff0$<OYom6j&RZ8UM-f-zt5}mE@x~5ZC3uK-VNTuEm)a3eOE3trutANVe
z*e58Rl<!$#Ordrwf&WaFy|=p-w%mFru@#zj{-;?OvP>2e-3C$WwtbGIOY#%+*Hu<V
zkH)qEJvbL4)5~>1zW#e8S|lQbH~~e5sAQ?I?Pf^BLwjBcu7SC^giSb6xDIA}lXiC8
zJw^2uYRj0bZPn!*)&m*kQbDZFMF%I&=W-|$fe>myh_>(;qG{*hHq30ZX$@eF$8z+7
z#VB4HkbB-6a2P9S9AGR=F-D5Ssv%%+qkVHoWW2KGctZOMbx=2gbaa2t9&wrVqm!X2
z;Xt&${6Xrp^6!c93curcR_}VUE@Z$s!chd&W}?AEQ$`RfrKBVRB#>Z`5VOXq<K@0O
zjgR0-TjZW!^F0k3jt8V)t}}JlmPdsDt|%`Rws+IAbSoop669Q+?5REdBQ#WHGI`2k
zc|j=nrx%<`37L79GB2CUA64R<Q1%}kUgn?|H&(u$g$jsFvf8GptCAsDK*V(8q8g&F
zaC)r-AI3gagk;c|yT2_1(K<hFw0@!V!<YNG6?lY*zLf~(?xpP81wL{2wmiYlxLi*c
z*59@EB88PgA@!HcWMYt#!N&Rdd3PVc_dO+q=B2@0&}hPka_aWN)}H9M`^dU#mYBu#
zgl6=<DAE`;tgnWV?Bc@%)L9t5K1GIuIz2C?5j+gMWs~(O6XWfF`mpJnnHB?H5)-<n
zw$U~llU~WbtFoE;WI}>%BU4KQ^RTyZ6QU@tcDJn_n}))khHFQlxR1X?<c;t+aI`Rm
zizO|3S6c&Qu0vI;W-`O6e@zC&<3#`}++nDsvn{XVz2RzWp7>8#U(cX;1$i_Jr~V0Q
zG!|&mWeG8o+H83Z{5|XvdY(LVGC9*!i!F%!<-C|1nknl4>##UKyB~(rz<=_mll}oS
zfCt?n65at9;=sa<+Ee^;EJmSQ7Fkc5BP`6Y{!ccOo{fM30m0+%ZEwWhmIG@Ma@u@J
z^a>Z;Qq|;144w|}yg}#3KRgUWDPT7+j@2<1QW?%}n!M)p+}?Rxo3p!hW?{GBq|=jf
znBG!fZz8MR>*AqJZt}_Hdy{nR4jFMmUIk)Af)S#gXFftFnikbg{XV?qBkzZ<?kc0y
z@I>Vw<3hkYGb3Ifr>{$9Y85uVk@qI&9*H}oi8H0A<<g;|X8|9zV$%SQIZRH=F=AYm
z9+GWYkj>4Y;eL$R`)cxeO_L$pc(9*?meJ{b_$&=WlJDsBRK;;V$Mk(|qkKT4lJQg=
zzFq`d**o_hn)B0lgl~yxG`>_(C7Gle(=T;Q2*SR;pttmLC3NVTA5*#Wh!FSER%I`7
zJp8R-|8u9mr2-=qQ{(#iq1Rt+fmQ}PZO&nqe12OO{2WZv8C<X$7%mcxh1aQz4xrjS
z4A$rDY96;K5waV&Pa+HJ^9E1M*87ag4-o9y<+j94yY~T`M}0c)tKO+F=xKEErA{HP
zOB)spzI{8^8Un$EIE7q(8Wnk3a-4C$$s{e!DHJ5SlG?Nz>cE=AoYkYAh#SDjYa?#~
zdU)XbuE^tqk_WX7@g22FPe~z+8vNUz|B<`;jp3Dt;YAvS`{G=u5@p;=B8=$%OIS2s
zCK9)0*x&gt4^<Afu`p>ji3o|9a;r#J{owPjgg>LPx>T5qVJLe;NjaDf0G+)_!Haq{
zHaN_PW>GK{wJuqa(gPxRn`rn`IeIiH+ShxxwtF{@qcAK+K~j!SFJY`adw}3=8XDZ0
zMz03mb-ZJQcjr=cMllJG-1Gte1VAI4yr4+C39oEM&y$5V(~VhEDoe53PqOu?_6^^G
z%CG$W2gSos2(4I+lvAk;)qyuy)tv=xBY;$ahN-z1f^znjV|{jC<<^n1998`7t|1wD
zt3pi!ZfB7-PPKzhgXQuL?Ok`a0E;!OgmPsiInB2QLt94s6tSOi`wB{&?&sr`?3D={
zGQ#RRd+rv^ODZaOS0)?Vwi-8hXojk|H{WafDKO})!K=-MBBWg;y5P%AEby_+M|NH$
z?y1G-5dRv}y(z!B%h>9xnhv&xX&Tl*<hN#{28Xyq(<smv+p21AOrViZ`Y&izXwIRx
zaLMtX&i|?%`-7*1TmQNxa^iwxUWFQ5Wd<yWCyNELNX&;DAzj(#s^EhCHjl$5&?1l*
zN(Z_x*yPL2RnjRPu1e$$RBkv5aZ)8=xuE-ZEZGcM&~ZNonkf7Xo6<hA`3Sd|5%GvM
z0YU^u@y$6;|4H1!Rg?_~PR!foSEDMzR-NoZau}gGSWHL=T{7%=Y=WMspVZ^>;4C+{
z!L6Z^#e%^gnIbM@cfWC-VYt<fv+-Z8OO;4!*K3YZ79{DoU(#n7Il1^U8IG?w7r(>)
z7nwW~7wA>UM_B%zTx>q>7-+jQp}g$0nQiYJYu~1Uw^vKa=SAu}rd0#)Dt@jJxg4te
zv#y6A-tSF;r`<gV=*@X5pfu&IGwArkZ>8h!f|gwX(C=>!91bQO--w7y&2$grs->J1
zwH@)x_~8_i+5%i;y0_}PlRb&sa<GzcN|m{5#UkgSp=^_gK<&%QFqtl{9`8;6`RiVO
zY-)9<ihT(IT@g=2ie%?|PWu=03vjIIV$0Qwy@}c`4EJz1;0c}L%3jwlI-OTFcAu7E
z37%FLRaT~6N%4cZmmj0$O4w2yK0e_45#4|jb{x^yMFx5_NcT&GhRgI+Vm63HlaUYY
zGIBr2@3*y(=Ag8SXr6n%lQMN_*l*FPj*n$ZHCAQI5;CVo(=BmYoiU01t`3j91!0Yy
zNkr`Q8XXEWn+OjAR{h>3c>1B&PG-==sOwU{ViJ#(q5&%PyrOGWrZ?_#D77q_UEvSf
z#yXQ{adMsaoq^_2PEiVI<G$6V5Lj1A-m~OhKk3MHU&+XzCl6YlT1|R5Y_l{c(p+3|
z_&1|gKTWT<(kIDdd4_M)HKnVny$DAO6gW!3kr}V7st0F&LoKqE5sC(F7%i8Argux6
z>r0Mnns0{+*e3Rb3H_;NI<#!I12##Ch%%Zr==R%d6+V^3s;Idt7qMaZDfSh~V8fc0
z4o~8a?-y#G6bifEecBMdY1jc$8G-STeijo5lXM&Z!D13?QS&DNi55Co-01ywu6rl{
zqfJnK%YkD7^QB&J@r_|<;q1Rf!7rE)ZQczFI|>(@F_DyK-MLFD4Ut=mXV26S2k`HD
z2+9Sv@c>n+ZmjS3<R^1)^SEKv2QScx$Y{twmOl@GL~R$*Tg>}(81){D4>=ixV-oS!
zBk&dKJ4D}~GhSXQ&T3U``iIjjl#VyFZ8sQRYaC`nR-&xlgk5Bn1PxQp3Z6iELDQ~<
z%iGRp@wd^gDxXnlish0Rw{pc)s}-983Pp$6k+$}ap$R4G8iRNhoKce)x^$N4;qmfx
zsfnS%Hp!k?y6_P^ib{ZvCI$RzLP}UEgFYc&GTkGKa?0*a=LJ(Vf7652ikJ$VX8NLB
zP7j4e+k4Jr*EgA%G&t@tn?EX3FOR(6lz8n1W$p*L)YI7-P^+=lCUfVhPUS8ZiCl+t
zo|eCR)>&_4@f|q9U~yW7eRdD=1vVB~Vaybnoov5}OUUVZp7-wj*fk!~BZ}Gx7mJoS
zw>)PL`%-wO{44R7X^Tq{i)YnTKAM(d9;yb}<u@U_Rwobk45mq*Q5B8Ek#D6MqB2n!
zEIe;2MAO3At%`(P+ZFbNDGkpKabw7KLn=n@=h|)5kX<)S{6}@Ul8zNyaz0uHlx<UL
zDc|Z4JS<+xxCscX_c%shHWVz++v5ET18qOncr#v$%rm)4OuNwpeV=R}L1uHbl+EA>
zAdB?e_QD*P1l;qh$fyEcd43^?eYBLDC4CtT0O!!U9zyWF&Y{vIaI=Ct{f|jG2fKv>
zi^MF{b+a8tel;#gC9+voM@FH~(R5!lR`%p(nTFYvl>ZjckjnUP+z}bCUA-$rms2<*
z^Q;OaaNSg;UC@{)i{ckP%&yjpFQK{=K)?ot01;|?WR)Qi4)a49A4Snz=k*d^m%q^k
zI`{WEWr^f9B&HsZ)pc>mT+Ioq{am=KaY$D>3Nwx0gNN*jLE3XAvJiSu-Ka*>M0%CY
z#W0O7ZVoh{$owJyBbP?kN2bXP9J4PTf4E$25U5(ick!4f#NyGh9y9M{Hu(FopEdnx
zui5=}K*>-GUI0`~NDegHr!lCONC%`MntaU2?d1F@??>f*AB_sp4W3qHFqKR|tk4ni
za(CA5HEHP=k-4J`DeGzA)lXGD7&)OI8xFa74nBC%=5wq9y^(Rlt<~w*AsIOcV+d^(
zvH0l}mhW<g`caSLgxD@56R3(v_mLi;ypUj%pjF|7BQHn>yZEV3Wa3keB_;4dk?lmQ
zv*mdHK+gB`A92Y`l>bo}Nf#eX7y>f~79Y36U?lEtV0b-!Tkwbq8)?(2?&6@uO{C&t
zJWnuyI`?f8<<)?n`@-v{Z-()tOVV=QBeEpWU)J4dp{UJlV)t_dS6%QN+7>>6?k+2r
zHWon~Aq>rUoy2qG>E`6H*f=It$dY+e8NO9=yt^MDF~erBs<PbVQA?Y%$i&ReR;ppP
zSZdq618K@0OHs8PmS?6{m2Mc7yb-C|z^j~fYF#-TNf#_1Ai&_us9VylC6p$U<_s{!
zjt1xzT)(X7(Z{mnROkyOQZkY`+JqsMVAk!g7+>N<Ph_cR+7%wZ-{0;0eNlP{%V{M9
z#<@&F96idvmd>32hmQfD0me|e2-d0Hc()8hWGRgTkFI%BF48y%^Dng_hkm0#BuJ!1
zyfLgXku{Mqk%5zw^H9FfuZwiJaEebwQ4yORFffT~hDtx|A$9uQQeC$ovk%uO4`O2*
zf}}2n8<;tr6d{%<yuoJ`P5iZHsU-v#;NN*KOg_725Gs3-gt>BRZc1aHOMOY<0F^w}
z%*`Q$StgwxX2)7|@b<7TZA2OI2%|P#piElm(d{Bm0iG9d)lPvcs<4wK5g<7IY}e&;
znhG)nazdYAA|T(Yn|m6VuMF{yn}FpFge%1YAM*i4iR2!F8EWIW_S_koWg?AF4ZSDt
z+QN6c_+>w|d%qSRUSu=bkz)+pTS=As<g1sjLx%5J=F_?Cjezowj>qrjpokTO>y*{p
zdD0?y>b4$B<&*-J5!A0=HZ4;>byZIr&zt(oH^kprm?mAyvWzqn@mGkg#aSg0!WO|6
z)tOw_5SiY@uxwg-Tyl{G?cdxQ2GBBO5Q6Fz(A`_Bm~z-mfjT;w6@PUqg!o)CKSiR{
z8m7Pz;Fl+nIg?nBJXjIm$^3AEDy$*hnvT20gW~0v_A{)z_=i;#RK80v6{XrHpUbH>
z7O6nyzo@s&7&$=h@?880C<v7mGPk9U=0ar=iMz72?#7)epF@>4M1>(&Juq=F9*h2T
zxfSTRQw@)GFYU@9#)q$h^^!CJd6S*7<l?%-Cn5L1vLw-EJ^w;BEX|myuwkgZ8=Ud}
zj{HD|WEt@o$N+`fS&eS@U9$I^Q8HInU@U!fp+Kcq-+O$2u0767|LZ4iCo_S*CBwNf
zyLkB4ei<yrt`&VynV1TcH}C^|xoW>N3~%~H6gFyXfSO^n0gV$coZf85t1_Cdf3|hA
z(aEqFX-Eg2`Q|pUGrWtjBDY=xHxtDJXL{L~$kyYSMXkw-Xc=%!=B6EPg@F;y{@e&p
zO0JtK7d~wq7JL?01j}%qD}5cwymI8ce(&GCO2h8J7Sw||79~JX_FY@UxG)#~1%@_g
zj3n@U#7WsE#UGuqQ;~im19({~jzot8A5-Z9X3o;W74mWQ*6#KSr$vWfaQ`>I;3RL>
zhmZuz^GJ#-?SraPFCYD7)$COz*bX=9_9@+TtXh@4{M}wnW;>`nAQj8eQJV(pyP<ZC
zhis_>DF;5qI0V5X!7#5o`+~4H90tjU6Jm%Rv$@30$22`!43GYen0nB$YUD?GsJR*l
zFTE9fdiFZqsc*omC6I{0oS(w26~>|J^>a)!P*S!Apj*Q-JxF*-zAEQX+-g~jFi0ak
zdlpMpOM`s(1yqc9q)1wSUzjd*s@kAGBkYJ5`SyEaI3QydX%;q9A)T?UG^m<3+Pyci
zNL-Ct{*oZwhRlvc_tM@jhV9?ANnA#*fc|0p<aXK;?eMy^+P|}$OyQLDHQQRhj6Xq%
zs#0m7AeOn0Tj(febai1!4qhm4gbDChv6v)^>@Nln2D8xTYu50xvHSg7bRpNbB5J(o
zx|z}M0zS5_I+YiW_9w+D<2N*rUCWj(W))4FM3l;}x;}<rc!kl2o}0qr{dc|~m>3F*
znPDbbkY|0}iF`x>+qTP?t2`)R{TCw>PYeP(u2_)D7u7FC2kE=D#flBoTJE1-NG!DF
z<mBS43YqSQoxam(x`=72P;HKw%IO!Qyg$3avoJy2lV8LA1=%tSx&B`kKsGOAb#0__
zX6kK>4CWz;vznPZh2Vp*6JK)xj6A-|O}QG!EH|X;-<0EV(_(maoxF;z2uRCmPb0w5
z>BCXD{nvj}g?UL8ua-;qmm=cCNtvM(wQ`4_5ak9_9&QhQA69<6UP~12-2eQY^5FY%
z=D4%H)ho?cb<4)|FO<?S)H$RvF3}=Ua6l@4EX;BDS^BYzOWKxqz&WqlDLk?ORRg;k
z+%?4IB&`OAv;|Er(N^3eV({BN(`7N-?sw8gk*$ZD$=t9VwTm5|FYUm!ScEJJuco^!
zC#yWF1>hgg#)_jizp_$Uy!~nkIJUXn!!+i!EC8@Jk5t+JERCvoIw*uo&Qn5>)fw>P
zV4DX|EW%y1$*f+wp($YW2{l-hAKjQnAX6{VY8mDF``bL%fULPT&72Md;F-EBV~JUk
zztOGN&flXZc#&QW4Le$eCd_j1h4gB~zF&4TLFU&O7KDQVH5^*U_$l>C*P`YLtp0Km
zVE0`y=mQ>uE~A8r6tI$pRW8nEz)FNwMA?T-E$AVVC~we4qK}0EPwOWCq|8J!nAa?^
zapHx~S}`@jI8>{%(?CKz>dJ9dnqk+IavreO`yo|<u_Cg}mQLz|uNnc`Xk2pJv|i`c
zvZRy8-vjZ%Arsj+tawMi?P~%H2i8jzfNq6g?T7Jv%XxuZPZ~9z4}zS?2+q7WDf_vI
zkX@6?;OH7~2N9WW0DY0fAf`A9J{xl;l4e4@aX<sg4NNYH&WH~5j;og0tUZXEzbw&M
z3Q}@BvZT3U>>gwQ?)OmVaRvDU6)D>tZRoI6PX=4A7_#2phB`Ij%+QfM{@@hBCeuQg
zJ1>yUR9?R6R=^sq6w=YaYJw1(YJ@Yn)OfA1%*-rO5$I;o@i685d)%7|@e*9X%k?%&
ze(x3RT3D)dy4k7AhEcbh#p8B!sc~iQ^{u5+qm%yrKYU>(nd5+MZH?vXzs5AW9?@2!
zGM}PmDBy(K#nLD9zZvSX(!YwIq9ix&<w_-8M`6Q<!E_oHqj!*@X~XkM%(?;%{a+Vl
z<?M_oBv4S@Tz)Hm<dLTxrcR|mhE-@pqJGEjmoYLBx*EI@fl_9Jzf|hh!L&CyRJMeE
zY6ZZ5nPy>}pH8m^=8}fCg;sshs7IipUPtwp%-v^ExQ5?s%3>z2BHAA4j36>yD#96)
zXxvF?b)rD^yrf=#aww%!v;nk65$#p|B-YW~neNa?1Z<H+HFOh(M8%qY^CzjD25DYh
zstnxdf;RKQ-z3#Ns{ylPVyX>A3{VLf5=9tgwq^6A<t6gG6tnx)SMfQz2sjv35D6*~
z`p8ro9fQfV6g|Xj#d$jy!#<XhE0lOVku;(IB_|!fPm}b<t^=E6SkZ`6PY7c(krA>;
z@Gm3|qtq*I4?jKOj1%_OQT{5r>&M7;J`XyRWC~VO4Rq;c7eG<+P%=e6`L_+Bno_#1
zME_T^?my0~V2Cq|uxy?9rk#82T*)E!8=*C(SR4i-vJ~ndf;R-cMxsDsX!3R6k5)Mn
z69OBWc4Ai~h$I+Eudo=+V{FQ>Rl5%e{D2%^T>sDUmHl-7e~+xEe;vVCjbe~LBdT8g
zjybt5KAc)pi7Y0NL$oQgh!O%v1M4GkmQD+QEga~e(v7B4QZ>2K<(+C>Va>pPMi^#`
z*N1RPzX1C85&MGWQDg79K=SIF;bI?e<~(%6c47)~yi1dXb+&Xd24e2DyYrw|0=){a
zH!^##N&1o2b)wYa6Xd>Gb<bhUs$%iqDhX;lzbFERC!{t9Z8<1Ddkc)vhRkKX;tQOh
ztqfva@^nKku95md%0Ly7NP3`hmY6QMR9{K}Y^|v-zE{%p$y6G`#1*t7A6$hfNYz*o
ze3?nZU*NnwmjZ`y$s+5u(DnDy)+m(d;*(5=s9<7ye1A9UM8`v9`Sb`FHslCCnb|42
zBSmTnB<=EeVZ%PG2re7wNZ!9BTkyBWVY$}ZpFTT1DE<UR+hQAk*^CWdvA=_T$qhqI
z$u>ucPUpNgkXf!~S5^D5-m-BnaT1o|*BpsEPsXV5Aa5^Ln^Z)rR>F}*6^mseqt-eO
zo7<2e6px7-Ya4?N`Hi#*Ue9yy)%$=@)qybS8;YvFpmH8=PiGu_>a5#NS)B2;ZJl*q
z99Baro+AA@P8E?lPHR;agMdyhxhcm-COs7R)MolYXwT9+e$$6ttf>H>{0M=L5FT?q
zUF-tEmoM3GSTAFq^;~L&UF!%w3<+lN?$?pHE|u}(rG9T-Y)pRxj?4*kA=FdIspXUJ
zl~wK$Vb&KN7Y%W`y{$6Vs5>Gj{6v!x0h4d(=@~I*KP-5r6MlX?^Q*m!sHzM}<9;3T
zRgfjeL68|~OCk>cYe8dubEgNt8zPiqdoY_+yht&;&t%sZg~XbfA-ssWfF<PcJ`{I4
zg*{mehpUQCmidn5P^)8xbQ^REj}sx7;D-8V+Bn8g1{l%MO)W8=odEx}W#kb4{_=g!
zD0$jgkVZFC37uAeWx#V886=fcl`FD>)Vs$-i@b{hij1*2z69@_@8$jR{abobVMb8S
zqDWysjPbE^HVzdA_aft<^evv&WmhTV*1uOeT~Mkq3ODkQbWC`e%)~k~GTX)VJ+PwQ
zltPhu9!DIWmlIWPA*c#wmIM<P9SNPW*H0;0mJ$fzL8JQ2;~UY2Egu?`Pw?iKPn$#B
z3eA4ZACD=`JB*dO$_n!`X;3F(J^1rnVZTgX$IH`c>LR5<X*h=y<-$CV31dl)h^TkQ
zb(#Sg!vJ<@5PO@buDe9jt8U@I2<KdpEx7m9+wu`@2S`jP(IOI#PdfFQ#`;~{AX|5l
zy2aVY<&|cKRgnT`XD)#8+#+g(pxlpJ3iQ<s(|E(frc_ayZeAA9ZL))`R?+p#T1A_Z
z6Je)ck-llksFx4h16~XW(c`kz3~lxvA#@`e7`@6BsGg6~dfNz&_#@^8{Le7geO|Lm
zOC@YRwtK)F=y6<mv<z{xr0N6akDs>dmulg{8Su}bqC<aBhG{Hr48<;=uWA^8*Pv(9
zxgVHc2sWPht@?q2G)CE0!5#F!MM7wa-%;t`zQhxertgPG!la>{%Sb{_u14(fhR}h$
z94Be45iFpe1PE~$gDDoLAC%*U%@BuZ@2sa3=px@0m2fer<-oV=>$?~D1^F|JXOa2u
z?Q;!@TqMY{3$f}Q4|B-l=j>fYsFjY@PWNFi@{FoU{G4=?`WUT^cuejo<ZRZ_rjxqH
z$7bx%{E=ar2SBnsa2le`;#uR5Kplu8j>5$>%3o`p%B|rEiHI`md)j2WDG(<u82h*X
z<VT5RL=r(~16AwHvmrCBXo#B1#mbSwtGa+@h+xm*eF-=S%CzStk<QKISs2c|C7?ta
zgacb8cNq!0F5Uz+ZK~lDH7oJcAn;sWTLfd|0rG?_A>%J{mc9yxtYw;>$(^%{TY?G=
z`HkuJt91r>+AEu9tpU<PTH2vNo#NNyM1d-GMYm>BUlqTCE&@vG8;IMJ(Q?=eG(wkJ
z#ma1ih|Oz9`bqb46A1zUD1mA-;aZB+uSg^OED_DyMxgYIrGdaD9B25EVujDW2QxUP
zjf{gROjLdhRpv7;`WK!*mP~pU=HOzhm0O+}AdyM~wDkaM3Y~>sk=qFBNRa>re=#TO
z&q>urc*j$z?x3-dL}!}ROHPjL+i!j+TYKW;EJz6M4pTx%jDMDuKSH#)!Ze7x9K#?5
zb(VtB)le&}h7{ilr=5D(z3x-p)3M?UDUfmYmnTYH$i`LB-ea-ieSjXK->wmNKC29Q
ziQ=<rPE->R7C#Yjhk>#)+!wg9-xYUUQ}^~DE+Rsm&$QC{@N$||U899;&WFfZx6{(F
z;PPy_4D&NH{y^oXf*YPzPshjSk5?lky+sN(x^UE(Q)0utm!vA+6+Oq}_2mL@wo%Eq
zWA3fYqF7~{uywH$Hn}^<phHZe4nahg^9@U8s@AwIGUON~vRjtmHwM_`YRe;FvqG~l
zPLQh;EJv}Kcddr~Tf1TIZO-M0MR~-pENo2V^0RbdO6Cq%b~+f(h#(--Ta~~EHYHCT
z)<>$Paw(9JPC6sfChu|nftm=WF}CAT%AAth^inWw7(C=&Vuoj`RVr_BL~9LmECv>-
z<h}A`*Du1sQw<HeG<7wB0SoAi5%2F06jI|LS@P00j|o-$P6!ry5G-X!=C+(C9Al-D
z$>#DneFWi6#`5>Rv}yh=K<<4pOu9LtjAK1QrZ<07XmvkZbke4y3fQ|AL0Qv*^NqUL
z9g#fw90lCOnuopWYFy6!DBpjy5C;25V_`Fep@hOjMdJcRQRbQx(6wk37wO#QB@d9j
zl}O7;6lyoyOu?>^xT%a`q<gPn@_q#X+OX;VmG~m_(p74}a(fInW65H94qC)i>o@jc
z;jyff6zVW<Rk_!jHw|1qWc@mgr7YbqE<5rEC`goPb!I1WIPu6DlZXedeW>%;vDC(G
zO!~D5MggAv($JW*V)w}VP%TPiy2=}``NDp90WV7iKA+@xanlAYcuq--QD#0rg+32R
zlkj8Bo^?XdpG-8tWpvI0;z(~45W!&SQY>vJ#$I(LN*Ra)v}&Dsj&J^Jc1o{3kuOO3
zI;Vuy7=2A5zI64`GMq|L;a5+56Z+Rq*w?c(Npr%nKlbf5zXj>Y2J}z1zgEYp-3^cd
z&aW!DpBu;t^Uz~oy4enT2=sSdhG)N-S+}=gHAfK#f4r{vjDHS&J}z@P^Sj#^t#?mI
zRYm0Xkc0WUl(x@T@KnTVl7|6A*59Q~ab3Y5G5*)_U(ump%V@|DXz)Ad3_nt0IsO+u
zNVe<bkSdrc%}xS|dxc?rJ~0QBwl^{b)!r}b;(POJyK4khP>g8Z%*Axdwh7eXM}QCW
z8xkxG?rUsqW(kDYt{8RS7^q|W2;gXKqM#rju#=MuUfacytZr77rehxoZHgO`T&>zz
zq?d;r({8U0$g6kfp?!VHUv~IgCP6Ka1@eksH4Sds$&I87Af+^^^6GX_>h61qM};t=
zf1(j5ks2{>c>UptX&$Zq8Sn0&uUlDiXDEwC#djx|y)Z6~P#{~9?H5c4UHSMG3WGtw
zeuMXF;H@*xg=HxVGe~@i)2IKnB{CyImHF5vH)zZq8!cWlk|eJp;(Ol=zxO=@d2ao}
zjRMUeo)=+R8~OsCOXRlnq09?J&Cn%EL{*Pcka;RBw95nhzCFq+f|(_AwCka)P=d$K
zD!M_3x?n_PBu|c9=h|;-b36B{==NEFKujoheLo2(H%td&qy5_7&x05tb6;qbe<K1z
z>P@yk=?q*ihc&{=H{k8YPAl(Q?eHk~p^nX1_8X}PM@>?La~-saaI?ApDFi+;11_^%
zw#%fc=gBBUs1xArSF5f>Mu=}r#3hhO=aid7igI5aBF35tWLmA}%IbAxp?ly|?f_ej
ze*pL7k_Q9`7H^nez7IdxTYPpS7!ZJtDRu0z9z(6~2)M5S&b}G!1^&Ti<hfKK!N{&r
zh*t)jP8_V*YJy-9tB17}^3+J#%4Yi&Sn&|PN*eK`*K0QkOM(3`+fwwx{Ar9bdzZv7
zMWq@2Ul`Sz&}GlE?RZG~%QVBVqmgm@>zo~cnyeZ4)IABAF^M!2t|Lrff+0kC>x@@s
zia4pb&$Z}uzK|w{BoFiqwL@@2`}n$eb4sEN;$%v^*6CF9Tu>}EA8l+?P-e&{0I!8x
zMXxiv-IEpu<7o=$H)cFwO28OZWZndBa45D^P8Eu2>u(jGi~P}zF@HY$DV43>Qt?F4
zL6=JJg`dht8`rd_be+r2cyr7ie((5j0uhF{n1A@WxXuv1*gT7a8XgmJjNzIY1yl5=
zn~w93e`*-!IdL#yT{$HU2fEFdmcqZde!n&jc^-W{NP+kTD0XFeb;Ed%qJCUe+G``@
zR;bI058dcp4xkAmJJJu5tt}7d?sIRH+qjm488>g&;x9bF`))5f0oePa2MRpXLZ|^Z
zRb7mnP|<Lhsl`kp3P43{=}cHeXi1uSf7Pb#32F|U!fvO1H7Q%$oFsxXEijh`GT7s$
zSzncgUL7&;)ub2jp?WhyJ5`W`2TLg~cNY3LIk^q1)^4&fEB*Mr5wk5{#fmvHeJZuO
z)W8(pxxp$vRfkM*gb0c@3=w8FZ19-h3@Plh+*4E37>??<O<(&c%D@P(sNVe1CsBta
zC7i~y2eLqT1y~uOM5MiyEUS}||6J_EW@gKJzGCdS&blP7_tw39UMZx-$p%pl;zXlj
zOT=a@RK86ZL^dF(9;;6<aMlQ;f1{LmT7*p%%(v&IOOtJOSzn)hL~fq8onI>2D~rh`
zWk!Wmq2S_<ByL6h4fZ4Zhi}Dzp>lWH;k|=*Roe*jZ~5it&5cM?E)K>bi8&@LLEt<a
zI+v?N9=Zi`^2KBZUIiLr1_LI7Z>C%#j~`eIpY3q=<9R_6#F|6CYnFLn_H=a>pG8-B
zddi4|pry?sxs+e%ksy|@GAS~TzBzH_ZFl_Nm^U<hh=|&OFA#8SS%<x25$Rr&OG2YV
zqNiM0?=%pXeNYBW6KRCFJ(L3)^IU$EOcM$Sj-ikqkeDlXE5|>ReAl*0v7ZlSqw1{n
zn)~JmJpc4D#|-m(Ix0Otbm=OT2QSy^gxjy1pQT&U9i8!y&#pc%@jk$iA_O{uPEjv+
zx7_obg$&mHv*IKs?}Ap>OKmDFaV9z5Fg>IIqSei1jf+<~LAG?pw+J?Z1{}C*+DElW
z>Da|o{wyV#4bkGL4n`}To{q1bj$=Mh%X?nH{U#Vi2-IAN(cY@N>X9_rJ!VO~bM5gi
zaf4a1nLi^s?lH5GkJwR-yL2FQO<S4vQEt*<F0hX}<1#csuoO!muwj8$$#{WI93Ha9
zHHRUMbE@4^`g$u+8mE?5uPSrQoJt@y0AX~qQeWU;Xz>I%&kni&@x3Cn)}WF2f@(8f
zO=da!)~P_Ig&hsIYw!l3%+*m2Er<;=tKjpQ7)jS$L5T{2#t#W%QQAi*%^&ewiM`FC
zes!*Djeb;KnoLTk27#po@22p#=ZVRIF?=L2%Ct$?-_l@OTo}du**Zv!N;I>8qCb<w
zK7MlK&hrdR)+^J{r9z>rp$qLhe0BuPZ^x<S_z5Yvu!)gOZw*J{>cF)}?Cw6cTuqcA
zAO^N*yP9lF1v)7CZOwV^y(<YL@qhf4sva1k|H0qH?HATVe~sHXZR{UJ5i+s*H)ks|
zC9j2f%p29($zUHx8%>L#o{Ssu6rY7uU}<;B2#J2nml=VetRbkNuW*ax4yFWjXTW2)
z;1`0eLkIWr8po_>m<acrC)xKoFOiMLDZTr)1<Z}jIHp5}69bx2%O~-4Vwi;kW-QPS
z&i|Ctko3~o^;t2OWyG-lm&L1E{ARC;u8_yRG*YE;)$16|#L`jF5GEh=o17UL23efq
zHsw}cSV&z72C<%x*%|E?S723FFO~%|YyFbpQC+eE(eJH`$fz+bpwp4Sx8u?d9mC0T
zIaSnEVpvxs1~+ciFFvvWM|lpK@(rtLKJA3lML7lS>xr&FC#jJwpi_L``>C5mHY`u7
z?!N=9m|2JQwG)eDS1?A%utA*+G!KotdK(I_e_+lZDXX&+xUyAL#JwJM|9fEcHj8aI
z;S1im=>0&O9DPe)3Ku%UWb|{qeu>{Qd5x!6t7zG>%JYA{9=<1vzxbE2<rerjK-_Op
z>2oQ~3915Z`2vJ9dsj?+d4_K#qk|X-ZBbv0dV3`m?-0uHf3v<5=U+4@sx#9je0fq-
z&N)|WV~k9ZZYuC7bkuhw<=`4~jt3rSRFcZdAu<fkI~?dgi5FHX1DkBVvF0TbQMfwV
z`c^1wvNQAv+Tg=uN22#D`rlv7{Plzs5dfSHQ>IKqfx-5vx+Zz^oZQ${$O|Y3_jCd)
zKl$TSePS`3+1FAr9c%I(c@5zGsf@EG@2C#$Dl!N|1sFeGhkum)!&fhoFz&%WOJBSf
zcP|1!q?`(MWweIwvKEzgyHiGL#L;sC=1AIk$dYf8P7LZ<f^5(b19!iGpe}$tZ!HxG
z)WwXhtW|ZDsSCG&a5o!{!ANEm@YY9&bo-}=sQ<+?aMM~gGiGd{=&*klG$+6DZ`@Oh
zXhu>9;sN=f10Lq~2QSE3kMgj+$0~HfbIh~-D#we+r}?+unS-k*e6?w-(LLO60YWvc
zA#smJES;G1d*1r^8x7x}BRNtaNv6II(TF{0NvM@qO0K!^DtjT$VCn46@X@C#LR&O|
z$_Gn8sQ0^DkL(lme}zB`0{_GN01*ivw_iumr96(c?1xLUCLJ_I8UH4cZbOK`-jO82
z1W^NiWzZtVL9Jjqc3h1aOL*t0mVuA`%Iq~;)TqUgl_XTtr<?;lHr{&z;e$#c(YS%y
zUMce_cK~;i@_w5C)LK7)HKGupkb5+EbhJ)|8`$*W-9(gA|6^J@?1>gPE7-(zwFveQ
za``Asf}eBWdj&b&0!w+PNb$!G2J@R;N_qc6dN>726K+3NX+e^z_{8d;_B%;D5#kpe
za2TaVXNsmp$q;@gaygp-y|%FpU#_JNYZ!kRFu~nh3pNCh0o03bQuJDMH}*;TJZ_X2
zlE`xth4+#Ayw{Xn=kmIGi>GBMMkRkjk+M6ORD$BjAc#sR#Tzw#&#*mN3q$5%uEHFr
zL)4_(V^N}Z`taqHi!h<0Df8Pyoh>gyV!zYWU!UEP;~jGU@YFX%QC}gXNdn!m;RJ93
zaf&3C?CWW>1t_s)3>qLjy1Uf3>Xx{7F}niw_G&(7WifRTae4I~xaTj{E?T@6Rr{|$
z1tM?EdUM}+hTI(LZAhuVoTBBPRv?a8h7ca6N201Dr$QI+TB|7_h*!|hV|2C1@M1;b
z!gCCzG**N-jo>NeJIJEILji%Jh9V5>Dm1@zcbtv3^p;VqS#ADh{$4ibcZYmbsgQ8s
zcQPx8sJ(h_k|!6JlfnR;ebQgyXa~B)E<Tg8X)5|*Bcui++o<QG`8|6t-wg|A%_2%*
z?ZaGb&%>om6a|0IE?gD1Uaa|hQx|C7A~tJ~++hMS%2_@MhL2m2zo{?Ji>@jeV)O@!
zzKTX;WXIe>D6e6CCf_J1<cbe`g6+h!`ZIwg@_Ia5hyO4zkhgUx7bJosJHxD0LcD&+
zHNj%Q-bdL;AH{>|N1(tz>5|~II38BuC|PB1zlBVIjSTSguDHw1_IxU5HP;@|ux8v0
zj+pj8@h1KLaudhOoVXiu#Th@9$4$_+OH`Knex2_eVz={J1L3qor7Js?;j0KHiiK6>
zbn7bW&?+k1U`V36V;;fZ#@oUieWLpT=VOxpB>s}iJ(|BC=O54YJ$SWM@I%=5YRB+T
zn*B>U0x$@ZghB*{vS@?=8I_`QZD?h~-|+FWd+V)pv-4`xb?W+L4qRxz7xp<t0h1&d
z9S9_oHiIGSxz1@`Dx;7iF>dO+XP~)IZ|>KwhE&@W4*_hT<3uk~>h|!{5KG8Tkgxmu
zJmI!TRd)*`ciYH0S5Ytm?bIr~rK5%Ji-o!Gaw<K_v1Rzs4A9@ZhQ=drNz9PekSNI8
zQDdYf{|q)L=JQ(W45;GIq6GvxZytH-Yl$QEUXW-(&0wmIgm3x9N|oV~G(Qy=ywN;0
z%4<#go^&CW`Xw{RZB}O#eIJ0_NL&g?NYj@8$xzPBN2xZsk?G3M6kSVQvxpzwCz3vP
zy#i-$G?beLx$&LBQk77j>$L~r^OzDD(D@k2@DyqP4_R*&7KQh=ZOg#WDMJek-Q8US
z3=IP)CEXzc(mixH(m6va-6Aa=f`F8Sl$6o}67T#szK!SiZLu{Q90#mf_r0#`JXx0B
z!xZIP%sLPTpYnue_zGMxrpuP?7{hRp*-lw(YBrZqvDv{f)QM1bT2n_lA+MU9G$wjt
zz?mB?0sm)rzMN5rP&-+RdxC5x$-~p%$36V#JWNwYk8UI{Cb^(QFsIiY%mc>t47%W=
zH|kr1feyQc@UERmY&K}+MF5}d-qorF0<%bn?A(q)Yqc1$#_^(JTWk9+Bgj~D78WQe
z$*Txp1j<p&vk*&et8zV<ELZ*FtDh8uM|a{iUil(X7id77Vqgv_Eg6rPJ_>oK0i0$0
z8DaAc#B+$ZEpF^gp2;~$5#(#Mf<WBbe?ueTiQ%C&YaVSBpr#m(nYWs(pWv)Ly9Da%
zHOe^6{@Qp;BKUwDMWe~&p7(F6UKk~hMQo$u%&B=PUb%+<S2TuVhJ<h6qgu(#(eVki
zyw4frFu6{@sHbNA?Qmp`B>`M^j|>zhFpyFDcX34E6*RoHng&Y#IisRD!Eo*rlBzOe
z&ySihR;s~;=DwYsF3iAqlWrFydEe-Z9p*u)O4na3uYS6C@Y*SNHP7eUOZcPJbmZpK
zH(9Zq_HYAyh_bG$Op+J%2Z>?hPz^|eGRwRV+!<p6iy+IR3vkydXfr|)^^8lw>vx+f
z*&J^a&?|06*L+jUIx_N3>$EY+(D%}&POBv*Obmh-{*{iDhZ9d3bEq$oj1Pim9y%2Z
zfQh8R@uH@9t0@WVp7b8^mQ@@#4+LML-59&P%ymci{jpuH%eFxP!%3^dO%L~=t_hQM
zPHd8B&cbw#tA>G92-RAfeSb)DfSh-0JqO??7!_*YmF4W!`jtv(l}-S_^?LO$&q&@v
zsy|}Cu#gFzl~&6sKE~41MQlnRB<ZHfQ*L6RJIROnmY`Hv6*}!c2E1mDJAi)62H9@^
zTK{Ovuj@X$>HZxh{EV%O;y=<-Muz2+`dNVW3_NYY_qo9AZO>-zR?<h>3v>T3@WV^0
z^Jvi$=IRAmc7_h7Vvwfg$dAXDb@0C!`BJ1jmw4y+p{cOjj>+sM^o{rTdde!=rN8@@
z3LbQ;*Rl`8I~Di?4x5m$MQ(CBQrC51T5^w)zU$xh-*ZQB?UdH3X44r>!GIieV)02f
zsp9GoSZgDX(;!&iC-k-zgj1XYLqn3p;~HqFMlv`Rwb<MtG-O(0ZaaQWiMiGh^NY0(
zY&;FZ+_&P;MZ*okzA`MCGe%)<%jqAS>l%I)9gkq=_f%yDt@EbhtnTdgcPnWP<%IM6
zGYI`#$Wcj6I8TWq=5?jcWDS-BKa=3|(uDDP32GdCJ;msshy}|%LS9w|Kt36x-{Lo_
z;))G-D?&h=1rz%wWlA(zKUHnCYj|T-=0X*!X6+jqB}plf{Y`9=vA6Ztkv?ZHT*29=
z3OF`4BVBuh)jPUf3hK%1RBjk-&M)@p5cu11DFmy{H6+ynFj_0Uz~3n*e)=iG5skAk
ziFh+u_5|u2$?@|~FdDW@&6Yo#!&()7aI@~EHCHd2sv!MSGHT9xjyna3)Y@0dIcfIz
z(r23RrOEejTr=ZeY!FIZN)TjH6&um8)PbEZ--6miN&V?2{g}m>vVk8I<MFtolK*!j
zbOXe*B}VgyrtL?va@B!=Y;@QBo4Lrec$c(HMj_B8tN7DL5VmPnyvx5HNq;x?;CT(N
zon<}i@XNvlZ(9ABgMa9M5=A+bPp6IeYYiY1L%-~RPYg17+7^jzoHr?0qy6Pt6(J3O
zu#>iU(sB_k12fwn-=qA2RXVT+I_urzPE;iPd?EVq*9Q<CJ16#Ly%}Tk)q(o!Dv%Mz
zy|3EX`4^&pWlFGU{H!wjI@y1{8%EYef)sofui|TIDq<NH$WItvQ|(<4^F00eCujT+
z{LxV%SC1FA(`wjU?yKCwzpdw#Du_2ja_I0mkd4){jzff;G8NbDX;y#MSN)sH6Ie($
zS)Mm_j$i-)b2-8FZ>dUF;B?8q_tH&0&ajg4+s;Sl-NYIbw)!A#!l|PCbbhF)UnmbN
z$PU<J+B*BON6jQfNiiJQ35)sMXhO%su@(X{hSyJ8ab)mvcvHI3A!Ik3(bLeK1VdG-
zNrW|!uGaktd#lCHWmhY{-@`NXQ0cz=5OPN;Xk~R?D6w`TjxDa6vSC(nB@IUt__mX^
ztYB3or%|pWs29m<(tcKwL|@$5^kwtEOz2()6njPu|0Q<a9>=MYGyhxw$vh#&8pG*0
z6U7F*HdyJ@lmR-U>>JcbhxL+OlHEj+3PT}X!2|w`oN~4;Q)9DOb-c7&SJs=@U_aO*
z1QoDG;cMy#8`=Yfp8mH>b?&teEM&!|B#jg$i=VdfAT|gG7k+};LFCWten+-Sef6GJ
zuv3uFw`AH@wj^x@tp0eZYurA#(uVH6AN(hlJc}g@$G7{H%u_cNEJd$`+5)GWH4ow3
zKk+jRRn3i+cz^&~xF{yY<^WmP7(l0z)0D7hV0N9{7Zb#CN-Z;yR6@3_kFEwrlIyV<
z!+F#s#8O%eu=e~C;qma%wkfX6JNY7EMEE!<ntW>40Wo5xzv`te;qXDbccn<U`}J1A
zng#QqF3C%TpJGzDBoL^gTO|KduQyHXkcN4LYLxPd!~*ntxEH6QqDoH+bi)Epfn#hC
z#qxGbACkpx+}=`<aE#T(9W#WP8AvKP&69H}&n1wXsl+K#^TPV{RF8K_Qp6A>&KWfn
zMM5lq>K?FK@*hTTN;;H2id}aKzQUptNfaL^fm@N1N<Gpb9<+o_AIlFv@clicuEoS9
zOhid@-I!^Jv{OkX_G1ISfRI2yXl~eW+-ipLy)QM)fR#j=AsU;JdlWq0RUztKPqoC#
z->G(T&6rU%jf>sco0p+~Ki|VeX*_`8QKf?0>C7fM=mN7%d@GklvDsac7Xpoc3|eWd
z;WJ*591sdRl?vI}z)5edAQI1q-2*)ixpWWeZM^IOm(>{h<+YrypxnD2dB}vV6z_E~
zO>A}qM_JUo6|7~{ouyFn`1v?HaL15$Xc3sE_7OIcGhW@OSN`mQij?R22lD`-Cr%a_
z^9yG%h>P7UPnefj&+k}%jo&9Vl)bD(aA||p^*)B-cf+Yb!o!~KH1h2SEXH`7`8#gI
zIXhafRQ9NC;rGIXFD?+tO8yI;O|x_BWFRtAb=keBhhen`pH5X-w!AXj?G4!)VFha=
zCRiv&XWO^&vuXirdTicI)&e~KZq<`PpAp}amjR!XRGQWeRuyY{!WkpM^Gn}hPfj5o
zkBUEr0##TOB6rOCDI4C>36mu!s~RXTTGHVB6JPk9<G9ssl-96Ukt>$DM_5e0Ze#bx
zqj^e|J{aV-)-gNJ!=5Ij)-ptS$Vz)<b6OIvE|=vHe)fv2ijS0pANq#`|H(60)2yUI
zK3Beh*6SUVSlXnGi)@L=lrUcv&lau!E#S6_1}vF9`7!npaj8d|RdxuMmhhpMrV0it
zPps!*3h9lwqZ4JI!MGY`E^4;57WAr5MiM#sJyFD|bh(LFf<AIckP+fvfcF>f+fakJ
zR4N)Wiak}Vta#ig0&`w=%RNmR`Yn=+vU>$ECE+OXqviw|6WTa3%$S{zk{rZDAH;%6
zoT1_3t~`lT?0Gt*kb}hQp{*Cghn*GNXnY1diNGr*5dNYsdi5HVWH^Y~j#`}vyIuy1
zCK1Dlgri^a$#d7cNi&{2$O7znj=+!rd)i&p+05=0yPP!k?!C@#LrSjzRW9r``6u_)
z4F?o!n)r~oCiU+;0$r$?=4u7VH6T5oi>N79Ms9+6ZT@{`^hjRd0yYhkq!av%$$vpP
zl6Ah9E9>McY&cH7M}sl>moZ+{RnC=JH)Gf-dpN~o;JriHyolRjWg65c=*y!_cMs=4
z+_i%PQDQV9oo)4j?rzKLUOO-M?PR7|y;gKPJW5zG-~YvaCil4mQ5#iZm>HshIB4^y
zU&rrc?ys-YQ+Y6Ar1)GZ=AsnB<fG+tnBR+coF<a+tl^A(@$*YLSt?3+H%k3GaYqQm
ztDrpS#aprz{fXr)G<>qr(&})4)3{jfj|5=z5TY@27VyGUu(#0B{{{5A0e{*lZ=X`d
zYnp#RzTOzSX646zwQ4dJUWtpTRLZh_W7!CNxjIbLZpr_UB$gb$gdtKn&Sn3*_{lat
zK_G)|7&1hnIoK;$7WIq=V`xCkyd5mHPK=?i<{jN+{d7k^xe7PXk*qec%&wbb%Y5PY
z{w<a)M=~$x2^r?a8jvV1q7tv)&;Vcqi8SE&{AxGoN7RQ$k5^K-5hfhK)78y4{@7S}
z@)0_mj;?5?z$w3CpeFb=R7+WEzsfyfBj6kSH^zh>_fnU4ZMWmK>VtkHg;8pBLR)r1
zWLOyE1DOp8*0)~4F9XY60s7qXu9~p%8Z`~_WlB^M`d_WlE!tMZmRR&eOM<*0<au?A
zeVqxi4Zq!zO2K)Kck8uR-DBiJ6m<#IPyT0#$)*T1rtO$42~Cv*c88D!I33$uDKn9x
zRF5-CDQX3{45!BM?)zQeo19z~6f3YI;IReIaZIZLS`R{{f-BSV@HEe+!kH;QJd@@4
z1g(F`(@TL(lekOqA$S4SIEyvf;Y>o4s*X5=TcZ7yZwU)bcU->t^T#-k7jR0fFFIq$
z;T0SP$RsA~WNgRH#GT&j1q1`W4|{ZB^S~mFoki2AP33N&PQRVGuV{@*qZxp-kO`7u
zpQqrmC=a+}bv^m=#os64vI?zrmab4!aa4=yj53g1&t0>5s8>$BCz^2LmHLN9b<+09
zd7&5RlYhO~kZmhnuQ7lBF^%Vi%5-b0B!J5JZay^Vs`zU3Jp&8`CA545Mu@zu$^z3s
zaiXac=k51tsI%5Tf@?tjkQM1YDQ2jPeg=s=UKVERHrX2G*PD5?VV^$AJ>t%wY4?&H
z>(#>~ha7C~mFRPZa6L!2+mU~ZY}j-cs?T%hf_8GI(HXT`X@?BO-J`tB%n}cO79g{>
z(WG6arXwy*Q@C4VI!W6El39e+i<{}W#1<OJ$;tC{@R%nK7iv%1ss3P<?c(|GxssHQ
z{}BoKZkOlCzyrdK+yW|1nH9Y=tN)lsMld!l7>XYxE}PLAbRp5I!#tD#OYLq9@DU~`
zQ&u=DR&FWD>npcA`n_E-Zuv$YSDsiNzIkxIPvKxfT}^}AQa*g6*~1{G5$p-0dx3Fo
z?CI51o+juKGil8A+^z*g?e8DKbDBK+@SV}P=jCAosXECbXJ~S}c@fR$GEbbJDPKbt
zaBtjK^&RMVFi^ImB2#gJ8g8EC>9Kmd!t0Nyb2L;k8B7V51p9!wz#)(HvE)dy@`n^J
zD_M*(<ZH-%RT|^Ou0_+~vQzfru`^9PS@l{L?$Nm+$1_8@iYq4E%whcp&iSR1U-AqX
z4EkQ(%RwHGw@R%fFV4K-w77e57kn-hPC+WFUzBg{#Vvw=8C)jaYePQ``$wKPD^Kp>
z(=vak9PPDFA8vxWQNQte-(tJrUKc0&M$Q>R?VQ(G9;4<Bvp<+-)$kTH_-AcTbkO`d
zXZxF%aZY*cR%%qNzoE@VMJzK&x_q9T{N6TXEkJw)4BM$J_8yFk&1PXV*8M5X_eC=b
zn|9o?{|~S{u-g}$lf!kHC2YN&dHz9joH|olDXZ*FF>LtcLtRP;D7@KYWOm-e@ypHs
z)gPcjnqvP8X~tPGV)uXX&~3&jzkAz16@^yZT*e!Y5pK(1M@g~xAv0~QND7QTJMd_>
zLpo_oe;sAJWnCDx`9us9sHUK&93^jynw0SUU63X<Pjv_t{!x4lKs)s+LsEpvL~BW~
zqS#3I^4Cq+dtE?ze|!zjuu+ZSFXX|~ZK!!PSvhc^Q<H|;Y4FIL%cO-U+#4mu6|jaR
zNZXtm@PCPcSAN==1yCs>$bln_z$u3hezUL#VhC&OMh#f{k(}}k)=ZLt?JJ_clSddV
zs^LL0*hMzfA$kYH{D-Viof?!h0~V?PA|7S^LAis5z-K+CY_%{1FV8^_f@n{1PO0cH
z{=hm&`s|`I!yR+^tSt{d{V}qnEDsD4CUm<So99lIP33acD7>^XLN)Cmm);+OUO{yh
z8ATLYHCt0A9N_N^2Qx#`GaV5h8E=S;-|Ys~`sbFkbg;PM6vRaRq3Y&=N<=Mx0v?h<
zR8h8@=YcKrrW;VCu(mA`C&4^5yM}IgNlwY<oYf6xWcHn=?h0VxPJ5hd{`I1|P|PbL
zSx#d1R|u#TJ_Fqng<!a-z9=1c)pHZ%rhN6+pbv}iU5tZVtSa>!roCClFPPO`;Qk7k
z_Q~ORH|RblPIS#dBrH#$o+@VjIS(~Ik~876kM5<d+81SY-nq8waz>sVRfcruZ1}<Z
z9UP3tW@Yj6-?`<&Hu}nMQ{N<Cqwby`Ni<%Q@E{~s*$oNS84;T{;f`@B&3-N7cYXxR
zic(ObXol|G?gViRi#d4s9|`1<B;(IF6aVO)yWN-XY3mKV%f~0QraUx;h;kw&XuiLF
z;=AG(1B1mYa_Ut;&IRnn5|B<}o{>!1l*|xfiz@S+2=?_G8Ef0zwwRqQO2tk{p+_%*
zUw`=nqFu>JrA9GXX5OR#-P#)c1;md(#5v>Yb)<Io^|p+(BY#ayLqu%(#}|CW7VoFW
zD`P@DUpV7jotqmSiyKO0l98`Y1A41ZE{(fsP1iPfUETjU=bz^D+>7ROr@NIIh*~P-
za<5@{K`)wndSoB+lSAx0OKa!j0D!Chfn4OdITdLk#qG23CS#|bkn~bXg?DW%h7=t2
zr;fSYaGRJmhSA5uIfX`^aokJ}zXZ!AZAq?qzWTZXJB}XD=wBUrdDVx5m55QtrWSn)
zgf?2e9t=tBIPyIrc!^wxWRxhD6q;Qx`hpm3o{{O&CENU%tlAwM9KE~G?zR-?*bWjW
z#!JOv_qxG{1MEiR9y<qNuq4A&I&aL1Gjwm))|wASvrV%go*%S$DU!UK!;Jx^!OVtK
zJAK$`D0-->ItPqfaI32oPZ>LeE3a@d#DE6Xa5F@_qv3(Gnm%&+RSh|(2hWC7zk|j^
zH%0i11g?lPxkD>eHW(VUxlX7%N@NG*AIq={t^Ln**Ih}pzqQgbze=8a7cR!Jc<Qjs
zJYb#D{bstRr<6LOT3sL-?)du>mi9`yLQ;ZbgA$g0lGe3b(ESdq0zP36?8y|wvwbsD
zV7W^j>r)rwNWuW0V`Vkr+`%z$;uvPFF`K3p+c`tejfO?PdrL4y!5vH;3=sj2sjjlU
zr%)wTMij~hj>Yz;EIzylK0l!=tV2Q)b*@q*U3-fELwrir``_r?H$@c6Yk-5@oGyyr
z6cP0oXL|d4pM&&1V=|s=EHO;=EgKU>?iOQlFkp5d3a6gI5Jh>>UMl*|z+wb8IVTIE
zE|n)fB)<;Nbkd8bMrG3rbXCg^9$P4|`KM8tO+dp)1IGvosE~p-x=yZO3)R>UBOM9j
z*%Aox4wX82zaBl)LK_L#m3}^3=0#VXQh@~Vu;7#-4@Mgu+mVE(?g~b~3sjR7R@h<Q
z3fl5P#E7Vp&HIttOfX?MH=wO0>6sw2N;cpQ<4R3O`R#jQGveodQz-WTI4%oj5nC!R
zf+6CRfjznA%lM-IUkVLe(nA1|2zXd|M$f&Zz=-`R*Wu~<l}0VZR9E`N&DO}TOqi@f
z$yeSsCZnnJ!VDU+SDCrz7*W_xO{ed{ER^b$_ph)$QNw&ewKi*fO9zAg-31JE{~?6r
z9ot*2*9Shvol50YT(2Pd**rqtL|4`?Q`$$Ykc8zt7<B@YY@c5(7f9&k8n9tp_229z
z9jT~)u^YaK`5pbGtDZu|AXeA9v+hszrj(~5vr^!yw?1X*DSNTp_cn#B$e%z!e2FYG
zf$dj-zj>Hdex|aN?5*(1CH<4r-Y&ZV*p!vW*!o04t5Lt0-l%tdj}$=~%%-&(0phXD
zESULXhg9JgaKO&V5Uo=fz8qT3b?18e7utMoLZhQxnfxNIE%v6rFQmsn&d{02=}(`N
z#Y(wyeD;>QYI{Idng$|5swdF*a{K8~j49i?4Co0?^~L>OQA7&e@i*5i?Y0>nd)hPB
zZ$dc%tDSG}yY>WZ>>&wzf)@^lb4Yt#)Yc7xaq#e~{Na(m<<HYr@XKy(u)?`=y25Z8
zUCMU)J4?6s^*Ny=5RK3G7EphzeziOPL7fY_4@+{UC)9>|Z>py-<G||lc+Ol9jv`dI
zmSZNTmOa&oN^Y1cio|JcDWxrAoB+o0>R;OWQODe0w}3w~F6bj6T^0jlnVe%|Rr{hM
zq<<Pe`*ab>ALbTf4f=3shBWGlxl`7yh~10g*H<TaJ@1{madYsl#jmZ+Z=(CYdYNxt
zX0I@-FwQZj^01o))HB?re2Atkc@~n9$)K)UwHcHI^GqXjZZP2Gg0y@H_Q;-`)hOeD
zY00;g*xtUmP|(m(HsKt{kKRG_4A`LW;`F~Wz*d5QOLVK>r`?7C56*Vv?cf6c@N?@(
ztLPcC0G!0+vDq%L;F8fB69^vWFLwF$%oMy`ZPEiFO_c*FL>>57=;MHhkS?2@(b;Hs
zm2piR!D<TR>0)pyS}25JhA$8Fxxuf=K<q(Twndpf=)tQ%QgaAmk%;y>Zsf(JvNX2M
zp~@{9`FrKI(@(VU*ptf*kOF02gak)_-;19$ubGEtyM$&P)YDXIgrV5_8NZ|m=Fpxy
zRnN7SKDV08ad7ogM2wE@U0fH;+wD^04coWOv$7|h22u$!PmDgFYjGo<Rg@GB($yU(
zLETA9=A?b0Sg*iYQ(CP^1@#OjXPxHm*VF0}!d8>9j_S9Lts)kY2~fXkBXHVjapy;`
zej;_X=-vr(60a%Fz_Vld7EMq%%ym_vr!>f#X1L&52_~T9S0I3d<Y)?u5T<m;tn7!5
zI_z7>t$%Pq<g~=6h~=yNN7ao&#?+uJ2FQdhhOgu=d2@_z(q_DPoAmPE`ZCRQo^Ji5
zE(4MpfY?&m&Xg4jB=MV~`egrE=zj$aAPmv4urqHIxe4K?5)eYvv4#r363dwNS?Jc?
z*yc6$bc7uP;x~3pEq_Ba>|?TTQFE`ofX?@9Tu-FMMdU9XUk%T|w(V9roGzi{6vY;!
zQ}b|N_+gdoX$K|J^2!P5rL*utbncnR5HSiaY7Too6?$R_EO?)*iXLtR&CXXh_Ez10
zHQm{7%19_Q+|MnwfA|77{HsMc7%;k~)+@Gqi&hGlRotYfCzwl2yT(O-8)PBMWk}*o
zB(={b@T~HbS{mRJFf+a3OytJ=Ol!Q^j?o0H??`me{)=L_T~xT@_cXnuIw0Oo3V+u@
zjYE&qo8Xb}`@1^1w%7It>PnH+m{Ej?Z#$Xp1ZW;0wyJk+6T(GprH2hZ$Q8^+K($>k
zC^r&@^x2;gm}L}<dvFqp9JB=<>t$v__0Y#G={WDSnWFTMTNO@=7Ki&`MNz;H)if`_
zCQu2?Qm>}2m2uRS0OLcFqaMS}ygYcpb<GwR7qttOcv^V%p`cBJLWJVc3FEn2G;vPP
z&o7R7_x4a1m0+yUHOolCgB{-SKNw9|^eIv2(>eS$T~-M*T2NY*VoJPu8U$x~VjLbt
z{F+I%z&GEHb=4^SW+(lX%H1zd*A|ex&%9)6pZ#}T$)>gPY767YgKWUb)XH-u=?sRe
zh%b$*8ildh1W(sRw+|(CbKEE5?ayPtHfRT8R@rlnQ~B=9m_uMrQUkdcfvK|W>21N6
zan*H{QU=NqUKfc${685*_-R!*7kux=+9VkH7-%Fg8Q$uW6dIU(5;|w;LT)E|bR%dK
z>f(aJclcqcI3)u11xa5)(KnU+tljo;DN%tH!&@DS9Ljk;l1UQVvqoO*Cp-}wB*8Xj
zO^|uAWEuyjJ+K3m)JGAL;(t89xZ<I?9G+ucY{6Z)@$LFGugs6Si9(6_(=}acnJc-@
zud;7)!ehsG0O3{~g9zTy8QSMlMVu-4z)uXU@hUqeoAglwU0Sddb{Wv{jyyU?gMq@m
zp+==LDU!L0otEk(BL>SE2k;XCVA@Rj9($YgEA7^_L=x<I>XOSQg3<rrO7cwX>h_k)
zEn6Gc+CCOV`BO7QYg8<}^^Yk3NpfzLt`7fh@o4n%Dhx#XzKjdECL_;z>YS6wL4~6v
zbmt})^*yWk<9Si4sy?C*+^4fD8t-^(j)uB-&2We;<1w0?5~C%EdoR`f3Y8uz-+dJR
zWzFmKjA|x^$92#bXwJIlRiWrk!$C|M>p`1IMB+Qs?Nt$XpT~q8_fkJn$rFy{&IA*a
zn<80PxE62@K0_Dmou8)(u$yEjg!LWMdE*#akk-A*d_1WcMVM_BGu4xhl0iY_vBd*h
z5*2WTt&4{}rP;G+FTI3J7UyK-aZM^N7R#9R6nfAb+^m)3<Q0G8?eJc${Mh$qX|{ui
z6@|{B4uMDDc>)#t96=ez)wesh819obr!=AFisbf7&)VWbYD2AbfZD|~Pls>X-FBC>
zT%lC4BI^zzr9KJ-5Giwz@o-RNiS_iuzsl9Jr<<U_cenAemaS^Kt7i_h$Cs58UhuYq
zF>xQxaf|;NSbC}d7m*B0rQaVDaUD=5>;EYY1Q_WRp~loUVxCkszxIan!D8<gfEqCm
zF{@cgxplbk7Rg-H-z8o%;!|oku$U5?!hk1LES39MNyhA}udfx!%3;b6re}*N@n4FL
zMbJQycNvl|bxJ;$TB@)lK&yrLYujekbbgfvSOc2YDqoW_o55<#PbHT=j4m=eJ=2pq
zc9oKSj#r;95gH(o$8H_o0(k%>D&`IpUS&o|)E7Seq<lGY)B9y){$;RAFh;zZNUKHv
z?a_N{WxSn1TmQn6bV@e6DWuRoLR^B8KA9*HBX*+VG=}e5CJk&t{`pp=<=EoUzCQ?1
zg7K56#)5J@<1Vp8t_B{sHdWs?CkCHvHaBq_zGBxrO{iXNT^)@)q+e7WynH}gel`m$
zeP@Eufj>iSU-m1DjhW7JtrA^lnNT}_4j<@`C<|=Ph!t9t*gTdYlPcAMT{Rwsmi{3Y
z>GkRNIHnx@!r9fV?%^a>1K-zGFm(G+<bs9aGsSkmtA2}n%~L0i_oY%OAa<|r!HehB
z3MHQ{CmosQ_dcIH;F-*cO|G|VC@B8GI1{jraSX?%W_V5r>(iW7wiFo`vuo&KV=(C}
z%6y9uJ4$z){g<kU4%*3Y^h8!yK;5GTtpBucP<wdmZ~Q<=n_h?BYZZy`sa7K2NmgIF
ztwssWpq$k$1{$+)clE*kOx1)~@>ryxRdN7U6B~jMoh!EJvaF;cDL0vK59rj<hnLP+
z-|%&OC=_cKdpk}r(!amav$WbcQY}2qZjy;6hFumcnV9-nm)|_N*K5CxL+f0m*q)1r
z`lOrHI^o?L5gJjP@8~Kmsn1W6_t>@vy@tcLbG|rlnw4bDFj#3`A!ht+r^|ctN(Go&
zjXeB2rF|DQ`dbumi)Hk@KEhzt^#wFb?>(`OC>>lKdN0$jXhk$X$4bIt(%J~p7Z0^t
z8DwToez0Az-J5R<+>_&_<T{hs^t;30+itiW<@R9>e%g<esQ;^GVbdg-%rQy$`8Qr9
zzJ4EdJwEx0jLUe3W18m9$iz&!?R>+pUvWYzZw)4x(0B2iDtY6j84AfbGks`h5YK6Z
zn)(a~QSG;u7Mt>f-QKdzz);|ZTZoJ7hOXs(4d5`9g()|M@c&~05KC_(6T!;MELpy7
zS55x)K^#aR0$@(>vf}}ez`*Hm!#JsaFKkFzp;WWEe}2~W`cUKJivP;J-b?O6YiIJi
zNrks<SL&CgvA}ua@5&(3mOsv0TpvGuNk-m;9lpu9{CCzSN#~~;T17O#cKk-2ZULw^
z^eL4p17W~mM-w6j^jAvi3$yLt$LP%<r&@9#n!ADzXM2KkvmH%`L078NU;AORqin7v
z^B&YQrqqFpDjdqwZDzCk%bdOqx7B9uy|fQUs-~H;$MWygBMBpMyfDqYqMUAT7@0gP
zG=d?LjaKUJ*-?PFi_AOES2kX%HDYtETlqqrL7A=et)#Z2T}fD3vd6kDIvBH5NnFJ<
z#d^*csn>)sxkE(Ht(mMW;^QjY+@G#dvOPQmF$V59-8^4y=I`G+Gcyf6LbAjVdiBD0
zD#37hA=gi|GZuu7;ws-Fc6I*Sg~9)y?FT*V_hF1A1iOY5N<OKFLw7;PnN=bg4%#Zq
z|EqM!f4v2$<SL8$tMvLf#sm{3g(aNP8!ZX3vM>-@6UjYDxWnqZ%>Re|q0n*cxIw@I
z^@Dpa<)dgBouqw0jUxl`^p6{iR@CJ~33~R<6z$uAx`T264}644gKhNh&)90TQ`PF_
zhFO?HlEEFK^PD1_c<Z;%?|He}PbUl%r-_cQ<a>)jT-lPECry1WlG9~u(y#KXiM%m6
zP)n8hAJQfWPl0+;65@>aoK+t`{pLIR{ZKKeIu%5dLOwmxB2|zfwLNM^SbT=f>!h3j
zf0$q}nU^iI{nr!YxNOOhitV`VHEXc}{QHiOEml;1HSNx+EjGKU)2n<-oizG<uS1#>
zI_SMQk_2msbxe~$At=m9tsupytVkAexh&YJ3@AM>3%Ed(Fus*j;21l7{1uP-jg!Uj
zP>)u+*fkcZ7Fr4S^3(Giqde9VqvW*^<!_FGMVMywecN?zfu)_x*CVFrnx_ucX8aq8
z-#U-}4HX5$7D^m%#{gG@kQPnT9_hTl&Z=d=6!N#Ir($~bu}n2skF*1&-nGPS(dK9L
zEx;M3LhDXpd)ZvB(8sm9*leRhC9h%u%^VBFO&-5j*RZ2ai|Vexd;X&0M(3#3{CCz>
zTt1<nc_q0f8BD+s#Y6s;z5=Hz30M2GX5rpki(MGoY#aU4RVYm|>Lio=SBE9XBi;V}
z&_OOuW|um27*`PICr~IrI?CShI3|VGCZ2u83p&JLBj|Go(hr*rUD3J~<vc9r3BcZt
z!cBb4cu8bk1CeE#l!6Vlj$Qm}!A?z>a+uqW-5?yA&b4XHhT(gTkCGo+B|G7GZJ0Bg
z&tynC^}Dzv&0EpEJff^h6T~aS<ztq93;rt3b{ZQ`P-eGSW2o*WHU?6^`d3pv^>j-l
z_1paiXtqr9<D>2}sUQ9WrpDsVfCNLL!}&u`&@}&FQhzm5Y@-x1<MZj_z5K3-P3iv2
zm;P?4nDO9$TEHnMo9?xYpUNF6&(#>e0D9>BkemC8y8jB`55&wEV+^mM1j7x@QnDO~
z{(%e(Wjv10o`x!+zz3XYuY&e!nfC^C{qLZEmSK{qd$r03RZyL)C|IsSONn22L7+D4
z?<&z#Ky_w2UI;THk+CF&0=j*WK#mx3$KWGfi5Wom+yhn{gH3dkgeNw(%j<oWpN<B3
zh`=T}gX--SC&q(Y&u5&qiPo6OY-nTym&BbfZZ`3vezE9$<Sx=LCBIWLX>HHsbnA@e
zghsc~v~wG-sF09sjQ^tB$YxlEL2y-9u$<uISbt~}f5)&J(#NNUFn$I@z^#2Wg2&T^
zfgJQXdb3cDygGB2l*fIn{3J)GIcdwHk}?GWbeu?BTsWa8o}oq&%Jz^ZJs+y!Rq^_e
z9RHXGvQjFp;6EVU#qfOpDCwV8k{8q~Qh>@rP|<eTZ)(sVCacA7O~gV_eWc4IfFexz
zUGt08)w)XCtBYiFz1K`I4+#m#WGjn(=b9>1{M5iFIs4W6s(iqjz=*o!Zt&>l-sfp;
z$l*+Uc!Xsp`2RAq-eL&;Px|#5{NqQ;`6+s_DvArhS)~Kk)2%EK4R^KLRYbj~baX7e
zY}@kIZDuCpaitlrkVr-0QmZ1)^%^kBfD4JJkK>#0tpi-xOajoBY?d-@#!(8=XFD0}
zaBx&|t}3HR3*SgSPzKZy&JQNW%Z2~N=6HXI3|p$aG-ua=stR}8Ci?qm>dDivVEQhU
z6Ur0JmTU^i5*d@n9@dMJ^r=JIu|-y&9NrZr{^?)?zAWhaFf|&c&cV%@B{@FlY^bGA
zU2$Q*P6vxwzAct*j#du-<y`6>?#Mu}#E2vuEC;q?hjW5>uplkMWBcV-wh_QC2Cy!A
zgM2qA3ki#vDxxL(_3vj5d=2rPv2dZZxXnniv4$d3Qj_yVEZ)VDE_L+}#>%!(Zbiji
zqdV4>(Hi><Z9Kbv5E4jJEBo9`r;>2BY~IQlum2*ARNcl}{*fEwwjT$+JM{Z=Hs-P9
zK`n?FD}mUtJ-YH~Xk3dQd_yE)>9h`U)N{D)?o?^oj8h_Bj3tM~pf-mH!hC92%t0Xk
z%a$M(ox+rlm<`t2JIR!F?>t@H9>XM@K3+p-l~#BS=>4OZ%8{Y|-!qw8#MD~&d7VfD
zfPZfuBPS_e8&gf0H47?g=~q7Hq3nl5iOVB~hB*0WvuA~$B3Gw>mBa$NHFQmYWl;gw
z8ix9{22NTd$zu+-eg;U1gna^zNo~X^09$cX2oo1S_oXDGrX9e3(w*EuB3n|UqY>y8
z4Kkf%i^U%AXTy9BaAg=hS0Nv+NpCr1rz7G@J($>OU?EggG~03#f(&1<tBI>r^=j?V
zZV&H^)we6}LpVi_<s5S`lmb==ug_j29Bmmja$d^rmJym<F4CO0`%kO@JL6KcrYfeg
z%Z|I|)~|HZL=k9_i7Z)E5BVHfBt!IW;4f>7M6&VIi^Z{Ldbc|1j%vk&^A_`C_aRW;
zplJeuNuELWTPu1Q2fDLW65t{ka2aRwX=3GzdZh>lBhS1ga8hkyfp=g4rjFbL{A(nB
z%XUi71-vunWSf(i4H+enmbq5@0q@0%{)>J$BGty#H{Ho&QG)@Z%IPP{!i9r7NO)?V
zMNP4ASgR1uB&m;3sLaIM`dgiW^gCY2c!;u^EWXOOz*h)OT%>S*Em_Z4lU^E_$RNfg
zV}o{Ioau@Clyh>P7GC8<vFMlcAr%*pzH5A2B+p!Wc(ao)Gus>7v@i#oE!qKKAnp`-
z03<-nG2q;ONSE;F7|3xfNm54VdkI^i7}n9v#SI@a?eXX}3CYd)W;h(?=-~ai+PEj%
zH;IHNVtYOVu^+$sR>w&Fmc=^Sz|C)va(&=;-Y}&O>0QKkT-Luc)A{TGt({y0<);Z$
zR%q6VLmP@C3WZRO@H`+4Ep=1nhDHlYYbY3#E?Abf{(8O^<0=^qzX^B+SwE_EaH(^g
zu3isy##alBc9`_9&Y=90t3Ryrw11Q!QUrt66s%<I?~LiYZN~`ib&cEhlnkMARSyF~
z_Q-n|^XxL9HI4;+w_Oj2&TpxJ_+Ui(Al?z@yNQ{_vTyqxhW!U({(ki4Zw<E^|80WI
zpftM5UwwLVaOD;UZRF+d{q`H>JI690cn1S;6Be}=5E+^BhX~N|*3SaB0iRT%Zi-m)
zailitWD{gJfzl2GgHQ{XW$l(o68{#}w?1XNvkigspL4#W5Hy?tI(~9x%BY(bq-C@`
zqD9W_eI0u&Cu3yawcBpMdqpB^RA-nRK@=g+@HHV6y<$tX%~k)ccM8x2hmN%B%X?-q
zGN?*YxZIc7lhw?_qYtr>47goGPBcz5v*yW1pK!&MdU667fcb?Z3sL#!u?!uioQmk%
ze>v1YDhDvZ%u9)v&--I$Nvfe>nqOQy%7Pu5w=YGVdO>B3OQY8Nr11JK{o)qKna8Vo
zd(N4o;EkWNO|s6?MmFJ|uu_s6RkuL^A;yx!7$o@q@?dWyoD+kH6!mOP27HrBQZZ!Q
z^7SrU?RI)bHsk#mmI|D>pp8XDJHj(s#Va%3DQ`6UXS^p<2p9vp8bw49{_ZXYjwymy
zlc5Il`$_@0Q*E6JLGnaghFATRiZAiQf48<zNpz~c_M^UYz~#E%W><DraAq94dJvVs
z2lt|SMfj2xgb|Ln(2b}E#Bo@|p~KP~{agV-HUI#(-e}*TV(`8-OQ4$NdpNS7lW$=!
z8NdG;!Y+^fdpjmVXQKO+g$N^tVm~ZFZ5GgpE{he}UpkeablJz`c&*{0$sw!~Pn=0r
zNl4o^Z%H9cw|uuwSKb&|LM7WaByH%7)P#@P{C2ut`YWX{>0pZ)dVS-ekrc&-%Zx?G
zr{wLW44yir9H%$@VDL_0RuNY)$O&w6>b7;m{&oK{?hD9#9&qcZuOE6Gs_6rhW4)Zr
za;23+Z^o)T{rdXo?g6j`ABYD9S1z>OeO{=5bRv4T$`wu6xlF}Pf~hHzKc*_;qc9Fg
zB%|g__Jtp~`th$*IqlzL3+!FI-b$9_xnB>jf*15VM89)-fzC9d!w<l<_Ws9KyyD?+
zTAh&Ns{?l|4aIOVzJ2`p*T~q|SXiM)eQY-xd)S6?HE-Sm_Br~p&gZw<n;XhvFB43S
z6;w6Ly;pb!e!R;l7R@OF3O|>+D^Sk&HG(p!q{^f+r3$EwI@abh!g1pN7}iwgI64V?
z!U0RsRPtZ*AF%97Q=wVqCgOaSdC4Yd^zTJ*@3#*WpY^moy0%y#2-JZ=CQB*8#`>9_
z&e3zP6rPtjmyYFp)K@745m>lOTgrk7Dt8V2lt<DJ`?fJG-hV+Xq{jnb2OaP`i+f$#
z&riGBE=ph%*CJRIH6fKAV)IcGsa>mk$bD#K?q$|?Kq=>c*v;Xmt^P&nt_drNVF<q_
z<0e)fNLoplAV9W<pPrr57eD!x?rPZ71+`sqCLAUNIAwr%Hr%TlFC8d1^i_0!$g78W
zwT*&?r=}S=^UvL5=es!0UAo5$nR^RgSroARbSj%l(1!s&|6TTBQkqu#fiat&MeRkS
z>T)BK7L&LwpX8uvugmN5X?k0%g$S`|dpnW3j%R%YpI6s{5e47<pyT><H6X&ItE;l5
zbYZb!;*2oP9+%;LzNbMVn2&S}Z-_8Mm0;HNgiAkg(wI_yOs}6Pw#cz^rdUC@Qn_V!
z%A0vDTE<5=Gk>eOq!d7?f2UgazJqR1F5`%)E?1m&=CEM>>sQNzf?T2hpO`XN9tEQ@
zFPS8|dW!~_MV)Sxi%NOSY9QXQ&cS%RXLDf+MJ3MR{5azjb4iV~cF_sXu%C_F(o36w
z9aqRQZHgNtXXg?Wj;!??P(nC7%ezya{|VuOvEN#de{5ue{tzm&t0duJXSn0AB-jd<
z!{`r*jl}i$E*TGtEY&o?h(WG7!uswm&m?QnBi!yq?b$7*%&L*tDf1^QHY&gUWK!XH
zU!d4auw<Tq*kYu4?T<;RLwIKALft$k?uzGM>G0R12~84S+`2&r#}XI6#5tods)tTF
z7{G<nVk1}Bc>n6<a3jNU!r#E6N_%oDGaV8({7ny|t<K+5ak|GRWwX|rA5>~{5$7Z2
zdHU@#=`(f4xmMtFG4ro6{-|YO#>pRYpq}EGSUG`|)q|`&=B_87SqBQ=xtPk`>!O{T
zk{jTOhv6OM?A0ryox@}EX2}t<j~8#4YIVouE53?#%U4mL|E1!3NnPrpfO!!UdZ!kj
z=1nrQicrMfaENxr{d2<r3;?(=*f^Df(V;Ph0>$1vTZ7_CLkf2#6{Fv^uqm5gjYULR
zvOTOk%<$RWJT|scb%URYJLqIr5?;gvpGJi7*VM1`3I2k|b`Qj`o1ltKDHJtXuED+9
z-(KW=wof84PqoCj=m-8pPsd4Adx(yY_sHiEd<L(xRWE2eQrD=he)}et+e-S30`1>%
zff<RP`d<z*{KzrdAFXuuY^w^t`2QZD+m-KFJ)X+0i*ipE>t$ulHWTwO`k?;M!%L-c
z^SRTQ*_&{8+5rOS+7Vu4w)2gPys#1Jw98NZdvXUz7zQ&0cSO5xm==bm80L1sbx1l)
zHHSoQA=03&uuC&1r$z&NfdfjcTEn+5I=+|R_Ex=MI!1uFip5kTzq>M3I{xW#M=545
zo#*kQC-tZCCQa`q?}t#wblyUReL3x9aD%&>V<MU10Mj=J-csXfAYOqk<;5rWY6EM$
z1)V)+W~|k-Pj4~|O#WS;2zd+jF82v$S^;JD<b+esajr@6*(@K*44C|E5^0eB?1s;3
zfaD^|2|iHTy6*9rspROmck&Eq%i)~xtiC6^H1br&nBAx*uh8IkkQ#uW-C@GNcQcb;
zaTc+jGhC9HX^NEjcdD;*g_?3*mBeOx*hG)*Z)c|A`b7uzb`DERLo<|do8#1#ulF>(
zxRK;;Zr|_ae<*ZS&-89^mEWki(7VwyZjj%_0jwA28&z^Pj}*aAQd$49GKs4^b?YL1
zfZo_O#2~r)4>>GMH=g<fCAjduz6|@n3P?NYw}LCkkm9s`jWq}7lYqFHxU*T>8Opcu
z?F6>c&csjvDu|>Wn*E-Tlg@zcqhQrYv0d}aIhWS>Axr5cbKMp7gH6ri4?SN&hC@^M
z4_D$k7z=<jZI^7r+&{S0-IVPqizlM#+v$_XXM>F`R`?a!9YMqC61<F=Vl8jqeRJ>G
zI3*3#bIw?psHCxq{;vIblyp2A`_|(hiGDXF6*C+Iw$~f$$UtURw@2wTvKYr+p7QCs
zxTWPSYS6pLizVMX%#5F%RZkP=OK?=u8$2(5S|bTQ-JQuE{|#Ruq*E)4vPn7OgwDj{
za+D$zcLEY@t|a%}3ssY$WF1=jbTHm#8I2sG>w%~L%9Mk{`sni<<M>Lf`N(+VZyBbQ
z87UT}qsi6=Q@lo>1BT7sd9k5jXaCO2>B@kD%ziDWva5|{+FNh9c%OmoZnkE8%s`QM
z*pFT#%orMA2e6CoM1#!1U=^F$+TZ@58zs<KhvaPfa_QcUM;S|Vj$GS{un)wR4$|`W
z`(uAn{-h}eecX|xPFkLGA9y$8`>m(hf`LZ0aKKOIG+r{jPjjScD%izo!5cv~{Xy`B
z0z9=q5v<+5^Ice=0sCfw%_AWqVnCzZI1<=zso{k5i5k@hOJN?qkc4t6INZ}7YX@Xa
zT&FU!Q=L}IPx0&R08a)dfOZ7FbIoOM`&GXHbZR1QHNpzQe0<&%nR7kkLJ}%m`V@YX
z(D}BJTl>=3xdo8M#O|f79xPAIJW9|lJLjgiihW61pmK#44ikS<uu^#`hUHRLEcY}3
z#Q>oZ<84>G%H6>2;X-dh#FE@TLu>H;(N$&GBXU#I@KxJDaBw6uN8NVC3t>U?=7H5u
zE0ykw_z*iR0-qxjU^7>yte$X{&XEG9f(k{q4<D4b>LtYTzz@(Iri)d~afqbFPf1J7
z*BvZC3YJ>Sh*?3ygwad}To~!sUTzp3AjVS4h9J!VVY!^-az~hDGS=-N`#1zR%ACs4
zyMBwxNaWLi!Ps4>2VBT|NIVHM@Y5IfUgV{N(8rIxP}+(Up2#XsY^I7w7xR}?{z@yt
zvGP-%`TME#A1m`F(f2p+2o~uxg}-GD5ZKL9rg?WpF=<vShW2AA<3Ioxsh!prDKns|
zc}P=}MDh}9Mh(p3C3+5EA?QE>HuAn&oJNW>d|rNzgp|wP$AOQpKm1$7=9q^P+bLa1
zVazbi8~ZU8*4Dln|Cmu>0&{SC17od4T9Hyk`Vu{S<`u+FPwPY@8SN--v({G1R%D))
z(_iL4SO{6ce7q+t)w~;DD{gbm?RI;)VkBSmBK!HThS-S!<UP|a6#{p)=Gov)Te8;Z
z`vH$_Wxwfo`?yo<It$4_mHo6Ihn2BC750N^A-e9Y6i1bZJHP#2!aJ%mEJ$*@RNQuD
zLV6o%d;iI147fJmO?rjwpZ!b8>Rf$=m#QlriB)liaUQe34<#e6tckMJ;2o<d;Wrkp
zoxGPoi2%uIWjs`d#;MZClJvAroYJTnHz6G@(2rqpk`E>I%LU(&L6)7>o6=4L;OD<s
zhA_kL8E+#t|21lIe10ioYFDqf<ht%XiJe`nY+I5&HWxIS4l9>8XF#Oft=8gkZ)#hB
zO&mvN7TC(%cU~)UqH^^ToRtQ+*ej3e@TV+N6_r7|pKq8GKnzl?G7ubfx)p1zPmk%p
z<I7V)E*Z#szkj>_FbB=HfKyS0LNA5twzfaD32i4bZ>sC^rp=l*WyNE@OoIa=(itF_
zM0?QyX$wYumy#t*^EFs7G5VQw{D&Vm*^{r-KO()j5>rczGKG$IWSZS(JT3+mlq1R_
zI+Z>*IzxB5q2vvFBmjaBx9ejR#t{zF7wS~)nmv5_$|fb5%b^fw!8GNg$RDyKy6m^Q
zqYd(r#+<xYxvJxzc;_O{B2(k#{<uSQ70i8F0A+T|@BAC^Q-&Rr(_n;&3@y-uMT-kG
z@8|y-3C3$s0fphQ!5EN@sV{AcXq@A6mGC<UlWpPnpzj&!u6eL8JVp9evNAIoJcH6o
zBv<wHZgGzf-#+eHp;uU<Wm{QE)n!#2NsDTjn|di9ZpTs0bDKGwth937{Y>C1v#*9w
zk!IUcm2!vvSJFYp1_*U5i^8MgHNbwRRARI#|4L&EAV~~<Hl1?Oz!RxvG#Dj}z;XHv
zRs^dU-QfRZW;f`dNvHp}b~4&_9MN5PXO!Bn&x^p{!DlAK!_Y|h&53lq_AdKk>Axdq
zRw6NbK6)xMm;6EwH5@;GrxU^kr;)VpVm1LmXZUypQ?TY>3QYxdGWPw(a|8tB914A(
zZO@r6o-y!i_7y(kL<<TMtwFAX-%h+>5Z7}4xFjFk10wN4+_9$-jR5rnHeY)leq~@H
z@3<&@*2d$^s1VxC<q3K=LnARar||kEF-_|;%=1O8tB4jgc5U5B&5-M5$DBF7Mt2RR
za-@BefnE$@K4Un@h#O;s!B!Wr$)F^kXL)Z|_+IU`@+uB9X=HKDyVp4ZDK#JX1vd*o
zoSACW4KBC?iqA<pFcab!#eXrOg}yEO$OgGQzCZTT$4Gq{|F=J^Nw23Qj}ir(K|8h)
z%yyUd8|815uX_7}V-EgIhK|n8%ic6kBo-lww2zPvcT}M5zs@QDMrgq1WgfCTXCAJ6
z(YR9OGcsMZwb}nG(?G;yrPZ{J$S-v+K@`M(su{?hAgMcUmL4DRhdop6k|0XycFiab
zpa9vVQ;beyI;Bf7O->`&CnL7eKEAp6COm8HYgijJiZhI+2<Tp2wQWn{ROBovYx1lv
zSr>K(!JmGR@Q_!Ks)XzIkpjKP=XYmR3}Nk6mPdH%nO=-s;BKfFW?DKA40{e6R?*S4
zW+2>>lpHDb{h%GUp=VYl|NAqQF9F3wj&in>AxIXfhDTK<A>DK7a*6(Hp8^g`35XY!
z@<h7tI?7`3^T3{j_|wkBMKy_Cg=X40Sy_$Qw%BW@1!#n?iH&N2z%b1&Xk=dIV@pya
zwWxh&?t+r7r&()NF)CuslQ{8u%FQKV<tZUb7P)SFKX1gej|(x^Hf=NG7&QNyquyGU
z4BHDqPDNluHdswJ!||AD)Tv{Iho33FY1^|%-otmAfOcYvB5)#4TadNaL&cY47Bq4T
zV}FD~d`X5_WM-wPQ@)!APOUR(QTF0>j_uJu-Kbuo7+OnO0{56Z4;}T<2E{FnUZsfy
z-mLF+rM~fXS~@@0WRe-e*fr~3ke#ggGOqMgPQw)3^iJ0RXL&JJ43sH{Hkt67LC{i+
zE%;Z8CxAt>VBI-vPLTk{S(Uce3O|g@65X0YwMipp<+4PN5pxQK&*Q?eG)^`8H@07=
z(br~ck?N(oj2I-1k^$~$GGF4=C%Z6G<Hqb7<u#&6(M9_F^xu4;+A#OvjyJE~L2OA5
zDt^5=<|6&upD7mL4v&2{8r$2`)E8ufeW+xR2CFrMOuQtaL#4_sQv<kIH2}Ci*rxIP
zD@keh7-=|rv}H`&-SZs#NN8!{_xdC8=OIDS@QBAPg=Wj>phkpqDLq0VLSC`4`6vt|
ztezHSEd8C3m>aRqIUizUpp$<p6?#y^kqkD|4Y56va`-SWpK3BQU7M+1F%K*Es8etP
zbSjuMNpk2?>I|a|L<L`5VCI)q%iD~Jr#~k6C^_TL@B7^+a#S2!sn5J*@T{84gtXwf
z8nqv1_kV^>3Q#z5ZJ?)0bG_Hr59D+`uEqnT+_r{%0wtZBb_QteQ@P<qH)ma++$v`<
zL93SYxL#cvA4S!kFE5jS{(I|_)(K0dvch)CqQp9x5m}H<ySYH-m?8em2{0B9p5daz
z7XOdEw~C6p`MZ2^4bZrU#@&OvyEX0}+&#Fvd$0fv!GZ;MhXBFd3GVKa)6aY6T{F*r
zX70}Axnebo)pS?Yuj*S>dw=%RL-$`H{&Hg4T<g}vJQ25lrGp&m!@Va-qT<6c!L5>W
zs+Nj60-}Npf?v1#zLM<++iw?TJCG~u_<S583|NxLHPM(12<9zBAHi;-{7t}s%i>;!
zI1?sN-2yx?%X?Txha6u<@Ry}>J4aHrMzlI_ah+ND8h=Fl#}n~CjuUJC`#5oei~ln}
zHuFzoBlF<f4$N>o3tG__m>qe$SQx1{DZl^~>Hx}df?qOO?iaO?T#eE1Z9*@n=>8wh
znV~Abp^CZYr()52XQoZ$TS9x{V61tl0JHC6G(MF!00bXp087Ich1TNbB@6<^_j4D^
zb|_i1jl;9S8_xLeI3-5n(;+A@#%Qu=IJogiQWXas&wak*hObPrjRT(4v!$b6#q;A6
za^drSw`F;=h#$sdq*W^7<V&$yj7xFUC-wMg;-=InM8z@PBkY3#;<^%>KueU{H&M~o
zxD^}Ql#8xr92*n#k4YcWWAS1>_|P*juyStV)k=3dgC2uvH#Hk@DNIfqY(Qsm*wLRD
z!q>Uf=Ky82LPRm7A<uAca@q*XY-djgC9Y*0r;f+I$Tb9lvdZToCMYo{+SMyw7(X!^
z;ZZPdsFL}Su6M7OGpr#ZlB^IJv^prir`#nB#1NS4lcp#}1v`(OHYuhT`V!&*fh`wJ
zga`caNk>d(fxStw7o>qu=z~)>8M>T+AqL{k8c`!Po9vPtN_fpo$ym0AUotz5tTvUZ
z8gNCH*C|pNUq8SgSFy$tn7qopAp}Pq!_}9YXe2)sebWhn3iSkL$?HqBSE0PumQ}yR
z(saN7WVZV`v-0iMy1R4}s&934>Sdc378~j3;6<0i4t7)qX@TEDvNh(F48L$kqP57L
z*fA-{mP3Or&iwF2H45){`;(n<BcJ>V*S@$_(O|9C>W;B|`NzY??P?zcHTXItxgozw
zgBDRYN83+YZhysRZIUz4GTzbCiFXFFYFSa=z=^lnSIpIPv+t<IPuCD1_)Z(D(7pMJ
z>9B6aeVoXY(E^}lkB3BZO#eFZi>3ow_Z?D<M1x=1rvQ=n#~o~Hy*AiiLv{To($lCY
zjx~ntoL{+Z6hF1X6qRg*TxaDJgMe(4DAS^<xm&^&X;Shk;TSS?p!7mnUXa(yjB()>
z5O;x~WWUd?*i}5a1df;*e!vD3sn>1~H5r8mX9>gRcNXqeTsPuJ;26GQN}zmnxtku2
zv&i%kL;5dd1<Qzfgu5Z8a+lHyH18BZRX0`NAANeBcXFo;bt{(A4o$PY2zefz8Ohiv
zOi*?wrMcIu^va$S#5oiN^I=`B0j|D8&JuEqb%a`z$N;(jRoo-@RrQ_nM1r5Azx%ES
zj%#&vC^8$N2NjErBvDlQk)Y%jdMXz?3CuN5c5PylrhJLy7T0m<qP!!PFVLMGk{Yvc
z43t8l9Hp6Gf^X!sib}iV3soZItZR@B8@aSjLn1`iOXEx8IcX;4=KP`|7n<gXNu>(E
zx5sezjSU!2=n!IGVaID_Gp71YXI-9Ao29P`WSN-}$Hsvej%^v(xGj2{Hzxsx7ho)M
z`1TtcbBUrV3zk!nx#ZL5vG3U+<n#&llA=)iav>jXTVY0nMwPA=cJmpXes;$|F%8-R
zQqY1wxj6cuq8BQGSi`v5^CZ=D-yO~{%}EBcK7<2^E*iM`ugJ-7lYmDFY~~~&T%ZgC
z6x?#(;75^-r{Iqtl*pyLF=k*XsHzMMe%3O>Az2_Db;7XsTb&Mjo+_+FZ=<-1PN}l)
z{e-AfJg{FQO$yQTB`PlYEG7Qb3Rym1p8%jFX!Qa}BRbb}{z=DnQuCE>qU{5l)4&C(
ziN!H7#_qpASz<D~DUtRaXyY^tW`=wd7{;gXUdj;V@6`+uTw6xZ7UO3Ic^GN{2;kwo
zrrfH*=y(tfAI7Y9hxIWY2vbd~9m!TuV<Jzr+6$hC+(8HF0|uUMJ|l;(w_J4c2QZXC
zSr087;!z6|7xsZ>o;n}mfDyAM9@P+Sc!NoYiZ5FZt*8<|r$2iZEyCUop}K~iZSXlP
zN}t3x)uv<~{u&9IUED;CDgB78YZ-E9LDd5{Kg$V5+<B=IDGEy>%R(x82F6BiMz9@y
z>TwwAnJ_sQjF|JY)G}O?++A4B2l^uUQl(^hv2yL(b+>)*Jj_qh4Rmz;Z6Gh7Ol7t{
zlc4iYrD$N@%GFMubbk9v_{C5TxN&x;Nav78)4`E^b38Mobordxp-O6|P*-=(pr{~-
zI1>jKBDL553~^UUM&hm&t%DaBv;j<7jpL0tpB*w;R+=6IwtjJ_mhY)b7A^&kh{=f`
z*Ph9f4*AJqK@9xgujB$?&4kyDAH{{415)QqDfDhTvo#u2Ba#VDbM$uHwE^bxPLy1j
zaM+JCR&R?W3tm6x$m8k0fWPq<cbTI6{ou7_mzzBc-~cnDc4MmAIFq>R@OLT(mhxBU
z;p4T5!M@T{Rtt1th8`GVF;kFx`n_hyBfpvDx=Mf0MrqTXPP3f+dZ1DrU?uv_o_N_W
zwhq(kq3;&uAlKWpb=OD%(XNWNTaV^tSuf+R&-j^Dd&o6RdEa3$IrwSfg{kko`t0jV
zpm6R>y&09;xZ6u)%*lCJ%2Q$5t}I`EcZ!3xr2=Kg=Lr8!GyJ(52y1{C@Hl-hRNj<|
z!u+~W^sLBYDwraP>NVc|cE38k(r7o^js4`0IDP&8av^h4vbF66<e8ds6!`9ukL|yN
zEFj%RzgcQ#CR~jpcDXXDJ3aS{dRh{bxg7^?lbj3jBdvbgarCVT<X8(zF!(C7Acw_L
zx&-Eu!exGV<(aeePfN*-Mn25nVwrupyA-EU{5cZvrFdiAi)iRmlKr!;&F-ymMYOY^
z_^v42c$OgdSITf$`r!%)S@{}fGOHM6hG)z#PgF$uIEfcbTilg_s*N)sg71(N658`B
z_2dJ%Jm8I>IudOt`%9-+TGs7EJk6ndL~ULa?rtdEAPZfuDUUY38YtJWH73Dd6ObW}
zhVix8SZkCaS&mLZIojLW+7b7ctHfxW=jV>McQSnglL{O^ciB65v2ZMFCnfWS;x?8Q
zaGGAc6-!o3_;&HzEjHI!47GYuv(Z<(0gd9Q0@TbsM0zuNtJ;nX$)*9SD2)VLkxqvU
zZKWTVJ&1^0V?LOaWfnb|<x$VpVRN5uN)xr88(oPNoGU~;g!yagpZse5d<jnK2K(cH
zW+Yh{NtOVbO($U-2M|#UF=)9V_S5XA+P#F4CZ_pNtk0!G*SPehm+rw)X76*+Ixxkv
z%$fG+I8oz;Z`fS<bG_f*s@$zsxiGxIpq{~DQ-gqOR=3ELCU}Z$wZ>W+^+*>(gy}w)
zW#FI86MDSSoBzz+Jyd{upOQrF6^+7>1q9SeI6F`Ki8x~!r(cH(8@NZxgy;dt%0RNv
zL*ZO@OButpwHOSza|~d(on%z!jqlYKNz`z-Ep*=mWiyfz;`sfa?*1#S<n;@QuR4s|
zdUVDH@E*rO!9;vmF~z#q$`T^BVjhX1MY2s;r4RzbM`f%$sk{sv@a$c6m)&HIn59uG
z$s`i+i9%#SEb{34GOl%y>f+40Tqq8p1|SKcw0dMiUBlx2Yz(t5nThmn=AGCU3}j=J
z-kIO@r`_vepT}p^%b(^G0aH*Up|NwCE+{JSh1XMYCujq8_QVM=8x9neec*1%C3}KJ
z1MVI~MshobkPw~~MyYV%b@kW-tw_1*`c_CUixr<0zIAuAnd`%BfkBktFyyBFcFQn9
z;N>FYx#3o_KgebX9wU<F`aoONE77J}qHG|^6FMvosB#-ayy9no{11t(m#8@=E{-!A
zX1p-yGS1K5CEs|(<JC*AA;2>cYg&I;yA^EK3JEWpVcRS*h8@EG%unX+k}2Q57BV+r
zq`<?z#6VKHtN7Km@e5)2Tf6m^HtICA9_JJ@dU;!tV&dHm_1~|8%hO~RrT^;oMmyNo
z@s*lX92cII6=Z+~^{nk6w7566P11BrotQWdS2+vS;F;mVwmv%~7*-iKA`H@fX8J#>
zs*Sn!Y-HJ<{s)`3na7$a_&6#ZP3)@2pI()+ZsH#L@tm#nhoCw7@Z!n*E#JN58o;2)
zX$YvW!&L{kATO1cu3OR6T1ry14v`H@zjXcGK;nG0M*%-FkkC4zm*`f4D(YThGA<P5
z#6@~00}9w4KyPfPb1VsnX@X#s9q(W;5oNvY?SQ<x?RJf;_szoLluC}2RzCM%{QXGg
za>VEF=z0a@Qdgv3Tf7UAx|XG1+1|_AF3sAgLy|t(ZDFop;V(P{mZx&9;Ya|QCC}bg
zH{bC(>0kTM_)Nl!o;n{kV!LLa5Q@S#!?$^iHk%u)xZLwoD$|QjesH-J({rTO;FQ-X
zFl6M(BSN&{!GF)AGDL(#guVn^%~@B^k%af(6xnI3Sq>k_`SByrZ;@$@B<s=RTq-R$
zwNKyxejl(9RVT5T2FF34Y(HJwCUcmFj~WZMk9X)?<Hk9qqU;PQVNUR~76aHhJ@Q1V
zqnvxx5iSy8WmJAyNsn^Ezo(+^)qnd29a>*U364-Zu3f~FVr3YdcSr_|%T3r2OgrKB
z*6UgTEV(@)3)y>;I7H$PTNYxZ&nHm9#E2EFZXLk7sCGubKJS3}%g;yEzi9-Kjj*gW
z<jZc@j8!t8L+nTltb8Z@mGh6Y()~DYt*0`*%c5XF@X~Ylh7QwwCz4yIiim2Ls4LnO
zKUEHn87vbTAqsF#cJ+b5aPMlrtWJ(6uv{>Kv<dNm+ow+T+yGN=3|~3zAx3dm>d-*D
zy~@5;RRF=3=_evuR76x2ZMyc@Q)Q{FDU%(T*VN!h-_NhISg$8SILVU<%$dW}sUgf%
z%)Gwl;J?9=g8UbvHKG2nia!6P`cB?mUOtnonUuZ$!r;U**;@OIw`hD~Smed+tVv^&
zG>J)*AKzt`WHX&4FsWd*$stx4QlKBErTK3^(~l$C)&BiY-QF`)un5&Omh-MSfd}#~
zVrh(+Lm$SKbMm;GR>1DCSbr7)vMtUCAPQhady#eQrv?B;B^SlDdH*IlE>tQ+AKj+N
zuMbT`z3^b+yzM9RjfCQb+!YIq2b?XCNGL<HUT_Ld-^roI^KScx6XGj&rNk?MFnx~}
zwMmQx8uDV{!5l$QLc%m~M(n1M_N`yUdS%y@oWM~N{oOk5aabm#Ahz)ljN{-nLh_j)
zfdg_+1cLIuylZ|h23|?vd&Z2I?v%cm^Y}rc<7y$$O}*-Cp@X#eS`Atw^5@IJy*(5n
z6;2S`PY?B?G8J$WMcP1X0Z=3+ukr`JWG#t?e0kn=woPr<(XRb~1j$S+6Sby`RBML~
zjXvP3zyp1zr#g)#=2V1t^qZBW=m;wO9dt!PI|`m_e*Hzh#$VMnz$+qZ$P9oS983g$
zP<kpn>+Mtu_ylHbFV!#rk5JSqfBl~Ng26CIutDmp^dprDl06mjMyC&KX46K*jv`5u
z%gPJo(E)%s|AgPi8U$JMGQyNR9oD%A8RTTmAcrHP5RQ;Pqp^oKE9RvUHZW_MM_KFE
z6CnLK1BlJb=M5RiPco)w=H(Q-5ZZ2LKTkc68Wj5{IeVW^eFph$t6a|pzOEw)jj&Qy
zg`rsx^8O)&FxIgW-;3h<CU@7OzbG_wi9>tH^2m}PeUe4h5TMHiYO6k&;9*R#FBh3`
zhGmYLeE4&5%=?Ih_iGR)RT1Y7T9AJDH3CGu>{4p}r%*C>rF1=h%`_LtEW6Eh*J{b_
z2+OsYFxnq^YluIY+vI~%H<Fv(_@&VymOrj;_2KNHG^h|mE-_=*34MRb#PoLu?15;A
zc*lnnGWFSQO0~|Jr9!RHLk{X#((l&`!L>UD2z<8~lR23DiAh~fr*q}fo?^f=)fL`3
z-XHPbPuF_aP`i6mc<=ArjxDyE@hk8^-FOhimfb>Cw3ck$kkhA)10Is@*{=8J6YKWN
zeLk;Q+Q8L24}D)f;b9#eJVyAXSdfCEs%=@Fyp@Me#1U5{5tC}SjPxC>yS-S1a~!+l
zqgPUO%-epfMK+(mxCX_bL`D;N)MOI?>WqTL@#r-i1g^fyg~)`l(MNLg8Z_B4`!0@K
zsJ9gPS>7>KX-O9Z)8m`Rb~#GyQu!(L56p1bz&U>mKIpoLfk$j0*-oEn=kbrqSQO;2
z5MSbJkPI82LVpRo^<04M4NM!JTM4IO?sm`*^yVk|E@#r4%Jc|M)&F{t*+)$<JgU5i
z>jx)t#O(r>L%t(zP~(Cu(kc{3@MQ_Q+JTPw)9CgXb<XN7Nu{SnI5aZH6VhCbzioX(
z=>>tjJy5bs`3K9T48IccHJ>0?%(%D;T%0=ZJX=93C_0;XHZJOxn^O-=lCnQ9p<;z>
z1?an~V>)^f5e?{xVGuDOvR-V3*uB`0FV<$CKBvuDS72y<upk5vFsJ|K-IybB!rGSn
z$~L<0RW^-8Khp@GNV+nfl4T_C%gy^$gPWlO6w^#74=+P1!POM{vjl05{kQPrpy*66
z+6FV7i|2W!cG67qG|$lZN;S0XVsb_7=4Da=qjq(}oGJFMk~9N59tYIi!h*Db=PBDJ
z*o}WO{H@=`f%t!ml>3J9UwhWqt$j0s5NhzrpJ4$nxS!X8225@)!|*(Iewan5wD5p_
zqJRM}9_g#W6;3T}pgV;3ycw)`e$bMjN|R_X9w|`Q#<;q9K?)O>j?LWJf4b=S`_-#x
zZ?Whx?18qYmrq^nqvH7$qgbQp<yR;jhIdRJ&;iFBPKKUbFAmI%wrn|NIMgI^#qx~l
z_k_2fK~Y5#aNf++na32Qnk282n#8W-Y`H12K9W5zL}GS02-oF7wW9Q9C~*Fc(;SW6
z{Y8sO0Jtlew7B?dT-o1RdS&Dk(roCohJ?_8NSH_;6vxP;Wm!FqEU|$ygFP@1`ET|-
zsMN-I#uhh@g*(-0CEi#mLveT21E2S$$*|QP+7IN8d0Xl?g1TZW);y(S*Zig8F9hsl
zva65>#nLgMNkof;pn;dUmkdik#Zn|(CVfa5gD^g5Bj6z+OfXat3b_UV4{VW??5Zt4
z?r#0$^e*%3pL5hAg5QWE_YPvVWN#`HtHNkYCHMdoRlVK%8{&oH&re-cQn#N}vKtZ>
zIs&{D0R?d2pilfji<cg$^9TMxYLZ~VOo{~lc}wC(zYg~aE<oOia;@+@U@14-$c+3Q
zZ4D(EI~uY!N~TM7CsUqXQ(Ocku&<W-3MsS^b)BCfTjnp6L-AZCA69=VW%zwL-<WoS
zg8ztS_nZDkdz$j-5Q1(C7Y?<K#z0_ebUaEFvr2>!cEqqm4lQmwA#&{tkN^(_*Fm;O
zQ3jO1KStcGiaZa@agkFO&^Lf|;!{;njD!IyNm4)BAP59Z6p~{Yry=*gxa`sdTL*i#
zf~+@?RbG<SjDv&BCFeNW3b?58o)cuz5yhFt1aKi?1XwP*D{_9ntQSj$r-zT`uc+^P
z4RDP>Yu4~Fu|oOu8#reLNNnXVRkLAlCMZxx#E=pRyizH75iq$)e|iDqFkhJr4kL8;
zMjU#7W@cqgln-aUsN-mU%=0v;Xr+02zRcZ>-d;l46Ld(VreISw#Tj9Ro0pB14R>SC
z+;b_KQ@++9@A>i9b8W_R1PbQWyEci)wknO5pkOCEqz^6yngSjd#)tv3asTm23mij1
zjUVijvttO=OR}Td`<T1+76p_1T^=o>S*~s-DPEP>En<{dZ4^xw^%!`!u<WMIq>0Bs
zhy;febB>2ZfR{$mKey?cOnDp1C)fV?Lxs>Pyf=I9i_$U1%7V|f&=7v;%XA+siH_XE
zV9YRb7>t$6iljUX=&NDtiX)tCuJB1crU0GEQi84ci4DU51YEWwje))WiGcz1)E@?W
zG;^(i$WUN+?EtMWi|eO<>11Os>~xl>px7ulk2o6MA!;bGQaw%K1mQaV`R3YdY#9Yg
z5nJ1;gt;1M5&KhH)WNsvI(S)McM3yn3uiae53f0)lgwlS;cVV?KC70czg@i*`qh#v
z5u;GdaaJM7Ry_SetWMQwNg!ektX(t?T%U<_!6qK2?LJdEj6i=)@r<4lp(~+A$#l+3
zXBsOIfcZCI&Oi7cJqolcOGZnNd^N;n-Ly08-%^X7XrOo#@VKo)xNayL@gT9hjag4^
z0Hvx7bb~4E_dR`SFm4E)cCv(4q1EFM&>J^~{D}22o*&Y|I;F8zFv-99sD??Y!UO_~
z@DJUJ+<8x)-yS&n%rV1hAg4t(Xk0}YHZO!MR3zf)(_yN9Rb^%LTErfw6_ko8JV7ZH
zZBoU6_T5lm{D2ftnLB}c03*;{fdSV<&!YgHdzwp*_xhWN<Yi1WVC#pg;Q-PA3{}MX
zTaD{-0lYz#VBSv&!-m8l*R=%$OlF_sSO}#7?r7d9JzPGb-f0w=-@{Sj!Mj;$py#;B
zg`$NqLLrnr2aobz?EEXaFY?ymvDSo5c`L_rC2!>#Ef@&=jlY+J+1W%1uVHO&?bY-3
zDsG+=l!WA#zXJ~=hr^1JlwCwQ#6P4j85-i|MPNLFV9KV-`==XbS)(k7ATLO=w-TY)
zr|G8;m|beU=KxOV2-YH-`B>nrunGcKX8b+Jk>gfIoh7s7;4P~rMoiAHhhq_%wA6l1
zT#1+x9WE^KF+PREHU;OCdM2y7zx*`)3=%v%&FOwIC2K~Zyc%M3lq;+_yA=dj)P-r?
z5Z+4c%;_5M0}f&kX67aXx%DyZCn^cOn7@cwu*qb@wo5Xd{x$XgB68oY(x4$3yXikX
zQi1wis)5d9DIrmWtB6`54A@P&l;1V|uzd{(-U{j%Wo1aHOCf^tW1c4%&eNdrM%%`g
zY->xMe~Ab3c^uZ3Xcd!j|D7{v>otX)hB4J@gdhN_G(_pt4za_qW3tpK$}WGB`;%-i
z#);VMsB*I31484PlX+Kaj73V+W=JBKLK?z932ZJ|<Z~eM*r(MWhV2J|>MBTN*JwL~
z*mZ8&gXe0%A$vkvc+bS3jun@RMbd<A+Vx<T<Kf6qQC>J?=r>{zp6I6}y%<pe*zu*v
z!Tpdd{22{2O}3AlXd0;S@(+V2Ov!8xD}OuEXdYce;kL?MrJ!GQ>M+e=f>T47aq_AU
z5~A=X4w62Ae~jX`hCG{P!QAqJ+@`c3E)_p97*T`<!^|Ry3wAtjZja^xO6osGmc=N)
zG<^URSG*j<QbWhDun1L;UoYF*!z-*OfhC<VGWz6t;OAhAUkfh{2s7J%ET4EwM`#7D
zfJHsot>W$0JbUm4-n(2E7^GM)CA$acm}=}xkV0FE2dJlkKs6|33XN`p?j!a>KB&5(
z^H)QLZqs>&K*yGCKggL*b-Djj0$-Bk7mPSU2}*tI$`wUJ?|N14oqqXH*lM|4YmtJf
zz!COQw8yDL%!*jgDWl`)UwEceGw>zoEyZAzgC&dQ;I)Ct`i>64F9&&Ean`}!+wCEC
zv?J*?UC#{sz`py2w1%~Yp}JK~pI`NFt^YwGy5t2s_Cq{S!6S(t0ZEZu_U8CfZM<7o
zE=*}EhE+@k-p-h&_|jh{4!Q&b;CdDz)~LjjP<aa$f$3{tSb%<kuCvszT8ECtUN5O^
zp2~7mE8w0=^`?h97j~JY@r)DYa-h6l5~L5v6XmqgFeAfMSRdfySF`f+0IwF+^Rd<f
zc4?9vp*`SdZWCax8APNg08C9yI!{dsks`T`5;awd=?Xq?lIR~arXX4>U{!Pa@i-E?
z{I{B4@1At&lj=^YybdJ+jkxu(ch#UDIz5JGaIu$frVerrD#$Llg|!;l+o3oJh3jSR
zOg&Ye64qhxOpcqNZ>l<B_mR)`#OV^NK)xQ#`iOf81TnT+5ixhWxF#-2pH+58?a2CD
z;hhehk4*oijL;D4U`KD@v{&!}mr*+4(l5!9%II-}0@*1{LgkS;%&upsr*G34an?j|
z^8{-*M<P<?eQ_UTdi_Jvh<s(CWb2`i`+@Yn5psDJ0W9H^@oOO+LANmn@>LDUXN3Ys
z2c`z<7r|&2q~#+J=v=gHp~nI~4f@-b+cBx8E+Mne!&#O!yA)*hQxR6cW(H(%Ypq1S
zeXwhce&}sdlterl0LG$7C9tsq%Lp+1q7r14)Z5*@-xV|%N(0Lsafv3kZ$J4A37YKJ
z_|Izyds7>#`YOxig+1@?zQcYtP$|$s0;scY2$m&XzLNhS^N?5}U|ET1nmt#rn$(kT
z{gNz<Mua%T9G{o|Xz_OnuV43fEqCSmFX{#^K^hTXy@FYIV#|+rAx)9@CpOKo+utUy
z{3(?1Y`*>Z)`epOzy*i=SsU^J#U_)j2;(dxx1ryKeUVu84$JzbMVdt99r<lU6y8;T
z=tL`{k(y;+%#xO|ZPTe!A%GRn7YbIW`91<~^}FG_X=YH~*Uxb1XOh3gP*_H{-9UNC
z6sUc_UafDT@2Fl@E#dD!`!R*ETmDS>gSz808fsDGXV*RY0gTozjFkBJv*qG(gDnBz
zI0rk#qi*wyt=&p(np?!%Laq?6`*5H2t@!UB#~+4UDb#=9v0`8xPkhsx)FZdORRop-
zZKGdEV?I09D?uWm1sE9_i@RU<Sfv}BXNhd<@t*w7)Q)9APlX(m@SKbGeu?26;@R(q
zNdMM~-oiuT29PuZI=qbf;H?fG1f0e^h2E|{Hl0+&sbUn=MLhSLKdzAl3c+lhKnU2w
zbv)E;i4X!dnwiQ&6N!Yh^`RPV)j6>OSSmJo5oHTYw8`!3ePLWAOZgivee}iqJf#9%
zq>R2s2IhA8B=R58k+8Er@c~afxyG|^$S;)}Q=jkuxWV)-d>-9*T%F2P9t3?pB<y`l
zT9G4dhHCNuIEwnF1{gtlnA)taowo9`AC0CWuL&puu+@uWGAkq(zx{ApA(so&$?=_W
zOyBhBc9sa*F7%8f#_(1$EPT^)g26V5Yzi{f`9Rs2>MCyNhafjq$l?%6fykv$0`mYi
z>er_g`nVDa)m5T|Jsr|lR3i2fe(e{NhdM-11e8`SjZ5~LfXg11!23ihWZT6A&W0WS
z<Wr~yHbR|3=8S5uhNAp$761*%axccBhZYGQCk%d9TJhUpz8$yh?J|Ew_Z7D9U_lT(
z74d*Nke{iFPBqKiUSc_kl2hx4216}!hCJ%L+;3???FoY3e(xbSj(nFw`^%saVv$Y;
zh8zKXVWFMze#pT-JA_K4Q5HdjPSVg|1Ey}Lu~a#%Ot*=jFVM)5PS2D{g^21j3TOHx
z!Zn-$pMJ#0b0?hkL2!fY++SuS6*wbc5K<7~F(pyICFueB;&1H$%vQVW11-}$Y1+X|
zAjBWzv`>-mLCAKhpJqSy3(T|CnpabA`ldJp=&XYq#Q8)EydBH?cr(jt1TP#-^(WX*
zhqZ3A&wQ#A<78_m)h)3Gw3DB6nU`wXQ+iV;&8p}_gGBDWjYlDeJ#xa_G8H-^5o92y
zI}k5%m%D(;qHok*eb&7C#p8XJbY`S4zr|NA47uw|#M9;fIXjoX8iLsdOyU|1`v1{A
z2<Is8$N3nCTKPO$(fq?8iuf7&YVd_lsqf%)(+ApoRXgyJrL~xwom<PX!ml&W)~(`n
zjv&%yDql<bpgy(6{Pa&iq|2_*vxzN%1IZ+{C8}!p+u7Tk!L%ELY*W7f5ymiC_oj>j
zJ)gr$!(rp1QS4{NA5`9de%BjIo+&2}=kPcsOdJ|uv3W1)(5#+Ml0yH>Jr`^o3Nib3
zBM5Ci@I*u58!0zfcpkQlc>JS`wgNY4h=Rx39;+&)hK-#vRY8$L5<uy@uu1jgwwA|@
zg;aZ|C7!uLtl8kNh<B}(2_7r?H;IkB>(-1N+76$!`zg{$SBReh8y6Kq7{07&y`F_@
z6O*Z{f@Rga?}Zz?UE4w*Q!i-mIXiB3Ph~g-E$dlIlA!^UNFk39lbwzp&&(iZk4g#;
zjtU=MmZqp*(i{bVnvQLS#kk%<&6{ng%4Hf3`;MtW>O!9_FQ{!Ht6@eKj6?LCrF==-
zIl#&3X#T8}Z_PRSnVOmO3=Q>S8eyg)84rCEw?omI)=2pGux+~SdHuGDgHprf=#(8y
zO%grCLpwf(Xcl$r`rPwu8z3Ri9+|0<Gg_3mus9X9YktgmIm*hyO-3?8E$9Q(NjavX
z4A4NNBCt7=J;)HKpAw8Az*rgNu24l`THj-(h|}<*^R-`#i;E6D{6+J4(<KJv({Ilz
z!KhBlOBG<p*a_{q|5=ZXd+DN*sTy~dvcmdiU<tA>Iz<0{C`bCse|aMe=hv4nAX(cS
zRXoeEhyi~k+_Gjq<=NQ$2yLMuDoxKgF$2d)j&e#vu{OeJuQnedPx*VHPx21vWW&DS
ziKdm*F#OtQtG5ki3o%wAi2AMa#!wIVqwr=TG&MD)+zuw<wW<vWmGE39LPFrF$L`No
zi;0qZynk+W6W*U0ly-amOE2XN_Y4C8^Y9n0q>}9lEtjg-8Fu*@?78c<bT2c*7>w{u
z&oB%6Uqk<M1tuJP{|omyR#fppNiH&19Ps@`sr@aCZy@CZx5oyr{Y~u||AkeJz}+y6
zaA9Y8o5T}_twdyfI|FB{*zwKB%*+*@hS1C-&(amyT25&O-RZZ3b@=)jeekGUMshW!
zy(DR2*Jb|4b|wK6f~FI*ZVTOYQ!%1?XnChu-7E1rxGS|tohcE`g7jr6LwhI7C?>Hd
zQ=5g;gfkt|^z-J?LXoC=bsf;2*WyI2t_s=NgeN5#|2Z@{M9L<JiCd;G+v*7c;V4u}
zi7eOjiyKR<-j1yQ*>{k%<<wU%-*iV5_$XG_tjv(7UGdj2p@3oAe<o%+|D7A*`n7fA
z^M?2w;T)}rqj$7wFkQVr<W{aRuRPn@*j5qPq`LG?^7yF5<M0CeOLA`IFB<nqL)V<|
zN3Flz;-#`KxC`cc@^5#95lG0oPjL&>tA!ge+E?~7eLehWxs-1ASwk-#P8Nq5I$~+u
znW|<hTQ16b0*-%Dp$qi}t^aZD%kOX_OH`G%w)3`NZ2H+PWbfVKG+Zy?!>%07eG5pW
z>dISPH(V}MS$H&hMcy0qKVLeUIJ>A$P?ZMil`>G1Ta4xMxfP|PVBMIOS60SyJFZ!O
zWbVy@FDj}0;ydlPIrlF$Ne|{BE@T0|MPb=|d<cTVcl~GbplX%%{Acp;bx7#BK!t`s
z<R3cZ_q+($t=)M^Jkg~4t5!6WxDugZc;q;oyzcJYAv5TS?Tr>Z_Axzqjc)mGklgi6
zbw2IG(W{Tj*kMP(KVvXh<O9sMAN+Km)35sadh5P0<mtz&onjJtoEoD}A}%hj?3C~{
z)5VeubA62{rgco-Kmw@F;(zbr2_+!Dxj3;iTp{Lva^s$oor|b%vV7wqpAtFP{aaP#
zx6Y8e?vD13$WGd+P0j2)uM0-;EisQ1&KU>Y%_GXnIojx#MS<q)cRL5Wm-4tvYqAJe
z@8mD}L-+3+hQcQmxhy=2oy(>#)!g8Kjm6^smr^hI0eXTyi54CnUM(OXAgvN79JVCi
zF7ppY{hr_vARX(P``vnl8}JE(aZ$wUKOVM^_>jgGWTEQJg~Tcu__ab>wSNyn<-59H
zfg~+Kk|~1xv`u|bj~_~B3zHfQz&rw0&Z|^}n6jxG8f0t=NrX048b(eT=wb>hGyQo2
zr;9cP?)v<l7p(ua1RSD3PY?p=>9llwe)A!~Aj}NTu+vO<8K!STsQ3SFI{$xo5dPs+
znJ;q`JyO}&%-N(L|2fQPUff4|Wx+%Oi+!IEf+WBcm3&te`~J-F4DvtkK|Xq^)K<XH
zDp5Fe(5>K2qUr1vJe_S_@1!rj>3Uy*45argB<Bs$BYZ6*w}fDmVEd==z<Dxknv2GN
z?WTIL#UUL>3)i9jvJ2QM8jsEx|8wy_TH$|L${vL%02+*PSecvmzwREEoGw!T{VubK
zknf=Km%s;=jPsvn{VeLMnv2aW1oeyO-`8l~K$zE3#S|vf9nprc#s973wD<>2#u4AL
z_%HwgF;d@1Y)LBYvp*31$Flmrd@@Kb7PJhI^PfRSnA@509qFGp5Fvm(D<9ozrvKxC
z`maCk`Jcb!_-A#_?p?O?iGg_Bsc%BJ|NnpY|8ZjmQg8|1jKA4lthfJtZN5CUarCWl
z_~xtf_pzXEPtC<=?u}!Cy=LTp_Q`(|o`xeKeUylyz<R_J@=tG((Lp*m(MQ@&Lp6_`
z`>RyqCDg<D#(aSy2TbCIxt+iOROO2O+4nOQu@@WSvw9il5`UAIEc{WGM6HU=2!-x|
zIU)&D4?9M?ly@-RDr@jO=cz=y?o#9=R>3gYqw&AWw||YZ01?O_yM7^5xRAs6hY}ho
zDm3&PY3Z|-l?*H*dffS1tNF?A9G)|FxFj-jkM1_QeOVFsSlyrX^+jT#2vW^Xn+Yqc
ztFvc;M5}L=0_X68B}k%rV4I{a8}kr?BTe|z11k9%;7$DhZLDuS{L>QM-RyTCh2S8g
z{3DDincu0T0$I>XEJZ3<mET~KkGDGL$v<A4qH5EpUHJ}lZ%EH{zSf4jjvF1Z@+bTK
ztr(hOUX`2bgqvD2QjZ^Zp+B=U>2SLVV>Pa6GxQZ=y<2@LKQraB9O_3o+&+*aX(J7t
zuta`o_X_*5;`vZrJAd*Hiy#Rr`A$7{(xLBUtj<yyFLDEH+*rukaO}?3s=Zyz;;7!T
zu&baTJ_rD?Jl9NCb@t)FL`*JWDEhX$&&xD^V&~H=S{Y5rZSM%kbql-p`}YL#zdoW{
zV*j)_(Ear%WIl%5LZH)$ZVMTHM}aERXmQyYQeQV-QPNk_z8t~Qz=MlzN0Aq1CM|NP
zw)Sx?PDVfd{&k9vZoSp=tF4Q3=MCboMYxtm@|>dAU?>ynT6B8ngU>PEb$uxUr6?+0
zci{Z*Nbm`?Be4Ot-I(wc$c`tuP%gJicpd`MsE%8vlz_!iPXG=8SNiiI7uAqMd9U@s
zjvieOyG5%P1{O7qQJI&?!nc-nBAMfGKlVoLW0C=3X&4(ayc|?|v+(;_<@>_*#5Mxt
zv<Rj*u!M`tSvQpdzx^)jmw3^hbkcU|u0^S5^nWi4|NW}x0_&g7=BWDBZstow;wzi{
zeuXU!nY70-poSA555#a@tMWHu3UpA|Aj~~V1t-mRE?$$wa`KA&D*cGcrBI{8lE^Iu
zjfNTZSkl^>wS|pMn(-$NqTB*9LxDhYenVd-+jejIHss*)w*LDh{0cp>RiiM6zk&A~
zFWnW4WA}zMbd+#dvf_}q4+<-`ttfLoO^@*N)rap7&s~2{)+hZRJiXw(($5&Ic7oVx
ztl&l^$#K-+kS~chTPLxG3D|m(?Dy-N@2B~EYuFZ6d}H}0vMArC8Ui*MZi$C+U%vbl
zGT(2pzw_5KHKnVrM)BdaiXgvx`V)>?c(W_lc}^_C)rq>}wDCuZO5=3rLiH6%n_eBr
z-UD+@H4z$&d4OPY`<3sT(^dhiBI>~eDjx(KKr7&$$B#QfDq<=Luz9R9&h`2pF5!^!
z16Lr>)-z)pipHEGE9+-nvVy6nre#*UMKq>zaX)2YajWjl3W+OfY2@jvUp#c*Lp$4y
z2tYbAQt4M(dJb{S{%_P-tKS^gwiYJ91|IH+;Pdm7-KoSo?E>QDG`;rQ*aVs^swJs4
zUPr$-q69tN@<!C5?glf72C-$Tjrq%f)3&HttW`DxCu{QRhTAq?)_k}A?f1%mt)xN!
z{pxZF2qD{%Dszd!qu#G=Uj-}`$3op4#$-h2kyVq?>d^q)ez=5;tD&)LGL+HM(25;Y
znu8?ivK)`{R0*vec|9ux-8p#Akx9Szl$YaQPEWJj9|g~Qql15Cg<QiA{*?ILc30@$
z8^TuHK;@n%Oqvk*=5fg9%L1@Y98Id~=m=v|H3_u+tc8+v0{xc^_~~PhQphOW)G&)7
zdT3|o_VrYfEHu3U0b{aJjM1jeV>XnKr1W`Qh^mg!M?9gsv}8ygavrTpc&U(YoIN2a
zVcneBPI=AwH?$L_)Ana>P1|gGP=tl`+MJt@Gn?v#rYwxZtw=&|$m$oT>p6OU;Z?RD
zdYvmsnd566Q}td>yw6Yg5ht3zRRA>HR>&|S@2L~WRK(rSeJgZy^vKj)^Is3@rd9)t
zB-O1BFOE?fi5DU@Yu-i|`c$2adF$SYSLJQ(#9cJg?-BzF?5sZseV+`YudvfoGc7av
zNt33bZKq@z(RD+*y`*g%j{4(A6Z+5r<AknWVtMRjjSc5u;Wqa?j?nGHae5_1uryPi
zZdIhl=Zu*TQYl0IFW|aRrnXX#v5W43zA{kdJ9-2@drLFt%;M*ntm>av42|#S-w*?$
z{<~wh{&mWB38C#oXRMC1^@g{&Wcf21yb!cR2j{P9xJ;4__DEL&n+O9R-a09vGL#$E
z1OgshX-<}-GknF)&nKX*7I?&LLMF38_<+ZJzV8}hS0t78s<x?JH)o~X!Eg`*;7V0r
ztD8h}6V=-y_O_W!B)tHF?=w@Gh)qm~f~4XA6zlE>xKT@bLRGe^FL&vxjbGB~^(VM;
zk=i$XcygU=kwb-qq@w`0EXEyl8r6EZrLkGr!j*K$3l-g--RhgESd=ZVL%xq`RP-G{
zvS)sluzew42^j?ig2Q8OtSDi10-n`I4Wi0bsUWqiDVEW=<@B5Wh|TxZks=RiZpb4_
ziXL!nV=ds!cAG0;Ws%}DA?f#{G3Qm*!yjuGQKw``3wUiKZBxEImYd6`MaI(57_LBJ
z0EEwiP54ew`0ati%~d=<jeFisJJM3W3F)U9Wj}i0v+mMWW;Yx}ddfnn2RhhDu;hr-
z3^br@Ky8iq`rUPYwDwO4{(J#>|GlDrXSu4Dt#b40>R)QmVT$twlJDkAWK(x!iGDst
zY`zcwuAPHYJ_pz7+we=F=|Oc;UZx#*{u;kI%x5fvFnV@+hN#Q`rMvH_SjM3$jx}K(
zA@}P;3qqFNb4JY_yZ^mCW!53(z+6eN*tRFgLbZKcvZC1rL|!+<hi6ZY6e&3%c8_L!
z+I(k<$~>=Je^|9waD?&B(yCk!HfC=AsOp-Qv7@BmntsKjEdEXB?YTFg9%-e#?VfKx
zraIdWWa(m1Uzh(U<F#`)uyPH@EO7S!R6}jCLx3iDN*uisuD3Fi=rF=Oj#cncc-%0^
z5WoXHf#h+9Y|ol~=hpY$&tl*lQs<n9KYR^h`VGJK_uc^}?+{;?$cKx2`-OUJX0O<l
zbux^JDks_a5Aesx6O+{%&))|<N~y>hR^0X#c`N7C*HI+BAPK_^s3Q?iW0{eo>5NL@
zY`VdkGeV6#+|g6ksQxJSD!&wcvLN^WYdmUgfLzw<Dx{Nux6fs|-OBQj$IhkiI!jjv
z-B~lx9ATnw^L5j;&a<VpuY36MS=oG62_-ocLHPhUkvN!PFID>~VfxXw$z1dokyp9~
zQbCItLYR}I{Y$v3Vg}msEPxBsT-Iv>M)Ve2y=laA{eApm9}GRO>@+-~hxdsVCFukY
zS7dT8w0emuYw-TbTc^)D-Z;LpIh}F?t8qPD8tY7xcU@F`_QJy~M9`jPqb~m={}L{%
zg7cQNDEQWp?`vxIo@JqOmYCTdogx0i-%v7tvn6pFXA+2cs02xi<e2%M#A%Xa-cnLQ
zEjixzt!A^nZ6eRQ)`iFK#X*PVdN2I%mD6_0oWweamh{Qqj`#Iwp?3fGVn51nO1rJ=
zDd#0}ioKROmZzU666I;^l}7&akn6zGJV>5d*V9cwe|f(QR*{r>2}azav;5uczS13g
zL7AF7{#;mnII5BjXM25LRPX!5#8iu$7q8E5)DM?AUOE^1)DW8HZ$N{EFCiA}DT@9y
zC4@Vqv|c^WR(kaihTU}Bnrc!3cIn>daC!={YVibX!6U}xZR(4wX2sfW^&v0<zgrhi
zgK<mNRv;`w?qr9>XGJv5(gP}oUCVHi#_VM1X^KD~Ci5I8gXUV#w~^-eWM>y=-dnUo
zBCTvmbEE&L?*B)D9XyN^Je*-2-Al7iN)ZOU`~-jHijy1WP98MC?(%cXGR8XikwA^j
zN^}!pSd^5~L-=5cSfQ{hT0niXr9_xJp*=@D4>D@|Q3$5b*B4xP&V<m1Lkh5&u?1$U
z%pt`S+kW{AZVYtDd@IEVd0YIj7+glCq=aWto^ofSU1SLG<|99rGAW-e^edq}X}GxQ
zk=rD~jBoOFx#ea6X*265J3H-5(m((b+Ho_m`dHitYMP_ntSjWgDeP;b-T2~;{o7UT
zP4~O4Bw7sK2+niEZTNxI$g5sO=2Ky9)p!~*I3vWMhb-c~1i!uJxGS5U3j*<F$f0!I
zvtKG~%key8uaEj<ojMOm6f9FNExQ1($%{|&bZ=sP>N0BJzPS-k*C4&gWJR^~@MzOk
zX__;omD<nc3s?(*fbMJFvvpq`&UdQr7>g-}xHgjCW1DJLJNV`t{WvtDM$Fb8b905i
zC?3r>#m3pP^}!jaS&OCwYj4C67Pgd*g1#0q1@E?c7gCbE{i$@`3dQJm4(!ezpZ!Qb
zk9B?UTz0BSLc>-zd4CFJ-ihGmY~H2TQ1w1vqxclohLB>Jn8wI;vJep;_}IBZgk)$n
z5#dIw9CjZuRQueh-Jf!hkBPiU1fd2ZAqqe1>`sg6Bf4cWW$^Eps8%$}#PAHYs>I+U
zkwo$3s6PV~Dva;r<{ruAWl?2TQF$Ccq*rQ3_$XxRG5oAnx9ZFH+C-H}kBRdc#`xTu
zdQ$SNYv-0}{AtX&us|lwC_e48gkE6Av+_=nW<Rj$4e5eXGGrL##sHK?ro{?z(=3i7
z&)@7uQej!RD=sFWo1sZ+>{Jwhxj1z%FcRY;dbbs}-&oPZe+VS3ZZgdA7`Y2SO4N_D
z<l<#4#B5cvAg8;<ikk+XBdUkzCCCiA9$jZ0x05L6pP|$0O-4SK-a9uO<;oHqC)Fe+
z>?{?aPV5joqW+(^l3T)<-b~RM1SUajg}W(6GE`$9;G3>-+>2g;1mFj=%OIy=@c>c3
z5A?uKWw)$qE?6f`^0&p3(Bo$QmAOFa*VADbz!mTD7MH{Aue5AWZ->o^`4^*eh!Ykf
zxZ9|0U;L>v45<|uzj%4)vmsc)+|!X>f1Y52hYuk1$l;N5U>ggmBPi!#_^m$H<zMKr
z8JtwPgEBVOSC8KmhRfPzUEXksWwX|}nqaqd<-qH|YdtdM?^>ZjJ*Pqroa{uB>OXyw
zC#8W%tV+;B-Gc>}-X?tINf$^EEJKvHNYWv1V_b`4D0Q>1$G{wN2`;CwMrSvD5I%RX
zZ&Qx{plVy;SI)v4349Ogkm!2fDG*OhLt6(Oh{}J{>iL$(o-$;wB^CahStfyX3wju9
zQKwpQhib~KkP7SnP@*ki$*Eu-tlH<^rN0YP-VaQT8&|c|f`3!M^-!lSM^conlm7Zc
z*eB2A<K$CWIIu(YHEo{xQwRzfPOXE0JFk15VEHHQ=$RgzxIWwGQ(fujIibeoG-{LJ
z1Etk>*g~HNM@lLnpl8(iXTX4wO~G*y<||54>gUtzsF7!Ap_<#TKD!-q>YRPK;f<zO
zvnslq1R}dD=UyAgH<`?tGY9c(f>UH|la0i`<}0t5eC`R8iS4TXqHUgQ1m5e&vmTf0
zwD!#;ZBNh8njx7*G?>wq5x@>U=NtpO*tDe-lS9z=;vBv#h}BP_XK7X_Us5;_9A`<$
zp9~RA=4V*J6=HHw#FkEeC5;H%$u=ME3r-ZN6FPzagNlVJ>-g}h^lWMPx{<h=chd<Q
z*k}9)OJO0inkSjyW(=j@jTkS2OW`1L5OqJS<8+)25?TMgUUiYnsk=C|HMQsLV#51+
z;njuq_ikFalJU(uL)pkvy6h{D08<qc^4Q6!#P3(#LJO^ckr^(P(ONiIi7DsP!*h)9
z6Zc{M!^+`?k$MtV52A;yWhY@=C1<84_t<?Gn*Vvs`9JovUhzHVN(}>_3})|sJIUhB
z(E^x@Q68@d#Rdp^fI3pn!U(jFaX|#~sYd>G_%LvP!u#s#L>p??nWx1C>X?x-Sm0xq
zimf4Np(0-=-~BmAE!=-@G-<{xFzs4%xW4UE?mP55DkD6Hn*j-Jn&X!SG7q)hE^NJl
zB$CYlQ12^_;6Y0cCi6J$&;ic)5%NSTR+ZVHYiGVwMYjJ%pa~?N-f<2eZtCh@=Wott
zST?A1tK8*wj8)>TsRq{qAG6q(c(aE`T;k3x&nS6W=;+61ZH=r7(<@Gu=v#sh#(xjU
zO01Wev5MDaN#@u9Omt+&`71aX_C6CG5<q8nN~QCX6@9hNIfn<Y^<i42d^0fSzz4}_
zpSeTP^9um%;$=z87iu-BGL!@{Lo7|IxxrF0>8T*p;{+{Cis7!u86?IzdhZ4%tZZl(
zRgj@&0EJb_HKcwX8UI6WX&uSU?jL!Z-E3bz^CQ|h=}XwV_=bCdFm~G>xtkQ6l=zpD
zq4c<OUy*FuD6->$S$LR3amJ(3w$!WdL)H$-{K^(m^ItVH^v;83ex}DR={;SKaeL(w
z`}+dp<?U>l!dH`j6fnZC$d!8`^sSXB#05cO5{pV6UVSL7fU*&{me7>o+u3}F@+Sgy
zJ-!a@Hp#CZ4NYIaMDcxIi<N3%N~(vy-iK`JDfE=HMSf#MWzd%3aTHl<#aNeT4x>YY
zoG{(I#2a)9%r-&3Ioe-H4|Hm1D+b;Ngp+8h<QX#XxmnRqGPNV9xk|`+WF*H*)D*}v
zx=`ZWW)tO7Yar+6Z~fbWU7bR&N1;N}kl|%aGS)f>8hbrnwlv+Tu%*=fjJLbe5*Z;Q
z(TtSFPH$Cu_?|4nEP%t{PYCZZ7W_PGI^4Bi`!~xk{`M<OFZPRr(<8?JW&UOt-KwL~
z`_qBdQY2cv%h-ano$@1%6ATmk&IpIfq*DR~`;o;Z_Gl0&{ERCp>Ef!k2`-s%FA8rk
zP2z=o#VK?+DE){owci&FDsT8W79r!oa54TdJC|?k<WK>SnVa<UgYPFh$S&_i>Plt;
zf5>xkO6U;cc$(jjzuu7?yP?oiZBRs2_aLL<(w?oxnWmt;`A~8}Q9@t=5iw83?oV)u
zq*h+riTaCuBd)>IpzfrAW2+iog)fnAp7Hm{7o<)x?d%XT;_yJBZ&3baXKpY}fq_rS
z#$;CITkzga(+B<Y?t)%V66^HwW;J64H5}2Fa2=;aHXD1N&JfoIjAsvt)j@{#WRs(n
zIuv;@2L+2&$3>5m!xMoAx#o*=TA+WsM8`-y?3Js7_kYz3yHsJa@v|INJjbXYJm7*d
z<)B4_$QqT4F9ql+;MXoQ>3{s->dyG;aa~_t4Qkg`ECF>p==1tK)+S>meF*EiqSgg=
zR4i0+!-RExc(_wYE=o&ZNx}KNFrV`-D@%}_U8zzg4YZlyX(oUU!-`^0GDDlp>brJ@
z_|b2)ZO-oVoiHHK(z-=WkYr&{Fdd;PB!;|~{Q+R<fQX*8IsP>CS>a5o;DAM!LTU_k
zzb<4E?Vzxzm7@YD=XvrSUn?#W$05#3c6hsWw$|<Y(~dz3Z2~6qr0g_B(I5`<wN-HK
z@>OPYV3~EWwVEFH!nSGw4l_s&GF!n!jgzm&4=yAvOiSXefTzfeu6Ph7#}rhet##E1
zGvufH*bf9)BaAS~VI{7`X80Rsv9iLQidYEo2tzCOiXCM7Guw$y{f{j=i@k$2bXC~}
zI$A9)b(%XU#?xd_ZTeO;AC5eK|0YZ>kr&~X0vjHk9%VFrFe`PI&VTCq)R2o&5a=uD
z*Vbt?kT@r&w+P^?(0jAO`M=fQ#=<a(T+>*v3~)iqKwTgR$hmY%{2Oi-hYdI)=A3UX
z%Qf_o18xs>K(RR_Fo`@j)#%c1TSxvwE;w-FgkY@@En$yjge>5D01hYVGq;PzPo1s7
zB6^;M-R@0fn5_fJu{_}2ENTA=q(u4Ne^bhZZ0`JCShN=qm9Y+LIAxoGvXqMe;TA55
zLI>CRyJ(h3fjHe<CAvb^2wJc^B3}<@-;#vD7}AH}tgn=AH+t}c3cEsmtz|mWk(S^e
zWvo{^A>F?i?|(UU{E<L{1xb0xCfB_<UC?#T8>xF4sE}%z_=~DZtgFyPVPr$m02q<^
zLXP1$sMJ4lA6E{|ZO%JbB%$vHrdy~4W%<JA(=SuVp~x$E_xs7nx(&9tgXs-`0HFSm
zdSvt*&{F4nRPe`SqT4**35#SlfSN+S?}RLE8uKmMox)AN(0+Uf3Y?}=Ylm5e@Za2$
z!2VeTtVnhp5nNG~=U>9Ozm(4i#AckzUAq1sy51@*%INJE76ci(yM|C0I;FcqfuXw_
zL_nmwyIXpQp&RLxMx;wg0qK^8@A2Jxzu&d@_5Key;0O-ZTF-s|vfYY_98~(f`7K>@
z+-N(@P*g;?LUl=_#?Wye-^lvU(#u|+5>JW16k{C%E2b(667lAIWiq0X<l?Oi>$pJW
z-m!2VX8~m7G}#ZWK;c819wZgkGuxWhtaKGbZ7f@Red&{3*-Fy7$X+}i`cZnH+Q*cm
zgIc>$8E4AOV~XAJjP$#=^U*2G`{#Nj60tv$D+CAOe=mCpShI;jF`1eB>q9LOnnq~g
z@>wcwVadB~6EM--p;nFF&z_g%Pwz>U;?&#NOM-&xM7mf!x)S0JWfJ+L%N1iR_Wtar
zOKvOkdpvyNhaGLM%V(6kMe>c8e{=1Rk{NkL!_Anq+gXLupSPxs)|wmm;};(VSIPbu
zgx2a$_tW*}{{@cyH&php2@oza>dG%zJ|L-s8f#^~p;rW~Ba`=<Js2#44H(Oe!BT>R
z;3(T6vs$`&dz95pIgCHzKl3XUYRFxXn=s)0g<0D~b2lsQ5idNr*%RyzQB5S$Hrg*3
ziSOPSa}&~AsYFUs+w)ep$~j}4EhAib_{wlys2%d@xU!pUwIN+ZxE(&zEu74X*nP1{
zOse)d+f-$hLD~-(D0;0xDZ9)Xw1sq21YFC`wC>Z#SyKvYct7mBc&e#|();G(b3KTq
z;_ar4-Up~pKw{8|dMLhxn+A0P?oIVNT~PSY#-v!fSs|0g$jZwgge!C)T2^i8QR~_u
zsBbnsC8Ps<s-jAbO$4Hv<rG?bs=&pEO_9wtp>rnUNG!=liQnXBfO~|DvgJ|y7#vBj
z##0L4$XO_Pn-@V|hZTqP^oLwpei`6JZVl8GM#HL=Mvy|Qpt>qh+H2?dz;`aiP)p_T
z&PIiaN4^9X7yqVRC&6gT<O7)dG%K^)thE@oh`rHQue_j>E4KT`vfY4kDAcpc?@>Dq
z50iIF{#k~OelTRO$)45#WK@xtAhK}nRV6Wvy)2P4z+XBXj(nvWJq*9AX(rW0h9Cv1
zx1TAf=Mjvvy~XIII_wcq|7yLnquWg*SLm-u!LWcTO>87a$;z5l6r{@E@^gL#16Lge
zIXKx@7KY9mX?=(z;E~x0?*}9+!q!GY3C^td*9=iai2{33M9@^j!wztL#H=Z_(I;zX
z>ilx%?JEuTN@oHM#we`OA!N(3c+W<TNrSDQ**AqRCvW)E4K9S&6wpVxX&;?}EMi5!
z=7rUjqnJ<FMcF{m*d6p7q|rs03_BHzTH2?epDB}RMC|Z1ShSKXP0a%_TFVJMh*8){
zAxX1Ut>m3}F4DQ84P9@An?6f;xM-$AwA#F0P5gd}c2WtU!?Z4T*h4Qf<FA1+sx>+o
z8R9oPl%pfxwjGzg$D;P}<nMUg5QQ7J^}}67ye1>lENCA}|0k%pXDjI6;QW(+1WAph
ztiS~E^(%dkiF>4%bfnPy06a)9$m2rPLOvc^E*^uE5V`BJdKAR%^S*#8h2>^Iy!kW>
zjJ-F!C{HchGc@){_hJVaG{oZE)n6_IGwU6{s`KX8A_=~@j!ctD^FnL*81w__VKRr`
zw2P+d{;z+Yjpx&C?O`bE78Pp(U92MY{LN7dL`F%n(PC#i3(@RX%^FX#dj#qQUt-X}
zwBr<tq50)Y$rrSnA<Apl{-*`EEdY9K34vRU{1i<R!6EYKT=eB8WGumwCzGcL`kTP5
zS)rY|F#K2SaR~@g)~o*0HO!eJzbKCOk(f`olKrPOVeR0qB)qnK7DahaK|!{ye8)GE
z_N68AUL|8Ardpt2n7DrqfpEe**qor}pN}>_>t-~B=oEdCtJPCB#|K#@hf-MEN{2c%
z%cxBdWM|Pw0vWJ863)w&;B(E7dCws0gLr#k^@#(E{u4j)bE|P4?z@H|6`QWV`gd%@
z<j`=5X8MfRFcS*)BcdqPEJIyei}$%uo!lF%@I)(E@Ih+JyOJ(Oj`nh<*$>Mme#}6J
z8dD=?T`B|`a@X4%^@1|=BMBKNBIZxj{i3u$`aMD4Wda!-R$yu5PTK#YZM<Y%rlCW}
zt+*Q!wLNBG@*Go4P^1z=oMiRK%=l}ib1b7H349P7Kol)qH){$3s&vwv*1~LV#|dN@
zzYs|@3b*$97|6%-SxApL)0uyeYYZTqw3^45G-%%3+~m-QOnFdomkLjwnF^uTm9Pt)
zD(_JA3pah=`uQ~Ly_{cL;vth)zVlidJ!JS?W_#oCcJ;J(SFVrmKC5q)?!1?@)dA8D
zEnR*X=t@d+s^1x#j<lU5bXM5wIW@Q~P1k0yBl0n>=a!VR58rpd^1rt+wh{D{um>X2
zasm+F2)Wv2<fc<LI~Wl7UGHb9su__)S?xgF79&sN0l1|Ib2kRot8R7vJLJ%U;y(^2
zf`Ft1X2p-1fa~?u$SgnD{pQOHC3?M6*IxhEc&rIDFzje)^j(~?R^3h?BJ@1<o42fc
zeO3_F5Wu9brlZP0j}A11giPNP9Q@ai{qOAm*N}ZM|KCIQx`TAIJbDKo2X_roU4Vqo
zQ{@to#cPu_P;0Gq9xkP)$0tU9tdaW7;`Ed!`z+*T@?tC7pRLSa7N25gan6Bn0`eS)
zDgHrMreqlW@hJ!6r|T!Nfm*6HtXcXS7aU>g+1&Mq*05J}3_D82fxQA~xGArPx>rBY
zkN^2m3!wwk@PM9LyctGGrhQC58F!Vjv1pu9#wQ*5Kd9R*x2~k%Fw*gdC4;$0x*dP&
zJ1XsG83~VpvXKw?LKJb|CSb4s1GEobqNVnJDAt4q=%nfzmPuNr5Rp$wlGOwynR#P<
z8hLn}-6$#vvrRuig;!VSM<^w}A0bIfu}q*MXrC!PzV48_CuB|s-fQS+)gIL76FF5z
zfBLKjxkU3#W@+pn{<Yac4cmK`CX%fjmdOm=z7SW==59vd7UH<6^3f|??a%FG2YbDe
z8LwmSuD?(jMF%3bk*9fZ--Xh4pz%00V~vi+B4ul7lnJob8g0=f-@ui*pxaTxj$SL1
znPelv`XNi=PVpO>b!M$Qwuuu*mJ@qj*K63Zw1@IVZ#~ZmsuFLr`eRLx^qn#Ut=I0U
z7u){We3<8#`B`>SW+i(E?A;v@0Cds4ovq}KC!1yWX!M~=I$9sf1L|@r8TMmBQgK<K
zCX_!ot>Y+XbYv1h+)BoS5j-g4LgWLeqivpj^-$Ps5c2sAyt>(%&vno>-TJgZDqONC
zMI|`Px=X&-b*;K-*2m>g#jq^T>~Zc1MLL>*b(ifjG`>ra>@$@O>V14J7j>$$Otcn7
zF6-U^jvq^MQx`nzil4TySX_FH+Q^vL$j?imtcpc#v}R;uII_WX<v8rLCH^|Rxxo<H
zNuTR%u3a4p7Zg*1oW8ArqdGb0;u6iJgp9VznC307HJ47^)gPEtBuY7B#Pmo1PhW`u
zy8lHeHj>6k1j_r>#!HZ5(V0k~(L~@x0C7)9hn`|3*FiQkAEg3qZtL(4;;+tQ^`J>-
zq7sxo#y7H#%Jzf@zg^j{!(J>3Ou0BM{yNY#(EtC@#NAJYmdDe@P@c-UmZv-}JS5te
z$Pz_9+MocwqgDG50K+;Old1p5B_2pe3u~Tg!q@&bkv0_t1lUYrs*EO#$PVHJ44H>s
zq<7vF!U;Wo(VUw;hg0j4v5e24`Xx`JI^m^oz#3@3`g<JgZe(a+Gm;LT+NnL4%|unJ
zYv9A^q&x!^&CVKB^cAa*o^KiG0y*5`rK1yh1j9uTypP8e^w`g++YGfT^-QBOkjdxC
zoh3&rqKO6jbNwIsr)@Xpf<}aQ@giTzb!gV}eBkzI(XdrarLRZEL(r5@=BRwWv%iQ6
z1c5_IVc5H5d;PvyteOsfmt@~Chc@klpV2)eYsCWy($Yf-vrshniqGr!1iar%|9DSl
znt=`i?imzr+8i?1?KKCDYog*O(Sxbdija70E0MO|T`h!MH>AA*f|NHVo9#s1BPr}S
zG;v%<bh_~cLZRV)<G~>O*5Ju^UBg5gI5ELnf7q|ze-!aN&2N_4xUBOkrm@E?`ywCM
zyc4@1brgnq-ZetdI!*Oi-WSTwRfV5N2Jzd85Z7aqJlXSqBrXmRBF>Pz`!zL7@Fr3?
z2u8O}A+7XFT_sZv(=K(Nw?Z;+Ru-nY@|W5l1EsO~bXa1lUc<ip7(OAZEO9Auo&zOo
zex&xXSiH24bE4Pth^s#{*v{-!)Tz;LVXMT2w7P7Mo;jZ%exp8Wm=r~Chau65Fiu+L
zQsYpspXKL%+w+hAC<DW1^=M2G#Gp>CTBG3O9ZL-3CQ~Zps`rGh$;u=dwz0@0G3ATy
z0}!VEGwD=P9_7;tIe8!R<R2roz$|O%qW^D=8@pZ9+iD6>>58{x7Xb@Sg0^&{iA((T
z=>F>q0Mk6^v%78jLF{9OmBWuOdS9~ITw(B%*2}CNuGHiHf7CPot*7{}U80Ab2+-1e
z$XF)%8POYYxrRE9!}8`EXdnMAA{h?z9u=#aC*EXdLFAdSc|ce`rkff&6R8F_3{`4G
zvJJ0GlYA!ZQ730B^*&mk_Qf%9E%0(t&Jq57MNklrt`t#Wc;cD?_;V7+F^HLy3yPlg
zeHsX51fVZ@P7Yi(am`ZUvEsej!LtU@1Sa92k@?ODae*#!q~7e$Cs@|0A25)SAZ?qz
z=%Ftpyb!mLcC<VHmPf?Q73h>B$E{s0xGSF_1R}tlHLpjC(dujNC-h4O0~~tAJBylR
z@?uGE^RdWAB*Zg5gU<2Shb$<AiW*A>Q^;>V@c_*cd*z_7GwgJF;=vaofyLa(QD$wk
zd`Rni`rG@dX_BnodVc)g4__mK%+o}xIt!){uNiGBu^}40+%(cw+3~35SpW}1(u{+W
zfEt4kv^BfYpLQNqH%)vHe}{Wk&qSJs_Vr94!wmGCgE-yTf6`d@P)sc4?3S8H4f7JZ
z`=&jwvYhz_4^Y5Ir5j_|OP2uo;5X~VGlc{;H{Zp*9BxY27QZAdR~b(x2A}iVF14=>
zQ)tL0$UDH$UQ+LWY5(M0X3#pigSZJQhDrcoba*i@`w%-w>8N-pS*r>zQc?)Kv^HKg
zY^J8eV7itA$|tuMX>3?YWt6ls1Ja&`IT}nV;ciZy5Z-Cs3w>LWk3ib0$T%^4tc*NV
zH8vt%529JlV+~AWO`Hk24%b2_p+(SL12r-5v#Nk*jXwDfyP+K#<sG7xT-Oa(qmlye
z;xD1<G|ofbxKD-1&xD{fOxlRWnjcq^8v<Bx7o)8RX{CqFeknRV7{yY(t(s1EAs0&(
zYx2mW4*%3N?svE0p(=Or0kkN#$9HCmIm{a!5hS7q$K<oJZ2}Pm7LiiVh75Z1(Kfni
zvW;IbjE@~a{wj2OU=!8G1V4xJ`Tg?O7kvN*s0H9i&9y9FL>KHKL!vkPwYN<Mol%9d
zFqY@nW2+ypsQ=RqApCa&xEeS?QXHWR1YPe&PdrM+BWmSB5l0YlgT&C+KF89Jy({nB
z*2iJSVJWi!8-R{+9|CO(jJn*E))T;}d7iCeNIg@)=+uAuVbbcC;FL>C*{{d5bq54z
z+*dp7P&FF%BtfMM<buB4oO47P6aQoBVC2FEm<5<nrwPha@OOAL(d9MWjE^pu7L22c
zr?bovk;ZfW!VIxHVJVd`4GT>S6Y+0pJVn!`Q3B~B;M7PIzvF^)s5TO4Ymy5D*Y_W1
z;^6OHzb)(~Se>y45sS)s2I^TYtgT|c^OkO8{IyVa91r8>c99y&5269((Llct=vfJ~
z^D1v=&B+QDR7z|IFl}bUY$$Zf2y*RnoJTY-X%I^s*ek}$=Dxnec5ae}m+wS5Kydeh
z^Tslk#n@_aBQT&*R}tdkbBIP^si$p*oMy_tR(|sB$YHSlAXOm$=uq%xgh5tM;xOT}
z<ME~pEw2$)g|YV>v}Yqw%C5G=uw5o=JPEOk*5RR_S~6$aDG?eo>j==I0}s$sgMxR6
z>!eG%G;&7fmTpaNuPxDu_0ae=RA3s+8v1kg3uJ#i>cj&KlETq0FxBSWyQD@_8n|-A
z*zoVE)ceJy)xdW9EslJJaaTPL6Zob<*J!r;8SRGt-VD0Nil}MhPH7vco_s0;sCzvm
z2Va(vO>0~DE{CaiPp$J-_-T&ikF@GsLMfHjXo&adkZ|_S*cx*&=(3#VNiO6z?jj~V
za;p+;JkLiM`t1^(aabw(1%y{-2w6L1x6f-M)7-|GB@Nwdhet+L+iJ(&Rzskf2~3Ak
zJ{lfQHaGR2#S~g(wOg!51qD_8!dl9yN#{0^(qr8nU7{UPg~j#!#Wn(@UCVPgtB;4`
z{I0=;$c3+^_KW6}cE?JK*(wvX$CHp<CFQLBZG>jbnKImEe*C~9zbJq1{J*V=|NUFH
z2htZCh3rj|fRQv@4t)#^d`2+G%OxOSV*rGF9t_3|B<{LZR1Lf4e|nETG>!GD1K8I1
zkUYS;r@l0=bu}7(+UEHEJi;}}&9P|}krlftb<S~ptnFF?LzdPiuFlo3545hnc_S={
zEC?;U_QSg29(WRqdMr`7%(=&5!iI8gASbb(FlaDI3&+|3o#pQX?Wb8JI86MEz?x$*
z0>x>YsE7#N@qx!|bhP{^c!<LJ;$QnE9B>*Lj)(`3VX@_U(S^mdU($C6pRUDtR4@r%
zHA;C~M(?xuE?_{|tO(M}^0Z|vA|h0~tG5);Z9~t1{@3c2)xQ@+(oRlx&0P|GI*Qym
zW{_4MD!)8le<MbCHJa=1cFxk~`B|)1Zfl(^_TdsMT;3gR#Hm72uD5rIMzfWPcqZZH
zfS#=|6*%<O<vfzK4E}AV;nRhNiZrvNRkX75tYKU)2~iOIc7B(H*1@04q^*V<tRT7c
zwCy`JI#j2!f#8C?q3z_C&JtN~m*1q4+V+B3#@l72l^&ZNcUt>f=V(*Ym{ZAz)KJRP
zr99GR7b?WE<dC;X*+tpWQ?e>5KNKV~U8dezPz(X**v0NB+bN3{urpmComWXh21jKA
z#3Kx}^8OGXAbi8MkGOQIBn4$OKcb}EIfTdJZ^FEe$RL&T<en|}zpi6`yc9#Tozzjs
z@wgdMPOrgHtQ4MDSu9TEDD0B>aj9U5sB2+Iml#4ZCKzrIj6>G8Xbu0Bp*?H3=3B};
z_y7$NFt?_2B$+(D+Mg_Wm(Ai!Vmpw*UVTw!b9=KD-;WiG;D$bFb!?8GE2!+MCE}#z
zA!EdJpsf|!mL*`NEXKo+SN#zaE{#o8?H_ZXR`=;IM#$aH-;W(|5sH~E_&dlk&m@am
zM4FGVo{>ItJW-;F<~JH{b4C6)Quj66EkSZhV|(8da(FeH#7x~bH<F4DHul`m`)Z60
zNXvI8v0S6jB~?yT`PV!-AFh%gH@ZcaOU-Lw5kzbiC&t?Iy3ZT`0YUBiSK^0H7W@K9
z3k3A;;?2EI0Q<aGeM2RO6BfWMK_X9t$j)gqZhiK3r7G2f;;JwX!phq8);6zAtzNud
zRqQe9$Tc_pUg`w~a54T%6ILB5Zy;MpYsBsD()rWT``TOBKEyumXji&1CSKp8Xz=E}
zbQqk8C)eYIqocKS&&a&j4B3)K=;kNl^r0s1P@>(Tl?6iHw^>t00Q1r6eq!d%WQf(N
zdEkq4TKf>TThx}9_y$cAV_@=A@ryC$3{y0$WgeHTxQg%8X|s>-ht32pIDP7j-I^jn
z`_Kgc;nnyqMp*M`s$@~V4!lqLEsLrk<1mts%Bgs-&11*3PE$09)^H9PqIzl@h=j<1
zaN6(pM`S<b>j*JT+_^<t1T$R;xHLsC+6Zr5j>h%R%p}h$cYz9ta+>XpM;q_{`&54D
zY9Iw79Vq3h`?e_&XFU=H15%%rN>uqSm$Bq0uIDS~5H5?2l33Fz^?yCI06IW{KQ;YE
z((*Q%J);FxF+~#oUY;S0Z_M;I29OWyd2vNB0#3ov_}}p7r!{4GzX-0d-t{qlYr{ec
z7qgKcq~rAYA$wz<ZX2yUM=H<9Zs&rRR{1o|)W_sSCb6#%P`XQKD(!#wXUh5FjtpKx
ztMHPuY4_x0jJwPTviN=Ju7TiVmbHGgUUGu!)&W#!sZBk*Qo(5DZi<s49VqUys~YgV
zv+jsn_)M=SPu}c#VMS-4+N8>U4ax03Iy%}YiWfy&^SPMlr_*aR`Ko0vZP#(9EZNAC
z5=E~*n$+u;Xn1bg%G3c?aH%gY!6W68u^5GojPL?^>^}VS9&A3R-!^KmR`Abt6>vyf
z=?i6^n7chaH9gMY-?*CDF+E!|3$`t^7wftO9yCJ~r0h}BbK#a?71b#M<uO9e2Ax&5
z1F4b9Jv(l?OKc-FU*hooJh4Umr-vu1i*TAy(p<LjVB-J4(5A-$MhRPP<(?DS!ceDr
ze0ng4xdeIV8_-Kqyg3#e)(5Io28uDx7G|IC&PaI*#BY@($_=)&cYTAze%+<<UY!f#
zv9?r9inY5U^(;oF2@M^T@+a93MW*E;j<i?wA4`2^RvgCv9!H~s?pK1C|Jk5YZvmxr
z(fRIYTy);AS~<^va?{TMS$DBR4rR`_f%EXnS?g!RDa)#~d-u`PsF>(&n^Lk41rZXB
z_Kr@nFSnyosBl<F7NpuP-QD!inB90RMpAMJpoEIg8uj^u!{{kQYY@<{3KLO9z<=kY
zcACI;SU4^6c329D>xCvN3}r~2SbC0L21Y}#I%x9Uh+al1?Rch42&{f81z};vo{5fT
zwOqK}(W1iw3om0)a#4Wg%aQ}Qkt<o<()}2&2$`@VP=_goiJ-xR@<&pN+@Q4giPR*$
z!Mt1S#vT0PF5-Imew@^l#Tdah8*Zt>P$rli1YZ3%6@`im7D;DFaq;4KXOh;eSc9Z0
zfm%5*!cU%Qx?W-&<I0eSB~9M6!6;X7U0S43jhzc<S8KiP$&VWu$+ODkN_Y#qCYvJ0
zA)&KNW-0#AKwx^(VLvct2LGr^ysZ={=~eGb?PrrK;j1_zBVj00r};vy%<G1=Ix4$L
zL1-|pK|ProQkgmQb#+FRm%{T&^6&Yf5%}D$?xSH@lX}I|5(WHy?wN4(_3<zAIE9h^
zbZU?IZv5ll0(q-Dk6p5%K`s`L=cBdxh5X~WsZR8k#(5D$i9QRzBPH)RhSSV^wMxv<
zt$#1gGb6&<-8ld<Rn({TK$_pXYvDm3taIVg&^I_YXNtd)mBrEWsRWcvL#HE8{TU_!
z-}v-UD^EJV_GJlq{%KHC^A`Tf^Z#T4Y-C<RqxO8NlM6PcIQTz79?q!S`~4q$(`yL>
z|45HLzHiSv`JdkKQ|^CGy~dF!X>_|Z+jG`{7r;!8V*;+lyG~$A0K?F?^lLianBd31
zLzMn|@Xr5KpqK2yXi;?vC!)kCg$v1j5g)XvvwU?v1AdVU<W2JEtlF|x0<lTQ1XX#*
zU(fCbW3RCdU#6VhYwTZ~Z6&MJP~+46`Wxr>=X1z1a^H&}#6sk59H`Vp8re(p%c;bT
z->b#<Z)}HES;Ym#Niu?D?Z_(_=WXu~B-}*X&#LA8)uzSYug3<JBU5J(!8yR(R`RCx
zs*rqDHv)NqWL8=v<K*gO#iL>ZJ%ICp#>i{Z7{|kPmp1B;SK6NrYBq*<jY&N5W_)cw
z&dVV_CqQ7EdhD*ZON4|aB{m|pVu$HN=y}Ac>0=F+@HqvvD$_&)U_Ya!xII@=zZB09
z7tHU&0f(Ui?>89&z8BQU@?A1t`dxfCXp$6~6nlxCF(rJGD0n0UVNv~{QGVhEtp_*?
z+mPvzez_PEfxnuCet~5Yy}?81((k+F@q}%`-15btXy0-z?W{|tQ`J;Hukz5z_w4>I
z=r36Y#{c2o-C}k6B{-cNYpiaAJjzZ0=Q}ME|K;(%)4u1M?*3_3{XX5yAMTL>L8KC$
z8f+)&G{rb8fOO#9Spxtn$G{L!mIWByFr@LG*a-Gv18xFcR5SB6z9kptb>`Ee@h3xq
zHV1=!0uHd37Ibdbb;I=niniLcZtoq4v59j}80Lmt^}2<3J!OZwkq1}4)&-)R3i)OU
zhzvp92qVP<RY}$K8mVwu6UZuYLGh5Rg{-KhoouMSE*{aayJghQmkR}yX2X>h6n(^1
zU><IT_GnBnqcQLAge~Oi6Q#|iE!F+<4TknMhD!}ihP(tsU(qk=px5s>lPo9dZ%q@0
z&4|h<r&A55+9+Ci-Z*7Uo#xT%K0S8k(E3X^+7zkaPNKYs*53P{&h<J-nCqW=^DQCL
zJ5auO^#CSU2pB>CAx6Xa%y*b@7?6m0l;?}KCL_Y7a<3(k`>p<Dk0%Juj<i_9(F;Qt
z@JWYTll`9Y;t<pUSIp-(t$tr@GIhp>^zr|SG4i2`9=y2@?q@N}jZ%00#Svek@#E31
zq=Wx-5S$?<6$KI&lGhB)do?7Xw?4zUSk!1Bn0_IDmRb;dpI7WX?MxxmmFXTbOMfXX
zX13gL7B_|Rs|b#+i7IiY<r%PU8bnK{at$4y*N(6mNt#A<*}9xEhlWvb$)JZw{hG;^
zs*0IZE47GLbxP<~C?cS_Y5J5aXnF0q-2Em#`k}Co!|9ewJz*BGfh&@9K*`*Xpn{e0
z+YByq%hHDRfA&iIG$3x^d@QF&Wq4b&-ad;w(nX7!G3YKB*<6nFrPXVwy7+quN#2uH
zM=GwHs8b<noFNx1o&2xxUzZj-fPO3OxZ-R1Xq+7DC+VH5brYG4!k?MtvzF%RZ1!wy
zq-V&oCRGy4qT0qx92C+~0n8Vn`{9>Pa}^6AoBK>knP~~S_fiXc>LtH9>t(kjuB6E4
z>_1fwH|PF!EeXqH`i8SNU^frx%vh#FCNnR?M;-K%(4DEbt^pFQKNJ}g+`W)pW4`3;
zd|2FV5qf*XeP`)PW6;XCF&0*y65F?%z~QrmP#?N0Jn8=3lLVQBi@PLj?_8rdhKk6=
z;n42uC*6VK9hyGrYk)5bO)7k-)kRC)eakX?GSheYK#|By`z|gQ?fK?Gb@5H#Yi8H-
z2W>9pRH#4;W5?}7RIQ(k3?tKQQxTe4Z{wKz_GPP)J<7Ln2j92uaXxwH^(?U@Clby1
zR)I0MGxzO|N(8Km-sN5n6Te<8Ppn3V3$w(S-*^4LhVlO`T|Y+t2cQJ$Jv7_VtMiBC
zzvz~#&-b7qkunJjpjq=e(i}~B@9WBc_kO_j1GKYK28_TJ?lkp%mM_b5!aIQl#?V`Y
zjK}cz+e_>g$@Aki6eZvVrfRtt2tEJ?jC+aGjXe8n@Pv6u&~%v$FuSrN)gu>q5MUrp
zl`@jVEF)RUwfKmzF^1}BeaQU|sk_v>$&Pqi`^iXqP&O6KVLXKf);OwB4tbCQ6IU})
zPQT6^RbPk=bsE9`bRBYWt`}${a5jN;flKq61TFTPikM>*?!MEKJD4X@USfii>AXz~
z4Re?T>W~^y25y>L+G6OV@yrl!j)^q_cFXDfSqVz@I=fIH%9U0%fybv^9n#pv#bs+}
zl~#P#_r)*kczbE)e0%OZ)%QIU>vVQ3KE8Bsle0*IsC~hm89$RHVaRw3)Gz=09&sj+
zu{uDo3#FZ^zr5XrRo^ycE(E`hB3|aFe@-bf_96B9jp<s~Ft!s&V45ubAa0^#B9zmx
zo;B-=p@|7JejKZwYH{+8wYkw~!$>cjcXWKkzq#sK1Q$!H=aet*fZVVqA@b=Pfqsy*
z-&<-Nzar9KPl}oktmqcAWy1m<I$kS%da_h+;5jA;Gj?k6Bvm1OC5ms*6S{kI)J5OR
zd8Rvl5PmCM;`pG$(GK)+&_62Oxe=Z#=~!DRW7Q}XoUf#7D<P{>?4%NhrK#s^NV|I9
zT)Vr6VRs9fNu&Or(2VW28EibQ_tDGMP^?`p-{i2j>H?)OROa;36wj%5GMb$F7x)n{
zOZBH;@!{sO(oX&+oS75Bc{(WS+;H@4kFlftAqTu?9unTJ|09xzjrs2wB8UW{EB+i>
zATgI}9{6OOz>XwZh!*P9nPfzd>ce=GLd3t>c_U;;6hYRZ3;s|R?ldV`pC4oDtpibX
zSnpn^9uNFylIX4U1^yJ#MP3{4FgD^}9HwGkUeh`Iny^+Sq80Dt?Kw`}YuTW^<Tvk#
zJe?H>4^bq;l@B=87TlLvO^l)Hl%vFRnR1{m`XkI(D@APGm!84TppuJzobTtadk>K2
zqEa<5a+@ry4xnV~Gt)JV7G)gDRRvEH6yq%H4L$q_m(k_nlQc+R_ETaS<i$<vT+XE6
zOYQVeYlEs((_xp|I9v%V|H$&0TghS+^FtW55EI_^c-_{%b~Y|14wuIl=}={>_tnLW
zbeP1xg}>xK;f$9)x6<;)3?z!7Ia>U-$PS<QXCElVqQ;Wy^erjNq~AsyQvd2yC-LC$
z@Oa|zWho2&UipeDt#CUdh3@Fxge=u-BNhcFHX+8VV4!r$FWCx%7Yi-zCn@s03aHuu
z*LoaffRKLJ#7Tu?8<oUP9eF<iV_L>y!)>O~J3f(f#JV>{vdr<+ACjg$zPPQlgDEA*
zy;Ed6Nwv|r!XbB?l9-bkBXx1bftFFLwV5j_{iTP=Z;$r-xBfro4(f)*?)%2ma(>Z|
zimZgi=ZX;-#C)u102fG=#>~1{m?cnfq^xa|(kfD(3^0%7yZN)GdJ8bhu_P{dp+iaW
zD}60(s`A|lv~SRFzva(dG@saKF?%EJii;}5Fywo=fIeMla+Qv-n8BYmlI8dAq1Vk+
z7$?4;RsAOon*2T*9a8(ozTU+A?m!Ty#L&1|_n+QDKP=#@?GmjU^?VwC??Hk<ERQ#g
z3;f^({k?(b+eJbKKqMboLIut81J@C2XzWh|B_<@7IpQQpgOfr9<;jVHT`2s`#mxMD
z{I5X)#elyoe&z8?FWF~67Kr2Hu{(H=eN7jX58GujZe$d`4PRTPN=0FM<)aSP2aQV+
z9p10^|5#nL33LO6p8#SyDmJ(5Bm#pn);Nd9pBk7*cS<sSES^uE+%7{4zkTeL_)j;6
zpkfsDy;Ae^p&HU+1N{Wl@Ox$^9H9{J(z^-kF9SL@>5Y0ydEW1%l45CvnP58^)m?8P
z6N_r-Px11ZYISCo8Qngf87&TxgDIx)Au#BaFi^OS)-nt^dXDh;ij$WN4M*cmI!bs9
zPLM?;Wmv@eS6xN@up0EPw>Y>7B#plEZo#BR>^F<TG{}Jg+)bcdwgiR*VwI=k7KEhG
zx_ofqcdOz}T1xHc;X-b%EIt?U-Ia!5!A4!fWjTr+mP$<Kl(0%=?+&vP#>Myh-0~!E
zcQArtdF&4HRT)s@2a1sWsUoT!pLf^hu2m|Hz6a+Aj_cj-Gna6T_Or`cpj`x@MG{!E
zdS5G1e$ug+&A@#(o<1>yUHZ9iEpk%-qt~qB^lMzaI;U3#k{YM0^b-O`kl_CG@ye&v
z()19wvF(D!Pk#Jv)eJmMh9D>ZuY5k|SDgCzUovB~pV$OjKFU@<7>>q#%w*rR9K8I=
zk$MGA3cYpe{>aIb*8%<(m108+;j1ji(g$?hrIVk-%`tbS*C|k`9fj<eNK^PsNZPaF
zEC=|HZ&@}eulI)T?f=7SulCO_JBv~L+%Nh_{j^sz6W`lGHDdt6*I;ARRV^BbHFuJ2
zls-o?CfXM#m3l#Th31yvhcW%iP04E|{!a2v;0oIy$0l=;bj#~wp`{t;^N1duV!8br
zCOt4jH79`yeeUHxA#E+vy!EIuh0Mr5qs0d}_jle18{A0o9;$7S=o@BZ&({|FU8pn{
z6M0V#Lj}i-nq>8s?7~hK7{A}OD9;^qKP`_RC)d$SqxO1C$0kA#>!HyF(7Hm=Hyb+y
z)Z*RdQ`SUB+afvGoCvx+?t4k9ou6ZUpmyU!13tzn`xwZB;{ujjwoPC^z2YAdrmpAf
z5=wJ~CzFi|N|ny?gb}aqkOWK(BOdepm*p3D{l9U+ZRbN|R2-Wd?HMJr+)sELJRg)Q
zWh+!{t28DxaW!Y6(B8#103r^VxDHsGr6gNne?&rs;tc&JirWa_BH^Tc>7)Ek>VA3f
zzC!ee_MJe3L)-@DROqsWUO|LEE?cH&D-9j9{@!?-3`bM=812*0?+bup*xNv}nyi0X
zrwKlp`1|}&<T*|93f`<kK4!E?>$8xxfbJRx-MdUI+L>^9zqjnQQp|&=g(It#QHsif
zn|vd@P9;M&SysO8Inh9x$^;iKLJM|&|0&XutXJ{^Ql@i->JT|ZvDd{>LQU36k!#cr
z?tpzVInWkw)Cl1|5-HH#>j&$J{_-17VS-_rcXHA%52tKAUUnQNq-WbcvEF<=r3!7d
z=r1Q0OY?L^D^7u++skY^=;%_C>3t9H3}5LZ7E%deZz@PaSX*?e0AB@n$M$4(=k$nf
zPOOReY=jV0u2bzYJR0~P{fC{U{J&nmOd9@=75=~R9Rlegnr$Q<dLW{FahhL1#=Zsw
z32QnsZ4txmoYanFnCQD4jT9K=^%(N@PH8pCm#>=CX1-nB-bg(>ujUuIRgk7Yr^2J5
zelyyLfzFP&8Bqv5U#+OVasJ=~?2F$rTZ2V$1Lu~5*G;I-FeM7a(Q1{ESB3QS-zC<X
z{PN;<2W+$)lgT;$kNW&TNeVQepl$g3WJAyWGT$o46}oiq{PVScSPdbJVL~I*M={OO
zb9LG?HtJ7L15#hf{rD0rdH<CkU70l-CMH?njN4r!L~bw@r%dioMYGNy*SjW((=X;=
zLbS2bKFLJN)C#RkYcxfS4g)ZIz4Y53a`IuWilm<pD;(3bDph{JK%<O6+iyxU%RZbt
zjdj%Xza&M%4negT3Bo*(vURLVr1@tC>|$?Mo<JlGbV0U6L^!o?8>|^V1x}~pqXBOD
zLKP-@%CWub-Z!tw9DK;9rF}*`-Jp}PRsRkr7|I?n5(wA(p`}NldLQJiN{s;Qj`QQE
z+fa$L)OohrSz|^U`J?$aX0|Z@o(CJWPs)MAJhO5QOgr{iz}UPDuf6`&J;tz0PM6g~
zM4jG2%H6+do$UPGxJ&s-$g`mPA_(#UOYw1f=`)uB6$yc7!PBA+uR)HG8Zw4QS_9Qh
zpOgli$j<R1#%gPIlo-4$<LBn)rhA&_%71N|dcb<2c<@fhTjfV06r1+dGr_c%2p|m2
zx6y5{l-%pw37x@Vg%&*^%ZxRr6e5r74lUQcT_mBqK&%T&dMzc5ei9#N$UCQ`wUbh-
zrkG|d($2!7uR}$ol^)_N_dd+AoxK11OIFi`0<Ci8yVSePIATSUCqKz4@;r2t(KS(@
zCHnGU8hxra^LxI37yieoF|H?`W?ecOC#p9TMD$#%@;oF`V}2UAfV7J`YGM%&uy6E+
zQ<4emTEPDzHI6eLXhSnBp7RA;O^O3$I&{mdz0SDn*%rryEyw28wzgwNPiKrXU)Cf!
z)`_)Yw?VF|R`%<wtt+%vLWQcuXSupH{6M;`H!>vBNR@r6#x-y;Rb3NlC+TRP_n#k+
zD3wg-ypKCb3yDX_k&RO<&q;2Nvv!QpBH5o8o&lEb!Wd=+N8NUFgN_KN-<0Hk{83Lr
znV2Mb5F*)r)16-l1U+bY)zfX12`hNPzJ48SSyLZc>}|QjUC7!=7x@0;k=RY)8yYSN
zG-JHHi~}!prj#j`2&w;er{qWdNdVkl)kGRP1PyY|->*wrUaO=&`{Ga>jTFx7)MXli
zYTnn8`^my%SzNI|T4JNu`b@Z|Os{j4Kn!*-iSX%sohFfcEs8)9@fq+`F7cDOpJ2Uf
z|E$l%8G#4Fhr(m8lfRdL*pFZ3EYr1$=7;Mw2+l@&WYJGj?p)n0td%+CWw^*EUaPk1
z4erx=$(L%Cr&zyJ)SR#UWcRN9Z^k)@Hqu2GJ_&aIXCJeZVU)1o{2PtK3f!{wt@v0Q
z8X2++jw$KrSi>!bgh}5rsK@;5Y={h0U@qz*YH|$$65{a&$Xnu@U0BFoCXl2mT-jmx
z-Whw>`w(%Ik3&O9yvo26-2$NXjMhyHI*a+Y{Kun&o{bR%p1Sm7|ClDg^r}igI?aTg
ziVw|w*j;nijbcw>B3X%6;dDX1@ERsO9-g&!Tk$<bwZBf9?s%8?CdmTOK0l3wZd4I#
zg`MNUT~mz?qn+Zz{eAoxDIA4~<M?tGxYO9P>90TEjR~vwq;>nhbzCMOiZXcZ#x(gV
zcf<tqdu(iMfee-0gW<s0f0Bacz5FkqN-*n#RBo*Ji?!bt)WL|()C~C0(0pcT^tHhl
zTUq&Q!q`K#5><B%O4tI*NMDOU^yP&4{yNEWh(#p%T4AvmXAj=W-$#|;7Z)4SV=BSh
zIrDvbz}WeI%1b(ahGKuvhP>B6gBn&6M8c@{!}my^7F!-)Q`b~6&G4jyc?}l=eV>OK
zx^0<OVqXwc0mGIGmB^|9q?*Z~%<fExP&tv0I#2`@KAb82LdOp4IG$i0XCGsHMnPMK
zi5;i_P%mXoCGjO^5Ic-%q`nj)8Z0;xLQXjYqBp*Z5A;GaSDnyY2`m_NC{Na2Lg&_V
zrrVzLNxO>HO@88l&z4LIejkXLF^}XnYmlJ$vJ;FT7Bm9tGT4(Yg;xp{W=x#VQ!CsO
z&xt589_f59a(QHY1XF-^+dj|0>51hQDrL2^vm|i;^wf3bo)k1&;eS<L?0RtU2V@A%
zq*%HOO=K-fmvzYP3Xi?}Mv^IZI<C!B+qyp~dArN<>%60wSxP~p`|;o-K#!%gDX(&Q
zy_ws;5S&<JiKX+`QsckboByb6L0p~|ZI{j&#Jcw#?6jY|>Tn^T;%5DB4BvBx!8na>
zW7t|>%3Ix)B%ynZw^*d%B1C;)R@^&4%z@QbSlIs7tl!1+tdc0QQR~2cdo*>+ANy2U
zcB0<xV7^u#iO5bf=vi27a}5K#ZIZ<>4cCt`fmRpJW0YQe7^K?Tr#Gt?H&kFZqSaRO
z^KiV@(%jJFU3v1YYq2;!4Rq}%6&AIaJ-ol!PrZgcDuN?*d3>0#DKYRmE`of2v#rRl
z{pm7OE9R=G5(b{tN@Zl#b@Z^kpjBdjTVlW2&mJGr?bpHnMUeUR@~x;Zf@RVbu}>NM
z=D?b;B{Zq?YF^*;SDTw`<0&7mzkVPu|H)dn7)25v{!MVhe`2b31A5Gr8c*Z>o_bY!
zLM{!F0+V|rlcFM>TF)^*uSO5yfy8_J_&5Z@SzqacKm%Z9<wqHg@YueVb;uX)cWZ~C
zFjJKc^7>!rFY>X1MC12c9;zC;1E|MK|E=|j1}N_vm05PM8+j)fu0<bsWmAt)UW0y1
zG!ftesp!{z09p+*uu(6ElPRS(t&cEB9SveChVqtQ8cQj<Ud4+qvtayoO^tIdj#*NO
z4wbm8jmQAPanf=}j|+dQ*sA~T+1{X0f_(magQ^s7d97`@@VR@NA9J-VmA+4i>i4!t
z1YTTr0f7#Xg!+nr)-{DK`jZZ_Vu8$t>vCbdh~lJMLDM_i#;iWi#Fvy+tg?7h*rOic
z3|DA&lVjc+u^v<?3Lac`t3pkjy0Gx5hXx}2H*`y}ea`ppl3y)wGrXERuF&&B8p@iw
zY>|39a@srGW5UnKo+fhMyFrSqRtM<8V#;V=lSO6RIU0wDQ3}a;ZG=weD$11*sLmH_
zU~3Xz_nS?uAx@x|=PE+1GpNbkmF#VtKFpIUEH|8gJ8#mhm!{$+?$4J`>em<3UUP>D
z8{oO2`SuQl^H_K^jvLnrC%CZC2`L?Hmsf?qL7vga9zNOmVX7~NBV!_G?68jNfPKi|
zD(j8IDqOQ*^YdV#rbPrO(0ic`$c9scFbgerzvloi?X`#B>GjJY-8^b>6*?+Lof75L
zyf_UySEbarsx}S#<H(QD!N32!g5Ps{Ie1w5<oCAg8%n|+Ix1Cp-L-UdG|9ABbIG;X
zGp$dyBRrN|VXTLuvqYjRMm4_nJ^q&E`(b4p7yGvjed9p}N+R<sVk|1@MitqAPaPp_
zqu9}~7#hr_&8%tF(k_hhqqy%+ryI?58=aF_A-U0tMS>h=kipaF0{WS(N`cz!ii2aI
z^4oW@)-?F>-8W(_+9{ykC|(14|M#eYy(Zv4GP3{9xkBSW59oPa&J<4pdo|vw63IlK
zS@K2vHEKByMu1HqOJpE|=q~>(z|Vd`QIJx=#u<2f_E}=2FSpGm-f4Jq0&z;x7s+FK
z@=<^VmZwggiT)4nAJm!rUuV3YR~{5Jmq3*MDtx@g{OX{)UfkE&vKY?&`7w3Y^or<r
z9z&tYNW-Mg56YKb1lVPIWS@PQi-(UnXl1|)#h#6f6SK9@&k|Utp6$~hHI~(KYNXH_
zBU1JD;2rfK8lR5tRsNkz99E<k2#@1GDU{4Z#S|k!`MMW7Ojk=3N};l&m5G6VbguHu
zgJ-jtb%9YOMa6T27EcU{RiTLAF!tPH2#>1RIr%F^d^u7BeZ=K!WTV(SOh1|MH#@c$
z-v%Y@YcJCelc>A@;f{#>^02q#txkuRFAe0E4apmSt}ajuyIzs%oTZoLg6AwV@R4N7
zkTou)mPNgf_#;R}7P_&tHH6PjJfVH=P@t`lmnoO>mXMZYCrUn(?i=Q#_%~phSYbQ3
z9S~rQLHk|7DMcRk7xtk)z1VW?$zmLW!R1m9#5xcb_>7pvx;0c`&ZFjX$LuB59t!%t
zY1|hc25U%GB7e*-+odwM+ET*8VP({nw?y8+izNCe14o-W-z&`{o9a7=q}7xE6{><Z
znyTzsIG##td0W_OI=1tzM9WrS$y6FdFV{W37cM6&lV%Y@I^!zoDC4ZtA<k4kO$x0i
z$lyX=OJo0>!$F-|glRodft>UjPc{_U^Pl`ha^hiWW8Zy2vIdy0Vy~ikB(@Q!)T{4^
z{#?k=>HrKuWX}jzm3147A;}Yq@v$&28=MlFAo0DY0hT6HFWb29@%KBv=MDSwIs1kJ
zZjCVU=9>CC`bbcDsEEJzyV)$&g^vGLkp16t^L1#CxeD+=KiDX%tcM~xBO?4ey<6Of
zNSxx-@v&<WufYL)=(|6e{tUGy2zoT+M+N@g@ImCGRBZ2i(%PLyVF$E~CKS7gB7tvq
z0`pgj4F0&&KP;;2lwLlnG3<iU1g6dzYpT5uRtSGFqYf0==mG-<0eMDqUX>Wt=ZEd1
zj|ly)aFz0fYE)(;ABFvxR#Pv9y5Vqo#OO@l1^7(C#Vo+BYEth?1(<BE(h!b3VT*zs
z@3G5V87UiQ1Ped14qO_dq9%9*|9RO?#8O-1{r1{cslV7a4dgPdok)d&j4sLm5Di$F
z<=&hYlQN#LFQT6{j;=>2{F-q&GP;U>1bC2WbmN6ar%=VdgU)^7+}w&9zB>7xhky_=
zjZ8a=IIz!7kG@-!rh7e~ot7J2C(e;@R~hPboOi;p8R4p0q^#x-#EZn<fU8KY&UHO!
z9BwAO|BhtuDAD=0g^=5ko@V@$sR61p>M#4e_Mw_MrfyR{T3N+-e{5(qwbK*jE}c65
z{8UcK@_NfDmY~9zuqdVY+K*FE&o?;0REXxh<ZLB})+H|L(pG*N4yXD^-?S8z!mgxi
zST&dg?`oDW{<NeayhJucFc(O+2aw8SsIzY8usPgR)-7LeadsVI2S#}sJ=F&B#)Q!*
z0<-mzhA-A!z0dL6kOuQ1EN_uI>9MAdd*_73<T+&jU{&%3Hqi6sP8d*pmU4{lTkyQ8
zN<uN+pk$Y(mPihJ3UYvCB@$are6fm3#Ylt@x1mV317?&mpZZ4~5)z6;sO9Xu&3+P^
z<u~rW6aS_+4mxUtCEFv(iU+L~-W)Lo)6RPRFj{7uNNzgg?7uhPVhqT*Eyn>bcpqnn
z?(b)_1v-pqH2y(6<BM^bsq@L_he^||!PqrAJ+pldd7>272SdW=`KYU#)&D6CtOwUa
z1J~m)lBi38G<HwHP~t^)J})4W92kjyUS~+?*~Q^Pvb~vSE+_a)fL?1$J!!4dsU{Mf
z75dpT{sQ%9f?$l`W~)}>vj-T&K@G(G;@CLpLJpQA)Aai;8twybd78qHgVPBsb9w!{
zge8p<P4DfLNZsxh>rSg8()^0kX*pEZeV3xyk;bQo*o?WLh-Q63H+Rj#`>7HTL^w(}
zj*(O&#my-lpm%=XHPdn!D<WTD+3PYVN2<~!Gpj>xx{7ftYN2SD$f_y-_tJ;kdNNzh
zgdWk4qm{O!K_=h_weks#w><7=EyaX_u4o!#$X7HOVITtrgS%LKWajK2fnmzQG;@a~
zL`8RRR7|wxGaZu%wZAdTD;_SK<m}V|MC)@>Osr5Ri9h11;*DR@vMiP9u?BDluLPi7
z;-li9$B9P}BYDl=+mdMIflIPJpP(`{`<weQq+dPMTBm-WAc`EwAY!D%hxCArIAC(P
zpJWHLD}2}mv-DXe7E(EpCnX|12{t%9qoDF<ECBtXVaJ#te~XODONxyxazMi8Dymyf
z%^~Ps*b8lUi=qz<=sQ4;Q5~99gmcKr?&^BBn%1^6@I<5pUS{g*nIUOQmW<SF|1fn7
zmt5%$4*RuOtY4ArQ0QMs9h1pR(ang^in!G07fK)@OxKfYEodABc3Z4AYzM6DXDx_s
zA1PLX8B1)`ekCx%kSbW%(Nv;rZjZ<hj;_iyShQhOfu|gk^l)R<#P#Brtw}WO>86SF
zu)lV8?Kou1g8v+X{YJ&uN|BkBD|Ppl-;g)lYpSub7iJk{GO9_NP)DTN_~u6l3`=1g
zMaCL0JcFsx_lCQEt>FPN*ciV$rSzy3sKsAXX5Ww<?i>ilM-ln$I?*-H=&b=|tlUZ4
z5!L8VZ=;NvYQNw8rz8$s+4Z_NdN(M-2<}kNjdZ*jZJYpwwao{a^H7n!Ua$gKZm|&Y
z4b|8!L0mD8a=zazP-d`#L<OS988@Z-gFo4j{4n%V;>FJ#8&B|$)q3cB6FzH8$gcGj
zf-9;-5*#cj;f{D3dKY##y_?I6S5WH{N7`}Xb6?ZO9!IJWcrsdOo%{A`6oR}!^IEnT
zoL$OewjI5%oZ8!oM;jJYh};AkMNPqmrAf9XQh`gT8M$RsSJ8rkQ_Q}^uc6xlMbMLZ
zhV+}I(?R#1wC~TY-uzY!BcOmiwpk#JDX1HmD8?g(r&z>9wz)`0N$I4pm2qC&_0<c=
zTH#o9m#4gsd6|T~iChnw-sl^7Tj4GZnUlRs=CL3dXZP}QWz=tE{7p<4%57I6qnHEG
zskq;x9|8!XqR0t&Qkk_Bmx>y#?7RUX#JgD{nDpa<F}X7$4HD<eS42Audes1%6kJI=
zu?T>DFk5p9n3Vl;(+~(Vg`ZKM-#uvuH|RCD(9kS1j3W3a#IRO;KZ6M#_y#qaGe^vX
zq{ycruMzo72CXT#!MfN)(3FI}oJ_u0u%Ap4b#AOzMU>M8E^b#cs>!_l>pVz46OnE&
ziNFFHEq^ubdl!=^7;_{+MBcu`!9+^TZbXsBZcLRHWm2xb2C0<GJu~b><3-;&oC=TK
z9SVM<=kS5IrU-_GZwnQy=z)fJKEynD?GS~IWCg<&chex}!QBi%n1!MIN@xvw6oCed
zwUAf;Q&DMO;GOzT-t>Z)8KkpJj3^D8DZ`p?7Dz~HdK$G}I#Dl)2vLHom&IS2Cw2&|
z`mF9d3j87kvc8%>(`6>9a@PH|z|L)qJoA;2k-1s$f7HIP*$}zIo|>MWB@j)Ba_pAx
zC8|Q%3GdFKC0eVQ@^}4zg6IFYj)004iAt$36aof|=ZD0;_=tfx^g<Eut9)^;8aQA}
z;WZv#1X0in=V*km%e~lM1kva~UmsnY=t+=pLR*MnS$tckum{rgZm_3mb>>SI%D?Ni
z)}2H4ERSTHym=yap}cq}*hnvE6wI@l{|kx3#*^bPEdhPsB~s@5_U&@1ft7dR6I^w4
zpGYB6jiUZjybut%60`iw#yi4;Jg=RpSO{xdj#<N2q6{$meg4CwOsl-s4i0kBFm6tv
z;4Ge7d|mvhZIONJGPytn(yXeHj}y5a$)JkQsF@!0gLrvaxvF>96Pdej{M?0(kt3Ud
zl4u|2?Qp>enzB0az`L4R#$VRomv3{*gTemYDWRONNw-2nh6$Ir)RE93Qyd=bc(;ia
zpXDofL&3EgY%mpaBDkbrEnwx7X3VL(V{2><d-`5o3%KF>r-^f0KAFH41ZiI0%0lRu
zEkN;J)?vbCr^bO65|2>5hMovCXapw27aD&T=_$N>>He9l^`+L&cC!sn$-G_)E)+Ml
z(BYTUkrE^lNCh7PI|SEe9o}3r8vUc(#?4DW1UtCTBs++_I2KbTDmA!FU*gRy_0$S7
zSAyBm2qNHEvB}N}aja+UBiy0rzf}(cfgrI-qkCcPqm5tn(lCAabPFLty8P9wZ7w&!
zzS;VmAK*eh$z1(^?44C}Tv@ZGTWFER%*<?ynVCvtvBk{H%*+-uTU=shW@eTwW~QnB
zPMnB2Gu=O6BIZUBby2EXyY||-GV{&nJsCs}eMr1)aan0DRw_;zbuE_N%Pw%Ip3%(n
zqnI-8gcO?o{0&3VsnX-Qj>nl}NT{O!q=MugkZzRzrZ~FaRM#x-oyo?xFKHj(@2wjX
z#!8b%EG9lwm99P(r$SGy!R?up0xZ$frON{Ki<>?U1<L}l5U+HHQe7MzS%n+k)M-cI
z`tXY==>R9SZ`$!Gqz;@8Kg@d>5fMW_dSvtkEz;-CI}UGW3+{{|JYrw29F(dnw>Wlt
zUyM31&8-=eR*x&2HsLCqY`6I?%HlL7VhO(z`8?@!Qi|b2V%<W_2)oDq*jiTq-1z5%
zNHZ2Bp;kUc07l@~vpvkf%Xp_9?~nK?ffvJ{A-Q0|kC)&7LOwMmlHvW7KMgy2{@6tM
z>r~QfpaoQ?+q`a1{HAdWn!keu#BQ^!S36DEP3=tvuW*%zCo^>z9TyQxvEJSkMhvqj
zdV*hKXA#9!L&1fsR(BMmt38YV1l763mTw=sw7AP;08-;xjk0S19z^g!6;*0VVr*iu
zvrb8R9_KMH;c61>D;<C49`%DiSb8CUm!%uGl=S0@t!lxaTL7}THsQf{YH`rW;1HyA
zleG5>&xa;y>w1bO75sRHFW$dQXuiw&VsDK*K<$Yk3r5K*cVi1fD4@=|2&pXB3lHtl
z&DFX3n#G|8e7C63^OcLpIXe3hmI*=RZFCHLF|vxafHYP}4w!^zw?W=vU0Q{ZNMfb-
zSVbg+!_w4baK;|ZzB7gKj-q&Qj{v{?HZ~fKuSrdWz^f}c0&QOZ`HsZ!j!iRlRl0K3
z=356eL6(^Jl<V=R379@kC@($TH}5MGv4DV~V;0<j@K5Ap(6h+ICrc;pyh`r*5^5ih
zd%mo`){7HO<aKREa*u1Iod$lcTEV7UW+RyleNpP^@^No_=pHC6l_eF%;r&1iZ;f_l
z8Yrkw1fNrTy&j4X*0lxq>9!fTJVz-CT__{377~{Fn^|Xn?9SZFT|}Buhp7;!ufC|s
zm-FuKtCT+}5=DIcQq@T$X<3(s%>oDGtyDQ)POA2EJ9_D6KF-{4hDjW3+*&7nh`!Xj
zBFOmU-o5e@Ip${{&8|E4KO94TL_zo9Wko5+t-gn8roE&fAaNZ@ji%UQo)L-M8cor4
z;Pfu<VNkE-fBA;|U&&r~LDb)Q9y6CzF7dMjuaN(f+{07l0yZ`Av}BDR-_aF6#l?B*
z+FO`^Kyz%U!$;O4%GCAWSmqDVUHD&E=C%J|nP<4DiXGF_vy8!<0l`8E`xEr&dGY4D
zh;dYfMZIW+=8rbYob<jl*Cr$bYI~DetPWe4jFF;(s;nZ}1W*)3ho}A)kWg?IH~`;x
zRyq>xBzb!0*!`5qbSMY#{DTB^tWsW=uG=b~E(3DG^F+K})OcK}h>D*>)9lh{Vesgb
zaWI7)2EOiQ%P#?+k_acWgu$KzuB}JPKl+fx=S;CORw3Dj?{*FFAdKM4Mz6?8I@HGa
z!&1CSB@rdXX3;^IEzC61-yi^naETCSW@UBLykQjDZoY6fH9Uvych(U5`)tdmB$GnS
zJ}u*#rlFV;Q)D?c8&t2DAndXY;s8bT8FuU37q0euN9EZ&sXXwRx(g0ljdW%}RCIBY
zXdLrX&s_u6ue^ExxZYD}wyE(tz;k8N>UtoAv2?NQvZ`Ll;!v95eG30Yk{3Jn5Qr;7
zQ&>2?{gvo2Ey<pZHAeyrf@YDpXd>IUz1oFaj>6DUKJJ~;5C{!A?~Yr()MqW#Ug6-K
zf`f-FAyw31R>O&WK=*|Htq$~3VYLc0>cIm+*isKFSb14-L&TG86o%Gr<(Fjg*=JT~
z{<0C*u){y@c|u?feJMG}HItfs7TOhT)CRmT0}|7S5cbirdxF=013Cy;M7V<Tv6wO&
zESEx(>D1HJ;hSAfxYyfUtF7Cdw*)6pKR*B%?9dPY*2Iv&0ZF$2^=N@$ka+EHbtpC6
zi2)IX<o04?{o`y)si4wr;C-M)@M#D!Ls0+s=d5v6IBJ0TL4+9z!_u!=;{0EJ?cI98
zNxHv=MOf`F!c^em5RZ%sC#_Qi%ZrWyS;*L2bMR5v-4aj%!4vaz-12$r<UeH;m%!WL
zVq}c{rr@oGG=Hu}r-R=^hg=oj2Y4N;Pp9xxn0FNFDtXk9(9nZi4ZV}rHeb!ZIghgi
zj6+so_=zt7h<o+9ML~B>k|v#kd8@qCfW77>dEpu^a&6*M-#9Qy`)PIh$e*!@CK1jk
zqVph!DIsLuLe+8!6jfn^^bpEAq388vFJX=5F0$^D4T)7{S4fjbtZ3puCy??f9O-(O
zZWw$OvO)x{W~u#{L3=TOyIGgKm%J^$lfB{YT=$iqC;p$2-XnsABlA(R@%qvg#n-|>
z@@zi$yu$L)H<NUX?<JR+IZqqfR5Zler6q#o82d;xb0(8CUrh_wNh((;vLM2X!wHdQ
z*K&!rW1rqi!ta)jElJ<+A<jmvE94m#_OUC!d}vC27@_3lFZ~8A(hSv7s9QN8=Y0OC
ztuGTa)!bO(qT*So;yY<1JF8sN<KQOWg`G)AO~ul6KS-`IpDR|Zyyur$`r5CQae+at
zL9B&a7(^P4Jq14r^(X=MY6SfFM?`lwN;BKvfRmc=p1*veKTBg@ot|c6WqdzvbW4`(
z>)9`g83i6LQ2ugM2d?wkJUT*KZvICoio(ee{s&h`=!H4p4jDcU?<pxN{I0tK@*c*A
zqVF9m8PlTmIb+%rojwj9ON_UMklrpwdIYj8z!mnmg0-$r?2wwdnwx8Kw{w+Btc%L_
z55M=sFebt=i5oE{u|KF}69wGvu8<Z8B0|WSYZ~jjSw<#-6bcM5Oprt(({}p35vgn_
z%f{Xr`DmrgAJ$%r+&BpqN)HvgCw}q`pzkb$5+L<f3O-msmO2<t#;a!*+qrEktpB?m
z$y`}mL%$G;9AT7LlJw;zD+xN&wMi6Sac3o93ZWl5NHVtF96E_V`gvGwO9~=24`>F`
zwnnRv5a5lHOJczw!O+`5F)^FeKP&qxvdXa>%&25=cxh28q~UTaG@57g45`u=3rkZ@
zN`fRn&k$uHt-*ZKWFS8|lY9?aww3NXL7B|0=C0b>r_7jFJb|3jE&u5kYRrAj2tUaO
zDWqSv8F&O?yX2T;ayGS%dn{*Msr9)UocIE%oF)}my%XH}Unq>mW9Kd37j_E4$Cetd
zPxU^zd0&6e+W`a)Xg3HgFnKljpZC$$9?h)9Z80b1)UnHO|1Ip0MS>=kn$ps-iNiws
z{Y2h29t9L2fBh}|bh&6w`HzsD$jNe?#t2pvi~{;V$xAU?Q8C3ZikLY;Faic@`5C&+
zd(w3Ej!dcXPb<0H65A}6#+6#UE@?mJLxXYx(L;8YiNGHSNbn<&(|2kxF@KN8%XeB~
zFjVRUsFVar?|OE#-c{*}PZuV`9OWVG>s)qa9XK3P+2RPGU;j^;Rq0v1nHUgqlw&;z
z7AkmXp+Y@r6=WV@<8T~D^P`kUDyMJ{Cf%{Kv5f}P;)@~1p+D+t_o@Aj_(A|dAsiud
zxPj@((_4Szo|ir2r3S6sJ=%R-G|j?dGMyni<7VV-1)AB6&k>*4G(*@VOQu2>_U%T@
z%*ze-M@@@lG*8;jDRGS~P_wGbKD8P!S6V-7*(hCE+*Zc*fR=CEI<MgS)JN2t!k{JG
zlBDJZ4PrFZO!2>tw`<tk4Y!O{z3%+aQtp4h>K6IitBlB>9~zeT>6T3JJ&NB1q-RK5
zhTY?K_*)a^8&_Ghg_P(?ew<5;3e%HN0~0_2JL^C{)Gz4k(RHTnEgqNE?VXTLiwXzn
zk{EX}0p6z8HW?h{Cz>bf5U>@kC<%2*0vQG4hdq0JH!=8AZvFSeoo;c53I3`9OTL)%
zoPtH(SMUM$nE7UNCUcoRL*90UoCc*Q1c&fnE(ok5E{e>t3U-APVUgZW>H-F@F0)?^
zYdR~C@cZ8K(DQYaF<8_Z=a1|=&s;swM^Krghl)JJWA`$(UdtXU`E|e99O?u+&e=C`
zybr~a#>HsV2?I))^#nW~&XIm{C!*@rxSW4J^$?sJ5-F7#MFweicsT!Pw$rBul|LQS
zMynSW)nsrM*VZhjOeUW<(_0Es-iZH2jJHDyvMA{r*wXTafkE!J2VFaVD!W6`v3`S(
zZY?s@3QHPH_|lV(Yd_mnQx<01AqPT=^IPmF3K#)hk*G_}=n)!@*ED2EWQ?+Js!W$F
z$L*=4VDxrlrjMu$of>`ZsTSWZ7@`x0AqGX-YA3F*i%wYPZLqsKQb`H>oxE)4)ut+6
zt_&|Lm^c{B&u`r`yWGy)GC1(D1e*Bv<}rt05tEj8qxXPFdq|TDr)z|0vG?y&gkXy>
z`=nF;2t}|#D*JrxT8eEoLhU#xxJ&a>KJ)y4{nJYQhhRYgo3Lc%p%xb$(iOBTmj?uy
zLDY}%0k2*y8K_Cy#VV-Xy)WGID2q>PPd=GZ+qZ2Xdy)2!jCnqJ;=)yYfd`w_=84+l
z_r_Lps$enhaAeT)-L`#|^Ds>2#Ebt^>ILsK_UoX_?&X<b27bkq?dsIs-rVG~F45dt
zs|L@pbHAbQS@WIa+x---zhq(mDM@+P?1G%YDVWF6KiWkRYJtY!-K|PKt_e7Wg#Lb<
ze884-&9K(~QutvRYf$s1|8VTH+L_}>W!2k3BC)-R64ozGe&ohpn8<Yf?rGaI<@7IA
z3&g+e-SF2t7DB^7k1jN%PSBs&mZqmTWIuZQ;6V}atqkvZlrT~0bqQmcb%B%j$B%y@
zK{x$9&!$DPcll`mqB|Pzmi&E1ht{?J<&*#0Ilw<$2=u=2o)s<kAka*+@h(;0|9>CI
zHS7P$>+<xu;)(WH8bcp>iyJr12qX?-7ZeWj-eQ5ibV(kQX-TO%xkx)sd+vNb%t^2I
z^k{8sZPTuDNPff^@4O_2M9sy3gh574Kz5tF>_mwFS+f#lkESocdfVZK!XYmzPJ21x
z2qG8}z8i_=AIunj6g)o7E1PV2Iewbh`BEdz%A1k#i$juvqxgpjiV~9|qlCeT{+};>
zh#^U5ZXm$$GQ7P&_w`soPKG3q*JIrGj4;apMQGPaird^FH3TYxGTNqPjK{Fy-~ZKr
zKB&7uj2M+67V~m>&CpNq?=MIGpD&laN%j=l`_iS{Ql?vaM{qBad?#V6_o&_Fv(e6X
zKe1P;R6uUm;<!a-T{_CKrQKC_q1?)`(HP%~bPQ`9`@^Lox@<|Ch$OoEcG1JtT;_%I
zV&2q)ApPFsab-&<9iu|*|6JJrST^rp;2_4wUst=80&;(^_s`MSA$v8o6k{W!Wl)~;
zhK-ARGo+O@8oSwQ`#6m2kG(xSJ`Suk99q4)s`-kD7_<Dzx%z${&l_UenOcZU^NpKC
zjo7kt<~k5_q5%!3?i89J`Gh5U{pwjQOVWdPeFdAO`l3~1VhKj%euJZS9&oqps@^`a
za7hN0*m&9berb8hPRW~~>>4}MH>i{N>&;hTubs#9vhv(hE6_EzkMAqJTFn3bWBt9R
z9A3~;q%wLvoijJ9_x5dGTDBVJLeO$H!N6(=zL`Yvyu0J`c>sB)zd%D@*bVL3<F?ge
zlnBHoZNCvIA@LFx`;5UYdXZ~(LX9E$^gg0gW?cOtuwb+)LH3`Pl#C9N{W+P9n@mcC
zdU^P`Hpa&-9*bjLAXdI3X>6_A=CKf4C`Y4Zc8qDuwZw_klAN4lilbq8e$t)$HF6!7
zaQ2<Fwe(gI?tHonF|5nBimE?7R{dQW{~O;dncH=CU*~CE-}%z5Cj;1?oo{#FP_hY{
zcc@6<gT3cQDBFi8+uN7&OGCgKKmB0Z3DtHV`UdwtWGddRBh{Svs(j_nw8@`6`A7SY
z);-@N{)85o#-f$1lb|nW#Z1}XV&b3YPlt-uu_ahTlAg&9p?<IXL<c>NR62{(mSjIQ
zdoyY@jXcfS?&4E;6LwWk?8o?-&oVYros+AZ2P$V0nB!%1mDrz9agv|UYBmRZie&ic
zRtpgWr&@mek6Y<~-$BSA7?L3Ac|6lh(v=v$cI(`;-McQ4Iny1=Dx!7J?T)jTmm7?q
zGiVnOX1mUzo^qiwxCYMmRYN==!-T`La*|wQQ@(2dxmLRwcqCtFakhoK@46rMp=33P
zf=LOeyJq*9CEwbDetkXPJ^r>hH>1BKX3>1ax(+f@K*R}j{YJ;M<S>Y_`ZSVA-&NRO
zrG3({>3DQ@Hln=fy3=b7n?F6j!5a|#STw59(vp&vb<Qrdi-07p&*sHuUspVsMZ%^m
zFXSJctTRt=c3KawQzG_;u)!PCcC78}#b_kbz1@h-L&@n~c2x7E9o*`=vY=3|`NNP*
zCP6H(w{N22dvKnf4DHTSr_Oc#3l;F3fvd5p!uzxZPySJ>I%sK=>D;mRQ|e646Q}&e
zDR!LC(sF+^_O?IX&>M8DL&Dyuq3>beP}Ga@naZ+$V9Qvfv@X%aZ3(ZXdM`O-b-L%v
zopv!Wt>H3HShMY5t4;QCa^CMmLF;@M{pyCNZ<D>NI?8NO!o5ANysMvR?*Mj|F!W^m
zZ;v_BnFLJ;FMkCAPJ&L^t*3yg)RQ3m9efa|C$*Q4^Lh4lN82ptHfh?Wd9cMU`lyx{
z8~i5@2z=IerF&_@Ad!EdtK`}JQq}=wELA*DQV;+qeG*pz2-ohSG=_%!av`}ep=j15
zOQ5^26y(+n%0L`~0=Zr3%ibNMK<I&l)nIt*o{H+`@`K&9Z;S1YsRS!EGvRO#f?I3L
z0_Ms?HA!^K0CwZLWQB^R5R9?P_Nqz7EI!LDIl0Emxv4w1akCunN1q3gCOz-Gsvdi3
zy@6zmO-wg!bbp7IlA-Evx-NTfy9O0zW>+39dxI-2i)xnZ?E^1Nx=sLC(Tl|t$h}&2
z@dzNEBZ-3qHquz}=@P28xRmi1je(wZf2@AQ(6uW5&k}V08%1;-g3gOQ1afRkgQQ`h
zm{Wj~o#GG1G#mgK99fmWK<Y1uDy425F*rBG(`FCnWj(4T<>Z9&ruS4_oD;(+6^$!S
z_|CNdW&tF2+RR$aF0na#$XUPUJxeY6C26n6(PsA9oZ(k0y5ZmyFetky6MuJlmY}(d
z%-Xg;U*&)mM~=ot3ci&eFW+?58Pe2^o%!g&7T?;rB(!rl3;xoKptH~p=b2Li`8MCY
zko^In7lY6XS?*oa+o+=2MDdmgjP6KRu1v1zTe>UdlX^V%O@?%}73pwmD|^79JEwF+
z*OVnh^9%+%fA%`@&S4F(T#kt_HJX}j86RiS3=GdlH!QnsT3>ha!uY3fNeZq&40`^~
z6jJH!qEUfO(HzI}Ds;-;2ICHi+!SRoZS6GVv`6~dp>D1cRZHN@HUYh>&$ohM6}&K*
zqe0PC2m1FlD86!!lQ7uL7WDhkW2@_C$5E$$Ng}~g$Z@IrYhc(6|JCqYWplo8kZ{q?
zAJd#xx8gOADVIpE=AY^wo>}-$j0t_O<T-NkF>+d+^C_Lr#TZE`oZ)s>uZTQX1F0^Z
z9Xk6XT&|J%KG(e;RQX^xc`h@;1d&!Jvjb`XRpTs)jNitOglms0NAk^_Gx;8{LQbh}
z<=!sNn~(Sf5@Sl~iPp+kOpPaL-!hqT;s8a)OE#z6KI0)(2%LF6^08mPW&5mBE-Ey%
zCm5V^yNo~h+>~X39HctvHVLJPnaZX@_6Ha~XR8kl-wJW@IPCN1SxA9S7Y43PIw$;=
z9Uk~9UGpOh^uYXxaOm2^<P1|0IP+yHaG2TSgp(H9KPiX44ou33&iDs7oNLZegwcjt
z?(x5v2n_fALUWr_9M-=$7}}ZrK=e9rm-hYuaM%Mwrw;dTGhRuc2S-^h5N|KjbF}^{
zL_)-W#NP})nHH&G5IMCQ|J5&EW%!CspuMDrAb;`mwkC``^GQTxNt7R9Yj&={H{5xt
zOA6fxb|sHjr5BbIDswm%@&W-9oOBzZg@uzd)8pN`jlg@cg_(Gq(AuNzXQ89^(K{9l
z?J(PFP!t%|i+s+|G{iEi^T?4)omy`?P7zyToYsh^*N<~@(A<<x07rEq8>(jeb(q|8
zNY;oKrDfd9Pd~ZwuC=q%E4<3c3jSD|MDYVY!&V5pg7YW(e?F^a^McMU`CLOZ+c98V
zuu_W?saBbk7tB58`DC2O#G;&%``%(Pje?DeNAkg*H1i}~&=SL6QD0!r6=U1+_TgJ-
zj|z^Err);~_#gWn9%0B4Gkz5uT%dSfrT^Xn-ISa$TT9@a^_@!mx%4y%28IUS9dhYB
zLpOsM&-}CAH-%Z;cGHtU;EyI@2O;k5`D(`i!*DcK&bKv0g2u+aP#r0Us)h;2V!__+
zdjwC-R=zWbvkYX8JC4Tztlt*mfyru1KZa%sT%6`GEmzu><S#ak9z}jn%iFo2j#^H_
zPp#akDzvSVi|kM3Rdf89OR;vb`1CnFKDlWmVZR7AgC&^>AT9j)=7xfU!LIO-mnk#?
z@eI7*5Rj>?TY>gqqT`6VVS|;a(Ru+Dc)yu83a3Xq<{RgPr7Vhs{(e=d@Ft^T+(*(~
zT+?R_h#ll&=NK}n(%3Ts*z9n6yJ({ghQ68c*$p%12voNW*TOa^i+*RQts6;X=mO4L
zpW1NgTT8_JIPD3Phl~ckqIsge&HRKy3E(br)2>QR#{A)P@ndY_u%1OpZd*2V*ssv<
zH#X|^jIZ&qymMw#%r~qzu5#GS{cHiM21fe`ks9<vc$o$*9o@pIb(RbIN3Pj+A9bOx
zbLIT2ZKBMG+bUh$yRxor4V^<6Mq2hlRmqqK8rf&X0s=<gL{SC^?q0O=nryonwOb9&
zOFVx!)DIEda9*LYiQo9VGkOvpvNg#aa_N(14q_bskW#8rjUqaVrFKW1r6UolKCz7#
z<aW-vY*0QwI&<i9pbRI^gQQ+@MV;MG$JjT9&B3ogy+9j-^NvYVhDM}}Ga4s5&*E|&
zk<aNcdJkN!W-Z!AwHGFk^KZ%mGGXA}GVMZ+YtW%d!~OnEuJ<Y=yz9pI?kML{D?gdp
zRoLiRd3q++Mt$bZVBSID8%|+8H&uhq^SkY4R(Qg=s(PSyIo<-RF1Tw#@#78&VdO%H
zhXg^BXX@1>|LY>lKDxM}EB!wwFIW4dSE6VPJuJ}I{%D*`OjAU2QkW^?#9XC0UdM;2
zS%Hhy>kzySb{}oh#hLc_D-pL-U^Bh(o9yR~koi!gp%A?k%P%_K{g$qwfBg^_8{5#o
zud%3|B%8UZnQnLF@elG-8K$AHxhZod<vY}TuPXwiCRudh9E`^8@wjSM45z5Y)2xgN
zvDyfyO?a6!OlY<(I;O^jb-EwfSZ+4RNXPQ|AO#^5f|C!OKadFw5mMd`E~%^)x;tVB
zTlp_YaS7X?sSrnJu&?!0FRpYSj8)CUoTEp)fp~Vn7N+*?M&aJOtjpRCI2;$NHJ3JG
zsLAdFe)W#++|JVkHw*6^XNs`1ilm4lk3)JCZaS;kOfL3m*Okd^i-z;d4pTX;rf{Qp
zvNS&RJ@TpXb}T(04g8T-8VvG0)lj4Lt)kgB>vy|>5u!7DqKefn9sNwM=jTSz!inDv
zmh=j=WWr4ZTeZqVi1%$9s^S{yktZ78I;KXCzw(>zmmYJj?mL2E{=~9t)6t7P<uZx(
z_L~CHqVR8w9eycfX`9TQQnn`YFj0|cX4WL$;x-7~q<L--e7pnRfxzs0h~6R@c@GuR
zIJ$7<H#}G1(Y{f{Pnt%YIa;fkOd3+1I&^3S3)G1~-_9*qa-4AMwTHy6}l_24m=
zT6Jl%FqEp1+U%SR<N9K?=8ZT`F;cq0Ff)7SutjT&5%c%3D;3|4R1(dXv*qw_i_7cA
zrk;#z&yn80ZiWwMwpd<5`Dx-HxK`5Gtw$j%GE_`d(IS2WITZ?FNKR4nBg+XF<(`Ba
zf4+vC%gH<Fps@4?gktB(!F)*)S<y&iZ>j62R8L*qFGsCN`$3<qmY<TRN5zq8hT{})
zP$(JMH^d_*tWyKl5bj};^gunNcwU$BQzB(`P}eFeuYpymADJ9CC@fKpW?JJ5wNYy;
zlV<2Dug5MhQ6%;I7YNdcQW?)>(BOO?TdJulTBMbQT$*YadVtd?ZpO|l!fF3$1>?R}
z7fdlfft*GC@jlLFftC^CX)xR9>fsYjCr4Vwzho)fJt%GlAVnndmj=mB{GU(`b{J#N
zP%EgT=bfCl6zNP0{$|Ijl39@vOGNWj>u7M!?h{xpt4G5#OeqjEz14Pe@b_FG6x+CV
zib$mdi6OBQ_)5a2&r8Sk^9I>pgYjl9p6!I<a*NiW90xjKYbFHb$p-_DjuV7XkuaC7
zs2vAK&DFd)<2IkZUqfX(YL7z0`7TpIYyIFjwbBlfwpgKj4{GFYv<^+gk0Xm?O#>Ji
zg(ZDp&g8RSH~JLlW!0DHDw`1qr(i@eInP4o8)!d6BGI^Pk2yE)b#$$A1tXKW!aoXr
z$fk1)71RG>syo7|$mWxg&PZHzxr#e~enPWcsb9)kjPgQ{06Q-stfE`EBtV(D_xr+L
z0Q473r^z>bnS=Yn-XJeJ=w_K}an#@G?P9j!!6&e5txQx$0m`ai3(90tI75PXPWkO;
zV`vrNcgA6k4_@2xuzVI`=pGM+34{YGU)r`--%@oG07d<(646?XrIiydZtu{i$7kpd
z!D?+k{d=oAE*w*<?Tq7i&ZDEN&ba7<k@VNQx$Lt`+^1|WW`}&nXneyN1jKuE(J}hG
zd@Q0>hFb)*M2BylogeOYJ-A+OH_|Rn{yO||&&7<D5qcm<gbW)(TllQfWWBLw<rxcX
z5h0M^MszrN43830VyW+(Rb$Gr5%ARSOqO|PCa2<B`nI2Gs<U2}oB4E0)pp)j{jfIj
z5Hx<xOWF#xAf7gl)a7PVRjJLnU|}hWEWH3ea+@R>8t>x&gB>(lpMQCiOMsGB^J$iG
z&bMyi=_H7pckz=Ua14oy2?1-ZM(;|dN9iY9@<5}>@pmm&*E%%aJvhhU32Z7Rm6e?y
zN)@do5br`~=<M@-juM4Thcu^(Sx`Mf@H=zNDD2+6NjO9Hx0RGXe%*fUZqk<;zFo8V
zf3;LKLKh+W+j0BUt#xj$7<nRQ)ulAn%NylV!$O9JMke-98+Lm<x6^d04Pr&&WA+Sa
z+V5gGN-U1B@su?36cXp_)bP6jMXH%z)(r=9&lrgGwi_oil&l>~RL%J?HS?0B(@ylD
z3yERAW-JoY9+FtvoMWyy3^s}kmU-6*9wbuxC%h>M>}wnN{#y`o0|g;eh8`T$Fie;@
zVXkKBeU&lQC)_Z^Dxp%vr`ae+B!m7eF9%_}ERQ%6WCrh-0rnsm3gDD?5UI!rKt;u9
z58WeV2_L;mBzIcDOdyUalgw+mNG283qz~jbbhm@-=0jybirMP`c6*YdGCT$>d(UGG
zZhN84kj@6g{rp;?hFgx3nII38Z1FEIat{r|<hqq9uds><!|e4wK;QS|WC_DWCyd79
z*hn$E?!41As2-gr3=N>9W7-liGL~*grAglIH@^oofX?IL2Pq|dSN)yu@&LiQyKJ5*
z%0He&_q2G$8V6B0gck^`fnL;9=q6`pWy-jz!jeM#k;{H&*t5Ek!cbpqYWB3kimq}A
z^XB1-R%rk}Nh?!T&b)PE<%T?ahMDIQxRoUZeU9_fi)`nY`x!fgNnAg_+4MrvkcFUs
zSXG~Y?$RN3iFe|Sx`wFEagu0ef3$VgyEAp|w^&wv)l)64W2m~}YX3OJ6ODY)EPi8;
zGlHF&wYH8k)M`veV|bD6X8*cAbdgo@&V%vLrUn0t??gC99ymM`Lkd0sA^)5VkCsoN
zrQi1psp?+Fd%H!bka9SAG&s({P)XeEeRrO}qDHw5i6RJU(q^6kIKhXECr0}9@Vm-N
zJ)P+?UNwCkYZ=?i0kT9I7>!^|0$y{WeqKo&%B64Z6#s_ZaG$qxPTTLP$o^~j>|{6;
zxUTnSRILutQ`M>Z!l-WDGeTtlpgtmy5Kp4h>Zm@;Ip!r#Svpd#Ew+Ufk_~aEQgKZ2
zD1;Bm)#R52pe}cmxLqUWO!K>E0z)R-$f0~SunZ6$6<i1yFv1rBq>{H7DVFWCA_gxB
zAKZSv*%uI4oYZ8?yxTDCgS!+UrwP=rrc6|*W~hQpR_)(^&-3v05;nj(e%H<0g`3Z|
zslCN|vC=clIOleo4UcTBB5XT>i{F=^+Lt4F=5?vk{I1%sT|}qJ8G3WjdqC$RWNiqs
zN59yPwQqcCTW?8f4V@Xs;|APV1c5C=W1L^sMeJ%3Es9#*jVJJ_(u<emoq{oB@t0n{
z=kNkj?ft-FsVzOe6H4Y+dTLq?eW1Aj4}hVlhPDLmdL!BY7O+M{Kn3I%VT6%hFqDAZ
zAUmBne1E&0v2QW%!cju$f=S!hJGq2LW9|b>0wu^yUONi0SPGt{M&L8Os0D}-7%{nO
zdETqV{INi=+5m^cfSWH)u5=m>nqhVEOm_;K>E7EYfxe(<g+1tL;dhS<k~IaKZPLLT
z+PUP5c|!GzC-)lVnVJ7cqt>)@N?<zp6&8G`1NXI9^wwQ$@AonW!<5G#Mk-Cwq`2;E
zMOA7|#Z;q+AWB=f^F46{@2_JmnLJg~XoCu+6}NA+<n}{*vOntH5&&J<1$>R_up3j6
z<Jibb!n5u^Et9&T4nYuX6WHhVsgq8g8$R)Hfvd4Ov!0j+A@~$J6B90Z^<%0$1r6%Z
zJT~C6h{`(CpRhuXa-P=Fl2pB=a=sqP!zIBkVz2DyXD{0#Jfw9+dav}}D2-8`4N9JC
z%j$fdjSL>=N2CHK&8&Z-9Ex*AN$%7m?VReQ`cZ@5gFWZqk3eABk3$aj%OWh^M>x?j
zrFg0;vuBS2D=BmaYR{C;5BP}<S*jjJ*m{>R%!!rwuXtHztv*<y@ZQBl%MfbE60Ok!
zcaAS#Md6<rPYM-(kR%wFW!}mZl@nhMX#}5;nhu#CTy&2_Yv$J>faoW9Om&5#@*ZJd
zF^S>AAON&pzz@H3u4UQ;hyX!E2LiEjj*8CSMd_KFK^gdUur0C@{xAA%!r%{(i5U_x
z1bUZdbqt>AIyn9YU;}XSZef@Xv2X8qV^+ylLuCrt*bak&>BtsQ31kK^3oy>G4B~uK
zd<8CxW5=JzA%TruwHt^LK;kH@^Vimvm835|1FX=l!xS~-M2J5+DP(m$JW}X5Ve%H1
z1PZTA-d&_yB)Hi?<36zQ;QC_kG^7K>v>)e8YrOO7vBG&MxL?8H@?l=}botgc-SqwC
zQ2be>EYd(bODs&MkxxSNT+*>*YU0O%p9Zn8iHA;0wE8mV&8btxBv?EHk)#k%0gggf
zMDrSvR`D16;rp}dw=;Fp@zEiymp8o&yt{R}pzDA?em?{M5ErfiX$}Z!5e<TBtf|l~
zE~auWYbV-bEPY<i5{A=AP^7+{QmtO<=@s8|3|P=gXVMG9is+_Je!~*mzVqBb4aUkK
z#!zlR=U#vd7S|<tU`S$ZL(HD_VqdL-pfVOgHIThz*3Vx3Lg)r<2=Om@ma_16IJsso
zB-9Wz;&vPl6e<)rc@Q|fGqI}Oq%ud)XMmf3l)sg~5f@!3cD2Zk?l2x=Z(?4GKX0D(
zFpeaqB*8_VHw>mE&dJP|qh{!t*&@;D`GGu%HckbDzd8F%pr{tQ@xD3W0jiF@=ewpp
zg1i$u(PnldFm)8DG&Mzm`w?+8RaHQ)DPbVB6Lv(3C$T1svf2+d6XjI;#`G4)-v+DS
z%QS&Y<~e~t;%a!3A(%C`!fAzlOQConPSVT*PKIywb$m4MW?U>vWe5*6+WD*Z?X<Z3
z=h~<%2rTx1@?Ud9v2;z+Ia_;%i=NzZc%2*;o_8I9Nn;fn!-i^kt<Ldc2Cc)9L#1UP
zjnOhquxP;q$m?Q5_d@DI;k?=ci&v(+Jjv1|-l1VQp4T@!6oY4Jl~#3{k%yM_BtMKq
zlH^nMwG<c^w_Mbl?HFk0&H_sS@;<;vnL!l}r4Wr24h<(Cqhr8<C4lB~c_IwM9E|sA
zCV8Rq3i&x#Ku;XO$2nmChDM7}XPnS0x6|KA8&IGwZx{A8367e$lt*6bJTeI~i#w4v
z!dlb1g-%M68ENHaY3gzHaAQb^#X;#Y&%{p=)JGn0^1Lky&Q>p~uNca@`BG>{EW~$D
z&A;UViG-Y|aMPWeKw{3LKPgOLMQXG#+oG(E;%@+v4-&;ttVFMds1c*X=&|#&iD!Nu
z!1;=SyAB5tyvlQ?)k{tKCQ-&=1|05SHH7aGwJBxID=0A*m!X}j(s02nK|h0IMo*NS
z|GJ3*x@J9+-y`e$`KQCY4UO5r)MULXyzC|S<i)R(N5v!SGetB%Ki4}}Di#8&9QN_e
z&g@9f1eG&FSM!;!$}4E@GkIl#ND2{(#CDoN3CV<DCNf;iBP#hAc|0)_Ki^z<O7ELh
z4w(uYGC>d_SrXPBT1h!q^_@G_5t3yvi-@0h>vf#<y8C#HMP^hWSCzU=Zcb&^p1Z}3
zLtW+%{Yet7AA#Ngah>?O;C7)q9%XVw%W`QW_`Gq&EqryDw#Tkz%hM(zjF>=qN+E6`
zYW|bQr9M60+JS#=hc>H2W}D?x_(A3QlLhx>G0=ROuhdI&|4IU3FWk&BjCSGADKrBz
z=}drj2)a$WhFelDC!ZJkDzYIsUU=k^$4HK3onIvc)(86W!Dql(MAnv3R*yP={B46C
zDLU4n|B&0DQ4Q*7Dx!BC`RF4|l8MT&M5LruZJPZ^gCi(DMP?`^6V89tUZ5*Oow?D@
z2l=;-_?gA04`6^$YYUXZyIK*L=NZT(BBZ?16&2PolDNZ|FyLY7Jw9Ax4O>FxPr4D`
zG$!Zbs(T)pS46H)J@kqzjF(d-EEVtzGGT*A+H&m-zjVHhTH9{;D)tYG-6s~TV*?<}
za#YAhI)C;0D;oBZT1s13Bn@|}?P*!oFY@jceswv2-dE-U`gGj%P*3$9f7`_J=6=s@
zod4A{hFveZh@KSvnE5Tz=fe>xc#u;*Va@n`2~+|Ev*d9QEu1l<^7T?Y-UE1M7aTNE
zYXZ15GtEexmN9v5?kBi3Q`+OHHHQya6AGa6#qMFU9=szkYsA9)dUUb)_nvrnzJ*Pp
zNPZHX*7raDoWOhLw38KqMxKN><z>%D+!laa5{F-El`0}+hocy*dv1`QcBTtj_RKlI
zo>nZybCX*u5_@JB2?EdS^Lf4@4fe)0oM`=MNNDkG|Hxe;`;J}_OEK)kvy->P%gsWf
zg$iG@D~?m?PDPJxfq3#v;#rF~)LCt;DpBdL8FXbf#h9aCpA=i1=Mr!P$F<@yxvAGQ
zwZK`*5n==%ME0eTcYJrEOleiOHg_^bCcIwy+1P|brF9n)aGY3;<S~gS2^)qL25zTM
zU{54<S*`Or*k}Hnj^aQaLu}NYmPv1!!X_y+PmFq#4pUySd3cn8#Dg<K2_CAbSL#8m
zPrS_gwz_Mc?U(KTDw?;-9#*s5_!vYN*xdw($b}BR_Yk^$dbz#DcRf7sbACO~igoem
z%#OK$Uky#DG1``Kt36U1SCp)wdBL-YI`Uqj1veeLFj*rmnhkGsT!qMXXqW-4L|cT$
z+3tUsTV0^`B*e)^p$S9Ncl)t0>M2@A-bw0W4GYgJeZKd&tT+lv|B{iPX1**<YW~!p
zI_T5Js;&Iz{p9upBbqPvtF=!dwnxZ;Ze05aHwJw;Hf|287wQWpBHb`DIzt?F90U3V
z-5gT~iDp0|jw;-<wqkdjD~0_eIV}dhL8*pWsu#<8pEFF_T#3S~H|IzV#j;mL<@Nm|
z*<U{KX3svEgeG4XE!~i=e0H|Id-wL5Q#8oho46Fyq3&^aUIX4^vo?CMGsMGE?xywW
zQap$0hW+Jx6|L6}Nvc66)A_;E`eY!(&JK{%iOaf3=(!?cJWmj!vK9vl!>InQ=vdUT
zd{rjjf<;gVseXHfI^bbM@<Y8RG2=Ju)gWCOFQq3H9JWKg9eRpF4ILor?DI`VJr@}9
z1S<*;Tbq?DWf|yF2TT>lMPWpskQVdqsL+^fq~)CnM{^OCPD<HmTd=v59A?f@UX{Yg
zo_R3i3RBupk+-cmu26g412u@ftW1*>pz&eJT%=yPNrthB5m)C0y|p>e8G;b$>`_NW
zLaqJHS~O<rMDzN}ZD9#qm?AJGUunziM#qxzBrFJhh`X2FrmndLdQ7E<=`aUhsfqlD
zSO6uvRFL04x4N@W(W!J)zz!fu%6wjLiymfAG)T<{-C|<9sOjg`wPL7Oyi!BGcL}L9
z(Q!{SxYm8k;*_anV;Y=Rurd(W^hC#VEr`dle?=|EQ&#{4n&VhWClJ!x%Ip+an@R(j
zxi?NS%3s>|o<g^Yu_i{voCyIHGI?bm^gJK5YfNLcu?{9tnV(1*W;@<~;hY~mrwvDk
z9$`%L_<I<uR@FD_XDZq8&RTdn=79iX$x+hWM|(K<2f+(0Y5wf}H|1W17TLlVVveyB
ziu>&P5Q!wdAcWVwP@eCgT2PsS@2%u<pQpd4QbXL@A=cDHy=jX4Ek4Ws0Ex<4vCa@L
zQ1jJ1p2}lORVJA=RhtJ6>~Zv4J+Bqkj}9t~Ip5B%jghbD)aYq{NIHk#Z|m6{&&^ka
zm)XOZg}qR?&m1Q{ua!;j3NUG63IQbOIL*@nw3rUnSDCexP<0d}o4AFA=@fwQcE+j5
zC;zy;?9aC~cF)f-@6n@zI7<Ngj4rt*;A$ZBpEk`^5D78}s1EB&m7}|dkXtC=@bwEF
zB7Lt%9=E?>kAAF(U*UbJCb1H+R_{B=^@h`;1<N70mCu16m7o4j2#@o-@EWzgo$MSF
z2*{4JIyf`dVwdBJ{@FtHUxzlx6$o{c4)Od0fBm-$Xh6)kR(J}HwD?vHetyyX0yzPx
z=5Uv=O?(QUIhQ6nx7FnB6h&Si@KF#Sf2~oZKLnkjX>2Az9GIj+2iAzr;2ep?18c7l
z1fk!+oe?9&lK%z;P>fpC4aduR+{zmL`dF-y0uzv~@h!`)6uHM<t}A!z^BF4}9V8Xs
zl&LbqA<+be*2M~c*hQr#6t(-5|EDIuxI8%AGRn<Mt}0pJPzNOB-yJ!V7`cu_HSRCh
zI_smfDxMII=MNGOR7|gZF`wcS$ruVg$h*h;ecuf>MYcztHBXMLg}<Jo_2Va<?>k-w
zoiJ?|a5)?^Nh}(f0S;EY^Ou!(m28^8R!aLvy6WVwymqz$c{OKzYX^t3;Sf33BC{Pw
zqqN?_i42Y@)nqT-Je!?wM^D##i_6PqUySH-==Ax0qNb9jIpW~Ob8>QtyO$=ok|2UG
zM7|2eq<o&bTAo?QQ?;z5jn2(5?HW~k*B;bRnNzLs2G<x>nMt8Aq_s9lSwJP>A^$ZW
zXqLx>&c?Y(@)xHf;3pRQMp@8lk4a79pPqAtvygddbfUxxY3rEb0QKy&^Kt~qP)9!R
zs^O<Osy1F(OxgtUEvebZte1|ZTV$b9R>ehR@KaxR%cA!~)5uCeatKHS{pxO`qfSf(
zd;xI_1Ydz?FLG&2yU<btIm@sS{NraoNt`oruA2a1Pn3wLpnIL(#)iGj_gf;4Ro{*-
zkyoFP;ZOuBO4Rkv8Q#N{^rEk=rsY%xRH7?8ds(Dfn!LVp^(yS8#*}ow*LPPEMTh4-
zI|=4RQ!&&K9L|w^*DfMZd!vM`;^-)q$t7yhX>Hq^yR(<p)l*{UGGppMv_sKde08@v
zZ^A)(3CqwHiDZcQUf%rXYnB&YO3PSx0u_SZAvr~PoSTKWz6?6;=ss=j*{9E|P4?Fe
zJFD}Pc)nBuI%aswecdDa#n^ulT`_{1{w>BAPYzxsy=<8SLNlnD>d4H<R87jjT-hDL
zGVm2tbcXq0JNu!Zqe3t!ebuqHs=(Y7*oYKt?lbv%$H$|zAxY4HgDz45C1M4r*@{5k
zRqz#ba+3#AV@YRc-jVx-z7d0Y<q7M><9F(iR4D=k3%{S{>Jm#IcD*G~)ezB%$;>UD
z>mnU}&~P1$v?Z{xe^ZJbMt}**HG_Tpl7A2|j~e%7pZ_?)kSM_@DGaj^e5+w7L@z=4
z3maTvUlP=<RRRnI@761STd)~Qo}c60O9T1TLMaQ5ybpR9WVP9^!@RlCV%f$zsl+pF
zt+qt|sMr4NjzxGDMpV2n6`q;|t2}3)-0#uCTE^vFUNphPLc=Qn{)S1#Ms5DBz4InQ
z-alMM!&Gw>eU3AKZv7MJigr)7J|{8TZ|cHQ06J~uHToGrbCgK>Ab~=kgG!0bjIh?j
zqLvz;XEB`~Bh9DGTD&+<IkHXYVrA1u8w`Os%qW(g*Nhsi5q^$<mkCsUQCZo@!=*Yu
zLRtFi+S>a!nAaaX0w%Dh;0}IMV$%5-Qj2Wi!>3fqD_TrUPQef)>5?`do2G5&V+_%R
z53ogWk01m_q){cvY*y0&{p7ST^O2lki+VvcR+v9t3raPhoJapJbHU|OI)|%0psCJA
zncZ4+fQ4wssLfzE1)+UoA}4{^AY7S73f=lj-LrDx!=KwX!cvV7N(#t+_?<q*`ewkP
zcpuVT)QS^ZNX__!T6TOp{cDMyfOXgV3ThLmeT62hiz0stMC~?|$h)Zp8Cg>z7=MAo
zQs8AefDXcfGJrFH$n1_{wqcUihaBnU<S9DiEL!LKyty#(JlfNd=N0@3rQKaF`D}@r
zJf*Use#GT0-<YF*k&gK)$WCB>F^_-m$(ZCmS8*1s`i(CK&P}*GF7*frkl<8WJ$J<T
zjL&L{0T`X9rC+g}qO`zqs6e)`s_zZ42E?RPvn?u<`qvGbaBc>b!=J`<ggy$o;av?|
zF2WYBh&IS&=We|lq;*uX4Rd!)uQ7>@J%9j@`C<{6E1dae_Wip8|580DBnHKxwQ(Si
zjNPw4Fci<=A4zDG0|NsW%>lf5LHG=&Ng~8SB5;rKmQcHpLs<7BF_=D52wWoG!)S0M
z5yZBoL=Hf~T|uCei;KrEEPTb~dqbXq-Y1`k*ln!v6bT;3-%FCfx9(q{@*Se6Oeg*s
zo1@9ve36V*hzh|TWzw^ALN|xa`dK-M9D>YH#HPxw*CK-1{t|H@>CU2-SB%SH>wLuT
zE+i=^0MY{_uFmoE>}7?g9Fr2PT76>aEr)l!{uUwcsSzf;$u20otnk@o|5T*X+V7q(
zimcRqSYpySX;FRfetDMs!|af3bVyE8TNMe!CL_%#d<2ml)fgzo3%MMBn_OfDg~d-q
zllmaajLb6Ll-w@c+s3{z@?$rV*e7gRAR_*teJ%krTvF23I5v_2b(`6BjKn9Kh3|9B
z&jBNu>#l-aS`Tfl#(!btJ6N(XwY;qVxJB}{h4+0Zzv)Qgri`Nx1yOksX&=X7W=I>x
zN1L`DTE>X<46yE7!1bL)9bK*}s(*jx!+3ZzKAHNh%E@+JyF&?e9}TZr>^&dlJWiqV
z`*stFQCOttw7IOPTYwmprPnwj*AH(i1)rh%%ttpmr?*$>HA=cfX+2yp`SkiSF;Nt^
zlys0wDj9S~ZS;-bNYyr9y_$~9xT=gUHsCaZ$+}de=1DYRxb9ONfaqaHf4IG8vd$Q+
zH!gk?5q^k!)0jpNZk3b&p<0{L44uc;Atlh>$oIJ{p5BVqW}s0`h86}qT?2#!`QygD
zSNCDr#ZFQZPb#DuedvXE=DvQ>Vezj6PdLz@1fwHEW_WJVP}7dH{2a1XZ^I3|(#<ku
z#O6|QA>uxW$ZlV+Wmn1r-Z<8KBQny_W9zpJ>o~|hI3Qhut59sCg<;;Lw7-pK2FU`#
zfQSc%TqJ<gt<5zH-xsGQW4IxwU9fgyLCh(6hwQkuV4pUzUuWEQqaU=+!<~cdO(4^I
zh0#JdPgM(I71>e;#b3zfr(FWeJNn(AjfrMXBUUrWBXMfrkIu|g!QYR0l6gzI$G_#&
z&whs9<5oB!ZyBS+9Pdhrp0-cnZEGOMN`x#QC(DLG!Curl??b_(>#N_H$VC0)L%E#5
zVbF;de;tbEH*w(X5Z+5z?7!?d0V01;oM9dEsvRW3K+NRL>~kM*RG~qp>@i1B63+A-
zMh-*ERNEQjjt#QQ<KKs!WP&q-MP%x2hBOG_C)NxrQt-Yg$o0#!c4!T%(f1+waNL4n
zW*TRLBPC2zulY){b<ws%^dx?UXc;E~M^4aKU;INP*0QJZDtm{oH@@?Mh@-71{gHP_
zfl?w;5Yp62LMS0Ql}<QUIDEUrt(gu6y=rgz(u&*0bl>Pw%?AZ0i5I1Jfz+5X{FmSa
zD<#=nGPTZ7j8;jnT2|r>7-?UhZ<+UKrRW$w*ZpY<J>XBxv5W2rN?1xFsphH}FwQHx
z|3Me<NX69oUfop1Ee{NH@}o{~H3~-~dlhI)GwX2~^i<u!RSJ*FVO_CEjx|p2H9>K)
zw6e(#-E3W6IM|0f!R<}I*J-AHwY9gZ*fTO3>EgVhA-kmI%VP>lIeP;~0*U5ytc?Lc
z!FF_YSzyHrFLn|j*Bzv=l3z2|0mR1j8AlCf7rb|YD#}!Tb+@-~e1c|dSHV%Ozo(GU
zzZoY|wUt99D>lL;(80yK$U80;wf-FWQM<Q@Iti73z))t8?Rmkq7Yf#d%FvdaVFDs7
z#zM{XN-72#N#-RKO)GG2z&`hLF3%EcC>q778<nFou)$FULuB(($x%&G=e_x?)qWP<
z7~icKm$4EvL|_<Lxp)^j0A-sv#DK^SX8|hosY;cl^7%%Rl35i1HYYk*UW;Fci{H2>
zH9gg1L%VzZ;-9-PuI5jz%8dWSdJ*px!U&lMa#QGu%9IHkEkyoI)6m<Yjp3yT*rG59
zMAj(fP+a!%q`aJ-;md~RNumK|kBPme1{!(qeRy=}YL=a7)Fo37a9wj%FklKzdeVtj
zWqiajFqJMnuxXKtv#Zua^3Gbykq!f}0;`>@cB)fftY{)qN^Ge$TSd1s6SxobHaCoX
zReC+p39HO@Ito6GVlZH<7vR{$H*tkylau#f-_Fz+F&zG!VUWzy^ulYwEZvVMm_MAX
zt^ktI;ZJ#y(3)<$VW%R&dv=WS{EVXK71Fl%WmVm4(G^np$X847ke+e%V`_0!5e+4o
zn%rfcjW^Xx)oajZ5i+nM?_?xPZy$~n9^Ec-Zc|^9EKL>3;cq|fM7oRCaB?L6amJu~
zFA_B8FnbunH18!8at8B7UTxC8kmslm%lxJ|P6TQ<hz!_6Cea*v#ZanZhD60Riitmg
z&pE<1)N}82B(O8ki;~1-PB~Qa=r8!vUT8=F8h`!kmeCgEBdP;_pSy5HSob*0!l{Wk
zS?jQF8u`;-QpjEir<p#~aS94=+J}w0&h^eLK6=$va#}a{%HT{Sj+)##H15yR2Yada
zMcKUjs_PfvNnt3WW4_o9kk(YTDLV#{r<7Tpnui-NBLSbG?n^c6r2x}$%;bF8zA%7F
zBCpQ^L%hOeh7<)UG80=0i6A83`1f$(sRJC+!=qHPvSA|DcvD(MPGifIfTj8fiTy(+
z3}!qIDR0%CE;~BqjsKX}r)huq#YXHhayBG{dsZB4XSHP)bdFMQAJTiPZ*LNfJacr$
zj|+G0boBH}`#BSu0Cugh3%*f`#B(lFxFXt7x<&1<;rz-@^g(Y@jj9Od;Nel!mR?>X
znXJSz#pFz)=Jiv@nG$!1+qGmrx*WN<7u)vh5Wb4_??V0~?Ke2icYl-u%{01vI<m-p
z<;(eo_FcYy`W^j2BmGMZJOU0ii77kMqG&SfR(mQD7eWl$2u$9^Ar0Z=HnAPA?I9;2
zy+|pqxK$g);rxY7fnKkWmw-IaSIwP!)5|<Bp%ClO7ZeEHP@@B6lHWVX#ZkOO9o)ED
ztx0r+eIjyPu)Ur%^r1Z4inIy&c{cfyx!0pK0NQ)m#<j2wF<hiZLHgW?=KMGxJsW-l
zB046xAVgYRNtn#r&c=)Bq4!Laa1o#&tz0=hp^{;HyGLUdO08(Ghj3tRBS|Cn7jhJG
z$iPE>;dCMD%mg^op-wbt{TUKcW3#Z@KN6I<Q@dQsyj)%h1x1r<lq_#u?e#MsuYCk5
zd9}3<#2Rk`J6yVs&wz+b{~%gSDJcK?N4tvnrjV9kGyN{Xfx4e24VAjw6OJ3>=!@@g
z*Wync!bA2oA*EVe`k7SCM}&g|UKbd(|BJ4(3X5ZlzWjwi(1gZAXryu1;NCbSxCIUF
z9xPZO-MBl!U4pw4+%3V~3DQX8KAr!}d@~RCKJ{aNRduTNIeV}5TbIooVw3qCUfnOG
zU(GLnFjk6FvCf0d@VE*OycX~%g%F<)wA{-L=SN~HzD%lkZ{nzew(;o6_t-1=UnT1W
zqGaWdnZgUc)CHqr<#ec2tsiat3?L*2$*I(2D|C55OTW4{GB3+|ZyRlhmv;+bDzV+Y
zg1L}yqEZDlsMcoNU_eG;i_v?3`{S=sbZB&sU$x$skKILuCEZJ(JRP5lpwiP1v&OOv
zWl|3`blbhcF>p&9a%0RLTtar{ptCItez_(3c;XgRUz8(boVEENm-E^?aUAFJQm<s1
z*wrv0SNir7Q(}{!xVQFFm!x|`{Y-qDWRNgza82^GAH7(ykgUL-l00xR2Ww_1p{CHT
z#t+6Ot>k4;j{o{z>=`ceha+}!I}+qE9HT*wx;3zbG6PMw*6DZ;YHTSOwsz;i0QIX3
ziW}1Smegt(W$L|<Wu?C{+JfhpC^H!HIcNIiRf|_eXazcg^WDOlvdtpK;8o_^L5Rc=
zod#}JR(FK%*8c10FreS`LXLtz=tYO{OU<`tBX4k7jy_zt8f+tB7Hs_Zjy0CHL~*3x
zjlGv@=~B~nwP<@a%01x|ZUvGZS#9^W?+JP}TY7t29g>8iRxZ0CkbnAq8z#PRI{cP4
zQk(0X^DWu|C?vL8jaOU_&`>HnjO5~4w&p(FFmmg!CcuY|Bi%$75-(bU*<VurkxCa`
z@p94ioUoK>WIRk?yZ|Xc+<@TSwu1!g)&ZTdm(sZRH0w5ezHEEnUajHuJBRwvJw3><
zPS`h6d_KEIe4oSdOZ|r*tk+WK%GG`|qVnsHzIk`sFF`1CNg`vJ^e;`&P?h3vWUPv#
zp8Tz#Jiw5o{Tf@XM=5V<P)FuR-m&uvaV?b1Z`VIHVW<Elhy(RUAIhextTu+=Xt%|=
zrRc?gJeU$!Hs#xxQYeQ5#MN6y3SIt?1pij^DYwd)f+3$s>cd1I^V2FaJ3Mb%gJ4PT
zyXi4AK1R|Hudn%(X7fT6$De1Ei@-kByv0Tpf{2{cz*2}l2Vzp7H2mt%<JX#?`iU>Y
zgk5orqZ6eQR~1-l*cmZ`g)>nKq2IpK9@(#<a=}z^;6YW!iTGhs8j1X=QWbByC(W6o
zUmcp(?z$|k5^2x-Xs4a&{=20`nB;Fq{QM{iAo9{&e((n}W1wpa-oa@)`(+FQkAyrG
zSQ~cl$l7%)Ed*cz8DXKkj(S%SEV!&FhTujGf%$9Ra2;uBP#p1iBFrsa9Bp&0ALkG)
z4()03?2gmfev`HQE84ZRi|?CKfxub*DCg~a^8EI2K>&RRMM=dE9yPZndf%*HGCQyc
z04vW7?=|Y*VI~&&7FLn+5+N*>!HQCHI%kPMrtkY3XO?ps^DOW8m%#1-6nL{Vrf-2h
zEracq|5+lB{lf08a)p)usKy@-OeyK4tHX!00xeasnNrK~y4oULSi)eY0xm`T-dKGN
zBb7ZuH=PFZYoq8C0{wclKA-?jI^ei(czsAjVTLyUdW!tMJKq3nxX+6|O22EF_MO#a
zy_os(CkCNh7cb382n(vVZJXO3CGXO)4-#1CT<=O3K+1aZ$~{y%w(98k@rlCzY(d&D
zsbJW?XOBcmZ11+s2KUxg?g}<v-6NPUjG3i(@ywv;WM!229hF{a;|CKV-9fULp5625
z2w}fmr93K^E+)}y4R!!+dBz^h>fHL?^ATyO@KlchL}>%|WWrmFrzps&SU&$O+-OpW
z9z8|=`Y0D{>`azRtN8jhE?0j?bx7N)S{UaUiWksEM-xI{G~g~U1$rZAccKtI9R{65
zVqYv#hhOe#BNWpqx@n9q!5yiRn7!RUe^xCu8PRlKJ<{7Hcbme5i(`y+^~5vRolI}<
zBXKq;7GFz^Jdn&ZI9Wu-vyAM{-M%$T2fU{Vcf>CQ#^A>=6+5Bx$e4HyRtm?e7Hc$M
zokUW-K}7asp48`b8K;#ZR5<T#llk~ba$@-<kZ>C$6r!mIS_!@!mEj}^P>n9WgI?bi
zB*;M46~N`RV!{HG^A69KUG5vM!-?E6qNy!tHMEM`FMF|X9o|X-Ka?|`FEmyAb-5;7
zf5<{h%YVP<HhRT~y|6&e*iBjg&S+GA-o^oq`+PXv^O&q1Z4aY$=ljymuvo2$aru@s
zdJ9g+;t6J!1ytiu<%SEr3)lRYhnrKY(pK}oHNpY7HxvPXcW8yU^n@TbC$(65=r<KZ
z0*{a()jqfM3;u4ERDdA1G*&MbS|6E(P54#uNEdsA0J{9kq0QGq1~@3Mt|!sl7}I5k
zkQV}7BN_yVN4`-5c4>F5yqs808c=6F5Cw;*JXkxrUI`;wcpx++I9Y&%@ksj%r<!tQ
z^Re5nLnx`*oC*-zI*X~KwC1DjT+4S9X{_N$30#&QwH|SnV&l&ED}nSQz&_yUO@Z%3
z50)ysW^#wBzgb9rkV^CCtFpNMO4kB(3|NL$tQwTJ!iY49+prYvQjv`STp;0ckttUB
z@a|Ge#I|rxmGQd{$V9>{&Mr4i)n$_?DN!J&2g9W-h8a~lYA%+05$SjRAyv2x<LFyn
zLZ&+H5_m0et@^k71I!mHE5u~7m(U$ZsHb|1a?*Ic>_x9dsf|2q{6NidFx1GQWn{>%
zk^6c=J*6n4V@qAMkxn_&E^(NwXHQ4P0-h?}QF(mqaCmfdF~RmWr}NVm8L96ead(Oe
zYc`NlLQnx(&-_+6_0}p&n^Cu)E-VQdx(W6T3`6QeQF=C@Vo}kFsmcIbF)Vg7DmQUV
zqid?Ai<e1YJF(MXeElT`K?c<lg+O%pZQ~L+w&h0P%e4Fl&xO55F>C-1{y&nq`iA^C
zplUgcd5i!}KI_2lblNP0)UeBN8no>!VJ?AuH`bb#QT*qiBcDSrS$_&aN9I)@3vwX;
zqguG{7(|2eCa3Yu8e#<0pplOxDN*gLYm=d_K!=0$dE2vySvN?kP25i-iP)?AZsAp8
zCMnvlEy#gi^=(!A`57B+V&}GfOmblz|Ae6yL)Bbn+MXv7rYyWOMsn|~lW^gHar>T?
z$9I=m@dUtPIXI^yCA#V<W#swpQ!p$+q&SpP2>pdjW?J(rO<v`#ukR3=ns*ZIC;`Zb
zaYmJWv93z9cqu2{B2BPzEqs*YC4Gv+`yKTD<q7Y9yo00<b`+b{Sg0H>bH8{iHTxB6
zBrAlfZ*JLW3|37*d+2cG0UqC|(sd}7Cv;3g|Gm3SkN-a#Fd?HDkdTvt^ZJ{hoAnXy
zJ!h27pO|}I9}J@bo8cTr4=r3i3x(}5hqT3JR>9rA44Q+j$N_kZMfP`JQPy<_kz5U5
zOD3?LDf%ozR`%36%nYfgT(bUI9k6esr<!u1_$l&-wL1g(iJEo{*Ymeuu3+O25A}>m
zxl%SMtf2)g`uA0L=87UyLOHc6$&hvqXO-XnoAUjyhJ2}ErL4f&MHDyCn0(3J#@s=%
z0GoT3>Uey)yUB^DGByqga&=s>y&7zzuDt+u7H>nKRyGSyq{pJ_ona~cR+JHAGE-|2
z&0wH1_L-q>7Fxj;OG6c+f%E=h4zX$E&KM-IBz$$rYyRY8t5p$ifAS>VZ1vi6^Ip~E
zDQe0R4Vr^;sq!XyYeYv<KGFjPYy7Zf4*l{M2g1m1wZ$8K(ep$U^%<F>20DQk#Ua~h
zy&0SCV#i|{!@594GuX$a2<dwu(d?l6DWYh6Ldi_WxU7+w{tU&$3hL=7fRzQ?-I?`B
zEcq7CoV#PoBfW5EP?5`z>i&w)VEm;3{7)xM@4OohNX9UwyA9}Vyn{{eL8qO_&ndYY
zt+hjY+t_J$cB&nNS7D;M^j>i>!84_12$NMBoipTx9N~;pV0IJb98<zXp3EF~Vjg;+
z>0u@jz3!lFjS(Vk{`~U9V?Qfc!-@~U8iSv?Cndj=%;+Uz{F{;teWw2Wr@ZoR6uf7y
zc{k8DMg#JK+WHX_dPrq9OB+kPoqTJ#V`MG1vIRqP#rSa<Zn?)u;+x8BOeu~o?Ji~9
zI%w&lSz0g}p~~xO19e$QxbbPl9111Uy_LzA3*K{EU^GG5!E?_o{zzN%EyksOW-DW!
zY5*HE-67Tzpfmd8ov~lJekp?-@2uMOt)kLZjI4R^An7h49vw(JaA%0x7I!qo@^v>f
zCrr1st}B$Ndrl(6e%_fyRVeh2WI#M4m{rnW`BTqIhnVBBKu6fWuli?3#13kQmm4Dh
z`3(2<GhWu0$Xb2|LrT@Am&V{XoP&VOt)IE4eHkjoF)R<IDR(jS{4L=5YVkUh0UfKf
zyLlvxZ>2kBeq)iPv@xvH(K}CoPveMn?mxeBV(_;uz0CdvXQM;!8mk@6LGL(++%+uE
zPY0uVfQiM#*&LcEw}Vt2ON)wWkWGFj%&{ouYvi^Q=C@xd5}SWoEP{Ryy~{h<M8zSN
z5ouI<l^?v&LbbD=icf#!qu1IUw@(aTv>g&dbT9Xu9k|)-NYmmEy=+wSP^ey#VkOR*
zABVHgo5j)r_=?r8e4n6h?|2aqFcB@&3uhY&I$2DcRJ@(M)DD*dR?ka9dSBk>GcsEd
z&&hG15F0ciz2Yw7Q}jj?Of)0v%>L-oOTMxvRA{CZMngy6vqv1g?|z!v-G)lqnHMVS
zj-dxe#0cLN>wW!dytO<IkViu53BWT61MA|#!}M7pPT-GD;NB5bT%;I|j#;1MH{dwO
z7x14(x3YV$iUfVxiEpcPX?i!C;|(~GB)*z#Wkh>?@npcOlmv?BON2Qgc<87AiEVEQ
zdHiX{HVc;8f?mo-bT4_8&g>3r(FOY~J7KFVN5QgAl8QSWrzh8Z9^d%(oX#|gkCuE{
z7?nTS0j$5J--tT_+z&}2@grVBC=L=Xm);tjNqDkPOO|=#t>7!YDi|;AQ<r`>s)Bqk
zJTU~{nU$f>R%yJrWOh`sVI5OL$QiH$CpTgRAcHNFd{*shzT%r!pOea)eH5<K<C!qP
zrkgk50Y?=S^jnRmiy<~#%cA_JW`=_I9b;2Im0HKid(uHWdo(AkAw1|m)X6Lhd;}`5
z-wE5tv7>QW3_UQuJ&`iHF3!@S&`VT?R@v93m8iJ1v8Tv7DO_<)iw!GNa}h43*47W7
z{Z<j`>gkEyf}UC%SwE0A@X5dWEZ;k3KGt~EprPlH62?LMl=5WwzB8ty@Ylq^-hUjo
ze@h_r)s3ml9j8IxMuw8ldN+b<Q#`LBNZ+Koj3?Ye@WC|@fQe7)Zc4V%m*m96$YPDA
zDCD3P07KWlw1J;5_7vV?Av;iJIvC-}91av|<P^zl!TKLnfCm!drgq;&NsUONTnsoU
zobUIEgntpsx%j<`AVut9?197?M!h#D6HJZ9HC-68l}fCvRLB!a6imddzAfS}kw(w&
z?^@uD^|{%Xwg@>>Z7m12WyhQpC=YijV*2uYHvhRaCsuN2NP9~^U!)BXDOMlE|MrS}
z`2Ak^@JP9jO$Mvqqgz_(G8gegdNnVUW(htnq?8>YAIl$s1P-Y(L8Y=lm*#ygNVeN?
z^eY!;xwqf)f1FxOz}k&CvRozsobqS6J)}T=qwvVTu|@>rk@wtE9CE%V+}(`mWYWYu
zKx2Pjo_@BC*A*+K-7N0ZBpI>I+$Cb1CQ2{`pGlHgio`d-n69lM2vUl2Ouyml&qU;6
z4hf{grvp?dy@ZTQh#2aK?BgdhR8)w!v!9o=?~fGgeM`^N!O`*(v}y?EgphCxrc_QL
zB~WE|QWR|)Q2KU}7U>Z&6u)evQ=JPLysf-$T%r(L+(!(Em$K_fMDCWP!adw6p1;~6
zWU=1TY$Rasc7`2{ElT2%EU^{yECt-pH#*5~8SQ_V060UmE|AHO2!|dMyFx}geAG_6
zNqiEaiCliV&m*fKkobTc&<Dg7#im6(SAIM<mDtEYoedGz<)1~`*Xj83&I)0m)mALa
zxp_}$M5|}X8b*?b*b1w;V%juAvbk$&Ij+01qGqj&Y6iYY29;39WQ*J4+yBb~NJpr>
zv%51hE|-PAYpz)iOE12!FiCfHGy0a8xO*?SMsE_deR*OlieVjeFo<C=hJQT_$l~6u
z0h{Ugj-kW~{Z3B-lf>r$C1S)OluK0I7Kql@ky1;rP&}Bq`u}Hg{6E+J&-h5hw!g#j
z$&ueCF3d+@Kptm^%TS3=gDL$R3h4722o{RM4Ucnkx>pQBJeIyCh~|cY)i0p1`9QJ*
z!8TK@1}{T^oDdSM>$4lvU~l=DoBb!OS%m7c?mSqX?=R%$oT7|RxakXyglNQ&N)tV8
zHoc|%RBxz)q2J7Sy~ljaApWW)Y71{iVgFq{{T0L6U83S}{M4KymZ9Gi(eow`y{<|e
zP`7h6%3)>Kcu=Kjy-liwOBZ8gs8LP8v}I2%pzSNbny}hgwKVC6TD?<LiwD#3KZ&eL
zEQ9Xg4CAGQ+w(H_4P;xL`d^aa8KSEu^w+T36$(Eym~0e(Gq<30zT`MZLpO$b=BeLR
zt?kNF{QZ^QUoEIL?o?c%Rjh%S#fB>t@b3rx1<?lQ<6@rQxS9f{GTs1>YY(pQJW{{2
z8c8<#Ep!l>fQS$u^KrJafoZ=5UcCO#fY%AeMEgd<;r%XJCz>WE1S6+kr9b8(V;RI=
znYnMA`*SaBFiDw!&NgFF?ANdToEr3JLcC0RclxZAn*;m-q0(1=W)iOxXlNGmOVzaF
z+p27W@QLR6Sh_dX2$PP)o?6Dn6$u0I(BR)}MXxlrgmH{V9yDxNavaeaj=NTI8Ptir
zV7%z!8r0kRx0gS{VS5wfmv`;H+QMkBO5eIfWzSP~VjKpZ>Tb)Fq`g~B8cU;U-Shmz
z3$DQpJZzu1_N?(b4H6Eh+gyBuiwE(bj?gUt7fSd<IFDXDB}B$>5fgO$xP}uRCdZ+f
z?C#m_wJa7X(tP}NW!PBM=>HR=YK|bvE(e`SLjYz#EW9_|VlOM*0Gz_ndnv8|s(Eab
zQ3}J<a|?P)C2z6tZYNM8<5#ghtDbu}<(*SeaGGQQq<vkHL1Ndm?CR|xt%mRfTbcuR
zen?50eqrZc1$tUIIa`F^P_3jN*+6=z7HQ*q#gt4(k)Uhp=zY}D9*Nd{TgrArAM*dV
z)c=3HQ#9XStxC_ssrmJ6ck<dP>YH5iTwr(%D24%)tPqXI=!ycS1F-U6F+zy91kDhx
zYk>*kqe-?2SAv5=saq74@%41NHXTZrCJMq)*iq}u>doEN|4?ikqQ`LDlKy&TC@@!K
z=PP!n$O2rbaByBJ^u;;iCsvvvhyj|@%9k{!!JVsr!tf6?5duW(k>pSW@i|Gv8>VlY
zcFBNo4{uznv3z>ECbtPU>N!EMB!y0QwB4%@>kMYe(VPQopg<OBLPB2C%ipU+r6lko
z;M13HZe;3I2bow(H!5?mU*7k#-ChP4yk5srQHxoKc-g;Z7<~F>gh2;Ewy<5TR`mMB
ztJeq)$;rj&{NjODa%DW2f3Q$%7j&Ff6l1y_zVI-frH^sWkINbnuWtm52$&$m^fnrt
zz;mfv4x7vKgK-!7r_en~aNucB&j)hR%71!FAY>qzVK>tKbE~0Z4LGn#1IyXR=>{y5
z3@Ez^ZQzQw_n6+5W>Z*UkXI^%A}@f)V-~-wd7EFdXxFbpD5h^Gc!6B$hiK7mo=Pzg
zt61IsXCk=;7E))pck5G11QK$Qa%t=tLRyb^IhU2?*!i?zf)g+O=<n-M8@x_6AvaMV
z`urX|gZD)A)50*XnDSOAB_<dWckvwA?Qh_wiNi!Wu38*B-6By(5Is^~vx$6v&zH2s
zZe;k!7vXa_drH0+_R-0M7pLP+AtP-lP{h7(7D(llrkPU1Wvx`00*~YMcOI_G-=`f@
zi(z=Vq-0?)cxjsT#U#izFPMs$=~RQG2?=-k15l78J=t-fOguQ(*<Vw6%Dd%bOW`KA
zv&*=vhrWynR>W!+CEXtFkHb_rB-$d?azn6IyV?WsGT-u3gHNwzfMvH>?HFD6{oz>Y
z!0+sVFS+(^hPWHtJZc~BjT+xHTUEt+17#O-pGAnVA*2VX$@zC(elES@VAz~Un$)!2
zqVFi7c8@OKQ2Muv6<pT&fi*Ip*KogW9;obUI3(j0re!DK`9Ds%S-`*GmITxRnUwYm
z_MaQT=xHojgp(jkybgeeov1*m8%Qk%dEK)h-MTM@rI5Ieg2k&s4`77=wiE@7Ixr|?
z?jeA1DaE_8<$Kqv*XKRdu3zK7-Fl{Dxtkv9&@S!9B4Y%@_yc=de1QXB?GbeXXt_&n
z|4W!FxvcNdr8L(9bfg8#Ma=|0>3g>lko(t=m+1C0gnIRa5jJc`8qM!-jk&Lfv%~V^
z`Wwkb?OgNGKi-^c8_266s4WN!=h76b-qH5z<E-0*4j%aZxWnviQ|Jz_V!j+K7pkBu
zCVi_vzF2`H`q>2H!zy`{_k{wj)qxCmPb4~gmHLUid!Ynic~0<zg=~p1QG2|PPb;t<
z(L%_vJ_?w&eGrgxHu8SMv7-F?`w-UdQXS!lec>@wc=4{tqa1dMd#Lr45|bxQMti+%
zGfMqs%u+2fK`JpZkvuGJf*`&1$k{R<UC{Tg&MmZR6Ac;RV(yEGCk<}JM?qegA&J2d
z^chd#n^6S2&<g0Q(6`-oBYVr+K!;`&iYw9?w$JBV8v7FIWu@QHV8)nLq_$1YM!Ltl
z0t8=B5!?~_MF<0H512(XrfWw?)90JZPCrsqES==^$n7dAy+%h}ltAjz?e4~H_02GD
z4Iy60KxEL`ClEB1LtuA_axc2IHu9gJpW6DK9^1<^Cm$dBM%Jiwe-<=x%xSR{yVk!l
zeP#3ZE_m9?8rG13nd&ZM_R4sYLFu2HT2d*+fWhxn^_QI;?ZbZ5HNl(!E?A{np-)h*
zG|J1cBp=+53}#J9KYSQ2<A?2T19iRK#(*UnVLqZi?^dEc7I1L^+d^GR;LKZn8`1dd
zG;q#AyW$}0VqRzVDJX)XK{#*HaNGYQmt_sUyplx-$GuL@C^}A1w<Yw>?|w(Rf~_P*
zZTFLUFgTfqac|gJ#3l<66~iARd4c5xiH*@^4}C|iv4m1$6O|<wlJtDD#JjXMPe)r5
z!VKkXW77MBb~)a*uX%M8wx9XJs<hu%I*hct57))6eXt>(Y_F{ThwQZ%g0%l7x7){G
z`^+s@q3!v`+i9`RTB}%Jd4A-nf2zW__G1p(EopS;W$}pDaU|`(Q)&DJ(S^^Xu#DU#
zZ#&rQH?X*kzTW1JCR+e?`0H<6fZ#xh1{7IQH+0T^+q=!TMun<%;^Pn_-onO61v&!+
zVt@=0rr?k)R|g?!?UY5lSNMqe_h0Om3G9Dkw?+?w+qrfgB&ES7Hy>?hbynh}hnu<7
znc}w^j*XsY#>1^7e7W{+_v|iD{EJldBO4AMamuXk-E9}@m*~dQY}J$5wA5}jTRf`W
z$n9&~V+qmb%C8TVn_YQT>$eqG(1yS%m5hQFOd6A2?45Om_DisYkTgH2x*0`#`~ZF&
z^iuW@JNPT1jdH((-4COd=|pZ{kv;ppHBptDn)2Do$A$Awr>sKCCP6PlEg|XNUwA+C
z+6G}3EP88?;fRB`?)6YSUq})`$+_tmWtL(%5o?Wr>#mNqbxAy{kHeY3uV24RK79D#
zt^*<NeT-I9>&1NSiRsNXGE5;M{58`(GDf9X(kWN0a?)56e<+rB(Vd(>r^=Z17+Z{P
z2icy(nSUd`p*=SLQKP2}?M?=R3(=NCrgbzzc<{@$a&LC1cqv)O8IuRI64|$FBZD&r
zn~F~j$;p}L%Pc^|ZDJ=>oZlTlpU_KqXk<l=>nL5E9-i%BKpfS+OaJppV*qP1>_M7<
z6|=p-E$fr`(p)03d3s1JZG(wl#t*4}izw$yQ~|rH*zMyJ|K_#t1^+bXt9-(a^SX*X
z6xe<V6F<(5xE*NJny(}f#Wo4(J)tNUTMtX}m0dKbev$gZVY`DwJI2&Cq19q0w<L3}
zz{e-x$NO1Dc{mzWScnZC+QtN6{j`q~ecaBrkJ)T2JQvl^F50u^&oEb<Osz+!$atQV
zz%zk<au7yEF|ungrj#jeldkSr4P5BfZ!6*W#C^!DQ8|C!4KN%fWL^3$m4JN@^nAAQ
z7T=aDM2wq8j)D-$m~i}DY1?gt&X_5;U<E@Eo${11)}=t}fC2s_-09Mim`rpqfK5_S
zD&SeyBN>%r5#W>iICX$#LO+Ucd(?_~Q!94X_>yL1vEKGofh&LTf!(tolrkZ%LUM!9
zOW7r|lbO-j*H@e3KL>f_=)zM{SmQ*mf%rbvJ6NnnVLcm0`EG0k2k?TFHkRTKgL&Py
z$FFkQDOJExM2jDA$oe0rw*(<rN|h>q;_vD{3|7VDLD}~hh?>w41(hC;6{-H-`+vHx
z|GO9U_&>d<=mFnRP%8+6N~0Kc^)plfTTB~NNBI#>;P>P5r^M1}@H4{J(mY&_oLYU*
z*jXKwCu@I=MzM58CX_F=0Aou0_1#AAV$qiOc_&$T4%7IjyYEfTG;3P^OZh8<xRGc=
z>lh<l6Z|5jL}c^&ohW7vtIBxH^ZMT^IqBLuH8Gy#q_Sa|_dlkajbVA)iO$_Jy&n)d
zN38`8gJ#17^tSyJ*wY_V+a&wB9pKd!SZ|S?(sn+J<fJZ<*&f~|pjE)hcd~50^bFUg
zSVQ3L2FxP}Ecx5~vC$ZP;a<I#gCp#nc0aVrTSBci&e7t`vs|X^Vgo~rew}-Pn=x8(
z0TjphyOMM*u9;4Vj8?0x7_4GNf|rWpb_s78ULl9#!u?~iP|-vLWM<jyRji8iJ?iim
z!#UK_Jjm34-JKs>O(cD1(l)Mo^Lt$dTsSFepZ79=;OCUm8t_N*^iu6e(Gsxa?cmsy
zYxSC70?xMBcFMDk8Op7$aC9)((wtHMV-#3<W0^vQ4S!Mk@t<1>ImRrHR|Ewpz33P_
zL!8LgRcdGu%xx)3-SMl71yo$3Mrwp`@=DpUvX1AiV*or@oa!sZmUw8b^hVXw&6JWY
zUY8W9uG?E>aNQ>~G=4ldvn*4TkpdYwiWFWit}mg30!7Cpd@GBK@<eF9mF^knqgx<#
z0buvlekqc^DE4d}ACgMGkaHdOWNNr?TJ}2NI6SB&Mq=Jpnbw_?o`))vQpr$PWN9~i
zeONt&3tsDkuud{-qJNlO;#)VHq$)8mRBdnmo?J4VjC?yo6}T5ZCVOPMGqG`r3N<ht
zt5xFSlJRP);WN_zQ(C0ElQJ|h(qI7&X55~qPzy_9i52;%OZ~CiHE0)&c0@l`VLKm*
z9i0X}<C4lXN|C3x1zC|ow}$8T&28}AD1Be>!9<*YH}DfhFUIy>rQaoU+B@qw?e!B`
zQDV@?RsUyS?fTQq?GyY8=>6~S5z5s63cFqBlvxj@(|s863R|WG;v3mG3qs#q1vnW$
zInAH$B(wKHuq55OB}kXiaKSnp;OcCR98p<xvi@X?V|sS1IYfc-zmO#L!7cibZpL6y
z70L_tO*f0-p5F6exkm7#5OwOQs&%bZ2Vu!<Ho72>zE6={73!hV?YF9ADowG9&reBt
z9v;a&&CI`I5N>$BtVEQJs;V*;kgnhPA;jwT6dxB~C~?lLGSd8+fb+!=GD&=f(Yjf`
z11_s9VTY}qdla(?JS?0=^8V9vpJI|h3_I<R33K9G^Mv#n`{36Lys^M-8<L~=_*7rj
z`)XcUkB683#qo4CLfrIAuEX#BzZC36g0M-p0T?il|DQ&6h~nTi@<CnK<I~3+PVI^B
z`&U};-s#9|7KG$wsCBoUC}f_y#NY|qMO=MB+R+giPBnF4m>=F7Aa!gro)AHFJ+B@(
z{>crQh><CAbhmB$8~q!vwjs=v+b33iffHqo9*UtR_dboQi)@R(on3B5BF;Jes2EK}
zU)*7Pm@%`6Lu-4YI7@9er*4;5Ludo3`74v@%B;+is}87c{F3R`D@tN`nz`AxVGs!Z
zdMOu)049W_Gt($1k0P7o$!$xVH34aTD(e%m4)fy}8(dbr)i7iGcYj}v4+y~&1ug@K
zOf2*z8h+SN`oQx6&1F8+^Iwr)w1SA<P%uRWF559P4%eo-Y)@oqvTBY<;ULM7P17SB
ziLbq7X@+R)r<h4WK7YP><%@~#4TZz~pDgO-C~;Xa(|Eyk*r8TaI=qV3XIVDEKxV+}
z<vpozC#6OCMKF|8=2pf~M?f-kmDKJy!lvG!^wcOL)8Fh{w^PnVQyHiD8wotlexB0s
zUuY&MCMAkF-J+VS-~FF|fR8d@JLS7u&~!#^wuE5}Y4VrYzo5(9mQaB?PwJdvMqGi}
zY=NwtE26RQ+YlVjPtUTLzZ<rG$_s&xDnot_$K{$Q9mH~*52Z6UIJrk&W!j2Hr{(+I
z+8nA1%{b34wtQKBjZisvXoO0Akr2?*VoSH#QiQJ@zmQ!rNd0RS|1!vZBsm0j$n2wX
z&BSLDo*+(zx|D2vj)o=aTGfWI>>_M!Q8~M4YN+X;!=G=|j&s3DsCd!7zvfz2!;N2i
zD#Nc=FrjJ>1O5-PKg!E5ubKRbb4BkDepDW1c&s|(T_nII&#vWf(h_+8YXX_|5?PsM
z3a_IH-j~eY0@?4z?elUm6c5U#+7M7;7~jh8RvqmkHtj%h^{2F;R6@Vyc=vAm8Y*ff
zRIIM@LnTX)zm3uyEarndKqLoTEMoKBv<v9nKR$NvhThfv6;%3v3ot@V=_xwzo4FCl
zfp3djZqfLxT%l%gQ)%7z#pUM!U4)uGY-hN7tx!b+0Z{u3W*Y(tZcFtpBQRBjoll8i
zTlp#Y#(+|9x<(3cFur>6kZvrM6>gG5sI8^bu*je94N|dcMVQYtKR8yshd}{vw?8nd
zwbwAlm*pCNkIgbaZ1ul7H6~7QHPZoQ%gDtg;9ro!8jqUi>XY-tL9xD%zi8F+M=wP_
zue)UJMS;Xd1?oLM-g~0+?5^s;2qC1VN5!@_#4J^PmCxP%%B9W)W7RMreGExIB0DFr
z%zYED^;!|d4tCH)x_DjN)7@c%jPc6NLqHpj6jm6b{c<%Q^oo?6b&BeQd)!)%NQ1Ii
z*6<^@UevV<vo`rFxNnT-!|1J|8q=Rz)pG%cBL5QLCInlOgzJOU2ny7lNh{6`Z^0wI
z`IEn@XJw^S-McKoEl}lPXpPeqjkf`)X!rs+BiZWWmwv#{4&JWWJNx=h+!rZULNtnT
zX3H(%OWbD-hNXmH6x7}pdrU9-;MT(5@@d}{N;va9Ki1l{;4LE<P6v%_JG>zO0g@f3
zW*T~tZGe*35lyO3`JxOX(mt0G4Zux){3#%=BQMD~Lc0>YD-pNRcD=Ke^~j+6)Tl?u
zO67vcr6heW_F?>h0MN;y_;XL;%~EXR9N&3ym)hlM&>%no>hj07<=QkjHgPaDN1&C>
zwZJC_N~r^zVPStv1-#SX;zjmuWVOQ{H?%gf5KssSd`etLJWTvkwx5K#<ZF^$jLm@j
z3jilXH7)l+a4Q^3Doi)da`z|W0CIPMk;0o}W07y1Hbev>^AX7ADp!AxZaJnI?iW(Y
zQNO7q@o9;--%5>_StoY4y${Qs`W}y@*ZEi+OuN+5xS&+8m_g-zAhq0HXOqNA6blIp
zeGo&zGK>6d#oFMOaqVP-;oOgoDEAOqmmYoXkau!Gmlw2<(QIa0hf&iSoGSm*>&+VT
z5eF*W8=<()>)CQWOaHsySx3bQ{^jN6u~tay|H|C?sSF69PqNxj6z+X#^m*iB<d6#5
z>@1r*E(vjJ`lcj>=Jo=&TLhTH){EBresBU=03T>53{H1C#WGDdsFU0fjhe*SbAFPn
zI)ws@-aM+K<wBQPg!lAWaDx6h$NN7J<=sa-Le}fB#jTHP*E|9zw!~~29dUgM*qx)A
z6tcW_8d34~<VsjIxpr`xw4y4L5&uhfa2?>gVorlfU5phuGU|+?@q#Dwq-n#IEWGX_
ztSYm?1=?sJ;M)>Ue=l~gFMpt?Kl9g*_V%-{T<8i;<?~{cW3qxwPL;}ZHbYnVT_jwp
zYPX77@qlEpLer2i3In0ce_JH$Wv0|vHxGh$%WZH@dM^ak4Y*`I$zOTGo&l^R3?!E%
z5>3^#RJ!nYPaxR+hoy#v4rQ@M@*S<FvdO56f%yk+@%zy{menFbY^jkn#|~>mM9a}u
zo)vB(#+1w0T>;=H=kwK%aD*^EE*J_XajJo(Uxsir#i*eJx-ngH<z(w!YTgrodLG2z
zA;eniPx?d47<_IOMRPNRG0ZeE&xpg*m*dD0m*qVE0DwffxE%A5czpo8Kn4?in(9_9
z8=;OheyZ<|wO@97|N8N_{5(S_m&(9<A_)$EnaJR^ZUHd{XkXu|-(dn?urPDo2L~b`
zj8@@B>84x_mjyt~e+1IJB6rX<OpR|)-+oz&p@pF+O?TVn8kKEzyXJpcS1$b*Q!<n!
zUS}7ZyVEJUykXXp5khnt_>l8l3U^jwXfN~OgrMYom>mhda-F3k97uKjf-@+SAc|kW
zUrMM7!Gn5HK9A8c8ddqO#qwrJYVb#d0&Hvx{Jap|5@U_hW~OR&Dy9_b)9kMj)~X#G
zjFagWLIN)Y>i1Lv(adnnNKy5gE9GqR5M4$bgCGS|VUSeHKrPjzN^MMbW<8D{&q3oC
zlUO<6<4VD#Rars&ThdaRMMt-qa%I!DU7OUkXOmRcR0JJIs!3u~|2IFo^{Wg0mN<qV
zzh*IgIoKb1B!dochWaZB;CA7&3c>vf{tN|_iwx72<UKfUl)8S`t!oS0L-CJ-{{C=2
ziUY0?BDqPqI{tTOS^p+AJB^q6yU(P2cay7F^9cckBLJQb5g%+PhwRrLL<#w(c?>|&
z2}5weUJ{UOr2wZKCy_b{Zy-WlAnL^X8wV=rgHe><IOc^;xR?h6aAD)oC!xvy<Ab1*
zuJf-dVwIgBPI6c#(aHw9ucM9Ue^8Pk;2k1-iYvKLtulaaa(>g&`5Apscixs<UOdB&
zdeLHAvQ%B$V^T;G-O`vFZI$C{{Rr6#Wd1FW!Y$4AA2M3EE_JHpN~W^S`6Cv++$!zb
z*$QXWk(mIA`p4>E3hMkfUTWxkN!0K=AKO+kRi$JyOl|j2nqP~*hpJsZNC}jZAT$_$
zCEo>l=*BWkd1@`Z{kKyEDmX0W!G@{72hv1n-VPJP`ZxU7@)p9eE=W{!q-K9>m)Ch=
zdIY)NmHBxoHC4Yb@-t5+<b{`pP_(s241{xdCqR881MLy9-{<-O_Kus)4+6S3!U8Ni
zQ?GZ1Z{Z{C<<e5>lqbHK{^H-P#X!0QR|*J3!k;xy=KxksJov)zA1eYP*;R%;E;Wji
zPFvg>TO`y2ZiNIW%hB;<zs$}nwlI0zx})S4zV?uDA;D2hEK1V%o=^K-7)hDz%D^fC
z*i50uA+^(W>zG9!5?J}?RTqJxtJcBa1ruAhi|yCuHfg$JT-vIxtXtKj`i`tM6&ge*
zG`X9pKq{)QtXC$*hGxv0^Y~@lEf2_jjulj?xMWJ-?%d5cZKeDGqr1NRC_c!X0BRdh
zD?FRNht%D4`ZvnRFIwt$bSLFB<or@$`aC~3Tn#@tvA=ZE+1uIP>qi{S-yk1DS_9iC
z7kqPEl7!nmO9zW}aWHu0&4%4(V)-Qx63of~3{jCsCxrWMV3c{@jkDD5p!z=o!(@xS
z4t5~ml5UBQVItT%bJwi_9G~M708!;xCHgK2olx`L#WN}VXp{Vn{MV0#*W`yz=C8ag
zDVg3-g9f+=j#+{MCThhjX-1akbxOMoT_U%>GS<hGEq|yJSjK4Gxs7}=2ZE3-uXEIu
z(qq59+`-DnJ&aCgd8Wa$Z3g#iGMEA94Yl>=euh0g%Ao^ihvhbM4<ghuGyUo>EupIS
z+vg$`>jJFZtm?enXr%=<Lu)P%8AH+bNo)o-j-8JUGt<*IJWoUrD~T!iaqEfT)UVmu
zUH>d6oa^B0OS$JW`TtNyH=qkwOJQ+$S7JLi;qzkGV;Fq_898Y*t20~<o}WTf*-wRK
zL%Api-rlNHehlP-x=Z>VoTY!eQf)Xb_|86g@j)zy{lH~X6oTnxLa9sX;+gMTF#umj
z|7war#{mbttplzRv*lyd>Ve{lU8S@MIds-(5dPtz{l1=~PNYknekbqI(Dt<tf{l*Q
z9v?(0O;mE_G|ZxtH9Tl{*8PoIs5v@4H6a_uMNa1YQU^?%BmA!V+QviSzzCL<l&8;9
zn`L0Do}VId`HPunO^G(lLCAA3V=-sNCzb*LWYBdF4+*d`&JgIzy;m6__`uSr0Hyem
z&e;<-Os~^*!-W8kYNS;0#)6kb#1uOd7-DDypj*Pgl#^O(Z`K9`Otn5#{g7vMf!Dmt
zXD6cBsvHeRa4Im)Qe41fS!lGb;B<38k8S?-OP^T)uU)mv;`%zRFm9HBk5+6l6d=8A
zy?<~3pGbFF!?9+s`b4UL+8sEfnLOlWcz7##K#&=>eZ;@X&CUHT*c1;LoD~Zx-25rv
zFOB(P5F-JvSYnV9Y{lWyqQdn~25%<*en+50ypSnDBLF?jj@Fps$AmjPVSUiD4dUC4
zz?`oPtpu8`h7#e(yXkI&J>Vp)t(7Hg`QTV^(g!tuc6jr<>7*YpU?XYTP!=)?e&L#=
z4dW;w%m_DNk_jps4vz-%FK&3Lsziy~W&3TM{IhDhpLGhL-cd_zdwTFG@w0gs3}J>^
zfqM6Ophl!xhKaoQc3Z~uPG9b&FG>Qk3(;LMb|aD-cnBz--)+E})e5ydkJEs$?CUay
z1R+_*3%fZ>W(WX?E<{jha@yD%B2fCuE|1kNxasTZ*Crid%eV<kNdssjDhTYYVe#}*
zdhj&lE{Io^hKy(xruG|96XD+8vDj~M+~b!Z3;C1=@*Y}Jqp4TPoaV1P*^WO@&8RY2
z|4=~!{z^tL8!WXsiE}`*8|;oIg*-)$30!2+WFxM~G0eO1v9ep^NG14*1&#U~p|<uy
zcjS<dysjMc!|dY1)-WPgREn^1l30q)|5>7f<zr1Rr5H2QpjXf0%Xo?VyTeMD?-f5x
z5t6;rKD?GI>Ucq{`0wx40}a5FDm00M+b3Xegw4q6+jmqnVIvoTZr6izC^0e(5aED(
zSYN-rPCV^P^+QHb5+3LO>FOqii`PJ|Oq;4sB(GL0xs#FZIR@6Tov-+bk1WWbm*2|E
zi0v#Nt1>C(0{_m<^ey+ik4y}#Vbx;Q+OfSI7PIfY@2)%XZ#iaKqA0Zgl#67h?R*}=
zAU!xa-@fWgOr+AytljDKTN+9@w<mK~9VSMPea>n>cd9%XtnnpLAwm!MCzduwj!M=v
z{b#@X63o4IX@z^`dFg)j+M=+?QgG>b9i5jqCEYdVA6y6kJthS2rhKhJU0aseJOI1!
zwf>4mk<DDG(}}_1QG3aRvi1yFVY%VDk&<5~bC!T4cnAdxM<zo^^lmUi)iCy+>g#|~
zhRG@$1l}_Bxvx78_wKL|1wZ816rRR#p>|UV+$6OyoyD~1eKnBdTDp|Fc=~vC>p+7T
zJhb!MQa`p$Oo1+<@qyKpPPv-HYK=bd&Xr0pIC@Yu6%T`lYL<32f{FC-t=lSsj{SMb
zfp3v4HkqAL+Ujtb;%=S$l}V#MZ>`a3I?v!9rZHQkLbi>ah7vvo`w_FG+ou;WP(uV>
zv$6w`OL9z1<{nK7eL7*Ia4!}HRyiJkavZb=x=pKSXc>XOw6<z-s6<@??Vou{$bn?h
z<);S1#rK>;dz0|IS_l-J7zan{Il+|K9X~LcgOafg-^T1fCR>WG0o*CyOXK!!u4B&T
z`Mm;hFtqd&ick(vXJ20T;W1M#oL!;w2Dc1B!N)k>6;Gdjgwa8K{zwmq@PkA(8MTK|
zFxWnE`tbL!!@90x(UP1pCw2JH>*>4&g1^+_cuQBF3k29hZ<`$w?l|QiQ?fbVh?EZY
zhC9Cv7Y@~*SIQXUbND`KLKTQoemLyWJ<l068*z{;dIWm+rdgkksZF~Xf=eFF%F5MR
zjUf>MaDZ6aYNhaWLkv0=O+42h1I48>O4$V7Y1@hTR^E>68AdTH!`mWM^_J0{`^R<@
zht)bGi~0$3P+QM{;Ayz~{oR=;l@Cd?7dlK!Bg7-47neur-U2!L8>7K8j%hk{@8gE@
zJjWgMM%rkGFpZ;a!tw%wfxo=j7rufxroDfl(Py>AQ-!J+r|~(ulk(a*ZQpHqqc=Eh
z{VeqS4@uMSfS=r~bsSO;;y7Y9zpo|iHh6|lzmeo}-dprSSFzj{zLkkRsJw<jzVwJl
zb;hDF;UDY<3q(t%Z9J6OZ2d-}`X^J>iiFbhg>a+*IW!3=@EVfd?`O89qw!~0YICkk
z)2cEQe?@agu-S7nU#Qlx6Otf<uL(~eh6>ZU&X&24ta>dWM3W7Bt1zLYrQYKAW1-^z
zEQvgti6y0k%wb>Cq=+G+{Fa{kh=<o)q`<>3_|z^p3j!NQSfSlib|26CUXDzG2dkV3
z@jh;QahdMp@u?efqVrA9d@fuy5Ovix<dJ2?CfQpb3v9&P1C(M<5aWNGo~t^EyQbCN
z_yymg?cPHo7rqRR)*0k-83wpxD<r8{0hI$M12d%GzL*7O@d)!R)m=L~ivyBw%SeW)
zeP#n#bEP|^aD?$Fud$?_sQ;m4BM86g_x{G!xRWCuctPZqE^d)_J;dr2Iw*u34G{kD
z&OU;&h#jy3IM>R8BvOS^Obe*x%60UTF6Q|Ty2o|J;{?TMdS<Bjhqvh!e-lj2UM^`O
zn^(sBeAJspr9Cw3rC;{I7IO(bzzIa!!B9@D*1l?_y9x%FCghRf<F{=5!nGfa-4yvy
zg<;z!g0AR6Lmr5h8;hg23#$j0h`A???*PRrcNTJ6y82GFnnV{#Sfk0nM#e&hV92Xh
zPz1mLAbE=LM`AmM+D*A0NTDO^{ZMuTeWi5b45CTxqyrtc7_**$j3ZY(8|(I4UwgXI
z)~N`Am8^bG;Or>Y2*g=#0&{^4Dww#;b;eg%uDj6`%0^)ac27fw2OvJIO`va0l4sUS
z&B<f0<vwG{9OUhtX1CT4=b<OThT^<-*e7Uo4oaHE(!!%jEFU*_8BPdt9(9t75a~7E
zw0Flb9xLuU-X;b?)MbgM!mIybnU0VC$<6+E8UInDc=-9Mmms<a$8gyBn-jL|-Q%gQ
z@^w!6q|Dueibo5!8}!P%l&@pX1LY?lCadqjp=yHJYuTI~Rm$6*xtyHO=!~9alp%HD
zO`VMF7;z(O>I~8@<&=Uk6a^+zVC{3bBAg@Dft(4Q9FWp)LuY`IPor#q^80>!v}wkr
z$3P>5Sy)-gZf*bjcqUJm8A|5sTh#uGhHyLjhc^)n?sf8mpx}{F!V!a+VZsXC#<aQZ
z+fO+o?mA5z;8<Lj(LGb=3YGl5n_-2<17a}`O+ohguc<)KdKIzCwU!FV9G=zbx&R#3
zcpTIQK3-LDaiuj0x}3<m`}Xw8r2Ue<8sawCwMW``D(vMwcx*-dYmH3<P%$qz$bjw`
zatjP4S(V8eBs+-%*kEaqg}x4;CHWpqS>)j2<D`P)Mn&lq1uazb2Hem{(+{AVpZGuj
zbg_j@YNn_4N9_UIWV{ytQ92lneIrX6!a<kGK|#CyiMv6v`~1t#<t?R7-;9;V`UZ{Y
z&{sK5TtVsh-BGGp6br{T@Jc<BaczTlkp>sv(y8X7iE$0N(fO9EvaUsD>o)YxEIsP_
ze$4-YX<UAZ@|0=fwcGY}DTz7hZ!AGMGk$$HXyUGx?4+~yAjGSL$6XwMj_ZNGV4+nK
zQXg=wz$LnW*1!1vu@rxUn4Q>e*ZCI@sXB8q<7lb@8B_LiP~@2OrFzh3?D$S`v@^5*
z2$JI!k2Rhg^Wg+WqT%SbI6(eSw;!)g)*PGTJzwFx>u>+jJaB*ZFR6xq6Wi#C0~qS8
z>{{+5mdKD24H{ZqR$h7CJvgylZ(KS}+-^J`AB0unU?Xuvfk%E;k^cEOy2DV1P0&t%
zC_FpiB)8C=8$palH9)o#-@=ZA(f^h`W}_vR+8k_N0CtFs4<6Ck8T{&*IFAe}aUhZS
zzMn}?JhbqO&IRs%_`$30oz6J1ODW(jc23ZpJbyIVS~x?{%JwB1RnSS}O6{rP7R}Ln
zRnOXeOfvWOvzi?<Yksd`p6OZLo2skrVdsNx?x2`YwOW-mkn8DSgVGh2EzZXl4_WQa
zks#92%94&%12qGW#5CHiuH17ojY2-Li*FSWB0Eeo|E6cq5sC-;0~#J*lO>D2{$-07
z1<kf_<ymIz4Tcwa8$R-_<2mII{G4MBpyBpc^lQHU(AXN6I>!kgtA0@|R*j11Y3o2)
zwZ`KkH^<sWGSaj8pB5G(d-K?b<}FT!fLDUMh!50fvRU>S>qj{I<;|Q@9xVUaFG`1A
zgP4k-fi^ePsY@|v4}tA8Vl_(}JDfE-_~0<go@N><?f3qo;~@X#iS^D>UWn2m99`#(
z?us(X>CgM<R=cZ-$NLbE9Hyg&v-fwlsPSTTXbtoDyD>8af~jojk2-t5UP@ztW;vh8
z9)CI<^x4{79W5Mx!x42-+%e`;HT=5n_c7Dcv)TvbC<3QDOLs+QB;pu#;1&H3q8B|W
zVs)5BD&IJ#U%C_(Jq3%2757ZsDmh-?nK3N>nF+CM`e%y>nv|3A&*&{;U9j#(|ERAz
z@#>y7(sZ=5bN}|%RarkwI3XAd)D{#jMY9)b38q*T!(}~Hv5qee@RU5pr&j7Q{2^F>
zW^uKu;$LVz9J($YLc$!Kc1HM{cJ|%(Nw6Og3Xt0W(vLqSjhX-+D$KfVduF2KxwaG{
z(x|7moROr(&ahwVg|AZ;Oe$M7C3fHS95hjKjE%81zj7}LZ>f4A;$Y1CN&Bl>bc9a>
z*UQ?+vqIGhH0sAHnjLj>-_FNFUZ1+ml7>7Od8ty6aQd{f(p&Xt|KOO(sh+0~i5%@<
z+f}yMo2%?RSmx1~ANw>RM{dO@p=;w~$aT+c&U?>HUyAW9x8VbSm(S?dIj~bLN4KE2
zM{QAJS2d2*)4ggaCzHt1zjz_v1&_;p27V)nz{|y!m8xEl60_*6``s_CwGc9E>2xG9
z!%KV8%zqbtIR33H=Eg^W%$a~Ab3vG_BKlti0w-FWK-cdp=Q^6(1McICf3n_TTp>0t
zx(})6e}6_PEN#njs)9KiGN0e$r*~Ycj8`}`GI#F82Ffb(5i}>t+DUwx8hfpIC_kw2
z7|;ciD!to$D9C(>?W=Io2yIhxXj!7-`cS3162j5Z5e1)@Ew06F_8Ph;J2^v-ChATA
zJ=HPi2w&=Twpw?dFNlrS9*J7$Nl;`$dYlqKUEe<?^Eq<li?u86MlfMrp4<Ep?3p7e
z3dV39wxsZz@9{bEQK}M~COwwKkz87EbH5g@Bh<y(X`#O9xo=<5^Sv;7E@qz35M{3$
zvo&bXVjJ0t>bQgFRCZ*JpQv}RyQxJ{|9)0UHQahok+l+9c2)9>9xx!w@lS1OWttwb
zHMW3u6o%JUt~aY|YVOQd8l6~-ecw?~R5VamkB4GXi;awD2(j{EWwHSGMGFqYqW@zm
zJ|s(t$*TPHMjDHP6B$6FYUS0NeWgaWtxcf&1!>>fTL_S8z!|aMp~5B|0&g0S+=;x|
zZu|N5%0)r@^5YXuD39T`#PU?^{AO1EQ>ovu_fm@$cG_rW)V<+-Y^kVd1j2Wf`3fa)
zvgaE6id^&>omfj<eQS7R1SUYv>gMJ~#8qZ!jK1LQvU53Ld_P%>q<LiiZ^zD?DG&gf
zM6nXn|N1fru`Iqw{+a5efF{XC??+ek*ZuRYm~)U?pJaR)YumCB9KM+A6TK<lk}0uB
zr2O~g_j^FEI$57m)6w~Zs3=>Q2{>Wl!p95|pp9<b#JMn1v|Q|9nB*11(|_nji(VnV
z6vFn@L@=^<$cV-F4)R!mxNna^M+4qqWhw<-I}#RTi8rblz94pPllOcuiwAx3g2<IR
zLZiPCu9<!)^Bzt0%omiK!@gegXTUR59HMS$fn<=#qezs<zZS0vUc?`HH4Hqb9*IcF
zVw3UKNg}Q{r|&-0v7EVD`VbyVk&_`hyOJ><S6vo;DGc}P#r*q>i|_&{8&G1QN>vj&
za*|%vQp!n1E;@Nr|K<eBRz%62{6))h)<?np_p<J$BEch*rsR5mAKTyhNkN1pdSBz{
zg9aj2{9o+-Ral#0*DVSc4<4Xcu>_|$v`F#bg+hTAx8QEUU4j)aE`<gt&_Z!{0>!<!
z7H@F~uA6uL>woumu=e>r*yp*DJoi1HDPxW~2M`*li*LU7fBGH}<8cWv3Nlmx5tu<u
zfu3#(!SLhBVM=WOcRBXe8p$720@D&I;|ToE&yV!+|Dm|iihH`<+}*X9USiCrGbg~T
z>-s_#p4PQSIrje`hWkH2_*}!M=Q-Vse^<ND{~vql!*n2?RzQ5iaalL`KmYIlp@-c0
z|NZp;dk#DhLgIbcG#U}`7_hU`CT?!zGN*Sh-jM!-;b?LGzFzID;wWLZGjOK-fsYh}
zKGH!vx2S5L?L(FK|H*m$|K2YnrYDOica5%ih$4Bi)p25{=lT)!sJZnG8Bi3HwjZPt
zrzyNG+`w6}WTOe%T~_ktcOT5J)Q^3o?{GXaR^;UQd|sClMr6<F5+A({S9{Rc#}~59
z=_`>MJ6B1<;&)1RcOG^b7;BQhP5u9eS^dwkj3*oHOdx+c<vU$Pp^H)jOi-6XEY+IZ
z)3VsWtK+r9!{wI4C%3(vi(+mKS)R*EPq)6Dn_5|Ms+aOU2FdzM;?`R&K0SqlT*Ot=
zNmuT3A=bv0?cX}v#BgS{yj3HbFrNUB(0Oqi&%XNmy$>p&$?gDe!6@Q2pM=X@6*e9d
zN3ZF$=@6qYH!F_H<m%Hm<31AlQ#(&hE#b6F3(Y+H2QVIEyhrU`vGn7c<lcfl)O|Ka
z#Ne<h%_7vZub`074^dnlz&)#h(z=hDPOV#~-&Oa;y3oErByZsV_XYEQOY(Pb$Aq0y
ziYae0B;M+O*MFUf|8Wf95K*C1X7DqP|KwG4N&m^JRx)Q|dDQ|bu*^Q9I#iI_N5$cx
z1U^e-dfB#7F}L8wTgCUqYDbIRJ!!Fy0b->EAIwY|v^~v(*wtH>fI=O!+)Fr)=H8<K
zVF<dD*9Eo0z%}vVs@K8gp&nhg$EZtmM>lg}Q6K0J^@6%b;l`S5LiLBRF0-_cQUNWi
zGgiWnGqCtT^GEY+j98V)1TGl%-E#v5@x~+irJz5D4Zrg|_b|-)9U8?wBM6V$M|$Y9
ziAh^XS~H2KL>+ourjrX$Z4H{e(ruKz7i%ZODZ3{!zRRAXeKuh)D%B%X|KT;HaR155
z<TpyK%gP;<!hJy;9XvnxZ794?sGhE0{L3@B(*H*0aSZe|L0*5UDrd|a&BPn4MA2*k
zg!qwlP?{B9FV6pczP<mkUqMN<++T;NUwnAnvSO$HqGMxY!o2ynj2TVKWCOziS8h_Z
z6z5AXU=|rsnodnkP22L%TO&j7)$^B)&R^cy?4IS`>+YVtot4UHW}m$-i?_BV&B(Jk
zyp?9Nwpoft1_Y)|k^GqS!9dcX>jiXu-WjHKPJUdB<2Ong{n5Q0j+q1{r8wpb(``8X
zk>%;rTesYF3#`hv1%FS5kyyCl<G(Ys{q1vb*;v<fRL6?0K|g9vAPCHu$h1Z%&9vbS
z%qY6OS2t^bR_rwC4=w%0saS6F`Mum0u-Fr!c+)NR+Nmp|{Ki7!z&<GX0Teq&^@CXS
z+J}dYceQT0&3q#gVCAQ!6Zg%y;D~9Cfl6@%0~rs_T_&R-K2-5ne#hssCe)}w@=FG5
z`K*p}zXe$|<hDGOxqGN}J)R(A2l|W#u+V7Mnr|eqsn7_o-(e;9npSHbVf$CLYK<){
z`;^Ma8RYU{-Zh_~j%>5S6YQ7W_x1gq#3iD{NAHhpzc8tvy`7R%Nywbf@@G&Qd&Bvg
zpCcgs!yntMZ1pn{xsCrV7XC*u{3)2-@a3%aT}lVJUrzOGl(iIl=51YeivQ=<Fm%QK
zS0Za0*OYtG8tgh1bgBWOa38EeTpo`Gn-c9G^BGO3i{4kkkbCoOrVh3kW?6oVzil{W
z2`nB3hRPAeJKytsd~Jj>8$!lNNH4~6?K%(Nk8&xy980}8a4s6ly!_55%9}F;Q#xM?
zMc3hTE97LS-ojIitL{5H_w!LS6!ltpohxZB?!8dZ#W#5!QE|L}_ZfF%d2=3i);RZi
zdbVX7pK|G3CGK$Lr-6{r9O?0$ag(fwK%PuM$W-97MdiDyHzRyEd2a+fij4Qp5>K{h
zmqYn9B+VmeO!T1W7`acXqx7!nc6+L(Ht7xc9vAw~KayhQZ5Pi(O!IuCJk)N$H9ZSj
zEfVx|-_ThL`|-b37WR<Gad>PLy@PEOsl(n`k2Vik?=AndpItggqma7%E?5;KR2l6G
z^@lz0t+GI33fpSe#o1=drP)*RHb-dH9mbFL8L(zT4`o_<XzKp$?fz7pZ#%xv55S}|
z=!m}5!Y=6HxuG!bmVH@v&UevjZ8?QggQYq}$*E}4bgvYTbN10blIFh1e)szkr#<7p
zQfR)wrL1V1u&StEM9t-sYG4gVEN>##Qn!6zy$_$P#9H0^BH@x2d!_*660v)I6MAb0
zr6Q)WumE3EMQ{2;!5#PC(%bieswW`}t8uu*Z4Misr}HB5Sa>lFs`~^|h(+K#_f0`L
z9+%Vc`zN?U9$u37BRK^Sv~>6D_lk3(H*d~TiXBQ;XAkUgJv^okOmHoKA4nbcL5F1*
zjiBQL*qMW(^E)0de|IOp59;JMg%%T3AJS*h9UGug(x^^<_9^c}3k!)iFDl0*cwAy4
zb)mLd#yE6gP$Rgfc;@LI%R%?gA&B9}UbRvN!-Z%r4`TOzA9Y~Cr<$%zZ%)~8EIKfh
zGjxe2iE%w%NWr;5%IlQc;O6@z1uFPDHl&T@f2MJNo^r_{W~Cq>fJut4=(knp-@O|@
zT2_0-G1!7Vo?@-%*UXJ4dxs3a?u!%OvI|J+io!@Nr(OGWdm-fWoWqdRyxXrS9E))@
zX1~eU?J}hXa6~XD3tW8C6Z+1eSH_}bjTwbbM)%1mP@^i`boXDMQC^Z#pt58!ga4f0
z-i-%S=TYs-b3Lx?GxPTmmJYi=jQ8seKDmzqhe*G?q33kafrya38C0?xrOt()Y3fxD
zlc`-Zu~aMBaEwbpaQiM9r^ohJkJJtp9Ovp*+tN)Vq7y%kA{|c}E$+N^j6ZJK83+7|
zd!e(fcC@@RN|&VTdYWSI=v4ne(5D62&=zU#Ay1$-UuzP-XqTE{4XREE&fK?jI9zG9
z%&0SLT0eZ-gD|mD7l_Xs%iU~Mn?Au8^EghXL6rMWzZHgEy9w)?X&7rHiPgc<#oC<n
za22@U>4<wGCx7h!SW*4brMn9B3=gaN<I`AtH2*Pt5=y@yK|bE_!6@_1cXeS%-kSV}
zH?H}d$({cTP>0QH>XcTMF1O7ZwYjVqIX7<Rl4)o9ez5K&H0(-hPsr2wXH{W*(5sqd
zGh+DTrw3cZ5#L1I_2AM|BWdq6Cgxo%E_mHP2vjc$QQucCmYT7CC;`x`dpSRG9DOUO
zkM5uMJf9J@pM&Im&r5+0lXU2I&#B&~2qrL(z~cQHoNh%+-J<yZxmKd9|I4EdcNwBX
zc%&pBDJTJ3dHHJ(+WAs0sV)JW-SFRlpC_#(KINc>&YK76TxGAt)P{o<x-Y+-cE0hS
zn41L=X7ce;F&QxZ_)SE90;?|Is?bs({HL)qdyum)H8qj$mXltaXC&`UKlA3?hvu`_
z6t)PT&d;BPBgY~+419l<4jl$tS0c;9eGl3x7sXN!Hj-<(yDnDL2zP=%vc-NTJE&WE
z`ENU*R4fi(3*<pibok-5ge1ZJ-cRlo;)@8FZ<((1jcwlacPF(|7zv-lABh=lp_~fs
z`Mz%GFQnhvFK}}@^*mJuOBwaUb-XpEQF9@gfTFh)3NjO>=E1h(8o&G@Exi-@r<0|D
z!NdCXxl-Q36&yTJE%wU7a(FXk!nejG<9(*u4sCzXarFX0^60r{Z9_8$s*z}0mTIhg
z+u3_;yDb)QeBMW28s>DX+`su(M8XJq>gMHb%8(bZbb3ymnVxKoiXBsyyuBi5tvGs9
z+b54Bdc(y%Wo<F)!a(0(wAyu4`_C&>QpHDA^oAkfe-un-{6@ycxi|3m#6prkL&}XO
z&A&F(0}!|-`OUYa$>lM9DeCEj9x$o7=9<M%V=aoE?S>F(PIux^{I#INPwp*DX4lG+
zl{5w@xeK>35~Ln~&?Ib6c@L*v9<g<jfF~lT4hoIj+0OWjmL%nQ8fX5m@&CUI;5nJ5
zHv{s1_ZoG`s&lYv!+%PShKMb_;Vhyd!OsjlFbiw({P%g#<aPcEJW@KhlyfF+K_-fW
zhucGIdjv+tLLZwgX@u#H*zd%bm1>^amJamQlluGy6bb&hM@;8Q8PDJq=i$K$k~Wp?
z`Mr8bG|F;>Z3$yG2I0-W8&lkTY4L&R($^$`quZQUja=V`xBtv=CTDcxj&>L9p4M-7
zh%3et@e8oqo!#zw+i`_7a&~ZkdLiYfZcBH%f{FDZF*2XRJbtaqABLL*{OMKA-6sla
z8~8F;KhpEbYK@4)SmZ}f+!q1m#!-({6^iW3{s13&Z_VY3$9>7g$R>v>hTa9HlAXaO
z*x}@NEz)35>g$9N@1kJphFBaqomEUK$rn%pj~3qiyf^80o9`NI?tO(da%l^O*fJ`E
z8(q#?j+VmB`v}>#@fp%%%`YQ^5}My_zP)%o3XP?TR6}CI#6B|Yxc+(PPOR}BBI7O9
zn(ic4+b!}(^cBLcC|h4^gQEQ*>jst8d#KXzI7E23UPUI<2h?{6oG5b2QNOQmheIDq
zl7x=Z+s})##jWdcId&50V7HRDOJAM@(v2p?S%Tha4A=<Ude)`NM_CZ)GQeM`G$#tx
z{5BuN#B?@p;Tc0{9af7wQ_TtmScM2e|0)1V`IA|jP6ywX<~2n|PH+7#Uj9iVTpMA~
zrlf0F=<|9yH7(tq$WN?%D$i?#kg&cNsM~@pPu5VsBI~tjgr|N+k_s0!PQ@`ubdm3n
zH!U(IlrOh#zbF@O+-oS0QEFzJw1Ka1dTyv}LWQiVF*8ZZ---IYa5nU$%;Ie{@N#Zy
zVNC9X*C{T056)fLTGS#XjO(?T7ykOUkm^!tjKW{^X2Cuu$xTr`&@vid?uA_plPPX6
z5E&2A)07H-Uvx;3m>MdlS=hK64{U6xX5<#tl}~i+wrnw)A8rR_j|`D4o+sQOMVGYH
zP2yt-+nzTF{ms++ou1XgOXi?5OTABgPi*}=-Q{@Y{)jArZ$A6bxkxos-h1Wuy{w0Q
z9{Q{)KB`cQdV1u9ov**1)0A`p|EWjeer;-!fzgVSHvi;{*cBzFs7JRX1uqb$Zt8ju
zSX?FPb(Z$X!K>F+2VNh&q_$18e`EOCb&Oe5(AgOjY~TLq>$bd&f_q1}5b0tr{-g0D
zQPjzxBK<P+xgy&a+;E>;a74sdExY`sQj2<*ZnwJ^*pl29aZwd>pCU0XHT#Pt3$?`V
zd|NbV>gWZFc6%`F*nyW%D7skWoqDj`YE|n@)YJV~+D<;k_qZ00*sl1=nMvB4CutCr
z>^fK`R^w&LM7b@a>uekQ*D^yZWdk+PF8JepXOwogwXUm76^$O>VCtTr+fs?E&=^pU
zN65rI{hZpn_)84o;i5dCAV#6(yLs`#&P#OFp#z%oHT^LA_W6CMmFZvVzHy>Mg;(H8
zhAfi!pZ`UYzB&WMQ4o9+Y|!FwdZLSOa0Y_!G%gDXY5_@qa)e0Ufcr4PFyA-q#QdXi
z{=~q$+9W5osS$}yQ4iP{%3E9?6PfYURGe5}&~hyM#$I~{A20AB&dFR&T-%JJlBilr
z9m`oO8ZG4?1FW#APG2)A%FpDw>g<kXno0v$qJqlzBQYarBltR8?%+vhI%6iUNYE#<
zZ$!(u57Zr-?E-6+7I$~@yvSoHnw1%IU@Vezsb{CVofK*ae-q+B;Hh~cC#nkP7N{TZ
zcs<%mG)gDwu*q+xm<7;ba+s=jb|XR`FFD!Rlo61*U@EeCoF?QChi|bWT`C?TDs`!$
ztl*o0<&gJea%AN<^sL~wK~r>kc-V~BC-fq9=EXe`+=mM0$?!DJOjsrUk~qKQa8C)f
zN_sP85bB7rjjbh&rlY!j`k*U&Jv3T}sV_d&<@N7Y#q9wU)&v@K_*hkvlGF}?F|&!q
z$D#QMUZYX_;(m@p)QmwR)EvX6zR>zrYA((w<;48=`PDOyBQ3W47sli@xhCIEjJICC
zwjQ(|b*EI%;Un4a?2S9z?+Vjqvxv+rkb;i7Q@x;P!N#Y#B~X=Av90)k{$YNKgCaK$
z=-8?*0i_2^2TSGu`7)zD3vvd@z1*gwf>xwjp0Y%?70pstv@|%q+??eS4wztUIQuEQ
zG}#ce?iq9%y?8rNe9H62Zbe?k2G{Wi<c<0I^8^wR*5RBrJ7XU`T`o_u(o)rovgNkL
zp5hII2E8F~mJ!4H;gLXST`NMtF5r2|i*q*Wu<|5g?XZsRoNXHwt>m7@`v`vfjAJW*
zFMj7&xS_u(xm;@Mh)`RW;Rn_i43(nuWiD3mZ=CdEy^xs^-Rd4p@fnI0i5(&ael)U^
zqPpv>wX}b*NNV*FCP2~BT2r~%ox6@``{c(|ot7%5zE4zkt~(XRE|I&$%{QgzhY?Zv
zcRA?7RfS6<xvz$}bK1wZ_C75O;qvwI(0ci)9|>0f%Ze&|HPH6}zf^%5Bj7pwPVQ?q
zU(zd@nP)z^6T<oF1*gJwVMdNJO0OG?r$2Uj#YJI^?^oe?ZHqpZ;Fyst_E98wFrgIn
zv)5V*I`Xf|qZI*4Q$ahqZY=-uDi&wyb5SomDR(Hvlxj4i-+Qvulz-@TD*vEhWBPMB
zc^T-W<7`~#OH$;t+U9s3(cG4r8yxwh+wjtr>@Bn1fZFbPfO1eIVcWa?n)6M=y=Cn<
z`Ir{>`_F6LEBwc@+TU+9&z)66Y3^Gm-F`Xggi8hBq82MPd=3u;6>l+C-pOCfE3l_g
zq`t6<m9khT|1##^7>H_y{t{iL)ArhZck`>2_M^yT+)MTes+1C^EicXNs9+rf4-)uv
z&<H@CsqACG<*oIPQz!Ze39>OBxMGxM|Hr7wy$0qtpV%#pTrA)GW;Hz;L-|ReDpZ-d
zP(>onh1;KqmHY3^bC()Dx{b7qhC61&ehLLS;CW&YdI%CLQpDCa83|PnZ{rII@Z@-^
zo3VMA{MLo;kVCs0UC-c6<la3&RDR_b)?WnssKv_ntsNKM!Sbq{$mpZ0%V;WtORNr7
zoqjhfLt}gt()NBa;&%QSq_y<d6>iE30#MY(u-kj=369JZ?&9nX&C(Fs_n7Uzc&lnd
z?CSerjlG82urqS?gJ}1hVrTc5!8X9pv~-3DwRsxHdJwKJcKV!v5!3s9uB7x-<h^am
zvFX@Q%{@n#zQ=D#zm$rWkxFdmi_SDL?#z1_J3oS$_AGiOBct30jR!z&d6UCFDT1P1
zJfVOF8jkOfpc|RQSEkKpI~@EKgr@9n*F^dNq7F#q55GHAI@#G)`;p1OsqMu~k=A7B
zgN2jPPN#DmPKTSKPTTOztPU?nsXv0JSHH;W8K9hg|IVuaIVt}#6IP~V9-sSWA9rYa
zkLGoAN-g13dD1^|NwRP(a`sK9O7*|EZD0;SyynS?Z(mn;heS$;09rH{VR-Utj8PW&
zVE-<JxG+4`fDz{8l3LRLx)qG1{JL+8+@l2}gdCr9!j*tt0#X+r;(bY6N^0aW@h$;8
zwbX-_vkZBc3~0T~+-3YL69wNMMJ{#-ww<4_nXtKMgOK;f1%|g+I{TEVyDx9_YDQ?x
zW_EYD<;ahia{mOlD8RYp91w%4Zxg@!oPTUMuJcfsB-M^`dC%7LovYBPy!8W+qhvMk
z`5q4c3eX!nmYl=5#_4CNe$~>uY!h=&!(^SlgFWF`%(49hyN+4ez2sBCvnYA_Z{M8C
z0A8f!3|DSpV)fTz_;ulmM7e^%K?U7Xvztq4IHue!oLX=xr3toNi-lS~3&($h&oKyt
zMh-3#kM(f9-Xo@Gu0N%?n0ruP#%oZZ5<GR4o-v<&m-Kt-#ipp+Nsltax@sOV@D@S8
zD{a%Ia2oQVU&V5yBFJ)%4p|^y=>G-j{Mw|^CKP+^7wcGNfvb-mJ+pJBQLC3hK6-{V
z-#gWMF&e78uXIPun4j=ZD@zvJ`VMaTq`1YY!DYvukP>P3PV+uEb2p=rUZ1%fE1CY(
zB@U_nF_K#B$>Ul{IL|?@nA~WW8oQ9iSiA(?+CrILTG&g4D+JpBN=DCXF7mWLzjmCT
zNHy_320|V`>a0;}o6^Ad{XWQy${>?u;a+-NX72;tT$*W1bkeKhg#)hIrMK|3aHnW-
z&9v70qG&EZsrz9#&eM_@Km%Wz5G5P~9Sdb@kqH3c(nhHR%&>kjao;T_Ik$U=JzOAK
zI1PZ3+wE8R28|CM_+YC&CC~ZR12nHn({?^dcv;Je_dMNdy^+0J3og4bxfRh#;!TV~
zf+uRWeOJ5vGWTZyTo$uc;_;?udvmM#4NVPhYO2#owsWH>Z~4U4&S1MieJ^fyMBm)U
zXo1>+16#35YR%6_n!u(ZD=wG{XL|ED!L(2BvmE4>PGb6&du3_lfOPJ0j2Gb3A8ezZ
zrPdNhGovPZZ{-YS`~$r1#4ZO`)QyZbY<=o;8cJ)yf4ZBq`FumOMXd6(1#RQuIWI{j
zEE&yI<OH@~$l!Sb^?*9>gv%Qjdi6<k`j*yUk#VLi+%j+sf$9Aje?TLQ-aa2R7Q+}*
z#OsDC9`(}dRplR}X*~vw06&)33$<vat2J$qMiAvCQ>NZvk_?da0<$^BnjwHpZJ~Do
z<@Gf5Qoz<IE+)CpZailAnEX9kRrrb?V%us$InRkG19*6Lf9G~~y?}3GIPV(j)lKd)
z7#LbPS}h_)V@Ysy4$NtCWj)f>TYe&G{ec;k!YMV#fa?um`%+kZR5#XdipCdfLe_(Z
z4M%6j4JOF>B$y&x=O<Jqb;e{Zjt)R$Dydz_zG=SIh<9$A#z{7ybS1hk&V1Xs0Y8oi
z?it-@`s1|0?*Es6j_F5inu``c5n3g3gi5MAE}4ydPWBVMxZn6q(Ak{^(;nFJJEv*J
zvYPssXo4hv+8I71?Zn5i%bv9!birR_p#B&20Y)1Vc{&I2OPqtR4E;7V;B!P6aD}E`
z_q}=fgm0J5lB@tZ_jqRKX~hkL=Ds+-u@8L~(1(3XphQ3@G(oh7W7c5xHqHPPk}MkK
zjh36z@~s&hr6*jMuGV~SebKT($hHrQnct9IzLvazcbl$stIe>%Xb?5NnswRWY|g=#
z`@4_mM(_}kp1Bt@yMo?^j!kxZu0i*KUWbFCD56fHy=>sx!^*v%=jditvSXWfe#;=-
zpi(+<qlWN~iui*0SbEcZ>0;`Dzbu;>B%lqYzwjG5!vOWNlu>R&>M$KHB%J^a1)mgH
zZlgLtI079;o;B)54o-ZKK<8!SY)*iiX(xRrf^KX^0!7I0a%kam`j$D5)r|rAncvh&
zfW9lj@8(W^%1oVDSSbh|;fX7{a6*vY7p?q4a{HNCcEoTGA_-u_)Hmm!St?<p`H%TF
zwvkO0nC<yxVXIyAE{>8doczkugAeB~hCQB)|EKL?0;r(#u6Q?~Dz8m`B-_+iHk8@^
zrc=+z1~G;`!;Fdycrz@M3(FR^%nhg)uNdq2My~Sixdrdl&GGC4P5naCPcNsp7MWIi
zXK~>-hkUm6z0iyRAt#aAiS2dVI5bA44a{e;8sh&Ol7H${b1@yO;&HDHsUFM8Tv2^5
zRaKGyOCIgMKBRfkCA#FDz>o$2MR5zu6M#Qq5<ee(qIP8P{#h!?qX{JvHZGv`(6APX
z?3<ErTha$llU9Y_tn+(gc8cI}hxJuSz2@sE7@uV5!G15aud$%jFSXX~GxCuq&eCZ#
z>ouu<Z**bWxZ4Njq<;OVkK*h=fnL=w`xNB{#mCAO2IZ-=;m;wah^3RG`sSOkNaot-
zdUIuwaEb%1C!V+~YF62XL{!q_L3~IDGMArLiYLaZ{=D2NjB3r>+v^~|B8755dL>=M
z(sKKC@D2g=%dwqkn{{5e<es(-39h8=g#&65e2Gm}%eE;pw=_$9W*Wo7Mae(3gU0wy
zNk3i}W}-9grru5te};VNnxNAcaIYjk4iRrErKOC*+|&V2y?Ek}33=N0Dfvo-<`kP)
zl5!OB>i^Y#T&Fx;%Giwmlp+;!<?$!jUNWew-{-~=GPIzxRPD9G-}OnH$D3`b(O>j>
z38*YXh8!YVmMUiW4KMy-+_V%|QNpK=)jJQT@IeQc0UuP8yWCPt6{EMYbHy{q5Ed!u
z$`@@OvQp<qdqvmbx?z35^Ms^(Q@9Q;z2L}cKbp$m{Pi2*DV4gAv&va`0<I1ckNC%L
zzsoE(w!8$GiSNX;lE_y=xDcERVjAAgNJ>&4CyGwL_;4zW-!bD`H(%L@fvKJtjDIO(
z19hOPIc0XAyYr}M#;W)U!`V8A9dr6TD8)LIiH%T4z*v3!H6wijY-tC`-~dQ@?)TTQ
z5o0((WtR#jZ@xnKVRJdqpKMDa3<qee8^u%~@33^`g!q))P5kZsBt|c0RcupmC01`=
zE&WM~==VBX2>wB->O_WSgA#Z=_wi5q;6E)p<od%{hr5{6Lkm>pN(6b6sxwx5{Yu?f
zWJ;YVZv)T3%Pw*|4?fG86%_l<YB&^XwB)6Oy%-dQ;g?||W25)6ZSdz0{jIby!}3Z~
zt);pd`@{ZlrbE<yUQ>YeWfL|o4saaCItJq%HQt!v)pv%2H~$DLw!1PM-L#N=O#Id@
zckEi{iBPgCZ;YVLm2zw+F-na0ly8W#J#EQo_{<kP6-w1(5X{*$_Uel(P}7jZAJUM0
zx(U#&(w?*rgJDGL3D)&u(k+Ddzb(L@@1K51wDn_QegAk}ibu$yIdhyy2XxPg3FcG8
z+|K_b@9A;+JQ}yDni@U+D1Ysxg;e<%wX*NYg1k;eb3%^Tu3FUJB05Z}8TQg9Bl|CN
z_W*7DB3_q0yO~e1xnNgFo^aMI9NN7SFS;_Jj!AxF60}veEBUWo{P>$v%Q7p$_+zl5
zGYog%qAke}xQaz8pPN^=&?azcslzX0aWH2~IxE!tj*<A}!y6_6id(SHJQ>KP3F5Ch
zLB%A?7MEFATF=%LUBH&bhXsAh@b8u$+Io3L-~HlqL(A*nkux+0FoLAE#5uoT+a<4r
za{zp#_<h+B_a)&hPt%uT&%gD*4}L6fIel8{G5>T!-4Q~cGemwIJFxzV#cBD@!zo8x
z=>>B?STexlFK2f8&xU(L0wlwM%wKK1j%zK-44ZJaKK8hpyr2g!e0#zQwryN90E9wh
z4VoxPW&Oaf-kTZ=t6_2CNePbn3onr;V%cV}?+Y?H?nnX!&^W((u?^7heg=9=_$y?@
zJL7}WbJ-8ljit}ybLY~8f5v_#C=}R$^M(EmX<Pt7*HnMo;#w=JN@Fm!65O@YIKhR?
z!+@=YZqWuJj!C6iy5E}SNPPLu_=gK{UMB32>E=!PioWbRijor*XVU*5e5IKd*0)So
zMk%J5C-(g4oHn)3>L%*=;&e%8i7VO*z^Jfp&pj`KPliHblErbpHW7AC8)DV0#-3Ms
zF-2UNFDXYYj;`{mZ|=xA2bJ$q@k(aMuRkNisvcTWA2I*1MnHVgx`D*#`c_{xD$d28
zv!Zjb$@4%(uc!OIj7_|`r1$0%$beBj&kc8B>)X~qpnx9ZmIW%x?6DUgcPzVtIN(5X
z=^EkFhXrNyld5Xt&NT6s0{`9c5miSLHYU)#Ha=Lt?`cE@{4yaJd;!f9V?O!fp~kK%
zjzev|4PeBF%p&FFj0(m&wgl4_TYGL3+PqO!Bcxip!!vKAkt(%8cNPQc%<-c&=budP
zGbL5#YvOaHK9lHIW}d_<{<o^hwZzP|MTCzV(iV8Iur~-KQAhCrXuQ)fN#FwhArU#3
zQatw`{aX|I-9B0{h60fICIYh5^cw?6$DF7#e&$!&60KuRcha~A@EhB(u-We|$_Pk_
zaLUjFsa5z!ULd%$Q4aCVB$2S04<+;ziXy6Z@ZkI+vVXG18LLxUIRJz;Cf;5B?n*4;
zw5m25-UZFer1~G?k3Zrw*Cv*tPf{SZrY^oPKHh?AZGw>9(Dw3e#qqE2FvAtXS;L<7
zp?Z|OZowryHuvZ5b1nM?3;4J}@<fTs)6;1|80$AzUW>ytJKG*Uwf!64R1zraq<ju(
zg=LMIK5}X+J=5pmO{OXTD9QuqnED}MV46uC#VW~60j)YKFv-V*L`T0Ikifj;{-W}i
zr&SoVf}tZ5RPFaXu%>3#rLtp#9G$4!;}<Kk>kJDuavKcaS7f1|9;6AYG|)+riu42x
z;W~a~GE9GK)5IFG^>=%O6+Q8`o9|=W#ZYEP?Khfp%Jg`R<)ei;6LFtGw~qbm9k%TW
zQc^@RtjO)3Svc=)oX-PD|9F1W@xl`aJ@mmXOkAc6br~HcgVT*xRK3<7m3-tOFNp*D
zTy>UbgnZe>JT>8IfY46)LkG6`9rU1<AP*dh&(bj#I59U2hg0DCcxNPL!<&9-82+ea
z#H$cefTi<aA0`BOvU%;l0uwj@%}V+<%!<7i&-H<$$2Vvj^Y+Gxk$w25X$y?t4KXi0
zMis4lQQm6huPvn~){-tjB%yOb=w}VqtRy9RaJbK>bO0C$tbnK?3Razj&?4`o0Icwz
z7zupF(UuQtK|ae)yBN8a8!2OvF8*cQtjkqD4MuKp!LJN#p!brGLnbzde_J9JKs%~B
z5PVd$i*-~W0KSh??0LAKK92Oh;XgO0eUn%vMZI%;^M1*kxHRcVJG>c5iLkU(ut`#k
zfu|Lmj$b)E7Jq@KHIeFrvSxVCAox&6q0I5qs`8==kDc;dK@|9+H~KH*(qMMXihIRW
z>&xoaVI|+Fn(XGcw?$j-C&r`jR@}m}&gyYRN;!a&Sk*{C^O0c7<d@QlD(X{wQZgjr
zM3FRjWQJRJ$M_=+&@Ne}IEuX>F58n_*b)~F>l(W$@+xG!8a3tFur)`)#Gq%M!n6q#
z8vE>peQ(HFP4}5$lDI0e^!TK*OBQW1zO)F}UjE*@*!5gg?!|c!%stsiV{D~Um#G%?
z{C!}w(nbS(X!ta98w(w()|OJ!Eu`?ke4DSNyK)K{!o3{GRgbOh#wal7JuUXUq!gp_
zDmRT3@1c1n@cHC={xmN$Q9<gle#7~Yq2&=gp(I+$ED$>`DJ|whItubCjy_hL%i<-*
zKe822Tj36`&(n~C^Xo((r2%X2H$7>(K~^5tG#jCvuT{<=N`hln02+uZINy}_g6t+T
z2jH#2@3~5UKZCq<;c;AaPQt7?T-e?9*e2hZT)}^FIZc`eq$imL5cZJo(GUk(!dPm%
zbiv{z*mfSTjg9vRD5GZl7=N9t%cMXJNjNf@hIQnMF-Reu)~G6H7R;RLrr7zjE`V<w
z$;t=stSTP8ykDdtHrdCPBU#<UA6=l!AT<(&7!2at0PBmUBRFwA*$o$fR)`WOSY0RM
zn@K_7Hg@6Tt{$&VAg@qY{XLrT3$J0Tp*3bz;0s{woEMeB4Q5i?Mvo`iHBkOo4nZr-
zCubVm7#PsCM=F(`5d|ZZQCJ}=lb3s2Y0dyO7EI8~^Omce-Ir=v4PZF|^71`xC#Hfc
z#7i#~-<l7p^S)=xV;KlC2urDHYoVd-ej^xaY+h@eOW0wAGQjmtWqnZ{S&0LF$&_1J
zUm5-+f~a9=6V`x({)};m5~rOHN~HNX(`I1B8@i1d?rM?Xt7A_|4BV|8k2#$*bqbc)
z!#ZMvJl7P4;S>MV_i?5b4VrTlX_N^)FfJ9Wxg77}i67@(L8n;EJd3O&i(lETbbCXo
zQSR^F)YFX@o&36fm2r|JFwu3x)DYOGCzZE<+jD;@{OAZ_2~b*u=t-$<?wmAT>x$LX
zMWQk*b=PTE(*0S%3ZY_V_(C8nw@U3x`w@`9B5jflven{&U$~15frYjeKF@omZZqM9
zv;A}ilC!yOOnBT8-gjjG-TF*~nX8OoanIwWrR^&CCRa#r!3Ql2tt1)51TzEJ#CH_D
zdF9fqj0T3|crDh3ND{)>@IA<cVsAibA_<#&2)d|8ATm~{B<%7#Sr^t3#b`|59}-i;
z{BZ$q+FLCQl;{)3S;pa+W%yGp5g=Bz(c?s5C8Q;h>f>!rRR4xNKi{2TL|E=`DwARw
zdkD%2z$~RYG;mqMsURhsWLRfR<3v9@jEOHa4l1rs+o?DvGd54WKLhciORsVLu#?_7
zeR4qj64#LHRZkj+kY3yQ%zyZ)|Gn_fCnh6<ozWp!KB8RvR#+$gQ;#O#<>-kSWTD`f
zUkzpgMNhhr991-bV#1qghP9BxE~x-P8HI1a>VETNPAGeaF8}<NJltElq%MEr+qTCf
z`^gg_nwA?5*!W7jR+H9_s&`xa`*wzmZOJ2AhxPftfCaorHVlh&fD9)4=@xJ@?!9aA
z<sqzv9P$b10{_bB`bO@qXTD}>U3eyY765tH;{bVmWk-etGZztu#`v58%%ie46L>2+
zfQ8yp^ei_x*WLb`hAVp_XXSF)AYa<lewhqTmELkUZ$mEK2JQNH!ut;UTO(aNjD47L
zpMSh0ivfP9f83Jwdbr{rNw13F7eX~>#1UAUc@J38z~&P`AvOYm3g23bUOEQ8vWo{1
zt*o<l;a8MlhljPeJ>+~*jN;iQL;O1?VEjOhXkGZYdiB;wqGH@<a}_b>{Y!U+9@nA_
ztv4@Gg&M5{BX%>X56+e{#>rLP$_sDr?v8AQ1d!^#az3v8XKdrMIO6xT<&r?v2HGBO
z&`k61IxjGQo&L8_!LqW=cVk=3&zCl~YVfo%In6^sH=623SsNdtP?_&ZK3xSY0PcPL
zDDXyVDf-0v>!gCjihXdRg`r1~JU9##3!b<t*D2xq;LUSIQwWW3>r1_+A#$3UevIV4
z8^j3rV&tNR=7N|w#>mIvKA|7N#KYc~YEgar!~IM-BC^LKwObZlGq08KmYfG09R1!5
zfCfxFV4MbVTMxtpj~ja*RnCA#P9j3~9a7&<`Q8>C#3W{GePr`U22rnPlx~ARb%-1{
zVN!;RJa1Bz(t8AUZffbiJS7gn=t9ro!6h5*B2qGcZATR$flZSkoBlDr((v{JFT&~A
zUHQndZG>)60DGx3^jNI|zomhq&61I1h5T^|rOxC94g-f%GxUbN*Y$P;k&<9L<wt2o
zz1`M0u6#x;ZrmAYeW2E0v?*$)=LLybjj2=o&VvL=@XDa^GKwqr&pzuSS{n?EN_^H8
z8;Q$^u9GB%hrqg>4+48rs{jJvgr{75Gf`Y;(Axnzg+!)uzQ+t5_}gsugj}8NHZ#U&
z;1fJ~8(6q_1FMfU7#sLdY6u%AwNSA!e_*NvW)YJFyA6}f5JJd>A%ChokFm2kj)3B?
zG8zm;gEY<6ufY9x97sof>IT~gCRpvm+qPSW)nYvUhG~Z)C$g;!#et5M?YBWI>iFE8
zR=a+1akV+#EPb9^LkUtR65SD^fIQRqO3#z8p@xdy2$RU#6=8-;^cSf3>p{`sS<2eT
zvhghD*cZ^vl59f>06%~t#WG+mPJt);i8s}W4UZ7aN@&^hoX=T8xE8AQiqo2&jw8;k
zx=cJ)UH{VFR@6)z>qJhMHq04NGi*=N;IF3e39OvXNM8sll;OfjkfT-3nIjAMbBe$i
zT~yiSJ4K<)9g5p2@@QUf>B_Mb<!7uPmb038n>Durp@gJ(F*nZkPJB}HIwXIOvkHvm
z%y(J62yX!^R8ffif&IQ=v29b`^!fXmG5+qiQPf<2q&X54XQK-3H5#=8Oz28a^uQHB
z@!NiUA>+AC`=hS{7gCr+y`Rs=bOWj{$u3)*gRBG0d4gR#^ZJB}Es4?bLMJux!<%}<
zzZTgU1;4hB9b?}nz(Mf<egXI?H(&Sc5boV)7p&^G>{mFFmOYIwduck;b5AKZ<#9cE
zd!O1(fY#9jX?WV1YSjOj+yQzQg{K{V!$%j1qBZ<Q4yO<|hGc?jffxAD2*1%a6*hW4
z@7!1HYT{$vlq}M{DpqCX&4gPV_!{m;%wTjp$9ZDOBuI)(A|lWspGPjop3(m}OC|;P
zqHqxqMtyV}@r>$eSi)IX;)mw&?9;r<%U3Pm6{*5d@A0@-YaTNdV<`1Q74cUve=amA
zsS>RXXASM^J)}DbfBEZ8Es*64Gyrvj+lT|sQ;kN{A;&B5w1HD}K7@Rs_xX>HX~mm`
z<qJ}bM@d>Omw#V6gbxpRQ-GS;q};+0^ryptAami(U>~>sB2}q3sw3jNb9n)x;+D@@
zmqzrPCLK63zsgO#RU(4MQ$ae86NBeEeuvKXIL<D)O!RzB<NWkQcvq@c%lAZjfQ4ND
zM30}NX*|<9&Y6z6!~r+$UhF50pMIy*Pnl(n0<vuM4WO*ceInrE+&@5CuRUBjH3crs
z0^&1^6MCOlkU^aJB>_Ux;a5GnVZ^ztSa_+pRsgmq?$5r&;6m!If*lod#a_->9$x=L
z$3_s&eItCimCm&TxOM<I)}K5+NHuJ!2YCt`Y-to?m$KxKP?X1{_S1J+%&tDJl`Eft
z#3|9q%+wXemCiZb;{)fh$c8Zj9*ELyCIfL2SkGlPhIa&-OkR-R?0v;-e}l|gZrXXv
z_Rfv>;kKj6Zp%WZ)=uQRI8M+*&DIVzt0v6fG8~@9;u(kv$M$;otKoIgiH6F`Hztx&
z5GQrv5Fluc06+WjSUZ-%M77Q$Ya`ei50vC~*Hx$IZo}nvj^Rp5{N+v1Q-F1Q9f*wU
zTT@8T)%r6ia#^d*ep^Dk5SD}F0sP$YkbfAA9T(7B55BYkyX#&;hHOyrf7NWBG7j;k
z-52Y)q<K#*iV$8QVcq?E11~4)XuJ?(egyKAP<oRoSDF=~eKh{DGdsbl9%u=4_U4tY
zmkFGXf*W(01^z_3$OIc{#HYu+6>7cT6hgfet;pdweDa2Pg_JtXFmjLfo7nd!4Rk@T
z$3jtfH9e_EQ$wjU0MHuV_O`)@WDtCp1^fmYr_U@c{dRpzCLZq7Owt=!S`FR-@ko(R
z#LZ$|TY;noSkQ2Zukq=fxqQ3bLK5*KXwedLb0T$O*_N8BTTt}ferb!4bJsDEn`Ok2
zAl%iOp7cdrjyUg_k20d$xaz$;&K~jn$7O?fXCW@oCvX+X`~IMse7kVW@bidoFZm4y
z6vK=JekmFD(k_8rnq(ZT@wuVxg1O=folN30>S^tcJ?_~H4&*08hKM|-Ep9ZK*qz?3
zz4T>@q!R<?T2Q?-khr(t7t-05kYbA>Q+T98`*9HR7tF)|c+5VBNR(0}CynP^CBdqs
zA%22_bcQ?L9DTgJxtm@7;(4&<<)j2ST_5%dpJWmSpwN$4#6JMPXDpWC^wN?<8VV*J
z4Ra`Y-Gu3k+?EEe6%B<&8S=un#C#tzenvWGt!|=B{Q=xh?HAg}&aPH73BjGm`g_^S
zUtPRQdgme6k<!Zd^*G<QpY(dvE?G$QGiLXh;DoZ`7P(-0(jU8csE>OQv=q!}eSYTk
zr#itnJ+I72a(QQ2p82<;%rE<X2G#DI5fgLm{@}^8v?S5&Czgw1gGcU>QmTqfh=*D*
zuxW(6CSrw$WWTZcG1BpS&=Jnp*NjQLYju#A*E0iuh+s193G5spx@``Aw%b;(TL7hx
zL?I>4Lvibw2tshp)ogfsudDa1o_elSl)VHYUOzBvnv$RQYNPO6<56Cm;`H`WZJ20e
zNeM8ZD2WP5oo=3&2UMk|)ql?1+13d}OZolXAov*Elk|uQ4Qm>?n`f3mJE{y@2%;)(
zCAq%HOnJ!3SI>MXag=oC-ZY=ob2+{x2V>t~avu{Yc1FGVj)TZPx&HOza>TF9@P1pc
zQp1na$D<SOYQbRj*e!TM7B_#?aepuS<lOtDNEex6$n6LPFM(AiDbIejB4hTwyIp;}
z=-mA5yzJ?wRm5$D6F6IE>{2?y`PXf<>3O&N9vvPS$O<aI_#nCAMELaFUJFi(x`95X
zvc==jFX3rTtS;*^XY&ud`HC?J+W(u5{Z$ZstaThoNyKCzM<5M+(gZfNMGu*6bd{#S
z;U&i;A;P4FCk{1FQh~#F5O<g)_D`3Zf(V2sgK}NQ(q9n`ZVZIZsx+Rk5Q4E0;mssA
zmE0)}I49~IB+p>I=o;LKkUzBLDTiL+vhL`ff;yQhOEA9KP+iLHC1B?73ad))r^+F;
z>@hq!;3ZRSl*1PRbtxzL$<aWl$LE4ykwhDKcMC>-{_$!5!Q1P+Yp<U6j&Is%P_ZJ(
z=oYJ|zlfc=#8X_bwLVw1$%Oo^)FEH7Y073ypxEC1xaAA7#`Sq!$&onhTXWg|od6CC
z`Rt#NqASs!k5md|c;iqiuHQfL^x<f1D@8jK0(x$0guI1#p~%kEOC4NM9y=2K+H|Jp
zrZg8f+{U-9j_3Q=8MZ+VLG|aqVjc+FX5UGjRZ9(B^J)LcHe`Cj%~o1<0Svz~y{G;x
zL#VzB0RjU&P4OW^^I?$$Qay!uxHz0E_Z+={>rkvBdXwIM-%t!u&jP<D^p6N48A3MK
zUEyhMbWbCxa=Kr~nivmH#-HJ%+1~evtaMYrF)_(bWBAf5ah$WBMi;iha*DRRAT&K$
zC2nM2QV%_E=HiP-6rPUaVcH^Ta{qu?eyWt-K?!#_b%CLM4XHq>iY>y@pRv~eJnxZ3
zW`no<Blu5c#apvpE?k?K74PNG9Y=S%kQg{gs}xYy4KAZ^gDWH;q>xc!)WcE<+QEb6
zx=O3&2X1TB?6z<rwvLP5;dRuTm=`7rC9|s7T-PBde!jZnnVxB;V+bg$bYSFSkpvrq
z<KG{(EqcMzTQ#$J?0;TyX13AY*b{%w%ojVhk^rl%q<)RCk<V!OKJdc$ctM|ROh8n>
zR)_(Qj+H-R9J*<QN6!s-ALW=y%RQ(P7C9x3s|VnYvie^b)b49{s>VtpU6>&iL`|&y
zY5HKW>lWNWOBvxpGdRYwVCE1uzAhU%Q1DdDof?8(k!<oBQ`;i2+~|1|W+~U7YEgMO
zN|=%N(MeSQ`5RSc3Nua}=BEhM4l#T8*(KB(=bG|3C770t86RN`Iyff%%C)(-aBUyg
z-)D|4;IW$7Hb{LMcWd^T&PeRcX3R;tJ{9>IZ{xJO6_?^u|E3|bUAjc(m?5$F*)>sy
zB(pG^u~d^1&<Jqtak7Bb;+yHh(?Tw}^8@gnFJzpd!utW3=l#RwTv(J;U+x*?d4uvF
znN6s?tfBgcK(r{?e$Oa#aemMuc!C7uSl1>+<k`U6C~gQB76Quv^c<}5SbMIn;5kW;
z!X}s(oQzBgY+k2flmPzvcj@Oxa2+#p7F7574O3R4A#K;Q;rb|dmV{?e_v2vGTxm0N
z{803J1EV(mIn`P|eSj7GG~ci(YV98qq3;3qbo`yVX6!JJ;Jgf=&r!ShZa*wFDv!Cb
z>km;MNs`#f+I?psV<Q7NF{TrAGz9%1(Z%MhoqO?qIdL91yGfzPh*<a7Vqm<RC5-Eq
z*>VJ_(p-N4?PMN73By=0eL=<pi_i3oyqdHx99<XVooisn`oYI}uYKzMl=Ej8M`V?}
zj8{vW+EANnWF+7TYOnHIwsaEZhE@H$v~a<KWejT~-Hrt1Mp<S|v3SjBb#G44D{S=~
z#-DAxEI?OFNu7U8lYEE3_4;D37g1eOP$^l^seQobq{T8aIT1`2%$MTie{uFm#xQ0G
z>`70^x`-1PTiqpB&Z}-$xgYHzn9O20<{ud=2rmGq$>Wr*!@4-|bO-^0z(cxhLjswl
zNDEiBeqfZY9F^2!hrp%x(TTT3aT>>oA5LEE09Rs<4Ls4J2gix_;$18^iPuCigD&3G
z;hjf{A_E~xZybtfg;r-IvFVrYe9s*D(>A~Av8xl$VC*nX9i~YKTQeZZbx-d@0vY?O
z^2WW8i7OL+Q)4;p6=NE5+!xxM#m%8035DpLe@vlt5C(hP3jEffIhsluE@YQ^4=pCo
z3|_+hCH2<ZcNena<u)`1mApA)Frr<#DafXR_}zT9*D@q}(;IY}>7ogp1oF-^ac;g7
zGO@>}F<>qP*WQg;!PBgyW@^iQyQNyyO7c8{*%ri9R;$i~*+03UenQTY#-jaAyu;VX
z<(ghEI3M$rC0$a9fWN}lIu%<a-F7OjL_>F&*Jse9pLX)kP~U<e<bEU@pGPN1;b{$<
z)RE7|#F5F{sC)0)9$Idw+OOOH&KY3-r}x1U3~Uzxo4M5RzVV+|UJkt??E$(q!*e+A
z_&Q~=s>1zW0#|8sq%=-0hsXsY82Inz0N`GrIy;SVBwwVsVk^R_)ihKG-|{34yhP;)
z<@M+@Hw6SUG%6x2O|J+yxwh?}>tlx+N{uRCZ~_dJ^pR>Z@npU1PmqL6T7f*X<Ch+6
zz*S-a79zZKdtd_npK5;r*`n7j-pvtnEp1xKT5^e&6b`uMsT9MJ8ELL7e@UW$+zJ{d
zhgOygzjST6`1vQC<YF<`7yFt>BbH5ezLo>8@73zX_PvRNOW#%!guP5Pp5mN?TiT1G
zw!d^Mr<G-tR>Y$4^xr*BY|ki(vcqlr{ng<WTKT*BWFPA6<XN$fMU7WAocK&Gj1Bmh
zx9fSWEJv|AwATs>Lor+J$kYj*4A7ga_ZBzcib_qmArr?POZb=Ol{p*8(2u+hYsNm?
zX!&>6yM#{!<R~2hG#dVdH`KCjIjXr5LrCLkTwX}zmt-$2dPvl>*yVs|F#!G=EIjv^
zDeqG0EWVErI=0=;d_)?@3#|q?E0k09`^b^qgmMpa!wdNFq80QoiE~#I8`!qS4y9~-
z{0{3^RJR;X@{$z@IJ|I;*#YWLsmDK7WB=RR$Ja6#Yv{LDcE&pPhPI_&EeaQ}CsyA@
zF$xny7oIYZ)X^^k<h}joEJ%fetxZ3ET$z9I4zfLVkzAxa0O5DmWBMy;zCY2nDIi1m
z9XHa@)3mJjk$poeW0J6Q(!PwnGiu%(h)=Pmuo=B~SSekmawxd{v2~QH(kVMNCeoGd
zh``>|j?gD9Ez^q_G3ZOCuYJEBU~t>*e)zyE+_-Fn%YB-NG-uzK$iMG=A{nfJ5+jZz
zttuXlm&~_e<0tb2Sj<|lX1cnCQ<}cM4l;&eCK&Os4wQ3FU5gx!V29wP8NQ=~QvjHI
zu*CJ_ITT~IaFigCV9=j7v<bOJ4l;eYzX5)7kQg^#Al`R+h|b<8&LZ%DoBI%@D|R$i
z6>gGzlsJ>%xju*i%yp6zT=;lBn75dEQp21B>Zg?q>JRvNhxfRE`YyqCMqt6vb7&9~
zWOgkNx*qs=a=!MoFVwR?^;G(nsfT_AlX)`G>z{huRIvmmp<Av?-BpKN5!qWD!^+gs
z@)KzpTJRa!lY6@b+#{XJ^COm+crlT~IAcMkNGv3-#r~{fyAV2=8zFxvtn)=RmBF!1
z7Y&+BWNFFcSj40GS~5Tbo8+AmxbE-bmY~YxoLHk>j^9OSJAJ**fc8F%x@r=@qVl$Z
zGSIP+d~+*1QC^a^m(C3MnOcCdnr4xb^9Z|#y-5!bK1R>dECpV_oUBt|g8MK`=G@T&
zED{%13M^Zq@l3M=_=sBL$2ALlanrGsi8wy>hZK`o{A(Esw1bN>f-Lr|qWu!H!%l2)
z^D#F0@<EVHHskcnzF*%}*y5rdZh20pbEX>w>5VgRVen(OJ!XQZy>k7*?9n(B0jsQQ
zgP2cJjKdv1yQ?%$bwmcPPr~0|KH2iL4yW7FEhke$-0$ID&(XVvU?!SMEO|PA?ZXpq
zymu7Sgp1|juTa5F{f6daVE=iQmkEEu{?rwCMjo$Pt#zxh@27Zr-KNi*)Zm!&jZG_V
z1I)2-pAzA}$7WrU+X5LH5j)-@RcaZJ@H;<)LiZ|U^h83((L>8^w^vZ*d-XE08<F8r
z**|h+VBWt0OqET5?wJA~&8s`vs44({;x)kFsKS30=>~S-X8^Yj1VAJ?_MyD|7E?ld
zvg!{0!0tg6*8hvCxBiQ|kKTkSW$2+na_CM4X&AbsL53C)q(i!4knWa}?ohgs?x8!R
zrD4b+m-~74ySx9u{PKCtd0*$M^FskefBdM>1e$0A)Q^21@b8Zh0PCUW>GID?QRj9l
zSo=$7A=lYYJNu1`AWsDoVAewHd0bhN<UmK=^Qhmk#`AArglT9Tcy$zXt=d6vM&lZ*
zaPY;?++e|Af2*?^;l6D0Vm-4ROHjsC0p{S^w1Xyooq{+_N$nifsE!Og7HQGo?`L8y
zagAjG`l-a(BRA#r4`i;6rdUD5pKv$Y)@GjMKX8i9SaZ(~p*qE0VVzd*z%vc>@=q6*
zEJsH^SBpX#9@fOAZDB}uvo?ZauT2gtM=h00d>t0Ock%lVq0#q;*PTB|R|;)XRXlRl
zwnsf>ZL(ea_o#CS0b*eglYJK`7S64Tv=y>Qud$3Kbk-u-*s8e<7(hYvglCpK;SL>m
zf#ewkihD!BVZ85hWxJVkUddlILw^ohBD=hvWTyVV@%faO_`EXTosZN8;ATTv8{W`9
zo+{pkab_ij&AL;j@%Vn%wKt9(^NPT^dI3+qJ#;WgPx(wzGAfsY0nNfQtKl|pSjZ3*
zA+HP&l)DdmA5CZ6r1=(#H`2hpKz`~iAHW*}Rz7_%HcasUk!VUXttIS;z}Nd@$q0Jl
zJZnsbW9N{okG1`S)_y>`I-nQeQ@z`~J(h*6^L0pd`z}ePUJQvRgCpakBI6*FqChi0
zZTvZPU|NyioccoJ$ybv%^tl$<2@TQ!`V(4eN#S+-X0u*bAVEt_%SdVo4Z+L3(-8P@
zz|r3Dy);Fqz_N5qis*4_kK8}`cH0E0HKzCt=;#l=P)g3om6uMtWGsu29f9UG$02Bt
zO-pF+m*>LD&o`&G%9i$0ry;Pg@xBNC*<3H$s%EJq>8UNniEj;Ss#zvmOfO6pk#?Xx
z;tpG?7l|F<HyoF`BR6e9zodAFyN|n8yB%G-X?gm?G%yvPnn=|vN8FtRQX|t;>f5G0
z*dDatq=AqHTDd)6n7znc98t`c7yA7+9v<6@JZWV$pOTeR(z3KJ(;ft_;P%3~2(4D{
zR0!cB632naZLGR+iU!?ZMGh0zSVGZw;Px8}{*Rer9t%*cGf*)`P(-}Y(846musdU&
z?Hz2r+0Os5+~#Y?7dPFnRe?Az0{+^+mf@iitTOXbgb9FO7N92WoG~onVu7PyC+RdR
zHOb8)`Ysm3Z9<qAuA@i2;{7NauehoE_q?_k2~${NC%Rj}lDxS)iE6Wr!2%}$UY&B?
zj|ARrztZrY&Mx>p^F+*ci<8=WAO2*7@R$QW0jL2=(<Cfw_rp88ymN^YnxeXTI4^BY
z#<zb{NtEo;`aiAj(MgHlE!@pI!jt;VTiC0s+(!KGVxaFsaHro$VvBazWk74QQtknu
z==!Xum_O#buh+G1aiL9oYR1a!byc@qy*9ZIyvN+~F_cc0<C(mFLf-YeQ{3G&;n|wL
zGiQbh>%pW?@cqKd_wi+8!mQ|EsCug=M2qOnuR%|FgUT{NQ@#0pgk3Zgp8DKsXoDRq
zl-y}j3{q`U!%~`H;qQMw$Ks$dekO#17>M=gB_wQ;MJaa4S>>i)@T8hdLw#6Np|}ED
z*u+grrzJ`#vxkdJo}ZZ~_n9q7TYW7IrFD4mPPcxgN<nTRpdz@YH!`HW<z=^lk))uD
z9E*Ha8nsH?8#6L`2Sc{s_~ISF1C?aj>$+$6<C=PWM_5+kKOJ8k6|h`r$VggcT<7&i
zn6vKhWRRFg=fgnJa~zH()HhT<^0@g2-VA!Y6+@9Nmdjw&K@UxT;04GEL+ACMq1j`J
z_=WD-zj1{o01*W%=TS+=J#dr_$2r;0cEw@YAOMe49zl+fhpC}%c9aOFz6^~n%$PI5
zKRpD=Mw^<r&HM#YMENA!ZCiD5*+WHyIVpj~XLOtOCRKNZfDfWynT8tvU~yg!RP*<;
z*H?>gh=H591|a}jG+=K6C#FH@i+<QI<nf6&1`rLP*!~8R!S}QnEDj`H$i3?!i5P>;
zf-7Ghr*wN|ViG*w$?DKn5-?_LBA&aC)y5X>@}D$U1v0LqjR4#AlHMJflLvM4!nn9M
z${FQ9sw*gV(H97pj%Ph>jxl{jP0mz48G;T@#hNU=3HQWef`$5*EFVp2ys-|&?ASfd
z-cVCPZ_5&pWEX_1o-B<VAIxDAqpguVUcg1OaV@o`5SYf23e8RZO#)hWOVN<|MxT&P
zo&3Y-95WcpCd)C5S@tKrvl)9g5ZxRrW@U>!5C|QZ8pY6IimK5IeCMfkjRAw;A66uK
zdqw-XkCxWSb{IP=Z=@584@|u<o!J7Mp6$@Fm;PV!=XoHc?!$S<VzJYM40y!MQ}>s{
z>b&q59B1kr-;E4%M>3|a1fQrg>a^+IYQ5*B_Qp-Io&HH#a?JI)tO=Cy8CBo42Df{|
zj~ubMP$2x>q~&#&GUU)y@mBf>O$}{Hg(_=<heLiu<m+64KGRao4Q&IkD0U*^-L$ps
zT6M-cdM+dRGp*=GR-@#n!53l4F>;%HW@d?jJ7w!X564%8=b0ojp7Sga6C0)?`mg`%
zBYheF(*XBiu!0TEpQOif)%8R~_P8Of|4d}9N!_`WLV@_H&~;@vt07sy&6~pqJKq>k
zTFk_I&;sqJ)A#kOZCv>i3GY}pr~mgx<WcnB3uJxUlPYshS2W+_%hG*8TNkJNDhQb%
zklP}RdHbuB=%4(M`Q`({c#Aeb{h7tP7mkD}X$y-N-Sv!&@8wT$_C_y+Ksf5Y7^I@z
zb*R{GNtr*vSC`-O5DOkL?>8h{7SC^I0B`~y_+`J3dpku_6voLNF+<<;z34GwpRk0a
z`#)qHkg(b=YznS_3(1#*Bp?V48V#EB*M}*j(A*j(zz($K7%tjuQl2qOJetN(pKKX&
zu-Rg)h2H}1<wR)@V}ZKKNOJkfzea=47a!>BG2U5#Bd;MM2~QFVFTxZBTPlYDESQ{b
z2cJkGC`R?DQO@FDro2;4vAP7Bzr9}|kGSndC$8^ReYlPX=}s)i^=44NX@w|q5og-z
zOWuNCkF8q1yLR?OH`6YhOpl{iH!{SZEk|e?;1qUllgg#<hYKa_*4L-e*4tHZq~;3B
zn$Q>_qK-Ik^3Vj2tUbc55UJ5(U_3ODpa`a@F$_OPmjE1NHu{nNA4q}BGeNG0qqTrP
zhfq<Pn4x@tK{}yap0dK3sy8+!<IKi!AQ1VakkFN}4Jsk(^wJ1nv8G_jz{KMRU}LHp
z#HApuhh0=bQ>vfV{hLmc@(Xo9TeR+FGyO-iuvSKg_OHeaCXWOD&GhR&<&Kc~Q3}SH
zk<7+{75=pr(5L#ZYaf6e)7Ibm5H^?@%(>@X{WvMHA&xfD^~KTCX12k0HyBjkC*X%B
zifsn=xsz`W4_30(#Cb)}Zq3#>{corU6^4scwOo2QPO-{=PG>6jj|8NuE_@fnaoHs^
z{!HNeU*m+3<=IMUygU*0M<cH~fD3$vBcZ85t|4A+;ALh@xGYO(HvuaVug5aZlZox=
zUKqkw1g2#u02&ZJ|HeGBcp=$k6^qmG-=98OkBg26_{d4I#AB%uQBZO9Muqs%%-sr?
zL$NQx4em%}6b!P9@m~C-=UO8A2L>ly1Lzy4T4dD;<B5bGKD}cclnfR^Tld%XETP*0
zA2o+Z5I7Y^>%V+n@~YtT|8xNeVm+cDKKvu6Us{l-m3PH!w$RIt{FZ0eFL;^gR7}ah
z#6ay@okZ0;9;z+#Hn8hS8;n&zDdXgnaJY@SGkrT2*_q4N+Ok5>9Qkn<b5ptHz$Xgu
zE{t2*k?Rz-%@}A^gVCJ{lw^>Q+x{EzIpZd1eV0!hsf@H0MBMYo=9-(Fc`bj<J@s+4
z7warvB4ae)BC?I2p-$m#mpvLC`0?-LZ(~qdQ=WfbLIL4sc=;`eW1rp)_!<M8%)2@4
zX}>#`=M{wWr)cd&ijn{yoB*^bPl~m4y~0W{r6kB<f9MpwEnKGk6Ut}kO9M5cSD8f$
z(u`>nrtN6n%qnjr_cvy~T!0t_c4*2-KpD1eOrRu3McgyUF;Zw&v&lQ#M@Z=19*=T*
zn}oX9xgkC@FU!#{UftF>xF}Po?5o@;hNbx8u;y1oof@D(T7z`|%X->kT7I=5-tO7Q
zYIglc7-J3ZQ!_m3wzb>%(Q`Rx%pfnf`A^$h1?LXqY!2=rKQ!`e31I)Iy9^<`%BSt#
zyfu$zlSJy!qLLR}C)GsXF(Ys{Ax&eQ1^VcGC1yS>ipIPP7sW_~PI${38vu$)w{gB{
z7w~RkZ{(g`nzp4$UuL+_>`3{VRcbvuj%n>iEDk{?ZQ5dRw((fC-d8GluGQqwI)h@)
z5W{btrW+*45xCS+RvU)68T4L*vaEnQrcVphTYCxKC*J7sv3fA9rTH}tP3L~}NnA?t
z<>%64yauK#eNrgNUqCky1)Yw?_ydEk{O5e5{>_TEzrbp@@F<D-Z{yb`q8KA>>$5Gr
z_G_r<)U~*O!r%2S!F#&AhtHW}X`P8T%mPj2JlNY#RIDbARRAL3B3}1DeFr*OPZv|6
z0=0KK+zk`i5X;X;7q9-ni+|8)B{hEcgssKg%J`@4^?5LS<%G9`GGn8UFg*CHQhB6q
zIp8}JmUn?aeIL3pZko~h)hy`m*#&Luo-9=PdZe4S;{Wxrx8>Lm-Iw<Da`uAJRG=io
zm)nyBW&OI>c-ON#{He{UUphG`$~mKGP$o#1=7AIrECuuklLb^no43n(M#J;(#!sa8
z%dapDpk&J)R{6e9tBDGI#-mH!HhA2n#Qb`Ye>~7k*o|+Mqj?6$<s0ybitiea-zm%6
zB_9mVOmC%Zf1eb(6-r_EiCz3poa^6Ox>+vS|8M>3Uwz4!QcL~((NqH5d%A(t=4~Mc
z6v54emt_Hnd)=c8{WLg<f-gnJGx*9rm>5FK0{nQEvXSj*UPpd3%qJm!jEZu-qhn4e
z9eRaaA+HTZym=|;MEim_E6E~#+qm9t>B-JCUgq`^fIo=1B~JNC36I@g1@8Qpop$^X
zQphNG4fEZXWgPVD6XiOyq74+z?E`?^qzW9W+LRT@s!Ynh1R9lAJB8GQ8q$~u^`N1J
zxr*`v#QGp3l%gufrET`7TKx{K>3Qksne++W05eb0G(q$=8q8`L4?ch=q55Z3*U9J5
zkQ58KJNk#Z)yd;3I8vBZ=)>SqJ}W=J-OU!VojH1LwX>@TcOJEKM(@twFlH=cFhi>)
z_ArC34hLjgcjVr5v_Nqf86F)DVK8J)Rm`xhm$`!1bzPW-gJT2tEW(Ke84n0`hx<VY
zA3iO@ihaEp@a|tJekG_#!_4v8;WYyoYcmOh^Y{G0-xpGbQ%^`A;Q5MNt*JrLg$P9f
zp>JQRD*BDX0Tx4*vXcEE@)vzKj*S|OkBW)>dS8u3QAN#<w3mUPuhF<Y58<xV?0n5)
zsZbem*InDUs~MCVPWMI+Yyf&4l_S0yoz1f*lVbGzkh%!|$I#}=QIH~9>I`>TorilM
z52O^af7f|wQd)HDq&g)AXCi7CQeGM_V#9cUwG6AK4h(i*!FLlBem!FuF+TRC?ji?~
z;+iw|@Za);a(w%k3LZf=n_+>adD`#XpXv{C_mfCroD%gX8NMBLE%Hut4uP98&QJ*M
zPW*|+JOy$8c7bC`d6XKqEr1JL+3OUxX;bmR3!>pPBMieElh9&PkE6#U7k9-p!EEuZ
zVTLnM1BcI}>j*g-iO1yM#qWd8CujI0%H~@?ei(@Ed84m8oPEInkF=h4*u)~L2%#xP
z1k-L-F|>7*36Ae)z*y7dQ-tswc1KX@AQj5FziO^&So(9a3=w$vtx_E=CEoTd8wSa?
zEXW@Tjk5W@IN>7N#_O+YR@Ir?2mYN7lLQ8k0r5QfH+mAk)#F6FNdWgb`<UOl*+FrC
z+a+TXReuKy3|_q)r_jVNHIL~3_}j5+s)0wAfCccUUP|wY{?;(na?mzD7#LhVY)l}P
zL1i5~5xgK5gEm;b?t39J;*Gp-Az+~RB&oaUxfQZx%5Twa4!Avsw|JNgWt|gjSbsfQ
zq}>RL<GUxyM&CfD1ENfcLcYxkZ;L$kWuSV4Q1)rIf04?b7^Kl9bW39jvYcXoyat7U
zfVuI8FHtAw$QxC(4b6An-@A?QKSiq>tq<uEn42?i)B4^#Ue5CRfxG+T=Rt*k0^raw
zFN>3MOhoMpVCd=Pe9^cMlBI#t5A0EPo}Yfso;>{kP7iPFxqQYbMq{P8*;-c%v1aff
z4jM96I_{*>8fceVhTB)xv4=@vaq`Fg&hXepf45-r%spv?z~{1%b~+XOAaXKoHc|Xf
zZ-TI!LeDX`#V?@ctIzc}CT_%HFHT%f&9n)+c%<A{Qbn~BQ%t$Cp3;|iJ~gSHQH|uE
z&z6<X0N!hJ9FMmn*PRPZ;0cKjxgk!y)K4RCYw_r^<l9aL-s3}{p!-qEoyy$mZOmc4
zDjpdI=eh^85gNYc%p*ZY7wP9)82&EZDFQJp%dnRJ+By3ybT{y4edkB>$P%fK($4XJ
zulg72)~+|+*hzd~!v$kyyN<r<{=un}#sUWMIgokJv4M!)vz8bqW!L$MAl34@jK7wT
zQj<F*#BIsea8<l^H&{YGP?)^jrfk1yD^Hn_7Jsu#3n~U?s0n6YAA%3LD^k_D$%0?y
zSuD_~6lkt`{G-JVdYAhfsJ~!2DF1usmLz0EbB*^Z^7{SyTzmcG2FAbxq}tOK9BC`9
z!o2ctzn_1Y-esjb);G^ng_^WC>I9lHdb{XlDLHg8^~Ia!o`vp}O?x#MtAUT^qgDm4
zcn(nG%u4v9f7cu_DJrGbt>`YYOR%W9@@0CcAXbjesK}gPgCJWQ8Cc9*g-oRL+T~K8
z&%nE=qqf9P)cd3R*s#!dg>lP4eNUIQnKe<nqqkW6*FS2&73SDybAMioh5_qO8`gR8
z!>Qh3a*l@m&zX|0+NhVd!{2i29sdrLxllrlD$57j`pp5s;P_|&XI#HwAI5orJ0|$W
z!(doAfIqx7)*a2A>*GZCVkzSeV`4BVrbK6WZ81g7S=VP1#-8ae4aKpWd@a!TqVw?J
zMp?YbJKxtO$3Jz*<yz1VUY$}pe(_7JYn9Aq-Hr%w%qc6s2AO!~@F-lhOl7jKfZ8wQ
zNz&1Kq6z^7Bb>;>Lo+aTAN)_q&8l3H1ymph9Do-t`~&bQg5e;}8>&9csea^rtk0kx
z+Eq`b<o}>HK7se>^WoTN;eZ_5gewX2<;&upueV77T+n!<)KT`MLy?0bcfga>Z=fWD
z-UJD-IQao=Pu;&pQgqkLf{>);otELog|!G4Gp2v?=cc%Kf09e+zr8C6NlZvZ#n^&{
z*?#9iXRhgSLpC)l{K8T<_l8l2M#M8hlvSX`oNit%{LH>XVXy=}#S-dh1l@q#wp0ai
zP=C$F&s6h>hDv)5+75c*_5<UQ<FSz~a{%t$?&+pX1}voZQTPkdn-^ujySx2#*e6B7
zINjK}OCMO;v@79vaS~se`j>Zu&|`NWK+7eG4@}!^=}fU@`vgg6wq{%jG_PTlqPEtu
zgY+x3Z_b4*Ll3+atTdX~MT-EzAp@A*uk2|5(pqbe)S__+JkI;K_IP#5=Jd?y2`g^K
zh?|t=EK@~Gpf@y*^t@KXm0^G?$fUe%LuSpoK&G@CfYh#2zHm)=_zQt1I&^Kd$R0}q
z9r+Sc&=VO2y^Nwl!8iMxhTh0X#K(<e^T_)TN@{ivSe0)FWTJPhu)r$E8cqO4-}Q7~
ztZQLEqVPSo6lD_bgW@CJRl9mBZG|+>Bzu9UjT{-_HB=&=3FW$j^${Uhc3FJ7N9a&h
zn+hFKH=`Ld8|{XW;JyEXSVh$eNwQWTs5QiDkrV$V-%`ChBo(4{4}y|}nFH9WbiaQK
zGGr*g(>u%2xGl<}BLsOWevVb(QPtZ{ls9h%m;%@UPE}-UUOwgF93Ci=9Pz<j*Bo}F
zrri<-_eMuF*Htf!C>kBspCCgTU~m{#Tof?KgKPy~Duf5BhXg7FTC9;yMwd(#JRs%l
zms!K>fNl|S;@k(r|46?`m2t)kLKVD1=vus=tBizjS%*I2CqCb-Kdk|H05(V1-rUk7
zF_yj<!I>-4ac1{cI~-{cfmvs%HojAf3!i|@XBAKc5CSY~;-gWarNxH;>wu8KRSjLk
zFQ2JVFc+ju6RapSZ_42Hj=>C~RTQ6DKKpc$IM;Kd?|{@xDWo>^V9&!)l%tIZH-#==
z*ZUPIM#E6s(Rm!(DF#~C0w<(=$?Z*70(fxTU~4B>xhZHZzp`EaeZWi45n)4T!UWhJ
zDvvK5)gpBE<<p6|pewyK&z=HhXSsdqj_Z4-Sm17WIDgO&1TL%c0?#g?Cs|j5C=GxP
zSnJ`=`%LB{`WYx@oPF`zd3jXO0niP1@(Zl8Vh$8KjG`nDGb-ku|7su0&y5HERXLS>
zQ5PD9ajv{t7rbf~1HPsES;K{q+N*OH_ovR~V2br-;Pk1)vuRP3n>L!?Si?6{#5dg}
zSEYtk6qd~$#p;dg`-2M^2M@X1JKp5`Z$LN=E8ZAQ!$b9v&Tnte!!vT?Ufu(X@e9h)
zMJzd`pMSHRw{^SCv2)RWHE2>h9}yoZzt{rb;EuYA;#uK&-@^WE(Y}C-1oypJL^T`D
zfU`kso#Hs2s0z?`G4*3?++hX#TJeV`SNAjOY;5Qt{AS_!M)|t76#kw58U3Ee=Axz7
zyshzLKD73eQM!DB?U;S&9)P8g;3vLc&_A$31sle;4q<tsA(&q!t(7f4Ks_*J(CFeY
z!nvYGUcGa+(w--~LTXnGjOD5GV$%&G$>x)?dSRh!9DF`pGS#U|++(5JO}#ee+^T<C
zpum95lIAilU|fgA+SJW&^+8DqY{+Bl?3f#u#G@7hjQ}U7T)y+-r(}&hC!?S_#^x{M
z3KMtQ*=iMnbGk9`IUW=C0Uf2T42P|rCzU6z{*dNb+Q9E(A#pA`KO~^#nN^5TyCZaf
z3-m&|#Qce0uFej$^HO1Pe)WG`0ivh?nOw$qgFflo%A6?Z+3YK{TL{hmLn+aXcerSH
zGC+7GZGhvJGRaPlU0&6;>Yfz_7%LBBPm|eIi}c202$;f=l^^1q#H@#9C;Re4R;V6>
z?_b;iNL@IaM03(F&jqq6joK^AUBUY}Qt1d)Q*u^3)HcrdKuQQB_WGP((3xZj>)?M)
ze!T>p)SnPVy+&;$fIcWsuIgBrB@_)FDT$P-nq>_YDX6*^Z>y%MZ}~x;@WVwI_wja0
zd;o{yY9P#b*xA(k+H5C*6g0oQlir2MjI;`LN2eB!TWMtMKHH*(;kBU`ai=*n>jU2o
zk_MegEpR%|yfkj|`<xXFen`o-lg`N*VIsu1M7?Y{!+gJm6U6ezJ8o&k+ab1(?nr`%
z`zJl!oZ5(YN$z!0tJ^QB#}@l7O+(>+=x<EV@3fyM1wn<vD7qWe>|*#qe|!Ek1rg`|
zJR9#aM^>YSlI(sD=U|h^fxhV}f|s$`IQYnhEQY8{OAL+#`9eX`0j7pljMHB50AWP@
zn4$E7v|&0={!}uof{!;alO^-p;T;Yk`~#aI@u&4<m%UNKa3R889JJsc^>C|FqEZGm
z>^3m)E5~*^<RwbN#~nuyPCmLf_I!bk<sWh;AeYnXVhl0AX!&(gcO<?8K>sc|@UTN2
zo1!?YWB&s!w9lcfs>Tz*nUnD|cu+xtoV}J=U+*P`!eA}_{9--bUaTF9jB@<yokUYC
zE5P?eX_mzo<AJMcpl<$&<q$LWji|;;KiYv9q=gWYBoY3=>!E~0+8l^JQsQJmV9T`R
za?s(VFXYegM7!{pDzW%?n31V~vu*la5xd$udBDhkmwa5y0Oc4Y^1p3jt;M4*YWjVZ
zQR`g$^V5BnDC8D5c<{IDp>C+<NFXZucWo<P$E#2T#<`&;gM%2x>ujVH`Sh@#Bu3||
z9k>Jw(riwB+_@Hj9_bHQ30$@Q3B^b#ag>0#j!_!}cRx>kt+rh~uoXGFD1PD{dNanp
z!YvaTNfjCsl<YH~%#B`Xhh|ttJJUy6s3a}8NiF%Gv_g$Lab5@-oI#kW7+c8vj40r}
zbeMjAd8+41k@#%5`or_lYG}UBY`wrkmW=HF7#N)uGpOArim@TdM#XvLX25s-e$Cs(
zC2@tJXfglSjj;xP938E!I$CeiDI&UlZvzP(zoK~&>w0nTsC|yZJVxkd3C5qpi)2Xv
zIt(@1fO8_{GX~1L{Yz|c9SE#E@zL}Th^?xQ<Lml!)csoSL}}GVu)0*8H=qM&xz&(<
zcm`LA7V)+tF-|UwWSBBHC!iDe#l_@Z+d7aEaIG!t*4Wv0y%UFQV`G~+eY8;Tz7BhX
z{<MJJ5KW#8{DeFt?b`c+n2s4iv*4iI-+J)jF*kKY;yQT&5$R0D(X(EkJB*W#v?Bf{
z??nabScnm*j_NyN7=KEIhPA#8cqj%z%2vO6-B-MvT7CMlJu6kmfeoxY4HN8Y$a+d*
z9qV8EG8%7Mm4Q#wwvh?s1O*06Be_e~`mlq8Kr$UJC(-A7HVN~)+jM`8KVJBZCCeh8
zA6K9UVV%3WE4&OBQGGVP;qO?eQ<!`1`b$?=Nsk|#Kdz*z%XLOB<->z|nQy6`ryeGm
zkuLhWqeDJRWopWUocylOU=bxm-(z!pvt^ViI;F*IioNkRN>k)~U1Y-)yTlUoP7;YR
ze#yjt_uw-LsoBCBqe;R_l81cAS*S#I_Ei_w*NW#3c*0;-IZ~yPD4Y|xh!=)6X*yIc
z)a1=I{@C*MFA?6HKgy3z`l4VWB+B&y0y&*Kns4V)W`|Op<wHOQ#Aj<iq%bruc;<0U
zcyA3K#g3Q)z>x{qrF`bYQP%XG(h0O?A)k-HU+l2{T<gx#SCH2K<kg0$H(OY_ep3=X
zjID`;TmC|&3skKTBFsY}B8fc)y4ya=JUr|qRpLF=5684?nd&+FImnDm>Cb2il6>`r
zJT!)-MD>*c&%raRccYc<>tcx3IC2mKE_Ch`K@w8S-<lD+JA2~er(Q?fCsCSLxHu;|
zVS?~8R32ln-OhIqg;-a~gFPH@1AjX%1y!`~Sw6N_X@5Pcsddo8WETo{J6bX}V~GC6
z6L9D6qa7)J-~ZVKxi1ek1ku~#o>Os|g52#@)Dy+P*Z*M=Ohjik|EsNIB}itiGsfnr
zqfY*VF7cX#?4zk^VBo9QB)8~wT>(}@KN~Yv@92I|P&DhsQ6ath>$qC6+n+1e8lS-a
z>Q~UMWY7^#4La0HP=74-&16ufQ;*8nX5d>Vu{_ykN$kH8^-(P$yK#7Op74&6fp^rX
zd4>8n|FRdqN!C$j9mia=iHp-hV&=T$os6C%KLqY0yZ(-=J5AI1SZSdjTOl!yA5kh&
z?0wzJtvQbR7bP?D?_jwb3>5}z-uOPu53ZbCbD;I-Dz~{`w+R(CJ@Nb&eID}Gr0TvP
zum#t>q&RRrV2cWQE$|Aq>Z%vmNAPo5Rr#3CcJ^zkF(e{HNP`m)IOiM%*pn4=v_-Nu
z!Cm}2O$}MUn+KqxDP#RAnx<ha$hi4HB+PN#dMhtcw!Skg_1Xu>V?1?b=ior5xgV7w
zTBs0fE~f}tsdvsP4ZuJC_dF+2s!)81MJ`)dWA}K6vRfZo8O^+d|MO|v(}g`}WRyRJ
zXc8sLecW3kv@N5-_QNNaW(5Ix99jI3ZLrn*;{JjkaSQOFz<eJl6>$WQ^Hx#z$K0KM
zM$(DiHUTc{Fy2IFo#q){ZPEUIj|1d;7HJldjo%-#1-YSPnd1&{zxz=R*+lW}2hPK}
zE9%D_6GoY(+MR*gDZDbmvZ{Iz#~Su+=fg%Qp_{FIJ?Z3c_xY0-<MHBy{yZI|>R8O(
zUA|8J>1|%&U7FvU<sN*_WdU0@Vm#T%8ehN%AwBB5(;Q2~_3*JgE9c7*^VSfIL*}~m
zZY-`6Lq+Zaoaz;tJjSvg!;^3)8RZkP(SbK=h)|TUrq~UecNwFpKg_c)l<w3o#3EQ>
z?`}pr$*2df>Ve#@#_ych@v3o{Aw?guAQ34U_AUFgd4)SxxwiPiqaL%u)0yrjzj6*8
zbVVD^&RYFfCrecV9H@oYyJ`jevkt{f^|MGt=2~3SNK$d4?VgY$LK^T-LduXZc*R?%
z`lcI4Ol-Y3l8@fXCDibqQ7Izxv~E;TpvX9|6%J4hZP6uRO{i93S)*ELvLel<gsM!6
zw?-#}az;lEjZaqLx17m|-@OV()Dq6eI(-`!=0Vl>r;?dS2l1qlC4p11^%x*jvUU;E
z^T@@0?0`e0mCo0<M{X-vu`cE_KuJA>ctECeEz)TnY}}@c{pV$qXk9RPuXJH+DA_1s
zcYcnWh7P#}-)3vBJ~|6}_lbXK03QSJ%D0n838ReBE&g7vsYut|y?%1-<$>_JlJLJ3
zRS*4DOxr|9yEjR|iJnqv|9o2@O8iWjxAR*~xvrq{3E4GS?CWL0uzauo?OM_{Ht*|h
z*w8X+n4kN6tF%Csjk&T6R)YN(NoSe$*e!G1GB$+lwhS+6K&?c_t$ZPm=V>sNZ!(O&
zaH@sFlRAZI9bdwi8R<xD2o$)YpG`^G@1w6R6-Foo4rrx0S|skejUNpWQP$Xa(?IGK
z@raMC$1FcieBOfqVSGXB6|A0fFqpsJ50+2`blq$Ar^?q^vHNeB`(#+-f*g&A2wcE5
zbfm^(nP#X;_)5YQBE$1{E|rKd=YR(yW~3|S{0FJ1h1)5(^z28AOo7f$q3r_$;tn-G
zlHSGZsKk8m=X2<&;GT^+Jp>4f)Owl)4RHZH6(&tpSRr%JwoUr5U$dbabu5jd&7DQ%
z(rI!+;wen~B0)p9`Zx9RRC6d1?`>w4ZH(Ex2WdjKC*Aj0EZ2n(-Gxk^-VhJ!w9^<}
zZb2<X3cLKot;U>a&Eo=x8@`KDZ4NETn+2-O#OM~ei7YjNfDCS(AVe+82NFg~YUiN`
zLt^5&Lkg4ARqEfmP3^p(F$s*xDcdSs7iT@6hk|hCWM6UKNrKk9&=N2m=v^tMT1H2c
zLV2jMZwZk|EHg&C>up9PcWkGj?#A>(^o(?Zx}_#Q;z&vu6l?Fz)Eq>8Z?Oo4`6UTJ
zlrLV=kXcdaquu<q&5aHnYtg=z^1VtT4$tT8T6lig96U8$T_~tn*c@kgHi5{K3Ayer
ze;r+RzHdn8^?8q<8;bxrRmXzR&INz>ZM3<WTkaLhqox}Ln|_s1rng*)6Snl(tfMz5
zv5rGtno*!a!;u{gu=<+s_zl+o<$tF4Me!K&)1g_1kB)D8>`ZFts09cmK6|&Ud0}aN
zE;_${h}7B0*NmdM2y%62WAb5SGO)JCtmJWg2vPtGK!j^xBZciW#Z60RhhwUyaMw%P
ziX4|>v(Pzf(5yTvALr$e0gb6+t1ZztucIm0ZsaRpo)`zU`X-y1j{ci<8yT#hS#*}Z
zU>8MT2m4R{38O6M8KmmeFY}M1jFIdzNjy3Z42pv4{;F<ETIaOjvXe_YkIoyNJ4HY9
z!V~N3{jmJD-J7TUzX1Wu$@z@puJAUk{NF;Y?&o+8T*Mf&uVe!+N-^W;MDiKHR<yBf
zloVa21-eDn_3q~H+S-j;O9GzS1Vsk(t#3{5TV);9*);T`a1gxg8;w9hu(5{rATvE|
zD5c&eahV~vYRdQu^>W8c3e(QGwrEL#Tnp4&$Fa5JqeO_%VtEI>e)I0jHwS|-AZU4;
zC7lf3=iQHqT(?`88VQJ(mg%Gd{LH1~bjhzu=WQoaGyvZMItvHPnLd=oxodqHFZ<>H
zuk*Qe=`}Ddeua;;uc6!N91A)g>EG6P$8bqD`<iB>lb(QcL}XSgUg42<rt@X66AN=h
zq(u6Cb76(Bc+9okYQ}TguCqdf*Gn{%7nhVCgVM+|#M?d8h&V9<ufvHr_pw3}+%)f!
zxKm$+K2AKD`QO!#ABw21olp)^T!ot8id*|=g}SbY4gUselbZNAt=85>Gd$eXv-GFP
z<6rJ(I9$*5sx1k7IfmyvvuBC5G%r4CTUz4T*~yQUXVy6Q9oH_NT`VE!#ZU_j@#;`c
z7VV<ZkO+p8dCjLrQCd?hZxDnUPyogE&r;a#L)FiauBUx|RA{!7>Q3hZ*(_iRiIpA(
zP*)ElH9cWnELG8`GwyUb;w$<W=kQuGlt}8WdW8BeQi(h!EZ;wH7&B`2IvwLH<JW7Q
zp)GY!yx$b7hib292$^=%p*i4|Bv>DiU>Vgm&?~^~^T;*}{X*~?tAP_++~K=7#mcu;
zTo2=!aZ<HvnMOZz=htx6AHGO-c`G5G%OIZxJf}2uAg(4#E`3SP+~-zm6STHlP>M-E
zTizIGthSxaU09fG%&{=PxjlN2rmD2vs|n|Qs0d5haD~Uif~}B0Ui?KcbF)aqOYBq#
zSy~>6ml<M)5lFt3Txxz*@FnV!f>!1D9pgp`bTaX9Hx#5*M`-G8;C9t&90qFL;>OGE
zE=H_HB9HAH#wR^`YnXvrp<na2n*`0yw>^PhaLEJF+P+N|D%w7;V#@Z8&lo9$rEV7c
zq-^$(sUt=+h55}ATh1)EHbVpKuL-?QneiU<O3mE-nrfGxELb78MZH%^R&aCL9j;7&
zjj&#p^3CT#&MralH(zaWpO&t}H<iBS3F4%a1PMVuqOwMrBL#nl<aN3BM?{#q60yLl
z{W_9BVI*e2EZY;2x^l!fWp(a{>Cqwn`-1K~;rXNWjGEw#d=78njlAtn4RPWePxGjy
zhw`8c>E{02_7h4W6R4YX-C*L(701mZNTzOIx>;h=yWc>8&78}J9z|~b%Fe^+;H)Lo
z(IBi-i$K6;1!Ksa^c|LLB#dMagn#)wl&q#R$Zlgy5Bcs-Gyn93@L0Z%wl_{Dp&b@x
zXI-g~<zxHUiqS~A*)Qzvb1<D+rF!3Ax%m0E*01iDux4JL>%oM$-LEQR_qqCHX62H%
z+k@>Ife2C45e*PMD<!_HxJyO8zsJE1ao)J_i7UdWqHmDi1>sM97?v^#nwj0X7zik>
zr8?4XSi0d|Y?t_EeDgi$PUQXngSlV8w?7tkSjs~F>Pqw1Z(4JMxMpog6~|Tq$eL*X
z*anoe>eaJumnVlWT5or#hr!u{`zWR<`F-^`O|Nz_b&Bj^VF9`y=pk|Z&VPJ#{(OCW
z^nM?85nzS*Bp<<z7nZ_3w`*;FBfL~AvH5&#Ws>Cbsr~J@j?TNidKc0AO<DawZK3@c
zuC^H{3kee}GyY8df*qsPhjlpt@6&Wya@GOJhV2RsxF!_f`#Jhd(>at@>^su^%&%(K
z%H+pAul#4fap^ihc%AEU%i3K)D-|n#9^=)6&}Z0ws>z(l{rAJj!S}45;>P|*;iA-s
zl{hurD`fS-Z6qD$*PRyB>Kn|DwWxGQ9S{F{#9nQtj`@HrdPL(e0~B*Rw*k3RNM~4j
z@9Z`n;y94$r|l1DKaXCko@#R?w6{tFwy!#OFTOkn^S1D^*IuquKDjp5YImF!pP*2n
z{J>yjJVz9jXhlsYk@K^O&X<CGi_nkUxn@J#_~5^YN>g~W6v&!Y);u;#xA8m+<nceR
z0O_i}c63Mg%N{$*CH&@YV+c1$<*fYWe-((%emWF0!_7RD;9#r)Q$X1mEJ@I1jtw{D
z=@Q!>6VEFgnS^h)51C&K<|iA3R3Zq1z{UjI!;jC1#HiCPi-ANnv3mCtcY8p%lUE4o
z@>&0(Y+Z%-&MZTSiIK%w&Q~l_rrs4m1c{8Y#J7+122nfAj1aDe4Lmk^EFC=AIQ7ro
zZj7A}=Qgs2>%|lDuZoD^ZGwh_KJU|G=Q^25_%9*Ii_0M*<yKeA0w<?u$KNNIrh(oy
zCFgR_r+vlW|KEe#NsZ;H=k3uqlehF8F4jjY0wHx?g7zy5-Xa$!<H8iuZEI^3x+|n`
zn^`gK>5j$8yG7lMxsDEtmQo$9###drhmsOhztTgZoN46TE}oyyi!Lw>$?HlwE|gW$
znp?*tRw7R4t|N>=tK!aUTK-+NjCB$E2&AFC>X<yVfEEFf+7NPYuI)d`i?i6HqMqHX
zuL$T&vyHE`j}MB!;oGZtFfs%>f*Y9{Ur#vov{FPWoVPSIJT*R9dj;BgigQbzBQ9X<
z39|Z<D#vlk`N}a5x~J_%8k(=Z)m44#)eQK%P4J%AtZkbb8Hh3(f`-zCdI3#Ai#g!t
zar>%}-;9ic+^Im@OngM`&jXs8dlmP3nA-dp^Q_6=Tl!y-s5QEqoUku<mzcT}ylE&o
z4e^x9Xyw-cP5=e|h%To>k21QiJwj0${dgAI$($!k17rl_5Kqu#`0#%QxXvtb9)d;N
z<IW5_BVq08P;~z}6SRyDfjrqYysnF@#f|5lNIz|J_JNR0Quoic(5Dv1Kd&@6#>y!{
z{yzJ$?V$-asNp{={M#>0*n8vK2;iX)tbPVE%09pe_N4VbEUpvZ=Z90$a9VBh-;-Cc
zkhj3sBc3#j-{P}wlklDE6gpPBEUif`o-M;j5YcPTXOHlD?qI7)rZS8Uf4L<^&!6qi
z_ef7W2~s6LM6_iAT*uo654N{0c^rOA@Dg<p8CElw&mPft-~{Y6UU^Wfmna$AB8*ag
zkYh(=iR`|85tm$qH@6`U?Cd+!+%IvV#v&UI3xVI$?)YyT_j$$NEI(+anBN)>(OkLj
zOkw<Thk&y=2ZeRFt;draL%=a`<Wjn$j1?8NP@;HAqXpLLjJ;CZf3-?fOUEYhE_*(L
zX46F}5T!I88*B@#3|j9SnW4an{D!Sl+x7SbjOn8Io|rkmLjW<Uv&Q1Xujn!Fuc<zg
zHU#6quP(Oz`dP{DtaFb|!DHf_lX<88KTy>D=%8Zjaj4-Epk(OHj|9Vyf+n}Yaorb6
zvfqjB#t1u0cU^6y;H(zo$4XUcF;>l-G9%A3X>Kwj;u4hh#R9uy@>iu)uEx)>+>6cP
zVDtZ<75U%v$hE$yWZhA%?Bnhil>JF+$ycHCt4AngaZJWHjD#5ALO#)o`7;VUTIfXx
z3L(MhhVMMcNEAug4cZXuMd@0bA`E=b`Hf6xB*2RGg7OOQng+-zZ`YCE68GPf&gWaT
zbeXK-R@C{cU=o2VIuK5uiT;svN%0>ibYz=+(X-|DlEsj#!jPt$N<*4f;`)mnQOrMX
z5GK+xKdKK*WG|CW7y8y~=MQ?sLz=LNsTY#vXH?CiuHy>d<8mVdEnoa%%-g@Kb|9Pv
z8Jrh=Wl%kE#`OR*O;yLGKN$l7e>}bZ@Iyq<E;GH26ptbOE;zaNlimy+KS_@6DQj&v
zyN*R=|FsX`qOtM*y7Z~=_|Vvo5&FmF)x-Ug{zkf{IM5gD{5L8?n9=iXi{X2Pn8ZC;
z?69RX@~y4xRUi_=3ifh@2nbmu_rp&90LCh8^R-Sd5&tCNHudDi|5<vmbiHW&%f5ka
z9+fvg7dW8Z=lOA{IpKEaSD;&#-#%me3oaV)O(J$KR)~s;`<SygNp8|?05vs$SF$Vl
z*13+gut|5jc{%|N9}2L@-8AuOm8(10%2N0flYDLA-L^+!?t59Xqw2%zZ#mmoKrwmN
z>26DTN;6M)Td{h$=^HX5>M<!$oJ8%Xc#{XU(|rSnL*F=ISEmnyIf$Y5T8^?p^H`eM
z)>8>T&1O}oiS#w?(rzd;oyv7A`!-ZmeCE9a^e2!&wdD7HB#Dj)tc5GLr;36^_(%(%
zGo0D~0*`GElM&ZsrI}M{Cd!On`a7kFdBo_O<Oh-6L?Jjl=R`TJRzBiL_)}hp*=^5R
z{wZFM$(_vJk-O9ldXDuCSO5yBUYg-J{&pixEs*5#{&_^SW$DI28Z{Qzb{;sfeVGPV
z-^1ywy29J|>n;+9obK!}IEF%LBgBprfXCtgv(b8b7>IQqF3y9gebGwEh4e*;s7m7E
z4yuI$udFo0cKaCF<9`=5O~7ZP=I}UoRmdl~pz<M%4)F@fnTT`l^@a<>-`D;2erg`S
zV-K=*XqMlj48&6g91}8UCQXP=v#Ct?VVr(b8l=Pwn!gvBpc%ro-o}yCGGUdy{nb!@
zN`Xv=+$T9ZqrUeZR^?=~5odQHf~0T4t$Kpv`^MFfC?8pwM$!3Vz}UZSkU%jzc|JhR
zRRgu@d6ls1jU-#<3?oJ?<0yF?5eqx(`>PO2kmN&^-QeF?8rbhhd<SvrI7E4c6>Dsa
z*n&w0@>9S!(03?4%ER(T%s&shjY8@X)ZEFU{HM9jHq%UK>Zj=xdIb;R>e`Lk@@!6L
zprg0FvntYLh!vqG_T=U%A*%NCSvQSt6Q4EilhV-n=OnCZ2ak5=D>Ak1ESYml`S3c1
zf;^}A0r_zn@#rY^Q`W=}rRo|<9J@)Uio5i2S(}?L2)ZDQQik^VF)xh7C9_@ZwxY&s
zP($cF)LL5iTz~kTK=B$&;fGy)_nNyu9VHG`TLZ&C|Hbq7NZ|+70EKNdw4IaWZJR*?
z#@XuLFu8CrC}=ePRFT5Q=-|HXcX{S5;LO-c5Hnqv+2IT%Ks1cAAJ`<xQg?H>8H>pE
zX-;71n0JE~_xR2h07bOS={;qNnD?+v7Jsa$BveY!z|-vFL?9o<jf|UN;!FL^q!vaL
zSyu}8%SG}CTO~wNoO5x($uUSCsP3G~LL(K_B^!4Z`=+sjvktcI<N7|l5Ox>LQdE{W
z>Zh0R$9vUl(RTk%n#$GwZd@Wpds6f2qWVGP`jCjz9Pa68cRCEb24|#Wfp>pVB!nx4
z)5+81;Kko99L8Rq;Qe9E8P?H2!6^Ob3%b^opLDj_J>pJmRbEE~I(^w1XTQ1nr~XJO
zTuo8&qcc?4-Q;WV`u~HPZwG2$m}+vf(beABXwBzE*5iI$C$Q@j5u5!}Jd>Nfy($?|
z=RMs&COL8iLP_ZM^&m+nDx&>#$*t-l)5L@co_#aB?_-@Vw>^lyMY)F?lV6LpjQcW&
zYgZ?Y`%KVfMkXPr?9Qtb35l+3Hk9343tMXcvXY>C7vn<ZjYkLNHtyo4h}9#j)a{3!
z3L_IWT)-hE(DUx)g8BBK`yfiTi9%Yz4xZRdhW``dl}hV1cbC461NC@$^-6lJ58x${
zx6op5$5>~QgSO|LP_f!vZ@;H6w$yO6Jb%fB`B%(hZWQIlr0Jzvmo$&$r`vh5pHqN`
zr>1a1A*!0NGM;PQ{e`-O%MSrh+AwjYc35#tvH#8e9sGdLZR&XugP9S!s5oTD&ltWM
zvLWm@y5J?JuHKFGT*|VoeQu%4l}NZZUirX8M6Dw~!!=y+_vavf!<aY0x>g<_vCq$6
zv8m>5#nSHLyQ4abF@sl|)h1662g?H|vF@HGs|-;(_I%yNq#88?=@gI`lUBQ-r|jwI
zHZQ)t>Itc*e4Oqnv?;z|4J5Nrap`iRKNL;)ZX`*+)=T6fJ*+NX8a=R9L}qG%Mbt90
zqiLK_T(vXgIw4WUJ_(XD%^Z=!U)93*^sZQ;%=;Pkd@Oc1Vshp79h(je0VeERLaXu&
z@lTFAUi@oo$~S9zm<d}S-gk7TaJFpc-R|hj3iMJY>>yg7$V4hi&#Z_zeg4)mL+m3k
zt-i8;AHD{;LRC=}DE}Gt$0qdEwh40gHH4*H{aVg86g<tWl~ytOh_H?sV*;ANS=@D_
z3ao}i5p8%|TaGlF8$9TgYc`#Y-bG$p{`;}7g~|0yiO&ebP2C$siHa@c#fi~PZxJjU
z5VAKLX+N?moT_B&0Lcu{v8$o#^VOrjDJ*x#?t(-CErQz->R?Ibu6M7xj9}}N2INz6
zVSHwh(1rLeqjZANb$Bo5tKN=`2b|$X5}l0D=I{osFjupo@rutoU6<|1wmZJSyxr2H
z1i9Z<u|D{NSmwtjBA3Ni(nj!C`j*RgWyaOZ7UCeJ-zy-nf>A0^?@s2`xgl6xjH>G+
z-MwDIy!kh<dZ}7-3>`$XKN||pB;LUedF4p4*i9<f-dIWyvku?peC611c(B>?Fwrqi
z^5}h2Ej;<jS1kvqe;1>T{EdDYdM(V_OYR(fu)hKP=v-@y(WQE&*^!#76|{~|0l<Z3
zeZF27+7oH_Z{9X3H<YYbcd}`yDveY3IUBpptdVtZchn0>%*gyD!{YbccAA{`dcQ~S
z=@u-Gm%j!f4<JqNb|_NIPHB^f^xFFlEs#rqX7C|Tbb6m=!(R@ALS?{Z9?=l=zyW!r
zT_Voss|ht!mUWY|Qtk*VvvT<TZO7G^kG!q%ulG5z7=?14!T%)Rh>-^!cEtJ3m>ML~
zhgbT-Z02MC#gMBvp#ETAt`)}Tv}M>{^VG-yxs7_3CF1DQ0I{5zEBytVAz*<t=HahT
zD+ij5LK<aJPnp)Ig`V%k`v<R{jx$!JEu(KEz3Q%-N(O+W17D7AnKrG!a7_#skMq1p
zf?+NS;Y|~TCB4M*5Qml2W-h(iwcE1m<MP&WqeZ~011w+XXN{0zpWJ^HjilLLQemyN
zkx{?<MK716EfS0K?j*UW{Z$@zfem`h{uJU|2cK`=)XoUXT*oZj2b+B36rp}MnQ&Jx
z?_YXdqTfM|;_BpFN4uGAh>%gZb$z#v@4E}+TWvwez>(WC9sS<L!X7__r9_Shz6csZ
z5Wi$OdPx7}W3GV--pSD4<o}ev|04muhx|uI#Qw_6DBK9`3=O>{AgG9E0sucp#+kag
z%th<k_fq*e^hQzr9Q^Sb+>~xDKZ3lWoakI?88AIe$frWp9qswt)lkp(8W!3px!Tp=
z5q;^58bfQ7yN+wpo$B)4&4#`6?1=1y-^3zkq5*jlzH>Wf39(6$=}Ld65EH4fI4kaY
zJ(~Ub0-mda(_~*!O1X{9$|(2LkhuOXAviWRqWeDJd7j$0u3Me6b@u>ek6l}L_|@yC
z*SYE?W`4Sx*A<~%mc!V&<w)bF?X}J0s(rG(Z}WC44y%rh9=2$7J{J8dy-8m_{U_!=
zkX2ZeTQ{-xN>TD;LSS1(Ui}-AcDqv;O-I;XhNJTsnzlm5sA|9+VJE_lAm`cEDteB@
zroDj{cfqndT<hLbjzEbycLXQ_Ai<<Xh`KS@rUA{(;fT$LQqb=YnPRQoe(hAB!o5N_
zUzH+5i`l?$+YpypzjiKU?P=e()EM7y)<1J1^@b`Ub(&v@6dFodBj3uaWvW|54e@`n
zy)ROICm`t$TD?^BsG836$wd;f`163-{+$Sa*k;xR#!LxuGw$!VEQq~;iByA!5LStq
z#(>{CicwsdloTKvxlo25K#P&>xDP&N_up{}?@C19{hId^g51fBpgvzs|K{$zvv~;m
z5yH27bl8#A#*^=T!}x7M+24!ydikSy4-6`@WdI%k6_r4;TQ5#DsqLc1t}Kl4Ip_Tv
zB2sA9M2UoL<&T;-q6LQ=`s+pb;>lRVMFP87t(ldW5gx`$VRd?h*{Zh!Wf|3PCZ~#^
zLel~f9+j&jb>0*(lO0&WTx`DYoZxf7%p7WKaL8psFQfxSk|iW?LYY8P+4KRDEocvW
z!2F65=F;Bdg-%N}ZmBy+!Z0F{<*s$wE_O;RdEEZ7=|D07X(nKAWS+Bn!dM$+FWbxh
z*fg8suSP}*(C_>ocSKS=?V^WHhCyrX@*5VFKFMgu#eD)c6NpIfv69K-_yP1<UxSTM
z1a_RF_rb=!GH~|p(Z8;Ka@gpxoU0*QS5V0_k5kadQA_fX0`(<_mFCvQ;V#`ZFOuD!
z53fK%2|V^Gf7)qYGz!aeV2q7gT~hv%iQj5Jh>O@8=E7qzOr+XhDx=ETp6K3iMbhh>
zib#2SkFtz_jUp`!#+;%VzKePIY%b{`vOojla%V^~IzFKq;j-N~8Ab-s^Ruw9pxKY6
zJ&g1{Yvvdwv3rBvJk;N75S@vkQO?e^#PLi!kA2mEtda$Nj~MyIWgR4f`lLltfwf4>
zK-MjpR>_;+&(t=z^1rxx%b>X0C`z}nV8LCR#+?v?d*kjN=wQJ;xF&(%?rtHtyAvR|
z1cF;b&_Ltb==3-D&Ye5+@2%>p{&A|#e)m~>t!J|ur-h2WH$0Z-v^79Fw)@kPKx6`i
z>KB$wq%-l!Oyvu;uCOhRPH(U7r6eeFy(Bn{zetRR0Jnr;gPBWN-Al@U@5eq9^vAC!
zNT(Q1Tb+P#AP7^)^4eNuaoSV-?MN0|c_kH<0R;n)1ahP{Mr|%QR#cg`6U0H+*-)N#
z^14udy}@t1g1#8P%ki_WT{!g6jOj?}*6&4pcORJVa#?A58JYiJ4eRW25VVNFLLrd5
zmNAzuq~}bp`_>zK7&ohN@YYL?o3%JFudH{>Q9w!bY0-UW{I04V%!QRIDCz+0PmIEd
z^q7tw^(F-#>%Kuxet+*MwpwgrQ*jrRn=5%S&LG^h$e;hsc1GIztlWhQuD`H9Zr`I>
zN5@ORa&JE|F=N1Y<1HR+vpi=L=W9*Nz?aUnKOL8_$3a+ed~-L-6%W;3|3}Pa`V#*=
z_^Su{|E`wghMQ-k<oI0Sby}UH4%5f)AleCZL@3U=#B}IBFc;M5Ix{t2WGmO2><`vL
z=t_SgNKbv+Kpt~(r=g@scb%jZ4DZ6x-K{+=R<VKKUfV)4OtN21D}4McfN)Lhj=*m0
zruOy`r$2^gl^$Hz5yFhJ>2(S^A17I*sNAHh&8t3N^?%TFPJCQXWDe%K02I7_52Y!x
zh7aJe*o^$?_Hgd%0S7KuAFSOHT0}LVXl-G$&^?7>Zeyp&zv`mV29vW^L>*(Q&4;{p
zl1ZcE*|BNg$q%Lm_}haGV~@noSD`m~_fJ4&El|LGrdAhc+u0ZTn45qEtBTQ*bLoD&
z5)_%E3u*n+v?KbEgje9tN^NGsHeh88m+R7(hf_3*I+V8qenHQ-y^BQZ4v%x?sbsH=
zkY8Mn^CeeI0*x}Ct2n_HGUW#NWGs`}Ccf&SCdF?H8OtBJ+Ji=S({HDg<H)vpht%_Q
z?X%KPI%^rYEF{}M^d#0<4vjIyQr>qzJJ*&b!{kW@ZY~mp%EBM3^D!bbqseYb<;oil
z{zkv;@n;X8sKgW3b&_B%3?7pIT&&huv{tx||5lZ}Ds@`SKsjsu)LOw~WgzrWp^Yvg
zg-2Lid+HR%PdIb2sqrFedDlsQmg9ZczJEg+W-ENO;_w&9g3wzbn07Z(1THY{D@u!l
zD;F~}tFf40rRhm>BYdbdp5scFdx5yJ;}qThjoNnOrHeGL>XP9Frn}(tSn@)Anu;tt
zY*Jlrkkuk~aF<}vlO|h0PlYTtlp8o|C_I*b*c#Z27hL_a@RG#>$Yo&2=&`X>_43d$
z%t@m;Hb5mVwyh^JszLei?zf({<5pEjKaOlA_@Wra<t3?3G&$N4UV7<r_{3_S5n4kq
zJL>F4*U-h}^%nZPcHC@ff<Y^_*C0HO70~^55Cvc74!C)w(Rc)@K^}~j>BK`VA^=BZ
zsSt9jK9e~iRp>wgqfBY(O_T?Mv)6jAsmBV?x7TZqgUaiL)nf!FcDD+h{*sE$pHJn`
z+>F70m#&}f`?p4fFCR*uHDw8?RzMIb+XebiqTj@yJ@vsFrrvv=XkMM9N*>cHx4Qli
zvlr#+nG0W`o8|(<1|`YWjE}!6gOCGuR|@n2j`<W+28KoWuN@!eqFAtZfaO&K-Fuf~
z&X2)8JNejBXtOTkOZ0DO&GHnyc2KmD=3pgI^tf!1re)`VUcI=Ob{l>2X1@KL)H)@X
zUwO6PU~P@JYZd!91cZTa)*nB0w~OW26676zYji2w*>;TG`GCzB89)h+T&H1*#&<H6
zh&M%h#@6$48nXNcD_I6yX+?A}Zm64)^=N$VXMxw(<7e5=W{fr;nbA^wUtUV!xp>xG
z^P#iZP-m@2hU(msEYAe%yNAH;J}B%m!ae7VNAkqa8J;IwudEqG>*q^_?YmV5mM8X!
zv~6!98y%+TjPbc02L;^OC}gmTu2&c4`R|W6hZ$%lZ@Wt4E^g@oBGdCMSiDiR=Iv!|
z&OMuZ$bW`enpXHgBvqIfgqUn-0c5z9VPPu<rRO&Wf+r7Vr;FQs<<H*1r~WUS(2nKx
zr>h#@;&p-Hves_P?rqV9%*MazeZk+OD>fqDp4!Aeb`g8%N=86h?DKi_9Hx?0N)jh`
zrJf|}FjF_?ZQh$&q4wtUd8+Ty6KJ0=Fb0w7Pf}OJmCmC0>$hn5n&z5Tjz_x}dBlbX
zS4+<S7<kmw&u+cY)ye{8a^q#Hc`6oKatR6sBm95tT}JBz_6of}vhvx6+rdbb9!FyI
zG<Ku=@Iudio7Waf<dVX!2^0MnT0czRynMN*aZ?KWG4-Ei?(iq3)@y-aQPuQUE_naj
z=78`!5Evt5*(;nDXsoH3$K|<OcmPK!;|x~`7BU*e7pp>h={8op#M!ZODjo>kHv-OM
z{_CdP!VQH6&2b<3S9FE~)&^A~_`JujVB0H#KBvD)KS&NPhS&->#Y!A`-`ykTXjRqs
zyD5tiq>LDH(ZApK%Yg{-$=zceO<QYjZ;qCK<wKE(Qd8amdc^2Fiu+l?OqU7BJMhp{
z!aWt-gmSi{yQ=b9@e9LpJTGgtT~QYKdDpRiG&vDEZT3&I$r@=E7B+K1xckBZ=5lB6
zimX3@#kH-)9xwB6X`kk6{aNIh=qbc4B8)emoJ<}7j|?Oo$}if6sh*p7o+HUcI8v9w
z<7sKyXg^VPgLi_1mLGJ$@0<CvVpCX#BhT#nLxlkc!wPzhLX3Tw+d~FrtbH$sHu);J
znfMAI!kwzjTrtLe`B6b|?D<ly%ReWxC;fgir+>NU)h<kq3$fjir8)0~mCP`OUhXa9
zQ!miWLxl<x3akjc95vjV0*I3;Fk+X=mN{atk0lfq_ZPfQQ4(LW5)2U9PxwTkz3_3v
zc+`iB6;ZqKmg}SF6k%?wa5u2MlWA)YVITJAW>65_TDCxnlU}L}JYb-3ihldzu{q!s
zy)SD>8ZI`?O=<8;{Dx)kz2525@##FAn5#7WN|+_o81d5>3ZA~=o1juif8J!I1scfX
zem9*|_-^Wu(X8jPW@Ed)<m_XKZM#5J+L7c}YAz}PCaWb1E$yc|K`ur-v_*&d&m9}%
zA2AsViyNKDn8S>1^ArLI&TsnHm&3AyFiLLp7>u)Ah?f*IQ~ZhZOHNEyKyz#)+|4D!
z3Hi-L&r|u=!@XF`DR4J%$Q$!Ie3E^p&-}9?^i~RUtQ<1#VhKAZN$+~sq4Urho{`$%
z?Bx(yx9d`?#rycAi1!w5_GnT}ec8usDuBh1XM+Vq$UV+<bYFJTqVz5`y0$9!+DmA>
z+`uyV<$tjNE|Sq-_?!&&_&SYqTO3;79)Htc{1wf8Vj8rvi}r}}xQIqg>Ktw?wGr;^
z?KEBUn2Sc82E8d%KDG-_ExxW4W#QQgwD~02$@h6nYBSdkT2p2jgfBPlEz<<3n8#rZ
z@|*tyG7jnm&J=M=am=G7p((j(dD%mNU;uA{I^!>wyhpW_XZ<kyKg`$1&@VMK3{Se>
zis3#AO%sSI_f8^`JH0V2j`__X708%S9cR;M=s4TL2ZC2soI1@4_(yM?guXlbN0fwM
z^Sba#I*(GSn^(4pyfra`^!N&|0NYir0ThPM9b&3f56659o^{)$_JLhSs?X4WUu}wH
z3GeIZ3Ha^o7s6ezCNm-e+PV9NFXs==GsKPo{#Bv6lk5~)6w>yDEv2r3&fB$BztgP5
z<;*!hi!$R$Y)E;vfDd77-Hk9t>W$`69XGSboGfej{i0v6&mP`xKJeLBV}%tnptRtj
zjaR7dC4c9XvNa~teYrmaBk&vN=2v$dB|~nO*mlgNQrqP`-|YZ&(M4OtRYN;g8-D(8
z@z0PK^dAk};&n`9m%L=#*`Bsg4zXUep9^03wd&rK3MQ4Ftcs%_-Aup1-Sv5K9M9@L
zk&q`tPs308?XMsB)puRWyY7QNe0_LwGllCe!YFl)y?hR9j@@(55-E~xZb%hUuev>Y
z4GufEdw44rhV#WQ8ZUGhQ{ieqf8<Vnv`!`#dROz*kXl=fd&CX?Ng3eR&rjb}j&T~H
zhCkH*_n*$P(LngJQQ%0-Mek#tcO}n)S;d=0dQ8(wJ(=UOEWGE!b%b<VHH8>}Sc6n?
z8I6Nzi<DSL<@?*c@@aM%MYvJ#(B>sWYtKHdFuc#Z#82uUN_FQP+_^D5B^7p|x!Y)m
zo=PMq-D0!dyDu1QJv#f}EZ7S{wB;)wHaA|IQT#7o@&8m8iklINtuT1JnTk$!ya`|)
zYm3DBu1GF}M=)(5GR-UZ@bTp1*i0S)R}Ip8RugqHd(u-5vYDc|cw~K?-X1K9M<IIb
zIw;~V=#v8vli}xx(meO^QJMHnj&0$NR2-f0PJclS{)&SN*ne~sY3-#0_1dVFl?fmH
zlBh;ygzIP>MG{^cvfqpA6Kv-)Q$lvD9_w!DzA#TlB%$=>s@h?3zV$|jD9(Ok*@nf<
zmoVFgCdI}2myX8uhS{_6=eyWGsfIgK?r6T+&X%Mk9d#jE)r0cz;OFQvYSM3K!riS`
zV=j91SKflo7;IO5r<sy#X^)K4Box0Fc2RInJf9ZR@w`u!+YqDV-hU6`6{w+qieD}(
zH@aV-FXj9zxSx(X9MtoI?aKp~kT4mg#R;WmuarPANF*VOZxj3Vc=BILd+exeIgl~w
zH%&tVi^vL@jCz_=$C{h`+x62MX0?2MFGy>EFIcy_?6(E*L^&d3N!nzdrM$z{VNUX*
zjV^NRV>wG~wL+l$?&)FY{-++Pur8;?vh%ENN2C71T)j}Tun1QN@W^hF-%CcBVgfZs
ze$bDk)mq<wB^7@j;@C~v2aYD7V_tJMtBSJYqb%37&qiK7@`+82b5g-alPV^pgFI=b
z9(rt1CmIKJh_{~XqswC)4@RW`|IYN|CDEe9_9PE$$<FMSH%3F@Z}W(>&XbzkI}*`G
zi1=)f6W_^5g38NoK9SeSW8QO5qrDz@c3n$$sWY%>w+NqW?wwqd*oES0B;@4>gE3)8
zta%tz#==+Py351w5=_ZoB)4i7xLH|hI};p#v+V1xIBx{1SZiEzoQ;bGVTn5m)hi8V
zw7R+MKXyxGli=7|Vq<RFsQ&{C1bUncVh7L!Ki(mCz-ZHQ?q(Kg)r1P?(c+LKI24BJ
z+*6h#NF(FJ`lXv5SFK-2`8+S>P5T^1e<7U{GXre;c0zfHz-jLZ6VCr`SVwbnBq^wq
z4-N!NUO-94${46`jMjZ7#O=)3T=Cy|si0I`9~*tBR{)Id{`)G!(e%f&`sI4)3~0Cg
zTWj)UWy6d^@<5sGj1YtCQ-&B<(u&HDr;85mooT1(HPW*?cxc5Tz(<Pi_bmSuiu=*!
zJZ3)Q<JJ%j6+2}3H2*oU919<?vU)$aa^^NYwJ6xKu<#U-7ShJRZcK@=XQKhnXgQ3s
zcUD>DRQeYGC1)4y7AA1U!w`>*bRTHzEij`miw3-V_Jy7Xn)GDX>r|fJ_5^4U2=?<~
z@-qg%0&MVCS2?SS7MRAxxUs~<zT}YWP0V)(i@fBs15xQj$maM<T6~|j=m$5J&g5e;
z6FQLA!D*wJNJ6AN|5`O3Tk>w7cL%NtU%E>nX%goTHRABKlVsQDA8}=qHsvIlKNZ5h
z6?K8HG-ETcb+H4AOjIfbWdcR}U<%j!uK9pxY<~f9&<bm>@+hHZ{!^k=t*qR&;?T)_
zE2LVu)RfC0QX+88T&zdG)M{$(#eBN`_jQVjKpAy@!iR{oY@U1bT|E)ucAsw%X=Rj4
zRzfqgu<~c3q<u84L1@BHxqx52_*l9hFS@(3>x%{Wpf?O+@`qBrkSw!HQESgeiD|g_
z`q+Eu_BuY-ADJ(l|CusyAAyK1f)^@JoDRg`-&xk;GR&Tf0zAjH;$6m>1BWYA{B>Sq
zCSPs0f&*kFmAlW^a!am$17)?x$x2L>1;dBJUGr$=a_Sr-oNwt%D&n+5{p>BlCr((A
zL;4G-H*1Rx_EXZ5hSI_a*QUc@Au*Tt_drR!8AiOMpcI@3Qr)r}3XgWD(49+)M#qH)
z`V21?D_VEuwzgFw`@{-4TZJGSz61w+E}JQd#*wjRlSO&>mFnrwT|8#*-+}WVWo@qa
z-#i<!{2!j-|C;IlJPw2JkL3qE)(_f@V9?+7KcL<F(VwA6IBGLrUKhsTi@XLQzb~+N
zOP-HN8%TwY=}+CkdPvh^ihe=8LIsL`sn7ffpxEePpgjLM%AbbAdEuDzO4wKeGQ3%f
zP%<u?&&l*eHkY&Hs_Wxz{~=UsyfR)!X((lhOqlXmx<q6j(zCm}X0jD%((gXKO!B!q
zg&FYJZj*1sjsvRLVfQ`yMdt4!{sJCbG8r8Uiz>=kZrM8C(CO`L2J0&=)zCyxITbA=
zY?@AC+edgfw}Bw~FZzWOw2z}2L@P&Rsf;vRC_ecfdzqNOnhV`+#r#)u<!}%3Z(=|B
zOs~{HFv8@NhPXMGn!>P=SEgCz#=BomU&?t7WgI?@fp6#pvc#3E3Hf7we@rf&#*oG@
zVp5vg<@1$n5xu!1BCI&N+DPK4VV$2~!`{E$z##bQD3#Hb|NDc@`^-mFt)?<>ixE;^
zlTcL$x{!M$Iv6((_F0e~;kt`i_m2@$CwohZZ8)7+8Fa%S?(H+-8+oBdH_YG`SQcwS
zkx-#Tk(C?y_B!UzkE)@)N$GN;VQ7335A%8kum`Oc<)@)DIyse#O1fgRJ4=j+9d3d>
zHJ>Cb8`6HqpYZ+C_3jOX<O?NnKk6<EWc5xdHZ%W)7;~LturOQYuJO(Zu>62cr`Rr&
z%c;2UzrPy*5@#yxUU^~E?T%VEI&}Uvr#V2yDDN3+xpy0Tgm)wSZ2Z1fL*Z6kt*wtg
z!+W)((k_ttZ1-9`-($=*g4;~}{*`#v&I*JHo&K6wB{TCWY-LBCTe<r(y@*_Uw+KB)
z68j^$NdZvBMUBkkQBJa<gW4_6YnVz$(rJACt&G4je}zrhpv2h+EgU8RZxm7ZQWwKs
zi4=O>LWvU|6v$rMg3?h<Bv3ZDXfP2=oBIba$Sg~nXx8W68F05ZNDW;kQOTPn&nrp9
zCBTn$dexqeP%MBtQV3#CCLb?mYO|49nvO<}fq6oT#w7jc+hV1fCufWM@oYd2%pLL_
zzGvh3=IX#t`?!?-&Y|x^2<XdK3+a69s!4u$wPPtx-`qCJ#rmh8c<i<4{MaQ??{NqX
zU(UliRnXpv6grhrr;IxH`Cxq+Pp;kk0w-fGyl+pn&B5BMXBA{bLnlXYLlzUM_lm>o
zUD-5Klm41$VVWk;?AoB!eYf>xl^xi25i=W|N`yQt7Y$u)`c*oLmtBZ681JRnD+v{H
ze_CpxG>Pm23Jjey6#lttZ!W(c<=b5MCiD$t?2$LdEJUr^NE%%7?)5RyRg2dxYd@w?
z5xq&BU+EVxu6P<N{RGa!kh9)@y0(D;3Ee)P4&w6w-g`ZnUf8W=t)IDf>)y3?)x7V#
zD!1FgL}n)|qp}|6(s*Q+I;*59$Y03{%%)U*Kw28U`RUoyFL~GO9wtB-2Rq!^tV;$8
z7ij2;5Q~P~H;n#dmP{%Hp1<F{5Pqrh_j<whX!*22QIeB@@-@l&Xo~wiIlf8u(gtPX
zIC<+>P5xeN@b#Krx#JVpi$Lqeg`fpCYK`zPE$;T$*Ss!6rRsx!;<@2&Y>~`zVT`~(
zG^5?p^Jf+#>-5HiB2`pJJMOCfk3rgs6g}U_iyz;#&NYe-GRjTyW%^?@abYSc@)!2B
zNa)GjOQtYAnjDqZh%Dx4nPn3RbM+9-8{nGXg*=1Nu!_T^68J)kP-YsKWU;^KzKjF6
zQE;!1@SxY74K$9+7nf&}_Uu-JUHBu98cSkZb4x=xbpj+a+WIGsV>cHabTy=^*${KG
z<w4_^oB)T*tg5YPQlqaaWnF^*bSH=HfBlraG9ob^l>lBvlG;~|%aGFZ!jm*VFL-Rx
z5OCo@Y>%pY1b*Q{B^iRe2aE*gIwm0-ll&<<Lpc|&PVV$ZU$GT1%s-jA8w9en_mGAT
z=A~Tsq}Uo6<Y-X*dow(yP%(2oyE**-BtH;U{J%qhJn3CHv+>F#U0pX4lCziCT7VC2
zkPuYubpaCSt#CndZca_J2hk|Kd3U%6((B_go*d!Y)KzC^W2&y&_o!2Cp=Rf<n)R&S
z4t=ep0#&KPJ`y6>bx(-)TV$2ySnn|akq4>1?lE5E@5@QrY9op}Rm&CUU=L@cIo-FN
zSy?wi@n+PW%`e_$)ufn7;`to1ucU=)Xo7Sr*Tm>6#*a~8v_X7GBCN`ah@^!09TJ^(
z$`g7;9<zN69U-K|E_hCtWMNdF(p!l6la#&Pz=)^j!-a)wEuK|prm=Rhsy#m;;bE63
z;vv)2mA&&XDOJ+{Ay+T|*%}ErbMUxi9Q#PHpfr_q+556SR{Qx(b66bliIG$Ly{6Z+
zjn}+BK@BXgCEs?k`v81Bb7AY=ET$u@$mm?UDJ&{%m>{$*Lp=%wdp5fLiqgKvs18${
zg;b<<YpKB28#^T|FDx+<0EFGG!lFQ+uTbG=9%9jXPl|ky&|;R)@2zGO*lz+Z9g9v5
zcIuTSM~vV4b;xh!i|!{KHV#L>($8h$+swVY_ANy|wkKihbjkNChsFiUj_;CesL}hg
zCwWr1Y!BXyZL1K;^yLH`cE$Xv(dOu;yx8fE;g`IVm#B2$MER&?d=9iqe|&4QP{k#x
zO{ry>>$%>FsM7iRGqWtl-1l3Pp!cK(cY-a@CU@(^Kw>p(oj{w7PXX`UF=FGt=*3YT
z^4N3VB(H`Yvzms+i*8q_Ctez3v}5I_YkGi>Xvi8W_r5=nSl$RzKjz`B-bw$oU>LE^
zga~es7x5gov}BV_`M({jWXba>y49XjtyN&#lNhm-1wK%(-20L&Pg)Hu;H+Im{=w!<
z|4!b=nJFrDHW|`;5r(neOkYq!P3VX&W}WZM{po{g*v;tuZeTA;&S%IXXXlR#tC7~X
zT%FZOj}FM+k#K))k@=}SlYSqAu{EN6g6ILLtuZvu+X1fb{bFs8pLd3r-qSJJ{6WW}
z-lRf&B%q}epCikDm#U*OdwZXLvDTjCIoLK-@vo)F+i8B|hy2Ks?mctwP+3zhGI9uJ
z4m$C|9v#_!*)`uT(srp@mcCNe=lgBs6Mv2D=s};$NaUH_?;mt_edWaf)U~2V9Wa=g
z@m54g^R4VUK$FBzH3*sRghl)+(o`e?k$k!~3Y0Q2pP#}(EQD;Ar^RFCbj--Xkg{5&
z6y0RWu8Z$!D!y)?#M>8LT+-&o^k)3Y|Mqx>u{szQt8lbTj$QuC-QD1wKSyio`F`(H
zJ2*_q3#U)~Wez)vGZ`pLo63UHNADQ2caPE~?n-M9fm^MdP@_`NnISs|l48BCM}JM4
zrM7{Msx!WhJjhRr!61u@Pf_+@<-W93uMzl*XRaFWu;A(pCT&+Ic+9c-D{j&Z<NLq!
zM3*qxE-NjJb$qNCx7%p%TUu42p2dxz?pDOaxX`WAcRA*nLafmZE)PTbprD-t?955c
z;T$2o479;0CCSz&K9>=0V)wOM+HG5ULqpNP@4`Bt^+0#YtV`!l$8z^B`y4aq#P2&6
z{Z|C6e3|j6(lwY5oPq(mJ`UC1mKmupl}T8mFhTj9uAucb<*eiD3#%aiBK%S#4`Ml9
zTFp;0)4*~xzOoU>^HM&?tN6_n=*39M@30T7mtV2uT67lva97&Ek*>R!+gzHB4&p~$
ziNd+k0j5tFYAZL5lR-92$dkraq<C+~hl7j|xewN0M+^gY&@mY;wwBiJo`n4Jl-3V;
zC~>{~yCRpLqu72d%qdx2x;&w-9Sq9x`x{ZpnV{J;XMHcp*slUPr2UxE?VRmlRnY<S
z{*_Q)4$!SxyxK|Sn#Rxps;`TE^#+hiGe9(owY<eMbmnB~eDH2+c@$__lWj2KvBLs+
z&L0la>P?4S4X>V72j<x@!?Id1JH71ozVP34hc5VYZapOUFZ5hvQhf{H2UeQ|pdz_A
z)whUe@?YO1xb<Vu7g3ve*Ztf4|C}2C^Dw=D{8A^Y(rGT6vy_X;>WYG<nBYoVzU2)e
zq;Wn>V3&Hu7;S@B+lkx$<+XH-PuovN!Qm;l8WH1Va0uI~TBShK1wN-&c!h(Dmpy|K
ztFc&<Lgv@)KSq5^2vf3#%>YAp78GHNV`yG6xO~K&n^d2uN*ty)M!qtLOLM3>QOo2l
zo}=yrr;I3p=920a8t9Z*wL={$dK+JDyZ~R8{Ta8qLVj&CYbfQVbI6#zCl2xROTj4i
zKb7r9dY8T$hAh*Ax7Ex@22EPOAm(hI`bbkkilnMC#La?*Ht>i&#i40xTW90aS*%OI
zclivnMpSY2nG4ko#1?t5_CdsuLHhGiXPqsAcko=!W*mwunqjM@iGGKEHsaa;Emz3j
zi&zZ)mqrddyZf>4YxiQ2pU>rJDdCE}f(z_>Bqo)Y`Ds7!QRnn~yNBgI(d_M_KL_%M
z+MpA70y)p1dO9%ypYKOvE)eyV+n=b^`kxPS9lC$v&-=-)&hDL#{tw3+=q9Zsz9sqh
ze2;YkxPkwsP9;{C3~`Cu4St08Wl@lfHibfp`-OsF8I&wH>~boHL6W=JrOj@?vc<P1
zTDqElV2}qwYt>t*Z^}A!pwF`7shEWVf)WCqtt=8@Yxb4{KcE<!QhJTsxoVhbUMi=}
z33{Re*O6%n?7Ni<58H8E*Qq_v{Grs6S{rOy9TUP1MyFK{k_<&mg##aXj@+9jB9$GK
z6-UUMhQ4w=nYAd3c4+b<#UdM1a4_Ph&PCrovOPb!x*U&Vc9WHG{+cw1v?bssX?K^K
z(?`HEZtVw@R;w(;@-aaXj=m@D`+l{d)DKXprUQd)Dk=E_DmrYsZmYc{<Q71&7n>wO
z2>e$w&&Tldl`%dy)yH?~i@z@=t&v{u0;J?M(TPJ2Ci^>|^{rQ%<6b8M-W$@4Wgz0m
z!`c;^<$%uPigZy_9~L+--GaZAS7*-ds82jQ(+eP}|5Ywg0eSnD@z#J>DfK?!IDr~)
zO*6_f<CWljB#j=TlXQ%W|N6DFY}74FT%I8vx2DTsD3SlFCA$6B>Sc;v%iB0)<#O^+
zIg|{QD6Xs7q8(fz{=V8cl*?1qn$L&?Zk_ueMl_0|HBSiZ=JofKSk;1!1mYEGCD)_I
zR<tHl+B&U^b2HQZ?szolgG#<H<{3|)D1a3r!R%1H2v{DHD)Qc}OEBz%kzL@Es;gs5
zUFUTJ)5MpoS@UK3K7_}<0vMZ@#*O=VzIF$B(q}PNzVbl{4`Z_zonsO#H7q(ndfN*G
zO3T-MMfnZ8jeow^I=$b~X}_6pNe51;`X^d@Rlk%2ja4K=dgGiqj~1VT>UcfK86si3
zvu$rco!ly{%oQw_mYi^CHxJb0J9LjGn?p4tfXP74%|hZ#5<!$s9{#3nE|Wrtc2N4N
zfOnU0_KW)~EsPHe^z@9Ah;2$?@=vaNYkb%R`9MH-HpIj~*yny><EeH7f>nE)^iZNp
zk!iU%H5_*vB>rp_Y+KuM`Be^-!>u7fb7`8vw*AFWx=vHqNg4O`cLJ*6ocX*$;|syR
z$h==MWa;RbuY=t}o_)1aeVz8hMOEaaO)r9oW&JHwp8u_}Z8I;53~*z-i6XvSUirB|
z%(WLHK;Z!zqD~|3DdH)W*K;@CTT7+jaUcucQMpJU|Jqs1C__hYdIuq}s;`!4H!~@V
ziMf1CVG^81_m_;Qurb+{tG}TG!U6f<6)iaED;z$H|7C%CkpT<8xHncAi?@p(k>a*7
z&7~<Jn_uZF8?BOYaSmw2P~TckStT&is`wG?p`N8v!B@DJ68}g@AD{Irqb?Hu;lvIx
z#^3$}|Epls7`yKYs;*MPE~YMQYMjzSGnPCZ%VK<q4KiWaX*7@a`ozcLRpUl>4xUQa
z;ulB>>}-@*rIBRvuvc1s{EJ3r5Jc@89eUnzyev6LIx3nT+_T*}9mc(8RO*!__R+n3
zp^9w5xu*b!=NmPmkV30liQffWCxw~z)HUwsn&#_`<oQE7N^heTg0;SP^RLaUY}~)~
zq5VHJ_US3)mju~pD*A99T@0Kt;+@EB#U1pwsH2}>pJ!cuMw^oR0-%&fyYlG1qyo_k
zVt*O(JT6<86t0~wbc~n(i;&~nLu~POD~QaCRqZLv2bz9O9&O%zBK=?@An)oc@aij~
z$(;HT-uXW-g#n{~r`u1j%&?dF*+Dg|><FWgS)d*Pn}4IT(gChxsjcwd61QW@My`;J
z7t$4%<cNpX55zs;^6j?)4Q7%UHzvC0*rQJLM1dgzZjAC?>~KD;8-DetdwyCHVqVS)
zXc{Jy-=bwKC$mu9zy;;tqeMh5Ha<WOxY}hta!PQWa+Uu!L1#D4i6%cyNcuWld>F-}
zb4kaORaD~M6naLUX{&UE+#;MHKPJQrcwA%EXEnxWs+FDj!YI`wgx=n;Srk&GG|Ro*
z?SPM=7_Em?2UQKqGEzy~$%#%nViGn6f4~fDrhg*#ixFQYuLlV{LnKzH9=ah;@;-DO
zDgVl*2cB(;g2)ciY|wwrx`Dcw^WIjwkEue0dWi)ri)_iogTN7=0U>b5B4&3ws{|mB
zxr>Zx_?w`?l|Zd=?i+WP^bTw92=W#_eQ7K#RQL5Wjb?hw*%KxM*%^1wh_v=gaMJ`k
zvW>yLUT1*EX9SyPaddzlL9tg6n^s?uohg#MZ-oFa1$%c~9%b%?qS6p*7q_tSvmrcq
zA)HPmiJEJgSs~ih`%r?{(t9Y?ASGWz(V1eq;Q7!JT5}>LQdEL%j=VIg#4nKw-3ue*
z3)VL<>}t<6!QRGJDJCbnhmGqtwYApsTHE&zoBHmib|WAi<#SvT3h>zW%TIeOMQB^}
zF@_H@)x2+(f^8Hg!((XivCATL@oS<F2f2fyhycJ3jwXG!nBh!T=(@;e9Gpql<$dj}
zxhP*waOyBWHG2ejhG<1@nH+H4`g-N^WVO@*v^)XIsVrn)4IimmoLE0lDuJpdW?@6G
z%uudF?JF{#khbvO^(lEeG-_%}wN(;++}{Bd5Y3Wp{bW1|V%q%fz`>|DRcpFE@4T^6
zt`9RK(~szv&*1HB$kixp__Q~R98C7g2YmuXvGp4#66*<01Z*V|$&e*BSJ#&q@C)|J
zP@$q~y`jd-N0K(djQ-Q{!Yc>SRgKru+gZ^vQkX1gR<9H#Nl8X>r=dojO;qLYnsoea
z!-SXq#W<h{viU9L1Tw$cQBXc#9JMT+Q+MBFB2;Rt;>ToZ5)t;+cQG<OTB?Ugwn%wd
zE{hAyWu$|dEydaMgD=ul^WQN9PX4F~*~7qjA^>CFO}>iYql1r$>7%_v3+g2GMej?S
zSn;7$@sfT+oqoHkjns!#S+Wiu(i+kf20-S~t-RI?_cmRD3dEC9G<WkPptC#S*-0Vn
z<5z7h3}z9WAwr#6a|aB&I8={pV;CXb&*Z@#edf<+sbS~GMx$B`I}{i8YOqo7*vq?6
z@rruf2$!x>tJh26I2$p3vd*N+G^o1?+AK#?h4)@(Bf%F2OI0JIA$MS2OkRGTz?L7W
zJfR717CVKzIJ~ArGJ^M1XqtaNkl!1xym%3i?}bWmEE-z`=?r=v^>5&FRjQODNSab>
zej&D*S#VK@!zwJT5$9+5t8>i31#zU=a>|tIo~ga-az5E6x~iQwg#hUoemNBe<1&l9
z2r&3y0PpY7UYJ*X_4#e!;vJ&AHakimU+*UbiZ*;xl#<M`JzpMI*p)0Ck+Q(}6K2~;
zdgk`ML^G^bVcCMFK_EvW(Crj95q(K!mLYsD?P5J=q$nQ66Dm!>1?p|eb+<ej^>DzQ
zjFZ%wp^GC3(2)Hpt!G`6MYd7|q=i{J!l?+azlDx+{1vpyf+HZUMZ3*7we!aJ;X6S^
z4RE2NFc+PyjPN<bg`|4W$Fk(BWoYS-Zfp%B;uqnT6(#t!Dh4R<uiye8AA67N?|7Tx
zL-VLHy!sckqRvebGDZ@`9mSiE?O2PmP~J!cj_t77o1-9Uai2`%PT6$=eTR;(i2a|*
z@y*NDKFN18$hHa@IVpAY5%_`-Tt-C(@XBQ8?b`cRbg7^IiphA$ZCW`alYix#od3EP
z?*)4LsH%L~Cn7pT)tteg7W_SCXBE@%v_3-1PJy6+d|bi#yMIdh(&_L)p@kNo3$iAZ
zO@t8cqZ@Dcjfg@JJ0QnG2DK#3KZW(M7wuMXp!nI8{uRwmhWwpRK++&9z)2_#5Wcz7
z8WnA56XYq(3eBb@>Gyl89Osaot-z=5bE-%^H)27+obGod+smd<G8kMoBI#usJY)w%
zm2`VVJD)nw3}`YocDKpbO~{!r!usdR<N@nEPyc#0{<Lmv_}lDOY%K6Q`2Ep<%w-ek
zjn4vpMWh{{B9`Vxkto>)8GysPmk98mroW&ZkHlBcASfSEq)La=4=3J5F&U7t6d3|8
zi;IVgIDqE2HM&w}q<ia^8<&!^8J?)LRHX7=HyH0c`YD6u(qSz)ZpTCEpuCx<cg=*L
z$htq0MX%^q>5n>Fwz$XYF$nWS9XA_C&i#|D$`DqedEW)3HEKi=XixozIbZO4IcdtT
zmf+oeBr0K+!@XUpoL-X<sO<-{!gz{yttJPBP0Pa@ECOYku^&35oCh@N^$XD4)s5rs
zPIF5aMDAYrbTTy1pX>X`g7(b;cK@*Np+WxrQR21lxu9s#E`lpXB;xqdMGdT_q}?m%
ztDd9rivdQN6ivAE^F=u>ZKq?o-cAR$7KW=*Xy+jLPUo=vT!m(Y^N?INS|Fb$SoC)2
z$bUNAR{F78Lp*yUpwV$qy+)^00^9N%or&vDc()}ufR1=M4|(?~z$`x`BH~nwDVCD2
zgTY_>mPI(_<z$sANzhg&>=z$t{~A}<*_Vxm%Gw&?D?N~L4c58G<)q+gsV(m}#k65s
ze^@Zd4DwRT*L9R7JM{b5{WU{nLIfrsul&Ls(1>`7_$;ZFbej}!dY^v<0~Y!Wzhf4x
zI9r(~2%rbZaRSEhBXy7~kTPl~)sHF(NniWeklpRw3NB8$$POtO;W49Upth+|iq`k3
z^xgl_;#+&LaA!o@c{OXtC*|kt6^ybOYTiMwlsX(kY2tyKQMx{cO1mhg8VvV$5X8r?
zE2s8c`h_<VzyZh;0hE(j8^FVKg|xsKJg1?k|N30FLL0oSalZcs=#LSgs456_Y)pC4
zJr9ibf<ldG%6t8%Y}B5S*lP{l<)-6YpAz!7;DDI;y)5*iFVgp)0N&m}sA<4KK8}<C
zX-Ggz&zsGfHMTx0uVM*;g_tj+LN82!dN&IeyuQPO?V;~ax;WEUdALZMMze04tn{*9
zoFdnlU=nr#r8@<((u3rpH=9I5v~^YD{Y*$-axJNmW?>hzS?QLw-PRCx@`6L}sV8->
z9f-m9?tN#ThHW)01$_?r`uL2jr3L|@J5UTi=l|IEiB9N)&+LJcxOrYrH*8fdf{wTx
zS1f@=YHKeTLo{#+Kk1htGK}EukfN{Cldc4KZ*7NAgr)O1w)mDH<uL?@T~_BY^rbO=
znnfVky~Nc?!+IiLqphZD62AWe>UgIu7=Op_0l4E4lqaK9v#lt%R=FI?L3+L<xoK7x
zn#lG#=v@4~MQo`{2=2_BzJQWTVi5((VK`1ww|-N+QeWTP_za9ua%Kv$KRM$9b(ihk
zp%v|J;XCe`ESErEc|HuctSm;PxhNpXmCKP><j#_XqM~6W6-E3-iA)Rl24ZeXR^F{B
zG0<nN#41gF)-H3zqi7KHG|HylTMBJj4*07xl%`f5Ol$V+8rJ>RTj1N94>HdIA5)mp
zZSo)<H=f;=e@?kpFO4!<=u#6%A9tzXIlO`_;f_VzB0ni}+7(G!BqS7PSmN>AXUT<I
z%R8E)O_yYX)4ti(r!k0YWPAB3H0_9^z6$$&3}Okoojc{zz>{wJ&KYQ7xz8E7f^n`W
z?4!IFa|+*kr5D>S!Hh~3ym=I1j+!Jxl~|EtIi%gg=Ha-gFzQC+{dP~!d4(;2+&XE1
zdX<>!9}7flcrM{2Z`n$&KVSUbSR!8LKrFr8vMM`YAL%RbKj72<EBhnJPwj@Az${y&
z7M=2?TsEZc-b7mGtTk#HpE<QksztH2N1X=ET-fW)eTuEWXF(jr^KR~5)C|TeM+4)|
zyMAizb0lL|n#>kMo?o{c>j5SR@n?D{Ex(EQKWd*ud2>(EzR}mWagb>drFJ^$_U|Xj
zKk}2#)GwoCg$1I-D!y_#Eb3<PD5HPiBMhBIdZS+b*(y;saP|H=)g?5^%CygEKW^GD
z;JzeXpPOXo^(IX{XnJfV9pR6F<dGJRN{mse7M^Q?TQvnnUfhid=Xg5rE6P3$8RSlH
zB${URKA93o0YhJt-PjNP{lhVhLH||zqd!<<6ul8%-Hfy->S{5l6>@x+1j7zap&-nh
z#s<9w3G2#hWGwl^(t8irZ%<#e@_}y(a!g;YKOa9Q-82BhUX}F!lI&;~<P`|meMa|l
z5j)r21yP9lYVTG^Bgd#jFn<SlVqCvk4XCs9cky@sT*t2oLH4`}IFG+&TpWKYp*63h
z=fuZ>bN(T{w)anzpirqb!1jo{4PB%ZcMfn5Lv0&rtoxuCR^FJ5!d$o((>!p!Dw9X2
zxe+8DIyj~$M_2A5AJJq7GG?AubU#XwnUoc$&spjOh3vS@kePReDz3tvbqhdtB6_NQ
z1$Qb0RCdYK?^I+qMrVpn$WE1#?pXCh|6IU1_oM2GjmNW|4|sx%l=h+VkddRW1t^q)
zMc-48n(4<bGOMfmNB)aUEj=Ex8lRkW<2RszzC(_)ZZbCrxc)@h7IsjoH;t5ol?HK#
zb~6&bL4_8cP_yA$d!ly=`}Dg&^#uWYs_7xmsmyWq{AoK(l2X`Wa`A5LFi?ZMouxO0
znv1yGz3F`re7E~F=(bZPDy_jY;Bna|SxhK&3menr);>b$R`<B)aC0j78p3-M!UiCD
zgOBfGbHR_qfn?Svdl{vn+@_jIk1o@lm$^8CB|+yMk-I15Cpr^$+aLp^V@1_~E7cG*
zpkuIi#jl?b1LjaRctYv`OO89Ez9HJm$Z(ki5}s<`@(BD89~7hG46kmh8M07(=_oqf
zl#+FTD?48>!`Crrv)HvSE4m$>QF9lc)Kz#gaZjQ!U{?R;m(~P;`tx(&;Cl%x6aZT%
z;Q1-eu}B$-ROgV+Ei~&QFL-63<?zQ@;Tm8HyVDx|Nd`o`gdc$ugU%@;A{x}?Vi$fH
z@<+-g<hBGe)jGYaH&H{eRz35JBThtK@A&gSG7FqL%-<<^$mdsgNAYBufdK$(D{MI|
zaO9WGB799<g|Sy>6;uFGtJ9$607ajoJ?{%OF{+C_;jJ;`6iO6_x&#HADd9#%YIDQ_
z6t8Az+8YHhCdueLx*ojt{!w&pW})rSV})rt0^b76UOxU6_}zToJIJi?w>x)O+!=yh
zJATUWDbp10ME!o)`SG<m^W(g)R=BO8q|MQSfKzi?O8k!Zvw2n6m!yM0ODXG+cR)KQ
z3Eg~&<D->$+xF+Gqv(P;FZPsPM!~pehK(f^(SRW@$qTWG3H~^{qdzVZrn2<&OGl$T
zGBaPA+C-v2TLnACTIucVU4~J|j!8Dwb0S&ob~sx{DI9zBgt)!~8VUH1^FQ-UJ4K4*
zAXP|n_+XqoqA*;-nuV5LSot~lx0hvW4Ms1ih-z>aG+r|3efjpz5Ac~*pq76Io?;)$
zs;NYtfDsWv^Md%%H{Z^HCkOaLZdEua?vK|F2zl%8pQ4WO`-+-wP(z33_e=bz4-WC$
zMQyzMlbBYo8*JrLK4G*k7A0vx81C;G2H#ny*GLA(+1tB~3_3&&8oFNePyBNvRH7M6
znwD)7+ZsraU1<Ia4!3s9fyk<iyy=Ox+JgRJ(EwhcNtpJ~<O^{7Z9nv!S%YKhlX=$(
zi;jEJPfx|q#!yI<aTEKW_fSe=5+ccXCd5ywtgU&DixLof&eJXl0~zwM1FVjwxQt9>
zqQW=-03A3??1%qvm46ZW<u^jsF$n3bGj}i6Y}FKg*H<7;g)FE|du=TO`H}#NeJ>rM
z+2<2{w{@x?p0TIPca27W-O`Y{IOfEN?H-mDmQ>O$*|qlbG=B;iCXOiOxj6Ax_7Kij
zvg3)y#~kiKL)_82=u7+FfL>Sg5)4cuIxW%pF<y{d3|}hp1&WXJ(*@dMPUZERqhmpE
z2Qg-XCq8Qr0~Yycf2%c1Zvc-DGAv(+1;pI&5qXPEc<!HH<T2E!yQ7sI)W^yeax2>E
zR9R=(_he+lS?y}puWmP7LvzYB8t7&B&35_ndCER{=b}HH>v>Nwn(_QtmzHo-QoC3b
zOF-BMaA<vVK+N86A3L?LKzd~lCF<>A8jH%;?>+k4wES6D<YUTC6Ouwa!fG4(#%@1o
zf9P<3{B|8!GIc2@0utc9`>96qO)XnMzPnMtKJHx0Y|5<T-C=5ZH?Nk8X1iusiqmDM
zs|81L?<?PC1~0VN3%}>dKgmYFvG6Bci2hrwnBPRz9wsCj1Z!)MM7*l6k=`+bR@KQR
z1&*i4f8Z)dA7f&-ihS1%8(YbtDF$gJ2n>CC*TAtj8Hr^n@7_Kl$H%?%n|Ju5g6GiJ
zi@^*Gmlk2Tw?J2Y(by_muhEc~_mE;RwaUG$3S08*twPXwG{VcIL66w63?7*<boX=7
zht;!2{$(r7yC0~=<hWH$p*tm2+TtVq8Gd#al<0*7q6U}iMYQ;qexp;3$QtK9&V8O7
zSDwEL7)fh|1Q(~U-_J!*9#bOxm?k#QXdq+lAf2YFXb~0Q-O-stMSnbe=6>}ACob|{
zr5$$%1XlOqvwT?$VXI|9-JrrGxc!*I;K<gScpoKnw0dG$AzIODZ)Ov-zK`KyZ#f1B
zWtAB-4=St)3J-&Lq7%%=i#1IWg^dB`u%&xnU7R~Yo>?9|t5-#GOL@y$&2&Z~sUg+1
z=m4~wdMUxv$Aw<tMDpz;+9q<@F;-hI6CL^HV<F5et+fBd`nu_j8h{sd=!&uT$2UNQ
z4sHy#6!s^XI=~_;9l+M*W4hQGVT7G`<82d88D19}VSeJ7nU`FCYD0*djKbKMz1Foi
z*YO7LN7|LJIi&YBW9p~MUU~R3x<oK!y0j_^1gNU(jo@SEY-ysjvZDZM1u}X$P;3L^
z!`d9L+}kUvJ#VRs9GaEK0qmfRF7D{2rWM!e=v(4{mbb$GNXIl%a|1N6Dwhz|iqw{I
z=u(_$Pym6$n682Mr|nVF8k9N|>(3LlRly~<utWTqQL@YTimy;7$}E!{crPvbF^M|P
zG|x$-8gie825q*b6)Fx7kB0imQ52&ius%e^5K>~6aqx^aD}jUM-7o3>E+ufe`0ur>
zOiq&q#8N8yOO9%Jmf}p^7RY9#JmQlvPRz+^J>1a<pT(v`AH2y6a4_-OtyKx4j>2&&
z;7)>s_BN)=_#zm5r3pq6!aGD^@}JL*7&_j<<-s05-dp8fn3+B(pI(P-xcq|N#44m3
zTL+s-Hc{~2OtSV(cwQz}pi=C0A3_s?zmq(4D)Xzn<c8xXwgn&>Q4sN@BqI$I+FOUI
z$VQIWh3#B(RN#e`^4@^SjK*kTMd2&sk*vj{HXkl4G;7q<6w9{2x#Rri0$p>hJdnwf
zX<oYJgKTs=>uVAcG|1}N$q@1(M|kvf+T5wR**QnMb(KOu|NT4upx6vP=#i)k0mnMO
zX=j+kn$aM8k{A4bmn82s1x%A0VzEjAVyS>J#+B|+r3x6!G?k3BE)A|pzh>}WvQinV
zf2EDlLkN);F-EC;34GDfXpCK?ngQ=JkR=B2d8kh?7tE!m8Q2capH^{uV$Oq_2`=V5
zhl**&8rrbqZN&(Zbl={;aq4d}zRj&veX$rhnKkdR$i2bucQd)a7;%VSpl)elP)%?M
z!1!t3UH>inb_I5cQ5gL}r$P8**W_?IU)tU@VpyE;Fy!j`|LfBc4Lh0T=U~O6lFS91
zB%ZGrX=xs40B8u(^oDnvv~(zda&1Se(TDE6So|lHZT?07B{Zx8+g<;gjt1lQu^U@B
z_)$Vg!XKMF9g(45UcKH{ez%Ay=EFt@9FKWCNy4EY-o`1$kMeya#^=TxTN))ovGSIU
zXxEFvgoJto-HxOmR3!t^)__FKL!lviXdm_+c)p-*a*L6Yppjs0O8%PT7ezi~a2)#f
z3F$r_3lu^JxN%YMNs8rwaw4D+*df3>HJnDF0(i>~n$hL#xR$ion*a6Ba1nP9pi@(a
zm`f#T;$iM`N@*H7$LNnvd>0?Q(d^P=XMoHBl8J)+16mAjIcV*Xc&>!p(%xYfF$pVL
zR3fima=L_fagG$%gVr<@C8<DLtLGlx8}t4Tv4RyMN_*377be&60?5S-9sbb%ijm6n
zL9W=#TwWJb1WI8E*3x7u_}gf#OrgxR7C_33OoUSn!$*=`cIcs-A5*+V*nA0}5a#^4
z)JlxaF4QTRXwQcVkC1fAkiHPXV?|1KvG$0lnRHmZX{k1W3}RkK$OK{_FVc6Hc{O>C
zO=Qk{S>evd4ADh;F|9H0;#ki<V6wDGuqW~JbY;*~E!XFsvoZ|}Yhg1zQuS|zNp9#w
zk%D^P*X9xn_Sex6NfFJ|^D`#jkSnyLucG^wa_J-k6(17b-G??6;J?NFF9KUpRr+!k
zj>R|Jimz`CMegkI4?#er8iGkgU|+PG&bDlvJFCH)APL6gn)OqPrLk`91>q~ap6%c0
zJrDsE|M+R~fhDEnho~aET~XMTa)0sYc2TdRo9Z6JD%p(tfxX1M=-6prHDkEIP`Ss)
zA<v+I`M_BJ%srQE@Q_msuqU~GJ!wt&n6~WtQip#>rLFB6```r0C)EL!6@BMIBC9_U
za%mDY-Dv5PANzKb`ok;gH4?Pw-1|!8?>ayqHKPsmy>f?7NoCj+j4x8FYIu(=(G+i3
z@+1>T>R1CNvdS<RV@WGGnoFxflGw19%GX&mgqZ9T#0Y0!wN(esKOf4Vj~Oq2Yc-Li
zj%}(x%JP$6q7M<7mGV#EkTQ@B>xjc+{pT5f5l@b0yVNJ6)X4T`rzW9T&ggz)neRC#
zIEAopk*P6___GWJazG%9u*D95pa`qJ2{8k<Zxj%bw-4^-g@yZR%PrGH82id3MVEm2
z&{VnKGPY2Oy#5gUe$}B*O*SEiA52<}*H*3}bz#|qORlbck-{}c4WY=ccT}c#x~dhu
zA=zl=uuSZTiTDks@@DMX?WAIVQ9}P6e_~3Ov^&Hapgx*G-?8%%={NG_=hF)ZKo1fu
ze~`9X<IuYYUgIKlB782{uEfloLmn3As@1nn3RWxQu*Asti0wev>tI<9J9IKbA#SX9
z^&l+PiKvSXYUfFRw-F%KzzUfn3nbGX6(U>8yWUg0pS$<^KD`q2E<0UJ1OpKIUPp|)
z-pAqE&6fP(iP}t~@#iY&ZCSvCUf>qhQ1~IzXNLD58mmR%IN2GZb9{U1VM$S@@8ikK
zyTnH!U##q|VFrTXlfjndJbO$H6V!S(*)9u|M-gkB0mo0K#LPfFs?;OfXXE2Im+|v=
z4)$|36e?a6MEP$P;WSX!9@o#N*}JsMMzYArea1|9L~pAnjhv+Z#sI}@Rs<ZzVD$I)
zIR9#yh&eIHWWS&T2|j8VjYtfh4&9A~8>u9|x!<pM$@BE6xKB2Km*IPkurQW=zg*c4
z*g9_stm6_G9m-Ry>=4d9zu0hv@+ovL>_t<{#?k@F<kUr<(}W_AI|PnI$Bi#LJ%cqa
z0_w@!Jw1o|ZN9i23`GJzS%R180}x5F_7_aI<wxyPR-SVIsu@Q$BzZY2ah<9EtJCHg
z`mfU-S7i3mF%fllAApYF3uZYNQ4;)ZcNXzr@MR1~1VW^OJo%I4eBSi7Qq?oCdApt@
z8imBJlKTj3#IA`nB=sycHk+6ve#xVW&DhRA#-1VjAAEgxJY3<nb|MkUM2{9VA`)$g
zXwicp(R&$;2x9c;o#;Uj5(LqsjWWvUEm4Coqj#c<HhLSwx1DqEJ?Gpj-#^>-`_0~O
zS?gWvS?gK*eJ9|mI?0tpKt;_PU;`-qyP^d5s)kFh6kYMC0R!9aRY@!Ph=&I^;eTHW
zZLFvI5$mF{^7x(5yEJksSiUBzn|sS!UuQLZz03I16vo2X&nSk4X%qC5{EF{D*A+od
z-QhsEOfe}3bw=uy`9~k<3>4xk#>AUWGeSQlFQnr2sMf8Qamf2dQ>sb~c`Xf3U^CjG
z1*AIT0zT>hQ3{iK15>1$Vx!;57T@=+7Om?lFdteB^lPOWpDH>Zn-YFwL-s72?S-(>
zO367rULvWIP36%}w`Z=AqeBd>v<c;lPZb2mB|Zb~(--@sI6BoE&6dD@)C23AJQOi;
z1Wa7`fsHHWYUYmFzeG%5vO`JnrNcY@HEXmra!v>n<E^ZN1z$cz&POpZ4523)I_!^?
zV^j^KAKsJVKAY|7yjE5SAPxKb(QJ6;hh7+6Jg?O+Ma()u+7mQSVDsTZ{l2VBC%-vw
zvORbq|D>nbflkQ9k=+ndM`owRrS<ajgXrMACue>=%O)(=q(Y8Y8*nFU>d}Q!OiNBr
z7T3&1!hH>Pw0sV0XwxP>(TixlQ9^hNMR3>Txs7L2=PJKcp}XIHRHNX|JKMuyXxOFM
zR63ga?LbS&W;KJ`<v!%I$MN~DeFJ_ce$dmoT-6y*>qXdD+*{b{**NdiiQ%=BS~(Hq
zkV)^;sM{V=xE3Dj2ah7q-p$nv$s(Uigq#A2K04Y5e~j%8c2r`bKjDAcsCWCy=*e^y
zVvexoJ<F`D#CWgRQNls56iPOJX{hvN&gNCUYyJ1d-=RW5F;1_f2&!~hPKv~}>PWc0
zoje{q%5cL@alWE?3Nm|6ek<m=IG&QXBy+%n-W%8P28xu=gvV`)XL_pY3Cx~P|IB!+
z9~+`*xUsvqxE}8J*hAk=^$XXr?P#XwlfrtAy_>#BSAM6#?4c2z7i&&4Q0V44MAgMk
zG}ovcQ&S||!d>7}E1G3ALiyC2gT;1C<vnxPwV<BKkKWCKQ|<>uHPj!hSgq#}Y1OR4
zf_9is<P)IB2MqJD5&Zc+oBp`%s{1eHqG{>AOMLfq#jxzZo2<J=DwJ6=T;XhbwBG~r
z)MrU09?y>Em-?m*^5Em3yGnXHHq-w6o=4Oi)3d!wmC9wEVd~?_b+lZ5&zxri=33e}
zah83lyV`c`da1-UpGJo#r7EwS^#rEW?FuJ`3NP53_8Zk9w9~#3B~2JP<b^Ob)(_jG
z#+OY}lXrI>C!Q|}?zFn!w_7NZ#v7~{3W-_U<~SEZk@=jyR@%82NheUe?%;iv^Ij(B
z=WP}<1}}GdTwleiwV!qTAA|gv)))iN6`(vmBrRrkgJ~n7_v~a9h9EYes1=es8nb~J
zJ?m0t90<0bH>U8EpeA(^>~ii<`@`~C(B1MswiuNlx!8P3Z~r^b{l>3I`8r-#7612F
zWoaMx-}sg?aK`wjW+=mO`C1olx=ZY%+3^GOC@s|QiWY<Kdcb2X6MxFZ0wsaEJyd_D
zF;xG%!#xpmP}`MsYPBg=BNTO=w}TCw%Bg1!sD0K4_VPHLzI#jPy4H<XxBZ`&mQqxE
z|2W2fYd`Ukq0qiwKm7zdM#j^6N4BMPp+<x~Y>bRjT4MvKf7T*(J?u7M-Ww`pRpg)_
zJtbvpHfT7?#n!4p#>`2sj4+O*iTU})FlpUshN<7Txs`Y?D6WwZJ6_k6u}gKOvlciB
zT@d*8eBr5JP={s{)(112hReh%b=F9G3Hwb8TTt=_|4yVeS1iJ+o2%XW!_!7ob~Q2R
zYr0;{Ag*HJLB`%_*&AEwVg|Zk8CQy26=6*_5wQ>Wtk>H>ZE`;CNHM-dx06EEo56>#
zQ3;<&<VJbm_g<YPj`-HSPX?&Fmvz9FO_k}dGTf!`uI>dBOw`y~i+eRlo9;_?hrIJ2
zoL6IV>^X``eVXAo0_;U#5F|b?bf=%-LGzb|iZK?gH!oOCkLDT<;}x#u`v*yL`><@t
z{qiZnO@YXAy;YXzqYpfO@hGy#S->eltTj$SlS#>9+_|Y<u44+lG@%0d);lxdt?9%b
z0^d^1c9kgyyPO-Hg6WfKl8xL4l();vIUn4W9d>e)jiJ~l>wmAYWNj(Kkn*y2hUBQT
zoBm{;KkenxKqjdpum_>2Rf?s#sPCJzEfSN3g*Q^M8}S7XU5DFsRDe9qX&$d2!CceE
zN;_*^y?nKw74X%g<nG-oztzs~<SeGoo3_`BQkPf(XJ@>r1br-?^t9D8IotHky2f=}
zqo(y8CR!ImjSqdJTYczG_TA~5QFqal3v}Grhek&i=Q--?My=&lo|<P|L!991`o~5F
z%yM0odMCI0+N{!VHjO-2tL4L0H?mtPM4uP)8gsRtZLq8Ve({rAsC(}tp=;^wFd$fJ
zox%`!R|ZZCjK4*O)`}%Dk;Dx=Tt(k*@$K9GWxQrWr)c`P{=-fjQgN(z<BeAnS|(21
zhU;_0$8-(6hnfk$RKEy%(251E;5Bx$Y`=0GRtr)C@i);<$D4p9*AcdawFhpMQm|Xr
z!B@8Gt}m`yWuUzgv(!v(S{};gpL}=BJAWA*JwDquk=(916RfLT&e}kARQ#Er=7TON
z-=-e!En=n>RN2^%NfUL^Uu$Xf7>FxUJq9PdkqJ6Bnk^fJy#H=P|5Kz~C{27Rl0cH6
z%<aMCBVBf9V7E5O8NXuK9sI9?wAg6TgRyU%EyqKfNxMd)d^vr+4eflkLh<V>(C2Al
zaw70ZVU%}1Hh%OSqC(Bxrhm!mSfNW-)#8&KqgAxulp$SyybmJIN=Tnvr24=20vK^S
zL=>;ws3=ly9%Fc#duZgdWj_?ZlVAs?7cqVCX>(apUw+l;)eNNn)GY7A8BY<h?T{dI
z&qs}33kx`npa?CKa7poxng3vn-a21wVN^QdFU03EkW4od6IDyr)$^gnL*B94^;?+%
zZsQo%B@yd-nw4fv2!xo_Yxv$?&ye1wg6Cq^oWGghxsbIUYgnob!C|&YuB9;q(MJM1
zO}Bemz1V5DYw6Jiq-GT0STu`R&@T?kZn$=XvVAe=YH6SA`R4?Akj%puu6_!!>9^go
zy>E3dT8(X;(BZFN8(yS|VKSQs4w2AdwYVifgl1sFe4&x{k2Rf@rQUm^5%u5!)sxvG
zHfvH^<MZ~jbYmm_J(;H1L5V9>XKJIrL=V3r*FJ+bp6l-}*!zz7=%YNY_z^fRwgwfG
zXM)ra;AR8$$a`8rMAvdy?$=9twEqI-S3FUmx>7flVMJ5u?QvrjAKgE4OZf0;RJPw)
zG`T62pl<N$&n@}GrgpqBS;_!26Ka!ROI#5f)XvO}5+6Wj-QzCVX2-8O4!ZnVt{W@{
zyiAU<V%+ac5H$adjoUakn{4P%@xIRd&M?a?W<b92?)<&I1d$300K;(J5_gG_zzf`~
z!T$Pi@_-{ojw7$ARv)(}vLZ&rCjs^&lFa&G;&#TWOFSW(8<{2OOUm?{;KiL)2Ge<$
z<pp{1y*-3Fk|v9<$wxUxvqSb}tAyjajbTmM&17Pw)xZ~_CD8{Y3|(#uhBan+aXDa!
z9_XIgr%7p1?6-tkWXz1##A;bgM|#&Oy&peal$NZ}8)KC^IV+R}4zk~%To?nAd_IC@
znR0_ZGfOXv-X?pMVG1iO;u2;;-!QX&XD*6Zw<G`XR_o<y<8(J_GXaO9KkgTCJF>HP
zVH({WmP5(d4i2By7Tn+1_w}#A?biFg(uX?b$-}-zNV9$Q+&R_Iy2*PmK7r7AnkI0|
z_=fupr8QJW*B=G!Q5sj4ZI84xj(#)G$tlTIWN^lp$=5c{z+5!nD>qe+_L}`6YP-OC
zmqnOMf=k1+cTl)9w7jGWHl7J`dzc%DekL8#P`#9m2aRE1sgUuIcS0WMo}Zj`xcQ?6
za(pb{)64rU3`4nRFT1$>ddp|lc8lH}*>S;teO+tdSzEcXh9j_U8eTu_I4iQz6*hl>
z)#8bCIKOgQ<-*)FRM2}jNmS;#tVJS=fMR6!_Uc)c6;N8|13h|B$p}@c%h|lG2^=)e
z(`1m<ALv_GQJZ4iM8B1F93nB|Q5gRu6#GInw=0TdxJG!&BkkKpOM%UZXYov^)kZJ;
z3D%a;cPhgd@dsjy8JBdPzUoKZ!01HSkWJz-{}xA6=F>#_+;FYxFzSs8afQWJ_kVu+
z_3&an`|wrLRk@b;r;;EYMKRw;q8BjNy6wy>m#@u8#+DH*Gfv~Ji{66}+vlQo2pcaL
z0YxCzG*6Vl4<L9hI_Rlw%*eO-%xxL3cu_vF+{QA7i-LrFLNiv)@(Qlbo56rwVzmM9
zx6d{0D3Px`$EVc`<S{HEs<$v&RuOJhi}ma95Af&0-`~OsJ15RhO$d^NGG(FUUH<pQ
zH880ux72qw4;Bvfs5Mxuh-{17qs>f-@D|)~i8Reu{O{iqQ}e^^jBpfc9=g<6SK0Sw
zJ^I$WeMg~<nL=kHlQ5%1hy%oN(>B`>?Td*{e9GS1O%O+b|MHu%xzhvxH;eYmQ6PtB
z8{3hl*c!+Qk^QpsqY?a}yQXTbnlBj40-lUQe{u*z^1Ofz8{=!qUS>a6piACJYBq)a
z;^Ssv7@Dr16{*DgKDtpTvC!(!@u>bq`@1gLr`#Zb(8hZ3-+#`aZ4bZFOYxM>nASHh
zci*8YXMx1*b};JJJb4J~d<##|i6gYGc$_MEee7<V_DR_L9X0!*sj5d_b!v$n(F_C0
zzT=tqr9VG1{PGd%y^Xr7X=<=&z4CME?sRa*Om|1bGtvBZna2AYLTBF2BUlXOk4&HU
z2p%lV#|Bj--Xk>b<c0Gaa9Z1yc(u9W*?}Oj$c1N3)=y+?oNm@~FDqI{P|-IQex)!z
z<KNJ6J3N22EKve3D(ZD}_gjh4(bD(8p*PM9g4L1|C)phpfBHzbl?xveR6cst;`P$q
z*4Fa-Yfc1*XfspGhIdfH@So7~?%dKE0b&!}^1|H50kfTI_P7E$-&?}=u5x`lu8x}b
z4A0IFHrG3gc7I`x24LxjaiX$enGcf4*%fnMcycKsS-or`E$WqHU$P45)~NKz7Hifh
zWplXYpOPgeB|S)WsPBvi^Yo9JD$;h5vs)W;DcHD{A2KK|A2)oT5V_trb%mA)Z)(Pr
zA@#=FUN2$^0+=Eu8iidI>%^*|yx1ilPL}uMK!^O^H|%kzwtF>f_NvkI0_~_5M}$Yq
zX1HDTldtVwwc8foG}a=0ijU4rlYRMC^#nbcww<b`&5k^er?W90-V)P+D^#gGQ=!$W
z0u%1u$IA{`YZG}rJE+7cT^e$UYraM&zuJAeX6AJcpFmB;+>kdr{<ys?v&H#2xx<ez
zFSdPx9u=IjK#fO>nKZ5Poh66UrsqQ63WQmYzdj7-S^JfIw{0zwfPg6xOD^#!dpGov
zRsGppLce?OuHfCg_wIWo{!NG=bKxG3SMoY^K(TbA=gb_Qb7QQ=>T2R0q*l++M}~cm
zM=;Bq=GOdkhkRNEu;kO|+P5xk=f62|ZpPc5IR8es6FTgw%?K}KC<Zaalvicz!$|4!
zm{U*Re!^Hcw>pGuAa;B8W&@SzSIR4SKH4fPk{bI=b<7l7tG6o(Oqh{F<OMc^AtGJ3
zr-d0g?WDbG<CUoz;PHcGX~S4`QQy?h2z?9%2_|>_==H1LBa1^%QfY#$Z<$1fGSpT+
zM#;HHx2bp%-N`wmj@qpZ2$X~#ZY^saXRHl>Wu3K;T=Q8Kr<%zr-dZZy3iZU)60jqP
z_3gVSv~S*b3f2I<9#rYV?KjR<<nf9#A^t42b4n^e-Kr%)d8gu>3+mXi-P^Zh@+|3e
z>^VHXLWWIKgzB5Ra%=bX8CXapb037;QdP|oW7n8reF7efF(OaQ(+}wn8Y<c{BMds#
zu<(<YvvEtReXgR4)jCE9<Ah$O0ypRA{hDDoN@crVLM-~gd`uH>+v!i*$F0VQ83J6p
zbwl<kWS+9UplGqK{>&Gbvig)skIi|`DwU)`N6}zgWZZ@BG`tDjjJbu|)U7VypbG9Q
zd*2@E8@(;j>P0tI<s{(~YwC5TS-mg{wQ1_ZWW`NHQ1yvGtqYQE+^4e7)51rXW>kwN
z6gCQa8jkL=SAA06PM&YN&C9+#%X^^{SMaK2-=&-1J~-lWJMEsNQnTGF+hmnebh7`_
zD1RN?BQW!)Ja{J3*SG+B>Xlo=IhfBx?V?gpF3MfXXXO7*B(hmUB($-r%}0Cc?QT8A
zEaY@Q*;my%HTLPRkfwM;WRpmlOX^z{Od*G>-c-Hu&X;7r;hI6r#=Kj?`1YqFx5=4$
zgk<<KiQ!3>dl55o%hj1-)eJO(;XyPV?u4Aus~hg-2ZhZJRU+1Oeb-3RkmNH}_7$y_
z3mINn!lsU^Xk9v!nO2{h@+(?vkJ!Y^JLNW3%~MoI^0bG2Q{R$^+1yerQ{E_GWi;^C
z^7iY*DWL2t*qf&3_E*?E*6e?;CWVhsNw|ffmgQ_iNmUPjA=Edlq4A_kw$9JQQ3>>O
z%bFv1l}_ttccH4u^aBTYYljlvL^!93W(*7m=`Jok71cl3$svir=wTpD1`17_g5s|a
z8}#_u8nIhWM?-3RqkmMN*Lbk>>Yk=}?>0>KT6~+r9I1HQCz&{QyCnDu3(WClt|10}
zv&QX0P%0_&Ihg2V(=yF|WW+z#?6ZBm>);`S(q(QkwBL6!B>&-?RmqPxAxYNL;T8rM
z{)32t!^a^E+mNBYB9y|YDnoZ?TUzvaM$t$`p(OVP9TWEU8UwcTVKXY*XI{9OSHymZ
zVRXAzVdU<aT<Ljkk-gArUi65I=Cb-9+=_rWLSS9@MA)!kIyAmtfSTzjEqmFt&Veq?
zcm{DAsxdJu5^EcustWa+d}Px-k-zIu-P@0ie_b8zyXlUL6dO5Vm1{YAzT<;+5;D^K
z18p}F8gEBM73+5Bo{x>?wY_C-qgEWD21{dK@-;vY7zHik5=ci~@I+6rnLWr7SUFw$
z;{AWxDq!&Io>v3njr<}!0vGXCzu$+k>yC|etP<XXjp@%<!o*$guf4`CqhC^pHV-8l
zdfs8~%0nj_YUgU$Ae^H#K2_PpZmfM-aH!feI!-+DMegtC9BTe~zwZ>}R(&Q^77@+C
z@EgU7;Hy#0FM{c8?1vJcMvm_tP|~(K**P=CCaABW*5rJN)-Yb5*77TG%7?}s{m^Uz
z-_3ngTx#(WL;>5n7TvM7m7*8B(Q)O?qh6Whn@ITgN9P!|`sP;qXq|=yhNfazKwFfi
z6W(>wXIHNN^8%;32g^?iXK;(SBBi*C`6dFo1xdmH!YmxMq`&swN~C5#$7-qx?|FUH
zs6JGiz$@5wuA)`qR9Uon-lgNc32wohr~?PzXA`YY4P9eY7PCMWOf(h?(UP1zfQ6y7
z>LPU<&XqA9llj9Nwnv16o?+|J&-kvtPA5glmZ*LHc$;cl^9R*`bq45`|1jx2m}MmP
zyEec2yY%d^(QM6p)7bRv<IS&R|N7Ox-?>5ogRvYbEE92y6v=-c0$gUk^8HJ0|K)A|
zslHCv0bQN{r4M-ly!==w5V$P;?Zrtf&(Qt!7QOw_fX??c-K}GWb-pRDtgW>=_LeC0
zROR}Iy}INgbvw;p{1Nc*seuZ*-`lD8{XZK2KbIadV5m`i7M9F_W-ZNpa3zE|_vCAJ
zJxToF;^LyuoxrDEK>@<&fFzxh&3(9}A-tQ&LO^WY)3n?+xxmO=6<p_CC%~{U%!3dx
z$dICBrTorFHHpE$$@^cDCmo#t#^r@Q!*sf-0)3sBE(SCp`zgym?ePC@4w($JR5!(2
z>~`_k#KXcGpwmg6Pybty{_VvV$~FERm|$f)iiEMJ!0+1ibtcHA6e~2=L;ue#{G*|b
zhro2%xqVohs=OlAU3tse?o|~&n1P{+{AdsSmx(SR0Pw@XZ{?Xa`$RyFY44`Uky?bB
z3K@#Q>)FTXzlxO`Xv*19#k~%AJjst>TpXs3ubA<>K(~m^=H(4;Pi1CBjzt4pAwlDV
zLw1RlwEsO5|5T_FI^c2Fd80*fTaIVwGImr{{phAtw}=9}7P~oncCM&WR>&VG%I3I!
ziM5WcYnM;zx)m@vv~~!Q2u|?ot0I@NyrNMRabOG9F*~4_Wi=?3`tePvh{m};NJPq;
zltwE9Lou^$g#_=679E`t8-ZMlk@{xy9!^6MX%|I?CPoF@a;ZxN;_nJ9tNeLxR+*1Q
zc%LGZR4dgRRU2&Mt6Ju}0CC0{x$cYs>NofJl`x5_PWc-*2IYwcH}m!H8Qc?Y>D|L+
z_{QRf+waURJ$j0xipDFpzbc~^Z5;ATBtFncb*uQ6!RY>dP5!qs_)qXWLu<5%s}VE|
z2RyXil1=G=XeNC%iC_(R)G4^<bWA?+VPa>*qu%FgO>To2^QX=r+uQ`U!otGQot>Qn
zx|ttT&QrC+o12>@L-~%Wcq`-ki67ER8R~;-lZPwz_x7&udP~?tbai!GDzy;?#%sO3
z#3ahU-Yk!-cSj#g2sIAcTS)NP<TBEhHGB8&IC&%+{~U|#{MxX!T94~q^zgz>X_FX<
zCEmIgKBV04f4QX54}lrdo@R5o!KoHD;r>OWddSj{oj(lwplbW5TDdVNZ76Dd5MrCW
zXuIX^bhU;YX=69IqPy>5-;plc;+GFGxBIbvo7!_DKf=fLBtxq1{MSd9<&i>8`!lZ}
zTx%0mGsb0{_T1w<enJ*aezturi2-1*iii-}-8ZM$NtpJy%;h50xe)0bcb+LU?z)kY
zk%=~IkVq6|$cp`90r~5K5?Nc=P$@(^F1?7VBAYctaLnc)UT|K*ME-0v{$h<Ct%_7L
z1XB5cDKiTTlfXq^#ND5mTA<9<r|<Z<qQamPGijB$Yj;JRt0xXNx`UWoB^n#8&ang|
z8;Q%g8w~}*$w};bs=D$w4JeQ>^S`Lw6e}SAHk=k~L_kbNM#fgv4rMh={qYk?Nqas%
zKIXT5^9m6n2SXvvNoQxiYinx;B#^<YMFKZ?K{ospgL8A4U@+Kr+%DT`dvX|%f&ra?
zXS6F3#9POUMBV4TpfSmZob|0U6*`7(VTc@`tC{im=F^X*P9#L5frp0%y7pZ;%ex1M
zGYxN4&%L`sNxR36%zk)!{pyDfNu&E$R4F%@|Hr@oo1Z|_0&P)l$~-wjqu^q9>FJSy
z1SBEp4|Fow^b)^*MV6M9s`+OgKk1@&S+x$+=E5Fql*h!xBy*U?E^DGQL5VdrHT&Ay
ze5DK<-DTpy65lu;azaW}Z3sGlna*u&IO+Ohpi5{-5q^?Y@tHxi-||sZ_6ARrh?Q}}
z)7LEoe3c6=VK`$Fi0{t&w}u_L>Gp@9;gPJ2)tiuo&Vb7d=yy8`0Ea9T!)2gX!{BOK
zJx@eLXhkAE{rEXJ_}Qq=)m9wV$ID9?CRvaqE2V6x-A6v5R+9N$F4}Y`f|1iuz2ev2
z-kx$CxO79AJ2{~eP^>XL7y)#6xso9;{zXQKEWWje@ongo2AlKfN(qNy{{bSEwnEbD
zb!!wOrnEvS6xW1_9#d_|T<o{j$HKVm#iy==`(nVCTD>~Jv71NK!Fx@#GYCmYLYbTR
z!)gOai&jf?NgZ(~`vw724LRB^!GFHUgc6VWUn7uyo7IsNDr*uRa&rLXsUj1*e{c}u
zGss&sqQ#rFN)^lvGQSC_tTxfES3+yxYXrJk9;VCEC!wKEl@uUroh^)OEuZ-{A86lB
z92qu=0DO3~WzrCsa*&nVu`y<>G8Ta-9w%V3C=xB2xpw07d+0HdYqs#^KC>G456$ow
znH%CQ{4F!Yg+0Vn`TP5mCgrU@M_BjiNGUq5l(^eW%G<(xsfk2$TnCGx5wl@SCSg($
z(p1{LttGNDF>S)Hw1l%>g_vXUc63X!typa-LoA{6L}SBU5yb{0aMY(t$ll$)iVk?5
zPQ%$0DKZZ(Td)}jH+W)+&{|WS5NZ;$@jGZUsl)jDt}!W03`pLsN;|je@;&q-fk<>g
zblWio;py4hlF5eDmx^G55-=V2@>Hx{%sQA`w8ApGsI))lebP3Gpu*Lwqsfg&9-HtM
z_MuI!Wc)i@TZ7oe)X|x}jxI+XIR4fzy}7-X!H+wgR>qMl0`Vu;oo8-BR?7<x^MIbN
zG*Ab|zt&dN%^3)cSHZR?g!n|oGmE;s`F<kbfAGnI0;x1%@7MGiW4xa(O9{RX-uHEW
zN_?sBh^GcR`8HyZy-<OI=xz>LhVmgum*V4!fVOmN3G3Yew#Vv^d<muY${#k4z!fO*
z+2Kj(kFS6uK1&i46G_WplG^NfR&p6p2b$<CSBgU5|DIO0haOR&1eB)1&7aXgnECBS
zQz6Px6xByAd}@1CV!~K<!eo^AS~emeKDY6)O`6CCCAg(6eNi7+i@E>oZ%p)Gv|0lI
zJnNNk89nAu4s~KC3Hku4U<Nps`isnWvWWaFC>G9<-H$f0(qTABN|tR2X`B7dPcgc_
zzJ4;vSvnv|OqVm;HrSXtlIaBV>$#R?N_hYTKq!s0XZ#i`B`2q=!U}04Q-G;WjVDDO
z(SvD4)|b-Fn|-$@2R&sK+XLf2>mq5Kkb}E@vox3LE(E~ItPT-_R}uGQ<5=O(`qYf!
z{Ob7OK<y9QSv@#h<WGeS6>`NTSO8=&7!{?C|A9lx`D?(P6jj6@Ly_FxB3!Ig#DOOM
zhss(ie97zJ{vvy3SU$>DRDL5Vy^Rbw32keJvMoJ(pz-V*3BwX4I9G*abPG1_i(6yI
z?n|No|5gRFudkY$(Iz@vp3OT)(4^VP$C{G;Pb~7^sNq5z(uV*%OYZi-^?L#Qxk<P*
z2^b_#MMWj89l@c>U!TxShR<k#50;Oj=Pd%kwAlHCh3}3JI9O?-tn(fR^MW9+vp*Ee
zrVKzRi&toumX?~~9#G(YLdES&^FBG5*>csk>y31Q@k-+|W78_w-A5tooD%Zg*DP){
zcoua-eaw5I-j=<-dy|C?JGH1EpLv2_J(Q&!7nG2q55LR+KU)I<QfW}{sG|yI1Zhwp
z+4j^Pmz9;ho}Zs*URS%Bxh0FRHpT}h_w;27Ml*`j22i=I7I?~DRW#IQ&JunV5*jKa
z8_`=tDm#*;y*%PDuB!bQA^>2k2(vX~V0Bd$9R390ZR;Tdr6wP$o8{s%o+dGDuPPw|
zTSk#ggxzUy-H7gXJ>ChxO_A7*N;dG@??zb1_x!=yoQXbbxq7ln6jS$Hx$;tkgRJ_V
zdf|8JL+3Qi!vdn8s)jF_t5bGXSdU2m&aBa-2w$J5viEUE?4)O(&XWdG1@m{N%VvZv
z8T8fg+kELQc*UyXj#$(935*ikmM*)@0tCm!8@y7Aw$R8?F968Xf(q5xdYAV`YoK1I
z-f2eO-h&c8o@kwA$Af9F?t?1eOl;$Z#}=A=_y!Q}-hoV$jxu4y+ifO=V>=di%WIdJ
zhpX4^vM#!v*&t0sM4fX_R+|$fPEp48TMa+FnFyaZE`Ul|i|JbIXsSqjMQgh3#vfrK
zhMWU_`Y$4<0%ky`@5%j8A=Kb^s19H#;@#@A7D-KK<rt}A{V$!)NtZRY1-7#g4s#y8
zcIYxDbLcjqX*_A+=ji&gR#afpcsi8Gpu|l=ue3AqYY#V@4g2Vk8L*GY+crC;WgCZ=
zGU18Gn<ge_@tN2BE)7P}c6eSQFm~ee0WxXgo|?=WU5amP&Yg>nCU~B~bV-V`5pWfX
z@H{+O?23DXgrFc*$YDRPgL+4q4n8=Pgo*e^By`~00mvSSXr{&ogIxFYmiyK~8qN$N
zY0V4`f|8O<cu7aK<8Sg+0mB4f5BiF=0IKs7^K!Tquc-GuTQ{ceQB7+3^=&7av!|wk
z*HS81est9naPvWDpYRpfPEULKjrsamPEQ|Pkuy=84y91cjhHd#Fs$6YHLrK6g|SP!
zA{9YE1>&P{g}r!;D+G~ie^pXa3KCZ5ch>!yCVSx<A8Y4+CIIuA^{JTvi?@ayrY;-4
zq_6%n^R4`bWh1pVYQ+OzNJOId&K*t&1>jx)?Bu5pcz1g}>)Ul4u)gF&n+L?_Ljx&g
zF+Ws9&;R&}9==LT5udU@JT?Ry`9Ry2n7iNv+}(;E21z`laYED^p-o>aXPO#ee)W~K
z%}K={Ld&MV!435O-PrsKU#H`)AK0MD3-l@6+1Ak_{QS<ED;a9f{Ues%s^dpA(`YB<
zWy;RJPvt?t!ZzAVvg<Qthq$aXJFMLw7m3I%YZ!PcXY~FEBxtGOm%l9&DsT`Xi|*G$
zSwSY0r4+rJ+CZNL);1N!(9g!N#Y08Ca!}V2u8Ml7BFsH~mb{p#x#~1ZD48DGcre=|
z8fwf`&#lIx+$0d`w|5NQ{4)Xfe~iz@gOW#qR4&Ml-h-!RdW5Qb^>`%O39xBO?eZ02
zd@!85?=959ofjVrn3RXz-QCQjeI>@865MW+4)S?c>Mju?$+|i^&AuSby{}Z_LJ|^;
z<Nc)|UH%V=z1e9lM5W`hBklBH^{PhO`or4XS`yHNGl?^afk&T9hXR{z1x!!as7^xY
z#Jgg(EIA<lXQ43UG=4Ni{)y|(CU;tv`5>dDq*)e=r6Lw!)NR_@AL_J_n$U{a{8+X;
z-kR6;T8w0^6RD&PA=lJavgpDjpJ!G6e?;Sq&V-ZW-%WZX+v1<kr^hf-y82wQ;F_rG
zFV4?>XK)*J`KnZ1z@>epNoS)7W0(=qk&1vrDB!S2sSm!x+aM=?8@<9*^*RR{K45Cx
z)xAJ^&PF2CQr!oQux(!;a6^^GXSrFUGlLdp32HZ*bG<g!jD=rCr_vpG=z?5Z9yOfz
zMDV#D@8j3GvRG&`BDp-wyMNjIXn*9>%bHeaKD6wvUY+?{j_j_mA~=ohK=xmn^{=~h
zm+J?H=uF}nnpC!@eP6kf3ZYUFc!WhDMSjBsk%bCa!K>%Y?>X)bN+Dd4Y#;j0#;qcq
zidNi<I-(eEQ-VbVLpZH?LPQ*XmIMa}zpWqFL1zj>oNYmYT9#x0nIR#in2x}b1jN_=
zS(SKObsd~`{l2wCu0q$7Kc9`nW-O?A_9jfz=t>DNn5wp#uMODgG8~TN;YJPha(yPN
zeT0OnqP%=*Y_Fg4QEA)9Y&$jL=l|;azuo!AlZ|RhDH6)RMrT@)5fSJxBJ)_F*2kIi
z6&#Pl6eo3OLj$O^Q`p4RDZ}w-w98@I=uGX1sJ=#!8vpti;IAiiHkv~pc-mA;AWM+4
zjO}CJe#ap>BG7gp&oRpXu1u&lrZPo*-(%=dS?cu@v`rQI@eS}9U9$!Q``VvPBtk;s
zmGRkWy7=i<Izv;zNU=|~4FalP(uAA-CVoB_(BrYMAh_H@lG7vUCSe;vLYebbhx7KO
zc=#r=>B4jYpi0RJnKz)DvPB2xoqRvB&<K*GNX^@MKKqgo5=Ade9zgH1Vy*om%ZD-v
z7TNOo&!nevN?+17IRM<>OHssH5>p3rfh+)8t5#-6TotX6qRDbNE+7>V$|6yua3mh{
zM@iq$IM)XkW`gZ%12O@7?@&d9<~&Ve)=j$#@oA?6^$n}Lr3*N0G0O0!uJ^~1NWT_-
zjjxq|u~zzcUK}toxQI8UR^Gg3i(`Jk^B*B0YvHJld$JQWIV;OF@hjC&3BQC8M94$}
zww5OI)dOONoR=Fw_wA^I*{tL<!a1}k%3w23Z`F;DxU6{ce@-k`9{J{>Uu@jO$YJn$
zQL(=LjgOk^fX79$OlaR@ijoqXoi6)n)XiK&a5SOP-q&)p@g$fTJ9zDyAb(Qjaw;&8
znO~#3Jm9ejros~p2=VOpw!${*%VN9jF6M^CU$jXMv|f|W8kXB@jQ^1LS)Vnp-pkG0
z5*aKTS6>NKMn-0~^gvsUJQM`P*V<AFgtl{MEG#UudJI`UTKi7~1k87LcbW70Tw2lc
zQV}{Snpn8P8yg$0>rYBoL(*A2Lanesj!+OGx8Cq5AfE4<;M1>_7QP}TP42$qXUB`?
zUguWBPg0WRUj{~cC}dfEk$cyF<1d=>;d?4EWX@=m3#pb{FMX#6oYDD*@*V68uXe>s
zh1tzawFfmTE;z6+K0jcO;=-wdSsfg`w7&yC2<C7YeH_e3E2z}L0|rm=(<i2yj-|dM
zTrBc}7cKqh6Mr8kJ7Mj!95w8+2BG}fcI>;CI8d?1S39=szFZ8ijdznl&(+w_oN*E}
z496V+2kRky<~ucHBS;vAF;d;peoyRBXO82(hh1epTk5@>#;1+!Fn!a}l2r-tdhq}8
z8~_%4<^|-;KX2-0Uy(^^@oW@>slUohmqqCU_&L;VoklnGY_=s-W;B!7v7z{i?5D|s
zOj(A-@kctpU<EjSr7t1W5!N0b2cS$9!!?j8Y9NU{9SxvV_u0gAm5(VT4Dp+L>21rf
z?es-X>d=jaKmQf19;7eoY92Imasp{}+!qJI%fZA2AraOf1`9I@gX-Xi2axupWio%i
zFc<u%BNx#0EqUNa_d@8%$5#-S4^i6hGsCY}+$|tOYL{d>0Rc01%|*K|&&QVX7u%@G
zSR;_~Iv}#P&6o<L;(2pR#+Ti!b8+MpYdJhJV9CN*cshZI95+4^1G`O)So}%->~kJ!
zdQfhxG^UuKyU}9pv9`JM3Q<mjR*TkIdlZVhG-!eO0I443%@~o;A$Fz<_yQY!tZc^=
zN<47N#pHLo?TOCBnWD*)hfVRhXo;0NtfQ#$IUyB(hI0;MHiAK%<>lo7m-|sCh)dlP
zYXBf0y5aEvi3t!H<wK6=gYa;$?fi1%uwZgUlQdP6ceiJCY){eHYWP?e8~_x#3?d&Z
z&$kdHqfN(7=M8U>6J-q62hl+43neCM$<vft9^G+!c%ldd*Ff)djx^{epgb<ODz8F;
z%r^XrPDi~oSukr-uB=mDUW*<I)zcIF_<<{}{h%Q|T%iL_#DLJ=fM@C3vMYr#plNlK
z;<(1!6%r%ImyXAQ*v)F!s&#sU2FYfq{xkMa`U&MHBI0lL0oemLJO>2P%M-o`ExMNz
zni4jn^Jxra5(v9QW1wmVP?Ziwq@f3=z3Cl~wGG~YcscL)uuB5~f2xZl##}p)|7`JK
zr%-2&&=DWrg-fWnh2fElPg|*1lTW-is3>?WTQIH(%knVT)7!A>J4R|bMU}Al*|6Pf
zq4Bol^z&GgsNBh18?LZg5w@*Z;mfqveKFD%$Gky{T(Mp@erW*JFaPwYcKuhU9d>X^
z9Dp@Y5;c*+CLZ%cMohUVW{df}pFR0Y=lIO|ynxrqvgW}_oiJUR7P?;w-QQ)*RZMhQ
z^x|&|#EXMh@=92WJ2@y^91HH$Cy_qy$p0i(41x~k;Kl6-R2C@OEUN6$-ZXfvdqEF{
z0Qq^qQX3s)G*eOaNMO~Xjq1_XBq?{1s%eZRD(Afx^$X^2AKOkz2z9q=^-dqZ8f!c+
zf9Z7=EKJuFQ+1pjVBUY0a_H)P_!y<QyD$6~o@f0Vu<nRzd0zmjZdjpUSTv`tsah<d
zDl#o<$l6M?0IDj%#JD9dgTJ68t0?ob%BIjLkcyJ{c*RE$*pAuw*gMT<O%=N_c}(WD
z5JSev<?*!kOO=gD&yo*zm5FNIm>V^L@~eA=Xw?6z`!cN}>hYDk%-!)!<gS;e0h9q<
zw5oSleF`>y{a)kjR4)3H?Sts4mE*|q+2fPz3qZy%&8R`X!t??JY4v4>VG_KxYE;u!
zD|bl_Mtu$s#>5Zz#&*1?<EeHke`|A=H{!PDDY0AQ;XZE#tVbR>skp3R!aZ@9=b)}$
zhv)*)Q<T>yLV0KRC-=tSHAPX8h>`v!o#lo7WT0<$2e9@mKr#!NC9D|PH#L(<8;4;S
zlBgWp!K$nl%jA^qi4TZ@@<#2IGK~_O;@>SnE?>^?MA?RbmoFXg75@oNAZ4C=?ur+)
zL2t0U=%^<R<Zb->zA;S*z{sLkHHO7({s4T)^yX(5x8qewL~-$$m1_?t6bU|}SEM~b
z(3G&HoBSmm-@pv8N<y1_m?ywZ>~~my^aefh{fi%A(iGZN6y7TF0>+~CHnJGSmWVbG
z)rPK#^N)w@PPCUAXwqs1AQxIClj1b31K7Q#-6}6Mc^gG$P@=AMQs(xLW=X<3RQ|x{
zY?nBm%c#3n<{z<foLVLP=ew<UFSoD2HbC(@wRDDxnin)~v-iMDS4DKD5M>J$PU+`;
ze5y<Pd2qofPB)%=?M&SP_~A-S4nP)q{*4`k=mYYO!w`>DvH+tKZLYP<x(-g>*}DH1
zG*Akx7P=WR?gj~<Zzq$LPFjSJ+S*#Kgl>I_v??1xI^SXIzwnpup49<H0J)YjU%Pg}
z!Y3DRUi);BB@PDCqApI3_x{r4N1(~uQ-s{v!axsu98^+)z}n5NDnC<J3J4EgGWg2<
z8%qC|@$q*B8nIli>!=vHAZ4g0Dv34Fv7Kgx<>dpQU}-(d$`y;tXAsE7r2wive!S_h
z;Q|1L!M4hcK#H<&8tk8%bb`6Y%l?fz`44GtZ~;_nuUoV=LJ^o%?n+yx&tLaa-<eN*
z<^PZ0{VOX4z7ITcUnrm(mI{n=x3C-anNTnGhvKoX*yW=Fm}a10DBEb0Vr1+wAc7F}
ze$vb5?Ow?;Fo=Uw`B#{$z_d4uV;=ANDfk@;^6Dxc@oraBawk|{z9g?+7rbCZu7u>e
zApl*nZLDHY&=V=lVxg=Q3g%+0E-a!9r>>S8bxg)k<jP+@`5b`8!k@iPxFE#(@eVba
z6$rKF7F$6H8l-p9{ubNcM3w=wz&XQEY%32u?;7N?CP$gh_KJN$+_W=sQ77CWk~wFU
z%2-qbAcyakYmIY4_q_jnDKfAixpXOzKKosHlN32g<*MQ-dSSdm8}aTikt>b8tnGIC
z3{^ZNgm4Vp2>}^!Qk<HTyb|;7L~sWcF;a(8?glmoaTBt-if+;`{{_JJ0523goE<V^
z14g)zOdEo?`}08>(IXWS#<JoV(W1i@%Qf1@mfMCpMIVhyo4e{4m)dR3=NM+7hXDy5
z3=7)a(AmtwU}1wK@A;<;m&;HKD9O->)CfFOmv4%H%&!qRkyBFyv|A}?wZdr9?}d6<
z)_sCD82N#f{`ti5e16z#;WMS92MI){Wo@kHmcd=Wk@$|8vg}eJuBxYhhQJ`$si-E%
z=B6R7ryGp$Uu)&x?<(1WHm|*6!6UE85D4LNFlaF^Srfcy^p@V*EHms+N%LwZb&j(h
zCMB)ZAA8nHY;<&+PHg-AqSIN*miAQzJBQx*M~LxLEYH|4rqoh%5`(haLe2Hu3@Lx+
z5*4M)ixPgE{c8pM$K7`oKwMh+Nh-NTb#)U@5|#8^zKE;?`5GHb%g!B-G~>)J;@?^v
z=2kqF*=dy36yB-fHiFg#1qB&K;t${F@7x$Fp#!|RUgW5NR;IK2moi$&8g0WsEy?se
zb9c+I*9v3WQR9S&*P%YD(aT$OB`dgSbgsj%B)?m7|NEk>!P$|3r0bZ7w?u;4(A>=J
zznTZ_3+=AH0}}Q0^t3TEYrS#fMl-<F&33J?7b+_qy_ILIW}^Yv8jq*3gs((IMP+{W
zPXLjQkI(STOw%-!JE1%{7(Y2V`Su2fvpQSWL9DHoL+5%~7_izNJ{3LDu>4uy`@!WX
zX?3++3-vDBOWyltDb{x3*NDyUVpVKCSf=s`E=C!87S~gMaNEEgr{9g}^<w0s35c(d
z_LWD(wqNQ`fNCQM&~d>&jnMjlnBwAM()U23n<t0`AV)IO%{@KG+dDdJJv=-b;H-C%
zD;7nip0fUt9Hw*d<)KtON+9W~|K`mb8!Iagt;&^Cz)ehk6M^1`<zFj`3H{u7C$;iO
z@@z~MduVcgi%tTkO;?X87kqpAc`^Qt?M}ta=z5%m?Pgarxao-^y8oxwGlTyS{r`AP
z^o3aiedZl~&#(i3eV3835{EquefJ#Iu}X=5-_+EU8QA|la|Af*f+C%bD{P1}FffoA
z88#b!FcM1$8C;5_6OxW2lv4a`GbuPdLgUz}NBI)e4FM8(K#<vQSs<HMqo*8qI$5!U
z-Qo!|hB_4AKfix`$|>AU$WZy?lQA-bfP1_#wDETxh0W?dwXK{~cY$QdY5xZo=feaJ
z2#rW#Hin1#(z>4SjCU?*>5zhX2qhcb>@F=W1o1{a)XlI81kobOigLggfb8orM?^(C
zm8^()a_b=T6hhL?Khc%<B$)Bo2gKV-5BpR<Y~*1q^dbw0{}DLshg!BdI*X`0pXC{Y
z$am$;cw36YI>9Q*F8fd96{nWy|59otdg1b<bW)mlPt^pvGh{-WC%H<)pM}Dhia@f0
za9H13Y3$<fAA!szeNMEFO4>J*0%dLC1I7gb)I<z&XpPw%{w?;YYXR{$0*0rwN_VV#
zI3fKwY~lG4U|blha28FgtWe8j4{R&}Sh8wNx5>UH_+`0LbXsThwiWgNcinJI-g0T>
zywdmB7$PqxC#?-cx?tKR#fauFksOf05FKS#M9CXxXMU}=R%Up7d^|wOj3A>qj9#P3
zLx~NHjEF`ucJsjT?Kgh{Wpcdn6V|d)5q_%Pj@Z{|$6^Ykm^L9~dMADHB9ngCiT`rD
zx{d_=48Dm=t93Qv8Ub<8+iz7>LP4A$31r?UsEJ~QR73y;z98H#F)fXxAb=`GK8_pY
z$p1W%5%ud=p)L}zQPsjk?f~U{^*xd0{Ttx0@OVvtZhyfGOFT$7cNK>v?u5tNNfvNG
zC`8D`1Q)2#H!tsqjR635rSgc~gC8Z;$;`eFpt9Ys2a1@@{=l94oInP9r#dpB`9)7v
zBYbsa73l?)OnNRd6crK8^cC{x*lK3cy{xbKurcPzq@<~Pus*a&ijjo#Xty9X)c6Q(
zBDs~(V&~pGX7oY#^bCNFDIdOPKXi1x99sB?+$Kdncp>1dOQ;bN5+aSx{AlLjz*`7{
z?CG_<W9J{^fT<6F>|zU+Ec!f<@JJT<sL?YVHWFZ2SzM3g1L7HO6xSY?r7spc34RNw
zB7unXiwk0><AC_vr@J>An~*<X8aKzA-P(qFj@A$m61@~Ezxy5q_<+xPL&uB-`Ue{m
z)FHd!LtIIGvL+GI<V5G$BlBN5lC_Z!Fki-_B6{P170)Q8I5>6;mg?@M({&4Zj_wz-
zYUp4r%TN`!Bk08iyBrWs%vBuP9_$;B4`yRVyA}v^euBNqeU;z?=Yyq<RxOv7m;m-@
zQtoO1RYx>)1RS78=tZ2_e)VK_049lvi9{-b(YlyYJAIHGnTM-xU5%$f!d4y!#ODos
z1q0RFU#lb{avz*c=^1o;lsqvL4`td)U*r{iJ3BEf;6135`_Sf~$avx#0l3@LUQeeb
zl$~@#$$pC8>u<nPnfxBiVTA@sNJtoTPgW+7{Vh<LDJUdlF@i7_QdUAN1AOWl9QJZr
z^#V)3j9EHULiZbDB2iR<@%7nH8shJYzX?3_nI;xL85wpv($x8h7;g?mj1MlRjrYsz
z+AOwvRZ_trr=rOQ`z~$6pyLux;qms|S<1h_s#h1Khziw7^hQ^^fBt;W{#Jb-;MIwU
z1bKPMLWnd4BA~kX3KdSs&ogyTgJ`4_Ym{~wfnWsXxy$E6j&yc@Q&?`_dV&$LCg?qn
zvu7%Eo=Gq2jhr|Q)veE51Ou?81Ju1+8IOY4ne;E)*4PEd$#NEQpOC(urZVYXnykN+
zlfHr{dcVXXFkVmlzIO0KadC0i-{1Y!N;A={3Eg8CUIATJQbM{#iEIt00+jVwOfXwL
zAU@}@t{*v3(MU{LlyiG1VO`5fwB&0aH{bD-&8N1BXE=#RisL)ox8sx3%1*naZ|YD<
z*4o_m^X}BiML=Aa=I83c5^jom<TCr(eq`%m|Kr8}6Ymc`Dp?Ms!skx<4P_4J&_2Op
zW%j$~=3akEHg|OuBxKG3QES%%)d;XWUT~>Zc0V`_q28zQGFKP{l$>|e9F!O8TVUS1
zFjaS)V7y0fU~pge8Ea;0<)b__8R88fTi?xl9UFU_4h@oh=9eqQ&MHaR5A42F$!hZ^
zwXhaeq<v+%JoR<OzxIOvyAK9Pd^LW0J^Cj~aU39Bq}6Cic6K&t{fbBM<G2vwI@gUP
zFb_{C$CuGfIk&)cbC$#<Alu(Bwg%}PoHbGKT=}@q;Pq6if^gEOxAu^sN@+%-+aduh
z>ekW{&)yAInV4hZ89S|ODvj$Tu`yqj0c=A9TYDf?od7IsrlvY_<QvLg=Wkx)l_;Lq
z6IfVE0Z1Q+G@Cz~B&B87)rsn<C@Tl2%QAPb!c}WGQc7u6xO5uXY{Fzhdw%^25;oM)
z(UD$^;Lt`|g)QGAQSfW+g>pb*IeKuy)2dT4Q@TFH&gT^I75~VEtvbPrsE8bLtw)yG
zeJdfy;#&2U0_TlKCC2%{7bNucBOxl{WJ^|m#J$r<9u3d81cUWmU4Q93I$x3MMoOku
z#f7=qSgsS17`OYpC=COuUjAqZ+@5_YYmxew;DLc`)JZBrBx{JhNX0sstD&L6bp_1~
z<oZVj9mFLio1mSssO6T%(m(^Jn!GaEXrN;73s7W7`V+QNGHQZwS{dEx#T<P}H<y<{
zX$vTzH7GUjxbECqZ@@Dz`ndZsd!j#(!1nj%uIuk+<72c53B<y!2{`4i@6GgCBtRPv
z#UmORmn^nfybb8(3P0}N<6o9J$j_f+5;CgVtQbB@8Hw;L#AN9BM9;hJ<XrRkPi_8}
zmC|A_$qk4f&7c)EqD;zDSlpUzG4Z9sXMs@0v%L*VL5}BW<fGXOIBph|f`d6bwV5LW
zsKs*y=X|$)TRVVymQSBP&5FF1FDdrV=EW=1Vg=&)$nC-=N>%w&C>!k0`eu1C?r;>H
z%UCWGede{c-xYtlT83)diJhw6JNKT$Dni?KDvK7&P2QJm*{ZQB1;5Qc_1sGtI27_d
zoLf7gZ=J#47QVa)1Rw*7@Vq2o>MmV<S@I)+@n#*?6|*5CZ%tohnRM7qR+CAS;|rDs
z3RO~ydKKJ)@Y^H?P=U9PAoh>3GaUC61L6&glFpR?!l7Hg)@{@?I`x)RxBu=;*{|Qs
z28|9=OVcNhlI!<KB>vzGfDENsDJ|=wOi=JU^Z6}aU*kC+oyDXOLq*^7JsiZ)(2$9-
z<QS9omn#16cBG>NfVM!j`y79Nze>l*2tA+JSp5*mP*&Tyii|{A7HmC`K#U;U{{iAD
zD%s4`<?R}zBASzEkSkwH=bwK4B3a3LuY2?=qNus>>B2-M?euJ*3s7l1#V6nWNG3EJ
zD_4Z+Ef@4&<()YHnPvd@LQ@qtnwBHZ`*Zxxqbo&xhMG(zo`2zCU})RwYf7wo-+{3$
zd^m9Ed47g-T^}tiiJW28{`$x3<P3{RhIi~4X&&8;Zpt=E{d<PtVn`uT{bFE%>|%ba
zE?+sk)2kz%`~7*!&bTqd6t7Jf!=6JP!1>}2z@wnSJXV~poA8wC*6?@$cmxr@!bN&D
zNbvF95qW2-twb-v#i;9RyX{dqbo(d3e0d^LCiuvZdzkXG`DHPf$K*kpPnS0rZ?ey_
zcJgrx%JyIgqd)wp-dNH6bD3Vp$>GW;jAniNk<Y-PMt9hqN9%+6ny$0W_@W*=X6`C8
z;yp!s(_WadU6iD~eG{&-r?)o*!$oQ4|6cNP!<v4>Uz`&8ll89WgO)qN+@1Fx+@SST
zDG+%jkP8$#)6vk}WZ3tFW`x7_21^Uxvi6Q-_D&uNEtndV-qG6*(dN8z1@!jib1AI}
z8YBQbe#*(q%S>3QGBkn==4Q%NT~FJs^JmY>tnsf31K$9=vpmskOK$^tD0-p$-A?O?
zW-yds5JXHX(RVGjr!gme5!+_c(l6e2SO4@(8~&zmvEz!fhlg-|G6AFecSheM6YLyy
z6U?Ib>#Z44A5}_Z{qa)r!7o14=wf$5%c1hj)WRW{|2xy=@!GrZRCr$8u>6ILs6&Ma
zJhBEV8M0t5fUGo;NTB%zJIW|6<<;yc7I|CI`igZd2lxvQG*0<pr5P<y7!x9-k^Le5
z_}Jqj81|14@vT4<OW9fvIdDx}gi@)-El~#;T^gKq5e;&0X8%1=e&!+>KG>230tHyL
zi!#Mm{|G#)(jXl+rwr&r*%`^6hRh8a5^=NT-jwo3Y2N*e-djJm82TTZoHTO4j<=)z
zY!LN3;YLls-ptu2zE)!^Zqn{Cf05%A|Db==ie=D_1a&K$goRwuEP3+uX*DB~ji%25
zE>`2Z0Xd@E8CQ)EasOlzA>t=M{UL-vQxvGPC`{FL&ssJmu%{1K@|%ZlSTf;rc|uY7
zfqEv2Ww-^G-Q8ilvfNB_>)e#7mKJ5)e8$UA4#*8+z4I{1?eO>rAf7Nc*!9rMnW;D9
zV?_RHn3L-62Wp8L+7#hWV12#tH^P2r_2M;B=$*X*)jP=b^~fBTOYxd(!^Ifp7^b<Z
z)k-p=Jb>cTa4MpGN&qVLWs*M7LADB-ej(@71*#w8^@y@?dr3Me#@IDOhscc;``>H!
zE7mV~i8QH}eZvstMan?-Q4kS=*+_HB=CWoi2});8D+g>I-P2=yI^W;7D>1W=q%*n)
z>3GK0<6g@u`dYS&)Si5P9WpOin`0&$aS%ocHa_W3kOu0#T4O1}ys!&Ajq3QWnMU?E
z1dvV(qy%C$fVu*>>P2KSO5Q>q%(VJVnV5lURtO+(Es75coTqnGrU4j!Muq?X>N@vm
zDDyCmS6iDU)K*lM$+0L*ZrijOret-&DYubJ5n)KV4aRLK8FP%4Z8Ae5mky&eE_0C@
z3<@zZOolOJTvn9JFftayyKnuo?K!-Ez306D{Qmkq-{<puzwZOz4t1!h2ZcB`I33=O
z2jr1nM@1{2t8Dt-8dvx))0EYvF=}Qpz4B!o*z$__OqU5}8Bg|j%yeEV5R2pRBMW*v
zH9^ath)PFsg*XM(=~UgvmG3E+D)0t}9*f&a`qDZvmg^Ze!`$uLB5zX1m58s64Z)!s
zy6bN@1b@Q<_geHG2#hN<>z(;UCy_p`MRQ{<>1y5qhLJrws_K+`UdZRAIo5omut}pC
zO5fN?Vr%mRgI~nPRS*v5*)}R!+B%j=^T*bD>bkH!YpGV%{2}*+dtD&bV6j-QT~PO4
zw=<hV!wYTR;Q6+Zu}8DG$jte+nfC}W=#Svhfnc?mxgc1Cc*hhxc94e9<@_DQt~yu|
z@Q^KGj9>l;<(ux?b_m&YB=;#}a<#--q`LO}uW@w#YVOtA=<ZoIupXL!T$z|6V?}%-
zVkh9DsuM%eTG1`ytHn_<8#<answe<+XP{bqvtVrG&{~_a%jq0ux+@(`IU=)1VU~+J
zJ*e`5qha%cnhN2iW5(A~xleCJmbPkU2#cI7JmVFD!Ej!@lg*ux9OXSFla=V<g3FbI
z-dId-)hreh+exhcGMq344`)XzLgu%W^pVTi*B@jh-av-{LNX${P5%m+V1>sK$&Y^M
zuo0hKauvM<w^h8;z<zO9eTPC^Z_R)xiwaXyQ|+52JX^zlG}i(J0YAzO%DL-Zko>Yq
z&knK`8p(b%GIFTpID*>QFB$%|o;>)u#>UbvJijBMH)ixze?o0F^Y$!G&#g3|U-(Qj
zE_tLeKA1F3JkjTEaZaPMer#g8p+6*c2M2{cZ-jgi5j#wjoc80plzhB3`yp$(MGSPS
zoT~LLbAkxn5mO|itKO4F5co8i!^2pm)n9+bmOqZPE_vy;k1+_II%;mX(Oz9Tf`<sX
z2~yLz5uEDu%nOkc(r3lIR8gI?S%=#LjvpM;$YPJDTCgZdaIbUm7;J#e3*zTecwy!k
z`t8{rmYfb9_{x#((De^(<;etc`Q0VAip%sr&F6eeZ@AHG6Dw+M{Gh1i;*%bEwZUl+
zpiyNApi=F;m`TcN)h=B|an0V1dWvc)Y_(@H-Wq={;ar9WlkzyW9RVzJ2S9*eg-iSB
z9|JBFnZciu71qfl9PdsoWO3km*rxY^26!Ks_4rVN%^5kni)Go-H>m?RI*~z-;%+4-
zk#$7#j^NgvsN>G*zGuaZ*}F@^h7Txe(3hHV7Zt1$SzF&X3(5|CeV4pcM;Ol_Bi|A|
zq&8%`>_^ReGZhW5hzzNRJTD=?x=yS2>RuF_sGm<aiqB4wImwyAVE$jL2X=`iV(MQm
zPrg1{r3UL%*sL^ujZ!@tp5x7ed6hsZ7OFJm=lydW-C0}!N-hJ_XaUdSu0~1p@1s*W
z)n#cKweMo~bCPx%rX5mvet?$jp8YvVr6Vw+RUIQ}d!RcrAX=}sSpR2C(;ob*0!`H`
z-uO9au0hiS1njf8f)1DQ_>Ir;E#E^8gVYdozzN)Mdsw4S33o2E=4?9vxS`9V<>+M2
ziWZRHl#$7ONbJousORdC462r0Nk!fb&fVb?d6<-faSzdoOZF4c{weR0@Z|}lKO}K=
zU~WMsi$7pu*XnpwMZ14vo^zVHO-py3r-b<DL~och2tdyWivgFJuxb410T4Jb1khcU
zgl($S^$dxs@+ktEmf4i8cg*chDo9-zyA&01F>V+0%>2HZ0WCqkelt6N{4>9-8f@iL
z4mw#YY+`UbY+u%JpAPg(^JG(@uB};szt1fR74S=jGEHcS-v*H_c?)*h2F-;mb9;wH
zn=5A~i!4Sx=W)~*uh=<*o<TYfsv^wPl^o@s5VCH<SVQOM2d9*mw|+k=`A+)@91kC)
z$ta0^U`U|b0a7>Tb#tXIQ9#=ob-Lig4mo%CPf9vXq&FEO@>a#RYkkjlYnNB;N3tf@
z>Hl?8|9@MQ3#h4tACvWkPJ$BLDrzL6#jJj{te6SRV-(0&U-^e50Q!pzj<Fnr1k%j1
z!Ez5GDaA8;7H#6{Fj-XCRO%CW#58@wl2l3Aj#yko8N82s<J>0cNV6EJ0}?~-%fzKW
T{rGST_*}5Q_(%DfYf1kCSEFaL

literal 0
HcmV?d00001

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
deleted file mode 100644
index c23f7201..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0001-20251104commit.patch"
+++ /dev/null
@@ -1,1272 +0,0 @@
-From d61fd429337580809fe74a59b1dfa81b91094dae Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Tue, 4 Nov 2025 09:11:51 +0800
-Subject: [PATCH 01/10] 20251104commit
-
----
- mindnlp/transformers/cache_utils.py           |  28 +-
- .../models/deepseek/modeling_deepseek.py      | 149 ++-
- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
- 3 files changed, 976 insertions(+), 87 deletions(-)
-
-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-index cadd2e04..02f8d4be 100644
---- a/mindnlp/transformers/cache_utils.py
-+++ b/mindnlp/transformers/cache_utils.py
-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-             # k_out[:, :, cache_position] = key_states
-             # v_out[:, :, cache_position] = value_states
--            if ON_ORANGE_PI:
--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--            else:
--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--
-+            # if ON_ORANGE_PI:
-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+            # else:
-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+            # 确保 cache_position 是 1D tensor 并且类型正确
-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+            if cache_position.ndim > 1:
-+                cache_position = cache_position.flatten()
-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+                cache_position = cache_position.int()
-+            
-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+            k_out[:, :, cache_position] = key_states
-+            v_out[:, :, cache_position] = value_states
-+            
-         return k_out, v_out
- 
-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index c695b944..d8303e45 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
- # Copied from transformers.models.llama.modeling_llama.rotate_half
- def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
--    x1 = x[..., : x.shape[-1] // 2]
--    x2 = x[..., x.shape[-1] // 2 :]
-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+    # x1 = x[..., : x.shape[-1] // 2]
-+    # x2 = x[..., x.shape[-1] // 2 :]
-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
- 
-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-         if self.training:
-             raise NotImplementedError("Training is not supported yet.")
-         else:
--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--        if self.config.n_shared_experts is not None:
--            y = y + self.shared_experts(identity)
--        return y
-+            # @lwx
-+            if orig_shape[1] == 1:
-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+                y=y.view(*orig_shape)
-+                if self.config.n_shared_experts is not None:
-+                    y = y + self.shared_experts(identity)
-+                return y
-+            else:
-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+                if self.config.n_shared_experts is not None:
-+                    y = y + self.shared_experts(identity)
-+                return y
-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+        # if self.config.n_shared_experts is not None:
-+        #     y = y + self.shared_experts(identity)
-+        # return y
-+        
-+    @no_grad()
-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+
-+        expert_cache = ops.zeros_like(x)
-+        for i in range(self.num_experts_per_tok):
-+            expert_id = flat_expert_indices[i].item()
-+            weight = flat_expert_weights[i].item()
-+            expert = self.experts[expert_id]
-+            expert_out = expert(x)
-+            expert_cache += expert_out * weight
-+        return expert_cache
- 
-     @no_grad()
--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--        # expert_cache = torch.zeros_like(x)
--        # idxs = flat_expert_indices.argsort()
--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--        # token_idxs = idxs // self.num_experts_per_tok
--        # for i, end_idx in enumerate(tokens_per_expert):
--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--        #     if start_idx == end_idx:
--        #         continue
--        #     expert = self.experts[i]
--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--        #     expert_tokens = x[exp_token_idx]
--        #     expert_out = expert(expert_tokens)
--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--        # return expert_cache
-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-         expert_cache = ops.zeros_like(x)
-         idxs = flat_expert_indices.argsort()
-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-         token_idxs = idxs // self.num_experts_per_tok
-+
-         for i, end_idx in enumerate(tokens_per_expert):
-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-             if start_idx == end_idx:
-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-             expert_out = expert(expert_tokens)
-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+
-         return expert_cache
-+        
-+    # @no_grad()
-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+    #     # expert_cache = torch.zeros_like(x)
-+    #     # idxs = flat_expert_indices.argsort()
-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+    #     # token_idxs = idxs // self.num_experts_per_tok
-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+    #     #     if start_idx == end_idx:
-+    #     #         continue
-+    #     #     expert = self.experts[i]
-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+    #     #     expert_tokens = x[exp_token_idx]
-+    #     #     expert_out = expert(expert_tokens)
-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+    #     # return expert_cache
-+    #     expert_cache = ops.zeros_like(x)
-+    #     idxs = flat_expert_indices.argsort()
-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+    #     token_idxs = idxs // self.num_experts_per_tok
-+
-+    #     for i, end_idx in enumerate(tokens_per_expert):
-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+    #         if start_idx == end_idx:
-+    #             continue
-+    #         expert = self.experts[i]
-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+    #         expert_tokens = x[exp_token_idx]
-+    #         expert_out = expert(expert_tokens)
-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+
-+    #     return expert_cache
-+    # @no_grad()
-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+    #     expert_cache = ops.zeros_like(x)
-+
-+    #     # 排序保证顺序一致
-+    #     idxs = flat_expert_indices.argsort()
-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+    #     token_idxs = idxs // self.num_experts_per_tok
-+
-+    #     # 找出有 token 的专家
-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+
-+    #     for i in active_experts.tolist():
-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+    #         end_idx = tokens_per_expert[i]
-+    #         if start_idx == end_idx:  # 没有 token
-+    #             continue
-+
-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+    #         expert_tokens = x[exp_token_idx]
-+    #         expert_out = self.experts[i](expert_tokens)
-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+
-+    #         expert_cache = mindspore.mint.scatter_add(
-+    #             expert_cache,
-+    #             0,
-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+    #             expert_out
-+    #         )
-+
-+    #     return expert_cache
-+
-+
- 
- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
- #     """
-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
- 
-         # Initialize weights and apply final processing
-         self.post_init()
-+        self.warm_up = False
-+
-+    def warmup_moe_model_deep(self):
-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+        test_texts = [
-+            "warmup short",
-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+        ]
-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+        if tokenizer is None:
-+            from mindnlp.transformers import AutoTokenizer
-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+            self._warmup_tokenizer = tokenizer
-+
-+        for text in test_texts:
-+            inputs = tokenizer(text, return_tensors="ms")
-+            with mindspore._no_grad():
-+                _ = self(**inputs, use_cache=False)
-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
- 
-     def get_input_embeddings(self):
-         return self.model.embed_tokens
-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-         ```"""
-+        if not self.warm_up:
-+            self.warm_up = True
-+            self.warmup_moe_model_deep()
-+
-         output_attentions = (
-             output_attentions
-             if output_attentions is not None
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index 3cbf820e..d4c6b651 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -18,7 +18,6 @@
- # See the License for the specific language governing permissions and
- # limitations under the License.
- """MindSpore Qwen2MoE model."""
--
- import math
- from typing import List, Optional, Tuple, Union
- 
-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-     TokenClassifierOutput,
- )
- from ...modeling_utils import PreTrainedModel
-+from ...generation import GenerationMixin
- from ....utils import logging
- from .configuration_qwen2_moe import Qwen2MoeConfig
- 
-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-         self.variance_epsilon = eps
- 
-     def forward(self, hidden_states):
-+        # @dwj
-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+        # @lwx
-+        # if not self.training :
-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-         input_dtype = hidden_states.dtype
-         hidden_states = hidden_states.to(mindspore.float32)
-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-@@ -234,6 +239,8 @@ def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
-     x1 = x[..., : x.shape[-1] // 2]
-     x2 = x[..., x.shape[-1] // 2 :]
-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
- 
-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-         self.config = config
-         self.hidden_size = config.hidden_size
-         self.intermediate_size = intermediate_size
-+        
-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-         self.act_fn = ACT2FN[config.hidden_act]
- 
-     def forward(self, x):
--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--
- 
-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+        # @lwx 
-+        # gate_up_output = self.gate_up_proj(x)
-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+        # return self.down_proj(swiglu_output)
-+
-+    # def forward(self, x):
-+    #     gate_proj_out = self.gate_proj(x)
-+    #     up_proj_out = self.up_proj(x)
-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+    #     return self.down_proj(swiglu_out)
-+        
- # Copied from transformers.models.llama.modeling_llama.repeat_kv
- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-     """
-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-         use_cache: bool = False,
-         cache_position: Optional[mindspore.Tensor] = None,
-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+        
-+
-         bsz, q_len, _ = hidden_states.shape
- 
-         query_states = self.q_proj(hidden_states)
-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-                     "with a layer index."
-                 )
--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+            if isinstance(past_key_value, StaticCache):
-+                kv_seq_len = key_states.shape[-2]
-+            else:
-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
- 
-         if past_key_value is not None:
-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+            
-+            if isinstance(past_key_value, StaticCache):
-+                kv_seq_len = key_states.shape[-2]
- 
-         # repeat k/v heads if n_kv_heads < n_heads
-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--
-+        
-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
- 
--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--            raise ValueError(
--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--                f" {attn_weights.shape}"
--            )
--
--        if attention_mask is not None:  # no matter the length, we just slice it
--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+        if attention_mask is not None:
-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-             attn_weights = attn_weights + causal_mask
- 
-         # upcast attention to fp32
-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
- 
-         attn_output = self.o_proj(attn_output)
--
-+        # @lwx
-+        
-+        # max_seq_len = self.max_position_embeddings  # 2048
-+
-+        # if attention_mask is not None:
-+        #     # attention_mask: [B, 1, Sq, Sk]
-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+
-+        #     # pad 到 [max_seq_len, max_seq_len]
-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+        #     global_attention_mask = padded_mask
-+        # else:
-+        #     global_attention_mask = None
-+
-+
-+        # sparse_mode=3
-+        # attn_output = mindspore.ops.flash_attention_score(
-+        #     query=query_states,
-+        #     key=key_states,
-+        #     value=value_states,
-+        #     real_shift=None,
-+        #     padding_mask=None,
-+
-+        #     head_num=self.num_heads,
-+        #     attn_mask=global_attention_mask,
-+        #     keep_prob=1.0 - self.attention_dropout,
-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+        #     input_layout="BNSD",
-+        #     pre_tokens=2147483647,
-+        #     next_tokens=2147483647,
-+        #     inner_precise=0,
-+        #     drop_mask=None,
-+        #     prefix=None,
-+        #     actual_seq_qlen=None,
-+        #     actual_seq_kvlen=None,
-+        #     sparse_mode=sparse_mode,
-+        # )
-         if not output_attentions:
-             attn_weights = None
- 
-         return attn_output, attn_weights, past_key_value
- 
- 
-+class Qwen2MoeFlashAttention(nn.Module):
-+    """
-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+
-+    关键改动:
-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+       直接传入原始的 key 和 value 张量效率更高。
-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+    """
-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+        super().__init__()
-+        self.config = config
-+        self.layer_idx = layer_idx
-+        self.hidden_size = config.hidden_size
-+        self.num_heads = config.num_attention_heads
-+        self.head_dim = self.hidden_size // self.num_heads
-+        self.num_key_value_heads = config.num_key_value_heads
-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+        self.max_position_embeddings = config.max_position_embeddings
-+        self.rope_theta = config.rope_theta
-+        self.attention_dropout = config.attention_dropout
-+
-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-+            raise ValueError(
-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+            )
-+
-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+
-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+            self.head_dim,
-+            max_position_embeddings=self.max_position_embeddings,
-+            base=self.rope_theta,
-+        )
-+
-+    def forward(
-+        self,
-+        hidden_states: mindspore.Tensor,
-+        attention_mask: Optional[mindspore.Tensor] = None,
-+        position_ids: Optional[mindspore.Tensor] = None,
-+        past_key_value: Optional[Cache] = None,
-+        output_attentions: bool = False,
-+        use_cache: bool = False,
-+        cache_position: Optional[mindspore.Tensor] = None,
-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+        bsz, q_len, _ = hidden_states.shape
-+
-+        # 1. 线性投射 Q, K, V
-+        query_states = self.q_proj(hidden_states)
-+        key_states = self.k_proj(hidden_states)
-+        value_states = self.v_proj(hidden_states)
-+
-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+        # 3. RoPE 旋转位置编码
-+        kv_seq_len = key_states.shape[-2]
-+        if past_key_value is not None:
-+            if self.layer_idx is None:
-+                raise ValueError(
-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+                    "with a layer index."
-+                )
-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+                if cache_position.shape[0] == 1:
-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+                    kv_seq_len = past_seen_tokens + 1
-+                else:
-+                    # prefill 阶段：cache_position 是范围，使用其长度
-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+            else:
-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+
-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+        # 4. KV 缓存更新
-+        if past_key_value is not None:
-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+            key_states, value_states = past_key_value.update(
-+                key_states, value_states, self.layer_idx, cache_kwargs
-+            )
-+            
-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+                if cache_position.shape[0] == 1:
-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+                    kv_seq_len = key_states.shape[-2]
-+
-+        # 5. [重要] 准备 Attention Mask
-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+        fa_attention_mask = None
-+        if attention_mask is not None:
-+            # 截取与当前key长度匹配的部分
-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+            fa_attention_mask = (mask_slice != 0)
-+
-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+        input_dtype = query_states.dtype
-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+            query_states = query_states.to(mindspore.float16)
-+            key_states = key_states.to(mindspore.float16)
-+            value_states = value_states.to(mindspore.float16)
-+
-+        # 6. [核心] 调用 flash_attention_score 算子
-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+        attn_output = mindspore.ops.flash_attention_score(
-+            query=query_states,
-+            key=key_states,
-+            value=value_states,
-+            head_num=self.num_heads, # 传入Q的头数(N1)
-+            attn_mask=fa_attention_mask,
-+            keep_prob=1.0 - self.attention_dropout,
-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-+            input_layout="BNSD",
-+            sparse_mode=0 # 使用 defaultMask 模式
-+        )
-+
-+        # 恢复原始数据类型
-+        attn_output = attn_output.to(input_dtype)
-+
-+        # 7. 调整输出形状
-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+        attn_output = self.o_proj(attn_output)
-+
-+        # FlashAttention 算子不直接返回注意力权重矩阵
-+        attn_weights = None
-+        if output_attentions:
-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+
-+        return attn_output, attn_weights, past_key_value
-+
-+    # def forward(
-+    #     self,
-+    #     hidden_states: mindspore.Tensor,
-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+    #     past_key_value: Optional[Cache] = None,
-+    #     output_attentions: bool = False,
-+    #     use_cache: bool = False,
-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+    #     bsz, q_len, _ = hidden_states.shape
-+
-+    #     # 1. 线性投射 Q, K, V
-+    #     query_states = self.q_proj(hidden_states)
-+    #     key_states = self.k_proj(hidden_states)
-+    #     value_states = self.v_proj(hidden_states)
-+
-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+    #     # 3. RoPE 旋转位置编码
-+    #     kv_seq_len = key_states.shape[-2]
-+    #     if past_key_value is not None:
-+    #         if self.layer_idx is None:
-+    #             raise ValueError(
-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+    #                 "with a layer index."
-+    #             )
-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+
-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+    #     # 4. KV 缓存更新
-+    #     if past_key_value is not None:
-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+    #         key_states, value_states = past_key_value.update(
-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+    #         )
-+
-+    #     # 5. 准备 Attention Mask
-+    #     fa_attention_mask = None
-+    #     if attention_mask is not None:
-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+    #         fa_attention_mask = (mask_slice != 0)
-+
-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+    #     input_dtype = query_states.dtype
-+
-+    #     # 6. [核心] 调用 flash_attention_score 算子
-+    #     attn_output = mindspore.ops.flash_attention_score(
-+    #         query=query_states,
-+    #         key=key_states,
-+    #         value=value_states,
-+    #         head_num=self.num_heads,
-+    #         attn_mask=fa_attention_mask,
-+    #         keep_prob=1.0 - self.attention_dropout,
-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+    #         input_layout="BNSD",
-+    #         sparse_mode=0,
-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+    #         inner_precise=1
-+    #     )
-+
-+    #     # 恢复原始数据类型
-+    #     attn_output = attn_output.to(input_dtype)
-+
-+    #     # 7. 调整输出形状
-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+    #     attn_output = self.o_proj(attn_output)
-+
-+    #     attn_weights = None
-+    #     if output_attentions:
-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+
-+    #     return attn_output, attn_weights, past_key_value
-+
-+    # def forward(
-+    #     self,
-+    #     hidden_states: mindspore.Tensor,
-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+    #     past_key_value: Optional[Cache] = None,
-+    #     output_attentions: bool = False,
-+    #     use_cache: bool = False,
-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+    #     bsz, q_len, _ = hidden_states.shape
-+
-+    #     query_states = self.q_proj(hidden_states)
-+    #     key_states = self.k_proj(hidden_states)
-+    #     value_states = self.v_proj(hidden_states)
-+
-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+    #     kv_seq_len = key_states.shape[-2]
-+    #     if past_key_value is not None:
-+    #         if self.layer_idx is None:
-+    #             raise ValueError("`layer_idx` must be specified for caching")
-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+
-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+    #     if past_key_value is not None:
-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+    #         key_states, value_states = past_key_value.update(
-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+    #         )
-+
-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+
-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+    #     query_states = query_states / math.sqrt(self.head_dim)
-+    #     # <--- 修改结束 ---
-+
-+    #     fa_attention_mask = None
-+    #     if attention_mask is not None:
-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+    #         fa_attention_mask = (mask_slice != 0)
-+
-+    #     input_dtype = query_states.dtype
-+
-+    #     attn_output = mindspore.ops.flash_attention_score(
-+    #         query=query_states,    # 传入已经预先缩放过的 query
-+    #         key=key_states,
-+    #         value=value_states,
-+    #         head_num=self.num_heads,
-+    #         attn_mask=fa_attention_mask,
-+    #         keep_prob=1.0 - self.attention_dropout,
-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+    #         input_layout="BNSD",
-+    #         sparse_mode=0,
-+    #         inner_precise=1        # 仍然保持内部高精度计算
-+    #     )
-+
-+    #     attn_output = attn_output.to(input_dtype)
-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+    #     attn_output = self.o_proj(attn_output)
-+
-+    #     attn_weights = None
-+    #     if output_attentions:
-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+
-+    #     return attn_output, attn_weights, past_key_value
-+
- QWEN2MOE_ATTENTION_CLASSES = {
-     "eager": Qwen2MoeAttention,
-+    "flash-attention": Qwen2MoeFlashAttention,
- }
- 
- 
-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
- 
-+    #@dwj
-+    # 只遍历激活的专家，而非全部专家
-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--        hidden_states = hidden_states.view(-1, hidden_dim)
--        # router_logits: (batch * sequence_length, n_experts)
--        router_logits = self.gate(hidden_states)
--
--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--        if self.norm_topk_prob:
--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        # we cast back to the input dtype
--        routing_weights = routing_weights.to(hidden_states.dtype)
--
--        final_hidden_states = ops.zeros(
--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--        )
--
--        # One hot encode the selected experts to create an expert mask
--        # this will be used to easily index which expert is going to be sollicitated
--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--
--        # Loop over all available experts in the model and perform the computation on each expert
--        for expert_idx in range(self.num_experts):
--            expert_layer = self.experts[expert_idx]
--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--
--            # Index the correct hidden states and compute the expert hidden state for
--            # the current expert. We need to make sure to multiply the output hidden
--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--            if 0 not in idx.shape:
--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--
--                # However `index_add_` only support torch tensors for indexing so we'll use
--                # the `top_x` tensor here.
--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--
--        shared_expert_output = self.shared_expert(hidden_states)
--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--
--        final_hidden_states = final_hidden_states + shared_expert_output
-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+            num_tokens = hidden_states_reshaped.shape[0]
-+
-+            router_logits = self.gate(hidden_states_reshaped)
-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+
-+            if self.norm_topk_prob:
-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+            routing_weights = routing_weights.to(hidden_states.dtype)
-+            
-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+            flat_selected_experts = selected_experts.flatten()
-+            
-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+            token_indices = broadcasted_token_indices.flatten()
-+            
-+            active_experts = ops.unique(flat_selected_experts)
-+            
-+            for expert_idx_tensor in active_experts:
-+                expert_idx = expert_idx_tensor.item()
-+                expert_layer = self.experts[expert_idx]
-+                
-+                mask = (flat_selected_experts == expert_idx_tensor)
-+                selected_token_indices = token_indices[mask]
-+                selected_routing_weights = routing_weights.flatten()[mask]
-+                
-+                current_states = hidden_states_reshaped[selected_token_indices]
-+                
-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+                
-+                final_hidden_states = final_hidden_states.index_add(
-+                    dim=0,
-+                    index=selected_token_indices,
-+                    source=expert_output.to(hidden_states.dtype)
-+                )
-+            
-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
- 
--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--        return final_hidden_states, router_logits
-+            final_hidden_states = final_hidden_states + shared_expert_output
-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+            
-+            return final_hidden_states, router_logits
- 
- 
- class Qwen2MoeDecoderLayer(nn.Module):
-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
- 
-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+
-         if (layer_idx not in config.mlp_only_layers) and (
-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-         ):
-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-     _skip_keys_device_placement = "past_key_values"
-     _supports_cache_class = True
-+#lwx
-+    # _supports_static_cache = True
- 
-     def _init_weights(self, module):
-         std = self.config.initializer_range
-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-         return causal_mask
- 
- 
--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-     _tied_weights_keys = ["lm_head.weight"]
- 
-     def __init__(self, config):
-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-         self.num_experts_per_tok = config.num_experts_per_tok
-         # Initialize weights and apply final processing
-         self.post_init()
-+        # @lwx
-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+        #     self.generation_config.cache_implementation = "static"
-+        self._warmed_up = False
-+
-+    def warmup_moe_model(self):
-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+        test_texts = [
-+            "warmup short",
-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+        ]
-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+        if tokenizer is None:
-+            from mindnlp.transformers import AutoTokenizer
-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+            self._warmup_tokenizer = tokenizer
-+
-+        for text in test_texts:
-+            inputs = tokenizer(text, return_tensors="ms")
-+            with mindspore._no_grad():
-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
- 
-     def get_input_embeddings(self):
-         return self.model.embed_tokens
-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-         ```"""
-+        if not self._warmed_up:
-+            self._warmed_up = True
-+            self.warmup_moe_model()
- 
-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-         output_router_logits = (
-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-             }
-         )
-         return model_inputs
-+# @lwx
-+    # def _decode_one_tokens_logits(
-+    #     self,
-+    #     cur_token: mindspore.Tensor,
-+    #     input_pos: Optional[mindspore.Tensor],
-+    #     cache_position: mindspore.Tensor,
-+    #     past_key_values: StaticCache,
-+    # ) -> mindspore.Tensor:
-+    #     """
-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+        
-+    #     Args:
-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+    #         input_pos: 输入位置信息，可选
-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+            
-+    #     Returns:
-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+    #     """
-+    #     # 调用JIT编译的版本
-+    #     return self.get_decode_one_tokens_logits(
-+    #         cur_token=cur_token,
-+    #         input_pos=input_pos,
-+    #         cache_position=cache_position,
-+    #         past_key_values=past_key_values,
-+    #     )
-+    
-+    # @mindspore.jit(jit_level='O1')
-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+    #     """
-+    #     JIT编译的函数，用于高效的单token解码
-+    #     使用JIT编译优化以支持静态shape和高效执行
-+        
-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+    #     """
-+    #     outputs = self.model.forward(
-+    #         input_ids=cur_token,
-+    #         position_ids=input_pos,
-+    #         cache_position=cache_position,
-+    #         past_key_values=past_key_values,
-+    #         use_cache=True,
-+    #         return_dict=False,
-+    #     )
-+        
-+    #     hidden_states = outputs[0]
-+    #     logits = self.lm_head.forward(hidden_states)
-+    #     logits = logits.float()
-+        
-+    #     return logits[:, -1, :]
-+
-+    # def _sample(
-+    #     self,
-+    #     input_ids: mindspore.Tensor,
-+    #     logits_processor,
-+    #     stopping_criteria,
-+    #     generation_config,
-+    #     synced_devices: bool,
-+    #     streamer=None,
-+    #     logits_warper=None,
-+    #     **model_kwargs,
-+    # ):
-+    #     """
-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+    #     """
-+    #     from ...generation.logits_process import LogitsProcessorList
-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+    #     from mindnlp.core import nn, ops, no_grad
-+    #     import numpy as np
-+        
-+    #     # 检查是否使用 StaticCache
-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+    #     # 否则，直接调用父类方法
-+    #     past_key_values = model_kwargs.get("past_key_values")
-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+
-+    #     if not isinstance(past_key_values, StaticCache):
-+    #         # 不使用 StaticCache，直接调用父类方法
-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+    #         return super()._sample(
-+    #             input_ids=input_ids,
-+    #             logits_processor=logits_processor,
-+    #             stopping_criteria=stopping_criteria,
-+    #             generation_config=generation_config,
-+    #             synced_devices=synced_devices,
-+    #             streamer=streamer,
-+    #             logits_warper=logits_warper,
-+    #             **model_kwargs,
-+    #         )
-+        
-+    #     # 使用 StaticCache，进入自定义循环
-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+    #     pad_token_id = generation_config._pad_token_tensor
-+    #     output_attentions = generation_config.output_attentions
-+    #     output_hidden_states = generation_config.output_hidden_states
-+    #     output_scores = generation_config.output_scores
-+    #     output_logits = generation_config.output_logits
-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+    #     max_length = generation_config.max_length
-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+    #     do_sample = generation_config.do_sample
-+        
-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+    #         raise ValueError(
-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+    #             f"{logits_warper})."
-+    #         )
-+        
-+    #     # init attention / hidden states / scores tuples
-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+        
-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+    #         encoder_hidden_states = (
-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+    #         )
-+        
-+    #     # keep track of which sequences are already finished
-+    #     batch_size, cur_len = input_ids.shape
-+    #     this_peer_finished = False
-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+        
-+    #     time_record = []
-+    #     from ....utils.testing_utils import parse_flag_from_env
-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+        
-+    #     while self._has_unfinished_sequences(
-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+    #     ):
-+    #         if _record_time:
-+    #             import time as time_module
-+    #             infer_start = time_module.time()
-+            
-+    #         # prepare model inputs
-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+            
-+    #         # prepare variable output controls
-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+            
-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+    #         cur_cache_position = model_inputs.get("cache_position")
-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-+    #         cur_input_ids = model_inputs.get("input_ids")
-+            
-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+    #             cur_cache_position is not None and 
-+    #             len(cur_cache_position.shape) > 0 and
-+    #             cur_cache_position.shape[0] == 1 and
-+    #             cur_input_ids is not None and
-+    #             cur_input_ids.shape[1] == 1):
-+    #             # 使用 JIT 优化的单 token 解码
-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+    #             if not hasattr(self, '_jit_used'):
-+    #                 self._jit_used = False
-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+                
-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-+    #                 cur_token=cur_input_ids,
-+    #                 input_pos=model_inputs.get("position_ids"),
-+    #                 cache_position=cur_cache_position,
-+    #                 past_key_values=cur_past_key_values,
-+    #             )
-+                
-+    #             # 标记已使用JIT（用于后续判断）
-+    #             if not self._jit_used:
-+    #                 self._jit_used = True
-+                
-+    #             # 构造兼容的输出对象
-+    #             class JitOptimizedOutput:
-+    #                 def __init__(self, logits, config):
-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+    #                     self.config = config
-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-+    #                     self.cross_attentions = None
-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+                
-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+    #         else:
-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+    #             outputs = self(**model_inputs, return_dict=True)
-+            
-+    #         if synced_devices and this_peer_finished:
-+    #             continue
-+            
-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+    #         next_token_logits = outputs.logits[:, -1, :]
-+            
-+    #         # pre-process distribution
-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+    #         if do_sample:
-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+            
-+    #         # Store scores, attentions and hidden_states when required
-+    #         if return_dict_in_generate:
-+    #             if output_scores:
-+    #                 scores += (next_token_scores,)
-+    #             if output_logits:
-+    #                 raw_logits += (next_token_logits,)
-+    #             if output_attentions:
-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+    #                 if self.config.is_encoder_decoder:
-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+                
-+    #             if output_hidden_states:
-+    #                 hidden = (
-+    #                     outputs.decoder_hidden_states
-+    #                     if self.config.is_encoder_decoder
-+    #                     else outputs.hidden_states
-+    #                 )
-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+            
-+    #         # token selection
-+    #         if do_sample:
-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+    #         else:
-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+            
-+    #         # finished sentences should have their next token be a padding token
-+    #         if has_eos_stopping_criteria:
-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+            
-+    #         # update generated ids, model inputs, and length for next step
-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+    #         if streamer is not None:
-+    #             streamer.put(next_tokens)
-+            
-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-+    #             outputs,
-+    #             model_kwargs,
-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+    #         )
-+            
-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+    #         cur_len += 1
-+            
-+    #         if _record_time:
-+    #             import time as time_module
-+    #             infer_stop = time_module.time()
-+    #             time_record.append(infer_stop - infer_start)
-+            
-+    #         del outputs
-+        
-+    #     average_infer_time = None
-+    #     if time_record:
-+    #         if len(time_record) > 1:
-+    #             time_record.pop(0)
-+    #         average_infer_time = sum(time_record) / len(time_record)
-+    #         print(f'average inference time is: {average_infer_time}')
-+    #         print(f'inference time record: {time_record}')
-+        
-+    #     if streamer is not None:
-+    #         streamer.end()
-+        
-+    #     # 简单判断：打印是否使用了JIT路径
-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-+    #     else:
-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+        
-+    #     if return_dict_in_generate:
-+    #         if self.config.is_encoder_decoder:
-+    #             return GenerateEncoderDecoderOutput(
-+    #                 sequences=input_ids,
-+    #                 scores=scores,
-+    #                 logits=raw_logits,
-+    #                 encoder_attentions=encoder_attentions,
-+    #                 encoder_hidden_states=encoder_hidden_states,
-+    #                 decoder_attentions=decoder_attentions,
-+    #                 cross_attentions=cross_attentions,
-+    #                 decoder_hidden_states=decoder_hidden_states,
-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+    #                 average_infer_time=average_infer_time
-+    #             )
-+    #         else:
-+    #             return GenerateDecoderOnlyOutput(
-+    #                 sequences=input_ids,
-+    #                 scores=scores,
-+    #                 logits=raw_logits,
-+    #                 attentions=decoder_attentions,
-+    #                 hidden_states=decoder_hidden_states,
-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+    #                 average_infer_time=average_infer_time
-+    #             )
-+    #     else:
-+    #         return input_ids
-+            
-+    # def _prepare_cache_for_generation(
-+    #     self,
-+    #     generation_config,
-+    #     model_kwargs,
-+    #     assistant_model,
-+    #     batch_size,
-+    #     max_cache_length,
-+    # ):
-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+    #         generation_config.cache_implementation = "static"
-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+        
-+    #     if generation_config.cache_implementation == "static":
-+    #         base_required_from_max_length = generation_config.max_length + 1
-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-+    #         min_cache_size = 50
-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+    #         else:
-+    #             max_cache_length = max(base_required, min_cache_size)
-+            
-+    #         original_max_cache_length = max_cache_length
-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-+            
-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+    #             if max_cache_length > self.config.max_position_embeddings:
-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+        
-+    #     result = super()._prepare_cache_for_generation(
-+    #         generation_config=generation_config,
-+    #         model_kwargs=model_kwargs,
-+    #         assistant_model=assistant_model,
-+    #         batch_size=batch_size,
-+    #         max_cache_length=max_cache_length,
-+    #     )
-+        
-+    #     if generation_config.cache_implementation == "static":
-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+    #         created_cache = model_kwargs.get(cache_name)
-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+    #             if created_cache.max_cache_len < generation_config.max_length:
-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+        
-+    #     return result
-+
-+
-+
- 
- 
- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
deleted file mode 100644
index baee9388..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0002-20251106commit.patch"
+++ /dev/null
@@ -1,3200 +0,0 @@
-From dcd6fc7b6307db27f23087ba3958949eb52a9beb Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Thu, 6 Nov 2025 09:20:38 +0800
-Subject: [PATCH 02/10] 20251106commit
-
----
- .../models/deepseek/modeling_deepseek.py      |  379 ++++-
- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
- patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
- 3 files changed, 2689 insertions(+), 305 deletions(-)
- create mode 100644 patches/0001-20251104commit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index d8303e45..73773c22 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-         #     y = y + self.shared_experts(identity)
-         # return y
-         
-+    # @no_grad()
-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+
-+    #     expert_cache = ops.zeros_like(x)
-+    #     for i in range(self.num_experts_per_tok):
-+    #         expert_id = flat_expert_indices[i].item()
-+    #         weight = flat_expert_weights[i].item()
-+    #         expert = self.experts[expert_id]
-+    #         expert_out = expert(x)
-+    #         expert_cache += expert_out * weight
-+    #     return expert_cache
-+
-     @no_grad()
-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+        # x 的 shape: (1, hidden_size)
-+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+
-+        # 1. 收集所有需要的专家层
-+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+
-+        # 2. 并行计算所有专家的输出
-+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+        # ops.cat 会将它们堆叠成一个新的 Tensor
-+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+
-+        # 3. 使用矩阵乘法进行加权求和
-+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+        # 最终结果 final_output 的 shape: (1, hidden_size)
-+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+        
-+        return final_output
- 
--        expert_cache = ops.zeros_like(x)
--        for i in range(self.num_experts_per_tok):
--            expert_id = flat_expert_indices[i].item()
--            weight = flat_expert_weights[i].item()
--            expert = self.experts[expert_id]
--            expert_out = expert(x)
--            expert_cache += expert_out * weight
--        return expert_cache
- 
-     @no_grad()
-     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-             key_states = self.k_proj(hidden_states)
-             value_states = self.v_proj(hidden_states)
- 
--        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+        # @lwx
-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-+        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
- 
-         kv_seq_len = key_states.shape[-2]
-         if past_key_value is not None:
-@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-         return attn_output, attn_weights, past_key_value
- 
- 
-+# class DeepseekFlashAttention(nn.Module):
-+#     """
-+#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+
-+#     This class is designed as a drop-in replacement for DeepseekAttention.
-+#     """
-+
-+#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+#         super().__init__()
-+#         self.config = config
-+#         self.layer_idx = layer_idx
-+#         if layer_idx is None:
-+#             logger.warning(
-+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+#                 "when creating this class."
-+#             )
-+
-+#         self.attention_dropout = config.attention_dropout
-+#         self.hidden_size = config.hidden_size
-+#         self.num_heads = config.num_attention_heads
-+#         self.head_dim = self.hidden_size // self.num_heads
-+#         self.num_key_value_heads = config.num_key_value_heads
-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+#         self.max_position_embeddings = config.max_position_embeddings
-+#         self.rope_theta = config.rope_theta
-+#         self.is_causal = True
-+
-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+#             raise ValueError(
-+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+#                 f" and `num_heads`: {self.num_heads})."
-+#             )
-+
-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+#         self._init_rope()
-+
-+#     def _init_rope(self):
-+#         if self.config.rope_scaling is None:
-+#             self.rotary_emb = DeepseekRotaryEmbedding(
-+#                 self.head_dim,
-+#                 max_position_embeddings=self.max_position_embeddings,
-+#                 base=self.rope_theta,
-+#             )
-+#         else:
-+#             scaling_type = self.config.rope_scaling["type"]
-+#             scaling_factor = self.config.rope_scaling["factor"]
-+#             if scaling_type == "linear":
-+#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+#                     self.head_dim,
-+#                     max_position_embeddings=self.max_position_embeddings,
-+#                     scaling_factor=scaling_factor,
-+#                     base=self.rope_theta,
-+#                 )
-+#             elif scaling_type == "dynamic":
-+#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+#                     self.head_dim,
-+#                     max_position_embeddings=self.max_position_embeddings,
-+#                     scaling_factor=scaling_factor,
-+#                     base=self.rope_theta,
-+#                 )
-+#             else:
-+#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+
-+#     def forward(
-+#         self,
-+#         hidden_states: mindspore.Tensor,
-+#         attention_mask: Optional[mindspore.Tensor] = None,
-+#         position_ids: Optional[mindspore.Tensor] = None,
-+#         past_key_value: Optional[Cache] = None,
-+#         output_attentions: bool = False,
-+#         use_cache: bool = False,
-+#         **kwargs,
-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+#         if "padding_mask" in kwargs:
-+#             warnings.warn(
-+#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+#             )
-+        
-+#         if output_attentions:
-+#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+
-+#         bsz, q_len, _ = hidden_states.shape
-+
-+#         if self.config.pretraining_tp > 1:
-+#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+
-+#         query_states = self.q_proj(hidden_states)
-+#         key_states = self.k_proj(hidden_states)
-+#         value_states = self.v_proj(hidden_states)
-+
-+#         # Reshape for multi-head attention
-+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+#         kv_seq_len = key_states.shape[-2]
-+#         if past_key_value is not None:
-+#             if self.layer_idx is None:
-+#                 raise ValueError(
-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+#                     "with a layer index."
-+#                 )
-+#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+        
-+#         # Apply Rotary Positional Embedding
-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+#         if past_key_value is not None:
-+#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+
-+#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+        
-+#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+        
-+#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+
-+#         # Convert attention_mask for flash_attention_score
-+#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+#         if attention_mask is not None:
-+#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+#                 raise ValueError(
-+#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+#                 )
-+#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+#         else:
-+#             attn_mask_for_fa = None
-+        
-+#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+
-+#         # Call the fused flash_attention_score operator
-+#         attn_output = mindspore.ops.flash_attention_score(
-+#             query=query_states_for_fa,
-+#             key=key_states_for_fa,
-+#             value=value_states_for_fa,
-+#             head_num=self.num_heads, # This is N1, the number of query heads
-+#             input_layout='BSH',
-+#             attn_mask=attn_mask_for_fa,
-+#             keep_prob=keep_prob,
-+#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+#             sparse_mode=0 # Default mask mode
-+#         )
-+        
-+#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+#         attn_output = self.o_proj(attn_output)
-+        
-+#         # Flash Attention does not return attention weights
-+#         attn_weights = None
-+
-+#         return attn_output, attn_weights, past_key_value
-+
-+class DeepseekFlashAttention(nn.Module):
-+    """
-+    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-+    This implementation is a drop-in replacement for the original DeepseekAttention class,
-+    designed for high performance on supported hardware (Ascend).
-+
-+    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-+    """
-+    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+        super().__init__()
-+        self.config = config
-+        self.layer_idx = layer_idx
-+        if layer_idx is None:
-+            logger.warning(
-+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+                "when creating this class."
-+            )
-+
-+        # --- [FIX] Correctly initialize all required attributes ---
-+        self.attention_dropout = config.attention_dropout
-+        self.hidden_size = config.hidden_size
-+        self.num_heads = config.num_attention_heads
-+        self.head_dim = self.hidden_size // self.num_heads
-+        self.num_key_value_heads = config.num_key_value_heads
-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+        self.max_position_embeddings = config.max_position_embeddings
-+        self.rope_theta = config.rope_theta
-+        self.is_causal = True
-+
-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-+            raise ValueError(
-+                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+                f" and `num_heads`: {self.num_heads})."
-+            )
-+
-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+        
-+        # This call will now succeed as all attributes are initialized.
-+        self._init_rope()
-+
-+    def _init_rope(self):
-+        if self.config.rope_scaling is None:
-+            self.rotary_emb = DeepseekRotaryEmbedding(
-+                self.head_dim,
-+                max_position_embeddings=self.max_position_embeddings,
-+                base=self.rope_theta,
-+            )
-+        else:
-+            scaling_type = self.config.rope_scaling["type"]
-+            scaling_factor = self.config.rope_scaling["factor"]
-+            if scaling_type == "linear":
-+                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+                    self.head_dim,
-+                    max_position_embeddings=self.max_position_embeddings,
-+                    scaling_factor=scaling_factor,
-+                    base=self.rope_theta,
-+                )
-+            elif scaling_type == "dynamic":
-+                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+                    self.head_dim,
-+                    max_position_embeddings=self.max_position_embeddings,
-+                    scaling_factor=scaling_factor,
-+                    base=self.rope_theta,
-+                )
-+            else:
-+                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+
-+    def forward(
-+        self,
-+        hidden_states: mindspore.Tensor,
-+        attention_mask: Optional[mindspore.Tensor] = None,
-+        position_ids: Optional[mindspore.Tensor] = None,
-+        past_key_value: Optional[Cache] = None,
-+        output_attentions: bool = False,
-+        use_cache: bool = False,
-+        **kwargs,
-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+        if "padding_mask" in kwargs:
-+            warnings.warn(
-+                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+            )
-+        if output_attentions:
-+            warnings.warn(
-+                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-+            )
-+
-+        bsz, q_len, _ = hidden_states.shape
-+
-+        if self.config.pretraining_tp > 1:
-+            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+
-+        query_states = self.q_proj(hidden_states)
-+        key_states = self.k_proj(hidden_states)
-+        value_states = self.v_proj(hidden_states)
-+
-+        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+        kv_seq_len = key_states.shape[-2]
-+        if past_key_value is not None:
-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+        
-+        # Apply Rotary Position Embedding
-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+        if past_key_value is not None:
-+            cache_kwargs = {"sin": sin, "cos": cos}
-+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+
-+        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-+        # So we must explicitly repeat the KV heads.
-+        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+
-+        # Convert attention mask for flash_attention_score
-+        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-+        if attention_mask is not None:
-+            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+                 raise ValueError(
-+                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+                )
-+            attn_mask_for_fa = attention_mask < 0
-+        else:
-+            attn_mask_for_fa = None
-+
-+        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+
-+        # Call the fused operator using the efficient BNSD layout
-+        attn_output = mindspore.ops.flash_attention_score(
-+            query=query_states,
-+            key=key_states,
-+            value=value_states,
-+            head_num=self.num_heads,
-+            input_layout='BNSD', # Specify the correct layout
-+            attn_mask=attn_mask_for_fa,
-+            keep_prob=keep_prob,
-+            scalar_value=1.0 / math.sqrt(self.head_dim)
-+        )
-+        
-+        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+        
-+        # Apply output projection
-+        attn_output = self.o_proj(attn_output)
-+
-+        # Flash attention does not return attention weights, so we return None.
-+        attn_weights = None
-+
-+        return attn_output, attn_weights, past_key_value
-+
- Deepseek_ATTENTION_CLASSES = {
-     "eager": DeepseekAttention,
-+    "flash-attention": DeepseekFlashAttention,
- }
- 
- 
-@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-             config=config, layer_idx=layer_idx
-         )
- 
-+        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+            config=config, layer_idx=layer_idx
-+        )
-+
-         self.mlp = (
-             DeepseekMoE(config)
-             if (
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index d4c6b651..bced285c 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
- 
- import mindspore
- import mindnlp.core.nn.functional as F
--from mindnlp.core import nn, ops
-+from mindnlp.core import nn, ops, no_grad
- from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
- 
- from ....common.activations import ACT2FN
-@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
- 
-+Long_Prompt = False
-+PROMPT_LENGTH_THRESHOLD = 128
- 
- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
- def _prepare_4d_causal_attention_mask_with_cache_position(
-@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-         return attn_output, attn_weights, past_key_value
- 
- 
-+# class Qwen2MoeFlashAttention(nn.Module):
-+#     """
-+#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+
-+#     关键改动:
-+#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+#        直接传入原始的 key 和 value 张量效率更高。
-+#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+#         super().__init__()
-+#         self.config = config
-+#         self.layer_idx = layer_idx
-+#         self.hidden_size = config.hidden_size
-+#         self.num_heads = config.num_attention_heads
-+#         self.head_dim = self.hidden_size // self.num_heads
-+#         self.num_key_value_heads = config.num_key_value_heads
-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+#         self.max_position_embeddings = config.max_position_embeddings
-+#         self.rope_theta = config.rope_theta
-+#         self.attention_dropout = config.attention_dropout
-+
-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+#             raise ValueError(
-+#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+#             )
-+
-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+
-+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+#             self.head_dim,
-+#             max_position_embeddings=self.max_position_embeddings,
-+#             base=self.rope_theta,
-+#         )
-+
-+#     def forward(
-+#         self,
-+#         hidden_states: mindspore.Tensor,
-+#         attention_mask: Optional[mindspore.Tensor] = None,
-+#         position_ids: Optional[mindspore.Tensor] = None,
-+#         past_key_value: Optional[Cache] = None,
-+#         output_attentions: bool = False,
-+#         use_cache: bool = False,
-+#         cache_position: Optional[mindspore.Tensor] = None,
-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+#         bsz, q_len, _ = hidden_states.shape
-+
-+#         # 1. 线性投射 Q, K, V
-+#         query_states = self.q_proj(hidden_states)
-+#         key_states = self.k_proj(hidden_states)
-+#         value_states = self.v_proj(hidden_states)
-+
-+#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+#         # query:   [B, S, H*D] -> [B, N1, S, D]
-+#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+#         # 3. RoPE 旋转位置编码
-+#         kv_seq_len = key_states.shape[-2]
-+#         if past_key_value is not None:
-+#             if self.layer_idx is None:
-+#                 raise ValueError(
-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+#                     "with a layer index."
-+#                 )
-+#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-+#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+#                 if cache_position.shape[0] == 1:
-+#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+#                     kv_seq_len = past_seen_tokens + 1
-+#                 else:
-+#                     # prefill 阶段：cache_position 是范围，使用其长度
-+#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+#             else:
-+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+
-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+#         # 4. KV 缓存更新
-+#         if past_key_value is not None:
-+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+#             key_states, value_states = past_key_value.update(
-+#                 key_states, value_states, self.layer_idx, cache_kwargs
-+#             )
-+            
-+#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+#                 if cache_position.shape[0] == 1:
-+#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+#                     kv_seq_len = key_states.shape[-2]
-+
-+#         # 5. [重要] 准备 Attention Mask
-+#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+#         fa_attention_mask = None
-+#         if attention_mask is not None:
-+#             # 截取与当前key长度匹配的部分
-+#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-+#             fa_attention_mask = (mask_slice != 0)
-+
-+#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+#         input_dtype = query_states.dtype
-+#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+#             query_states = query_states.to(mindspore.float16)
-+#             key_states = key_states.to(mindspore.float16)
-+#             value_states = value_states.to(mindspore.float16)
-+
-+#         # 6. [核心] 调用 flash_attention_score 算子
-+#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+#         attn_output = mindspore.ops.flash_attention_score(
-+#             query=query_states,
-+#             key=key_states,
-+#             value=value_states,
-+#             head_num=self.num_heads, # 传入Q的头数(N1)
-+#             attn_mask=fa_attention_mask,
-+#             keep_prob=1.0 - self.attention_dropout,
-+#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+#             input_layout="BNSD",
-+#             sparse_mode=0 # 使用 defaultMask 模式
-+#         )
-+
-+#         # 恢复原始数据类型
-+#         attn_output = attn_output.to(input_dtype)
-+
-+#         # 7. 调整输出形状
-+#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+#         attn_output = self.o_proj(attn_output)
-+
-+#         # FlashAttention 算子不直接返回注意力权重矩阵
-+#         attn_weights = None
-+#         if output_attentions:
-+#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+
-+#         return attn_output, attn_weights, past_key_value
-+
-+#     # def forward(
-+#     #     self,
-+#     #     hidden_states: mindspore.Tensor,
-+#     #     attention_mask: Optional[mindspore.Tensor] = None,
-+#     #     position_ids: Optional[mindspore.Tensor] = None,
-+#     #     past_key_value: Optional[Cache] = None,
-+#     #     output_attentions: bool = False,
-+#     #     use_cache: bool = False,
-+#     #     cache_position: Optional[mindspore.Tensor] = None,
-+#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+#     #     bsz, q_len, _ = hidden_states.shape
-+
-+#     #     # 1. 线性投射 Q, K, V
-+#     #     query_states = self.q_proj(hidden_states)
-+#     #     key_states = self.k_proj(hidden_states)
-+#     #     value_states = self.v_proj(hidden_states)
-+
-+#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+
-+#     #     # 3. RoPE 旋转位置编码
-+#     #     kv_seq_len = key_states.shape[-2]
-+#     #     if past_key_value is not None:
-+#     #         if self.layer_idx is None:
-+#     #             raise ValueError(
-+#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+#     #                 "with a layer index."
-+#     #             )
-+#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+
-+#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+#     #     # 4. KV 缓存更新
-+#     #     if past_key_value is not None:
-+#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+#     #         key_states, value_states = past_key_value.update(
-+#     #             key_states, value_states, self.layer_idx, cache_kwargs
-+#     #         )
-+
-+#     #     # 5. 准备 Attention Mask
-+#     #     fa_attention_mask = None
-+#     #     if attention_mask is not None:
-+#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+#     #         fa_attention_mask = (mask_slice != 0)
-+
-+#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+#     #     input_dtype = query_states.dtype
-+
-+#     #     # 6. [核心] 调用 flash_attention_score 算子
-+#     #     attn_output = mindspore.ops.flash_attention_score(
-+#     #         query=query_states,
-+#     #         key=key_states,
-+#     #         value=value_states,
-+#     #         head_num=self.num_heads,
-+#     #         attn_mask=fa_attention_mask,
-+#     #         keep_prob=1.0 - self.attention_dropout,
-+#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+#     #         input_layout="BNSD",
-+#     #         sparse_mode=0,
-+#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-+#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+#     #         inner_precise=1
-+#     #     )
-+
-+#     #     # 恢复原始数据类型
-+#     #     attn_output = attn_output.to(input_dtype)
-+
-+#     #     # 7. 调整输出形状
-+#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+#     #     attn_output = self.o_proj(attn_output)
-+
-+#     #     attn_weights = None
-+#     #     if output_attentions:
-+#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+
-+#     #     return attn_output, attn_weights, past_key_value
-+
-+
- class Qwen2MoeFlashAttention(nn.Module):
-     """
--    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--
--    关键改动:
--    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--       直接传入原始的 key 和 value 张量效率更高。
--    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-+
-+    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-+    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-+    完全使用模型的低精度数据类型（如 float16）进行计算，
-+    以达到理论上的最高执行速度。
-     """
-     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-         super().__init__()
-         self.config = config
-         self.layer_idx = layer_idx
-+        if layer_idx is None:
-+            logger.warning_once(
-+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-+            )
-+
-         self.hidden_size = config.hidden_size
-         self.num_heads = config.num_attention_heads
-         self.head_dim = self.hidden_size // self.num_heads
-         self.num_key_value_heads = config.num_key_value_heads
--        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-         self.max_position_embeddings = config.max_position_embeddings
-         self.rope_theta = config.rope_theta
-         self.attention_dropout = config.attention_dropout
- 
--        if (self.head_dim * self.num_heads) != self.hidden_size:
--            raise ValueError(
--                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--            )
--
-         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-         key_states = self.k_proj(hidden_states)
-         value_states = self.v_proj(hidden_states)
- 
--        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--        # query:   [B, S, H*D] -> [B, N1, S, D]
--        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+        # 2. 调整形状以匹配 BNSD 布局
-         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--        # 3. RoPE 旋转位置编码
-+        
-+        # 3. RoPE 和 KV 缓存
-         kv_seq_len = key_states.shape[-2]
-         if past_key_value is not None:
--            if self.layer_idx is None:
--                raise ValueError(
--                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--                    "with a layer index."
--                )
--            # 对于 StaticCache，需要特殊处理 kv_seq_len
--            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--                if cache_position.shape[0] == 1:
--                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--                    kv_seq_len = past_seen_tokens + 1
--                else:
--                    # prefill 阶段：cache_position 是范围，使用其长度
--                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--            else:
--                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--
-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+        
-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
- 
--        # 4. KV 缓存更新
-         if past_key_value is not None:
-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--            key_states, value_states = past_key_value.update(
--                key_states, value_states, self.layer_idx, cache_kwargs
--            )
--            
--            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--                if cache_position.shape[0] == 1:
--                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--                    kv_seq_len = key_states.shape[-2]
--
--        # 5. [重要] 准备 Attention Mask
--        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+
-+        # 4. 准备 Attention Mask
-         fa_attention_mask = None
-         if attention_mask is not None:
--            # 截取与当前key长度匹配的部分
--            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--            # 转换为布尔类型: 大负数 -> True, 0 -> False
-             fa_attention_mask = (mask_slice != 0)
- 
--        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--        input_dtype = query_states.dtype
--        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--            query_states = query_states.to(mindspore.float16)
--            key_states = key_states.to(mindspore.float16)
--            value_states = value_states.to(mindspore.float16)
--
--        # 6. [核心] 调用 flash_attention_score 算子
--        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-         attn_output = mindspore.ops.flash_attention_score(
-             query=query_states,
-             key=key_states,
-             value=value_states,
--            head_num=self.num_heads, # 传入Q的头数(N1)
-+            head_num=self.num_heads,
-             attn_mask=fa_attention_mask,
--            keep_prob=1.0 - self.attention_dropout,
-+            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-             scalar_value=1.0 / math.sqrt(self.head_dim),
-             input_layout="BNSD",
--            sparse_mode=0 # 使用 defaultMask 模式
-+            sparse_mode=0,
-+            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-         )
- 
--        # 恢复原始数据类型
--        attn_output = attn_output.to(input_dtype)
--
--        # 7. 调整输出形状
--        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+        # 6. 调整输出形状
-         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-         attn_output = self.o_proj(attn_output)
- 
--        # FlashAttention 算子不直接返回注意力权重矩阵
-+        # 7. 返回结果
-         attn_weights = None
-         if output_attentions:
--             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
- 
-         return attn_output, attn_weights, past_key_value
- 
--    # def forward(
--    #     self,
--    #     hidden_states: mindspore.Tensor,
--    #     attention_mask: Optional[mindspore.Tensor] = None,
--    #     position_ids: Optional[mindspore.Tensor] = None,
--    #     past_key_value: Optional[Cache] = None,
--    #     output_attentions: bool = False,
--    #     use_cache: bool = False,
--    #     cache_position: Optional[mindspore.Tensor] = None,
--    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
--    #     bsz, q_len, _ = hidden_states.shape
--
--    #     # 1. 线性投射 Q, K, V
--    #     query_states = self.q_proj(hidden_states)
--    #     key_states = self.k_proj(hidden_states)
--    #     value_states = self.v_proj(hidden_states)
--
--    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--    #     # 3. RoPE 旋转位置编码
--    #     kv_seq_len = key_states.shape[-2]
--    #     if past_key_value is not None:
--    #         if self.layer_idx is None:
--    #             raise ValueError(
--    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--    #                 "with a layer index."
--    #             )
--    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
- 
--    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--    #     # 4. KV 缓存更新
--    #     if past_key_value is not None:
--    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--    #         key_states, value_states = past_key_value.update(
--    #             key_states, value_states, self.layer_idx, cache_kwargs
--    #         )
--
--    #     # 5. 准备 Attention Mask
--    #     fa_attention_mask = None
--    #     if attention_mask is not None:
--    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--    #         fa_attention_mask = (mask_slice != 0)
--
--    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--    #     input_dtype = query_states.dtype
--
--    #     # 6. [核心] 调用 flash_attention_score 算子
--    #     attn_output = mindspore.ops.flash_attention_score(
--    #         query=query_states,
--    #         key=key_states,
--    #         value=value_states,
--    #         head_num=self.num_heads,
--    #         attn_mask=fa_attention_mask,
--    #         keep_prob=1.0 - self.attention_dropout,
--    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--    #         input_layout="BNSD",
--    #         sparse_mode=0,
--    #         # <--- 修改点 2: 启用内部高精度计算 ---
--    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--    #         inner_precise=1
--    #     )
--
--    #     # 恢复原始数据类型
--    #     attn_output = attn_output.to(input_dtype)
-+QWEN2MOE_ATTENTION_CLASSES = {
-+    "eager": Qwen2MoeAttention,
-+    "flash-attention": Qwen2MoeFlashAttention,
-+}
- 
--    #     # 7. 调整输出形状
--    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--    #     attn_output = self.o_proj(attn_output)
- 
--    #     attn_weights = None
--    #     if output_attentions:
--    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     def __init__(self, config):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         # gating
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     #@dwj
-+#     # 只遍历激活的专家，而非全部专家
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-+#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#             num_tokens = hidden_states_reshaped.shape[0]
-+
-+#             router_logits = self.gate(hidden_states_reshaped)
-+#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+
-+#             if self.norm_topk_prob:
-+#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+#             routing_weights = routing_weights.to(hidden_states.dtype)
-+            
-+#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+#             flat_selected_experts = selected_experts.flatten()
-+            
-+#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+#             token_indices = broadcasted_token_indices.flatten()
-+            
-+#             active_experts = ops.unique(flat_selected_experts)
-+            
-+#             for expert_idx_tensor in active_experts:
-+#                 expert_idx = expert_idx_tensor.item()
-+#                 expert_layer = self.experts[expert_idx]
-+                
-+#                 mask = (flat_selected_experts == expert_idx_tensor)
-+#                 selected_token_indices = token_indices[mask]
-+#                 selected_routing_weights = routing_weights.flatten()[mask]
-+                
-+#                 current_states = hidden_states_reshaped[selected_token_indices]
-+                
-+#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+                
-+#                 final_hidden_states = final_hidden_states.index_add(
-+#                     dim=0,
-+#                     index=selected_token_indices,
-+#                     source=expert_output.to(hidden_states.dtype)
-+#                 )
-+            
-+#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
- 
--    #     return attn_output, attn_weights, past_key_value
-+#             final_hidden_states = final_hidden_states + shared_expert_output
-+#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+            
-+#             return final_hidden_states, router_logits
-+
-+
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     """
-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         # 门控网络
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         # 专家列表
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+#         # 共享专家
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     @no_grad()
-+#     def _moe_infer_decode(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         """
-+#         【解码路径】针对 sequence_length=1 的极致优化。
-+#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+#         """
-+#         batch_size, hidden_dim = hidden_states.shape
-+        
-+#         expert_outputs_list = [
-+#             ops.cat([
-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+#             ], dim=0) 
-+#             for i in range(batch_size)
-+#         ]
-+        
-+#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+#         # shape: (batch_size, top_k, hidden_dim)
-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+        
-+#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+        
-+#         return moe_output.squeeze(1)
-+
-+#     @no_grad()
-+#     def _moe_infer_prefill(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         """
-+#         【预填充路径】针对 sequence_length > 1 的优化。
-+#         按专家对 Token 进行分组，并进行批处理。
-+#         """
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens = hidden_states.shape[0]
-+#         flat_selected_experts = selected_experts.flatten()
-+        
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+        
-+#         active_experts = ops.unique(flat_selected_experts)
-+        
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+            
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+#             selected_token_indices = token_indices[mask]
-+#             selected_routing_weights = routing_weights.flatten()[mask]
-+            
-+#             current_states = hidden_states[selected_token_indices]
-+            
-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+            
-+#             moe_output = moe_output.index_add(
-+#                 dim=0,
-+#                 index=selected_token_indices,
-+#                 source=expert_output.to(hidden_states.dtype)
-+#             )
-+#         return moe_output
-+
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#         """
-+#         顶层 forward 方法，作为智能分发器。
-+#         """
-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+        
-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#         router_logits = self.gate(hidden_states_reshaped)
-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
- 
--    # def forward(
--    #     self,
--    #     hidden_states: mindspore.Tensor,
--    #     attention_mask: Optional[mindspore.Tensor] = None,
--    #     position_ids: Optional[mindspore.Tensor] = None,
--    #     past_key_value: Optional[Cache] = None,
--    #     output_attentions: bool = False,
--    #     use_cache: bool = False,
--    #     cache_position: Optional[mindspore.Tensor] = None,
--    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
--    #     bsz, q_len, _ = hidden_states.shape
--
--    #     query_states = self.q_proj(hidden_states)
--    #     key_states = self.k_proj(hidden_states)
--    #     value_states = self.v_proj(hidden_states)
--
--    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--    #     kv_seq_len = key_states.shape[-2]
--    #     if past_key_value is not None:
--    #         if self.layer_idx is None:
--    #             raise ValueError("`layer_idx` must be specified for caching")
--    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--
--    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--    #     if past_key_value is not None:
--    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--    #         key_states, value_states = past_key_value.update(
--    #             key_states, value_states, self.layer_idx, cache_kwargs
--    #         )
-+#         if self.norm_topk_prob:
-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+#         routing_weights = routing_weights.to(hidden_states.dtype)
-+        
-+#         moe_output = None
-+#         # 在推理时，根据序列长度选择最优路径
-+#         if not self.training:
-+#             if sequence_length == 1:
-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+#             else:
-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+#         else:
-+#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+#             raise NotImplementedError("Training path is not implemented.")
-+
-+#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+        
-+#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+        
-+#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+        
-+#         return final_hidden_states, router_logits
-+
-+
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     """
-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         # 门控网络
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         # 专家列表
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+#         # 共享专家
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     @no_grad()
-+#     def _moe_infer_decode(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         batch_size, _ = hidden_states.shape
-+#         expert_outputs_list = [
-+#             ops.cat([
-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+#             ], dim=0) 
-+#             for i in range(batch_size)
-+#         ]
-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+#         return moe_output.squeeze(1)
-+
-+#     @no_grad()
-+#     def _moe_infer_prefill(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens = hidden_states.shape[0]
-+#         flat_selected_experts = selected_experts.flatten()
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+#         active_experts = ops.unique(flat_selected_experts)
-+        
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+#             selected_token_indices = token_indices[mask]
-+#             selected_routing_weights = routing_weights.flatten()[mask]
-+#             current_states = hidden_states[selected_token_indices]
-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+#             moe_output = moe_output.index_add(
-+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+#             )
-+#         return moe_output
-+
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#         """
-+#         顶层 forward 方法，作为智能分发器。
-+#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+#         """
-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+        
-+#         # 1. 门控计算 (通用逻辑)
-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#         router_logits = self.gate(hidden_states_reshaped)
-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+
-+#         if self.norm_topk_prob:
-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+#         routing_weights = routing_weights.to(hidden_states.dtype)
-+        
-+#         # 2. 智能分发到最优 MoE 路径
-+#         moe_output = None
-+#         if not self.training:
-+#             if sequence_length == 1:
-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+#             else:
-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+#         else:
-+#             raise NotImplementedError("Training path is not implemented.")
-+
-+#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+        
-+#         # 4. 合并 MoE 输出和共享专家输出
-+#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+        
-+#         # 5. 恢复原始形状并返回
-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+        
-+#         return final_hidden_states, router_logits
-+
-+# prefill fastest
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     """
-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         # 门控网络
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         # 专家列表
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+#         # 共享专家
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     @no_grad()
-+#     def _moe_infer_dispatch(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         """
-+#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+#         """
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens, _ = hidden_states.shape
-+        
-+#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+#         flat_selected_experts = selected_experts.flatten()
-+#         flat_routing_weights = routing_weights.flatten()
- 
--    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--
--    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--    #     query_states = query_states / math.sqrt(self.head_dim)
--    #     # <--- 修改结束 ---
--
--    #     fa_attention_mask = None
--    #     if attention_mask is not None:
--    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--    #         fa_attention_mask = (mask_slice != 0)
--
--    #     input_dtype = query_states.dtype
--
--    #     attn_output = mindspore.ops.flash_attention_score(
--    #         query=query_states,    # 传入已经预先缩放过的 query
--    #         key=key_states,
--    #         value=value_states,
--    #         head_num=self.num_heads,
--    #         attn_mask=fa_attention_mask,
--    #         keep_prob=1.0 - self.attention_dropout,
--    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--    #         input_layout="BNSD",
--    #         sparse_mode=0,
--    #         inner_precise=1        # 仍然保持内部高精度计算
--    #     )
-+#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
- 
--    #     attn_output = attn_output.to(input_dtype)
--    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--    #     attn_output = self.o_proj(attn_output)
-+#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+#         active_experts = ops.unique(flat_selected_experts)
-+        
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+            
-+#             # 找到所有分配给该专家的 token
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+            
-+#             # 使用 mask 选取对应的 token 和权重
-+#             current_token_indices = token_indices[mask]
-+#             current_routing_weights = flat_routing_weights[mask]
-+#             current_hidden_states = hidden_states[current_token_indices]
-+            
-+#             # 对这些 token 进行批处理
-+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+            
-+#             # 使用 index_add 将结果精确地加回到对应位置
-+#             moe_output = moe_output.index_add(
-+#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+#             )
-+#         return moe_output
-+
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#         """
-+#         顶层 forward 方法，作为智能分发器。
-+#         """
-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+        
-+#         # 1. 门控计算
-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#         router_logits = self.gate(hidden_states_reshaped)
-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+
-+#         if self.norm_topk_prob:
-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+#         routing_weights = routing_weights.to(hidden_states.dtype)
-+        
-+#         # 2. 调用统一的 MoE 计算内核
-+#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
- 
--    #     attn_weights = None
--    #     if output_attentions:
--    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+#         # 3. 统一处理共享专家
-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+        
-+#         # 4. 合并输出
-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+        
-+#         # 5. 恢复原始形状并返回
-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+        
-+#         return final_hidden_states, router_logits
-+
-+
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     """
-+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+#     【最终高性能与高精度版】：
-+#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+#     3. 这样实现了速度和准确性的两全其美。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     @no_grad()
-+#     def _moe_infer_decode(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         """
-+#         【解码路径】极致优化版：bmm + 高精度累加。
-+#         """
-+#         original_dtype = hidden_states.dtype
-+#         batch_size, _ = hidden_states.shape
-+        
-+#         expert_outputs_list = [
-+#             ops.cat([
-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+#             ], dim=0) 
-+#             for i in range(batch_size)
-+#         ]
-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+
-+#         # 在 float32 下执行 bmm，得到高精度结果
-+#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+        
-+#         # 将高精度结果转换回原始数据类型
-+#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+        
-+#         return moe_output
-+
-+#     @no_grad()
-+#     def _moe_infer_prefill(
-+#         self, 
-+#         hidden_states: mindspore.Tensor, 
-+#         selected_experts: mindspore.Tensor, 
-+#         routing_weights: mindspore.Tensor
-+#     ) -> mindspore.Tensor:
-+#         """
-+#         【预填充路径】与原始实现一致，结果精确。
-+#         """
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens, _ = hidden_states.shape
-+#         flat_selected_experts = selected_experts.flatten()
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+#         active_experts = ops.unique(flat_selected_experts)
-+        
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+#             selected_token_indices = token_indices[mask]
-+#             selected_routing_weights = routing_weights.flatten()[mask]
-+#             current_states = hidden_states[selected_token_indices]
-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+#             moe_output = moe_output.index_add(
-+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+#             )
-+#         return moe_output
-+
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+        
-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#         router_logits = self.gate(hidden_states_reshaped)
-+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
- 
--    #     return attn_output, attn_weights, past_key_value
-+#         if self.norm_topk_prob:
-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+#         # 如果模型主体是 float16，后续再转换
-+        
-+#         moe_output = None
-+#         if not self.training:
-+#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+#             # _moe_infer_decode 内部会处理好类型转换
-+#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+#             if sequence_length == 1:
-+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+#             else:
-+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+#         else:
-+#             raise NotImplementedError("Training path is not implemented.")
-+
-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+        
-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+        
-+#         return final_hidden_states, router_logits
-+    
- 
--QWEN2MOE_ATTENTION_CLASSES = {
--    "eager": Qwen2MoeAttention,
--    "flash-attention": Qwen2MoeFlashAttention,
--}
-+# class Qwen2MoeSparseMoeBlock(nn.Module):
-+#     """
-+#     【融合版】一个混合专家模块，内置两种推理策略，
-+#     由外部全局变量 `Long_Prompt` 控制：
-+
-+#     - if Long_Prompt is True:  【精度优先模式】
-+#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+#       适用于处理长序列，避免误差累积。
-+
-+#     - if Long_Prompt is False: 【速度优先模式】
-+#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+#       在解码阶段获得极致速度，同时保证结果高度准确。
-+#     """
-+#     def __init__(self, config: Qwen2MoeConfig):
-+#         super().__init__()
-+#         self.num_experts = config.num_experts
-+#         self.top_k = config.num_experts_per_tok
-+#         self.norm_topk_prob = config.norm_topk_prob
-+
-+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+#         self.experts = nn.ModuleList(
-+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+#         )
-+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+
-+#     # --- 速度优先模式的辅助函数 ---
-+#     @no_grad()
-+#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+#         original_dtype = hidden_states.dtype
-+#         batch_size, _ = hidden_states.shape
-+#         expert_outputs_list = [
-+#             ops.cat([
-+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+#             ], dim=0) 
-+#             for i in range(batch_size)
-+#         ]
-+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+#         weights_fp32 = routing_weights.to(mindspore.float32)
-+#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+
-+#     @no_grad()
-+#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens, _ = hidden_states.shape
-+#         flat_selected_experts = selected_experts.flatten()
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+#         active_experts = ops.unique(flat_selected_experts)
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+#             selected_token_indices = token_indices[mask]
-+#             selected_routing_weights = routing_weights.flatten()[mask]
-+#             current_states = hidden_states[selected_token_indices]
-+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+#         return moe_output
-+
-+#     # --- 精度优先模式的辅助函数 ---
-+#     @no_grad()
-+#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+#         moe_output = ops.zeros_like(hidden_states)
-+#         num_tokens, _ = hidden_states.shape
-+#         flat_selected_experts = selected_experts.flatten()
-+#         flat_routing_weights = routing_weights.flatten()
-+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+#         active_experts = ops.unique(flat_selected_experts)
-+#         for expert_idx_tensor in active_experts:
-+#             expert_idx = expert_idx_tensor.item()
-+#             expert_layer = self.experts[expert_idx]
-+#             mask = (flat_selected_experts == expert_idx_tensor)
-+#             current_token_indices = token_indices[mask]
-+#             current_routing_weights = flat_routing_weights[mask]
-+#             current_hidden_states = hidden_states[current_token_indices]
-+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+#         return moe_output
-+
-+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+#         # 声明我们将要使用一个在模块外部定义的全局变量
-+#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+#         global Long_Prompt
-+
-+#         # 1. 门控计算 (所有模式通用)
-+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+#         router_logits = self.gate(hidden_states_reshaped)
-+#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+#         if self.norm_topk_prob:
-+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+#         moe_output = None
-+#         if not self.training:
-+#             # 根据 Long_Prompt 标志选择模式
-+#             if Long_Prompt:
-+#                 # --- 精度优先模式 ---
-+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+#             else:
-+#                 # --- 速度优先模式 ---
-+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+#                 if sequence_length == 1:
-+#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+#                 else:
-+#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+#         else:
-+#             raise NotImplementedError("Training path is not implemented.")
-+
-+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+        
-+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+        
-+#         return final_hidden_states, router_logits
-+    
-+class Qwen2MoeSparseMoeBlock(nn.Module):
-+    """
-+    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+    控制的顶级推理策略：
- 
-+    - if Long_Prompt is True:  【精度优先模式】
-+      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-+      适用于需要严格可复现性的长序列任务。
- 
--class Qwen2MoeSparseMoeBlock(nn.Module):
--    def __init__(self, config):
-+    - if Long_Prompt is False: 【速度优先模式】
-+      采用业界最强的性能组合：
-+      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-+      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-+    """
-+    def __init__(self, config: Qwen2MoeConfig):
-         super().__init__()
-         self.num_experts = config.num_experts
-         self.top_k = config.num_experts_per_tok
-         self.norm_topk_prob = config.norm_topk_prob
- 
--        # gating
-         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-         self.experts = nn.ModuleList(
-             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-         )
--
-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
- 
--    #@dwj
--    # 只遍历激活的专家，而非全部专家
--    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--            batch_size, sequence_length, hidden_dim = hidden_states.shape
--            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--            num_tokens = hidden_states_reshaped.shape[0]
--
--            router_logits = self.gate(hidden_states_reshaped)
--            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--            if self.norm_topk_prob:
--                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--            routing_weights = routing_weights.to(hidden_states.dtype)
--            
--            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--            flat_selected_experts = selected_experts.flatten()
--            
--            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--            token_indices = broadcasted_token_indices.flatten()
--            
--            active_experts = ops.unique(flat_selected_experts)
--            
--            for expert_idx_tensor in active_experts:
--                expert_idx = expert_idx_tensor.item()
--                expert_layer = self.experts[expert_idx]
--                
--                mask = (flat_selected_experts == expert_idx_tensor)
--                selected_token_indices = token_indices[mask]
--                selected_routing_weights = routing_weights.flatten()[mask]
--                
--                current_states = hidden_states_reshaped[selected_token_indices]
--                
--                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--                
--                final_hidden_states = final_hidden_states.index_add(
--                    dim=0,
--                    index=selected_token_indices,
--                    source=expert_output.to(hidden_states.dtype)
--                )
--            
--            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-+    @no_grad()
-+    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+        original_dtype = hidden_states.dtype
-+        batch_size, _ = hidden_states.shape
-+        expert_outputs_list = [
-+            ops.cat([
-+                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+            ], dim=0)
-+            for i in range(batch_size)
-+        ]
-+        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+        weights_fp32 = routing_weights.to(mindspore.float32)
-+        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+        return moe_output_fp32.squeeze(1).to(original_dtype)
-+
-+    @no_grad()
-+    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+        num_tokens, _ = hidden_states.shape
-+        flat_selected_experts = selected_experts.flatten()
-+        sorted_expert_indices = flat_selected_experts.argsort()
-+        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+        original_token_indices = sorted_expert_indices // self.top_k
-+        moe_output = ops.zeros_like(hidden_states)
-+        current_token_offset = 0
-+        for i in range(self.num_experts):
-+            expert_token_count = tokens_per_expert[i] - current_token_offset
-+            if expert_token_count == 0:
-+                continue
-+            end_offset = current_token_offset + expert_token_count
-+            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+            expert_hidden_states = hidden_states[expert_original_token_indices]
-+            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+            current_token_offset += expert_token_count
-+        return moe_output
-+
-+    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+    @no_grad()
-+    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+        moe_output = ops.zeros_like(hidden_states)
-+        num_tokens, _ = hidden_states.shape
-+        flat_selected_experts = selected_experts.flatten()
-+        flat_routing_weights = routing_weights.flatten()
-+        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+        active_experts = ops.unique(flat_selected_experts)
-+        for expert_idx_tensor in active_experts:
-+            expert_idx = expert_idx_tensor.item()
-+            expert_layer = self.experts[expert_idx]
-+            mask = (flat_selected_experts == expert_idx_tensor)
-+            current_token_indices = token_indices[mask]
-+            current_routing_weights = flat_routing_weights[mask]
-+            current_hidden_states = hidden_states[current_token_indices]
-+            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+        return moe_output
- 
--            final_hidden_states = final_hidden_states + shared_expert_output
--            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--            
--            return final_hidden_states, router_logits
-+    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+        global Long_Prompt
-+
-+        # 1. 门控计算 (所有模式通用)
-+        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+        router_logits = self.gate(hidden_states_reshaped)
-+        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+        if self.norm_topk_prob:
-+            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+        
-+        moe_output = None
-+        if Long_Prompt:
-+            # --- 精度优先模式 (ACCURACY MODE) ---
-+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        else:
-+            # --- 速度优先模式 (SPEED MODE) ---
-+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+            if sequence_length == 1:
-+                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+            else:
-+                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        
- 
-+        # 3. 共享专家计算与合并 (所有模式通用)
-+        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+        
-+        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+        
-+        return final_hidden_states, router_logits
- 
- class Qwen2MoeDecoderLayer(nn.Module):
-     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-         super().__init__()
-         self.hidden_size = config.hidden_size
-+        
-+        # if Long_Prompt:
-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+        # else:
-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
- 
-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
--        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--
-         if (layer_idx not in config.mlp_only_layers) and (
-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-         ):
-@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-             self._warmed_up = True
-             self.warmup_moe_model()
- 
-+
-+
-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-         output_router_logits = (
-             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-             router_logits=outputs.router_logits,
-         )
- 
-+    def generate(self, *args, **kwargs):
-+        """
-+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+        """
-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+
-+        input_ids = kwargs.get("input_ids")
-+        if input_ids is None and args:
-+            input_ids = args[0]
-+
-+        if input_ids is not None:
-+            prompt_length = input_ids.shape[1]
-+            
-+            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+                Long_Prompt = True
-+            else:
-+                Long_Prompt = False
-+
-+        return super().generate(*args, **kwargs)
-+    
-     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-     def prepare_inputs_for_generation(
-         self,
-@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-         # Exception 1: when passing input_embeds, input_ids may be missing entries
-         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-+        
-         if past_key_values is not None:
-             if inputs_embeds is not None:  # Exception 1
-                 if 0 not in input_ids.shape:
-@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-             }
-         )
-         return model_inputs
-+
- # @lwx
-     # def _decode_one_tokens_logits(
-     #     self,
-@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-             attentions=outputs.attentions,
-         )
- 
-+
- __all__ = [
-     "Qwen2MoeForCausalLM",
-     "Qwen2MoeModel",
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-new file mode 100644
-index 00000000..6dfb5b93
---- /dev/null
-+++ b/patches/0001-20251104commit.patch
-@@ -0,0 +1,1272 @@
-+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Tue, 4 Nov 2025 09:11:51 +0800
-+Subject: [PATCH] 20251104commit
-+
-+---
-+ mindnlp/transformers/cache_utils.py           |  28 +-
-+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+ 3 files changed, 976 insertions(+), 87 deletions(-)
-+
-+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+index cadd2e04..02f8d4be 100644
-+--- a/mindnlp/transformers/cache_utils.py
-++++ b/mindnlp/transformers/cache_utils.py
-+@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+             # k_out[:, :, cache_position] = key_states
-+             # v_out[:, :, cache_position] = value_states
-+-            if ON_ORANGE_PI:
-+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+-            else:
-+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+-
-++            # if ON_ORANGE_PI:
-++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++            # else:
-++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++            # 确保 cache_position 是 1D tensor 并且类型正确
-++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++            if cache_position.ndim > 1:
-++                cache_position = cache_position.flatten()
-++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++                cache_position = cache_position.int()
-++            
-++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++            k_out[:, :, cache_position] = key_states
-++            v_out[:, :, cache_position] = value_states
-++            
-+         return k_out, v_out
-+ 
-+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index c695b944..d8303e45 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+-    x1 = x[..., : x.shape[-1] // 2]
-+-    x2 = x[..., x.shape[-1] // 2 :]
-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++    # x1 = x[..., : x.shape[-1] // 2]
-++    # x2 = x[..., x.shape[-1] // 2 :]
-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+         if self.training:
-+             raise NotImplementedError("Training is not supported yet.")
-+         else:
-+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+-        if self.config.n_shared_experts is not None:
-+-            y = y + self.shared_experts(identity)
-+-        return y
-++            # @lwx
-++            if orig_shape[1] == 1:
-++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++                y=y.view(*orig_shape)
-++                if self.config.n_shared_experts is not None:
-++                    y = y + self.shared_experts(identity)
-++                return y
-++            else:
-++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++                if self.config.n_shared_experts is not None:
-++                    y = y + self.shared_experts(identity)
-++                return y
-++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++        # if self.config.n_shared_experts is not None:
-++        #     y = y + self.shared_experts(identity)
-++        # return y
-++        
-++    @no_grad()
-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++
-++        expert_cache = ops.zeros_like(x)
-++        for i in range(self.num_experts_per_tok):
-++            expert_id = flat_expert_indices[i].item()
-++            weight = flat_expert_weights[i].item()
-++            expert = self.experts[expert_id]
-++            expert_out = expert(x)
-++            expert_cache += expert_out * weight
-++        return expert_cache
-+ 
-+     @no_grad()
-+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-        # expert_cache = torch.zeros_like(x)
-+-        # idxs = flat_expert_indices.argsort()
-+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+-        # token_idxs = idxs // self.num_experts_per_tok
-+-        # for i, end_idx in enumerate(tokens_per_expert):
-+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+-        #     if start_idx == end_idx:
-+-        #         continue
-+-        #     expert = self.experts[i]
-+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+-        #     expert_tokens = x[exp_token_idx]
-+-        #     expert_out = expert(expert_tokens)
-+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+-        # return expert_cache
-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+         expert_cache = ops.zeros_like(x)
-+         idxs = flat_expert_indices.argsort()
-+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+         token_idxs = idxs // self.num_experts_per_tok
-++
-+         for i, end_idx in enumerate(tokens_per_expert):
-+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+             if start_idx == end_idx:
-+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+             expert_out = expert(expert_tokens)
-+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++
-+         return expert_cache
-++        
-++    # @no_grad()
-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++    #     # expert_cache = torch.zeros_like(x)
-++    #     # idxs = flat_expert_indices.argsort()
-++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++    #     # token_idxs = idxs // self.num_experts_per_tok
-++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++    #     #     if start_idx == end_idx:
-++    #     #         continue
-++    #     #     expert = self.experts[i]
-++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++    #     #     expert_tokens = x[exp_token_idx]
-++    #     #     expert_out = expert(expert_tokens)
-++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++    #     # return expert_cache
-++    #     expert_cache = ops.zeros_like(x)
-++    #     idxs = flat_expert_indices.argsort()
-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++    #     token_idxs = idxs // self.num_experts_per_tok
-++
-++    #     for i, end_idx in enumerate(tokens_per_expert):
-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++    #         if start_idx == end_idx:
-++    #             continue
-++    #         expert = self.experts[i]
-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++    #         expert_tokens = x[exp_token_idx]
-++    #         expert_out = expert(expert_tokens)
-++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++
-++    #     return expert_cache
-++    # @no_grad()
-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++    #     expert_cache = ops.zeros_like(x)
-++
-++    #     # 排序保证顺序一致
-++    #     idxs = flat_expert_indices.argsort()
-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++    #     token_idxs = idxs // self.num_experts_per_tok
-++
-++    #     # 找出有 token 的专家
-++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++
-++    #     for i in active_experts.tolist():
-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++    #         end_idx = tokens_per_expert[i]
-++    #         if start_idx == end_idx:  # 没有 token
-++    #             continue
-++
-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++    #         expert_tokens = x[exp_token_idx]
-++    #         expert_out = self.experts[i](expert_tokens)
-++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++
-++    #         expert_cache = mindspore.mint.scatter_add(
-++    #             expert_cache,
-++    #             0,
-++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++    #             expert_out
-++    #         )
-++
-++    #     return expert_cache
-++
-++
-+ 
-+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+ #     """
-+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+ 
-+         # Initialize weights and apply final processing
-+         self.post_init()
-++        self.warm_up = False
-++
-++    def warmup_moe_model_deep(self):
-++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++        test_texts = [
-++            "warmup short",
-++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++        ]
-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++        if tokenizer is None:
-++            from mindnlp.transformers import AutoTokenizer
-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++            self._warmup_tokenizer = tokenizer
-++
-++        for text in test_texts:
-++            inputs = tokenizer(text, return_tensors="ms")
-++            with mindspore._no_grad():
-++                _ = self(**inputs, use_cache=False)
-++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+ 
-+     def get_input_embeddings(self):
-+         return self.model.embed_tokens
-+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+         ```"""
-++        if not self.warm_up:
-++            self.warm_up = True
-++            self.warmup_moe_model_deep()
-++
-+         output_attentions = (
-+             output_attentions
-+             if output_attentions is not None
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index 3cbf820e..d4c6b651 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -18,7 +18,6 @@
-+ # See the License for the specific language governing permissions and
-+ # limitations under the License.
-+ """MindSpore Qwen2MoE model."""
-+-
-+ import math
-+ from typing import List, Optional, Tuple, Union
-+ 
-+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+     TokenClassifierOutput,
-+ )
-+ from ...modeling_utils import PreTrainedModel
-++from ...generation import GenerationMixin
-+ from ....utils import logging
-+ from .configuration_qwen2_moe import Qwen2MoeConfig
-+ 
-+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+         self.variance_epsilon = eps
-+ 
-+     def forward(self, hidden_states):
-++        # @dwj
-++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++        # @lwx
-++        # if not self.training :
-++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+         input_dtype = hidden_states.dtype
-+         hidden_states = hidden_states.to(mindspore.float32)
-+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+@@ -234,6 +239,8 @@ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+     x1 = x[..., : x.shape[-1] // 2]
-+     x2 = x[..., x.shape[-1] // 2 :]
-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+         self.config = config
-+         self.hidden_size = config.hidden_size
-+         self.intermediate_size = intermediate_size
-++        
-+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+         self.act_fn = ACT2FN[config.hidden_act]
-+ 
-+     def forward(self, x):
-+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+-
-+ 
-++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++        # @lwx 
-++        # gate_up_output = self.gate_up_proj(x)
-++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++        # return self.down_proj(swiglu_output)
-++
-++    # def forward(self, x):
-++    #     gate_proj_out = self.gate_proj(x)
-++    #     up_proj_out = self.up_proj(x)
-++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++    #     return self.down_proj(swiglu_out)
-++        
-+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+     """
-+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+         use_cache: bool = False,
-+         cache_position: Optional[mindspore.Tensor] = None,
-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++        
-++
-+         bsz, q_len, _ = hidden_states.shape
-+ 
-+         query_states = self.q_proj(hidden_states)
-+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+                     "with a layer index."
-+                 )
-+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++            if isinstance(past_key_value, StaticCache):
-++                kv_seq_len = key_states.shape[-2]
-++            else:
-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+ 
-+         if past_key_value is not None:
-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++            
-++            if isinstance(past_key_value, StaticCache):
-++                kv_seq_len = key_states.shape[-2]
-+ 
-+         # repeat k/v heads if n_kv_heads < n_heads
-+         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+-
-++        
-+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+ 
-+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+-            raise ValueError(
-+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+-                f" {attn_weights.shape}"
-+-            )
-+-
-+-        if attention_mask is not None:  # no matter the length, we just slice it
-+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++        if attention_mask is not None:
-++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+             attn_weights = attn_weights + causal_mask
-+ 
-+         # upcast attention to fp32
-+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+ 
-+         attn_output = self.o_proj(attn_output)
-+-
-++        # @lwx
-++        
-++        # max_seq_len = self.max_position_embeddings  # 2048
-++
-++        # if attention_mask is not None:
-++        #     # attention_mask: [B, 1, Sq, Sk]
-++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++
-++        #     # pad 到 [max_seq_len, max_seq_len]
-++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++        #     global_attention_mask = padded_mask
-++        # else:
-++        #     global_attention_mask = None
-++
-++
-++        # sparse_mode=3
-++        # attn_output = mindspore.ops.flash_attention_score(
-++        #     query=query_states,
-++        #     key=key_states,
-++        #     value=value_states,
-++        #     real_shift=None,
-++        #     padding_mask=None,
-++
-++        #     head_num=self.num_heads,
-++        #     attn_mask=global_attention_mask,
-++        #     keep_prob=1.0 - self.attention_dropout,
-++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++        #     input_layout="BNSD",
-++        #     pre_tokens=2147483647,
-++        #     next_tokens=2147483647,
-++        #     inner_precise=0,
-++        #     drop_mask=None,
-++        #     prefix=None,
-++        #     actual_seq_qlen=None,
-++        #     actual_seq_kvlen=None,
-++        #     sparse_mode=sparse_mode,
-++        # )
-+         if not output_attentions:
-+             attn_weights = None
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-+ 
-++class Qwen2MoeFlashAttention(nn.Module):
-++    """
-++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++
-++    关键改动:
-++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++       直接传入原始的 key 和 value 张量效率更高。
-++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++    """
-++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++        super().__init__()
-++        self.config = config
-++        self.layer_idx = layer_idx
-++        self.hidden_size = config.hidden_size
-++        self.num_heads = config.num_attention_heads
-++        self.head_dim = self.hidden_size // self.num_heads
-++        self.num_key_value_heads = config.num_key_value_heads
-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++        self.max_position_embeddings = config.max_position_embeddings
-++        self.rope_theta = config.rope_theta
-++        self.attention_dropout = config.attention_dropout
-++
-++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++            raise ValueError(
-++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++            )
-++
-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++
-++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++            self.head_dim,
-++            max_position_embeddings=self.max_position_embeddings,
-++            base=self.rope_theta,
-++        )
-++
-++    def forward(
-++        self,
-++        hidden_states: mindspore.Tensor,
-++        attention_mask: Optional[mindspore.Tensor] = None,
-++        position_ids: Optional[mindspore.Tensor] = None,
-++        past_key_value: Optional[Cache] = None,
-++        output_attentions: bool = False,
-++        use_cache: bool = False,
-++        cache_position: Optional[mindspore.Tensor] = None,
-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++        bsz, q_len, _ = hidden_states.shape
-++
-++        # 1. 线性投射 Q, K, V
-++        query_states = self.q_proj(hidden_states)
-++        key_states = self.k_proj(hidden_states)
-++        value_states = self.v_proj(hidden_states)
-++
-++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++        # 3. RoPE 旋转位置编码
-++        kv_seq_len = key_states.shape[-2]
-++        if past_key_value is not None:
-++            if self.layer_idx is None:
-++                raise ValueError(
-++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++                    "with a layer index."
-++                )
-++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++                if cache_position.shape[0] == 1:
-++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++                    kv_seq_len = past_seen_tokens + 1
-++                else:
-++                    # prefill 阶段：cache_position 是范围，使用其长度
-++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++            else:
-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++        # 4. KV 缓存更新
-++        if past_key_value is not None:
-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++            key_states, value_states = past_key_value.update(
-++                key_states, value_states, self.layer_idx, cache_kwargs
-++            )
-++            
-++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++                if cache_position.shape[0] == 1:
-++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++                    kv_seq_len = key_states.shape[-2]
-++
-++        # 5. [重要] 准备 Attention Mask
-++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++        fa_attention_mask = None
-++        if attention_mask is not None:
-++            # 截取与当前key长度匹配的部分
-++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++            fa_attention_mask = (mask_slice != 0)
-++
-++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++        input_dtype = query_states.dtype
-++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++            query_states = query_states.to(mindspore.float16)
-++            key_states = key_states.to(mindspore.float16)
-++            value_states = value_states.to(mindspore.float16)
-++
-++        # 6. [核心] 调用 flash_attention_score 算子
-++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++        attn_output = mindspore.ops.flash_attention_score(
-++            query=query_states,
-++            key=key_states,
-++            value=value_states,
-++            head_num=self.num_heads, # 传入Q的头数(N1)
-++            attn_mask=fa_attention_mask,
-++            keep_prob=1.0 - self.attention_dropout,
-++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++            input_layout="BNSD",
-++            sparse_mode=0 # 使用 defaultMask 模式
-++        )
-++
-++        # 恢复原始数据类型
-++        attn_output = attn_output.to(input_dtype)
-++
-++        # 7. 调整输出形状
-++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++        attn_output = self.o_proj(attn_output)
-++
-++        # FlashAttention 算子不直接返回注意力权重矩阵
-++        attn_weights = None
-++        if output_attentions:
-++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++        return attn_output, attn_weights, past_key_value
-++
-++    # def forward(
-++    #     self,
-++    #     hidden_states: mindspore.Tensor,
-++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++    #     position_ids: Optional[mindspore.Tensor] = None,
-++    #     past_key_value: Optional[Cache] = None,
-++    #     output_attentions: bool = False,
-++    #     use_cache: bool = False,
-++    #     cache_position: Optional[mindspore.Tensor] = None,
-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++    #     bsz, q_len, _ = hidden_states.shape
-++
-++    #     # 1. 线性投射 Q, K, V
-++    #     query_states = self.q_proj(hidden_states)
-++    #     key_states = self.k_proj(hidden_states)
-++    #     value_states = self.v_proj(hidden_states)
-++
-++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++    #     # 3. RoPE 旋转位置编码
-++    #     kv_seq_len = key_states.shape[-2]
-++    #     if past_key_value is not None:
-++    #         if self.layer_idx is None:
-++    #             raise ValueError(
-++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++    #                 "with a layer index."
-++    #             )
-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++    #     # 4. KV 缓存更新
-++    #     if past_key_value is not None:
-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++    #         key_states, value_states = past_key_value.update(
-++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++    #         )
-++
-++    #     # 5. 准备 Attention Mask
-++    #     fa_attention_mask = None
-++    #     if attention_mask is not None:
-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++    #         fa_attention_mask = (mask_slice != 0)
-++
-++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++    #     input_dtype = query_states.dtype
-++
-++    #     # 6. [核心] 调用 flash_attention_score 算子
-++    #     attn_output = mindspore.ops.flash_attention_score(
-++    #         query=query_states,
-++    #         key=key_states,
-++    #         value=value_states,
-++    #         head_num=self.num_heads,
-++    #         attn_mask=fa_attention_mask,
-++    #         keep_prob=1.0 - self.attention_dropout,
-++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++    #         input_layout="BNSD",
-++    #         sparse_mode=0,
-++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++    #         inner_precise=1
-++    #     )
-++
-++    #     # 恢复原始数据类型
-++    #     attn_output = attn_output.to(input_dtype)
-++
-++    #     # 7. 调整输出形状
-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++    #     attn_output = self.o_proj(attn_output)
-++
-++    #     attn_weights = None
-++    #     if output_attentions:
-++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++    #     return attn_output, attn_weights, past_key_value
-++
-++    # def forward(
-++    #     self,
-++    #     hidden_states: mindspore.Tensor,
-++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++    #     position_ids: Optional[mindspore.Tensor] = None,
-++    #     past_key_value: Optional[Cache] = None,
-++    #     output_attentions: bool = False,
-++    #     use_cache: bool = False,
-++    #     cache_position: Optional[mindspore.Tensor] = None,
-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++    #     bsz, q_len, _ = hidden_states.shape
-++
-++    #     query_states = self.q_proj(hidden_states)
-++    #     key_states = self.k_proj(hidden_states)
-++    #     value_states = self.v_proj(hidden_states)
-++
-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++    #     kv_seq_len = key_states.shape[-2]
-++    #     if past_key_value is not None:
-++    #         if self.layer_idx is None:
-++    #             raise ValueError("`layer_idx` must be specified for caching")
-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++    #     if past_key_value is not None:
-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++    #         key_states, value_states = past_key_value.update(
-++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++    #         )
-++
-++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++
-++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++    #     query_states = query_states / math.sqrt(self.head_dim)
-++    #     # <--- 修改结束 ---
-++
-++    #     fa_attention_mask = None
-++    #     if attention_mask is not None:
-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++    #         fa_attention_mask = (mask_slice != 0)
-++
-++    #     input_dtype = query_states.dtype
-++
-++    #     attn_output = mindspore.ops.flash_attention_score(
-++    #         query=query_states,    # 传入已经预先缩放过的 query
-++    #         key=key_states,
-++    #         value=value_states,
-++    #         head_num=self.num_heads,
-++    #         attn_mask=fa_attention_mask,
-++    #         keep_prob=1.0 - self.attention_dropout,
-++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++    #         input_layout="BNSD",
-++    #         sparse_mode=0,
-++    #         inner_precise=1        # 仍然保持内部高精度计算
-++    #     )
-++
-++    #     attn_output = attn_output.to(input_dtype)
-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++    #     attn_output = self.o_proj(attn_output)
-++
-++    #     attn_weights = None
-++    #     if output_attentions:
-++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++
-++    #     return attn_output, attn_weights, past_key_value
-++
-+ QWEN2MOE_ATTENTION_CLASSES = {
-+     "eager": Qwen2MoeAttention,
-++    "flash-attention": Qwen2MoeFlashAttention,
-+ }
-+ 
-+ 
-+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+ 
-++    #@dwj
-++    # 只遍历激活的专家，而非全部专家
-+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        hidden_states = hidden_states.view(-1, hidden_dim)
-+-        # router_logits: (batch * sequence_length, n_experts)
-+-        router_logits = self.gate(hidden_states)
-+-
-+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-        if self.norm_topk_prob:
-+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        # we cast back to the input dtype
-+-        routing_weights = routing_weights.to(hidden_states.dtype)
-+-
-+-        final_hidden_states = ops.zeros(
-+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+-        )
-+-
-+-        # One hot encode the selected experts to create an expert mask
-+-        # this will be used to easily index which expert is going to be sollicitated
-+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+-
-+-        # Loop over all available experts in the model and perform the computation on each expert
-+-        for expert_idx in range(self.num_experts):
-+-            expert_layer = self.experts[expert_idx]
-+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+-
-+-            # Index the correct hidden states and compute the expert hidden state for
-+-            # the current expert. We need to make sure to multiply the output hidden
-+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+-            if 0 not in idx.shape:
-+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+-
-+-                # However `index_add_` only support torch tensors for indexing so we'll use
-+-                # the `top_x` tensor here.
-+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+-
-+-        shared_expert_output = self.shared_expert(hidden_states)
-+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+-
-+-        final_hidden_states = final_hidden_states + shared_expert_output
-++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++            num_tokens = hidden_states_reshaped.shape[0]
-++
-++            router_logits = self.gate(hidden_states_reshaped)
-++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++
-++            if self.norm_topk_prob:
-++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++            routing_weights = routing_weights.to(hidden_states.dtype)
-++            
-++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++            flat_selected_experts = selected_experts.flatten()
-++            
-++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++            token_indices = broadcasted_token_indices.flatten()
-++            
-++            active_experts = ops.unique(flat_selected_experts)
-++            
-++            for expert_idx_tensor in active_experts:
-++                expert_idx = expert_idx_tensor.item()
-++                expert_layer = self.experts[expert_idx]
-++                
-++                mask = (flat_selected_experts == expert_idx_tensor)
-++                selected_token_indices = token_indices[mask]
-++                selected_routing_weights = routing_weights.flatten()[mask]
-++                
-++                current_states = hidden_states_reshaped[selected_token_indices]
-++                
-++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++                
-++                final_hidden_states = final_hidden_states.index_add(
-++                    dim=0,
-++                    index=selected_token_indices,
-++                    source=expert_output.to(hidden_states.dtype)
-++                )
-++            
-++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+ 
-+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+-        return final_hidden_states, router_logits
-++            final_hidden_states = final_hidden_states + shared_expert_output
-++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++            
-++            return final_hidden_states, router_logits
-+ 
-+ 
-+ class Qwen2MoeDecoderLayer(nn.Module):
-+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+ 
-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+ 
-++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++
-+         if (layer_idx not in config.mlp_only_layers) and (
-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+         ):
-+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+     _skip_keys_device_placement = "past_key_values"
-+     _supports_cache_class = True
-++#lwx
-++    # _supports_static_cache = True
-+ 
-+     def _init_weights(self, module):
-+         std = self.config.initializer_range
-+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+         return causal_mask
-+ 
-+ 
-+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+     _tied_weights_keys = ["lm_head.weight"]
-+ 
-+     def __init__(self, config):
-+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+         self.num_experts_per_tok = config.num_experts_per_tok
-+         # Initialize weights and apply final processing
-+         self.post_init()
-++        # @lwx
-++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++        #     self.generation_config.cache_implementation = "static"
-++        self._warmed_up = False
-++
-++    def warmup_moe_model(self):
-++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++        test_texts = [
-++            "warmup short",
-++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++        ]
-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++        if tokenizer is None:
-++            from mindnlp.transformers import AutoTokenizer
-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++            self._warmup_tokenizer = tokenizer
-++
-++        for text in test_texts:
-++            inputs = tokenizer(text, return_tensors="ms")
-++            with mindspore._no_grad():
-++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+ 
-+     def get_input_embeddings(self):
-+         return self.model.embed_tokens
-+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+         ```"""
-++        if not self._warmed_up:
-++            self._warmed_up = True
-++            self.warmup_moe_model()
-+ 
-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+         output_router_logits = (
-+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+             }
-+         )
-+         return model_inputs
-++# @lwx
-++    # def _decode_one_tokens_logits(
-++    #     self,
-++    #     cur_token: mindspore.Tensor,
-++    #     input_pos: Optional[mindspore.Tensor],
-++    #     cache_position: mindspore.Tensor,
-++    #     past_key_values: StaticCache,
-++    # ) -> mindspore.Tensor:
-++    #     """
-++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++        
-++    #     Args:
-++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++    #         input_pos: 输入位置信息，可选
-++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++            
-++    #     Returns:
-++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++    #     """
-++    #     # 调用JIT编译的版本
-++    #     return self.get_decode_one_tokens_logits(
-++    #         cur_token=cur_token,
-++    #         input_pos=input_pos,
-++    #         cache_position=cache_position,
-++    #         past_key_values=past_key_values,
-++    #     )
-++    
-++    # @mindspore.jit(jit_level='O1')
-++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++    #     """
-++    #     JIT编译的函数，用于高效的单token解码
-++    #     使用JIT编译优化以支持静态shape和高效执行
-++        
-++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++    #     """
-++    #     outputs = self.model.forward(
-++    #         input_ids=cur_token,
-++    #         position_ids=input_pos,
-++    #         cache_position=cache_position,
-++    #         past_key_values=past_key_values,
-++    #         use_cache=True,
-++    #         return_dict=False,
-++    #     )
-++        
-++    #     hidden_states = outputs[0]
-++    #     logits = self.lm_head.forward(hidden_states)
-++    #     logits = logits.float()
-++        
-++    #     return logits[:, -1, :]
-++
-++    # def _sample(
-++    #     self,
-++    #     input_ids: mindspore.Tensor,
-++    #     logits_processor,
-++    #     stopping_criteria,
-++    #     generation_config,
-++    #     synced_devices: bool,
-++    #     streamer=None,
-++    #     logits_warper=None,
-++    #     **model_kwargs,
-++    # ):
-++    #     """
-++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++    #     """
-++    #     from ...generation.logits_process import LogitsProcessorList
-++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++    #     from mindnlp.core import nn, ops, no_grad
-++    #     import numpy as np
-++        
-++    #     # 检查是否使用 StaticCache
-++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++    #     # 否则，直接调用父类方法
-++    #     past_key_values = model_kwargs.get("past_key_values")
-++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++
-++    #     if not isinstance(past_key_values, StaticCache):
-++    #         # 不使用 StaticCache，直接调用父类方法
-++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++    #         return super()._sample(
-++    #             input_ids=input_ids,
-++    #             logits_processor=logits_processor,
-++    #             stopping_criteria=stopping_criteria,
-++    #             generation_config=generation_config,
-++    #             synced_devices=synced_devices,
-++    #             streamer=streamer,
-++    #             logits_warper=logits_warper,
-++    #             **model_kwargs,
-++    #         )
-++        
-++    #     # 使用 StaticCache，进入自定义循环
-++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++    #     pad_token_id = generation_config._pad_token_tensor
-++    #     output_attentions = generation_config.output_attentions
-++    #     output_hidden_states = generation_config.output_hidden_states
-++    #     output_scores = generation_config.output_scores
-++    #     output_logits = generation_config.output_logits
-++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++    #     max_length = generation_config.max_length
-++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++    #     do_sample = generation_config.do_sample
-++        
-++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++    #         raise ValueError(
-++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++    #             f"{logits_warper})."
-++    #         )
-++        
-++    #     # init attention / hidden states / scores tuples
-++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++        
-++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++    #         encoder_hidden_states = (
-++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++    #         )
-++        
-++    #     # keep track of which sequences are already finished
-++    #     batch_size, cur_len = input_ids.shape
-++    #     this_peer_finished = False
-++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++        
-++    #     time_record = []
-++    #     from ....utils.testing_utils import parse_flag_from_env
-++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++        
-++    #     while self._has_unfinished_sequences(
-++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++    #     ):
-++    #         if _record_time:
-++    #             import time as time_module
-++    #             infer_start = time_module.time()
-++            
-++    #         # prepare model inputs
-++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++            
-++    #         # prepare variable output controls
-++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++            
-++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++    #         cur_cache_position = model_inputs.get("cache_position")
-++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++    #         cur_input_ids = model_inputs.get("input_ids")
-++            
-++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++    #             cur_cache_position is not None and 
-++    #             len(cur_cache_position.shape) > 0 and
-++    #             cur_cache_position.shape[0] == 1 and
-++    #             cur_input_ids is not None and
-++    #             cur_input_ids.shape[1] == 1):
-++    #             # 使用 JIT 优化的单 token 解码
-++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++    #             if not hasattr(self, '_jit_used'):
-++    #                 self._jit_used = False
-++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++                
-++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++    #                 cur_token=cur_input_ids,
-++    #                 input_pos=model_inputs.get("position_ids"),
-++    #                 cache_position=cur_cache_position,
-++    #                 past_key_values=cur_past_key_values,
-++    #             )
-++                
-++    #             # 标记已使用JIT（用于后续判断）
-++    #             if not self._jit_used:
-++    #                 self._jit_used = True
-++                
-++    #             # 构造兼容的输出对象
-++    #             class JitOptimizedOutput:
-++    #                 def __init__(self, logits, config):
-++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++    #                     self.config = config
-++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++    #                     self.cross_attentions = None
-++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++                
-++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++    #         else:
-++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++    #             outputs = self(**model_inputs, return_dict=True)
-++            
-++    #         if synced_devices and this_peer_finished:
-++    #             continue
-++            
-++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++    #         next_token_logits = outputs.logits[:, -1, :]
-++            
-++    #         # pre-process distribution
-++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++    #         if do_sample:
-++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++            
-++    #         # Store scores, attentions and hidden_states when required
-++    #         if return_dict_in_generate:
-++    #             if output_scores:
-++    #                 scores += (next_token_scores,)
-++    #             if output_logits:
-++    #                 raw_logits += (next_token_logits,)
-++    #             if output_attentions:
-++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++    #                 if self.config.is_encoder_decoder:
-++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++                
-++    #             if output_hidden_states:
-++    #                 hidden = (
-++    #                     outputs.decoder_hidden_states
-++    #                     if self.config.is_encoder_decoder
-++    #                     else outputs.hidden_states
-++    #                 )
-++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++            
-++    #         # token selection
-++    #         if do_sample:
-++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++    #         else:
-++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++            
-++    #         # finished sentences should have their next token be a padding token
-++    #         if has_eos_stopping_criteria:
-++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++            
-++    #         # update generated ids, model inputs, and length for next step
-++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++    #         if streamer is not None:
-++    #             streamer.put(next_tokens)
-++            
-++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++    #             outputs,
-++    #             model_kwargs,
-++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++    #         )
-++            
-++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++    #         cur_len += 1
-++            
-++    #         if _record_time:
-++    #             import time as time_module
-++    #             infer_stop = time_module.time()
-++    #             time_record.append(infer_stop - infer_start)
-++            
-++    #         del outputs
-++        
-++    #     average_infer_time = None
-++    #     if time_record:
-++    #         if len(time_record) > 1:
-++    #             time_record.pop(0)
-++    #         average_infer_time = sum(time_record) / len(time_record)
-++    #         print(f'average inference time is: {average_infer_time}')
-++    #         print(f'inference time record: {time_record}')
-++        
-++    #     if streamer is not None:
-++    #         streamer.end()
-++        
-++    #     # 简单判断：打印是否使用了JIT路径
-++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++    #     else:
-++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++        
-++    #     if return_dict_in_generate:
-++    #         if self.config.is_encoder_decoder:
-++    #             return GenerateEncoderDecoderOutput(
-++    #                 sequences=input_ids,
-++    #                 scores=scores,
-++    #                 logits=raw_logits,
-++    #                 encoder_attentions=encoder_attentions,
-++    #                 encoder_hidden_states=encoder_hidden_states,
-++    #                 decoder_attentions=decoder_attentions,
-++    #                 cross_attentions=cross_attentions,
-++    #                 decoder_hidden_states=decoder_hidden_states,
-++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++    #                 average_infer_time=average_infer_time
-++    #             )
-++    #         else:
-++    #             return GenerateDecoderOnlyOutput(
-++    #                 sequences=input_ids,
-++    #                 scores=scores,
-++    #                 logits=raw_logits,
-++    #                 attentions=decoder_attentions,
-++    #                 hidden_states=decoder_hidden_states,
-++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++    #                 average_infer_time=average_infer_time
-++    #             )
-++    #     else:
-++    #         return input_ids
-++            
-++    # def _prepare_cache_for_generation(
-++    #     self,
-++    #     generation_config,
-++    #     model_kwargs,
-++    #     assistant_model,
-++    #     batch_size,
-++    #     max_cache_length,
-++    # ):
-++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++    #         generation_config.cache_implementation = "static"
-++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++        
-++    #     if generation_config.cache_implementation == "static":
-++    #         base_required_from_max_length = generation_config.max_length + 1
-++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++    #         min_cache_size = 50
-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++    #         else:
-++    #             max_cache_length = max(base_required, min_cache_size)
-++            
-++    #         original_max_cache_length = max_cache_length
-++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++            
-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++    #             if max_cache_length > self.config.max_position_embeddings:
-++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++        
-++    #     result = super()._prepare_cache_for_generation(
-++    #         generation_config=generation_config,
-++    #         model_kwargs=model_kwargs,
-++    #         assistant_model=assistant_model,
-++    #         batch_size=batch_size,
-++    #         max_cache_length=max_cache_length,
-++    #     )
-++        
-++    #     if generation_config.cache_implementation == "static":
-++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++    #         created_cache = model_kwargs.get(cache_name)
-++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++    #             if created_cache.max_cache_len < generation_config.max_length:
-++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++        
-++    #     return result
-++
-++
-++
-+ 
-+ 
-+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
deleted file mode 100644
index d64b7f3f..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0003-20261106secondcommit.patch"
+++ /dev/null
@@ -1,2769 +0,0 @@
-From 7a37d9be16fe823c251701c26bbb20cc09f9922a Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Thu, 6 Nov 2025 14:54:37 +0800
-Subject: [PATCH 03/10] 20261106secondcommit
-
----
- .../models/deepseek/modeling_deepseek.py      |  217 ++-
- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
- patches/0001-20251104commit.patch             | 1272 -----------------
- 3 files changed, 528 insertions(+), 2032 deletions(-)
- delete mode 100644 patches/0001-20251104commit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 73773c22..2f9192bf 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
- 
- _CONFIG_FOR_DOC = "DeepseekConfig"
- 
-+_attn_mask_cache = {}
-+
-+def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-+    q_len = batch_and_seq[1]
-+    kv_len = batch_and_seq[1] + past_key_values_length 
-+    key = (batch_and_seq[0], q_len, kv_len)
-+
-+    if key in _attn_mask_cache:
-+        return _attn_mask_cache[key]
-+
-+    mask = _prepare_4d_causal_attention_mask(
-+        attention_mask,
-+        batch_and_seq,
-+        inputs_embeds,
-+        past_key_values_length,
-+    )
-+    _attn_mask_cache[key] = mask
-+    return mask
- 
- def _get_unpad_data(attention_mask):
-     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-         return final_output
- 
- 
--    @no_grad()
--    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--        expert_cache = ops.zeros_like(x)
--        idxs = flat_expert_indices.argsort()
--        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--        token_idxs = idxs // self.num_experts_per_tok
--
--        for i, end_idx in enumerate(tokens_per_expert):
--            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--            if start_idx == end_idx:
--                continue
--            expert = self.experts[i]
--            exp_token_idx = token_idxs[start_idx:end_idx]
--            expert_tokens = x[exp_token_idx]
--            expert_out = expert(expert_tokens)
--            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--
--        return expert_cache
--        
-     # @no_grad()
--    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--    #     # expert_cache = torch.zeros_like(x)
--    #     # idxs = flat_expert_indices.argsort()
--    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--    #     # token_idxs = idxs // self.num_experts_per_tok
--    #     # for i, end_idx in enumerate(tokens_per_expert):
--    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--    #     #     if start_idx == end_idx:
--    #     #         continue
--    #     #     expert = self.experts[i]
--    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--    #     #     expert_tokens = x[exp_token_idx]
--    #     #     expert_out = expert(expert_tokens)
--    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--    #     # return expert_cache
-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-     #     expert_cache = ops.zeros_like(x)
-     #     idxs = flat_expert_indices.argsort()
-     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
- 
-     #     return expert_cache
--    # @no_grad()
--    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--    #     expert_cache = ops.zeros_like(x)
-+        
-+    @no_grad()
-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+        """
-+        优化版 MoE prefill：
-+        - 批量张量化处理同一个 expert 的所有 token
-+        - 跳过无 token 的专家
-+        - 保持结果完全一致
-+        """
-+        # 初始化输出缓存
-+        expert_cache = ops.zeros_like(x)
- 
--    #     # 排序保证顺序一致
--    #     idxs = flat_expert_indices.argsort()
--    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--    #     token_idxs = idxs // self.num_experts_per_tok
-+        # 排序（确保 scatter_add 位置对应原逻辑）
-+        idxs = flat_expert_indices.argsort()
-+        sorted_expert_indices = flat_expert_indices[idxs]
-+        sorted_token_indices = idxs // self.num_experts_per_tok
- 
--    #     # 找出有 token 的专家
--    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+        # 每个 expert 的 token 数
-+        tokens_per_expert = sorted_expert_indices.bincount()
- 
--    #     for i in active_experts.tolist():
--    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--    #         end_idx = tokens_per_expert[i]
--    #         if start_idx == end_idx:  # 没有 token
--    #             continue
-+        # 找出有 token 的专家
-+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
- 
--    #         exp_token_idx = token_idxs[start_idx:end_idx]
--    #         expert_tokens = x[exp_token_idx]
--    #         expert_out = self.experts[i](expert_tokens)
--    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+        for expert_id in active_experts.tolist():
-+            # 取该 expert 对应的排序后 token 区间
-+            start = (tokens_per_expert[:expert_id]).sum().item()
-+            end = start + tokens_per_expert[expert_id].item()
- 
--    #         expert_cache = mindspore.mint.scatter_add(
--    #             expert_cache,
--    #             0,
--    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--    #             expert_out
--    #         )
-+            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-+            expert_tokens = x[token_idx]                     # 取输入向量
- 
--    #     return expert_cache
-+            # 执行专家 MLP
-+            expert_out = self.experts[expert_id](expert_tokens)
-+
-+            # 按权重缩放
-+            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-+
-+            # 回写到缓存（等价 scatter_add）
-+            expert_cache = mindspore.mint.scatter_add(
-+                expert_cache,
-+                0,
-+                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+                scaled_out
-+            )
-+
-+        return expert_cache
-+
-+        # @no_grad()
-+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+        #     # expert_cache = torch.zeros_like(x)
-+        #     # idxs = flat_expert_indices.argsort()
-+        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+        #     # token_idxs = idxs // self.num_experts_per_tok
-+        #     # for i, end_idx in enumerate(tokens_per_expert):
-+        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+        #     #     if start_idx == end_idx:
-+        #     #         continue
-+        #     #     expert = self.experts[i]
-+        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+        #     #     expert_tokens = x[exp_token_idx]
-+        #     #     expert_out = expert(expert_tokens)
-+        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+        #     # return expert_cache
-+        #     expert_cache = ops.zeros_like(x)
-+        #     idxs = flat_expert_indices.argsort()
-+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+        #     token_idxs = idxs // self.num_experts_per_tok
-+
-+        #     for i, end_idx in enumerate(tokens_per_expert):
-+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+        #         if start_idx == end_idx:
-+        #             continue
-+        #         expert = self.experts[i]
-+        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+        #         expert_tokens = x[exp_token_idx]
-+        #         expert_out = expert(expert_tokens)
-+        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+
-+        #     return expert_cache
-+        # @no_grad()
-+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+        #     expert_cache = ops.zeros_like(x)
-+
-+        #     # 排序保证顺序一致
-+        #     idxs = flat_expert_indices.argsort()
-+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+        #     token_idxs = idxs // self.num_experts_per_tok
-+
-+        #     # 找出有 token 的专家
-+        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+
-+        #     for i in active_experts.tolist():
-+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+        #         end_idx = tokens_per_expert[i]
-+        #         if start_idx == end_idx:  # 没有 token
-+        #             continue
-+
-+        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+        #         expert_tokens = x[exp_token_idx]
-+        #         expert_out = self.experts[i](expert_tokens)
-+        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+
-+        #         expert_cache = mindspore.mint.scatter_add(
-+        #             expert_cache,
-+        #             0,
-+        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+        #             expert_out
-+        #         )
-+
-+        #     return expert_cache
- 
- 
- 
-@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
- 
-         return attn_output, attn_weights, past_key_value
- 
--
- # class DeepseekFlashAttention(nn.Module):
- #     """
- #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
- 
-         return attn_output, attn_weights, past_key_value
- 
-+
- Deepseek_ATTENTION_CLASSES = {
-     "eager": DeepseekAttention,
-     "flash-attention": DeepseekFlashAttention,
-@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-             )
-         else:
-             # 4d mask is passed through the layers
--            attention_mask = _prepare_4d_causal_attention_mask(
-+            # attention_mask = _prepare_4d_causal_attention_mask(
-+            #     attention_mask,
-+            #     (batch_size, seq_length),
-+            #     inputs_embeds,
-+            #     past_key_values_length,
-+            # )
-+            #@dwj
-+            attention_mask = get_cached_causal_mask(
-                 attention_mask,
-                 (batch_size, seq_length),
-                 inputs_embeds,
-@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-         # Initialize weights and apply final processing
-         self.post_init()
-         self.warm_up = False
-+        #@dwj
-+        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+            self.num_layers,
-+            self.num_attention_heads,
-+            self.head_dim,
-+            batch_size=1,
-+            max_length=self.max_length,
-+            dtype=mindspore.float16
-+        )
-+
-+    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+        key_cache = []
-+        value_cache = []
-+        for _ in range(num_layers):
-+            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+            key_cache.append(k)
-+            value_cache.append(v)
-+        return key_cache, value_cache
-+
- 
-     def warmup_moe_model_deep(self):
-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index bced285c..ebd7782e 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
- 
--Long_Prompt = False
--PROMPT_LENGTH_THRESHOLD = 128
-+Long_Prompt = 1
-+LONG_PROMPT_LENGTH_THRESHOLD = 128
-+SHORT_PROMPT_LENGTH_THRESHOLD = 32
-+
-+_causal_mask_cache = {}
-+
-+def get_cached_causal_mask_with_cache_position(
-+    attention_mask: mindspore.Tensor,
-+    sequence_length: int,
-+    target_length: int,
-+    dtype: mindspore.dtype,
-+    min_dtype: float,
-+    cache_position: mindspore.Tensor,
-+    batch_size: int,
-+):
-+    """
-+    带缓存的 causal mask 构造函数
-+    """
-+    # q_len 是当前 query 长度
-+    q_len = sequence_length
-+    # kv_len 是 target_length
-+    kv_len = target_length
-+
-+    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-+    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-+
-+    if key in _causal_mask_cache:
-+        return _causal_mask_cache[key]
-+
-+    # 调用原来的 mask 构造逻辑
-+    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+        attention_mask,
-+        sequence_length=sequence_length,
-+        target_length=target_length,
-+        dtype=dtype,
-+        min_dtype=min_dtype,
-+        cache_position=cache_position,
-+        batch_size=batch_size,
-+    )
-+    # 缓存结果
-+    _causal_mask_cache[key] = causal_mask
-+    return causal_mask
- 
- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
- def _prepare_4d_causal_attention_mask_with_cache_position(
-@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
- 
- 
- # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-+# class Qwen2MoeAttention(nn.Module):
-+#     """
-+#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+#     and "Generating Long Sequences with Sparse Transformers".
-+#     """
-+
-+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+#         super().__init__()
-+#         self.config = config
-+#         self.layer_idx = layer_idx
-+#         if layer_idx is None:
-+#             logger.warning_once(
-+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+#                 "when creating this class."
-+#             )
-+
-+#         self.hidden_size = config.hidden_size
-+#         self.num_heads = config.num_attention_heads
-+#         self.head_dim = self.hidden_size // self.num_heads
-+#         self.num_key_value_heads = config.num_key_value_heads
-+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+#         self.max_position_embeddings = config.max_position_embeddings
-+#         self.rope_theta = config.rope_theta
-+#         self.is_causal = True
-+#         self.attention_dropout = config.attention_dropout
-+
-+#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+#             raise ValueError(
-+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+#                 f" and `num_heads`: {self.num_heads})."
-+#             )
-+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+
-+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+#             self.head_dim,
-+#             max_position_embeddings=self.max_position_embeddings,
-+#             base=self.rope_theta,
-+#         )
-+
-+#     def forward(
-+#         self,
-+#         hidden_states: mindspore.Tensor,
-+#         attention_mask: Optional[mindspore.Tensor] = None,
-+#         position_ids: Optional[mindspore.Tensor] = None,
-+#         past_key_value: Optional[Cache] = None,
-+#         output_attentions: bool = False,
-+#         use_cache: bool = False,
-+#         cache_position: Optional[mindspore.Tensor] = None,
-+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+
-+        
-+
-+#         bsz, q_len, _ = hidden_states.shape
-+
-+#         query_states = self.q_proj(hidden_states)
-+#         key_states = self.k_proj(hidden_states)
-+#         value_states = self.v_proj(hidden_states)
-+
-+#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+
-+#         kv_seq_len = key_states.shape[-2]
-+#         if past_key_value is not None:
-+#             if self.layer_idx is None:
-+#                 raise ValueError(
-+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+#                     "with a layer index."
-+#                 )
-+#             if isinstance(past_key_value, StaticCache):
-+#                 kv_seq_len = key_states.shape[-2]
-+#             else:
-+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+
-+#         if past_key_value is not None:
-+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+            
-+#             if isinstance(past_key_value, StaticCache):
-+#                 kv_seq_len = key_states.shape[-2]
-+
-+#         # repeat k/v heads if n_kv_heads < n_heads
-+#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+        
-+#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+
-+#         if attention_mask is not None:
-+#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+#             attn_weights = attn_weights + causal_mask
-+
-+#         # upcast attention to fp32
-+#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+#         attn_output = ops.matmul(attn_weights, value_states)
-+
-+#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+#             raise ValueError(
-+#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+#                 f" {attn_output.shape}"
-+#             )
-+
-+#         attn_output = ops.transpose(attn_output, 1, 2)
-+#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+
-+#         attn_output = self.o_proj(attn_output)
-+#         # @lwx
-+        
-+#         # max_seq_len = self.max_position_embeddings  # 2048
-+
-+#         # if attention_mask is not None:
-+#         #     # attention_mask: [B, 1, Sq, Sk]
-+#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+
-+#         #     # pad 到 [max_seq_len, max_seq_len]
-+#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+#         #     global_attention_mask = padded_mask
-+#         # else:
-+#         #     global_attention_mask = None
-+
-+
-+#         # sparse_mode=3
-+#         # attn_output = mindspore.ops.flash_attention_score(
-+#         #     query=query_states,
-+#         #     key=key_states,
-+#         #     value=value_states,
-+#         #     real_shift=None,
-+#         #     padding_mask=None,
-+
-+#         #     head_num=self.num_heads,
-+#         #     attn_mask=global_attention_mask,
-+#         #     keep_prob=1.0 - self.attention_dropout,
-+#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+#         #     input_layout="BNSD",
-+#         #     pre_tokens=2147483647,
-+#         #     next_tokens=2147483647,
-+#         #     inner_precise=0,
-+#         #     drop_mask=None,
-+#         #     prefix=None,
-+#         #     actual_seq_qlen=None,
-+#         #     actual_seq_kvlen=None,
-+#         #     sparse_mode=sparse_mode,
-+#         # )
-+#         if not output_attentions:
-+#             attn_weights = None
-+
-+#         return attn_output, attn_weights, past_key_value
-+
- class Qwen2MoeAttention(nn.Module):
-     """
--    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--    and "Generating Long Sequences with Sparse Transformers".
--    """
-+    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
- 
-+    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-+    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-+    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-+
-+    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-+    """
-     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-         super().__init__()
-         self.config = config
-@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-         if layer_idx is None:
-             logger.warning_once(
-                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-                 "when creating this class."
-             )
- 
-@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-         use_cache: bool = False,
-         cache_position: Optional[mindspore.Tensor] = None,
-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
-         
--
-+        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-         bsz, q_len, _ = hidden_states.shape
- 
-         query_states = self.q_proj(hidden_states)
-         key_states = self.k_proj(hidden_states)
-         value_states = self.v_proj(hidden_states)
- 
--        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--
-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+        
-         kv_seq_len = key_states.shape[-2]
-         if past_key_value is not None:
--            if self.layer_idx is None:
--                raise ValueError(
--                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--                    "with a layer index."
--                )
--            if isinstance(past_key_value, StaticCache):
--                kv_seq_len = key_states.shape[-2]
--            else:
--                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+        
-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
- 
-         if past_key_value is not None:
--            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+
-+        # --- 2. 动态调度核心注意力计算 ---
-+        global Long_Prompt
-+        if Long_Prompt >= 1:
-+            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-+            fa_attention_mask = None
-+            if attention_mask is not None:
-+                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+                fa_attention_mask = (mask_slice != 0)
-+
-+            attn_output = mindspore.ops.flash_attention_score(
-+                query=query_states,
-+                key=key_states,
-+                value=value_states,
-+                head_num=self.num_heads,
-+                attn_mask=fa_attention_mask,
-+                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-+                scalar_value=1.0 / math.sqrt(self.head_dim),
-+                input_layout="BNSD",
-+                sparse_mode=0,
-+                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-+            )
-             
--            if isinstance(past_key_value, StaticCache):
--                kv_seq_len = key_states.shape[-2]
-+            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+            attn_output = self.o_proj(attn_output)
-+            attn_weights = None
-+            if output_attentions:
-+                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
- 
--        # repeat k/v heads if n_kv_heads < n_heads
--        key_states = repeat_kv(key_states, self.num_key_value_groups)
--        value_states = repeat_kv(value_states, self.num_key_value_groups)
--        
--        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+        else:
-+            # --- Eager Attention 路径 (用于短序列和解码) ---
-+            key_states = repeat_kv(key_states, self.num_key_value_groups)
-+            value_states = repeat_kv(value_states, self.num_key_value_groups)
-+            
-+            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
- 
--        if attention_mask is not None:
--            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--            attn_weights = attn_weights + causal_mask
-+            if attention_mask is not None:
-+                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+                attn_weights = attn_weights + causal_mask
- 
--        # upcast attention to fp32
--        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--        attn_output = ops.matmul(attn_weights, value_states)
-+            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+            attn_output = ops.matmul(attn_weights, value_states)
- 
--        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--            raise ValueError(
--                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--                f" {attn_output.shape}"
--            )
-+            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+                raise ValueError(
-+                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-+                )
- 
--        attn_output = ops.transpose(attn_output, 1, 2)
--        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+            attn_output = ops.transpose(attn_output, 1, 2)
-+            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+            attn_output = self.o_proj(attn_output)
- 
--        attn_output = self.o_proj(attn_output)
--        # @lwx
-+            if not output_attentions:
-+                attn_weights = None
-         
--        # max_seq_len = self.max_position_embeddings  # 2048
--
--        # if attention_mask is not None:
--        #     # attention_mask: [B, 1, Sq, Sk]
--        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--
--        #     # pad 到 [max_seq_len, max_seq_len]
--        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--        #     global_attention_mask = padded_mask
--        # else:
--        #     global_attention_mask = None
--
--
--        # sparse_mode=3
--        # attn_output = mindspore.ops.flash_attention_score(
--        #     query=query_states,
--        #     key=key_states,
--        #     value=value_states,
--        #     real_shift=None,
--        #     padding_mask=None,
--
--        #     head_num=self.num_heads,
--        #     attn_mask=global_attention_mask,
--        #     keep_prob=1.0 - self.attention_dropout,
--        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--        #     input_layout="BNSD",
--        #     pre_tokens=2147483647,
--        #     next_tokens=2147483647,
--        #     inner_precise=0,
--        #     drop_mask=None,
--        #     prefix=None,
--        #     actual_seq_qlen=None,
--        #     actual_seq_kvlen=None,
--        #     sparse_mode=sparse_mode,
--        # )
--        if not output_attentions:
--            attn_weights = None
--
-         return attn_output, attn_weights, past_key_value
- 
--
- # class Qwen2MoeFlashAttention(nn.Module):
- #     """
- #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
- #             return final_hidden_states, router_logits
- 
- 
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     """
--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--#     `_moe_infer_prefill` (用于长序列处理) 方法。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         # 门控网络
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         # 专家列表
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--#         # 共享专家
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     @no_grad()
--#     def _moe_infer_decode(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         """
--#         【解码路径】针对 sequence_length=1 的极致优化。
--#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--#         """
--#         batch_size, hidden_dim = hidden_states.shape
--        
--#         expert_outputs_list = [
--#             ops.cat([
--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--#             ], dim=0) 
--#             for i in range(batch_size)
--#         ]
--        
--#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--#         # shape: (batch_size, top_k, hidden_dim)
--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--        
--#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--        
--#         return moe_output.squeeze(1)
--
--#     @no_grad()
--#     def _moe_infer_prefill(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         """
--#         【预填充路径】针对 sequence_length > 1 的优化。
--#         按专家对 Token 进行分组，并进行批处理。
--#         """
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens = hidden_states.shape[0]
--#         flat_selected_experts = selected_experts.flatten()
--        
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--        
--#         active_experts = ops.unique(flat_selected_experts)
--        
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--            
--#             mask = (flat_selected_experts == expert_idx_tensor)
--#             selected_token_indices = token_indices[mask]
--#             selected_routing_weights = routing_weights.flatten()[mask]
--            
--#             current_states = hidden_states[selected_token_indices]
--            
--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--            
--#             moe_output = moe_output.index_add(
--#                 dim=0,
--#                 index=selected_token_indices,
--#                 source=expert_output.to(hidden_states.dtype)
--#             )
--#         return moe_output
--
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#         """
--#         顶层 forward 方法，作为智能分发器。
--#         """
--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--        
--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#         router_logits = self.gate(hidden_states_reshaped)
--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--#         if self.norm_topk_prob:
--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        
--#         routing_weights = routing_weights.to(hidden_states.dtype)
--        
--#         moe_output = None
--#         # 在推理时，根据序列长度选择最优路径
--#         if not self.training:
--#             if sequence_length == 1:
--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--#             else:
--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--#         else:
--#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--#             raise NotImplementedError("Training path is not implemented.")
--
--#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--        
--#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--        
--#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--        
--#         return final_hidden_states, router_logits
--
--
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     """
--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         # 门控网络
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         # 专家列表
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--#         # 共享专家
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     @no_grad()
--#     def _moe_infer_decode(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         batch_size, _ = hidden_states.shape
--#         expert_outputs_list = [
--#             ops.cat([
--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--#             ], dim=0) 
--#             for i in range(batch_size)
--#         ]
--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--#         return moe_output.squeeze(1)
--
--#     @no_grad()
--#     def _moe_infer_prefill(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens = hidden_states.shape[0]
--#         flat_selected_experts = selected_experts.flatten()
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--#         active_experts = ops.unique(flat_selected_experts)
--        
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--#             mask = (flat_selected_experts == expert_idx_tensor)
--#             selected_token_indices = token_indices[mask]
--#             selected_routing_weights = routing_weights.flatten()[mask]
--#             current_states = hidden_states[selected_token_indices]
--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--#             moe_output = moe_output.index_add(
--#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--#             )
--#         return moe_output
--
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#         """
--#         顶层 forward 方法，作为智能分发器。
--#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--#         """
--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--        
--#         # 1. 门控计算 (通用逻辑)
--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#         router_logits = self.gate(hidden_states_reshaped)
--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--#         if self.norm_topk_prob:
--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        
--#         routing_weights = routing_weights.to(hidden_states.dtype)
--        
--#         # 2. 智能分发到最优 MoE 路径
--#         moe_output = None
--#         if not self.training:
--#             if sequence_length == 1:
--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--#             else:
--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--#         else:
--#             raise NotImplementedError("Training path is not implemented.")
--
--#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--        
--#         # 4. 合并 MoE 输出和共享专家输出
--#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--        
--#         # 5. 恢复原始形状并返回
--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--        
--#         return final_hidden_states, router_logits
--
--# prefill fastest
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     """
--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         # 门控网络
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         # 专家列表
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--#         # 共享专家
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     @no_grad()
--#     def _moe_infer_dispatch(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         """
--#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--#         """
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens, _ = hidden_states.shape
--        
--#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--#         flat_selected_experts = selected_experts.flatten()
--#         flat_routing_weights = routing_weights.flatten()
--
--#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--
--#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--#         active_experts = ops.unique(flat_selected_experts)
--        
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--            
--#             # 找到所有分配给该专家的 token
--#             mask = (flat_selected_experts == expert_idx_tensor)
--            
--#             # 使用 mask 选取对应的 token 和权重
--#             current_token_indices = token_indices[mask]
--#             current_routing_weights = flat_routing_weights[mask]
--#             current_hidden_states = hidden_states[current_token_indices]
--            
--#             # 对这些 token 进行批处理
--#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--            
--#             # 使用 index_add 将结果精确地加回到对应位置
--#             moe_output = moe_output.index_add(
--#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--#             )
--#         return moe_output
--
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#         """
--#         顶层 forward 方法，作为智能分发器。
--#         """
--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--        
--#         # 1. 门控计算
--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#         router_logits = self.gate(hidden_states_reshaped)
--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--#         if self.norm_topk_prob:
--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        
--#         routing_weights = routing_weights.to(hidden_states.dtype)
--        
--#         # 2. 调用统一的 MoE 计算内核
--#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--
--#         # 3. 统一处理共享专家
--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--        
--#         # 4. 合并输出
--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--        
--#         # 5. 恢复原始形状并返回
--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--        
--#         return final_hidden_states, router_logits
--
--
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     """
--#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--#     【最终高性能与高精度版】：
--#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--#     3. 这样实现了速度和准确性的两全其美。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     @no_grad()
--#     def _moe_infer_decode(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         """
--#         【解码路径】极致优化版：bmm + 高精度累加。
--#         """
--#         original_dtype = hidden_states.dtype
--#         batch_size, _ = hidden_states.shape
--        
--#         expert_outputs_list = [
--#             ops.cat([
--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--#             ], dim=0) 
--#             for i in range(batch_size)
--#         ]
--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--
--#         # 在 float32 下执行 bmm，得到高精度结果
--#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--        
--#         # 将高精度结果转换回原始数据类型
--#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--        
--#         return moe_output
--
--#     @no_grad()
--#     def _moe_infer_prefill(
--#         self, 
--#         hidden_states: mindspore.Tensor, 
--#         selected_experts: mindspore.Tensor, 
--#         routing_weights: mindspore.Tensor
--#     ) -> mindspore.Tensor:
--#         """
--#         【预填充路径】与原始实现一致，结果精确。
--#         """
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens, _ = hidden_states.shape
--#         flat_selected_experts = selected_experts.flatten()
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--#         active_experts = ops.unique(flat_selected_experts)
--        
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--#             mask = (flat_selected_experts == expert_idx_tensor)
--#             selected_token_indices = token_indices[mask]
--#             selected_routing_weights = routing_weights.flatten()[mask]
--#             current_states = hidden_states[selected_token_indices]
--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--#             moe_output = moe_output.index_add(
--#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--#             )
--#         return moe_output
--
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--        
--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#         router_logits = self.gate(hidden_states_reshaped)
--#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--#         if self.norm_topk_prob:
--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        
--#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--#         # 如果模型主体是 float16，后续再转换
--        
--#         moe_output = None
--#         if not self.training:
--#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--#             # _moe_infer_decode 内部会处理好类型转换
--#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--#             if sequence_length == 1:
--#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--#             else:
--#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--#         else:
--#             raise NotImplementedError("Training path is not implemented.")
--
--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--        
--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--        
--#         return final_hidden_states, router_logits
--    
--
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     """
--#     【融合版】一个混合专家模块，内置两种推理策略，
--#     由外部全局变量 `Long_Prompt` 控制：
--
--#     - if Long_Prompt is True:  【精度优先模式】
--#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--#       适用于处理长序列，避免误差累积。
--
--#     - if Long_Prompt is False: 【速度优先模式】
--#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--#       在解码阶段获得极致速度，同时保证结果高度准确。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     # --- 速度优先模式的辅助函数 ---
--#     @no_grad()
--#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--#         original_dtype = hidden_states.dtype
--#         batch_size, _ = hidden_states.shape
--#         expert_outputs_list = [
--#             ops.cat([
--#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--#             ], dim=0) 
--#             for i in range(batch_size)
--#         ]
--#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--#         weights_fp32 = routing_weights.to(mindspore.float32)
--#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--#         return moe_output_fp32.squeeze(1).to(original_dtype)
--
--#     @no_grad()
--#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens, _ = hidden_states.shape
--#         flat_selected_experts = selected_experts.flatten()
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--#         active_experts = ops.unique(flat_selected_experts)
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--#             mask = (flat_selected_experts == expert_idx_tensor)
--#             selected_token_indices = token_indices[mask]
--#             selected_routing_weights = routing_weights.flatten()[mask]
--#             current_states = hidden_states[selected_token_indices]
--#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--#         return moe_output
--
--#     # --- 精度优先模式的辅助函数 ---
--#     @no_grad()
--#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--#         moe_output = ops.zeros_like(hidden_states)
--#         num_tokens, _ = hidden_states.shape
--#         flat_selected_experts = selected_experts.flatten()
--#         flat_routing_weights = routing_weights.flatten()
--#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--#         active_experts = ops.unique(flat_selected_experts)
--#         for expert_idx_tensor in active_experts:
--#             expert_idx = expert_idx_tensor.item()
--#             expert_layer = self.experts[expert_idx]
--#             mask = (flat_selected_experts == expert_idx_tensor)
--#             current_token_indices = token_indices[mask]
--#             current_routing_weights = flat_routing_weights[mask]
--#             current_hidden_states = hidden_states[current_token_indices]
--#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--#         return moe_output
--
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#         # 声明我们将要使用一个在模块外部定义的全局变量
--#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--#         global Long_Prompt
--
--#         # 1. 门控计算 (所有模式通用)
--#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#         router_logits = self.gate(hidden_states_reshaped)
--#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--#         if self.norm_topk_prob:
--#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--        
--#         moe_output = None
--#         if not self.training:
--#             # 根据 Long_Prompt 标志选择模式
--#             if Long_Prompt:
--#                 # --- 精度优先模式 ---
--#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--#             else:
--#                 # --- 速度优先模式 ---
--#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--#                 if sequence_length == 1:
--#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--#                 else:
--#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--#         else:
--#             raise NotImplementedError("Training path is not implemented.")
--
--#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--        
--#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--        
--#         return final_hidden_states, router_logits
--    
- class Qwen2MoeSparseMoeBlock(nn.Module):
-     """
-     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-         return moe_output_fp32.squeeze(1).to(original_dtype)
- 
-+    # @no_grad()
-+    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+    #     num_tokens, _ = hidden_states.shape
-+    #     flat_selected_experts = selected_experts.flatten()
-+    #     sorted_expert_indices = flat_selected_experts.argsort()
-+    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+    #     original_token_indices = sorted_expert_indices // self.top_k
-+    #     moe_output = ops.zeros_like(hidden_states)
-+    #     current_token_offset = 0
-+    #     for i in range(self.num_experts):
-+    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-+    #         if expert_token_count == 0:
-+    #             continue
-+    #         end_offset = current_token_offset + expert_token_count
-+    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-+    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+    #         current_token_offset += expert_token_count
-+    #     return moe_output
-+
-     @no_grad()
-     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--        num_tokens, _ = hidden_states.shape
--        flat_selected_experts = selected_experts.flatten()
--        sorted_expert_indices = flat_selected_experts.argsort()
--        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--        original_token_indices = sorted_expert_indices // self.top_k
-+        """
-+        优化版 MoE prefill (速度优先模式)：
-+        - 批量张量化处理同一个 expert 的所有 token
-+        - 跳过无 token 的专家
-+        - 保持结果完全一致
-+        """
-         moe_output = ops.zeros_like(hidden_states)
--        current_token_offset = 0
--        for i in range(self.num_experts):
--            expert_token_count = tokens_per_expert[i] - current_token_offset
--            if expert_token_count == 0:
--                continue
--            end_offset = current_token_offset + expert_token_count
--            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--            expert_hidden_states = hidden_states[expert_original_token_indices]
--            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--            current_token_offset += expert_token_count
-+
-+        flat_selected_experts = selected_experts.flatten()
-+        flat_routing_weights = routing_weights.flatten()
-+
-+        idxs = flat_selected_experts.argsort()
-+        sorted_expert_indices = flat_selected_experts[idxs]
-+        sorted_token_indices = idxs // self.top_k
-+
-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-+
-+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+
-+        for expert_id in active_experts.tolist():
-+            start = int(tokens_per_expert[:expert_id].sum().item())
-+            end = start + int(tokens_per_expert[expert_id].item())
-+
-+            token_idx = sorted_token_indices[start:end]
-+            expert_tokens = hidden_states[token_idx]
-+
-+            expert_out = self.experts[expert_id](expert_tokens)
-+
-+            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-+
-+            moe_output = mindspore.mint.scatter_add(
-+                moe_output,
-+                0,
-+                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-+                scaled_out.to(hidden_states.dtype)
-+            )
-+
-         return moe_output
- 
-+
-     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-     @no_grad()
-     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-         
-         moe_output = None
--        if Long_Prompt:
--            # --- 精度优先模式 (ACCURACY MODE) ---
--            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        # if Long_Prompt==0:
-+        #     # --- 精度优先模式 (ACCURACY MODE) ---
-+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        # else:
-+        #     # --- 速度优先模式 (SPEED MODE) ---
-+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+        #     if sequence_length == 1:
-+        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        #     else:
-+        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        
-+        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+        if sequence_length == 1:
-+            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-         else:
--            # --- 速度优先模式 (SPEED MODE) ---
--            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--            if sequence_length == 1:
--                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--            else:
--                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--        
-+            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+    
- 
-         # 3. 共享专家计算与合并 (所有模式通用)
-         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         
-         return final_hidden_states, router_logits
- 
-+
- class Qwen2MoeDecoderLayer(nn.Module):
-     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-         super().__init__()
-         self.hidden_size = config.hidden_size
-         
--        # if Long_Prompt:
--        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--        # else:
-+        # if Long_Prompt == 2:
-         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+        # else:
-+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
-@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-             )
- 
-         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+        #     attention_mask,
-+        #     sequence_length=sequence_length,
-+        #     target_length=target_length,
-+        #     dtype=dtype,
-+        #     min_dtype=min_dtype,
-+        #     cache_position=cache_position,
-+        #     batch_size=input_tensor.shape[0],
-+        # )
-+        #@dwj
-+        causal_mask = get_cached_causal_mask_with_cache_position(
-             attention_mask,
-             sequence_length=sequence_length,
-             target_length=target_length,
-@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-         """
--        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-+        _causal_mask_cache.clear()
- 
-         input_ids = kwargs.get("input_ids")
-         if input_ids is None and args:
-@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
- 
-         if input_ids is not None:
-             prompt_length = input_ids.shape[1]
--            
--            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--                Long_Prompt = True
-+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-+                Long_Prompt = 2
-+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-+                Long_Prompt = 0
-             else:
--                Long_Prompt = False
-+                Long_Prompt = 1
-+
- 
-         return super().generate(*args, **kwargs)
-     
-@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-             dtype = self.lm_head.weight.dtype
-             min_dtype = float(ops.finfo(dtype).min)
- 
--            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+            #     attention_mask,
-+            #     sequence_length=sequence_length,
-+            #     target_length=past_key_values.get_max_length(),
-+            #     dtype=dtype,
-+            #     min_dtype=min_dtype,
-+            #     cache_position=cache_position,
-+            #     batch_size=batch_size,
-+            # )
-+
-+            #@dwj
-+            attention_mask = get_cached_causal_mask_with_cache_position(
-                 attention_mask,
-                 sequence_length=sequence_length,
-                 target_length=past_key_values.get_max_length(),
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-deleted file mode 100644
-index 6dfb5b93..00000000
---- a/patches/0001-20251104commit.patch
-+++ /dev/null
-@@ -1,1272 +0,0 @@
--From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH] 20251104commit
--
-----
-- mindnlp/transformers/cache_utils.py           |  28 +-
-- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-- 3 files changed, 976 insertions(+), 87 deletions(-)
--
--diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--index cadd2e04..02f8d4be 100644
----- a/mindnlp/transformers/cache_utils.py
--+++ b/mindnlp/transformers/cache_utils.py
--@@ -812,14 +812,26 @@ class StaticCache(Cache):
--             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--             # k_out[:, :, cache_position] = key_states
--             # v_out[:, :, cache_position] = value_states
---            if ON_ORANGE_PI:
---                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
---                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
---            else:
---                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
---                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
---                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
---
--+            # if ON_ORANGE_PI:
--+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+            # else:
--+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+            # 确保 cache_position 是 1D tensor 并且类型正确
--+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+            if cache_position.ndim > 1:
--+                cache_position = cache_position.flatten()
--+            # 确保类型是 int32 或 int64（MindSpore 要求）
--+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+                cache_position = cache_position.int()
--+            
--+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+            k_out[:, :, cache_position] = key_states
--+            v_out[:, :, cache_position] = value_states
--+            
--         return k_out, v_out
-- 
--     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index c695b944..d8303e45 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-- # Copied from transformers.models.llama.modeling_llama.rotate_half
-- def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
---    x1 = x[..., : x.shape[-1] // 2]
---    x2 = x[..., x.shape[-1] // 2 :]
--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+    # x1 = x[..., : x.shape[-1] // 2]
--+    # x2 = x[..., x.shape[-1] // 2 :]
--+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--         if self.training:
--             raise NotImplementedError("Training is not supported yet.")
--         else:
---            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
---        if self.config.n_shared_experts is not None:
---            y = y + self.shared_experts(identity)
---        return y
--+            # @lwx
--+            if orig_shape[1] == 1:
--+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+                y=y.view(*orig_shape)
--+                if self.config.n_shared_experts is not None:
--+                    y = y + self.shared_experts(identity)
--+                return y
--+            else:
--+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+                if self.config.n_shared_experts is not None:
--+                    y = y + self.shared_experts(identity)
--+                return y
--+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+        # if self.config.n_shared_experts is not None:
--+        #     y = y + self.shared_experts(identity)
--+        # return y
--+        
--+    @no_grad()
--+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+
--+        expert_cache = ops.zeros_like(x)
--+        for i in range(self.num_experts_per_tok):
--+            expert_id = flat_expert_indices[i].item()
--+            weight = flat_expert_weights[i].item()
--+            expert = self.experts[expert_id]
--+            expert_out = expert(x)
--+            expert_cache += expert_out * weight
--+        return expert_cache
-- 
--     @no_grad()
---    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---        # expert_cache = torch.zeros_like(x)
---        # idxs = flat_expert_indices.argsort()
---        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
---        # token_idxs = idxs // self.num_experts_per_tok
---        # for i, end_idx in enumerate(tokens_per_expert):
---        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
---        #     if start_idx == end_idx:
---        #         continue
---        #     expert = self.experts[i]
---        #     exp_token_idx = token_idxs[start_idx:end_idx]
---        #     expert_tokens = x[exp_token_idx]
---        #     expert_out = expert(expert_tokens)
---        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
---        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
---        # return expert_cache
--+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--         expert_cache = ops.zeros_like(x)
--         idxs = flat_expert_indices.argsort()
--         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--         token_idxs = idxs // self.num_experts_per_tok
--+
--         for i, end_idx in enumerate(tokens_per_expert):
--             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--             if start_idx == end_idx:
--@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--             expert_out = expert(expert_tokens)
--             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+
--         return expert_cache
--+        
--+    # @no_grad()
--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+    #     # expert_cache = torch.zeros_like(x)
--+    #     # idxs = flat_expert_indices.argsort()
--+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+    #     # token_idxs = idxs // self.num_experts_per_tok
--+    #     # for i, end_idx in enumerate(tokens_per_expert):
--+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+    #     #     if start_idx == end_idx:
--+    #     #         continue
--+    #     #     expert = self.experts[i]
--+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+    #     #     expert_tokens = x[exp_token_idx]
--+    #     #     expert_out = expert(expert_tokens)
--+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+    #     # return expert_cache
--+    #     expert_cache = ops.zeros_like(x)
--+    #     idxs = flat_expert_indices.argsort()
--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+    #     token_idxs = idxs // self.num_experts_per_tok
--+
--+    #     for i, end_idx in enumerate(tokens_per_expert):
--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+    #         if start_idx == end_idx:
--+    #             continue
--+    #         expert = self.experts[i]
--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+    #         expert_tokens = x[exp_token_idx]
--+    #         expert_out = expert(expert_tokens)
--+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+
--+    #     return expert_cache
--+    # @no_grad()
--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+    #     expert_cache = ops.zeros_like(x)
--+
--+    #     # 排序保证顺序一致
--+    #     idxs = flat_expert_indices.argsort()
--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+    #     token_idxs = idxs // self.num_experts_per_tok
--+
--+    #     # 找出有 token 的专家
--+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+
--+    #     for i in active_experts.tolist():
--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+    #         end_idx = tokens_per_expert[i]
--+    #         if start_idx == end_idx:  # 没有 token
--+    #             continue
--+
--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+    #         expert_tokens = x[exp_token_idx]
--+    #         expert_out = self.experts[i](expert_tokens)
--+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+
--+    #         expert_cache = mindspore.mint.scatter_add(
--+    #             expert_cache,
--+    #             0,
--+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+    #             expert_out
--+    #         )
--+
--+    #     return expert_cache
--+
--+
-- 
-- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-- #     """
--@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-- 
--         # Initialize weights and apply final processing
--         self.post_init()
--+        self.warm_up = False
--+
--+    def warmup_moe_model_deep(self):
--+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+        test_texts = [
--+            "warmup short",
--+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+        ]
--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+        if tokenizer is None:
--+            from mindnlp.transformers import AutoTokenizer
--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+            self._warmup_tokenizer = tokenizer
--+
--+        for text in test_texts:
--+            inputs = tokenizer(text, return_tensors="ms")
--+            with mindspore._no_grad():
--+                _ = self(**inputs, use_cache=False)
--+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-- 
--     def get_input_embeddings(self):
--         return self.model.embed_tokens
--@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--         ```"""
--+        if not self.warm_up:
--+            self.warm_up = True
--+            self.warmup_moe_model_deep()
--+
--         output_attentions = (
--             output_attentions
--             if output_attentions is not None
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index 3cbf820e..d4c6b651 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -18,7 +18,6 @@
-- # See the License for the specific language governing permissions and
-- # limitations under the License.
-- """MindSpore Qwen2MoE model."""
---
-- import math
-- from typing import List, Optional, Tuple, Union
-- 
--@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--     TokenClassifierOutput,
-- )
-- from ...modeling_utils import PreTrainedModel
--+from ...generation import GenerationMixin
-- from ....utils import logging
-- from .configuration_qwen2_moe import Qwen2MoeConfig
-- 
--@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--         self.variance_epsilon = eps
-- 
--     def forward(self, hidden_states):
--+        # @dwj
--+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+        # @lwx
--+        # if not self.training :
--+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--         input_dtype = hidden_states.dtype
--         hidden_states = hidden_states.to(mindspore.float32)
--         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--@@ -234,6 +239,8 @@ def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
--     x1 = x[..., : x.shape[-1] // 2]
--     x2 = x[..., x.shape[-1] // 2 :]
--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--         self.config = config
--         self.hidden_size = config.hidden_size
--         self.intermediate_size = intermediate_size
--+        
--         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--         self.act_fn = ACT2FN[config.hidden_act]
-- 
--     def forward(self, x):
---        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
---
-- 
--+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+        # @lwx 
--+        # gate_up_output = self.gate_up_proj(x)
--+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+        # return self.down_proj(swiglu_output)
--+
--+    # def forward(self, x):
--+    #     gate_proj_out = self.gate_proj(x)
--+    #     up_proj_out = self.up_proj(x)
--+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+    #     return self.down_proj(swiglu_out)
--+        
-- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--     """
--@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--         use_cache: bool = False,
--         cache_position: Optional[mindspore.Tensor] = None,
--     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+        
--+
--         bsz, q_len, _ = hidden_states.shape
-- 
--         query_states = self.q_proj(hidden_states)
--@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--                     "with a layer index."
--                 )
---            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+            if isinstance(past_key_value, StaticCache):
--+                kv_seq_len = key_states.shape[-2]
--+            else:
--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-- 
--         if past_key_value is not None:
--             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+            
--+            if isinstance(past_key_value, StaticCache):
--+                kv_seq_len = key_states.shape[-2]
-- 
--         # repeat k/v heads if n_kv_heads < n_heads
--         key_states = repeat_kv(key_states, self.num_key_value_groups)
--         value_states = repeat_kv(value_states, self.num_key_value_groups)
---
--+        
--         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-- 
---        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
---            raise ValueError(
---                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
---                f" {attn_weights.shape}"
---            )
---
---        if attention_mask is not None:  # no matter the length, we just slice it
---            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+        if attention_mask is not None:
--+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--             attn_weights = attn_weights + causal_mask
-- 
--         # upcast attention to fp32
--@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-- 
--         attn_output = self.o_proj(attn_output)
---
--+        # @lwx
--+        
--+        # max_seq_len = self.max_position_embeddings  # 2048
--+
--+        # if attention_mask is not None:
--+        #     # attention_mask: [B, 1, Sq, Sk]
--+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+
--+        #     # pad 到 [max_seq_len, max_seq_len]
--+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+        #     global_attention_mask = padded_mask
--+        # else:
--+        #     global_attention_mask = None
--+
--+
--+        # sparse_mode=3
--+        # attn_output = mindspore.ops.flash_attention_score(
--+        #     query=query_states,
--+        #     key=key_states,
--+        #     value=value_states,
--+        #     real_shift=None,
--+        #     padding_mask=None,
--+
--+        #     head_num=self.num_heads,
--+        #     attn_mask=global_attention_mask,
--+        #     keep_prob=1.0 - self.attention_dropout,
--+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+        #     input_layout="BNSD",
--+        #     pre_tokens=2147483647,
--+        #     next_tokens=2147483647,
--+        #     inner_precise=0,
--+        #     drop_mask=None,
--+        #     prefix=None,
--+        #     actual_seq_qlen=None,
--+        #     actual_seq_kvlen=None,
--+        #     sparse_mode=sparse_mode,
--+        # )
--         if not output_attentions:
--             attn_weights = None
-- 
--         return attn_output, attn_weights, past_key_value
-- 
-- 
--+class Qwen2MoeFlashAttention(nn.Module):
--+    """
--+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+
--+    关键改动:
--+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+       直接传入原始的 key 和 value 张量效率更高。
--+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+    """
--+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+        super().__init__()
--+        self.config = config
--+        self.layer_idx = layer_idx
--+        self.hidden_size = config.hidden_size
--+        self.num_heads = config.num_attention_heads
--+        self.head_dim = self.hidden_size // self.num_heads
--+        self.num_key_value_heads = config.num_key_value_heads
--+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+        self.max_position_embeddings = config.max_position_embeddings
--+        self.rope_theta = config.rope_theta
--+        self.attention_dropout = config.attention_dropout
--+
--+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+            raise ValueError(
--+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+            )
--+
--+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+
--+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+            self.head_dim,
--+            max_position_embeddings=self.max_position_embeddings,
--+            base=self.rope_theta,
--+        )
--+
--+    def forward(
--+        self,
--+        hidden_states: mindspore.Tensor,
--+        attention_mask: Optional[mindspore.Tensor] = None,
--+        position_ids: Optional[mindspore.Tensor] = None,
--+        past_key_value: Optional[Cache] = None,
--+        output_attentions: bool = False,
--+        use_cache: bool = False,
--+        cache_position: Optional[mindspore.Tensor] = None,
--+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+        bsz, q_len, _ = hidden_states.shape
--+
--+        # 1. 线性投射 Q, K, V
--+        query_states = self.q_proj(hidden_states)
--+        key_states = self.k_proj(hidden_states)
--+        value_states = self.v_proj(hidden_states)
--+
--+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+        # query:   [B, S, H*D] -> [B, N1, S, D]
--+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+        # 3. RoPE 旋转位置编码
--+        kv_seq_len = key_states.shape[-2]
--+        if past_key_value is not None:
--+            if self.layer_idx is None:
--+                raise ValueError(
--+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+                    "with a layer index."
--+                )
--+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+                if cache_position.shape[0] == 1:
--+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+                    kv_seq_len = past_seen_tokens + 1
--+                else:
--+                    # prefill 阶段：cache_position 是范围，使用其长度
--+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+            else:
--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+        # 4. KV 缓存更新
--+        if past_key_value is not None:
--+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+            key_states, value_states = past_key_value.update(
--+                key_states, value_states, self.layer_idx, cache_kwargs
--+            )
--+            
--+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+                if cache_position.shape[0] == 1:
--+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+                    kv_seq_len = key_states.shape[-2]
--+
--+        # 5. [重要] 准备 Attention Mask
--+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+        fa_attention_mask = None
--+        if attention_mask is not None:
--+            # 截取与当前key长度匹配的部分
--+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+            fa_attention_mask = (mask_slice != 0)
--+
--+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+        input_dtype = query_states.dtype
--+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+            query_states = query_states.to(mindspore.float16)
--+            key_states = key_states.to(mindspore.float16)
--+            value_states = value_states.to(mindspore.float16)
--+
--+        # 6. [核心] 调用 flash_attention_score 算子
--+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+        attn_output = mindspore.ops.flash_attention_score(
--+            query=query_states,
--+            key=key_states,
--+            value=value_states,
--+            head_num=self.num_heads, # 传入Q的头数(N1)
--+            attn_mask=fa_attention_mask,
--+            keep_prob=1.0 - self.attention_dropout,
--+            scalar_value=1.0 / math.sqrt(self.head_dim),
--+            input_layout="BNSD",
--+            sparse_mode=0 # 使用 defaultMask 模式
--+        )
--+
--+        # 恢复原始数据类型
--+        attn_output = attn_output.to(input_dtype)
--+
--+        # 7. 调整输出形状
--+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+        attn_output = self.o_proj(attn_output)
--+
--+        # FlashAttention 算子不直接返回注意力权重矩阵
--+        attn_weights = None
--+        if output_attentions:
--+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+        return attn_output, attn_weights, past_key_value
--+
--+    # def forward(
--+    #     self,
--+    #     hidden_states: mindspore.Tensor,
--+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+    #     position_ids: Optional[mindspore.Tensor] = None,
--+    #     past_key_value: Optional[Cache] = None,
--+    #     output_attentions: bool = False,
--+    #     use_cache: bool = False,
--+    #     cache_position: Optional[mindspore.Tensor] = None,
--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+    #     bsz, q_len, _ = hidden_states.shape
--+
--+    #     # 1. 线性投射 Q, K, V
--+    #     query_states = self.q_proj(hidden_states)
--+    #     key_states = self.k_proj(hidden_states)
--+    #     value_states = self.v_proj(hidden_states)
--+
--+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+    #     # 3. RoPE 旋转位置编码
--+    #     kv_seq_len = key_states.shape[-2]
--+    #     if past_key_value is not None:
--+    #         if self.layer_idx is None:
--+    #             raise ValueError(
--+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+    #                 "with a layer index."
--+    #             )
--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+    #     # 4. KV 缓存更新
--+    #     if past_key_value is not None:
--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+    #         key_states, value_states = past_key_value.update(
--+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+    #         )
--+
--+    #     # 5. 准备 Attention Mask
--+    #     fa_attention_mask = None
--+    #     if attention_mask is not None:
--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+    #         fa_attention_mask = (mask_slice != 0)
--+
--+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+    #     input_dtype = query_states.dtype
--+
--+    #     # 6. [核心] 调用 flash_attention_score 算子
--+    #     attn_output = mindspore.ops.flash_attention_score(
--+    #         query=query_states,
--+    #         key=key_states,
--+    #         value=value_states,
--+    #         head_num=self.num_heads,
--+    #         attn_mask=fa_attention_mask,
--+    #         keep_prob=1.0 - self.attention_dropout,
--+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+    #         input_layout="BNSD",
--+    #         sparse_mode=0,
--+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+    #         inner_precise=1
--+    #     )
--+
--+    #     # 恢复原始数据类型
--+    #     attn_output = attn_output.to(input_dtype)
--+
--+    #     # 7. 调整输出形状
--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+    #     attn_output = self.o_proj(attn_output)
--+
--+    #     attn_weights = None
--+    #     if output_attentions:
--+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+    #     return attn_output, attn_weights, past_key_value
--+
--+    # def forward(
--+    #     self,
--+    #     hidden_states: mindspore.Tensor,
--+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+    #     position_ids: Optional[mindspore.Tensor] = None,
--+    #     past_key_value: Optional[Cache] = None,
--+    #     output_attentions: bool = False,
--+    #     use_cache: bool = False,
--+    #     cache_position: Optional[mindspore.Tensor] = None,
--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+    #     bsz, q_len, _ = hidden_states.shape
--+
--+    #     query_states = self.q_proj(hidden_states)
--+    #     key_states = self.k_proj(hidden_states)
--+    #     value_states = self.v_proj(hidden_states)
--+
--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+    #     kv_seq_len = key_states.shape[-2]
--+    #     if past_key_value is not None:
--+    #         if self.layer_idx is None:
--+    #             raise ValueError("`layer_idx` must be specified for caching")
--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+    #     if past_key_value is not None:
--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+    #         key_states, value_states = past_key_value.update(
--+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+    #         )
--+
--+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+
--+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+    #     query_states = query_states / math.sqrt(self.head_dim)
--+    #     # <--- 修改结束 ---
--+
--+    #     fa_attention_mask = None
--+    #     if attention_mask is not None:
--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+    #         fa_attention_mask = (mask_slice != 0)
--+
--+    #     input_dtype = query_states.dtype
--+
--+    #     attn_output = mindspore.ops.flash_attention_score(
--+    #         query=query_states,    # 传入已经预先缩放过的 query
--+    #         key=key_states,
--+    #         value=value_states,
--+    #         head_num=self.num_heads,
--+    #         attn_mask=fa_attention_mask,
--+    #         keep_prob=1.0 - self.attention_dropout,
--+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+    #         input_layout="BNSD",
--+    #         sparse_mode=0,
--+    #         inner_precise=1        # 仍然保持内部高精度计算
--+    #     )
--+
--+    #     attn_output = attn_output.to(input_dtype)
--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+    #     attn_output = self.o_proj(attn_output)
--+
--+    #     attn_weights = None
--+    #     if output_attentions:
--+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+
--+    #     return attn_output, attn_weights, past_key_value
--+
-- QWEN2MOE_ATTENTION_CLASSES = {
--     "eager": Qwen2MoeAttention,
--+    "flash-attention": Qwen2MoeFlashAttention,
-- }
-- 
-- 
--@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-- 
--+    #@dwj
--+    # 只遍历激活的专家，而非全部专家
--     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---        batch_size, sequence_length, hidden_dim = hidden_states.shape
---        hidden_states = hidden_states.view(-1, hidden_dim)
---        # router_logits: (batch * sequence_length, n_experts)
---        router_logits = self.gate(hidden_states)
---
---        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---        if self.norm_topk_prob:
---            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        # we cast back to the input dtype
---        routing_weights = routing_weights.to(hidden_states.dtype)
---
---        final_hidden_states = ops.zeros(
---            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
---        )
---
---        # One hot encode the selected experts to create an expert mask
---        # this will be used to easily index which expert is going to be sollicitated
---        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
---
---        # Loop over all available experts in the model and perform the computation on each expert
---        for expert_idx in range(self.num_experts):
---            expert_layer = self.experts[expert_idx]
---            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
---
---            # Index the correct hidden states and compute the expert hidden state for
---            # the current expert. We need to make sure to multiply the output hidden
---            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
---            if 0 not in idx.shape:
---                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
---                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
---
---                # However `index_add_` only support torch tensors for indexing so we'll use
---                # the `top_x` tensor here.
---                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
---
---        shared_expert_output = self.shared_expert(hidden_states)
---        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
---
---        final_hidden_states = final_hidden_states + shared_expert_output
--+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+            num_tokens = hidden_states_reshaped.shape[0]
--+
--+            router_logits = self.gate(hidden_states_reshaped)
--+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+
--+            if self.norm_topk_prob:
--+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+            routing_weights = routing_weights.to(hidden_states.dtype)
--+            
--+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+            flat_selected_experts = selected_experts.flatten()
--+            
--+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+            token_indices = broadcasted_token_indices.flatten()
--+            
--+            active_experts = ops.unique(flat_selected_experts)
--+            
--+            for expert_idx_tensor in active_experts:
--+                expert_idx = expert_idx_tensor.item()
--+                expert_layer = self.experts[expert_idx]
--+                
--+                mask = (flat_selected_experts == expert_idx_tensor)
--+                selected_token_indices = token_indices[mask]
--+                selected_routing_weights = routing_weights.flatten()[mask]
--+                
--+                current_states = hidden_states_reshaped[selected_token_indices]
--+                
--+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+                
--+                final_hidden_states = final_hidden_states.index_add(
--+                    dim=0,
--+                    index=selected_token_indices,
--+                    source=expert_output.to(hidden_states.dtype)
--+                )
--+            
--+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-- 
---        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
---        return final_hidden_states, router_logits
--+            final_hidden_states = final_hidden_states + shared_expert_output
--+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+            
--+            return final_hidden_states, router_logits
-- 
-- 
-- class Qwen2MoeDecoderLayer(nn.Module):
--@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-- 
--         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-- 
--+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+
--         if (layer_idx not in config.mlp_only_layers) and (
--             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--         ):
--@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--     _skip_keys_device_placement = "past_key_values"
--     _supports_cache_class = True
--+#lwx
--+    # _supports_static_cache = True
-- 
--     def _init_weights(self, module):
--         std = self.config.initializer_range
--@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--         return causal_mask
-- 
-- 
---class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--     _tied_weights_keys = ["lm_head.weight"]
-- 
--     def __init__(self, config):
--@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--         self.num_experts_per_tok = config.num_experts_per_tok
--         # Initialize weights and apply final processing
--         self.post_init()
--+        # @lwx
--+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+        #     self.generation_config.cache_implementation = "static"
--+        self._warmed_up = False
--+
--+    def warmup_moe_model(self):
--+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+        test_texts = [
--+            "warmup short",
--+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+        ]
--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+        if tokenizer is None:
--+            from mindnlp.transformers import AutoTokenizer
--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+            self._warmup_tokenizer = tokenizer
--+
--+        for text in test_texts:
--+            inputs = tokenizer(text, return_tensors="ms")
--+            with mindspore._no_grad():
--+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-- 
--     def get_input_embeddings(self):
--         return self.model.embed_tokens
--@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--         ```"""
--+        if not self._warmed_up:
--+            self._warmed_up = True
--+            self.warmup_moe_model()
-- 
--         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--         output_router_logits = (
--@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--             }
--         )
--         return model_inputs
--+# @lwx
--+    # def _decode_one_tokens_logits(
--+    #     self,
--+    #     cur_token: mindspore.Tensor,
--+    #     input_pos: Optional[mindspore.Tensor],
--+    #     cache_position: mindspore.Tensor,
--+    #     past_key_values: StaticCache,
--+    # ) -> mindspore.Tensor:
--+    #     """
--+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+        
--+    #     Args:
--+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+    #         input_pos: 输入位置信息，可选
--+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+            
--+    #     Returns:
--+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+    #     """
--+    #     # 调用JIT编译的版本
--+    #     return self.get_decode_one_tokens_logits(
--+    #         cur_token=cur_token,
--+    #         input_pos=input_pos,
--+    #         cache_position=cache_position,
--+    #         past_key_values=past_key_values,
--+    #     )
--+    
--+    # @mindspore.jit(jit_level='O1')
--+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+    #     """
--+    #     JIT编译的函数，用于高效的单token解码
--+    #     使用JIT编译优化以支持静态shape和高效执行
--+        
--+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+    #     """
--+    #     outputs = self.model.forward(
--+    #         input_ids=cur_token,
--+    #         position_ids=input_pos,
--+    #         cache_position=cache_position,
--+    #         past_key_values=past_key_values,
--+    #         use_cache=True,
--+    #         return_dict=False,
--+    #     )
--+        
--+    #     hidden_states = outputs[0]
--+    #     logits = self.lm_head.forward(hidden_states)
--+    #     logits = logits.float()
--+        
--+    #     return logits[:, -1, :]
--+
--+    # def _sample(
--+    #     self,
--+    #     input_ids: mindspore.Tensor,
--+    #     logits_processor,
--+    #     stopping_criteria,
--+    #     generation_config,
--+    #     synced_devices: bool,
--+    #     streamer=None,
--+    #     logits_warper=None,
--+    #     **model_kwargs,
--+    # ):
--+    #     """
--+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+    #     """
--+    #     from ...generation.logits_process import LogitsProcessorList
--+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+    #     from mindnlp.core import nn, ops, no_grad
--+    #     import numpy as np
--+        
--+    #     # 检查是否使用 StaticCache
--+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+    #     # 否则，直接调用父类方法
--+    #     past_key_values = model_kwargs.get("past_key_values")
--+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+
--+    #     if not isinstance(past_key_values, StaticCache):
--+    #         # 不使用 StaticCache，直接调用父类方法
--+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+    #         return super()._sample(
--+    #             input_ids=input_ids,
--+    #             logits_processor=logits_processor,
--+    #             stopping_criteria=stopping_criteria,
--+    #             generation_config=generation_config,
--+    #             synced_devices=synced_devices,
--+    #             streamer=streamer,
--+    #             logits_warper=logits_warper,
--+    #             **model_kwargs,
--+    #         )
--+        
--+    #     # 使用 StaticCache，进入自定义循环
--+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+    #     pad_token_id = generation_config._pad_token_tensor
--+    #     output_attentions = generation_config.output_attentions
--+    #     output_hidden_states = generation_config.output_hidden_states
--+    #     output_scores = generation_config.output_scores
--+    #     output_logits = generation_config.output_logits
--+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+    #     max_length = generation_config.max_length
--+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+    #     do_sample = generation_config.do_sample
--+        
--+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+    #         raise ValueError(
--+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+    #             f"{logits_warper})."
--+    #         )
--+        
--+    #     # init attention / hidden states / scores tuples
--+    #     scores = () if (return_dict_in_generate and output_scores) else None
--+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+        
--+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+    #         encoder_hidden_states = (
--+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+    #         )
--+        
--+    #     # keep track of which sequences are already finished
--+    #     batch_size, cur_len = input_ids.shape
--+    #     this_peer_finished = False
--+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+        
--+    #     time_record = []
--+    #     from ....utils.testing_utils import parse_flag_from_env
--+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+        
--+    #     while self._has_unfinished_sequences(
--+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+    #     ):
--+    #         if _record_time:
--+    #             import time as time_module
--+    #             infer_start = time_module.time()
--+            
--+    #         # prepare model inputs
--+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+            
--+    #         # prepare variable output controls
--+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+            
--+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+    #         cur_cache_position = model_inputs.get("cache_position")
--+    #         cur_past_key_values = model_inputs.get("past_key_values")
--+    #         cur_input_ids = model_inputs.get("input_ids")
--+            
--+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+    #             cur_cache_position is not None and 
--+    #             len(cur_cache_position.shape) > 0 and
--+    #             cur_cache_position.shape[0] == 1 and
--+    #             cur_input_ids is not None and
--+    #             cur_input_ids.shape[1] == 1):
--+    #             # 使用 JIT 优化的单 token 解码
--+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+    #             if not hasattr(self, '_jit_used'):
--+    #                 self._jit_used = False
--+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+                
--+    #             next_token_logits = self.get_decode_one_tokens_logits(
--+    #                 cur_token=cur_input_ids,
--+    #                 input_pos=model_inputs.get("position_ids"),
--+    #                 cache_position=cur_cache_position,
--+    #                 past_key_values=cur_past_key_values,
--+    #             )
--+                
--+    #             # 标记已使用JIT（用于后续判断）
--+    #             if not self._jit_used:
--+    #                 self._jit_used = True
--+                
--+    #             # 构造兼容的输出对象
--+    #             class JitOptimizedOutput:
--+    #                 def __init__(self, logits, config):
--+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+    #                     self.config = config
--+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+    #                     self.attentions = None if not config.is_encoder_decoder else None
--+    #                     self.cross_attentions = None
--+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+                
--+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+    #         else:
--+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+    #             outputs = self(**model_inputs, return_dict=True)
--+            
--+    #         if synced_devices and this_peer_finished:
--+    #             continue
--+            
--+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+    #         next_token_logits = outputs.logits[:, -1, :]
--+            
--+    #         # pre-process distribution
--+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+    #         if do_sample:
--+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+            
--+    #         # Store scores, attentions and hidden_states when required
--+    #         if return_dict_in_generate:
--+    #             if output_scores:
--+    #                 scores += (next_token_scores,)
--+    #             if output_logits:
--+    #                 raw_logits += (next_token_logits,)
--+    #             if output_attentions:
--+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+    #                 if self.config.is_encoder_decoder:
--+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+                
--+    #             if output_hidden_states:
--+    #                 hidden = (
--+    #                     outputs.decoder_hidden_states
--+    #                     if self.config.is_encoder_decoder
--+    #                     else outputs.hidden_states
--+    #                 )
--+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+            
--+    #         # token selection
--+    #         if do_sample:
--+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+    #         else:
--+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+            
--+    #         # finished sentences should have their next token be a padding token
--+    #         if has_eos_stopping_criteria:
--+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+            
--+    #         # update generated ids, model inputs, and length for next step
--+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+    #         if streamer is not None:
--+    #             streamer.put(next_tokens)
--+            
--+    #         model_kwargs = self._update_model_kwargs_for_generation(
--+    #             outputs,
--+    #             model_kwargs,
--+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+    #         )
--+            
--+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+    #         cur_len += 1
--+            
--+    #         if _record_time:
--+    #             import time as time_module
--+    #             infer_stop = time_module.time()
--+    #             time_record.append(infer_stop - infer_start)
--+            
--+    #         del outputs
--+        
--+    #     average_infer_time = None
--+    #     if time_record:
--+    #         if len(time_record) > 1:
--+    #             time_record.pop(0)
--+    #         average_infer_time = sum(time_record) / len(time_record)
--+    #         print(f'average inference time is: {average_infer_time}')
--+    #         print(f'inference time record: {time_record}')
--+        
--+    #     if streamer is not None:
--+    #         streamer.end()
--+        
--+    #     # 简单判断：打印是否使用了JIT路径
--+    #     if hasattr(self, '_jit_used') and self._jit_used:
--+    #         print("[JIT] ✓ JIT optimization was used during generation")
--+    #     else:
--+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+        
--+    #     if return_dict_in_generate:
--+    #         if self.config.is_encoder_decoder:
--+    #             return GenerateEncoderDecoderOutput(
--+    #                 sequences=input_ids,
--+    #                 scores=scores,
--+    #                 logits=raw_logits,
--+    #                 encoder_attentions=encoder_attentions,
--+    #                 encoder_hidden_states=encoder_hidden_states,
--+    #                 decoder_attentions=decoder_attentions,
--+    #                 cross_attentions=cross_attentions,
--+    #                 decoder_hidden_states=decoder_hidden_states,
--+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+    #                 average_infer_time=average_infer_time
--+    #             )
--+    #         else:
--+    #             return GenerateDecoderOnlyOutput(
--+    #                 sequences=input_ids,
--+    #                 scores=scores,
--+    #                 logits=raw_logits,
--+    #                 attentions=decoder_attentions,
--+    #                 hidden_states=decoder_hidden_states,
--+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+    #                 average_infer_time=average_infer_time
--+    #             )
--+    #     else:
--+    #         return input_ids
--+            
--+    # def _prepare_cache_for_generation(
--+    #     self,
--+    #     generation_config,
--+    #     model_kwargs,
--+    #     assistant_model,
--+    #     batch_size,
--+    #     max_cache_length,
--+    # ):
--+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+    #         generation_config.cache_implementation = "static"
--+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+        
--+    #     if generation_config.cache_implementation == "static":
--+    #         base_required_from_max_length = generation_config.max_length + 1
--+    #         base_required = max(max_cache_length, base_required_from_max_length)
--+    #         min_cache_size = 50
--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+    #         else:
--+    #             max_cache_length = max(base_required, min_cache_size)
--+            
--+    #         original_max_cache_length = max_cache_length
--+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+    #         print(f"  - final max_cache_length: {max_cache_length}")
--+            
--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+    #             if max_cache_length > self.config.max_position_embeddings:
--+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+        
--+    #     result = super()._prepare_cache_for_generation(
--+    #         generation_config=generation_config,
--+    #         model_kwargs=model_kwargs,
--+    #         assistant_model=assistant_model,
--+    #         batch_size=batch_size,
--+    #         max_cache_length=max_cache_length,
--+    #     )
--+        
--+    #     if generation_config.cache_implementation == "static":
--+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+    #         created_cache = model_kwargs.get(cache_name)
--+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+    #             if created_cache.max_cache_len < generation_config.max_length:
--+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+        
--+    #     return result
--+
--+
--+
-- 
-- 
-- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
---- 
--2.27.0
--
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
deleted file mode 100644
index 25b442d5..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0004-20251106change.patch"
+++ /dev/null
@@ -1,7498 +0,0 @@
-From 60df5bdc79368911a03b9c034b11b7437df753ca Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Thu, 6 Nov 2025 15:48:09 +0800
-Subject: [PATCH 04/10] 20251106change
-
----
- .../models/deepseek/modeling_deepseek.py      |  189 +-
- patches/0001-20251104commit.patch             | 1272 +++++++
- patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
- patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
- 4 files changed, 7244 insertions(+), 186 deletions(-)
- create mode 100644 patches/0001-20251104commit.patch
- create mode 100644 patches/0002-20251106commit.patch
- create mode 100644 patches/0003-20261106secondcommit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 2f9192bf..0546f318 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
- 
-         return attn_output, attn_weights, past_key_value
- 
--# class DeepseekFlashAttention(nn.Module):
--#     """
--#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--
--#     This class is designed as a drop-in replacement for DeepseekAttention.
--#     """
--
--#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--#         super().__init__()
--#         self.config = config
--#         self.layer_idx = layer_idx
--#         if layer_idx is None:
--#             logger.warning(
--#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--#                 "when creating this class."
--#             )
--
--#         self.attention_dropout = config.attention_dropout
--#         self.hidden_size = config.hidden_size
--#         self.num_heads = config.num_attention_heads
--#         self.head_dim = self.hidden_size // self.num_heads
--#         self.num_key_value_heads = config.num_key_value_heads
--#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--#         self.max_position_embeddings = config.max_position_embeddings
--#         self.rope_theta = config.rope_theta
--#         self.is_causal = True
--
--#         if (self.head_dim * self.num_heads) != self.hidden_size:
--#             raise ValueError(
--#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--#                 f" and `num_heads`: {self.num_heads})."
--#             )
--
--#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--#         self._init_rope()
--
--#     def _init_rope(self):
--#         if self.config.rope_scaling is None:
--#             self.rotary_emb = DeepseekRotaryEmbedding(
--#                 self.head_dim,
--#                 max_position_embeddings=self.max_position_embeddings,
--#                 base=self.rope_theta,
--#             )
--#         else:
--#             scaling_type = self.config.rope_scaling["type"]
--#             scaling_factor = self.config.rope_scaling["factor"]
--#             if scaling_type == "linear":
--#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--#                     self.head_dim,
--#                     max_position_embeddings=self.max_position_embeddings,
--#                     scaling_factor=scaling_factor,
--#                     base=self.rope_theta,
--#                 )
--#             elif scaling_type == "dynamic":
--#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--#                     self.head_dim,
--#                     max_position_embeddings=self.max_position_embeddings,
--#                     scaling_factor=scaling_factor,
--#                     base=self.rope_theta,
--#                 )
--#             else:
--#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--
--#     def forward(
--#         self,
--#         hidden_states: mindspore.Tensor,
--#         attention_mask: Optional[mindspore.Tensor] = None,
--#         position_ids: Optional[mindspore.Tensor] = None,
--#         past_key_value: Optional[Cache] = None,
--#         output_attentions: bool = False,
--#         use_cache: bool = False,
--#         **kwargs,
--#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--#         if "padding_mask" in kwargs:
--#             warnings.warn(
--#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--#             )
--        
--#         if output_attentions:
--#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--
--#         bsz, q_len, _ = hidden_states.shape
--
--#         if self.config.pretraining_tp > 1:
--#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--
--#         query_states = self.q_proj(hidden_states)
--#         key_states = self.k_proj(hidden_states)
--#         value_states = self.v_proj(hidden_states)
--
--#         # Reshape for multi-head attention
--#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--#         kv_seq_len = key_states.shape[-2]
--#         if past_key_value is not None:
--#             if self.layer_idx is None:
--#                 raise ValueError(
--#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--#                     "with a layer index."
--#                 )
--#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--        
--#         # Apply Rotary Positional Embedding
--#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--#         if past_key_value is not None:
--#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--
--#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--        
--#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--        
--#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--
--#         # Convert attention_mask for flash_attention_score
--#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--#         if attention_mask is not None:
--#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--#                 raise ValueError(
--#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--#                 )
--#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--#         else:
--#             attn_mask_for_fa = None
--        
--#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--
--#         # Call the fused flash_attention_score operator
--#         attn_output = mindspore.ops.flash_attention_score(
--#             query=query_states_for_fa,
--#             key=key_states_for_fa,
--#             value=value_states_for_fa,
--#             head_num=self.num_heads, # This is N1, the number of query heads
--#             input_layout='BSH',
--#             attn_mask=attn_mask_for_fa,
--#             keep_prob=keep_prob,
--#             scalar_value=1.0 / math.sqrt(self.head_dim),
--#             sparse_mode=0 # Default mask mode
--#         )
--        
--#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--#         attn_output = self.o_proj(attn_output)
--        
--#         # Flash Attention does not return attention weights
--#         attn_weights = None
--
--#         return attn_output, attn_weights, past_key_value
- 
- class DeepseekFlashAttention(nn.Module):
-     """
-@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-         super().__init__()
-         self.hidden_size = config.hidden_size
- 
--        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--            config=config, layer_idx=layer_idx
--        )
-+        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+            # config=config, layer_idx=layer_idx
-+        # )
- 
-         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-             config=config, layer_idx=layer_idx
-@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-         return outputs
- 
- 
--
- class DeepseekPreTrainedModel(PreTrainedModel):
-     config_class = DeepseekConfig
-     base_model_prefix = "model"
-@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-         # Initialize weights and apply final processing
-         self.post_init()
-         self.warm_up = False
--        #@dwj
--        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--            self.num_layers,
--            self.num_attention_heads,
--            self.head_dim,
--            batch_size=1,
--            max_length=self.max_length,
--            dtype=mindspore.float16
--        )
--
--    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--        key_cache = []
--        value_cache = []
--        for _ in range(num_layers):
--            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--            key_cache.append(k)
--            value_cache.append(v)
--        return key_cache, value_cache
--
- 
-     def warmup_moe_model_deep(self):
-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-new file mode 100644
-index 00000000..78f22642
---- /dev/null
-+++ b/patches/0001-20251104commit.patch
-@@ -0,0 +1,1272 @@
-+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Tue, 4 Nov 2025 09:11:51 +0800
-+Subject: [PATCH 1/3] 20251104commit
-+
-+---
-+ mindnlp/transformers/cache_utils.py           |  28 +-
-+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+ 3 files changed, 976 insertions(+), 87 deletions(-)
-+
-+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+index cadd2e04..02f8d4be 100644
-+--- a/mindnlp/transformers/cache_utils.py
-++++ b/mindnlp/transformers/cache_utils.py
-+@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+             # k_out[:, :, cache_position] = key_states
-+             # v_out[:, :, cache_position] = value_states
-+-            if ON_ORANGE_PI:
-+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+-            else:
-+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+-
-++            # if ON_ORANGE_PI:
-++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++            # else:
-++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++            # 确保 cache_position 是 1D tensor 并且类型正确
-++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++            if cache_position.ndim > 1:
-++                cache_position = cache_position.flatten()
-++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++                cache_position = cache_position.int()
-++            
-++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++            k_out[:, :, cache_position] = key_states
-++            v_out[:, :, cache_position] = value_states
-++            
-+         return k_out, v_out
-+ 
-+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index c695b944..d8303e45 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+-    x1 = x[..., : x.shape[-1] // 2]
-+-    x2 = x[..., x.shape[-1] // 2 :]
-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++    # x1 = x[..., : x.shape[-1] // 2]
-++    # x2 = x[..., x.shape[-1] // 2 :]
-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+         if self.training:
-+             raise NotImplementedError("Training is not supported yet.")
-+         else:
-+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+-        if self.config.n_shared_experts is not None:
-+-            y = y + self.shared_experts(identity)
-+-        return y
-++            # @lwx
-++            if orig_shape[1] == 1:
-++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++                y=y.view(*orig_shape)
-++                if self.config.n_shared_experts is not None:
-++                    y = y + self.shared_experts(identity)
-++                return y
-++            else:
-++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++                if self.config.n_shared_experts is not None:
-++                    y = y + self.shared_experts(identity)
-++                return y
-++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++        # if self.config.n_shared_experts is not None:
-++        #     y = y + self.shared_experts(identity)
-++        # return y
-++        
-++    @no_grad()
-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++
-++        expert_cache = ops.zeros_like(x)
-++        for i in range(self.num_experts_per_tok):
-++            expert_id = flat_expert_indices[i].item()
-++            weight = flat_expert_weights[i].item()
-++            expert = self.experts[expert_id]
-++            expert_out = expert(x)
-++            expert_cache += expert_out * weight
-++        return expert_cache
-+ 
-+     @no_grad()
-+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-        # expert_cache = torch.zeros_like(x)
-+-        # idxs = flat_expert_indices.argsort()
-+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+-        # token_idxs = idxs // self.num_experts_per_tok
-+-        # for i, end_idx in enumerate(tokens_per_expert):
-+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+-        #     if start_idx == end_idx:
-+-        #         continue
-+-        #     expert = self.experts[i]
-+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+-        #     expert_tokens = x[exp_token_idx]
-+-        #     expert_out = expert(expert_tokens)
-+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+-        # return expert_cache
-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+         expert_cache = ops.zeros_like(x)
-+         idxs = flat_expert_indices.argsort()
-+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+         token_idxs = idxs // self.num_experts_per_tok
-++
-+         for i, end_idx in enumerate(tokens_per_expert):
-+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+             if start_idx == end_idx:
-+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+             expert_out = expert(expert_tokens)
-+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++
-+         return expert_cache
-++        
-++    # @no_grad()
-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++    #     # expert_cache = torch.zeros_like(x)
-++    #     # idxs = flat_expert_indices.argsort()
-++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++    #     # token_idxs = idxs // self.num_experts_per_tok
-++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++    #     #     if start_idx == end_idx:
-++    #     #         continue
-++    #     #     expert = self.experts[i]
-++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++    #     #     expert_tokens = x[exp_token_idx]
-++    #     #     expert_out = expert(expert_tokens)
-++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++    #     # return expert_cache
-++    #     expert_cache = ops.zeros_like(x)
-++    #     idxs = flat_expert_indices.argsort()
-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++    #     token_idxs = idxs // self.num_experts_per_tok
-++
-++    #     for i, end_idx in enumerate(tokens_per_expert):
-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++    #         if start_idx == end_idx:
-++    #             continue
-++    #         expert = self.experts[i]
-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++    #         expert_tokens = x[exp_token_idx]
-++    #         expert_out = expert(expert_tokens)
-++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++
-++    #     return expert_cache
-++    # @no_grad()
-++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++    #     expert_cache = ops.zeros_like(x)
-++
-++    #     # 排序保证顺序一致
-++    #     idxs = flat_expert_indices.argsort()
-++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++    #     token_idxs = idxs // self.num_experts_per_tok
-++
-++    #     # 找出有 token 的专家
-++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++
-++    #     for i in active_experts.tolist():
-++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++    #         end_idx = tokens_per_expert[i]
-++    #         if start_idx == end_idx:  # 没有 token
-++    #             continue
-++
-++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++    #         expert_tokens = x[exp_token_idx]
-++    #         expert_out = self.experts[i](expert_tokens)
-++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++
-++    #         expert_cache = mindspore.mint.scatter_add(
-++    #             expert_cache,
-++    #             0,
-++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++    #             expert_out
-++    #         )
-++
-++    #     return expert_cache
-++
-++
-+ 
-+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+ #     """
-+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+ 
-+         # Initialize weights and apply final processing
-+         self.post_init()
-++        self.warm_up = False
-++
-++    def warmup_moe_model_deep(self):
-++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++        test_texts = [
-++            "warmup short",
-++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++        ]
-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++        if tokenizer is None:
-++            from mindnlp.transformers import AutoTokenizer
-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++            self._warmup_tokenizer = tokenizer
-++
-++        for text in test_texts:
-++            inputs = tokenizer(text, return_tensors="ms")
-++            with mindspore._no_grad():
-++                _ = self(**inputs, use_cache=False)
-++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+ 
-+     def get_input_embeddings(self):
-+         return self.model.embed_tokens
-+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+         ```"""
-++        if not self.warm_up:
-++            self.warm_up = True
-++            self.warmup_moe_model_deep()
-++
-+         output_attentions = (
-+             output_attentions
-+             if output_attentions is not None
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index 3cbf820e..d4c6b651 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -18,7 +18,6 @@
-+ # See the License for the specific language governing permissions and
-+ # limitations under the License.
-+ """MindSpore Qwen2MoE model."""
-+-
-+ import math
-+ from typing import List, Optional, Tuple, Union
-+ 
-+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+     TokenClassifierOutput,
-+ )
-+ from ...modeling_utils import PreTrainedModel
-++from ...generation import GenerationMixin
-+ from ....utils import logging
-+ from .configuration_qwen2_moe import Qwen2MoeConfig
-+ 
-+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+         self.variance_epsilon = eps
-+ 
-+     def forward(self, hidden_states):
-++        # @dwj
-++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++        # @lwx
-++        # if not self.training :
-++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+         input_dtype = hidden_states.dtype
-+         hidden_states = hidden_states.to(mindspore.float32)
-+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+@@ -234,6 +239,8 @@ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+     x1 = x[..., : x.shape[-1] // 2]
-+     x2 = x[..., x.shape[-1] // 2 :]
-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+         self.config = config
-+         self.hidden_size = config.hidden_size
-+         self.intermediate_size = intermediate_size
-++        
-+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+         self.act_fn = ACT2FN[config.hidden_act]
-+ 
-+     def forward(self, x):
-+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+-
-+ 
-++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++        # @lwx 
-++        # gate_up_output = self.gate_up_proj(x)
-++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++        # return self.down_proj(swiglu_output)
-++
-++    # def forward(self, x):
-++    #     gate_proj_out = self.gate_proj(x)
-++    #     up_proj_out = self.up_proj(x)
-++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++    #     return self.down_proj(swiglu_out)
-++        
-+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+     """
-+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+         use_cache: bool = False,
-+         cache_position: Optional[mindspore.Tensor] = None,
-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++        
-++
-+         bsz, q_len, _ = hidden_states.shape
-+ 
-+         query_states = self.q_proj(hidden_states)
-+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+                     "with a layer index."
-+                 )
-+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++            if isinstance(past_key_value, StaticCache):
-++                kv_seq_len = key_states.shape[-2]
-++            else:
-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+ 
-+         if past_key_value is not None:
-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++            
-++            if isinstance(past_key_value, StaticCache):
-++                kv_seq_len = key_states.shape[-2]
-+ 
-+         # repeat k/v heads if n_kv_heads < n_heads
-+         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+-
-++        
-+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+ 
-+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+-            raise ValueError(
-+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+-                f" {attn_weights.shape}"
-+-            )
-+-
-+-        if attention_mask is not None:  # no matter the length, we just slice it
-+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++        if attention_mask is not None:
-++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+             attn_weights = attn_weights + causal_mask
-+ 
-+         # upcast attention to fp32
-+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+ 
-+         attn_output = self.o_proj(attn_output)
-+-
-++        # @lwx
-++        
-++        # max_seq_len = self.max_position_embeddings  # 2048
-++
-++        # if attention_mask is not None:
-++        #     # attention_mask: [B, 1, Sq, Sk]
-++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++
-++        #     # pad 到 [max_seq_len, max_seq_len]
-++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++        #     global_attention_mask = padded_mask
-++        # else:
-++        #     global_attention_mask = None
-++
-++
-++        # sparse_mode=3
-++        # attn_output = mindspore.ops.flash_attention_score(
-++        #     query=query_states,
-++        #     key=key_states,
-++        #     value=value_states,
-++        #     real_shift=None,
-++        #     padding_mask=None,
-++
-++        #     head_num=self.num_heads,
-++        #     attn_mask=global_attention_mask,
-++        #     keep_prob=1.0 - self.attention_dropout,
-++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++        #     input_layout="BNSD",
-++        #     pre_tokens=2147483647,
-++        #     next_tokens=2147483647,
-++        #     inner_precise=0,
-++        #     drop_mask=None,
-++        #     prefix=None,
-++        #     actual_seq_qlen=None,
-++        #     actual_seq_kvlen=None,
-++        #     sparse_mode=sparse_mode,
-++        # )
-+         if not output_attentions:
-+             attn_weights = None
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-+ 
-++class Qwen2MoeFlashAttention(nn.Module):
-++    """
-++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++
-++    关键改动:
-++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++       直接传入原始的 key 和 value 张量效率更高。
-++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++    """
-++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++        super().__init__()
-++        self.config = config
-++        self.layer_idx = layer_idx
-++        self.hidden_size = config.hidden_size
-++        self.num_heads = config.num_attention_heads
-++        self.head_dim = self.hidden_size // self.num_heads
-++        self.num_key_value_heads = config.num_key_value_heads
-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++        self.max_position_embeddings = config.max_position_embeddings
-++        self.rope_theta = config.rope_theta
-++        self.attention_dropout = config.attention_dropout
-++
-++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++            raise ValueError(
-++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++            )
-++
-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++
-++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++            self.head_dim,
-++            max_position_embeddings=self.max_position_embeddings,
-++            base=self.rope_theta,
-++        )
-++
-++    def forward(
-++        self,
-++        hidden_states: mindspore.Tensor,
-++        attention_mask: Optional[mindspore.Tensor] = None,
-++        position_ids: Optional[mindspore.Tensor] = None,
-++        past_key_value: Optional[Cache] = None,
-++        output_attentions: bool = False,
-++        use_cache: bool = False,
-++        cache_position: Optional[mindspore.Tensor] = None,
-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++        bsz, q_len, _ = hidden_states.shape
-++
-++        # 1. 线性投射 Q, K, V
-++        query_states = self.q_proj(hidden_states)
-++        key_states = self.k_proj(hidden_states)
-++        value_states = self.v_proj(hidden_states)
-++
-++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++        # 3. RoPE 旋转位置编码
-++        kv_seq_len = key_states.shape[-2]
-++        if past_key_value is not None:
-++            if self.layer_idx is None:
-++                raise ValueError(
-++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++                    "with a layer index."
-++                )
-++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++                if cache_position.shape[0] == 1:
-++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++                    kv_seq_len = past_seen_tokens + 1
-++                else:
-++                    # prefill 阶段：cache_position 是范围，使用其长度
-++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++            else:
-++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++        # 4. KV 缓存更新
-++        if past_key_value is not None:
-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++            key_states, value_states = past_key_value.update(
-++                key_states, value_states, self.layer_idx, cache_kwargs
-++            )
-++            
-++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++                if cache_position.shape[0] == 1:
-++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++                    kv_seq_len = key_states.shape[-2]
-++
-++        # 5. [重要] 准备 Attention Mask
-++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++        fa_attention_mask = None
-++        if attention_mask is not None:
-++            # 截取与当前key长度匹配的部分
-++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++            fa_attention_mask = (mask_slice != 0)
-++
-++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++        input_dtype = query_states.dtype
-++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++            query_states = query_states.to(mindspore.float16)
-++            key_states = key_states.to(mindspore.float16)
-++            value_states = value_states.to(mindspore.float16)
-++
-++        # 6. [核心] 调用 flash_attention_score 算子
-++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++        attn_output = mindspore.ops.flash_attention_score(
-++            query=query_states,
-++            key=key_states,
-++            value=value_states,
-++            head_num=self.num_heads, # 传入Q的头数(N1)
-++            attn_mask=fa_attention_mask,
-++            keep_prob=1.0 - self.attention_dropout,
-++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++            input_layout="BNSD",
-++            sparse_mode=0 # 使用 defaultMask 模式
-++        )
-++
-++        # 恢复原始数据类型
-++        attn_output = attn_output.to(input_dtype)
-++
-++        # 7. 调整输出形状
-++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++        attn_output = self.o_proj(attn_output)
-++
-++        # FlashAttention 算子不直接返回注意力权重矩阵
-++        attn_weights = None
-++        if output_attentions:
-++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++        return attn_output, attn_weights, past_key_value
-++
-++    # def forward(
-++    #     self,
-++    #     hidden_states: mindspore.Tensor,
-++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++    #     position_ids: Optional[mindspore.Tensor] = None,
-++    #     past_key_value: Optional[Cache] = None,
-++    #     output_attentions: bool = False,
-++    #     use_cache: bool = False,
-++    #     cache_position: Optional[mindspore.Tensor] = None,
-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++    #     bsz, q_len, _ = hidden_states.shape
-++
-++    #     # 1. 线性投射 Q, K, V
-++    #     query_states = self.q_proj(hidden_states)
-++    #     key_states = self.k_proj(hidden_states)
-++    #     value_states = self.v_proj(hidden_states)
-++
-++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++    #     # 3. RoPE 旋转位置编码
-++    #     kv_seq_len = key_states.shape[-2]
-++    #     if past_key_value is not None:
-++    #         if self.layer_idx is None:
-++    #             raise ValueError(
-++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++    #                 "with a layer index."
-++    #             )
-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++    #     # 4. KV 缓存更新
-++    #     if past_key_value is not None:
-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++    #         key_states, value_states = past_key_value.update(
-++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++    #         )
-++
-++    #     # 5. 准备 Attention Mask
-++    #     fa_attention_mask = None
-++    #     if attention_mask is not None:
-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++    #         fa_attention_mask = (mask_slice != 0)
-++
-++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++    #     input_dtype = query_states.dtype
-++
-++    #     # 6. [核心] 调用 flash_attention_score 算子
-++    #     attn_output = mindspore.ops.flash_attention_score(
-++    #         query=query_states,
-++    #         key=key_states,
-++    #         value=value_states,
-++    #         head_num=self.num_heads,
-++    #         attn_mask=fa_attention_mask,
-++    #         keep_prob=1.0 - self.attention_dropout,
-++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++    #         input_layout="BNSD",
-++    #         sparse_mode=0,
-++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++    #         inner_precise=1
-++    #     )
-++
-++    #     # 恢复原始数据类型
-++    #     attn_output = attn_output.to(input_dtype)
-++
-++    #     # 7. 调整输出形状
-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++    #     attn_output = self.o_proj(attn_output)
-++
-++    #     attn_weights = None
-++    #     if output_attentions:
-++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++    #     return attn_output, attn_weights, past_key_value
-++
-++    # def forward(
-++    #     self,
-++    #     hidden_states: mindspore.Tensor,
-++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++    #     position_ids: Optional[mindspore.Tensor] = None,
-++    #     past_key_value: Optional[Cache] = None,
-++    #     output_attentions: bool = False,
-++    #     use_cache: bool = False,
-++    #     cache_position: Optional[mindspore.Tensor] = None,
-++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++    #     bsz, q_len, _ = hidden_states.shape
-++
-++    #     query_states = self.q_proj(hidden_states)
-++    #     key_states = self.k_proj(hidden_states)
-++    #     value_states = self.v_proj(hidden_states)
-++
-++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++    #     kv_seq_len = key_states.shape[-2]
-++    #     if past_key_value is not None:
-++    #         if self.layer_idx is None:
-++    #             raise ValueError("`layer_idx` must be specified for caching")
-++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++    #     if past_key_value is not None:
-++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++    #         key_states, value_states = past_key_value.update(
-++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++    #         )
-++
-++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++
-++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++    #     query_states = query_states / math.sqrt(self.head_dim)
-++    #     # <--- 修改结束 ---
-++
-++    #     fa_attention_mask = None
-++    #     if attention_mask is not None:
-++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++    #         fa_attention_mask = (mask_slice != 0)
-++
-++    #     input_dtype = query_states.dtype
-++
-++    #     attn_output = mindspore.ops.flash_attention_score(
-++    #         query=query_states,    # 传入已经预先缩放过的 query
-++    #         key=key_states,
-++    #         value=value_states,
-++    #         head_num=self.num_heads,
-++    #         attn_mask=fa_attention_mask,
-++    #         keep_prob=1.0 - self.attention_dropout,
-++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++    #         input_layout="BNSD",
-++    #         sparse_mode=0,
-++    #         inner_precise=1        # 仍然保持内部高精度计算
-++    #     )
-++
-++    #     attn_output = attn_output.to(input_dtype)
-++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++    #     attn_output = self.o_proj(attn_output)
-++
-++    #     attn_weights = None
-++    #     if output_attentions:
-++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++
-++    #     return attn_output, attn_weights, past_key_value
-++
-+ QWEN2MOE_ATTENTION_CLASSES = {
-+     "eager": Qwen2MoeAttention,
-++    "flash-attention": Qwen2MoeFlashAttention,
-+ }
-+ 
-+ 
-+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+ 
-++    #@dwj
-++    # 只遍历激活的专家，而非全部专家
-+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        hidden_states = hidden_states.view(-1, hidden_dim)
-+-        # router_logits: (batch * sequence_length, n_experts)
-+-        router_logits = self.gate(hidden_states)
-+-
-+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-        if self.norm_topk_prob:
-+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        # we cast back to the input dtype
-+-        routing_weights = routing_weights.to(hidden_states.dtype)
-+-
-+-        final_hidden_states = ops.zeros(
-+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+-        )
-+-
-+-        # One hot encode the selected experts to create an expert mask
-+-        # this will be used to easily index which expert is going to be sollicitated
-+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+-
-+-        # Loop over all available experts in the model and perform the computation on each expert
-+-        for expert_idx in range(self.num_experts):
-+-            expert_layer = self.experts[expert_idx]
-+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+-
-+-            # Index the correct hidden states and compute the expert hidden state for
-+-            # the current expert. We need to make sure to multiply the output hidden
-+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+-            if 0 not in idx.shape:
-+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+-
-+-                # However `index_add_` only support torch tensors for indexing so we'll use
-+-                # the `top_x` tensor here.
-+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+-
-+-        shared_expert_output = self.shared_expert(hidden_states)
-+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+-
-+-        final_hidden_states = final_hidden_states + shared_expert_output
-++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++            num_tokens = hidden_states_reshaped.shape[0]
-++
-++            router_logits = self.gate(hidden_states_reshaped)
-++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++
-++            if self.norm_topk_prob:
-++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++            routing_weights = routing_weights.to(hidden_states.dtype)
-++            
-++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++            flat_selected_experts = selected_experts.flatten()
-++            
-++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++            token_indices = broadcasted_token_indices.flatten()
-++            
-++            active_experts = ops.unique(flat_selected_experts)
-++            
-++            for expert_idx_tensor in active_experts:
-++                expert_idx = expert_idx_tensor.item()
-++                expert_layer = self.experts[expert_idx]
-++                
-++                mask = (flat_selected_experts == expert_idx_tensor)
-++                selected_token_indices = token_indices[mask]
-++                selected_routing_weights = routing_weights.flatten()[mask]
-++                
-++                current_states = hidden_states_reshaped[selected_token_indices]
-++                
-++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++                
-++                final_hidden_states = final_hidden_states.index_add(
-++                    dim=0,
-++                    index=selected_token_indices,
-++                    source=expert_output.to(hidden_states.dtype)
-++                )
-++            
-++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+ 
-+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+-        return final_hidden_states, router_logits
-++            final_hidden_states = final_hidden_states + shared_expert_output
-++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++            
-++            return final_hidden_states, router_logits
-+ 
-+ 
-+ class Qwen2MoeDecoderLayer(nn.Module):
-+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+ 
-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+ 
-++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++
-+         if (layer_idx not in config.mlp_only_layers) and (
-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+         ):
-+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+     _skip_keys_device_placement = "past_key_values"
-+     _supports_cache_class = True
-++#lwx
-++    # _supports_static_cache = True
-+ 
-+     def _init_weights(self, module):
-+         std = self.config.initializer_range
-+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+         return causal_mask
-+ 
-+ 
-+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+     _tied_weights_keys = ["lm_head.weight"]
-+ 
-+     def __init__(self, config):
-+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+         self.num_experts_per_tok = config.num_experts_per_tok
-+         # Initialize weights and apply final processing
-+         self.post_init()
-++        # @lwx
-++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++        #     self.generation_config.cache_implementation = "static"
-++        self._warmed_up = False
-++
-++    def warmup_moe_model(self):
-++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++        test_texts = [
-++            "warmup short",
-++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++        ]
-++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++        if tokenizer is None:
-++            from mindnlp.transformers import AutoTokenizer
-++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++            self._warmup_tokenizer = tokenizer
-++
-++        for text in test_texts:
-++            inputs = tokenizer(text, return_tensors="ms")
-++            with mindspore._no_grad():
-++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+ 
-+     def get_input_embeddings(self):
-+         return self.model.embed_tokens
-+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+         ```"""
-++        if not self._warmed_up:
-++            self._warmed_up = True
-++            self.warmup_moe_model()
-+ 
-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+         output_router_logits = (
-+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+             }
-+         )
-+         return model_inputs
-++# @lwx
-++    # def _decode_one_tokens_logits(
-++    #     self,
-++    #     cur_token: mindspore.Tensor,
-++    #     input_pos: Optional[mindspore.Tensor],
-++    #     cache_position: mindspore.Tensor,
-++    #     past_key_values: StaticCache,
-++    # ) -> mindspore.Tensor:
-++    #     """
-++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++        
-++    #     Args:
-++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++    #         input_pos: 输入位置信息，可选
-++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++            
-++    #     Returns:
-++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++    #     """
-++    #     # 调用JIT编译的版本
-++    #     return self.get_decode_one_tokens_logits(
-++    #         cur_token=cur_token,
-++    #         input_pos=input_pos,
-++    #         cache_position=cache_position,
-++    #         past_key_values=past_key_values,
-++    #     )
-++    
-++    # @mindspore.jit(jit_level='O1')
-++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++    #     """
-++    #     JIT编译的函数，用于高效的单token解码
-++    #     使用JIT编译优化以支持静态shape和高效执行
-++        
-++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++    #     """
-++    #     outputs = self.model.forward(
-++    #         input_ids=cur_token,
-++    #         position_ids=input_pos,
-++    #         cache_position=cache_position,
-++    #         past_key_values=past_key_values,
-++    #         use_cache=True,
-++    #         return_dict=False,
-++    #     )
-++        
-++    #     hidden_states = outputs[0]
-++    #     logits = self.lm_head.forward(hidden_states)
-++    #     logits = logits.float()
-++        
-++    #     return logits[:, -1, :]
-++
-++    # def _sample(
-++    #     self,
-++    #     input_ids: mindspore.Tensor,
-++    #     logits_processor,
-++    #     stopping_criteria,
-++    #     generation_config,
-++    #     synced_devices: bool,
-++    #     streamer=None,
-++    #     logits_warper=None,
-++    #     **model_kwargs,
-++    # ):
-++    #     """
-++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++    #     """
-++    #     from ...generation.logits_process import LogitsProcessorList
-++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++    #     from mindnlp.core import nn, ops, no_grad
-++    #     import numpy as np
-++        
-++    #     # 检查是否使用 StaticCache
-++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++    #     # 否则，直接调用父类方法
-++    #     past_key_values = model_kwargs.get("past_key_values")
-++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++
-++    #     if not isinstance(past_key_values, StaticCache):
-++    #         # 不使用 StaticCache，直接调用父类方法
-++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++    #         return super()._sample(
-++    #             input_ids=input_ids,
-++    #             logits_processor=logits_processor,
-++    #             stopping_criteria=stopping_criteria,
-++    #             generation_config=generation_config,
-++    #             synced_devices=synced_devices,
-++    #             streamer=streamer,
-++    #             logits_warper=logits_warper,
-++    #             **model_kwargs,
-++    #         )
-++        
-++    #     # 使用 StaticCache，进入自定义循环
-++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++    #     pad_token_id = generation_config._pad_token_tensor
-++    #     output_attentions = generation_config.output_attentions
-++    #     output_hidden_states = generation_config.output_hidden_states
-++    #     output_scores = generation_config.output_scores
-++    #     output_logits = generation_config.output_logits
-++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++    #     max_length = generation_config.max_length
-++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++    #     do_sample = generation_config.do_sample
-++        
-++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++    #         raise ValueError(
-++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++    #             f"{logits_warper})."
-++    #         )
-++        
-++    #     # init attention / hidden states / scores tuples
-++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++        
-++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++    #         encoder_hidden_states = (
-++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++    #         )
-++        
-++    #     # keep track of which sequences are already finished
-++    #     batch_size, cur_len = input_ids.shape
-++    #     this_peer_finished = False
-++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++        
-++    #     time_record = []
-++    #     from ....utils.testing_utils import parse_flag_from_env
-++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++        
-++    #     while self._has_unfinished_sequences(
-++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++    #     ):
-++    #         if _record_time:
-++    #             import time as time_module
-++    #             infer_start = time_module.time()
-++            
-++    #         # prepare model inputs
-++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++            
-++    #         # prepare variable output controls
-++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++            
-++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++    #         cur_cache_position = model_inputs.get("cache_position")
-++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++    #         cur_input_ids = model_inputs.get("input_ids")
-++            
-++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++    #             cur_cache_position is not None and 
-++    #             len(cur_cache_position.shape) > 0 and
-++    #             cur_cache_position.shape[0] == 1 and
-++    #             cur_input_ids is not None and
-++    #             cur_input_ids.shape[1] == 1):
-++    #             # 使用 JIT 优化的单 token 解码
-++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++    #             if not hasattr(self, '_jit_used'):
-++    #                 self._jit_used = False
-++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++                
-++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++    #                 cur_token=cur_input_ids,
-++    #                 input_pos=model_inputs.get("position_ids"),
-++    #                 cache_position=cur_cache_position,
-++    #                 past_key_values=cur_past_key_values,
-++    #             )
-++                
-++    #             # 标记已使用JIT（用于后续判断）
-++    #             if not self._jit_used:
-++    #                 self._jit_used = True
-++                
-++    #             # 构造兼容的输出对象
-++    #             class JitOptimizedOutput:
-++    #                 def __init__(self, logits, config):
-++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++    #                     self.config = config
-++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++    #                     self.cross_attentions = None
-++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++                
-++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++    #         else:
-++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++    #             outputs = self(**model_inputs, return_dict=True)
-++            
-++    #         if synced_devices and this_peer_finished:
-++    #             continue
-++            
-++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++    #         next_token_logits = outputs.logits[:, -1, :]
-++            
-++    #         # pre-process distribution
-++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++    #         if do_sample:
-++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++            
-++    #         # Store scores, attentions and hidden_states when required
-++    #         if return_dict_in_generate:
-++    #             if output_scores:
-++    #                 scores += (next_token_scores,)
-++    #             if output_logits:
-++    #                 raw_logits += (next_token_logits,)
-++    #             if output_attentions:
-++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++    #                 if self.config.is_encoder_decoder:
-++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++                
-++    #             if output_hidden_states:
-++    #                 hidden = (
-++    #                     outputs.decoder_hidden_states
-++    #                     if self.config.is_encoder_decoder
-++    #                     else outputs.hidden_states
-++    #                 )
-++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++            
-++    #         # token selection
-++    #         if do_sample:
-++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++    #         else:
-++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++            
-++    #         # finished sentences should have their next token be a padding token
-++    #         if has_eos_stopping_criteria:
-++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++            
-++    #         # update generated ids, model inputs, and length for next step
-++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++    #         if streamer is not None:
-++    #             streamer.put(next_tokens)
-++            
-++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++    #             outputs,
-++    #             model_kwargs,
-++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++    #         )
-++            
-++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++    #         cur_len += 1
-++            
-++    #         if _record_time:
-++    #             import time as time_module
-++    #             infer_stop = time_module.time()
-++    #             time_record.append(infer_stop - infer_start)
-++            
-++    #         del outputs
-++        
-++    #     average_infer_time = None
-++    #     if time_record:
-++    #         if len(time_record) > 1:
-++    #             time_record.pop(0)
-++    #         average_infer_time = sum(time_record) / len(time_record)
-++    #         print(f'average inference time is: {average_infer_time}')
-++    #         print(f'inference time record: {time_record}')
-++        
-++    #     if streamer is not None:
-++    #         streamer.end()
-++        
-++    #     # 简单判断：打印是否使用了JIT路径
-++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++    #     else:
-++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++        
-++    #     if return_dict_in_generate:
-++    #         if self.config.is_encoder_decoder:
-++    #             return GenerateEncoderDecoderOutput(
-++    #                 sequences=input_ids,
-++    #                 scores=scores,
-++    #                 logits=raw_logits,
-++    #                 encoder_attentions=encoder_attentions,
-++    #                 encoder_hidden_states=encoder_hidden_states,
-++    #                 decoder_attentions=decoder_attentions,
-++    #                 cross_attentions=cross_attentions,
-++    #                 decoder_hidden_states=decoder_hidden_states,
-++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++    #                 average_infer_time=average_infer_time
-++    #             )
-++    #         else:
-++    #             return GenerateDecoderOnlyOutput(
-++    #                 sequences=input_ids,
-++    #                 scores=scores,
-++    #                 logits=raw_logits,
-++    #                 attentions=decoder_attentions,
-++    #                 hidden_states=decoder_hidden_states,
-++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++    #                 average_infer_time=average_infer_time
-++    #             )
-++    #     else:
-++    #         return input_ids
-++            
-++    # def _prepare_cache_for_generation(
-++    #     self,
-++    #     generation_config,
-++    #     model_kwargs,
-++    #     assistant_model,
-++    #     batch_size,
-++    #     max_cache_length,
-++    # ):
-++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++    #         generation_config.cache_implementation = "static"
-++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++        
-++    #     if generation_config.cache_implementation == "static":
-++    #         base_required_from_max_length = generation_config.max_length + 1
-++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++    #         min_cache_size = 50
-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++    #         else:
-++    #             max_cache_length = max(base_required, min_cache_size)
-++            
-++    #         original_max_cache_length = max_cache_length
-++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++            
-++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++    #             if max_cache_length > self.config.max_position_embeddings:
-++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++        
-++    #     result = super()._prepare_cache_for_generation(
-++    #         generation_config=generation_config,
-++    #         model_kwargs=model_kwargs,
-++    #         assistant_model=assistant_model,
-++    #         batch_size=batch_size,
-++    #         max_cache_length=max_cache_length,
-++    #     )
-++        
-++    #     if generation_config.cache_implementation == "static":
-++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++    #         created_cache = model_kwargs.get(cache_name)
-++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++    #             if created_cache.max_cache_len < generation_config.max_length:
-++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++        
-++    #     return result
-++
-++
-++
-+ 
-+ 
-+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+-- 
-+2.27.0
-+
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-new file mode 100644
-index 00000000..22b65dd5
---- /dev/null
-+++ b/patches/0002-20251106commit.patch
-@@ -0,0 +1,3200 @@
-+From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Thu, 6 Nov 2025 09:20:38 +0800
-+Subject: [PATCH 2/3] 20251106commit
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-+ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-+ 3 files changed, 2689 insertions(+), 305 deletions(-)
-+ create mode 100644 patches/0001-20251104commit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index d8303e45..73773c22 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-+         #     y = y + self.shared_experts(identity)
-+         # return y
-+         
-++    # @no_grad()
-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++
-++    #     expert_cache = ops.zeros_like(x)
-++    #     for i in range(self.num_experts_per_tok):
-++    #         expert_id = flat_expert_indices[i].item()
-++    #         weight = flat_expert_weights[i].item()
-++    #         expert = self.experts[expert_id]
-++    #         expert_out = expert(x)
-++    #         expert_cache += expert_out * weight
-++    #     return expert_cache
-++
-+     @no_grad()
-+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++        # x 的 shape: (1, hidden_size)
-++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++
-++        # 1. 收集所有需要的专家层
-++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++
-++        # 2. 并行计算所有专家的输出
-++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++        # ops.cat 会将它们堆叠成一个新的 Tensor
-++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++
-++        # 3. 使用矩阵乘法进行加权求和
-++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++        # 最终结果 final_output 的 shape: (1, hidden_size)
-++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++        
-++        return final_output
-+ 
-+-        expert_cache = ops.zeros_like(x)
-+-        for i in range(self.num_experts_per_tok):
-+-            expert_id = flat_expert_indices[i].item()
-+-            weight = flat_expert_weights[i].item()
-+-            expert = self.experts[expert_id]
-+-            expert_out = expert(x)
-+-            expert_cache += expert_out * weight
-+-        return expert_cache
-+ 
-+     @no_grad()
-+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-+             key_states = self.k_proj(hidden_states)
-+             value_states = self.v_proj(hidden_states)
-+ 
-+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++        # @lwx
-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+ 
-+         kv_seq_len = key_states.shape[-2]
-+         if past_key_value is not None:
-+@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-+         return attn_output, attn_weights, past_key_value
-+ 
-+ 
-++# class DeepseekFlashAttention(nn.Module):
-++#     """
-++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-++
-++#     This class is designed as a drop-in replacement for DeepseekAttention.
-++#     """
-++
-++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++#         super().__init__()
-++#         self.config = config
-++#         self.layer_idx = layer_idx
-++#         if layer_idx is None:
-++#             logger.warning(
-++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++#                 "when creating this class."
-++#             )
-++
-++#         self.attention_dropout = config.attention_dropout
-++#         self.hidden_size = config.hidden_size
-++#         self.num_heads = config.num_attention_heads
-++#         self.head_dim = self.hidden_size // self.num_heads
-++#         self.num_key_value_heads = config.num_key_value_heads
-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++#         self.max_position_embeddings = config.max_position_embeddings
-++#         self.rope_theta = config.rope_theta
-++#         self.is_causal = True
-++
-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++#             raise ValueError(
-++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++#                 f" and `num_heads`: {self.num_heads})."
-++#             )
-++
-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++#         self._init_rope()
-++
-++#     def _init_rope(self):
-++#         if self.config.rope_scaling is None:
-++#             self.rotary_emb = DeepseekRotaryEmbedding(
-++#                 self.head_dim,
-++#                 max_position_embeddings=self.max_position_embeddings,
-++#                 base=self.rope_theta,
-++#             )
-++#         else:
-++#             scaling_type = self.config.rope_scaling["type"]
-++#             scaling_factor = self.config.rope_scaling["factor"]
-++#             if scaling_type == "linear":
-++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++#                     self.head_dim,
-++#                     max_position_embeddings=self.max_position_embeddings,
-++#                     scaling_factor=scaling_factor,
-++#                     base=self.rope_theta,
-++#                 )
-++#             elif scaling_type == "dynamic":
-++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++#                     self.head_dim,
-++#                     max_position_embeddings=self.max_position_embeddings,
-++#                     scaling_factor=scaling_factor,
-++#                     base=self.rope_theta,
-++#                 )
-++#             else:
-++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++
-++#     def forward(
-++#         self,
-++#         hidden_states: mindspore.Tensor,
-++#         attention_mask: Optional[mindspore.Tensor] = None,
-++#         position_ids: Optional[mindspore.Tensor] = None,
-++#         past_key_value: Optional[Cache] = None,
-++#         output_attentions: bool = False,
-++#         use_cache: bool = False,
-++#         **kwargs,
-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++#         if "padding_mask" in kwargs:
-++#             warnings.warn(
-++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++#             )
-++        
-++#         if output_attentions:
-++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-++
-++#         bsz, q_len, _ = hidden_states.shape
-++
-++#         if self.config.pretraining_tp > 1:
-++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++
-++#         query_states = self.q_proj(hidden_states)
-++#         key_states = self.k_proj(hidden_states)
-++#         value_states = self.v_proj(hidden_states)
-++
-++#         # Reshape for multi-head attention
-++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++#         kv_seq_len = key_states.shape[-2]
-++#         if past_key_value is not None:
-++#             if self.layer_idx is None:
-++#                 raise ValueError(
-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++#                     "with a layer index."
-++#                 )
-++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++        
-++#         # Apply Rotary Positional Embedding
-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++#         if past_key_value is not None:
-++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++
-++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++        
-++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++        
-++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++
-++#         # Convert attention_mask for flash_attention_score
-++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-++#         if attention_mask is not None:
-++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++#                 raise ValueError(
-++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++#                 )
-++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-++#         else:
-++#             attn_mask_for_fa = None
-++        
-++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++
-++#         # Call the fused flash_attention_score operator
-++#         attn_output = mindspore.ops.flash_attention_score(
-++#             query=query_states_for_fa,
-++#             key=key_states_for_fa,
-++#             value=value_states_for_fa,
-++#             head_num=self.num_heads, # This is N1, the number of query heads
-++#             input_layout='BSH',
-++#             attn_mask=attn_mask_for_fa,
-++#             keep_prob=keep_prob,
-++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++#             sparse_mode=0 # Default mask mode
-++#         )
-++        
-++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-++#         attn_output = self.o_proj(attn_output)
-++        
-++#         # Flash Attention does not return attention weights
-++#         attn_weights = None
-++
-++#         return attn_output, attn_weights, past_key_value
-++
-++class DeepseekFlashAttention(nn.Module):
-++    """
-++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-++    designed for high performance on supported hardware (Ascend).
-++
-++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-++    """
-++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++        super().__init__()
-++        self.config = config
-++        self.layer_idx = layer_idx
-++        if layer_idx is None:
-++            logger.warning(
-++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++                "when creating this class."
-++            )
-++
-++        # --- [FIX] Correctly initialize all required attributes ---
-++        self.attention_dropout = config.attention_dropout
-++        self.hidden_size = config.hidden_size
-++        self.num_heads = config.num_attention_heads
-++        self.head_dim = self.hidden_size // self.num_heads
-++        self.num_key_value_heads = config.num_key_value_heads
-++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++        self.max_position_embeddings = config.max_position_embeddings
-++        self.rope_theta = config.rope_theta
-++        self.is_causal = True
-++
-++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++            raise ValueError(
-++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++                f" and `num_heads`: {self.num_heads})."
-++            )
-++
-++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++        
-++        # This call will now succeed as all attributes are initialized.
-++        self._init_rope()
-++
-++    def _init_rope(self):
-++        if self.config.rope_scaling is None:
-++            self.rotary_emb = DeepseekRotaryEmbedding(
-++                self.head_dim,
-++                max_position_embeddings=self.max_position_embeddings,
-++                base=self.rope_theta,
-++            )
-++        else:
-++            scaling_type = self.config.rope_scaling["type"]
-++            scaling_factor = self.config.rope_scaling["factor"]
-++            if scaling_type == "linear":
-++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++                    self.head_dim,
-++                    max_position_embeddings=self.max_position_embeddings,
-++                    scaling_factor=scaling_factor,
-++                    base=self.rope_theta,
-++                )
-++            elif scaling_type == "dynamic":
-++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++                    self.head_dim,
-++                    max_position_embeddings=self.max_position_embeddings,
-++                    scaling_factor=scaling_factor,
-++                    base=self.rope_theta,
-++                )
-++            else:
-++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++
-++    def forward(
-++        self,
-++        hidden_states: mindspore.Tensor,
-++        attention_mask: Optional[mindspore.Tensor] = None,
-++        position_ids: Optional[mindspore.Tensor] = None,
-++        past_key_value: Optional[Cache] = None,
-++        output_attentions: bool = False,
-++        use_cache: bool = False,
-++        **kwargs,
-++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++        if "padding_mask" in kwargs:
-++            warnings.warn(
-++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++            )
-++        if output_attentions:
-++            warnings.warn(
-++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-++            )
-++
-++        bsz, q_len, _ = hidden_states.shape
-++
-++        if self.config.pretraining_tp > 1:
-++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++
-++        query_states = self.q_proj(hidden_states)
-++        key_states = self.k_proj(hidden_states)
-++        value_states = self.v_proj(hidden_states)
-++
-++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++        kv_seq_len = key_states.shape[-2]
-++        if past_key_value is not None:
-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++        
-++        # Apply Rotary Position Embedding
-++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++        if past_key_value is not None:
-++            cache_kwargs = {"sin": sin, "cos": cos}
-++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++
-++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-++        # So we must explicitly repeat the KV heads.
-++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++
-++        # Convert attention mask for flash_attention_score
-++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-++        if attention_mask is not None:
-++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++                 raise ValueError(
-++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++                )
-++            attn_mask_for_fa = attention_mask < 0
-++        else:
-++            attn_mask_for_fa = None
-++
-++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++
-++        # Call the fused operator using the efficient BNSD layout
-++        attn_output = mindspore.ops.flash_attention_score(
-++            query=query_states,
-++            key=key_states,
-++            value=value_states,
-++            head_num=self.num_heads,
-++            input_layout='BNSD', # Specify the correct layout
-++            attn_mask=attn_mask_for_fa,
-++            keep_prob=keep_prob,
-++            scalar_value=1.0 / math.sqrt(self.head_dim)
-++        )
-++        
-++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++        
-++        # Apply output projection
-++        attn_output = self.o_proj(attn_output)
-++
-++        # Flash attention does not return attention weights, so we return None.
-++        attn_weights = None
-++
-++        return attn_output, attn_weights, past_key_value
-++
-+ Deepseek_ATTENTION_CLASSES = {
-+     "eager": DeepseekAttention,
-++    "flash-attention": DeepseekFlashAttention,
-+ }
-+ 
-+ 
-+@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-+             config=config, layer_idx=layer_idx
-+         )
-+ 
-++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-++            config=config, layer_idx=layer_idx
-++        )
-++
-+         self.mlp = (
-+             DeepseekMoE(config)
-+             if (
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index d4c6b651..bced285c 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-+ 
-+ import mindspore
-+ import mindnlp.core.nn.functional as F
-+-from mindnlp.core import nn, ops
-++from mindnlp.core import nn, ops, no_grad
-+ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-+ 
-+ from ....common.activations import ACT2FN
-+@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+ 
-++Long_Prompt = False
-++PROMPT_LENGTH_THRESHOLD = 128
-+ 
-+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+ def _prepare_4d_causal_attention_mask_with_cache_position(
-+@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-+         return attn_output, attn_weights, past_key_value
-+ 
-+ 
-++# class Qwen2MoeFlashAttention(nn.Module):
-++#     """
-++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++
-++#     关键改动:
-++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++#        直接传入原始的 key 和 value 张量效率更高。
-++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++#         super().__init__()
-++#         self.config = config
-++#         self.layer_idx = layer_idx
-++#         self.hidden_size = config.hidden_size
-++#         self.num_heads = config.num_attention_heads
-++#         self.head_dim = self.hidden_size // self.num_heads
-++#         self.num_key_value_heads = config.num_key_value_heads
-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++#         self.max_position_embeddings = config.max_position_embeddings
-++#         self.rope_theta = config.rope_theta
-++#         self.attention_dropout = config.attention_dropout
-++
-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++#             raise ValueError(
-++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++#             )
-++
-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++
-++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++#             self.head_dim,
-++#             max_position_embeddings=self.max_position_embeddings,
-++#             base=self.rope_theta,
-++#         )
-++
-++#     def forward(
-++#         self,
-++#         hidden_states: mindspore.Tensor,
-++#         attention_mask: Optional[mindspore.Tensor] = None,
-++#         position_ids: Optional[mindspore.Tensor] = None,
-++#         past_key_value: Optional[Cache] = None,
-++#         output_attentions: bool = False,
-++#         use_cache: bool = False,
-++#         cache_position: Optional[mindspore.Tensor] = None,
-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++#         bsz, q_len, _ = hidden_states.shape
-++
-++#         # 1. 线性投射 Q, K, V
-++#         query_states = self.q_proj(hidden_states)
-++#         key_states = self.k_proj(hidden_states)
-++#         value_states = self.v_proj(hidden_states)
-++
-++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++#         # 3. RoPE 旋转位置编码
-++#         kv_seq_len = key_states.shape[-2]
-++#         if past_key_value is not None:
-++#             if self.layer_idx is None:
-++#                 raise ValueError(
-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++#                     "with a layer index."
-++#                 )
-++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++#                 if cache_position.shape[0] == 1:
-++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++#                     kv_seq_len = past_seen_tokens + 1
-++#                 else:
-++#                     # prefill 阶段：cache_position 是范围，使用其长度
-++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++#             else:
-++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++#         # 4. KV 缓存更新
-++#         if past_key_value is not None:
-++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++#             key_states, value_states = past_key_value.update(
-++#                 key_states, value_states, self.layer_idx, cache_kwargs
-++#             )
-++            
-++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++#                 if cache_position.shape[0] == 1:
-++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++#                     kv_seq_len = key_states.shape[-2]
-++
-++#         # 5. [重要] 准备 Attention Mask
-++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++#         fa_attention_mask = None
-++#         if attention_mask is not None:
-++#             # 截取与当前key长度匹配的部分
-++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-++#             fa_attention_mask = (mask_slice != 0)
-++
-++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++#         input_dtype = query_states.dtype
-++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++#             query_states = query_states.to(mindspore.float16)
-++#             key_states = key_states.to(mindspore.float16)
-++#             value_states = value_states.to(mindspore.float16)
-++
-++#         # 6. [核心] 调用 flash_attention_score 算子
-++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++#         attn_output = mindspore.ops.flash_attention_score(
-++#             query=query_states,
-++#             key=key_states,
-++#             value=value_states,
-++#             head_num=self.num_heads, # 传入Q的头数(N1)
-++#             attn_mask=fa_attention_mask,
-++#             keep_prob=1.0 - self.attention_dropout,
-++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++#             input_layout="BNSD",
-++#             sparse_mode=0 # 使用 defaultMask 模式
-++#         )
-++
-++#         # 恢复原始数据类型
-++#         attn_output = attn_output.to(input_dtype)
-++
-++#         # 7. 调整输出形状
-++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++#         attn_output = self.o_proj(attn_output)
-++
-++#         # FlashAttention 算子不直接返回注意力权重矩阵
-++#         attn_weights = None
-++#         if output_attentions:
-++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++#         return attn_output, attn_weights, past_key_value
-++
-++#     # def forward(
-++#     #     self,
-++#     #     hidden_states: mindspore.Tensor,
-++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-++#     #     position_ids: Optional[mindspore.Tensor] = None,
-++#     #     past_key_value: Optional[Cache] = None,
-++#     #     output_attentions: bool = False,
-++#     #     use_cache: bool = False,
-++#     #     cache_position: Optional[mindspore.Tensor] = None,
-++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++#     #     bsz, q_len, _ = hidden_states.shape
-++
-++#     #     # 1. 线性投射 Q, K, V
-++#     #     query_states = self.q_proj(hidden_states)
-++#     #     key_states = self.k_proj(hidden_states)
-++#     #     value_states = self.v_proj(hidden_states)
-++
-++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++
-++#     #     # 3. RoPE 旋转位置编码
-++#     #     kv_seq_len = key_states.shape[-2]
-++#     #     if past_key_value is not None:
-++#     #         if self.layer_idx is None:
-++#     #             raise ValueError(
-++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++#     #                 "with a layer index."
-++#     #             )
-++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++
-++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++#     #     # 4. KV 缓存更新
-++#     #     if past_key_value is not None:
-++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++#     #         key_states, value_states = past_key_value.update(
-++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-++#     #         )
-++
-++#     #     # 5. 准备 Attention Mask
-++#     #     fa_attention_mask = None
-++#     #     if attention_mask is not None:
-++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++#     #         fa_attention_mask = (mask_slice != 0)
-++
-++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++#     #     input_dtype = query_states.dtype
-++
-++#     #     # 6. [核心] 调用 flash_attention_score 算子
-++#     #     attn_output = mindspore.ops.flash_attention_score(
-++#     #         query=query_states,
-++#     #         key=key_states,
-++#     #         value=value_states,
-++#     #         head_num=self.num_heads,
-++#     #         attn_mask=fa_attention_mask,
-++#     #         keep_prob=1.0 - self.attention_dropout,
-++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++#     #         input_layout="BNSD",
-++#     #         sparse_mode=0,
-++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++#     #         inner_precise=1
-++#     #     )
-++
-++#     #     # 恢复原始数据类型
-++#     #     attn_output = attn_output.to(input_dtype)
-++
-++#     #     # 7. 调整输出形状
-++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++#     #     attn_output = self.o_proj(attn_output)
-++
-++#     #     attn_weights = None
-++#     #     if output_attentions:
-++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++
-++#     #     return attn_output, attn_weights, past_key_value
-++
-++
-+ class Qwen2MoeFlashAttention(nn.Module):
-+     """
-+-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+-
-+-    关键改动:
-+-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+-       直接传入原始的 key 和 value 张量效率更高。
-+-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-++
-++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-++    完全使用模型的低精度数据类型（如 float16）进行计算，
-++    以达到理论上的最高执行速度。
-+     """
-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+         super().__init__()
-+         self.config = config
-+         self.layer_idx = layer_idx
-++        if layer_idx is None:
-++            logger.warning_once(
-++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-++            )
-++
-+         self.hidden_size = config.hidden_size
-+         self.num_heads = config.num_attention_heads
-+         self.head_dim = self.hidden_size // self.num_heads
-+         self.num_key_value_heads = config.num_key_value_heads
-+-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+         self.max_position_embeddings = config.max_position_embeddings
-+         self.rope_theta = config.rope_theta
-+         self.attention_dropout = config.attention_dropout
-+ 
-+-        if (self.head_dim * self.num_heads) != self.hidden_size:
-+-            raise ValueError(
-+-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+-            )
-+-
-+         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-+         key_states = self.k_proj(hidden_states)
-+         value_states = self.v_proj(hidden_states)
-+ 
-+-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+-        # query:   [B, S, H*D] -> [B, N1, S, D]
-+-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++        # 2. 调整形状以匹配 BNSD 布局
-+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-
-+-        # 3. RoPE 旋转位置编码
-++        
-++        # 3. RoPE 和 KV 缓存
-+         kv_seq_len = key_states.shape[-2]
-+         if past_key_value is not None:
-+-            if self.layer_idx is None:
-+-                raise ValueError(
-+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-                    "with a layer index."
-+-                )
-+-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+-                if cache_position.shape[0] == 1:
-+-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+-                    kv_seq_len = past_seen_tokens + 1
-+-                else:
-+-                    # prefill 阶段：cache_position 是范围，使用其长度
-+-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+-            else:
-+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-
-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++        
-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+ 
-+-        # 4. KV 缓存更新
-+         if past_key_value is not None:
-+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-            key_states, value_states = past_key_value.update(
-+-                key_states, value_states, self.layer_idx, cache_kwargs
-+-            )
-+-            
-+-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+-                if cache_position.shape[0] == 1:
-+-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+-                    kv_seq_len = key_states.shape[-2]
-+-
-+-        # 5. [重要] 准备 Attention Mask
-+-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++
-++        # 4. 准备 Attention Mask
-+         fa_attention_mask = None
-+         if attention_mask is not None:
-+-            # 截取与当前key长度匹配的部分
-+-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+             fa_attention_mask = (mask_slice != 0)
-+ 
-+-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+-        input_dtype = query_states.dtype
-+-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+-            query_states = query_states.to(mindspore.float16)
-+-            key_states = key_states.to(mindspore.float16)
-+-            value_states = value_states.to(mindspore.float16)
-+-
-+-        # 6. [核心] 调用 flash_attention_score 算子
-+-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-+         attn_output = mindspore.ops.flash_attention_score(
-+             query=query_states,
-+             key=key_states,
-+             value=value_states,
-+-            head_num=self.num_heads, # 传入Q的头数(N1)
-++            head_num=self.num_heads,
-+             attn_mask=fa_attention_mask,
-+-            keep_prob=1.0 - self.attention_dropout,
-++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-+             scalar_value=1.0 / math.sqrt(self.head_dim),
-+             input_layout="BNSD",
-+-            sparse_mode=0 # 使用 defaultMask 模式
-++            sparse_mode=0,
-++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-+         )
-+ 
-+-        # 恢复原始数据类型
-+-        attn_output = attn_output.to(input_dtype)
-+-
-+-        # 7. 调整输出形状
-+-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++        # 6. 调整输出形状
-+         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+         attn_output = self.o_proj(attn_output)
-+ 
-+-        # FlashAttention 算子不直接返回注意力权重矩阵
-++        # 7. 返回结果
-+         attn_weights = None
-+         if output_attentions:
-+-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-+-    # def forward(
-+-    #     self,
-+-    #     hidden_states: mindspore.Tensor,
-+-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+-    #     position_ids: Optional[mindspore.Tensor] = None,
-+-    #     past_key_value: Optional[Cache] = None,
-+-    #     output_attentions: bool = False,
-+-    #     use_cache: bool = False,
-+-    #     cache_position: Optional[mindspore.Tensor] = None,
-+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-
-+-    #     bsz, q_len, _ = hidden_states.shape
-+-
-+-    #     # 1. 线性投射 Q, K, V
-+-    #     query_states = self.q_proj(hidden_states)
-+-    #     key_states = self.k_proj(hidden_states)
-+-    #     value_states = self.v_proj(hidden_states)
-+-
-+-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-
-+-    #     # 3. RoPE 旋转位置编码
-+-    #     kv_seq_len = key_states.shape[-2]
-+-    #     if past_key_value is not None:
-+-    #         if self.layer_idx is None:
-+-    #             raise ValueError(
-+-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-    #                 "with a layer index."
-+-    #             )
-+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+ 
-+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-
-+-    #     # 4. KV 缓存更新
-+-    #     if past_key_value is not None:
-+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-    #         key_states, value_states = past_key_value.update(
-+-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+-    #         )
-+-
-+-    #     # 5. 准备 Attention Mask
-+-    #     fa_attention_mask = None
-+-    #     if attention_mask is not None:
-+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-    #         fa_attention_mask = (mask_slice != 0)
-+-
-+-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+-    #     input_dtype = query_states.dtype
-+-
-+-    #     # 6. [核心] 调用 flash_attention_score 算子
-+-    #     attn_output = mindspore.ops.flash_attention_score(
-+-    #         query=query_states,
-+-    #         key=key_states,
-+-    #         value=value_states,
-+-    #         head_num=self.num_heads,
-+-    #         attn_mask=fa_attention_mask,
-+-    #         keep_prob=1.0 - self.attention_dropout,
-+-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+-    #         input_layout="BNSD",
-+-    #         sparse_mode=0,
-+-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+-    #         inner_precise=1
-+-    #     )
-+-
-+-    #     # 恢复原始数据类型
-+-    #     attn_output = attn_output.to(input_dtype)
-++QWEN2MOE_ATTENTION_CLASSES = {
-++    "eager": Qwen2MoeAttention,
-++    "flash-attention": Qwen2MoeFlashAttention,
-++}
-+ 
-+-    #     # 7. 调整输出形状
-+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-    #     attn_output = self.o_proj(attn_output)
-+ 
-+-    #     attn_weights = None
-+-    #     if output_attentions:
-+-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     def __init__(self, config):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         # gating
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     #@dwj
-++#     # 只遍历激活的专家，而非全部专家
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#             num_tokens = hidden_states_reshaped.shape[0]
-++
-++#             router_logits = self.gate(hidden_states_reshaped)
-++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++
-++#             if self.norm_topk_prob:
-++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++#             routing_weights = routing_weights.to(hidden_states.dtype)
-++            
-++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++#             flat_selected_experts = selected_experts.flatten()
-++            
-++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++#             token_indices = broadcasted_token_indices.flatten()
-++            
-++#             active_experts = ops.unique(flat_selected_experts)
-++            
-++#             for expert_idx_tensor in active_experts:
-++#                 expert_idx = expert_idx_tensor.item()
-++#                 expert_layer = self.experts[expert_idx]
-++                
-++#                 mask = (flat_selected_experts == expert_idx_tensor)
-++#                 selected_token_indices = token_indices[mask]
-++#                 selected_routing_weights = routing_weights.flatten()[mask]
-++                
-++#                 current_states = hidden_states_reshaped[selected_token_indices]
-++                
-++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++                
-++#                 final_hidden_states = final_hidden_states.index_add(
-++#                     dim=0,
-++#                     index=selected_token_indices,
-++#                     source=expert_output.to(hidden_states.dtype)
-++#                 )
-++            
-++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+ 
-+-    #     return attn_output, attn_weights, past_key_value
-++#             final_hidden_states = final_hidden_states + shared_expert_output
-++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++            
-++#             return final_hidden_states, router_logits
-++
-++
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     """
-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         # 门控网络
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         # 专家列表
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++#         # 共享专家
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     @no_grad()
-++#     def _moe_infer_decode(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         """
-++#         【解码路径】针对 sequence_length=1 的极致优化。
-++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++#         """
-++#         batch_size, hidden_dim = hidden_states.shape
-++        
-++#         expert_outputs_list = [
-++#             ops.cat([
-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++#             ], dim=0) 
-++#             for i in range(batch_size)
-++#         ]
-++        
-++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++#         # shape: (batch_size, top_k, hidden_dim)
-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++        
-++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++        
-++#         return moe_output.squeeze(1)
-++
-++#     @no_grad()
-++#     def _moe_infer_prefill(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         """
-++#         【预填充路径】针对 sequence_length > 1 的优化。
-++#         按专家对 Token 进行分组，并进行批处理。
-++#         """
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens = hidden_states.shape[0]
-++#         flat_selected_experts = selected_experts.flatten()
-++        
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++        
-++#         active_experts = ops.unique(flat_selected_experts)
-++        
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++            
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++#             selected_token_indices = token_indices[mask]
-++#             selected_routing_weights = routing_weights.flatten()[mask]
-++            
-++#             current_states = hidden_states[selected_token_indices]
-++            
-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++            
-++#             moe_output = moe_output.index_add(
-++#                 dim=0,
-++#                 index=selected_token_indices,
-++#                 source=expert_output.to(hidden_states.dtype)
-++#             )
-++#         return moe_output
-++
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#         """
-++#         顶层 forward 方法，作为智能分发器。
-++#         """
-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++        
-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#         router_logits = self.gate(hidden_states_reshaped)
-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+ 
-+-    # def forward(
-+-    #     self,
-+-    #     hidden_states: mindspore.Tensor,
-+-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+-    #     position_ids: Optional[mindspore.Tensor] = None,
-+-    #     past_key_value: Optional[Cache] = None,
-+-    #     output_attentions: bool = False,
-+-    #     use_cache: bool = False,
-+-    #     cache_position: Optional[mindspore.Tensor] = None,
-+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-
-+-    #     bsz, q_len, _ = hidden_states.shape
-+-
-+-    #     query_states = self.q_proj(hidden_states)
-+-    #     key_states = self.k_proj(hidden_states)
-+-    #     value_states = self.v_proj(hidden_states)
-+-
-+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-
-+-    #     kv_seq_len = key_states.shape[-2]
-+-    #     if past_key_value is not None:
-+-    #         if self.layer_idx is None:
-+-    #             raise ValueError("`layer_idx` must be specified for caching")
-+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-
-+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-
-+-    #     if past_key_value is not None:
-+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-    #         key_states, value_states = past_key_value.update(
-+-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+-    #         )
-++#         if self.norm_topk_prob:
-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++        
-++#         moe_output = None
-++#         # 在推理时，根据序列长度选择最优路径
-++#         if not self.training:
-++#             if sequence_length == 1:
-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++#             else:
-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++#         else:
-++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++#             raise NotImplementedError("Training path is not implemented.")
-++
-++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++        
-++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++        
-++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++        
-++#         return final_hidden_states, router_logits
-++
-++
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     """
-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         # 门控网络
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         # 专家列表
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++#         # 共享专家
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     @no_grad()
-++#     def _moe_infer_decode(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         batch_size, _ = hidden_states.shape
-++#         expert_outputs_list = [
-++#             ops.cat([
-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++#             ], dim=0) 
-++#             for i in range(batch_size)
-++#         ]
-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++#         return moe_output.squeeze(1)
-++
-++#     @no_grad()
-++#     def _moe_infer_prefill(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens = hidden_states.shape[0]
-++#         flat_selected_experts = selected_experts.flatten()
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++#         active_experts = ops.unique(flat_selected_experts)
-++        
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++#             selected_token_indices = token_indices[mask]
-++#             selected_routing_weights = routing_weights.flatten()[mask]
-++#             current_states = hidden_states[selected_token_indices]
-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++#             moe_output = moe_output.index_add(
-++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++#             )
-++#         return moe_output
-++
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#         """
-++#         顶层 forward 方法，作为智能分发器。
-++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++#         """
-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++        
-++#         # 1. 门控计算 (通用逻辑)
-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#         router_logits = self.gate(hidden_states_reshaped)
-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++
-++#         if self.norm_topk_prob:
-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++        
-++#         # 2. 智能分发到最优 MoE 路径
-++#         moe_output = None
-++#         if not self.training:
-++#             if sequence_length == 1:
-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++#             else:
-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++#         else:
-++#             raise NotImplementedError("Training path is not implemented.")
-++
-++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++        
-++#         # 4. 合并 MoE 输出和共享专家输出
-++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++        
-++#         # 5. 恢复原始形状并返回
-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++        
-++#         return final_hidden_states, router_logits
-++
-++# prefill fastest
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     """
-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         # 门控网络
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         # 专家列表
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++#         # 共享专家
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     @no_grad()
-++#     def _moe_infer_dispatch(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         """
-++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++#         """
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens, _ = hidden_states.shape
-++        
-++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++#         flat_selected_experts = selected_experts.flatten()
-++#         flat_routing_weights = routing_weights.flatten()
-+ 
-+-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+-
-+-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+-    #     query_states = query_states / math.sqrt(self.head_dim)
-+-    #     # <--- 修改结束 ---
-+-
-+-    #     fa_attention_mask = None
-+-    #     if attention_mask is not None:
-+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-    #         fa_attention_mask = (mask_slice != 0)
-+-
-+-    #     input_dtype = query_states.dtype
-+-
-+-    #     attn_output = mindspore.ops.flash_attention_score(
-+-    #         query=query_states,    # 传入已经预先缩放过的 query
-+-    #         key=key_states,
-+-    #         value=value_states,
-+-    #         head_num=self.num_heads,
-+-    #         attn_mask=fa_attention_mask,
-+-    #         keep_prob=1.0 - self.attention_dropout,
-+-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+-    #         input_layout="BNSD",
-+-    #         sparse_mode=0,
-+-    #         inner_precise=1        # 仍然保持内部高精度计算
-+-    #     )
-++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+ 
-+-    #     attn_output = attn_output.to(input_dtype)
-+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-    #     attn_output = self.o_proj(attn_output)
-++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++#         active_experts = ops.unique(flat_selected_experts)
-++        
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++            
-++#             # 找到所有分配给该专家的 token
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++            
-++#             # 使用 mask 选取对应的 token 和权重
-++#             current_token_indices = token_indices[mask]
-++#             current_routing_weights = flat_routing_weights[mask]
-++#             current_hidden_states = hidden_states[current_token_indices]
-++            
-++#             # 对这些 token 进行批处理
-++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++            
-++#             # 使用 index_add 将结果精确地加回到对应位置
-++#             moe_output = moe_output.index_add(
-++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++#             )
-++#         return moe_output
-++
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#         """
-++#         顶层 forward 方法，作为智能分发器。
-++#         """
-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++        
-++#         # 1. 门控计算
-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#         router_logits = self.gate(hidden_states_reshaped)
-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++
-++#         if self.norm_topk_prob:
-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++        
-++#         # 2. 调用统一的 MoE 计算内核
-++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+ 
-+-    #     attn_weights = None
-+-    #     if output_attentions:
-+-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++#         # 3. 统一处理共享专家
-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++        
-++#         # 4. 合并输出
-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++        
-++#         # 5. 恢复原始形状并返回
-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++        
-++#         return final_hidden_states, router_logits
-++
-++
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     """
-++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++#     【最终高性能与高精度版】：
-++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++#     3. 这样实现了速度和准确性的两全其美。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     @no_grad()
-++#     def _moe_infer_decode(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         """
-++#         【解码路径】极致优化版：bmm + 高精度累加。
-++#         """
-++#         original_dtype = hidden_states.dtype
-++#         batch_size, _ = hidden_states.shape
-++        
-++#         expert_outputs_list = [
-++#             ops.cat([
-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++#             ], dim=0) 
-++#             for i in range(batch_size)
-++#         ]
-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++
-++#         # 在 float32 下执行 bmm，得到高精度结果
-++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++        
-++#         # 将高精度结果转换回原始数据类型
-++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++        
-++#         return moe_output
-++
-++#     @no_grad()
-++#     def _moe_infer_prefill(
-++#         self, 
-++#         hidden_states: mindspore.Tensor, 
-++#         selected_experts: mindspore.Tensor, 
-++#         routing_weights: mindspore.Tensor
-++#     ) -> mindspore.Tensor:
-++#         """
-++#         【预填充路径】与原始实现一致，结果精确。
-++#         """
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens, _ = hidden_states.shape
-++#         flat_selected_experts = selected_experts.flatten()
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++#         active_experts = ops.unique(flat_selected_experts)
-++        
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++#             selected_token_indices = token_indices[mask]
-++#             selected_routing_weights = routing_weights.flatten()[mask]
-++#             current_states = hidden_states[selected_token_indices]
-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++#             moe_output = moe_output.index_add(
-++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++#             )
-++#         return moe_output
-++
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++        
-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#         router_logits = self.gate(hidden_states_reshaped)
-++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+ 
-+-    #     return attn_output, attn_weights, past_key_value
-++#         if self.norm_topk_prob:
-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++#         # 如果模型主体是 float16，后续再转换
-++        
-++#         moe_output = None
-++#         if not self.training:
-++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++#             # _moe_infer_decode 内部会处理好类型转换
-++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++#             if sequence_length == 1:
-++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++#             else:
-++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++#         else:
-++#             raise NotImplementedError("Training path is not implemented.")
-++
-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++        
-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++        
-++#         return final_hidden_states, router_logits
-++    
-+ 
-+-QWEN2MOE_ATTENTION_CLASSES = {
-+-    "eager": Qwen2MoeAttention,
-+-    "flash-attention": Qwen2MoeFlashAttention,
-+-}
-++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++#     """
-++#     【融合版】一个混合专家模块，内置两种推理策略，
-++#     由外部全局变量 `Long_Prompt` 控制：
-++
-++#     - if Long_Prompt is True:  【精度优先模式】
-++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++#       适用于处理长序列，避免误差累积。
-++
-++#     - if Long_Prompt is False: 【速度优先模式】
-++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++#       在解码阶段获得极致速度，同时保证结果高度准确。
-++#     """
-++#     def __init__(self, config: Qwen2MoeConfig):
-++#         super().__init__()
-++#         self.num_experts = config.num_experts
-++#         self.top_k = config.num_experts_per_tok
-++#         self.norm_topk_prob = config.norm_topk_prob
-++
-++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++#         self.experts = nn.ModuleList(
-++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++#         )
-++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++
-++#     # --- 速度优先模式的辅助函数 ---
-++#     @no_grad()
-++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++#         original_dtype = hidden_states.dtype
-++#         batch_size, _ = hidden_states.shape
-++#         expert_outputs_list = [
-++#             ops.cat([
-++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++#             ], dim=0) 
-++#             for i in range(batch_size)
-++#         ]
-++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++#         weights_fp32 = routing_weights.to(mindspore.float32)
-++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++
-++#     @no_grad()
-++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens, _ = hidden_states.shape
-++#         flat_selected_experts = selected_experts.flatten()
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++#         active_experts = ops.unique(flat_selected_experts)
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++#             selected_token_indices = token_indices[mask]
-++#             selected_routing_weights = routing_weights.flatten()[mask]
-++#             current_states = hidden_states[selected_token_indices]
-++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++#         return moe_output
-++
-++#     # --- 精度优先模式的辅助函数 ---
-++#     @no_grad()
-++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++#         moe_output = ops.zeros_like(hidden_states)
-++#         num_tokens, _ = hidden_states.shape
-++#         flat_selected_experts = selected_experts.flatten()
-++#         flat_routing_weights = routing_weights.flatten()
-++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++#         active_experts = ops.unique(flat_selected_experts)
-++#         for expert_idx_tensor in active_experts:
-++#             expert_idx = expert_idx_tensor.item()
-++#             expert_layer = self.experts[expert_idx]
-++#             mask = (flat_selected_experts == expert_idx_tensor)
-++#             current_token_indices = token_indices[mask]
-++#             current_routing_weights = flat_routing_weights[mask]
-++#             current_hidden_states = hidden_states[current_token_indices]
-++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++#         return moe_output
-++
-++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++#         # 声明我们将要使用一个在模块外部定义的全局变量
-++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++#         global Long_Prompt
-++
-++#         # 1. 门控计算 (所有模式通用)
-++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++#         router_logits = self.gate(hidden_states_reshaped)
-++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++#         if self.norm_topk_prob:
-++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++#         moe_output = None
-++#         if not self.training:
-++#             # 根据 Long_Prompt 标志选择模式
-++#             if Long_Prompt:
-++#                 # --- 精度优先模式 ---
-++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++#             else:
-++#                 # --- 速度优先模式 ---
-++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++#                 if sequence_length == 1:
-++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++#                 else:
-++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++#         else:
-++#             raise NotImplementedError("Training path is not implemented.")
-++
-++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++        
-++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++        
-++#         return final_hidden_states, router_logits
-++    
-++class Qwen2MoeSparseMoeBlock(nn.Module):
-++    """
-++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++    控制的顶级推理策略：
-+ 
-++    - if Long_Prompt is True:  【精度优先模式】
-++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-++      适用于需要严格可复现性的长序列任务。
-+ 
-+-class Qwen2MoeSparseMoeBlock(nn.Module):
-+-    def __init__(self, config):
-++    - if Long_Prompt is False: 【速度优先模式】
-++      采用业界最强的性能组合：
-++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-++    """
-++    def __init__(self, config: Qwen2MoeConfig):
-+         super().__init__()
-+         self.num_experts = config.num_experts
-+         self.top_k = config.num_experts_per_tok
-+         self.norm_topk_prob = config.norm_topk_prob
-+ 
-+-        # gating
-+         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+         self.experts = nn.ModuleList(
-+             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+         )
-+-
-+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+ 
-+-    #@dwj
-+-    # 只遍历激活的专家，而非全部专家
-+-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-            num_tokens = hidden_states_reshaped.shape[0]
-+-
-+-            router_logits = self.gate(hidden_states_reshaped)
-+-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-
-+-            if self.norm_topk_prob:
-+-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-            routing_weights = routing_weights.to(hidden_states.dtype)
-+-            
-+-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+-            flat_selected_experts = selected_experts.flatten()
-+-            
-+-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+-            token_indices = broadcasted_token_indices.flatten()
-+-            
-+-            active_experts = ops.unique(flat_selected_experts)
-+-            
-+-            for expert_idx_tensor in active_experts:
-+-                expert_idx = expert_idx_tensor.item()
-+-                expert_layer = self.experts[expert_idx]
-+-                
-+-                mask = (flat_selected_experts == expert_idx_tensor)
-+-                selected_token_indices = token_indices[mask]
-+-                selected_routing_weights = routing_weights.flatten()[mask]
-+-                
-+-                current_states = hidden_states_reshaped[selected_token_indices]
-+-                
-+-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-                
-+-                final_hidden_states = final_hidden_states.index_add(
-+-                    dim=0,
-+-                    index=selected_token_indices,
-+-                    source=expert_output.to(hidden_states.dtype)
-+-                )
-+-            
-+-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-++    @no_grad()
-++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++        original_dtype = hidden_states.dtype
-++        batch_size, _ = hidden_states.shape
-++        expert_outputs_list = [
-++            ops.cat([
-++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++            ], dim=0)
-++            for i in range(batch_size)
-++        ]
-++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++        weights_fp32 = routing_weights.to(mindspore.float32)
-++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++        return moe_output_fp32.squeeze(1).to(original_dtype)
-++
-++    @no_grad()
-++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++        num_tokens, _ = hidden_states.shape
-++        flat_selected_experts = selected_experts.flatten()
-++        sorted_expert_indices = flat_selected_experts.argsort()
-++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++        original_token_indices = sorted_expert_indices // self.top_k
-++        moe_output = ops.zeros_like(hidden_states)
-++        current_token_offset = 0
-++        for i in range(self.num_experts):
-++            expert_token_count = tokens_per_expert[i] - current_token_offset
-++            if expert_token_count == 0:
-++                continue
-++            end_offset = current_token_offset + expert_token_count
-++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++            expert_hidden_states = hidden_states[expert_original_token_indices]
-++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++            current_token_offset += expert_token_count
-++        return moe_output
-++
-++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++    @no_grad()
-++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++        moe_output = ops.zeros_like(hidden_states)
-++        num_tokens, _ = hidden_states.shape
-++        flat_selected_experts = selected_experts.flatten()
-++        flat_routing_weights = routing_weights.flatten()
-++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++        active_experts = ops.unique(flat_selected_experts)
-++        for expert_idx_tensor in active_experts:
-++            expert_idx = expert_idx_tensor.item()
-++            expert_layer = self.experts[expert_idx]
-++            mask = (flat_selected_experts == expert_idx_tensor)
-++            current_token_indices = token_indices[mask]
-++            current_routing_weights = flat_routing_weights[mask]
-++            current_hidden_states = hidden_states[current_token_indices]
-++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++        return moe_output
-+ 
-+-            final_hidden_states = final_hidden_states + shared_expert_output
-+-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+-            
-+-            return final_hidden_states, router_logits
-++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++        global Long_Prompt
-++
-++        # 1. 门控计算 (所有模式通用)
-++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++        router_logits = self.gate(hidden_states_reshaped)
-++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++        if self.norm_topk_prob:
-++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++        
-++        moe_output = None
-++        if Long_Prompt:
-++            # --- 精度优先模式 (ACCURACY MODE) ---
-++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        else:
-++            # --- 速度优先模式 (SPEED MODE) ---
-++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++            if sequence_length == 1:
-++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++            else:
-++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        
-+ 
-++        # 3. 共享专家计算与合并 (所有模式通用)
-++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++        
-++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++        
-++        return final_hidden_states, router_logits
-+ 
-+ class Qwen2MoeDecoderLayer(nn.Module):
-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+         super().__init__()
-+         self.hidden_size = config.hidden_size
-++        
-++        # if Long_Prompt:
-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++        # else:
-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+ 
-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+ 
-+-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+-
-+         if (layer_idx not in config.mlp_only_layers) and (
-+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+         ):
-+@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+             self._warmed_up = True
-+             self.warmup_moe_model()
-+ 
-++
-++
-+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+         output_router_logits = (
-+             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-+@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+             router_logits=outputs.router_logits,
-+         )
-+ 
-++    def generate(self, *args, **kwargs):
-++        """
-++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++        """
-++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++
-++        input_ids = kwargs.get("input_ids")
-++        if input_ids is None and args:
-++            input_ids = args[0]
-++
-++        if input_ids is not None:
-++            prompt_length = input_ids.shape[1]
-++            
-++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++                Long_Prompt = True
-++            else:
-++                Long_Prompt = False
-++
-++        return super().generate(*args, **kwargs)
-++    
-+     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-+     def prepare_inputs_for_generation(
-+         self,
-+@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-+         # Exception 1: when passing input_embeds, input_ids may be missing entries
-+         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-++        
-+         if past_key_values is not None:
-+             if inputs_embeds is not None:  # Exception 1
-+                 if 0 not in input_ids.shape:
-+@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+             }
-+         )
-+         return model_inputs
-++
-+ # @lwx
-+     # def _decode_one_tokens_logits(
-+     #     self,
-+@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-+             attentions=outputs.attentions,
-+         )
-+ 
-++
-+ __all__ = [
-+     "Qwen2MoeForCausalLM",
-+     "Qwen2MoeModel",
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+new file mode 100644
-+index 00000000..6dfb5b93
-+--- /dev/null
-++++ b/patches/0001-20251104commit.patch
-+@@ -0,0 +1,1272 @@
-++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++Subject: [PATCH] 20251104commit
-++
-++---
-++ mindnlp/transformers/cache_utils.py           |  28 +-
-++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++
-++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++index cadd2e04..02f8d4be 100644
-++--- a/mindnlp/transformers/cache_utils.py
-+++++ b/mindnlp/transformers/cache_utils.py
-++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++             # k_out[:, :, cache_position] = key_states
-++             # v_out[:, :, cache_position] = value_states
-++-            if ON_ORANGE_PI:
-++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++-            else:
-++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++-
-+++            # if ON_ORANGE_PI:
-+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++            # else:
-+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++            if cache_position.ndim > 1:
-+++                cache_position = cache_position.flatten()
-+++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++                cache_position = cache_position.int()
-+++            
-+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++            k_out[:, :, cache_position] = key_states
-+++            v_out[:, :, cache_position] = value_states
-+++            
-++         return k_out, v_out
-++ 
-++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index c695b944..d8303e45 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++-    x1 = x[..., : x.shape[-1] // 2]
-++-    x2 = x[..., x.shape[-1] // 2 :]
-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++    # x1 = x[..., : x.shape[-1] // 2]
-+++    # x2 = x[..., x.shape[-1] // 2 :]
-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++         if self.training:
-++             raise NotImplementedError("Training is not supported yet.")
-++         else:
-++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++-        if self.config.n_shared_experts is not None:
-++-            y = y + self.shared_experts(identity)
-++-        return y
-+++            # @lwx
-+++            if orig_shape[1] == 1:
-+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++                y=y.view(*orig_shape)
-+++                if self.config.n_shared_experts is not None:
-+++                    y = y + self.shared_experts(identity)
-+++                return y
-+++            else:
-+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++                if self.config.n_shared_experts is not None:
-+++                    y = y + self.shared_experts(identity)
-+++                return y
-+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++        # if self.config.n_shared_experts is not None:
-+++        #     y = y + self.shared_experts(identity)
-+++        # return y
-+++        
-+++    @no_grad()
-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++
-+++        expert_cache = ops.zeros_like(x)
-+++        for i in range(self.num_experts_per_tok):
-+++            expert_id = flat_expert_indices[i].item()
-+++            weight = flat_expert_weights[i].item()
-+++            expert = self.experts[expert_id]
-+++            expert_out = expert(x)
-+++            expert_cache += expert_out * weight
-+++        return expert_cache
-++ 
-++     @no_grad()
-++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-        # expert_cache = torch.zeros_like(x)
-++-        # idxs = flat_expert_indices.argsort()
-++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++-        # token_idxs = idxs // self.num_experts_per_tok
-++-        # for i, end_idx in enumerate(tokens_per_expert):
-++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++-        #     if start_idx == end_idx:
-++-        #         continue
-++-        #     expert = self.experts[i]
-++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++-        #     expert_tokens = x[exp_token_idx]
-++-        #     expert_out = expert(expert_tokens)
-++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++-        # return expert_cache
-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++         expert_cache = ops.zeros_like(x)
-++         idxs = flat_expert_indices.argsort()
-++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++         token_idxs = idxs // self.num_experts_per_tok
-+++
-++         for i, end_idx in enumerate(tokens_per_expert):
-++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++             if start_idx == end_idx:
-++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++             expert_out = expert(expert_tokens)
-++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++
-++         return expert_cache
-+++        
-+++    # @no_grad()
-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     # expert_cache = torch.zeros_like(x)
-+++    #     # idxs = flat_expert_indices.argsort()
-+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++    #     #     if start_idx == end_idx:
-+++    #     #         continue
-+++    #     #     expert = self.experts[i]
-+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #     #     expert_tokens = x[exp_token_idx]
-+++    #     #     expert_out = expert(expert_tokens)
-+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++    #     # return expert_cache
-+++    #     expert_cache = ops.zeros_like(x)
-+++    #     idxs = flat_expert_indices.argsort()
-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++    #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++    #         if start_idx == end_idx:
-+++    #             continue
-+++    #         expert = self.experts[i]
-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #         expert_tokens = x[exp_token_idx]
-+++    #         expert_out = expert(expert_tokens)
-+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++
-+++    #     return expert_cache
-+++    # @no_grad()
-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     expert_cache = ops.zeros_like(x)
-+++
-+++    #     # 排序保证顺序一致
-+++    #     idxs = flat_expert_indices.argsort()
-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++    #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++    #     # 找出有 token 的专家
-+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++
-+++    #     for i in active_experts.tolist():
-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++    #         end_idx = tokens_per_expert[i]
-+++    #         if start_idx == end_idx:  # 没有 token
-+++    #             continue
-+++
-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #         expert_tokens = x[exp_token_idx]
-+++    #         expert_out = self.experts[i](expert_tokens)
-+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++
-+++    #         expert_cache = mindspore.mint.scatter_add(
-+++    #             expert_cache,
-+++    #             0,
-+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++    #             expert_out
-+++    #         )
-+++
-+++    #     return expert_cache
-+++
-+++
-++ 
-++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++ #     """
-++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++ 
-++         # Initialize weights and apply final processing
-++         self.post_init()
-+++        self.warm_up = False
-+++
-+++    def warmup_moe_model_deep(self):
-+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++        test_texts = [
-+++            "warmup short",
-+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++        ]
-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++        if tokenizer is None:
-+++            from mindnlp.transformers import AutoTokenizer
-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++            self._warmup_tokenizer = tokenizer
-+++
-+++        for text in test_texts:
-+++            inputs = tokenizer(text, return_tensors="ms")
-+++            with mindspore._no_grad():
-+++                _ = self(**inputs, use_cache=False)
-+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++ 
-++     def get_input_embeddings(self):
-++         return self.model.embed_tokens
-++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++         ```"""
-+++        if not self.warm_up:
-+++            self.warm_up = True
-+++            self.warmup_moe_model_deep()
-+++
-++         output_attentions = (
-++             output_attentions
-++             if output_attentions is not None
-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++index 3cbf820e..d4c6b651 100644
-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++@@ -18,7 +18,6 @@
-++ # See the License for the specific language governing permissions and
-++ # limitations under the License.
-++ """MindSpore Qwen2MoE model."""
-++-
-++ import math
-++ from typing import List, Optional, Tuple, Union
-++ 
-++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++     TokenClassifierOutput,
-++ )
-++ from ...modeling_utils import PreTrainedModel
-+++from ...generation import GenerationMixin
-++ from ....utils import logging
-++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++ 
-++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++         self.variance_epsilon = eps
-++ 
-++     def forward(self, hidden_states):
-+++        # @dwj
-+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++        # @lwx
-+++        # if not self.training :
-+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++         input_dtype = hidden_states.dtype
-++         hidden_states = hidden_states.to(mindspore.float32)
-++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++@@ -234,6 +239,8 @@ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++     x1 = x[..., : x.shape[-1] // 2]
-++     x2 = x[..., x.shape[-1] // 2 :]
-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++         self.config = config
-++         self.hidden_size = config.hidden_size
-++         self.intermediate_size = intermediate_size
-+++        
-++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++         self.act_fn = ACT2FN[config.hidden_act]
-++ 
-++     def forward(self, x):
-++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++-
-++ 
-+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++        # @lwx 
-+++        # gate_up_output = self.gate_up_proj(x)
-+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++        # return self.down_proj(swiglu_output)
-+++
-+++    # def forward(self, x):
-+++    #     gate_proj_out = self.gate_proj(x)
-+++    #     up_proj_out = self.up_proj(x)
-+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++    #     return self.down_proj(swiglu_out)
-+++        
-++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++     """
-++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++         use_cache: bool = False,
-++         cache_position: Optional[mindspore.Tensor] = None,
-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++        
-+++
-++         bsz, q_len, _ = hidden_states.shape
-++ 
-++         query_states = self.q_proj(hidden_states)
-++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++                     "with a layer index."
-++                 )
-++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++            if isinstance(past_key_value, StaticCache):
-+++                kv_seq_len = key_states.shape[-2]
-+++            else:
-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++ 
-++         if past_key_value is not None:
-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++            
-+++            if isinstance(past_key_value, StaticCache):
-+++                kv_seq_len = key_states.shape[-2]
-++ 
-++         # repeat k/v heads if n_kv_heads < n_heads
-++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++-
-+++        
-++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++ 
-++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++-            raise ValueError(
-++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++-                f" {attn_weights.shape}"
-++-            )
-++-
-++-        if attention_mask is not None:  # no matter the length, we just slice it
-++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++        if attention_mask is not None:
-+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++             attn_weights = attn_weights + causal_mask
-++ 
-++         # upcast attention to fp32
-++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++ 
-++         attn_output = self.o_proj(attn_output)
-++-
-+++        # @lwx
-+++        
-+++        # max_seq_len = self.max_position_embeddings  # 2048
-+++
-+++        # if attention_mask is not None:
-+++        #     # attention_mask: [B, 1, Sq, Sk]
-+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++
-+++        #     # pad 到 [max_seq_len, max_seq_len]
-+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++        #     global_attention_mask = padded_mask
-+++        # else:
-+++        #     global_attention_mask = None
-+++
-+++
-+++        # sparse_mode=3
-+++        # attn_output = mindspore.ops.flash_attention_score(
-+++        #     query=query_states,
-+++        #     key=key_states,
-+++        #     value=value_states,
-+++        #     real_shift=None,
-+++        #     padding_mask=None,
-+++
-+++        #     head_num=self.num_heads,
-+++        #     attn_mask=global_attention_mask,
-+++        #     keep_prob=1.0 - self.attention_dropout,
-+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++        #     input_layout="BNSD",
-+++        #     pre_tokens=2147483647,
-+++        #     next_tokens=2147483647,
-+++        #     inner_precise=0,
-+++        #     drop_mask=None,
-+++        #     prefix=None,
-+++        #     actual_seq_qlen=None,
-+++        #     actual_seq_kvlen=None,
-+++        #     sparse_mode=sparse_mode,
-+++        # )
-++         if not output_attentions:
-++             attn_weights = None
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-++ 
-+++class Qwen2MoeFlashAttention(nn.Module):
-+++    """
-+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++
-+++    关键改动:
-+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++       直接传入原始的 key 和 value 张量效率更高。
-+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++    """
-+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++        super().__init__()
-+++        self.config = config
-+++        self.layer_idx = layer_idx
-+++        self.hidden_size = config.hidden_size
-+++        self.num_heads = config.num_attention_heads
-+++        self.head_dim = self.hidden_size // self.num_heads
-+++        self.num_key_value_heads = config.num_key_value_heads
-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++        self.max_position_embeddings = config.max_position_embeddings
-+++        self.rope_theta = config.rope_theta
-+++        self.attention_dropout = config.attention_dropout
-+++
-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++            raise ValueError(
-+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++            )
-+++
-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++
-+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++            self.head_dim,
-+++            max_position_embeddings=self.max_position_embeddings,
-+++            base=self.rope_theta,
-+++        )
-+++
-+++    def forward(
-+++        self,
-+++        hidden_states: mindspore.Tensor,
-+++        attention_mask: Optional[mindspore.Tensor] = None,
-+++        position_ids: Optional[mindspore.Tensor] = None,
-+++        past_key_value: Optional[Cache] = None,
-+++        output_attentions: bool = False,
-+++        use_cache: bool = False,
-+++        cache_position: Optional[mindspore.Tensor] = None,
-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++        bsz, q_len, _ = hidden_states.shape
-+++
-+++        # 1. 线性投射 Q, K, V
-+++        query_states = self.q_proj(hidden_states)
-+++        key_states = self.k_proj(hidden_states)
-+++        value_states = self.v_proj(hidden_states)
-+++
-+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++        # 3. RoPE 旋转位置编码
-+++        kv_seq_len = key_states.shape[-2]
-+++        if past_key_value is not None:
-+++            if self.layer_idx is None:
-+++                raise ValueError(
-+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++                    "with a layer index."
-+++                )
-+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++                if cache_position.shape[0] == 1:
-+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++                    kv_seq_len = past_seen_tokens + 1
-+++                else:
-+++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++            else:
-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++        # 4. KV 缓存更新
-+++        if past_key_value is not None:
-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++            key_states, value_states = past_key_value.update(
-+++                key_states, value_states, self.layer_idx, cache_kwargs
-+++            )
-+++            
-+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++                if cache_position.shape[0] == 1:
-+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++                    kv_seq_len = key_states.shape[-2]
-+++
-+++        # 5. [重要] 准备 Attention Mask
-+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++        fa_attention_mask = None
-+++        if attention_mask is not None:
-+++            # 截取与当前key长度匹配的部分
-+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++            fa_attention_mask = (mask_slice != 0)
-+++
-+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++        input_dtype = query_states.dtype
-+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++            query_states = query_states.to(mindspore.float16)
-+++            key_states = key_states.to(mindspore.float16)
-+++            value_states = value_states.to(mindspore.float16)
-+++
-+++        # 6. [核心] 调用 flash_attention_score 算子
-+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++        attn_output = mindspore.ops.flash_attention_score(
-+++            query=query_states,
-+++            key=key_states,
-+++            value=value_states,
-+++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++            attn_mask=fa_attention_mask,
-+++            keep_prob=1.0 - self.attention_dropout,
-+++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++            input_layout="BNSD",
-+++            sparse_mode=0 # 使用 defaultMask 模式
-+++        )
-+++
-+++        # 恢复原始数据类型
-+++        attn_output = attn_output.to(input_dtype)
-+++
-+++        # 7. 调整输出形状
-+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++        attn_output = self.o_proj(attn_output)
-+++
-+++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++        attn_weights = None
-+++        if output_attentions:
-+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++        return attn_output, attn_weights, past_key_value
-+++
-+++    # def forward(
-+++    #     self,
-+++    #     hidden_states: mindspore.Tensor,
-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++    #     past_key_value: Optional[Cache] = None,
-+++    #     output_attentions: bool = False,
-+++    #     use_cache: bool = False,
-+++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++    #     bsz, q_len, _ = hidden_states.shape
-+++
-+++    #     # 1. 线性投射 Q, K, V
-+++    #     query_states = self.q_proj(hidden_states)
-+++    #     key_states = self.k_proj(hidden_states)
-+++    #     value_states = self.v_proj(hidden_states)
-+++
-+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++    #     # 3. RoPE 旋转位置编码
-+++    #     kv_seq_len = key_states.shape[-2]
-+++    #     if past_key_value is not None:
-+++    #         if self.layer_idx is None:
-+++    #             raise ValueError(
-+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++    #                 "with a layer index."
-+++    #             )
-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++    #     # 4. KV 缓存更新
-+++    #     if past_key_value is not None:
-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++    #         key_states, value_states = past_key_value.update(
-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++    #         )
-+++
-+++    #     # 5. 准备 Attention Mask
-+++    #     fa_attention_mask = None
-+++    #     if attention_mask is not None:
-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++    #         fa_attention_mask = (mask_slice != 0)
-+++
-+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++    #     input_dtype = query_states.dtype
-+++
-+++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++    #     attn_output = mindspore.ops.flash_attention_score(
-+++    #         query=query_states,
-+++    #         key=key_states,
-+++    #         value=value_states,
-+++    #         head_num=self.num_heads,
-+++    #         attn_mask=fa_attention_mask,
-+++    #         keep_prob=1.0 - self.attention_dropout,
-+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++    #         input_layout="BNSD",
-+++    #         sparse_mode=0,
-+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++    #         inner_precise=1
-+++    #     )
-+++
-+++    #     # 恢复原始数据类型
-+++    #     attn_output = attn_output.to(input_dtype)
-+++
-+++    #     # 7. 调整输出形状
-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++    #     attn_output = self.o_proj(attn_output)
-+++
-+++    #     attn_weights = None
-+++    #     if output_attentions:
-+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++    #     return attn_output, attn_weights, past_key_value
-+++
-+++    # def forward(
-+++    #     self,
-+++    #     hidden_states: mindspore.Tensor,
-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++    #     past_key_value: Optional[Cache] = None,
-+++    #     output_attentions: bool = False,
-+++    #     use_cache: bool = False,
-+++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++    #     bsz, q_len, _ = hidden_states.shape
-+++
-+++    #     query_states = self.q_proj(hidden_states)
-+++    #     key_states = self.k_proj(hidden_states)
-+++    #     value_states = self.v_proj(hidden_states)
-+++
-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++    #     kv_seq_len = key_states.shape[-2]
-+++    #     if past_key_value is not None:
-+++    #         if self.layer_idx is None:
-+++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++    #     if past_key_value is not None:
-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++    #         key_states, value_states = past_key_value.update(
-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++    #         )
-+++
-+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++
-+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++    #     # <--- 修改结束 ---
-+++
-+++    #     fa_attention_mask = None
-+++    #     if attention_mask is not None:
-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++    #         fa_attention_mask = (mask_slice != 0)
-+++
-+++    #     input_dtype = query_states.dtype
-+++
-+++    #     attn_output = mindspore.ops.flash_attention_score(
-+++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++    #         key=key_states,
-+++    #         value=value_states,
-+++    #         head_num=self.num_heads,
-+++    #         attn_mask=fa_attention_mask,
-+++    #         keep_prob=1.0 - self.attention_dropout,
-+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++    #         input_layout="BNSD",
-+++    #         sparse_mode=0,
-+++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++    #     )
-+++
-+++    #     attn_output = attn_output.to(input_dtype)
-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++    #     attn_output = self.o_proj(attn_output)
-+++
-+++    #     attn_weights = None
-+++    #     if output_attentions:
-+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++
-+++    #     return attn_output, attn_weights, past_key_value
-+++
-++ QWEN2MOE_ATTENTION_CLASSES = {
-++     "eager": Qwen2MoeAttention,
-+++    "flash-attention": Qwen2MoeFlashAttention,
-++ }
-++ 
-++ 
-++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++ 
-+++    #@dwj
-+++    # 只遍历激活的专家，而非全部专家
-++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++-        # router_logits: (batch * sequence_length, n_experts)
-++-        router_logits = self.gate(hidden_states)
-++-
-++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-        if self.norm_topk_prob:
-++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        # we cast back to the input dtype
-++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++-
-++-        final_hidden_states = ops.zeros(
-++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++-        )
-++-
-++-        # One hot encode the selected experts to create an expert mask
-++-        # this will be used to easily index which expert is going to be sollicitated
-++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++-
-++-        # Loop over all available experts in the model and perform the computation on each expert
-++-        for expert_idx in range(self.num_experts):
-++-            expert_layer = self.experts[expert_idx]
-++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++-
-++-            # Index the correct hidden states and compute the expert hidden state for
-++-            # the current expert. We need to make sure to multiply the output hidden
-++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++-            if 0 not in idx.shape:
-++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++-
-++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++-                # the `top_x` tensor here.
-++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++-
-++-        shared_expert_output = self.shared_expert(hidden_states)
-++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++-
-++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++            num_tokens = hidden_states_reshaped.shape[0]
-+++
-+++            router_logits = self.gate(hidden_states_reshaped)
-+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++
-+++            if self.norm_topk_prob:
-+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++            
-+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++            flat_selected_experts = selected_experts.flatten()
-+++            
-+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++            token_indices = broadcasted_token_indices.flatten()
-+++            
-+++            active_experts = ops.unique(flat_selected_experts)
-+++            
-+++            for expert_idx_tensor in active_experts:
-+++                expert_idx = expert_idx_tensor.item()
-+++                expert_layer = self.experts[expert_idx]
-+++                
-+++                mask = (flat_selected_experts == expert_idx_tensor)
-+++                selected_token_indices = token_indices[mask]
-+++                selected_routing_weights = routing_weights.flatten()[mask]
-+++                
-+++                current_states = hidden_states_reshaped[selected_token_indices]
-+++                
-+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++                
-+++                final_hidden_states = final_hidden_states.index_add(
-+++                    dim=0,
-+++                    index=selected_token_indices,
-+++                    source=expert_output.to(hidden_states.dtype)
-+++                )
-+++            
-+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++ 
-++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++-        return final_hidden_states, router_logits
-+++            final_hidden_states = final_hidden_states + shared_expert_output
-+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++            
-+++            return final_hidden_states, router_logits
-++ 
-++ 
-++ class Qwen2MoeDecoderLayer(nn.Module):
-++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++ 
-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++ 
-+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++
-++         if (layer_idx not in config.mlp_only_layers) and (
-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++         ):
-++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++     _skip_keys_device_placement = "past_key_values"
-++     _supports_cache_class = True
-+++#lwx
-+++    # _supports_static_cache = True
-++ 
-++     def _init_weights(self, module):
-++         std = self.config.initializer_range
-++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++         return causal_mask
-++ 
-++ 
-++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++     _tied_weights_keys = ["lm_head.weight"]
-++ 
-++     def __init__(self, config):
-++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++         self.num_experts_per_tok = config.num_experts_per_tok
-++         # Initialize weights and apply final processing
-++         self.post_init()
-+++        # @lwx
-+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++        #     self.generation_config.cache_implementation = "static"
-+++        self._warmed_up = False
-+++
-+++    def warmup_moe_model(self):
-+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++        test_texts = [
-+++            "warmup short",
-+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++        ]
-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++        if tokenizer is None:
-+++            from mindnlp.transformers import AutoTokenizer
-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++            self._warmup_tokenizer = tokenizer
-+++
-+++        for text in test_texts:
-+++            inputs = tokenizer(text, return_tensors="ms")
-+++            with mindspore._no_grad():
-+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++ 
-++     def get_input_embeddings(self):
-++         return self.model.embed_tokens
-++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++         ```"""
-+++        if not self._warmed_up:
-+++            self._warmed_up = True
-+++            self.warmup_moe_model()
-++ 
-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++         output_router_logits = (
-++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++             }
-++         )
-++         return model_inputs
-+++# @lwx
-+++    # def _decode_one_tokens_logits(
-+++    #     self,
-+++    #     cur_token: mindspore.Tensor,
-+++    #     input_pos: Optional[mindspore.Tensor],
-+++    #     cache_position: mindspore.Tensor,
-+++    #     past_key_values: StaticCache,
-+++    # ) -> mindspore.Tensor:
-+++    #     """
-+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++        
-+++    #     Args:
-+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++    #         input_pos: 输入位置信息，可选
-+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++            
-+++    #     Returns:
-+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++    #     """
-+++    #     # 调用JIT编译的版本
-+++    #     return self.get_decode_one_tokens_logits(
-+++    #         cur_token=cur_token,
-+++    #         input_pos=input_pos,
-+++    #         cache_position=cache_position,
-+++    #         past_key_values=past_key_values,
-+++    #     )
-+++    
-+++    # @mindspore.jit(jit_level='O1')
-+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++    #     """
-+++    #     JIT编译的函数，用于高效的单token解码
-+++    #     使用JIT编译优化以支持静态shape和高效执行
-+++        
-+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++    #     """
-+++    #     outputs = self.model.forward(
-+++    #         input_ids=cur_token,
-+++    #         position_ids=input_pos,
-+++    #         cache_position=cache_position,
-+++    #         past_key_values=past_key_values,
-+++    #         use_cache=True,
-+++    #         return_dict=False,
-+++    #     )
-+++        
-+++    #     hidden_states = outputs[0]
-+++    #     logits = self.lm_head.forward(hidden_states)
-+++    #     logits = logits.float()
-+++        
-+++    #     return logits[:, -1, :]
-+++
-+++    # def _sample(
-+++    #     self,
-+++    #     input_ids: mindspore.Tensor,
-+++    #     logits_processor,
-+++    #     stopping_criteria,
-+++    #     generation_config,
-+++    #     synced_devices: bool,
-+++    #     streamer=None,
-+++    #     logits_warper=None,
-+++    #     **model_kwargs,
-+++    # ):
-+++    #     """
-+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++    #     """
-+++    #     from ...generation.logits_process import LogitsProcessorList
-+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++    #     from mindnlp.core import nn, ops, no_grad
-+++    #     import numpy as np
-+++        
-+++    #     # 检查是否使用 StaticCache
-+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++    #     # 否则，直接调用父类方法
-+++    #     past_key_values = model_kwargs.get("past_key_values")
-+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++
-+++    #     if not isinstance(past_key_values, StaticCache):
-+++    #         # 不使用 StaticCache，直接调用父类方法
-+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++    #         return super()._sample(
-+++    #             input_ids=input_ids,
-+++    #             logits_processor=logits_processor,
-+++    #             stopping_criteria=stopping_criteria,
-+++    #             generation_config=generation_config,
-+++    #             synced_devices=synced_devices,
-+++    #             streamer=streamer,
-+++    #             logits_warper=logits_warper,
-+++    #             **model_kwargs,
-+++    #         )
-+++        
-+++    #     # 使用 StaticCache，进入自定义循环
-+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++    #     pad_token_id = generation_config._pad_token_tensor
-+++    #     output_attentions = generation_config.output_attentions
-+++    #     output_hidden_states = generation_config.output_hidden_states
-+++    #     output_scores = generation_config.output_scores
-+++    #     output_logits = generation_config.output_logits
-+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++    #     max_length = generation_config.max_length
-+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++    #     do_sample = generation_config.do_sample
-+++        
-+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++    #         raise ValueError(
-+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++    #             f"{logits_warper})."
-+++    #         )
-+++        
-+++    #     # init attention / hidden states / scores tuples
-+++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++        
-+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++    #         encoder_hidden_states = (
-+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++    #         )
-+++        
-+++    #     # keep track of which sequences are already finished
-+++    #     batch_size, cur_len = input_ids.shape
-+++    #     this_peer_finished = False
-+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++        
-+++    #     time_record = []
-+++    #     from ....utils.testing_utils import parse_flag_from_env
-+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++        
-+++    #     while self._has_unfinished_sequences(
-+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++    #     ):
-+++    #         if _record_time:
-+++    #             import time as time_module
-+++    #             infer_start = time_module.time()
-+++            
-+++    #         # prepare model inputs
-+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++            
-+++    #         # prepare variable output controls
-+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++            
-+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++    #         cur_cache_position = model_inputs.get("cache_position")
-+++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++    #         cur_input_ids = model_inputs.get("input_ids")
-+++            
-+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++    #             cur_cache_position is not None and 
-+++    #             len(cur_cache_position.shape) > 0 and
-+++    #             cur_cache_position.shape[0] == 1 and
-+++    #             cur_input_ids is not None and
-+++    #             cur_input_ids.shape[1] == 1):
-+++    #             # 使用 JIT 优化的单 token 解码
-+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++    #             if not hasattr(self, '_jit_used'):
-+++    #                 self._jit_used = False
-+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++                
-+++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++    #                 cur_token=cur_input_ids,
-+++    #                 input_pos=model_inputs.get("position_ids"),
-+++    #                 cache_position=cur_cache_position,
-+++    #                 past_key_values=cur_past_key_values,
-+++    #             )
-+++                
-+++    #             # 标记已使用JIT（用于后续判断）
-+++    #             if not self._jit_used:
-+++    #                 self._jit_used = True
-+++                
-+++    #             # 构造兼容的输出对象
-+++    #             class JitOptimizedOutput:
-+++    #                 def __init__(self, logits, config):
-+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++    #                     self.config = config
-+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++    #                     self.cross_attentions = None
-+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++                
-+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++    #         else:
-+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++    #             outputs = self(**model_inputs, return_dict=True)
-+++            
-+++    #         if synced_devices and this_peer_finished:
-+++    #             continue
-+++            
-+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++    #         next_token_logits = outputs.logits[:, -1, :]
-+++            
-+++    #         # pre-process distribution
-+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++    #         if do_sample:
-+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++            
-+++    #         # Store scores, attentions and hidden_states when required
-+++    #         if return_dict_in_generate:
-+++    #             if output_scores:
-+++    #                 scores += (next_token_scores,)
-+++    #             if output_logits:
-+++    #                 raw_logits += (next_token_logits,)
-+++    #             if output_attentions:
-+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++    #                 if self.config.is_encoder_decoder:
-+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++                
-+++    #             if output_hidden_states:
-+++    #                 hidden = (
-+++    #                     outputs.decoder_hidden_states
-+++    #                     if self.config.is_encoder_decoder
-+++    #                     else outputs.hidden_states
-+++    #                 )
-+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++            
-+++    #         # token selection
-+++    #         if do_sample:
-+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++    #         else:
-+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++            
-+++    #         # finished sentences should have their next token be a padding token
-+++    #         if has_eos_stopping_criteria:
-+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++            
-+++    #         # update generated ids, model inputs, and length for next step
-+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++    #         if streamer is not None:
-+++    #             streamer.put(next_tokens)
-+++            
-+++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++    #             outputs,
-+++    #             model_kwargs,
-+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++    #         )
-+++            
-+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++    #         cur_len += 1
-+++            
-+++    #         if _record_time:
-+++    #             import time as time_module
-+++    #             infer_stop = time_module.time()
-+++    #             time_record.append(infer_stop - infer_start)
-+++            
-+++    #         del outputs
-+++        
-+++    #     average_infer_time = None
-+++    #     if time_record:
-+++    #         if len(time_record) > 1:
-+++    #             time_record.pop(0)
-+++    #         average_infer_time = sum(time_record) / len(time_record)
-+++    #         print(f'average inference time is: {average_infer_time}')
-+++    #         print(f'inference time record: {time_record}')
-+++        
-+++    #     if streamer is not None:
-+++    #         streamer.end()
-+++        
-+++    #     # 简单判断：打印是否使用了JIT路径
-+++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++    #     else:
-+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++        
-+++    #     if return_dict_in_generate:
-+++    #         if self.config.is_encoder_decoder:
-+++    #             return GenerateEncoderDecoderOutput(
-+++    #                 sequences=input_ids,
-+++    #                 scores=scores,
-+++    #                 logits=raw_logits,
-+++    #                 encoder_attentions=encoder_attentions,
-+++    #                 encoder_hidden_states=encoder_hidden_states,
-+++    #                 decoder_attentions=decoder_attentions,
-+++    #                 cross_attentions=cross_attentions,
-+++    #                 decoder_hidden_states=decoder_hidden_states,
-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++    #                 average_infer_time=average_infer_time
-+++    #             )
-+++    #         else:
-+++    #             return GenerateDecoderOnlyOutput(
-+++    #                 sequences=input_ids,
-+++    #                 scores=scores,
-+++    #                 logits=raw_logits,
-+++    #                 attentions=decoder_attentions,
-+++    #                 hidden_states=decoder_hidden_states,
-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++    #                 average_infer_time=average_infer_time
-+++    #             )
-+++    #     else:
-+++    #         return input_ids
-+++            
-+++    # def _prepare_cache_for_generation(
-+++    #     self,
-+++    #     generation_config,
-+++    #     model_kwargs,
-+++    #     assistant_model,
-+++    #     batch_size,
-+++    #     max_cache_length,
-+++    # ):
-+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++    #         generation_config.cache_implementation = "static"
-+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++        
-+++    #     if generation_config.cache_implementation == "static":
-+++    #         base_required_from_max_length = generation_config.max_length + 1
-+++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++    #         min_cache_size = 50
-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++    #         else:
-+++    #             max_cache_length = max(base_required, min_cache_size)
-+++            
-+++    #         original_max_cache_length = max_cache_length
-+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++            
-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++    #             if max_cache_length > self.config.max_position_embeddings:
-+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++        
-+++    #     result = super()._prepare_cache_for_generation(
-+++    #         generation_config=generation_config,
-+++    #         model_kwargs=model_kwargs,
-+++    #         assistant_model=assistant_model,
-+++    #         batch_size=batch_size,
-+++    #         max_cache_length=max_cache_length,
-+++    #     )
-+++        
-+++    #     if generation_config.cache_implementation == "static":
-+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++    #         created_cache = model_kwargs.get(cache_name)
-+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++        
-+++    #     return result
-+++
-+++
-+++
-++ 
-++ 
-++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-new file mode 100644
-index 00000000..966529e4
---- /dev/null
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -0,0 +1,2769 @@
-+From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Thu, 6 Nov 2025 14:54:37 +0800
-+Subject: [PATCH 3/3] 20261106secondcommit
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-+ patches/0001-20251104commit.patch             | 1272 -----------------
-+ 3 files changed, 528 insertions(+), 2032 deletions(-)
-+ delete mode 100644 patches/0001-20251104commit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index 73773c22..2f9192bf 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-+ 
-+ _CONFIG_FOR_DOC = "DeepseekConfig"
-+ 
-++_attn_mask_cache = {}
-++
-++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-++    q_len = batch_and_seq[1]
-++    kv_len = batch_and_seq[1] + past_key_values_length 
-++    key = (batch_and_seq[0], q_len, kv_len)
-++
-++    if key in _attn_mask_cache:
-++        return _attn_mask_cache[key]
-++
-++    mask = _prepare_4d_causal_attention_mask(
-++        attention_mask,
-++        batch_and_seq,
-++        inputs_embeds,
-++        past_key_values_length,
-++    )
-++    _attn_mask_cache[key] = mask
-++    return mask
-+ 
-+ def _get_unpad_data(attention_mask):
-+     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-+@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-+         return final_output
-+ 
-+ 
-+-    @no_grad()
-+-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+-        expert_cache = ops.zeros_like(x)
-+-        idxs = flat_expert_indices.argsort()
-+-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-        token_idxs = idxs // self.num_experts_per_tok
-+-
-+-        for i, end_idx in enumerate(tokens_per_expert):
-+-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-            if start_idx == end_idx:
-+-                continue
-+-            expert = self.experts[i]
-+-            exp_token_idx = token_idxs[start_idx:end_idx]
-+-            expert_tokens = x[exp_token_idx]
-+-            expert_out = expert(expert_tokens)
-+-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+-
-+-        return expert_cache
-+-        
-+     # @no_grad()
-+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-    #     # expert_cache = torch.zeros_like(x)
-+-    #     # idxs = flat_expert_indices.argsort()
-+-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+-    #     # token_idxs = idxs // self.num_experts_per_tok
-+-    #     # for i, end_idx in enumerate(tokens_per_expert):
-+-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+-    #     #     if start_idx == end_idx:
-+-    #     #         continue
-+-    #     #     expert = self.experts[i]
-+-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+-    #     #     expert_tokens = x[exp_token_idx]
-+-    #     #     expert_out = expert(expert_tokens)
-+-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+-    #     # return expert_cache
-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+     #     expert_cache = ops.zeros_like(x)
-+     #     idxs = flat_expert_indices.argsort()
-+     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-+     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+ 
-+     #     return expert_cache
-+-    # @no_grad()
-+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-    #     expert_cache = ops.zeros_like(x)
-++        
-++    @no_grad()
-++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++        """
-++        优化版 MoE prefill：
-++        - 批量张量化处理同一个 expert 的所有 token
-++        - 跳过无 token 的专家
-++        - 保持结果完全一致
-++        """
-++        # 初始化输出缓存
-++        expert_cache = ops.zeros_like(x)
-+ 
-+-    #     # 排序保证顺序一致
-+-    #     idxs = flat_expert_indices.argsort()
-+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-    #     token_idxs = idxs // self.num_experts_per_tok
-++        # 排序（确保 scatter_add 位置对应原逻辑）
-++        idxs = flat_expert_indices.argsort()
-++        sorted_expert_indices = flat_expert_indices[idxs]
-++        sorted_token_indices = idxs // self.num_experts_per_tok
-+ 
-+-    #     # 找出有 token 的专家
-+-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++        # 每个 expert 的 token 数
-++        tokens_per_expert = sorted_expert_indices.bincount()
-+ 
-+-    #     for i in active_experts.tolist():
-+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-    #         end_idx = tokens_per_expert[i]
-+-    #         if start_idx == end_idx:  # 没有 token
-+-    #             continue
-++        # 找出有 token 的专家
-++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+ 
-+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+-    #         expert_tokens = x[exp_token_idx]
-+-    #         expert_out = self.experts[i](expert_tokens)
-+-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++        for expert_id in active_experts.tolist():
-++            # 取该 expert 对应的排序后 token 区间
-++            start = (tokens_per_expert[:expert_id]).sum().item()
-++            end = start + tokens_per_expert[expert_id].item()
-+ 
-+-    #         expert_cache = mindspore.mint.scatter_add(
-+-    #             expert_cache,
-+-    #             0,
-+-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+-    #             expert_out
-+-    #         )
-++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-++            expert_tokens = x[token_idx]                     # 取输入向量
-+ 
-+-    #     return expert_cache
-++            # 执行专家 MLP
-++            expert_out = self.experts[expert_id](expert_tokens)
-++
-++            # 按权重缩放
-++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-++
-++            # 回写到缓存（等价 scatter_add）
-++            expert_cache = mindspore.mint.scatter_add(
-++                expert_cache,
-++                0,
-++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++                scaled_out
-++            )
-++
-++        return expert_cache
-++
-++        # @no_grad()
-++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++        #     # expert_cache = torch.zeros_like(x)
-++        #     # idxs = flat_expert_indices.argsort()
-++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++        #     # token_idxs = idxs // self.num_experts_per_tok
-++        #     # for i, end_idx in enumerate(tokens_per_expert):
-++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++        #     #     if start_idx == end_idx:
-++        #     #         continue
-++        #     #     expert = self.experts[i]
-++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++        #     #     expert_tokens = x[exp_token_idx]
-++        #     #     expert_out = expert(expert_tokens)
-++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++        #     # return expert_cache
-++        #     expert_cache = ops.zeros_like(x)
-++        #     idxs = flat_expert_indices.argsort()
-++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++        #     token_idxs = idxs // self.num_experts_per_tok
-++
-++        #     for i, end_idx in enumerate(tokens_per_expert):
-++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++        #         if start_idx == end_idx:
-++        #             continue
-++        #         expert = self.experts[i]
-++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++        #         expert_tokens = x[exp_token_idx]
-++        #         expert_out = expert(expert_tokens)
-++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++
-++        #     return expert_cache
-++        # @no_grad()
-++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++        #     expert_cache = ops.zeros_like(x)
-++
-++        #     # 排序保证顺序一致
-++        #     idxs = flat_expert_indices.argsort()
-++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++        #     token_idxs = idxs // self.num_experts_per_tok
-++
-++        #     # 找出有 token 的专家
-++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++
-++        #     for i in active_experts.tolist():
-++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++        #         end_idx = tokens_per_expert[i]
-++        #         if start_idx == end_idx:  # 没有 token
-++        #             continue
-++
-++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++        #         expert_tokens = x[exp_token_idx]
-++        #         expert_out = self.experts[i](expert_tokens)
-++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++
-++        #         expert_cache = mindspore.mint.scatter_add(
-++        #             expert_cache,
-++        #             0,
-++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++        #             expert_out
-++        #         )
-++
-++        #     return expert_cache
-+ 
-+ 
-+ 
-+@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-+-
-+ # class DeepseekFlashAttention(nn.Module):
-+ #     """
-+ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-++
-+ Deepseek_ATTENTION_CLASSES = {
-+     "eager": DeepseekAttention,
-+     "flash-attention": DeepseekFlashAttention,
-+@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-+             )
-+         else:
-+             # 4d mask is passed through the layers
-+-            attention_mask = _prepare_4d_causal_attention_mask(
-++            # attention_mask = _prepare_4d_causal_attention_mask(
-++            #     attention_mask,
-++            #     (batch_size, seq_length),
-++            #     inputs_embeds,
-++            #     past_key_values_length,
-++            # )
-++            #@dwj
-++            attention_mask = get_cached_causal_mask(
-+                 attention_mask,
-+                 (batch_size, seq_length),
-+                 inputs_embeds,
-+@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+         # Initialize weights and apply final processing
-+         self.post_init()
-+         self.warm_up = False
-++        #@dwj
-++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-++            self.num_layers,
-++            self.num_attention_heads,
-++            self.head_dim,
-++            batch_size=1,
-++            max_length=self.max_length,
-++            dtype=mindspore.float16
-++        )
-++
-++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-++        key_cache = []
-++        value_cache = []
-++        for _ in range(num_layers):
-++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++            key_cache.append(k)
-++            value_cache.append(v)
-++        return key_cache, value_cache
-++
-+ 
-+     def warmup_moe_model_deep(self):
-+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index bced285c..ebd7782e 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+ 
-+-Long_Prompt = False
-+-PROMPT_LENGTH_THRESHOLD = 128
-++Long_Prompt = 1
-++LONG_PROMPT_LENGTH_THRESHOLD = 128
-++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-++
-++_causal_mask_cache = {}
-++
-++def get_cached_causal_mask_with_cache_position(
-++    attention_mask: mindspore.Tensor,
-++    sequence_length: int,
-++    target_length: int,
-++    dtype: mindspore.dtype,
-++    min_dtype: float,
-++    cache_position: mindspore.Tensor,
-++    batch_size: int,
-++):
-++    """
-++    带缓存的 causal mask 构造函数
-++    """
-++    # q_len 是当前 query 长度
-++    q_len = sequence_length
-++    # kv_len 是 target_length
-++    kv_len = target_length
-++
-++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-++
-++    if key in _causal_mask_cache:
-++        return _causal_mask_cache[key]
-++
-++    # 调用原来的 mask 构造逻辑
-++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++        attention_mask,
-++        sequence_length=sequence_length,
-++        target_length=target_length,
-++        dtype=dtype,
-++        min_dtype=min_dtype,
-++        cache_position=cache_position,
-++        batch_size=batch_size,
-++    )
-++    # 缓存结果
-++    _causal_mask_cache[key] = causal_mask
-++    return causal_mask
-+ 
-+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+ def _prepare_4d_causal_attention_mask_with_cache_position(
-+@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+ 
-+ 
-+ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-++# class Qwen2MoeAttention(nn.Module):
-++#     """
-++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++#     and "Generating Long Sequences with Sparse Transformers".
-++#     """
-++
-++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++#         super().__init__()
-++#         self.config = config
-++#         self.layer_idx = layer_idx
-++#         if layer_idx is None:
-++#             logger.warning_once(
-++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++#                 "when creating this class."
-++#             )
-++
-++#         self.hidden_size = config.hidden_size
-++#         self.num_heads = config.num_attention_heads
-++#         self.head_dim = self.hidden_size // self.num_heads
-++#         self.num_key_value_heads = config.num_key_value_heads
-++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++#         self.max_position_embeddings = config.max_position_embeddings
-++#         self.rope_theta = config.rope_theta
-++#         self.is_causal = True
-++#         self.attention_dropout = config.attention_dropout
-++
-++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++#             raise ValueError(
-++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++#                 f" and `num_heads`: {self.num_heads})."
-++#             )
-++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++
-++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++#             self.head_dim,
-++#             max_position_embeddings=self.max_position_embeddings,
-++#             base=self.rope_theta,
-++#         )
-++
-++#     def forward(
-++#         self,
-++#         hidden_states: mindspore.Tensor,
-++#         attention_mask: Optional[mindspore.Tensor] = None,
-++#         position_ids: Optional[mindspore.Tensor] = None,
-++#         past_key_value: Optional[Cache] = None,
-++#         output_attentions: bool = False,
-++#         use_cache: bool = False,
-++#         cache_position: Optional[mindspore.Tensor] = None,
-++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++
-++        
-++
-++#         bsz, q_len, _ = hidden_states.shape
-++
-++#         query_states = self.q_proj(hidden_states)
-++#         key_states = self.k_proj(hidden_states)
-++#         value_states = self.v_proj(hidden_states)
-++
-++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++
-++#         kv_seq_len = key_states.shape[-2]
-++#         if past_key_value is not None:
-++#             if self.layer_idx is None:
-++#                 raise ValueError(
-++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++#                     "with a layer index."
-++#                 )
-++#             if isinstance(past_key_value, StaticCache):
-++#                 kv_seq_len = key_states.shape[-2]
-++#             else:
-++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++
-++#         if past_key_value is not None:
-++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++            
-++#             if isinstance(past_key_value, StaticCache):
-++#                 kv_seq_len = key_states.shape[-2]
-++
-++#         # repeat k/v heads if n_kv_heads < n_heads
-++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++        
-++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++
-++#         if attention_mask is not None:
-++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++#             attn_weights = attn_weights + causal_mask
-++
-++#         # upcast attention to fp32
-++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++#         attn_output = ops.matmul(attn_weights, value_states)
-++
-++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++#             raise ValueError(
-++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++#                 f" {attn_output.shape}"
-++#             )
-++
-++#         attn_output = ops.transpose(attn_output, 1, 2)
-++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++
-++#         attn_output = self.o_proj(attn_output)
-++#         # @lwx
-++        
-++#         # max_seq_len = self.max_position_embeddings  # 2048
-++
-++#         # if attention_mask is not None:
-++#         #     # attention_mask: [B, 1, Sq, Sk]
-++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++
-++#         #     # pad 到 [max_seq_len, max_seq_len]
-++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++#         #     global_attention_mask = padded_mask
-++#         # else:
-++#         #     global_attention_mask = None
-++
-++
-++#         # sparse_mode=3
-++#         # attn_output = mindspore.ops.flash_attention_score(
-++#         #     query=query_states,
-++#         #     key=key_states,
-++#         #     value=value_states,
-++#         #     real_shift=None,
-++#         #     padding_mask=None,
-++
-++#         #     head_num=self.num_heads,
-++#         #     attn_mask=global_attention_mask,
-++#         #     keep_prob=1.0 - self.attention_dropout,
-++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++#         #     input_layout="BNSD",
-++#         #     pre_tokens=2147483647,
-++#         #     next_tokens=2147483647,
-++#         #     inner_precise=0,
-++#         #     drop_mask=None,
-++#         #     prefix=None,
-++#         #     actual_seq_qlen=None,
-++#         #     actual_seq_kvlen=None,
-++#         #     sparse_mode=sparse_mode,
-++#         # )
-++#         if not output_attentions:
-++#             attn_weights = None
-++
-++#         return attn_output, attn_weights, past_key_value
-++
-+ class Qwen2MoeAttention(nn.Module):
-+     """
-+-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+-    and "Generating Long Sequences with Sparse Transformers".
-+-    """
-++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-+ 
-++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-++
-++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-++    """
-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+         super().__init__()
-+         self.config = config
-+@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-+         if layer_idx is None:
-+             logger.warning_once(
-+                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+                 "when creating this class."
-+             )
-+ 
-+@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-+         use_cache: bool = False,
-+         cache_position: Optional[mindspore.Tensor] = None,
-+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-
-+         
-+-
-++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-+         bsz, q_len, _ = hidden_states.shape
-+ 
-+         query_states = self.q_proj(hidden_states)
-+         key_states = self.k_proj(hidden_states)
-+         value_states = self.v_proj(hidden_states)
-+ 
-+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+-
-++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++        
-+         kv_seq_len = key_states.shape[-2]
-+         if past_key_value is not None:
-+-            if self.layer_idx is None:
-+-                raise ValueError(
-+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-                    "with a layer index."
-+-                )
-+-            if isinstance(past_key_value, StaticCache):
-+-                kv_seq_len = key_states.shape[-2]
-+-            else:
-+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++        
-+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+ 
-+         if past_key_value is not None:
-+-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++
-++        # --- 2. 动态调度核心注意力计算 ---
-++        global Long_Prompt
-++        if Long_Prompt >= 1:
-++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-++            fa_attention_mask = None
-++            if attention_mask is not None:
-++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++                fa_attention_mask = (mask_slice != 0)
-++
-++            attn_output = mindspore.ops.flash_attention_score(
-++                query=query_states,
-++                key=key_states,
-++                value=value_states,
-++                head_num=self.num_heads,
-++                attn_mask=fa_attention_mask,
-++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-++                scalar_value=1.0 / math.sqrt(self.head_dim),
-++                input_layout="BNSD",
-++                sparse_mode=0,
-++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-++            )
-+             
-+-            if isinstance(past_key_value, StaticCache):
-+-                kv_seq_len = key_states.shape[-2]
-++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++            attn_output = self.o_proj(attn_output)
-++            attn_weights = None
-++            if output_attentions:
-++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+ 
-+-        # repeat k/v heads if n_kv_heads < n_heads
-+-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+-        
-+-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++        else:
-++            # --- Eager Attention 路径 (用于短序列和解码) ---
-++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-++            
-++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+ 
-+-        if attention_mask is not None:
-+-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+-            attn_weights = attn_weights + causal_mask
-++            if attention_mask is not None:
-++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++                attn_weights = attn_weights + causal_mask
-+ 
-+-        # upcast attention to fp32
-+-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+-        attn_output = ops.matmul(attn_weights, value_states)
-++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++            attn_output = ops.matmul(attn_weights, value_states)
-+ 
-+-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+-            raise ValueError(
-+-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+-                f" {attn_output.shape}"
-+-            )
-++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++                raise ValueError(
-++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-++                )
-+ 
-+-        attn_output = ops.transpose(attn_output, 1, 2)
-+-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++            attn_output = ops.transpose(attn_output, 1, 2)
-++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++            attn_output = self.o_proj(attn_output)
-+ 
-+-        attn_output = self.o_proj(attn_output)
-+-        # @lwx
-++            if not output_attentions:
-++                attn_weights = None
-+         
-+-        # max_seq_len = self.max_position_embeddings  # 2048
-+-
-+-        # if attention_mask is not None:
-+-        #     # attention_mask: [B, 1, Sq, Sk]
-+-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+-
-+-        #     # pad 到 [max_seq_len, max_seq_len]
-+-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+-        #     global_attention_mask = padded_mask
-+-        # else:
-+-        #     global_attention_mask = None
-+-
-+-
-+-        # sparse_mode=3
-+-        # attn_output = mindspore.ops.flash_attention_score(
-+-        #     query=query_states,
-+-        #     key=key_states,
-+-        #     value=value_states,
-+-        #     real_shift=None,
-+-        #     padding_mask=None,
-+-
-+-        #     head_num=self.num_heads,
-+-        #     attn_mask=global_attention_mask,
-+-        #     keep_prob=1.0 - self.attention_dropout,
-+-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+-        #     input_layout="BNSD",
-+-        #     pre_tokens=2147483647,
-+-        #     next_tokens=2147483647,
-+-        #     inner_precise=0,
-+-        #     drop_mask=None,
-+-        #     prefix=None,
-+-        #     actual_seq_qlen=None,
-+-        #     actual_seq_kvlen=None,
-+-        #     sparse_mode=sparse_mode,
-+-        # )
-+-        if not output_attentions:
-+-            attn_weights = None
-+-
-+         return attn_output, attn_weights, past_key_value
-+ 
-+-
-+ # class Qwen2MoeFlashAttention(nn.Module):
-+ #     """
-+ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-+ #             return final_hidden_states, router_logits
-+ 
-+ 
-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+-#     """
-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+-#     """
-+-#     def __init__(self, config: Qwen2MoeConfig):
-+-#         super().__init__()
-+-#         self.num_experts = config.num_experts
-+-#         self.top_k = config.num_experts_per_tok
-+-#         self.norm_topk_prob = config.norm_topk_prob
-+-
-+-#         # 门控网络
-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+-#         # 专家列表
-+-#         self.experts = nn.ModuleList(
-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+-#         )
-+-#         # 共享专家
-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_decode(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         """
-+-#         【解码路径】针对 sequence_length=1 的极致优化。
-+-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+-#         """
-+-#         batch_size, hidden_dim = hidden_states.shape
-+-        
-+-#         expert_outputs_list = [
-+-#             ops.cat([
-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+-#             ], dim=0) 
-+-#             for i in range(batch_size)
-+-#         ]
-+-        
-+-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+-#         # shape: (batch_size, top_k, hidden_dim)
-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+-        
-+-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+-        
-+-#         return moe_output.squeeze(1)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_prefill(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         """
-+-#         【预填充路径】针对 sequence_length > 1 的优化。
-+-#         按专家对 Token 进行分组，并进行批处理。
-+-#         """
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens = hidden_states.shape[0]
-+-#         flat_selected_experts = selected_experts.flatten()
-+-        
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-        
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-        
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-            
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-#             selected_token_indices = token_indices[mask]
-+-#             selected_routing_weights = routing_weights.flatten()[mask]
-+-            
-+-#             current_states = hidden_states[selected_token_indices]
-+-            
-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-            
-+-#             moe_output = moe_output.index_add(
-+-#                 dim=0,
-+-#                 index=selected_token_indices,
-+-#                 source=expert_output.to(hidden_states.dtype)
-+-#             )
-+-#         return moe_output
-+-
-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-#         """
-+-#         顶层 forward 方法，作为智能分发器。
-+-#         """
-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        
-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-#         router_logits = self.gate(hidden_states_reshaped)
-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-
-+-#         if self.norm_topk_prob:
-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        
-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+-        
-+-#         moe_output = None
-+-#         # 在推理时，根据序列长度选择最优路径
-+-#         if not self.training:
-+-#             if sequence_length == 1:
-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+-#             else:
-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+-#         else:
-+-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+-#             raise NotImplementedError("Training path is not implemented.")
-+-
-+-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+-        
-+-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+-        
-+-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+-        
-+-#         return final_hidden_states, router_logits
-+-
-+-
-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+-#     """
-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+-#     """
-+-#     def __init__(self, config: Qwen2MoeConfig):
-+-#         super().__init__()
-+-#         self.num_experts = config.num_experts
-+-#         self.top_k = config.num_experts_per_tok
-+-#         self.norm_topk_prob = config.norm_topk_prob
-+-
-+-#         # 门控网络
-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+-#         # 专家列表
-+-#         self.experts = nn.ModuleList(
-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+-#         )
-+-#         # 共享专家
-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_decode(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         batch_size, _ = hidden_states.shape
-+-#         expert_outputs_list = [
-+-#             ops.cat([
-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+-#             ], dim=0) 
-+-#             for i in range(batch_size)
-+-#         ]
-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+-#         return moe_output.squeeze(1)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_prefill(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens = hidden_states.shape[0]
-+-#         flat_selected_experts = selected_experts.flatten()
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-        
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-#             selected_token_indices = token_indices[mask]
-+-#             selected_routing_weights = routing_weights.flatten()[mask]
-+-#             current_states = hidden_states[selected_token_indices]
-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-#             moe_output = moe_output.index_add(
-+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+-#             )
-+-#         return moe_output
-+-
-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-#         """
-+-#         顶层 forward 方法，作为智能分发器。
-+-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+-#         """
-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        
-+-#         # 1. 门控计算 (通用逻辑)
-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-#         router_logits = self.gate(hidden_states_reshaped)
-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-
-+-#         if self.norm_topk_prob:
-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        
-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+-        
-+-#         # 2. 智能分发到最优 MoE 路径
-+-#         moe_output = None
-+-#         if not self.training:
-+-#             if sequence_length == 1:
-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+-#             else:
-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+-#         else:
-+-#             raise NotImplementedError("Training path is not implemented.")
-+-
-+-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+-        
-+-#         # 4. 合并 MoE 输出和共享专家输出
-+-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+-        
-+-#         # 5. 恢复原始形状并返回
-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+-        
-+-#         return final_hidden_states, router_logits
-+-
-+-# prefill fastest
-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+-#     """
-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+-#     """
-+-#     def __init__(self, config: Qwen2MoeConfig):
-+-#         super().__init__()
-+-#         self.num_experts = config.num_experts
-+-#         self.top_k = config.num_experts_per_tok
-+-#         self.norm_topk_prob = config.norm_topk_prob
-+-
-+-#         # 门控网络
-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+-#         # 专家列表
-+-#         self.experts = nn.ModuleList(
-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+-#         )
-+-#         # 共享专家
-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_dispatch(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         """
-+-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+-#         """
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens, _ = hidden_states.shape
-+-        
-+-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+-#         flat_selected_experts = selected_experts.flatten()
-+-#         flat_routing_weights = routing_weights.flatten()
-+-
-+-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-
-+-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-        
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-            
-+-#             # 找到所有分配给该专家的 token
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-            
-+-#             # 使用 mask 选取对应的 token 和权重
-+-#             current_token_indices = token_indices[mask]
-+-#             current_routing_weights = flat_routing_weights[mask]
-+-#             current_hidden_states = hidden_states[current_token_indices]
-+-            
-+-#             # 对这些 token 进行批处理
-+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+-            
-+-#             # 使用 index_add 将结果精确地加回到对应位置
-+-#             moe_output = moe_output.index_add(
-+-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+-#             )
-+-#         return moe_output
-+-
-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-#         """
-+-#         顶层 forward 方法，作为智能分发器。
-+-#         """
-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        
-+-#         # 1. 门控计算
-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-#         router_logits = self.gate(hidden_states_reshaped)
-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-
-+-#         if self.norm_topk_prob:
-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        
-+-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+-        
-+-#         # 2. 调用统一的 MoE 计算内核
-+-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+-
-+-#         # 3. 统一处理共享专家
-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+-        
-+-#         # 4. 合并输出
-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+-        
-+-#         # 5. 恢复原始形状并返回
-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+-        
-+-#         return final_hidden_states, router_logits
-+-
-+-
-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+-#     """
-+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+-#     【最终高性能与高精度版】：
-+-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+-#     3. 这样实现了速度和准确性的两全其美。
-+-#     """
-+-#     def __init__(self, config: Qwen2MoeConfig):
-+-#         super().__init__()
-+-#         self.num_experts = config.num_experts
-+-#         self.top_k = config.num_experts_per_tok
-+-#         self.norm_topk_prob = config.norm_topk_prob
-+-
-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+-#         self.experts = nn.ModuleList(
-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+-#         )
-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_decode(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         """
-+-#         【解码路径】极致优化版：bmm + 高精度累加。
-+-#         """
-+-#         original_dtype = hidden_states.dtype
-+-#         batch_size, _ = hidden_states.shape
-+-        
-+-#         expert_outputs_list = [
-+-#             ops.cat([
-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+-#             ], dim=0) 
-+-#             for i in range(batch_size)
-+-#         ]
-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+-
-+-#         # 在 float32 下执行 bmm，得到高精度结果
-+-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+-        
-+-#         # 将高精度结果转换回原始数据类型
-+-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+-        
-+-#         return moe_output
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_prefill(
-+-#         self, 
-+-#         hidden_states: mindspore.Tensor, 
-+-#         selected_experts: mindspore.Tensor, 
-+-#         routing_weights: mindspore.Tensor
-+-#     ) -> mindspore.Tensor:
-+-#         """
-+-#         【预填充路径】与原始实现一致，结果精确。
-+-#         """
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens, _ = hidden_states.shape
-+-#         flat_selected_experts = selected_experts.flatten()
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-        
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-#             selected_token_indices = token_indices[mask]
-+-#             selected_routing_weights = routing_weights.flatten()[mask]
-+-#             current_states = hidden_states[selected_token_indices]
-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-#             moe_output = moe_output.index_add(
-+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+-#             )
-+-#         return moe_output
-+-
-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-        
-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-#         router_logits = self.gate(hidden_states_reshaped)
-+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-
-+-#         if self.norm_topk_prob:
-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        
-+-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+-#         # 如果模型主体是 float16，后续再转换
-+-        
-+-#         moe_output = None
-+-#         if not self.training:
-+-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+-#             # _moe_infer_decode 内部会处理好类型转换
-+-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+-#             if sequence_length == 1:
-+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+-#             else:
-+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+-#         else:
-+-#             raise NotImplementedError("Training path is not implemented.")
-+-
-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+-        
-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+-        
-+-#         return final_hidden_states, router_logits
-+-    
-+-
-+-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+-#     """
-+-#     【融合版】一个混合专家模块，内置两种推理策略，
-+-#     由外部全局变量 `Long_Prompt` 控制：
-+-
-+-#     - if Long_Prompt is True:  【精度优先模式】
-+-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+-#       适用于处理长序列，避免误差累积。
-+-
-+-#     - if Long_Prompt is False: 【速度优先模式】
-+-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+-#       在解码阶段获得极致速度，同时保证结果高度准确。
-+-#     """
-+-#     def __init__(self, config: Qwen2MoeConfig):
-+-#         super().__init__()
-+-#         self.num_experts = config.num_experts
-+-#         self.top_k = config.num_experts_per_tok
-+-#         self.norm_topk_prob = config.norm_topk_prob
-+-
-+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+-#         self.experts = nn.ModuleList(
-+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+-#         )
-+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+-
-+-#     # --- 速度优先模式的辅助函数 ---
-+-#     @no_grad()
-+-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+-#         original_dtype = hidden_states.dtype
-+-#         batch_size, _ = hidden_states.shape
-+-#         expert_outputs_list = [
-+-#             ops.cat([
-+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+-#             ], dim=0) 
-+-#             for i in range(batch_size)
-+-#         ]
-+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+-#         weights_fp32 = routing_weights.to(mindspore.float32)
-+-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+-
-+-#     @no_grad()
-+-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens, _ = hidden_states.shape
-+-#         flat_selected_experts = selected_experts.flatten()
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-#             selected_token_indices = token_indices[mask]
-+-#             selected_routing_weights = routing_weights.flatten()[mask]
-+-#             current_states = hidden_states[selected_token_indices]
-+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+-#         return moe_output
-+-
-+-#     # --- 精度优先模式的辅助函数 ---
-+-#     @no_grad()
-+-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+-#         moe_output = ops.zeros_like(hidden_states)
-+-#         num_tokens, _ = hidden_states.shape
-+-#         flat_selected_experts = selected_experts.flatten()
-+-#         flat_routing_weights = routing_weights.flatten()
-+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+-#         active_experts = ops.unique(flat_selected_experts)
-+-#         for expert_idx_tensor in active_experts:
-+-#             expert_idx = expert_idx_tensor.item()
-+-#             expert_layer = self.experts[expert_idx]
-+-#             mask = (flat_selected_experts == expert_idx_tensor)
-+-#             current_token_indices = token_indices[mask]
-+-#             current_routing_weights = flat_routing_weights[mask]
-+-#             current_hidden_states = hidden_states[current_token_indices]
-+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+-#         return moe_output
-+-
-+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+-#         # 声明我们将要使用一个在模块外部定义的全局变量
-+-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+-#         global Long_Prompt
-+-
-+-#         # 1. 门控计算 (所有模式通用)
-+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-#         router_logits = self.gate(hidden_states_reshaped)
-+-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+-#         if self.norm_topk_prob:
-+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-        
-+-#         moe_output = None
-+-#         if not self.training:
-+-#             # 根据 Long_Prompt 标志选择模式
-+-#             if Long_Prompt:
-+-#                 # --- 精度优先模式 ---
-+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-#             else:
-+-#                 # --- 速度优先模式 ---
-+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+-#                 if sequence_length == 1:
-+-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-#                 else:
-+-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-#         else:
-+-#             raise NotImplementedError("Training path is not implemented.")
-+-
-+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+-        
-+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+-        
-+-#         return final_hidden_states, router_logits
-+-    
-+ class Qwen2MoeSparseMoeBlock(nn.Module):
-+     """
-+     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+         return moe_output_fp32.squeeze(1).to(original_dtype)
-+ 
-++    # @no_grad()
-++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++    #     num_tokens, _ = hidden_states.shape
-++    #     flat_selected_experts = selected_experts.flatten()
-++    #     sorted_expert_indices = flat_selected_experts.argsort()
-++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++    #     original_token_indices = sorted_expert_indices // self.top_k
-++    #     moe_output = ops.zeros_like(hidden_states)
-++    #     current_token_offset = 0
-++    #     for i in range(self.num_experts):
-++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-++    #         if expert_token_count == 0:
-++    #             continue
-++    #         end_offset = current_token_offset + expert_token_count
-++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++    #         current_token_offset += expert_token_count
-++    #     return moe_output
-++
-+     @no_grad()
-+     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+-        num_tokens, _ = hidden_states.shape
-+-        flat_selected_experts = selected_experts.flatten()
-+-        sorted_expert_indices = flat_selected_experts.argsort()
-+-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+-        original_token_indices = sorted_expert_indices // self.top_k
-++        """
-++        优化版 MoE prefill (速度优先模式)：
-++        - 批量张量化处理同一个 expert 的所有 token
-++        - 跳过无 token 的专家
-++        - 保持结果完全一致
-++        """
-+         moe_output = ops.zeros_like(hidden_states)
-+-        current_token_offset = 0
-+-        for i in range(self.num_experts):
-+-            expert_token_count = tokens_per_expert[i] - current_token_offset
-+-            if expert_token_count == 0:
-+-                continue
-+-            end_offset = current_token_offset + expert_token_count
-+-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+-            expert_hidden_states = hidden_states[expert_original_token_indices]
-+-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+-            current_token_offset += expert_token_count
-++
-++        flat_selected_experts = selected_experts.flatten()
-++        flat_routing_weights = routing_weights.flatten()
-++
-++        idxs = flat_selected_experts.argsort()
-++        sorted_expert_indices = flat_selected_experts[idxs]
-++        sorted_token_indices = idxs // self.top_k
-++
-++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-++
-++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++
-++        for expert_id in active_experts.tolist():
-++            start = int(tokens_per_expert[:expert_id].sum().item())
-++            end = start + int(tokens_per_expert[expert_id].item())
-++
-++            token_idx = sorted_token_indices[start:end]
-++            expert_tokens = hidden_states[token_idx]
-++
-++            expert_out = self.experts[expert_id](expert_tokens)
-++
-++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-++
-++            moe_output = mindspore.mint.scatter_add(
-++                moe_output,
-++                0,
-++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-++                scaled_out.to(hidden_states.dtype)
-++            )
-++
-+         return moe_output
-+ 
-++
-+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+     @no_grad()
-+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+         
-+         moe_output = None
-+-        if Long_Prompt:
-+-            # --- 精度优先模式 (ACCURACY MODE) ---
-+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        # if Long_Prompt==0:
-++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        # else:
-++        #     # --- 速度优先模式 (SPEED MODE) ---
-++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++        #     if sequence_length == 1:
-++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        #     else:
-++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        
-++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++        if sequence_length == 1:
-++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+         else:
-+-            # --- 速度优先模式 (SPEED MODE) ---
-+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+-            if sequence_length == 1:
-+-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-            else:
-+-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-        
-++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++    
-+ 
-+         # 3. 共享专家计算与合并 (所有模式通用)
-+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         
-+         return final_hidden_states, router_logits
-+ 
-++
-+ class Qwen2MoeDecoderLayer(nn.Module):
-+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+         super().__init__()
-+         self.hidden_size = config.hidden_size
-+         
-+-        # if Long_Prompt:
-+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+-        # else:
-++        # if Long_Prompt == 2:
-+         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++        # else:
-++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+ 
-+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+ 
-+@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+             )
-+ 
-+         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-+-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++        #     attention_mask,
-++        #     sequence_length=sequence_length,
-++        #     target_length=target_length,
-++        #     dtype=dtype,
-++        #     min_dtype=min_dtype,
-++        #     cache_position=cache_position,
-++        #     batch_size=input_tensor.shape[0],
-++        # )
-++        #@dwj
-++        causal_mask = get_cached_causal_mask_with_cache_position(
-+             attention_mask,
-+             sequence_length=sequence_length,
-+             target_length=target_length,
-+@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+         """
-+-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-++        _causal_mask_cache.clear()
-+ 
-+         input_ids = kwargs.get("input_ids")
-+         if input_ids is None and args:
-+@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+ 
-+         if input_ids is not None:
-+             prompt_length = input_ids.shape[1]
-+-            
-+-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+-                Long_Prompt = True
-++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-++                Long_Prompt = 2
-++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-++                Long_Prompt = 0
-+             else:
-+-                Long_Prompt = False
-++                Long_Prompt = 1
-++
-+ 
-+         return super().generate(*args, **kwargs)
-+     
-+@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+             dtype = self.lm_head.weight.dtype
-+             min_dtype = float(ops.finfo(dtype).min)
-+ 
-+-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++            #     attention_mask,
-++            #     sequence_length=sequence_length,
-++            #     target_length=past_key_values.get_max_length(),
-++            #     dtype=dtype,
-++            #     min_dtype=min_dtype,
-++            #     cache_position=cache_position,
-++            #     batch_size=batch_size,
-++            # )
-++
-++            #@dwj
-++            attention_mask = get_cached_causal_mask_with_cache_position(
-+                 attention_mask,
-+                 sequence_length=sequence_length,
-+                 target_length=past_key_values.get_max_length(),
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+deleted file mode 100644
-+index 6dfb5b93..00000000
-+--- a/patches/0001-20251104commit.patch
-++++ /dev/null
-+@@ -1,1272 +0,0 @@
-+-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+-From: Pinoeer-kingxi <13022943007@163.com>
-+-Date: Tue, 4 Nov 2025 09:11:51 +0800
-+-Subject: [PATCH] 20251104commit
-+-
-+----
-+- mindnlp/transformers/cache_utils.py           |  28 +-
-+- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+- 3 files changed, 976 insertions(+), 87 deletions(-)
-+-
-+-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+-index cadd2e04..02f8d4be 100644
-+---- a/mindnlp/transformers/cache_utils.py
-+-+++ b/mindnlp/transformers/cache_utils.py
-+-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+-             # k_out[:, :, cache_position] = key_states
-+-             # v_out[:, :, cache_position] = value_states
-+--            if ON_ORANGE_PI:
-+--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+--            else:
-+--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+--
-+-+            # if ON_ORANGE_PI:
-+-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+-+            # else:
-+-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+-+            # 确保 cache_position 是 1D tensor 并且类型正确
-+-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+-+            if cache_position.ndim > 1:
-+-+                cache_position = cache_position.flatten()
-+-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-+-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+-+                cache_position = cache_position.int()
-+-+            
-+-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+-+            k_out[:, :, cache_position] = key_states
-+-+            v_out[:, :, cache_position] = value_states
-+-+            
-+-         return k_out, v_out
-+- 
-+-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+-index c695b944..d8303e45 100644
-+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+- # Copied from transformers.models.llama.modeling_llama.rotate_half
-+- def rotate_half(x):
-+-     """Rotates half the hidden dims of the input."""
-+--    x1 = x[..., : x.shape[-1] // 2]
-+--    x2 = x[..., x.shape[-1] // 2 :]
-+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+-+    # x1 = x[..., : x.shape[-1] // 2]
-+-+    # x2 = x[..., x.shape[-1] // 2 :]
-+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+-     return ops.cat((-x2, x1), dim=-1)
-+- 
-+- 
-+-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+-         if self.training:
-+-             raise NotImplementedError("Training is not supported yet.")
-+-         else:
-+--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+--        if self.config.n_shared_experts is not None:
-+--            y = y + self.shared_experts(identity)
-+--        return y
-+-+            # @lwx
-+-+            if orig_shape[1] == 1:
-+-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+-+                y=y.view(*orig_shape)
-+-+                if self.config.n_shared_experts is not None:
-+-+                    y = y + self.shared_experts(identity)
-+-+                return y
-+-+            else:
-+-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+-+                if self.config.n_shared_experts is not None:
-+-+                    y = y + self.shared_experts(identity)
-+-+                return y
-+-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+-+        # if self.config.n_shared_experts is not None:
-+-+        #     y = y + self.shared_experts(identity)
-+-+        # return y
-+-+        
-+-+    @no_grad()
-+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+-+
-+-+        expert_cache = ops.zeros_like(x)
-+-+        for i in range(self.num_experts_per_tok):
-+-+            expert_id = flat_expert_indices[i].item()
-+-+            weight = flat_expert_weights[i].item()
-+-+            expert = self.experts[expert_id]
-+-+            expert_out = expert(x)
-+-+            expert_cache += expert_out * weight
-+-+        return expert_cache
-+- 
-+-     @no_grad()
-+--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+--        # expert_cache = torch.zeros_like(x)
-+--        # idxs = flat_expert_indices.argsort()
-+--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+--        # token_idxs = idxs // self.num_experts_per_tok
-+--        # for i, end_idx in enumerate(tokens_per_expert):
-+--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+--        #     if start_idx == end_idx:
-+--        #         continue
-+--        #     expert = self.experts[i]
-+--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+--        #     expert_tokens = x[exp_token_idx]
-+--        #     expert_out = expert(expert_tokens)
-+--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+--        # return expert_cache
-+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+-         expert_cache = ops.zeros_like(x)
-+-         idxs = flat_expert_indices.argsort()
-+-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-         token_idxs = idxs // self.num_experts_per_tok
-+-+
-+-         for i, end_idx in enumerate(tokens_per_expert):
-+-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-             if start_idx == end_idx:
-+-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+-             expert_out = expert(expert_tokens)
-+-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+-+
-+-         return expert_cache
-+-+        
-+-+    # @no_grad()
-+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-+    #     # expert_cache = torch.zeros_like(x)
-+-+    #     # idxs = flat_expert_indices.argsort()
-+-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+-+    #     # token_idxs = idxs // self.num_experts_per_tok
-+-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-+-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+-+    #     #     if start_idx == end_idx:
-+-+    #     #         continue
-+-+    #     #     expert = self.experts[i]
-+-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+-+    #     #     expert_tokens = x[exp_token_idx]
-+-+    #     #     expert_out = expert(expert_tokens)
-+-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+-+    #     # return expert_cache
-+-+    #     expert_cache = ops.zeros_like(x)
-+-+    #     idxs = flat_expert_indices.argsort()
-+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-+    #     token_idxs = idxs // self.num_experts_per_tok
-+-+
-+-+    #     for i, end_idx in enumerate(tokens_per_expert):
-+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-+    #         if start_idx == end_idx:
-+-+    #             continue
-+-+    #         expert = self.experts[i]
-+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+-+    #         expert_tokens = x[exp_token_idx]
-+-+    #         expert_out = expert(expert_tokens)
-+-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+-+
-+-+    #     return expert_cache
-+-+    # @no_grad()
-+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+-+    #     expert_cache = ops.zeros_like(x)
-+-+
-+-+    #     # 排序保证顺序一致
-+-+    #     idxs = flat_expert_indices.argsort()
-+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-+    #     token_idxs = idxs // self.num_experts_per_tok
-+-+
-+-+    #     # 找出有 token 的专家
-+-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+-+
-+-+    #     for i in active_experts.tolist():
-+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-+    #         end_idx = tokens_per_expert[i]
-+-+    #         if start_idx == end_idx:  # 没有 token
-+-+    #             continue
-+-+
-+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+-+    #         expert_tokens = x[exp_token_idx]
-+-+    #         expert_out = self.experts[i](expert_tokens)
-+-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+-+
-+-+    #         expert_cache = mindspore.mint.scatter_add(
-+-+    #             expert_cache,
-+-+    #             0,
-+-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+-+    #             expert_out
-+-+    #         )
-+-+
-+-+    #     return expert_cache
-+-+
-+-+
-+- 
-+- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+- #     """
-+-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+- 
-+-         # Initialize weights and apply final processing
-+-         self.post_init()
-+-+        self.warm_up = False
-+-+
-+-+    def warmup_moe_model_deep(self):
-+-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+-+        test_texts = [
-+-+            "warmup short",
-+-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+-+        ]
-+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+-+        if tokenizer is None:
-+-+            from mindnlp.transformers import AutoTokenizer
-+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+-+            self._warmup_tokenizer = tokenizer
-+-+
-+-+        for text in test_texts:
-+-+            inputs = tokenizer(text, return_tensors="ms")
-+-+            with mindspore._no_grad():
-+-+                _ = self(**inputs, use_cache=False)
-+-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+- 
-+-     def get_input_embeddings(self):
-+-         return self.model.embed_tokens
-+-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+-         ```"""
-+-+        if not self.warm_up:
-+-+            self.warm_up = True
-+-+            self.warmup_moe_model_deep()
-+-+
-+-         output_attentions = (
-+-             output_attentions
-+-             if output_attentions is not None
-+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+-index 3cbf820e..d4c6b651 100644
-+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+-@@ -18,7 +18,6 @@
-+- # See the License for the specific language governing permissions and
-+- # limitations under the License.
-+- """MindSpore Qwen2MoE model."""
-+--
-+- import math
-+- from typing import List, Optional, Tuple, Union
-+- 
-+-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+-     TokenClassifierOutput,
-+- )
-+- from ...modeling_utils import PreTrainedModel
-+-+from ...generation import GenerationMixin
-+- from ....utils import logging
-+- from .configuration_qwen2_moe import Qwen2MoeConfig
-+- 
-+-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+-         self.variance_epsilon = eps
-+- 
-+-     def forward(self, hidden_states):
-+-+        # @dwj
-+-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+-+        # @lwx
-+-+        # if not self.training :
-+-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+-         input_dtype = hidden_states.dtype
-+-         hidden_states = hidden_states.to(mindspore.float32)
-+-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+-@@ -234,6 +239,8 @@ def rotate_half(x):
-+-     """Rotates half the hidden dims of the input."""
-+-     x1 = x[..., : x.shape[-1] // 2]
-+-     x2 = x[..., x.shape[-1] // 2 :]
-+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+-     return ops.cat((-x2, x1), dim=-1)
-+- 
-+- 
-+-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+-         self.config = config
-+-         self.hidden_size = config.hidden_size
-+-         self.intermediate_size = intermediate_size
-+-+        
-+-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+-         self.act_fn = ACT2FN[config.hidden_act]
-+- 
-+-     def forward(self, x):
-+--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+--
-+- 
-+-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+-+        # @lwx 
-+-+        # gate_up_output = self.gate_up_proj(x)
-+-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+-+        # return self.down_proj(swiglu_output)
-+-+
-+-+    # def forward(self, x):
-+-+    #     gate_proj_out = self.gate_proj(x)
-+-+    #     up_proj_out = self.up_proj(x)
-+-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+-+    #     return self.down_proj(swiglu_out)
-+-+        
-+- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+-     """
-+-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+-         use_cache: bool = False,
-+-         cache_position: Optional[mindspore.Tensor] = None,
-+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-+
-+-+        
-+-+
-+-         bsz, q_len, _ = hidden_states.shape
-+- 
-+-         query_states = self.q_proj(hidden_states)
-+-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-                     "with a layer index."
-+-                 )
-+--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-+            if isinstance(past_key_value, StaticCache):
-+-+                kv_seq_len = key_states.shape[-2]
-+-+            else:
-+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+- 
-+-         if past_key_value is not None:
-+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+-+            
-+-+            if isinstance(past_key_value, StaticCache):
-+-+                kv_seq_len = key_states.shape[-2]
-+- 
-+-         # repeat k/v heads if n_kv_heads < n_heads
-+-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+--
-+-+        
-+-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+- 
-+--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+--            raise ValueError(
-+--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+--                f" {attn_weights.shape}"
-+--            )
-+--
-+--        if attention_mask is not None:  # no matter the length, we just slice it
-+--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+-+        if attention_mask is not None:
-+-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+-             attn_weights = attn_weights + causal_mask
-+- 
-+-         # upcast attention to fp32
-+-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+- 
-+-         attn_output = self.o_proj(attn_output)
-+--
-+-+        # @lwx
-+-+        
-+-+        # max_seq_len = self.max_position_embeddings  # 2048
-+-+
-+-+        # if attention_mask is not None:
-+-+        #     # attention_mask: [B, 1, Sq, Sk]
-+-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+-+
-+-+        #     # pad 到 [max_seq_len, max_seq_len]
-+-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+-+        #     global_attention_mask = padded_mask
-+-+        # else:
-+-+        #     global_attention_mask = None
-+-+
-+-+
-+-+        # sparse_mode=3
-+-+        # attn_output = mindspore.ops.flash_attention_score(
-+-+        #     query=query_states,
-+-+        #     key=key_states,
-+-+        #     value=value_states,
-+-+        #     real_shift=None,
-+-+        #     padding_mask=None,
-+-+
-+-+        #     head_num=self.num_heads,
-+-+        #     attn_mask=global_attention_mask,
-+-+        #     keep_prob=1.0 - self.attention_dropout,
-+-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+-+        #     input_layout="BNSD",
-+-+        #     pre_tokens=2147483647,
-+-+        #     next_tokens=2147483647,
-+-+        #     inner_precise=0,
-+-+        #     drop_mask=None,
-+-+        #     prefix=None,
-+-+        #     actual_seq_qlen=None,
-+-+        #     actual_seq_kvlen=None,
-+-+        #     sparse_mode=sparse_mode,
-+-+        # )
-+-         if not output_attentions:
-+-             attn_weights = None
-+- 
-+-         return attn_output, attn_weights, past_key_value
-+- 
-+- 
-+-+class Qwen2MoeFlashAttention(nn.Module):
-+-+    """
-+-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+-+
-+-+    关键改动:
-+-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+-+       直接传入原始的 key 和 value 张量效率更高。
-+-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+-+    """
-+-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+-+        super().__init__()
-+-+        self.config = config
-+-+        self.layer_idx = layer_idx
-+-+        self.hidden_size = config.hidden_size
-+-+        self.num_heads = config.num_attention_heads
-+-+        self.head_dim = self.hidden_size // self.num_heads
-+-+        self.num_key_value_heads = config.num_key_value_heads
-+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+-+        self.max_position_embeddings = config.max_position_embeddings
-+-+        self.rope_theta = config.rope_theta
-+-+        self.attention_dropout = config.attention_dropout
-+-+
-+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-+-+            raise ValueError(
-+-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+-+            )
-+-+
-+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+-+
-+-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+-+            self.head_dim,
-+-+            max_position_embeddings=self.max_position_embeddings,
-+-+            base=self.rope_theta,
-+-+        )
-+-+
-+-+    def forward(
-+-+        self,
-+-+        hidden_states: mindspore.Tensor,
-+-+        attention_mask: Optional[mindspore.Tensor] = None,
-+-+        position_ids: Optional[mindspore.Tensor] = None,
-+-+        past_key_value: Optional[Cache] = None,
-+-+        output_attentions: bool = False,
-+-+        use_cache: bool = False,
-+-+        cache_position: Optional[mindspore.Tensor] = None,
-+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-+
-+-+        bsz, q_len, _ = hidden_states.shape
-+-+
-+-+        # 1. 线性投射 Q, K, V
-+-+        query_states = self.q_proj(hidden_states)
-+-+        key_states = self.k_proj(hidden_states)
-+-+        value_states = self.v_proj(hidden_states)
-+-+
-+-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-+-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+
-+-+        # 3. RoPE 旋转位置编码
-+-+        kv_seq_len = key_states.shape[-2]
-+-+        if past_key_value is not None:
-+-+            if self.layer_idx is None:
-+-+                raise ValueError(
-+-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-+                    "with a layer index."
-+-+                )
-+-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+-+                if cache_position.shape[0] == 1:
-+-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+-+                    kv_seq_len = past_seen_tokens + 1
-+-+                else:
-+-+                    # prefill 阶段：cache_position 是范围，使用其长度
-+-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+-+            else:
-+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-+
-+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-+
-+-+        # 4. KV 缓存更新
-+-+        if past_key_value is not None:
-+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-+            key_states, value_states = past_key_value.update(
-+-+                key_states, value_states, self.layer_idx, cache_kwargs
-+-+            )
-+-+            
-+-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+-+                if cache_position.shape[0] == 1:
-+-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+-+                    kv_seq_len = key_states.shape[-2]
-+-+
-+-+        # 5. [重要] 准备 Attention Mask
-+-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+-+        fa_attention_mask = None
-+-+        if attention_mask is not None:
-+-+            # 截取与当前key长度匹配的部分
-+-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+-+            fa_attention_mask = (mask_slice != 0)
-+-+
-+-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+-+        input_dtype = query_states.dtype
-+-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+-+            query_states = query_states.to(mindspore.float16)
-+-+            key_states = key_states.to(mindspore.float16)
-+-+            value_states = value_states.to(mindspore.float16)
-+-+
-+-+        # 6. [核心] 调用 flash_attention_score 算子
-+-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+-+        attn_output = mindspore.ops.flash_attention_score(
-+-+            query=query_states,
-+-+            key=key_states,
-+-+            value=value_states,
-+-+            head_num=self.num_heads, # 传入Q的头数(N1)
-+-+            attn_mask=fa_attention_mask,
-+-+            keep_prob=1.0 - self.attention_dropout,
-+-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-+-+            input_layout="BNSD",
-+-+            sparse_mode=0 # 使用 defaultMask 模式
-+-+        )
-+-+
-+-+        # 恢复原始数据类型
-+-+        attn_output = attn_output.to(input_dtype)
-+-+
-+-+        # 7. 调整输出形状
-+-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-+        attn_output = self.o_proj(attn_output)
-+-+
-+-+        # FlashAttention 算子不直接返回注意力权重矩阵
-+-+        attn_weights = None
-+-+        if output_attentions:
-+-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+-+
-+-+        return attn_output, attn_weights, past_key_value
-+-+
-+-+    # def forward(
-+-+    #     self,
-+-+    #     hidden_states: mindspore.Tensor,
-+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+-+    #     past_key_value: Optional[Cache] = None,
-+-+    #     output_attentions: bool = False,
-+-+    #     use_cache: bool = False,
-+-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-+
-+-+    #     bsz, q_len, _ = hidden_states.shape
-+-+
-+-+    #     # 1. 线性投射 Q, K, V
-+-+    #     query_states = self.q_proj(hidden_states)
-+-+    #     key_states = self.k_proj(hidden_states)
-+-+    #     value_states = self.v_proj(hidden_states)
-+-+
-+-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+
-+-+    #     # 3. RoPE 旋转位置编码
-+-+    #     kv_seq_len = key_states.shape[-2]
-+-+    #     if past_key_value is not None:
-+-+    #         if self.layer_idx is None:
-+-+    #             raise ValueError(
-+-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-+    #                 "with a layer index."
-+-+    #             )
-+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-+
-+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-+
-+-+    #     # 4. KV 缓存更新
-+-+    #     if past_key_value is not None:
-+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-+    #         key_states, value_states = past_key_value.update(
-+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+-+    #         )
-+-+
-+-+    #     # 5. 准备 Attention Mask
-+-+    #     fa_attention_mask = None
-+-+    #     if attention_mask is not None:
-+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-+    #         fa_attention_mask = (mask_slice != 0)
-+-+
-+-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+-+    #     input_dtype = query_states.dtype
-+-+
-+-+    #     # 6. [核心] 调用 flash_attention_score 算子
-+-+    #     attn_output = mindspore.ops.flash_attention_score(
-+-+    #         query=query_states,
-+-+    #         key=key_states,
-+-+    #         value=value_states,
-+-+    #         head_num=self.num_heads,
-+-+    #         attn_mask=fa_attention_mask,
-+-+    #         keep_prob=1.0 - self.attention_dropout,
-+-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+-+    #         input_layout="BNSD",
-+-+    #         sparse_mode=0,
-+-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+-+    #         inner_precise=1
-+-+    #     )
-+-+
-+-+    #     # 恢复原始数据类型
-+-+    #     attn_output = attn_output.to(input_dtype)
-+-+
-+-+    #     # 7. 调整输出形状
-+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-+    #     attn_output = self.o_proj(attn_output)
-+-+
-+-+    #     attn_weights = None
-+-+    #     if output_attentions:
-+-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+-+
-+-+    #     return attn_output, attn_weights, past_key_value
-+-+
-+-+    # def forward(
-+-+    #     self,
-+-+    #     hidden_states: mindspore.Tensor,
-+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+-+    #     past_key_value: Optional[Cache] = None,
-+-+    #     output_attentions: bool = False,
-+-+    #     use_cache: bool = False,
-+-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-+
-+-+    #     bsz, q_len, _ = hidden_states.shape
-+-+
-+-+    #     query_states = self.q_proj(hidden_states)
-+-+    #     key_states = self.k_proj(hidden_states)
-+-+    #     value_states = self.v_proj(hidden_states)
-+-+
-+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-+
-+-+    #     kv_seq_len = key_states.shape[-2]
-+-+    #     if past_key_value is not None:
-+-+    #         if self.layer_idx is None:
-+-+    #             raise ValueError("`layer_idx` must be specified for caching")
-+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-+
-+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-+
-+-+    #     if past_key_value is not None:
-+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+-+    #         key_states, value_states = past_key_value.update(
-+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+-+    #         )
-+-+
-+-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+-+
-+-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+-+    #     query_states = query_states / math.sqrt(self.head_dim)
-+-+    #     # <--- 修改结束 ---
-+-+
-+-+    #     fa_attention_mask = None
-+-+    #     if attention_mask is not None:
-+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+-+    #         fa_attention_mask = (mask_slice != 0)
-+-+
-+-+    #     input_dtype = query_states.dtype
-+-+
-+-+    #     attn_output = mindspore.ops.flash_attention_score(
-+-+    #         query=query_states,    # 传入已经预先缩放过的 query
-+-+    #         key=key_states,
-+-+    #         value=value_states,
-+-+    #         head_num=self.num_heads,
-+-+    #         attn_mask=fa_attention_mask,
-+-+    #         keep_prob=1.0 - self.attention_dropout,
-+-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+-+    #         input_layout="BNSD",
-+-+    #         sparse_mode=0,
-+-+    #         inner_precise=1        # 仍然保持内部高精度计算
-+-+    #     )
-+-+
-+-+    #     attn_output = attn_output.to(input_dtype)
-+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-+    #     attn_output = self.o_proj(attn_output)
-+-+
-+-+    #     attn_weights = None
-+-+    #     if output_attentions:
-+-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+-+
-+-+    #     return attn_output, attn_weights, past_key_value
-+-+
-+- QWEN2MOE_ATTENTION_CLASSES = {
-+-     "eager": Qwen2MoeAttention,
-+-+    "flash-attention": Qwen2MoeFlashAttention,
-+- }
-+- 
-+- 
-+-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+- 
-+-+    #@dwj
-+-+    # 只遍历激活的专家，而非全部专家
-+-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+--        hidden_states = hidden_states.view(-1, hidden_dim)
-+--        # router_logits: (batch * sequence_length, n_experts)
-+--        router_logits = self.gate(hidden_states)
-+--
-+--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+--        if self.norm_topk_prob:
-+--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+--        # we cast back to the input dtype
-+--        routing_weights = routing_weights.to(hidden_states.dtype)
-+--
-+--        final_hidden_states = ops.zeros(
-+--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+--        )
-+--
-+--        # One hot encode the selected experts to create an expert mask
-+--        # this will be used to easily index which expert is going to be sollicitated
-+--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+--
-+--        # Loop over all available experts in the model and perform the computation on each expert
-+--        for expert_idx in range(self.num_experts):
-+--            expert_layer = self.experts[expert_idx]
-+--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+--
-+--            # Index the correct hidden states and compute the expert hidden state for
-+--            # the current expert. We need to make sure to multiply the output hidden
-+--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+--            if 0 not in idx.shape:
-+--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+--
-+--                # However `index_add_` only support torch tensors for indexing so we'll use
-+--                # the `top_x` tensor here.
-+--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+--
-+--        shared_expert_output = self.shared_expert(hidden_states)
-+--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+--
-+--        final_hidden_states = final_hidden_states + shared_expert_output
-+-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+-+            num_tokens = hidden_states_reshaped.shape[0]
-+-+
-+-+            router_logits = self.gate(hidden_states_reshaped)
-+-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+-+
-+-+            if self.norm_topk_prob:
-+-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+-+            routing_weights = routing_weights.to(hidden_states.dtype)
-+-+            
-+-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+-+            flat_selected_experts = selected_experts.flatten()
-+-+            
-+-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+-+            token_indices = broadcasted_token_indices.flatten()
-+-+            
-+-+            active_experts = ops.unique(flat_selected_experts)
-+-+            
-+-+            for expert_idx_tensor in active_experts:
-+-+                expert_idx = expert_idx_tensor.item()
-+-+                expert_layer = self.experts[expert_idx]
-+-+                
-+-+                mask = (flat_selected_experts == expert_idx_tensor)
-+-+                selected_token_indices = token_indices[mask]
-+-+                selected_routing_weights = routing_weights.flatten()[mask]
-+-+                
-+-+                current_states = hidden_states_reshaped[selected_token_indices]
-+-+                
-+-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+-+                
-+-+                final_hidden_states = final_hidden_states.index_add(
-+-+                    dim=0,
-+-+                    index=selected_token_indices,
-+-+                    source=expert_output.to(hidden_states.dtype)
-+-+                )
-+-+            
-+-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+- 
-+--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+--        return final_hidden_states, router_logits
-+-+            final_hidden_states = final_hidden_states + shared_expert_output
-+-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+-+            
-+-+            return final_hidden_states, router_logits
-+- 
-+- 
-+- class Qwen2MoeDecoderLayer(nn.Module):
-+-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+- 
-+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+- 
-+-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+-+
-+-         if (layer_idx not in config.mlp_only_layers) and (
-+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+-         ):
-+-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+-     _skip_keys_device_placement = "past_key_values"
-+-     _supports_cache_class = True
-+-+#lwx
-+-+    # _supports_static_cache = True
-+- 
-+-     def _init_weights(self, module):
-+-         std = self.config.initializer_range
-+-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+-         return causal_mask
-+- 
-+- 
-+--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+-     _tied_weights_keys = ["lm_head.weight"]
-+- 
-+-     def __init__(self, config):
-+-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+-         self.num_experts_per_tok = config.num_experts_per_tok
-+-         # Initialize weights and apply final processing
-+-         self.post_init()
-+-+        # @lwx
-+-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+-+        #     self.generation_config.cache_implementation = "static"
-+-+        self._warmed_up = False
-+-+
-+-+    def warmup_moe_model(self):
-+-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+-+        test_texts = [
-+-+            "warmup short",
-+-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+-+        ]
-+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+-+        if tokenizer is None:
-+-+            from mindnlp.transformers import AutoTokenizer
-+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+-+            self._warmup_tokenizer = tokenizer
-+-+
-+-+        for text in test_texts:
-+-+            inputs = tokenizer(text, return_tensors="ms")
-+-+            with mindspore._no_grad():
-+-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+- 
-+-     def get_input_embeddings(self):
-+-         return self.model.embed_tokens
-+-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+-         ```"""
-+-+        if not self._warmed_up:
-+-+            self._warmed_up = True
-+-+            self.warmup_moe_model()
-+- 
-+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+-         output_router_logits = (
-+-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+-             }
-+-         )
-+-         return model_inputs
-+-+# @lwx
-+-+    # def _decode_one_tokens_logits(
-+-+    #     self,
-+-+    #     cur_token: mindspore.Tensor,
-+-+    #     input_pos: Optional[mindspore.Tensor],
-+-+    #     cache_position: mindspore.Tensor,
-+-+    #     past_key_values: StaticCache,
-+-+    # ) -> mindspore.Tensor:
-+-+    #     """
-+-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+-+        
-+-+    #     Args:
-+-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+-+    #         input_pos: 输入位置信息，可选
-+-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+-+            
-+-+    #     Returns:
-+-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+-+    #     """
-+-+    #     # 调用JIT编译的版本
-+-+    #     return self.get_decode_one_tokens_logits(
-+-+    #         cur_token=cur_token,
-+-+    #         input_pos=input_pos,
-+-+    #         cache_position=cache_position,
-+-+    #         past_key_values=past_key_values,
-+-+    #     )
-+-+    
-+-+    # @mindspore.jit(jit_level='O1')
-+-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+-+    #     """
-+-+    #     JIT编译的函数，用于高效的单token解码
-+-+    #     使用JIT编译优化以支持静态shape和高效执行
-+-+        
-+-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+-+    #     """
-+-+    #     outputs = self.model.forward(
-+-+    #         input_ids=cur_token,
-+-+    #         position_ids=input_pos,
-+-+    #         cache_position=cache_position,
-+-+    #         past_key_values=past_key_values,
-+-+    #         use_cache=True,
-+-+    #         return_dict=False,
-+-+    #     )
-+-+        
-+-+    #     hidden_states = outputs[0]
-+-+    #     logits = self.lm_head.forward(hidden_states)
-+-+    #     logits = logits.float()
-+-+        
-+-+    #     return logits[:, -1, :]
-+-+
-+-+    # def _sample(
-+-+    #     self,
-+-+    #     input_ids: mindspore.Tensor,
-+-+    #     logits_processor,
-+-+    #     stopping_criteria,
-+-+    #     generation_config,
-+-+    #     synced_devices: bool,
-+-+    #     streamer=None,
-+-+    #     logits_warper=None,
-+-+    #     **model_kwargs,
-+-+    # ):
-+-+    #     """
-+-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+-+    #     """
-+-+    #     from ...generation.logits_process import LogitsProcessorList
-+-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+-+    #     from mindnlp.core import nn, ops, no_grad
-+-+    #     import numpy as np
-+-+        
-+-+    #     # 检查是否使用 StaticCache
-+-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+-+    #     # 否则，直接调用父类方法
-+-+    #     past_key_values = model_kwargs.get("past_key_values")
-+-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+-+
-+-+    #     if not isinstance(past_key_values, StaticCache):
-+-+    #         # 不使用 StaticCache，直接调用父类方法
-+-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+-+    #         return super()._sample(
-+-+    #             input_ids=input_ids,
-+-+    #             logits_processor=logits_processor,
-+-+    #             stopping_criteria=stopping_criteria,
-+-+    #             generation_config=generation_config,
-+-+    #             synced_devices=synced_devices,
-+-+    #             streamer=streamer,
-+-+    #             logits_warper=logits_warper,
-+-+    #             **model_kwargs,
-+-+    #         )
-+-+        
-+-+    #     # 使用 StaticCache，进入自定义循环
-+-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+-+    #     pad_token_id = generation_config._pad_token_tensor
-+-+    #     output_attentions = generation_config.output_attentions
-+-+    #     output_hidden_states = generation_config.output_hidden_states
-+-+    #     output_scores = generation_config.output_scores
-+-+    #     output_logits = generation_config.output_logits
-+-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+-+    #     max_length = generation_config.max_length
-+-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+-+    #     do_sample = generation_config.do_sample
-+-+        
-+-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+-+    #         raise ValueError(
-+-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+-+    #             f"{logits_warper})."
-+-+    #         )
-+-+        
-+-+    #     # init attention / hidden states / scores tuples
-+-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-+-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+-+        
-+-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+-+    #         encoder_hidden_states = (
-+-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+-+    #         )
-+-+        
-+-+    #     # keep track of which sequences are already finished
-+-+    #     batch_size, cur_len = input_ids.shape
-+-+    #     this_peer_finished = False
-+-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+-+        
-+-+    #     time_record = []
-+-+    #     from ....utils.testing_utils import parse_flag_from_env
-+-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+-+        
-+-+    #     while self._has_unfinished_sequences(
-+-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+-+    #     ):
-+-+    #         if _record_time:
-+-+    #             import time as time_module
-+-+    #             infer_start = time_module.time()
-+-+            
-+-+    #         # prepare model inputs
-+-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+-+            
-+-+    #         # prepare variable output controls
-+-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+-+            
-+-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+-+    #         cur_cache_position = model_inputs.get("cache_position")
-+-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-+-+    #         cur_input_ids = model_inputs.get("input_ids")
-+-+            
-+-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+-+    #             cur_cache_position is not None and 
-+-+    #             len(cur_cache_position.shape) > 0 and
-+-+    #             cur_cache_position.shape[0] == 1 and
-+-+    #             cur_input_ids is not None and
-+-+    #             cur_input_ids.shape[1] == 1):
-+-+    #             # 使用 JIT 优化的单 token 解码
-+-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+-+    #             if not hasattr(self, '_jit_used'):
-+-+    #                 self._jit_used = False
-+-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+-+                
-+-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-+-+    #                 cur_token=cur_input_ids,
-+-+    #                 input_pos=model_inputs.get("position_ids"),
-+-+    #                 cache_position=cur_cache_position,
-+-+    #                 past_key_values=cur_past_key_values,
-+-+    #             )
-+-+                
-+-+    #             # 标记已使用JIT（用于后续判断）
-+-+    #             if not self._jit_used:
-+-+    #                 self._jit_used = True
-+-+                
-+-+    #             # 构造兼容的输出对象
-+-+    #             class JitOptimizedOutput:
-+-+    #                 def __init__(self, logits, config):
-+-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+-+    #                     self.config = config
-+-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-+-+    #                     self.cross_attentions = None
-+-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+-+                
-+-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+-+    #         else:
-+-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+-+    #             outputs = self(**model_inputs, return_dict=True)
-+-+            
-+-+    #         if synced_devices and this_peer_finished:
-+-+    #             continue
-+-+            
-+-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+-+    #         next_token_logits = outputs.logits[:, -1, :]
-+-+            
-+-+    #         # pre-process distribution
-+-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+-+    #         if do_sample:
-+-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+-+            
-+-+    #         # Store scores, attentions and hidden_states when required
-+-+    #         if return_dict_in_generate:
-+-+    #             if output_scores:
-+-+    #                 scores += (next_token_scores,)
-+-+    #             if output_logits:
-+-+    #                 raw_logits += (next_token_logits,)
-+-+    #             if output_attentions:
-+-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+-+    #                 if self.config.is_encoder_decoder:
-+-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+-+                
-+-+    #             if output_hidden_states:
-+-+    #                 hidden = (
-+-+    #                     outputs.decoder_hidden_states
-+-+    #                     if self.config.is_encoder_decoder
-+-+    #                     else outputs.hidden_states
-+-+    #                 )
-+-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+-+            
-+-+    #         # token selection
-+-+    #         if do_sample:
-+-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+-+    #         else:
-+-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+-+            
-+-+    #         # finished sentences should have their next token be a padding token
-+-+    #         if has_eos_stopping_criteria:
-+-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+-+            
-+-+    #         # update generated ids, model inputs, and length for next step
-+-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+-+    #         if streamer is not None:
-+-+    #             streamer.put(next_tokens)
-+-+            
-+-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-+-+    #             outputs,
-+-+    #             model_kwargs,
-+-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+-+    #         )
-+-+            
-+-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+-+    #         cur_len += 1
-+-+            
-+-+    #         if _record_time:
-+-+    #             import time as time_module
-+-+    #             infer_stop = time_module.time()
-+-+    #             time_record.append(infer_stop - infer_start)
-+-+            
-+-+    #         del outputs
-+-+        
-+-+    #     average_infer_time = None
-+-+    #     if time_record:
-+-+    #         if len(time_record) > 1:
-+-+    #             time_record.pop(0)
-+-+    #         average_infer_time = sum(time_record) / len(time_record)
-+-+    #         print(f'average inference time is: {average_infer_time}')
-+-+    #         print(f'inference time record: {time_record}')
-+-+        
-+-+    #     if streamer is not None:
-+-+    #         streamer.end()
-+-+        
-+-+    #     # 简单判断：打印是否使用了JIT路径
-+-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-+-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-+-+    #     else:
-+-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+-+        
-+-+    #     if return_dict_in_generate:
-+-+    #         if self.config.is_encoder_decoder:
-+-+    #             return GenerateEncoderDecoderOutput(
-+-+    #                 sequences=input_ids,
-+-+    #                 scores=scores,
-+-+    #                 logits=raw_logits,
-+-+    #                 encoder_attentions=encoder_attentions,
-+-+    #                 encoder_hidden_states=encoder_hidden_states,
-+-+    #                 decoder_attentions=decoder_attentions,
-+-+    #                 cross_attentions=cross_attentions,
-+-+    #                 decoder_hidden_states=decoder_hidden_states,
-+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+-+    #                 average_infer_time=average_infer_time
-+-+    #             )
-+-+    #         else:
-+-+    #             return GenerateDecoderOnlyOutput(
-+-+    #                 sequences=input_ids,
-+-+    #                 scores=scores,
-+-+    #                 logits=raw_logits,
-+-+    #                 attentions=decoder_attentions,
-+-+    #                 hidden_states=decoder_hidden_states,
-+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+-+    #                 average_infer_time=average_infer_time
-+-+    #             )
-+-+    #     else:
-+-+    #         return input_ids
-+-+            
-+-+    # def _prepare_cache_for_generation(
-+-+    #     self,
-+-+    #     generation_config,
-+-+    #     model_kwargs,
-+-+    #     assistant_model,
-+-+    #     batch_size,
-+-+    #     max_cache_length,
-+-+    # ):
-+-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+-+    #         generation_config.cache_implementation = "static"
-+-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+-+        
-+-+    #     if generation_config.cache_implementation == "static":
-+-+    #         base_required_from_max_length = generation_config.max_length + 1
-+-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-+-+    #         min_cache_size = 50
-+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+-+    #         else:
-+-+    #             max_cache_length = max(base_required, min_cache_size)
-+-+            
-+-+    #         original_max_cache_length = max_cache_length
-+-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-+-+            
-+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+-+    #             if max_cache_length > self.config.max_position_embeddings:
-+-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+-+        
-+-+    #     result = super()._prepare_cache_for_generation(
-+-+    #         generation_config=generation_config,
-+-+    #         model_kwargs=model_kwargs,
-+-+    #         assistant_model=assistant_model,
-+-+    #         batch_size=batch_size,
-+-+    #         max_cache_length=max_cache_length,
-+-+    #     )
-+-+        
-+-+    #     if generation_config.cache_implementation == "static":
-+-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+-+    #         created_cache = model_kwargs.get(cache_name)
-+-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+-+    #             if created_cache.max_cache_len < generation_config.max_length:
-+-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+-+        
-+-+    #     return result
-+-+
-+-+
-+-+
-+- 
-+- 
-+- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+--- 
-+-2.27.0
-+-
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
deleted file mode 100644
index bbe6df27..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0005-20251107001commit.patch"
+++ /dev/null
@@ -1,7707 +0,0 @@
-From ab47c0478530d34d2b48200af0453dda94d1ec18 Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Fri, 7 Nov 2025 11:48:18 +0800
-Subject: [PATCH 05/10] 20251107001commit
-
----
- .../models/deepseek/modeling_deepseek.py      |   91 +-
- .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
- .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
- patches/0001-20251104commit.patch             |    2 +-
- patches/0002-20251106commit.patch             |    2 +-
- patches/0003-20261106secondcommit.patch       |    2 +-
- patches/0004-20251106change.patch             | 7498 +++++++++++++++++
- 7 files changed, 7577 insertions(+), 30 deletions(-)
- create mode 100644 patches/0004-20251106change.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 0546f318..8831e4b7 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
-     #         expert_cache += expert_out * weight
-     #     return expert_cache
- 
--    @no_grad()
--    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--        # x 的 shape: (1, hidden_size)
--        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--
--        # 1. 收集所有需要的专家层
--        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--        selected_experts = [self.experts[i] for i in flat_expert_indices]
--
--        # 2. 并行计算所有专家的输出
--        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--        # ops.cat 会将它们堆叠成一个新的 Tensor
--        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--
--        # 3. 使用矩阵乘法进行加权求和
--        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--        # 最终结果 final_output 的 shape: (1, hidden_size)
--        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+    # @no_grad()
-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+    #     # x 的 shape: (1, hidden_size)
-+    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+
-+    #     # 1. 收集所有需要的专家层
-+    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-+
-+    #     # 2. 并行计算所有专家的输出
-+    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-+    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+
-+    #     # 3. 使用矩阵乘法进行加权求和
-+    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-+    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-         
--        return final_output
-+    #     return final_output
- 
- 
-     # @no_grad()
-@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
-             )
- 
-         return expert_cache
-+# 放置在 DeepseekMoE 类中
-+    @no_grad()
-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+        """
-+        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+
-+        Args:
-+            x (Tensor): 输入张量, shape: (1, hidden_size)
-+            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+        """
-+        top_k, _ = flat_expert_weights.shape
-+        hidden_size = x.shape[-1]
-+
-+        # 1. 将所有专家的权重堆叠起来
-+        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+        
-+        # 2. "收集" 所需的专家权重
-+        selected_gate_w = stacked_gate_w[flat_expert_indices]
-+        selected_up_w = stacked_up_w[flat_expert_indices]
-+        selected_down_w = stacked_down_w[flat_expert_indices]
-+        
-+        # 3. 准备输入
-+        x_expanded = x.expand((top_k, 1, hidden_size))
-+        
-+        # 4. 并行计算 gate_proj 和 up_proj
-+        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+
-+        # 5. 计算中间状态
-+        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+        
-+        # 6. 并行计算 down_proj
-+        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+        #    --- [FIX] ---
-+        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+        #    --- [FIX END] ---
-+        
-+        # 7. 根据路由权重进行加权求和
-+        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+
-+        return weighted_sum
-+
-+
- 
-         # @no_grad()
-         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index ebd7782e..913a7609 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
- # Copied from transformers.models.llama.modeling_llama.rotate_half
- def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
--    x1 = x[..., : x.shape[-1] // 2]
--    x2 = x[..., x.shape[-1] // 2 :]
-+    # x1 = x[..., : x.shape[-1] // 2]
-+    # x2 = x[..., x.shape[-1] // 2 :]
-     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
- 
-diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-index d059dcbe..2b217b64 100644
---- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
- # Copied from transformers.models.llama.modeling_llama.rotate_half
- def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
--    x1 = x[..., : x.shape[-1] // 2]
--    x2 = x[..., x.shape[-1] // 2 :]
-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+    # x1 = x[..., : x.shape[-1] // 2]
-+    # x2 = x[..., x.shape[-1] // 2 :]
-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
- 
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-index 78f22642..0a0ef2d7 100644
---- a/patches/0001-20251104commit.patch
-+++ b/patches/0001-20251104commit.patch
-@@ -1,7 +1,7 @@
- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/3] 20251104commit
-+Subject: [PATCH 1/4] 20251104commit
- 
- ---
-  mindnlp/transformers/cache_utils.py           |  28 +-
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-index 22b65dd5..5185270c 100644
---- a/patches/0002-20251106commit.patch
-+++ b/patches/0002-20251106commit.patch
-@@ -1,7 +1,7 @@
- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/3] 20251106commit
-+Subject: [PATCH 2/4] 20251106commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-index 966529e4..3e05f821 100644
---- a/patches/0003-20261106secondcommit.patch
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -1,7 +1,7 @@
- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/3] 20261106secondcommit
-+Subject: [PATCH 3/4] 20261106secondcommit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-new file mode 100644
-index 00000000..88a1aef4
---- /dev/null
-+++ b/patches/0004-20251106change.patch
-@@ -0,0 +1,7498 @@
-+From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Thu, 6 Nov 2025 15:48:09 +0800
-+Subject: [PATCH 4/4] 20251106change
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |  189 +-
-+ patches/0001-20251104commit.patch             | 1272 +++++++
-+ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-+ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-+ 4 files changed, 7244 insertions(+), 186 deletions(-)
-+ create mode 100644 patches/0001-20251104commit.patch
-+ create mode 100644 patches/0002-20251106commit.patch
-+ create mode 100644 patches/0003-20261106secondcommit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index 2f9192bf..0546f318 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-+ 
-+         return attn_output, attn_weights, past_key_value
-+ 
-+-# class DeepseekFlashAttention(nn.Module):
-+-#     """
-+-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+-
-+-#     This class is designed as a drop-in replacement for DeepseekAttention.
-+-#     """
-+-
-+-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+-#         super().__init__()
-+-#         self.config = config
-+-#         self.layer_idx = layer_idx
-+-#         if layer_idx is None:
-+-#             logger.warning(
-+-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+-#                 "when creating this class."
-+-#             )
-+-
-+-#         self.attention_dropout = config.attention_dropout
-+-#         self.hidden_size = config.hidden_size
-+-#         self.num_heads = config.num_attention_heads
-+-#         self.head_dim = self.hidden_size // self.num_heads
-+-#         self.num_key_value_heads = config.num_key_value_heads
-+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+-#         self.max_position_embeddings = config.max_position_embeddings
-+-#         self.rope_theta = config.rope_theta
-+-#         self.is_causal = True
-+-
-+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+-#             raise ValueError(
-+-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+-#                 f" and `num_heads`: {self.num_heads})."
-+-#             )
-+-
-+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+-#         self._init_rope()
-+-
-+-#     def _init_rope(self):
-+-#         if self.config.rope_scaling is None:
-+-#             self.rotary_emb = DeepseekRotaryEmbedding(
-+-#                 self.head_dim,
-+-#                 max_position_embeddings=self.max_position_embeddings,
-+-#                 base=self.rope_theta,
-+-#             )
-+-#         else:
-+-#             scaling_type = self.config.rope_scaling["type"]
-+-#             scaling_factor = self.config.rope_scaling["factor"]
-+-#             if scaling_type == "linear":
-+-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+-#                     self.head_dim,
-+-#                     max_position_embeddings=self.max_position_embeddings,
-+-#                     scaling_factor=scaling_factor,
-+-#                     base=self.rope_theta,
-+-#                 )
-+-#             elif scaling_type == "dynamic":
-+-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+-#                     self.head_dim,
-+-#                     max_position_embeddings=self.max_position_embeddings,
-+-#                     scaling_factor=scaling_factor,
-+-#                     base=self.rope_theta,
-+-#                 )
-+-#             else:
-+-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+-
-+-#     def forward(
-+-#         self,
-+-#         hidden_states: mindspore.Tensor,
-+-#         attention_mask: Optional[mindspore.Tensor] = None,
-+-#         position_ids: Optional[mindspore.Tensor] = None,
-+-#         past_key_value: Optional[Cache] = None,
-+-#         output_attentions: bool = False,
-+-#         use_cache: bool = False,
-+-#         **kwargs,
-+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+-#         if "padding_mask" in kwargs:
-+-#             warnings.warn(
-+-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+-#             )
-+-        
-+-#         if output_attentions:
-+-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+-
-+-#         bsz, q_len, _ = hidden_states.shape
-+-
-+-#         if self.config.pretraining_tp > 1:
-+-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+-
-+-#         query_states = self.q_proj(hidden_states)
-+-#         key_states = self.k_proj(hidden_states)
-+-#         value_states = self.v_proj(hidden_states)
-+-
-+-#         # Reshape for multi-head attention
-+-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+-
-+-#         kv_seq_len = key_states.shape[-2]
-+-#         if past_key_value is not None:
-+-#             if self.layer_idx is None:
-+-#                 raise ValueError(
-+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+-#                     "with a layer index."
-+-#                 )
-+-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+-        
-+-#         # Apply Rotary Positional Embedding
-+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+-
-+-#         if past_key_value is not None:
-+-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+-
-+-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+-        
-+-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+-        
-+-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+-
-+-#         # Convert attention_mask for flash_attention_score
-+-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+-#         if attention_mask is not None:
-+-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+-#                 raise ValueError(
-+-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+-#                 )
-+-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+-#         else:
-+-#             attn_mask_for_fa = None
-+-        
-+-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+-
-+-#         # Call the fused flash_attention_score operator
-+-#         attn_output = mindspore.ops.flash_attention_score(
-+-#             query=query_states_for_fa,
-+-#             key=key_states_for_fa,
-+-#             value=value_states_for_fa,
-+-#             head_num=self.num_heads, # This is N1, the number of query heads
-+-#             input_layout='BSH',
-+-#             attn_mask=attn_mask_for_fa,
-+-#             keep_prob=keep_prob,
-+-#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+-#             sparse_mode=0 # Default mask mode
-+-#         )
-+-        
-+-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+-#         attn_output = self.o_proj(attn_output)
-+-        
-+-#         # Flash Attention does not return attention weights
-+-#         attn_weights = None
-+-
-+-#         return attn_output, attn_weights, past_key_value
-+ 
-+ class DeepseekFlashAttention(nn.Module):
-+     """
-+@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-+         super().__init__()
-+         self.hidden_size = config.hidden_size
-+ 
-+-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+-            config=config, layer_idx=layer_idx
-+-        )
-++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-++            # config=config, layer_idx=layer_idx
-++        # )
-+ 
-+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+             config=config, layer_idx=layer_idx
-+@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-+         return outputs
-+ 
-+ 
-+-
-+ class DeepseekPreTrainedModel(PreTrainedModel):
-+     config_class = DeepseekConfig
-+     base_model_prefix = "model"
-+@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+         # Initialize weights and apply final processing
-+         self.post_init()
-+         self.warm_up = False
-+-        #@dwj
-+-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+-            self.num_layers,
-+-            self.num_attention_heads,
-+-            self.head_dim,
-+-            batch_size=1,
-+-            max_length=self.max_length,
-+-            dtype=mindspore.float16
-+-        )
-+-
-+-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+-        key_cache = []
-+-        value_cache = []
-+-        for _ in range(num_layers):
-+-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+-            key_cache.append(k)
-+-            value_cache.append(v)
-+-        return key_cache, value_cache
-+-
-+ 
-+     def warmup_moe_model_deep(self):
-+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+new file mode 100644
-+index 00000000..78f22642
-+--- /dev/null
-++++ b/patches/0001-20251104commit.patch
-+@@ -0,0 +1,1272 @@
-++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++Subject: [PATCH 1/3] 20251104commit
-++
-++---
-++ mindnlp/transformers/cache_utils.py           |  28 +-
-++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++
-++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++index cadd2e04..02f8d4be 100644
-++--- a/mindnlp/transformers/cache_utils.py
-+++++ b/mindnlp/transformers/cache_utils.py
-++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++             # k_out[:, :, cache_position] = key_states
-++             # v_out[:, :, cache_position] = value_states
-++-            if ON_ORANGE_PI:
-++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++-            else:
-++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++-
-+++            # if ON_ORANGE_PI:
-+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++            # else:
-+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++            if cache_position.ndim > 1:
-+++                cache_position = cache_position.flatten()
-+++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++                cache_position = cache_position.int()
-+++            
-+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++            k_out[:, :, cache_position] = key_states
-+++            v_out[:, :, cache_position] = value_states
-+++            
-++         return k_out, v_out
-++ 
-++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index c695b944..d8303e45 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++-    x1 = x[..., : x.shape[-1] // 2]
-++-    x2 = x[..., x.shape[-1] // 2 :]
-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++    # x1 = x[..., : x.shape[-1] // 2]
-+++    # x2 = x[..., x.shape[-1] // 2 :]
-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++         if self.training:
-++             raise NotImplementedError("Training is not supported yet.")
-++         else:
-++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++-        if self.config.n_shared_experts is not None:
-++-            y = y + self.shared_experts(identity)
-++-        return y
-+++            # @lwx
-+++            if orig_shape[1] == 1:
-+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++                y=y.view(*orig_shape)
-+++                if self.config.n_shared_experts is not None:
-+++                    y = y + self.shared_experts(identity)
-+++                return y
-+++            else:
-+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++                if self.config.n_shared_experts is not None:
-+++                    y = y + self.shared_experts(identity)
-+++                return y
-+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++        # if self.config.n_shared_experts is not None:
-+++        #     y = y + self.shared_experts(identity)
-+++        # return y
-+++        
-+++    @no_grad()
-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++
-+++        expert_cache = ops.zeros_like(x)
-+++        for i in range(self.num_experts_per_tok):
-+++            expert_id = flat_expert_indices[i].item()
-+++            weight = flat_expert_weights[i].item()
-+++            expert = self.experts[expert_id]
-+++            expert_out = expert(x)
-+++            expert_cache += expert_out * weight
-+++        return expert_cache
-++ 
-++     @no_grad()
-++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-        # expert_cache = torch.zeros_like(x)
-++-        # idxs = flat_expert_indices.argsort()
-++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++-        # token_idxs = idxs // self.num_experts_per_tok
-++-        # for i, end_idx in enumerate(tokens_per_expert):
-++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++-        #     if start_idx == end_idx:
-++-        #         continue
-++-        #     expert = self.experts[i]
-++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++-        #     expert_tokens = x[exp_token_idx]
-++-        #     expert_out = expert(expert_tokens)
-++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++-        # return expert_cache
-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++         expert_cache = ops.zeros_like(x)
-++         idxs = flat_expert_indices.argsort()
-++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++         token_idxs = idxs // self.num_experts_per_tok
-+++
-++         for i, end_idx in enumerate(tokens_per_expert):
-++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++             if start_idx == end_idx:
-++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++             expert_out = expert(expert_tokens)
-++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++
-++         return expert_cache
-+++        
-+++    # @no_grad()
-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     # expert_cache = torch.zeros_like(x)
-+++    #     # idxs = flat_expert_indices.argsort()
-+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++    #     #     if start_idx == end_idx:
-+++    #     #         continue
-+++    #     #     expert = self.experts[i]
-+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #     #     expert_tokens = x[exp_token_idx]
-+++    #     #     expert_out = expert(expert_tokens)
-+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++    #     # return expert_cache
-+++    #     expert_cache = ops.zeros_like(x)
-+++    #     idxs = flat_expert_indices.argsort()
-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++    #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++    #         if start_idx == end_idx:
-+++    #             continue
-+++    #         expert = self.experts[i]
-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #         expert_tokens = x[exp_token_idx]
-+++    #         expert_out = expert(expert_tokens)
-+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++
-+++    #     return expert_cache
-+++    # @no_grad()
-+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     expert_cache = ops.zeros_like(x)
-+++
-+++    #     # 排序保证顺序一致
-+++    #     idxs = flat_expert_indices.argsort()
-+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++    #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++    #     # 找出有 token 的专家
-+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++
-+++    #     for i in active_experts.tolist():
-+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++    #         end_idx = tokens_per_expert[i]
-+++    #         if start_idx == end_idx:  # 没有 token
-+++    #             continue
-+++
-+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++    #         expert_tokens = x[exp_token_idx]
-+++    #         expert_out = self.experts[i](expert_tokens)
-+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++
-+++    #         expert_cache = mindspore.mint.scatter_add(
-+++    #             expert_cache,
-+++    #             0,
-+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++    #             expert_out
-+++    #         )
-+++
-+++    #     return expert_cache
-+++
-+++
-++ 
-++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++ #     """
-++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++ 
-++         # Initialize weights and apply final processing
-++         self.post_init()
-+++        self.warm_up = False
-+++
-+++    def warmup_moe_model_deep(self):
-+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++        test_texts = [
-+++            "warmup short",
-+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++        ]
-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++        if tokenizer is None:
-+++            from mindnlp.transformers import AutoTokenizer
-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++            self._warmup_tokenizer = tokenizer
-+++
-+++        for text in test_texts:
-+++            inputs = tokenizer(text, return_tensors="ms")
-+++            with mindspore._no_grad():
-+++                _ = self(**inputs, use_cache=False)
-+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++ 
-++     def get_input_embeddings(self):
-++         return self.model.embed_tokens
-++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++         ```"""
-+++        if not self.warm_up:
-+++            self.warm_up = True
-+++            self.warmup_moe_model_deep()
-+++
-++         output_attentions = (
-++             output_attentions
-++             if output_attentions is not None
-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++index 3cbf820e..d4c6b651 100644
-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++@@ -18,7 +18,6 @@
-++ # See the License for the specific language governing permissions and
-++ # limitations under the License.
-++ """MindSpore Qwen2MoE model."""
-++-
-++ import math
-++ from typing import List, Optional, Tuple, Union
-++ 
-++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++     TokenClassifierOutput,
-++ )
-++ from ...modeling_utils import PreTrainedModel
-+++from ...generation import GenerationMixin
-++ from ....utils import logging
-++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++ 
-++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++         self.variance_epsilon = eps
-++ 
-++     def forward(self, hidden_states):
-+++        # @dwj
-+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++        # @lwx
-+++        # if not self.training :
-+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++         input_dtype = hidden_states.dtype
-++         hidden_states = hidden_states.to(mindspore.float32)
-++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++@@ -234,6 +239,8 @@ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++     x1 = x[..., : x.shape[-1] // 2]
-++     x2 = x[..., x.shape[-1] // 2 :]
-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++         self.config = config
-++         self.hidden_size = config.hidden_size
-++         self.intermediate_size = intermediate_size
-+++        
-++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++         self.act_fn = ACT2FN[config.hidden_act]
-++ 
-++     def forward(self, x):
-++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++-
-++ 
-+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++        # @lwx 
-+++        # gate_up_output = self.gate_up_proj(x)
-+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++        # return self.down_proj(swiglu_output)
-+++
-+++    # def forward(self, x):
-+++    #     gate_proj_out = self.gate_proj(x)
-+++    #     up_proj_out = self.up_proj(x)
-+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++    #     return self.down_proj(swiglu_out)
-+++        
-++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++     """
-++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++         use_cache: bool = False,
-++         cache_position: Optional[mindspore.Tensor] = None,
-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++        
-+++
-++         bsz, q_len, _ = hidden_states.shape
-++ 
-++         query_states = self.q_proj(hidden_states)
-++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++                     "with a layer index."
-++                 )
-++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++            if isinstance(past_key_value, StaticCache):
-+++                kv_seq_len = key_states.shape[-2]
-+++            else:
-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++ 
-++         if past_key_value is not None:
-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++            
-+++            if isinstance(past_key_value, StaticCache):
-+++                kv_seq_len = key_states.shape[-2]
-++ 
-++         # repeat k/v heads if n_kv_heads < n_heads
-++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++-
-+++        
-++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++ 
-++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++-            raise ValueError(
-++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++-                f" {attn_weights.shape}"
-++-            )
-++-
-++-        if attention_mask is not None:  # no matter the length, we just slice it
-++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++        if attention_mask is not None:
-+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++             attn_weights = attn_weights + causal_mask
-++ 
-++         # upcast attention to fp32
-++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++ 
-++         attn_output = self.o_proj(attn_output)
-++-
-+++        # @lwx
-+++        
-+++        # max_seq_len = self.max_position_embeddings  # 2048
-+++
-+++        # if attention_mask is not None:
-+++        #     # attention_mask: [B, 1, Sq, Sk]
-+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++
-+++        #     # pad 到 [max_seq_len, max_seq_len]
-+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++        #     global_attention_mask = padded_mask
-+++        # else:
-+++        #     global_attention_mask = None
-+++
-+++
-+++        # sparse_mode=3
-+++        # attn_output = mindspore.ops.flash_attention_score(
-+++        #     query=query_states,
-+++        #     key=key_states,
-+++        #     value=value_states,
-+++        #     real_shift=None,
-+++        #     padding_mask=None,
-+++
-+++        #     head_num=self.num_heads,
-+++        #     attn_mask=global_attention_mask,
-+++        #     keep_prob=1.0 - self.attention_dropout,
-+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++        #     input_layout="BNSD",
-+++        #     pre_tokens=2147483647,
-+++        #     next_tokens=2147483647,
-+++        #     inner_precise=0,
-+++        #     drop_mask=None,
-+++        #     prefix=None,
-+++        #     actual_seq_qlen=None,
-+++        #     actual_seq_kvlen=None,
-+++        #     sparse_mode=sparse_mode,
-+++        # )
-++         if not output_attentions:
-++             attn_weights = None
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-++ 
-+++class Qwen2MoeFlashAttention(nn.Module):
-+++    """
-+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++
-+++    关键改动:
-+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++       直接传入原始的 key 和 value 张量效率更高。
-+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++    """
-+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++        super().__init__()
-+++        self.config = config
-+++        self.layer_idx = layer_idx
-+++        self.hidden_size = config.hidden_size
-+++        self.num_heads = config.num_attention_heads
-+++        self.head_dim = self.hidden_size // self.num_heads
-+++        self.num_key_value_heads = config.num_key_value_heads
-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++        self.max_position_embeddings = config.max_position_embeddings
-+++        self.rope_theta = config.rope_theta
-+++        self.attention_dropout = config.attention_dropout
-+++
-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++            raise ValueError(
-+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++            )
-+++
-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++
-+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++            self.head_dim,
-+++            max_position_embeddings=self.max_position_embeddings,
-+++            base=self.rope_theta,
-+++        )
-+++
-+++    def forward(
-+++        self,
-+++        hidden_states: mindspore.Tensor,
-+++        attention_mask: Optional[mindspore.Tensor] = None,
-+++        position_ids: Optional[mindspore.Tensor] = None,
-+++        past_key_value: Optional[Cache] = None,
-+++        output_attentions: bool = False,
-+++        use_cache: bool = False,
-+++        cache_position: Optional[mindspore.Tensor] = None,
-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++        bsz, q_len, _ = hidden_states.shape
-+++
-+++        # 1. 线性投射 Q, K, V
-+++        query_states = self.q_proj(hidden_states)
-+++        key_states = self.k_proj(hidden_states)
-+++        value_states = self.v_proj(hidden_states)
-+++
-+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++        # 3. RoPE 旋转位置编码
-+++        kv_seq_len = key_states.shape[-2]
-+++        if past_key_value is not None:
-+++            if self.layer_idx is None:
-+++                raise ValueError(
-+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++                    "with a layer index."
-+++                )
-+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++                if cache_position.shape[0] == 1:
-+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++                    kv_seq_len = past_seen_tokens + 1
-+++                else:
-+++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++            else:
-+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++        # 4. KV 缓存更新
-+++        if past_key_value is not None:
-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++            key_states, value_states = past_key_value.update(
-+++                key_states, value_states, self.layer_idx, cache_kwargs
-+++            )
-+++            
-+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++                if cache_position.shape[0] == 1:
-+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++                    kv_seq_len = key_states.shape[-2]
-+++
-+++        # 5. [重要] 准备 Attention Mask
-+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++        fa_attention_mask = None
-+++        if attention_mask is not None:
-+++            # 截取与当前key长度匹配的部分
-+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++            fa_attention_mask = (mask_slice != 0)
-+++
-+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++        input_dtype = query_states.dtype
-+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++            query_states = query_states.to(mindspore.float16)
-+++            key_states = key_states.to(mindspore.float16)
-+++            value_states = value_states.to(mindspore.float16)
-+++
-+++        # 6. [核心] 调用 flash_attention_score 算子
-+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++        attn_output = mindspore.ops.flash_attention_score(
-+++            query=query_states,
-+++            key=key_states,
-+++            value=value_states,
-+++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++            attn_mask=fa_attention_mask,
-+++            keep_prob=1.0 - self.attention_dropout,
-+++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++            input_layout="BNSD",
-+++            sparse_mode=0 # 使用 defaultMask 模式
-+++        )
-+++
-+++        # 恢复原始数据类型
-+++        attn_output = attn_output.to(input_dtype)
-+++
-+++        # 7. 调整输出形状
-+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++        attn_output = self.o_proj(attn_output)
-+++
-+++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++        attn_weights = None
-+++        if output_attentions:
-+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++        return attn_output, attn_weights, past_key_value
-+++
-+++    # def forward(
-+++    #     self,
-+++    #     hidden_states: mindspore.Tensor,
-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++    #     past_key_value: Optional[Cache] = None,
-+++    #     output_attentions: bool = False,
-+++    #     use_cache: bool = False,
-+++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++    #     bsz, q_len, _ = hidden_states.shape
-+++
-+++    #     # 1. 线性投射 Q, K, V
-+++    #     query_states = self.q_proj(hidden_states)
-+++    #     key_states = self.k_proj(hidden_states)
-+++    #     value_states = self.v_proj(hidden_states)
-+++
-+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++    #     # 3. RoPE 旋转位置编码
-+++    #     kv_seq_len = key_states.shape[-2]
-+++    #     if past_key_value is not None:
-+++    #         if self.layer_idx is None:
-+++    #             raise ValueError(
-+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++    #                 "with a layer index."
-+++    #             )
-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++    #     # 4. KV 缓存更新
-+++    #     if past_key_value is not None:
-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++    #         key_states, value_states = past_key_value.update(
-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++    #         )
-+++
-+++    #     # 5. 准备 Attention Mask
-+++    #     fa_attention_mask = None
-+++    #     if attention_mask is not None:
-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++    #         fa_attention_mask = (mask_slice != 0)
-+++
-+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++    #     input_dtype = query_states.dtype
-+++
-+++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++    #     attn_output = mindspore.ops.flash_attention_score(
-+++    #         query=query_states,
-+++    #         key=key_states,
-+++    #         value=value_states,
-+++    #         head_num=self.num_heads,
-+++    #         attn_mask=fa_attention_mask,
-+++    #         keep_prob=1.0 - self.attention_dropout,
-+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++    #         input_layout="BNSD",
-+++    #         sparse_mode=0,
-+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++    #         inner_precise=1
-+++    #     )
-+++
-+++    #     # 恢复原始数据类型
-+++    #     attn_output = attn_output.to(input_dtype)
-+++
-+++    #     # 7. 调整输出形状
-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++    #     attn_output = self.o_proj(attn_output)
-+++
-+++    #     attn_weights = None
-+++    #     if output_attentions:
-+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++    #     return attn_output, attn_weights, past_key_value
-+++
-+++    # def forward(
-+++    #     self,
-+++    #     hidden_states: mindspore.Tensor,
-+++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++    #     past_key_value: Optional[Cache] = None,
-+++    #     output_attentions: bool = False,
-+++    #     use_cache: bool = False,
-+++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++    #     bsz, q_len, _ = hidden_states.shape
-+++
-+++    #     query_states = self.q_proj(hidden_states)
-+++    #     key_states = self.k_proj(hidden_states)
-+++    #     value_states = self.v_proj(hidden_states)
-+++
-+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++    #     kv_seq_len = key_states.shape[-2]
-+++    #     if past_key_value is not None:
-+++    #         if self.layer_idx is None:
-+++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++    #     if past_key_value is not None:
-+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++    #         key_states, value_states = past_key_value.update(
-+++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++    #         )
-+++
-+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++
-+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++    #     # <--- 修改结束 ---
-+++
-+++    #     fa_attention_mask = None
-+++    #     if attention_mask is not None:
-+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++    #         fa_attention_mask = (mask_slice != 0)
-+++
-+++    #     input_dtype = query_states.dtype
-+++
-+++    #     attn_output = mindspore.ops.flash_attention_score(
-+++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++    #         key=key_states,
-+++    #         value=value_states,
-+++    #         head_num=self.num_heads,
-+++    #         attn_mask=fa_attention_mask,
-+++    #         keep_prob=1.0 - self.attention_dropout,
-+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++    #         input_layout="BNSD",
-+++    #         sparse_mode=0,
-+++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++    #     )
-+++
-+++    #     attn_output = attn_output.to(input_dtype)
-+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++    #     attn_output = self.o_proj(attn_output)
-+++
-+++    #     attn_weights = None
-+++    #     if output_attentions:
-+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++
-+++    #     return attn_output, attn_weights, past_key_value
-+++
-++ QWEN2MOE_ATTENTION_CLASSES = {
-++     "eager": Qwen2MoeAttention,
-+++    "flash-attention": Qwen2MoeFlashAttention,
-++ }
-++ 
-++ 
-++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++ 
-+++    #@dwj
-+++    # 只遍历激活的专家，而非全部专家
-++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++-        # router_logits: (batch * sequence_length, n_experts)
-++-        router_logits = self.gate(hidden_states)
-++-
-++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-        if self.norm_topk_prob:
-++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        # we cast back to the input dtype
-++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++-
-++-        final_hidden_states = ops.zeros(
-++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++-        )
-++-
-++-        # One hot encode the selected experts to create an expert mask
-++-        # this will be used to easily index which expert is going to be sollicitated
-++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++-
-++-        # Loop over all available experts in the model and perform the computation on each expert
-++-        for expert_idx in range(self.num_experts):
-++-            expert_layer = self.experts[expert_idx]
-++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++-
-++-            # Index the correct hidden states and compute the expert hidden state for
-++-            # the current expert. We need to make sure to multiply the output hidden
-++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++-            if 0 not in idx.shape:
-++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++-
-++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++-                # the `top_x` tensor here.
-++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++-
-++-        shared_expert_output = self.shared_expert(hidden_states)
-++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++-
-++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++            num_tokens = hidden_states_reshaped.shape[0]
-+++
-+++            router_logits = self.gate(hidden_states_reshaped)
-+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++
-+++            if self.norm_topk_prob:
-+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++            
-+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++            flat_selected_experts = selected_experts.flatten()
-+++            
-+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++            token_indices = broadcasted_token_indices.flatten()
-+++            
-+++            active_experts = ops.unique(flat_selected_experts)
-+++            
-+++            for expert_idx_tensor in active_experts:
-+++                expert_idx = expert_idx_tensor.item()
-+++                expert_layer = self.experts[expert_idx]
-+++                
-+++                mask = (flat_selected_experts == expert_idx_tensor)
-+++                selected_token_indices = token_indices[mask]
-+++                selected_routing_weights = routing_weights.flatten()[mask]
-+++                
-+++                current_states = hidden_states_reshaped[selected_token_indices]
-+++                
-+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++                
-+++                final_hidden_states = final_hidden_states.index_add(
-+++                    dim=0,
-+++                    index=selected_token_indices,
-+++                    source=expert_output.to(hidden_states.dtype)
-+++                )
-+++            
-+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++ 
-++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++-        return final_hidden_states, router_logits
-+++            final_hidden_states = final_hidden_states + shared_expert_output
-+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++            
-+++            return final_hidden_states, router_logits
-++ 
-++ 
-++ class Qwen2MoeDecoderLayer(nn.Module):
-++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++ 
-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++ 
-+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++
-++         if (layer_idx not in config.mlp_only_layers) and (
-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++         ):
-++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++     _skip_keys_device_placement = "past_key_values"
-++     _supports_cache_class = True
-+++#lwx
-+++    # _supports_static_cache = True
-++ 
-++     def _init_weights(self, module):
-++         std = self.config.initializer_range
-++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++         return causal_mask
-++ 
-++ 
-++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++     _tied_weights_keys = ["lm_head.weight"]
-++ 
-++     def __init__(self, config):
-++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++         self.num_experts_per_tok = config.num_experts_per_tok
-++         # Initialize weights and apply final processing
-++         self.post_init()
-+++        # @lwx
-+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++        #     self.generation_config.cache_implementation = "static"
-+++        self._warmed_up = False
-+++
-+++    def warmup_moe_model(self):
-+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++        test_texts = [
-+++            "warmup short",
-+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++        ]
-+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++        if tokenizer is None:
-+++            from mindnlp.transformers import AutoTokenizer
-+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++            self._warmup_tokenizer = tokenizer
-+++
-+++        for text in test_texts:
-+++            inputs = tokenizer(text, return_tensors="ms")
-+++            with mindspore._no_grad():
-+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++ 
-++     def get_input_embeddings(self):
-++         return self.model.embed_tokens
-++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++         ```"""
-+++        if not self._warmed_up:
-+++            self._warmed_up = True
-+++            self.warmup_moe_model()
-++ 
-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++         output_router_logits = (
-++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++             }
-++         )
-++         return model_inputs
-+++# @lwx
-+++    # def _decode_one_tokens_logits(
-+++    #     self,
-+++    #     cur_token: mindspore.Tensor,
-+++    #     input_pos: Optional[mindspore.Tensor],
-+++    #     cache_position: mindspore.Tensor,
-+++    #     past_key_values: StaticCache,
-+++    # ) -> mindspore.Tensor:
-+++    #     """
-+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++        
-+++    #     Args:
-+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++    #         input_pos: 输入位置信息，可选
-+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++            
-+++    #     Returns:
-+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++    #     """
-+++    #     # 调用JIT编译的版本
-+++    #     return self.get_decode_one_tokens_logits(
-+++    #         cur_token=cur_token,
-+++    #         input_pos=input_pos,
-+++    #         cache_position=cache_position,
-+++    #         past_key_values=past_key_values,
-+++    #     )
-+++    
-+++    # @mindspore.jit(jit_level='O1')
-+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++    #     """
-+++    #     JIT编译的函数，用于高效的单token解码
-+++    #     使用JIT编译优化以支持静态shape和高效执行
-+++        
-+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++    #     """
-+++    #     outputs = self.model.forward(
-+++    #         input_ids=cur_token,
-+++    #         position_ids=input_pos,
-+++    #         cache_position=cache_position,
-+++    #         past_key_values=past_key_values,
-+++    #         use_cache=True,
-+++    #         return_dict=False,
-+++    #     )
-+++        
-+++    #     hidden_states = outputs[0]
-+++    #     logits = self.lm_head.forward(hidden_states)
-+++    #     logits = logits.float()
-+++        
-+++    #     return logits[:, -1, :]
-+++
-+++    # def _sample(
-+++    #     self,
-+++    #     input_ids: mindspore.Tensor,
-+++    #     logits_processor,
-+++    #     stopping_criteria,
-+++    #     generation_config,
-+++    #     synced_devices: bool,
-+++    #     streamer=None,
-+++    #     logits_warper=None,
-+++    #     **model_kwargs,
-+++    # ):
-+++    #     """
-+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++    #     """
-+++    #     from ...generation.logits_process import LogitsProcessorList
-+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++    #     from mindnlp.core import nn, ops, no_grad
-+++    #     import numpy as np
-+++        
-+++    #     # 检查是否使用 StaticCache
-+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++    #     # 否则，直接调用父类方法
-+++    #     past_key_values = model_kwargs.get("past_key_values")
-+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++
-+++    #     if not isinstance(past_key_values, StaticCache):
-+++    #         # 不使用 StaticCache，直接调用父类方法
-+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++    #         return super()._sample(
-+++    #             input_ids=input_ids,
-+++    #             logits_processor=logits_processor,
-+++    #             stopping_criteria=stopping_criteria,
-+++    #             generation_config=generation_config,
-+++    #             synced_devices=synced_devices,
-+++    #             streamer=streamer,
-+++    #             logits_warper=logits_warper,
-+++    #             **model_kwargs,
-+++    #         )
-+++        
-+++    #     # 使用 StaticCache，进入自定义循环
-+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++    #     pad_token_id = generation_config._pad_token_tensor
-+++    #     output_attentions = generation_config.output_attentions
-+++    #     output_hidden_states = generation_config.output_hidden_states
-+++    #     output_scores = generation_config.output_scores
-+++    #     output_logits = generation_config.output_logits
-+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++    #     max_length = generation_config.max_length
-+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++    #     do_sample = generation_config.do_sample
-+++        
-+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++    #         raise ValueError(
-+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++    #             f"{logits_warper})."
-+++    #         )
-+++        
-+++    #     # init attention / hidden states / scores tuples
-+++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++        
-+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++    #         encoder_hidden_states = (
-+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++    #         )
-+++        
-+++    #     # keep track of which sequences are already finished
-+++    #     batch_size, cur_len = input_ids.shape
-+++    #     this_peer_finished = False
-+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++        
-+++    #     time_record = []
-+++    #     from ....utils.testing_utils import parse_flag_from_env
-+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++        
-+++    #     while self._has_unfinished_sequences(
-+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++    #     ):
-+++    #         if _record_time:
-+++    #             import time as time_module
-+++    #             infer_start = time_module.time()
-+++            
-+++    #         # prepare model inputs
-+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++            
-+++    #         # prepare variable output controls
-+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++            
-+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++    #         cur_cache_position = model_inputs.get("cache_position")
-+++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++    #         cur_input_ids = model_inputs.get("input_ids")
-+++            
-+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++    #             cur_cache_position is not None and 
-+++    #             len(cur_cache_position.shape) > 0 and
-+++    #             cur_cache_position.shape[0] == 1 and
-+++    #             cur_input_ids is not None and
-+++    #             cur_input_ids.shape[1] == 1):
-+++    #             # 使用 JIT 优化的单 token 解码
-+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++    #             if not hasattr(self, '_jit_used'):
-+++    #                 self._jit_used = False
-+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++                
-+++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++    #                 cur_token=cur_input_ids,
-+++    #                 input_pos=model_inputs.get("position_ids"),
-+++    #                 cache_position=cur_cache_position,
-+++    #                 past_key_values=cur_past_key_values,
-+++    #             )
-+++                
-+++    #             # 标记已使用JIT（用于后续判断）
-+++    #             if not self._jit_used:
-+++    #                 self._jit_used = True
-+++                
-+++    #             # 构造兼容的输出对象
-+++    #             class JitOptimizedOutput:
-+++    #                 def __init__(self, logits, config):
-+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++    #                     self.config = config
-+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++    #                     self.cross_attentions = None
-+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++                
-+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++    #         else:
-+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++    #             outputs = self(**model_inputs, return_dict=True)
-+++            
-+++    #         if synced_devices and this_peer_finished:
-+++    #             continue
-+++            
-+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++    #         next_token_logits = outputs.logits[:, -1, :]
-+++            
-+++    #         # pre-process distribution
-+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++    #         if do_sample:
-+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++            
-+++    #         # Store scores, attentions and hidden_states when required
-+++    #         if return_dict_in_generate:
-+++    #             if output_scores:
-+++    #                 scores += (next_token_scores,)
-+++    #             if output_logits:
-+++    #                 raw_logits += (next_token_logits,)
-+++    #             if output_attentions:
-+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++    #                 if self.config.is_encoder_decoder:
-+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++                
-+++    #             if output_hidden_states:
-+++    #                 hidden = (
-+++    #                     outputs.decoder_hidden_states
-+++    #                     if self.config.is_encoder_decoder
-+++    #                     else outputs.hidden_states
-+++    #                 )
-+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++            
-+++    #         # token selection
-+++    #         if do_sample:
-+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++    #         else:
-+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++            
-+++    #         # finished sentences should have their next token be a padding token
-+++    #         if has_eos_stopping_criteria:
-+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++            
-+++    #         # update generated ids, model inputs, and length for next step
-+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++    #         if streamer is not None:
-+++    #             streamer.put(next_tokens)
-+++            
-+++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++    #             outputs,
-+++    #             model_kwargs,
-+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++    #         )
-+++            
-+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++    #         cur_len += 1
-+++            
-+++    #         if _record_time:
-+++    #             import time as time_module
-+++    #             infer_stop = time_module.time()
-+++    #             time_record.append(infer_stop - infer_start)
-+++            
-+++    #         del outputs
-+++        
-+++    #     average_infer_time = None
-+++    #     if time_record:
-+++    #         if len(time_record) > 1:
-+++    #             time_record.pop(0)
-+++    #         average_infer_time = sum(time_record) / len(time_record)
-+++    #         print(f'average inference time is: {average_infer_time}')
-+++    #         print(f'inference time record: {time_record}')
-+++        
-+++    #     if streamer is not None:
-+++    #         streamer.end()
-+++        
-+++    #     # 简单判断：打印是否使用了JIT路径
-+++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++    #     else:
-+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++        
-+++    #     if return_dict_in_generate:
-+++    #         if self.config.is_encoder_decoder:
-+++    #             return GenerateEncoderDecoderOutput(
-+++    #                 sequences=input_ids,
-+++    #                 scores=scores,
-+++    #                 logits=raw_logits,
-+++    #                 encoder_attentions=encoder_attentions,
-+++    #                 encoder_hidden_states=encoder_hidden_states,
-+++    #                 decoder_attentions=decoder_attentions,
-+++    #                 cross_attentions=cross_attentions,
-+++    #                 decoder_hidden_states=decoder_hidden_states,
-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++    #                 average_infer_time=average_infer_time
-+++    #             )
-+++    #         else:
-+++    #             return GenerateDecoderOnlyOutput(
-+++    #                 sequences=input_ids,
-+++    #                 scores=scores,
-+++    #                 logits=raw_logits,
-+++    #                 attentions=decoder_attentions,
-+++    #                 hidden_states=decoder_hidden_states,
-+++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++    #                 average_infer_time=average_infer_time
-+++    #             )
-+++    #     else:
-+++    #         return input_ids
-+++            
-+++    # def _prepare_cache_for_generation(
-+++    #     self,
-+++    #     generation_config,
-+++    #     model_kwargs,
-+++    #     assistant_model,
-+++    #     batch_size,
-+++    #     max_cache_length,
-+++    # ):
-+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++    #         generation_config.cache_implementation = "static"
-+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++        
-+++    #     if generation_config.cache_implementation == "static":
-+++    #         base_required_from_max_length = generation_config.max_length + 1
-+++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++    #         min_cache_size = 50
-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++    #         else:
-+++    #             max_cache_length = max(base_required, min_cache_size)
-+++            
-+++    #         original_max_cache_length = max_cache_length
-+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++            
-+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++    #             if max_cache_length > self.config.max_position_embeddings:
-+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++        
-+++    #     result = super()._prepare_cache_for_generation(
-+++    #         generation_config=generation_config,
-+++    #         model_kwargs=model_kwargs,
-+++    #         assistant_model=assistant_model,
-+++    #         batch_size=batch_size,
-+++    #         max_cache_length=max_cache_length,
-+++    #     )
-+++        
-+++    #     if generation_config.cache_implementation == "static":
-+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++    #         created_cache = model_kwargs.get(cache_name)
-+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++        
-+++    #     return result
-+++
-+++
-+++
-++ 
-++ 
-++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++-- 
-++2.27.0
-++
-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+new file mode 100644
-+index 00000000..22b65dd5
-+--- /dev/null
-++++ b/patches/0002-20251106commit.patch
-+@@ -0,0 +1,3200 @@
-++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Thu, 6 Nov 2025 09:20:38 +0800
-++Subject: [PATCH 2/3] 20251106commit
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-++ 3 files changed, 2689 insertions(+), 305 deletions(-)
-++ create mode 100644 patches/0001-20251104commit.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index d8303e45..73773c22 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-++         #     y = y + self.shared_experts(identity)
-++         # return y
-++         
-+++    # @no_grad()
-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++
-+++    #     expert_cache = ops.zeros_like(x)
-+++    #     for i in range(self.num_experts_per_tok):
-+++    #         expert_id = flat_expert_indices[i].item()
-+++    #         weight = flat_expert_weights[i].item()
-+++    #         expert = self.experts[expert_id]
-+++    #         expert_out = expert(x)
-+++    #         expert_cache += expert_out * weight
-+++    #     return expert_cache
-+++
-++     @no_grad()
-++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++        # x 的 shape: (1, hidden_size)
-+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++
-+++        # 1. 收集所有需要的专家层
-+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++
-+++        # 2. 并行计算所有专家的输出
-+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++        # ops.cat 会将它们堆叠成一个新的 Tensor
-+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++
-+++        # 3. 使用矩阵乘法进行加权求和
-+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++        # 最终结果 final_output 的 shape: (1, hidden_size)
-+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++        
-+++        return final_output
-++ 
-++-        expert_cache = ops.zeros_like(x)
-++-        for i in range(self.num_experts_per_tok):
-++-            expert_id = flat_expert_indices[i].item()
-++-            weight = flat_expert_weights[i].item()
-++-            expert = self.experts[expert_id]
-++-            expert_out = expert(x)
-++-            expert_cache += expert_out * weight
-++-        return expert_cache
-++ 
-++     @no_grad()
-++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-++             key_states = self.k_proj(hidden_states)
-++             value_states = self.v_proj(hidden_states)
-++ 
-++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++        # @lwx
-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-+++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++ 
-++         kv_seq_len = key_states.shape[-2]
-++         if past_key_value is not None:
-++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-++         return attn_output, attn_weights, past_key_value
-++ 
-++ 
-+++# class DeepseekFlashAttention(nn.Module):
-+++#     """
-+++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+++
-+++#     This class is designed as a drop-in replacement for DeepseekAttention.
-+++#     """
-+++
-+++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++#         super().__init__()
-+++#         self.config = config
-+++#         self.layer_idx = layer_idx
-+++#         if layer_idx is None:
-+++#             logger.warning(
-+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++#                 "when creating this class."
-+++#             )
-+++
-+++#         self.attention_dropout = config.attention_dropout
-+++#         self.hidden_size = config.hidden_size
-+++#         self.num_heads = config.num_attention_heads
-+++#         self.head_dim = self.hidden_size // self.num_heads
-+++#         self.num_key_value_heads = config.num_key_value_heads
-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++#         self.max_position_embeddings = config.max_position_embeddings
-+++#         self.rope_theta = config.rope_theta
-+++#         self.is_causal = True
-+++
-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++#             raise ValueError(
-+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++#                 f" and `num_heads`: {self.num_heads})."
-+++#             )
-+++
-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++#         self._init_rope()
-+++
-+++#     def _init_rope(self):
-+++#         if self.config.rope_scaling is None:
-+++#             self.rotary_emb = DeepseekRotaryEmbedding(
-+++#                 self.head_dim,
-+++#                 max_position_embeddings=self.max_position_embeddings,
-+++#                 base=self.rope_theta,
-+++#             )
-+++#         else:
-+++#             scaling_type = self.config.rope_scaling["type"]
-+++#             scaling_factor = self.config.rope_scaling["factor"]
-+++#             if scaling_type == "linear":
-+++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++#                     self.head_dim,
-+++#                     max_position_embeddings=self.max_position_embeddings,
-+++#                     scaling_factor=scaling_factor,
-+++#                     base=self.rope_theta,
-+++#                 )
-+++#             elif scaling_type == "dynamic":
-+++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++#                     self.head_dim,
-+++#                     max_position_embeddings=self.max_position_embeddings,
-+++#                     scaling_factor=scaling_factor,
-+++#                     base=self.rope_theta,
-+++#                 )
-+++#             else:
-+++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++
-+++#     def forward(
-+++#         self,
-+++#         hidden_states: mindspore.Tensor,
-+++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++#         position_ids: Optional[mindspore.Tensor] = None,
-+++#         past_key_value: Optional[Cache] = None,
-+++#         output_attentions: bool = False,
-+++#         use_cache: bool = False,
-+++#         **kwargs,
-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++#         if "padding_mask" in kwargs:
-+++#             warnings.warn(
-+++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++#             )
-+++        
-+++#         if output_attentions:
-+++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+++
-+++#         bsz, q_len, _ = hidden_states.shape
-+++
-+++#         if self.config.pretraining_tp > 1:
-+++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++
-+++#         query_states = self.q_proj(hidden_states)
-+++#         key_states = self.k_proj(hidden_states)
-+++#         value_states = self.v_proj(hidden_states)
-+++
-+++#         # Reshape for multi-head attention
-+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++#         kv_seq_len = key_states.shape[-2]
-+++#         if past_key_value is not None:
-+++#             if self.layer_idx is None:
-+++#                 raise ValueError(
-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++#                     "with a layer index."
-+++#                 )
-+++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++        
-+++#         # Apply Rotary Positional Embedding
-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++#         if past_key_value is not None:
-+++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++
-+++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++        
-+++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++        
-+++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++
-+++#         # Convert attention_mask for flash_attention_score
-+++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+++#         if attention_mask is not None:
-+++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++#                 raise ValueError(
-+++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++#                 )
-+++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+++#         else:
-+++#             attn_mask_for_fa = None
-+++        
-+++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++
-+++#         # Call the fused flash_attention_score operator
-+++#         attn_output = mindspore.ops.flash_attention_score(
-+++#             query=query_states_for_fa,
-+++#             key=key_states_for_fa,
-+++#             value=value_states_for_fa,
-+++#             head_num=self.num_heads, # This is N1, the number of query heads
-+++#             input_layout='BSH',
-+++#             attn_mask=attn_mask_for_fa,
-+++#             keep_prob=keep_prob,
-+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++#             sparse_mode=0 # Default mask mode
-+++#         )
-+++        
-+++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+++#         attn_output = self.o_proj(attn_output)
-+++        
-+++#         # Flash Attention does not return attention weights
-+++#         attn_weights = None
-+++
-+++#         return attn_output, attn_weights, past_key_value
-+++
-+++class DeepseekFlashAttention(nn.Module):
-+++    """
-+++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-+++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-+++    designed for high performance on supported hardware (Ascend).
-+++
-+++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-+++    """
-+++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++        super().__init__()
-+++        self.config = config
-+++        self.layer_idx = layer_idx
-+++        if layer_idx is None:
-+++            logger.warning(
-+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++                "when creating this class."
-+++            )
-+++
-+++        # --- [FIX] Correctly initialize all required attributes ---
-+++        self.attention_dropout = config.attention_dropout
-+++        self.hidden_size = config.hidden_size
-+++        self.num_heads = config.num_attention_heads
-+++        self.head_dim = self.hidden_size // self.num_heads
-+++        self.num_key_value_heads = config.num_key_value_heads
-+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++        self.max_position_embeddings = config.max_position_embeddings
-+++        self.rope_theta = config.rope_theta
-+++        self.is_causal = True
-+++
-+++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++            raise ValueError(
-+++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++                f" and `num_heads`: {self.num_heads})."
-+++            )
-+++
-+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++        
-+++        # This call will now succeed as all attributes are initialized.
-+++        self._init_rope()
-+++
-+++    def _init_rope(self):
-+++        if self.config.rope_scaling is None:
-+++            self.rotary_emb = DeepseekRotaryEmbedding(
-+++                self.head_dim,
-+++                max_position_embeddings=self.max_position_embeddings,
-+++                base=self.rope_theta,
-+++            )
-+++        else:
-+++            scaling_type = self.config.rope_scaling["type"]
-+++            scaling_factor = self.config.rope_scaling["factor"]
-+++            if scaling_type == "linear":
-+++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++                    self.head_dim,
-+++                    max_position_embeddings=self.max_position_embeddings,
-+++                    scaling_factor=scaling_factor,
-+++                    base=self.rope_theta,
-+++                )
-+++            elif scaling_type == "dynamic":
-+++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++                    self.head_dim,
-+++                    max_position_embeddings=self.max_position_embeddings,
-+++                    scaling_factor=scaling_factor,
-+++                    base=self.rope_theta,
-+++                )
-+++            else:
-+++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++
-+++    def forward(
-+++        self,
-+++        hidden_states: mindspore.Tensor,
-+++        attention_mask: Optional[mindspore.Tensor] = None,
-+++        position_ids: Optional[mindspore.Tensor] = None,
-+++        past_key_value: Optional[Cache] = None,
-+++        output_attentions: bool = False,
-+++        use_cache: bool = False,
-+++        **kwargs,
-+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++        if "padding_mask" in kwargs:
-+++            warnings.warn(
-+++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++            )
-+++        if output_attentions:
-+++            warnings.warn(
-+++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-+++            )
-+++
-+++        bsz, q_len, _ = hidden_states.shape
-+++
-+++        if self.config.pretraining_tp > 1:
-+++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++
-+++        query_states = self.q_proj(hidden_states)
-+++        key_states = self.k_proj(hidden_states)
-+++        value_states = self.v_proj(hidden_states)
-+++
-+++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++        kv_seq_len = key_states.shape[-2]
-+++        if past_key_value is not None:
-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++        
-+++        # Apply Rotary Position Embedding
-+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++        if past_key_value is not None:
-+++            cache_kwargs = {"sin": sin, "cos": cos}
-+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++
-+++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-+++        # So we must explicitly repeat the KV heads.
-+++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++
-+++        # Convert attention mask for flash_attention_score
-+++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-+++        if attention_mask is not None:
-+++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++                 raise ValueError(
-+++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++                )
-+++            attn_mask_for_fa = attention_mask < 0
-+++        else:
-+++            attn_mask_for_fa = None
-+++
-+++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++
-+++        # Call the fused operator using the efficient BNSD layout
-+++        attn_output = mindspore.ops.flash_attention_score(
-+++            query=query_states,
-+++            key=key_states,
-+++            value=value_states,
-+++            head_num=self.num_heads,
-+++            input_layout='BNSD', # Specify the correct layout
-+++            attn_mask=attn_mask_for_fa,
-+++            keep_prob=keep_prob,
-+++            scalar_value=1.0 / math.sqrt(self.head_dim)
-+++        )
-+++        
-+++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++        
-+++        # Apply output projection
-+++        attn_output = self.o_proj(attn_output)
-+++
-+++        # Flash attention does not return attention weights, so we return None.
-+++        attn_weights = None
-+++
-+++        return attn_output, attn_weights, past_key_value
-+++
-++ Deepseek_ATTENTION_CLASSES = {
-++     "eager": DeepseekAttention,
-+++    "flash-attention": DeepseekFlashAttention,
-++ }
-++ 
-++ 
-++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-++             config=config, layer_idx=layer_idx
-++         )
-++ 
-+++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+++            config=config, layer_idx=layer_idx
-+++        )
-+++
-++         self.mlp = (
-++             DeepseekMoE(config)
-++             if (
-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++index d4c6b651..bced285c 100644
-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-++ 
-++ import mindspore
-++ import mindnlp.core.nn.functional as F
-++-from mindnlp.core import nn, ops
-+++from mindnlp.core import nn, ops, no_grad
-++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-++ 
-++ from ....common.activations import ACT2FN
-++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++ 
-+++Long_Prompt = False
-+++PROMPT_LENGTH_THRESHOLD = 128
-++ 
-++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-++         return attn_output, attn_weights, past_key_value
-++ 
-++ 
-+++# class Qwen2MoeFlashAttention(nn.Module):
-+++#     """
-+++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++
-+++#     关键改动:
-+++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++#        直接传入原始的 key 和 value 张量效率更高。
-+++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++#         super().__init__()
-+++#         self.config = config
-+++#         self.layer_idx = layer_idx
-+++#         self.hidden_size = config.hidden_size
-+++#         self.num_heads = config.num_attention_heads
-+++#         self.head_dim = self.hidden_size // self.num_heads
-+++#         self.num_key_value_heads = config.num_key_value_heads
-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++#         self.max_position_embeddings = config.max_position_embeddings
-+++#         self.rope_theta = config.rope_theta
-+++#         self.attention_dropout = config.attention_dropout
-+++
-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++#             raise ValueError(
-+++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++#             )
-+++
-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++
-+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++#             self.head_dim,
-+++#             max_position_embeddings=self.max_position_embeddings,
-+++#             base=self.rope_theta,
-+++#         )
-+++
-+++#     def forward(
-+++#         self,
-+++#         hidden_states: mindspore.Tensor,
-+++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++#         position_ids: Optional[mindspore.Tensor] = None,
-+++#         past_key_value: Optional[Cache] = None,
-+++#         output_attentions: bool = False,
-+++#         use_cache: bool = False,
-+++#         cache_position: Optional[mindspore.Tensor] = None,
-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++#         bsz, q_len, _ = hidden_states.shape
-+++
-+++#         # 1. 线性投射 Q, K, V
-+++#         query_states = self.q_proj(hidden_states)
-+++#         key_states = self.k_proj(hidden_states)
-+++#         value_states = self.v_proj(hidden_states)
-+++
-+++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-+++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++#         # 3. RoPE 旋转位置编码
-+++#         kv_seq_len = key_states.shape[-2]
-+++#         if past_key_value is not None:
-+++#             if self.layer_idx is None:
-+++#                 raise ValueError(
-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++#                     "with a layer index."
-+++#                 )
-+++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++#                 if cache_position.shape[0] == 1:
-+++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++#                     kv_seq_len = past_seen_tokens + 1
-+++#                 else:
-+++#                     # prefill 阶段：cache_position 是范围，使用其长度
-+++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++#             else:
-+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++#         # 4. KV 缓存更新
-+++#         if past_key_value is not None:
-+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++#             key_states, value_states = past_key_value.update(
-+++#                 key_states, value_states, self.layer_idx, cache_kwargs
-+++#             )
-+++            
-+++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++#                 if cache_position.shape[0] == 1:
-+++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++#                     kv_seq_len = key_states.shape[-2]
-+++
-+++#         # 5. [重要] 准备 Attention Mask
-+++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++#         fa_attention_mask = None
-+++#         if attention_mask is not None:
-+++#             # 截取与当前key长度匹配的部分
-+++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++#             fa_attention_mask = (mask_slice != 0)
-+++
-+++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++#         input_dtype = query_states.dtype
-+++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++#             query_states = query_states.to(mindspore.float16)
-+++#             key_states = key_states.to(mindspore.float16)
-+++#             value_states = value_states.to(mindspore.float16)
-+++
-+++#         # 6. [核心] 调用 flash_attention_score 算子
-+++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++#         attn_output = mindspore.ops.flash_attention_score(
-+++#             query=query_states,
-+++#             key=key_states,
-+++#             value=value_states,
-+++#             head_num=self.num_heads, # 传入Q的头数(N1)
-+++#             attn_mask=fa_attention_mask,
-+++#             keep_prob=1.0 - self.attention_dropout,
-+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++#             input_layout="BNSD",
-+++#             sparse_mode=0 # 使用 defaultMask 模式
-+++#         )
-+++
-+++#         # 恢复原始数据类型
-+++#         attn_output = attn_output.to(input_dtype)
-+++
-+++#         # 7. 调整输出形状
-+++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++#         attn_output = self.o_proj(attn_output)
-+++
-+++#         # FlashAttention 算子不直接返回注意力权重矩阵
-+++#         attn_weights = None
-+++#         if output_attentions:
-+++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++#         return attn_output, attn_weights, past_key_value
-+++
-+++#     # def forward(
-+++#     #     self,
-+++#     #     hidden_states: mindspore.Tensor,
-+++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-+++#     #     position_ids: Optional[mindspore.Tensor] = None,
-+++#     #     past_key_value: Optional[Cache] = None,
-+++#     #     output_attentions: bool = False,
-+++#     #     use_cache: bool = False,
-+++#     #     cache_position: Optional[mindspore.Tensor] = None,
-+++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++#     #     bsz, q_len, _ = hidden_states.shape
-+++
-+++#     #     # 1. 线性投射 Q, K, V
-+++#     #     query_states = self.q_proj(hidden_states)
-+++#     #     key_states = self.k_proj(hidden_states)
-+++#     #     value_states = self.v_proj(hidden_states)
-+++
-+++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++
-+++#     #     # 3. RoPE 旋转位置编码
-+++#     #     kv_seq_len = key_states.shape[-2]
-+++#     #     if past_key_value is not None:
-+++#     #         if self.layer_idx is None:
-+++#     #             raise ValueError(
-+++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++#     #                 "with a layer index."
-+++#     #             )
-+++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++
-+++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++#     #     # 4. KV 缓存更新
-+++#     #     if past_key_value is not None:
-+++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++#     #         key_states, value_states = past_key_value.update(
-+++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-+++#     #         )
-+++
-+++#     #     # 5. 准备 Attention Mask
-+++#     #     fa_attention_mask = None
-+++#     #     if attention_mask is not None:
-+++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++#     #         fa_attention_mask = (mask_slice != 0)
-+++
-+++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++#     #     input_dtype = query_states.dtype
-+++
-+++#     #     # 6. [核心] 调用 flash_attention_score 算子
-+++#     #     attn_output = mindspore.ops.flash_attention_score(
-+++#     #         query=query_states,
-+++#     #         key=key_states,
-+++#     #         value=value_states,
-+++#     #         head_num=self.num_heads,
-+++#     #         attn_mask=fa_attention_mask,
-+++#     #         keep_prob=1.0 - self.attention_dropout,
-+++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++#     #         input_layout="BNSD",
-+++#     #         sparse_mode=0,
-+++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++#     #         inner_precise=1
-+++#     #     )
-+++
-+++#     #     # 恢复原始数据类型
-+++#     #     attn_output = attn_output.to(input_dtype)
-+++
-+++#     #     # 7. 调整输出形状
-+++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++#     #     attn_output = self.o_proj(attn_output)
-+++
-+++#     #     attn_weights = None
-+++#     #     if output_attentions:
-+++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++
-+++#     #     return attn_output, attn_weights, past_key_value
-+++
-+++
-++ class Qwen2MoeFlashAttention(nn.Module):
-++     """
-++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++-
-++-    关键改动:
-++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++-       直接传入原始的 key 和 value 张量效率更高。
-++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-+++
-+++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-+++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-+++    完全使用模型的低精度数据类型（如 float16）进行计算，
-+++    以达到理论上的最高执行速度。
-++     """
-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++         super().__init__()
-++         self.config = config
-++         self.layer_idx = layer_idx
-+++        if layer_idx is None:
-+++            logger.warning_once(
-+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-+++            )
-+++
-++         self.hidden_size = config.hidden_size
-++         self.num_heads = config.num_attention_heads
-++         self.head_dim = self.hidden_size // self.num_heads
-++         self.num_key_value_heads = config.num_key_value_heads
-++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++         self.max_position_embeddings = config.max_position_embeddings
-++         self.rope_theta = config.rope_theta
-++         self.attention_dropout = config.attention_dropout
-++ 
-++-        if (self.head_dim * self.num_heads) != self.hidden_size:
-++-            raise ValueError(
-++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++-            )
-++-
-++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-++         key_states = self.k_proj(hidden_states)
-++         value_states = self.v_proj(hidden_states)
-++ 
-++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++-        # query:   [B, S, H*D] -> [B, N1, S, D]
-++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++        # 2. 调整形状以匹配 BNSD 布局
-++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-
-++-        # 3. RoPE 旋转位置编码
-+++        
-+++        # 3. RoPE 和 KV 缓存
-++         kv_seq_len = key_states.shape[-2]
-++         if past_key_value is not None:
-++-            if self.layer_idx is None:
-++-                raise ValueError(
-++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-                    "with a layer index."
-++-                )
-++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++-                if cache_position.shape[0] == 1:
-++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++-                    kv_seq_len = past_seen_tokens + 1
-++-                else:
-++-                    # prefill 阶段：cache_position 是范围，使用其长度
-++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++-            else:
-++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-
-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++        
-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++ 
-++-        # 4. KV 缓存更新
-++         if past_key_value is not None:
-++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-            key_states, value_states = past_key_value.update(
-++-                key_states, value_states, self.layer_idx, cache_kwargs
-++-            )
-++-            
-++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++-                if cache_position.shape[0] == 1:
-++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++-                    kv_seq_len = key_states.shape[-2]
-++-
-++-        # 5. [重要] 准备 Attention Mask
-++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++
-+++        # 4. 准备 Attention Mask
-++         fa_attention_mask = None
-++         if attention_mask is not None:
-++-            # 截取与当前key长度匹配的部分
-++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++             fa_attention_mask = (mask_slice != 0)
-++ 
-++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++-        input_dtype = query_states.dtype
-++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++-            query_states = query_states.to(mindspore.float16)
-++-            key_states = key_states.to(mindspore.float16)
-++-            value_states = value_states.to(mindspore.float16)
-++-
-++-        # 6. [核心] 调用 flash_attention_score 算子
-++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-++         attn_output = mindspore.ops.flash_attention_score(
-++             query=query_states,
-++             key=key_states,
-++             value=value_states,
-++-            head_num=self.num_heads, # 传入Q的头数(N1)
-+++            head_num=self.num_heads,
-++             attn_mask=fa_attention_mask,
-++-            keep_prob=1.0 - self.attention_dropout,
-+++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-++             scalar_value=1.0 / math.sqrt(self.head_dim),
-++             input_layout="BNSD",
-++-            sparse_mode=0 # 使用 defaultMask 模式
-+++            sparse_mode=0,
-+++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-++         )
-++ 
-++-        # 恢复原始数据类型
-++-        attn_output = attn_output.to(input_dtype)
-++-
-++-        # 7. 调整输出形状
-++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++        # 6. 调整输出形状
-++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++         attn_output = self.o_proj(attn_output)
-++ 
-++-        # FlashAttention 算子不直接返回注意力权重矩阵
-+++        # 7. 返回结果
-++         attn_weights = None
-++         if output_attentions:
-++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-++-    # def forward(
-++-    #     self,
-++-    #     hidden_states: mindspore.Tensor,
-++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++-    #     past_key_value: Optional[Cache] = None,
-++-    #     output_attentions: bool = False,
-++-    #     use_cache: bool = False,
-++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-
-++-    #     bsz, q_len, _ = hidden_states.shape
-++-
-++-    #     # 1. 线性投射 Q, K, V
-++-    #     query_states = self.q_proj(hidden_states)
-++-    #     key_states = self.k_proj(hidden_states)
-++-    #     value_states = self.v_proj(hidden_states)
-++-
-++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-
-++-    #     # 3. RoPE 旋转位置编码
-++-    #     kv_seq_len = key_states.shape[-2]
-++-    #     if past_key_value is not None:
-++-    #         if self.layer_idx is None:
-++-    #             raise ValueError(
-++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-    #                 "with a layer index."
-++-    #             )
-++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++ 
-++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-
-++-    #     # 4. KV 缓存更新
-++-    #     if past_key_value is not None:
-++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-    #         key_states, value_states = past_key_value.update(
-++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++-    #         )
-++-
-++-    #     # 5. 准备 Attention Mask
-++-    #     fa_attention_mask = None
-++-    #     if attention_mask is not None:
-++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-    #         fa_attention_mask = (mask_slice != 0)
-++-
-++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++-    #     input_dtype = query_states.dtype
-++-
-++-    #     # 6. [核心] 调用 flash_attention_score 算子
-++-    #     attn_output = mindspore.ops.flash_attention_score(
-++-    #         query=query_states,
-++-    #         key=key_states,
-++-    #         value=value_states,
-++-    #         head_num=self.num_heads,
-++-    #         attn_mask=fa_attention_mask,
-++-    #         keep_prob=1.0 - self.attention_dropout,
-++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++-    #         input_layout="BNSD",
-++-    #         sparse_mode=0,
-++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++-    #         inner_precise=1
-++-    #     )
-++-
-++-    #     # 恢复原始数据类型
-++-    #     attn_output = attn_output.to(input_dtype)
-+++QWEN2MOE_ATTENTION_CLASSES = {
-+++    "eager": Qwen2MoeAttention,
-+++    "flash-attention": Qwen2MoeFlashAttention,
-+++}
-++ 
-++-    #     # 7. 调整输出形状
-++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-    #     attn_output = self.o_proj(attn_output)
-++ 
-++-    #     attn_weights = None
-++-    #     if output_attentions:
-++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     def __init__(self, config):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         # gating
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     #@dwj
-+++#     # 只遍历激活的专家，而非全部专家
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#             num_tokens = hidden_states_reshaped.shape[0]
-+++
-+++#             router_logits = self.gate(hidden_states_reshaped)
-+++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++
-+++#             if self.norm_topk_prob:
-+++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++#             routing_weights = routing_weights.to(hidden_states.dtype)
-+++            
-+++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++#             flat_selected_experts = selected_experts.flatten()
-+++            
-+++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++#             token_indices = broadcasted_token_indices.flatten()
-+++            
-+++#             active_experts = ops.unique(flat_selected_experts)
-+++            
-+++#             for expert_idx_tensor in active_experts:
-+++#                 expert_idx = expert_idx_tensor.item()
-+++#                 expert_layer = self.experts[expert_idx]
-+++                
-+++#                 mask = (flat_selected_experts == expert_idx_tensor)
-+++#                 selected_token_indices = token_indices[mask]
-+++#                 selected_routing_weights = routing_weights.flatten()[mask]
-+++                
-+++#                 current_states = hidden_states_reshaped[selected_token_indices]
-+++                
-+++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++                
-+++#                 final_hidden_states = final_hidden_states.index_add(
-+++#                     dim=0,
-+++#                     index=selected_token_indices,
-+++#                     source=expert_output.to(hidden_states.dtype)
-+++#                 )
-+++            
-+++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++ 
-++-    #     return attn_output, attn_weights, past_key_value
-+++#             final_hidden_states = final_hidden_states + shared_expert_output
-+++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++            
-+++#             return final_hidden_states, router_logits
-+++
-+++
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     """
-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         # 门控网络
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         # 专家列表
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++#         # 共享专家
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_decode(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         """
-+++#         【解码路径】针对 sequence_length=1 的极致优化。
-+++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+++#         """
-+++#         batch_size, hidden_dim = hidden_states.shape
-+++        
-+++#         expert_outputs_list = [
-+++#             ops.cat([
-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++#             ], dim=0) 
-+++#             for i in range(batch_size)
-+++#         ]
-+++        
-+++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+++#         # shape: (batch_size, top_k, hidden_dim)
-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++        
-+++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++        
-+++#         return moe_output.squeeze(1)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_prefill(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         """
-+++#         【预填充路径】针对 sequence_length > 1 的优化。
-+++#         按专家对 Token 进行分组，并进行批处理。
-+++#         """
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens = hidden_states.shape[0]
-+++#         flat_selected_experts = selected_experts.flatten()
-+++        
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++        
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++        
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++            
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++#             selected_token_indices = token_indices[mask]
-+++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++            
-+++#             current_states = hidden_states[selected_token_indices]
-+++            
-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++            
-+++#             moe_output = moe_output.index_add(
-+++#                 dim=0,
-+++#                 index=selected_token_indices,
-+++#                 source=expert_output.to(hidden_states.dtype)
-+++#             )
-+++#         return moe_output
-+++
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#         """
-+++#         顶层 forward 方法，作为智能分发器。
-+++#         """
-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++        
-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#         router_logits = self.gate(hidden_states_reshaped)
-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++ 
-++-    # def forward(
-++-    #     self,
-++-    #     hidden_states: mindspore.Tensor,
-++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++-    #     past_key_value: Optional[Cache] = None,
-++-    #     output_attentions: bool = False,
-++-    #     use_cache: bool = False,
-++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-
-++-    #     bsz, q_len, _ = hidden_states.shape
-++-
-++-    #     query_states = self.q_proj(hidden_states)
-++-    #     key_states = self.k_proj(hidden_states)
-++-    #     value_states = self.v_proj(hidden_states)
-++-
-++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-
-++-    #     kv_seq_len = key_states.shape[-2]
-++-    #     if past_key_value is not None:
-++-    #         if self.layer_idx is None:
-++-    #             raise ValueError("`layer_idx` must be specified for caching")
-++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-
-++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-
-++-    #     if past_key_value is not None:
-++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-    #         key_states, value_states = past_key_value.update(
-++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++-    #         )
-+++#         if self.norm_topk_prob:
-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++        
-+++#         moe_output = None
-+++#         # 在推理时，根据序列长度选择最优路径
-+++#         if not self.training:
-+++#             if sequence_length == 1:
-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++#             else:
-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++#         else:
-+++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+++#             raise NotImplementedError("Training path is not implemented.")
-+++
-+++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+++        
-+++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+++        
-+++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++#         return final_hidden_states, router_logits
-+++
-+++
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     """
-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         # 门控网络
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         # 专家列表
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++#         # 共享专家
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_decode(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         batch_size, _ = hidden_states.shape
-+++#         expert_outputs_list = [
-+++#             ops.cat([
-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++#             ], dim=0) 
-+++#             for i in range(batch_size)
-+++#         ]
-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++#         return moe_output.squeeze(1)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_prefill(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens = hidden_states.shape[0]
-+++#         flat_selected_experts = selected_experts.flatten()
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++        
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++#             selected_token_indices = token_indices[mask]
-+++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++#             current_states = hidden_states[selected_token_indices]
-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++#             moe_output = moe_output.index_add(
-+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++#             )
-+++#         return moe_output
-+++
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#         """
-+++#         顶层 forward 方法，作为智能分发器。
-+++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+++#         """
-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++        
-+++#         # 1. 门控计算 (通用逻辑)
-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#         router_logits = self.gate(hidden_states_reshaped)
-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++
-+++#         if self.norm_topk_prob:
-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++        
-+++#         # 2. 智能分发到最优 MoE 路径
-+++#         moe_output = None
-+++#         if not self.training:
-+++#             if sequence_length == 1:
-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++#             else:
-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++#         else:
-+++#             raise NotImplementedError("Training path is not implemented.")
-+++
-+++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++        
-+++#         # 4. 合并 MoE 输出和共享专家输出
-+++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++        
-+++#         # 5. 恢复原始形状并返回
-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++#         return final_hidden_states, router_logits
-+++
-+++# prefill fastest
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     """
-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         # 门控网络
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         # 专家列表
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++#         # 共享专家
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_dispatch(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         """
-+++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+++#         """
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens, _ = hidden_states.shape
-+++        
-+++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+++#         flat_selected_experts = selected_experts.flatten()
-+++#         flat_routing_weights = routing_weights.flatten()
-++ 
-++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++-
-++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++-    #     query_states = query_states / math.sqrt(self.head_dim)
-++-    #     # <--- 修改结束 ---
-++-
-++-    #     fa_attention_mask = None
-++-    #     if attention_mask is not None:
-++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-    #         fa_attention_mask = (mask_slice != 0)
-++-
-++-    #     input_dtype = query_states.dtype
-++-
-++-    #     attn_output = mindspore.ops.flash_attention_score(
-++-    #         query=query_states,    # 传入已经预先缩放过的 query
-++-    #         key=key_states,
-++-    #         value=value_states,
-++-    #         head_num=self.num_heads,
-++-    #         attn_mask=fa_attention_mask,
-++-    #         keep_prob=1.0 - self.attention_dropout,
-++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++-    #         input_layout="BNSD",
-++-    #         sparse_mode=0,
-++-    #         inner_precise=1        # 仍然保持内部高精度计算
-++-    #     )
-+++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++ 
-++-    #     attn_output = attn_output.to(input_dtype)
-++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-    #     attn_output = self.o_proj(attn_output)
-+++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++        
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++            
-+++#             # 找到所有分配给该专家的 token
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++            
-+++#             # 使用 mask 选取对应的 token 和权重
-+++#             current_token_indices = token_indices[mask]
-+++#             current_routing_weights = flat_routing_weights[mask]
-+++#             current_hidden_states = hidden_states[current_token_indices]
-+++            
-+++#             # 对这些 token 进行批处理
-+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++            
-+++#             # 使用 index_add 将结果精确地加回到对应位置
-+++#             moe_output = moe_output.index_add(
-+++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+++#             )
-+++#         return moe_output
-+++
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#         """
-+++#         顶层 forward 方法，作为智能分发器。
-+++#         """
-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++        
-+++#         # 1. 门控计算
-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#         router_logits = self.gate(hidden_states_reshaped)
-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++
-+++#         if self.norm_topk_prob:
-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++        
-+++#         # 2. 调用统一的 MoE 计算内核
-+++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++ 
-++-    #     attn_weights = None
-++-    #     if output_attentions:
-++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++#         # 3. 统一处理共享专家
-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++        
-+++#         # 4. 合并输出
-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++        
-+++#         # 5. 恢复原始形状并返回
-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++#         return final_hidden_states, router_logits
-+++
-+++
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     """
-+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++#     【最终高性能与高精度版】：
-+++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+++#     3. 这样实现了速度和准确性的两全其美。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_decode(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         """
-+++#         【解码路径】极致优化版：bmm + 高精度累加。
-+++#         """
-+++#         original_dtype = hidden_states.dtype
-+++#         batch_size, _ = hidden_states.shape
-+++        
-+++#         expert_outputs_list = [
-+++#             ops.cat([
-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++#             ], dim=0) 
-+++#             for i in range(batch_size)
-+++#         ]
-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++
-+++#         # 在 float32 下执行 bmm，得到高精度结果
-+++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++        
-+++#         # 将高精度结果转换回原始数据类型
-+++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+++        
-+++#         return moe_output
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_prefill(
-+++#         self, 
-+++#         hidden_states: mindspore.Tensor, 
-+++#         selected_experts: mindspore.Tensor, 
-+++#         routing_weights: mindspore.Tensor
-+++#     ) -> mindspore.Tensor:
-+++#         """
-+++#         【预填充路径】与原始实现一致，结果精确。
-+++#         """
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens, _ = hidden_states.shape
-+++#         flat_selected_experts = selected_experts.flatten()
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++        
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++#             selected_token_indices = token_indices[mask]
-+++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++#             current_states = hidden_states[selected_token_indices]
-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++#             moe_output = moe_output.index_add(
-+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++#             )
-+++#         return moe_output
-+++
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++        
-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#         router_logits = self.gate(hidden_states_reshaped)
-+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++ 
-++-    #     return attn_output, attn_weights, past_key_value
-+++#         if self.norm_topk_prob:
-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+++#         # 如果模型主体是 float16，后续再转换
-+++        
-+++#         moe_output = None
-+++#         if not self.training:
-+++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+++#             # _moe_infer_decode 内部会处理好类型转换
-+++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+++#             if sequence_length == 1:
-+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++#             else:
-+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++#         else:
-+++#             raise NotImplementedError("Training path is not implemented.")
-+++
-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++        
-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++#         return final_hidden_states, router_logits
-+++    
-++ 
-++-QWEN2MOE_ATTENTION_CLASSES = {
-++-    "eager": Qwen2MoeAttention,
-++-    "flash-attention": Qwen2MoeFlashAttention,
-++-}
-+++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++#     """
-+++#     【融合版】一个混合专家模块，内置两种推理策略，
-+++#     由外部全局变量 `Long_Prompt` 控制：
-+++
-+++#     - if Long_Prompt is True:  【精度优先模式】
-+++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+++#       适用于处理长序列，避免误差累积。
-+++
-+++#     - if Long_Prompt is False: 【速度优先模式】
-+++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+++#       在解码阶段获得极致速度，同时保证结果高度准确。
-+++#     """
-+++#     def __init__(self, config: Qwen2MoeConfig):
-+++#         super().__init__()
-+++#         self.num_experts = config.num_experts
-+++#         self.top_k = config.num_experts_per_tok
-+++#         self.norm_topk_prob = config.norm_topk_prob
-+++
-+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++#         self.experts = nn.ModuleList(
-+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++#         )
-+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++
-+++#     # --- 速度优先模式的辅助函数 ---
-+++#     @no_grad()
-+++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++#         original_dtype = hidden_states.dtype
-+++#         batch_size, _ = hidden_states.shape
-+++#         expert_outputs_list = [
-+++#             ops.cat([
-+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++#             ], dim=0) 
-+++#             for i in range(batch_size)
-+++#         ]
-+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++#         weights_fp32 = routing_weights.to(mindspore.float32)
-+++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++
-+++#     @no_grad()
-+++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens, _ = hidden_states.shape
-+++#         flat_selected_experts = selected_experts.flatten()
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++#             selected_token_indices = token_indices[mask]
-+++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++#             current_states = hidden_states[selected_token_indices]
-+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+++#         return moe_output
-+++
-+++#     # --- 精度优先模式的辅助函数 ---
-+++#     @no_grad()
-+++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++#         moe_output = ops.zeros_like(hidden_states)
-+++#         num_tokens, _ = hidden_states.shape
-+++#         flat_selected_experts = selected_experts.flatten()
-+++#         flat_routing_weights = routing_weights.flatten()
-+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++#         active_experts = ops.unique(flat_selected_experts)
-+++#         for expert_idx_tensor in active_experts:
-+++#             expert_idx = expert_idx_tensor.item()
-+++#             expert_layer = self.experts[expert_idx]
-+++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++#             current_token_indices = token_indices[mask]
-+++#             current_routing_weights = flat_routing_weights[mask]
-+++#             current_hidden_states = hidden_states[current_token_indices]
-+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++#         return moe_output
-+++
-+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++#         # 声明我们将要使用一个在模块外部定义的全局变量
-+++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+++#         global Long_Prompt
-+++
-+++#         # 1. 门控计算 (所有模式通用)
-+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++#         router_logits = self.gate(hidden_states_reshaped)
-+++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++#         if self.norm_topk_prob:
-+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++#         moe_output = None
-+++#         if not self.training:
-+++#             # 根据 Long_Prompt 标志选择模式
-+++#             if Long_Prompt:
-+++#                 # --- 精度优先模式 ---
-+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++#             else:
-+++#                 # --- 速度优先模式 ---
-+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++#                 if sequence_length == 1:
-+++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++#                 else:
-+++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++#         else:
-+++#             raise NotImplementedError("Training path is not implemented.")
-+++
-+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++        
-+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++#         return final_hidden_states, router_logits
-+++    
-+++class Qwen2MoeSparseMoeBlock(nn.Module):
-+++    """
-+++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+++    控制的顶级推理策略：
-++ 
-+++    - if Long_Prompt is True:  【精度优先模式】
-+++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-+++      适用于需要严格可复现性的长序列任务。
-++ 
-++-class Qwen2MoeSparseMoeBlock(nn.Module):
-++-    def __init__(self, config):
-+++    - if Long_Prompt is False: 【速度优先模式】
-+++      采用业界最强的性能组合：
-+++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-+++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-+++    """
-+++    def __init__(self, config: Qwen2MoeConfig):
-++         super().__init__()
-++         self.num_experts = config.num_experts
-++         self.top_k = config.num_experts_per_tok
-++         self.norm_topk_prob = config.norm_topk_prob
-++ 
-++-        # gating
-++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++         self.experts = nn.ModuleList(
-++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++         )
-++-
-++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++ 
-++-    #@dwj
-++-    # 只遍历激活的专家，而非全部专家
-++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-            num_tokens = hidden_states_reshaped.shape[0]
-++-
-++-            router_logits = self.gate(hidden_states_reshaped)
-++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-
-++-            if self.norm_topk_prob:
-++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-            routing_weights = routing_weights.to(hidden_states.dtype)
-++-            
-++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++-            flat_selected_experts = selected_experts.flatten()
-++-            
-++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++-            token_indices = broadcasted_token_indices.flatten()
-++-            
-++-            active_experts = ops.unique(flat_selected_experts)
-++-            
-++-            for expert_idx_tensor in active_experts:
-++-                expert_idx = expert_idx_tensor.item()
-++-                expert_layer = self.experts[expert_idx]
-++-                
-++-                mask = (flat_selected_experts == expert_idx_tensor)
-++-                selected_token_indices = token_indices[mask]
-++-                selected_routing_weights = routing_weights.flatten()[mask]
-++-                
-++-                current_states = hidden_states_reshaped[selected_token_indices]
-++-                
-++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-                
-++-                final_hidden_states = final_hidden_states.index_add(
-++-                    dim=0,
-++-                    index=selected_token_indices,
-++-                    source=expert_output.to(hidden_states.dtype)
-++-                )
-++-            
-++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-+++    @no_grad()
-+++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++        original_dtype = hidden_states.dtype
-+++        batch_size, _ = hidden_states.shape
-+++        expert_outputs_list = [
-+++            ops.cat([
-+++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++            ], dim=0)
-+++            for i in range(batch_size)
-+++        ]
-+++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++        weights_fp32 = routing_weights.to(mindspore.float32)
-+++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++        return moe_output_fp32.squeeze(1).to(original_dtype)
-+++
-+++    @no_grad()
-+++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++        num_tokens, _ = hidden_states.shape
-+++        flat_selected_experts = selected_experts.flatten()
-+++        sorted_expert_indices = flat_selected_experts.argsort()
-+++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++        original_token_indices = sorted_expert_indices // self.top_k
-+++        moe_output = ops.zeros_like(hidden_states)
-+++        current_token_offset = 0
-+++        for i in range(self.num_experts):
-+++            expert_token_count = tokens_per_expert[i] - current_token_offset
-+++            if expert_token_count == 0:
-+++                continue
-+++            end_offset = current_token_offset + expert_token_count
-+++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++            expert_hidden_states = hidden_states[expert_original_token_indices]
-+++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++            current_token_offset += expert_token_count
-+++        return moe_output
-+++
-+++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+++    @no_grad()
-+++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++        moe_output = ops.zeros_like(hidden_states)
-+++        num_tokens, _ = hidden_states.shape
-+++        flat_selected_experts = selected_experts.flatten()
-+++        flat_routing_weights = routing_weights.flatten()
-+++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++        active_experts = ops.unique(flat_selected_experts)
-+++        for expert_idx_tensor in active_experts:
-+++            expert_idx = expert_idx_tensor.item()
-+++            expert_layer = self.experts[expert_idx]
-+++            mask = (flat_selected_experts == expert_idx_tensor)
-+++            current_token_indices = token_indices[mask]
-+++            current_routing_weights = flat_routing_weights[mask]
-+++            current_hidden_states = hidden_states[current_token_indices]
-+++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++        return moe_output
-++ 
-++-            final_hidden_states = final_hidden_states + shared_expert_output
-++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++-            
-++-            return final_hidden_states, router_logits
-+++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++        global Long_Prompt
-+++
-+++        # 1. 门控计算 (所有模式通用)
-+++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++        router_logits = self.gate(hidden_states_reshaped)
-+++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++        if self.norm_topk_prob:
-+++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++        
-+++        moe_output = None
-+++        if Long_Prompt:
-+++            # --- 精度优先模式 (ACCURACY MODE) ---
-+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        else:
-+++            # --- 速度优先模式 (SPEED MODE) ---
-+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++            if sequence_length == 1:
-+++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++            else:
-+++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        
-++ 
-+++        # 3. 共享专家计算与合并 (所有模式通用)
-+++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++        
-+++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++        
-+++        return final_hidden_states, router_logits
-++ 
-++ class Qwen2MoeDecoderLayer(nn.Module):
-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++         super().__init__()
-++         self.hidden_size = config.hidden_size
-+++        
-+++        # if Long_Prompt:
-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++        # else:
-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++ 
-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++ 
-++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++-
-++         if (layer_idx not in config.mlp_only_layers) and (
-++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++         ):
-++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++             self._warmed_up = True
-++             self.warmup_moe_model()
-++ 
-+++
-+++
-++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++         output_router_logits = (
-++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++             router_logits=outputs.router_logits,
-++         )
-++ 
-+++    def generate(self, *args, **kwargs):
-+++        """
-+++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+++        """
-+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++
-+++        input_ids = kwargs.get("input_ids")
-+++        if input_ids is None and args:
-+++            input_ids = args[0]
-+++
-+++        if input_ids is not None:
-+++            prompt_length = input_ids.shape[1]
-+++            
-+++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+++                Long_Prompt = True
-+++            else:
-+++                Long_Prompt = False
-+++
-+++        return super().generate(*args, **kwargs)
-+++    
-++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-++     def prepare_inputs_for_generation(
-++         self,
-++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-++         # Exception 1: when passing input_embeds, input_ids may be missing entries
-++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-+++        
-++         if past_key_values is not None:
-++             if inputs_embeds is not None:  # Exception 1
-++                 if 0 not in input_ids.shape:
-++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++             }
-++         )
-++         return model_inputs
-+++
-++ # @lwx
-++     # def _decode_one_tokens_logits(
-++     #     self,
-++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-++             attentions=outputs.attentions,
-++         )
-++ 
-+++
-++ __all__ = [
-++     "Qwen2MoeForCausalLM",
-++     "Qwen2MoeModel",
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++new file mode 100644
-++index 00000000..6dfb5b93
-++--- /dev/null
-+++++ b/patches/0001-20251104commit.patch
-++@@ -0,0 +1,1272 @@
-+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++Subject: [PATCH] 20251104commit
-+++
-+++---
-+++ mindnlp/transformers/cache_utils.py           |  28 +-
-+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++ 3 files changed, 976 insertions(+), 87 deletions(-)
-+++
-+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++index cadd2e04..02f8d4be 100644
-+++--- a/mindnlp/transformers/cache_utils.py
-++++++ b/mindnlp/transformers/cache_utils.py
-+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++             # k_out[:, :, cache_position] = key_states
-+++             # v_out[:, :, cache_position] = value_states
-+++-            if ON_ORANGE_PI:
-+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++-            else:
-+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++-
-++++            # if ON_ORANGE_PI:
-++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++            # else:
-++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++            # 确保 cache_position 是 1D tensor 并且类型正确
-++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++            if cache_position.ndim > 1:
-++++                cache_position = cache_position.flatten()
-++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++                cache_position = cache_position.int()
-++++            
-++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++            k_out[:, :, cache_position] = key_states
-++++            v_out[:, :, cache_position] = value_states
-++++            
-+++         return k_out, v_out
-+++ 
-+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index c695b944..d8303e45 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++-    x1 = x[..., : x.shape[-1] // 2]
-+++-    x2 = x[..., x.shape[-1] // 2 :]
-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++    # x1 = x[..., : x.shape[-1] // 2]
-++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++         if self.training:
-+++             raise NotImplementedError("Training is not supported yet.")
-+++         else:
-+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++-        if self.config.n_shared_experts is not None:
-+++-            y = y + self.shared_experts(identity)
-+++-        return y
-++++            # @lwx
-++++            if orig_shape[1] == 1:
-++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++                y=y.view(*orig_shape)
-++++                if self.config.n_shared_experts is not None:
-++++                    y = y + self.shared_experts(identity)
-++++                return y
-++++            else:
-++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++                if self.config.n_shared_experts is not None:
-++++                    y = y + self.shared_experts(identity)
-++++                return y
-++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++        # if self.config.n_shared_experts is not None:
-++++        #     y = y + self.shared_experts(identity)
-++++        # return y
-++++        
-++++    @no_grad()
-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++
-++++        expert_cache = ops.zeros_like(x)
-++++        for i in range(self.num_experts_per_tok):
-++++            expert_id = flat_expert_indices[i].item()
-++++            weight = flat_expert_weights[i].item()
-++++            expert = self.experts[expert_id]
-++++            expert_out = expert(x)
-++++            expert_cache += expert_out * weight
-++++        return expert_cache
-+++ 
-+++     @no_grad()
-+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-        # expert_cache = torch.zeros_like(x)
-+++-        # idxs = flat_expert_indices.argsort()
-+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++-        # token_idxs = idxs // self.num_experts_per_tok
-+++-        # for i, end_idx in enumerate(tokens_per_expert):
-+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++-        #     if start_idx == end_idx:
-+++-        #         continue
-+++-        #     expert = self.experts[i]
-+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++-        #     expert_tokens = x[exp_token_idx]
-+++-        #     expert_out = expert(expert_tokens)
-+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++-        # return expert_cache
-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++         expert_cache = ops.zeros_like(x)
-+++         idxs = flat_expert_indices.argsort()
-+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++         token_idxs = idxs // self.num_experts_per_tok
-++++
-+++         for i, end_idx in enumerate(tokens_per_expert):
-+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++             if start_idx == end_idx:
-+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++             expert_out = expert(expert_tokens)
-+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++
-+++         return expert_cache
-++++        
-++++    # @no_grad()
-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     # expert_cache = torch.zeros_like(x)
-++++    #     # idxs = flat_expert_indices.argsort()
-++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++    #     # token_idxs = idxs // self.num_experts_per_tok
-++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++    #     #     if start_idx == end_idx:
-++++    #     #         continue
-++++    #     #     expert = self.experts[i]
-++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #     #     expert_tokens = x[exp_token_idx]
-++++    #     #     expert_out = expert(expert_tokens)
-++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++    #     # return expert_cache
-++++    #     expert_cache = ops.zeros_like(x)
-++++    #     idxs = flat_expert_indices.argsort()
-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++    #     for i, end_idx in enumerate(tokens_per_expert):
-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++    #         if start_idx == end_idx:
-++++    #             continue
-++++    #         expert = self.experts[i]
-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #         expert_tokens = x[exp_token_idx]
-++++    #         expert_out = expert(expert_tokens)
-++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++
-++++    #     return expert_cache
-++++    # @no_grad()
-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     expert_cache = ops.zeros_like(x)
-++++
-++++    #     # 排序保证顺序一致
-++++    #     idxs = flat_expert_indices.argsort()
-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++    #     # 找出有 token 的专家
-++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++
-++++    #     for i in active_experts.tolist():
-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++    #         end_idx = tokens_per_expert[i]
-++++    #         if start_idx == end_idx:  # 没有 token
-++++    #             continue
-++++
-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #         expert_tokens = x[exp_token_idx]
-++++    #         expert_out = self.experts[i](expert_tokens)
-++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++
-++++    #         expert_cache = mindspore.mint.scatter_add(
-++++    #             expert_cache,
-++++    #             0,
-++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++    #             expert_out
-++++    #         )
-++++
-++++    #     return expert_cache
-++++
-++++
-+++ 
-+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++ #     """
-+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++ 
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-++++        self.warm_up = False
-++++
-++++    def warmup_moe_model_deep(self):
-++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++        test_texts = [
-++++            "warmup short",
-++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++        ]
-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++        if tokenizer is None:
-++++            from mindnlp.transformers import AutoTokenizer
-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++            self._warmup_tokenizer = tokenizer
-++++
-++++        for text in test_texts:
-++++            inputs = tokenizer(text, return_tensors="ms")
-++++            with mindspore._no_grad():
-++++                _ = self(**inputs, use_cache=False)
-++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++ 
-+++     def get_input_embeddings(self):
-+++         return self.model.embed_tokens
-+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++         ```"""
-++++        if not self.warm_up:
-++++            self.warm_up = True
-++++            self.warmup_moe_model_deep()
-++++
-+++         output_attentions = (
-+++             output_attentions
-+++             if output_attentions is not None
-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++index 3cbf820e..d4c6b651 100644
-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++@@ -18,7 +18,6 @@
-+++ # See the License for the specific language governing permissions and
-+++ # limitations under the License.
-+++ """MindSpore Qwen2MoE model."""
-+++-
-+++ import math
-+++ from typing import List, Optional, Tuple, Union
-+++ 
-+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++     TokenClassifierOutput,
-+++ )
-+++ from ...modeling_utils import PreTrainedModel
-++++from ...generation import GenerationMixin
-+++ from ....utils import logging
-+++ from .configuration_qwen2_moe import Qwen2MoeConfig
-+++ 
-+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++         self.variance_epsilon = eps
-+++ 
-+++     def forward(self, hidden_states):
-++++        # @dwj
-++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++        # @lwx
-++++        # if not self.training :
-++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++         input_dtype = hidden_states.dtype
-+++         hidden_states = hidden_states.to(mindspore.float32)
-+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++@@ -234,6 +239,8 @@ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++     x1 = x[..., : x.shape[-1] // 2]
-+++     x2 = x[..., x.shape[-1] // 2 :]
-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++         self.config = config
-+++         self.hidden_size = config.hidden_size
-+++         self.intermediate_size = intermediate_size
-++++        
-+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++         self.act_fn = ACT2FN[config.hidden_act]
-+++ 
-+++     def forward(self, x):
-+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++-
-+++ 
-++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++        # @lwx 
-++++        # gate_up_output = self.gate_up_proj(x)
-++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++        # return self.down_proj(swiglu_output)
-++++
-++++    # def forward(self, x):
-++++    #     gate_proj_out = self.gate_proj(x)
-++++    #     up_proj_out = self.up_proj(x)
-++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++    #     return self.down_proj(swiglu_out)
-++++        
-+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++     """
-+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++         use_cache: bool = False,
-+++         cache_position: Optional[mindspore.Tensor] = None,
-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++        
-++++
-+++         bsz, q_len, _ = hidden_states.shape
-+++ 
-+++         query_states = self.q_proj(hidden_states)
-+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++                     "with a layer index."
-+++                 )
-+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++            if isinstance(past_key_value, StaticCache):
-++++                kv_seq_len = key_states.shape[-2]
-++++            else:
-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++ 
-+++         if past_key_value is not None:
-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++            
-++++            if isinstance(past_key_value, StaticCache):
-++++                kv_seq_len = key_states.shape[-2]
-+++ 
-+++         # repeat k/v heads if n_kv_heads < n_heads
-+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++-
-++++        
-+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++ 
-+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++-            raise ValueError(
-+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++-                f" {attn_weights.shape}"
-+++-            )
-+++-
-+++-        if attention_mask is not None:  # no matter the length, we just slice it
-+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++        if attention_mask is not None:
-++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++             attn_weights = attn_weights + causal_mask
-+++ 
-+++         # upcast attention to fp32
-+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++ 
-+++         attn_output = self.o_proj(attn_output)
-+++-
-++++        # @lwx
-++++        
-++++        # max_seq_len = self.max_position_embeddings  # 2048
-++++
-++++        # if attention_mask is not None:
-++++        #     # attention_mask: [B, 1, Sq, Sk]
-++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++
-++++        #     # pad 到 [max_seq_len, max_seq_len]
-++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++        #     global_attention_mask = padded_mask
-++++        # else:
-++++        #     global_attention_mask = None
-++++
-++++
-++++        # sparse_mode=3
-++++        # attn_output = mindspore.ops.flash_attention_score(
-++++        #     query=query_states,
-++++        #     key=key_states,
-++++        #     value=value_states,
-++++        #     real_shift=None,
-++++        #     padding_mask=None,
-++++
-++++        #     head_num=self.num_heads,
-++++        #     attn_mask=global_attention_mask,
-++++        #     keep_prob=1.0 - self.attention_dropout,
-++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++        #     input_layout="BNSD",
-++++        #     pre_tokens=2147483647,
-++++        #     next_tokens=2147483647,
-++++        #     inner_precise=0,
-++++        #     drop_mask=None,
-++++        #     prefix=None,
-++++        #     actual_seq_qlen=None,
-++++        #     actual_seq_kvlen=None,
-++++        #     sparse_mode=sparse_mode,
-++++        # )
-+++         if not output_attentions:
-+++             attn_weights = None
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++ 
-++++class Qwen2MoeFlashAttention(nn.Module):
-++++    """
-++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++
-++++    关键改动:
-++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++       直接传入原始的 key 和 value 张量效率更高。
-++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++    """
-++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++        super().__init__()
-++++        self.config = config
-++++        self.layer_idx = layer_idx
-++++        self.hidden_size = config.hidden_size
-++++        self.num_heads = config.num_attention_heads
-++++        self.head_dim = self.hidden_size // self.num_heads
-++++        self.num_key_value_heads = config.num_key_value_heads
-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++        self.max_position_embeddings = config.max_position_embeddings
-++++        self.rope_theta = config.rope_theta
-++++        self.attention_dropout = config.attention_dropout
-++++
-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++            raise ValueError(
-++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++            )
-++++
-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++
-++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++            self.head_dim,
-++++            max_position_embeddings=self.max_position_embeddings,
-++++            base=self.rope_theta,
-++++        )
-++++
-++++    def forward(
-++++        self,
-++++        hidden_states: mindspore.Tensor,
-++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++        position_ids: Optional[mindspore.Tensor] = None,
-++++        past_key_value: Optional[Cache] = None,
-++++        output_attentions: bool = False,
-++++        use_cache: bool = False,
-++++        cache_position: Optional[mindspore.Tensor] = None,
-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++        bsz, q_len, _ = hidden_states.shape
-++++
-++++        # 1. 线性投射 Q, K, V
-++++        query_states = self.q_proj(hidden_states)
-++++        key_states = self.k_proj(hidden_states)
-++++        value_states = self.v_proj(hidden_states)
-++++
-++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++        # 3. RoPE 旋转位置编码
-++++        kv_seq_len = key_states.shape[-2]
-++++        if past_key_value is not None:
-++++            if self.layer_idx is None:
-++++                raise ValueError(
-++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++                    "with a layer index."
-++++                )
-++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++                if cache_position.shape[0] == 1:
-++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++                    kv_seq_len = past_seen_tokens + 1
-++++                else:
-++++                    # prefill 阶段：cache_position 是范围，使用其长度
-++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++            else:
-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++        # 4. KV 缓存更新
-++++        if past_key_value is not None:
-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++            key_states, value_states = past_key_value.update(
-++++                key_states, value_states, self.layer_idx, cache_kwargs
-++++            )
-++++            
-++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++                if cache_position.shape[0] == 1:
-++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++                    kv_seq_len = key_states.shape[-2]
-++++
-++++        # 5. [重要] 准备 Attention Mask
-++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++        fa_attention_mask = None
-++++        if attention_mask is not None:
-++++            # 截取与当前key长度匹配的部分
-++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++            fa_attention_mask = (mask_slice != 0)
-++++
-++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++        input_dtype = query_states.dtype
-++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++            query_states = query_states.to(mindspore.float16)
-++++            key_states = key_states.to(mindspore.float16)
-++++            value_states = value_states.to(mindspore.float16)
-++++
-++++        # 6. [核心] 调用 flash_attention_score 算子
-++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++        attn_output = mindspore.ops.flash_attention_score(
-++++            query=query_states,
-++++            key=key_states,
-++++            value=value_states,
-++++            head_num=self.num_heads, # 传入Q的头数(N1)
-++++            attn_mask=fa_attention_mask,
-++++            keep_prob=1.0 - self.attention_dropout,
-++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++            input_layout="BNSD",
-++++            sparse_mode=0 # 使用 defaultMask 模式
-++++        )
-++++
-++++        # 恢复原始数据类型
-++++        attn_output = attn_output.to(input_dtype)
-++++
-++++        # 7. 调整输出形状
-++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++        attn_output = self.o_proj(attn_output)
-++++
-++++        # FlashAttention 算子不直接返回注意力权重矩阵
-++++        attn_weights = None
-++++        if output_attentions:
-++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++        return attn_output, attn_weights, past_key_value
-++++
-++++    # def forward(
-++++    #     self,
-++++    #     hidden_states: mindspore.Tensor,
-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++    #     past_key_value: Optional[Cache] = None,
-++++    #     output_attentions: bool = False,
-++++    #     use_cache: bool = False,
-++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++    #     bsz, q_len, _ = hidden_states.shape
-++++
-++++    #     # 1. 线性投射 Q, K, V
-++++    #     query_states = self.q_proj(hidden_states)
-++++    #     key_states = self.k_proj(hidden_states)
-++++    #     value_states = self.v_proj(hidden_states)
-++++
-++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++    #     # 3. RoPE 旋转位置编码
-++++    #     kv_seq_len = key_states.shape[-2]
-++++    #     if past_key_value is not None:
-++++    #         if self.layer_idx is None:
-++++    #             raise ValueError(
-++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++    #                 "with a layer index."
-++++    #             )
-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++    #     # 4. KV 缓存更新
-++++    #     if past_key_value is not None:
-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++    #         key_states, value_states = past_key_value.update(
-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++    #         )
-++++
-++++    #     # 5. 准备 Attention Mask
-++++    #     fa_attention_mask = None
-++++    #     if attention_mask is not None:
-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++    #         fa_attention_mask = (mask_slice != 0)
-++++
-++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++    #     input_dtype = query_states.dtype
-++++
-++++    #     # 6. [核心] 调用 flash_attention_score 算子
-++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++    #         query=query_states,
-++++    #         key=key_states,
-++++    #         value=value_states,
-++++    #         head_num=self.num_heads,
-++++    #         attn_mask=fa_attention_mask,
-++++    #         keep_prob=1.0 - self.attention_dropout,
-++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++    #         input_layout="BNSD",
-++++    #         sparse_mode=0,
-++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++    #         inner_precise=1
-++++    #     )
-++++
-++++    #     # 恢复原始数据类型
-++++    #     attn_output = attn_output.to(input_dtype)
-++++
-++++    #     # 7. 调整输出形状
-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++    #     attn_output = self.o_proj(attn_output)
-++++
-++++    #     attn_weights = None
-++++    #     if output_attentions:
-++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++    #     return attn_output, attn_weights, past_key_value
-++++
-++++    # def forward(
-++++    #     self,
-++++    #     hidden_states: mindspore.Tensor,
-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++    #     past_key_value: Optional[Cache] = None,
-++++    #     output_attentions: bool = False,
-++++    #     use_cache: bool = False,
-++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++    #     bsz, q_len, _ = hidden_states.shape
-++++
-++++    #     query_states = self.q_proj(hidden_states)
-++++    #     key_states = self.k_proj(hidden_states)
-++++    #     value_states = self.v_proj(hidden_states)
-++++
-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++    #     kv_seq_len = key_states.shape[-2]
-++++    #     if past_key_value is not None:
-++++    #         if self.layer_idx is None:
-++++    #             raise ValueError("`layer_idx` must be specified for caching")
-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++    #     if past_key_value is not None:
-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++    #         key_states, value_states = past_key_value.update(
-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++    #         )
-++++
-++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++
-++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++    #     query_states = query_states / math.sqrt(self.head_dim)
-++++    #     # <--- 修改结束 ---
-++++
-++++    #     fa_attention_mask = None
-++++    #     if attention_mask is not None:
-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++    #         fa_attention_mask = (mask_slice != 0)
-++++
-++++    #     input_dtype = query_states.dtype
-++++
-++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++    #         query=query_states,    # 传入已经预先缩放过的 query
-++++    #         key=key_states,
-++++    #         value=value_states,
-++++    #         head_num=self.num_heads,
-++++    #         attn_mask=fa_attention_mask,
-++++    #         keep_prob=1.0 - self.attention_dropout,
-++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++    #         input_layout="BNSD",
-++++    #         sparse_mode=0,
-++++    #         inner_precise=1        # 仍然保持内部高精度计算
-++++    #     )
-++++
-++++    #     attn_output = attn_output.to(input_dtype)
-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++    #     attn_output = self.o_proj(attn_output)
-++++
-++++    #     attn_weights = None
-++++    #     if output_attentions:
-++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++
-++++    #     return attn_output, attn_weights, past_key_value
-++++
-+++ QWEN2MOE_ATTENTION_CLASSES = {
-+++     "eager": Qwen2MoeAttention,
-++++    "flash-attention": Qwen2MoeFlashAttention,
-+++ }
-+++ 
-+++ 
-+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++ 
-++++    #@dwj
-++++    # 只遍历激活的专家，而非全部专家
-+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        hidden_states = hidden_states.view(-1, hidden_dim)
-+++-        # router_logits: (batch * sequence_length, n_experts)
-+++-        router_logits = self.gate(hidden_states)
-+++-
-+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-        if self.norm_topk_prob:
-+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        # we cast back to the input dtype
-+++-        routing_weights = routing_weights.to(hidden_states.dtype)
-+++-
-+++-        final_hidden_states = ops.zeros(
-+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++-        )
-+++-
-+++-        # One hot encode the selected experts to create an expert mask
-+++-        # this will be used to easily index which expert is going to be sollicitated
-+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++-
-+++-        # Loop over all available experts in the model and perform the computation on each expert
-+++-        for expert_idx in range(self.num_experts):
-+++-            expert_layer = self.experts[expert_idx]
-+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++-
-+++-            # Index the correct hidden states and compute the expert hidden state for
-+++-            # the current expert. We need to make sure to multiply the output hidden
-+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++-            if 0 not in idx.shape:
-+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++-
-+++-                # However `index_add_` only support torch tensors for indexing so we'll use
-+++-                # the `top_x` tensor here.
-+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++-
-+++-        shared_expert_output = self.shared_expert(hidden_states)
-+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++-
-+++-        final_hidden_states = final_hidden_states + shared_expert_output
-++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++            num_tokens = hidden_states_reshaped.shape[0]
-++++
-++++            router_logits = self.gate(hidden_states_reshaped)
-++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++
-++++            if self.norm_topk_prob:
-++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++            routing_weights = routing_weights.to(hidden_states.dtype)
-++++            
-++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++            flat_selected_experts = selected_experts.flatten()
-++++            
-++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++            token_indices = broadcasted_token_indices.flatten()
-++++            
-++++            active_experts = ops.unique(flat_selected_experts)
-++++            
-++++            for expert_idx_tensor in active_experts:
-++++                expert_idx = expert_idx_tensor.item()
-++++                expert_layer = self.experts[expert_idx]
-++++                
-++++                mask = (flat_selected_experts == expert_idx_tensor)
-++++                selected_token_indices = token_indices[mask]
-++++                selected_routing_weights = routing_weights.flatten()[mask]
-++++                
-++++                current_states = hidden_states_reshaped[selected_token_indices]
-++++                
-++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++                
-++++                final_hidden_states = final_hidden_states.index_add(
-++++                    dim=0,
-++++                    index=selected_token_indices,
-++++                    source=expert_output.to(hidden_states.dtype)
-++++                )
-++++            
-++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++ 
-+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++-        return final_hidden_states, router_logits
-++++            final_hidden_states = final_hidden_states + shared_expert_output
-++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++            
-++++            return final_hidden_states, router_logits
-+++ 
-+++ 
-+++ class Qwen2MoeDecoderLayer(nn.Module):
-+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++ 
-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++ 
-++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++
-+++         if (layer_idx not in config.mlp_only_layers) and (
-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++         ):
-+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++     _skip_keys_device_placement = "past_key_values"
-+++     _supports_cache_class = True
-++++#lwx
-++++    # _supports_static_cache = True
-+++ 
-+++     def _init_weights(self, module):
-+++         std = self.config.initializer_range
-+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++         return causal_mask
-+++ 
-+++ 
-+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++     _tied_weights_keys = ["lm_head.weight"]
-+++ 
-+++     def __init__(self, config):
-+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++         self.num_experts_per_tok = config.num_experts_per_tok
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-++++        # @lwx
-++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++        #     self.generation_config.cache_implementation = "static"
-++++        self._warmed_up = False
-++++
-++++    def warmup_moe_model(self):
-++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++        test_texts = [
-++++            "warmup short",
-++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++        ]
-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++        if tokenizer is None:
-++++            from mindnlp.transformers import AutoTokenizer
-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++            self._warmup_tokenizer = tokenizer
-++++
-++++        for text in test_texts:
-++++            inputs = tokenizer(text, return_tensors="ms")
-++++            with mindspore._no_grad():
-++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++ 
-+++     def get_input_embeddings(self):
-+++         return self.model.embed_tokens
-+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++         ```"""
-++++        if not self._warmed_up:
-++++            self._warmed_up = True
-++++            self.warmup_moe_model()
-+++ 
-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++         output_router_logits = (
-+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++             }
-+++         )
-+++         return model_inputs
-++++# @lwx
-++++    # def _decode_one_tokens_logits(
-++++    #     self,
-++++    #     cur_token: mindspore.Tensor,
-++++    #     input_pos: Optional[mindspore.Tensor],
-++++    #     cache_position: mindspore.Tensor,
-++++    #     past_key_values: StaticCache,
-++++    # ) -> mindspore.Tensor:
-++++    #     """
-++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++        
-++++    #     Args:
-++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++    #         input_pos: 输入位置信息，可选
-++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++            
-++++    #     Returns:
-++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++    #     """
-++++    #     # 调用JIT编译的版本
-++++    #     return self.get_decode_one_tokens_logits(
-++++    #         cur_token=cur_token,
-++++    #         input_pos=input_pos,
-++++    #         cache_position=cache_position,
-++++    #         past_key_values=past_key_values,
-++++    #     )
-++++    
-++++    # @mindspore.jit(jit_level='O1')
-++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++    #     """
-++++    #     JIT编译的函数，用于高效的单token解码
-++++    #     使用JIT编译优化以支持静态shape和高效执行
-++++        
-++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++    #     """
-++++    #     outputs = self.model.forward(
-++++    #         input_ids=cur_token,
-++++    #         position_ids=input_pos,
-++++    #         cache_position=cache_position,
-++++    #         past_key_values=past_key_values,
-++++    #         use_cache=True,
-++++    #         return_dict=False,
-++++    #     )
-++++        
-++++    #     hidden_states = outputs[0]
-++++    #     logits = self.lm_head.forward(hidden_states)
-++++    #     logits = logits.float()
-++++        
-++++    #     return logits[:, -1, :]
-++++
-++++    # def _sample(
-++++    #     self,
-++++    #     input_ids: mindspore.Tensor,
-++++    #     logits_processor,
-++++    #     stopping_criteria,
-++++    #     generation_config,
-++++    #     synced_devices: bool,
-++++    #     streamer=None,
-++++    #     logits_warper=None,
-++++    #     **model_kwargs,
-++++    # ):
-++++    #     """
-++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++    #     """
-++++    #     from ...generation.logits_process import LogitsProcessorList
-++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++    #     from mindnlp.core import nn, ops, no_grad
-++++    #     import numpy as np
-++++        
-++++    #     # 检查是否使用 StaticCache
-++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++    #     # 否则，直接调用父类方法
-++++    #     past_key_values = model_kwargs.get("past_key_values")
-++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++
-++++    #     if not isinstance(past_key_values, StaticCache):
-++++    #         # 不使用 StaticCache，直接调用父类方法
-++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++    #         return super()._sample(
-++++    #             input_ids=input_ids,
-++++    #             logits_processor=logits_processor,
-++++    #             stopping_criteria=stopping_criteria,
-++++    #             generation_config=generation_config,
-++++    #             synced_devices=synced_devices,
-++++    #             streamer=streamer,
-++++    #             logits_warper=logits_warper,
-++++    #             **model_kwargs,
-++++    #         )
-++++        
-++++    #     # 使用 StaticCache，进入自定义循环
-++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++    #     pad_token_id = generation_config._pad_token_tensor
-++++    #     output_attentions = generation_config.output_attentions
-++++    #     output_hidden_states = generation_config.output_hidden_states
-++++    #     output_scores = generation_config.output_scores
-++++    #     output_logits = generation_config.output_logits
-++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++    #     max_length = generation_config.max_length
-++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++    #     do_sample = generation_config.do_sample
-++++        
-++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++    #         raise ValueError(
-++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++    #             f"{logits_warper})."
-++++    #         )
-++++        
-++++    #     # init attention / hidden states / scores tuples
-++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++        
-++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++    #         encoder_hidden_states = (
-++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++    #         )
-++++        
-++++    #     # keep track of which sequences are already finished
-++++    #     batch_size, cur_len = input_ids.shape
-++++    #     this_peer_finished = False
-++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++        
-++++    #     time_record = []
-++++    #     from ....utils.testing_utils import parse_flag_from_env
-++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++        
-++++    #     while self._has_unfinished_sequences(
-++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++    #     ):
-++++    #         if _record_time:
-++++    #             import time as time_module
-++++    #             infer_start = time_module.time()
-++++            
-++++    #         # prepare model inputs
-++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++            
-++++    #         # prepare variable output controls
-++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++            
-++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++    #         cur_cache_position = model_inputs.get("cache_position")
-++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++    #         cur_input_ids = model_inputs.get("input_ids")
-++++            
-++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++    #             cur_cache_position is not None and 
-++++    #             len(cur_cache_position.shape) > 0 and
-++++    #             cur_cache_position.shape[0] == 1 and
-++++    #             cur_input_ids is not None and
-++++    #             cur_input_ids.shape[1] == 1):
-++++    #             # 使用 JIT 优化的单 token 解码
-++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++    #             if not hasattr(self, '_jit_used'):
-++++    #                 self._jit_used = False
-++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++                
-++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++    #                 cur_token=cur_input_ids,
-++++    #                 input_pos=model_inputs.get("position_ids"),
-++++    #                 cache_position=cur_cache_position,
-++++    #                 past_key_values=cur_past_key_values,
-++++    #             )
-++++                
-++++    #             # 标记已使用JIT（用于后续判断）
-++++    #             if not self._jit_used:
-++++    #                 self._jit_used = True
-++++                
-++++    #             # 构造兼容的输出对象
-++++    #             class JitOptimizedOutput:
-++++    #                 def __init__(self, logits, config):
-++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++    #                     self.config = config
-++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++    #                     self.cross_attentions = None
-++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++                
-++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++    #         else:
-++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++    #             outputs = self(**model_inputs, return_dict=True)
-++++            
-++++    #         if synced_devices and this_peer_finished:
-++++    #             continue
-++++            
-++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++    #         next_token_logits = outputs.logits[:, -1, :]
-++++            
-++++    #         # pre-process distribution
-++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++    #         if do_sample:
-++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++            
-++++    #         # Store scores, attentions and hidden_states when required
-++++    #         if return_dict_in_generate:
-++++    #             if output_scores:
-++++    #                 scores += (next_token_scores,)
-++++    #             if output_logits:
-++++    #                 raw_logits += (next_token_logits,)
-++++    #             if output_attentions:
-++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++    #                 if self.config.is_encoder_decoder:
-++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++                
-++++    #             if output_hidden_states:
-++++    #                 hidden = (
-++++    #                     outputs.decoder_hidden_states
-++++    #                     if self.config.is_encoder_decoder
-++++    #                     else outputs.hidden_states
-++++    #                 )
-++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++            
-++++    #         # token selection
-++++    #         if do_sample:
-++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++    #         else:
-++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++            
-++++    #         # finished sentences should have their next token be a padding token
-++++    #         if has_eos_stopping_criteria:
-++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++            
-++++    #         # update generated ids, model inputs, and length for next step
-++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++    #         if streamer is not None:
-++++    #             streamer.put(next_tokens)
-++++            
-++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++    #             outputs,
-++++    #             model_kwargs,
-++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++    #         )
-++++            
-++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++    #         cur_len += 1
-++++            
-++++    #         if _record_time:
-++++    #             import time as time_module
-++++    #             infer_stop = time_module.time()
-++++    #             time_record.append(infer_stop - infer_start)
-++++            
-++++    #         del outputs
-++++        
-++++    #     average_infer_time = None
-++++    #     if time_record:
-++++    #         if len(time_record) > 1:
-++++    #             time_record.pop(0)
-++++    #         average_infer_time = sum(time_record) / len(time_record)
-++++    #         print(f'average inference time is: {average_infer_time}')
-++++    #         print(f'inference time record: {time_record}')
-++++        
-++++    #     if streamer is not None:
-++++    #         streamer.end()
-++++        
-++++    #     # 简单判断：打印是否使用了JIT路径
-++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++    #     else:
-++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++        
-++++    #     if return_dict_in_generate:
-++++    #         if self.config.is_encoder_decoder:
-++++    #             return GenerateEncoderDecoderOutput(
-++++    #                 sequences=input_ids,
-++++    #                 scores=scores,
-++++    #                 logits=raw_logits,
-++++    #                 encoder_attentions=encoder_attentions,
-++++    #                 encoder_hidden_states=encoder_hidden_states,
-++++    #                 decoder_attentions=decoder_attentions,
-++++    #                 cross_attentions=cross_attentions,
-++++    #                 decoder_hidden_states=decoder_hidden_states,
-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++    #                 average_infer_time=average_infer_time
-++++    #             )
-++++    #         else:
-++++    #             return GenerateDecoderOnlyOutput(
-++++    #                 sequences=input_ids,
-++++    #                 scores=scores,
-++++    #                 logits=raw_logits,
-++++    #                 attentions=decoder_attentions,
-++++    #                 hidden_states=decoder_hidden_states,
-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++    #                 average_infer_time=average_infer_time
-++++    #             )
-++++    #     else:
-++++    #         return input_ids
-++++            
-++++    # def _prepare_cache_for_generation(
-++++    #     self,
-++++    #     generation_config,
-++++    #     model_kwargs,
-++++    #     assistant_model,
-++++    #     batch_size,
-++++    #     max_cache_length,
-++++    # ):
-++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++    #         generation_config.cache_implementation = "static"
-++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++        
-++++    #     if generation_config.cache_implementation == "static":
-++++    #         base_required_from_max_length = generation_config.max_length + 1
-++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++    #         min_cache_size = 50
-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++    #         else:
-++++    #             max_cache_length = max(base_required, min_cache_size)
-++++            
-++++    #         original_max_cache_length = max_cache_length
-++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++            
-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++    #             if max_cache_length > self.config.max_position_embeddings:
-++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++        
-++++    #     result = super()._prepare_cache_for_generation(
-++++    #         generation_config=generation_config,
-++++    #         model_kwargs=model_kwargs,
-++++    #         assistant_model=assistant_model,
-++++    #         batch_size=batch_size,
-++++    #         max_cache_length=max_cache_length,
-++++    #     )
-++++        
-++++    #     if generation_config.cache_implementation == "static":
-++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++    #         created_cache = model_kwargs.get(cache_name)
-++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++    #             if created_cache.max_cache_len < generation_config.max_length:
-++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++        
-++++    #     return result
-++++
-++++
-++++
-+++ 
-+++ 
-+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++-- 
-+++2.27.0
-+++
-++-- 
-++2.27.0
-++
-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+new file mode 100644
-+index 00000000..966529e4
-+--- /dev/null
-++++ b/patches/0003-20261106secondcommit.patch
-+@@ -0,0 +1,2769 @@
-++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Thu, 6 Nov 2025 14:54:37 +0800
-++Subject: [PATCH 3/3] 20261106secondcommit
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-++ patches/0001-20251104commit.patch             | 1272 -----------------
-++ 3 files changed, 528 insertions(+), 2032 deletions(-)
-++ delete mode 100644 patches/0001-20251104commit.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index 73773c22..2f9192bf 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-++ 
-++ _CONFIG_FOR_DOC = "DeepseekConfig"
-++ 
-+++_attn_mask_cache = {}
-+++
-+++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-+++    q_len = batch_and_seq[1]
-+++    kv_len = batch_and_seq[1] + past_key_values_length 
-+++    key = (batch_and_seq[0], q_len, kv_len)
-+++
-+++    if key in _attn_mask_cache:
-+++        return _attn_mask_cache[key]
-+++
-+++    mask = _prepare_4d_causal_attention_mask(
-+++        attention_mask,
-+++        batch_and_seq,
-+++        inputs_embeds,
-+++        past_key_values_length,
-+++    )
-+++    _attn_mask_cache[key] = mask
-+++    return mask
-++ 
-++ def _get_unpad_data(attention_mask):
-++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-++         return final_output
-++ 
-++ 
-++-    @no_grad()
-++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++-        expert_cache = ops.zeros_like(x)
-++-        idxs = flat_expert_indices.argsort()
-++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++-        token_idxs = idxs // self.num_experts_per_tok
-++-
-++-        for i, end_idx in enumerate(tokens_per_expert):
-++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++-            if start_idx == end_idx:
-++-                continue
-++-            expert = self.experts[i]
-++-            exp_token_idx = token_idxs[start_idx:end_idx]
-++-            expert_tokens = x[exp_token_idx]
-++-            expert_out = expert(expert_tokens)
-++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++-
-++-        return expert_cache
-++-        
-++     # @no_grad()
-++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-    #     # expert_cache = torch.zeros_like(x)
-++-    #     # idxs = flat_expert_indices.argsort()
-++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++-    #     # token_idxs = idxs // self.num_experts_per_tok
-++-    #     # for i, end_idx in enumerate(tokens_per_expert):
-++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++-    #     #     if start_idx == end_idx:
-++-    #     #         continue
-++-    #     #     expert = self.experts[i]
-++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++-    #     #     expert_tokens = x[exp_token_idx]
-++-    #     #     expert_out = expert(expert_tokens)
-++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++-    #     # return expert_cache
-+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++     #     expert_cache = ops.zeros_like(x)
-++     #     idxs = flat_expert_indices.argsort()
-++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++ 
-++     #     return expert_cache
-++-    # @no_grad()
-++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-    #     expert_cache = ops.zeros_like(x)
-+++        
-+++    @no_grad()
-+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++        """
-+++        优化版 MoE prefill：
-+++        - 批量张量化处理同一个 expert 的所有 token
-+++        - 跳过无 token 的专家
-+++        - 保持结果完全一致
-+++        """
-+++        # 初始化输出缓存
-+++        expert_cache = ops.zeros_like(x)
-++ 
-++-    #     # 排序保证顺序一致
-++-    #     idxs = flat_expert_indices.argsort()
-++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++-    #     token_idxs = idxs // self.num_experts_per_tok
-+++        # 排序（确保 scatter_add 位置对应原逻辑）
-+++        idxs = flat_expert_indices.argsort()
-+++        sorted_expert_indices = flat_expert_indices[idxs]
-+++        sorted_token_indices = idxs // self.num_experts_per_tok
-++ 
-++-    #     # 找出有 token 的专家
-++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++        # 每个 expert 的 token 数
-+++        tokens_per_expert = sorted_expert_indices.bincount()
-++ 
-++-    #     for i in active_experts.tolist():
-++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++-    #         end_idx = tokens_per_expert[i]
-++-    #         if start_idx == end_idx:  # 没有 token
-++-    #             continue
-+++        # 找出有 token 的专家
-+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++ 
-++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++-    #         expert_tokens = x[exp_token_idx]
-++-    #         expert_out = self.experts[i](expert_tokens)
-++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++        for expert_id in active_experts.tolist():
-+++            # 取该 expert 对应的排序后 token 区间
-+++            start = (tokens_per_expert[:expert_id]).sum().item()
-+++            end = start + tokens_per_expert[expert_id].item()
-++ 
-++-    #         expert_cache = mindspore.mint.scatter_add(
-++-    #             expert_cache,
-++-    #             0,
-++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++-    #             expert_out
-++-    #         )
-+++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-+++            expert_tokens = x[token_idx]                     # 取输入向量
-++ 
-++-    #     return expert_cache
-+++            # 执行专家 MLP
-+++            expert_out = self.experts[expert_id](expert_tokens)
-+++
-+++            # 按权重缩放
-+++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-+++
-+++            # 回写到缓存（等价 scatter_add）
-+++            expert_cache = mindspore.mint.scatter_add(
-+++                expert_cache,
-+++                0,
-+++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++                scaled_out
-+++            )
-+++
-+++        return expert_cache
-+++
-+++        # @no_grad()
-+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++        #     # expert_cache = torch.zeros_like(x)
-+++        #     # idxs = flat_expert_indices.argsort()
-+++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++        #     # token_idxs = idxs // self.num_experts_per_tok
-+++        #     # for i, end_idx in enumerate(tokens_per_expert):
-+++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++        #     #     if start_idx == end_idx:
-+++        #     #         continue
-+++        #     #     expert = self.experts[i]
-+++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++        #     #     expert_tokens = x[exp_token_idx]
-+++        #     #     expert_out = expert(expert_tokens)
-+++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++        #     # return expert_cache
-+++        #     expert_cache = ops.zeros_like(x)
-+++        #     idxs = flat_expert_indices.argsort()
-+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++        #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++        #     for i, end_idx in enumerate(tokens_per_expert):
-+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++        #         if start_idx == end_idx:
-+++        #             continue
-+++        #         expert = self.experts[i]
-+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++        #         expert_tokens = x[exp_token_idx]
-+++        #         expert_out = expert(expert_tokens)
-+++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++
-+++        #     return expert_cache
-+++        # @no_grad()
-+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++        #     expert_cache = ops.zeros_like(x)
-+++
-+++        #     # 排序保证顺序一致
-+++        #     idxs = flat_expert_indices.argsort()
-+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++        #     token_idxs = idxs // self.num_experts_per_tok
-+++
-+++        #     # 找出有 token 的专家
-+++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++
-+++        #     for i in active_experts.tolist():
-+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++        #         end_idx = tokens_per_expert[i]
-+++        #         if start_idx == end_idx:  # 没有 token
-+++        #             continue
-+++
-+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++        #         expert_tokens = x[exp_token_idx]
-+++        #         expert_out = self.experts[i](expert_tokens)
-+++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++
-+++        #         expert_cache = mindspore.mint.scatter_add(
-+++        #             expert_cache,
-+++        #             0,
-+++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++        #             expert_out
-+++        #         )
-+++
-+++        #     return expert_cache
-++ 
-++ 
-++ 
-++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-++-
-++ # class DeepseekFlashAttention(nn.Module):
-++ #     """
-++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-+++
-++ Deepseek_ATTENTION_CLASSES = {
-++     "eager": DeepseekAttention,
-++     "flash-attention": DeepseekFlashAttention,
-++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-++             )
-++         else:
-++             # 4d mask is passed through the layers
-++-            attention_mask = _prepare_4d_causal_attention_mask(
-+++            # attention_mask = _prepare_4d_causal_attention_mask(
-+++            #     attention_mask,
-+++            #     (batch_size, seq_length),
-+++            #     inputs_embeds,
-+++            #     past_key_values_length,
-+++            # )
-+++            #@dwj
-+++            attention_mask = get_cached_causal_mask(
-++                 attention_mask,
-++                 (batch_size, seq_length),
-++                 inputs_embeds,
-++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++         # Initialize weights and apply final processing
-++         self.post_init()
-++         self.warm_up = False
-+++        #@dwj
-+++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+++            self.num_layers,
-+++            self.num_attention_heads,
-+++            self.head_dim,
-+++            batch_size=1,
-+++            max_length=self.max_length,
-+++            dtype=mindspore.float16
-+++        )
-+++
-+++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+++        key_cache = []
-+++        value_cache = []
-+++        for _ in range(num_layers):
-+++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++            key_cache.append(k)
-+++            value_cache.append(v)
-+++        return key_cache, value_cache
-+++
-++ 
-++     def warmup_moe_model_deep(self):
-++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++index bced285c..ebd7782e 100644
-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++ 
-++-Long_Prompt = False
-++-PROMPT_LENGTH_THRESHOLD = 128
-+++Long_Prompt = 1
-+++LONG_PROMPT_LENGTH_THRESHOLD = 128
-+++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-+++
-+++_causal_mask_cache = {}
-+++
-+++def get_cached_causal_mask_with_cache_position(
-+++    attention_mask: mindspore.Tensor,
-+++    sequence_length: int,
-+++    target_length: int,
-+++    dtype: mindspore.dtype,
-+++    min_dtype: float,
-+++    cache_position: mindspore.Tensor,
-+++    batch_size: int,
-+++):
-+++    """
-+++    带缓存的 causal mask 构造函数
-+++    """
-+++    # q_len 是当前 query 长度
-+++    q_len = sequence_length
-+++    # kv_len 是 target_length
-+++    kv_len = target_length
-+++
-+++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-+++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-+++
-+++    if key in _causal_mask_cache:
-+++        return _causal_mask_cache[key]
-+++
-+++    # 调用原来的 mask 构造逻辑
-+++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++        attention_mask,
-+++        sequence_length=sequence_length,
-+++        target_length=target_length,
-+++        dtype=dtype,
-+++        min_dtype=min_dtype,
-+++        cache_position=cache_position,
-+++        batch_size=batch_size,
-+++    )
-+++    # 缓存结果
-+++    _causal_mask_cache[key] = causal_mask
-+++    return causal_mask
-++ 
-++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++ 
-++ 
-++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-+++# class Qwen2MoeAttention(nn.Module):
-+++#     """
-+++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+++#     and "Generating Long Sequences with Sparse Transformers".
-+++#     """
-+++
-+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++#         super().__init__()
-+++#         self.config = config
-+++#         self.layer_idx = layer_idx
-+++#         if layer_idx is None:
-+++#             logger.warning_once(
-+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++#                 "when creating this class."
-+++#             )
-+++
-+++#         self.hidden_size = config.hidden_size
-+++#         self.num_heads = config.num_attention_heads
-+++#         self.head_dim = self.hidden_size // self.num_heads
-+++#         self.num_key_value_heads = config.num_key_value_heads
-+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++#         self.max_position_embeddings = config.max_position_embeddings
-+++#         self.rope_theta = config.rope_theta
-+++#         self.is_causal = True
-+++#         self.attention_dropout = config.attention_dropout
-+++
-+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++#             raise ValueError(
-+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++#                 f" and `num_heads`: {self.num_heads})."
-+++#             )
-+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++
-+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++#             self.head_dim,
-+++#             max_position_embeddings=self.max_position_embeddings,
-+++#             base=self.rope_theta,
-+++#         )
-+++
-+++#     def forward(
-+++#         self,
-+++#         hidden_states: mindspore.Tensor,
-+++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++#         position_ids: Optional[mindspore.Tensor] = None,
-+++#         past_key_value: Optional[Cache] = None,
-+++#         output_attentions: bool = False,
-+++#         use_cache: bool = False,
-+++#         cache_position: Optional[mindspore.Tensor] = None,
-+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++
-+++        
-+++
-+++#         bsz, q_len, _ = hidden_states.shape
-+++
-+++#         query_states = self.q_proj(hidden_states)
-+++#         key_states = self.k_proj(hidden_states)
-+++#         value_states = self.v_proj(hidden_states)
-+++
-+++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++
-+++#         kv_seq_len = key_states.shape[-2]
-+++#         if past_key_value is not None:
-+++#             if self.layer_idx is None:
-+++#                 raise ValueError(
-+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++#                     "with a layer index."
-+++#                 )
-+++#             if isinstance(past_key_value, StaticCache):
-+++#                 kv_seq_len = key_states.shape[-2]
-+++#             else:
-+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++
-+++#         if past_key_value is not None:
-+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++            
-+++#             if isinstance(past_key_value, StaticCache):
-+++#                 kv_seq_len = key_states.shape[-2]
-+++
-+++#         # repeat k/v heads if n_kv_heads < n_heads
-+++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++        
-+++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++
-+++#         if attention_mask is not None:
-+++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++#             attn_weights = attn_weights + causal_mask
-+++
-+++#         # upcast attention to fp32
-+++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++#         attn_output = ops.matmul(attn_weights, value_states)
-+++
-+++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++#             raise ValueError(
-+++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+++#                 f" {attn_output.shape}"
-+++#             )
-+++
-+++#         attn_output = ops.transpose(attn_output, 1, 2)
-+++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++
-+++#         attn_output = self.o_proj(attn_output)
-+++#         # @lwx
-+++        
-+++#         # max_seq_len = self.max_position_embeddings  # 2048
-+++
-+++#         # if attention_mask is not None:
-+++#         #     # attention_mask: [B, 1, Sq, Sk]
-+++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++
-+++#         #     # pad 到 [max_seq_len, max_seq_len]
-+++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++#         #     global_attention_mask = padded_mask
-+++#         # else:
-+++#         #     global_attention_mask = None
-+++
-+++
-+++#         # sparse_mode=3
-+++#         # attn_output = mindspore.ops.flash_attention_score(
-+++#         #     query=query_states,
-+++#         #     key=key_states,
-+++#         #     value=value_states,
-+++#         #     real_shift=None,
-+++#         #     padding_mask=None,
-+++
-+++#         #     head_num=self.num_heads,
-+++#         #     attn_mask=global_attention_mask,
-+++#         #     keep_prob=1.0 - self.attention_dropout,
-+++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++#         #     input_layout="BNSD",
-+++#         #     pre_tokens=2147483647,
-+++#         #     next_tokens=2147483647,
-+++#         #     inner_precise=0,
-+++#         #     drop_mask=None,
-+++#         #     prefix=None,
-+++#         #     actual_seq_qlen=None,
-+++#         #     actual_seq_kvlen=None,
-+++#         #     sparse_mode=sparse_mode,
-+++#         # )
-+++#         if not output_attentions:
-+++#             attn_weights = None
-+++
-+++#         return attn_output, attn_weights, past_key_value
-+++
-++ class Qwen2MoeAttention(nn.Module):
-++     """
-++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++-    and "Generating Long Sequences with Sparse Transformers".
-++-    """
-+++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-++ 
-+++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-+++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-+++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-+++
-+++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-+++    """
-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++         super().__init__()
-++         self.config = config
-++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-++         if layer_idx is None:
-++             logger.warning_once(
-++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++                 "when creating this class."
-++             )
-++ 
-++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-++         use_cache: bool = False,
-++         cache_position: Optional[mindspore.Tensor] = None,
-++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-
-++         
-++-
-+++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-++         bsz, q_len, _ = hidden_states.shape
-++ 
-++         query_states = self.q_proj(hidden_states)
-++         key_states = self.k_proj(hidden_states)
-++         value_states = self.v_proj(hidden_states)
-++ 
-++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++-
-+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++        
-++         kv_seq_len = key_states.shape[-2]
-++         if past_key_value is not None:
-++-            if self.layer_idx is None:
-++-                raise ValueError(
-++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-                    "with a layer index."
-++-                )
-++-            if isinstance(past_key_value, StaticCache):
-++-                kv_seq_len = key_states.shape[-2]
-++-            else:
-++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++        
-++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++ 
-++         if past_key_value is not None:
-++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++
-+++        # --- 2. 动态调度核心注意力计算 ---
-+++        global Long_Prompt
-+++        if Long_Prompt >= 1:
-+++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-+++            fa_attention_mask = None
-+++            if attention_mask is not None:
-+++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++                fa_attention_mask = (mask_slice != 0)
-+++
-+++            attn_output = mindspore.ops.flash_attention_score(
-+++                query=query_states,
-+++                key=key_states,
-+++                value=value_states,
-+++                head_num=self.num_heads,
-+++                attn_mask=fa_attention_mask,
-+++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-+++                scalar_value=1.0 / math.sqrt(self.head_dim),
-+++                input_layout="BNSD",
-+++                sparse_mode=0,
-+++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-+++            )
-++             
-++-            if isinstance(past_key_value, StaticCache):
-++-                kv_seq_len = key_states.shape[-2]
-+++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++            attn_output = self.o_proj(attn_output)
-+++            attn_weights = None
-+++            if output_attentions:
-+++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++ 
-++-        # repeat k/v heads if n_kv_heads < n_heads
-++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++-        
-++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++        else:
-+++            # --- Eager Attention 路径 (用于短序列和解码) ---
-+++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++            
-+++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++ 
-++-        if attention_mask is not None:
-++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++-            attn_weights = attn_weights + causal_mask
-+++            if attention_mask is not None:
-+++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++                attn_weights = attn_weights + causal_mask
-++ 
-++-        # upcast attention to fp32
-++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++-        attn_output = ops.matmul(attn_weights, value_states)
-+++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++            attn_output = ops.matmul(attn_weights, value_states)
-++ 
-++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++-            raise ValueError(
-++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++-                f" {attn_output.shape}"
-++-            )
-+++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++                raise ValueError(
-+++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-+++                )
-++ 
-++-        attn_output = ops.transpose(attn_output, 1, 2)
-++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++            attn_output = ops.transpose(attn_output, 1, 2)
-+++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++            attn_output = self.o_proj(attn_output)
-++ 
-++-        attn_output = self.o_proj(attn_output)
-++-        # @lwx
-+++            if not output_attentions:
-+++                attn_weights = None
-++         
-++-        # max_seq_len = self.max_position_embeddings  # 2048
-++-
-++-        # if attention_mask is not None:
-++-        #     # attention_mask: [B, 1, Sq, Sk]
-++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++-
-++-        #     # pad 到 [max_seq_len, max_seq_len]
-++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++-        #     global_attention_mask = padded_mask
-++-        # else:
-++-        #     global_attention_mask = None
-++-
-++-
-++-        # sparse_mode=3
-++-        # attn_output = mindspore.ops.flash_attention_score(
-++-        #     query=query_states,
-++-        #     key=key_states,
-++-        #     value=value_states,
-++-        #     real_shift=None,
-++-        #     padding_mask=None,
-++-
-++-        #     head_num=self.num_heads,
-++-        #     attn_mask=global_attention_mask,
-++-        #     keep_prob=1.0 - self.attention_dropout,
-++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++-        #     input_layout="BNSD",
-++-        #     pre_tokens=2147483647,
-++-        #     next_tokens=2147483647,
-++-        #     inner_precise=0,
-++-        #     drop_mask=None,
-++-        #     prefix=None,
-++-        #     actual_seq_qlen=None,
-++-        #     actual_seq_kvlen=None,
-++-        #     sparse_mode=sparse_mode,
-++-        # )
-++-        if not output_attentions:
-++-            attn_weights = None
-++-
-++         return attn_output, attn_weights, past_key_value
-++ 
-++-
-++ # class Qwen2MoeFlashAttention(nn.Module):
-++ #     """
-++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-++ #             return final_hidden_states, router_logits
-++ 
-++ 
-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++-#     """
-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++-#     """
-++-#     def __init__(self, config: Qwen2MoeConfig):
-++-#         super().__init__()
-++-#         self.num_experts = config.num_experts
-++-#         self.top_k = config.num_experts_per_tok
-++-#         self.norm_topk_prob = config.norm_topk_prob
-++-
-++-#         # 门控网络
-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++-#         # 专家列表
-++-#         self.experts = nn.ModuleList(
-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++-#         )
-++-#         # 共享专家
-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_decode(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         """
-++-#         【解码路径】针对 sequence_length=1 的极致优化。
-++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++-#         """
-++-#         batch_size, hidden_dim = hidden_states.shape
-++-        
-++-#         expert_outputs_list = [
-++-#             ops.cat([
-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++-#             ], dim=0) 
-++-#             for i in range(batch_size)
-++-#         ]
-++-        
-++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++-#         # shape: (batch_size, top_k, hidden_dim)
-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++-        
-++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++-        
-++-#         return moe_output.squeeze(1)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_prefill(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         """
-++-#         【预填充路径】针对 sequence_length > 1 的优化。
-++-#         按专家对 Token 进行分组，并进行批处理。
-++-#         """
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens = hidden_states.shape[0]
-++-#         flat_selected_experts = selected_experts.flatten()
-++-        
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-        
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-        
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-            
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-#             selected_token_indices = token_indices[mask]
-++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++-            
-++-#             current_states = hidden_states[selected_token_indices]
-++-            
-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-            
-++-#             moe_output = moe_output.index_add(
-++-#                 dim=0,
-++-#                 index=selected_token_indices,
-++-#                 source=expert_output.to(hidden_states.dtype)
-++-#             )
-++-#         return moe_output
-++-
-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-#         """
-++-#         顶层 forward 方法，作为智能分发器。
-++-#         """
-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        
-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-#         router_logits = self.gate(hidden_states_reshaped)
-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-
-++-#         if self.norm_topk_prob:
-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        
-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++-        
-++-#         moe_output = None
-++-#         # 在推理时，根据序列长度选择最优路径
-++-#         if not self.training:
-++-#             if sequence_length == 1:
-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++-#             else:
-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++-#         else:
-++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++-#             raise NotImplementedError("Training path is not implemented.")
-++-
-++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++-        
-++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++-        
-++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++-        
-++-#         return final_hidden_states, router_logits
-++-
-++-
-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++-#     """
-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++-#     """
-++-#     def __init__(self, config: Qwen2MoeConfig):
-++-#         super().__init__()
-++-#         self.num_experts = config.num_experts
-++-#         self.top_k = config.num_experts_per_tok
-++-#         self.norm_topk_prob = config.norm_topk_prob
-++-
-++-#         # 门控网络
-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++-#         # 专家列表
-++-#         self.experts = nn.ModuleList(
-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++-#         )
-++-#         # 共享专家
-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_decode(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         batch_size, _ = hidden_states.shape
-++-#         expert_outputs_list = [
-++-#             ops.cat([
-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++-#             ], dim=0) 
-++-#             for i in range(batch_size)
-++-#         ]
-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++-#         return moe_output.squeeze(1)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_prefill(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens = hidden_states.shape[0]
-++-#         flat_selected_experts = selected_experts.flatten()
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-        
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-#             selected_token_indices = token_indices[mask]
-++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++-#             current_states = hidden_states[selected_token_indices]
-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-#             moe_output = moe_output.index_add(
-++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++-#             )
-++-#         return moe_output
-++-
-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-#         """
-++-#         顶层 forward 方法，作为智能分发器。
-++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++-#         """
-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        
-++-#         # 1. 门控计算 (通用逻辑)
-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-#         router_logits = self.gate(hidden_states_reshaped)
-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-
-++-#         if self.norm_topk_prob:
-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        
-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++-        
-++-#         # 2. 智能分发到最优 MoE 路径
-++-#         moe_output = None
-++-#         if not self.training:
-++-#             if sequence_length == 1:
-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++-#             else:
-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++-#         else:
-++-#             raise NotImplementedError("Training path is not implemented.")
-++-
-++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++-        
-++-#         # 4. 合并 MoE 输出和共享专家输出
-++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++-        
-++-#         # 5. 恢复原始形状并返回
-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++-        
-++-#         return final_hidden_states, router_logits
-++-
-++-# prefill fastest
-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++-#     """
-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++-#     """
-++-#     def __init__(self, config: Qwen2MoeConfig):
-++-#         super().__init__()
-++-#         self.num_experts = config.num_experts
-++-#         self.top_k = config.num_experts_per_tok
-++-#         self.norm_topk_prob = config.norm_topk_prob
-++-
-++-#         # 门控网络
-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++-#         # 专家列表
-++-#         self.experts = nn.ModuleList(
-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++-#         )
-++-#         # 共享专家
-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_dispatch(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         """
-++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++-#         """
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens, _ = hidden_states.shape
-++-        
-++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++-#         flat_selected_experts = selected_experts.flatten()
-++-#         flat_routing_weights = routing_weights.flatten()
-++-
-++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-
-++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-        
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-            
-++-#             # 找到所有分配给该专家的 token
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-            
-++-#             # 使用 mask 选取对应的 token 和权重
-++-#             current_token_indices = token_indices[mask]
-++-#             current_routing_weights = flat_routing_weights[mask]
-++-#             current_hidden_states = hidden_states[current_token_indices]
-++-            
-++-#             # 对这些 token 进行批处理
-++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++-            
-++-#             # 使用 index_add 将结果精确地加回到对应位置
-++-#             moe_output = moe_output.index_add(
-++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++-#             )
-++-#         return moe_output
-++-
-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-#         """
-++-#         顶层 forward 方法，作为智能分发器。
-++-#         """
-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        
-++-#         # 1. 门控计算
-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-#         router_logits = self.gate(hidden_states_reshaped)
-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-
-++-#         if self.norm_topk_prob:
-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        
-++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++-        
-++-#         # 2. 调用统一的 MoE 计算内核
-++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++-
-++-#         # 3. 统一处理共享专家
-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++-        
-++-#         # 4. 合并输出
-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++-        
-++-#         # 5. 恢复原始形状并返回
-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++-        
-++-#         return final_hidden_states, router_logits
-++-
-++-
-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++-#     """
-++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++-#     【最终高性能与高精度版】：
-++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++-#     3. 这样实现了速度和准确性的两全其美。
-++-#     """
-++-#     def __init__(self, config: Qwen2MoeConfig):
-++-#         super().__init__()
-++-#         self.num_experts = config.num_experts
-++-#         self.top_k = config.num_experts_per_tok
-++-#         self.norm_topk_prob = config.norm_topk_prob
-++-
-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++-#         self.experts = nn.ModuleList(
-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++-#         )
-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_decode(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         """
-++-#         【解码路径】极致优化版：bmm + 高精度累加。
-++-#         """
-++-#         original_dtype = hidden_states.dtype
-++-#         batch_size, _ = hidden_states.shape
-++-        
-++-#         expert_outputs_list = [
-++-#             ops.cat([
-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++-#             ], dim=0) 
-++-#             for i in range(batch_size)
-++-#         ]
-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++-
-++-#         # 在 float32 下执行 bmm，得到高精度结果
-++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++-        
-++-#         # 将高精度结果转换回原始数据类型
-++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++-        
-++-#         return moe_output
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_prefill(
-++-#         self, 
-++-#         hidden_states: mindspore.Tensor, 
-++-#         selected_experts: mindspore.Tensor, 
-++-#         routing_weights: mindspore.Tensor
-++-#     ) -> mindspore.Tensor:
-++-#         """
-++-#         【预填充路径】与原始实现一致，结果精确。
-++-#         """
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens, _ = hidden_states.shape
-++-#         flat_selected_experts = selected_experts.flatten()
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-        
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-#             selected_token_indices = token_indices[mask]
-++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++-#             current_states = hidden_states[selected_token_indices]
-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-#             moe_output = moe_output.index_add(
-++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++-#             )
-++-#         return moe_output
-++-
-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-        
-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-#         router_logits = self.gate(hidden_states_reshaped)
-++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-
-++-#         if self.norm_topk_prob:
-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        
-++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++-#         # 如果模型主体是 float16，后续再转换
-++-        
-++-#         moe_output = None
-++-#         if not self.training:
-++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++-#             # _moe_infer_decode 内部会处理好类型转换
-++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++-#             if sequence_length == 1:
-++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++-#             else:
-++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++-#         else:
-++-#             raise NotImplementedError("Training path is not implemented.")
-++-
-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++-        
-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++-        
-++-#         return final_hidden_states, router_logits
-++-    
-++-
-++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++-#     """
-++-#     【融合版】一个混合专家模块，内置两种推理策略，
-++-#     由外部全局变量 `Long_Prompt` 控制：
-++-
-++-#     - if Long_Prompt is True:  【精度优先模式】
-++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++-#       适用于处理长序列，避免误差累积。
-++-
-++-#     - if Long_Prompt is False: 【速度优先模式】
-++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++-#       在解码阶段获得极致速度，同时保证结果高度准确。
-++-#     """
-++-#     def __init__(self, config: Qwen2MoeConfig):
-++-#         super().__init__()
-++-#         self.num_experts = config.num_experts
-++-#         self.top_k = config.num_experts_per_tok
-++-#         self.norm_topk_prob = config.norm_topk_prob
-++-
-++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++-#         self.experts = nn.ModuleList(
-++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++-#         )
-++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++-
-++-#     # --- 速度优先模式的辅助函数 ---
-++-#     @no_grad()
-++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++-#         original_dtype = hidden_states.dtype
-++-#         batch_size, _ = hidden_states.shape
-++-#         expert_outputs_list = [
-++-#             ops.cat([
-++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++-#             ], dim=0) 
-++-#             for i in range(batch_size)
-++-#         ]
-++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++-#         weights_fp32 = routing_weights.to(mindspore.float32)
-++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++-
-++-#     @no_grad()
-++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens, _ = hidden_states.shape
-++-#         flat_selected_experts = selected_experts.flatten()
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-#             selected_token_indices = token_indices[mask]
-++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++-#             current_states = hidden_states[selected_token_indices]
-++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++-#         return moe_output
-++-
-++-#     # --- 精度优先模式的辅助函数 ---
-++-#     @no_grad()
-++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++-#         moe_output = ops.zeros_like(hidden_states)
-++-#         num_tokens, _ = hidden_states.shape
-++-#         flat_selected_experts = selected_experts.flatten()
-++-#         flat_routing_weights = routing_weights.flatten()
-++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++-#         active_experts = ops.unique(flat_selected_experts)
-++-#         for expert_idx_tensor in active_experts:
-++-#             expert_idx = expert_idx_tensor.item()
-++-#             expert_layer = self.experts[expert_idx]
-++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++-#             current_token_indices = token_indices[mask]
-++-#             current_routing_weights = flat_routing_weights[mask]
-++-#             current_hidden_states = hidden_states[current_token_indices]
-++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++-#         return moe_output
-++-
-++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++-#         # 声明我们将要使用一个在模块外部定义的全局变量
-++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++-#         global Long_Prompt
-++-
-++-#         # 1. 门控计算 (所有模式通用)
-++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-#         router_logits = self.gate(hidden_states_reshaped)
-++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++-#         if self.norm_topk_prob:
-++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-        
-++-#         moe_output = None
-++-#         if not self.training:
-++-#             # 根据 Long_Prompt 标志选择模式
-++-#             if Long_Prompt:
-++-#                 # --- 精度优先模式 ---
-++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++-#             else:
-++-#                 # --- 速度优先模式 ---
-++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++-#                 if sequence_length == 1:
-++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++-#                 else:
-++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++-#         else:
-++-#             raise NotImplementedError("Training path is not implemented.")
-++-
-++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++-        
-++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++-        
-++-#         return final_hidden_states, router_logits
-++-    
-++ class Qwen2MoeSparseMoeBlock(nn.Module):
-++     """
-++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++         return moe_output_fp32.squeeze(1).to(original_dtype)
-++ 
-+++    # @no_grad()
-+++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++    #     num_tokens, _ = hidden_states.shape
-+++    #     flat_selected_experts = selected_experts.flatten()
-+++    #     sorted_expert_indices = flat_selected_experts.argsort()
-+++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++    #     original_token_indices = sorted_expert_indices // self.top_k
-+++    #     moe_output = ops.zeros_like(hidden_states)
-+++    #     current_token_offset = 0
-+++    #     for i in range(self.num_experts):
-+++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-+++    #         if expert_token_count == 0:
-+++    #             continue
-+++    #         end_offset = current_token_offset + expert_token_count
-+++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-+++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++    #         current_token_offset += expert_token_count
-+++    #     return moe_output
-+++
-++     @no_grad()
-++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++-        num_tokens, _ = hidden_states.shape
-++-        flat_selected_experts = selected_experts.flatten()
-++-        sorted_expert_indices = flat_selected_experts.argsort()
-++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++-        original_token_indices = sorted_expert_indices // self.top_k
-+++        """
-+++        优化版 MoE prefill (速度优先模式)：
-+++        - 批量张量化处理同一个 expert 的所有 token
-+++        - 跳过无 token 的专家
-+++        - 保持结果完全一致
-+++        """
-++         moe_output = ops.zeros_like(hidden_states)
-++-        current_token_offset = 0
-++-        for i in range(self.num_experts):
-++-            expert_token_count = tokens_per_expert[i] - current_token_offset
-++-            if expert_token_count == 0:
-++-                continue
-++-            end_offset = current_token_offset + expert_token_count
-++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++-            expert_hidden_states = hidden_states[expert_original_token_indices]
-++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++-            current_token_offset += expert_token_count
-+++
-+++        flat_selected_experts = selected_experts.flatten()
-+++        flat_routing_weights = routing_weights.flatten()
-+++
-+++        idxs = flat_selected_experts.argsort()
-+++        sorted_expert_indices = flat_selected_experts[idxs]
-+++        sorted_token_indices = idxs // self.top_k
-+++
-+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-+++
-+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+++
-+++        for expert_id in active_experts.tolist():
-+++            start = int(tokens_per_expert[:expert_id].sum().item())
-+++            end = start + int(tokens_per_expert[expert_id].item())
-+++
-+++            token_idx = sorted_token_indices[start:end]
-+++            expert_tokens = hidden_states[token_idx]
-+++
-+++            expert_out = self.experts[expert_id](expert_tokens)
-+++
-+++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-+++
-+++            moe_output = mindspore.mint.scatter_add(
-+++                moe_output,
-+++                0,
-+++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-+++                scaled_out.to(hidden_states.dtype)
-+++            )
-+++
-++         return moe_output
-++ 
-+++
-++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++     @no_grad()
-++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++         
-++         moe_output = None
-++-        if Long_Prompt:
-++-            # --- 精度优先模式 (ACCURACY MODE) ---
-++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        # if Long_Prompt==0:
-+++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        # else:
-+++        #     # --- 速度优先模式 (SPEED MODE) ---
-+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++        #     if sequence_length == 1:
-+++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        #     else:
-+++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++        
-+++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++        if sequence_length == 1:
-+++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++         else:
-++-            # --- 速度优先模式 (SPEED MODE) ---
-++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++-            if sequence_length == 1:
-++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++-            else:
-++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++-        
-+++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++    
-++ 
-++         # 3. 共享专家计算与合并 (所有模式通用)
-++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++         
-++         return final_hidden_states, router_logits
-++ 
-+++
-++ class Qwen2MoeDecoderLayer(nn.Module):
-++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++         super().__init__()
-++         self.hidden_size = config.hidden_size
-++         
-++-        # if Long_Prompt:
-++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++-        # else:
-+++        # if Long_Prompt == 2:
-++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++        # else:
-+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++ 
-++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++ 
-++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++             )
-++ 
-++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++        #     attention_mask,
-+++        #     sequence_length=sequence_length,
-+++        #     target_length=target_length,
-+++        #     dtype=dtype,
-+++        #     min_dtype=min_dtype,
-+++        #     cache_position=cache_position,
-+++        #     batch_size=input_tensor.shape[0],
-+++        # )
-+++        #@dwj
-+++        causal_mask = get_cached_causal_mask_with_cache_position(
-++             attention_mask,
-++             sequence_length=sequence_length,
-++             target_length=target_length,
-++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++         """
-++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-+++        _causal_mask_cache.clear()
-++ 
-++         input_ids = kwargs.get("input_ids")
-++         if input_ids is None and args:
-++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++ 
-++         if input_ids is not None:
-++             prompt_length = input_ids.shape[1]
-++-            
-++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++-                Long_Prompt = True
-+++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-+++                Long_Prompt = 2
-+++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-+++                Long_Prompt = 0
-++             else:
-++-                Long_Prompt = False
-+++                Long_Prompt = 1
-+++
-++ 
-++         return super().generate(*args, **kwargs)
-++     
-++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++             dtype = self.lm_head.weight.dtype
-++             min_dtype = float(ops.finfo(dtype).min)
-++ 
-++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++            #     attention_mask,
-+++            #     sequence_length=sequence_length,
-+++            #     target_length=past_key_values.get_max_length(),
-+++            #     dtype=dtype,
-+++            #     min_dtype=min_dtype,
-+++            #     cache_position=cache_position,
-+++            #     batch_size=batch_size,
-+++            # )
-+++
-+++            #@dwj
-+++            attention_mask = get_cached_causal_mask_with_cache_position(
-++                 attention_mask,
-++                 sequence_length=sequence_length,
-++                 target_length=past_key_values.get_max_length(),
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++deleted file mode 100644
-++index 6dfb5b93..00000000
-++--- a/patches/0001-20251104commit.patch
-+++++ /dev/null
-++@@ -1,1272 +0,0 @@
-++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++-From: Pinoeer-kingxi <13022943007@163.com>
-++-Date: Tue, 4 Nov 2025 09:11:51 +0800
-++-Subject: [PATCH] 20251104commit
-++-
-++----
-++- mindnlp/transformers/cache_utils.py           |  28 +-
-++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++- 3 files changed, 976 insertions(+), 87 deletions(-)
-++-
-++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++-index cadd2e04..02f8d4be 100644
-++---- a/mindnlp/transformers/cache_utils.py
-++-+++ b/mindnlp/transformers/cache_utils.py
-++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++-             # k_out[:, :, cache_position] = key_states
-++-             # v_out[:, :, cache_position] = value_states
-++--            if ON_ORANGE_PI:
-++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++--            else:
-++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++--
-++-+            # if ON_ORANGE_PI:
-++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++-+            # else:
-++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++-+            # 确保 cache_position 是 1D tensor 并且类型正确
-++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++-+            if cache_position.ndim > 1:
-++-+                cache_position = cache_position.flatten()
-++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++-+                cache_position = cache_position.int()
-++-+            
-++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++-+            k_out[:, :, cache_position] = key_states
-++-+            v_out[:, :, cache_position] = value_states
-++-+            
-++-         return k_out, v_out
-++- 
-++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++-index c695b944..d8303e45 100644
-++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++- # Copied from transformers.models.llama.modeling_llama.rotate_half
-++- def rotate_half(x):
-++-     """Rotates half the hidden dims of the input."""
-++--    x1 = x[..., : x.shape[-1] // 2]
-++--    x2 = x[..., x.shape[-1] // 2 :]
-++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++-+    # x1 = x[..., : x.shape[-1] // 2]
-++-+    # x2 = x[..., x.shape[-1] // 2 :]
-++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++-     return ops.cat((-x2, x1), dim=-1)
-++- 
-++- 
-++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++-         if self.training:
-++-             raise NotImplementedError("Training is not supported yet.")
-++-         else:
-++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++--        if self.config.n_shared_experts is not None:
-++--            y = y + self.shared_experts(identity)
-++--        return y
-++-+            # @lwx
-++-+            if orig_shape[1] == 1:
-++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++-+                y=y.view(*orig_shape)
-++-+                if self.config.n_shared_experts is not None:
-++-+                    y = y + self.shared_experts(identity)
-++-+                return y
-++-+            else:
-++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++-+                if self.config.n_shared_experts is not None:
-++-+                    y = y + self.shared_experts(identity)
-++-+                return y
-++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++-+        # if self.config.n_shared_experts is not None:
-++-+        #     y = y + self.shared_experts(identity)
-++-+        # return y
-++-+        
-++-+    @no_grad()
-++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++-+
-++-+        expert_cache = ops.zeros_like(x)
-++-+        for i in range(self.num_experts_per_tok):
-++-+            expert_id = flat_expert_indices[i].item()
-++-+            weight = flat_expert_weights[i].item()
-++-+            expert = self.experts[expert_id]
-++-+            expert_out = expert(x)
-++-+            expert_cache += expert_out * weight
-++-+        return expert_cache
-++- 
-++-     @no_grad()
-++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++--        # expert_cache = torch.zeros_like(x)
-++--        # idxs = flat_expert_indices.argsort()
-++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++--        # token_idxs = idxs // self.num_experts_per_tok
-++--        # for i, end_idx in enumerate(tokens_per_expert):
-++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++--        #     if start_idx == end_idx:
-++--        #         continue
-++--        #     expert = self.experts[i]
-++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++--        #     expert_tokens = x[exp_token_idx]
-++--        #     expert_out = expert(expert_tokens)
-++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++--        # return expert_cache
-++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++-         expert_cache = ops.zeros_like(x)
-++-         idxs = flat_expert_indices.argsort()
-++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++-         token_idxs = idxs // self.num_experts_per_tok
-++-+
-++-         for i, end_idx in enumerate(tokens_per_expert):
-++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++-             if start_idx == end_idx:
-++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++-             expert_out = expert(expert_tokens)
-++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++-+
-++-         return expert_cache
-++-+        
-++-+    # @no_grad()
-++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-+    #     # expert_cache = torch.zeros_like(x)
-++-+    #     # idxs = flat_expert_indices.argsort()
-++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++-+    #     # token_idxs = idxs // self.num_experts_per_tok
-++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++-+    #     #     if start_idx == end_idx:
-++-+    #     #         continue
-++-+    #     #     expert = self.experts[i]
-++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++-+    #     #     expert_tokens = x[exp_token_idx]
-++-+    #     #     expert_out = expert(expert_tokens)
-++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++-+    #     # return expert_cache
-++-+    #     expert_cache = ops.zeros_like(x)
-++-+    #     idxs = flat_expert_indices.argsort()
-++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++-+
-++-+    #     for i, end_idx in enumerate(tokens_per_expert):
-++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++-+    #         if start_idx == end_idx:
-++-+    #             continue
-++-+    #         expert = self.experts[i]
-++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++-+    #         expert_tokens = x[exp_token_idx]
-++-+    #         expert_out = expert(expert_tokens)
-++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++-+
-++-+    #     return expert_cache
-++-+    # @no_grad()
-++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++-+    #     expert_cache = ops.zeros_like(x)
-++-+
-++-+    #     # 排序保证顺序一致
-++-+    #     idxs = flat_expert_indices.argsort()
-++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++-+
-++-+    #     # 找出有 token 的专家
-++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++-+
-++-+    #     for i in active_experts.tolist():
-++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++-+    #         end_idx = tokens_per_expert[i]
-++-+    #         if start_idx == end_idx:  # 没有 token
-++-+    #             continue
-++-+
-++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++-+    #         expert_tokens = x[exp_token_idx]
-++-+    #         expert_out = self.experts[i](expert_tokens)
-++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++-+
-++-+    #         expert_cache = mindspore.mint.scatter_add(
-++-+    #             expert_cache,
-++-+    #             0,
-++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++-+    #             expert_out
-++-+    #         )
-++-+
-++-+    #     return expert_cache
-++-+
-++-+
-++- 
-++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++- #     """
-++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++- 
-++-         # Initialize weights and apply final processing
-++-         self.post_init()
-++-+        self.warm_up = False
-++-+
-++-+    def warmup_moe_model_deep(self):
-++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++-+        test_texts = [
-++-+            "warmup short",
-++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++-+        ]
-++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++-+        if tokenizer is None:
-++-+            from mindnlp.transformers import AutoTokenizer
-++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++-+            self._warmup_tokenizer = tokenizer
-++-+
-++-+        for text in test_texts:
-++-+            inputs = tokenizer(text, return_tensors="ms")
-++-+            with mindspore._no_grad():
-++-+                _ = self(**inputs, use_cache=False)
-++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++- 
-++-     def get_input_embeddings(self):
-++-         return self.model.embed_tokens
-++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++-         ```"""
-++-+        if not self.warm_up:
-++-+            self.warm_up = True
-++-+            self.warmup_moe_model_deep()
-++-+
-++-         output_attentions = (
-++-             output_attentions
-++-             if output_attentions is not None
-++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++-index 3cbf820e..d4c6b651 100644
-++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++-@@ -18,7 +18,6 @@
-++- # See the License for the specific language governing permissions and
-++- # limitations under the License.
-++- """MindSpore Qwen2MoE model."""
-++--
-++- import math
-++- from typing import List, Optional, Tuple, Union
-++- 
-++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++-     TokenClassifierOutput,
-++- )
-++- from ...modeling_utils import PreTrainedModel
-++-+from ...generation import GenerationMixin
-++- from ....utils import logging
-++- from .configuration_qwen2_moe import Qwen2MoeConfig
-++- 
-++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++-         self.variance_epsilon = eps
-++- 
-++-     def forward(self, hidden_states):
-++-+        # @dwj
-++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++-+        # @lwx
-++-+        # if not self.training :
-++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++-         input_dtype = hidden_states.dtype
-++-         hidden_states = hidden_states.to(mindspore.float32)
-++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++-@@ -234,6 +239,8 @@ def rotate_half(x):
-++-     """Rotates half the hidden dims of the input."""
-++-     x1 = x[..., : x.shape[-1] // 2]
-++-     x2 = x[..., x.shape[-1] // 2 :]
-++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++-     return ops.cat((-x2, x1), dim=-1)
-++- 
-++- 
-++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++-         self.config = config
-++-         self.hidden_size = config.hidden_size
-++-         self.intermediate_size = intermediate_size
-++-+        
-++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++-         self.act_fn = ACT2FN[config.hidden_act]
-++- 
-++-     def forward(self, x):
-++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++--
-++- 
-++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++-+        # @lwx 
-++-+        # gate_up_output = self.gate_up_proj(x)
-++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++-+        # return self.down_proj(swiglu_output)
-++-+
-++-+    # def forward(self, x):
-++-+    #     gate_proj_out = self.gate_proj(x)
-++-+    #     up_proj_out = self.up_proj(x)
-++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++-+    #     return self.down_proj(swiglu_out)
-++-+        
-++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++-     """
-++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++-         use_cache: bool = False,
-++-         cache_position: Optional[mindspore.Tensor] = None,
-++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-+
-++-+        
-++-+
-++-         bsz, q_len, _ = hidden_states.shape
-++- 
-++-         query_states = self.q_proj(hidden_states)
-++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-                     "with a layer index."
-++-                 )
-++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-+            if isinstance(past_key_value, StaticCache):
-++-+                kv_seq_len = key_states.shape[-2]
-++-+            else:
-++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++- 
-++-         if past_key_value is not None:
-++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++-+            
-++-+            if isinstance(past_key_value, StaticCache):
-++-+                kv_seq_len = key_states.shape[-2]
-++- 
-++-         # repeat k/v heads if n_kv_heads < n_heads
-++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++--
-++-+        
-++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++- 
-++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++--            raise ValueError(
-++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++--                f" {attn_weights.shape}"
-++--            )
-++--
-++--        if attention_mask is not None:  # no matter the length, we just slice it
-++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++-+        if attention_mask is not None:
-++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++-             attn_weights = attn_weights + causal_mask
-++- 
-++-         # upcast attention to fp32
-++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++- 
-++-         attn_output = self.o_proj(attn_output)
-++--
-++-+        # @lwx
-++-+        
-++-+        # max_seq_len = self.max_position_embeddings  # 2048
-++-+
-++-+        # if attention_mask is not None:
-++-+        #     # attention_mask: [B, 1, Sq, Sk]
-++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++-+
-++-+        #     # pad 到 [max_seq_len, max_seq_len]
-++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++-+        #     global_attention_mask = padded_mask
-++-+        # else:
-++-+        #     global_attention_mask = None
-++-+
-++-+
-++-+        # sparse_mode=3
-++-+        # attn_output = mindspore.ops.flash_attention_score(
-++-+        #     query=query_states,
-++-+        #     key=key_states,
-++-+        #     value=value_states,
-++-+        #     real_shift=None,
-++-+        #     padding_mask=None,
-++-+
-++-+        #     head_num=self.num_heads,
-++-+        #     attn_mask=global_attention_mask,
-++-+        #     keep_prob=1.0 - self.attention_dropout,
-++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++-+        #     input_layout="BNSD",
-++-+        #     pre_tokens=2147483647,
-++-+        #     next_tokens=2147483647,
-++-+        #     inner_precise=0,
-++-+        #     drop_mask=None,
-++-+        #     prefix=None,
-++-+        #     actual_seq_qlen=None,
-++-+        #     actual_seq_kvlen=None,
-++-+        #     sparse_mode=sparse_mode,
-++-+        # )
-++-         if not output_attentions:
-++-             attn_weights = None
-++- 
-++-         return attn_output, attn_weights, past_key_value
-++- 
-++- 
-++-+class Qwen2MoeFlashAttention(nn.Module):
-++-+    """
-++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++-+
-++-+    关键改动:
-++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++-+       直接传入原始的 key 和 value 张量效率更高。
-++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++-+    """
-++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++-+        super().__init__()
-++-+        self.config = config
-++-+        self.layer_idx = layer_idx
-++-+        self.hidden_size = config.hidden_size
-++-+        self.num_heads = config.num_attention_heads
-++-+        self.head_dim = self.hidden_size // self.num_heads
-++-+        self.num_key_value_heads = config.num_key_value_heads
-++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++-+        self.max_position_embeddings = config.max_position_embeddings
-++-+        self.rope_theta = config.rope_theta
-++-+        self.attention_dropout = config.attention_dropout
-++-+
-++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-++-+            raise ValueError(
-++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++-+            )
-++-+
-++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++-+
-++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++-+            self.head_dim,
-++-+            max_position_embeddings=self.max_position_embeddings,
-++-+            base=self.rope_theta,
-++-+        )
-++-+
-++-+    def forward(
-++-+        self,
-++-+        hidden_states: mindspore.Tensor,
-++-+        attention_mask: Optional[mindspore.Tensor] = None,
-++-+        position_ids: Optional[mindspore.Tensor] = None,
-++-+        past_key_value: Optional[Cache] = None,
-++-+        output_attentions: bool = False,
-++-+        use_cache: bool = False,
-++-+        cache_position: Optional[mindspore.Tensor] = None,
-++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-+
-++-+        bsz, q_len, _ = hidden_states.shape
-++-+
-++-+        # 1. 线性投射 Q, K, V
-++-+        query_states = self.q_proj(hidden_states)
-++-+        key_states = self.k_proj(hidden_states)
-++-+        value_states = self.v_proj(hidden_states)
-++-+
-++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+
-++-+        # 3. RoPE 旋转位置编码
-++-+        kv_seq_len = key_states.shape[-2]
-++-+        if past_key_value is not None:
-++-+            if self.layer_idx is None:
-++-+                raise ValueError(
-++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-+                    "with a layer index."
-++-+                )
-++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++-+                if cache_position.shape[0] == 1:
-++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++-+                    kv_seq_len = past_seen_tokens + 1
-++-+                else:
-++-+                    # prefill 阶段：cache_position 是范围，使用其长度
-++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++-+            else:
-++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-+
-++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-+
-++-+        # 4. KV 缓存更新
-++-+        if past_key_value is not None:
-++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-+            key_states, value_states = past_key_value.update(
-++-+                key_states, value_states, self.layer_idx, cache_kwargs
-++-+            )
-++-+            
-++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++-+                if cache_position.shape[0] == 1:
-++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++-+                    kv_seq_len = key_states.shape[-2]
-++-+
-++-+        # 5. [重要] 准备 Attention Mask
-++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++-+        fa_attention_mask = None
-++-+        if attention_mask is not None:
-++-+            # 截取与当前key长度匹配的部分
-++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++-+            fa_attention_mask = (mask_slice != 0)
-++-+
-++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++-+        input_dtype = query_states.dtype
-++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++-+            query_states = query_states.to(mindspore.float16)
-++-+            key_states = key_states.to(mindspore.float16)
-++-+            value_states = value_states.to(mindspore.float16)
-++-+
-++-+        # 6. [核心] 调用 flash_attention_score 算子
-++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++-+        attn_output = mindspore.ops.flash_attention_score(
-++-+            query=query_states,
-++-+            key=key_states,
-++-+            value=value_states,
-++-+            head_num=self.num_heads, # 传入Q的头数(N1)
-++-+            attn_mask=fa_attention_mask,
-++-+            keep_prob=1.0 - self.attention_dropout,
-++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-++-+            input_layout="BNSD",
-++-+            sparse_mode=0 # 使用 defaultMask 模式
-++-+        )
-++-+
-++-+        # 恢复原始数据类型
-++-+        attn_output = attn_output.to(input_dtype)
-++-+
-++-+        # 7. 调整输出形状
-++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-+        attn_output = self.o_proj(attn_output)
-++-+
-++-+        # FlashAttention 算子不直接返回注意力权重矩阵
-++-+        attn_weights = None
-++-+        if output_attentions:
-++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++-+
-++-+        return attn_output, attn_weights, past_key_value
-++-+
-++-+    # def forward(
-++-+    #     self,
-++-+    #     hidden_states: mindspore.Tensor,
-++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++-+    #     past_key_value: Optional[Cache] = None,
-++-+    #     output_attentions: bool = False,
-++-+    #     use_cache: bool = False,
-++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-+
-++-+    #     bsz, q_len, _ = hidden_states.shape
-++-+
-++-+    #     # 1. 线性投射 Q, K, V
-++-+    #     query_states = self.q_proj(hidden_states)
-++-+    #     key_states = self.k_proj(hidden_states)
-++-+    #     value_states = self.v_proj(hidden_states)
-++-+
-++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+
-++-+    #     # 3. RoPE 旋转位置编码
-++-+    #     kv_seq_len = key_states.shape[-2]
-++-+    #     if past_key_value is not None:
-++-+    #         if self.layer_idx is None:
-++-+    #             raise ValueError(
-++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-+    #                 "with a layer index."
-++-+    #             )
-++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-+
-++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-+
-++-+    #     # 4. KV 缓存更新
-++-+    #     if past_key_value is not None:
-++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-+    #         key_states, value_states = past_key_value.update(
-++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++-+    #         )
-++-+
-++-+    #     # 5. 准备 Attention Mask
-++-+    #     fa_attention_mask = None
-++-+    #     if attention_mask is not None:
-++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-+    #         fa_attention_mask = (mask_slice != 0)
-++-+
-++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++-+    #     input_dtype = query_states.dtype
-++-+
-++-+    #     # 6. [核心] 调用 flash_attention_score 算子
-++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++-+    #         query=query_states,
-++-+    #         key=key_states,
-++-+    #         value=value_states,
-++-+    #         head_num=self.num_heads,
-++-+    #         attn_mask=fa_attention_mask,
-++-+    #         keep_prob=1.0 - self.attention_dropout,
-++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++-+    #         input_layout="BNSD",
-++-+    #         sparse_mode=0,
-++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++-+    #         inner_precise=1
-++-+    #     )
-++-+
-++-+    #     # 恢复原始数据类型
-++-+    #     attn_output = attn_output.to(input_dtype)
-++-+
-++-+    #     # 7. 调整输出形状
-++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-+    #     attn_output = self.o_proj(attn_output)
-++-+
-++-+    #     attn_weights = None
-++-+    #     if output_attentions:
-++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++-+
-++-+    #     return attn_output, attn_weights, past_key_value
-++-+
-++-+    # def forward(
-++-+    #     self,
-++-+    #     hidden_states: mindspore.Tensor,
-++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++-+    #     past_key_value: Optional[Cache] = None,
-++-+    #     output_attentions: bool = False,
-++-+    #     use_cache: bool = False,
-++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-+
-++-+    #     bsz, q_len, _ = hidden_states.shape
-++-+
-++-+    #     query_states = self.q_proj(hidden_states)
-++-+    #     key_states = self.k_proj(hidden_states)
-++-+    #     value_states = self.v_proj(hidden_states)
-++-+
-++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-+
-++-+    #     kv_seq_len = key_states.shape[-2]
-++-+    #     if past_key_value is not None:
-++-+    #         if self.layer_idx is None:
-++-+    #             raise ValueError("`layer_idx` must be specified for caching")
-++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-+
-++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-+
-++-+    #     if past_key_value is not None:
-++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++-+    #         key_states, value_states = past_key_value.update(
-++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++-+    #         )
-++-+
-++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++-+
-++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++-+    #     query_states = query_states / math.sqrt(self.head_dim)
-++-+    #     # <--- 修改结束 ---
-++-+
-++-+    #     fa_attention_mask = None
-++-+    #     if attention_mask is not None:
-++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++-+    #         fa_attention_mask = (mask_slice != 0)
-++-+
-++-+    #     input_dtype = query_states.dtype
-++-+
-++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++-+    #         query=query_states,    # 传入已经预先缩放过的 query
-++-+    #         key=key_states,
-++-+    #         value=value_states,
-++-+    #         head_num=self.num_heads,
-++-+    #         attn_mask=fa_attention_mask,
-++-+    #         keep_prob=1.0 - self.attention_dropout,
-++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++-+    #         input_layout="BNSD",
-++-+    #         sparse_mode=0,
-++-+    #         inner_precise=1        # 仍然保持内部高精度计算
-++-+    #     )
-++-+
-++-+    #     attn_output = attn_output.to(input_dtype)
-++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-+    #     attn_output = self.o_proj(attn_output)
-++-+
-++-+    #     attn_weights = None
-++-+    #     if output_attentions:
-++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++-+
-++-+    #     return attn_output, attn_weights, past_key_value
-++-+
-++- QWEN2MOE_ATTENTION_CLASSES = {
-++-     "eager": Qwen2MoeAttention,
-++-+    "flash-attention": Qwen2MoeFlashAttention,
-++- }
-++- 
-++- 
-++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++- 
-++-+    #@dwj
-++-+    # 只遍历激活的专家，而非全部专家
-++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++--        hidden_states = hidden_states.view(-1, hidden_dim)
-++--        # router_logits: (batch * sequence_length, n_experts)
-++--        router_logits = self.gate(hidden_states)
-++--
-++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++--        if self.norm_topk_prob:
-++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++--        # we cast back to the input dtype
-++--        routing_weights = routing_weights.to(hidden_states.dtype)
-++--
-++--        final_hidden_states = ops.zeros(
-++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++--        )
-++--
-++--        # One hot encode the selected experts to create an expert mask
-++--        # this will be used to easily index which expert is going to be sollicitated
-++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++--
-++--        # Loop over all available experts in the model and perform the computation on each expert
-++--        for expert_idx in range(self.num_experts):
-++--            expert_layer = self.experts[expert_idx]
-++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++--
-++--            # Index the correct hidden states and compute the expert hidden state for
-++--            # the current expert. We need to make sure to multiply the output hidden
-++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++--            if 0 not in idx.shape:
-++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++--
-++--                # However `index_add_` only support torch tensors for indexing so we'll use
-++--                # the `top_x` tensor here.
-++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++--
-++--        shared_expert_output = self.shared_expert(hidden_states)
-++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++--
-++--        final_hidden_states = final_hidden_states + shared_expert_output
-++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++-+            num_tokens = hidden_states_reshaped.shape[0]
-++-+
-++-+            router_logits = self.gate(hidden_states_reshaped)
-++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++-+
-++-+            if self.norm_topk_prob:
-++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++-+            routing_weights = routing_weights.to(hidden_states.dtype)
-++-+            
-++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++-+            flat_selected_experts = selected_experts.flatten()
-++-+            
-++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++-+            token_indices = broadcasted_token_indices.flatten()
-++-+            
-++-+            active_experts = ops.unique(flat_selected_experts)
-++-+            
-++-+            for expert_idx_tensor in active_experts:
-++-+                expert_idx = expert_idx_tensor.item()
-++-+                expert_layer = self.experts[expert_idx]
-++-+                
-++-+                mask = (flat_selected_experts == expert_idx_tensor)
-++-+                selected_token_indices = token_indices[mask]
-++-+                selected_routing_weights = routing_weights.flatten()[mask]
-++-+                
-++-+                current_states = hidden_states_reshaped[selected_token_indices]
-++-+                
-++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++-+                
-++-+                final_hidden_states = final_hidden_states.index_add(
-++-+                    dim=0,
-++-+                    index=selected_token_indices,
-++-+                    source=expert_output.to(hidden_states.dtype)
-++-+                )
-++-+            
-++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++- 
-++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++--        return final_hidden_states, router_logits
-++-+            final_hidden_states = final_hidden_states + shared_expert_output
-++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++-+            
-++-+            return final_hidden_states, router_logits
-++- 
-++- 
-++- class Qwen2MoeDecoderLayer(nn.Module):
-++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++- 
-++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++- 
-++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++-+
-++-         if (layer_idx not in config.mlp_only_layers) and (
-++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++-         ):
-++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++-     _skip_keys_device_placement = "past_key_values"
-++-     _supports_cache_class = True
-++-+#lwx
-++-+    # _supports_static_cache = True
-++- 
-++-     def _init_weights(self, module):
-++-         std = self.config.initializer_range
-++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++-         return causal_mask
-++- 
-++- 
-++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++-     _tied_weights_keys = ["lm_head.weight"]
-++- 
-++-     def __init__(self, config):
-++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++-         self.num_experts_per_tok = config.num_experts_per_tok
-++-         # Initialize weights and apply final processing
-++-         self.post_init()
-++-+        # @lwx
-++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++-+        #     self.generation_config.cache_implementation = "static"
-++-+        self._warmed_up = False
-++-+
-++-+    def warmup_moe_model(self):
-++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++-+        test_texts = [
-++-+            "warmup short",
-++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++-+        ]
-++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++-+        if tokenizer is None:
-++-+            from mindnlp.transformers import AutoTokenizer
-++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++-+            self._warmup_tokenizer = tokenizer
-++-+
-++-+        for text in test_texts:
-++-+            inputs = tokenizer(text, return_tensors="ms")
-++-+            with mindspore._no_grad():
-++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++- 
-++-     def get_input_embeddings(self):
-++-         return self.model.embed_tokens
-++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++-         ```"""
-++-+        if not self._warmed_up:
-++-+            self._warmed_up = True
-++-+            self.warmup_moe_model()
-++- 
-++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++-         output_router_logits = (
-++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++-             }
-++-         )
-++-         return model_inputs
-++-+# @lwx
-++-+    # def _decode_one_tokens_logits(
-++-+    #     self,
-++-+    #     cur_token: mindspore.Tensor,
-++-+    #     input_pos: Optional[mindspore.Tensor],
-++-+    #     cache_position: mindspore.Tensor,
-++-+    #     past_key_values: StaticCache,
-++-+    # ) -> mindspore.Tensor:
-++-+    #     """
-++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++-+        
-++-+    #     Args:
-++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++-+    #         input_pos: 输入位置信息，可选
-++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++-+            
-++-+    #     Returns:
-++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++-+    #     """
-++-+    #     # 调用JIT编译的版本
-++-+    #     return self.get_decode_one_tokens_logits(
-++-+    #         cur_token=cur_token,
-++-+    #         input_pos=input_pos,
-++-+    #         cache_position=cache_position,
-++-+    #         past_key_values=past_key_values,
-++-+    #     )
-++-+    
-++-+    # @mindspore.jit(jit_level='O1')
-++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++-+    #     """
-++-+    #     JIT编译的函数，用于高效的单token解码
-++-+    #     使用JIT编译优化以支持静态shape和高效执行
-++-+        
-++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++-+    #     """
-++-+    #     outputs = self.model.forward(
-++-+    #         input_ids=cur_token,
-++-+    #         position_ids=input_pos,
-++-+    #         cache_position=cache_position,
-++-+    #         past_key_values=past_key_values,
-++-+    #         use_cache=True,
-++-+    #         return_dict=False,
-++-+    #     )
-++-+        
-++-+    #     hidden_states = outputs[0]
-++-+    #     logits = self.lm_head.forward(hidden_states)
-++-+    #     logits = logits.float()
-++-+        
-++-+    #     return logits[:, -1, :]
-++-+
-++-+    # def _sample(
-++-+    #     self,
-++-+    #     input_ids: mindspore.Tensor,
-++-+    #     logits_processor,
-++-+    #     stopping_criteria,
-++-+    #     generation_config,
-++-+    #     synced_devices: bool,
-++-+    #     streamer=None,
-++-+    #     logits_warper=None,
-++-+    #     **model_kwargs,
-++-+    # ):
-++-+    #     """
-++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++-+    #     """
-++-+    #     from ...generation.logits_process import LogitsProcessorList
-++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++-+    #     from mindnlp.core import nn, ops, no_grad
-++-+    #     import numpy as np
-++-+        
-++-+    #     # 检查是否使用 StaticCache
-++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++-+    #     # 否则，直接调用父类方法
-++-+    #     past_key_values = model_kwargs.get("past_key_values")
-++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++-+
-++-+    #     if not isinstance(past_key_values, StaticCache):
-++-+    #         # 不使用 StaticCache，直接调用父类方法
-++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++-+    #         return super()._sample(
-++-+    #             input_ids=input_ids,
-++-+    #             logits_processor=logits_processor,
-++-+    #             stopping_criteria=stopping_criteria,
-++-+    #             generation_config=generation_config,
-++-+    #             synced_devices=synced_devices,
-++-+    #             streamer=streamer,
-++-+    #             logits_warper=logits_warper,
-++-+    #             **model_kwargs,
-++-+    #         )
-++-+        
-++-+    #     # 使用 StaticCache，进入自定义循环
-++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++-+    #     pad_token_id = generation_config._pad_token_tensor
-++-+    #     output_attentions = generation_config.output_attentions
-++-+    #     output_hidden_states = generation_config.output_hidden_states
-++-+    #     output_scores = generation_config.output_scores
-++-+    #     output_logits = generation_config.output_logits
-++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++-+    #     max_length = generation_config.max_length
-++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++-+    #     do_sample = generation_config.do_sample
-++-+        
-++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++-+    #         raise ValueError(
-++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++-+    #             f"{logits_warper})."
-++-+    #         )
-++-+        
-++-+    #     # init attention / hidden states / scores tuples
-++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++-+        
-++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++-+    #         encoder_hidden_states = (
-++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++-+    #         )
-++-+        
-++-+    #     # keep track of which sequences are already finished
-++-+    #     batch_size, cur_len = input_ids.shape
-++-+    #     this_peer_finished = False
-++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++-+        
-++-+    #     time_record = []
-++-+    #     from ....utils.testing_utils import parse_flag_from_env
-++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++-+        
-++-+    #     while self._has_unfinished_sequences(
-++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++-+    #     ):
-++-+    #         if _record_time:
-++-+    #             import time as time_module
-++-+    #             infer_start = time_module.time()
-++-+            
-++-+    #         # prepare model inputs
-++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++-+            
-++-+    #         # prepare variable output controls
-++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++-+            
-++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++-+    #         cur_cache_position = model_inputs.get("cache_position")
-++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-++-+    #         cur_input_ids = model_inputs.get("input_ids")
-++-+            
-++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++-+    #             cur_cache_position is not None and 
-++-+    #             len(cur_cache_position.shape) > 0 and
-++-+    #             cur_cache_position.shape[0] == 1 and
-++-+    #             cur_input_ids is not None and
-++-+    #             cur_input_ids.shape[1] == 1):
-++-+    #             # 使用 JIT 优化的单 token 解码
-++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++-+    #             if not hasattr(self, '_jit_used'):
-++-+    #                 self._jit_used = False
-++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++-+                
-++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-++-+    #                 cur_token=cur_input_ids,
-++-+    #                 input_pos=model_inputs.get("position_ids"),
-++-+    #                 cache_position=cur_cache_position,
-++-+    #                 past_key_values=cur_past_key_values,
-++-+    #             )
-++-+                
-++-+    #             # 标记已使用JIT（用于后续判断）
-++-+    #             if not self._jit_used:
-++-+    #                 self._jit_used = True
-++-+                
-++-+    #             # 构造兼容的输出对象
-++-+    #             class JitOptimizedOutput:
-++-+    #                 def __init__(self, logits, config):
-++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++-+    #                     self.config = config
-++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-++-+    #                     self.cross_attentions = None
-++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++-+                
-++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++-+    #         else:
-++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++-+    #             outputs = self(**model_inputs, return_dict=True)
-++-+            
-++-+    #         if synced_devices and this_peer_finished:
-++-+    #             continue
-++-+            
-++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++-+    #         next_token_logits = outputs.logits[:, -1, :]
-++-+            
-++-+    #         # pre-process distribution
-++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++-+    #         if do_sample:
-++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++-+            
-++-+    #         # Store scores, attentions and hidden_states when required
-++-+    #         if return_dict_in_generate:
-++-+    #             if output_scores:
-++-+    #                 scores += (next_token_scores,)
-++-+    #             if output_logits:
-++-+    #                 raw_logits += (next_token_logits,)
-++-+    #             if output_attentions:
-++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++-+    #                 if self.config.is_encoder_decoder:
-++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++-+                
-++-+    #             if output_hidden_states:
-++-+    #                 hidden = (
-++-+    #                     outputs.decoder_hidden_states
-++-+    #                     if self.config.is_encoder_decoder
-++-+    #                     else outputs.hidden_states
-++-+    #                 )
-++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++-+            
-++-+    #         # token selection
-++-+    #         if do_sample:
-++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++-+    #         else:
-++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++-+            
-++-+    #         # finished sentences should have their next token be a padding token
-++-+    #         if has_eos_stopping_criteria:
-++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++-+            
-++-+    #         # update generated ids, model inputs, and length for next step
-++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++-+    #         if streamer is not None:
-++-+    #             streamer.put(next_tokens)
-++-+            
-++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-++-+    #             outputs,
-++-+    #             model_kwargs,
-++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++-+    #         )
-++-+            
-++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++-+    #         cur_len += 1
-++-+            
-++-+    #         if _record_time:
-++-+    #             import time as time_module
-++-+    #             infer_stop = time_module.time()
-++-+    #             time_record.append(infer_stop - infer_start)
-++-+            
-++-+    #         del outputs
-++-+        
-++-+    #     average_infer_time = None
-++-+    #     if time_record:
-++-+    #         if len(time_record) > 1:
-++-+    #             time_record.pop(0)
-++-+    #         average_infer_time = sum(time_record) / len(time_record)
-++-+    #         print(f'average inference time is: {average_infer_time}')
-++-+    #         print(f'inference time record: {time_record}')
-++-+        
-++-+    #     if streamer is not None:
-++-+    #         streamer.end()
-++-+        
-++-+    #     # 简单判断：打印是否使用了JIT路径
-++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-++-+    #     else:
-++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++-+        
-++-+    #     if return_dict_in_generate:
-++-+    #         if self.config.is_encoder_decoder:
-++-+    #             return GenerateEncoderDecoderOutput(
-++-+    #                 sequences=input_ids,
-++-+    #                 scores=scores,
-++-+    #                 logits=raw_logits,
-++-+    #                 encoder_attentions=encoder_attentions,
-++-+    #                 encoder_hidden_states=encoder_hidden_states,
-++-+    #                 decoder_attentions=decoder_attentions,
-++-+    #                 cross_attentions=cross_attentions,
-++-+    #                 decoder_hidden_states=decoder_hidden_states,
-++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++-+    #                 average_infer_time=average_infer_time
-++-+    #             )
-++-+    #         else:
-++-+    #             return GenerateDecoderOnlyOutput(
-++-+    #                 sequences=input_ids,
-++-+    #                 scores=scores,
-++-+    #                 logits=raw_logits,
-++-+    #                 attentions=decoder_attentions,
-++-+    #                 hidden_states=decoder_hidden_states,
-++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++-+    #                 average_infer_time=average_infer_time
-++-+    #             )
-++-+    #     else:
-++-+    #         return input_ids
-++-+            
-++-+    # def _prepare_cache_for_generation(
-++-+    #     self,
-++-+    #     generation_config,
-++-+    #     model_kwargs,
-++-+    #     assistant_model,
-++-+    #     batch_size,
-++-+    #     max_cache_length,
-++-+    # ):
-++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++-+    #         generation_config.cache_implementation = "static"
-++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++-+        
-++-+    #     if generation_config.cache_implementation == "static":
-++-+    #         base_required_from_max_length = generation_config.max_length + 1
-++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-++-+    #         min_cache_size = 50
-++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++-+    #         else:
-++-+    #             max_cache_length = max(base_required, min_cache_size)
-++-+            
-++-+    #         original_max_cache_length = max_cache_length
-++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-++-+            
-++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++-+    #             if max_cache_length > self.config.max_position_embeddings:
-++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++-+        
-++-+    #     result = super()._prepare_cache_for_generation(
-++-+    #         generation_config=generation_config,
-++-+    #         model_kwargs=model_kwargs,
-++-+    #         assistant_model=assistant_model,
-++-+    #         batch_size=batch_size,
-++-+    #         max_cache_length=max_cache_length,
-++-+    #     )
-++-+        
-++-+    #     if generation_config.cache_implementation == "static":
-++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++-+    #         created_cache = model_kwargs.get(cache_name)
-++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++-+    #             if created_cache.max_cache_len < generation_config.max_length:
-++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++-+        
-++-+    #     return result
-++-+
-++-+
-++-+
-++- 
-++- 
-++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++--- 
-++-2.27.0
-++-
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
deleted file mode 100644
index 46db89f2..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0006-20251107002commit.patch"
+++ /dev/null
@@ -1,7931 +0,0 @@
-From 2c9ca98c339c674179652ab1635dab69b46d9012 Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Fri, 7 Nov 2025 12:06:32 +0800
-Subject: [PATCH 06/10] 20251107002commit
-
----
- .../models/deepseek/modeling_deepseek.py      |  122 +-
- patches/0001-20251104commit.patch             |    2 +-
- patches/0002-20251106commit.patch             |    2 +-
- patches/0003-20261106secondcommit.patch       |    2 +-
- patches/0004-20251106change.patch             |    2 +-
- patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
- 6 files changed, 7773 insertions(+), 64 deletions(-)
- create mode 100644 patches/0005-20251107001commit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 8831e4b7..e7e1c053 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
-     #         expert_out = expert(x)
-     #         expert_cache += expert_out * weight
-     #     return expert_cache
--
--    # @no_grad()
--    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--    #     # x 的 shape: (1, hidden_size)
--    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--
--    #     # 1. 收集所有需要的专家层
--    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--
--    #     # 2. 并行计算所有专家的输出
--    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--
--    #     # 3. 使用矩阵乘法进行加权求和
--    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+    
-+    @no_grad()
-+    dwj
-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+        # x 的 shape: (1, hidden_size)
-+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+
-+        # 1. 收集所有需要的专家层
-+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+
-+        # 2. 并行计算所有专家的输出
-+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+        # ops.cat 会将它们堆叠成一个新的 Tensor
-+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+
-+        # 3. 使用矩阵乘法进行加权求和
-+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+        # 最终结果 final_output 的 shape: (1, hidden_size)
-+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-         
--    #     return final_output
-+        return final_output
- 
- 
-     # @no_grad()
-@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
- 
-         return expert_cache
- # 放置在 DeepseekMoE 类中
--    @no_grad()
--    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--        """
--        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--
--        Args:
--            x (Tensor): 输入张量, shape: (1, hidden_size)
--            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--        """
--        top_k, _ = flat_expert_weights.shape
--        hidden_size = x.shape[-1]
--
--        # 1. 将所有专家的权重堆叠起来
--        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+    # @no_grad()
-+    # #lwx 20251107
-+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+    #     """
-+    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+
-+    #     Args:
-+    #         x (Tensor): 输入张量, shape: (1, hidden_size)
-+    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+    #     """
-+    #     top_k, _ = flat_expert_weights.shape
-+    #     hidden_size = x.shape[-1]
-+
-+    #     # 1. 将所有专家的权重堆叠起来
-+    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-         
--        # 2. "收集" 所需的专家权重
--        selected_gate_w = stacked_gate_w[flat_expert_indices]
--        selected_up_w = stacked_up_w[flat_expert_indices]
--        selected_down_w = stacked_down_w[flat_expert_indices]
-+    #     # 2. "收集" 所需的专家权重
-+    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
-+    #     selected_up_w = stacked_up_w[flat_expert_indices]
-+    #     selected_down_w = stacked_down_w[flat_expert_indices]
-         
--        # 3. 准备输入
--        x_expanded = x.expand((top_k, 1, hidden_size))
-+    #     # 3. 准备输入
-+    #     x_expanded = x.expand((top_k, 1, hidden_size))
-         
--        # 4. 并行计算 gate_proj 和 up_proj
--        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+    #     # 4. 并行计算 gate_proj 和 up_proj
-+    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
- 
--        # 5. 计算中间状态
--        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+    #     # 5. 计算中间状态
-+    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-         
--        # 6. 并行计算 down_proj
--        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--        #    --- [FIX] ---
--        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--        #    --- [FIX END] ---
-+    #     # 6. 并行计算 down_proj
-+    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+    #     #    --- [FIX] ---
-+    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+    #     #    --- [FIX END] ---
-         
--        # 7. 根据路由权重进行加权求和
--        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+    #     # 7. 根据路由权重进行加权求和
-+    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
- 
--        return weighted_sum
-+    #     return weighted_sum
- 
- 
- 
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-index 0a0ef2d7..2842180e 100644
---- a/patches/0001-20251104commit.patch
-+++ b/patches/0001-20251104commit.patch
-@@ -1,7 +1,7 @@
- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/4] 20251104commit
-+Subject: [PATCH 1/5] 20251104commit
- 
- ---
-  mindnlp/transformers/cache_utils.py           |  28 +-
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-index 5185270c..c6cd8757 100644
---- a/patches/0002-20251106commit.patch
-+++ b/patches/0002-20251106commit.patch
-@@ -1,7 +1,7 @@
- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/4] 20251106commit
-+Subject: [PATCH 2/5] 20251106commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-index 3e05f821..601960c9 100644
---- a/patches/0003-20261106secondcommit.patch
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -1,7 +1,7 @@
- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/4] 20261106secondcommit
-+Subject: [PATCH 3/5] 20261106secondcommit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-index 88a1aef4..8976f10b 100644
---- a/patches/0004-20251106change.patch
-+++ b/patches/0004-20251106change.patch
-@@ -1,7 +1,7 @@
- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 15:48:09 +0800
--Subject: [PATCH 4/4] 20251106change
-+Subject: [PATCH 4/5] 20251106change
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  189 +-
-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-new file mode 100644
-index 00000000..8d9032be
---- /dev/null
-+++ b/patches/0005-20251107001commit.patch
-@@ -0,0 +1,7707 @@
-+From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Fri, 7 Nov 2025 11:48:18 +0800
-+Subject: [PATCH 5/5] 20251107001commit
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |   91 +-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
-+ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
-+ patches/0001-20251104commit.patch             |    2 +-
-+ patches/0002-20251106commit.patch             |    2 +-
-+ patches/0003-20261106secondcommit.patch       |    2 +-
-+ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
-+ 7 files changed, 7577 insertions(+), 30 deletions(-)
-+ create mode 100644 patches/0004-20251106change.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index 0546f318..8831e4b7 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
-+     #         expert_cache += expert_out * weight
-+     #     return expert_cache
-+ 
-+-    @no_grad()
-+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+-        # x 的 shape: (1, hidden_size)
-+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+-
-+-        # 1. 收集所有需要的专家层
-+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+-        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+-
-+-        # 2. 并行计算所有专家的输出
-+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+-        # ops.cat 会将它们堆叠成一个新的 Tensor
-+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+-
-+-        # 3. 使用矩阵乘法进行加权求和
-+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+-        # 最终结果 final_output 的 shape: (1, hidden_size)
-+-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++    # @no_grad()
-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++    #     # x 的 shape: (1, hidden_size)
-++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++
-++    #     # 1. 收集所有需要的专家层
-++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-++
-++    #     # 2. 并行计算所有专家的输出
-++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++
-++    #     # 3. 使用矩阵乘法进行加权求和
-++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+         
-+-        return final_output
-++    #     return final_output
-+ 
-+ 
-+     # @no_grad()
-+@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
-+             )
-+ 
-+         return expert_cache
-++# 放置在 DeepseekMoE 类中
-++    @no_grad()
-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++        """
-++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-++
-++        Args:
-++            x (Tensor): 输入张量, shape: (1, hidden_size)
-++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-++        """
-++        top_k, _ = flat_expert_weights.shape
-++        hidden_size = x.shape[-1]
-++
-++        # 1. 将所有专家的权重堆叠起来
-++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-++        
-++        # 2. "收集" 所需的专家权重
-++        selected_gate_w = stacked_gate_w[flat_expert_indices]
-++        selected_up_w = stacked_up_w[flat_expert_indices]
-++        selected_down_w = stacked_down_w[flat_expert_indices]
-++        
-++        # 3. 准备输入
-++        x_expanded = x.expand((top_k, 1, hidden_size))
-++        
-++        # 4. 并行计算 gate_proj 和 up_proj
-++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-++
-++        # 5. 计算中间状态
-++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-++        
-++        # 6. 并行计算 down_proj
-++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-++        #    --- [FIX] ---
-++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-++        #    --- [FIX END] ---
-++        
-++        # 7. 根据路由权重进行加权求和
-++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-++
-++        return weighted_sum
-++
-++
-+ 
-+         # @no_grad()
-+         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index ebd7782e..913a7609 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+-    x1 = x[..., : x.shape[-1] // 2]
-+-    x2 = x[..., x.shape[-1] // 2 :]
-++    # x1 = x[..., : x.shape[-1] // 2]
-++    # x2 = x[..., x.shape[-1] // 2 :]
-+     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+index d059dcbe..2b217b64 100644
-+--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
-+ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+ def rotate_half(x):
-+     """Rotates half the hidden dims of the input."""
-+-    x1 = x[..., : x.shape[-1] // 2]
-+-    x2 = x[..., x.shape[-1] // 2 :]
-++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++    # x1 = x[..., : x.shape[-1] // 2]
-++    # x2 = x[..., x.shape[-1] // 2 :]
-++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+     return ops.cat((-x2, x1), dim=-1)
-+ 
-+ 
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+index 78f22642..0a0ef2d7 100644
-+--- a/patches/0001-20251104commit.patch
-++++ b/patches/0001-20251104commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+-Subject: [PATCH 1/3] 20251104commit
-++Subject: [PATCH 1/4] 20251104commit
-+ 
-+ ---
-+  mindnlp/transformers/cache_utils.py           |  28 +-
-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+index 22b65dd5..5185270c 100644
-+--- a/patches/0002-20251106commit.patch
-++++ b/patches/0002-20251106commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+-Subject: [PATCH 2/3] 20251106commit
-++Subject: [PATCH 2/4] 20251106commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+index 966529e4..3e05f821 100644
-+--- a/patches/0003-20261106secondcommit.patch
-++++ b/patches/0003-20261106secondcommit.patch
-+@@ -1,7 +1,7 @@
-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+-Subject: [PATCH 3/3] 20261106secondcommit
-++Subject: [PATCH 3/4] 20261106secondcommit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+new file mode 100644
-+index 00000000..88a1aef4
-+--- /dev/null
-++++ b/patches/0004-20251106change.patch
-+@@ -0,0 +1,7498 @@
-++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Thu, 6 Nov 2025 15:48:09 +0800
-++Subject: [PATCH 4/4] 20251106change
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |  189 +-
-++ patches/0001-20251104commit.patch             | 1272 +++++++
-++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-++ 4 files changed, 7244 insertions(+), 186 deletions(-)
-++ create mode 100644 patches/0001-20251104commit.patch
-++ create mode 100644 patches/0002-20251106commit.patch
-++ create mode 100644 patches/0003-20261106secondcommit.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index 2f9192bf..0546f318 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-++ 
-++         return attn_output, attn_weights, past_key_value
-++ 
-++-# class DeepseekFlashAttention(nn.Module):
-++-#     """
-++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-++-
-++-#     This class is designed as a drop-in replacement for DeepseekAttention.
-++-#     """
-++-
-++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++-#         super().__init__()
-++-#         self.config = config
-++-#         self.layer_idx = layer_idx
-++-#         if layer_idx is None:
-++-#             logger.warning(
-++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++-#                 "when creating this class."
-++-#             )
-++-
-++-#         self.attention_dropout = config.attention_dropout
-++-#         self.hidden_size = config.hidden_size
-++-#         self.num_heads = config.num_attention_heads
-++-#         self.head_dim = self.hidden_size // self.num_heads
-++-#         self.num_key_value_heads = config.num_key_value_heads
-++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++-#         self.max_position_embeddings = config.max_position_embeddings
-++-#         self.rope_theta = config.rope_theta
-++-#         self.is_causal = True
-++-
-++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++-#             raise ValueError(
-++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++-#                 f" and `num_heads`: {self.num_heads})."
-++-#             )
-++-
-++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++-#         self._init_rope()
-++-
-++-#     def _init_rope(self):
-++-#         if self.config.rope_scaling is None:
-++-#             self.rotary_emb = DeepseekRotaryEmbedding(
-++-#                 self.head_dim,
-++-#                 max_position_embeddings=self.max_position_embeddings,
-++-#                 base=self.rope_theta,
-++-#             )
-++-#         else:
-++-#             scaling_type = self.config.rope_scaling["type"]
-++-#             scaling_factor = self.config.rope_scaling["factor"]
-++-#             if scaling_type == "linear":
-++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++-#                     self.head_dim,
-++-#                     max_position_embeddings=self.max_position_embeddings,
-++-#                     scaling_factor=scaling_factor,
-++-#                     base=self.rope_theta,
-++-#                 )
-++-#             elif scaling_type == "dynamic":
-++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++-#                     self.head_dim,
-++-#                     max_position_embeddings=self.max_position_embeddings,
-++-#                     scaling_factor=scaling_factor,
-++-#                     base=self.rope_theta,
-++-#                 )
-++-#             else:
-++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++-
-++-#     def forward(
-++-#         self,
-++-#         hidden_states: mindspore.Tensor,
-++-#         attention_mask: Optional[mindspore.Tensor] = None,
-++-#         position_ids: Optional[mindspore.Tensor] = None,
-++-#         past_key_value: Optional[Cache] = None,
-++-#         output_attentions: bool = False,
-++-#         use_cache: bool = False,
-++-#         **kwargs,
-++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++-#         if "padding_mask" in kwargs:
-++-#             warnings.warn(
-++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++-#             )
-++-        
-++-#         if output_attentions:
-++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-++-
-++-#         bsz, q_len, _ = hidden_states.shape
-++-
-++-#         if self.config.pretraining_tp > 1:
-++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++-
-++-#         query_states = self.q_proj(hidden_states)
-++-#         key_states = self.k_proj(hidden_states)
-++-#         value_states = self.v_proj(hidden_states)
-++-
-++-#         # Reshape for multi-head attention
-++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++-
-++-#         kv_seq_len = key_states.shape[-2]
-++-#         if past_key_value is not None:
-++-#             if self.layer_idx is None:
-++-#                 raise ValueError(
-++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++-#                     "with a layer index."
-++-#                 )
-++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++-        
-++-#         # Apply Rotary Positional Embedding
-++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++-
-++-#         if past_key_value is not None:
-++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++-
-++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++-        
-++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++-        
-++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++-
-++-#         # Convert attention_mask for flash_attention_score
-++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-++-#         if attention_mask is not None:
-++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++-#                 raise ValueError(
-++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++-#                 )
-++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-++-#         else:
-++-#             attn_mask_for_fa = None
-++-        
-++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++-
-++-#         # Call the fused flash_attention_score operator
-++-#         attn_output = mindspore.ops.flash_attention_score(
-++-#             query=query_states_for_fa,
-++-#             key=key_states_for_fa,
-++-#             value=value_states_for_fa,
-++-#             head_num=self.num_heads, # This is N1, the number of query heads
-++-#             input_layout='BSH',
-++-#             attn_mask=attn_mask_for_fa,
-++-#             keep_prob=keep_prob,
-++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++-#             sparse_mode=0 # Default mask mode
-++-#         )
-++-        
-++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-++-#         attn_output = self.o_proj(attn_output)
-++-        
-++-#         # Flash Attention does not return attention weights
-++-#         attn_weights = None
-++-
-++-#         return attn_output, attn_weights, past_key_value
-++ 
-++ class DeepseekFlashAttention(nn.Module):
-++     """
-++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-++         super().__init__()
-++         self.hidden_size = config.hidden_size
-++ 
-++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-++-            config=config, layer_idx=layer_idx
-++-        )
-+++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+++            # config=config, layer_idx=layer_idx
-+++        # )
-++ 
-++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-++             config=config, layer_idx=layer_idx
-++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-++         return outputs
-++ 
-++ 
-++-
-++ class DeepseekPreTrainedModel(PreTrainedModel):
-++     config_class = DeepseekConfig
-++     base_model_prefix = "model"
-++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++         # Initialize weights and apply final processing
-++         self.post_init()
-++         self.warm_up = False
-++-        #@dwj
-++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-++-            self.num_layers,
-++-            self.num_attention_heads,
-++-            self.head_dim,
-++-            batch_size=1,
-++-            max_length=self.max_length,
-++-            dtype=mindspore.float16
-++-        )
-++-
-++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-++-        key_cache = []
-++-        value_cache = []
-++-        for _ in range(num_layers):
-++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++-            key_cache.append(k)
-++-            value_cache.append(v)
-++-        return key_cache, value_cache
-++-
-++ 
-++     def warmup_moe_model_deep(self):
-++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++new file mode 100644
-++index 00000000..78f22642
-++--- /dev/null
-+++++ b/patches/0001-20251104commit.patch
-++@@ -0,0 +1,1272 @@
-+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++Subject: [PATCH 1/3] 20251104commit
-+++
-+++---
-+++ mindnlp/transformers/cache_utils.py           |  28 +-
-+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++ 3 files changed, 976 insertions(+), 87 deletions(-)
-+++
-+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++index cadd2e04..02f8d4be 100644
-+++--- a/mindnlp/transformers/cache_utils.py
-++++++ b/mindnlp/transformers/cache_utils.py
-+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++             # k_out[:, :, cache_position] = key_states
-+++             # v_out[:, :, cache_position] = value_states
-+++-            if ON_ORANGE_PI:
-+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++-            else:
-+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++-
-++++            # if ON_ORANGE_PI:
-++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++            # else:
-++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++            # 确保 cache_position 是 1D tensor 并且类型正确
-++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++            if cache_position.ndim > 1:
-++++                cache_position = cache_position.flatten()
-++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++                cache_position = cache_position.int()
-++++            
-++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++            k_out[:, :, cache_position] = key_states
-++++            v_out[:, :, cache_position] = value_states
-++++            
-+++         return k_out, v_out
-+++ 
-+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index c695b944..d8303e45 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++-    x1 = x[..., : x.shape[-1] // 2]
-+++-    x2 = x[..., x.shape[-1] // 2 :]
-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++    # x1 = x[..., : x.shape[-1] // 2]
-++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++         if self.training:
-+++             raise NotImplementedError("Training is not supported yet.")
-+++         else:
-+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++-        if self.config.n_shared_experts is not None:
-+++-            y = y + self.shared_experts(identity)
-+++-        return y
-++++            # @lwx
-++++            if orig_shape[1] == 1:
-++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++                y=y.view(*orig_shape)
-++++                if self.config.n_shared_experts is not None:
-++++                    y = y + self.shared_experts(identity)
-++++                return y
-++++            else:
-++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++                if self.config.n_shared_experts is not None:
-++++                    y = y + self.shared_experts(identity)
-++++                return y
-++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++        # if self.config.n_shared_experts is not None:
-++++        #     y = y + self.shared_experts(identity)
-++++        # return y
-++++        
-++++    @no_grad()
-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++
-++++        expert_cache = ops.zeros_like(x)
-++++        for i in range(self.num_experts_per_tok):
-++++            expert_id = flat_expert_indices[i].item()
-++++            weight = flat_expert_weights[i].item()
-++++            expert = self.experts[expert_id]
-++++            expert_out = expert(x)
-++++            expert_cache += expert_out * weight
-++++        return expert_cache
-+++ 
-+++     @no_grad()
-+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-        # expert_cache = torch.zeros_like(x)
-+++-        # idxs = flat_expert_indices.argsort()
-+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++-        # token_idxs = idxs // self.num_experts_per_tok
-+++-        # for i, end_idx in enumerate(tokens_per_expert):
-+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++-        #     if start_idx == end_idx:
-+++-        #         continue
-+++-        #     expert = self.experts[i]
-+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++-        #     expert_tokens = x[exp_token_idx]
-+++-        #     expert_out = expert(expert_tokens)
-+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++-        # return expert_cache
-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++         expert_cache = ops.zeros_like(x)
-+++         idxs = flat_expert_indices.argsort()
-+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++         token_idxs = idxs // self.num_experts_per_tok
-++++
-+++         for i, end_idx in enumerate(tokens_per_expert):
-+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++             if start_idx == end_idx:
-+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++             expert_out = expert(expert_tokens)
-+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++
-+++         return expert_cache
-++++        
-++++    # @no_grad()
-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     # expert_cache = torch.zeros_like(x)
-++++    #     # idxs = flat_expert_indices.argsort()
-++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++    #     # token_idxs = idxs // self.num_experts_per_tok
-++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++    #     #     if start_idx == end_idx:
-++++    #     #         continue
-++++    #     #     expert = self.experts[i]
-++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #     #     expert_tokens = x[exp_token_idx]
-++++    #     #     expert_out = expert(expert_tokens)
-++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++    #     # return expert_cache
-++++    #     expert_cache = ops.zeros_like(x)
-++++    #     idxs = flat_expert_indices.argsort()
-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++    #     for i, end_idx in enumerate(tokens_per_expert):
-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++    #         if start_idx == end_idx:
-++++    #             continue
-++++    #         expert = self.experts[i]
-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #         expert_tokens = x[exp_token_idx]
-++++    #         expert_out = expert(expert_tokens)
-++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++
-++++    #     return expert_cache
-++++    # @no_grad()
-++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     expert_cache = ops.zeros_like(x)
-++++
-++++    #     # 排序保证顺序一致
-++++    #     idxs = flat_expert_indices.argsort()
-++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++    #     # 找出有 token 的专家
-++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++
-++++    #     for i in active_experts.tolist():
-++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++    #         end_idx = tokens_per_expert[i]
-++++    #         if start_idx == end_idx:  # 没有 token
-++++    #             continue
-++++
-++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++    #         expert_tokens = x[exp_token_idx]
-++++    #         expert_out = self.experts[i](expert_tokens)
-++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++
-++++    #         expert_cache = mindspore.mint.scatter_add(
-++++    #             expert_cache,
-++++    #             0,
-++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++    #             expert_out
-++++    #         )
-++++
-++++    #     return expert_cache
-++++
-++++
-+++ 
-+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++ #     """
-+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++ 
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-++++        self.warm_up = False
-++++
-++++    def warmup_moe_model_deep(self):
-++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++        test_texts = [
-++++            "warmup short",
-++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++        ]
-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++        if tokenizer is None:
-++++            from mindnlp.transformers import AutoTokenizer
-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++            self._warmup_tokenizer = tokenizer
-++++
-++++        for text in test_texts:
-++++            inputs = tokenizer(text, return_tensors="ms")
-++++            with mindspore._no_grad():
-++++                _ = self(**inputs, use_cache=False)
-++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++ 
-+++     def get_input_embeddings(self):
-+++         return self.model.embed_tokens
-+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++         ```"""
-++++        if not self.warm_up:
-++++            self.warm_up = True
-++++            self.warmup_moe_model_deep()
-++++
-+++         output_attentions = (
-+++             output_attentions
-+++             if output_attentions is not None
-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++index 3cbf820e..d4c6b651 100644
-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++@@ -18,7 +18,6 @@
-+++ # See the License for the specific language governing permissions and
-+++ # limitations under the License.
-+++ """MindSpore Qwen2MoE model."""
-+++-
-+++ import math
-+++ from typing import List, Optional, Tuple, Union
-+++ 
-+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++     TokenClassifierOutput,
-+++ )
-+++ from ...modeling_utils import PreTrainedModel
-++++from ...generation import GenerationMixin
-+++ from ....utils import logging
-+++ from .configuration_qwen2_moe import Qwen2MoeConfig
-+++ 
-+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++         self.variance_epsilon = eps
-+++ 
-+++     def forward(self, hidden_states):
-++++        # @dwj
-++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++        # @lwx
-++++        # if not self.training :
-++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++         input_dtype = hidden_states.dtype
-+++         hidden_states = hidden_states.to(mindspore.float32)
-+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++@@ -234,6 +239,8 @@ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++     x1 = x[..., : x.shape[-1] // 2]
-+++     x2 = x[..., x.shape[-1] // 2 :]
-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++         self.config = config
-+++         self.hidden_size = config.hidden_size
-+++         self.intermediate_size = intermediate_size
-++++        
-+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++         self.act_fn = ACT2FN[config.hidden_act]
-+++ 
-+++     def forward(self, x):
-+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++-
-+++ 
-++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++        # @lwx 
-++++        # gate_up_output = self.gate_up_proj(x)
-++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++        # return self.down_proj(swiglu_output)
-++++
-++++    # def forward(self, x):
-++++    #     gate_proj_out = self.gate_proj(x)
-++++    #     up_proj_out = self.up_proj(x)
-++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++    #     return self.down_proj(swiglu_out)
-++++        
-+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++     """
-+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++         use_cache: bool = False,
-+++         cache_position: Optional[mindspore.Tensor] = None,
-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++        
-++++
-+++         bsz, q_len, _ = hidden_states.shape
-+++ 
-+++         query_states = self.q_proj(hidden_states)
-+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++                     "with a layer index."
-+++                 )
-+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++            if isinstance(past_key_value, StaticCache):
-++++                kv_seq_len = key_states.shape[-2]
-++++            else:
-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++ 
-+++         if past_key_value is not None:
-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++            
-++++            if isinstance(past_key_value, StaticCache):
-++++                kv_seq_len = key_states.shape[-2]
-+++ 
-+++         # repeat k/v heads if n_kv_heads < n_heads
-+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++-
-++++        
-+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++ 
-+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++-            raise ValueError(
-+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++-                f" {attn_weights.shape}"
-+++-            )
-+++-
-+++-        if attention_mask is not None:  # no matter the length, we just slice it
-+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++        if attention_mask is not None:
-++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++             attn_weights = attn_weights + causal_mask
-+++ 
-+++         # upcast attention to fp32
-+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++ 
-+++         attn_output = self.o_proj(attn_output)
-+++-
-++++        # @lwx
-++++        
-++++        # max_seq_len = self.max_position_embeddings  # 2048
-++++
-++++        # if attention_mask is not None:
-++++        #     # attention_mask: [B, 1, Sq, Sk]
-++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++
-++++        #     # pad 到 [max_seq_len, max_seq_len]
-++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++        #     global_attention_mask = padded_mask
-++++        # else:
-++++        #     global_attention_mask = None
-++++
-++++
-++++        # sparse_mode=3
-++++        # attn_output = mindspore.ops.flash_attention_score(
-++++        #     query=query_states,
-++++        #     key=key_states,
-++++        #     value=value_states,
-++++        #     real_shift=None,
-++++        #     padding_mask=None,
-++++
-++++        #     head_num=self.num_heads,
-++++        #     attn_mask=global_attention_mask,
-++++        #     keep_prob=1.0 - self.attention_dropout,
-++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++        #     input_layout="BNSD",
-++++        #     pre_tokens=2147483647,
-++++        #     next_tokens=2147483647,
-++++        #     inner_precise=0,
-++++        #     drop_mask=None,
-++++        #     prefix=None,
-++++        #     actual_seq_qlen=None,
-++++        #     actual_seq_kvlen=None,
-++++        #     sparse_mode=sparse_mode,
-++++        # )
-+++         if not output_attentions:
-+++             attn_weights = None
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++ 
-++++class Qwen2MoeFlashAttention(nn.Module):
-++++    """
-++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++
-++++    关键改动:
-++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++       直接传入原始的 key 和 value 张量效率更高。
-++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++    """
-++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++        super().__init__()
-++++        self.config = config
-++++        self.layer_idx = layer_idx
-++++        self.hidden_size = config.hidden_size
-++++        self.num_heads = config.num_attention_heads
-++++        self.head_dim = self.hidden_size // self.num_heads
-++++        self.num_key_value_heads = config.num_key_value_heads
-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++        self.max_position_embeddings = config.max_position_embeddings
-++++        self.rope_theta = config.rope_theta
-++++        self.attention_dropout = config.attention_dropout
-++++
-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++            raise ValueError(
-++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++            )
-++++
-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++
-++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++            self.head_dim,
-++++            max_position_embeddings=self.max_position_embeddings,
-++++            base=self.rope_theta,
-++++        )
-++++
-++++    def forward(
-++++        self,
-++++        hidden_states: mindspore.Tensor,
-++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++        position_ids: Optional[mindspore.Tensor] = None,
-++++        past_key_value: Optional[Cache] = None,
-++++        output_attentions: bool = False,
-++++        use_cache: bool = False,
-++++        cache_position: Optional[mindspore.Tensor] = None,
-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++        bsz, q_len, _ = hidden_states.shape
-++++
-++++        # 1. 线性投射 Q, K, V
-++++        query_states = self.q_proj(hidden_states)
-++++        key_states = self.k_proj(hidden_states)
-++++        value_states = self.v_proj(hidden_states)
-++++
-++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++        # 3. RoPE 旋转位置编码
-++++        kv_seq_len = key_states.shape[-2]
-++++        if past_key_value is not None:
-++++            if self.layer_idx is None:
-++++                raise ValueError(
-++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++                    "with a layer index."
-++++                )
-++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++                if cache_position.shape[0] == 1:
-++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++                    kv_seq_len = past_seen_tokens + 1
-++++                else:
-++++                    # prefill 阶段：cache_position 是范围，使用其长度
-++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++            else:
-++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++        # 4. KV 缓存更新
-++++        if past_key_value is not None:
-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++            key_states, value_states = past_key_value.update(
-++++                key_states, value_states, self.layer_idx, cache_kwargs
-++++            )
-++++            
-++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++                if cache_position.shape[0] == 1:
-++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++                    kv_seq_len = key_states.shape[-2]
-++++
-++++        # 5. [重要] 准备 Attention Mask
-++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++        fa_attention_mask = None
-++++        if attention_mask is not None:
-++++            # 截取与当前key长度匹配的部分
-++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++            fa_attention_mask = (mask_slice != 0)
-++++
-++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++        input_dtype = query_states.dtype
-++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++            query_states = query_states.to(mindspore.float16)
-++++            key_states = key_states.to(mindspore.float16)
-++++            value_states = value_states.to(mindspore.float16)
-++++
-++++        # 6. [核心] 调用 flash_attention_score 算子
-++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++        attn_output = mindspore.ops.flash_attention_score(
-++++            query=query_states,
-++++            key=key_states,
-++++            value=value_states,
-++++            head_num=self.num_heads, # 传入Q的头数(N1)
-++++            attn_mask=fa_attention_mask,
-++++            keep_prob=1.0 - self.attention_dropout,
-++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++            input_layout="BNSD",
-++++            sparse_mode=0 # 使用 defaultMask 模式
-++++        )
-++++
-++++        # 恢复原始数据类型
-++++        attn_output = attn_output.to(input_dtype)
-++++
-++++        # 7. 调整输出形状
-++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++        attn_output = self.o_proj(attn_output)
-++++
-++++        # FlashAttention 算子不直接返回注意力权重矩阵
-++++        attn_weights = None
-++++        if output_attentions:
-++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++        return attn_output, attn_weights, past_key_value
-++++
-++++    # def forward(
-++++    #     self,
-++++    #     hidden_states: mindspore.Tensor,
-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++    #     past_key_value: Optional[Cache] = None,
-++++    #     output_attentions: bool = False,
-++++    #     use_cache: bool = False,
-++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++    #     bsz, q_len, _ = hidden_states.shape
-++++
-++++    #     # 1. 线性投射 Q, K, V
-++++    #     query_states = self.q_proj(hidden_states)
-++++    #     key_states = self.k_proj(hidden_states)
-++++    #     value_states = self.v_proj(hidden_states)
-++++
-++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++    #     # 3. RoPE 旋转位置编码
-++++    #     kv_seq_len = key_states.shape[-2]
-++++    #     if past_key_value is not None:
-++++    #         if self.layer_idx is None:
-++++    #             raise ValueError(
-++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++    #                 "with a layer index."
-++++    #             )
-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++    #     # 4. KV 缓存更新
-++++    #     if past_key_value is not None:
-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++    #         key_states, value_states = past_key_value.update(
-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++    #         )
-++++
-++++    #     # 5. 准备 Attention Mask
-++++    #     fa_attention_mask = None
-++++    #     if attention_mask is not None:
-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++    #         fa_attention_mask = (mask_slice != 0)
-++++
-++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++    #     input_dtype = query_states.dtype
-++++
-++++    #     # 6. [核心] 调用 flash_attention_score 算子
-++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++    #         query=query_states,
-++++    #         key=key_states,
-++++    #         value=value_states,
-++++    #         head_num=self.num_heads,
-++++    #         attn_mask=fa_attention_mask,
-++++    #         keep_prob=1.0 - self.attention_dropout,
-++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++    #         input_layout="BNSD",
-++++    #         sparse_mode=0,
-++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++    #         inner_precise=1
-++++    #     )
-++++
-++++    #     # 恢复原始数据类型
-++++    #     attn_output = attn_output.to(input_dtype)
-++++
-++++    #     # 7. 调整输出形状
-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++    #     attn_output = self.o_proj(attn_output)
-++++
-++++    #     attn_weights = None
-++++    #     if output_attentions:
-++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++    #     return attn_output, attn_weights, past_key_value
-++++
-++++    # def forward(
-++++    #     self,
-++++    #     hidden_states: mindspore.Tensor,
-++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++    #     past_key_value: Optional[Cache] = None,
-++++    #     output_attentions: bool = False,
-++++    #     use_cache: bool = False,
-++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++    #     bsz, q_len, _ = hidden_states.shape
-++++
-++++    #     query_states = self.q_proj(hidden_states)
-++++    #     key_states = self.k_proj(hidden_states)
-++++    #     value_states = self.v_proj(hidden_states)
-++++
-++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++    #     kv_seq_len = key_states.shape[-2]
-++++    #     if past_key_value is not None:
-++++    #         if self.layer_idx is None:
-++++    #             raise ValueError("`layer_idx` must be specified for caching")
-++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++    #     if past_key_value is not None:
-++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++    #         key_states, value_states = past_key_value.update(
-++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++    #         )
-++++
-++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++
-++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++    #     query_states = query_states / math.sqrt(self.head_dim)
-++++    #     # <--- 修改结束 ---
-++++
-++++    #     fa_attention_mask = None
-++++    #     if attention_mask is not None:
-++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++    #         fa_attention_mask = (mask_slice != 0)
-++++
-++++    #     input_dtype = query_states.dtype
-++++
-++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++    #         query=query_states,    # 传入已经预先缩放过的 query
-++++    #         key=key_states,
-++++    #         value=value_states,
-++++    #         head_num=self.num_heads,
-++++    #         attn_mask=fa_attention_mask,
-++++    #         keep_prob=1.0 - self.attention_dropout,
-++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++    #         input_layout="BNSD",
-++++    #         sparse_mode=0,
-++++    #         inner_precise=1        # 仍然保持内部高精度计算
-++++    #     )
-++++
-++++    #     attn_output = attn_output.to(input_dtype)
-++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++    #     attn_output = self.o_proj(attn_output)
-++++
-++++    #     attn_weights = None
-++++    #     if output_attentions:
-++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++
-++++    #     return attn_output, attn_weights, past_key_value
-++++
-+++ QWEN2MOE_ATTENTION_CLASSES = {
-+++     "eager": Qwen2MoeAttention,
-++++    "flash-attention": Qwen2MoeFlashAttention,
-+++ }
-+++ 
-+++ 
-+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++ 
-++++    #@dwj
-++++    # 只遍历激活的专家，而非全部专家
-+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        hidden_states = hidden_states.view(-1, hidden_dim)
-+++-        # router_logits: (batch * sequence_length, n_experts)
-+++-        router_logits = self.gate(hidden_states)
-+++-
-+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-        if self.norm_topk_prob:
-+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        # we cast back to the input dtype
-+++-        routing_weights = routing_weights.to(hidden_states.dtype)
-+++-
-+++-        final_hidden_states = ops.zeros(
-+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++-        )
-+++-
-+++-        # One hot encode the selected experts to create an expert mask
-+++-        # this will be used to easily index which expert is going to be sollicitated
-+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++-
-+++-        # Loop over all available experts in the model and perform the computation on each expert
-+++-        for expert_idx in range(self.num_experts):
-+++-            expert_layer = self.experts[expert_idx]
-+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++-
-+++-            # Index the correct hidden states and compute the expert hidden state for
-+++-            # the current expert. We need to make sure to multiply the output hidden
-+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++-            if 0 not in idx.shape:
-+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++-
-+++-                # However `index_add_` only support torch tensors for indexing so we'll use
-+++-                # the `top_x` tensor here.
-+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++-
-+++-        shared_expert_output = self.shared_expert(hidden_states)
-+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++-
-+++-        final_hidden_states = final_hidden_states + shared_expert_output
-++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++            num_tokens = hidden_states_reshaped.shape[0]
-++++
-++++            router_logits = self.gate(hidden_states_reshaped)
-++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++
-++++            if self.norm_topk_prob:
-++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++            routing_weights = routing_weights.to(hidden_states.dtype)
-++++            
-++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++            flat_selected_experts = selected_experts.flatten()
-++++            
-++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++            token_indices = broadcasted_token_indices.flatten()
-++++            
-++++            active_experts = ops.unique(flat_selected_experts)
-++++            
-++++            for expert_idx_tensor in active_experts:
-++++                expert_idx = expert_idx_tensor.item()
-++++                expert_layer = self.experts[expert_idx]
-++++                
-++++                mask = (flat_selected_experts == expert_idx_tensor)
-++++                selected_token_indices = token_indices[mask]
-++++                selected_routing_weights = routing_weights.flatten()[mask]
-++++                
-++++                current_states = hidden_states_reshaped[selected_token_indices]
-++++                
-++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++                
-++++                final_hidden_states = final_hidden_states.index_add(
-++++                    dim=0,
-++++                    index=selected_token_indices,
-++++                    source=expert_output.to(hidden_states.dtype)
-++++                )
-++++            
-++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++ 
-+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++-        return final_hidden_states, router_logits
-++++            final_hidden_states = final_hidden_states + shared_expert_output
-++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++            
-++++            return final_hidden_states, router_logits
-+++ 
-+++ 
-+++ class Qwen2MoeDecoderLayer(nn.Module):
-+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++ 
-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++ 
-++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++
-+++         if (layer_idx not in config.mlp_only_layers) and (
-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++         ):
-+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++     _skip_keys_device_placement = "past_key_values"
-+++     _supports_cache_class = True
-++++#lwx
-++++    # _supports_static_cache = True
-+++ 
-+++     def _init_weights(self, module):
-+++         std = self.config.initializer_range
-+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++         return causal_mask
-+++ 
-+++ 
-+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++     _tied_weights_keys = ["lm_head.weight"]
-+++ 
-+++     def __init__(self, config):
-+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++         self.num_experts_per_tok = config.num_experts_per_tok
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-++++        # @lwx
-++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++        #     self.generation_config.cache_implementation = "static"
-++++        self._warmed_up = False
-++++
-++++    def warmup_moe_model(self):
-++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++        test_texts = [
-++++            "warmup short",
-++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++        ]
-++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++        if tokenizer is None:
-++++            from mindnlp.transformers import AutoTokenizer
-++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++            self._warmup_tokenizer = tokenizer
-++++
-++++        for text in test_texts:
-++++            inputs = tokenizer(text, return_tensors="ms")
-++++            with mindspore._no_grad():
-++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++ 
-+++     def get_input_embeddings(self):
-+++         return self.model.embed_tokens
-+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++         ```"""
-++++        if not self._warmed_up:
-++++            self._warmed_up = True
-++++            self.warmup_moe_model()
-+++ 
-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++         output_router_logits = (
-+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++             }
-+++         )
-+++         return model_inputs
-++++# @lwx
-++++    # def _decode_one_tokens_logits(
-++++    #     self,
-++++    #     cur_token: mindspore.Tensor,
-++++    #     input_pos: Optional[mindspore.Tensor],
-++++    #     cache_position: mindspore.Tensor,
-++++    #     past_key_values: StaticCache,
-++++    # ) -> mindspore.Tensor:
-++++    #     """
-++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++        
-++++    #     Args:
-++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++    #         input_pos: 输入位置信息，可选
-++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++            
-++++    #     Returns:
-++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++    #     """
-++++    #     # 调用JIT编译的版本
-++++    #     return self.get_decode_one_tokens_logits(
-++++    #         cur_token=cur_token,
-++++    #         input_pos=input_pos,
-++++    #         cache_position=cache_position,
-++++    #         past_key_values=past_key_values,
-++++    #     )
-++++    
-++++    # @mindspore.jit(jit_level='O1')
-++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++    #     """
-++++    #     JIT编译的函数，用于高效的单token解码
-++++    #     使用JIT编译优化以支持静态shape和高效执行
-++++        
-++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++    #     """
-++++    #     outputs = self.model.forward(
-++++    #         input_ids=cur_token,
-++++    #         position_ids=input_pos,
-++++    #         cache_position=cache_position,
-++++    #         past_key_values=past_key_values,
-++++    #         use_cache=True,
-++++    #         return_dict=False,
-++++    #     )
-++++        
-++++    #     hidden_states = outputs[0]
-++++    #     logits = self.lm_head.forward(hidden_states)
-++++    #     logits = logits.float()
-++++        
-++++    #     return logits[:, -1, :]
-++++
-++++    # def _sample(
-++++    #     self,
-++++    #     input_ids: mindspore.Tensor,
-++++    #     logits_processor,
-++++    #     stopping_criteria,
-++++    #     generation_config,
-++++    #     synced_devices: bool,
-++++    #     streamer=None,
-++++    #     logits_warper=None,
-++++    #     **model_kwargs,
-++++    # ):
-++++    #     """
-++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++    #     """
-++++    #     from ...generation.logits_process import LogitsProcessorList
-++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++    #     from mindnlp.core import nn, ops, no_grad
-++++    #     import numpy as np
-++++        
-++++    #     # 检查是否使用 StaticCache
-++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++    #     # 否则，直接调用父类方法
-++++    #     past_key_values = model_kwargs.get("past_key_values")
-++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++
-++++    #     if not isinstance(past_key_values, StaticCache):
-++++    #         # 不使用 StaticCache，直接调用父类方法
-++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++    #         return super()._sample(
-++++    #             input_ids=input_ids,
-++++    #             logits_processor=logits_processor,
-++++    #             stopping_criteria=stopping_criteria,
-++++    #             generation_config=generation_config,
-++++    #             synced_devices=synced_devices,
-++++    #             streamer=streamer,
-++++    #             logits_warper=logits_warper,
-++++    #             **model_kwargs,
-++++    #         )
-++++        
-++++    #     # 使用 StaticCache，进入自定义循环
-++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++    #     pad_token_id = generation_config._pad_token_tensor
-++++    #     output_attentions = generation_config.output_attentions
-++++    #     output_hidden_states = generation_config.output_hidden_states
-++++    #     output_scores = generation_config.output_scores
-++++    #     output_logits = generation_config.output_logits
-++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++    #     max_length = generation_config.max_length
-++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++    #     do_sample = generation_config.do_sample
-++++        
-++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++    #         raise ValueError(
-++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++    #             f"{logits_warper})."
-++++    #         )
-++++        
-++++    #     # init attention / hidden states / scores tuples
-++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++        
-++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++    #         encoder_hidden_states = (
-++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++    #         )
-++++        
-++++    #     # keep track of which sequences are already finished
-++++    #     batch_size, cur_len = input_ids.shape
-++++    #     this_peer_finished = False
-++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++        
-++++    #     time_record = []
-++++    #     from ....utils.testing_utils import parse_flag_from_env
-++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++        
-++++    #     while self._has_unfinished_sequences(
-++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++    #     ):
-++++    #         if _record_time:
-++++    #             import time as time_module
-++++    #             infer_start = time_module.time()
-++++            
-++++    #         # prepare model inputs
-++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++            
-++++    #         # prepare variable output controls
-++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++            
-++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++    #         cur_cache_position = model_inputs.get("cache_position")
-++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++    #         cur_input_ids = model_inputs.get("input_ids")
-++++            
-++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++    #             cur_cache_position is not None and 
-++++    #             len(cur_cache_position.shape) > 0 and
-++++    #             cur_cache_position.shape[0] == 1 and
-++++    #             cur_input_ids is not None and
-++++    #             cur_input_ids.shape[1] == 1):
-++++    #             # 使用 JIT 优化的单 token 解码
-++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++    #             if not hasattr(self, '_jit_used'):
-++++    #                 self._jit_used = False
-++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++                
-++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++    #                 cur_token=cur_input_ids,
-++++    #                 input_pos=model_inputs.get("position_ids"),
-++++    #                 cache_position=cur_cache_position,
-++++    #                 past_key_values=cur_past_key_values,
-++++    #             )
-++++                
-++++    #             # 标记已使用JIT（用于后续判断）
-++++    #             if not self._jit_used:
-++++    #                 self._jit_used = True
-++++                
-++++    #             # 构造兼容的输出对象
-++++    #             class JitOptimizedOutput:
-++++    #                 def __init__(self, logits, config):
-++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++    #                     self.config = config
-++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++    #                     self.cross_attentions = None
-++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++                
-++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++    #         else:
-++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++    #             outputs = self(**model_inputs, return_dict=True)
-++++            
-++++    #         if synced_devices and this_peer_finished:
-++++    #             continue
-++++            
-++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++    #         next_token_logits = outputs.logits[:, -1, :]
-++++            
-++++    #         # pre-process distribution
-++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++    #         if do_sample:
-++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++            
-++++    #         # Store scores, attentions and hidden_states when required
-++++    #         if return_dict_in_generate:
-++++    #             if output_scores:
-++++    #                 scores += (next_token_scores,)
-++++    #             if output_logits:
-++++    #                 raw_logits += (next_token_logits,)
-++++    #             if output_attentions:
-++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++    #                 if self.config.is_encoder_decoder:
-++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++                
-++++    #             if output_hidden_states:
-++++    #                 hidden = (
-++++    #                     outputs.decoder_hidden_states
-++++    #                     if self.config.is_encoder_decoder
-++++    #                     else outputs.hidden_states
-++++    #                 )
-++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++            
-++++    #         # token selection
-++++    #         if do_sample:
-++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++    #         else:
-++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++            
-++++    #         # finished sentences should have their next token be a padding token
-++++    #         if has_eos_stopping_criteria:
-++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++            
-++++    #         # update generated ids, model inputs, and length for next step
-++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++    #         if streamer is not None:
-++++    #             streamer.put(next_tokens)
-++++            
-++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++    #             outputs,
-++++    #             model_kwargs,
-++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++    #         )
-++++            
-++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++    #         cur_len += 1
-++++            
-++++    #         if _record_time:
-++++    #             import time as time_module
-++++    #             infer_stop = time_module.time()
-++++    #             time_record.append(infer_stop - infer_start)
-++++            
-++++    #         del outputs
-++++        
-++++    #     average_infer_time = None
-++++    #     if time_record:
-++++    #         if len(time_record) > 1:
-++++    #             time_record.pop(0)
-++++    #         average_infer_time = sum(time_record) / len(time_record)
-++++    #         print(f'average inference time is: {average_infer_time}')
-++++    #         print(f'inference time record: {time_record}')
-++++        
-++++    #     if streamer is not None:
-++++    #         streamer.end()
-++++        
-++++    #     # 简单判断：打印是否使用了JIT路径
-++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++    #     else:
-++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++        
-++++    #     if return_dict_in_generate:
-++++    #         if self.config.is_encoder_decoder:
-++++    #             return GenerateEncoderDecoderOutput(
-++++    #                 sequences=input_ids,
-++++    #                 scores=scores,
-++++    #                 logits=raw_logits,
-++++    #                 encoder_attentions=encoder_attentions,
-++++    #                 encoder_hidden_states=encoder_hidden_states,
-++++    #                 decoder_attentions=decoder_attentions,
-++++    #                 cross_attentions=cross_attentions,
-++++    #                 decoder_hidden_states=decoder_hidden_states,
-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++    #                 average_infer_time=average_infer_time
-++++    #             )
-++++    #         else:
-++++    #             return GenerateDecoderOnlyOutput(
-++++    #                 sequences=input_ids,
-++++    #                 scores=scores,
-++++    #                 logits=raw_logits,
-++++    #                 attentions=decoder_attentions,
-++++    #                 hidden_states=decoder_hidden_states,
-++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++    #                 average_infer_time=average_infer_time
-++++    #             )
-++++    #     else:
-++++    #         return input_ids
-++++            
-++++    # def _prepare_cache_for_generation(
-++++    #     self,
-++++    #     generation_config,
-++++    #     model_kwargs,
-++++    #     assistant_model,
-++++    #     batch_size,
-++++    #     max_cache_length,
-++++    # ):
-++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++    #         generation_config.cache_implementation = "static"
-++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++        
-++++    #     if generation_config.cache_implementation == "static":
-++++    #         base_required_from_max_length = generation_config.max_length + 1
-++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++    #         min_cache_size = 50
-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++    #         else:
-++++    #             max_cache_length = max(base_required, min_cache_size)
-++++            
-++++    #         original_max_cache_length = max_cache_length
-++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++            
-++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++    #             if max_cache_length > self.config.max_position_embeddings:
-++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++        
-++++    #     result = super()._prepare_cache_for_generation(
-++++    #         generation_config=generation_config,
-++++    #         model_kwargs=model_kwargs,
-++++    #         assistant_model=assistant_model,
-++++    #         batch_size=batch_size,
-++++    #         max_cache_length=max_cache_length,
-++++    #     )
-++++        
-++++    #     if generation_config.cache_implementation == "static":
-++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++    #         created_cache = model_kwargs.get(cache_name)
-++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++    #             if created_cache.max_cache_len < generation_config.max_length:
-++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++        
-++++    #     return result
-++++
-++++
-++++
-+++ 
-+++ 
-+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++-- 
-+++2.27.0
-+++
-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++new file mode 100644
-++index 00000000..22b65dd5
-++--- /dev/null
-+++++ b/patches/0002-20251106commit.patch
-++@@ -0,0 +1,3200 @@
-+++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Thu, 6 Nov 2025 09:20:38 +0800
-+++Subject: [PATCH 2/3] 20251106commit
-+++
-+++---
-+++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-+++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-+++ 3 files changed, 2689 insertions(+), 305 deletions(-)
-+++ create mode 100644 patches/0001-20251104commit.patch
-+++
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index d8303e45..73773c22 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-+++         #     y = y + self.shared_experts(identity)
-+++         # return y
-+++         
-++++    # @no_grad()
-++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++
-++++    #     expert_cache = ops.zeros_like(x)
-++++    #     for i in range(self.num_experts_per_tok):
-++++    #         expert_id = flat_expert_indices[i].item()
-++++    #         weight = flat_expert_weights[i].item()
-++++    #         expert = self.experts[expert_id]
-++++    #         expert_out = expert(x)
-++++    #         expert_cache += expert_out * weight
-++++    #     return expert_cache
-++++
-+++     @no_grad()
-+++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++        # x 的 shape: (1, hidden_size)
-++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++++
-++++        # 1. 收集所有需要的专家层
-++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++++
-++++        # 2. 并行计算所有专家的输出
-++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++++        # ops.cat 会将它们堆叠成一个新的 Tensor
-++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++++
-++++        # 3. 使用矩阵乘法进行加权求和
-++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++        # 最终结果 final_output 的 shape: (1, hidden_size)
-++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++++        
-++++        return final_output
-+++ 
-+++-        expert_cache = ops.zeros_like(x)
-+++-        for i in range(self.num_experts_per_tok):
-+++-            expert_id = flat_expert_indices[i].item()
-+++-            weight = flat_expert_weights[i].item()
-+++-            expert = self.experts[expert_id]
-+++-            expert_out = expert(x)
-+++-            expert_cache += expert_out * weight
-+++-        return expert_cache
-+++ 
-+++     @no_grad()
-+++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-+++             key_states = self.k_proj(hidden_states)
-+++             value_states = self.v_proj(hidden_states)
-+++ 
-+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++        # @lwx
-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+++ 
-+++         kv_seq_len = key_states.shape[-2]
-+++         if past_key_value is not None:
-+++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++ 
-++++# class DeepseekFlashAttention(nn.Module):
-++++#     """
-++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-++++
-++++#     This class is designed as a drop-in replacement for DeepseekAttention.
-++++#     """
-++++
-++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++++#         super().__init__()
-++++#         self.config = config
-++++#         self.layer_idx = layer_idx
-++++#         if layer_idx is None:
-++++#             logger.warning(
-++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++#                 "when creating this class."
-++++#             )
-++++
-++++#         self.attention_dropout = config.attention_dropout
-++++#         self.hidden_size = config.hidden_size
-++++#         self.num_heads = config.num_attention_heads
-++++#         self.head_dim = self.hidden_size // self.num_heads
-++++#         self.num_key_value_heads = config.num_key_value_heads
-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++#         self.max_position_embeddings = config.max_position_embeddings
-++++#         self.rope_theta = config.rope_theta
-++++#         self.is_causal = True
-++++
-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++#             raise ValueError(
-++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++#                 f" and `num_heads`: {self.num_heads})."
-++++#             )
-++++
-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++++#         self._init_rope()
-++++
-++++#     def _init_rope(self):
-++++#         if self.config.rope_scaling is None:
-++++#             self.rotary_emb = DeepseekRotaryEmbedding(
-++++#                 self.head_dim,
-++++#                 max_position_embeddings=self.max_position_embeddings,
-++++#                 base=self.rope_theta,
-++++#             )
-++++#         else:
-++++#             scaling_type = self.config.rope_scaling["type"]
-++++#             scaling_factor = self.config.rope_scaling["factor"]
-++++#             if scaling_type == "linear":
-++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++++#                     self.head_dim,
-++++#                     max_position_embeddings=self.max_position_embeddings,
-++++#                     scaling_factor=scaling_factor,
-++++#                     base=self.rope_theta,
-++++#                 )
-++++#             elif scaling_type == "dynamic":
-++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++++#                     self.head_dim,
-++++#                     max_position_embeddings=self.max_position_embeddings,
-++++#                     scaling_factor=scaling_factor,
-++++#                     base=self.rope_theta,
-++++#                 )
-++++#             else:
-++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++++
-++++#     def forward(
-++++#         self,
-++++#         hidden_states: mindspore.Tensor,
-++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++#         past_key_value: Optional[Cache] = None,
-++++#         output_attentions: bool = False,
-++++#         use_cache: bool = False,
-++++#         **kwargs,
-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++#         if "padding_mask" in kwargs:
-++++#             warnings.warn(
-++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++++#             )
-++++        
-++++#         if output_attentions:
-++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-++++
-++++#         bsz, q_len, _ = hidden_states.shape
-++++
-++++#         if self.config.pretraining_tp > 1:
-++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++++
-++++#         query_states = self.q_proj(hidden_states)
-++++#         key_states = self.k_proj(hidden_states)
-++++#         value_states = self.v_proj(hidden_states)
-++++
-++++#         # Reshape for multi-head attention
-++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++#         kv_seq_len = key_states.shape[-2]
-++++#         if past_key_value is not None:
-++++#             if self.layer_idx is None:
-++++#                 raise ValueError(
-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++#                     "with a layer index."
-++++#                 )
-++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++        
-++++#         # Apply Rotary Positional Embedding
-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++#         if past_key_value is not None:
-++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++
-++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++        
-++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++        
-++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++
-++++#         # Convert attention_mask for flash_attention_score
-++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-++++#         if attention_mask is not None:
-++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++++#                 raise ValueError(
-++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++++#                 )
-++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-++++#         else:
-++++#             attn_mask_for_fa = None
-++++        
-++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++++
-++++#         # Call the fused flash_attention_score operator
-++++#         attn_output = mindspore.ops.flash_attention_score(
-++++#             query=query_states_for_fa,
-++++#             key=key_states_for_fa,
-++++#             value=value_states_for_fa,
-++++#             head_num=self.num_heads, # This is N1, the number of query heads
-++++#             input_layout='BSH',
-++++#             attn_mask=attn_mask_for_fa,
-++++#             keep_prob=keep_prob,
-++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++#             sparse_mode=0 # Default mask mode
-++++#         )
-++++        
-++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-++++#         attn_output = self.o_proj(attn_output)
-++++        
-++++#         # Flash Attention does not return attention weights
-++++#         attn_weights = None
-++++
-++++#         return attn_output, attn_weights, past_key_value
-++++
-++++class DeepseekFlashAttention(nn.Module):
-++++    """
-++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-++++    designed for high performance on supported hardware (Ascend).
-++++
-++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-++++    """
-++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++++        super().__init__()
-++++        self.config = config
-++++        self.layer_idx = layer_idx
-++++        if layer_idx is None:
-++++            logger.warning(
-++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++                "when creating this class."
-++++            )
-++++
-++++        # --- [FIX] Correctly initialize all required attributes ---
-++++        self.attention_dropout = config.attention_dropout
-++++        self.hidden_size = config.hidden_size
-++++        self.num_heads = config.num_attention_heads
-++++        self.head_dim = self.hidden_size // self.num_heads
-++++        self.num_key_value_heads = config.num_key_value_heads
-++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++        self.max_position_embeddings = config.max_position_embeddings
-++++        self.rope_theta = config.rope_theta
-++++        self.is_causal = True
-++++
-++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++            raise ValueError(
-++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++                f" and `num_heads`: {self.num_heads})."
-++++            )
-++++
-++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++++        
-++++        # This call will now succeed as all attributes are initialized.
-++++        self._init_rope()
-++++
-++++    def _init_rope(self):
-++++        if self.config.rope_scaling is None:
-++++            self.rotary_emb = DeepseekRotaryEmbedding(
-++++                self.head_dim,
-++++                max_position_embeddings=self.max_position_embeddings,
-++++                base=self.rope_theta,
-++++            )
-++++        else:
-++++            scaling_type = self.config.rope_scaling["type"]
-++++            scaling_factor = self.config.rope_scaling["factor"]
-++++            if scaling_type == "linear":
-++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++++                    self.head_dim,
-++++                    max_position_embeddings=self.max_position_embeddings,
-++++                    scaling_factor=scaling_factor,
-++++                    base=self.rope_theta,
-++++                )
-++++            elif scaling_type == "dynamic":
-++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++++                    self.head_dim,
-++++                    max_position_embeddings=self.max_position_embeddings,
-++++                    scaling_factor=scaling_factor,
-++++                    base=self.rope_theta,
-++++                )
-++++            else:
-++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++++
-++++    def forward(
-++++        self,
-++++        hidden_states: mindspore.Tensor,
-++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++        position_ids: Optional[mindspore.Tensor] = None,
-++++        past_key_value: Optional[Cache] = None,
-++++        output_attentions: bool = False,
-++++        use_cache: bool = False,
-++++        **kwargs,
-++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++        if "padding_mask" in kwargs:
-++++            warnings.warn(
-++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++++            )
-++++        if output_attentions:
-++++            warnings.warn(
-++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-++++            )
-++++
-++++        bsz, q_len, _ = hidden_states.shape
-++++
-++++        if self.config.pretraining_tp > 1:
-++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++++
-++++        query_states = self.q_proj(hidden_states)
-++++        key_states = self.k_proj(hidden_states)
-++++        value_states = self.v_proj(hidden_states)
-++++
-++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++        kv_seq_len = key_states.shape[-2]
-++++        if past_key_value is not None:
-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++        
-++++        # Apply Rotary Position Embedding
-++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++        if past_key_value is not None:
-++++            cache_kwargs = {"sin": sin, "cos": cos}
-++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++
-++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-++++        # So we must explicitly repeat the KV heads.
-++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++
-++++        # Convert attention mask for flash_attention_score
-++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-++++        if attention_mask is not None:
-++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++++                 raise ValueError(
-++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++++                )
-++++            attn_mask_for_fa = attention_mask < 0
-++++        else:
-++++            attn_mask_for_fa = None
-++++
-++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++++
-++++        # Call the fused operator using the efficient BNSD layout
-++++        attn_output = mindspore.ops.flash_attention_score(
-++++            query=query_states,
-++++            key=key_states,
-++++            value=value_states,
-++++            head_num=self.num_heads,
-++++            input_layout='BNSD', # Specify the correct layout
-++++            attn_mask=attn_mask_for_fa,
-++++            keep_prob=keep_prob,
-++++            scalar_value=1.0 / math.sqrt(self.head_dim)
-++++        )
-++++        
-++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++        
-++++        # Apply output projection
-++++        attn_output = self.o_proj(attn_output)
-++++
-++++        # Flash attention does not return attention weights, so we return None.
-++++        attn_weights = None
-++++
-++++        return attn_output, attn_weights, past_key_value
-++++
-+++ Deepseek_ATTENTION_CLASSES = {
-+++     "eager": DeepseekAttention,
-++++    "flash-attention": DeepseekFlashAttention,
-+++ }
-+++ 
-+++ 
-+++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-+++             config=config, layer_idx=layer_idx
-+++         )
-+++ 
-++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-++++            config=config, layer_idx=layer_idx
-++++        )
-++++
-+++         self.mlp = (
-+++             DeepseekMoE(config)
-+++             if (
-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++index d4c6b651..bced285c 100644
-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-+++ 
-+++ import mindspore
-+++ import mindnlp.core.nn.functional as F
-+++-from mindnlp.core import nn, ops
-++++from mindnlp.core import nn, ops, no_grad
-+++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-+++ 
-+++ from ....common.activations import ACT2FN
-+++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+++ 
-++++Long_Prompt = False
-++++PROMPT_LENGTH_THRESHOLD = 128
-+++ 
-+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+++ def _prepare_4d_causal_attention_mask_with_cache_position(
-+++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++ 
-++++# class Qwen2MoeFlashAttention(nn.Module):
-++++#     """
-++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++
-++++#     关键改动:
-++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++#        直接传入原始的 key 和 value 张量效率更高。
-++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++#         super().__init__()
-++++#         self.config = config
-++++#         self.layer_idx = layer_idx
-++++#         self.hidden_size = config.hidden_size
-++++#         self.num_heads = config.num_attention_heads
-++++#         self.head_dim = self.hidden_size // self.num_heads
-++++#         self.num_key_value_heads = config.num_key_value_heads
-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++#         self.max_position_embeddings = config.max_position_embeddings
-++++#         self.rope_theta = config.rope_theta
-++++#         self.attention_dropout = config.attention_dropout
-++++
-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++#             raise ValueError(
-++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++#             )
-++++
-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++
-++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++#             self.head_dim,
-++++#             max_position_embeddings=self.max_position_embeddings,
-++++#             base=self.rope_theta,
-++++#         )
-++++
-++++#     def forward(
-++++#         self,
-++++#         hidden_states: mindspore.Tensor,
-++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++#         past_key_value: Optional[Cache] = None,
-++++#         output_attentions: bool = False,
-++++#         use_cache: bool = False,
-++++#         cache_position: Optional[mindspore.Tensor] = None,
-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++#         bsz, q_len, _ = hidden_states.shape
-++++
-++++#         # 1. 线性投射 Q, K, V
-++++#         query_states = self.q_proj(hidden_states)
-++++#         key_states = self.k_proj(hidden_states)
-++++#         value_states = self.v_proj(hidden_states)
-++++
-++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++#         # 3. RoPE 旋转位置编码
-++++#         kv_seq_len = key_states.shape[-2]
-++++#         if past_key_value is not None:
-++++#             if self.layer_idx is None:
-++++#                 raise ValueError(
-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++#                     "with a layer index."
-++++#                 )
-++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++#                 if cache_position.shape[0] == 1:
-++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++#                     kv_seq_len = past_seen_tokens + 1
-++++#                 else:
-++++#                     # prefill 阶段：cache_position 是范围，使用其长度
-++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++#             else:
-++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++#         # 4. KV 缓存更新
-++++#         if past_key_value is not None:
-++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++#             key_states, value_states = past_key_value.update(
-++++#                 key_states, value_states, self.layer_idx, cache_kwargs
-++++#             )
-++++            
-++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++#                 if cache_position.shape[0] == 1:
-++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++#                     kv_seq_len = key_states.shape[-2]
-++++
-++++#         # 5. [重要] 准备 Attention Mask
-++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++#         fa_attention_mask = None
-++++#         if attention_mask is not None:
-++++#             # 截取与当前key长度匹配的部分
-++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++#             fa_attention_mask = (mask_slice != 0)
-++++
-++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++#         input_dtype = query_states.dtype
-++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++#             query_states = query_states.to(mindspore.float16)
-++++#             key_states = key_states.to(mindspore.float16)
-++++#             value_states = value_states.to(mindspore.float16)
-++++
-++++#         # 6. [核心] 调用 flash_attention_score 算子
-++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++#         attn_output = mindspore.ops.flash_attention_score(
-++++#             query=query_states,
-++++#             key=key_states,
-++++#             value=value_states,
-++++#             head_num=self.num_heads, # 传入Q的头数(N1)
-++++#             attn_mask=fa_attention_mask,
-++++#             keep_prob=1.0 - self.attention_dropout,
-++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++#             input_layout="BNSD",
-++++#             sparse_mode=0 # 使用 defaultMask 模式
-++++#         )
-++++
-++++#         # 恢复原始数据类型
-++++#         attn_output = attn_output.to(input_dtype)
-++++
-++++#         # 7. 调整输出形状
-++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++#         attn_output = self.o_proj(attn_output)
-++++
-++++#         # FlashAttention 算子不直接返回注意力权重矩阵
-++++#         attn_weights = None
-++++#         if output_attentions:
-++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++#         return attn_output, attn_weights, past_key_value
-++++
-++++#     # def forward(
-++++#     #     self,
-++++#     #     hidden_states: mindspore.Tensor,
-++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-++++#     #     position_ids: Optional[mindspore.Tensor] = None,
-++++#     #     past_key_value: Optional[Cache] = None,
-++++#     #     output_attentions: bool = False,
-++++#     #     use_cache: bool = False,
-++++#     #     cache_position: Optional[mindspore.Tensor] = None,
-++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++#     #     bsz, q_len, _ = hidden_states.shape
-++++
-++++#     #     # 1. 线性投射 Q, K, V
-++++#     #     query_states = self.q_proj(hidden_states)
-++++#     #     key_states = self.k_proj(hidden_states)
-++++#     #     value_states = self.v_proj(hidden_states)
-++++
-++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++
-++++#     #     # 3. RoPE 旋转位置编码
-++++#     #     kv_seq_len = key_states.shape[-2]
-++++#     #     if past_key_value is not None:
-++++#     #         if self.layer_idx is None:
-++++#     #             raise ValueError(
-++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++#     #                 "with a layer index."
-++++#     #             )
-++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++
-++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++#     #     # 4. KV 缓存更新
-++++#     #     if past_key_value is not None:
-++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++#     #         key_states, value_states = past_key_value.update(
-++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-++++#     #         )
-++++
-++++#     #     # 5. 准备 Attention Mask
-++++#     #     fa_attention_mask = None
-++++#     #     if attention_mask is not None:
-++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++#     #         fa_attention_mask = (mask_slice != 0)
-++++
-++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++#     #     input_dtype = query_states.dtype
-++++
-++++#     #     # 6. [核心] 调用 flash_attention_score 算子
-++++#     #     attn_output = mindspore.ops.flash_attention_score(
-++++#     #         query=query_states,
-++++#     #         key=key_states,
-++++#     #         value=value_states,
-++++#     #         head_num=self.num_heads,
-++++#     #         attn_mask=fa_attention_mask,
-++++#     #         keep_prob=1.0 - self.attention_dropout,
-++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++#     #         input_layout="BNSD",
-++++#     #         sparse_mode=0,
-++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++#     #         inner_precise=1
-++++#     #     )
-++++
-++++#     #     # 恢复原始数据类型
-++++#     #     attn_output = attn_output.to(input_dtype)
-++++
-++++#     #     # 7. 调整输出形状
-++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++#     #     attn_output = self.o_proj(attn_output)
-++++
-++++#     #     attn_weights = None
-++++#     #     if output_attentions:
-++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++
-++++#     #     return attn_output, attn_weights, past_key_value
-++++
-++++
-+++ class Qwen2MoeFlashAttention(nn.Module):
-+++     """
-+++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++-
-+++-    关键改动:
-+++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++-       直接传入原始的 key 和 value 张量效率更高。
-+++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-++++
-++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-++++    完全使用模型的低精度数据类型（如 float16）进行计算，
-++++    以达到理论上的最高执行速度。
-+++     """
-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++         super().__init__()
-+++         self.config = config
-+++         self.layer_idx = layer_idx
-++++        if layer_idx is None:
-++++            logger.warning_once(
-++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-++++            )
-++++
-+++         self.hidden_size = config.hidden_size
-+++         self.num_heads = config.num_attention_heads
-+++         self.head_dim = self.hidden_size // self.num_heads
-+++         self.num_key_value_heads = config.num_key_value_heads
-+++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++         self.max_position_embeddings = config.max_position_embeddings
-+++         self.rope_theta = config.rope_theta
-+++         self.attention_dropout = config.attention_dropout
-+++ 
-+++-        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++-            raise ValueError(
-+++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++-            )
-+++-
-+++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-+++         key_states = self.k_proj(hidden_states)
-+++         value_states = self.v_proj(hidden_states)
-+++ 
-+++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++-        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++        # 2. 调整形状以匹配 BNSD 布局
-+++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-
-+++-        # 3. RoPE 旋转位置编码
-++++        
-++++        # 3. RoPE 和 KV 缓存
-+++         kv_seq_len = key_states.shape[-2]
-+++         if past_key_value is not None:
-+++-            if self.layer_idx is None:
-+++-                raise ValueError(
-+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-                    "with a layer index."
-+++-                )
-+++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++-                if cache_position.shape[0] == 1:
-+++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++-                    kv_seq_len = past_seen_tokens + 1
-+++-                else:
-+++-                    # prefill 阶段：cache_position 是范围，使用其长度
-+++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++-            else:
-+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-
-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++        
-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++ 
-+++-        # 4. KV 缓存更新
-+++         if past_key_value is not None:
-+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-            key_states, value_states = past_key_value.update(
-+++-                key_states, value_states, self.layer_idx, cache_kwargs
-+++-            )
-+++-            
-+++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++-                if cache_position.shape[0] == 1:
-+++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++-                    kv_seq_len = key_states.shape[-2]
-+++-
-+++-        # 5. [重要] 准备 Attention Mask
-+++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++
-++++        # 4. 准备 Attention Mask
-+++         fa_attention_mask = None
-+++         if attention_mask is not None:
-+++-            # 截取与当前key长度匹配的部分
-+++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++             fa_attention_mask = (mask_slice != 0)
-+++ 
-+++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++-        input_dtype = query_states.dtype
-+++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++-            query_states = query_states.to(mindspore.float16)
-+++-            key_states = key_states.to(mindspore.float16)
-+++-            value_states = value_states.to(mindspore.float16)
-+++-
-+++-        # 6. [核心] 调用 flash_attention_score 算子
-+++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-+++         attn_output = mindspore.ops.flash_attention_score(
-+++             query=query_states,
-+++             key=key_states,
-+++             value=value_states,
-+++-            head_num=self.num_heads, # 传入Q的头数(N1)
-++++            head_num=self.num_heads,
-+++             attn_mask=fa_attention_mask,
-+++-            keep_prob=1.0 - self.attention_dropout,
-++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-+++             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++             input_layout="BNSD",
-+++-            sparse_mode=0 # 使用 defaultMask 模式
-++++            sparse_mode=0,
-++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-+++         )
-+++ 
-+++-        # 恢复原始数据类型
-+++-        attn_output = attn_output.to(input_dtype)
-+++-
-+++-        # 7. 调整输出形状
-+++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++        # 6. 调整输出形状
-+++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++         attn_output = self.o_proj(attn_output)
-+++ 
-+++-        # FlashAttention 算子不直接返回注意力权重矩阵
-++++        # 7. 返回结果
-+++         attn_weights = None
-+++         if output_attentions:
-+++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++-    # def forward(
-+++-    #     self,
-+++-    #     hidden_states: mindspore.Tensor,
-+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++-    #     position_ids: Optional[mindspore.Tensor] = None,
-+++-    #     past_key_value: Optional[Cache] = None,
-+++-    #     output_attentions: bool = False,
-+++-    #     use_cache: bool = False,
-+++-    #     cache_position: Optional[mindspore.Tensor] = None,
-+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-
-+++-    #     bsz, q_len, _ = hidden_states.shape
-+++-
-+++-    #     # 1. 线性投射 Q, K, V
-+++-    #     query_states = self.q_proj(hidden_states)
-+++-    #     key_states = self.k_proj(hidden_states)
-+++-    #     value_states = self.v_proj(hidden_states)
-+++-
-+++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-
-+++-    #     # 3. RoPE 旋转位置编码
-+++-    #     kv_seq_len = key_states.shape[-2]
-+++-    #     if past_key_value is not None:
-+++-    #         if self.layer_idx is None:
-+++-    #             raise ValueError(
-+++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-    #                 "with a layer index."
-+++-    #             )
-+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++ 
-+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-
-+++-    #     # 4. KV 缓存更新
-+++-    #     if past_key_value is not None:
-+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-    #         key_states, value_states = past_key_value.update(
-+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++-    #         )
-+++-
-+++-    #     # 5. 准备 Attention Mask
-+++-    #     fa_attention_mask = None
-+++-    #     if attention_mask is not None:
-+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-    #         fa_attention_mask = (mask_slice != 0)
-+++-
-+++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++-    #     input_dtype = query_states.dtype
-+++-
-+++-    #     # 6. [核心] 调用 flash_attention_score 算子
-+++-    #     attn_output = mindspore.ops.flash_attention_score(
-+++-    #         query=query_states,
-+++-    #         key=key_states,
-+++-    #         value=value_states,
-+++-    #         head_num=self.num_heads,
-+++-    #         attn_mask=fa_attention_mask,
-+++-    #         keep_prob=1.0 - self.attention_dropout,
-+++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-    #         input_layout="BNSD",
-+++-    #         sparse_mode=0,
-+++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++-    #         inner_precise=1
-+++-    #     )
-+++-
-+++-    #     # 恢复原始数据类型
-+++-    #     attn_output = attn_output.to(input_dtype)
-++++QWEN2MOE_ATTENTION_CLASSES = {
-++++    "eager": Qwen2MoeAttention,
-++++    "flash-attention": Qwen2MoeFlashAttention,
-++++}
-+++ 
-+++-    #     # 7. 调整输出形状
-+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-    #     attn_output = self.o_proj(attn_output)
-+++ 
-+++-    #     attn_weights = None
-+++-    #     if output_attentions:
-+++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     def __init__(self, config):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         # gating
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     #@dwj
-++++#     # 只遍历激活的专家，而非全部专家
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#             num_tokens = hidden_states_reshaped.shape[0]
-++++
-++++#             router_logits = self.gate(hidden_states_reshaped)
-++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++
-++++#             if self.norm_topk_prob:
-++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++#             routing_weights = routing_weights.to(hidden_states.dtype)
-++++            
-++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++#             flat_selected_experts = selected_experts.flatten()
-++++            
-++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++#             token_indices = broadcasted_token_indices.flatten()
-++++            
-++++#             active_experts = ops.unique(flat_selected_experts)
-++++            
-++++#             for expert_idx_tensor in active_experts:
-++++#                 expert_idx = expert_idx_tensor.item()
-++++#                 expert_layer = self.experts[expert_idx]
-++++                
-++++#                 mask = (flat_selected_experts == expert_idx_tensor)
-++++#                 selected_token_indices = token_indices[mask]
-++++#                 selected_routing_weights = routing_weights.flatten()[mask]
-++++                
-++++#                 current_states = hidden_states_reshaped[selected_token_indices]
-++++                
-++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++                
-++++#                 final_hidden_states = final_hidden_states.index_add(
-++++#                     dim=0,
-++++#                     index=selected_token_indices,
-++++#                     source=expert_output.to(hidden_states.dtype)
-++++#                 )
-++++            
-++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++ 
-+++-    #     return attn_output, attn_weights, past_key_value
-++++#             final_hidden_states = final_hidden_states + shared_expert_output
-++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++            
-++++#             return final_hidden_states, router_logits
-++++
-++++
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     """
-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         # 门控网络
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         # 专家列表
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++#         # 共享专家
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_decode(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         """
-++++#         【解码路径】针对 sequence_length=1 的极致优化。
-++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++++#         """
-++++#         batch_size, hidden_dim = hidden_states.shape
-++++        
-++++#         expert_outputs_list = [
-++++#             ops.cat([
-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++#             ], dim=0) 
-++++#             for i in range(batch_size)
-++++#         ]
-++++        
-++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++++#         # shape: (batch_size, top_k, hidden_dim)
-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++        
-++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++        
-++++#         return moe_output.squeeze(1)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_prefill(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         """
-++++#         【预填充路径】针对 sequence_length > 1 的优化。
-++++#         按专家对 Token 进行分组，并进行批处理。
-++++#         """
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens = hidden_states.shape[0]
-++++#         flat_selected_experts = selected_experts.flatten()
-++++        
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++        
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++        
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++            
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++#             selected_token_indices = token_indices[mask]
-++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++            
-++++#             current_states = hidden_states[selected_token_indices]
-++++            
-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++            
-++++#             moe_output = moe_output.index_add(
-++++#                 dim=0,
-++++#                 index=selected_token_indices,
-++++#                 source=expert_output.to(hidden_states.dtype)
-++++#             )
-++++#         return moe_output
-++++
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#         """
-++++#         顶层 forward 方法，作为智能分发器。
-++++#         """
-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++        
-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#         router_logits = self.gate(hidden_states_reshaped)
-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++ 
-+++-    # def forward(
-+++-    #     self,
-+++-    #     hidden_states: mindspore.Tensor,
-+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++-    #     position_ids: Optional[mindspore.Tensor] = None,
-+++-    #     past_key_value: Optional[Cache] = None,
-+++-    #     output_attentions: bool = False,
-+++-    #     use_cache: bool = False,
-+++-    #     cache_position: Optional[mindspore.Tensor] = None,
-+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-
-+++-    #     bsz, q_len, _ = hidden_states.shape
-+++-
-+++-    #     query_states = self.q_proj(hidden_states)
-+++-    #     key_states = self.k_proj(hidden_states)
-+++-    #     value_states = self.v_proj(hidden_states)
-+++-
-+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-
-+++-    #     kv_seq_len = key_states.shape[-2]
-+++-    #     if past_key_value is not None:
-+++-    #         if self.layer_idx is None:
-+++-    #             raise ValueError("`layer_idx` must be specified for caching")
-+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-
-+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-
-+++-    #     if past_key_value is not None:
-+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-    #         key_states, value_states = past_key_value.update(
-+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++-    #         )
-++++#         if self.norm_topk_prob:
-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++        
-++++#         moe_output = None
-++++#         # 在推理时，根据序列长度选择最优路径
-++++#         if not self.training:
-++++#             if sequence_length == 1:
-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++#             else:
-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++#         else:
-++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++++#             raise NotImplementedError("Training path is not implemented.")
-++++
-++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++++        
-++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++++        
-++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++#         return final_hidden_states, router_logits
-++++
-++++
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     """
-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         # 门控网络
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         # 专家列表
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++#         # 共享专家
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_decode(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         batch_size, _ = hidden_states.shape
-++++#         expert_outputs_list = [
-++++#             ops.cat([
-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++#             ], dim=0) 
-++++#             for i in range(batch_size)
-++++#         ]
-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++#         return moe_output.squeeze(1)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_prefill(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens = hidden_states.shape[0]
-++++#         flat_selected_experts = selected_experts.flatten()
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++        
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++#             selected_token_indices = token_indices[mask]
-++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++#             current_states = hidden_states[selected_token_indices]
-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++#             moe_output = moe_output.index_add(
-++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++#             )
-++++#         return moe_output
-++++
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#         """
-++++#         顶层 forward 方法，作为智能分发器。
-++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++++#         """
-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++        
-++++#         # 1. 门控计算 (通用逻辑)
-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#         router_logits = self.gate(hidden_states_reshaped)
-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++
-++++#         if self.norm_topk_prob:
-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++        
-++++#         # 2. 智能分发到最优 MoE 路径
-++++#         moe_output = None
-++++#         if not self.training:
-++++#             if sequence_length == 1:
-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++#             else:
-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++#         else:
-++++#             raise NotImplementedError("Training path is not implemented.")
-++++
-++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++        
-++++#         # 4. 合并 MoE 输出和共享专家输出
-++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++        
-++++#         # 5. 恢复原始形状并返回
-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++#         return final_hidden_states, router_logits
-++++
-++++# prefill fastest
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     """
-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         # 门控网络
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         # 专家列表
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++#         # 共享专家
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_dispatch(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         """
-++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++++#         """
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens, _ = hidden_states.shape
-++++        
-++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++++#         flat_selected_experts = selected_experts.flatten()
-++++#         flat_routing_weights = routing_weights.flatten()
-+++ 
-+++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++-
-+++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++-    #     query_states = query_states / math.sqrt(self.head_dim)
-+++-    #     # <--- 修改结束 ---
-+++-
-+++-    #     fa_attention_mask = None
-+++-    #     if attention_mask is not None:
-+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-    #         fa_attention_mask = (mask_slice != 0)
-+++-
-+++-    #     input_dtype = query_states.dtype
-+++-
-+++-    #     attn_output = mindspore.ops.flash_attention_score(
-+++-    #         query=query_states,    # 传入已经预先缩放过的 query
-+++-    #         key=key_states,
-+++-    #         value=value_states,
-+++-    #         head_num=self.num_heads,
-+++-    #         attn_mask=fa_attention_mask,
-+++-    #         keep_prob=1.0 - self.attention_dropout,
-+++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++-    #         input_layout="BNSD",
-+++-    #         sparse_mode=0,
-+++-    #         inner_precise=1        # 仍然保持内部高精度计算
-+++-    #     )
-++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++ 
-+++-    #     attn_output = attn_output.to(input_dtype)
-+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-    #     attn_output = self.o_proj(attn_output)
-++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++        
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++            
-++++#             # 找到所有分配给该专家的 token
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++            
-++++#             # 使用 mask 选取对应的 token 和权重
-++++#             current_token_indices = token_indices[mask]
-++++#             current_routing_weights = flat_routing_weights[mask]
-++++#             current_hidden_states = hidden_states[current_token_indices]
-++++            
-++++#             # 对这些 token 进行批处理
-++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++            
-++++#             # 使用 index_add 将结果精确地加回到对应位置
-++++#             moe_output = moe_output.index_add(
-++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++++#             )
-++++#         return moe_output
-++++
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#         """
-++++#         顶层 forward 方法，作为智能分发器。
-++++#         """
-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++        
-++++#         # 1. 门控计算
-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#         router_logits = self.gate(hidden_states_reshaped)
-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++
-++++#         if self.norm_topk_prob:
-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++        
-++++#         # 2. 调用统一的 MoE 计算内核
-++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+++ 
-+++-    #     attn_weights = None
-+++-    #     if output_attentions:
-+++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++#         # 3. 统一处理共享专家
-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++        
-++++#         # 4. 合并输出
-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++        
-++++#         # 5. 恢复原始形状并返回
-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++#         return final_hidden_states, router_logits
-++++
-++++
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     """
-++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++#     【最终高性能与高精度版】：
-++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++++#     3. 这样实现了速度和准确性的两全其美。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_decode(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         """
-++++#         【解码路径】极致优化版：bmm + 高精度累加。
-++++#         """
-++++#         original_dtype = hidden_states.dtype
-++++#         batch_size, _ = hidden_states.shape
-++++        
-++++#         expert_outputs_list = [
-++++#             ops.cat([
-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++#             ], dim=0) 
-++++#             for i in range(batch_size)
-++++#         ]
-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++
-++++#         # 在 float32 下执行 bmm，得到高精度结果
-++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++        
-++++#         # 将高精度结果转换回原始数据类型
-++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++++        
-++++#         return moe_output
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_prefill(
-++++#         self, 
-++++#         hidden_states: mindspore.Tensor, 
-++++#         selected_experts: mindspore.Tensor, 
-++++#         routing_weights: mindspore.Tensor
-++++#     ) -> mindspore.Tensor:
-++++#         """
-++++#         【预填充路径】与原始实现一致，结果精确。
-++++#         """
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens, _ = hidden_states.shape
-++++#         flat_selected_experts = selected_experts.flatten()
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++        
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++#             selected_token_indices = token_indices[mask]
-++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++#             current_states = hidden_states[selected_token_indices]
-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++#             moe_output = moe_output.index_add(
-++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++#             )
-++++#         return moe_output
-++++
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++        
-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#         router_logits = self.gate(hidden_states_reshaped)
-++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++ 
-+++-    #     return attn_output, attn_weights, past_key_value
-++++#         if self.norm_topk_prob:
-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++++#         # 如果模型主体是 float16，后续再转换
-++++        
-++++#         moe_output = None
-++++#         if not self.training:
-++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++++#             # _moe_infer_decode 内部会处理好类型转换
-++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++++#             if sequence_length == 1:
-++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++#             else:
-++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++#         else:
-++++#             raise NotImplementedError("Training path is not implemented.")
-++++
-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++        
-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++#         return final_hidden_states, router_logits
-++++    
-+++ 
-+++-QWEN2MOE_ATTENTION_CLASSES = {
-+++-    "eager": Qwen2MoeAttention,
-+++-    "flash-attention": Qwen2MoeFlashAttention,
-+++-}
-++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++#     """
-++++#     【融合版】一个混合专家模块，内置两种推理策略，
-++++#     由外部全局变量 `Long_Prompt` 控制：
-++++
-++++#     - if Long_Prompt is True:  【精度优先模式】
-++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++++#       适用于处理长序列，避免误差累积。
-++++
-++++#     - if Long_Prompt is False: 【速度优先模式】
-++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++++#       在解码阶段获得极致速度，同时保证结果高度准确。
-++++#     """
-++++#     def __init__(self, config: Qwen2MoeConfig):
-++++#         super().__init__()
-++++#         self.num_experts = config.num_experts
-++++#         self.top_k = config.num_experts_per_tok
-++++#         self.norm_topk_prob = config.norm_topk_prob
-++++
-++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++#         self.experts = nn.ModuleList(
-++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++#         )
-++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++
-++++#     # --- 速度优先模式的辅助函数 ---
-++++#     @no_grad()
-++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++#         original_dtype = hidden_states.dtype
-++++#         batch_size, _ = hidden_states.shape
-++++#         expert_outputs_list = [
-++++#             ops.cat([
-++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++#             ], dim=0) 
-++++#             for i in range(batch_size)
-++++#         ]
-++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++#         weights_fp32 = routing_weights.to(mindspore.float32)
-++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++
-++++#     @no_grad()
-++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens, _ = hidden_states.shape
-++++#         flat_selected_experts = selected_experts.flatten()
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++#             selected_token_indices = token_indices[mask]
-++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++#             current_states = hidden_states[selected_token_indices]
-++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++++#         return moe_output
-++++
-++++#     # --- 精度优先模式的辅助函数 ---
-++++#     @no_grad()
-++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++#         moe_output = ops.zeros_like(hidden_states)
-++++#         num_tokens, _ = hidden_states.shape
-++++#         flat_selected_experts = selected_experts.flatten()
-++++#         flat_routing_weights = routing_weights.flatten()
-++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++#         active_experts = ops.unique(flat_selected_experts)
-++++#         for expert_idx_tensor in active_experts:
-++++#             expert_idx = expert_idx_tensor.item()
-++++#             expert_layer = self.experts[expert_idx]
-++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++#             current_token_indices = token_indices[mask]
-++++#             current_routing_weights = flat_routing_weights[mask]
-++++#             current_hidden_states = hidden_states[current_token_indices]
-++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++#         return moe_output
-++++
-++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++#         # 声明我们将要使用一个在模块外部定义的全局变量
-++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++++#         global Long_Prompt
-++++
-++++#         # 1. 门控计算 (所有模式通用)
-++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++#         router_logits = self.gate(hidden_states_reshaped)
-++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++#         if self.norm_topk_prob:
-++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++#         moe_output = None
-++++#         if not self.training:
-++++#             # 根据 Long_Prompt 标志选择模式
-++++#             if Long_Prompt:
-++++#                 # --- 精度优先模式 ---
-++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++#             else:
-++++#                 # --- 速度优先模式 ---
-++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++#                 if sequence_length == 1:
-++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++#                 else:
-++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++#         else:
-++++#             raise NotImplementedError("Training path is not implemented.")
-++++
-++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++        
-++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++#         return final_hidden_states, router_logits
-++++    
-++++class Qwen2MoeSparseMoeBlock(nn.Module):
-++++    """
-++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++++    控制的顶级推理策略：
-+++ 
-++++    - if Long_Prompt is True:  【精度优先模式】
-++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-++++      适用于需要严格可复现性的长序列任务。
-+++ 
-+++-class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-    def __init__(self, config):
-++++    - if Long_Prompt is False: 【速度优先模式】
-++++      采用业界最强的性能组合：
-++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-++++    """
-++++    def __init__(self, config: Qwen2MoeConfig):
-+++         super().__init__()
-+++         self.num_experts = config.num_experts
-+++         self.top_k = config.num_experts_per_tok
-+++         self.norm_topk_prob = config.norm_topk_prob
-+++ 
-+++-        # gating
-+++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++         self.experts = nn.ModuleList(
-+++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++         )
-+++-
-+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++ 
-+++-    #@dwj
-+++-    # 只遍历激活的专家，而非全部专家
-+++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-            num_tokens = hidden_states_reshaped.shape[0]
-+++-
-+++-            router_logits = self.gate(hidden_states_reshaped)
-+++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-
-+++-            if self.norm_topk_prob:
-+++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-            routing_weights = routing_weights.to(hidden_states.dtype)
-+++-            
-+++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++-            flat_selected_experts = selected_experts.flatten()
-+++-            
-+++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++-            token_indices = broadcasted_token_indices.flatten()
-+++-            
-+++-            active_experts = ops.unique(flat_selected_experts)
-+++-            
-+++-            for expert_idx_tensor in active_experts:
-+++-                expert_idx = expert_idx_tensor.item()
-+++-                expert_layer = self.experts[expert_idx]
-+++-                
-+++-                mask = (flat_selected_experts == expert_idx_tensor)
-+++-                selected_token_indices = token_indices[mask]
-+++-                selected_routing_weights = routing_weights.flatten()[mask]
-+++-                
-+++-                current_states = hidden_states_reshaped[selected_token_indices]
-+++-                
-+++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-                
-+++-                final_hidden_states = final_hidden_states.index_add(
-+++-                    dim=0,
-+++-                    index=selected_token_indices,
-+++-                    source=expert_output.to(hidden_states.dtype)
-+++-                )
-+++-            
-+++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-++++    @no_grad()
-++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++        original_dtype = hidden_states.dtype
-++++        batch_size, _ = hidden_states.shape
-++++        expert_outputs_list = [
-++++            ops.cat([
-++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++            ], dim=0)
-++++            for i in range(batch_size)
-++++        ]
-++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++        weights_fp32 = routing_weights.to(mindspore.float32)
-++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++        return moe_output_fp32.squeeze(1).to(original_dtype)
-++++
-++++    @no_grad()
-++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++        num_tokens, _ = hidden_states.shape
-++++        flat_selected_experts = selected_experts.flatten()
-++++        sorted_expert_indices = flat_selected_experts.argsort()
-++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++        original_token_indices = sorted_expert_indices // self.top_k
-++++        moe_output = ops.zeros_like(hidden_states)
-++++        current_token_offset = 0
-++++        for i in range(self.num_experts):
-++++            expert_token_count = tokens_per_expert[i] - current_token_offset
-++++            if expert_token_count == 0:
-++++                continue
-++++            end_offset = current_token_offset + expert_token_count
-++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++            expert_hidden_states = hidden_states[expert_original_token_indices]
-++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++            current_token_offset += expert_token_count
-++++        return moe_output
-++++
-++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++++    @no_grad()
-++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++        moe_output = ops.zeros_like(hidden_states)
-++++        num_tokens, _ = hidden_states.shape
-++++        flat_selected_experts = selected_experts.flatten()
-++++        flat_routing_weights = routing_weights.flatten()
-++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++        active_experts = ops.unique(flat_selected_experts)
-++++        for expert_idx_tensor in active_experts:
-++++            expert_idx = expert_idx_tensor.item()
-++++            expert_layer = self.experts[expert_idx]
-++++            mask = (flat_selected_experts == expert_idx_tensor)
-++++            current_token_indices = token_indices[mask]
-++++            current_routing_weights = flat_routing_weights[mask]
-++++            current_hidden_states = hidden_states[current_token_indices]
-++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++        return moe_output
-+++ 
-+++-            final_hidden_states = final_hidden_states + shared_expert_output
-+++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++-            
-+++-            return final_hidden_states, router_logits
-++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++        global Long_Prompt
-++++
-++++        # 1. 门控计算 (所有模式通用)
-++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++        router_logits = self.gate(hidden_states_reshaped)
-++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++        if self.norm_topk_prob:
-++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++        
-++++        moe_output = None
-++++        if Long_Prompt:
-++++            # --- 精度优先模式 (ACCURACY MODE) ---
-++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        else:
-++++            # --- 速度优先模式 (SPEED MODE) ---
-++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++            if sequence_length == 1:
-++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++            else:
-++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        
-+++ 
-++++        # 3. 共享专家计算与合并 (所有模式通用)
-++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++        
-++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++        
-++++        return final_hidden_states, router_logits
-+++ 
-+++ class Qwen2MoeDecoderLayer(nn.Module):
-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+++         super().__init__()
-+++         self.hidden_size = config.hidden_size
-++++        
-++++        # if Long_Prompt:
-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++        # else:
-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++ 
-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++ 
-+++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++-
-+++         if (layer_idx not in config.mlp_only_layers) and (
-+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++         ):
-+++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++             self._warmed_up = True
-+++             self.warmup_moe_model()
-+++ 
-++++
-++++
-+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++         output_router_logits = (
-+++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-+++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++             router_logits=outputs.router_logits,
-+++         )
-+++ 
-++++    def generate(self, *args, **kwargs):
-++++        """
-++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++++        """
-++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++++
-++++        input_ids = kwargs.get("input_ids")
-++++        if input_ids is None and args:
-++++            input_ids = args[0]
-++++
-++++        if input_ids is not None:
-++++            prompt_length = input_ids.shape[1]
-++++            
-++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++++                Long_Prompt = True
-++++            else:
-++++                Long_Prompt = False
-++++
-++++        return super().generate(*args, **kwargs)
-++++    
-+++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-+++     def prepare_inputs_for_generation(
-+++         self,
-+++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-+++         # Exception 1: when passing input_embeds, input_ids may be missing entries
-+++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-++++        
-+++         if past_key_values is not None:
-+++             if inputs_embeds is not None:  # Exception 1
-+++                 if 0 not in input_ids.shape:
-+++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++             }
-+++         )
-+++         return model_inputs
-++++
-+++ # @lwx
-+++     # def _decode_one_tokens_logits(
-+++     #     self,
-+++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-+++             attentions=outputs.attentions,
-+++         )
-+++ 
-++++
-+++ __all__ = [
-+++     "Qwen2MoeForCausalLM",
-+++     "Qwen2MoeModel",
-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++new file mode 100644
-+++index 00000000..6dfb5b93
-+++--- /dev/null
-++++++ b/patches/0001-20251104commit.patch
-+++@@ -0,0 +1,1272 @@
-++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++Subject: [PATCH] 20251104commit
-++++
-++++---
-++++ mindnlp/transformers/cache_utils.py           |  28 +-
-++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++++
-++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++index cadd2e04..02f8d4be 100644
-++++--- a/mindnlp/transformers/cache_utils.py
-+++++++ b/mindnlp/transformers/cache_utils.py
-++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++             # k_out[:, :, cache_position] = key_states
-++++             # v_out[:, :, cache_position] = value_states
-++++-            if ON_ORANGE_PI:
-++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++-            else:
-++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++-
-+++++            # if ON_ORANGE_PI:
-+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++            # else:
-+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++++            if cache_position.ndim > 1:
-+++++                cache_position = cache_position.flatten()
-+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++++                cache_position = cache_position.int()
-+++++            
-+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++++            k_out[:, :, cache_position] = key_states
-+++++            v_out[:, :, cache_position] = value_states
-+++++            
-++++         return k_out, v_out
-++++ 
-++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index c695b944..d8303e45 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++-    x1 = x[..., : x.shape[-1] // 2]
-++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++         if self.training:
-++++             raise NotImplementedError("Training is not supported yet.")
-++++         else:
-++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++-        if self.config.n_shared_experts is not None:
-++++-            y = y + self.shared_experts(identity)
-++++-        return y
-+++++            # @lwx
-+++++            if orig_shape[1] == 1:
-+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++++                y=y.view(*orig_shape)
-+++++                if self.config.n_shared_experts is not None:
-+++++                    y = y + self.shared_experts(identity)
-+++++                return y
-+++++            else:
-+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++++                if self.config.n_shared_experts is not None:
-+++++                    y = y + self.shared_experts(identity)
-+++++                return y
-+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++        # if self.config.n_shared_experts is not None:
-+++++        #     y = y + self.shared_experts(identity)
-+++++        # return y
-+++++        
-+++++    @no_grad()
-+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++
-+++++        expert_cache = ops.zeros_like(x)
-+++++        for i in range(self.num_experts_per_tok):
-+++++            expert_id = flat_expert_indices[i].item()
-+++++            weight = flat_expert_weights[i].item()
-+++++            expert = self.experts[expert_id]
-+++++            expert_out = expert(x)
-+++++            expert_cache += expert_out * weight
-+++++        return expert_cache
-++++ 
-++++     @no_grad()
-++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-        # expert_cache = torch.zeros_like(x)
-++++-        # idxs = flat_expert_indices.argsort()
-++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++-        # token_idxs = idxs // self.num_experts_per_tok
-++++-        # for i, end_idx in enumerate(tokens_per_expert):
-++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++-        #     if start_idx == end_idx:
-++++-        #         continue
-++++-        #     expert = self.experts[i]
-++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++-        #     expert_tokens = x[exp_token_idx]
-++++-        #     expert_out = expert(expert_tokens)
-++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++-        # return expert_cache
-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++         expert_cache = ops.zeros_like(x)
-++++         idxs = flat_expert_indices.argsort()
-++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++         token_idxs = idxs // self.num_experts_per_tok
-+++++
-++++         for i, end_idx in enumerate(tokens_per_expert):
-++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++             if start_idx == end_idx:
-++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++             expert_out = expert(expert_tokens)
-++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++
-++++         return expert_cache
-+++++        
-+++++    # @no_grad()
-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++    #     # expert_cache = torch.zeros_like(x)
-+++++    #     # idxs = flat_expert_indices.argsort()
-+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++    #     #     if start_idx == end_idx:
-+++++    #     #         continue
-+++++    #     #     expert = self.experts[i]
-+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #     #     expert_tokens = x[exp_token_idx]
-+++++    #     #     expert_out = expert(expert_tokens)
-+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++    #     # return expert_cache
-+++++    #     expert_cache = ops.zeros_like(x)
-+++++    #     idxs = flat_expert_indices.argsort()
-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++    #         if start_idx == end_idx:
-+++++    #             continue
-+++++    #         expert = self.experts[i]
-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #         expert_tokens = x[exp_token_idx]
-+++++    #         expert_out = expert(expert_tokens)
-+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++
-+++++    #     return expert_cache
-+++++    # @no_grad()
-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++    #     expert_cache = ops.zeros_like(x)
-+++++
-+++++    #     # 排序保证顺序一致
-+++++    #     idxs = flat_expert_indices.argsort()
-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++    #     # 找出有 token 的专家
-+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++
-+++++    #     for i in active_experts.tolist():
-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++    #         end_idx = tokens_per_expert[i]
-+++++    #         if start_idx == end_idx:  # 没有 token
-+++++    #             continue
-+++++
-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #         expert_tokens = x[exp_token_idx]
-+++++    #         expert_out = self.experts[i](expert_tokens)
-+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++
-+++++    #         expert_cache = mindspore.mint.scatter_add(
-+++++    #             expert_cache,
-+++++    #             0,
-+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++    #             expert_out
-+++++    #         )
-+++++
-+++++    #     return expert_cache
-+++++
-+++++
-++++ 
-++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++ #     """
-++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++ 
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-+++++        self.warm_up = False
-+++++
-+++++    def warmup_moe_model_deep(self):
-+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++        test_texts = [
-+++++            "warmup short",
-+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++++        ]
-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++        if tokenizer is None:
-+++++            from mindnlp.transformers import AutoTokenizer
-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++            self._warmup_tokenizer = tokenizer
-+++++
-+++++        for text in test_texts:
-+++++            inputs = tokenizer(text, return_tensors="ms")
-+++++            with mindspore._no_grad():
-+++++                _ = self(**inputs, use_cache=False)
-+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++ 
-++++     def get_input_embeddings(self):
-++++         return self.model.embed_tokens
-++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++         ```"""
-+++++        if not self.warm_up:
-+++++            self.warm_up = True
-+++++            self.warmup_moe_model_deep()
-+++++
-++++         output_attentions = (
-++++             output_attentions
-++++             if output_attentions is not None
-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++index 3cbf820e..d4c6b651 100644
-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++@@ -18,7 +18,6 @@
-++++ # See the License for the specific language governing permissions and
-++++ # limitations under the License.
-++++ """MindSpore Qwen2MoE model."""
-++++-
-++++ import math
-++++ from typing import List, Optional, Tuple, Union
-++++ 
-++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++     TokenClassifierOutput,
-++++ )
-++++ from ...modeling_utils import PreTrainedModel
-+++++from ...generation import GenerationMixin
-++++ from ....utils import logging
-++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++++ 
-++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++         self.variance_epsilon = eps
-++++ 
-++++     def forward(self, hidden_states):
-+++++        # @dwj
-+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++        # @lwx
-+++++        # if not self.training :
-+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++         input_dtype = hidden_states.dtype
-++++         hidden_states = hidden_states.to(mindspore.float32)
-++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++@@ -234,6 +239,8 @@ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++     x1 = x[..., : x.shape[-1] // 2]
-++++     x2 = x[..., x.shape[-1] // 2 :]
-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++         self.config = config
-++++         self.hidden_size = config.hidden_size
-++++         self.intermediate_size = intermediate_size
-+++++        
-++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++         self.act_fn = ACT2FN[config.hidden_act]
-++++ 
-++++     def forward(self, x):
-++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++-
-++++ 
-+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++        # @lwx 
-+++++        # gate_up_output = self.gate_up_proj(x)
-+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++++        # return self.down_proj(swiglu_output)
-+++++
-+++++    # def forward(self, x):
-+++++    #     gate_proj_out = self.gate_proj(x)
-+++++    #     up_proj_out = self.up_proj(x)
-+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++++    #     return self.down_proj(swiglu_out)
-+++++        
-++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++     """
-++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++         use_cache: bool = False,
-++++         cache_position: Optional[mindspore.Tensor] = None,
-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++        
-+++++
-++++         bsz, q_len, _ = hidden_states.shape
-++++ 
-++++         query_states = self.q_proj(hidden_states)
-++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++                     "with a layer index."
-++++                 )
-++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++            if isinstance(past_key_value, StaticCache):
-+++++                kv_seq_len = key_states.shape[-2]
-+++++            else:
-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++ 
-++++         if past_key_value is not None:
-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++            
-+++++            if isinstance(past_key_value, StaticCache):
-+++++                kv_seq_len = key_states.shape[-2]
-++++ 
-++++         # repeat k/v heads if n_kv_heads < n_heads
-++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++-
-+++++        
-++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++ 
-++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++-            raise ValueError(
-++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++-                f" {attn_weights.shape}"
-++++-            )
-++++-
-++++-        if attention_mask is not None:  # no matter the length, we just slice it
-++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++++        if attention_mask is not None:
-+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++             attn_weights = attn_weights + causal_mask
-++++ 
-++++         # upcast attention to fp32
-++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++ 
-++++         attn_output = self.o_proj(attn_output)
-++++-
-+++++        # @lwx
-+++++        
-+++++        # max_seq_len = self.max_position_embeddings  # 2048
-+++++
-+++++        # if attention_mask is not None:
-+++++        #     # attention_mask: [B, 1, Sq, Sk]
-+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++
-+++++        #     # pad 到 [max_seq_len, max_seq_len]
-+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++        #     global_attention_mask = padded_mask
-+++++        # else:
-+++++        #     global_attention_mask = None
-+++++
-+++++
-+++++        # sparse_mode=3
-+++++        # attn_output = mindspore.ops.flash_attention_score(
-+++++        #     query=query_states,
-+++++        #     key=key_states,
-+++++        #     value=value_states,
-+++++        #     real_shift=None,
-+++++        #     padding_mask=None,
-+++++
-+++++        #     head_num=self.num_heads,
-+++++        #     attn_mask=global_attention_mask,
-+++++        #     keep_prob=1.0 - self.attention_dropout,
-+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++        #     input_layout="BNSD",
-+++++        #     pre_tokens=2147483647,
-+++++        #     next_tokens=2147483647,
-+++++        #     inner_precise=0,
-+++++        #     drop_mask=None,
-+++++        #     prefix=None,
-+++++        #     actual_seq_qlen=None,
-+++++        #     actual_seq_kvlen=None,
-+++++        #     sparse_mode=sparse_mode,
-+++++        # )
-++++         if not output_attentions:
-++++             attn_weights = None
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++ 
-+++++class Qwen2MoeFlashAttention(nn.Module):
-+++++    """
-+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++
-+++++    关键改动:
-+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++       直接传入原始的 key 和 value 张量效率更高。
-+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++    """
-+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++        super().__init__()
-+++++        self.config = config
-+++++        self.layer_idx = layer_idx
-+++++        self.hidden_size = config.hidden_size
-+++++        self.num_heads = config.num_attention_heads
-+++++        self.head_dim = self.hidden_size // self.num_heads
-+++++        self.num_key_value_heads = config.num_key_value_heads
-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++        self.max_position_embeddings = config.max_position_embeddings
-+++++        self.rope_theta = config.rope_theta
-+++++        self.attention_dropout = config.attention_dropout
-+++++
-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++            raise ValueError(
-+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++            )
-+++++
-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++
-+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++            self.head_dim,
-+++++            max_position_embeddings=self.max_position_embeddings,
-+++++            base=self.rope_theta,
-+++++        )
-+++++
-+++++    def forward(
-+++++        self,
-+++++        hidden_states: mindspore.Tensor,
-+++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++        past_key_value: Optional[Cache] = None,
-+++++        output_attentions: bool = False,
-+++++        use_cache: bool = False,
-+++++        cache_position: Optional[mindspore.Tensor] = None,
-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++        bsz, q_len, _ = hidden_states.shape
-+++++
-+++++        # 1. 线性投射 Q, K, V
-+++++        query_states = self.q_proj(hidden_states)
-+++++        key_states = self.k_proj(hidden_states)
-+++++        value_states = self.v_proj(hidden_states)
-+++++
-+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++        # 3. RoPE 旋转位置编码
-+++++        kv_seq_len = key_states.shape[-2]
-+++++        if past_key_value is not None:
-+++++            if self.layer_idx is None:
-+++++                raise ValueError(
-+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++                    "with a layer index."
-+++++                )
-+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++                if cache_position.shape[0] == 1:
-+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++                    kv_seq_len = past_seen_tokens + 1
-+++++                else:
-+++++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++            else:
-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++        # 4. KV 缓存更新
-+++++        if past_key_value is not None:
-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++            key_states, value_states = past_key_value.update(
-+++++                key_states, value_states, self.layer_idx, cache_kwargs
-+++++            )
-+++++            
-+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++                if cache_position.shape[0] == 1:
-+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++                    kv_seq_len = key_states.shape[-2]
-+++++
-+++++        # 5. [重要] 准备 Attention Mask
-+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++        fa_attention_mask = None
-+++++        if attention_mask is not None:
-+++++            # 截取与当前key长度匹配的部分
-+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++            fa_attention_mask = (mask_slice != 0)
-+++++
-+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++        input_dtype = query_states.dtype
-+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++            query_states = query_states.to(mindspore.float16)
-+++++            key_states = key_states.to(mindspore.float16)
-+++++            value_states = value_states.to(mindspore.float16)
-+++++
-+++++        # 6. [核心] 调用 flash_attention_score 算子
-+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++        attn_output = mindspore.ops.flash_attention_score(
-+++++            query=query_states,
-+++++            key=key_states,
-+++++            value=value_states,
-+++++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++            attn_mask=fa_attention_mask,
-+++++            keep_prob=1.0 - self.attention_dropout,
-+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++            input_layout="BNSD",
-+++++            sparse_mode=0 # 使用 defaultMask 模式
-+++++        )
-+++++
-+++++        # 恢复原始数据类型
-+++++        attn_output = attn_output.to(input_dtype)
-+++++
-+++++        # 7. 调整输出形状
-+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++        attn_output = self.o_proj(attn_output)
-+++++
-+++++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++        attn_weights = None
-+++++        if output_attentions:
-+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++        return attn_output, attn_weights, past_key_value
-+++++
-+++++    # def forward(
-+++++    #     self,
-+++++    #     hidden_states: mindspore.Tensor,
-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++    #     past_key_value: Optional[Cache] = None,
-+++++    #     output_attentions: bool = False,
-+++++    #     use_cache: bool = False,
-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++    #     bsz, q_len, _ = hidden_states.shape
-+++++
-+++++    #     # 1. 线性投射 Q, K, V
-+++++    #     query_states = self.q_proj(hidden_states)
-+++++    #     key_states = self.k_proj(hidden_states)
-+++++    #     value_states = self.v_proj(hidden_states)
-+++++
-+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++    #     # 3. RoPE 旋转位置编码
-+++++    #     kv_seq_len = key_states.shape[-2]
-+++++    #     if past_key_value is not None:
-+++++    #         if self.layer_idx is None:
-+++++    #             raise ValueError(
-+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++    #                 "with a layer index."
-+++++    #             )
-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++    #     # 4. KV 缓存更新
-+++++    #     if past_key_value is not None:
-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++    #         key_states, value_states = past_key_value.update(
-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++    #         )
-+++++
-+++++    #     # 5. 准备 Attention Mask
-+++++    #     fa_attention_mask = None
-+++++    #     if attention_mask is not None:
-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++    #         fa_attention_mask = (mask_slice != 0)
-+++++
-+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++    #     input_dtype = query_states.dtype
-+++++
-+++++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++    #         query=query_states,
-+++++    #         key=key_states,
-+++++    #         value=value_states,
-+++++    #         head_num=self.num_heads,
-+++++    #         attn_mask=fa_attention_mask,
-+++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++    #         input_layout="BNSD",
-+++++    #         sparse_mode=0,
-+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++    #         inner_precise=1
-+++++    #     )
-+++++
-+++++    #     # 恢复原始数据类型
-+++++    #     attn_output = attn_output.to(input_dtype)
-+++++
-+++++    #     # 7. 调整输出形状
-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++    #     attn_output = self.o_proj(attn_output)
-+++++
-+++++    #     attn_weights = None
-+++++    #     if output_attentions:
-+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++    #     return attn_output, attn_weights, past_key_value
-+++++
-+++++    # def forward(
-+++++    #     self,
-+++++    #     hidden_states: mindspore.Tensor,
-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++    #     past_key_value: Optional[Cache] = None,
-+++++    #     output_attentions: bool = False,
-+++++    #     use_cache: bool = False,
-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++    #     bsz, q_len, _ = hidden_states.shape
-+++++
-+++++    #     query_states = self.q_proj(hidden_states)
-+++++    #     key_states = self.k_proj(hidden_states)
-+++++    #     value_states = self.v_proj(hidden_states)
-+++++
-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++    #     kv_seq_len = key_states.shape[-2]
-+++++    #     if past_key_value is not None:
-+++++    #         if self.layer_idx is None:
-+++++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++    #     if past_key_value is not None:
-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++    #         key_states, value_states = past_key_value.update(
-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++    #         )
-+++++
-+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++
-+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++    #     # <--- 修改结束 ---
-+++++
-+++++    #     fa_attention_mask = None
-+++++    #     if attention_mask is not None:
-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++    #         fa_attention_mask = (mask_slice != 0)
-+++++
-+++++    #     input_dtype = query_states.dtype
-+++++
-+++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++    #         key=key_states,
-+++++    #         value=value_states,
-+++++    #         head_num=self.num_heads,
-+++++    #         attn_mask=fa_attention_mask,
-+++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++    #         input_layout="BNSD",
-+++++    #         sparse_mode=0,
-+++++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++    #     )
-+++++
-+++++    #     attn_output = attn_output.to(input_dtype)
-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++    #     attn_output = self.o_proj(attn_output)
-+++++
-+++++    #     attn_weights = None
-+++++    #     if output_attentions:
-+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++
-+++++    #     return attn_output, attn_weights, past_key_value
-+++++
-++++ QWEN2MOE_ATTENTION_CLASSES = {
-++++     "eager": Qwen2MoeAttention,
-+++++    "flash-attention": Qwen2MoeFlashAttention,
-++++ }
-++++ 
-++++ 
-++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++ 
-+++++    #@dwj
-+++++    # 只遍历激活的专家，而非全部专家
-++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++++-        # router_logits: (batch * sequence_length, n_experts)
-++++-        router_logits = self.gate(hidden_states)
-++++-
-++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-        if self.norm_topk_prob:
-++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        # we cast back to the input dtype
-++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++++-
-++++-        final_hidden_states = ops.zeros(
-++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++-        )
-++++-
-++++-        # One hot encode the selected experts to create an expert mask
-++++-        # this will be used to easily index which expert is going to be sollicitated
-++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++-
-++++-        # Loop over all available experts in the model and perform the computation on each expert
-++++-        for expert_idx in range(self.num_experts):
-++++-            expert_layer = self.experts[expert_idx]
-++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++-
-++++-            # Index the correct hidden states and compute the expert hidden state for
-++++-            # the current expert. We need to make sure to multiply the output hidden
-++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++-            if 0 not in idx.shape:
-++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++-
-++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++++-                # the `top_x` tensor here.
-++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++-
-++++-        shared_expert_output = self.shared_expert(hidden_states)
-++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++-
-++++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++            num_tokens = hidden_states_reshaped.shape[0]
-+++++
-+++++            router_logits = self.gate(hidden_states_reshaped)
-+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++
-+++++            if self.norm_topk_prob:
-+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++            
-+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++            flat_selected_experts = selected_experts.flatten()
-+++++            
-+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++            token_indices = broadcasted_token_indices.flatten()
-+++++            
-+++++            active_experts = ops.unique(flat_selected_experts)
-+++++            
-+++++            for expert_idx_tensor in active_experts:
-+++++                expert_idx = expert_idx_tensor.item()
-+++++                expert_layer = self.experts[expert_idx]
-+++++                
-+++++                mask = (flat_selected_experts == expert_idx_tensor)
-+++++                selected_token_indices = token_indices[mask]
-+++++                selected_routing_weights = routing_weights.flatten()[mask]
-+++++                
-+++++                current_states = hidden_states_reshaped[selected_token_indices]
-+++++                
-+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++                
-+++++                final_hidden_states = final_hidden_states.index_add(
-+++++                    dim=0,
-+++++                    index=selected_token_indices,
-+++++                    source=expert_output.to(hidden_states.dtype)
-+++++                )
-+++++            
-+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++ 
-++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++-        return final_hidden_states, router_logits
-+++++            final_hidden_states = final_hidden_states + shared_expert_output
-+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++            
-+++++            return final_hidden_states, router_logits
-++++ 
-++++ 
-++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++ 
-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++ 
-+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++
-++++         if (layer_idx not in config.mlp_only_layers) and (
-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++         ):
-++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++     _skip_keys_device_placement = "past_key_values"
-++++     _supports_cache_class = True
-+++++#lwx
-+++++    # _supports_static_cache = True
-++++ 
-++++     def _init_weights(self, module):
-++++         std = self.config.initializer_range
-++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++         return causal_mask
-++++ 
-++++ 
-++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++     _tied_weights_keys = ["lm_head.weight"]
-++++ 
-++++     def __init__(self, config):
-++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++         self.num_experts_per_tok = config.num_experts_per_tok
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-+++++        # @lwx
-+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++++        #     self.generation_config.cache_implementation = "static"
-+++++        self._warmed_up = False
-+++++
-+++++    def warmup_moe_model(self):
-+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++++        test_texts = [
-+++++            "warmup short",
-+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++++        ]
-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++        if tokenizer is None:
-+++++            from mindnlp.transformers import AutoTokenizer
-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++            self._warmup_tokenizer = tokenizer
-+++++
-+++++        for text in test_texts:
-+++++            inputs = tokenizer(text, return_tensors="ms")
-+++++            with mindspore._no_grad():
-+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++ 
-++++     def get_input_embeddings(self):
-++++         return self.model.embed_tokens
-++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++         ```"""
-+++++        if not self._warmed_up:
-+++++            self._warmed_up = True
-+++++            self.warmup_moe_model()
-++++ 
-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++         output_router_logits = (
-++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++             }
-++++         )
-++++         return model_inputs
-+++++# @lwx
-+++++    # def _decode_one_tokens_logits(
-+++++    #     self,
-+++++    #     cur_token: mindspore.Tensor,
-+++++    #     input_pos: Optional[mindspore.Tensor],
-+++++    #     cache_position: mindspore.Tensor,
-+++++    #     past_key_values: StaticCache,
-+++++    # ) -> mindspore.Tensor:
-+++++    #     """
-+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++++        
-+++++    #     Args:
-+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++++    #         input_pos: 输入位置信息，可选
-+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++++            
-+++++    #     Returns:
-+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++++    #     """
-+++++    #     # 调用JIT编译的版本
-+++++    #     return self.get_decode_one_tokens_logits(
-+++++    #         cur_token=cur_token,
-+++++    #         input_pos=input_pos,
-+++++    #         cache_position=cache_position,
-+++++    #         past_key_values=past_key_values,
-+++++    #     )
-+++++    
-+++++    # @mindspore.jit(jit_level='O1')
-+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++++    #     """
-+++++    #     JIT编译的函数，用于高效的单token解码
-+++++    #     使用JIT编译优化以支持静态shape和高效执行
-+++++        
-+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++++    #     """
-+++++    #     outputs = self.model.forward(
-+++++    #         input_ids=cur_token,
-+++++    #         position_ids=input_pos,
-+++++    #         cache_position=cache_position,
-+++++    #         past_key_values=past_key_values,
-+++++    #         use_cache=True,
-+++++    #         return_dict=False,
-+++++    #     )
-+++++        
-+++++    #     hidden_states = outputs[0]
-+++++    #     logits = self.lm_head.forward(hidden_states)
-+++++    #     logits = logits.float()
-+++++        
-+++++    #     return logits[:, -1, :]
-+++++
-+++++    # def _sample(
-+++++    #     self,
-+++++    #     input_ids: mindspore.Tensor,
-+++++    #     logits_processor,
-+++++    #     stopping_criteria,
-+++++    #     generation_config,
-+++++    #     synced_devices: bool,
-+++++    #     streamer=None,
-+++++    #     logits_warper=None,
-+++++    #     **model_kwargs,
-+++++    # ):
-+++++    #     """
-+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++++    #     """
-+++++    #     from ...generation.logits_process import LogitsProcessorList
-+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++++    #     from mindnlp.core import nn, ops, no_grad
-+++++    #     import numpy as np
-+++++        
-+++++    #     # 检查是否使用 StaticCache
-+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++++    #     # 否则，直接调用父类方法
-+++++    #     past_key_values = model_kwargs.get("past_key_values")
-+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++++
-+++++    #     if not isinstance(past_key_values, StaticCache):
-+++++    #         # 不使用 StaticCache，直接调用父类方法
-+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++++    #         return super()._sample(
-+++++    #             input_ids=input_ids,
-+++++    #             logits_processor=logits_processor,
-+++++    #             stopping_criteria=stopping_criteria,
-+++++    #             generation_config=generation_config,
-+++++    #             synced_devices=synced_devices,
-+++++    #             streamer=streamer,
-+++++    #             logits_warper=logits_warper,
-+++++    #             **model_kwargs,
-+++++    #         )
-+++++        
-+++++    #     # 使用 StaticCache，进入自定义循环
-+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++++    #     pad_token_id = generation_config._pad_token_tensor
-+++++    #     output_attentions = generation_config.output_attentions
-+++++    #     output_hidden_states = generation_config.output_hidden_states
-+++++    #     output_scores = generation_config.output_scores
-+++++    #     output_logits = generation_config.output_logits
-+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++++    #     max_length = generation_config.max_length
-+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++++    #     do_sample = generation_config.do_sample
-+++++        
-+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++++    #         raise ValueError(
-+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++++    #             f"{logits_warper})."
-+++++    #         )
-+++++        
-+++++    #     # init attention / hidden states / scores tuples
-+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++++        
-+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++++    #         encoder_hidden_states = (
-+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++++    #         )
-+++++        
-+++++    #     # keep track of which sequences are already finished
-+++++    #     batch_size, cur_len = input_ids.shape
-+++++    #     this_peer_finished = False
-+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++++        
-+++++    #     time_record = []
-+++++    #     from ....utils.testing_utils import parse_flag_from_env
-+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++++        
-+++++    #     while self._has_unfinished_sequences(
-+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++++    #     ):
-+++++    #         if _record_time:
-+++++    #             import time as time_module
-+++++    #             infer_start = time_module.time()
-+++++            
-+++++    #         # prepare model inputs
-+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++++            
-+++++    #         # prepare variable output controls
-+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++++            
-+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++++    #         cur_cache_position = model_inputs.get("cache_position")
-+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++++    #         cur_input_ids = model_inputs.get("input_ids")
-+++++            
-+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++++    #             cur_cache_position is not None and 
-+++++    #             len(cur_cache_position.shape) > 0 and
-+++++    #             cur_cache_position.shape[0] == 1 and
-+++++    #             cur_input_ids is not None and
-+++++    #             cur_input_ids.shape[1] == 1):
-+++++    #             # 使用 JIT 优化的单 token 解码
-+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++++    #             if not hasattr(self, '_jit_used'):
-+++++    #                 self._jit_used = False
-+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++++                
-+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++++    #                 cur_token=cur_input_ids,
-+++++    #                 input_pos=model_inputs.get("position_ids"),
-+++++    #                 cache_position=cur_cache_position,
-+++++    #                 past_key_values=cur_past_key_values,
-+++++    #             )
-+++++                
-+++++    #             # 标记已使用JIT（用于后续判断）
-+++++    #             if not self._jit_used:
-+++++    #                 self._jit_used = True
-+++++                
-+++++    #             # 构造兼容的输出对象
-+++++    #             class JitOptimizedOutput:
-+++++    #                 def __init__(self, logits, config):
-+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++++    #                     self.config = config
-+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++++    #                     self.cross_attentions = None
-+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++++                
-+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++++    #         else:
-+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++++    #             outputs = self(**model_inputs, return_dict=True)
-+++++            
-+++++    #         if synced_devices and this_peer_finished:
-+++++    #             continue
-+++++            
-+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++++    #         next_token_logits = outputs.logits[:, -1, :]
-+++++            
-+++++    #         # pre-process distribution
-+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++++    #         if do_sample:
-+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++++            
-+++++    #         # Store scores, attentions and hidden_states when required
-+++++    #         if return_dict_in_generate:
-+++++    #             if output_scores:
-+++++    #                 scores += (next_token_scores,)
-+++++    #             if output_logits:
-+++++    #                 raw_logits += (next_token_logits,)
-+++++    #             if output_attentions:
-+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++++    #                 if self.config.is_encoder_decoder:
-+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++++                
-+++++    #             if output_hidden_states:
-+++++    #                 hidden = (
-+++++    #                     outputs.decoder_hidden_states
-+++++    #                     if self.config.is_encoder_decoder
-+++++    #                     else outputs.hidden_states
-+++++    #                 )
-+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++++            
-+++++    #         # token selection
-+++++    #         if do_sample:
-+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++++    #         else:
-+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++++            
-+++++    #         # finished sentences should have their next token be a padding token
-+++++    #         if has_eos_stopping_criteria:
-+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++++            
-+++++    #         # update generated ids, model inputs, and length for next step
-+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++++    #         if streamer is not None:
-+++++    #             streamer.put(next_tokens)
-+++++            
-+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++++    #             outputs,
-+++++    #             model_kwargs,
-+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++++    #         )
-+++++            
-+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++++    #         cur_len += 1
-+++++            
-+++++    #         if _record_time:
-+++++    #             import time as time_module
-+++++    #             infer_stop = time_module.time()
-+++++    #             time_record.append(infer_stop - infer_start)
-+++++            
-+++++    #         del outputs
-+++++        
-+++++    #     average_infer_time = None
-+++++    #     if time_record:
-+++++    #         if len(time_record) > 1:
-+++++    #             time_record.pop(0)
-+++++    #         average_infer_time = sum(time_record) / len(time_record)
-+++++    #         print(f'average inference time is: {average_infer_time}')
-+++++    #         print(f'inference time record: {time_record}')
-+++++        
-+++++    #     if streamer is not None:
-+++++    #         streamer.end()
-+++++        
-+++++    #     # 简单判断：打印是否使用了JIT路径
-+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++++    #     else:
-+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++++        
-+++++    #     if return_dict_in_generate:
-+++++    #         if self.config.is_encoder_decoder:
-+++++    #             return GenerateEncoderDecoderOutput(
-+++++    #                 sequences=input_ids,
-+++++    #                 scores=scores,
-+++++    #                 logits=raw_logits,
-+++++    #                 encoder_attentions=encoder_attentions,
-+++++    #                 encoder_hidden_states=encoder_hidden_states,
-+++++    #                 decoder_attentions=decoder_attentions,
-+++++    #                 cross_attentions=cross_attentions,
-+++++    #                 decoder_hidden_states=decoder_hidden_states,
-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++    #                 average_infer_time=average_infer_time
-+++++    #             )
-+++++    #         else:
-+++++    #             return GenerateDecoderOnlyOutput(
-+++++    #                 sequences=input_ids,
-+++++    #                 scores=scores,
-+++++    #                 logits=raw_logits,
-+++++    #                 attentions=decoder_attentions,
-+++++    #                 hidden_states=decoder_hidden_states,
-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++    #                 average_infer_time=average_infer_time
-+++++    #             )
-+++++    #     else:
-+++++    #         return input_ids
-+++++            
-+++++    # def _prepare_cache_for_generation(
-+++++    #     self,
-+++++    #     generation_config,
-+++++    #     model_kwargs,
-+++++    #     assistant_model,
-+++++    #     batch_size,
-+++++    #     max_cache_length,
-+++++    # ):
-+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++++    #         generation_config.cache_implementation = "static"
-+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++++        
-+++++    #     if generation_config.cache_implementation == "static":
-+++++    #         base_required_from_max_length = generation_config.max_length + 1
-+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++++    #         min_cache_size = 50
-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++++    #         else:
-+++++    #             max_cache_length = max(base_required, min_cache_size)
-+++++            
-+++++    #         original_max_cache_length = max_cache_length
-+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++++            
-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++    #             if max_cache_length > self.config.max_position_embeddings:
-+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++++        
-+++++    #     result = super()._prepare_cache_for_generation(
-+++++    #         generation_config=generation_config,
-+++++    #         model_kwargs=model_kwargs,
-+++++    #         assistant_model=assistant_model,
-+++++    #         batch_size=batch_size,
-+++++    #         max_cache_length=max_cache_length,
-+++++    #     )
-+++++        
-+++++    #     if generation_config.cache_implementation == "static":
-+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++++    #         created_cache = model_kwargs.get(cache_name)
-+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++++        
-+++++    #     return result
-+++++
-+++++
-+++++
-++++ 
-++++ 
-++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++-- 
-++++2.27.0
-++++
-+++-- 
-+++2.27.0
-+++
-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++new file mode 100644
-++index 00000000..966529e4
-++--- /dev/null
-+++++ b/patches/0003-20261106secondcommit.patch
-++@@ -0,0 +1,2769 @@
-+++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Thu, 6 Nov 2025 14:54:37 +0800
-+++Subject: [PATCH 3/3] 20261106secondcommit
-+++
-+++---
-+++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-+++ patches/0001-20251104commit.patch             | 1272 -----------------
-+++ 3 files changed, 528 insertions(+), 2032 deletions(-)
-+++ delete mode 100644 patches/0001-20251104commit.patch
-+++
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index 73773c22..2f9192bf 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-+++ 
-+++ _CONFIG_FOR_DOC = "DeepseekConfig"
-+++ 
-++++_attn_mask_cache = {}
-++++
-++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-++++    q_len = batch_and_seq[1]
-++++    kv_len = batch_and_seq[1] + past_key_values_length 
-++++    key = (batch_and_seq[0], q_len, kv_len)
-++++
-++++    if key in _attn_mask_cache:
-++++        return _attn_mask_cache[key]
-++++
-++++    mask = _prepare_4d_causal_attention_mask(
-++++        attention_mask,
-++++        batch_and_seq,
-++++        inputs_embeds,
-++++        past_key_values_length,
-++++    )
-++++    _attn_mask_cache[key] = mask
-++++    return mask
-+++ 
-+++ def _get_unpad_data(attention_mask):
-+++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-+++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-+++         return final_output
-+++ 
-+++ 
-+++-    @no_grad()
-+++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++-        expert_cache = ops.zeros_like(x)
-+++-        idxs = flat_expert_indices.argsort()
-+++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++-        token_idxs = idxs // self.num_experts_per_tok
-+++-
-+++-        for i, end_idx in enumerate(tokens_per_expert):
-+++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++-            if start_idx == end_idx:
-+++-                continue
-+++-            expert = self.experts[i]
-+++-            exp_token_idx = token_idxs[start_idx:end_idx]
-+++-            expert_tokens = x[exp_token_idx]
-+++-            expert_out = expert(expert_tokens)
-+++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++-
-+++-        return expert_cache
-+++-        
-+++     # @no_grad()
-+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-    #     # expert_cache = torch.zeros_like(x)
-+++-    #     # idxs = flat_expert_indices.argsort()
-+++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++-    #     # token_idxs = idxs // self.num_experts_per_tok
-+++-    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++-    #     #     if start_idx == end_idx:
-+++-    #     #         continue
-+++-    #     #     expert = self.experts[i]
-+++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++-    #     #     expert_tokens = x[exp_token_idx]
-+++-    #     #     expert_out = expert(expert_tokens)
-+++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++-    #     # return expert_cache
-++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++     #     expert_cache = ops.zeros_like(x)
-+++     #     idxs = flat_expert_indices.argsort()
-+++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-+++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++ 
-+++     #     return expert_cache
-+++-    # @no_grad()
-+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-    #     expert_cache = ops.zeros_like(x)
-++++        
-++++    @no_grad()
-++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++        """
-++++        优化版 MoE prefill：
-++++        - 批量张量化处理同一个 expert 的所有 token
-++++        - 跳过无 token 的专家
-++++        - 保持结果完全一致
-++++        """
-++++        # 初始化输出缓存
-++++        expert_cache = ops.zeros_like(x)
-+++ 
-+++-    #     # 排序保证顺序一致
-+++-    #     idxs = flat_expert_indices.argsort()
-+++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++-    #     token_idxs = idxs // self.num_experts_per_tok
-++++        # 排序（确保 scatter_add 位置对应原逻辑）
-++++        idxs = flat_expert_indices.argsort()
-++++        sorted_expert_indices = flat_expert_indices[idxs]
-++++        sorted_token_indices = idxs // self.num_experts_per_tok
-+++ 
-+++-    #     # 找出有 token 的专家
-+++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++        # 每个 expert 的 token 数
-++++        tokens_per_expert = sorted_expert_indices.bincount()
-+++ 
-+++-    #     for i in active_experts.tolist():
-+++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++-    #         end_idx = tokens_per_expert[i]
-+++-    #         if start_idx == end_idx:  # 没有 token
-+++-    #             continue
-++++        # 找出有 token 的专家
-++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+++ 
-+++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++-    #         expert_tokens = x[exp_token_idx]
-+++-    #         expert_out = self.experts[i](expert_tokens)
-+++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++        for expert_id in active_experts.tolist():
-++++            # 取该 expert 对应的排序后 token 区间
-++++            start = (tokens_per_expert[:expert_id]).sum().item()
-++++            end = start + tokens_per_expert[expert_id].item()
-+++ 
-+++-    #         expert_cache = mindspore.mint.scatter_add(
-+++-    #             expert_cache,
-+++-    #             0,
-+++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++-    #             expert_out
-+++-    #         )
-++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-++++            expert_tokens = x[token_idx]                     # 取输入向量
-+++ 
-+++-    #     return expert_cache
-++++            # 执行专家 MLP
-++++            expert_out = self.experts[expert_id](expert_tokens)
-++++
-++++            # 按权重缩放
-++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-++++
-++++            # 回写到缓存（等价 scatter_add）
-++++            expert_cache = mindspore.mint.scatter_add(
-++++                expert_cache,
-++++                0,
-++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++                scaled_out
-++++            )
-++++
-++++        return expert_cache
-++++
-++++        # @no_grad()
-++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++        #     # expert_cache = torch.zeros_like(x)
-++++        #     # idxs = flat_expert_indices.argsort()
-++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++        #     # token_idxs = idxs // self.num_experts_per_tok
-++++        #     # for i, end_idx in enumerate(tokens_per_expert):
-++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++        #     #     if start_idx == end_idx:
-++++        #     #         continue
-++++        #     #     expert = self.experts[i]
-++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++        #     #     expert_tokens = x[exp_token_idx]
-++++        #     #     expert_out = expert(expert_tokens)
-++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++        #     # return expert_cache
-++++        #     expert_cache = ops.zeros_like(x)
-++++        #     idxs = flat_expert_indices.argsort()
-++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++        #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++        #     for i, end_idx in enumerate(tokens_per_expert):
-++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++        #         if start_idx == end_idx:
-++++        #             continue
-++++        #         expert = self.experts[i]
-++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++        #         expert_tokens = x[exp_token_idx]
-++++        #         expert_out = expert(expert_tokens)
-++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++
-++++        #     return expert_cache
-++++        # @no_grad()
-++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++        #     expert_cache = ops.zeros_like(x)
-++++
-++++        #     # 排序保证顺序一致
-++++        #     idxs = flat_expert_indices.argsort()
-++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++        #     token_idxs = idxs // self.num_experts_per_tok
-++++
-++++        #     # 找出有 token 的专家
-++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++
-++++        #     for i in active_experts.tolist():
-++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++        #         end_idx = tokens_per_expert[i]
-++++        #         if start_idx == end_idx:  # 没有 token
-++++        #             continue
-++++
-++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++        #         expert_tokens = x[exp_token_idx]
-++++        #         expert_out = self.experts[i](expert_tokens)
-++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++
-++++        #         expert_cache = mindspore.mint.scatter_add(
-++++        #             expert_cache,
-++++        #             0,
-++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++        #             expert_out
-++++        #         )
-++++
-++++        #     return expert_cache
-+++ 
-+++ 
-+++ 
-+++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++-
-+++ # class DeepseekFlashAttention(nn.Module):
-+++ #     """
-+++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-++++
-+++ Deepseek_ATTENTION_CLASSES = {
-+++     "eager": DeepseekAttention,
-+++     "flash-attention": DeepseekFlashAttention,
-+++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-+++             )
-+++         else:
-+++             # 4d mask is passed through the layers
-+++-            attention_mask = _prepare_4d_causal_attention_mask(
-++++            # attention_mask = _prepare_4d_causal_attention_mask(
-++++            #     attention_mask,
-++++            #     (batch_size, seq_length),
-++++            #     inputs_embeds,
-++++            #     past_key_values_length,
-++++            # )
-++++            #@dwj
-++++            attention_mask = get_cached_causal_mask(
-+++                 attention_mask,
-+++                 (batch_size, seq_length),
-+++                 inputs_embeds,
-+++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-+++         self.warm_up = False
-++++        #@dwj
-++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-++++            self.num_layers,
-++++            self.num_attention_heads,
-++++            self.head_dim,
-++++            batch_size=1,
-++++            max_length=self.max_length,
-++++            dtype=mindspore.float16
-++++        )
-++++
-++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-++++        key_cache = []
-++++        value_cache = []
-++++        for _ in range(num_layers):
-++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++            key_cache.append(k)
-++++            value_cache.append(v)
-++++        return key_cache, value_cache
-++++
-+++ 
-+++     def warmup_moe_model_deep(self):
-+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++index bced285c..ebd7782e 100644
-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+++ 
-+++-Long_Prompt = False
-+++-PROMPT_LENGTH_THRESHOLD = 128
-++++Long_Prompt = 1
-++++LONG_PROMPT_LENGTH_THRESHOLD = 128
-++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-++++
-++++_causal_mask_cache = {}
-++++
-++++def get_cached_causal_mask_with_cache_position(
-++++    attention_mask: mindspore.Tensor,
-++++    sequence_length: int,
-++++    target_length: int,
-++++    dtype: mindspore.dtype,
-++++    min_dtype: float,
-++++    cache_position: mindspore.Tensor,
-++++    batch_size: int,
-++++):
-++++    """
-++++    带缓存的 causal mask 构造函数
-++++    """
-++++    # q_len 是当前 query 长度
-++++    q_len = sequence_length
-++++    # kv_len 是 target_length
-++++    kv_len = target_length
-++++
-++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-++++
-++++    if key in _causal_mask_cache:
-++++        return _causal_mask_cache[key]
-++++
-++++    # 调用原来的 mask 构造逻辑
-++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++        attention_mask,
-++++        sequence_length=sequence_length,
-++++        target_length=target_length,
-++++        dtype=dtype,
-++++        min_dtype=min_dtype,
-++++        cache_position=cache_position,
-++++        batch_size=batch_size,
-++++    )
-++++    # 缓存结果
-++++    _causal_mask_cache[key] = causal_mask
-++++    return causal_mask
-+++ 
-+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+++ def _prepare_4d_causal_attention_mask_with_cache_position(
-+++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++ 
-+++ 
-+++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-++++# class Qwen2MoeAttention(nn.Module):
-++++#     """
-++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++++#     and "Generating Long Sequences with Sparse Transformers".
-++++#     """
-++++
-++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++#         super().__init__()
-++++#         self.config = config
-++++#         self.layer_idx = layer_idx
-++++#         if layer_idx is None:
-++++#             logger.warning_once(
-++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++#                 "when creating this class."
-++++#             )
-++++
-++++#         self.hidden_size = config.hidden_size
-++++#         self.num_heads = config.num_attention_heads
-++++#         self.head_dim = self.hidden_size // self.num_heads
-++++#         self.num_key_value_heads = config.num_key_value_heads
-++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++#         self.max_position_embeddings = config.max_position_embeddings
-++++#         self.rope_theta = config.rope_theta
-++++#         self.is_causal = True
-++++#         self.attention_dropout = config.attention_dropout
-++++
-++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++#             raise ValueError(
-++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++#                 f" and `num_heads`: {self.num_heads})."
-++++#             )
-++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++
-++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++#             self.head_dim,
-++++#             max_position_embeddings=self.max_position_embeddings,
-++++#             base=self.rope_theta,
-++++#         )
-++++
-++++#     def forward(
-++++#         self,
-++++#         hidden_states: mindspore.Tensor,
-++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++#         past_key_value: Optional[Cache] = None,
-++++#         output_attentions: bool = False,
-++++#         use_cache: bool = False,
-++++#         cache_position: Optional[mindspore.Tensor] = None,
-++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++
-++++        
-++++
-++++#         bsz, q_len, _ = hidden_states.shape
-++++
-++++#         query_states = self.q_proj(hidden_states)
-++++#         key_states = self.k_proj(hidden_states)
-++++#         value_states = self.v_proj(hidden_states)
-++++
-++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++
-++++#         kv_seq_len = key_states.shape[-2]
-++++#         if past_key_value is not None:
-++++#             if self.layer_idx is None:
-++++#                 raise ValueError(
-++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++#                     "with a layer index."
-++++#                 )
-++++#             if isinstance(past_key_value, StaticCache):
-++++#                 kv_seq_len = key_states.shape[-2]
-++++#             else:
-++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++
-++++#         if past_key_value is not None:
-++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++            
-++++#             if isinstance(past_key_value, StaticCache):
-++++#                 kv_seq_len = key_states.shape[-2]
-++++
-++++#         # repeat k/v heads if n_kv_heads < n_heads
-++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++        
-++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++
-++++#         if attention_mask is not None:
-++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++#             attn_weights = attn_weights + causal_mask
-++++
-++++#         # upcast attention to fp32
-++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++#         attn_output = ops.matmul(attn_weights, value_states)
-++++
-++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++#             raise ValueError(
-++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++++#                 f" {attn_output.shape}"
-++++#             )
-++++
-++++#         attn_output = ops.transpose(attn_output, 1, 2)
-++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++
-++++#         attn_output = self.o_proj(attn_output)
-++++#         # @lwx
-++++        
-++++#         # max_seq_len = self.max_position_embeddings  # 2048
-++++
-++++#         # if attention_mask is not None:
-++++#         #     # attention_mask: [B, 1, Sq, Sk]
-++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++
-++++#         #     # pad 到 [max_seq_len, max_seq_len]
-++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++#         #     global_attention_mask = padded_mask
-++++#         # else:
-++++#         #     global_attention_mask = None
-++++
-++++
-++++#         # sparse_mode=3
-++++#         # attn_output = mindspore.ops.flash_attention_score(
-++++#         #     query=query_states,
-++++#         #     key=key_states,
-++++#         #     value=value_states,
-++++#         #     real_shift=None,
-++++#         #     padding_mask=None,
-++++
-++++#         #     head_num=self.num_heads,
-++++#         #     attn_mask=global_attention_mask,
-++++#         #     keep_prob=1.0 - self.attention_dropout,
-++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++#         #     input_layout="BNSD",
-++++#         #     pre_tokens=2147483647,
-++++#         #     next_tokens=2147483647,
-++++#         #     inner_precise=0,
-++++#         #     drop_mask=None,
-++++#         #     prefix=None,
-++++#         #     actual_seq_qlen=None,
-++++#         #     actual_seq_kvlen=None,
-++++#         #     sparse_mode=sparse_mode,
-++++#         # )
-++++#         if not output_attentions:
-++++#             attn_weights = None
-++++
-++++#         return attn_output, attn_weights, past_key_value
-++++
-+++ class Qwen2MoeAttention(nn.Module):
-+++     """
-+++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+++-    and "Generating Long Sequences with Sparse Transformers".
-+++-    """
-++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-+++ 
-++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-++++
-++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-++++    """
-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++         super().__init__()
-+++         self.config = config
-+++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-+++         if layer_idx is None:
-+++             logger.warning_once(
-+++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++                 "when creating this class."
-+++             )
-+++ 
-+++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-+++         use_cache: bool = False,
-+++         cache_position: Optional[mindspore.Tensor] = None,
-+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-
-+++         
-+++-
-++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-+++         bsz, q_len, _ = hidden_states.shape
-+++ 
-+++         query_states = self.q_proj(hidden_states)
-+++         key_states = self.k_proj(hidden_states)
-+++         value_states = self.v_proj(hidden_states)
-+++ 
-+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++-
-++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++        
-+++         kv_seq_len = key_states.shape[-2]
-+++         if past_key_value is not None:
-+++-            if self.layer_idx is None:
-+++-                raise ValueError(
-+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-                    "with a layer index."
-+++-                )
-+++-            if isinstance(past_key_value, StaticCache):
-+++-                kv_seq_len = key_states.shape[-2]
-+++-            else:
-+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++        
-+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++ 
-+++         if past_key_value is not None:
-+++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++
-++++        # --- 2. 动态调度核心注意力计算 ---
-++++        global Long_Prompt
-++++        if Long_Prompt >= 1:
-++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-++++            fa_attention_mask = None
-++++            if attention_mask is not None:
-++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++                fa_attention_mask = (mask_slice != 0)
-++++
-++++            attn_output = mindspore.ops.flash_attention_score(
-++++                query=query_states,
-++++                key=key_states,
-++++                value=value_states,
-++++                head_num=self.num_heads,
-++++                attn_mask=fa_attention_mask,
-++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-++++                scalar_value=1.0 / math.sqrt(self.head_dim),
-++++                input_layout="BNSD",
-++++                sparse_mode=0,
-++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-++++            )
-+++             
-+++-            if isinstance(past_key_value, StaticCache):
-+++-                kv_seq_len = key_states.shape[-2]
-++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++            attn_output = self.o_proj(attn_output)
-++++            attn_weights = None
-++++            if output_attentions:
-++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+++ 
-+++-        # repeat k/v heads if n_kv_heads < n_heads
-+++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++-        
-+++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++        else:
-++++            # --- Eager Attention 路径 (用于短序列和解码) ---
-++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++            
-++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++ 
-+++-        if attention_mask is not None:
-+++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++-            attn_weights = attn_weights + causal_mask
-++++            if attention_mask is not None:
-++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++                attn_weights = attn_weights + causal_mask
-+++ 
-+++-        # upcast attention to fp32
-+++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++-        attn_output = ops.matmul(attn_weights, value_states)
-++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++            attn_output = ops.matmul(attn_weights, value_states)
-+++ 
-+++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++-            raise ValueError(
-+++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+++-                f" {attn_output.shape}"
-+++-            )
-++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++                raise ValueError(
-++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-++++                )
-+++ 
-+++-        attn_output = ops.transpose(attn_output, 1, 2)
-+++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++            attn_output = ops.transpose(attn_output, 1, 2)
-++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++            attn_output = self.o_proj(attn_output)
-+++ 
-+++-        attn_output = self.o_proj(attn_output)
-+++-        # @lwx
-++++            if not output_attentions:
-++++                attn_weights = None
-+++         
-+++-        # max_seq_len = self.max_position_embeddings  # 2048
-+++-
-+++-        # if attention_mask is not None:
-+++-        #     # attention_mask: [B, 1, Sq, Sk]
-+++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++-
-+++-        #     # pad 到 [max_seq_len, max_seq_len]
-+++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++-        #     global_attention_mask = padded_mask
-+++-        # else:
-+++-        #     global_attention_mask = None
-+++-
-+++-
-+++-        # sparse_mode=3
-+++-        # attn_output = mindspore.ops.flash_attention_score(
-+++-        #     query=query_states,
-+++-        #     key=key_states,
-+++-        #     value=value_states,
-+++-        #     real_shift=None,
-+++-        #     padding_mask=None,
-+++-
-+++-        #     head_num=self.num_heads,
-+++-        #     attn_mask=global_attention_mask,
-+++-        #     keep_prob=1.0 - self.attention_dropout,
-+++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-        #     input_layout="BNSD",
-+++-        #     pre_tokens=2147483647,
-+++-        #     next_tokens=2147483647,
-+++-        #     inner_precise=0,
-+++-        #     drop_mask=None,
-+++-        #     prefix=None,
-+++-        #     actual_seq_qlen=None,
-+++-        #     actual_seq_kvlen=None,
-+++-        #     sparse_mode=sparse_mode,
-+++-        # )
-+++-        if not output_attentions:
-+++-            attn_weights = None
-+++-
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++-
-+++ # class Qwen2MoeFlashAttention(nn.Module):
-+++ #     """
-+++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-+++ #             return final_hidden_states, router_logits
-+++ 
-+++ 
-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-#     """
-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+++-#     """
-+++-#     def __init__(self, config: Qwen2MoeConfig):
-+++-#         super().__init__()
-+++-#         self.num_experts = config.num_experts
-+++-#         self.top_k = config.num_experts_per_tok
-+++-#         self.norm_topk_prob = config.norm_topk_prob
-+++-
-+++-#         # 门控网络
-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++-#         # 专家列表
-+++-#         self.experts = nn.ModuleList(
-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++-#         )
-+++-#         # 共享专家
-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_decode(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         """
-+++-#         【解码路径】针对 sequence_length=1 的极致优化。
-+++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+++-#         """
-+++-#         batch_size, hidden_dim = hidden_states.shape
-+++-        
-+++-#         expert_outputs_list = [
-+++-#             ops.cat([
-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++-#             ], dim=0) 
-+++-#             for i in range(batch_size)
-+++-#         ]
-+++-        
-+++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+++-#         # shape: (batch_size, top_k, hidden_dim)
-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++-        
-+++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++-        
-+++-#         return moe_output.squeeze(1)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_prefill(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         """
-+++-#         【预填充路径】针对 sequence_length > 1 的优化。
-+++-#         按专家对 Token 进行分组，并进行批处理。
-+++-#         """
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens = hidden_states.shape[0]
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-        
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-        
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-        
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-            
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-#             selected_token_indices = token_indices[mask]
-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++-            
-+++-#             current_states = hidden_states[selected_token_indices]
-+++-            
-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-            
-+++-#             moe_output = moe_output.index_add(
-+++-#                 dim=0,
-+++-#                 index=selected_token_indices,
-+++-#                 source=expert_output.to(hidden_states.dtype)
-+++-#             )
-+++-#         return moe_output
-+++-
-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-#         """
-+++-#         顶层 forward 方法，作为智能分发器。
-+++-#         """
-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        
-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-#         router_logits = self.gate(hidden_states_reshaped)
-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-
-+++-#         if self.norm_topk_prob:
-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        
-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++-        
-+++-#         moe_output = None
-+++-#         # 在推理时，根据序列长度选择最优路径
-+++-#         if not self.training:
-+++-#             if sequence_length == 1:
-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++-#             else:
-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++-#         else:
-+++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+++-#             raise NotImplementedError("Training path is not implemented.")
-+++-
-+++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+++-        
-+++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+++-        
-+++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+++-        
-+++-#         return final_hidden_states, router_logits
-+++-
-+++-
-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-#     """
-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+++-#     """
-+++-#     def __init__(self, config: Qwen2MoeConfig):
-+++-#         super().__init__()
-+++-#         self.num_experts = config.num_experts
-+++-#         self.top_k = config.num_experts_per_tok
-+++-#         self.norm_topk_prob = config.norm_topk_prob
-+++-
-+++-#         # 门控网络
-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++-#         # 专家列表
-+++-#         self.experts = nn.ModuleList(
-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++-#         )
-+++-#         # 共享专家
-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_decode(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         batch_size, _ = hidden_states.shape
-+++-#         expert_outputs_list = [
-+++-#             ops.cat([
-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++-#             ], dim=0) 
-+++-#             for i in range(batch_size)
-+++-#         ]
-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++-#         return moe_output.squeeze(1)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_prefill(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens = hidden_states.shape[0]
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-        
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-#             selected_token_indices = token_indices[mask]
-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++-#             current_states = hidden_states[selected_token_indices]
-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-#             moe_output = moe_output.index_add(
-+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++-#             )
-+++-#         return moe_output
-+++-
-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-#         """
-+++-#         顶层 forward 方法，作为智能分发器。
-+++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+++-#         """
-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        
-+++-#         # 1. 门控计算 (通用逻辑)
-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-#         router_logits = self.gate(hidden_states_reshaped)
-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-
-+++-#         if self.norm_topk_prob:
-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        
-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++-        
-+++-#         # 2. 智能分发到最优 MoE 路径
-+++-#         moe_output = None
-+++-#         if not self.training:
-+++-#             if sequence_length == 1:
-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++-#             else:
-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++-#         else:
-+++-#             raise NotImplementedError("Training path is not implemented.")
-+++-
-+++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++-        
-+++-#         # 4. 合并 MoE 输出和共享专家输出
-+++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++-        
-+++-#         # 5. 恢复原始形状并返回
-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++-        
-+++-#         return final_hidden_states, router_logits
-+++-
-+++-# prefill fastest
-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-#     """
-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+++-#     """
-+++-#     def __init__(self, config: Qwen2MoeConfig):
-+++-#         super().__init__()
-+++-#         self.num_experts = config.num_experts
-+++-#         self.top_k = config.num_experts_per_tok
-+++-#         self.norm_topk_prob = config.norm_topk_prob
-+++-
-+++-#         # 门控网络
-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++-#         # 专家列表
-+++-#         self.experts = nn.ModuleList(
-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++-#         )
-+++-#         # 共享专家
-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_dispatch(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         """
-+++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+++-#         """
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens, _ = hidden_states.shape
-+++-        
-+++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-#         flat_routing_weights = routing_weights.flatten()
-+++-
-+++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-
-+++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-        
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-            
-+++-#             # 找到所有分配给该专家的 token
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-            
-+++-#             # 使用 mask 选取对应的 token 和权重
-+++-#             current_token_indices = token_indices[mask]
-+++-#             current_routing_weights = flat_routing_weights[mask]
-+++-#             current_hidden_states = hidden_states[current_token_indices]
-+++-            
-+++-#             # 对这些 token 进行批处理
-+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++-            
-+++-#             # 使用 index_add 将结果精确地加回到对应位置
-+++-#             moe_output = moe_output.index_add(
-+++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+++-#             )
-+++-#         return moe_output
-+++-
-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-#         """
-+++-#         顶层 forward 方法，作为智能分发器。
-+++-#         """
-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        
-+++-#         # 1. 门控计算
-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-#         router_logits = self.gate(hidden_states_reshaped)
-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-
-+++-#         if self.norm_topk_prob:
-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        
-+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++-        
-+++-#         # 2. 调用统一的 MoE 计算内核
-+++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+++-
-+++-#         # 3. 统一处理共享专家
-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++-        
-+++-#         # 4. 合并输出
-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++-        
-+++-#         # 5. 恢复原始形状并返回
-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++-        
-+++-#         return final_hidden_states, router_logits
-+++-
-+++-
-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-#     """
-+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++-#     【最终高性能与高精度版】：
-+++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+++-#     3. 这样实现了速度和准确性的两全其美。
-+++-#     """
-+++-#     def __init__(self, config: Qwen2MoeConfig):
-+++-#         super().__init__()
-+++-#         self.num_experts = config.num_experts
-+++-#         self.top_k = config.num_experts_per_tok
-+++-#         self.norm_topk_prob = config.norm_topk_prob
-+++-
-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++-#         self.experts = nn.ModuleList(
-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++-#         )
-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_decode(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         """
-+++-#         【解码路径】极致优化版：bmm + 高精度累加。
-+++-#         """
-+++-#         original_dtype = hidden_states.dtype
-+++-#         batch_size, _ = hidden_states.shape
-+++-        
-+++-#         expert_outputs_list = [
-+++-#             ops.cat([
-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++-#             ], dim=0) 
-+++-#             for i in range(batch_size)
-+++-#         ]
-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++-
-+++-#         # 在 float32 下执行 bmm，得到高精度结果
-+++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++-        
-+++-#         # 将高精度结果转换回原始数据类型
-+++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+++-        
-+++-#         return moe_output
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_prefill(
-+++-#         self, 
-+++-#         hidden_states: mindspore.Tensor, 
-+++-#         selected_experts: mindspore.Tensor, 
-+++-#         routing_weights: mindspore.Tensor
-+++-#     ) -> mindspore.Tensor:
-+++-#         """
-+++-#         【预填充路径】与原始实现一致，结果精确。
-+++-#         """
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens, _ = hidden_states.shape
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-        
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-#             selected_token_indices = token_indices[mask]
-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++-#             current_states = hidden_states[selected_token_indices]
-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-#             moe_output = moe_output.index_add(
-+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++-#             )
-+++-#         return moe_output
-+++-
-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-        
-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-#         router_logits = self.gate(hidden_states_reshaped)
-+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-
-+++-#         if self.norm_topk_prob:
-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        
-+++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+++-#         # 如果模型主体是 float16，后续再转换
-+++-        
-+++-#         moe_output = None
-+++-#         if not self.training:
-+++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+++-#             # _moe_infer_decode 内部会处理好类型转换
-+++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+++-#             if sequence_length == 1:
-+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++-#             else:
-+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++-#         else:
-+++-#             raise NotImplementedError("Training path is not implemented.")
-+++-
-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++-        
-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++-        
-+++-#         return final_hidden_states, router_logits
-+++-    
-+++-
-+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-#     """
-+++-#     【融合版】一个混合专家模块，内置两种推理策略，
-+++-#     由外部全局变量 `Long_Prompt` 控制：
-+++-
-+++-#     - if Long_Prompt is True:  【精度优先模式】
-+++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+++-#       适用于处理长序列，避免误差累积。
-+++-
-+++-#     - if Long_Prompt is False: 【速度优先模式】
-+++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+++-#       在解码阶段获得极致速度，同时保证结果高度准确。
-+++-#     """
-+++-#     def __init__(self, config: Qwen2MoeConfig):
-+++-#         super().__init__()
-+++-#         self.num_experts = config.num_experts
-+++-#         self.top_k = config.num_experts_per_tok
-+++-#         self.norm_topk_prob = config.norm_topk_prob
-+++-
-+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++-#         self.experts = nn.ModuleList(
-+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++-#         )
-+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++-
-+++-#     # --- 速度优先模式的辅助函数 ---
-+++-#     @no_grad()
-+++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++-#         original_dtype = hidden_states.dtype
-+++-#         batch_size, _ = hidden_states.shape
-+++-#         expert_outputs_list = [
-+++-#             ops.cat([
-+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++-#             ], dim=0) 
-+++-#             for i in range(batch_size)
-+++-#         ]
-+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++-#         weights_fp32 = routing_weights.to(mindspore.float32)
-+++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++-
-+++-#     @no_grad()
-+++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens, _ = hidden_states.shape
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-#             selected_token_indices = token_indices[mask]
-+++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++-#             current_states = hidden_states[selected_token_indices]
-+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+++-#         return moe_output
-+++-
-+++-#     # --- 精度优先模式的辅助函数 ---
-+++-#     @no_grad()
-+++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++-#         moe_output = ops.zeros_like(hidden_states)
-+++-#         num_tokens, _ = hidden_states.shape
-+++-#         flat_selected_experts = selected_experts.flatten()
-+++-#         flat_routing_weights = routing_weights.flatten()
-+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++-#         active_experts = ops.unique(flat_selected_experts)
-+++-#         for expert_idx_tensor in active_experts:
-+++-#             expert_idx = expert_idx_tensor.item()
-+++-#             expert_layer = self.experts[expert_idx]
-+++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++-#             current_token_indices = token_indices[mask]
-+++-#             current_routing_weights = flat_routing_weights[mask]
-+++-#             current_hidden_states = hidden_states[current_token_indices]
-+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++-#         return moe_output
-+++-
-+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++-#         # 声明我们将要使用一个在模块外部定义的全局变量
-+++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+++-#         global Long_Prompt
-+++-
-+++-#         # 1. 门控计算 (所有模式通用)
-+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-#         router_logits = self.gate(hidden_states_reshaped)
-+++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++-#         if self.norm_topk_prob:
-+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-        
-+++-#         moe_output = None
-+++-#         if not self.training:
-+++-#             # 根据 Long_Prompt 标志选择模式
-+++-#             if Long_Prompt:
-+++-#                 # --- 精度优先模式 ---
-+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++-#             else:
-+++-#                 # --- 速度优先模式 ---
-+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++-#                 if sequence_length == 1:
-+++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++-#                 else:
-+++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++-#         else:
-+++-#             raise NotImplementedError("Training path is not implemented.")
-+++-
-+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++-        
-+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++-        
-+++-#         return final_hidden_states, router_logits
-+++-    
-+++ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++     """
-+++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++ 
-++++    # @no_grad()
-++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++    #     num_tokens, _ = hidden_states.shape
-++++    #     flat_selected_experts = selected_experts.flatten()
-++++    #     sorted_expert_indices = flat_selected_experts.argsort()
-++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++    #     original_token_indices = sorted_expert_indices // self.top_k
-++++    #     moe_output = ops.zeros_like(hidden_states)
-++++    #     current_token_offset = 0
-++++    #     for i in range(self.num_experts):
-++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-++++    #         if expert_token_count == 0:
-++++    #             continue
-++++    #         end_offset = current_token_offset + expert_token_count
-++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++    #         current_token_offset += expert_token_count
-++++    #     return moe_output
-++++
-+++     @no_grad()
-+++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++-        num_tokens, _ = hidden_states.shape
-+++-        flat_selected_experts = selected_experts.flatten()
-+++-        sorted_expert_indices = flat_selected_experts.argsort()
-+++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++-        original_token_indices = sorted_expert_indices // self.top_k
-++++        """
-++++        优化版 MoE prefill (速度优先模式)：
-++++        - 批量张量化处理同一个 expert 的所有 token
-++++        - 跳过无 token 的专家
-++++        - 保持结果完全一致
-++++        """
-+++         moe_output = ops.zeros_like(hidden_states)
-+++-        current_token_offset = 0
-+++-        for i in range(self.num_experts):
-+++-            expert_token_count = tokens_per_expert[i] - current_token_offset
-+++-            if expert_token_count == 0:
-+++-                continue
-+++-            end_offset = current_token_offset + expert_token_count
-+++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++-            expert_hidden_states = hidden_states[expert_original_token_indices]
-+++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++-            current_token_offset += expert_token_count
-++++
-++++        flat_selected_experts = selected_experts.flatten()
-++++        flat_routing_weights = routing_weights.flatten()
-++++
-++++        idxs = flat_selected_experts.argsort()
-++++        sorted_expert_indices = flat_selected_experts[idxs]
-++++        sorted_token_indices = idxs // self.top_k
-++++
-++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-++++
-++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++++
-++++        for expert_id in active_experts.tolist():
-++++            start = int(tokens_per_expert[:expert_id].sum().item())
-++++            end = start + int(tokens_per_expert[expert_id].item())
-++++
-++++            token_idx = sorted_token_indices[start:end]
-++++            expert_tokens = hidden_states[token_idx]
-++++
-++++            expert_out = self.experts[expert_id](expert_tokens)
-++++
-++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-++++
-++++            moe_output = mindspore.mint.scatter_add(
-++++                moe_output,
-++++                0,
-++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-++++                scaled_out.to(hidden_states.dtype)
-++++            )
-++++
-+++         return moe_output
-+++ 
-++++
-+++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+++     @no_grad()
-+++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++         
-+++         moe_output = None
-+++-        if Long_Prompt:
-+++-            # --- 精度优先模式 (ACCURACY MODE) ---
-+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        # if Long_Prompt==0:
-++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        # else:
-++++        #     # --- 速度优先模式 (SPEED MODE) ---
-++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++        #     if sequence_length == 1:
-++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        #     else:
-++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++        
-++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++        if sequence_length == 1:
-++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++         else:
-+++-            # --- 速度优先模式 (SPEED MODE) ---
-+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++-            if sequence_length == 1:
-+++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++-            else:
-+++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++-        
-++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++    
-+++ 
-+++         # 3. 共享专家计算与合并 (所有模式通用)
-+++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++         
-+++         return final_hidden_states, router_logits
-+++ 
-++++
-+++ class Qwen2MoeDecoderLayer(nn.Module):
-+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+++         super().__init__()
-+++         self.hidden_size = config.hidden_size
-+++         
-+++-        # if Long_Prompt:
-+++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++-        # else:
-++++        # if Long_Prompt == 2:
-+++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++        # else:
-++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++ 
-+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++ 
-+++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++             )
-+++ 
-+++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-+++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++        #     attention_mask,
-++++        #     sequence_length=sequence_length,
-++++        #     target_length=target_length,
-++++        #     dtype=dtype,
-++++        #     min_dtype=min_dtype,
-++++        #     cache_position=cache_position,
-++++        #     batch_size=input_tensor.shape[0],
-++++        # )
-++++        #@dwj
-++++        causal_mask = get_cached_causal_mask_with_cache_position(
-+++             attention_mask,
-+++             sequence_length=sequence_length,
-+++             target_length=target_length,
-+++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+++         """
-+++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-++++        _causal_mask_cache.clear()
-+++ 
-+++         input_ids = kwargs.get("input_ids")
-+++         if input_ids is None and args:
-+++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++ 
-+++         if input_ids is not None:
-+++             prompt_length = input_ids.shape[1]
-+++-            
-+++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+++-                Long_Prompt = True
-++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-++++                Long_Prompt = 2
-++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-++++                Long_Prompt = 0
-+++             else:
-+++-                Long_Prompt = False
-++++                Long_Prompt = 1
-++++
-+++ 
-+++         return super().generate(*args, **kwargs)
-+++     
-+++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++             dtype = self.lm_head.weight.dtype
-+++             min_dtype = float(ops.finfo(dtype).min)
-+++ 
-+++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++            #     attention_mask,
-++++            #     sequence_length=sequence_length,
-++++            #     target_length=past_key_values.get_max_length(),
-++++            #     dtype=dtype,
-++++            #     min_dtype=min_dtype,
-++++            #     cache_position=cache_position,
-++++            #     batch_size=batch_size,
-++++            # )
-++++
-++++            #@dwj
-++++            attention_mask = get_cached_causal_mask_with_cache_position(
-+++                 attention_mask,
-+++                 sequence_length=sequence_length,
-+++                 target_length=past_key_values.get_max_length(),
-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++deleted file mode 100644
-+++index 6dfb5b93..00000000
-+++--- a/patches/0001-20251104commit.patch
-++++++ /dev/null
-+++@@ -1,1272 +0,0 @@
-+++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++-From: Pinoeer-kingxi <13022943007@163.com>
-+++-Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++-Subject: [PATCH] 20251104commit
-+++-
-+++----
-+++- mindnlp/transformers/cache_utils.py           |  28 +-
-+++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++- 3 files changed, 976 insertions(+), 87 deletions(-)
-+++-
-+++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++-index cadd2e04..02f8d4be 100644
-+++---- a/mindnlp/transformers/cache_utils.py
-+++-+++ b/mindnlp/transformers/cache_utils.py
-+++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++-             # k_out[:, :, cache_position] = key_states
-+++-             # v_out[:, :, cache_position] = value_states
-+++--            if ON_ORANGE_PI:
-+++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++--            else:
-+++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++--
-+++-+            # if ON_ORANGE_PI:
-+++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++-+            # else:
-+++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++-+            # 确保 cache_position 是 1D tensor 并且类型正确
-+++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++-+            if cache_position.ndim > 1:
-+++-+                cache_position = cache_position.flatten()
-+++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++-+                cache_position = cache_position.int()
-+++-+            
-+++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++-+            k_out[:, :, cache_position] = key_states
-+++-+            v_out[:, :, cache_position] = value_states
-+++-+            
-+++-         return k_out, v_out
-+++- 
-+++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++-index c695b944..d8303e45 100644
-+++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++- # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++- def rotate_half(x):
-+++-     """Rotates half the hidden dims of the input."""
-+++--    x1 = x[..., : x.shape[-1] // 2]
-+++--    x2 = x[..., x.shape[-1] // 2 :]
-+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++-+    # x1 = x[..., : x.shape[-1] // 2]
-+++-+    # x2 = x[..., x.shape[-1] // 2 :]
-+++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++-     return ops.cat((-x2, x1), dim=-1)
-+++- 
-+++- 
-+++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++-         if self.training:
-+++-             raise NotImplementedError("Training is not supported yet.")
-+++-         else:
-+++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++--        if self.config.n_shared_experts is not None:
-+++--            y = y + self.shared_experts(identity)
-+++--        return y
-+++-+            # @lwx
-+++-+            if orig_shape[1] == 1:
-+++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++-+                y=y.view(*orig_shape)
-+++-+                if self.config.n_shared_experts is not None:
-+++-+                    y = y + self.shared_experts(identity)
-+++-+                return y
-+++-+            else:
-+++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++-+                if self.config.n_shared_experts is not None:
-+++-+                    y = y + self.shared_experts(identity)
-+++-+                return y
-+++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++-+        # if self.config.n_shared_experts is not None:
-+++-+        #     y = y + self.shared_experts(identity)
-+++-+        # return y
-+++-+        
-+++-+    @no_grad()
-+++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++-+
-+++-+        expert_cache = ops.zeros_like(x)
-+++-+        for i in range(self.num_experts_per_tok):
-+++-+            expert_id = flat_expert_indices[i].item()
-+++-+            weight = flat_expert_weights[i].item()
-+++-+            expert = self.experts[expert_id]
-+++-+            expert_out = expert(x)
-+++-+            expert_cache += expert_out * weight
-+++-+        return expert_cache
-+++- 
-+++-     @no_grad()
-+++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++--        # expert_cache = torch.zeros_like(x)
-+++--        # idxs = flat_expert_indices.argsort()
-+++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++--        # token_idxs = idxs // self.num_experts_per_tok
-+++--        # for i, end_idx in enumerate(tokens_per_expert):
-+++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++--        #     if start_idx == end_idx:
-+++--        #         continue
-+++--        #     expert = self.experts[i]
-+++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++--        #     expert_tokens = x[exp_token_idx]
-+++--        #     expert_out = expert(expert_tokens)
-+++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++--        # return expert_cache
-+++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++-         expert_cache = ops.zeros_like(x)
-+++-         idxs = flat_expert_indices.argsort()
-+++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++-         token_idxs = idxs // self.num_experts_per_tok
-+++-+
-+++-         for i, end_idx in enumerate(tokens_per_expert):
-+++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++-             if start_idx == end_idx:
-+++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++-             expert_out = expert(expert_tokens)
-+++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++-+
-+++-         return expert_cache
-+++-+        
-+++-+    # @no_grad()
-+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-+    #     # expert_cache = torch.zeros_like(x)
-+++-+    #     # idxs = flat_expert_indices.argsort()
-+++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++-+    #     # token_idxs = idxs // self.num_experts_per_tok
-+++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++-+    #     #     if start_idx == end_idx:
-+++-+    #     #         continue
-+++-+    #     #     expert = self.experts[i]
-+++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++-+    #     #     expert_tokens = x[exp_token_idx]
-+++-+    #     #     expert_out = expert(expert_tokens)
-+++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++-+    #     # return expert_cache
-+++-+    #     expert_cache = ops.zeros_like(x)
-+++-+    #     idxs = flat_expert_indices.argsort()
-+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++-+    #     token_idxs = idxs // self.num_experts_per_tok
-+++-+
-+++-+    #     for i, end_idx in enumerate(tokens_per_expert):
-+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++-+    #         if start_idx == end_idx:
-+++-+    #             continue
-+++-+    #         expert = self.experts[i]
-+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++-+    #         expert_tokens = x[exp_token_idx]
-+++-+    #         expert_out = expert(expert_tokens)
-+++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++-+
-+++-+    #     return expert_cache
-+++-+    # @no_grad()
-+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++-+    #     expert_cache = ops.zeros_like(x)
-+++-+
-+++-+    #     # 排序保证顺序一致
-+++-+    #     idxs = flat_expert_indices.argsort()
-+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++-+    #     token_idxs = idxs // self.num_experts_per_tok
-+++-+
-+++-+    #     # 找出有 token 的专家
-+++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++-+
-+++-+    #     for i in active_experts.tolist():
-+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++-+    #         end_idx = tokens_per_expert[i]
-+++-+    #         if start_idx == end_idx:  # 没有 token
-+++-+    #             continue
-+++-+
-+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++-+    #         expert_tokens = x[exp_token_idx]
-+++-+    #         expert_out = self.experts[i](expert_tokens)
-+++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++-+
-+++-+    #         expert_cache = mindspore.mint.scatter_add(
-+++-+    #             expert_cache,
-+++-+    #             0,
-+++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++-+    #             expert_out
-+++-+    #         )
-+++-+
-+++-+    #     return expert_cache
-+++-+
-+++-+
-+++- 
-+++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++- #     """
-+++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++- 
-+++-         # Initialize weights and apply final processing
-+++-         self.post_init()
-+++-+        self.warm_up = False
-+++-+
-+++-+    def warmup_moe_model_deep(self):
-+++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++-+        test_texts = [
-+++-+            "warmup short",
-+++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++-+        ]
-+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++-+        if tokenizer is None:
-+++-+            from mindnlp.transformers import AutoTokenizer
-+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++-+            self._warmup_tokenizer = tokenizer
-+++-+
-+++-+        for text in test_texts:
-+++-+            inputs = tokenizer(text, return_tensors="ms")
-+++-+            with mindspore._no_grad():
-+++-+                _ = self(**inputs, use_cache=False)
-+++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++- 
-+++-     def get_input_embeddings(self):
-+++-         return self.model.embed_tokens
-+++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++-         ```"""
-+++-+        if not self.warm_up:
-+++-+            self.warm_up = True
-+++-+            self.warmup_moe_model_deep()
-+++-+
-+++-         output_attentions = (
-+++-             output_attentions
-+++-             if output_attentions is not None
-+++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++-index 3cbf820e..d4c6b651 100644
-+++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++-@@ -18,7 +18,6 @@
-+++- # See the License for the specific language governing permissions and
-+++- # limitations under the License.
-+++- """MindSpore Qwen2MoE model."""
-+++--
-+++- import math
-+++- from typing import List, Optional, Tuple, Union
-+++- 
-+++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++-     TokenClassifierOutput,
-+++- )
-+++- from ...modeling_utils import PreTrainedModel
-+++-+from ...generation import GenerationMixin
-+++- from ....utils import logging
-+++- from .configuration_qwen2_moe import Qwen2MoeConfig
-+++- 
-+++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++-         self.variance_epsilon = eps
-+++- 
-+++-     def forward(self, hidden_states):
-+++-+        # @dwj
-+++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++-+        # @lwx
-+++-+        # if not self.training :
-+++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++-         input_dtype = hidden_states.dtype
-+++-         hidden_states = hidden_states.to(mindspore.float32)
-+++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++-@@ -234,6 +239,8 @@ def rotate_half(x):
-+++-     """Rotates half the hidden dims of the input."""
-+++-     x1 = x[..., : x.shape[-1] // 2]
-+++-     x2 = x[..., x.shape[-1] // 2 :]
-+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++-     return ops.cat((-x2, x1), dim=-1)
-+++- 
-+++- 
-+++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++-         self.config = config
-+++-         self.hidden_size = config.hidden_size
-+++-         self.intermediate_size = intermediate_size
-+++-+        
-+++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++-         self.act_fn = ACT2FN[config.hidden_act]
-+++- 
-+++-     def forward(self, x):
-+++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++--
-+++- 
-+++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++-+        # @lwx 
-+++-+        # gate_up_output = self.gate_up_proj(x)
-+++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++-+        # return self.down_proj(swiglu_output)
-+++-+
-+++-+    # def forward(self, x):
-+++-+    #     gate_proj_out = self.gate_proj(x)
-+++-+    #     up_proj_out = self.up_proj(x)
-+++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++-+    #     return self.down_proj(swiglu_out)
-+++-+        
-+++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++-     """
-+++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++-         use_cache: bool = False,
-+++-         cache_position: Optional[mindspore.Tensor] = None,
-+++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-+
-+++-+        
-+++-+
-+++-         bsz, q_len, _ = hidden_states.shape
-+++- 
-+++-         query_states = self.q_proj(hidden_states)
-+++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-                     "with a layer index."
-+++-                 )
-+++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-+            if isinstance(past_key_value, StaticCache):
-+++-+                kv_seq_len = key_states.shape[-2]
-+++-+            else:
-+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++- 
-+++-         if past_key_value is not None:
-+++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++-+            
-+++-+            if isinstance(past_key_value, StaticCache):
-+++-+                kv_seq_len = key_states.shape[-2]
-+++- 
-+++-         # repeat k/v heads if n_kv_heads < n_heads
-+++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++--
-+++-+        
-+++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++- 
-+++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++--            raise ValueError(
-+++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++--                f" {attn_weights.shape}"
-+++--            )
-+++--
-+++--        if attention_mask is not None:  # no matter the length, we just slice it
-+++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++-+        if attention_mask is not None:
-+++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++-             attn_weights = attn_weights + causal_mask
-+++- 
-+++-         # upcast attention to fp32
-+++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++- 
-+++-         attn_output = self.o_proj(attn_output)
-+++--
-+++-+        # @lwx
-+++-+        
-+++-+        # max_seq_len = self.max_position_embeddings  # 2048
-+++-+
-+++-+        # if attention_mask is not None:
-+++-+        #     # attention_mask: [B, 1, Sq, Sk]
-+++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++-+
-+++-+        #     # pad 到 [max_seq_len, max_seq_len]
-+++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++-+        #     global_attention_mask = padded_mask
-+++-+        # else:
-+++-+        #     global_attention_mask = None
-+++-+
-+++-+
-+++-+        # sparse_mode=3
-+++-+        # attn_output = mindspore.ops.flash_attention_score(
-+++-+        #     query=query_states,
-+++-+        #     key=key_states,
-+++-+        #     value=value_states,
-+++-+        #     real_shift=None,
-+++-+        #     padding_mask=None,
-+++-+
-+++-+        #     head_num=self.num_heads,
-+++-+        #     attn_mask=global_attention_mask,
-+++-+        #     keep_prob=1.0 - self.attention_dropout,
-+++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-+        #     input_layout="BNSD",
-+++-+        #     pre_tokens=2147483647,
-+++-+        #     next_tokens=2147483647,
-+++-+        #     inner_precise=0,
-+++-+        #     drop_mask=None,
-+++-+        #     prefix=None,
-+++-+        #     actual_seq_qlen=None,
-+++-+        #     actual_seq_kvlen=None,
-+++-+        #     sparse_mode=sparse_mode,
-+++-+        # )
-+++-         if not output_attentions:
-+++-             attn_weights = None
-+++- 
-+++-         return attn_output, attn_weights, past_key_value
-+++- 
-+++- 
-+++-+class Qwen2MoeFlashAttention(nn.Module):
-+++-+    """
-+++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++-+
-+++-+    关键改动:
-+++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++-+       直接传入原始的 key 和 value 张量效率更高。
-+++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++-+    """
-+++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++-+        super().__init__()
-+++-+        self.config = config
-+++-+        self.layer_idx = layer_idx
-+++-+        self.hidden_size = config.hidden_size
-+++-+        self.num_heads = config.num_attention_heads
-+++-+        self.head_dim = self.hidden_size // self.num_heads
-+++-+        self.num_key_value_heads = config.num_key_value_heads
-+++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++-+        self.max_position_embeddings = config.max_position_embeddings
-+++-+        self.rope_theta = config.rope_theta
-+++-+        self.attention_dropout = config.attention_dropout
-+++-+
-+++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++-+            raise ValueError(
-+++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++-+            )
-+++-+
-+++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++-+
-+++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++-+            self.head_dim,
-+++-+            max_position_embeddings=self.max_position_embeddings,
-+++-+            base=self.rope_theta,
-+++-+        )
-+++-+
-+++-+    def forward(
-+++-+        self,
-+++-+        hidden_states: mindspore.Tensor,
-+++-+        attention_mask: Optional[mindspore.Tensor] = None,
-+++-+        position_ids: Optional[mindspore.Tensor] = None,
-+++-+        past_key_value: Optional[Cache] = None,
-+++-+        output_attentions: bool = False,
-+++-+        use_cache: bool = False,
-+++-+        cache_position: Optional[mindspore.Tensor] = None,
-+++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-+
-+++-+        bsz, q_len, _ = hidden_states.shape
-+++-+
-+++-+        # 1. 线性投射 Q, K, V
-+++-+        query_states = self.q_proj(hidden_states)
-+++-+        key_states = self.k_proj(hidden_states)
-+++-+        value_states = self.v_proj(hidden_states)
-+++-+
-+++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+
-+++-+        # 3. RoPE 旋转位置编码
-+++-+        kv_seq_len = key_states.shape[-2]
-+++-+        if past_key_value is not None:
-+++-+            if self.layer_idx is None:
-+++-+                raise ValueError(
-+++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-+                    "with a layer index."
-+++-+                )
-+++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++-+                if cache_position.shape[0] == 1:
-+++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++-+                    kv_seq_len = past_seen_tokens + 1
-+++-+                else:
-+++-+                    # prefill 阶段：cache_position 是范围，使用其长度
-+++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++-+            else:
-+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-+
-+++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-+
-+++-+        # 4. KV 缓存更新
-+++-+        if past_key_value is not None:
-+++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-+            key_states, value_states = past_key_value.update(
-+++-+                key_states, value_states, self.layer_idx, cache_kwargs
-+++-+            )
-+++-+            
-+++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++-+                if cache_position.shape[0] == 1:
-+++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++-+                    kv_seq_len = key_states.shape[-2]
-+++-+
-+++-+        # 5. [重要] 准备 Attention Mask
-+++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++-+        fa_attention_mask = None
-+++-+        if attention_mask is not None:
-+++-+            # 截取与当前key长度匹配的部分
-+++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++-+            fa_attention_mask = (mask_slice != 0)
-+++-+
-+++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++-+        input_dtype = query_states.dtype
-+++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++-+            query_states = query_states.to(mindspore.float16)
-+++-+            key_states = key_states.to(mindspore.float16)
-+++-+            value_states = value_states.to(mindspore.float16)
-+++-+
-+++-+        # 6. [核心] 调用 flash_attention_score 算子
-+++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++-+        attn_output = mindspore.ops.flash_attention_score(
-+++-+            query=query_states,
-+++-+            key=key_states,
-+++-+            value=value_states,
-+++-+            head_num=self.num_heads, # 传入Q的头数(N1)
-+++-+            attn_mask=fa_attention_mask,
-+++-+            keep_prob=1.0 - self.attention_dropout,
-+++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-+            input_layout="BNSD",
-+++-+            sparse_mode=0 # 使用 defaultMask 模式
-+++-+        )
-+++-+
-+++-+        # 恢复原始数据类型
-+++-+        attn_output = attn_output.to(input_dtype)
-+++-+
-+++-+        # 7. 调整输出形状
-+++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-+        attn_output = self.o_proj(attn_output)
-+++-+
-+++-+        # FlashAttention 算子不直接返回注意力权重矩阵
-+++-+        attn_weights = None
-+++-+        if output_attentions:
-+++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++-+
-+++-+        return attn_output, attn_weights, past_key_value
-+++-+
-+++-+    # def forward(
-+++-+    #     self,
-+++-+    #     hidden_states: mindspore.Tensor,
-+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+++-+    #     past_key_value: Optional[Cache] = None,
-+++-+    #     output_attentions: bool = False,
-+++-+    #     use_cache: bool = False,
-+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-+
-+++-+    #     bsz, q_len, _ = hidden_states.shape
-+++-+
-+++-+    #     # 1. 线性投射 Q, K, V
-+++-+    #     query_states = self.q_proj(hidden_states)
-+++-+    #     key_states = self.k_proj(hidden_states)
-+++-+    #     value_states = self.v_proj(hidden_states)
-+++-+
-+++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+
-+++-+    #     # 3. RoPE 旋转位置编码
-+++-+    #     kv_seq_len = key_states.shape[-2]
-+++-+    #     if past_key_value is not None:
-+++-+    #         if self.layer_idx is None:
-+++-+    #             raise ValueError(
-+++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-+    #                 "with a layer index."
-+++-+    #             )
-+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-+
-+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-+
-+++-+    #     # 4. KV 缓存更新
-+++-+    #     if past_key_value is not None:
-+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-+    #         key_states, value_states = past_key_value.update(
-+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++-+    #         )
-+++-+
-+++-+    #     # 5. 准备 Attention Mask
-+++-+    #     fa_attention_mask = None
-+++-+    #     if attention_mask is not None:
-+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-+    #         fa_attention_mask = (mask_slice != 0)
-+++-+
-+++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++-+    #     input_dtype = query_states.dtype
-+++-+
-+++-+    #     # 6. [核心] 调用 flash_attention_score 算子
-+++-+    #     attn_output = mindspore.ops.flash_attention_score(
-+++-+    #         query=query_states,
-+++-+    #         key=key_states,
-+++-+    #         value=value_states,
-+++-+    #         head_num=self.num_heads,
-+++-+    #         attn_mask=fa_attention_mask,
-+++-+    #         keep_prob=1.0 - self.attention_dropout,
-+++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-+    #         input_layout="BNSD",
-+++-+    #         sparse_mode=0,
-+++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++-+    #         inner_precise=1
-+++-+    #     )
-+++-+
-+++-+    #     # 恢复原始数据类型
-+++-+    #     attn_output = attn_output.to(input_dtype)
-+++-+
-+++-+    #     # 7. 调整输出形状
-+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-+    #     attn_output = self.o_proj(attn_output)
-+++-+
-+++-+    #     attn_weights = None
-+++-+    #     if output_attentions:
-+++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++-+
-+++-+    #     return attn_output, attn_weights, past_key_value
-+++-+
-+++-+    # def forward(
-+++-+    #     self,
-+++-+    #     hidden_states: mindspore.Tensor,
-+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+++-+    #     past_key_value: Optional[Cache] = None,
-+++-+    #     output_attentions: bool = False,
-+++-+    #     use_cache: bool = False,
-+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-+
-+++-+    #     bsz, q_len, _ = hidden_states.shape
-+++-+
-+++-+    #     query_states = self.q_proj(hidden_states)
-+++-+    #     key_states = self.k_proj(hidden_states)
-+++-+    #     value_states = self.v_proj(hidden_states)
-+++-+
-+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-+
-+++-+    #     kv_seq_len = key_states.shape[-2]
-+++-+    #     if past_key_value is not None:
-+++-+    #         if self.layer_idx is None:
-+++-+    #             raise ValueError("`layer_idx` must be specified for caching")
-+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-+
-+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-+
-+++-+    #     if past_key_value is not None:
-+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++-+    #         key_states, value_states = past_key_value.update(
-+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++-+    #         )
-+++-+
-+++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++-+
-+++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++-+    #     query_states = query_states / math.sqrt(self.head_dim)
-+++-+    #     # <--- 修改结束 ---
-+++-+
-+++-+    #     fa_attention_mask = None
-+++-+    #     if attention_mask is not None:
-+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++-+    #         fa_attention_mask = (mask_slice != 0)
-+++-+
-+++-+    #     input_dtype = query_states.dtype
-+++-+
-+++-+    #     attn_output = mindspore.ops.flash_attention_score(
-+++-+    #         query=query_states,    # 传入已经预先缩放过的 query
-+++-+    #         key=key_states,
-+++-+    #         value=value_states,
-+++-+    #         head_num=self.num_heads,
-+++-+    #         attn_mask=fa_attention_mask,
-+++-+    #         keep_prob=1.0 - self.attention_dropout,
-+++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++-+    #         input_layout="BNSD",
-+++-+    #         sparse_mode=0,
-+++-+    #         inner_precise=1        # 仍然保持内部高精度计算
-+++-+    #     )
-+++-+
-+++-+    #     attn_output = attn_output.to(input_dtype)
-+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-+    #     attn_output = self.o_proj(attn_output)
-+++-+
-+++-+    #     attn_weights = None
-+++-+    #     if output_attentions:
-+++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++-+
-+++-+    #     return attn_output, attn_weights, past_key_value
-+++-+
-+++- QWEN2MOE_ATTENTION_CLASSES = {
-+++-     "eager": Qwen2MoeAttention,
-+++-+    "flash-attention": Qwen2MoeFlashAttention,
-+++- }
-+++- 
-+++- 
-+++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++- 
-+++-+    #@dwj
-+++-+    # 只遍历激活的专家，而非全部专家
-+++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++--        hidden_states = hidden_states.view(-1, hidden_dim)
-+++--        # router_logits: (batch * sequence_length, n_experts)
-+++--        router_logits = self.gate(hidden_states)
-+++--
-+++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++--        if self.norm_topk_prob:
-+++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++--        # we cast back to the input dtype
-+++--        routing_weights = routing_weights.to(hidden_states.dtype)
-+++--
-+++--        final_hidden_states = ops.zeros(
-+++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++--        )
-+++--
-+++--        # One hot encode the selected experts to create an expert mask
-+++--        # this will be used to easily index which expert is going to be sollicitated
-+++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++--
-+++--        # Loop over all available experts in the model and perform the computation on each expert
-+++--        for expert_idx in range(self.num_experts):
-+++--            expert_layer = self.experts[expert_idx]
-+++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++--
-+++--            # Index the correct hidden states and compute the expert hidden state for
-+++--            # the current expert. We need to make sure to multiply the output hidden
-+++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++--            if 0 not in idx.shape:
-+++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++--
-+++--                # However `index_add_` only support torch tensors for indexing so we'll use
-+++--                # the `top_x` tensor here.
-+++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++--
-+++--        shared_expert_output = self.shared_expert(hidden_states)
-+++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++--
-+++--        final_hidden_states = final_hidden_states + shared_expert_output
-+++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++-+            num_tokens = hidden_states_reshaped.shape[0]
-+++-+
-+++-+            router_logits = self.gate(hidden_states_reshaped)
-+++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++-+
-+++-+            if self.norm_topk_prob:
-+++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++-+            routing_weights = routing_weights.to(hidden_states.dtype)
-+++-+            
-+++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++-+            flat_selected_experts = selected_experts.flatten()
-+++-+            
-+++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++-+            token_indices = broadcasted_token_indices.flatten()
-+++-+            
-+++-+            active_experts = ops.unique(flat_selected_experts)
-+++-+            
-+++-+            for expert_idx_tensor in active_experts:
-+++-+                expert_idx = expert_idx_tensor.item()
-+++-+                expert_layer = self.experts[expert_idx]
-+++-+                
-+++-+                mask = (flat_selected_experts == expert_idx_tensor)
-+++-+                selected_token_indices = token_indices[mask]
-+++-+                selected_routing_weights = routing_weights.flatten()[mask]
-+++-+                
-+++-+                current_states = hidden_states_reshaped[selected_token_indices]
-+++-+                
-+++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++-+                
-+++-+                final_hidden_states = final_hidden_states.index_add(
-+++-+                    dim=0,
-+++-+                    index=selected_token_indices,
-+++-+                    source=expert_output.to(hidden_states.dtype)
-+++-+                )
-+++-+            
-+++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++- 
-+++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++--        return final_hidden_states, router_logits
-+++-+            final_hidden_states = final_hidden_states + shared_expert_output
-+++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++-+            
-+++-+            return final_hidden_states, router_logits
-+++- 
-+++- 
-+++- class Qwen2MoeDecoderLayer(nn.Module):
-+++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++- 
-+++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++- 
-+++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++-+
-+++-         if (layer_idx not in config.mlp_only_layers) and (
-+++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++-         ):
-+++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++-     _skip_keys_device_placement = "past_key_values"
-+++-     _supports_cache_class = True
-+++-+#lwx
-+++-+    # _supports_static_cache = True
-+++- 
-+++-     def _init_weights(self, module):
-+++-         std = self.config.initializer_range
-+++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++-         return causal_mask
-+++- 
-+++- 
-+++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++-     _tied_weights_keys = ["lm_head.weight"]
-+++- 
-+++-     def __init__(self, config):
-+++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++-         self.num_experts_per_tok = config.num_experts_per_tok
-+++-         # Initialize weights and apply final processing
-+++-         self.post_init()
-+++-+        # @lwx
-+++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++-+        #     self.generation_config.cache_implementation = "static"
-+++-+        self._warmed_up = False
-+++-+
-+++-+    def warmup_moe_model(self):
-+++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++-+        test_texts = [
-+++-+            "warmup short",
-+++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++-+        ]
-+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++-+        if tokenizer is None:
-+++-+            from mindnlp.transformers import AutoTokenizer
-+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++-+            self._warmup_tokenizer = tokenizer
-+++-+
-+++-+        for text in test_texts:
-+++-+            inputs = tokenizer(text, return_tensors="ms")
-+++-+            with mindspore._no_grad():
-+++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++- 
-+++-     def get_input_embeddings(self):
-+++-         return self.model.embed_tokens
-+++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++-         ```"""
-+++-+        if not self._warmed_up:
-+++-+            self._warmed_up = True
-+++-+            self.warmup_moe_model()
-+++- 
-+++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++-         output_router_logits = (
-+++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++-             }
-+++-         )
-+++-         return model_inputs
-+++-+# @lwx
-+++-+    # def _decode_one_tokens_logits(
-+++-+    #     self,
-+++-+    #     cur_token: mindspore.Tensor,
-+++-+    #     input_pos: Optional[mindspore.Tensor],
-+++-+    #     cache_position: mindspore.Tensor,
-+++-+    #     past_key_values: StaticCache,
-+++-+    # ) -> mindspore.Tensor:
-+++-+    #     """
-+++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++-+        
-+++-+    #     Args:
-+++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++-+    #         input_pos: 输入位置信息，可选
-+++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++-+            
-+++-+    #     Returns:
-+++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++-+    #     """
-+++-+    #     # 调用JIT编译的版本
-+++-+    #     return self.get_decode_one_tokens_logits(
-+++-+    #         cur_token=cur_token,
-+++-+    #         input_pos=input_pos,
-+++-+    #         cache_position=cache_position,
-+++-+    #         past_key_values=past_key_values,
-+++-+    #     )
-+++-+    
-+++-+    # @mindspore.jit(jit_level='O1')
-+++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++-+    #     """
-+++-+    #     JIT编译的函数，用于高效的单token解码
-+++-+    #     使用JIT编译优化以支持静态shape和高效执行
-+++-+        
-+++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++-+    #     """
-+++-+    #     outputs = self.model.forward(
-+++-+    #         input_ids=cur_token,
-+++-+    #         position_ids=input_pos,
-+++-+    #         cache_position=cache_position,
-+++-+    #         past_key_values=past_key_values,
-+++-+    #         use_cache=True,
-+++-+    #         return_dict=False,
-+++-+    #     )
-+++-+        
-+++-+    #     hidden_states = outputs[0]
-+++-+    #     logits = self.lm_head.forward(hidden_states)
-+++-+    #     logits = logits.float()
-+++-+        
-+++-+    #     return logits[:, -1, :]
-+++-+
-+++-+    # def _sample(
-+++-+    #     self,
-+++-+    #     input_ids: mindspore.Tensor,
-+++-+    #     logits_processor,
-+++-+    #     stopping_criteria,
-+++-+    #     generation_config,
-+++-+    #     synced_devices: bool,
-+++-+    #     streamer=None,
-+++-+    #     logits_warper=None,
-+++-+    #     **model_kwargs,
-+++-+    # ):
-+++-+    #     """
-+++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++-+    #     """
-+++-+    #     from ...generation.logits_process import LogitsProcessorList
-+++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++-+    #     from mindnlp.core import nn, ops, no_grad
-+++-+    #     import numpy as np
-+++-+        
-+++-+    #     # 检查是否使用 StaticCache
-+++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++-+    #     # 否则，直接调用父类方法
-+++-+    #     past_key_values = model_kwargs.get("past_key_values")
-+++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++-+
-+++-+    #     if not isinstance(past_key_values, StaticCache):
-+++-+    #         # 不使用 StaticCache，直接调用父类方法
-+++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++-+    #         return super()._sample(
-+++-+    #             input_ids=input_ids,
-+++-+    #             logits_processor=logits_processor,
-+++-+    #             stopping_criteria=stopping_criteria,
-+++-+    #             generation_config=generation_config,
-+++-+    #             synced_devices=synced_devices,
-+++-+    #             streamer=streamer,
-+++-+    #             logits_warper=logits_warper,
-+++-+    #             **model_kwargs,
-+++-+    #         )
-+++-+        
-+++-+    #     # 使用 StaticCache，进入自定义循环
-+++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++-+    #     pad_token_id = generation_config._pad_token_tensor
-+++-+    #     output_attentions = generation_config.output_attentions
-+++-+    #     output_hidden_states = generation_config.output_hidden_states
-+++-+    #     output_scores = generation_config.output_scores
-+++-+    #     output_logits = generation_config.output_logits
-+++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++-+    #     max_length = generation_config.max_length
-+++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++-+    #     do_sample = generation_config.do_sample
-+++-+        
-+++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++-+    #         raise ValueError(
-+++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++-+    #             f"{logits_warper})."
-+++-+    #         )
-+++-+        
-+++-+    #     # init attention / hidden states / scores tuples
-+++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++-+        
-+++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++-+    #         encoder_hidden_states = (
-+++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++-+    #         )
-+++-+        
-+++-+    #     # keep track of which sequences are already finished
-+++-+    #     batch_size, cur_len = input_ids.shape
-+++-+    #     this_peer_finished = False
-+++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++-+        
-+++-+    #     time_record = []
-+++-+    #     from ....utils.testing_utils import parse_flag_from_env
-+++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++-+        
-+++-+    #     while self._has_unfinished_sequences(
-+++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++-+    #     ):
-+++-+    #         if _record_time:
-+++-+    #             import time as time_module
-+++-+    #             infer_start = time_module.time()
-+++-+            
-+++-+    #         # prepare model inputs
-+++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++-+            
-+++-+    #         # prepare variable output controls
-+++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++-+            
-+++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++-+    #         cur_cache_position = model_inputs.get("cache_position")
-+++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++-+    #         cur_input_ids = model_inputs.get("input_ids")
-+++-+            
-+++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++-+    #             cur_cache_position is not None and 
-+++-+    #             len(cur_cache_position.shape) > 0 and
-+++-+    #             cur_cache_position.shape[0] == 1 and
-+++-+    #             cur_input_ids is not None and
-+++-+    #             cur_input_ids.shape[1] == 1):
-+++-+    #             # 使用 JIT 优化的单 token 解码
-+++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++-+    #             if not hasattr(self, '_jit_used'):
-+++-+    #                 self._jit_used = False
-+++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++-+                
-+++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++-+    #                 cur_token=cur_input_ids,
-+++-+    #                 input_pos=model_inputs.get("position_ids"),
-+++-+    #                 cache_position=cur_cache_position,
-+++-+    #                 past_key_values=cur_past_key_values,
-+++-+    #             )
-+++-+                
-+++-+    #             # 标记已使用JIT（用于后续判断）
-+++-+    #             if not self._jit_used:
-+++-+    #                 self._jit_used = True
-+++-+                
-+++-+    #             # 构造兼容的输出对象
-+++-+    #             class JitOptimizedOutput:
-+++-+    #                 def __init__(self, logits, config):
-+++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++-+    #                     self.config = config
-+++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++-+    #                     self.cross_attentions = None
-+++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++-+                
-+++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++-+    #         else:
-+++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++-+    #             outputs = self(**model_inputs, return_dict=True)
-+++-+            
-+++-+    #         if synced_devices and this_peer_finished:
-+++-+    #             continue
-+++-+            
-+++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++-+    #         next_token_logits = outputs.logits[:, -1, :]
-+++-+            
-+++-+    #         # pre-process distribution
-+++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++-+    #         if do_sample:
-+++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++-+            
-+++-+    #         # Store scores, attentions and hidden_states when required
-+++-+    #         if return_dict_in_generate:
-+++-+    #             if output_scores:
-+++-+    #                 scores += (next_token_scores,)
-+++-+    #             if output_logits:
-+++-+    #                 raw_logits += (next_token_logits,)
-+++-+    #             if output_attentions:
-+++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++-+    #                 if self.config.is_encoder_decoder:
-+++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++-+                
-+++-+    #             if output_hidden_states:
-+++-+    #                 hidden = (
-+++-+    #                     outputs.decoder_hidden_states
-+++-+    #                     if self.config.is_encoder_decoder
-+++-+    #                     else outputs.hidden_states
-+++-+    #                 )
-+++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++-+            
-+++-+    #         # token selection
-+++-+    #         if do_sample:
-+++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++-+    #         else:
-+++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++-+            
-+++-+    #         # finished sentences should have their next token be a padding token
-+++-+    #         if has_eos_stopping_criteria:
-+++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++-+            
-+++-+    #         # update generated ids, model inputs, and length for next step
-+++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++-+    #         if streamer is not None:
-+++-+    #             streamer.put(next_tokens)
-+++-+            
-+++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++-+    #             outputs,
-+++-+    #             model_kwargs,
-+++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++-+    #         )
-+++-+            
-+++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++-+    #         cur_len += 1
-+++-+            
-+++-+    #         if _record_time:
-+++-+    #             import time as time_module
-+++-+    #             infer_stop = time_module.time()
-+++-+    #             time_record.append(infer_stop - infer_start)
-+++-+            
-+++-+    #         del outputs
-+++-+        
-+++-+    #     average_infer_time = None
-+++-+    #     if time_record:
-+++-+    #         if len(time_record) > 1:
-+++-+    #             time_record.pop(0)
-+++-+    #         average_infer_time = sum(time_record) / len(time_record)
-+++-+    #         print(f'average inference time is: {average_infer_time}')
-+++-+    #         print(f'inference time record: {time_record}')
-+++-+        
-+++-+    #     if streamer is not None:
-+++-+    #         streamer.end()
-+++-+        
-+++-+    #     # 简单判断：打印是否使用了JIT路径
-+++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++-+    #     else:
-+++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++-+        
-+++-+    #     if return_dict_in_generate:
-+++-+    #         if self.config.is_encoder_decoder:
-+++-+    #             return GenerateEncoderDecoderOutput(
-+++-+    #                 sequences=input_ids,
-+++-+    #                 scores=scores,
-+++-+    #                 logits=raw_logits,
-+++-+    #                 encoder_attentions=encoder_attentions,
-+++-+    #                 encoder_hidden_states=encoder_hidden_states,
-+++-+    #                 decoder_attentions=decoder_attentions,
-+++-+    #                 cross_attentions=cross_attentions,
-+++-+    #                 decoder_hidden_states=decoder_hidden_states,
-+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++-+    #                 average_infer_time=average_infer_time
-+++-+    #             )
-+++-+    #         else:
-+++-+    #             return GenerateDecoderOnlyOutput(
-+++-+    #                 sequences=input_ids,
-+++-+    #                 scores=scores,
-+++-+    #                 logits=raw_logits,
-+++-+    #                 attentions=decoder_attentions,
-+++-+    #                 hidden_states=decoder_hidden_states,
-+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++-+    #                 average_infer_time=average_infer_time
-+++-+    #             )
-+++-+    #     else:
-+++-+    #         return input_ids
-+++-+            
-+++-+    # def _prepare_cache_for_generation(
-+++-+    #     self,
-+++-+    #     generation_config,
-+++-+    #     model_kwargs,
-+++-+    #     assistant_model,
-+++-+    #     batch_size,
-+++-+    #     max_cache_length,
-+++-+    # ):
-+++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++-+    #         generation_config.cache_implementation = "static"
-+++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++-+        
-+++-+    #     if generation_config.cache_implementation == "static":
-+++-+    #         base_required_from_max_length = generation_config.max_length + 1
-+++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++-+    #         min_cache_size = 50
-+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++-+    #         else:
-+++-+    #             max_cache_length = max(base_required, min_cache_size)
-+++-+            
-+++-+    #         original_max_cache_length = max_cache_length
-+++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++-+            
-+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++-+    #             if max_cache_length > self.config.max_position_embeddings:
-+++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++-+        
-+++-+    #     result = super()._prepare_cache_for_generation(
-+++-+    #         generation_config=generation_config,
-+++-+    #         model_kwargs=model_kwargs,
-+++-+    #         assistant_model=assistant_model,
-+++-+    #         batch_size=batch_size,
-+++-+    #         max_cache_length=max_cache_length,
-+++-+    #     )
-+++-+        
-+++-+    #     if generation_config.cache_implementation == "static":
-+++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++-+    #         created_cache = model_kwargs.get(cache_name)
-+++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++-+    #             if created_cache.max_cache_len < generation_config.max_length:
-+++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++-+        
-+++-+    #     return result
-+++-+
-+++-+
-+++-+
-+++- 
-+++- 
-+++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++--- 
-+++-2.27.0
-+++-
-+++-- 
-+++2.27.0
-+++
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
deleted file mode 100644
index 695e3df9..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0007-20251107003commit.patch"
+++ /dev/null
@@ -1,8034 +0,0 @@
-From 2831c3ffbda41719e00e1cd83c3840bcb9dd79db Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Fri, 7 Nov 2025 12:12:51 +0800
-Subject: [PATCH 07/10] 20251107003commit
-
----
- .../models/deepseek/modeling_deepseek.py      |    2 +-
- patches/0001-20251104commit.patch             |    2 +-
- patches/0002-20251106commit.patch             |    2 +-
- patches/0003-20261106secondcommit.patch       |    2 +-
- patches/0004-20251106change.patch             |    2 +-
- patches/0005-20251107001commit.patch          |    2 +-
- patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
- 7 files changed, 7937 insertions(+), 6 deletions(-)
- create mode 100644 patches/0006-20251107002commit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index e7e1c053..ff631974 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
-     #     return expert_cache
-     
-     @no_grad()
--    dwj
-+    # dwj
-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-         # x 的 shape: (1, hidden_size)
-         # flat_expert_indices 的 shape: (num_experts_per_tok,)
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-index 2842180e..c9c8c5ee 100644
---- a/patches/0001-20251104commit.patch
-+++ b/patches/0001-20251104commit.patch
-@@ -1,7 +1,7 @@
- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/5] 20251104commit
-+Subject: [PATCH 1/6] 20251104commit
- 
- ---
-  mindnlp/transformers/cache_utils.py           |  28 +-
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-index c6cd8757..625656eb 100644
---- a/patches/0002-20251106commit.patch
-+++ b/patches/0002-20251106commit.patch
-@@ -1,7 +1,7 @@
- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/5] 20251106commit
-+Subject: [PATCH 2/6] 20251106commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-index 601960c9..dcb85080 100644
---- a/patches/0003-20261106secondcommit.patch
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -1,7 +1,7 @@
- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/5] 20261106secondcommit
-+Subject: [PATCH 3/6] 20261106secondcommit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-index 8976f10b..bbed13cc 100644
---- a/patches/0004-20251106change.patch
-+++ b/patches/0004-20251106change.patch
-@@ -1,7 +1,7 @@
- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 15:48:09 +0800
--Subject: [PATCH 4/5] 20251106change
-+Subject: [PATCH 4/6] 20251106change
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  189 +-
-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-index 8d9032be..b2d1035c 100644
---- a/patches/0005-20251107001commit.patch
-+++ b/patches/0005-20251107001commit.patch
-@@ -1,7 +1,7 @@
- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 11:48:18 +0800
--Subject: [PATCH 5/5] 20251107001commit
-+Subject: [PATCH 5/6] 20251107001commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |   91 +-
-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-new file mode 100644
-index 00000000..bffa134e
---- /dev/null
-+++ b/patches/0006-20251107002commit.patch
-@@ -0,0 +1,7931 @@
-+From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Fri, 7 Nov 2025 12:06:32 +0800
-+Subject: [PATCH 6/6] 20251107002commit
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |  122 +-
-+ patches/0001-20251104commit.patch             |    2 +-
-+ patches/0002-20251106commit.patch             |    2 +-
-+ patches/0003-20261106secondcommit.patch       |    2 +-
-+ patches/0004-20251106change.patch             |    2 +-
-+ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
-+ 6 files changed, 7773 insertions(+), 64 deletions(-)
-+ create mode 100644 patches/0005-20251107001commit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index 8831e4b7..e7e1c053 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
-+     #         expert_out = expert(x)
-+     #         expert_cache += expert_out * weight
-+     #     return expert_cache
-+-
-+-    # @no_grad()
-+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+-    #     # x 的 shape: (1, hidden_size)
-+-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+-
-+-    #     # 1. 收集所有需要的专家层
-+-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-+-
-+-    #     # 2. 并行计算所有专家的输出
-+-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-+-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+-
-+-    #     # 3. 使用矩阵乘法进行加权求和
-+-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-+-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++    
-++    @no_grad()
-++    dwj
-++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++        # x 的 shape: (1, hidden_size)
-++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++
-++        # 1. 收集所有需要的专家层
-++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++
-++        # 2. 并行计算所有专家的输出
-++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++        # ops.cat 会将它们堆叠成一个新的 Tensor
-++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++
-++        # 3. 使用矩阵乘法进行加权求和
-++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++        # 最终结果 final_output 的 shape: (1, hidden_size)
-++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+         
-+-    #     return final_output
-++        return final_output
-+ 
-+ 
-+     # @no_grad()
-+@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
-+ 
-+         return expert_cache
-+ # 放置在 DeepseekMoE 类中
-+-    @no_grad()
-+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+-        """
-+-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+-
-+-        Args:
-+-            x (Tensor): 输入张量, shape: (1, hidden_size)
-+-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+-        """
-+-        top_k, _ = flat_expert_weights.shape
-+-        hidden_size = x.shape[-1]
-+-
-+-        # 1. 将所有专家的权重堆叠起来
-+-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-++    # @no_grad()
-++    # #lwx 20251107
-++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++    #     """
-++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-++
-++    #     Args:
-++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
-++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-++    #     """
-++    #     top_k, _ = flat_expert_weights.shape
-++    #     hidden_size = x.shape[-1]
-++
-++    #     # 1. 将所有专家的权重堆叠起来
-++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+         
-+-        # 2. "收集" 所需的专家权重
-+-        selected_gate_w = stacked_gate_w[flat_expert_indices]
-+-        selected_up_w = stacked_up_w[flat_expert_indices]
-+-        selected_down_w = stacked_down_w[flat_expert_indices]
-++    #     # 2. "收集" 所需的专家权重
-++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
-++    #     selected_up_w = stacked_up_w[flat_expert_indices]
-++    #     selected_down_w = stacked_down_w[flat_expert_indices]
-+         
-+-        # 3. 准备输入
-+-        x_expanded = x.expand((top_k, 1, hidden_size))
-++    #     # 3. 准备输入
-++    #     x_expanded = x.expand((top_k, 1, hidden_size))
-+         
-+-        # 4. 并行计算 gate_proj 和 up_proj
-+-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-++    #     # 4. 并行计算 gate_proj 和 up_proj
-++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+ 
-+-        # 5. 计算中间状态
-+-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-++    #     # 5. 计算中间状态
-++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+         
-+-        # 6. 并行计算 down_proj
-+-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+-        #    --- [FIX] ---
-+-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+-        #    --- [FIX END] ---
-++    #     # 6. 并行计算 down_proj
-++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-++    #     #    --- [FIX] ---
-++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-++    #     #    --- [FIX END] ---
-+         
-+-        # 7. 根据路由权重进行加权求和
-+-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-++    #     # 7. 根据路由权重进行加权求和
-++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+ 
-+-        return weighted_sum
-++    #     return weighted_sum
-+ 
-+ 
-+ 
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+index 0a0ef2d7..2842180e 100644
-+--- a/patches/0001-20251104commit.patch
-++++ b/patches/0001-20251104commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+-Subject: [PATCH 1/4] 20251104commit
-++Subject: [PATCH 1/5] 20251104commit
-+ 
-+ ---
-+  mindnlp/transformers/cache_utils.py           |  28 +-
-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+index 5185270c..c6cd8757 100644
-+--- a/patches/0002-20251106commit.patch
-++++ b/patches/0002-20251106commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+-Subject: [PATCH 2/4] 20251106commit
-++Subject: [PATCH 2/5] 20251106commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+index 3e05f821..601960c9 100644
-+--- a/patches/0003-20261106secondcommit.patch
-++++ b/patches/0003-20261106secondcommit.patch
-+@@ -1,7 +1,7 @@
-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+-Subject: [PATCH 3/4] 20261106secondcommit
-++Subject: [PATCH 3/5] 20261106secondcommit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+index 88a1aef4..8976f10b 100644
-+--- a/patches/0004-20251106change.patch
-++++ b/patches/0004-20251106change.patch
-+@@ -1,7 +1,7 @@
-+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 15:48:09 +0800
-+-Subject: [PATCH 4/4] 20251106change
-++Subject: [PATCH 4/5] 20251106change
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  189 +-
-+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-+new file mode 100644
-+index 00000000..8d9032be
-+--- /dev/null
-++++ b/patches/0005-20251107001commit.patch
-+@@ -0,0 +1,7707 @@
-++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Fri, 7 Nov 2025 11:48:18 +0800
-++Subject: [PATCH 5/5] 20251107001commit
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |   91 +-
-++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
-++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
-++ patches/0001-20251104commit.patch             |    2 +-
-++ patches/0002-20251106commit.patch             |    2 +-
-++ patches/0003-20261106secondcommit.patch       |    2 +-
-++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
-++ 7 files changed, 7577 insertions(+), 30 deletions(-)
-++ create mode 100644 patches/0004-20251106change.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index 0546f318..8831e4b7 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
-++     #         expert_cache += expert_out * weight
-++     #     return expert_cache
-++ 
-++-    @no_grad()
-++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++-        # x 的 shape: (1, hidden_size)
-++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++-
-++-        # 1. 收集所有需要的专家层
-++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++-
-++-        # 2. 并行计算所有专家的输出
-++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++-        # ops.cat 会将它们堆叠成一个新的 Tensor
-++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++-
-++-        # 3. 使用矩阵乘法进行加权求和
-++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++-        # 最终结果 final_output 的 shape: (1, hidden_size)
-++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++    # @no_grad()
-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     # x 的 shape: (1, hidden_size)
-+++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++
-+++    #     # 1. 收集所有需要的专家层
-+++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++
-+++    #     # 2. 并行计算所有专家的输出
-+++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-+++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++
-+++    #     # 3. 使用矩阵乘法进行加权求和
-+++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-+++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++         
-++-        return final_output
-+++    #     return final_output
-++ 
-++ 
-++     # @no_grad()
-++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
-++             )
-++ 
-++         return expert_cache
-+++# 放置在 DeepseekMoE 类中
-+++    @no_grad()
-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++        """
-+++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+++
-+++        Args:
-+++            x (Tensor): 输入张量, shape: (1, hidden_size)
-+++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+++        """
-+++        top_k, _ = flat_expert_weights.shape
-+++        hidden_size = x.shape[-1]
-+++
-+++        # 1. 将所有专家的权重堆叠起来
-+++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+++        
-+++        # 2. "收集" 所需的专家权重
-+++        selected_gate_w = stacked_gate_w[flat_expert_indices]
-+++        selected_up_w = stacked_up_w[flat_expert_indices]
-+++        selected_down_w = stacked_down_w[flat_expert_indices]
-+++        
-+++        # 3. 准备输入
-+++        x_expanded = x.expand((top_k, 1, hidden_size))
-+++        
-+++        # 4. 并行计算 gate_proj 和 up_proj
-+++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+++
-+++        # 5. 计算中间状态
-+++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+++        
-+++        # 6. 并行计算 down_proj
-+++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+++        #    --- [FIX] ---
-+++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+++        #    --- [FIX END] ---
-+++        
-+++        # 7. 根据路由权重进行加权求和
-+++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+++
-+++        return weighted_sum
-+++
-+++
-++ 
-++         # @no_grad()
-++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++index ebd7782e..913a7609 100644
-++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++-    x1 = x[..., : x.shape[-1] // 2]
-++-    x2 = x[..., x.shape[-1] // 2 :]
-+++    # x1 = x[..., : x.shape[-1] // 2]
-+++    # x2 = x[..., x.shape[-1] // 2 :]
-++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++index d059dcbe..2b217b64 100644
-++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
-++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++ def rotate_half(x):
-++     """Rotates half the hidden dims of the input."""
-++-    x1 = x[..., : x.shape[-1] // 2]
-++-    x2 = x[..., x.shape[-1] // 2 :]
-+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++    # x1 = x[..., : x.shape[-1] // 2]
-+++    # x2 = x[..., x.shape[-1] // 2 :]
-+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++     return ops.cat((-x2, x1), dim=-1)
-++ 
-++ 
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++index 78f22642..0a0ef2d7 100644
-++--- a/patches/0001-20251104commit.patch
-+++++ b/patches/0001-20251104commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-++-Subject: [PATCH 1/3] 20251104commit
-+++Subject: [PATCH 1/4] 20251104commit
-++ 
-++ ---
-++  mindnlp/transformers/cache_utils.py           |  28 +-
-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++index 22b65dd5..5185270c 100644
-++--- a/patches/0002-20251106commit.patch
-+++++ b/patches/0002-20251106commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-++-Subject: [PATCH 2/3] 20251106commit
-+++Subject: [PATCH 2/4] 20251106commit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++index 966529e4..3e05f821 100644
-++--- a/patches/0003-20261106secondcommit.patch
-+++++ b/patches/0003-20261106secondcommit.patch
-++@@ -1,7 +1,7 @@
-++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-++-Subject: [PATCH 3/3] 20261106secondcommit
-+++Subject: [PATCH 3/4] 20261106secondcommit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-++new file mode 100644
-++index 00000000..88a1aef4
-++--- /dev/null
-+++++ b/patches/0004-20251106change.patch
-++@@ -0,0 +1,7498 @@
-+++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Thu, 6 Nov 2025 15:48:09 +0800
-+++Subject: [PATCH 4/4] 20251106change
-+++
-+++---
-+++ .../models/deepseek/modeling_deepseek.py      |  189 +-
-+++ patches/0001-20251104commit.patch             | 1272 +++++++
-+++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-+++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-+++ 4 files changed, 7244 insertions(+), 186 deletions(-)
-+++ create mode 100644 patches/0001-20251104commit.patch
-+++ create mode 100644 patches/0002-20251106commit.patch
-+++ create mode 100644 patches/0003-20261106secondcommit.patch
-+++
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index 2f9192bf..0546f318 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-+++ 
-+++         return attn_output, attn_weights, past_key_value
-+++ 
-+++-# class DeepseekFlashAttention(nn.Module):
-+++-#     """
-+++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+++-
-+++-#     This class is designed as a drop-in replacement for DeepseekAttention.
-+++-#     """
-+++-
-+++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++-#         super().__init__()
-+++-#         self.config = config
-+++-#         self.layer_idx = layer_idx
-+++-#         if layer_idx is None:
-+++-#             logger.warning(
-+++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++-#                 "when creating this class."
-+++-#             )
-+++-
-+++-#         self.attention_dropout = config.attention_dropout
-+++-#         self.hidden_size = config.hidden_size
-+++-#         self.num_heads = config.num_attention_heads
-+++-#         self.head_dim = self.hidden_size // self.num_heads
-+++-#         self.num_key_value_heads = config.num_key_value_heads
-+++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++-#         self.max_position_embeddings = config.max_position_embeddings
-+++-#         self.rope_theta = config.rope_theta
-+++-#         self.is_causal = True
-+++-
-+++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++-#             raise ValueError(
-+++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++-#                 f" and `num_heads`: {self.num_heads})."
-+++-#             )
-+++-
-+++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++-#         self._init_rope()
-+++-
-+++-#     def _init_rope(self):
-+++-#         if self.config.rope_scaling is None:
-+++-#             self.rotary_emb = DeepseekRotaryEmbedding(
-+++-#                 self.head_dim,
-+++-#                 max_position_embeddings=self.max_position_embeddings,
-+++-#                 base=self.rope_theta,
-+++-#             )
-+++-#         else:
-+++-#             scaling_type = self.config.rope_scaling["type"]
-+++-#             scaling_factor = self.config.rope_scaling["factor"]
-+++-#             if scaling_type == "linear":
-+++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++-#                     self.head_dim,
-+++-#                     max_position_embeddings=self.max_position_embeddings,
-+++-#                     scaling_factor=scaling_factor,
-+++-#                     base=self.rope_theta,
-+++-#                 )
-+++-#             elif scaling_type == "dynamic":
-+++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++-#                     self.head_dim,
-+++-#                     max_position_embeddings=self.max_position_embeddings,
-+++-#                     scaling_factor=scaling_factor,
-+++-#                     base=self.rope_theta,
-+++-#                 )
-+++-#             else:
-+++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++-
-+++-#     def forward(
-+++-#         self,
-+++-#         hidden_states: mindspore.Tensor,
-+++-#         attention_mask: Optional[mindspore.Tensor] = None,
-+++-#         position_ids: Optional[mindspore.Tensor] = None,
-+++-#         past_key_value: Optional[Cache] = None,
-+++-#         output_attentions: bool = False,
-+++-#         use_cache: bool = False,
-+++-#         **kwargs,
-+++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++-#         if "padding_mask" in kwargs:
-+++-#             warnings.warn(
-+++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++-#             )
-+++-        
-+++-#         if output_attentions:
-+++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+++-
-+++-#         bsz, q_len, _ = hidden_states.shape
-+++-
-+++-#         if self.config.pretraining_tp > 1:
-+++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++-
-+++-#         query_states = self.q_proj(hidden_states)
-+++-#         key_states = self.k_proj(hidden_states)
-+++-#         value_states = self.v_proj(hidden_states)
-+++-
-+++-#         # Reshape for multi-head attention
-+++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++-
-+++-#         kv_seq_len = key_states.shape[-2]
-+++-#         if past_key_value is not None:
-+++-#             if self.layer_idx is None:
-+++-#                 raise ValueError(
-+++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++-#                     "with a layer index."
-+++-#                 )
-+++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++-        
-+++-#         # Apply Rotary Positional Embedding
-+++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++-
-+++-#         if past_key_value is not None:
-+++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++-
-+++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++-        
-+++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++-        
-+++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++-
-+++-#         # Convert attention_mask for flash_attention_score
-+++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+++-#         if attention_mask is not None:
-+++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++-#                 raise ValueError(
-+++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++-#                 )
-+++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+++-#         else:
-+++-#             attn_mask_for_fa = None
-+++-        
-+++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++-
-+++-#         # Call the fused flash_attention_score operator
-+++-#         attn_output = mindspore.ops.flash_attention_score(
-+++-#             query=query_states_for_fa,
-+++-#             key=key_states_for_fa,
-+++-#             value=value_states_for_fa,
-+++-#             head_num=self.num_heads, # This is N1, the number of query heads
-+++-#             input_layout='BSH',
-+++-#             attn_mask=attn_mask_for_fa,
-+++-#             keep_prob=keep_prob,
-+++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++-#             sparse_mode=0 # Default mask mode
-+++-#         )
-+++-        
-+++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+++-#         attn_output = self.o_proj(attn_output)
-+++-        
-+++-#         # Flash Attention does not return attention weights
-+++-#         attn_weights = None
-+++-
-+++-#         return attn_output, attn_weights, past_key_value
-+++ 
-+++ class DeepseekFlashAttention(nn.Module):
-+++     """
-+++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-+++         super().__init__()
-+++         self.hidden_size = config.hidden_size
-+++ 
-+++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+++-            config=config, layer_idx=layer_idx
-+++-        )
-++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-++++            # config=config, layer_idx=layer_idx
-++++        # )
-+++ 
-+++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+++             config=config, layer_idx=layer_idx
-+++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-+++         return outputs
-+++ 
-+++ 
-+++-
-+++ class DeepseekPreTrainedModel(PreTrainedModel):
-+++     config_class = DeepseekConfig
-+++     base_model_prefix = "model"
-+++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++         # Initialize weights and apply final processing
-+++         self.post_init()
-+++         self.warm_up = False
-+++-        #@dwj
-+++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+++-            self.num_layers,
-+++-            self.num_attention_heads,
-+++-            self.head_dim,
-+++-            batch_size=1,
-+++-            max_length=self.max_length,
-+++-            dtype=mindspore.float16
-+++-        )
-+++-
-+++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+++-        key_cache = []
-+++-        value_cache = []
-+++-        for _ in range(num_layers):
-+++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++-            key_cache.append(k)
-+++-            value_cache.append(v)
-+++-        return key_cache, value_cache
-+++-
-+++ 
-+++     def warmup_moe_model_deep(self):
-+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++new file mode 100644
-+++index 00000000..78f22642
-+++--- /dev/null
-++++++ b/patches/0001-20251104commit.patch
-+++@@ -0,0 +1,1272 @@
-++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++Subject: [PATCH 1/3] 20251104commit
-++++
-++++---
-++++ mindnlp/transformers/cache_utils.py           |  28 +-
-++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++++
-++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++index cadd2e04..02f8d4be 100644
-++++--- a/mindnlp/transformers/cache_utils.py
-+++++++ b/mindnlp/transformers/cache_utils.py
-++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++             # k_out[:, :, cache_position] = key_states
-++++             # v_out[:, :, cache_position] = value_states
-++++-            if ON_ORANGE_PI:
-++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++-            else:
-++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++-
-+++++            # if ON_ORANGE_PI:
-+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++            # else:
-+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++++            if cache_position.ndim > 1:
-+++++                cache_position = cache_position.flatten()
-+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++++                cache_position = cache_position.int()
-+++++            
-+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++++            k_out[:, :, cache_position] = key_states
-+++++            v_out[:, :, cache_position] = value_states
-+++++            
-++++         return k_out, v_out
-++++ 
-++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index c695b944..d8303e45 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++-    x1 = x[..., : x.shape[-1] // 2]
-++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++         if self.training:
-++++             raise NotImplementedError("Training is not supported yet.")
-++++         else:
-++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++-        if self.config.n_shared_experts is not None:
-++++-            y = y + self.shared_experts(identity)
-++++-        return y
-+++++            # @lwx
-+++++            if orig_shape[1] == 1:
-+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++++                y=y.view(*orig_shape)
-+++++                if self.config.n_shared_experts is not None:
-+++++                    y = y + self.shared_experts(identity)
-+++++                return y
-+++++            else:
-+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++++                if self.config.n_shared_experts is not None:
-+++++                    y = y + self.shared_experts(identity)
-+++++                return y
-+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++        # if self.config.n_shared_experts is not None:
-+++++        #     y = y + self.shared_experts(identity)
-+++++        # return y
-+++++        
-+++++    @no_grad()
-+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++
-+++++        expert_cache = ops.zeros_like(x)
-+++++        for i in range(self.num_experts_per_tok):
-+++++            expert_id = flat_expert_indices[i].item()
-+++++            weight = flat_expert_weights[i].item()
-+++++            expert = self.experts[expert_id]
-+++++            expert_out = expert(x)
-+++++            expert_cache += expert_out * weight
-+++++        return expert_cache
-++++ 
-++++     @no_grad()
-++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-        # expert_cache = torch.zeros_like(x)
-++++-        # idxs = flat_expert_indices.argsort()
-++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++-        # token_idxs = idxs // self.num_experts_per_tok
-++++-        # for i, end_idx in enumerate(tokens_per_expert):
-++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++-        #     if start_idx == end_idx:
-++++-        #         continue
-++++-        #     expert = self.experts[i]
-++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++-        #     expert_tokens = x[exp_token_idx]
-++++-        #     expert_out = expert(expert_tokens)
-++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++-        # return expert_cache
-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++         expert_cache = ops.zeros_like(x)
-++++         idxs = flat_expert_indices.argsort()
-++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++         token_idxs = idxs // self.num_experts_per_tok
-+++++
-++++         for i, end_idx in enumerate(tokens_per_expert):
-++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++             if start_idx == end_idx:
-++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++             expert_out = expert(expert_tokens)
-++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++
-++++         return expert_cache
-+++++        
-+++++    # @no_grad()
-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++    #     # expert_cache = torch.zeros_like(x)
-+++++    #     # idxs = flat_expert_indices.argsort()
-+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++    #     #     if start_idx == end_idx:
-+++++    #     #         continue
-+++++    #     #     expert = self.experts[i]
-+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #     #     expert_tokens = x[exp_token_idx]
-+++++    #     #     expert_out = expert(expert_tokens)
-+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++    #     # return expert_cache
-+++++    #     expert_cache = ops.zeros_like(x)
-+++++    #     idxs = flat_expert_indices.argsort()
-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++    #         if start_idx == end_idx:
-+++++    #             continue
-+++++    #         expert = self.experts[i]
-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #         expert_tokens = x[exp_token_idx]
-+++++    #         expert_out = expert(expert_tokens)
-+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++
-+++++    #     return expert_cache
-+++++    # @no_grad()
-+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++    #     expert_cache = ops.zeros_like(x)
-+++++
-+++++    #     # 排序保证顺序一致
-+++++    #     idxs = flat_expert_indices.argsort()
-+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++    #     # 找出有 token 的专家
-+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++
-+++++    #     for i in active_experts.tolist():
-+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++    #         end_idx = tokens_per_expert[i]
-+++++    #         if start_idx == end_idx:  # 没有 token
-+++++    #             continue
-+++++
-+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++    #         expert_tokens = x[exp_token_idx]
-+++++    #         expert_out = self.experts[i](expert_tokens)
-+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++
-+++++    #         expert_cache = mindspore.mint.scatter_add(
-+++++    #             expert_cache,
-+++++    #             0,
-+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++    #             expert_out
-+++++    #         )
-+++++
-+++++    #     return expert_cache
-+++++
-+++++
-++++ 
-++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++ #     """
-++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++ 
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-+++++        self.warm_up = False
-+++++
-+++++    def warmup_moe_model_deep(self):
-+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++        test_texts = [
-+++++            "warmup short",
-+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++++        ]
-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++        if tokenizer is None:
-+++++            from mindnlp.transformers import AutoTokenizer
-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++            self._warmup_tokenizer = tokenizer
-+++++
-+++++        for text in test_texts:
-+++++            inputs = tokenizer(text, return_tensors="ms")
-+++++            with mindspore._no_grad():
-+++++                _ = self(**inputs, use_cache=False)
-+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++ 
-++++     def get_input_embeddings(self):
-++++         return self.model.embed_tokens
-++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++         ```"""
-+++++        if not self.warm_up:
-+++++            self.warm_up = True
-+++++            self.warmup_moe_model_deep()
-+++++
-++++         output_attentions = (
-++++             output_attentions
-++++             if output_attentions is not None
-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++index 3cbf820e..d4c6b651 100644
-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++@@ -18,7 +18,6 @@
-++++ # See the License for the specific language governing permissions and
-++++ # limitations under the License.
-++++ """MindSpore Qwen2MoE model."""
-++++-
-++++ import math
-++++ from typing import List, Optional, Tuple, Union
-++++ 
-++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++     TokenClassifierOutput,
-++++ )
-++++ from ...modeling_utils import PreTrainedModel
-+++++from ...generation import GenerationMixin
-++++ from ....utils import logging
-++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++++ 
-++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++         self.variance_epsilon = eps
-++++ 
-++++     def forward(self, hidden_states):
-+++++        # @dwj
-+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++        # @lwx
-+++++        # if not self.training :
-+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++         input_dtype = hidden_states.dtype
-++++         hidden_states = hidden_states.to(mindspore.float32)
-++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++@@ -234,6 +239,8 @@ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++     x1 = x[..., : x.shape[-1] // 2]
-++++     x2 = x[..., x.shape[-1] // 2 :]
-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++         self.config = config
-++++         self.hidden_size = config.hidden_size
-++++         self.intermediate_size = intermediate_size
-+++++        
-++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++         self.act_fn = ACT2FN[config.hidden_act]
-++++ 
-++++     def forward(self, x):
-++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++-
-++++ 
-+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++        # @lwx 
-+++++        # gate_up_output = self.gate_up_proj(x)
-+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++++        # return self.down_proj(swiglu_output)
-+++++
-+++++    # def forward(self, x):
-+++++    #     gate_proj_out = self.gate_proj(x)
-+++++    #     up_proj_out = self.up_proj(x)
-+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++++    #     return self.down_proj(swiglu_out)
-+++++        
-++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++     """
-++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++         use_cache: bool = False,
-++++         cache_position: Optional[mindspore.Tensor] = None,
-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++        
-+++++
-++++         bsz, q_len, _ = hidden_states.shape
-++++ 
-++++         query_states = self.q_proj(hidden_states)
-++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++                     "with a layer index."
-++++                 )
-++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++            if isinstance(past_key_value, StaticCache):
-+++++                kv_seq_len = key_states.shape[-2]
-+++++            else:
-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++ 
-++++         if past_key_value is not None:
-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++            
-+++++            if isinstance(past_key_value, StaticCache):
-+++++                kv_seq_len = key_states.shape[-2]
-++++ 
-++++         # repeat k/v heads if n_kv_heads < n_heads
-++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++-
-+++++        
-++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++ 
-++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++-            raise ValueError(
-++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++-                f" {attn_weights.shape}"
-++++-            )
-++++-
-++++-        if attention_mask is not None:  # no matter the length, we just slice it
-++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++++        if attention_mask is not None:
-+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++             attn_weights = attn_weights + causal_mask
-++++ 
-++++         # upcast attention to fp32
-++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++ 
-++++         attn_output = self.o_proj(attn_output)
-++++-
-+++++        # @lwx
-+++++        
-+++++        # max_seq_len = self.max_position_embeddings  # 2048
-+++++
-+++++        # if attention_mask is not None:
-+++++        #     # attention_mask: [B, 1, Sq, Sk]
-+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++
-+++++        #     # pad 到 [max_seq_len, max_seq_len]
-+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++        #     global_attention_mask = padded_mask
-+++++        # else:
-+++++        #     global_attention_mask = None
-+++++
-+++++
-+++++        # sparse_mode=3
-+++++        # attn_output = mindspore.ops.flash_attention_score(
-+++++        #     query=query_states,
-+++++        #     key=key_states,
-+++++        #     value=value_states,
-+++++        #     real_shift=None,
-+++++        #     padding_mask=None,
-+++++
-+++++        #     head_num=self.num_heads,
-+++++        #     attn_mask=global_attention_mask,
-+++++        #     keep_prob=1.0 - self.attention_dropout,
-+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++        #     input_layout="BNSD",
-+++++        #     pre_tokens=2147483647,
-+++++        #     next_tokens=2147483647,
-+++++        #     inner_precise=0,
-+++++        #     drop_mask=None,
-+++++        #     prefix=None,
-+++++        #     actual_seq_qlen=None,
-+++++        #     actual_seq_kvlen=None,
-+++++        #     sparse_mode=sparse_mode,
-+++++        # )
-++++         if not output_attentions:
-++++             attn_weights = None
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++ 
-+++++class Qwen2MoeFlashAttention(nn.Module):
-+++++    """
-+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++
-+++++    关键改动:
-+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++       直接传入原始的 key 和 value 张量效率更高。
-+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++    """
-+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++        super().__init__()
-+++++        self.config = config
-+++++        self.layer_idx = layer_idx
-+++++        self.hidden_size = config.hidden_size
-+++++        self.num_heads = config.num_attention_heads
-+++++        self.head_dim = self.hidden_size // self.num_heads
-+++++        self.num_key_value_heads = config.num_key_value_heads
-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++        self.max_position_embeddings = config.max_position_embeddings
-+++++        self.rope_theta = config.rope_theta
-+++++        self.attention_dropout = config.attention_dropout
-+++++
-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++            raise ValueError(
-+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++            )
-+++++
-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++
-+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++            self.head_dim,
-+++++            max_position_embeddings=self.max_position_embeddings,
-+++++            base=self.rope_theta,
-+++++        )
-+++++
-+++++    def forward(
-+++++        self,
-+++++        hidden_states: mindspore.Tensor,
-+++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++        past_key_value: Optional[Cache] = None,
-+++++        output_attentions: bool = False,
-+++++        use_cache: bool = False,
-+++++        cache_position: Optional[mindspore.Tensor] = None,
-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++        bsz, q_len, _ = hidden_states.shape
-+++++
-+++++        # 1. 线性投射 Q, K, V
-+++++        query_states = self.q_proj(hidden_states)
-+++++        key_states = self.k_proj(hidden_states)
-+++++        value_states = self.v_proj(hidden_states)
-+++++
-+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++        # 3. RoPE 旋转位置编码
-+++++        kv_seq_len = key_states.shape[-2]
-+++++        if past_key_value is not None:
-+++++            if self.layer_idx is None:
-+++++                raise ValueError(
-+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++                    "with a layer index."
-+++++                )
-+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++                if cache_position.shape[0] == 1:
-+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++                    kv_seq_len = past_seen_tokens + 1
-+++++                else:
-+++++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++            else:
-+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++        # 4. KV 缓存更新
-+++++        if past_key_value is not None:
-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++            key_states, value_states = past_key_value.update(
-+++++                key_states, value_states, self.layer_idx, cache_kwargs
-+++++            )
-+++++            
-+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++                if cache_position.shape[0] == 1:
-+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++                    kv_seq_len = key_states.shape[-2]
-+++++
-+++++        # 5. [重要] 准备 Attention Mask
-+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++        fa_attention_mask = None
-+++++        if attention_mask is not None:
-+++++            # 截取与当前key长度匹配的部分
-+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++            fa_attention_mask = (mask_slice != 0)
-+++++
-+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++        input_dtype = query_states.dtype
-+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++            query_states = query_states.to(mindspore.float16)
-+++++            key_states = key_states.to(mindspore.float16)
-+++++            value_states = value_states.to(mindspore.float16)
-+++++
-+++++        # 6. [核心] 调用 flash_attention_score 算子
-+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++        attn_output = mindspore.ops.flash_attention_score(
-+++++            query=query_states,
-+++++            key=key_states,
-+++++            value=value_states,
-+++++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++            attn_mask=fa_attention_mask,
-+++++            keep_prob=1.0 - self.attention_dropout,
-+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++            input_layout="BNSD",
-+++++            sparse_mode=0 # 使用 defaultMask 模式
-+++++        )
-+++++
-+++++        # 恢复原始数据类型
-+++++        attn_output = attn_output.to(input_dtype)
-+++++
-+++++        # 7. 调整输出形状
-+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++        attn_output = self.o_proj(attn_output)
-+++++
-+++++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++        attn_weights = None
-+++++        if output_attentions:
-+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++        return attn_output, attn_weights, past_key_value
-+++++
-+++++    # def forward(
-+++++    #     self,
-+++++    #     hidden_states: mindspore.Tensor,
-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++    #     past_key_value: Optional[Cache] = None,
-+++++    #     output_attentions: bool = False,
-+++++    #     use_cache: bool = False,
-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++    #     bsz, q_len, _ = hidden_states.shape
-+++++
-+++++    #     # 1. 线性投射 Q, K, V
-+++++    #     query_states = self.q_proj(hidden_states)
-+++++    #     key_states = self.k_proj(hidden_states)
-+++++    #     value_states = self.v_proj(hidden_states)
-+++++
-+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++    #     # 3. RoPE 旋转位置编码
-+++++    #     kv_seq_len = key_states.shape[-2]
-+++++    #     if past_key_value is not None:
-+++++    #         if self.layer_idx is None:
-+++++    #             raise ValueError(
-+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++    #                 "with a layer index."
-+++++    #             )
-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++    #     # 4. KV 缓存更新
-+++++    #     if past_key_value is not None:
-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++    #         key_states, value_states = past_key_value.update(
-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++    #         )
-+++++
-+++++    #     # 5. 准备 Attention Mask
-+++++    #     fa_attention_mask = None
-+++++    #     if attention_mask is not None:
-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++    #         fa_attention_mask = (mask_slice != 0)
-+++++
-+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++    #     input_dtype = query_states.dtype
-+++++
-+++++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++    #         query=query_states,
-+++++    #         key=key_states,
-+++++    #         value=value_states,
-+++++    #         head_num=self.num_heads,
-+++++    #         attn_mask=fa_attention_mask,
-+++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++    #         input_layout="BNSD",
-+++++    #         sparse_mode=0,
-+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++    #         inner_precise=1
-+++++    #     )
-+++++
-+++++    #     # 恢复原始数据类型
-+++++    #     attn_output = attn_output.to(input_dtype)
-+++++
-+++++    #     # 7. 调整输出形状
-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++    #     attn_output = self.o_proj(attn_output)
-+++++
-+++++    #     attn_weights = None
-+++++    #     if output_attentions:
-+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++    #     return attn_output, attn_weights, past_key_value
-+++++
-+++++    # def forward(
-+++++    #     self,
-+++++    #     hidden_states: mindspore.Tensor,
-+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++    #     past_key_value: Optional[Cache] = None,
-+++++    #     output_attentions: bool = False,
-+++++    #     use_cache: bool = False,
-+++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++    #     bsz, q_len, _ = hidden_states.shape
-+++++
-+++++    #     query_states = self.q_proj(hidden_states)
-+++++    #     key_states = self.k_proj(hidden_states)
-+++++    #     value_states = self.v_proj(hidden_states)
-+++++
-+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++    #     kv_seq_len = key_states.shape[-2]
-+++++    #     if past_key_value is not None:
-+++++    #         if self.layer_idx is None:
-+++++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++    #     if past_key_value is not None:
-+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++    #         key_states, value_states = past_key_value.update(
-+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++    #         )
-+++++
-+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++
-+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++    #     # <--- 修改结束 ---
-+++++
-+++++    #     fa_attention_mask = None
-+++++    #     if attention_mask is not None:
-+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++    #         fa_attention_mask = (mask_slice != 0)
-+++++
-+++++    #     input_dtype = query_states.dtype
-+++++
-+++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++    #         key=key_states,
-+++++    #         value=value_states,
-+++++    #         head_num=self.num_heads,
-+++++    #         attn_mask=fa_attention_mask,
-+++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++    #         input_layout="BNSD",
-+++++    #         sparse_mode=0,
-+++++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++    #     )
-+++++
-+++++    #     attn_output = attn_output.to(input_dtype)
-+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++    #     attn_output = self.o_proj(attn_output)
-+++++
-+++++    #     attn_weights = None
-+++++    #     if output_attentions:
-+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++
-+++++    #     return attn_output, attn_weights, past_key_value
-+++++
-++++ QWEN2MOE_ATTENTION_CLASSES = {
-++++     "eager": Qwen2MoeAttention,
-+++++    "flash-attention": Qwen2MoeFlashAttention,
-++++ }
-++++ 
-++++ 
-++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++ 
-+++++    #@dwj
-+++++    # 只遍历激活的专家，而非全部专家
-++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++++-        # router_logits: (batch * sequence_length, n_experts)
-++++-        router_logits = self.gate(hidden_states)
-++++-
-++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-        if self.norm_topk_prob:
-++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        # we cast back to the input dtype
-++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++++-
-++++-        final_hidden_states = ops.zeros(
-++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++-        )
-++++-
-++++-        # One hot encode the selected experts to create an expert mask
-++++-        # this will be used to easily index which expert is going to be sollicitated
-++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++-
-++++-        # Loop over all available experts in the model and perform the computation on each expert
-++++-        for expert_idx in range(self.num_experts):
-++++-            expert_layer = self.experts[expert_idx]
-++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++-
-++++-            # Index the correct hidden states and compute the expert hidden state for
-++++-            # the current expert. We need to make sure to multiply the output hidden
-++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++-            if 0 not in idx.shape:
-++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++-
-++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++++-                # the `top_x` tensor here.
-++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++-
-++++-        shared_expert_output = self.shared_expert(hidden_states)
-++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++-
-++++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++            num_tokens = hidden_states_reshaped.shape[0]
-+++++
-+++++            router_logits = self.gate(hidden_states_reshaped)
-+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++
-+++++            if self.norm_topk_prob:
-+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++            
-+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++            flat_selected_experts = selected_experts.flatten()
-+++++            
-+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++            token_indices = broadcasted_token_indices.flatten()
-+++++            
-+++++            active_experts = ops.unique(flat_selected_experts)
-+++++            
-+++++            for expert_idx_tensor in active_experts:
-+++++                expert_idx = expert_idx_tensor.item()
-+++++                expert_layer = self.experts[expert_idx]
-+++++                
-+++++                mask = (flat_selected_experts == expert_idx_tensor)
-+++++                selected_token_indices = token_indices[mask]
-+++++                selected_routing_weights = routing_weights.flatten()[mask]
-+++++                
-+++++                current_states = hidden_states_reshaped[selected_token_indices]
-+++++                
-+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++                
-+++++                final_hidden_states = final_hidden_states.index_add(
-+++++                    dim=0,
-+++++                    index=selected_token_indices,
-+++++                    source=expert_output.to(hidden_states.dtype)
-+++++                )
-+++++            
-+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++ 
-++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++-        return final_hidden_states, router_logits
-+++++            final_hidden_states = final_hidden_states + shared_expert_output
-+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++            
-+++++            return final_hidden_states, router_logits
-++++ 
-++++ 
-++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++ 
-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++ 
-+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++
-++++         if (layer_idx not in config.mlp_only_layers) and (
-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++         ):
-++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++     _skip_keys_device_placement = "past_key_values"
-++++     _supports_cache_class = True
-+++++#lwx
-+++++    # _supports_static_cache = True
-++++ 
-++++     def _init_weights(self, module):
-++++         std = self.config.initializer_range
-++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++         return causal_mask
-++++ 
-++++ 
-++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++     _tied_weights_keys = ["lm_head.weight"]
-++++ 
-++++     def __init__(self, config):
-++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++         self.num_experts_per_tok = config.num_experts_per_tok
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-+++++        # @lwx
-+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++++        #     self.generation_config.cache_implementation = "static"
-+++++        self._warmed_up = False
-+++++
-+++++    def warmup_moe_model(self):
-+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++++        test_texts = [
-+++++            "warmup short",
-+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++++        ]
-+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++        if tokenizer is None:
-+++++            from mindnlp.transformers import AutoTokenizer
-+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++            self._warmup_tokenizer = tokenizer
-+++++
-+++++        for text in test_texts:
-+++++            inputs = tokenizer(text, return_tensors="ms")
-+++++            with mindspore._no_grad():
-+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++ 
-++++     def get_input_embeddings(self):
-++++         return self.model.embed_tokens
-++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++         ```"""
-+++++        if not self._warmed_up:
-+++++            self._warmed_up = True
-+++++            self.warmup_moe_model()
-++++ 
-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++         output_router_logits = (
-++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++             }
-++++         )
-++++         return model_inputs
-+++++# @lwx
-+++++    # def _decode_one_tokens_logits(
-+++++    #     self,
-+++++    #     cur_token: mindspore.Tensor,
-+++++    #     input_pos: Optional[mindspore.Tensor],
-+++++    #     cache_position: mindspore.Tensor,
-+++++    #     past_key_values: StaticCache,
-+++++    # ) -> mindspore.Tensor:
-+++++    #     """
-+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++++        
-+++++    #     Args:
-+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++++    #         input_pos: 输入位置信息，可选
-+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++++            
-+++++    #     Returns:
-+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++++    #     """
-+++++    #     # 调用JIT编译的版本
-+++++    #     return self.get_decode_one_tokens_logits(
-+++++    #         cur_token=cur_token,
-+++++    #         input_pos=input_pos,
-+++++    #         cache_position=cache_position,
-+++++    #         past_key_values=past_key_values,
-+++++    #     )
-+++++    
-+++++    # @mindspore.jit(jit_level='O1')
-+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++++    #     """
-+++++    #     JIT编译的函数，用于高效的单token解码
-+++++    #     使用JIT编译优化以支持静态shape和高效执行
-+++++        
-+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++++    #     """
-+++++    #     outputs = self.model.forward(
-+++++    #         input_ids=cur_token,
-+++++    #         position_ids=input_pos,
-+++++    #         cache_position=cache_position,
-+++++    #         past_key_values=past_key_values,
-+++++    #         use_cache=True,
-+++++    #         return_dict=False,
-+++++    #     )
-+++++        
-+++++    #     hidden_states = outputs[0]
-+++++    #     logits = self.lm_head.forward(hidden_states)
-+++++    #     logits = logits.float()
-+++++        
-+++++    #     return logits[:, -1, :]
-+++++
-+++++    # def _sample(
-+++++    #     self,
-+++++    #     input_ids: mindspore.Tensor,
-+++++    #     logits_processor,
-+++++    #     stopping_criteria,
-+++++    #     generation_config,
-+++++    #     synced_devices: bool,
-+++++    #     streamer=None,
-+++++    #     logits_warper=None,
-+++++    #     **model_kwargs,
-+++++    # ):
-+++++    #     """
-+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++++    #     """
-+++++    #     from ...generation.logits_process import LogitsProcessorList
-+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++++    #     from mindnlp.core import nn, ops, no_grad
-+++++    #     import numpy as np
-+++++        
-+++++    #     # 检查是否使用 StaticCache
-+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++++    #     # 否则，直接调用父类方法
-+++++    #     past_key_values = model_kwargs.get("past_key_values")
-+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++++
-+++++    #     if not isinstance(past_key_values, StaticCache):
-+++++    #         # 不使用 StaticCache，直接调用父类方法
-+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++++    #         return super()._sample(
-+++++    #             input_ids=input_ids,
-+++++    #             logits_processor=logits_processor,
-+++++    #             stopping_criteria=stopping_criteria,
-+++++    #             generation_config=generation_config,
-+++++    #             synced_devices=synced_devices,
-+++++    #             streamer=streamer,
-+++++    #             logits_warper=logits_warper,
-+++++    #             **model_kwargs,
-+++++    #         )
-+++++        
-+++++    #     # 使用 StaticCache，进入自定义循环
-+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++++    #     pad_token_id = generation_config._pad_token_tensor
-+++++    #     output_attentions = generation_config.output_attentions
-+++++    #     output_hidden_states = generation_config.output_hidden_states
-+++++    #     output_scores = generation_config.output_scores
-+++++    #     output_logits = generation_config.output_logits
-+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++++    #     max_length = generation_config.max_length
-+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++++    #     do_sample = generation_config.do_sample
-+++++        
-+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++++    #         raise ValueError(
-+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++++    #             f"{logits_warper})."
-+++++    #         )
-+++++        
-+++++    #     # init attention / hidden states / scores tuples
-+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++++        
-+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++++    #         encoder_hidden_states = (
-+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++++    #         )
-+++++        
-+++++    #     # keep track of which sequences are already finished
-+++++    #     batch_size, cur_len = input_ids.shape
-+++++    #     this_peer_finished = False
-+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++++        
-+++++    #     time_record = []
-+++++    #     from ....utils.testing_utils import parse_flag_from_env
-+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++++        
-+++++    #     while self._has_unfinished_sequences(
-+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++++    #     ):
-+++++    #         if _record_time:
-+++++    #             import time as time_module
-+++++    #             infer_start = time_module.time()
-+++++            
-+++++    #         # prepare model inputs
-+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++++            
-+++++    #         # prepare variable output controls
-+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++++            
-+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++++    #         cur_cache_position = model_inputs.get("cache_position")
-+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++++    #         cur_input_ids = model_inputs.get("input_ids")
-+++++            
-+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++++    #             cur_cache_position is not None and 
-+++++    #             len(cur_cache_position.shape) > 0 and
-+++++    #             cur_cache_position.shape[0] == 1 and
-+++++    #             cur_input_ids is not None and
-+++++    #             cur_input_ids.shape[1] == 1):
-+++++    #             # 使用 JIT 优化的单 token 解码
-+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++++    #             if not hasattr(self, '_jit_used'):
-+++++    #                 self._jit_used = False
-+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++++                
-+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++++    #                 cur_token=cur_input_ids,
-+++++    #                 input_pos=model_inputs.get("position_ids"),
-+++++    #                 cache_position=cur_cache_position,
-+++++    #                 past_key_values=cur_past_key_values,
-+++++    #             )
-+++++                
-+++++    #             # 标记已使用JIT（用于后续判断）
-+++++    #             if not self._jit_used:
-+++++    #                 self._jit_used = True
-+++++                
-+++++    #             # 构造兼容的输出对象
-+++++    #             class JitOptimizedOutput:
-+++++    #                 def __init__(self, logits, config):
-+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++++    #                     self.config = config
-+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++++    #                     self.cross_attentions = None
-+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++++                
-+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++++    #         else:
-+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++++    #             outputs = self(**model_inputs, return_dict=True)
-+++++            
-+++++    #         if synced_devices and this_peer_finished:
-+++++    #             continue
-+++++            
-+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++++    #         next_token_logits = outputs.logits[:, -1, :]
-+++++            
-+++++    #         # pre-process distribution
-+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++++    #         if do_sample:
-+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++++            
-+++++    #         # Store scores, attentions and hidden_states when required
-+++++    #         if return_dict_in_generate:
-+++++    #             if output_scores:
-+++++    #                 scores += (next_token_scores,)
-+++++    #             if output_logits:
-+++++    #                 raw_logits += (next_token_logits,)
-+++++    #             if output_attentions:
-+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++++    #                 if self.config.is_encoder_decoder:
-+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++++                
-+++++    #             if output_hidden_states:
-+++++    #                 hidden = (
-+++++    #                     outputs.decoder_hidden_states
-+++++    #                     if self.config.is_encoder_decoder
-+++++    #                     else outputs.hidden_states
-+++++    #                 )
-+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++++            
-+++++    #         # token selection
-+++++    #         if do_sample:
-+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++++    #         else:
-+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++++            
-+++++    #         # finished sentences should have their next token be a padding token
-+++++    #         if has_eos_stopping_criteria:
-+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++++            
-+++++    #         # update generated ids, model inputs, and length for next step
-+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++++    #         if streamer is not None:
-+++++    #             streamer.put(next_tokens)
-+++++            
-+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++++    #             outputs,
-+++++    #             model_kwargs,
-+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++++    #         )
-+++++            
-+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++++    #         cur_len += 1
-+++++            
-+++++    #         if _record_time:
-+++++    #             import time as time_module
-+++++    #             infer_stop = time_module.time()
-+++++    #             time_record.append(infer_stop - infer_start)
-+++++            
-+++++    #         del outputs
-+++++        
-+++++    #     average_infer_time = None
-+++++    #     if time_record:
-+++++    #         if len(time_record) > 1:
-+++++    #             time_record.pop(0)
-+++++    #         average_infer_time = sum(time_record) / len(time_record)
-+++++    #         print(f'average inference time is: {average_infer_time}')
-+++++    #         print(f'inference time record: {time_record}')
-+++++        
-+++++    #     if streamer is not None:
-+++++    #         streamer.end()
-+++++        
-+++++    #     # 简单判断：打印是否使用了JIT路径
-+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++++    #     else:
-+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++++        
-+++++    #     if return_dict_in_generate:
-+++++    #         if self.config.is_encoder_decoder:
-+++++    #             return GenerateEncoderDecoderOutput(
-+++++    #                 sequences=input_ids,
-+++++    #                 scores=scores,
-+++++    #                 logits=raw_logits,
-+++++    #                 encoder_attentions=encoder_attentions,
-+++++    #                 encoder_hidden_states=encoder_hidden_states,
-+++++    #                 decoder_attentions=decoder_attentions,
-+++++    #                 cross_attentions=cross_attentions,
-+++++    #                 decoder_hidden_states=decoder_hidden_states,
-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++    #                 average_infer_time=average_infer_time
-+++++    #             )
-+++++    #         else:
-+++++    #             return GenerateDecoderOnlyOutput(
-+++++    #                 sequences=input_ids,
-+++++    #                 scores=scores,
-+++++    #                 logits=raw_logits,
-+++++    #                 attentions=decoder_attentions,
-+++++    #                 hidden_states=decoder_hidden_states,
-+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++    #                 average_infer_time=average_infer_time
-+++++    #             )
-+++++    #     else:
-+++++    #         return input_ids
-+++++            
-+++++    # def _prepare_cache_for_generation(
-+++++    #     self,
-+++++    #     generation_config,
-+++++    #     model_kwargs,
-+++++    #     assistant_model,
-+++++    #     batch_size,
-+++++    #     max_cache_length,
-+++++    # ):
-+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++++    #         generation_config.cache_implementation = "static"
-+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++++        
-+++++    #     if generation_config.cache_implementation == "static":
-+++++    #         base_required_from_max_length = generation_config.max_length + 1
-+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++++    #         min_cache_size = 50
-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++++    #         else:
-+++++    #             max_cache_length = max(base_required, min_cache_size)
-+++++            
-+++++    #         original_max_cache_length = max_cache_length
-+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++++            
-+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++    #             if max_cache_length > self.config.max_position_embeddings:
-+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++++        
-+++++    #     result = super()._prepare_cache_for_generation(
-+++++    #         generation_config=generation_config,
-+++++    #         model_kwargs=model_kwargs,
-+++++    #         assistant_model=assistant_model,
-+++++    #         batch_size=batch_size,
-+++++    #         max_cache_length=max_cache_length,
-+++++    #     )
-+++++        
-+++++    #     if generation_config.cache_implementation == "static":
-+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++++    #         created_cache = model_kwargs.get(cache_name)
-+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++++        
-+++++    #     return result
-+++++
-+++++
-+++++
-++++ 
-++++ 
-++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++-- 
-++++2.27.0
-++++
-+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+++new file mode 100644
-+++index 00000000..22b65dd5
-+++--- /dev/null
-++++++ b/patches/0002-20251106commit.patch
-+++@@ -0,0 +1,3200 @@
-++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Thu, 6 Nov 2025 09:20:38 +0800
-++++Subject: [PATCH 2/3] 20251106commit
-++++
-++++---
-++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
-++++ create mode 100644 patches/0001-20251104commit.patch
-++++
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index d8303e45..73773c22 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-++++         #     y = y + self.shared_experts(identity)
-++++         # return y
-++++         
-+++++    # @no_grad()
-+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++
-+++++    #     expert_cache = ops.zeros_like(x)
-+++++    #     for i in range(self.num_experts_per_tok):
-+++++    #         expert_id = flat_expert_indices[i].item()
-+++++    #         weight = flat_expert_weights[i].item()
-+++++    #         expert = self.experts[expert_id]
-+++++    #         expert_out = expert(x)
-+++++    #         expert_cache += expert_out * weight
-+++++    #     return expert_cache
-+++++
-++++     @no_grad()
-++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++        # x 的 shape: (1, hidden_size)
-+++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++++
-+++++        # 1. 收集所有需要的专家层
-+++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++++
-+++++        # 2. 并行计算所有专家的输出
-+++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++++        # ops.cat 会将它们堆叠成一个新的 Tensor
-+++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++++
-+++++        # 3. 使用矩阵乘法进行加权求和
-+++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++        # 最终结果 final_output 的 shape: (1, hidden_size)
-+++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++++        
-+++++        return final_output
-++++ 
-++++-        expert_cache = ops.zeros_like(x)
-++++-        for i in range(self.num_experts_per_tok):
-++++-            expert_id = flat_expert_indices[i].item()
-++++-            weight = flat_expert_weights[i].item()
-++++-            expert = self.experts[expert_id]
-++++-            expert_out = expert(x)
-++++-            expert_cache += expert_out * weight
-++++-        return expert_cache
-++++ 
-++++     @no_grad()
-++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-++++             key_states = self.k_proj(hidden_states)
-++++             value_states = self.v_proj(hidden_states)
-++++ 
-++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++        # @lwx
-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-+++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++++ 
-++++         kv_seq_len = key_states.shape[-2]
-++++         if past_key_value is not None:
-++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++ 
-+++++# class DeepseekFlashAttention(nn.Module):
-+++++#     """
-+++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+++++
-+++++#     This class is designed as a drop-in replacement for DeepseekAttention.
-+++++#     """
-+++++
-+++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++++#         super().__init__()
-+++++#         self.config = config
-+++++#         self.layer_idx = layer_idx
-+++++#         if layer_idx is None:
-+++++#             logger.warning(
-+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++#                 "when creating this class."
-+++++#             )
-+++++
-+++++#         self.attention_dropout = config.attention_dropout
-+++++#         self.hidden_size = config.hidden_size
-+++++#         self.num_heads = config.num_attention_heads
-+++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++#         self.rope_theta = config.rope_theta
-+++++#         self.is_causal = True
-+++++
-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++#             raise ValueError(
-+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++#                 f" and `num_heads`: {self.num_heads})."
-+++++#             )
-+++++
-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++++#         self._init_rope()
-+++++
-+++++#     def _init_rope(self):
-+++++#         if self.config.rope_scaling is None:
-+++++#             self.rotary_emb = DeepseekRotaryEmbedding(
-+++++#                 self.head_dim,
-+++++#                 max_position_embeddings=self.max_position_embeddings,
-+++++#                 base=self.rope_theta,
-+++++#             )
-+++++#         else:
-+++++#             scaling_type = self.config.rope_scaling["type"]
-+++++#             scaling_factor = self.config.rope_scaling["factor"]
-+++++#             if scaling_type == "linear":
-+++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++++#                     self.head_dim,
-+++++#                     max_position_embeddings=self.max_position_embeddings,
-+++++#                     scaling_factor=scaling_factor,
-+++++#                     base=self.rope_theta,
-+++++#                 )
-+++++#             elif scaling_type == "dynamic":
-+++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++++#                     self.head_dim,
-+++++#                     max_position_embeddings=self.max_position_embeddings,
-+++++#                     scaling_factor=scaling_factor,
-+++++#                     base=self.rope_theta,
-+++++#                 )
-+++++#             else:
-+++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++++
-+++++#     def forward(
-+++++#         self,
-+++++#         hidden_states: mindspore.Tensor,
-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++#         past_key_value: Optional[Cache] = None,
-+++++#         output_attentions: bool = False,
-+++++#         use_cache: bool = False,
-+++++#         **kwargs,
-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++#         if "padding_mask" in kwargs:
-+++++#             warnings.warn(
-+++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++++#             )
-+++++        
-+++++#         if output_attentions:
-+++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+++++
-+++++#         bsz, q_len, _ = hidden_states.shape
-+++++
-+++++#         if self.config.pretraining_tp > 1:
-+++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++++
-+++++#         query_states = self.q_proj(hidden_states)
-+++++#         key_states = self.k_proj(hidden_states)
-+++++#         value_states = self.v_proj(hidden_states)
-+++++
-+++++#         # Reshape for multi-head attention
-+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++#         kv_seq_len = key_states.shape[-2]
-+++++#         if past_key_value is not None:
-+++++#             if self.layer_idx is None:
-+++++#                 raise ValueError(
-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++#                     "with a layer index."
-+++++#                 )
-+++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++        
-+++++#         # Apply Rotary Positional Embedding
-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++#         if past_key_value is not None:
-+++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++
-+++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++        
-+++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++        
-+++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++
-+++++#         # Convert attention_mask for flash_attention_score
-+++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+++++#         if attention_mask is not None:
-+++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++++#                 raise ValueError(
-+++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++++#                 )
-+++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+++++#         else:
-+++++#             attn_mask_for_fa = None
-+++++        
-+++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++++
-+++++#         # Call the fused flash_attention_score operator
-+++++#         attn_output = mindspore.ops.flash_attention_score(
-+++++#             query=query_states_for_fa,
-+++++#             key=key_states_for_fa,
-+++++#             value=value_states_for_fa,
-+++++#             head_num=self.num_heads, # This is N1, the number of query heads
-+++++#             input_layout='BSH',
-+++++#             attn_mask=attn_mask_for_fa,
-+++++#             keep_prob=keep_prob,
-+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++#             sparse_mode=0 # Default mask mode
-+++++#         )
-+++++        
-+++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+++++#         attn_output = self.o_proj(attn_output)
-+++++        
-+++++#         # Flash Attention does not return attention weights
-+++++#         attn_weights = None
-+++++
-+++++#         return attn_output, attn_weights, past_key_value
-+++++
-+++++class DeepseekFlashAttention(nn.Module):
-+++++    """
-+++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-+++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-+++++    designed for high performance on supported hardware (Ascend).
-+++++
-+++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-+++++    """
-+++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++++        super().__init__()
-+++++        self.config = config
-+++++        self.layer_idx = layer_idx
-+++++        if layer_idx is None:
-+++++            logger.warning(
-+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++                "when creating this class."
-+++++            )
-+++++
-+++++        # --- [FIX] Correctly initialize all required attributes ---
-+++++        self.attention_dropout = config.attention_dropout
-+++++        self.hidden_size = config.hidden_size
-+++++        self.num_heads = config.num_attention_heads
-+++++        self.head_dim = self.hidden_size // self.num_heads
-+++++        self.num_key_value_heads = config.num_key_value_heads
-+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++        self.max_position_embeddings = config.max_position_embeddings
-+++++        self.rope_theta = config.rope_theta
-+++++        self.is_causal = True
-+++++
-+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++            raise ValueError(
-+++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++                f" and `num_heads`: {self.num_heads})."
-+++++            )
-+++++
-+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++++        
-+++++        # This call will now succeed as all attributes are initialized.
-+++++        self._init_rope()
-+++++
-+++++    def _init_rope(self):
-+++++        if self.config.rope_scaling is None:
-+++++            self.rotary_emb = DeepseekRotaryEmbedding(
-+++++                self.head_dim,
-+++++                max_position_embeddings=self.max_position_embeddings,
-+++++                base=self.rope_theta,
-+++++            )
-+++++        else:
-+++++            scaling_type = self.config.rope_scaling["type"]
-+++++            scaling_factor = self.config.rope_scaling["factor"]
-+++++            if scaling_type == "linear":
-+++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++++                    self.head_dim,
-+++++                    max_position_embeddings=self.max_position_embeddings,
-+++++                    scaling_factor=scaling_factor,
-+++++                    base=self.rope_theta,
-+++++                )
-+++++            elif scaling_type == "dynamic":
-+++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++++                    self.head_dim,
-+++++                    max_position_embeddings=self.max_position_embeddings,
-+++++                    scaling_factor=scaling_factor,
-+++++                    base=self.rope_theta,
-+++++                )
-+++++            else:
-+++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++++
-+++++    def forward(
-+++++        self,
-+++++        hidden_states: mindspore.Tensor,
-+++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++        past_key_value: Optional[Cache] = None,
-+++++        output_attentions: bool = False,
-+++++        use_cache: bool = False,
-+++++        **kwargs,
-+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++        if "padding_mask" in kwargs:
-+++++            warnings.warn(
-+++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++++            )
-+++++        if output_attentions:
-+++++            warnings.warn(
-+++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-+++++            )
-+++++
-+++++        bsz, q_len, _ = hidden_states.shape
-+++++
-+++++        if self.config.pretraining_tp > 1:
-+++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++++
-+++++        query_states = self.q_proj(hidden_states)
-+++++        key_states = self.k_proj(hidden_states)
-+++++        value_states = self.v_proj(hidden_states)
-+++++
-+++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++        kv_seq_len = key_states.shape[-2]
-+++++        if past_key_value is not None:
-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++        
-+++++        # Apply Rotary Position Embedding
-+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++        if past_key_value is not None:
-+++++            cache_kwargs = {"sin": sin, "cos": cos}
-+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++
-+++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-+++++        # So we must explicitly repeat the KV heads.
-+++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++
-+++++        # Convert attention mask for flash_attention_score
-+++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-+++++        if attention_mask is not None:
-+++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++++                 raise ValueError(
-+++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++++                )
-+++++            attn_mask_for_fa = attention_mask < 0
-+++++        else:
-+++++            attn_mask_for_fa = None
-+++++
-+++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++++
-+++++        # Call the fused operator using the efficient BNSD layout
-+++++        attn_output = mindspore.ops.flash_attention_score(
-+++++            query=query_states,
-+++++            key=key_states,
-+++++            value=value_states,
-+++++            head_num=self.num_heads,
-+++++            input_layout='BNSD', # Specify the correct layout
-+++++            attn_mask=attn_mask_for_fa,
-+++++            keep_prob=keep_prob,
-+++++            scalar_value=1.0 / math.sqrt(self.head_dim)
-+++++        )
-+++++        
-+++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++        
-+++++        # Apply output projection
-+++++        attn_output = self.o_proj(attn_output)
-+++++
-+++++        # Flash attention does not return attention weights, so we return None.
-+++++        attn_weights = None
-+++++
-+++++        return attn_output, attn_weights, past_key_value
-+++++
-++++ Deepseek_ATTENTION_CLASSES = {
-++++     "eager": DeepseekAttention,
-+++++    "flash-attention": DeepseekFlashAttention,
-++++ }
-++++ 
-++++ 
-++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-++++             config=config, layer_idx=layer_idx
-++++         )
-++++ 
-+++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+++++            config=config, layer_idx=layer_idx
-+++++        )
-+++++
-++++         self.mlp = (
-++++             DeepseekMoE(config)
-++++             if (
-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++index d4c6b651..bced285c 100644
-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-++++ 
-++++ import mindspore
-++++ import mindnlp.core.nn.functional as F
-++++-from mindnlp.core import nn, ops
-+++++from mindnlp.core import nn, ops, no_grad
-++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-++++ 
-++++ from ....common.activations import ACT2FN
-++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++++ 
-+++++Long_Prompt = False
-+++++PROMPT_LENGTH_THRESHOLD = 128
-++++ 
-++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++ 
-+++++# class Qwen2MoeFlashAttention(nn.Module):
-+++++#     """
-+++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++
-+++++#     关键改动:
-+++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++#        直接传入原始的 key 和 value 张量效率更高。
-+++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++#         super().__init__()
-+++++#         self.config = config
-+++++#         self.layer_idx = layer_idx
-+++++#         self.hidden_size = config.hidden_size
-+++++#         self.num_heads = config.num_attention_heads
-+++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++#         self.rope_theta = config.rope_theta
-+++++#         self.attention_dropout = config.attention_dropout
-+++++
-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++#             raise ValueError(
-+++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++#             )
-+++++
-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++
-+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++#             self.head_dim,
-+++++#             max_position_embeddings=self.max_position_embeddings,
-+++++#             base=self.rope_theta,
-+++++#         )
-+++++
-+++++#     def forward(
-+++++#         self,
-+++++#         hidden_states: mindspore.Tensor,
-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++#         past_key_value: Optional[Cache] = None,
-+++++#         output_attentions: bool = False,
-+++++#         use_cache: bool = False,
-+++++#         cache_position: Optional[mindspore.Tensor] = None,
-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++#         bsz, q_len, _ = hidden_states.shape
-+++++
-+++++#         # 1. 线性投射 Q, K, V
-+++++#         query_states = self.q_proj(hidden_states)
-+++++#         key_states = self.k_proj(hidden_states)
-+++++#         value_states = self.v_proj(hidden_states)
-+++++
-+++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++#         # 3. RoPE 旋转位置编码
-+++++#         kv_seq_len = key_states.shape[-2]
-+++++#         if past_key_value is not None:
-+++++#             if self.layer_idx is None:
-+++++#                 raise ValueError(
-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++#                     "with a layer index."
-+++++#                 )
-+++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++#                 if cache_position.shape[0] == 1:
-+++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++#                     kv_seq_len = past_seen_tokens + 1
-+++++#                 else:
-+++++#                     # prefill 阶段：cache_position 是范围，使用其长度
-+++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++#             else:
-+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++#         # 4. KV 缓存更新
-+++++#         if past_key_value is not None:
-+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++#             key_states, value_states = past_key_value.update(
-+++++#                 key_states, value_states, self.layer_idx, cache_kwargs
-+++++#             )
-+++++            
-+++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++#                 if cache_position.shape[0] == 1:
-+++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++#                     kv_seq_len = key_states.shape[-2]
-+++++
-+++++#         # 5. [重要] 准备 Attention Mask
-+++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++#         fa_attention_mask = None
-+++++#         if attention_mask is not None:
-+++++#             # 截取与当前key长度匹配的部分
-+++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++#             fa_attention_mask = (mask_slice != 0)
-+++++
-+++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++#         input_dtype = query_states.dtype
-+++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++#             query_states = query_states.to(mindspore.float16)
-+++++#             key_states = key_states.to(mindspore.float16)
-+++++#             value_states = value_states.to(mindspore.float16)
-+++++
-+++++#         # 6. [核心] 调用 flash_attention_score 算子
-+++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++#         attn_output = mindspore.ops.flash_attention_score(
-+++++#             query=query_states,
-+++++#             key=key_states,
-+++++#             value=value_states,
-+++++#             head_num=self.num_heads, # 传入Q的头数(N1)
-+++++#             attn_mask=fa_attention_mask,
-+++++#             keep_prob=1.0 - self.attention_dropout,
-+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++#             input_layout="BNSD",
-+++++#             sparse_mode=0 # 使用 defaultMask 模式
-+++++#         )
-+++++
-+++++#         # 恢复原始数据类型
-+++++#         attn_output = attn_output.to(input_dtype)
-+++++
-+++++#         # 7. 调整输出形状
-+++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++#         attn_output = self.o_proj(attn_output)
-+++++
-+++++#         # FlashAttention 算子不直接返回注意力权重矩阵
-+++++#         attn_weights = None
-+++++#         if output_attentions:
-+++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++#         return attn_output, attn_weights, past_key_value
-+++++
-+++++#     # def forward(
-+++++#     #     self,
-+++++#     #     hidden_states: mindspore.Tensor,
-+++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++#     #     position_ids: Optional[mindspore.Tensor] = None,
-+++++#     #     past_key_value: Optional[Cache] = None,
-+++++#     #     output_attentions: bool = False,
-+++++#     #     use_cache: bool = False,
-+++++#     #     cache_position: Optional[mindspore.Tensor] = None,
-+++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++#     #     bsz, q_len, _ = hidden_states.shape
-+++++
-+++++#     #     # 1. 线性投射 Q, K, V
-+++++#     #     query_states = self.q_proj(hidden_states)
-+++++#     #     key_states = self.k_proj(hidden_states)
-+++++#     #     value_states = self.v_proj(hidden_states)
-+++++
-+++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++
-+++++#     #     # 3. RoPE 旋转位置编码
-+++++#     #     kv_seq_len = key_states.shape[-2]
-+++++#     #     if past_key_value is not None:
-+++++#     #         if self.layer_idx is None:
-+++++#     #             raise ValueError(
-+++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++#     #                 "with a layer index."
-+++++#     #             )
-+++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++
-+++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++#     #     # 4. KV 缓存更新
-+++++#     #     if past_key_value is not None:
-+++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++#     #         key_states, value_states = past_key_value.update(
-+++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++#     #         )
-+++++
-+++++#     #     # 5. 准备 Attention Mask
-+++++#     #     fa_attention_mask = None
-+++++#     #     if attention_mask is not None:
-+++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++#     #         fa_attention_mask = (mask_slice != 0)
-+++++
-+++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++#     #     input_dtype = query_states.dtype
-+++++
-+++++#     #     # 6. [核心] 调用 flash_attention_score 算子
-+++++#     #     attn_output = mindspore.ops.flash_attention_score(
-+++++#     #         query=query_states,
-+++++#     #         key=key_states,
-+++++#     #         value=value_states,
-+++++#     #         head_num=self.num_heads,
-+++++#     #         attn_mask=fa_attention_mask,
-+++++#     #         keep_prob=1.0 - self.attention_dropout,
-+++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++#     #         input_layout="BNSD",
-+++++#     #         sparse_mode=0,
-+++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++#     #         inner_precise=1
-+++++#     #     )
-+++++
-+++++#     #     # 恢复原始数据类型
-+++++#     #     attn_output = attn_output.to(input_dtype)
-+++++
-+++++#     #     # 7. 调整输出形状
-+++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++#     #     attn_output = self.o_proj(attn_output)
-+++++
-+++++#     #     attn_weights = None
-+++++#     #     if output_attentions:
-+++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++
-+++++#     #     return attn_output, attn_weights, past_key_value
-+++++
-+++++
-++++ class Qwen2MoeFlashAttention(nn.Module):
-++++     """
-++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++-
-++++-    关键改动:
-++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++-       直接传入原始的 key 和 value 张量效率更高。
-++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-+++++
-+++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-+++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-+++++    完全使用模型的低精度数据类型（如 float16）进行计算，
-+++++    以达到理论上的最高执行速度。
-++++     """
-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++         super().__init__()
-++++         self.config = config
-++++         self.layer_idx = layer_idx
-+++++        if layer_idx is None:
-+++++            logger.warning_once(
-+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-+++++            )
-+++++
-++++         self.hidden_size = config.hidden_size
-++++         self.num_heads = config.num_attention_heads
-++++         self.head_dim = self.hidden_size // self.num_heads
-++++         self.num_key_value_heads = config.num_key_value_heads
-++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++         self.max_position_embeddings = config.max_position_embeddings
-++++         self.rope_theta = config.rope_theta
-++++         self.attention_dropout = config.attention_dropout
-++++ 
-++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++-            raise ValueError(
-++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++-            )
-++++-
-++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-++++         key_states = self.k_proj(hidden_states)
-++++         value_states = self.v_proj(hidden_states)
-++++ 
-++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++        # 2. 调整形状以匹配 BNSD 布局
-++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-
-++++-        # 3. RoPE 旋转位置编码
-+++++        
-+++++        # 3. RoPE 和 KV 缓存
-++++         kv_seq_len = key_states.shape[-2]
-++++         if past_key_value is not None:
-++++-            if self.layer_idx is None:
-++++-                raise ValueError(
-++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-                    "with a layer index."
-++++-                )
-++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++-                if cache_position.shape[0] == 1:
-++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++-                    kv_seq_len = past_seen_tokens + 1
-++++-                else:
-++++-                    # prefill 阶段：cache_position 是范围，使用其长度
-++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++-            else:
-++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-
-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++        
-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++ 
-++++-        # 4. KV 缓存更新
-++++         if past_key_value is not None:
-++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-            key_states, value_states = past_key_value.update(
-++++-                key_states, value_states, self.layer_idx, cache_kwargs
-++++-            )
-++++-            
-++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++-                if cache_position.shape[0] == 1:
-++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++-                    kv_seq_len = key_states.shape[-2]
-++++-
-++++-        # 5. [重要] 准备 Attention Mask
-++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++
-+++++        # 4. 准备 Attention Mask
-++++         fa_attention_mask = None
-++++         if attention_mask is not None:
-++++-            # 截取与当前key长度匹配的部分
-++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++             fa_attention_mask = (mask_slice != 0)
-++++ 
-++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++-        input_dtype = query_states.dtype
-++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++-            query_states = query_states.to(mindspore.float16)
-++++-            key_states = key_states.to(mindspore.float16)
-++++-            value_states = value_states.to(mindspore.float16)
-++++-
-++++-        # 6. [核心] 调用 flash_attention_score 算子
-++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-++++         attn_output = mindspore.ops.flash_attention_score(
-++++             query=query_states,
-++++             key=key_states,
-++++             value=value_states,
-++++-            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++            head_num=self.num_heads,
-++++             attn_mask=fa_attention_mask,
-++++-            keep_prob=1.0 - self.attention_dropout,
-+++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-++++             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++             input_layout="BNSD",
-++++-            sparse_mode=0 # 使用 defaultMask 模式
-+++++            sparse_mode=0,
-+++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-++++         )
-++++ 
-++++-        # 恢复原始数据类型
-++++-        attn_output = attn_output.to(input_dtype)
-++++-
-++++-        # 7. 调整输出形状
-++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++        # 6. 调整输出形状
-++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++         attn_output = self.o_proj(attn_output)
-++++ 
-++++-        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++        # 7. 返回结果
-++++         attn_weights = None
-++++         if output_attentions:
-++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++-    # def forward(
-++++-    #     self,
-++++-    #     hidden_states: mindspore.Tensor,
-++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++++-    #     past_key_value: Optional[Cache] = None,
-++++-    #     output_attentions: bool = False,
-++++-    #     use_cache: bool = False,
-++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-
-++++-    #     bsz, q_len, _ = hidden_states.shape
-++++-
-++++-    #     # 1. 线性投射 Q, K, V
-++++-    #     query_states = self.q_proj(hidden_states)
-++++-    #     key_states = self.k_proj(hidden_states)
-++++-    #     value_states = self.v_proj(hidden_states)
-++++-
-++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-
-++++-    #     # 3. RoPE 旋转位置编码
-++++-    #     kv_seq_len = key_states.shape[-2]
-++++-    #     if past_key_value is not None:
-++++-    #         if self.layer_idx is None:
-++++-    #             raise ValueError(
-++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-    #                 "with a layer index."
-++++-    #             )
-++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++ 
-++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-
-++++-    #     # 4. KV 缓存更新
-++++-    #     if past_key_value is not None:
-++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-    #         key_states, value_states = past_key_value.update(
-++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++-    #         )
-++++-
-++++-    #     # 5. 准备 Attention Mask
-++++-    #     fa_attention_mask = None
-++++-    #     if attention_mask is not None:
-++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-    #         fa_attention_mask = (mask_slice != 0)
-++++-
-++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++-    #     input_dtype = query_states.dtype
-++++-
-++++-    #     # 6. [核心] 调用 flash_attention_score 算子
-++++-    #     attn_output = mindspore.ops.flash_attention_score(
-++++-    #         query=query_states,
-++++-    #         key=key_states,
-++++-    #         value=value_states,
-++++-    #         head_num=self.num_heads,
-++++-    #         attn_mask=fa_attention_mask,
-++++-    #         keep_prob=1.0 - self.attention_dropout,
-++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-    #         input_layout="BNSD",
-++++-    #         sparse_mode=0,
-++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++-    #         inner_precise=1
-++++-    #     )
-++++-
-++++-    #     # 恢复原始数据类型
-++++-    #     attn_output = attn_output.to(input_dtype)
-+++++QWEN2MOE_ATTENTION_CLASSES = {
-+++++    "eager": Qwen2MoeAttention,
-+++++    "flash-attention": Qwen2MoeFlashAttention,
-+++++}
-++++ 
-++++-    #     # 7. 调整输出形状
-++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-    #     attn_output = self.o_proj(attn_output)
-++++ 
-++++-    #     attn_weights = None
-++++-    #     if output_attentions:
-++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     def __init__(self, config):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         # gating
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     #@dwj
-+++++#     # 只遍历激活的专家，而非全部专家
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#             num_tokens = hidden_states_reshaped.shape[0]
-+++++
-+++++#             router_logits = self.gate(hidden_states_reshaped)
-+++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++
-+++++#             if self.norm_topk_prob:
-+++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++#             routing_weights = routing_weights.to(hidden_states.dtype)
-+++++            
-+++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++#             flat_selected_experts = selected_experts.flatten()
-+++++            
-+++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++#             token_indices = broadcasted_token_indices.flatten()
-+++++            
-+++++#             active_experts = ops.unique(flat_selected_experts)
-+++++            
-+++++#             for expert_idx_tensor in active_experts:
-+++++#                 expert_idx = expert_idx_tensor.item()
-+++++#                 expert_layer = self.experts[expert_idx]
-+++++                
-+++++#                 mask = (flat_selected_experts == expert_idx_tensor)
-+++++#                 selected_token_indices = token_indices[mask]
-+++++#                 selected_routing_weights = routing_weights.flatten()[mask]
-+++++                
-+++++#                 current_states = hidden_states_reshaped[selected_token_indices]
-+++++                
-+++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++                
-+++++#                 final_hidden_states = final_hidden_states.index_add(
-+++++#                     dim=0,
-+++++#                     index=selected_token_indices,
-+++++#                     source=expert_output.to(hidden_states.dtype)
-+++++#                 )
-+++++            
-+++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++ 
-++++-    #     return attn_output, attn_weights, past_key_value
-+++++#             final_hidden_states = final_hidden_states + shared_expert_output
-+++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++            
-+++++#             return final_hidden_states, router_logits
-+++++
-+++++
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     """
-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         # 门控网络
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         # 专家列表
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++#         # 共享专家
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_decode(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         """
-+++++#         【解码路径】针对 sequence_length=1 的极致优化。
-+++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+++++#         """
-+++++#         batch_size, hidden_dim = hidden_states.shape
-+++++        
-+++++#         expert_outputs_list = [
-+++++#             ops.cat([
-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++#             ], dim=0) 
-+++++#             for i in range(batch_size)
-+++++#         ]
-+++++        
-+++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+++++#         # shape: (batch_size, top_k, hidden_dim)
-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++        
-+++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++        
-+++++#         return moe_output.squeeze(1)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_prefill(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         """
-+++++#         【预填充路径】针对 sequence_length > 1 的优化。
-+++++#         按专家对 Token 进行分组，并进行批处理。
-+++++#         """
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens = hidden_states.shape[0]
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++        
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++        
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++        
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++            
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++#             selected_token_indices = token_indices[mask]
-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++            
-+++++#             current_states = hidden_states[selected_token_indices]
-+++++            
-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++            
-+++++#             moe_output = moe_output.index_add(
-+++++#                 dim=0,
-+++++#                 index=selected_token_indices,
-+++++#                 source=expert_output.to(hidden_states.dtype)
-+++++#             )
-+++++#         return moe_output
-+++++
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#         """
-+++++#         顶层 forward 方法，作为智能分发器。
-+++++#         """
-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++        
-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++ 
-++++-    # def forward(
-++++-    #     self,
-++++-    #     hidden_states: mindspore.Tensor,
-++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++++-    #     past_key_value: Optional[Cache] = None,
-++++-    #     output_attentions: bool = False,
-++++-    #     use_cache: bool = False,
-++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-
-++++-    #     bsz, q_len, _ = hidden_states.shape
-++++-
-++++-    #     query_states = self.q_proj(hidden_states)
-++++-    #     key_states = self.k_proj(hidden_states)
-++++-    #     value_states = self.v_proj(hidden_states)
-++++-
-++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-
-++++-    #     kv_seq_len = key_states.shape[-2]
-++++-    #     if past_key_value is not None:
-++++-    #         if self.layer_idx is None:
-++++-    #             raise ValueError("`layer_idx` must be specified for caching")
-++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-
-++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-
-++++-    #     if past_key_value is not None:
-++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-    #         key_states, value_states = past_key_value.update(
-++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++-    #         )
-+++++#         if self.norm_topk_prob:
-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++        
-+++++#         moe_output = None
-+++++#         # 在推理时，根据序列长度选择最优路径
-+++++#         if not self.training:
-+++++#             if sequence_length == 1:
-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++#             else:
-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++#         else:
-+++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+++++#             raise NotImplementedError("Training path is not implemented.")
-+++++
-+++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+++++        
-+++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+++++        
-+++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++#         return final_hidden_states, router_logits
-+++++
-+++++
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     """
-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         # 门控网络
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         # 专家列表
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++#         # 共享专家
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_decode(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         batch_size, _ = hidden_states.shape
-+++++#         expert_outputs_list = [
-+++++#             ops.cat([
-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++#             ], dim=0) 
-+++++#             for i in range(batch_size)
-+++++#         ]
-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++#         return moe_output.squeeze(1)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_prefill(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens = hidden_states.shape[0]
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++        
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++#             selected_token_indices = token_indices[mask]
-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++#             current_states = hidden_states[selected_token_indices]
-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++#             moe_output = moe_output.index_add(
-+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++#             )
-+++++#         return moe_output
-+++++
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#         """
-+++++#         顶层 forward 方法，作为智能分发器。
-+++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+++++#         """
-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++        
-+++++#         # 1. 门控计算 (通用逻辑)
-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++
-+++++#         if self.norm_topk_prob:
-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++        
-+++++#         # 2. 智能分发到最优 MoE 路径
-+++++#         moe_output = None
-+++++#         if not self.training:
-+++++#             if sequence_length == 1:
-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++#             else:
-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++#         else:
-+++++#             raise NotImplementedError("Training path is not implemented.")
-+++++
-+++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++        
-+++++#         # 4. 合并 MoE 输出和共享专家输出
-+++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++        
-+++++#         # 5. 恢复原始形状并返回
-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++#         return final_hidden_states, router_logits
-+++++
-+++++# prefill fastest
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     """
-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         # 门控网络
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         # 专家列表
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++#         # 共享专家
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_dispatch(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         """
-+++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+++++#         """
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens, _ = hidden_states.shape
-+++++        
-+++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++#         flat_routing_weights = routing_weights.flatten()
-++++ 
-++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++-
-++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++-    #     query_states = query_states / math.sqrt(self.head_dim)
-++++-    #     # <--- 修改结束 ---
-++++-
-++++-    #     fa_attention_mask = None
-++++-    #     if attention_mask is not None:
-++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-    #         fa_attention_mask = (mask_slice != 0)
-++++-
-++++-    #     input_dtype = query_states.dtype
-++++-
-++++-    #     attn_output = mindspore.ops.flash_attention_score(
-++++-    #         query=query_states,    # 传入已经预先缩放过的 query
-++++-    #         key=key_states,
-++++-    #         value=value_states,
-++++-    #         head_num=self.num_heads,
-++++-    #         attn_mask=fa_attention_mask,
-++++-    #         keep_prob=1.0 - self.attention_dropout,
-++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++-    #         input_layout="BNSD",
-++++-    #         sparse_mode=0,
-++++-    #         inner_precise=1        # 仍然保持内部高精度计算
-++++-    #     )
-+++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++ 
-++++-    #     attn_output = attn_output.to(input_dtype)
-++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-    #     attn_output = self.o_proj(attn_output)
-+++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++        
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++            
-+++++#             # 找到所有分配给该专家的 token
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++            
-+++++#             # 使用 mask 选取对应的 token 和权重
-+++++#             current_token_indices = token_indices[mask]
-+++++#             current_routing_weights = flat_routing_weights[mask]
-+++++#             current_hidden_states = hidden_states[current_token_indices]
-+++++            
-+++++#             # 对这些 token 进行批处理
-+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++            
-+++++#             # 使用 index_add 将结果精确地加回到对应位置
-+++++#             moe_output = moe_output.index_add(
-+++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++#             )
-+++++#         return moe_output
-+++++
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#         """
-+++++#         顶层 forward 方法，作为智能分发器。
-+++++#         """
-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++        
-+++++#         # 1. 门控计算
-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++
-+++++#         if self.norm_topk_prob:
-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++        
-+++++#         # 2. 调用统一的 MoE 计算内核
-+++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++++ 
-++++-    #     attn_weights = None
-++++-    #     if output_attentions:
-++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++#         # 3. 统一处理共享专家
-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++        
-+++++#         # 4. 合并输出
-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++        
-+++++#         # 5. 恢复原始形状并返回
-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++#         return final_hidden_states, router_logits
-+++++
-+++++
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     """
-+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++#     【最终高性能与高精度版】：
-+++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+++++#     3. 这样实现了速度和准确性的两全其美。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_decode(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         """
-+++++#         【解码路径】极致优化版：bmm + 高精度累加。
-+++++#         """
-+++++#         original_dtype = hidden_states.dtype
-+++++#         batch_size, _ = hidden_states.shape
-+++++        
-+++++#         expert_outputs_list = [
-+++++#             ops.cat([
-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++#             ], dim=0) 
-+++++#             for i in range(batch_size)
-+++++#         ]
-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++
-+++++#         # 在 float32 下执行 bmm，得到高精度结果
-+++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++        
-+++++#         # 将高精度结果转换回原始数据类型
-+++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+++++        
-+++++#         return moe_output
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_prefill(
-+++++#         self, 
-+++++#         hidden_states: mindspore.Tensor, 
-+++++#         selected_experts: mindspore.Tensor, 
-+++++#         routing_weights: mindspore.Tensor
-+++++#     ) -> mindspore.Tensor:
-+++++#         """
-+++++#         【预填充路径】与原始实现一致，结果精确。
-+++++#         """
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens, _ = hidden_states.shape
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++        
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++#             selected_token_indices = token_indices[mask]
-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++#             current_states = hidden_states[selected_token_indices]
-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++#             moe_output = moe_output.index_add(
-+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++#             )
-+++++#         return moe_output
-+++++
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++        
-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++ 
-++++-    #     return attn_output, attn_weights, past_key_value
-+++++#         if self.norm_topk_prob:
-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+++++#         # 如果模型主体是 float16，后续再转换
-+++++        
-+++++#         moe_output = None
-+++++#         if not self.training:
-+++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+++++#             # _moe_infer_decode 内部会处理好类型转换
-+++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+++++#             if sequence_length == 1:
-+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++#             else:
-+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++#         else:
-+++++#             raise NotImplementedError("Training path is not implemented.")
-+++++
-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++        
-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++#         return final_hidden_states, router_logits
-+++++    
-++++ 
-++++-QWEN2MOE_ATTENTION_CLASSES = {
-++++-    "eager": Qwen2MoeAttention,
-++++-    "flash-attention": Qwen2MoeFlashAttention,
-++++-}
-+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++#     """
-+++++#     【融合版】一个混合专家模块，内置两种推理策略，
-+++++#     由外部全局变量 `Long_Prompt` 控制：
-+++++
-+++++#     - if Long_Prompt is True:  【精度优先模式】
-+++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+++++#       适用于处理长序列，避免误差累积。
-+++++
-+++++#     - if Long_Prompt is False: 【速度优先模式】
-+++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+++++#       在解码阶段获得极致速度，同时保证结果高度准确。
-+++++#     """
-+++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++#         super().__init__()
-+++++#         self.num_experts = config.num_experts
-+++++#         self.top_k = config.num_experts_per_tok
-+++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++
-+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++#         self.experts = nn.ModuleList(
-+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++#         )
-+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++
-+++++#     # --- 速度优先模式的辅助函数 ---
-+++++#     @no_grad()
-+++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++#         original_dtype = hidden_states.dtype
-+++++#         batch_size, _ = hidden_states.shape
-+++++#         expert_outputs_list = [
-+++++#             ops.cat([
-+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++#             ], dim=0) 
-+++++#             for i in range(batch_size)
-+++++#         ]
-+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++#         weights_fp32 = routing_weights.to(mindspore.float32)
-+++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++
-+++++#     @no_grad()
-+++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens, _ = hidden_states.shape
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++#             selected_token_indices = token_indices[mask]
-+++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++#             current_states = hidden_states[selected_token_indices]
-+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++#         return moe_output
-+++++
-+++++#     # --- 精度优先模式的辅助函数 ---
-+++++#     @no_grad()
-+++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++#         moe_output = ops.zeros_like(hidden_states)
-+++++#         num_tokens, _ = hidden_states.shape
-+++++#         flat_selected_experts = selected_experts.flatten()
-+++++#         flat_routing_weights = routing_weights.flatten()
-+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++#         active_experts = ops.unique(flat_selected_experts)
-+++++#         for expert_idx_tensor in active_experts:
-+++++#             expert_idx = expert_idx_tensor.item()
-+++++#             expert_layer = self.experts[expert_idx]
-+++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++#             current_token_indices = token_indices[mask]
-+++++#             current_routing_weights = flat_routing_weights[mask]
-+++++#             current_hidden_states = hidden_states[current_token_indices]
-+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++#         return moe_output
-+++++
-+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++#         # 声明我们将要使用一个在模块外部定义的全局变量
-+++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+++++#         global Long_Prompt
-+++++
-+++++#         # 1. 门控计算 (所有模式通用)
-+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++++#         if self.norm_topk_prob:
-+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++#         moe_output = None
-+++++#         if not self.training:
-+++++#             # 根据 Long_Prompt 标志选择模式
-+++++#             if Long_Prompt:
-+++++#                 # --- 精度优先模式 ---
-+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++#             else:
-+++++#                 # --- 速度优先模式 ---
-+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++#                 if sequence_length == 1:
-+++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++#                 else:
-+++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++#         else:
-+++++#             raise NotImplementedError("Training path is not implemented.")
-+++++
-+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++        
-+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++#         return final_hidden_states, router_logits
-+++++    
-+++++class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++    """
-+++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+++++    控制的顶级推理策略：
-++++ 
-+++++    - if Long_Prompt is True:  【精度优先模式】
-+++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-+++++      适用于需要严格可复现性的长序列任务。
-++++ 
-++++-class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-    def __init__(self, config):
-+++++    - if Long_Prompt is False: 【速度优先模式】
-+++++      采用业界最强的性能组合：
-+++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-+++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-+++++    """
-+++++    def __init__(self, config: Qwen2MoeConfig):
-++++         super().__init__()
-++++         self.num_experts = config.num_experts
-++++         self.top_k = config.num_experts_per_tok
-++++         self.norm_topk_prob = config.norm_topk_prob
-++++ 
-++++-        # gating
-++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++         self.experts = nn.ModuleList(
-++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++         )
-++++-
-++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++ 
-++++-    #@dwj
-++++-    # 只遍历激活的专家，而非全部专家
-++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-            num_tokens = hidden_states_reshaped.shape[0]
-++++-
-++++-            router_logits = self.gate(hidden_states_reshaped)
-++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-
-++++-            if self.norm_topk_prob:
-++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-            routing_weights = routing_weights.to(hidden_states.dtype)
-++++-            
-++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++-            flat_selected_experts = selected_experts.flatten()
-++++-            
-++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++-            token_indices = broadcasted_token_indices.flatten()
-++++-            
-++++-            active_experts = ops.unique(flat_selected_experts)
-++++-            
-++++-            for expert_idx_tensor in active_experts:
-++++-                expert_idx = expert_idx_tensor.item()
-++++-                expert_layer = self.experts[expert_idx]
-++++-                
-++++-                mask = (flat_selected_experts == expert_idx_tensor)
-++++-                selected_token_indices = token_indices[mask]
-++++-                selected_routing_weights = routing_weights.flatten()[mask]
-++++-                
-++++-                current_states = hidden_states_reshaped[selected_token_indices]
-++++-                
-++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-                
-++++-                final_hidden_states = final_hidden_states.index_add(
-++++-                    dim=0,
-++++-                    index=selected_token_indices,
-++++-                    source=expert_output.to(hidden_states.dtype)
-++++-                )
-++++-            
-++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-+++++    @no_grad()
-+++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++        original_dtype = hidden_states.dtype
-+++++        batch_size, _ = hidden_states.shape
-+++++        expert_outputs_list = [
-+++++            ops.cat([
-+++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++            ], dim=0)
-+++++            for i in range(batch_size)
-+++++        ]
-+++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++        weights_fp32 = routing_weights.to(mindspore.float32)
-+++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++        return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++
-+++++    @no_grad()
-+++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++        num_tokens, _ = hidden_states.shape
-+++++        flat_selected_experts = selected_experts.flatten()
-+++++        sorted_expert_indices = flat_selected_experts.argsort()
-+++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++++        original_token_indices = sorted_expert_indices // self.top_k
-+++++        moe_output = ops.zeros_like(hidden_states)
-+++++        current_token_offset = 0
-+++++        for i in range(self.num_experts):
-+++++            expert_token_count = tokens_per_expert[i] - current_token_offset
-+++++            if expert_token_count == 0:
-+++++                continue
-+++++            end_offset = current_token_offset + expert_token_count
-+++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++++            expert_hidden_states = hidden_states[expert_original_token_indices]
-+++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++            current_token_offset += expert_token_count
-+++++        return moe_output
-+++++
-+++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+++++    @no_grad()
-+++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++        moe_output = ops.zeros_like(hidden_states)
-+++++        num_tokens, _ = hidden_states.shape
-+++++        flat_selected_experts = selected_experts.flatten()
-+++++        flat_routing_weights = routing_weights.flatten()
-+++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++        active_experts = ops.unique(flat_selected_experts)
-+++++        for expert_idx_tensor in active_experts:
-+++++            expert_idx = expert_idx_tensor.item()
-+++++            expert_layer = self.experts[expert_idx]
-+++++            mask = (flat_selected_experts == expert_idx_tensor)
-+++++            current_token_indices = token_indices[mask]
-+++++            current_routing_weights = flat_routing_weights[mask]
-+++++            current_hidden_states = hidden_states[current_token_indices]
-+++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++        return moe_output
-++++ 
-++++-            final_hidden_states = final_hidden_states + shared_expert_output
-++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++-            
-++++-            return final_hidden_states, router_logits
-+++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++        global Long_Prompt
-+++++
-+++++        # 1. 门控计算 (所有模式通用)
-+++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++        router_logits = self.gate(hidden_states_reshaped)
-+++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++++        if self.norm_topk_prob:
-+++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++        
-+++++        moe_output = None
-+++++        if Long_Prompt:
-+++++            # --- 精度优先模式 (ACCURACY MODE) ---
-+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        else:
-+++++            # --- 速度优先模式 (SPEED MODE) ---
-+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++            if sequence_length == 1:
-+++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++            else:
-+++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        
-++++ 
-+++++        # 3. 共享专家计算与合并 (所有模式通用)
-+++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++        
-+++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++        
-+++++        return final_hidden_states, router_logits
-++++ 
-++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++++         super().__init__()
-++++         self.hidden_size = config.hidden_size
-+++++        
-+++++        # if Long_Prompt:
-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++        # else:
-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++ 
-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++ 
-++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++-
-++++         if (layer_idx not in config.mlp_only_layers) and (
-++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++         ):
-++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++             self._warmed_up = True
-++++             self.warmup_moe_model()
-++++ 
-+++++
-+++++
-++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++         output_router_logits = (
-++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++             router_logits=outputs.router_logits,
-++++         )
-++++ 
-+++++    def generate(self, *args, **kwargs):
-+++++        """
-+++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+++++        """
-+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++++
-+++++        input_ids = kwargs.get("input_ids")
-+++++        if input_ids is None and args:
-+++++            input_ids = args[0]
-+++++
-+++++        if input_ids is not None:
-+++++            prompt_length = input_ids.shape[1]
-+++++            
-+++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+++++                Long_Prompt = True
-+++++            else:
-+++++                Long_Prompt = False
-+++++
-+++++        return super().generate(*args, **kwargs)
-+++++    
-++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-++++     def prepare_inputs_for_generation(
-++++         self,
-++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
-++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-+++++        
-++++         if past_key_values is not None:
-++++             if inputs_embeds is not None:  # Exception 1
-++++                 if 0 not in input_ids.shape:
-++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++             }
-++++         )
-++++         return model_inputs
-+++++
-++++ # @lwx
-++++     # def _decode_one_tokens_logits(
-++++     #     self,
-++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-++++             attentions=outputs.attentions,
-++++         )
-++++ 
-+++++
-++++ __all__ = [
-++++     "Qwen2MoeForCausalLM",
-++++     "Qwen2MoeModel",
-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++new file mode 100644
-++++index 00000000..6dfb5b93
-++++--- /dev/null
-+++++++ b/patches/0001-20251104commit.patch
-++++@@ -0,0 +1,1272 @@
-+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++++Subject: [PATCH] 20251104commit
-+++++
-+++++---
-+++++ mindnlp/transformers/cache_utils.py           |  28 +-
-+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-+++++
-+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++++index cadd2e04..02f8d4be 100644
-+++++--- a/mindnlp/transformers/cache_utils.py
-++++++++ b/mindnlp/transformers/cache_utils.py
-+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++++             # k_out[:, :, cache_position] = key_states
-+++++             # v_out[:, :, cache_position] = value_states
-+++++-            if ON_ORANGE_PI:
-+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++-            else:
-+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++-
-++++++            # if ON_ORANGE_PI:
-++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++            # else:
-++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++            # 确保 cache_position 是 1D tensor 并且类型正确
-++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++++            if cache_position.ndim > 1:
-++++++                cache_position = cache_position.flatten()
-++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++++                cache_position = cache_position.int()
-++++++            
-++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++++            k_out[:, :, cache_position] = key_states
-++++++            v_out[:, :, cache_position] = value_states
-++++++            
-+++++         return k_out, v_out
-+++++ 
-+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++index c695b944..d8303e45 100644
-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++++ def rotate_half(x):
-+++++     """Rotates half the hidden dims of the input."""
-+++++-    x1 = x[..., : x.shape[-1] // 2]
-+++++-    x2 = x[..., x.shape[-1] // 2 :]
-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++    # x1 = x[..., : x.shape[-1] // 2]
-++++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++     return ops.cat((-x2, x1), dim=-1)
-+++++ 
-+++++ 
-+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++++         if self.training:
-+++++             raise NotImplementedError("Training is not supported yet.")
-+++++         else:
-+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++-        if self.config.n_shared_experts is not None:
-+++++-            y = y + self.shared_experts(identity)
-+++++-        return y
-++++++            # @lwx
-++++++            if orig_shape[1] == 1:
-++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++++                y=y.view(*orig_shape)
-++++++                if self.config.n_shared_experts is not None:
-++++++                    y = y + self.shared_experts(identity)
-++++++                return y
-++++++            else:
-++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++++                if self.config.n_shared_experts is not None:
-++++++                    y = y + self.shared_experts(identity)
-++++++                return y
-++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++        # if self.config.n_shared_experts is not None:
-++++++        #     y = y + self.shared_experts(identity)
-++++++        # return y
-++++++        
-++++++    @no_grad()
-++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++
-++++++        expert_cache = ops.zeros_like(x)
-++++++        for i in range(self.num_experts_per_tok):
-++++++            expert_id = flat_expert_indices[i].item()
-++++++            weight = flat_expert_weights[i].item()
-++++++            expert = self.experts[expert_id]
-++++++            expert_out = expert(x)
-++++++            expert_cache += expert_out * weight
-++++++        return expert_cache
-+++++ 
-+++++     @no_grad()
-+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-        # expert_cache = torch.zeros_like(x)
-+++++-        # idxs = flat_expert_indices.argsort()
-+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++-        # token_idxs = idxs // self.num_experts_per_tok
-+++++-        # for i, end_idx in enumerate(tokens_per_expert):
-+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++-        #     if start_idx == end_idx:
-+++++-        #         continue
-+++++-        #     expert = self.experts[i]
-+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-        #     expert_tokens = x[exp_token_idx]
-+++++-        #     expert_out = expert(expert_tokens)
-+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++-        # return expert_cache
-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++         expert_cache = ops.zeros_like(x)
-+++++         idxs = flat_expert_indices.argsort()
-+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++         token_idxs = idxs // self.num_experts_per_tok
-++++++
-+++++         for i, end_idx in enumerate(tokens_per_expert):
-+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++             if start_idx == end_idx:
-+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++++             expert_out = expert(expert_tokens)
-+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++
-+++++         return expert_cache
-++++++        
-++++++    # @no_grad()
-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++    #     # expert_cache = torch.zeros_like(x)
-++++++    #     # idxs = flat_expert_indices.argsort()
-++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++    #     # token_idxs = idxs // self.num_experts_per_tok
-++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++    #     #     if start_idx == end_idx:
-++++++    #     #         continue
-++++++    #     #     expert = self.experts[i]
-++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #     #     expert_tokens = x[exp_token_idx]
-++++++    #     #     expert_out = expert(expert_tokens)
-++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++    #     # return expert_cache
-++++++    #     expert_cache = ops.zeros_like(x)
-++++++    #     idxs = flat_expert_indices.argsort()
-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++    #     for i, end_idx in enumerate(tokens_per_expert):
-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++    #         if start_idx == end_idx:
-++++++    #             continue
-++++++    #         expert = self.experts[i]
-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #         expert_tokens = x[exp_token_idx]
-++++++    #         expert_out = expert(expert_tokens)
-++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++
-++++++    #     return expert_cache
-++++++    # @no_grad()
-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++    #     expert_cache = ops.zeros_like(x)
-++++++
-++++++    #     # 排序保证顺序一致
-++++++    #     idxs = flat_expert_indices.argsort()
-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++    #     # 找出有 token 的专家
-++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++
-++++++    #     for i in active_experts.tolist():
-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++    #         end_idx = tokens_per_expert[i]
-++++++    #         if start_idx == end_idx:  # 没有 token
-++++++    #             continue
-++++++
-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #         expert_tokens = x[exp_token_idx]
-++++++    #         expert_out = self.experts[i](expert_tokens)
-++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++
-++++++    #         expert_cache = mindspore.mint.scatter_add(
-++++++    #             expert_cache,
-++++++    #             0,
-++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++    #             expert_out
-++++++    #         )
-++++++
-++++++    #     return expert_cache
-++++++
-++++++
-+++++ 
-+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++++ #     """
-+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++ 
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-++++++        self.warm_up = False
-++++++
-++++++    def warmup_moe_model_deep(self):
-++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++++        test_texts = [
-++++++            "warmup short",
-++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++++        ]
-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++        if tokenizer is None:
-++++++            from mindnlp.transformers import AutoTokenizer
-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++            self._warmup_tokenizer = tokenizer
-++++++
-++++++        for text in test_texts:
-++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++            with mindspore._no_grad():
-++++++                _ = self(**inputs, use_cache=False)
-++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++++ 
-+++++     def get_input_embeddings(self):
-+++++         return self.model.embed_tokens
-+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++         ```"""
-++++++        if not self.warm_up:
-++++++            self.warm_up = True
-++++++            self.warmup_moe_model_deep()
-++++++
-+++++         output_attentions = (
-+++++             output_attentions
-+++++             if output_attentions is not None
-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++index 3cbf820e..d4c6b651 100644
-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++@@ -18,7 +18,6 @@
-+++++ # See the License for the specific language governing permissions and
-+++++ # limitations under the License.
-+++++ """MindSpore Qwen2MoE model."""
-+++++-
-+++++ import math
-+++++ from typing import List, Optional, Tuple, Union
-+++++ 
-+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++++     TokenClassifierOutput,
-+++++ )
-+++++ from ...modeling_utils import PreTrainedModel
-++++++from ...generation import GenerationMixin
-+++++ from ....utils import logging
-+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-+++++ 
-+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++++         self.variance_epsilon = eps
-+++++ 
-+++++     def forward(self, hidden_states):
-++++++        # @dwj
-++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++        # @lwx
-++++++        # if not self.training :
-++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++         input_dtype = hidden_states.dtype
-+++++         hidden_states = hidden_states.to(mindspore.float32)
-+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++++@@ -234,6 +239,8 @@ def rotate_half(x):
-+++++     """Rotates half the hidden dims of the input."""
-+++++     x1 = x[..., : x.shape[-1] // 2]
-+++++     x2 = x[..., x.shape[-1] // 2 :]
-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++     return ops.cat((-x2, x1), dim=-1)
-+++++ 
-+++++ 
-+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++++         self.config = config
-+++++         self.hidden_size = config.hidden_size
-+++++         self.intermediate_size = intermediate_size
-++++++        
-+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++++         self.act_fn = ACT2FN[config.hidden_act]
-+++++ 
-+++++     def forward(self, x):
-+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++-
-+++++ 
-++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++        # @lwx 
-++++++        # gate_up_output = self.gate_up_proj(x)
-++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++++        # return self.down_proj(swiglu_output)
-++++++
-++++++    # def forward(self, x):
-++++++    #     gate_proj_out = self.gate_proj(x)
-++++++    #     up_proj_out = self.up_proj(x)
-++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++++    #     return self.down_proj(swiglu_out)
-++++++        
-+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++++     """
-+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++++         use_cache: bool = False,
-+++++         cache_position: Optional[mindspore.Tensor] = None,
-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++        
-++++++
-+++++         bsz, q_len, _ = hidden_states.shape
-+++++ 
-+++++         query_states = self.q_proj(hidden_states)
-+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++                     "with a layer index."
-+++++                 )
-+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++            if isinstance(past_key_value, StaticCache):
-++++++                kv_seq_len = key_states.shape[-2]
-++++++            else:
-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++ 
-+++++         if past_key_value is not None:
-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++            
-++++++            if isinstance(past_key_value, StaticCache):
-++++++                kv_seq_len = key_states.shape[-2]
-+++++ 
-+++++         # repeat k/v heads if n_kv_heads < n_heads
-+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++-
-++++++        
-+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++ 
-+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++++-            raise ValueError(
-+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++++-                f" {attn_weights.shape}"
-+++++-            )
-+++++-
-+++++-        if attention_mask is not None:  # no matter the length, we just slice it
-+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++++        if attention_mask is not None:
-++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++             attn_weights = attn_weights + causal_mask
-+++++ 
-+++++         # upcast attention to fp32
-+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++ 
-+++++         attn_output = self.o_proj(attn_output)
-+++++-
-++++++        # @lwx
-++++++        
-++++++        # max_seq_len = self.max_position_embeddings  # 2048
-++++++
-++++++        # if attention_mask is not None:
-++++++        #     # attention_mask: [B, 1, Sq, Sk]
-++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++
-++++++        #     # pad 到 [max_seq_len, max_seq_len]
-++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++        #     global_attention_mask = padded_mask
-++++++        # else:
-++++++        #     global_attention_mask = None
-++++++
-++++++
-++++++        # sparse_mode=3
-++++++        # attn_output = mindspore.ops.flash_attention_score(
-++++++        #     query=query_states,
-++++++        #     key=key_states,
-++++++        #     value=value_states,
-++++++        #     real_shift=None,
-++++++        #     padding_mask=None,
-++++++
-++++++        #     head_num=self.num_heads,
-++++++        #     attn_mask=global_attention_mask,
-++++++        #     keep_prob=1.0 - self.attention_dropout,
-++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++        #     input_layout="BNSD",
-++++++        #     pre_tokens=2147483647,
-++++++        #     next_tokens=2147483647,
-++++++        #     inner_precise=0,
-++++++        #     drop_mask=None,
-++++++        #     prefix=None,
-++++++        #     actual_seq_qlen=None,
-++++++        #     actual_seq_kvlen=None,
-++++++        #     sparse_mode=sparse_mode,
-++++++        # )
-+++++         if not output_attentions:
-+++++             attn_weights = None
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++ 
-++++++class Qwen2MoeFlashAttention(nn.Module):
-++++++    """
-++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++
-++++++    关键改动:
-++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++       直接传入原始的 key 和 value 张量效率更高。
-++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++    """
-++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++        super().__init__()
-++++++        self.config = config
-++++++        self.layer_idx = layer_idx
-++++++        self.hidden_size = config.hidden_size
-++++++        self.num_heads = config.num_attention_heads
-++++++        self.head_dim = self.hidden_size // self.num_heads
-++++++        self.num_key_value_heads = config.num_key_value_heads
-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++        self.max_position_embeddings = config.max_position_embeddings
-++++++        self.rope_theta = config.rope_theta
-++++++        self.attention_dropout = config.attention_dropout
-++++++
-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++            raise ValueError(
-++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++            )
-++++++
-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++
-++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++            self.head_dim,
-++++++            max_position_embeddings=self.max_position_embeddings,
-++++++            base=self.rope_theta,
-++++++        )
-++++++
-++++++    def forward(
-++++++        self,
-++++++        hidden_states: mindspore.Tensor,
-++++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++++        position_ids: Optional[mindspore.Tensor] = None,
-++++++        past_key_value: Optional[Cache] = None,
-++++++        output_attentions: bool = False,
-++++++        use_cache: bool = False,
-++++++        cache_position: Optional[mindspore.Tensor] = None,
-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++        bsz, q_len, _ = hidden_states.shape
-++++++
-++++++        # 1. 线性投射 Q, K, V
-++++++        query_states = self.q_proj(hidden_states)
-++++++        key_states = self.k_proj(hidden_states)
-++++++        value_states = self.v_proj(hidden_states)
-++++++
-++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++        # 3. RoPE 旋转位置编码
-++++++        kv_seq_len = key_states.shape[-2]
-++++++        if past_key_value is not None:
-++++++            if self.layer_idx is None:
-++++++                raise ValueError(
-++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++                    "with a layer index."
-++++++                )
-++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++                if cache_position.shape[0] == 1:
-++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++                    kv_seq_len = past_seen_tokens + 1
-++++++                else:
-++++++                    # prefill 阶段：cache_position 是范围，使用其长度
-++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++            else:
-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++        # 4. KV 缓存更新
-++++++        if past_key_value is not None:
-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++            key_states, value_states = past_key_value.update(
-++++++                key_states, value_states, self.layer_idx, cache_kwargs
-++++++            )
-++++++            
-++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++                if cache_position.shape[0] == 1:
-++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++                    kv_seq_len = key_states.shape[-2]
-++++++
-++++++        # 5. [重要] 准备 Attention Mask
-++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++        fa_attention_mask = None
-++++++        if attention_mask is not None:
-++++++            # 截取与当前key长度匹配的部分
-++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++            fa_attention_mask = (mask_slice != 0)
-++++++
-++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++        input_dtype = query_states.dtype
-++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++            query_states = query_states.to(mindspore.float16)
-++++++            key_states = key_states.to(mindspore.float16)
-++++++            value_states = value_states.to(mindspore.float16)
-++++++
-++++++        # 6. [核心] 调用 flash_attention_score 算子
-++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++        attn_output = mindspore.ops.flash_attention_score(
-++++++            query=query_states,
-++++++            key=key_states,
-++++++            value=value_states,
-++++++            head_num=self.num_heads, # 传入Q的头数(N1)
-++++++            attn_mask=fa_attention_mask,
-++++++            keep_prob=1.0 - self.attention_dropout,
-++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++            input_layout="BNSD",
-++++++            sparse_mode=0 # 使用 defaultMask 模式
-++++++        )
-++++++
-++++++        # 恢复原始数据类型
-++++++        attn_output = attn_output.to(input_dtype)
-++++++
-++++++        # 7. 调整输出形状
-++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++        attn_output = self.o_proj(attn_output)
-++++++
-++++++        # FlashAttention 算子不直接返回注意力权重矩阵
-++++++        attn_weights = None
-++++++        if output_attentions:
-++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++        return attn_output, attn_weights, past_key_value
-++++++
-++++++    # def forward(
-++++++    #     self,
-++++++    #     hidden_states: mindspore.Tensor,
-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++    #     past_key_value: Optional[Cache] = None,
-++++++    #     output_attentions: bool = False,
-++++++    #     use_cache: bool = False,
-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++
-++++++    #     # 1. 线性投射 Q, K, V
-++++++    #     query_states = self.q_proj(hidden_states)
-++++++    #     key_states = self.k_proj(hidden_states)
-++++++    #     value_states = self.v_proj(hidden_states)
-++++++
-++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++    #     # 3. RoPE 旋转位置编码
-++++++    #     kv_seq_len = key_states.shape[-2]
-++++++    #     if past_key_value is not None:
-++++++    #         if self.layer_idx is None:
-++++++    #             raise ValueError(
-++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++    #                 "with a layer index."
-++++++    #             )
-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++    #     # 4. KV 缓存更新
-++++++    #     if past_key_value is not None:
-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++    #         key_states, value_states = past_key_value.update(
-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++    #         )
-++++++
-++++++    #     # 5. 准备 Attention Mask
-++++++    #     fa_attention_mask = None
-++++++    #     if attention_mask is not None:
-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++
-++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++    #     input_dtype = query_states.dtype
-++++++
-++++++    #     # 6. [核心] 调用 flash_attention_score 算子
-++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++    #         query=query_states,
-++++++    #         key=key_states,
-++++++    #         value=value_states,
-++++++    #         head_num=self.num_heads,
-++++++    #         attn_mask=fa_attention_mask,
-++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++    #         input_layout="BNSD",
-++++++    #         sparse_mode=0,
-++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++    #         inner_precise=1
-++++++    #     )
-++++++
-++++++    #     # 恢复原始数据类型
-++++++    #     attn_output = attn_output.to(input_dtype)
-++++++
-++++++    #     # 7. 调整输出形状
-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++    #     attn_output = self.o_proj(attn_output)
-++++++
-++++++    #     attn_weights = None
-++++++    #     if output_attentions:
-++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++    #     return attn_output, attn_weights, past_key_value
-++++++
-++++++    # def forward(
-++++++    #     self,
-++++++    #     hidden_states: mindspore.Tensor,
-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++    #     past_key_value: Optional[Cache] = None,
-++++++    #     output_attentions: bool = False,
-++++++    #     use_cache: bool = False,
-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++
-++++++    #     query_states = self.q_proj(hidden_states)
-++++++    #     key_states = self.k_proj(hidden_states)
-++++++    #     value_states = self.v_proj(hidden_states)
-++++++
-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++    #     kv_seq_len = key_states.shape[-2]
-++++++    #     if past_key_value is not None:
-++++++    #         if self.layer_idx is None:
-++++++    #             raise ValueError("`layer_idx` must be specified for caching")
-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++    #     if past_key_value is not None:
-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++    #         key_states, value_states = past_key_value.update(
-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++    #         )
-++++++
-++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++
-++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++++    #     query_states = query_states / math.sqrt(self.head_dim)
-++++++    #     # <--- 修改结束 ---
-++++++
-++++++    #     fa_attention_mask = None
-++++++    #     if attention_mask is not None:
-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++
-++++++    #     input_dtype = query_states.dtype
-++++++
-++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++    #         query=query_states,    # 传入已经预先缩放过的 query
-++++++    #         key=key_states,
-++++++    #         value=value_states,
-++++++    #         head_num=self.num_heads,
-++++++    #         attn_mask=fa_attention_mask,
-++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++++    #         input_layout="BNSD",
-++++++    #         sparse_mode=0,
-++++++    #         inner_precise=1        # 仍然保持内部高精度计算
-++++++    #     )
-++++++
-++++++    #     attn_output = attn_output.to(input_dtype)
-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++    #     attn_output = self.o_proj(attn_output)
-++++++
-++++++    #     attn_weights = None
-++++++    #     if output_attentions:
-++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++++
-++++++    #     return attn_output, attn_weights, past_key_value
-++++++
-+++++ QWEN2MOE_ATTENTION_CLASSES = {
-+++++     "eager": Qwen2MoeAttention,
-++++++    "flash-attention": Qwen2MoeFlashAttention,
-+++++ }
-+++++ 
-+++++ 
-+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++ 
-++++++    #@dwj
-++++++    # 只遍历激活的专家，而非全部专家
-+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-+++++-        # router_logits: (batch * sequence_length, n_experts)
-+++++-        router_logits = self.gate(hidden_states)
-+++++-
-+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-        if self.norm_topk_prob:
-+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        # we cast back to the input dtype
-+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-
-+++++-        final_hidden_states = ops.zeros(
-+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++++-        )
-+++++-
-+++++-        # One hot encode the selected experts to create an expert mask
-+++++-        # this will be used to easily index which expert is going to be sollicitated
-+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++++-
-+++++-        # Loop over all available experts in the model and perform the computation on each expert
-+++++-        for expert_idx in range(self.num_experts):
-+++++-            expert_layer = self.experts[expert_idx]
-+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++++-
-+++++-            # Index the correct hidden states and compute the expert hidden state for
-+++++-            # the current expert. We need to make sure to multiply the output hidden
-+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++++-            if 0 not in idx.shape:
-+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++++-
-+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-+++++-                # the `top_x` tensor here.
-+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++++-
-+++++-        shared_expert_output = self.shared_expert(hidden_states)
-+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++++-
-+++++-        final_hidden_states = final_hidden_states + shared_expert_output
-++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++            num_tokens = hidden_states_reshaped.shape[0]
-++++++
-++++++            router_logits = self.gate(hidden_states_reshaped)
-++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++
-++++++            if self.norm_topk_prob:
-++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++            routing_weights = routing_weights.to(hidden_states.dtype)
-++++++            
-++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++            flat_selected_experts = selected_experts.flatten()
-++++++            
-++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++            token_indices = broadcasted_token_indices.flatten()
-++++++            
-++++++            active_experts = ops.unique(flat_selected_experts)
-++++++            
-++++++            for expert_idx_tensor in active_experts:
-++++++                expert_idx = expert_idx_tensor.item()
-++++++                expert_layer = self.experts[expert_idx]
-++++++                
-++++++                mask = (flat_selected_experts == expert_idx_tensor)
-++++++                selected_token_indices = token_indices[mask]
-++++++                selected_routing_weights = routing_weights.flatten()[mask]
-++++++                
-++++++                current_states = hidden_states_reshaped[selected_token_indices]
-++++++                
-++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++                
-++++++                final_hidden_states = final_hidden_states.index_add(
-++++++                    dim=0,
-++++++                    index=selected_token_indices,
-++++++                    source=expert_output.to(hidden_states.dtype)
-++++++                )
-++++++            
-++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++ 
-+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++-        return final_hidden_states, router_logits
-++++++            final_hidden_states = final_hidden_states + shared_expert_output
-++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++            
-++++++            return final_hidden_states, router_logits
-+++++ 
-+++++ 
-+++++ class Qwen2MoeDecoderLayer(nn.Module):
-+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++++ 
-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++ 
-++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++
-+++++         if (layer_idx not in config.mlp_only_layers) and (
-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++++         ):
-+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++++     _skip_keys_device_placement = "past_key_values"
-+++++     _supports_cache_class = True
-++++++#lwx
-++++++    # _supports_static_cache = True
-+++++ 
-+++++     def _init_weights(self, module):
-+++++         std = self.config.initializer_range
-+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++++         return causal_mask
-+++++ 
-+++++ 
-+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++     _tied_weights_keys = ["lm_head.weight"]
-+++++ 
-+++++     def __init__(self, config):
-+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++         self.num_experts_per_tok = config.num_experts_per_tok
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-++++++        # @lwx
-++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++++        #     self.generation_config.cache_implementation = "static"
-++++++        self._warmed_up = False
-++++++
-++++++    def warmup_moe_model(self):
-++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++++        test_texts = [
-++++++            "warmup short",
-++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++++        ]
-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++        if tokenizer is None:
-++++++            from mindnlp.transformers import AutoTokenizer
-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++            self._warmup_tokenizer = tokenizer
-++++++
-++++++        for text in test_texts:
-++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++            with mindspore._no_grad():
-++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++++ 
-+++++     def get_input_embeddings(self):
-+++++         return self.model.embed_tokens
-+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++         ```"""
-++++++        if not self._warmed_up:
-++++++            self._warmed_up = True
-++++++            self.warmup_moe_model()
-+++++ 
-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++++         output_router_logits = (
-+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++             }
-+++++         )
-+++++         return model_inputs
-++++++# @lwx
-++++++    # def _decode_one_tokens_logits(
-++++++    #     self,
-++++++    #     cur_token: mindspore.Tensor,
-++++++    #     input_pos: Optional[mindspore.Tensor],
-++++++    #     cache_position: mindspore.Tensor,
-++++++    #     past_key_values: StaticCache,
-++++++    # ) -> mindspore.Tensor:
-++++++    #     """
-++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++++        
-++++++    #     Args:
-++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++++    #         input_pos: 输入位置信息，可选
-++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++++            
-++++++    #     Returns:
-++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++++    #     """
-++++++    #     # 调用JIT编译的版本
-++++++    #     return self.get_decode_one_tokens_logits(
-++++++    #         cur_token=cur_token,
-++++++    #         input_pos=input_pos,
-++++++    #         cache_position=cache_position,
-++++++    #         past_key_values=past_key_values,
-++++++    #     )
-++++++    
-++++++    # @mindspore.jit(jit_level='O1')
-++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++++    #     """
-++++++    #     JIT编译的函数，用于高效的单token解码
-++++++    #     使用JIT编译优化以支持静态shape和高效执行
-++++++        
-++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++++    #     """
-++++++    #     outputs = self.model.forward(
-++++++    #         input_ids=cur_token,
-++++++    #         position_ids=input_pos,
-++++++    #         cache_position=cache_position,
-++++++    #         past_key_values=past_key_values,
-++++++    #         use_cache=True,
-++++++    #         return_dict=False,
-++++++    #     )
-++++++        
-++++++    #     hidden_states = outputs[0]
-++++++    #     logits = self.lm_head.forward(hidden_states)
-++++++    #     logits = logits.float()
-++++++        
-++++++    #     return logits[:, -1, :]
-++++++
-++++++    # def _sample(
-++++++    #     self,
-++++++    #     input_ids: mindspore.Tensor,
-++++++    #     logits_processor,
-++++++    #     stopping_criteria,
-++++++    #     generation_config,
-++++++    #     synced_devices: bool,
-++++++    #     streamer=None,
-++++++    #     logits_warper=None,
-++++++    #     **model_kwargs,
-++++++    # ):
-++++++    #     """
-++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++++    #     """
-++++++    #     from ...generation.logits_process import LogitsProcessorList
-++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++++    #     from mindnlp.core import nn, ops, no_grad
-++++++    #     import numpy as np
-++++++        
-++++++    #     # 检查是否使用 StaticCache
-++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++++    #     # 否则，直接调用父类方法
-++++++    #     past_key_values = model_kwargs.get("past_key_values")
-++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++++
-++++++    #     if not isinstance(past_key_values, StaticCache):
-++++++    #         # 不使用 StaticCache，直接调用父类方法
-++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++++    #         return super()._sample(
-++++++    #             input_ids=input_ids,
-++++++    #             logits_processor=logits_processor,
-++++++    #             stopping_criteria=stopping_criteria,
-++++++    #             generation_config=generation_config,
-++++++    #             synced_devices=synced_devices,
-++++++    #             streamer=streamer,
-++++++    #             logits_warper=logits_warper,
-++++++    #             **model_kwargs,
-++++++    #         )
-++++++        
-++++++    #     # 使用 StaticCache，进入自定义循环
-++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++++    #     pad_token_id = generation_config._pad_token_tensor
-++++++    #     output_attentions = generation_config.output_attentions
-++++++    #     output_hidden_states = generation_config.output_hidden_states
-++++++    #     output_scores = generation_config.output_scores
-++++++    #     output_logits = generation_config.output_logits
-++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++++    #     max_length = generation_config.max_length
-++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++++    #     do_sample = generation_config.do_sample
-++++++        
-++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++++    #         raise ValueError(
-++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++++    #             f"{logits_warper})."
-++++++    #         )
-++++++        
-++++++    #     # init attention / hidden states / scores tuples
-++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++++        
-++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++++    #         encoder_hidden_states = (
-++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++++    #         )
-++++++        
-++++++    #     # keep track of which sequences are already finished
-++++++    #     batch_size, cur_len = input_ids.shape
-++++++    #     this_peer_finished = False
-++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++++        
-++++++    #     time_record = []
-++++++    #     from ....utils.testing_utils import parse_flag_from_env
-++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++++        
-++++++    #     while self._has_unfinished_sequences(
-++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++++    #     ):
-++++++    #         if _record_time:
-++++++    #             import time as time_module
-++++++    #             infer_start = time_module.time()
-++++++            
-++++++    #         # prepare model inputs
-++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++++            
-++++++    #         # prepare variable output controls
-++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++++            
-++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++++    #         cur_cache_position = model_inputs.get("cache_position")
-++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++++    #         cur_input_ids = model_inputs.get("input_ids")
-++++++            
-++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++++    #             cur_cache_position is not None and 
-++++++    #             len(cur_cache_position.shape) > 0 and
-++++++    #             cur_cache_position.shape[0] == 1 and
-++++++    #             cur_input_ids is not None and
-++++++    #             cur_input_ids.shape[1] == 1):
-++++++    #             # 使用 JIT 优化的单 token 解码
-++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++++    #             if not hasattr(self, '_jit_used'):
-++++++    #                 self._jit_used = False
-++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++++                
-++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++++    #                 cur_token=cur_input_ids,
-++++++    #                 input_pos=model_inputs.get("position_ids"),
-++++++    #                 cache_position=cur_cache_position,
-++++++    #                 past_key_values=cur_past_key_values,
-++++++    #             )
-++++++                
-++++++    #             # 标记已使用JIT（用于后续判断）
-++++++    #             if not self._jit_used:
-++++++    #                 self._jit_used = True
-++++++                
-++++++    #             # 构造兼容的输出对象
-++++++    #             class JitOptimizedOutput:
-++++++    #                 def __init__(self, logits, config):
-++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++++    #                     self.config = config
-++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++++    #                     self.cross_attentions = None
-++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++++                
-++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++++    #         else:
-++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++++    #             outputs = self(**model_inputs, return_dict=True)
-++++++            
-++++++    #         if synced_devices and this_peer_finished:
-++++++    #             continue
-++++++            
-++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++++    #         next_token_logits = outputs.logits[:, -1, :]
-++++++            
-++++++    #         # pre-process distribution
-++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++++    #         if do_sample:
-++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++++            
-++++++    #         # Store scores, attentions and hidden_states when required
-++++++    #         if return_dict_in_generate:
-++++++    #             if output_scores:
-++++++    #                 scores += (next_token_scores,)
-++++++    #             if output_logits:
-++++++    #                 raw_logits += (next_token_logits,)
-++++++    #             if output_attentions:
-++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++++    #                 if self.config.is_encoder_decoder:
-++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++++                
-++++++    #             if output_hidden_states:
-++++++    #                 hidden = (
-++++++    #                     outputs.decoder_hidden_states
-++++++    #                     if self.config.is_encoder_decoder
-++++++    #                     else outputs.hidden_states
-++++++    #                 )
-++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++++            
-++++++    #         # token selection
-++++++    #         if do_sample:
-++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++++    #         else:
-++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++++            
-++++++    #         # finished sentences should have their next token be a padding token
-++++++    #         if has_eos_stopping_criteria:
-++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++++            
-++++++    #         # update generated ids, model inputs, and length for next step
-++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++++    #         if streamer is not None:
-++++++    #             streamer.put(next_tokens)
-++++++            
-++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++++    #             outputs,
-++++++    #             model_kwargs,
-++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++++    #         )
-++++++            
-++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++++    #         cur_len += 1
-++++++            
-++++++    #         if _record_time:
-++++++    #             import time as time_module
-++++++    #             infer_stop = time_module.time()
-++++++    #             time_record.append(infer_stop - infer_start)
-++++++            
-++++++    #         del outputs
-++++++        
-++++++    #     average_infer_time = None
-++++++    #     if time_record:
-++++++    #         if len(time_record) > 1:
-++++++    #             time_record.pop(0)
-++++++    #         average_infer_time = sum(time_record) / len(time_record)
-++++++    #         print(f'average inference time is: {average_infer_time}')
-++++++    #         print(f'inference time record: {time_record}')
-++++++        
-++++++    #     if streamer is not None:
-++++++    #         streamer.end()
-++++++        
-++++++    #     # 简单判断：打印是否使用了JIT路径
-++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++++    #     else:
-++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++++        
-++++++    #     if return_dict_in_generate:
-++++++    #         if self.config.is_encoder_decoder:
-++++++    #             return GenerateEncoderDecoderOutput(
-++++++    #                 sequences=input_ids,
-++++++    #                 scores=scores,
-++++++    #                 logits=raw_logits,
-++++++    #                 encoder_attentions=encoder_attentions,
-++++++    #                 encoder_hidden_states=encoder_hidden_states,
-++++++    #                 decoder_attentions=decoder_attentions,
-++++++    #                 cross_attentions=cross_attentions,
-++++++    #                 decoder_hidden_states=decoder_hidden_states,
-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++    #                 average_infer_time=average_infer_time
-++++++    #             )
-++++++    #         else:
-++++++    #             return GenerateDecoderOnlyOutput(
-++++++    #                 sequences=input_ids,
-++++++    #                 scores=scores,
-++++++    #                 logits=raw_logits,
-++++++    #                 attentions=decoder_attentions,
-++++++    #                 hidden_states=decoder_hidden_states,
-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++    #                 average_infer_time=average_infer_time
-++++++    #             )
-++++++    #     else:
-++++++    #         return input_ids
-++++++            
-++++++    # def _prepare_cache_for_generation(
-++++++    #     self,
-++++++    #     generation_config,
-++++++    #     model_kwargs,
-++++++    #     assistant_model,
-++++++    #     batch_size,
-++++++    #     max_cache_length,
-++++++    # ):
-++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++++    #         generation_config.cache_implementation = "static"
-++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++++        
-++++++    #     if generation_config.cache_implementation == "static":
-++++++    #         base_required_from_max_length = generation_config.max_length + 1
-++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++++    #         min_cache_size = 50
-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++++    #         else:
-++++++    #             max_cache_length = max(base_required, min_cache_size)
-++++++            
-++++++    #         original_max_cache_length = max_cache_length
-++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++++            
-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++    #             if max_cache_length > self.config.max_position_embeddings:
-++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++++        
-++++++    #     result = super()._prepare_cache_for_generation(
-++++++    #         generation_config=generation_config,
-++++++    #         model_kwargs=model_kwargs,
-++++++    #         assistant_model=assistant_model,
-++++++    #         batch_size=batch_size,
-++++++    #         max_cache_length=max_cache_length,
-++++++    #     )
-++++++        
-++++++    #     if generation_config.cache_implementation == "static":
-++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++++    #         created_cache = model_kwargs.get(cache_name)
-++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++++    #             if created_cache.max_cache_len < generation_config.max_length:
-++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++++        
-++++++    #     return result
-++++++
-++++++
-++++++
-+++++ 
-+++++ 
-+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++++-- 
-+++++2.27.0
-+++++
-++++-- 
-++++2.27.0
-++++
-+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+++new file mode 100644
-+++index 00000000..966529e4
-+++--- /dev/null
-++++++ b/patches/0003-20261106secondcommit.patch
-+++@@ -0,0 +1,2769 @@
-++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Thu, 6 Nov 2025 14:54:37 +0800
-++++Subject: [PATCH 3/3] 20261106secondcommit
-++++
-++++---
-++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-++++ patches/0001-20251104commit.patch             | 1272 -----------------
-++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
-++++ delete mode 100644 patches/0001-20251104commit.patch
-++++
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index 73773c22..2f9192bf 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-++++ 
-++++ _CONFIG_FOR_DOC = "DeepseekConfig"
-++++ 
-+++++_attn_mask_cache = {}
-+++++
-+++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-+++++    q_len = batch_and_seq[1]
-+++++    kv_len = batch_and_seq[1] + past_key_values_length 
-+++++    key = (batch_and_seq[0], q_len, kv_len)
-+++++
-+++++    if key in _attn_mask_cache:
-+++++        return _attn_mask_cache[key]
-+++++
-+++++    mask = _prepare_4d_causal_attention_mask(
-+++++        attention_mask,
-+++++        batch_and_seq,
-+++++        inputs_embeds,
-+++++        past_key_values_length,
-+++++    )
-+++++    _attn_mask_cache[key] = mask
-+++++    return mask
-++++ 
-++++ def _get_unpad_data(attention_mask):
-++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-++++         return final_output
-++++ 
-++++ 
-++++-    @no_grad()
-++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++-        expert_cache = ops.zeros_like(x)
-++++-        idxs = flat_expert_indices.argsort()
-++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++-        token_idxs = idxs // self.num_experts_per_tok
-++++-
-++++-        for i, end_idx in enumerate(tokens_per_expert):
-++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++-            if start_idx == end_idx:
-++++-                continue
-++++-            expert = self.experts[i]
-++++-            exp_token_idx = token_idxs[start_idx:end_idx]
-++++-            expert_tokens = x[exp_token_idx]
-++++-            expert_out = expert(expert_tokens)
-++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++-
-++++-        return expert_cache
-++++-        
-++++     # @no_grad()
-++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-    #     # expert_cache = torch.zeros_like(x)
-++++-    #     # idxs = flat_expert_indices.argsort()
-++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++-    #     # token_idxs = idxs // self.num_experts_per_tok
-++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++-    #     #     if start_idx == end_idx:
-++++-    #     #         continue
-++++-    #     #     expert = self.experts[i]
-++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++-    #     #     expert_tokens = x[exp_token_idx]
-++++-    #     #     expert_out = expert(expert_tokens)
-++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++-    #     # return expert_cache
-+++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++     #     expert_cache = ops.zeros_like(x)
-++++     #     idxs = flat_expert_indices.argsort()
-++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++ 
-++++     #     return expert_cache
-++++-    # @no_grad()
-++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-    #     expert_cache = ops.zeros_like(x)
-+++++        
-+++++    @no_grad()
-+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++        """
-+++++        优化版 MoE prefill：
-+++++        - 批量张量化处理同一个 expert 的所有 token
-+++++        - 跳过无 token 的专家
-+++++        - 保持结果完全一致
-+++++        """
-+++++        # 初始化输出缓存
-+++++        expert_cache = ops.zeros_like(x)
-++++ 
-++++-    #     # 排序保证顺序一致
-++++-    #     idxs = flat_expert_indices.argsort()
-++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++-    #     token_idxs = idxs // self.num_experts_per_tok
-+++++        # 排序（确保 scatter_add 位置对应原逻辑）
-+++++        idxs = flat_expert_indices.argsort()
-+++++        sorted_expert_indices = flat_expert_indices[idxs]
-+++++        sorted_token_indices = idxs // self.num_experts_per_tok
-++++ 
-++++-    #     # 找出有 token 的专家
-++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++        # 每个 expert 的 token 数
-+++++        tokens_per_expert = sorted_expert_indices.bincount()
-++++ 
-++++-    #     for i in active_experts.tolist():
-++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++-    #         end_idx = tokens_per_expert[i]
-++++-    #         if start_idx == end_idx:  # 没有 token
-++++-    #             continue
-+++++        # 找出有 token 的专家
-+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++++ 
-++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++-    #         expert_tokens = x[exp_token_idx]
-++++-    #         expert_out = self.experts[i](expert_tokens)
-++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++        for expert_id in active_experts.tolist():
-+++++            # 取该 expert 对应的排序后 token 区间
-+++++            start = (tokens_per_expert[:expert_id]).sum().item()
-+++++            end = start + tokens_per_expert[expert_id].item()
-++++ 
-++++-    #         expert_cache = mindspore.mint.scatter_add(
-++++-    #             expert_cache,
-++++-    #             0,
-++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++-    #             expert_out
-++++-    #         )
-+++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-+++++            expert_tokens = x[token_idx]                     # 取输入向量
-++++ 
-++++-    #     return expert_cache
-+++++            # 执行专家 MLP
-+++++            expert_out = self.experts[expert_id](expert_tokens)
-+++++
-+++++            # 按权重缩放
-+++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-+++++
-+++++            # 回写到缓存（等价 scatter_add）
-+++++            expert_cache = mindspore.mint.scatter_add(
-+++++                expert_cache,
-+++++                0,
-+++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++                scaled_out
-+++++            )
-+++++
-+++++        return expert_cache
-+++++
-+++++        # @no_grad()
-+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++        #     # expert_cache = torch.zeros_like(x)
-+++++        #     # idxs = flat_expert_indices.argsort()
-+++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++        #     # token_idxs = idxs // self.num_experts_per_tok
-+++++        #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++        #     #     if start_idx == end_idx:
-+++++        #     #         continue
-+++++        #     #     expert = self.experts[i]
-+++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++        #     #     expert_tokens = x[exp_token_idx]
-+++++        #     #     expert_out = expert(expert_tokens)
-+++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++        #     # return expert_cache
-+++++        #     expert_cache = ops.zeros_like(x)
-+++++        #     idxs = flat_expert_indices.argsort()
-+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++        #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++        #     for i, end_idx in enumerate(tokens_per_expert):
-+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++        #         if start_idx == end_idx:
-+++++        #             continue
-+++++        #         expert = self.experts[i]
-+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++        #         expert_tokens = x[exp_token_idx]
-+++++        #         expert_out = expert(expert_tokens)
-+++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++
-+++++        #     return expert_cache
-+++++        # @no_grad()
-+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++        #     expert_cache = ops.zeros_like(x)
-+++++
-+++++        #     # 排序保证顺序一致
-+++++        #     idxs = flat_expert_indices.argsort()
-+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++        #     token_idxs = idxs // self.num_experts_per_tok
-+++++
-+++++        #     # 找出有 token 的专家
-+++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++
-+++++        #     for i in active_experts.tolist():
-+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++        #         end_idx = tokens_per_expert[i]
-+++++        #         if start_idx == end_idx:  # 没有 token
-+++++        #             continue
-+++++
-+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++        #         expert_tokens = x[exp_token_idx]
-+++++        #         expert_out = self.experts[i](expert_tokens)
-+++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++
-+++++        #         expert_cache = mindspore.mint.scatter_add(
-+++++        #             expert_cache,
-+++++        #             0,
-+++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++        #             expert_out
-+++++        #         )
-+++++
-+++++        #     return expert_cache
-++++ 
-++++ 
-++++ 
-++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++-
-++++ # class DeepseekFlashAttention(nn.Module):
-++++ #     """
-++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-+++++
-++++ Deepseek_ATTENTION_CLASSES = {
-++++     "eager": DeepseekAttention,
-++++     "flash-attention": DeepseekFlashAttention,
-++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-++++             )
-++++         else:
-++++             # 4d mask is passed through the layers
-++++-            attention_mask = _prepare_4d_causal_attention_mask(
-+++++            # attention_mask = _prepare_4d_causal_attention_mask(
-+++++            #     attention_mask,
-+++++            #     (batch_size, seq_length),
-+++++            #     inputs_embeds,
-+++++            #     past_key_values_length,
-+++++            # )
-+++++            #@dwj
-+++++            attention_mask = get_cached_causal_mask(
-++++                 attention_mask,
-++++                 (batch_size, seq_length),
-++++                 inputs_embeds,
-++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-++++         self.warm_up = False
-+++++        #@dwj
-+++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+++++            self.num_layers,
-+++++            self.num_attention_heads,
-+++++            self.head_dim,
-+++++            batch_size=1,
-+++++            max_length=self.max_length,
-+++++            dtype=mindspore.float16
-+++++        )
-+++++
-+++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+++++        key_cache = []
-+++++        value_cache = []
-+++++        for _ in range(num_layers):
-+++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++            key_cache.append(k)
-+++++            value_cache.append(v)
-+++++        return key_cache, value_cache
-+++++
-++++ 
-++++     def warmup_moe_model_deep(self):
-++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++index bced285c..ebd7782e 100644
-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++++ 
-++++-Long_Prompt = False
-++++-PROMPT_LENGTH_THRESHOLD = 128
-+++++Long_Prompt = 1
-+++++LONG_PROMPT_LENGTH_THRESHOLD = 128
-+++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-+++++
-+++++_causal_mask_cache = {}
-+++++
-+++++def get_cached_causal_mask_with_cache_position(
-+++++    attention_mask: mindspore.Tensor,
-+++++    sequence_length: int,
-+++++    target_length: int,
-+++++    dtype: mindspore.dtype,
-+++++    min_dtype: float,
-+++++    cache_position: mindspore.Tensor,
-+++++    batch_size: int,
-+++++):
-+++++    """
-+++++    带缓存的 causal mask 构造函数
-+++++    """
-+++++    # q_len 是当前 query 长度
-+++++    q_len = sequence_length
-+++++    # kv_len 是 target_length
-+++++    kv_len = target_length
-+++++
-+++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-+++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-+++++
-+++++    if key in _causal_mask_cache:
-+++++        return _causal_mask_cache[key]
-+++++
-+++++    # 调用原来的 mask 构造逻辑
-+++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++        attention_mask,
-+++++        sequence_length=sequence_length,
-+++++        target_length=target_length,
-+++++        dtype=dtype,
-+++++        min_dtype=min_dtype,
-+++++        cache_position=cache_position,
-+++++        batch_size=batch_size,
-+++++    )
-+++++    # 缓存结果
-+++++    _causal_mask_cache[key] = causal_mask
-+++++    return causal_mask
-++++ 
-++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++ 
-++++ 
-++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-+++++# class Qwen2MoeAttention(nn.Module):
-+++++#     """
-+++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+++++#     and "Generating Long Sequences with Sparse Transformers".
-+++++#     """
-+++++
-+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++#         super().__init__()
-+++++#         self.config = config
-+++++#         self.layer_idx = layer_idx
-+++++#         if layer_idx is None:
-+++++#             logger.warning_once(
-+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++#                 "when creating this class."
-+++++#             )
-+++++
-+++++#         self.hidden_size = config.hidden_size
-+++++#         self.num_heads = config.num_attention_heads
-+++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++#         self.rope_theta = config.rope_theta
-+++++#         self.is_causal = True
-+++++#         self.attention_dropout = config.attention_dropout
-+++++
-+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++#             raise ValueError(
-+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++#                 f" and `num_heads`: {self.num_heads})."
-+++++#             )
-+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++
-+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++#             self.head_dim,
-+++++#             max_position_embeddings=self.max_position_embeddings,
-+++++#             base=self.rope_theta,
-+++++#         )
-+++++
-+++++#     def forward(
-+++++#         self,
-+++++#         hidden_states: mindspore.Tensor,
-+++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++#         past_key_value: Optional[Cache] = None,
-+++++#         output_attentions: bool = False,
-+++++#         use_cache: bool = False,
-+++++#         cache_position: Optional[mindspore.Tensor] = None,
-+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++
-+++++        
-+++++
-+++++#         bsz, q_len, _ = hidden_states.shape
-+++++
-+++++#         query_states = self.q_proj(hidden_states)
-+++++#         key_states = self.k_proj(hidden_states)
-+++++#         value_states = self.v_proj(hidden_states)
-+++++
-+++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++
-+++++#         kv_seq_len = key_states.shape[-2]
-+++++#         if past_key_value is not None:
-+++++#             if self.layer_idx is None:
-+++++#                 raise ValueError(
-+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++#                     "with a layer index."
-+++++#                 )
-+++++#             if isinstance(past_key_value, StaticCache):
-+++++#                 kv_seq_len = key_states.shape[-2]
-+++++#             else:
-+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++
-+++++#         if past_key_value is not None:
-+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++            
-+++++#             if isinstance(past_key_value, StaticCache):
-+++++#                 kv_seq_len = key_states.shape[-2]
-+++++
-+++++#         # repeat k/v heads if n_kv_heads < n_heads
-+++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++        
-+++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++
-+++++#         if attention_mask is not None:
-+++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++#             attn_weights = attn_weights + causal_mask
-+++++
-+++++#         # upcast attention to fp32
-+++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++++#         attn_output = ops.matmul(attn_weights, value_states)
-+++++
-+++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++++#             raise ValueError(
-+++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+++++#                 f" {attn_output.shape}"
-+++++#             )
-+++++
-+++++#         attn_output = ops.transpose(attn_output, 1, 2)
-+++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++
-+++++#         attn_output = self.o_proj(attn_output)
-+++++#         # @lwx
-+++++        
-+++++#         # max_seq_len = self.max_position_embeddings  # 2048
-+++++
-+++++#         # if attention_mask is not None:
-+++++#         #     # attention_mask: [B, 1, Sq, Sk]
-+++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++
-+++++#         #     # pad 到 [max_seq_len, max_seq_len]
-+++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++#         #     global_attention_mask = padded_mask
-+++++#         # else:
-+++++#         #     global_attention_mask = None
-+++++
-+++++
-+++++#         # sparse_mode=3
-+++++#         # attn_output = mindspore.ops.flash_attention_score(
-+++++#         #     query=query_states,
-+++++#         #     key=key_states,
-+++++#         #     value=value_states,
-+++++#         #     real_shift=None,
-+++++#         #     padding_mask=None,
-+++++
-+++++#         #     head_num=self.num_heads,
-+++++#         #     attn_mask=global_attention_mask,
-+++++#         #     keep_prob=1.0 - self.attention_dropout,
-+++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++#         #     input_layout="BNSD",
-+++++#         #     pre_tokens=2147483647,
-+++++#         #     next_tokens=2147483647,
-+++++#         #     inner_precise=0,
-+++++#         #     drop_mask=None,
-+++++#         #     prefix=None,
-+++++#         #     actual_seq_qlen=None,
-+++++#         #     actual_seq_kvlen=None,
-+++++#         #     sparse_mode=sparse_mode,
-+++++#         # )
-+++++#         if not output_attentions:
-+++++#             attn_weights = None
-+++++
-+++++#         return attn_output, attn_weights, past_key_value
-+++++
-++++ class Qwen2MoeAttention(nn.Module):
-++++     """
-++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++++-    and "Generating Long Sequences with Sparse Transformers".
-++++-    """
-+++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-++++ 
-+++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-+++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-+++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-+++++
-+++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-+++++    """
-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++         super().__init__()
-++++         self.config = config
-++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-++++         if layer_idx is None:
-++++             logger.warning_once(
-++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++                 "when creating this class."
-++++             )
-++++ 
-++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-++++         use_cache: bool = False,
-++++         cache_position: Optional[mindspore.Tensor] = None,
-++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-
-++++         
-++++-
-+++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-++++         bsz, q_len, _ = hidden_states.shape
-++++ 
-++++         query_states = self.q_proj(hidden_states)
-++++         key_states = self.k_proj(hidden_states)
-++++         value_states = self.v_proj(hidden_states)
-++++ 
-++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++-
-+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++        
-++++         kv_seq_len = key_states.shape[-2]
-++++         if past_key_value is not None:
-++++-            if self.layer_idx is None:
-++++-                raise ValueError(
-++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-                    "with a layer index."
-++++-                )
-++++-            if isinstance(past_key_value, StaticCache):
-++++-                kv_seq_len = key_states.shape[-2]
-++++-            else:
-++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++        
-++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++ 
-++++         if past_key_value is not None:
-++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++
-+++++        # --- 2. 动态调度核心注意力计算 ---
-+++++        global Long_Prompt
-+++++        if Long_Prompt >= 1:
-+++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-+++++            fa_attention_mask = None
-+++++            if attention_mask is not None:
-+++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++                fa_attention_mask = (mask_slice != 0)
-+++++
-+++++            attn_output = mindspore.ops.flash_attention_score(
-+++++                query=query_states,
-+++++                key=key_states,
-+++++                value=value_states,
-+++++                head_num=self.num_heads,
-+++++                attn_mask=fa_attention_mask,
-+++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-+++++                scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++                input_layout="BNSD",
-+++++                sparse_mode=0,
-+++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-+++++            )
-++++             
-++++-            if isinstance(past_key_value, StaticCache):
-++++-                kv_seq_len = key_states.shape[-2]
-+++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++            attn_output = self.o_proj(attn_output)
-+++++            attn_weights = None
-+++++            if output_attentions:
-+++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++++ 
-++++-        # repeat k/v heads if n_kv_heads < n_heads
-++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++-        
-++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++        else:
-+++++            # --- Eager Attention 路径 (用于短序列和解码) ---
-+++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++            
-+++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++ 
-++++-        if attention_mask is not None:
-++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++-            attn_weights = attn_weights + causal_mask
-+++++            if attention_mask is not None:
-+++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++                attn_weights = attn_weights + causal_mask
-++++ 
-++++-        # upcast attention to fp32
-++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++-        attn_output = ops.matmul(attn_weights, value_states)
-+++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++++            attn_output = ops.matmul(attn_weights, value_states)
-++++ 
-++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++-            raise ValueError(
-++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++++-                f" {attn_output.shape}"
-++++-            )
-+++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++++                raise ValueError(
-+++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-+++++                )
-++++ 
-++++-        attn_output = ops.transpose(attn_output, 1, 2)
-++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++            attn_output = ops.transpose(attn_output, 1, 2)
-+++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++            attn_output = self.o_proj(attn_output)
-++++ 
-++++-        attn_output = self.o_proj(attn_output)
-++++-        # @lwx
-+++++            if not output_attentions:
-+++++                attn_weights = None
-++++         
-++++-        # max_seq_len = self.max_position_embeddings  # 2048
-++++-
-++++-        # if attention_mask is not None:
-++++-        #     # attention_mask: [B, 1, Sq, Sk]
-++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++-
-++++-        #     # pad 到 [max_seq_len, max_seq_len]
-++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++-        #     global_attention_mask = padded_mask
-++++-        # else:
-++++-        #     global_attention_mask = None
-++++-
-++++-
-++++-        # sparse_mode=3
-++++-        # attn_output = mindspore.ops.flash_attention_score(
-++++-        #     query=query_states,
-++++-        #     key=key_states,
-++++-        #     value=value_states,
-++++-        #     real_shift=None,
-++++-        #     padding_mask=None,
-++++-
-++++-        #     head_num=self.num_heads,
-++++-        #     attn_mask=global_attention_mask,
-++++-        #     keep_prob=1.0 - self.attention_dropout,
-++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-        #     input_layout="BNSD",
-++++-        #     pre_tokens=2147483647,
-++++-        #     next_tokens=2147483647,
-++++-        #     inner_precise=0,
-++++-        #     drop_mask=None,
-++++-        #     prefix=None,
-++++-        #     actual_seq_qlen=None,
-++++-        #     actual_seq_kvlen=None,
-++++-        #     sparse_mode=sparse_mode,
-++++-        # )
-++++-        if not output_attentions:
-++++-            attn_weights = None
-++++-
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++-
-++++ # class Qwen2MoeFlashAttention(nn.Module):
-++++ #     """
-++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-++++ #             return final_hidden_states, router_logits
-++++ 
-++++ 
-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-#     """
-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++++-#     """
-++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++-#         super().__init__()
-++++-#         self.num_experts = config.num_experts
-++++-#         self.top_k = config.num_experts_per_tok
-++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++-
-++++-#         # 门控网络
-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++-#         # 专家列表
-++++-#         self.experts = nn.ModuleList(
-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++-#         )
-++++-#         # 共享专家
-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_decode(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         """
-++++-#         【解码路径】针对 sequence_length=1 的极致优化。
-++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++++-#         """
-++++-#         batch_size, hidden_dim = hidden_states.shape
-++++-        
-++++-#         expert_outputs_list = [
-++++-#             ops.cat([
-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++-#             ], dim=0) 
-++++-#             for i in range(batch_size)
-++++-#         ]
-++++-        
-++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++++-#         # shape: (batch_size, top_k, hidden_dim)
-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++-        
-++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++-        
-++++-#         return moe_output.squeeze(1)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_prefill(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         """
-++++-#         【预填充路径】针对 sequence_length > 1 的优化。
-++++-#         按专家对 Token 进行分组，并进行批处理。
-++++-#         """
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens = hidden_states.shape[0]
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-        
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-        
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-        
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-            
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-#             selected_token_indices = token_indices[mask]
-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++-            
-++++-#             current_states = hidden_states[selected_token_indices]
-++++-            
-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-            
-++++-#             moe_output = moe_output.index_add(
-++++-#                 dim=0,
-++++-#                 index=selected_token_indices,
-++++-#                 source=expert_output.to(hidden_states.dtype)
-++++-#             )
-++++-#         return moe_output
-++++-
-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-#         """
-++++-#         顶层 forward 方法，作为智能分发器。
-++++-#         """
-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        
-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-
-++++-#         if self.norm_topk_prob:
-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        
-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++-        
-++++-#         moe_output = None
-++++-#         # 在推理时，根据序列长度选择最优路径
-++++-#         if not self.training:
-++++-#             if sequence_length == 1:
-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++-#             else:
-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++-#         else:
-++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++++-#             raise NotImplementedError("Training path is not implemented.")
-++++-
-++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++++-        
-++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++++-        
-++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++++-        
-++++-#         return final_hidden_states, router_logits
-++++-
-++++-
-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-#     """
-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++++-#     """
-++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++-#         super().__init__()
-++++-#         self.num_experts = config.num_experts
-++++-#         self.top_k = config.num_experts_per_tok
-++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++-
-++++-#         # 门控网络
-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++-#         # 专家列表
-++++-#         self.experts = nn.ModuleList(
-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++-#         )
-++++-#         # 共享专家
-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_decode(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         batch_size, _ = hidden_states.shape
-++++-#         expert_outputs_list = [
-++++-#             ops.cat([
-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++-#             ], dim=0) 
-++++-#             for i in range(batch_size)
-++++-#         ]
-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++-#         return moe_output.squeeze(1)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_prefill(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens = hidden_states.shape[0]
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-        
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-#             selected_token_indices = token_indices[mask]
-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++-#             current_states = hidden_states[selected_token_indices]
-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-#             moe_output = moe_output.index_add(
-++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++-#             )
-++++-#         return moe_output
-++++-
-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-#         """
-++++-#         顶层 forward 方法，作为智能分发器。
-++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++++-#         """
-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        
-++++-#         # 1. 门控计算 (通用逻辑)
-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-
-++++-#         if self.norm_topk_prob:
-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        
-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++-        
-++++-#         # 2. 智能分发到最优 MoE 路径
-++++-#         moe_output = None
-++++-#         if not self.training:
-++++-#             if sequence_length == 1:
-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++-#             else:
-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++-#         else:
-++++-#             raise NotImplementedError("Training path is not implemented.")
-++++-
-++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++-        
-++++-#         # 4. 合并 MoE 输出和共享专家输出
-++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++-        
-++++-#         # 5. 恢复原始形状并返回
-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++-        
-++++-#         return final_hidden_states, router_logits
-++++-
-++++-# prefill fastest
-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-#     """
-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++++-#     """
-++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++-#         super().__init__()
-++++-#         self.num_experts = config.num_experts
-++++-#         self.top_k = config.num_experts_per_tok
-++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++-
-++++-#         # 门控网络
-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++-#         # 专家列表
-++++-#         self.experts = nn.ModuleList(
-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++-#         )
-++++-#         # 共享专家
-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_dispatch(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         """
-++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++++-#         """
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens, _ = hidden_states.shape
-++++-        
-++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-#         flat_routing_weights = routing_weights.flatten()
-++++-
-++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-
-++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-        
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-            
-++++-#             # 找到所有分配给该专家的 token
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-            
-++++-#             # 使用 mask 选取对应的 token 和权重
-++++-#             current_token_indices = token_indices[mask]
-++++-#             current_routing_weights = flat_routing_weights[mask]
-++++-#             current_hidden_states = hidden_states[current_token_indices]
-++++-            
-++++-#             # 对这些 token 进行批处理
-++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++-            
-++++-#             # 使用 index_add 将结果精确地加回到对应位置
-++++-#             moe_output = moe_output.index_add(
-++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++++-#             )
-++++-#         return moe_output
-++++-
-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-#         """
-++++-#         顶层 forward 方法，作为智能分发器。
-++++-#         """
-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        
-++++-#         # 1. 门控计算
-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-
-++++-#         if self.norm_topk_prob:
-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        
-++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++-        
-++++-#         # 2. 调用统一的 MoE 计算内核
-++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++++-
-++++-#         # 3. 统一处理共享专家
-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++-        
-++++-#         # 4. 合并输出
-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++-        
-++++-#         # 5. 恢复原始形状并返回
-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++-        
-++++-#         return final_hidden_states, router_logits
-++++-
-++++-
-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-#     """
-++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++-#     【最终高性能与高精度版】：
-++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++++-#     3. 这样实现了速度和准确性的两全其美。
-++++-#     """
-++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++-#         super().__init__()
-++++-#         self.num_experts = config.num_experts
-++++-#         self.top_k = config.num_experts_per_tok
-++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++-
-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++-#         self.experts = nn.ModuleList(
-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++-#         )
-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_decode(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         """
-++++-#         【解码路径】极致优化版：bmm + 高精度累加。
-++++-#         """
-++++-#         original_dtype = hidden_states.dtype
-++++-#         batch_size, _ = hidden_states.shape
-++++-        
-++++-#         expert_outputs_list = [
-++++-#             ops.cat([
-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++-#             ], dim=0) 
-++++-#             for i in range(batch_size)
-++++-#         ]
-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++-
-++++-#         # 在 float32 下执行 bmm，得到高精度结果
-++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++-        
-++++-#         # 将高精度结果转换回原始数据类型
-++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++++-        
-++++-#         return moe_output
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_prefill(
-++++-#         self, 
-++++-#         hidden_states: mindspore.Tensor, 
-++++-#         selected_experts: mindspore.Tensor, 
-++++-#         routing_weights: mindspore.Tensor
-++++-#     ) -> mindspore.Tensor:
-++++-#         """
-++++-#         【预填充路径】与原始实现一致，结果精确。
-++++-#         """
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens, _ = hidden_states.shape
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-        
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-#             selected_token_indices = token_indices[mask]
-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++-#             current_states = hidden_states[selected_token_indices]
-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-#             moe_output = moe_output.index_add(
-++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++-#             )
-++++-#         return moe_output
-++++-
-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-        
-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-
-++++-#         if self.norm_topk_prob:
-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        
-++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++++-#         # 如果模型主体是 float16，后续再转换
-++++-        
-++++-#         moe_output = None
-++++-#         if not self.training:
-++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++++-#             # _moe_infer_decode 内部会处理好类型转换
-++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++++-#             if sequence_length == 1:
-++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++-#             else:
-++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++-#         else:
-++++-#             raise NotImplementedError("Training path is not implemented.")
-++++-
-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++-        
-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++-        
-++++-#         return final_hidden_states, router_logits
-++++-    
-++++-
-++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-#     """
-++++-#     【融合版】一个混合专家模块，内置两种推理策略，
-++++-#     由外部全局变量 `Long_Prompt` 控制：
-++++-
-++++-#     - if Long_Prompt is True:  【精度优先模式】
-++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++++-#       适用于处理长序列，避免误差累积。
-++++-
-++++-#     - if Long_Prompt is False: 【速度优先模式】
-++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
-++++-#     """
-++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++-#         super().__init__()
-++++-#         self.num_experts = config.num_experts
-++++-#         self.top_k = config.num_experts_per_tok
-++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++-
-++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++-#         self.experts = nn.ModuleList(
-++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++-#         )
-++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++-
-++++-#     # --- 速度优先模式的辅助函数 ---
-++++-#     @no_grad()
-++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++-#         original_dtype = hidden_states.dtype
-++++-#         batch_size, _ = hidden_states.shape
-++++-#         expert_outputs_list = [
-++++-#             ops.cat([
-++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++-#             ], dim=0) 
-++++-#             for i in range(batch_size)
-++++-#         ]
-++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
-++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++-
-++++-#     @no_grad()
-++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens, _ = hidden_states.shape
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-#             selected_token_indices = token_indices[mask]
-++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++-#             current_states = hidden_states[selected_token_indices]
-++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++++-#         return moe_output
-++++-
-++++-#     # --- 精度优先模式的辅助函数 ---
-++++-#     @no_grad()
-++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++-#         moe_output = ops.zeros_like(hidden_states)
-++++-#         num_tokens, _ = hidden_states.shape
-++++-#         flat_selected_experts = selected_experts.flatten()
-++++-#         flat_routing_weights = routing_weights.flatten()
-++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++-#         active_experts = ops.unique(flat_selected_experts)
-++++-#         for expert_idx_tensor in active_experts:
-++++-#             expert_idx = expert_idx_tensor.item()
-++++-#             expert_layer = self.experts[expert_idx]
-++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++-#             current_token_indices = token_indices[mask]
-++++-#             current_routing_weights = flat_routing_weights[mask]
-++++-#             current_hidden_states = hidden_states[current_token_indices]
-++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++-#         return moe_output
-++++-
-++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
-++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++++-#         global Long_Prompt
-++++-
-++++-#         # 1. 门控计算 (所有模式通用)
-++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++-#         if self.norm_topk_prob:
-++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-        
-++++-#         moe_output = None
-++++-#         if not self.training:
-++++-#             # 根据 Long_Prompt 标志选择模式
-++++-#             if Long_Prompt:
-++++-#                 # --- 精度优先模式 ---
-++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++-#             else:
-++++-#                 # --- 速度优先模式 ---
-++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++-#                 if sequence_length == 1:
-++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++-#                 else:
-++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++-#         else:
-++++-#             raise NotImplementedError("Training path is not implemented.")
-++++-
-++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++-        
-++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++-        
-++++-#         return final_hidden_states, router_logits
-++++-    
-++++ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++     """
-++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++ 
-+++++    # @no_grad()
-+++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++    #     num_tokens, _ = hidden_states.shape
-+++++    #     flat_selected_experts = selected_experts.flatten()
-+++++    #     sorted_expert_indices = flat_selected_experts.argsort()
-+++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++++    #     original_token_indices = sorted_expert_indices // self.top_k
-+++++    #     moe_output = ops.zeros_like(hidden_states)
-+++++    #     current_token_offset = 0
-+++++    #     for i in range(self.num_experts):
-+++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-+++++    #         if expert_token_count == 0:
-+++++    #             continue
-+++++    #         end_offset = current_token_offset + expert_token_count
-+++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-+++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++    #         current_token_offset += expert_token_count
-+++++    #     return moe_output
-+++++
-++++     @no_grad()
-++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++-        num_tokens, _ = hidden_states.shape
-++++-        flat_selected_experts = selected_experts.flatten()
-++++-        sorted_expert_indices = flat_selected_experts.argsort()
-++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++-        original_token_indices = sorted_expert_indices // self.top_k
-+++++        """
-+++++        优化版 MoE prefill (速度优先模式)：
-+++++        - 批量张量化处理同一个 expert 的所有 token
-+++++        - 跳过无 token 的专家
-+++++        - 保持结果完全一致
-+++++        """
-++++         moe_output = ops.zeros_like(hidden_states)
-++++-        current_token_offset = 0
-++++-        for i in range(self.num_experts):
-++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
-++++-            if expert_token_count == 0:
-++++-                continue
-++++-            end_offset = current_token_offset + expert_token_count
-++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
-++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++-            current_token_offset += expert_token_count
-+++++
-+++++        flat_selected_experts = selected_experts.flatten()
-+++++        flat_routing_weights = routing_weights.flatten()
-+++++
-+++++        idxs = flat_selected_experts.argsort()
-+++++        sorted_expert_indices = flat_selected_experts[idxs]
-+++++        sorted_token_indices = idxs // self.top_k
-+++++
-+++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-+++++
-+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+++++
-+++++        for expert_id in active_experts.tolist():
-+++++            start = int(tokens_per_expert[:expert_id].sum().item())
-+++++            end = start + int(tokens_per_expert[expert_id].item())
-+++++
-+++++            token_idx = sorted_token_indices[start:end]
-+++++            expert_tokens = hidden_states[token_idx]
-+++++
-+++++            expert_out = self.experts[expert_id](expert_tokens)
-+++++
-+++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-+++++
-+++++            moe_output = mindspore.mint.scatter_add(
-+++++                moe_output,
-+++++                0,
-+++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-+++++                scaled_out.to(hidden_states.dtype)
-+++++            )
-+++++
-++++         return moe_output
-++++ 
-+++++
-++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++++     @no_grad()
-++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++         
-++++         moe_output = None
-++++-        if Long_Prompt:
-++++-            # --- 精度优先模式 (ACCURACY MODE) ---
-++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        # if Long_Prompt==0:
-+++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        # else:
-+++++        #     # --- 速度优先模式 (SPEED MODE) ---
-+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++        #     if sequence_length == 1:
-+++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        #     else:
-+++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++        
-+++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++        if sequence_length == 1:
-+++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++         else:
-++++-            # --- 速度优先模式 (SPEED MODE) ---
-++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++-            if sequence_length == 1:
-++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++-            else:
-++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++-        
-+++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++    
-++++ 
-++++         # 3. 共享专家计算与合并 (所有模式通用)
-++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++         
-++++         return final_hidden_states, router_logits
-++++ 
-+++++
-++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++++         super().__init__()
-++++         self.hidden_size = config.hidden_size
-++++         
-++++-        # if Long_Prompt:
-++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++-        # else:
-+++++        # if Long_Prompt == 2:
-++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++        # else:
-+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++ 
-++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++ 
-++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++             )
-++++ 
-++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++        #     attention_mask,
-+++++        #     sequence_length=sequence_length,
-+++++        #     target_length=target_length,
-+++++        #     dtype=dtype,
-+++++        #     min_dtype=min_dtype,
-+++++        #     cache_position=cache_position,
-+++++        #     batch_size=input_tensor.shape[0],
-+++++        # )
-+++++        #@dwj
-+++++        causal_mask = get_cached_causal_mask_with_cache_position(
-++++             attention_mask,
-++++             sequence_length=sequence_length,
-++++             target_length=target_length,
-++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++++         """
-++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-+++++        _causal_mask_cache.clear()
-++++ 
-++++         input_ids = kwargs.get("input_ids")
-++++         if input_ids is None and args:
-++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++ 
-++++         if input_ids is not None:
-++++             prompt_length = input_ids.shape[1]
-++++-            
-++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++++-                Long_Prompt = True
-+++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-+++++                Long_Prompt = 2
-+++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-+++++                Long_Prompt = 0
-++++             else:
-++++-                Long_Prompt = False
-+++++                Long_Prompt = 1
-+++++
-++++ 
-++++         return super().generate(*args, **kwargs)
-++++     
-++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++             dtype = self.lm_head.weight.dtype
-++++             min_dtype = float(ops.finfo(dtype).min)
-++++ 
-++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++            #     attention_mask,
-+++++            #     sequence_length=sequence_length,
-+++++            #     target_length=past_key_values.get_max_length(),
-+++++            #     dtype=dtype,
-+++++            #     min_dtype=min_dtype,
-+++++            #     cache_position=cache_position,
-+++++            #     batch_size=batch_size,
-+++++            # )
-+++++
-+++++            #@dwj
-+++++            attention_mask = get_cached_causal_mask_with_cache_position(
-++++                 attention_mask,
-++++                 sequence_length=sequence_length,
-++++                 target_length=past_key_values.get_max_length(),
-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++deleted file mode 100644
-++++index 6dfb5b93..00000000
-++++--- a/patches/0001-20251104commit.patch
-+++++++ /dev/null
-++++@@ -1,1272 +0,0 @@
-++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++-From: Pinoeer-kingxi <13022943007@163.com>
-++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++-Subject: [PATCH] 20251104commit
-++++-
-++++----
-++++- mindnlp/transformers/cache_utils.py           |  28 +-
-++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++- 3 files changed, 976 insertions(+), 87 deletions(-)
-++++-
-++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++-index cadd2e04..02f8d4be 100644
-++++---- a/mindnlp/transformers/cache_utils.py
-++++-+++ b/mindnlp/transformers/cache_utils.py
-++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++-             # k_out[:, :, cache_position] = key_states
-++++-             # v_out[:, :, cache_position] = value_states
-++++--            if ON_ORANGE_PI:
-++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++--            else:
-++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++--
-++++-+            # if ON_ORANGE_PI:
-++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++-+            # else:
-++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
-++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++-+            if cache_position.ndim > 1:
-++++-+                cache_position = cache_position.flatten()
-++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++-+                cache_position = cache_position.int()
-++++-+            
-++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++-+            k_out[:, :, cache_position] = key_states
-++++-+            v_out[:, :, cache_position] = value_states
-++++-+            
-++++-         return k_out, v_out
-++++- 
-++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++-index c695b944..d8303e45 100644
-++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++- def rotate_half(x):
-++++-     """Rotates half the hidden dims of the input."""
-++++--    x1 = x[..., : x.shape[-1] // 2]
-++++--    x2 = x[..., x.shape[-1] // 2 :]
-++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++-+    # x1 = x[..., : x.shape[-1] // 2]
-++++-+    # x2 = x[..., x.shape[-1] // 2 :]
-++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++-     return ops.cat((-x2, x1), dim=-1)
-++++- 
-++++- 
-++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++-         if self.training:
-++++-             raise NotImplementedError("Training is not supported yet.")
-++++-         else:
-++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++--        if self.config.n_shared_experts is not None:
-++++--            y = y + self.shared_experts(identity)
-++++--        return y
-++++-+            # @lwx
-++++-+            if orig_shape[1] == 1:
-++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++-+                y=y.view(*orig_shape)
-++++-+                if self.config.n_shared_experts is not None:
-++++-+                    y = y + self.shared_experts(identity)
-++++-+                return y
-++++-+            else:
-++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++-+                if self.config.n_shared_experts is not None:
-++++-+                    y = y + self.shared_experts(identity)
-++++-+                return y
-++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++-+        # if self.config.n_shared_experts is not None:
-++++-+        #     y = y + self.shared_experts(identity)
-++++-+        # return y
-++++-+        
-++++-+    @no_grad()
-++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++-+
-++++-+        expert_cache = ops.zeros_like(x)
-++++-+        for i in range(self.num_experts_per_tok):
-++++-+            expert_id = flat_expert_indices[i].item()
-++++-+            weight = flat_expert_weights[i].item()
-++++-+            expert = self.experts[expert_id]
-++++-+            expert_out = expert(x)
-++++-+            expert_cache += expert_out * weight
-++++-+        return expert_cache
-++++- 
-++++-     @no_grad()
-++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++--        # expert_cache = torch.zeros_like(x)
-++++--        # idxs = flat_expert_indices.argsort()
-++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++--        # token_idxs = idxs // self.num_experts_per_tok
-++++--        # for i, end_idx in enumerate(tokens_per_expert):
-++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++--        #     if start_idx == end_idx:
-++++--        #         continue
-++++--        #     expert = self.experts[i]
-++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++--        #     expert_tokens = x[exp_token_idx]
-++++--        #     expert_out = expert(expert_tokens)
-++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++--        # return expert_cache
-++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++-         expert_cache = ops.zeros_like(x)
-++++-         idxs = flat_expert_indices.argsort()
-++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++-         token_idxs = idxs // self.num_experts_per_tok
-++++-+
-++++-         for i, end_idx in enumerate(tokens_per_expert):
-++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++-             if start_idx == end_idx:
-++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++-             expert_out = expert(expert_tokens)
-++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++-+
-++++-         return expert_cache
-++++-+        
-++++-+    # @no_grad()
-++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-+    #     # expert_cache = torch.zeros_like(x)
-++++-+    #     # idxs = flat_expert_indices.argsort()
-++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
-++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++-+    #     #     if start_idx == end_idx:
-++++-+    #     #         continue
-++++-+    #     #     expert = self.experts[i]
-++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++-+    #     #     expert_tokens = x[exp_token_idx]
-++++-+    #     #     expert_out = expert(expert_tokens)
-++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++-+    #     # return expert_cache
-++++-+    #     expert_cache = ops.zeros_like(x)
-++++-+    #     idxs = flat_expert_indices.argsort()
-++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++++-+
-++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
-++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++-+    #         if start_idx == end_idx:
-++++-+    #             continue
-++++-+    #         expert = self.experts[i]
-++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++-+    #         expert_tokens = x[exp_token_idx]
-++++-+    #         expert_out = expert(expert_tokens)
-++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++-+
-++++-+    #     return expert_cache
-++++-+    # @no_grad()
-++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++-+    #     expert_cache = ops.zeros_like(x)
-++++-+
-++++-+    #     # 排序保证顺序一致
-++++-+    #     idxs = flat_expert_indices.argsort()
-++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++++-+
-++++-+    #     # 找出有 token 的专家
-++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++-+
-++++-+    #     for i in active_experts.tolist():
-++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++-+    #         end_idx = tokens_per_expert[i]
-++++-+    #         if start_idx == end_idx:  # 没有 token
-++++-+    #             continue
-++++-+
-++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++-+    #         expert_tokens = x[exp_token_idx]
-++++-+    #         expert_out = self.experts[i](expert_tokens)
-++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++-+
-++++-+    #         expert_cache = mindspore.mint.scatter_add(
-++++-+    #             expert_cache,
-++++-+    #             0,
-++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++-+    #             expert_out
-++++-+    #         )
-++++-+
-++++-+    #     return expert_cache
-++++-+
-++++-+
-++++- 
-++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++- #     """
-++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++- 
-++++-         # Initialize weights and apply final processing
-++++-         self.post_init()
-++++-+        self.warm_up = False
-++++-+
-++++-+    def warmup_moe_model_deep(self):
-++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++-+        test_texts = [
-++++-+            "warmup short",
-++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++-+        ]
-++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++-+        if tokenizer is None:
-++++-+            from mindnlp.transformers import AutoTokenizer
-++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++-+            self._warmup_tokenizer = tokenizer
-++++-+
-++++-+        for text in test_texts:
-++++-+            inputs = tokenizer(text, return_tensors="ms")
-++++-+            with mindspore._no_grad():
-++++-+                _ = self(**inputs, use_cache=False)
-++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++- 
-++++-     def get_input_embeddings(self):
-++++-         return self.model.embed_tokens
-++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++-         ```"""
-++++-+        if not self.warm_up:
-++++-+            self.warm_up = True
-++++-+            self.warmup_moe_model_deep()
-++++-+
-++++-         output_attentions = (
-++++-             output_attentions
-++++-             if output_attentions is not None
-++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++-index 3cbf820e..d4c6b651 100644
-++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++-@@ -18,7 +18,6 @@
-++++- # See the License for the specific language governing permissions and
-++++- # limitations under the License.
-++++- """MindSpore Qwen2MoE model."""
-++++--
-++++- import math
-++++- from typing import List, Optional, Tuple, Union
-++++- 
-++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++-     TokenClassifierOutput,
-++++- )
-++++- from ...modeling_utils import PreTrainedModel
-++++-+from ...generation import GenerationMixin
-++++- from ....utils import logging
-++++- from .configuration_qwen2_moe import Qwen2MoeConfig
-++++- 
-++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++-         self.variance_epsilon = eps
-++++- 
-++++-     def forward(self, hidden_states):
-++++-+        # @dwj
-++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++-+        # @lwx
-++++-+        # if not self.training :
-++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++-         input_dtype = hidden_states.dtype
-++++-         hidden_states = hidden_states.to(mindspore.float32)
-++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++-@@ -234,6 +239,8 @@ def rotate_half(x):
-++++-     """Rotates half the hidden dims of the input."""
-++++-     x1 = x[..., : x.shape[-1] // 2]
-++++-     x2 = x[..., x.shape[-1] // 2 :]
-++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++-     return ops.cat((-x2, x1), dim=-1)
-++++- 
-++++- 
-++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++-         self.config = config
-++++-         self.hidden_size = config.hidden_size
-++++-         self.intermediate_size = intermediate_size
-++++-+        
-++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++-         self.act_fn = ACT2FN[config.hidden_act]
-++++- 
-++++-     def forward(self, x):
-++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++--
-++++- 
-++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++-+        # @lwx 
-++++-+        # gate_up_output = self.gate_up_proj(x)
-++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++-+        # return self.down_proj(swiglu_output)
-++++-+
-++++-+    # def forward(self, x):
-++++-+    #     gate_proj_out = self.gate_proj(x)
-++++-+    #     up_proj_out = self.up_proj(x)
-++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++-+    #     return self.down_proj(swiglu_out)
-++++-+        
-++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++-     """
-++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++-         use_cache: bool = False,
-++++-         cache_position: Optional[mindspore.Tensor] = None,
-++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-+
-++++-+        
-++++-+
-++++-         bsz, q_len, _ = hidden_states.shape
-++++- 
-++++-         query_states = self.q_proj(hidden_states)
-++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-                     "with a layer index."
-++++-                 )
-++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-+            if isinstance(past_key_value, StaticCache):
-++++-+                kv_seq_len = key_states.shape[-2]
-++++-+            else:
-++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++- 
-++++-         if past_key_value is not None:
-++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++-+            
-++++-+            if isinstance(past_key_value, StaticCache):
-++++-+                kv_seq_len = key_states.shape[-2]
-++++- 
-++++-         # repeat k/v heads if n_kv_heads < n_heads
-++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++--
-++++-+        
-++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++- 
-++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++--            raise ValueError(
-++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++--                f" {attn_weights.shape}"
-++++--            )
-++++--
-++++--        if attention_mask is not None:  # no matter the length, we just slice it
-++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++-+        if attention_mask is not None:
-++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++-             attn_weights = attn_weights + causal_mask
-++++- 
-++++-         # upcast attention to fp32
-++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++- 
-++++-         attn_output = self.o_proj(attn_output)
-++++--
-++++-+        # @lwx
-++++-+        
-++++-+        # max_seq_len = self.max_position_embeddings  # 2048
-++++-+
-++++-+        # if attention_mask is not None:
-++++-+        #     # attention_mask: [B, 1, Sq, Sk]
-++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++-+
-++++-+        #     # pad 到 [max_seq_len, max_seq_len]
-++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++-+        #     global_attention_mask = padded_mask
-++++-+        # else:
-++++-+        #     global_attention_mask = None
-++++-+
-++++-+
-++++-+        # sparse_mode=3
-++++-+        # attn_output = mindspore.ops.flash_attention_score(
-++++-+        #     query=query_states,
-++++-+        #     key=key_states,
-++++-+        #     value=value_states,
-++++-+        #     real_shift=None,
-++++-+        #     padding_mask=None,
-++++-+
-++++-+        #     head_num=self.num_heads,
-++++-+        #     attn_mask=global_attention_mask,
-++++-+        #     keep_prob=1.0 - self.attention_dropout,
-++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-+        #     input_layout="BNSD",
-++++-+        #     pre_tokens=2147483647,
-++++-+        #     next_tokens=2147483647,
-++++-+        #     inner_precise=0,
-++++-+        #     drop_mask=None,
-++++-+        #     prefix=None,
-++++-+        #     actual_seq_qlen=None,
-++++-+        #     actual_seq_kvlen=None,
-++++-+        #     sparse_mode=sparse_mode,
-++++-+        # )
-++++-         if not output_attentions:
-++++-             attn_weights = None
-++++- 
-++++-         return attn_output, attn_weights, past_key_value
-++++- 
-++++- 
-++++-+class Qwen2MoeFlashAttention(nn.Module):
-++++-+    """
-++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++-+
-++++-+    关键改动:
-++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++-+       直接传入原始的 key 和 value 张量效率更高。
-++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++-+    """
-++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++-+        super().__init__()
-++++-+        self.config = config
-++++-+        self.layer_idx = layer_idx
-++++-+        self.hidden_size = config.hidden_size
-++++-+        self.num_heads = config.num_attention_heads
-++++-+        self.head_dim = self.hidden_size // self.num_heads
-++++-+        self.num_key_value_heads = config.num_key_value_heads
-++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++-+        self.max_position_embeddings = config.max_position_embeddings
-++++-+        self.rope_theta = config.rope_theta
-++++-+        self.attention_dropout = config.attention_dropout
-++++-+
-++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++-+            raise ValueError(
-++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++-+            )
-++++-+
-++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++-+
-++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++-+            self.head_dim,
-++++-+            max_position_embeddings=self.max_position_embeddings,
-++++-+            base=self.rope_theta,
-++++-+        )
-++++-+
-++++-+    def forward(
-++++-+        self,
-++++-+        hidden_states: mindspore.Tensor,
-++++-+        attention_mask: Optional[mindspore.Tensor] = None,
-++++-+        position_ids: Optional[mindspore.Tensor] = None,
-++++-+        past_key_value: Optional[Cache] = None,
-++++-+        output_attentions: bool = False,
-++++-+        use_cache: bool = False,
-++++-+        cache_position: Optional[mindspore.Tensor] = None,
-++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-+
-++++-+        bsz, q_len, _ = hidden_states.shape
-++++-+
-++++-+        # 1. 线性投射 Q, K, V
-++++-+        query_states = self.q_proj(hidden_states)
-++++-+        key_states = self.k_proj(hidden_states)
-++++-+        value_states = self.v_proj(hidden_states)
-++++-+
-++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+
-++++-+        # 3. RoPE 旋转位置编码
-++++-+        kv_seq_len = key_states.shape[-2]
-++++-+        if past_key_value is not None:
-++++-+            if self.layer_idx is None:
-++++-+                raise ValueError(
-++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-+                    "with a layer index."
-++++-+                )
-++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++-+                if cache_position.shape[0] == 1:
-++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++-+                    kv_seq_len = past_seen_tokens + 1
-++++-+                else:
-++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
-++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++-+            else:
-++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-+
-++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-+
-++++-+        # 4. KV 缓存更新
-++++-+        if past_key_value is not None:
-++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-+            key_states, value_states = past_key_value.update(
-++++-+                key_states, value_states, self.layer_idx, cache_kwargs
-++++-+            )
-++++-+            
-++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++-+                if cache_position.shape[0] == 1:
-++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++-+                    kv_seq_len = key_states.shape[-2]
-++++-+
-++++-+        # 5. [重要] 准备 Attention Mask
-++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++-+        fa_attention_mask = None
-++++-+        if attention_mask is not None:
-++++-+            # 截取与当前key长度匹配的部分
-++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++-+            fa_attention_mask = (mask_slice != 0)
-++++-+
-++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++-+        input_dtype = query_states.dtype
-++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++-+            query_states = query_states.to(mindspore.float16)
-++++-+            key_states = key_states.to(mindspore.float16)
-++++-+            value_states = value_states.to(mindspore.float16)
-++++-+
-++++-+        # 6. [核心] 调用 flash_attention_score 算子
-++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++-+        attn_output = mindspore.ops.flash_attention_score(
-++++-+            query=query_states,
-++++-+            key=key_states,
-++++-+            value=value_states,
-++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
-++++-+            attn_mask=fa_attention_mask,
-++++-+            keep_prob=1.0 - self.attention_dropout,
-++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-+            input_layout="BNSD",
-++++-+            sparse_mode=0 # 使用 defaultMask 模式
-++++-+        )
-++++-+
-++++-+        # 恢复原始数据类型
-++++-+        attn_output = attn_output.to(input_dtype)
-++++-+
-++++-+        # 7. 调整输出形状
-++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-+        attn_output = self.o_proj(attn_output)
-++++-+
-++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
-++++-+        attn_weights = None
-++++-+        if output_attentions:
-++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++-+
-++++-+        return attn_output, attn_weights, past_key_value
-++++-+
-++++-+    # def forward(
-++++-+    #     self,
-++++-+    #     hidden_states: mindspore.Tensor,
-++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++++-+    #     past_key_value: Optional[Cache] = None,
-++++-+    #     output_attentions: bool = False,
-++++-+    #     use_cache: bool = False,
-++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-+
-++++-+    #     bsz, q_len, _ = hidden_states.shape
-++++-+
-++++-+    #     # 1. 线性投射 Q, K, V
-++++-+    #     query_states = self.q_proj(hidden_states)
-++++-+    #     key_states = self.k_proj(hidden_states)
-++++-+    #     value_states = self.v_proj(hidden_states)
-++++-+
-++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+
-++++-+    #     # 3. RoPE 旋转位置编码
-++++-+    #     kv_seq_len = key_states.shape[-2]
-++++-+    #     if past_key_value is not None:
-++++-+    #         if self.layer_idx is None:
-++++-+    #             raise ValueError(
-++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-+    #                 "with a layer index."
-++++-+    #             )
-++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-+
-++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-+
-++++-+    #     # 4. KV 缓存更新
-++++-+    #     if past_key_value is not None:
-++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-+    #         key_states, value_states = past_key_value.update(
-++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++-+    #         )
-++++-+
-++++-+    #     # 5. 准备 Attention Mask
-++++-+    #     fa_attention_mask = None
-++++-+    #     if attention_mask is not None:
-++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-+    #         fa_attention_mask = (mask_slice != 0)
-++++-+
-++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++-+    #     input_dtype = query_states.dtype
-++++-+
-++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
-++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++++-+    #         query=query_states,
-++++-+    #         key=key_states,
-++++-+    #         value=value_states,
-++++-+    #         head_num=self.num_heads,
-++++-+    #         attn_mask=fa_attention_mask,
-++++-+    #         keep_prob=1.0 - self.attention_dropout,
-++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-+    #         input_layout="BNSD",
-++++-+    #         sparse_mode=0,
-++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++-+    #         inner_precise=1
-++++-+    #     )
-++++-+
-++++-+    #     # 恢复原始数据类型
-++++-+    #     attn_output = attn_output.to(input_dtype)
-++++-+
-++++-+    #     # 7. 调整输出形状
-++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-+    #     attn_output = self.o_proj(attn_output)
-++++-+
-++++-+    #     attn_weights = None
-++++-+    #     if output_attentions:
-++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++-+
-++++-+    #     return attn_output, attn_weights, past_key_value
-++++-+
-++++-+    # def forward(
-++++-+    #     self,
-++++-+    #     hidden_states: mindspore.Tensor,
-++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++++-+    #     past_key_value: Optional[Cache] = None,
-++++-+    #     output_attentions: bool = False,
-++++-+    #     use_cache: bool = False,
-++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-+
-++++-+    #     bsz, q_len, _ = hidden_states.shape
-++++-+
-++++-+    #     query_states = self.q_proj(hidden_states)
-++++-+    #     key_states = self.k_proj(hidden_states)
-++++-+    #     value_states = self.v_proj(hidden_states)
-++++-+
-++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-+
-++++-+    #     kv_seq_len = key_states.shape[-2]
-++++-+    #     if past_key_value is not None:
-++++-+    #         if self.layer_idx is None:
-++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
-++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-+
-++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-+
-++++-+    #     if past_key_value is not None:
-++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++-+    #         key_states, value_states = past_key_value.update(
-++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++-+    #         )
-++++-+
-++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++-+
-++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
-++++-+    #     # <--- 修改结束 ---
-++++-+
-++++-+    #     fa_attention_mask = None
-++++-+    #     if attention_mask is not None:
-++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++-+    #         fa_attention_mask = (mask_slice != 0)
-++++-+
-++++-+    #     input_dtype = query_states.dtype
-++++-+
-++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
-++++-+    #         key=key_states,
-++++-+    #         value=value_states,
-++++-+    #         head_num=self.num_heads,
-++++-+    #         attn_mask=fa_attention_mask,
-++++-+    #         keep_prob=1.0 - self.attention_dropout,
-++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++-+    #         input_layout="BNSD",
-++++-+    #         sparse_mode=0,
-++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
-++++-+    #     )
-++++-+
-++++-+    #     attn_output = attn_output.to(input_dtype)
-++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-+    #     attn_output = self.o_proj(attn_output)
-++++-+
-++++-+    #     attn_weights = None
-++++-+    #     if output_attentions:
-++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++-+
-++++-+    #     return attn_output, attn_weights, past_key_value
-++++-+
-++++- QWEN2MOE_ATTENTION_CLASSES = {
-++++-     "eager": Qwen2MoeAttention,
-++++-+    "flash-attention": Qwen2MoeFlashAttention,
-++++- }
-++++- 
-++++- 
-++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++- 
-++++-+    #@dwj
-++++-+    # 只遍历激活的专家，而非全部专家
-++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++--        hidden_states = hidden_states.view(-1, hidden_dim)
-++++--        # router_logits: (batch * sequence_length, n_experts)
-++++--        router_logits = self.gate(hidden_states)
-++++--
-++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++--        if self.norm_topk_prob:
-++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++--        # we cast back to the input dtype
-++++--        routing_weights = routing_weights.to(hidden_states.dtype)
-++++--
-++++--        final_hidden_states = ops.zeros(
-++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++--        )
-++++--
-++++--        # One hot encode the selected experts to create an expert mask
-++++--        # this will be used to easily index which expert is going to be sollicitated
-++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++--
-++++--        # Loop over all available experts in the model and perform the computation on each expert
-++++--        for expert_idx in range(self.num_experts):
-++++--            expert_layer = self.experts[expert_idx]
-++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++--
-++++--            # Index the correct hidden states and compute the expert hidden state for
-++++--            # the current expert. We need to make sure to multiply the output hidden
-++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++--            if 0 not in idx.shape:
-++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++--
-++++--                # However `index_add_` only support torch tensors for indexing so we'll use
-++++--                # the `top_x` tensor here.
-++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++--
-++++--        shared_expert_output = self.shared_expert(hidden_states)
-++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++--
-++++--        final_hidden_states = final_hidden_states + shared_expert_output
-++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++-+            num_tokens = hidden_states_reshaped.shape[0]
-++++-+
-++++-+            router_logits = self.gate(hidden_states_reshaped)
-++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++-+
-++++-+            if self.norm_topk_prob:
-++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
-++++-+            
-++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++-+            flat_selected_experts = selected_experts.flatten()
-++++-+            
-++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++-+            token_indices = broadcasted_token_indices.flatten()
-++++-+            
-++++-+            active_experts = ops.unique(flat_selected_experts)
-++++-+            
-++++-+            for expert_idx_tensor in active_experts:
-++++-+                expert_idx = expert_idx_tensor.item()
-++++-+                expert_layer = self.experts[expert_idx]
-++++-+                
-++++-+                mask = (flat_selected_experts == expert_idx_tensor)
-++++-+                selected_token_indices = token_indices[mask]
-++++-+                selected_routing_weights = routing_weights.flatten()[mask]
-++++-+                
-++++-+                current_states = hidden_states_reshaped[selected_token_indices]
-++++-+                
-++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++-+                
-++++-+                final_hidden_states = final_hidden_states.index_add(
-++++-+                    dim=0,
-++++-+                    index=selected_token_indices,
-++++-+                    source=expert_output.to(hidden_states.dtype)
-++++-+                )
-++++-+            
-++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++- 
-++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++--        return final_hidden_states, router_logits
-++++-+            final_hidden_states = final_hidden_states + shared_expert_output
-++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++-+            
-++++-+            return final_hidden_states, router_logits
-++++- 
-++++- 
-++++- class Qwen2MoeDecoderLayer(nn.Module):
-++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++- 
-++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++- 
-++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++-+
-++++-         if (layer_idx not in config.mlp_only_layers) and (
-++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++-         ):
-++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++-     _skip_keys_device_placement = "past_key_values"
-++++-     _supports_cache_class = True
-++++-+#lwx
-++++-+    # _supports_static_cache = True
-++++- 
-++++-     def _init_weights(self, module):
-++++-         std = self.config.initializer_range
-++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++-         return causal_mask
-++++- 
-++++- 
-++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++-     _tied_weights_keys = ["lm_head.weight"]
-++++- 
-++++-     def __init__(self, config):
-++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++-         self.num_experts_per_tok = config.num_experts_per_tok
-++++-         # Initialize weights and apply final processing
-++++-         self.post_init()
-++++-+        # @lwx
-++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++-+        #     self.generation_config.cache_implementation = "static"
-++++-+        self._warmed_up = False
-++++-+
-++++-+    def warmup_moe_model(self):
-++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++-+        test_texts = [
-++++-+            "warmup short",
-++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++-+        ]
-++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++-+        if tokenizer is None:
-++++-+            from mindnlp.transformers import AutoTokenizer
-++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++-+            self._warmup_tokenizer = tokenizer
-++++-+
-++++-+        for text in test_texts:
-++++-+            inputs = tokenizer(text, return_tensors="ms")
-++++-+            with mindspore._no_grad():
-++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++- 
-++++-     def get_input_embeddings(self):
-++++-         return self.model.embed_tokens
-++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++-         ```"""
-++++-+        if not self._warmed_up:
-++++-+            self._warmed_up = True
-++++-+            self.warmup_moe_model()
-++++- 
-++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++-         output_router_logits = (
-++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++-             }
-++++-         )
-++++-         return model_inputs
-++++-+# @lwx
-++++-+    # def _decode_one_tokens_logits(
-++++-+    #     self,
-++++-+    #     cur_token: mindspore.Tensor,
-++++-+    #     input_pos: Optional[mindspore.Tensor],
-++++-+    #     cache_position: mindspore.Tensor,
-++++-+    #     past_key_values: StaticCache,
-++++-+    # ) -> mindspore.Tensor:
-++++-+    #     """
-++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++-+        
-++++-+    #     Args:
-++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++-+    #         input_pos: 输入位置信息，可选
-++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++-+            
-++++-+    #     Returns:
-++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++-+    #     """
-++++-+    #     # 调用JIT编译的版本
-++++-+    #     return self.get_decode_one_tokens_logits(
-++++-+    #         cur_token=cur_token,
-++++-+    #         input_pos=input_pos,
-++++-+    #         cache_position=cache_position,
-++++-+    #         past_key_values=past_key_values,
-++++-+    #     )
-++++-+    
-++++-+    # @mindspore.jit(jit_level='O1')
-++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++-+    #     """
-++++-+    #     JIT编译的函数，用于高效的单token解码
-++++-+    #     使用JIT编译优化以支持静态shape和高效执行
-++++-+        
-++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++-+    #     """
-++++-+    #     outputs = self.model.forward(
-++++-+    #         input_ids=cur_token,
-++++-+    #         position_ids=input_pos,
-++++-+    #         cache_position=cache_position,
-++++-+    #         past_key_values=past_key_values,
-++++-+    #         use_cache=True,
-++++-+    #         return_dict=False,
-++++-+    #     )
-++++-+        
-++++-+    #     hidden_states = outputs[0]
-++++-+    #     logits = self.lm_head.forward(hidden_states)
-++++-+    #     logits = logits.float()
-++++-+        
-++++-+    #     return logits[:, -1, :]
-++++-+
-++++-+    # def _sample(
-++++-+    #     self,
-++++-+    #     input_ids: mindspore.Tensor,
-++++-+    #     logits_processor,
-++++-+    #     stopping_criteria,
-++++-+    #     generation_config,
-++++-+    #     synced_devices: bool,
-++++-+    #     streamer=None,
-++++-+    #     logits_warper=None,
-++++-+    #     **model_kwargs,
-++++-+    # ):
-++++-+    #     """
-++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++-+    #     """
-++++-+    #     from ...generation.logits_process import LogitsProcessorList
-++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++-+    #     from mindnlp.core import nn, ops, no_grad
-++++-+    #     import numpy as np
-++++-+        
-++++-+    #     # 检查是否使用 StaticCache
-++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++-+    #     # 否则，直接调用父类方法
-++++-+    #     past_key_values = model_kwargs.get("past_key_values")
-++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++-+
-++++-+    #     if not isinstance(past_key_values, StaticCache):
-++++-+    #         # 不使用 StaticCache，直接调用父类方法
-++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++-+    #         return super()._sample(
-++++-+    #             input_ids=input_ids,
-++++-+    #             logits_processor=logits_processor,
-++++-+    #             stopping_criteria=stopping_criteria,
-++++-+    #             generation_config=generation_config,
-++++-+    #             synced_devices=synced_devices,
-++++-+    #             streamer=streamer,
-++++-+    #             logits_warper=logits_warper,
-++++-+    #             **model_kwargs,
-++++-+    #         )
-++++-+        
-++++-+    #     # 使用 StaticCache，进入自定义循环
-++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++-+    #     pad_token_id = generation_config._pad_token_tensor
-++++-+    #     output_attentions = generation_config.output_attentions
-++++-+    #     output_hidden_states = generation_config.output_hidden_states
-++++-+    #     output_scores = generation_config.output_scores
-++++-+    #     output_logits = generation_config.output_logits
-++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++-+    #     max_length = generation_config.max_length
-++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++-+    #     do_sample = generation_config.do_sample
-++++-+        
-++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++-+    #         raise ValueError(
-++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++-+    #             f"{logits_warper})."
-++++-+    #         )
-++++-+        
-++++-+    #     # init attention / hidden states / scores tuples
-++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++-+        
-++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++-+    #         encoder_hidden_states = (
-++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++-+    #         )
-++++-+        
-++++-+    #     # keep track of which sequences are already finished
-++++-+    #     batch_size, cur_len = input_ids.shape
-++++-+    #     this_peer_finished = False
-++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++-+        
-++++-+    #     time_record = []
-++++-+    #     from ....utils.testing_utils import parse_flag_from_env
-++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++-+        
-++++-+    #     while self._has_unfinished_sequences(
-++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++-+    #     ):
-++++-+    #         if _record_time:
-++++-+    #             import time as time_module
-++++-+    #             infer_start = time_module.time()
-++++-+            
-++++-+    #         # prepare model inputs
-++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++-+            
-++++-+    #         # prepare variable output controls
-++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++-+            
-++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++-+    #         cur_cache_position = model_inputs.get("cache_position")
-++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++-+    #         cur_input_ids = model_inputs.get("input_ids")
-++++-+            
-++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++-+    #             cur_cache_position is not None and 
-++++-+    #             len(cur_cache_position.shape) > 0 and
-++++-+    #             cur_cache_position.shape[0] == 1 and
-++++-+    #             cur_input_ids is not None and
-++++-+    #             cur_input_ids.shape[1] == 1):
-++++-+    #             # 使用 JIT 优化的单 token 解码
-++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++-+    #             if not hasattr(self, '_jit_used'):
-++++-+    #                 self._jit_used = False
-++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++-+                
-++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++-+    #                 cur_token=cur_input_ids,
-++++-+    #                 input_pos=model_inputs.get("position_ids"),
-++++-+    #                 cache_position=cur_cache_position,
-++++-+    #                 past_key_values=cur_past_key_values,
-++++-+    #             )
-++++-+                
-++++-+    #             # 标记已使用JIT（用于后续判断）
-++++-+    #             if not self._jit_used:
-++++-+    #                 self._jit_used = True
-++++-+                
-++++-+    #             # 构造兼容的输出对象
-++++-+    #             class JitOptimizedOutput:
-++++-+    #                 def __init__(self, logits, config):
-++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++-+    #                     self.config = config
-++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++-+    #                     self.cross_attentions = None
-++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++-+                
-++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++-+    #         else:
-++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++-+    #             outputs = self(**model_inputs, return_dict=True)
-++++-+            
-++++-+    #         if synced_devices and this_peer_finished:
-++++-+    #             continue
-++++-+            
-++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++-+    #         next_token_logits = outputs.logits[:, -1, :]
-++++-+            
-++++-+    #         # pre-process distribution
-++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++-+    #         if do_sample:
-++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++-+            
-++++-+    #         # Store scores, attentions and hidden_states when required
-++++-+    #         if return_dict_in_generate:
-++++-+    #             if output_scores:
-++++-+    #                 scores += (next_token_scores,)
-++++-+    #             if output_logits:
-++++-+    #                 raw_logits += (next_token_logits,)
-++++-+    #             if output_attentions:
-++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++-+    #                 if self.config.is_encoder_decoder:
-++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++-+                
-++++-+    #             if output_hidden_states:
-++++-+    #                 hidden = (
-++++-+    #                     outputs.decoder_hidden_states
-++++-+    #                     if self.config.is_encoder_decoder
-++++-+    #                     else outputs.hidden_states
-++++-+    #                 )
-++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++-+            
-++++-+    #         # token selection
-++++-+    #         if do_sample:
-++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++-+    #         else:
-++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++-+            
-++++-+    #         # finished sentences should have their next token be a padding token
-++++-+    #         if has_eos_stopping_criteria:
-++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++-+            
-++++-+    #         # update generated ids, model inputs, and length for next step
-++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++-+    #         if streamer is not None:
-++++-+    #             streamer.put(next_tokens)
-++++-+            
-++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++-+    #             outputs,
-++++-+    #             model_kwargs,
-++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++-+    #         )
-++++-+            
-++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++-+    #         cur_len += 1
-++++-+            
-++++-+    #         if _record_time:
-++++-+    #             import time as time_module
-++++-+    #             infer_stop = time_module.time()
-++++-+    #             time_record.append(infer_stop - infer_start)
-++++-+            
-++++-+    #         del outputs
-++++-+        
-++++-+    #     average_infer_time = None
-++++-+    #     if time_record:
-++++-+    #         if len(time_record) > 1:
-++++-+    #             time_record.pop(0)
-++++-+    #         average_infer_time = sum(time_record) / len(time_record)
-++++-+    #         print(f'average inference time is: {average_infer_time}')
-++++-+    #         print(f'inference time record: {time_record}')
-++++-+        
-++++-+    #     if streamer is not None:
-++++-+    #         streamer.end()
-++++-+        
-++++-+    #     # 简单判断：打印是否使用了JIT路径
-++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++-+    #     else:
-++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++-+        
-++++-+    #     if return_dict_in_generate:
-++++-+    #         if self.config.is_encoder_decoder:
-++++-+    #             return GenerateEncoderDecoderOutput(
-++++-+    #                 sequences=input_ids,
-++++-+    #                 scores=scores,
-++++-+    #                 logits=raw_logits,
-++++-+    #                 encoder_attentions=encoder_attentions,
-++++-+    #                 encoder_hidden_states=encoder_hidden_states,
-++++-+    #                 decoder_attentions=decoder_attentions,
-++++-+    #                 cross_attentions=cross_attentions,
-++++-+    #                 decoder_hidden_states=decoder_hidden_states,
-++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++-+    #                 average_infer_time=average_infer_time
-++++-+    #             )
-++++-+    #         else:
-++++-+    #             return GenerateDecoderOnlyOutput(
-++++-+    #                 sequences=input_ids,
-++++-+    #                 scores=scores,
-++++-+    #                 logits=raw_logits,
-++++-+    #                 attentions=decoder_attentions,
-++++-+    #                 hidden_states=decoder_hidden_states,
-++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++-+    #                 average_infer_time=average_infer_time
-++++-+    #             )
-++++-+    #     else:
-++++-+    #         return input_ids
-++++-+            
-++++-+    # def _prepare_cache_for_generation(
-++++-+    #     self,
-++++-+    #     generation_config,
-++++-+    #     model_kwargs,
-++++-+    #     assistant_model,
-++++-+    #     batch_size,
-++++-+    #     max_cache_length,
-++++-+    # ):
-++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++-+    #         generation_config.cache_implementation = "static"
-++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++-+        
-++++-+    #     if generation_config.cache_implementation == "static":
-++++-+    #         base_required_from_max_length = generation_config.max_length + 1
-++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++-+    #         min_cache_size = 50
-++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++-+    #         else:
-++++-+    #             max_cache_length = max(base_required, min_cache_size)
-++++-+            
-++++-+    #         original_max_cache_length = max_cache_length
-++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++-+            
-++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++-+    #             if max_cache_length > self.config.max_position_embeddings:
-++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++-+        
-++++-+    #     result = super()._prepare_cache_for_generation(
-++++-+    #         generation_config=generation_config,
-++++-+    #         model_kwargs=model_kwargs,
-++++-+    #         assistant_model=assistant_model,
-++++-+    #         batch_size=batch_size,
-++++-+    #         max_cache_length=max_cache_length,
-++++-+    #     )
-++++-+        
-++++-+    #     if generation_config.cache_implementation == "static":
-++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++-+    #         created_cache = model_kwargs.get(cache_name)
-++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
-++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++-+        
-++++-+    #     return result
-++++-+
-++++-+
-++++-+
-++++- 
-++++- 
-++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++--- 
-++++-2.27.0
-++++-
-++++-- 
-++++2.27.0
-++++
-+++-- 
-+++2.27.0
-+++
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
deleted file mode 100644
index 31d324c3..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0008-moe-change.patch"
+++ /dev/null
@@ -1,8789 +0,0 @@
-From 3b0f98eeed90a7204357d96aacc9dc7098b9dab1 Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Sun, 9 Nov 2025 00:50:01 +0800
-Subject: [PATCH 08/10] moe change
-
----
- .../models/deepseek/modeling_deepseek.py      |  433 +-
- .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
- patches/0001-20251104commit.patch             |    2 +-
- patches/0002-20251106commit.patch             |    2 +-
- patches/0003-20261106secondcommit.patch       |    2 +-
- patches/0004-20251106change.patch             |    2 +-
- patches/0005-20251107001commit.patch          |    2 +-
- patches/0006-20251107002commit.patch          |    2 +-
- patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
- 9 files changed, 8510 insertions(+), 55 deletions(-)
- create mode 100644 patches/0007-20251107003commit.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index ff631974..0af29305 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -19,8 +19,10 @@
- # limitations under the License.
- """ MindNLP DeepSeek model."""
- import math
-+import time
- import warnings
- from typing import List, Optional, Tuple, Union
-+from mindspore import mint
- import mindspore
- from mindnlp.core import nn, ops, no_grad
- from mindnlp.core.nn import functional as F
-@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
- 
- _CONFIG_FOR_DOC = "DeepseekConfig"
- 
-+Long_Prompt = 1
-+LONG_PROMPT_LENGTH_THRESHOLD = 128
-+SHORT_PROMPT_LENGTH_THRESHOLD = 32
-+
- _attn_mask_cache = {}
- 
- def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
-         return topk_idx, topk_weight, aux_loss
- 
- 
-+bincount_op = mindspore.ops.Bincount()
-+
- class DeepseekMoE(nn.Module):
-     """
-     A mixed expert module containing shared experts.
-@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
-                     y = y + self.shared_experts(identity)
-                 return y
-             else:
--                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+                if Long_Prompt == 0:
-+                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+                else:
-+                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-                 if self.config.n_shared_experts is not None:
-                     y = y + self.shared_experts(identity)
-                 return y
-@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
-         # if self.config.n_shared_experts is not None:
-         #     y = y + self.shared_experts(identity)
-         # return y
--        
-+    
-+    
-+    
-+    # lwx
-+    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
-+    #     """
-+    #     如果 expert_ids 为 None，走单专家逻辑；
-+    #     如果有，多专家批量处理，保证和原逻辑一致。
-+    #     """
-+    #     if expert_ids is None:
-+    #         # 原单专家逻辑
-+    #         if self.config.pretraining_tp > 1:
-+    #             slice = self.intermediate_size // self.config.pretraining_tp
-+    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
-+    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
-+    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
-+    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
-+    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
-+    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
-+    #                                for i in range(self.config.pretraining_tp)], dim=-1)
-+    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
-+    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
-+    #                          for i in range(self.config.pretraining_tp)]
-+    #             down_proj = sum(down_proj)
-+    #         else:
-+    #             down_proj = self.down_proj(
-+    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
-+    #             )
-+    #         return down_proj
-+
-+    #     # ====== 批量多专家路径 ======
-+    #     hidden_size = x.shape[-1]
-+
-+    #     # 按 token expert_ids 选权重
-+    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
-+    #     up_weights   = self.up_proj.weight[expert_ids]
-+    #     down_weights = self.down_proj.weight[expert_ids]
-+
-+    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
-+    #     if self.config.pretraining_tp > 1:
-+    #         outputs = []
-+    #         slice = self.intermediate_size // self.config.pretraining_tp
-+    #         for i in range(self.config.pretraining_tp):
-+    #             # 每个 slice 单独计算
-+    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
-+    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
-+    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
-+    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
-+    #             outputs.append(down_proj_out)
-+    #         return sum(outputs)
-+    #     else:
-+    #         gate_proj_out = F.linear(x, gate_weights)
-+    #         up_proj_out   = F.linear(x, up_weights)
-+    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
-+    #         return F.linear(act_out, down_weights)
-+    # @no_grad()
-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+    #     num_tokens = x.shape[0]
-+    #     hidden_size = x.shape[-1]
-+
-+    #     idxs = flat_expert_indices.argsort()
-+    #     sorted_expert_indices = flat_expert_indices[idxs]
-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
-+    #     sorted_indices = sorted_token_indices
-+
-+    #     permuted_tokens = x[sorted_token_indices]
-+    #     sorted_weights  = flat_expert_weights[idxs]
-+
-+    #     # 一次调用多专家 forward
-+    #     expert_outputs = ops.zeros_like(permuted_tokens)
-+    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
-+
-+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
-+    #     try:
-+    #         final_output = ops.moe_token_unpermute(
-+    #             expert_outputs,
-+    #             sorted_indices,
-+    #             probs=probs,
-+    #             padded_mode=False
-+    #         )
-+    #     except Exception:
-+    #         final_output = ops.zeros_like(x)
-+    #         final_output = mindspore.mint.scatter_add(
-+    #             final_output,
-+    #             0,
-+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-+    #             expert_outputs * sorted_weights
-+    #         )
-+
-+    #     return final_output
-+
-+    # def mlp_batch_forward(self, tokens, expert_ids):
-+    #     """
-+    #     使用批量专家 forward（保留精度）
-+    #     """
-+    #     return self.experts[0].forward(tokens, expert_ids)
-+
-     # @no_grad()
-     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
- 
-@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
-     #         expert_cache += expert_out * weight
-     #     return expert_cache
-     
-+    #@dwj
-     @no_grad()
--    # dwj
-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--        # x 的 shape: (1, hidden_size)
--        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--
--        # 1. 收集所有需要的专家层
--        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-         selected_experts = [self.experts[i] for i in flat_expert_indices]
--
--        # 2. 并行计算所有专家的输出
--        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--        # ops.cat 会将它们堆叠成一个新的 Tensor
--        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--
--        # 3. 使用矩阵乘法进行加权求和
--        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--        # 最终结果 final_output 的 shape: (1, hidden_size)
-         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--        
-         return final_output
- 
- 
--    # @no_grad()
--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--    #     expert_cache = ops.zeros_like(x)
--    #     idxs = flat_expert_indices.argsort()
--    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--    #     token_idxs = idxs // self.num_experts_per_tok
--
--    #     for i, end_idx in enumerate(tokens_per_expert):
--    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--    #         if start_idx == end_idx:
--    #             continue
--    #         expert = self.experts[i]
--    #         exp_token_idx = token_idxs[start_idx:end_idx]
--    #         expert_tokens = x[exp_token_idx]
--    #         expert_out = expert(expert_tokens)
--    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--
--    #     return expert_cache
--        
-     @no_grad()
-     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-         """
-@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
-             )
- 
-         return expert_cache
-+
-+
-+    # @no_grad()
-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+    #     """
-+    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
-+    #     """
-+    #     num_tokens = x.shape[0]
-+    #     hidden_size = x.shape[-1]
-+
-+    #     # 生成排序后的 token 索引
-+    #     idxs = flat_expert_indices.argsort()
-+    #     sorted_expert_indices = flat_expert_indices[idxs]
-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
-+
-+    #     # 记录到 sorted_indices（moe_token_unpermute 用）
-+    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
-+
-+    #     # 收集专家输入
-+    #     permuted_tokens = x[sorted_token_indices]
-+
-+    #     # 执行每个专家的 MLP（批量处理）
-+    #     expert_outputs = []
-+    #     token_ptr = 0
-+    #     tokens_per_expert = sorted_expert_indices.bincount()
-+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
-+    #         if count == 0:
-+    #             continue
-+    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
-+    #         out = self.experts[expert_id](cur_tokens)
-+    #         expert_outputs.append(out)
-+    #         token_ptr += count
-+
-+    #     # 拼接所有专家输出
-+    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
-+
-+    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
-+    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
-+
-+    #     # 直接调用硬件加速的 unpermute
-+    #     final_output = ops.moe_token_unpermute(
-+    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
-+    #         sorted_indices,           # shape: [num_tokens * top_k]
-+    #         probs=probs,               # 按概率加权
-+    #         padded_mode=False
-+    #     )
-+
-+    #     return final_output
-+
-+    # lwx prefill 20251108
-+    @no_grad()
-+    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
-+        """
-+        高性能 + 数值一致的 MoE prefill 推理：
-+        1. 批量化处理所有专家计算，减少 Python 循环开销
-+        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
-+        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
-+        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
-+
-+        参数：
-+            x: [num_tokens, hidden_size]，
-+            MoE 输入的 token 表示
-+            flat_expert_indices: [num_tokens * top_k]，
-+            每个 token 的路由专家 id
-+            flat_expert_weights: [num_tokens * top_k, 1]，
-+            路由专家权重
-+        """
-+        num_tokens = x.shape[0]
-+        hidden_size = x.shape[-1]
-+
-+        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
-+        idxs = flat_expert_indices.argsort()  # 排序索引
-+        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
-+        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
-+
-+        # sorted_indices 必须与 permuted_tokens 顺序匹配
-+        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
-+
-+        # 2) 收集专家输入（按 idxs 排序）
-+        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
-+        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
-+
-+        # 3) 计算每个专家的 token 数
-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
-+
-+        # 4) 批量专家计算（减少 Python 循环）
-+        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
-+        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
-+        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
-+
-+        expert_outputs = ops.zeros_like(permuted_tokens)
-+        ptr = 0
-+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
-+            if count == 0:
-+                continue
-+            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
-+            
-+            # 与 DeepseekMLP forward 等价
-+            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
-+            up_proj_out   = F.linear(tokens, up_weights[expert_id])
-+            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
-+            expert_out    = F.linear(act_out, down_weights[expert_id])
-+
-+            expert_outputs[ptr:ptr+count] = expert_out
-+            ptr += count
-+
-+        # 5) Ascend 加速的 unpermute（已排序的权重）
-+        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
-+
-+        final_output = ops.zeros_like(x)
-+        final_output = mindspore.mint.scatter_add(
-+            final_output,
-+            0,
-+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-+            expert_outputs * sorted_weights
-+        )      
-+
-+
-+        # try:
-+        #     final_output = ops.moe_token_unpermute(
-+        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
-+        #         sorted_indices,   # [num_tokens*top_k] 原 token id
-+        #         probs=probs,      # 对应权重
-+        #         padded_mode=False
-+        #     )
-+        # except Exception:
-+        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
-+        #     final_output = ops.zeros_like(x)
-+        #     final_output = mindspore.mint.scatter_add(
-+        #         final_output,
-+        #         0,
-+        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-+        #         expert_outputs * sorted_weights
-+        #     )
-+
-+        return final_output
-+
-+
-+    # @no_grad()
-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+    #     num_tokens = x.shape[0]
-+    #     hidden_size = x.shape[-1]
-+
-+    #     idxs = flat_expert_indices.argsort()
-+    #     sorted_expert_indices = flat_expert_indices[idxs]
-+    #     sorted_token_indices = idxs // self.num_experts_per_tok
-+        
-+    #     # sorted_indices = sorted_token_indices
-+    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
-+    #     permuted_tokens = x[sorted_token_indices]
-+    #     sorted_weights = flat_expert_weights[idxs]
-+    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
-+
-+    #     expert_outputs = ops.zeros_like(permuted_tokens)
-+    #     ptr = 0
-+
-+    #     # 只按专家维度循环
-+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
-+    #         if count == 0:
-+    #             continue
-+    #         token_slice = slice(ptr, ptr + count)
-+    #         expert_tokens = permuted_tokens[token_slice]
-+
-+    #         # 保持原 forward（含 pretraining_tp、bias 等）
-+    #         expert_out = self.experts[expert_id](expert_tokens)
-+
-+    #         expert_outputs[token_slice] = expert_out
-+    #         ptr += count
-+
-+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
-+    #     try:
-+    #         final_output = mindspore.ops.moe_token_unpermute(
-+    #             expert_outputs,
-+    #             sorted_indices,
-+    #             probs=probs,
-+    #             padded_mode=False
-+    #         )
-+    #     except Exception:
-+    #         final_output = ops.zeros_like(x)
-+    #         final_output = mindspore.mint.scatter_add(
-+    #             final_output,
-+    #             0,
-+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-+    #             expert_outputs * sorted_weights
-+    #         )
-+
-+    #     return final_output
-+
-+
-+    #lwx
-+    # @no_grad()
-+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+    #     """
-+    #     并行化 MoE prefill：
-+    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
-+    #     - 保证结果与原版完全一致
-+    #     """
-+    #     # 输出缓存
-+    #     expert_cache = ops.zeros_like(x)
-+
-+    #     # token 总数（批量*seq_len*num_experts_per_tok）
-+    #     num_tokens = flat_expert_indices.shape[0]
-+    #     hidden_dim = x.shape[-1]
-+
-+    #     # 原 token ID（idxs // num_experts_per_tok）
-+    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
-+
-+    #     # ====== Step 1: 组织输入 ======
-+    #     # 按 experts 排序，保证 scatter_add 对应位置一致
-+    #     sort_ids = flat_expert_indices.argsort()
-+    #     sorted_experts = flat_expert_indices[sort_ids]
-+    #     sorted_tokens = token_ids[sort_ids]
-+    #     sorted_weights = flat_expert_weights[sort_ids]
-+
-+    #     # 收集每个专家的输入
-+    #     # build: expert_inputs[expert_id] = [tokens...]
-+    #     expert_inputs = []
-+    #     expert_outs = []
-+
-+    #     for eid in range(self.config.n_routed_experts):
-+    #         eid_mask = (sorted_experts == eid)
-+    #         if eid_mask.any():
-+    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
-+    #             expert_inputs.append(tokens_for_eid)
-+    #         else:
-+    #             expert_inputs.append(None)
-+
-+    #     # ====== Step 2: 并行计算所有专家输出 ======
-+    #     # 存储所有专家结果到一个列表
-+    #     for eid in range(self.config.n_routed_experts):
-+    #         if expert_inputs[eid] is not None:
-+    #             out = self.experts[eid](expert_inputs[eid])
-+    #             expert_outs.append(out)
-+    #         else:
-+    #             expert_outs.append(None)
-+
-+    #     # ====== Step 3: scatter_add 回写结果 ======
-+    #     # 遍历专家，将结果加回对应的 token
-+    #     pos = 0
-+    #     for eid in range(self.config.n_routed_experts):
-+    #         if expert_outs[eid] is not None:
-+    #             size = expert_outs[eid].shape[0]
-+    #             tokens_idx = sorted_tokens[pos:pos+size]
-+    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
-+    #             pos += size
-+
-+    #             # scatter_add 到 expert_cache
-+    #             expert_cache = mindspore.mint.scatter_add(
-+    #                 expert_cache,
-+    #                 dim=0,
-+    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
-+    #                 src=scaled_out
-+    #             )
-+
-+    #     return expert_cache
-+
-+
-+
- # 放置在 DeepseekMoE 类中
-     # @no_grad()
-     # #lwx 20251107
-@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
-         self.hidden_size = config.hidden_size
- 
-         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--            # config=config, layer_idx=layer_idx
-+        #     config=config, layer_idx=layer_idx
-         # )
- 
-         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
-             )
-             else DeepseekMLP(config)
-         )
-+
-         self.input_layernorm = DeepseekRMSNorm(
-             config.hidden_size, eps=config.rms_norm_eps
-         )
-@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-     def get_decoder(self):
-         return self.model
- 
-+    def generate(self, *args, **kwargs):
-+        """
-+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+        """
-+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+
-+        input_ids = kwargs.get("input_ids")
-+        if input_ids is None and args:
-+            input_ids = args[0]
-+
-+        if input_ids is not None:
-+            prompt_length = input_ids.shape[1]
-+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-+                Long_Prompt = 2
-+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-+                Long_Prompt = 0
-+            else:
-+                Long_Prompt = 1
-+
-+
-+        return super().generate(*args, **kwargs)
- 
-     def forward(
-         self,
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index 913a7609..6566958b 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
- 
-     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-     @no_grad()
--    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-         original_dtype = hidden_states.dtype
-         batch_size, _ = hidden_states.shape
-         expert_outputs_list = [
-@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-         return moe_output_fp32.squeeze(1).to(original_dtype)
- 
-+
-     # @no_grad()
--    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-     #     num_tokens, _ = hidden_states.shape
-     #     flat_selected_experts = selected_experts.flatten()
-     #     sorted_expert_indices = flat_selected_experts.argsort()
-@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-     #         current_token_offset += expert_token_count
-     #     return moe_output
- 
-+    # baseline
-     @no_grad()
--    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-         """
-         优化版 MoE prefill (速度优先模式)：
-         - 批量张量化处理同一个 expert 的所有 token
-@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         return moe_output
- 
- 
-+    @no_grad()
-+    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+        """
-+        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
-+        逻辑：
-+        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
-+        2. 每个 expert 一次性处理其全部 token
-+        3. 最后一次 scatter_add 回到原 token 顺序
-+        """
-+
-+        num_tokens = hidden_states.shape[0]
-+        hidden_size = hidden_states.shape[-1]
-+
-+        # 展平为一维
-+        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
-+        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
-+
-+        # 按 expert 排序
-+        idxs = flat_selected_experts.argsort()
-+        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
-+        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
-+
-+        # 排好序的输入向量（连续内存）
-+        permuted_tokens = hidden_states[sorted_token_indices]
-+
-+        # 排好序的权重
-+        sorted_weights = flat_routing_weights[idxs]
-+
-+        # 每个 expert 对应的 token 数量
-+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-+
-+        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
-+        expert_outputs = ops.zeros_like(permuted_tokens)
-+
-+        ptr = 0  # 指向当前切片的起点
-+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
-+            if count == 0:
-+                continue
-+
-+            token_slice = slice(ptr, ptr + count)
-+            expert_tokens = permuted_tokens[token_slice]  # 连续切片
-+
-+            # 执行专家 MLP
-+            expert_out = self.experts[expert_id](expert_tokens)
-+
-+            expert_outputs[token_slice] = expert_out
-+            ptr += count
-+
-+        # 按权重缩放
-+        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
-+
-+        # 回写到原 token 顺序 (单次 scatter_add)
-+        moe_output = mindspore.mint.scatter_add(
-+            ops.zeros_like(hidden_states),
-+            0,
-+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-+            scaled_outputs
-+        )
-+
-+        return moe_output
-+
-+
-+    
-     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+
-     @no_grad()
-     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-         moe_output = ops.zeros_like(hidden_states)
-@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         #     # --- 速度优先模式 (SPEED MODE) ---
-         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-         #     if sequence_length == 1:
--        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-         #     else:
--        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-         
-         routing_weights_casted = routing_weights.to(hidden_states.dtype)
-         if sequence_length == 1:
--            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-         else:
--            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--    
-+            # if Long_Prompt == 1:
-+            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+            # else:
-+            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+
- 
-         # 3. 共享专家计算与合并 (所有模式通用)
-         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-index c9c8c5ee..513dd40b 100644
---- a/patches/0001-20251104commit.patch
-+++ b/patches/0001-20251104commit.patch
-@@ -1,7 +1,7 @@
- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/6] 20251104commit
-+Subject: [PATCH 1/7] 20251104commit
- 
- ---
-  mindnlp/transformers/cache_utils.py           |  28 +-
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-index 625656eb..41081b85 100644
---- a/patches/0002-20251106commit.patch
-+++ b/patches/0002-20251106commit.patch
-@@ -1,7 +1,7 @@
- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/6] 20251106commit
-+Subject: [PATCH 2/7] 20251106commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-index dcb85080..c1392569 100644
---- a/patches/0003-20261106secondcommit.patch
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -1,7 +1,7 @@
- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/6] 20261106secondcommit
-+Subject: [PATCH 3/7] 20261106secondcommit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-index bbed13cc..e548b1b2 100644
---- a/patches/0004-20251106change.patch
-+++ b/patches/0004-20251106change.patch
-@@ -1,7 +1,7 @@
- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 15:48:09 +0800
--Subject: [PATCH 4/6] 20251106change
-+Subject: [PATCH 4/7] 20251106change
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  189 +-
-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-index b2d1035c..bf224d2a 100644
---- a/patches/0005-20251107001commit.patch
-+++ b/patches/0005-20251107001commit.patch
-@@ -1,7 +1,7 @@
- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 11:48:18 +0800
--Subject: [PATCH 5/6] 20251107001commit
-+Subject: [PATCH 5/7] 20251107001commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |   91 +-
-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-index bffa134e..1bd306b9 100644
---- a/patches/0006-20251107002commit.patch
-+++ b/patches/0006-20251107002commit.patch
-@@ -1,7 +1,7 @@
- From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 12:06:32 +0800
--Subject: [PATCH 6/6] 20251107002commit
-+Subject: [PATCH 6/7] 20251107002commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  122 +-
-diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
-new file mode 100644
-index 00000000..ce558554
---- /dev/null
-+++ b/patches/0007-20251107003commit.patch
-@@ -0,0 +1,8034 @@
-+From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Fri, 7 Nov 2025 12:12:51 +0800
-+Subject: [PATCH 7/7] 20251107003commit
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |    2 +-
-+ patches/0001-20251104commit.patch             |    2 +-
-+ patches/0002-20251106commit.patch             |    2 +-
-+ patches/0003-20261106secondcommit.patch       |    2 +-
-+ patches/0004-20251106change.patch             |    2 +-
-+ patches/0005-20251107001commit.patch          |    2 +-
-+ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
-+ 7 files changed, 7937 insertions(+), 6 deletions(-)
-+ create mode 100644 patches/0006-20251107002commit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index e7e1c053..ff631974 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
-+     #     return expert_cache
-+     
-+     @no_grad()
-+-    dwj
-++    # dwj
-+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+         # x 的 shape: (1, hidden_size)
-+         # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+index 2842180e..c9c8c5ee 100644
-+--- a/patches/0001-20251104commit.patch
-++++ b/patches/0001-20251104commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+-Subject: [PATCH 1/5] 20251104commit
-++Subject: [PATCH 1/6] 20251104commit
-+ 
-+ ---
-+  mindnlp/transformers/cache_utils.py           |  28 +-
-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+index c6cd8757..625656eb 100644
-+--- a/patches/0002-20251106commit.patch
-++++ b/patches/0002-20251106commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+-Subject: [PATCH 2/5] 20251106commit
-++Subject: [PATCH 2/6] 20251106commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+index 601960c9..dcb85080 100644
-+--- a/patches/0003-20261106secondcommit.patch
-++++ b/patches/0003-20261106secondcommit.patch
-+@@ -1,7 +1,7 @@
-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+-Subject: [PATCH 3/5] 20261106secondcommit
-++Subject: [PATCH 3/6] 20261106secondcommit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+index 8976f10b..bbed13cc 100644
-+--- a/patches/0004-20251106change.patch
-++++ b/patches/0004-20251106change.patch
-+@@ -1,7 +1,7 @@
-+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 15:48:09 +0800
-+-Subject: [PATCH 4/5] 20251106change
-++Subject: [PATCH 4/6] 20251106change
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  189 +-
-+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-+index 8d9032be..b2d1035c 100644
-+--- a/patches/0005-20251107001commit.patch
-++++ b/patches/0005-20251107001commit.patch
-+@@ -1,7 +1,7 @@
-+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Fri, 7 Nov 2025 11:48:18 +0800
-+-Subject: [PATCH 5/5] 20251107001commit
-++Subject: [PATCH 5/6] 20251107001commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |   91 +-
-+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-+new file mode 100644
-+index 00000000..bffa134e
-+--- /dev/null
-++++ b/patches/0006-20251107002commit.patch
-+@@ -0,0 +1,7931 @@
-++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Fri, 7 Nov 2025 12:06:32 +0800
-++Subject: [PATCH 6/6] 20251107002commit
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |  122 +-
-++ patches/0001-20251104commit.patch             |    2 +-
-++ patches/0002-20251106commit.patch             |    2 +-
-++ patches/0003-20261106secondcommit.patch       |    2 +-
-++ patches/0004-20251106change.patch             |    2 +-
-++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
-++ 6 files changed, 7773 insertions(+), 64 deletions(-)
-++ create mode 100644 patches/0005-20251107001commit.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index 8831e4b7..e7e1c053 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
-++     #         expert_out = expert(x)
-++     #         expert_cache += expert_out * weight
-++     #     return expert_cache
-++-
-++-    # @no_grad()
-++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++-    #     # x 的 shape: (1, hidden_size)
-++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++-
-++-    #     # 1. 收集所有需要的专家层
-++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-++-
-++-    #     # 2. 并行计算所有专家的输出
-++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++-
-++-    #     # 3. 使用矩阵乘法进行加权求和
-++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++    
-+++    @no_grad()
-+++    dwj
-+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++        # x 的 shape: (1, hidden_size)
-+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++
-+++        # 1. 收集所有需要的专家层
-+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++
-+++        # 2. 并行计算所有专家的输出
-+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++        # ops.cat 会将它们堆叠成一个新的 Tensor
-+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++
-+++        # 3. 使用矩阵乘法进行加权求和
-+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++        # 最终结果 final_output 的 shape: (1, hidden_size)
-+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++         
-++-    #     return final_output
-+++        return final_output
-++ 
-++ 
-++     # @no_grad()
-++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
-++ 
-++         return expert_cache
-++ # 放置在 DeepseekMoE 类中
-++-    @no_grad()
-++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++-        """
-++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-++-
-++-        Args:
-++-            x (Tensor): 输入张量, shape: (1, hidden_size)
-++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-++-        """
-++-        top_k, _ = flat_expert_weights.shape
-++-        hidden_size = x.shape[-1]
-++-
-++-        # 1. 将所有专家的权重堆叠起来
-++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+++    # @no_grad()
-+++    # #lwx 20251107
-+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++    #     """
-+++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+++
-+++    #     Args:
-+++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
-+++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+++    #     """
-+++    #     top_k, _ = flat_expert_weights.shape
-+++    #     hidden_size = x.shape[-1]
-+++
-+++    #     # 1. 将所有专家的权重堆叠起来
-+++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-++         
-++-        # 2. "收集" 所需的专家权重
-++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
-++-        selected_up_w = stacked_up_w[flat_expert_indices]
-++-        selected_down_w = stacked_down_w[flat_expert_indices]
-+++    #     # 2. "收集" 所需的专家权重
-+++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
-+++    #     selected_up_w = stacked_up_w[flat_expert_indices]
-+++    #     selected_down_w = stacked_down_w[flat_expert_indices]
-++         
-++-        # 3. 准备输入
-++-        x_expanded = x.expand((top_k, 1, hidden_size))
-+++    #     # 3. 准备输入
-+++    #     x_expanded = x.expand((top_k, 1, hidden_size))
-++         
-++-        # 4. 并行计算 gate_proj 和 up_proj
-++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+++    #     # 4. 并行计算 gate_proj 和 up_proj
-+++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-++ 
-++-        # 5. 计算中间状态
-++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+++    #     # 5. 计算中间状态
-+++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-++         
-++-        # 6. 并行计算 down_proj
-++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-++-        #    --- [FIX] ---
-++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-++-        #    --- [FIX END] ---
-+++    #     # 6. 并行计算 down_proj
-+++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+++    #     #    --- [FIX] ---
-+++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+++    #     #    --- [FIX END] ---
-++         
-++-        # 7. 根据路由权重进行加权求和
-++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+++    #     # 7. 根据路由权重进行加权求和
-+++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-++ 
-++-        return weighted_sum
-+++    #     return weighted_sum
-++ 
-++ 
-++ 
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++index 0a0ef2d7..2842180e 100644
-++--- a/patches/0001-20251104commit.patch
-+++++ b/patches/0001-20251104commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-++-Subject: [PATCH 1/4] 20251104commit
-+++Subject: [PATCH 1/5] 20251104commit
-++ 
-++ ---
-++  mindnlp/transformers/cache_utils.py           |  28 +-
-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++index 5185270c..c6cd8757 100644
-++--- a/patches/0002-20251106commit.patch
-+++++ b/patches/0002-20251106commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-++-Subject: [PATCH 2/4] 20251106commit
-+++Subject: [PATCH 2/5] 20251106commit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++index 3e05f821..601960c9 100644
-++--- a/patches/0003-20261106secondcommit.patch
-+++++ b/patches/0003-20261106secondcommit.patch
-++@@ -1,7 +1,7 @@
-++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-++-Subject: [PATCH 3/4] 20261106secondcommit
-+++Subject: [PATCH 3/5] 20261106secondcommit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-++index 88a1aef4..8976f10b 100644
-++--- a/patches/0004-20251106change.patch
-+++++ b/patches/0004-20251106change.patch
-++@@ -1,7 +1,7 @@
-++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 15:48:09 +0800
-++-Subject: [PATCH 4/4] 20251106change
-+++Subject: [PATCH 4/5] 20251106change
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  189 +-
-++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-++new file mode 100644
-++index 00000000..8d9032be
-++--- /dev/null
-+++++ b/patches/0005-20251107001commit.patch
-++@@ -0,0 +1,7707 @@
-+++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Fri, 7 Nov 2025 11:48:18 +0800
-+++Subject: [PATCH 5/5] 20251107001commit
-+++
-+++---
-+++ .../models/deepseek/modeling_deepseek.py      |   91 +-
-+++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
-+++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
-+++ patches/0001-20251104commit.patch             |    2 +-
-+++ patches/0002-20251106commit.patch             |    2 +-
-+++ patches/0003-20261106secondcommit.patch       |    2 +-
-+++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
-+++ 7 files changed, 7577 insertions(+), 30 deletions(-)
-+++ create mode 100644 patches/0004-20251106change.patch
-+++
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index 0546f318..8831e4b7 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
-+++     #         expert_cache += expert_out * weight
-+++     #     return expert_cache
-+++ 
-+++-    @no_grad()
-+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++-        # x 的 shape: (1, hidden_size)
-+++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++-
-+++-        # 1. 收集所有需要的专家层
-+++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++-
-+++-        # 2. 并行计算所有专家的输出
-+++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++-        # ops.cat 会将它们堆叠成一个新的 Tensor
-+++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++-
-+++-        # 3. 使用矩阵乘法进行加权求和
-+++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++-        # 最终结果 final_output 的 shape: (1, hidden_size)
-+++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++++    # @no_grad()
-++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     # x 的 shape: (1, hidden_size)
-++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++++
-++++    #     # 1. 收集所有需要的专家层
-++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-++++
-++++    #     # 2. 并行计算所有专家的输出
-++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++++
-++++    #     # 3. 使用矩阵乘法进行加权求和
-++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++         
-+++-        return final_output
-++++    #     return final_output
-+++ 
-+++ 
-+++     # @no_grad()
-+++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
-+++             )
-+++ 
-+++         return expert_cache
-++++# 放置在 DeepseekMoE 类中
-++++    @no_grad()
-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++        """
-++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-++++
-++++        Args:
-++++            x (Tensor): 输入张量, shape: (1, hidden_size)
-++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-++++        """
-++++        top_k, _ = flat_expert_weights.shape
-++++        hidden_size = x.shape[-1]
-++++
-++++        # 1. 将所有专家的权重堆叠起来
-++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-++++        
-++++        # 2. "收集" 所需的专家权重
-++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
-++++        selected_up_w = stacked_up_w[flat_expert_indices]
-++++        selected_down_w = stacked_down_w[flat_expert_indices]
-++++        
-++++        # 3. 准备输入
-++++        x_expanded = x.expand((top_k, 1, hidden_size))
-++++        
-++++        # 4. 并行计算 gate_proj 和 up_proj
-++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-++++
-++++        # 5. 计算中间状态
-++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-++++        
-++++        # 6. 并行计算 down_proj
-++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-++++        #    --- [FIX] ---
-++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-++++        #    --- [FIX END] ---
-++++        
-++++        # 7. 根据路由权重进行加权求和
-++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-++++
-++++        return weighted_sum
-++++
-++++
-+++ 
-+++         # @no_grad()
-+++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++index ebd7782e..913a7609 100644
-+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++-    x1 = x[..., : x.shape[-1] // 2]
-+++-    x2 = x[..., x.shape[-1] // 2 :]
-++++    # x1 = x[..., : x.shape[-1] // 2]
-++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+++index d059dcbe..2b217b64 100644
-+++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
-+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++ def rotate_half(x):
-+++     """Rotates half the hidden dims of the input."""
-+++-    x1 = x[..., : x.shape[-1] // 2]
-+++-    x2 = x[..., x.shape[-1] // 2 :]
-++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++    # x1 = x[..., : x.shape[-1] // 2]
-++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++     return ops.cat((-x2, x1), dim=-1)
-+++ 
-+++ 
-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++index 78f22642..0a0ef2d7 100644
-+++--- a/patches/0001-20251104commit.patch
-++++++ b/patches/0001-20251104commit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++-Subject: [PATCH 1/3] 20251104commit
-++++Subject: [PATCH 1/4] 20251104commit
-+++ 
-+++ ---
-+++  mindnlp/transformers/cache_utils.py           |  28 +-
-+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+++index 22b65dd5..5185270c 100644
-+++--- a/patches/0002-20251106commit.patch
-++++++ b/patches/0002-20251106commit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+++-Subject: [PATCH 2/3] 20251106commit
-++++Subject: [PATCH 2/4] 20251106commit
-+++ 
-+++ ---
-+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+++index 966529e4..3e05f821 100644
-+++--- a/patches/0003-20261106secondcommit.patch
-++++++ b/patches/0003-20261106secondcommit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+++-Subject: [PATCH 3/3] 20261106secondcommit
-++++Subject: [PATCH 3/4] 20261106secondcommit
-+++ 
-+++ ---
-+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+++new file mode 100644
-+++index 00000000..88a1aef4
-+++--- /dev/null
-++++++ b/patches/0004-20251106change.patch
-+++@@ -0,0 +1,7498 @@
-++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Thu, 6 Nov 2025 15:48:09 +0800
-++++Subject: [PATCH 4/4] 20251106change
-++++
-++++---
-++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
-++++ patches/0001-20251104commit.patch             | 1272 +++++++
-++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
-++++ create mode 100644 patches/0001-20251104commit.patch
-++++ create mode 100644 patches/0002-20251106commit.patch
-++++ create mode 100644 patches/0003-20261106secondcommit.patch
-++++
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index 2f9192bf..0546f318 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-++++ 
-++++         return attn_output, attn_weights, past_key_value
-++++ 
-++++-# class DeepseekFlashAttention(nn.Module):
-++++-#     """
-++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-++++-
-++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
-++++-#     """
-++++-
-++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++++-#         super().__init__()
-++++-#         self.config = config
-++++-#         self.layer_idx = layer_idx
-++++-#         if layer_idx is None:
-++++-#             logger.warning(
-++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++-#                 "when creating this class."
-++++-#             )
-++++-
-++++-#         self.attention_dropout = config.attention_dropout
-++++-#         self.hidden_size = config.hidden_size
-++++-#         self.num_heads = config.num_attention_heads
-++++-#         self.head_dim = self.hidden_size // self.num_heads
-++++-#         self.num_key_value_heads = config.num_key_value_heads
-++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++-#         self.max_position_embeddings = config.max_position_embeddings
-++++-#         self.rope_theta = config.rope_theta
-++++-#         self.is_causal = True
-++++-
-++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++-#             raise ValueError(
-++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++-#                 f" and `num_heads`: {self.num_heads})."
-++++-#             )
-++++-
-++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++++-#         self._init_rope()
-++++-
-++++-#     def _init_rope(self):
-++++-#         if self.config.rope_scaling is None:
-++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
-++++-#                 self.head_dim,
-++++-#                 max_position_embeddings=self.max_position_embeddings,
-++++-#                 base=self.rope_theta,
-++++-#             )
-++++-#         else:
-++++-#             scaling_type = self.config.rope_scaling["type"]
-++++-#             scaling_factor = self.config.rope_scaling["factor"]
-++++-#             if scaling_type == "linear":
-++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++++-#                     self.head_dim,
-++++-#                     max_position_embeddings=self.max_position_embeddings,
-++++-#                     scaling_factor=scaling_factor,
-++++-#                     base=self.rope_theta,
-++++-#                 )
-++++-#             elif scaling_type == "dynamic":
-++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++++-#                     self.head_dim,
-++++-#                     max_position_embeddings=self.max_position_embeddings,
-++++-#                     scaling_factor=scaling_factor,
-++++-#                     base=self.rope_theta,
-++++-#                 )
-++++-#             else:
-++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++++-
-++++-#     def forward(
-++++-#         self,
-++++-#         hidden_states: mindspore.Tensor,
-++++-#         attention_mask: Optional[mindspore.Tensor] = None,
-++++-#         position_ids: Optional[mindspore.Tensor] = None,
-++++-#         past_key_value: Optional[Cache] = None,
-++++-#         output_attentions: bool = False,
-++++-#         use_cache: bool = False,
-++++-#         **kwargs,
-++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++-#         if "padding_mask" in kwargs:
-++++-#             warnings.warn(
-++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++++-#             )
-++++-        
-++++-#         if output_attentions:
-++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-++++-
-++++-#         bsz, q_len, _ = hidden_states.shape
-++++-
-++++-#         if self.config.pretraining_tp > 1:
-++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++++-
-++++-#         query_states = self.q_proj(hidden_states)
-++++-#         key_states = self.k_proj(hidden_states)
-++++-#         value_states = self.v_proj(hidden_states)
-++++-
-++++-#         # Reshape for multi-head attention
-++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++-
-++++-#         kv_seq_len = key_states.shape[-2]
-++++-#         if past_key_value is not None:
-++++-#             if self.layer_idx is None:
-++++-#                 raise ValueError(
-++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++-#                     "with a layer index."
-++++-#                 )
-++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++-        
-++++-#         # Apply Rotary Positional Embedding
-++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++-
-++++-#         if past_key_value is not None:
-++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++-
-++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++-        
-++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++-        
-++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++-
-++++-#         # Convert attention_mask for flash_attention_score
-++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-++++-#         if attention_mask is not None:
-++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++++-#                 raise ValueError(
-++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++++-#                 )
-++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-++++-#         else:
-++++-#             attn_mask_for_fa = None
-++++-        
-++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++++-
-++++-#         # Call the fused flash_attention_score operator
-++++-#         attn_output = mindspore.ops.flash_attention_score(
-++++-#             query=query_states_for_fa,
-++++-#             key=key_states_for_fa,
-++++-#             value=value_states_for_fa,
-++++-#             head_num=self.num_heads, # This is N1, the number of query heads
-++++-#             input_layout='BSH',
-++++-#             attn_mask=attn_mask_for_fa,
-++++-#             keep_prob=keep_prob,
-++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++-#             sparse_mode=0 # Default mask mode
-++++-#         )
-++++-        
-++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-++++-#         attn_output = self.o_proj(attn_output)
-++++-        
-++++-#         # Flash Attention does not return attention weights
-++++-#         attn_weights = None
-++++-
-++++-#         return attn_output, attn_weights, past_key_value
-++++ 
-++++ class DeepseekFlashAttention(nn.Module):
-++++     """
-++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-++++         super().__init__()
-++++         self.hidden_size = config.hidden_size
-++++ 
-++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-++++-            config=config, layer_idx=layer_idx
-++++-        )
-+++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+++++            # config=config, layer_idx=layer_idx
-+++++        # )
-++++ 
-++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-++++             config=config, layer_idx=layer_idx
-++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-++++         return outputs
-++++ 
-++++ 
-++++-
-++++ class DeepseekPreTrainedModel(PreTrainedModel):
-++++     config_class = DeepseekConfig
-++++     base_model_prefix = "model"
-++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++         # Initialize weights and apply final processing
-++++         self.post_init()
-++++         self.warm_up = False
-++++-        #@dwj
-++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-++++-            self.num_layers,
-++++-            self.num_attention_heads,
-++++-            self.head_dim,
-++++-            batch_size=1,
-++++-            max_length=self.max_length,
-++++-            dtype=mindspore.float16
-++++-        )
-++++-
-++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-++++-        key_cache = []
-++++-        value_cache = []
-++++-        for _ in range(num_layers):
-++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++-            key_cache.append(k)
-++++-            value_cache.append(v)
-++++-        return key_cache, value_cache
-++++-
-++++ 
-++++     def warmup_moe_model_deep(self):
-++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++new file mode 100644
-++++index 00000000..78f22642
-++++--- /dev/null
-+++++++ b/patches/0001-20251104commit.patch
-++++@@ -0,0 +1,1272 @@
-+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++++Subject: [PATCH 1/3] 20251104commit
-+++++
-+++++---
-+++++ mindnlp/transformers/cache_utils.py           |  28 +-
-+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-+++++
-+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++++index cadd2e04..02f8d4be 100644
-+++++--- a/mindnlp/transformers/cache_utils.py
-++++++++ b/mindnlp/transformers/cache_utils.py
-+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++++             # k_out[:, :, cache_position] = key_states
-+++++             # v_out[:, :, cache_position] = value_states
-+++++-            if ON_ORANGE_PI:
-+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++-            else:
-+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++-
-++++++            # if ON_ORANGE_PI:
-++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++            # else:
-++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++            # 确保 cache_position 是 1D tensor 并且类型正确
-++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++++            if cache_position.ndim > 1:
-++++++                cache_position = cache_position.flatten()
-++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++++                cache_position = cache_position.int()
-++++++            
-++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++++            k_out[:, :, cache_position] = key_states
-++++++            v_out[:, :, cache_position] = value_states
-++++++            
-+++++         return k_out, v_out
-+++++ 
-+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++index c695b944..d8303e45 100644
-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++++ def rotate_half(x):
-+++++     """Rotates half the hidden dims of the input."""
-+++++-    x1 = x[..., : x.shape[-1] // 2]
-+++++-    x2 = x[..., x.shape[-1] // 2 :]
-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++    # x1 = x[..., : x.shape[-1] // 2]
-++++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++     return ops.cat((-x2, x1), dim=-1)
-+++++ 
-+++++ 
-+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++++         if self.training:
-+++++             raise NotImplementedError("Training is not supported yet.")
-+++++         else:
-+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++-        if self.config.n_shared_experts is not None:
-+++++-            y = y + self.shared_experts(identity)
-+++++-        return y
-++++++            # @lwx
-++++++            if orig_shape[1] == 1:
-++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++++                y=y.view(*orig_shape)
-++++++                if self.config.n_shared_experts is not None:
-++++++                    y = y + self.shared_experts(identity)
-++++++                return y
-++++++            else:
-++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++++                if self.config.n_shared_experts is not None:
-++++++                    y = y + self.shared_experts(identity)
-++++++                return y
-++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++        # if self.config.n_shared_experts is not None:
-++++++        #     y = y + self.shared_experts(identity)
-++++++        # return y
-++++++        
-++++++    @no_grad()
-++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++
-++++++        expert_cache = ops.zeros_like(x)
-++++++        for i in range(self.num_experts_per_tok):
-++++++            expert_id = flat_expert_indices[i].item()
-++++++            weight = flat_expert_weights[i].item()
-++++++            expert = self.experts[expert_id]
-++++++            expert_out = expert(x)
-++++++            expert_cache += expert_out * weight
-++++++        return expert_cache
-+++++ 
-+++++     @no_grad()
-+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-        # expert_cache = torch.zeros_like(x)
-+++++-        # idxs = flat_expert_indices.argsort()
-+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++-        # token_idxs = idxs // self.num_experts_per_tok
-+++++-        # for i, end_idx in enumerate(tokens_per_expert):
-+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++-        #     if start_idx == end_idx:
-+++++-        #         continue
-+++++-        #     expert = self.experts[i]
-+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-        #     expert_tokens = x[exp_token_idx]
-+++++-        #     expert_out = expert(expert_tokens)
-+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++-        # return expert_cache
-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++         expert_cache = ops.zeros_like(x)
-+++++         idxs = flat_expert_indices.argsort()
-+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++         token_idxs = idxs // self.num_experts_per_tok
-++++++
-+++++         for i, end_idx in enumerate(tokens_per_expert):
-+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++             if start_idx == end_idx:
-+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++++             expert_out = expert(expert_tokens)
-+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++
-+++++         return expert_cache
-++++++        
-++++++    # @no_grad()
-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++    #     # expert_cache = torch.zeros_like(x)
-++++++    #     # idxs = flat_expert_indices.argsort()
-++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++    #     # token_idxs = idxs // self.num_experts_per_tok
-++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++    #     #     if start_idx == end_idx:
-++++++    #     #         continue
-++++++    #     #     expert = self.experts[i]
-++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #     #     expert_tokens = x[exp_token_idx]
-++++++    #     #     expert_out = expert(expert_tokens)
-++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++    #     # return expert_cache
-++++++    #     expert_cache = ops.zeros_like(x)
-++++++    #     idxs = flat_expert_indices.argsort()
-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++    #     for i, end_idx in enumerate(tokens_per_expert):
-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++    #         if start_idx == end_idx:
-++++++    #             continue
-++++++    #         expert = self.experts[i]
-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #         expert_tokens = x[exp_token_idx]
-++++++    #         expert_out = expert(expert_tokens)
-++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++
-++++++    #     return expert_cache
-++++++    # @no_grad()
-++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++    #     expert_cache = ops.zeros_like(x)
-++++++
-++++++    #     # 排序保证顺序一致
-++++++    #     idxs = flat_expert_indices.argsort()
-++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++    #     # 找出有 token 的专家
-++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++
-++++++    #     for i in active_experts.tolist():
-++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++    #         end_idx = tokens_per_expert[i]
-++++++    #         if start_idx == end_idx:  # 没有 token
-++++++    #             continue
-++++++
-++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++    #         expert_tokens = x[exp_token_idx]
-++++++    #         expert_out = self.experts[i](expert_tokens)
-++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++
-++++++    #         expert_cache = mindspore.mint.scatter_add(
-++++++    #             expert_cache,
-++++++    #             0,
-++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++    #             expert_out
-++++++    #         )
-++++++
-++++++    #     return expert_cache
-++++++
-++++++
-+++++ 
-+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++++ #     """
-+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++ 
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-++++++        self.warm_up = False
-++++++
-++++++    def warmup_moe_model_deep(self):
-++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++++        test_texts = [
-++++++            "warmup short",
-++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++++        ]
-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++        if tokenizer is None:
-++++++            from mindnlp.transformers import AutoTokenizer
-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++            self._warmup_tokenizer = tokenizer
-++++++
-++++++        for text in test_texts:
-++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++            with mindspore._no_grad():
-++++++                _ = self(**inputs, use_cache=False)
-++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++++ 
-+++++     def get_input_embeddings(self):
-+++++         return self.model.embed_tokens
-+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++         ```"""
-++++++        if not self.warm_up:
-++++++            self.warm_up = True
-++++++            self.warmup_moe_model_deep()
-++++++
-+++++         output_attentions = (
-+++++             output_attentions
-+++++             if output_attentions is not None
-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++index 3cbf820e..d4c6b651 100644
-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++@@ -18,7 +18,6 @@
-+++++ # See the License for the specific language governing permissions and
-+++++ # limitations under the License.
-+++++ """MindSpore Qwen2MoE model."""
-+++++-
-+++++ import math
-+++++ from typing import List, Optional, Tuple, Union
-+++++ 
-+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++++     TokenClassifierOutput,
-+++++ )
-+++++ from ...modeling_utils import PreTrainedModel
-++++++from ...generation import GenerationMixin
-+++++ from ....utils import logging
-+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-+++++ 
-+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++++         self.variance_epsilon = eps
-+++++ 
-+++++     def forward(self, hidden_states):
-++++++        # @dwj
-++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++        # @lwx
-++++++        # if not self.training :
-++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++         input_dtype = hidden_states.dtype
-+++++         hidden_states = hidden_states.to(mindspore.float32)
-+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++++@@ -234,6 +239,8 @@ def rotate_half(x):
-+++++     """Rotates half the hidden dims of the input."""
-+++++     x1 = x[..., : x.shape[-1] // 2]
-+++++     x2 = x[..., x.shape[-1] // 2 :]
-++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++     return ops.cat((-x2, x1), dim=-1)
-+++++ 
-+++++ 
-+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++++         self.config = config
-+++++         self.hidden_size = config.hidden_size
-+++++         self.intermediate_size = intermediate_size
-++++++        
-+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++++         self.act_fn = ACT2FN[config.hidden_act]
-+++++ 
-+++++     def forward(self, x):
-+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++-
-+++++ 
-++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++        # @lwx 
-++++++        # gate_up_output = self.gate_up_proj(x)
-++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++++        # return self.down_proj(swiglu_output)
-++++++
-++++++    # def forward(self, x):
-++++++    #     gate_proj_out = self.gate_proj(x)
-++++++    #     up_proj_out = self.up_proj(x)
-++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++++    #     return self.down_proj(swiglu_out)
-++++++        
-+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++++     """
-+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++++         use_cache: bool = False,
-+++++         cache_position: Optional[mindspore.Tensor] = None,
-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++        
-++++++
-+++++         bsz, q_len, _ = hidden_states.shape
-+++++ 
-+++++         query_states = self.q_proj(hidden_states)
-+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++                     "with a layer index."
-+++++                 )
-+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++            if isinstance(past_key_value, StaticCache):
-++++++                kv_seq_len = key_states.shape[-2]
-++++++            else:
-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++ 
-+++++         if past_key_value is not None:
-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++            
-++++++            if isinstance(past_key_value, StaticCache):
-++++++                kv_seq_len = key_states.shape[-2]
-+++++ 
-+++++         # repeat k/v heads if n_kv_heads < n_heads
-+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++-
-++++++        
-+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++ 
-+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++++-            raise ValueError(
-+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++++-                f" {attn_weights.shape}"
-+++++-            )
-+++++-
-+++++-        if attention_mask is not None:  # no matter the length, we just slice it
-+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++++        if attention_mask is not None:
-++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++             attn_weights = attn_weights + causal_mask
-+++++ 
-+++++         # upcast attention to fp32
-+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++ 
-+++++         attn_output = self.o_proj(attn_output)
-+++++-
-++++++        # @lwx
-++++++        
-++++++        # max_seq_len = self.max_position_embeddings  # 2048
-++++++
-++++++        # if attention_mask is not None:
-++++++        #     # attention_mask: [B, 1, Sq, Sk]
-++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++
-++++++        #     # pad 到 [max_seq_len, max_seq_len]
-++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++        #     global_attention_mask = padded_mask
-++++++        # else:
-++++++        #     global_attention_mask = None
-++++++
-++++++
-++++++        # sparse_mode=3
-++++++        # attn_output = mindspore.ops.flash_attention_score(
-++++++        #     query=query_states,
-++++++        #     key=key_states,
-++++++        #     value=value_states,
-++++++        #     real_shift=None,
-++++++        #     padding_mask=None,
-++++++
-++++++        #     head_num=self.num_heads,
-++++++        #     attn_mask=global_attention_mask,
-++++++        #     keep_prob=1.0 - self.attention_dropout,
-++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++        #     input_layout="BNSD",
-++++++        #     pre_tokens=2147483647,
-++++++        #     next_tokens=2147483647,
-++++++        #     inner_precise=0,
-++++++        #     drop_mask=None,
-++++++        #     prefix=None,
-++++++        #     actual_seq_qlen=None,
-++++++        #     actual_seq_kvlen=None,
-++++++        #     sparse_mode=sparse_mode,
-++++++        # )
-+++++         if not output_attentions:
-+++++             attn_weights = None
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++ 
-++++++class Qwen2MoeFlashAttention(nn.Module):
-++++++    """
-++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++
-++++++    关键改动:
-++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++       直接传入原始的 key 和 value 张量效率更高。
-++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++    """
-++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++        super().__init__()
-++++++        self.config = config
-++++++        self.layer_idx = layer_idx
-++++++        self.hidden_size = config.hidden_size
-++++++        self.num_heads = config.num_attention_heads
-++++++        self.head_dim = self.hidden_size // self.num_heads
-++++++        self.num_key_value_heads = config.num_key_value_heads
-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++        self.max_position_embeddings = config.max_position_embeddings
-++++++        self.rope_theta = config.rope_theta
-++++++        self.attention_dropout = config.attention_dropout
-++++++
-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++            raise ValueError(
-++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++            )
-++++++
-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++
-++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++            self.head_dim,
-++++++            max_position_embeddings=self.max_position_embeddings,
-++++++            base=self.rope_theta,
-++++++        )
-++++++
-++++++    def forward(
-++++++        self,
-++++++        hidden_states: mindspore.Tensor,
-++++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++++        position_ids: Optional[mindspore.Tensor] = None,
-++++++        past_key_value: Optional[Cache] = None,
-++++++        output_attentions: bool = False,
-++++++        use_cache: bool = False,
-++++++        cache_position: Optional[mindspore.Tensor] = None,
-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++        bsz, q_len, _ = hidden_states.shape
-++++++
-++++++        # 1. 线性投射 Q, K, V
-++++++        query_states = self.q_proj(hidden_states)
-++++++        key_states = self.k_proj(hidden_states)
-++++++        value_states = self.v_proj(hidden_states)
-++++++
-++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++        # 3. RoPE 旋转位置编码
-++++++        kv_seq_len = key_states.shape[-2]
-++++++        if past_key_value is not None:
-++++++            if self.layer_idx is None:
-++++++                raise ValueError(
-++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++                    "with a layer index."
-++++++                )
-++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++                if cache_position.shape[0] == 1:
-++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++                    kv_seq_len = past_seen_tokens + 1
-++++++                else:
-++++++                    # prefill 阶段：cache_position 是范围，使用其长度
-++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++            else:
-++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++        # 4. KV 缓存更新
-++++++        if past_key_value is not None:
-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++            key_states, value_states = past_key_value.update(
-++++++                key_states, value_states, self.layer_idx, cache_kwargs
-++++++            )
-++++++            
-++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++                if cache_position.shape[0] == 1:
-++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++                    kv_seq_len = key_states.shape[-2]
-++++++
-++++++        # 5. [重要] 准备 Attention Mask
-++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++        fa_attention_mask = None
-++++++        if attention_mask is not None:
-++++++            # 截取与当前key长度匹配的部分
-++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++            fa_attention_mask = (mask_slice != 0)
-++++++
-++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++        input_dtype = query_states.dtype
-++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++            query_states = query_states.to(mindspore.float16)
-++++++            key_states = key_states.to(mindspore.float16)
-++++++            value_states = value_states.to(mindspore.float16)
-++++++
-++++++        # 6. [核心] 调用 flash_attention_score 算子
-++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++        attn_output = mindspore.ops.flash_attention_score(
-++++++            query=query_states,
-++++++            key=key_states,
-++++++            value=value_states,
-++++++            head_num=self.num_heads, # 传入Q的头数(N1)
-++++++            attn_mask=fa_attention_mask,
-++++++            keep_prob=1.0 - self.attention_dropout,
-++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++            input_layout="BNSD",
-++++++            sparse_mode=0 # 使用 defaultMask 模式
-++++++        )
-++++++
-++++++        # 恢复原始数据类型
-++++++        attn_output = attn_output.to(input_dtype)
-++++++
-++++++        # 7. 调整输出形状
-++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++        attn_output = self.o_proj(attn_output)
-++++++
-++++++        # FlashAttention 算子不直接返回注意力权重矩阵
-++++++        attn_weights = None
-++++++        if output_attentions:
-++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++        return attn_output, attn_weights, past_key_value
-++++++
-++++++    # def forward(
-++++++    #     self,
-++++++    #     hidden_states: mindspore.Tensor,
-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++    #     past_key_value: Optional[Cache] = None,
-++++++    #     output_attentions: bool = False,
-++++++    #     use_cache: bool = False,
-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++
-++++++    #     # 1. 线性投射 Q, K, V
-++++++    #     query_states = self.q_proj(hidden_states)
-++++++    #     key_states = self.k_proj(hidden_states)
-++++++    #     value_states = self.v_proj(hidden_states)
-++++++
-++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++    #     # 3. RoPE 旋转位置编码
-++++++    #     kv_seq_len = key_states.shape[-2]
-++++++    #     if past_key_value is not None:
-++++++    #         if self.layer_idx is None:
-++++++    #             raise ValueError(
-++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++    #                 "with a layer index."
-++++++    #             )
-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++    #     # 4. KV 缓存更新
-++++++    #     if past_key_value is not None:
-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++    #         key_states, value_states = past_key_value.update(
-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++    #         )
-++++++
-++++++    #     # 5. 准备 Attention Mask
-++++++    #     fa_attention_mask = None
-++++++    #     if attention_mask is not None:
-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++
-++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++    #     input_dtype = query_states.dtype
-++++++
-++++++    #     # 6. [核心] 调用 flash_attention_score 算子
-++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++    #         query=query_states,
-++++++    #         key=key_states,
-++++++    #         value=value_states,
-++++++    #         head_num=self.num_heads,
-++++++    #         attn_mask=fa_attention_mask,
-++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++    #         input_layout="BNSD",
-++++++    #         sparse_mode=0,
-++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++    #         inner_precise=1
-++++++    #     )
-++++++
-++++++    #     # 恢复原始数据类型
-++++++    #     attn_output = attn_output.to(input_dtype)
-++++++
-++++++    #     # 7. 调整输出形状
-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++    #     attn_output = self.o_proj(attn_output)
-++++++
-++++++    #     attn_weights = None
-++++++    #     if output_attentions:
-++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++    #     return attn_output, attn_weights, past_key_value
-++++++
-++++++    # def forward(
-++++++    #     self,
-++++++    #     hidden_states: mindspore.Tensor,
-++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++    #     past_key_value: Optional[Cache] = None,
-++++++    #     output_attentions: bool = False,
-++++++    #     use_cache: bool = False,
-++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++
-++++++    #     query_states = self.q_proj(hidden_states)
-++++++    #     key_states = self.k_proj(hidden_states)
-++++++    #     value_states = self.v_proj(hidden_states)
-++++++
-++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++    #     kv_seq_len = key_states.shape[-2]
-++++++    #     if past_key_value is not None:
-++++++    #         if self.layer_idx is None:
-++++++    #             raise ValueError("`layer_idx` must be specified for caching")
-++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++    #     if past_key_value is not None:
-++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++    #         key_states, value_states = past_key_value.update(
-++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++    #         )
-++++++
-++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++
-++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++++    #     query_states = query_states / math.sqrt(self.head_dim)
-++++++    #     # <--- 修改结束 ---
-++++++
-++++++    #     fa_attention_mask = None
-++++++    #     if attention_mask is not None:
-++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++
-++++++    #     input_dtype = query_states.dtype
-++++++
-++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++    #         query=query_states,    # 传入已经预先缩放过的 query
-++++++    #         key=key_states,
-++++++    #         value=value_states,
-++++++    #         head_num=self.num_heads,
-++++++    #         attn_mask=fa_attention_mask,
-++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++++    #         input_layout="BNSD",
-++++++    #         sparse_mode=0,
-++++++    #         inner_precise=1        # 仍然保持内部高精度计算
-++++++    #     )
-++++++
-++++++    #     attn_output = attn_output.to(input_dtype)
-++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++    #     attn_output = self.o_proj(attn_output)
-++++++
-++++++    #     attn_weights = None
-++++++    #     if output_attentions:
-++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++++
-++++++    #     return attn_output, attn_weights, past_key_value
-++++++
-+++++ QWEN2MOE_ATTENTION_CLASSES = {
-+++++     "eager": Qwen2MoeAttention,
-++++++    "flash-attention": Qwen2MoeFlashAttention,
-+++++ }
-+++++ 
-+++++ 
-+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++ 
-++++++    #@dwj
-++++++    # 只遍历激活的专家，而非全部专家
-+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-+++++-        # router_logits: (batch * sequence_length, n_experts)
-+++++-        router_logits = self.gate(hidden_states)
-+++++-
-+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-        if self.norm_topk_prob:
-+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        # we cast back to the input dtype
-+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-
-+++++-        final_hidden_states = ops.zeros(
-+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++++-        )
-+++++-
-+++++-        # One hot encode the selected experts to create an expert mask
-+++++-        # this will be used to easily index which expert is going to be sollicitated
-+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++++-
-+++++-        # Loop over all available experts in the model and perform the computation on each expert
-+++++-        for expert_idx in range(self.num_experts):
-+++++-            expert_layer = self.experts[expert_idx]
-+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++++-
-+++++-            # Index the correct hidden states and compute the expert hidden state for
-+++++-            # the current expert. We need to make sure to multiply the output hidden
-+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++++-            if 0 not in idx.shape:
-+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++++-
-+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-+++++-                # the `top_x` tensor here.
-+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++++-
-+++++-        shared_expert_output = self.shared_expert(hidden_states)
-+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++++-
-+++++-        final_hidden_states = final_hidden_states + shared_expert_output
-++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++            num_tokens = hidden_states_reshaped.shape[0]
-++++++
-++++++            router_logits = self.gate(hidden_states_reshaped)
-++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++
-++++++            if self.norm_topk_prob:
-++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++            routing_weights = routing_weights.to(hidden_states.dtype)
-++++++            
-++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++            flat_selected_experts = selected_experts.flatten()
-++++++            
-++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++            token_indices = broadcasted_token_indices.flatten()
-++++++            
-++++++            active_experts = ops.unique(flat_selected_experts)
-++++++            
-++++++            for expert_idx_tensor in active_experts:
-++++++                expert_idx = expert_idx_tensor.item()
-++++++                expert_layer = self.experts[expert_idx]
-++++++                
-++++++                mask = (flat_selected_experts == expert_idx_tensor)
-++++++                selected_token_indices = token_indices[mask]
-++++++                selected_routing_weights = routing_weights.flatten()[mask]
-++++++                
-++++++                current_states = hidden_states_reshaped[selected_token_indices]
-++++++                
-++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++                
-++++++                final_hidden_states = final_hidden_states.index_add(
-++++++                    dim=0,
-++++++                    index=selected_token_indices,
-++++++                    source=expert_output.to(hidden_states.dtype)
-++++++                )
-++++++            
-++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++ 
-+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++-        return final_hidden_states, router_logits
-++++++            final_hidden_states = final_hidden_states + shared_expert_output
-++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++            
-++++++            return final_hidden_states, router_logits
-+++++ 
-+++++ 
-+++++ class Qwen2MoeDecoderLayer(nn.Module):
-+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++++ 
-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++ 
-++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++
-+++++         if (layer_idx not in config.mlp_only_layers) and (
-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++++         ):
-+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++++     _skip_keys_device_placement = "past_key_values"
-+++++     _supports_cache_class = True
-++++++#lwx
-++++++    # _supports_static_cache = True
-+++++ 
-+++++     def _init_weights(self, module):
-+++++         std = self.config.initializer_range
-+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++++         return causal_mask
-+++++ 
-+++++ 
-+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++     _tied_weights_keys = ["lm_head.weight"]
-+++++ 
-+++++     def __init__(self, config):
-+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++         self.num_experts_per_tok = config.num_experts_per_tok
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-++++++        # @lwx
-++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++++        #     self.generation_config.cache_implementation = "static"
-++++++        self._warmed_up = False
-++++++
-++++++    def warmup_moe_model(self):
-++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++++        test_texts = [
-++++++            "warmup short",
-++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++++        ]
-++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++        if tokenizer is None:
-++++++            from mindnlp.transformers import AutoTokenizer
-++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++            self._warmup_tokenizer = tokenizer
-++++++
-++++++        for text in test_texts:
-++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++            with mindspore._no_grad():
-++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++++ 
-+++++     def get_input_embeddings(self):
-+++++         return self.model.embed_tokens
-+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++         ```"""
-++++++        if not self._warmed_up:
-++++++            self._warmed_up = True
-++++++            self.warmup_moe_model()
-+++++ 
-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++++         output_router_logits = (
-+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++             }
-+++++         )
-+++++         return model_inputs
-++++++# @lwx
-++++++    # def _decode_one_tokens_logits(
-++++++    #     self,
-++++++    #     cur_token: mindspore.Tensor,
-++++++    #     input_pos: Optional[mindspore.Tensor],
-++++++    #     cache_position: mindspore.Tensor,
-++++++    #     past_key_values: StaticCache,
-++++++    # ) -> mindspore.Tensor:
-++++++    #     """
-++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++++        
-++++++    #     Args:
-++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++++    #         input_pos: 输入位置信息，可选
-++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++++            
-++++++    #     Returns:
-++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++++    #     """
-++++++    #     # 调用JIT编译的版本
-++++++    #     return self.get_decode_one_tokens_logits(
-++++++    #         cur_token=cur_token,
-++++++    #         input_pos=input_pos,
-++++++    #         cache_position=cache_position,
-++++++    #         past_key_values=past_key_values,
-++++++    #     )
-++++++    
-++++++    # @mindspore.jit(jit_level='O1')
-++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++++    #     """
-++++++    #     JIT编译的函数，用于高效的单token解码
-++++++    #     使用JIT编译优化以支持静态shape和高效执行
-++++++        
-++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++++    #     """
-++++++    #     outputs = self.model.forward(
-++++++    #         input_ids=cur_token,
-++++++    #         position_ids=input_pos,
-++++++    #         cache_position=cache_position,
-++++++    #         past_key_values=past_key_values,
-++++++    #         use_cache=True,
-++++++    #         return_dict=False,
-++++++    #     )
-++++++        
-++++++    #     hidden_states = outputs[0]
-++++++    #     logits = self.lm_head.forward(hidden_states)
-++++++    #     logits = logits.float()
-++++++        
-++++++    #     return logits[:, -1, :]
-++++++
-++++++    # def _sample(
-++++++    #     self,
-++++++    #     input_ids: mindspore.Tensor,
-++++++    #     logits_processor,
-++++++    #     stopping_criteria,
-++++++    #     generation_config,
-++++++    #     synced_devices: bool,
-++++++    #     streamer=None,
-++++++    #     logits_warper=None,
-++++++    #     **model_kwargs,
-++++++    # ):
-++++++    #     """
-++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++++    #     """
-++++++    #     from ...generation.logits_process import LogitsProcessorList
-++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++++    #     from mindnlp.core import nn, ops, no_grad
-++++++    #     import numpy as np
-++++++        
-++++++    #     # 检查是否使用 StaticCache
-++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++++    #     # 否则，直接调用父类方法
-++++++    #     past_key_values = model_kwargs.get("past_key_values")
-++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++++
-++++++    #     if not isinstance(past_key_values, StaticCache):
-++++++    #         # 不使用 StaticCache，直接调用父类方法
-++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++++    #         return super()._sample(
-++++++    #             input_ids=input_ids,
-++++++    #             logits_processor=logits_processor,
-++++++    #             stopping_criteria=stopping_criteria,
-++++++    #             generation_config=generation_config,
-++++++    #             synced_devices=synced_devices,
-++++++    #             streamer=streamer,
-++++++    #             logits_warper=logits_warper,
-++++++    #             **model_kwargs,
-++++++    #         )
-++++++        
-++++++    #     # 使用 StaticCache，进入自定义循环
-++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++++    #     pad_token_id = generation_config._pad_token_tensor
-++++++    #     output_attentions = generation_config.output_attentions
-++++++    #     output_hidden_states = generation_config.output_hidden_states
-++++++    #     output_scores = generation_config.output_scores
-++++++    #     output_logits = generation_config.output_logits
-++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++++    #     max_length = generation_config.max_length
-++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++++    #     do_sample = generation_config.do_sample
-++++++        
-++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++++    #         raise ValueError(
-++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++++    #             f"{logits_warper})."
-++++++    #         )
-++++++        
-++++++    #     # init attention / hidden states / scores tuples
-++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++++        
-++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++++    #         encoder_hidden_states = (
-++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++++    #         )
-++++++        
-++++++    #     # keep track of which sequences are already finished
-++++++    #     batch_size, cur_len = input_ids.shape
-++++++    #     this_peer_finished = False
-++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++++        
-++++++    #     time_record = []
-++++++    #     from ....utils.testing_utils import parse_flag_from_env
-++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++++        
-++++++    #     while self._has_unfinished_sequences(
-++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++++    #     ):
-++++++    #         if _record_time:
-++++++    #             import time as time_module
-++++++    #             infer_start = time_module.time()
-++++++            
-++++++    #         # prepare model inputs
-++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++++            
-++++++    #         # prepare variable output controls
-++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++++            
-++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++++    #         cur_cache_position = model_inputs.get("cache_position")
-++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++++    #         cur_input_ids = model_inputs.get("input_ids")
-++++++            
-++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++++    #             cur_cache_position is not None and 
-++++++    #             len(cur_cache_position.shape) > 0 and
-++++++    #             cur_cache_position.shape[0] == 1 and
-++++++    #             cur_input_ids is not None and
-++++++    #             cur_input_ids.shape[1] == 1):
-++++++    #             # 使用 JIT 优化的单 token 解码
-++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++++    #             if not hasattr(self, '_jit_used'):
-++++++    #                 self._jit_used = False
-++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++++                
-++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++++    #                 cur_token=cur_input_ids,
-++++++    #                 input_pos=model_inputs.get("position_ids"),
-++++++    #                 cache_position=cur_cache_position,
-++++++    #                 past_key_values=cur_past_key_values,
-++++++    #             )
-++++++                
-++++++    #             # 标记已使用JIT（用于后续判断）
-++++++    #             if not self._jit_used:
-++++++    #                 self._jit_used = True
-++++++                
-++++++    #             # 构造兼容的输出对象
-++++++    #             class JitOptimizedOutput:
-++++++    #                 def __init__(self, logits, config):
-++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++++    #                     self.config = config
-++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++++    #                     self.cross_attentions = None
-++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++++                
-++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++++    #         else:
-++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++++    #             outputs = self(**model_inputs, return_dict=True)
-++++++            
-++++++    #         if synced_devices and this_peer_finished:
-++++++    #             continue
-++++++            
-++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++++    #         next_token_logits = outputs.logits[:, -1, :]
-++++++            
-++++++    #         # pre-process distribution
-++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++++    #         if do_sample:
-++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++++            
-++++++    #         # Store scores, attentions and hidden_states when required
-++++++    #         if return_dict_in_generate:
-++++++    #             if output_scores:
-++++++    #                 scores += (next_token_scores,)
-++++++    #             if output_logits:
-++++++    #                 raw_logits += (next_token_logits,)
-++++++    #             if output_attentions:
-++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++++    #                 if self.config.is_encoder_decoder:
-++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++++                
-++++++    #             if output_hidden_states:
-++++++    #                 hidden = (
-++++++    #                     outputs.decoder_hidden_states
-++++++    #                     if self.config.is_encoder_decoder
-++++++    #                     else outputs.hidden_states
-++++++    #                 )
-++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++++            
-++++++    #         # token selection
-++++++    #         if do_sample:
-++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++++    #         else:
-++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++++            
-++++++    #         # finished sentences should have their next token be a padding token
-++++++    #         if has_eos_stopping_criteria:
-++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++++            
-++++++    #         # update generated ids, model inputs, and length for next step
-++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++++    #         if streamer is not None:
-++++++    #             streamer.put(next_tokens)
-++++++            
-++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++++    #             outputs,
-++++++    #             model_kwargs,
-++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++++    #         )
-++++++            
-++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++++    #         cur_len += 1
-++++++            
-++++++    #         if _record_time:
-++++++    #             import time as time_module
-++++++    #             infer_stop = time_module.time()
-++++++    #             time_record.append(infer_stop - infer_start)
-++++++            
-++++++    #         del outputs
-++++++        
-++++++    #     average_infer_time = None
-++++++    #     if time_record:
-++++++    #         if len(time_record) > 1:
-++++++    #             time_record.pop(0)
-++++++    #         average_infer_time = sum(time_record) / len(time_record)
-++++++    #         print(f'average inference time is: {average_infer_time}')
-++++++    #         print(f'inference time record: {time_record}')
-++++++        
-++++++    #     if streamer is not None:
-++++++    #         streamer.end()
-++++++        
-++++++    #     # 简单判断：打印是否使用了JIT路径
-++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++++    #     else:
-++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++++        
-++++++    #     if return_dict_in_generate:
-++++++    #         if self.config.is_encoder_decoder:
-++++++    #             return GenerateEncoderDecoderOutput(
-++++++    #                 sequences=input_ids,
-++++++    #                 scores=scores,
-++++++    #                 logits=raw_logits,
-++++++    #                 encoder_attentions=encoder_attentions,
-++++++    #                 encoder_hidden_states=encoder_hidden_states,
-++++++    #                 decoder_attentions=decoder_attentions,
-++++++    #                 cross_attentions=cross_attentions,
-++++++    #                 decoder_hidden_states=decoder_hidden_states,
-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++    #                 average_infer_time=average_infer_time
-++++++    #             )
-++++++    #         else:
-++++++    #             return GenerateDecoderOnlyOutput(
-++++++    #                 sequences=input_ids,
-++++++    #                 scores=scores,
-++++++    #                 logits=raw_logits,
-++++++    #                 attentions=decoder_attentions,
-++++++    #                 hidden_states=decoder_hidden_states,
-++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++    #                 average_infer_time=average_infer_time
-++++++    #             )
-++++++    #     else:
-++++++    #         return input_ids
-++++++            
-++++++    # def _prepare_cache_for_generation(
-++++++    #     self,
-++++++    #     generation_config,
-++++++    #     model_kwargs,
-++++++    #     assistant_model,
-++++++    #     batch_size,
-++++++    #     max_cache_length,
-++++++    # ):
-++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++++    #         generation_config.cache_implementation = "static"
-++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++++        
-++++++    #     if generation_config.cache_implementation == "static":
-++++++    #         base_required_from_max_length = generation_config.max_length + 1
-++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++++    #         min_cache_size = 50
-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++++    #         else:
-++++++    #             max_cache_length = max(base_required, min_cache_size)
-++++++            
-++++++    #         original_max_cache_length = max_cache_length
-++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++++            
-++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++    #             if max_cache_length > self.config.max_position_embeddings:
-++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++++        
-++++++    #     result = super()._prepare_cache_for_generation(
-++++++    #         generation_config=generation_config,
-++++++    #         model_kwargs=model_kwargs,
-++++++    #         assistant_model=assistant_model,
-++++++    #         batch_size=batch_size,
-++++++    #         max_cache_length=max_cache_length,
-++++++    #     )
-++++++        
-++++++    #     if generation_config.cache_implementation == "static":
-++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++++    #         created_cache = model_kwargs.get(cache_name)
-++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++++    #             if created_cache.max_cache_len < generation_config.max_length:
-++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++++        
-++++++    #     return result
-++++++
-++++++
-++++++
-+++++ 
-+++++ 
-+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++++-- 
-+++++2.27.0
-+++++
-++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++++new file mode 100644
-++++index 00000000..22b65dd5
-++++--- /dev/null
-+++++++ b/patches/0002-20251106commit.patch
-++++@@ -0,0 +1,3200 @@
-+++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++Date: Thu, 6 Nov 2025 09:20:38 +0800
-+++++Subject: [PATCH 2/3] 20251106commit
-+++++
-+++++---
-+++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-+++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-+++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
-+++++ create mode 100644 patches/0001-20251104commit.patch
-+++++
-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++index d8303e45..73773c22 100644
-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-+++++         #     y = y + self.shared_experts(identity)
-+++++         # return y
-+++++         
-++++++    # @no_grad()
-++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++
-++++++    #     expert_cache = ops.zeros_like(x)
-++++++    #     for i in range(self.num_experts_per_tok):
-++++++    #         expert_id = flat_expert_indices[i].item()
-++++++    #         weight = flat_expert_weights[i].item()
-++++++    #         expert = self.experts[expert_id]
-++++++    #         expert_out = expert(x)
-++++++    #         expert_cache += expert_out * weight
-++++++    #     return expert_cache
-++++++
-+++++     @no_grad()
-+++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++        # x 的 shape: (1, hidden_size)
-++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++++++
-++++++        # 1. 收集所有需要的专家层
-++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++++++
-++++++        # 2. 并行计算所有专家的输出
-++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
-++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++++++
-++++++        # 3. 使用矩阵乘法进行加权求和
-++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
-++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++++++        
-++++++        return final_output
-+++++ 
-+++++-        expert_cache = ops.zeros_like(x)
-+++++-        for i in range(self.num_experts_per_tok):
-+++++-            expert_id = flat_expert_indices[i].item()
-+++++-            weight = flat_expert_weights[i].item()
-+++++-            expert = self.experts[expert_id]
-+++++-            expert_out = expert(x)
-+++++-            expert_cache += expert_out * weight
-+++++-        return expert_cache
-+++++ 
-+++++     @no_grad()
-+++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-+++++             key_states = self.k_proj(hidden_states)
-+++++             value_states = self.v_proj(hidden_states)
-+++++ 
-+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++        # @lwx
-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+++++ 
-+++++         kv_seq_len = key_states.shape[-2]
-+++++         if past_key_value is not None:
-+++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++ 
-++++++# class DeepseekFlashAttention(nn.Module):
-++++++#     """
-++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-++++++
-++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
-++++++#     """
-++++++
-++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++++++#         super().__init__()
-++++++#         self.config = config
-++++++#         self.layer_idx = layer_idx
-++++++#         if layer_idx is None:
-++++++#             logger.warning(
-++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++++#                 "when creating this class."
-++++++#             )
-++++++
-++++++#         self.attention_dropout = config.attention_dropout
-++++++#         self.hidden_size = config.hidden_size
-++++++#         self.num_heads = config.num_attention_heads
-++++++#         self.head_dim = self.hidden_size // self.num_heads
-++++++#         self.num_key_value_heads = config.num_key_value_heads
-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++#         self.max_position_embeddings = config.max_position_embeddings
-++++++#         self.rope_theta = config.rope_theta
-++++++#         self.is_causal = True
-++++++
-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++#             raise ValueError(
-++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++++#                 f" and `num_heads`: {self.num_heads})."
-++++++#             )
-++++++
-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++++++#         self._init_rope()
-++++++
-++++++#     def _init_rope(self):
-++++++#         if self.config.rope_scaling is None:
-++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
-++++++#                 self.head_dim,
-++++++#                 max_position_embeddings=self.max_position_embeddings,
-++++++#                 base=self.rope_theta,
-++++++#             )
-++++++#         else:
-++++++#             scaling_type = self.config.rope_scaling["type"]
-++++++#             scaling_factor = self.config.rope_scaling["factor"]
-++++++#             if scaling_type == "linear":
-++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++++++#                     self.head_dim,
-++++++#                     max_position_embeddings=self.max_position_embeddings,
-++++++#                     scaling_factor=scaling_factor,
-++++++#                     base=self.rope_theta,
-++++++#                 )
-++++++#             elif scaling_type == "dynamic":
-++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++++++#                     self.head_dim,
-++++++#                     max_position_embeddings=self.max_position_embeddings,
-++++++#                     scaling_factor=scaling_factor,
-++++++#                     base=self.rope_theta,
-++++++#                 )
-++++++#             else:
-++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++++++
-++++++#     def forward(
-++++++#         self,
-++++++#         hidden_states: mindspore.Tensor,
-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++++#         past_key_value: Optional[Cache] = None,
-++++++#         output_attentions: bool = False,
-++++++#         use_cache: bool = False,
-++++++#         **kwargs,
-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++#         if "padding_mask" in kwargs:
-++++++#             warnings.warn(
-++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++++++#             )
-++++++        
-++++++#         if output_attentions:
-++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-++++++
-++++++#         bsz, q_len, _ = hidden_states.shape
-++++++
-++++++#         if self.config.pretraining_tp > 1:
-++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++++++
-++++++#         query_states = self.q_proj(hidden_states)
-++++++#         key_states = self.k_proj(hidden_states)
-++++++#         value_states = self.v_proj(hidden_states)
-++++++
-++++++#         # Reshape for multi-head attention
-++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++#         kv_seq_len = key_states.shape[-2]
-++++++#         if past_key_value is not None:
-++++++#             if self.layer_idx is None:
-++++++#                 raise ValueError(
-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++#                     "with a layer index."
-++++++#                 )
-++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++        
-++++++#         # Apply Rotary Positional Embedding
-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++#         if past_key_value is not None:
-++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++
-++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++        
-++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++++        
-++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-++++++
-++++++#         # Convert attention_mask for flash_attention_score
-++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-++++++#         if attention_mask is not None:
-++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++++++#                 raise ValueError(
-++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++++++#                 )
-++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-++++++#         else:
-++++++#             attn_mask_for_fa = None
-++++++        
-++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++++++
-++++++#         # Call the fused flash_attention_score operator
-++++++#         attn_output = mindspore.ops.flash_attention_score(
-++++++#             query=query_states_for_fa,
-++++++#             key=key_states_for_fa,
-++++++#             value=value_states_for_fa,
-++++++#             head_num=self.num_heads, # This is N1, the number of query heads
-++++++#             input_layout='BSH',
-++++++#             attn_mask=attn_mask_for_fa,
-++++++#             keep_prob=keep_prob,
-++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++#             sparse_mode=0 # Default mask mode
-++++++#         )
-++++++        
-++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-++++++#         attn_output = self.o_proj(attn_output)
-++++++        
-++++++#         # Flash Attention does not return attention weights
-++++++#         attn_weights = None
-++++++
-++++++#         return attn_output, attn_weights, past_key_value
-++++++
-++++++class DeepseekFlashAttention(nn.Module):
-++++++    """
-++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-++++++    designed for high performance on supported hardware (Ascend).
-++++++
-++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-++++++    """
-++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-++++++        super().__init__()
-++++++        self.config = config
-++++++        self.layer_idx = layer_idx
-++++++        if layer_idx is None:
-++++++            logger.warning(
-++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++++                "when creating this class."
-++++++            )
-++++++
-++++++        # --- [FIX] Correctly initialize all required attributes ---
-++++++        self.attention_dropout = config.attention_dropout
-++++++        self.hidden_size = config.hidden_size
-++++++        self.num_heads = config.num_attention_heads
-++++++        self.head_dim = self.hidden_size // self.num_heads
-++++++        self.num_key_value_heads = config.num_key_value_heads
-++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++        self.max_position_embeddings = config.max_position_embeddings
-++++++        self.rope_theta = config.rope_theta
-++++++        self.is_causal = True
-++++++
-++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++            raise ValueError(
-++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++++                f" and `num_heads`: {self.num_heads})."
-++++++            )
-++++++
-++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-++++++        
-++++++        # This call will now succeed as all attributes are initialized.
-++++++        self._init_rope()
-++++++
-++++++    def _init_rope(self):
-++++++        if self.config.rope_scaling is None:
-++++++            self.rotary_emb = DeepseekRotaryEmbedding(
-++++++                self.head_dim,
-++++++                max_position_embeddings=self.max_position_embeddings,
-++++++                base=self.rope_theta,
-++++++            )
-++++++        else:
-++++++            scaling_type = self.config.rope_scaling["type"]
-++++++            scaling_factor = self.config.rope_scaling["factor"]
-++++++            if scaling_type == "linear":
-++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-++++++                    self.head_dim,
-++++++                    max_position_embeddings=self.max_position_embeddings,
-++++++                    scaling_factor=scaling_factor,
-++++++                    base=self.rope_theta,
-++++++                )
-++++++            elif scaling_type == "dynamic":
-++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-++++++                    self.head_dim,
-++++++                    max_position_embeddings=self.max_position_embeddings,
-++++++                    scaling_factor=scaling_factor,
-++++++                    base=self.rope_theta,
-++++++                )
-++++++            else:
-++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-++++++
-++++++    def forward(
-++++++        self,
-++++++        hidden_states: mindspore.Tensor,
-++++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++++        position_ids: Optional[mindspore.Tensor] = None,
-++++++        past_key_value: Optional[Cache] = None,
-++++++        output_attentions: bool = False,
-++++++        use_cache: bool = False,
-++++++        **kwargs,
-++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++        if "padding_mask" in kwargs:
-++++++            warnings.warn(
-++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-++++++            )
-++++++        if output_attentions:
-++++++            warnings.warn(
-++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-++++++            )
-++++++
-++++++        bsz, q_len, _ = hidden_states.shape
-++++++
-++++++        if self.config.pretraining_tp > 1:
-++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-++++++
-++++++        query_states = self.q_proj(hidden_states)
-++++++        key_states = self.k_proj(hidden_states)
-++++++        value_states = self.v_proj(hidden_states)
-++++++
-++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++        kv_seq_len = key_states.shape[-2]
-++++++        if past_key_value is not None:
-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++        
-++++++        # Apply Rotary Position Embedding
-++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++        if past_key_value is not None:
-++++++            cache_kwargs = {"sin": sin, "cos": cos}
-++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++
-++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-++++++        # So we must explicitly repeat the KV heads.
-++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++
-++++++        # Convert attention mask for flash_attention_score
-++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-++++++        if attention_mask is not None:
-++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-++++++                 raise ValueError(
-++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-++++++                )
-++++++            attn_mask_for_fa = attention_mask < 0
-++++++        else:
-++++++            attn_mask_for_fa = None
-++++++
-++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-++++++
-++++++        # Call the fused operator using the efficient BNSD layout
-++++++        attn_output = mindspore.ops.flash_attention_score(
-++++++            query=query_states,
-++++++            key=key_states,
-++++++            value=value_states,
-++++++            head_num=self.num_heads,
-++++++            input_layout='BNSD', # Specify the correct layout
-++++++            attn_mask=attn_mask_for_fa,
-++++++            keep_prob=keep_prob,
-++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
-++++++        )
-++++++        
-++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++        
-++++++        # Apply output projection
-++++++        attn_output = self.o_proj(attn_output)
-++++++
-++++++        # Flash attention does not return attention weights, so we return None.
-++++++        attn_weights = None
-++++++
-++++++        return attn_output, attn_weights, past_key_value
-++++++
-+++++ Deepseek_ATTENTION_CLASSES = {
-+++++     "eager": DeepseekAttention,
-++++++    "flash-attention": DeepseekFlashAttention,
-+++++ }
-+++++ 
-+++++ 
-+++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-+++++             config=config, layer_idx=layer_idx
-+++++         )
-+++++ 
-++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-++++++            config=config, layer_idx=layer_idx
-++++++        )
-++++++
-+++++         self.mlp = (
-+++++             DeepseekMoE(config)
-+++++             if (
-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++index d4c6b651..bced285c 100644
-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-+++++ 
-+++++ import mindspore
-+++++ import mindnlp.core.nn.functional as F
-+++++-from mindnlp.core import nn, ops
-++++++from mindnlp.core import nn, ops, no_grad
-+++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-+++++ 
-+++++ from ....common.activations import ACT2FN
-+++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+++++ 
-++++++Long_Prompt = False
-++++++PROMPT_LENGTH_THRESHOLD = 128
-+++++ 
-+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-+++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++ 
-++++++# class Qwen2MoeFlashAttention(nn.Module):
-++++++#     """
-++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++
-++++++#     关键改动:
-++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++#        直接传入原始的 key 和 value 张量效率更高。
-++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++#         super().__init__()
-++++++#         self.config = config
-++++++#         self.layer_idx = layer_idx
-++++++#         self.hidden_size = config.hidden_size
-++++++#         self.num_heads = config.num_attention_heads
-++++++#         self.head_dim = self.hidden_size // self.num_heads
-++++++#         self.num_key_value_heads = config.num_key_value_heads
-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++#         self.max_position_embeddings = config.max_position_embeddings
-++++++#         self.rope_theta = config.rope_theta
-++++++#         self.attention_dropout = config.attention_dropout
-++++++
-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++#             raise ValueError(
-++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++#             )
-++++++
-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++
-++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++#             self.head_dim,
-++++++#             max_position_embeddings=self.max_position_embeddings,
-++++++#             base=self.rope_theta,
-++++++#         )
-++++++
-++++++#     def forward(
-++++++#         self,
-++++++#         hidden_states: mindspore.Tensor,
-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++++#         past_key_value: Optional[Cache] = None,
-++++++#         output_attentions: bool = False,
-++++++#         use_cache: bool = False,
-++++++#         cache_position: Optional[mindspore.Tensor] = None,
-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++#         bsz, q_len, _ = hidden_states.shape
-++++++
-++++++#         # 1. 线性投射 Q, K, V
-++++++#         query_states = self.q_proj(hidden_states)
-++++++#         key_states = self.k_proj(hidden_states)
-++++++#         value_states = self.v_proj(hidden_states)
-++++++
-++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++#         # 3. RoPE 旋转位置编码
-++++++#         kv_seq_len = key_states.shape[-2]
-++++++#         if past_key_value is not None:
-++++++#             if self.layer_idx is None:
-++++++#                 raise ValueError(
-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++#                     "with a layer index."
-++++++#                 )
-++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++#                 if cache_position.shape[0] == 1:
-++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++#                     kv_seq_len = past_seen_tokens + 1
-++++++#                 else:
-++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
-++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++#             else:
-++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++#         # 4. KV 缓存更新
-++++++#         if past_key_value is not None:
-++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++#             key_states, value_states = past_key_value.update(
-++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
-++++++#             )
-++++++            
-++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++#                 if cache_position.shape[0] == 1:
-++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++#                     kv_seq_len = key_states.shape[-2]
-++++++
-++++++#         # 5. [重要] 准备 Attention Mask
-++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++#         fa_attention_mask = None
-++++++#         if attention_mask is not None:
-++++++#             # 截取与当前key长度匹配的部分
-++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++#             fa_attention_mask = (mask_slice != 0)
-++++++
-++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++#         input_dtype = query_states.dtype
-++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++#             query_states = query_states.to(mindspore.float16)
-++++++#             key_states = key_states.to(mindspore.float16)
-++++++#             value_states = value_states.to(mindspore.float16)
-++++++
-++++++#         # 6. [核心] 调用 flash_attention_score 算子
-++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++#         attn_output = mindspore.ops.flash_attention_score(
-++++++#             query=query_states,
-++++++#             key=key_states,
-++++++#             value=value_states,
-++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
-++++++#             attn_mask=fa_attention_mask,
-++++++#             keep_prob=1.0 - self.attention_dropout,
-++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++#             input_layout="BNSD",
-++++++#             sparse_mode=0 # 使用 defaultMask 模式
-++++++#         )
-++++++
-++++++#         # 恢复原始数据类型
-++++++#         attn_output = attn_output.to(input_dtype)
-++++++
-++++++#         # 7. 调整输出形状
-++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++#         attn_output = self.o_proj(attn_output)
-++++++
-++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
-++++++#         attn_weights = None
-++++++#         if output_attentions:
-++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++#         return attn_output, attn_weights, past_key_value
-++++++
-++++++#     # def forward(
-++++++#     #     self,
-++++++#     #     hidden_states: mindspore.Tensor,
-++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
-++++++#     #     past_key_value: Optional[Cache] = None,
-++++++#     #     output_attentions: bool = False,
-++++++#     #     use_cache: bool = False,
-++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
-++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++#     #     bsz, q_len, _ = hidden_states.shape
-++++++
-++++++#     #     # 1. 线性投射 Q, K, V
-++++++#     #     query_states = self.q_proj(hidden_states)
-++++++#     #     key_states = self.k_proj(hidden_states)
-++++++#     #     value_states = self.v_proj(hidden_states)
-++++++
-++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++
-++++++#     #     # 3. RoPE 旋转位置编码
-++++++#     #     kv_seq_len = key_states.shape[-2]
-++++++#     #     if past_key_value is not None:
-++++++#     #         if self.layer_idx is None:
-++++++#     #             raise ValueError(
-++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++#     #                 "with a layer index."
-++++++#     #             )
-++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++
-++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++#     #     # 4. KV 缓存更新
-++++++#     #     if past_key_value is not None:
-++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++#     #         key_states, value_states = past_key_value.update(
-++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++#     #         )
-++++++
-++++++#     #     # 5. 准备 Attention Mask
-++++++#     #     fa_attention_mask = None
-++++++#     #     if attention_mask is not None:
-++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++#     #         fa_attention_mask = (mask_slice != 0)
-++++++
-++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++#     #     input_dtype = query_states.dtype
-++++++
-++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
-++++++#     #     attn_output = mindspore.ops.flash_attention_score(
-++++++#     #         query=query_states,
-++++++#     #         key=key_states,
-++++++#     #         value=value_states,
-++++++#     #         head_num=self.num_heads,
-++++++#     #         attn_mask=fa_attention_mask,
-++++++#     #         keep_prob=1.0 - self.attention_dropout,
-++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++#     #         input_layout="BNSD",
-++++++#     #         sparse_mode=0,
-++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++#     #         inner_precise=1
-++++++#     #     )
-++++++
-++++++#     #     # 恢复原始数据类型
-++++++#     #     attn_output = attn_output.to(input_dtype)
-++++++
-++++++#     #     # 7. 调整输出形状
-++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++#     #     attn_output = self.o_proj(attn_output)
-++++++
-++++++#     #     attn_weights = None
-++++++#     #     if output_attentions:
-++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++
-++++++#     #     return attn_output, attn_weights, past_key_value
-++++++
-++++++
-+++++ class Qwen2MoeFlashAttention(nn.Module):
-+++++     """
-+++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++-
-+++++-    关键改动:
-+++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++-       直接传入原始的 key 和 value 张量效率更高。
-+++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-++++++
-++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
-++++++    以达到理论上的最高执行速度。
-+++++     """
-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++         super().__init__()
-+++++         self.config = config
-+++++         self.layer_idx = layer_idx
-++++++        if layer_idx is None:
-++++++            logger.warning_once(
-++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-++++++            )
-++++++
-+++++         self.hidden_size = config.hidden_size
-+++++         self.num_heads = config.num_attention_heads
-+++++         self.head_dim = self.hidden_size // self.num_heads
-+++++         self.num_key_value_heads = config.num_key_value_heads
-+++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++         self.max_position_embeddings = config.max_position_embeddings
-+++++         self.rope_theta = config.rope_theta
-+++++         self.attention_dropout = config.attention_dropout
-+++++ 
-+++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++-            raise ValueError(
-+++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++-            )
-+++++-
-+++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-+++++         key_states = self.k_proj(hidden_states)
-+++++         value_states = self.v_proj(hidden_states)
-+++++ 
-+++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++        # 2. 调整形状以匹配 BNSD 布局
-+++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-
-+++++-        # 3. RoPE 旋转位置编码
-++++++        
-++++++        # 3. RoPE 和 KV 缓存
-+++++         kv_seq_len = key_states.shape[-2]
-+++++         if past_key_value is not None:
-+++++-            if self.layer_idx is None:
-+++++-                raise ValueError(
-+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-                    "with a layer index."
-+++++-                )
-+++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++-                if cache_position.shape[0] == 1:
-+++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++-                    kv_seq_len = past_seen_tokens + 1
-+++++-                else:
-+++++-                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++-            else:
-+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-
-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++        
-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++ 
-+++++-        # 4. KV 缓存更新
-+++++         if past_key_value is not None:
-+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-            key_states, value_states = past_key_value.update(
-+++++-                key_states, value_states, self.layer_idx, cache_kwargs
-+++++-            )
-+++++-            
-+++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++-                if cache_position.shape[0] == 1:
-+++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++-                    kv_seq_len = key_states.shape[-2]
-+++++-
-+++++-        # 5. [重要] 准备 Attention Mask
-+++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++
-++++++        # 4. 准备 Attention Mask
-+++++         fa_attention_mask = None
-+++++         if attention_mask is not None:
-+++++-            # 截取与当前key长度匹配的部分
-+++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++             fa_attention_mask = (mask_slice != 0)
-+++++ 
-+++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++-        input_dtype = query_states.dtype
-+++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++-            query_states = query_states.to(mindspore.float16)
-+++++-            key_states = key_states.to(mindspore.float16)
-+++++-            value_states = value_states.to(mindspore.float16)
-+++++-
-+++++-        # 6. [核心] 调用 flash_attention_score 算子
-+++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-+++++         attn_output = mindspore.ops.flash_attention_score(
-+++++             query=query_states,
-+++++             key=key_states,
-+++++             value=value_states,
-+++++-            head_num=self.num_heads, # 传入Q的头数(N1)
-++++++            head_num=self.num_heads,
-+++++             attn_mask=fa_attention_mask,
-+++++-            keep_prob=1.0 - self.attention_dropout,
-++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-+++++             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++             input_layout="BNSD",
-+++++-            sparse_mode=0 # 使用 defaultMask 模式
-++++++            sparse_mode=0,
-++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-+++++         )
-+++++ 
-+++++-        # 恢复原始数据类型
-+++++-        attn_output = attn_output.to(input_dtype)
-+++++-
-+++++-        # 7. 调整输出形状
-+++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++        # 6. 调整输出形状
-+++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++         attn_output = self.o_proj(attn_output)
-+++++ 
-+++++-        # FlashAttention 算子不直接返回注意力权重矩阵
-++++++        # 7. 返回结果
-+++++         attn_weights = None
-+++++         if output_attentions:
-+++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++-    # def forward(
-+++++-    #     self,
-+++++-    #     hidden_states: mindspore.Tensor,
-+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++-    #     past_key_value: Optional[Cache] = None,
-+++++-    #     output_attentions: bool = False,
-+++++-    #     use_cache: bool = False,
-+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-
-+++++-    #     bsz, q_len, _ = hidden_states.shape
-+++++-
-+++++-    #     # 1. 线性投射 Q, K, V
-+++++-    #     query_states = self.q_proj(hidden_states)
-+++++-    #     key_states = self.k_proj(hidden_states)
-+++++-    #     value_states = self.v_proj(hidden_states)
-+++++-
-+++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-
-+++++-    #     # 3. RoPE 旋转位置编码
-+++++-    #     kv_seq_len = key_states.shape[-2]
-+++++-    #     if past_key_value is not None:
-+++++-    #         if self.layer_idx is None:
-+++++-    #             raise ValueError(
-+++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-    #                 "with a layer index."
-+++++-    #             )
-+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++ 
-+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-
-+++++-    #     # 4. KV 缓存更新
-+++++-    #     if past_key_value is not None:
-+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-    #         key_states, value_states = past_key_value.update(
-+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++-    #         )
-+++++-
-+++++-    #     # 5. 准备 Attention Mask
-+++++-    #     fa_attention_mask = None
-+++++-    #     if attention_mask is not None:
-+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-    #         fa_attention_mask = (mask_slice != 0)
-+++++-
-+++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++-    #     input_dtype = query_states.dtype
-+++++-
-+++++-    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++-    #     attn_output = mindspore.ops.flash_attention_score(
-+++++-    #         query=query_states,
-+++++-    #         key=key_states,
-+++++-    #         value=value_states,
-+++++-    #         head_num=self.num_heads,
-+++++-    #         attn_mask=fa_attention_mask,
-+++++-    #         keep_prob=1.0 - self.attention_dropout,
-+++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-    #         input_layout="BNSD",
-+++++-    #         sparse_mode=0,
-+++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++-    #         inner_precise=1
-+++++-    #     )
-+++++-
-+++++-    #     # 恢复原始数据类型
-+++++-    #     attn_output = attn_output.to(input_dtype)
-++++++QWEN2MOE_ATTENTION_CLASSES = {
-++++++    "eager": Qwen2MoeAttention,
-++++++    "flash-attention": Qwen2MoeFlashAttention,
-++++++}
-+++++ 
-+++++-    #     # 7. 调整输出形状
-+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-    #     attn_output = self.o_proj(attn_output)
-+++++ 
-+++++-    #     attn_weights = None
-+++++-    #     if output_attentions:
-+++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     def __init__(self, config):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         # gating
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     #@dwj
-++++++#     # 只遍历激活的专家，而非全部专家
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#             num_tokens = hidden_states_reshaped.shape[0]
-++++++
-++++++#             router_logits = self.gate(hidden_states_reshaped)
-++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++
-++++++#             if self.norm_topk_prob:
-++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
-++++++            
-++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++#             flat_selected_experts = selected_experts.flatten()
-++++++            
-++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++#             token_indices = broadcasted_token_indices.flatten()
-++++++            
-++++++#             active_experts = ops.unique(flat_selected_experts)
-++++++            
-++++++#             for expert_idx_tensor in active_experts:
-++++++#                 expert_idx = expert_idx_tensor.item()
-++++++#                 expert_layer = self.experts[expert_idx]
-++++++                
-++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
-++++++#                 selected_token_indices = token_indices[mask]
-++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
-++++++                
-++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
-++++++                
-++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++                
-++++++#                 final_hidden_states = final_hidden_states.index_add(
-++++++#                     dim=0,
-++++++#                     index=selected_token_indices,
-++++++#                     source=expert_output.to(hidden_states.dtype)
-++++++#                 )
-++++++            
-++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++ 
-+++++-    #     return attn_output, attn_weights, past_key_value
-++++++#             final_hidden_states = final_hidden_states + shared_expert_output
-++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++            
-++++++#             return final_hidden_states, router_logits
-++++++
-++++++
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     """
-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         # 门控网络
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         # 专家列表
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++#         # 共享专家
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_decode(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         """
-++++++#         【解码路径】针对 sequence_length=1 的极致优化。
-++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++++++#         """
-++++++#         batch_size, hidden_dim = hidden_states.shape
-++++++        
-++++++#         expert_outputs_list = [
-++++++#             ops.cat([
-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++#             ], dim=0) 
-++++++#             for i in range(batch_size)
-++++++#         ]
-++++++        
-++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++++++#         # shape: (batch_size, top_k, hidden_dim)
-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++        
-++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++        
-++++++#         return moe_output.squeeze(1)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_prefill(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         """
-++++++#         【预填充路径】针对 sequence_length > 1 的优化。
-++++++#         按专家对 Token 进行分组，并进行批处理。
-++++++#         """
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens = hidden_states.shape[0]
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++        
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++        
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++        
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++            
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++#             selected_token_indices = token_indices[mask]
-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++            
-++++++#             current_states = hidden_states[selected_token_indices]
-++++++            
-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++            
-++++++#             moe_output = moe_output.index_add(
-++++++#                 dim=0,
-++++++#                 index=selected_token_indices,
-++++++#                 source=expert_output.to(hidden_states.dtype)
-++++++#             )
-++++++#         return moe_output
-++++++
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#         """
-++++++#         顶层 forward 方法，作为智能分发器。
-++++++#         """
-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++        
-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#         router_logits = self.gate(hidden_states_reshaped)
-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++ 
-+++++-    # def forward(
-+++++-    #     self,
-+++++-    #     hidden_states: mindspore.Tensor,
-+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++-    #     past_key_value: Optional[Cache] = None,
-+++++-    #     output_attentions: bool = False,
-+++++-    #     use_cache: bool = False,
-+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-
-+++++-    #     bsz, q_len, _ = hidden_states.shape
-+++++-
-+++++-    #     query_states = self.q_proj(hidden_states)
-+++++-    #     key_states = self.k_proj(hidden_states)
-+++++-    #     value_states = self.v_proj(hidden_states)
-+++++-
-+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-
-+++++-    #     kv_seq_len = key_states.shape[-2]
-+++++-    #     if past_key_value is not None:
-+++++-    #         if self.layer_idx is None:
-+++++-    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-
-+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-
-+++++-    #     if past_key_value is not None:
-+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-    #         key_states, value_states = past_key_value.update(
-+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++-    #         )
-++++++#         if self.norm_topk_prob:
-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++        
-++++++#         moe_output = None
-++++++#         # 在推理时，根据序列长度选择最优路径
-++++++#         if not self.training:
-++++++#             if sequence_length == 1:
-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++++#             else:
-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++++#         else:
-++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++++++#             raise NotImplementedError("Training path is not implemented.")
-++++++
-++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++++++        
-++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++++++        
-++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++#         return final_hidden_states, router_logits
-++++++
-++++++
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     """
-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         # 门控网络
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         # 专家列表
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++#         # 共享专家
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_decode(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         batch_size, _ = hidden_states.shape
-++++++#         expert_outputs_list = [
-++++++#             ops.cat([
-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++#             ], dim=0) 
-++++++#             for i in range(batch_size)
-++++++#         ]
-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++#         return moe_output.squeeze(1)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_prefill(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens = hidden_states.shape[0]
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++        
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++#             selected_token_indices = token_indices[mask]
-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++#             current_states = hidden_states[selected_token_indices]
-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++#             moe_output = moe_output.index_add(
-++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++#             )
-++++++#         return moe_output
-++++++
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#         """
-++++++#         顶层 forward 方法，作为智能分发器。
-++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++++++#         """
-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++        
-++++++#         # 1. 门控计算 (通用逻辑)
-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#         router_logits = self.gate(hidden_states_reshaped)
-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++
-++++++#         if self.norm_topk_prob:
-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++        
-++++++#         # 2. 智能分发到最优 MoE 路径
-++++++#         moe_output = None
-++++++#         if not self.training:
-++++++#             if sequence_length == 1:
-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++++#             else:
-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++++#         else:
-++++++#             raise NotImplementedError("Training path is not implemented.")
-++++++
-++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++        
-++++++#         # 4. 合并 MoE 输出和共享专家输出
-++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++        
-++++++#         # 5. 恢复原始形状并返回
-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++#         return final_hidden_states, router_logits
-++++++
-++++++# prefill fastest
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     """
-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         # 门控网络
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         # 专家列表
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++#         # 共享专家
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_dispatch(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         """
-++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++++++#         """
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens, _ = hidden_states.shape
-++++++        
-++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++#         flat_routing_weights = routing_weights.flatten()
-+++++ 
-+++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++-
-+++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++-    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++-    #     # <--- 修改结束 ---
-+++++-
-+++++-    #     fa_attention_mask = None
-+++++-    #     if attention_mask is not None:
-+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-    #         fa_attention_mask = (mask_slice != 0)
-+++++-
-+++++-    #     input_dtype = query_states.dtype
-+++++-
-+++++-    #     attn_output = mindspore.ops.flash_attention_score(
-+++++-    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++-    #         key=key_states,
-+++++-    #         value=value_states,
-+++++-    #         head_num=self.num_heads,
-+++++-    #         attn_mask=fa_attention_mask,
-+++++-    #         keep_prob=1.0 - self.attention_dropout,
-+++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++-    #         input_layout="BNSD",
-+++++-    #         sparse_mode=0,
-+++++-    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++-    #     )
-++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++ 
-+++++-    #     attn_output = attn_output.to(input_dtype)
-+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-    #     attn_output = self.o_proj(attn_output)
-++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++        
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++            
-++++++#             # 找到所有分配给该专家的 token
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++            
-++++++#             # 使用 mask 选取对应的 token 和权重
-++++++#             current_token_indices = token_indices[mask]
-++++++#             current_routing_weights = flat_routing_weights[mask]
-++++++#             current_hidden_states = hidden_states[current_token_indices]
-++++++            
-++++++#             # 对这些 token 进行批处理
-++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++++            
-++++++#             # 使用 index_add 将结果精确地加回到对应位置
-++++++#             moe_output = moe_output.index_add(
-++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++#             )
-++++++#         return moe_output
-++++++
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#         """
-++++++#         顶层 forward 方法，作为智能分发器。
-++++++#         """
-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++        
-++++++#         # 1. 门控计算
-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#         router_logits = self.gate(hidden_states_reshaped)
-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++
-++++++#         if self.norm_topk_prob:
-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++        
-++++++#         # 2. 调用统一的 MoE 计算内核
-++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+++++ 
-+++++-    #     attn_weights = None
-+++++-    #     if output_attentions:
-+++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++++#         # 3. 统一处理共享专家
-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++        
-++++++#         # 4. 合并输出
-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++        
-++++++#         # 5. 恢复原始形状并返回
-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++#         return final_hidden_states, router_logits
-++++++
-++++++
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     """
-++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++#     【最终高性能与高精度版】：
-++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++++++#     3. 这样实现了速度和准确性的两全其美。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_decode(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         """
-++++++#         【解码路径】极致优化版：bmm + 高精度累加。
-++++++#         """
-++++++#         original_dtype = hidden_states.dtype
-++++++#         batch_size, _ = hidden_states.shape
-++++++        
-++++++#         expert_outputs_list = [
-++++++#             ops.cat([
-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++#             ], dim=0) 
-++++++#             for i in range(batch_size)
-++++++#         ]
-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++
-++++++#         # 在 float32 下执行 bmm，得到高精度结果
-++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++        
-++++++#         # 将高精度结果转换回原始数据类型
-++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++++++        
-++++++#         return moe_output
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_prefill(
-++++++#         self, 
-++++++#         hidden_states: mindspore.Tensor, 
-++++++#         selected_experts: mindspore.Tensor, 
-++++++#         routing_weights: mindspore.Tensor
-++++++#     ) -> mindspore.Tensor:
-++++++#         """
-++++++#         【预填充路径】与原始实现一致，结果精确。
-++++++#         """
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens, _ = hidden_states.shape
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++        
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++#             selected_token_indices = token_indices[mask]
-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++#             current_states = hidden_states[selected_token_indices]
-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++#             moe_output = moe_output.index_add(
-++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++#             )
-++++++#         return moe_output
-++++++
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++        
-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#         router_logits = self.gate(hidden_states_reshaped)
-++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++ 
-+++++-    #     return attn_output, attn_weights, past_key_value
-++++++#         if self.norm_topk_prob:
-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++++++#         # 如果模型主体是 float16，后续再转换
-++++++        
-++++++#         moe_output = None
-++++++#         if not self.training:
-++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++++++#             # _moe_infer_decode 内部会处理好类型转换
-++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++++++#             if sequence_length == 1:
-++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++++#             else:
-++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++++#         else:
-++++++#             raise NotImplementedError("Training path is not implemented.")
-++++++
-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++        
-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++#         return final_hidden_states, router_logits
-++++++    
-+++++ 
-+++++-QWEN2MOE_ATTENTION_CLASSES = {
-+++++-    "eager": Qwen2MoeAttention,
-+++++-    "flash-attention": Qwen2MoeFlashAttention,
-+++++-}
-++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++#     """
-++++++#     【融合版】一个混合专家模块，内置两种推理策略，
-++++++#     由外部全局变量 `Long_Prompt` 控制：
-++++++
-++++++#     - if Long_Prompt is True:  【精度优先模式】
-++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++++++#       适用于处理长序列，避免误差累积。
-++++++
-++++++#     - if Long_Prompt is False: 【速度优先模式】
-++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
-++++++#     """
-++++++#     def __init__(self, config: Qwen2MoeConfig):
-++++++#         super().__init__()
-++++++#         self.num_experts = config.num_experts
-++++++#         self.top_k = config.num_experts_per_tok
-++++++#         self.norm_topk_prob = config.norm_topk_prob
-++++++
-++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++#         self.experts = nn.ModuleList(
-++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++#         )
-++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++
-++++++#     # --- 速度优先模式的辅助函数 ---
-++++++#     @no_grad()
-++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++#         original_dtype = hidden_states.dtype
-++++++#         batch_size, _ = hidden_states.shape
-++++++#         expert_outputs_list = [
-++++++#             ops.cat([
-++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++#             ], dim=0) 
-++++++#             for i in range(batch_size)
-++++++#         ]
-++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
-++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++++
-++++++#     @no_grad()
-++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens, _ = hidden_states.shape
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++#             selected_token_indices = token_indices[mask]
-++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++#             current_states = hidden_states[selected_token_indices]
-++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++#         return moe_output
-++++++
-++++++#     # --- 精度优先模式的辅助函数 ---
-++++++#     @no_grad()
-++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++#         moe_output = ops.zeros_like(hidden_states)
-++++++#         num_tokens, _ = hidden_states.shape
-++++++#         flat_selected_experts = selected_experts.flatten()
-++++++#         flat_routing_weights = routing_weights.flatten()
-++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++#         active_experts = ops.unique(flat_selected_experts)
-++++++#         for expert_idx_tensor in active_experts:
-++++++#             expert_idx = expert_idx_tensor.item()
-++++++#             expert_layer = self.experts[expert_idx]
-++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++#             current_token_indices = token_indices[mask]
-++++++#             current_routing_weights = flat_routing_weights[mask]
-++++++#             current_hidden_states = hidden_states[current_token_indices]
-++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++#         return moe_output
-++++++
-++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
-++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++++++#         global Long_Prompt
-++++++
-++++++#         # 1. 门控计算 (所有模式通用)
-++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++#         router_logits = self.gate(hidden_states_reshaped)
-++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++++#         if self.norm_topk_prob:
-++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++#         moe_output = None
-++++++#         if not self.training:
-++++++#             # 根据 Long_Prompt 标志选择模式
-++++++#             if Long_Prompt:
-++++++#                 # --- 精度优先模式 ---
-++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++#             else:
-++++++#                 # --- 速度优先模式 ---
-++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++#                 if sequence_length == 1:
-++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++#                 else:
-++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++#         else:
-++++++#             raise NotImplementedError("Training path is not implemented.")
-++++++
-++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++        
-++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++#         return final_hidden_states, router_logits
-++++++    
-++++++class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++    """
-++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++++++    控制的顶级推理策略：
-+++++ 
-++++++    - if Long_Prompt is True:  【精度优先模式】
-++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-++++++      适用于需要严格可复现性的长序列任务。
-+++++ 
-+++++-class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-    def __init__(self, config):
-++++++    - if Long_Prompt is False: 【速度优先模式】
-++++++      采用业界最强的性能组合：
-++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-++++++    """
-++++++    def __init__(self, config: Qwen2MoeConfig):
-+++++         super().__init__()
-+++++         self.num_experts = config.num_experts
-+++++         self.top_k = config.num_experts_per_tok
-+++++         self.norm_topk_prob = config.norm_topk_prob
-+++++ 
-+++++-        # gating
-+++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++         self.experts = nn.ModuleList(
-+++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++         )
-+++++-
-+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++ 
-+++++-    #@dwj
-+++++-    # 只遍历激活的专家，而非全部专家
-+++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-            num_tokens = hidden_states_reshaped.shape[0]
-+++++-
-+++++-            router_logits = self.gate(hidden_states_reshaped)
-+++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-
-+++++-            if self.norm_topk_prob:
-+++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-            
-+++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++-            flat_selected_experts = selected_experts.flatten()
-+++++-            
-+++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++-            token_indices = broadcasted_token_indices.flatten()
-+++++-            
-+++++-            active_experts = ops.unique(flat_selected_experts)
-+++++-            
-+++++-            for expert_idx_tensor in active_experts:
-+++++-                expert_idx = expert_idx_tensor.item()
-+++++-                expert_layer = self.experts[expert_idx]
-+++++-                
-+++++-                mask = (flat_selected_experts == expert_idx_tensor)
-+++++-                selected_token_indices = token_indices[mask]
-+++++-                selected_routing_weights = routing_weights.flatten()[mask]
-+++++-                
-+++++-                current_states = hidden_states_reshaped[selected_token_indices]
-+++++-                
-+++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-                
-+++++-                final_hidden_states = final_hidden_states.index_add(
-+++++-                    dim=0,
-+++++-                    index=selected_token_indices,
-+++++-                    source=expert_output.to(hidden_states.dtype)
-+++++-                )
-+++++-            
-+++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-++++++    @no_grad()
-++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++        original_dtype = hidden_states.dtype
-++++++        batch_size, _ = hidden_states.shape
-++++++        expert_outputs_list = [
-++++++            ops.cat([
-++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++            ], dim=0)
-++++++            for i in range(batch_size)
-++++++        ]
-++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++        weights_fp32 = routing_weights.to(mindspore.float32)
-++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
-++++++
-++++++    @no_grad()
-++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++        num_tokens, _ = hidden_states.shape
-++++++        flat_selected_experts = selected_experts.flatten()
-++++++        sorted_expert_indices = flat_selected_experts.argsort()
-++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++++        original_token_indices = sorted_expert_indices // self.top_k
-++++++        moe_output = ops.zeros_like(hidden_states)
-++++++        current_token_offset = 0
-++++++        for i in range(self.num_experts):
-++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
-++++++            if expert_token_count == 0:
-++++++                continue
-++++++            end_offset = current_token_offset + expert_token_count
-++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
-++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++            current_token_offset += expert_token_count
-++++++        return moe_output
-++++++
-++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++++++    @no_grad()
-++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++        moe_output = ops.zeros_like(hidden_states)
-++++++        num_tokens, _ = hidden_states.shape
-++++++        flat_selected_experts = selected_experts.flatten()
-++++++        flat_routing_weights = routing_weights.flatten()
-++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++        active_experts = ops.unique(flat_selected_experts)
-++++++        for expert_idx_tensor in active_experts:
-++++++            expert_idx = expert_idx_tensor.item()
-++++++            expert_layer = self.experts[expert_idx]
-++++++            mask = (flat_selected_experts == expert_idx_tensor)
-++++++            current_token_indices = token_indices[mask]
-++++++            current_routing_weights = flat_routing_weights[mask]
-++++++            current_hidden_states = hidden_states[current_token_indices]
-++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++        return moe_output
-+++++ 
-+++++-            final_hidden_states = final_hidden_states + shared_expert_output
-+++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++-            
-+++++-            return final_hidden_states, router_logits
-++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++        global Long_Prompt
-++++++
-++++++        # 1. 门控计算 (所有模式通用)
-++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++        router_logits = self.gate(hidden_states_reshaped)
-++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++++        if self.norm_topk_prob:
-++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++        
-++++++        moe_output = None
-++++++        if Long_Prompt:
-++++++            # --- 精度优先模式 (ACCURACY MODE) ---
-++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        else:
-++++++            # --- 速度优先模式 (SPEED MODE) ---
-++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++            if sequence_length == 1:
-++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++            else:
-++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        
-+++++ 
-++++++        # 3. 共享专家计算与合并 (所有模式通用)
-++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++        
-++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++        
-++++++        return final_hidden_states, router_logits
-+++++ 
-+++++ class Qwen2MoeDecoderLayer(nn.Module):
-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+++++         super().__init__()
-+++++         self.hidden_size = config.hidden_size
-++++++        
-++++++        # if Long_Prompt:
-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++        # else:
-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++ 
-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++ 
-+++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++-
-+++++         if (layer_idx not in config.mlp_only_layers) and (
-+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++++         ):
-+++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++             self._warmed_up = True
-+++++             self.warmup_moe_model()
-+++++ 
-++++++
-++++++
-+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++++         output_router_logits = (
-+++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-+++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++             router_logits=outputs.router_logits,
-+++++         )
-+++++ 
-++++++    def generate(self, *args, **kwargs):
-++++++        """
-++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++++++        """
-++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++++++
-++++++        input_ids = kwargs.get("input_ids")
-++++++        if input_ids is None and args:
-++++++            input_ids = args[0]
-++++++
-++++++        if input_ids is not None:
-++++++            prompt_length = input_ids.shape[1]
-++++++            
-++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++++++                Long_Prompt = True
-++++++            else:
-++++++                Long_Prompt = False
-++++++
-++++++        return super().generate(*args, **kwargs)
-++++++    
-+++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-+++++     def prepare_inputs_for_generation(
-+++++         self,
-+++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-+++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
-+++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-++++++        
-+++++         if past_key_values is not None:
-+++++             if inputs_embeds is not None:  # Exception 1
-+++++                 if 0 not in input_ids.shape:
-+++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++             }
-+++++         )
-+++++         return model_inputs
-++++++
-+++++ # @lwx
-+++++     # def _decode_one_tokens_logits(
-+++++     #     self,
-+++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-+++++             attentions=outputs.attentions,
-+++++         )
-+++++ 
-++++++
-+++++ __all__ = [
-+++++     "Qwen2MoeForCausalLM",
-+++++     "Qwen2MoeModel",
-+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++++new file mode 100644
-+++++index 00000000..6dfb5b93
-+++++--- /dev/null
-++++++++ b/patches/0001-20251104commit.patch
-+++++@@ -0,0 +1,1272 @@
-++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++++From: Pinoeer-kingxi <13022943007@163.com>
-++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++++Subject: [PATCH] 20251104commit
-++++++
-++++++---
-++++++ mindnlp/transformers/cache_utils.py           |  28 +-
-++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++++++
-++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++++index cadd2e04..02f8d4be 100644
-++++++--- a/mindnlp/transformers/cache_utils.py
-+++++++++ b/mindnlp/transformers/cache_utils.py
-++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++++             # k_out[:, :, cache_position] = key_states
-++++++             # v_out[:, :, cache_position] = value_states
-++++++-            if ON_ORANGE_PI:
-++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++-            else:
-++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++-
-+++++++            # if ON_ORANGE_PI:
-+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++++            # else:
-+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++++++            if cache_position.ndim > 1:
-+++++++                cache_position = cache_position.flatten()
-+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++++++                cache_position = cache_position.int()
-+++++++            
-+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++++++            k_out[:, :, cache_position] = key_states
-+++++++            v_out[:, :, cache_position] = value_states
-+++++++            
-++++++         return k_out, v_out
-++++++ 
-++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++index c695b944..d8303e45 100644
-++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++++ def rotate_half(x):
-++++++     """Rotates half the hidden dims of the input."""
-++++++-    x1 = x[..., : x.shape[-1] // 2]
-++++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++     return ops.cat((-x2, x1), dim=-1)
-++++++ 
-++++++ 
-++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++++         if self.training:
-++++++             raise NotImplementedError("Training is not supported yet.")
-++++++         else:
-++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++-        if self.config.n_shared_experts is not None:
-++++++-            y = y + self.shared_experts(identity)
-++++++-        return y
-+++++++            # @lwx
-+++++++            if orig_shape[1] == 1:
-+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++++++                y=y.view(*orig_shape)
-+++++++                if self.config.n_shared_experts is not None:
-+++++++                    y = y + self.shared_experts(identity)
-+++++++                return y
-+++++++            else:
-+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++++++                if self.config.n_shared_experts is not None:
-+++++++                    y = y + self.shared_experts(identity)
-+++++++                return y
-+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++++        # if self.config.n_shared_experts is not None:
-+++++++        #     y = y + self.shared_experts(identity)
-+++++++        # return y
-+++++++        
-+++++++    @no_grad()
-+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++++
-+++++++        expert_cache = ops.zeros_like(x)
-+++++++        for i in range(self.num_experts_per_tok):
-+++++++            expert_id = flat_expert_indices[i].item()
-+++++++            weight = flat_expert_weights[i].item()
-+++++++            expert = self.experts[expert_id]
-+++++++            expert_out = expert(x)
-+++++++            expert_cache += expert_out * weight
-+++++++        return expert_cache
-++++++ 
-++++++     @no_grad()
-++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-        # expert_cache = torch.zeros_like(x)
-++++++-        # idxs = flat_expert_indices.argsort()
-++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++-        # token_idxs = idxs // self.num_experts_per_tok
-++++++-        # for i, end_idx in enumerate(tokens_per_expert):
-++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++-        #     if start_idx == end_idx:
-++++++-        #         continue
-++++++-        #     expert = self.experts[i]
-++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-        #     expert_tokens = x[exp_token_idx]
-++++++-        #     expert_out = expert(expert_tokens)
-++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++-        # return expert_cache
-+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++         expert_cache = ops.zeros_like(x)
-++++++         idxs = flat_expert_indices.argsort()
-++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++         token_idxs = idxs // self.num_experts_per_tok
-+++++++
-++++++         for i, end_idx in enumerate(tokens_per_expert):
-++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++             if start_idx == end_idx:
-++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++++             expert_out = expert(expert_tokens)
-++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++++
-++++++         return expert_cache
-+++++++        
-+++++++    # @no_grad()
-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++    #     # expert_cache = torch.zeros_like(x)
-+++++++    #     # idxs = flat_expert_indices.argsort()
-+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++++    #     #     if start_idx == end_idx:
-+++++++    #     #         continue
-+++++++    #     #     expert = self.experts[i]
-+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #     #     expert_tokens = x[exp_token_idx]
-+++++++    #     #     expert_out = expert(expert_tokens)
-+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++++    #     # return expert_cache
-+++++++    #     expert_cache = ops.zeros_like(x)
-+++++++    #     idxs = flat_expert_indices.argsort()
-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++    #         if start_idx == end_idx:
-+++++++    #             continue
-+++++++    #         expert = self.experts[i]
-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #         expert_tokens = x[exp_token_idx]
-+++++++    #         expert_out = expert(expert_tokens)
-+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++++
-+++++++    #     return expert_cache
-+++++++    # @no_grad()
-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++    #     expert_cache = ops.zeros_like(x)
-+++++++
-+++++++    #     # 排序保证顺序一致
-+++++++    #     idxs = flat_expert_indices.argsort()
-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++    #     # 找出有 token 的专家
-+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++++
-+++++++    #     for i in active_experts.tolist():
-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++    #         end_idx = tokens_per_expert[i]
-+++++++    #         if start_idx == end_idx:  # 没有 token
-+++++++    #             continue
-+++++++
-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #         expert_tokens = x[exp_token_idx]
-+++++++    #         expert_out = self.experts[i](expert_tokens)
-+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++++
-+++++++    #         expert_cache = mindspore.mint.scatter_add(
-+++++++    #             expert_cache,
-+++++++    #             0,
-+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++++    #             expert_out
-+++++++    #         )
-+++++++
-+++++++    #     return expert_cache
-+++++++
-+++++++
-++++++ 
-++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++++ #     """
-++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++ 
-++++++         # Initialize weights and apply final processing
-++++++         self.post_init()
-+++++++        self.warm_up = False
-+++++++
-+++++++    def warmup_moe_model_deep(self):
-+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++++        test_texts = [
-+++++++            "warmup short",
-+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++++++        ]
-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++++        if tokenizer is None:
-+++++++            from mindnlp.transformers import AutoTokenizer
-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++++            self._warmup_tokenizer = tokenizer
-+++++++
-+++++++        for text in test_texts:
-+++++++            inputs = tokenizer(text, return_tensors="ms")
-+++++++            with mindspore._no_grad():
-+++++++                _ = self(**inputs, use_cache=False)
-+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++++ 
-++++++     def get_input_embeddings(self):
-++++++         return self.model.embed_tokens
-++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++         ```"""
-+++++++        if not self.warm_up:
-+++++++            self.warm_up = True
-+++++++            self.warmup_moe_model_deep()
-+++++++
-++++++         output_attentions = (
-++++++             output_attentions
-++++++             if output_attentions is not None
-++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++index 3cbf820e..d4c6b651 100644
-++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++@@ -18,7 +18,6 @@
-++++++ # See the License for the specific language governing permissions and
-++++++ # limitations under the License.
-++++++ """MindSpore Qwen2MoE model."""
-++++++-
-++++++ import math
-++++++ from typing import List, Optional, Tuple, Union
-++++++ 
-++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++++     TokenClassifierOutput,
-++++++ )
-++++++ from ...modeling_utils import PreTrainedModel
-+++++++from ...generation import GenerationMixin
-++++++ from ....utils import logging
-++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++++++ 
-++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++++         self.variance_epsilon = eps
-++++++ 
-++++++     def forward(self, hidden_states):
-+++++++        # @dwj
-+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++++        # @lwx
-+++++++        # if not self.training :
-+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++         input_dtype = hidden_states.dtype
-++++++         hidden_states = hidden_states.to(mindspore.float32)
-++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++++@@ -234,6 +239,8 @@ def rotate_half(x):
-++++++     """Rotates half the hidden dims of the input."""
-++++++     x1 = x[..., : x.shape[-1] // 2]
-++++++     x2 = x[..., x.shape[-1] // 2 :]
-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++     return ops.cat((-x2, x1), dim=-1)
-++++++ 
-++++++ 
-++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++++         self.config = config
-++++++         self.hidden_size = config.hidden_size
-++++++         self.intermediate_size = intermediate_size
-+++++++        
-++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++++         self.act_fn = ACT2FN[config.hidden_act]
-++++++ 
-++++++     def forward(self, x):
-++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++-
-++++++ 
-+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++++        # @lwx 
-+++++++        # gate_up_output = self.gate_up_proj(x)
-+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++++++        # return self.down_proj(swiglu_output)
-+++++++
-+++++++    # def forward(self, x):
-+++++++    #     gate_proj_out = self.gate_proj(x)
-+++++++    #     up_proj_out = self.up_proj(x)
-+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++++++    #     return self.down_proj(swiglu_out)
-+++++++        
-++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++++     """
-++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++++         use_cache: bool = False,
-++++++         cache_position: Optional[mindspore.Tensor] = None,
-++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++        
-+++++++
-++++++         bsz, q_len, _ = hidden_states.shape
-++++++ 
-++++++         query_states = self.q_proj(hidden_states)
-++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++                     "with a layer index."
-++++++                 )
-++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++            if isinstance(past_key_value, StaticCache):
-+++++++                kv_seq_len = key_states.shape[-2]
-+++++++            else:
-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++ 
-++++++         if past_key_value is not None:
-++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++            
-+++++++            if isinstance(past_key_value, StaticCache):
-+++++++                kv_seq_len = key_states.shape[-2]
-++++++ 
-++++++         # repeat k/v heads if n_kv_heads < n_heads
-++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++-
-+++++++        
-++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++ 
-++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++++-            raise ValueError(
-++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++++-                f" {attn_weights.shape}"
-++++++-            )
-++++++-
-++++++-        if attention_mask is not None:  # no matter the length, we just slice it
-++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++++++        if attention_mask is not None:
-+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++             attn_weights = attn_weights + causal_mask
-++++++ 
-++++++         # upcast attention to fp32
-++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++ 
-++++++         attn_output = self.o_proj(attn_output)
-++++++-
-+++++++        # @lwx
-+++++++        
-+++++++        # max_seq_len = self.max_position_embeddings  # 2048
-+++++++
-+++++++        # if attention_mask is not None:
-+++++++        #     # attention_mask: [B, 1, Sq, Sk]
-+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++++
-+++++++        #     # pad 到 [max_seq_len, max_seq_len]
-+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++++        #     global_attention_mask = padded_mask
-+++++++        # else:
-+++++++        #     global_attention_mask = None
-+++++++
-+++++++
-+++++++        # sparse_mode=3
-+++++++        # attn_output = mindspore.ops.flash_attention_score(
-+++++++        #     query=query_states,
-+++++++        #     key=key_states,
-+++++++        #     value=value_states,
-+++++++        #     real_shift=None,
-+++++++        #     padding_mask=None,
-+++++++
-+++++++        #     head_num=self.num_heads,
-+++++++        #     attn_mask=global_attention_mask,
-+++++++        #     keep_prob=1.0 - self.attention_dropout,
-+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++        #     input_layout="BNSD",
-+++++++        #     pre_tokens=2147483647,
-+++++++        #     next_tokens=2147483647,
-+++++++        #     inner_precise=0,
-+++++++        #     drop_mask=None,
-+++++++        #     prefix=None,
-+++++++        #     actual_seq_qlen=None,
-+++++++        #     actual_seq_kvlen=None,
-+++++++        #     sparse_mode=sparse_mode,
-+++++++        # )
-++++++         if not output_attentions:
-++++++             attn_weights = None
-++++++ 
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++ 
-+++++++class Qwen2MoeFlashAttention(nn.Module):
-+++++++    """
-+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++++
-+++++++    关键改动:
-+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++++       直接传入原始的 key 和 value 张量效率更高。
-+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++++    """
-+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++++        super().__init__()
-+++++++        self.config = config
-+++++++        self.layer_idx = layer_idx
-+++++++        self.hidden_size = config.hidden_size
-+++++++        self.num_heads = config.num_attention_heads
-+++++++        self.head_dim = self.hidden_size // self.num_heads
-+++++++        self.num_key_value_heads = config.num_key_value_heads
-+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++        self.max_position_embeddings = config.max_position_embeddings
-+++++++        self.rope_theta = config.rope_theta
-+++++++        self.attention_dropout = config.attention_dropout
-+++++++
-+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++            raise ValueError(
-+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++++            )
-+++++++
-+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++++
-+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++++            self.head_dim,
-+++++++            max_position_embeddings=self.max_position_embeddings,
-+++++++            base=self.rope_theta,
-+++++++        )
-+++++++
-+++++++    def forward(
-+++++++        self,
-+++++++        hidden_states: mindspore.Tensor,
-+++++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++++        past_key_value: Optional[Cache] = None,
-+++++++        output_attentions: bool = False,
-+++++++        use_cache: bool = False,
-+++++++        cache_position: Optional[mindspore.Tensor] = None,
-+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++        bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++        # 1. 线性投射 Q, K, V
-+++++++        query_states = self.q_proj(hidden_states)
-+++++++        key_states = self.k_proj(hidden_states)
-+++++++        value_states = self.v_proj(hidden_states)
-+++++++
-+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++        # 3. RoPE 旋转位置编码
-+++++++        kv_seq_len = key_states.shape[-2]
-+++++++        if past_key_value is not None:
-+++++++            if self.layer_idx is None:
-+++++++                raise ValueError(
-+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++                    "with a layer index."
-+++++++                )
-+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++++                if cache_position.shape[0] == 1:
-+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++++                    kv_seq_len = past_seen_tokens + 1
-+++++++                else:
-+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++++            else:
-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++        # 4. KV 缓存更新
-+++++++        if past_key_value is not None:
-+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++            key_states, value_states = past_key_value.update(
-+++++++                key_states, value_states, self.layer_idx, cache_kwargs
-+++++++            )
-+++++++            
-+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++                if cache_position.shape[0] == 1:
-+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++++                    kv_seq_len = key_states.shape[-2]
-+++++++
-+++++++        # 5. [重要] 准备 Attention Mask
-+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++++        fa_attention_mask = None
-+++++++        if attention_mask is not None:
-+++++++            # 截取与当前key长度匹配的部分
-+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++++            fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++++        input_dtype = query_states.dtype
-+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++++            query_states = query_states.to(mindspore.float16)
-+++++++            key_states = key_states.to(mindspore.float16)
-+++++++            value_states = value_states.to(mindspore.float16)
-+++++++
-+++++++        # 6. [核心] 调用 flash_attention_score 算子
-+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++++        attn_output = mindspore.ops.flash_attention_score(
-+++++++            query=query_states,
-+++++++            key=key_states,
-+++++++            value=value_states,
-+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++++            attn_mask=fa_attention_mask,
-+++++++            keep_prob=1.0 - self.attention_dropout,
-+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++            input_layout="BNSD",
-+++++++            sparse_mode=0 # 使用 defaultMask 模式
-+++++++        )
-+++++++
-+++++++        # 恢复原始数据类型
-+++++++        attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++        # 7. 调整输出形状
-+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++        attn_output = self.o_proj(attn_output)
-+++++++
-+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++++        attn_weights = None
-+++++++        if output_attentions:
-+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++        return attn_output, attn_weights, past_key_value
-+++++++
-+++++++    # def forward(
-+++++++    #     self,
-+++++++    #     hidden_states: mindspore.Tensor,
-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++++    #     past_key_value: Optional[Cache] = None,
-+++++++    #     output_attentions: bool = False,
-+++++++    #     use_cache: bool = False,
-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++    #     bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++    #     # 1. 线性投射 Q, K, V
-+++++++    #     query_states = self.q_proj(hidden_states)
-+++++++    #     key_states = self.k_proj(hidden_states)
-+++++++    #     value_states = self.v_proj(hidden_states)
-+++++++
-+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++    #     # 3. RoPE 旋转位置编码
-+++++++    #     kv_seq_len = key_states.shape[-2]
-+++++++    #     if past_key_value is not None:
-+++++++    #         if self.layer_idx is None:
-+++++++    #             raise ValueError(
-+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++    #                 "with a layer index."
-+++++++    #             )
-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++    #     # 4. KV 缓存更新
-+++++++    #     if past_key_value is not None:
-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++    #         key_states, value_states = past_key_value.update(
-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++++    #         )
-+++++++
-+++++++    #     # 5. 准备 Attention Mask
-+++++++    #     fa_attention_mask = None
-+++++++    #     if attention_mask is not None:
-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++    #         fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++++    #     input_dtype = query_states.dtype
-+++++++
-+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++++    #         query=query_states,
-+++++++    #         key=key_states,
-+++++++    #         value=value_states,
-+++++++    #         head_num=self.num_heads,
-+++++++    #         attn_mask=fa_attention_mask,
-+++++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++    #         input_layout="BNSD",
-+++++++    #         sparse_mode=0,
-+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++++    #         inner_precise=1
-+++++++    #     )
-+++++++
-+++++++    #     # 恢复原始数据类型
-+++++++    #     attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++    #     # 7. 调整输出形状
-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++    #     attn_output = self.o_proj(attn_output)
-+++++++
-+++++++    #     attn_weights = None
-+++++++    #     if output_attentions:
-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++    #     return attn_output, attn_weights, past_key_value
-+++++++
-+++++++    # def forward(
-+++++++    #     self,
-+++++++    #     hidden_states: mindspore.Tensor,
-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++++    #     past_key_value: Optional[Cache] = None,
-+++++++    #     output_attentions: bool = False,
-+++++++    #     use_cache: bool = False,
-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++    #     bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++    #     query_states = self.q_proj(hidden_states)
-+++++++    #     key_states = self.k_proj(hidden_states)
-+++++++    #     value_states = self.v_proj(hidden_states)
-+++++++
-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++    #     kv_seq_len = key_states.shape[-2]
-+++++++    #     if past_key_value is not None:
-+++++++    #         if self.layer_idx is None:
-+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++    #     if past_key_value is not None:
-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++    #         key_states, value_states = past_key_value.update(
-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++++    #         )
-+++++++
-+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++
-+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++++    #     # <--- 修改结束 ---
-+++++++
-+++++++    #     fa_attention_mask = None
-+++++++    #     if attention_mask is not None:
-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++    #         fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++    #     input_dtype = query_states.dtype
-+++++++
-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++++    #         key=key_states,
-+++++++    #         value=value_states,
-+++++++    #         head_num=self.num_heads,
-+++++++    #         attn_mask=fa_attention_mask,
-+++++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++++    #         input_layout="BNSD",
-+++++++    #         sparse_mode=0,
-+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++++    #     )
-+++++++
-+++++++    #     attn_output = attn_output.to(input_dtype)
-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++    #     attn_output = self.o_proj(attn_output)
-+++++++
-+++++++    #     attn_weights = None
-+++++++    #     if output_attentions:
-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++++
-+++++++    #     return attn_output, attn_weights, past_key_value
-+++++++
-++++++ QWEN2MOE_ATTENTION_CLASSES = {
-++++++     "eager": Qwen2MoeAttention,
-+++++++    "flash-attention": Qwen2MoeFlashAttention,
-++++++ }
-++++++ 
-++++++ 
-++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++ 
-+++++++    #@dwj
-+++++++    # 只遍历激活的专家，而非全部专家
-++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++++++-        # router_logits: (batch * sequence_length, n_experts)
-++++++-        router_logits = self.gate(hidden_states)
-++++++-
-++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-        if self.norm_topk_prob:
-++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        # we cast back to the input dtype
-++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-
-++++++-        final_hidden_states = ops.zeros(
-++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++++-        )
-++++++-
-++++++-        # One hot encode the selected experts to create an expert mask
-++++++-        # this will be used to easily index which expert is going to be sollicitated
-++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++++-
-++++++-        # Loop over all available experts in the model and perform the computation on each expert
-++++++-        for expert_idx in range(self.num_experts):
-++++++-            expert_layer = self.experts[expert_idx]
-++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++++-
-++++++-            # Index the correct hidden states and compute the expert hidden state for
-++++++-            # the current expert. We need to make sure to multiply the output hidden
-++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++++-            if 0 not in idx.shape:
-++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++++-
-++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++++++-                # the `top_x` tensor here.
-++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++++-
-++++++-        shared_expert_output = self.shared_expert(hidden_states)
-++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++++-
-++++++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++            num_tokens = hidden_states_reshaped.shape[0]
-+++++++
-+++++++            router_logits = self.gate(hidden_states_reshaped)
-+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++
-+++++++            if self.norm_topk_prob:
-+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++            
-+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++++            flat_selected_experts = selected_experts.flatten()
-+++++++            
-+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++++            token_indices = broadcasted_token_indices.flatten()
-+++++++            
-+++++++            active_experts = ops.unique(flat_selected_experts)
-+++++++            
-+++++++            for expert_idx_tensor in active_experts:
-+++++++                expert_idx = expert_idx_tensor.item()
-+++++++                expert_layer = self.experts[expert_idx]
-+++++++                
-+++++++                mask = (flat_selected_experts == expert_idx_tensor)
-+++++++                selected_token_indices = token_indices[mask]
-+++++++                selected_routing_weights = routing_weights.flatten()[mask]
-+++++++                
-+++++++                current_states = hidden_states_reshaped[selected_token_indices]
-+++++++                
-+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++                
-+++++++                final_hidden_states = final_hidden_states.index_add(
-+++++++                    dim=0,
-+++++++                    index=selected_token_indices,
-+++++++                    source=expert_output.to(hidden_states.dtype)
-+++++++                )
-+++++++            
-+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++++ 
-++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++-        return final_hidden_states, router_logits
-+++++++            final_hidden_states = final_hidden_states + shared_expert_output
-+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++++            
-+++++++            return final_hidden_states, router_logits
-++++++ 
-++++++ 
-++++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++++ 
-++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++ 
-+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++++
-++++++         if (layer_idx not in config.mlp_only_layers) and (
-++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++++         ):
-++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++++     _skip_keys_device_placement = "past_key_values"
-++++++     _supports_cache_class = True
-+++++++#lwx
-+++++++    # _supports_static_cache = True
-++++++ 
-++++++     def _init_weights(self, module):
-++++++         std = self.config.initializer_range
-++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++++         return causal_mask
-++++++ 
-++++++ 
-++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++     _tied_weights_keys = ["lm_head.weight"]
-++++++ 
-++++++     def __init__(self, config):
-++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++         self.num_experts_per_tok = config.num_experts_per_tok
-++++++         # Initialize weights and apply final processing
-++++++         self.post_init()
-+++++++        # @lwx
-+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++++++        #     self.generation_config.cache_implementation = "static"
-+++++++        self._warmed_up = False
-+++++++
-+++++++    def warmup_moe_model(self):
-+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++++++        test_texts = [
-+++++++            "warmup short",
-+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++++++        ]
-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++++        if tokenizer is None:
-+++++++            from mindnlp.transformers import AutoTokenizer
-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++++            self._warmup_tokenizer = tokenizer
-+++++++
-+++++++        for text in test_texts:
-+++++++            inputs = tokenizer(text, return_tensors="ms")
-+++++++            with mindspore._no_grad():
-+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++++ 
-++++++     def get_input_embeddings(self):
-++++++         return self.model.embed_tokens
-++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++         ```"""
-+++++++        if not self._warmed_up:
-+++++++            self._warmed_up = True
-+++++++            self.warmup_moe_model()
-++++++ 
-++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++++         output_router_logits = (
-++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++             }
-++++++         )
-++++++         return model_inputs
-+++++++# @lwx
-+++++++    # def _decode_one_tokens_logits(
-+++++++    #     self,
-+++++++    #     cur_token: mindspore.Tensor,
-+++++++    #     input_pos: Optional[mindspore.Tensor],
-+++++++    #     cache_position: mindspore.Tensor,
-+++++++    #     past_key_values: StaticCache,
-+++++++    # ) -> mindspore.Tensor:
-+++++++    #     """
-+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++++++        
-+++++++    #     Args:
-+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++++++    #         input_pos: 输入位置信息，可选
-+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++++++            
-+++++++    #     Returns:
-+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++++++    #     """
-+++++++    #     # 调用JIT编译的版本
-+++++++    #     return self.get_decode_one_tokens_logits(
-+++++++    #         cur_token=cur_token,
-+++++++    #         input_pos=input_pos,
-+++++++    #         cache_position=cache_position,
-+++++++    #         past_key_values=past_key_values,
-+++++++    #     )
-+++++++    
-+++++++    # @mindspore.jit(jit_level='O1')
-+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++++++    #     """
-+++++++    #     JIT编译的函数，用于高效的单token解码
-+++++++    #     使用JIT编译优化以支持静态shape和高效执行
-+++++++        
-+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++++++    #     """
-+++++++    #     outputs = self.model.forward(
-+++++++    #         input_ids=cur_token,
-+++++++    #         position_ids=input_pos,
-+++++++    #         cache_position=cache_position,
-+++++++    #         past_key_values=past_key_values,
-+++++++    #         use_cache=True,
-+++++++    #         return_dict=False,
-+++++++    #     )
-+++++++        
-+++++++    #     hidden_states = outputs[0]
-+++++++    #     logits = self.lm_head.forward(hidden_states)
-+++++++    #     logits = logits.float()
-+++++++        
-+++++++    #     return logits[:, -1, :]
-+++++++
-+++++++    # def _sample(
-+++++++    #     self,
-+++++++    #     input_ids: mindspore.Tensor,
-+++++++    #     logits_processor,
-+++++++    #     stopping_criteria,
-+++++++    #     generation_config,
-+++++++    #     synced_devices: bool,
-+++++++    #     streamer=None,
-+++++++    #     logits_warper=None,
-+++++++    #     **model_kwargs,
-+++++++    # ):
-+++++++    #     """
-+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++++++    #     """
-+++++++    #     from ...generation.logits_process import LogitsProcessorList
-+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++++++    #     from mindnlp.core import nn, ops, no_grad
-+++++++    #     import numpy as np
-+++++++        
-+++++++    #     # 检查是否使用 StaticCache
-+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++++++    #     # 否则，直接调用父类方法
-+++++++    #     past_key_values = model_kwargs.get("past_key_values")
-+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++++++
-+++++++    #     if not isinstance(past_key_values, StaticCache):
-+++++++    #         # 不使用 StaticCache，直接调用父类方法
-+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++++++    #         return super()._sample(
-+++++++    #             input_ids=input_ids,
-+++++++    #             logits_processor=logits_processor,
-+++++++    #             stopping_criteria=stopping_criteria,
-+++++++    #             generation_config=generation_config,
-+++++++    #             synced_devices=synced_devices,
-+++++++    #             streamer=streamer,
-+++++++    #             logits_warper=logits_warper,
-+++++++    #             **model_kwargs,
-+++++++    #         )
-+++++++        
-+++++++    #     # 使用 StaticCache，进入自定义循环
-+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++++++    #     pad_token_id = generation_config._pad_token_tensor
-+++++++    #     output_attentions = generation_config.output_attentions
-+++++++    #     output_hidden_states = generation_config.output_hidden_states
-+++++++    #     output_scores = generation_config.output_scores
-+++++++    #     output_logits = generation_config.output_logits
-+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++++++    #     max_length = generation_config.max_length
-+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++++++    #     do_sample = generation_config.do_sample
-+++++++        
-+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++++++    #         raise ValueError(
-+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++++++    #             f"{logits_warper})."
-+++++++    #         )
-+++++++        
-+++++++    #     # init attention / hidden states / scores tuples
-+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++++++        
-+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++++++    #         encoder_hidden_states = (
-+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++++++    #         )
-+++++++        
-+++++++    #     # keep track of which sequences are already finished
-+++++++    #     batch_size, cur_len = input_ids.shape
-+++++++    #     this_peer_finished = False
-+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++++++        
-+++++++    #     time_record = []
-+++++++    #     from ....utils.testing_utils import parse_flag_from_env
-+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++++++        
-+++++++    #     while self._has_unfinished_sequences(
-+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++++++    #     ):
-+++++++    #         if _record_time:
-+++++++    #             import time as time_module
-+++++++    #             infer_start = time_module.time()
-+++++++            
-+++++++    #         # prepare model inputs
-+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++++++            
-+++++++    #         # prepare variable output controls
-+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++++++            
-+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++++++    #         cur_cache_position = model_inputs.get("cache_position")
-+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++++++    #         cur_input_ids = model_inputs.get("input_ids")
-+++++++            
-+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++++++    #             cur_cache_position is not None and 
-+++++++    #             len(cur_cache_position.shape) > 0 and
-+++++++    #             cur_cache_position.shape[0] == 1 and
-+++++++    #             cur_input_ids is not None and
-+++++++    #             cur_input_ids.shape[1] == 1):
-+++++++    #             # 使用 JIT 优化的单 token 解码
-+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++++++    #             if not hasattr(self, '_jit_used'):
-+++++++    #                 self._jit_used = False
-+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++++++                
-+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++++++    #                 cur_token=cur_input_ids,
-+++++++    #                 input_pos=model_inputs.get("position_ids"),
-+++++++    #                 cache_position=cur_cache_position,
-+++++++    #                 past_key_values=cur_past_key_values,
-+++++++    #             )
-+++++++                
-+++++++    #             # 标记已使用JIT（用于后续判断）
-+++++++    #             if not self._jit_used:
-+++++++    #                 self._jit_used = True
-+++++++                
-+++++++    #             # 构造兼容的输出对象
-+++++++    #             class JitOptimizedOutput:
-+++++++    #                 def __init__(self, logits, config):
-+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++++++    #                     self.config = config
-+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++++++    #                     self.cross_attentions = None
-+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++++++                
-+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++++++    #         else:
-+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++++++    #             outputs = self(**model_inputs, return_dict=True)
-+++++++            
-+++++++    #         if synced_devices and this_peer_finished:
-+++++++    #             continue
-+++++++            
-+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++++++    #         next_token_logits = outputs.logits[:, -1, :]
-+++++++            
-+++++++    #         # pre-process distribution
-+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++++++    #         if do_sample:
-+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++++++            
-+++++++    #         # Store scores, attentions and hidden_states when required
-+++++++    #         if return_dict_in_generate:
-+++++++    #             if output_scores:
-+++++++    #                 scores += (next_token_scores,)
-+++++++    #             if output_logits:
-+++++++    #                 raw_logits += (next_token_logits,)
-+++++++    #             if output_attentions:
-+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++++++    #                 if self.config.is_encoder_decoder:
-+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++++++                
-+++++++    #             if output_hidden_states:
-+++++++    #                 hidden = (
-+++++++    #                     outputs.decoder_hidden_states
-+++++++    #                     if self.config.is_encoder_decoder
-+++++++    #                     else outputs.hidden_states
-+++++++    #                 )
-+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++++++            
-+++++++    #         # token selection
-+++++++    #         if do_sample:
-+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++++++    #         else:
-+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++++++            
-+++++++    #         # finished sentences should have their next token be a padding token
-+++++++    #         if has_eos_stopping_criteria:
-+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++++++            
-+++++++    #         # update generated ids, model inputs, and length for next step
-+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++++++    #         if streamer is not None:
-+++++++    #             streamer.put(next_tokens)
-+++++++            
-+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++++++    #             outputs,
-+++++++    #             model_kwargs,
-+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++++++    #         )
-+++++++            
-+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++++++    #         cur_len += 1
-+++++++            
-+++++++    #         if _record_time:
-+++++++    #             import time as time_module
-+++++++    #             infer_stop = time_module.time()
-+++++++    #             time_record.append(infer_stop - infer_start)
-+++++++            
-+++++++    #         del outputs
-+++++++        
-+++++++    #     average_infer_time = None
-+++++++    #     if time_record:
-+++++++    #         if len(time_record) > 1:
-+++++++    #             time_record.pop(0)
-+++++++    #         average_infer_time = sum(time_record) / len(time_record)
-+++++++    #         print(f'average inference time is: {average_infer_time}')
-+++++++    #         print(f'inference time record: {time_record}')
-+++++++        
-+++++++    #     if streamer is not None:
-+++++++    #         streamer.end()
-+++++++        
-+++++++    #     # 简单判断：打印是否使用了JIT路径
-+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++++++    #     else:
-+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++++++        
-+++++++    #     if return_dict_in_generate:
-+++++++    #         if self.config.is_encoder_decoder:
-+++++++    #             return GenerateEncoderDecoderOutput(
-+++++++    #                 sequences=input_ids,
-+++++++    #                 scores=scores,
-+++++++    #                 logits=raw_logits,
-+++++++    #                 encoder_attentions=encoder_attentions,
-+++++++    #                 encoder_hidden_states=encoder_hidden_states,
-+++++++    #                 decoder_attentions=decoder_attentions,
-+++++++    #                 cross_attentions=cross_attentions,
-+++++++    #                 decoder_hidden_states=decoder_hidden_states,
-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++++    #                 average_infer_time=average_infer_time
-+++++++    #             )
-+++++++    #         else:
-+++++++    #             return GenerateDecoderOnlyOutput(
-+++++++    #                 sequences=input_ids,
-+++++++    #                 scores=scores,
-+++++++    #                 logits=raw_logits,
-+++++++    #                 attentions=decoder_attentions,
-+++++++    #                 hidden_states=decoder_hidden_states,
-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++++    #                 average_infer_time=average_infer_time
-+++++++    #             )
-+++++++    #     else:
-+++++++    #         return input_ids
-+++++++            
-+++++++    # def _prepare_cache_for_generation(
-+++++++    #     self,
-+++++++    #     generation_config,
-+++++++    #     model_kwargs,
-+++++++    #     assistant_model,
-+++++++    #     batch_size,
-+++++++    #     max_cache_length,
-+++++++    # ):
-+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++++++    #         generation_config.cache_implementation = "static"
-+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++++++        
-+++++++    #     if generation_config.cache_implementation == "static":
-+++++++    #         base_required_from_max_length = generation_config.max_length + 1
-+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++++++    #         min_cache_size = 50
-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++++++    #         else:
-+++++++    #             max_cache_length = max(base_required, min_cache_size)
-+++++++            
-+++++++    #         original_max_cache_length = max_cache_length
-+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++++++            
-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++++    #             if max_cache_length > self.config.max_position_embeddings:
-+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++++++        
-+++++++    #     result = super()._prepare_cache_for_generation(
-+++++++    #         generation_config=generation_config,
-+++++++    #         model_kwargs=model_kwargs,
-+++++++    #         assistant_model=assistant_model,
-+++++++    #         batch_size=batch_size,
-+++++++    #         max_cache_length=max_cache_length,
-+++++++    #     )
-+++++++        
-+++++++    #     if generation_config.cache_implementation == "static":
-+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++++++    #         created_cache = model_kwargs.get(cache_name)
-+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++++++        
-+++++++    #     return result
-+++++++
-+++++++
-+++++++
-++++++ 
-++++++ 
-++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++++-- 
-++++++2.27.0
-++++++
-+++++-- 
-+++++2.27.0
-+++++
-++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++++new file mode 100644
-++++index 00000000..966529e4
-++++--- /dev/null
-+++++++ b/patches/0003-20261106secondcommit.patch
-++++@@ -0,0 +1,2769 @@
-+++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++Date: Thu, 6 Nov 2025 14:54:37 +0800
-+++++Subject: [PATCH 3/3] 20261106secondcommit
-+++++
-+++++---
-+++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-+++++ patches/0001-20251104commit.patch             | 1272 -----------------
-+++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
-+++++ delete mode 100644 patches/0001-20251104commit.patch
-+++++
-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++index 73773c22..2f9192bf 100644
-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-+++++ 
-+++++ _CONFIG_FOR_DOC = "DeepseekConfig"
-+++++ 
-++++++_attn_mask_cache = {}
-++++++
-++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-++++++    q_len = batch_and_seq[1]
-++++++    kv_len = batch_and_seq[1] + past_key_values_length 
-++++++    key = (batch_and_seq[0], q_len, kv_len)
-++++++
-++++++    if key in _attn_mask_cache:
-++++++        return _attn_mask_cache[key]
-++++++
-++++++    mask = _prepare_4d_causal_attention_mask(
-++++++        attention_mask,
-++++++        batch_and_seq,
-++++++        inputs_embeds,
-++++++        past_key_values_length,
-++++++    )
-++++++    _attn_mask_cache[key] = mask
-++++++    return mask
-+++++ 
-+++++ def _get_unpad_data(attention_mask):
-+++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-+++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-+++++         return final_output
-+++++ 
-+++++ 
-+++++-    @no_grad()
-+++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++-        expert_cache = ops.zeros_like(x)
-+++++-        idxs = flat_expert_indices.argsort()
-+++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++-        token_idxs = idxs // self.num_experts_per_tok
-+++++-
-+++++-        for i, end_idx in enumerate(tokens_per_expert):
-+++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++-            if start_idx == end_idx:
-+++++-                continue
-+++++-            expert = self.experts[i]
-+++++-            exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-            expert_tokens = x[exp_token_idx]
-+++++-            expert_out = expert(expert_tokens)
-+++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++-
-+++++-        return expert_cache
-+++++-        
-+++++     # @no_grad()
-+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-    #     # expert_cache = torch.zeros_like(x)
-+++++-    #     # idxs = flat_expert_indices.argsort()
-+++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++-    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++-    #     #     if start_idx == end_idx:
-+++++-    #     #         continue
-+++++-    #     #     expert = self.experts[i]
-+++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-    #     #     expert_tokens = x[exp_token_idx]
-+++++-    #     #     expert_out = expert(expert_tokens)
-+++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++-    #     # return expert_cache
-++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++     #     expert_cache = ops.zeros_like(x)
-+++++     #     idxs = flat_expert_indices.argsort()
-+++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-+++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++ 
-+++++     #     return expert_cache
-+++++-    # @no_grad()
-+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-    #     expert_cache = ops.zeros_like(x)
-++++++        
-++++++    @no_grad()
-++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++        """
-++++++        优化版 MoE prefill：
-++++++        - 批量张量化处理同一个 expert 的所有 token
-++++++        - 跳过无 token 的专家
-++++++        - 保持结果完全一致
-++++++        """
-++++++        # 初始化输出缓存
-++++++        expert_cache = ops.zeros_like(x)
-+++++ 
-+++++-    #     # 排序保证顺序一致
-+++++-    #     idxs = flat_expert_indices.argsort()
-+++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++-    #     token_idxs = idxs // self.num_experts_per_tok
-++++++        # 排序（确保 scatter_add 位置对应原逻辑）
-++++++        idxs = flat_expert_indices.argsort()
-++++++        sorted_expert_indices = flat_expert_indices[idxs]
-++++++        sorted_token_indices = idxs // self.num_experts_per_tok
-+++++ 
-+++++-    #     # 找出有 token 的专家
-+++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++        # 每个 expert 的 token 数
-++++++        tokens_per_expert = sorted_expert_indices.bincount()
-+++++ 
-+++++-    #     for i in active_experts.tolist():
-+++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++-    #         end_idx = tokens_per_expert[i]
-+++++-    #         if start_idx == end_idx:  # 没有 token
-+++++-    #             continue
-++++++        # 找出有 token 的专家
-++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+++++ 
-+++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-    #         expert_tokens = x[exp_token_idx]
-+++++-    #         expert_out = self.experts[i](expert_tokens)
-+++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++        for expert_id in active_experts.tolist():
-++++++            # 取该 expert 对应的排序后 token 区间
-++++++            start = (tokens_per_expert[:expert_id]).sum().item()
-++++++            end = start + tokens_per_expert[expert_id].item()
-+++++ 
-+++++-    #         expert_cache = mindspore.mint.scatter_add(
-+++++-    #             expert_cache,
-+++++-    #             0,
-+++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++-    #             expert_out
-+++++-    #         )
-++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-++++++            expert_tokens = x[token_idx]                     # 取输入向量
-+++++ 
-+++++-    #     return expert_cache
-++++++            # 执行专家 MLP
-++++++            expert_out = self.experts[expert_id](expert_tokens)
-++++++
-++++++            # 按权重缩放
-++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-++++++
-++++++            # 回写到缓存（等价 scatter_add）
-++++++            expert_cache = mindspore.mint.scatter_add(
-++++++                expert_cache,
-++++++                0,
-++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++                scaled_out
-++++++            )
-++++++
-++++++        return expert_cache
-++++++
-++++++        # @no_grad()
-++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++        #     # expert_cache = torch.zeros_like(x)
-++++++        #     # idxs = flat_expert_indices.argsort()
-++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++        #     # token_idxs = idxs // self.num_experts_per_tok
-++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++        #     #     if start_idx == end_idx:
-++++++        #     #         continue
-++++++        #     #     expert = self.experts[i]
-++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++        #     #     expert_tokens = x[exp_token_idx]
-++++++        #     #     expert_out = expert(expert_tokens)
-++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++        #     # return expert_cache
-++++++        #     expert_cache = ops.zeros_like(x)
-++++++        #     idxs = flat_expert_indices.argsort()
-++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++        #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++        #     for i, end_idx in enumerate(tokens_per_expert):
-++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++        #         if start_idx == end_idx:
-++++++        #             continue
-++++++        #         expert = self.experts[i]
-++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++        #         expert_tokens = x[exp_token_idx]
-++++++        #         expert_out = expert(expert_tokens)
-++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++
-++++++        #     return expert_cache
-++++++        # @no_grad()
-++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++        #     expert_cache = ops.zeros_like(x)
-++++++
-++++++        #     # 排序保证顺序一致
-++++++        #     idxs = flat_expert_indices.argsort()
-++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++        #     token_idxs = idxs // self.num_experts_per_tok
-++++++
-++++++        #     # 找出有 token 的专家
-++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++
-++++++        #     for i in active_experts.tolist():
-++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++        #         end_idx = tokens_per_expert[i]
-++++++        #         if start_idx == end_idx:  # 没有 token
-++++++        #             continue
-++++++
-++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++        #         expert_tokens = x[exp_token_idx]
-++++++        #         expert_out = self.experts[i](expert_tokens)
-++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++
-++++++        #         expert_cache = mindspore.mint.scatter_add(
-++++++        #             expert_cache,
-++++++        #             0,
-++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++        #             expert_out
-++++++        #         )
-++++++
-++++++        #     return expert_cache
-+++++ 
-+++++ 
-+++++ 
-+++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++-
-+++++ # class DeepseekFlashAttention(nn.Module):
-+++++ #     """
-+++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-++++++
-+++++ Deepseek_ATTENTION_CLASSES = {
-+++++     "eager": DeepseekAttention,
-+++++     "flash-attention": DeepseekFlashAttention,
-+++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-+++++             )
-+++++         else:
-+++++             # 4d mask is passed through the layers
-+++++-            attention_mask = _prepare_4d_causal_attention_mask(
-++++++            # attention_mask = _prepare_4d_causal_attention_mask(
-++++++            #     attention_mask,
-++++++            #     (batch_size, seq_length),
-++++++            #     inputs_embeds,
-++++++            #     past_key_values_length,
-++++++            # )
-++++++            #@dwj
-++++++            attention_mask = get_cached_causal_mask(
-+++++                 attention_mask,
-+++++                 (batch_size, seq_length),
-+++++                 inputs_embeds,
-+++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-+++++         self.warm_up = False
-++++++        #@dwj
-++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-++++++            self.num_layers,
-++++++            self.num_attention_heads,
-++++++            self.head_dim,
-++++++            batch_size=1,
-++++++            max_length=self.max_length,
-++++++            dtype=mindspore.float16
-++++++        )
-++++++
-++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-++++++        key_cache = []
-++++++        value_cache = []
-++++++        for _ in range(num_layers):
-++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-++++++            key_cache.append(k)
-++++++            value_cache.append(v)
-++++++        return key_cache, value_cache
-++++++
-+++++ 
-+++++     def warmup_moe_model_deep(self):
-+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++index bced285c..ebd7782e 100644
-+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-+++++ 
-+++++-Long_Prompt = False
-+++++-PROMPT_LENGTH_THRESHOLD = 128
-++++++Long_Prompt = 1
-++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
-++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-++++++
-++++++_causal_mask_cache = {}
-++++++
-++++++def get_cached_causal_mask_with_cache_position(
-++++++    attention_mask: mindspore.Tensor,
-++++++    sequence_length: int,
-++++++    target_length: int,
-++++++    dtype: mindspore.dtype,
-++++++    min_dtype: float,
-++++++    cache_position: mindspore.Tensor,
-++++++    batch_size: int,
-++++++):
-++++++    """
-++++++    带缓存的 causal mask 构造函数
-++++++    """
-++++++    # q_len 是当前 query 长度
-++++++    q_len = sequence_length
-++++++    # kv_len 是 target_length
-++++++    kv_len = target_length
-++++++
-++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-++++++
-++++++    if key in _causal_mask_cache:
-++++++        return _causal_mask_cache[key]
-++++++
-++++++    # 调用原来的 mask 构造逻辑
-++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++++        attention_mask,
-++++++        sequence_length=sequence_length,
-++++++        target_length=target_length,
-++++++        dtype=dtype,
-++++++        min_dtype=min_dtype,
-++++++        cache_position=cache_position,
-++++++        batch_size=batch_size,
-++++++    )
-++++++    # 缓存结果
-++++++    _causal_mask_cache[key] = causal_mask
-++++++    return causal_mask
-+++++ 
-+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-+++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++++ 
-+++++ 
-+++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-++++++# class Qwen2MoeAttention(nn.Module):
-++++++#     """
-++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++++++#     and "Generating Long Sequences with Sparse Transformers".
-++++++#     """
-++++++
-++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++#         super().__init__()
-++++++#         self.config = config
-++++++#         self.layer_idx = layer_idx
-++++++#         if layer_idx is None:
-++++++#             logger.warning_once(
-++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++++#                 "when creating this class."
-++++++#             )
-++++++
-++++++#         self.hidden_size = config.hidden_size
-++++++#         self.num_heads = config.num_attention_heads
-++++++#         self.head_dim = self.hidden_size // self.num_heads
-++++++#         self.num_key_value_heads = config.num_key_value_heads
-++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++#         self.max_position_embeddings = config.max_position_embeddings
-++++++#         self.rope_theta = config.rope_theta
-++++++#         self.is_causal = True
-++++++#         self.attention_dropout = config.attention_dropout
-++++++
-++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++#             raise ValueError(
-++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-++++++#                 f" and `num_heads`: {self.num_heads})."
-++++++#             )
-++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++
-++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++#             self.head_dim,
-++++++#             max_position_embeddings=self.max_position_embeddings,
-++++++#             base=self.rope_theta,
-++++++#         )
-++++++
-++++++#     def forward(
-++++++#         self,
-++++++#         hidden_states: mindspore.Tensor,
-++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-++++++#         position_ids: Optional[mindspore.Tensor] = None,
-++++++#         past_key_value: Optional[Cache] = None,
-++++++#         output_attentions: bool = False,
-++++++#         use_cache: bool = False,
-++++++#         cache_position: Optional[mindspore.Tensor] = None,
-++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++
-++++++        
-++++++
-++++++#         bsz, q_len, _ = hidden_states.shape
-++++++
-++++++#         query_states = self.q_proj(hidden_states)
-++++++#         key_states = self.k_proj(hidden_states)
-++++++#         value_states = self.v_proj(hidden_states)
-++++++
-++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++
-++++++#         kv_seq_len = key_states.shape[-2]
-++++++#         if past_key_value is not None:
-++++++#             if self.layer_idx is None:
-++++++#                 raise ValueError(
-++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++#                     "with a layer index."
-++++++#                 )
-++++++#             if isinstance(past_key_value, StaticCache):
-++++++#                 kv_seq_len = key_states.shape[-2]
-++++++#             else:
-++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++
-++++++#         if past_key_value is not None:
-++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++            
-++++++#             if isinstance(past_key_value, StaticCache):
-++++++#                 kv_seq_len = key_states.shape[-2]
-++++++
-++++++#         # repeat k/v heads if n_kv_heads < n_heads
-++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++        
-++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++
-++++++#         if attention_mask is not None:
-++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++#             attn_weights = attn_weights + causal_mask
-++++++
-++++++#         # upcast attention to fp32
-++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++++#         attn_output = ops.matmul(attn_weights, value_states)
-++++++
-++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++++#             raise ValueError(
-++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++++++#                 f" {attn_output.shape}"
-++++++#             )
-++++++
-++++++#         attn_output = ops.transpose(attn_output, 1, 2)
-++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++
-++++++#         attn_output = self.o_proj(attn_output)
-++++++#         # @lwx
-++++++        
-++++++#         # max_seq_len = self.max_position_embeddings  # 2048
-++++++
-++++++#         # if attention_mask is not None:
-++++++#         #     # attention_mask: [B, 1, Sq, Sk]
-++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++
-++++++#         #     # pad 到 [max_seq_len, max_seq_len]
-++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++#         #     global_attention_mask = padded_mask
-++++++#         # else:
-++++++#         #     global_attention_mask = None
-++++++
-++++++
-++++++#         # sparse_mode=3
-++++++#         # attn_output = mindspore.ops.flash_attention_score(
-++++++#         #     query=query_states,
-++++++#         #     key=key_states,
-++++++#         #     value=value_states,
-++++++#         #     real_shift=None,
-++++++#         #     padding_mask=None,
-++++++
-++++++#         #     head_num=self.num_heads,
-++++++#         #     attn_mask=global_attention_mask,
-++++++#         #     keep_prob=1.0 - self.attention_dropout,
-++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++#         #     input_layout="BNSD",
-++++++#         #     pre_tokens=2147483647,
-++++++#         #     next_tokens=2147483647,
-++++++#         #     inner_precise=0,
-++++++#         #     drop_mask=None,
-++++++#         #     prefix=None,
-++++++#         #     actual_seq_qlen=None,
-++++++#         #     actual_seq_kvlen=None,
-++++++#         #     sparse_mode=sparse_mode,
-++++++#         # )
-++++++#         if not output_attentions:
-++++++#             attn_weights = None
-++++++
-++++++#         return attn_output, attn_weights, past_key_value
-++++++
-+++++ class Qwen2MoeAttention(nn.Module):
-+++++     """
-+++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+++++-    and "Generating Long Sequences with Sparse Transformers".
-+++++-    """
-++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-+++++ 
-++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-++++++
-++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-++++++    """
-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++         super().__init__()
-+++++         self.config = config
-+++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-+++++         if layer_idx is None:
-+++++             logger.warning_once(
-+++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++                 "when creating this class."
-+++++             )
-+++++ 
-+++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-+++++         use_cache: bool = False,
-+++++         cache_position: Optional[mindspore.Tensor] = None,
-+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-
-+++++         
-+++++-
-++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-+++++         bsz, q_len, _ = hidden_states.shape
-+++++ 
-+++++         query_states = self.q_proj(hidden_states)
-+++++         key_states = self.k_proj(hidden_states)
-+++++         value_states = self.v_proj(hidden_states)
-+++++ 
-+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++-
-++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++        
-+++++         kv_seq_len = key_states.shape[-2]
-+++++         if past_key_value is not None:
-+++++-            if self.layer_idx is None:
-+++++-                raise ValueError(
-+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-                    "with a layer index."
-+++++-                )
-+++++-            if isinstance(past_key_value, StaticCache):
-+++++-                kv_seq_len = key_states.shape[-2]
-+++++-            else:
-+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++        
-+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++ 
-+++++         if past_key_value is not None:
-+++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++
-++++++        # --- 2. 动态调度核心注意力计算 ---
-++++++        global Long_Prompt
-++++++        if Long_Prompt >= 1:
-++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-++++++            fa_attention_mask = None
-++++++            if attention_mask is not None:
-++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++                fa_attention_mask = (mask_slice != 0)
-++++++
-++++++            attn_output = mindspore.ops.flash_attention_score(
-++++++                query=query_states,
-++++++                key=key_states,
-++++++                value=value_states,
-++++++                head_num=self.num_heads,
-++++++                attn_mask=fa_attention_mask,
-++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++                input_layout="BNSD",
-++++++                sparse_mode=0,
-++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-++++++            )
-+++++             
-+++++-            if isinstance(past_key_value, StaticCache):
-+++++-                kv_seq_len = key_states.shape[-2]
-++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++            attn_output = self.o_proj(attn_output)
-++++++            attn_weights = None
-++++++            if output_attentions:
-++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-+++++ 
-+++++-        # repeat k/v heads if n_kv_heads < n_heads
-+++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++-        
-+++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++        else:
-++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
-++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++            
-++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++ 
-+++++-        if attention_mask is not None:
-+++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++-            attn_weights = attn_weights + causal_mask
-++++++            if attention_mask is not None:
-++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++                attn_weights = attn_weights + causal_mask
-+++++ 
-+++++-        # upcast attention to fp32
-+++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++++-        attn_output = ops.matmul(attn_weights, value_states)
-++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++++            attn_output = ops.matmul(attn_weights, value_states)
-+++++ 
-+++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++++-            raise ValueError(
-+++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+++++-                f" {attn_output.shape}"
-+++++-            )
-++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++++                raise ValueError(
-++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-++++++                )
-+++++ 
-+++++-        attn_output = ops.transpose(attn_output, 1, 2)
-+++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++            attn_output = ops.transpose(attn_output, 1, 2)
-++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++            attn_output = self.o_proj(attn_output)
-+++++ 
-+++++-        attn_output = self.o_proj(attn_output)
-+++++-        # @lwx
-++++++            if not output_attentions:
-++++++                attn_weights = None
-+++++         
-+++++-        # max_seq_len = self.max_position_embeddings  # 2048
-+++++-
-+++++-        # if attention_mask is not None:
-+++++-        #     # attention_mask: [B, 1, Sq, Sk]
-+++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++-
-+++++-        #     # pad 到 [max_seq_len, max_seq_len]
-+++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++-        #     global_attention_mask = padded_mask
-+++++-        # else:
-+++++-        #     global_attention_mask = None
-+++++-
-+++++-
-+++++-        # sparse_mode=3
-+++++-        # attn_output = mindspore.ops.flash_attention_score(
-+++++-        #     query=query_states,
-+++++-        #     key=key_states,
-+++++-        #     value=value_states,
-+++++-        #     real_shift=None,
-+++++-        #     padding_mask=None,
-+++++-
-+++++-        #     head_num=self.num_heads,
-+++++-        #     attn_mask=global_attention_mask,
-+++++-        #     keep_prob=1.0 - self.attention_dropout,
-+++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-        #     input_layout="BNSD",
-+++++-        #     pre_tokens=2147483647,
-+++++-        #     next_tokens=2147483647,
-+++++-        #     inner_precise=0,
-+++++-        #     drop_mask=None,
-+++++-        #     prefix=None,
-+++++-        #     actual_seq_qlen=None,
-+++++-        #     actual_seq_kvlen=None,
-+++++-        #     sparse_mode=sparse_mode,
-+++++-        # )
-+++++-        if not output_attentions:
-+++++-            attn_weights = None
-+++++-
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++-
-+++++ # class Qwen2MoeFlashAttention(nn.Module):
-+++++ #     """
-+++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-+++++ #             return final_hidden_states, router_logits
-+++++ 
-+++++ 
-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-#     """
-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+++++-#     """
-+++++-#     def __init__(self, config: Qwen2MoeConfig):
-+++++-#         super().__init__()
-+++++-#         self.num_experts = config.num_experts
-+++++-#         self.top_k = config.num_experts_per_tok
-+++++-#         self.norm_topk_prob = config.norm_topk_prob
-+++++-
-+++++-#         # 门控网络
-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++-#         # 专家列表
-+++++-#         self.experts = nn.ModuleList(
-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++-#         )
-+++++-#         # 共享专家
-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_decode(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         【解码路径】针对 sequence_length=1 的极致优化。
-+++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+++++-#         """
-+++++-#         batch_size, hidden_dim = hidden_states.shape
-+++++-        
-+++++-#         expert_outputs_list = [
-+++++-#             ops.cat([
-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++-#             ], dim=0) 
-+++++-#             for i in range(batch_size)
-+++++-#         ]
-+++++-        
-+++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+++++-#         # shape: (batch_size, top_k, hidden_dim)
-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++-        
-+++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++-        
-+++++-#         return moe_output.squeeze(1)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_prefill(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         【预填充路径】针对 sequence_length > 1 的优化。
-+++++-#         按专家对 Token 进行分组，并进行批处理。
-+++++-#         """
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens = hidden_states.shape[0]
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-        
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-        
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-        
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-            
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-#             selected_token_indices = token_indices[mask]
-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++-            
-+++++-#             current_states = hidden_states[selected_token_indices]
-+++++-            
-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-            
-+++++-#             moe_output = moe_output.index_add(
-+++++-#                 dim=0,
-+++++-#                 index=selected_token_indices,
-+++++-#                 source=expert_output.to(hidden_states.dtype)
-+++++-#             )
-+++++-#         return moe_output
-+++++-
-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         顶层 forward 方法，作为智能分发器。
-+++++-#         """
-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        
-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-#         router_logits = self.gate(hidden_states_reshaped)
-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-
-+++++-#         if self.norm_topk_prob:
-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        
-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-        
-+++++-#         moe_output = None
-+++++-#         # 在推理时，根据序列长度选择最优路径
-+++++-#         if not self.training:
-+++++-#             if sequence_length == 1:
-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++-#             else:
-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++-#         else:
-+++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+++++-#             raise NotImplementedError("Training path is not implemented.")
-+++++-
-+++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+++++-        
-+++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+++++-        
-+++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+++++-        
-+++++-#         return final_hidden_states, router_logits
-+++++-
-+++++-
-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-#     """
-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+++++-#     """
-+++++-#     def __init__(self, config: Qwen2MoeConfig):
-+++++-#         super().__init__()
-+++++-#         self.num_experts = config.num_experts
-+++++-#         self.top_k = config.num_experts_per_tok
-+++++-#         self.norm_topk_prob = config.norm_topk_prob
-+++++-
-+++++-#         # 门控网络
-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++-#         # 专家列表
-+++++-#         self.experts = nn.ModuleList(
-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++-#         )
-+++++-#         # 共享专家
-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_decode(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         batch_size, _ = hidden_states.shape
-+++++-#         expert_outputs_list = [
-+++++-#             ops.cat([
-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++-#             ], dim=0) 
-+++++-#             for i in range(batch_size)
-+++++-#         ]
-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++-#         return moe_output.squeeze(1)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_prefill(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens = hidden_states.shape[0]
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-        
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-#             selected_token_indices = token_indices[mask]
-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++-#             current_states = hidden_states[selected_token_indices]
-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-#             moe_output = moe_output.index_add(
-+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++-#             )
-+++++-#         return moe_output
-+++++-
-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         顶层 forward 方法，作为智能分发器。
-+++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+++++-#         """
-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        
-+++++-#         # 1. 门控计算 (通用逻辑)
-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-#         router_logits = self.gate(hidden_states_reshaped)
-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-
-+++++-#         if self.norm_topk_prob:
-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        
-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-        
-+++++-#         # 2. 智能分发到最优 MoE 路径
-+++++-#         moe_output = None
-+++++-#         if not self.training:
-+++++-#             if sequence_length == 1:
-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++-#             else:
-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++-#         else:
-+++++-#             raise NotImplementedError("Training path is not implemented.")
-+++++-
-+++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++-        
-+++++-#         # 4. 合并 MoE 输出和共享专家输出
-+++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++-        
-+++++-#         # 5. 恢复原始形状并返回
-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++-        
-+++++-#         return final_hidden_states, router_logits
-+++++-
-+++++-# prefill fastest
-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-#     """
-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+++++-#     """
-+++++-#     def __init__(self, config: Qwen2MoeConfig):
-+++++-#         super().__init__()
-+++++-#         self.num_experts = config.num_experts
-+++++-#         self.top_k = config.num_experts_per_tok
-+++++-#         self.norm_topk_prob = config.norm_topk_prob
-+++++-
-+++++-#         # 门控网络
-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++-#         # 专家列表
-+++++-#         self.experts = nn.ModuleList(
-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++-#         )
-+++++-#         # 共享专家
-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_dispatch(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+++++-#         """
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens, _ = hidden_states.shape
-+++++-        
-+++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-#         flat_routing_weights = routing_weights.flatten()
-+++++-
-+++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-
-+++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-        
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-            
-+++++-#             # 找到所有分配给该专家的 token
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-            
-+++++-#             # 使用 mask 选取对应的 token 和权重
-+++++-#             current_token_indices = token_indices[mask]
-+++++-#             current_routing_weights = flat_routing_weights[mask]
-+++++-#             current_hidden_states = hidden_states[current_token_indices]
-+++++-            
-+++++-#             # 对这些 token 进行批处理
-+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++-            
-+++++-#             # 使用 index_add 将结果精确地加回到对应位置
-+++++-#             moe_output = moe_output.index_add(
-+++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++-#             )
-+++++-#         return moe_output
-+++++-
-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         顶层 forward 方法，作为智能分发器。
-+++++-#         """
-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        
-+++++-#         # 1. 门控计算
-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-#         router_logits = self.gate(hidden_states_reshaped)
-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-
-+++++-#         if self.norm_topk_prob:
-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        
-+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-        
-+++++-#         # 2. 调用统一的 MoE 计算内核
-+++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-+++++-
-+++++-#         # 3. 统一处理共享专家
-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++-        
-+++++-#         # 4. 合并输出
-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++-        
-+++++-#         # 5. 恢复原始形状并返回
-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++-        
-+++++-#         return final_hidden_states, router_logits
-+++++-
-+++++-
-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-#     """
-+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++-#     【最终高性能与高精度版】：
-+++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+++++-#     3. 这样实现了速度和准确性的两全其美。
-+++++-#     """
-+++++-#     def __init__(self, config: Qwen2MoeConfig):
-+++++-#         super().__init__()
-+++++-#         self.num_experts = config.num_experts
-+++++-#         self.top_k = config.num_experts_per_tok
-+++++-#         self.norm_topk_prob = config.norm_topk_prob
-+++++-
-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++-#         self.experts = nn.ModuleList(
-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++-#         )
-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_decode(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         【解码路径】极致优化版：bmm + 高精度累加。
-+++++-#         """
-+++++-#         original_dtype = hidden_states.dtype
-+++++-#         batch_size, _ = hidden_states.shape
-+++++-        
-+++++-#         expert_outputs_list = [
-+++++-#             ops.cat([
-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++-#             ], dim=0) 
-+++++-#             for i in range(batch_size)
-+++++-#         ]
-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++-
-+++++-#         # 在 float32 下执行 bmm，得到高精度结果
-+++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++-        
-+++++-#         # 将高精度结果转换回原始数据类型
-+++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+++++-        
-+++++-#         return moe_output
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_prefill(
-+++++-#         self, 
-+++++-#         hidden_states: mindspore.Tensor, 
-+++++-#         selected_experts: mindspore.Tensor, 
-+++++-#         routing_weights: mindspore.Tensor
-+++++-#     ) -> mindspore.Tensor:
-+++++-#         """
-+++++-#         【预填充路径】与原始实现一致，结果精确。
-+++++-#         """
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens, _ = hidden_states.shape
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-        
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-#             selected_token_indices = token_indices[mask]
-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++-#             current_states = hidden_states[selected_token_indices]
-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-#             moe_output = moe_output.index_add(
-+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++-#             )
-+++++-#         return moe_output
-+++++-
-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-        
-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-#         router_logits = self.gate(hidden_states_reshaped)
-+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-
-+++++-#         if self.norm_topk_prob:
-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        
-+++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+++++-#         # 如果模型主体是 float16，后续再转换
-+++++-        
-+++++-#         moe_output = None
-+++++-#         if not self.training:
-+++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+++++-#             # _moe_infer_decode 内部会处理好类型转换
-+++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-#             if sequence_length == 1:
-+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++-#             else:
-+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++-#         else:
-+++++-#             raise NotImplementedError("Training path is not implemented.")
-+++++-
-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++-        
-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++-        
-+++++-#         return final_hidden_states, router_logits
-+++++-    
-+++++-
-+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-#     """
-+++++-#     【融合版】一个混合专家模块，内置两种推理策略，
-+++++-#     由外部全局变量 `Long_Prompt` 控制：
-+++++-
-+++++-#     - if Long_Prompt is True:  【精度优先模式】
-+++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+++++-#       适用于处理长序列，避免误差累积。
-+++++-
-+++++-#     - if Long_Prompt is False: 【速度优先模式】
-+++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
-+++++-#     """
-+++++-#     def __init__(self, config: Qwen2MoeConfig):
-+++++-#         super().__init__()
-+++++-#         self.num_experts = config.num_experts
-+++++-#         self.top_k = config.num_experts_per_tok
-+++++-#         self.norm_topk_prob = config.norm_topk_prob
-+++++-
-+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++-#         self.experts = nn.ModuleList(
-+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++-#         )
-+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++-
-+++++-#     # --- 速度优先模式的辅助函数 ---
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++-#         original_dtype = hidden_states.dtype
-+++++-#         batch_size, _ = hidden_states.shape
-+++++-#         expert_outputs_list = [
-+++++-#             ops.cat([
-+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++-#             ], dim=0) 
-+++++-#             for i in range(batch_size)
-+++++-#         ]
-+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
-+++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++-
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens, _ = hidden_states.shape
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-#             selected_token_indices = token_indices[mask]
-+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++-#             current_states = hidden_states[selected_token_indices]
-+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++-#         return moe_output
-+++++-
-+++++-#     # --- 精度优先模式的辅助函数 ---
-+++++-#     @no_grad()
-+++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++-#         moe_output = ops.zeros_like(hidden_states)
-+++++-#         num_tokens, _ = hidden_states.shape
-+++++-#         flat_selected_experts = selected_experts.flatten()
-+++++-#         flat_routing_weights = routing_weights.flatten()
-+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++-#         active_experts = ops.unique(flat_selected_experts)
-+++++-#         for expert_idx_tensor in active_experts:
-+++++-#             expert_idx = expert_idx_tensor.item()
-+++++-#             expert_layer = self.experts[expert_idx]
-+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++-#             current_token_indices = token_indices[mask]
-+++++-#             current_routing_weights = flat_routing_weights[mask]
-+++++-#             current_hidden_states = hidden_states[current_token_indices]
-+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++-#         return moe_output
-+++++-
-+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
-+++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+++++-#         global Long_Prompt
-+++++-
-+++++-#         # 1. 门控计算 (所有模式通用)
-+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-#         router_logits = self.gate(hidden_states_reshaped)
-+++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++++-#         if self.norm_topk_prob:
-+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-        
-+++++-#         moe_output = None
-+++++-#         if not self.training:
-+++++-#             # 根据 Long_Prompt 标志选择模式
-+++++-#             if Long_Prompt:
-+++++-#                 # --- 精度优先模式 ---
-+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++-#             else:
-+++++-#                 # --- 速度优先模式 ---
-+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++-#                 if sequence_length == 1:
-+++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++-#                 else:
-+++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++-#         else:
-+++++-#             raise NotImplementedError("Training path is not implemented.")
-+++++-
-+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++-        
-+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++-        
-+++++-#         return final_hidden_states, router_logits
-+++++-    
-+++++ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++     """
-+++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++ 
-++++++    # @no_grad()
-++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++    #     num_tokens, _ = hidden_states.shape
-++++++    #     flat_selected_experts = selected_experts.flatten()
-++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
-++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++++    #     original_token_indices = sorted_expert_indices // self.top_k
-++++++    #     moe_output = ops.zeros_like(hidden_states)
-++++++    #     current_token_offset = 0
-++++++    #     for i in range(self.num_experts):
-++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-++++++    #         if expert_token_count == 0:
-++++++    #             continue
-++++++    #         end_offset = current_token_offset + expert_token_count
-++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++    #         current_token_offset += expert_token_count
-++++++    #     return moe_output
-++++++
-+++++     @no_grad()
-+++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++-        num_tokens, _ = hidden_states.shape
-+++++-        flat_selected_experts = selected_experts.flatten()
-+++++-        sorted_expert_indices = flat_selected_experts.argsort()
-+++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++++-        original_token_indices = sorted_expert_indices // self.top_k
-++++++        """
-++++++        优化版 MoE prefill (速度优先模式)：
-++++++        - 批量张量化处理同一个 expert 的所有 token
-++++++        - 跳过无 token 的专家
-++++++        - 保持结果完全一致
-++++++        """
-+++++         moe_output = ops.zeros_like(hidden_states)
-+++++-        current_token_offset = 0
-+++++-        for i in range(self.num_experts):
-+++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
-+++++-            if expert_token_count == 0:
-+++++-                continue
-+++++-            end_offset = current_token_offset + expert_token_count
-+++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
-+++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++-            current_token_offset += expert_token_count
-++++++
-++++++        flat_selected_experts = selected_experts.flatten()
-++++++        flat_routing_weights = routing_weights.flatten()
-++++++
-++++++        idxs = flat_selected_experts.argsort()
-++++++        sorted_expert_indices = flat_selected_experts[idxs]
-++++++        sorted_token_indices = idxs // self.top_k
-++++++
-++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-++++++
-++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++++++
-++++++        for expert_id in active_experts.tolist():
-++++++            start = int(tokens_per_expert[:expert_id].sum().item())
-++++++            end = start + int(tokens_per_expert[expert_id].item())
-++++++
-++++++            token_idx = sorted_token_indices[start:end]
-++++++            expert_tokens = hidden_states[token_idx]
-++++++
-++++++            expert_out = self.experts[expert_id](expert_tokens)
-++++++
-++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-++++++
-++++++            moe_output = mindspore.mint.scatter_add(
-++++++                moe_output,
-++++++                0,
-++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-++++++                scaled_out.to(hidden_states.dtype)
-++++++            )
-++++++
-+++++         return moe_output
-+++++ 
-++++++
-+++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+++++     @no_grad()
-+++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++         
-+++++         moe_output = None
-+++++-        if Long_Prompt:
-+++++-            # --- 精度优先模式 (ACCURACY MODE) ---
-+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        # if Long_Prompt==0:
-++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        # else:
-++++++        #     # --- 速度优先模式 (SPEED MODE) ---
-++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++        #     if sequence_length == 1:
-++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        #     else:
-++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++        
-++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++        if sequence_length == 1:
-++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++         else:
-+++++-            # --- 速度优先模式 (SPEED MODE) ---
-+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++-            if sequence_length == 1:
-+++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++-            else:
-+++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++-        
-++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++    
-+++++ 
-+++++         # 3. 共享专家计算与合并 (所有模式通用)
-+++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++         
-+++++         return final_hidden_states, router_logits
-+++++ 
-++++++
-+++++ class Qwen2MoeDecoderLayer(nn.Module):
-+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-+++++         super().__init__()
-+++++         self.hidden_size = config.hidden_size
-+++++         
-+++++-        # if Long_Prompt:
-+++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++-        # else:
-++++++        # if Long_Prompt == 2:
-+++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++        # else:
-++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++ 
-+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++ 
-+++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++++             )
-+++++ 
-+++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-+++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++++        #     attention_mask,
-++++++        #     sequence_length=sequence_length,
-++++++        #     target_length=target_length,
-++++++        #     dtype=dtype,
-++++++        #     min_dtype=min_dtype,
-++++++        #     cache_position=cache_position,
-++++++        #     batch_size=input_tensor.shape[0],
-++++++        # )
-++++++        #@dwj
-++++++        causal_mask = get_cached_causal_mask_with_cache_position(
-+++++             attention_mask,
-+++++             sequence_length=sequence_length,
-+++++             target_length=target_length,
-+++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+++++         """
-+++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-++++++        _causal_mask_cache.clear()
-+++++ 
-+++++         input_ids = kwargs.get("input_ids")
-+++++         if input_ids is None and args:
-+++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++ 
-+++++         if input_ids is not None:
-+++++             prompt_length = input_ids.shape[1]
-+++++-            
-+++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+++++-                Long_Prompt = True
-++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-++++++                Long_Prompt = 2
-++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-++++++                Long_Prompt = 0
-+++++             else:
-+++++-                Long_Prompt = False
-++++++                Long_Prompt = 1
-++++++
-+++++ 
-+++++         return super().generate(*args, **kwargs)
-+++++     
-+++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++             dtype = self.lm_head.weight.dtype
-+++++             min_dtype = float(ops.finfo(dtype).min)
-+++++ 
-+++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-++++++            #     attention_mask,
-++++++            #     sequence_length=sequence_length,
-++++++            #     target_length=past_key_values.get_max_length(),
-++++++            #     dtype=dtype,
-++++++            #     min_dtype=min_dtype,
-++++++            #     cache_position=cache_position,
-++++++            #     batch_size=batch_size,
-++++++            # )
-++++++
-++++++            #@dwj
-++++++            attention_mask = get_cached_causal_mask_with_cache_position(
-+++++                 attention_mask,
-+++++                 sequence_length=sequence_length,
-+++++                 target_length=past_key_values.get_max_length(),
-+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++++deleted file mode 100644
-+++++index 6dfb5b93..00000000
-+++++--- a/patches/0001-20251104commit.patch
-++++++++ /dev/null
-+++++@@ -1,1272 +0,0 @@
-+++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++++-From: Pinoeer-kingxi <13022943007@163.com>
-+++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++++-Subject: [PATCH] 20251104commit
-+++++-
-+++++----
-+++++- mindnlp/transformers/cache_utils.py           |  28 +-
-+++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++++- 3 files changed, 976 insertions(+), 87 deletions(-)
-+++++-
-+++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++++-index cadd2e04..02f8d4be 100644
-+++++---- a/mindnlp/transformers/cache_utils.py
-+++++-+++ b/mindnlp/transformers/cache_utils.py
-+++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++++-             # k_out[:, :, cache_position] = key_states
-+++++-             # v_out[:, :, cache_position] = value_states
-+++++--            if ON_ORANGE_PI:
-+++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++--            else:
-+++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++--
-+++++-+            # if ON_ORANGE_PI:
-+++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++-+            # else:
-+++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
-+++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++++-+            if cache_position.ndim > 1:
-+++++-+                cache_position = cache_position.flatten()
-+++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++++-+                cache_position = cache_position.int()
-+++++-+            
-+++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++++-+            k_out[:, :, cache_position] = key_states
-+++++-+            v_out[:, :, cache_position] = value_states
-+++++-+            
-+++++-         return k_out, v_out
-+++++- 
-+++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++-index c695b944..d8303e45 100644
-+++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++++- def rotate_half(x):
-+++++-     """Rotates half the hidden dims of the input."""
-+++++--    x1 = x[..., : x.shape[-1] // 2]
-+++++--    x2 = x[..., x.shape[-1] // 2 :]
-+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++-+    # x1 = x[..., : x.shape[-1] // 2]
-+++++-+    # x2 = x[..., x.shape[-1] // 2 :]
-+++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++-     return ops.cat((-x2, x1), dim=-1)
-+++++- 
-+++++- 
-+++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++++-         if self.training:
-+++++-             raise NotImplementedError("Training is not supported yet.")
-+++++-         else:
-+++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++--        if self.config.n_shared_experts is not None:
-+++++--            y = y + self.shared_experts(identity)
-+++++--        return y
-+++++-+            # @lwx
-+++++-+            if orig_shape[1] == 1:
-+++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++++-+                y=y.view(*orig_shape)
-+++++-+                if self.config.n_shared_experts is not None:
-+++++-+                    y = y + self.shared_experts(identity)
-+++++-+                return y
-+++++-+            else:
-+++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++++-+                if self.config.n_shared_experts is not None:
-+++++-+                    y = y + self.shared_experts(identity)
-+++++-+                return y
-+++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++-+        # if self.config.n_shared_experts is not None:
-+++++-+        #     y = y + self.shared_experts(identity)
-+++++-+        # return y
-+++++-+        
-+++++-+    @no_grad()
-+++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++-+
-+++++-+        expert_cache = ops.zeros_like(x)
-+++++-+        for i in range(self.num_experts_per_tok):
-+++++-+            expert_id = flat_expert_indices[i].item()
-+++++-+            weight = flat_expert_weights[i].item()
-+++++-+            expert = self.experts[expert_id]
-+++++-+            expert_out = expert(x)
-+++++-+            expert_cache += expert_out * weight
-+++++-+        return expert_cache
-+++++- 
-+++++-     @no_grad()
-+++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++--        # expert_cache = torch.zeros_like(x)
-+++++--        # idxs = flat_expert_indices.argsort()
-+++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++--        # token_idxs = idxs // self.num_experts_per_tok
-+++++--        # for i, end_idx in enumerate(tokens_per_expert):
-+++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++--        #     if start_idx == end_idx:
-+++++--        #         continue
-+++++--        #     expert = self.experts[i]
-+++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++--        #     expert_tokens = x[exp_token_idx]
-+++++--        #     expert_out = expert(expert_tokens)
-+++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++--        # return expert_cache
-+++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++-         expert_cache = ops.zeros_like(x)
-+++++-         idxs = flat_expert_indices.argsort()
-+++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++-         token_idxs = idxs // self.num_experts_per_tok
-+++++-+
-+++++-         for i, end_idx in enumerate(tokens_per_expert):
-+++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++-             if start_idx == end_idx:
-+++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++++-             expert_out = expert(expert_tokens)
-+++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++-+
-+++++-         return expert_cache
-+++++-+        
-+++++-+    # @no_grad()
-+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-+    #     # expert_cache = torch.zeros_like(x)
-+++++-+    #     # idxs = flat_expert_indices.argsort()
-+++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++-+    #     #     if start_idx == end_idx:
-+++++-+    #     #         continue
-+++++-+    #     #     expert = self.experts[i]
-+++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-+    #     #     expert_tokens = x[exp_token_idx]
-+++++-+    #     #     expert_out = expert(expert_tokens)
-+++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++-+    #     # return expert_cache
-+++++-+    #     expert_cache = ops.zeros_like(x)
-+++++-+    #     idxs = flat_expert_indices.argsort()
-+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-+++++-+
-+++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
-+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++-+    #         if start_idx == end_idx:
-+++++-+    #             continue
-+++++-+    #         expert = self.experts[i]
-+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-+    #         expert_tokens = x[exp_token_idx]
-+++++-+    #         expert_out = expert(expert_tokens)
-+++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++-+
-+++++-+    #     return expert_cache
-+++++-+    # @no_grad()
-+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++-+    #     expert_cache = ops.zeros_like(x)
-+++++-+
-+++++-+    #     # 排序保证顺序一致
-+++++-+    #     idxs = flat_expert_indices.argsort()
-+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-+++++-+
-+++++-+    #     # 找出有 token 的专家
-+++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++-+
-+++++-+    #     for i in active_experts.tolist():
-+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++-+    #         end_idx = tokens_per_expert[i]
-+++++-+    #         if start_idx == end_idx:  # 没有 token
-+++++-+    #             continue
-+++++-+
-+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++-+    #         expert_tokens = x[exp_token_idx]
-+++++-+    #         expert_out = self.experts[i](expert_tokens)
-+++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++-+
-+++++-+    #         expert_cache = mindspore.mint.scatter_add(
-+++++-+    #             expert_cache,
-+++++-+    #             0,
-+++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++-+    #             expert_out
-+++++-+    #         )
-+++++-+
-+++++-+    #     return expert_cache
-+++++-+
-+++++-+
-+++++- 
-+++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++++- #     """
-+++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++- 
-+++++-         # Initialize weights and apply final processing
-+++++-         self.post_init()
-+++++-+        self.warm_up = False
-+++++-+
-+++++-+    def warmup_moe_model_deep(self):
-+++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++-+        test_texts = [
-+++++-+            "warmup short",
-+++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++++-+        ]
-+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++-+        if tokenizer is None:
-+++++-+            from mindnlp.transformers import AutoTokenizer
-+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++-+            self._warmup_tokenizer = tokenizer
-+++++-+
-+++++-+        for text in test_texts:
-+++++-+            inputs = tokenizer(text, return_tensors="ms")
-+++++-+            with mindspore._no_grad():
-+++++-+                _ = self(**inputs, use_cache=False)
-+++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++++- 
-+++++-     def get_input_embeddings(self):
-+++++-         return self.model.embed_tokens
-+++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++-         ```"""
-+++++-+        if not self.warm_up:
-+++++-+            self.warm_up = True
-+++++-+            self.warmup_moe_model_deep()
-+++++-+
-+++++-         output_attentions = (
-+++++-             output_attentions
-+++++-             if output_attentions is not None
-+++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++-index 3cbf820e..d4c6b651 100644
-+++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++-@@ -18,7 +18,6 @@
-+++++- # See the License for the specific language governing permissions and
-+++++- # limitations under the License.
-+++++- """MindSpore Qwen2MoE model."""
-+++++--
-+++++- import math
-+++++- from typing import List, Optional, Tuple, Union
-+++++- 
-+++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++++-     TokenClassifierOutput,
-+++++- )
-+++++- from ...modeling_utils import PreTrainedModel
-+++++-+from ...generation import GenerationMixin
-+++++- from ....utils import logging
-+++++- from .configuration_qwen2_moe import Qwen2MoeConfig
-+++++- 
-+++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++++-         self.variance_epsilon = eps
-+++++- 
-+++++-     def forward(self, hidden_states):
-+++++-+        # @dwj
-+++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++-+        # @lwx
-+++++-+        # if not self.training :
-+++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++-         input_dtype = hidden_states.dtype
-+++++-         hidden_states = hidden_states.to(mindspore.float32)
-+++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++++-@@ -234,6 +239,8 @@ def rotate_half(x):
-+++++-     """Rotates half the hidden dims of the input."""
-+++++-     x1 = x[..., : x.shape[-1] // 2]
-+++++-     x2 = x[..., x.shape[-1] // 2 :]
-+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++-     return ops.cat((-x2, x1), dim=-1)
-+++++- 
-+++++- 
-+++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++++-         self.config = config
-+++++-         self.hidden_size = config.hidden_size
-+++++-         self.intermediate_size = intermediate_size
-+++++-+        
-+++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++++-         self.act_fn = ACT2FN[config.hidden_act]
-+++++- 
-+++++-     def forward(self, x):
-+++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++--
-+++++- 
-+++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++-+        # @lwx 
-+++++-+        # gate_up_output = self.gate_up_proj(x)
-+++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++++-+        # return self.down_proj(swiglu_output)
-+++++-+
-+++++-+    # def forward(self, x):
-+++++-+    #     gate_proj_out = self.gate_proj(x)
-+++++-+    #     up_proj_out = self.up_proj(x)
-+++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++++-+    #     return self.down_proj(swiglu_out)
-+++++-+        
-+++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++++-     """
-+++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++++-         use_cache: bool = False,
-+++++-         cache_position: Optional[mindspore.Tensor] = None,
-+++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-+
-+++++-+        
-+++++-+
-+++++-         bsz, q_len, _ = hidden_states.shape
-+++++- 
-+++++-         query_states = self.q_proj(hidden_states)
-+++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-                     "with a layer index."
-+++++-                 )
-+++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-+            if isinstance(past_key_value, StaticCache):
-+++++-+                kv_seq_len = key_states.shape[-2]
-+++++-+            else:
-+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++- 
-+++++-         if past_key_value is not None:
-+++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++-+            
-+++++-+            if isinstance(past_key_value, StaticCache):
-+++++-+                kv_seq_len = key_states.shape[-2]
-+++++- 
-+++++-         # repeat k/v heads if n_kv_heads < n_heads
-+++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++--
-+++++-+        
-+++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++- 
-+++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++++--            raise ValueError(
-+++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++++--                f" {attn_weights.shape}"
-+++++--            )
-+++++--
-+++++--        if attention_mask is not None:  # no matter the length, we just slice it
-+++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++++-+        if attention_mask is not None:
-+++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++-             attn_weights = attn_weights + causal_mask
-+++++- 
-+++++-         # upcast attention to fp32
-+++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++- 
-+++++-         attn_output = self.o_proj(attn_output)
-+++++--
-+++++-+        # @lwx
-+++++-+        
-+++++-+        # max_seq_len = self.max_position_embeddings  # 2048
-+++++-+
-+++++-+        # if attention_mask is not None:
-+++++-+        #     # attention_mask: [B, 1, Sq, Sk]
-+++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++-+
-+++++-+        #     # pad 到 [max_seq_len, max_seq_len]
-+++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++-+        #     global_attention_mask = padded_mask
-+++++-+        # else:
-+++++-+        #     global_attention_mask = None
-+++++-+
-+++++-+
-+++++-+        # sparse_mode=3
-+++++-+        # attn_output = mindspore.ops.flash_attention_score(
-+++++-+        #     query=query_states,
-+++++-+        #     key=key_states,
-+++++-+        #     value=value_states,
-+++++-+        #     real_shift=None,
-+++++-+        #     padding_mask=None,
-+++++-+
-+++++-+        #     head_num=self.num_heads,
-+++++-+        #     attn_mask=global_attention_mask,
-+++++-+        #     keep_prob=1.0 - self.attention_dropout,
-+++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-+        #     input_layout="BNSD",
-+++++-+        #     pre_tokens=2147483647,
-+++++-+        #     next_tokens=2147483647,
-+++++-+        #     inner_precise=0,
-+++++-+        #     drop_mask=None,
-+++++-+        #     prefix=None,
-+++++-+        #     actual_seq_qlen=None,
-+++++-+        #     actual_seq_kvlen=None,
-+++++-+        #     sparse_mode=sparse_mode,
-+++++-+        # )
-+++++-         if not output_attentions:
-+++++-             attn_weights = None
-+++++- 
-+++++-         return attn_output, attn_weights, past_key_value
-+++++- 
-+++++- 
-+++++-+class Qwen2MoeFlashAttention(nn.Module):
-+++++-+    """
-+++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++-+
-+++++-+    关键改动:
-+++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++-+       直接传入原始的 key 和 value 张量效率更高。
-+++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++-+    """
-+++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++-+        super().__init__()
-+++++-+        self.config = config
-+++++-+        self.layer_idx = layer_idx
-+++++-+        self.hidden_size = config.hidden_size
-+++++-+        self.num_heads = config.num_attention_heads
-+++++-+        self.head_dim = self.hidden_size // self.num_heads
-+++++-+        self.num_key_value_heads = config.num_key_value_heads
-+++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++-+        self.max_position_embeddings = config.max_position_embeddings
-+++++-+        self.rope_theta = config.rope_theta
-+++++-+        self.attention_dropout = config.attention_dropout
-+++++-+
-+++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++-+            raise ValueError(
-+++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++-+            )
-+++++-+
-+++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++-+
-+++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++-+            self.head_dim,
-+++++-+            max_position_embeddings=self.max_position_embeddings,
-+++++-+            base=self.rope_theta,
-+++++-+        )
-+++++-+
-+++++-+    def forward(
-+++++-+        self,
-+++++-+        hidden_states: mindspore.Tensor,
-+++++-+        attention_mask: Optional[mindspore.Tensor] = None,
-+++++-+        position_ids: Optional[mindspore.Tensor] = None,
-+++++-+        past_key_value: Optional[Cache] = None,
-+++++-+        output_attentions: bool = False,
-+++++-+        use_cache: bool = False,
-+++++-+        cache_position: Optional[mindspore.Tensor] = None,
-+++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-+
-+++++-+        bsz, q_len, _ = hidden_states.shape
-+++++-+
-+++++-+        # 1. 线性投射 Q, K, V
-+++++-+        query_states = self.q_proj(hidden_states)
-+++++-+        key_states = self.k_proj(hidden_states)
-+++++-+        value_states = self.v_proj(hidden_states)
-+++++-+
-+++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+
-+++++-+        # 3. RoPE 旋转位置编码
-+++++-+        kv_seq_len = key_states.shape[-2]
-+++++-+        if past_key_value is not None:
-+++++-+            if self.layer_idx is None:
-+++++-+                raise ValueError(
-+++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-+                    "with a layer index."
-+++++-+                )
-+++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++-+                if cache_position.shape[0] == 1:
-+++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++-+                    kv_seq_len = past_seen_tokens + 1
-+++++-+                else:
-+++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++-+            else:
-+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-+
-+++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-+
-+++++-+        # 4. KV 缓存更新
-+++++-+        if past_key_value is not None:
-+++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-+            key_states, value_states = past_key_value.update(
-+++++-+                key_states, value_states, self.layer_idx, cache_kwargs
-+++++-+            )
-+++++-+            
-+++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++-+                if cache_position.shape[0] == 1:
-+++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++-+                    kv_seq_len = key_states.shape[-2]
-+++++-+
-+++++-+        # 5. [重要] 准备 Attention Mask
-+++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++-+        fa_attention_mask = None
-+++++-+        if attention_mask is not None:
-+++++-+            # 截取与当前key长度匹配的部分
-+++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++-+            fa_attention_mask = (mask_slice != 0)
-+++++-+
-+++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++-+        input_dtype = query_states.dtype
-+++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++-+            query_states = query_states.to(mindspore.float16)
-+++++-+            key_states = key_states.to(mindspore.float16)
-+++++-+            value_states = value_states.to(mindspore.float16)
-+++++-+
-+++++-+        # 6. [核心] 调用 flash_attention_score 算子
-+++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++-+        attn_output = mindspore.ops.flash_attention_score(
-+++++-+            query=query_states,
-+++++-+            key=key_states,
-+++++-+            value=value_states,
-+++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++-+            attn_mask=fa_attention_mask,
-+++++-+            keep_prob=1.0 - self.attention_dropout,
-+++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-+            input_layout="BNSD",
-+++++-+            sparse_mode=0 # 使用 defaultMask 模式
-+++++-+        )
-+++++-+
-+++++-+        # 恢复原始数据类型
-+++++-+        attn_output = attn_output.to(input_dtype)
-+++++-+
-+++++-+        # 7. 调整输出形状
-+++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-+        attn_output = self.o_proj(attn_output)
-+++++-+
-+++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++-+        attn_weights = None
-+++++-+        if output_attentions:
-+++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++-+
-+++++-+        return attn_output, attn_weights, past_key_value
-+++++-+
-+++++-+    # def forward(
-+++++-+    #     self,
-+++++-+    #     hidden_states: mindspore.Tensor,
-+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++-+    #     past_key_value: Optional[Cache] = None,
-+++++-+    #     output_attentions: bool = False,
-+++++-+    #     use_cache: bool = False,
-+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-+
-+++++-+    #     bsz, q_len, _ = hidden_states.shape
-+++++-+
-+++++-+    #     # 1. 线性投射 Q, K, V
-+++++-+    #     query_states = self.q_proj(hidden_states)
-+++++-+    #     key_states = self.k_proj(hidden_states)
-+++++-+    #     value_states = self.v_proj(hidden_states)
-+++++-+
-+++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+
-+++++-+    #     # 3. RoPE 旋转位置编码
-+++++-+    #     kv_seq_len = key_states.shape[-2]
-+++++-+    #     if past_key_value is not None:
-+++++-+    #         if self.layer_idx is None:
-+++++-+    #             raise ValueError(
-+++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-+    #                 "with a layer index."
-+++++-+    #             )
-+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-+
-+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-+
-+++++-+    #     # 4. KV 缓存更新
-+++++-+    #     if past_key_value is not None:
-+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-+    #         key_states, value_states = past_key_value.update(
-+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++-+    #         )
-+++++-+
-+++++-+    #     # 5. 准备 Attention Mask
-+++++-+    #     fa_attention_mask = None
-+++++-+    #     if attention_mask is not None:
-+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-+    #         fa_attention_mask = (mask_slice != 0)
-+++++-+
-+++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++-+    #     input_dtype = query_states.dtype
-+++++-+
-+++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-+++++-+    #         query=query_states,
-+++++-+    #         key=key_states,
-+++++-+    #         value=value_states,
-+++++-+    #         head_num=self.num_heads,
-+++++-+    #         attn_mask=fa_attention_mask,
-+++++-+    #         keep_prob=1.0 - self.attention_dropout,
-+++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-+    #         input_layout="BNSD",
-+++++-+    #         sparse_mode=0,
-+++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++-+    #         inner_precise=1
-+++++-+    #     )
-+++++-+
-+++++-+    #     # 恢复原始数据类型
-+++++-+    #     attn_output = attn_output.to(input_dtype)
-+++++-+
-+++++-+    #     # 7. 调整输出形状
-+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-+    #     attn_output = self.o_proj(attn_output)
-+++++-+
-+++++-+    #     attn_weights = None
-+++++-+    #     if output_attentions:
-+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++-+
-+++++-+    #     return attn_output, attn_weights, past_key_value
-+++++-+
-+++++-+    # def forward(
-+++++-+    #     self,
-+++++-+    #     hidden_states: mindspore.Tensor,
-+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++-+    #     past_key_value: Optional[Cache] = None,
-+++++-+    #     output_attentions: bool = False,
-+++++-+    #     use_cache: bool = False,
-+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-+
-+++++-+    #     bsz, q_len, _ = hidden_states.shape
-+++++-+
-+++++-+    #     query_states = self.q_proj(hidden_states)
-+++++-+    #     key_states = self.k_proj(hidden_states)
-+++++-+    #     value_states = self.v_proj(hidden_states)
-+++++-+
-+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-+
-+++++-+    #     kv_seq_len = key_states.shape[-2]
-+++++-+    #     if past_key_value is not None:
-+++++-+    #         if self.layer_idx is None:
-+++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-+
-+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-+
-+++++-+    #     if past_key_value is not None:
-+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++-+    #         key_states, value_states = past_key_value.update(
-+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++-+    #         )
-+++++-+
-+++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++-+
-+++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++-+    #     # <--- 修改结束 ---
-+++++-+
-+++++-+    #     fa_attention_mask = None
-+++++-+    #     if attention_mask is not None:
-+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++-+    #         fa_attention_mask = (mask_slice != 0)
-+++++-+
-+++++-+    #     input_dtype = query_states.dtype
-+++++-+
-+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-+++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++-+    #         key=key_states,
-+++++-+    #         value=value_states,
-+++++-+    #         head_num=self.num_heads,
-+++++-+    #         attn_mask=fa_attention_mask,
-+++++-+    #         keep_prob=1.0 - self.attention_dropout,
-+++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++-+    #         input_layout="BNSD",
-+++++-+    #         sparse_mode=0,
-+++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++-+    #     )
-+++++-+
-+++++-+    #     attn_output = attn_output.to(input_dtype)
-+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-+    #     attn_output = self.o_proj(attn_output)
-+++++-+
-+++++-+    #     attn_weights = None
-+++++-+    #     if output_attentions:
-+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++-+
-+++++-+    #     return attn_output, attn_weights, past_key_value
-+++++-+
-+++++- QWEN2MOE_ATTENTION_CLASSES = {
-+++++-     "eager": Qwen2MoeAttention,
-+++++-+    "flash-attention": Qwen2MoeFlashAttention,
-+++++- }
-+++++- 
-+++++- 
-+++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++- 
-+++++-+    #@dwj
-+++++-+    # 只遍历激活的专家，而非全部专家
-+++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++--        hidden_states = hidden_states.view(-1, hidden_dim)
-+++++--        # router_logits: (batch * sequence_length, n_experts)
-+++++--        router_logits = self.gate(hidden_states)
-+++++--
-+++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++--        if self.norm_topk_prob:
-+++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++--        # we cast back to the input dtype
-+++++--        routing_weights = routing_weights.to(hidden_states.dtype)
-+++++--
-+++++--        final_hidden_states = ops.zeros(
-+++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++++--        )
-+++++--
-+++++--        # One hot encode the selected experts to create an expert mask
-+++++--        # this will be used to easily index which expert is going to be sollicitated
-+++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++++--
-+++++--        # Loop over all available experts in the model and perform the computation on each expert
-+++++--        for expert_idx in range(self.num_experts):
-+++++--            expert_layer = self.experts[expert_idx]
-+++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++++--
-+++++--            # Index the correct hidden states and compute the expert hidden state for
-+++++--            # the current expert. We need to make sure to multiply the output hidden
-+++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++++--            if 0 not in idx.shape:
-+++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++++--
-+++++--                # However `index_add_` only support torch tensors for indexing so we'll use
-+++++--                # the `top_x` tensor here.
-+++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++++--
-+++++--        shared_expert_output = self.shared_expert(hidden_states)
-+++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++++--
-+++++--        final_hidden_states = final_hidden_states + shared_expert_output
-+++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++-+            num_tokens = hidden_states_reshaped.shape[0]
-+++++-+
-+++++-+            router_logits = self.gate(hidden_states_reshaped)
-+++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++-+
-+++++-+            if self.norm_topk_prob:
-+++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++-+            
-+++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++-+            flat_selected_experts = selected_experts.flatten()
-+++++-+            
-+++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++-+            token_indices = broadcasted_token_indices.flatten()
-+++++-+            
-+++++-+            active_experts = ops.unique(flat_selected_experts)
-+++++-+            
-+++++-+            for expert_idx_tensor in active_experts:
-+++++-+                expert_idx = expert_idx_tensor.item()
-+++++-+                expert_layer = self.experts[expert_idx]
-+++++-+                
-+++++-+                mask = (flat_selected_experts == expert_idx_tensor)
-+++++-+                selected_token_indices = token_indices[mask]
-+++++-+                selected_routing_weights = routing_weights.flatten()[mask]
-+++++-+                
-+++++-+                current_states = hidden_states_reshaped[selected_token_indices]
-+++++-+                
-+++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++-+                
-+++++-+                final_hidden_states = final_hidden_states.index_add(
-+++++-+                    dim=0,
-+++++-+                    index=selected_token_indices,
-+++++-+                    source=expert_output.to(hidden_states.dtype)
-+++++-+                )
-+++++-+            
-+++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++- 
-+++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++--        return final_hidden_states, router_logits
-+++++-+            final_hidden_states = final_hidden_states + shared_expert_output
-+++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++-+            
-+++++-+            return final_hidden_states, router_logits
-+++++- 
-+++++- 
-+++++- class Qwen2MoeDecoderLayer(nn.Module):
-+++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++++- 
-+++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++- 
-+++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++-+
-+++++-         if (layer_idx not in config.mlp_only_layers) and (
-+++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++++-         ):
-+++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++++-     _skip_keys_device_placement = "past_key_values"
-+++++-     _supports_cache_class = True
-+++++-+#lwx
-+++++-+    # _supports_static_cache = True
-+++++- 
-+++++-     def _init_weights(self, module):
-+++++-         std = self.config.initializer_range
-+++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++++-         return causal_mask
-+++++- 
-+++++- 
-+++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++-     _tied_weights_keys = ["lm_head.weight"]
-+++++- 
-+++++-     def __init__(self, config):
-+++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++-         self.num_experts_per_tok = config.num_experts_per_tok
-+++++-         # Initialize weights and apply final processing
-+++++-         self.post_init()
-+++++-+        # @lwx
-+++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++++-+        #     self.generation_config.cache_implementation = "static"
-+++++-+        self._warmed_up = False
-+++++-+
-+++++-+    def warmup_moe_model(self):
-+++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++++-+        test_texts = [
-+++++-+            "warmup short",
-+++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++++-+        ]
-+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++-+        if tokenizer is None:
-+++++-+            from mindnlp.transformers import AutoTokenizer
-+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++-+            self._warmup_tokenizer = tokenizer
-+++++-+
-+++++-+        for text in test_texts:
-+++++-+            inputs = tokenizer(text, return_tensors="ms")
-+++++-+            with mindspore._no_grad():
-+++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++++- 
-+++++-     def get_input_embeddings(self):
-+++++-         return self.model.embed_tokens
-+++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++-         ```"""
-+++++-+        if not self._warmed_up:
-+++++-+            self._warmed_up = True
-+++++-+            self.warmup_moe_model()
-+++++- 
-+++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++++-         output_router_logits = (
-+++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++-             }
-+++++-         )
-+++++-         return model_inputs
-+++++-+# @lwx
-+++++-+    # def _decode_one_tokens_logits(
-+++++-+    #     self,
-+++++-+    #     cur_token: mindspore.Tensor,
-+++++-+    #     input_pos: Optional[mindspore.Tensor],
-+++++-+    #     cache_position: mindspore.Tensor,
-+++++-+    #     past_key_values: StaticCache,
-+++++-+    # ) -> mindspore.Tensor:
-+++++-+    #     """
-+++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++++-+        
-+++++-+    #     Args:
-+++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++++-+    #         input_pos: 输入位置信息，可选
-+++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++++-+            
-+++++-+    #     Returns:
-+++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++++-+    #     """
-+++++-+    #     # 调用JIT编译的版本
-+++++-+    #     return self.get_decode_one_tokens_logits(
-+++++-+    #         cur_token=cur_token,
-+++++-+    #         input_pos=input_pos,
-+++++-+    #         cache_position=cache_position,
-+++++-+    #         past_key_values=past_key_values,
-+++++-+    #     )
-+++++-+    
-+++++-+    # @mindspore.jit(jit_level='O1')
-+++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++++-+    #     """
-+++++-+    #     JIT编译的函数，用于高效的单token解码
-+++++-+    #     使用JIT编译优化以支持静态shape和高效执行
-+++++-+        
-+++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++++-+    #     """
-+++++-+    #     outputs = self.model.forward(
-+++++-+    #         input_ids=cur_token,
-+++++-+    #         position_ids=input_pos,
-+++++-+    #         cache_position=cache_position,
-+++++-+    #         past_key_values=past_key_values,
-+++++-+    #         use_cache=True,
-+++++-+    #         return_dict=False,
-+++++-+    #     )
-+++++-+        
-+++++-+    #     hidden_states = outputs[0]
-+++++-+    #     logits = self.lm_head.forward(hidden_states)
-+++++-+    #     logits = logits.float()
-+++++-+        
-+++++-+    #     return logits[:, -1, :]
-+++++-+
-+++++-+    # def _sample(
-+++++-+    #     self,
-+++++-+    #     input_ids: mindspore.Tensor,
-+++++-+    #     logits_processor,
-+++++-+    #     stopping_criteria,
-+++++-+    #     generation_config,
-+++++-+    #     synced_devices: bool,
-+++++-+    #     streamer=None,
-+++++-+    #     logits_warper=None,
-+++++-+    #     **model_kwargs,
-+++++-+    # ):
-+++++-+    #     """
-+++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++++-+    #     """
-+++++-+    #     from ...generation.logits_process import LogitsProcessorList
-+++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++++-+    #     from mindnlp.core import nn, ops, no_grad
-+++++-+    #     import numpy as np
-+++++-+        
-+++++-+    #     # 检查是否使用 StaticCache
-+++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++++-+    #     # 否则，直接调用父类方法
-+++++-+    #     past_key_values = model_kwargs.get("past_key_values")
-+++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++++-+
-+++++-+    #     if not isinstance(past_key_values, StaticCache):
-+++++-+    #         # 不使用 StaticCache，直接调用父类方法
-+++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++++-+    #         return super()._sample(
-+++++-+    #             input_ids=input_ids,
-+++++-+    #             logits_processor=logits_processor,
-+++++-+    #             stopping_criteria=stopping_criteria,
-+++++-+    #             generation_config=generation_config,
-+++++-+    #             synced_devices=synced_devices,
-+++++-+    #             streamer=streamer,
-+++++-+    #             logits_warper=logits_warper,
-+++++-+    #             **model_kwargs,
-+++++-+    #         )
-+++++-+        
-+++++-+    #     # 使用 StaticCache，进入自定义循环
-+++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++++-+    #     pad_token_id = generation_config._pad_token_tensor
-+++++-+    #     output_attentions = generation_config.output_attentions
-+++++-+    #     output_hidden_states = generation_config.output_hidden_states
-+++++-+    #     output_scores = generation_config.output_scores
-+++++-+    #     output_logits = generation_config.output_logits
-+++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++++-+    #     max_length = generation_config.max_length
-+++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++++-+    #     do_sample = generation_config.do_sample
-+++++-+        
-+++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++++-+    #         raise ValueError(
-+++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++++-+    #             f"{logits_warper})."
-+++++-+    #         )
-+++++-+        
-+++++-+    #     # init attention / hidden states / scores tuples
-+++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++++-+        
-+++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++++-+    #         encoder_hidden_states = (
-+++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++++-+    #         )
-+++++-+        
-+++++-+    #     # keep track of which sequences are already finished
-+++++-+    #     batch_size, cur_len = input_ids.shape
-+++++-+    #     this_peer_finished = False
-+++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++++-+        
-+++++-+    #     time_record = []
-+++++-+    #     from ....utils.testing_utils import parse_flag_from_env
-+++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++++-+        
-+++++-+    #     while self._has_unfinished_sequences(
-+++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++++-+    #     ):
-+++++-+    #         if _record_time:
-+++++-+    #             import time as time_module
-+++++-+    #             infer_start = time_module.time()
-+++++-+            
-+++++-+    #         # prepare model inputs
-+++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++++-+            
-+++++-+    #         # prepare variable output controls
-+++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++++-+            
-+++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++++-+    #         cur_cache_position = model_inputs.get("cache_position")
-+++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++++-+    #         cur_input_ids = model_inputs.get("input_ids")
-+++++-+            
-+++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++++-+    #             cur_cache_position is not None and 
-+++++-+    #             len(cur_cache_position.shape) > 0 and
-+++++-+    #             cur_cache_position.shape[0] == 1 and
-+++++-+    #             cur_input_ids is not None and
-+++++-+    #             cur_input_ids.shape[1] == 1):
-+++++-+    #             # 使用 JIT 优化的单 token 解码
-+++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++++-+    #             if not hasattr(self, '_jit_used'):
-+++++-+    #                 self._jit_used = False
-+++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++++-+                
-+++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++++-+    #                 cur_token=cur_input_ids,
-+++++-+    #                 input_pos=model_inputs.get("position_ids"),
-+++++-+    #                 cache_position=cur_cache_position,
-+++++-+    #                 past_key_values=cur_past_key_values,
-+++++-+    #             )
-+++++-+                
-+++++-+    #             # 标记已使用JIT（用于后续判断）
-+++++-+    #             if not self._jit_used:
-+++++-+    #                 self._jit_used = True
-+++++-+                
-+++++-+    #             # 构造兼容的输出对象
-+++++-+    #             class JitOptimizedOutput:
-+++++-+    #                 def __init__(self, logits, config):
-+++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++++-+    #                     self.config = config
-+++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++++-+    #                     self.cross_attentions = None
-+++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++++-+                
-+++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++++-+    #         else:
-+++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++++-+    #             outputs = self(**model_inputs, return_dict=True)
-+++++-+            
-+++++-+    #         if synced_devices and this_peer_finished:
-+++++-+    #             continue
-+++++-+            
-+++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++++-+    #         next_token_logits = outputs.logits[:, -1, :]
-+++++-+            
-+++++-+    #         # pre-process distribution
-+++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++++-+    #         if do_sample:
-+++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++++-+            
-+++++-+    #         # Store scores, attentions and hidden_states when required
-+++++-+    #         if return_dict_in_generate:
-+++++-+    #             if output_scores:
-+++++-+    #                 scores += (next_token_scores,)
-+++++-+    #             if output_logits:
-+++++-+    #                 raw_logits += (next_token_logits,)
-+++++-+    #             if output_attentions:
-+++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++++-+    #                 if self.config.is_encoder_decoder:
-+++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++++-+                
-+++++-+    #             if output_hidden_states:
-+++++-+    #                 hidden = (
-+++++-+    #                     outputs.decoder_hidden_states
-+++++-+    #                     if self.config.is_encoder_decoder
-+++++-+    #                     else outputs.hidden_states
-+++++-+    #                 )
-+++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++++-+            
-+++++-+    #         # token selection
-+++++-+    #         if do_sample:
-+++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++++-+    #         else:
-+++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++++-+            
-+++++-+    #         # finished sentences should have their next token be a padding token
-+++++-+    #         if has_eos_stopping_criteria:
-+++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++++-+            
-+++++-+    #         # update generated ids, model inputs, and length for next step
-+++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++++-+    #         if streamer is not None:
-+++++-+    #             streamer.put(next_tokens)
-+++++-+            
-+++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++++-+    #             outputs,
-+++++-+    #             model_kwargs,
-+++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++++-+    #         )
-+++++-+            
-+++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++++-+    #         cur_len += 1
-+++++-+            
-+++++-+    #         if _record_time:
-+++++-+    #             import time as time_module
-+++++-+    #             infer_stop = time_module.time()
-+++++-+    #             time_record.append(infer_stop - infer_start)
-+++++-+            
-+++++-+    #         del outputs
-+++++-+        
-+++++-+    #     average_infer_time = None
-+++++-+    #     if time_record:
-+++++-+    #         if len(time_record) > 1:
-+++++-+    #             time_record.pop(0)
-+++++-+    #         average_infer_time = sum(time_record) / len(time_record)
-+++++-+    #         print(f'average inference time is: {average_infer_time}')
-+++++-+    #         print(f'inference time record: {time_record}')
-+++++-+        
-+++++-+    #     if streamer is not None:
-+++++-+    #         streamer.end()
-+++++-+        
-+++++-+    #     # 简单判断：打印是否使用了JIT路径
-+++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++++-+    #     else:
-+++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++++-+        
-+++++-+    #     if return_dict_in_generate:
-+++++-+    #         if self.config.is_encoder_decoder:
-+++++-+    #             return GenerateEncoderDecoderOutput(
-+++++-+    #                 sequences=input_ids,
-+++++-+    #                 scores=scores,
-+++++-+    #                 logits=raw_logits,
-+++++-+    #                 encoder_attentions=encoder_attentions,
-+++++-+    #                 encoder_hidden_states=encoder_hidden_states,
-+++++-+    #                 decoder_attentions=decoder_attentions,
-+++++-+    #                 cross_attentions=cross_attentions,
-+++++-+    #                 decoder_hidden_states=decoder_hidden_states,
-+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++-+    #                 average_infer_time=average_infer_time
-+++++-+    #             )
-+++++-+    #         else:
-+++++-+    #             return GenerateDecoderOnlyOutput(
-+++++-+    #                 sequences=input_ids,
-+++++-+    #                 scores=scores,
-+++++-+    #                 logits=raw_logits,
-+++++-+    #                 attentions=decoder_attentions,
-+++++-+    #                 hidden_states=decoder_hidden_states,
-+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++-+    #                 average_infer_time=average_infer_time
-+++++-+    #             )
-+++++-+    #     else:
-+++++-+    #         return input_ids
-+++++-+            
-+++++-+    # def _prepare_cache_for_generation(
-+++++-+    #     self,
-+++++-+    #     generation_config,
-+++++-+    #     model_kwargs,
-+++++-+    #     assistant_model,
-+++++-+    #     batch_size,
-+++++-+    #     max_cache_length,
-+++++-+    # ):
-+++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++++-+    #         generation_config.cache_implementation = "static"
-+++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++++-+        
-+++++-+    #     if generation_config.cache_implementation == "static":
-+++++-+    #         base_required_from_max_length = generation_config.max_length + 1
-+++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++++-+    #         min_cache_size = 50
-+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++++-+    #         else:
-+++++-+    #             max_cache_length = max(base_required, min_cache_size)
-+++++-+            
-+++++-+    #         original_max_cache_length = max_cache_length
-+++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++++-+            
-+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++-+    #             if max_cache_length > self.config.max_position_embeddings:
-+++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++++-+        
-+++++-+    #     result = super()._prepare_cache_for_generation(
-+++++-+    #         generation_config=generation_config,
-+++++-+    #         model_kwargs=model_kwargs,
-+++++-+    #         assistant_model=assistant_model,
-+++++-+    #         batch_size=batch_size,
-+++++-+    #         max_cache_length=max_cache_length,
-+++++-+    #     )
-+++++-+        
-+++++-+    #     if generation_config.cache_implementation == "static":
-+++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++++-+    #         created_cache = model_kwargs.get(cache_name)
-+++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
-+++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++++-+        
-+++++-+    #     return result
-+++++-+
-+++++-+
-+++++-+
-+++++- 
-+++++- 
-+++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++++--- 
-+++++-2.27.0
-+++++-
-+++++-- 
-+++++2.27.0
-+++++
-++++-- 
-++++2.27.0
-++++
-+++-- 
-+++2.27.0
-+++
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
deleted file mode 100644
index 5ba94286..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0009-20251109firstcommit.patch"
+++ /dev/null
@@ -1,9078 +0,0 @@
-From 4f88911daf60910b3b94b56b8a590650454a2dde Mon Sep 17 00:00:00 2001
-From: Pinoeer-kingxi <13022943007@163.com>
-Date: Sun, 9 Nov 2025 02:09:15 +0800
-Subject: [PATCH 09/10] 20251109firstcommit
-
----
- .../models/deepseek/modeling_deepseek.py      |  103 +-
- patches/0001-20251104commit.patch             |    2 +-
- patches/0002-20251106commit.patch             |    2 +-
- patches/0003-20261106secondcommit.patch       |    2 +-
- patches/0004-20251106change.patch             |    2 +-
- patches/0005-20251107001commit.patch          |    2 +-
- patches/0006-20251107002commit.patch          |    2 +-
- patches/0007-20251107003commit.patch          |    2 +-
- patches/0008-moe-change.patch                 | 8789 +++++++++++++++++
- 9 files changed, 8889 insertions(+), 17 deletions(-)
- create mode 100644 patches/0008-moe-change.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 0af29305..8d004af1 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -415,7 +415,9 @@ class DeepseekMoE(nn.Module):
-         else:
-             # @lwx
-             if orig_shape[1] == 1:
--                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+                # lwx moe_infer_decode_fast
-+                # y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+                y=self.moe_infer_decode_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-                 y=y.view(*orig_shape)
-                 if self.config.n_shared_experts is not None:
-                     y = y + self.shared_experts(identity)
-@@ -544,6 +546,7 @@ class DeepseekMoE(nn.Module):
-     #@dwj
-     @no_grad()
-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+
-         selected_experts = [self.experts[i] for i in flat_expert_indices]
-         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-@@ -643,6 +646,43 @@ class DeepseekMoE(nn.Module):
-     #     )
- 
-     #     return final_output
-+    # def init_expert_cache(self):
-+    #     """
-+    #     在模型初始化时调用，缓存所有专家的权重到显存。
-+    #     """
-+    #     self.cache_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
-+    #     self.cache_up_w   = ops.stack([expert.up_proj.weight for expert in self.experts], dim=0)
-+    #     self.cache_down_w = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
-+    @no_grad()
-+    def moe_infer_decode_fast(self, x, flat_expert_indices, flat_expert_weights):
-+        top_k = flat_expert_indices.shape[0]
-+        hidden_size = x.shape[-1]
-+
-+        selected_gate_w = []
-+        selected_up_w = []
-+        selected_down_w = []
-+
-+        for eid in flat_expert_indices.tolist():
-+            if hasattr(self, "cache_gate_w") and eid < self.cache_gate_w.shape[0]:
-+                selected_gate_w.append(self.cache_gate_w[eid])
-+                selected_up_w.append(self.cache_up_w[eid])
-+                selected_down_w.append(self.cache_down_w[eid])
-+            else:
-+                selected_gate_w.append(self.experts[eid].gate_proj.weight)
-+                selected_up_w.append(self.experts[eid].up_proj.weight)
-+                selected_down_w.append(self.experts[eid].down_proj.weight)
-+
-+        selected_gate_w = ops.stack(selected_gate_w, dim=0)
-+        selected_up_w   = ops.stack(selected_up_w, dim=0)
-+        selected_down_w = ops.stack(selected_down_w, dim=0)
-+
-+        x_expanded = x.expand((top_k, 1, hidden_size))
-+        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+        up_out   = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+        return weighted_sum
- 
-     # lwx prefill 20251108
-     @no_grad()
-@@ -711,7 +751,7 @@ class DeepseekMoE(nn.Module):
-             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-             expert_outputs * sorted_weights
-         )      
--
-+        return final_output
- 
-         # try:
-         #     final_output = ops.moe_token_unpermute(
-@@ -730,7 +770,7 @@ class DeepseekMoE(nn.Module):
-         #         expert_outputs * sorted_weights
-         #     )
- 
--        return final_output
-+        # return final_output
- 
- 
-     # @no_grad()
-@@ -1827,27 +1867,68 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
- 
-         # Initialize weights and apply final processing
-         self.post_init()
-+        # lwx
-         self.warm_up = False
--
-+    #初始
-+
-+    # def warmup_moe_model_deep(self):
-+    #     print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+    #     test_texts = [
-+    #         "warmup short",
-+    #         "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+    #         "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+    #     ]
-+    #     tokenizer = getattr(self, "_warmup_tokenizer", None)
-+    #     if tokenizer is None:
-+    #         from mindnlp.transformers import AutoTokenizer
-+    #         tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+    #         self._warmup_tokenizer = tokenizer
-+
-+    #     for text in test_texts:
-+    #         inputs = tokenizer(text, return_tensors="ms")
-+    #         with mindspore._no_grad():
-+    #             _ = self(**inputs, use_cache=False)
-+    #     print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+    
-     def warmup_moe_model_deep(self):
-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--        test_texts = [
--            "warmup short",
--            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+
-+        # 直接用 eval.py 默认的 prompts 内容
-+        warmup_prompts = [
-+            "Hello, how are you?",
-+            "This American studied art at Yale and is the author of multiple popular mystery novels. First name is 'Hillary'. What's the last name?",
-+            """Summarize the following text: US President Donald Trump has said he is 'not happy' with his Russian counterpart Vladimir Putin, following Moscow's largest aerial attack yet on Ukraine.
-+    In a rare rebuke, Trump said: "What the hell happened to him? He's killing a lot of people." He later called Putin "absolutely crazy".
-+    Ukrainian President Volodymyr Zelensky earlier said Washington's "silence" over recent Russian attacks was encouraging Putin, urging "strong pressure" - including tougher sanctions - on Moscow.
-+    """
-         ]
-+
-         tokenizer = getattr(self, "_warmup_tokenizer", None)
-         if tokenizer is None:
-             from mindnlp.transformers import AutoTokenizer
-             tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-             self._warmup_tokenizer = tokenizer
- 
--        for text in test_texts:
-+        # 跑一遍 warmup_prompts，触发路由逻辑
-+        for text in warmup_prompts:
-             inputs = tokenizer(text, return_tensors="ms")
-             with mindspore._no_grad():
-                 _ = self(**inputs, use_cache=False)
-+
-+        # 这里可以加按需缓存逻辑，避免显存 OOM
-+        from mindnlp.transformers.models.deepseek.modeling_deepseek import DeepseekMoE
-+        for module in self.modules():
-+            if isinstance(module, DeepseekMoE):
-+                active_ids = getattr(module, "_last_routed_expert_ids", None)
-+                if active_ids is not None:
-+                    module.init_active_expert_cache(active_ids)
-         print("[Warmup] DeepSeek-MoE 模型预热完成。")
- 
-+    def init_active_expert_cache(self, active_ids):
-+        self.cache_gate_w = ops.stack([self.experts[i].gate_proj.weight for i in active_ids], dim=0)
-+        self.cache_up_w   = ops.stack([self.experts[i].up_proj.weight for i in active_ids], dim=0)
-+        self.cache_down_w = ops.stack([self.experts[i].down_proj.weight for i in active_ids], dim=0)
-+
-     def get_input_embeddings(self):
-         return self.model.embed_tokens
- 
-@@ -2208,7 +2289,9 @@ if __name__ == "__main__":
-     config.num_hidden_layers = 2
-     config.n_routed_experts = 2
-     model = DeepseekForCausalLM(config)
--
-+    # for module in model.modules():
-+    #     if isinstance(module, DeepseekMoE):
-+    #         module.init_expert_cache()
-     print('init model')
-     input_ids = mindspore.Tensor(np.random.randint(0, 10000, (1, 11)), mindspore.int32)
-     attention_mask = mindspore.Tensor(np.ones((1,11)), mindspore.int32)
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-index 513dd40b..8de61195 100644
---- a/patches/0001-20251104commit.patch
-+++ b/patches/0001-20251104commit.patch
-@@ -1,7 +1,7 @@
- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/7] 20251104commit
-+Subject: [PATCH 1/8] 20251104commit
- 
- ---
-  mindnlp/transformers/cache_utils.py           |  28 +-
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-index 41081b85..d7a129ea 100644
---- a/patches/0002-20251106commit.patch
-+++ b/patches/0002-20251106commit.patch
-@@ -1,7 +1,7 @@
- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/7] 20251106commit
-+Subject: [PATCH 2/8] 20251106commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-index c1392569..179a9bb5 100644
---- a/patches/0003-20261106secondcommit.patch
-+++ b/patches/0003-20261106secondcommit.patch
-@@ -1,7 +1,7 @@
- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/7] 20261106secondcommit
-+Subject: [PATCH 3/8] 20261106secondcommit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-index e548b1b2..bc5549ca 100644
---- a/patches/0004-20251106change.patch
-+++ b/patches/0004-20251106change.patch
-@@ -1,7 +1,7 @@
- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Thu, 6 Nov 2025 15:48:09 +0800
--Subject: [PATCH 4/7] 20251106change
-+Subject: [PATCH 4/8] 20251106change
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  189 +-
-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-index bf224d2a..7217a46b 100644
---- a/patches/0005-20251107001commit.patch
-+++ b/patches/0005-20251107001commit.patch
-@@ -1,7 +1,7 @@
- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 11:48:18 +0800
--Subject: [PATCH 5/7] 20251107001commit
-+Subject: [PATCH 5/8] 20251107001commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |   91 +-
-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-index 1bd306b9..80906633 100644
---- a/patches/0006-20251107002commit.patch
-+++ b/patches/0006-20251107002commit.patch
-@@ -1,7 +1,7 @@
- From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 12:06:32 +0800
--Subject: [PATCH 6/7] 20251107002commit
-+Subject: [PATCH 6/8] 20251107002commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |  122 +-
-diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
-index ce558554..8a2fc4fe 100644
---- a/patches/0007-20251107003commit.patch
-+++ b/patches/0007-20251107003commit.patch
-@@ -1,7 +1,7 @@
- From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
- From: Pinoeer-kingxi <13022943007@163.com>
- Date: Fri, 7 Nov 2025 12:12:51 +0800
--Subject: [PATCH 7/7] 20251107003commit
-+Subject: [PATCH 7/8] 20251107003commit
- 
- ---
-  .../models/deepseek/modeling_deepseek.py      |    2 +-
-diff --git a/patches/0008-moe-change.patch b/patches/0008-moe-change.patch
-new file mode 100644
-index 00000000..349f1429
---- /dev/null
-+++ b/patches/0008-moe-change.patch
-@@ -0,0 +1,8789 @@
-+From 45ba3bbc411b64cbffd547fa3d66bce9545639dd Mon Sep 17 00:00:00 2001
-+From: Pinoeer-kingxi <13022943007@163.com>
-+Date: Sun, 9 Nov 2025 00:50:01 +0800
-+Subject: [PATCH 8/8] moe change
-+
-+---
-+ .../models/deepseek/modeling_deepseek.py      |  433 +-
-+ .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
-+ patches/0001-20251104commit.patch             |    2 +-
-+ patches/0002-20251106commit.patch             |    2 +-
-+ patches/0003-20261106secondcommit.patch       |    2 +-
-+ patches/0004-20251106change.patch             |    2 +-
-+ patches/0005-20251107001commit.patch          |    2 +-
-+ patches/0006-20251107002commit.patch          |    2 +-
-+ patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
-+ 9 files changed, 8510 insertions(+), 55 deletions(-)
-+ create mode 100644 patches/0007-20251107003commit.patch
-+
-+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+index ff631974..0af29305 100644
-+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+@@ -19,8 +19,10 @@
-+ # limitations under the License.
-+ """ MindNLP DeepSeek model."""
-+ import math
-++import time
-+ import warnings
-+ from typing import List, Optional, Tuple, Union
-++from mindspore import mint
-+ import mindspore
-+ from mindnlp.core import nn, ops, no_grad
-+ from mindnlp.core.nn import functional as F
-+@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
-+ 
-+ _CONFIG_FOR_DOC = "DeepseekConfig"
-+ 
-++Long_Prompt = 1
-++LONG_PROMPT_LENGTH_THRESHOLD = 128
-++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-++
-+ _attn_mask_cache = {}
-+ 
-+ def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-+@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
-+         return topk_idx, topk_weight, aux_loss
-+ 
-+ 
-++bincount_op = mindspore.ops.Bincount()
-++
-+ class DeepseekMoE(nn.Module):
-+     """
-+     A mixed expert module containing shared experts.
-+@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
-+                     y = y + self.shared_experts(identity)
-+                 return y
-+             else:
-+-                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++                if Long_Prompt == 0:
-++                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++                else:
-++                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+                 if self.config.n_shared_experts is not None:
-+                     y = y + self.shared_experts(identity)
-+                 return y
-+@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
-+         # if self.config.n_shared_experts is not None:
-+         #     y = y + self.shared_experts(identity)
-+         # return y
-+-        
-++    
-++    
-++    
-++    # lwx
-++    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
-++    #     """
-++    #     如果 expert_ids 为 None，走单专家逻辑；
-++    #     如果有，多专家批量处理，保证和原逻辑一致。
-++    #     """
-++    #     if expert_ids is None:
-++    #         # 原单专家逻辑
-++    #         if self.config.pretraining_tp > 1:
-++    #             slice = self.intermediate_size // self.config.pretraining_tp
-++    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
-++    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
-++    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
-++    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
-++    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
-++    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
-++    #                                for i in range(self.config.pretraining_tp)], dim=-1)
-++    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
-++    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
-++    #                          for i in range(self.config.pretraining_tp)]
-++    #             down_proj = sum(down_proj)
-++    #         else:
-++    #             down_proj = self.down_proj(
-++    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
-++    #             )
-++    #         return down_proj
-++
-++    #     # ====== 批量多专家路径 ======
-++    #     hidden_size = x.shape[-1]
-++
-++    #     # 按 token expert_ids 选权重
-++    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
-++    #     up_weights   = self.up_proj.weight[expert_ids]
-++    #     down_weights = self.down_proj.weight[expert_ids]
-++
-++    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
-++    #     if self.config.pretraining_tp > 1:
-++    #         outputs = []
-++    #         slice = self.intermediate_size // self.config.pretraining_tp
-++    #         for i in range(self.config.pretraining_tp):
-++    #             # 每个 slice 单独计算
-++    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
-++    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
-++    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
-++    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
-++    #             outputs.append(down_proj_out)
-++    #         return sum(outputs)
-++    #     else:
-++    #         gate_proj_out = F.linear(x, gate_weights)
-++    #         up_proj_out   = F.linear(x, up_weights)
-++    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
-++    #         return F.linear(act_out, down_weights)
-++    # @no_grad()
-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++    #     num_tokens = x.shape[0]
-++    #     hidden_size = x.shape[-1]
-++
-++    #     idxs = flat_expert_indices.argsort()
-++    #     sorted_expert_indices = flat_expert_indices[idxs]
-++    #     sorted_token_indices = idxs // self.num_experts_per_tok
-++    #     sorted_indices = sorted_token_indices
-++
-++    #     permuted_tokens = x[sorted_token_indices]
-++    #     sorted_weights  = flat_expert_weights[idxs]
-++
-++    #     # 一次调用多专家 forward
-++    #     expert_outputs = ops.zeros_like(permuted_tokens)
-++    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
-++
-++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
-++    #     try:
-++    #         final_output = ops.moe_token_unpermute(
-++    #             expert_outputs,
-++    #             sorted_indices,
-++    #             probs=probs,
-++    #             padded_mode=False
-++    #         )
-++    #     except Exception:
-++    #         final_output = ops.zeros_like(x)
-++    #         final_output = mindspore.mint.scatter_add(
-++    #             final_output,
-++    #             0,
-++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-++    #             expert_outputs * sorted_weights
-++    #         )
-++
-++    #     return final_output
-++
-++    # def mlp_batch_forward(self, tokens, expert_ids):
-++    #     """
-++    #     使用批量专家 forward（保留精度）
-++    #     """
-++    #     return self.experts[0].forward(tokens, expert_ids)
-++
-+     # @no_grad()
-+     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+ 
-+@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
-+     #         expert_cache += expert_out * weight
-+     #     return expert_cache
-+     
-++    #@dwj
-+     @no_grad()
-+-    # dwj
-+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+-        # x 的 shape: (1, hidden_size)
-+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+-
-+-        # 1. 收集所有需要的专家层
-+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+         selected_experts = [self.experts[i] for i in flat_expert_indices]
-+-
-+-        # 2. 并行计算所有专家的输出
-+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+-        # ops.cat 会将它们堆叠成一个新的 Tensor
-+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+-
-+-        # 3. 使用矩阵乘法进行加权求和
-+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+-        # 最终结果 final_output 的 shape: (1, hidden_size)
-+         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+-        
-+         return final_output
-+ 
-+ 
-+-    # @no_grad()
-+-    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+-    #     expert_cache = ops.zeros_like(x)
-+-    #     idxs = flat_expert_indices.argsort()
-+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+-    #     token_idxs = idxs // self.num_experts_per_tok
-+-
-+-    #     for i, end_idx in enumerate(tokens_per_expert):
-+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+-    #         if start_idx == end_idx:
-+-    #             continue
-+-    #         expert = self.experts[i]
-+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+-    #         expert_tokens = x[exp_token_idx]
-+-    #         expert_out = expert(expert_tokens)
-+-    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+-    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+-
-+-    #     return expert_cache
-+-        
-+     @no_grad()
-+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+         """
-+@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
-+             )
-+ 
-+         return expert_cache
-++
-++
-++    # @no_grad()
-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++    #     """
-++    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
-++    #     """
-++    #     num_tokens = x.shape[0]
-++    #     hidden_size = x.shape[-1]
-++
-++    #     # 生成排序后的 token 索引
-++    #     idxs = flat_expert_indices.argsort()
-++    #     sorted_expert_indices = flat_expert_indices[idxs]
-++    #     sorted_token_indices = idxs // self.num_experts_per_tok
-++
-++    #     # 记录到 sorted_indices（moe_token_unpermute 用）
-++    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
-++
-++    #     # 收集专家输入
-++    #     permuted_tokens = x[sorted_token_indices]
-++
-++    #     # 执行每个专家的 MLP（批量处理）
-++    #     expert_outputs = []
-++    #     token_ptr = 0
-++    #     tokens_per_expert = sorted_expert_indices.bincount()
-++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
-++    #         if count == 0:
-++    #             continue
-++    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
-++    #         out = self.experts[expert_id](cur_tokens)
-++    #         expert_outputs.append(out)
-++    #         token_ptr += count
-++
-++    #     # 拼接所有专家输出
-++    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
-++
-++    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
-++    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
-++
-++    #     # 直接调用硬件加速的 unpermute
-++    #     final_output = ops.moe_token_unpermute(
-++    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
-++    #         sorted_indices,           # shape: [num_tokens * top_k]
-++    #         probs=probs,               # 按概率加权
-++    #         padded_mode=False
-++    #     )
-++
-++    #     return final_output
-++
-++    # lwx prefill 20251108
-++    @no_grad()
-++    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
-++        """
-++        高性能 + 数值一致的 MoE prefill 推理：
-++        1. 批量化处理所有专家计算，减少 Python 循环开销
-++        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
-++        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
-++        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
-++
-++        参数：
-++            x: [num_tokens, hidden_size]，
-++            MoE 输入的 token 表示
-++            flat_expert_indices: [num_tokens * top_k]，
-++            每个 token 的路由专家 id
-++            flat_expert_weights: [num_tokens * top_k, 1]，
-++            路由专家权重
-++        """
-++        num_tokens = x.shape[0]
-++        hidden_size = x.shape[-1]
-++
-++        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
-++        idxs = flat_expert_indices.argsort()  # 排序索引
-++        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
-++        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
-++
-++        # sorted_indices 必须与 permuted_tokens 顺序匹配
-++        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
-++
-++        # 2) 收集专家输入（按 idxs 排序）
-++        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
-++        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
-++
-++        # 3) 计算每个专家的 token 数
-++        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
-++
-++        # 4) 批量专家计算（减少 Python 循环）
-++        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
-++        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
-++        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
-++
-++        expert_outputs = ops.zeros_like(permuted_tokens)
-++        ptr = 0
-++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
-++            if count == 0:
-++                continue
-++            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
-++            
-++            # 与 DeepseekMLP forward 等价
-++            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
-++            up_proj_out   = F.linear(tokens, up_weights[expert_id])
-++            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
-++            expert_out    = F.linear(act_out, down_weights[expert_id])
-++
-++            expert_outputs[ptr:ptr+count] = expert_out
-++            ptr += count
-++
-++        # 5) Ascend 加速的 unpermute（已排序的权重）
-++        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
-++
-++        final_output = ops.zeros_like(x)
-++        final_output = mindspore.mint.scatter_add(
-++            final_output,
-++            0,
-++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-++            expert_outputs * sorted_weights
-++        )      
-++
-++
-++        # try:
-++        #     final_output = ops.moe_token_unpermute(
-++        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
-++        #         sorted_indices,   # [num_tokens*top_k] 原 token id
-++        #         probs=probs,      # 对应权重
-++        #         padded_mode=False
-++        #     )
-++        # except Exception:
-++        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
-++        #     final_output = ops.zeros_like(x)
-++        #     final_output = mindspore.mint.scatter_add(
-++        #         final_output,
-++        #         0,
-++        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-++        #         expert_outputs * sorted_weights
-++        #     )
-++
-++        return final_output
-++
-++
-++    # @no_grad()
-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++    #     num_tokens = x.shape[0]
-++    #     hidden_size = x.shape[-1]
-++
-++    #     idxs = flat_expert_indices.argsort()
-++    #     sorted_expert_indices = flat_expert_indices[idxs]
-++    #     sorted_token_indices = idxs // self.num_experts_per_tok
-++        
-++    #     # sorted_indices = sorted_token_indices
-++    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
-++    #     permuted_tokens = x[sorted_token_indices]
-++    #     sorted_weights = flat_expert_weights[idxs]
-++    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
-++
-++    #     expert_outputs = ops.zeros_like(permuted_tokens)
-++    #     ptr = 0
-++
-++    #     # 只按专家维度循环
-++    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
-++    #         if count == 0:
-++    #             continue
-++    #         token_slice = slice(ptr, ptr + count)
-++    #         expert_tokens = permuted_tokens[token_slice]
-++
-++    #         # 保持原 forward（含 pretraining_tp、bias 等）
-++    #         expert_out = self.experts[expert_id](expert_tokens)
-++
-++    #         expert_outputs[token_slice] = expert_out
-++    #         ptr += count
-++
-++    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
-++    #     try:
-++    #         final_output = mindspore.ops.moe_token_unpermute(
-++    #             expert_outputs,
-++    #             sorted_indices,
-++    #             probs=probs,
-++    #             padded_mode=False
-++    #         )
-++    #     except Exception:
-++    #         final_output = ops.zeros_like(x)
-++    #         final_output = mindspore.mint.scatter_add(
-++    #             final_output,
-++    #             0,
-++    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-++    #             expert_outputs * sorted_weights
-++    #         )
-++
-++    #     return final_output
-++
-++
-++    #lwx
-++    # @no_grad()
-++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++    #     """
-++    #     并行化 MoE prefill：
-++    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
-++    #     - 保证结果与原版完全一致
-++    #     """
-++    #     # 输出缓存
-++    #     expert_cache = ops.zeros_like(x)
-++
-++    #     # token 总数（批量*seq_len*num_experts_per_tok）
-++    #     num_tokens = flat_expert_indices.shape[0]
-++    #     hidden_dim = x.shape[-1]
-++
-++    #     # 原 token ID（idxs // num_experts_per_tok）
-++    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
-++
-++    #     # ====== Step 1: 组织输入 ======
-++    #     # 按 experts 排序，保证 scatter_add 对应位置一致
-++    #     sort_ids = flat_expert_indices.argsort()
-++    #     sorted_experts = flat_expert_indices[sort_ids]
-++    #     sorted_tokens = token_ids[sort_ids]
-++    #     sorted_weights = flat_expert_weights[sort_ids]
-++
-++    #     # 收集每个专家的输入
-++    #     # build: expert_inputs[expert_id] = [tokens...]
-++    #     expert_inputs = []
-++    #     expert_outs = []
-++
-++    #     for eid in range(self.config.n_routed_experts):
-++    #         eid_mask = (sorted_experts == eid)
-++    #         if eid_mask.any():
-++    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
-++    #             expert_inputs.append(tokens_for_eid)
-++    #         else:
-++    #             expert_inputs.append(None)
-++
-++    #     # ====== Step 2: 并行计算所有专家输出 ======
-++    #     # 存储所有专家结果到一个列表
-++    #     for eid in range(self.config.n_routed_experts):
-++    #         if expert_inputs[eid] is not None:
-++    #             out = self.experts[eid](expert_inputs[eid])
-++    #             expert_outs.append(out)
-++    #         else:
-++    #             expert_outs.append(None)
-++
-++    #     # ====== Step 3: scatter_add 回写结果 ======
-++    #     # 遍历专家，将结果加回对应的 token
-++    #     pos = 0
-++    #     for eid in range(self.config.n_routed_experts):
-++    #         if expert_outs[eid] is not None:
-++    #             size = expert_outs[eid].shape[0]
-++    #             tokens_idx = sorted_tokens[pos:pos+size]
-++    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
-++    #             pos += size
-++
-++    #             # scatter_add 到 expert_cache
-++    #             expert_cache = mindspore.mint.scatter_add(
-++    #                 expert_cache,
-++    #                 dim=0,
-++    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
-++    #                 src=scaled_out
-++    #             )
-++
-++    #     return expert_cache
-++
-++
-++
-+ # 放置在 DeepseekMoE 类中
-+     # @no_grad()
-+     # #lwx 20251107
-+@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
-+         self.hidden_size = config.hidden_size
-+ 
-+         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+-            # config=config, layer_idx=layer_idx
-++        #     config=config, layer_idx=layer_idx
-+         # )
-+ 
-+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
-+             )
-+             else DeepseekMLP(config)
-+         )
-++
-+         self.input_layernorm = DeepseekRMSNorm(
-+             config.hidden_size, eps=config.rms_norm_eps
-+         )
-+@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+     def get_decoder(self):
-+         return self.model
-+ 
-++    def generate(self, *args, **kwargs):
-++        """
-++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++        """
-++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-++
-++        input_ids = kwargs.get("input_ids")
-++        if input_ids is None and args:
-++            input_ids = args[0]
-++
-++        if input_ids is not None:
-++            prompt_length = input_ids.shape[1]
-++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-++                Long_Prompt = 2
-++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-++                Long_Prompt = 0
-++            else:
-++                Long_Prompt = 1
-++
-++
-++        return super().generate(*args, **kwargs)
-+ 
-+     def forward(
-+         self,
-+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+index 913a7609..6566958b 100644
-+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+ 
-+     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-+     @no_grad()
-+-    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+         original_dtype = hidden_states.dtype
-+         batch_size, _ = hidden_states.shape
-+         expert_outputs_list = [
-+@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+         return moe_output_fp32.squeeze(1).to(original_dtype)
-+ 
-++
-+     # @no_grad()
-+-    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+     #     num_tokens, _ = hidden_states.shape
-+     #     flat_selected_experts = selected_experts.flatten()
-+     #     sorted_expert_indices = flat_selected_experts.argsort()
-+@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+     #         current_token_offset += expert_token_count
-+     #     return moe_output
-+ 
-++    # baseline
-+     @no_grad()
-+-    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+         """
-+         优化版 MoE prefill (速度优先模式)：
-+         - 批量张量化处理同一个 expert 的所有 token
-+@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         return moe_output
-+ 
-+ 
-++    @no_grad()
-++    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++        """
-++        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
-++        逻辑：
-++        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
-++        2. 每个 expert 一次性处理其全部 token
-++        3. 最后一次 scatter_add 回到原 token 顺序
-++        """
-++
-++        num_tokens = hidden_states.shape[0]
-++        hidden_size = hidden_states.shape[-1]
-++
-++        # 展平为一维
-++        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
-++        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
-++
-++        # 按 expert 排序
-++        idxs = flat_selected_experts.argsort()
-++        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
-++        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
-++
-++        # 排好序的输入向量（连续内存）
-++        permuted_tokens = hidden_states[sorted_token_indices]
-++
-++        # 排好序的权重
-++        sorted_weights = flat_routing_weights[idxs]
-++
-++        # 每个 expert 对应的 token 数量
-++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-++
-++        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
-++        expert_outputs = ops.zeros_like(permuted_tokens)
-++
-++        ptr = 0  # 指向当前切片的起点
-++        for expert_id, count in enumerate(tokens_per_expert.tolist()):
-++            if count == 0:
-++                continue
-++
-++            token_slice = slice(ptr, ptr + count)
-++            expert_tokens = permuted_tokens[token_slice]  # 连续切片
-++
-++            # 执行专家 MLP
-++            expert_out = self.experts[expert_id](expert_tokens)
-++
-++            expert_outputs[token_slice] = expert_out
-++            ptr += count
-++
-++        # 按权重缩放
-++        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
-++
-++        # 回写到原 token 顺序 (单次 scatter_add)
-++        moe_output = mindspore.mint.scatter_add(
-++            ops.zeros_like(hidden_states),
-++            0,
-++            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
-++            scaled_outputs
-++        )
-++
-++        return moe_output
-++
-++
-++    
-+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++
-+     @no_grad()
-+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+         moe_output = ops.zeros_like(hidden_states)
-+@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+         #     # --- 速度优先模式 (SPEED MODE) ---
-+         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+         #     if sequence_length == 1:
-+-        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+         #     else:
-+-        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+         
-+         routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+         if sequence_length == 1:
-+-            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+         else:
-+-            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+-    
-++            # if Long_Prompt == 1:
-++            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++            # else:
-++            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++
-+ 
-+         # 3. 共享专家计算与合并 (所有模式通用)
-+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+index c9c8c5ee..513dd40b 100644
-+--- a/patches/0001-20251104commit.patch
-++++ b/patches/0001-20251104commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+-Subject: [PATCH 1/6] 20251104commit
-++Subject: [PATCH 1/7] 20251104commit
-+ 
-+ ---
-+  mindnlp/transformers/cache_utils.py           |  28 +-
-+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+index 625656eb..41081b85 100644
-+--- a/patches/0002-20251106commit.patch
-++++ b/patches/0002-20251106commit.patch
-+@@ -1,7 +1,7 @@
-+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+-Subject: [PATCH 2/6] 20251106commit
-++Subject: [PATCH 2/7] 20251106commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+index dcb85080..c1392569 100644
-+--- a/patches/0003-20261106secondcommit.patch
-++++ b/patches/0003-20261106secondcommit.patch
-+@@ -1,7 +1,7 @@
-+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+-Subject: [PATCH 3/6] 20261106secondcommit
-++Subject: [PATCH 3/7] 20261106secondcommit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+index bbed13cc..e548b1b2 100644
-+--- a/patches/0004-20251106change.patch
-++++ b/patches/0004-20251106change.patch
-+@@ -1,7 +1,7 @@
-+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Thu, 6 Nov 2025 15:48:09 +0800
-+-Subject: [PATCH 4/6] 20251106change
-++Subject: [PATCH 4/7] 20251106change
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  189 +-
-+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-+index b2d1035c..bf224d2a 100644
-+--- a/patches/0005-20251107001commit.patch
-++++ b/patches/0005-20251107001commit.patch
-+@@ -1,7 +1,7 @@
-+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Fri, 7 Nov 2025 11:48:18 +0800
-+-Subject: [PATCH 5/6] 20251107001commit
-++Subject: [PATCH 5/7] 20251107001commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |   91 +-
-+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-+index bffa134e..1bd306b9 100644
-+--- a/patches/0006-20251107002commit.patch
-++++ b/patches/0006-20251107002commit.patch
-+@@ -1,7 +1,7 @@
-+ From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
-+ From: Pinoeer-kingxi <13022943007@163.com>
-+ Date: Fri, 7 Nov 2025 12:06:32 +0800
-+-Subject: [PATCH 6/6] 20251107002commit
-++Subject: [PATCH 6/7] 20251107002commit
-+ 
-+ ---
-+  .../models/deepseek/modeling_deepseek.py      |  122 +-
-+diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
-+new file mode 100644
-+index 00000000..ce558554
-+--- /dev/null
-++++ b/patches/0007-20251107003commit.patch
-+@@ -0,0 +1,8034 @@
-++From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
-++From: Pinoeer-kingxi <13022943007@163.com>
-++Date: Fri, 7 Nov 2025 12:12:51 +0800
-++Subject: [PATCH 7/7] 20251107003commit
-++
-++---
-++ .../models/deepseek/modeling_deepseek.py      |    2 +-
-++ patches/0001-20251104commit.patch             |    2 +-
-++ patches/0002-20251106commit.patch             |    2 +-
-++ patches/0003-20261106secondcommit.patch       |    2 +-
-++ patches/0004-20251106change.patch             |    2 +-
-++ patches/0005-20251107001commit.patch          |    2 +-
-++ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
-++ 7 files changed, 7937 insertions(+), 6 deletions(-)
-++ create mode 100644 patches/0006-20251107002commit.patch
-++
-++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++index e7e1c053..ff631974 100644
-++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
-++     #     return expert_cache
-++     
-++     @no_grad()
-++-    dwj
-+++    # dwj
-++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++         # x 的 shape: (1, hidden_size)
-++         # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++index 2842180e..c9c8c5ee 100644
-++--- a/patches/0001-20251104commit.patch
-+++++ b/patches/0001-20251104commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-++-Subject: [PATCH 1/5] 20251104commit
-+++Subject: [PATCH 1/6] 20251104commit
-++ 
-++ ---
-++  mindnlp/transformers/cache_utils.py           |  28 +-
-++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++index c6cd8757..625656eb 100644
-++--- a/patches/0002-20251106commit.patch
-+++++ b/patches/0002-20251106commit.patch
-++@@ -1,7 +1,7 @@
-++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-++-Subject: [PATCH 2/5] 20251106commit
-+++Subject: [PATCH 2/6] 20251106commit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++index 601960c9..dcb85080 100644
-++--- a/patches/0003-20261106secondcommit.patch
-+++++ b/patches/0003-20261106secondcommit.patch
-++@@ -1,7 +1,7 @@
-++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-++-Subject: [PATCH 3/5] 20261106secondcommit
-+++Subject: [PATCH 3/6] 20261106secondcommit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-++index 8976f10b..bbed13cc 100644
-++--- a/patches/0004-20251106change.patch
-+++++ b/patches/0004-20251106change.patch
-++@@ -1,7 +1,7 @@
-++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Thu, 6 Nov 2025 15:48:09 +0800
-++-Subject: [PATCH 4/5] 20251106change
-+++Subject: [PATCH 4/6] 20251106change
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |  189 +-
-++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-++index 8d9032be..b2d1035c 100644
-++--- a/patches/0005-20251107001commit.patch
-+++++ b/patches/0005-20251107001commit.patch
-++@@ -1,7 +1,7 @@
-++ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-++ From: Pinoeer-kingxi <13022943007@163.com>
-++ Date: Fri, 7 Nov 2025 11:48:18 +0800
-++-Subject: [PATCH 5/5] 20251107001commit
-+++Subject: [PATCH 5/6] 20251107001commit
-++ 
-++ ---
-++  .../models/deepseek/modeling_deepseek.py      |   91 +-
-++diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-++new file mode 100644
-++index 00000000..bffa134e
-++--- /dev/null
-+++++ b/patches/0006-20251107002commit.patch
-++@@ -0,0 +1,7931 @@
-+++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
-+++From: Pinoeer-kingxi <13022943007@163.com>
-+++Date: Fri, 7 Nov 2025 12:06:32 +0800
-+++Subject: [PATCH 6/6] 20251107002commit
-+++
-+++---
-+++ .../models/deepseek/modeling_deepseek.py      |  122 +-
-+++ patches/0001-20251104commit.patch             |    2 +-
-+++ patches/0002-20251106commit.patch             |    2 +-
-+++ patches/0003-20261106secondcommit.patch       |    2 +-
-+++ patches/0004-20251106change.patch             |    2 +-
-+++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
-+++ 6 files changed, 7773 insertions(+), 64 deletions(-)
-+++ create mode 100644 patches/0005-20251107001commit.patch
-+++
-+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++index 8831e4b7..e7e1c053 100644
-+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
-+++     #         expert_out = expert(x)
-+++     #         expert_cache += expert_out * weight
-+++     #     return expert_cache
-+++-
-+++-    # @no_grad()
-+++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++-    #     # x 的 shape: (1, hidden_size)
-+++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++-
-+++-    #     # 1. 收集所有需要的专家层
-+++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++-
-+++-    #     # 2. 并行计算所有专家的输出
-+++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-+++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++-
-+++-    #     # 3. 使用矩阵乘法进行加权求和
-+++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-+++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++++    
-++++    @no_grad()
-++++    dwj
-++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++        # x 的 shape: (1, hidden_size)
-++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++++
-++++        # 1. 收集所有需要的专家层
-++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++++
-++++        # 2. 并行计算所有专家的输出
-++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++++        # ops.cat 会将它们堆叠成一个新的 Tensor
-++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++++
-++++        # 3. 使用矩阵乘法进行加权求和
-++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++        # 最终结果 final_output 的 shape: (1, hidden_size)
-++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++         
-+++-    #     return final_output
-++++        return final_output
-+++ 
-+++ 
-+++     # @no_grad()
-+++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
-+++ 
-+++         return expert_cache
-+++ # 放置在 DeepseekMoE 类中
-+++-    @no_grad()
-+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++-        """
-+++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+++-
-+++-        Args:
-+++-            x (Tensor): 输入张量, shape: (1, hidden_size)
-+++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+++-        """
-+++-        top_k, _ = flat_expert_weights.shape
-+++-        hidden_size = x.shape[-1]
-+++-
-+++-        # 1. 将所有专家的权重堆叠起来
-+++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-++++    # @no_grad()
-++++    # #lwx 20251107
-++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++    #     """
-++++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-++++
-++++    #     Args:
-++++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
-++++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-++++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-++++    #     """
-++++    #     top_k, _ = flat_expert_weights.shape
-++++    #     hidden_size = x.shape[-1]
-++++
-++++    #     # 1. 将所有专家的权重堆叠起来
-++++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-++++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-++++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+++         
-+++-        # 2. "收集" 所需的专家权重
-+++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
-+++-        selected_up_w = stacked_up_w[flat_expert_indices]
-+++-        selected_down_w = stacked_down_w[flat_expert_indices]
-++++    #     # 2. "收集" 所需的专家权重
-++++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
-++++    #     selected_up_w = stacked_up_w[flat_expert_indices]
-++++    #     selected_down_w = stacked_down_w[flat_expert_indices]
-+++         
-+++-        # 3. 准备输入
-+++-        x_expanded = x.expand((top_k, 1, hidden_size))
-++++    #     # 3. 准备输入
-++++    #     x_expanded = x.expand((top_k, 1, hidden_size))
-+++         
-+++-        # 4. 并行计算 gate_proj 和 up_proj
-+++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-++++    #     # 4. 并行计算 gate_proj 和 up_proj
-++++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-++++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+++ 
-+++-        # 5. 计算中间状态
-+++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-++++    #     # 5. 计算中间状态
-++++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+++         
-+++-        # 6. 并行计算 down_proj
-+++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+++-        #    --- [FIX] ---
-+++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+++-        #    --- [FIX END] ---
-++++    #     # 6. 并行计算 down_proj
-++++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-++++    #     #    --- [FIX] ---
-++++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-++++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-++++    #     #    --- [FIX END] ---
-+++         
-+++-        # 7. 根据路由权重进行加权求和
-+++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-++++    #     # 7. 根据路由权重进行加权求和
-++++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+++ 
-+++-        return weighted_sum
-++++    #     return weighted_sum
-+++ 
-+++ 
-+++ 
-+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++index 0a0ef2d7..2842180e 100644
-+++--- a/patches/0001-20251104commit.patch
-++++++ b/patches/0001-20251104commit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++-Subject: [PATCH 1/4] 20251104commit
-++++Subject: [PATCH 1/5] 20251104commit
-+++ 
-+++ ---
-+++  mindnlp/transformers/cache_utils.py           |  28 +-
-+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+++index 5185270c..c6cd8757 100644
-+++--- a/patches/0002-20251106commit.patch
-++++++ b/patches/0002-20251106commit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-+++-Subject: [PATCH 2/4] 20251106commit
-++++Subject: [PATCH 2/5] 20251106commit
-+++ 
-+++ ---
-+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+++index 3e05f821..601960c9 100644
-+++--- a/patches/0003-20261106secondcommit.patch
-++++++ b/patches/0003-20261106secondcommit.patch
-+++@@ -1,7 +1,7 @@
-+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-+++-Subject: [PATCH 3/4] 20261106secondcommit
-++++Subject: [PATCH 3/5] 20261106secondcommit
-+++ 
-+++ ---
-+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-+++index 88a1aef4..8976f10b 100644
-+++--- a/patches/0004-20251106change.patch
-++++++ b/patches/0004-20251106change.patch
-+++@@ -1,7 +1,7 @@
-+++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+++ From: Pinoeer-kingxi <13022943007@163.com>
-+++ Date: Thu, 6 Nov 2025 15:48:09 +0800
-+++-Subject: [PATCH 4/4] 20251106change
-++++Subject: [PATCH 4/5] 20251106change
-+++ 
-+++ ---
-+++  .../models/deepseek/modeling_deepseek.py      |  189 +-
-+++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-+++new file mode 100644
-+++index 00000000..8d9032be
-+++--- /dev/null
-++++++ b/patches/0005-20251107001commit.patch
-+++@@ -0,0 +1,7707 @@
-++++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-++++From: Pinoeer-kingxi <13022943007@163.com>
-++++Date: Fri, 7 Nov 2025 11:48:18 +0800
-++++Subject: [PATCH 5/5] 20251107001commit
-++++
-++++---
-++++ .../models/deepseek/modeling_deepseek.py      |   91 +-
-++++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
-++++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
-++++ patches/0001-20251104commit.patch             |    2 +-
-++++ patches/0002-20251106commit.patch             |    2 +-
-++++ patches/0003-20261106secondcommit.patch       |    2 +-
-++++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
-++++ 7 files changed, 7577 insertions(+), 30 deletions(-)
-++++ create mode 100644 patches/0004-20251106change.patch
-++++
-++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++index 0546f318..8831e4b7 100644
-++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
-++++     #         expert_cache += expert_out * weight
-++++     #     return expert_cache
-++++ 
-++++-    @no_grad()
-++++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++-        # x 的 shape: (1, hidden_size)
-++++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-++++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-++++-
-++++-        # 1. 收集所有需要的专家层
-++++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-++++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
-++++-
-++++-        # 2. 并行计算所有专家的输出
-++++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-++++-        # ops.cat 会将它们堆叠成一个新的 Tensor
-++++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-++++-
-++++-        # 3. 使用矩阵乘法进行加权求和
-++++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-++++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-++++-        # 最终结果 final_output 的 shape: (1, hidden_size)
-++++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++++    # @no_grad()
-+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++    #     # x 的 shape: (1, hidden_size)
-+++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++++
-+++++    #     # 1. 收集所有需要的专家层
-+++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++++
-+++++    #     # 2. 并行计算所有专家的输出
-+++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
-+++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++++
-+++++    #     # 3. 使用矩阵乘法进行加权求和
-+++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
-+++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-++++         
-++++-        return final_output
-+++++    #     return final_output
-++++ 
-++++ 
-++++     # @no_grad()
-++++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
-++++             )
-++++ 
-++++         return expert_cache
-+++++# 放置在 DeepseekMoE 类中
-+++++    @no_grad()
-+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++        """
-+++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
-+++++
-+++++        Args:
-+++++            x (Tensor): 输入张量, shape: (1, hidden_size)
-+++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
-+++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
-+++++        """
-+++++        top_k, _ = flat_expert_weights.shape
-+++++        hidden_size = x.shape[-1]
-+++++
-+++++        # 1. 将所有专家的权重堆叠起来
-+++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
-+++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
-+++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
-+++++        
-+++++        # 2. "收集" 所需的专家权重
-+++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
-+++++        selected_up_w = stacked_up_w[flat_expert_indices]
-+++++        selected_down_w = stacked_down_w[flat_expert_indices]
-+++++        
-+++++        # 3. 准备输入
-+++++        x_expanded = x.expand((top_k, 1, hidden_size))
-+++++        
-+++++        # 4. 并行计算 gate_proj 和 up_proj
-+++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
-+++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-+++++
-+++++        # 5. 计算中间状态
-+++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
-+++++        
-+++++        # 6. 并行计算 down_proj
-+++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
-+++++        #    --- [FIX] ---
-+++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
-+++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
-+++++        #    --- [FIX END] ---
-+++++        
-+++++        # 7. 根据路由权重进行加权求和
-+++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-+++++
-+++++        return weighted_sum
-+++++
-+++++
-++++ 
-++++         # @no_grad()
-++++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++index ebd7782e..913a7609 100644
-++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++-    x1 = x[..., : x.shape[-1] // 2]
-++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++++index d059dcbe..2b217b64 100644
-++++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-+++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
-++++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
-++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++ def rotate_half(x):
-++++     """Rotates half the hidden dims of the input."""
-++++-    x1 = x[..., : x.shape[-1] // 2]
-++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++     return ops.cat((-x2, x1), dim=-1)
-++++ 
-++++ 
-++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++index 78f22642..0a0ef2d7 100644
-++++--- a/patches/0001-20251104commit.patch
-+++++++ b/patches/0001-20251104commit.patch
-++++@@ -1,7 +1,7 @@
-++++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++ From: Pinoeer-kingxi <13022943007@163.com>
-++++ Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++-Subject: [PATCH 1/3] 20251104commit
-+++++Subject: [PATCH 1/4] 20251104commit
-++++ 
-++++ ---
-++++  mindnlp/transformers/cache_utils.py           |  28 +-
-++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-++++index 22b65dd5..5185270c 100644
-++++--- a/patches/0002-20251106commit.patch
-+++++++ b/patches/0002-20251106commit.patch
-++++@@ -1,7 +1,7 @@
-++++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++++ From: Pinoeer-kingxi <13022943007@163.com>
-++++ Date: Thu, 6 Nov 2025 09:20:38 +0800
-++++-Subject: [PATCH 2/3] 20251106commit
-+++++Subject: [PATCH 2/4] 20251106commit
-++++ 
-++++ ---
-++++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-++++index 966529e4..3e05f821 100644
-++++--- a/patches/0003-20261106secondcommit.patch
-+++++++ b/patches/0003-20261106secondcommit.patch
-++++@@ -1,7 +1,7 @@
-++++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++++ From: Pinoeer-kingxi <13022943007@163.com>
-++++ Date: Thu, 6 Nov 2025 14:54:37 +0800
-++++-Subject: [PATCH 3/3] 20261106secondcommit
-+++++Subject: [PATCH 3/4] 20261106secondcommit
-++++ 
-++++ ---
-++++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-++++new file mode 100644
-++++index 00000000..88a1aef4
-++++--- /dev/null
-+++++++ b/patches/0004-20251106change.patch
-++++@@ -0,0 +1,7498 @@
-+++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-+++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++Date: Thu, 6 Nov 2025 15:48:09 +0800
-+++++Subject: [PATCH 4/4] 20251106change
-+++++
-+++++---
-+++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
-+++++ patches/0001-20251104commit.patch             | 1272 +++++++
-+++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-+++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-+++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
-+++++ create mode 100644 patches/0001-20251104commit.patch
-+++++ create mode 100644 patches/0002-20251106commit.patch
-+++++ create mode 100644 patches/0003-20261106secondcommit.patch
-+++++
-+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++index 2f9192bf..0546f318 100644
-+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-+++++ 
-+++++         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++-# class DeepseekFlashAttention(nn.Module):
-+++++-#     """
-+++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+++++-
-+++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
-+++++-#     """
-+++++-
-+++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++++-#         super().__init__()
-+++++-#         self.config = config
-+++++-#         self.layer_idx = layer_idx
-+++++-#         if layer_idx is None:
-+++++-#             logger.warning(
-+++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++-#                 "when creating this class."
-+++++-#             )
-+++++-
-+++++-#         self.attention_dropout = config.attention_dropout
-+++++-#         self.hidden_size = config.hidden_size
-+++++-#         self.num_heads = config.num_attention_heads
-+++++-#         self.head_dim = self.hidden_size // self.num_heads
-+++++-#         self.num_key_value_heads = config.num_key_value_heads
-+++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++-#         self.max_position_embeddings = config.max_position_embeddings
-+++++-#         self.rope_theta = config.rope_theta
-+++++-#         self.is_causal = True
-+++++-
-+++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++-#             raise ValueError(
-+++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++-#                 f" and `num_heads`: {self.num_heads})."
-+++++-#             )
-+++++-
-+++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++++-#         self._init_rope()
-+++++-
-+++++-#     def _init_rope(self):
-+++++-#         if self.config.rope_scaling is None:
-+++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
-+++++-#                 self.head_dim,
-+++++-#                 max_position_embeddings=self.max_position_embeddings,
-+++++-#                 base=self.rope_theta,
-+++++-#             )
-+++++-#         else:
-+++++-#             scaling_type = self.config.rope_scaling["type"]
-+++++-#             scaling_factor = self.config.rope_scaling["factor"]
-+++++-#             if scaling_type == "linear":
-+++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++++-#                     self.head_dim,
-+++++-#                     max_position_embeddings=self.max_position_embeddings,
-+++++-#                     scaling_factor=scaling_factor,
-+++++-#                     base=self.rope_theta,
-+++++-#                 )
-+++++-#             elif scaling_type == "dynamic":
-+++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++++-#                     self.head_dim,
-+++++-#                     max_position_embeddings=self.max_position_embeddings,
-+++++-#                     scaling_factor=scaling_factor,
-+++++-#                     base=self.rope_theta,
-+++++-#                 )
-+++++-#             else:
-+++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++++-
-+++++-#     def forward(
-+++++-#         self,
-+++++-#         hidden_states: mindspore.Tensor,
-+++++-#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++-#         position_ids: Optional[mindspore.Tensor] = None,
-+++++-#         past_key_value: Optional[Cache] = None,
-+++++-#         output_attentions: bool = False,
-+++++-#         use_cache: bool = False,
-+++++-#         **kwargs,
-+++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++-#         if "padding_mask" in kwargs:
-+++++-#             warnings.warn(
-+++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++++-#             )
-+++++-        
-+++++-#         if output_attentions:
-+++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+++++-
-+++++-#         bsz, q_len, _ = hidden_states.shape
-+++++-
-+++++-#         if self.config.pretraining_tp > 1:
-+++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++++-
-+++++-#         query_states = self.q_proj(hidden_states)
-+++++-#         key_states = self.k_proj(hidden_states)
-+++++-#         value_states = self.v_proj(hidden_states)
-+++++-
-+++++-#         # Reshape for multi-head attention
-+++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++-
-+++++-#         kv_seq_len = key_states.shape[-2]
-+++++-#         if past_key_value is not None:
-+++++-#             if self.layer_idx is None:
-+++++-#                 raise ValueError(
-+++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++-#                     "with a layer index."
-+++++-#                 )
-+++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++-        
-+++++-#         # Apply Rotary Positional Embedding
-+++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++-
-+++++-#         if past_key_value is not None:
-+++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++-
-+++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++-        
-+++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++-        
-+++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++-
-+++++-#         # Convert attention_mask for flash_attention_score
-+++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+++++-#         if attention_mask is not None:
-+++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++++-#                 raise ValueError(
-+++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++++-#                 )
-+++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+++++-#         else:
-+++++-#             attn_mask_for_fa = None
-+++++-        
-+++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++++-
-+++++-#         # Call the fused flash_attention_score operator
-+++++-#         attn_output = mindspore.ops.flash_attention_score(
-+++++-#             query=query_states_for_fa,
-+++++-#             key=key_states_for_fa,
-+++++-#             value=value_states_for_fa,
-+++++-#             head_num=self.num_heads, # This is N1, the number of query heads
-+++++-#             input_layout='BSH',
-+++++-#             attn_mask=attn_mask_for_fa,
-+++++-#             keep_prob=keep_prob,
-+++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++-#             sparse_mode=0 # Default mask mode
-+++++-#         )
-+++++-        
-+++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+++++-#         attn_output = self.o_proj(attn_output)
-+++++-        
-+++++-#         # Flash Attention does not return attention weights
-+++++-#         attn_weights = None
-+++++-
-+++++-#         return attn_output, attn_weights, past_key_value
-+++++ 
-+++++ class DeepseekFlashAttention(nn.Module):
-+++++     """
-+++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
-+++++         super().__init__()
-+++++         self.hidden_size = config.hidden_size
-+++++ 
-+++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-+++++-            config=config, layer_idx=layer_idx
-+++++-        )
-++++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
-++++++            # config=config, layer_idx=layer_idx
-++++++        # )
-+++++ 
-+++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+++++             config=config, layer_idx=layer_idx
-+++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
-+++++         return outputs
-+++++ 
-+++++ 
-+++++-
-+++++ class DeepseekPreTrainedModel(PreTrainedModel):
-+++++     config_class = DeepseekConfig
-+++++     base_model_prefix = "model"
-+++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++         # Initialize weights and apply final processing
-+++++         self.post_init()
-+++++         self.warm_up = False
-+++++-        #@dwj
-+++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+++++-            self.num_layers,
-+++++-            self.num_attention_heads,
-+++++-            self.head_dim,
-+++++-            batch_size=1,
-+++++-            max_length=self.max_length,
-+++++-            dtype=mindspore.float16
-+++++-        )
-+++++-
-+++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+++++-        key_cache = []
-+++++-        value_cache = []
-+++++-        for _ in range(num_layers):
-+++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++-            key_cache.append(k)
-+++++-            value_cache.append(v)
-+++++-        return key_cache, value_cache
-+++++-
-+++++ 
-+++++     def warmup_moe_model_deep(self):
-+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-+++++new file mode 100644
-+++++index 00000000..78f22642
-+++++--- /dev/null
-++++++++ b/patches/0001-20251104commit.patch
-+++++@@ -0,0 +1,1272 @@
-++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++++From: Pinoeer-kingxi <13022943007@163.com>
-++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++++Subject: [PATCH 1/3] 20251104commit
-++++++
-++++++---
-++++++ mindnlp/transformers/cache_utils.py           |  28 +-
-++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-++++++
-++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++++index cadd2e04..02f8d4be 100644
-++++++--- a/mindnlp/transformers/cache_utils.py
-+++++++++ b/mindnlp/transformers/cache_utils.py
-++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++++             # k_out[:, :, cache_position] = key_states
-++++++             # v_out[:, :, cache_position] = value_states
-++++++-            if ON_ORANGE_PI:
-++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++-            else:
-++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++-
-+++++++            # if ON_ORANGE_PI:
-+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++++            # else:
-+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
-+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-+++++++            if cache_position.ndim > 1:
-+++++++                cache_position = cache_position.flatten()
-+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-+++++++                cache_position = cache_position.int()
-+++++++            
-+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-+++++++            k_out[:, :, cache_position] = key_states
-+++++++            v_out[:, :, cache_position] = value_states
-+++++++            
-++++++         return k_out, v_out
-++++++ 
-++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++index c695b944..d8303e45 100644
-++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++++ def rotate_half(x):
-++++++     """Rotates half the hidden dims of the input."""
-++++++-    x1 = x[..., : x.shape[-1] // 2]
-++++++-    x2 = x[..., x.shape[-1] // 2 :]
-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++++    # x1 = x[..., : x.shape[-1] // 2]
-+++++++    # x2 = x[..., x.shape[-1] // 2 :]
-+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++     return ops.cat((-x2, x1), dim=-1)
-++++++ 
-++++++ 
-++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++++         if self.training:
-++++++             raise NotImplementedError("Training is not supported yet.")
-++++++         else:
-++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++-        if self.config.n_shared_experts is not None:
-++++++-            y = y + self.shared_experts(identity)
-++++++-        return y
-+++++++            # @lwx
-+++++++            if orig_shape[1] == 1:
-+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-+++++++                y=y.view(*orig_shape)
-+++++++                if self.config.n_shared_experts is not None:
-+++++++                    y = y + self.shared_experts(identity)
-+++++++                return y
-+++++++            else:
-+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-+++++++                if self.config.n_shared_experts is not None:
-+++++++                    y = y + self.shared_experts(identity)
-+++++++                return y
-+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++++        # if self.config.n_shared_experts is not None:
-+++++++        #     y = y + self.shared_experts(identity)
-+++++++        # return y
-+++++++        
-+++++++    @no_grad()
-+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++++
-+++++++        expert_cache = ops.zeros_like(x)
-+++++++        for i in range(self.num_experts_per_tok):
-+++++++            expert_id = flat_expert_indices[i].item()
-+++++++            weight = flat_expert_weights[i].item()
-+++++++            expert = self.experts[expert_id]
-+++++++            expert_out = expert(x)
-+++++++            expert_cache += expert_out * weight
-+++++++        return expert_cache
-++++++ 
-++++++     @no_grad()
-++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-        # expert_cache = torch.zeros_like(x)
-++++++-        # idxs = flat_expert_indices.argsort()
-++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++-        # token_idxs = idxs // self.num_experts_per_tok
-++++++-        # for i, end_idx in enumerate(tokens_per_expert):
-++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++-        #     if start_idx == end_idx:
-++++++-        #         continue
-++++++-        #     expert = self.experts[i]
-++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-        #     expert_tokens = x[exp_token_idx]
-++++++-        #     expert_out = expert(expert_tokens)
-++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++-        # return expert_cache
-+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++         expert_cache = ops.zeros_like(x)
-++++++         idxs = flat_expert_indices.argsort()
-++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++         token_idxs = idxs // self.num_experts_per_tok
-+++++++
-++++++         for i, end_idx in enumerate(tokens_per_expert):
-++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++             if start_idx == end_idx:
-++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++++             expert_out = expert(expert_tokens)
-++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++++
-++++++         return expert_cache
-+++++++        
-+++++++    # @no_grad()
-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++    #     # expert_cache = torch.zeros_like(x)
-+++++++    #     # idxs = flat_expert_indices.argsort()
-+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
-+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++++    #     #     if start_idx == end_idx:
-+++++++    #     #         continue
-+++++++    #     #     expert = self.experts[i]
-+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #     #     expert_tokens = x[exp_token_idx]
-+++++++    #     #     expert_out = expert(expert_tokens)
-+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++++    #     # return expert_cache
-+++++++    #     expert_cache = ops.zeros_like(x)
-+++++++    #     idxs = flat_expert_indices.argsort()
-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++    #         if start_idx == end_idx:
-+++++++    #             continue
-+++++++    #         expert = self.experts[i]
-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #         expert_tokens = x[exp_token_idx]
-+++++++    #         expert_out = expert(expert_tokens)
-+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++++
-+++++++    #     return expert_cache
-+++++++    # @no_grad()
-+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++    #     expert_cache = ops.zeros_like(x)
-+++++++
-+++++++    #     # 排序保证顺序一致
-+++++++    #     idxs = flat_expert_indices.argsort()
-+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++    #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++    #     # 找出有 token 的专家
-+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++++
-+++++++    #     for i in active_experts.tolist():
-+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++    #         end_idx = tokens_per_expert[i]
-+++++++    #         if start_idx == end_idx:  # 没有 token
-+++++++    #             continue
-+++++++
-+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++    #         expert_tokens = x[exp_token_idx]
-+++++++    #         expert_out = self.experts[i](expert_tokens)
-+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++++
-+++++++    #         expert_cache = mindspore.mint.scatter_add(
-+++++++    #             expert_cache,
-+++++++    #             0,
-+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++++    #             expert_out
-+++++++    #         )
-+++++++
-+++++++    #     return expert_cache
-+++++++
-+++++++
-++++++ 
-++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++++ #     """
-++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++ 
-++++++         # Initialize weights and apply final processing
-++++++         self.post_init()
-+++++++        self.warm_up = False
-+++++++
-+++++++    def warmup_moe_model_deep(self):
-+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-+++++++        test_texts = [
-+++++++            "warmup short",
-+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-+++++++        ]
-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++++        if tokenizer is None:
-+++++++            from mindnlp.transformers import AutoTokenizer
-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++++            self._warmup_tokenizer = tokenizer
-+++++++
-+++++++        for text in test_texts:
-+++++++            inputs = tokenizer(text, return_tensors="ms")
-+++++++            with mindspore._no_grad():
-+++++++                _ = self(**inputs, use_cache=False)
-+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++++ 
-++++++     def get_input_embeddings(self):
-++++++         return self.model.embed_tokens
-++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++         ```"""
-+++++++        if not self.warm_up:
-+++++++            self.warm_up = True
-+++++++            self.warmup_moe_model_deep()
-+++++++
-++++++         output_attentions = (
-++++++             output_attentions
-++++++             if output_attentions is not None
-++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++index 3cbf820e..d4c6b651 100644
-++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++@@ -18,7 +18,6 @@
-++++++ # See the License for the specific language governing permissions and
-++++++ # limitations under the License.
-++++++ """MindSpore Qwen2MoE model."""
-++++++-
-++++++ import math
-++++++ from typing import List, Optional, Tuple, Union
-++++++ 
-++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++++     TokenClassifierOutput,
-++++++ )
-++++++ from ...modeling_utils import PreTrainedModel
-+++++++from ...generation import GenerationMixin
-++++++ from ....utils import logging
-++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-++++++ 
-++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++++         self.variance_epsilon = eps
-++++++ 
-++++++     def forward(self, hidden_states):
-+++++++        # @dwj
-+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++++        # @lwx
-+++++++        # if not self.training :
-+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++         input_dtype = hidden_states.dtype
-++++++         hidden_states = hidden_states.to(mindspore.float32)
-++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++++@@ -234,6 +239,8 @@ def rotate_half(x):
-++++++     """Rotates half the hidden dims of the input."""
-++++++     x1 = x[..., : x.shape[-1] // 2]
-++++++     x2 = x[..., x.shape[-1] // 2 :]
-+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++     return ops.cat((-x2, x1), dim=-1)
-++++++ 
-++++++ 
-++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++++         self.config = config
-++++++         self.hidden_size = config.hidden_size
-++++++         self.intermediate_size = intermediate_size
-+++++++        
-++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++++         self.act_fn = ACT2FN[config.hidden_act]
-++++++ 
-++++++     def forward(self, x):
-++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++-
-++++++ 
-+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++++        # @lwx 
-+++++++        # gate_up_output = self.gate_up_proj(x)
-+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-+++++++        # return self.down_proj(swiglu_output)
-+++++++
-+++++++    # def forward(self, x):
-+++++++    #     gate_proj_out = self.gate_proj(x)
-+++++++    #     up_proj_out = self.up_proj(x)
-+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-+++++++    #     return self.down_proj(swiglu_out)
-+++++++        
-++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++++     """
-++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++++         use_cache: bool = False,
-++++++         cache_position: Optional[mindspore.Tensor] = None,
-++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++        
-+++++++
-++++++         bsz, q_len, _ = hidden_states.shape
-++++++ 
-++++++         query_states = self.q_proj(hidden_states)
-++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++                     "with a layer index."
-++++++                 )
-++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++            if isinstance(past_key_value, StaticCache):
-+++++++                kv_seq_len = key_states.shape[-2]
-+++++++            else:
-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++ 
-++++++         if past_key_value is not None:
-++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++            
-+++++++            if isinstance(past_key_value, StaticCache):
-+++++++                kv_seq_len = key_states.shape[-2]
-++++++ 
-++++++         # repeat k/v heads if n_kv_heads < n_heads
-++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++-
-+++++++        
-++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++ 
-++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++++-            raise ValueError(
-++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++++-                f" {attn_weights.shape}"
-++++++-            )
-++++++-
-++++++-        if attention_mask is not None:  # no matter the length, we just slice it
-++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-+++++++        if attention_mask is not None:
-+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++             attn_weights = attn_weights + causal_mask
-++++++ 
-++++++         # upcast attention to fp32
-++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++ 
-++++++         attn_output = self.o_proj(attn_output)
-++++++-
-+++++++        # @lwx
-+++++++        
-+++++++        # max_seq_len = self.max_position_embeddings  # 2048
-+++++++
-+++++++        # if attention_mask is not None:
-+++++++        #     # attention_mask: [B, 1, Sq, Sk]
-+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++++
-+++++++        #     # pad 到 [max_seq_len, max_seq_len]
-+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++++        #     global_attention_mask = padded_mask
-+++++++        # else:
-+++++++        #     global_attention_mask = None
-+++++++
-+++++++
-+++++++        # sparse_mode=3
-+++++++        # attn_output = mindspore.ops.flash_attention_score(
-+++++++        #     query=query_states,
-+++++++        #     key=key_states,
-+++++++        #     value=value_states,
-+++++++        #     real_shift=None,
-+++++++        #     padding_mask=None,
-+++++++
-+++++++        #     head_num=self.num_heads,
-+++++++        #     attn_mask=global_attention_mask,
-+++++++        #     keep_prob=1.0 - self.attention_dropout,
-+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++        #     input_layout="BNSD",
-+++++++        #     pre_tokens=2147483647,
-+++++++        #     next_tokens=2147483647,
-+++++++        #     inner_precise=0,
-+++++++        #     drop_mask=None,
-+++++++        #     prefix=None,
-+++++++        #     actual_seq_qlen=None,
-+++++++        #     actual_seq_kvlen=None,
-+++++++        #     sparse_mode=sparse_mode,
-+++++++        # )
-++++++         if not output_attentions:
-++++++             attn_weights = None
-++++++ 
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++ 
-+++++++class Qwen2MoeFlashAttention(nn.Module):
-+++++++    """
-+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++++
-+++++++    关键改动:
-+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++++       直接传入原始的 key 和 value 张量效率更高。
-+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++++    """
-+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++++        super().__init__()
-+++++++        self.config = config
-+++++++        self.layer_idx = layer_idx
-+++++++        self.hidden_size = config.hidden_size
-+++++++        self.num_heads = config.num_attention_heads
-+++++++        self.head_dim = self.hidden_size // self.num_heads
-+++++++        self.num_key_value_heads = config.num_key_value_heads
-+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++        self.max_position_embeddings = config.max_position_embeddings
-+++++++        self.rope_theta = config.rope_theta
-+++++++        self.attention_dropout = config.attention_dropout
-+++++++
-+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++            raise ValueError(
-+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++++            )
-+++++++
-+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++++
-+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++++            self.head_dim,
-+++++++            max_position_embeddings=self.max_position_embeddings,
-+++++++            base=self.rope_theta,
-+++++++        )
-+++++++
-+++++++    def forward(
-+++++++        self,
-+++++++        hidden_states: mindspore.Tensor,
-+++++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++++        past_key_value: Optional[Cache] = None,
-+++++++        output_attentions: bool = False,
-+++++++        use_cache: bool = False,
-+++++++        cache_position: Optional[mindspore.Tensor] = None,
-+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++        bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++        # 1. 线性投射 Q, K, V
-+++++++        query_states = self.q_proj(hidden_states)
-+++++++        key_states = self.k_proj(hidden_states)
-+++++++        value_states = self.v_proj(hidden_states)
-+++++++
-+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++        # 3. RoPE 旋转位置编码
-+++++++        kv_seq_len = key_states.shape[-2]
-+++++++        if past_key_value is not None:
-+++++++            if self.layer_idx is None:
-+++++++                raise ValueError(
-+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++                    "with a layer index."
-+++++++                )
-+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++++                if cache_position.shape[0] == 1:
-+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++++                    kv_seq_len = past_seen_tokens + 1
-+++++++                else:
-+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
-+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++++            else:
-+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++        # 4. KV 缓存更新
-+++++++        if past_key_value is not None:
-+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++            key_states, value_states = past_key_value.update(
-+++++++                key_states, value_states, self.layer_idx, cache_kwargs
-+++++++            )
-+++++++            
-+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++                if cache_position.shape[0] == 1:
-+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++++                    kv_seq_len = key_states.shape[-2]
-+++++++
-+++++++        # 5. [重要] 准备 Attention Mask
-+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++++        fa_attention_mask = None
-+++++++        if attention_mask is not None:
-+++++++            # 截取与当前key长度匹配的部分
-+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++++            fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++++        input_dtype = query_states.dtype
-+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++++            query_states = query_states.to(mindspore.float16)
-+++++++            key_states = key_states.to(mindspore.float16)
-+++++++            value_states = value_states.to(mindspore.float16)
-+++++++
-+++++++        # 6. [核心] 调用 flash_attention_score 算子
-+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++++        attn_output = mindspore.ops.flash_attention_score(
-+++++++            query=query_states,
-+++++++            key=key_states,
-+++++++            value=value_states,
-+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++++            attn_mask=fa_attention_mask,
-+++++++            keep_prob=1.0 - self.attention_dropout,
-+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++            input_layout="BNSD",
-+++++++            sparse_mode=0 # 使用 defaultMask 模式
-+++++++        )
-+++++++
-+++++++        # 恢复原始数据类型
-+++++++        attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++        # 7. 调整输出形状
-+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++        attn_output = self.o_proj(attn_output)
-+++++++
-+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++++        attn_weights = None
-+++++++        if output_attentions:
-+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++        return attn_output, attn_weights, past_key_value
-+++++++
-+++++++    # def forward(
-+++++++    #     self,
-+++++++    #     hidden_states: mindspore.Tensor,
-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++++    #     past_key_value: Optional[Cache] = None,
-+++++++    #     output_attentions: bool = False,
-+++++++    #     use_cache: bool = False,
-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++    #     bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++    #     # 1. 线性投射 Q, K, V
-+++++++    #     query_states = self.q_proj(hidden_states)
-+++++++    #     key_states = self.k_proj(hidden_states)
-+++++++    #     value_states = self.v_proj(hidden_states)
-+++++++
-+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++    #     # 3. RoPE 旋转位置编码
-+++++++    #     kv_seq_len = key_states.shape[-2]
-+++++++    #     if past_key_value is not None:
-+++++++    #         if self.layer_idx is None:
-+++++++    #             raise ValueError(
-+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++    #                 "with a layer index."
-+++++++    #             )
-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++    #     # 4. KV 缓存更新
-+++++++    #     if past_key_value is not None:
-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++    #         key_states, value_states = past_key_value.update(
-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++++    #         )
-+++++++
-+++++++    #     # 5. 准备 Attention Mask
-+++++++    #     fa_attention_mask = None
-+++++++    #     if attention_mask is not None:
-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++    #         fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++++    #     input_dtype = query_states.dtype
-+++++++
-+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++++    #         query=query_states,
-+++++++    #         key=key_states,
-+++++++    #         value=value_states,
-+++++++    #         head_num=self.num_heads,
-+++++++    #         attn_mask=fa_attention_mask,
-+++++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++    #         input_layout="BNSD",
-+++++++    #         sparse_mode=0,
-+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++++    #         inner_precise=1
-+++++++    #     )
-+++++++
-+++++++    #     # 恢复原始数据类型
-+++++++    #     attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++    #     # 7. 调整输出形状
-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++    #     attn_output = self.o_proj(attn_output)
-+++++++
-+++++++    #     attn_weights = None
-+++++++    #     if output_attentions:
-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++    #     return attn_output, attn_weights, past_key_value
-+++++++
-+++++++    # def forward(
-+++++++    #     self,
-+++++++    #     hidden_states: mindspore.Tensor,
-+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-+++++++    #     past_key_value: Optional[Cache] = None,
-+++++++    #     output_attentions: bool = False,
-+++++++    #     use_cache: bool = False,
-+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++    #     bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++    #     query_states = self.q_proj(hidden_states)
-+++++++    #     key_states = self.k_proj(hidden_states)
-+++++++    #     value_states = self.v_proj(hidden_states)
-+++++++
-+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++    #     kv_seq_len = key_states.shape[-2]
-+++++++    #     if past_key_value is not None:
-+++++++    #         if self.layer_idx is None:
-+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
-+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++    #     if past_key_value is not None:
-+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++    #         key_states, value_states = past_key_value.update(
-+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++++    #         )
-+++++++
-+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++
-+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
-+++++++    #     # <--- 修改结束 ---
-+++++++
-+++++++    #     fa_attention_mask = None
-+++++++    #     if attention_mask is not None:
-+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++    #         fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++    #     input_dtype = query_states.dtype
-+++++++
-+++++++    #     attn_output = mindspore.ops.flash_attention_score(
-+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
-+++++++    #         key=key_states,
-+++++++    #         value=value_states,
-+++++++    #         head_num=self.num_heads,
-+++++++    #         attn_mask=fa_attention_mask,
-+++++++    #         keep_prob=1.0 - self.attention_dropout,
-+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-+++++++    #         input_layout="BNSD",
-+++++++    #         sparse_mode=0,
-+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
-+++++++    #     )
-+++++++
-+++++++    #     attn_output = attn_output.to(input_dtype)
-+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++    #     attn_output = self.o_proj(attn_output)
-+++++++
-+++++++    #     attn_weights = None
-+++++++    #     if output_attentions:
-+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++++
-+++++++    #     return attn_output, attn_weights, past_key_value
-+++++++
-++++++ QWEN2MOE_ATTENTION_CLASSES = {
-++++++     "eager": Qwen2MoeAttention,
-+++++++    "flash-attention": Qwen2MoeFlashAttention,
-++++++ }
-++++++ 
-++++++ 
-++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++ 
-+++++++    #@dwj
-+++++++    # 只遍历激活的专家，而非全部专家
-++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-++++++-        # router_logits: (batch * sequence_length, n_experts)
-++++++-        router_logits = self.gate(hidden_states)
-++++++-
-++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-        if self.norm_topk_prob:
-++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        # we cast back to the input dtype
-++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-
-++++++-        final_hidden_states = ops.zeros(
-++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++++-        )
-++++++-
-++++++-        # One hot encode the selected experts to create an expert mask
-++++++-        # this will be used to easily index which expert is going to be sollicitated
-++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++++-
-++++++-        # Loop over all available experts in the model and perform the computation on each expert
-++++++-        for expert_idx in range(self.num_experts):
-++++++-            expert_layer = self.experts[expert_idx]
-++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++++-
-++++++-            # Index the correct hidden states and compute the expert hidden state for
-++++++-            # the current expert. We need to make sure to multiply the output hidden
-++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++++-            if 0 not in idx.shape:
-++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++++-
-++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-++++++-                # the `top_x` tensor here.
-++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++++-
-++++++-        shared_expert_output = self.shared_expert(hidden_states)
-++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++++-
-++++++-        final_hidden_states = final_hidden_states + shared_expert_output
-+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++            num_tokens = hidden_states_reshaped.shape[0]
-+++++++
-+++++++            router_logits = self.gate(hidden_states_reshaped)
-+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++
-+++++++            if self.norm_topk_prob:
-+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++            
-+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++++            flat_selected_experts = selected_experts.flatten()
-+++++++            
-+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++++            token_indices = broadcasted_token_indices.flatten()
-+++++++            
-+++++++            active_experts = ops.unique(flat_selected_experts)
-+++++++            
-+++++++            for expert_idx_tensor in active_experts:
-+++++++                expert_idx = expert_idx_tensor.item()
-+++++++                expert_layer = self.experts[expert_idx]
-+++++++                
-+++++++                mask = (flat_selected_experts == expert_idx_tensor)
-+++++++                selected_token_indices = token_indices[mask]
-+++++++                selected_routing_weights = routing_weights.flatten()[mask]
-+++++++                
-+++++++                current_states = hidden_states_reshaped[selected_token_indices]
-+++++++                
-+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++                
-+++++++                final_hidden_states = final_hidden_states.index_add(
-+++++++                    dim=0,
-+++++++                    index=selected_token_indices,
-+++++++                    source=expert_output.to(hidden_states.dtype)
-+++++++                )
-+++++++            
-+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++++ 
-++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++-        return final_hidden_states, router_logits
-+++++++            final_hidden_states = final_hidden_states + shared_expert_output
-+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++++            
-+++++++            return final_hidden_states, router_logits
-++++++ 
-++++++ 
-++++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++++ 
-++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++ 
-+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++++
-++++++         if (layer_idx not in config.mlp_only_layers) and (
-++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++++         ):
-++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++++     _skip_keys_device_placement = "past_key_values"
-++++++     _supports_cache_class = True
-+++++++#lwx
-+++++++    # _supports_static_cache = True
-++++++ 
-++++++     def _init_weights(self, module):
-++++++         std = self.config.initializer_range
-++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++++         return causal_mask
-++++++ 
-++++++ 
-++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++     _tied_weights_keys = ["lm_head.weight"]
-++++++ 
-++++++     def __init__(self, config):
-++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++         self.num_experts_per_tok = config.num_experts_per_tok
-++++++         # Initialize weights and apply final processing
-++++++         self.post_init()
-+++++++        # @lwx
-+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-+++++++        #     self.generation_config.cache_implementation = "static"
-+++++++        self._warmed_up = False
-+++++++
-+++++++    def warmup_moe_model(self):
-+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-+++++++        test_texts = [
-+++++++            "warmup short",
-+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-+++++++        ]
-+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-+++++++        if tokenizer is None:
-+++++++            from mindnlp.transformers import AutoTokenizer
-+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-+++++++            self._warmup_tokenizer = tokenizer
-+++++++
-+++++++        for text in test_texts:
-+++++++            inputs = tokenizer(text, return_tensors="ms")
-+++++++            with mindspore._no_grad():
-+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++++ 
-++++++     def get_input_embeddings(self):
-++++++         return self.model.embed_tokens
-++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++         ```"""
-+++++++        if not self._warmed_up:
-+++++++            self._warmed_up = True
-+++++++            self.warmup_moe_model()
-++++++ 
-++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++++         output_router_logits = (
-++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++             }
-++++++         )
-++++++         return model_inputs
-+++++++# @lwx
-+++++++    # def _decode_one_tokens_logits(
-+++++++    #     self,
-+++++++    #     cur_token: mindspore.Tensor,
-+++++++    #     input_pos: Optional[mindspore.Tensor],
-+++++++    #     cache_position: mindspore.Tensor,
-+++++++    #     past_key_values: StaticCache,
-+++++++    # ) -> mindspore.Tensor:
-+++++++    #     """
-+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-+++++++        
-+++++++    #     Args:
-+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-+++++++    #         input_pos: 输入位置信息，可选
-+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-+++++++            
-+++++++    #     Returns:
-+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-+++++++    #     """
-+++++++    #     # 调用JIT编译的版本
-+++++++    #     return self.get_decode_one_tokens_logits(
-+++++++    #         cur_token=cur_token,
-+++++++    #         input_pos=input_pos,
-+++++++    #         cache_position=cache_position,
-+++++++    #         past_key_values=past_key_values,
-+++++++    #     )
-+++++++    
-+++++++    # @mindspore.jit(jit_level='O1')
-+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-+++++++    #     """
-+++++++    #     JIT编译的函数，用于高效的单token解码
-+++++++    #     使用JIT编译优化以支持静态shape和高效执行
-+++++++        
-+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-+++++++    #     """
-+++++++    #     outputs = self.model.forward(
-+++++++    #         input_ids=cur_token,
-+++++++    #         position_ids=input_pos,
-+++++++    #         cache_position=cache_position,
-+++++++    #         past_key_values=past_key_values,
-+++++++    #         use_cache=True,
-+++++++    #         return_dict=False,
-+++++++    #     )
-+++++++        
-+++++++    #     hidden_states = outputs[0]
-+++++++    #     logits = self.lm_head.forward(hidden_states)
-+++++++    #     logits = logits.float()
-+++++++        
-+++++++    #     return logits[:, -1, :]
-+++++++
-+++++++    # def _sample(
-+++++++    #     self,
-+++++++    #     input_ids: mindspore.Tensor,
-+++++++    #     logits_processor,
-+++++++    #     stopping_criteria,
-+++++++    #     generation_config,
-+++++++    #     synced_devices: bool,
-+++++++    #     streamer=None,
-+++++++    #     logits_warper=None,
-+++++++    #     **model_kwargs,
-+++++++    # ):
-+++++++    #     """
-+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-+++++++    #     """
-+++++++    #     from ...generation.logits_process import LogitsProcessorList
-+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-+++++++    #     from mindnlp.core import nn, ops, no_grad
-+++++++    #     import numpy as np
-+++++++        
-+++++++    #     # 检查是否使用 StaticCache
-+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-+++++++    #     # 否则，直接调用父类方法
-+++++++    #     past_key_values = model_kwargs.get("past_key_values")
-+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-+++++++
-+++++++    #     if not isinstance(past_key_values, StaticCache):
-+++++++    #         # 不使用 StaticCache，直接调用父类方法
-+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-+++++++    #         return super()._sample(
-+++++++    #             input_ids=input_ids,
-+++++++    #             logits_processor=logits_processor,
-+++++++    #             stopping_criteria=stopping_criteria,
-+++++++    #             generation_config=generation_config,
-+++++++    #             synced_devices=synced_devices,
-+++++++    #             streamer=streamer,
-+++++++    #             logits_warper=logits_warper,
-+++++++    #             **model_kwargs,
-+++++++    #         )
-+++++++        
-+++++++    #     # 使用 StaticCache，进入自定义循环
-+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-+++++++    #     pad_token_id = generation_config._pad_token_tensor
-+++++++    #     output_attentions = generation_config.output_attentions
-+++++++    #     output_hidden_states = generation_config.output_hidden_states
-+++++++    #     output_scores = generation_config.output_scores
-+++++++    #     output_logits = generation_config.output_logits
-+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-+++++++    #     max_length = generation_config.max_length
-+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-+++++++    #     do_sample = generation_config.do_sample
-+++++++        
-+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-+++++++    #         raise ValueError(
-+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-+++++++    #             f"{logits_warper})."
-+++++++    #         )
-+++++++        
-+++++++    #     # init attention / hidden states / scores tuples
-+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-+++++++        
-+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-+++++++    #         encoder_hidden_states = (
-+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-+++++++    #         )
-+++++++        
-+++++++    #     # keep track of which sequences are already finished
-+++++++    #     batch_size, cur_len = input_ids.shape
-+++++++    #     this_peer_finished = False
-+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-+++++++        
-+++++++    #     time_record = []
-+++++++    #     from ....utils.testing_utils import parse_flag_from_env
-+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-+++++++        
-+++++++    #     while self._has_unfinished_sequences(
-+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-+++++++    #     ):
-+++++++    #         if _record_time:
-+++++++    #             import time as time_module
-+++++++    #             infer_start = time_module.time()
-+++++++            
-+++++++    #         # prepare model inputs
-+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-+++++++            
-+++++++    #         # prepare variable output controls
-+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-+++++++            
-+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-+++++++    #         cur_cache_position = model_inputs.get("cache_position")
-+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-+++++++    #         cur_input_ids = model_inputs.get("input_ids")
-+++++++            
-+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-+++++++    #             cur_cache_position is not None and 
-+++++++    #             len(cur_cache_position.shape) > 0 and
-+++++++    #             cur_cache_position.shape[0] == 1 and
-+++++++    #             cur_input_ids is not None and
-+++++++    #             cur_input_ids.shape[1] == 1):
-+++++++    #             # 使用 JIT 优化的单 token 解码
-+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-+++++++    #             if not hasattr(self, '_jit_used'):
-+++++++    #                 self._jit_used = False
-+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-+++++++                
-+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-+++++++    #                 cur_token=cur_input_ids,
-+++++++    #                 input_pos=model_inputs.get("position_ids"),
-+++++++    #                 cache_position=cur_cache_position,
-+++++++    #                 past_key_values=cur_past_key_values,
-+++++++    #             )
-+++++++                
-+++++++    #             # 标记已使用JIT（用于后续判断）
-+++++++    #             if not self._jit_used:
-+++++++    #                 self._jit_used = True
-+++++++                
-+++++++    #             # 构造兼容的输出对象
-+++++++    #             class JitOptimizedOutput:
-+++++++    #                 def __init__(self, logits, config):
-+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-+++++++    #                     self.config = config
-+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-+++++++    #                     self.cross_attentions = None
-+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-+++++++                
-+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-+++++++    #         else:
-+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-+++++++    #             outputs = self(**model_inputs, return_dict=True)
-+++++++            
-+++++++    #         if synced_devices and this_peer_finished:
-+++++++    #             continue
-+++++++            
-+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-+++++++    #         next_token_logits = outputs.logits[:, -1, :]
-+++++++            
-+++++++    #         # pre-process distribution
-+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-+++++++    #         if do_sample:
-+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-+++++++            
-+++++++    #         # Store scores, attentions and hidden_states when required
-+++++++    #         if return_dict_in_generate:
-+++++++    #             if output_scores:
-+++++++    #                 scores += (next_token_scores,)
-+++++++    #             if output_logits:
-+++++++    #                 raw_logits += (next_token_logits,)
-+++++++    #             if output_attentions:
-+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-+++++++    #                 if self.config.is_encoder_decoder:
-+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-+++++++                
-+++++++    #             if output_hidden_states:
-+++++++    #                 hidden = (
-+++++++    #                     outputs.decoder_hidden_states
-+++++++    #                     if self.config.is_encoder_decoder
-+++++++    #                     else outputs.hidden_states
-+++++++    #                 )
-+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-+++++++            
-+++++++    #         # token selection
-+++++++    #         if do_sample:
-+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-+++++++    #         else:
-+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-+++++++            
-+++++++    #         # finished sentences should have their next token be a padding token
-+++++++    #         if has_eos_stopping_criteria:
-+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-+++++++            
-+++++++    #         # update generated ids, model inputs, and length for next step
-+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-+++++++    #         if streamer is not None:
-+++++++    #             streamer.put(next_tokens)
-+++++++            
-+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-+++++++    #             outputs,
-+++++++    #             model_kwargs,
-+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-+++++++    #         )
-+++++++            
-+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-+++++++    #         cur_len += 1
-+++++++            
-+++++++    #         if _record_time:
-+++++++    #             import time as time_module
-+++++++    #             infer_stop = time_module.time()
-+++++++    #             time_record.append(infer_stop - infer_start)
-+++++++            
-+++++++    #         del outputs
-+++++++        
-+++++++    #     average_infer_time = None
-+++++++    #     if time_record:
-+++++++    #         if len(time_record) > 1:
-+++++++    #             time_record.pop(0)
-+++++++    #         average_infer_time = sum(time_record) / len(time_record)
-+++++++    #         print(f'average inference time is: {average_infer_time}')
-+++++++    #         print(f'inference time record: {time_record}')
-+++++++        
-+++++++    #     if streamer is not None:
-+++++++    #         streamer.end()
-+++++++        
-+++++++    #     # 简单判断：打印是否使用了JIT路径
-+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-+++++++    #     else:
-+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-+++++++        
-+++++++    #     if return_dict_in_generate:
-+++++++    #         if self.config.is_encoder_decoder:
-+++++++    #             return GenerateEncoderDecoderOutput(
-+++++++    #                 sequences=input_ids,
-+++++++    #                 scores=scores,
-+++++++    #                 logits=raw_logits,
-+++++++    #                 encoder_attentions=encoder_attentions,
-+++++++    #                 encoder_hidden_states=encoder_hidden_states,
-+++++++    #                 decoder_attentions=decoder_attentions,
-+++++++    #                 cross_attentions=cross_attentions,
-+++++++    #                 decoder_hidden_states=decoder_hidden_states,
-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++++    #                 average_infer_time=average_infer_time
-+++++++    #             )
-+++++++    #         else:
-+++++++    #             return GenerateDecoderOnlyOutput(
-+++++++    #                 sequences=input_ids,
-+++++++    #                 scores=scores,
-+++++++    #                 logits=raw_logits,
-+++++++    #                 attentions=decoder_attentions,
-+++++++    #                 hidden_states=decoder_hidden_states,
-+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-+++++++    #                 average_infer_time=average_infer_time
-+++++++    #             )
-+++++++    #     else:
-+++++++    #         return input_ids
-+++++++            
-+++++++    # def _prepare_cache_for_generation(
-+++++++    #     self,
-+++++++    #     generation_config,
-+++++++    #     model_kwargs,
-+++++++    #     assistant_model,
-+++++++    #     batch_size,
-+++++++    #     max_cache_length,
-+++++++    # ):
-+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-+++++++    #         generation_config.cache_implementation = "static"
-+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-+++++++        
-+++++++    #     if generation_config.cache_implementation == "static":
-+++++++    #         base_required_from_max_length = generation_config.max_length + 1
-+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-+++++++    #         min_cache_size = 50
-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-+++++++    #         else:
-+++++++    #             max_cache_length = max(base_required, min_cache_size)
-+++++++            
-+++++++    #         original_max_cache_length = max_cache_length
-+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-+++++++            
-+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-+++++++    #             if max_cache_length > self.config.max_position_embeddings:
-+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-+++++++        
-+++++++    #     result = super()._prepare_cache_for_generation(
-+++++++    #         generation_config=generation_config,
-+++++++    #         model_kwargs=model_kwargs,
-+++++++    #         assistant_model=assistant_model,
-+++++++    #         batch_size=batch_size,
-+++++++    #         max_cache_length=max_cache_length,
-+++++++    #     )
-+++++++        
-+++++++    #     if generation_config.cache_implementation == "static":
-+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-+++++++    #         created_cache = model_kwargs.get(cache_name)
-+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
-+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-+++++++        
-+++++++    #     return result
-+++++++
-+++++++
-+++++++
-++++++ 
-++++++ 
-++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++++-- 
-++++++2.27.0
-++++++
-+++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-+++++new file mode 100644
-+++++index 00000000..22b65dd5
-+++++--- /dev/null
-++++++++ b/patches/0002-20251106commit.patch
-+++++@@ -0,0 +1,3200 @@
-++++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-++++++From: Pinoeer-kingxi <13022943007@163.com>
-++++++Date: Thu, 6 Nov 2025 09:20:38 +0800
-++++++Subject: [PATCH 2/3] 20251106commit
-++++++
-++++++---
-++++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-++++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-++++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
-++++++ create mode 100644 patches/0001-20251104commit.patch
-++++++
-++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++index d8303e45..73773c22 100644
-++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
-++++++         #     y = y + self.shared_experts(identity)
-++++++         # return y
-++++++         
-+++++++    # @no_grad()
-+++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++++
-+++++++    #     expert_cache = ops.zeros_like(x)
-+++++++    #     for i in range(self.num_experts_per_tok):
-+++++++    #         expert_id = flat_expert_indices[i].item()
-+++++++    #         weight = flat_expert_weights[i].item()
-+++++++    #         expert = self.experts[expert_id]
-+++++++    #         expert_out = expert(x)
-+++++++    #         expert_cache += expert_out * weight
-+++++++    #     return expert_cache
-+++++++
-++++++     @no_grad()
-++++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-+++++++        # x 的 shape: (1, hidden_size)
-+++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
-+++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
-+++++++
-+++++++        # 1. 收集所有需要的专家层
-+++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
-+++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
-+++++++
-+++++++        # 2. 并行计算所有专家的输出
-+++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
-+++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
-+++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
-+++++++
-+++++++        # 3. 使用矩阵乘法进行加权求和
-+++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
-+++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
-+++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
-+++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
-+++++++        
-+++++++        return final_output
-++++++ 
-++++++-        expert_cache = ops.zeros_like(x)
-++++++-        for i in range(self.num_experts_per_tok):
-++++++-            expert_id = flat_expert_indices[i].item()
-++++++-            weight = flat_expert_weights[i].item()
-++++++-            expert = self.experts[expert_id]
-++++++-            expert_out = expert(x)
-++++++-            expert_cache += expert_out * weight
-++++++-        return expert_cache
-++++++ 
-++++++     @no_grad()
-++++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
-++++++             key_states = self.k_proj(hidden_states)
-++++++             value_states = self.v_proj(hidden_states)
-++++++ 
-++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++++        # @lwx
-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-+++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-+++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-++++++ 
-++++++         kv_seq_len = key_states.shape[-2]
-++++++         if past_key_value is not None:
-++++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++ 
-+++++++# class DeepseekFlashAttention(nn.Module):
-+++++++#     """
-+++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-+++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
-+++++++
-+++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
-+++++++#     """
-+++++++
-+++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++++++#         super().__init__()
-+++++++#         self.config = config
-+++++++#         self.layer_idx = layer_idx
-+++++++#         if layer_idx is None:
-+++++++#             logger.warning(
-+++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++++#                 "when creating this class."
-+++++++#             )
-+++++++
-+++++++#         self.attention_dropout = config.attention_dropout
-+++++++#         self.hidden_size = config.hidden_size
-+++++++#         self.num_heads = config.num_attention_heads
-+++++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++++#         self.rope_theta = config.rope_theta
-+++++++#         self.is_causal = True
-+++++++
-+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++#             raise ValueError(
-+++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++++#                 f" and `num_heads`: {self.num_heads})."
-+++++++#             )
-+++++++
-+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++++++#         self._init_rope()
-+++++++
-+++++++#     def _init_rope(self):
-+++++++#         if self.config.rope_scaling is None:
-+++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
-+++++++#                 self.head_dim,
-+++++++#                 max_position_embeddings=self.max_position_embeddings,
-+++++++#                 base=self.rope_theta,
-+++++++#             )
-+++++++#         else:
-+++++++#             scaling_type = self.config.rope_scaling["type"]
-+++++++#             scaling_factor = self.config.rope_scaling["factor"]
-+++++++#             if scaling_type == "linear":
-+++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++++++#                     self.head_dim,
-+++++++#                     max_position_embeddings=self.max_position_embeddings,
-+++++++#                     scaling_factor=scaling_factor,
-+++++++#                     base=self.rope_theta,
-+++++++#                 )
-+++++++#             elif scaling_type == "dynamic":
-+++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++++++#                     self.head_dim,
-+++++++#                     max_position_embeddings=self.max_position_embeddings,
-+++++++#                     scaling_factor=scaling_factor,
-+++++++#                     base=self.rope_theta,
-+++++++#                 )
-+++++++#             else:
-+++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++++++
-+++++++#     def forward(
-+++++++#         self,
-+++++++#         hidden_states: mindspore.Tensor,
-+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++++#         past_key_value: Optional[Cache] = None,
-+++++++#         output_attentions: bool = False,
-+++++++#         use_cache: bool = False,
-+++++++#         **kwargs,
-+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++#         if "padding_mask" in kwargs:
-+++++++#             warnings.warn(
-+++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++++++#             )
-+++++++        
-+++++++#         if output_attentions:
-+++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
-+++++++
-+++++++#         bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++#         if self.config.pretraining_tp > 1:
-+++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++++++
-+++++++#         query_states = self.q_proj(hidden_states)
-+++++++#         key_states = self.k_proj(hidden_states)
-+++++++#         value_states = self.v_proj(hidden_states)
-+++++++
-+++++++#         # Reshape for multi-head attention
-+++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++#         kv_seq_len = key_states.shape[-2]
-+++++++#         if past_key_value is not None:
-+++++++#             if self.layer_idx is None:
-+++++++#                 raise ValueError(
-+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++#                     "with a layer index."
-+++++++#                 )
-+++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++        
-+++++++#         # Apply Rotary Positional Embedding
-+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++#         if past_key_value is not None:
-+++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
-+++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++
-+++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
-+++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
-+++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++        
-+++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++++        
-+++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
-+++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
-+++++++
-+++++++#         # Convert attention_mask for flash_attention_score
-+++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
-+++++++#         if attention_mask is not None:
-+++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
-+++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++++++#                 raise ValueError(
-+++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++++++#                 )
-+++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
-+++++++#         else:
-+++++++#             attn_mask_for_fa = None
-+++++++        
-+++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++++++
-+++++++#         # Call the fused flash_attention_score operator
-+++++++#         attn_output = mindspore.ops.flash_attention_score(
-+++++++#             query=query_states_for_fa,
-+++++++#             key=key_states_for_fa,
-+++++++#             value=value_states_for_fa,
-+++++++#             head_num=self.num_heads, # This is N1, the number of query heads
-+++++++#             input_layout='BSH',
-+++++++#             attn_mask=attn_mask_for_fa,
-+++++++#             keep_prob=keep_prob,
-+++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++#             sparse_mode=0 # Default mask mode
-+++++++#         )
-+++++++        
-+++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
-+++++++#         attn_output = self.o_proj(attn_output)
-+++++++        
-+++++++#         # Flash Attention does not return attention weights
-+++++++#         attn_weights = None
-+++++++
-+++++++#         return attn_output, attn_weights, past_key_value
-+++++++
-+++++++class DeepseekFlashAttention(nn.Module):
-+++++++    """
-+++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
-+++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
-+++++++    designed for high performance on supported hardware (Ascend).
-+++++++
-+++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
-+++++++    """
-+++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
-+++++++        super().__init__()
-+++++++        self.config = config
-+++++++        self.layer_idx = layer_idx
-+++++++        if layer_idx is None:
-+++++++            logger.warning(
-+++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++++                "when creating this class."
-+++++++            )
-+++++++
-+++++++        # --- [FIX] Correctly initialize all required attributes ---
-+++++++        self.attention_dropout = config.attention_dropout
-+++++++        self.hidden_size = config.hidden_size
-+++++++        self.num_heads = config.num_attention_heads
-+++++++        self.head_dim = self.hidden_size // self.num_heads
-+++++++        self.num_key_value_heads = config.num_key_value_heads
-+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++        self.max_position_embeddings = config.max_position_embeddings
-+++++++        self.rope_theta = config.rope_theta
-+++++++        self.is_causal = True
-+++++++
-+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++            raise ValueError(
-+++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++++                f" and `num_heads`: {self.num_heads})."
-+++++++            )
-+++++++
-+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
-+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
-+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
-+++++++        
-+++++++        # This call will now succeed as all attributes are initialized.
-+++++++        self._init_rope()
-+++++++
-+++++++    def _init_rope(self):
-+++++++        if self.config.rope_scaling is None:
-+++++++            self.rotary_emb = DeepseekRotaryEmbedding(
-+++++++                self.head_dim,
-+++++++                max_position_embeddings=self.max_position_embeddings,
-+++++++                base=self.rope_theta,
-+++++++            )
-+++++++        else:
-+++++++            scaling_type = self.config.rope_scaling["type"]
-+++++++            scaling_factor = self.config.rope_scaling["factor"]
-+++++++            if scaling_type == "linear":
-+++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
-+++++++                    self.head_dim,
-+++++++                    max_position_embeddings=self.max_position_embeddings,
-+++++++                    scaling_factor=scaling_factor,
-+++++++                    base=self.rope_theta,
-+++++++                )
-+++++++            elif scaling_type == "dynamic":
-+++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
-+++++++                    self.head_dim,
-+++++++                    max_position_embeddings=self.max_position_embeddings,
-+++++++                    scaling_factor=scaling_factor,
-+++++++                    base=self.rope_theta,
-+++++++                )
-+++++++            else:
-+++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
-+++++++
-+++++++    def forward(
-+++++++        self,
-+++++++        hidden_states: mindspore.Tensor,
-+++++++        attention_mask: Optional[mindspore.Tensor] = None,
-+++++++        position_ids: Optional[mindspore.Tensor] = None,
-+++++++        past_key_value: Optional[Cache] = None,
-+++++++        output_attentions: bool = False,
-+++++++        use_cache: bool = False,
-+++++++        **kwargs,
-+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++        if "padding_mask" in kwargs:
-+++++++            warnings.warn(
-+++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
-+++++++            )
-+++++++        if output_attentions:
-+++++++            warnings.warn(
-+++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
-+++++++            )
-+++++++
-+++++++        bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++        if self.config.pretraining_tp > 1:
-+++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
-+++++++
-+++++++        query_states = self.q_proj(hidden_states)
-+++++++        key_states = self.k_proj(hidden_states)
-+++++++        value_states = self.v_proj(hidden_states)
-+++++++
-+++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++        kv_seq_len = key_states.shape[-2]
-+++++++        if past_key_value is not None:
-+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++        
-+++++++        # Apply Rotary Position Embedding
-+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++        if past_key_value is not None:
-+++++++            cache_kwargs = {"sin": sin, "cos": cos}
-+++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++
-+++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
-+++++++        # So we must explicitly repeat the KV heads.
-+++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++
-+++++++        # Convert attention mask for flash_attention_score
-+++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
-+++++++        if attention_mask is not None:
-+++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
-+++++++                 raise ValueError(
-+++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
-+++++++                )
-+++++++            attn_mask_for_fa = attention_mask < 0
-+++++++        else:
-+++++++            attn_mask_for_fa = None
-+++++++
-+++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
-+++++++
-+++++++        # Call the fused operator using the efficient BNSD layout
-+++++++        attn_output = mindspore.ops.flash_attention_score(
-+++++++            query=query_states,
-+++++++            key=key_states,
-+++++++            value=value_states,
-+++++++            head_num=self.num_heads,
-+++++++            input_layout='BNSD', # Specify the correct layout
-+++++++            attn_mask=attn_mask_for_fa,
-+++++++            keep_prob=keep_prob,
-+++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
-+++++++        )
-+++++++        
-+++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
-+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++        
-+++++++        # Apply output projection
-+++++++        attn_output = self.o_proj(attn_output)
-+++++++
-+++++++        # Flash attention does not return attention weights, so we return None.
-+++++++        attn_weights = None
-+++++++
-+++++++        return attn_output, attn_weights, past_key_value
-+++++++
-++++++ Deepseek_ATTENTION_CLASSES = {
-++++++     "eager": DeepseekAttention,
-+++++++    "flash-attention": DeepseekFlashAttention,
-++++++ }
-++++++ 
-++++++ 
-++++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
-++++++             config=config, layer_idx=layer_idx
-++++++         )
-++++++ 
-+++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-+++++++            config=config, layer_idx=layer_idx
-+++++++        )
-+++++++
-++++++         self.mlp = (
-++++++             DeepseekMoE(config)
-++++++             if (
-++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++index d4c6b651..bced285c 100644
-++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-++++++ 
-++++++ import mindspore
-++++++ import mindnlp.core.nn.functional as F
-++++++-from mindnlp.core import nn, ops
-+++++++from mindnlp.core import nn, ops, no_grad
-++++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-++++++ 
-++++++ from ....common.activations import ACT2FN
-++++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++++++ 
-+++++++Long_Prompt = False
-+++++++PROMPT_LENGTH_THRESHOLD = 128
-++++++ 
-++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++ 
-+++++++# class Qwen2MoeFlashAttention(nn.Module):
-+++++++#     """
-+++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-+++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-+++++++
-+++++++#     关键改动:
-+++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-+++++++#        直接传入原始的 key 和 value 张量效率更高。
-+++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-+++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++++#         super().__init__()
-+++++++#         self.config = config
-+++++++#         self.layer_idx = layer_idx
-+++++++#         self.hidden_size = config.hidden_size
-+++++++#         self.num_heads = config.num_attention_heads
-+++++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++++#         self.rope_theta = config.rope_theta
-+++++++#         self.attention_dropout = config.attention_dropout
-+++++++
-+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++#             raise ValueError(
-+++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-+++++++#             )
-+++++++
-+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++++
-+++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++++#             self.head_dim,
-+++++++#             max_position_embeddings=self.max_position_embeddings,
-+++++++#             base=self.rope_theta,
-+++++++#         )
-+++++++
-+++++++#     def forward(
-+++++++#         self,
-+++++++#         hidden_states: mindspore.Tensor,
-+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++++#         past_key_value: Optional[Cache] = None,
-+++++++#         output_attentions: bool = False,
-+++++++#         use_cache: bool = False,
-+++++++#         cache_position: Optional[mindspore.Tensor] = None,
-+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++#         bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++#         # 1. 线性投射 Q, K, V
-+++++++#         query_states = self.q_proj(hidden_states)
-+++++++#         key_states = self.k_proj(hidden_states)
-+++++++#         value_states = self.v_proj(hidden_states)
-+++++++
-+++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
-+++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++#         # 3. RoPE 旋转位置编码
-+++++++#         kv_seq_len = key_states.shape[-2]
-+++++++#         if past_key_value is not None:
-+++++++#             if self.layer_idx is None:
-+++++++#                 raise ValueError(
-+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++#                     "with a layer index."
-+++++++#                 )
-+++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
-+++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-+++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
-+++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-+++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-+++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-+++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-+++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
-+++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-+++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-+++++++#                 if cache_position.shape[0] == 1:
-+++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-+++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-+++++++#                     kv_seq_len = past_seen_tokens + 1
-+++++++#                 else:
-+++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
-+++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
-+++++++#             else:
-+++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++#         # 4. KV 缓存更新
-+++++++#         if past_key_value is not None:
-+++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++#             key_states, value_states = past_key_value.update(
-+++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
-+++++++#             )
-+++++++            
-+++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-+++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-+++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
-+++++++#                 if cache_position.shape[0] == 1:
-+++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-+++++++#                     kv_seq_len = key_states.shape[-2]
-+++++++
-+++++++#         # 5. [重要] 准备 Attention Mask
-+++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-+++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++++#         fa_attention_mask = None
-+++++++#         if attention_mask is not None:
-+++++++#             # 截取与当前key长度匹配的部分
-+++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-+++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-+++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
-+++++++#             fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-+++++++#         input_dtype = query_states.dtype
-+++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-+++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-+++++++#             query_states = query_states.to(mindspore.float16)
-+++++++#             key_states = key_states.to(mindspore.float16)
-+++++++#             value_states = value_states.to(mindspore.float16)
-+++++++
-+++++++#         # 6. [核心] 调用 flash_attention_score 算子
-+++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
-+++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++++#         attn_output = mindspore.ops.flash_attention_score(
-+++++++#             query=query_states,
-+++++++#             key=key_states,
-+++++++#             value=value_states,
-+++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
-+++++++#             attn_mask=fa_attention_mask,
-+++++++#             keep_prob=1.0 - self.attention_dropout,
-+++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++#             input_layout="BNSD",
-+++++++#             sparse_mode=0 # 使用 defaultMask 模式
-+++++++#         )
-+++++++
-+++++++#         # 恢复原始数据类型
-+++++++#         attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++#         # 7. 调整输出形状
-+++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++#         attn_output = self.o_proj(attn_output)
-+++++++
-+++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
-+++++++#         attn_weights = None
-+++++++#         if output_attentions:
-+++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++#         return attn_output, attn_weights, past_key_value
-+++++++
-+++++++#     # def forward(
-+++++++#     #     self,
-+++++++#     #     hidden_states: mindspore.Tensor,
-+++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
-+++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
-+++++++#     #     past_key_value: Optional[Cache] = None,
-+++++++#     #     output_attentions: bool = False,
-+++++++#     #     use_cache: bool = False,
-+++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
-+++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++#     #     bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++#     #     # 1. 线性投射 Q, K, V
-+++++++#     #     query_states = self.q_proj(hidden_states)
-+++++++#     #     key_states = self.k_proj(hidden_states)
-+++++++#     #     value_states = self.v_proj(hidden_states)
-+++++++
-+++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-+++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++
-+++++++#     #     # 3. RoPE 旋转位置编码
-+++++++#     #     kv_seq_len = key_states.shape[-2]
-+++++++#     #     if past_key_value is not None:
-+++++++#     #         if self.layer_idx is None:
-+++++++#     #             raise ValueError(
-+++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++#     #                 "with a layer index."
-+++++++#     #             )
-+++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++
-+++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++#     #     # 4. KV 缓存更新
-+++++++#     #     if past_key_value is not None:
-+++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-+++++++#     #         key_states, value_states = past_key_value.update(
-+++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
-+++++++#     #         )
-+++++++
-+++++++#     #     # 5. 准备 Attention Mask
-+++++++#     #     fa_attention_mask = None
-+++++++#     #     if attention_mask is not None:
-+++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++#     #         fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-+++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-+++++++#     #     input_dtype = query_states.dtype
-+++++++
-+++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
-+++++++#     #     attn_output = mindspore.ops.flash_attention_score(
-+++++++#     #         query=query_states,
-+++++++#     #         key=key_states,
-+++++++#     #         value=value_states,
-+++++++#     #         head_num=self.num_heads,
-+++++++#     #         attn_mask=fa_attention_mask,
-+++++++#     #         keep_prob=1.0 - self.attention_dropout,
-+++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++#     #         input_layout="BNSD",
-+++++++#     #         sparse_mode=0,
-+++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
-+++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-+++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-+++++++#     #         inner_precise=1
-+++++++#     #     )
-+++++++
-+++++++#     #     # 恢复原始数据类型
-+++++++#     #     attn_output = attn_output.to(input_dtype)
-+++++++
-+++++++#     #     # 7. 调整输出形状
-+++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++#     #     attn_output = self.o_proj(attn_output)
-+++++++
-+++++++#     #     attn_weights = None
-+++++++#     #     if output_attentions:
-+++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++
-+++++++#     #     return attn_output, attn_weights, past_key_value
-+++++++
-+++++++
-++++++ class Qwen2MoeFlashAttention(nn.Module):
-++++++     """
-++++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++-
-++++++-    关键改动:
-++++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++-       直接传入原始的 key 和 value 张量效率更高。
-++++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-+++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
-+++++++
-+++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
-+++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
-+++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
-+++++++    以达到理论上的最高执行速度。
-++++++     """
-++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++         super().__init__()
-++++++         self.config = config
-++++++         self.layer_idx = layer_idx
-+++++++        if layer_idx is None:
-+++++++            logger.warning_once(
-+++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
-+++++++            )
-+++++++
-++++++         self.hidden_size = config.hidden_size
-++++++         self.num_heads = config.num_attention_heads
-++++++         self.head_dim = self.hidden_size // self.num_heads
-++++++         self.num_key_value_heads = config.num_key_value_heads
-++++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++         self.max_position_embeddings = config.max_position_embeddings
-++++++         self.rope_theta = config.rope_theta
-++++++         self.attention_dropout = config.attention_dropout
-++++++ 
-++++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++-            raise ValueError(
-++++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++-            )
-++++++-
-++++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
-++++++         key_states = self.k_proj(hidden_states)
-++++++         value_states = self.v_proj(hidden_states)
-++++++ 
-++++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-+++++++        # 2. 调整形状以匹配 BNSD 布局
-++++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-
-++++++-        # 3. RoPE 旋转位置编码
-+++++++        
-+++++++        # 3. RoPE 和 KV 缓存
-++++++         kv_seq_len = key_states.shape[-2]
-++++++         if past_key_value is not None:
-++++++-            if self.layer_idx is None:
-++++++-                raise ValueError(
-++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-                    "with a layer index."
-++++++-                )
-++++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++-                if cache_position.shape[0] == 1:
-++++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++-                    kv_seq_len = past_seen_tokens + 1
-++++++-                else:
-++++++-                    # prefill 阶段：cache_position 是范围，使用其长度
-++++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++-            else:
-++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-
-+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++        
-++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++ 
-++++++-        # 4. KV 缓存更新
-++++++         if past_key_value is not None:
-++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-            key_states, value_states = past_key_value.update(
-++++++-                key_states, value_states, self.layer_idx, cache_kwargs
-++++++-            )
-++++++-            
-++++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++-                if cache_position.shape[0] == 1:
-++++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++-                    kv_seq_len = key_states.shape[-2]
-++++++-
-++++++-        # 5. [重要] 准备 Attention Mask
-++++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-+++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++
-+++++++        # 4. 准备 Attention Mask
-++++++         fa_attention_mask = None
-++++++         if attention_mask is not None:
-++++++-            # 截取与当前key长度匹配的部分
-++++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++             fa_attention_mask = (mask_slice != 0)
-++++++ 
-++++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++-        input_dtype = query_states.dtype
-++++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++-            query_states = query_states.to(mindspore.float16)
-++++++-            key_states = key_states.to(mindspore.float16)
-++++++-            value_states = value_states.to(mindspore.float16)
-++++++-
-++++++-        # 6. [核心] 调用 flash_attention_score 算子
-++++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-+++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-++++++         attn_output = mindspore.ops.flash_attention_score(
-++++++             query=query_states,
-++++++             key=key_states,
-++++++             value=value_states,
-++++++-            head_num=self.num_heads, # 传入Q的头数(N1)
-+++++++            head_num=self.num_heads,
-++++++             attn_mask=fa_attention_mask,
-++++++-            keep_prob=1.0 - self.attention_dropout,
-+++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
-++++++             scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++             input_layout="BNSD",
-++++++-            sparse_mode=0 # 使用 defaultMask 模式
-+++++++            sparse_mode=0,
-+++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-++++++         )
-++++++ 
-++++++-        # 恢复原始数据类型
-++++++-        attn_output = attn_output.to(input_dtype)
-++++++-
-++++++-        # 7. 调整输出形状
-++++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-+++++++        # 6. 调整输出形状
-++++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++         attn_output = self.o_proj(attn_output)
-++++++ 
-++++++-        # FlashAttention 算子不直接返回注意力权重矩阵
-+++++++        # 7. 返回结果
-++++++         attn_weights = None
-++++++         if output_attentions:
-++++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++++++ 
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++-    # def forward(
-++++++-    #     self,
-++++++-    #     hidden_states: mindspore.Tensor,
-++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++-    #     past_key_value: Optional[Cache] = None,
-++++++-    #     output_attentions: bool = False,
-++++++-    #     use_cache: bool = False,
-++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-
-++++++-    #     bsz, q_len, _ = hidden_states.shape
-++++++-
-++++++-    #     # 1. 线性投射 Q, K, V
-++++++-    #     query_states = self.q_proj(hidden_states)
-++++++-    #     key_states = self.k_proj(hidden_states)
-++++++-    #     value_states = self.v_proj(hidden_states)
-++++++-
-++++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-
-++++++-    #     # 3. RoPE 旋转位置编码
-++++++-    #     kv_seq_len = key_states.shape[-2]
-++++++-    #     if past_key_value is not None:
-++++++-    #         if self.layer_idx is None:
-++++++-    #             raise ValueError(
-++++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-    #                 "with a layer index."
-++++++-    #             )
-++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++ 
-++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++-
-++++++-    #     # 4. KV 缓存更新
-++++++-    #     if past_key_value is not None:
-++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-    #         key_states, value_states = past_key_value.update(
-++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++-    #         )
-++++++-
-++++++-    #     # 5. 准备 Attention Mask
-++++++-    #     fa_attention_mask = None
-++++++-    #     if attention_mask is not None:
-++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-    #         fa_attention_mask = (mask_slice != 0)
-++++++-
-++++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++-    #     input_dtype = query_states.dtype
-++++++-
-++++++-    #     # 6. [核心] 调用 flash_attention_score 算子
-++++++-    #     attn_output = mindspore.ops.flash_attention_score(
-++++++-    #         query=query_states,
-++++++-    #         key=key_states,
-++++++-    #         value=value_states,
-++++++-    #         head_num=self.num_heads,
-++++++-    #         attn_mask=fa_attention_mask,
-++++++-    #         keep_prob=1.0 - self.attention_dropout,
-++++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++-    #         input_layout="BNSD",
-++++++-    #         sparse_mode=0,
-++++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++-    #         inner_precise=1
-++++++-    #     )
-++++++-
-++++++-    #     # 恢复原始数据类型
-++++++-    #     attn_output = attn_output.to(input_dtype)
-+++++++QWEN2MOE_ATTENTION_CLASSES = {
-+++++++    "eager": Qwen2MoeAttention,
-+++++++    "flash-attention": Qwen2MoeFlashAttention,
-+++++++}
-++++++ 
-++++++-    #     # 7. 调整输出形状
-++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++-    #     attn_output = self.o_proj(attn_output)
-++++++ 
-++++++-    #     attn_weights = None
-++++++-    #     if output_attentions:
-++++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     def __init__(self, config):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         # gating
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     #@dwj
-+++++++#     # 只遍历激活的专家，而非全部专家
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#             num_tokens = hidden_states_reshaped.shape[0]
-+++++++
-+++++++#             router_logits = self.gate(hidden_states_reshaped)
-+++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++
-+++++++#             if self.norm_topk_prob:
-+++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++            
-+++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-+++++++#             flat_selected_experts = selected_experts.flatten()
-+++++++            
-+++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-+++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-+++++++#             token_indices = broadcasted_token_indices.flatten()
-+++++++            
-+++++++#             active_experts = ops.unique(flat_selected_experts)
-+++++++            
-+++++++#             for expert_idx_tensor in active_experts:
-+++++++#                 expert_idx = expert_idx_tensor.item()
-+++++++#                 expert_layer = self.experts[expert_idx]
-+++++++                
-+++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#                 selected_token_indices = token_indices[mask]
-+++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
-+++++++                
-+++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
-+++++++                
-+++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++                
-+++++++#                 final_hidden_states = final_hidden_states.index_add(
-+++++++#                     dim=0,
-+++++++#                     index=selected_token_indices,
-+++++++#                     source=expert_output.to(hidden_states.dtype)
-+++++++#                 )
-+++++++            
-+++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++++ 
-++++++-    #     return attn_output, attn_weights, past_key_value
-+++++++#             final_hidden_states = final_hidden_states + shared_expert_output
-+++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++++            
-+++++++#             return final_hidden_states, router_logits
-+++++++
-+++++++
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     """
-+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-+++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-+++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         # 门控网络
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         # 专家列表
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++#         # 共享专家
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_decode(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         【解码路径】针对 sequence_length=1 的极致优化。
-+++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-+++++++#         """
-+++++++#         batch_size, hidden_dim = hidden_states.shape
-+++++++        
-+++++++#         expert_outputs_list = [
-+++++++#             ops.cat([
-+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++++#             ], dim=0) 
-+++++++#             for i in range(batch_size)
-+++++++#         ]
-+++++++        
-+++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-+++++++#         # shape: (batch_size, top_k, hidden_dim)
-+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++++        
-+++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-+++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++++        
-+++++++#         return moe_output.squeeze(1)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_prefill(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         【预填充路径】针对 sequence_length > 1 的优化。
-+++++++#         按专家对 Token 进行分组，并进行批处理。
-+++++++#         """
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens = hidden_states.shape[0]
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++        
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++        
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++        
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++            
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#             selected_token_indices = token_indices[mask]
-+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++++            
-+++++++#             current_states = hidden_states[selected_token_indices]
-+++++++            
-+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++            
-+++++++#             moe_output = moe_output.index_add(
-+++++++#                 dim=0,
-+++++++#                 index=selected_token_indices,
-+++++++#                 source=expert_output.to(hidden_states.dtype)
-+++++++#             )
-+++++++#         return moe_output
-+++++++
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         顶层 forward 方法，作为智能分发器。
-+++++++#         """
-+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++        
-+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++ 
-++++++-    # def forward(
-++++++-    #     self,
-++++++-    #     hidden_states: mindspore.Tensor,
-++++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++-    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++-    #     past_key_value: Optional[Cache] = None,
-++++++-    #     output_attentions: bool = False,
-++++++-    #     use_cache: bool = False,
-++++++-    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-
-++++++-    #     bsz, q_len, _ = hidden_states.shape
-++++++-
-++++++-    #     query_states = self.q_proj(hidden_states)
-++++++-    #     key_states = self.k_proj(hidden_states)
-++++++-    #     value_states = self.v_proj(hidden_states)
-++++++-
-++++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-
-++++++-    #     kv_seq_len = key_states.shape[-2]
-++++++-    #     if past_key_value is not None:
-++++++-    #         if self.layer_idx is None:
-++++++-    #             raise ValueError("`layer_idx` must be specified for caching")
-++++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-
-++++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++-
-++++++-    #     if past_key_value is not None:
-++++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-    #         key_states, value_states = past_key_value.update(
-++++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++-    #         )
-+++++++#         if self.norm_topk_prob:
-+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++        
-+++++++#         moe_output = None
-+++++++#         # 在推理时，根据序列长度选择最优路径
-+++++++#         if not self.training:
-+++++++#             if sequence_length == 1:
-+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++++#             else:
-+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++++#         else:
-+++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-+++++++#             raise NotImplementedError("Training path is not implemented.")
-+++++++
-+++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-+++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-+++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-+++++++        
-+++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-+++++++        
-+++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++#         return final_hidden_states, router_logits
-+++++++
-+++++++
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     """
-+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         # 门控网络
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         # 专家列表
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++#         # 共享专家
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_decode(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         batch_size, _ = hidden_states.shape
-+++++++#         expert_outputs_list = [
-+++++++#             ops.cat([
-+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++++#             ], dim=0) 
-+++++++#             for i in range(batch_size)
-+++++++#         ]
-+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++++#         return moe_output.squeeze(1)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_prefill(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens = hidden_states.shape[0]
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++        
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#             selected_token_indices = token_indices[mask]
-+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++++#             current_states = hidden_states[selected_token_indices]
-+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++#             moe_output = moe_output.index_add(
-+++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++++#             )
-+++++++#         return moe_output
-+++++++
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         顶层 forward 方法，作为智能分发器。
-+++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-+++++++#         """
-+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++        
-+++++++#         # 1. 门控计算 (通用逻辑)
-+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++
-+++++++#         if self.norm_topk_prob:
-+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++        
-+++++++#         # 2. 智能分发到最优 MoE 路径
-+++++++#         moe_output = None
-+++++++#         if not self.training:
-+++++++#             if sequence_length == 1:
-+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-+++++++#             else:
-+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-+++++++#         else:
-+++++++#             raise NotImplementedError("Training path is not implemented.")
-+++++++
-+++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-+++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++++        
-+++++++#         # 4. 合并 MoE 输出和共享专家输出
-+++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++++        
-+++++++#         # 5. 恢复原始形状并返回
-+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++#         return final_hidden_states, router_logits
-+++++++
-+++++++# prefill fastest
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     """
-+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-+++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         # 门控网络
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         # 专家列表
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++#         # 共享专家
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_dispatch(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-+++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-+++++++#         """
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens, _ = hidden_states.shape
-+++++++        
-+++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++#         flat_routing_weights = routing_weights.flatten()
-++++++ 
-++++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++-
-++++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++++-    #     query_states = query_states / math.sqrt(self.head_dim)
-++++++-    #     # <--- 修改结束 ---
-++++++-
-++++++-    #     fa_attention_mask = None
-++++++-    #     if attention_mask is not None:
-++++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-    #         fa_attention_mask = (mask_slice != 0)
-++++++-
-++++++-    #     input_dtype = query_states.dtype
-++++++-
-++++++-    #     attn_output = mindspore.ops.flash_attention_score(
-++++++-    #         query=query_states,    # 传入已经预先缩放过的 query
-++++++-    #         key=key_states,
-++++++-    #         value=value_states,
-++++++-    #         head_num=self.num_heads,
-++++++-    #         attn_mask=fa_attention_mask,
-++++++-    #         keep_prob=1.0 - self.attention_dropout,
-++++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++++-    #         input_layout="BNSD",
-++++++-    #         sparse_mode=0,
-++++++-    #         inner_precise=1        # 仍然保持内部高精度计算
-++++++-    #     )
-+++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++ 
-++++++-    #     attn_output = attn_output.to(input_dtype)
-++++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++-    #     attn_output = self.o_proj(attn_output)
-+++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++        
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++            
-+++++++#             # 找到所有分配给该专家的 token
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++            
-+++++++#             # 使用 mask 选取对应的 token 和权重
-+++++++#             current_token_indices = token_indices[mask]
-+++++++#             current_routing_weights = flat_routing_weights[mask]
-+++++++#             current_hidden_states = hidden_states[current_token_indices]
-+++++++            
-+++++++#             # 对这些 token 进行批处理
-+++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++++            
-+++++++#             # 使用 index_add 将结果精确地加回到对应位置
-+++++++#             moe_output = moe_output.index_add(
-+++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++++#             )
-+++++++#         return moe_output
-+++++++
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         顶层 forward 方法，作为智能分发器。
-+++++++#         """
-+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++        
-+++++++#         # 1. 门控计算
-+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++
-+++++++#         if self.norm_topk_prob:
-+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++        
-+++++++#         # 2. 调用统一的 MoE 计算内核
-+++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-+++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++++++ 
-++++++-    #     attn_weights = None
-++++++-    #     if output_attentions:
-++++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-+++++++#         # 3. 统一处理共享专家
-+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++++        
-+++++++#         # 4. 合并输出
-+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++++        
-+++++++#         # 5. 恢复原始形状并返回
-+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++#         return final_hidden_states, router_logits
-+++++++
-+++++++
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     """
-+++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-+++++++#     【最终高性能与高精度版】：
-+++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-+++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-+++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-+++++++#     3. 这样实现了速度和准确性的两全其美。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_decode(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         【解码路径】极致优化版：bmm + 高精度累加。
-+++++++#         """
-+++++++#         original_dtype = hidden_states.dtype
-+++++++#         batch_size, _ = hidden_states.shape
-+++++++        
-+++++++#         expert_outputs_list = [
-+++++++#             ops.cat([
-+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++++#             ], dim=0) 
-+++++++#             for i in range(batch_size)
-+++++++#         ]
-+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++++
-+++++++#         # 在 float32 下执行 bmm，得到高精度结果
-+++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-+++++++        
-+++++++#         # 将高精度结果转换回原始数据类型
-+++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-+++++++        
-+++++++#         return moe_output
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_prefill(
-+++++++#         self, 
-+++++++#         hidden_states: mindspore.Tensor, 
-+++++++#         selected_experts: mindspore.Tensor, 
-+++++++#         routing_weights: mindspore.Tensor
-+++++++#     ) -> mindspore.Tensor:
-+++++++#         """
-+++++++#         【预填充路径】与原始实现一致，结果精确。
-+++++++#         """
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens, _ = hidden_states.shape
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++        
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#             selected_token_indices = token_indices[mask]
-+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++++#             current_states = hidden_states[selected_token_indices]
-+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++#             moe_output = moe_output.index_add(
-+++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-+++++++#             )
-+++++++#         return moe_output
-+++++++
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++        
-+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++ 
-++++++-    #     return attn_output, attn_weights, past_key_value
-+++++++#         if self.norm_topk_prob:
-+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-+++++++#         # 如果模型主体是 float16，后续再转换
-+++++++        
-+++++++#         moe_output = None
-+++++++#         if not self.training:
-+++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-+++++++#             # _moe_infer_decode 内部会处理好类型转换
-+++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++#             if sequence_length == 1:
-+++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++++#             else:
-+++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-+++++++#         else:
-+++++++#             raise NotImplementedError("Training path is not implemented.")
-+++++++
-+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++++        
-+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++#         return final_hidden_states, router_logits
-+++++++    
-++++++ 
-++++++-QWEN2MOE_ATTENTION_CLASSES = {
-++++++-    "eager": Qwen2MoeAttention,
-++++++-    "flash-attention": Qwen2MoeFlashAttention,
-++++++-}
-+++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++#     """
-+++++++#     【融合版】一个混合专家模块，内置两种推理策略，
-+++++++#     由外部全局变量 `Long_Prompt` 控制：
-+++++++
-+++++++#     - if Long_Prompt is True:  【精度优先模式】
-+++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-+++++++#       适用于处理长序列，避免误差累积。
-+++++++
-+++++++#     - if Long_Prompt is False: 【速度优先模式】
-+++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-+++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
-+++++++#     """
-+++++++#     def __init__(self, config: Qwen2MoeConfig):
-+++++++#         super().__init__()
-+++++++#         self.num_experts = config.num_experts
-+++++++#         self.top_k = config.num_experts_per_tok
-+++++++#         self.norm_topk_prob = config.norm_topk_prob
-+++++++
-+++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-+++++++#         self.experts = nn.ModuleList(
-+++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-+++++++#         )
-+++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++
-+++++++#     # --- 速度优先模式的辅助函数 ---
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++#         original_dtype = hidden_states.dtype
-+++++++#         batch_size, _ = hidden_states.shape
-+++++++#         expert_outputs_list = [
-+++++++#             ops.cat([
-+++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++++#             ], dim=0) 
-+++++++#             for i in range(batch_size)
-+++++++#         ]
-+++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
-+++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++++
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens, _ = hidden_states.shape
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#             selected_token_indices = token_indices[mask]
-+++++++#             selected_routing_weights = routing_weights.flatten()[mask]
-+++++++#             current_states = hidden_states[selected_token_indices]
-+++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-+++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++++#         return moe_output
-+++++++
-+++++++#     # --- 精度优先模式的辅助函数 ---
-+++++++#     @no_grad()
-+++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++#         moe_output = ops.zeros_like(hidden_states)
-+++++++#         num_tokens, _ = hidden_states.shape
-+++++++#         flat_selected_experts = selected_experts.flatten()
-+++++++#         flat_routing_weights = routing_weights.flatten()
-+++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++#         active_experts = ops.unique(flat_selected_experts)
-+++++++#         for expert_idx_tensor in active_experts:
-+++++++#             expert_idx = expert_idx_tensor.item()
-+++++++#             expert_layer = self.experts[expert_idx]
-+++++++#             mask = (flat_selected_experts == expert_idx_tensor)
-+++++++#             current_token_indices = token_indices[mask]
-+++++++#             current_routing_weights = flat_routing_weights[mask]
-+++++++#             current_hidden_states = hidden_states[current_token_indices]
-+++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++++#         return moe_output
-+++++++
-+++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
-+++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-+++++++#         global Long_Prompt
-+++++++
-+++++++#         # 1. 门控计算 (所有模式通用)
-+++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++#         router_logits = self.gate(hidden_states_reshaped)
-+++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++++++#         if self.norm_topk_prob:
-+++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++#         moe_output = None
-+++++++#         if not self.training:
-+++++++#             # 根据 Long_Prompt 标志选择模式
-+++++++#             if Long_Prompt:
-+++++++#                 # --- 精度优先模式 ---
-+++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++#             else:
-+++++++#                 # --- 速度优先模式 ---
-+++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++#                 if sequence_length == 1:
-+++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++#                 else:
-+++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++#         else:
-+++++++#             raise NotImplementedError("Training path is not implemented.")
-+++++++
-+++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++++        
-+++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++#         return final_hidden_states, router_logits
-+++++++    
-+++++++class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++    """
-+++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-+++++++    控制的顶级推理策略：
-++++++ 
-+++++++    - if Long_Prompt is True:  【精度优先模式】
-+++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
-+++++++      适用于需要严格可复现性的长序列任务。
-++++++ 
-++++++-class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-    def __init__(self, config):
-+++++++    - if Long_Prompt is False: 【速度优先模式】
-+++++++      采用业界最强的性能组合：
-+++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
-+++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
-+++++++    """
-+++++++    def __init__(self, config: Qwen2MoeConfig):
-++++++         super().__init__()
-++++++         self.num_experts = config.num_experts
-++++++         self.top_k = config.num_experts_per_tok
-++++++         self.norm_topk_prob = config.norm_topk_prob
-++++++ 
-++++++-        # gating
-++++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++         self.experts = nn.ModuleList(
-++++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++         )
-++++++-
-++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++ 
-++++++-    #@dwj
-++++++-    # 只遍历激活的专家，而非全部专家
-++++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-            num_tokens = hidden_states_reshaped.shape[0]
-++++++-
-++++++-            router_logits = self.gate(hidden_states_reshaped)
-++++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-
-++++++-            if self.norm_topk_prob:
-++++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-            routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-            
-++++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++-            flat_selected_experts = selected_experts.flatten()
-++++++-            
-++++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++-            token_indices = broadcasted_token_indices.flatten()
-++++++-            
-++++++-            active_experts = ops.unique(flat_selected_experts)
-++++++-            
-++++++-            for expert_idx_tensor in active_experts:
-++++++-                expert_idx = expert_idx_tensor.item()
-++++++-                expert_layer = self.experts[expert_idx]
-++++++-                
-++++++-                mask = (flat_selected_experts == expert_idx_tensor)
-++++++-                selected_token_indices = token_indices[mask]
-++++++-                selected_routing_weights = routing_weights.flatten()[mask]
-++++++-                
-++++++-                current_states = hidden_states_reshaped[selected_token_indices]
-++++++-                
-++++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-                
-++++++-                final_hidden_states = final_hidden_states.index_add(
-++++++-                    dim=0,
-++++++-                    index=selected_token_indices,
-++++++-                    source=expert_output.to(hidden_states.dtype)
-++++++-                )
-++++++-            
-++++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-+++++++    @no_grad()
-+++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++        original_dtype = hidden_states.dtype
-+++++++        batch_size, _ = hidden_states.shape
-+++++++        expert_outputs_list = [
-+++++++            ops.cat([
-+++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-+++++++            ], dim=0)
-+++++++            for i in range(batch_size)
-+++++++        ]
-+++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-+++++++        weights_fp32 = routing_weights.to(mindspore.float32)
-+++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-+++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-+++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
-+++++++
-+++++++    @no_grad()
-+++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++        num_tokens, _ = hidden_states.shape
-+++++++        flat_selected_experts = selected_experts.flatten()
-+++++++        sorted_expert_indices = flat_selected_experts.argsort()
-+++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++++++        original_token_indices = sorted_expert_indices // self.top_k
-+++++++        moe_output = ops.zeros_like(hidden_states)
-+++++++        current_token_offset = 0
-+++++++        for i in range(self.num_experts):
-+++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
-+++++++            if expert_token_count == 0:
-+++++++                continue
-+++++++            end_offset = current_token_offset + expert_token_count
-+++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
-+++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++++            current_token_offset += expert_token_count
-+++++++        return moe_output
-+++++++
-+++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-+++++++    @no_grad()
-+++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++        moe_output = ops.zeros_like(hidden_states)
-+++++++        num_tokens, _ = hidden_states.shape
-+++++++        flat_selected_experts = selected_experts.flatten()
-+++++++        flat_routing_weights = routing_weights.flatten()
-+++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-+++++++        active_experts = ops.unique(flat_selected_experts)
-+++++++        for expert_idx_tensor in active_experts:
-+++++++            expert_idx = expert_idx_tensor.item()
-+++++++            expert_layer = self.experts[expert_idx]
-+++++++            mask = (flat_selected_experts == expert_idx_tensor)
-+++++++            current_token_indices = token_indices[mask]
-+++++++            current_routing_weights = flat_routing_weights[mask]
-+++++++            current_hidden_states = hidden_states[current_token_indices]
-+++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-+++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++++        return moe_output
-++++++ 
-++++++-            final_hidden_states = final_hidden_states + shared_expert_output
-++++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++-            
-++++++-            return final_hidden_states, router_logits
-+++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++        global Long_Prompt
-+++++++
-+++++++        # 1. 门控计算 (所有模式通用)
-+++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-+++++++        router_logits = self.gate(hidden_states_reshaped)
-+++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-+++++++        if self.norm_topk_prob:
-+++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++        
-+++++++        moe_output = None
-+++++++        if Long_Prompt:
-+++++++            # --- 精度优先模式 (ACCURACY MODE) ---
-+++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        else:
-+++++++            # --- 速度优先模式 (SPEED MODE) ---
-+++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++            if sequence_length == 1:
-+++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++            else:
-+++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        
-++++++ 
-+++++++        # 3. 共享专家计算与合并 (所有模式通用)
-+++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-+++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-+++++++        
-+++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-+++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-+++++++        
-+++++++        return final_hidden_states, router_logits
-++++++ 
-++++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++++++         super().__init__()
-++++++         self.hidden_size = config.hidden_size
-+++++++        
-+++++++        # if Long_Prompt:
-+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++++        # else:
-+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++ 
-++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++ 
-++++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++-
-++++++         if (layer_idx not in config.mlp_only_layers) and (
-++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++++         ):
-++++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++             self._warmed_up = True
-++++++             self.warmup_moe_model()
-++++++ 
-+++++++
-+++++++
-++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++++         output_router_logits = (
-++++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
-++++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++             router_logits=outputs.router_logits,
-++++++         )
-++++++ 
-+++++++    def generate(self, *args, **kwargs):
-+++++++        """
-+++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-+++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-+++++++        """
-+++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++++++
-+++++++        input_ids = kwargs.get("input_ids")
-+++++++        if input_ids is None and args:
-+++++++            input_ids = args[0]
-+++++++
-+++++++        if input_ids is not None:
-+++++++            prompt_length = input_ids.shape[1]
-+++++++            
-+++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-+++++++                Long_Prompt = True
-+++++++            else:
-+++++++                Long_Prompt = False
-+++++++
-+++++++        return super().generate(*args, **kwargs)
-+++++++    
-++++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
-++++++     def prepare_inputs_for_generation(
-++++++         self,
-++++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
-++++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
-++++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
-+++++++        
-++++++         if past_key_values is not None:
-++++++             if inputs_embeds is not None:  # Exception 1
-++++++                 if 0 not in input_ids.shape:
-++++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++             }
-++++++         )
-++++++         return model_inputs
-+++++++
-++++++ # @lwx
-++++++     # def _decode_one_tokens_logits(
-++++++     #     self,
-++++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
-++++++             attentions=outputs.attentions,
-++++++         )
-++++++ 
-+++++++
-++++++ __all__ = [
-++++++     "Qwen2MoeForCausalLM",
-++++++     "Qwen2MoeModel",
-++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++++new file mode 100644
-++++++index 00000000..6dfb5b93
-++++++--- /dev/null
-+++++++++ b/patches/0001-20251104commit.patch
-++++++@@ -0,0 +1,1272 @@
-+++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-+++++++From: Pinoeer-kingxi <13022943007@163.com>
-+++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
-+++++++Subject: [PATCH] 20251104commit
-+++++++
-+++++++---
-+++++++ mindnlp/transformers/cache_utils.py           |  28 +-
-+++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
-+++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-+++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
-+++++++
-+++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-+++++++index cadd2e04..02f8d4be 100644
-+++++++--- a/mindnlp/transformers/cache_utils.py
-++++++++++ b/mindnlp/transformers/cache_utils.py
-+++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
-+++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-+++++++             # k_out[:, :, cache_position] = key_states
-+++++++             # v_out[:, :, cache_position] = value_states
-+++++++-            if ON_ORANGE_PI:
-+++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-+++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-+++++++-            else:
-+++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-+++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-+++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-+++++++-
-++++++++            # if ON_ORANGE_PI:
-++++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++++            # else:
-++++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++++            # 确保 cache_position 是 1D tensor 并且类型正确
-++++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++++++            if cache_position.ndim > 1:
-++++++++                cache_position = cache_position.flatten()
-++++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++++++                cache_position = cache_position.int()
-++++++++            
-++++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++++++            k_out[:, :, cache_position] = key_states
-++++++++            v_out[:, :, cache_position] = value_states
-++++++++            
-+++++++         return k_out, v_out
-+++++++ 
-+++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-+++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++index c695b944..d8303e45 100644
-+++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-+++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
-+++++++ def rotate_half(x):
-+++++++     """Rotates half the hidden dims of the input."""
-+++++++-    x1 = x[..., : x.shape[-1] // 2]
-+++++++-    x2 = x[..., x.shape[-1] // 2 :]
-++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++++    # x1 = x[..., : x.shape[-1] // 2]
-++++++++    # x2 = x[..., x.shape[-1] // 2 :]
-++++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++++     return ops.cat((-x2, x1), dim=-1)
-+++++++ 
-+++++++ 
-+++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-+++++++         if self.training:
-+++++++             raise NotImplementedError("Training is not supported yet.")
-+++++++         else:
-+++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-+++++++-        if self.config.n_shared_experts is not None:
-+++++++-            y = y + self.shared_experts(identity)
-+++++++-        return y
-++++++++            # @lwx
-++++++++            if orig_shape[1] == 1:
-++++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++++++                y=y.view(*orig_shape)
-++++++++                if self.config.n_shared_experts is not None:
-++++++++                    y = y + self.shared_experts(identity)
-++++++++                return y
-++++++++            else:
-++++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++++++                if self.config.n_shared_experts is not None:
-++++++++                    y = y + self.shared_experts(identity)
-++++++++                return y
-++++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++++        # if self.config.n_shared_experts is not None:
-++++++++        #     y = y + self.shared_experts(identity)
-++++++++        # return y
-++++++++        
-++++++++    @no_grad()
-++++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++++
-++++++++        expert_cache = ops.zeros_like(x)
-++++++++        for i in range(self.num_experts_per_tok):
-++++++++            expert_id = flat_expert_indices[i].item()
-++++++++            weight = flat_expert_weights[i].item()
-++++++++            expert = self.experts[expert_id]
-++++++++            expert_out = expert(x)
-++++++++            expert_cache += expert_out * weight
-++++++++        return expert_cache
-+++++++ 
-+++++++     @no_grad()
-+++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++-        # expert_cache = torch.zeros_like(x)
-+++++++-        # idxs = flat_expert_indices.argsort()
-+++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++++-        # token_idxs = idxs // self.num_experts_per_tok
-+++++++-        # for i, end_idx in enumerate(tokens_per_expert):
-+++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++++-        #     if start_idx == end_idx:
-+++++++-        #         continue
-+++++++-        #     expert = self.experts[i]
-+++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++-        #     expert_tokens = x[exp_token_idx]
-+++++++-        #     expert_out = expert(expert_tokens)
-+++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++++-        # return expert_cache
-++++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++++         expert_cache = ops.zeros_like(x)
-+++++++         idxs = flat_expert_indices.argsort()
-+++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++         token_idxs = idxs // self.num_experts_per_tok
-++++++++
-+++++++         for i, end_idx in enumerate(tokens_per_expert):
-+++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++             if start_idx == end_idx:
-+++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-+++++++             expert_out = expert(expert_tokens)
-+++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++++
-+++++++         return expert_cache
-++++++++        
-++++++++    # @no_grad()
-++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++++    #     # expert_cache = torch.zeros_like(x)
-++++++++    #     # idxs = flat_expert_indices.argsort()
-++++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++++    #     # token_idxs = idxs // self.num_experts_per_tok
-++++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++++    #     #     if start_idx == end_idx:
-++++++++    #     #         continue
-++++++++    #     #     expert = self.experts[i]
-++++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++++    #     #     expert_tokens = x[exp_token_idx]
-++++++++    #     #     expert_out = expert(expert_tokens)
-++++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++++    #     # return expert_cache
-++++++++    #     expert_cache = ops.zeros_like(x)
-++++++++    #     idxs = flat_expert_indices.argsort()
-++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++++
-++++++++    #     for i, end_idx in enumerate(tokens_per_expert):
-++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++++    #         if start_idx == end_idx:
-++++++++    #             continue
-++++++++    #         expert = self.experts[i]
-++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++++    #         expert_tokens = x[exp_token_idx]
-++++++++    #         expert_out = expert(expert_tokens)
-++++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++++
-++++++++    #     return expert_cache
-++++++++    # @no_grad()
-++++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++++    #     expert_cache = ops.zeros_like(x)
-++++++++
-++++++++    #     # 排序保证顺序一致
-++++++++    #     idxs = flat_expert_indices.argsort()
-++++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++++    #     token_idxs = idxs // self.num_experts_per_tok
-++++++++
-++++++++    #     # 找出有 token 的专家
-++++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++++
-++++++++    #     for i in active_experts.tolist():
-++++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++++    #         end_idx = tokens_per_expert[i]
-++++++++    #         if start_idx == end_idx:  # 没有 token
-++++++++    #             continue
-++++++++
-++++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++++    #         expert_tokens = x[exp_token_idx]
-++++++++    #         expert_out = self.experts[i](expert_tokens)
-++++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++++
-++++++++    #         expert_cache = mindspore.mint.scatter_add(
-++++++++    #             expert_cache,
-++++++++    #             0,
-++++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++++    #             expert_out
-++++++++    #         )
-++++++++
-++++++++    #     return expert_cache
-++++++++
-++++++++
-+++++++ 
-+++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-+++++++ #     """
-+++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++++ 
-+++++++         # Initialize weights and apply final processing
-+++++++         self.post_init()
-++++++++        self.warm_up = False
-++++++++
-++++++++    def warmup_moe_model_deep(self):
-++++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++++++        test_texts = [
-++++++++            "warmup short",
-++++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++++++        ]
-++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++++        if tokenizer is None:
-++++++++            from mindnlp.transformers import AutoTokenizer
-++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++++            self._warmup_tokenizer = tokenizer
-++++++++
-++++++++        for text in test_texts:
-++++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++++            with mindspore._no_grad():
-++++++++                _ = self(**inputs, use_cache=False)
-++++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-+++++++ 
-+++++++     def get_input_embeddings(self):
-+++++++         return self.model.embed_tokens
-+++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++++         ```"""
-++++++++        if not self.warm_up:
-++++++++            self.warm_up = True
-++++++++            self.warmup_moe_model_deep()
-++++++++
-+++++++         output_attentions = (
-+++++++             output_attentions
-+++++++             if output_attentions is not None
-+++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++index 3cbf820e..d4c6b651 100644
-+++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++@@ -18,7 +18,6 @@
-+++++++ # See the License for the specific language governing permissions and
-+++++++ # limitations under the License.
-+++++++ """MindSpore Qwen2MoE model."""
-+++++++-
-+++++++ import math
-+++++++ from typing import List, Optional, Tuple, Union
-+++++++ 
-+++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-+++++++     TokenClassifierOutput,
-+++++++ )
-+++++++ from ...modeling_utils import PreTrainedModel
-++++++++from ...generation import GenerationMixin
-+++++++ from ....utils import logging
-+++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
-+++++++ 
-+++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-+++++++         self.variance_epsilon = eps
-+++++++ 
-+++++++     def forward(self, hidden_states):
-++++++++        # @dwj
-++++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++++        # @lwx
-++++++++        # if not self.training :
-++++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-+++++++         input_dtype = hidden_states.dtype
-+++++++         hidden_states = hidden_states.to(mindspore.float32)
-+++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-+++++++@@ -234,6 +239,8 @@ def rotate_half(x):
-+++++++     """Rotates half the hidden dims of the input."""
-+++++++     x1 = x[..., : x.shape[-1] // 2]
-+++++++     x2 = x[..., x.shape[-1] // 2 :]
-++++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-+++++++     return ops.cat((-x2, x1), dim=-1)
-+++++++ 
-+++++++ 
-+++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-+++++++         self.config = config
-+++++++         self.hidden_size = config.hidden_size
-+++++++         self.intermediate_size = intermediate_size
-++++++++        
-+++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-+++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-+++++++         self.act_fn = ACT2FN[config.hidden_act]
-+++++++ 
-+++++++     def forward(self, x):
-+++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-+++++++-
-+++++++ 
-++++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++++        # @lwx 
-++++++++        # gate_up_output = self.gate_up_proj(x)
-++++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++++++        # return self.down_proj(swiglu_output)
-++++++++
-++++++++    # def forward(self, x):
-++++++++    #     gate_proj_out = self.gate_proj(x)
-++++++++    #     up_proj_out = self.up_proj(x)
-++++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++++++    #     return self.down_proj(swiglu_out)
-++++++++        
-+++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
-+++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-+++++++     """
-+++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-+++++++         use_cache: bool = False,
-+++++++         cache_position: Optional[mindspore.Tensor] = None,
-+++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++++
-++++++++        
-++++++++
-+++++++         bsz, q_len, _ = hidden_states.shape
-+++++++ 
-+++++++         query_states = self.q_proj(hidden_states)
-+++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-+++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++                     "with a layer index."
-+++++++                 )
-+++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++++            if isinstance(past_key_value, StaticCache):
-++++++++                kv_seq_len = key_states.shape[-2]
-++++++++            else:
-++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++ 
-+++++++         if past_key_value is not None:
-+++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++++            
-++++++++            if isinstance(past_key_value, StaticCache):
-++++++++                kv_seq_len = key_states.shape[-2]
-+++++++ 
-+++++++         # repeat k/v heads if n_kv_heads < n_heads
-+++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++-
-++++++++        
-+++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++++ 
-+++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-+++++++-            raise ValueError(
-+++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-+++++++-                f" {attn_weights.shape}"
-+++++++-            )
-+++++++-
-+++++++-        if attention_mask is not None:  # no matter the length, we just slice it
-+++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++++++        if attention_mask is not None:
-++++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++++             attn_weights = attn_weights + causal_mask
-+++++++ 
-+++++++         # upcast attention to fp32
-+++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-+++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++++ 
-+++++++         attn_output = self.o_proj(attn_output)
-+++++++-
-++++++++        # @lwx
-++++++++        
-++++++++        # max_seq_len = self.max_position_embeddings  # 2048
-++++++++
-++++++++        # if attention_mask is not None:
-++++++++        #     # attention_mask: [B, 1, Sq, Sk]
-++++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++++
-++++++++        #     # pad 到 [max_seq_len, max_seq_len]
-++++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++++        #     global_attention_mask = padded_mask
-++++++++        # else:
-++++++++        #     global_attention_mask = None
-++++++++
-++++++++
-++++++++        # sparse_mode=3
-++++++++        # attn_output = mindspore.ops.flash_attention_score(
-++++++++        #     query=query_states,
-++++++++        #     key=key_states,
-++++++++        #     value=value_states,
-++++++++        #     real_shift=None,
-++++++++        #     padding_mask=None,
-++++++++
-++++++++        #     head_num=self.num_heads,
-++++++++        #     attn_mask=global_attention_mask,
-++++++++        #     keep_prob=1.0 - self.attention_dropout,
-++++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++++        #     input_layout="BNSD",
-++++++++        #     pre_tokens=2147483647,
-++++++++        #     next_tokens=2147483647,
-++++++++        #     inner_precise=0,
-++++++++        #     drop_mask=None,
-++++++++        #     prefix=None,
-++++++++        #     actual_seq_qlen=None,
-++++++++        #     actual_seq_kvlen=None,
-++++++++        #     sparse_mode=sparse_mode,
-++++++++        # )
-+++++++         if not output_attentions:
-+++++++             attn_weights = None
-+++++++ 
-+++++++         return attn_output, attn_weights, past_key_value
-+++++++ 
-+++++++ 
-++++++++class Qwen2MoeFlashAttention(nn.Module):
-++++++++    """
-++++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++++
-++++++++    关键改动:
-++++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++++       直接传入原始的 key 和 value 张量效率更高。
-++++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++++    """
-++++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++++        super().__init__()
-++++++++        self.config = config
-++++++++        self.layer_idx = layer_idx
-++++++++        self.hidden_size = config.hidden_size
-++++++++        self.num_heads = config.num_attention_heads
-++++++++        self.head_dim = self.hidden_size // self.num_heads
-++++++++        self.num_key_value_heads = config.num_key_value_heads
-++++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++++        self.max_position_embeddings = config.max_position_embeddings
-++++++++        self.rope_theta = config.rope_theta
-++++++++        self.attention_dropout = config.attention_dropout
-++++++++
-++++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++++            raise ValueError(
-++++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++++            )
-++++++++
-++++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++++
-++++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++++            self.head_dim,
-++++++++            max_position_embeddings=self.max_position_embeddings,
-++++++++            base=self.rope_theta,
-++++++++        )
-++++++++
-++++++++    def forward(
-++++++++        self,
-++++++++        hidden_states: mindspore.Tensor,
-++++++++        attention_mask: Optional[mindspore.Tensor] = None,
-++++++++        position_ids: Optional[mindspore.Tensor] = None,
-++++++++        past_key_value: Optional[Cache] = None,
-++++++++        output_attentions: bool = False,
-++++++++        use_cache: bool = False,
-++++++++        cache_position: Optional[mindspore.Tensor] = None,
-++++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++++
-++++++++        bsz, q_len, _ = hidden_states.shape
-++++++++
-++++++++        # 1. 线性投射 Q, K, V
-++++++++        query_states = self.q_proj(hidden_states)
-++++++++        key_states = self.k_proj(hidden_states)
-++++++++        value_states = self.v_proj(hidden_states)
-++++++++
-++++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++
-++++++++        # 3. RoPE 旋转位置编码
-++++++++        kv_seq_len = key_states.shape[-2]
-++++++++        if past_key_value is not None:
-++++++++            if self.layer_idx is None:
-++++++++                raise ValueError(
-++++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++++                    "with a layer index."
-++++++++                )
-++++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++++                if cache_position.shape[0] == 1:
-++++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++++                    kv_seq_len = past_seen_tokens + 1
-++++++++                else:
-++++++++                    # prefill 阶段：cache_position 是范围，使用其长度
-++++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++++            else:
-++++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++++
-++++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++++
-++++++++        # 4. KV 缓存更新
-++++++++        if past_key_value is not None:
-++++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++++            key_states, value_states = past_key_value.update(
-++++++++                key_states, value_states, self.layer_idx, cache_kwargs
-++++++++            )
-++++++++            
-++++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++++                if cache_position.shape[0] == 1:
-++++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++++                    kv_seq_len = key_states.shape[-2]
-++++++++
-++++++++        # 5. [重要] 准备 Attention Mask
-++++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++++        fa_attention_mask = None
-++++++++        if attention_mask is not None:
-++++++++            # 截取与当前key长度匹配的部分
-++++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++++            fa_attention_mask = (mask_slice != 0)
-++++++++
-++++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++++        input_dtype = query_states.dtype
-++++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++++            query_states = query_states.to(mindspore.float16)
-++++++++            key_states = key_states.to(mindspore.float16)
-++++++++            value_states = value_states.to(mindspore.float16)
-++++++++
-++++++++        # 6. [核心] 调用 flash_attention_score 算子
-++++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++++        attn_output = mindspore.ops.flash_attention_score(
-++++++++            query=query_states,
-++++++++            key=key_states,
-++++++++            value=value_states,
-++++++++            head_num=self.num_heads, # 传入Q的头数(N1)
-++++++++            attn_mask=fa_attention_mask,
-++++++++            keep_prob=1.0 - self.attention_dropout,
-++++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++++            input_layout="BNSD",
-++++++++            sparse_mode=0 # 使用 defaultMask 模式
-++++++++        )
-++++++++
-++++++++        # 恢复原始数据类型
-++++++++        attn_output = attn_output.to(input_dtype)
-++++++++
-++++++++        # 7. 调整输出形状
-++++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++++        attn_output = self.o_proj(attn_output)
-++++++++
-++++++++        # FlashAttention 算子不直接返回注意力权重矩阵
-++++++++        attn_weights = None
-++++++++        if output_attentions:
-++++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++++
-++++++++        return attn_output, attn_weights, past_key_value
-++++++++
-++++++++    # def forward(
-++++++++    #     self,
-++++++++    #     hidden_states: mindspore.Tensor,
-++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++++    #     past_key_value: Optional[Cache] = None,
-++++++++    #     output_attentions: bool = False,
-++++++++    #     use_cache: bool = False,
-++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++++
-++++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++++
-++++++++    #     # 1. 线性投射 Q, K, V
-++++++++    #     query_states = self.q_proj(hidden_states)
-++++++++    #     key_states = self.k_proj(hidden_states)
-++++++++    #     value_states = self.v_proj(hidden_states)
-++++++++
-++++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++
-++++++++    #     # 3. RoPE 旋转位置编码
-++++++++    #     kv_seq_len = key_states.shape[-2]
-++++++++    #     if past_key_value is not None:
-++++++++    #         if self.layer_idx is None:
-++++++++    #             raise ValueError(
-++++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++++    #                 "with a layer index."
-++++++++    #             )
-++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++++
-++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++++
-++++++++    #     # 4. KV 缓存更新
-++++++++    #     if past_key_value is not None:
-++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++++    #         key_states, value_states = past_key_value.update(
-++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++++    #         )
-++++++++
-++++++++    #     # 5. 准备 Attention Mask
-++++++++    #     fa_attention_mask = None
-++++++++    #     if attention_mask is not None:
-++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++++
-++++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++++    #     input_dtype = query_states.dtype
-++++++++
-++++++++    #     # 6. [核心] 调用 flash_attention_score 算子
-++++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++++    #         query=query_states,
-++++++++    #         key=key_states,
-++++++++    #         value=value_states,
-++++++++    #         head_num=self.num_heads,
-++++++++    #         attn_mask=fa_attention_mask,
-++++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++++    #         input_layout="BNSD",
-++++++++    #         sparse_mode=0,
-++++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++++    #         inner_precise=1
-++++++++    #     )
-++++++++
-++++++++    #     # 恢复原始数据类型
-++++++++    #     attn_output = attn_output.to(input_dtype)
-++++++++
-++++++++    #     # 7. 调整输出形状
-++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++++    #     attn_output = self.o_proj(attn_output)
-++++++++
-++++++++    #     attn_weights = None
-++++++++    #     if output_attentions:
-++++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++++
-++++++++    #     return attn_output, attn_weights, past_key_value
-++++++++
-++++++++    # def forward(
-++++++++    #     self,
-++++++++    #     hidden_states: mindspore.Tensor,
-++++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++++    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++++    #     past_key_value: Optional[Cache] = None,
-++++++++    #     output_attentions: bool = False,
-++++++++    #     use_cache: bool = False,
-++++++++    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++++
-++++++++    #     bsz, q_len, _ = hidden_states.shape
-++++++++
-++++++++    #     query_states = self.q_proj(hidden_states)
-++++++++    #     key_states = self.k_proj(hidden_states)
-++++++++    #     value_states = self.v_proj(hidden_states)
-++++++++
-++++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++++
-++++++++    #     kv_seq_len = key_states.shape[-2]
-++++++++    #     if past_key_value is not None:
-++++++++    #         if self.layer_idx is None:
-++++++++    #             raise ValueError("`layer_idx` must be specified for caching")
-++++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++++
-++++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++++
-++++++++    #     if past_key_value is not None:
-++++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++++    #         key_states, value_states = past_key_value.update(
-++++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++++    #         )
-++++++++
-++++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++++
-++++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++++++    #     query_states = query_states / math.sqrt(self.head_dim)
-++++++++    #     # <--- 修改结束 ---
-++++++++
-++++++++    #     fa_attention_mask = None
-++++++++    #     if attention_mask is not None:
-++++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++++    #         fa_attention_mask = (mask_slice != 0)
-++++++++
-++++++++    #     input_dtype = query_states.dtype
-++++++++
-++++++++    #     attn_output = mindspore.ops.flash_attention_score(
-++++++++    #         query=query_states,    # 传入已经预先缩放过的 query
-++++++++    #         key=key_states,
-++++++++    #         value=value_states,
-++++++++    #         head_num=self.num_heads,
-++++++++    #         attn_mask=fa_attention_mask,
-++++++++    #         keep_prob=1.0 - self.attention_dropout,
-++++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++++++    #         input_layout="BNSD",
-++++++++    #         sparse_mode=0,
-++++++++    #         inner_precise=1        # 仍然保持内部高精度计算
-++++++++    #     )
-++++++++
-++++++++    #     attn_output = attn_output.to(input_dtype)
-++++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++++    #     attn_output = self.o_proj(attn_output)
-++++++++
-++++++++    #     attn_weights = None
-++++++++    #     if output_attentions:
-++++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++++++
-++++++++    #     return attn_output, attn_weights, past_key_value
-++++++++
-+++++++ QWEN2MOE_ATTENTION_CLASSES = {
-+++++++     "eager": Qwen2MoeAttention,
-++++++++    "flash-attention": Qwen2MoeFlashAttention,
-+++++++ }
-+++++++ 
-+++++++ 
-+++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-+++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-+++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-+++++++ 
-++++++++    #@dwj
-++++++++    # 只遍历激活的专家，而非全部专家
-+++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-+++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
-+++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
-+++++++-        # router_logits: (batch * sequence_length, n_experts)
-+++++++-        router_logits = self.gate(hidden_states)
-+++++++-
-+++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-+++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-+++++++-        if self.norm_topk_prob:
-+++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-+++++++-        # we cast back to the input dtype
-+++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
-+++++++-
-+++++++-        final_hidden_states = ops.zeros(
-+++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-+++++++-        )
-+++++++-
-+++++++-        # One hot encode the selected experts to create an expert mask
-+++++++-        # this will be used to easily index which expert is going to be sollicitated
-+++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-+++++++-
-+++++++-        # Loop over all available experts in the model and perform the computation on each expert
-+++++++-        for expert_idx in range(self.num_experts):
-+++++++-            expert_layer = self.experts[expert_idx]
-+++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-+++++++-
-+++++++-            # Index the correct hidden states and compute the expert hidden state for
-+++++++-            # the current expert. We need to make sure to multiply the output hidden
-+++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-+++++++-            if 0 not in idx.shape:
-+++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-+++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-+++++++-
-+++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
-+++++++-                # the `top_x` tensor here.
-+++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-+++++++-
-+++++++-        shared_expert_output = self.shared_expert(hidden_states)
-+++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-+++++++-
-+++++++-        final_hidden_states = final_hidden_states + shared_expert_output
-++++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++++            num_tokens = hidden_states_reshaped.shape[0]
-++++++++
-++++++++            router_logits = self.gate(hidden_states_reshaped)
-++++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++++
-++++++++            if self.norm_topk_prob:
-++++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++++            routing_weights = routing_weights.to(hidden_states.dtype)
-++++++++            
-++++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++++            flat_selected_experts = selected_experts.flatten()
-++++++++            
-++++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++++            token_indices = broadcasted_token_indices.flatten()
-++++++++            
-++++++++            active_experts = ops.unique(flat_selected_experts)
-++++++++            
-++++++++            for expert_idx_tensor in active_experts:
-++++++++                expert_idx = expert_idx_tensor.item()
-++++++++                expert_layer = self.experts[expert_idx]
-++++++++                
-++++++++                mask = (flat_selected_experts == expert_idx_tensor)
-++++++++                selected_token_indices = token_indices[mask]
-++++++++                selected_routing_weights = routing_weights.flatten()[mask]
-++++++++                
-++++++++                current_states = hidden_states_reshaped[selected_token_indices]
-++++++++                
-++++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++++                
-++++++++                final_hidden_states = final_hidden_states.index_add(
-++++++++                    dim=0,
-++++++++                    index=selected_token_indices,
-++++++++                    source=expert_output.to(hidden_states.dtype)
-++++++++                )
-++++++++            
-++++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-+++++++ 
-+++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-+++++++-        return final_hidden_states, router_logits
-++++++++            final_hidden_states = final_hidden_states + shared_expert_output
-++++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++++            
-++++++++            return final_hidden_states, router_logits
-+++++++ 
-+++++++ 
-+++++++ class Qwen2MoeDecoderLayer(nn.Module):
-+++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-+++++++ 
-+++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-+++++++ 
-++++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++++
-+++++++         if (layer_idx not in config.mlp_only_layers) and (
-+++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-+++++++         ):
-+++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-+++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-+++++++     _skip_keys_device_placement = "past_key_values"
-+++++++     _supports_cache_class = True
-++++++++#lwx
-++++++++    # _supports_static_cache = True
-+++++++ 
-+++++++     def _init_weights(self, module):
-+++++++         std = self.config.initializer_range
-+++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-+++++++         return causal_mask
-+++++++ 
-+++++++ 
-+++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-+++++++     _tied_weights_keys = ["lm_head.weight"]
-+++++++ 
-+++++++     def __init__(self, config):
-+++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++++         self.num_experts_per_tok = config.num_experts_per_tok
-+++++++         # Initialize weights and apply final processing
-+++++++         self.post_init()
-++++++++        # @lwx
-++++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++++++        #     self.generation_config.cache_implementation = "static"
-++++++++        self._warmed_up = False
-++++++++
-++++++++    def warmup_moe_model(self):
-++++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++++++        test_texts = [
-++++++++            "warmup short",
-++++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++++++        ]
-++++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++++        if tokenizer is None:
-++++++++            from mindnlp.transformers import AutoTokenizer
-++++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++++            self._warmup_tokenizer = tokenizer
-++++++++
-++++++++        for text in test_texts:
-++++++++            inputs = tokenizer(text, return_tensors="ms")
-++++++++            with mindspore._no_grad():
-++++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
-+++++++ 
-+++++++     def get_input_embeddings(self):
-+++++++         return self.model.embed_tokens
-+++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-+++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-+++++++         ```"""
-++++++++        if not self._warmed_up:
-++++++++            self._warmed_up = True
-++++++++            self.warmup_moe_model()
-+++++++ 
-+++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-+++++++         output_router_logits = (
-+++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-+++++++             }
-+++++++         )
-+++++++         return model_inputs
-++++++++# @lwx
-++++++++    # def _decode_one_tokens_logits(
-++++++++    #     self,
-++++++++    #     cur_token: mindspore.Tensor,
-++++++++    #     input_pos: Optional[mindspore.Tensor],
-++++++++    #     cache_position: mindspore.Tensor,
-++++++++    #     past_key_values: StaticCache,
-++++++++    # ) -> mindspore.Tensor:
-++++++++    #     """
-++++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++++++        
-++++++++    #     Args:
-++++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++++++    #         input_pos: 输入位置信息，可选
-++++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++++++            
-++++++++    #     Returns:
-++++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++++++    #     """
-++++++++    #     # 调用JIT编译的版本
-++++++++    #     return self.get_decode_one_tokens_logits(
-++++++++    #         cur_token=cur_token,
-++++++++    #         input_pos=input_pos,
-++++++++    #         cache_position=cache_position,
-++++++++    #         past_key_values=past_key_values,
-++++++++    #     )
-++++++++    
-++++++++    # @mindspore.jit(jit_level='O1')
-++++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++++++    #     """
-++++++++    #     JIT编译的函数，用于高效的单token解码
-++++++++    #     使用JIT编译优化以支持静态shape和高效执行
-++++++++        
-++++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++++++    #     """
-++++++++    #     outputs = self.model.forward(
-++++++++    #         input_ids=cur_token,
-++++++++    #         position_ids=input_pos,
-++++++++    #         cache_position=cache_position,
-++++++++    #         past_key_values=past_key_values,
-++++++++    #         use_cache=True,
-++++++++    #         return_dict=False,
-++++++++    #     )
-++++++++        
-++++++++    #     hidden_states = outputs[0]
-++++++++    #     logits = self.lm_head.forward(hidden_states)
-++++++++    #     logits = logits.float()
-++++++++        
-++++++++    #     return logits[:, -1, :]
-++++++++
-++++++++    # def _sample(
-++++++++    #     self,
-++++++++    #     input_ids: mindspore.Tensor,
-++++++++    #     logits_processor,
-++++++++    #     stopping_criteria,
-++++++++    #     generation_config,
-++++++++    #     synced_devices: bool,
-++++++++    #     streamer=None,
-++++++++    #     logits_warper=None,
-++++++++    #     **model_kwargs,
-++++++++    # ):
-++++++++    #     """
-++++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++++++    #     """
-++++++++    #     from ...generation.logits_process import LogitsProcessorList
-++++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++++++    #     from mindnlp.core import nn, ops, no_grad
-++++++++    #     import numpy as np
-++++++++        
-++++++++    #     # 检查是否使用 StaticCache
-++++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++++++    #     # 否则，直接调用父类方法
-++++++++    #     past_key_values = model_kwargs.get("past_key_values")
-++++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++++++
-++++++++    #     if not isinstance(past_key_values, StaticCache):
-++++++++    #         # 不使用 StaticCache，直接调用父类方法
-++++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++++++    #         return super()._sample(
-++++++++    #             input_ids=input_ids,
-++++++++    #             logits_processor=logits_processor,
-++++++++    #             stopping_criteria=stopping_criteria,
-++++++++    #             generation_config=generation_config,
-++++++++    #             synced_devices=synced_devices,
-++++++++    #             streamer=streamer,
-++++++++    #             logits_warper=logits_warper,
-++++++++    #             **model_kwargs,
-++++++++    #         )
-++++++++        
-++++++++    #     # 使用 StaticCache，进入自定义循环
-++++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++++++    #     pad_token_id = generation_config._pad_token_tensor
-++++++++    #     output_attentions = generation_config.output_attentions
-++++++++    #     output_hidden_states = generation_config.output_hidden_states
-++++++++    #     output_scores = generation_config.output_scores
-++++++++    #     output_logits = generation_config.output_logits
-++++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++++++    #     max_length = generation_config.max_length
-++++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++++++    #     do_sample = generation_config.do_sample
-++++++++        
-++++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++++++    #         raise ValueError(
-++++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++++++    #             f"{logits_warper})."
-++++++++    #         )
-++++++++        
-++++++++    #     # init attention / hidden states / scores tuples
-++++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++++++        
-++++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++++++    #         encoder_hidden_states = (
-++++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++++++    #         )
-++++++++        
-++++++++    #     # keep track of which sequences are already finished
-++++++++    #     batch_size, cur_len = input_ids.shape
-++++++++    #     this_peer_finished = False
-++++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++++++        
-++++++++    #     time_record = []
-++++++++    #     from ....utils.testing_utils import parse_flag_from_env
-++++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++++++        
-++++++++    #     while self._has_unfinished_sequences(
-++++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++++++    #     ):
-++++++++    #         if _record_time:
-++++++++    #             import time as time_module
-++++++++    #             infer_start = time_module.time()
-++++++++            
-++++++++    #         # prepare model inputs
-++++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++++++            
-++++++++    #         # prepare variable output controls
-++++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++++++            
-++++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++++++    #         cur_cache_position = model_inputs.get("cache_position")
-++++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++++++    #         cur_input_ids = model_inputs.get("input_ids")
-++++++++            
-++++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++++++    #             cur_cache_position is not None and 
-++++++++    #             len(cur_cache_position.shape) > 0 and
-++++++++    #             cur_cache_position.shape[0] == 1 and
-++++++++    #             cur_input_ids is not None and
-++++++++    #             cur_input_ids.shape[1] == 1):
-++++++++    #             # 使用 JIT 优化的单 token 解码
-++++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++++++    #             if not hasattr(self, '_jit_used'):
-++++++++    #                 self._jit_used = False
-++++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++++++                
-++++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++++++    #                 cur_token=cur_input_ids,
-++++++++    #                 input_pos=model_inputs.get("position_ids"),
-++++++++    #                 cache_position=cur_cache_position,
-++++++++    #                 past_key_values=cur_past_key_values,
-++++++++    #             )
-++++++++                
-++++++++    #             # 标记已使用JIT（用于后续判断）
-++++++++    #             if not self._jit_used:
-++++++++    #                 self._jit_used = True
-++++++++                
-++++++++    #             # 构造兼容的输出对象
-++++++++    #             class JitOptimizedOutput:
-++++++++    #                 def __init__(self, logits, config):
-++++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++++++    #                     self.config = config
-++++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++++++    #                     self.cross_attentions = None
-++++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++++++                
-++++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++++++    #         else:
-++++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++++++    #             outputs = self(**model_inputs, return_dict=True)
-++++++++            
-++++++++    #         if synced_devices and this_peer_finished:
-++++++++    #             continue
-++++++++            
-++++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++++++    #         next_token_logits = outputs.logits[:, -1, :]
-++++++++            
-++++++++    #         # pre-process distribution
-++++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++++++    #         if do_sample:
-++++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++++++            
-++++++++    #         # Store scores, attentions and hidden_states when required
-++++++++    #         if return_dict_in_generate:
-++++++++    #             if output_scores:
-++++++++    #                 scores += (next_token_scores,)
-++++++++    #             if output_logits:
-++++++++    #                 raw_logits += (next_token_logits,)
-++++++++    #             if output_attentions:
-++++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++++++    #                 if self.config.is_encoder_decoder:
-++++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++++++                
-++++++++    #             if output_hidden_states:
-++++++++    #                 hidden = (
-++++++++    #                     outputs.decoder_hidden_states
-++++++++    #                     if self.config.is_encoder_decoder
-++++++++    #                     else outputs.hidden_states
-++++++++    #                 )
-++++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++++++            
-++++++++    #         # token selection
-++++++++    #         if do_sample:
-++++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++++++    #         else:
-++++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++++++            
-++++++++    #         # finished sentences should have their next token be a padding token
-++++++++    #         if has_eos_stopping_criteria:
-++++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++++++            
-++++++++    #         # update generated ids, model inputs, and length for next step
-++++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++++++    #         if streamer is not None:
-++++++++    #             streamer.put(next_tokens)
-++++++++            
-++++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++++++    #             outputs,
-++++++++    #             model_kwargs,
-++++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++++++    #         )
-++++++++            
-++++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++++++    #         cur_len += 1
-++++++++            
-++++++++    #         if _record_time:
-++++++++    #             import time as time_module
-++++++++    #             infer_stop = time_module.time()
-++++++++    #             time_record.append(infer_stop - infer_start)
-++++++++            
-++++++++    #         del outputs
-++++++++        
-++++++++    #     average_infer_time = None
-++++++++    #     if time_record:
-++++++++    #         if len(time_record) > 1:
-++++++++    #             time_record.pop(0)
-++++++++    #         average_infer_time = sum(time_record) / len(time_record)
-++++++++    #         print(f'average inference time is: {average_infer_time}')
-++++++++    #         print(f'inference time record: {time_record}')
-++++++++        
-++++++++    #     if streamer is not None:
-++++++++    #         streamer.end()
-++++++++        
-++++++++    #     # 简单判断：打印是否使用了JIT路径
-++++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++++++    #     else:
-++++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++++++        
-++++++++    #     if return_dict_in_generate:
-++++++++    #         if self.config.is_encoder_decoder:
-++++++++    #             return GenerateEncoderDecoderOutput(
-++++++++    #                 sequences=input_ids,
-++++++++    #                 scores=scores,
-++++++++    #                 logits=raw_logits,
-++++++++    #                 encoder_attentions=encoder_attentions,
-++++++++    #                 encoder_hidden_states=encoder_hidden_states,
-++++++++    #                 decoder_attentions=decoder_attentions,
-++++++++    #                 cross_attentions=cross_attentions,
-++++++++    #                 decoder_hidden_states=decoder_hidden_states,
-++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++++    #                 average_infer_time=average_infer_time
-++++++++    #             )
-++++++++    #         else:
-++++++++    #             return GenerateDecoderOnlyOutput(
-++++++++    #                 sequences=input_ids,
-++++++++    #                 scores=scores,
-++++++++    #                 logits=raw_logits,
-++++++++    #                 attentions=decoder_attentions,
-++++++++    #                 hidden_states=decoder_hidden_states,
-++++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++++    #                 average_infer_time=average_infer_time
-++++++++    #             )
-++++++++    #     else:
-++++++++    #         return input_ids
-++++++++            
-++++++++    # def _prepare_cache_for_generation(
-++++++++    #     self,
-++++++++    #     generation_config,
-++++++++    #     model_kwargs,
-++++++++    #     assistant_model,
-++++++++    #     batch_size,
-++++++++    #     max_cache_length,
-++++++++    # ):
-++++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++++++    #         generation_config.cache_implementation = "static"
-++++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++++++        
-++++++++    #     if generation_config.cache_implementation == "static":
-++++++++    #         base_required_from_max_length = generation_config.max_length + 1
-++++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++++++    #         min_cache_size = 50
-++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++++++    #         else:
-++++++++    #             max_cache_length = max(base_required, min_cache_size)
-++++++++            
-++++++++    #         original_max_cache_length = max_cache_length
-++++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++++++            
-++++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++++    #             if max_cache_length > self.config.max_position_embeddings:
-++++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++++++        
-++++++++    #     result = super()._prepare_cache_for_generation(
-++++++++    #         generation_config=generation_config,
-++++++++    #         model_kwargs=model_kwargs,
-++++++++    #         assistant_model=assistant_model,
-++++++++    #         batch_size=batch_size,
-++++++++    #         max_cache_length=max_cache_length,
-++++++++    #     )
-++++++++        
-++++++++    #     if generation_config.cache_implementation == "static":
-++++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++++++    #         created_cache = model_kwargs.get(cache_name)
-++++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++++++    #             if created_cache.max_cache_len < generation_config.max_length:
-++++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++++++        
-++++++++    #     return result
-++++++++
-++++++++
-++++++++
-+++++++ 
-+++++++ 
-+++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-+++++++-- 
-+++++++2.27.0
-+++++++
-++++++-- 
-++++++2.27.0
-++++++
-+++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-+++++new file mode 100644
-+++++index 00000000..966529e4
-+++++--- /dev/null
-++++++++ b/patches/0003-20261106secondcommit.patch
-+++++@@ -0,0 +1,2769 @@
-++++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-++++++From: Pinoeer-kingxi <13022943007@163.com>
-++++++Date: Thu, 6 Nov 2025 14:54:37 +0800
-++++++Subject: [PATCH 3/3] 20261106secondcommit
-++++++
-++++++---
-++++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
-++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-++++++ patches/0001-20251104commit.patch             | 1272 -----------------
-++++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
-++++++ delete mode 100644 patches/0001-20251104commit.patch
-++++++
-++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++index 73773c22..2f9192bf 100644
-++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-++++++ 
-++++++ _CONFIG_FOR_DOC = "DeepseekConfig"
-++++++ 
-+++++++_attn_mask_cache = {}
-+++++++
-+++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
-+++++++    q_len = batch_and_seq[1]
-+++++++    kv_len = batch_and_seq[1] + past_key_values_length 
-+++++++    key = (batch_and_seq[0], q_len, kv_len)
-+++++++
-+++++++    if key in _attn_mask_cache:
-+++++++        return _attn_mask_cache[key]
-+++++++
-+++++++    mask = _prepare_4d_causal_attention_mask(
-+++++++        attention_mask,
-+++++++        batch_and_seq,
-+++++++        inputs_embeds,
-+++++++        past_key_values_length,
-+++++++    )
-+++++++    _attn_mask_cache[key] = mask
-+++++++    return mask
-++++++ 
-++++++ def _get_unpad_data(attention_mask):
-++++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
-++++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
-++++++         return final_output
-++++++ 
-++++++ 
-++++++-    @no_grad()
-++++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++-        expert_cache = ops.zeros_like(x)
-++++++-        idxs = flat_expert_indices.argsort()
-++++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++-        token_idxs = idxs // self.num_experts_per_tok
-++++++-
-++++++-        for i, end_idx in enumerate(tokens_per_expert):
-++++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++-            if start_idx == end_idx:
-++++++-                continue
-++++++-            expert = self.experts[i]
-++++++-            exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-            expert_tokens = x[exp_token_idx]
-++++++-            expert_out = expert(expert_tokens)
-++++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++-
-++++++-        return expert_cache
-++++++-        
-++++++     # @no_grad()
-++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-    #     # expert_cache = torch.zeros_like(x)
-++++++-    #     # idxs = flat_expert_indices.argsort()
-++++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++-    #     # token_idxs = idxs // self.num_experts_per_tok
-++++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++-    #     #     if start_idx == end_idx:
-++++++-    #     #         continue
-++++++-    #     #     expert = self.experts[i]
-++++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-    #     #     expert_tokens = x[exp_token_idx]
-++++++-    #     #     expert_out = expert(expert_tokens)
-++++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++-    #     # return expert_cache
-+++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++     #     expert_cache = ops.zeros_like(x)
-++++++     #     idxs = flat_expert_indices.argsort()
-++++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
-++++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++ 
-++++++     #     return expert_cache
-++++++-    # @no_grad()
-++++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-    #     expert_cache = ops.zeros_like(x)
-+++++++        
-+++++++    @no_grad()
-+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-+++++++        """
-+++++++        优化版 MoE prefill：
-+++++++        - 批量张量化处理同一个 expert 的所有 token
-+++++++        - 跳过无 token 的专家
-+++++++        - 保持结果完全一致
-+++++++        """
-+++++++        # 初始化输出缓存
-+++++++        expert_cache = ops.zeros_like(x)
-++++++ 
-++++++-    #     # 排序保证顺序一致
-++++++-    #     idxs = flat_expert_indices.argsort()
-++++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++-    #     token_idxs = idxs // self.num_experts_per_tok
-+++++++        # 排序（确保 scatter_add 位置对应原逻辑）
-+++++++        idxs = flat_expert_indices.argsort()
-+++++++        sorted_expert_indices = flat_expert_indices[idxs]
-+++++++        sorted_token_indices = idxs // self.num_experts_per_tok
-++++++ 
-++++++-    #     # 找出有 token 的专家
-++++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++++        # 每个 expert 的 token 数
-+++++++        tokens_per_expert = sorted_expert_indices.bincount()
-++++++ 
-++++++-    #     for i in active_experts.tolist():
-++++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++-    #         end_idx = tokens_per_expert[i]
-++++++-    #         if start_idx == end_idx:  # 没有 token
-++++++-    #             continue
-+++++++        # 找出有 token 的专家
-+++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-++++++ 
-++++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-    #         expert_tokens = x[exp_token_idx]
-++++++-    #         expert_out = self.experts[i](expert_tokens)
-++++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++++        for expert_id in active_experts.tolist():
-+++++++            # 取该 expert 对应的排序后 token 区间
-+++++++            start = (tokens_per_expert[:expert_id]).sum().item()
-+++++++            end = start + tokens_per_expert[expert_id].item()
-++++++ 
-++++++-    #         expert_cache = mindspore.mint.scatter_add(
-++++++-    #             expert_cache,
-++++++-    #             0,
-++++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++-    #             expert_out
-++++++-    #         )
-+++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
-+++++++            expert_tokens = x[token_idx]                     # 取输入向量
-++++++ 
-++++++-    #     return expert_cache
-+++++++            # 执行专家 MLP
-+++++++            expert_out = self.experts[expert_id](expert_tokens)
-+++++++
-+++++++            # 按权重缩放
-+++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
-+++++++
-+++++++            # 回写到缓存（等价 scatter_add）
-+++++++            expert_cache = mindspore.mint.scatter_add(
-+++++++                expert_cache,
-+++++++                0,
-+++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++++                scaled_out
-+++++++            )
-+++++++
-+++++++        return expert_cache
-+++++++
-+++++++        # @no_grad()
-+++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++        #     # expert_cache = torch.zeros_like(x)
-+++++++        #     # idxs = flat_expert_indices.argsort()
-+++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-+++++++        #     # token_idxs = idxs // self.num_experts_per_tok
-+++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
-+++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-+++++++        #     #     if start_idx == end_idx:
-+++++++        #     #         continue
-+++++++        #     #     expert = self.experts[i]
-+++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++        #     #     expert_tokens = x[exp_token_idx]
-+++++++        #     #     expert_out = expert(expert_tokens)
-+++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-+++++++        #     # return expert_cache
-+++++++        #     expert_cache = ops.zeros_like(x)
-+++++++        #     idxs = flat_expert_indices.argsort()
-+++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++        #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++        #     for i, end_idx in enumerate(tokens_per_expert):
-+++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++        #         if start_idx == end_idx:
-+++++++        #             continue
-+++++++        #         expert = self.experts[i]
-+++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++        #         expert_tokens = x[exp_token_idx]
-+++++++        #         expert_out = expert(expert_tokens)
-+++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-+++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-+++++++
-+++++++        #     return expert_cache
-+++++++        # @no_grad()
-+++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-+++++++        #     expert_cache = ops.zeros_like(x)
-+++++++
-+++++++        #     # 排序保证顺序一致
-+++++++        #     idxs = flat_expert_indices.argsort()
-+++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-+++++++        #     token_idxs = idxs // self.num_experts_per_tok
-+++++++
-+++++++        #     # 找出有 token 的专家
-+++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-+++++++
-+++++++        #     for i in active_experts.tolist():
-+++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-+++++++        #         end_idx = tokens_per_expert[i]
-+++++++        #         if start_idx == end_idx:  # 没有 token
-+++++++        #             continue
-+++++++
-+++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
-+++++++        #         expert_tokens = x[exp_token_idx]
-+++++++        #         expert_out = self.experts[i](expert_tokens)
-+++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-+++++++
-+++++++        #         expert_cache = mindspore.mint.scatter_add(
-+++++++        #             expert_cache,
-+++++++        #             0,
-+++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-+++++++        #             expert_out
-+++++++        #         )
-+++++++
-+++++++        #     return expert_cache
-++++++ 
-++++++ 
-++++++ 
-++++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-++++++ 
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++-
-++++++ # class DeepseekFlashAttention(nn.Module):
-++++++ #     """
-++++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
-++++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-++++++ 
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-+++++++
-++++++ Deepseek_ATTENTION_CLASSES = {
-++++++     "eager": DeepseekAttention,
-++++++     "flash-attention": DeepseekFlashAttention,
-++++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
-++++++             )
-++++++         else:
-++++++             # 4d mask is passed through the layers
-++++++-            attention_mask = _prepare_4d_causal_attention_mask(
-+++++++            # attention_mask = _prepare_4d_causal_attention_mask(
-+++++++            #     attention_mask,
-+++++++            #     (batch_size, seq_length),
-+++++++            #     inputs_embeds,
-+++++++            #     past_key_values_length,
-+++++++            # )
-+++++++            #@dwj
-+++++++            attention_mask = get_cached_causal_mask(
-++++++                 attention_mask,
-++++++                 (batch_size, seq_length),
-++++++                 inputs_embeds,
-++++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++         # Initialize weights and apply final processing
-++++++         self.post_init()
-++++++         self.warm_up = False
-+++++++        #@dwj
-+++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
-+++++++            self.num_layers,
-+++++++            self.num_attention_heads,
-+++++++            self.head_dim,
-+++++++            batch_size=1,
-+++++++            max_length=self.max_length,
-+++++++            dtype=mindspore.float16
-+++++++        )
-+++++++
-+++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
-+++++++        key_cache = []
-+++++++        value_cache = []
-+++++++        for _ in range(num_layers):
-+++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
-+++++++            key_cache.append(k)
-+++++++            value_cache.append(v)
-+++++++        return key_cache, value_cache
-+++++++
-++++++ 
-++++++     def warmup_moe_model_deep(self):
-++++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++index bced285c..ebd7782e 100644
-++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-++++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-++++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-++++++ 
-++++++-Long_Prompt = False
-++++++-PROMPT_LENGTH_THRESHOLD = 128
-+++++++Long_Prompt = 1
-+++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
-+++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
-+++++++
-+++++++_causal_mask_cache = {}
-+++++++
-+++++++def get_cached_causal_mask_with_cache_position(
-+++++++    attention_mask: mindspore.Tensor,
-+++++++    sequence_length: int,
-+++++++    target_length: int,
-+++++++    dtype: mindspore.dtype,
-+++++++    min_dtype: float,
-+++++++    cache_position: mindspore.Tensor,
-+++++++    batch_size: int,
-+++++++):
-+++++++    """
-+++++++    带缓存的 causal mask 构造函数
-+++++++    """
-+++++++    # q_len 是当前 query 长度
-+++++++    q_len = sequence_length
-+++++++    # kv_len 是 target_length
-+++++++    kv_len = target_length
-+++++++
-+++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-+++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
-+++++++
-+++++++    if key in _causal_mask_cache:
-+++++++        return _causal_mask_cache[key]
-+++++++
-+++++++    # 调用原来的 mask 构造逻辑
-+++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++++        attention_mask,
-+++++++        sequence_length=sequence_length,
-+++++++        target_length=target_length,
-+++++++        dtype=dtype,
-+++++++        min_dtype=min_dtype,
-+++++++        cache_position=cache_position,
-+++++++        batch_size=batch_size,
-+++++++    )
-+++++++    # 缓存结果
-+++++++    _causal_mask_cache[key] = causal_mask
-+++++++    return causal_mask
-++++++ 
-++++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-++++++ def _prepare_4d_causal_attention_mask_with_cache_position(
-++++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++++ 
-++++++ 
-++++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
-+++++++# class Qwen2MoeAttention(nn.Module):
-+++++++#     """
-+++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-+++++++#     and "Generating Long Sequences with Sparse Transformers".
-+++++++#     """
-+++++++
-+++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-+++++++#         super().__init__()
-+++++++#         self.config = config
-+++++++#         self.layer_idx = layer_idx
-+++++++#         if layer_idx is None:
-+++++++#             logger.warning_once(
-+++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-+++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++++#                 "when creating this class."
-+++++++#             )
-+++++++
-+++++++#         self.hidden_size = config.hidden_size
-+++++++#         self.num_heads = config.num_attention_heads
-+++++++#         self.head_dim = self.hidden_size // self.num_heads
-+++++++#         self.num_key_value_heads = config.num_key_value_heads
-+++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-+++++++#         self.max_position_embeddings = config.max_position_embeddings
-+++++++#         self.rope_theta = config.rope_theta
-+++++++#         self.is_causal = True
-+++++++#         self.attention_dropout = config.attention_dropout
-+++++++
-+++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
-+++++++#             raise ValueError(
-+++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
-+++++++#                 f" and `num_heads`: {self.num_heads})."
-+++++++#             )
-+++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-+++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-+++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-+++++++
-+++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
-+++++++#             self.head_dim,
-+++++++#             max_position_embeddings=self.max_position_embeddings,
-+++++++#             base=self.rope_theta,
-+++++++#         )
-+++++++
-+++++++#     def forward(
-+++++++#         self,
-+++++++#         hidden_states: mindspore.Tensor,
-+++++++#         attention_mask: Optional[mindspore.Tensor] = None,
-+++++++#         position_ids: Optional[mindspore.Tensor] = None,
-+++++++#         past_key_value: Optional[Cache] = None,
-+++++++#         output_attentions: bool = False,
-+++++++#         use_cache: bool = False,
-+++++++#         cache_position: Optional[mindspore.Tensor] = None,
-+++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-+++++++
-+++++++        
-+++++++
-+++++++#         bsz, q_len, _ = hidden_states.shape
-+++++++
-+++++++#         query_states = self.q_proj(hidden_states)
-+++++++#         key_states = self.k_proj(hidden_states)
-+++++++#         value_states = self.v_proj(hidden_states)
-+++++++
-+++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-+++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-+++++++
-+++++++#         kv_seq_len = key_states.shape[-2]
-+++++++#         if past_key_value is not None:
-+++++++#             if self.layer_idx is None:
-+++++++#                 raise ValueError(
-+++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-+++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-+++++++#                     "with a layer index."
-+++++++#                 )
-+++++++#             if isinstance(past_key_value, StaticCache):
-+++++++#                 kv_seq_len = key_states.shape[-2]
-+++++++#             else:
-+++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-+++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-+++++++
-+++++++#         if past_key_value is not None:
-+++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++            
-+++++++#             if isinstance(past_key_value, StaticCache):
-+++++++#                 kv_seq_len = key_states.shape[-2]
-+++++++
-+++++++#         # repeat k/v heads if n_kv_heads < n_heads
-+++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++        
-+++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++++
-+++++++#         if attention_mask is not None:
-+++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++++#             attn_weights = attn_weights + causal_mask
-+++++++
-+++++++#         # upcast attention to fp32
-+++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++++++#         attn_output = ops.matmul(attn_weights, value_states)
-+++++++
-+++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++++++#             raise ValueError(
-+++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-+++++++#                 f" {attn_output.shape}"
-+++++++#             )
-+++++++
-+++++++#         attn_output = ops.transpose(attn_output, 1, 2)
-+++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++++
-+++++++#         attn_output = self.o_proj(attn_output)
-+++++++#         # @lwx
-+++++++        
-+++++++#         # max_seq_len = self.max_position_embeddings  # 2048
-+++++++
-+++++++#         # if attention_mask is not None:
-+++++++#         #     # attention_mask: [B, 1, Sq, Sk]
-+++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-+++++++
-+++++++#         #     # pad 到 [max_seq_len, max_seq_len]
-+++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-+++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-+++++++#         #     global_attention_mask = padded_mask
-+++++++#         # else:
-+++++++#         #     global_attention_mask = None
-+++++++
-+++++++
-+++++++#         # sparse_mode=3
-+++++++#         # attn_output = mindspore.ops.flash_attention_score(
-+++++++#         #     query=query_states,
-+++++++#         #     key=key_states,
-+++++++#         #     value=value_states,
-+++++++#         #     real_shift=None,
-+++++++#         #     padding_mask=None,
-+++++++
-+++++++#         #     head_num=self.num_heads,
-+++++++#         #     attn_mask=global_attention_mask,
-+++++++#         #     keep_prob=1.0 - self.attention_dropout,
-+++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++#         #     input_layout="BNSD",
-+++++++#         #     pre_tokens=2147483647,
-+++++++#         #     next_tokens=2147483647,
-+++++++#         #     inner_precise=0,
-+++++++#         #     drop_mask=None,
-+++++++#         #     prefix=None,
-+++++++#         #     actual_seq_qlen=None,
-+++++++#         #     actual_seq_kvlen=None,
-+++++++#         #     sparse_mode=sparse_mode,
-+++++++#         # )
-+++++++#         if not output_attentions:
-+++++++#             attn_weights = None
-+++++++
-+++++++#         return attn_output, attn_weights, past_key_value
-+++++++
-++++++ class Qwen2MoeAttention(nn.Module):
-++++++     """
-++++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
-++++++-    and "Generating Long Sequences with Sparse Transformers".
-++++++-    """
-+++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-++++++ 
-+++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
-+++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
-+++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
-+++++++
-+++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
-+++++++    """
-++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++         super().__init__()
-++++++         self.config = config
-++++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
-++++++         if layer_idx is None:
-++++++             logger.warning_once(
-++++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
-++++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-+++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
-++++++                 "when creating this class."
-++++++             )
-++++++ 
-++++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
-++++++         use_cache: bool = False,
-++++++         cache_position: Optional[mindspore.Tensor] = None,
-++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-
-++++++         
-++++++-
-+++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
-++++++         bsz, q_len, _ = hidden_states.shape
-++++++ 
-++++++         query_states = self.q_proj(hidden_states)
-++++++         key_states = self.k_proj(hidden_states)
-++++++         value_states = self.v_proj(hidden_states)
-++++++ 
-++++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
-++++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
-++++++-
-+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-+++++++        
-++++++         kv_seq_len = key_states.shape[-2]
-++++++         if past_key_value is not None:
-++++++-            if self.layer_idx is None:
-++++++-                raise ValueError(
-++++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-                    "with a layer index."
-++++++-                )
-++++++-            if isinstance(past_key_value, StaticCache):
-++++++-                kv_seq_len = key_states.shape[-2]
-++++++-            else:
-++++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-+++++++        
-++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++ 
-++++++         if past_key_value is not None:
-++++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-+++++++
-+++++++        # --- 2. 动态调度核心注意力计算 ---
-+++++++        global Long_Prompt
-+++++++        if Long_Prompt >= 1:
-+++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-+++++++            fa_attention_mask = None
-+++++++            if attention_mask is not None:
-+++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-+++++++                fa_attention_mask = (mask_slice != 0)
-+++++++
-+++++++            attn_output = mindspore.ops.flash_attention_score(
-+++++++                query=query_states,
-+++++++                key=key_states,
-+++++++                value=value_states,
-+++++++                head_num=self.num_heads,
-+++++++                attn_mask=fa_attention_mask,
-+++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
-+++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
-+++++++                input_layout="BNSD",
-+++++++                sparse_mode=0,
-+++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-+++++++            )
-++++++             
-++++++-            if isinstance(past_key_value, StaticCache):
-++++++-                kv_seq_len = key_states.shape[-2]
-+++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-+++++++            attn_output = self.o_proj(attn_output)
-+++++++            attn_weights = None
-+++++++            if output_attentions:
-+++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-++++++ 
-++++++-        # repeat k/v heads if n_kv_heads < n_heads
-++++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++-        
-++++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-+++++++        else:
-+++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
-+++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
-+++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
-+++++++            
-+++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++ 
-++++++-        if attention_mask is not None:
-++++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++-            attn_weights = attn_weights + causal_mask
-+++++++            if attention_mask is not None:
-+++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-+++++++                attn_weights = attn_weights + causal_mask
-++++++ 
-++++++-        # upcast attention to fp32
-++++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-++++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-++++++-        attn_output = ops.matmul(attn_weights, value_states)
-+++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
-+++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
-+++++++            attn_output = ops.matmul(attn_weights, value_states)
-++++++ 
-++++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-++++++-            raise ValueError(
-++++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
-++++++-                f" {attn_output.shape}"
-++++++-            )
-+++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
-+++++++                raise ValueError(
-+++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
-+++++++                )
-++++++ 
-++++++-        attn_output = ops.transpose(attn_output, 1, 2)
-++++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++++            attn_output = ops.transpose(attn_output, 1, 2)
-+++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-+++++++            attn_output = self.o_proj(attn_output)
-++++++ 
-++++++-        attn_output = self.o_proj(attn_output)
-++++++-        # @lwx
-+++++++            if not output_attentions:
-+++++++                attn_weights = None
-++++++         
-++++++-        # max_seq_len = self.max_position_embeddings  # 2048
-++++++-
-++++++-        # if attention_mask is not None:
-++++++-        #     # attention_mask: [B, 1, Sq, Sk]
-++++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++-
-++++++-        #     # pad 到 [max_seq_len, max_seq_len]
-++++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++-        #     global_attention_mask = padded_mask
-++++++-        # else:
-++++++-        #     global_attention_mask = None
-++++++-
-++++++-
-++++++-        # sparse_mode=3
-++++++-        # attn_output = mindspore.ops.flash_attention_score(
-++++++-        #     query=query_states,
-++++++-        #     key=key_states,
-++++++-        #     value=value_states,
-++++++-        #     real_shift=None,
-++++++-        #     padding_mask=None,
-++++++-
-++++++-        #     head_num=self.num_heads,
-++++++-        #     attn_mask=global_attention_mask,
-++++++-        #     keep_prob=1.0 - self.attention_dropout,
-++++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++-        #     input_layout="BNSD",
-++++++-        #     pre_tokens=2147483647,
-++++++-        #     next_tokens=2147483647,
-++++++-        #     inner_precise=0,
-++++++-        #     drop_mask=None,
-++++++-        #     prefix=None,
-++++++-        #     actual_seq_qlen=None,
-++++++-        #     actual_seq_kvlen=None,
-++++++-        #     sparse_mode=sparse_mode,
-++++++-        # )
-++++++-        if not output_attentions:
-++++++-            attn_weights = None
-++++++-
-++++++         return attn_output, attn_weights, past_key_value
-++++++ 
-++++++-
-++++++ # class Qwen2MoeFlashAttention(nn.Module):
-++++++ #     """
-++++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-++++++ #             return final_hidden_states, router_logits
-++++++ 
-++++++ 
-++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-#     """
-++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
-++++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
-++++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
-++++++-#     """
-++++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++++-#         super().__init__()
-++++++-#         self.num_experts = config.num_experts
-++++++-#         self.top_k = config.num_experts_per_tok
-++++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++++-
-++++++-#         # 门控网络
-++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++-#         # 专家列表
-++++++-#         self.experts = nn.ModuleList(
-++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++-#         )
-++++++-#         # 共享专家
-++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_decode(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         【解码路径】针对 sequence_length=1 的极致优化。
-++++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
-++++++-#         """
-++++++-#         batch_size, hidden_dim = hidden_states.shape
-++++++-        
-++++++-#         expert_outputs_list = [
-++++++-#             ops.cat([
-++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++-#             ], dim=0) 
-++++++-#             for i in range(batch_size)
-++++++-#         ]
-++++++-        
-++++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
-++++++-#         # shape: (batch_size, top_k, hidden_dim)
-++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++-        
-++++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
-++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++-        
-++++++-#         return moe_output.squeeze(1)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_prefill(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         【预填充路径】针对 sequence_length > 1 的优化。
-++++++-#         按专家对 Token 进行分组，并进行批处理。
-++++++-#         """
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens = hidden_states.shape[0]
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-        
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-        
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-        
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-            
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-#             selected_token_indices = token_indices[mask]
-++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++-            
-++++++-#             current_states = hidden_states[selected_token_indices]
-++++++-            
-++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-            
-++++++-#             moe_output = moe_output.index_add(
-++++++-#                 dim=0,
-++++++-#                 index=selected_token_indices,
-++++++-#                 source=expert_output.to(hidden_states.dtype)
-++++++-#             )
-++++++-#         return moe_output
-++++++-
-++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         顶层 forward 方法，作为智能分发器。
-++++++-#         """
-++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        
-++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-
-++++++-#         if self.norm_topk_prob:
-++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        
-++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-        
-++++++-#         moe_output = None
-++++++-#         # 在推理时，根据序列长度选择最优路径
-++++++-#         if not self.training:
-++++++-#             if sequence_length == 1:
-++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++++-#             else:
-++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++++-#         else:
-++++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
-++++++-#             raise NotImplementedError("Training path is not implemented.")
-++++++-
-++++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
-++++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
-++++++-        
-++++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
-++++++-        
-++++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
-++++++-        
-++++++-#         return final_hidden_states, router_logits
-++++++-
-++++++-
-++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-#     """
-++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
-++++++-#     """
-++++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++++-#         super().__init__()
-++++++-#         self.num_experts = config.num_experts
-++++++-#         self.top_k = config.num_experts_per_tok
-++++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++++-
-++++++-#         # 门控网络
-++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++-#         # 专家列表
-++++++-#         self.experts = nn.ModuleList(
-++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++-#         )
-++++++-#         # 共享专家
-++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_decode(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         batch_size, _ = hidden_states.shape
-++++++-#         expert_outputs_list = [
-++++++-#             ops.cat([
-++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++-#             ], dim=0) 
-++++++-#             for i in range(batch_size)
-++++++-#         ]
-++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++-#         return moe_output.squeeze(1)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_prefill(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens = hidden_states.shape[0]
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-        
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-#             selected_token_indices = token_indices[mask]
-++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++-#             current_states = hidden_states[selected_token_indices]
-++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-#             moe_output = moe_output.index_add(
-++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++-#             )
-++++++-#         return moe_output
-++++++-
-++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         顶层 forward 方法，作为智能分发器。
-++++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
-++++++-#         """
-++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        
-++++++-#         # 1. 门控计算 (通用逻辑)
-++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-
-++++++-#         if self.norm_topk_prob:
-++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        
-++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-        
-++++++-#         # 2. 智能分发到最优 MoE 路径
-++++++-#         moe_output = None
-++++++-#         if not self.training:
-++++++-#             if sequence_length == 1:
-++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
-++++++-#             else:
-++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
-++++++-#         else:
-++++++-#             raise NotImplementedError("Training path is not implemented.")
-++++++-
-++++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
-++++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
-++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++-        
-++++++-#         # 4. 合并 MoE 输出和共享专家输出
-++++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
-++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++-        
-++++++-#         # 5. 恢复原始形状并返回
-++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++-        
-++++++-#         return final_hidden_states, router_logits
-++++++-
-++++++-# prefill fastest
-++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-#     """
-++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
-++++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
-++++++-#     """
-++++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++++-#         super().__init__()
-++++++-#         self.num_experts = config.num_experts
-++++++-#         self.top_k = config.num_experts_per_tok
-++++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++++-
-++++++-#         # 门控网络
-++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++-#         # 专家列表
-++++++-#         self.experts = nn.ModuleList(
-++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++-#         )
-++++++-#         # 共享专家
-++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_dispatch(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
-++++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
-++++++-#         """
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens, _ = hidden_states.shape
-++++++-        
-++++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-#         flat_routing_weights = routing_weights.flatten()
-++++++-
-++++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-
-++++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-        
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-            
-++++++-#             # 找到所有分配给该专家的 token
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-            
-++++++-#             # 使用 mask 选取对应的 token 和权重
-++++++-#             current_token_indices = token_indices[mask]
-++++++-#             current_routing_weights = flat_routing_weights[mask]
-++++++-#             current_hidden_states = hidden_states[current_token_indices]
-++++++-            
-++++++-#             # 对这些 token 进行批处理
-++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++++-            
-++++++-#             # 使用 index_add 将结果精确地加回到对应位置
-++++++-#             moe_output = moe_output.index_add(
-++++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++-#             )
-++++++-#         return moe_output
-++++++-
-++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         顶层 forward 方法，作为智能分发器。
-++++++-#         """
-++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        
-++++++-#         # 1. 门控计算
-++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-
-++++++-#         if self.norm_topk_prob:
-++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        
-++++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-        
-++++++-#         # 2. 调用统一的 MoE 计算内核
-++++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
-++++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-++++++-
-++++++-#         # 3. 统一处理共享专家
-++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++-        
-++++++-#         # 4. 合并输出
-++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++-        
-++++++-#         # 5. 恢复原始形状并返回
-++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++-        
-++++++-#         return final_hidden_states, router_logits
-++++++-
-++++++-
-++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-#     """
-++++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
-++++++-#     【最终高性能与高精度版】：
-++++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
-++++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
-++++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
-++++++-#     3. 这样实现了速度和准确性的两全其美。
-++++++-#     """
-++++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++++-#         super().__init__()
-++++++-#         self.num_experts = config.num_experts
-++++++-#         self.top_k = config.num_experts_per_tok
-++++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++++-
-++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++-#         self.experts = nn.ModuleList(
-++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++-#         )
-++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_decode(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         【解码路径】极致优化版：bmm + 高精度累加。
-++++++-#         """
-++++++-#         original_dtype = hidden_states.dtype
-++++++-#         batch_size, _ = hidden_states.shape
-++++++-        
-++++++-#         expert_outputs_list = [
-++++++-#             ops.cat([
-++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++-#             ], dim=0) 
-++++++-#             for i in range(batch_size)
-++++++-#         ]
-++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++-
-++++++-#         # 在 float32 下执行 bmm，得到高精度结果
-++++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
-++++++-        
-++++++-#         # 将高精度结果转换回原始数据类型
-++++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
-++++++-        
-++++++-#         return moe_output
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_prefill(
-++++++-#         self, 
-++++++-#         hidden_states: mindspore.Tensor, 
-++++++-#         selected_experts: mindspore.Tensor, 
-++++++-#         routing_weights: mindspore.Tensor
-++++++-#     ) -> mindspore.Tensor:
-++++++-#         """
-++++++-#         【预填充路径】与原始实现一致，结果精确。
-++++++-#         """
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens, _ = hidden_states.shape
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-        
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-#             selected_token_indices = token_indices[mask]
-++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++-#             current_states = hidden_states[selected_token_indices]
-++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-#             moe_output = moe_output.index_add(
-++++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
-++++++-#             )
-++++++-#         return moe_output
-++++++-
-++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-        
-++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-
-++++++-#         if self.norm_topk_prob:
-++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        
-++++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
-++++++-#         # 如果模型主体是 float16，后续再转换
-++++++-        
-++++++-#         moe_output = None
-++++++-#         if not self.training:
-++++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
-++++++-#             # _moe_infer_decode 内部会处理好类型转换
-++++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-#             if sequence_length == 1:
-++++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++++-#             else:
-++++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
-++++++-#         else:
-++++++-#             raise NotImplementedError("Training path is not implemented.")
-++++++-
-++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++-        
-++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++-        
-++++++-#         return final_hidden_states, router_logits
-++++++-    
-++++++-
-++++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-#     """
-++++++-#     【融合版】一个混合专家模块，内置两种推理策略，
-++++++-#     由外部全局变量 `Long_Prompt` 控制：
-++++++-
-++++++-#     - if Long_Prompt is True:  【精度优先模式】
-++++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
-++++++-#       适用于处理长序列，避免误差累积。
-++++++-
-++++++-#     - if Long_Prompt is False: 【速度优先模式】
-++++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
-++++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
-++++++-#     """
-++++++-#     def __init__(self, config: Qwen2MoeConfig):
-++++++-#         super().__init__()
-++++++-#         self.num_experts = config.num_experts
-++++++-#         self.top_k = config.num_experts_per_tok
-++++++-#         self.norm_topk_prob = config.norm_topk_prob
-++++++-
-++++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
-++++++-#         self.experts = nn.ModuleList(
-++++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
-++++++-#         )
-++++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++-
-++++++-#     # --- 速度优先模式的辅助函数 ---
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++-#         original_dtype = hidden_states.dtype
-++++++-#         batch_size, _ = hidden_states.shape
-++++++-#         expert_outputs_list = [
-++++++-#             ops.cat([
-++++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
-++++++-#             ], dim=0) 
-++++++-#             for i in range(batch_size)
-++++++-#         ]
-++++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
-++++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
-++++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
-++++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++++-
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens, _ = hidden_states.shape
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-#             selected_token_indices = token_indices[mask]
-++++++-#             selected_routing_weights = routing_weights.flatten()[mask]
-++++++-#             current_states = hidden_states[selected_token_indices]
-++++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++-#         return moe_output
-++++++-
-++++++-#     # --- 精度优先模式的辅助函数 ---
-++++++-#     @no_grad()
-++++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++-#         moe_output = ops.zeros_like(hidden_states)
-++++++-#         num_tokens, _ = hidden_states.shape
-++++++-#         flat_selected_experts = selected_experts.flatten()
-++++++-#         flat_routing_weights = routing_weights.flatten()
-++++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-++++++-#         active_experts = ops.unique(flat_selected_experts)
-++++++-#         for expert_idx_tensor in active_experts:
-++++++-#             expert_idx = expert_idx_tensor.item()
-++++++-#             expert_layer = self.experts[expert_idx]
-++++++-#             mask = (flat_selected_experts == expert_idx_tensor)
-++++++-#             current_token_indices = token_indices[mask]
-++++++-#             current_routing_weights = flat_routing_weights[mask]
-++++++-#             current_hidden_states = hidden_states[current_token_indices]
-++++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
-++++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++-#         return moe_output
-++++++-
-++++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
-++++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
-++++++-#         global Long_Prompt
-++++++-
-++++++-#         # 1. 门控计算 (所有模式通用)
-++++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-#         router_logits = self.gate(hidden_states_reshaped)
-++++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
-++++++-#         if self.norm_topk_prob:
-++++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-        
-++++++-#         moe_output = None
-++++++-#         if not self.training:
-++++++-#             # 根据 Long_Prompt 标志选择模式
-++++++-#             if Long_Prompt:
-++++++-#                 # --- 精度优先模式 ---
-++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++-#             else:
-++++++-#                 # --- 速度优先模式 ---
-++++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++-#                 if sequence_length == 1:
-++++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++-#                 else:
-++++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++-#         else:
-++++++-#             raise NotImplementedError("Training path is not implemented.")
-++++++-
-++++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-++++++-        
-++++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
-++++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
-++++++-        
-++++++-#         return final_hidden_states, router_logits
-++++++-    
-++++++ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++     """
-++++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
-++++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-++++++         return moe_output_fp32.squeeze(1).to(original_dtype)
-++++++ 
-+++++++    # @no_grad()
-+++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-+++++++    #     num_tokens, _ = hidden_states.shape
-+++++++    #     flat_selected_experts = selected_experts.flatten()
-+++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
-+++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-+++++++    #     original_token_indices = sorted_expert_indices // self.top_k
-+++++++    #     moe_output = ops.zeros_like(hidden_states)
-+++++++    #     current_token_offset = 0
-+++++++    #     for i in range(self.num_experts):
-+++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
-+++++++    #         if expert_token_count == 0:
-+++++++    #             continue
-+++++++    #         end_offset = current_token_offset + expert_token_count
-+++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-+++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-+++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
-+++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-+++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-+++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-+++++++    #         current_token_offset += expert_token_count
-+++++++    #     return moe_output
-+++++++
-++++++     @no_grad()
-++++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++-        num_tokens, _ = hidden_states.shape
-++++++-        flat_selected_experts = selected_experts.flatten()
-++++++-        sorted_expert_indices = flat_selected_experts.argsort()
-++++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
-++++++-        original_token_indices = sorted_expert_indices // self.top_k
-+++++++        """
-+++++++        优化版 MoE prefill (速度优先模式)：
-+++++++        - 批量张量化处理同一个 expert 的所有 token
-+++++++        - 跳过无 token 的专家
-+++++++        - 保持结果完全一致
-+++++++        """
-++++++         moe_output = ops.zeros_like(hidden_states)
-++++++-        current_token_offset = 0
-++++++-        for i in range(self.num_experts):
-++++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
-++++++-            if expert_token_count == 0:
-++++++-                continue
-++++++-            end_offset = current_token_offset + expert_token_count
-++++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
-++++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
-++++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
-++++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
-++++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
-++++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
-++++++-            current_token_offset += expert_token_count
-+++++++
-+++++++        flat_selected_experts = selected_experts.flatten()
-+++++++        flat_routing_weights = routing_weights.flatten()
-+++++++
-+++++++        idxs = flat_selected_experts.argsort()
-+++++++        sorted_expert_indices = flat_selected_experts[idxs]
-+++++++        sorted_token_indices = idxs // self.top_k
-+++++++
-+++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
-+++++++
-+++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-+++++++
-+++++++        for expert_id in active_experts.tolist():
-+++++++            start = int(tokens_per_expert[:expert_id].sum().item())
-+++++++            end = start + int(tokens_per_expert[expert_id].item())
-+++++++
-+++++++            token_idx = sorted_token_indices[start:end]
-+++++++            expert_tokens = hidden_states[token_idx]
-+++++++
-+++++++            expert_out = self.experts[expert_id](expert_tokens)
-+++++++
-+++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
-+++++++
-+++++++            moe_output = mindspore.mint.scatter_add(
-+++++++                moe_output,
-+++++++                0,
-+++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
-+++++++                scaled_out.to(hidden_states.dtype)
-+++++++            )
-+++++++
-++++++         return moe_output
-++++++ 
-+++++++
-++++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
-++++++     @no_grad()
-++++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-++++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++         
-++++++         moe_output = None
-++++++-        if Long_Prompt:
-++++++-            # --- 精度优先模式 (ACCURACY MODE) ---
-++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        # if Long_Prompt==0:
-+++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
-+++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        # else:
-+++++++        #     # --- 速度优先模式 (SPEED MODE) ---
-+++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++        #     if sequence_length == 1:
-+++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        #     else:
-+++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++        
-+++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
-+++++++        if sequence_length == 1:
-+++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++         else:
-++++++-            # --- 速度优先模式 (SPEED MODE) ---
-++++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
-++++++-            if sequence_length == 1:
-++++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++-            else:
-++++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-++++++-        
-+++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
-+++++++    
-++++++ 
-++++++         # 3. 共享专家计算与合并 (所有模式通用)
-++++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-++++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++         
-++++++         return final_hidden_states, router_logits
-++++++ 
-+++++++
-++++++ class Qwen2MoeDecoderLayer(nn.Module):
-++++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-++++++         super().__init__()
-++++++         self.hidden_size = config.hidden_size
-++++++         
-++++++-        # if Long_Prompt:
-++++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++-        # else:
-+++++++        # if Long_Prompt == 2:
-++++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-+++++++        # else:
-+++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++ 
-++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++ 
-++++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++++             )
-++++++ 
-++++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
-++++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++++        #     attention_mask,
-+++++++        #     sequence_length=sequence_length,
-+++++++        #     target_length=target_length,
-+++++++        #     dtype=dtype,
-+++++++        #     min_dtype=min_dtype,
-+++++++        #     cache_position=cache_position,
-+++++++        #     batch_size=input_tensor.shape[0],
-+++++++        # )
-+++++++        #@dwj
-+++++++        causal_mask = get_cached_causal_mask_with_cache_position(
-++++++             attention_mask,
-++++++             sequence_length=sequence_length,
-++++++             target_length=target_length,
-++++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
-++++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
-++++++         """
-++++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
-+++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
-+++++++        _causal_mask_cache.clear()
-++++++ 
-++++++         input_ids = kwargs.get("input_ids")
-++++++         if input_ids is None and args:
-++++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++ 
-++++++         if input_ids is not None:
-++++++             prompt_length = input_ids.shape[1]
-++++++-            
-++++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
-++++++-                Long_Prompt = True
-+++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
-+++++++                Long_Prompt = 2
-+++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
-+++++++                Long_Prompt = 0
-++++++             else:
-++++++-                Long_Prompt = False
-+++++++                Long_Prompt = 1
-+++++++
-++++++ 
-++++++         return super().generate(*args, **kwargs)
-++++++     
-++++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++             dtype = self.lm_head.weight.dtype
-++++++             min_dtype = float(ops.finfo(dtype).min)
-++++++ 
-++++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-+++++++            #     attention_mask,
-+++++++            #     sequence_length=sequence_length,
-+++++++            #     target_length=past_key_values.get_max_length(),
-+++++++            #     dtype=dtype,
-+++++++            #     min_dtype=min_dtype,
-+++++++            #     cache_position=cache_position,
-+++++++            #     batch_size=batch_size,
-+++++++            # )
-+++++++
-+++++++            #@dwj
-+++++++            attention_mask = get_cached_causal_mask_with_cache_position(
-++++++                 attention_mask,
-++++++                 sequence_length=sequence_length,
-++++++                 target_length=past_key_values.get_max_length(),
-++++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-++++++deleted file mode 100644
-++++++index 6dfb5b93..00000000
-++++++--- a/patches/0001-20251104commit.patch
-+++++++++ /dev/null
-++++++@@ -1,1272 +0,0 @@
-++++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-++++++-From: Pinoeer-kingxi <13022943007@163.com>
-++++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
-++++++-Subject: [PATCH] 20251104commit
-++++++-
-++++++----
-++++++- mindnlp/transformers/cache_utils.py           |  28 +-
-++++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-++++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-++++++- 3 files changed, 976 insertions(+), 87 deletions(-)
-++++++-
-++++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
-++++++-index cadd2e04..02f8d4be 100644
-++++++---- a/mindnlp/transformers/cache_utils.py
-++++++-+++ b/mindnlp/transformers/cache_utils.py
-++++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
-++++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
-++++++-             # k_out[:, :, cache_position] = key_states
-++++++-             # v_out[:, :, cache_position] = value_states
-++++++--            if ON_ORANGE_PI:
-++++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++--            else:
-++++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++--
-++++++-+            # if ON_ORANGE_PI:
-++++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
-++++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
-++++++-+            # else:
-++++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
-++++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
-++++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
-++++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
-++++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
-++++++-+            if cache_position.ndim > 1:
-++++++-+                cache_position = cache_position.flatten()
-++++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
-++++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
-++++++-+                cache_position = cache_position.int()
-++++++-+            
-++++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
-++++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
-++++++-+            k_out[:, :, cache_position] = key_states
-++++++-+            v_out[:, :, cache_position] = value_states
-++++++-+            
-++++++-         return k_out, v_out
-++++++- 
-++++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
-++++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++-index c695b944..d8303e45 100644
-++++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-++++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-++++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
-++++++- def rotate_half(x):
-++++++-     """Rotates half the hidden dims of the input."""
-++++++--    x1 = x[..., : x.shape[-1] // 2]
-++++++--    x2 = x[..., x.shape[-1] // 2 :]
-++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++-+    # x1 = x[..., : x.shape[-1] // 2]
-++++++-+    # x2 = x[..., x.shape[-1] // 2 :]
-++++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++-     return ops.cat((-x2, x1), dim=-1)
-++++++- 
-++++++- 
-++++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
-++++++-         if self.training:
-++++++-             raise NotImplementedError("Training is not supported yet.")
-++++++-         else:
-++++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++--        if self.config.n_shared_experts is not None:
-++++++--            y = y + self.shared_experts(identity)
-++++++--        return y
-++++++-+            # @lwx
-++++++-+            if orig_shape[1] == 1:
-++++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-++++++-+                y=y.view(*orig_shape)
-++++++-+                if self.config.n_shared_experts is not None:
-++++++-+                    y = y + self.shared_experts(identity)
-++++++-+                return y
-++++++-+            else:
-++++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
-++++++-+                if self.config.n_shared_experts is not None:
-++++++-+                    y = y + self.shared_experts(identity)
-++++++-+                return y
-++++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
-++++++-+        # if self.config.n_shared_experts is not None:
-++++++-+        #     y = y + self.shared_experts(identity)
-++++++-+        # return y
-++++++-+        
-++++++-+    @no_grad()
-++++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-++++++-+
-++++++-+        expert_cache = ops.zeros_like(x)
-++++++-+        for i in range(self.num_experts_per_tok):
-++++++-+            expert_id = flat_expert_indices[i].item()
-++++++-+            weight = flat_expert_weights[i].item()
-++++++-+            expert = self.experts[expert_id]
-++++++-+            expert_out = expert(x)
-++++++-+            expert_cache += expert_out * weight
-++++++-+        return expert_cache
-++++++- 
-++++++-     @no_grad()
-++++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++--        # expert_cache = torch.zeros_like(x)
-++++++--        # idxs = flat_expert_indices.argsort()
-++++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++--        # token_idxs = idxs // self.num_experts_per_tok
-++++++--        # for i, end_idx in enumerate(tokens_per_expert):
-++++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++--        #     if start_idx == end_idx:
-++++++--        #         continue
-++++++--        #     expert = self.experts[i]
-++++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++--        #     expert_tokens = x[exp_token_idx]
-++++++--        #     expert_out = expert(expert_tokens)
-++++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++--        # return expert_cache
-++++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
-++++++-         expert_cache = ops.zeros_like(x)
-++++++-         idxs = flat_expert_indices.argsort()
-++++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++-         token_idxs = idxs // self.num_experts_per_tok
-++++++-+
-++++++-         for i, end_idx in enumerate(tokens_per_expert):
-++++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++-             if start_idx == end_idx:
-++++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
-++++++-             expert_out = expert(expert_tokens)
-++++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++-+
-++++++-         return expert_cache
-++++++-+        
-++++++-+    # @no_grad()
-++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-+    #     # expert_cache = torch.zeros_like(x)
-++++++-+    #     # idxs = flat_expert_indices.argsort()
-++++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
-++++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
-++++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
-++++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
-++++++-+    #     #     if start_idx == end_idx:
-++++++-+    #     #         continue
-++++++-+    #     #     expert = self.experts[i]
-++++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-+    #     #     expert_tokens = x[exp_token_idx]
-++++++-+    #     #     expert_out = expert(expert_tokens)
-++++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
-++++++-+    #     # return expert_cache
-++++++-+    #     expert_cache = ops.zeros_like(x)
-++++++-+    #     idxs = flat_expert_indices.argsort()
-++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++++++-+
-++++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
-++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++-+    #         if start_idx == end_idx:
-++++++-+    #             continue
-++++++-+    #         expert = self.experts[i]
-++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-+    #         expert_tokens = x[exp_token_idx]
-++++++-+    #         expert_out = expert(expert_tokens)
-++++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
-++++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-++++++-+
-++++++-+    #     return expert_cache
-++++++-+    # @no_grad()
-++++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
-++++++-+    #     expert_cache = ops.zeros_like(x)
-++++++-+
-++++++-+    #     # 排序保证顺序一致
-++++++-+    #     idxs = flat_expert_indices.argsort()
-++++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
-++++++-+    #     token_idxs = idxs // self.num_experts_per_tok
-++++++-+
-++++++-+    #     # 找出有 token 的专家
-++++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
-++++++-+
-++++++-+    #     for i in active_experts.tolist():
-++++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
-++++++-+    #         end_idx = tokens_per_expert[i]
-++++++-+    #         if start_idx == end_idx:  # 没有 token
-++++++-+    #             continue
-++++++-+
-++++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
-++++++-+    #         expert_tokens = x[exp_token_idx]
-++++++-+    #         expert_out = self.experts[i](expert_tokens)
-++++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
-++++++-+
-++++++-+    #         expert_cache = mindspore.mint.scatter_add(
-++++++-+    #             expert_cache,
-++++++-+    #             0,
-++++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
-++++++-+    #             expert_out
-++++++-+    #         )
-++++++-+
-++++++-+    #     return expert_cache
-++++++-+
-++++++-+
-++++++- 
-++++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-++++++- #     """
-++++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++- 
-++++++-         # Initialize weights and apply final processing
-++++++-         self.post_init()
-++++++-+        self.warm_up = False
-++++++-+
-++++++-+    def warmup_moe_model_deep(self):
-++++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
-++++++-+        test_texts = [
-++++++-+            "warmup short",
-++++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
-++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
-++++++-+        ]
-++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++-+        if tokenizer is None:
-++++++-+            from mindnlp.transformers import AutoTokenizer
-++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++-+            self._warmup_tokenizer = tokenizer
-++++++-+
-++++++-+        for text in test_texts:
-++++++-+            inputs = tokenizer(text, return_tensors="ms")
-++++++-+            with mindspore._no_grad():
-++++++-+                _ = self(**inputs, use_cache=False)
-++++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-++++++- 
-++++++-     def get_input_embeddings(self):
-++++++-         return self.model.embed_tokens
-++++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++-         ```"""
-++++++-+        if not self.warm_up:
-++++++-+            self.warm_up = True
-++++++-+            self.warmup_moe_model_deep()
-++++++-+
-++++++-         output_attentions = (
-++++++-             output_attentions
-++++++-             if output_attentions is not None
-++++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++-index 3cbf820e..d4c6b651 100644
-++++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-++++++-@@ -18,7 +18,6 @@
-++++++- # See the License for the specific language governing permissions and
-++++++- # limitations under the License.
-++++++- """MindSpore Qwen2MoE model."""
-++++++--
-++++++- import math
-++++++- from typing import List, Optional, Tuple, Union
-++++++- 
-++++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
-++++++-     TokenClassifierOutput,
-++++++- )
-++++++- from ...modeling_utils import PreTrainedModel
-++++++-+from ...generation import GenerationMixin
-++++++- from ....utils import logging
-++++++- from .configuration_qwen2_moe import Qwen2MoeConfig
-++++++- 
-++++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
-++++++-         self.variance_epsilon = eps
-++++++- 
-++++++-     def forward(self, hidden_states):
-++++++-+        # @dwj
-++++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++-+        # @lwx
-++++++-+        # if not self.training :
-++++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-++++++-         input_dtype = hidden_states.dtype
-++++++-         hidden_states = hidden_states.to(mindspore.float32)
-++++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-++++++-@@ -234,6 +239,8 @@ def rotate_half(x):
-++++++-     """Rotates half the hidden dims of the input."""
-++++++-     x1 = x[..., : x.shape[-1] // 2]
-++++++-     x2 = x[..., x.shape[-1] // 2 :]
-++++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-++++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-++++++-     return ops.cat((-x2, x1), dim=-1)
-++++++- 
-++++++- 
-++++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
-++++++-         self.config = config
-++++++-         self.hidden_size = config.hidden_size
-++++++-         self.intermediate_size = intermediate_size
-++++++-+        
-++++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
-++++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
-++++++-         self.act_fn = ACT2FN[config.hidden_act]
-++++++- 
-++++++-     def forward(self, x):
-++++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++--
-++++++- 
-++++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
-++++++-+        # @lwx 
-++++++-+        # gate_up_output = self.gate_up_proj(x)
-++++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
-++++++-+        # return self.down_proj(swiglu_output)
-++++++-+
-++++++-+    # def forward(self, x):
-++++++-+    #     gate_proj_out = self.gate_proj(x)
-++++++-+    #     up_proj_out = self.up_proj(x)
-++++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
-++++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
-++++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
-++++++-+    #     return self.down_proj(swiglu_out)
-++++++-+        
-++++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-++++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-++++++-     """
-++++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
-++++++-         use_cache: bool = False,
-++++++-         cache_position: Optional[mindspore.Tensor] = None,
-++++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-+
-++++++-+        
-++++++-+
-++++++-         bsz, q_len, _ = hidden_states.shape
-++++++- 
-++++++-         query_states = self.q_proj(hidden_states)
-++++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
-++++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-                     "with a layer index."
-++++++-                 )
-++++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-+            if isinstance(past_key_value, StaticCache):
-++++++-+                kv_seq_len = key_states.shape[-2]
-++++++-+            else:
-++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++- 
-++++++-         if past_key_value is not None:
-++++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
-++++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
-++++++-+            
-++++++-+            if isinstance(past_key_value, StaticCache):
-++++++-+                kv_seq_len = key_states.shape[-2]
-++++++- 
-++++++-         # repeat k/v heads if n_kv_heads < n_heads
-++++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++--
-++++++-+        
-++++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-++++++- 
-++++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
-++++++--            raise ValueError(
-++++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
-++++++--                f" {attn_weights.shape}"
-++++++--            )
-++++++--
-++++++--        if attention_mask is not None:  # no matter the length, we just slice it
-++++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
-++++++-+        if attention_mask is not None:
-++++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
-++++++-             attn_weights = attn_weights + causal_mask
-++++++- 
-++++++-         # upcast attention to fp32
-++++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
-++++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-++++++- 
-++++++-         attn_output = self.o_proj(attn_output)
-++++++--
-++++++-+        # @lwx
-++++++-+        
-++++++-+        # max_seq_len = self.max_position_embeddings  # 2048
-++++++-+
-++++++-+        # if attention_mask is not None:
-++++++-+        #     # attention_mask: [B, 1, Sq, Sk]
-++++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
-++++++-+
-++++++-+        #     # pad 到 [max_seq_len, max_seq_len]
-++++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
-++++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
-++++++-+        #     global_attention_mask = padded_mask
-++++++-+        # else:
-++++++-+        #     global_attention_mask = None
-++++++-+
-++++++-+
-++++++-+        # sparse_mode=3
-++++++-+        # attn_output = mindspore.ops.flash_attention_score(
-++++++-+        #     query=query_states,
-++++++-+        #     key=key_states,
-++++++-+        #     value=value_states,
-++++++-+        #     real_shift=None,
-++++++-+        #     padding_mask=None,
-++++++-+
-++++++-+        #     head_num=self.num_heads,
-++++++-+        #     attn_mask=global_attention_mask,
-++++++-+        #     keep_prob=1.0 - self.attention_dropout,
-++++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++-+        #     input_layout="BNSD",
-++++++-+        #     pre_tokens=2147483647,
-++++++-+        #     next_tokens=2147483647,
-++++++-+        #     inner_precise=0,
-++++++-+        #     drop_mask=None,
-++++++-+        #     prefix=None,
-++++++-+        #     actual_seq_qlen=None,
-++++++-+        #     actual_seq_kvlen=None,
-++++++-+        #     sparse_mode=sparse_mode,
-++++++-+        # )
-++++++-         if not output_attentions:
-++++++-             attn_weights = None
-++++++- 
-++++++-         return attn_output, attn_weights, past_key_value
-++++++- 
-++++++- 
-++++++-+class Qwen2MoeFlashAttention(nn.Module):
-++++++-+    """
-++++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
-++++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
-++++++-+
-++++++-+    关键改动:
-++++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
-++++++-+       直接传入原始的 key 和 value 张量效率更高。
-++++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
-++++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
-++++++-+    """
-++++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
-++++++-+        super().__init__()
-++++++-+        self.config = config
-++++++-+        self.layer_idx = layer_idx
-++++++-+        self.hidden_size = config.hidden_size
-++++++-+        self.num_heads = config.num_attention_heads
-++++++-+        self.head_dim = self.hidden_size // self.num_heads
-++++++-+        self.num_key_value_heads = config.num_key_value_heads
-++++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
-++++++-+        self.max_position_embeddings = config.max_position_embeddings
-++++++-+        self.rope_theta = config.rope_theta
-++++++-+        self.attention_dropout = config.attention_dropout
-++++++-+
-++++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
-++++++-+            raise ValueError(
-++++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
-++++++-+            )
-++++++-+
-++++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
-++++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
-++++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
-++++++-+
-++++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
-++++++-+            self.head_dim,
-++++++-+            max_position_embeddings=self.max_position_embeddings,
-++++++-+            base=self.rope_theta,
-++++++-+        )
-++++++-+
-++++++-+    def forward(
-++++++-+        self,
-++++++-+        hidden_states: mindspore.Tensor,
-++++++-+        attention_mask: Optional[mindspore.Tensor] = None,
-++++++-+        position_ids: Optional[mindspore.Tensor] = None,
-++++++-+        past_key_value: Optional[Cache] = None,
-++++++-+        output_attentions: bool = False,
-++++++-+        use_cache: bool = False,
-++++++-+        cache_position: Optional[mindspore.Tensor] = None,
-++++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-+
-++++++-+        bsz, q_len, _ = hidden_states.shape
-++++++-+
-++++++-+        # 1. 线性投射 Q, K, V
-++++++-+        query_states = self.q_proj(hidden_states)
-++++++-+        key_states = self.k_proj(hidden_states)
-++++++-+        value_states = self.v_proj(hidden_states)
-++++++-+
-++++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
-++++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
-++++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+
-++++++-+        # 3. RoPE 旋转位置编码
-++++++-+        kv_seq_len = key_states.shape[-2]
-++++++-+        if past_key_value is not None:
-++++++-+            if self.layer_idx is None:
-++++++-+                raise ValueError(
-++++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-+                    "with a layer index."
-++++++-+                )
-++++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
-++++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
-++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
-++++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
-++++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
-++++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
-++++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
-++++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
-++++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
-++++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
-++++++-+                if cache_position.shape[0] == 1:
-++++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
-++++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
-++++++-+                    kv_seq_len = past_seen_tokens + 1
-++++++-+                else:
-++++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
-++++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
-++++++-+            else:
-++++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-+
-++++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++-+
-++++++-+        # 4. KV 缓存更新
-++++++-+        if past_key_value is not None:
-++++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-+            key_states, value_states = past_key_value.update(
-++++++-+                key_states, value_states, self.layer_idx, cache_kwargs
-++++++-+            )
-++++++-+            
-++++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
-++++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
-++++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
-++++++-+                if cache_position.shape[0] == 1:
-++++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
-++++++-+                    kv_seq_len = key_states.shape[-2]
-++++++-+
-++++++-+        # 5. [重要] 准备 Attention Mask
-++++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
-++++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
-++++++-+        fa_attention_mask = None
-++++++-+        if attention_mask is not None:
-++++++-+            # 截取与当前key长度匹配的部分
-++++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
-++++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
-++++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
-++++++-+            fa_attention_mask = (mask_slice != 0)
-++++++-+
-++++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
-++++++-+        input_dtype = query_states.dtype
-++++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
-++++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
-++++++-+            query_states = query_states.to(mindspore.float16)
-++++++-+            key_states = key_states.to(mindspore.float16)
-++++++-+            value_states = value_states.to(mindspore.float16)
-++++++-+
-++++++-+        # 6. [核心] 调用 flash_attention_score 算子
-++++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
-++++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
-++++++-+        attn_output = mindspore.ops.flash_attention_score(
-++++++-+            query=query_states,
-++++++-+            key=key_states,
-++++++-+            value=value_states,
-++++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
-++++++-+            attn_mask=fa_attention_mask,
-++++++-+            keep_prob=1.0 - self.attention_dropout,
-++++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++-+            input_layout="BNSD",
-++++++-+            sparse_mode=0 # 使用 defaultMask 模式
-++++++-+        )
-++++++-+
-++++++-+        # 恢复原始数据类型
-++++++-+        attn_output = attn_output.to(input_dtype)
-++++++-+
-++++++-+        # 7. 调整输出形状
-++++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
-++++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++-+        attn_output = self.o_proj(attn_output)
-++++++-+
-++++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
-++++++-+        attn_weights = None
-++++++-+        if output_attentions:
-++++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++-+
-++++++-+        return attn_output, attn_weights, past_key_value
-++++++-+
-++++++-+    # def forward(
-++++++-+    #     self,
-++++++-+    #     hidden_states: mindspore.Tensor,
-++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++-+    #     past_key_value: Optional[Cache] = None,
-++++++-+    #     output_attentions: bool = False,
-++++++-+    #     use_cache: bool = False,
-++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-+
-++++++-+    #     bsz, q_len, _ = hidden_states.shape
-++++++-+
-++++++-+    #     # 1. 线性投射 Q, K, V
-++++++-+    #     query_states = self.q_proj(hidden_states)
-++++++-+    #     key_states = self.k_proj(hidden_states)
-++++++-+    #     value_states = self.v_proj(hidden_states)
-++++++-+
-++++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
-++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+
-++++++-+    #     # 3. RoPE 旋转位置编码
-++++++-+    #     kv_seq_len = key_states.shape[-2]
-++++++-+    #     if past_key_value is not None:
-++++++-+    #         if self.layer_idx is None:
-++++++-+    #             raise ValueError(
-++++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
-++++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
-++++++-+    #                 "with a layer index."
-++++++-+    #             )
-++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-+
-++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++-+
-++++++-+    #     # 4. KV 缓存更新
-++++++-+    #     if past_key_value is not None:
-++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-+    #         key_states, value_states = past_key_value.update(
-++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++-+    #         )
-++++++-+
-++++++-+    #     # 5. 准备 Attention Mask
-++++++-+    #     fa_attention_mask = None
-++++++-+    #     if attention_mask is not None:
-++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-+    #         fa_attention_mask = (mask_slice != 0)
-++++++-+
-++++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
-++++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
-++++++-+    #     input_dtype = query_states.dtype
-++++++-+
-++++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
-++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++++++-+    #         query=query_states,
-++++++-+    #         key=key_states,
-++++++-+    #         value=value_states,
-++++++-+    #         head_num=self.num_heads,
-++++++-+    #         attn_mask=fa_attention_mask,
-++++++-+    #         keep_prob=1.0 - self.attention_dropout,
-++++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
-++++++-+    #         input_layout="BNSD",
-++++++-+    #         sparse_mode=0,
-++++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
-++++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
-++++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
-++++++-+    #         inner_precise=1
-++++++-+    #     )
-++++++-+
-++++++-+    #     # 恢复原始数据类型
-++++++-+    #     attn_output = attn_output.to(input_dtype)
-++++++-+
-++++++-+    #     # 7. 调整输出形状
-++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++-+    #     attn_output = self.o_proj(attn_output)
-++++++-+
-++++++-+    #     attn_weights = None
-++++++-+    #     if output_attentions:
-++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
-++++++-+
-++++++-+    #     return attn_output, attn_weights, past_key_value
-++++++-+
-++++++-+    # def forward(
-++++++-+    #     self,
-++++++-+    #     hidden_states: mindspore.Tensor,
-++++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
-++++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
-++++++-+    #     past_key_value: Optional[Cache] = None,
-++++++-+    #     output_attentions: bool = False,
-++++++-+    #     use_cache: bool = False,
-++++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
-++++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
-++++++-+
-++++++-+    #     bsz, q_len, _ = hidden_states.shape
-++++++-+
-++++++-+    #     query_states = self.q_proj(hidden_states)
-++++++-+    #     key_states = self.k_proj(hidden_states)
-++++++-+    #     value_states = self.v_proj(hidden_states)
-++++++-+
-++++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-++++++-+
-++++++-+    #     kv_seq_len = key_states.shape[-2]
-++++++-+    #     if past_key_value is not None:
-++++++-+    #         if self.layer_idx is None:
-++++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
-++++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-++++++-+
-++++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
-++++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-++++++-+
-++++++-+    #     if past_key_value is not None:
-++++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-++++++-+    #         key_states, value_states = past_key_value.update(
-++++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
-++++++-+    #         )
-++++++-+
-++++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
-++++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
-++++++-+
-++++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
-++++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
-++++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
-++++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
-++++++-+    #     # <--- 修改结束 ---
-++++++-+
-++++++-+    #     fa_attention_mask = None
-++++++-+    #     if attention_mask is not None:
-++++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-++++++-+    #         fa_attention_mask = (mask_slice != 0)
-++++++-+
-++++++-+    #     input_dtype = query_states.dtype
-++++++-+
-++++++-+    #     attn_output = mindspore.ops.flash_attention_score(
-++++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
-++++++-+    #         key=key_states,
-++++++-+    #         value=value_states,
-++++++-+    #         head_num=self.num_heads,
-++++++-+    #         attn_mask=fa_attention_mask,
-++++++-+    #         keep_prob=1.0 - self.attention_dropout,
-++++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
-++++++-+    #         input_layout="BNSD",
-++++++-+    #         sparse_mode=0,
-++++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
-++++++-+    #     )
-++++++-+
-++++++-+    #     attn_output = attn_output.to(input_dtype)
-++++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-++++++-+    #     attn_output = self.o_proj(attn_output)
-++++++-+
-++++++-+    #     attn_weights = None
-++++++-+    #     if output_attentions:
-++++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
-++++++-+
-++++++-+    #     return attn_output, attn_weights, past_key_value
-++++++-+
-++++++- QWEN2MOE_ATTENTION_CLASSES = {
-++++++-     "eager": Qwen2MoeAttention,
-++++++-+    "flash-attention": Qwen2MoeFlashAttention,
-++++++- }
-++++++- 
-++++++- 
-++++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-++++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-++++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-++++++- 
-++++++-+    #@dwj
-++++++-+    # 只遍历激活的专家，而非全部专家
-++++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
-++++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++--        hidden_states = hidden_states.view(-1, hidden_dim)
-++++++--        # router_logits: (batch * sequence_length, n_experts)
-++++++--        router_logits = self.gate(hidden_states)
-++++++--
-++++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++--        if self.norm_topk_prob:
-++++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++--        # we cast back to the input dtype
-++++++--        routing_weights = routing_weights.to(hidden_states.dtype)
-++++++--
-++++++--        final_hidden_states = ops.zeros(
-++++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
-++++++--        )
-++++++--
-++++++--        # One hot encode the selected experts to create an expert mask
-++++++--        # this will be used to easily index which expert is going to be sollicitated
-++++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
-++++++--
-++++++--        # Loop over all available experts in the model and perform the computation on each expert
-++++++--        for expert_idx in range(self.num_experts):
-++++++--            expert_layer = self.experts[expert_idx]
-++++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
-++++++--
-++++++--            # Index the correct hidden states and compute the expert hidden state for
-++++++--            # the current expert. We need to make sure to multiply the output hidden
-++++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
-++++++--            if 0 not in idx.shape:
-++++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
-++++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
-++++++--
-++++++--                # However `index_add_` only support torch tensors for indexing so we'll use
-++++++--                # the `top_x` tensor here.
-++++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
-++++++--
-++++++--        shared_expert_output = self.shared_expert(hidden_states)
-++++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
-++++++--
-++++++--        final_hidden_states = final_hidden_states + shared_expert_output
-++++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
-++++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
-++++++-+            num_tokens = hidden_states_reshaped.shape[0]
-++++++-+
-++++++-+            router_logits = self.gate(hidden_states_reshaped)
-++++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
-++++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-++++++-+
-++++++-+            if self.norm_topk_prob:
-++++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-++++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
-++++++-+            
-++++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
-++++++-+            flat_selected_experts = selected_experts.flatten()
-++++++-+            
-++++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
-++++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
-++++++-+            token_indices = broadcasted_token_indices.flatten()
-++++++-+            
-++++++-+            active_experts = ops.unique(flat_selected_experts)
-++++++-+            
-++++++-+            for expert_idx_tensor in active_experts:
-++++++-+                expert_idx = expert_idx_tensor.item()
-++++++-+                expert_layer = self.experts[expert_idx]
-++++++-+                
-++++++-+                mask = (flat_selected_experts == expert_idx_tensor)
-++++++-+                selected_token_indices = token_indices[mask]
-++++++-+                selected_routing_weights = routing_weights.flatten()[mask]
-++++++-+                
-++++++-+                current_states = hidden_states_reshaped[selected_token_indices]
-++++++-+                
-++++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
-++++++-+                
-++++++-+                final_hidden_states = final_hidden_states.index_add(
-++++++-+                    dim=0,
-++++++-+                    index=selected_token_indices,
-++++++-+                    source=expert_output.to(hidden_states.dtype)
-++++++-+                )
-++++++-+            
-++++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
-++++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-++++++- 
-++++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++--        return final_hidden_states, router_logits
-++++++-+            final_hidden_states = final_hidden_states + shared_expert_output
-++++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
-++++++-+            
-++++++-+            return final_hidden_states, router_logits
-++++++- 
-++++++- 
-++++++- class Qwen2MoeDecoderLayer(nn.Module):
-++++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-++++++- 
-++++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-++++++- 
-++++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-++++++-+
-++++++-         if (layer_idx not in config.mlp_only_layers) and (
-++++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
-++++++-         ):
-++++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-++++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-++++++-     _skip_keys_device_placement = "past_key_values"
-++++++-     _supports_cache_class = True
-++++++-+#lwx
-++++++-+    # _supports_static_cache = True
-++++++- 
-++++++-     def _init_weights(self, module):
-++++++-         std = self.config.initializer_range
-++++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-++++++-         return causal_mask
-++++++- 
-++++++- 
-++++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-++++++-     _tied_weights_keys = ["lm_head.weight"]
-++++++- 
-++++++-     def __init__(self, config):
-++++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++-         self.num_experts_per_tok = config.num_experts_per_tok
-++++++-         # Initialize weights and apply final processing
-++++++-         self.post_init()
-++++++-+        # @lwx
-++++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
-++++++-+        #     self.generation_config.cache_implementation = "static"
-++++++-+        self._warmed_up = False
-++++++-+
-++++++-+    def warmup_moe_model(self):
-++++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
-++++++-+        test_texts = [
-++++++-+            "warmup short",
-++++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-++++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-++++++-+        ]
-++++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
-++++++-+        if tokenizer is None:
-++++++-+            from mindnlp.transformers import AutoTokenizer
-++++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-++++++-+            self._warmup_tokenizer = tokenizer
-++++++-+
-++++++-+        for text in test_texts:
-++++++-+            inputs = tokenizer(text, return_tensors="ms")
-++++++-+            with mindspore._no_grad():
-++++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
-++++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-++++++- 
-++++++-     def get_input_embeddings(self):
-++++++-         return self.model.embed_tokens
-++++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-++++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
-++++++-         ```"""
-++++++-+        if not self._warmed_up:
-++++++-+            self._warmed_up = True
-++++++-+            self.warmup_moe_model()
-++++++- 
-++++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
-++++++-         output_router_logits = (
-++++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
-++++++-             }
-++++++-         )
-++++++-         return model_inputs
-++++++-+# @lwx
-++++++-+    # def _decode_one_tokens_logits(
-++++++-+    #     self,
-++++++-+    #     cur_token: mindspore.Tensor,
-++++++-+    #     input_pos: Optional[mindspore.Tensor],
-++++++-+    #     cache_position: mindspore.Tensor,
-++++++-+    #     past_key_values: StaticCache,
-++++++-+    # ) -> mindspore.Tensor:
-++++++-+    #     """
-++++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
-++++++-+        
-++++++-+    #     Args:
-++++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
-++++++-+    #         input_pos: 输入位置信息，可选
-++++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
-++++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
-++++++-+            
-++++++-+    #     Returns:
-++++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
-++++++-+    #     """
-++++++-+    #     # 调用JIT编译的版本
-++++++-+    #     return self.get_decode_one_tokens_logits(
-++++++-+    #         cur_token=cur_token,
-++++++-+    #         input_pos=input_pos,
-++++++-+    #         cache_position=cache_position,
-++++++-+    #         past_key_values=past_key_values,
-++++++-+    #     )
-++++++-+    
-++++++-+    # @mindspore.jit(jit_level='O1')
-++++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
-++++++-+    #     """
-++++++-+    #     JIT编译的函数，用于高效的单token解码
-++++++-+    #     使用JIT编译优化以支持静态shape和高效执行
-++++++-+        
-++++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
-++++++-+    #     """
-++++++-+    #     outputs = self.model.forward(
-++++++-+    #         input_ids=cur_token,
-++++++-+    #         position_ids=input_pos,
-++++++-+    #         cache_position=cache_position,
-++++++-+    #         past_key_values=past_key_values,
-++++++-+    #         use_cache=True,
-++++++-+    #         return_dict=False,
-++++++-+    #     )
-++++++-+        
-++++++-+    #     hidden_states = outputs[0]
-++++++-+    #     logits = self.lm_head.forward(hidden_states)
-++++++-+    #     logits = logits.float()
-++++++-+        
-++++++-+    #     return logits[:, -1, :]
-++++++-+
-++++++-+    # def _sample(
-++++++-+    #     self,
-++++++-+    #     input_ids: mindspore.Tensor,
-++++++-+    #     logits_processor,
-++++++-+    #     stopping_criteria,
-++++++-+    #     generation_config,
-++++++-+    #     synced_devices: bool,
-++++++-+    #     streamer=None,
-++++++-+    #     logits_warper=None,
-++++++-+    #     **model_kwargs,
-++++++-+    # ):
-++++++-+    #     """
-++++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
-++++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
-++++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
-++++++-+    #     """
-++++++-+    #     from ...generation.logits_process import LogitsProcessorList
-++++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
-++++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
-++++++-+    #     from mindnlp.core import nn, ops, no_grad
-++++++-+    #     import numpy as np
-++++++-+        
-++++++-+    #     # 检查是否使用 StaticCache
-++++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
-++++++-+    #     # 否则，直接调用父类方法
-++++++-+    #     past_key_values = model_kwargs.get("past_key_values")
-++++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
-++++++-+
-++++++-+    #     if not isinstance(past_key_values, StaticCache):
-++++++-+    #         # 不使用 StaticCache，直接调用父类方法
-++++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
-++++++-+    #         return super()._sample(
-++++++-+    #             input_ids=input_ids,
-++++++-+    #             logits_processor=logits_processor,
-++++++-+    #             stopping_criteria=stopping_criteria,
-++++++-+    #             generation_config=generation_config,
-++++++-+    #             synced_devices=synced_devices,
-++++++-+    #             streamer=streamer,
-++++++-+    #             logits_warper=logits_warper,
-++++++-+    #             **model_kwargs,
-++++++-+    #         )
-++++++-+        
-++++++-+    #     # 使用 StaticCache，进入自定义循环
-++++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
-++++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
-++++++-+    #     pad_token_id = generation_config._pad_token_tensor
-++++++-+    #     output_attentions = generation_config.output_attentions
-++++++-+    #     output_hidden_states = generation_config.output_hidden_states
-++++++-+    #     output_scores = generation_config.output_scores
-++++++-+    #     output_logits = generation_config.output_logits
-++++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
-++++++-+    #     max_length = generation_config.max_length
-++++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
-++++++-+    #     do_sample = generation_config.do_sample
-++++++-+        
-++++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
-++++++-+    #         raise ValueError(
-++++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
-++++++-+    #             f"{logits_warper})."
-++++++-+    #         )
-++++++-+        
-++++++-+    #     # init attention / hidden states / scores tuples
-++++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
-++++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
-++++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
-++++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
-++++++-+        
-++++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
-++++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
-++++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
-++++++-+    #         encoder_hidden_states = (
-++++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
-++++++-+    #         )
-++++++-+        
-++++++-+    #     # keep track of which sequences are already finished
-++++++-+    #     batch_size, cur_len = input_ids.shape
-++++++-+    #     this_peer_finished = False
-++++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
-++++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
-++++++-+        
-++++++-+    #     time_record = []
-++++++-+    #     from ....utils.testing_utils import parse_flag_from_env
-++++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
-++++++-+        
-++++++-+    #     while self._has_unfinished_sequences(
-++++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
-++++++-+    #     ):
-++++++-+    #         if _record_time:
-++++++-+    #             import time as time_module
-++++++-+    #             infer_start = time_module.time()
-++++++-+            
-++++++-+    #         # prepare model inputs
-++++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
-++++++-+            
-++++++-+    #         # prepare variable output controls
-++++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
-++++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
-++++++-+            
-++++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
-++++++-+    #         cur_cache_position = model_inputs.get("cache_position")
-++++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
-++++++-+    #         cur_input_ids = model_inputs.get("input_ids")
-++++++-+            
-++++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
-++++++-+    #             cur_cache_position is not None and 
-++++++-+    #             len(cur_cache_position.shape) > 0 and
-++++++-+    #             cur_cache_position.shape[0] == 1 and
-++++++-+    #             cur_input_ids is not None and
-++++++-+    #             cur_input_ids.shape[1] == 1):
-++++++-+    #             # 使用 JIT 优化的单 token 解码
-++++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
-++++++-+    #             if not hasattr(self, '_jit_used'):
-++++++-+    #                 self._jit_used = False
-++++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
-++++++-+                
-++++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
-++++++-+    #                 cur_token=cur_input_ids,
-++++++-+    #                 input_pos=model_inputs.get("position_ids"),
-++++++-+    #                 cache_position=cur_cache_position,
-++++++-+    #                 past_key_values=cur_past_key_values,
-++++++-+    #             )
-++++++-+                
-++++++-+    #             # 标记已使用JIT（用于后续判断）
-++++++-+    #             if not self._jit_used:
-++++++-+    #                 self._jit_used = True
-++++++-+                
-++++++-+    #             # 构造兼容的输出对象
-++++++-+    #             class JitOptimizedOutput:
-++++++-+    #                 def __init__(self, logits, config):
-++++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
-++++++-+    #                     self.config = config
-++++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
-++++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
-++++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
-++++++-+    #                     self.cross_attentions = None
-++++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
-++++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
-++++++-+                
-++++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
-++++++-+    #         else:
-++++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
-++++++-+    #             outputs = self(**model_inputs, return_dict=True)
-++++++-+            
-++++++-+    #         if synced_devices and this_peer_finished:
-++++++-+    #             continue
-++++++-+            
-++++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
-++++++-+    #         next_token_logits = outputs.logits[:, -1, :]
-++++++-+            
-++++++-+    #         # pre-process distribution
-++++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
-++++++-+    #         if do_sample:
-++++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
-++++++-+            
-++++++-+    #         # Store scores, attentions and hidden_states when required
-++++++-+    #         if return_dict_in_generate:
-++++++-+    #             if output_scores:
-++++++-+    #                 scores += (next_token_scores,)
-++++++-+    #             if output_logits:
-++++++-+    #                 raw_logits += (next_token_logits,)
-++++++-+    #             if output_attentions:
-++++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
-++++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
-++++++-+    #                 if self.config.is_encoder_decoder:
-++++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
-++++++-+                
-++++++-+    #             if output_hidden_states:
-++++++-+    #                 hidden = (
-++++++-+    #                     outputs.decoder_hidden_states
-++++++-+    #                     if self.config.is_encoder_decoder
-++++++-+    #                     else outputs.hidden_states
-++++++-+    #                 )
-++++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
-++++++-+            
-++++++-+    #         # token selection
-++++++-+    #         if do_sample:
-++++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
-++++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
-++++++-+    #         else:
-++++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
-++++++-+            
-++++++-+    #         # finished sentences should have their next token be a padding token
-++++++-+    #         if has_eos_stopping_criteria:
-++++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
-++++++-+            
-++++++-+    #         # update generated ids, model inputs, and length for next step
-++++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
-++++++-+    #         if streamer is not None:
-++++++-+    #             streamer.put(next_tokens)
-++++++-+            
-++++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
-++++++-+    #             outputs,
-++++++-+    #             model_kwargs,
-++++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
-++++++-+    #         )
-++++++-+            
-++++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
-++++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
-++++++-+    #         cur_len += 1
-++++++-+            
-++++++-+    #         if _record_time:
-++++++-+    #             import time as time_module
-++++++-+    #             infer_stop = time_module.time()
-++++++-+    #             time_record.append(infer_stop - infer_start)
-++++++-+            
-++++++-+    #         del outputs
-++++++-+        
-++++++-+    #     average_infer_time = None
-++++++-+    #     if time_record:
-++++++-+    #         if len(time_record) > 1:
-++++++-+    #             time_record.pop(0)
-++++++-+    #         average_infer_time = sum(time_record) / len(time_record)
-++++++-+    #         print(f'average inference time is: {average_infer_time}')
-++++++-+    #         print(f'inference time record: {time_record}')
-++++++-+        
-++++++-+    #     if streamer is not None:
-++++++-+    #         streamer.end()
-++++++-+        
-++++++-+    #     # 简单判断：打印是否使用了JIT路径
-++++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
-++++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
-++++++-+    #     else:
-++++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
-++++++-+        
-++++++-+    #     if return_dict_in_generate:
-++++++-+    #         if self.config.is_encoder_decoder:
-++++++-+    #             return GenerateEncoderDecoderOutput(
-++++++-+    #                 sequences=input_ids,
-++++++-+    #                 scores=scores,
-++++++-+    #                 logits=raw_logits,
-++++++-+    #                 encoder_attentions=encoder_attentions,
-++++++-+    #                 encoder_hidden_states=encoder_hidden_states,
-++++++-+    #                 decoder_attentions=decoder_attentions,
-++++++-+    #                 cross_attentions=cross_attentions,
-++++++-+    #                 decoder_hidden_states=decoder_hidden_states,
-++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++-+    #                 average_infer_time=average_infer_time
-++++++-+    #             )
-++++++-+    #         else:
-++++++-+    #             return GenerateDecoderOnlyOutput(
-++++++-+    #                 sequences=input_ids,
-++++++-+    #                 scores=scores,
-++++++-+    #                 logits=raw_logits,
-++++++-+    #                 attentions=decoder_attentions,
-++++++-+    #                 hidden_states=decoder_hidden_states,
-++++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
-++++++-+    #                 average_infer_time=average_infer_time
-++++++-+    #             )
-++++++-+    #     else:
-++++++-+    #         return input_ids
-++++++-+            
-++++++-+    # def _prepare_cache_for_generation(
-++++++-+    #     self,
-++++++-+    #     generation_config,
-++++++-+    #     model_kwargs,
-++++++-+    #     assistant_model,
-++++++-+    #     batch_size,
-++++++-+    #     max_cache_length,
-++++++-+    # ):
-++++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
-++++++-+    #         generation_config.cache_implementation = "static"
-++++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
-++++++-+        
-++++++-+    #     if generation_config.cache_implementation == "static":
-++++++-+    #         base_required_from_max_length = generation_config.max_length + 1
-++++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
-++++++-+    #         min_cache_size = 50
-++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
-++++++-+    #         else:
-++++++-+    #             max_cache_length = max(base_required, min_cache_size)
-++++++-+            
-++++++-+    #         original_max_cache_length = max_cache_length
-++++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
-++++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
-++++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
-++++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
-++++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
-++++++-+            
-++++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
-++++++-+    #             if max_cache_length > self.config.max_position_embeddings:
-++++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
-++++++-+        
-++++++-+    #     result = super()._prepare_cache_for_generation(
-++++++-+    #         generation_config=generation_config,
-++++++-+    #         model_kwargs=model_kwargs,
-++++++-+    #         assistant_model=assistant_model,
-++++++-+    #         batch_size=batch_size,
-++++++-+    #         max_cache_length=max_cache_length,
-++++++-+    #     )
-++++++-+        
-++++++-+    #     if generation_config.cache_implementation == "static":
-++++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
-++++++-+    #         created_cache = model_kwargs.get(cache_name)
-++++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
-++++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
-++++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
-++++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
-++++++-+        
-++++++-+    #     return result
-++++++-+
-++++++-+
-++++++-+
-++++++- 
-++++++- 
-++++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
-++++++--- 
-++++++-2.27.0
-++++++-
-++++++-- 
-++++++2.27.0
-++++++
-+++++-- 
-+++++2.27.0
-+++++
-++++-- 
-++++2.27.0
-++++
-+++-- 
-+++2.27.0
-+++
-++-- 
-++2.27.0
-++
-+-- 
-+2.27.0
-+
--- 
-2.39.5 (Apple Git-154)
-
diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"
deleted file mode 100644
index a1832dc4..00000000
--- "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches/0010-.patch"
+++ /dev/null
@@ -1,49453 +0,0 @@
-From 5d88d879c9a97cf89b7f7a00df9534ba2df9e955 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?=E9=82=93=E4=BC=9F=E9=94=AE?= <emmmvkdeng@gmail.com>
-Date: Wed, 3 Dec 2025 16:13:15 +0800
-Subject: [PATCH 10/10] =?UTF-8?q?=E6=9C=80=E5=90=8E=E6=95=B4=E7=90=86?=
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
----
- .../models/deepseek/modeling_deepseek.py      |  731 +-
- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1005 +-
- patches/0001-20251104commit.patch             | 1272 ---
- patches/0002-20251106commit.patch             | 3200 ------
- patches/0003-20261106secondcommit.patch       | 2769 ------
- patches/0004-20251106change.patch             | 7498 --------------
- patches/0005-20251107001commit.patch          | 7707 ---------------
- patches/0006-20251107002commit.patch          | 7931 ---------------
- patches/0007-20251107003commit.patch          | 8034 ---------------
- patches/0008-moe-change.patch                 | 8789 -----------------
- 10 files changed, 29 insertions(+), 48907 deletions(-)
- delete mode 100644 patches/0001-20251104commit.patch
- delete mode 100644 patches/0002-20251106commit.patch
- delete mode 100644 patches/0003-20261106secondcommit.patch
- delete mode 100644 patches/0004-20251106change.patch
- delete mode 100644 patches/0005-20251107001commit.patch
- delete mode 100644 patches/0006-20251107002commit.patch
- delete mode 100644 patches/0007-20251107003commit.patch
- delete mode 100644 patches/0008-moe-change.patch
-
-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-index 8d004af1..8178fb05 100644
---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
-@@ -234,9 +234,6 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
- # Copied from transformers.models.llama.modeling_llama.rotate_half
- def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
--    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--    # x1 = x[..., : x.shape[-1] // 2]
--    # x2 = x[..., x.shape[-1] // 2 :]
-     x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
-@@ -413,10 +410,7 @@ class DeepseekMoE(nn.Module):
-         if self.training:
-             raise NotImplementedError("Training is not supported yet.")
-         else:
--            # @lwx
-             if orig_shape[1] == 1:
--                # lwx moe_infer_decode_fast
--                # y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-                 y=self.moe_infer_decode_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
-                 y=y.view(*orig_shape)
-                 if self.config.n_shared_experts is not None:
-@@ -430,120 +424,7 @@ class DeepseekMoE(nn.Module):
-                 if self.config.n_shared_experts is not None:
-                     y = y + self.shared_experts(identity)
-                 return y
--            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--        # if self.config.n_shared_experts is not None:
--        #     y = y + self.shared_experts(identity)
--        # return y
--    
--    
--    
--    # lwx
--    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
--    #     """
--    #     如果 expert_ids 为 None，走单专家逻辑；
--    #     如果有，多专家批量处理，保证和原逻辑一致。
--    #     """
--    #     if expert_ids is None:
--    #         # 原单专家逻辑
--    #         if self.config.pretraining_tp > 1:
--    #             slice = self.intermediate_size // self.config.pretraining_tp
--    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
--    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
--    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
--    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
--    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
--    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
--    #                                for i in range(self.config.pretraining_tp)], dim=-1)
--    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
--    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
--    #                          for i in range(self.config.pretraining_tp)]
--    #             down_proj = sum(down_proj)
--    #         else:
--    #             down_proj = self.down_proj(
--    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
--    #             )
--    #         return down_proj
--
--    #     # ====== 批量多专家路径 ======
--    #     hidden_size = x.shape[-1]
--
--    #     # 按 token expert_ids 选权重
--    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
--    #     up_weights   = self.up_proj.weight[expert_ids]
--    #     down_weights = self.down_proj.weight[expert_ids]
--
--    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
--    #     if self.config.pretraining_tp > 1:
--    #         outputs = []
--    #         slice = self.intermediate_size // self.config.pretraining_tp
--    #         for i in range(self.config.pretraining_tp):
--    #             # 每个 slice 单独计算
--    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
--    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
--    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
--    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
--    #             outputs.append(down_proj_out)
--    #         return sum(outputs)
--    #     else:
--    #         gate_proj_out = F.linear(x, gate_weights)
--    #         up_proj_out   = F.linear(x, up_weights)
--    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
--    #         return F.linear(act_out, down_weights)
--    # @no_grad()
--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--    #     num_tokens = x.shape[0]
--    #     hidden_size = x.shape[-1]
--
--    #     idxs = flat_expert_indices.argsort()
--    #     sorted_expert_indices = flat_expert_indices[idxs]
--    #     sorted_token_indices = idxs // self.num_experts_per_tok
--    #     sorted_indices = sorted_token_indices
--
--    #     permuted_tokens = x[sorted_token_indices]
--    #     sorted_weights  = flat_expert_weights[idxs]
--
--    #     # 一次调用多专家 forward
--    #     expert_outputs = ops.zeros_like(permuted_tokens)
--    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
--
--    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
--    #     try:
--    #         final_output = ops.moe_token_unpermute(
--    #             expert_outputs,
--    #             sorted_indices,
--    #             probs=probs,
--    #             padded_mode=False
--    #         )
--    #     except Exception:
--    #         final_output = ops.zeros_like(x)
--    #         final_output = mindspore.mint.scatter_add(
--    #             final_output,
--    #             0,
--    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--    #             expert_outputs * sorted_weights
--    #         )
--
--    #     return final_output
--
--    # def mlp_batch_forward(self, tokens, expert_ids):
--    #     """
--    #     使用批量专家 forward（保留精度）
--    #     """
--    #     return self.experts[0].forward(tokens, expert_ids)
--
--    # @no_grad()
--    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--
--    #     expert_cache = ops.zeros_like(x)
--    #     for i in range(self.num_experts_per_tok):
--    #         expert_id = flat_expert_indices[i].item()
--    #         weight = flat_expert_weights[i].item()
--    #         expert = self.experts[expert_id]
--    #         expert_out = expert(x)
--    #         expert_cache += expert_out * weight
--    #     return expert_cache
--    
--    #@dwj
-+
-     @no_grad()
-     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
- 
-@@ -561,35 +442,27 @@ class DeepseekMoE(nn.Module):
-         - 跳过无 token 的专家
-         - 保持结果完全一致
-         """
--        # 初始化输出缓存
-         expert_cache = ops.zeros_like(x)
- 
--        # 排序（确保 scatter_add 位置对应原逻辑）
-         idxs = flat_expert_indices.argsort()
-         sorted_expert_indices = flat_expert_indices[idxs]
-         sorted_token_indices = idxs // self.num_experts_per_tok
- 
--        # 每个 expert 的 token 数
-         tokens_per_expert = sorted_expert_indices.bincount()
- 
--        # 找出有 token 的专家
-         active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
- 
-         for expert_id in active_experts.tolist():
--            # 取该 expert 对应的排序后 token 区间
-             start = (tokens_per_expert[:expert_id]).sum().item()
-             end = start + tokens_per_expert[expert_id].item()
- 
--            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--            expert_tokens = x[token_idx]                     # 取输入向量
-+            token_idx = sorted_token_indices[start:end]
-+            expert_tokens = x[token_idx]
- 
--            # 执行专家 MLP
-             expert_out = self.experts[expert_id](expert_tokens)
- 
--            # 按权重缩放
-             scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
- 
--            # 回写到缓存（等价 scatter_add）
-             expert_cache = mindspore.mint.scatter_add(
-                 expert_cache,
-                 0,
-@@ -599,60 +472,6 @@ class DeepseekMoE(nn.Module):
- 
-         return expert_cache
- 
--
--    # @no_grad()
--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--    #     """
--    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
--    #     """
--    #     num_tokens = x.shape[0]
--    #     hidden_size = x.shape[-1]
--
--    #     # 生成排序后的 token 索引
--    #     idxs = flat_expert_indices.argsort()
--    #     sorted_expert_indices = flat_expert_indices[idxs]
--    #     sorted_token_indices = idxs // self.num_experts_per_tok
--
--    #     # 记录到 sorted_indices（moe_token_unpermute 用）
--    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
--
--    #     # 收集专家输入
--    #     permuted_tokens = x[sorted_token_indices]
--
--    #     # 执行每个专家的 MLP（批量处理）
--    #     expert_outputs = []
--    #     token_ptr = 0
--    #     tokens_per_expert = sorted_expert_indices.bincount()
--    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
--    #         if count == 0:
--    #             continue
--    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
--    #         out = self.experts[expert_id](cur_tokens)
--    #         expert_outputs.append(out)
--    #         token_ptr += count
--
--    #     # 拼接所有专家输出
--    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
--
--    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
--    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
--
--    #     # 直接调用硬件加速的 unpermute
--    #     final_output = ops.moe_token_unpermute(
--    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
--    #         sorted_indices,           # shape: [num_tokens * top_k]
--    #         probs=probs,               # 按概率加权
--    #         padded_mode=False
--    #     )
--
--    #     return final_output
--    # def init_expert_cache(self):
--    #     """
--    #     在模型初始化时调用，缓存所有专家的权重到显存。
--    #     """
--    #     self.cache_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
--    #     self.cache_up_w   = ops.stack([expert.up_proj.weight for expert in self.experts], dim=0)
--    #     self.cache_down_w = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
-     @no_grad()
-     def moe_infer_decode_fast(self, x, flat_expert_indices, flat_expert_weights):
-         top_k = flat_expert_indices.shape[0]
-@@ -684,43 +503,22 @@ class DeepseekMoE(nn.Module):
-         weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-         return weighted_sum
- 
--    # lwx prefill 20251108
-     @no_grad()
-     def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
--        """
--        高性能 + 数值一致的 MoE prefill 推理：
--        1. 批量化处理所有专家计算，减少 Python 循环开销
--        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
--        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
--        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
--
--        参数：
--            x: [num_tokens, hidden_size]，
--            MoE 输入的 token 表示
--            flat_expert_indices: [num_tokens * top_k]，
--            每个 token 的路由专家 id
--            flat_expert_weights: [num_tokens * top_k, 1]，
--            路由专家权重
--        """
-         num_tokens = x.shape[0]
-         hidden_size = x.shape[-1]
- 
--        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
--        idxs = flat_expert_indices.argsort()  # 排序索引
--        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
--        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
-+        idxs = flat_expert_indices.argsort()
-+        sorted_expert_indices = flat_expert_indices[idxs]
-+        sorted_token_indices = idxs // self.num_experts_per_tok
- 
--        # sorted_indices 必须与 permuted_tokens 顺序匹配
--        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
-+        sorted_indices = sorted_token_indices
- 
--        # 2) 收集专家输入（按 idxs 排序）
--        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
--        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
-+        permuted_tokens = x[sorted_token_indices]
-+        sorted_weights  = flat_expert_weights[idxs]
- 
--        # 3) 计算每个专家的 token 数
-         tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
- 
--        # 4) 批量专家计算（减少 Python 循环）
-         gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
-         up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
-         down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
-@@ -731,8 +529,7 @@ class DeepseekMoE(nn.Module):
-             if count == 0:
-                 continue
-             tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
--            
--            # 与 DeepseekMLP forward 等价
-+
-             gate_proj_out = F.linear(tokens, gate_weights[expert_id])
-             up_proj_out   = F.linear(tokens, up_weights[expert_id])
-             act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
-@@ -741,7 +538,6 @@ class DeepseekMoE(nn.Module):
-             expert_outputs[ptr:ptr+count] = expert_out
-             ptr += count
- 
--        # 5) Ascend 加速的 unpermute（已排序的权重）
-         probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
- 
-         final_output = ops.zeros_like(x)
-@@ -753,444 +549,6 @@ class DeepseekMoE(nn.Module):
-         )      
-         return final_output
- 
--        # try:
--        #     final_output = ops.moe_token_unpermute(
--        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
--        #         sorted_indices,   # [num_tokens*top_k] 原 token id
--        #         probs=probs,      # 对应权重
--        #         padded_mode=False
--        #     )
--        # except Exception:
--        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
--        #     final_output = ops.zeros_like(x)
--        #     final_output = mindspore.mint.scatter_add(
--        #         final_output,
--        #         0,
--        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--        #         expert_outputs * sorted_weights
--        #     )
--
--        # return final_output
--
--
--    # @no_grad()
--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--    #     num_tokens = x.shape[0]
--    #     hidden_size = x.shape[-1]
--
--    #     idxs = flat_expert_indices.argsort()
--    #     sorted_expert_indices = flat_expert_indices[idxs]
--    #     sorted_token_indices = idxs // self.num_experts_per_tok
--        
--    #     # sorted_indices = sorted_token_indices
--    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
--    #     permuted_tokens = x[sorted_token_indices]
--    #     sorted_weights = flat_expert_weights[idxs]
--    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
--
--    #     expert_outputs = ops.zeros_like(permuted_tokens)
--    #     ptr = 0
--
--    #     # 只按专家维度循环
--    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
--    #         if count == 0:
--    #             continue
--    #         token_slice = slice(ptr, ptr + count)
--    #         expert_tokens = permuted_tokens[token_slice]
--
--    #         # 保持原 forward（含 pretraining_tp、bias 等）
--    #         expert_out = self.experts[expert_id](expert_tokens)
--
--    #         expert_outputs[token_slice] = expert_out
--    #         ptr += count
--
--    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
--    #     try:
--    #         final_output = mindspore.ops.moe_token_unpermute(
--    #             expert_outputs,
--    #             sorted_indices,
--    #             probs=probs,
--    #             padded_mode=False
--    #         )
--    #     except Exception:
--    #         final_output = ops.zeros_like(x)
--    #         final_output = mindspore.mint.scatter_add(
--    #             final_output,
--    #             0,
--    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--    #             expert_outputs * sorted_weights
--    #         )
--
--    #     return final_output
--
--
--    #lwx
--    # @no_grad()
--    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--    #     """
--    #     并行化 MoE prefill：
--    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
--    #     - 保证结果与原版完全一致
--    #     """
--    #     # 输出缓存
--    #     expert_cache = ops.zeros_like(x)
--
--    #     # token 总数（批量*seq_len*num_experts_per_tok）
--    #     num_tokens = flat_expert_indices.shape[0]
--    #     hidden_dim = x.shape[-1]
--
--    #     # 原 token ID（idxs // num_experts_per_tok）
--    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
--
--    #     # ====== Step 1: 组织输入 ======
--    #     # 按 experts 排序，保证 scatter_add 对应位置一致
--    #     sort_ids = flat_expert_indices.argsort()
--    #     sorted_experts = flat_expert_indices[sort_ids]
--    #     sorted_tokens = token_ids[sort_ids]
--    #     sorted_weights = flat_expert_weights[sort_ids]
--
--    #     # 收集每个专家的输入
--    #     # build: expert_inputs[expert_id] = [tokens...]
--    #     expert_inputs = []
--    #     expert_outs = []
--
--    #     for eid in range(self.config.n_routed_experts):
--    #         eid_mask = (sorted_experts == eid)
--    #         if eid_mask.any():
--    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
--    #             expert_inputs.append(tokens_for_eid)
--    #         else:
--    #             expert_inputs.append(None)
--
--    #     # ====== Step 2: 并行计算所有专家输出 ======
--    #     # 存储所有专家结果到一个列表
--    #     for eid in range(self.config.n_routed_experts):
--    #         if expert_inputs[eid] is not None:
--    #             out = self.experts[eid](expert_inputs[eid])
--    #             expert_outs.append(out)
--    #         else:
--    #             expert_outs.append(None)
--
--    #     # ====== Step 3: scatter_add 回写结果 ======
--    #     # 遍历专家，将结果加回对应的 token
--    #     pos = 0
--    #     for eid in range(self.config.n_routed_experts):
--    #         if expert_outs[eid] is not None:
--    #             size = expert_outs[eid].shape[0]
--    #             tokens_idx = sorted_tokens[pos:pos+size]
--    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
--    #             pos += size
--
--    #             # scatter_add 到 expert_cache
--    #             expert_cache = mindspore.mint.scatter_add(
--    #                 expert_cache,
--    #                 dim=0,
--    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
--    #                 src=scaled_out
--    #             )
--
--    #     return expert_cache
--
--
--
--# 放置在 DeepseekMoE 类中
--    # @no_grad()
--    # #lwx 20251107
--    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--    #     """
--    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--
--    #     Args:
--    #         x (Tensor): 输入张量, shape: (1, hidden_size)
--    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--    #     """
--    #     top_k, _ = flat_expert_weights.shape
--    #     hidden_size = x.shape[-1]
--
--    #     # 1. 将所有专家的权重堆叠起来
--    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--        
--    #     # 2. "收集" 所需的专家权重
--    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
--    #     selected_up_w = stacked_up_w[flat_expert_indices]
--    #     selected_down_w = stacked_down_w[flat_expert_indices]
--        
--    #     # 3. 准备输入
--    #     x_expanded = x.expand((top_k, 1, hidden_size))
--        
--    #     # 4. 并行计算 gate_proj 和 up_proj
--    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--
--    #     # 5. 计算中间状态
--    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--        
--    #     # 6. 并行计算 down_proj
--    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--    #     #    --- [FIX] ---
--    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--    #     #    --- [FIX END] ---
--        
--    #     # 7. 根据路由权重进行加权求和
--    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--
--    #     return weighted_sum
--
--
--
--        # @no_grad()
--        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--        #     # expert_cache = torch.zeros_like(x)
--        #     # idxs = flat_expert_indices.argsort()
--        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--        #     # token_idxs = idxs // self.num_experts_per_tok
--        #     # for i, end_idx in enumerate(tokens_per_expert):
--        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--        #     #     if start_idx == end_idx:
--        #     #         continue
--        #     #     expert = self.experts[i]
--        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--        #     #     expert_tokens = x[exp_token_idx]
--        #     #     expert_out = expert(expert_tokens)
--        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--        #     # return expert_cache
--        #     expert_cache = ops.zeros_like(x)
--        #     idxs = flat_expert_indices.argsort()
--        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--        #     token_idxs = idxs // self.num_experts_per_tok
--
--        #     for i, end_idx in enumerate(tokens_per_expert):
--        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--        #         if start_idx == end_idx:
--        #             continue
--        #         expert = self.experts[i]
--        #         exp_token_idx = token_idxs[start_idx:end_idx]
--        #         expert_tokens = x[exp_token_idx]
--        #         expert_out = expert(expert_tokens)
--        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--
--        #     return expert_cache
--        # @no_grad()
--        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--        #     expert_cache = ops.zeros_like(x)
--
--        #     # 排序保证顺序一致
--        #     idxs = flat_expert_indices.argsort()
--        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--        #     token_idxs = idxs // self.num_experts_per_tok
--
--        #     # 找出有 token 的专家
--        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--
--        #     for i in active_experts.tolist():
--        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--        #         end_idx = tokens_per_expert[i]
--        #         if start_idx == end_idx:  # 没有 token
--        #             continue
--
--        #         exp_token_idx = token_idxs[start_idx:end_idx]
--        #         expert_tokens = x[exp_token_idx]
--        #         expert_out = self.experts[i](expert_tokens)
--        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--
--        #         expert_cache = mindspore.mint.scatter_add(
--        #             expert_cache,
--        #             0,
--        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--        #             expert_out
--        #         )
--
--        #     return expert_cache
--
--
--
--# class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--#     """
--#     The trick function of adding auxiliary (aux) loss, 
--#     which includes the gradient of the aux loss during backpropagation.
--#     """
--#     @staticmethod
--#     def forward(ctx, x, loss):
--#         assert loss.numel() == 1
--#         ctx.dtype = loss.dtype
--#         ctx.required_aux_loss = loss.requires_grad
--#         return x
--
--#     @staticmethod
--#     def backward(ctx, grad_output):
--#         grad_loss = None
--#         if ctx.required_aux_loss:
--#             grad_loss = ops.ones(1, dtype=ctx.dtype)
--#         return grad_output, grad_loss
--
--
--# class DeepseekMoE(nn.Module):
--#     '''
--#     A mixed expert module containing shared experts.
--#     '''
--#     def __init__(self, config):
--#         super().__init__()
--#         self.config = config
--#         self.num_experts_per_tok = config.num_experts_per_tok
--#         if hasattr(config, "ep_size") and config.ep_size > 1:
--#             assert config.ep_size == mindspore.mint.distributed.get_world_size()
--#             self.ep_size = config.ep_size
--#             self.experts_per_rank = config.n_routed_experts // config.ep_size
--#             self.ep_rank = mindspore.mint.distributed.get_rank()
--#             self.experts = nn.ModuleList(
--#                 [
--#                     (
--#                         DeepseekMLP(
--#                             config, intermediate_size=config.moe_intermediate_size
--#                         )
--#                         if i >= self.ep_rank * self.experts_per_rank
--#                            and i < (self.ep_rank + 1) * self.experts_per_rank
--#                         else None
--#                     )
--#                     for i in range(config.n_routed_experts)
--#                 ]
--#             )
--
--#         else:
--#             self.ep_size = 1
--#             self.experts_per_rank = config.n_routed_experts
--#             self.ep_rank = 0
--#             self.experts = nn.ModuleList(
--#                 [
--#                     DeepseekMLP(
--#                         config, intermediate_size=config.moe_intermediate_size
--#                     )
--#                     for i in range(config.n_routed_experts)
--#                 ]
--#             )
--#         self.gate = MoEGate(config)
--#         if config.n_shared_experts is not None:
--#             intermediate_size = config.moe_intermediate_size * config.n_shared_experts
--#             self.shared_experts = DeepseekMLP(
--#                 config=config, intermediate_size=intermediate_size
--#             )
--
--#     def forward(self, hidden_states):
--#         identity = hidden_states
--#         orig_shape = hidden_states.shape
--#         topk_idx, topk_weight, aux_loss = self.gate(hidden_states)
--#         hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
--#         flat_topk_idx = topk_idx.view(-1)
--#         if self.training:
--#             hidden_states = hidden_states.repeat_interleave(
--#             self.num_experts_per_tok, dim=0
--#             )
--#             y = ops.empty(hidden_states.shape)
--#             for i, expert in enumerate(self.experts):
--#                 y[flat_topk_idx == i] = expert(hidden_states[flat_topk_idx == i])
--#             y = ops.sum(y.view(*topk_weight.shape, -1) * topk_weight.unsqueeze(-1), dim=1)
--#             y = y.to(hidden_states.dtype).view(*orig_shape)
--#             # y = AddAuxiliaryLoss.apply(y, aux_loss)
--#         else:
--#             # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--#             y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
--#         if self.config.n_shared_experts is not None:
--#             y = y + self.shared_experts(identity)
--#         return y
--    
--#     # # @mindnlp.core.no_grad()
--#     # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--#     #     expert_cache = ops.zeros_like(x)
--#     #     idxs = flat_expert_indices.argsort()
--#     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--#     #     token_idxs = idxs // self.num_experts_per_tok
--#     #     for i, end_idx in enumerate(tokens_per_expert):
--#     #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--#     #         if start_idx == end_idx:
--#     #             continue
--#     #         expert = self.experts[i]
--#     #         exp_token_idx = token_idxs[start_idx:end_idx]
--#     #         expert_tokens = x[exp_token_idx]
--#     #         expert_out = expert(expert_tokens)
--#     #         expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--#     #         # expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out, reduce='sum')
--#     #     return expert_out # expert_cache
--#     def moe_infer(self, x, topk_ids, topk_weight):
--#         cnts = topk_ids.new_zeros((topk_ids.shape[0], len(self.experts)))
--#         cnts.scatter_(1, topk_ids, 1)
--#         tokens_per_expert = cnts.sum(dim=0)
--#         idxs = topk_ids.view(-1).argsort()
--#         sorted_tokens = x[idxs // topk_ids.shape[1]]
--#         sorted_tokens_shape = sorted_tokens.shape
--#         if self.ep_size > 1:
--#             tokens_per_ep_rank = tokens_per_expert.view(self.ep_size, -1).sum(dim=1)
--#             tokens_per_expert_group = tokens_per_expert.new_empty(
--#                 tokens_per_expert.shape[0]
--#             )
--#             mindspore.mint.distributed.all_to_all_single(tokens_per_expert_group, tokens_per_expert)
--#             output_splits = (
--#                 tokens_per_expert_group.view(self.ep_size, -1)
--#                 .sum(1)
--#                 .cpu()
--#                 .numpy()
--#                 .tolist()
--#             )
--#             gathered_tokens = sorted_tokens.new_empty(
--#                 tokens_per_expert_group.sum(dim=0).cpu().item(), sorted_tokens.shape[1]
--#             )
--#             input_split_sizes = tokens_per_ep_rank.cpu().numpy().tolist()
--#             mindspore.mint.distributed.all_to_all(
--#                 list(gathered_tokens.split(output_splits)),
--#                 list(sorted_tokens.split(input_split_sizes)),
--#             )
--#             tokens_per_expert_post_gather = tokens_per_expert_group.view(
--#                 self.ep_size, self.experts_per_rank
--#             ).sum(dim=0)
--#             gatherd_idxs = np.zeros(shape=(gathered_tokens.shape[0],), dtype=np.int32)
--#             s = 0
--#             for i, k in enumerate(tokens_per_expert_group.cpu().numpy()):
--#                 gatherd_idxs[s : s + k] = i % self.experts_per_rank
--#                 s += k
--#             gatherd_idxs = gatherd_idxs.argsort()
--#             sorted_tokens = gathered_tokens[gatherd_idxs]
--#             tokens_per_expert = tokens_per_expert_post_gather
--#         tokens_per_expert = tokens_per_expert.cpu().numpy()
--#         outputs = []
--#         start_idx = 0
--#         for i, num_tokens in enumerate(tokens_per_expert):
--#             end_idx = start_idx + num_tokens
--#             if num_tokens == 0:
--#                 continue
--#             expert = self.experts[i + self.ep_rank * self.experts_per_rank]
--#             tokens_for_this_expert = sorted_tokens[start_idx:end_idx]
--#             expert_out = expert(tokens_for_this_expert)
--#             outputs.append(expert_out)
--#             start_idx = end_idx
--
--#         outs = ops.cat(outputs, dim=0) if len(outputs) else sorted_tokens.new_empty(0)
--#         if self.ep_size > 1:
--#             new_x = ops.empty_like(outs)
--#             new_x[gatherd_idxs] = outs
--#             gathered_tokens = new_x.new_empty(*sorted_tokens_shape)
--#             mindspore.mint.distributed.all_to_all(
--#                 list(gathered_tokens.split(input_split_sizes)),
--#                 list(new_x.split(output_splits)),
--#             )
--#             outs = gathered_tokens
--
--#         new_x = ops.empty_like(outs)
--#         new_x[idxs] = outs
--#         final_out = (
--#             new_x.view(*topk_ids.shape, -1)
--#             .type(topk_weight.dtype)
--#             .mul_(topk_weight.unsqueeze(dim=-1))
--#             .sum(dim=1)
--#             .type(new_x.dtype)
--#         )
--#         return final_out
--
--
- # Copied from transformers.models.llama.modeling_llama.repeat_kv
- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-     """
-@@ -1313,10 +671,6 @@ class DeepseekAttention(nn.Module):
-             key_states = self.k_proj(hidden_states)
-             value_states = self.v_proj(hidden_states)
- 
--        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--        # @lwx
-         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
-         query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
-         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
-@@ -1555,10 +909,6 @@ class DeepseekDecoderLayer(nn.Module):
-         super().__init__()
-         self.hidden_size = config.hidden_size
- 
--        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--        #     config=config, layer_idx=layer_idx
--        # )
--
-         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
-             config=config, layer_idx=layer_idx
-         )
-@@ -1774,14 +1124,6 @@ class DeepseekModel(DeepseekPreTrainedModel):
-                 else None
-             )
-         else:
--            # 4d mask is passed through the layers
--            # attention_mask = _prepare_4d_causal_attention_mask(
--            #     attention_mask,
--            #     (batch_size, seq_length),
--            #     inputs_embeds,
--            #     past_key_values_length,
--            # )
--            #@dwj
-             attention_mask = get_cached_causal_mask(
-                 attention_mask,
-                 (batch_size, seq_length),
-@@ -1869,38 +1211,14 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-         self.post_init()
-         # lwx
-         self.warm_up = False
--    #初始
--
--    # def warmup_moe_model_deep(self):
--    #     print("[Warmup] DeepSeek-MoE 模型预热开始...")
--    #     test_texts = [
--    #         "warmup short",
--    #         "This is a medium length warmup sentence for MoE experts. middle middle middle",
--    #         "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--    #     ]
--    #     tokenizer = getattr(self, "_warmup_tokenizer", None)
--    #     if tokenizer is None:
--    #         from mindnlp.transformers import AutoTokenizer
--    #         tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--    #         self._warmup_tokenizer = tokenizer
--
--    #     for text in test_texts:
--    #         inputs = tokenizer(text, return_tensors="ms")
--    #         with mindspore._no_grad():
--    #             _ = self(**inputs, use_cache=False)
--    #     print("[Warmup] DeepSeek-MoE 模型预热完成。")
--    
-+
-     def warmup_moe_model_deep(self):
-         print("[Warmup] DeepSeek-MoE 模型预热开始...")
- 
--        # 直接用 eval.py 默认的 prompts 内容
-         warmup_prompts = [
--            "Hello, how are you?",
--            "This American studied art at Yale and is the author of multiple popular mystery novels. First name is 'Hillary'. What's the last name?",
--            """Summarize the following text: US President Donald Trump has said he is 'not happy' with his Russian counterpart Vladimir Putin, following Moscow's largest aerial attack yet on Ukraine.
--    In a rare rebuke, Trump said: "What the hell happened to him? He's killing a lot of people." He later called Putin "absolutely crazy".
--    Ukrainian President Volodymyr Zelensky earlier said Washington's "silence" over recent Russian attacks was encouraging Putin, urging "strong pressure" - including tougher sanctions - on Moscow.
--    """
-+            "warmup short",
-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
-         ]
- 
-         tokenizer = getattr(self, "_warmup_tokenizer", None)
-@@ -1909,13 +1227,11 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-             tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
-             self._warmup_tokenizer = tokenizer
- 
--        # 跑一遍 warmup_prompts，触发路由逻辑
-         for text in warmup_prompts:
-             inputs = tokenizer(text, return_tensors="ms")
-             with mindspore._no_grad():
-                 _ = self(**inputs, use_cache=False)
- 
--        # 这里可以加按需缓存逻辑，避免显存 OOM
-         from mindnlp.transformers.models.deepseek.modeling_deepseek import DeepseekMoE
-         for module in self.modules():
-             if isinstance(module, DeepseekMoE):
-@@ -2051,15 +1367,13 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
- 
-         loss = None
-         if labels is not None:
--            # Shift so that tokens < n predict n
-             shift_logits = logits[..., :-1, :]
-             shift_labels = labels[..., 1:]
--            # Flatten the tokens
-+
-             loss_fct = nn.CrossEntropyLoss()
-             shift_logits = shift_logits.view(-1, self.config.vocab_size)
-             shift_labels = shift_labels.view(-1)
--            # Enable model parallelism
--            # shift_labels = shift_labels.to(shift_logits)
-+
-             loss = loss_fct(shift_logits, shift_labels)
- 
-         if not return_dict:
-@@ -2091,22 +1405,16 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-                 cache_length = past_length = past_key_values[0][0].shape[2]
-                 max_cache_length = None
- 
--            # Keep only the unprocessed tokens:
--            # 1 - If the length of the attention_mask exceeds the length of input_ids, then we are in a setting where
--            # some of the inputs are exclusivelly passed as part of the cache (e.g. when passing input_embeds as
--            # input)
-+
-             if (
-                 attention_mask is not None
-                 and attention_mask.shape[1] > input_ids.shape[1]
-             ):
-                 input_ids = input_ids[:, -(attention_mask.shape[1] - past_length) :]
--            # 2 - If the past_length is smaller than input_ids', then input_ids holds all input tokens. We can discard
--            # input_ids based on the past_length.
-+
-             elif past_length < input_ids.shape[1]:
-                 input_ids = input_ids[:, past_length:]
--            # 3 - Otherwise (past_length >= input_ids.shape[1]), let's assume input_ids only has unprocessed tokens.
- 
--            # If we are about to go beyond the maximum cache length, we need to crop the input attention mask.
-             if (
-                 max_cache_length is not None
-                 and attention_mask is not None
-@@ -2116,14 +1424,11 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
- 
-         position_ids = kwargs.get("position_ids", None)
-         if attention_mask is not None and position_ids is None:
--            # create position_ids on the fly for batch generation
-             position_ids = attention_mask.to(mindspore.int32).cumsum(-1) - 1
--            # position_ids.masked_fill_(attention_mask == 0, 1)
-             position_ids = ops.masked_fill(position_ids, attention_mask == 0, 1)
-             if past_key_values:
-                 position_ids = position_ids[:, -input_ids.shape[1] :]
- 
--        # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
-         if inputs_embeds is not None and past_key_values is None:
-             model_inputs = {"inputs_embeds": inputs_embeds}
-         else:
-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-index 6566958b..d689e36d 100644
---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
-@@ -63,18 +63,14 @@ def get_cached_causal_mask_with_cache_position(
-     """
-     带缓存的 causal mask 构造函数
-     """
--    # q_len 是当前 query 长度
-     q_len = sequence_length
--    # kv_len 是 target_length
-     kv_len = target_length
- 
--    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
-     key = (batch_size, q_len, kv_len, dtype, min_dtype)
- 
-     if key in _causal_mask_cache:
-         return _causal_mask_cache[key]
- 
--    # 调用原来的 mask 构造逻辑
-     causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
-         attention_mask,
-         sequence_length=sequence_length,
-@@ -84,7 +80,6 @@ def get_cached_causal_mask_with_cache_position(
-         cache_position=cache_position,
-         batch_size=batch_size,
-     )
--    # 缓存结果
-     _causal_mask_cache[key] = causal_mask
-     return causal_mask
- 
-@@ -224,11 +219,6 @@ class Qwen2MoeRMSNorm(nn.Module):
-         self.variance_epsilon = eps
- 
-     def forward(self, hidden_states):
--        # @dwj
--        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--        # @lwx
--        # if not self.training :
--        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
-         input_dtype = hidden_states.dtype
-         hidden_states = hidden_states.to(mindspore.float32)
-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
-@@ -279,9 +269,6 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
- # Copied from transformers.models.llama.modeling_llama.rotate_half
- def rotate_half(x):
-     """Rotates half the hidden dims of the input."""
--    # x1 = x[..., : x.shape[-1] // 2]
--    # x2 = x[..., x.shape[-1] // 2 :]
--    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
-     x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
-     return ops.cat((-x2, x1), dim=-1)
- 
-@@ -329,21 +316,8 @@ class Qwen2MoeMLP(nn.Module):
-         self.act_fn = ACT2FN[config.hidden_act]
- 
-     def forward(self, x):
--
-         return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--        # @lwx 
--        # gate_up_output = self.gate_up_proj(x)
--        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--        # return self.down_proj(swiglu_output)
--
--    # def forward(self, x):
--    #     gate_proj_out = self.gate_proj(x)
--    #     up_proj_out = self.up_proj(x)
--    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--    #     return self.down_proj(swiglu_out)
--        
-+
- # Copied from transformers.models.llama.modeling_llama.repeat_kv
- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-     """
-@@ -356,164 +330,6 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-     hidden_states = hidden_states[:, :, None, :, :].broadcast_to((batch, num_key_value_heads, n_rep, slen, head_dim))
-     return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
- 
--
--# Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--# class Qwen2MoeAttention(nn.Module):
--#     """
--#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--#     and "Generating Long Sequences with Sparse Transformers".
--#     """
--
--#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--#         super().__init__()
--#         self.config = config
--#         self.layer_idx = layer_idx
--#         if layer_idx is None:
--#             logger.warning_once(
--#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--#                 "when creating this class."
--#             )
--
--#         self.hidden_size = config.hidden_size
--#         self.num_heads = config.num_attention_heads
--#         self.head_dim = self.hidden_size // self.num_heads
--#         self.num_key_value_heads = config.num_key_value_heads
--#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--#         self.max_position_embeddings = config.max_position_embeddings
--#         self.rope_theta = config.rope_theta
--#         self.is_causal = True
--#         self.attention_dropout = config.attention_dropout
--
--#         if (self.head_dim * self.num_heads) != self.hidden_size:
--#             raise ValueError(
--#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--#                 f" and `num_heads`: {self.num_heads})."
--#             )
--#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--
--#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--#             self.head_dim,
--#             max_position_embeddings=self.max_position_embeddings,
--#             base=self.rope_theta,
--#         )
--
--#     def forward(
--#         self,
--#         hidden_states: mindspore.Tensor,
--#         attention_mask: Optional[mindspore.Tensor] = None,
--#         position_ids: Optional[mindspore.Tensor] = None,
--#         past_key_value: Optional[Cache] = None,
--#         output_attentions: bool = False,
--#         use_cache: bool = False,
--#         cache_position: Optional[mindspore.Tensor] = None,
--#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
--        
--
--#         bsz, q_len, _ = hidden_states.shape
--
--#         query_states = self.q_proj(hidden_states)
--#         key_states = self.k_proj(hidden_states)
--#         value_states = self.v_proj(hidden_states)
--
--#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--
--#         kv_seq_len = key_states.shape[-2]
--#         if past_key_value is not None:
--#             if self.layer_idx is None:
--#                 raise ValueError(
--#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--#                     "with a layer index."
--#                 )
--#             if isinstance(past_key_value, StaticCache):
--#                 kv_seq_len = key_states.shape[-2]
--#             else:
--#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--#         if past_key_value is not None:
--#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--            
--#             if isinstance(past_key_value, StaticCache):
--#                 kv_seq_len = key_states.shape[-2]
--
--#         # repeat k/v heads if n_kv_heads < n_heads
--#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--        
--#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--
--#         if attention_mask is not None:
--#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--#             attn_weights = attn_weights + causal_mask
--
--#         # upcast attention to fp32
--#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--#         attn_output = ops.matmul(attn_weights, value_states)
--
--#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--#             raise ValueError(
--#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--#                 f" {attn_output.shape}"
--#             )
--
--#         attn_output = ops.transpose(attn_output, 1, 2)
--#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--
--#         attn_output = self.o_proj(attn_output)
--#         # @lwx
--        
--#         # max_seq_len = self.max_position_embeddings  # 2048
--
--#         # if attention_mask is not None:
--#         #     # attention_mask: [B, 1, Sq, Sk]
--#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--
--#         #     # pad 到 [max_seq_len, max_seq_len]
--#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--#         #     global_attention_mask = padded_mask
--#         # else:
--#         #     global_attention_mask = None
--
--
--#         # sparse_mode=3
--#         # attn_output = mindspore.ops.flash_attention_score(
--#         #     query=query_states,
--#         #     key=key_states,
--#         #     value=value_states,
--#         #     real_shift=None,
--#         #     padding_mask=None,
--
--#         #     head_num=self.num_heads,
--#         #     attn_mask=global_attention_mask,
--#         #     keep_prob=1.0 - self.attention_dropout,
--#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--#         #     input_layout="BNSD",
--#         #     pre_tokens=2147483647,
--#         #     next_tokens=2147483647,
--#         #     inner_precise=0,
--#         #     drop_mask=None,
--#         #     prefix=None,
--#         #     actual_seq_qlen=None,
--#         #     actual_seq_kvlen=None,
--#         #     sparse_mode=sparse_mode,
--#         # )
--#         if not output_attentions:
--#             attn_weights = None
--
--#         return attn_output, attn_weights, past_key_value
--
- class Qwen2MoeAttention(nn.Module):
-     """
-     一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-@@ -594,10 +410,8 @@ class Qwen2MoeAttention(nn.Module):
-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
- 
--        # --- 2. 动态调度核心注意力计算 ---
-         global Long_Prompt
-         if Long_Prompt >= 1:
--            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
-             fa_attention_mask = None
-             if attention_mask is not None:
-                 mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-@@ -613,7 +427,7 @@ class Qwen2MoeAttention(nn.Module):
-                 scalar_value=1.0 / math.sqrt(self.head_dim),
-                 input_layout="BNSD",
-                 sparse_mode=0,
--                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
-+                inner_precise=0
-             )
-             
-             attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-@@ -623,7 +437,6 @@ class Qwen2MoeAttention(nn.Module):
-                 logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
- 
-         else:
--            # --- Eager Attention 路径 (用于短序列和解码) ---
-             key_states = repeat_kv(key_states, self.num_key_value_groups)
-             value_states = repeat_kv(value_states, self.num_key_value_groups)
-             
-@@ -651,252 +464,6 @@ class Qwen2MoeAttention(nn.Module):
-         
-         return attn_output, attn_weights, past_key_value
- 
--# class Qwen2MoeFlashAttention(nn.Module):
--#     """
--#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--
--#     关键改动:
--#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--#        直接传入原始的 key 和 value 张量效率更高。
--#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--#     """
--#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--#         super().__init__()
--#         self.config = config
--#         self.layer_idx = layer_idx
--#         self.hidden_size = config.hidden_size
--#         self.num_heads = config.num_attention_heads
--#         self.head_dim = self.hidden_size // self.num_heads
--#         self.num_key_value_heads = config.num_key_value_heads
--#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--#         self.max_position_embeddings = config.max_position_embeddings
--#         self.rope_theta = config.rope_theta
--#         self.attention_dropout = config.attention_dropout
--
--#         if (self.head_dim * self.num_heads) != self.hidden_size:
--#             raise ValueError(
--#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--#             )
--
--#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--
--#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--#             self.head_dim,
--#             max_position_embeddings=self.max_position_embeddings,
--#             base=self.rope_theta,
--#         )
--
--#     def forward(
--#         self,
--#         hidden_states: mindspore.Tensor,
--#         attention_mask: Optional[mindspore.Tensor] = None,
--#         position_ids: Optional[mindspore.Tensor] = None,
--#         past_key_value: Optional[Cache] = None,
--#         output_attentions: bool = False,
--#         use_cache: bool = False,
--#         cache_position: Optional[mindspore.Tensor] = None,
--#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
--#         bsz, q_len, _ = hidden_states.shape
--
--#         # 1. 线性投射 Q, K, V
--#         query_states = self.q_proj(hidden_states)
--#         key_states = self.k_proj(hidden_states)
--#         value_states = self.v_proj(hidden_states)
--
--#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--#         # query:   [B, S, H*D] -> [B, N1, S, D]
--#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--#         # 3. RoPE 旋转位置编码
--#         kv_seq_len = key_states.shape[-2]
--#         if past_key_value is not None:
--#             if self.layer_idx is None:
--#                 raise ValueError(
--#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--#                     "with a layer index."
--#                 )
--#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--#                 if cache_position.shape[0] == 1:
--#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--#                     kv_seq_len = past_seen_tokens + 1
--#                 else:
--#                     # prefill 阶段：cache_position 是范围，使用其长度
--#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--#             else:
--#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--
--#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--#         # 4. KV 缓存更新
--#         if past_key_value is not None:
--#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--#             key_states, value_states = past_key_value.update(
--#                 key_states, value_states, self.layer_idx, cache_kwargs
--#             )
--            
--#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--#                 if cache_position.shape[0] == 1:
--#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--#                     kv_seq_len = key_states.shape[-2]
--
--#         # 5. [重要] 准备 Attention Mask
--#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--#         fa_attention_mask = None
--#         if attention_mask is not None:
--#             # 截取与当前key长度匹配的部分
--#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--#             fa_attention_mask = (mask_slice != 0)
--
--#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--#         input_dtype = query_states.dtype
--#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--#             query_states = query_states.to(mindspore.float16)
--#             key_states = key_states.to(mindspore.float16)
--#             value_states = value_states.to(mindspore.float16)
--
--#         # 6. [核心] 调用 flash_attention_score 算子
--#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--#         attn_output = mindspore.ops.flash_attention_score(
--#             query=query_states,
--#             key=key_states,
--#             value=value_states,
--#             head_num=self.num_heads, # 传入Q的头数(N1)
--#             attn_mask=fa_attention_mask,
--#             keep_prob=1.0 - self.attention_dropout,
--#             scalar_value=1.0 / math.sqrt(self.head_dim),
--#             input_layout="BNSD",
--#             sparse_mode=0 # 使用 defaultMask 模式
--#         )
--
--#         # 恢复原始数据类型
--#         attn_output = attn_output.to(input_dtype)
--
--#         # 7. 调整输出形状
--#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--#         attn_output = self.o_proj(attn_output)
--
--#         # FlashAttention 算子不直接返回注意力权重矩阵
--#         attn_weights = None
--#         if output_attentions:
--#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--
--#         return attn_output, attn_weights, past_key_value
--
--#     # def forward(
--#     #     self,
--#     #     hidden_states: mindspore.Tensor,
--#     #     attention_mask: Optional[mindspore.Tensor] = None,
--#     #     position_ids: Optional[mindspore.Tensor] = None,
--#     #     past_key_value: Optional[Cache] = None,
--#     #     output_attentions: bool = False,
--#     #     use_cache: bool = False,
--#     #     cache_position: Optional[mindspore.Tensor] = None,
--#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--
--#     #     bsz, q_len, _ = hidden_states.shape
--
--#     #     # 1. 线性投射 Q, K, V
--#     #     query_states = self.q_proj(hidden_states)
--#     #     key_states = self.k_proj(hidden_states)
--#     #     value_states = self.v_proj(hidden_states)
--
--#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--
--#     #     # 3. RoPE 旋转位置编码
--#     #     kv_seq_len = key_states.shape[-2]
--#     #     if past_key_value is not None:
--#     #         if self.layer_idx is None:
--#     #             raise ValueError(
--#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--#     #                 "with a layer index."
--#     #             )
--#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--
--#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--
--#     #     # 4. KV 缓存更新
--#     #     if past_key_value is not None:
--#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--#     #         key_states, value_states = past_key_value.update(
--#     #             key_states, value_states, self.layer_idx, cache_kwargs
--#     #         )
--
--#     #     # 5. 准备 Attention Mask
--#     #     fa_attention_mask = None
--#     #     if attention_mask is not None:
--#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--#     #         fa_attention_mask = (mask_slice != 0)
--
--#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--#     #     input_dtype = query_states.dtype
--
--#     #     # 6. [核心] 调用 flash_attention_score 算子
--#     #     attn_output = mindspore.ops.flash_attention_score(
--#     #         query=query_states,
--#     #         key=key_states,
--#     #         value=value_states,
--#     #         head_num=self.num_heads,
--#     #         attn_mask=fa_attention_mask,
--#     #         keep_prob=1.0 - self.attention_dropout,
--#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--#     #         input_layout="BNSD",
--#     #         sparse_mode=0,
--#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--#     #         inner_precise=1
--#     #     )
--
--#     #     # 恢复原始数据类型
--#     #     attn_output = attn_output.to(input_dtype)
--
--#     #     # 7. 调整输出形状
--#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--#     #     attn_output = self.o_proj(attn_output)
--
--#     #     attn_weights = None
--#     #     if output_attentions:
--#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--
--#     #     return attn_output, attn_weights, past_key_value
--
- 
- class Qwen2MoeFlashAttention(nn.Module):
-     """
-@@ -948,17 +515,14 @@ class Qwen2MoeFlashAttention(nn.Module):
- 
-         bsz, q_len, _ = hidden_states.shape
- 
--        # 1. 线性投射 Q, K, V
-         query_states = self.q_proj(hidden_states)
-         key_states = self.k_proj(hidden_states)
-         value_states = self.v_proj(hidden_states)
- 
--        # 2. 调整形状以匹配 BNSD 布局
-         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
-         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
-         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--        
--        # 3. RoPE 和 KV 缓存
-+
-         kv_seq_len = key_states.shape[-2]
-         if past_key_value is not None:
-             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-@@ -970,13 +534,11 @@ class Qwen2MoeFlashAttention(nn.Module):
-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
- 
--        # 4. 准备 Attention Mask
-         fa_attention_mask = None
-         if attention_mask is not None:
-             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
-             fa_attention_mask = (mask_slice != 0)
- 
--        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
-         attn_output = mindspore.ops.flash_attention_score(
-             query=query_states,
-             key=key_states,
-@@ -987,14 +549,12 @@ class Qwen2MoeFlashAttention(nn.Module):
-             scalar_value=1.0 / math.sqrt(self.head_dim),
-             input_layout="BNSD",
-             sparse_mode=0,
--            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
-+            inner_precise=0
-         )
- 
--        # 6. 调整输出形状
-         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
-         attn_output = self.o_proj(attn_output)
- 
--        # 7. 返回结果
-         attn_weights = None
-         if output_attentions:
-              logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-@@ -1007,88 +567,7 @@ QWEN2MOE_ATTENTION_CLASSES = {
-     "flash-attention": Qwen2MoeFlashAttention,
- }
- 
--
--# class Qwen2MoeSparseMoeBlock(nn.Module):
--#     def __init__(self, config):
--#         super().__init__()
--#         self.num_experts = config.num_experts
--#         self.top_k = config.num_experts_per_tok
--#         self.norm_topk_prob = config.norm_topk_prob
--
--#         # gating
--#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--#         self.experts = nn.ModuleList(
--#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--#         )
--
--#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--
--#     #@dwj
--#     # 只遍历激活的专家，而非全部专家
--#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--#             num_tokens = hidden_states_reshaped.shape[0]
--
--#             router_logits = self.gate(hidden_states_reshaped)
--#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--
--#             if self.norm_topk_prob:
--#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--#             routing_weights = routing_weights.to(hidden_states.dtype)
--            
--#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--#             flat_selected_experts = selected_experts.flatten()
--            
--#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--#             token_indices = broadcasted_token_indices.flatten()
--            
--#             active_experts = ops.unique(flat_selected_experts)
--            
--#             for expert_idx_tensor in active_experts:
--#                 expert_idx = expert_idx_tensor.item()
--#                 expert_layer = self.experts[expert_idx]
--                
--#                 mask = (flat_selected_experts == expert_idx_tensor)
--#                 selected_token_indices = token_indices[mask]
--#                 selected_routing_weights = routing_weights.flatten()[mask]
--                
--#                 current_states = hidden_states_reshaped[selected_token_indices]
--                
--#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--                
--#                 final_hidden_states = final_hidden_states.index_add(
--#                     dim=0,
--#                     index=selected_token_indices,
--#                     source=expert_output.to(hidden_states.dtype)
--#                 )
--            
--#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--
--#             final_hidden_states = final_hidden_states + shared_expert_output
--#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--            
--#             return final_hidden_states, router_logits
--
--
- class Qwen2MoeSparseMoeBlock(nn.Module):
--    """
--    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--    控制的顶级推理策略：
--
--    - if Long_Prompt is True:  【精度优先模式】
--      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--      适用于需要严格可复现性的长序列任务。
--
--    - if Long_Prompt is False: 【速度优先模式】
--      采用业界最强的性能组合：
--      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--    """
-     def __init__(self, config: Qwen2MoeConfig):
-         super().__init__()
-         self.num_experts = config.num_experts
-@@ -1102,7 +581,6 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
- 
--    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
-     @no_grad()
-     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-         original_dtype = hidden_states.dtype
-@@ -1119,39 +597,8 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
-         return moe_output_fp32.squeeze(1).to(original_dtype)
- 
--
--    # @no_grad()
--    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--    #     num_tokens, _ = hidden_states.shape
--    #     flat_selected_experts = selected_experts.flatten()
--    #     sorted_expert_indices = flat_selected_experts.argsort()
--    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--    #     original_token_indices = sorted_expert_indices // self.top_k
--    #     moe_output = ops.zeros_like(hidden_states)
--    #     current_token_offset = 0
--    #     for i in range(self.num_experts):
--    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--    #         if expert_token_count == 0:
--    #             continue
--    #         end_offset = current_token_offset + expert_token_count
--    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--    #         current_token_offset += expert_token_count
--    #     return moe_output
--
--    # baseline
-     @no_grad()
-     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--        """
--        优化版 MoE prefill (速度优先模式)：
--        - 批量张量化处理同一个 expert 的所有 token
--        - 跳过无 token 的专家
--        - 保持结果完全一致
--        """
-         moe_output = ops.zeros_like(hidden_states)
- 
-         flat_selected_experts = selected_experts.flatten()
-@@ -1188,56 +635,39 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
- 
-     @no_grad()
-     def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--        """
--        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
--        逻辑：
--        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
--        2. 每个 expert 一次性处理其全部 token
--        3. 最后一次 scatter_add 回到原 token 顺序
--        """
--
-         num_tokens = hidden_states.shape[0]
-         hidden_size = hidden_states.shape[-1]
- 
--        # 展平为一维
--        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
--        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
-+        flat_selected_experts = selected_experts.flatten()
-+        flat_routing_weights = routing_weights.flatten()
- 
--        # 按 expert 排序
-         idxs = flat_selected_experts.argsort()
--        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
--        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
-+        sorted_expert_indices = flat_selected_experts[idxs]
-+        sorted_token_indices = idxs // self.top_k
- 
--        # 排好序的输入向量（连续内存）
-         permuted_tokens = hidden_states[sorted_token_indices]
- 
--        # 排好序的权重
-         sorted_weights = flat_routing_weights[idxs]
- 
--        # 每个 expert 对应的 token 数量
-         tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
- 
--        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
-         expert_outputs = ops.zeros_like(permuted_tokens)
- 
--        ptr = 0  # 指向当前切片的起点
-+        ptr = 0
-         for expert_id, count in enumerate(tokens_per_expert.tolist()):
-             if count == 0:
-                 continue
- 
-             token_slice = slice(ptr, ptr + count)
--            expert_tokens = permuted_tokens[token_slice]  # 连续切片
-+            expert_tokens = permuted_tokens[token_slice]
- 
--            # 执行专家 MLP
-             expert_out = self.experts[expert_id](expert_tokens)
- 
-             expert_outputs[token_slice] = expert_out
-             ptr += count
- 
--        # 按权重缩放
-         scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
- 
--        # 回写到原 token 顺序 (单次 scatter_add)
-         moe_output = mindspore.mint.scatter_add(
-             ops.zeros_like(hidden_states),
-             0,
-@@ -1247,10 +677,6 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
- 
-         return moe_output
- 
--
--    
--    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--
-     @no_grad()
-     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
-         moe_output = ops.zeros_like(hidden_states)
-@@ -1282,31 +708,12 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-         if self.norm_topk_prob:
-             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
-         
--        moe_output = None
--        # if Long_Prompt==0:
--        #     # --- 精度优先模式 (ACCURACY MODE) ---
--        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--        # else:
--        #     # --- 速度优先模式 (SPEED MODE) ---
--        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--        #     if sequence_length == 1:
--        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--        #     else:
--        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--        
-         routing_weights_casted = routing_weights.to(hidden_states.dtype)
-         if sequence_length == 1:
-             moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
-         else:
--            # if Long_Prompt == 1:
--            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--            # else:
--            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
-             moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
- 
--
--        # 3. 共享专家计算与合并 (所有模式通用)
-         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
-                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
-         
-@@ -1320,11 +727,6 @@ class Qwen2MoeDecoderLayer(nn.Module):
-     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
-         super().__init__()
-         self.hidden_size = config.hidden_size
--        
--        # if Long_Prompt == 2:
--        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--        # else:
--        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
- 
-@@ -1421,8 +823,6 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
-     _skip_keys_device_placement = "past_key_values"
-     _supports_cache_class = True
--#lwx
--    # _supports_static_cache = True
- 
-     def _init_weights(self, module):
-         std = self.config.initializer_range
-@@ -1576,7 +976,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
- 
-         hidden_states = self.norm(hidden_states)
- 
--        # add hidden states from the last decoder layer
-         if output_hidden_states:
-             all_hidden_states += (hidden_states,)
- 
-@@ -1598,7 +997,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-             router_logits=all_router_logits,
-         )
- 
--    # Copied from transformers.models.llama.modeling_llama.LlamaModel._update_causal_mask
-     def _update_causal_mask(
-         self,
-         attention_mask: mindspore.Tensor,
-@@ -1626,17 +1024,6 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
-                 else past_seen_tokens + sequence_length + 1
-             )
- 
--        # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--        #     attention_mask,
--        #     sequence_length=sequence_length,
--        #     target_length=target_length,
--        #     dtype=dtype,
--        #     min_dtype=min_dtype,
--        #     cache_position=cache_position,
--        #     batch_size=input_tensor.shape[0],
--        # )
--        #@dwj
-         causal_mask = get_cached_causal_mask_with_cache_position(
-             attention_mask,
-             sequence_length=sequence_length,
-@@ -1664,9 +1051,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-         self.num_experts_per_tok = config.num_experts_per_tok
-         # Initialize weights and apply final processing
-         self.post_init()
--        # @lwx
--        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--        #     self.generation_config.cache_implementation = "static"
-+
-         self._warmed_up = False
- 
-     def warmup_moe_model(self):
-@@ -1890,17 +1275,6 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-             dtype = self.lm_head.weight.dtype
-             min_dtype = float(ops.finfo(dtype).min)
- 
--            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--            #     attention_mask,
--            #     sequence_length=sequence_length,
--            #     target_length=past_key_values.get_max_length(),
--            #     dtype=dtype,
--            #     min_dtype=min_dtype,
--            #     cache_position=cache_position,
--            #     batch_size=batch_size,
--            # )
--
--            #@dwj
-             attention_mask = get_cached_causal_mask_with_cache_position(
-                 attention_mask,
-                 sequence_length=sequence_length,
-@@ -1922,363 +1296,6 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-         )
-         return model_inputs
- 
--# @lwx
--    # def _decode_one_tokens_logits(
--    #     self,
--    #     cur_token: mindspore.Tensor,
--    #     input_pos: Optional[mindspore.Tensor],
--    #     cache_position: mindspore.Tensor,
--    #     past_key_values: StaticCache,
--    # ) -> mindspore.Tensor:
--    #     """
--    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--        
--    #     Args:
--    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--    #         input_pos: 输入位置信息，可选
--    #         cache_position: 当前token在cache中的位置，shape为(1,)
--    #         past_key_values: StaticCache对象，存储之前的key-value状态
--            
--    #     Returns:
--    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--    #     """
--    #     # 调用JIT编译的版本
--    #     return self.get_decode_one_tokens_logits(
--    #         cur_token=cur_token,
--    #         input_pos=input_pos,
--    #         cache_position=cache_position,
--    #         past_key_values=past_key_values,
--    #     )
--    
--    # @mindspore.jit(jit_level='O1')
--    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--    #     """
--    #     JIT编译的函数，用于高效的单token解码
--    #     使用JIT编译优化以支持静态shape和高效执行
--        
--    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--    #     """
--    #     outputs = self.model.forward(
--    #         input_ids=cur_token,
--    #         position_ids=input_pos,
--    #         cache_position=cache_position,
--    #         past_key_values=past_key_values,
--    #         use_cache=True,
--    #         return_dict=False,
--    #     )
--        
--    #     hidden_states = outputs[0]
--    #     logits = self.lm_head.forward(hidden_states)
--    #     logits = logits.float()
--        
--    #     return logits[:, -1, :]
--
--    # def _sample(
--    #     self,
--    #     input_ids: mindspore.Tensor,
--    #     logits_processor,
--    #     stopping_criteria,
--    #     generation_config,
--    #     synced_devices: bool,
--    #     streamer=None,
--    #     logits_warper=None,
--    #     **model_kwargs,
--    # ):
--    #     """
--    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--    #     """
--    #     from ...generation.logits_process import LogitsProcessorList
--    #     from ...generation.stopping_criteria import StoppingCriteriaList
--    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--    #     from mindnlp.core import nn, ops, no_grad
--    #     import numpy as np
--        
--    #     # 检查是否使用 StaticCache
--    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--    #     # 否则，直接调用父类方法
--    #     past_key_values = model_kwargs.get("past_key_values")
--    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--
--    #     if not isinstance(past_key_values, StaticCache):
--    #         # 不使用 StaticCache，直接调用父类方法
--    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--    #         return super()._sample(
--    #             input_ids=input_ids,
--    #             logits_processor=logits_processor,
--    #             stopping_criteria=stopping_criteria,
--    #             generation_config=generation_config,
--    #             synced_devices=synced_devices,
--    #             streamer=streamer,
--    #             logits_warper=logits_warper,
--    #             **model_kwargs,
--    #         )
--        
--    #     # 使用 StaticCache，进入自定义循环
--    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--    #     pad_token_id = generation_config._pad_token_tensor
--    #     output_attentions = generation_config.output_attentions
--    #     output_hidden_states = generation_config.output_hidden_states
--    #     output_scores = generation_config.output_scores
--    #     output_logits = generation_config.output_logits
--    #     return_dict_in_generate = generation_config.return_dict_in_generate
--    #     max_length = generation_config.max_length
--    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--    #     do_sample = generation_config.do_sample
--        
--    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--    #         raise ValueError(
--    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--    #             f"{logits_warper})."
--    #         )
--        
--    #     # init attention / hidden states / scores tuples
--    #     scores = () if (return_dict_in_generate and output_scores) else None
--    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--        
--    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--    #         encoder_hidden_states = (
--    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--    #         )
--        
--    #     # keep track of which sequences are already finished
--    #     batch_size, cur_len = input_ids.shape
--    #     this_peer_finished = False
--    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--        
--    #     time_record = []
--    #     from ....utils.testing_utils import parse_flag_from_env
--    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--        
--    #     while self._has_unfinished_sequences(
--    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--    #     ):
--    #         if _record_time:
--    #             import time as time_module
--    #             infer_start = time_module.time()
--            
--    #         # prepare model inputs
--    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--            
--    #         # prepare variable output controls
--    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--            
--    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--    #         cur_cache_position = model_inputs.get("cache_position")
--    #         cur_past_key_values = model_inputs.get("past_key_values")
--    #         cur_input_ids = model_inputs.get("input_ids")
--            
--    #         if (isinstance(cur_past_key_values, StaticCache) and 
--    #             cur_cache_position is not None and 
--    #             len(cur_cache_position.shape) > 0 and
--    #             cur_cache_position.shape[0] == 1 and
--    #             cur_input_ids is not None and
--    #             cur_input_ids.shape[1] == 1):
--    #             # 使用 JIT 优化的单 token 解码
--    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--    #             if not hasattr(self, '_jit_used'):
--    #                 self._jit_used = False
--    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--                
--    #             next_token_logits = self.get_decode_one_tokens_logits(
--    #                 cur_token=cur_input_ids,
--    #                 input_pos=model_inputs.get("position_ids"),
--    #                 cache_position=cur_cache_position,
--    #                 past_key_values=cur_past_key_values,
--    #             )
--                
--    #             # 标记已使用JIT（用于后续判断）
--    #             if not self._jit_used:
--    #                 self._jit_used = True
--                
--    #             # 构造兼容的输出对象
--    #             class JitOptimizedOutput:
--    #                 def __init__(self, logits, config):
--    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--    #                     self.config = config
--    #                     # 对于 JIT 优化路径，这些属性通常不需要
--    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--    #                     self.attentions = None if not config.is_encoder_decoder else None
--    #                     self.cross_attentions = None
--    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--                
--    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--    #         else:
--    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--    #             outputs = self(**model_inputs, return_dict=True)
--            
--    #         if synced_devices and this_peer_finished:
--    #             continue
--            
--    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--    #         next_token_logits = outputs.logits[:, -1, :]
--            
--    #         # pre-process distribution
--    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--    #         if do_sample:
--    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--            
--    #         # Store scores, attentions and hidden_states when required
--    #         if return_dict_in_generate:
--    #             if output_scores:
--    #                 scores += (next_token_scores,)
--    #             if output_logits:
--    #                 raw_logits += (next_token_logits,)
--    #             if output_attentions:
--    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--    #                 if self.config.is_encoder_decoder:
--    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--                
--    #             if output_hidden_states:
--    #                 hidden = (
--    #                     outputs.decoder_hidden_states
--    #                     if self.config.is_encoder_decoder
--    #                     else outputs.hidden_states
--    #                 )
--    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--            
--    #         # token selection
--    #         if do_sample:
--    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--    #         else:
--    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--            
--    #         # finished sentences should have their next token be a padding token
--    #         if has_eos_stopping_criteria:
--    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--            
--    #         # update generated ids, model inputs, and length for next step
--    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--    #         if streamer is not None:
--    #             streamer.put(next_tokens)
--            
--    #         model_kwargs = self._update_model_kwargs_for_generation(
--    #             outputs,
--    #             model_kwargs,
--    #             is_encoder_decoder=self.config.is_encoder_decoder,
--    #         )
--            
--    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--    #         cur_len += 1
--            
--    #         if _record_time:
--    #             import time as time_module
--    #             infer_stop = time_module.time()
--    #             time_record.append(infer_stop - infer_start)
--            
--    #         del outputs
--        
--    #     average_infer_time = None
--    #     if time_record:
--    #         if len(time_record) > 1:
--    #             time_record.pop(0)
--    #         average_infer_time = sum(time_record) / len(time_record)
--    #         print(f'average inference time is: {average_infer_time}')
--    #         print(f'inference time record: {time_record}')
--        
--    #     if streamer is not None:
--    #         streamer.end()
--        
--    #     # 简单判断：打印是否使用了JIT路径
--    #     if hasattr(self, '_jit_used') and self._jit_used:
--    #         print("[JIT] ✓ JIT optimization was used during generation")
--    #     else:
--    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--        
--    #     if return_dict_in_generate:
--    #         if self.config.is_encoder_decoder:
--    #             return GenerateEncoderDecoderOutput(
--    #                 sequences=input_ids,
--    #                 scores=scores,
--    #                 logits=raw_logits,
--    #                 encoder_attentions=encoder_attentions,
--    #                 encoder_hidden_states=encoder_hidden_states,
--    #                 decoder_attentions=decoder_attentions,
--    #                 cross_attentions=cross_attentions,
--    #                 decoder_hidden_states=decoder_hidden_states,
--    #                 past_key_values=model_kwargs.get("past_key_values"),
--    #                 average_infer_time=average_infer_time
--    #             )
--    #         else:
--    #             return GenerateDecoderOnlyOutput(
--    #                 sequences=input_ids,
--    #                 scores=scores,
--    #                 logits=raw_logits,
--    #                 attentions=decoder_attentions,
--    #                 hidden_states=decoder_hidden_states,
--    #                 past_key_values=model_kwargs.get("past_key_values"),
--    #                 average_infer_time=average_infer_time
--    #             )
--    #     else:
--    #         return input_ids
--            
--    # def _prepare_cache_for_generation(
--    #     self,
--    #     generation_config,
--    #     model_kwargs,
--    #     assistant_model,
--    #     batch_size,
--    #     max_cache_length,
--    # ):
--    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--    #         generation_config.cache_implementation = "static"
--    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--        
--    #     if generation_config.cache_implementation == "static":
--    #         base_required_from_max_length = generation_config.max_length + 1
--    #         base_required = max(max_cache_length, base_required_from_max_length)
--    #         min_cache_size = 50
--    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--    #         else:
--    #             max_cache_length = max(base_required, min_cache_size)
--            
--    #         original_max_cache_length = max_cache_length
--    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--    #         print(f"  - final max_cache_length: {max_cache_length}")
--            
--    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--    #             if max_cache_length > self.config.max_position_embeddings:
--    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--        
--    #     result = super()._prepare_cache_for_generation(
--    #         generation_config=generation_config,
--    #         model_kwargs=model_kwargs,
--    #         assistant_model=assistant_model,
--    #         batch_size=batch_size,
--    #         max_cache_length=max_cache_length,
--    #     )
--        
--    #     if generation_config.cache_implementation == "static":
--    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--    #         created_cache = model_kwargs.get(cache_name)
--    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--    #             if created_cache.max_cache_len < generation_config.max_length:
--    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--        
--    #     return result
--
--
--
--
--
- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
- class Qwen2MoeForSequenceClassification(Qwen2MoePreTrainedModel):
-     def __init__(self, config):
-diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
-deleted file mode 100644
-index 8de61195..00000000
---- a/patches/0001-20251104commit.patch
-+++ /dev/null
-@@ -1,1272 +0,0 @@
--From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Tue, 4 Nov 2025 09:11:51 +0800
--Subject: [PATCH 1/8] 20251104commit
--
-----
-- mindnlp/transformers/cache_utils.py           |  28 +-
-- .../models/deepseek/modeling_deepseek.py      | 149 ++-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
-- 3 files changed, 976 insertions(+), 87 deletions(-)
--
--diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--index cadd2e04..02f8d4be 100644
----- a/mindnlp/transformers/cache_utils.py
--+++ b/mindnlp/transformers/cache_utils.py
--@@ -812,14 +812,26 @@ class StaticCache(Cache):
--             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--             # k_out[:, :, cache_position] = key_states
--             # v_out[:, :, cache_position] = value_states
---            if ON_ORANGE_PI:
---                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
---                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
---            else:
---                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
---                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
---                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
---
--+            # if ON_ORANGE_PI:
--+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+            # else:
--+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+            # 确保 cache_position 是 1D tensor 并且类型正确
--+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+            if cache_position.ndim > 1:
--+                cache_position = cache_position.flatten()
--+            # 确保类型是 int32 或 int64（MindSpore 要求）
--+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+                cache_position = cache_position.int()
--+            
--+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+            k_out[:, :, cache_position] = key_states
--+            v_out[:, :, cache_position] = value_states
--+            
--         return k_out, v_out
-- 
--     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index c695b944..d8303e45 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
-- # Copied from transformers.models.llama.modeling_llama.rotate_half
-- def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
---    x1 = x[..., : x.shape[-1] // 2]
---    x2 = x[..., x.shape[-1] // 2 :]
--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+    # x1 = x[..., : x.shape[-1] // 2]
--+    # x2 = x[..., x.shape[-1] // 2 :]
--+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--         if self.training:
--             raise NotImplementedError("Training is not supported yet.")
--         else:
---            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
---        if self.config.n_shared_experts is not None:
---            y = y + self.shared_experts(identity)
---        return y
--+            # @lwx
--+            if orig_shape[1] == 1:
--+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+                y=y.view(*orig_shape)
--+                if self.config.n_shared_experts is not None:
--+                    y = y + self.shared_experts(identity)
--+                return y
--+            else:
--+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+                if self.config.n_shared_experts is not None:
--+                    y = y + self.shared_experts(identity)
--+                return y
--+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+        # if self.config.n_shared_experts is not None:
--+        #     y = y + self.shared_experts(identity)
--+        # return y
--+        
--+    @no_grad()
--+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+
--+        expert_cache = ops.zeros_like(x)
--+        for i in range(self.num_experts_per_tok):
--+            expert_id = flat_expert_indices[i].item()
--+            weight = flat_expert_weights[i].item()
--+            expert = self.experts[expert_id]
--+            expert_out = expert(x)
--+            expert_cache += expert_out * weight
--+        return expert_cache
-- 
--     @no_grad()
---    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---        # expert_cache = torch.zeros_like(x)
---        # idxs = flat_expert_indices.argsort()
---        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
---        # token_idxs = idxs // self.num_experts_per_tok
---        # for i, end_idx in enumerate(tokens_per_expert):
---        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
---        #     if start_idx == end_idx:
---        #         continue
---        #     expert = self.experts[i]
---        #     exp_token_idx = token_idxs[start_idx:end_idx]
---        #     expert_tokens = x[exp_token_idx]
---        #     expert_out = expert(expert_tokens)
---        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
---        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
---        # return expert_cache
--+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--         expert_cache = ops.zeros_like(x)
--         idxs = flat_expert_indices.argsort()
--         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--         token_idxs = idxs // self.num_experts_per_tok
--+
--         for i, end_idx in enumerate(tokens_per_expert):
--             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--             if start_idx == end_idx:
--@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--             expert_out = expert(expert_tokens)
--             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+
--         return expert_cache
--+        
--+    # @no_grad()
--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+    #     # expert_cache = torch.zeros_like(x)
--+    #     # idxs = flat_expert_indices.argsort()
--+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+    #     # token_idxs = idxs // self.num_experts_per_tok
--+    #     # for i, end_idx in enumerate(tokens_per_expert):
--+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+    #     #     if start_idx == end_idx:
--+    #     #         continue
--+    #     #     expert = self.experts[i]
--+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+    #     #     expert_tokens = x[exp_token_idx]
--+    #     #     expert_out = expert(expert_tokens)
--+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+    #     # return expert_cache
--+    #     expert_cache = ops.zeros_like(x)
--+    #     idxs = flat_expert_indices.argsort()
--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+    #     token_idxs = idxs // self.num_experts_per_tok
--+
--+    #     for i, end_idx in enumerate(tokens_per_expert):
--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+    #         if start_idx == end_idx:
--+    #             continue
--+    #         expert = self.experts[i]
--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+    #         expert_tokens = x[exp_token_idx]
--+    #         expert_out = expert(expert_tokens)
--+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+
--+    #     return expert_cache
--+    # @no_grad()
--+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+    #     expert_cache = ops.zeros_like(x)
--+
--+    #     # 排序保证顺序一致
--+    #     idxs = flat_expert_indices.argsort()
--+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+    #     token_idxs = idxs // self.num_experts_per_tok
--+
--+    #     # 找出有 token 的专家
--+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+
--+    #     for i in active_experts.tolist():
--+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+    #         end_idx = tokens_per_expert[i]
--+    #         if start_idx == end_idx:  # 没有 token
--+    #             continue
--+
--+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+    #         expert_tokens = x[exp_token_idx]
--+    #         expert_out = self.experts[i](expert_tokens)
--+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+
--+    #         expert_cache = mindspore.mint.scatter_add(
--+    #             expert_cache,
--+    #             0,
--+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+    #             expert_out
--+    #         )
--+
--+    #     return expert_cache
--+
--+
-- 
-- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
-- #     """
--@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
-- 
--         # Initialize weights and apply final processing
--         self.post_init()
--+        self.warm_up = False
--+
--+    def warmup_moe_model_deep(self):
--+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+        test_texts = [
--+            "warmup short",
--+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+        ]
--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+        if tokenizer is None:
--+            from mindnlp.transformers import AutoTokenizer
--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+            self._warmup_tokenizer = tokenizer
--+
--+        for text in test_texts:
--+            inputs = tokenizer(text, return_tensors="ms")
--+            with mindspore._no_grad():
--+                _ = self(**inputs, use_cache=False)
--+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
-- 
--     def get_input_embeddings(self):
--         return self.model.embed_tokens
--@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--         ```"""
--+        if not self.warm_up:
--+            self.warm_up = True
--+            self.warmup_moe_model_deep()
--+
--         output_attentions = (
--             output_attentions
--             if output_attentions is not None
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index 3cbf820e..d4c6b651 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -18,7 +18,6 @@
-- # See the License for the specific language governing permissions and
-- # limitations under the License.
-- """MindSpore Qwen2MoE model."""
---
-- import math
-- from typing import List, Optional, Tuple, Union
-- 
--@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--     TokenClassifierOutput,
-- )
-- from ...modeling_utils import PreTrainedModel
--+from ...generation import GenerationMixin
-- from ....utils import logging
-- from .configuration_qwen2_moe import Qwen2MoeConfig
-- 
--@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--         self.variance_epsilon = eps
-- 
--     def forward(self, hidden_states):
--+        # @dwj
--+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+        # @lwx
--+        # if not self.training :
--+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--         input_dtype = hidden_states.dtype
--         hidden_states = hidden_states.to(mindspore.float32)
--         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--@@ -234,6 +239,8 @@ def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
--     x1 = x[..., : x.shape[-1] // 2]
--     x2 = x[..., x.shape[-1] // 2 :]
--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--         self.config = config
--         self.hidden_size = config.hidden_size
--         self.intermediate_size = intermediate_size
--+        
--         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--         self.act_fn = ACT2FN[config.hidden_act]
-- 
--     def forward(self, x):
---        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
---
-- 
--+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+        # @lwx 
--+        # gate_up_output = self.gate_up_proj(x)
--+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+        # return self.down_proj(swiglu_output)
--+
--+    # def forward(self, x):
--+    #     gate_proj_out = self.gate_proj(x)
--+    #     up_proj_out = self.up_proj(x)
--+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+    #     return self.down_proj(swiglu_out)
--+        
-- # Copied from transformers.models.llama.modeling_llama.repeat_kv
-- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--     """
--@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--         use_cache: bool = False,
--         cache_position: Optional[mindspore.Tensor] = None,
--     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+        
--+
--         bsz, q_len, _ = hidden_states.shape
-- 
--         query_states = self.q_proj(hidden_states)
--@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--                     "with a layer index."
--                 )
---            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+            if isinstance(past_key_value, StaticCache):
--+                kv_seq_len = key_states.shape[-2]
--+            else:
--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-- 
--         if past_key_value is not None:
--             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+            
--+            if isinstance(past_key_value, StaticCache):
--+                kv_seq_len = key_states.shape[-2]
-- 
--         # repeat k/v heads if n_kv_heads < n_heads
--         key_states = repeat_kv(key_states, self.num_key_value_groups)
--         value_states = repeat_kv(value_states, self.num_key_value_groups)
---
--+        
--         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-- 
---        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
---            raise ValueError(
---                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
---                f" {attn_weights.shape}"
---            )
---
---        if attention_mask is not None:  # no matter the length, we just slice it
---            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+        if attention_mask is not None:
--+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--             attn_weights = attn_weights + causal_mask
-- 
--         # upcast attention to fp32
--@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
-- 
--         attn_output = self.o_proj(attn_output)
---
--+        # @lwx
--+        
--+        # max_seq_len = self.max_position_embeddings  # 2048
--+
--+        # if attention_mask is not None:
--+        #     # attention_mask: [B, 1, Sq, Sk]
--+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+
--+        #     # pad 到 [max_seq_len, max_seq_len]
--+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+        #     global_attention_mask = padded_mask
--+        # else:
--+        #     global_attention_mask = None
--+
--+
--+        # sparse_mode=3
--+        # attn_output = mindspore.ops.flash_attention_score(
--+        #     query=query_states,
--+        #     key=key_states,
--+        #     value=value_states,
--+        #     real_shift=None,
--+        #     padding_mask=None,
--+
--+        #     head_num=self.num_heads,
--+        #     attn_mask=global_attention_mask,
--+        #     keep_prob=1.0 - self.attention_dropout,
--+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+        #     input_layout="BNSD",
--+        #     pre_tokens=2147483647,
--+        #     next_tokens=2147483647,
--+        #     inner_precise=0,
--+        #     drop_mask=None,
--+        #     prefix=None,
--+        #     actual_seq_qlen=None,
--+        #     actual_seq_kvlen=None,
--+        #     sparse_mode=sparse_mode,
--+        # )
--         if not output_attentions:
--             attn_weights = None
-- 
--         return attn_output, attn_weights, past_key_value
-- 
-- 
--+class Qwen2MoeFlashAttention(nn.Module):
--+    """
--+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+
--+    关键改动:
--+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+       直接传入原始的 key 和 value 张量效率更高。
--+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+    """
--+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+        super().__init__()
--+        self.config = config
--+        self.layer_idx = layer_idx
--+        self.hidden_size = config.hidden_size
--+        self.num_heads = config.num_attention_heads
--+        self.head_dim = self.hidden_size // self.num_heads
--+        self.num_key_value_heads = config.num_key_value_heads
--+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+        self.max_position_embeddings = config.max_position_embeddings
--+        self.rope_theta = config.rope_theta
--+        self.attention_dropout = config.attention_dropout
--+
--+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+            raise ValueError(
--+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+            )
--+
--+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+
--+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+            self.head_dim,
--+            max_position_embeddings=self.max_position_embeddings,
--+            base=self.rope_theta,
--+        )
--+
--+    def forward(
--+        self,
--+        hidden_states: mindspore.Tensor,
--+        attention_mask: Optional[mindspore.Tensor] = None,
--+        position_ids: Optional[mindspore.Tensor] = None,
--+        past_key_value: Optional[Cache] = None,
--+        output_attentions: bool = False,
--+        use_cache: bool = False,
--+        cache_position: Optional[mindspore.Tensor] = None,
--+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+        bsz, q_len, _ = hidden_states.shape
--+
--+        # 1. 线性投射 Q, K, V
--+        query_states = self.q_proj(hidden_states)
--+        key_states = self.k_proj(hidden_states)
--+        value_states = self.v_proj(hidden_states)
--+
--+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+        # query:   [B, S, H*D] -> [B, N1, S, D]
--+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+        # 3. RoPE 旋转位置编码
--+        kv_seq_len = key_states.shape[-2]
--+        if past_key_value is not None:
--+            if self.layer_idx is None:
--+                raise ValueError(
--+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+                    "with a layer index."
--+                )
--+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+                if cache_position.shape[0] == 1:
--+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+                    kv_seq_len = past_seen_tokens + 1
--+                else:
--+                    # prefill 阶段：cache_position 是范围，使用其长度
--+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+            else:
--+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+        # 4. KV 缓存更新
--+        if past_key_value is not None:
--+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+            key_states, value_states = past_key_value.update(
--+                key_states, value_states, self.layer_idx, cache_kwargs
--+            )
--+            
--+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+                if cache_position.shape[0] == 1:
--+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+                    kv_seq_len = key_states.shape[-2]
--+
--+        # 5. [重要] 准备 Attention Mask
--+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+        fa_attention_mask = None
--+        if attention_mask is not None:
--+            # 截取与当前key长度匹配的部分
--+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+            fa_attention_mask = (mask_slice != 0)
--+
--+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+        input_dtype = query_states.dtype
--+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+            query_states = query_states.to(mindspore.float16)
--+            key_states = key_states.to(mindspore.float16)
--+            value_states = value_states.to(mindspore.float16)
--+
--+        # 6. [核心] 调用 flash_attention_score 算子
--+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+        attn_output = mindspore.ops.flash_attention_score(
--+            query=query_states,
--+            key=key_states,
--+            value=value_states,
--+            head_num=self.num_heads, # 传入Q的头数(N1)
--+            attn_mask=fa_attention_mask,
--+            keep_prob=1.0 - self.attention_dropout,
--+            scalar_value=1.0 / math.sqrt(self.head_dim),
--+            input_layout="BNSD",
--+            sparse_mode=0 # 使用 defaultMask 模式
--+        )
--+
--+        # 恢复原始数据类型
--+        attn_output = attn_output.to(input_dtype)
--+
--+        # 7. 调整输出形状
--+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+        attn_output = self.o_proj(attn_output)
--+
--+        # FlashAttention 算子不直接返回注意力权重矩阵
--+        attn_weights = None
--+        if output_attentions:
--+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+        return attn_output, attn_weights, past_key_value
--+
--+    # def forward(
--+    #     self,
--+    #     hidden_states: mindspore.Tensor,
--+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+    #     position_ids: Optional[mindspore.Tensor] = None,
--+    #     past_key_value: Optional[Cache] = None,
--+    #     output_attentions: bool = False,
--+    #     use_cache: bool = False,
--+    #     cache_position: Optional[mindspore.Tensor] = None,
--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+    #     bsz, q_len, _ = hidden_states.shape
--+
--+    #     # 1. 线性投射 Q, K, V
--+    #     query_states = self.q_proj(hidden_states)
--+    #     key_states = self.k_proj(hidden_states)
--+    #     value_states = self.v_proj(hidden_states)
--+
--+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+    #     # 3. RoPE 旋转位置编码
--+    #     kv_seq_len = key_states.shape[-2]
--+    #     if past_key_value is not None:
--+    #         if self.layer_idx is None:
--+    #             raise ValueError(
--+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+    #                 "with a layer index."
--+    #             )
--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+    #     # 4. KV 缓存更新
--+    #     if past_key_value is not None:
--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+    #         key_states, value_states = past_key_value.update(
--+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+    #         )
--+
--+    #     # 5. 准备 Attention Mask
--+    #     fa_attention_mask = None
--+    #     if attention_mask is not None:
--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+    #         fa_attention_mask = (mask_slice != 0)
--+
--+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+    #     input_dtype = query_states.dtype
--+
--+    #     # 6. [核心] 调用 flash_attention_score 算子
--+    #     attn_output = mindspore.ops.flash_attention_score(
--+    #         query=query_states,
--+    #         key=key_states,
--+    #         value=value_states,
--+    #         head_num=self.num_heads,
--+    #         attn_mask=fa_attention_mask,
--+    #         keep_prob=1.0 - self.attention_dropout,
--+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+    #         input_layout="BNSD",
--+    #         sparse_mode=0,
--+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+    #         inner_precise=1
--+    #     )
--+
--+    #     # 恢复原始数据类型
--+    #     attn_output = attn_output.to(input_dtype)
--+
--+    #     # 7. 调整输出形状
--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+    #     attn_output = self.o_proj(attn_output)
--+
--+    #     attn_weights = None
--+    #     if output_attentions:
--+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+    #     return attn_output, attn_weights, past_key_value
--+
--+    # def forward(
--+    #     self,
--+    #     hidden_states: mindspore.Tensor,
--+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+    #     position_ids: Optional[mindspore.Tensor] = None,
--+    #     past_key_value: Optional[Cache] = None,
--+    #     output_attentions: bool = False,
--+    #     use_cache: bool = False,
--+    #     cache_position: Optional[mindspore.Tensor] = None,
--+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+    #     bsz, q_len, _ = hidden_states.shape
--+
--+    #     query_states = self.q_proj(hidden_states)
--+    #     key_states = self.k_proj(hidden_states)
--+    #     value_states = self.v_proj(hidden_states)
--+
--+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+    #     kv_seq_len = key_states.shape[-2]
--+    #     if past_key_value is not None:
--+    #         if self.layer_idx is None:
--+    #             raise ValueError("`layer_idx` must be specified for caching")
--+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+    #     if past_key_value is not None:
--+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+    #         key_states, value_states = past_key_value.update(
--+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+    #         )
--+
--+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+
--+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+    #     query_states = query_states / math.sqrt(self.head_dim)
--+    #     # <--- 修改结束 ---
--+
--+    #     fa_attention_mask = None
--+    #     if attention_mask is not None:
--+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+    #         fa_attention_mask = (mask_slice != 0)
--+
--+    #     input_dtype = query_states.dtype
--+
--+    #     attn_output = mindspore.ops.flash_attention_score(
--+    #         query=query_states,    # 传入已经预先缩放过的 query
--+    #         key=key_states,
--+    #         value=value_states,
--+    #         head_num=self.num_heads,
--+    #         attn_mask=fa_attention_mask,
--+    #         keep_prob=1.0 - self.attention_dropout,
--+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+    #         input_layout="BNSD",
--+    #         sparse_mode=0,
--+    #         inner_precise=1        # 仍然保持内部高精度计算
--+    #     )
--+
--+    #     attn_output = attn_output.to(input_dtype)
--+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+    #     attn_output = self.o_proj(attn_output)
--+
--+    #     attn_weights = None
--+    #     if output_attentions:
--+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+
--+    #     return attn_output, attn_weights, past_key_value
--+
-- QWEN2MOE_ATTENTION_CLASSES = {
--     "eager": Qwen2MoeAttention,
--+    "flash-attention": Qwen2MoeFlashAttention,
-- }
-- 
-- 
--@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-- 
--+    #@dwj
--+    # 只遍历激活的专家，而非全部专家
--     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---        batch_size, sequence_length, hidden_dim = hidden_states.shape
---        hidden_states = hidden_states.view(-1, hidden_dim)
---        # router_logits: (batch * sequence_length, n_experts)
---        router_logits = self.gate(hidden_states)
---
---        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---        if self.norm_topk_prob:
---            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        # we cast back to the input dtype
---        routing_weights = routing_weights.to(hidden_states.dtype)
---
---        final_hidden_states = ops.zeros(
---            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
---        )
---
---        # One hot encode the selected experts to create an expert mask
---        # this will be used to easily index which expert is going to be sollicitated
---        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
---
---        # Loop over all available experts in the model and perform the computation on each expert
---        for expert_idx in range(self.num_experts):
---            expert_layer = self.experts[expert_idx]
---            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
---
---            # Index the correct hidden states and compute the expert hidden state for
---            # the current expert. We need to make sure to multiply the output hidden
---            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
---            if 0 not in idx.shape:
---                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
---                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
---
---                # However `index_add_` only support torch tensors for indexing so we'll use
---                # the `top_x` tensor here.
---                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
---
---        shared_expert_output = self.shared_expert(hidden_states)
---        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
---
---        final_hidden_states = final_hidden_states + shared_expert_output
--+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+            num_tokens = hidden_states_reshaped.shape[0]
--+
--+            router_logits = self.gate(hidden_states_reshaped)
--+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+
--+            if self.norm_topk_prob:
--+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+            routing_weights = routing_weights.to(hidden_states.dtype)
--+            
--+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+            flat_selected_experts = selected_experts.flatten()
--+            
--+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+            token_indices = broadcasted_token_indices.flatten()
--+            
--+            active_experts = ops.unique(flat_selected_experts)
--+            
--+            for expert_idx_tensor in active_experts:
--+                expert_idx = expert_idx_tensor.item()
--+                expert_layer = self.experts[expert_idx]
--+                
--+                mask = (flat_selected_experts == expert_idx_tensor)
--+                selected_token_indices = token_indices[mask]
--+                selected_routing_weights = routing_weights.flatten()[mask]
--+                
--+                current_states = hidden_states_reshaped[selected_token_indices]
--+                
--+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+                
--+                final_hidden_states = final_hidden_states.index_add(
--+                    dim=0,
--+                    index=selected_token_indices,
--+                    source=expert_output.to(hidden_states.dtype)
--+                )
--+            
--+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-- 
---        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
---        return final_hidden_states, router_logits
--+            final_hidden_states = final_hidden_states + shared_expert_output
--+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+            
--+            return final_hidden_states, router_logits
-- 
-- 
-- class Qwen2MoeDecoderLayer(nn.Module):
--@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
-- 
--         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-- 
--+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+
--         if (layer_idx not in config.mlp_only_layers) and (
--             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--         ):
--@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--     _skip_keys_device_placement = "past_key_values"
--     _supports_cache_class = True
--+#lwx
--+    # _supports_static_cache = True
-- 
--     def _init_weights(self, module):
--         std = self.config.initializer_range
--@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--         return causal_mask
-- 
-- 
---class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--     _tied_weights_keys = ["lm_head.weight"]
-- 
--     def __init__(self, config):
--@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--         self.num_experts_per_tok = config.num_experts_per_tok
--         # Initialize weights and apply final processing
--         self.post_init()
--+        # @lwx
--+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+        #     self.generation_config.cache_implementation = "static"
--+        self._warmed_up = False
--+
--+    def warmup_moe_model(self):
--+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+        test_texts = [
--+            "warmup short",
--+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+        ]
--+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+        if tokenizer is None:
--+            from mindnlp.transformers import AutoTokenizer
--+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+            self._warmup_tokenizer = tokenizer
--+
--+        for text in test_texts:
--+            inputs = tokenizer(text, return_tensors="ms")
--+            with mindspore._no_grad():
--+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+        print("[Warmup] Qwen2-MoE 模型预热完成。")
-- 
--     def get_input_embeddings(self):
--         return self.model.embed_tokens
--@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--         ```"""
--+        if not self._warmed_up:
--+            self._warmed_up = True
--+            self.warmup_moe_model()
-- 
--         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--         output_router_logits = (
--@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--             }
--         )
--         return model_inputs
--+# @lwx
--+    # def _decode_one_tokens_logits(
--+    #     self,
--+    #     cur_token: mindspore.Tensor,
--+    #     input_pos: Optional[mindspore.Tensor],
--+    #     cache_position: mindspore.Tensor,
--+    #     past_key_values: StaticCache,
--+    # ) -> mindspore.Tensor:
--+    #     """
--+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+        
--+    #     Args:
--+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+    #         input_pos: 输入位置信息，可选
--+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+            
--+    #     Returns:
--+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+    #     """
--+    #     # 调用JIT编译的版本
--+    #     return self.get_decode_one_tokens_logits(
--+    #         cur_token=cur_token,
--+    #         input_pos=input_pos,
--+    #         cache_position=cache_position,
--+    #         past_key_values=past_key_values,
--+    #     )
--+    
--+    # @mindspore.jit(jit_level='O1')
--+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+    #     """
--+    #     JIT编译的函数，用于高效的单token解码
--+    #     使用JIT编译优化以支持静态shape和高效执行
--+        
--+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+    #     """
--+    #     outputs = self.model.forward(
--+    #         input_ids=cur_token,
--+    #         position_ids=input_pos,
--+    #         cache_position=cache_position,
--+    #         past_key_values=past_key_values,
--+    #         use_cache=True,
--+    #         return_dict=False,
--+    #     )
--+        
--+    #     hidden_states = outputs[0]
--+    #     logits = self.lm_head.forward(hidden_states)
--+    #     logits = logits.float()
--+        
--+    #     return logits[:, -1, :]
--+
--+    # def _sample(
--+    #     self,
--+    #     input_ids: mindspore.Tensor,
--+    #     logits_processor,
--+    #     stopping_criteria,
--+    #     generation_config,
--+    #     synced_devices: bool,
--+    #     streamer=None,
--+    #     logits_warper=None,
--+    #     **model_kwargs,
--+    # ):
--+    #     """
--+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+    #     """
--+    #     from ...generation.logits_process import LogitsProcessorList
--+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+    #     from mindnlp.core import nn, ops, no_grad
--+    #     import numpy as np
--+        
--+    #     # 检查是否使用 StaticCache
--+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+    #     # 否则，直接调用父类方法
--+    #     past_key_values = model_kwargs.get("past_key_values")
--+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+
--+    #     if not isinstance(past_key_values, StaticCache):
--+    #         # 不使用 StaticCache，直接调用父类方法
--+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+    #         return super()._sample(
--+    #             input_ids=input_ids,
--+    #             logits_processor=logits_processor,
--+    #             stopping_criteria=stopping_criteria,
--+    #             generation_config=generation_config,
--+    #             synced_devices=synced_devices,
--+    #             streamer=streamer,
--+    #             logits_warper=logits_warper,
--+    #             **model_kwargs,
--+    #         )
--+        
--+    #     # 使用 StaticCache，进入自定义循环
--+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+    #     pad_token_id = generation_config._pad_token_tensor
--+    #     output_attentions = generation_config.output_attentions
--+    #     output_hidden_states = generation_config.output_hidden_states
--+    #     output_scores = generation_config.output_scores
--+    #     output_logits = generation_config.output_logits
--+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+    #     max_length = generation_config.max_length
--+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+    #     do_sample = generation_config.do_sample
--+        
--+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+    #         raise ValueError(
--+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+    #             f"{logits_warper})."
--+    #         )
--+        
--+    #     # init attention / hidden states / scores tuples
--+    #     scores = () if (return_dict_in_generate and output_scores) else None
--+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+        
--+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+    #         encoder_hidden_states = (
--+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+    #         )
--+        
--+    #     # keep track of which sequences are already finished
--+    #     batch_size, cur_len = input_ids.shape
--+    #     this_peer_finished = False
--+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+        
--+    #     time_record = []
--+    #     from ....utils.testing_utils import parse_flag_from_env
--+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+        
--+    #     while self._has_unfinished_sequences(
--+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+    #     ):
--+    #         if _record_time:
--+    #             import time as time_module
--+    #             infer_start = time_module.time()
--+            
--+    #         # prepare model inputs
--+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+            
--+    #         # prepare variable output controls
--+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+            
--+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+    #         cur_cache_position = model_inputs.get("cache_position")
--+    #         cur_past_key_values = model_inputs.get("past_key_values")
--+    #         cur_input_ids = model_inputs.get("input_ids")
--+            
--+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+    #             cur_cache_position is not None and 
--+    #             len(cur_cache_position.shape) > 0 and
--+    #             cur_cache_position.shape[0] == 1 and
--+    #             cur_input_ids is not None and
--+    #             cur_input_ids.shape[1] == 1):
--+    #             # 使用 JIT 优化的单 token 解码
--+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+    #             if not hasattr(self, '_jit_used'):
--+    #                 self._jit_used = False
--+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+                
--+    #             next_token_logits = self.get_decode_one_tokens_logits(
--+    #                 cur_token=cur_input_ids,
--+    #                 input_pos=model_inputs.get("position_ids"),
--+    #                 cache_position=cur_cache_position,
--+    #                 past_key_values=cur_past_key_values,
--+    #             )
--+                
--+    #             # 标记已使用JIT（用于后续判断）
--+    #             if not self._jit_used:
--+    #                 self._jit_used = True
--+                
--+    #             # 构造兼容的输出对象
--+    #             class JitOptimizedOutput:
--+    #                 def __init__(self, logits, config):
--+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+    #                     self.config = config
--+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+    #                     self.attentions = None if not config.is_encoder_decoder else None
--+    #                     self.cross_attentions = None
--+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+                
--+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+    #         else:
--+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+    #             outputs = self(**model_inputs, return_dict=True)
--+            
--+    #         if synced_devices and this_peer_finished:
--+    #             continue
--+            
--+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+    #         next_token_logits = outputs.logits[:, -1, :]
--+            
--+    #         # pre-process distribution
--+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+    #         if do_sample:
--+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+            
--+    #         # Store scores, attentions and hidden_states when required
--+    #         if return_dict_in_generate:
--+    #             if output_scores:
--+    #                 scores += (next_token_scores,)
--+    #             if output_logits:
--+    #                 raw_logits += (next_token_logits,)
--+    #             if output_attentions:
--+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+    #                 if self.config.is_encoder_decoder:
--+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+                
--+    #             if output_hidden_states:
--+    #                 hidden = (
--+    #                     outputs.decoder_hidden_states
--+    #                     if self.config.is_encoder_decoder
--+    #                     else outputs.hidden_states
--+    #                 )
--+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+            
--+    #         # token selection
--+    #         if do_sample:
--+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+    #         else:
--+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+            
--+    #         # finished sentences should have their next token be a padding token
--+    #         if has_eos_stopping_criteria:
--+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+            
--+    #         # update generated ids, model inputs, and length for next step
--+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+    #         if streamer is not None:
--+    #             streamer.put(next_tokens)
--+            
--+    #         model_kwargs = self._update_model_kwargs_for_generation(
--+    #             outputs,
--+    #             model_kwargs,
--+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+    #         )
--+            
--+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+    #         cur_len += 1
--+            
--+    #         if _record_time:
--+    #             import time as time_module
--+    #             infer_stop = time_module.time()
--+    #             time_record.append(infer_stop - infer_start)
--+            
--+    #         del outputs
--+        
--+    #     average_infer_time = None
--+    #     if time_record:
--+    #         if len(time_record) > 1:
--+    #             time_record.pop(0)
--+    #         average_infer_time = sum(time_record) / len(time_record)
--+    #         print(f'average inference time is: {average_infer_time}')
--+    #         print(f'inference time record: {time_record}')
--+        
--+    #     if streamer is not None:
--+    #         streamer.end()
--+        
--+    #     # 简单判断：打印是否使用了JIT路径
--+    #     if hasattr(self, '_jit_used') and self._jit_used:
--+    #         print("[JIT] ✓ JIT optimization was used during generation")
--+    #     else:
--+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+        
--+    #     if return_dict_in_generate:
--+    #         if self.config.is_encoder_decoder:
--+    #             return GenerateEncoderDecoderOutput(
--+    #                 sequences=input_ids,
--+    #                 scores=scores,
--+    #                 logits=raw_logits,
--+    #                 encoder_attentions=encoder_attentions,
--+    #                 encoder_hidden_states=encoder_hidden_states,
--+    #                 decoder_attentions=decoder_attentions,
--+    #                 cross_attentions=cross_attentions,
--+    #                 decoder_hidden_states=decoder_hidden_states,
--+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+    #                 average_infer_time=average_infer_time
--+    #             )
--+    #         else:
--+    #             return GenerateDecoderOnlyOutput(
--+    #                 sequences=input_ids,
--+    #                 scores=scores,
--+    #                 logits=raw_logits,
--+    #                 attentions=decoder_attentions,
--+    #                 hidden_states=decoder_hidden_states,
--+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+    #                 average_infer_time=average_infer_time
--+    #             )
--+    #     else:
--+    #         return input_ids
--+            
--+    # def _prepare_cache_for_generation(
--+    #     self,
--+    #     generation_config,
--+    #     model_kwargs,
--+    #     assistant_model,
--+    #     batch_size,
--+    #     max_cache_length,
--+    # ):
--+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+    #         generation_config.cache_implementation = "static"
--+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+        
--+    #     if generation_config.cache_implementation == "static":
--+    #         base_required_from_max_length = generation_config.max_length + 1
--+    #         base_required = max(max_cache_length, base_required_from_max_length)
--+    #         min_cache_size = 50
--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+    #         else:
--+    #             max_cache_length = max(base_required, min_cache_size)
--+            
--+    #         original_max_cache_length = max_cache_length
--+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+    #         print(f"  - final max_cache_length: {max_cache_length}")
--+            
--+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+    #             if max_cache_length > self.config.max_position_embeddings:
--+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+        
--+    #     result = super()._prepare_cache_for_generation(
--+    #         generation_config=generation_config,
--+    #         model_kwargs=model_kwargs,
--+    #         assistant_model=assistant_model,
--+    #         batch_size=batch_size,
--+    #         max_cache_length=max_cache_length,
--+    #     )
--+        
--+    #     if generation_config.cache_implementation == "static":
--+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+    #         created_cache = model_kwargs.get(cache_name)
--+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+    #             if created_cache.max_cache_len < generation_config.max_length:
--+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+        
--+    #     return result
--+
--+
--+
-- 
-- 
-- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
---- 
--2.27.0
--
-diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
-deleted file mode 100644
-index d7a129ea..00000000
---- a/patches/0002-20251106commit.patch
-+++ /dev/null
-@@ -1,3200 +0,0 @@
--From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Thu, 6 Nov 2025 09:20:38 +0800
--Subject: [PATCH 2/8] 20251106commit
--
-----
-- .../models/deepseek/modeling_deepseek.py      |  379 ++++-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
-- patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
-- 3 files changed, 2689 insertions(+), 305 deletions(-)
-- create mode 100644 patches/0001-20251104commit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index d8303e45..73773c22 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--         #     y = y + self.shared_experts(identity)
--         # return y
--         
--+    # @no_grad()
--+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+
--+    #     expert_cache = ops.zeros_like(x)
--+    #     for i in range(self.num_experts_per_tok):
--+    #         expert_id = flat_expert_indices[i].item()
--+    #         weight = flat_expert_weights[i].item()
--+    #         expert = self.experts[expert_id]
--+    #         expert_out = expert(x)
--+    #         expert_cache += expert_out * weight
--+    #     return expert_cache
--+
--     @no_grad()
--     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+        # x 的 shape: (1, hidden_size)
--+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+
--+        # 1. 收集所有需要的专家层
--+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+
--+        # 2. 并行计算所有专家的输出
--+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+        # ops.cat 会将它们堆叠成一个新的 Tensor
--+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+
--+        # 3. 使用矩阵乘法进行加权求和
--+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+        # 最终结果 final_output 的 shape: (1, hidden_size)
--+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+        
--+        return final_output
-- 
---        expert_cache = ops.zeros_like(x)
---        for i in range(self.num_experts_per_tok):
---            expert_id = flat_expert_indices[i].item()
---            weight = flat_expert_weights[i].item()
---            expert = self.experts[expert_id]
---            expert_out = expert(x)
---            expert_cache += expert_out * weight
---        return expert_cache
-- 
--     @no_grad()
--     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--             key_states = self.k_proj(hidden_states)
--             value_states = self.v_proj(hidden_states)
-- 
---        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
---        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
---        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+        # @lwx
--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--+        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
-- 
--         kv_seq_len = key_states.shape[-2]
--         if past_key_value is not None:
--@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--         return attn_output, attn_weights, past_key_value
-- 
-- 
--+# class DeepseekFlashAttention(nn.Module):
--+#     """
--+#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--+
--+#     This class is designed as a drop-in replacement for DeepseekAttention.
--+#     """
--+
--+#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+#         super().__init__()
--+#         self.config = config
--+#         self.layer_idx = layer_idx
--+#         if layer_idx is None:
--+#             logger.warning(
--+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+#                 "when creating this class."
--+#             )
--+
--+#         self.attention_dropout = config.attention_dropout
--+#         self.hidden_size = config.hidden_size
--+#         self.num_heads = config.num_attention_heads
--+#         self.head_dim = self.hidden_size // self.num_heads
--+#         self.num_key_value_heads = config.num_key_value_heads
--+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+#         self.max_position_embeddings = config.max_position_embeddings
--+#         self.rope_theta = config.rope_theta
--+#         self.is_causal = True
--+
--+#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+#             raise ValueError(
--+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+#                 f" and `num_heads`: {self.num_heads})."
--+#             )
--+
--+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+#         self._init_rope()
--+
--+#     def _init_rope(self):
--+#         if self.config.rope_scaling is None:
--+#             self.rotary_emb = DeepseekRotaryEmbedding(
--+#                 self.head_dim,
--+#                 max_position_embeddings=self.max_position_embeddings,
--+#                 base=self.rope_theta,
--+#             )
--+#         else:
--+#             scaling_type = self.config.rope_scaling["type"]
--+#             scaling_factor = self.config.rope_scaling["factor"]
--+#             if scaling_type == "linear":
--+#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+#                     self.head_dim,
--+#                     max_position_embeddings=self.max_position_embeddings,
--+#                     scaling_factor=scaling_factor,
--+#                     base=self.rope_theta,
--+#                 )
--+#             elif scaling_type == "dynamic":
--+#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+#                     self.head_dim,
--+#                     max_position_embeddings=self.max_position_embeddings,
--+#                     scaling_factor=scaling_factor,
--+#                     base=self.rope_theta,
--+#                 )
--+#             else:
--+#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+
--+#     def forward(
--+#         self,
--+#         hidden_states: mindspore.Tensor,
--+#         attention_mask: Optional[mindspore.Tensor] = None,
--+#         position_ids: Optional[mindspore.Tensor] = None,
--+#         past_key_value: Optional[Cache] = None,
--+#         output_attentions: bool = False,
--+#         use_cache: bool = False,
--+#         **kwargs,
--+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+#         if "padding_mask" in kwargs:
--+#             warnings.warn(
--+#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+#             )
--+        
--+#         if output_attentions:
--+#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--+
--+#         bsz, q_len, _ = hidden_states.shape
--+
--+#         if self.config.pretraining_tp > 1:
--+#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+
--+#         query_states = self.q_proj(hidden_states)
--+#         key_states = self.k_proj(hidden_states)
--+#         value_states = self.v_proj(hidden_states)
--+
--+#         # Reshape for multi-head attention
--+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+#         kv_seq_len = key_states.shape[-2]
--+#         if past_key_value is not None:
--+#             if self.layer_idx is None:
--+#                 raise ValueError(
--+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+#                     "with a layer index."
--+#                 )
--+#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+        
--+#         # Apply Rotary Positional Embedding
--+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+#         if past_key_value is not None:
--+#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+
--+#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--+#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--+#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+        
--+#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+        
--+#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+
--+#         # Convert attention_mask for flash_attention_score
--+#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--+#         if attention_mask is not None:
--+#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--+#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+#                 raise ValueError(
--+#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+#                 )
--+#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--+#         else:
--+#             attn_mask_for_fa = None
--+        
--+#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+
--+#         # Call the fused flash_attention_score operator
--+#         attn_output = mindspore.ops.flash_attention_score(
--+#             query=query_states_for_fa,
--+#             key=key_states_for_fa,
--+#             value=value_states_for_fa,
--+#             head_num=self.num_heads, # This is N1, the number of query heads
--+#             input_layout='BSH',
--+#             attn_mask=attn_mask_for_fa,
--+#             keep_prob=keep_prob,
--+#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+#             sparse_mode=0 # Default mask mode
--+#         )
--+        
--+#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--+#         attn_output = self.o_proj(attn_output)
--+        
--+#         # Flash Attention does not return attention weights
--+#         attn_weights = None
--+
--+#         return attn_output, attn_weights, past_key_value
--+
--+class DeepseekFlashAttention(nn.Module):
--+    """
--+    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--+    This implementation is a drop-in replacement for the original DeepseekAttention class,
--+    designed for high performance on supported hardware (Ascend).
--+
--+    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--+    """
--+    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+        super().__init__()
--+        self.config = config
--+        self.layer_idx = layer_idx
--+        if layer_idx is None:
--+            logger.warning(
--+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+                "when creating this class."
--+            )
--+
--+        # --- [FIX] Correctly initialize all required attributes ---
--+        self.attention_dropout = config.attention_dropout
--+        self.hidden_size = config.hidden_size
--+        self.num_heads = config.num_attention_heads
--+        self.head_dim = self.hidden_size // self.num_heads
--+        self.num_key_value_heads = config.num_key_value_heads
--+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+        self.max_position_embeddings = config.max_position_embeddings
--+        self.rope_theta = config.rope_theta
--+        self.is_causal = True
--+
--+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+            raise ValueError(
--+                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+                f" and `num_heads`: {self.num_heads})."
--+            )
--+
--+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+        
--+        # This call will now succeed as all attributes are initialized.
--+        self._init_rope()
--+
--+    def _init_rope(self):
--+        if self.config.rope_scaling is None:
--+            self.rotary_emb = DeepseekRotaryEmbedding(
--+                self.head_dim,
--+                max_position_embeddings=self.max_position_embeddings,
--+                base=self.rope_theta,
--+            )
--+        else:
--+            scaling_type = self.config.rope_scaling["type"]
--+            scaling_factor = self.config.rope_scaling["factor"]
--+            if scaling_type == "linear":
--+                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+                    self.head_dim,
--+                    max_position_embeddings=self.max_position_embeddings,
--+                    scaling_factor=scaling_factor,
--+                    base=self.rope_theta,
--+                )
--+            elif scaling_type == "dynamic":
--+                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+                    self.head_dim,
--+                    max_position_embeddings=self.max_position_embeddings,
--+                    scaling_factor=scaling_factor,
--+                    base=self.rope_theta,
--+                )
--+            else:
--+                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+
--+    def forward(
--+        self,
--+        hidden_states: mindspore.Tensor,
--+        attention_mask: Optional[mindspore.Tensor] = None,
--+        position_ids: Optional[mindspore.Tensor] = None,
--+        past_key_value: Optional[Cache] = None,
--+        output_attentions: bool = False,
--+        use_cache: bool = False,
--+        **kwargs,
--+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+        if "padding_mask" in kwargs:
--+            warnings.warn(
--+                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+            )
--+        if output_attentions:
--+            warnings.warn(
--+                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--+            )
--+
--+        bsz, q_len, _ = hidden_states.shape
--+
--+        if self.config.pretraining_tp > 1:
--+            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+
--+        query_states = self.q_proj(hidden_states)
--+        key_states = self.k_proj(hidden_states)
--+        value_states = self.v_proj(hidden_states)
--+
--+        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+        kv_seq_len = key_states.shape[-2]
--+        if past_key_value is not None:
--+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+        
--+        # Apply Rotary Position Embedding
--+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+        if past_key_value is not None:
--+            cache_kwargs = {"sin": sin, "cos": cos}
--+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+
--+        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--+        # So we must explicitly repeat the KV heads.
--+        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+
--+        # Convert attention mask for flash_attention_score
--+        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--+        if attention_mask is not None:
--+            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+                 raise ValueError(
--+                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+                )
--+            attn_mask_for_fa = attention_mask < 0
--+        else:
--+            attn_mask_for_fa = None
--+
--+        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+
--+        # Call the fused operator using the efficient BNSD layout
--+        attn_output = mindspore.ops.flash_attention_score(
--+            query=query_states,
--+            key=key_states,
--+            value=value_states,
--+            head_num=self.num_heads,
--+            input_layout='BNSD', # Specify the correct layout
--+            attn_mask=attn_mask_for_fa,
--+            keep_prob=keep_prob,
--+            scalar_value=1.0 / math.sqrt(self.head_dim)
--+        )
--+        
--+        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+        
--+        # Apply output projection
--+        attn_output = self.o_proj(attn_output)
--+
--+        # Flash attention does not return attention weights, so we return None.
--+        attn_weights = None
--+
--+        return attn_output, attn_weights, past_key_value
--+
-- Deepseek_ATTENTION_CLASSES = {
--     "eager": DeepseekAttention,
--+    "flash-attention": DeepseekFlashAttention,
-- }
-- 
-- 
--@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--             config=config, layer_idx=layer_idx
--         )
-- 
--+        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--+            config=config, layer_idx=layer_idx
--+        )
--+
--         self.mlp = (
--             DeepseekMoE(config)
--             if (
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index d4c6b651..bced285c 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
-- 
-- import mindspore
-- import mindnlp.core.nn.functional as F
---from mindnlp.core import nn, ops
--+from mindnlp.core import nn, ops, no_grad
-- from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-- 
-- from ....common.activations import ACT2FN
--@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
-- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-- 
--+Long_Prompt = False
--+PROMPT_LENGTH_THRESHOLD = 128
-- 
-- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-- def _prepare_4d_causal_attention_mask_with_cache_position(
--@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--         return attn_output, attn_weights, past_key_value
-- 
-- 
--+# class Qwen2MoeFlashAttention(nn.Module):
--+#     """
--+#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+
--+#     关键改动:
--+#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+#        直接传入原始的 key 和 value 张量效率更高。
--+#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+#         super().__init__()
--+#         self.config = config
--+#         self.layer_idx = layer_idx
--+#         self.hidden_size = config.hidden_size
--+#         self.num_heads = config.num_attention_heads
--+#         self.head_dim = self.hidden_size // self.num_heads
--+#         self.num_key_value_heads = config.num_key_value_heads
--+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+#         self.max_position_embeddings = config.max_position_embeddings
--+#         self.rope_theta = config.rope_theta
--+#         self.attention_dropout = config.attention_dropout
--+
--+#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+#             raise ValueError(
--+#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+#             )
--+
--+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+
--+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+#             self.head_dim,
--+#             max_position_embeddings=self.max_position_embeddings,
--+#             base=self.rope_theta,
--+#         )
--+
--+#     def forward(
--+#         self,
--+#         hidden_states: mindspore.Tensor,
--+#         attention_mask: Optional[mindspore.Tensor] = None,
--+#         position_ids: Optional[mindspore.Tensor] = None,
--+#         past_key_value: Optional[Cache] = None,
--+#         output_attentions: bool = False,
--+#         use_cache: bool = False,
--+#         cache_position: Optional[mindspore.Tensor] = None,
--+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+#         bsz, q_len, _ = hidden_states.shape
--+
--+#         # 1. 线性投射 Q, K, V
--+#         query_states = self.q_proj(hidden_states)
--+#         key_states = self.k_proj(hidden_states)
--+#         value_states = self.v_proj(hidden_states)
--+
--+#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+#         # query:   [B, S, H*D] -> [B, N1, S, D]
--+#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+#         # 3. RoPE 旋转位置编码
--+#         kv_seq_len = key_states.shape[-2]
--+#         if past_key_value is not None:
--+#             if self.layer_idx is None:
--+#                 raise ValueError(
--+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+#                     "with a layer index."
--+#                 )
--+#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--+#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+#                 if cache_position.shape[0] == 1:
--+#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+#                     kv_seq_len = past_seen_tokens + 1
--+#                 else:
--+#                     # prefill 阶段：cache_position 是范围，使用其长度
--+#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+#             else:
--+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+#         # 4. KV 缓存更新
--+#         if past_key_value is not None:
--+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+#             key_states, value_states = past_key_value.update(
--+#                 key_states, value_states, self.layer_idx, cache_kwargs
--+#             )
--+            
--+#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+#                 if cache_position.shape[0] == 1:
--+#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+#                     kv_seq_len = key_states.shape[-2]
--+
--+#         # 5. [重要] 准备 Attention Mask
--+#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+#         fa_attention_mask = None
--+#         if attention_mask is not None:
--+#             # 截取与当前key长度匹配的部分
--+#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--+#             fa_attention_mask = (mask_slice != 0)
--+
--+#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+#         input_dtype = query_states.dtype
--+#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+#             query_states = query_states.to(mindspore.float16)
--+#             key_states = key_states.to(mindspore.float16)
--+#             value_states = value_states.to(mindspore.float16)
--+
--+#         # 6. [核心] 调用 flash_attention_score 算子
--+#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+#         attn_output = mindspore.ops.flash_attention_score(
--+#             query=query_states,
--+#             key=key_states,
--+#             value=value_states,
--+#             head_num=self.num_heads, # 传入Q的头数(N1)
--+#             attn_mask=fa_attention_mask,
--+#             keep_prob=1.0 - self.attention_dropout,
--+#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+#             input_layout="BNSD",
--+#             sparse_mode=0 # 使用 defaultMask 模式
--+#         )
--+
--+#         # 恢复原始数据类型
--+#         attn_output = attn_output.to(input_dtype)
--+
--+#         # 7. 调整输出形状
--+#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+#         attn_output = self.o_proj(attn_output)
--+
--+#         # FlashAttention 算子不直接返回注意力权重矩阵
--+#         attn_weights = None
--+#         if output_attentions:
--+#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+#         return attn_output, attn_weights, past_key_value
--+
--+#     # def forward(
--+#     #     self,
--+#     #     hidden_states: mindspore.Tensor,
--+#     #     attention_mask: Optional[mindspore.Tensor] = None,
--+#     #     position_ids: Optional[mindspore.Tensor] = None,
--+#     #     past_key_value: Optional[Cache] = None,
--+#     #     output_attentions: bool = False,
--+#     #     use_cache: bool = False,
--+#     #     cache_position: Optional[mindspore.Tensor] = None,
--+#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+#     #     bsz, q_len, _ = hidden_states.shape
--+
--+#     #     # 1. 线性投射 Q, K, V
--+#     #     query_states = self.q_proj(hidden_states)
--+#     #     key_states = self.k_proj(hidden_states)
--+#     #     value_states = self.v_proj(hidden_states)
--+
--+#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+
--+#     #     # 3. RoPE 旋转位置编码
--+#     #     kv_seq_len = key_states.shape[-2]
--+#     #     if past_key_value is not None:
--+#     #         if self.layer_idx is None:
--+#     #             raise ValueError(
--+#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+#     #                 "with a layer index."
--+#     #             )
--+#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+
--+#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+#     #     # 4. KV 缓存更新
--+#     #     if past_key_value is not None:
--+#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+#     #         key_states, value_states = past_key_value.update(
--+#     #             key_states, value_states, self.layer_idx, cache_kwargs
--+#     #         )
--+
--+#     #     # 5. 准备 Attention Mask
--+#     #     fa_attention_mask = None
--+#     #     if attention_mask is not None:
--+#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+#     #         fa_attention_mask = (mask_slice != 0)
--+
--+#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+#     #     input_dtype = query_states.dtype
--+
--+#     #     # 6. [核心] 调用 flash_attention_score 算子
--+#     #     attn_output = mindspore.ops.flash_attention_score(
--+#     #         query=query_states,
--+#     #         key=key_states,
--+#     #         value=value_states,
--+#     #         head_num=self.num_heads,
--+#     #         attn_mask=fa_attention_mask,
--+#     #         keep_prob=1.0 - self.attention_dropout,
--+#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+#     #         input_layout="BNSD",
--+#     #         sparse_mode=0,
--+#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--+#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+#     #         inner_precise=1
--+#     #     )
--+
--+#     #     # 恢复原始数据类型
--+#     #     attn_output = attn_output.to(input_dtype)
--+
--+#     #     # 7. 调整输出形状
--+#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+#     #     attn_output = self.o_proj(attn_output)
--+
--+#     #     attn_weights = None
--+#     #     if output_attentions:
--+#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+
--+#     #     return attn_output, attn_weights, past_key_value
--+
--+
-- class Qwen2MoeFlashAttention(nn.Module):
--     """
---    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
---    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
---
---    关键改动:
---    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
---       直接传入原始的 key 和 value 张量效率更高。
---    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
---    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--+
--+    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--+    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--+    完全使用模型的低精度数据类型（如 float16）进行计算，
--+    以达到理论上的最高执行速度。
--     """
--     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--         super().__init__()
--         self.config = config
--         self.layer_idx = layer_idx
--+        if layer_idx is None:
--+            logger.warning_once(
--+                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--+            )
--+
--         self.hidden_size = config.hidden_size
--         self.num_heads = config.num_attention_heads
--         self.head_dim = self.hidden_size // self.num_heads
--         self.num_key_value_heads = config.num_key_value_heads
---        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--         self.max_position_embeddings = config.max_position_embeddings
--         self.rope_theta = config.rope_theta
--         self.attention_dropout = config.attention_dropout
-- 
---        if (self.head_dim * self.num_heads) != self.hidden_size:
---            raise ValueError(
---                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
---            )
---
--         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--         key_states = self.k_proj(hidden_states)
--         value_states = self.v_proj(hidden_states)
-- 
---        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
---        # query:   [B, S, H*D] -> [B, N1, S, D]
---        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+        # 2. 调整形状以匹配 BNSD 布局
--         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---
---        # 3. RoPE 旋转位置编码
--+        
--+        # 3. RoPE 和 KV 缓存
--         kv_seq_len = key_states.shape[-2]
--         if past_key_value is not None:
---            if self.layer_idx is None:
---                raise ValueError(
---                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---                    "with a layer index."
---                )
---            # 对于 StaticCache，需要特殊处理 kv_seq_len
---            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
---            if isinstance(past_key_value, StaticCache) and cache_position is not None:
---                # 使用 cache_position 的长度来确定实际的 kv_seq_len
---                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
---                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
---                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
---                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
---                # 临时解决方案：使用 cache_position 的最大值（如果可能）
---                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
---                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
---                if cache_position.shape[0] == 1:
---                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
---                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
---                    kv_seq_len = past_seen_tokens + 1
---                else:
---                    # prefill 阶段：cache_position 是范围，使用其长度
---                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
---            else:
---                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---
--+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+        
--         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-- 
---        # 4. KV 缓存更新
--         if past_key_value is not None:
--             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---            key_states, value_states = past_key_value.update(
---                key_states, value_states, self.layer_idx, cache_kwargs
---            )
---            
---            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
---            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
---            if isinstance(past_key_value, StaticCache) and cache_position is not None:
---                if cache_position.shape[0] == 1:
---                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
---                    kv_seq_len = key_states.shape[-2]
---
---        # 5. [重要] 准备 Attention Mask
---        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
---        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+
--+        # 4. 准备 Attention Mask
--         fa_attention_mask = None
--         if attention_mask is not None:
---            # 截取与当前key长度匹配的部分
---            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
---            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---            # 转换为布尔类型: 大负数 -> True, 0 -> False
--             fa_attention_mask = (mask_slice != 0)
-- 
---        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
---        input_dtype = query_states.dtype
---        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
---            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
---            query_states = query_states.to(mindspore.float16)
---            key_states = key_states.to(mindspore.float16)
---            value_states = value_states.to(mindspore.float16)
---
---        # 6. [核心] 调用 flash_attention_score 算子
---        #    - 无需手动 repeat_kv, 算子原生支持 GQA
---        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--         attn_output = mindspore.ops.flash_attention_score(
--             query=query_states,
--             key=key_states,
--             value=value_states,
---            head_num=self.num_heads, # 传入Q的头数(N1)
--+            head_num=self.num_heads,
--             attn_mask=fa_attention_mask,
---            keep_prob=1.0 - self.attention_dropout,
--+            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--             scalar_value=1.0 / math.sqrt(self.head_dim),
--             input_layout="BNSD",
---            sparse_mode=0 # 使用 defaultMask 模式
--+            sparse_mode=0,
--+            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--         )
-- 
---        # 恢复原始数据类型
---        attn_output = attn_output.to(input_dtype)
---
---        # 7. 调整输出形状
---        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+        # 6. 调整输出形状
--         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--         attn_output = self.o_proj(attn_output)
-- 
---        # FlashAttention 算子不直接返回注意力权重矩阵
--+        # 7. 返回结果
--         attn_weights = None
--         if output_attentions:
---             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-- 
--         return attn_output, attn_weights, past_key_value
-- 
---    # def forward(
---    #     self,
---    #     hidden_states: mindspore.Tensor,
---    #     attention_mask: Optional[mindspore.Tensor] = None,
---    #     position_ids: Optional[mindspore.Tensor] = None,
---    #     past_key_value: Optional[Cache] = None,
---    #     output_attentions: bool = False,
---    #     use_cache: bool = False,
---    #     cache_position: Optional[mindspore.Tensor] = None,
---    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---
---    #     bsz, q_len, _ = hidden_states.shape
---
---    #     # 1. 线性投射 Q, K, V
---    #     query_states = self.q_proj(hidden_states)
---    #     key_states = self.k_proj(hidden_states)
---    #     value_states = self.v_proj(hidden_states)
---
---    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
---    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---
---    #     # 3. RoPE 旋转位置编码
---    #     kv_seq_len = key_states.shape[-2]
---    #     if past_key_value is not None:
---    #         if self.layer_idx is None:
---    #             raise ValueError(
---    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---    #                 "with a layer index."
---    #             )
---    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
-- 
---    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---
---    #     # 4. KV 缓存更新
---    #     if past_key_value is not None:
---    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---    #         key_states, value_states = past_key_value.update(
---    #             key_states, value_states, self.layer_idx, cache_kwargs
---    #         )
---
---    #     # 5. 准备 Attention Mask
---    #     fa_attention_mask = None
---    #     if attention_mask is not None:
---    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---    #         fa_attention_mask = (mask_slice != 0)
---
---    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
---    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
---    #     input_dtype = query_states.dtype
---
---    #     # 6. [核心] 调用 flash_attention_score 算子
---    #     attn_output = mindspore.ops.flash_attention_score(
---    #         query=query_states,
---    #         key=key_states,
---    #         value=value_states,
---    #         head_num=self.num_heads,
---    #         attn_mask=fa_attention_mask,
---    #         keep_prob=1.0 - self.attention_dropout,
---    #         scalar_value=1.0 / math.sqrt(self.head_dim),
---    #         input_layout="BNSD",
---    #         sparse_mode=0,
---    #         # <--- 修改点 2: 启用内部高精度计算 ---
---    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
---    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
---    #         inner_precise=1
---    #     )
---
---    #     # 恢复原始数据类型
---    #     attn_output = attn_output.to(input_dtype)
--+QWEN2MOE_ATTENTION_CLASSES = {
--+    "eager": Qwen2MoeAttention,
--+    "flash-attention": Qwen2MoeFlashAttention,
--+}
-- 
---    #     # 7. 调整输出形状
---    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---    #     attn_output = self.o_proj(attn_output)
-- 
---    #     attn_weights = None
---    #     if output_attentions:
---    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     def __init__(self, config):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         # gating
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     #@dwj
--+#     # 只遍历激活的专家，而非全部专家
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--+#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#             num_tokens = hidden_states_reshaped.shape[0]
--+
--+#             router_logits = self.gate(hidden_states_reshaped)
--+#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+
--+#             if self.norm_topk_prob:
--+#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+#             routing_weights = routing_weights.to(hidden_states.dtype)
--+            
--+#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+#             flat_selected_experts = selected_experts.flatten()
--+            
--+#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+#             token_indices = broadcasted_token_indices.flatten()
--+            
--+#             active_experts = ops.unique(flat_selected_experts)
--+            
--+#             for expert_idx_tensor in active_experts:
--+#                 expert_idx = expert_idx_tensor.item()
--+#                 expert_layer = self.experts[expert_idx]
--+                
--+#                 mask = (flat_selected_experts == expert_idx_tensor)
--+#                 selected_token_indices = token_indices[mask]
--+#                 selected_routing_weights = routing_weights.flatten()[mask]
--+                
--+#                 current_states = hidden_states_reshaped[selected_token_indices]
--+                
--+#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+                
--+#                 final_hidden_states = final_hidden_states.index_add(
--+#                     dim=0,
--+#                     index=selected_token_indices,
--+#                     source=expert_output.to(hidden_states.dtype)
--+#                 )
--+            
--+#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
-- 
---    #     return attn_output, attn_weights, past_key_value
--+#             final_hidden_states = final_hidden_states + shared_expert_output
--+#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+            
--+#             return final_hidden_states, router_logits
--+
--+
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     """
--+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         # 门控网络
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         # 专家列表
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+#         # 共享专家
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     @no_grad()
--+#     def _moe_infer_decode(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         """
--+#         【解码路径】针对 sequence_length=1 的极致优化。
--+#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+#         """
--+#         batch_size, hidden_dim = hidden_states.shape
--+        
--+#         expert_outputs_list = [
--+#             ops.cat([
--+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+#             ], dim=0) 
--+#             for i in range(batch_size)
--+#         ]
--+        
--+#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+#         # shape: (batch_size, top_k, hidden_dim)
--+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+        
--+#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+        
--+#         return moe_output.squeeze(1)
--+
--+#     @no_grad()
--+#     def _moe_infer_prefill(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         """
--+#         【预填充路径】针对 sequence_length > 1 的优化。
--+#         按专家对 Token 进行分组，并进行批处理。
--+#         """
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens = hidden_states.shape[0]
--+#         flat_selected_experts = selected_experts.flatten()
--+        
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+        
--+#         active_experts = ops.unique(flat_selected_experts)
--+        
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+            
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+#             selected_token_indices = token_indices[mask]
--+#             selected_routing_weights = routing_weights.flatten()[mask]
--+            
--+#             current_states = hidden_states[selected_token_indices]
--+            
--+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+            
--+#             moe_output = moe_output.index_add(
--+#                 dim=0,
--+#                 index=selected_token_indices,
--+#                 source=expert_output.to(hidden_states.dtype)
--+#             )
--+#         return moe_output
--+
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#         """
--+#         顶层 forward 方法，作为智能分发器。
--+#         """
--+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+        
--+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#         router_logits = self.gate(hidden_states_reshaped)
--+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-- 
---    # def forward(
---    #     self,
---    #     hidden_states: mindspore.Tensor,
---    #     attention_mask: Optional[mindspore.Tensor] = None,
---    #     position_ids: Optional[mindspore.Tensor] = None,
---    #     past_key_value: Optional[Cache] = None,
---    #     output_attentions: bool = False,
---    #     use_cache: bool = False,
---    #     cache_position: Optional[mindspore.Tensor] = None,
---    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---
---    #     bsz, q_len, _ = hidden_states.shape
---
---    #     query_states = self.q_proj(hidden_states)
---    #     key_states = self.k_proj(hidden_states)
---    #     value_states = self.v_proj(hidden_states)
---
---    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---
---    #     kv_seq_len = key_states.shape[-2]
---    #     if past_key_value is not None:
---    #         if self.layer_idx is None:
---    #             raise ValueError("`layer_idx` must be specified for caching")
---    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---
---    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---
---    #     if past_key_value is not None:
---    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---    #         key_states, value_states = past_key_value.update(
---    #             key_states, value_states, self.layer_idx, cache_kwargs
---    #         )
--+#         if self.norm_topk_prob:
--+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+#         routing_weights = routing_weights.to(hidden_states.dtype)
--+        
--+#         moe_output = None
--+#         # 在推理时，根据序列长度选择最优路径
--+#         if not self.training:
--+#             if sequence_length == 1:
--+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+#             else:
--+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+#         else:
--+#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+#             raise NotImplementedError("Training path is not implemented.")
--+
--+#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+        
--+#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+        
--+#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+        
--+#         return final_hidden_states, router_logits
--+
--+
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     """
--+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         # 门控网络
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         # 专家列表
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+#         # 共享专家
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     @no_grad()
--+#     def _moe_infer_decode(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         batch_size, _ = hidden_states.shape
--+#         expert_outputs_list = [
--+#             ops.cat([
--+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+#             ], dim=0) 
--+#             for i in range(batch_size)
--+#         ]
--+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+#         return moe_output.squeeze(1)
--+
--+#     @no_grad()
--+#     def _moe_infer_prefill(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens = hidden_states.shape[0]
--+#         flat_selected_experts = selected_experts.flatten()
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+#         active_experts = ops.unique(flat_selected_experts)
--+        
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+#             selected_token_indices = token_indices[mask]
--+#             selected_routing_weights = routing_weights.flatten()[mask]
--+#             current_states = hidden_states[selected_token_indices]
--+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+#             moe_output = moe_output.index_add(
--+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+#             )
--+#         return moe_output
--+
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#         """
--+#         顶层 forward 方法，作为智能分发器。
--+#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+#         """
--+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+        
--+#         # 1. 门控计算 (通用逻辑)
--+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#         router_logits = self.gate(hidden_states_reshaped)
--+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+
--+#         if self.norm_topk_prob:
--+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+#         routing_weights = routing_weights.to(hidden_states.dtype)
--+        
--+#         # 2. 智能分发到最优 MoE 路径
--+#         moe_output = None
--+#         if not self.training:
--+#             if sequence_length == 1:
--+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+#             else:
--+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+#         else:
--+#             raise NotImplementedError("Training path is not implemented.")
--+
--+#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+        
--+#         # 4. 合并 MoE 输出和共享专家输出
--+#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+        
--+#         # 5. 恢复原始形状并返回
--+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+        
--+#         return final_hidden_states, router_logits
--+
--+# prefill fastest
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     """
--+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         # 门控网络
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         # 专家列表
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+#         # 共享专家
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     @no_grad()
--+#     def _moe_infer_dispatch(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         """
--+#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+#         """
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens, _ = hidden_states.shape
--+        
--+#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+#         flat_selected_experts = selected_experts.flatten()
--+#         flat_routing_weights = routing_weights.flatten()
-- 
---    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
---    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
---
---    #     # <--- 核心修改点: 手动进行高精度缩放 ---
---    #     # 在调用算子前，手动将 query_states 除以缩放因子。
---    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
---    #     query_states = query_states / math.sqrt(self.head_dim)
---    #     # <--- 修改结束 ---
---
---    #     fa_attention_mask = None
---    #     if attention_mask is not None:
---    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---    #         fa_attention_mask = (mask_slice != 0)
---
---    #     input_dtype = query_states.dtype
---
---    #     attn_output = mindspore.ops.flash_attention_score(
---    #         query=query_states,    # 传入已经预先缩放过的 query
---    #         key=key_states,
---    #         value=value_states,
---    #         head_num=self.num_heads,
---    #         attn_mask=fa_attention_mask,
---    #         keep_prob=1.0 - self.attention_dropout,
---    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
---    #         input_layout="BNSD",
---    #         sparse_mode=0,
---    #         inner_precise=1        # 仍然保持内部高精度计算
---    #     )
--+#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
-- 
---    #     attn_output = attn_output.to(input_dtype)
---    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---    #     attn_output = self.o_proj(attn_output)
--+#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+#         active_experts = ops.unique(flat_selected_experts)
--+        
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+            
--+#             # 找到所有分配给该专家的 token
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+            
--+#             # 使用 mask 选取对应的 token 和权重
--+#             current_token_indices = token_indices[mask]
--+#             current_routing_weights = flat_routing_weights[mask]
--+#             current_hidden_states = hidden_states[current_token_indices]
--+            
--+#             # 对这些 token 进行批处理
--+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+            
--+#             # 使用 index_add 将结果精确地加回到对应位置
--+#             moe_output = moe_output.index_add(
--+#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+#             )
--+#         return moe_output
--+
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#         """
--+#         顶层 forward 方法，作为智能分发器。
--+#         """
--+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+        
--+#         # 1. 门控计算
--+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#         router_logits = self.gate(hidden_states_reshaped)
--+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+
--+#         if self.norm_topk_prob:
--+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+#         routing_weights = routing_weights.to(hidden_states.dtype)
--+        
--+#         # 2. 调用统一的 MoE 计算内核
--+#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
-- 
---    #     attn_weights = None
---    #     if output_attentions:
---    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+#         # 3. 统一处理共享专家
--+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+        
--+#         # 4. 合并输出
--+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+        
--+#         # 5. 恢复原始形状并返回
--+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+        
--+#         return final_hidden_states, router_logits
--+
--+
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     """
--+#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+#     【最终高性能与高精度版】：
--+#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+#     3. 这样实现了速度和准确性的两全其美。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     @no_grad()
--+#     def _moe_infer_decode(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         """
--+#         【解码路径】极致优化版：bmm + 高精度累加。
--+#         """
--+#         original_dtype = hidden_states.dtype
--+#         batch_size, _ = hidden_states.shape
--+        
--+#         expert_outputs_list = [
--+#             ops.cat([
--+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+#             ], dim=0) 
--+#             for i in range(batch_size)
--+#         ]
--+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+
--+#         # 在 float32 下执行 bmm，得到高精度结果
--+#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+        
--+#         # 将高精度结果转换回原始数据类型
--+#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+        
--+#         return moe_output
--+
--+#     @no_grad()
--+#     def _moe_infer_prefill(
--+#         self, 
--+#         hidden_states: mindspore.Tensor, 
--+#         selected_experts: mindspore.Tensor, 
--+#         routing_weights: mindspore.Tensor
--+#     ) -> mindspore.Tensor:
--+#         """
--+#         【预填充路径】与原始实现一致，结果精确。
--+#         """
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens, _ = hidden_states.shape
--+#         flat_selected_experts = selected_experts.flatten()
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+#         active_experts = ops.unique(flat_selected_experts)
--+        
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+#             selected_token_indices = token_indices[mask]
--+#             selected_routing_weights = routing_weights.flatten()[mask]
--+#             current_states = hidden_states[selected_token_indices]
--+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+#             moe_output = moe_output.index_add(
--+#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+#             )
--+#         return moe_output
--+
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+        
--+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#         router_logits = self.gate(hidden_states_reshaped)
--+#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
-- 
---    #     return attn_output, attn_weights, past_key_value
--+#         if self.norm_topk_prob:
--+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+#         # 如果模型主体是 float16，后续再转换
--+        
--+#         moe_output = None
--+#         if not self.training:
--+#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+#             # _moe_infer_decode 内部会处理好类型转换
--+#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+#             if sequence_length == 1:
--+#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+#             else:
--+#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+#         else:
--+#             raise NotImplementedError("Training path is not implemented.")
--+
--+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+        
--+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+        
--+#         return final_hidden_states, router_logits
--+    
-- 
---QWEN2MOE_ATTENTION_CLASSES = {
---    "eager": Qwen2MoeAttention,
---    "flash-attention": Qwen2MoeFlashAttention,
---}
--+# class Qwen2MoeSparseMoeBlock(nn.Module):
--+#     """
--+#     【融合版】一个混合专家模块，内置两种推理策略，
--+#     由外部全局变量 `Long_Prompt` 控制：
--+
--+#     - if Long_Prompt is True:  【精度优先模式】
--+#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+#       适用于处理长序列，避免误差累积。
--+
--+#     - if Long_Prompt is False: 【速度优先模式】
--+#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+#       在解码阶段获得极致速度，同时保证结果高度准确。
--+#     """
--+#     def __init__(self, config: Qwen2MoeConfig):
--+#         super().__init__()
--+#         self.num_experts = config.num_experts
--+#         self.top_k = config.num_experts_per_tok
--+#         self.norm_topk_prob = config.norm_topk_prob
--+
--+#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+#         self.experts = nn.ModuleList(
--+#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+#         )
--+#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+
--+#     # --- 速度优先模式的辅助函数 ---
--+#     @no_grad()
--+#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+#         original_dtype = hidden_states.dtype
--+#         batch_size, _ = hidden_states.shape
--+#         expert_outputs_list = [
--+#             ops.cat([
--+#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+#             ], dim=0) 
--+#             for i in range(batch_size)
--+#         ]
--+#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+#         weights_fp32 = routing_weights.to(mindspore.float32)
--+#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+
--+#     @no_grad()
--+#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens, _ = hidden_states.shape
--+#         flat_selected_experts = selected_experts.flatten()
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+#         active_experts = ops.unique(flat_selected_experts)
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+#             selected_token_indices = token_indices[mask]
--+#             selected_routing_weights = routing_weights.flatten()[mask]
--+#             current_states = hidden_states[selected_token_indices]
--+#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+#         return moe_output
--+
--+#     # --- 精度优先模式的辅助函数 ---
--+#     @no_grad()
--+#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+#         moe_output = ops.zeros_like(hidden_states)
--+#         num_tokens, _ = hidden_states.shape
--+#         flat_selected_experts = selected_experts.flatten()
--+#         flat_routing_weights = routing_weights.flatten()
--+#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+#         active_experts = ops.unique(flat_selected_experts)
--+#         for expert_idx_tensor in active_experts:
--+#             expert_idx = expert_idx_tensor.item()
--+#             expert_layer = self.experts[expert_idx]
--+#             mask = (flat_selected_experts == expert_idx_tensor)
--+#             current_token_indices = token_indices[mask]
--+#             current_routing_weights = flat_routing_weights[mask]
--+#             current_hidden_states = hidden_states[current_token_indices]
--+#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+#         return moe_output
--+
--+#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+#         # 声明我们将要使用一个在模块外部定义的全局变量
--+#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+#         global Long_Prompt
--+
--+#         # 1. 门控计算 (所有模式通用)
--+#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+#         router_logits = self.gate(hidden_states_reshaped)
--+#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+#         if self.norm_topk_prob:
--+#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+#         moe_output = None
--+#         if not self.training:
--+#             # 根据 Long_Prompt 标志选择模式
--+#             if Long_Prompt:
--+#                 # --- 精度优先模式 ---
--+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+#             else:
--+#                 # --- 速度优先模式 ---
--+#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+#                 if sequence_length == 1:
--+#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+#                 else:
--+#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+#         else:
--+#             raise NotImplementedError("Training path is not implemented.")
--+
--+#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+        
--+#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+        
--+#         return final_hidden_states, router_logits
--+    
--+class Qwen2MoeSparseMoeBlock(nn.Module):
--+    """
--+    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+    控制的顶级推理策略：
-- 
--+    - if Long_Prompt is True:  【精度优先模式】
--+      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--+      适用于需要严格可复现性的长序列任务。
-- 
---class Qwen2MoeSparseMoeBlock(nn.Module):
---    def __init__(self, config):
--+    - if Long_Prompt is False: 【速度优先模式】
--+      采用业界最强的性能组合：
--+      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--+      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--+    """
--+    def __init__(self, config: Qwen2MoeConfig):
--         super().__init__()
--         self.num_experts = config.num_experts
--         self.top_k = config.num_experts_per_tok
--         self.norm_topk_prob = config.norm_topk_prob
-- 
---        # gating
--         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--         self.experts = nn.ModuleList(
--             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--         )
---
--         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
-- 
---    #@dwj
---    # 只遍历激活的专家，而非全部专家
---    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---            batch_size, sequence_length, hidden_dim = hidden_states.shape
---            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---            num_tokens = hidden_states_reshaped.shape[0]
---
---            router_logits = self.gate(hidden_states_reshaped)
---            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---
---            if self.norm_topk_prob:
---                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---            routing_weights = routing_weights.to(hidden_states.dtype)
---            
---            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
---            flat_selected_experts = selected_experts.flatten()
---            
---            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
---            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
---            token_indices = broadcasted_token_indices.flatten()
---            
---            active_experts = ops.unique(flat_selected_experts)
---            
---            for expert_idx_tensor in active_experts:
---                expert_idx = expert_idx_tensor.item()
---                expert_layer = self.experts[expert_idx]
---                
---                mask = (flat_selected_experts == expert_idx_tensor)
---                selected_token_indices = token_indices[mask]
---                selected_routing_weights = routing_weights.flatten()[mask]
---                
---                current_states = hidden_states_reshaped[selected_token_indices]
---                
---                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---                
---                final_hidden_states = final_hidden_states.index_add(
---                    dim=0,
---                    index=selected_token_indices,
---                    source=expert_output.to(hidden_states.dtype)
---                )
---            
---            shared_expert_output = self.shared_expert(hidden_states_reshaped)
---            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--+    @no_grad()
--+    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+        original_dtype = hidden_states.dtype
--+        batch_size, _ = hidden_states.shape
--+        expert_outputs_list = [
--+            ops.cat([
--+                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+            ], dim=0)
--+            for i in range(batch_size)
--+        ]
--+        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+        weights_fp32 = routing_weights.to(mindspore.float32)
--+        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+        return moe_output_fp32.squeeze(1).to(original_dtype)
--+
--+    @no_grad()
--+    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+        num_tokens, _ = hidden_states.shape
--+        flat_selected_experts = selected_experts.flatten()
--+        sorted_expert_indices = flat_selected_experts.argsort()
--+        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+        original_token_indices = sorted_expert_indices // self.top_k
--+        moe_output = ops.zeros_like(hidden_states)
--+        current_token_offset = 0
--+        for i in range(self.num_experts):
--+            expert_token_count = tokens_per_expert[i] - current_token_offset
--+            if expert_token_count == 0:
--+                continue
--+            end_offset = current_token_offset + expert_token_count
--+            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+            expert_hidden_states = hidden_states[expert_original_token_indices]
--+            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+            current_token_offset += expert_token_count
--+        return moe_output
--+
--+    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+    @no_grad()
--+    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+        moe_output = ops.zeros_like(hidden_states)
--+        num_tokens, _ = hidden_states.shape
--+        flat_selected_experts = selected_experts.flatten()
--+        flat_routing_weights = routing_weights.flatten()
--+        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+        active_experts = ops.unique(flat_selected_experts)
--+        for expert_idx_tensor in active_experts:
--+            expert_idx = expert_idx_tensor.item()
--+            expert_layer = self.experts[expert_idx]
--+            mask = (flat_selected_experts == expert_idx_tensor)
--+            current_token_indices = token_indices[mask]
--+            current_routing_weights = flat_routing_weights[mask]
--+            current_hidden_states = hidden_states[current_token_indices]
--+            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+        return moe_output
-- 
---            final_hidden_states = final_hidden_states + shared_expert_output
---            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
---            
---            return final_hidden_states, router_logits
--+    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+        global Long_Prompt
--+
--+        # 1. 门控计算 (所有模式通用)
--+        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+        router_logits = self.gate(hidden_states_reshaped)
--+        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+        if self.norm_topk_prob:
--+            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+        
--+        moe_output = None
--+        if Long_Prompt:
--+            # --- 精度优先模式 (ACCURACY MODE) ---
--+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        else:
--+            # --- 速度优先模式 (SPEED MODE) ---
--+            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+            if sequence_length == 1:
--+                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+            else:
--+                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        
-- 
--+        # 3. 共享专家计算与合并 (所有模式通用)
--+        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+        
--+        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+        
--+        return final_hidden_states, router_logits
-- 
-- class Qwen2MoeDecoderLayer(nn.Module):
--     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--         super().__init__()
--         self.hidden_size = config.hidden_size
--+        
--+        # if Long_Prompt:
--+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+        # else:
--+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
-- 
--         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-- 
---        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
---
--         if (layer_idx not in config.mlp_only_layers) and (
--             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--         ):
--@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--             self._warmed_up = True
--             self.warmup_moe_model()
-- 
--+
--+
--         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--         output_router_logits = (
--             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--             router_logits=outputs.router_logits,
--         )
-- 
--+    def generate(self, *args, **kwargs):
--+        """
--+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+        """
--+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+
--+        input_ids = kwargs.get("input_ids")
--+        if input_ids is None and args:
--+            input_ids = args[0]
--+
--+        if input_ids is not None:
--+            prompt_length = input_ids.shape[1]
--+            
--+            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+                Long_Prompt = True
--+            else:
--+                Long_Prompt = False
--+
--+        return super().generate(*args, **kwargs)
--+    
--     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--     def prepare_inputs_for_generation(
--         self,
--@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--         # Exception 1: when passing input_embeds, input_ids may be missing entries
--         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--+        
--         if past_key_values is not None:
--             if inputs_embeds is not None:  # Exception 1
--                 if 0 not in input_ids.shape:
--@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--             }
--         )
--         return model_inputs
--+
-- # @lwx
--     # def _decode_one_tokens_logits(
--     #     self,
--@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--             attentions=outputs.attentions,
--         )
-- 
--+
-- __all__ = [
--     "Qwen2MoeForCausalLM",
--     "Qwen2MoeModel",
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--new file mode 100644
--index 00000000..6dfb5b93
----- /dev/null
--+++ b/patches/0001-20251104commit.patch
--@@ -0,0 +1,1272 @@
--+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Tue, 4 Nov 2025 09:11:51 +0800
--+Subject: [PATCH] 20251104commit
--+
--+---
--+ mindnlp/transformers/cache_utils.py           |  28 +-
--+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+ 3 files changed, 976 insertions(+), 87 deletions(-)
--+
--+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+index cadd2e04..02f8d4be 100644
--+--- a/mindnlp/transformers/cache_utils.py
--++++ b/mindnlp/transformers/cache_utils.py
--+@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+             # k_out[:, :, cache_position] = key_states
--+             # v_out[:, :, cache_position] = value_states
--+-            if ON_ORANGE_PI:
--+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+-            else:
--+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+-
--++            # if ON_ORANGE_PI:
--++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++            # else:
--++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++            # 确保 cache_position 是 1D tensor 并且类型正确
--++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++            if cache_position.ndim > 1:
--++                cache_position = cache_position.flatten()
--++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++                cache_position = cache_position.int()
--++            
--++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++            k_out[:, :, cache_position] = key_states
--++            v_out[:, :, cache_position] = value_states
--++            
--+         return k_out, v_out
--+ 
--+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index c695b944..d8303e45 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+-    x1 = x[..., : x.shape[-1] // 2]
--+-    x2 = x[..., x.shape[-1] // 2 :]
--++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++    # x1 = x[..., : x.shape[-1] // 2]
--++    # x2 = x[..., x.shape[-1] // 2 :]
--++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+         if self.training:
--+             raise NotImplementedError("Training is not supported yet.")
--+         else:
--+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+-        if self.config.n_shared_experts is not None:
--+-            y = y + self.shared_experts(identity)
--+-        return y
--++            # @lwx
--++            if orig_shape[1] == 1:
--++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++                y=y.view(*orig_shape)
--++                if self.config.n_shared_experts is not None:
--++                    y = y + self.shared_experts(identity)
--++                return y
--++            else:
--++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++                if self.config.n_shared_experts is not None:
--++                    y = y + self.shared_experts(identity)
--++                return y
--++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++        # if self.config.n_shared_experts is not None:
--++        #     y = y + self.shared_experts(identity)
--++        # return y
--++        
--++    @no_grad()
--++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++
--++        expert_cache = ops.zeros_like(x)
--++        for i in range(self.num_experts_per_tok):
--++            expert_id = flat_expert_indices[i].item()
--++            weight = flat_expert_weights[i].item()
--++            expert = self.experts[expert_id]
--++            expert_out = expert(x)
--++            expert_cache += expert_out * weight
--++        return expert_cache
--+ 
--+     @no_grad()
--+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-        # expert_cache = torch.zeros_like(x)
--+-        # idxs = flat_expert_indices.argsort()
--+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+-        # token_idxs = idxs // self.num_experts_per_tok
--+-        # for i, end_idx in enumerate(tokens_per_expert):
--+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+-        #     if start_idx == end_idx:
--+-        #         continue
--+-        #     expert = self.experts[i]
--+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+-        #     expert_tokens = x[exp_token_idx]
--+-        #     expert_out = expert(expert_tokens)
--+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+-        # return expert_cache
--++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+         expert_cache = ops.zeros_like(x)
--+         idxs = flat_expert_indices.argsort()
--+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+         token_idxs = idxs // self.num_experts_per_tok
--++
--+         for i, end_idx in enumerate(tokens_per_expert):
--+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+             if start_idx == end_idx:
--+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+             expert_out = expert(expert_tokens)
--+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++
--+         return expert_cache
--++        
--++    # @no_grad()
--++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++    #     # expert_cache = torch.zeros_like(x)
--++    #     # idxs = flat_expert_indices.argsort()
--++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++    #     # token_idxs = idxs // self.num_experts_per_tok
--++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++    #     #     if start_idx == end_idx:
--++    #     #         continue
--++    #     #     expert = self.experts[i]
--++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++    #     #     expert_tokens = x[exp_token_idx]
--++    #     #     expert_out = expert(expert_tokens)
--++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++    #     # return expert_cache
--++    #     expert_cache = ops.zeros_like(x)
--++    #     idxs = flat_expert_indices.argsort()
--++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++    #     token_idxs = idxs // self.num_experts_per_tok
--++
--++    #     for i, end_idx in enumerate(tokens_per_expert):
--++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++    #         if start_idx == end_idx:
--++    #             continue
--++    #         expert = self.experts[i]
--++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++    #         expert_tokens = x[exp_token_idx]
--++    #         expert_out = expert(expert_tokens)
--++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++
--++    #     return expert_cache
--++    # @no_grad()
--++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++    #     expert_cache = ops.zeros_like(x)
--++
--++    #     # 排序保证顺序一致
--++    #     idxs = flat_expert_indices.argsort()
--++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++    #     token_idxs = idxs // self.num_experts_per_tok
--++
--++    #     # 找出有 token 的专家
--++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++
--++    #     for i in active_experts.tolist():
--++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++    #         end_idx = tokens_per_expert[i]
--++    #         if start_idx == end_idx:  # 没有 token
--++    #             continue
--++
--++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++    #         expert_tokens = x[exp_token_idx]
--++    #         expert_out = self.experts[i](expert_tokens)
--++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++
--++    #         expert_cache = mindspore.mint.scatter_add(
--++    #             expert_cache,
--++    #             0,
--++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++    #             expert_out
--++    #         )
--++
--++    #     return expert_cache
--++
--++
--+ 
--+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+ #     """
--+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+ 
--+         # Initialize weights and apply final processing
--+         self.post_init()
--++        self.warm_up = False
--++
--++    def warmup_moe_model_deep(self):
--++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++        test_texts = [
--++            "warmup short",
--++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++        ]
--++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++        if tokenizer is None:
--++            from mindnlp.transformers import AutoTokenizer
--++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++            self._warmup_tokenizer = tokenizer
--++
--++        for text in test_texts:
--++            inputs = tokenizer(text, return_tensors="ms")
--++            with mindspore._no_grad():
--++                _ = self(**inputs, use_cache=False)
--++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+ 
--+     def get_input_embeddings(self):
--+         return self.model.embed_tokens
--+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+         ```"""
--++        if not self.warm_up:
--++            self.warm_up = True
--++            self.warmup_moe_model_deep()
--++
--+         output_attentions = (
--+             output_attentions
--+             if output_attentions is not None
--+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+index 3cbf820e..d4c6b651 100644
--+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+@@ -18,7 +18,6 @@
--+ # See the License for the specific language governing permissions and
--+ # limitations under the License.
--+ """MindSpore Qwen2MoE model."""
--+-
--+ import math
--+ from typing import List, Optional, Tuple, Union
--+ 
--+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+     TokenClassifierOutput,
--+ )
--+ from ...modeling_utils import PreTrainedModel
--++from ...generation import GenerationMixin
--+ from ....utils import logging
--+ from .configuration_qwen2_moe import Qwen2MoeConfig
--+ 
--+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+         self.variance_epsilon = eps
--+ 
--+     def forward(self, hidden_states):
--++        # @dwj
--++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++        # @lwx
--++        # if not self.training :
--++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+         input_dtype = hidden_states.dtype
--+         hidden_states = hidden_states.to(mindspore.float32)
--+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+@@ -234,6 +239,8 @@ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+     x1 = x[..., : x.shape[-1] // 2]
--+     x2 = x[..., x.shape[-1] // 2 :]
--++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+         self.config = config
--+         self.hidden_size = config.hidden_size
--+         self.intermediate_size = intermediate_size
--++        
--+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+         self.act_fn = ACT2FN[config.hidden_act]
--+ 
--+     def forward(self, x):
--+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+-
--+ 
--++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++        # @lwx 
--++        # gate_up_output = self.gate_up_proj(x)
--++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++        # return self.down_proj(swiglu_output)
--++
--++    # def forward(self, x):
--++    #     gate_proj_out = self.gate_proj(x)
--++    #     up_proj_out = self.up_proj(x)
--++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++    #     return self.down_proj(swiglu_out)
--++        
--+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+     """
--+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+         use_cache: bool = False,
--+         cache_position: Optional[mindspore.Tensor] = None,
--+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++        
--++
--+         bsz, q_len, _ = hidden_states.shape
--+ 
--+         query_states = self.q_proj(hidden_states)
--+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+                     "with a layer index."
--+                 )
--+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++            if isinstance(past_key_value, StaticCache):
--++                kv_seq_len = key_states.shape[-2]
--++            else:
--++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+ 
--+         if past_key_value is not None:
--+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++            
--++            if isinstance(past_key_value, StaticCache):
--++                kv_seq_len = key_states.shape[-2]
--+ 
--+         # repeat k/v heads if n_kv_heads < n_heads
--+         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+-
--++        
--+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+ 
--+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+-            raise ValueError(
--+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+-                f" {attn_weights.shape}"
--+-            )
--+-
--+-        if attention_mask is not None:  # no matter the length, we just slice it
--+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++        if attention_mask is not None:
--++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+             attn_weights = attn_weights + causal_mask
--+ 
--+         # upcast attention to fp32
--+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+ 
--+         attn_output = self.o_proj(attn_output)
--+-
--++        # @lwx
--++        
--++        # max_seq_len = self.max_position_embeddings  # 2048
--++
--++        # if attention_mask is not None:
--++        #     # attention_mask: [B, 1, Sq, Sk]
--++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++
--++        #     # pad 到 [max_seq_len, max_seq_len]
--++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++        #     global_attention_mask = padded_mask
--++        # else:
--++        #     global_attention_mask = None
--++
--++
--++        # sparse_mode=3
--++        # attn_output = mindspore.ops.flash_attention_score(
--++        #     query=query_states,
--++        #     key=key_states,
--++        #     value=value_states,
--++        #     real_shift=None,
--++        #     padding_mask=None,
--++
--++        #     head_num=self.num_heads,
--++        #     attn_mask=global_attention_mask,
--++        #     keep_prob=1.0 - self.attention_dropout,
--++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++        #     input_layout="BNSD",
--++        #     pre_tokens=2147483647,
--++        #     next_tokens=2147483647,
--++        #     inner_precise=0,
--++        #     drop_mask=None,
--++        #     prefix=None,
--++        #     actual_seq_qlen=None,
--++        #     actual_seq_kvlen=None,
--++        #     sparse_mode=sparse_mode,
--++        # )
--+         if not output_attentions:
--+             attn_weights = None
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--+ 
--++class Qwen2MoeFlashAttention(nn.Module):
--++    """
--++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++
--++    关键改动:
--++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++       直接传入原始的 key 和 value 张量效率更高。
--++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++    """
--++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++        super().__init__()
--++        self.config = config
--++        self.layer_idx = layer_idx
--++        self.hidden_size = config.hidden_size
--++        self.num_heads = config.num_attention_heads
--++        self.head_dim = self.hidden_size // self.num_heads
--++        self.num_key_value_heads = config.num_key_value_heads
--++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++        self.max_position_embeddings = config.max_position_embeddings
--++        self.rope_theta = config.rope_theta
--++        self.attention_dropout = config.attention_dropout
--++
--++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++            raise ValueError(
--++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++            )
--++
--++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++
--++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++            self.head_dim,
--++            max_position_embeddings=self.max_position_embeddings,
--++            base=self.rope_theta,
--++        )
--++
--++    def forward(
--++        self,
--++        hidden_states: mindspore.Tensor,
--++        attention_mask: Optional[mindspore.Tensor] = None,
--++        position_ids: Optional[mindspore.Tensor] = None,
--++        past_key_value: Optional[Cache] = None,
--++        output_attentions: bool = False,
--++        use_cache: bool = False,
--++        cache_position: Optional[mindspore.Tensor] = None,
--++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++        bsz, q_len, _ = hidden_states.shape
--++
--++        # 1. 线性投射 Q, K, V
--++        query_states = self.q_proj(hidden_states)
--++        key_states = self.k_proj(hidden_states)
--++        value_states = self.v_proj(hidden_states)
--++
--++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++        # 3. RoPE 旋转位置编码
--++        kv_seq_len = key_states.shape[-2]
--++        if past_key_value is not None:
--++            if self.layer_idx is None:
--++                raise ValueError(
--++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++                    "with a layer index."
--++                )
--++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++                if cache_position.shape[0] == 1:
--++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++                    kv_seq_len = past_seen_tokens + 1
--++                else:
--++                    # prefill 阶段：cache_position 是范围，使用其长度
--++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++            else:
--++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++        # 4. KV 缓存更新
--++        if past_key_value is not None:
--++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++            key_states, value_states = past_key_value.update(
--++                key_states, value_states, self.layer_idx, cache_kwargs
--++            )
--++            
--++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++                if cache_position.shape[0] == 1:
--++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++                    kv_seq_len = key_states.shape[-2]
--++
--++        # 5. [重要] 准备 Attention Mask
--++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++        fa_attention_mask = None
--++        if attention_mask is not None:
--++            # 截取与当前key长度匹配的部分
--++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++            fa_attention_mask = (mask_slice != 0)
--++
--++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++        input_dtype = query_states.dtype
--++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++            query_states = query_states.to(mindspore.float16)
--++            key_states = key_states.to(mindspore.float16)
--++            value_states = value_states.to(mindspore.float16)
--++
--++        # 6. [核心] 调用 flash_attention_score 算子
--++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++        attn_output = mindspore.ops.flash_attention_score(
--++            query=query_states,
--++            key=key_states,
--++            value=value_states,
--++            head_num=self.num_heads, # 传入Q的头数(N1)
--++            attn_mask=fa_attention_mask,
--++            keep_prob=1.0 - self.attention_dropout,
--++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++            input_layout="BNSD",
--++            sparse_mode=0 # 使用 defaultMask 模式
--++        )
--++
--++        # 恢复原始数据类型
--++        attn_output = attn_output.to(input_dtype)
--++
--++        # 7. 调整输出形状
--++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++        attn_output = self.o_proj(attn_output)
--++
--++        # FlashAttention 算子不直接返回注意力权重矩阵
--++        attn_weights = None
--++        if output_attentions:
--++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++        return attn_output, attn_weights, past_key_value
--++
--++    # def forward(
--++    #     self,
--++    #     hidden_states: mindspore.Tensor,
--++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++    #     position_ids: Optional[mindspore.Tensor] = None,
--++    #     past_key_value: Optional[Cache] = None,
--++    #     output_attentions: bool = False,
--++    #     use_cache: bool = False,
--++    #     cache_position: Optional[mindspore.Tensor] = None,
--++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++    #     bsz, q_len, _ = hidden_states.shape
--++
--++    #     # 1. 线性投射 Q, K, V
--++    #     query_states = self.q_proj(hidden_states)
--++    #     key_states = self.k_proj(hidden_states)
--++    #     value_states = self.v_proj(hidden_states)
--++
--++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++    #     # 3. RoPE 旋转位置编码
--++    #     kv_seq_len = key_states.shape[-2]
--++    #     if past_key_value is not None:
--++    #         if self.layer_idx is None:
--++    #             raise ValueError(
--++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++    #                 "with a layer index."
--++    #             )
--++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++    #     # 4. KV 缓存更新
--++    #     if past_key_value is not None:
--++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++    #         key_states, value_states = past_key_value.update(
--++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++    #         )
--++
--++    #     # 5. 准备 Attention Mask
--++    #     fa_attention_mask = None
--++    #     if attention_mask is not None:
--++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++    #         fa_attention_mask = (mask_slice != 0)
--++
--++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++    #     input_dtype = query_states.dtype
--++
--++    #     # 6. [核心] 调用 flash_attention_score 算子
--++    #     attn_output = mindspore.ops.flash_attention_score(
--++    #         query=query_states,
--++    #         key=key_states,
--++    #         value=value_states,
--++    #         head_num=self.num_heads,
--++    #         attn_mask=fa_attention_mask,
--++    #         keep_prob=1.0 - self.attention_dropout,
--++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++    #         input_layout="BNSD",
--++    #         sparse_mode=0,
--++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++    #         inner_precise=1
--++    #     )
--++
--++    #     # 恢复原始数据类型
--++    #     attn_output = attn_output.to(input_dtype)
--++
--++    #     # 7. 调整输出形状
--++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++    #     attn_output = self.o_proj(attn_output)
--++
--++    #     attn_weights = None
--++    #     if output_attentions:
--++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++    #     return attn_output, attn_weights, past_key_value
--++
--++    # def forward(
--++    #     self,
--++    #     hidden_states: mindspore.Tensor,
--++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++    #     position_ids: Optional[mindspore.Tensor] = None,
--++    #     past_key_value: Optional[Cache] = None,
--++    #     output_attentions: bool = False,
--++    #     use_cache: bool = False,
--++    #     cache_position: Optional[mindspore.Tensor] = None,
--++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++    #     bsz, q_len, _ = hidden_states.shape
--++
--++    #     query_states = self.q_proj(hidden_states)
--++    #     key_states = self.k_proj(hidden_states)
--++    #     value_states = self.v_proj(hidden_states)
--++
--++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++    #     kv_seq_len = key_states.shape[-2]
--++    #     if past_key_value is not None:
--++    #         if self.layer_idx is None:
--++    #             raise ValueError("`layer_idx` must be specified for caching")
--++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++    #     if past_key_value is not None:
--++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++    #         key_states, value_states = past_key_value.update(
--++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++    #         )
--++
--++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++
--++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++    #     query_states = query_states / math.sqrt(self.head_dim)
--++    #     # <--- 修改结束 ---
--++
--++    #     fa_attention_mask = None
--++    #     if attention_mask is not None:
--++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++    #         fa_attention_mask = (mask_slice != 0)
--++
--++    #     input_dtype = query_states.dtype
--++
--++    #     attn_output = mindspore.ops.flash_attention_score(
--++    #         query=query_states,    # 传入已经预先缩放过的 query
--++    #         key=key_states,
--++    #         value=value_states,
--++    #         head_num=self.num_heads,
--++    #         attn_mask=fa_attention_mask,
--++    #         keep_prob=1.0 - self.attention_dropout,
--++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++    #         input_layout="BNSD",
--++    #         sparse_mode=0,
--++    #         inner_precise=1        # 仍然保持内部高精度计算
--++    #     )
--++
--++    #     attn_output = attn_output.to(input_dtype)
--++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++    #     attn_output = self.o_proj(attn_output)
--++
--++    #     attn_weights = None
--++    #     if output_attentions:
--++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++
--++    #     return attn_output, attn_weights, past_key_value
--++
--+ QWEN2MOE_ATTENTION_CLASSES = {
--+     "eager": Qwen2MoeAttention,
--++    "flash-attention": Qwen2MoeFlashAttention,
--+ }
--+ 
--+ 
--+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+ 
--++    #@dwj
--++    # 只遍历激活的专家，而非全部专家
--+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        hidden_states = hidden_states.view(-1, hidden_dim)
--+-        # router_logits: (batch * sequence_length, n_experts)
--+-        router_logits = self.gate(hidden_states)
--+-
--+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-        if self.norm_topk_prob:
--+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        # we cast back to the input dtype
--+-        routing_weights = routing_weights.to(hidden_states.dtype)
--+-
--+-        final_hidden_states = ops.zeros(
--+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+-        )
--+-
--+-        # One hot encode the selected experts to create an expert mask
--+-        # this will be used to easily index which expert is going to be sollicitated
--+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+-
--+-        # Loop over all available experts in the model and perform the computation on each expert
--+-        for expert_idx in range(self.num_experts):
--+-            expert_layer = self.experts[expert_idx]
--+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+-
--+-            # Index the correct hidden states and compute the expert hidden state for
--+-            # the current expert. We need to make sure to multiply the output hidden
--+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+-            if 0 not in idx.shape:
--+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+-
--+-                # However `index_add_` only support torch tensors for indexing so we'll use
--+-                # the `top_x` tensor here.
--+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+-
--+-        shared_expert_output = self.shared_expert(hidden_states)
--+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+-
--+-        final_hidden_states = final_hidden_states + shared_expert_output
--++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++            num_tokens = hidden_states_reshaped.shape[0]
--++
--++            router_logits = self.gate(hidden_states_reshaped)
--++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++
--++            if self.norm_topk_prob:
--++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++            routing_weights = routing_weights.to(hidden_states.dtype)
--++            
--++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++            flat_selected_experts = selected_experts.flatten()
--++            
--++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++            token_indices = broadcasted_token_indices.flatten()
--++            
--++            active_experts = ops.unique(flat_selected_experts)
--++            
--++            for expert_idx_tensor in active_experts:
--++                expert_idx = expert_idx_tensor.item()
--++                expert_layer = self.experts[expert_idx]
--++                
--++                mask = (flat_selected_experts == expert_idx_tensor)
--++                selected_token_indices = token_indices[mask]
--++                selected_routing_weights = routing_weights.flatten()[mask]
--++                
--++                current_states = hidden_states_reshaped[selected_token_indices]
--++                
--++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++                
--++                final_hidden_states = final_hidden_states.index_add(
--++                    dim=0,
--++                    index=selected_token_indices,
--++                    source=expert_output.to(hidden_states.dtype)
--++                )
--++            
--++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+ 
--+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+-        return final_hidden_states, router_logits
--++            final_hidden_states = final_hidden_states + shared_expert_output
--++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++            
--++            return final_hidden_states, router_logits
--+ 
--+ 
--+ class Qwen2MoeDecoderLayer(nn.Module):
--+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+ 
--+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+ 
--++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++
--+         if (layer_idx not in config.mlp_only_layers) and (
--+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+         ):
--+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+     _skip_keys_device_placement = "past_key_values"
--+     _supports_cache_class = True
--++#lwx
--++    # _supports_static_cache = True
--+ 
--+     def _init_weights(self, module):
--+         std = self.config.initializer_range
--+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+         return causal_mask
--+ 
--+ 
--+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+     _tied_weights_keys = ["lm_head.weight"]
--+ 
--+     def __init__(self, config):
--+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+         self.num_experts_per_tok = config.num_experts_per_tok
--+         # Initialize weights and apply final processing
--+         self.post_init()
--++        # @lwx
--++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++        #     self.generation_config.cache_implementation = "static"
--++        self._warmed_up = False
--++
--++    def warmup_moe_model(self):
--++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++        test_texts = [
--++            "warmup short",
--++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++        ]
--++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++        if tokenizer is None:
--++            from mindnlp.transformers import AutoTokenizer
--++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++            self._warmup_tokenizer = tokenizer
--++
--++        for text in test_texts:
--++            inputs = tokenizer(text, return_tensors="ms")
--++            with mindspore._no_grad():
--++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+ 
--+     def get_input_embeddings(self):
--+         return self.model.embed_tokens
--+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+         ```"""
--++        if not self._warmed_up:
--++            self._warmed_up = True
--++            self.warmup_moe_model()
--+ 
--+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+         output_router_logits = (
--+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+             }
--+         )
--+         return model_inputs
--++# @lwx
--++    # def _decode_one_tokens_logits(
--++    #     self,
--++    #     cur_token: mindspore.Tensor,
--++    #     input_pos: Optional[mindspore.Tensor],
--++    #     cache_position: mindspore.Tensor,
--++    #     past_key_values: StaticCache,
--++    # ) -> mindspore.Tensor:
--++    #     """
--++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++        
--++    #     Args:
--++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++    #         input_pos: 输入位置信息，可选
--++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++            
--++    #     Returns:
--++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++    #     """
--++    #     # 调用JIT编译的版本
--++    #     return self.get_decode_one_tokens_logits(
--++    #         cur_token=cur_token,
--++    #         input_pos=input_pos,
--++    #         cache_position=cache_position,
--++    #         past_key_values=past_key_values,
--++    #     )
--++    
--++    # @mindspore.jit(jit_level='O1')
--++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++    #     """
--++    #     JIT编译的函数，用于高效的单token解码
--++    #     使用JIT编译优化以支持静态shape和高效执行
--++        
--++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++    #     """
--++    #     outputs = self.model.forward(
--++    #         input_ids=cur_token,
--++    #         position_ids=input_pos,
--++    #         cache_position=cache_position,
--++    #         past_key_values=past_key_values,
--++    #         use_cache=True,
--++    #         return_dict=False,
--++    #     )
--++        
--++    #     hidden_states = outputs[0]
--++    #     logits = self.lm_head.forward(hidden_states)
--++    #     logits = logits.float()
--++        
--++    #     return logits[:, -1, :]
--++
--++    # def _sample(
--++    #     self,
--++    #     input_ids: mindspore.Tensor,
--++    #     logits_processor,
--++    #     stopping_criteria,
--++    #     generation_config,
--++    #     synced_devices: bool,
--++    #     streamer=None,
--++    #     logits_warper=None,
--++    #     **model_kwargs,
--++    # ):
--++    #     """
--++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++    #     """
--++    #     from ...generation.logits_process import LogitsProcessorList
--++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++    #     from mindnlp.core import nn, ops, no_grad
--++    #     import numpy as np
--++        
--++    #     # 检查是否使用 StaticCache
--++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++    #     # 否则，直接调用父类方法
--++    #     past_key_values = model_kwargs.get("past_key_values")
--++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++
--++    #     if not isinstance(past_key_values, StaticCache):
--++    #         # 不使用 StaticCache，直接调用父类方法
--++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++    #         return super()._sample(
--++    #             input_ids=input_ids,
--++    #             logits_processor=logits_processor,
--++    #             stopping_criteria=stopping_criteria,
--++    #             generation_config=generation_config,
--++    #             synced_devices=synced_devices,
--++    #             streamer=streamer,
--++    #             logits_warper=logits_warper,
--++    #             **model_kwargs,
--++    #         )
--++        
--++    #     # 使用 StaticCache，进入自定义循环
--++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++    #     pad_token_id = generation_config._pad_token_tensor
--++    #     output_attentions = generation_config.output_attentions
--++    #     output_hidden_states = generation_config.output_hidden_states
--++    #     output_scores = generation_config.output_scores
--++    #     output_logits = generation_config.output_logits
--++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++    #     max_length = generation_config.max_length
--++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++    #     do_sample = generation_config.do_sample
--++        
--++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++    #         raise ValueError(
--++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++    #             f"{logits_warper})."
--++    #         )
--++        
--++    #     # init attention / hidden states / scores tuples
--++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++        
--++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++    #         encoder_hidden_states = (
--++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++    #         )
--++        
--++    #     # keep track of which sequences are already finished
--++    #     batch_size, cur_len = input_ids.shape
--++    #     this_peer_finished = False
--++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++        
--++    #     time_record = []
--++    #     from ....utils.testing_utils import parse_flag_from_env
--++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++        
--++    #     while self._has_unfinished_sequences(
--++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++    #     ):
--++    #         if _record_time:
--++    #             import time as time_module
--++    #             infer_start = time_module.time()
--++            
--++    #         # prepare model inputs
--++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++            
--++    #         # prepare variable output controls
--++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++            
--++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++    #         cur_cache_position = model_inputs.get("cache_position")
--++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++    #         cur_input_ids = model_inputs.get("input_ids")
--++            
--++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++    #             cur_cache_position is not None and 
--++    #             len(cur_cache_position.shape) > 0 and
--++    #             cur_cache_position.shape[0] == 1 and
--++    #             cur_input_ids is not None and
--++    #             cur_input_ids.shape[1] == 1):
--++    #             # 使用 JIT 优化的单 token 解码
--++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++    #             if not hasattr(self, '_jit_used'):
--++    #                 self._jit_used = False
--++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++                
--++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++    #                 cur_token=cur_input_ids,
--++    #                 input_pos=model_inputs.get("position_ids"),
--++    #                 cache_position=cur_cache_position,
--++    #                 past_key_values=cur_past_key_values,
--++    #             )
--++                
--++    #             # 标记已使用JIT（用于后续判断）
--++    #             if not self._jit_used:
--++    #                 self._jit_used = True
--++                
--++    #             # 构造兼容的输出对象
--++    #             class JitOptimizedOutput:
--++    #                 def __init__(self, logits, config):
--++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++    #                     self.config = config
--++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++    #                     self.cross_attentions = None
--++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++                
--++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++    #         else:
--++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++    #             outputs = self(**model_inputs, return_dict=True)
--++            
--++    #         if synced_devices and this_peer_finished:
--++    #             continue
--++            
--++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++    #         next_token_logits = outputs.logits[:, -1, :]
--++            
--++    #         # pre-process distribution
--++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++    #         if do_sample:
--++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++            
--++    #         # Store scores, attentions and hidden_states when required
--++    #         if return_dict_in_generate:
--++    #             if output_scores:
--++    #                 scores += (next_token_scores,)
--++    #             if output_logits:
--++    #                 raw_logits += (next_token_logits,)
--++    #             if output_attentions:
--++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++    #                 if self.config.is_encoder_decoder:
--++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++                
--++    #             if output_hidden_states:
--++    #                 hidden = (
--++    #                     outputs.decoder_hidden_states
--++    #                     if self.config.is_encoder_decoder
--++    #                     else outputs.hidden_states
--++    #                 )
--++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++            
--++    #         # token selection
--++    #         if do_sample:
--++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++    #         else:
--++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++            
--++    #         # finished sentences should have their next token be a padding token
--++    #         if has_eos_stopping_criteria:
--++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++            
--++    #         # update generated ids, model inputs, and length for next step
--++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++    #         if streamer is not None:
--++    #             streamer.put(next_tokens)
--++            
--++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++    #             outputs,
--++    #             model_kwargs,
--++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++    #         )
--++            
--++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++    #         cur_len += 1
--++            
--++    #         if _record_time:
--++    #             import time as time_module
--++    #             infer_stop = time_module.time()
--++    #             time_record.append(infer_stop - infer_start)
--++            
--++    #         del outputs
--++        
--++    #     average_infer_time = None
--++    #     if time_record:
--++    #         if len(time_record) > 1:
--++    #             time_record.pop(0)
--++    #         average_infer_time = sum(time_record) / len(time_record)
--++    #         print(f'average inference time is: {average_infer_time}')
--++    #         print(f'inference time record: {time_record}')
--++        
--++    #     if streamer is not None:
--++    #         streamer.end()
--++        
--++    #     # 简单判断：打印是否使用了JIT路径
--++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++    #     else:
--++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++        
--++    #     if return_dict_in_generate:
--++    #         if self.config.is_encoder_decoder:
--++    #             return GenerateEncoderDecoderOutput(
--++    #                 sequences=input_ids,
--++    #                 scores=scores,
--++    #                 logits=raw_logits,
--++    #                 encoder_attentions=encoder_attentions,
--++    #                 encoder_hidden_states=encoder_hidden_states,
--++    #                 decoder_attentions=decoder_attentions,
--++    #                 cross_attentions=cross_attentions,
--++    #                 decoder_hidden_states=decoder_hidden_states,
--++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++    #                 average_infer_time=average_infer_time
--++    #             )
--++    #         else:
--++    #             return GenerateDecoderOnlyOutput(
--++    #                 sequences=input_ids,
--++    #                 scores=scores,
--++    #                 logits=raw_logits,
--++    #                 attentions=decoder_attentions,
--++    #                 hidden_states=decoder_hidden_states,
--++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++    #                 average_infer_time=average_infer_time
--++    #             )
--++    #     else:
--++    #         return input_ids
--++            
--++    # def _prepare_cache_for_generation(
--++    #     self,
--++    #     generation_config,
--++    #     model_kwargs,
--++    #     assistant_model,
--++    #     batch_size,
--++    #     max_cache_length,
--++    # ):
--++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++    #         generation_config.cache_implementation = "static"
--++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++        
--++    #     if generation_config.cache_implementation == "static":
--++    #         base_required_from_max_length = generation_config.max_length + 1
--++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++    #         min_cache_size = 50
--++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++    #         else:
--++    #             max_cache_length = max(base_required, min_cache_size)
--++            
--++    #         original_max_cache_length = max_cache_length
--++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++            
--++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++    #             if max_cache_length > self.config.max_position_embeddings:
--++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++        
--++    #     result = super()._prepare_cache_for_generation(
--++    #         generation_config=generation_config,
--++    #         model_kwargs=model_kwargs,
--++    #         assistant_model=assistant_model,
--++    #         batch_size=batch_size,
--++    #         max_cache_length=max_cache_length,
--++    #     )
--++        
--++    #     if generation_config.cache_implementation == "static":
--++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++    #         created_cache = model_kwargs.get(cache_name)
--++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++    #             if created_cache.max_cache_len < generation_config.max_length:
--++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++        
--++    #     return result
--++
--++
--++
--+ 
--+ 
--+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
-diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
-deleted file mode 100644
-index 179a9bb5..00000000
---- a/patches/0003-20261106secondcommit.patch
-+++ /dev/null
-@@ -1,2769 +0,0 @@
--From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Thu, 6 Nov 2025 14:54:37 +0800
--Subject: [PATCH 3/8] 20261106secondcommit
--
-----
-- .../models/deepseek/modeling_deepseek.py      |  217 ++-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
-- patches/0001-20251104commit.patch             | 1272 -----------------
-- 3 files changed, 528 insertions(+), 2032 deletions(-)
-- delete mode 100644 patches/0001-20251104commit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index 73773c22..2f9192bf 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
-- 
-- _CONFIG_FOR_DOC = "DeepseekConfig"
-- 
--+_attn_mask_cache = {}
--+
--+def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--+    q_len = batch_and_seq[1]
--+    kv_len = batch_and_seq[1] + past_key_values_length 
--+    key = (batch_and_seq[0], q_len, kv_len)
--+
--+    if key in _attn_mask_cache:
--+        return _attn_mask_cache[key]
--+
--+    mask = _prepare_4d_causal_attention_mask(
--+        attention_mask,
--+        batch_and_seq,
--+        inputs_embeds,
--+        past_key_values_length,
--+    )
--+    _attn_mask_cache[key] = mask
--+    return mask
-- 
-- def _get_unpad_data(attention_mask):
--     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--         return final_output
-- 
-- 
---    @no_grad()
---    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
---        expert_cache = ops.zeros_like(x)
---        idxs = flat_expert_indices.argsort()
---        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---        token_idxs = idxs // self.num_experts_per_tok
---
---        for i, end_idx in enumerate(tokens_per_expert):
---            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---            if start_idx == end_idx:
---                continue
---            expert = self.experts[i]
---            exp_token_idx = token_idxs[start_idx:end_idx]
---            expert_tokens = x[exp_token_idx]
---            expert_out = expert(expert_tokens)
---            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
---            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
---
---        return expert_cache
---        
--     # @no_grad()
---    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---    #     # expert_cache = torch.zeros_like(x)
---    #     # idxs = flat_expert_indices.argsort()
---    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
---    #     # token_idxs = idxs // self.num_experts_per_tok
---    #     # for i, end_idx in enumerate(tokens_per_expert):
---    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
---    #     #     if start_idx == end_idx:
---    #     #         continue
---    #     #     expert = self.experts[i]
---    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
---    #     #     expert_tokens = x[exp_token_idx]
---    #     #     expert_out = expert(expert_tokens)
---    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
---    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
---    #     # return expert_cache
--+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--     #     expert_cache = ops.zeros_like(x)
--     #     idxs = flat_expert_indices.argsort()
--     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
-- 
--     #     return expert_cache
---    # @no_grad()
---    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---    #     expert_cache = ops.zeros_like(x)
--+        
--+    @no_grad()
--+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+        """
--+        优化版 MoE prefill：
--+        - 批量张量化处理同一个 expert 的所有 token
--+        - 跳过无 token 的专家
--+        - 保持结果完全一致
--+        """
--+        # 初始化输出缓存
--+        expert_cache = ops.zeros_like(x)
-- 
---    #     # 排序保证顺序一致
---    #     idxs = flat_expert_indices.argsort()
---    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---    #     token_idxs = idxs // self.num_experts_per_tok
--+        # 排序（确保 scatter_add 位置对应原逻辑）
--+        idxs = flat_expert_indices.argsort()
--+        sorted_expert_indices = flat_expert_indices[idxs]
--+        sorted_token_indices = idxs // self.num_experts_per_tok
-- 
---    #     # 找出有 token 的专家
---    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+        # 每个 expert 的 token 数
--+        tokens_per_expert = sorted_expert_indices.bincount()
-- 
---    #     for i in active_experts.tolist():
---    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---    #         end_idx = tokens_per_expert[i]
---    #         if start_idx == end_idx:  # 没有 token
---    #             continue
--+        # 找出有 token 的专家
--+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
-- 
---    #         exp_token_idx = token_idxs[start_idx:end_idx]
---    #         expert_tokens = x[exp_token_idx]
---    #         expert_out = self.experts[i](expert_tokens)
---    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+        for expert_id in active_experts.tolist():
--+            # 取该 expert 对应的排序后 token 区间
--+            start = (tokens_per_expert[:expert_id]).sum().item()
--+            end = start + tokens_per_expert[expert_id].item()
-- 
---    #         expert_cache = mindspore.mint.scatter_add(
---    #             expert_cache,
---    #             0,
---    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
---    #             expert_out
---    #         )
--+            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--+            expert_tokens = x[token_idx]                     # 取输入向量
-- 
---    #     return expert_cache
--+            # 执行专家 MLP
--+            expert_out = self.experts[expert_id](expert_tokens)
--+
--+            # 按权重缩放
--+            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--+
--+            # 回写到缓存（等价 scatter_add）
--+            expert_cache = mindspore.mint.scatter_add(
--+                expert_cache,
--+                0,
--+                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+                scaled_out
--+            )
--+
--+        return expert_cache
--+
--+        # @no_grad()
--+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+        #     # expert_cache = torch.zeros_like(x)
--+        #     # idxs = flat_expert_indices.argsort()
--+        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+        #     # token_idxs = idxs // self.num_experts_per_tok
--+        #     # for i, end_idx in enumerate(tokens_per_expert):
--+        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+        #     #     if start_idx == end_idx:
--+        #     #         continue
--+        #     #     expert = self.experts[i]
--+        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+        #     #     expert_tokens = x[exp_token_idx]
--+        #     #     expert_out = expert(expert_tokens)
--+        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+        #     # return expert_cache
--+        #     expert_cache = ops.zeros_like(x)
--+        #     idxs = flat_expert_indices.argsort()
--+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+        #     token_idxs = idxs // self.num_experts_per_tok
--+
--+        #     for i, end_idx in enumerate(tokens_per_expert):
--+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+        #         if start_idx == end_idx:
--+        #             continue
--+        #         expert = self.experts[i]
--+        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+        #         expert_tokens = x[exp_token_idx]
--+        #         expert_out = expert(expert_tokens)
--+        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+
--+        #     return expert_cache
--+        # @no_grad()
--+        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+        #     expert_cache = ops.zeros_like(x)
--+
--+        #     # 排序保证顺序一致
--+        #     idxs = flat_expert_indices.argsort()
--+        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+        #     token_idxs = idxs // self.num_experts_per_tok
--+
--+        #     # 找出有 token 的专家
--+        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+
--+        #     for i in active_experts.tolist():
--+        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+        #         end_idx = tokens_per_expert[i]
--+        #         if start_idx == end_idx:  # 没有 token
--+        #             continue
--+
--+        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+        #         expert_tokens = x[exp_token_idx]
--+        #         expert_out = self.experts[i](expert_tokens)
--+        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+
--+        #         expert_cache = mindspore.mint.scatter_add(
--+        #             expert_cache,
--+        #             0,
--+        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+        #             expert_out
--+        #         )
--+
--+        #     return expert_cache
-- 
-- 
-- 
--@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
-- 
--         return attn_output, attn_weights, past_key_value
-- 
---
-- # class DeepseekFlashAttention(nn.Module):
-- #     """
-- #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
-- 
--         return attn_output, attn_weights, past_key_value
-- 
--+
-- Deepseek_ATTENTION_CLASSES = {
--     "eager": DeepseekAttention,
--     "flash-attention": DeepseekFlashAttention,
--@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--             )
--         else:
--             # 4d mask is passed through the layers
---            attention_mask = _prepare_4d_causal_attention_mask(
--+            # attention_mask = _prepare_4d_causal_attention_mask(
--+            #     attention_mask,
--+            #     (batch_size, seq_length),
--+            #     inputs_embeds,
--+            #     past_key_values_length,
--+            # )
--+            #@dwj
--+            attention_mask = get_cached_causal_mask(
--                 attention_mask,
--                 (batch_size, seq_length),
--                 inputs_embeds,
--@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--         # Initialize weights and apply final processing
--         self.post_init()
--         self.warm_up = False
--+        #@dwj
--+        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--+            self.num_layers,
--+            self.num_attention_heads,
--+            self.head_dim,
--+            batch_size=1,
--+            max_length=self.max_length,
--+            dtype=mindspore.float16
--+        )
--+
--+    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--+        key_cache = []
--+        value_cache = []
--+        for _ in range(num_layers):
--+            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+            key_cache.append(k)
--+            value_cache.append(v)
--+        return key_cache, value_cache
--+
-- 
--     def warmup_moe_model_deep(self):
--         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index bced285c..ebd7782e 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
-- _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
-- _CONFIG_FOR_DOC = "Qwen2MoeConfig"
-- 
---Long_Prompt = False
---PROMPT_LENGTH_THRESHOLD = 128
--+Long_Prompt = 1
--+LONG_PROMPT_LENGTH_THRESHOLD = 128
--+SHORT_PROMPT_LENGTH_THRESHOLD = 32
--+
--+_causal_mask_cache = {}
--+
--+def get_cached_causal_mask_with_cache_position(
--+    attention_mask: mindspore.Tensor,
--+    sequence_length: int,
--+    target_length: int,
--+    dtype: mindspore.dtype,
--+    min_dtype: float,
--+    cache_position: mindspore.Tensor,
--+    batch_size: int,
--+):
--+    """
--+    带缓存的 causal mask 构造函数
--+    """
--+    # q_len 是当前 query 长度
--+    q_len = sequence_length
--+    # kv_len 是 target_length
--+    kv_len = target_length
--+
--+    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--+    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--+
--+    if key in _causal_mask_cache:
--+        return _causal_mask_cache[key]
--+
--+    # 调用原来的 mask 构造逻辑
--+    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+        attention_mask,
--+        sequence_length=sequence_length,
--+        target_length=target_length,
--+        dtype=dtype,
--+        min_dtype=min_dtype,
--+        cache_position=cache_position,
--+        batch_size=batch_size,
--+    )
--+    # 缓存结果
--+    _causal_mask_cache[key] = causal_mask
--+    return causal_mask
-- 
-- # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
-- def _prepare_4d_causal_attention_mask_with_cache_position(
--@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
-- 
-- 
-- # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--+# class Qwen2MoeAttention(nn.Module):
--+#     """
--+#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+#     and "Generating Long Sequences with Sparse Transformers".
--+#     """
--+
--+#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+#         super().__init__()
--+#         self.config = config
--+#         self.layer_idx = layer_idx
--+#         if layer_idx is None:
--+#             logger.warning_once(
--+#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+#                 "when creating this class."
--+#             )
--+
--+#         self.hidden_size = config.hidden_size
--+#         self.num_heads = config.num_attention_heads
--+#         self.head_dim = self.hidden_size // self.num_heads
--+#         self.num_key_value_heads = config.num_key_value_heads
--+#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+#         self.max_position_embeddings = config.max_position_embeddings
--+#         self.rope_theta = config.rope_theta
--+#         self.is_causal = True
--+#         self.attention_dropout = config.attention_dropout
--+
--+#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+#             raise ValueError(
--+#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+#                 f" and `num_heads`: {self.num_heads})."
--+#             )
--+#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+
--+#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+#             self.head_dim,
--+#             max_position_embeddings=self.max_position_embeddings,
--+#             base=self.rope_theta,
--+#         )
--+
--+#     def forward(
--+#         self,
--+#         hidden_states: mindspore.Tensor,
--+#         attention_mask: Optional[mindspore.Tensor] = None,
--+#         position_ids: Optional[mindspore.Tensor] = None,
--+#         past_key_value: Optional[Cache] = None,
--+#         output_attentions: bool = False,
--+#         use_cache: bool = False,
--+#         cache_position: Optional[mindspore.Tensor] = None,
--+#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+
--+        
--+
--+#         bsz, q_len, _ = hidden_states.shape
--+
--+#         query_states = self.q_proj(hidden_states)
--+#         key_states = self.k_proj(hidden_states)
--+#         value_states = self.v_proj(hidden_states)
--+
--+#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+
--+#         kv_seq_len = key_states.shape[-2]
--+#         if past_key_value is not None:
--+#             if self.layer_idx is None:
--+#                 raise ValueError(
--+#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+#                     "with a layer index."
--+#                 )
--+#             if isinstance(past_key_value, StaticCache):
--+#                 kv_seq_len = key_states.shape[-2]
--+#             else:
--+#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+
--+#         if past_key_value is not None:
--+#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+            
--+#             if isinstance(past_key_value, StaticCache):
--+#                 kv_seq_len = key_states.shape[-2]
--+
--+#         # repeat k/v heads if n_kv_heads < n_heads
--+#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+        
--+#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+
--+#         if attention_mask is not None:
--+#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+#             attn_weights = attn_weights + causal_mask
--+
--+#         # upcast attention to fp32
--+#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+#         attn_output = ops.matmul(attn_weights, value_states)
--+
--+#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+#             raise ValueError(
--+#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+#                 f" {attn_output.shape}"
--+#             )
--+
--+#         attn_output = ops.transpose(attn_output, 1, 2)
--+#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+
--+#         attn_output = self.o_proj(attn_output)
--+#         # @lwx
--+        
--+#         # max_seq_len = self.max_position_embeddings  # 2048
--+
--+#         # if attention_mask is not None:
--+#         #     # attention_mask: [B, 1, Sq, Sk]
--+#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+
--+#         #     # pad 到 [max_seq_len, max_seq_len]
--+#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+#         #     global_attention_mask = padded_mask
--+#         # else:
--+#         #     global_attention_mask = None
--+
--+
--+#         # sparse_mode=3
--+#         # attn_output = mindspore.ops.flash_attention_score(
--+#         #     query=query_states,
--+#         #     key=key_states,
--+#         #     value=value_states,
--+#         #     real_shift=None,
--+#         #     padding_mask=None,
--+
--+#         #     head_num=self.num_heads,
--+#         #     attn_mask=global_attention_mask,
--+#         #     keep_prob=1.0 - self.attention_dropout,
--+#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+#         #     input_layout="BNSD",
--+#         #     pre_tokens=2147483647,
--+#         #     next_tokens=2147483647,
--+#         #     inner_precise=0,
--+#         #     drop_mask=None,
--+#         #     prefix=None,
--+#         #     actual_seq_qlen=None,
--+#         #     actual_seq_kvlen=None,
--+#         #     sparse_mode=sparse_mode,
--+#         # )
--+#         if not output_attentions:
--+#             attn_weights = None
--+
--+#         return attn_output, attn_weights, past_key_value
--+
-- class Qwen2MoeAttention(nn.Module):
--     """
---    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
---    and "Generating Long Sequences with Sparse Transformers".
---    """
--+    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
-- 
--+    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--+    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--+    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--+
--+    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--+    """
--     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--         super().__init__()
--         self.config = config
--@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--         if layer_idx is None:
--             logger.warning_once(
--                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
---                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--                 "when creating this class."
--             )
-- 
--@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--         use_cache: bool = False,
--         cache_position: Optional[mindspore.Tensor] = None,
--     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---
--         
---
--+        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--         bsz, q_len, _ = hidden_states.shape
-- 
--         query_states = self.q_proj(hidden_states)
--         key_states = self.k_proj(hidden_states)
--         value_states = self.v_proj(hidden_states)
-- 
---        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
---        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
---        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
---
--+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+        
--         kv_seq_len = key_states.shape[-2]
--         if past_key_value is not None:
---            if self.layer_idx is None:
---                raise ValueError(
---                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---                    "with a layer index."
---                )
---            if isinstance(past_key_value, StaticCache):
---                kv_seq_len = key_states.shape[-2]
---            else:
---                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+        
--         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
-- 
--         if past_key_value is not None:
---            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+
--+        # --- 2. 动态调度核心注意力计算 ---
--+        global Long_Prompt
--+        if Long_Prompt >= 1:
--+            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--+            fa_attention_mask = None
--+            if attention_mask is not None:
--+                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+                fa_attention_mask = (mask_slice != 0)
--+
--+            attn_output = mindspore.ops.flash_attention_score(
--+                query=query_states,
--+                key=key_states,
--+                value=value_states,
--+                head_num=self.num_heads,
--+                attn_mask=fa_attention_mask,
--+                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--+                scalar_value=1.0 / math.sqrt(self.head_dim),
--+                input_layout="BNSD",
--+                sparse_mode=0,
--+                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--+            )
--             
---            if isinstance(past_key_value, StaticCache):
---                kv_seq_len = key_states.shape[-2]
--+            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+            attn_output = self.o_proj(attn_output)
--+            attn_weights = None
--+            if output_attentions:
--+                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
-- 
---        # repeat k/v heads if n_kv_heads < n_heads
---        key_states = repeat_kv(key_states, self.num_key_value_groups)
---        value_states = repeat_kv(value_states, self.num_key_value_groups)
---        
---        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+        else:
--+            # --- Eager Attention 路径 (用于短序列和解码) ---
--+            key_states = repeat_kv(key_states, self.num_key_value_groups)
--+            value_states = repeat_kv(value_states, self.num_key_value_groups)
--+            
--+            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
-- 
---        if attention_mask is not None:
---            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
---            attn_weights = attn_weights + causal_mask
--+            if attention_mask is not None:
--+                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+                attn_weights = attn_weights + causal_mask
-- 
---        # upcast attention to fp32
---        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
---        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
---        attn_output = ops.matmul(attn_weights, value_states)
--+            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+            attn_output = ops.matmul(attn_weights, value_states)
-- 
---        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
---            raise ValueError(
---                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
---                f" {attn_output.shape}"
---            )
--+            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+                raise ValueError(
--+                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--+                )
-- 
---        attn_output = ops.transpose(attn_output, 1, 2)
---        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+            attn_output = ops.transpose(attn_output, 1, 2)
--+            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+            attn_output = self.o_proj(attn_output)
-- 
---        attn_output = self.o_proj(attn_output)
---        # @lwx
--+            if not output_attentions:
--+                attn_weights = None
--         
---        # max_seq_len = self.max_position_embeddings  # 2048
---
---        # if attention_mask is not None:
---        #     # attention_mask: [B, 1, Sq, Sk]
---        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
---
---        #     # pad 到 [max_seq_len, max_seq_len]
---        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
---        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
---        #     global_attention_mask = padded_mask
---        # else:
---        #     global_attention_mask = None
---
---
---        # sparse_mode=3
---        # attn_output = mindspore.ops.flash_attention_score(
---        #     query=query_states,
---        #     key=key_states,
---        #     value=value_states,
---        #     real_shift=None,
---        #     padding_mask=None,
---
---        #     head_num=self.num_heads,
---        #     attn_mask=global_attention_mask,
---        #     keep_prob=1.0 - self.attention_dropout,
---        #     scalar_value=1.0 / math.sqrt(self.head_dim),
---        #     input_layout="BNSD",
---        #     pre_tokens=2147483647,
---        #     next_tokens=2147483647,
---        #     inner_precise=0,
---        #     drop_mask=None,
---        #     prefix=None,
---        #     actual_seq_qlen=None,
---        #     actual_seq_kvlen=None,
---        #     sparse_mode=sparse_mode,
---        # )
---        if not output_attentions:
---            attn_weights = None
---
--         return attn_output, attn_weights, past_key_value
-- 
---
-- # class Qwen2MoeFlashAttention(nn.Module):
-- #     """
-- #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
-- #             return final_hidden_states, router_logits
-- 
-- 
---# class Qwen2MoeSparseMoeBlock(nn.Module):
---#     """
---#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
---#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
---#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
---#     `_moe_infer_prefill` (用于长序列处理) 方法。
---#     """
---#     def __init__(self, config: Qwen2MoeConfig):
---#         super().__init__()
---#         self.num_experts = config.num_experts
---#         self.top_k = config.num_experts_per_tok
---#         self.norm_topk_prob = config.norm_topk_prob
---
---#         # 门控网络
---#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
---#         # 专家列表
---#         self.experts = nn.ModuleList(
---#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
---#         )
---#         # 共享专家
---#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
---
---#     @no_grad()
---#     def _moe_infer_decode(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         """
---#         【解码路径】针对 sequence_length=1 的极致优化。
---#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
---#         """
---#         batch_size, hidden_dim = hidden_states.shape
---        
---#         expert_outputs_list = [
---#             ops.cat([
---#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
---#             ], dim=0) 
---#             for i in range(batch_size)
---#         ]
---        
---#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
---#         # shape: (batch_size, top_k, hidden_dim)
---#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
---        
---#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
---#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
---        
---#         return moe_output.squeeze(1)
---
---#     @no_grad()
---#     def _moe_infer_prefill(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         """
---#         【预填充路径】针对 sequence_length > 1 的优化。
---#         按专家对 Token 进行分组，并进行批处理。
---#         """
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens = hidden_states.shape[0]
---#         flat_selected_experts = selected_experts.flatten()
---        
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---        
---#         active_experts = ops.unique(flat_selected_experts)
---        
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---            
---#             mask = (flat_selected_experts == expert_idx_tensor)
---#             selected_token_indices = token_indices[mask]
---#             selected_routing_weights = routing_weights.flatten()[mask]
---            
---#             current_states = hidden_states[selected_token_indices]
---            
---#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---            
---#             moe_output = moe_output.index_add(
---#                 dim=0,
---#                 index=selected_token_indices,
---#                 source=expert_output.to(hidden_states.dtype)
---#             )
---#         return moe_output
---
---#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---#         """
---#         顶层 forward 方法，作为智能分发器。
---#         """
---#         batch_size, sequence_length, hidden_dim = hidden_states.shape
---        
---#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---#         router_logits = self.gate(hidden_states_reshaped)
---#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---
---#         if self.norm_topk_prob:
---#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        
---#         routing_weights = routing_weights.to(hidden_states.dtype)
---        
---#         moe_output = None
---#         # 在推理时，根据序列长度选择最优路径
---#         if not self.training:
---#             if sequence_length == 1:
---#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
---#             else:
---#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
---#         else:
---#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
---#             raise NotImplementedError("Training path is not implemented.")
---
---#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
---#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
---#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
---        
---#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
---        
---#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
---        
---#         return final_hidden_states, router_logits
---
---
---# class Qwen2MoeSparseMoeBlock(nn.Module):
---#     """
---#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
---#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
---#     """
---#     def __init__(self, config: Qwen2MoeConfig):
---#         super().__init__()
---#         self.num_experts = config.num_experts
---#         self.top_k = config.num_experts_per_tok
---#         self.norm_topk_prob = config.norm_topk_prob
---
---#         # 门控网络
---#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
---#         # 专家列表
---#         self.experts = nn.ModuleList(
---#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
---#         )
---#         # 共享专家
---#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
---
---#     @no_grad()
---#     def _moe_infer_decode(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         batch_size, _ = hidden_states.shape
---#         expert_outputs_list = [
---#             ops.cat([
---#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
---#             ], dim=0) 
---#             for i in range(batch_size)
---#         ]
---#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
---#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
---#         return moe_output.squeeze(1)
---
---#     @no_grad()
---#     def _moe_infer_prefill(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens = hidden_states.shape[0]
---#         flat_selected_experts = selected_experts.flatten()
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---#         active_experts = ops.unique(flat_selected_experts)
---        
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---#             mask = (flat_selected_experts == expert_idx_tensor)
---#             selected_token_indices = token_indices[mask]
---#             selected_routing_weights = routing_weights.flatten()[mask]
---#             current_states = hidden_states[selected_token_indices]
---#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---#             moe_output = moe_output.index_add(
---#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
---#             )
---#         return moe_output
---
---#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---#         """
---#         顶层 forward 方法，作为智能分发器。
---#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
---#         """
---#         batch_size, sequence_length, hidden_dim = hidden_states.shape
---        
---#         # 1. 门控计算 (通用逻辑)
---#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---#         router_logits = self.gate(hidden_states_reshaped)
---#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---
---#         if self.norm_topk_prob:
---#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        
---#         routing_weights = routing_weights.to(hidden_states.dtype)
---        
---#         # 2. 智能分发到最优 MoE 路径
---#         moe_output = None
---#         if not self.training:
---#             if sequence_length == 1:
---#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
---#             else:
---#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
---#         else:
---#             raise NotImplementedError("Training path is not implemented.")
---
---#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
---#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
---#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
---#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
---        
---#         # 4. 合并 MoE 输出和共享专家输出
---#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
---#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
---        
---#         # 5. 恢复原始形状并返回
---#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
---        
---#         return final_hidden_states, router_logits
---
---# prefill fastest
---# class Qwen2MoeSparseMoeBlock(nn.Module):
---#     """
---#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
---#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
---#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
---#     """
---#     def __init__(self, config: Qwen2MoeConfig):
---#         super().__init__()
---#         self.num_experts = config.num_experts
---#         self.top_k = config.num_experts_per_tok
---#         self.norm_topk_prob = config.norm_topk_prob
---
---#         # 门控网络
---#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
---#         # 专家列表
---#         self.experts = nn.ModuleList(
---#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
---#         )
---#         # 共享专家
---#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
---
---#     @no_grad()
---#     def _moe_infer_dispatch(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         """
---#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
---#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
---#         """
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens, _ = hidden_states.shape
---        
---#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
---#         flat_selected_experts = selected_experts.flatten()
---#         flat_routing_weights = routing_weights.flatten()
---
---#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---
---#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
---#         active_experts = ops.unique(flat_selected_experts)
---        
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---            
---#             # 找到所有分配给该专家的 token
---#             mask = (flat_selected_experts == expert_idx_tensor)
---            
---#             # 使用 mask 选取对应的 token 和权重
---#             current_token_indices = token_indices[mask]
---#             current_routing_weights = flat_routing_weights[mask]
---#             current_hidden_states = hidden_states[current_token_indices]
---            
---#             # 对这些 token 进行批处理
---#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
---            
---#             # 使用 index_add 将结果精确地加回到对应位置
---#             moe_output = moe_output.index_add(
---#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
---#             )
---#         return moe_output
---
---#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---#         """
---#         顶层 forward 方法，作为智能分发器。
---#         """
---#         batch_size, sequence_length, hidden_dim = hidden_states.shape
---        
---#         # 1. 门控计算
---#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---#         router_logits = self.gate(hidden_states_reshaped)
---#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---
---#         if self.norm_topk_prob:
---#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        
---#         routing_weights = routing_weights.to(hidden_states.dtype)
---        
---#         # 2. 调用统一的 MoE 计算内核
---#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
---#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
---
---#         # 3. 统一处理共享专家
---#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
---#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
---        
---#         # 4. 合并输出
---#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
---        
---#         # 5. 恢复原始形状并返回
---#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
---        
---#         return final_hidden_states, router_logits
---
---
---# class Qwen2MoeSparseMoeBlock(nn.Module):
---#     """
---#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
---#     【最终高性能与高精度版】：
---#     1. 解码路径使用 bmm 算子以达到最大推理速度。
---#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
---#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
---#     3. 这样实现了速度和准确性的两全其美。
---#     """
---#     def __init__(self, config: Qwen2MoeConfig):
---#         super().__init__()
---#         self.num_experts = config.num_experts
---#         self.top_k = config.num_experts_per_tok
---#         self.norm_topk_prob = config.norm_topk_prob
---
---#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
---#         self.experts = nn.ModuleList(
---#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
---#         )
---#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
---
---#     @no_grad()
---#     def _moe_infer_decode(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         """
---#         【解码路径】极致优化版：bmm + 高精度累加。
---#         """
---#         original_dtype = hidden_states.dtype
---#         batch_size, _ = hidden_states.shape
---        
---#         expert_outputs_list = [
---#             ops.cat([
---#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
---#             ], dim=0) 
---#             for i in range(batch_size)
---#         ]
---#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
---
---#         # 在 float32 下执行 bmm，得到高精度结果
---#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
---        
---#         # 将高精度结果转换回原始数据类型
---#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
---        
---#         return moe_output
---
---#     @no_grad()
---#     def _moe_infer_prefill(
---#         self, 
---#         hidden_states: mindspore.Tensor, 
---#         selected_experts: mindspore.Tensor, 
---#         routing_weights: mindspore.Tensor
---#     ) -> mindspore.Tensor:
---#         """
---#         【预填充路径】与原始实现一致，结果精确。
---#         """
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens, _ = hidden_states.shape
---#         flat_selected_experts = selected_experts.flatten()
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---#         active_experts = ops.unique(flat_selected_experts)
---        
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---#             mask = (flat_selected_experts == expert_idx_tensor)
---#             selected_token_indices = token_indices[mask]
---#             selected_routing_weights = routing_weights.flatten()[mask]
---#             current_states = hidden_states[selected_token_indices]
---#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---#             moe_output = moe_output.index_add(
---#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
---#             )
---#         return moe_output
---
---#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---#         batch_size, sequence_length, hidden_dim = hidden_states.shape
---        
---#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---#         router_logits = self.gate(hidden_states_reshaped)
---#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---
---#         if self.norm_topk_prob:
---#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        
---#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
---#         # 如果模型主体是 float16，后续再转换
---        
---#         moe_output = None
---#         if not self.training:
---#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
---#             # _moe_infer_decode 内部会处理好类型转换
---#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
---#             if sequence_length == 1:
---#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
---#             else:
---#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
---#         else:
---#             raise NotImplementedError("Training path is not implemented.")
---
---#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
---#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
---        
---#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
---#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
---        
---#         return final_hidden_states, router_logits
---    
---
---# class Qwen2MoeSparseMoeBlock(nn.Module):
---#     """
---#     【融合版】一个混合专家模块，内置两种推理策略，
---#     由外部全局变量 `Long_Prompt` 控制：
---
---#     - if Long_Prompt is True:  【精度优先模式】
---#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
---#       适用于处理长序列，避免误差累积。
---
---#     - if Long_Prompt is False: 【速度优先模式】
---#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
---#       在解码阶段获得极致速度，同时保证结果高度准确。
---#     """
---#     def __init__(self, config: Qwen2MoeConfig):
---#         super().__init__()
---#         self.num_experts = config.num_experts
---#         self.top_k = config.num_experts_per_tok
---#         self.norm_topk_prob = config.norm_topk_prob
---
---#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
---#         self.experts = nn.ModuleList(
---#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
---#         )
---#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
---
---#     # --- 速度优先模式的辅助函数 ---
---#     @no_grad()
---#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
---#         original_dtype = hidden_states.dtype
---#         batch_size, _ = hidden_states.shape
---#         expert_outputs_list = [
---#             ops.cat([
---#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
---#             ], dim=0) 
---#             for i in range(batch_size)
---#         ]
---#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
---#         weights_fp32 = routing_weights.to(mindspore.float32)
---#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
---#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
---#         return moe_output_fp32.squeeze(1).to(original_dtype)
---
---#     @no_grad()
---#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens, _ = hidden_states.shape
---#         flat_selected_experts = selected_experts.flatten()
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---#         active_experts = ops.unique(flat_selected_experts)
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---#             mask = (flat_selected_experts == expert_idx_tensor)
---#             selected_token_indices = token_indices[mask]
---#             selected_routing_weights = routing_weights.flatten()[mask]
---#             current_states = hidden_states[selected_token_indices]
---#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
---#         return moe_output
---
---#     # --- 精度优先模式的辅助函数 ---
---#     @no_grad()
---#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
---#         moe_output = ops.zeros_like(hidden_states)
---#         num_tokens, _ = hidden_states.shape
---#         flat_selected_experts = selected_experts.flatten()
---#         flat_routing_weights = routing_weights.flatten()
---#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
---#         active_experts = ops.unique(flat_selected_experts)
---#         for expert_idx_tensor in active_experts:
---#             expert_idx = expert_idx_tensor.item()
---#             expert_layer = self.experts[expert_idx]
---#             mask = (flat_selected_experts == expert_idx_tensor)
---#             current_token_indices = token_indices[mask]
---#             current_routing_weights = flat_routing_weights[mask]
---#             current_hidden_states = hidden_states[current_token_indices]
---#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
---#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
---#         return moe_output
---
---#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
---#         # 声明我们将要使用一个在模块外部定义的全局变量
---#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
---#         global Long_Prompt
---
---#         # 1. 门控计算 (所有模式通用)
---#         batch_size, sequence_length, hidden_dim = hidden_states.shape
---#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---#         router_logits = self.gate(hidden_states_reshaped)
---#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
---#         if self.norm_topk_prob:
---#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---        
---#         moe_output = None
---#         if not self.training:
---#             # 根据 Long_Prompt 标志选择模式
---#             if Long_Prompt:
---#                 # --- 精度优先模式 ---
---#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
---#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
---#             else:
---#                 # --- 速度优先模式 ---
---#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
---#                 if sequence_length == 1:
---#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
---#                 else:
---#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
---#         else:
---#             raise NotImplementedError("Training path is not implemented.")
---
---#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
---#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
---        
---#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
---#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
---        
---#         return final_hidden_states, router_logits
---    
-- class Qwen2MoeSparseMoeBlock(nn.Module):
--     """
--     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--         return moe_output_fp32.squeeze(1).to(original_dtype)
-- 
--+    # @no_grad()
--+    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+    #     num_tokens, _ = hidden_states.shape
--+    #     flat_selected_experts = selected_experts.flatten()
--+    #     sorted_expert_indices = flat_selected_experts.argsort()
--+    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+    #     original_token_indices = sorted_expert_indices // self.top_k
--+    #     moe_output = ops.zeros_like(hidden_states)
--+    #     current_token_offset = 0
--+    #     for i in range(self.num_experts):
--+    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--+    #         if expert_token_count == 0:
--+    #             continue
--+    #         end_offset = current_token_offset + expert_token_count
--+    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--+    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+    #         current_token_offset += expert_token_count
--+    #     return moe_output
--+
--     @no_grad()
--     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
---        num_tokens, _ = hidden_states.shape
---        flat_selected_experts = selected_experts.flatten()
---        sorted_expert_indices = flat_selected_experts.argsort()
---        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
---        original_token_indices = sorted_expert_indices // self.top_k
--+        """
--+        优化版 MoE prefill (速度优先模式)：
--+        - 批量张量化处理同一个 expert 的所有 token
--+        - 跳过无 token 的专家
--+        - 保持结果完全一致
--+        """
--         moe_output = ops.zeros_like(hidden_states)
---        current_token_offset = 0
---        for i in range(self.num_experts):
---            expert_token_count = tokens_per_expert[i] - current_token_offset
---            if expert_token_count == 0:
---                continue
---            end_offset = current_token_offset + expert_token_count
---            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
---            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
---            expert_hidden_states = hidden_states[expert_original_token_indices]
---            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
---            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
---            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
---            current_token_offset += expert_token_count
--+
--+        flat_selected_experts = selected_experts.flatten()
--+        flat_routing_weights = routing_weights.flatten()
--+
--+        idxs = flat_selected_experts.argsort()
--+        sorted_expert_indices = flat_selected_experts[idxs]
--+        sorted_token_indices = idxs // self.top_k
--+
--+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--+
--+        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+
--+        for expert_id in active_experts.tolist():
--+            start = int(tokens_per_expert[:expert_id].sum().item())
--+            end = start + int(tokens_per_expert[expert_id].item())
--+
--+            token_idx = sorted_token_indices[start:end]
--+            expert_tokens = hidden_states[token_idx]
--+
--+            expert_out = self.experts[expert_id](expert_tokens)
--+
--+            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--+
--+            moe_output = mindspore.mint.scatter_add(
--+                moe_output,
--+                0,
--+                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--+                scaled_out.to(hidden_states.dtype)
--+            )
--+
--         return moe_output
-- 
--+
--     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--     @no_grad()
--     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--         
--         moe_output = None
---        if Long_Prompt:
---            # --- 精度优先模式 (ACCURACY MODE) ---
---            routing_weights_casted = routing_weights.to(hidden_states.dtype)
---            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        # if Long_Prompt==0:
--+        #     # --- 精度优先模式 (ACCURACY MODE) ---
--+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        # else:
--+        #     # --- 速度优先模式 (SPEED MODE) ---
--+        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+        #     if sequence_length == 1:
--+        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        #     else:
--+        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        
--+        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+        if sequence_length == 1:
--+            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--         else:
---            # --- 速度优先模式 (SPEED MODE) ---
---            routing_weights_casted = routing_weights.to(hidden_states.dtype)
---            if sequence_length == 1:
---                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
---            else:
---                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
---        
--+            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+    
-- 
--         # 3. 共享专家计算与合并 (所有模式通用)
--         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         
--         return final_hidden_states, router_logits
-- 
--+
-- class Qwen2MoeDecoderLayer(nn.Module):
--     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--         super().__init__()
--         self.hidden_size = config.hidden_size
--         
---        # if Long_Prompt:
---        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
---        # else:
--+        # if Long_Prompt == 2:
--         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+        # else:
--+        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-- 
--         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
-- 
--@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--             )
-- 
--         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
---        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+        #     attention_mask,
--+        #     sequence_length=sequence_length,
--+        #     target_length=target_length,
--+        #     dtype=dtype,
--+        #     min_dtype=min_dtype,
--+        #     cache_position=cache_position,
--+        #     batch_size=input_tensor.shape[0],
--+        # )
--+        #@dwj
--+        causal_mask = get_cached_causal_mask_with_cache_position(
--             attention_mask,
--             sequence_length=sequence_length,
--             target_length=target_length,
--@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--         """
---        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--+        _causal_mask_cache.clear()
-- 
--         input_ids = kwargs.get("input_ids")
--         if input_ids is None and args:
--@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
-- 
--         if input_ids is not None:
--             prompt_length = input_ids.shape[1]
---            
---            if prompt_length > PROMPT_LENGTH_THRESHOLD:
---                Long_Prompt = True
--+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--+                Long_Prompt = 2
--+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--+                Long_Prompt = 0
--             else:
---                Long_Prompt = False
--+                Long_Prompt = 1
--+
-- 
--         return super().generate(*args, **kwargs)
--     
--@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--             dtype = self.lm_head.weight.dtype
--             min_dtype = float(ops.finfo(dtype).min)
-- 
---            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+            #     attention_mask,
--+            #     sequence_length=sequence_length,
--+            #     target_length=past_key_values.get_max_length(),
--+            #     dtype=dtype,
--+            #     min_dtype=min_dtype,
--+            #     cache_position=cache_position,
--+            #     batch_size=batch_size,
--+            # )
--+
--+            #@dwj
--+            attention_mask = get_cached_causal_mask_with_cache_position(
--                 attention_mask,
--                 sequence_length=sequence_length,
--                 target_length=past_key_values.get_max_length(),
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--deleted file mode 100644
--index 6dfb5b93..00000000
----- a/patches/0001-20251104commit.patch
--+++ /dev/null
--@@ -1,1272 +0,0 @@
---From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
---From: Pinoeer-kingxi <13022943007@163.com>
---Date: Tue, 4 Nov 2025 09:11:51 +0800
---Subject: [PATCH] 20251104commit
---
------
--- mindnlp/transformers/cache_utils.py           |  28 +-
--- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--- 3 files changed, 976 insertions(+), 87 deletions(-)
---
---diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
---index cadd2e04..02f8d4be 100644
------ a/mindnlp/transformers/cache_utils.py
---+++ b/mindnlp/transformers/cache_utils.py
---@@ -812,14 +812,26 @@ class StaticCache(Cache):
---             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
---             # k_out[:, :, cache_position] = key_states
---             # v_out[:, :, cache_position] = value_states
----            if ON_ORANGE_PI:
----                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
----                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
----            else:
----                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
----                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
----                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
----
---+            # if ON_ORANGE_PI:
---+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
---+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
---+            # else:
---+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
---+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
---+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
---+            # 确保 cache_position 是 1D tensor 并且类型正确
---+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
---+            if cache_position.ndim > 1:
---+                cache_position = cache_position.flatten()
---+            # 确保类型是 int32 或 int64（MindSpore 要求）
---+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
---+                cache_position = cache_position.int()
---+            
---+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
---+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
---+            k_out[:, :, cache_position] = key_states
---+            v_out[:, :, cache_position] = value_states
---+            
---         return k_out, v_out
--- 
---     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
---diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
---index c695b944..d8303e45 100644
------ a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
---+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
---@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--- # Copied from transformers.models.llama.modeling_llama.rotate_half
--- def rotate_half(x):
---     """Rotates half the hidden dims of the input."""
----    x1 = x[..., : x.shape[-1] // 2]
----    x2 = x[..., x.shape[-1] // 2 :]
---+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
---+    # x1 = x[..., : x.shape[-1] // 2]
---+    # x2 = x[..., x.shape[-1] // 2 :]
---+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
---     return ops.cat((-x2, x1), dim=-1)
--- 
--- 
---@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
---         if self.training:
---             raise NotImplementedError("Training is not supported yet.")
---         else:
----            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
----        if self.config.n_shared_experts is not None:
----            y = y + self.shared_experts(identity)
----        return y
---+            # @lwx
---+            if orig_shape[1] == 1:
---+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
---+                y=y.view(*orig_shape)
---+                if self.config.n_shared_experts is not None:
---+                    y = y + self.shared_experts(identity)
---+                return y
---+            else:
---+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
---+                if self.config.n_shared_experts is not None:
---+                    y = y + self.shared_experts(identity)
---+                return y
---+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
---+        # if self.config.n_shared_experts is not None:
---+        #     y = y + self.shared_experts(identity)
---+        # return y
---+        
---+    @no_grad()
---+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
---+
---+        expert_cache = ops.zeros_like(x)
---+        for i in range(self.num_experts_per_tok):
---+            expert_id = flat_expert_indices[i].item()
---+            weight = flat_expert_weights[i].item()
---+            expert = self.experts[expert_id]
---+            expert_out = expert(x)
---+            expert_cache += expert_out * weight
---+        return expert_cache
--- 
---     @no_grad()
----    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
----        # expert_cache = torch.zeros_like(x)
----        # idxs = flat_expert_indices.argsort()
----        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
----        # token_idxs = idxs // self.num_experts_per_tok
----        # for i, end_idx in enumerate(tokens_per_expert):
----        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
----        #     if start_idx == end_idx:
----        #         continue
----        #     expert = self.experts[i]
----        #     exp_token_idx = token_idxs[start_idx:end_idx]
----        #     expert_tokens = x[exp_token_idx]
----        #     expert_out = expert(expert_tokens)
----        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
----        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
----        # return expert_cache
---+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
---         expert_cache = ops.zeros_like(x)
---         idxs = flat_expert_indices.argsort()
---         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---         token_idxs = idxs // self.num_experts_per_tok
---+
---         for i, end_idx in enumerate(tokens_per_expert):
---             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---             if start_idx == end_idx:
---@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
---             expert_out = expert(expert_tokens)
---             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
---             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
---+
---         return expert_cache
---+        
---+    # @no_grad()
---+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---+    #     # expert_cache = torch.zeros_like(x)
---+    #     # idxs = flat_expert_indices.argsort()
---+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
---+    #     # token_idxs = idxs // self.num_experts_per_tok
---+    #     # for i, end_idx in enumerate(tokens_per_expert):
---+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
---+    #     #     if start_idx == end_idx:
---+    #     #         continue
---+    #     #     expert = self.experts[i]
---+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
---+    #     #     expert_tokens = x[exp_token_idx]
---+    #     #     expert_out = expert(expert_tokens)
---+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
---+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
---+    #     # return expert_cache
---+    #     expert_cache = ops.zeros_like(x)
---+    #     idxs = flat_expert_indices.argsort()
---+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---+    #     token_idxs = idxs // self.num_experts_per_tok
---+
---+    #     for i, end_idx in enumerate(tokens_per_expert):
---+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---+    #         if start_idx == end_idx:
---+    #             continue
---+    #         expert = self.experts[i]
---+    #         exp_token_idx = token_idxs[start_idx:end_idx]
---+    #         expert_tokens = x[exp_token_idx]
---+    #         expert_out = expert(expert_tokens)
---+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
---+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
---+
---+    #     return expert_cache
---+    # @no_grad()
---+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
---+    #     expert_cache = ops.zeros_like(x)
---+
---+    #     # 排序保证顺序一致
---+    #     idxs = flat_expert_indices.argsort()
---+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---+    #     token_idxs = idxs // self.num_experts_per_tok
---+
---+    #     # 找出有 token 的专家
---+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
---+
---+    #     for i in active_experts.tolist():
---+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---+    #         end_idx = tokens_per_expert[i]
---+    #         if start_idx == end_idx:  # 没有 token
---+    #             continue
---+
---+    #         exp_token_idx = token_idxs[start_idx:end_idx]
---+    #         expert_tokens = x[exp_token_idx]
---+    #         expert_out = self.experts[i](expert_tokens)
---+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
---+
---+    #         expert_cache = mindspore.mint.scatter_add(
---+    #             expert_cache,
---+    #             0,
---+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
---+    #             expert_out
---+    #         )
---+
---+    #     return expert_cache
---+
---+
--- 
--- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--- #     """
---@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--- 
---         # Initialize weights and apply final processing
---         self.post_init()
---+        self.warm_up = False
---+
---+    def warmup_moe_model_deep(self):
---+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
---+        test_texts = [
---+            "warmup short",
---+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
---+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
---+        ]
---+        tokenizer = getattr(self, "_warmup_tokenizer", None)
---+        if tokenizer is None:
---+            from mindnlp.transformers import AutoTokenizer
---+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
---+            self._warmup_tokenizer = tokenizer
---+
---+        for text in test_texts:
---+            inputs = tokenizer(text, return_tensors="ms")
---+            with mindspore._no_grad():
---+                _ = self(**inputs, use_cache=False)
---+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--- 
---     def get_input_embeddings(self):
---         return self.model.embed_tokens
---@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
---         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
---         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
---         ```"""
---+        if not self.warm_up:
---+            self.warm_up = True
---+            self.warmup_moe_model_deep()
---+
---         output_attentions = (
---             output_attentions
---             if output_attentions is not None
---diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
---index 3cbf820e..d4c6b651 100644
------ a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
---+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
---@@ -18,7 +18,6 @@
--- # See the License for the specific language governing permissions and
--- # limitations under the License.
--- """MindSpore Qwen2MoE model."""
----
--- import math
--- from typing import List, Optional, Tuple, Union
--- 
---@@ -36,6 +35,7 @@ from ...modeling_outputs import (
---     TokenClassifierOutput,
--- )
--- from ...modeling_utils import PreTrainedModel
---+from ...generation import GenerationMixin
--- from ....utils import logging
--- from .configuration_qwen2_moe import Qwen2MoeConfig
--- 
---@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
---         self.variance_epsilon = eps
--- 
---     def forward(self, hidden_states):
---+        # @dwj
---+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
---+        # @lwx
---+        # if not self.training :
---+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
---         input_dtype = hidden_states.dtype
---         hidden_states = hidden_states.to(mindspore.float32)
---         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
---@@ -234,6 +239,8 @@ def rotate_half(x):
---     """Rotates half the hidden dims of the input."""
---     x1 = x[..., : x.shape[-1] // 2]
---     x2 = x[..., x.shape[-1] // 2 :]
---+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
---+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
---     return ops.cat((-x2, x1), dim=-1)
--- 
--- 
---@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
---         self.config = config
---         self.hidden_size = config.hidden_size
---         self.intermediate_size = intermediate_size
---+        
---         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
---         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
---         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
---         self.act_fn = ACT2FN[config.hidden_act]
--- 
---     def forward(self, x):
----        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
----
--- 
---+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
---+        # @lwx 
---+        # gate_up_output = self.gate_up_proj(x)
---+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
---+        # return self.down_proj(swiglu_output)
---+
---+    # def forward(self, x):
---+    #     gate_proj_out = self.gate_proj(x)
---+    #     up_proj_out = self.up_proj(x)
---+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
---+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
---+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
---+    #     return self.down_proj(swiglu_out)
---+        
--- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
---     """
---@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
---         use_cache: bool = False,
---         cache_position: Optional[mindspore.Tensor] = None,
---     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---+
---+        
---+
---         bsz, q_len, _ = hidden_states.shape
--- 
---         query_states = self.q_proj(hidden_states)
---@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
---                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---                     "with a layer index."
---                 )
----            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---+            if isinstance(past_key_value, StaticCache):
---+                kv_seq_len = key_states.shape[-2]
---+            else:
---+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--- 
---         if past_key_value is not None:
---             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
---             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
---+            
---+            if isinstance(past_key_value, StaticCache):
---+                kv_seq_len = key_states.shape[-2]
--- 
---         # repeat k/v heads if n_kv_heads < n_heads
---         key_states = repeat_kv(key_states, self.num_key_value_groups)
---         value_states = repeat_kv(value_states, self.num_key_value_groups)
----
---+        
---         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--- 
----        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
----            raise ValueError(
----                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
----                f" {attn_weights.shape}"
----            )
----
----        if attention_mask is not None:  # no matter the length, we just slice it
----            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
---+        if attention_mask is not None:
---+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
---             attn_weights = attn_weights + causal_mask
--- 
---         # upcast attention to fp32
---@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
---         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--- 
---         attn_output = self.o_proj(attn_output)
----
---+        # @lwx
---+        
---+        # max_seq_len = self.max_position_embeddings  # 2048
---+
---+        # if attention_mask is not None:
---+        #     # attention_mask: [B, 1, Sq, Sk]
---+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
---+
---+        #     # pad 到 [max_seq_len, max_seq_len]
---+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
---+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
---+        #     global_attention_mask = padded_mask
---+        # else:
---+        #     global_attention_mask = None
---+
---+
---+        # sparse_mode=3
---+        # attn_output = mindspore.ops.flash_attention_score(
---+        #     query=query_states,
---+        #     key=key_states,
---+        #     value=value_states,
---+        #     real_shift=None,
---+        #     padding_mask=None,
---+
---+        #     head_num=self.num_heads,
---+        #     attn_mask=global_attention_mask,
---+        #     keep_prob=1.0 - self.attention_dropout,
---+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
---+        #     input_layout="BNSD",
---+        #     pre_tokens=2147483647,
---+        #     next_tokens=2147483647,
---+        #     inner_precise=0,
---+        #     drop_mask=None,
---+        #     prefix=None,
---+        #     actual_seq_qlen=None,
---+        #     actual_seq_kvlen=None,
---+        #     sparse_mode=sparse_mode,
---+        # )
---         if not output_attentions:
---             attn_weights = None
--- 
---         return attn_output, attn_weights, past_key_value
--- 
--- 
---+class Qwen2MoeFlashAttention(nn.Module):
---+    """
---+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
---+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
---+
---+    关键改动:
---+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
---+       直接传入原始的 key 和 value 张量效率更高。
---+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
---+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
---+    """
---+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
---+        super().__init__()
---+        self.config = config
---+        self.layer_idx = layer_idx
---+        self.hidden_size = config.hidden_size
---+        self.num_heads = config.num_attention_heads
---+        self.head_dim = self.hidden_size // self.num_heads
---+        self.num_key_value_heads = config.num_key_value_heads
---+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
---+        self.max_position_embeddings = config.max_position_embeddings
---+        self.rope_theta = config.rope_theta
---+        self.attention_dropout = config.attention_dropout
---+
---+        if (self.head_dim * self.num_heads) != self.hidden_size:
---+            raise ValueError(
---+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
---+            )
---+
---+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
---+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
---+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
---+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
---+
---+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
---+            self.head_dim,
---+            max_position_embeddings=self.max_position_embeddings,
---+            base=self.rope_theta,
---+        )
---+
---+    def forward(
---+        self,
---+        hidden_states: mindspore.Tensor,
---+        attention_mask: Optional[mindspore.Tensor] = None,
---+        position_ids: Optional[mindspore.Tensor] = None,
---+        past_key_value: Optional[Cache] = None,
---+        output_attentions: bool = False,
---+        use_cache: bool = False,
---+        cache_position: Optional[mindspore.Tensor] = None,
---+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---+
---+        bsz, q_len, _ = hidden_states.shape
---+
---+        # 1. 线性投射 Q, K, V
---+        query_states = self.q_proj(hidden_states)
---+        key_states = self.k_proj(hidden_states)
---+        value_states = self.v_proj(hidden_states)
---+
---+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
---+        # query:   [B, S, H*D] -> [B, N1, S, D]
---+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
---+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+
---+        # 3. RoPE 旋转位置编码
---+        kv_seq_len = key_states.shape[-2]
---+        if past_key_value is not None:
---+            if self.layer_idx is None:
---+                raise ValueError(
---+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---+                    "with a layer index."
---+                )
---+            # 对于 StaticCache，需要特殊处理 kv_seq_len
---+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
---+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
---+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
---+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
---+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
---+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
---+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
---+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
---+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
---+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
---+                if cache_position.shape[0] == 1:
---+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
---+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
---+                    kv_seq_len = past_seen_tokens + 1
---+                else:
---+                    # prefill 阶段：cache_position 是范围，使用其长度
---+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
---+            else:
---+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---+
---+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---+
---+        # 4. KV 缓存更新
---+        if past_key_value is not None:
---+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---+            key_states, value_states = past_key_value.update(
---+                key_states, value_states, self.layer_idx, cache_kwargs
---+            )
---+            
---+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
---+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
---+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
---+                if cache_position.shape[0] == 1:
---+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
---+                    kv_seq_len = key_states.shape[-2]
---+
---+        # 5. [重要] 准备 Attention Mask
---+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
---+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
---+        fa_attention_mask = None
---+        if attention_mask is not None:
---+            # 截取与当前key长度匹配的部分
---+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
---+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
---+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---+            # 转换为布尔类型: 大负数 -> True, 0 -> False
---+            fa_attention_mask = (mask_slice != 0)
---+
---+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
---+        input_dtype = query_states.dtype
---+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
---+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
---+            query_states = query_states.to(mindspore.float16)
---+            key_states = key_states.to(mindspore.float16)
---+            value_states = value_states.to(mindspore.float16)
---+
---+        # 6. [核心] 调用 flash_attention_score 算子
---+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
---+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
---+        attn_output = mindspore.ops.flash_attention_score(
---+            query=query_states,
---+            key=key_states,
---+            value=value_states,
---+            head_num=self.num_heads, # 传入Q的头数(N1)
---+            attn_mask=fa_attention_mask,
---+            keep_prob=1.0 - self.attention_dropout,
---+            scalar_value=1.0 / math.sqrt(self.head_dim),
---+            input_layout="BNSD",
---+            sparse_mode=0 # 使用 defaultMask 模式
---+        )
---+
---+        # 恢复原始数据类型
---+        attn_output = attn_output.to(input_dtype)
---+
---+        # 7. 调整输出形状
---+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
---+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---+        attn_output = self.o_proj(attn_output)
---+
---+        # FlashAttention 算子不直接返回注意力权重矩阵
---+        attn_weights = None
---+        if output_attentions:
---+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
---+
---+        return attn_output, attn_weights, past_key_value
---+
---+    # def forward(
---+    #     self,
---+    #     hidden_states: mindspore.Tensor,
---+    #     attention_mask: Optional[mindspore.Tensor] = None,
---+    #     position_ids: Optional[mindspore.Tensor] = None,
---+    #     past_key_value: Optional[Cache] = None,
---+    #     output_attentions: bool = False,
---+    #     use_cache: bool = False,
---+    #     cache_position: Optional[mindspore.Tensor] = None,
---+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---+
---+    #     bsz, q_len, _ = hidden_states.shape
---+
---+    #     # 1. 线性投射 Q, K, V
---+    #     query_states = self.q_proj(hidden_states)
---+    #     key_states = self.k_proj(hidden_states)
---+    #     value_states = self.v_proj(hidden_states)
---+
---+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
---+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+
---+    #     # 3. RoPE 旋转位置编码
---+    #     kv_seq_len = key_states.shape[-2]
---+    #     if past_key_value is not None:
---+    #         if self.layer_idx is None:
---+    #             raise ValueError(
---+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---+    #                 "with a layer index."
---+    #             )
---+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---+
---+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---+
---+    #     # 4. KV 缓存更新
---+    #     if past_key_value is not None:
---+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---+    #         key_states, value_states = past_key_value.update(
---+    #             key_states, value_states, self.layer_idx, cache_kwargs
---+    #         )
---+
---+    #     # 5. 准备 Attention Mask
---+    #     fa_attention_mask = None
---+    #     if attention_mask is not None:
---+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---+    #         fa_attention_mask = (mask_slice != 0)
---+
---+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
---+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
---+    #     input_dtype = query_states.dtype
---+
---+    #     # 6. [核心] 调用 flash_attention_score 算子
---+    #     attn_output = mindspore.ops.flash_attention_score(
---+    #         query=query_states,
---+    #         key=key_states,
---+    #         value=value_states,
---+    #         head_num=self.num_heads,
---+    #         attn_mask=fa_attention_mask,
---+    #         keep_prob=1.0 - self.attention_dropout,
---+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
---+    #         input_layout="BNSD",
---+    #         sparse_mode=0,
---+    #         # <--- 修改点 2: 启用内部高精度计算 ---
---+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
---+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
---+    #         inner_precise=1
---+    #     )
---+
---+    #     # 恢复原始数据类型
---+    #     attn_output = attn_output.to(input_dtype)
---+
---+    #     # 7. 调整输出形状
---+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---+    #     attn_output = self.o_proj(attn_output)
---+
---+    #     attn_weights = None
---+    #     if output_attentions:
---+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
---+
---+    #     return attn_output, attn_weights, past_key_value
---+
---+    # def forward(
---+    #     self,
---+    #     hidden_states: mindspore.Tensor,
---+    #     attention_mask: Optional[mindspore.Tensor] = None,
---+    #     position_ids: Optional[mindspore.Tensor] = None,
---+    #     past_key_value: Optional[Cache] = None,
---+    #     output_attentions: bool = False,
---+    #     use_cache: bool = False,
---+    #     cache_position: Optional[mindspore.Tensor] = None,
---+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---+
---+    #     bsz, q_len, _ = hidden_states.shape
---+
---+    #     query_states = self.q_proj(hidden_states)
---+    #     key_states = self.k_proj(hidden_states)
---+    #     value_states = self.v_proj(hidden_states)
---+
---+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---+
---+    #     kv_seq_len = key_states.shape[-2]
---+    #     if past_key_value is not None:
---+    #         if self.layer_idx is None:
---+    #             raise ValueError("`layer_idx` must be specified for caching")
---+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---+
---+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---+
---+    #     if past_key_value is not None:
---+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
---+    #         key_states, value_states = past_key_value.update(
---+    #             key_states, value_states, self.layer_idx, cache_kwargs
---+    #         )
---+
---+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
---+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
---+
---+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
---+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
---+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
---+    #     query_states = query_states / math.sqrt(self.head_dim)
---+    #     # <--- 修改结束 ---
---+
---+    #     fa_attention_mask = None
---+    #     if attention_mask is not None:
---+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
---+    #         fa_attention_mask = (mask_slice != 0)
---+
---+    #     input_dtype = query_states.dtype
---+
---+    #     attn_output = mindspore.ops.flash_attention_score(
---+    #         query=query_states,    # 传入已经预先缩放过的 query
---+    #         key=key_states,
---+    #         value=value_states,
---+    #         head_num=self.num_heads,
---+    #         attn_mask=fa_attention_mask,
---+    #         keep_prob=1.0 - self.attention_dropout,
---+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
---+    #         input_layout="BNSD",
---+    #         sparse_mode=0,
---+    #         inner_precise=1        # 仍然保持内部高精度计算
---+    #     )
---+
---+    #     attn_output = attn_output.to(input_dtype)
---+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---+    #     attn_output = self.o_proj(attn_output)
---+
---+    #     attn_weights = None
---+    #     if output_attentions:
---+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
---+
---+    #     return attn_output, attn_weights, past_key_value
---+
--- QWEN2MOE_ATTENTION_CLASSES = {
---     "eager": Qwen2MoeAttention,
---+    "flash-attention": Qwen2MoeFlashAttention,
--- }
--- 
--- 
---@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
---         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
---         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--- 
---+    #@dwj
---+    # 只遍历激活的专家，而非全部专家
---     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
----        batch_size, sequence_length, hidden_dim = hidden_states.shape
----        hidden_states = hidden_states.view(-1, hidden_dim)
----        # router_logits: (batch * sequence_length, n_experts)
----        router_logits = self.gate(hidden_states)
----
----        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
----        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
----        if self.norm_topk_prob:
----            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
----        # we cast back to the input dtype
----        routing_weights = routing_weights.to(hidden_states.dtype)
----
----        final_hidden_states = ops.zeros(
----            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
----        )
----
----        # One hot encode the selected experts to create an expert mask
----        # this will be used to easily index which expert is going to be sollicitated
----        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
----
----        # Loop over all available experts in the model and perform the computation on each expert
----        for expert_idx in range(self.num_experts):
----            expert_layer = self.experts[expert_idx]
----            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
----
----            # Index the correct hidden states and compute the expert hidden state for
----            # the current expert. We need to make sure to multiply the output hidden
----            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
----            if 0 not in idx.shape:
----                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
----                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
----
----                # However `index_add_` only support torch tensors for indexing so we'll use
----                # the `top_x` tensor here.
----                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
----
----        shared_expert_output = self.shared_expert(hidden_states)
----        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
----
----        final_hidden_states = final_hidden_states + shared_expert_output
---+            batch_size, sequence_length, hidden_dim = hidden_states.shape
---+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
---+            num_tokens = hidden_states_reshaped.shape[0]
---+
---+            router_logits = self.gate(hidden_states_reshaped)
---+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
---+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
---+
---+            if self.norm_topk_prob:
---+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
---+            routing_weights = routing_weights.to(hidden_states.dtype)
---+            
---+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
---+            flat_selected_experts = selected_experts.flatten()
---+            
---+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
---+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
---+            token_indices = broadcasted_token_indices.flatten()
---+            
---+            active_experts = ops.unique(flat_selected_experts)
---+            
---+            for expert_idx_tensor in active_experts:
---+                expert_idx = expert_idx_tensor.item()
---+                expert_layer = self.experts[expert_idx]
---+                
---+                mask = (flat_selected_experts == expert_idx_tensor)
---+                selected_token_indices = token_indices[mask]
---+                selected_routing_weights = routing_weights.flatten()[mask]
---+                
---+                current_states = hidden_states_reshaped[selected_token_indices]
---+                
---+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
---+                
---+                final_hidden_states = final_hidden_states.index_add(
---+                    dim=0,
---+                    index=selected_token_indices,
---+                    source=expert_output.to(hidden_states.dtype)
---+                )
---+            
---+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
---+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--- 
----        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
----        return final_hidden_states, router_logits
---+            final_hidden_states = final_hidden_states + shared_expert_output
---+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
---+            
---+            return final_hidden_states, router_logits
--- 
--- 
--- class Qwen2MoeDecoderLayer(nn.Module):
---@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--- 
---         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--- 
---+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
---+
---         if (layer_idx not in config.mlp_only_layers) and (
---             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
---         ):
---@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
---     _no_split_modules = ["Qwen2MoeDecoderLayer"]
---     _skip_keys_device_placement = "past_key_values"
---     _supports_cache_class = True
---+#lwx
---+    # _supports_static_cache = True
--- 
---     def _init_weights(self, module):
---         std = self.config.initializer_range
---@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
---         return causal_mask
--- 
--- 
----class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
---+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
---     _tied_weights_keys = ["lm_head.weight"]
--- 
---     def __init__(self, config):
---@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
---         self.num_experts_per_tok = config.num_experts_per_tok
---         # Initialize weights and apply final processing
---         self.post_init()
---+        # @lwx
---+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
---+        #     self.generation_config.cache_implementation = "static"
---+        self._warmed_up = False
---+
---+    def warmup_moe_model(self):
---+        print("[Warmup] Qwen2-MoE 模型预热开始...")
---+        test_texts = [
---+            "warmup short",
---+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
---+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
---+        ]
---+        tokenizer = getattr(self, "_warmup_tokenizer", None)
---+        if tokenizer is None:
---+            from mindnlp.transformers import AutoTokenizer
---+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
---+            self._warmup_tokenizer = tokenizer
---+
---+        for text in test_texts:
---+            inputs = tokenizer(text, return_tensors="ms")
---+            with mindspore._no_grad():
---+                _ = self(**inputs, output_router_logits=True, use_cache=False)
---+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--- 
---     def get_input_embeddings(self):
---         return self.model.embed_tokens
---@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
---         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
---         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
---         ```"""
---+        if not self._warmed_up:
---+            self._warmed_up = True
---+            self.warmup_moe_model()
--- 
---         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
---         output_router_logits = (
---@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
---             }
---         )
---         return model_inputs
---+# @lwx
---+    # def _decode_one_tokens_logits(
---+    #     self,
---+    #     cur_token: mindspore.Tensor,
---+    #     input_pos: Optional[mindspore.Tensor],
---+    #     cache_position: mindspore.Tensor,
---+    #     past_key_values: StaticCache,
---+    # ) -> mindspore.Tensor:
---+    #     """
---+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
---+        
---+    #     Args:
---+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
---+    #         input_pos: 输入位置信息，可选
---+    #         cache_position: 当前token在cache中的位置，shape为(1,)
---+    #         past_key_values: StaticCache对象，存储之前的key-value状态
---+            
---+    #     Returns:
---+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
---+    #     """
---+    #     # 调用JIT编译的版本
---+    #     return self.get_decode_one_tokens_logits(
---+    #         cur_token=cur_token,
---+    #         input_pos=input_pos,
---+    #         cache_position=cache_position,
---+    #         past_key_values=past_key_values,
---+    #     )
---+    
---+    # @mindspore.jit(jit_level='O1')
---+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
---+    #     """
---+    #     JIT编译的函数，用于高效的单token解码
---+    #     使用JIT编译优化以支持静态shape和高效执行
---+        
---+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
---+    #     """
---+    #     outputs = self.model.forward(
---+    #         input_ids=cur_token,
---+    #         position_ids=input_pos,
---+    #         cache_position=cache_position,
---+    #         past_key_values=past_key_values,
---+    #         use_cache=True,
---+    #         return_dict=False,
---+    #     )
---+        
---+    #     hidden_states = outputs[0]
---+    #     logits = self.lm_head.forward(hidden_states)
---+    #     logits = logits.float()
---+        
---+    #     return logits[:, -1, :]
---+
---+    # def _sample(
---+    #     self,
---+    #     input_ids: mindspore.Tensor,
---+    #     logits_processor,
---+    #     stopping_criteria,
---+    #     generation_config,
---+    #     synced_devices: bool,
---+    #     streamer=None,
---+    #     logits_warper=None,
---+    #     **model_kwargs,
---+    # ):
---+    #     """
---+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
---+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
---+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
---+    #     """
---+    #     from ...generation.logits_process import LogitsProcessorList
---+    #     from ...generation.stopping_criteria import StoppingCriteriaList
---+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
---+    #     from mindnlp.core import nn, ops, no_grad
---+    #     import numpy as np
---+        
---+    #     # 检查是否使用 StaticCache
---+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
---+    #     # 否则，直接调用父类方法
---+    #     past_key_values = model_kwargs.get("past_key_values")
---+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
---+
---+    #     if not isinstance(past_key_values, StaticCache):
---+    #         # 不使用 StaticCache，直接调用父类方法
---+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
---+    #         return super()._sample(
---+    #             input_ids=input_ids,
---+    #             logits_processor=logits_processor,
---+    #             stopping_criteria=stopping_criteria,
---+    #             generation_config=generation_config,
---+    #             synced_devices=synced_devices,
---+    #             streamer=streamer,
---+    #             logits_warper=logits_warper,
---+    #             **model_kwargs,
---+    #         )
---+        
---+    #     # 使用 StaticCache，进入自定义循环
---+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
---+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
---+    #     pad_token_id = generation_config._pad_token_tensor
---+    #     output_attentions = generation_config.output_attentions
---+    #     output_hidden_states = generation_config.output_hidden_states
---+    #     output_scores = generation_config.output_scores
---+    #     output_logits = generation_config.output_logits
---+    #     return_dict_in_generate = generation_config.return_dict_in_generate
---+    #     max_length = generation_config.max_length
---+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
---+    #     do_sample = generation_config.do_sample
---+        
---+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
---+    #         raise ValueError(
---+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
---+    #             f"{logits_warper})."
---+    #         )
---+        
---+    #     # init attention / hidden states / scores tuples
---+    #     scores = () if (return_dict_in_generate and output_scores) else None
---+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
---+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
---+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
---+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
---+        
---+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
---+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
---+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
---+    #         encoder_hidden_states = (
---+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
---+    #         )
---+        
---+    #     # keep track of which sequences are already finished
---+    #     batch_size, cur_len = input_ids.shape
---+    #     this_peer_finished = False
---+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
---+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
---+        
---+    #     time_record = []
---+    #     from ....utils.testing_utils import parse_flag_from_env
---+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
---+        
---+    #     while self._has_unfinished_sequences(
---+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
---+    #     ):
---+    #         if _record_time:
---+    #             import time as time_module
---+    #             infer_start = time_module.time()
---+            
---+    #         # prepare model inputs
---+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
---+            
---+    #         # prepare variable output controls
---+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
---+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
---+            
---+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
---+    #         cur_cache_position = model_inputs.get("cache_position")
---+    #         cur_past_key_values = model_inputs.get("past_key_values")
---+    #         cur_input_ids = model_inputs.get("input_ids")
---+            
---+    #         if (isinstance(cur_past_key_values, StaticCache) and 
---+    #             cur_cache_position is not None and 
---+    #             len(cur_cache_position.shape) > 0 and
---+    #             cur_cache_position.shape[0] == 1 and
---+    #             cur_input_ids is not None and
---+    #             cur_input_ids.shape[1] == 1):
---+    #             # 使用 JIT 优化的单 token 解码
---+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
---+    #             if not hasattr(self, '_jit_used'):
---+    #                 self._jit_used = False
---+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
---+                
---+    #             next_token_logits = self.get_decode_one_tokens_logits(
---+    #                 cur_token=cur_input_ids,
---+    #                 input_pos=model_inputs.get("position_ids"),
---+    #                 cache_position=cur_cache_position,
---+    #                 past_key_values=cur_past_key_values,
---+    #             )
---+                
---+    #             # 标记已使用JIT（用于后续判断）
---+    #             if not self._jit_used:
---+    #                 self._jit_used = True
---+                
---+    #             # 构造兼容的输出对象
---+    #             class JitOptimizedOutput:
---+    #                 def __init__(self, logits, config):
---+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
---+    #                     self.config = config
---+    #                     # 对于 JIT 优化路径，这些属性通常不需要
---+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
---+    #                     self.attentions = None if not config.is_encoder_decoder else None
---+    #                     self.cross_attentions = None
---+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
---+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
---+                
---+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
---+    #         else:
---+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
---+    #             outputs = self(**model_inputs, return_dict=True)
---+            
---+    #         if synced_devices and this_peer_finished:
---+    #             continue
---+            
---+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
---+    #         next_token_logits = outputs.logits[:, -1, :]
---+            
---+    #         # pre-process distribution
---+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
---+    #         if do_sample:
---+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
---+            
---+    #         # Store scores, attentions and hidden_states when required
---+    #         if return_dict_in_generate:
---+    #             if output_scores:
---+    #                 scores += (next_token_scores,)
---+    #             if output_logits:
---+    #                 raw_logits += (next_token_logits,)
---+    #             if output_attentions:
---+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
---+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
---+    #                 if self.config.is_encoder_decoder:
---+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
---+                
---+    #             if output_hidden_states:
---+    #                 hidden = (
---+    #                     outputs.decoder_hidden_states
---+    #                     if self.config.is_encoder_decoder
---+    #                     else outputs.hidden_states
---+    #                 )
---+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
---+            
---+    #         # token selection
---+    #         if do_sample:
---+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
---+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
---+    #         else:
---+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
---+            
---+    #         # finished sentences should have their next token be a padding token
---+    #         if has_eos_stopping_criteria:
---+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
---+            
---+    #         # update generated ids, model inputs, and length for next step
---+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
---+    #         if streamer is not None:
---+    #             streamer.put(next_tokens)
---+            
---+    #         model_kwargs = self._update_model_kwargs_for_generation(
---+    #             outputs,
---+    #             model_kwargs,
---+    #             is_encoder_decoder=self.config.is_encoder_decoder,
---+    #         )
---+            
---+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
---+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
---+    #         cur_len += 1
---+            
---+    #         if _record_time:
---+    #             import time as time_module
---+    #             infer_stop = time_module.time()
---+    #             time_record.append(infer_stop - infer_start)
---+            
---+    #         del outputs
---+        
---+    #     average_infer_time = None
---+    #     if time_record:
---+    #         if len(time_record) > 1:
---+    #             time_record.pop(0)
---+    #         average_infer_time = sum(time_record) / len(time_record)
---+    #         print(f'average inference time is: {average_infer_time}')
---+    #         print(f'inference time record: {time_record}')
---+        
---+    #     if streamer is not None:
---+    #         streamer.end()
---+        
---+    #     # 简单判断：打印是否使用了JIT路径
---+    #     if hasattr(self, '_jit_used') and self._jit_used:
---+    #         print("[JIT] ✓ JIT optimization was used during generation")
---+    #     else:
---+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
---+        
---+    #     if return_dict_in_generate:
---+    #         if self.config.is_encoder_decoder:
---+    #             return GenerateEncoderDecoderOutput(
---+    #                 sequences=input_ids,
---+    #                 scores=scores,
---+    #                 logits=raw_logits,
---+    #                 encoder_attentions=encoder_attentions,
---+    #                 encoder_hidden_states=encoder_hidden_states,
---+    #                 decoder_attentions=decoder_attentions,
---+    #                 cross_attentions=cross_attentions,
---+    #                 decoder_hidden_states=decoder_hidden_states,
---+    #                 past_key_values=model_kwargs.get("past_key_values"),
---+    #                 average_infer_time=average_infer_time
---+    #             )
---+    #         else:
---+    #             return GenerateDecoderOnlyOutput(
---+    #                 sequences=input_ids,
---+    #                 scores=scores,
---+    #                 logits=raw_logits,
---+    #                 attentions=decoder_attentions,
---+    #                 hidden_states=decoder_hidden_states,
---+    #                 past_key_values=model_kwargs.get("past_key_values"),
---+    #                 average_infer_time=average_infer_time
---+    #             )
---+    #     else:
---+    #         return input_ids
---+            
---+    # def _prepare_cache_for_generation(
---+    #     self,
---+    #     generation_config,
---+    #     model_kwargs,
---+    #     assistant_model,
---+    #     batch_size,
---+    #     max_cache_length,
---+    # ):
---+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
---+    #         generation_config.cache_implementation = "static"
---+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
---+        
---+    #     if generation_config.cache_implementation == "static":
---+    #         base_required_from_max_length = generation_config.max_length + 1
---+    #         base_required = max(max_cache_length, base_required_from_max_length)
---+    #         min_cache_size = 50
---+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
---+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
---+    #         else:
---+    #             max_cache_length = max(base_required, min_cache_size)
---+            
---+    #         original_max_cache_length = max_cache_length
---+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
---+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
---+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
---+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
---+    #         print(f"  - final max_cache_length: {max_cache_length}")
---+            
---+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
---+    #             if max_cache_length > self.config.max_position_embeddings:
---+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
---+        
---+    #     result = super()._prepare_cache_for_generation(
---+    #         generation_config=generation_config,
---+    #         model_kwargs=model_kwargs,
---+    #         assistant_model=assistant_model,
---+    #         batch_size=batch_size,
---+    #         max_cache_length=max_cache_length,
---+    #     )
---+        
---+    #     if generation_config.cache_implementation == "static":
---+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
---+    #         created_cache = model_kwargs.get(cache_name)
---+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
---+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
---+    #             if created_cache.max_cache_len < generation_config.max_length:
---+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
---+        
---+    #     return result
---+
---+
---+
--- 
--- 
--- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
----- 
---2.27.0
---
---- 
--2.27.0
--
-diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
-deleted file mode 100644
-index bc5549ca..00000000
---- a/patches/0004-20251106change.patch
-+++ /dev/null
-@@ -1,7498 +0,0 @@
--From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Thu, 6 Nov 2025 15:48:09 +0800
--Subject: [PATCH 4/8] 20251106change
--
-----
-- .../models/deepseek/modeling_deepseek.py      |  189 +-
-- patches/0001-20251104commit.patch             | 1272 +++++++
-- patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
-- patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
-- 4 files changed, 7244 insertions(+), 186 deletions(-)
-- create mode 100644 patches/0001-20251104commit.patch
-- create mode 100644 patches/0002-20251106commit.patch
-- create mode 100644 patches/0003-20261106secondcommit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index 2f9192bf..0546f318 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
-- 
--         return attn_output, attn_weights, past_key_value
-- 
---# class DeepseekFlashAttention(nn.Module):
---#     """
---#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
---#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
---
---#     This class is designed as a drop-in replacement for DeepseekAttention.
---#     """
---
---#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
---#         super().__init__()
---#         self.config = config
---#         self.layer_idx = layer_idx
---#         if layer_idx is None:
---#             logger.warning(
---#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
---#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
---#                 "when creating this class."
---#             )
---
---#         self.attention_dropout = config.attention_dropout
---#         self.hidden_size = config.hidden_size
---#         self.num_heads = config.num_attention_heads
---#         self.head_dim = self.hidden_size // self.num_heads
---#         self.num_key_value_heads = config.num_key_value_heads
---#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
---#         self.max_position_embeddings = config.max_position_embeddings
---#         self.rope_theta = config.rope_theta
---#         self.is_causal = True
---
---#         if (self.head_dim * self.num_heads) != self.hidden_size:
---#             raise ValueError(
---#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
---#                 f" and `num_heads`: {self.num_heads})."
---#             )
---
---#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
---#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
---#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
---#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
---#         self._init_rope()
---
---#     def _init_rope(self):
---#         if self.config.rope_scaling is None:
---#             self.rotary_emb = DeepseekRotaryEmbedding(
---#                 self.head_dim,
---#                 max_position_embeddings=self.max_position_embeddings,
---#                 base=self.rope_theta,
---#             )
---#         else:
---#             scaling_type = self.config.rope_scaling["type"]
---#             scaling_factor = self.config.rope_scaling["factor"]
---#             if scaling_type == "linear":
---#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
---#                     self.head_dim,
---#                     max_position_embeddings=self.max_position_embeddings,
---#                     scaling_factor=scaling_factor,
---#                     base=self.rope_theta,
---#                 )
---#             elif scaling_type == "dynamic":
---#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
---#                     self.head_dim,
---#                     max_position_embeddings=self.max_position_embeddings,
---#                     scaling_factor=scaling_factor,
---#                     base=self.rope_theta,
---#                 )
---#             else:
---#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
---
---#     def forward(
---#         self,
---#         hidden_states: mindspore.Tensor,
---#         attention_mask: Optional[mindspore.Tensor] = None,
---#         position_ids: Optional[mindspore.Tensor] = None,
---#         past_key_value: Optional[Cache] = None,
---#         output_attentions: bool = False,
---#         use_cache: bool = False,
---#         **kwargs,
---#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
---#         if "padding_mask" in kwargs:
---#             warnings.warn(
---#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
---#             )
---        
---#         if output_attentions:
---#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
---
---#         bsz, q_len, _ = hidden_states.shape
---
---#         if self.config.pretraining_tp > 1:
---#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
---
---#         query_states = self.q_proj(hidden_states)
---#         key_states = self.k_proj(hidden_states)
---#         value_states = self.v_proj(hidden_states)
---
---#         # Reshape for multi-head attention
---#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
---#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
---
---#         kv_seq_len = key_states.shape[-2]
---#         if past_key_value is not None:
---#             if self.layer_idx is None:
---#                 raise ValueError(
---#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
---#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
---#                     "with a layer index."
---#                 )
---#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
---        
---#         # Apply Rotary Positional Embedding
---#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
---#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
---
---#         if past_key_value is not None:
---#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
---#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
---
---#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
---#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
---#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
---        
---#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
---#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
---        
---#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
---#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
---
---#         # Convert attention_mask for flash_attention_score
---#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
---#         if attention_mask is not None:
---#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
---#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
---#                 raise ValueError(
---#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
---#                 )
---#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
---#         else:
---#             attn_mask_for_fa = None
---        
---#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
---
---#         # Call the fused flash_attention_score operator
---#         attn_output = mindspore.ops.flash_attention_score(
---#             query=query_states_for_fa,
---#             key=key_states_for_fa,
---#             value=value_states_for_fa,
---#             head_num=self.num_heads, # This is N1, the number of query heads
---#             input_layout='BSH',
---#             attn_mask=attn_mask_for_fa,
---#             keep_prob=keep_prob,
---#             scalar_value=1.0 / math.sqrt(self.head_dim),
---#             sparse_mode=0 # Default mask mode
---#         )
---        
---#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
---#         attn_output = self.o_proj(attn_output)
---        
---#         # Flash Attention does not return attention weights
---#         attn_weights = None
---
---#         return attn_output, attn_weights, past_key_value
-- 
-- class DeepseekFlashAttention(nn.Module):
--     """
--@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
--         super().__init__()
--         self.hidden_size = config.hidden_size
-- 
---        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
---            config=config, layer_idx=layer_idx
---        )
--+        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--+            # config=config, layer_idx=layer_idx
--+        # )
-- 
--         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--             config=config, layer_idx=layer_idx
--@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
--         return outputs
-- 
-- 
---
-- class DeepseekPreTrainedModel(PreTrainedModel):
--     config_class = DeepseekConfig
--     base_model_prefix = "model"
--@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--         # Initialize weights and apply final processing
--         self.post_init()
--         self.warm_up = False
---        #@dwj
---        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
---            self.num_layers,
---            self.num_attention_heads,
---            self.head_dim,
---            batch_size=1,
---            max_length=self.max_length,
---            dtype=mindspore.float16
---        )
---
---    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
---        key_cache = []
---        value_cache = []
---        for _ in range(num_layers):
---            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
---            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
---            key_cache.append(k)
---            value_cache.append(v)
---        return key_cache, value_cache
---
-- 
--     def warmup_moe_model_deep(self):
--         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--new file mode 100644
--index 00000000..78f22642
----- /dev/null
--+++ b/patches/0001-20251104commit.patch
--@@ -0,0 +1,1272 @@
--+From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Tue, 4 Nov 2025 09:11:51 +0800
--+Subject: [PATCH 1/3] 20251104commit
--+
--+---
--+ mindnlp/transformers/cache_utils.py           |  28 +-
--+ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+ 3 files changed, 976 insertions(+), 87 deletions(-)
--+
--+diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+index cadd2e04..02f8d4be 100644
--+--- a/mindnlp/transformers/cache_utils.py
--++++ b/mindnlp/transformers/cache_utils.py
--+@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+             # k_out[:, :, cache_position] = key_states
--+             # v_out[:, :, cache_position] = value_states
--+-            if ON_ORANGE_PI:
--+-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+-            else:
--+-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+-
--++            # if ON_ORANGE_PI:
--++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++            # else:
--++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++            # 确保 cache_position 是 1D tensor 并且类型正确
--++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++            if cache_position.ndim > 1:
--++                cache_position = cache_position.flatten()
--++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++                cache_position = cache_position.int()
--++            
--++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++            k_out[:, :, cache_position] = key_states
--++            v_out[:, :, cache_position] = value_states
--++            
--+         return k_out, v_out
--+ 
--+     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index c695b944..d8303e45 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+-    x1 = x[..., : x.shape[-1] // 2]
--+-    x2 = x[..., x.shape[-1] // 2 :]
--++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++    # x1 = x[..., : x.shape[-1] // 2]
--++    # x2 = x[..., x.shape[-1] // 2 :]
--++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+         if self.training:
--+             raise NotImplementedError("Training is not supported yet.")
--+         else:
--+-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+-        if self.config.n_shared_experts is not None:
--+-            y = y + self.shared_experts(identity)
--+-        return y
--++            # @lwx
--++            if orig_shape[1] == 1:
--++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++                y=y.view(*orig_shape)
--++                if self.config.n_shared_experts is not None:
--++                    y = y + self.shared_experts(identity)
--++                return y
--++            else:
--++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++                if self.config.n_shared_experts is not None:
--++                    y = y + self.shared_experts(identity)
--++                return y
--++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++        # if self.config.n_shared_experts is not None:
--++        #     y = y + self.shared_experts(identity)
--++        # return y
--++        
--++    @no_grad()
--++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++
--++        expert_cache = ops.zeros_like(x)
--++        for i in range(self.num_experts_per_tok):
--++            expert_id = flat_expert_indices[i].item()
--++            weight = flat_expert_weights[i].item()
--++            expert = self.experts[expert_id]
--++            expert_out = expert(x)
--++            expert_cache += expert_out * weight
--++        return expert_cache
--+ 
--+     @no_grad()
--+-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-        # expert_cache = torch.zeros_like(x)
--+-        # idxs = flat_expert_indices.argsort()
--+-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+-        # token_idxs = idxs // self.num_experts_per_tok
--+-        # for i, end_idx in enumerate(tokens_per_expert):
--+-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+-        #     if start_idx == end_idx:
--+-        #         continue
--+-        #     expert = self.experts[i]
--+-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+-        #     expert_tokens = x[exp_token_idx]
--+-        #     expert_out = expert(expert_tokens)
--+-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+-        # return expert_cache
--++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+         expert_cache = ops.zeros_like(x)
--+         idxs = flat_expert_indices.argsort()
--+         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+         token_idxs = idxs // self.num_experts_per_tok
--++
--+         for i, end_idx in enumerate(tokens_per_expert):
--+             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+             if start_idx == end_idx:
--+@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+             expert_out = expert(expert_tokens)
--+             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++
--+         return expert_cache
--++        
--++    # @no_grad()
--++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++    #     # expert_cache = torch.zeros_like(x)
--++    #     # idxs = flat_expert_indices.argsort()
--++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++    #     # token_idxs = idxs // self.num_experts_per_tok
--++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++    #     #     if start_idx == end_idx:
--++    #     #         continue
--++    #     #     expert = self.experts[i]
--++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++    #     #     expert_tokens = x[exp_token_idx]
--++    #     #     expert_out = expert(expert_tokens)
--++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++    #     # return expert_cache
--++    #     expert_cache = ops.zeros_like(x)
--++    #     idxs = flat_expert_indices.argsort()
--++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++    #     token_idxs = idxs // self.num_experts_per_tok
--++
--++    #     for i, end_idx in enumerate(tokens_per_expert):
--++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++    #         if start_idx == end_idx:
--++    #             continue
--++    #         expert = self.experts[i]
--++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++    #         expert_tokens = x[exp_token_idx]
--++    #         expert_out = expert(expert_tokens)
--++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++
--++    #     return expert_cache
--++    # @no_grad()
--++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++    #     expert_cache = ops.zeros_like(x)
--++
--++    #     # 排序保证顺序一致
--++    #     idxs = flat_expert_indices.argsort()
--++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++    #     token_idxs = idxs // self.num_experts_per_tok
--++
--++    #     # 找出有 token 的专家
--++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++
--++    #     for i in active_experts.tolist():
--++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++    #         end_idx = tokens_per_expert[i]
--++    #         if start_idx == end_idx:  # 没有 token
--++    #             continue
--++
--++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++    #         expert_tokens = x[exp_token_idx]
--++    #         expert_out = self.experts[i](expert_tokens)
--++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++
--++    #         expert_cache = mindspore.mint.scatter_add(
--++    #             expert_cache,
--++    #             0,
--++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++    #             expert_out
--++    #         )
--++
--++    #     return expert_cache
--++
--++
--+ 
--+ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+ #     """
--+@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+ 
--+         # Initialize weights and apply final processing
--+         self.post_init()
--++        self.warm_up = False
--++
--++    def warmup_moe_model_deep(self):
--++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++        test_texts = [
--++            "warmup short",
--++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++        ]
--++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++        if tokenizer is None:
--++            from mindnlp.transformers import AutoTokenizer
--++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++            self._warmup_tokenizer = tokenizer
--++
--++        for text in test_texts:
--++            inputs = tokenizer(text, return_tensors="ms")
--++            with mindspore._no_grad():
--++                _ = self(**inputs, use_cache=False)
--++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+ 
--+     def get_input_embeddings(self):
--+         return self.model.embed_tokens
--+@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+         ```"""
--++        if not self.warm_up:
--++            self.warm_up = True
--++            self.warmup_moe_model_deep()
--++
--+         output_attentions = (
--+             output_attentions
--+             if output_attentions is not None
--+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+index 3cbf820e..d4c6b651 100644
--+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+@@ -18,7 +18,6 @@
--+ # See the License for the specific language governing permissions and
--+ # limitations under the License.
--+ """MindSpore Qwen2MoE model."""
--+-
--+ import math
--+ from typing import List, Optional, Tuple, Union
--+ 
--+@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+     TokenClassifierOutput,
--+ )
--+ from ...modeling_utils import PreTrainedModel
--++from ...generation import GenerationMixin
--+ from ....utils import logging
--+ from .configuration_qwen2_moe import Qwen2MoeConfig
--+ 
--+@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+         self.variance_epsilon = eps
--+ 
--+     def forward(self, hidden_states):
--++        # @dwj
--++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++        # @lwx
--++        # if not self.training :
--++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+         input_dtype = hidden_states.dtype
--+         hidden_states = hidden_states.to(mindspore.float32)
--+         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+@@ -234,6 +239,8 @@ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+     x1 = x[..., : x.shape[-1] // 2]
--+     x2 = x[..., x.shape[-1] // 2 :]
--++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+         self.config = config
--+         self.hidden_size = config.hidden_size
--+         self.intermediate_size = intermediate_size
--++        
--+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+         self.act_fn = ACT2FN[config.hidden_act]
--+ 
--+     def forward(self, x):
--+-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+-
--+ 
--++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++        # @lwx 
--++        # gate_up_output = self.gate_up_proj(x)
--++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++        # return self.down_proj(swiglu_output)
--++
--++    # def forward(self, x):
--++    #     gate_proj_out = self.gate_proj(x)
--++    #     up_proj_out = self.up_proj(x)
--++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++    #     return self.down_proj(swiglu_out)
--++        
--+ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+     """
--+@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+         use_cache: bool = False,
--+         cache_position: Optional[mindspore.Tensor] = None,
--+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++        
--++
--+         bsz, q_len, _ = hidden_states.shape
--+ 
--+         query_states = self.q_proj(hidden_states)
--+@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+                     "with a layer index."
--+                 )
--+-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++            if isinstance(past_key_value, StaticCache):
--++                kv_seq_len = key_states.shape[-2]
--++            else:
--++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+ 
--+         if past_key_value is not None:
--+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++            
--++            if isinstance(past_key_value, StaticCache):
--++                kv_seq_len = key_states.shape[-2]
--+ 
--+         # repeat k/v heads if n_kv_heads < n_heads
--+         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+-
--++        
--+         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+ 
--+-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+-            raise ValueError(
--+-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+-                f" {attn_weights.shape}"
--+-            )
--+-
--+-        if attention_mask is not None:  # no matter the length, we just slice it
--+-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++        if attention_mask is not None:
--++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+             attn_weights = attn_weights + causal_mask
--+ 
--+         # upcast attention to fp32
--+@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+ 
--+         attn_output = self.o_proj(attn_output)
--+-
--++        # @lwx
--++        
--++        # max_seq_len = self.max_position_embeddings  # 2048
--++
--++        # if attention_mask is not None:
--++        #     # attention_mask: [B, 1, Sq, Sk]
--++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++
--++        #     # pad 到 [max_seq_len, max_seq_len]
--++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++        #     global_attention_mask = padded_mask
--++        # else:
--++        #     global_attention_mask = None
--++
--++
--++        # sparse_mode=3
--++        # attn_output = mindspore.ops.flash_attention_score(
--++        #     query=query_states,
--++        #     key=key_states,
--++        #     value=value_states,
--++        #     real_shift=None,
--++        #     padding_mask=None,
--++
--++        #     head_num=self.num_heads,
--++        #     attn_mask=global_attention_mask,
--++        #     keep_prob=1.0 - self.attention_dropout,
--++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++        #     input_layout="BNSD",
--++        #     pre_tokens=2147483647,
--++        #     next_tokens=2147483647,
--++        #     inner_precise=0,
--++        #     drop_mask=None,
--++        #     prefix=None,
--++        #     actual_seq_qlen=None,
--++        #     actual_seq_kvlen=None,
--++        #     sparse_mode=sparse_mode,
--++        # )
--+         if not output_attentions:
--+             attn_weights = None
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--+ 
--++class Qwen2MoeFlashAttention(nn.Module):
--++    """
--++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++
--++    关键改动:
--++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++       直接传入原始的 key 和 value 张量效率更高。
--++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++    """
--++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++        super().__init__()
--++        self.config = config
--++        self.layer_idx = layer_idx
--++        self.hidden_size = config.hidden_size
--++        self.num_heads = config.num_attention_heads
--++        self.head_dim = self.hidden_size // self.num_heads
--++        self.num_key_value_heads = config.num_key_value_heads
--++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++        self.max_position_embeddings = config.max_position_embeddings
--++        self.rope_theta = config.rope_theta
--++        self.attention_dropout = config.attention_dropout
--++
--++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++            raise ValueError(
--++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++            )
--++
--++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++
--++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++            self.head_dim,
--++            max_position_embeddings=self.max_position_embeddings,
--++            base=self.rope_theta,
--++        )
--++
--++    def forward(
--++        self,
--++        hidden_states: mindspore.Tensor,
--++        attention_mask: Optional[mindspore.Tensor] = None,
--++        position_ids: Optional[mindspore.Tensor] = None,
--++        past_key_value: Optional[Cache] = None,
--++        output_attentions: bool = False,
--++        use_cache: bool = False,
--++        cache_position: Optional[mindspore.Tensor] = None,
--++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++        bsz, q_len, _ = hidden_states.shape
--++
--++        # 1. 线性投射 Q, K, V
--++        query_states = self.q_proj(hidden_states)
--++        key_states = self.k_proj(hidden_states)
--++        value_states = self.v_proj(hidden_states)
--++
--++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++        # 3. RoPE 旋转位置编码
--++        kv_seq_len = key_states.shape[-2]
--++        if past_key_value is not None:
--++            if self.layer_idx is None:
--++                raise ValueError(
--++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++                    "with a layer index."
--++                )
--++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++                if cache_position.shape[0] == 1:
--++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++                    kv_seq_len = past_seen_tokens + 1
--++                else:
--++                    # prefill 阶段：cache_position 是范围，使用其长度
--++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++            else:
--++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++        # 4. KV 缓存更新
--++        if past_key_value is not None:
--++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++            key_states, value_states = past_key_value.update(
--++                key_states, value_states, self.layer_idx, cache_kwargs
--++            )
--++            
--++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++                if cache_position.shape[0] == 1:
--++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++                    kv_seq_len = key_states.shape[-2]
--++
--++        # 5. [重要] 准备 Attention Mask
--++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++        fa_attention_mask = None
--++        if attention_mask is not None:
--++            # 截取与当前key长度匹配的部分
--++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++            fa_attention_mask = (mask_slice != 0)
--++
--++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++        input_dtype = query_states.dtype
--++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++            query_states = query_states.to(mindspore.float16)
--++            key_states = key_states.to(mindspore.float16)
--++            value_states = value_states.to(mindspore.float16)
--++
--++        # 6. [核心] 调用 flash_attention_score 算子
--++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++        attn_output = mindspore.ops.flash_attention_score(
--++            query=query_states,
--++            key=key_states,
--++            value=value_states,
--++            head_num=self.num_heads, # 传入Q的头数(N1)
--++            attn_mask=fa_attention_mask,
--++            keep_prob=1.0 - self.attention_dropout,
--++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++            input_layout="BNSD",
--++            sparse_mode=0 # 使用 defaultMask 模式
--++        )
--++
--++        # 恢复原始数据类型
--++        attn_output = attn_output.to(input_dtype)
--++
--++        # 7. 调整输出形状
--++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++        attn_output = self.o_proj(attn_output)
--++
--++        # FlashAttention 算子不直接返回注意力权重矩阵
--++        attn_weights = None
--++        if output_attentions:
--++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++        return attn_output, attn_weights, past_key_value
--++
--++    # def forward(
--++    #     self,
--++    #     hidden_states: mindspore.Tensor,
--++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++    #     position_ids: Optional[mindspore.Tensor] = None,
--++    #     past_key_value: Optional[Cache] = None,
--++    #     output_attentions: bool = False,
--++    #     use_cache: bool = False,
--++    #     cache_position: Optional[mindspore.Tensor] = None,
--++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++    #     bsz, q_len, _ = hidden_states.shape
--++
--++    #     # 1. 线性投射 Q, K, V
--++    #     query_states = self.q_proj(hidden_states)
--++    #     key_states = self.k_proj(hidden_states)
--++    #     value_states = self.v_proj(hidden_states)
--++
--++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++    #     # 3. RoPE 旋转位置编码
--++    #     kv_seq_len = key_states.shape[-2]
--++    #     if past_key_value is not None:
--++    #         if self.layer_idx is None:
--++    #             raise ValueError(
--++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++    #                 "with a layer index."
--++    #             )
--++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++    #     # 4. KV 缓存更新
--++    #     if past_key_value is not None:
--++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++    #         key_states, value_states = past_key_value.update(
--++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++    #         )
--++
--++    #     # 5. 准备 Attention Mask
--++    #     fa_attention_mask = None
--++    #     if attention_mask is not None:
--++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++    #         fa_attention_mask = (mask_slice != 0)
--++
--++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++    #     input_dtype = query_states.dtype
--++
--++    #     # 6. [核心] 调用 flash_attention_score 算子
--++    #     attn_output = mindspore.ops.flash_attention_score(
--++    #         query=query_states,
--++    #         key=key_states,
--++    #         value=value_states,
--++    #         head_num=self.num_heads,
--++    #         attn_mask=fa_attention_mask,
--++    #         keep_prob=1.0 - self.attention_dropout,
--++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++    #         input_layout="BNSD",
--++    #         sparse_mode=0,
--++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++    #         inner_precise=1
--++    #     )
--++
--++    #     # 恢复原始数据类型
--++    #     attn_output = attn_output.to(input_dtype)
--++
--++    #     # 7. 调整输出形状
--++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++    #     attn_output = self.o_proj(attn_output)
--++
--++    #     attn_weights = None
--++    #     if output_attentions:
--++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++    #     return attn_output, attn_weights, past_key_value
--++
--++    # def forward(
--++    #     self,
--++    #     hidden_states: mindspore.Tensor,
--++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++    #     position_ids: Optional[mindspore.Tensor] = None,
--++    #     past_key_value: Optional[Cache] = None,
--++    #     output_attentions: bool = False,
--++    #     use_cache: bool = False,
--++    #     cache_position: Optional[mindspore.Tensor] = None,
--++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++    #     bsz, q_len, _ = hidden_states.shape
--++
--++    #     query_states = self.q_proj(hidden_states)
--++    #     key_states = self.k_proj(hidden_states)
--++    #     value_states = self.v_proj(hidden_states)
--++
--++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++    #     kv_seq_len = key_states.shape[-2]
--++    #     if past_key_value is not None:
--++    #         if self.layer_idx is None:
--++    #             raise ValueError("`layer_idx` must be specified for caching")
--++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++    #     if past_key_value is not None:
--++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++    #         key_states, value_states = past_key_value.update(
--++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++    #         )
--++
--++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++
--++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++    #     query_states = query_states / math.sqrt(self.head_dim)
--++    #     # <--- 修改结束 ---
--++
--++    #     fa_attention_mask = None
--++    #     if attention_mask is not None:
--++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++    #         fa_attention_mask = (mask_slice != 0)
--++
--++    #     input_dtype = query_states.dtype
--++
--++    #     attn_output = mindspore.ops.flash_attention_score(
--++    #         query=query_states,    # 传入已经预先缩放过的 query
--++    #         key=key_states,
--++    #         value=value_states,
--++    #         head_num=self.num_heads,
--++    #         attn_mask=fa_attention_mask,
--++    #         keep_prob=1.0 - self.attention_dropout,
--++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++    #         input_layout="BNSD",
--++    #         sparse_mode=0,
--++    #         inner_precise=1        # 仍然保持内部高精度计算
--++    #     )
--++
--++    #     attn_output = attn_output.to(input_dtype)
--++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++    #     attn_output = self.o_proj(attn_output)
--++
--++    #     attn_weights = None
--++    #     if output_attentions:
--++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++
--++    #     return attn_output, attn_weights, past_key_value
--++
--+ QWEN2MOE_ATTENTION_CLASSES = {
--+     "eager": Qwen2MoeAttention,
--++    "flash-attention": Qwen2MoeFlashAttention,
--+ }
--+ 
--+ 
--+@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+ 
--++    #@dwj
--++    # 只遍历激活的专家，而非全部专家
--+     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        hidden_states = hidden_states.view(-1, hidden_dim)
--+-        # router_logits: (batch * sequence_length, n_experts)
--+-        router_logits = self.gate(hidden_states)
--+-
--+-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-        if self.norm_topk_prob:
--+-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        # we cast back to the input dtype
--+-        routing_weights = routing_weights.to(hidden_states.dtype)
--+-
--+-        final_hidden_states = ops.zeros(
--+-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+-        )
--+-
--+-        # One hot encode the selected experts to create an expert mask
--+-        # this will be used to easily index which expert is going to be sollicitated
--+-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+-
--+-        # Loop over all available experts in the model and perform the computation on each expert
--+-        for expert_idx in range(self.num_experts):
--+-            expert_layer = self.experts[expert_idx]
--+-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+-
--+-            # Index the correct hidden states and compute the expert hidden state for
--+-            # the current expert. We need to make sure to multiply the output hidden
--+-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+-            if 0 not in idx.shape:
--+-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+-
--+-                # However `index_add_` only support torch tensors for indexing so we'll use
--+-                # the `top_x` tensor here.
--+-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+-
--+-        shared_expert_output = self.shared_expert(hidden_states)
--+-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+-
--+-        final_hidden_states = final_hidden_states + shared_expert_output
--++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++            num_tokens = hidden_states_reshaped.shape[0]
--++
--++            router_logits = self.gate(hidden_states_reshaped)
--++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++
--++            if self.norm_topk_prob:
--++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++            routing_weights = routing_weights.to(hidden_states.dtype)
--++            
--++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++            flat_selected_experts = selected_experts.flatten()
--++            
--++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++            token_indices = broadcasted_token_indices.flatten()
--++            
--++            active_experts = ops.unique(flat_selected_experts)
--++            
--++            for expert_idx_tensor in active_experts:
--++                expert_idx = expert_idx_tensor.item()
--++                expert_layer = self.experts[expert_idx]
--++                
--++                mask = (flat_selected_experts == expert_idx_tensor)
--++                selected_token_indices = token_indices[mask]
--++                selected_routing_weights = routing_weights.flatten()[mask]
--++                
--++                current_states = hidden_states_reshaped[selected_token_indices]
--++                
--++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++                
--++                final_hidden_states = final_hidden_states.index_add(
--++                    dim=0,
--++                    index=selected_token_indices,
--++                    source=expert_output.to(hidden_states.dtype)
--++                )
--++            
--++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+ 
--+-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+-        return final_hidden_states, router_logits
--++            final_hidden_states = final_hidden_states + shared_expert_output
--++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++            
--++            return final_hidden_states, router_logits
--+ 
--+ 
--+ class Qwen2MoeDecoderLayer(nn.Module):
--+@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+ 
--+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+ 
--++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++
--+         if (layer_idx not in config.mlp_only_layers) and (
--+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+         ):
--+@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+     _skip_keys_device_placement = "past_key_values"
--+     _supports_cache_class = True
--++#lwx
--++    # _supports_static_cache = True
--+ 
--+     def _init_weights(self, module):
--+         std = self.config.initializer_range
--+@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+         return causal_mask
--+ 
--+ 
--+-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+     _tied_weights_keys = ["lm_head.weight"]
--+ 
--+     def __init__(self, config):
--+@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+         self.num_experts_per_tok = config.num_experts_per_tok
--+         # Initialize weights and apply final processing
--+         self.post_init()
--++        # @lwx
--++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++        #     self.generation_config.cache_implementation = "static"
--++        self._warmed_up = False
--++
--++    def warmup_moe_model(self):
--++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++        test_texts = [
--++            "warmup short",
--++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++        ]
--++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++        if tokenizer is None:
--++            from mindnlp.transformers import AutoTokenizer
--++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++            self._warmup_tokenizer = tokenizer
--++
--++        for text in test_texts:
--++            inputs = tokenizer(text, return_tensors="ms")
--++            with mindspore._no_grad():
--++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+ 
--+     def get_input_embeddings(self):
--+         return self.model.embed_tokens
--+@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+         ```"""
--++        if not self._warmed_up:
--++            self._warmed_up = True
--++            self.warmup_moe_model()
--+ 
--+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+         output_router_logits = (
--+@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+             }
--+         )
--+         return model_inputs
--++# @lwx
--++    # def _decode_one_tokens_logits(
--++    #     self,
--++    #     cur_token: mindspore.Tensor,
--++    #     input_pos: Optional[mindspore.Tensor],
--++    #     cache_position: mindspore.Tensor,
--++    #     past_key_values: StaticCache,
--++    # ) -> mindspore.Tensor:
--++    #     """
--++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++        
--++    #     Args:
--++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++    #         input_pos: 输入位置信息，可选
--++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++            
--++    #     Returns:
--++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++    #     """
--++    #     # 调用JIT编译的版本
--++    #     return self.get_decode_one_tokens_logits(
--++    #         cur_token=cur_token,
--++    #         input_pos=input_pos,
--++    #         cache_position=cache_position,
--++    #         past_key_values=past_key_values,
--++    #     )
--++    
--++    # @mindspore.jit(jit_level='O1')
--++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++    #     """
--++    #     JIT编译的函数，用于高效的单token解码
--++    #     使用JIT编译优化以支持静态shape和高效执行
--++        
--++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++    #     """
--++    #     outputs = self.model.forward(
--++    #         input_ids=cur_token,
--++    #         position_ids=input_pos,
--++    #         cache_position=cache_position,
--++    #         past_key_values=past_key_values,
--++    #         use_cache=True,
--++    #         return_dict=False,
--++    #     )
--++        
--++    #     hidden_states = outputs[0]
--++    #     logits = self.lm_head.forward(hidden_states)
--++    #     logits = logits.float()
--++        
--++    #     return logits[:, -1, :]
--++
--++    # def _sample(
--++    #     self,
--++    #     input_ids: mindspore.Tensor,
--++    #     logits_processor,
--++    #     stopping_criteria,
--++    #     generation_config,
--++    #     synced_devices: bool,
--++    #     streamer=None,
--++    #     logits_warper=None,
--++    #     **model_kwargs,
--++    # ):
--++    #     """
--++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++    #     """
--++    #     from ...generation.logits_process import LogitsProcessorList
--++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++    #     from mindnlp.core import nn, ops, no_grad
--++    #     import numpy as np
--++        
--++    #     # 检查是否使用 StaticCache
--++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++    #     # 否则，直接调用父类方法
--++    #     past_key_values = model_kwargs.get("past_key_values")
--++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++
--++    #     if not isinstance(past_key_values, StaticCache):
--++    #         # 不使用 StaticCache，直接调用父类方法
--++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++    #         return super()._sample(
--++    #             input_ids=input_ids,
--++    #             logits_processor=logits_processor,
--++    #             stopping_criteria=stopping_criteria,
--++    #             generation_config=generation_config,
--++    #             synced_devices=synced_devices,
--++    #             streamer=streamer,
--++    #             logits_warper=logits_warper,
--++    #             **model_kwargs,
--++    #         )
--++        
--++    #     # 使用 StaticCache，进入自定义循环
--++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++    #     pad_token_id = generation_config._pad_token_tensor
--++    #     output_attentions = generation_config.output_attentions
--++    #     output_hidden_states = generation_config.output_hidden_states
--++    #     output_scores = generation_config.output_scores
--++    #     output_logits = generation_config.output_logits
--++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++    #     max_length = generation_config.max_length
--++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++    #     do_sample = generation_config.do_sample
--++        
--++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++    #         raise ValueError(
--++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++    #             f"{logits_warper})."
--++    #         )
--++        
--++    #     # init attention / hidden states / scores tuples
--++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++        
--++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++    #         encoder_hidden_states = (
--++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++    #         )
--++        
--++    #     # keep track of which sequences are already finished
--++    #     batch_size, cur_len = input_ids.shape
--++    #     this_peer_finished = False
--++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++        
--++    #     time_record = []
--++    #     from ....utils.testing_utils import parse_flag_from_env
--++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++        
--++    #     while self._has_unfinished_sequences(
--++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++    #     ):
--++    #         if _record_time:
--++    #             import time as time_module
--++    #             infer_start = time_module.time()
--++            
--++    #         # prepare model inputs
--++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++            
--++    #         # prepare variable output controls
--++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++            
--++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++    #         cur_cache_position = model_inputs.get("cache_position")
--++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++    #         cur_input_ids = model_inputs.get("input_ids")
--++            
--++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++    #             cur_cache_position is not None and 
--++    #             len(cur_cache_position.shape) > 0 and
--++    #             cur_cache_position.shape[0] == 1 and
--++    #             cur_input_ids is not None and
--++    #             cur_input_ids.shape[1] == 1):
--++    #             # 使用 JIT 优化的单 token 解码
--++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++    #             if not hasattr(self, '_jit_used'):
--++    #                 self._jit_used = False
--++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++                
--++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++    #                 cur_token=cur_input_ids,
--++    #                 input_pos=model_inputs.get("position_ids"),
--++    #                 cache_position=cur_cache_position,
--++    #                 past_key_values=cur_past_key_values,
--++    #             )
--++                
--++    #             # 标记已使用JIT（用于后续判断）
--++    #             if not self._jit_used:
--++    #                 self._jit_used = True
--++                
--++    #             # 构造兼容的输出对象
--++    #             class JitOptimizedOutput:
--++    #                 def __init__(self, logits, config):
--++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++    #                     self.config = config
--++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++    #                     self.cross_attentions = None
--++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++                
--++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++    #         else:
--++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++    #             outputs = self(**model_inputs, return_dict=True)
--++            
--++    #         if synced_devices and this_peer_finished:
--++    #             continue
--++            
--++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++    #         next_token_logits = outputs.logits[:, -1, :]
--++            
--++    #         # pre-process distribution
--++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++    #         if do_sample:
--++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++            
--++    #         # Store scores, attentions and hidden_states when required
--++    #         if return_dict_in_generate:
--++    #             if output_scores:
--++    #                 scores += (next_token_scores,)
--++    #             if output_logits:
--++    #                 raw_logits += (next_token_logits,)
--++    #             if output_attentions:
--++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++    #                 if self.config.is_encoder_decoder:
--++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++                
--++    #             if output_hidden_states:
--++    #                 hidden = (
--++    #                     outputs.decoder_hidden_states
--++    #                     if self.config.is_encoder_decoder
--++    #                     else outputs.hidden_states
--++    #                 )
--++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++            
--++    #         # token selection
--++    #         if do_sample:
--++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++    #         else:
--++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++            
--++    #         # finished sentences should have their next token be a padding token
--++    #         if has_eos_stopping_criteria:
--++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++            
--++    #         # update generated ids, model inputs, and length for next step
--++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++    #         if streamer is not None:
--++    #             streamer.put(next_tokens)
--++            
--++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++    #             outputs,
--++    #             model_kwargs,
--++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++    #         )
--++            
--++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++    #         cur_len += 1
--++            
--++    #         if _record_time:
--++    #             import time as time_module
--++    #             infer_stop = time_module.time()
--++    #             time_record.append(infer_stop - infer_start)
--++            
--++    #         del outputs
--++        
--++    #     average_infer_time = None
--++    #     if time_record:
--++    #         if len(time_record) > 1:
--++    #             time_record.pop(0)
--++    #         average_infer_time = sum(time_record) / len(time_record)
--++    #         print(f'average inference time is: {average_infer_time}')
--++    #         print(f'inference time record: {time_record}')
--++        
--++    #     if streamer is not None:
--++    #         streamer.end()
--++        
--++    #     # 简单判断：打印是否使用了JIT路径
--++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++    #     else:
--++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++        
--++    #     if return_dict_in_generate:
--++    #         if self.config.is_encoder_decoder:
--++    #             return GenerateEncoderDecoderOutput(
--++    #                 sequences=input_ids,
--++    #                 scores=scores,
--++    #                 logits=raw_logits,
--++    #                 encoder_attentions=encoder_attentions,
--++    #                 encoder_hidden_states=encoder_hidden_states,
--++    #                 decoder_attentions=decoder_attentions,
--++    #                 cross_attentions=cross_attentions,
--++    #                 decoder_hidden_states=decoder_hidden_states,
--++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++    #                 average_infer_time=average_infer_time
--++    #             )
--++    #         else:
--++    #             return GenerateDecoderOnlyOutput(
--++    #                 sequences=input_ids,
--++    #                 scores=scores,
--++    #                 logits=raw_logits,
--++    #                 attentions=decoder_attentions,
--++    #                 hidden_states=decoder_hidden_states,
--++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++    #                 average_infer_time=average_infer_time
--++    #             )
--++    #     else:
--++    #         return input_ids
--++            
--++    # def _prepare_cache_for_generation(
--++    #     self,
--++    #     generation_config,
--++    #     model_kwargs,
--++    #     assistant_model,
--++    #     batch_size,
--++    #     max_cache_length,
--++    # ):
--++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++    #         generation_config.cache_implementation = "static"
--++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++        
--++    #     if generation_config.cache_implementation == "static":
--++    #         base_required_from_max_length = generation_config.max_length + 1
--++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++    #         min_cache_size = 50
--++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++    #         else:
--++    #             max_cache_length = max(base_required, min_cache_size)
--++            
--++    #         original_max_cache_length = max_cache_length
--++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++            
--++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++    #             if max_cache_length > self.config.max_position_embeddings:
--++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++        
--++    #     result = super()._prepare_cache_for_generation(
--++    #         generation_config=generation_config,
--++    #         model_kwargs=model_kwargs,
--++    #         assistant_model=assistant_model,
--++    #         batch_size=batch_size,
--++    #         max_cache_length=max_cache_length,
--++    #     )
--++        
--++    #     if generation_config.cache_implementation == "static":
--++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++    #         created_cache = model_kwargs.get(cache_name)
--++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++    #             if created_cache.max_cache_len < generation_config.max_length:
--++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++        
--++    #     return result
--++
--++
--++
--+ 
--+ 
--+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+-- 
--+2.27.0
--+
--diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--new file mode 100644
--index 00000000..22b65dd5
----- /dev/null
--+++ b/patches/0002-20251106commit.patch
--@@ -0,0 +1,3200 @@
--+From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Thu, 6 Nov 2025 09:20:38 +0800
--+Subject: [PATCH 2/3] 20251106commit
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
--+ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
--+ 3 files changed, 2689 insertions(+), 305 deletions(-)
--+ create mode 100644 patches/0001-20251104commit.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index d8303e45..73773c22 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--+         #     y = y + self.shared_experts(identity)
--+         # return y
--+         
--++    # @no_grad()
--++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++
--++    #     expert_cache = ops.zeros_like(x)
--++    #     for i in range(self.num_experts_per_tok):
--++    #         expert_id = flat_expert_indices[i].item()
--++    #         weight = flat_expert_weights[i].item()
--++    #         expert = self.experts[expert_id]
--++    #         expert_out = expert(x)
--++    #         expert_cache += expert_out * weight
--++    #     return expert_cache
--++
--+     @no_grad()
--+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++        # x 的 shape: (1, hidden_size)
--++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++
--++        # 1. 收集所有需要的专家层
--++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--++
--++        # 2. 并行计算所有专家的输出
--++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++        # ops.cat 会将它们堆叠成一个新的 Tensor
--++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++
--++        # 3. 使用矩阵乘法进行加权求和
--++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++        # 最终结果 final_output 的 shape: (1, hidden_size)
--++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++        
--++        return final_output
--+ 
--+-        expert_cache = ops.zeros_like(x)
--+-        for i in range(self.num_experts_per_tok):
--+-            expert_id = flat_expert_indices[i].item()
--+-            weight = flat_expert_weights[i].item()
--+-            expert = self.experts[expert_id]
--+-            expert_out = expert(x)
--+-            expert_cache += expert_out * weight
--+-        return expert_cache
--+ 
--+     @no_grad()
--+     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--+             key_states = self.k_proj(hidden_states)
--+             value_states = self.v_proj(hidden_states)
--+ 
--+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++        # @lwx
--++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+ 
--+         kv_seq_len = key_states.shape[-2]
--+         if past_key_value is not None:
--+@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--+         return attn_output, attn_weights, past_key_value
--+ 
--+ 
--++# class DeepseekFlashAttention(nn.Module):
--++#     """
--++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--++
--++#     This class is designed as a drop-in replacement for DeepseekAttention.
--++#     """
--++
--++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++#         super().__init__()
--++#         self.config = config
--++#         self.layer_idx = layer_idx
--++#         if layer_idx is None:
--++#             logger.warning(
--++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++#                 "when creating this class."
--++#             )
--++
--++#         self.attention_dropout = config.attention_dropout
--++#         self.hidden_size = config.hidden_size
--++#         self.num_heads = config.num_attention_heads
--++#         self.head_dim = self.hidden_size // self.num_heads
--++#         self.num_key_value_heads = config.num_key_value_heads
--++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++#         self.max_position_embeddings = config.max_position_embeddings
--++#         self.rope_theta = config.rope_theta
--++#         self.is_causal = True
--++
--++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++#             raise ValueError(
--++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++#                 f" and `num_heads`: {self.num_heads})."
--++#             )
--++
--++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++#         self._init_rope()
--++
--++#     def _init_rope(self):
--++#         if self.config.rope_scaling is None:
--++#             self.rotary_emb = DeepseekRotaryEmbedding(
--++#                 self.head_dim,
--++#                 max_position_embeddings=self.max_position_embeddings,
--++#                 base=self.rope_theta,
--++#             )
--++#         else:
--++#             scaling_type = self.config.rope_scaling["type"]
--++#             scaling_factor = self.config.rope_scaling["factor"]
--++#             if scaling_type == "linear":
--++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++#                     self.head_dim,
--++#                     max_position_embeddings=self.max_position_embeddings,
--++#                     scaling_factor=scaling_factor,
--++#                     base=self.rope_theta,
--++#                 )
--++#             elif scaling_type == "dynamic":
--++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++#                     self.head_dim,
--++#                     max_position_embeddings=self.max_position_embeddings,
--++#                     scaling_factor=scaling_factor,
--++#                     base=self.rope_theta,
--++#                 )
--++#             else:
--++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++
--++#     def forward(
--++#         self,
--++#         hidden_states: mindspore.Tensor,
--++#         attention_mask: Optional[mindspore.Tensor] = None,
--++#         position_ids: Optional[mindspore.Tensor] = None,
--++#         past_key_value: Optional[Cache] = None,
--++#         output_attentions: bool = False,
--++#         use_cache: bool = False,
--++#         **kwargs,
--++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++#         if "padding_mask" in kwargs:
--++#             warnings.warn(
--++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++#             )
--++        
--++#         if output_attentions:
--++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--++
--++#         bsz, q_len, _ = hidden_states.shape
--++
--++#         if self.config.pretraining_tp > 1:
--++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++
--++#         query_states = self.q_proj(hidden_states)
--++#         key_states = self.k_proj(hidden_states)
--++#         value_states = self.v_proj(hidden_states)
--++
--++#         # Reshape for multi-head attention
--++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++#         kv_seq_len = key_states.shape[-2]
--++#         if past_key_value is not None:
--++#             if self.layer_idx is None:
--++#                 raise ValueError(
--++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++#                     "with a layer index."
--++#                 )
--++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++        
--++#         # Apply Rotary Positional Embedding
--++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++#         if past_key_value is not None:
--++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++
--++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++        
--++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++        
--++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++
--++#         # Convert attention_mask for flash_attention_score
--++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--++#         if attention_mask is not None:
--++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++#                 raise ValueError(
--++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++#                 )
--++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--++#         else:
--++#             attn_mask_for_fa = None
--++        
--++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++
--++#         # Call the fused flash_attention_score operator
--++#         attn_output = mindspore.ops.flash_attention_score(
--++#             query=query_states_for_fa,
--++#             key=key_states_for_fa,
--++#             value=value_states_for_fa,
--++#             head_num=self.num_heads, # This is N1, the number of query heads
--++#             input_layout='BSH',
--++#             attn_mask=attn_mask_for_fa,
--++#             keep_prob=keep_prob,
--++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++#             sparse_mode=0 # Default mask mode
--++#         )
--++        
--++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--++#         attn_output = self.o_proj(attn_output)
--++        
--++#         # Flash Attention does not return attention weights
--++#         attn_weights = None
--++
--++#         return attn_output, attn_weights, past_key_value
--++
--++class DeepseekFlashAttention(nn.Module):
--++    """
--++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--++    This implementation is a drop-in replacement for the original DeepseekAttention class,
--++    designed for high performance on supported hardware (Ascend).
--++
--++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--++    """
--++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++        super().__init__()
--++        self.config = config
--++        self.layer_idx = layer_idx
--++        if layer_idx is None:
--++            logger.warning(
--++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++                "when creating this class."
--++            )
--++
--++        # --- [FIX] Correctly initialize all required attributes ---
--++        self.attention_dropout = config.attention_dropout
--++        self.hidden_size = config.hidden_size
--++        self.num_heads = config.num_attention_heads
--++        self.head_dim = self.hidden_size // self.num_heads
--++        self.num_key_value_heads = config.num_key_value_heads
--++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++        self.max_position_embeddings = config.max_position_embeddings
--++        self.rope_theta = config.rope_theta
--++        self.is_causal = True
--++
--++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++            raise ValueError(
--++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++                f" and `num_heads`: {self.num_heads})."
--++            )
--++
--++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++        
--++        # This call will now succeed as all attributes are initialized.
--++        self._init_rope()
--++
--++    def _init_rope(self):
--++        if self.config.rope_scaling is None:
--++            self.rotary_emb = DeepseekRotaryEmbedding(
--++                self.head_dim,
--++                max_position_embeddings=self.max_position_embeddings,
--++                base=self.rope_theta,
--++            )
--++        else:
--++            scaling_type = self.config.rope_scaling["type"]
--++            scaling_factor = self.config.rope_scaling["factor"]
--++            if scaling_type == "linear":
--++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++                    self.head_dim,
--++                    max_position_embeddings=self.max_position_embeddings,
--++                    scaling_factor=scaling_factor,
--++                    base=self.rope_theta,
--++                )
--++            elif scaling_type == "dynamic":
--++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++                    self.head_dim,
--++                    max_position_embeddings=self.max_position_embeddings,
--++                    scaling_factor=scaling_factor,
--++                    base=self.rope_theta,
--++                )
--++            else:
--++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++
--++    def forward(
--++        self,
--++        hidden_states: mindspore.Tensor,
--++        attention_mask: Optional[mindspore.Tensor] = None,
--++        position_ids: Optional[mindspore.Tensor] = None,
--++        past_key_value: Optional[Cache] = None,
--++        output_attentions: bool = False,
--++        use_cache: bool = False,
--++        **kwargs,
--++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++        if "padding_mask" in kwargs:
--++            warnings.warn(
--++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++            )
--++        if output_attentions:
--++            warnings.warn(
--++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--++            )
--++
--++        bsz, q_len, _ = hidden_states.shape
--++
--++        if self.config.pretraining_tp > 1:
--++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++
--++        query_states = self.q_proj(hidden_states)
--++        key_states = self.k_proj(hidden_states)
--++        value_states = self.v_proj(hidden_states)
--++
--++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++        kv_seq_len = key_states.shape[-2]
--++        if past_key_value is not None:
--++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++        
--++        # Apply Rotary Position Embedding
--++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++        if past_key_value is not None:
--++            cache_kwargs = {"sin": sin, "cos": cos}
--++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++
--++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--++        # So we must explicitly repeat the KV heads.
--++        key_states = repeat_kv(key_states, self.num_key_value_groups)
--++        value_states = repeat_kv(value_states, self.num_key_value_groups)
--++
--++        # Convert attention mask for flash_attention_score
--++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--++        if attention_mask is not None:
--++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++                 raise ValueError(
--++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++                )
--++            attn_mask_for_fa = attention_mask < 0
--++        else:
--++            attn_mask_for_fa = None
--++
--++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++
--++        # Call the fused operator using the efficient BNSD layout
--++        attn_output = mindspore.ops.flash_attention_score(
--++            query=query_states,
--++            key=key_states,
--++            value=value_states,
--++            head_num=self.num_heads,
--++            input_layout='BNSD', # Specify the correct layout
--++            attn_mask=attn_mask_for_fa,
--++            keep_prob=keep_prob,
--++            scalar_value=1.0 / math.sqrt(self.head_dim)
--++        )
--++        
--++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++        
--++        # Apply output projection
--++        attn_output = self.o_proj(attn_output)
--++
--++        # Flash attention does not return attention weights, so we return None.
--++        attn_weights = None
--++
--++        return attn_output, attn_weights, past_key_value
--++
--+ Deepseek_ATTENTION_CLASSES = {
--+     "eager": DeepseekAttention,
--++    "flash-attention": DeepseekFlashAttention,
--+ }
--+ 
--+ 
--+@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--+             config=config, layer_idx=layer_idx
--+         )
--+ 
--++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--++            config=config, layer_idx=layer_idx
--++        )
--++
--+         self.mlp = (
--+             DeepseekMoE(config)
--+             if (
--+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+index d4c6b651..bced285c 100644
--+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
--+ 
--+ import mindspore
--+ import mindnlp.core.nn.functional as F
--+-from mindnlp.core import nn, ops
--++from mindnlp.core import nn, ops, no_grad
--+ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
--+ 
--+ from ....common.activations import ACT2FN
--+@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
--+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+ 
--++Long_Prompt = False
--++PROMPT_LENGTH_THRESHOLD = 128
--+ 
--+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+ def _prepare_4d_causal_attention_mask_with_cache_position(
--+@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--+         return attn_output, attn_weights, past_key_value
--+ 
--+ 
--++# class Qwen2MoeFlashAttention(nn.Module):
--++#     """
--++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++
--++#     关键改动:
--++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++#        直接传入原始的 key 和 value 张量效率更高。
--++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++#         super().__init__()
--++#         self.config = config
--++#         self.layer_idx = layer_idx
--++#         self.hidden_size = config.hidden_size
--++#         self.num_heads = config.num_attention_heads
--++#         self.head_dim = self.hidden_size // self.num_heads
--++#         self.num_key_value_heads = config.num_key_value_heads
--++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++#         self.max_position_embeddings = config.max_position_embeddings
--++#         self.rope_theta = config.rope_theta
--++#         self.attention_dropout = config.attention_dropout
--++
--++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++#             raise ValueError(
--++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++#             )
--++
--++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++
--++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++#             self.head_dim,
--++#             max_position_embeddings=self.max_position_embeddings,
--++#             base=self.rope_theta,
--++#         )
--++
--++#     def forward(
--++#         self,
--++#         hidden_states: mindspore.Tensor,
--++#         attention_mask: Optional[mindspore.Tensor] = None,
--++#         position_ids: Optional[mindspore.Tensor] = None,
--++#         past_key_value: Optional[Cache] = None,
--++#         output_attentions: bool = False,
--++#         use_cache: bool = False,
--++#         cache_position: Optional[mindspore.Tensor] = None,
--++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++#         bsz, q_len, _ = hidden_states.shape
--++
--++#         # 1. 线性投射 Q, K, V
--++#         query_states = self.q_proj(hidden_states)
--++#         key_states = self.k_proj(hidden_states)
--++#         value_states = self.v_proj(hidden_states)
--++
--++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++#         # query:   [B, S, H*D] -> [B, N1, S, D]
--++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++#         # 3. RoPE 旋转位置编码
--++#         kv_seq_len = key_states.shape[-2]
--++#         if past_key_value is not None:
--++#             if self.layer_idx is None:
--++#                 raise ValueError(
--++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++#                     "with a layer index."
--++#                 )
--++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++#                 if cache_position.shape[0] == 1:
--++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++#                     kv_seq_len = past_seen_tokens + 1
--++#                 else:
--++#                     # prefill 阶段：cache_position 是范围，使用其长度
--++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++#             else:
--++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++#         # 4. KV 缓存更新
--++#         if past_key_value is not None:
--++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++#             key_states, value_states = past_key_value.update(
--++#                 key_states, value_states, self.layer_idx, cache_kwargs
--++#             )
--++            
--++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++#                 if cache_position.shape[0] == 1:
--++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++#                     kv_seq_len = key_states.shape[-2]
--++
--++#         # 5. [重要] 准备 Attention Mask
--++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++#         fa_attention_mask = None
--++#         if attention_mask is not None:
--++#             # 截取与当前key长度匹配的部分
--++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--++#             fa_attention_mask = (mask_slice != 0)
--++
--++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++#         input_dtype = query_states.dtype
--++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++#             query_states = query_states.to(mindspore.float16)
--++#             key_states = key_states.to(mindspore.float16)
--++#             value_states = value_states.to(mindspore.float16)
--++
--++#         # 6. [核心] 调用 flash_attention_score 算子
--++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++#         attn_output = mindspore.ops.flash_attention_score(
--++#             query=query_states,
--++#             key=key_states,
--++#             value=value_states,
--++#             head_num=self.num_heads, # 传入Q的头数(N1)
--++#             attn_mask=fa_attention_mask,
--++#             keep_prob=1.0 - self.attention_dropout,
--++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++#             input_layout="BNSD",
--++#             sparse_mode=0 # 使用 defaultMask 模式
--++#         )
--++
--++#         # 恢复原始数据类型
--++#         attn_output = attn_output.to(input_dtype)
--++
--++#         # 7. 调整输出形状
--++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++#         attn_output = self.o_proj(attn_output)
--++
--++#         # FlashAttention 算子不直接返回注意力权重矩阵
--++#         attn_weights = None
--++#         if output_attentions:
--++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++#         return attn_output, attn_weights, past_key_value
--++
--++#     # def forward(
--++#     #     self,
--++#     #     hidden_states: mindspore.Tensor,
--++#     #     attention_mask: Optional[mindspore.Tensor] = None,
--++#     #     position_ids: Optional[mindspore.Tensor] = None,
--++#     #     past_key_value: Optional[Cache] = None,
--++#     #     output_attentions: bool = False,
--++#     #     use_cache: bool = False,
--++#     #     cache_position: Optional[mindspore.Tensor] = None,
--++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++#     #     bsz, q_len, _ = hidden_states.shape
--++
--++#     #     # 1. 线性投射 Q, K, V
--++#     #     query_states = self.q_proj(hidden_states)
--++#     #     key_states = self.k_proj(hidden_states)
--++#     #     value_states = self.v_proj(hidden_states)
--++
--++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++
--++#     #     # 3. RoPE 旋转位置编码
--++#     #     kv_seq_len = key_states.shape[-2]
--++#     #     if past_key_value is not None:
--++#     #         if self.layer_idx is None:
--++#     #             raise ValueError(
--++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++#     #                 "with a layer index."
--++#     #             )
--++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++
--++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++#     #     # 4. KV 缓存更新
--++#     #     if past_key_value is not None:
--++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++#     #         key_states, value_states = past_key_value.update(
--++#     #             key_states, value_states, self.layer_idx, cache_kwargs
--++#     #         )
--++
--++#     #     # 5. 准备 Attention Mask
--++#     #     fa_attention_mask = None
--++#     #     if attention_mask is not None:
--++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++#     #         fa_attention_mask = (mask_slice != 0)
--++
--++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++#     #     input_dtype = query_states.dtype
--++
--++#     #     # 6. [核心] 调用 flash_attention_score 算子
--++#     #     attn_output = mindspore.ops.flash_attention_score(
--++#     #         query=query_states,
--++#     #         key=key_states,
--++#     #         value=value_states,
--++#     #         head_num=self.num_heads,
--++#     #         attn_mask=fa_attention_mask,
--++#     #         keep_prob=1.0 - self.attention_dropout,
--++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++#     #         input_layout="BNSD",
--++#     #         sparse_mode=0,
--++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++#     #         inner_precise=1
--++#     #     )
--++
--++#     #     # 恢复原始数据类型
--++#     #     attn_output = attn_output.to(input_dtype)
--++
--++#     #     # 7. 调整输出形状
--++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++#     #     attn_output = self.o_proj(attn_output)
--++
--++#     #     attn_weights = None
--++#     #     if output_attentions:
--++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++
--++#     #     return attn_output, attn_weights, past_key_value
--++
--++
--+ class Qwen2MoeFlashAttention(nn.Module):
--+     """
--+-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+-
--+-    关键改动:
--+-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+-       直接传入原始的 key 和 value 张量效率更高。
--+-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--++
--++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--++    完全使用模型的低精度数据类型（如 float16）进行计算，
--++    以达到理论上的最高执行速度。
--+     """
--+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+         super().__init__()
--+         self.config = config
--+         self.layer_idx = layer_idx
--++        if layer_idx is None:
--++            logger.warning_once(
--++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--++            )
--++
--+         self.hidden_size = config.hidden_size
--+         self.num_heads = config.num_attention_heads
--+         self.head_dim = self.hidden_size // self.num_heads
--+         self.num_key_value_heads = config.num_key_value_heads
--+-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+         self.max_position_embeddings = config.max_position_embeddings
--+         self.rope_theta = config.rope_theta
--+         self.attention_dropout = config.attention_dropout
--+ 
--+-        if (self.head_dim * self.num_heads) != self.hidden_size:
--+-            raise ValueError(
--+-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+-            )
--+-
--+         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--+         key_states = self.k_proj(hidden_states)
--+         value_states = self.v_proj(hidden_states)
--+ 
--+-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+-        # query:   [B, S, H*D] -> [B, N1, S, D]
--+-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++        # 2. 调整形状以匹配 BNSD 布局
--+         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-
--+-        # 3. RoPE 旋转位置编码
--++        
--++        # 3. RoPE 和 KV 缓存
--+         kv_seq_len = key_states.shape[-2]
--+         if past_key_value is not None:
--+-            if self.layer_idx is None:
--+-                raise ValueError(
--+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-                    "with a layer index."
--+-                )
--+-            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+-                if cache_position.shape[0] == 1:
--+-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+-                    kv_seq_len = past_seen_tokens + 1
--+-                else:
--+-                    # prefill 阶段：cache_position 是范围，使用其长度
--+-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+-            else:
--+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-
--++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++        
--+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+ 
--+-        # 4. KV 缓存更新
--+         if past_key_value is not None:
--+             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-            key_states, value_states = past_key_value.update(
--+-                key_states, value_states, self.layer_idx, cache_kwargs
--+-            )
--+-            
--+-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+-                if cache_position.shape[0] == 1:
--+-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+-                    kv_seq_len = key_states.shape[-2]
--+-
--+-        # 5. [重要] 准备 Attention Mask
--+-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++
--++        # 4. 准备 Attention Mask
--+         fa_attention_mask = None
--+         if attention_mask is not None:
--+-            # 截取与当前key长度匹配的部分
--+-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+             fa_attention_mask = (mask_slice != 0)
--+ 
--+-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+-        input_dtype = query_states.dtype
--+-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+-            query_states = query_states.to(mindspore.float16)
--+-            key_states = key_states.to(mindspore.float16)
--+-            value_states = value_states.to(mindspore.float16)
--+-
--+-        # 6. [核心] 调用 flash_attention_score 算子
--+-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--+         attn_output = mindspore.ops.flash_attention_score(
--+             query=query_states,
--+             key=key_states,
--+             value=value_states,
--+-            head_num=self.num_heads, # 传入Q的头数(N1)
--++            head_num=self.num_heads,
--+             attn_mask=fa_attention_mask,
--+-            keep_prob=1.0 - self.attention_dropout,
--++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--+             scalar_value=1.0 / math.sqrt(self.head_dim),
--+             input_layout="BNSD",
--+-            sparse_mode=0 # 使用 defaultMask 模式
--++            sparse_mode=0,
--++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--+         )
--+ 
--+-        # 恢复原始数据类型
--+-        attn_output = attn_output.to(input_dtype)
--+-
--+-        # 7. 调整输出形状
--+-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++        # 6. 调整输出形状
--+         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+         attn_output = self.o_proj(attn_output)
--+ 
--+-        # FlashAttention 算子不直接返回注意力权重矩阵
--++        # 7. 返回结果
--+         attn_weights = None
--+         if output_attentions:
--+-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--+-    # def forward(
--+-    #     self,
--+-    #     hidden_states: mindspore.Tensor,
--+-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+-    #     position_ids: Optional[mindspore.Tensor] = None,
--+-    #     past_key_value: Optional[Cache] = None,
--+-    #     output_attentions: bool = False,
--+-    #     use_cache: bool = False,
--+-    #     cache_position: Optional[mindspore.Tensor] = None,
--+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-
--+-    #     bsz, q_len, _ = hidden_states.shape
--+-
--+-    #     # 1. 线性投射 Q, K, V
--+-    #     query_states = self.q_proj(hidden_states)
--+-    #     key_states = self.k_proj(hidden_states)
--+-    #     value_states = self.v_proj(hidden_states)
--+-
--+-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-
--+-    #     # 3. RoPE 旋转位置编码
--+-    #     kv_seq_len = key_states.shape[-2]
--+-    #     if past_key_value is not None:
--+-    #         if self.layer_idx is None:
--+-    #             raise ValueError(
--+-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-    #                 "with a layer index."
--+-    #             )
--+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+ 
--+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-
--+-    #     # 4. KV 缓存更新
--+-    #     if past_key_value is not None:
--+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-    #         key_states, value_states = past_key_value.update(
--+-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+-    #         )
--+-
--+-    #     # 5. 准备 Attention Mask
--+-    #     fa_attention_mask = None
--+-    #     if attention_mask is not None:
--+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-    #         fa_attention_mask = (mask_slice != 0)
--+-
--+-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+-    #     input_dtype = query_states.dtype
--+-
--+-    #     # 6. [核心] 调用 flash_attention_score 算子
--+-    #     attn_output = mindspore.ops.flash_attention_score(
--+-    #         query=query_states,
--+-    #         key=key_states,
--+-    #         value=value_states,
--+-    #         head_num=self.num_heads,
--+-    #         attn_mask=fa_attention_mask,
--+-    #         keep_prob=1.0 - self.attention_dropout,
--+-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+-    #         input_layout="BNSD",
--+-    #         sparse_mode=0,
--+-    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+-    #         inner_precise=1
--+-    #     )
--+-
--+-    #     # 恢复原始数据类型
--+-    #     attn_output = attn_output.to(input_dtype)
--++QWEN2MOE_ATTENTION_CLASSES = {
--++    "eager": Qwen2MoeAttention,
--++    "flash-attention": Qwen2MoeFlashAttention,
--++}
--+ 
--+-    #     # 7. 调整输出形状
--+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-    #     attn_output = self.o_proj(attn_output)
--+ 
--+-    #     attn_weights = None
--+-    #     if output_attentions:
--+-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     def __init__(self, config):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         # gating
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     #@dwj
--++#     # 只遍历激活的专家，而非全部专家
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#             num_tokens = hidden_states_reshaped.shape[0]
--++
--++#             router_logits = self.gate(hidden_states_reshaped)
--++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++
--++#             if self.norm_topk_prob:
--++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++#             routing_weights = routing_weights.to(hidden_states.dtype)
--++            
--++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++#             flat_selected_experts = selected_experts.flatten()
--++            
--++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++#             token_indices = broadcasted_token_indices.flatten()
--++            
--++#             active_experts = ops.unique(flat_selected_experts)
--++            
--++#             for expert_idx_tensor in active_experts:
--++#                 expert_idx = expert_idx_tensor.item()
--++#                 expert_layer = self.experts[expert_idx]
--++                
--++#                 mask = (flat_selected_experts == expert_idx_tensor)
--++#                 selected_token_indices = token_indices[mask]
--++#                 selected_routing_weights = routing_weights.flatten()[mask]
--++                
--++#                 current_states = hidden_states_reshaped[selected_token_indices]
--++                
--++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++                
--++#                 final_hidden_states = final_hidden_states.index_add(
--++#                     dim=0,
--++#                     index=selected_token_indices,
--++#                     source=expert_output.to(hidden_states.dtype)
--++#                 )
--++            
--++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+ 
--+-    #     return attn_output, attn_weights, past_key_value
--++#             final_hidden_states = final_hidden_states + shared_expert_output
--++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++            
--++#             return final_hidden_states, router_logits
--++
--++
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     """
--++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--++#     `_moe_infer_prefill` (用于长序列处理) 方法。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         # 门控网络
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         # 专家列表
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++#         # 共享专家
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     @no_grad()
--++#     def _moe_infer_decode(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         """
--++#         【解码路径】针对 sequence_length=1 的极致优化。
--++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--++#         """
--++#         batch_size, hidden_dim = hidden_states.shape
--++        
--++#         expert_outputs_list = [
--++#             ops.cat([
--++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++#             ], dim=0) 
--++#             for i in range(batch_size)
--++#         ]
--++        
--++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--++#         # shape: (batch_size, top_k, hidden_dim)
--++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++        
--++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++        
--++#         return moe_output.squeeze(1)
--++
--++#     @no_grad()
--++#     def _moe_infer_prefill(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         """
--++#         【预填充路径】针对 sequence_length > 1 的优化。
--++#         按专家对 Token 进行分组，并进行批处理。
--++#         """
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens = hidden_states.shape[0]
--++#         flat_selected_experts = selected_experts.flatten()
--++        
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++        
--++#         active_experts = ops.unique(flat_selected_experts)
--++        
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++            
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++#             selected_token_indices = token_indices[mask]
--++#             selected_routing_weights = routing_weights.flatten()[mask]
--++            
--++#             current_states = hidden_states[selected_token_indices]
--++            
--++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++            
--++#             moe_output = moe_output.index_add(
--++#                 dim=0,
--++#                 index=selected_token_indices,
--++#                 source=expert_output.to(hidden_states.dtype)
--++#             )
--++#         return moe_output
--++
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#         """
--++#         顶层 forward 方法，作为智能分发器。
--++#         """
--++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++        
--++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#         router_logits = self.gate(hidden_states_reshaped)
--++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+ 
--+-    # def forward(
--+-    #     self,
--+-    #     hidden_states: mindspore.Tensor,
--+-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+-    #     position_ids: Optional[mindspore.Tensor] = None,
--+-    #     past_key_value: Optional[Cache] = None,
--+-    #     output_attentions: bool = False,
--+-    #     use_cache: bool = False,
--+-    #     cache_position: Optional[mindspore.Tensor] = None,
--+-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-
--+-    #     bsz, q_len, _ = hidden_states.shape
--+-
--+-    #     query_states = self.q_proj(hidden_states)
--+-    #     key_states = self.k_proj(hidden_states)
--+-    #     value_states = self.v_proj(hidden_states)
--+-
--+-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-
--+-    #     kv_seq_len = key_states.shape[-2]
--+-    #     if past_key_value is not None:
--+-    #         if self.layer_idx is None:
--+-    #             raise ValueError("`layer_idx` must be specified for caching")
--+-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-
--+-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-
--+-    #     if past_key_value is not None:
--+-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-    #         key_states, value_states = past_key_value.update(
--+-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+-    #         )
--++#         if self.norm_topk_prob:
--++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++        
--++#         moe_output = None
--++#         # 在推理时，根据序列长度选择最优路径
--++#         if not self.training:
--++#             if sequence_length == 1:
--++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++#             else:
--++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++#         else:
--++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--++#             raise NotImplementedError("Training path is not implemented.")
--++
--++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--++        
--++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--++        
--++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--++        
--++#         return final_hidden_states, router_logits
--++
--++
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     """
--++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         # 门控网络
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         # 专家列表
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++#         # 共享专家
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     @no_grad()
--++#     def _moe_infer_decode(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         batch_size, _ = hidden_states.shape
--++#         expert_outputs_list = [
--++#             ops.cat([
--++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++#             ], dim=0) 
--++#             for i in range(batch_size)
--++#         ]
--++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++#         return moe_output.squeeze(1)
--++
--++#     @no_grad()
--++#     def _moe_infer_prefill(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens = hidden_states.shape[0]
--++#         flat_selected_experts = selected_experts.flatten()
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++#         active_experts = ops.unique(flat_selected_experts)
--++        
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++#             selected_token_indices = token_indices[mask]
--++#             selected_routing_weights = routing_weights.flatten()[mask]
--++#             current_states = hidden_states[selected_token_indices]
--++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++#             moe_output = moe_output.index_add(
--++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++#             )
--++#         return moe_output
--++
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#         """
--++#         顶层 forward 方法，作为智能分发器。
--++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--++#         """
--++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++        
--++#         # 1. 门控计算 (通用逻辑)
--++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#         router_logits = self.gate(hidden_states_reshaped)
--++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++
--++#         if self.norm_topk_prob:
--++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++        
--++#         # 2. 智能分发到最优 MoE 路径
--++#         moe_output = None
--++#         if not self.training:
--++#             if sequence_length == 1:
--++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++#             else:
--++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++#         else:
--++#             raise NotImplementedError("Training path is not implemented.")
--++
--++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++        
--++#         # 4. 合并 MoE 输出和共享专家输出
--++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++        
--++#         # 5. 恢复原始形状并返回
--++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++        
--++#         return final_hidden_states, router_logits
--++
--++# prefill fastest
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     """
--++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         # 门控网络
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         # 专家列表
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++#         # 共享专家
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     @no_grad()
--++#     def _moe_infer_dispatch(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         """
--++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--++#         """
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens, _ = hidden_states.shape
--++        
--++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--++#         flat_selected_experts = selected_experts.flatten()
--++#         flat_routing_weights = routing_weights.flatten()
--+ 
--+-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+-
--+-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+-    #     query_states = query_states / math.sqrt(self.head_dim)
--+-    #     # <--- 修改结束 ---
--+-
--+-    #     fa_attention_mask = None
--+-    #     if attention_mask is not None:
--+-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-    #         fa_attention_mask = (mask_slice != 0)
--+-
--+-    #     input_dtype = query_states.dtype
--+-
--+-    #     attn_output = mindspore.ops.flash_attention_score(
--+-    #         query=query_states,    # 传入已经预先缩放过的 query
--+-    #         key=key_states,
--+-    #         value=value_states,
--+-    #         head_num=self.num_heads,
--+-    #         attn_mask=fa_attention_mask,
--+-    #         keep_prob=1.0 - self.attention_dropout,
--+-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+-    #         input_layout="BNSD",
--+-    #         sparse_mode=0,
--+-    #         inner_precise=1        # 仍然保持内部高精度计算
--+-    #     )
--++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+ 
--+-    #     attn_output = attn_output.to(input_dtype)
--+-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-    #     attn_output = self.o_proj(attn_output)
--++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--++#         active_experts = ops.unique(flat_selected_experts)
--++        
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++            
--++#             # 找到所有分配给该专家的 token
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++            
--++#             # 使用 mask 选取对应的 token 和权重
--++#             current_token_indices = token_indices[mask]
--++#             current_routing_weights = flat_routing_weights[mask]
--++#             current_hidden_states = hidden_states[current_token_indices]
--++            
--++#             # 对这些 token 进行批处理
--++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++            
--++#             # 使用 index_add 将结果精确地加回到对应位置
--++#             moe_output = moe_output.index_add(
--++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--++#             )
--++#         return moe_output
--++
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#         """
--++#         顶层 forward 方法，作为智能分发器。
--++#         """
--++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++        
--++#         # 1. 门控计算
--++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#         router_logits = self.gate(hidden_states_reshaped)
--++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++
--++#         if self.norm_topk_prob:
--++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++        
--++#         # 2. 调用统一的 MoE 计算内核
--++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+ 
--+-    #     attn_weights = None
--+-    #     if output_attentions:
--+-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++#         # 3. 统一处理共享专家
--++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++        
--++#         # 4. 合并输出
--++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++        
--++#         # 5. 恢复原始形状并返回
--++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++        
--++#         return final_hidden_states, router_logits
--++
--++
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     """
--++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++#     【最终高性能与高精度版】：
--++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--++#     3. 这样实现了速度和准确性的两全其美。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     @no_grad()
--++#     def _moe_infer_decode(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         """
--++#         【解码路径】极致优化版：bmm + 高精度累加。
--++#         """
--++#         original_dtype = hidden_states.dtype
--++#         batch_size, _ = hidden_states.shape
--++        
--++#         expert_outputs_list = [
--++#             ops.cat([
--++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++#             ], dim=0) 
--++#             for i in range(batch_size)
--++#         ]
--++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++
--++#         # 在 float32 下执行 bmm，得到高精度结果
--++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++        
--++#         # 将高精度结果转换回原始数据类型
--++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--++        
--++#         return moe_output
--++
--++#     @no_grad()
--++#     def _moe_infer_prefill(
--++#         self, 
--++#         hidden_states: mindspore.Tensor, 
--++#         selected_experts: mindspore.Tensor, 
--++#         routing_weights: mindspore.Tensor
--++#     ) -> mindspore.Tensor:
--++#         """
--++#         【预填充路径】与原始实现一致，结果精确。
--++#         """
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens, _ = hidden_states.shape
--++#         flat_selected_experts = selected_experts.flatten()
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++#         active_experts = ops.unique(flat_selected_experts)
--++        
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++#             selected_token_indices = token_indices[mask]
--++#             selected_routing_weights = routing_weights.flatten()[mask]
--++#             current_states = hidden_states[selected_token_indices]
--++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++#             moe_output = moe_output.index_add(
--++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++#             )
--++#         return moe_output
--++
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++        
--++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#         router_logits = self.gate(hidden_states_reshaped)
--++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+ 
--+-    #     return attn_output, attn_weights, past_key_value
--++#         if self.norm_topk_prob:
--++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--++#         # 如果模型主体是 float16，后续再转换
--++        
--++#         moe_output = None
--++#         if not self.training:
--++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--++#             # _moe_infer_decode 内部会处理好类型转换
--++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--++#             if sequence_length == 1:
--++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++#             else:
--++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++#         else:
--++#             raise NotImplementedError("Training path is not implemented.")
--++
--++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++        
--++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++        
--++#         return final_hidden_states, router_logits
--++    
--+ 
--+-QWEN2MOE_ATTENTION_CLASSES = {
--+-    "eager": Qwen2MoeAttention,
--+-    "flash-attention": Qwen2MoeFlashAttention,
--+-}
--++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++#     """
--++#     【融合版】一个混合专家模块，内置两种推理策略，
--++#     由外部全局变量 `Long_Prompt` 控制：
--++
--++#     - if Long_Prompt is True:  【精度优先模式】
--++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--++#       适用于处理长序列，避免误差累积。
--++
--++#     - if Long_Prompt is False: 【速度优先模式】
--++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--++#       在解码阶段获得极致速度，同时保证结果高度准确。
--++#     """
--++#     def __init__(self, config: Qwen2MoeConfig):
--++#         super().__init__()
--++#         self.num_experts = config.num_experts
--++#         self.top_k = config.num_experts_per_tok
--++#         self.norm_topk_prob = config.norm_topk_prob
--++
--++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++#         self.experts = nn.ModuleList(
--++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++#         )
--++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++
--++#     # --- 速度优先模式的辅助函数 ---
--++#     @no_grad()
--++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++#         original_dtype = hidden_states.dtype
--++#         batch_size, _ = hidden_states.shape
--++#         expert_outputs_list = [
--++#             ops.cat([
--++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++#             ], dim=0) 
--++#             for i in range(batch_size)
--++#         ]
--++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++#         weights_fp32 = routing_weights.to(mindspore.float32)
--++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++#         return moe_output_fp32.squeeze(1).to(original_dtype)
--++
--++#     @no_grad()
--++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens, _ = hidden_states.shape
--++#         flat_selected_experts = selected_experts.flatten()
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++#         active_experts = ops.unique(flat_selected_experts)
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++#             selected_token_indices = token_indices[mask]
--++#             selected_routing_weights = routing_weights.flatten()[mask]
--++#             current_states = hidden_states[selected_token_indices]
--++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--++#         return moe_output
--++
--++#     # --- 精度优先模式的辅助函数 ---
--++#     @no_grad()
--++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++#         moe_output = ops.zeros_like(hidden_states)
--++#         num_tokens, _ = hidden_states.shape
--++#         flat_selected_experts = selected_experts.flatten()
--++#         flat_routing_weights = routing_weights.flatten()
--++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++#         active_experts = ops.unique(flat_selected_experts)
--++#         for expert_idx_tensor in active_experts:
--++#             expert_idx = expert_idx_tensor.item()
--++#             expert_layer = self.experts[expert_idx]
--++#             mask = (flat_selected_experts == expert_idx_tensor)
--++#             current_token_indices = token_indices[mask]
--++#             current_routing_weights = flat_routing_weights[mask]
--++#             current_hidden_states = hidden_states[current_token_indices]
--++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++#         return moe_output
--++
--++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++#         # 声明我们将要使用一个在模块外部定义的全局变量
--++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--++#         global Long_Prompt
--++
--++#         # 1. 门控计算 (所有模式通用)
--++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++#         router_logits = self.gate(hidden_states_reshaped)
--++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++#         if self.norm_topk_prob:
--++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++#         moe_output = None
--++#         if not self.training:
--++#             # 根据 Long_Prompt 标志选择模式
--++#             if Long_Prompt:
--++#                 # --- 精度优先模式 ---
--++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++#             else:
--++#                 # --- 速度优先模式 ---
--++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++#                 if sequence_length == 1:
--++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++#                 else:
--++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++#         else:
--++#             raise NotImplementedError("Training path is not implemented.")
--++
--++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++        
--++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++        
--++#         return final_hidden_states, router_logits
--++    
--++class Qwen2MoeSparseMoeBlock(nn.Module):
--++    """
--++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--++    控制的顶级推理策略：
--+ 
--++    - if Long_Prompt is True:  【精度优先模式】
--++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--++      适用于需要严格可复现性的长序列任务。
--+ 
--+-class Qwen2MoeSparseMoeBlock(nn.Module):
--+-    def __init__(self, config):
--++    - if Long_Prompt is False: 【速度优先模式】
--++      采用业界最强的性能组合：
--++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--++    """
--++    def __init__(self, config: Qwen2MoeConfig):
--+         super().__init__()
--+         self.num_experts = config.num_experts
--+         self.top_k = config.num_experts_per_tok
--+         self.norm_topk_prob = config.norm_topk_prob
--+ 
--+-        # gating
--+         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+         self.experts = nn.ModuleList(
--+             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+         )
--+-
--+         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+ 
--+-    #@dwj
--+-    # 只遍历激活的专家，而非全部专家
--+-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-            num_tokens = hidden_states_reshaped.shape[0]
--+-
--+-            router_logits = self.gate(hidden_states_reshaped)
--+-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-
--+-            if self.norm_topk_prob:
--+-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-            routing_weights = routing_weights.to(hidden_states.dtype)
--+-            
--+-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+-            flat_selected_experts = selected_experts.flatten()
--+-            
--+-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+-            token_indices = broadcasted_token_indices.flatten()
--+-            
--+-            active_experts = ops.unique(flat_selected_experts)
--+-            
--+-            for expert_idx_tensor in active_experts:
--+-                expert_idx = expert_idx_tensor.item()
--+-                expert_layer = self.experts[expert_idx]
--+-                
--+-                mask = (flat_selected_experts == expert_idx_tensor)
--+-                selected_token_indices = token_indices[mask]
--+-                selected_routing_weights = routing_weights.flatten()[mask]
--+-                
--+-                current_states = hidden_states_reshaped[selected_token_indices]
--+-                
--+-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-                
--+-                final_hidden_states = final_hidden_states.index_add(
--+-                    dim=0,
--+-                    index=selected_token_indices,
--+-                    source=expert_output.to(hidden_states.dtype)
--+-                )
--+-            
--+-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--++    @no_grad()
--++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++        original_dtype = hidden_states.dtype
--++        batch_size, _ = hidden_states.shape
--++        expert_outputs_list = [
--++            ops.cat([
--++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++            ], dim=0)
--++            for i in range(batch_size)
--++        ]
--++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++        weights_fp32 = routing_weights.to(mindspore.float32)
--++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++        return moe_output_fp32.squeeze(1).to(original_dtype)
--++
--++    @no_grad()
--++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++        num_tokens, _ = hidden_states.shape
--++        flat_selected_experts = selected_experts.flatten()
--++        sorted_expert_indices = flat_selected_experts.argsort()
--++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++        original_token_indices = sorted_expert_indices // self.top_k
--++        moe_output = ops.zeros_like(hidden_states)
--++        current_token_offset = 0
--++        for i in range(self.num_experts):
--++            expert_token_count = tokens_per_expert[i] - current_token_offset
--++            if expert_token_count == 0:
--++                continue
--++            end_offset = current_token_offset + expert_token_count
--++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++            expert_hidden_states = hidden_states[expert_original_token_indices]
--++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++            current_token_offset += expert_token_count
--++        return moe_output
--++
--++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--++    @no_grad()
--++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++        moe_output = ops.zeros_like(hidden_states)
--++        num_tokens, _ = hidden_states.shape
--++        flat_selected_experts = selected_experts.flatten()
--++        flat_routing_weights = routing_weights.flatten()
--++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++        active_experts = ops.unique(flat_selected_experts)
--++        for expert_idx_tensor in active_experts:
--++            expert_idx = expert_idx_tensor.item()
--++            expert_layer = self.experts[expert_idx]
--++            mask = (flat_selected_experts == expert_idx_tensor)
--++            current_token_indices = token_indices[mask]
--++            current_routing_weights = flat_routing_weights[mask]
--++            current_hidden_states = hidden_states[current_token_indices]
--++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++        return moe_output
--+ 
--+-            final_hidden_states = final_hidden_states + shared_expert_output
--+-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+-            
--+-            return final_hidden_states, router_logits
--++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++        global Long_Prompt
--++
--++        # 1. 门控计算 (所有模式通用)
--++        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++        router_logits = self.gate(hidden_states_reshaped)
--++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++        if self.norm_topk_prob:
--++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++        
--++        moe_output = None
--++        if Long_Prompt:
--++            # --- 精度优先模式 (ACCURACY MODE) ---
--++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        else:
--++            # --- 速度优先模式 (SPEED MODE) ---
--++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++            if sequence_length == 1:
--++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++            else:
--++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        
--+ 
--++        # 3. 共享专家计算与合并 (所有模式通用)
--++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++        
--++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++        
--++        return final_hidden_states, router_logits
--+ 
--+ class Qwen2MoeDecoderLayer(nn.Module):
--+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+         super().__init__()
--+         self.hidden_size = config.hidden_size
--++        
--++        # if Long_Prompt:
--++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++        # else:
--++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+ 
--+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+ 
--+-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+-
--+         if (layer_idx not in config.mlp_only_layers) and (
--+             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+         ):
--+@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+             self._warmed_up = True
--+             self.warmup_moe_model()
--+ 
--++
--++
--+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+         output_router_logits = (
--+             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--+@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+             router_logits=outputs.router_logits,
--+         )
--+ 
--++    def generate(self, *args, **kwargs):
--++        """
--++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--++        """
--++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++
--++        input_ids = kwargs.get("input_ids")
--++        if input_ids is None and args:
--++            input_ids = args[0]
--++
--++        if input_ids is not None:
--++            prompt_length = input_ids.shape[1]
--++            
--++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--++                Long_Prompt = True
--++            else:
--++                Long_Prompt = False
--++
--++        return super().generate(*args, **kwargs)
--++    
--+     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--+     def prepare_inputs_for_generation(
--+         self,
--+@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--+         # Exception 1: when passing input_embeds, input_ids may be missing entries
--+         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--++        
--+         if past_key_values is not None:
--+             if inputs_embeds is not None:  # Exception 1
--+                 if 0 not in input_ids.shape:
--+@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+             }
--+         )
--+         return model_inputs
--++
--+ # @lwx
--+     # def _decode_one_tokens_logits(
--+     #     self,
--+@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--+             attentions=outputs.attentions,
--+         )
--+ 
--++
--+ __all__ = [
--+     "Qwen2MoeForCausalLM",
--+     "Qwen2MoeModel",
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+new file mode 100644
--+index 00000000..6dfb5b93
--+--- /dev/null
--++++ b/patches/0001-20251104commit.patch
--+@@ -0,0 +1,1272 @@
--++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Tue, 4 Nov 2025 09:11:51 +0800
--++Subject: [PATCH] 20251104commit
--++
--++---
--++ mindnlp/transformers/cache_utils.py           |  28 +-
--++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++ 3 files changed, 976 insertions(+), 87 deletions(-)
--++
--++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++index cadd2e04..02f8d4be 100644
--++--- a/mindnlp/transformers/cache_utils.py
--+++++ b/mindnlp/transformers/cache_utils.py
--++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++             # k_out[:, :, cache_position] = key_states
--++             # v_out[:, :, cache_position] = value_states
--++-            if ON_ORANGE_PI:
--++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++-            else:
--++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++-
--+++            # if ON_ORANGE_PI:
--+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++            # else:
--+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++            # 确保 cache_position 是 1D tensor 并且类型正确
--+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++            if cache_position.ndim > 1:
--+++                cache_position = cache_position.flatten()
--+++            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++                cache_position = cache_position.int()
--+++            
--+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++            k_out[:, :, cache_position] = key_states
--+++            v_out[:, :, cache_position] = value_states
--+++            
--++         return k_out, v_out
--++ 
--++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index c695b944..d8303e45 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++-    x1 = x[..., : x.shape[-1] // 2]
--++-    x2 = x[..., x.shape[-1] // 2 :]
--+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++    # x1 = x[..., : x.shape[-1] // 2]
--+++    # x2 = x[..., x.shape[-1] // 2 :]
--+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++         if self.training:
--++             raise NotImplementedError("Training is not supported yet.")
--++         else:
--++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++-        if self.config.n_shared_experts is not None:
--++-            y = y + self.shared_experts(identity)
--++-        return y
--+++            # @lwx
--+++            if orig_shape[1] == 1:
--+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++                y=y.view(*orig_shape)
--+++                if self.config.n_shared_experts is not None:
--+++                    y = y + self.shared_experts(identity)
--+++                return y
--+++            else:
--+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++                if self.config.n_shared_experts is not None:
--+++                    y = y + self.shared_experts(identity)
--+++                return y
--+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++        # if self.config.n_shared_experts is not None:
--+++        #     y = y + self.shared_experts(identity)
--+++        # return y
--+++        
--+++    @no_grad()
--+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++
--+++        expert_cache = ops.zeros_like(x)
--+++        for i in range(self.num_experts_per_tok):
--+++            expert_id = flat_expert_indices[i].item()
--+++            weight = flat_expert_weights[i].item()
--+++            expert = self.experts[expert_id]
--+++            expert_out = expert(x)
--+++            expert_cache += expert_out * weight
--+++        return expert_cache
--++ 
--++     @no_grad()
--++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-        # expert_cache = torch.zeros_like(x)
--++-        # idxs = flat_expert_indices.argsort()
--++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++-        # token_idxs = idxs // self.num_experts_per_tok
--++-        # for i, end_idx in enumerate(tokens_per_expert):
--++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++-        #     if start_idx == end_idx:
--++-        #         continue
--++-        #     expert = self.experts[i]
--++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++-        #     expert_tokens = x[exp_token_idx]
--++-        #     expert_out = expert(expert_tokens)
--++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++-        # return expert_cache
--+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++         expert_cache = ops.zeros_like(x)
--++         idxs = flat_expert_indices.argsort()
--++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++         token_idxs = idxs // self.num_experts_per_tok
--+++
--++         for i, end_idx in enumerate(tokens_per_expert):
--++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++             if start_idx == end_idx:
--++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++             expert_out = expert(expert_tokens)
--++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++
--++         return expert_cache
--+++        
--+++    # @no_grad()
--+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     # expert_cache = torch.zeros_like(x)
--+++    #     # idxs = flat_expert_indices.argsort()
--+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++    #     # token_idxs = idxs // self.num_experts_per_tok
--+++    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++    #     #     if start_idx == end_idx:
--+++    #     #         continue
--+++    #     #     expert = self.experts[i]
--+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #     #     expert_tokens = x[exp_token_idx]
--+++    #     #     expert_out = expert(expert_tokens)
--+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++    #     # return expert_cache
--+++    #     expert_cache = ops.zeros_like(x)
--+++    #     idxs = flat_expert_indices.argsort()
--+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++    #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++    #     for i, end_idx in enumerate(tokens_per_expert):
--+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++    #         if start_idx == end_idx:
--+++    #             continue
--+++    #         expert = self.experts[i]
--+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #         expert_tokens = x[exp_token_idx]
--+++    #         expert_out = expert(expert_tokens)
--+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++
--+++    #     return expert_cache
--+++    # @no_grad()
--+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     expert_cache = ops.zeros_like(x)
--+++
--+++    #     # 排序保证顺序一致
--+++    #     idxs = flat_expert_indices.argsort()
--+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++    #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++    #     # 找出有 token 的专家
--+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++
--+++    #     for i in active_experts.tolist():
--+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++    #         end_idx = tokens_per_expert[i]
--+++    #         if start_idx == end_idx:  # 没有 token
--+++    #             continue
--+++
--+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #         expert_tokens = x[exp_token_idx]
--+++    #         expert_out = self.experts[i](expert_tokens)
--+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++
--+++    #         expert_cache = mindspore.mint.scatter_add(
--+++    #             expert_cache,
--+++    #             0,
--+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++    #             expert_out
--+++    #         )
--+++
--+++    #     return expert_cache
--+++
--+++
--++ 
--++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++ #     """
--++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++ 
--++         # Initialize weights and apply final processing
--++         self.post_init()
--+++        self.warm_up = False
--+++
--+++    def warmup_moe_model_deep(self):
--+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++        test_texts = [
--+++            "warmup short",
--+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++        ]
--+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++        if tokenizer is None:
--+++            from mindnlp.transformers import AutoTokenizer
--+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++            self._warmup_tokenizer = tokenizer
--+++
--+++        for text in test_texts:
--+++            inputs = tokenizer(text, return_tensors="ms")
--+++            with mindspore._no_grad():
--+++                _ = self(**inputs, use_cache=False)
--+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++ 
--++     def get_input_embeddings(self):
--++         return self.model.embed_tokens
--++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++         ```"""
--+++        if not self.warm_up:
--+++            self.warm_up = True
--+++            self.warmup_moe_model_deep()
--+++
--++         output_attentions = (
--++             output_attentions
--++             if output_attentions is not None
--++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++index 3cbf820e..d4c6b651 100644
--++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++@@ -18,7 +18,6 @@
--++ # See the License for the specific language governing permissions and
--++ # limitations under the License.
--++ """MindSpore Qwen2MoE model."""
--++-
--++ import math
--++ from typing import List, Optional, Tuple, Union
--++ 
--++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++     TokenClassifierOutput,
--++ )
--++ from ...modeling_utils import PreTrainedModel
--+++from ...generation import GenerationMixin
--++ from ....utils import logging
--++ from .configuration_qwen2_moe import Qwen2MoeConfig
--++ 
--++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++         self.variance_epsilon = eps
--++ 
--++     def forward(self, hidden_states):
--+++        # @dwj
--+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++        # @lwx
--+++        # if not self.training :
--+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++         input_dtype = hidden_states.dtype
--++         hidden_states = hidden_states.to(mindspore.float32)
--++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++@@ -234,6 +239,8 @@ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++     x1 = x[..., : x.shape[-1] // 2]
--++     x2 = x[..., x.shape[-1] // 2 :]
--+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++         self.config = config
--++         self.hidden_size = config.hidden_size
--++         self.intermediate_size = intermediate_size
--+++        
--++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++         self.act_fn = ACT2FN[config.hidden_act]
--++ 
--++     def forward(self, x):
--++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++-
--++ 
--+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++        # @lwx 
--+++        # gate_up_output = self.gate_up_proj(x)
--+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++        # return self.down_proj(swiglu_output)
--+++
--+++    # def forward(self, x):
--+++    #     gate_proj_out = self.gate_proj(x)
--+++    #     up_proj_out = self.up_proj(x)
--+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++    #     return self.down_proj(swiglu_out)
--+++        
--++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++     """
--++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++         use_cache: bool = False,
--++         cache_position: Optional[mindspore.Tensor] = None,
--++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++        
--+++
--++         bsz, q_len, _ = hidden_states.shape
--++ 
--++         query_states = self.q_proj(hidden_states)
--++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++                     "with a layer index."
--++                 )
--++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++            if isinstance(past_key_value, StaticCache):
--+++                kv_seq_len = key_states.shape[-2]
--+++            else:
--+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++ 
--++         if past_key_value is not None:
--++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++            
--+++            if isinstance(past_key_value, StaticCache):
--+++                kv_seq_len = key_states.shape[-2]
--++ 
--++         # repeat k/v heads if n_kv_heads < n_heads
--++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++-
--+++        
--++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++ 
--++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++-            raise ValueError(
--++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++-                f" {attn_weights.shape}"
--++-            )
--++-
--++-        if attention_mask is not None:  # no matter the length, we just slice it
--++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++        if attention_mask is not None:
--+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++             attn_weights = attn_weights + causal_mask
--++ 
--++         # upcast attention to fp32
--++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++ 
--++         attn_output = self.o_proj(attn_output)
--++-
--+++        # @lwx
--+++        
--+++        # max_seq_len = self.max_position_embeddings  # 2048
--+++
--+++        # if attention_mask is not None:
--+++        #     # attention_mask: [B, 1, Sq, Sk]
--+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++
--+++        #     # pad 到 [max_seq_len, max_seq_len]
--+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++        #     global_attention_mask = padded_mask
--+++        # else:
--+++        #     global_attention_mask = None
--+++
--+++
--+++        # sparse_mode=3
--+++        # attn_output = mindspore.ops.flash_attention_score(
--+++        #     query=query_states,
--+++        #     key=key_states,
--+++        #     value=value_states,
--+++        #     real_shift=None,
--+++        #     padding_mask=None,
--+++
--+++        #     head_num=self.num_heads,
--+++        #     attn_mask=global_attention_mask,
--+++        #     keep_prob=1.0 - self.attention_dropout,
--+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++        #     input_layout="BNSD",
--+++        #     pre_tokens=2147483647,
--+++        #     next_tokens=2147483647,
--+++        #     inner_precise=0,
--+++        #     drop_mask=None,
--+++        #     prefix=None,
--+++        #     actual_seq_qlen=None,
--+++        #     actual_seq_kvlen=None,
--+++        #     sparse_mode=sparse_mode,
--+++        # )
--++         if not output_attentions:
--++             attn_weights = None
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--++ 
--+++class Qwen2MoeFlashAttention(nn.Module):
--+++    """
--+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++
--+++    关键改动:
--+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++       直接传入原始的 key 和 value 张量效率更高。
--+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++    """
--+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++        super().__init__()
--+++        self.config = config
--+++        self.layer_idx = layer_idx
--+++        self.hidden_size = config.hidden_size
--+++        self.num_heads = config.num_attention_heads
--+++        self.head_dim = self.hidden_size // self.num_heads
--+++        self.num_key_value_heads = config.num_key_value_heads
--+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++        self.max_position_embeddings = config.max_position_embeddings
--+++        self.rope_theta = config.rope_theta
--+++        self.attention_dropout = config.attention_dropout
--+++
--+++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++            raise ValueError(
--+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++            )
--+++
--+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++
--+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++            self.head_dim,
--+++            max_position_embeddings=self.max_position_embeddings,
--+++            base=self.rope_theta,
--+++        )
--+++
--+++    def forward(
--+++        self,
--+++        hidden_states: mindspore.Tensor,
--+++        attention_mask: Optional[mindspore.Tensor] = None,
--+++        position_ids: Optional[mindspore.Tensor] = None,
--+++        past_key_value: Optional[Cache] = None,
--+++        output_attentions: bool = False,
--+++        use_cache: bool = False,
--+++        cache_position: Optional[mindspore.Tensor] = None,
--+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++        bsz, q_len, _ = hidden_states.shape
--+++
--+++        # 1. 线性投射 Q, K, V
--+++        query_states = self.q_proj(hidden_states)
--+++        key_states = self.k_proj(hidden_states)
--+++        value_states = self.v_proj(hidden_states)
--+++
--+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++        # 3. RoPE 旋转位置编码
--+++        kv_seq_len = key_states.shape[-2]
--+++        if past_key_value is not None:
--+++            if self.layer_idx is None:
--+++                raise ValueError(
--+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++                    "with a layer index."
--+++                )
--+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++                if cache_position.shape[0] == 1:
--+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++                    kv_seq_len = past_seen_tokens + 1
--+++                else:
--+++                    # prefill 阶段：cache_position 是范围，使用其长度
--+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++            else:
--+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++        # 4. KV 缓存更新
--+++        if past_key_value is not None:
--+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++            key_states, value_states = past_key_value.update(
--+++                key_states, value_states, self.layer_idx, cache_kwargs
--+++            )
--+++            
--+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++                if cache_position.shape[0] == 1:
--+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++                    kv_seq_len = key_states.shape[-2]
--+++
--+++        # 5. [重要] 准备 Attention Mask
--+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++        fa_attention_mask = None
--+++        if attention_mask is not None:
--+++            # 截取与当前key长度匹配的部分
--+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++            fa_attention_mask = (mask_slice != 0)
--+++
--+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++        input_dtype = query_states.dtype
--+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++            query_states = query_states.to(mindspore.float16)
--+++            key_states = key_states.to(mindspore.float16)
--+++            value_states = value_states.to(mindspore.float16)
--+++
--+++        # 6. [核心] 调用 flash_attention_score 算子
--+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++        attn_output = mindspore.ops.flash_attention_score(
--+++            query=query_states,
--+++            key=key_states,
--+++            value=value_states,
--+++            head_num=self.num_heads, # 传入Q的头数(N1)
--+++            attn_mask=fa_attention_mask,
--+++            keep_prob=1.0 - self.attention_dropout,
--+++            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++            input_layout="BNSD",
--+++            sparse_mode=0 # 使用 defaultMask 模式
--+++        )
--+++
--+++        # 恢复原始数据类型
--+++        attn_output = attn_output.to(input_dtype)
--+++
--+++        # 7. 调整输出形状
--+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++        attn_output = self.o_proj(attn_output)
--+++
--+++        # FlashAttention 算子不直接返回注意力权重矩阵
--+++        attn_weights = None
--+++        if output_attentions:
--+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++        return attn_output, attn_weights, past_key_value
--+++
--+++    # def forward(
--+++    #     self,
--+++    #     hidden_states: mindspore.Tensor,
--+++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++    #     past_key_value: Optional[Cache] = None,
--+++    #     output_attentions: bool = False,
--+++    #     use_cache: bool = False,
--+++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++    #     bsz, q_len, _ = hidden_states.shape
--+++
--+++    #     # 1. 线性投射 Q, K, V
--+++    #     query_states = self.q_proj(hidden_states)
--+++    #     key_states = self.k_proj(hidden_states)
--+++    #     value_states = self.v_proj(hidden_states)
--+++
--+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++    #     # 3. RoPE 旋转位置编码
--+++    #     kv_seq_len = key_states.shape[-2]
--+++    #     if past_key_value is not None:
--+++    #         if self.layer_idx is None:
--+++    #             raise ValueError(
--+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++    #                 "with a layer index."
--+++    #             )
--+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++    #     # 4. KV 缓存更新
--+++    #     if past_key_value is not None:
--+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++    #         key_states, value_states = past_key_value.update(
--+++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++    #         )
--+++
--+++    #     # 5. 准备 Attention Mask
--+++    #     fa_attention_mask = None
--+++    #     if attention_mask is not None:
--+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++    #         fa_attention_mask = (mask_slice != 0)
--+++
--+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++    #     input_dtype = query_states.dtype
--+++
--+++    #     # 6. [核心] 调用 flash_attention_score 算子
--+++    #     attn_output = mindspore.ops.flash_attention_score(
--+++    #         query=query_states,
--+++    #         key=key_states,
--+++    #         value=value_states,
--+++    #         head_num=self.num_heads,
--+++    #         attn_mask=fa_attention_mask,
--+++    #         keep_prob=1.0 - self.attention_dropout,
--+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++    #         input_layout="BNSD",
--+++    #         sparse_mode=0,
--+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++    #         inner_precise=1
--+++    #     )
--+++
--+++    #     # 恢复原始数据类型
--+++    #     attn_output = attn_output.to(input_dtype)
--+++
--+++    #     # 7. 调整输出形状
--+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++    #     attn_output = self.o_proj(attn_output)
--+++
--+++    #     attn_weights = None
--+++    #     if output_attentions:
--+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++    #     return attn_output, attn_weights, past_key_value
--+++
--+++    # def forward(
--+++    #     self,
--+++    #     hidden_states: mindspore.Tensor,
--+++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++    #     past_key_value: Optional[Cache] = None,
--+++    #     output_attentions: bool = False,
--+++    #     use_cache: bool = False,
--+++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++    #     bsz, q_len, _ = hidden_states.shape
--+++
--+++    #     query_states = self.q_proj(hidden_states)
--+++    #     key_states = self.k_proj(hidden_states)
--+++    #     value_states = self.v_proj(hidden_states)
--+++
--+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++    #     kv_seq_len = key_states.shape[-2]
--+++    #     if past_key_value is not None:
--+++    #         if self.layer_idx is None:
--+++    #             raise ValueError("`layer_idx` must be specified for caching")
--+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++    #     if past_key_value is not None:
--+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++    #         key_states, value_states = past_key_value.update(
--+++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++    #         )
--+++
--+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++
--+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++    #     query_states = query_states / math.sqrt(self.head_dim)
--+++    #     # <--- 修改结束 ---
--+++
--+++    #     fa_attention_mask = None
--+++    #     if attention_mask is not None:
--+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++    #         fa_attention_mask = (mask_slice != 0)
--+++
--+++    #     input_dtype = query_states.dtype
--+++
--+++    #     attn_output = mindspore.ops.flash_attention_score(
--+++    #         query=query_states,    # 传入已经预先缩放过的 query
--+++    #         key=key_states,
--+++    #         value=value_states,
--+++    #         head_num=self.num_heads,
--+++    #         attn_mask=fa_attention_mask,
--+++    #         keep_prob=1.0 - self.attention_dropout,
--+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++    #         input_layout="BNSD",
--+++    #         sparse_mode=0,
--+++    #         inner_precise=1        # 仍然保持内部高精度计算
--+++    #     )
--+++
--+++    #     attn_output = attn_output.to(input_dtype)
--+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++    #     attn_output = self.o_proj(attn_output)
--+++
--+++    #     attn_weights = None
--+++    #     if output_attentions:
--+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++
--+++    #     return attn_output, attn_weights, past_key_value
--+++
--++ QWEN2MOE_ATTENTION_CLASSES = {
--++     "eager": Qwen2MoeAttention,
--+++    "flash-attention": Qwen2MoeFlashAttention,
--++ }
--++ 
--++ 
--++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++ 
--+++    #@dwj
--+++    # 只遍历激活的专家，而非全部专家
--++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        hidden_states = hidden_states.view(-1, hidden_dim)
--++-        # router_logits: (batch * sequence_length, n_experts)
--++-        router_logits = self.gate(hidden_states)
--++-
--++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-        if self.norm_topk_prob:
--++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        # we cast back to the input dtype
--++-        routing_weights = routing_weights.to(hidden_states.dtype)
--++-
--++-        final_hidden_states = ops.zeros(
--++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++-        )
--++-
--++-        # One hot encode the selected experts to create an expert mask
--++-        # this will be used to easily index which expert is going to be sollicitated
--++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++-
--++-        # Loop over all available experts in the model and perform the computation on each expert
--++-        for expert_idx in range(self.num_experts):
--++-            expert_layer = self.experts[expert_idx]
--++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++-
--++-            # Index the correct hidden states and compute the expert hidden state for
--++-            # the current expert. We need to make sure to multiply the output hidden
--++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++-            if 0 not in idx.shape:
--++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++-
--++-                # However `index_add_` only support torch tensors for indexing so we'll use
--++-                # the `top_x` tensor here.
--++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++-
--++-        shared_expert_output = self.shared_expert(hidden_states)
--++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++-
--++-        final_hidden_states = final_hidden_states + shared_expert_output
--+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++            num_tokens = hidden_states_reshaped.shape[0]
--+++
--+++            router_logits = self.gate(hidden_states_reshaped)
--+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++
--+++            if self.norm_topk_prob:
--+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++            routing_weights = routing_weights.to(hidden_states.dtype)
--+++            
--+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++            flat_selected_experts = selected_experts.flatten()
--+++            
--+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++            token_indices = broadcasted_token_indices.flatten()
--+++            
--+++            active_experts = ops.unique(flat_selected_experts)
--+++            
--+++            for expert_idx_tensor in active_experts:
--+++                expert_idx = expert_idx_tensor.item()
--+++                expert_layer = self.experts[expert_idx]
--+++                
--+++                mask = (flat_selected_experts == expert_idx_tensor)
--+++                selected_token_indices = token_indices[mask]
--+++                selected_routing_weights = routing_weights.flatten()[mask]
--+++                
--+++                current_states = hidden_states_reshaped[selected_token_indices]
--+++                
--+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++                
--+++                final_hidden_states = final_hidden_states.index_add(
--+++                    dim=0,
--+++                    index=selected_token_indices,
--+++                    source=expert_output.to(hidden_states.dtype)
--+++                )
--+++            
--+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++ 
--++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++-        return final_hidden_states, router_logits
--+++            final_hidden_states = final_hidden_states + shared_expert_output
--+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++            
--+++            return final_hidden_states, router_logits
--++ 
--++ 
--++ class Qwen2MoeDecoderLayer(nn.Module):
--++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++ 
--++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++ 
--+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++
--++         if (layer_idx not in config.mlp_only_layers) and (
--++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++         ):
--++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++     _skip_keys_device_placement = "past_key_values"
--++     _supports_cache_class = True
--+++#lwx
--+++    # _supports_static_cache = True
--++ 
--++     def _init_weights(self, module):
--++         std = self.config.initializer_range
--++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++         return causal_mask
--++ 
--++ 
--++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++     _tied_weights_keys = ["lm_head.weight"]
--++ 
--++     def __init__(self, config):
--++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++         self.num_experts_per_tok = config.num_experts_per_tok
--++         # Initialize weights and apply final processing
--++         self.post_init()
--+++        # @lwx
--+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++        #     self.generation_config.cache_implementation = "static"
--+++        self._warmed_up = False
--+++
--+++    def warmup_moe_model(self):
--+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++        test_texts = [
--+++            "warmup short",
--+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++        ]
--+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++        if tokenizer is None:
--+++            from mindnlp.transformers import AutoTokenizer
--+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++            self._warmup_tokenizer = tokenizer
--+++
--+++        for text in test_texts:
--+++            inputs = tokenizer(text, return_tensors="ms")
--+++            with mindspore._no_grad():
--+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++ 
--++     def get_input_embeddings(self):
--++         return self.model.embed_tokens
--++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++         ```"""
--+++        if not self._warmed_up:
--+++            self._warmed_up = True
--+++            self.warmup_moe_model()
--++ 
--++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++         output_router_logits = (
--++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++             }
--++         )
--++         return model_inputs
--+++# @lwx
--+++    # def _decode_one_tokens_logits(
--+++    #     self,
--+++    #     cur_token: mindspore.Tensor,
--+++    #     input_pos: Optional[mindspore.Tensor],
--+++    #     cache_position: mindspore.Tensor,
--+++    #     past_key_values: StaticCache,
--+++    # ) -> mindspore.Tensor:
--+++    #     """
--+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++        
--+++    #     Args:
--+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++    #         input_pos: 输入位置信息，可选
--+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++            
--+++    #     Returns:
--+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++    #     """
--+++    #     # 调用JIT编译的版本
--+++    #     return self.get_decode_one_tokens_logits(
--+++    #         cur_token=cur_token,
--+++    #         input_pos=input_pos,
--+++    #         cache_position=cache_position,
--+++    #         past_key_values=past_key_values,
--+++    #     )
--+++    
--+++    # @mindspore.jit(jit_level='O1')
--+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++    #     """
--+++    #     JIT编译的函数，用于高效的单token解码
--+++    #     使用JIT编译优化以支持静态shape和高效执行
--+++        
--+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++    #     """
--+++    #     outputs = self.model.forward(
--+++    #         input_ids=cur_token,
--+++    #         position_ids=input_pos,
--+++    #         cache_position=cache_position,
--+++    #         past_key_values=past_key_values,
--+++    #         use_cache=True,
--+++    #         return_dict=False,
--+++    #     )
--+++        
--+++    #     hidden_states = outputs[0]
--+++    #     logits = self.lm_head.forward(hidden_states)
--+++    #     logits = logits.float()
--+++        
--+++    #     return logits[:, -1, :]
--+++
--+++    # def _sample(
--+++    #     self,
--+++    #     input_ids: mindspore.Tensor,
--+++    #     logits_processor,
--+++    #     stopping_criteria,
--+++    #     generation_config,
--+++    #     synced_devices: bool,
--+++    #     streamer=None,
--+++    #     logits_warper=None,
--+++    #     **model_kwargs,
--+++    # ):
--+++    #     """
--+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++    #     """
--+++    #     from ...generation.logits_process import LogitsProcessorList
--+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++    #     from mindnlp.core import nn, ops, no_grad
--+++    #     import numpy as np
--+++        
--+++    #     # 检查是否使用 StaticCache
--+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++    #     # 否则，直接调用父类方法
--+++    #     past_key_values = model_kwargs.get("past_key_values")
--+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++
--+++    #     if not isinstance(past_key_values, StaticCache):
--+++    #         # 不使用 StaticCache，直接调用父类方法
--+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++    #         return super()._sample(
--+++    #             input_ids=input_ids,
--+++    #             logits_processor=logits_processor,
--+++    #             stopping_criteria=stopping_criteria,
--+++    #             generation_config=generation_config,
--+++    #             synced_devices=synced_devices,
--+++    #             streamer=streamer,
--+++    #             logits_warper=logits_warper,
--+++    #             **model_kwargs,
--+++    #         )
--+++        
--+++    #     # 使用 StaticCache，进入自定义循环
--+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++    #     pad_token_id = generation_config._pad_token_tensor
--+++    #     output_attentions = generation_config.output_attentions
--+++    #     output_hidden_states = generation_config.output_hidden_states
--+++    #     output_scores = generation_config.output_scores
--+++    #     output_logits = generation_config.output_logits
--+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++    #     max_length = generation_config.max_length
--+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++    #     do_sample = generation_config.do_sample
--+++        
--+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++    #         raise ValueError(
--+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++    #             f"{logits_warper})."
--+++    #         )
--+++        
--+++    #     # init attention / hidden states / scores tuples
--+++    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++        
--+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++    #         encoder_hidden_states = (
--+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++    #         )
--+++        
--+++    #     # keep track of which sequences are already finished
--+++    #     batch_size, cur_len = input_ids.shape
--+++    #     this_peer_finished = False
--+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++        
--+++    #     time_record = []
--+++    #     from ....utils.testing_utils import parse_flag_from_env
--+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++        
--+++    #     while self._has_unfinished_sequences(
--+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++    #     ):
--+++    #         if _record_time:
--+++    #             import time as time_module
--+++    #             infer_start = time_module.time()
--+++            
--+++    #         # prepare model inputs
--+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++            
--+++    #         # prepare variable output controls
--+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++            
--+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++    #         cur_cache_position = model_inputs.get("cache_position")
--+++    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++    #         cur_input_ids = model_inputs.get("input_ids")
--+++            
--+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++    #             cur_cache_position is not None and 
--+++    #             len(cur_cache_position.shape) > 0 and
--+++    #             cur_cache_position.shape[0] == 1 and
--+++    #             cur_input_ids is not None and
--+++    #             cur_input_ids.shape[1] == 1):
--+++    #             # 使用 JIT 优化的单 token 解码
--+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++    #             if not hasattr(self, '_jit_used'):
--+++    #                 self._jit_used = False
--+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++                
--+++    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++    #                 cur_token=cur_input_ids,
--+++    #                 input_pos=model_inputs.get("position_ids"),
--+++    #                 cache_position=cur_cache_position,
--+++    #                 past_key_values=cur_past_key_values,
--+++    #             )
--+++                
--+++    #             # 标记已使用JIT（用于后续判断）
--+++    #             if not self._jit_used:
--+++    #                 self._jit_used = True
--+++                
--+++    #             # 构造兼容的输出对象
--+++    #             class JitOptimizedOutput:
--+++    #                 def __init__(self, logits, config):
--+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++    #                     self.config = config
--+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++    #                     self.cross_attentions = None
--+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++                
--+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++    #         else:
--+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++    #             outputs = self(**model_inputs, return_dict=True)
--+++            
--+++    #         if synced_devices and this_peer_finished:
--+++    #             continue
--+++            
--+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++    #         next_token_logits = outputs.logits[:, -1, :]
--+++            
--+++    #         # pre-process distribution
--+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++    #         if do_sample:
--+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++            
--+++    #         # Store scores, attentions and hidden_states when required
--+++    #         if return_dict_in_generate:
--+++    #             if output_scores:
--+++    #                 scores += (next_token_scores,)
--+++    #             if output_logits:
--+++    #                 raw_logits += (next_token_logits,)
--+++    #             if output_attentions:
--+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++    #                 if self.config.is_encoder_decoder:
--+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++                
--+++    #             if output_hidden_states:
--+++    #                 hidden = (
--+++    #                     outputs.decoder_hidden_states
--+++    #                     if self.config.is_encoder_decoder
--+++    #                     else outputs.hidden_states
--+++    #                 )
--+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++            
--+++    #         # token selection
--+++    #         if do_sample:
--+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++    #         else:
--+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++            
--+++    #         # finished sentences should have their next token be a padding token
--+++    #         if has_eos_stopping_criteria:
--+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++            
--+++    #         # update generated ids, model inputs, and length for next step
--+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++    #         if streamer is not None:
--+++    #             streamer.put(next_tokens)
--+++            
--+++    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++    #             outputs,
--+++    #             model_kwargs,
--+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++    #         )
--+++            
--+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++    #         cur_len += 1
--+++            
--+++    #         if _record_time:
--+++    #             import time as time_module
--+++    #             infer_stop = time_module.time()
--+++    #             time_record.append(infer_stop - infer_start)
--+++            
--+++    #         del outputs
--+++        
--+++    #     average_infer_time = None
--+++    #     if time_record:
--+++    #         if len(time_record) > 1:
--+++    #             time_record.pop(0)
--+++    #         average_infer_time = sum(time_record) / len(time_record)
--+++    #         print(f'average inference time is: {average_infer_time}')
--+++    #         print(f'inference time record: {time_record}')
--+++        
--+++    #     if streamer is not None:
--+++    #         streamer.end()
--+++        
--+++    #     # 简单判断：打印是否使用了JIT路径
--+++    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++    #     else:
--+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++        
--+++    #     if return_dict_in_generate:
--+++    #         if self.config.is_encoder_decoder:
--+++    #             return GenerateEncoderDecoderOutput(
--+++    #                 sequences=input_ids,
--+++    #                 scores=scores,
--+++    #                 logits=raw_logits,
--+++    #                 encoder_attentions=encoder_attentions,
--+++    #                 encoder_hidden_states=encoder_hidden_states,
--+++    #                 decoder_attentions=decoder_attentions,
--+++    #                 cross_attentions=cross_attentions,
--+++    #                 decoder_hidden_states=decoder_hidden_states,
--+++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++    #                 average_infer_time=average_infer_time
--+++    #             )
--+++    #         else:
--+++    #             return GenerateDecoderOnlyOutput(
--+++    #                 sequences=input_ids,
--+++    #                 scores=scores,
--+++    #                 logits=raw_logits,
--+++    #                 attentions=decoder_attentions,
--+++    #                 hidden_states=decoder_hidden_states,
--+++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++    #                 average_infer_time=average_infer_time
--+++    #             )
--+++    #     else:
--+++    #         return input_ids
--+++            
--+++    # def _prepare_cache_for_generation(
--+++    #     self,
--+++    #     generation_config,
--+++    #     model_kwargs,
--+++    #     assistant_model,
--+++    #     batch_size,
--+++    #     max_cache_length,
--+++    # ):
--+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++    #         generation_config.cache_implementation = "static"
--+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++        
--+++    #     if generation_config.cache_implementation == "static":
--+++    #         base_required_from_max_length = generation_config.max_length + 1
--+++    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++    #         min_cache_size = 50
--+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++    #         else:
--+++    #             max_cache_length = max(base_required, min_cache_size)
--+++            
--+++    #         original_max_cache_length = max_cache_length
--+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++            
--+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++    #             if max_cache_length > self.config.max_position_embeddings:
--+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++        
--+++    #     result = super()._prepare_cache_for_generation(
--+++    #         generation_config=generation_config,
--+++    #         model_kwargs=model_kwargs,
--+++    #         assistant_model=assistant_model,
--+++    #         batch_size=batch_size,
--+++    #         max_cache_length=max_cache_length,
--+++    #     )
--+++        
--+++    #     if generation_config.cache_implementation == "static":
--+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++    #         created_cache = model_kwargs.get(cache_name)
--+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++    #             if created_cache.max_cache_len < generation_config.max_length:
--+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++        
--+++    #     return result
--+++
--+++
--+++
--++ 
--++ 
--++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++-- 
--++2.27.0
--++
--+-- 
--+2.27.0
--+
--diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--new file mode 100644
--index 00000000..966529e4
----- /dev/null
--+++ b/patches/0003-20261106secondcommit.patch
--@@ -0,0 +1,2769 @@
--+From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Thu, 6 Nov 2025 14:54:37 +0800
--+Subject: [PATCH 3/3] 20261106secondcommit
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
--+ patches/0001-20251104commit.patch             | 1272 -----------------
--+ 3 files changed, 528 insertions(+), 2032 deletions(-)
--+ delete mode 100644 patches/0001-20251104commit.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index 73773c22..2f9192bf 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
--+ 
--+ _CONFIG_FOR_DOC = "DeepseekConfig"
--+ 
--++_attn_mask_cache = {}
--++
--++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--++    q_len = batch_and_seq[1]
--++    kv_len = batch_and_seq[1] + past_key_values_length 
--++    key = (batch_and_seq[0], q_len, kv_len)
--++
--++    if key in _attn_mask_cache:
--++        return _attn_mask_cache[key]
--++
--++    mask = _prepare_4d_causal_attention_mask(
--++        attention_mask,
--++        batch_and_seq,
--++        inputs_embeds,
--++        past_key_values_length,
--++    )
--++    _attn_mask_cache[key] = mask
--++    return mask
--+ 
--+ def _get_unpad_data(attention_mask):
--+     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--+@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--+         return final_output
--+ 
--+ 
--+-    @no_grad()
--+-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+-        expert_cache = ops.zeros_like(x)
--+-        idxs = flat_expert_indices.argsort()
--+-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+-        token_idxs = idxs // self.num_experts_per_tok
--+-
--+-        for i, end_idx in enumerate(tokens_per_expert):
--+-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+-            if start_idx == end_idx:
--+-                continue
--+-            expert = self.experts[i]
--+-            exp_token_idx = token_idxs[start_idx:end_idx]
--+-            expert_tokens = x[exp_token_idx]
--+-            expert_out = expert(expert_tokens)
--+-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+-
--+-        return expert_cache
--+-        
--+     # @no_grad()
--+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-    #     # expert_cache = torch.zeros_like(x)
--+-    #     # idxs = flat_expert_indices.argsort()
--+-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+-    #     # token_idxs = idxs // self.num_experts_per_tok
--+-    #     # for i, end_idx in enumerate(tokens_per_expert):
--+-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+-    #     #     if start_idx == end_idx:
--+-    #     #         continue
--+-    #     #     expert = self.experts[i]
--+-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+-    #     #     expert_tokens = x[exp_token_idx]
--+-    #     #     expert_out = expert(expert_tokens)
--+-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+-    #     # return expert_cache
--++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+     #     expert_cache = ops.zeros_like(x)
--+     #     idxs = flat_expert_indices.argsort()
--+     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--+     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+ 
--+     #     return expert_cache
--+-    # @no_grad()
--+-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-    #     expert_cache = ops.zeros_like(x)
--++        
--++    @no_grad()
--++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++        """
--++        优化版 MoE prefill：
--++        - 批量张量化处理同一个 expert 的所有 token
--++        - 跳过无 token 的专家
--++        - 保持结果完全一致
--++        """
--++        # 初始化输出缓存
--++        expert_cache = ops.zeros_like(x)
--+ 
--+-    #     # 排序保证顺序一致
--+-    #     idxs = flat_expert_indices.argsort()
--+-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+-    #     token_idxs = idxs // self.num_experts_per_tok
--++        # 排序（确保 scatter_add 位置对应原逻辑）
--++        idxs = flat_expert_indices.argsort()
--++        sorted_expert_indices = flat_expert_indices[idxs]
--++        sorted_token_indices = idxs // self.num_experts_per_tok
--+ 
--+-    #     # 找出有 token 的专家
--+-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++        # 每个 expert 的 token 数
--++        tokens_per_expert = sorted_expert_indices.bincount()
--+ 
--+-    #     for i in active_experts.tolist():
--+-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+-    #         end_idx = tokens_per_expert[i]
--+-    #         if start_idx == end_idx:  # 没有 token
--+-    #             continue
--++        # 找出有 token 的专家
--++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+ 
--+-    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+-    #         expert_tokens = x[exp_token_idx]
--+-    #         expert_out = self.experts[i](expert_tokens)
--+-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++        for expert_id in active_experts.tolist():
--++            # 取该 expert 对应的排序后 token 区间
--++            start = (tokens_per_expert[:expert_id]).sum().item()
--++            end = start + tokens_per_expert[expert_id].item()
--+ 
--+-    #         expert_cache = mindspore.mint.scatter_add(
--+-    #             expert_cache,
--+-    #             0,
--+-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+-    #             expert_out
--+-    #         )
--++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--++            expert_tokens = x[token_idx]                     # 取输入向量
--+ 
--+-    #     return expert_cache
--++            # 执行专家 MLP
--++            expert_out = self.experts[expert_id](expert_tokens)
--++
--++            # 按权重缩放
--++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--++
--++            # 回写到缓存（等价 scatter_add）
--++            expert_cache = mindspore.mint.scatter_add(
--++                expert_cache,
--++                0,
--++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++                scaled_out
--++            )
--++
--++        return expert_cache
--++
--++        # @no_grad()
--++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++        #     # expert_cache = torch.zeros_like(x)
--++        #     # idxs = flat_expert_indices.argsort()
--++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++        #     # token_idxs = idxs // self.num_experts_per_tok
--++        #     # for i, end_idx in enumerate(tokens_per_expert):
--++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++        #     #     if start_idx == end_idx:
--++        #     #         continue
--++        #     #     expert = self.experts[i]
--++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++        #     #     expert_tokens = x[exp_token_idx]
--++        #     #     expert_out = expert(expert_tokens)
--++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++        #     # return expert_cache
--++        #     expert_cache = ops.zeros_like(x)
--++        #     idxs = flat_expert_indices.argsort()
--++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++        #     token_idxs = idxs // self.num_experts_per_tok
--++
--++        #     for i, end_idx in enumerate(tokens_per_expert):
--++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++        #         if start_idx == end_idx:
--++        #             continue
--++        #         expert = self.experts[i]
--++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++        #         expert_tokens = x[exp_token_idx]
--++        #         expert_out = expert(expert_tokens)
--++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++
--++        #     return expert_cache
--++        # @no_grad()
--++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++        #     expert_cache = ops.zeros_like(x)
--++
--++        #     # 排序保证顺序一致
--++        #     idxs = flat_expert_indices.argsort()
--++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++        #     token_idxs = idxs // self.num_experts_per_tok
--++
--++        #     # 找出有 token 的专家
--++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++
--++        #     for i in active_experts.tolist():
--++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++        #         end_idx = tokens_per_expert[i]
--++        #         if start_idx == end_idx:  # 没有 token
--++        #             continue
--++
--++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++        #         expert_tokens = x[exp_token_idx]
--++        #         expert_out = self.experts[i](expert_tokens)
--++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++
--++        #         expert_cache = mindspore.mint.scatter_add(
--++        #             expert_cache,
--++        #             0,
--++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++        #             expert_out
--++        #         )
--++
--++        #     return expert_cache
--+ 
--+ 
--+ 
--+@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--+-
--+ # class DeepseekFlashAttention(nn.Module):
--+ #     """
--+ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--++
--+ Deepseek_ATTENTION_CLASSES = {
--+     "eager": DeepseekAttention,
--+     "flash-attention": DeepseekFlashAttention,
--+@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--+             )
--+         else:
--+             # 4d mask is passed through the layers
--+-            attention_mask = _prepare_4d_causal_attention_mask(
--++            # attention_mask = _prepare_4d_causal_attention_mask(
--++            #     attention_mask,
--++            #     (batch_size, seq_length),
--++            #     inputs_embeds,
--++            #     past_key_values_length,
--++            # )
--++            #@dwj
--++            attention_mask = get_cached_causal_mask(
--+                 attention_mask,
--+                 (batch_size, seq_length),
--+                 inputs_embeds,
--+@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+         # Initialize weights and apply final processing
--+         self.post_init()
--+         self.warm_up = False
--++        #@dwj
--++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--++            self.num_layers,
--++            self.num_attention_heads,
--++            self.head_dim,
--++            batch_size=1,
--++            max_length=self.max_length,
--++            dtype=mindspore.float16
--++        )
--++
--++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--++        key_cache = []
--++        value_cache = []
--++        for _ in range(num_layers):
--++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++            key_cache.append(k)
--++            value_cache.append(v)
--++        return key_cache, value_cache
--++
--+ 
--+     def warmup_moe_model_deep(self):
--+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+index bced285c..ebd7782e 100644
--+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
--+ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+ 
--+-Long_Prompt = False
--+-PROMPT_LENGTH_THRESHOLD = 128
--++Long_Prompt = 1
--++LONG_PROMPT_LENGTH_THRESHOLD = 128
--++SHORT_PROMPT_LENGTH_THRESHOLD = 32
--++
--++_causal_mask_cache = {}
--++
--++def get_cached_causal_mask_with_cache_position(
--++    attention_mask: mindspore.Tensor,
--++    sequence_length: int,
--++    target_length: int,
--++    dtype: mindspore.dtype,
--++    min_dtype: float,
--++    cache_position: mindspore.Tensor,
--++    batch_size: int,
--++):
--++    """
--++    带缓存的 causal mask 构造函数
--++    """
--++    # q_len 是当前 query 长度
--++    q_len = sequence_length
--++    # kv_len 是 target_length
--++    kv_len = target_length
--++
--++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--++
--++    if key in _causal_mask_cache:
--++        return _causal_mask_cache[key]
--++
--++    # 调用原来的 mask 构造逻辑
--++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++        attention_mask,
--++        sequence_length=sequence_length,
--++        target_length=target_length,
--++        dtype=dtype,
--++        min_dtype=min_dtype,
--++        cache_position=cache_position,
--++        batch_size=batch_size,
--++    )
--++    # 缓存结果
--++    _causal_mask_cache[key] = causal_mask
--++    return causal_mask
--+ 
--+ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+ def _prepare_4d_causal_attention_mask_with_cache_position(
--+@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+ 
--+ 
--+ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--++# class Qwen2MoeAttention(nn.Module):
--++#     """
--++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--++#     and "Generating Long Sequences with Sparse Transformers".
--++#     """
--++
--++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++#         super().__init__()
--++#         self.config = config
--++#         self.layer_idx = layer_idx
--++#         if layer_idx is None:
--++#             logger.warning_once(
--++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++#                 "when creating this class."
--++#             )
--++
--++#         self.hidden_size = config.hidden_size
--++#         self.num_heads = config.num_attention_heads
--++#         self.head_dim = self.hidden_size // self.num_heads
--++#         self.num_key_value_heads = config.num_key_value_heads
--++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++#         self.max_position_embeddings = config.max_position_embeddings
--++#         self.rope_theta = config.rope_theta
--++#         self.is_causal = True
--++#         self.attention_dropout = config.attention_dropout
--++
--++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++#             raise ValueError(
--++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++#                 f" and `num_heads`: {self.num_heads})."
--++#             )
--++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++
--++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++#             self.head_dim,
--++#             max_position_embeddings=self.max_position_embeddings,
--++#             base=self.rope_theta,
--++#         )
--++
--++#     def forward(
--++#         self,
--++#         hidden_states: mindspore.Tensor,
--++#         attention_mask: Optional[mindspore.Tensor] = None,
--++#         position_ids: Optional[mindspore.Tensor] = None,
--++#         past_key_value: Optional[Cache] = None,
--++#         output_attentions: bool = False,
--++#         use_cache: bool = False,
--++#         cache_position: Optional[mindspore.Tensor] = None,
--++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++
--++        
--++
--++#         bsz, q_len, _ = hidden_states.shape
--++
--++#         query_states = self.q_proj(hidden_states)
--++#         key_states = self.k_proj(hidden_states)
--++#         value_states = self.v_proj(hidden_states)
--++
--++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++
--++#         kv_seq_len = key_states.shape[-2]
--++#         if past_key_value is not None:
--++#             if self.layer_idx is None:
--++#                 raise ValueError(
--++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++#                     "with a layer index."
--++#                 )
--++#             if isinstance(past_key_value, StaticCache):
--++#                 kv_seq_len = key_states.shape[-2]
--++#             else:
--++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++
--++#         if past_key_value is not None:
--++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++            
--++#             if isinstance(past_key_value, StaticCache):
--++#                 kv_seq_len = key_states.shape[-2]
--++
--++#         # repeat k/v heads if n_kv_heads < n_heads
--++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++        
--++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++
--++#         if attention_mask is not None:
--++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++#             attn_weights = attn_weights + causal_mask
--++
--++#         # upcast attention to fp32
--++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++#         attn_output = ops.matmul(attn_weights, value_states)
--++
--++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++#             raise ValueError(
--++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--++#                 f" {attn_output.shape}"
--++#             )
--++
--++#         attn_output = ops.transpose(attn_output, 1, 2)
--++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++
--++#         attn_output = self.o_proj(attn_output)
--++#         # @lwx
--++        
--++#         # max_seq_len = self.max_position_embeddings  # 2048
--++
--++#         # if attention_mask is not None:
--++#         #     # attention_mask: [B, 1, Sq, Sk]
--++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++
--++#         #     # pad 到 [max_seq_len, max_seq_len]
--++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++#         #     global_attention_mask = padded_mask
--++#         # else:
--++#         #     global_attention_mask = None
--++
--++
--++#         # sparse_mode=3
--++#         # attn_output = mindspore.ops.flash_attention_score(
--++#         #     query=query_states,
--++#         #     key=key_states,
--++#         #     value=value_states,
--++#         #     real_shift=None,
--++#         #     padding_mask=None,
--++
--++#         #     head_num=self.num_heads,
--++#         #     attn_mask=global_attention_mask,
--++#         #     keep_prob=1.0 - self.attention_dropout,
--++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++#         #     input_layout="BNSD",
--++#         #     pre_tokens=2147483647,
--++#         #     next_tokens=2147483647,
--++#         #     inner_precise=0,
--++#         #     drop_mask=None,
--++#         #     prefix=None,
--++#         #     actual_seq_qlen=None,
--++#         #     actual_seq_kvlen=None,
--++#         #     sparse_mode=sparse_mode,
--++#         # )
--++#         if not output_attentions:
--++#             attn_weights = None
--++
--++#         return attn_output, attn_weights, past_key_value
--++
--+ class Qwen2MoeAttention(nn.Module):
--+     """
--+-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+-    and "Generating Long Sequences with Sparse Transformers".
--+-    """
--++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
--+ 
--++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--++
--++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--++    """
--+     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+         super().__init__()
--+         self.config = config
--+@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--+         if layer_idx is None:
--+             logger.warning_once(
--+                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+                 "when creating this class."
--+             )
--+ 
--+@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--+         use_cache: bool = False,
--+         cache_position: Optional[mindspore.Tensor] = None,
--+     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-
--+         
--+-
--++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--+         bsz, q_len, _ = hidden_states.shape
--+ 
--+         query_states = self.q_proj(hidden_states)
--+         key_states = self.k_proj(hidden_states)
--+         value_states = self.v_proj(hidden_states)
--+ 
--+-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+-
--++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++        
--+         kv_seq_len = key_states.shape[-2]
--+         if past_key_value is not None:
--+-            if self.layer_idx is None:
--+-                raise ValueError(
--+-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-                    "with a layer index."
--+-                )
--+-            if isinstance(past_key_value, StaticCache):
--+-                kv_seq_len = key_states.shape[-2]
--+-            else:
--+-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++        
--+         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+ 
--+         if past_key_value is not None:
--+-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++
--++        # --- 2. 动态调度核心注意力计算 ---
--++        global Long_Prompt
--++        if Long_Prompt >= 1:
--++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--++            fa_attention_mask = None
--++            if attention_mask is not None:
--++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++                fa_attention_mask = (mask_slice != 0)
--++
--++            attn_output = mindspore.ops.flash_attention_score(
--++                query=query_states,
--++                key=key_states,
--++                value=value_states,
--++                head_num=self.num_heads,
--++                attn_mask=fa_attention_mask,
--++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--++                scalar_value=1.0 / math.sqrt(self.head_dim),
--++                input_layout="BNSD",
--++                sparse_mode=0,
--++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--++            )
--+             
--+-            if isinstance(past_key_value, StaticCache):
--+-                kv_seq_len = key_states.shape[-2]
--++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++            attn_output = self.o_proj(attn_output)
--++            attn_weights = None
--++            if output_attentions:
--++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+ 
--+-        # repeat k/v heads if n_kv_heads < n_heads
--+-        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+-        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+-        
--+-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++        else:
--++            # --- Eager Attention 路径 (用于短序列和解码) ---
--++            key_states = repeat_kv(key_states, self.num_key_value_groups)
--++            value_states = repeat_kv(value_states, self.num_key_value_groups)
--++            
--++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+ 
--+-        if attention_mask is not None:
--+-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+-            attn_weights = attn_weights + causal_mask
--++            if attention_mask is not None:
--++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++                attn_weights = attn_weights + causal_mask
--+ 
--+-        # upcast attention to fp32
--+-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+-        attn_output = ops.matmul(attn_weights, value_states)
--++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++            attn_output = ops.matmul(attn_weights, value_states)
--+ 
--+-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+-            raise ValueError(
--+-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+-                f" {attn_output.shape}"
--+-            )
--++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++                raise ValueError(
--++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--++                )
--+ 
--+-        attn_output = ops.transpose(attn_output, 1, 2)
--+-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++            attn_output = ops.transpose(attn_output, 1, 2)
--++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++            attn_output = self.o_proj(attn_output)
--+ 
--+-        attn_output = self.o_proj(attn_output)
--+-        # @lwx
--++            if not output_attentions:
--++                attn_weights = None
--+         
--+-        # max_seq_len = self.max_position_embeddings  # 2048
--+-
--+-        # if attention_mask is not None:
--+-        #     # attention_mask: [B, 1, Sq, Sk]
--+-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+-
--+-        #     # pad 到 [max_seq_len, max_seq_len]
--+-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+-        #     global_attention_mask = padded_mask
--+-        # else:
--+-        #     global_attention_mask = None
--+-
--+-
--+-        # sparse_mode=3
--+-        # attn_output = mindspore.ops.flash_attention_score(
--+-        #     query=query_states,
--+-        #     key=key_states,
--+-        #     value=value_states,
--+-        #     real_shift=None,
--+-        #     padding_mask=None,
--+-
--+-        #     head_num=self.num_heads,
--+-        #     attn_mask=global_attention_mask,
--+-        #     keep_prob=1.0 - self.attention_dropout,
--+-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+-        #     input_layout="BNSD",
--+-        #     pre_tokens=2147483647,
--+-        #     next_tokens=2147483647,
--+-        #     inner_precise=0,
--+-        #     drop_mask=None,
--+-        #     prefix=None,
--+-        #     actual_seq_qlen=None,
--+-        #     actual_seq_kvlen=None,
--+-        #     sparse_mode=sparse_mode,
--+-        # )
--+-        if not output_attentions:
--+-            attn_weights = None
--+-
--+         return attn_output, attn_weights, past_key_value
--+ 
--+-
--+ # class Qwen2MoeFlashAttention(nn.Module):
--+ #     """
--+ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
--+ #             return final_hidden_states, router_logits
--+ 
--+ 
--+-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+-#     """
--+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+-#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+-#     """
--+-#     def __init__(self, config: Qwen2MoeConfig):
--+-#         super().__init__()
--+-#         self.num_experts = config.num_experts
--+-#         self.top_k = config.num_experts_per_tok
--+-#         self.norm_topk_prob = config.norm_topk_prob
--+-
--+-#         # 门控网络
--+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+-#         # 专家列表
--+-#         self.experts = nn.ModuleList(
--+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+-#         )
--+-#         # 共享专家
--+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_decode(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         """
--+-#         【解码路径】针对 sequence_length=1 的极致优化。
--+-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+-#         """
--+-#         batch_size, hidden_dim = hidden_states.shape
--+-        
--+-#         expert_outputs_list = [
--+-#             ops.cat([
--+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+-#             ], dim=0) 
--+-#             for i in range(batch_size)
--+-#         ]
--+-        
--+-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+-#         # shape: (batch_size, top_k, hidden_dim)
--+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+-        
--+-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+-        
--+-#         return moe_output.squeeze(1)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_prefill(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         """
--+-#         【预填充路径】针对 sequence_length > 1 的优化。
--+-#         按专家对 Token 进行分组，并进行批处理。
--+-#         """
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens = hidden_states.shape[0]
--+-#         flat_selected_experts = selected_experts.flatten()
--+-        
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-        
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-        
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-            
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-#             selected_token_indices = token_indices[mask]
--+-#             selected_routing_weights = routing_weights.flatten()[mask]
--+-            
--+-#             current_states = hidden_states[selected_token_indices]
--+-            
--+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-            
--+-#             moe_output = moe_output.index_add(
--+-#                 dim=0,
--+-#                 index=selected_token_indices,
--+-#                 source=expert_output.to(hidden_states.dtype)
--+-#             )
--+-#         return moe_output
--+-
--+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-#         """
--+-#         顶层 forward 方法，作为智能分发器。
--+-#         """
--+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        
--+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-#         router_logits = self.gate(hidden_states_reshaped)
--+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-
--+-#         if self.norm_topk_prob:
--+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        
--+-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+-        
--+-#         moe_output = None
--+-#         # 在推理时，根据序列长度选择最优路径
--+-#         if not self.training:
--+-#             if sequence_length == 1:
--+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+-#             else:
--+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+-#         else:
--+-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+-#             raise NotImplementedError("Training path is not implemented.")
--+-
--+-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+-        
--+-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+-        
--+-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+-        
--+-#         return final_hidden_states, router_logits
--+-
--+-
--+-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+-#     """
--+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+-#     """
--+-#     def __init__(self, config: Qwen2MoeConfig):
--+-#         super().__init__()
--+-#         self.num_experts = config.num_experts
--+-#         self.top_k = config.num_experts_per_tok
--+-#         self.norm_topk_prob = config.norm_topk_prob
--+-
--+-#         # 门控网络
--+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+-#         # 专家列表
--+-#         self.experts = nn.ModuleList(
--+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+-#         )
--+-#         # 共享专家
--+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_decode(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         batch_size, _ = hidden_states.shape
--+-#         expert_outputs_list = [
--+-#             ops.cat([
--+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+-#             ], dim=0) 
--+-#             for i in range(batch_size)
--+-#         ]
--+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+-#         return moe_output.squeeze(1)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_prefill(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens = hidden_states.shape[0]
--+-#         flat_selected_experts = selected_experts.flatten()
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-        
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-#             selected_token_indices = token_indices[mask]
--+-#             selected_routing_weights = routing_weights.flatten()[mask]
--+-#             current_states = hidden_states[selected_token_indices]
--+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-#             moe_output = moe_output.index_add(
--+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+-#             )
--+-#         return moe_output
--+-
--+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-#         """
--+-#         顶层 forward 方法，作为智能分发器。
--+-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+-#         """
--+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        
--+-#         # 1. 门控计算 (通用逻辑)
--+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-#         router_logits = self.gate(hidden_states_reshaped)
--+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-
--+-#         if self.norm_topk_prob:
--+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        
--+-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+-        
--+-#         # 2. 智能分发到最优 MoE 路径
--+-#         moe_output = None
--+-#         if not self.training:
--+-#             if sequence_length == 1:
--+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+-#             else:
--+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+-#         else:
--+-#             raise NotImplementedError("Training path is not implemented.")
--+-
--+-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+-        
--+-#         # 4. 合并 MoE 输出和共享专家输出
--+-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+-        
--+-#         # 5. 恢复原始形状并返回
--+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+-        
--+-#         return final_hidden_states, router_logits
--+-
--+-# prefill fastest
--+-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+-#     """
--+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+-#     """
--+-#     def __init__(self, config: Qwen2MoeConfig):
--+-#         super().__init__()
--+-#         self.num_experts = config.num_experts
--+-#         self.top_k = config.num_experts_per_tok
--+-#         self.norm_topk_prob = config.norm_topk_prob
--+-
--+-#         # 门控网络
--+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+-#         # 专家列表
--+-#         self.experts = nn.ModuleList(
--+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+-#         )
--+-#         # 共享专家
--+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_dispatch(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         """
--+-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+-#         """
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens, _ = hidden_states.shape
--+-        
--+-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+-#         flat_selected_experts = selected_experts.flatten()
--+-#         flat_routing_weights = routing_weights.flatten()
--+-
--+-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-
--+-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-        
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-            
--+-#             # 找到所有分配给该专家的 token
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-            
--+-#             # 使用 mask 选取对应的 token 和权重
--+-#             current_token_indices = token_indices[mask]
--+-#             current_routing_weights = flat_routing_weights[mask]
--+-#             current_hidden_states = hidden_states[current_token_indices]
--+-            
--+-#             # 对这些 token 进行批处理
--+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+-            
--+-#             # 使用 index_add 将结果精确地加回到对应位置
--+-#             moe_output = moe_output.index_add(
--+-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+-#             )
--+-#         return moe_output
--+-
--+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-#         """
--+-#         顶层 forward 方法，作为智能分发器。
--+-#         """
--+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        
--+-#         # 1. 门控计算
--+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-#         router_logits = self.gate(hidden_states_reshaped)
--+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-
--+-#         if self.norm_topk_prob:
--+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        
--+-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+-        
--+-#         # 2. 调用统一的 MoE 计算内核
--+-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+-
--+-#         # 3. 统一处理共享专家
--+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+-        
--+-#         # 4. 合并输出
--+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+-        
--+-#         # 5. 恢复原始形状并返回
--+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+-        
--+-#         return final_hidden_states, router_logits
--+-
--+-
--+-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+-#     """
--+-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+-#     【最终高性能与高精度版】：
--+-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+-#     3. 这样实现了速度和准确性的两全其美。
--+-#     """
--+-#     def __init__(self, config: Qwen2MoeConfig):
--+-#         super().__init__()
--+-#         self.num_experts = config.num_experts
--+-#         self.top_k = config.num_experts_per_tok
--+-#         self.norm_topk_prob = config.norm_topk_prob
--+-
--+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+-#         self.experts = nn.ModuleList(
--+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+-#         )
--+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_decode(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         """
--+-#         【解码路径】极致优化版：bmm + 高精度累加。
--+-#         """
--+-#         original_dtype = hidden_states.dtype
--+-#         batch_size, _ = hidden_states.shape
--+-        
--+-#         expert_outputs_list = [
--+-#             ops.cat([
--+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+-#             ], dim=0) 
--+-#             for i in range(batch_size)
--+-#         ]
--+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+-
--+-#         # 在 float32 下执行 bmm，得到高精度结果
--+-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+-        
--+-#         # 将高精度结果转换回原始数据类型
--+-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+-        
--+-#         return moe_output
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_prefill(
--+-#         self, 
--+-#         hidden_states: mindspore.Tensor, 
--+-#         selected_experts: mindspore.Tensor, 
--+-#         routing_weights: mindspore.Tensor
--+-#     ) -> mindspore.Tensor:
--+-#         """
--+-#         【预填充路径】与原始实现一致，结果精确。
--+-#         """
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens, _ = hidden_states.shape
--+-#         flat_selected_experts = selected_experts.flatten()
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-        
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-#             selected_token_indices = token_indices[mask]
--+-#             selected_routing_weights = routing_weights.flatten()[mask]
--+-#             current_states = hidden_states[selected_token_indices]
--+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-#             moe_output = moe_output.index_add(
--+-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+-#             )
--+-#         return moe_output
--+-
--+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-        
--+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-#         router_logits = self.gate(hidden_states_reshaped)
--+-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-
--+-#         if self.norm_topk_prob:
--+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        
--+-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+-#         # 如果模型主体是 float16，后续再转换
--+-        
--+-#         moe_output = None
--+-#         if not self.training:
--+-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+-#             # _moe_infer_decode 内部会处理好类型转换
--+-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+-#             if sequence_length == 1:
--+-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+-#             else:
--+-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+-#         else:
--+-#             raise NotImplementedError("Training path is not implemented.")
--+-
--+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+-        
--+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+-        
--+-#         return final_hidden_states, router_logits
--+-    
--+-
--+-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+-#     """
--+-#     【融合版】一个混合专家模块，内置两种推理策略，
--+-#     由外部全局变量 `Long_Prompt` 控制：
--+-
--+-#     - if Long_Prompt is True:  【精度优先模式】
--+-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+-#       适用于处理长序列，避免误差累积。
--+-
--+-#     - if Long_Prompt is False: 【速度优先模式】
--+-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+-#       在解码阶段获得极致速度，同时保证结果高度准确。
--+-#     """
--+-#     def __init__(self, config: Qwen2MoeConfig):
--+-#         super().__init__()
--+-#         self.num_experts = config.num_experts
--+-#         self.top_k = config.num_experts_per_tok
--+-#         self.norm_topk_prob = config.norm_topk_prob
--+-
--+-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+-#         self.experts = nn.ModuleList(
--+-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+-#         )
--+-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+-
--+-#     # --- 速度优先模式的辅助函数 ---
--+-#     @no_grad()
--+-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+-#         original_dtype = hidden_states.dtype
--+-#         batch_size, _ = hidden_states.shape
--+-#         expert_outputs_list = [
--+-#             ops.cat([
--+-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+-#             ], dim=0) 
--+-#             for i in range(batch_size)
--+-#         ]
--+-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+-#         weights_fp32 = routing_weights.to(mindspore.float32)
--+-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+-#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+-
--+-#     @no_grad()
--+-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens, _ = hidden_states.shape
--+-#         flat_selected_experts = selected_experts.flatten()
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-#             selected_token_indices = token_indices[mask]
--+-#             selected_routing_weights = routing_weights.flatten()[mask]
--+-#             current_states = hidden_states[selected_token_indices]
--+-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+-#         return moe_output
--+-
--+-#     # --- 精度优先模式的辅助函数 ---
--+-#     @no_grad()
--+-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+-#         moe_output = ops.zeros_like(hidden_states)
--+-#         num_tokens, _ = hidden_states.shape
--+-#         flat_selected_experts = selected_experts.flatten()
--+-#         flat_routing_weights = routing_weights.flatten()
--+-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+-#         active_experts = ops.unique(flat_selected_experts)
--+-#         for expert_idx_tensor in active_experts:
--+-#             expert_idx = expert_idx_tensor.item()
--+-#             expert_layer = self.experts[expert_idx]
--+-#             mask = (flat_selected_experts == expert_idx_tensor)
--+-#             current_token_indices = token_indices[mask]
--+-#             current_routing_weights = flat_routing_weights[mask]
--+-#             current_hidden_states = hidden_states[current_token_indices]
--+-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+-#         return moe_output
--+-
--+-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+-#         # 声明我们将要使用一个在模块外部定义的全局变量
--+-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+-#         global Long_Prompt
--+-
--+-#         # 1. 门控计算 (所有模式通用)
--+-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-#         router_logits = self.gate(hidden_states_reshaped)
--+-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+-#         if self.norm_topk_prob:
--+-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-        
--+-#         moe_output = None
--+-#         if not self.training:
--+-#             # 根据 Long_Prompt 标志选择模式
--+-#             if Long_Prompt:
--+-#                 # --- 精度优先模式 ---
--+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+-#             else:
--+-#                 # --- 速度优先模式 ---
--+-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+-#                 if sequence_length == 1:
--+-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+-#                 else:
--+-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+-#         else:
--+-#             raise NotImplementedError("Training path is not implemented.")
--+-
--+-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+-        
--+-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+-        
--+-#         return final_hidden_states, router_logits
--+-    
--+ class Qwen2MoeSparseMoeBlock(nn.Module):
--+     """
--+     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+         return moe_output_fp32.squeeze(1).to(original_dtype)
--+ 
--++    # @no_grad()
--++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++    #     num_tokens, _ = hidden_states.shape
--++    #     flat_selected_experts = selected_experts.flatten()
--++    #     sorted_expert_indices = flat_selected_experts.argsort()
--++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++    #     original_token_indices = sorted_expert_indices // self.top_k
--++    #     moe_output = ops.zeros_like(hidden_states)
--++    #     current_token_offset = 0
--++    #     for i in range(self.num_experts):
--++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--++    #         if expert_token_count == 0:
--++    #             continue
--++    #         end_offset = current_token_offset + expert_token_count
--++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++    #         current_token_offset += expert_token_count
--++    #     return moe_output
--++
--+     @no_grad()
--+     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+-        num_tokens, _ = hidden_states.shape
--+-        flat_selected_experts = selected_experts.flatten()
--+-        sorted_expert_indices = flat_selected_experts.argsort()
--+-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+-        original_token_indices = sorted_expert_indices // self.top_k
--++        """
--++        优化版 MoE prefill (速度优先模式)：
--++        - 批量张量化处理同一个 expert 的所有 token
--++        - 跳过无 token 的专家
--++        - 保持结果完全一致
--++        """
--+         moe_output = ops.zeros_like(hidden_states)
--+-        current_token_offset = 0
--+-        for i in range(self.num_experts):
--+-            expert_token_count = tokens_per_expert[i] - current_token_offset
--+-            if expert_token_count == 0:
--+-                continue
--+-            end_offset = current_token_offset + expert_token_count
--+-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+-            expert_hidden_states = hidden_states[expert_original_token_indices]
--+-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+-            current_token_offset += expert_token_count
--++
--++        flat_selected_experts = selected_experts.flatten()
--++        flat_routing_weights = routing_weights.flatten()
--++
--++        idxs = flat_selected_experts.argsort()
--++        sorted_expert_indices = flat_selected_experts[idxs]
--++        sorted_token_indices = idxs // self.top_k
--++
--++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--++
--++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--++
--++        for expert_id in active_experts.tolist():
--++            start = int(tokens_per_expert[:expert_id].sum().item())
--++            end = start + int(tokens_per_expert[expert_id].item())
--++
--++            token_idx = sorted_token_indices[start:end]
--++            expert_tokens = hidden_states[token_idx]
--++
--++            expert_out = self.experts[expert_id](expert_tokens)
--++
--++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--++
--++            moe_output = mindspore.mint.scatter_add(
--++                moe_output,
--++                0,
--++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--++                scaled_out.to(hidden_states.dtype)
--++            )
--++
--+         return moe_output
--+ 
--++
--+     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+     @no_grad()
--+     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+         
--+         moe_output = None
--+-        if Long_Prompt:
--+-            # --- 精度优先模式 (ACCURACY MODE) ---
--+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        # if Long_Prompt==0:
--++        #     # --- 精度优先模式 (ACCURACY MODE) ---
--++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        # else:
--++        #     # --- 速度优先模式 (SPEED MODE) ---
--++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++        #     if sequence_length == 1:
--++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        #     else:
--++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++        
--++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++        if sequence_length == 1:
--++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+         else:
--+-            # --- 速度优先模式 (SPEED MODE) ---
--+-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+-            if sequence_length == 1:
--+-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+-            else:
--+-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+-        
--++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++    
--+ 
--+         # 3. 共享专家计算与合并 (所有模式通用)
--+         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+         
--+         return final_hidden_states, router_logits
--+ 
--++
--+ class Qwen2MoeDecoderLayer(nn.Module):
--+     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+         super().__init__()
--+         self.hidden_size = config.hidden_size
--+         
--+-        # if Long_Prompt:
--+-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+-        # else:
--++        # if Long_Prompt == 2:
--+         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++        # else:
--++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+ 
--+         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+ 
--+@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+             )
--+ 
--+         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--+-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++        #     attention_mask,
--++        #     sequence_length=sequence_length,
--++        #     target_length=target_length,
--++        #     dtype=dtype,
--++        #     min_dtype=min_dtype,
--++        #     cache_position=cache_position,
--++        #     batch_size=input_tensor.shape[0],
--++        # )
--++        #@dwj
--++        causal_mask = get_cached_causal_mask_with_cache_position(
--+             attention_mask,
--+             sequence_length=sequence_length,
--+             target_length=target_length,
--+@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+         """
--+-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--++        _causal_mask_cache.clear()
--+ 
--+         input_ids = kwargs.get("input_ids")
--+         if input_ids is None and args:
--+@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+ 
--+         if input_ids is not None:
--+             prompt_length = input_ids.shape[1]
--+-            
--+-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+-                Long_Prompt = True
--++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--++                Long_Prompt = 2
--++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--++                Long_Prompt = 0
--+             else:
--+-                Long_Prompt = False
--++                Long_Prompt = 1
--++
--+ 
--+         return super().generate(*args, **kwargs)
--+     
--+@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+             dtype = self.lm_head.weight.dtype
--+             min_dtype = float(ops.finfo(dtype).min)
--+ 
--+-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++            #     attention_mask,
--++            #     sequence_length=sequence_length,
--++            #     target_length=past_key_values.get_max_length(),
--++            #     dtype=dtype,
--++            #     min_dtype=min_dtype,
--++            #     cache_position=cache_position,
--++            #     batch_size=batch_size,
--++            # )
--++
--++            #@dwj
--++            attention_mask = get_cached_causal_mask_with_cache_position(
--+                 attention_mask,
--+                 sequence_length=sequence_length,
--+                 target_length=past_key_values.get_max_length(),
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+deleted file mode 100644
--+index 6dfb5b93..00000000
--+--- a/patches/0001-20251104commit.patch
--++++ /dev/null
--+@@ -1,1272 +0,0 @@
--+-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+-From: Pinoeer-kingxi <13022943007@163.com>
--+-Date: Tue, 4 Nov 2025 09:11:51 +0800
--+-Subject: [PATCH] 20251104commit
--+-
--+----
--+- mindnlp/transformers/cache_utils.py           |  28 +-
--+- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+- 3 files changed, 976 insertions(+), 87 deletions(-)
--+-
--+-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+-index cadd2e04..02f8d4be 100644
--+---- a/mindnlp/transformers/cache_utils.py
--+-+++ b/mindnlp/transformers/cache_utils.py
--+-@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+-             # k_out[:, :, cache_position] = key_states
--+-             # v_out[:, :, cache_position] = value_states
--+--            if ON_ORANGE_PI:
--+--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+--            else:
--+--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+--
--+-+            # if ON_ORANGE_PI:
--+-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+-+            # else:
--+-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+-+            # 确保 cache_position 是 1D tensor 并且类型正确
--+-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+-+            if cache_position.ndim > 1:
--+-+                cache_position = cache_position.flatten()
--+-+            # 确保类型是 int32 或 int64（MindSpore 要求）
--+-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+-+                cache_position = cache_position.int()
--+-+            
--+-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+-+            k_out[:, :, cache_position] = key_states
--+-+            v_out[:, :, cache_position] = value_states
--+-+            
--+-         return k_out, v_out
--+- 
--+-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+-index c695b944..d8303e45 100644
--+---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+- # Copied from transformers.models.llama.modeling_llama.rotate_half
--+- def rotate_half(x):
--+-     """Rotates half the hidden dims of the input."""
--+--    x1 = x[..., : x.shape[-1] // 2]
--+--    x2 = x[..., x.shape[-1] // 2 :]
--+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+-+    # x1 = x[..., : x.shape[-1] // 2]
--+-+    # x2 = x[..., x.shape[-1] // 2 :]
--+-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+-     return ops.cat((-x2, x1), dim=-1)
--+- 
--+- 
--+-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+-         if self.training:
--+-             raise NotImplementedError("Training is not supported yet.")
--+-         else:
--+--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+--        if self.config.n_shared_experts is not None:
--+--            y = y + self.shared_experts(identity)
--+--        return y
--+-+            # @lwx
--+-+            if orig_shape[1] == 1:
--+-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+-+                y=y.view(*orig_shape)
--+-+                if self.config.n_shared_experts is not None:
--+-+                    y = y + self.shared_experts(identity)
--+-+                return y
--+-+            else:
--+-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+-+                if self.config.n_shared_experts is not None:
--+-+                    y = y + self.shared_experts(identity)
--+-+                return y
--+-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+-+        # if self.config.n_shared_experts is not None:
--+-+        #     y = y + self.shared_experts(identity)
--+-+        # return y
--+-+        
--+-+    @no_grad()
--+-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+-+
--+-+        expert_cache = ops.zeros_like(x)
--+-+        for i in range(self.num_experts_per_tok):
--+-+            expert_id = flat_expert_indices[i].item()
--+-+            weight = flat_expert_weights[i].item()
--+-+            expert = self.experts[expert_id]
--+-+            expert_out = expert(x)
--+-+            expert_cache += expert_out * weight
--+-+        return expert_cache
--+- 
--+-     @no_grad()
--+--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+--        # expert_cache = torch.zeros_like(x)
--+--        # idxs = flat_expert_indices.argsort()
--+--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+--        # token_idxs = idxs // self.num_experts_per_tok
--+--        # for i, end_idx in enumerate(tokens_per_expert):
--+--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+--        #     if start_idx == end_idx:
--+--        #         continue
--+--        #     expert = self.experts[i]
--+--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+--        #     expert_tokens = x[exp_token_idx]
--+--        #     expert_out = expert(expert_tokens)
--+--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+--        # return expert_cache
--+-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+-         expert_cache = ops.zeros_like(x)
--+-         idxs = flat_expert_indices.argsort()
--+-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+-         token_idxs = idxs // self.num_experts_per_tok
--+-+
--+-         for i, end_idx in enumerate(tokens_per_expert):
--+-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+-             if start_idx == end_idx:
--+-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+-             expert_out = expert(expert_tokens)
--+-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+-+
--+-         return expert_cache
--+-+        
--+-+    # @no_grad()
--+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-+    #     # expert_cache = torch.zeros_like(x)
--+-+    #     # idxs = flat_expert_indices.argsort()
--+-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+-+    #     # token_idxs = idxs // self.num_experts_per_tok
--+-+    #     # for i, end_idx in enumerate(tokens_per_expert):
--+-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+-+    #     #     if start_idx == end_idx:
--+-+    #     #         continue
--+-+    #     #     expert = self.experts[i]
--+-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+-+    #     #     expert_tokens = x[exp_token_idx]
--+-+    #     #     expert_out = expert(expert_tokens)
--+-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+-+    #     # return expert_cache
--+-+    #     expert_cache = ops.zeros_like(x)
--+-+    #     idxs = flat_expert_indices.argsort()
--+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+-+    #     token_idxs = idxs // self.num_experts_per_tok
--+-+
--+-+    #     for i, end_idx in enumerate(tokens_per_expert):
--+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+-+    #         if start_idx == end_idx:
--+-+    #             continue
--+-+    #         expert = self.experts[i]
--+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+-+    #         expert_tokens = x[exp_token_idx]
--+-+    #         expert_out = expert(expert_tokens)
--+-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+-+
--+-+    #     return expert_cache
--+-+    # @no_grad()
--+-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+-+    #     expert_cache = ops.zeros_like(x)
--+-+
--+-+    #     # 排序保证顺序一致
--+-+    #     idxs = flat_expert_indices.argsort()
--+-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+-+    #     token_idxs = idxs // self.num_experts_per_tok
--+-+
--+-+    #     # 找出有 token 的专家
--+-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+-+
--+-+    #     for i in active_experts.tolist():
--+-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+-+    #         end_idx = tokens_per_expert[i]
--+-+    #         if start_idx == end_idx:  # 没有 token
--+-+    #             continue
--+-+
--+-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+-+    #         expert_tokens = x[exp_token_idx]
--+-+    #         expert_out = self.experts[i](expert_tokens)
--+-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+-+
--+-+    #         expert_cache = mindspore.mint.scatter_add(
--+-+    #             expert_cache,
--+-+    #             0,
--+-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+-+    #             expert_out
--+-+    #         )
--+-+
--+-+    #     return expert_cache
--+-+
--+-+
--+- 
--+- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+- #     """
--+-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+- 
--+-         # Initialize weights and apply final processing
--+-         self.post_init()
--+-+        self.warm_up = False
--+-+
--+-+    def warmup_moe_model_deep(self):
--+-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+-+        test_texts = [
--+-+            "warmup short",
--+-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+-+        ]
--+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+-+        if tokenizer is None:
--+-+            from mindnlp.transformers import AutoTokenizer
--+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+-+            self._warmup_tokenizer = tokenizer
--+-+
--+-+        for text in test_texts:
--+-+            inputs = tokenizer(text, return_tensors="ms")
--+-+            with mindspore._no_grad():
--+-+                _ = self(**inputs, use_cache=False)
--+-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+- 
--+-     def get_input_embeddings(self):
--+-         return self.model.embed_tokens
--+-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+-         ```"""
--+-+        if not self.warm_up:
--+-+            self.warm_up = True
--+-+            self.warmup_moe_model_deep()
--+-+
--+-         output_attentions = (
--+-             output_attentions
--+-             if output_attentions is not None
--+-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+-index 3cbf820e..d4c6b651 100644
--+---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+-@@ -18,7 +18,6 @@
--+- # See the License for the specific language governing permissions and
--+- # limitations under the License.
--+- """MindSpore Qwen2MoE model."""
--+--
--+- import math
--+- from typing import List, Optional, Tuple, Union
--+- 
--+-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+-     TokenClassifierOutput,
--+- )
--+- from ...modeling_utils import PreTrainedModel
--+-+from ...generation import GenerationMixin
--+- from ....utils import logging
--+- from .configuration_qwen2_moe import Qwen2MoeConfig
--+- 
--+-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+-         self.variance_epsilon = eps
--+- 
--+-     def forward(self, hidden_states):
--+-+        # @dwj
--+-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+-+        # @lwx
--+-+        # if not self.training :
--+-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+-         input_dtype = hidden_states.dtype
--+-         hidden_states = hidden_states.to(mindspore.float32)
--+-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+-@@ -234,6 +239,8 @@ def rotate_half(x):
--+-     """Rotates half the hidden dims of the input."""
--+-     x1 = x[..., : x.shape[-1] // 2]
--+-     x2 = x[..., x.shape[-1] // 2 :]
--+-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+-     return ops.cat((-x2, x1), dim=-1)
--+- 
--+- 
--+-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+-         self.config = config
--+-         self.hidden_size = config.hidden_size
--+-         self.intermediate_size = intermediate_size
--+-+        
--+-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+-         self.act_fn = ACT2FN[config.hidden_act]
--+- 
--+-     def forward(self, x):
--+--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+--
--+- 
--+-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+-+        # @lwx 
--+-+        # gate_up_output = self.gate_up_proj(x)
--+-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+-+        # return self.down_proj(swiglu_output)
--+-+
--+-+    # def forward(self, x):
--+-+    #     gate_proj_out = self.gate_proj(x)
--+-+    #     up_proj_out = self.up_proj(x)
--+-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+-+    #     return self.down_proj(swiglu_out)
--+-+        
--+- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+-     """
--+-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+-         use_cache: bool = False,
--+-         cache_position: Optional[mindspore.Tensor] = None,
--+-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-+
--+-+        
--+-+
--+-         bsz, q_len, _ = hidden_states.shape
--+- 
--+-         query_states = self.q_proj(hidden_states)
--+-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-                     "with a layer index."
--+-                 )
--+--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-+            if isinstance(past_key_value, StaticCache):
--+-+                kv_seq_len = key_states.shape[-2]
--+-+            else:
--+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+- 
--+-         if past_key_value is not None:
--+-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+-+            
--+-+            if isinstance(past_key_value, StaticCache):
--+-+                kv_seq_len = key_states.shape[-2]
--+- 
--+-         # repeat k/v heads if n_kv_heads < n_heads
--+-         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+--
--+-+        
--+-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+- 
--+--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+--            raise ValueError(
--+--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+--                f" {attn_weights.shape}"
--+--            )
--+--
--+--        if attention_mask is not None:  # no matter the length, we just slice it
--+--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+-+        if attention_mask is not None:
--+-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+-             attn_weights = attn_weights + causal_mask
--+- 
--+-         # upcast attention to fp32
--+-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+- 
--+-         attn_output = self.o_proj(attn_output)
--+--
--+-+        # @lwx
--+-+        
--+-+        # max_seq_len = self.max_position_embeddings  # 2048
--+-+
--+-+        # if attention_mask is not None:
--+-+        #     # attention_mask: [B, 1, Sq, Sk]
--+-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+-+
--+-+        #     # pad 到 [max_seq_len, max_seq_len]
--+-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+-+        #     global_attention_mask = padded_mask
--+-+        # else:
--+-+        #     global_attention_mask = None
--+-+
--+-+
--+-+        # sparse_mode=3
--+-+        # attn_output = mindspore.ops.flash_attention_score(
--+-+        #     query=query_states,
--+-+        #     key=key_states,
--+-+        #     value=value_states,
--+-+        #     real_shift=None,
--+-+        #     padding_mask=None,
--+-+
--+-+        #     head_num=self.num_heads,
--+-+        #     attn_mask=global_attention_mask,
--+-+        #     keep_prob=1.0 - self.attention_dropout,
--+-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+-+        #     input_layout="BNSD",
--+-+        #     pre_tokens=2147483647,
--+-+        #     next_tokens=2147483647,
--+-+        #     inner_precise=0,
--+-+        #     drop_mask=None,
--+-+        #     prefix=None,
--+-+        #     actual_seq_qlen=None,
--+-+        #     actual_seq_kvlen=None,
--+-+        #     sparse_mode=sparse_mode,
--+-+        # )
--+-         if not output_attentions:
--+-             attn_weights = None
--+- 
--+-         return attn_output, attn_weights, past_key_value
--+- 
--+- 
--+-+class Qwen2MoeFlashAttention(nn.Module):
--+-+    """
--+-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+-+
--+-+    关键改动:
--+-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+-+       直接传入原始的 key 和 value 张量效率更高。
--+-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+-+    """
--+-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+-+        super().__init__()
--+-+        self.config = config
--+-+        self.layer_idx = layer_idx
--+-+        self.hidden_size = config.hidden_size
--+-+        self.num_heads = config.num_attention_heads
--+-+        self.head_dim = self.hidden_size // self.num_heads
--+-+        self.num_key_value_heads = config.num_key_value_heads
--+-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+-+        self.max_position_embeddings = config.max_position_embeddings
--+-+        self.rope_theta = config.rope_theta
--+-+        self.attention_dropout = config.attention_dropout
--+-+
--+-+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+-+            raise ValueError(
--+-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+-+            )
--+-+
--+-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+-+
--+-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+-+            self.head_dim,
--+-+            max_position_embeddings=self.max_position_embeddings,
--+-+            base=self.rope_theta,
--+-+        )
--+-+
--+-+    def forward(
--+-+        self,
--+-+        hidden_states: mindspore.Tensor,
--+-+        attention_mask: Optional[mindspore.Tensor] = None,
--+-+        position_ids: Optional[mindspore.Tensor] = None,
--+-+        past_key_value: Optional[Cache] = None,
--+-+        output_attentions: bool = False,
--+-+        use_cache: bool = False,
--+-+        cache_position: Optional[mindspore.Tensor] = None,
--+-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-+
--+-+        bsz, q_len, _ = hidden_states.shape
--+-+
--+-+        # 1. 线性投射 Q, K, V
--+-+        query_states = self.q_proj(hidden_states)
--+-+        key_states = self.k_proj(hidden_states)
--+-+        value_states = self.v_proj(hidden_states)
--+-+
--+-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+-+        # query:   [B, S, H*D] -> [B, N1, S, D]
--+-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+
--+-+        # 3. RoPE 旋转位置编码
--+-+        kv_seq_len = key_states.shape[-2]
--+-+        if past_key_value is not None:
--+-+            if self.layer_idx is None:
--+-+                raise ValueError(
--+-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-+                    "with a layer index."
--+-+                )
--+-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+-+                if cache_position.shape[0] == 1:
--+-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+-+                    kv_seq_len = past_seen_tokens + 1
--+-+                else:
--+-+                    # prefill 阶段：cache_position 是范围，使用其长度
--+-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+-+            else:
--+-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-+
--+-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-+
--+-+        # 4. KV 缓存更新
--+-+        if past_key_value is not None:
--+-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-+            key_states, value_states = past_key_value.update(
--+-+                key_states, value_states, self.layer_idx, cache_kwargs
--+-+            )
--+-+            
--+-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+-+                if cache_position.shape[0] == 1:
--+-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+-+                    kv_seq_len = key_states.shape[-2]
--+-+
--+-+        # 5. [重要] 准备 Attention Mask
--+-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+-+        fa_attention_mask = None
--+-+        if attention_mask is not None:
--+-+            # 截取与当前key长度匹配的部分
--+-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+-+            fa_attention_mask = (mask_slice != 0)
--+-+
--+-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+-+        input_dtype = query_states.dtype
--+-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+-+            query_states = query_states.to(mindspore.float16)
--+-+            key_states = key_states.to(mindspore.float16)
--+-+            value_states = value_states.to(mindspore.float16)
--+-+
--+-+        # 6. [核心] 调用 flash_attention_score 算子
--+-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+-+        attn_output = mindspore.ops.flash_attention_score(
--+-+            query=query_states,
--+-+            key=key_states,
--+-+            value=value_states,
--+-+            head_num=self.num_heads, # 传入Q的头数(N1)
--+-+            attn_mask=fa_attention_mask,
--+-+            keep_prob=1.0 - self.attention_dropout,
--+-+            scalar_value=1.0 / math.sqrt(self.head_dim),
--+-+            input_layout="BNSD",
--+-+            sparse_mode=0 # 使用 defaultMask 模式
--+-+        )
--+-+
--+-+        # 恢复原始数据类型
--+-+        attn_output = attn_output.to(input_dtype)
--+-+
--+-+        # 7. 调整输出形状
--+-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-+        attn_output = self.o_proj(attn_output)
--+-+
--+-+        # FlashAttention 算子不直接返回注意力权重矩阵
--+-+        attn_weights = None
--+-+        if output_attentions:
--+-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+-+
--+-+        return attn_output, attn_weights, past_key_value
--+-+
--+-+    # def forward(
--+-+    #     self,
--+-+    #     hidden_states: mindspore.Tensor,
--+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+-+    #     past_key_value: Optional[Cache] = None,
--+-+    #     output_attentions: bool = False,
--+-+    #     use_cache: bool = False,
--+-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-+
--+-+    #     bsz, q_len, _ = hidden_states.shape
--+-+
--+-+    #     # 1. 线性投射 Q, K, V
--+-+    #     query_states = self.q_proj(hidden_states)
--+-+    #     key_states = self.k_proj(hidden_states)
--+-+    #     value_states = self.v_proj(hidden_states)
--+-+
--+-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+
--+-+    #     # 3. RoPE 旋转位置编码
--+-+    #     kv_seq_len = key_states.shape[-2]
--+-+    #     if past_key_value is not None:
--+-+    #         if self.layer_idx is None:
--+-+    #             raise ValueError(
--+-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-+    #                 "with a layer index."
--+-+    #             )
--+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-+
--+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-+
--+-+    #     # 4. KV 缓存更新
--+-+    #     if past_key_value is not None:
--+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-+    #         key_states, value_states = past_key_value.update(
--+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+-+    #         )
--+-+
--+-+    #     # 5. 准备 Attention Mask
--+-+    #     fa_attention_mask = None
--+-+    #     if attention_mask is not None:
--+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-+    #         fa_attention_mask = (mask_slice != 0)
--+-+
--+-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+-+    #     input_dtype = query_states.dtype
--+-+
--+-+    #     # 6. [核心] 调用 flash_attention_score 算子
--+-+    #     attn_output = mindspore.ops.flash_attention_score(
--+-+    #         query=query_states,
--+-+    #         key=key_states,
--+-+    #         value=value_states,
--+-+    #         head_num=self.num_heads,
--+-+    #         attn_mask=fa_attention_mask,
--+-+    #         keep_prob=1.0 - self.attention_dropout,
--+-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+-+    #         input_layout="BNSD",
--+-+    #         sparse_mode=0,
--+-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+-+    #         inner_precise=1
--+-+    #     )
--+-+
--+-+    #     # 恢复原始数据类型
--+-+    #     attn_output = attn_output.to(input_dtype)
--+-+
--+-+    #     # 7. 调整输出形状
--+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-+    #     attn_output = self.o_proj(attn_output)
--+-+
--+-+    #     attn_weights = None
--+-+    #     if output_attentions:
--+-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+-+
--+-+    #     return attn_output, attn_weights, past_key_value
--+-+
--+-+    # def forward(
--+-+    #     self,
--+-+    #     hidden_states: mindspore.Tensor,
--+-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+-+    #     past_key_value: Optional[Cache] = None,
--+-+    #     output_attentions: bool = False,
--+-+    #     use_cache: bool = False,
--+-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-+
--+-+    #     bsz, q_len, _ = hidden_states.shape
--+-+
--+-+    #     query_states = self.q_proj(hidden_states)
--+-+    #     key_states = self.k_proj(hidden_states)
--+-+    #     value_states = self.v_proj(hidden_states)
--+-+
--+-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-+
--+-+    #     kv_seq_len = key_states.shape[-2]
--+-+    #     if past_key_value is not None:
--+-+    #         if self.layer_idx is None:
--+-+    #             raise ValueError("`layer_idx` must be specified for caching")
--+-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-+
--+-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-+
--+-+    #     if past_key_value is not None:
--+-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+-+    #         key_states, value_states = past_key_value.update(
--+-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+-+    #         )
--+-+
--+-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+-+
--+-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+-+    #     query_states = query_states / math.sqrt(self.head_dim)
--+-+    #     # <--- 修改结束 ---
--+-+
--+-+    #     fa_attention_mask = None
--+-+    #     if attention_mask is not None:
--+-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+-+    #         fa_attention_mask = (mask_slice != 0)
--+-+
--+-+    #     input_dtype = query_states.dtype
--+-+
--+-+    #     attn_output = mindspore.ops.flash_attention_score(
--+-+    #         query=query_states,    # 传入已经预先缩放过的 query
--+-+    #         key=key_states,
--+-+    #         value=value_states,
--+-+    #         head_num=self.num_heads,
--+-+    #         attn_mask=fa_attention_mask,
--+-+    #         keep_prob=1.0 - self.attention_dropout,
--+-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+-+    #         input_layout="BNSD",
--+-+    #         sparse_mode=0,
--+-+    #         inner_precise=1        # 仍然保持内部高精度计算
--+-+    #     )
--+-+
--+-+    #     attn_output = attn_output.to(input_dtype)
--+-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-+    #     attn_output = self.o_proj(attn_output)
--+-+
--+-+    #     attn_weights = None
--+-+    #     if output_attentions:
--+-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+-+
--+-+    #     return attn_output, attn_weights, past_key_value
--+-+
--+- QWEN2MOE_ATTENTION_CLASSES = {
--+-     "eager": Qwen2MoeAttention,
--+-+    "flash-attention": Qwen2MoeFlashAttention,
--+- }
--+- 
--+- 
--+-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+- 
--+-+    #@dwj
--+-+    # 只遍历激活的专家，而非全部专家
--+-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+--        hidden_states = hidden_states.view(-1, hidden_dim)
--+--        # router_logits: (batch * sequence_length, n_experts)
--+--        router_logits = self.gate(hidden_states)
--+--
--+--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+--        if self.norm_topk_prob:
--+--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+--        # we cast back to the input dtype
--+--        routing_weights = routing_weights.to(hidden_states.dtype)
--+--
--+--        final_hidden_states = ops.zeros(
--+--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+--        )
--+--
--+--        # One hot encode the selected experts to create an expert mask
--+--        # this will be used to easily index which expert is going to be sollicitated
--+--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+--
--+--        # Loop over all available experts in the model and perform the computation on each expert
--+--        for expert_idx in range(self.num_experts):
--+--            expert_layer = self.experts[expert_idx]
--+--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+--
--+--            # Index the correct hidden states and compute the expert hidden state for
--+--            # the current expert. We need to make sure to multiply the output hidden
--+--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+--            if 0 not in idx.shape:
--+--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+--
--+--                # However `index_add_` only support torch tensors for indexing so we'll use
--+--                # the `top_x` tensor here.
--+--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+--
--+--        shared_expert_output = self.shared_expert(hidden_states)
--+--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+--
--+--        final_hidden_states = final_hidden_states + shared_expert_output
--+-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+-+            num_tokens = hidden_states_reshaped.shape[0]
--+-+
--+-+            router_logits = self.gate(hidden_states_reshaped)
--+-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+-+
--+-+            if self.norm_topk_prob:
--+-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+-+            routing_weights = routing_weights.to(hidden_states.dtype)
--+-+            
--+-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+-+            flat_selected_experts = selected_experts.flatten()
--+-+            
--+-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+-+            token_indices = broadcasted_token_indices.flatten()
--+-+            
--+-+            active_experts = ops.unique(flat_selected_experts)
--+-+            
--+-+            for expert_idx_tensor in active_experts:
--+-+                expert_idx = expert_idx_tensor.item()
--+-+                expert_layer = self.experts[expert_idx]
--+-+                
--+-+                mask = (flat_selected_experts == expert_idx_tensor)
--+-+                selected_token_indices = token_indices[mask]
--+-+                selected_routing_weights = routing_weights.flatten()[mask]
--+-+                
--+-+                current_states = hidden_states_reshaped[selected_token_indices]
--+-+                
--+-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+-+                
--+-+                final_hidden_states = final_hidden_states.index_add(
--+-+                    dim=0,
--+-+                    index=selected_token_indices,
--+-+                    source=expert_output.to(hidden_states.dtype)
--+-+                )
--+-+            
--+-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+- 
--+--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+--        return final_hidden_states, router_logits
--+-+            final_hidden_states = final_hidden_states + shared_expert_output
--+-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+-+            
--+-+            return final_hidden_states, router_logits
--+- 
--+- 
--+- class Qwen2MoeDecoderLayer(nn.Module):
--+-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+- 
--+-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+- 
--+-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+-+
--+-         if (layer_idx not in config.mlp_only_layers) and (
--+-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+-         ):
--+-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+-     _skip_keys_device_placement = "past_key_values"
--+-     _supports_cache_class = True
--+-+#lwx
--+-+    # _supports_static_cache = True
--+- 
--+-     def _init_weights(self, module):
--+-         std = self.config.initializer_range
--+-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+-         return causal_mask
--+- 
--+- 
--+--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+-     _tied_weights_keys = ["lm_head.weight"]
--+- 
--+-     def __init__(self, config):
--+-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+-         self.num_experts_per_tok = config.num_experts_per_tok
--+-         # Initialize weights and apply final processing
--+-         self.post_init()
--+-+        # @lwx
--+-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+-+        #     self.generation_config.cache_implementation = "static"
--+-+        self._warmed_up = False
--+-+
--+-+    def warmup_moe_model(self):
--+-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+-+        test_texts = [
--+-+            "warmup short",
--+-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+-+        ]
--+-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+-+        if tokenizer is None:
--+-+            from mindnlp.transformers import AutoTokenizer
--+-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+-+            self._warmup_tokenizer = tokenizer
--+-+
--+-+        for text in test_texts:
--+-+            inputs = tokenizer(text, return_tensors="ms")
--+-+            with mindspore._no_grad():
--+-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+- 
--+-     def get_input_embeddings(self):
--+-         return self.model.embed_tokens
--+-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+-         ```"""
--+-+        if not self._warmed_up:
--+-+            self._warmed_up = True
--+-+            self.warmup_moe_model()
--+- 
--+-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+-         output_router_logits = (
--+-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+-             }
--+-         )
--+-         return model_inputs
--+-+# @lwx
--+-+    # def _decode_one_tokens_logits(
--+-+    #     self,
--+-+    #     cur_token: mindspore.Tensor,
--+-+    #     input_pos: Optional[mindspore.Tensor],
--+-+    #     cache_position: mindspore.Tensor,
--+-+    #     past_key_values: StaticCache,
--+-+    # ) -> mindspore.Tensor:
--+-+    #     """
--+-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+-+        
--+-+    #     Args:
--+-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+-+    #         input_pos: 输入位置信息，可选
--+-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+-+            
--+-+    #     Returns:
--+-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+-+    #     """
--+-+    #     # 调用JIT编译的版本
--+-+    #     return self.get_decode_one_tokens_logits(
--+-+    #         cur_token=cur_token,
--+-+    #         input_pos=input_pos,
--+-+    #         cache_position=cache_position,
--+-+    #         past_key_values=past_key_values,
--+-+    #     )
--+-+    
--+-+    # @mindspore.jit(jit_level='O1')
--+-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+-+    #     """
--+-+    #     JIT编译的函数，用于高效的单token解码
--+-+    #     使用JIT编译优化以支持静态shape和高效执行
--+-+        
--+-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+-+    #     """
--+-+    #     outputs = self.model.forward(
--+-+    #         input_ids=cur_token,
--+-+    #         position_ids=input_pos,
--+-+    #         cache_position=cache_position,
--+-+    #         past_key_values=past_key_values,
--+-+    #         use_cache=True,
--+-+    #         return_dict=False,
--+-+    #     )
--+-+        
--+-+    #     hidden_states = outputs[0]
--+-+    #     logits = self.lm_head.forward(hidden_states)
--+-+    #     logits = logits.float()
--+-+        
--+-+    #     return logits[:, -1, :]
--+-+
--+-+    # def _sample(
--+-+    #     self,
--+-+    #     input_ids: mindspore.Tensor,
--+-+    #     logits_processor,
--+-+    #     stopping_criteria,
--+-+    #     generation_config,
--+-+    #     synced_devices: bool,
--+-+    #     streamer=None,
--+-+    #     logits_warper=None,
--+-+    #     **model_kwargs,
--+-+    # ):
--+-+    #     """
--+-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+-+    #     """
--+-+    #     from ...generation.logits_process import LogitsProcessorList
--+-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+-+    #     from mindnlp.core import nn, ops, no_grad
--+-+    #     import numpy as np
--+-+        
--+-+    #     # 检查是否使用 StaticCache
--+-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+-+    #     # 否则，直接调用父类方法
--+-+    #     past_key_values = model_kwargs.get("past_key_values")
--+-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+-+
--+-+    #     if not isinstance(past_key_values, StaticCache):
--+-+    #         # 不使用 StaticCache，直接调用父类方法
--+-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+-+    #         return super()._sample(
--+-+    #             input_ids=input_ids,
--+-+    #             logits_processor=logits_processor,
--+-+    #             stopping_criteria=stopping_criteria,
--+-+    #             generation_config=generation_config,
--+-+    #             synced_devices=synced_devices,
--+-+    #             streamer=streamer,
--+-+    #             logits_warper=logits_warper,
--+-+    #             **model_kwargs,
--+-+    #         )
--+-+        
--+-+    #     # 使用 StaticCache，进入自定义循环
--+-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+-+    #     pad_token_id = generation_config._pad_token_tensor
--+-+    #     output_attentions = generation_config.output_attentions
--+-+    #     output_hidden_states = generation_config.output_hidden_states
--+-+    #     output_scores = generation_config.output_scores
--+-+    #     output_logits = generation_config.output_logits
--+-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+-+    #     max_length = generation_config.max_length
--+-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+-+    #     do_sample = generation_config.do_sample
--+-+        
--+-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+-+    #         raise ValueError(
--+-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+-+    #             f"{logits_warper})."
--+-+    #         )
--+-+        
--+-+    #     # init attention / hidden states / scores tuples
--+-+    #     scores = () if (return_dict_in_generate and output_scores) else None
--+-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+-+        
--+-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+-+    #         encoder_hidden_states = (
--+-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+-+    #         )
--+-+        
--+-+    #     # keep track of which sequences are already finished
--+-+    #     batch_size, cur_len = input_ids.shape
--+-+    #     this_peer_finished = False
--+-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+-+        
--+-+    #     time_record = []
--+-+    #     from ....utils.testing_utils import parse_flag_from_env
--+-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+-+        
--+-+    #     while self._has_unfinished_sequences(
--+-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+-+    #     ):
--+-+    #         if _record_time:
--+-+    #             import time as time_module
--+-+    #             infer_start = time_module.time()
--+-+            
--+-+    #         # prepare model inputs
--+-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+-+            
--+-+    #         # prepare variable output controls
--+-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+-+            
--+-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+-+    #         cur_cache_position = model_inputs.get("cache_position")
--+-+    #         cur_past_key_values = model_inputs.get("past_key_values")
--+-+    #         cur_input_ids = model_inputs.get("input_ids")
--+-+            
--+-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+-+    #             cur_cache_position is not None and 
--+-+    #             len(cur_cache_position.shape) > 0 and
--+-+    #             cur_cache_position.shape[0] == 1 and
--+-+    #             cur_input_ids is not None and
--+-+    #             cur_input_ids.shape[1] == 1):
--+-+    #             # 使用 JIT 优化的单 token 解码
--+-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+-+    #             if not hasattr(self, '_jit_used'):
--+-+    #                 self._jit_used = False
--+-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+-+                
--+-+    #             next_token_logits = self.get_decode_one_tokens_logits(
--+-+    #                 cur_token=cur_input_ids,
--+-+    #                 input_pos=model_inputs.get("position_ids"),
--+-+    #                 cache_position=cur_cache_position,
--+-+    #                 past_key_values=cur_past_key_values,
--+-+    #             )
--+-+                
--+-+    #             # 标记已使用JIT（用于后续判断）
--+-+    #             if not self._jit_used:
--+-+    #                 self._jit_used = True
--+-+                
--+-+    #             # 构造兼容的输出对象
--+-+    #             class JitOptimizedOutput:
--+-+    #                 def __init__(self, logits, config):
--+-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+-+    #                     self.config = config
--+-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+-+    #                     self.attentions = None if not config.is_encoder_decoder else None
--+-+    #                     self.cross_attentions = None
--+-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+-+                
--+-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+-+    #         else:
--+-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+-+    #             outputs = self(**model_inputs, return_dict=True)
--+-+            
--+-+    #         if synced_devices and this_peer_finished:
--+-+    #             continue
--+-+            
--+-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+-+    #         next_token_logits = outputs.logits[:, -1, :]
--+-+            
--+-+    #         # pre-process distribution
--+-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+-+    #         if do_sample:
--+-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+-+            
--+-+    #         # Store scores, attentions and hidden_states when required
--+-+    #         if return_dict_in_generate:
--+-+    #             if output_scores:
--+-+    #                 scores += (next_token_scores,)
--+-+    #             if output_logits:
--+-+    #                 raw_logits += (next_token_logits,)
--+-+    #             if output_attentions:
--+-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+-+    #                 if self.config.is_encoder_decoder:
--+-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+-+                
--+-+    #             if output_hidden_states:
--+-+    #                 hidden = (
--+-+    #                     outputs.decoder_hidden_states
--+-+    #                     if self.config.is_encoder_decoder
--+-+    #                     else outputs.hidden_states
--+-+    #                 )
--+-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+-+            
--+-+    #         # token selection
--+-+    #         if do_sample:
--+-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+-+    #         else:
--+-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+-+            
--+-+    #         # finished sentences should have their next token be a padding token
--+-+    #         if has_eos_stopping_criteria:
--+-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+-+            
--+-+    #         # update generated ids, model inputs, and length for next step
--+-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+-+    #         if streamer is not None:
--+-+    #             streamer.put(next_tokens)
--+-+            
--+-+    #         model_kwargs = self._update_model_kwargs_for_generation(
--+-+    #             outputs,
--+-+    #             model_kwargs,
--+-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+-+    #         )
--+-+            
--+-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+-+    #         cur_len += 1
--+-+            
--+-+    #         if _record_time:
--+-+    #             import time as time_module
--+-+    #             infer_stop = time_module.time()
--+-+    #             time_record.append(infer_stop - infer_start)
--+-+            
--+-+    #         del outputs
--+-+        
--+-+    #     average_infer_time = None
--+-+    #     if time_record:
--+-+    #         if len(time_record) > 1:
--+-+    #             time_record.pop(0)
--+-+    #         average_infer_time = sum(time_record) / len(time_record)
--+-+    #         print(f'average inference time is: {average_infer_time}')
--+-+    #         print(f'inference time record: {time_record}')
--+-+        
--+-+    #     if streamer is not None:
--+-+    #         streamer.end()
--+-+        
--+-+    #     # 简单判断：打印是否使用了JIT路径
--+-+    #     if hasattr(self, '_jit_used') and self._jit_used:
--+-+    #         print("[JIT] ✓ JIT optimization was used during generation")
--+-+    #     else:
--+-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+-+        
--+-+    #     if return_dict_in_generate:
--+-+    #         if self.config.is_encoder_decoder:
--+-+    #             return GenerateEncoderDecoderOutput(
--+-+    #                 sequences=input_ids,
--+-+    #                 scores=scores,
--+-+    #                 logits=raw_logits,
--+-+    #                 encoder_attentions=encoder_attentions,
--+-+    #                 encoder_hidden_states=encoder_hidden_states,
--+-+    #                 decoder_attentions=decoder_attentions,
--+-+    #                 cross_attentions=cross_attentions,
--+-+    #                 decoder_hidden_states=decoder_hidden_states,
--+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+-+    #                 average_infer_time=average_infer_time
--+-+    #             )
--+-+    #         else:
--+-+    #             return GenerateDecoderOnlyOutput(
--+-+    #                 sequences=input_ids,
--+-+    #                 scores=scores,
--+-+    #                 logits=raw_logits,
--+-+    #                 attentions=decoder_attentions,
--+-+    #                 hidden_states=decoder_hidden_states,
--+-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+-+    #                 average_infer_time=average_infer_time
--+-+    #             )
--+-+    #     else:
--+-+    #         return input_ids
--+-+            
--+-+    # def _prepare_cache_for_generation(
--+-+    #     self,
--+-+    #     generation_config,
--+-+    #     model_kwargs,
--+-+    #     assistant_model,
--+-+    #     batch_size,
--+-+    #     max_cache_length,
--+-+    # ):
--+-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+-+    #         generation_config.cache_implementation = "static"
--+-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+-+        
--+-+    #     if generation_config.cache_implementation == "static":
--+-+    #         base_required_from_max_length = generation_config.max_length + 1
--+-+    #         base_required = max(max_cache_length, base_required_from_max_length)
--+-+    #         min_cache_size = 50
--+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+-+    #         else:
--+-+    #             max_cache_length = max(base_required, min_cache_size)
--+-+            
--+-+    #         original_max_cache_length = max_cache_length
--+-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+-+    #         print(f"  - final max_cache_length: {max_cache_length}")
--+-+            
--+-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+-+    #             if max_cache_length > self.config.max_position_embeddings:
--+-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+-+        
--+-+    #     result = super()._prepare_cache_for_generation(
--+-+    #         generation_config=generation_config,
--+-+    #         model_kwargs=model_kwargs,
--+-+    #         assistant_model=assistant_model,
--+-+    #         batch_size=batch_size,
--+-+    #         max_cache_length=max_cache_length,
--+-+    #     )
--+-+        
--+-+    #     if generation_config.cache_implementation == "static":
--+-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+-+    #         created_cache = model_kwargs.get(cache_name)
--+-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+-+    #             if created_cache.max_cache_len < generation_config.max_length:
--+-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+-+        
--+-+    #     return result
--+-+
--+-+
--+-+
--+- 
--+- 
--+- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+--- 
--+-2.27.0
--+-
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
-diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
-deleted file mode 100644
-index 7217a46b..00000000
---- a/patches/0005-20251107001commit.patch
-+++ /dev/null
-@@ -1,7707 +0,0 @@
--From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Fri, 7 Nov 2025 11:48:18 +0800
--Subject: [PATCH 5/8] 20251107001commit
--
-----
-- .../models/deepseek/modeling_deepseek.py      |   91 +-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
-- .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
-- patches/0001-20251104commit.patch             |    2 +-
-- patches/0002-20251106commit.patch             |    2 +-
-- patches/0003-20261106secondcommit.patch       |    2 +-
-- patches/0004-20251106change.patch             | 7498 +++++++++++++++++
-- 7 files changed, 7577 insertions(+), 30 deletions(-)
-- create mode 100644 patches/0004-20251106change.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index 0546f318..8831e4b7 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
--     #         expert_cache += expert_out * weight
--     #     return expert_cache
-- 
---    @no_grad()
---    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
---        # x 的 shape: (1, hidden_size)
---        # flat_expert_indices 的 shape: (num_experts_per_tok,)
---        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
---
---        # 1. 收集所有需要的专家层
---        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
---        selected_experts = [self.experts[i] for i in flat_expert_indices]
---
---        # 2. 并行计算所有专家的输出
---        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
---        # ops.cat 会将它们堆叠成一个新的 Tensor
---        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
---        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
---
---        # 3. 使用矩阵乘法进行加权求和
---        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
---        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
---        # 最终结果 final_output 的 shape: (1, hidden_size)
---        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+    # @no_grad()
--+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+    #     # x 的 shape: (1, hidden_size)
--+    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+
--+    #     # 1. 收集所有需要的专家层
--+    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--+
--+    #     # 2. 并行计算所有专家的输出
--+    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--+    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+
--+    #     # 3. 使用矩阵乘法进行加权求和
--+    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--+    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--         
---        return final_output
--+    #     return final_output
-- 
-- 
--     # @no_grad()
--@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
--             )
-- 
--         return expert_cache
--+# 放置在 DeepseekMoE 类中
--+    @no_grad()
--+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+        """
--+        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--+
--+        Args:
--+            x (Tensor): 输入张量, shape: (1, hidden_size)
--+            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--+            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--+        """
--+        top_k, _ = flat_expert_weights.shape
--+        hidden_size = x.shape[-1]
--+
--+        # 1. 将所有专家的权重堆叠起来
--+        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--+        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--+        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--+        
--+        # 2. "收集" 所需的专家权重
--+        selected_gate_w = stacked_gate_w[flat_expert_indices]
--+        selected_up_w = stacked_up_w[flat_expert_indices]
--+        selected_down_w = stacked_down_w[flat_expert_indices]
--+        
--+        # 3. 准备输入
--+        x_expanded = x.expand((top_k, 1, hidden_size))
--+        
--+        # 4. 并行计算 gate_proj 和 up_proj
--+        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--+        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--+
--+        # 5. 计算中间状态
--+        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--+        
--+        # 6. 并行计算 down_proj
--+        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--+        #    --- [FIX] ---
--+        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--+        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--+        #    --- [FIX END] ---
--+        
--+        # 7. 根据路由权重进行加权求和
--+        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--+
--+        return weighted_sum
--+
--+
-- 
--         # @no_grad()
--         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index ebd7782e..913a7609 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
-- # Copied from transformers.models.llama.modeling_llama.rotate_half
-- def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
---    x1 = x[..., : x.shape[-1] // 2]
---    x2 = x[..., x.shape[-1] // 2 :]
--+    # x1 = x[..., : x.shape[-1] // 2]
--+    # x2 = x[..., x.shape[-1] // 2 :]
--     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
---    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--index d059dcbe..2b217b64 100644
----- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
-- # Copied from transformers.models.llama.modeling_llama.rotate_half
-- def rotate_half(x):
--     """Rotates half the hidden dims of the input."""
---    x1 = x[..., : x.shape[-1] // 2]
---    x2 = x[..., x.shape[-1] // 2 :]
--+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+    # x1 = x[..., : x.shape[-1] // 2]
--+    # x2 = x[..., x.shape[-1] // 2 :]
--+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--     return ops.cat((-x2, x1), dim=-1)
-- 
-- 
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--index 78f22642..0a0ef2d7 100644
----- a/patches/0001-20251104commit.patch
--+++ b/patches/0001-20251104commit.patch
--@@ -1,7 +1,7 @@
-- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Tue, 4 Nov 2025 09:11:51 +0800
---Subject: [PATCH 1/3] 20251104commit
--+Subject: [PATCH 1/4] 20251104commit
-- 
-- ---
--  mindnlp/transformers/cache_utils.py           |  28 +-
--diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--index 22b65dd5..5185270c 100644
----- a/patches/0002-20251106commit.patch
--+++ b/patches/0002-20251106commit.patch
--@@ -1,7 +1,7 @@
-- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 09:20:38 +0800
---Subject: [PATCH 2/3] 20251106commit
--+Subject: [PATCH 2/4] 20251106commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--index 966529e4..3e05f821 100644
----- a/patches/0003-20261106secondcommit.patch
--+++ b/patches/0003-20261106secondcommit.patch
--@@ -1,7 +1,7 @@
-- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 14:54:37 +0800
---Subject: [PATCH 3/3] 20261106secondcommit
--+Subject: [PATCH 3/4] 20261106secondcommit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--new file mode 100644
--index 00000000..88a1aef4
----- /dev/null
--+++ b/patches/0004-20251106change.patch
--@@ -0,0 +1,7498 @@
--+From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Thu, 6 Nov 2025 15:48:09 +0800
--+Subject: [PATCH 4/4] 20251106change
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |  189 +-
--+ patches/0001-20251104commit.patch             | 1272 +++++++
--+ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
--+ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
--+ 4 files changed, 7244 insertions(+), 186 deletions(-)
--+ create mode 100644 patches/0001-20251104commit.patch
--+ create mode 100644 patches/0002-20251106commit.patch
--+ create mode 100644 patches/0003-20261106secondcommit.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index 2f9192bf..0546f318 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
--+ 
--+         return attn_output, attn_weights, past_key_value
--+ 
--+-# class DeepseekFlashAttention(nn.Module):
--+-#     """
--+-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--+-
--+-#     This class is designed as a drop-in replacement for DeepseekAttention.
--+-#     """
--+-
--+-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+-#         super().__init__()
--+-#         self.config = config
--+-#         self.layer_idx = layer_idx
--+-#         if layer_idx is None:
--+-#             logger.warning(
--+-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+-#                 "when creating this class."
--+-#             )
--+-
--+-#         self.attention_dropout = config.attention_dropout
--+-#         self.hidden_size = config.hidden_size
--+-#         self.num_heads = config.num_attention_heads
--+-#         self.head_dim = self.hidden_size // self.num_heads
--+-#         self.num_key_value_heads = config.num_key_value_heads
--+-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+-#         self.max_position_embeddings = config.max_position_embeddings
--+-#         self.rope_theta = config.rope_theta
--+-#         self.is_causal = True
--+-
--+-#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+-#             raise ValueError(
--+-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+-#                 f" and `num_heads`: {self.num_heads})."
--+-#             )
--+-
--+-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+-#         self._init_rope()
--+-
--+-#     def _init_rope(self):
--+-#         if self.config.rope_scaling is None:
--+-#             self.rotary_emb = DeepseekRotaryEmbedding(
--+-#                 self.head_dim,
--+-#                 max_position_embeddings=self.max_position_embeddings,
--+-#                 base=self.rope_theta,
--+-#             )
--+-#         else:
--+-#             scaling_type = self.config.rope_scaling["type"]
--+-#             scaling_factor = self.config.rope_scaling["factor"]
--+-#             if scaling_type == "linear":
--+-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+-#                     self.head_dim,
--+-#                     max_position_embeddings=self.max_position_embeddings,
--+-#                     scaling_factor=scaling_factor,
--+-#                     base=self.rope_theta,
--+-#                 )
--+-#             elif scaling_type == "dynamic":
--+-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+-#                     self.head_dim,
--+-#                     max_position_embeddings=self.max_position_embeddings,
--+-#                     scaling_factor=scaling_factor,
--+-#                     base=self.rope_theta,
--+-#                 )
--+-#             else:
--+-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+-
--+-#     def forward(
--+-#         self,
--+-#         hidden_states: mindspore.Tensor,
--+-#         attention_mask: Optional[mindspore.Tensor] = None,
--+-#         position_ids: Optional[mindspore.Tensor] = None,
--+-#         past_key_value: Optional[Cache] = None,
--+-#         output_attentions: bool = False,
--+-#         use_cache: bool = False,
--+-#         **kwargs,
--+-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+-#         if "padding_mask" in kwargs:
--+-#             warnings.warn(
--+-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+-#             )
--+-        
--+-#         if output_attentions:
--+-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--+-
--+-#         bsz, q_len, _ = hidden_states.shape
--+-
--+-#         if self.config.pretraining_tp > 1:
--+-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+-
--+-#         query_states = self.q_proj(hidden_states)
--+-#         key_states = self.k_proj(hidden_states)
--+-#         value_states = self.v_proj(hidden_states)
--+-
--+-#         # Reshape for multi-head attention
--+-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+-
--+-#         kv_seq_len = key_states.shape[-2]
--+-#         if past_key_value is not None:
--+-#             if self.layer_idx is None:
--+-#                 raise ValueError(
--+-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+-#                     "with a layer index."
--+-#                 )
--+-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+-        
--+-#         # Apply Rotary Positional Embedding
--+-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+-
--+-#         if past_key_value is not None:
--+-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--+-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+-
--+-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--+-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--+-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+-        
--+-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+-        
--+-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+-
--+-#         # Convert attention_mask for flash_attention_score
--+-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--+-#         if attention_mask is not None:
--+-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--+-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+-#                 raise ValueError(
--+-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+-#                 )
--+-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--+-#         else:
--+-#             attn_mask_for_fa = None
--+-        
--+-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+-
--+-#         # Call the fused flash_attention_score operator
--+-#         attn_output = mindspore.ops.flash_attention_score(
--+-#             query=query_states_for_fa,
--+-#             key=key_states_for_fa,
--+-#             value=value_states_for_fa,
--+-#             head_num=self.num_heads, # This is N1, the number of query heads
--+-#             input_layout='BSH',
--+-#             attn_mask=attn_mask_for_fa,
--+-#             keep_prob=keep_prob,
--+-#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+-#             sparse_mode=0 # Default mask mode
--+-#         )
--+-        
--+-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--+-#         attn_output = self.o_proj(attn_output)
--+-        
--+-#         # Flash Attention does not return attention weights
--+-#         attn_weights = None
--+-
--+-#         return attn_output, attn_weights, past_key_value
--+ 
--+ class DeepseekFlashAttention(nn.Module):
--+     """
--+@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
--+         super().__init__()
--+         self.hidden_size = config.hidden_size
--+ 
--+-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--+-            config=config, layer_idx=layer_idx
--+-        )
--++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--++            # config=config, layer_idx=layer_idx
--++        # )
--+ 
--+         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--+             config=config, layer_idx=layer_idx
--+@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
--+         return outputs
--+ 
--+ 
--+-
--+ class DeepseekPreTrainedModel(PreTrainedModel):
--+     config_class = DeepseekConfig
--+     base_model_prefix = "model"
--+@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+         # Initialize weights and apply final processing
--+         self.post_init()
--+         self.warm_up = False
--+-        #@dwj
--+-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--+-            self.num_layers,
--+-            self.num_attention_heads,
--+-            self.head_dim,
--+-            batch_size=1,
--+-            max_length=self.max_length,
--+-            dtype=mindspore.float16
--+-        )
--+-
--+-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--+-        key_cache = []
--+-        value_cache = []
--+-        for _ in range(num_layers):
--+-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+-            key_cache.append(k)
--+-            value_cache.append(v)
--+-        return key_cache, value_cache
--+-
--+ 
--+     def warmup_moe_model_deep(self):
--+         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+new file mode 100644
--+index 00000000..78f22642
--+--- /dev/null
--++++ b/patches/0001-20251104commit.patch
--+@@ -0,0 +1,1272 @@
--++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Tue, 4 Nov 2025 09:11:51 +0800
--++Subject: [PATCH 1/3] 20251104commit
--++
--++---
--++ mindnlp/transformers/cache_utils.py           |  28 +-
--++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++ 3 files changed, 976 insertions(+), 87 deletions(-)
--++
--++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++index cadd2e04..02f8d4be 100644
--++--- a/mindnlp/transformers/cache_utils.py
--+++++ b/mindnlp/transformers/cache_utils.py
--++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++             # k_out[:, :, cache_position] = key_states
--++             # v_out[:, :, cache_position] = value_states
--++-            if ON_ORANGE_PI:
--++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++-            else:
--++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++-
--+++            # if ON_ORANGE_PI:
--+++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++            # else:
--+++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++            # 确保 cache_position 是 1D tensor 并且类型正确
--+++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++            if cache_position.ndim > 1:
--+++                cache_position = cache_position.flatten()
--+++            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++                cache_position = cache_position.int()
--+++            
--+++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++            k_out[:, :, cache_position] = key_states
--+++            v_out[:, :, cache_position] = value_states
--+++            
--++         return k_out, v_out
--++ 
--++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index c695b944..d8303e45 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++-    x1 = x[..., : x.shape[-1] // 2]
--++-    x2 = x[..., x.shape[-1] // 2 :]
--+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++    # x1 = x[..., : x.shape[-1] // 2]
--+++    # x2 = x[..., x.shape[-1] // 2 :]
--+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++         if self.training:
--++             raise NotImplementedError("Training is not supported yet.")
--++         else:
--++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++-        if self.config.n_shared_experts is not None:
--++-            y = y + self.shared_experts(identity)
--++-        return y
--+++            # @lwx
--+++            if orig_shape[1] == 1:
--+++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++                y=y.view(*orig_shape)
--+++                if self.config.n_shared_experts is not None:
--+++                    y = y + self.shared_experts(identity)
--+++                return y
--+++            else:
--+++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++                if self.config.n_shared_experts is not None:
--+++                    y = y + self.shared_experts(identity)
--+++                return y
--+++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++        # if self.config.n_shared_experts is not None:
--+++        #     y = y + self.shared_experts(identity)
--+++        # return y
--+++        
--+++    @no_grad()
--+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++
--+++        expert_cache = ops.zeros_like(x)
--+++        for i in range(self.num_experts_per_tok):
--+++            expert_id = flat_expert_indices[i].item()
--+++            weight = flat_expert_weights[i].item()
--+++            expert = self.experts[expert_id]
--+++            expert_out = expert(x)
--+++            expert_cache += expert_out * weight
--+++        return expert_cache
--++ 
--++     @no_grad()
--++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-        # expert_cache = torch.zeros_like(x)
--++-        # idxs = flat_expert_indices.argsort()
--++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++-        # token_idxs = idxs // self.num_experts_per_tok
--++-        # for i, end_idx in enumerate(tokens_per_expert):
--++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++-        #     if start_idx == end_idx:
--++-        #         continue
--++-        #     expert = self.experts[i]
--++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++-        #     expert_tokens = x[exp_token_idx]
--++-        #     expert_out = expert(expert_tokens)
--++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++-        # return expert_cache
--+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++         expert_cache = ops.zeros_like(x)
--++         idxs = flat_expert_indices.argsort()
--++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++         token_idxs = idxs // self.num_experts_per_tok
--+++
--++         for i, end_idx in enumerate(tokens_per_expert):
--++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++             if start_idx == end_idx:
--++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++             expert_out = expert(expert_tokens)
--++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++
--++         return expert_cache
--+++        
--+++    # @no_grad()
--+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     # expert_cache = torch.zeros_like(x)
--+++    #     # idxs = flat_expert_indices.argsort()
--+++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++    #     # token_idxs = idxs // self.num_experts_per_tok
--+++    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++    #     #     if start_idx == end_idx:
--+++    #     #         continue
--+++    #     #     expert = self.experts[i]
--+++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #     #     expert_tokens = x[exp_token_idx]
--+++    #     #     expert_out = expert(expert_tokens)
--+++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++    #     # return expert_cache
--+++    #     expert_cache = ops.zeros_like(x)
--+++    #     idxs = flat_expert_indices.argsort()
--+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++    #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++    #     for i, end_idx in enumerate(tokens_per_expert):
--+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++    #         if start_idx == end_idx:
--+++    #             continue
--+++    #         expert = self.experts[i]
--+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #         expert_tokens = x[exp_token_idx]
--+++    #         expert_out = expert(expert_tokens)
--+++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++
--+++    #     return expert_cache
--+++    # @no_grad()
--+++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     expert_cache = ops.zeros_like(x)
--+++
--+++    #     # 排序保证顺序一致
--+++    #     idxs = flat_expert_indices.argsort()
--+++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++    #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++    #     # 找出有 token 的专家
--+++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++
--+++    #     for i in active_experts.tolist():
--+++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++    #         end_idx = tokens_per_expert[i]
--+++    #         if start_idx == end_idx:  # 没有 token
--+++    #             continue
--+++
--+++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++    #         expert_tokens = x[exp_token_idx]
--+++    #         expert_out = self.experts[i](expert_tokens)
--+++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++
--+++    #         expert_cache = mindspore.mint.scatter_add(
--+++    #             expert_cache,
--+++    #             0,
--+++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++    #             expert_out
--+++    #         )
--+++
--+++    #     return expert_cache
--+++
--+++
--++ 
--++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++ #     """
--++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++ 
--++         # Initialize weights and apply final processing
--++         self.post_init()
--+++        self.warm_up = False
--+++
--+++    def warmup_moe_model_deep(self):
--+++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++        test_texts = [
--+++            "warmup short",
--+++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++        ]
--+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++        if tokenizer is None:
--+++            from mindnlp.transformers import AutoTokenizer
--+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++            self._warmup_tokenizer = tokenizer
--+++
--+++        for text in test_texts:
--+++            inputs = tokenizer(text, return_tensors="ms")
--+++            with mindspore._no_grad():
--+++                _ = self(**inputs, use_cache=False)
--+++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++ 
--++     def get_input_embeddings(self):
--++         return self.model.embed_tokens
--++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++         ```"""
--+++        if not self.warm_up:
--+++            self.warm_up = True
--+++            self.warmup_moe_model_deep()
--+++
--++         output_attentions = (
--++             output_attentions
--++             if output_attentions is not None
--++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++index 3cbf820e..d4c6b651 100644
--++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++@@ -18,7 +18,6 @@
--++ # See the License for the specific language governing permissions and
--++ # limitations under the License.
--++ """MindSpore Qwen2MoE model."""
--++-
--++ import math
--++ from typing import List, Optional, Tuple, Union
--++ 
--++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++     TokenClassifierOutput,
--++ )
--++ from ...modeling_utils import PreTrainedModel
--+++from ...generation import GenerationMixin
--++ from ....utils import logging
--++ from .configuration_qwen2_moe import Qwen2MoeConfig
--++ 
--++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++         self.variance_epsilon = eps
--++ 
--++     def forward(self, hidden_states):
--+++        # @dwj
--+++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++        # @lwx
--+++        # if not self.training :
--+++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++         input_dtype = hidden_states.dtype
--++         hidden_states = hidden_states.to(mindspore.float32)
--++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++@@ -234,6 +239,8 @@ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++     x1 = x[..., : x.shape[-1] // 2]
--++     x2 = x[..., x.shape[-1] // 2 :]
--+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++         self.config = config
--++         self.hidden_size = config.hidden_size
--++         self.intermediate_size = intermediate_size
--+++        
--++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++         self.act_fn = ACT2FN[config.hidden_act]
--++ 
--++     def forward(self, x):
--++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++-
--++ 
--+++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++        # @lwx 
--+++        # gate_up_output = self.gate_up_proj(x)
--+++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++        # return self.down_proj(swiglu_output)
--+++
--+++    # def forward(self, x):
--+++    #     gate_proj_out = self.gate_proj(x)
--+++    #     up_proj_out = self.up_proj(x)
--+++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++    #     return self.down_proj(swiglu_out)
--+++        
--++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++     """
--++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++         use_cache: bool = False,
--++         cache_position: Optional[mindspore.Tensor] = None,
--++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++        
--+++
--++         bsz, q_len, _ = hidden_states.shape
--++ 
--++         query_states = self.q_proj(hidden_states)
--++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++                     "with a layer index."
--++                 )
--++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++            if isinstance(past_key_value, StaticCache):
--+++                kv_seq_len = key_states.shape[-2]
--+++            else:
--+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++ 
--++         if past_key_value is not None:
--++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++            
--+++            if isinstance(past_key_value, StaticCache):
--+++                kv_seq_len = key_states.shape[-2]
--++ 
--++         # repeat k/v heads if n_kv_heads < n_heads
--++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++-
--+++        
--++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++ 
--++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++-            raise ValueError(
--++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++-                f" {attn_weights.shape}"
--++-            )
--++-
--++-        if attention_mask is not None:  # no matter the length, we just slice it
--++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++        if attention_mask is not None:
--+++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++             attn_weights = attn_weights + causal_mask
--++ 
--++         # upcast attention to fp32
--++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++ 
--++         attn_output = self.o_proj(attn_output)
--++-
--+++        # @lwx
--+++        
--+++        # max_seq_len = self.max_position_embeddings  # 2048
--+++
--+++        # if attention_mask is not None:
--+++        #     # attention_mask: [B, 1, Sq, Sk]
--+++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++
--+++        #     # pad 到 [max_seq_len, max_seq_len]
--+++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++        #     global_attention_mask = padded_mask
--+++        # else:
--+++        #     global_attention_mask = None
--+++
--+++
--+++        # sparse_mode=3
--+++        # attn_output = mindspore.ops.flash_attention_score(
--+++        #     query=query_states,
--+++        #     key=key_states,
--+++        #     value=value_states,
--+++        #     real_shift=None,
--+++        #     padding_mask=None,
--+++
--+++        #     head_num=self.num_heads,
--+++        #     attn_mask=global_attention_mask,
--+++        #     keep_prob=1.0 - self.attention_dropout,
--+++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++        #     input_layout="BNSD",
--+++        #     pre_tokens=2147483647,
--+++        #     next_tokens=2147483647,
--+++        #     inner_precise=0,
--+++        #     drop_mask=None,
--+++        #     prefix=None,
--+++        #     actual_seq_qlen=None,
--+++        #     actual_seq_kvlen=None,
--+++        #     sparse_mode=sparse_mode,
--+++        # )
--++         if not output_attentions:
--++             attn_weights = None
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--++ 
--+++class Qwen2MoeFlashAttention(nn.Module):
--+++    """
--+++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++
--+++    关键改动:
--+++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++       直接传入原始的 key 和 value 张量效率更高。
--+++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++    """
--+++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++        super().__init__()
--+++        self.config = config
--+++        self.layer_idx = layer_idx
--+++        self.hidden_size = config.hidden_size
--+++        self.num_heads = config.num_attention_heads
--+++        self.head_dim = self.hidden_size // self.num_heads
--+++        self.num_key_value_heads = config.num_key_value_heads
--+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++        self.max_position_embeddings = config.max_position_embeddings
--+++        self.rope_theta = config.rope_theta
--+++        self.attention_dropout = config.attention_dropout
--+++
--+++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++            raise ValueError(
--+++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++            )
--+++
--+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++
--+++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++            self.head_dim,
--+++            max_position_embeddings=self.max_position_embeddings,
--+++            base=self.rope_theta,
--+++        )
--+++
--+++    def forward(
--+++        self,
--+++        hidden_states: mindspore.Tensor,
--+++        attention_mask: Optional[mindspore.Tensor] = None,
--+++        position_ids: Optional[mindspore.Tensor] = None,
--+++        past_key_value: Optional[Cache] = None,
--+++        output_attentions: bool = False,
--+++        use_cache: bool = False,
--+++        cache_position: Optional[mindspore.Tensor] = None,
--+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++        bsz, q_len, _ = hidden_states.shape
--+++
--+++        # 1. 线性投射 Q, K, V
--+++        query_states = self.q_proj(hidden_states)
--+++        key_states = self.k_proj(hidden_states)
--+++        value_states = self.v_proj(hidden_states)
--+++
--+++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++        # 3. RoPE 旋转位置编码
--+++        kv_seq_len = key_states.shape[-2]
--+++        if past_key_value is not None:
--+++            if self.layer_idx is None:
--+++                raise ValueError(
--+++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++                    "with a layer index."
--+++                )
--+++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++                if cache_position.shape[0] == 1:
--+++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++                    kv_seq_len = past_seen_tokens + 1
--+++                else:
--+++                    # prefill 阶段：cache_position 是范围，使用其长度
--+++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++            else:
--+++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++        # 4. KV 缓存更新
--+++        if past_key_value is not None:
--+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++            key_states, value_states = past_key_value.update(
--+++                key_states, value_states, self.layer_idx, cache_kwargs
--+++            )
--+++            
--+++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++                if cache_position.shape[0] == 1:
--+++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++                    kv_seq_len = key_states.shape[-2]
--+++
--+++        # 5. [重要] 准备 Attention Mask
--+++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++        fa_attention_mask = None
--+++        if attention_mask is not None:
--+++            # 截取与当前key长度匹配的部分
--+++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++            fa_attention_mask = (mask_slice != 0)
--+++
--+++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++        input_dtype = query_states.dtype
--+++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++            query_states = query_states.to(mindspore.float16)
--+++            key_states = key_states.to(mindspore.float16)
--+++            value_states = value_states.to(mindspore.float16)
--+++
--+++        # 6. [核心] 调用 flash_attention_score 算子
--+++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++        attn_output = mindspore.ops.flash_attention_score(
--+++            query=query_states,
--+++            key=key_states,
--+++            value=value_states,
--+++            head_num=self.num_heads, # 传入Q的头数(N1)
--+++            attn_mask=fa_attention_mask,
--+++            keep_prob=1.0 - self.attention_dropout,
--+++            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++            input_layout="BNSD",
--+++            sparse_mode=0 # 使用 defaultMask 模式
--+++        )
--+++
--+++        # 恢复原始数据类型
--+++        attn_output = attn_output.to(input_dtype)
--+++
--+++        # 7. 调整输出形状
--+++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++        attn_output = self.o_proj(attn_output)
--+++
--+++        # FlashAttention 算子不直接返回注意力权重矩阵
--+++        attn_weights = None
--+++        if output_attentions:
--+++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++        return attn_output, attn_weights, past_key_value
--+++
--+++    # def forward(
--+++    #     self,
--+++    #     hidden_states: mindspore.Tensor,
--+++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++    #     past_key_value: Optional[Cache] = None,
--+++    #     output_attentions: bool = False,
--+++    #     use_cache: bool = False,
--+++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++    #     bsz, q_len, _ = hidden_states.shape
--+++
--+++    #     # 1. 线性投射 Q, K, V
--+++    #     query_states = self.q_proj(hidden_states)
--+++    #     key_states = self.k_proj(hidden_states)
--+++    #     value_states = self.v_proj(hidden_states)
--+++
--+++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++    #     # 3. RoPE 旋转位置编码
--+++    #     kv_seq_len = key_states.shape[-2]
--+++    #     if past_key_value is not None:
--+++    #         if self.layer_idx is None:
--+++    #             raise ValueError(
--+++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++    #                 "with a layer index."
--+++    #             )
--+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++    #     # 4. KV 缓存更新
--+++    #     if past_key_value is not None:
--+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++    #         key_states, value_states = past_key_value.update(
--+++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++    #         )
--+++
--+++    #     # 5. 准备 Attention Mask
--+++    #     fa_attention_mask = None
--+++    #     if attention_mask is not None:
--+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++    #         fa_attention_mask = (mask_slice != 0)
--+++
--+++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++    #     input_dtype = query_states.dtype
--+++
--+++    #     # 6. [核心] 调用 flash_attention_score 算子
--+++    #     attn_output = mindspore.ops.flash_attention_score(
--+++    #         query=query_states,
--+++    #         key=key_states,
--+++    #         value=value_states,
--+++    #         head_num=self.num_heads,
--+++    #         attn_mask=fa_attention_mask,
--+++    #         keep_prob=1.0 - self.attention_dropout,
--+++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++    #         input_layout="BNSD",
--+++    #         sparse_mode=0,
--+++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++    #         inner_precise=1
--+++    #     )
--+++
--+++    #     # 恢复原始数据类型
--+++    #     attn_output = attn_output.to(input_dtype)
--+++
--+++    #     # 7. 调整输出形状
--+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++    #     attn_output = self.o_proj(attn_output)
--+++
--+++    #     attn_weights = None
--+++    #     if output_attentions:
--+++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++    #     return attn_output, attn_weights, past_key_value
--+++
--+++    # def forward(
--+++    #     self,
--+++    #     hidden_states: mindspore.Tensor,
--+++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++    #     past_key_value: Optional[Cache] = None,
--+++    #     output_attentions: bool = False,
--+++    #     use_cache: bool = False,
--+++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++    #     bsz, q_len, _ = hidden_states.shape
--+++
--+++    #     query_states = self.q_proj(hidden_states)
--+++    #     key_states = self.k_proj(hidden_states)
--+++    #     value_states = self.v_proj(hidden_states)
--+++
--+++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++    #     kv_seq_len = key_states.shape[-2]
--+++    #     if past_key_value is not None:
--+++    #         if self.layer_idx is None:
--+++    #             raise ValueError("`layer_idx` must be specified for caching")
--+++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++    #     if past_key_value is not None:
--+++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++    #         key_states, value_states = past_key_value.update(
--+++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++    #         )
--+++
--+++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++
--+++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++    #     query_states = query_states / math.sqrt(self.head_dim)
--+++    #     # <--- 修改结束 ---
--+++
--+++    #     fa_attention_mask = None
--+++    #     if attention_mask is not None:
--+++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++    #         fa_attention_mask = (mask_slice != 0)
--+++
--+++    #     input_dtype = query_states.dtype
--+++
--+++    #     attn_output = mindspore.ops.flash_attention_score(
--+++    #         query=query_states,    # 传入已经预先缩放过的 query
--+++    #         key=key_states,
--+++    #         value=value_states,
--+++    #         head_num=self.num_heads,
--+++    #         attn_mask=fa_attention_mask,
--+++    #         keep_prob=1.0 - self.attention_dropout,
--+++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++    #         input_layout="BNSD",
--+++    #         sparse_mode=0,
--+++    #         inner_precise=1        # 仍然保持内部高精度计算
--+++    #     )
--+++
--+++    #     attn_output = attn_output.to(input_dtype)
--+++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++    #     attn_output = self.o_proj(attn_output)
--+++
--+++    #     attn_weights = None
--+++    #     if output_attentions:
--+++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++
--+++    #     return attn_output, attn_weights, past_key_value
--+++
--++ QWEN2MOE_ATTENTION_CLASSES = {
--++     "eager": Qwen2MoeAttention,
--+++    "flash-attention": Qwen2MoeFlashAttention,
--++ }
--++ 
--++ 
--++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++ 
--+++    #@dwj
--+++    # 只遍历激活的专家，而非全部专家
--++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        hidden_states = hidden_states.view(-1, hidden_dim)
--++-        # router_logits: (batch * sequence_length, n_experts)
--++-        router_logits = self.gate(hidden_states)
--++-
--++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-        if self.norm_topk_prob:
--++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        # we cast back to the input dtype
--++-        routing_weights = routing_weights.to(hidden_states.dtype)
--++-
--++-        final_hidden_states = ops.zeros(
--++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++-        )
--++-
--++-        # One hot encode the selected experts to create an expert mask
--++-        # this will be used to easily index which expert is going to be sollicitated
--++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++-
--++-        # Loop over all available experts in the model and perform the computation on each expert
--++-        for expert_idx in range(self.num_experts):
--++-            expert_layer = self.experts[expert_idx]
--++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++-
--++-            # Index the correct hidden states and compute the expert hidden state for
--++-            # the current expert. We need to make sure to multiply the output hidden
--++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++-            if 0 not in idx.shape:
--++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++-
--++-                # However `index_add_` only support torch tensors for indexing so we'll use
--++-                # the `top_x` tensor here.
--++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++-
--++-        shared_expert_output = self.shared_expert(hidden_states)
--++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++-
--++-        final_hidden_states = final_hidden_states + shared_expert_output
--+++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++            num_tokens = hidden_states_reshaped.shape[0]
--+++
--+++            router_logits = self.gate(hidden_states_reshaped)
--+++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++
--+++            if self.norm_topk_prob:
--+++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++            routing_weights = routing_weights.to(hidden_states.dtype)
--+++            
--+++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++            flat_selected_experts = selected_experts.flatten()
--+++            
--+++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++            token_indices = broadcasted_token_indices.flatten()
--+++            
--+++            active_experts = ops.unique(flat_selected_experts)
--+++            
--+++            for expert_idx_tensor in active_experts:
--+++                expert_idx = expert_idx_tensor.item()
--+++                expert_layer = self.experts[expert_idx]
--+++                
--+++                mask = (flat_selected_experts == expert_idx_tensor)
--+++                selected_token_indices = token_indices[mask]
--+++                selected_routing_weights = routing_weights.flatten()[mask]
--+++                
--+++                current_states = hidden_states_reshaped[selected_token_indices]
--+++                
--+++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++                
--+++                final_hidden_states = final_hidden_states.index_add(
--+++                    dim=0,
--+++                    index=selected_token_indices,
--+++                    source=expert_output.to(hidden_states.dtype)
--+++                )
--+++            
--+++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++ 
--++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++-        return final_hidden_states, router_logits
--+++            final_hidden_states = final_hidden_states + shared_expert_output
--+++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++            
--+++            return final_hidden_states, router_logits
--++ 
--++ 
--++ class Qwen2MoeDecoderLayer(nn.Module):
--++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++ 
--++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++ 
--+++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++
--++         if (layer_idx not in config.mlp_only_layers) and (
--++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++         ):
--++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++     _skip_keys_device_placement = "past_key_values"
--++     _supports_cache_class = True
--+++#lwx
--+++    # _supports_static_cache = True
--++ 
--++     def _init_weights(self, module):
--++         std = self.config.initializer_range
--++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++         return causal_mask
--++ 
--++ 
--++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++     _tied_weights_keys = ["lm_head.weight"]
--++ 
--++     def __init__(self, config):
--++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++         self.num_experts_per_tok = config.num_experts_per_tok
--++         # Initialize weights and apply final processing
--++         self.post_init()
--+++        # @lwx
--+++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++        #     self.generation_config.cache_implementation = "static"
--+++        self._warmed_up = False
--+++
--+++    def warmup_moe_model(self):
--+++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++        test_texts = [
--+++            "warmup short",
--+++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++        ]
--+++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++        if tokenizer is None:
--+++            from mindnlp.transformers import AutoTokenizer
--+++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++            self._warmup_tokenizer = tokenizer
--+++
--+++        for text in test_texts:
--+++            inputs = tokenizer(text, return_tensors="ms")
--+++            with mindspore._no_grad():
--+++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++ 
--++     def get_input_embeddings(self):
--++         return self.model.embed_tokens
--++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++         ```"""
--+++        if not self._warmed_up:
--+++            self._warmed_up = True
--+++            self.warmup_moe_model()
--++ 
--++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++         output_router_logits = (
--++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++             }
--++         )
--++         return model_inputs
--+++# @lwx
--+++    # def _decode_one_tokens_logits(
--+++    #     self,
--+++    #     cur_token: mindspore.Tensor,
--+++    #     input_pos: Optional[mindspore.Tensor],
--+++    #     cache_position: mindspore.Tensor,
--+++    #     past_key_values: StaticCache,
--+++    # ) -> mindspore.Tensor:
--+++    #     """
--+++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++        
--+++    #     Args:
--+++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++    #         input_pos: 输入位置信息，可选
--+++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++            
--+++    #     Returns:
--+++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++    #     """
--+++    #     # 调用JIT编译的版本
--+++    #     return self.get_decode_one_tokens_logits(
--+++    #         cur_token=cur_token,
--+++    #         input_pos=input_pos,
--+++    #         cache_position=cache_position,
--+++    #         past_key_values=past_key_values,
--+++    #     )
--+++    
--+++    # @mindspore.jit(jit_level='O1')
--+++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++    #     """
--+++    #     JIT编译的函数，用于高效的单token解码
--+++    #     使用JIT编译优化以支持静态shape和高效执行
--+++        
--+++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++    #     """
--+++    #     outputs = self.model.forward(
--+++    #         input_ids=cur_token,
--+++    #         position_ids=input_pos,
--+++    #         cache_position=cache_position,
--+++    #         past_key_values=past_key_values,
--+++    #         use_cache=True,
--+++    #         return_dict=False,
--+++    #     )
--+++        
--+++    #     hidden_states = outputs[0]
--+++    #     logits = self.lm_head.forward(hidden_states)
--+++    #     logits = logits.float()
--+++        
--+++    #     return logits[:, -1, :]
--+++
--+++    # def _sample(
--+++    #     self,
--+++    #     input_ids: mindspore.Tensor,
--+++    #     logits_processor,
--+++    #     stopping_criteria,
--+++    #     generation_config,
--+++    #     synced_devices: bool,
--+++    #     streamer=None,
--+++    #     logits_warper=None,
--+++    #     **model_kwargs,
--+++    # ):
--+++    #     """
--+++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++    #     """
--+++    #     from ...generation.logits_process import LogitsProcessorList
--+++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++    #     from mindnlp.core import nn, ops, no_grad
--+++    #     import numpy as np
--+++        
--+++    #     # 检查是否使用 StaticCache
--+++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++    #     # 否则，直接调用父类方法
--+++    #     past_key_values = model_kwargs.get("past_key_values")
--+++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++
--+++    #     if not isinstance(past_key_values, StaticCache):
--+++    #         # 不使用 StaticCache，直接调用父类方法
--+++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++    #         return super()._sample(
--+++    #             input_ids=input_ids,
--+++    #             logits_processor=logits_processor,
--+++    #             stopping_criteria=stopping_criteria,
--+++    #             generation_config=generation_config,
--+++    #             synced_devices=synced_devices,
--+++    #             streamer=streamer,
--+++    #             logits_warper=logits_warper,
--+++    #             **model_kwargs,
--+++    #         )
--+++        
--+++    #     # 使用 StaticCache，进入自定义循环
--+++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++    #     pad_token_id = generation_config._pad_token_tensor
--+++    #     output_attentions = generation_config.output_attentions
--+++    #     output_hidden_states = generation_config.output_hidden_states
--+++    #     output_scores = generation_config.output_scores
--+++    #     output_logits = generation_config.output_logits
--+++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++    #     max_length = generation_config.max_length
--+++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++    #     do_sample = generation_config.do_sample
--+++        
--+++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++    #         raise ValueError(
--+++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++    #             f"{logits_warper})."
--+++    #         )
--+++        
--+++    #     # init attention / hidden states / scores tuples
--+++    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++        
--+++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++    #         encoder_hidden_states = (
--+++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++    #         )
--+++        
--+++    #     # keep track of which sequences are already finished
--+++    #     batch_size, cur_len = input_ids.shape
--+++    #     this_peer_finished = False
--+++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++        
--+++    #     time_record = []
--+++    #     from ....utils.testing_utils import parse_flag_from_env
--+++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++        
--+++    #     while self._has_unfinished_sequences(
--+++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++    #     ):
--+++    #         if _record_time:
--+++    #             import time as time_module
--+++    #             infer_start = time_module.time()
--+++            
--+++    #         # prepare model inputs
--+++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++            
--+++    #         # prepare variable output controls
--+++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++            
--+++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++    #         cur_cache_position = model_inputs.get("cache_position")
--+++    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++    #         cur_input_ids = model_inputs.get("input_ids")
--+++            
--+++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++    #             cur_cache_position is not None and 
--+++    #             len(cur_cache_position.shape) > 0 and
--+++    #             cur_cache_position.shape[0] == 1 and
--+++    #             cur_input_ids is not None and
--+++    #             cur_input_ids.shape[1] == 1):
--+++    #             # 使用 JIT 优化的单 token 解码
--+++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++    #             if not hasattr(self, '_jit_used'):
--+++    #                 self._jit_used = False
--+++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++                
--+++    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++    #                 cur_token=cur_input_ids,
--+++    #                 input_pos=model_inputs.get("position_ids"),
--+++    #                 cache_position=cur_cache_position,
--+++    #                 past_key_values=cur_past_key_values,
--+++    #             )
--+++                
--+++    #             # 标记已使用JIT（用于后续判断）
--+++    #             if not self._jit_used:
--+++    #                 self._jit_used = True
--+++                
--+++    #             # 构造兼容的输出对象
--+++    #             class JitOptimizedOutput:
--+++    #                 def __init__(self, logits, config):
--+++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++    #                     self.config = config
--+++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++    #                     self.cross_attentions = None
--+++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++                
--+++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++    #         else:
--+++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++    #             outputs = self(**model_inputs, return_dict=True)
--+++            
--+++    #         if synced_devices and this_peer_finished:
--+++    #             continue
--+++            
--+++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++    #         next_token_logits = outputs.logits[:, -1, :]
--+++            
--+++    #         # pre-process distribution
--+++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++    #         if do_sample:
--+++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++            
--+++    #         # Store scores, attentions and hidden_states when required
--+++    #         if return_dict_in_generate:
--+++    #             if output_scores:
--+++    #                 scores += (next_token_scores,)
--+++    #             if output_logits:
--+++    #                 raw_logits += (next_token_logits,)
--+++    #             if output_attentions:
--+++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++    #                 if self.config.is_encoder_decoder:
--+++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++                
--+++    #             if output_hidden_states:
--+++    #                 hidden = (
--+++    #                     outputs.decoder_hidden_states
--+++    #                     if self.config.is_encoder_decoder
--+++    #                     else outputs.hidden_states
--+++    #                 )
--+++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++            
--+++    #         # token selection
--+++    #         if do_sample:
--+++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++    #         else:
--+++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++            
--+++    #         # finished sentences should have their next token be a padding token
--+++    #         if has_eos_stopping_criteria:
--+++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++            
--+++    #         # update generated ids, model inputs, and length for next step
--+++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++    #         if streamer is not None:
--+++    #             streamer.put(next_tokens)
--+++            
--+++    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++    #             outputs,
--+++    #             model_kwargs,
--+++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++    #         )
--+++            
--+++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++    #         cur_len += 1
--+++            
--+++    #         if _record_time:
--+++    #             import time as time_module
--+++    #             infer_stop = time_module.time()
--+++    #             time_record.append(infer_stop - infer_start)
--+++            
--+++    #         del outputs
--+++        
--+++    #     average_infer_time = None
--+++    #     if time_record:
--+++    #         if len(time_record) > 1:
--+++    #             time_record.pop(0)
--+++    #         average_infer_time = sum(time_record) / len(time_record)
--+++    #         print(f'average inference time is: {average_infer_time}')
--+++    #         print(f'inference time record: {time_record}')
--+++        
--+++    #     if streamer is not None:
--+++    #         streamer.end()
--+++        
--+++    #     # 简单判断：打印是否使用了JIT路径
--+++    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++    #     else:
--+++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++        
--+++    #     if return_dict_in_generate:
--+++    #         if self.config.is_encoder_decoder:
--+++    #             return GenerateEncoderDecoderOutput(
--+++    #                 sequences=input_ids,
--+++    #                 scores=scores,
--+++    #                 logits=raw_logits,
--+++    #                 encoder_attentions=encoder_attentions,
--+++    #                 encoder_hidden_states=encoder_hidden_states,
--+++    #                 decoder_attentions=decoder_attentions,
--+++    #                 cross_attentions=cross_attentions,
--+++    #                 decoder_hidden_states=decoder_hidden_states,
--+++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++    #                 average_infer_time=average_infer_time
--+++    #             )
--+++    #         else:
--+++    #             return GenerateDecoderOnlyOutput(
--+++    #                 sequences=input_ids,
--+++    #                 scores=scores,
--+++    #                 logits=raw_logits,
--+++    #                 attentions=decoder_attentions,
--+++    #                 hidden_states=decoder_hidden_states,
--+++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++    #                 average_infer_time=average_infer_time
--+++    #             )
--+++    #     else:
--+++    #         return input_ids
--+++            
--+++    # def _prepare_cache_for_generation(
--+++    #     self,
--+++    #     generation_config,
--+++    #     model_kwargs,
--+++    #     assistant_model,
--+++    #     batch_size,
--+++    #     max_cache_length,
--+++    # ):
--+++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++    #         generation_config.cache_implementation = "static"
--+++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++        
--+++    #     if generation_config.cache_implementation == "static":
--+++    #         base_required_from_max_length = generation_config.max_length + 1
--+++    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++    #         min_cache_size = 50
--+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++    #         else:
--+++    #             max_cache_length = max(base_required, min_cache_size)
--+++            
--+++    #         original_max_cache_length = max_cache_length
--+++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++            
--+++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++    #             if max_cache_length > self.config.max_position_embeddings:
--+++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++        
--+++    #     result = super()._prepare_cache_for_generation(
--+++    #         generation_config=generation_config,
--+++    #         model_kwargs=model_kwargs,
--+++    #         assistant_model=assistant_model,
--+++    #         batch_size=batch_size,
--+++    #         max_cache_length=max_cache_length,
--+++    #     )
--+++        
--+++    #     if generation_config.cache_implementation == "static":
--+++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++    #         created_cache = model_kwargs.get(cache_name)
--+++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++    #             if created_cache.max_cache_len < generation_config.max_length:
--+++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++        
--+++    #     return result
--+++
--+++
--+++
--++ 
--++ 
--++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++-- 
--++2.27.0
--++
--+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+new file mode 100644
--+index 00000000..22b65dd5
--+--- /dev/null
--++++ b/patches/0002-20251106commit.patch
--+@@ -0,0 +1,3200 @@
--++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Thu, 6 Nov 2025 09:20:38 +0800
--++Subject: [PATCH 2/3] 20251106commit
--++
--++---
--++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
--++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
--++ 3 files changed, 2689 insertions(+), 305 deletions(-)
--++ create mode 100644 patches/0001-20251104commit.patch
--++
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index d8303e45..73773c22 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--++         #     y = y + self.shared_experts(identity)
--++         # return y
--++         
--+++    # @no_grad()
--+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++
--+++    #     expert_cache = ops.zeros_like(x)
--+++    #     for i in range(self.num_experts_per_tok):
--+++    #         expert_id = flat_expert_indices[i].item()
--+++    #         weight = flat_expert_weights[i].item()
--+++    #         expert = self.experts[expert_id]
--+++    #         expert_out = expert(x)
--+++    #         expert_cache += expert_out * weight
--+++    #     return expert_cache
--+++
--++     @no_grad()
--++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++        # x 的 shape: (1, hidden_size)
--+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+++
--+++        # 1. 收集所有需要的专家层
--+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+++
--+++        # 2. 并行计算所有专家的输出
--+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+++        # ops.cat 会将它们堆叠成一个新的 Tensor
--+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+++
--+++        # 3. 使用矩阵乘法进行加权求和
--+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++        # 最终结果 final_output 的 shape: (1, hidden_size)
--+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+++        
--+++        return final_output
--++ 
--++-        expert_cache = ops.zeros_like(x)
--++-        for i in range(self.num_experts_per_tok):
--++-            expert_id = flat_expert_indices[i].item()
--++-            weight = flat_expert_weights[i].item()
--++-            expert = self.experts[expert_id]
--++-            expert_out = expert(x)
--++-            expert_cache += expert_out * weight
--++-        return expert_cache
--++ 
--++     @no_grad()
--++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--++             key_states = self.k_proj(hidden_states)
--++             value_states = self.v_proj(hidden_states)
--++ 
--++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++        # @lwx
--+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--+++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--++ 
--++         kv_seq_len = key_states.shape[-2]
--++         if past_key_value is not None:
--++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--++         return attn_output, attn_weights, past_key_value
--++ 
--++ 
--+++# class DeepseekFlashAttention(nn.Module):
--+++#     """
--+++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--+++
--+++#     This class is designed as a drop-in replacement for DeepseekAttention.
--+++#     """
--+++
--+++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+++#         super().__init__()
--+++#         self.config = config
--+++#         self.layer_idx = layer_idx
--+++#         if layer_idx is None:
--+++#             logger.warning(
--+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++#                 "when creating this class."
--+++#             )
--+++
--+++#         self.attention_dropout = config.attention_dropout
--+++#         self.hidden_size = config.hidden_size
--+++#         self.num_heads = config.num_attention_heads
--+++#         self.head_dim = self.hidden_size // self.num_heads
--+++#         self.num_key_value_heads = config.num_key_value_heads
--+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++#         self.max_position_embeddings = config.max_position_embeddings
--+++#         self.rope_theta = config.rope_theta
--+++#         self.is_causal = True
--+++
--+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++#             raise ValueError(
--+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++#                 f" and `num_heads`: {self.num_heads})."
--+++#             )
--+++
--+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+++#         self._init_rope()
--+++
--+++#     def _init_rope(self):
--+++#         if self.config.rope_scaling is None:
--+++#             self.rotary_emb = DeepseekRotaryEmbedding(
--+++#                 self.head_dim,
--+++#                 max_position_embeddings=self.max_position_embeddings,
--+++#                 base=self.rope_theta,
--+++#             )
--+++#         else:
--+++#             scaling_type = self.config.rope_scaling["type"]
--+++#             scaling_factor = self.config.rope_scaling["factor"]
--+++#             if scaling_type == "linear":
--+++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+++#                     self.head_dim,
--+++#                     max_position_embeddings=self.max_position_embeddings,
--+++#                     scaling_factor=scaling_factor,
--+++#                     base=self.rope_theta,
--+++#                 )
--+++#             elif scaling_type == "dynamic":
--+++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+++#                     self.head_dim,
--+++#                     max_position_embeddings=self.max_position_embeddings,
--+++#                     scaling_factor=scaling_factor,
--+++#                     base=self.rope_theta,
--+++#                 )
--+++#             else:
--+++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+++
--+++#     def forward(
--+++#         self,
--+++#         hidden_states: mindspore.Tensor,
--+++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++#         position_ids: Optional[mindspore.Tensor] = None,
--+++#         past_key_value: Optional[Cache] = None,
--+++#         output_attentions: bool = False,
--+++#         use_cache: bool = False,
--+++#         **kwargs,
--+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++#         if "padding_mask" in kwargs:
--+++#             warnings.warn(
--+++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+++#             )
--+++        
--+++#         if output_attentions:
--+++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--+++
--+++#         bsz, q_len, _ = hidden_states.shape
--+++
--+++#         if self.config.pretraining_tp > 1:
--+++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+++
--+++#         query_states = self.q_proj(hidden_states)
--+++#         key_states = self.k_proj(hidden_states)
--+++#         value_states = self.v_proj(hidden_states)
--+++
--+++#         # Reshape for multi-head attention
--+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++#         kv_seq_len = key_states.shape[-2]
--+++#         if past_key_value is not None:
--+++#             if self.layer_idx is None:
--+++#                 raise ValueError(
--+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++#                     "with a layer index."
--+++#                 )
--+++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++        
--+++#         # Apply Rotary Positional Embedding
--+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++#         if past_key_value is not None:
--+++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++
--+++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--+++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--+++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++        
--+++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++        
--+++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++
--+++#         # Convert attention_mask for flash_attention_score
--+++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--+++#         if attention_mask is not None:
--+++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--+++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+++#                 raise ValueError(
--+++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+++#                 )
--+++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--+++#         else:
--+++#             attn_mask_for_fa = None
--+++        
--+++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+++
--+++#         # Call the fused flash_attention_score operator
--+++#         attn_output = mindspore.ops.flash_attention_score(
--+++#             query=query_states_for_fa,
--+++#             key=key_states_for_fa,
--+++#             value=value_states_for_fa,
--+++#             head_num=self.num_heads, # This is N1, the number of query heads
--+++#             input_layout='BSH',
--+++#             attn_mask=attn_mask_for_fa,
--+++#             keep_prob=keep_prob,
--+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++#             sparse_mode=0 # Default mask mode
--+++#         )
--+++        
--+++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--+++#         attn_output = self.o_proj(attn_output)
--+++        
--+++#         # Flash Attention does not return attention weights
--+++#         attn_weights = None
--+++
--+++#         return attn_output, attn_weights, past_key_value
--+++
--+++class DeepseekFlashAttention(nn.Module):
--+++    """
--+++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--+++    This implementation is a drop-in replacement for the original DeepseekAttention class,
--+++    designed for high performance on supported hardware (Ascend).
--+++
--+++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--+++    """
--+++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+++        super().__init__()
--+++        self.config = config
--+++        self.layer_idx = layer_idx
--+++        if layer_idx is None:
--+++            logger.warning(
--+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++                "when creating this class."
--+++            )
--+++
--+++        # --- [FIX] Correctly initialize all required attributes ---
--+++        self.attention_dropout = config.attention_dropout
--+++        self.hidden_size = config.hidden_size
--+++        self.num_heads = config.num_attention_heads
--+++        self.head_dim = self.hidden_size // self.num_heads
--+++        self.num_key_value_heads = config.num_key_value_heads
--+++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++        self.max_position_embeddings = config.max_position_embeddings
--+++        self.rope_theta = config.rope_theta
--+++        self.is_causal = True
--+++
--+++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++            raise ValueError(
--+++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++                f" and `num_heads`: {self.num_heads})."
--+++            )
--+++
--+++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+++        
--+++        # This call will now succeed as all attributes are initialized.
--+++        self._init_rope()
--+++
--+++    def _init_rope(self):
--+++        if self.config.rope_scaling is None:
--+++            self.rotary_emb = DeepseekRotaryEmbedding(
--+++                self.head_dim,
--+++                max_position_embeddings=self.max_position_embeddings,
--+++                base=self.rope_theta,
--+++            )
--+++        else:
--+++            scaling_type = self.config.rope_scaling["type"]
--+++            scaling_factor = self.config.rope_scaling["factor"]
--+++            if scaling_type == "linear":
--+++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+++                    self.head_dim,
--+++                    max_position_embeddings=self.max_position_embeddings,
--+++                    scaling_factor=scaling_factor,
--+++                    base=self.rope_theta,
--+++                )
--+++            elif scaling_type == "dynamic":
--+++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+++                    self.head_dim,
--+++                    max_position_embeddings=self.max_position_embeddings,
--+++                    scaling_factor=scaling_factor,
--+++                    base=self.rope_theta,
--+++                )
--+++            else:
--+++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+++
--+++    def forward(
--+++        self,
--+++        hidden_states: mindspore.Tensor,
--+++        attention_mask: Optional[mindspore.Tensor] = None,
--+++        position_ids: Optional[mindspore.Tensor] = None,
--+++        past_key_value: Optional[Cache] = None,
--+++        output_attentions: bool = False,
--+++        use_cache: bool = False,
--+++        **kwargs,
--+++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++        if "padding_mask" in kwargs:
--+++            warnings.warn(
--+++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+++            )
--+++        if output_attentions:
--+++            warnings.warn(
--+++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--+++            )
--+++
--+++        bsz, q_len, _ = hidden_states.shape
--+++
--+++        if self.config.pretraining_tp > 1:
--+++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+++
--+++        query_states = self.q_proj(hidden_states)
--+++        key_states = self.k_proj(hidden_states)
--+++        value_states = self.v_proj(hidden_states)
--+++
--+++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++        kv_seq_len = key_states.shape[-2]
--+++        if past_key_value is not None:
--+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++        
--+++        # Apply Rotary Position Embedding
--+++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++        if past_key_value is not None:
--+++            cache_kwargs = {"sin": sin, "cos": cos}
--+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++
--+++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--+++        # So we must explicitly repeat the KV heads.
--+++        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++
--+++        # Convert attention mask for flash_attention_score
--+++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--+++        if attention_mask is not None:
--+++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+++                 raise ValueError(
--+++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+++                )
--+++            attn_mask_for_fa = attention_mask < 0
--+++        else:
--+++            attn_mask_for_fa = None
--+++
--+++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+++
--+++        # Call the fused operator using the efficient BNSD layout
--+++        attn_output = mindspore.ops.flash_attention_score(
--+++            query=query_states,
--+++            key=key_states,
--+++            value=value_states,
--+++            head_num=self.num_heads,
--+++            input_layout='BNSD', # Specify the correct layout
--+++            attn_mask=attn_mask_for_fa,
--+++            keep_prob=keep_prob,
--+++            scalar_value=1.0 / math.sqrt(self.head_dim)
--+++        )
--+++        
--+++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--+++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++        
--+++        # Apply output projection
--+++        attn_output = self.o_proj(attn_output)
--+++
--+++        # Flash attention does not return attention weights, so we return None.
--+++        attn_weights = None
--+++
--+++        return attn_output, attn_weights, past_key_value
--+++
--++ Deepseek_ATTENTION_CLASSES = {
--++     "eager": DeepseekAttention,
--+++    "flash-attention": DeepseekFlashAttention,
--++ }
--++ 
--++ 
--++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--++             config=config, layer_idx=layer_idx
--++         )
--++ 
--+++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--+++            config=config, layer_idx=layer_idx
--+++        )
--+++
--++         self.mlp = (
--++             DeepseekMoE(config)
--++             if (
--++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++index d4c6b651..bced285c 100644
--++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
--++ 
--++ import mindspore
--++ import mindnlp.core.nn.functional as F
--++-from mindnlp.core import nn, ops
--+++from mindnlp.core import nn, ops, no_grad
--++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
--++ 
--++ from ....common.activations import ACT2FN
--++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
--++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--++ 
--+++Long_Prompt = False
--+++PROMPT_LENGTH_THRESHOLD = 128
--++ 
--++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--++ def _prepare_4d_causal_attention_mask_with_cache_position(
--++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--++         return attn_output, attn_weights, past_key_value
--++ 
--++ 
--+++# class Qwen2MoeFlashAttention(nn.Module):
--+++#     """
--+++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++
--+++#     关键改动:
--+++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++#        直接传入原始的 key 和 value 张量效率更高。
--+++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++#         super().__init__()
--+++#         self.config = config
--+++#         self.layer_idx = layer_idx
--+++#         self.hidden_size = config.hidden_size
--+++#         self.num_heads = config.num_attention_heads
--+++#         self.head_dim = self.hidden_size // self.num_heads
--+++#         self.num_key_value_heads = config.num_key_value_heads
--+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++#         self.max_position_embeddings = config.max_position_embeddings
--+++#         self.rope_theta = config.rope_theta
--+++#         self.attention_dropout = config.attention_dropout
--+++
--+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++#             raise ValueError(
--+++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++#             )
--+++
--+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++
--+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++#             self.head_dim,
--+++#             max_position_embeddings=self.max_position_embeddings,
--+++#             base=self.rope_theta,
--+++#         )
--+++
--+++#     def forward(
--+++#         self,
--+++#         hidden_states: mindspore.Tensor,
--+++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++#         position_ids: Optional[mindspore.Tensor] = None,
--+++#         past_key_value: Optional[Cache] = None,
--+++#         output_attentions: bool = False,
--+++#         use_cache: bool = False,
--+++#         cache_position: Optional[mindspore.Tensor] = None,
--+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++#         bsz, q_len, _ = hidden_states.shape
--+++
--+++#         # 1. 线性投射 Q, K, V
--+++#         query_states = self.q_proj(hidden_states)
--+++#         key_states = self.k_proj(hidden_states)
--+++#         value_states = self.v_proj(hidden_states)
--+++
--+++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++#         # query:   [B, S, H*D] -> [B, N1, S, D]
--+++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++#         # 3. RoPE 旋转位置编码
--+++#         kv_seq_len = key_states.shape[-2]
--+++#         if past_key_value is not None:
--+++#             if self.layer_idx is None:
--+++#                 raise ValueError(
--+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++#                     "with a layer index."
--+++#                 )
--+++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++#                 if cache_position.shape[0] == 1:
--+++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++#                     kv_seq_len = past_seen_tokens + 1
--+++#                 else:
--+++#                     # prefill 阶段：cache_position 是范围，使用其长度
--+++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++#             else:
--+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++#         # 4. KV 缓存更新
--+++#         if past_key_value is not None:
--+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++#             key_states, value_states = past_key_value.update(
--+++#                 key_states, value_states, self.layer_idx, cache_kwargs
--+++#             )
--+++            
--+++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++#                 if cache_position.shape[0] == 1:
--+++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++#                     kv_seq_len = key_states.shape[-2]
--+++
--+++#         # 5. [重要] 准备 Attention Mask
--+++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++#         fa_attention_mask = None
--+++#         if attention_mask is not None:
--+++#             # 截取与当前key长度匹配的部分
--+++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++#             fa_attention_mask = (mask_slice != 0)
--+++
--+++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++#         input_dtype = query_states.dtype
--+++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++#             query_states = query_states.to(mindspore.float16)
--+++#             key_states = key_states.to(mindspore.float16)
--+++#             value_states = value_states.to(mindspore.float16)
--+++
--+++#         # 6. [核心] 调用 flash_attention_score 算子
--+++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++#         attn_output = mindspore.ops.flash_attention_score(
--+++#             query=query_states,
--+++#             key=key_states,
--+++#             value=value_states,
--+++#             head_num=self.num_heads, # 传入Q的头数(N1)
--+++#             attn_mask=fa_attention_mask,
--+++#             keep_prob=1.0 - self.attention_dropout,
--+++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++#             input_layout="BNSD",
--+++#             sparse_mode=0 # 使用 defaultMask 模式
--+++#         )
--+++
--+++#         # 恢复原始数据类型
--+++#         attn_output = attn_output.to(input_dtype)
--+++
--+++#         # 7. 调整输出形状
--+++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++#         attn_output = self.o_proj(attn_output)
--+++
--+++#         # FlashAttention 算子不直接返回注意力权重矩阵
--+++#         attn_weights = None
--+++#         if output_attentions:
--+++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++#         return attn_output, attn_weights, past_key_value
--+++
--+++#     # def forward(
--+++#     #     self,
--+++#     #     hidden_states: mindspore.Tensor,
--+++#     #     attention_mask: Optional[mindspore.Tensor] = None,
--+++#     #     position_ids: Optional[mindspore.Tensor] = None,
--+++#     #     past_key_value: Optional[Cache] = None,
--+++#     #     output_attentions: bool = False,
--+++#     #     use_cache: bool = False,
--+++#     #     cache_position: Optional[mindspore.Tensor] = None,
--+++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++#     #     bsz, q_len, _ = hidden_states.shape
--+++
--+++#     #     # 1. 线性投射 Q, K, V
--+++#     #     query_states = self.q_proj(hidden_states)
--+++#     #     key_states = self.k_proj(hidden_states)
--+++#     #     value_states = self.v_proj(hidden_states)
--+++
--+++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++
--+++#     #     # 3. RoPE 旋转位置编码
--+++#     #     kv_seq_len = key_states.shape[-2]
--+++#     #     if past_key_value is not None:
--+++#     #         if self.layer_idx is None:
--+++#     #             raise ValueError(
--+++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++#     #                 "with a layer index."
--+++#     #             )
--+++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++
--+++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++#     #     # 4. KV 缓存更新
--+++#     #     if past_key_value is not None:
--+++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++#     #         key_states, value_states = past_key_value.update(
--+++#     #             key_states, value_states, self.layer_idx, cache_kwargs
--+++#     #         )
--+++
--+++#     #     # 5. 准备 Attention Mask
--+++#     #     fa_attention_mask = None
--+++#     #     if attention_mask is not None:
--+++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++#     #         fa_attention_mask = (mask_slice != 0)
--+++
--+++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++#     #     input_dtype = query_states.dtype
--+++
--+++#     #     # 6. [核心] 调用 flash_attention_score 算子
--+++#     #     attn_output = mindspore.ops.flash_attention_score(
--+++#     #         query=query_states,
--+++#     #         key=key_states,
--+++#     #         value=value_states,
--+++#     #         head_num=self.num_heads,
--+++#     #         attn_mask=fa_attention_mask,
--+++#     #         keep_prob=1.0 - self.attention_dropout,
--+++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++#     #         input_layout="BNSD",
--+++#     #         sparse_mode=0,
--+++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++#     #         inner_precise=1
--+++#     #     )
--+++
--+++#     #     # 恢复原始数据类型
--+++#     #     attn_output = attn_output.to(input_dtype)
--+++
--+++#     #     # 7. 调整输出形状
--+++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++#     #     attn_output = self.o_proj(attn_output)
--+++
--+++#     #     attn_weights = None
--+++#     #     if output_attentions:
--+++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++
--+++#     #     return attn_output, attn_weights, past_key_value
--+++
--+++
--++ class Qwen2MoeFlashAttention(nn.Module):
--++     """
--++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++-
--++-    关键改动:
--++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++-       直接传入原始的 key 和 value 张量效率更高。
--++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--+++
--+++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--+++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--+++    完全使用模型的低精度数据类型（如 float16）进行计算，
--+++    以达到理论上的最高执行速度。
--++     """
--++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++         super().__init__()
--++         self.config = config
--++         self.layer_idx = layer_idx
--+++        if layer_idx is None:
--+++            logger.warning_once(
--+++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--+++            )
--+++
--++         self.hidden_size = config.hidden_size
--++         self.num_heads = config.num_attention_heads
--++         self.head_dim = self.hidden_size // self.num_heads
--++         self.num_key_value_heads = config.num_key_value_heads
--++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++         self.max_position_embeddings = config.max_position_embeddings
--++         self.rope_theta = config.rope_theta
--++         self.attention_dropout = config.attention_dropout
--++ 
--++-        if (self.head_dim * self.num_heads) != self.hidden_size:
--++-            raise ValueError(
--++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++-            )
--++-
--++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--++         key_states = self.k_proj(hidden_states)
--++         value_states = self.v_proj(hidden_states)
--++ 
--++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++-        # query:   [B, S, H*D] -> [B, N1, S, D]
--++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++        # 2. 调整形状以匹配 BNSD 布局
--++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-
--++-        # 3. RoPE 旋转位置编码
--+++        
--+++        # 3. RoPE 和 KV 缓存
--++         kv_seq_len = key_states.shape[-2]
--++         if past_key_value is not None:
--++-            if self.layer_idx is None:
--++-                raise ValueError(
--++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-                    "with a layer index."
--++-                )
--++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++-                if cache_position.shape[0] == 1:
--++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++-                    kv_seq_len = past_seen_tokens + 1
--++-                else:
--++-                    # prefill 阶段：cache_position 是范围，使用其长度
--++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++-            else:
--++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-
--+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++        
--++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++ 
--++-        # 4. KV 缓存更新
--++         if past_key_value is not None:
--++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-            key_states, value_states = past_key_value.update(
--++-                key_states, value_states, self.layer_idx, cache_kwargs
--++-            )
--++-            
--++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++-                if cache_position.shape[0] == 1:
--++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++-                    kv_seq_len = key_states.shape[-2]
--++-
--++-        # 5. [重要] 准备 Attention Mask
--++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++
--+++        # 4. 准备 Attention Mask
--++         fa_attention_mask = None
--++         if attention_mask is not None:
--++-            # 截取与当前key长度匹配的部分
--++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++             fa_attention_mask = (mask_slice != 0)
--++ 
--++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++-        input_dtype = query_states.dtype
--++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++-            query_states = query_states.to(mindspore.float16)
--++-            key_states = key_states.to(mindspore.float16)
--++-            value_states = value_states.to(mindspore.float16)
--++-
--++-        # 6. [核心] 调用 flash_attention_score 算子
--++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--++         attn_output = mindspore.ops.flash_attention_score(
--++             query=query_states,
--++             key=key_states,
--++             value=value_states,
--++-            head_num=self.num_heads, # 传入Q的头数(N1)
--+++            head_num=self.num_heads,
--++             attn_mask=fa_attention_mask,
--++-            keep_prob=1.0 - self.attention_dropout,
--+++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--++             scalar_value=1.0 / math.sqrt(self.head_dim),
--++             input_layout="BNSD",
--++-            sparse_mode=0 # 使用 defaultMask 模式
--+++            sparse_mode=0,
--+++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--++         )
--++ 
--++-        # 恢复原始数据类型
--++-        attn_output = attn_output.to(input_dtype)
--++-
--++-        # 7. 调整输出形状
--++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++        # 6. 调整输出形状
--++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++         attn_output = self.o_proj(attn_output)
--++ 
--++-        # FlashAttention 算子不直接返回注意力权重矩阵
--+++        # 7. 返回结果
--++         attn_weights = None
--++         if output_attentions:
--++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--++-    # def forward(
--++-    #     self,
--++-    #     hidden_states: mindspore.Tensor,
--++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--++-    #     position_ids: Optional[mindspore.Tensor] = None,
--++-    #     past_key_value: Optional[Cache] = None,
--++-    #     output_attentions: bool = False,
--++-    #     use_cache: bool = False,
--++-    #     cache_position: Optional[mindspore.Tensor] = None,
--++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-
--++-    #     bsz, q_len, _ = hidden_states.shape
--++-
--++-    #     # 1. 线性投射 Q, K, V
--++-    #     query_states = self.q_proj(hidden_states)
--++-    #     key_states = self.k_proj(hidden_states)
--++-    #     value_states = self.v_proj(hidden_states)
--++-
--++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-
--++-    #     # 3. RoPE 旋转位置编码
--++-    #     kv_seq_len = key_states.shape[-2]
--++-    #     if past_key_value is not None:
--++-    #         if self.layer_idx is None:
--++-    #             raise ValueError(
--++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-    #                 "with a layer index."
--++-    #             )
--++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++ 
--++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-
--++-    #     # 4. KV 缓存更新
--++-    #     if past_key_value is not None:
--++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-    #         key_states, value_states = past_key_value.update(
--++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--++-    #         )
--++-
--++-    #     # 5. 准备 Attention Mask
--++-    #     fa_attention_mask = None
--++-    #     if attention_mask is not None:
--++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-    #         fa_attention_mask = (mask_slice != 0)
--++-
--++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++-    #     input_dtype = query_states.dtype
--++-
--++-    #     # 6. [核心] 调用 flash_attention_score 算子
--++-    #     attn_output = mindspore.ops.flash_attention_score(
--++-    #         query=query_states,
--++-    #         key=key_states,
--++-    #         value=value_states,
--++-    #         head_num=self.num_heads,
--++-    #         attn_mask=fa_attention_mask,
--++-    #         keep_prob=1.0 - self.attention_dropout,
--++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++-    #         input_layout="BNSD",
--++-    #         sparse_mode=0,
--++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++-    #         inner_precise=1
--++-    #     )
--++-
--++-    #     # 恢复原始数据类型
--++-    #     attn_output = attn_output.to(input_dtype)
--+++QWEN2MOE_ATTENTION_CLASSES = {
--+++    "eager": Qwen2MoeAttention,
--+++    "flash-attention": Qwen2MoeFlashAttention,
--+++}
--++ 
--++-    #     # 7. 调整输出形状
--++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-    #     attn_output = self.o_proj(attn_output)
--++ 
--++-    #     attn_weights = None
--++-    #     if output_attentions:
--++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     def __init__(self, config):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         # gating
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     #@dwj
--+++#     # 只遍历激活的专家，而非全部专家
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#             num_tokens = hidden_states_reshaped.shape[0]
--+++
--+++#             router_logits = self.gate(hidden_states_reshaped)
--+++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++
--+++#             if self.norm_topk_prob:
--+++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++#             routing_weights = routing_weights.to(hidden_states.dtype)
--+++            
--+++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++#             flat_selected_experts = selected_experts.flatten()
--+++            
--+++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++#             token_indices = broadcasted_token_indices.flatten()
--+++            
--+++#             active_experts = ops.unique(flat_selected_experts)
--+++            
--+++#             for expert_idx_tensor in active_experts:
--+++#                 expert_idx = expert_idx_tensor.item()
--+++#                 expert_layer = self.experts[expert_idx]
--+++                
--+++#                 mask = (flat_selected_experts == expert_idx_tensor)
--+++#                 selected_token_indices = token_indices[mask]
--+++#                 selected_routing_weights = routing_weights.flatten()[mask]
--+++                
--+++#                 current_states = hidden_states_reshaped[selected_token_indices]
--+++                
--+++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++                
--+++#                 final_hidden_states = final_hidden_states.index_add(
--+++#                     dim=0,
--+++#                     index=selected_token_indices,
--+++#                     source=expert_output.to(hidden_states.dtype)
--+++#                 )
--+++            
--+++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++ 
--++-    #     return attn_output, attn_weights, past_key_value
--+++#             final_hidden_states = final_hidden_states + shared_expert_output
--+++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++            
--+++#             return final_hidden_states, router_logits
--+++
--+++
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     """
--+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+++#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         # 门控网络
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         # 专家列表
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++#         # 共享专家
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_decode(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         """
--+++#         【解码路径】针对 sequence_length=1 的极致优化。
--+++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+++#         """
--+++#         batch_size, hidden_dim = hidden_states.shape
--+++        
--+++#         expert_outputs_list = [
--+++#             ops.cat([
--+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++#             ], dim=0) 
--+++#             for i in range(batch_size)
--+++#         ]
--+++        
--+++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+++#         # shape: (batch_size, top_k, hidden_dim)
--+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++        
--+++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++        
--+++#         return moe_output.squeeze(1)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_prefill(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         """
--+++#         【预填充路径】针对 sequence_length > 1 的优化。
--+++#         按专家对 Token 进行分组，并进行批处理。
--+++#         """
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens = hidden_states.shape[0]
--+++#         flat_selected_experts = selected_experts.flatten()
--+++        
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++        
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++        
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++            
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++#             selected_token_indices = token_indices[mask]
--+++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++            
--+++#             current_states = hidden_states[selected_token_indices]
--+++            
--+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++            
--+++#             moe_output = moe_output.index_add(
--+++#                 dim=0,
--+++#                 index=selected_token_indices,
--+++#                 source=expert_output.to(hidden_states.dtype)
--+++#             )
--+++#         return moe_output
--+++
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#         """
--+++#         顶层 forward 方法，作为智能分发器。
--+++#         """
--+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++        
--+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#         router_logits = self.gate(hidden_states_reshaped)
--+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++ 
--++-    # def forward(
--++-    #     self,
--++-    #     hidden_states: mindspore.Tensor,
--++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--++-    #     position_ids: Optional[mindspore.Tensor] = None,
--++-    #     past_key_value: Optional[Cache] = None,
--++-    #     output_attentions: bool = False,
--++-    #     use_cache: bool = False,
--++-    #     cache_position: Optional[mindspore.Tensor] = None,
--++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-
--++-    #     bsz, q_len, _ = hidden_states.shape
--++-
--++-    #     query_states = self.q_proj(hidden_states)
--++-    #     key_states = self.k_proj(hidden_states)
--++-    #     value_states = self.v_proj(hidden_states)
--++-
--++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-
--++-    #     kv_seq_len = key_states.shape[-2]
--++-    #     if past_key_value is not None:
--++-    #         if self.layer_idx is None:
--++-    #             raise ValueError("`layer_idx` must be specified for caching")
--++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-
--++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-
--++-    #     if past_key_value is not None:
--++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-    #         key_states, value_states = past_key_value.update(
--++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--++-    #         )
--+++#         if self.norm_topk_prob:
--+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++        
--+++#         moe_output = None
--+++#         # 在推理时，根据序列长度选择最优路径
--+++#         if not self.training:
--+++#             if sequence_length == 1:
--+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++#             else:
--+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++#         else:
--+++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+++#             raise NotImplementedError("Training path is not implemented.")
--+++
--+++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+++        
--+++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+++        
--+++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++#         return final_hidden_states, router_logits
--+++
--+++
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     """
--+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         # 门控网络
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         # 专家列表
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++#         # 共享专家
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_decode(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         batch_size, _ = hidden_states.shape
--+++#         expert_outputs_list = [
--+++#             ops.cat([
--+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++#             ], dim=0) 
--+++#             for i in range(batch_size)
--+++#         ]
--+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++#         return moe_output.squeeze(1)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_prefill(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens = hidden_states.shape[0]
--+++#         flat_selected_experts = selected_experts.flatten()
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++        
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++#             selected_token_indices = token_indices[mask]
--+++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++#             current_states = hidden_states[selected_token_indices]
--+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++#             moe_output = moe_output.index_add(
--+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++#             )
--+++#         return moe_output
--+++
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#         """
--+++#         顶层 forward 方法，作为智能分发器。
--+++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+++#         """
--+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++        
--+++#         # 1. 门控计算 (通用逻辑)
--+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#         router_logits = self.gate(hidden_states_reshaped)
--+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++
--+++#         if self.norm_topk_prob:
--+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++        
--+++#         # 2. 智能分发到最优 MoE 路径
--+++#         moe_output = None
--+++#         if not self.training:
--+++#             if sequence_length == 1:
--+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++#             else:
--+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++#         else:
--+++#             raise NotImplementedError("Training path is not implemented.")
--+++
--+++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++        
--+++#         # 4. 合并 MoE 输出和共享专家输出
--+++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++        
--+++#         # 5. 恢复原始形状并返回
--+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++#         return final_hidden_states, router_logits
--+++
--+++# prefill fastest
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     """
--+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         # 门控网络
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         # 专家列表
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++#         # 共享专家
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_dispatch(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         """
--+++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+++#         """
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens, _ = hidden_states.shape
--+++        
--+++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+++#         flat_selected_experts = selected_experts.flatten()
--+++#         flat_routing_weights = routing_weights.flatten()
--++ 
--++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++-
--++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++-    #     query_states = query_states / math.sqrt(self.head_dim)
--++-    #     # <--- 修改结束 ---
--++-
--++-    #     fa_attention_mask = None
--++-    #     if attention_mask is not None:
--++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-    #         fa_attention_mask = (mask_slice != 0)
--++-
--++-    #     input_dtype = query_states.dtype
--++-
--++-    #     attn_output = mindspore.ops.flash_attention_score(
--++-    #         query=query_states,    # 传入已经预先缩放过的 query
--++-    #         key=key_states,
--++-    #         value=value_states,
--++-    #         head_num=self.num_heads,
--++-    #         attn_mask=fa_attention_mask,
--++-    #         keep_prob=1.0 - self.attention_dropout,
--++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++-    #         input_layout="BNSD",
--++-    #         sparse_mode=0,
--++-    #         inner_precise=1        # 仍然保持内部高精度计算
--++-    #     )
--+++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++ 
--++-    #     attn_output = attn_output.to(input_dtype)
--++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-    #     attn_output = self.o_proj(attn_output)
--+++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++        
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++            
--+++#             # 找到所有分配给该专家的 token
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++            
--+++#             # 使用 mask 选取对应的 token 和权重
--+++#             current_token_indices = token_indices[mask]
--+++#             current_routing_weights = flat_routing_weights[mask]
--+++#             current_hidden_states = hidden_states[current_token_indices]
--+++            
--+++#             # 对这些 token 进行批处理
--+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++            
--+++#             # 使用 index_add 将结果精确地加回到对应位置
--+++#             moe_output = moe_output.index_add(
--+++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+++#             )
--+++#         return moe_output
--+++
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#         """
--+++#         顶层 forward 方法，作为智能分发器。
--+++#         """
--+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++        
--+++#         # 1. 门控计算
--+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#         router_logits = self.gate(hidden_states_reshaped)
--+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++
--+++#         if self.norm_topk_prob:
--+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++        
--+++#         # 2. 调用统一的 MoE 计算内核
--+++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--++ 
--++-    #     attn_weights = None
--++-    #     if output_attentions:
--++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++#         # 3. 统一处理共享专家
--+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++        
--+++#         # 4. 合并输出
--+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++        
--+++#         # 5. 恢复原始形状并返回
--+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++#         return final_hidden_states, router_logits
--+++
--+++
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     """
--+++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++#     【最终高性能与高精度版】：
--+++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+++#     3. 这样实现了速度和准确性的两全其美。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_decode(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         """
--+++#         【解码路径】极致优化版：bmm + 高精度累加。
--+++#         """
--+++#         original_dtype = hidden_states.dtype
--+++#         batch_size, _ = hidden_states.shape
--+++        
--+++#         expert_outputs_list = [
--+++#             ops.cat([
--+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++#             ], dim=0) 
--+++#             for i in range(batch_size)
--+++#         ]
--+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++
--+++#         # 在 float32 下执行 bmm，得到高精度结果
--+++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++        
--+++#         # 将高精度结果转换回原始数据类型
--+++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+++        
--+++#         return moe_output
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_prefill(
--+++#         self, 
--+++#         hidden_states: mindspore.Tensor, 
--+++#         selected_experts: mindspore.Tensor, 
--+++#         routing_weights: mindspore.Tensor
--+++#     ) -> mindspore.Tensor:
--+++#         """
--+++#         【预填充路径】与原始实现一致，结果精确。
--+++#         """
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens, _ = hidden_states.shape
--+++#         flat_selected_experts = selected_experts.flatten()
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++        
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++#             selected_token_indices = token_indices[mask]
--+++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++#             current_states = hidden_states[selected_token_indices]
--+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++#             moe_output = moe_output.index_add(
--+++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++#             )
--+++#         return moe_output
--+++
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++        
--+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#         router_logits = self.gate(hidden_states_reshaped)
--+++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++ 
--++-    #     return attn_output, attn_weights, past_key_value
--+++#         if self.norm_topk_prob:
--+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+++#         # 如果模型主体是 float16，后续再转换
--+++        
--+++#         moe_output = None
--+++#         if not self.training:
--+++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+++#             # _moe_infer_decode 内部会处理好类型转换
--+++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+++#             if sequence_length == 1:
--+++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++#             else:
--+++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++#         else:
--+++#             raise NotImplementedError("Training path is not implemented.")
--+++
--+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++        
--+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++#         return final_hidden_states, router_logits
--+++    
--++ 
--++-QWEN2MOE_ATTENTION_CLASSES = {
--++-    "eager": Qwen2MoeAttention,
--++-    "flash-attention": Qwen2MoeFlashAttention,
--++-}
--+++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++#     """
--+++#     【融合版】一个混合专家模块，内置两种推理策略，
--+++#     由外部全局变量 `Long_Prompt` 控制：
--+++
--+++#     - if Long_Prompt is True:  【精度优先模式】
--+++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+++#       适用于处理长序列，避免误差累积。
--+++
--+++#     - if Long_Prompt is False: 【速度优先模式】
--+++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+++#       在解码阶段获得极致速度，同时保证结果高度准确。
--+++#     """
--+++#     def __init__(self, config: Qwen2MoeConfig):
--+++#         super().__init__()
--+++#         self.num_experts = config.num_experts
--+++#         self.top_k = config.num_experts_per_tok
--+++#         self.norm_topk_prob = config.norm_topk_prob
--+++
--+++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++#         self.experts = nn.ModuleList(
--+++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++#         )
--+++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++
--+++#     # --- 速度优先模式的辅助函数 ---
--+++#     @no_grad()
--+++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++#         original_dtype = hidden_states.dtype
--+++#         batch_size, _ = hidden_states.shape
--+++#         expert_outputs_list = [
--+++#             ops.cat([
--+++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++#             ], dim=0) 
--+++#             for i in range(batch_size)
--+++#         ]
--+++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++#         weights_fp32 = routing_weights.to(mindspore.float32)
--+++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++
--+++#     @no_grad()
--+++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens, _ = hidden_states.shape
--+++#         flat_selected_experts = selected_experts.flatten()
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++#             selected_token_indices = token_indices[mask]
--+++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++#             current_states = hidden_states[selected_token_indices]
--+++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+++#         return moe_output
--+++
--+++#     # --- 精度优先模式的辅助函数 ---
--+++#     @no_grad()
--+++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++#         moe_output = ops.zeros_like(hidden_states)
--+++#         num_tokens, _ = hidden_states.shape
--+++#         flat_selected_experts = selected_experts.flatten()
--+++#         flat_routing_weights = routing_weights.flatten()
--+++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++#         active_experts = ops.unique(flat_selected_experts)
--+++#         for expert_idx_tensor in active_experts:
--+++#             expert_idx = expert_idx_tensor.item()
--+++#             expert_layer = self.experts[expert_idx]
--+++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++#             current_token_indices = token_indices[mask]
--+++#             current_routing_weights = flat_routing_weights[mask]
--+++#             current_hidden_states = hidden_states[current_token_indices]
--+++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++#         return moe_output
--+++
--+++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++#         # 声明我们将要使用一个在模块外部定义的全局变量
--+++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+++#         global Long_Prompt
--+++
--+++#         # 1. 门控计算 (所有模式通用)
--+++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++#         router_logits = self.gate(hidden_states_reshaped)
--+++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++#         if self.norm_topk_prob:
--+++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++#         moe_output = None
--+++#         if not self.training:
--+++#             # 根据 Long_Prompt 标志选择模式
--+++#             if Long_Prompt:
--+++#                 # --- 精度优先模式 ---
--+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++#             else:
--+++#                 # --- 速度优先模式 ---
--+++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++#                 if sequence_length == 1:
--+++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++#                 else:
--+++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++#         else:
--+++#             raise NotImplementedError("Training path is not implemented.")
--+++
--+++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++        
--+++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++#         return final_hidden_states, router_logits
--+++    
--+++class Qwen2MoeSparseMoeBlock(nn.Module):
--+++    """
--+++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+++    控制的顶级推理策略：
--++ 
--+++    - if Long_Prompt is True:  【精度优先模式】
--+++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--+++      适用于需要严格可复现性的长序列任务。
--++ 
--++-class Qwen2MoeSparseMoeBlock(nn.Module):
--++-    def __init__(self, config):
--+++    - if Long_Prompt is False: 【速度优先模式】
--+++      采用业界最强的性能组合：
--+++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--+++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--+++    """
--+++    def __init__(self, config: Qwen2MoeConfig):
--++         super().__init__()
--++         self.num_experts = config.num_experts
--++         self.top_k = config.num_experts_per_tok
--++         self.norm_topk_prob = config.norm_topk_prob
--++ 
--++-        # gating
--++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++         self.experts = nn.ModuleList(
--++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++         )
--++-
--++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++ 
--++-    #@dwj
--++-    # 只遍历激活的专家，而非全部专家
--++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-            num_tokens = hidden_states_reshaped.shape[0]
--++-
--++-            router_logits = self.gate(hidden_states_reshaped)
--++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-
--++-            if self.norm_topk_prob:
--++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-            routing_weights = routing_weights.to(hidden_states.dtype)
--++-            
--++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++-            flat_selected_experts = selected_experts.flatten()
--++-            
--++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++-            token_indices = broadcasted_token_indices.flatten()
--++-            
--++-            active_experts = ops.unique(flat_selected_experts)
--++-            
--++-            for expert_idx_tensor in active_experts:
--++-                expert_idx = expert_idx_tensor.item()
--++-                expert_layer = self.experts[expert_idx]
--++-                
--++-                mask = (flat_selected_experts == expert_idx_tensor)
--++-                selected_token_indices = token_indices[mask]
--++-                selected_routing_weights = routing_weights.flatten()[mask]
--++-                
--++-                current_states = hidden_states_reshaped[selected_token_indices]
--++-                
--++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-                
--++-                final_hidden_states = final_hidden_states.index_add(
--++-                    dim=0,
--++-                    index=selected_token_indices,
--++-                    source=expert_output.to(hidden_states.dtype)
--++-                )
--++-            
--++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--+++    @no_grad()
--+++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++        original_dtype = hidden_states.dtype
--+++        batch_size, _ = hidden_states.shape
--+++        expert_outputs_list = [
--+++            ops.cat([
--+++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++            ], dim=0)
--+++            for i in range(batch_size)
--+++        ]
--+++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++        weights_fp32 = routing_weights.to(mindspore.float32)
--+++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++        return moe_output_fp32.squeeze(1).to(original_dtype)
--+++
--+++    @no_grad()
--+++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++        num_tokens, _ = hidden_states.shape
--+++        flat_selected_experts = selected_experts.flatten()
--+++        sorted_expert_indices = flat_selected_experts.argsort()
--+++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++        original_token_indices = sorted_expert_indices // self.top_k
--+++        moe_output = ops.zeros_like(hidden_states)
--+++        current_token_offset = 0
--+++        for i in range(self.num_experts):
--+++            expert_token_count = tokens_per_expert[i] - current_token_offset
--+++            if expert_token_count == 0:
--+++                continue
--+++            end_offset = current_token_offset + expert_token_count
--+++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++            expert_hidden_states = hidden_states[expert_original_token_indices]
--+++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++            current_token_offset += expert_token_count
--+++        return moe_output
--+++
--+++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+++    @no_grad()
--+++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++        moe_output = ops.zeros_like(hidden_states)
--+++        num_tokens, _ = hidden_states.shape
--+++        flat_selected_experts = selected_experts.flatten()
--+++        flat_routing_weights = routing_weights.flatten()
--+++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++        active_experts = ops.unique(flat_selected_experts)
--+++        for expert_idx_tensor in active_experts:
--+++            expert_idx = expert_idx_tensor.item()
--+++            expert_layer = self.experts[expert_idx]
--+++            mask = (flat_selected_experts == expert_idx_tensor)
--+++            current_token_indices = token_indices[mask]
--+++            current_routing_weights = flat_routing_weights[mask]
--+++            current_hidden_states = hidden_states[current_token_indices]
--+++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++        return moe_output
--++ 
--++-            final_hidden_states = final_hidden_states + shared_expert_output
--++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++-            
--++-            return final_hidden_states, router_logits
--+++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++        global Long_Prompt
--+++
--+++        # 1. 门控计算 (所有模式通用)
--+++        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++        router_logits = self.gate(hidden_states_reshaped)
--+++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++        if self.norm_topk_prob:
--+++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++        
--+++        moe_output = None
--+++        if Long_Prompt:
--+++            # --- 精度优先模式 (ACCURACY MODE) ---
--+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        else:
--+++            # --- 速度优先模式 (SPEED MODE) ---
--+++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++            if sequence_length == 1:
--+++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++            else:
--+++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        
--++ 
--+++        # 3. 共享专家计算与合并 (所有模式通用)
--+++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++        
--+++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++        
--+++        return final_hidden_states, router_logits
--++ 
--++ class Qwen2MoeDecoderLayer(nn.Module):
--++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--++         super().__init__()
--++         self.hidden_size = config.hidden_size
--+++        
--+++        # if Long_Prompt:
--+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++        # else:
--+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++ 
--++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++ 
--++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++-
--++         if (layer_idx not in config.mlp_only_layers) and (
--++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++         ):
--++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++             self._warmed_up = True
--++             self.warmup_moe_model()
--++ 
--+++
--+++
--++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++         output_router_logits = (
--++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++             router_logits=outputs.router_logits,
--++         )
--++ 
--+++    def generate(self, *args, **kwargs):
--+++        """
--+++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+++        """
--+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+++
--+++        input_ids = kwargs.get("input_ids")
--+++        if input_ids is None and args:
--+++            input_ids = args[0]
--+++
--+++        if input_ids is not None:
--+++            prompt_length = input_ids.shape[1]
--+++            
--+++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+++                Long_Prompt = True
--+++            else:
--+++                Long_Prompt = False
--+++
--+++        return super().generate(*args, **kwargs)
--+++    
--++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--++     def prepare_inputs_for_generation(
--++         self,
--++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--++         # Exception 1: when passing input_embeds, input_ids may be missing entries
--++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--+++        
--++         if past_key_values is not None:
--++             if inputs_embeds is not None:  # Exception 1
--++                 if 0 not in input_ids.shape:
--++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++             }
--++         )
--++         return model_inputs
--+++
--++ # @lwx
--++     # def _decode_one_tokens_logits(
--++     #     self,
--++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--++             attentions=outputs.attentions,
--++         )
--++ 
--+++
--++ __all__ = [
--++     "Qwen2MoeForCausalLM",
--++     "Qwen2MoeModel",
--++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++new file mode 100644
--++index 00000000..6dfb5b93
--++--- /dev/null
--+++++ b/patches/0001-20251104commit.patch
--++@@ -0,0 +1,1272 @@
--+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++Subject: [PATCH] 20251104commit
--+++
--+++---
--+++ mindnlp/transformers/cache_utils.py           |  28 +-
--+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++ 3 files changed, 976 insertions(+), 87 deletions(-)
--+++
--+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++index cadd2e04..02f8d4be 100644
--+++--- a/mindnlp/transformers/cache_utils.py
--++++++ b/mindnlp/transformers/cache_utils.py
--+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++             # k_out[:, :, cache_position] = key_states
--+++             # v_out[:, :, cache_position] = value_states
--+++-            if ON_ORANGE_PI:
--+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++-            else:
--+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++-
--++++            # if ON_ORANGE_PI:
--++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++            # else:
--++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++            # 确保 cache_position 是 1D tensor 并且类型正确
--++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++++            if cache_position.ndim > 1:
--++++                cache_position = cache_position.flatten()
--++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++++                cache_position = cache_position.int()
--++++            
--++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++++            k_out[:, :, cache_position] = key_states
--++++            v_out[:, :, cache_position] = value_states
--++++            
--+++         return k_out, v_out
--+++ 
--+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index c695b944..d8303e45 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++-    x1 = x[..., : x.shape[-1] // 2]
--+++-    x2 = x[..., x.shape[-1] // 2 :]
--++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++    # x1 = x[..., : x.shape[-1] // 2]
--++++    # x2 = x[..., x.shape[-1] // 2 :]
--++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++         if self.training:
--+++             raise NotImplementedError("Training is not supported yet.")
--+++         else:
--+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++-        if self.config.n_shared_experts is not None:
--+++-            y = y + self.shared_experts(identity)
--+++-        return y
--++++            # @lwx
--++++            if orig_shape[1] == 1:
--++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++++                y=y.view(*orig_shape)
--++++                if self.config.n_shared_experts is not None:
--++++                    y = y + self.shared_experts(identity)
--++++                return y
--++++            else:
--++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++++                if self.config.n_shared_experts is not None:
--++++                    y = y + self.shared_experts(identity)
--++++                return y
--++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++        # if self.config.n_shared_experts is not None:
--++++        #     y = y + self.shared_experts(identity)
--++++        # return y
--++++        
--++++    @no_grad()
--++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++
--++++        expert_cache = ops.zeros_like(x)
--++++        for i in range(self.num_experts_per_tok):
--++++            expert_id = flat_expert_indices[i].item()
--++++            weight = flat_expert_weights[i].item()
--++++            expert = self.experts[expert_id]
--++++            expert_out = expert(x)
--++++            expert_cache += expert_out * weight
--++++        return expert_cache
--+++ 
--+++     @no_grad()
--+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-        # expert_cache = torch.zeros_like(x)
--+++-        # idxs = flat_expert_indices.argsort()
--+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++-        # token_idxs = idxs // self.num_experts_per_tok
--+++-        # for i, end_idx in enumerate(tokens_per_expert):
--+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++-        #     if start_idx == end_idx:
--+++-        #         continue
--+++-        #     expert = self.experts[i]
--+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++-        #     expert_tokens = x[exp_token_idx]
--+++-        #     expert_out = expert(expert_tokens)
--+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++-        # return expert_cache
--++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++         expert_cache = ops.zeros_like(x)
--+++         idxs = flat_expert_indices.argsort()
--+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++         token_idxs = idxs // self.num_experts_per_tok
--++++
--+++         for i, end_idx in enumerate(tokens_per_expert):
--+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++             if start_idx == end_idx:
--+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++             expert_out = expert(expert_tokens)
--+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++
--+++         return expert_cache
--++++        
--++++    # @no_grad()
--++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++    #     # expert_cache = torch.zeros_like(x)
--++++    #     # idxs = flat_expert_indices.argsort()
--++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++    #     # token_idxs = idxs // self.num_experts_per_tok
--++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++    #     #     if start_idx == end_idx:
--++++    #     #         continue
--++++    #     #     expert = self.experts[i]
--++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #     #     expert_tokens = x[exp_token_idx]
--++++    #     #     expert_out = expert(expert_tokens)
--++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++    #     # return expert_cache
--++++    #     expert_cache = ops.zeros_like(x)
--++++    #     idxs = flat_expert_indices.argsort()
--++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++    #     for i, end_idx in enumerate(tokens_per_expert):
--++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++    #         if start_idx == end_idx:
--++++    #             continue
--++++    #         expert = self.experts[i]
--++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #         expert_tokens = x[exp_token_idx]
--++++    #         expert_out = expert(expert_tokens)
--++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++
--++++    #     return expert_cache
--++++    # @no_grad()
--++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++    #     expert_cache = ops.zeros_like(x)
--++++
--++++    #     # 排序保证顺序一致
--++++    #     idxs = flat_expert_indices.argsort()
--++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++    #     # 找出有 token 的专家
--++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++
--++++    #     for i in active_experts.tolist():
--++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++    #         end_idx = tokens_per_expert[i]
--++++    #         if start_idx == end_idx:  # 没有 token
--++++    #             continue
--++++
--++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #         expert_tokens = x[exp_token_idx]
--++++    #         expert_out = self.experts[i](expert_tokens)
--++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++
--++++    #         expert_cache = mindspore.mint.scatter_add(
--++++    #             expert_cache,
--++++    #             0,
--++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++    #             expert_out
--++++    #         )
--++++
--++++    #     return expert_cache
--++++
--++++
--+++ 
--+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++ #     """
--+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++ 
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--++++        self.warm_up = False
--++++
--++++    def warmup_moe_model_deep(self):
--++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++        test_texts = [
--++++            "warmup short",
--++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++++        ]
--++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++        if tokenizer is None:
--++++            from mindnlp.transformers import AutoTokenizer
--++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++            self._warmup_tokenizer = tokenizer
--++++
--++++        for text in test_texts:
--++++            inputs = tokenizer(text, return_tensors="ms")
--++++            with mindspore._no_grad():
--++++                _ = self(**inputs, use_cache=False)
--++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++ 
--+++     def get_input_embeddings(self):
--+++         return self.model.embed_tokens
--+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++         ```"""
--++++        if not self.warm_up:
--++++            self.warm_up = True
--++++            self.warmup_moe_model_deep()
--++++
--+++         output_attentions = (
--+++             output_attentions
--+++             if output_attentions is not None
--+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++index 3cbf820e..d4c6b651 100644
--+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++@@ -18,7 +18,6 @@
--+++ # See the License for the specific language governing permissions and
--+++ # limitations under the License.
--+++ """MindSpore Qwen2MoE model."""
--+++-
--+++ import math
--+++ from typing import List, Optional, Tuple, Union
--+++ 
--+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++     TokenClassifierOutput,
--+++ )
--+++ from ...modeling_utils import PreTrainedModel
--++++from ...generation import GenerationMixin
--+++ from ....utils import logging
--+++ from .configuration_qwen2_moe import Qwen2MoeConfig
--+++ 
--+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++         self.variance_epsilon = eps
--+++ 
--+++     def forward(self, hidden_states):
--++++        # @dwj
--++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++        # @lwx
--++++        # if not self.training :
--++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++         input_dtype = hidden_states.dtype
--+++         hidden_states = hidden_states.to(mindspore.float32)
--+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++@@ -234,6 +239,8 @@ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++     x1 = x[..., : x.shape[-1] // 2]
--+++     x2 = x[..., x.shape[-1] // 2 :]
--++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++         self.config = config
--+++         self.hidden_size = config.hidden_size
--+++         self.intermediate_size = intermediate_size
--++++        
--+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++         self.act_fn = ACT2FN[config.hidden_act]
--+++ 
--+++     def forward(self, x):
--+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++-
--+++ 
--++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++        # @lwx 
--++++        # gate_up_output = self.gate_up_proj(x)
--++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++++        # return self.down_proj(swiglu_output)
--++++
--++++    # def forward(self, x):
--++++    #     gate_proj_out = self.gate_proj(x)
--++++    #     up_proj_out = self.up_proj(x)
--++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++++    #     return self.down_proj(swiglu_out)
--++++        
--+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++     """
--+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++         use_cache: bool = False,
--+++         cache_position: Optional[mindspore.Tensor] = None,
--+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++        
--++++
--+++         bsz, q_len, _ = hidden_states.shape
--+++ 
--+++         query_states = self.q_proj(hidden_states)
--+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++                     "with a layer index."
--+++                 )
--+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++            if isinstance(past_key_value, StaticCache):
--++++                kv_seq_len = key_states.shape[-2]
--++++            else:
--++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++ 
--+++         if past_key_value is not None:
--+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++            
--++++            if isinstance(past_key_value, StaticCache):
--++++                kv_seq_len = key_states.shape[-2]
--+++ 
--+++         # repeat k/v heads if n_kv_heads < n_heads
--+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++-
--++++        
--+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++ 
--+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++-            raise ValueError(
--+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++-                f" {attn_weights.shape}"
--+++-            )
--+++-
--+++-        if attention_mask is not None:  # no matter the length, we just slice it
--+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++++        if attention_mask is not None:
--++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++             attn_weights = attn_weights + causal_mask
--+++ 
--+++         # upcast attention to fp32
--+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++ 
--+++         attn_output = self.o_proj(attn_output)
--+++-
--++++        # @lwx
--++++        
--++++        # max_seq_len = self.max_position_embeddings  # 2048
--++++
--++++        # if attention_mask is not None:
--++++        #     # attention_mask: [B, 1, Sq, Sk]
--++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++
--++++        #     # pad 到 [max_seq_len, max_seq_len]
--++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++        #     global_attention_mask = padded_mask
--++++        # else:
--++++        #     global_attention_mask = None
--++++
--++++
--++++        # sparse_mode=3
--++++        # attn_output = mindspore.ops.flash_attention_score(
--++++        #     query=query_states,
--++++        #     key=key_states,
--++++        #     value=value_states,
--++++        #     real_shift=None,
--++++        #     padding_mask=None,
--++++
--++++        #     head_num=self.num_heads,
--++++        #     attn_mask=global_attention_mask,
--++++        #     keep_prob=1.0 - self.attention_dropout,
--++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++        #     input_layout="BNSD",
--++++        #     pre_tokens=2147483647,
--++++        #     next_tokens=2147483647,
--++++        #     inner_precise=0,
--++++        #     drop_mask=None,
--++++        #     prefix=None,
--++++        #     actual_seq_qlen=None,
--++++        #     actual_seq_kvlen=None,
--++++        #     sparse_mode=sparse_mode,
--++++        # )
--+++         if not output_attentions:
--+++             attn_weights = None
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++ 
--++++class Qwen2MoeFlashAttention(nn.Module):
--++++    """
--++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++
--++++    关键改动:
--++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++       直接传入原始的 key 和 value 张量效率更高。
--++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++    """
--++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++        super().__init__()
--++++        self.config = config
--++++        self.layer_idx = layer_idx
--++++        self.hidden_size = config.hidden_size
--++++        self.num_heads = config.num_attention_heads
--++++        self.head_dim = self.hidden_size // self.num_heads
--++++        self.num_key_value_heads = config.num_key_value_heads
--++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++        self.max_position_embeddings = config.max_position_embeddings
--++++        self.rope_theta = config.rope_theta
--++++        self.attention_dropout = config.attention_dropout
--++++
--++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++            raise ValueError(
--++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++            )
--++++
--++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++
--++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++            self.head_dim,
--++++            max_position_embeddings=self.max_position_embeddings,
--++++            base=self.rope_theta,
--++++        )
--++++
--++++    def forward(
--++++        self,
--++++        hidden_states: mindspore.Tensor,
--++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++        position_ids: Optional[mindspore.Tensor] = None,
--++++        past_key_value: Optional[Cache] = None,
--++++        output_attentions: bool = False,
--++++        use_cache: bool = False,
--++++        cache_position: Optional[mindspore.Tensor] = None,
--++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++        bsz, q_len, _ = hidden_states.shape
--++++
--++++        # 1. 线性投射 Q, K, V
--++++        query_states = self.q_proj(hidden_states)
--++++        key_states = self.k_proj(hidden_states)
--++++        value_states = self.v_proj(hidden_states)
--++++
--++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++        # 3. RoPE 旋转位置编码
--++++        kv_seq_len = key_states.shape[-2]
--++++        if past_key_value is not None:
--++++            if self.layer_idx is None:
--++++                raise ValueError(
--++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++                    "with a layer index."
--++++                )
--++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++                if cache_position.shape[0] == 1:
--++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++                    kv_seq_len = past_seen_tokens + 1
--++++                else:
--++++                    # prefill 阶段：cache_position 是范围，使用其长度
--++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++            else:
--++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++        # 4. KV 缓存更新
--++++        if past_key_value is not None:
--++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++            key_states, value_states = past_key_value.update(
--++++                key_states, value_states, self.layer_idx, cache_kwargs
--++++            )
--++++            
--++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++                if cache_position.shape[0] == 1:
--++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++                    kv_seq_len = key_states.shape[-2]
--++++
--++++        # 5. [重要] 准备 Attention Mask
--++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++        fa_attention_mask = None
--++++        if attention_mask is not None:
--++++            # 截取与当前key长度匹配的部分
--++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++            fa_attention_mask = (mask_slice != 0)
--++++
--++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++        input_dtype = query_states.dtype
--++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++            query_states = query_states.to(mindspore.float16)
--++++            key_states = key_states.to(mindspore.float16)
--++++            value_states = value_states.to(mindspore.float16)
--++++
--++++        # 6. [核心] 调用 flash_attention_score 算子
--++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++        attn_output = mindspore.ops.flash_attention_score(
--++++            query=query_states,
--++++            key=key_states,
--++++            value=value_states,
--++++            head_num=self.num_heads, # 传入Q的头数(N1)
--++++            attn_mask=fa_attention_mask,
--++++            keep_prob=1.0 - self.attention_dropout,
--++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++++            input_layout="BNSD",
--++++            sparse_mode=0 # 使用 defaultMask 模式
--++++        )
--++++
--++++        # 恢复原始数据类型
--++++        attn_output = attn_output.to(input_dtype)
--++++
--++++        # 7. 调整输出形状
--++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++        attn_output = self.o_proj(attn_output)
--++++
--++++        # FlashAttention 算子不直接返回注意力权重矩阵
--++++        attn_weights = None
--++++        if output_attentions:
--++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++        return attn_output, attn_weights, past_key_value
--++++
--++++    # def forward(
--++++    #     self,
--++++    #     hidden_states: mindspore.Tensor,
--++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++    #     past_key_value: Optional[Cache] = None,
--++++    #     output_attentions: bool = False,
--++++    #     use_cache: bool = False,
--++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++    #     bsz, q_len, _ = hidden_states.shape
--++++
--++++    #     # 1. 线性投射 Q, K, V
--++++    #     query_states = self.q_proj(hidden_states)
--++++    #     key_states = self.k_proj(hidden_states)
--++++    #     value_states = self.v_proj(hidden_states)
--++++
--++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++    #     # 3. RoPE 旋转位置编码
--++++    #     kv_seq_len = key_states.shape[-2]
--++++    #     if past_key_value is not None:
--++++    #         if self.layer_idx is None:
--++++    #             raise ValueError(
--++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++    #                 "with a layer index."
--++++    #             )
--++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++    #     # 4. KV 缓存更新
--++++    #     if past_key_value is not None:
--++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++    #         key_states, value_states = past_key_value.update(
--++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++    #         )
--++++
--++++    #     # 5. 准备 Attention Mask
--++++    #     fa_attention_mask = None
--++++    #     if attention_mask is not None:
--++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++    #         fa_attention_mask = (mask_slice != 0)
--++++
--++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++    #     input_dtype = query_states.dtype
--++++
--++++    #     # 6. [核心] 调用 flash_attention_score 算子
--++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++    #         query=query_states,
--++++    #         key=key_states,
--++++    #         value=value_states,
--++++    #         head_num=self.num_heads,
--++++    #         attn_mask=fa_attention_mask,
--++++    #         keep_prob=1.0 - self.attention_dropout,
--++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++    #         input_layout="BNSD",
--++++    #         sparse_mode=0,
--++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++    #         inner_precise=1
--++++    #     )
--++++
--++++    #     # 恢复原始数据类型
--++++    #     attn_output = attn_output.to(input_dtype)
--++++
--++++    #     # 7. 调整输出形状
--++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++    #     attn_output = self.o_proj(attn_output)
--++++
--++++    #     attn_weights = None
--++++    #     if output_attentions:
--++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++    #     return attn_output, attn_weights, past_key_value
--++++
--++++    # def forward(
--++++    #     self,
--++++    #     hidden_states: mindspore.Tensor,
--++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++    #     past_key_value: Optional[Cache] = None,
--++++    #     output_attentions: bool = False,
--++++    #     use_cache: bool = False,
--++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++    #     bsz, q_len, _ = hidden_states.shape
--++++
--++++    #     query_states = self.q_proj(hidden_states)
--++++    #     key_states = self.k_proj(hidden_states)
--++++    #     value_states = self.v_proj(hidden_states)
--++++
--++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++    #     kv_seq_len = key_states.shape[-2]
--++++    #     if past_key_value is not None:
--++++    #         if self.layer_idx is None:
--++++    #             raise ValueError("`layer_idx` must be specified for caching")
--++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++    #     if past_key_value is not None:
--++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++    #         key_states, value_states = past_key_value.update(
--++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++    #         )
--++++
--++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++
--++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++    #     query_states = query_states / math.sqrt(self.head_dim)
--++++    #     # <--- 修改结束 ---
--++++
--++++    #     fa_attention_mask = None
--++++    #     if attention_mask is not None:
--++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++    #         fa_attention_mask = (mask_slice != 0)
--++++
--++++    #     input_dtype = query_states.dtype
--++++
--++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++    #         query=query_states,    # 传入已经预先缩放过的 query
--++++    #         key=key_states,
--++++    #         value=value_states,
--++++    #         head_num=self.num_heads,
--++++    #         attn_mask=fa_attention_mask,
--++++    #         keep_prob=1.0 - self.attention_dropout,
--++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++    #         input_layout="BNSD",
--++++    #         sparse_mode=0,
--++++    #         inner_precise=1        # 仍然保持内部高精度计算
--++++    #     )
--++++
--++++    #     attn_output = attn_output.to(input_dtype)
--++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++    #     attn_output = self.o_proj(attn_output)
--++++
--++++    #     attn_weights = None
--++++    #     if output_attentions:
--++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++
--++++    #     return attn_output, attn_weights, past_key_value
--++++
--+++ QWEN2MOE_ATTENTION_CLASSES = {
--+++     "eager": Qwen2MoeAttention,
--++++    "flash-attention": Qwen2MoeFlashAttention,
--+++ }
--+++ 
--+++ 
--+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++ 
--++++    #@dwj
--++++    # 只遍历激活的专家，而非全部专家
--+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        hidden_states = hidden_states.view(-1, hidden_dim)
--+++-        # router_logits: (batch * sequence_length, n_experts)
--+++-        router_logits = self.gate(hidden_states)
--+++-
--+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-        if self.norm_topk_prob:
--+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        # we cast back to the input dtype
--+++-        routing_weights = routing_weights.to(hidden_states.dtype)
--+++-
--+++-        final_hidden_states = ops.zeros(
--+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++-        )
--+++-
--+++-        # One hot encode the selected experts to create an expert mask
--+++-        # this will be used to easily index which expert is going to be sollicitated
--+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++-
--+++-        # Loop over all available experts in the model and perform the computation on each expert
--+++-        for expert_idx in range(self.num_experts):
--+++-            expert_layer = self.experts[expert_idx]
--+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++-
--+++-            # Index the correct hidden states and compute the expert hidden state for
--+++-            # the current expert. We need to make sure to multiply the output hidden
--+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++-            if 0 not in idx.shape:
--+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++-
--+++-                # However `index_add_` only support torch tensors for indexing so we'll use
--+++-                # the `top_x` tensor here.
--+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++-
--+++-        shared_expert_output = self.shared_expert(hidden_states)
--+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++-
--+++-        final_hidden_states = final_hidden_states + shared_expert_output
--++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++            num_tokens = hidden_states_reshaped.shape[0]
--++++
--++++            router_logits = self.gate(hidden_states_reshaped)
--++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++
--++++            if self.norm_topk_prob:
--++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++            routing_weights = routing_weights.to(hidden_states.dtype)
--++++            
--++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++            flat_selected_experts = selected_experts.flatten()
--++++            
--++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++            token_indices = broadcasted_token_indices.flatten()
--++++            
--++++            active_experts = ops.unique(flat_selected_experts)
--++++            
--++++            for expert_idx_tensor in active_experts:
--++++                expert_idx = expert_idx_tensor.item()
--++++                expert_layer = self.experts[expert_idx]
--++++                
--++++                mask = (flat_selected_experts == expert_idx_tensor)
--++++                selected_token_indices = token_indices[mask]
--++++                selected_routing_weights = routing_weights.flatten()[mask]
--++++                
--++++                current_states = hidden_states_reshaped[selected_token_indices]
--++++                
--++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++                
--++++                final_hidden_states = final_hidden_states.index_add(
--++++                    dim=0,
--++++                    index=selected_token_indices,
--++++                    source=expert_output.to(hidden_states.dtype)
--++++                )
--++++            
--++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++ 
--+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++-        return final_hidden_states, router_logits
--++++            final_hidden_states = final_hidden_states + shared_expert_output
--++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++            
--++++            return final_hidden_states, router_logits
--+++ 
--+++ 
--+++ class Qwen2MoeDecoderLayer(nn.Module):
--+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++ 
--+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++ 
--++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++
--+++         if (layer_idx not in config.mlp_only_layers) and (
--+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++         ):
--+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++     _skip_keys_device_placement = "past_key_values"
--+++     _supports_cache_class = True
--++++#lwx
--++++    # _supports_static_cache = True
--+++ 
--+++     def _init_weights(self, module):
--+++         std = self.config.initializer_range
--+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++         return causal_mask
--+++ 
--+++ 
--+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++     _tied_weights_keys = ["lm_head.weight"]
--+++ 
--+++     def __init__(self, config):
--+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++         self.num_experts_per_tok = config.num_experts_per_tok
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--++++        # @lwx
--++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++++        #     self.generation_config.cache_implementation = "static"
--++++        self._warmed_up = False
--++++
--++++    def warmup_moe_model(self):
--++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++++        test_texts = [
--++++            "warmup short",
--++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++++        ]
--++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++        if tokenizer is None:
--++++            from mindnlp.transformers import AutoTokenizer
--++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++            self._warmup_tokenizer = tokenizer
--++++
--++++        for text in test_texts:
--++++            inputs = tokenizer(text, return_tensors="ms")
--++++            with mindspore._no_grad():
--++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++ 
--+++     def get_input_embeddings(self):
--+++         return self.model.embed_tokens
--+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++         ```"""
--++++        if not self._warmed_up:
--++++            self._warmed_up = True
--++++            self.warmup_moe_model()
--+++ 
--+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++         output_router_logits = (
--+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++             }
--+++         )
--+++         return model_inputs
--++++# @lwx
--++++    # def _decode_one_tokens_logits(
--++++    #     self,
--++++    #     cur_token: mindspore.Tensor,
--++++    #     input_pos: Optional[mindspore.Tensor],
--++++    #     cache_position: mindspore.Tensor,
--++++    #     past_key_values: StaticCache,
--++++    # ) -> mindspore.Tensor:
--++++    #     """
--++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++++        
--++++    #     Args:
--++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++++    #         input_pos: 输入位置信息，可选
--++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++++            
--++++    #     Returns:
--++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++++    #     """
--++++    #     # 调用JIT编译的版本
--++++    #     return self.get_decode_one_tokens_logits(
--++++    #         cur_token=cur_token,
--++++    #         input_pos=input_pos,
--++++    #         cache_position=cache_position,
--++++    #         past_key_values=past_key_values,
--++++    #     )
--++++    
--++++    # @mindspore.jit(jit_level='O1')
--++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++++    #     """
--++++    #     JIT编译的函数，用于高效的单token解码
--++++    #     使用JIT编译优化以支持静态shape和高效执行
--++++        
--++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++++    #     """
--++++    #     outputs = self.model.forward(
--++++    #         input_ids=cur_token,
--++++    #         position_ids=input_pos,
--++++    #         cache_position=cache_position,
--++++    #         past_key_values=past_key_values,
--++++    #         use_cache=True,
--++++    #         return_dict=False,
--++++    #     )
--++++        
--++++    #     hidden_states = outputs[0]
--++++    #     logits = self.lm_head.forward(hidden_states)
--++++    #     logits = logits.float()
--++++        
--++++    #     return logits[:, -1, :]
--++++
--++++    # def _sample(
--++++    #     self,
--++++    #     input_ids: mindspore.Tensor,
--++++    #     logits_processor,
--++++    #     stopping_criteria,
--++++    #     generation_config,
--++++    #     synced_devices: bool,
--++++    #     streamer=None,
--++++    #     logits_warper=None,
--++++    #     **model_kwargs,
--++++    # ):
--++++    #     """
--++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++++    #     """
--++++    #     from ...generation.logits_process import LogitsProcessorList
--++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++++    #     from mindnlp.core import nn, ops, no_grad
--++++    #     import numpy as np
--++++        
--++++    #     # 检查是否使用 StaticCache
--++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++++    #     # 否则，直接调用父类方法
--++++    #     past_key_values = model_kwargs.get("past_key_values")
--++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++++
--++++    #     if not isinstance(past_key_values, StaticCache):
--++++    #         # 不使用 StaticCache，直接调用父类方法
--++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++++    #         return super()._sample(
--++++    #             input_ids=input_ids,
--++++    #             logits_processor=logits_processor,
--++++    #             stopping_criteria=stopping_criteria,
--++++    #             generation_config=generation_config,
--++++    #             synced_devices=synced_devices,
--++++    #             streamer=streamer,
--++++    #             logits_warper=logits_warper,
--++++    #             **model_kwargs,
--++++    #         )
--++++        
--++++    #     # 使用 StaticCache，进入自定义循环
--++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++++    #     pad_token_id = generation_config._pad_token_tensor
--++++    #     output_attentions = generation_config.output_attentions
--++++    #     output_hidden_states = generation_config.output_hidden_states
--++++    #     output_scores = generation_config.output_scores
--++++    #     output_logits = generation_config.output_logits
--++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++++    #     max_length = generation_config.max_length
--++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++++    #     do_sample = generation_config.do_sample
--++++        
--++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++++    #         raise ValueError(
--++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++++    #             f"{logits_warper})."
--++++    #         )
--++++        
--++++    #     # init attention / hidden states / scores tuples
--++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++++        
--++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++++    #         encoder_hidden_states = (
--++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++++    #         )
--++++        
--++++    #     # keep track of which sequences are already finished
--++++    #     batch_size, cur_len = input_ids.shape
--++++    #     this_peer_finished = False
--++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++++        
--++++    #     time_record = []
--++++    #     from ....utils.testing_utils import parse_flag_from_env
--++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++++        
--++++    #     while self._has_unfinished_sequences(
--++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++++    #     ):
--++++    #         if _record_time:
--++++    #             import time as time_module
--++++    #             infer_start = time_module.time()
--++++            
--++++    #         # prepare model inputs
--++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++++            
--++++    #         # prepare variable output controls
--++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++++            
--++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++++    #         cur_cache_position = model_inputs.get("cache_position")
--++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++++    #         cur_input_ids = model_inputs.get("input_ids")
--++++            
--++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++++    #             cur_cache_position is not None and 
--++++    #             len(cur_cache_position.shape) > 0 and
--++++    #             cur_cache_position.shape[0] == 1 and
--++++    #             cur_input_ids is not None and
--++++    #             cur_input_ids.shape[1] == 1):
--++++    #             # 使用 JIT 优化的单 token 解码
--++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++++    #             if not hasattr(self, '_jit_used'):
--++++    #                 self._jit_used = False
--++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++++                
--++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++++    #                 cur_token=cur_input_ids,
--++++    #                 input_pos=model_inputs.get("position_ids"),
--++++    #                 cache_position=cur_cache_position,
--++++    #                 past_key_values=cur_past_key_values,
--++++    #             )
--++++                
--++++    #             # 标记已使用JIT（用于后续判断）
--++++    #             if not self._jit_used:
--++++    #                 self._jit_used = True
--++++                
--++++    #             # 构造兼容的输出对象
--++++    #             class JitOptimizedOutput:
--++++    #                 def __init__(self, logits, config):
--++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++++    #                     self.config = config
--++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++++    #                     self.cross_attentions = None
--++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++++                
--++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++++    #         else:
--++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++++    #             outputs = self(**model_inputs, return_dict=True)
--++++            
--++++    #         if synced_devices and this_peer_finished:
--++++    #             continue
--++++            
--++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++++    #         next_token_logits = outputs.logits[:, -1, :]
--++++            
--++++    #         # pre-process distribution
--++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++++    #         if do_sample:
--++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++++            
--++++    #         # Store scores, attentions and hidden_states when required
--++++    #         if return_dict_in_generate:
--++++    #             if output_scores:
--++++    #                 scores += (next_token_scores,)
--++++    #             if output_logits:
--++++    #                 raw_logits += (next_token_logits,)
--++++    #             if output_attentions:
--++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++++    #                 if self.config.is_encoder_decoder:
--++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++++                
--++++    #             if output_hidden_states:
--++++    #                 hidden = (
--++++    #                     outputs.decoder_hidden_states
--++++    #                     if self.config.is_encoder_decoder
--++++    #                     else outputs.hidden_states
--++++    #                 )
--++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++++            
--++++    #         # token selection
--++++    #         if do_sample:
--++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++++    #         else:
--++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++++            
--++++    #         # finished sentences should have their next token be a padding token
--++++    #         if has_eos_stopping_criteria:
--++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++++            
--++++    #         # update generated ids, model inputs, and length for next step
--++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++++    #         if streamer is not None:
--++++    #             streamer.put(next_tokens)
--++++            
--++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++++    #             outputs,
--++++    #             model_kwargs,
--++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++++    #         )
--++++            
--++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++++    #         cur_len += 1
--++++            
--++++    #         if _record_time:
--++++    #             import time as time_module
--++++    #             infer_stop = time_module.time()
--++++    #             time_record.append(infer_stop - infer_start)
--++++            
--++++    #         del outputs
--++++        
--++++    #     average_infer_time = None
--++++    #     if time_record:
--++++    #         if len(time_record) > 1:
--++++    #             time_record.pop(0)
--++++    #         average_infer_time = sum(time_record) / len(time_record)
--++++    #         print(f'average inference time is: {average_infer_time}')
--++++    #         print(f'inference time record: {time_record}')
--++++        
--++++    #     if streamer is not None:
--++++    #         streamer.end()
--++++        
--++++    #     # 简单判断：打印是否使用了JIT路径
--++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++++    #     else:
--++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++++        
--++++    #     if return_dict_in_generate:
--++++    #         if self.config.is_encoder_decoder:
--++++    #             return GenerateEncoderDecoderOutput(
--++++    #                 sequences=input_ids,
--++++    #                 scores=scores,
--++++    #                 logits=raw_logits,
--++++    #                 encoder_attentions=encoder_attentions,
--++++    #                 encoder_hidden_states=encoder_hidden_states,
--++++    #                 decoder_attentions=decoder_attentions,
--++++    #                 cross_attentions=cross_attentions,
--++++    #                 decoder_hidden_states=decoder_hidden_states,
--++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++    #                 average_infer_time=average_infer_time
--++++    #             )
--++++    #         else:
--++++    #             return GenerateDecoderOnlyOutput(
--++++    #                 sequences=input_ids,
--++++    #                 scores=scores,
--++++    #                 logits=raw_logits,
--++++    #                 attentions=decoder_attentions,
--++++    #                 hidden_states=decoder_hidden_states,
--++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++    #                 average_infer_time=average_infer_time
--++++    #             )
--++++    #     else:
--++++    #         return input_ids
--++++            
--++++    # def _prepare_cache_for_generation(
--++++    #     self,
--++++    #     generation_config,
--++++    #     model_kwargs,
--++++    #     assistant_model,
--++++    #     batch_size,
--++++    #     max_cache_length,
--++++    # ):
--++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++++    #         generation_config.cache_implementation = "static"
--++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++++        
--++++    #     if generation_config.cache_implementation == "static":
--++++    #         base_required_from_max_length = generation_config.max_length + 1
--++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++++    #         min_cache_size = 50
--++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++++    #         else:
--++++    #             max_cache_length = max(base_required, min_cache_size)
--++++            
--++++    #         original_max_cache_length = max_cache_length
--++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++++            
--++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++    #             if max_cache_length > self.config.max_position_embeddings:
--++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++++        
--++++    #     result = super()._prepare_cache_for_generation(
--++++    #         generation_config=generation_config,
--++++    #         model_kwargs=model_kwargs,
--++++    #         assistant_model=assistant_model,
--++++    #         batch_size=batch_size,
--++++    #         max_cache_length=max_cache_length,
--++++    #     )
--++++        
--++++    #     if generation_config.cache_implementation == "static":
--++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++++    #         created_cache = model_kwargs.get(cache_name)
--++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++++    #             if created_cache.max_cache_len < generation_config.max_length:
--++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++++        
--++++    #     return result
--++++
--++++
--++++
--+++ 
--+++ 
--+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++-- 
--+++2.27.0
--+++
--++-- 
--++2.27.0
--++
--+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+new file mode 100644
--+index 00000000..966529e4
--+--- /dev/null
--++++ b/patches/0003-20261106secondcommit.patch
--+@@ -0,0 +1,2769 @@
--++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Thu, 6 Nov 2025 14:54:37 +0800
--++Subject: [PATCH 3/3] 20261106secondcommit
--++
--++---
--++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
--++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
--++ patches/0001-20251104commit.patch             | 1272 -----------------
--++ 3 files changed, 528 insertions(+), 2032 deletions(-)
--++ delete mode 100644 patches/0001-20251104commit.patch
--++
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index 73773c22..2f9192bf 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
--++ 
--++ _CONFIG_FOR_DOC = "DeepseekConfig"
--++ 
--+++_attn_mask_cache = {}
--+++
--+++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--+++    q_len = batch_and_seq[1]
--+++    kv_len = batch_and_seq[1] + past_key_values_length 
--+++    key = (batch_and_seq[0], q_len, kv_len)
--+++
--+++    if key in _attn_mask_cache:
--+++        return _attn_mask_cache[key]
--+++
--+++    mask = _prepare_4d_causal_attention_mask(
--+++        attention_mask,
--+++        batch_and_seq,
--+++        inputs_embeds,
--+++        past_key_values_length,
--+++    )
--+++    _attn_mask_cache[key] = mask
--+++    return mask
--++ 
--++ def _get_unpad_data(attention_mask):
--++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--++         return final_output
--++ 
--++ 
--++-    @no_grad()
--++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++-        expert_cache = ops.zeros_like(x)
--++-        idxs = flat_expert_indices.argsort()
--++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++-        token_idxs = idxs // self.num_experts_per_tok
--++-
--++-        for i, end_idx in enumerate(tokens_per_expert):
--++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++-            if start_idx == end_idx:
--++-                continue
--++-            expert = self.experts[i]
--++-            exp_token_idx = token_idxs[start_idx:end_idx]
--++-            expert_tokens = x[exp_token_idx]
--++-            expert_out = expert(expert_tokens)
--++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++-
--++-        return expert_cache
--++-        
--++     # @no_grad()
--++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-    #     # expert_cache = torch.zeros_like(x)
--++-    #     # idxs = flat_expert_indices.argsort()
--++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++-    #     # token_idxs = idxs // self.num_experts_per_tok
--++-    #     # for i, end_idx in enumerate(tokens_per_expert):
--++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++-    #     #     if start_idx == end_idx:
--++-    #     #         continue
--++-    #     #     expert = self.experts[i]
--++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++-    #     #     expert_tokens = x[exp_token_idx]
--++-    #     #     expert_out = expert(expert_tokens)
--++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++-    #     # return expert_cache
--+++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++     #     expert_cache = ops.zeros_like(x)
--++     #     idxs = flat_expert_indices.argsort()
--++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++ 
--++     #     return expert_cache
--++-    # @no_grad()
--++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-    #     expert_cache = ops.zeros_like(x)
--+++        
--+++    @no_grad()
--+++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++        """
--+++        优化版 MoE prefill：
--+++        - 批量张量化处理同一个 expert 的所有 token
--+++        - 跳过无 token 的专家
--+++        - 保持结果完全一致
--+++        """
--+++        # 初始化输出缓存
--+++        expert_cache = ops.zeros_like(x)
--++ 
--++-    #     # 排序保证顺序一致
--++-    #     idxs = flat_expert_indices.argsort()
--++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++-    #     token_idxs = idxs // self.num_experts_per_tok
--+++        # 排序（确保 scatter_add 位置对应原逻辑）
--+++        idxs = flat_expert_indices.argsort()
--+++        sorted_expert_indices = flat_expert_indices[idxs]
--+++        sorted_token_indices = idxs // self.num_experts_per_tok
--++ 
--++-    #     # 找出有 token 的专家
--++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++        # 每个 expert 的 token 数
--+++        tokens_per_expert = sorted_expert_indices.bincount()
--++ 
--++-    #     for i in active_experts.tolist():
--++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++-    #         end_idx = tokens_per_expert[i]
--++-    #         if start_idx == end_idx:  # 没有 token
--++-    #             continue
--+++        # 找出有 token 的专家
--+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--++ 
--++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++-    #         expert_tokens = x[exp_token_idx]
--++-    #         expert_out = self.experts[i](expert_tokens)
--++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++        for expert_id in active_experts.tolist():
--+++            # 取该 expert 对应的排序后 token 区间
--+++            start = (tokens_per_expert[:expert_id]).sum().item()
--+++            end = start + tokens_per_expert[expert_id].item()
--++ 
--++-    #         expert_cache = mindspore.mint.scatter_add(
--++-    #             expert_cache,
--++-    #             0,
--++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++-    #             expert_out
--++-    #         )
--+++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--+++            expert_tokens = x[token_idx]                     # 取输入向量
--++ 
--++-    #     return expert_cache
--+++            # 执行专家 MLP
--+++            expert_out = self.experts[expert_id](expert_tokens)
--+++
--+++            # 按权重缩放
--+++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--+++
--+++            # 回写到缓存（等价 scatter_add）
--+++            expert_cache = mindspore.mint.scatter_add(
--+++                expert_cache,
--+++                0,
--+++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++                scaled_out
--+++            )
--+++
--+++        return expert_cache
--+++
--+++        # @no_grad()
--+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++        #     # expert_cache = torch.zeros_like(x)
--+++        #     # idxs = flat_expert_indices.argsort()
--+++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++        #     # token_idxs = idxs // self.num_experts_per_tok
--+++        #     # for i, end_idx in enumerate(tokens_per_expert):
--+++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++        #     #     if start_idx == end_idx:
--+++        #     #         continue
--+++        #     #     expert = self.experts[i]
--+++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++        #     #     expert_tokens = x[exp_token_idx]
--+++        #     #     expert_out = expert(expert_tokens)
--+++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++        #     # return expert_cache
--+++        #     expert_cache = ops.zeros_like(x)
--+++        #     idxs = flat_expert_indices.argsort()
--+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++        #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++        #     for i, end_idx in enumerate(tokens_per_expert):
--+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++        #         if start_idx == end_idx:
--+++        #             continue
--+++        #         expert = self.experts[i]
--+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++        #         expert_tokens = x[exp_token_idx]
--+++        #         expert_out = expert(expert_tokens)
--+++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++
--+++        #     return expert_cache
--+++        # @no_grad()
--+++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++        #     expert_cache = ops.zeros_like(x)
--+++
--+++        #     # 排序保证顺序一致
--+++        #     idxs = flat_expert_indices.argsort()
--+++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++        #     token_idxs = idxs // self.num_experts_per_tok
--+++
--+++        #     # 找出有 token 的专家
--+++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++
--+++        #     for i in active_experts.tolist():
--+++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++        #         end_idx = tokens_per_expert[i]
--+++        #         if start_idx == end_idx:  # 没有 token
--+++        #             continue
--+++
--+++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++        #         expert_tokens = x[exp_token_idx]
--+++        #         expert_out = self.experts[i](expert_tokens)
--+++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++
--+++        #         expert_cache = mindspore.mint.scatter_add(
--+++        #             expert_cache,
--+++        #             0,
--+++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++        #             expert_out
--+++        #         )
--+++
--+++        #     return expert_cache
--++ 
--++ 
--++ 
--++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--++-
--++ # class DeepseekFlashAttention(nn.Module):
--++ #     """
--++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--+++
--++ Deepseek_ATTENTION_CLASSES = {
--++     "eager": DeepseekAttention,
--++     "flash-attention": DeepseekFlashAttention,
--++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--++             )
--++         else:
--++             # 4d mask is passed through the layers
--++-            attention_mask = _prepare_4d_causal_attention_mask(
--+++            # attention_mask = _prepare_4d_causal_attention_mask(
--+++            #     attention_mask,
--+++            #     (batch_size, seq_length),
--+++            #     inputs_embeds,
--+++            #     past_key_values_length,
--+++            # )
--+++            #@dwj
--+++            attention_mask = get_cached_causal_mask(
--++                 attention_mask,
--++                 (batch_size, seq_length),
--++                 inputs_embeds,
--++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++         # Initialize weights and apply final processing
--++         self.post_init()
--++         self.warm_up = False
--+++        #@dwj
--+++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--+++            self.num_layers,
--+++            self.num_attention_heads,
--+++            self.head_dim,
--+++            batch_size=1,
--+++            max_length=self.max_length,
--+++            dtype=mindspore.float16
--+++        )
--+++
--+++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--+++        key_cache = []
--+++        value_cache = []
--+++        for _ in range(num_layers):
--+++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++            key_cache.append(k)
--+++            value_cache.append(v)
--+++        return key_cache, value_cache
--+++
--++ 
--++     def warmup_moe_model_deep(self):
--++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++index bced285c..ebd7782e 100644
--++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
--++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--++ 
--++-Long_Prompt = False
--++-PROMPT_LENGTH_THRESHOLD = 128
--+++Long_Prompt = 1
--+++LONG_PROMPT_LENGTH_THRESHOLD = 128
--+++SHORT_PROMPT_LENGTH_THRESHOLD = 32
--+++
--+++_causal_mask_cache = {}
--+++
--+++def get_cached_causal_mask_with_cache_position(
--+++    attention_mask: mindspore.Tensor,
--+++    sequence_length: int,
--+++    target_length: int,
--+++    dtype: mindspore.dtype,
--+++    min_dtype: float,
--+++    cache_position: mindspore.Tensor,
--+++    batch_size: int,
--+++):
--+++    """
--+++    带缓存的 causal mask 构造函数
--+++    """
--+++    # q_len 是当前 query 长度
--+++    q_len = sequence_length
--+++    # kv_len 是 target_length
--+++    kv_len = target_length
--+++
--+++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--+++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--+++
--+++    if key in _causal_mask_cache:
--+++        return _causal_mask_cache[key]
--+++
--+++    # 调用原来的 mask 构造逻辑
--+++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++        attention_mask,
--+++        sequence_length=sequence_length,
--+++        target_length=target_length,
--+++        dtype=dtype,
--+++        min_dtype=min_dtype,
--+++        cache_position=cache_position,
--+++        batch_size=batch_size,
--+++    )
--+++    # 缓存结果
--+++    _causal_mask_cache[key] = causal_mask
--+++    return causal_mask
--++ 
--++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--++ def _prepare_4d_causal_attention_mask_with_cache_position(
--++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++ 
--++ 
--++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--+++# class Qwen2MoeAttention(nn.Module):
--+++#     """
--+++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+++#     and "Generating Long Sequences with Sparse Transformers".
--+++#     """
--+++
--+++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++#         super().__init__()
--+++#         self.config = config
--+++#         self.layer_idx = layer_idx
--+++#         if layer_idx is None:
--+++#             logger.warning_once(
--+++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++#                 "when creating this class."
--+++#             )
--+++
--+++#         self.hidden_size = config.hidden_size
--+++#         self.num_heads = config.num_attention_heads
--+++#         self.head_dim = self.hidden_size // self.num_heads
--+++#         self.num_key_value_heads = config.num_key_value_heads
--+++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++#         self.max_position_embeddings = config.max_position_embeddings
--+++#         self.rope_theta = config.rope_theta
--+++#         self.is_causal = True
--+++#         self.attention_dropout = config.attention_dropout
--+++
--+++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++#             raise ValueError(
--+++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++#                 f" and `num_heads`: {self.num_heads})."
--+++#             )
--+++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++
--+++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++#             self.head_dim,
--+++#             max_position_embeddings=self.max_position_embeddings,
--+++#             base=self.rope_theta,
--+++#         )
--+++
--+++#     def forward(
--+++#         self,
--+++#         hidden_states: mindspore.Tensor,
--+++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++#         position_ids: Optional[mindspore.Tensor] = None,
--+++#         past_key_value: Optional[Cache] = None,
--+++#         output_attentions: bool = False,
--+++#         use_cache: bool = False,
--+++#         cache_position: Optional[mindspore.Tensor] = None,
--+++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++
--+++        
--+++
--+++#         bsz, q_len, _ = hidden_states.shape
--+++
--+++#         query_states = self.q_proj(hidden_states)
--+++#         key_states = self.k_proj(hidden_states)
--+++#         value_states = self.v_proj(hidden_states)
--+++
--+++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++
--+++#         kv_seq_len = key_states.shape[-2]
--+++#         if past_key_value is not None:
--+++#             if self.layer_idx is None:
--+++#                 raise ValueError(
--+++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++#                     "with a layer index."
--+++#                 )
--+++#             if isinstance(past_key_value, StaticCache):
--+++#                 kv_seq_len = key_states.shape[-2]
--+++#             else:
--+++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++
--+++#         if past_key_value is not None:
--+++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++            
--+++#             if isinstance(past_key_value, StaticCache):
--+++#                 kv_seq_len = key_states.shape[-2]
--+++
--+++#         # repeat k/v heads if n_kv_heads < n_heads
--+++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++        
--+++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++
--+++#         if attention_mask is not None:
--+++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++#             attn_weights = attn_weights + causal_mask
--+++
--+++#         # upcast attention to fp32
--+++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++#         attn_output = ops.matmul(attn_weights, value_states)
--+++
--+++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++#             raise ValueError(
--+++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+++#                 f" {attn_output.shape}"
--+++#             )
--+++
--+++#         attn_output = ops.transpose(attn_output, 1, 2)
--+++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++
--+++#         attn_output = self.o_proj(attn_output)
--+++#         # @lwx
--+++        
--+++#         # max_seq_len = self.max_position_embeddings  # 2048
--+++
--+++#         # if attention_mask is not None:
--+++#         #     # attention_mask: [B, 1, Sq, Sk]
--+++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++
--+++#         #     # pad 到 [max_seq_len, max_seq_len]
--+++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++#         #     global_attention_mask = padded_mask
--+++#         # else:
--+++#         #     global_attention_mask = None
--+++
--+++
--+++#         # sparse_mode=3
--+++#         # attn_output = mindspore.ops.flash_attention_score(
--+++#         #     query=query_states,
--+++#         #     key=key_states,
--+++#         #     value=value_states,
--+++#         #     real_shift=None,
--+++#         #     padding_mask=None,
--+++
--+++#         #     head_num=self.num_heads,
--+++#         #     attn_mask=global_attention_mask,
--+++#         #     keep_prob=1.0 - self.attention_dropout,
--+++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++#         #     input_layout="BNSD",
--+++#         #     pre_tokens=2147483647,
--+++#         #     next_tokens=2147483647,
--+++#         #     inner_precise=0,
--+++#         #     drop_mask=None,
--+++#         #     prefix=None,
--+++#         #     actual_seq_qlen=None,
--+++#         #     actual_seq_kvlen=None,
--+++#         #     sparse_mode=sparse_mode,
--+++#         # )
--+++#         if not output_attentions:
--+++#             attn_weights = None
--+++
--+++#         return attn_output, attn_weights, past_key_value
--+++
--++ class Qwen2MoeAttention(nn.Module):
--++     """
--++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--++-    and "Generating Long Sequences with Sparse Transformers".
--++-    """
--+++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
--++ 
--+++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--+++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--+++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--+++
--+++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--+++    """
--++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++         super().__init__()
--++         self.config = config
--++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--++         if layer_idx is None:
--++             logger.warning_once(
--++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++                 "when creating this class."
--++             )
--++ 
--++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--++         use_cache: bool = False,
--++         cache_position: Optional[mindspore.Tensor] = None,
--++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-
--++         
--++-
--+++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--++         bsz, q_len, _ = hidden_states.shape
--++ 
--++         query_states = self.q_proj(hidden_states)
--++         key_states = self.k_proj(hidden_states)
--++         value_states = self.v_proj(hidden_states)
--++ 
--++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++-
--+++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++        
--++         kv_seq_len = key_states.shape[-2]
--++         if past_key_value is not None:
--++-            if self.layer_idx is None:
--++-                raise ValueError(
--++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-                    "with a layer index."
--++-                )
--++-            if isinstance(past_key_value, StaticCache):
--++-                kv_seq_len = key_states.shape[-2]
--++-            else:
--++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++        
--++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++ 
--++         if past_key_value is not None:
--++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++
--+++        # --- 2. 动态调度核心注意力计算 ---
--+++        global Long_Prompt
--+++        if Long_Prompt >= 1:
--+++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--+++            fa_attention_mask = None
--+++            if attention_mask is not None:
--+++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++                fa_attention_mask = (mask_slice != 0)
--+++
--+++            attn_output = mindspore.ops.flash_attention_score(
--+++                query=query_states,
--+++                key=key_states,
--+++                value=value_states,
--+++                head_num=self.num_heads,
--+++                attn_mask=fa_attention_mask,
--+++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--+++                scalar_value=1.0 / math.sqrt(self.head_dim),
--+++                input_layout="BNSD",
--+++                sparse_mode=0,
--+++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--+++            )
--++             
--++-            if isinstance(past_key_value, StaticCache):
--++-                kv_seq_len = key_states.shape[-2]
--+++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++            attn_output = self.o_proj(attn_output)
--+++            attn_weights = None
--+++            if output_attentions:
--+++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--++ 
--++-        # repeat k/v heads if n_kv_heads < n_heads
--++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
--++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
--++-        
--++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++        else:
--+++            # --- Eager Attention 路径 (用于短序列和解码) ---
--+++            key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++            value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++            
--+++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++ 
--++-        if attention_mask is not None:
--++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++-            attn_weights = attn_weights + causal_mask
--+++            if attention_mask is not None:
--+++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++                attn_weights = attn_weights + causal_mask
--++ 
--++-        # upcast attention to fp32
--++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++-        attn_output = ops.matmul(attn_weights, value_states)
--+++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++            attn_output = ops.matmul(attn_weights, value_states)
--++ 
--++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++-            raise ValueError(
--++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--++-                f" {attn_output.shape}"
--++-            )
--+++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++                raise ValueError(
--+++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--+++                )
--++ 
--++-        attn_output = ops.transpose(attn_output, 1, 2)
--++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++            attn_output = ops.transpose(attn_output, 1, 2)
--+++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++            attn_output = self.o_proj(attn_output)
--++ 
--++-        attn_output = self.o_proj(attn_output)
--++-        # @lwx
--+++            if not output_attentions:
--+++                attn_weights = None
--++         
--++-        # max_seq_len = self.max_position_embeddings  # 2048
--++-
--++-        # if attention_mask is not None:
--++-        #     # attention_mask: [B, 1, Sq, Sk]
--++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++-
--++-        #     # pad 到 [max_seq_len, max_seq_len]
--++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++-        #     global_attention_mask = padded_mask
--++-        # else:
--++-        #     global_attention_mask = None
--++-
--++-
--++-        # sparse_mode=3
--++-        # attn_output = mindspore.ops.flash_attention_score(
--++-        #     query=query_states,
--++-        #     key=key_states,
--++-        #     value=value_states,
--++-        #     real_shift=None,
--++-        #     padding_mask=None,
--++-
--++-        #     head_num=self.num_heads,
--++-        #     attn_mask=global_attention_mask,
--++-        #     keep_prob=1.0 - self.attention_dropout,
--++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++-        #     input_layout="BNSD",
--++-        #     pre_tokens=2147483647,
--++-        #     next_tokens=2147483647,
--++-        #     inner_precise=0,
--++-        #     drop_mask=None,
--++-        #     prefix=None,
--++-        #     actual_seq_qlen=None,
--++-        #     actual_seq_kvlen=None,
--++-        #     sparse_mode=sparse_mode,
--++-        # )
--++-        if not output_attentions:
--++-            attn_weights = None
--++-
--++         return attn_output, attn_weights, past_key_value
--++ 
--++-
--++ # class Qwen2MoeFlashAttention(nn.Module):
--++ #     """
--++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
--++ #             return final_hidden_states, router_logits
--++ 
--++ 
--++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++-#     """
--++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
--++-#     """
--++-#     def __init__(self, config: Qwen2MoeConfig):
--++-#         super().__init__()
--++-#         self.num_experts = config.num_experts
--++-#         self.top_k = config.num_experts_per_tok
--++-#         self.norm_topk_prob = config.norm_topk_prob
--++-
--++-#         # 门控网络
--++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++-#         # 专家列表
--++-#         self.experts = nn.ModuleList(
--++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++-#         )
--++-#         # 共享专家
--++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_decode(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         """
--++-#         【解码路径】针对 sequence_length=1 的极致优化。
--++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--++-#         """
--++-#         batch_size, hidden_dim = hidden_states.shape
--++-        
--++-#         expert_outputs_list = [
--++-#             ops.cat([
--++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++-#             ], dim=0) 
--++-#             for i in range(batch_size)
--++-#         ]
--++-        
--++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--++-#         # shape: (batch_size, top_k, hidden_dim)
--++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++-        
--++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++-        
--++-#         return moe_output.squeeze(1)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_prefill(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         """
--++-#         【预填充路径】针对 sequence_length > 1 的优化。
--++-#         按专家对 Token 进行分组，并进行批处理。
--++-#         """
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens = hidden_states.shape[0]
--++-#         flat_selected_experts = selected_experts.flatten()
--++-        
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-        
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-        
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-            
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-#             selected_token_indices = token_indices[mask]
--++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++-            
--++-#             current_states = hidden_states[selected_token_indices]
--++-            
--++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-            
--++-#             moe_output = moe_output.index_add(
--++-#                 dim=0,
--++-#                 index=selected_token_indices,
--++-#                 source=expert_output.to(hidden_states.dtype)
--++-#             )
--++-#         return moe_output
--++-
--++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-#         """
--++-#         顶层 forward 方法，作为智能分发器。
--++-#         """
--++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        
--++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-#         router_logits = self.gate(hidden_states_reshaped)
--++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-
--++-#         if self.norm_topk_prob:
--++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        
--++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++-        
--++-#         moe_output = None
--++-#         # 在推理时，根据序列长度选择最优路径
--++-#         if not self.training:
--++-#             if sequence_length == 1:
--++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++-#             else:
--++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++-#         else:
--++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--++-#             raise NotImplementedError("Training path is not implemented.")
--++-
--++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--++-        
--++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--++-        
--++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--++-        
--++-#         return final_hidden_states, router_logits
--++-
--++-
--++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++-#     """
--++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--++-#     """
--++-#     def __init__(self, config: Qwen2MoeConfig):
--++-#         super().__init__()
--++-#         self.num_experts = config.num_experts
--++-#         self.top_k = config.num_experts_per_tok
--++-#         self.norm_topk_prob = config.norm_topk_prob
--++-
--++-#         # 门控网络
--++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++-#         # 专家列表
--++-#         self.experts = nn.ModuleList(
--++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++-#         )
--++-#         # 共享专家
--++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_decode(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         batch_size, _ = hidden_states.shape
--++-#         expert_outputs_list = [
--++-#             ops.cat([
--++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++-#             ], dim=0) 
--++-#             for i in range(batch_size)
--++-#         ]
--++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++-#         return moe_output.squeeze(1)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_prefill(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens = hidden_states.shape[0]
--++-#         flat_selected_experts = selected_experts.flatten()
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-        
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-#             selected_token_indices = token_indices[mask]
--++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++-#             current_states = hidden_states[selected_token_indices]
--++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-#             moe_output = moe_output.index_add(
--++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++-#             )
--++-#         return moe_output
--++-
--++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-#         """
--++-#         顶层 forward 方法，作为智能分发器。
--++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--++-#         """
--++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        
--++-#         # 1. 门控计算 (通用逻辑)
--++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-#         router_logits = self.gate(hidden_states_reshaped)
--++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-
--++-#         if self.norm_topk_prob:
--++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        
--++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++-        
--++-#         # 2. 智能分发到最优 MoE 路径
--++-#         moe_output = None
--++-#         if not self.training:
--++-#             if sequence_length == 1:
--++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++-#             else:
--++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++-#         else:
--++-#             raise NotImplementedError("Training path is not implemented.")
--++-
--++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++-        
--++-#         # 4. 合并 MoE 输出和共享专家输出
--++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++-        
--++-#         # 5. 恢复原始形状并返回
--++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++-        
--++-#         return final_hidden_states, router_logits
--++-
--++-# prefill fastest
--++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++-#     """
--++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--++-#     """
--++-#     def __init__(self, config: Qwen2MoeConfig):
--++-#         super().__init__()
--++-#         self.num_experts = config.num_experts
--++-#         self.top_k = config.num_experts_per_tok
--++-#         self.norm_topk_prob = config.norm_topk_prob
--++-
--++-#         # 门控网络
--++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++-#         # 专家列表
--++-#         self.experts = nn.ModuleList(
--++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++-#         )
--++-#         # 共享专家
--++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_dispatch(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         """
--++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--++-#         """
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens, _ = hidden_states.shape
--++-        
--++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--++-#         flat_selected_experts = selected_experts.flatten()
--++-#         flat_routing_weights = routing_weights.flatten()
--++-
--++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-
--++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-        
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-            
--++-#             # 找到所有分配给该专家的 token
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-            
--++-#             # 使用 mask 选取对应的 token 和权重
--++-#             current_token_indices = token_indices[mask]
--++-#             current_routing_weights = flat_routing_weights[mask]
--++-#             current_hidden_states = hidden_states[current_token_indices]
--++-            
--++-#             # 对这些 token 进行批处理
--++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++-            
--++-#             # 使用 index_add 将结果精确地加回到对应位置
--++-#             moe_output = moe_output.index_add(
--++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--++-#             )
--++-#         return moe_output
--++-
--++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-#         """
--++-#         顶层 forward 方法，作为智能分发器。
--++-#         """
--++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        
--++-#         # 1. 门控计算
--++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-#         router_logits = self.gate(hidden_states_reshaped)
--++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-
--++-#         if self.norm_topk_prob:
--++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        
--++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++-        
--++-#         # 2. 调用统一的 MoE 计算内核
--++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--++-
--++-#         # 3. 统一处理共享专家
--++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++-        
--++-#         # 4. 合并输出
--++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++-        
--++-#         # 5. 恢复原始形状并返回
--++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++-        
--++-#         return final_hidden_states, router_logits
--++-
--++-
--++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++-#     """
--++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++-#     【最终高性能与高精度版】：
--++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--++-#     3. 这样实现了速度和准确性的两全其美。
--++-#     """
--++-#     def __init__(self, config: Qwen2MoeConfig):
--++-#         super().__init__()
--++-#         self.num_experts = config.num_experts
--++-#         self.top_k = config.num_experts_per_tok
--++-#         self.norm_topk_prob = config.norm_topk_prob
--++-
--++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++-#         self.experts = nn.ModuleList(
--++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++-#         )
--++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_decode(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         """
--++-#         【解码路径】极致优化版：bmm + 高精度累加。
--++-#         """
--++-#         original_dtype = hidden_states.dtype
--++-#         batch_size, _ = hidden_states.shape
--++-        
--++-#         expert_outputs_list = [
--++-#             ops.cat([
--++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++-#             ], dim=0) 
--++-#             for i in range(batch_size)
--++-#         ]
--++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++-
--++-#         # 在 float32 下执行 bmm，得到高精度结果
--++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++-        
--++-#         # 将高精度结果转换回原始数据类型
--++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--++-        
--++-#         return moe_output
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_prefill(
--++-#         self, 
--++-#         hidden_states: mindspore.Tensor, 
--++-#         selected_experts: mindspore.Tensor, 
--++-#         routing_weights: mindspore.Tensor
--++-#     ) -> mindspore.Tensor:
--++-#         """
--++-#         【预填充路径】与原始实现一致，结果精确。
--++-#         """
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens, _ = hidden_states.shape
--++-#         flat_selected_experts = selected_experts.flatten()
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-        
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-#             selected_token_indices = token_indices[mask]
--++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++-#             current_states = hidden_states[selected_token_indices]
--++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-#             moe_output = moe_output.index_add(
--++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++-#             )
--++-#         return moe_output
--++-
--++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-        
--++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-#         router_logits = self.gate(hidden_states_reshaped)
--++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-
--++-#         if self.norm_topk_prob:
--++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        
--++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--++-#         # 如果模型主体是 float16，后续再转换
--++-        
--++-#         moe_output = None
--++-#         if not self.training:
--++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--++-#             # _moe_infer_decode 内部会处理好类型转换
--++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--++-#             if sequence_length == 1:
--++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++-#             else:
--++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++-#         else:
--++-#             raise NotImplementedError("Training path is not implemented.")
--++-
--++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++-        
--++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++-        
--++-#         return final_hidden_states, router_logits
--++-    
--++-
--++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++-#     """
--++-#     【融合版】一个混合专家模块，内置两种推理策略，
--++-#     由外部全局变量 `Long_Prompt` 控制：
--++-
--++-#     - if Long_Prompt is True:  【精度优先模式】
--++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--++-#       适用于处理长序列，避免误差累积。
--++-
--++-#     - if Long_Prompt is False: 【速度优先模式】
--++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--++-#       在解码阶段获得极致速度，同时保证结果高度准确。
--++-#     """
--++-#     def __init__(self, config: Qwen2MoeConfig):
--++-#         super().__init__()
--++-#         self.num_experts = config.num_experts
--++-#         self.top_k = config.num_experts_per_tok
--++-#         self.norm_topk_prob = config.norm_topk_prob
--++-
--++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++-#         self.experts = nn.ModuleList(
--++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++-#         )
--++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++-
--++-#     # --- 速度优先模式的辅助函数 ---
--++-#     @no_grad()
--++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++-#         original_dtype = hidden_states.dtype
--++-#         batch_size, _ = hidden_states.shape
--++-#         expert_outputs_list = [
--++-#             ops.cat([
--++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++-#             ], dim=0) 
--++-#             for i in range(batch_size)
--++-#         ]
--++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++-#         weights_fp32 = routing_weights.to(mindspore.float32)
--++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
--++-
--++-#     @no_grad()
--++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens, _ = hidden_states.shape
--++-#         flat_selected_experts = selected_experts.flatten()
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-#             selected_token_indices = token_indices[mask]
--++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++-#             current_states = hidden_states[selected_token_indices]
--++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--++-#         return moe_output
--++-
--++-#     # --- 精度优先模式的辅助函数 ---
--++-#     @no_grad()
--++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++-#         moe_output = ops.zeros_like(hidden_states)
--++-#         num_tokens, _ = hidden_states.shape
--++-#         flat_selected_experts = selected_experts.flatten()
--++-#         flat_routing_weights = routing_weights.flatten()
--++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++-#         active_experts = ops.unique(flat_selected_experts)
--++-#         for expert_idx_tensor in active_experts:
--++-#             expert_idx = expert_idx_tensor.item()
--++-#             expert_layer = self.experts[expert_idx]
--++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++-#             current_token_indices = token_indices[mask]
--++-#             current_routing_weights = flat_routing_weights[mask]
--++-#             current_hidden_states = hidden_states[current_token_indices]
--++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++-#         return moe_output
--++-
--++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++-#         # 声明我们将要使用一个在模块外部定义的全局变量
--++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--++-#         global Long_Prompt
--++-
--++-#         # 1. 门控计算 (所有模式通用)
--++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-#         router_logits = self.gate(hidden_states_reshaped)
--++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++-#         if self.norm_topk_prob:
--++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-        
--++-#         moe_output = None
--++-#         if not self.training:
--++-#             # 根据 Long_Prompt 标志选择模式
--++-#             if Long_Prompt:
--++-#                 # --- 精度优先模式 ---
--++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++-#             else:
--++-#                 # --- 速度优先模式 ---
--++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++-#                 if sequence_length == 1:
--++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++-#                 else:
--++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++-#         else:
--++-#             raise NotImplementedError("Training path is not implemented.")
--++-
--++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++-        
--++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++-        
--++-#         return final_hidden_states, router_logits
--++-    
--++ class Qwen2MoeSparseMoeBlock(nn.Module):
--++     """
--++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++         return moe_output_fp32.squeeze(1).to(original_dtype)
--++ 
--+++    # @no_grad()
--+++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++    #     num_tokens, _ = hidden_states.shape
--+++    #     flat_selected_experts = selected_experts.flatten()
--+++    #     sorted_expert_indices = flat_selected_experts.argsort()
--+++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++    #     original_token_indices = sorted_expert_indices // self.top_k
--+++    #     moe_output = ops.zeros_like(hidden_states)
--+++    #     current_token_offset = 0
--+++    #     for i in range(self.num_experts):
--+++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--+++    #         if expert_token_count == 0:
--+++    #             continue
--+++    #         end_offset = current_token_offset + expert_token_count
--+++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--+++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++    #         current_token_offset += expert_token_count
--+++    #     return moe_output
--+++
--++     @no_grad()
--++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++-        num_tokens, _ = hidden_states.shape
--++-        flat_selected_experts = selected_experts.flatten()
--++-        sorted_expert_indices = flat_selected_experts.argsort()
--++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++-        original_token_indices = sorted_expert_indices // self.top_k
--+++        """
--+++        优化版 MoE prefill (速度优先模式)：
--+++        - 批量张量化处理同一个 expert 的所有 token
--+++        - 跳过无 token 的专家
--+++        - 保持结果完全一致
--+++        """
--++         moe_output = ops.zeros_like(hidden_states)
--++-        current_token_offset = 0
--++-        for i in range(self.num_experts):
--++-            expert_token_count = tokens_per_expert[i] - current_token_offset
--++-            if expert_token_count == 0:
--++-                continue
--++-            end_offset = current_token_offset + expert_token_count
--++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++-            expert_hidden_states = hidden_states[expert_original_token_indices]
--++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++-            current_token_offset += expert_token_count
--+++
--+++        flat_selected_experts = selected_experts.flatten()
--+++        flat_routing_weights = routing_weights.flatten()
--+++
--+++        idxs = flat_selected_experts.argsort()
--+++        sorted_expert_indices = flat_selected_experts[idxs]
--+++        sorted_token_indices = idxs // self.top_k
--+++
--+++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--+++
--+++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+++
--+++        for expert_id in active_experts.tolist():
--+++            start = int(tokens_per_expert[:expert_id].sum().item())
--+++            end = start + int(tokens_per_expert[expert_id].item())
--+++
--+++            token_idx = sorted_token_indices[start:end]
--+++            expert_tokens = hidden_states[token_idx]
--+++
--+++            expert_out = self.experts[expert_id](expert_tokens)
--+++
--+++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--+++
--+++            moe_output = mindspore.mint.scatter_add(
--+++                moe_output,
--+++                0,
--+++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--+++                scaled_out.to(hidden_states.dtype)
--+++            )
--+++
--++         return moe_output
--++ 
--+++
--++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--++     @no_grad()
--++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++         
--++         moe_output = None
--++-        if Long_Prompt:
--++-            # --- 精度优先模式 (ACCURACY MODE) ---
--++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        # if Long_Prompt==0:
--+++        #     # --- 精度优先模式 (ACCURACY MODE) ---
--+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        # else:
--+++        #     # --- 速度优先模式 (SPEED MODE) ---
--+++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++        #     if sequence_length == 1:
--+++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        #     else:
--+++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++        
--+++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++        if sequence_length == 1:
--+++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++         else:
--++-            # --- 速度优先模式 (SPEED MODE) ---
--++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++-            if sequence_length == 1:
--++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++-            else:
--++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++-        
--+++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++    
--++ 
--++         # 3. 共享专家计算与合并 (所有模式通用)
--++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++         
--++         return final_hidden_states, router_logits
--++ 
--+++
--++ class Qwen2MoeDecoderLayer(nn.Module):
--++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--++         super().__init__()
--++         self.hidden_size = config.hidden_size
--++         
--++-        # if Long_Prompt:
--++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++-        # else:
--+++        # if Long_Prompt == 2:
--++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++        # else:
--+++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++ 
--++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++ 
--++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++             )
--++ 
--++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++        #     attention_mask,
--+++        #     sequence_length=sequence_length,
--+++        #     target_length=target_length,
--+++        #     dtype=dtype,
--+++        #     min_dtype=min_dtype,
--+++        #     cache_position=cache_position,
--+++        #     batch_size=input_tensor.shape[0],
--+++        # )
--+++        #@dwj
--+++        causal_mask = get_cached_causal_mask_with_cache_position(
--++             attention_mask,
--++             sequence_length=sequence_length,
--++             target_length=target_length,
--++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--++         """
--++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--+++        _causal_mask_cache.clear()
--++ 
--++         input_ids = kwargs.get("input_ids")
--++         if input_ids is None and args:
--++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++ 
--++         if input_ids is not None:
--++             prompt_length = input_ids.shape[1]
--++-            
--++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--++-                Long_Prompt = True
--+++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--+++                Long_Prompt = 2
--+++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--+++                Long_Prompt = 0
--++             else:
--++-                Long_Prompt = False
--+++                Long_Prompt = 1
--+++
--++ 
--++         return super().generate(*args, **kwargs)
--++     
--++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++             dtype = self.lm_head.weight.dtype
--++             min_dtype = float(ops.finfo(dtype).min)
--++ 
--++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++            #     attention_mask,
--+++            #     sequence_length=sequence_length,
--+++            #     target_length=past_key_values.get_max_length(),
--+++            #     dtype=dtype,
--+++            #     min_dtype=min_dtype,
--+++            #     cache_position=cache_position,
--+++            #     batch_size=batch_size,
--+++            # )
--+++
--+++            #@dwj
--+++            attention_mask = get_cached_causal_mask_with_cache_position(
--++                 attention_mask,
--++                 sequence_length=sequence_length,
--++                 target_length=past_key_values.get_max_length(),
--++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++deleted file mode 100644
--++index 6dfb5b93..00000000
--++--- a/patches/0001-20251104commit.patch
--+++++ /dev/null
--++@@ -1,1272 +0,0 @@
--++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++-From: Pinoeer-kingxi <13022943007@163.com>
--++-Date: Tue, 4 Nov 2025 09:11:51 +0800
--++-Subject: [PATCH] 20251104commit
--++-
--++----
--++- mindnlp/transformers/cache_utils.py           |  28 +-
--++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++- 3 files changed, 976 insertions(+), 87 deletions(-)
--++-
--++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++-index cadd2e04..02f8d4be 100644
--++---- a/mindnlp/transformers/cache_utils.py
--++-+++ b/mindnlp/transformers/cache_utils.py
--++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++-             # k_out[:, :, cache_position] = key_states
--++-             # v_out[:, :, cache_position] = value_states
--++--            if ON_ORANGE_PI:
--++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++--            else:
--++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++--
--++-+            # if ON_ORANGE_PI:
--++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++-+            # else:
--++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++-+            # 确保 cache_position 是 1D tensor 并且类型正确
--++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++-+            if cache_position.ndim > 1:
--++-+                cache_position = cache_position.flatten()
--++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
--++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++-+                cache_position = cache_position.int()
--++-+            
--++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++-+            k_out[:, :, cache_position] = key_states
--++-+            v_out[:, :, cache_position] = value_states
--++-+            
--++-         return k_out, v_out
--++- 
--++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++-index c695b944..d8303e45 100644
--++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++- # Copied from transformers.models.llama.modeling_llama.rotate_half
--++- def rotate_half(x):
--++-     """Rotates half the hidden dims of the input."""
--++--    x1 = x[..., : x.shape[-1] // 2]
--++--    x2 = x[..., x.shape[-1] // 2 :]
--++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++-+    # x1 = x[..., : x.shape[-1] // 2]
--++-+    # x2 = x[..., x.shape[-1] // 2 :]
--++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++-     return ops.cat((-x2, x1), dim=-1)
--++- 
--++- 
--++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++-         if self.training:
--++-             raise NotImplementedError("Training is not supported yet.")
--++-         else:
--++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++--        if self.config.n_shared_experts is not None:
--++--            y = y + self.shared_experts(identity)
--++--        return y
--++-+            # @lwx
--++-+            if orig_shape[1] == 1:
--++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++-+                y=y.view(*orig_shape)
--++-+                if self.config.n_shared_experts is not None:
--++-+                    y = y + self.shared_experts(identity)
--++-+                return y
--++-+            else:
--++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++-+                if self.config.n_shared_experts is not None:
--++-+                    y = y + self.shared_experts(identity)
--++-+                return y
--++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++-+        # if self.config.n_shared_experts is not None:
--++-+        #     y = y + self.shared_experts(identity)
--++-+        # return y
--++-+        
--++-+    @no_grad()
--++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++-+
--++-+        expert_cache = ops.zeros_like(x)
--++-+        for i in range(self.num_experts_per_tok):
--++-+            expert_id = flat_expert_indices[i].item()
--++-+            weight = flat_expert_weights[i].item()
--++-+            expert = self.experts[expert_id]
--++-+            expert_out = expert(x)
--++-+            expert_cache += expert_out * weight
--++-+        return expert_cache
--++- 
--++-     @no_grad()
--++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++--        # expert_cache = torch.zeros_like(x)
--++--        # idxs = flat_expert_indices.argsort()
--++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++--        # token_idxs = idxs // self.num_experts_per_tok
--++--        # for i, end_idx in enumerate(tokens_per_expert):
--++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++--        #     if start_idx == end_idx:
--++--        #         continue
--++--        #     expert = self.experts[i]
--++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++--        #     expert_tokens = x[exp_token_idx]
--++--        #     expert_out = expert(expert_tokens)
--++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++--        # return expert_cache
--++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++-         expert_cache = ops.zeros_like(x)
--++-         idxs = flat_expert_indices.argsort()
--++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++-         token_idxs = idxs // self.num_experts_per_tok
--++-+
--++-         for i, end_idx in enumerate(tokens_per_expert):
--++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++-             if start_idx == end_idx:
--++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++-             expert_out = expert(expert_tokens)
--++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++-+
--++-         return expert_cache
--++-+        
--++-+    # @no_grad()
--++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-+    #     # expert_cache = torch.zeros_like(x)
--++-+    #     # idxs = flat_expert_indices.argsort()
--++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++-+    #     # token_idxs = idxs // self.num_experts_per_tok
--++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
--++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++-+    #     #     if start_idx == end_idx:
--++-+    #     #         continue
--++-+    #     #     expert = self.experts[i]
--++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++-+    #     #     expert_tokens = x[exp_token_idx]
--++-+    #     #     expert_out = expert(expert_tokens)
--++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++-+    #     # return expert_cache
--++-+    #     expert_cache = ops.zeros_like(x)
--++-+    #     idxs = flat_expert_indices.argsort()
--++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++-+    #     token_idxs = idxs // self.num_experts_per_tok
--++-+
--++-+    #     for i, end_idx in enumerate(tokens_per_expert):
--++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++-+    #         if start_idx == end_idx:
--++-+    #             continue
--++-+    #         expert = self.experts[i]
--++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++-+    #         expert_tokens = x[exp_token_idx]
--++-+    #         expert_out = expert(expert_tokens)
--++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++-+
--++-+    #     return expert_cache
--++-+    # @no_grad()
--++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++-+    #     expert_cache = ops.zeros_like(x)
--++-+
--++-+    #     # 排序保证顺序一致
--++-+    #     idxs = flat_expert_indices.argsort()
--++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++-+    #     token_idxs = idxs // self.num_experts_per_tok
--++-+
--++-+    #     # 找出有 token 的专家
--++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++-+
--++-+    #     for i in active_experts.tolist():
--++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++-+    #         end_idx = tokens_per_expert[i]
--++-+    #         if start_idx == end_idx:  # 没有 token
--++-+    #             continue
--++-+
--++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++-+    #         expert_tokens = x[exp_token_idx]
--++-+    #         expert_out = self.experts[i](expert_tokens)
--++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++-+
--++-+    #         expert_cache = mindspore.mint.scatter_add(
--++-+    #             expert_cache,
--++-+    #             0,
--++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++-+    #             expert_out
--++-+    #         )
--++-+
--++-+    #     return expert_cache
--++-+
--++-+
--++- 
--++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++- #     """
--++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++- 
--++-         # Initialize weights and apply final processing
--++-         self.post_init()
--++-+        self.warm_up = False
--++-+
--++-+    def warmup_moe_model_deep(self):
--++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++-+        test_texts = [
--++-+            "warmup short",
--++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++-+        ]
--++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++-+        if tokenizer is None:
--++-+            from mindnlp.transformers import AutoTokenizer
--++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++-+            self._warmup_tokenizer = tokenizer
--++-+
--++-+        for text in test_texts:
--++-+            inputs = tokenizer(text, return_tensors="ms")
--++-+            with mindspore._no_grad():
--++-+                _ = self(**inputs, use_cache=False)
--++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++- 
--++-     def get_input_embeddings(self):
--++-         return self.model.embed_tokens
--++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++-         ```"""
--++-+        if not self.warm_up:
--++-+            self.warm_up = True
--++-+            self.warmup_moe_model_deep()
--++-+
--++-         output_attentions = (
--++-             output_attentions
--++-             if output_attentions is not None
--++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++-index 3cbf820e..d4c6b651 100644
--++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++-@@ -18,7 +18,6 @@
--++- # See the License for the specific language governing permissions and
--++- # limitations under the License.
--++- """MindSpore Qwen2MoE model."""
--++--
--++- import math
--++- from typing import List, Optional, Tuple, Union
--++- 
--++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++-     TokenClassifierOutput,
--++- )
--++- from ...modeling_utils import PreTrainedModel
--++-+from ...generation import GenerationMixin
--++- from ....utils import logging
--++- from .configuration_qwen2_moe import Qwen2MoeConfig
--++- 
--++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++-         self.variance_epsilon = eps
--++- 
--++-     def forward(self, hidden_states):
--++-+        # @dwj
--++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++-+        # @lwx
--++-+        # if not self.training :
--++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++-         input_dtype = hidden_states.dtype
--++-         hidden_states = hidden_states.to(mindspore.float32)
--++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++-@@ -234,6 +239,8 @@ def rotate_half(x):
--++-     """Rotates half the hidden dims of the input."""
--++-     x1 = x[..., : x.shape[-1] // 2]
--++-     x2 = x[..., x.shape[-1] // 2 :]
--++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++-     return ops.cat((-x2, x1), dim=-1)
--++- 
--++- 
--++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++-         self.config = config
--++-         self.hidden_size = config.hidden_size
--++-         self.intermediate_size = intermediate_size
--++-+        
--++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++-         self.act_fn = ACT2FN[config.hidden_act]
--++- 
--++-     def forward(self, x):
--++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++--
--++- 
--++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++-+        # @lwx 
--++-+        # gate_up_output = self.gate_up_proj(x)
--++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++-+        # return self.down_proj(swiglu_output)
--++-+
--++-+    # def forward(self, x):
--++-+    #     gate_proj_out = self.gate_proj(x)
--++-+    #     up_proj_out = self.up_proj(x)
--++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++-+    #     return self.down_proj(swiglu_out)
--++-+        
--++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++-     """
--++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++-         use_cache: bool = False,
--++-         cache_position: Optional[mindspore.Tensor] = None,
--++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-+
--++-+        
--++-+
--++-         bsz, q_len, _ = hidden_states.shape
--++- 
--++-         query_states = self.q_proj(hidden_states)
--++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-                     "with a layer index."
--++-                 )
--++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-+            if isinstance(past_key_value, StaticCache):
--++-+                kv_seq_len = key_states.shape[-2]
--++-+            else:
--++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++- 
--++-         if past_key_value is not None:
--++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++-+            
--++-+            if isinstance(past_key_value, StaticCache):
--++-+                kv_seq_len = key_states.shape[-2]
--++- 
--++-         # repeat k/v heads if n_kv_heads < n_heads
--++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++--
--++-+        
--++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++- 
--++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++--            raise ValueError(
--++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++--                f" {attn_weights.shape}"
--++--            )
--++--
--++--        if attention_mask is not None:  # no matter the length, we just slice it
--++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++-+        if attention_mask is not None:
--++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++-             attn_weights = attn_weights + causal_mask
--++- 
--++-         # upcast attention to fp32
--++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++- 
--++-         attn_output = self.o_proj(attn_output)
--++--
--++-+        # @lwx
--++-+        
--++-+        # max_seq_len = self.max_position_embeddings  # 2048
--++-+
--++-+        # if attention_mask is not None:
--++-+        #     # attention_mask: [B, 1, Sq, Sk]
--++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++-+
--++-+        #     # pad 到 [max_seq_len, max_seq_len]
--++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++-+        #     global_attention_mask = padded_mask
--++-+        # else:
--++-+        #     global_attention_mask = None
--++-+
--++-+
--++-+        # sparse_mode=3
--++-+        # attn_output = mindspore.ops.flash_attention_score(
--++-+        #     query=query_states,
--++-+        #     key=key_states,
--++-+        #     value=value_states,
--++-+        #     real_shift=None,
--++-+        #     padding_mask=None,
--++-+
--++-+        #     head_num=self.num_heads,
--++-+        #     attn_mask=global_attention_mask,
--++-+        #     keep_prob=1.0 - self.attention_dropout,
--++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++-+        #     input_layout="BNSD",
--++-+        #     pre_tokens=2147483647,
--++-+        #     next_tokens=2147483647,
--++-+        #     inner_precise=0,
--++-+        #     drop_mask=None,
--++-+        #     prefix=None,
--++-+        #     actual_seq_qlen=None,
--++-+        #     actual_seq_kvlen=None,
--++-+        #     sparse_mode=sparse_mode,
--++-+        # )
--++-         if not output_attentions:
--++-             attn_weights = None
--++- 
--++-         return attn_output, attn_weights, past_key_value
--++- 
--++- 
--++-+class Qwen2MoeFlashAttention(nn.Module):
--++-+    """
--++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++-+
--++-+    关键改动:
--++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++-+       直接传入原始的 key 和 value 张量效率更高。
--++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++-+    """
--++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++-+        super().__init__()
--++-+        self.config = config
--++-+        self.layer_idx = layer_idx
--++-+        self.hidden_size = config.hidden_size
--++-+        self.num_heads = config.num_attention_heads
--++-+        self.head_dim = self.hidden_size // self.num_heads
--++-+        self.num_key_value_heads = config.num_key_value_heads
--++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++-+        self.max_position_embeddings = config.max_position_embeddings
--++-+        self.rope_theta = config.rope_theta
--++-+        self.attention_dropout = config.attention_dropout
--++-+
--++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
--++-+            raise ValueError(
--++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++-+            )
--++-+
--++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++-+
--++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++-+            self.head_dim,
--++-+            max_position_embeddings=self.max_position_embeddings,
--++-+            base=self.rope_theta,
--++-+        )
--++-+
--++-+    def forward(
--++-+        self,
--++-+        hidden_states: mindspore.Tensor,
--++-+        attention_mask: Optional[mindspore.Tensor] = None,
--++-+        position_ids: Optional[mindspore.Tensor] = None,
--++-+        past_key_value: Optional[Cache] = None,
--++-+        output_attentions: bool = False,
--++-+        use_cache: bool = False,
--++-+        cache_position: Optional[mindspore.Tensor] = None,
--++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-+
--++-+        bsz, q_len, _ = hidden_states.shape
--++-+
--++-+        # 1. 线性投射 Q, K, V
--++-+        query_states = self.q_proj(hidden_states)
--++-+        key_states = self.k_proj(hidden_states)
--++-+        value_states = self.v_proj(hidden_states)
--++-+
--++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
--++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+
--++-+        # 3. RoPE 旋转位置编码
--++-+        kv_seq_len = key_states.shape[-2]
--++-+        if past_key_value is not None:
--++-+            if self.layer_idx is None:
--++-+                raise ValueError(
--++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-+                    "with a layer index."
--++-+                )
--++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++-+                if cache_position.shape[0] == 1:
--++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++-+                    kv_seq_len = past_seen_tokens + 1
--++-+                else:
--++-+                    # prefill 阶段：cache_position 是范围，使用其长度
--++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++-+            else:
--++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-+
--++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-+
--++-+        # 4. KV 缓存更新
--++-+        if past_key_value is not None:
--++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-+            key_states, value_states = past_key_value.update(
--++-+                key_states, value_states, self.layer_idx, cache_kwargs
--++-+            )
--++-+            
--++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++-+                if cache_position.shape[0] == 1:
--++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++-+                    kv_seq_len = key_states.shape[-2]
--++-+
--++-+        # 5. [重要] 准备 Attention Mask
--++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++-+        fa_attention_mask = None
--++-+        if attention_mask is not None:
--++-+            # 截取与当前key长度匹配的部分
--++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++-+            fa_attention_mask = (mask_slice != 0)
--++-+
--++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++-+        input_dtype = query_states.dtype
--++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++-+            query_states = query_states.to(mindspore.float16)
--++-+            key_states = key_states.to(mindspore.float16)
--++-+            value_states = value_states.to(mindspore.float16)
--++-+
--++-+        # 6. [核心] 调用 flash_attention_score 算子
--++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++-+        attn_output = mindspore.ops.flash_attention_score(
--++-+            query=query_states,
--++-+            key=key_states,
--++-+            value=value_states,
--++-+            head_num=self.num_heads, # 传入Q的头数(N1)
--++-+            attn_mask=fa_attention_mask,
--++-+            keep_prob=1.0 - self.attention_dropout,
--++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
--++-+            input_layout="BNSD",
--++-+            sparse_mode=0 # 使用 defaultMask 模式
--++-+        )
--++-+
--++-+        # 恢复原始数据类型
--++-+        attn_output = attn_output.to(input_dtype)
--++-+
--++-+        # 7. 调整输出形状
--++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-+        attn_output = self.o_proj(attn_output)
--++-+
--++-+        # FlashAttention 算子不直接返回注意力权重矩阵
--++-+        attn_weights = None
--++-+        if output_attentions:
--++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++-+
--++-+        return attn_output, attn_weights, past_key_value
--++-+
--++-+    # def forward(
--++-+    #     self,
--++-+    #     hidden_states: mindspore.Tensor,
--++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--++-+    #     past_key_value: Optional[Cache] = None,
--++-+    #     output_attentions: bool = False,
--++-+    #     use_cache: bool = False,
--++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-+
--++-+    #     bsz, q_len, _ = hidden_states.shape
--++-+
--++-+    #     # 1. 线性投射 Q, K, V
--++-+    #     query_states = self.q_proj(hidden_states)
--++-+    #     key_states = self.k_proj(hidden_states)
--++-+    #     value_states = self.v_proj(hidden_states)
--++-+
--++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+
--++-+    #     # 3. RoPE 旋转位置编码
--++-+    #     kv_seq_len = key_states.shape[-2]
--++-+    #     if past_key_value is not None:
--++-+    #         if self.layer_idx is None:
--++-+    #             raise ValueError(
--++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-+    #                 "with a layer index."
--++-+    #             )
--++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-+
--++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-+
--++-+    #     # 4. KV 缓存更新
--++-+    #     if past_key_value is not None:
--++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-+    #         key_states, value_states = past_key_value.update(
--++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--++-+    #         )
--++-+
--++-+    #     # 5. 准备 Attention Mask
--++-+    #     fa_attention_mask = None
--++-+    #     if attention_mask is not None:
--++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-+    #         fa_attention_mask = (mask_slice != 0)
--++-+
--++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++-+    #     input_dtype = query_states.dtype
--++-+
--++-+    #     # 6. [核心] 调用 flash_attention_score 算子
--++-+    #     attn_output = mindspore.ops.flash_attention_score(
--++-+    #         query=query_states,
--++-+    #         key=key_states,
--++-+    #         value=value_states,
--++-+    #         head_num=self.num_heads,
--++-+    #         attn_mask=fa_attention_mask,
--++-+    #         keep_prob=1.0 - self.attention_dropout,
--++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++-+    #         input_layout="BNSD",
--++-+    #         sparse_mode=0,
--++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++-+    #         inner_precise=1
--++-+    #     )
--++-+
--++-+    #     # 恢复原始数据类型
--++-+    #     attn_output = attn_output.to(input_dtype)
--++-+
--++-+    #     # 7. 调整输出形状
--++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-+    #     attn_output = self.o_proj(attn_output)
--++-+
--++-+    #     attn_weights = None
--++-+    #     if output_attentions:
--++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++-+
--++-+    #     return attn_output, attn_weights, past_key_value
--++-+
--++-+    # def forward(
--++-+    #     self,
--++-+    #     hidden_states: mindspore.Tensor,
--++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--++-+    #     past_key_value: Optional[Cache] = None,
--++-+    #     output_attentions: bool = False,
--++-+    #     use_cache: bool = False,
--++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-+
--++-+    #     bsz, q_len, _ = hidden_states.shape
--++-+
--++-+    #     query_states = self.q_proj(hidden_states)
--++-+    #     key_states = self.k_proj(hidden_states)
--++-+    #     value_states = self.v_proj(hidden_states)
--++-+
--++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-+
--++-+    #     kv_seq_len = key_states.shape[-2]
--++-+    #     if past_key_value is not None:
--++-+    #         if self.layer_idx is None:
--++-+    #             raise ValueError("`layer_idx` must be specified for caching")
--++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-+
--++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-+
--++-+    #     if past_key_value is not None:
--++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++-+    #         key_states, value_states = past_key_value.update(
--++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--++-+    #         )
--++-+
--++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++-+
--++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++-+    #     query_states = query_states / math.sqrt(self.head_dim)
--++-+    #     # <--- 修改结束 ---
--++-+
--++-+    #     fa_attention_mask = None
--++-+    #     if attention_mask is not None:
--++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++-+    #         fa_attention_mask = (mask_slice != 0)
--++-+
--++-+    #     input_dtype = query_states.dtype
--++-+
--++-+    #     attn_output = mindspore.ops.flash_attention_score(
--++-+    #         query=query_states,    # 传入已经预先缩放过的 query
--++-+    #         key=key_states,
--++-+    #         value=value_states,
--++-+    #         head_num=self.num_heads,
--++-+    #         attn_mask=fa_attention_mask,
--++-+    #         keep_prob=1.0 - self.attention_dropout,
--++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++-+    #         input_layout="BNSD",
--++-+    #         sparse_mode=0,
--++-+    #         inner_precise=1        # 仍然保持内部高精度计算
--++-+    #     )
--++-+
--++-+    #     attn_output = attn_output.to(input_dtype)
--++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-+    #     attn_output = self.o_proj(attn_output)
--++-+
--++-+    #     attn_weights = None
--++-+    #     if output_attentions:
--++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++-+
--++-+    #     return attn_output, attn_weights, past_key_value
--++-+
--++- QWEN2MOE_ATTENTION_CLASSES = {
--++-     "eager": Qwen2MoeAttention,
--++-+    "flash-attention": Qwen2MoeFlashAttention,
--++- }
--++- 
--++- 
--++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++- 
--++-+    #@dwj
--++-+    # 只遍历激活的专家，而非全部专家
--++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++--        hidden_states = hidden_states.view(-1, hidden_dim)
--++--        # router_logits: (batch * sequence_length, n_experts)
--++--        router_logits = self.gate(hidden_states)
--++--
--++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++--        if self.norm_topk_prob:
--++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++--        # we cast back to the input dtype
--++--        routing_weights = routing_weights.to(hidden_states.dtype)
--++--
--++--        final_hidden_states = ops.zeros(
--++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++--        )
--++--
--++--        # One hot encode the selected experts to create an expert mask
--++--        # this will be used to easily index which expert is going to be sollicitated
--++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++--
--++--        # Loop over all available experts in the model and perform the computation on each expert
--++--        for expert_idx in range(self.num_experts):
--++--            expert_layer = self.experts[expert_idx]
--++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++--
--++--            # Index the correct hidden states and compute the expert hidden state for
--++--            # the current expert. We need to make sure to multiply the output hidden
--++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++--            if 0 not in idx.shape:
--++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++--
--++--                # However `index_add_` only support torch tensors for indexing so we'll use
--++--                # the `top_x` tensor here.
--++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++--
--++--        shared_expert_output = self.shared_expert(hidden_states)
--++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++--
--++--        final_hidden_states = final_hidden_states + shared_expert_output
--++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++-+            num_tokens = hidden_states_reshaped.shape[0]
--++-+
--++-+            router_logits = self.gate(hidden_states_reshaped)
--++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++-+
--++-+            if self.norm_topk_prob:
--++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++-+            routing_weights = routing_weights.to(hidden_states.dtype)
--++-+            
--++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++-+            flat_selected_experts = selected_experts.flatten()
--++-+            
--++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++-+            token_indices = broadcasted_token_indices.flatten()
--++-+            
--++-+            active_experts = ops.unique(flat_selected_experts)
--++-+            
--++-+            for expert_idx_tensor in active_experts:
--++-+                expert_idx = expert_idx_tensor.item()
--++-+                expert_layer = self.experts[expert_idx]
--++-+                
--++-+                mask = (flat_selected_experts == expert_idx_tensor)
--++-+                selected_token_indices = token_indices[mask]
--++-+                selected_routing_weights = routing_weights.flatten()[mask]
--++-+                
--++-+                current_states = hidden_states_reshaped[selected_token_indices]
--++-+                
--++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++-+                
--++-+                final_hidden_states = final_hidden_states.index_add(
--++-+                    dim=0,
--++-+                    index=selected_token_indices,
--++-+                    source=expert_output.to(hidden_states.dtype)
--++-+                )
--++-+            
--++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++- 
--++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++--        return final_hidden_states, router_logits
--++-+            final_hidden_states = final_hidden_states + shared_expert_output
--++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++-+            
--++-+            return final_hidden_states, router_logits
--++- 
--++- 
--++- class Qwen2MoeDecoderLayer(nn.Module):
--++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++- 
--++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++- 
--++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++-+
--++-         if (layer_idx not in config.mlp_only_layers) and (
--++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++-         ):
--++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++-     _skip_keys_device_placement = "past_key_values"
--++-     _supports_cache_class = True
--++-+#lwx
--++-+    # _supports_static_cache = True
--++- 
--++-     def _init_weights(self, module):
--++-         std = self.config.initializer_range
--++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++-         return causal_mask
--++- 
--++- 
--++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++-     _tied_weights_keys = ["lm_head.weight"]
--++- 
--++-     def __init__(self, config):
--++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++-         self.num_experts_per_tok = config.num_experts_per_tok
--++-         # Initialize weights and apply final processing
--++-         self.post_init()
--++-+        # @lwx
--++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++-+        #     self.generation_config.cache_implementation = "static"
--++-+        self._warmed_up = False
--++-+
--++-+    def warmup_moe_model(self):
--++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++-+        test_texts = [
--++-+            "warmup short",
--++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++-+        ]
--++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++-+        if tokenizer is None:
--++-+            from mindnlp.transformers import AutoTokenizer
--++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++-+            self._warmup_tokenizer = tokenizer
--++-+
--++-+        for text in test_texts:
--++-+            inputs = tokenizer(text, return_tensors="ms")
--++-+            with mindspore._no_grad():
--++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++- 
--++-     def get_input_embeddings(self):
--++-         return self.model.embed_tokens
--++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++-         ```"""
--++-+        if not self._warmed_up:
--++-+            self._warmed_up = True
--++-+            self.warmup_moe_model()
--++- 
--++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++-         output_router_logits = (
--++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++-             }
--++-         )
--++-         return model_inputs
--++-+# @lwx
--++-+    # def _decode_one_tokens_logits(
--++-+    #     self,
--++-+    #     cur_token: mindspore.Tensor,
--++-+    #     input_pos: Optional[mindspore.Tensor],
--++-+    #     cache_position: mindspore.Tensor,
--++-+    #     past_key_values: StaticCache,
--++-+    # ) -> mindspore.Tensor:
--++-+    #     """
--++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++-+        
--++-+    #     Args:
--++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++-+    #         input_pos: 输入位置信息，可选
--++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++-+            
--++-+    #     Returns:
--++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++-+    #     """
--++-+    #     # 调用JIT编译的版本
--++-+    #     return self.get_decode_one_tokens_logits(
--++-+    #         cur_token=cur_token,
--++-+    #         input_pos=input_pos,
--++-+    #         cache_position=cache_position,
--++-+    #         past_key_values=past_key_values,
--++-+    #     )
--++-+    
--++-+    # @mindspore.jit(jit_level='O1')
--++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++-+    #     """
--++-+    #     JIT编译的函数，用于高效的单token解码
--++-+    #     使用JIT编译优化以支持静态shape和高效执行
--++-+        
--++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++-+    #     """
--++-+    #     outputs = self.model.forward(
--++-+    #         input_ids=cur_token,
--++-+    #         position_ids=input_pos,
--++-+    #         cache_position=cache_position,
--++-+    #         past_key_values=past_key_values,
--++-+    #         use_cache=True,
--++-+    #         return_dict=False,
--++-+    #     )
--++-+        
--++-+    #     hidden_states = outputs[0]
--++-+    #     logits = self.lm_head.forward(hidden_states)
--++-+    #     logits = logits.float()
--++-+        
--++-+    #     return logits[:, -1, :]
--++-+
--++-+    # def _sample(
--++-+    #     self,
--++-+    #     input_ids: mindspore.Tensor,
--++-+    #     logits_processor,
--++-+    #     stopping_criteria,
--++-+    #     generation_config,
--++-+    #     synced_devices: bool,
--++-+    #     streamer=None,
--++-+    #     logits_warper=None,
--++-+    #     **model_kwargs,
--++-+    # ):
--++-+    #     """
--++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++-+    #     """
--++-+    #     from ...generation.logits_process import LogitsProcessorList
--++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++-+    #     from mindnlp.core import nn, ops, no_grad
--++-+    #     import numpy as np
--++-+        
--++-+    #     # 检查是否使用 StaticCache
--++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++-+    #     # 否则，直接调用父类方法
--++-+    #     past_key_values = model_kwargs.get("past_key_values")
--++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++-+
--++-+    #     if not isinstance(past_key_values, StaticCache):
--++-+    #         # 不使用 StaticCache，直接调用父类方法
--++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++-+    #         return super()._sample(
--++-+    #             input_ids=input_ids,
--++-+    #             logits_processor=logits_processor,
--++-+    #             stopping_criteria=stopping_criteria,
--++-+    #             generation_config=generation_config,
--++-+    #             synced_devices=synced_devices,
--++-+    #             streamer=streamer,
--++-+    #             logits_warper=logits_warper,
--++-+    #             **model_kwargs,
--++-+    #         )
--++-+        
--++-+    #     # 使用 StaticCache，进入自定义循环
--++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++-+    #     pad_token_id = generation_config._pad_token_tensor
--++-+    #     output_attentions = generation_config.output_attentions
--++-+    #     output_hidden_states = generation_config.output_hidden_states
--++-+    #     output_scores = generation_config.output_scores
--++-+    #     output_logits = generation_config.output_logits
--++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++-+    #     max_length = generation_config.max_length
--++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++-+    #     do_sample = generation_config.do_sample
--++-+        
--++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++-+    #         raise ValueError(
--++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++-+    #             f"{logits_warper})."
--++-+    #         )
--++-+        
--++-+    #     # init attention / hidden states / scores tuples
--++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
--++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++-+        
--++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++-+    #         encoder_hidden_states = (
--++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++-+    #         )
--++-+        
--++-+    #     # keep track of which sequences are already finished
--++-+    #     batch_size, cur_len = input_ids.shape
--++-+    #     this_peer_finished = False
--++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++-+        
--++-+    #     time_record = []
--++-+    #     from ....utils.testing_utils import parse_flag_from_env
--++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++-+        
--++-+    #     while self._has_unfinished_sequences(
--++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++-+    #     ):
--++-+    #         if _record_time:
--++-+    #             import time as time_module
--++-+    #             infer_start = time_module.time()
--++-+            
--++-+    #         # prepare model inputs
--++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++-+            
--++-+    #         # prepare variable output controls
--++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++-+            
--++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++-+    #         cur_cache_position = model_inputs.get("cache_position")
--++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
--++-+    #         cur_input_ids = model_inputs.get("input_ids")
--++-+            
--++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++-+    #             cur_cache_position is not None and 
--++-+    #             len(cur_cache_position.shape) > 0 and
--++-+    #             cur_cache_position.shape[0] == 1 and
--++-+    #             cur_input_ids is not None and
--++-+    #             cur_input_ids.shape[1] == 1):
--++-+    #             # 使用 JIT 优化的单 token 解码
--++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++-+    #             if not hasattr(self, '_jit_used'):
--++-+    #                 self._jit_used = False
--++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++-+                
--++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
--++-+    #                 cur_token=cur_input_ids,
--++-+    #                 input_pos=model_inputs.get("position_ids"),
--++-+    #                 cache_position=cur_cache_position,
--++-+    #                 past_key_values=cur_past_key_values,
--++-+    #             )
--++-+                
--++-+    #             # 标记已使用JIT（用于后续判断）
--++-+    #             if not self._jit_used:
--++-+    #                 self._jit_used = True
--++-+                
--++-+    #             # 构造兼容的输出对象
--++-+    #             class JitOptimizedOutput:
--++-+    #                 def __init__(self, logits, config):
--++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++-+    #                     self.config = config
--++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
--++-+    #                     self.cross_attentions = None
--++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++-+                
--++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++-+    #         else:
--++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++-+    #             outputs = self(**model_inputs, return_dict=True)
--++-+            
--++-+    #         if synced_devices and this_peer_finished:
--++-+    #             continue
--++-+            
--++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++-+    #         next_token_logits = outputs.logits[:, -1, :]
--++-+            
--++-+    #         # pre-process distribution
--++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++-+    #         if do_sample:
--++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++-+            
--++-+    #         # Store scores, attentions and hidden_states when required
--++-+    #         if return_dict_in_generate:
--++-+    #             if output_scores:
--++-+    #                 scores += (next_token_scores,)
--++-+    #             if output_logits:
--++-+    #                 raw_logits += (next_token_logits,)
--++-+    #             if output_attentions:
--++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++-+    #                 if self.config.is_encoder_decoder:
--++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++-+                
--++-+    #             if output_hidden_states:
--++-+    #                 hidden = (
--++-+    #                     outputs.decoder_hidden_states
--++-+    #                     if self.config.is_encoder_decoder
--++-+    #                     else outputs.hidden_states
--++-+    #                 )
--++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++-+            
--++-+    #         # token selection
--++-+    #         if do_sample:
--++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++-+    #         else:
--++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++-+            
--++-+    #         # finished sentences should have their next token be a padding token
--++-+    #         if has_eos_stopping_criteria:
--++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++-+            
--++-+    #         # update generated ids, model inputs, and length for next step
--++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++-+    #         if streamer is not None:
--++-+    #             streamer.put(next_tokens)
--++-+            
--++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
--++-+    #             outputs,
--++-+    #             model_kwargs,
--++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++-+    #         )
--++-+            
--++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++-+    #         cur_len += 1
--++-+            
--++-+    #         if _record_time:
--++-+    #             import time as time_module
--++-+    #             infer_stop = time_module.time()
--++-+    #             time_record.append(infer_stop - infer_start)
--++-+            
--++-+    #         del outputs
--++-+        
--++-+    #     average_infer_time = None
--++-+    #     if time_record:
--++-+    #         if len(time_record) > 1:
--++-+    #             time_record.pop(0)
--++-+    #         average_infer_time = sum(time_record) / len(time_record)
--++-+    #         print(f'average inference time is: {average_infer_time}')
--++-+    #         print(f'inference time record: {time_record}')
--++-+        
--++-+    #     if streamer is not None:
--++-+    #         streamer.end()
--++-+        
--++-+    #     # 简单判断：打印是否使用了JIT路径
--++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
--++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
--++-+    #     else:
--++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++-+        
--++-+    #     if return_dict_in_generate:
--++-+    #         if self.config.is_encoder_decoder:
--++-+    #             return GenerateEncoderDecoderOutput(
--++-+    #                 sequences=input_ids,
--++-+    #                 scores=scores,
--++-+    #                 logits=raw_logits,
--++-+    #                 encoder_attentions=encoder_attentions,
--++-+    #                 encoder_hidden_states=encoder_hidden_states,
--++-+    #                 decoder_attentions=decoder_attentions,
--++-+    #                 cross_attentions=cross_attentions,
--++-+    #                 decoder_hidden_states=decoder_hidden_states,
--++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--++-+    #                 average_infer_time=average_infer_time
--++-+    #             )
--++-+    #         else:
--++-+    #             return GenerateDecoderOnlyOutput(
--++-+    #                 sequences=input_ids,
--++-+    #                 scores=scores,
--++-+    #                 logits=raw_logits,
--++-+    #                 attentions=decoder_attentions,
--++-+    #                 hidden_states=decoder_hidden_states,
--++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--++-+    #                 average_infer_time=average_infer_time
--++-+    #             )
--++-+    #     else:
--++-+    #         return input_ids
--++-+            
--++-+    # def _prepare_cache_for_generation(
--++-+    #     self,
--++-+    #     generation_config,
--++-+    #     model_kwargs,
--++-+    #     assistant_model,
--++-+    #     batch_size,
--++-+    #     max_cache_length,
--++-+    # ):
--++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++-+    #         generation_config.cache_implementation = "static"
--++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++-+        
--++-+    #     if generation_config.cache_implementation == "static":
--++-+    #         base_required_from_max_length = generation_config.max_length + 1
--++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
--++-+    #         min_cache_size = 50
--++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++-+    #         else:
--++-+    #             max_cache_length = max(base_required, min_cache_size)
--++-+            
--++-+    #         original_max_cache_length = max_cache_length
--++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
--++-+            
--++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++-+    #             if max_cache_length > self.config.max_position_embeddings:
--++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++-+        
--++-+    #     result = super()._prepare_cache_for_generation(
--++-+    #         generation_config=generation_config,
--++-+    #         model_kwargs=model_kwargs,
--++-+    #         assistant_model=assistant_model,
--++-+    #         batch_size=batch_size,
--++-+    #         max_cache_length=max_cache_length,
--++-+    #     )
--++-+        
--++-+    #     if generation_config.cache_implementation == "static":
--++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++-+    #         created_cache = model_kwargs.get(cache_name)
--++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++-+    #             if created_cache.max_cache_len < generation_config.max_length:
--++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++-+        
--++-+    #     return result
--++-+
--++-+
--++-+
--++- 
--++- 
--++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++--- 
--++-2.27.0
--++-
--++-- 
--++2.27.0
--++
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
-diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
-deleted file mode 100644
-index 80906633..00000000
---- a/patches/0006-20251107002commit.patch
-+++ /dev/null
-@@ -1,7931 +0,0 @@
--From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Fri, 7 Nov 2025 12:06:32 +0800
--Subject: [PATCH 6/8] 20251107002commit
--
-----
-- .../models/deepseek/modeling_deepseek.py      |  122 +-
-- patches/0001-20251104commit.patch             |    2 +-
-- patches/0002-20251106commit.patch             |    2 +-
-- patches/0003-20261106secondcommit.patch       |    2 +-
-- patches/0004-20251106change.patch             |    2 +-
-- patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
-- 6 files changed, 7773 insertions(+), 64 deletions(-)
-- create mode 100644 patches/0005-20251107001commit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index 8831e4b7..e7e1c053 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
--     #         expert_out = expert(x)
--     #         expert_cache += expert_out * weight
--     #     return expert_cache
---
---    # @no_grad()
---    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
---    #     # x 的 shape: (1, hidden_size)
---    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
---    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
---
---    #     # 1. 收集所有需要的专家层
---    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
---    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
---
---    #     # 2. 并行计算所有专家的输出
---    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
---    #     # ops.cat 会将它们堆叠成一个新的 Tensor
---    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
---    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
---
---    #     # 3. 使用矩阵乘法进行加权求和
---    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
---    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
---    #     # 最终结果 final_output 的 shape: (1, hidden_size)
---    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+    
--+    @no_grad()
--+    dwj
--+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+        # x 的 shape: (1, hidden_size)
--+        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+
--+        # 1. 收集所有需要的专家层
--+        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+
--+        # 2. 并行计算所有专家的输出
--+        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+        # ops.cat 会将它们堆叠成一个新的 Tensor
--+        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+
--+        # 3. 使用矩阵乘法进行加权求和
--+        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+        # 最终结果 final_output 的 shape: (1, hidden_size)
--+        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--         
---    #     return final_output
--+        return final_output
-- 
-- 
--     # @no_grad()
--@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
-- 
--         return expert_cache
-- # 放置在 DeepseekMoE 类中
---    @no_grad()
---    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
---        """
---        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
---
---        Args:
---            x (Tensor): 输入张量, shape: (1, hidden_size)
---            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
---            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
---        """
---        top_k, _ = flat_expert_weights.shape
---        hidden_size = x.shape[-1]
---
---        # 1. 将所有专家的权重堆叠起来
---        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
---        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
---        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--+    # @no_grad()
--+    # #lwx 20251107
--+    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+    #     """
--+    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--+
--+    #     Args:
--+    #         x (Tensor): 输入张量, shape: (1, hidden_size)
--+    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--+    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--+    #     """
--+    #     top_k, _ = flat_expert_weights.shape
--+    #     hidden_size = x.shape[-1]
--+
--+    #     # 1. 将所有专家的权重堆叠起来
--+    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--+    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--+    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--         
---        # 2. "收集" 所需的专家权重
---        selected_gate_w = stacked_gate_w[flat_expert_indices]
---        selected_up_w = stacked_up_w[flat_expert_indices]
---        selected_down_w = stacked_down_w[flat_expert_indices]
--+    #     # 2. "收集" 所需的专家权重
--+    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
--+    #     selected_up_w = stacked_up_w[flat_expert_indices]
--+    #     selected_down_w = stacked_down_w[flat_expert_indices]
--         
---        # 3. 准备输入
---        x_expanded = x.expand((top_k, 1, hidden_size))
--+    #     # 3. 准备输入
--+    #     x_expanded = x.expand((top_k, 1, hidden_size))
--         
---        # 4. 并行计算 gate_proj 和 up_proj
---        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
---        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--+    #     # 4. 并行计算 gate_proj 和 up_proj
--+    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--+    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
-- 
---        # 5. 计算中间状态
---        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--+    #     # 5. 计算中间状态
--+    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--         
---        # 6. 并行计算 down_proj
---        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
---        #    --- [FIX] ---
---        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
---        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
---        #    --- [FIX END] ---
--+    #     # 6. 并行计算 down_proj
--+    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--+    #     #    --- [FIX] ---
--+    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--+    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--+    #     #    --- [FIX END] ---
--         
---        # 7. 根据路由权重进行加权求和
---        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--+    #     # 7. 根据路由权重进行加权求和
--+    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
-- 
---        return weighted_sum
--+    #     return weighted_sum
-- 
-- 
-- 
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--index 0a0ef2d7..2842180e 100644
----- a/patches/0001-20251104commit.patch
--+++ b/patches/0001-20251104commit.patch
--@@ -1,7 +1,7 @@
-- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Tue, 4 Nov 2025 09:11:51 +0800
---Subject: [PATCH 1/4] 20251104commit
--+Subject: [PATCH 1/5] 20251104commit
-- 
-- ---
--  mindnlp/transformers/cache_utils.py           |  28 +-
--diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--index 5185270c..c6cd8757 100644
----- a/patches/0002-20251106commit.patch
--+++ b/patches/0002-20251106commit.patch
--@@ -1,7 +1,7 @@
-- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 09:20:38 +0800
---Subject: [PATCH 2/4] 20251106commit
--+Subject: [PATCH 2/5] 20251106commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--index 3e05f821..601960c9 100644
----- a/patches/0003-20261106secondcommit.patch
--+++ b/patches/0003-20261106secondcommit.patch
--@@ -1,7 +1,7 @@
-- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 14:54:37 +0800
---Subject: [PATCH 3/4] 20261106secondcommit
--+Subject: [PATCH 3/5] 20261106secondcommit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--index 88a1aef4..8976f10b 100644
----- a/patches/0004-20251106change.patch
--+++ b/patches/0004-20251106change.patch
--@@ -1,7 +1,7 @@
-- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 15:48:09 +0800
---Subject: [PATCH 4/4] 20251106change
--+Subject: [PATCH 4/5] 20251106change
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  189 +-
--diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--new file mode 100644
--index 00000000..8d9032be
----- /dev/null
--+++ b/patches/0005-20251107001commit.patch
--@@ -0,0 +1,7707 @@
--+From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Fri, 7 Nov 2025 11:48:18 +0800
--+Subject: [PATCH 5/5] 20251107001commit
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |   91 +-
--+ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
--+ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
--+ patches/0001-20251104commit.patch             |    2 +-
--+ patches/0002-20251106commit.patch             |    2 +-
--+ patches/0003-20261106secondcommit.patch       |    2 +-
--+ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
--+ 7 files changed, 7577 insertions(+), 30 deletions(-)
--+ create mode 100644 patches/0004-20251106change.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index 0546f318..8831e4b7 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
--+     #         expert_cache += expert_out * weight
--+     #     return expert_cache
--+ 
--+-    @no_grad()
--+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+-        # x 的 shape: (1, hidden_size)
--+-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+-
--+-        # 1. 收集所有需要的专家层
--+-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+-        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+-
--+-        # 2. 并行计算所有专家的输出
--+-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+-        # ops.cat 会将它们堆叠成一个新的 Tensor
--+-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+-
--+-        # 3. 使用矩阵乘法进行加权求和
--+-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+-        # 最终结果 final_output 的 shape: (1, hidden_size)
--+-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++    # @no_grad()
--++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++    #     # x 的 shape: (1, hidden_size)
--++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++
--++    #     # 1. 收集所有需要的专家层
--++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--++
--++    #     # 2. 并行计算所有专家的输出
--++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++
--++    #     # 3. 使用矩阵乘法进行加权求和
--++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+         
--+-        return final_output
--++    #     return final_output
--+ 
--+ 
--+     # @no_grad()
--+@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
--+             )
--+ 
--+         return expert_cache
--++# 放置在 DeepseekMoE 类中
--++    @no_grad()
--++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++        """
--++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--++
--++        Args:
--++            x (Tensor): 输入张量, shape: (1, hidden_size)
--++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--++        """
--++        top_k, _ = flat_expert_weights.shape
--++        hidden_size = x.shape[-1]
--++
--++        # 1. 将所有专家的权重堆叠起来
--++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--++        
--++        # 2. "收集" 所需的专家权重
--++        selected_gate_w = stacked_gate_w[flat_expert_indices]
--++        selected_up_w = stacked_up_w[flat_expert_indices]
--++        selected_down_w = stacked_down_w[flat_expert_indices]
--++        
--++        # 3. 准备输入
--++        x_expanded = x.expand((top_k, 1, hidden_size))
--++        
--++        # 4. 并行计算 gate_proj 和 up_proj
--++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--++
--++        # 5. 计算中间状态
--++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--++        
--++        # 6. 并行计算 down_proj
--++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--++        #    --- [FIX] ---
--++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--++        #    --- [FIX END] ---
--++        
--++        # 7. 根据路由权重进行加权求和
--++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--++
--++        return weighted_sum
--++
--++
--+ 
--+         # @no_grad()
--+         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+index ebd7782e..913a7609 100644
--+--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
--+ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+-    x1 = x[..., : x.shape[-1] // 2]
--+-    x2 = x[..., x.shape[-1] // 2 :]
--++    # x1 = x[..., : x.shape[-1] // 2]
--++    # x2 = x[..., x.shape[-1] // 2 :]
--+     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+index d059dcbe..2b217b64 100644
--+--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
--+ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+ def rotate_half(x):
--+     """Rotates half the hidden dims of the input."""
--+-    x1 = x[..., : x.shape[-1] // 2]
--+-    x2 = x[..., x.shape[-1] // 2 :]
--++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++    # x1 = x[..., : x.shape[-1] // 2]
--++    # x2 = x[..., x.shape[-1] // 2 :]
--++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+     return ops.cat((-x2, x1), dim=-1)
--+ 
--+ 
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+index 78f22642..0a0ef2d7 100644
--+--- a/patches/0001-20251104commit.patch
--++++ b/patches/0001-20251104commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Tue, 4 Nov 2025 09:11:51 +0800
--+-Subject: [PATCH 1/3] 20251104commit
--++Subject: [PATCH 1/4] 20251104commit
--+ 
--+ ---
--+  mindnlp/transformers/cache_utils.py           |  28 +-
--+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+index 22b65dd5..5185270c 100644
--+--- a/patches/0002-20251106commit.patch
--++++ b/patches/0002-20251106commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 09:20:38 +0800
--+-Subject: [PATCH 2/3] 20251106commit
--++Subject: [PATCH 2/4] 20251106commit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+index 966529e4..3e05f821 100644
--+--- a/patches/0003-20261106secondcommit.patch
--++++ b/patches/0003-20261106secondcommit.patch
--+@@ -1,7 +1,7 @@
--+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 14:54:37 +0800
--+-Subject: [PATCH 3/3] 20261106secondcommit
--++Subject: [PATCH 3/4] 20261106secondcommit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--+new file mode 100644
--+index 00000000..88a1aef4
--+--- /dev/null
--++++ b/patches/0004-20251106change.patch
--+@@ -0,0 +1,7498 @@
--++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Thu, 6 Nov 2025 15:48:09 +0800
--++Subject: [PATCH 4/4] 20251106change
--++
--++---
--++ .../models/deepseek/modeling_deepseek.py      |  189 +-
--++ patches/0001-20251104commit.patch             | 1272 +++++++
--++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
--++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
--++ 4 files changed, 7244 insertions(+), 186 deletions(-)
--++ create mode 100644 patches/0001-20251104commit.patch
--++ create mode 100644 patches/0002-20251106commit.patch
--++ create mode 100644 patches/0003-20261106secondcommit.patch
--++
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index 2f9192bf..0546f318 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
--++ 
--++         return attn_output, attn_weights, past_key_value
--++ 
--++-# class DeepseekFlashAttention(nn.Module):
--++-#     """
--++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--++-
--++-#     This class is designed as a drop-in replacement for DeepseekAttention.
--++-#     """
--++-
--++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++-#         super().__init__()
--++-#         self.config = config
--++-#         self.layer_idx = layer_idx
--++-#         if layer_idx is None:
--++-#             logger.warning(
--++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++-#                 "when creating this class."
--++-#             )
--++-
--++-#         self.attention_dropout = config.attention_dropout
--++-#         self.hidden_size = config.hidden_size
--++-#         self.num_heads = config.num_attention_heads
--++-#         self.head_dim = self.hidden_size // self.num_heads
--++-#         self.num_key_value_heads = config.num_key_value_heads
--++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++-#         self.max_position_embeddings = config.max_position_embeddings
--++-#         self.rope_theta = config.rope_theta
--++-#         self.is_causal = True
--++-
--++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++-#             raise ValueError(
--++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++-#                 f" and `num_heads`: {self.num_heads})."
--++-#             )
--++-
--++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++-#         self._init_rope()
--++-
--++-#     def _init_rope(self):
--++-#         if self.config.rope_scaling is None:
--++-#             self.rotary_emb = DeepseekRotaryEmbedding(
--++-#                 self.head_dim,
--++-#                 max_position_embeddings=self.max_position_embeddings,
--++-#                 base=self.rope_theta,
--++-#             )
--++-#         else:
--++-#             scaling_type = self.config.rope_scaling["type"]
--++-#             scaling_factor = self.config.rope_scaling["factor"]
--++-#             if scaling_type == "linear":
--++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++-#                     self.head_dim,
--++-#                     max_position_embeddings=self.max_position_embeddings,
--++-#                     scaling_factor=scaling_factor,
--++-#                     base=self.rope_theta,
--++-#                 )
--++-#             elif scaling_type == "dynamic":
--++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++-#                     self.head_dim,
--++-#                     max_position_embeddings=self.max_position_embeddings,
--++-#                     scaling_factor=scaling_factor,
--++-#                     base=self.rope_theta,
--++-#                 )
--++-#             else:
--++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++-
--++-#     def forward(
--++-#         self,
--++-#         hidden_states: mindspore.Tensor,
--++-#         attention_mask: Optional[mindspore.Tensor] = None,
--++-#         position_ids: Optional[mindspore.Tensor] = None,
--++-#         past_key_value: Optional[Cache] = None,
--++-#         output_attentions: bool = False,
--++-#         use_cache: bool = False,
--++-#         **kwargs,
--++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++-#         if "padding_mask" in kwargs:
--++-#             warnings.warn(
--++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++-#             )
--++-        
--++-#         if output_attentions:
--++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--++-
--++-#         bsz, q_len, _ = hidden_states.shape
--++-
--++-#         if self.config.pretraining_tp > 1:
--++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++-
--++-#         query_states = self.q_proj(hidden_states)
--++-#         key_states = self.k_proj(hidden_states)
--++-#         value_states = self.v_proj(hidden_states)
--++-
--++-#         # Reshape for multi-head attention
--++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++-
--++-#         kv_seq_len = key_states.shape[-2]
--++-#         if past_key_value is not None:
--++-#             if self.layer_idx is None:
--++-#                 raise ValueError(
--++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++-#                     "with a layer index."
--++-#                 )
--++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++-        
--++-#         # Apply Rotary Positional Embedding
--++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++-
--++-#         if past_key_value is not None:
--++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++-
--++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++-        
--++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++-        
--++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++-
--++-#         # Convert attention_mask for flash_attention_score
--++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--++-#         if attention_mask is not None:
--++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++-#                 raise ValueError(
--++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++-#                 )
--++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--++-#         else:
--++-#             attn_mask_for_fa = None
--++-        
--++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++-
--++-#         # Call the fused flash_attention_score operator
--++-#         attn_output = mindspore.ops.flash_attention_score(
--++-#             query=query_states_for_fa,
--++-#             key=key_states_for_fa,
--++-#             value=value_states_for_fa,
--++-#             head_num=self.num_heads, # This is N1, the number of query heads
--++-#             input_layout='BSH',
--++-#             attn_mask=attn_mask_for_fa,
--++-#             keep_prob=keep_prob,
--++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++-#             sparse_mode=0 # Default mask mode
--++-#         )
--++-        
--++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--++-#         attn_output = self.o_proj(attn_output)
--++-        
--++-#         # Flash Attention does not return attention weights
--++-#         attn_weights = None
--++-
--++-#         return attn_output, attn_weights, past_key_value
--++ 
--++ class DeepseekFlashAttention(nn.Module):
--++     """
--++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
--++         super().__init__()
--++         self.hidden_size = config.hidden_size
--++ 
--++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--++-            config=config, layer_idx=layer_idx
--++-        )
--+++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--+++            # config=config, layer_idx=layer_idx
--+++        # )
--++ 
--++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--++             config=config, layer_idx=layer_idx
--++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
--++         return outputs
--++ 
--++ 
--++-
--++ class DeepseekPreTrainedModel(PreTrainedModel):
--++     config_class = DeepseekConfig
--++     base_model_prefix = "model"
--++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++         # Initialize weights and apply final processing
--++         self.post_init()
--++         self.warm_up = False
--++-        #@dwj
--++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--++-            self.num_layers,
--++-            self.num_attention_heads,
--++-            self.head_dim,
--++-            batch_size=1,
--++-            max_length=self.max_length,
--++-            dtype=mindspore.float16
--++-        )
--++-
--++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--++-        key_cache = []
--++-        value_cache = []
--++-        for _ in range(num_layers):
--++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++-            key_cache.append(k)
--++-            value_cache.append(v)
--++-        return key_cache, value_cache
--++-
--++ 
--++     def warmup_moe_model_deep(self):
--++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++new file mode 100644
--++index 00000000..78f22642
--++--- /dev/null
--+++++ b/patches/0001-20251104commit.patch
--++@@ -0,0 +1,1272 @@
--+++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++Subject: [PATCH 1/3] 20251104commit
--+++
--+++---
--+++ mindnlp/transformers/cache_utils.py           |  28 +-
--+++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++ 3 files changed, 976 insertions(+), 87 deletions(-)
--+++
--+++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++index cadd2e04..02f8d4be 100644
--+++--- a/mindnlp/transformers/cache_utils.py
--++++++ b/mindnlp/transformers/cache_utils.py
--+++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++             # k_out[:, :, cache_position] = key_states
--+++             # v_out[:, :, cache_position] = value_states
--+++-            if ON_ORANGE_PI:
--+++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++-            else:
--+++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++-
--++++            # if ON_ORANGE_PI:
--++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++            # else:
--++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++            # 确保 cache_position 是 1D tensor 并且类型正确
--++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++++            if cache_position.ndim > 1:
--++++                cache_position = cache_position.flatten()
--++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++++                cache_position = cache_position.int()
--++++            
--++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++++            k_out[:, :, cache_position] = key_states
--++++            v_out[:, :, cache_position] = value_states
--++++            
--+++         return k_out, v_out
--+++ 
--+++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index c695b944..d8303e45 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++-    x1 = x[..., : x.shape[-1] // 2]
--+++-    x2 = x[..., x.shape[-1] // 2 :]
--++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++    # x1 = x[..., : x.shape[-1] // 2]
--++++    # x2 = x[..., x.shape[-1] // 2 :]
--++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++         if self.training:
--+++             raise NotImplementedError("Training is not supported yet.")
--+++         else:
--+++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++-        if self.config.n_shared_experts is not None:
--+++-            y = y + self.shared_experts(identity)
--+++-        return y
--++++            # @lwx
--++++            if orig_shape[1] == 1:
--++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++++                y=y.view(*orig_shape)
--++++                if self.config.n_shared_experts is not None:
--++++                    y = y + self.shared_experts(identity)
--++++                return y
--++++            else:
--++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++++                if self.config.n_shared_experts is not None:
--++++                    y = y + self.shared_experts(identity)
--++++                return y
--++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++        # if self.config.n_shared_experts is not None:
--++++        #     y = y + self.shared_experts(identity)
--++++        # return y
--++++        
--++++    @no_grad()
--++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++
--++++        expert_cache = ops.zeros_like(x)
--++++        for i in range(self.num_experts_per_tok):
--++++            expert_id = flat_expert_indices[i].item()
--++++            weight = flat_expert_weights[i].item()
--++++            expert = self.experts[expert_id]
--++++            expert_out = expert(x)
--++++            expert_cache += expert_out * weight
--++++        return expert_cache
--+++ 
--+++     @no_grad()
--+++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-        # expert_cache = torch.zeros_like(x)
--+++-        # idxs = flat_expert_indices.argsort()
--+++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++-        # token_idxs = idxs // self.num_experts_per_tok
--+++-        # for i, end_idx in enumerate(tokens_per_expert):
--+++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++-        #     if start_idx == end_idx:
--+++-        #         continue
--+++-        #     expert = self.experts[i]
--+++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++-        #     expert_tokens = x[exp_token_idx]
--+++-        #     expert_out = expert(expert_tokens)
--+++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++-        # return expert_cache
--++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++         expert_cache = ops.zeros_like(x)
--+++         idxs = flat_expert_indices.argsort()
--+++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++         token_idxs = idxs // self.num_experts_per_tok
--++++
--+++         for i, end_idx in enumerate(tokens_per_expert):
--+++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++             if start_idx == end_idx:
--+++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++             expert_out = expert(expert_tokens)
--+++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++
--+++         return expert_cache
--++++        
--++++    # @no_grad()
--++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++    #     # expert_cache = torch.zeros_like(x)
--++++    #     # idxs = flat_expert_indices.argsort()
--++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++    #     # token_idxs = idxs // self.num_experts_per_tok
--++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++    #     #     if start_idx == end_idx:
--++++    #     #         continue
--++++    #     #     expert = self.experts[i]
--++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #     #     expert_tokens = x[exp_token_idx]
--++++    #     #     expert_out = expert(expert_tokens)
--++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++    #     # return expert_cache
--++++    #     expert_cache = ops.zeros_like(x)
--++++    #     idxs = flat_expert_indices.argsort()
--++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++    #     for i, end_idx in enumerate(tokens_per_expert):
--++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++    #         if start_idx == end_idx:
--++++    #             continue
--++++    #         expert = self.experts[i]
--++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #         expert_tokens = x[exp_token_idx]
--++++    #         expert_out = expert(expert_tokens)
--++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++
--++++    #     return expert_cache
--++++    # @no_grad()
--++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++    #     expert_cache = ops.zeros_like(x)
--++++
--++++    #     # 排序保证顺序一致
--++++    #     idxs = flat_expert_indices.argsort()
--++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++    #     # 找出有 token 的专家
--++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++
--++++    #     for i in active_experts.tolist():
--++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++    #         end_idx = tokens_per_expert[i]
--++++    #         if start_idx == end_idx:  # 没有 token
--++++    #             continue
--++++
--++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++    #         expert_tokens = x[exp_token_idx]
--++++    #         expert_out = self.experts[i](expert_tokens)
--++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++
--++++    #         expert_cache = mindspore.mint.scatter_add(
--++++    #             expert_cache,
--++++    #             0,
--++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++    #             expert_out
--++++    #         )
--++++
--++++    #     return expert_cache
--++++
--++++
--+++ 
--+++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++ #     """
--+++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++ 
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--++++        self.warm_up = False
--++++
--++++    def warmup_moe_model_deep(self):
--++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++        test_texts = [
--++++            "warmup short",
--++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++++        ]
--++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++        if tokenizer is None:
--++++            from mindnlp.transformers import AutoTokenizer
--++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++            self._warmup_tokenizer = tokenizer
--++++
--++++        for text in test_texts:
--++++            inputs = tokenizer(text, return_tensors="ms")
--++++            with mindspore._no_grad():
--++++                _ = self(**inputs, use_cache=False)
--++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++ 
--+++     def get_input_embeddings(self):
--+++         return self.model.embed_tokens
--+++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++         ```"""
--++++        if not self.warm_up:
--++++            self.warm_up = True
--++++            self.warmup_moe_model_deep()
--++++
--+++         output_attentions = (
--+++             output_attentions
--+++             if output_attentions is not None
--+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++index 3cbf820e..d4c6b651 100644
--+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++@@ -18,7 +18,6 @@
--+++ # See the License for the specific language governing permissions and
--+++ # limitations under the License.
--+++ """MindSpore Qwen2MoE model."""
--+++-
--+++ import math
--+++ from typing import List, Optional, Tuple, Union
--+++ 
--+++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++     TokenClassifierOutput,
--+++ )
--+++ from ...modeling_utils import PreTrainedModel
--++++from ...generation import GenerationMixin
--+++ from ....utils import logging
--+++ from .configuration_qwen2_moe import Qwen2MoeConfig
--+++ 
--+++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++         self.variance_epsilon = eps
--+++ 
--+++     def forward(self, hidden_states):
--++++        # @dwj
--++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++        # @lwx
--++++        # if not self.training :
--++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++         input_dtype = hidden_states.dtype
--+++         hidden_states = hidden_states.to(mindspore.float32)
--+++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++@@ -234,6 +239,8 @@ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++     x1 = x[..., : x.shape[-1] // 2]
--+++     x2 = x[..., x.shape[-1] // 2 :]
--++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++         self.config = config
--+++         self.hidden_size = config.hidden_size
--+++         self.intermediate_size = intermediate_size
--++++        
--+++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++         self.act_fn = ACT2FN[config.hidden_act]
--+++ 
--+++     def forward(self, x):
--+++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++-
--+++ 
--++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++        # @lwx 
--++++        # gate_up_output = self.gate_up_proj(x)
--++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++++        # return self.down_proj(swiglu_output)
--++++
--++++    # def forward(self, x):
--++++    #     gate_proj_out = self.gate_proj(x)
--++++    #     up_proj_out = self.up_proj(x)
--++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++++    #     return self.down_proj(swiglu_out)
--++++        
--+++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++     """
--+++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++         use_cache: bool = False,
--+++         cache_position: Optional[mindspore.Tensor] = None,
--+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++        
--++++
--+++         bsz, q_len, _ = hidden_states.shape
--+++ 
--+++         query_states = self.q_proj(hidden_states)
--+++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++                     "with a layer index."
--+++                 )
--+++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++            if isinstance(past_key_value, StaticCache):
--++++                kv_seq_len = key_states.shape[-2]
--++++            else:
--++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++ 
--+++         if past_key_value is not None:
--+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++            
--++++            if isinstance(past_key_value, StaticCache):
--++++                kv_seq_len = key_states.shape[-2]
--+++ 
--+++         # repeat k/v heads if n_kv_heads < n_heads
--+++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++-
--++++        
--+++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++ 
--+++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++-            raise ValueError(
--+++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++-                f" {attn_weights.shape}"
--+++-            )
--+++-
--+++-        if attention_mask is not None:  # no matter the length, we just slice it
--+++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++++        if attention_mask is not None:
--++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++             attn_weights = attn_weights + causal_mask
--+++ 
--+++         # upcast attention to fp32
--+++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++ 
--+++         attn_output = self.o_proj(attn_output)
--+++-
--++++        # @lwx
--++++        
--++++        # max_seq_len = self.max_position_embeddings  # 2048
--++++
--++++        # if attention_mask is not None:
--++++        #     # attention_mask: [B, 1, Sq, Sk]
--++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++
--++++        #     # pad 到 [max_seq_len, max_seq_len]
--++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++        #     global_attention_mask = padded_mask
--++++        # else:
--++++        #     global_attention_mask = None
--++++
--++++
--++++        # sparse_mode=3
--++++        # attn_output = mindspore.ops.flash_attention_score(
--++++        #     query=query_states,
--++++        #     key=key_states,
--++++        #     value=value_states,
--++++        #     real_shift=None,
--++++        #     padding_mask=None,
--++++
--++++        #     head_num=self.num_heads,
--++++        #     attn_mask=global_attention_mask,
--++++        #     keep_prob=1.0 - self.attention_dropout,
--++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++        #     input_layout="BNSD",
--++++        #     pre_tokens=2147483647,
--++++        #     next_tokens=2147483647,
--++++        #     inner_precise=0,
--++++        #     drop_mask=None,
--++++        #     prefix=None,
--++++        #     actual_seq_qlen=None,
--++++        #     actual_seq_kvlen=None,
--++++        #     sparse_mode=sparse_mode,
--++++        # )
--+++         if not output_attentions:
--+++             attn_weights = None
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++ 
--++++class Qwen2MoeFlashAttention(nn.Module):
--++++    """
--++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++
--++++    关键改动:
--++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++       直接传入原始的 key 和 value 张量效率更高。
--++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++    """
--++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++        super().__init__()
--++++        self.config = config
--++++        self.layer_idx = layer_idx
--++++        self.hidden_size = config.hidden_size
--++++        self.num_heads = config.num_attention_heads
--++++        self.head_dim = self.hidden_size // self.num_heads
--++++        self.num_key_value_heads = config.num_key_value_heads
--++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++        self.max_position_embeddings = config.max_position_embeddings
--++++        self.rope_theta = config.rope_theta
--++++        self.attention_dropout = config.attention_dropout
--++++
--++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++            raise ValueError(
--++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++            )
--++++
--++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++
--++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++            self.head_dim,
--++++            max_position_embeddings=self.max_position_embeddings,
--++++            base=self.rope_theta,
--++++        )
--++++
--++++    def forward(
--++++        self,
--++++        hidden_states: mindspore.Tensor,
--++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++        position_ids: Optional[mindspore.Tensor] = None,
--++++        past_key_value: Optional[Cache] = None,
--++++        output_attentions: bool = False,
--++++        use_cache: bool = False,
--++++        cache_position: Optional[mindspore.Tensor] = None,
--++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++        bsz, q_len, _ = hidden_states.shape
--++++
--++++        # 1. 线性投射 Q, K, V
--++++        query_states = self.q_proj(hidden_states)
--++++        key_states = self.k_proj(hidden_states)
--++++        value_states = self.v_proj(hidden_states)
--++++
--++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++        # 3. RoPE 旋转位置编码
--++++        kv_seq_len = key_states.shape[-2]
--++++        if past_key_value is not None:
--++++            if self.layer_idx is None:
--++++                raise ValueError(
--++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++                    "with a layer index."
--++++                )
--++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++                if cache_position.shape[0] == 1:
--++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++                    kv_seq_len = past_seen_tokens + 1
--++++                else:
--++++                    # prefill 阶段：cache_position 是范围，使用其长度
--++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++            else:
--++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++        # 4. KV 缓存更新
--++++        if past_key_value is not None:
--++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++            key_states, value_states = past_key_value.update(
--++++                key_states, value_states, self.layer_idx, cache_kwargs
--++++            )
--++++            
--++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++                if cache_position.shape[0] == 1:
--++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++                    kv_seq_len = key_states.shape[-2]
--++++
--++++        # 5. [重要] 准备 Attention Mask
--++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++        fa_attention_mask = None
--++++        if attention_mask is not None:
--++++            # 截取与当前key长度匹配的部分
--++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++            fa_attention_mask = (mask_slice != 0)
--++++
--++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++        input_dtype = query_states.dtype
--++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++            query_states = query_states.to(mindspore.float16)
--++++            key_states = key_states.to(mindspore.float16)
--++++            value_states = value_states.to(mindspore.float16)
--++++
--++++        # 6. [核心] 调用 flash_attention_score 算子
--++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++        attn_output = mindspore.ops.flash_attention_score(
--++++            query=query_states,
--++++            key=key_states,
--++++            value=value_states,
--++++            head_num=self.num_heads, # 传入Q的头数(N1)
--++++            attn_mask=fa_attention_mask,
--++++            keep_prob=1.0 - self.attention_dropout,
--++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++++            input_layout="BNSD",
--++++            sparse_mode=0 # 使用 defaultMask 模式
--++++        )
--++++
--++++        # 恢复原始数据类型
--++++        attn_output = attn_output.to(input_dtype)
--++++
--++++        # 7. 调整输出形状
--++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++        attn_output = self.o_proj(attn_output)
--++++
--++++        # FlashAttention 算子不直接返回注意力权重矩阵
--++++        attn_weights = None
--++++        if output_attentions:
--++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++        return attn_output, attn_weights, past_key_value
--++++
--++++    # def forward(
--++++    #     self,
--++++    #     hidden_states: mindspore.Tensor,
--++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++    #     past_key_value: Optional[Cache] = None,
--++++    #     output_attentions: bool = False,
--++++    #     use_cache: bool = False,
--++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++    #     bsz, q_len, _ = hidden_states.shape
--++++
--++++    #     # 1. 线性投射 Q, K, V
--++++    #     query_states = self.q_proj(hidden_states)
--++++    #     key_states = self.k_proj(hidden_states)
--++++    #     value_states = self.v_proj(hidden_states)
--++++
--++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++    #     # 3. RoPE 旋转位置编码
--++++    #     kv_seq_len = key_states.shape[-2]
--++++    #     if past_key_value is not None:
--++++    #         if self.layer_idx is None:
--++++    #             raise ValueError(
--++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++    #                 "with a layer index."
--++++    #             )
--++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++    #     # 4. KV 缓存更新
--++++    #     if past_key_value is not None:
--++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++    #         key_states, value_states = past_key_value.update(
--++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++    #         )
--++++
--++++    #     # 5. 准备 Attention Mask
--++++    #     fa_attention_mask = None
--++++    #     if attention_mask is not None:
--++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++    #         fa_attention_mask = (mask_slice != 0)
--++++
--++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++    #     input_dtype = query_states.dtype
--++++
--++++    #     # 6. [核心] 调用 flash_attention_score 算子
--++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++    #         query=query_states,
--++++    #         key=key_states,
--++++    #         value=value_states,
--++++    #         head_num=self.num_heads,
--++++    #         attn_mask=fa_attention_mask,
--++++    #         keep_prob=1.0 - self.attention_dropout,
--++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++    #         input_layout="BNSD",
--++++    #         sparse_mode=0,
--++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++    #         inner_precise=1
--++++    #     )
--++++
--++++    #     # 恢复原始数据类型
--++++    #     attn_output = attn_output.to(input_dtype)
--++++
--++++    #     # 7. 调整输出形状
--++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++    #     attn_output = self.o_proj(attn_output)
--++++
--++++    #     attn_weights = None
--++++    #     if output_attentions:
--++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++    #     return attn_output, attn_weights, past_key_value
--++++
--++++    # def forward(
--++++    #     self,
--++++    #     hidden_states: mindspore.Tensor,
--++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++    #     past_key_value: Optional[Cache] = None,
--++++    #     output_attentions: bool = False,
--++++    #     use_cache: bool = False,
--++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++    #     bsz, q_len, _ = hidden_states.shape
--++++
--++++    #     query_states = self.q_proj(hidden_states)
--++++    #     key_states = self.k_proj(hidden_states)
--++++    #     value_states = self.v_proj(hidden_states)
--++++
--++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++    #     kv_seq_len = key_states.shape[-2]
--++++    #     if past_key_value is not None:
--++++    #         if self.layer_idx is None:
--++++    #             raise ValueError("`layer_idx` must be specified for caching")
--++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++    #     if past_key_value is not None:
--++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++    #         key_states, value_states = past_key_value.update(
--++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++    #         )
--++++
--++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++
--++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++    #     query_states = query_states / math.sqrt(self.head_dim)
--++++    #     # <--- 修改结束 ---
--++++
--++++    #     fa_attention_mask = None
--++++    #     if attention_mask is not None:
--++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++    #         fa_attention_mask = (mask_slice != 0)
--++++
--++++    #     input_dtype = query_states.dtype
--++++
--++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++    #         query=query_states,    # 传入已经预先缩放过的 query
--++++    #         key=key_states,
--++++    #         value=value_states,
--++++    #         head_num=self.num_heads,
--++++    #         attn_mask=fa_attention_mask,
--++++    #         keep_prob=1.0 - self.attention_dropout,
--++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++    #         input_layout="BNSD",
--++++    #         sparse_mode=0,
--++++    #         inner_precise=1        # 仍然保持内部高精度计算
--++++    #     )
--++++
--++++    #     attn_output = attn_output.to(input_dtype)
--++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++    #     attn_output = self.o_proj(attn_output)
--++++
--++++    #     attn_weights = None
--++++    #     if output_attentions:
--++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++
--++++    #     return attn_output, attn_weights, past_key_value
--++++
--+++ QWEN2MOE_ATTENTION_CLASSES = {
--+++     "eager": Qwen2MoeAttention,
--++++    "flash-attention": Qwen2MoeFlashAttention,
--+++ }
--+++ 
--+++ 
--+++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++ 
--++++    #@dwj
--++++    # 只遍历激活的专家，而非全部专家
--+++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        hidden_states = hidden_states.view(-1, hidden_dim)
--+++-        # router_logits: (batch * sequence_length, n_experts)
--+++-        router_logits = self.gate(hidden_states)
--+++-
--+++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-        if self.norm_topk_prob:
--+++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        # we cast back to the input dtype
--+++-        routing_weights = routing_weights.to(hidden_states.dtype)
--+++-
--+++-        final_hidden_states = ops.zeros(
--+++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++-        )
--+++-
--+++-        # One hot encode the selected experts to create an expert mask
--+++-        # this will be used to easily index which expert is going to be sollicitated
--+++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++-
--+++-        # Loop over all available experts in the model and perform the computation on each expert
--+++-        for expert_idx in range(self.num_experts):
--+++-            expert_layer = self.experts[expert_idx]
--+++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++-
--+++-            # Index the correct hidden states and compute the expert hidden state for
--+++-            # the current expert. We need to make sure to multiply the output hidden
--+++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++-            if 0 not in idx.shape:
--+++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++-
--+++-                # However `index_add_` only support torch tensors for indexing so we'll use
--+++-                # the `top_x` tensor here.
--+++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++-
--+++-        shared_expert_output = self.shared_expert(hidden_states)
--+++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++-
--+++-        final_hidden_states = final_hidden_states + shared_expert_output
--++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++            num_tokens = hidden_states_reshaped.shape[0]
--++++
--++++            router_logits = self.gate(hidden_states_reshaped)
--++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++
--++++            if self.norm_topk_prob:
--++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++            routing_weights = routing_weights.to(hidden_states.dtype)
--++++            
--++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++            flat_selected_experts = selected_experts.flatten()
--++++            
--++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++            token_indices = broadcasted_token_indices.flatten()
--++++            
--++++            active_experts = ops.unique(flat_selected_experts)
--++++            
--++++            for expert_idx_tensor in active_experts:
--++++                expert_idx = expert_idx_tensor.item()
--++++                expert_layer = self.experts[expert_idx]
--++++                
--++++                mask = (flat_selected_experts == expert_idx_tensor)
--++++                selected_token_indices = token_indices[mask]
--++++                selected_routing_weights = routing_weights.flatten()[mask]
--++++                
--++++                current_states = hidden_states_reshaped[selected_token_indices]
--++++                
--++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++                
--++++                final_hidden_states = final_hidden_states.index_add(
--++++                    dim=0,
--++++                    index=selected_token_indices,
--++++                    source=expert_output.to(hidden_states.dtype)
--++++                )
--++++            
--++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++ 
--+++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++-        return final_hidden_states, router_logits
--++++            final_hidden_states = final_hidden_states + shared_expert_output
--++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++            
--++++            return final_hidden_states, router_logits
--+++ 
--+++ 
--+++ class Qwen2MoeDecoderLayer(nn.Module):
--+++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++ 
--+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++ 
--++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++
--+++         if (layer_idx not in config.mlp_only_layers) and (
--+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++         ):
--+++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++     _skip_keys_device_placement = "past_key_values"
--+++     _supports_cache_class = True
--++++#lwx
--++++    # _supports_static_cache = True
--+++ 
--+++     def _init_weights(self, module):
--+++         std = self.config.initializer_range
--+++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++         return causal_mask
--+++ 
--+++ 
--+++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++     _tied_weights_keys = ["lm_head.weight"]
--+++ 
--+++     def __init__(self, config):
--+++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++         self.num_experts_per_tok = config.num_experts_per_tok
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--++++        # @lwx
--++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++++        #     self.generation_config.cache_implementation = "static"
--++++        self._warmed_up = False
--++++
--++++    def warmup_moe_model(self):
--++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++++        test_texts = [
--++++            "warmup short",
--++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++++        ]
--++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++        if tokenizer is None:
--++++            from mindnlp.transformers import AutoTokenizer
--++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++            self._warmup_tokenizer = tokenizer
--++++
--++++        for text in test_texts:
--++++            inputs = tokenizer(text, return_tensors="ms")
--++++            with mindspore._no_grad():
--++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++ 
--+++     def get_input_embeddings(self):
--+++         return self.model.embed_tokens
--+++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++         ```"""
--++++        if not self._warmed_up:
--++++            self._warmed_up = True
--++++            self.warmup_moe_model()
--+++ 
--+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++         output_router_logits = (
--+++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++             }
--+++         )
--+++         return model_inputs
--++++# @lwx
--++++    # def _decode_one_tokens_logits(
--++++    #     self,
--++++    #     cur_token: mindspore.Tensor,
--++++    #     input_pos: Optional[mindspore.Tensor],
--++++    #     cache_position: mindspore.Tensor,
--++++    #     past_key_values: StaticCache,
--++++    # ) -> mindspore.Tensor:
--++++    #     """
--++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++++        
--++++    #     Args:
--++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++++    #         input_pos: 输入位置信息，可选
--++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++++            
--++++    #     Returns:
--++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++++    #     """
--++++    #     # 调用JIT编译的版本
--++++    #     return self.get_decode_one_tokens_logits(
--++++    #         cur_token=cur_token,
--++++    #         input_pos=input_pos,
--++++    #         cache_position=cache_position,
--++++    #         past_key_values=past_key_values,
--++++    #     )
--++++    
--++++    # @mindspore.jit(jit_level='O1')
--++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++++    #     """
--++++    #     JIT编译的函数，用于高效的单token解码
--++++    #     使用JIT编译优化以支持静态shape和高效执行
--++++        
--++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++++    #     """
--++++    #     outputs = self.model.forward(
--++++    #         input_ids=cur_token,
--++++    #         position_ids=input_pos,
--++++    #         cache_position=cache_position,
--++++    #         past_key_values=past_key_values,
--++++    #         use_cache=True,
--++++    #         return_dict=False,
--++++    #     )
--++++        
--++++    #     hidden_states = outputs[0]
--++++    #     logits = self.lm_head.forward(hidden_states)
--++++    #     logits = logits.float()
--++++        
--++++    #     return logits[:, -1, :]
--++++
--++++    # def _sample(
--++++    #     self,
--++++    #     input_ids: mindspore.Tensor,
--++++    #     logits_processor,
--++++    #     stopping_criteria,
--++++    #     generation_config,
--++++    #     synced_devices: bool,
--++++    #     streamer=None,
--++++    #     logits_warper=None,
--++++    #     **model_kwargs,
--++++    # ):
--++++    #     """
--++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++++    #     """
--++++    #     from ...generation.logits_process import LogitsProcessorList
--++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++++    #     from mindnlp.core import nn, ops, no_grad
--++++    #     import numpy as np
--++++        
--++++    #     # 检查是否使用 StaticCache
--++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++++    #     # 否则，直接调用父类方法
--++++    #     past_key_values = model_kwargs.get("past_key_values")
--++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++++
--++++    #     if not isinstance(past_key_values, StaticCache):
--++++    #         # 不使用 StaticCache，直接调用父类方法
--++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++++    #         return super()._sample(
--++++    #             input_ids=input_ids,
--++++    #             logits_processor=logits_processor,
--++++    #             stopping_criteria=stopping_criteria,
--++++    #             generation_config=generation_config,
--++++    #             synced_devices=synced_devices,
--++++    #             streamer=streamer,
--++++    #             logits_warper=logits_warper,
--++++    #             **model_kwargs,
--++++    #         )
--++++        
--++++    #     # 使用 StaticCache，进入自定义循环
--++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++++    #     pad_token_id = generation_config._pad_token_tensor
--++++    #     output_attentions = generation_config.output_attentions
--++++    #     output_hidden_states = generation_config.output_hidden_states
--++++    #     output_scores = generation_config.output_scores
--++++    #     output_logits = generation_config.output_logits
--++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++++    #     max_length = generation_config.max_length
--++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++++    #     do_sample = generation_config.do_sample
--++++        
--++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++++    #         raise ValueError(
--++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++++    #             f"{logits_warper})."
--++++    #         )
--++++        
--++++    #     # init attention / hidden states / scores tuples
--++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++++        
--++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++++    #         encoder_hidden_states = (
--++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++++    #         )
--++++        
--++++    #     # keep track of which sequences are already finished
--++++    #     batch_size, cur_len = input_ids.shape
--++++    #     this_peer_finished = False
--++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++++        
--++++    #     time_record = []
--++++    #     from ....utils.testing_utils import parse_flag_from_env
--++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++++        
--++++    #     while self._has_unfinished_sequences(
--++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++++    #     ):
--++++    #         if _record_time:
--++++    #             import time as time_module
--++++    #             infer_start = time_module.time()
--++++            
--++++    #         # prepare model inputs
--++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++++            
--++++    #         # prepare variable output controls
--++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++++            
--++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++++    #         cur_cache_position = model_inputs.get("cache_position")
--++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++++    #         cur_input_ids = model_inputs.get("input_ids")
--++++            
--++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++++    #             cur_cache_position is not None and 
--++++    #             len(cur_cache_position.shape) > 0 and
--++++    #             cur_cache_position.shape[0] == 1 and
--++++    #             cur_input_ids is not None and
--++++    #             cur_input_ids.shape[1] == 1):
--++++    #             # 使用 JIT 优化的单 token 解码
--++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++++    #             if not hasattr(self, '_jit_used'):
--++++    #                 self._jit_used = False
--++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++++                
--++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++++    #                 cur_token=cur_input_ids,
--++++    #                 input_pos=model_inputs.get("position_ids"),
--++++    #                 cache_position=cur_cache_position,
--++++    #                 past_key_values=cur_past_key_values,
--++++    #             )
--++++                
--++++    #             # 标记已使用JIT（用于后续判断）
--++++    #             if not self._jit_used:
--++++    #                 self._jit_used = True
--++++                
--++++    #             # 构造兼容的输出对象
--++++    #             class JitOptimizedOutput:
--++++    #                 def __init__(self, logits, config):
--++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++++    #                     self.config = config
--++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++++    #                     self.cross_attentions = None
--++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++++                
--++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++++    #         else:
--++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++++    #             outputs = self(**model_inputs, return_dict=True)
--++++            
--++++    #         if synced_devices and this_peer_finished:
--++++    #             continue
--++++            
--++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++++    #         next_token_logits = outputs.logits[:, -1, :]
--++++            
--++++    #         # pre-process distribution
--++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++++    #         if do_sample:
--++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++++            
--++++    #         # Store scores, attentions and hidden_states when required
--++++    #         if return_dict_in_generate:
--++++    #             if output_scores:
--++++    #                 scores += (next_token_scores,)
--++++    #             if output_logits:
--++++    #                 raw_logits += (next_token_logits,)
--++++    #             if output_attentions:
--++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++++    #                 if self.config.is_encoder_decoder:
--++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++++                
--++++    #             if output_hidden_states:
--++++    #                 hidden = (
--++++    #                     outputs.decoder_hidden_states
--++++    #                     if self.config.is_encoder_decoder
--++++    #                     else outputs.hidden_states
--++++    #                 )
--++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++++            
--++++    #         # token selection
--++++    #         if do_sample:
--++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++++    #         else:
--++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++++            
--++++    #         # finished sentences should have their next token be a padding token
--++++    #         if has_eos_stopping_criteria:
--++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++++            
--++++    #         # update generated ids, model inputs, and length for next step
--++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++++    #         if streamer is not None:
--++++    #             streamer.put(next_tokens)
--++++            
--++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++++    #             outputs,
--++++    #             model_kwargs,
--++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++++    #         )
--++++            
--++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++++    #         cur_len += 1
--++++            
--++++    #         if _record_time:
--++++    #             import time as time_module
--++++    #             infer_stop = time_module.time()
--++++    #             time_record.append(infer_stop - infer_start)
--++++            
--++++    #         del outputs
--++++        
--++++    #     average_infer_time = None
--++++    #     if time_record:
--++++    #         if len(time_record) > 1:
--++++    #             time_record.pop(0)
--++++    #         average_infer_time = sum(time_record) / len(time_record)
--++++    #         print(f'average inference time is: {average_infer_time}')
--++++    #         print(f'inference time record: {time_record}')
--++++        
--++++    #     if streamer is not None:
--++++    #         streamer.end()
--++++        
--++++    #     # 简单判断：打印是否使用了JIT路径
--++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++++    #     else:
--++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++++        
--++++    #     if return_dict_in_generate:
--++++    #         if self.config.is_encoder_decoder:
--++++    #             return GenerateEncoderDecoderOutput(
--++++    #                 sequences=input_ids,
--++++    #                 scores=scores,
--++++    #                 logits=raw_logits,
--++++    #                 encoder_attentions=encoder_attentions,
--++++    #                 encoder_hidden_states=encoder_hidden_states,
--++++    #                 decoder_attentions=decoder_attentions,
--++++    #                 cross_attentions=cross_attentions,
--++++    #                 decoder_hidden_states=decoder_hidden_states,
--++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++    #                 average_infer_time=average_infer_time
--++++    #             )
--++++    #         else:
--++++    #             return GenerateDecoderOnlyOutput(
--++++    #                 sequences=input_ids,
--++++    #                 scores=scores,
--++++    #                 logits=raw_logits,
--++++    #                 attentions=decoder_attentions,
--++++    #                 hidden_states=decoder_hidden_states,
--++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++    #                 average_infer_time=average_infer_time
--++++    #             )
--++++    #     else:
--++++    #         return input_ids
--++++            
--++++    # def _prepare_cache_for_generation(
--++++    #     self,
--++++    #     generation_config,
--++++    #     model_kwargs,
--++++    #     assistant_model,
--++++    #     batch_size,
--++++    #     max_cache_length,
--++++    # ):
--++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++++    #         generation_config.cache_implementation = "static"
--++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++++        
--++++    #     if generation_config.cache_implementation == "static":
--++++    #         base_required_from_max_length = generation_config.max_length + 1
--++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++++    #         min_cache_size = 50
--++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++++    #         else:
--++++    #             max_cache_length = max(base_required, min_cache_size)
--++++            
--++++    #         original_max_cache_length = max_cache_length
--++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++++            
--++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++    #             if max_cache_length > self.config.max_position_embeddings:
--++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++++        
--++++    #     result = super()._prepare_cache_for_generation(
--++++    #         generation_config=generation_config,
--++++    #         model_kwargs=model_kwargs,
--++++    #         assistant_model=assistant_model,
--++++    #         batch_size=batch_size,
--++++    #         max_cache_length=max_cache_length,
--++++    #     )
--++++        
--++++    #     if generation_config.cache_implementation == "static":
--++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++++    #         created_cache = model_kwargs.get(cache_name)
--++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++++    #             if created_cache.max_cache_len < generation_config.max_length:
--++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++++        
--++++    #     return result
--++++
--++++
--++++
--+++ 
--+++ 
--+++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++-- 
--+++2.27.0
--+++
--++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--++new file mode 100644
--++index 00000000..22b65dd5
--++--- /dev/null
--+++++ b/patches/0002-20251106commit.patch
--++@@ -0,0 +1,3200 @@
--+++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Thu, 6 Nov 2025 09:20:38 +0800
--+++Subject: [PATCH 2/3] 20251106commit
--+++
--+++---
--+++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
--+++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
--+++ 3 files changed, 2689 insertions(+), 305 deletions(-)
--+++ create mode 100644 patches/0001-20251104commit.patch
--+++
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index d8303e45..73773c22 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--+++         #     y = y + self.shared_experts(identity)
--+++         # return y
--+++         
--++++    # @no_grad()
--++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++
--++++    #     expert_cache = ops.zeros_like(x)
--++++    #     for i in range(self.num_experts_per_tok):
--++++    #         expert_id = flat_expert_indices[i].item()
--++++    #         weight = flat_expert_weights[i].item()
--++++    #         expert = self.experts[expert_id]
--++++    #         expert_out = expert(x)
--++++    #         expert_cache += expert_out * weight
--++++    #     return expert_cache
--++++
--+++     @no_grad()
--+++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++        # x 的 shape: (1, hidden_size)
--++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++++
--++++        # 1. 收集所有需要的专家层
--++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--++++
--++++        # 2. 并行计算所有专家的输出
--++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++++        # ops.cat 会将它们堆叠成一个新的 Tensor
--++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++++
--++++        # 3. 使用矩阵乘法进行加权求和
--++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++        # 最终结果 final_output 的 shape: (1, hidden_size)
--++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++++        
--++++        return final_output
--+++ 
--+++-        expert_cache = ops.zeros_like(x)
--+++-        for i in range(self.num_experts_per_tok):
--+++-            expert_id = flat_expert_indices[i].item()
--+++-            weight = flat_expert_weights[i].item()
--+++-            expert = self.experts[expert_id]
--+++-            expert_out = expert(x)
--+++-            expert_cache += expert_out * weight
--+++-        return expert_cache
--+++ 
--+++     @no_grad()
--+++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--+++             key_states = self.k_proj(hidden_states)
--+++             value_states = self.v_proj(hidden_states)
--+++ 
--+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++        # @lwx
--++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+++ 
--+++         kv_seq_len = key_states.shape[-2]
--+++         if past_key_value is not None:
--+++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++ 
--++++# class DeepseekFlashAttention(nn.Module):
--++++#     """
--++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--++++
--++++#     This class is designed as a drop-in replacement for DeepseekAttention.
--++++#     """
--++++
--++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++++#         super().__init__()
--++++#         self.config = config
--++++#         self.layer_idx = layer_idx
--++++#         if layer_idx is None:
--++++#             logger.warning(
--++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++#                 "when creating this class."
--++++#             )
--++++
--++++#         self.attention_dropout = config.attention_dropout
--++++#         self.hidden_size = config.hidden_size
--++++#         self.num_heads = config.num_attention_heads
--++++#         self.head_dim = self.hidden_size // self.num_heads
--++++#         self.num_key_value_heads = config.num_key_value_heads
--++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++#         self.max_position_embeddings = config.max_position_embeddings
--++++#         self.rope_theta = config.rope_theta
--++++#         self.is_causal = True
--++++
--++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++#             raise ValueError(
--++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++#                 f" and `num_heads`: {self.num_heads})."
--++++#             )
--++++
--++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++++#         self._init_rope()
--++++
--++++#     def _init_rope(self):
--++++#         if self.config.rope_scaling is None:
--++++#             self.rotary_emb = DeepseekRotaryEmbedding(
--++++#                 self.head_dim,
--++++#                 max_position_embeddings=self.max_position_embeddings,
--++++#                 base=self.rope_theta,
--++++#             )
--++++#         else:
--++++#             scaling_type = self.config.rope_scaling["type"]
--++++#             scaling_factor = self.config.rope_scaling["factor"]
--++++#             if scaling_type == "linear":
--++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++++#                     self.head_dim,
--++++#                     max_position_embeddings=self.max_position_embeddings,
--++++#                     scaling_factor=scaling_factor,
--++++#                     base=self.rope_theta,
--++++#                 )
--++++#             elif scaling_type == "dynamic":
--++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++++#                     self.head_dim,
--++++#                     max_position_embeddings=self.max_position_embeddings,
--++++#                     scaling_factor=scaling_factor,
--++++#                     base=self.rope_theta,
--++++#                 )
--++++#             else:
--++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++++
--++++#     def forward(
--++++#         self,
--++++#         hidden_states: mindspore.Tensor,
--++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++#         past_key_value: Optional[Cache] = None,
--++++#         output_attentions: bool = False,
--++++#         use_cache: bool = False,
--++++#         **kwargs,
--++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++#         if "padding_mask" in kwargs:
--++++#             warnings.warn(
--++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++++#             )
--++++        
--++++#         if output_attentions:
--++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--++++
--++++#         bsz, q_len, _ = hidden_states.shape
--++++
--++++#         if self.config.pretraining_tp > 1:
--++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++++
--++++#         query_states = self.q_proj(hidden_states)
--++++#         key_states = self.k_proj(hidden_states)
--++++#         value_states = self.v_proj(hidden_states)
--++++
--++++#         # Reshape for multi-head attention
--++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++#         kv_seq_len = key_states.shape[-2]
--++++#         if past_key_value is not None:
--++++#             if self.layer_idx is None:
--++++#                 raise ValueError(
--++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++#                     "with a layer index."
--++++#                 )
--++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++        
--++++#         # Apply Rotary Positional Embedding
--++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++#         if past_key_value is not None:
--++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++
--++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++        
--++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++        
--++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++
--++++#         # Convert attention_mask for flash_attention_score
--++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--++++#         if attention_mask is not None:
--++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++++#                 raise ValueError(
--++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++++#                 )
--++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--++++#         else:
--++++#             attn_mask_for_fa = None
--++++        
--++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++++
--++++#         # Call the fused flash_attention_score operator
--++++#         attn_output = mindspore.ops.flash_attention_score(
--++++#             query=query_states_for_fa,
--++++#             key=key_states_for_fa,
--++++#             value=value_states_for_fa,
--++++#             head_num=self.num_heads, # This is N1, the number of query heads
--++++#             input_layout='BSH',
--++++#             attn_mask=attn_mask_for_fa,
--++++#             keep_prob=keep_prob,
--++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++#             sparse_mode=0 # Default mask mode
--++++#         )
--++++        
--++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--++++#         attn_output = self.o_proj(attn_output)
--++++        
--++++#         # Flash Attention does not return attention weights
--++++#         attn_weights = None
--++++
--++++#         return attn_output, attn_weights, past_key_value
--++++
--++++class DeepseekFlashAttention(nn.Module):
--++++    """
--++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
--++++    designed for high performance on supported hardware (Ascend).
--++++
--++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--++++    """
--++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++++        super().__init__()
--++++        self.config = config
--++++        self.layer_idx = layer_idx
--++++        if layer_idx is None:
--++++            logger.warning(
--++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++                "when creating this class."
--++++            )
--++++
--++++        # --- [FIX] Correctly initialize all required attributes ---
--++++        self.attention_dropout = config.attention_dropout
--++++        self.hidden_size = config.hidden_size
--++++        self.num_heads = config.num_attention_heads
--++++        self.head_dim = self.hidden_size // self.num_heads
--++++        self.num_key_value_heads = config.num_key_value_heads
--++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++        self.max_position_embeddings = config.max_position_embeddings
--++++        self.rope_theta = config.rope_theta
--++++        self.is_causal = True
--++++
--++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++            raise ValueError(
--++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++                f" and `num_heads`: {self.num_heads})."
--++++            )
--++++
--++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++++        
--++++        # This call will now succeed as all attributes are initialized.
--++++        self._init_rope()
--++++
--++++    def _init_rope(self):
--++++        if self.config.rope_scaling is None:
--++++            self.rotary_emb = DeepseekRotaryEmbedding(
--++++                self.head_dim,
--++++                max_position_embeddings=self.max_position_embeddings,
--++++                base=self.rope_theta,
--++++            )
--++++        else:
--++++            scaling_type = self.config.rope_scaling["type"]
--++++            scaling_factor = self.config.rope_scaling["factor"]
--++++            if scaling_type == "linear":
--++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++++                    self.head_dim,
--++++                    max_position_embeddings=self.max_position_embeddings,
--++++                    scaling_factor=scaling_factor,
--++++                    base=self.rope_theta,
--++++                )
--++++            elif scaling_type == "dynamic":
--++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++++                    self.head_dim,
--++++                    max_position_embeddings=self.max_position_embeddings,
--++++                    scaling_factor=scaling_factor,
--++++                    base=self.rope_theta,
--++++                )
--++++            else:
--++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++++
--++++    def forward(
--++++        self,
--++++        hidden_states: mindspore.Tensor,
--++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++        position_ids: Optional[mindspore.Tensor] = None,
--++++        past_key_value: Optional[Cache] = None,
--++++        output_attentions: bool = False,
--++++        use_cache: bool = False,
--++++        **kwargs,
--++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++        if "padding_mask" in kwargs:
--++++            warnings.warn(
--++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++++            )
--++++        if output_attentions:
--++++            warnings.warn(
--++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--++++            )
--++++
--++++        bsz, q_len, _ = hidden_states.shape
--++++
--++++        if self.config.pretraining_tp > 1:
--++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++++
--++++        query_states = self.q_proj(hidden_states)
--++++        key_states = self.k_proj(hidden_states)
--++++        value_states = self.v_proj(hidden_states)
--++++
--++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++        kv_seq_len = key_states.shape[-2]
--++++        if past_key_value is not None:
--++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++        
--++++        # Apply Rotary Position Embedding
--++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++        if past_key_value is not None:
--++++            cache_kwargs = {"sin": sin, "cos": cos}
--++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++
--++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--++++        # So we must explicitly repeat the KV heads.
--++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++
--++++        # Convert attention mask for flash_attention_score
--++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--++++        if attention_mask is not None:
--++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++++                 raise ValueError(
--++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++++                )
--++++            attn_mask_for_fa = attention_mask < 0
--++++        else:
--++++            attn_mask_for_fa = None
--++++
--++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++++
--++++        # Call the fused operator using the efficient BNSD layout
--++++        attn_output = mindspore.ops.flash_attention_score(
--++++            query=query_states,
--++++            key=key_states,
--++++            value=value_states,
--++++            head_num=self.num_heads,
--++++            input_layout='BNSD', # Specify the correct layout
--++++            attn_mask=attn_mask_for_fa,
--++++            keep_prob=keep_prob,
--++++            scalar_value=1.0 / math.sqrt(self.head_dim)
--++++        )
--++++        
--++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++        
--++++        # Apply output projection
--++++        attn_output = self.o_proj(attn_output)
--++++
--++++        # Flash attention does not return attention weights, so we return None.
--++++        attn_weights = None
--++++
--++++        return attn_output, attn_weights, past_key_value
--++++
--+++ Deepseek_ATTENTION_CLASSES = {
--+++     "eager": DeepseekAttention,
--++++    "flash-attention": DeepseekFlashAttention,
--+++ }
--+++ 
--+++ 
--+++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--+++             config=config, layer_idx=layer_idx
--+++         )
--+++ 
--++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--++++            config=config, layer_idx=layer_idx
--++++        )
--++++
--+++         self.mlp = (
--+++             DeepseekMoE(config)
--+++             if (
--+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++index d4c6b651..bced285c 100644
--+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
--+++ 
--+++ import mindspore
--+++ import mindnlp.core.nn.functional as F
--+++-from mindnlp.core import nn, ops
--++++from mindnlp.core import nn, ops, no_grad
--+++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
--+++ 
--+++ from ....common.activations import ACT2FN
--+++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
--+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+++ 
--++++Long_Prompt = False
--++++PROMPT_LENGTH_THRESHOLD = 128
--+++ 
--+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+++ def _prepare_4d_causal_attention_mask_with_cache_position(
--+++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++ 
--++++# class Qwen2MoeFlashAttention(nn.Module):
--++++#     """
--++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++
--++++#     关键改动:
--++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++#        直接传入原始的 key 和 value 张量效率更高。
--++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++#         super().__init__()
--++++#         self.config = config
--++++#         self.layer_idx = layer_idx
--++++#         self.hidden_size = config.hidden_size
--++++#         self.num_heads = config.num_attention_heads
--++++#         self.head_dim = self.hidden_size // self.num_heads
--++++#         self.num_key_value_heads = config.num_key_value_heads
--++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++#         self.max_position_embeddings = config.max_position_embeddings
--++++#         self.rope_theta = config.rope_theta
--++++#         self.attention_dropout = config.attention_dropout
--++++
--++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++#             raise ValueError(
--++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++#             )
--++++
--++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++
--++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++#             self.head_dim,
--++++#             max_position_embeddings=self.max_position_embeddings,
--++++#             base=self.rope_theta,
--++++#         )
--++++
--++++#     def forward(
--++++#         self,
--++++#         hidden_states: mindspore.Tensor,
--++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++#         past_key_value: Optional[Cache] = None,
--++++#         output_attentions: bool = False,
--++++#         use_cache: bool = False,
--++++#         cache_position: Optional[mindspore.Tensor] = None,
--++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++#         bsz, q_len, _ = hidden_states.shape
--++++
--++++#         # 1. 线性投射 Q, K, V
--++++#         query_states = self.q_proj(hidden_states)
--++++#         key_states = self.k_proj(hidden_states)
--++++#         value_states = self.v_proj(hidden_states)
--++++
--++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
--++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++#         # 3. RoPE 旋转位置编码
--++++#         kv_seq_len = key_states.shape[-2]
--++++#         if past_key_value is not None:
--++++#             if self.layer_idx is None:
--++++#                 raise ValueError(
--++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++#                     "with a layer index."
--++++#                 )
--++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++#                 if cache_position.shape[0] == 1:
--++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++#                     kv_seq_len = past_seen_tokens + 1
--++++#                 else:
--++++#                     # prefill 阶段：cache_position 是范围，使用其长度
--++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++#             else:
--++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++#         # 4. KV 缓存更新
--++++#         if past_key_value is not None:
--++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++#             key_states, value_states = past_key_value.update(
--++++#                 key_states, value_states, self.layer_idx, cache_kwargs
--++++#             )
--++++            
--++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++#                 if cache_position.shape[0] == 1:
--++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++#                     kv_seq_len = key_states.shape[-2]
--++++
--++++#         # 5. [重要] 准备 Attention Mask
--++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++#         fa_attention_mask = None
--++++#         if attention_mask is not None:
--++++#             # 截取与当前key长度匹配的部分
--++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++#             fa_attention_mask = (mask_slice != 0)
--++++
--++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++#         input_dtype = query_states.dtype
--++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++#             query_states = query_states.to(mindspore.float16)
--++++#             key_states = key_states.to(mindspore.float16)
--++++#             value_states = value_states.to(mindspore.float16)
--++++
--++++#         # 6. [核心] 调用 flash_attention_score 算子
--++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++#         attn_output = mindspore.ops.flash_attention_score(
--++++#             query=query_states,
--++++#             key=key_states,
--++++#             value=value_states,
--++++#             head_num=self.num_heads, # 传入Q的头数(N1)
--++++#             attn_mask=fa_attention_mask,
--++++#             keep_prob=1.0 - self.attention_dropout,
--++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++#             input_layout="BNSD",
--++++#             sparse_mode=0 # 使用 defaultMask 模式
--++++#         )
--++++
--++++#         # 恢复原始数据类型
--++++#         attn_output = attn_output.to(input_dtype)
--++++
--++++#         # 7. 调整输出形状
--++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++#         attn_output = self.o_proj(attn_output)
--++++
--++++#         # FlashAttention 算子不直接返回注意力权重矩阵
--++++#         attn_weights = None
--++++#         if output_attentions:
--++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++#         return attn_output, attn_weights, past_key_value
--++++
--++++#     # def forward(
--++++#     #     self,
--++++#     #     hidden_states: mindspore.Tensor,
--++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
--++++#     #     position_ids: Optional[mindspore.Tensor] = None,
--++++#     #     past_key_value: Optional[Cache] = None,
--++++#     #     output_attentions: bool = False,
--++++#     #     use_cache: bool = False,
--++++#     #     cache_position: Optional[mindspore.Tensor] = None,
--++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++#     #     bsz, q_len, _ = hidden_states.shape
--++++
--++++#     #     # 1. 线性投射 Q, K, V
--++++#     #     query_states = self.q_proj(hidden_states)
--++++#     #     key_states = self.k_proj(hidden_states)
--++++#     #     value_states = self.v_proj(hidden_states)
--++++
--++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++
--++++#     #     # 3. RoPE 旋转位置编码
--++++#     #     kv_seq_len = key_states.shape[-2]
--++++#     #     if past_key_value is not None:
--++++#     #         if self.layer_idx is None:
--++++#     #             raise ValueError(
--++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++#     #                 "with a layer index."
--++++#     #             )
--++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++
--++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++#     #     # 4. KV 缓存更新
--++++#     #     if past_key_value is not None:
--++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++#     #         key_states, value_states = past_key_value.update(
--++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
--++++#     #         )
--++++
--++++#     #     # 5. 准备 Attention Mask
--++++#     #     fa_attention_mask = None
--++++#     #     if attention_mask is not None:
--++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++#     #         fa_attention_mask = (mask_slice != 0)
--++++
--++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++#     #     input_dtype = query_states.dtype
--++++
--++++#     #     # 6. [核心] 调用 flash_attention_score 算子
--++++#     #     attn_output = mindspore.ops.flash_attention_score(
--++++#     #         query=query_states,
--++++#     #         key=key_states,
--++++#     #         value=value_states,
--++++#     #         head_num=self.num_heads,
--++++#     #         attn_mask=fa_attention_mask,
--++++#     #         keep_prob=1.0 - self.attention_dropout,
--++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++#     #         input_layout="BNSD",
--++++#     #         sparse_mode=0,
--++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++#     #         inner_precise=1
--++++#     #     )
--++++
--++++#     #     # 恢复原始数据类型
--++++#     #     attn_output = attn_output.to(input_dtype)
--++++
--++++#     #     # 7. 调整输出形状
--++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++#     #     attn_output = self.o_proj(attn_output)
--++++
--++++#     #     attn_weights = None
--++++#     #     if output_attentions:
--++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++
--++++#     #     return attn_output, attn_weights, past_key_value
--++++
--++++
--+++ class Qwen2MoeFlashAttention(nn.Module):
--+++     """
--+++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++-
--+++-    关键改动:
--+++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++-       直接传入原始的 key 和 value 张量效率更高。
--+++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--++++
--++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--++++    完全使用模型的低精度数据类型（如 float16）进行计算，
--++++    以达到理论上的最高执行速度。
--+++     """
--+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++         super().__init__()
--+++         self.config = config
--+++         self.layer_idx = layer_idx
--++++        if layer_idx is None:
--++++            logger.warning_once(
--++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--++++            )
--++++
--+++         self.hidden_size = config.hidden_size
--+++         self.num_heads = config.num_attention_heads
--+++         self.head_dim = self.hidden_size // self.num_heads
--+++         self.num_key_value_heads = config.num_key_value_heads
--+++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++         self.max_position_embeddings = config.max_position_embeddings
--+++         self.rope_theta = config.rope_theta
--+++         self.attention_dropout = config.attention_dropout
--+++ 
--+++-        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++-            raise ValueError(
--+++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++-            )
--+++-
--+++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--+++         key_states = self.k_proj(hidden_states)
--+++         value_states = self.v_proj(hidden_states)
--+++ 
--+++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++-        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++        # 2. 调整形状以匹配 BNSD 布局
--+++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-
--+++-        # 3. RoPE 旋转位置编码
--++++        
--++++        # 3. RoPE 和 KV 缓存
--+++         kv_seq_len = key_states.shape[-2]
--+++         if past_key_value is not None:
--+++-            if self.layer_idx is None:
--+++-                raise ValueError(
--+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-                    "with a layer index."
--+++-                )
--+++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++-                if cache_position.shape[0] == 1:
--+++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++-                    kv_seq_len = past_seen_tokens + 1
--+++-                else:
--+++-                    # prefill 阶段：cache_position 是范围，使用其长度
--+++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++-            else:
--+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-
--++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++        
--+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++ 
--+++-        # 4. KV 缓存更新
--+++         if past_key_value is not None:
--+++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-            key_states, value_states = past_key_value.update(
--+++-                key_states, value_states, self.layer_idx, cache_kwargs
--+++-            )
--+++-            
--+++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++-                if cache_position.shape[0] == 1:
--+++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++-                    kv_seq_len = key_states.shape[-2]
--+++-
--+++-        # 5. [重要] 准备 Attention Mask
--+++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++
--++++        # 4. 准备 Attention Mask
--+++         fa_attention_mask = None
--+++         if attention_mask is not None:
--+++-            # 截取与当前key长度匹配的部分
--+++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++             fa_attention_mask = (mask_slice != 0)
--+++ 
--+++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++-        input_dtype = query_states.dtype
--+++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++-            query_states = query_states.to(mindspore.float16)
--+++-            key_states = key_states.to(mindspore.float16)
--+++-            value_states = value_states.to(mindspore.float16)
--+++-
--+++-        # 6. [核心] 调用 flash_attention_score 算子
--+++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--+++         attn_output = mindspore.ops.flash_attention_score(
--+++             query=query_states,
--+++             key=key_states,
--+++             value=value_states,
--+++-            head_num=self.num_heads, # 传入Q的头数(N1)
--++++            head_num=self.num_heads,
--+++             attn_mask=fa_attention_mask,
--+++-            keep_prob=1.0 - self.attention_dropout,
--++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--+++             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++             input_layout="BNSD",
--+++-            sparse_mode=0 # 使用 defaultMask 模式
--++++            sparse_mode=0,
--++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--+++         )
--+++ 
--+++-        # 恢复原始数据类型
--+++-        attn_output = attn_output.to(input_dtype)
--+++-
--+++-        # 7. 调整输出形状
--+++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++        # 6. 调整输出形状
--+++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++         attn_output = self.o_proj(attn_output)
--+++ 
--+++-        # FlashAttention 算子不直接返回注意力权重矩阵
--++++        # 7. 返回结果
--+++         attn_weights = None
--+++         if output_attentions:
--+++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++-    # def forward(
--+++-    #     self,
--+++-    #     hidden_states: mindspore.Tensor,
--+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++-    #     position_ids: Optional[mindspore.Tensor] = None,
--+++-    #     past_key_value: Optional[Cache] = None,
--+++-    #     output_attentions: bool = False,
--+++-    #     use_cache: bool = False,
--+++-    #     cache_position: Optional[mindspore.Tensor] = None,
--+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-
--+++-    #     bsz, q_len, _ = hidden_states.shape
--+++-
--+++-    #     # 1. 线性投射 Q, K, V
--+++-    #     query_states = self.q_proj(hidden_states)
--+++-    #     key_states = self.k_proj(hidden_states)
--+++-    #     value_states = self.v_proj(hidden_states)
--+++-
--+++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-
--+++-    #     # 3. RoPE 旋转位置编码
--+++-    #     kv_seq_len = key_states.shape[-2]
--+++-    #     if past_key_value is not None:
--+++-    #         if self.layer_idx is None:
--+++-    #             raise ValueError(
--+++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-    #                 "with a layer index."
--+++-    #             )
--+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++ 
--+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-
--+++-    #     # 4. KV 缓存更新
--+++-    #     if past_key_value is not None:
--+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-    #         key_states, value_states = past_key_value.update(
--+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++-    #         )
--+++-
--+++-    #     # 5. 准备 Attention Mask
--+++-    #     fa_attention_mask = None
--+++-    #     if attention_mask is not None:
--+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-    #         fa_attention_mask = (mask_slice != 0)
--+++-
--+++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++-    #     input_dtype = query_states.dtype
--+++-
--+++-    #     # 6. [核心] 调用 flash_attention_score 算子
--+++-    #     attn_output = mindspore.ops.flash_attention_score(
--+++-    #         query=query_states,
--+++-    #         key=key_states,
--+++-    #         value=value_states,
--+++-    #         head_num=self.num_heads,
--+++-    #         attn_mask=fa_attention_mask,
--+++-    #         keep_prob=1.0 - self.attention_dropout,
--+++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-    #         input_layout="BNSD",
--+++-    #         sparse_mode=0,
--+++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++-    #         inner_precise=1
--+++-    #     )
--+++-
--+++-    #     # 恢复原始数据类型
--+++-    #     attn_output = attn_output.to(input_dtype)
--++++QWEN2MOE_ATTENTION_CLASSES = {
--++++    "eager": Qwen2MoeAttention,
--++++    "flash-attention": Qwen2MoeFlashAttention,
--++++}
--+++ 
--+++-    #     # 7. 调整输出形状
--+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-    #     attn_output = self.o_proj(attn_output)
--+++ 
--+++-    #     attn_weights = None
--+++-    #     if output_attentions:
--+++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     def __init__(self, config):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         # gating
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     #@dwj
--++++#     # 只遍历激活的专家，而非全部专家
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#             num_tokens = hidden_states_reshaped.shape[0]
--++++
--++++#             router_logits = self.gate(hidden_states_reshaped)
--++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++
--++++#             if self.norm_topk_prob:
--++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++#             routing_weights = routing_weights.to(hidden_states.dtype)
--++++            
--++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++#             flat_selected_experts = selected_experts.flatten()
--++++            
--++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++#             token_indices = broadcasted_token_indices.flatten()
--++++            
--++++#             active_experts = ops.unique(flat_selected_experts)
--++++            
--++++#             for expert_idx_tensor in active_experts:
--++++#                 expert_idx = expert_idx_tensor.item()
--++++#                 expert_layer = self.experts[expert_idx]
--++++                
--++++#                 mask = (flat_selected_experts == expert_idx_tensor)
--++++#                 selected_token_indices = token_indices[mask]
--++++#                 selected_routing_weights = routing_weights.flatten()[mask]
--++++                
--++++#                 current_states = hidden_states_reshaped[selected_token_indices]
--++++                
--++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++                
--++++#                 final_hidden_states = final_hidden_states.index_add(
--++++#                     dim=0,
--++++#                     index=selected_token_indices,
--++++#                     source=expert_output.to(hidden_states.dtype)
--++++#                 )
--++++            
--++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++ 
--+++-    #     return attn_output, attn_weights, past_key_value
--++++#             final_hidden_states = final_hidden_states + shared_expert_output
--++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++            
--++++#             return final_hidden_states, router_logits
--++++
--++++
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     """
--++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         # 门控网络
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         # 专家列表
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++#         # 共享专家
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_decode(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         """
--++++#         【解码路径】针对 sequence_length=1 的极致优化。
--++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--++++#         """
--++++#         batch_size, hidden_dim = hidden_states.shape
--++++        
--++++#         expert_outputs_list = [
--++++#             ops.cat([
--++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++#             ], dim=0) 
--++++#             for i in range(batch_size)
--++++#         ]
--++++        
--++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--++++#         # shape: (batch_size, top_k, hidden_dim)
--++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++        
--++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++        
--++++#         return moe_output.squeeze(1)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_prefill(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         """
--++++#         【预填充路径】针对 sequence_length > 1 的优化。
--++++#         按专家对 Token 进行分组，并进行批处理。
--++++#         """
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens = hidden_states.shape[0]
--++++#         flat_selected_experts = selected_experts.flatten()
--++++        
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++        
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++        
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++            
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++#             selected_token_indices = token_indices[mask]
--++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++            
--++++#             current_states = hidden_states[selected_token_indices]
--++++            
--++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++            
--++++#             moe_output = moe_output.index_add(
--++++#                 dim=0,
--++++#                 index=selected_token_indices,
--++++#                 source=expert_output.to(hidden_states.dtype)
--++++#             )
--++++#         return moe_output
--++++
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#         """
--++++#         顶层 forward 方法，作为智能分发器。
--++++#         """
--++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++        
--++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#         router_logits = self.gate(hidden_states_reshaped)
--++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++ 
--+++-    # def forward(
--+++-    #     self,
--+++-    #     hidden_states: mindspore.Tensor,
--+++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++-    #     position_ids: Optional[mindspore.Tensor] = None,
--+++-    #     past_key_value: Optional[Cache] = None,
--+++-    #     output_attentions: bool = False,
--+++-    #     use_cache: bool = False,
--+++-    #     cache_position: Optional[mindspore.Tensor] = None,
--+++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-
--+++-    #     bsz, q_len, _ = hidden_states.shape
--+++-
--+++-    #     query_states = self.q_proj(hidden_states)
--+++-    #     key_states = self.k_proj(hidden_states)
--+++-    #     value_states = self.v_proj(hidden_states)
--+++-
--+++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-
--+++-    #     kv_seq_len = key_states.shape[-2]
--+++-    #     if past_key_value is not None:
--+++-    #         if self.layer_idx is None:
--+++-    #             raise ValueError("`layer_idx` must be specified for caching")
--+++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-
--+++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-
--+++-    #     if past_key_value is not None:
--+++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-    #         key_states, value_states = past_key_value.update(
--+++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++-    #         )
--++++#         if self.norm_topk_prob:
--++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++        
--++++#         moe_output = None
--++++#         # 在推理时，根据序列长度选择最优路径
--++++#         if not self.training:
--++++#             if sequence_length == 1:
--++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++#             else:
--++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++#         else:
--++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--++++#             raise NotImplementedError("Training path is not implemented.")
--++++
--++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--++++        
--++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--++++        
--++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++#         return final_hidden_states, router_logits
--++++
--++++
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     """
--++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         # 门控网络
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         # 专家列表
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++#         # 共享专家
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_decode(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         batch_size, _ = hidden_states.shape
--++++#         expert_outputs_list = [
--++++#             ops.cat([
--++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++#             ], dim=0) 
--++++#             for i in range(batch_size)
--++++#         ]
--++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++#         return moe_output.squeeze(1)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_prefill(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens = hidden_states.shape[0]
--++++#         flat_selected_experts = selected_experts.flatten()
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++        
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++#             selected_token_indices = token_indices[mask]
--++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++#             current_states = hidden_states[selected_token_indices]
--++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++#             moe_output = moe_output.index_add(
--++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++#             )
--++++#         return moe_output
--++++
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#         """
--++++#         顶层 forward 方法，作为智能分发器。
--++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--++++#         """
--++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++        
--++++#         # 1. 门控计算 (通用逻辑)
--++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#         router_logits = self.gate(hidden_states_reshaped)
--++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++
--++++#         if self.norm_topk_prob:
--++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++        
--++++#         # 2. 智能分发到最优 MoE 路径
--++++#         moe_output = None
--++++#         if not self.training:
--++++#             if sequence_length == 1:
--++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++#             else:
--++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++#         else:
--++++#             raise NotImplementedError("Training path is not implemented.")
--++++
--++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++        
--++++#         # 4. 合并 MoE 输出和共享专家输出
--++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++        
--++++#         # 5. 恢复原始形状并返回
--++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++#         return final_hidden_states, router_logits
--++++
--++++# prefill fastest
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     """
--++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         # 门控网络
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         # 专家列表
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++#         # 共享专家
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_dispatch(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         """
--++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--++++#         """
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens, _ = hidden_states.shape
--++++        
--++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--++++#         flat_selected_experts = selected_experts.flatten()
--++++#         flat_routing_weights = routing_weights.flatten()
--+++ 
--+++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++-
--+++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++-    #     query_states = query_states / math.sqrt(self.head_dim)
--+++-    #     # <--- 修改结束 ---
--+++-
--+++-    #     fa_attention_mask = None
--+++-    #     if attention_mask is not None:
--+++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-    #         fa_attention_mask = (mask_slice != 0)
--+++-
--+++-    #     input_dtype = query_states.dtype
--+++-
--+++-    #     attn_output = mindspore.ops.flash_attention_score(
--+++-    #         query=query_states,    # 传入已经预先缩放过的 query
--+++-    #         key=key_states,
--+++-    #         value=value_states,
--+++-    #         head_num=self.num_heads,
--+++-    #         attn_mask=fa_attention_mask,
--+++-    #         keep_prob=1.0 - self.attention_dropout,
--+++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++-    #         input_layout="BNSD",
--+++-    #         sparse_mode=0,
--+++-    #         inner_precise=1        # 仍然保持内部高精度计算
--+++-    #     )
--++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++ 
--+++-    #     attn_output = attn_output.to(input_dtype)
--+++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-    #     attn_output = self.o_proj(attn_output)
--++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++        
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++            
--++++#             # 找到所有分配给该专家的 token
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++            
--++++#             # 使用 mask 选取对应的 token 和权重
--++++#             current_token_indices = token_indices[mask]
--++++#             current_routing_weights = flat_routing_weights[mask]
--++++#             current_hidden_states = hidden_states[current_token_indices]
--++++            
--++++#             # 对这些 token 进行批处理
--++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++            
--++++#             # 使用 index_add 将结果精确地加回到对应位置
--++++#             moe_output = moe_output.index_add(
--++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--++++#             )
--++++#         return moe_output
--++++
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#         """
--++++#         顶层 forward 方法，作为智能分发器。
--++++#         """
--++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++        
--++++#         # 1. 门控计算
--++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#         router_logits = self.gate(hidden_states_reshaped)
--++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++
--++++#         if self.norm_topk_prob:
--++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++        
--++++#         # 2. 调用统一的 MoE 计算内核
--++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+++ 
--+++-    #     attn_weights = None
--+++-    #     if output_attentions:
--+++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++#         # 3. 统一处理共享专家
--++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++        
--++++#         # 4. 合并输出
--++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++        
--++++#         # 5. 恢复原始形状并返回
--++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++#         return final_hidden_states, router_logits
--++++
--++++
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     """
--++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++#     【最终高性能与高精度版】：
--++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--++++#     3. 这样实现了速度和准确性的两全其美。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_decode(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         """
--++++#         【解码路径】极致优化版：bmm + 高精度累加。
--++++#         """
--++++#         original_dtype = hidden_states.dtype
--++++#         batch_size, _ = hidden_states.shape
--++++        
--++++#         expert_outputs_list = [
--++++#             ops.cat([
--++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++#             ], dim=0) 
--++++#             for i in range(batch_size)
--++++#         ]
--++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++
--++++#         # 在 float32 下执行 bmm，得到高精度结果
--++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++        
--++++#         # 将高精度结果转换回原始数据类型
--++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--++++        
--++++#         return moe_output
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_prefill(
--++++#         self, 
--++++#         hidden_states: mindspore.Tensor, 
--++++#         selected_experts: mindspore.Tensor, 
--++++#         routing_weights: mindspore.Tensor
--++++#     ) -> mindspore.Tensor:
--++++#         """
--++++#         【预填充路径】与原始实现一致，结果精确。
--++++#         """
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens, _ = hidden_states.shape
--++++#         flat_selected_experts = selected_experts.flatten()
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++        
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++#             selected_token_indices = token_indices[mask]
--++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++#             current_states = hidden_states[selected_token_indices]
--++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++#             moe_output = moe_output.index_add(
--++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++#             )
--++++#         return moe_output
--++++
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++        
--++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#         router_logits = self.gate(hidden_states_reshaped)
--++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++ 
--+++-    #     return attn_output, attn_weights, past_key_value
--++++#         if self.norm_topk_prob:
--++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--++++#         # 如果模型主体是 float16，后续再转换
--++++        
--++++#         moe_output = None
--++++#         if not self.training:
--++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--++++#             # _moe_infer_decode 内部会处理好类型转换
--++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--++++#             if sequence_length == 1:
--++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++#             else:
--++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++#         else:
--++++#             raise NotImplementedError("Training path is not implemented.")
--++++
--++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++        
--++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++#         return final_hidden_states, router_logits
--++++    
--+++ 
--+++-QWEN2MOE_ATTENTION_CLASSES = {
--+++-    "eager": Qwen2MoeAttention,
--+++-    "flash-attention": Qwen2MoeFlashAttention,
--+++-}
--++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++#     """
--++++#     【融合版】一个混合专家模块，内置两种推理策略，
--++++#     由外部全局变量 `Long_Prompt` 控制：
--++++
--++++#     - if Long_Prompt is True:  【精度优先模式】
--++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--++++#       适用于处理长序列，避免误差累积。
--++++
--++++#     - if Long_Prompt is False: 【速度优先模式】
--++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--++++#       在解码阶段获得极致速度，同时保证结果高度准确。
--++++#     """
--++++#     def __init__(self, config: Qwen2MoeConfig):
--++++#         super().__init__()
--++++#         self.num_experts = config.num_experts
--++++#         self.top_k = config.num_experts_per_tok
--++++#         self.norm_topk_prob = config.norm_topk_prob
--++++
--++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++#         self.experts = nn.ModuleList(
--++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++#         )
--++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++
--++++#     # --- 速度优先模式的辅助函数 ---
--++++#     @no_grad()
--++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++#         original_dtype = hidden_states.dtype
--++++#         batch_size, _ = hidden_states.shape
--++++#         expert_outputs_list = [
--++++#             ops.cat([
--++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++#             ], dim=0) 
--++++#             for i in range(batch_size)
--++++#         ]
--++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++#         weights_fp32 = routing_weights.to(mindspore.float32)
--++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
--++++
--++++#     @no_grad()
--++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens, _ = hidden_states.shape
--++++#         flat_selected_experts = selected_experts.flatten()
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++#             selected_token_indices = token_indices[mask]
--++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++#             current_states = hidden_states[selected_token_indices]
--++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--++++#         return moe_output
--++++
--++++#     # --- 精度优先模式的辅助函数 ---
--++++#     @no_grad()
--++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++#         moe_output = ops.zeros_like(hidden_states)
--++++#         num_tokens, _ = hidden_states.shape
--++++#         flat_selected_experts = selected_experts.flatten()
--++++#         flat_routing_weights = routing_weights.flatten()
--++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++#         active_experts = ops.unique(flat_selected_experts)
--++++#         for expert_idx_tensor in active_experts:
--++++#             expert_idx = expert_idx_tensor.item()
--++++#             expert_layer = self.experts[expert_idx]
--++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++#             current_token_indices = token_indices[mask]
--++++#             current_routing_weights = flat_routing_weights[mask]
--++++#             current_hidden_states = hidden_states[current_token_indices]
--++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++++#         return moe_output
--++++
--++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++#         # 声明我们将要使用一个在模块外部定义的全局变量
--++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--++++#         global Long_Prompt
--++++
--++++#         # 1. 门控计算 (所有模式通用)
--++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++#         router_logits = self.gate(hidden_states_reshaped)
--++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++++#         if self.norm_topk_prob:
--++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++#         moe_output = None
--++++#         if not self.training:
--++++#             # 根据 Long_Prompt 标志选择模式
--++++#             if Long_Prompt:
--++++#                 # --- 精度优先模式 ---
--++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++#             else:
--++++#                 # --- 速度优先模式 ---
--++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++#                 if sequence_length == 1:
--++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++#                 else:
--++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++#         else:
--++++#             raise NotImplementedError("Training path is not implemented.")
--++++
--++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++        
--++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++#         return final_hidden_states, router_logits
--++++    
--++++class Qwen2MoeSparseMoeBlock(nn.Module):
--++++    """
--++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--++++    控制的顶级推理策略：
--+++ 
--++++    - if Long_Prompt is True:  【精度优先模式】
--++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--++++      适用于需要严格可复现性的长序列任务。
--+++ 
--+++-class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-    def __init__(self, config):
--++++    - if Long_Prompt is False: 【速度优先模式】
--++++      采用业界最强的性能组合：
--++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--++++    """
--++++    def __init__(self, config: Qwen2MoeConfig):
--+++         super().__init__()
--+++         self.num_experts = config.num_experts
--+++         self.top_k = config.num_experts_per_tok
--+++         self.norm_topk_prob = config.norm_topk_prob
--+++ 
--+++-        # gating
--+++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++         self.experts = nn.ModuleList(
--+++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++         )
--+++-
--+++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++ 
--+++-    #@dwj
--+++-    # 只遍历激活的专家，而非全部专家
--+++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-            num_tokens = hidden_states_reshaped.shape[0]
--+++-
--+++-            router_logits = self.gate(hidden_states_reshaped)
--+++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-
--+++-            if self.norm_topk_prob:
--+++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-            routing_weights = routing_weights.to(hidden_states.dtype)
--+++-            
--+++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++-            flat_selected_experts = selected_experts.flatten()
--+++-            
--+++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++-            token_indices = broadcasted_token_indices.flatten()
--+++-            
--+++-            active_experts = ops.unique(flat_selected_experts)
--+++-            
--+++-            for expert_idx_tensor in active_experts:
--+++-                expert_idx = expert_idx_tensor.item()
--+++-                expert_layer = self.experts[expert_idx]
--+++-                
--+++-                mask = (flat_selected_experts == expert_idx_tensor)
--+++-                selected_token_indices = token_indices[mask]
--+++-                selected_routing_weights = routing_weights.flatten()[mask]
--+++-                
--+++-                current_states = hidden_states_reshaped[selected_token_indices]
--+++-                
--+++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-                
--+++-                final_hidden_states = final_hidden_states.index_add(
--+++-                    dim=0,
--+++-                    index=selected_token_indices,
--+++-                    source=expert_output.to(hidden_states.dtype)
--+++-                )
--+++-            
--+++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--++++    @no_grad()
--++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++        original_dtype = hidden_states.dtype
--++++        batch_size, _ = hidden_states.shape
--++++        expert_outputs_list = [
--++++            ops.cat([
--++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++            ], dim=0)
--++++            for i in range(batch_size)
--++++        ]
--++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++        weights_fp32 = routing_weights.to(mindspore.float32)
--++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++        return moe_output_fp32.squeeze(1).to(original_dtype)
--++++
--++++    @no_grad()
--++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++        num_tokens, _ = hidden_states.shape
--++++        flat_selected_experts = selected_experts.flatten()
--++++        sorted_expert_indices = flat_selected_experts.argsort()
--++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++++        original_token_indices = sorted_expert_indices // self.top_k
--++++        moe_output = ops.zeros_like(hidden_states)
--++++        current_token_offset = 0
--++++        for i in range(self.num_experts):
--++++            expert_token_count = tokens_per_expert[i] - current_token_offset
--++++            if expert_token_count == 0:
--++++                continue
--++++            end_offset = current_token_offset + expert_token_count
--++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++++            expert_hidden_states = hidden_states[expert_original_token_indices]
--++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++++            current_token_offset += expert_token_count
--++++        return moe_output
--++++
--++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--++++    @no_grad()
--++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++        moe_output = ops.zeros_like(hidden_states)
--++++        num_tokens, _ = hidden_states.shape
--++++        flat_selected_experts = selected_experts.flatten()
--++++        flat_routing_weights = routing_weights.flatten()
--++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++        active_experts = ops.unique(flat_selected_experts)
--++++        for expert_idx_tensor in active_experts:
--++++            expert_idx = expert_idx_tensor.item()
--++++            expert_layer = self.experts[expert_idx]
--++++            mask = (flat_selected_experts == expert_idx_tensor)
--++++            current_token_indices = token_indices[mask]
--++++            current_routing_weights = flat_routing_weights[mask]
--++++            current_hidden_states = hidden_states[current_token_indices]
--++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++++        return moe_output
--+++ 
--+++-            final_hidden_states = final_hidden_states + shared_expert_output
--+++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++-            
--+++-            return final_hidden_states, router_logits
--++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++        global Long_Prompt
--++++
--++++        # 1. 门控计算 (所有模式通用)
--++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++        router_logits = self.gate(hidden_states_reshaped)
--++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++++        if self.norm_topk_prob:
--++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++        
--++++        moe_output = None
--++++        if Long_Prompt:
--++++            # --- 精度优先模式 (ACCURACY MODE) ---
--++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        else:
--++++            # --- 速度优先模式 (SPEED MODE) ---
--++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++            if sequence_length == 1:
--++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++            else:
--++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        
--+++ 
--++++        # 3. 共享专家计算与合并 (所有模式通用)
--++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++        
--++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++        
--++++        return final_hidden_states, router_logits
--+++ 
--+++ class Qwen2MoeDecoderLayer(nn.Module):
--+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+++         super().__init__()
--+++         self.hidden_size = config.hidden_size
--++++        
--++++        # if Long_Prompt:
--++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++        # else:
--++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++ 
--+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++ 
--+++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++-
--+++         if (layer_idx not in config.mlp_only_layers) and (
--+++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++         ):
--+++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++             self._warmed_up = True
--+++             self.warmup_moe_model()
--+++ 
--++++
--++++
--+++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++         output_router_logits = (
--+++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--+++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++             router_logits=outputs.router_logits,
--+++         )
--+++ 
--++++    def generate(self, *args, **kwargs):
--++++        """
--++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--++++        """
--++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++++
--++++        input_ids = kwargs.get("input_ids")
--++++        if input_ids is None and args:
--++++            input_ids = args[0]
--++++
--++++        if input_ids is not None:
--++++            prompt_length = input_ids.shape[1]
--++++            
--++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--++++                Long_Prompt = True
--++++            else:
--++++                Long_Prompt = False
--++++
--++++        return super().generate(*args, **kwargs)
--++++    
--+++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--+++     def prepare_inputs_for_generation(
--+++         self,
--+++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--+++         # Exception 1: when passing input_embeds, input_ids may be missing entries
--+++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--++++        
--+++         if past_key_values is not None:
--+++             if inputs_embeds is not None:  # Exception 1
--+++                 if 0 not in input_ids.shape:
--+++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++             }
--+++         )
--+++         return model_inputs
--++++
--+++ # @lwx
--+++     # def _decode_one_tokens_logits(
--+++     #     self,
--+++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--+++             attentions=outputs.attentions,
--+++         )
--+++ 
--++++
--+++ __all__ = [
--+++     "Qwen2MoeForCausalLM",
--+++     "Qwen2MoeModel",
--+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++new file mode 100644
--+++index 00000000..6dfb5b93
--+++--- /dev/null
--++++++ b/patches/0001-20251104commit.patch
--+++@@ -0,0 +1,1272 @@
--++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++++From: Pinoeer-kingxi <13022943007@163.com>
--++++Date: Tue, 4 Nov 2025 09:11:51 +0800
--++++Subject: [PATCH] 20251104commit
--++++
--++++---
--++++ mindnlp/transformers/cache_utils.py           |  28 +-
--++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++++ 3 files changed, 976 insertions(+), 87 deletions(-)
--++++
--++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++++index cadd2e04..02f8d4be 100644
--++++--- a/mindnlp/transformers/cache_utils.py
--+++++++ b/mindnlp/transformers/cache_utils.py
--++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++++             # k_out[:, :, cache_position] = key_states
--++++             # v_out[:, :, cache_position] = value_states
--++++-            if ON_ORANGE_PI:
--++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++-            else:
--++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++-
--+++++            # if ON_ORANGE_PI:
--+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++            # else:
--+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++            # 确保 cache_position 是 1D tensor 并且类型正确
--+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++++            if cache_position.ndim > 1:
--+++++                cache_position = cache_position.flatten()
--+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++++                cache_position = cache_position.int()
--+++++            
--+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++++            k_out[:, :, cache_position] = key_states
--+++++            v_out[:, :, cache_position] = value_states
--+++++            
--++++         return k_out, v_out
--++++ 
--++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++index c695b944..d8303e45 100644
--++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++++ def rotate_half(x):
--++++     """Rotates half the hidden dims of the input."""
--++++-    x1 = x[..., : x.shape[-1] // 2]
--++++-    x2 = x[..., x.shape[-1] // 2 :]
--+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++    # x1 = x[..., : x.shape[-1] // 2]
--+++++    # x2 = x[..., x.shape[-1] // 2 :]
--+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++     return ops.cat((-x2, x1), dim=-1)
--++++ 
--++++ 
--++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++++         if self.training:
--++++             raise NotImplementedError("Training is not supported yet.")
--++++         else:
--++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++-        if self.config.n_shared_experts is not None:
--++++-            y = y + self.shared_experts(identity)
--++++-        return y
--+++++            # @lwx
--+++++            if orig_shape[1] == 1:
--+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++++                y=y.view(*orig_shape)
--+++++                if self.config.n_shared_experts is not None:
--+++++                    y = y + self.shared_experts(identity)
--+++++                return y
--+++++            else:
--+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++++                if self.config.n_shared_experts is not None:
--+++++                    y = y + self.shared_experts(identity)
--+++++                return y
--+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++        # if self.config.n_shared_experts is not None:
--+++++        #     y = y + self.shared_experts(identity)
--+++++        # return y
--+++++        
--+++++    @no_grad()
--+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++
--+++++        expert_cache = ops.zeros_like(x)
--+++++        for i in range(self.num_experts_per_tok):
--+++++            expert_id = flat_expert_indices[i].item()
--+++++            weight = flat_expert_weights[i].item()
--+++++            expert = self.experts[expert_id]
--+++++            expert_out = expert(x)
--+++++            expert_cache += expert_out * weight
--+++++        return expert_cache
--++++ 
--++++     @no_grad()
--++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-        # expert_cache = torch.zeros_like(x)
--++++-        # idxs = flat_expert_indices.argsort()
--++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++-        # token_idxs = idxs // self.num_experts_per_tok
--++++-        # for i, end_idx in enumerate(tokens_per_expert):
--++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++-        #     if start_idx == end_idx:
--++++-        #         continue
--++++-        #     expert = self.experts[i]
--++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++-        #     expert_tokens = x[exp_token_idx]
--++++-        #     expert_out = expert(expert_tokens)
--++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++-        # return expert_cache
--+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++         expert_cache = ops.zeros_like(x)
--++++         idxs = flat_expert_indices.argsort()
--++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++         token_idxs = idxs // self.num_experts_per_tok
--+++++
--++++         for i, end_idx in enumerate(tokens_per_expert):
--++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++             if start_idx == end_idx:
--++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++++             expert_out = expert(expert_tokens)
--++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++
--++++         return expert_cache
--+++++        
--+++++    # @no_grad()
--+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++    #     # expert_cache = torch.zeros_like(x)
--+++++    #     # idxs = flat_expert_indices.argsort()
--+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++    #     # token_idxs = idxs // self.num_experts_per_tok
--+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++    #     #     if start_idx == end_idx:
--+++++    #     #         continue
--+++++    #     #     expert = self.experts[i]
--+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #     #     expert_tokens = x[exp_token_idx]
--+++++    #     #     expert_out = expert(expert_tokens)
--+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++    #     # return expert_cache
--+++++    #     expert_cache = ops.zeros_like(x)
--+++++    #     idxs = flat_expert_indices.argsort()
--+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++    #     for i, end_idx in enumerate(tokens_per_expert):
--+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++    #         if start_idx == end_idx:
--+++++    #             continue
--+++++    #         expert = self.experts[i]
--+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #         expert_tokens = x[exp_token_idx]
--+++++    #         expert_out = expert(expert_tokens)
--+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++
--+++++    #     return expert_cache
--+++++    # @no_grad()
--+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++    #     expert_cache = ops.zeros_like(x)
--+++++
--+++++    #     # 排序保证顺序一致
--+++++    #     idxs = flat_expert_indices.argsort()
--+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++    #     # 找出有 token 的专家
--+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++
--+++++    #     for i in active_experts.tolist():
--+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++    #         end_idx = tokens_per_expert[i]
--+++++    #         if start_idx == end_idx:  # 没有 token
--+++++    #             continue
--+++++
--+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #         expert_tokens = x[exp_token_idx]
--+++++    #         expert_out = self.experts[i](expert_tokens)
--+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++
--+++++    #         expert_cache = mindspore.mint.scatter_add(
--+++++    #             expert_cache,
--+++++    #             0,
--+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++    #             expert_out
--+++++    #         )
--+++++
--+++++    #     return expert_cache
--+++++
--+++++
--++++ 
--++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++++ #     """
--++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++ 
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--+++++        self.warm_up = False
--+++++
--+++++    def warmup_moe_model_deep(self):
--+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++++        test_texts = [
--+++++            "warmup short",
--+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++++        ]
--+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++        if tokenizer is None:
--+++++            from mindnlp.transformers import AutoTokenizer
--+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++            self._warmup_tokenizer = tokenizer
--+++++
--+++++        for text in test_texts:
--+++++            inputs = tokenizer(text, return_tensors="ms")
--+++++            with mindspore._no_grad():
--+++++                _ = self(**inputs, use_cache=False)
--+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++++ 
--++++     def get_input_embeddings(self):
--++++         return self.model.embed_tokens
--++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++         ```"""
--+++++        if not self.warm_up:
--+++++            self.warm_up = True
--+++++            self.warmup_moe_model_deep()
--+++++
--++++         output_attentions = (
--++++             output_attentions
--++++             if output_attentions is not None
--++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++index 3cbf820e..d4c6b651 100644
--++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++@@ -18,7 +18,6 @@
--++++ # See the License for the specific language governing permissions and
--++++ # limitations under the License.
--++++ """MindSpore Qwen2MoE model."""
--++++-
--++++ import math
--++++ from typing import List, Optional, Tuple, Union
--++++ 
--++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++++     TokenClassifierOutput,
--++++ )
--++++ from ...modeling_utils import PreTrainedModel
--+++++from ...generation import GenerationMixin
--++++ from ....utils import logging
--++++ from .configuration_qwen2_moe import Qwen2MoeConfig
--++++ 
--++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++++         self.variance_epsilon = eps
--++++ 
--++++     def forward(self, hidden_states):
--+++++        # @dwj
--+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++        # @lwx
--+++++        # if not self.training :
--+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++         input_dtype = hidden_states.dtype
--++++         hidden_states = hidden_states.to(mindspore.float32)
--++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++++@@ -234,6 +239,8 @@ def rotate_half(x):
--++++     """Rotates half the hidden dims of the input."""
--++++     x1 = x[..., : x.shape[-1] // 2]
--++++     x2 = x[..., x.shape[-1] // 2 :]
--+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++     return ops.cat((-x2, x1), dim=-1)
--++++ 
--++++ 
--++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++++         self.config = config
--++++         self.hidden_size = config.hidden_size
--++++         self.intermediate_size = intermediate_size
--+++++        
--++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++++         self.act_fn = ACT2FN[config.hidden_act]
--++++ 
--++++     def forward(self, x):
--++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++-
--++++ 
--+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++        # @lwx 
--+++++        # gate_up_output = self.gate_up_proj(x)
--+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++++        # return self.down_proj(swiglu_output)
--+++++
--+++++    # def forward(self, x):
--+++++    #     gate_proj_out = self.gate_proj(x)
--+++++    #     up_proj_out = self.up_proj(x)
--+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++++    #     return self.down_proj(swiglu_out)
--+++++        
--++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++++     """
--++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++++         use_cache: bool = False,
--++++         cache_position: Optional[mindspore.Tensor] = None,
--++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++        
--+++++
--++++         bsz, q_len, _ = hidden_states.shape
--++++ 
--++++         query_states = self.q_proj(hidden_states)
--++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++                     "with a layer index."
--++++                 )
--++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++            if isinstance(past_key_value, StaticCache):
--+++++                kv_seq_len = key_states.shape[-2]
--+++++            else:
--+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++ 
--++++         if past_key_value is not None:
--++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++            
--+++++            if isinstance(past_key_value, StaticCache):
--+++++                kv_seq_len = key_states.shape[-2]
--++++ 
--++++         # repeat k/v heads if n_kv_heads < n_heads
--++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++-
--+++++        
--++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++ 
--++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++++-            raise ValueError(
--++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++++-                f" {attn_weights.shape}"
--++++-            )
--++++-
--++++-        if attention_mask is not None:  # no matter the length, we just slice it
--++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++++        if attention_mask is not None:
--+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++             attn_weights = attn_weights + causal_mask
--++++ 
--++++         # upcast attention to fp32
--++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++ 
--++++         attn_output = self.o_proj(attn_output)
--++++-
--+++++        # @lwx
--+++++        
--+++++        # max_seq_len = self.max_position_embeddings  # 2048
--+++++
--+++++        # if attention_mask is not None:
--+++++        #     # attention_mask: [B, 1, Sq, Sk]
--+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++
--+++++        #     # pad 到 [max_seq_len, max_seq_len]
--+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++        #     global_attention_mask = padded_mask
--+++++        # else:
--+++++        #     global_attention_mask = None
--+++++
--+++++
--+++++        # sparse_mode=3
--+++++        # attn_output = mindspore.ops.flash_attention_score(
--+++++        #     query=query_states,
--+++++        #     key=key_states,
--+++++        #     value=value_states,
--+++++        #     real_shift=None,
--+++++        #     padding_mask=None,
--+++++
--+++++        #     head_num=self.num_heads,
--+++++        #     attn_mask=global_attention_mask,
--+++++        #     keep_prob=1.0 - self.attention_dropout,
--+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++        #     input_layout="BNSD",
--+++++        #     pre_tokens=2147483647,
--+++++        #     next_tokens=2147483647,
--+++++        #     inner_precise=0,
--+++++        #     drop_mask=None,
--+++++        #     prefix=None,
--+++++        #     actual_seq_qlen=None,
--+++++        #     actual_seq_kvlen=None,
--+++++        #     sparse_mode=sparse_mode,
--+++++        # )
--++++         if not output_attentions:
--++++             attn_weights = None
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++ 
--+++++class Qwen2MoeFlashAttention(nn.Module):
--+++++    """
--+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++
--+++++    关键改动:
--+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++       直接传入原始的 key 和 value 张量效率更高。
--+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++    """
--+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++        super().__init__()
--+++++        self.config = config
--+++++        self.layer_idx = layer_idx
--+++++        self.hidden_size = config.hidden_size
--+++++        self.num_heads = config.num_attention_heads
--+++++        self.head_dim = self.hidden_size // self.num_heads
--+++++        self.num_key_value_heads = config.num_key_value_heads
--+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++        self.max_position_embeddings = config.max_position_embeddings
--+++++        self.rope_theta = config.rope_theta
--+++++        self.attention_dropout = config.attention_dropout
--+++++
--+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++            raise ValueError(
--+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++            )
--+++++
--+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++
--+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++            self.head_dim,
--+++++            max_position_embeddings=self.max_position_embeddings,
--+++++            base=self.rope_theta,
--+++++        )
--+++++
--+++++    def forward(
--+++++        self,
--+++++        hidden_states: mindspore.Tensor,
--+++++        attention_mask: Optional[mindspore.Tensor] = None,
--+++++        position_ids: Optional[mindspore.Tensor] = None,
--+++++        past_key_value: Optional[Cache] = None,
--+++++        output_attentions: bool = False,
--+++++        use_cache: bool = False,
--+++++        cache_position: Optional[mindspore.Tensor] = None,
--+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++        bsz, q_len, _ = hidden_states.shape
--+++++
--+++++        # 1. 线性投射 Q, K, V
--+++++        query_states = self.q_proj(hidden_states)
--+++++        key_states = self.k_proj(hidden_states)
--+++++        value_states = self.v_proj(hidden_states)
--+++++
--+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++        # 3. RoPE 旋转位置编码
--+++++        kv_seq_len = key_states.shape[-2]
--+++++        if past_key_value is not None:
--+++++            if self.layer_idx is None:
--+++++                raise ValueError(
--+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++                    "with a layer index."
--+++++                )
--+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++                if cache_position.shape[0] == 1:
--+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++                    kv_seq_len = past_seen_tokens + 1
--+++++                else:
--+++++                    # prefill 阶段：cache_position 是范围，使用其长度
--+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++            else:
--+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++        # 4. KV 缓存更新
--+++++        if past_key_value is not None:
--+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++            key_states, value_states = past_key_value.update(
--+++++                key_states, value_states, self.layer_idx, cache_kwargs
--+++++            )
--+++++            
--+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++                if cache_position.shape[0] == 1:
--+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++                    kv_seq_len = key_states.shape[-2]
--+++++
--+++++        # 5. [重要] 准备 Attention Mask
--+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++        fa_attention_mask = None
--+++++        if attention_mask is not None:
--+++++            # 截取与当前key长度匹配的部分
--+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++            fa_attention_mask = (mask_slice != 0)
--+++++
--+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++        input_dtype = query_states.dtype
--+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++            query_states = query_states.to(mindspore.float16)
--+++++            key_states = key_states.to(mindspore.float16)
--+++++            value_states = value_states.to(mindspore.float16)
--+++++
--+++++        # 6. [核心] 调用 flash_attention_score 算子
--+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++        attn_output = mindspore.ops.flash_attention_score(
--+++++            query=query_states,
--+++++            key=key_states,
--+++++            value=value_states,
--+++++            head_num=self.num_heads, # 传入Q的头数(N1)
--+++++            attn_mask=fa_attention_mask,
--+++++            keep_prob=1.0 - self.attention_dropout,
--+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++            input_layout="BNSD",
--+++++            sparse_mode=0 # 使用 defaultMask 模式
--+++++        )
--+++++
--+++++        # 恢复原始数据类型
--+++++        attn_output = attn_output.to(input_dtype)
--+++++
--+++++        # 7. 调整输出形状
--+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++        attn_output = self.o_proj(attn_output)
--+++++
--+++++        # FlashAttention 算子不直接返回注意力权重矩阵
--+++++        attn_weights = None
--+++++        if output_attentions:
--+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++        return attn_output, attn_weights, past_key_value
--+++++
--+++++    # def forward(
--+++++    #     self,
--+++++    #     hidden_states: mindspore.Tensor,
--+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++    #     past_key_value: Optional[Cache] = None,
--+++++    #     output_attentions: bool = False,
--+++++    #     use_cache: bool = False,
--+++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++    #     bsz, q_len, _ = hidden_states.shape
--+++++
--+++++    #     # 1. 线性投射 Q, K, V
--+++++    #     query_states = self.q_proj(hidden_states)
--+++++    #     key_states = self.k_proj(hidden_states)
--+++++    #     value_states = self.v_proj(hidden_states)
--+++++
--+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++    #     # 3. RoPE 旋转位置编码
--+++++    #     kv_seq_len = key_states.shape[-2]
--+++++    #     if past_key_value is not None:
--+++++    #         if self.layer_idx is None:
--+++++    #             raise ValueError(
--+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++    #                 "with a layer index."
--+++++    #             )
--+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++    #     # 4. KV 缓存更新
--+++++    #     if past_key_value is not None:
--+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++    #         key_states, value_states = past_key_value.update(
--+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++    #         )
--+++++
--+++++    #     # 5. 准备 Attention Mask
--+++++    #     fa_attention_mask = None
--+++++    #     if attention_mask is not None:
--+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++    #         fa_attention_mask = (mask_slice != 0)
--+++++
--+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++    #     input_dtype = query_states.dtype
--+++++
--+++++    #     # 6. [核心] 调用 flash_attention_score 算子
--+++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++    #         query=query_states,
--+++++    #         key=key_states,
--+++++    #         value=value_states,
--+++++    #         head_num=self.num_heads,
--+++++    #         attn_mask=fa_attention_mask,
--+++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++    #         input_layout="BNSD",
--+++++    #         sparse_mode=0,
--+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++    #         inner_precise=1
--+++++    #     )
--+++++
--+++++    #     # 恢复原始数据类型
--+++++    #     attn_output = attn_output.to(input_dtype)
--+++++
--+++++    #     # 7. 调整输出形状
--+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++    #     attn_output = self.o_proj(attn_output)
--+++++
--+++++    #     attn_weights = None
--+++++    #     if output_attentions:
--+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++    #     return attn_output, attn_weights, past_key_value
--+++++
--+++++    # def forward(
--+++++    #     self,
--+++++    #     hidden_states: mindspore.Tensor,
--+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++    #     past_key_value: Optional[Cache] = None,
--+++++    #     output_attentions: bool = False,
--+++++    #     use_cache: bool = False,
--+++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++    #     bsz, q_len, _ = hidden_states.shape
--+++++
--+++++    #     query_states = self.q_proj(hidden_states)
--+++++    #     key_states = self.k_proj(hidden_states)
--+++++    #     value_states = self.v_proj(hidden_states)
--+++++
--+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++    #     kv_seq_len = key_states.shape[-2]
--+++++    #     if past_key_value is not None:
--+++++    #         if self.layer_idx is None:
--+++++    #             raise ValueError("`layer_idx` must be specified for caching")
--+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++    #     if past_key_value is not None:
--+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++    #         key_states, value_states = past_key_value.update(
--+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++    #         )
--+++++
--+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++
--+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++++    #     query_states = query_states / math.sqrt(self.head_dim)
--+++++    #     # <--- 修改结束 ---
--+++++
--+++++    #     fa_attention_mask = None
--+++++    #     if attention_mask is not None:
--+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++    #         fa_attention_mask = (mask_slice != 0)
--+++++
--+++++    #     input_dtype = query_states.dtype
--+++++
--+++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++    #         query=query_states,    # 传入已经预先缩放过的 query
--+++++    #         key=key_states,
--+++++    #         value=value_states,
--+++++    #         head_num=self.num_heads,
--+++++    #         attn_mask=fa_attention_mask,
--+++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++++    #         input_layout="BNSD",
--+++++    #         sparse_mode=0,
--+++++    #         inner_precise=1        # 仍然保持内部高精度计算
--+++++    #     )
--+++++
--+++++    #     attn_output = attn_output.to(input_dtype)
--+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++    #     attn_output = self.o_proj(attn_output)
--+++++
--+++++    #     attn_weights = None
--+++++    #     if output_attentions:
--+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++++
--+++++    #     return attn_output, attn_weights, past_key_value
--+++++
--++++ QWEN2MOE_ATTENTION_CLASSES = {
--++++     "eager": Qwen2MoeAttention,
--+++++    "flash-attention": Qwen2MoeFlashAttention,
--++++ }
--++++ 
--++++ 
--++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++ 
--+++++    #@dwj
--+++++    # 只遍历激活的专家，而非全部专家
--++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        hidden_states = hidden_states.view(-1, hidden_dim)
--++++-        # router_logits: (batch * sequence_length, n_experts)
--++++-        router_logits = self.gate(hidden_states)
--++++-
--++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-        if self.norm_topk_prob:
--++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        # we cast back to the input dtype
--++++-        routing_weights = routing_weights.to(hidden_states.dtype)
--++++-
--++++-        final_hidden_states = ops.zeros(
--++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++++-        )
--++++-
--++++-        # One hot encode the selected experts to create an expert mask
--++++-        # this will be used to easily index which expert is going to be sollicitated
--++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++++-
--++++-        # Loop over all available experts in the model and perform the computation on each expert
--++++-        for expert_idx in range(self.num_experts):
--++++-            expert_layer = self.experts[expert_idx]
--++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++++-
--++++-            # Index the correct hidden states and compute the expert hidden state for
--++++-            # the current expert. We need to make sure to multiply the output hidden
--++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++++-            if 0 not in idx.shape:
--++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++++-
--++++-                # However `index_add_` only support torch tensors for indexing so we'll use
--++++-                # the `top_x` tensor here.
--++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++++-
--++++-        shared_expert_output = self.shared_expert(hidden_states)
--++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++++-
--++++-        final_hidden_states = final_hidden_states + shared_expert_output
--+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++            num_tokens = hidden_states_reshaped.shape[0]
--+++++
--+++++            router_logits = self.gate(hidden_states_reshaped)
--+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++
--+++++            if self.norm_topk_prob:
--+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++            routing_weights = routing_weights.to(hidden_states.dtype)
--+++++            
--+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++            flat_selected_experts = selected_experts.flatten()
--+++++            
--+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++            token_indices = broadcasted_token_indices.flatten()
--+++++            
--+++++            active_experts = ops.unique(flat_selected_experts)
--+++++            
--+++++            for expert_idx_tensor in active_experts:
--+++++                expert_idx = expert_idx_tensor.item()
--+++++                expert_layer = self.experts[expert_idx]
--+++++                
--+++++                mask = (flat_selected_experts == expert_idx_tensor)
--+++++                selected_token_indices = token_indices[mask]
--+++++                selected_routing_weights = routing_weights.flatten()[mask]
--+++++                
--+++++                current_states = hidden_states_reshaped[selected_token_indices]
--+++++                
--+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++                
--+++++                final_hidden_states = final_hidden_states.index_add(
--+++++                    dim=0,
--+++++                    index=selected_token_indices,
--+++++                    source=expert_output.to(hidden_states.dtype)
--+++++                )
--+++++            
--+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++ 
--++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++-        return final_hidden_states, router_logits
--+++++            final_hidden_states = final_hidden_states + shared_expert_output
--+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++            
--+++++            return final_hidden_states, router_logits
--++++ 
--++++ 
--++++ class Qwen2MoeDecoderLayer(nn.Module):
--++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++++ 
--++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++ 
--+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++
--++++         if (layer_idx not in config.mlp_only_layers) and (
--++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++++         ):
--++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++++     _skip_keys_device_placement = "past_key_values"
--++++     _supports_cache_class = True
--+++++#lwx
--+++++    # _supports_static_cache = True
--++++ 
--++++     def _init_weights(self, module):
--++++         std = self.config.initializer_range
--++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++++         return causal_mask
--++++ 
--++++ 
--++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++     _tied_weights_keys = ["lm_head.weight"]
--++++ 
--++++     def __init__(self, config):
--++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++         self.num_experts_per_tok = config.num_experts_per_tok
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--+++++        # @lwx
--+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++++        #     self.generation_config.cache_implementation = "static"
--+++++        self._warmed_up = False
--+++++
--+++++    def warmup_moe_model(self):
--+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++++        test_texts = [
--+++++            "warmup short",
--+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++++        ]
--+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++        if tokenizer is None:
--+++++            from mindnlp.transformers import AutoTokenizer
--+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++            self._warmup_tokenizer = tokenizer
--+++++
--+++++        for text in test_texts:
--+++++            inputs = tokenizer(text, return_tensors="ms")
--+++++            with mindspore._no_grad():
--+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++++ 
--++++     def get_input_embeddings(self):
--++++         return self.model.embed_tokens
--++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++         ```"""
--+++++        if not self._warmed_up:
--+++++            self._warmed_up = True
--+++++            self.warmup_moe_model()
--++++ 
--++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++++         output_router_logits = (
--++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++             }
--++++         )
--++++         return model_inputs
--+++++# @lwx
--+++++    # def _decode_one_tokens_logits(
--+++++    #     self,
--+++++    #     cur_token: mindspore.Tensor,
--+++++    #     input_pos: Optional[mindspore.Tensor],
--+++++    #     cache_position: mindspore.Tensor,
--+++++    #     past_key_values: StaticCache,
--+++++    # ) -> mindspore.Tensor:
--+++++    #     """
--+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++++        
--+++++    #     Args:
--+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++++    #         input_pos: 输入位置信息，可选
--+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++++            
--+++++    #     Returns:
--+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++++    #     """
--+++++    #     # 调用JIT编译的版本
--+++++    #     return self.get_decode_one_tokens_logits(
--+++++    #         cur_token=cur_token,
--+++++    #         input_pos=input_pos,
--+++++    #         cache_position=cache_position,
--+++++    #         past_key_values=past_key_values,
--+++++    #     )
--+++++    
--+++++    # @mindspore.jit(jit_level='O1')
--+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++++    #     """
--+++++    #     JIT编译的函数，用于高效的单token解码
--+++++    #     使用JIT编译优化以支持静态shape和高效执行
--+++++        
--+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++++    #     """
--+++++    #     outputs = self.model.forward(
--+++++    #         input_ids=cur_token,
--+++++    #         position_ids=input_pos,
--+++++    #         cache_position=cache_position,
--+++++    #         past_key_values=past_key_values,
--+++++    #         use_cache=True,
--+++++    #         return_dict=False,
--+++++    #     )
--+++++        
--+++++    #     hidden_states = outputs[0]
--+++++    #     logits = self.lm_head.forward(hidden_states)
--+++++    #     logits = logits.float()
--+++++        
--+++++    #     return logits[:, -1, :]
--+++++
--+++++    # def _sample(
--+++++    #     self,
--+++++    #     input_ids: mindspore.Tensor,
--+++++    #     logits_processor,
--+++++    #     stopping_criteria,
--+++++    #     generation_config,
--+++++    #     synced_devices: bool,
--+++++    #     streamer=None,
--+++++    #     logits_warper=None,
--+++++    #     **model_kwargs,
--+++++    # ):
--+++++    #     """
--+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++++    #     """
--+++++    #     from ...generation.logits_process import LogitsProcessorList
--+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++++    #     from mindnlp.core import nn, ops, no_grad
--+++++    #     import numpy as np
--+++++        
--+++++    #     # 检查是否使用 StaticCache
--+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++++    #     # 否则，直接调用父类方法
--+++++    #     past_key_values = model_kwargs.get("past_key_values")
--+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++++
--+++++    #     if not isinstance(past_key_values, StaticCache):
--+++++    #         # 不使用 StaticCache，直接调用父类方法
--+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++++    #         return super()._sample(
--+++++    #             input_ids=input_ids,
--+++++    #             logits_processor=logits_processor,
--+++++    #             stopping_criteria=stopping_criteria,
--+++++    #             generation_config=generation_config,
--+++++    #             synced_devices=synced_devices,
--+++++    #             streamer=streamer,
--+++++    #             logits_warper=logits_warper,
--+++++    #             **model_kwargs,
--+++++    #         )
--+++++        
--+++++    #     # 使用 StaticCache，进入自定义循环
--+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++++    #     pad_token_id = generation_config._pad_token_tensor
--+++++    #     output_attentions = generation_config.output_attentions
--+++++    #     output_hidden_states = generation_config.output_hidden_states
--+++++    #     output_scores = generation_config.output_scores
--+++++    #     output_logits = generation_config.output_logits
--+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++++    #     max_length = generation_config.max_length
--+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++++    #     do_sample = generation_config.do_sample
--+++++        
--+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++++    #         raise ValueError(
--+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++++    #             f"{logits_warper})."
--+++++    #         )
--+++++        
--+++++    #     # init attention / hidden states / scores tuples
--+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++++        
--+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++++    #         encoder_hidden_states = (
--+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++++    #         )
--+++++        
--+++++    #     # keep track of which sequences are already finished
--+++++    #     batch_size, cur_len = input_ids.shape
--+++++    #     this_peer_finished = False
--+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++++        
--+++++    #     time_record = []
--+++++    #     from ....utils.testing_utils import parse_flag_from_env
--+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++++        
--+++++    #     while self._has_unfinished_sequences(
--+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++++    #     ):
--+++++    #         if _record_time:
--+++++    #             import time as time_module
--+++++    #             infer_start = time_module.time()
--+++++            
--+++++    #         # prepare model inputs
--+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++++            
--+++++    #         # prepare variable output controls
--+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++++            
--+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++++    #         cur_cache_position = model_inputs.get("cache_position")
--+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++++    #         cur_input_ids = model_inputs.get("input_ids")
--+++++            
--+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++++    #             cur_cache_position is not None and 
--+++++    #             len(cur_cache_position.shape) > 0 and
--+++++    #             cur_cache_position.shape[0] == 1 and
--+++++    #             cur_input_ids is not None and
--+++++    #             cur_input_ids.shape[1] == 1):
--+++++    #             # 使用 JIT 优化的单 token 解码
--+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++++    #             if not hasattr(self, '_jit_used'):
--+++++    #                 self._jit_used = False
--+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++++                
--+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++++    #                 cur_token=cur_input_ids,
--+++++    #                 input_pos=model_inputs.get("position_ids"),
--+++++    #                 cache_position=cur_cache_position,
--+++++    #                 past_key_values=cur_past_key_values,
--+++++    #             )
--+++++                
--+++++    #             # 标记已使用JIT（用于后续判断）
--+++++    #             if not self._jit_used:
--+++++    #                 self._jit_used = True
--+++++                
--+++++    #             # 构造兼容的输出对象
--+++++    #             class JitOptimizedOutput:
--+++++    #                 def __init__(self, logits, config):
--+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++++    #                     self.config = config
--+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++++    #                     self.cross_attentions = None
--+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++++                
--+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++++    #         else:
--+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++++    #             outputs = self(**model_inputs, return_dict=True)
--+++++            
--+++++    #         if synced_devices and this_peer_finished:
--+++++    #             continue
--+++++            
--+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++++    #         next_token_logits = outputs.logits[:, -1, :]
--+++++            
--+++++    #         # pre-process distribution
--+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++++    #         if do_sample:
--+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++++            
--+++++    #         # Store scores, attentions and hidden_states when required
--+++++    #         if return_dict_in_generate:
--+++++    #             if output_scores:
--+++++    #                 scores += (next_token_scores,)
--+++++    #             if output_logits:
--+++++    #                 raw_logits += (next_token_logits,)
--+++++    #             if output_attentions:
--+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++++    #                 if self.config.is_encoder_decoder:
--+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++++                
--+++++    #             if output_hidden_states:
--+++++    #                 hidden = (
--+++++    #                     outputs.decoder_hidden_states
--+++++    #                     if self.config.is_encoder_decoder
--+++++    #                     else outputs.hidden_states
--+++++    #                 )
--+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++++            
--+++++    #         # token selection
--+++++    #         if do_sample:
--+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++++    #         else:
--+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++++            
--+++++    #         # finished sentences should have their next token be a padding token
--+++++    #         if has_eos_stopping_criteria:
--+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++++            
--+++++    #         # update generated ids, model inputs, and length for next step
--+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++++    #         if streamer is not None:
--+++++    #             streamer.put(next_tokens)
--+++++            
--+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++++    #             outputs,
--+++++    #             model_kwargs,
--+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++++    #         )
--+++++            
--+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++++    #         cur_len += 1
--+++++            
--+++++    #         if _record_time:
--+++++    #             import time as time_module
--+++++    #             infer_stop = time_module.time()
--+++++    #             time_record.append(infer_stop - infer_start)
--+++++            
--+++++    #         del outputs
--+++++        
--+++++    #     average_infer_time = None
--+++++    #     if time_record:
--+++++    #         if len(time_record) > 1:
--+++++    #             time_record.pop(0)
--+++++    #         average_infer_time = sum(time_record) / len(time_record)
--+++++    #         print(f'average inference time is: {average_infer_time}')
--+++++    #         print(f'inference time record: {time_record}')
--+++++        
--+++++    #     if streamer is not None:
--+++++    #         streamer.end()
--+++++        
--+++++    #     # 简单判断：打印是否使用了JIT路径
--+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++++    #     else:
--+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++++        
--+++++    #     if return_dict_in_generate:
--+++++    #         if self.config.is_encoder_decoder:
--+++++    #             return GenerateEncoderDecoderOutput(
--+++++    #                 sequences=input_ids,
--+++++    #                 scores=scores,
--+++++    #                 logits=raw_logits,
--+++++    #                 encoder_attentions=encoder_attentions,
--+++++    #                 encoder_hidden_states=encoder_hidden_states,
--+++++    #                 decoder_attentions=decoder_attentions,
--+++++    #                 cross_attentions=cross_attentions,
--+++++    #                 decoder_hidden_states=decoder_hidden_states,
--+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++    #                 average_infer_time=average_infer_time
--+++++    #             )
--+++++    #         else:
--+++++    #             return GenerateDecoderOnlyOutput(
--+++++    #                 sequences=input_ids,
--+++++    #                 scores=scores,
--+++++    #                 logits=raw_logits,
--+++++    #                 attentions=decoder_attentions,
--+++++    #                 hidden_states=decoder_hidden_states,
--+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++    #                 average_infer_time=average_infer_time
--+++++    #             )
--+++++    #     else:
--+++++    #         return input_ids
--+++++            
--+++++    # def _prepare_cache_for_generation(
--+++++    #     self,
--+++++    #     generation_config,
--+++++    #     model_kwargs,
--+++++    #     assistant_model,
--+++++    #     batch_size,
--+++++    #     max_cache_length,
--+++++    # ):
--+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++++    #         generation_config.cache_implementation = "static"
--+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++++        
--+++++    #     if generation_config.cache_implementation == "static":
--+++++    #         base_required_from_max_length = generation_config.max_length + 1
--+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++++    #         min_cache_size = 50
--+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++++    #         else:
--+++++    #             max_cache_length = max(base_required, min_cache_size)
--+++++            
--+++++    #         original_max_cache_length = max_cache_length
--+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++++            
--+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++    #             if max_cache_length > self.config.max_position_embeddings:
--+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++++        
--+++++    #     result = super()._prepare_cache_for_generation(
--+++++    #         generation_config=generation_config,
--+++++    #         model_kwargs=model_kwargs,
--+++++    #         assistant_model=assistant_model,
--+++++    #         batch_size=batch_size,
--+++++    #         max_cache_length=max_cache_length,
--+++++    #     )
--+++++        
--+++++    #     if generation_config.cache_implementation == "static":
--+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++++    #         created_cache = model_kwargs.get(cache_name)
--+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++++    #             if created_cache.max_cache_len < generation_config.max_length:
--+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++++        
--+++++    #     return result
--+++++
--+++++
--+++++
--++++ 
--++++ 
--++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++++-- 
--++++2.27.0
--++++
--+++-- 
--+++2.27.0
--+++
--++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--++new file mode 100644
--++index 00000000..966529e4
--++--- /dev/null
--+++++ b/patches/0003-20261106secondcommit.patch
--++@@ -0,0 +1,2769 @@
--+++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Thu, 6 Nov 2025 14:54:37 +0800
--+++Subject: [PATCH 3/3] 20261106secondcommit
--+++
--+++---
--+++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
--+++ patches/0001-20251104commit.patch             | 1272 -----------------
--+++ 3 files changed, 528 insertions(+), 2032 deletions(-)
--+++ delete mode 100644 patches/0001-20251104commit.patch
--+++
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index 73773c22..2f9192bf 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
--+++ 
--+++ _CONFIG_FOR_DOC = "DeepseekConfig"
--+++ 
--++++_attn_mask_cache = {}
--++++
--++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--++++    q_len = batch_and_seq[1]
--++++    kv_len = batch_and_seq[1] + past_key_values_length 
--++++    key = (batch_and_seq[0], q_len, kv_len)
--++++
--++++    if key in _attn_mask_cache:
--++++        return _attn_mask_cache[key]
--++++
--++++    mask = _prepare_4d_causal_attention_mask(
--++++        attention_mask,
--++++        batch_and_seq,
--++++        inputs_embeds,
--++++        past_key_values_length,
--++++    )
--++++    _attn_mask_cache[key] = mask
--++++    return mask
--+++ 
--+++ def _get_unpad_data(attention_mask):
--+++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--+++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--+++         return final_output
--+++ 
--+++ 
--+++-    @no_grad()
--+++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++-        expert_cache = ops.zeros_like(x)
--+++-        idxs = flat_expert_indices.argsort()
--+++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++-        token_idxs = idxs // self.num_experts_per_tok
--+++-
--+++-        for i, end_idx in enumerate(tokens_per_expert):
--+++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++-            if start_idx == end_idx:
--+++-                continue
--+++-            expert = self.experts[i]
--+++-            exp_token_idx = token_idxs[start_idx:end_idx]
--+++-            expert_tokens = x[exp_token_idx]
--+++-            expert_out = expert(expert_tokens)
--+++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++-
--+++-        return expert_cache
--+++-        
--+++     # @no_grad()
--+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-    #     # expert_cache = torch.zeros_like(x)
--+++-    #     # idxs = flat_expert_indices.argsort()
--+++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++-    #     # token_idxs = idxs // self.num_experts_per_tok
--+++-    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++-    #     #     if start_idx == end_idx:
--+++-    #     #         continue
--+++-    #     #     expert = self.experts[i]
--+++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++-    #     #     expert_tokens = x[exp_token_idx]
--+++-    #     #     expert_out = expert(expert_tokens)
--+++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++-    #     # return expert_cache
--++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++     #     expert_cache = ops.zeros_like(x)
--+++     #     idxs = flat_expert_indices.argsort()
--+++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--+++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++ 
--+++     #     return expert_cache
--+++-    # @no_grad()
--+++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-    #     expert_cache = ops.zeros_like(x)
--++++        
--++++    @no_grad()
--++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++        """
--++++        优化版 MoE prefill：
--++++        - 批量张量化处理同一个 expert 的所有 token
--++++        - 跳过无 token 的专家
--++++        - 保持结果完全一致
--++++        """
--++++        # 初始化输出缓存
--++++        expert_cache = ops.zeros_like(x)
--+++ 
--+++-    #     # 排序保证顺序一致
--+++-    #     idxs = flat_expert_indices.argsort()
--+++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++-    #     token_idxs = idxs // self.num_experts_per_tok
--++++        # 排序（确保 scatter_add 位置对应原逻辑）
--++++        idxs = flat_expert_indices.argsort()
--++++        sorted_expert_indices = flat_expert_indices[idxs]
--++++        sorted_token_indices = idxs // self.num_experts_per_tok
--+++ 
--+++-    #     # 找出有 token 的专家
--+++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++        # 每个 expert 的 token 数
--++++        tokens_per_expert = sorted_expert_indices.bincount()
--+++ 
--+++-    #     for i in active_experts.tolist():
--+++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++-    #         end_idx = tokens_per_expert[i]
--+++-    #         if start_idx == end_idx:  # 没有 token
--+++-    #             continue
--++++        # 找出有 token 的专家
--++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+++ 
--+++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++-    #         expert_tokens = x[exp_token_idx]
--+++-    #         expert_out = self.experts[i](expert_tokens)
--+++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++        for expert_id in active_experts.tolist():
--++++            # 取该 expert 对应的排序后 token 区间
--++++            start = (tokens_per_expert[:expert_id]).sum().item()
--++++            end = start + tokens_per_expert[expert_id].item()
--+++ 
--+++-    #         expert_cache = mindspore.mint.scatter_add(
--+++-    #             expert_cache,
--+++-    #             0,
--+++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++-    #             expert_out
--+++-    #         )
--++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--++++            expert_tokens = x[token_idx]                     # 取输入向量
--+++ 
--+++-    #     return expert_cache
--++++            # 执行专家 MLP
--++++            expert_out = self.experts[expert_id](expert_tokens)
--++++
--++++            # 按权重缩放
--++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--++++
--++++            # 回写到缓存（等价 scatter_add）
--++++            expert_cache = mindspore.mint.scatter_add(
--++++                expert_cache,
--++++                0,
--++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++                scaled_out
--++++            )
--++++
--++++        return expert_cache
--++++
--++++        # @no_grad()
--++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++        #     # expert_cache = torch.zeros_like(x)
--++++        #     # idxs = flat_expert_indices.argsort()
--++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++        #     # token_idxs = idxs // self.num_experts_per_tok
--++++        #     # for i, end_idx in enumerate(tokens_per_expert):
--++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++        #     #     if start_idx == end_idx:
--++++        #     #         continue
--++++        #     #     expert = self.experts[i]
--++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++        #     #     expert_tokens = x[exp_token_idx]
--++++        #     #     expert_out = expert(expert_tokens)
--++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++        #     # return expert_cache
--++++        #     expert_cache = ops.zeros_like(x)
--++++        #     idxs = flat_expert_indices.argsort()
--++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++        #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++        #     for i, end_idx in enumerate(tokens_per_expert):
--++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++        #         if start_idx == end_idx:
--++++        #             continue
--++++        #         expert = self.experts[i]
--++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++        #         expert_tokens = x[exp_token_idx]
--++++        #         expert_out = expert(expert_tokens)
--++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++
--++++        #     return expert_cache
--++++        # @no_grad()
--++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++        #     expert_cache = ops.zeros_like(x)
--++++
--++++        #     # 排序保证顺序一致
--++++        #     idxs = flat_expert_indices.argsort()
--++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++        #     token_idxs = idxs // self.num_experts_per_tok
--++++
--++++        #     # 找出有 token 的专家
--++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++
--++++        #     for i in active_experts.tolist():
--++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++        #         end_idx = tokens_per_expert[i]
--++++        #         if start_idx == end_idx:  # 没有 token
--++++        #             continue
--++++
--++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++        #         expert_tokens = x[exp_token_idx]
--++++        #         expert_out = self.experts[i](expert_tokens)
--++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++
--++++        #         expert_cache = mindspore.mint.scatter_add(
--++++        #             expert_cache,
--++++        #             0,
--++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++        #             expert_out
--++++        #         )
--++++
--++++        #     return expert_cache
--+++ 
--+++ 
--+++ 
--+++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++-
--+++ # class DeepseekFlashAttention(nn.Module):
--+++ #     """
--+++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--++++
--+++ Deepseek_ATTENTION_CLASSES = {
--+++     "eager": DeepseekAttention,
--+++     "flash-attention": DeepseekFlashAttention,
--+++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--+++             )
--+++         else:
--+++             # 4d mask is passed through the layers
--+++-            attention_mask = _prepare_4d_causal_attention_mask(
--++++            # attention_mask = _prepare_4d_causal_attention_mask(
--++++            #     attention_mask,
--++++            #     (batch_size, seq_length),
--++++            #     inputs_embeds,
--++++            #     past_key_values_length,
--++++            # )
--++++            #@dwj
--++++            attention_mask = get_cached_causal_mask(
--+++                 attention_mask,
--+++                 (batch_size, seq_length),
--+++                 inputs_embeds,
--+++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--+++         self.warm_up = False
--++++        #@dwj
--++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--++++            self.num_layers,
--++++            self.num_attention_heads,
--++++            self.head_dim,
--++++            batch_size=1,
--++++            max_length=self.max_length,
--++++            dtype=mindspore.float16
--++++        )
--++++
--++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--++++        key_cache = []
--++++        value_cache = []
--++++        for _ in range(num_layers):
--++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++            key_cache.append(k)
--++++            value_cache.append(v)
--++++        return key_cache, value_cache
--++++
--+++ 
--+++     def warmup_moe_model_deep(self):
--+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++index bced285c..ebd7782e 100644
--+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
--+++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+++ 
--+++-Long_Prompt = False
--+++-PROMPT_LENGTH_THRESHOLD = 128
--++++Long_Prompt = 1
--++++LONG_PROMPT_LENGTH_THRESHOLD = 128
--++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
--++++
--++++_causal_mask_cache = {}
--++++
--++++def get_cached_causal_mask_with_cache_position(
--++++    attention_mask: mindspore.Tensor,
--++++    sequence_length: int,
--++++    target_length: int,
--++++    dtype: mindspore.dtype,
--++++    min_dtype: float,
--++++    cache_position: mindspore.Tensor,
--++++    batch_size: int,
--++++):
--++++    """
--++++    带缓存的 causal mask 构造函数
--++++    """
--++++    # q_len 是当前 query 长度
--++++    q_len = sequence_length
--++++    # kv_len 是 target_length
--++++    kv_len = target_length
--++++
--++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--++++
--++++    if key in _causal_mask_cache:
--++++        return _causal_mask_cache[key]
--++++
--++++    # 调用原来的 mask 构造逻辑
--++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++        attention_mask,
--++++        sequence_length=sequence_length,
--++++        target_length=target_length,
--++++        dtype=dtype,
--++++        min_dtype=min_dtype,
--++++        cache_position=cache_position,
--++++        batch_size=batch_size,
--++++    )
--++++    # 缓存结果
--++++    _causal_mask_cache[key] = causal_mask
--++++    return causal_mask
--+++ 
--+++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+++ def _prepare_4d_causal_attention_mask_with_cache_position(
--+++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++ 
--+++ 
--+++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--++++# class Qwen2MoeAttention(nn.Module):
--++++#     """
--++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--++++#     and "Generating Long Sequences with Sparse Transformers".
--++++#     """
--++++
--++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++#         super().__init__()
--++++#         self.config = config
--++++#         self.layer_idx = layer_idx
--++++#         if layer_idx is None:
--++++#             logger.warning_once(
--++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++#                 "when creating this class."
--++++#             )
--++++
--++++#         self.hidden_size = config.hidden_size
--++++#         self.num_heads = config.num_attention_heads
--++++#         self.head_dim = self.hidden_size // self.num_heads
--++++#         self.num_key_value_heads = config.num_key_value_heads
--++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++#         self.max_position_embeddings = config.max_position_embeddings
--++++#         self.rope_theta = config.rope_theta
--++++#         self.is_causal = True
--++++#         self.attention_dropout = config.attention_dropout
--++++
--++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++#             raise ValueError(
--++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++#                 f" and `num_heads`: {self.num_heads})."
--++++#             )
--++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++
--++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++#             self.head_dim,
--++++#             max_position_embeddings=self.max_position_embeddings,
--++++#             base=self.rope_theta,
--++++#         )
--++++
--++++#     def forward(
--++++#         self,
--++++#         hidden_states: mindspore.Tensor,
--++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++#         past_key_value: Optional[Cache] = None,
--++++#         output_attentions: bool = False,
--++++#         use_cache: bool = False,
--++++#         cache_position: Optional[mindspore.Tensor] = None,
--++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++
--++++        
--++++
--++++#         bsz, q_len, _ = hidden_states.shape
--++++
--++++#         query_states = self.q_proj(hidden_states)
--++++#         key_states = self.k_proj(hidden_states)
--++++#         value_states = self.v_proj(hidden_states)
--++++
--++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++
--++++#         kv_seq_len = key_states.shape[-2]
--++++#         if past_key_value is not None:
--++++#             if self.layer_idx is None:
--++++#                 raise ValueError(
--++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++#                     "with a layer index."
--++++#                 )
--++++#             if isinstance(past_key_value, StaticCache):
--++++#                 kv_seq_len = key_states.shape[-2]
--++++#             else:
--++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++
--++++#         if past_key_value is not None:
--++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++            
--++++#             if isinstance(past_key_value, StaticCache):
--++++#                 kv_seq_len = key_states.shape[-2]
--++++
--++++#         # repeat k/v heads if n_kv_heads < n_heads
--++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++        
--++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++
--++++#         if attention_mask is not None:
--++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++#             attn_weights = attn_weights + causal_mask
--++++
--++++#         # upcast attention to fp32
--++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++++#         attn_output = ops.matmul(attn_weights, value_states)
--++++
--++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++++#             raise ValueError(
--++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--++++#                 f" {attn_output.shape}"
--++++#             )
--++++
--++++#         attn_output = ops.transpose(attn_output, 1, 2)
--++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++
--++++#         attn_output = self.o_proj(attn_output)
--++++#         # @lwx
--++++        
--++++#         # max_seq_len = self.max_position_embeddings  # 2048
--++++
--++++#         # if attention_mask is not None:
--++++#         #     # attention_mask: [B, 1, Sq, Sk]
--++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++
--++++#         #     # pad 到 [max_seq_len, max_seq_len]
--++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++#         #     global_attention_mask = padded_mask
--++++#         # else:
--++++#         #     global_attention_mask = None
--++++
--++++
--++++#         # sparse_mode=3
--++++#         # attn_output = mindspore.ops.flash_attention_score(
--++++#         #     query=query_states,
--++++#         #     key=key_states,
--++++#         #     value=value_states,
--++++#         #     real_shift=None,
--++++#         #     padding_mask=None,
--++++
--++++#         #     head_num=self.num_heads,
--++++#         #     attn_mask=global_attention_mask,
--++++#         #     keep_prob=1.0 - self.attention_dropout,
--++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++#         #     input_layout="BNSD",
--++++#         #     pre_tokens=2147483647,
--++++#         #     next_tokens=2147483647,
--++++#         #     inner_precise=0,
--++++#         #     drop_mask=None,
--++++#         #     prefix=None,
--++++#         #     actual_seq_qlen=None,
--++++#         #     actual_seq_kvlen=None,
--++++#         #     sparse_mode=sparse_mode,
--++++#         # )
--++++#         if not output_attentions:
--++++#             attn_weights = None
--++++
--++++#         return attn_output, attn_weights, past_key_value
--++++
--+++ class Qwen2MoeAttention(nn.Module):
--+++     """
--+++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+++-    and "Generating Long Sequences with Sparse Transformers".
--+++-    """
--++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
--+++ 
--++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--++++
--++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--++++    """
--+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++         super().__init__()
--+++         self.config = config
--+++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--+++         if layer_idx is None:
--+++             logger.warning_once(
--+++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++                 "when creating this class."
--+++             )
--+++ 
--+++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--+++         use_cache: bool = False,
--+++         cache_position: Optional[mindspore.Tensor] = None,
--+++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-
--+++         
--+++-
--++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--+++         bsz, q_len, _ = hidden_states.shape
--+++ 
--+++         query_states = self.q_proj(hidden_states)
--+++         key_states = self.k_proj(hidden_states)
--+++         value_states = self.v_proj(hidden_states)
--+++ 
--+++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++-
--++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++        
--+++         kv_seq_len = key_states.shape[-2]
--+++         if past_key_value is not None:
--+++-            if self.layer_idx is None:
--+++-                raise ValueError(
--+++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-                    "with a layer index."
--+++-                )
--+++-            if isinstance(past_key_value, StaticCache):
--+++-                kv_seq_len = key_states.shape[-2]
--+++-            else:
--+++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++        
--+++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++ 
--+++         if past_key_value is not None:
--+++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++
--++++        # --- 2. 动态调度核心注意力计算 ---
--++++        global Long_Prompt
--++++        if Long_Prompt >= 1:
--++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--++++            fa_attention_mask = None
--++++            if attention_mask is not None:
--++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++                fa_attention_mask = (mask_slice != 0)
--++++
--++++            attn_output = mindspore.ops.flash_attention_score(
--++++                query=query_states,
--++++                key=key_states,
--++++                value=value_states,
--++++                head_num=self.num_heads,
--++++                attn_mask=fa_attention_mask,
--++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--++++                scalar_value=1.0 / math.sqrt(self.head_dim),
--++++                input_layout="BNSD",
--++++                sparse_mode=0,
--++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--++++            )
--+++             
--+++-            if isinstance(past_key_value, StaticCache):
--+++-                kv_seq_len = key_states.shape[-2]
--++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++            attn_output = self.o_proj(attn_output)
--++++            attn_weights = None
--++++            if output_attentions:
--++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+++ 
--+++-        # repeat k/v heads if n_kv_heads < n_heads
--+++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++-        
--+++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++        else:
--++++            # --- Eager Attention 路径 (用于短序列和解码) ---
--++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++            
--++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++ 
--+++-        if attention_mask is not None:
--+++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++-            attn_weights = attn_weights + causal_mask
--++++            if attention_mask is not None:
--++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++                attn_weights = attn_weights + causal_mask
--+++ 
--+++-        # upcast attention to fp32
--+++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++-        attn_output = ops.matmul(attn_weights, value_states)
--++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++++            attn_output = ops.matmul(attn_weights, value_states)
--+++ 
--+++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++-            raise ValueError(
--+++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+++-                f" {attn_output.shape}"
--+++-            )
--++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++++                raise ValueError(
--++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--++++                )
--+++ 
--+++-        attn_output = ops.transpose(attn_output, 1, 2)
--+++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++            attn_output = ops.transpose(attn_output, 1, 2)
--++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++            attn_output = self.o_proj(attn_output)
--+++ 
--+++-        attn_output = self.o_proj(attn_output)
--+++-        # @lwx
--++++            if not output_attentions:
--++++                attn_weights = None
--+++         
--+++-        # max_seq_len = self.max_position_embeddings  # 2048
--+++-
--+++-        # if attention_mask is not None:
--+++-        #     # attention_mask: [B, 1, Sq, Sk]
--+++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++-
--+++-        #     # pad 到 [max_seq_len, max_seq_len]
--+++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++-        #     global_attention_mask = padded_mask
--+++-        # else:
--+++-        #     global_attention_mask = None
--+++-
--+++-
--+++-        # sparse_mode=3
--+++-        # attn_output = mindspore.ops.flash_attention_score(
--+++-        #     query=query_states,
--+++-        #     key=key_states,
--+++-        #     value=value_states,
--+++-        #     real_shift=None,
--+++-        #     padding_mask=None,
--+++-
--+++-        #     head_num=self.num_heads,
--+++-        #     attn_mask=global_attention_mask,
--+++-        #     keep_prob=1.0 - self.attention_dropout,
--+++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-        #     input_layout="BNSD",
--+++-        #     pre_tokens=2147483647,
--+++-        #     next_tokens=2147483647,
--+++-        #     inner_precise=0,
--+++-        #     drop_mask=None,
--+++-        #     prefix=None,
--+++-        #     actual_seq_qlen=None,
--+++-        #     actual_seq_kvlen=None,
--+++-        #     sparse_mode=sparse_mode,
--+++-        # )
--+++-        if not output_attentions:
--+++-            attn_weights = None
--+++-
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++-
--+++ # class Qwen2MoeFlashAttention(nn.Module):
--+++ #     """
--+++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
--+++ #             return final_hidden_states, router_logits
--+++ 
--+++ 
--+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-#     """
--+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+++-#     """
--+++-#     def __init__(self, config: Qwen2MoeConfig):
--+++-#         super().__init__()
--+++-#         self.num_experts = config.num_experts
--+++-#         self.top_k = config.num_experts_per_tok
--+++-#         self.norm_topk_prob = config.norm_topk_prob
--+++-
--+++-#         # 门控网络
--+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++-#         # 专家列表
--+++-#         self.experts = nn.ModuleList(
--+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++-#         )
--+++-#         # 共享专家
--+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_decode(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         """
--+++-#         【解码路径】针对 sequence_length=1 的极致优化。
--+++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+++-#         """
--+++-#         batch_size, hidden_dim = hidden_states.shape
--+++-        
--+++-#         expert_outputs_list = [
--+++-#             ops.cat([
--+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++-#             ], dim=0) 
--+++-#             for i in range(batch_size)
--+++-#         ]
--+++-        
--+++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+++-#         # shape: (batch_size, top_k, hidden_dim)
--+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++-        
--+++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++-        
--+++-#         return moe_output.squeeze(1)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_prefill(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         """
--+++-#         【预填充路径】针对 sequence_length > 1 的优化。
--+++-#         按专家对 Token 进行分组，并进行批处理。
--+++-#         """
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens = hidden_states.shape[0]
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-        
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-        
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-        
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-            
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-#             selected_token_indices = token_indices[mask]
--+++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++-            
--+++-#             current_states = hidden_states[selected_token_indices]
--+++-            
--+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-            
--+++-#             moe_output = moe_output.index_add(
--+++-#                 dim=0,
--+++-#                 index=selected_token_indices,
--+++-#                 source=expert_output.to(hidden_states.dtype)
--+++-#             )
--+++-#         return moe_output
--+++-
--+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-#         """
--+++-#         顶层 forward 方法，作为智能分发器。
--+++-#         """
--+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        
--+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-#         router_logits = self.gate(hidden_states_reshaped)
--+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-
--+++-#         if self.norm_topk_prob:
--+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        
--+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++-        
--+++-#         moe_output = None
--+++-#         # 在推理时，根据序列长度选择最优路径
--+++-#         if not self.training:
--+++-#             if sequence_length == 1:
--+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++-#             else:
--+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++-#         else:
--+++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+++-#             raise NotImplementedError("Training path is not implemented.")
--+++-
--+++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+++-        
--+++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+++-        
--+++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+++-        
--+++-#         return final_hidden_states, router_logits
--+++-
--+++-
--+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-#     """
--+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+++-#     """
--+++-#     def __init__(self, config: Qwen2MoeConfig):
--+++-#         super().__init__()
--+++-#         self.num_experts = config.num_experts
--+++-#         self.top_k = config.num_experts_per_tok
--+++-#         self.norm_topk_prob = config.norm_topk_prob
--+++-
--+++-#         # 门控网络
--+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++-#         # 专家列表
--+++-#         self.experts = nn.ModuleList(
--+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++-#         )
--+++-#         # 共享专家
--+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_decode(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         batch_size, _ = hidden_states.shape
--+++-#         expert_outputs_list = [
--+++-#             ops.cat([
--+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++-#             ], dim=0) 
--+++-#             for i in range(batch_size)
--+++-#         ]
--+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++-#         return moe_output.squeeze(1)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_prefill(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens = hidden_states.shape[0]
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-        
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-#             selected_token_indices = token_indices[mask]
--+++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++-#             current_states = hidden_states[selected_token_indices]
--+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-#             moe_output = moe_output.index_add(
--+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++-#             )
--+++-#         return moe_output
--+++-
--+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-#         """
--+++-#         顶层 forward 方法，作为智能分发器。
--+++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+++-#         """
--+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        
--+++-#         # 1. 门控计算 (通用逻辑)
--+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-#         router_logits = self.gate(hidden_states_reshaped)
--+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-
--+++-#         if self.norm_topk_prob:
--+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        
--+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++-        
--+++-#         # 2. 智能分发到最优 MoE 路径
--+++-#         moe_output = None
--+++-#         if not self.training:
--+++-#             if sequence_length == 1:
--+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++-#             else:
--+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++-#         else:
--+++-#             raise NotImplementedError("Training path is not implemented.")
--+++-
--+++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++-        
--+++-#         # 4. 合并 MoE 输出和共享专家输出
--+++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++-        
--+++-#         # 5. 恢复原始形状并返回
--+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++-        
--+++-#         return final_hidden_states, router_logits
--+++-
--+++-# prefill fastest
--+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-#     """
--+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+++-#     """
--+++-#     def __init__(self, config: Qwen2MoeConfig):
--+++-#         super().__init__()
--+++-#         self.num_experts = config.num_experts
--+++-#         self.top_k = config.num_experts_per_tok
--+++-#         self.norm_topk_prob = config.norm_topk_prob
--+++-
--+++-#         # 门控网络
--+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++-#         # 专家列表
--+++-#         self.experts = nn.ModuleList(
--+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++-#         )
--+++-#         # 共享专家
--+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_dispatch(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         """
--+++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+++-#         """
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens, _ = hidden_states.shape
--+++-        
--+++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-#         flat_routing_weights = routing_weights.flatten()
--+++-
--+++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-
--+++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-        
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-            
--+++-#             # 找到所有分配给该专家的 token
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-            
--+++-#             # 使用 mask 选取对应的 token 和权重
--+++-#             current_token_indices = token_indices[mask]
--+++-#             current_routing_weights = flat_routing_weights[mask]
--+++-#             current_hidden_states = hidden_states[current_token_indices]
--+++-            
--+++-#             # 对这些 token 进行批处理
--+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++-            
--+++-#             # 使用 index_add 将结果精确地加回到对应位置
--+++-#             moe_output = moe_output.index_add(
--+++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+++-#             )
--+++-#         return moe_output
--+++-
--+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-#         """
--+++-#         顶层 forward 方法，作为智能分发器。
--+++-#         """
--+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        
--+++-#         # 1. 门控计算
--+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-#         router_logits = self.gate(hidden_states_reshaped)
--+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-
--+++-#         if self.norm_topk_prob:
--+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        
--+++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++-        
--+++-#         # 2. 调用统一的 MoE 计算内核
--+++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+++-
--+++-#         # 3. 统一处理共享专家
--+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++-        
--+++-#         # 4. 合并输出
--+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++-        
--+++-#         # 5. 恢复原始形状并返回
--+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++-        
--+++-#         return final_hidden_states, router_logits
--+++-
--+++-
--+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-#     """
--+++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++-#     【最终高性能与高精度版】：
--+++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+++-#     3. 这样实现了速度和准确性的两全其美。
--+++-#     """
--+++-#     def __init__(self, config: Qwen2MoeConfig):
--+++-#         super().__init__()
--+++-#         self.num_experts = config.num_experts
--+++-#         self.top_k = config.num_experts_per_tok
--+++-#         self.norm_topk_prob = config.norm_topk_prob
--+++-
--+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++-#         self.experts = nn.ModuleList(
--+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++-#         )
--+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_decode(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         """
--+++-#         【解码路径】极致优化版：bmm + 高精度累加。
--+++-#         """
--+++-#         original_dtype = hidden_states.dtype
--+++-#         batch_size, _ = hidden_states.shape
--+++-        
--+++-#         expert_outputs_list = [
--+++-#             ops.cat([
--+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++-#             ], dim=0) 
--+++-#             for i in range(batch_size)
--+++-#         ]
--+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++-
--+++-#         # 在 float32 下执行 bmm，得到高精度结果
--+++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++-        
--+++-#         # 将高精度结果转换回原始数据类型
--+++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+++-        
--+++-#         return moe_output
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_prefill(
--+++-#         self, 
--+++-#         hidden_states: mindspore.Tensor, 
--+++-#         selected_experts: mindspore.Tensor, 
--+++-#         routing_weights: mindspore.Tensor
--+++-#     ) -> mindspore.Tensor:
--+++-#         """
--+++-#         【预填充路径】与原始实现一致，结果精确。
--+++-#         """
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens, _ = hidden_states.shape
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-        
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-#             selected_token_indices = token_indices[mask]
--+++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++-#             current_states = hidden_states[selected_token_indices]
--+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-#             moe_output = moe_output.index_add(
--+++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++-#             )
--+++-#         return moe_output
--+++-
--+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-        
--+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-#         router_logits = self.gate(hidden_states_reshaped)
--+++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-
--+++-#         if self.norm_topk_prob:
--+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        
--+++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+++-#         # 如果模型主体是 float16，后续再转换
--+++-        
--+++-#         moe_output = None
--+++-#         if not self.training:
--+++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+++-#             # _moe_infer_decode 内部会处理好类型转换
--+++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+++-#             if sequence_length == 1:
--+++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++-#             else:
--+++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++-#         else:
--+++-#             raise NotImplementedError("Training path is not implemented.")
--+++-
--+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++-        
--+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++-        
--+++-#         return final_hidden_states, router_logits
--+++-    
--+++-
--+++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-#     """
--+++-#     【融合版】一个混合专家模块，内置两种推理策略，
--+++-#     由外部全局变量 `Long_Prompt` 控制：
--+++-
--+++-#     - if Long_Prompt is True:  【精度优先模式】
--+++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+++-#       适用于处理长序列，避免误差累积。
--+++-
--+++-#     - if Long_Prompt is False: 【速度优先模式】
--+++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+++-#       在解码阶段获得极致速度，同时保证结果高度准确。
--+++-#     """
--+++-#     def __init__(self, config: Qwen2MoeConfig):
--+++-#         super().__init__()
--+++-#         self.num_experts = config.num_experts
--+++-#         self.top_k = config.num_experts_per_tok
--+++-#         self.norm_topk_prob = config.norm_topk_prob
--+++-
--+++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++-#         self.experts = nn.ModuleList(
--+++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++-#         )
--+++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++-
--+++-#     # --- 速度优先模式的辅助函数 ---
--+++-#     @no_grad()
--+++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++-#         original_dtype = hidden_states.dtype
--+++-#         batch_size, _ = hidden_states.shape
--+++-#         expert_outputs_list = [
--+++-#             ops.cat([
--+++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++-#             ], dim=0) 
--+++-#             for i in range(batch_size)
--+++-#         ]
--+++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++-#         weights_fp32 = routing_weights.to(mindspore.float32)
--+++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++-
--+++-#     @no_grad()
--+++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens, _ = hidden_states.shape
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-#             selected_token_indices = token_indices[mask]
--+++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++-#             current_states = hidden_states[selected_token_indices]
--+++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+++-#         return moe_output
--+++-
--+++-#     # --- 精度优先模式的辅助函数 ---
--+++-#     @no_grad()
--+++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++-#         moe_output = ops.zeros_like(hidden_states)
--+++-#         num_tokens, _ = hidden_states.shape
--+++-#         flat_selected_experts = selected_experts.flatten()
--+++-#         flat_routing_weights = routing_weights.flatten()
--+++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++-#         active_experts = ops.unique(flat_selected_experts)
--+++-#         for expert_idx_tensor in active_experts:
--+++-#             expert_idx = expert_idx_tensor.item()
--+++-#             expert_layer = self.experts[expert_idx]
--+++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++-#             current_token_indices = token_indices[mask]
--+++-#             current_routing_weights = flat_routing_weights[mask]
--+++-#             current_hidden_states = hidden_states[current_token_indices]
--+++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++-#         return moe_output
--+++-
--+++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++-#         # 声明我们将要使用一个在模块外部定义的全局变量
--+++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+++-#         global Long_Prompt
--+++-
--+++-#         # 1. 门控计算 (所有模式通用)
--+++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-#         router_logits = self.gate(hidden_states_reshaped)
--+++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++-#         if self.norm_topk_prob:
--+++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-        
--+++-#         moe_output = None
--+++-#         if not self.training:
--+++-#             # 根据 Long_Prompt 标志选择模式
--+++-#             if Long_Prompt:
--+++-#                 # --- 精度优先模式 ---
--+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++-#             else:
--+++-#                 # --- 速度优先模式 ---
--+++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++-#                 if sequence_length == 1:
--+++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++-#                 else:
--+++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++-#         else:
--+++-#             raise NotImplementedError("Training path is not implemented.")
--+++-
--+++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++-        
--+++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++-        
--+++-#         return final_hidden_states, router_logits
--+++-    
--+++ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++     """
--+++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++ 
--++++    # @no_grad()
--++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++    #     num_tokens, _ = hidden_states.shape
--++++    #     flat_selected_experts = selected_experts.flatten()
--++++    #     sorted_expert_indices = flat_selected_experts.argsort()
--++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++++    #     original_token_indices = sorted_expert_indices // self.top_k
--++++    #     moe_output = ops.zeros_like(hidden_states)
--++++    #     current_token_offset = 0
--++++    #     for i in range(self.num_experts):
--++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--++++    #         if expert_token_count == 0:
--++++    #             continue
--++++    #         end_offset = current_token_offset + expert_token_count
--++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++++    #         current_token_offset += expert_token_count
--++++    #     return moe_output
--++++
--+++     @no_grad()
--+++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++-        num_tokens, _ = hidden_states.shape
--+++-        flat_selected_experts = selected_experts.flatten()
--+++-        sorted_expert_indices = flat_selected_experts.argsort()
--+++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++-        original_token_indices = sorted_expert_indices // self.top_k
--++++        """
--++++        优化版 MoE prefill (速度优先模式)：
--++++        - 批量张量化处理同一个 expert 的所有 token
--++++        - 跳过无 token 的专家
--++++        - 保持结果完全一致
--++++        """
--+++         moe_output = ops.zeros_like(hidden_states)
--+++-        current_token_offset = 0
--+++-        for i in range(self.num_experts):
--+++-            expert_token_count = tokens_per_expert[i] - current_token_offset
--+++-            if expert_token_count == 0:
--+++-                continue
--+++-            end_offset = current_token_offset + expert_token_count
--+++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++-            expert_hidden_states = hidden_states[expert_original_token_indices]
--+++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++-            current_token_offset += expert_token_count
--++++
--++++        flat_selected_experts = selected_experts.flatten()
--++++        flat_routing_weights = routing_weights.flatten()
--++++
--++++        idxs = flat_selected_experts.argsort()
--++++        sorted_expert_indices = flat_selected_experts[idxs]
--++++        sorted_token_indices = idxs // self.top_k
--++++
--++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--++++
--++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--++++
--++++        for expert_id in active_experts.tolist():
--++++            start = int(tokens_per_expert[:expert_id].sum().item())
--++++            end = start + int(tokens_per_expert[expert_id].item())
--++++
--++++            token_idx = sorted_token_indices[start:end]
--++++            expert_tokens = hidden_states[token_idx]
--++++
--++++            expert_out = self.experts[expert_id](expert_tokens)
--++++
--++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--++++
--++++            moe_output = mindspore.mint.scatter_add(
--++++                moe_output,
--++++                0,
--++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--++++                scaled_out.to(hidden_states.dtype)
--++++            )
--++++
--+++         return moe_output
--+++ 
--++++
--+++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+++     @no_grad()
--+++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++         
--+++         moe_output = None
--+++-        if Long_Prompt:
--+++-            # --- 精度优先模式 (ACCURACY MODE) ---
--+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        # if Long_Prompt==0:
--++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
--++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        # else:
--++++        #     # --- 速度优先模式 (SPEED MODE) ---
--++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++        #     if sequence_length == 1:
--++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        #     else:
--++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++        
--++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++        if sequence_length == 1:
--++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++         else:
--+++-            # --- 速度优先模式 (SPEED MODE) ---
--+++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++-            if sequence_length == 1:
--+++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++-            else:
--+++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++-        
--++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++    
--+++ 
--+++         # 3. 共享专家计算与合并 (所有模式通用)
--+++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++         
--+++         return final_hidden_states, router_logits
--+++ 
--++++
--+++ class Qwen2MoeDecoderLayer(nn.Module):
--+++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+++         super().__init__()
--+++         self.hidden_size = config.hidden_size
--+++         
--+++-        # if Long_Prompt:
--+++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++-        # else:
--++++        # if Long_Prompt == 2:
--+++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++        # else:
--++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++ 
--+++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++ 
--+++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++             )
--+++ 
--+++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--+++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++        #     attention_mask,
--++++        #     sequence_length=sequence_length,
--++++        #     target_length=target_length,
--++++        #     dtype=dtype,
--++++        #     min_dtype=min_dtype,
--++++        #     cache_position=cache_position,
--++++        #     batch_size=input_tensor.shape[0],
--++++        # )
--++++        #@dwj
--++++        causal_mask = get_cached_causal_mask_with_cache_position(
--+++             attention_mask,
--+++             sequence_length=sequence_length,
--+++             target_length=target_length,
--+++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+++         """
--+++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--++++        _causal_mask_cache.clear()
--+++ 
--+++         input_ids = kwargs.get("input_ids")
--+++         if input_ids is None and args:
--+++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++ 
--+++         if input_ids is not None:
--+++             prompt_length = input_ids.shape[1]
--+++-            
--+++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+++-                Long_Prompt = True
--++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--++++                Long_Prompt = 2
--++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--++++                Long_Prompt = 0
--+++             else:
--+++-                Long_Prompt = False
--++++                Long_Prompt = 1
--++++
--+++ 
--+++         return super().generate(*args, **kwargs)
--+++     
--+++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++             dtype = self.lm_head.weight.dtype
--+++             min_dtype = float(ops.finfo(dtype).min)
--+++ 
--+++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++            #     attention_mask,
--++++            #     sequence_length=sequence_length,
--++++            #     target_length=past_key_values.get_max_length(),
--++++            #     dtype=dtype,
--++++            #     min_dtype=min_dtype,
--++++            #     cache_position=cache_position,
--++++            #     batch_size=batch_size,
--++++            # )
--++++
--++++            #@dwj
--++++            attention_mask = get_cached_causal_mask_with_cache_position(
--+++                 attention_mask,
--+++                 sequence_length=sequence_length,
--+++                 target_length=past_key_values.get_max_length(),
--+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++deleted file mode 100644
--+++index 6dfb5b93..00000000
--+++--- a/patches/0001-20251104commit.patch
--++++++ /dev/null
--+++@@ -1,1272 +0,0 @@
--+++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++-From: Pinoeer-kingxi <13022943007@163.com>
--+++-Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++-Subject: [PATCH] 20251104commit
--+++-
--+++----
--+++- mindnlp/transformers/cache_utils.py           |  28 +-
--+++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++- 3 files changed, 976 insertions(+), 87 deletions(-)
--+++-
--+++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++-index cadd2e04..02f8d4be 100644
--+++---- a/mindnlp/transformers/cache_utils.py
--+++-+++ b/mindnlp/transformers/cache_utils.py
--+++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++-             # k_out[:, :, cache_position] = key_states
--+++-             # v_out[:, :, cache_position] = value_states
--+++--            if ON_ORANGE_PI:
--+++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++--            else:
--+++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++--
--+++-+            # if ON_ORANGE_PI:
--+++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++-+            # else:
--+++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++-+            # 确保 cache_position 是 1D tensor 并且类型正确
--+++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++-+            if cache_position.ndim > 1:
--+++-+                cache_position = cache_position.flatten()
--+++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++-+                cache_position = cache_position.int()
--+++-+            
--+++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++-+            k_out[:, :, cache_position] = key_states
--+++-+            v_out[:, :, cache_position] = value_states
--+++-+            
--+++-         return k_out, v_out
--+++- 
--+++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++-index c695b944..d8303e45 100644
--+++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++- # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++- def rotate_half(x):
--+++-     """Rotates half the hidden dims of the input."""
--+++--    x1 = x[..., : x.shape[-1] // 2]
--+++--    x2 = x[..., x.shape[-1] // 2 :]
--+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++-+    # x1 = x[..., : x.shape[-1] // 2]
--+++-+    # x2 = x[..., x.shape[-1] // 2 :]
--+++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++-     return ops.cat((-x2, x1), dim=-1)
--+++- 
--+++- 
--+++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++-         if self.training:
--+++-             raise NotImplementedError("Training is not supported yet.")
--+++-         else:
--+++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++--        if self.config.n_shared_experts is not None:
--+++--            y = y + self.shared_experts(identity)
--+++--        return y
--+++-+            # @lwx
--+++-+            if orig_shape[1] == 1:
--+++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++-+                y=y.view(*orig_shape)
--+++-+                if self.config.n_shared_experts is not None:
--+++-+                    y = y + self.shared_experts(identity)
--+++-+                return y
--+++-+            else:
--+++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++-+                if self.config.n_shared_experts is not None:
--+++-+                    y = y + self.shared_experts(identity)
--+++-+                return y
--+++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++-+        # if self.config.n_shared_experts is not None:
--+++-+        #     y = y + self.shared_experts(identity)
--+++-+        # return y
--+++-+        
--+++-+    @no_grad()
--+++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++-+
--+++-+        expert_cache = ops.zeros_like(x)
--+++-+        for i in range(self.num_experts_per_tok):
--+++-+            expert_id = flat_expert_indices[i].item()
--+++-+            weight = flat_expert_weights[i].item()
--+++-+            expert = self.experts[expert_id]
--+++-+            expert_out = expert(x)
--+++-+            expert_cache += expert_out * weight
--+++-+        return expert_cache
--+++- 
--+++-     @no_grad()
--+++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++--        # expert_cache = torch.zeros_like(x)
--+++--        # idxs = flat_expert_indices.argsort()
--+++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++--        # token_idxs = idxs // self.num_experts_per_tok
--+++--        # for i, end_idx in enumerate(tokens_per_expert):
--+++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++--        #     if start_idx == end_idx:
--+++--        #         continue
--+++--        #     expert = self.experts[i]
--+++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++--        #     expert_tokens = x[exp_token_idx]
--+++--        #     expert_out = expert(expert_tokens)
--+++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++--        # return expert_cache
--+++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++-         expert_cache = ops.zeros_like(x)
--+++-         idxs = flat_expert_indices.argsort()
--+++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++-         token_idxs = idxs // self.num_experts_per_tok
--+++-+
--+++-         for i, end_idx in enumerate(tokens_per_expert):
--+++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++-             if start_idx == end_idx:
--+++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++-             expert_out = expert(expert_tokens)
--+++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++-+
--+++-         return expert_cache
--+++-+        
--+++-+    # @no_grad()
--+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-+    #     # expert_cache = torch.zeros_like(x)
--+++-+    #     # idxs = flat_expert_indices.argsort()
--+++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++-+    #     # token_idxs = idxs // self.num_experts_per_tok
--+++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++-+    #     #     if start_idx == end_idx:
--+++-+    #     #         continue
--+++-+    #     #     expert = self.experts[i]
--+++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++-+    #     #     expert_tokens = x[exp_token_idx]
--+++-+    #     #     expert_out = expert(expert_tokens)
--+++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++-+    #     # return expert_cache
--+++-+    #     expert_cache = ops.zeros_like(x)
--+++-+    #     idxs = flat_expert_indices.argsort()
--+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++-+    #     token_idxs = idxs // self.num_experts_per_tok
--+++-+
--+++-+    #     for i, end_idx in enumerate(tokens_per_expert):
--+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++-+    #         if start_idx == end_idx:
--+++-+    #             continue
--+++-+    #         expert = self.experts[i]
--+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++-+    #         expert_tokens = x[exp_token_idx]
--+++-+    #         expert_out = expert(expert_tokens)
--+++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++-+
--+++-+    #     return expert_cache
--+++-+    # @no_grad()
--+++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++-+    #     expert_cache = ops.zeros_like(x)
--+++-+
--+++-+    #     # 排序保证顺序一致
--+++-+    #     idxs = flat_expert_indices.argsort()
--+++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++-+    #     token_idxs = idxs // self.num_experts_per_tok
--+++-+
--+++-+    #     # 找出有 token 的专家
--+++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++-+
--+++-+    #     for i in active_experts.tolist():
--+++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++-+    #         end_idx = tokens_per_expert[i]
--+++-+    #         if start_idx == end_idx:  # 没有 token
--+++-+    #             continue
--+++-+
--+++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++-+    #         expert_tokens = x[exp_token_idx]
--+++-+    #         expert_out = self.experts[i](expert_tokens)
--+++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++-+
--+++-+    #         expert_cache = mindspore.mint.scatter_add(
--+++-+    #             expert_cache,
--+++-+    #             0,
--+++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++-+    #             expert_out
--+++-+    #         )
--+++-+
--+++-+    #     return expert_cache
--+++-+
--+++-+
--+++- 
--+++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++- #     """
--+++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++- 
--+++-         # Initialize weights and apply final processing
--+++-         self.post_init()
--+++-+        self.warm_up = False
--+++-+
--+++-+    def warmup_moe_model_deep(self):
--+++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++-+        test_texts = [
--+++-+            "warmup short",
--+++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++-+        ]
--+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++-+        if tokenizer is None:
--+++-+            from mindnlp.transformers import AutoTokenizer
--+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++-+            self._warmup_tokenizer = tokenizer
--+++-+
--+++-+        for text in test_texts:
--+++-+            inputs = tokenizer(text, return_tensors="ms")
--+++-+            with mindspore._no_grad():
--+++-+                _ = self(**inputs, use_cache=False)
--+++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++- 
--+++-     def get_input_embeddings(self):
--+++-         return self.model.embed_tokens
--+++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++-         ```"""
--+++-+        if not self.warm_up:
--+++-+            self.warm_up = True
--+++-+            self.warmup_moe_model_deep()
--+++-+
--+++-         output_attentions = (
--+++-             output_attentions
--+++-             if output_attentions is not None
--+++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++-index 3cbf820e..d4c6b651 100644
--+++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++-@@ -18,7 +18,6 @@
--+++- # See the License for the specific language governing permissions and
--+++- # limitations under the License.
--+++- """MindSpore Qwen2MoE model."""
--+++--
--+++- import math
--+++- from typing import List, Optional, Tuple, Union
--+++- 
--+++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++-     TokenClassifierOutput,
--+++- )
--+++- from ...modeling_utils import PreTrainedModel
--+++-+from ...generation import GenerationMixin
--+++- from ....utils import logging
--+++- from .configuration_qwen2_moe import Qwen2MoeConfig
--+++- 
--+++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++-         self.variance_epsilon = eps
--+++- 
--+++-     def forward(self, hidden_states):
--+++-+        # @dwj
--+++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++-+        # @lwx
--+++-+        # if not self.training :
--+++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++-         input_dtype = hidden_states.dtype
--+++-         hidden_states = hidden_states.to(mindspore.float32)
--+++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++-@@ -234,6 +239,8 @@ def rotate_half(x):
--+++-     """Rotates half the hidden dims of the input."""
--+++-     x1 = x[..., : x.shape[-1] // 2]
--+++-     x2 = x[..., x.shape[-1] // 2 :]
--+++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++-     return ops.cat((-x2, x1), dim=-1)
--+++- 
--+++- 
--+++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++-         self.config = config
--+++-         self.hidden_size = config.hidden_size
--+++-         self.intermediate_size = intermediate_size
--+++-+        
--+++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++-         self.act_fn = ACT2FN[config.hidden_act]
--+++- 
--+++-     def forward(self, x):
--+++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++--
--+++- 
--+++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++-+        # @lwx 
--+++-+        # gate_up_output = self.gate_up_proj(x)
--+++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++-+        # return self.down_proj(swiglu_output)
--+++-+
--+++-+    # def forward(self, x):
--+++-+    #     gate_proj_out = self.gate_proj(x)
--+++-+    #     up_proj_out = self.up_proj(x)
--+++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++-+    #     return self.down_proj(swiglu_out)
--+++-+        
--+++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++-     """
--+++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++-         use_cache: bool = False,
--+++-         cache_position: Optional[mindspore.Tensor] = None,
--+++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-+
--+++-+        
--+++-+
--+++-         bsz, q_len, _ = hidden_states.shape
--+++- 
--+++-         query_states = self.q_proj(hidden_states)
--+++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-                     "with a layer index."
--+++-                 )
--+++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-+            if isinstance(past_key_value, StaticCache):
--+++-+                kv_seq_len = key_states.shape[-2]
--+++-+            else:
--+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++- 
--+++-         if past_key_value is not None:
--+++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++-+            
--+++-+            if isinstance(past_key_value, StaticCache):
--+++-+                kv_seq_len = key_states.shape[-2]
--+++- 
--+++-         # repeat k/v heads if n_kv_heads < n_heads
--+++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++--
--+++-+        
--+++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++- 
--+++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++--            raise ValueError(
--+++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++--                f" {attn_weights.shape}"
--+++--            )
--+++--
--+++--        if attention_mask is not None:  # no matter the length, we just slice it
--+++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++-+        if attention_mask is not None:
--+++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++-             attn_weights = attn_weights + causal_mask
--+++- 
--+++-         # upcast attention to fp32
--+++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++- 
--+++-         attn_output = self.o_proj(attn_output)
--+++--
--+++-+        # @lwx
--+++-+        
--+++-+        # max_seq_len = self.max_position_embeddings  # 2048
--+++-+
--+++-+        # if attention_mask is not None:
--+++-+        #     # attention_mask: [B, 1, Sq, Sk]
--+++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++-+
--+++-+        #     # pad 到 [max_seq_len, max_seq_len]
--+++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++-+        #     global_attention_mask = padded_mask
--+++-+        # else:
--+++-+        #     global_attention_mask = None
--+++-+
--+++-+
--+++-+        # sparse_mode=3
--+++-+        # attn_output = mindspore.ops.flash_attention_score(
--+++-+        #     query=query_states,
--+++-+        #     key=key_states,
--+++-+        #     value=value_states,
--+++-+        #     real_shift=None,
--+++-+        #     padding_mask=None,
--+++-+
--+++-+        #     head_num=self.num_heads,
--+++-+        #     attn_mask=global_attention_mask,
--+++-+        #     keep_prob=1.0 - self.attention_dropout,
--+++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-+        #     input_layout="BNSD",
--+++-+        #     pre_tokens=2147483647,
--+++-+        #     next_tokens=2147483647,
--+++-+        #     inner_precise=0,
--+++-+        #     drop_mask=None,
--+++-+        #     prefix=None,
--+++-+        #     actual_seq_qlen=None,
--+++-+        #     actual_seq_kvlen=None,
--+++-+        #     sparse_mode=sparse_mode,
--+++-+        # )
--+++-         if not output_attentions:
--+++-             attn_weights = None
--+++- 
--+++-         return attn_output, attn_weights, past_key_value
--+++- 
--+++- 
--+++-+class Qwen2MoeFlashAttention(nn.Module):
--+++-+    """
--+++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++-+
--+++-+    关键改动:
--+++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++-+       直接传入原始的 key 和 value 张量效率更高。
--+++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++-+    """
--+++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++-+        super().__init__()
--+++-+        self.config = config
--+++-+        self.layer_idx = layer_idx
--+++-+        self.hidden_size = config.hidden_size
--+++-+        self.num_heads = config.num_attention_heads
--+++-+        self.head_dim = self.hidden_size // self.num_heads
--+++-+        self.num_key_value_heads = config.num_key_value_heads
--+++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++-+        self.max_position_embeddings = config.max_position_embeddings
--+++-+        self.rope_theta = config.rope_theta
--+++-+        self.attention_dropout = config.attention_dropout
--+++-+
--+++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++-+            raise ValueError(
--+++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++-+            )
--+++-+
--+++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++-+
--+++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++-+            self.head_dim,
--+++-+            max_position_embeddings=self.max_position_embeddings,
--+++-+            base=self.rope_theta,
--+++-+        )
--+++-+
--+++-+    def forward(
--+++-+        self,
--+++-+        hidden_states: mindspore.Tensor,
--+++-+        attention_mask: Optional[mindspore.Tensor] = None,
--+++-+        position_ids: Optional[mindspore.Tensor] = None,
--+++-+        past_key_value: Optional[Cache] = None,
--+++-+        output_attentions: bool = False,
--+++-+        use_cache: bool = False,
--+++-+        cache_position: Optional[mindspore.Tensor] = None,
--+++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-+
--+++-+        bsz, q_len, _ = hidden_states.shape
--+++-+
--+++-+        # 1. 线性投射 Q, K, V
--+++-+        query_states = self.q_proj(hidden_states)
--+++-+        key_states = self.k_proj(hidden_states)
--+++-+        value_states = self.v_proj(hidden_states)
--+++-+
--+++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+
--+++-+        # 3. RoPE 旋转位置编码
--+++-+        kv_seq_len = key_states.shape[-2]
--+++-+        if past_key_value is not None:
--+++-+            if self.layer_idx is None:
--+++-+                raise ValueError(
--+++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-+                    "with a layer index."
--+++-+                )
--+++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++-+                if cache_position.shape[0] == 1:
--+++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++-+                    kv_seq_len = past_seen_tokens + 1
--+++-+                else:
--+++-+                    # prefill 阶段：cache_position 是范围，使用其长度
--+++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++-+            else:
--+++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-+
--+++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-+
--+++-+        # 4. KV 缓存更新
--+++-+        if past_key_value is not None:
--+++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-+            key_states, value_states = past_key_value.update(
--+++-+                key_states, value_states, self.layer_idx, cache_kwargs
--+++-+            )
--+++-+            
--+++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++-+                if cache_position.shape[0] == 1:
--+++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++-+                    kv_seq_len = key_states.shape[-2]
--+++-+
--+++-+        # 5. [重要] 准备 Attention Mask
--+++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++-+        fa_attention_mask = None
--+++-+        if attention_mask is not None:
--+++-+            # 截取与当前key长度匹配的部分
--+++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++-+            fa_attention_mask = (mask_slice != 0)
--+++-+
--+++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++-+        input_dtype = query_states.dtype
--+++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++-+            query_states = query_states.to(mindspore.float16)
--+++-+            key_states = key_states.to(mindspore.float16)
--+++-+            value_states = value_states.to(mindspore.float16)
--+++-+
--+++-+        # 6. [核心] 调用 flash_attention_score 算子
--+++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++-+        attn_output = mindspore.ops.flash_attention_score(
--+++-+            query=query_states,
--+++-+            key=key_states,
--+++-+            value=value_states,
--+++-+            head_num=self.num_heads, # 传入Q的头数(N1)
--+++-+            attn_mask=fa_attention_mask,
--+++-+            keep_prob=1.0 - self.attention_dropout,
--+++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-+            input_layout="BNSD",
--+++-+            sparse_mode=0 # 使用 defaultMask 模式
--+++-+        )
--+++-+
--+++-+        # 恢复原始数据类型
--+++-+        attn_output = attn_output.to(input_dtype)
--+++-+
--+++-+        # 7. 调整输出形状
--+++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-+        attn_output = self.o_proj(attn_output)
--+++-+
--+++-+        # FlashAttention 算子不直接返回注意力权重矩阵
--+++-+        attn_weights = None
--+++-+        if output_attentions:
--+++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++-+
--+++-+        return attn_output, attn_weights, past_key_value
--+++-+
--+++-+    # def forward(
--+++-+    #     self,
--+++-+    #     hidden_states: mindspore.Tensor,
--+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+++-+    #     past_key_value: Optional[Cache] = None,
--+++-+    #     output_attentions: bool = False,
--+++-+    #     use_cache: bool = False,
--+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-+
--+++-+    #     bsz, q_len, _ = hidden_states.shape
--+++-+
--+++-+    #     # 1. 线性投射 Q, K, V
--+++-+    #     query_states = self.q_proj(hidden_states)
--+++-+    #     key_states = self.k_proj(hidden_states)
--+++-+    #     value_states = self.v_proj(hidden_states)
--+++-+
--+++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+
--+++-+    #     # 3. RoPE 旋转位置编码
--+++-+    #     kv_seq_len = key_states.shape[-2]
--+++-+    #     if past_key_value is not None:
--+++-+    #         if self.layer_idx is None:
--+++-+    #             raise ValueError(
--+++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-+    #                 "with a layer index."
--+++-+    #             )
--+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-+
--+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-+
--+++-+    #     # 4. KV 缓存更新
--+++-+    #     if past_key_value is not None:
--+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-+    #         key_states, value_states = past_key_value.update(
--+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++-+    #         )
--+++-+
--+++-+    #     # 5. 准备 Attention Mask
--+++-+    #     fa_attention_mask = None
--+++-+    #     if attention_mask is not None:
--+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-+    #         fa_attention_mask = (mask_slice != 0)
--+++-+
--+++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++-+    #     input_dtype = query_states.dtype
--+++-+
--+++-+    #     # 6. [核心] 调用 flash_attention_score 算子
--+++-+    #     attn_output = mindspore.ops.flash_attention_score(
--+++-+    #         query=query_states,
--+++-+    #         key=key_states,
--+++-+    #         value=value_states,
--+++-+    #         head_num=self.num_heads,
--+++-+    #         attn_mask=fa_attention_mask,
--+++-+    #         keep_prob=1.0 - self.attention_dropout,
--+++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-+    #         input_layout="BNSD",
--+++-+    #         sparse_mode=0,
--+++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++-+    #         inner_precise=1
--+++-+    #     )
--+++-+
--+++-+    #     # 恢复原始数据类型
--+++-+    #     attn_output = attn_output.to(input_dtype)
--+++-+
--+++-+    #     # 7. 调整输出形状
--+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-+    #     attn_output = self.o_proj(attn_output)
--+++-+
--+++-+    #     attn_weights = None
--+++-+    #     if output_attentions:
--+++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++-+
--+++-+    #     return attn_output, attn_weights, past_key_value
--+++-+
--+++-+    # def forward(
--+++-+    #     self,
--+++-+    #     hidden_states: mindspore.Tensor,
--+++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+++-+    #     past_key_value: Optional[Cache] = None,
--+++-+    #     output_attentions: bool = False,
--+++-+    #     use_cache: bool = False,
--+++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-+
--+++-+    #     bsz, q_len, _ = hidden_states.shape
--+++-+
--+++-+    #     query_states = self.q_proj(hidden_states)
--+++-+    #     key_states = self.k_proj(hidden_states)
--+++-+    #     value_states = self.v_proj(hidden_states)
--+++-+
--+++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-+
--+++-+    #     kv_seq_len = key_states.shape[-2]
--+++-+    #     if past_key_value is not None:
--+++-+    #         if self.layer_idx is None:
--+++-+    #             raise ValueError("`layer_idx` must be specified for caching")
--+++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-+
--+++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-+
--+++-+    #     if past_key_value is not None:
--+++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++-+    #         key_states, value_states = past_key_value.update(
--+++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++-+    #         )
--+++-+
--+++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++-+
--+++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++-+    #     query_states = query_states / math.sqrt(self.head_dim)
--+++-+    #     # <--- 修改结束 ---
--+++-+
--+++-+    #     fa_attention_mask = None
--+++-+    #     if attention_mask is not None:
--+++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++-+    #         fa_attention_mask = (mask_slice != 0)
--+++-+
--+++-+    #     input_dtype = query_states.dtype
--+++-+
--+++-+    #     attn_output = mindspore.ops.flash_attention_score(
--+++-+    #         query=query_states,    # 传入已经预先缩放过的 query
--+++-+    #         key=key_states,
--+++-+    #         value=value_states,
--+++-+    #         head_num=self.num_heads,
--+++-+    #         attn_mask=fa_attention_mask,
--+++-+    #         keep_prob=1.0 - self.attention_dropout,
--+++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++-+    #         input_layout="BNSD",
--+++-+    #         sparse_mode=0,
--+++-+    #         inner_precise=1        # 仍然保持内部高精度计算
--+++-+    #     )
--+++-+
--+++-+    #     attn_output = attn_output.to(input_dtype)
--+++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-+    #     attn_output = self.o_proj(attn_output)
--+++-+
--+++-+    #     attn_weights = None
--+++-+    #     if output_attentions:
--+++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++-+
--+++-+    #     return attn_output, attn_weights, past_key_value
--+++-+
--+++- QWEN2MOE_ATTENTION_CLASSES = {
--+++-     "eager": Qwen2MoeAttention,
--+++-+    "flash-attention": Qwen2MoeFlashAttention,
--+++- }
--+++- 
--+++- 
--+++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++- 
--+++-+    #@dwj
--+++-+    # 只遍历激活的专家，而非全部专家
--+++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++--        hidden_states = hidden_states.view(-1, hidden_dim)
--+++--        # router_logits: (batch * sequence_length, n_experts)
--+++--        router_logits = self.gate(hidden_states)
--+++--
--+++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++--        if self.norm_topk_prob:
--+++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++--        # we cast back to the input dtype
--+++--        routing_weights = routing_weights.to(hidden_states.dtype)
--+++--
--+++--        final_hidden_states = ops.zeros(
--+++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++--        )
--+++--
--+++--        # One hot encode the selected experts to create an expert mask
--+++--        # this will be used to easily index which expert is going to be sollicitated
--+++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++--
--+++--        # Loop over all available experts in the model and perform the computation on each expert
--+++--        for expert_idx in range(self.num_experts):
--+++--            expert_layer = self.experts[expert_idx]
--+++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++--
--+++--            # Index the correct hidden states and compute the expert hidden state for
--+++--            # the current expert. We need to make sure to multiply the output hidden
--+++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++--            if 0 not in idx.shape:
--+++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++--
--+++--                # However `index_add_` only support torch tensors for indexing so we'll use
--+++--                # the `top_x` tensor here.
--+++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++--
--+++--        shared_expert_output = self.shared_expert(hidden_states)
--+++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++--
--+++--        final_hidden_states = final_hidden_states + shared_expert_output
--+++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++-+            num_tokens = hidden_states_reshaped.shape[0]
--+++-+
--+++-+            router_logits = self.gate(hidden_states_reshaped)
--+++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++-+
--+++-+            if self.norm_topk_prob:
--+++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++-+            routing_weights = routing_weights.to(hidden_states.dtype)
--+++-+            
--+++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++-+            flat_selected_experts = selected_experts.flatten()
--+++-+            
--+++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++-+            token_indices = broadcasted_token_indices.flatten()
--+++-+            
--+++-+            active_experts = ops.unique(flat_selected_experts)
--+++-+            
--+++-+            for expert_idx_tensor in active_experts:
--+++-+                expert_idx = expert_idx_tensor.item()
--+++-+                expert_layer = self.experts[expert_idx]
--+++-+                
--+++-+                mask = (flat_selected_experts == expert_idx_tensor)
--+++-+                selected_token_indices = token_indices[mask]
--+++-+                selected_routing_weights = routing_weights.flatten()[mask]
--+++-+                
--+++-+                current_states = hidden_states_reshaped[selected_token_indices]
--+++-+                
--+++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++-+                
--+++-+                final_hidden_states = final_hidden_states.index_add(
--+++-+                    dim=0,
--+++-+                    index=selected_token_indices,
--+++-+                    source=expert_output.to(hidden_states.dtype)
--+++-+                )
--+++-+            
--+++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++- 
--+++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++--        return final_hidden_states, router_logits
--+++-+            final_hidden_states = final_hidden_states + shared_expert_output
--+++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++-+            
--+++-+            return final_hidden_states, router_logits
--+++- 
--+++- 
--+++- class Qwen2MoeDecoderLayer(nn.Module):
--+++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++- 
--+++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++- 
--+++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++-+
--+++-         if (layer_idx not in config.mlp_only_layers) and (
--+++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++-         ):
--+++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++-     _skip_keys_device_placement = "past_key_values"
--+++-     _supports_cache_class = True
--+++-+#lwx
--+++-+    # _supports_static_cache = True
--+++- 
--+++-     def _init_weights(self, module):
--+++-         std = self.config.initializer_range
--+++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++-         return causal_mask
--+++- 
--+++- 
--+++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++-     _tied_weights_keys = ["lm_head.weight"]
--+++- 
--+++-     def __init__(self, config):
--+++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++-         self.num_experts_per_tok = config.num_experts_per_tok
--+++-         # Initialize weights and apply final processing
--+++-         self.post_init()
--+++-+        # @lwx
--+++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++-+        #     self.generation_config.cache_implementation = "static"
--+++-+        self._warmed_up = False
--+++-+
--+++-+    def warmup_moe_model(self):
--+++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++-+        test_texts = [
--+++-+            "warmup short",
--+++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++-+        ]
--+++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++-+        if tokenizer is None:
--+++-+            from mindnlp.transformers import AutoTokenizer
--+++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++-+            self._warmup_tokenizer = tokenizer
--+++-+
--+++-+        for text in test_texts:
--+++-+            inputs = tokenizer(text, return_tensors="ms")
--+++-+            with mindspore._no_grad():
--+++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++- 
--+++-     def get_input_embeddings(self):
--+++-         return self.model.embed_tokens
--+++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++-         ```"""
--+++-+        if not self._warmed_up:
--+++-+            self._warmed_up = True
--+++-+            self.warmup_moe_model()
--+++- 
--+++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++-         output_router_logits = (
--+++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++-             }
--+++-         )
--+++-         return model_inputs
--+++-+# @lwx
--+++-+    # def _decode_one_tokens_logits(
--+++-+    #     self,
--+++-+    #     cur_token: mindspore.Tensor,
--+++-+    #     input_pos: Optional[mindspore.Tensor],
--+++-+    #     cache_position: mindspore.Tensor,
--+++-+    #     past_key_values: StaticCache,
--+++-+    # ) -> mindspore.Tensor:
--+++-+    #     """
--+++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++-+        
--+++-+    #     Args:
--+++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++-+    #         input_pos: 输入位置信息，可选
--+++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++-+            
--+++-+    #     Returns:
--+++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++-+    #     """
--+++-+    #     # 调用JIT编译的版本
--+++-+    #     return self.get_decode_one_tokens_logits(
--+++-+    #         cur_token=cur_token,
--+++-+    #         input_pos=input_pos,
--+++-+    #         cache_position=cache_position,
--+++-+    #         past_key_values=past_key_values,
--+++-+    #     )
--+++-+    
--+++-+    # @mindspore.jit(jit_level='O1')
--+++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++-+    #     """
--+++-+    #     JIT编译的函数，用于高效的单token解码
--+++-+    #     使用JIT编译优化以支持静态shape和高效执行
--+++-+        
--+++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++-+    #     """
--+++-+    #     outputs = self.model.forward(
--+++-+    #         input_ids=cur_token,
--+++-+    #         position_ids=input_pos,
--+++-+    #         cache_position=cache_position,
--+++-+    #         past_key_values=past_key_values,
--+++-+    #         use_cache=True,
--+++-+    #         return_dict=False,
--+++-+    #     )
--+++-+        
--+++-+    #     hidden_states = outputs[0]
--+++-+    #     logits = self.lm_head.forward(hidden_states)
--+++-+    #     logits = logits.float()
--+++-+        
--+++-+    #     return logits[:, -1, :]
--+++-+
--+++-+    # def _sample(
--+++-+    #     self,
--+++-+    #     input_ids: mindspore.Tensor,
--+++-+    #     logits_processor,
--+++-+    #     stopping_criteria,
--+++-+    #     generation_config,
--+++-+    #     synced_devices: bool,
--+++-+    #     streamer=None,
--+++-+    #     logits_warper=None,
--+++-+    #     **model_kwargs,
--+++-+    # ):
--+++-+    #     """
--+++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++-+    #     """
--+++-+    #     from ...generation.logits_process import LogitsProcessorList
--+++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++-+    #     from mindnlp.core import nn, ops, no_grad
--+++-+    #     import numpy as np
--+++-+        
--+++-+    #     # 检查是否使用 StaticCache
--+++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++-+    #     # 否则，直接调用父类方法
--+++-+    #     past_key_values = model_kwargs.get("past_key_values")
--+++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++-+
--+++-+    #     if not isinstance(past_key_values, StaticCache):
--+++-+    #         # 不使用 StaticCache，直接调用父类方法
--+++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++-+    #         return super()._sample(
--+++-+    #             input_ids=input_ids,
--+++-+    #             logits_processor=logits_processor,
--+++-+    #             stopping_criteria=stopping_criteria,
--+++-+    #             generation_config=generation_config,
--+++-+    #             synced_devices=synced_devices,
--+++-+    #             streamer=streamer,
--+++-+    #             logits_warper=logits_warper,
--+++-+    #             **model_kwargs,
--+++-+    #         )
--+++-+        
--+++-+    #     # 使用 StaticCache，进入自定义循环
--+++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++-+    #     pad_token_id = generation_config._pad_token_tensor
--+++-+    #     output_attentions = generation_config.output_attentions
--+++-+    #     output_hidden_states = generation_config.output_hidden_states
--+++-+    #     output_scores = generation_config.output_scores
--+++-+    #     output_logits = generation_config.output_logits
--+++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++-+    #     max_length = generation_config.max_length
--+++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++-+    #     do_sample = generation_config.do_sample
--+++-+        
--+++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++-+    #         raise ValueError(
--+++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++-+    #             f"{logits_warper})."
--+++-+    #         )
--+++-+        
--+++-+    #     # init attention / hidden states / scores tuples
--+++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++-+        
--+++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++-+    #         encoder_hidden_states = (
--+++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++-+    #         )
--+++-+        
--+++-+    #     # keep track of which sequences are already finished
--+++-+    #     batch_size, cur_len = input_ids.shape
--+++-+    #     this_peer_finished = False
--+++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++-+        
--+++-+    #     time_record = []
--+++-+    #     from ....utils.testing_utils import parse_flag_from_env
--+++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++-+        
--+++-+    #     while self._has_unfinished_sequences(
--+++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++-+    #     ):
--+++-+    #         if _record_time:
--+++-+    #             import time as time_module
--+++-+    #             infer_start = time_module.time()
--+++-+            
--+++-+    #         # prepare model inputs
--+++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++-+            
--+++-+    #         # prepare variable output controls
--+++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++-+            
--+++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++-+    #         cur_cache_position = model_inputs.get("cache_position")
--+++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++-+    #         cur_input_ids = model_inputs.get("input_ids")
--+++-+            
--+++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++-+    #             cur_cache_position is not None and 
--+++-+    #             len(cur_cache_position.shape) > 0 and
--+++-+    #             cur_cache_position.shape[0] == 1 and
--+++-+    #             cur_input_ids is not None and
--+++-+    #             cur_input_ids.shape[1] == 1):
--+++-+    #             # 使用 JIT 优化的单 token 解码
--+++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++-+    #             if not hasattr(self, '_jit_used'):
--+++-+    #                 self._jit_used = False
--+++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++-+                
--+++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++-+    #                 cur_token=cur_input_ids,
--+++-+    #                 input_pos=model_inputs.get("position_ids"),
--+++-+    #                 cache_position=cur_cache_position,
--+++-+    #                 past_key_values=cur_past_key_values,
--+++-+    #             )
--+++-+                
--+++-+    #             # 标记已使用JIT（用于后续判断）
--+++-+    #             if not self._jit_used:
--+++-+    #                 self._jit_used = True
--+++-+                
--+++-+    #             # 构造兼容的输出对象
--+++-+    #             class JitOptimizedOutput:
--+++-+    #                 def __init__(self, logits, config):
--+++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++-+    #                     self.config = config
--+++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++-+    #                     self.cross_attentions = None
--+++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++-+                
--+++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++-+    #         else:
--+++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++-+    #             outputs = self(**model_inputs, return_dict=True)
--+++-+            
--+++-+    #         if synced_devices and this_peer_finished:
--+++-+    #             continue
--+++-+            
--+++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++-+    #         next_token_logits = outputs.logits[:, -1, :]
--+++-+            
--+++-+    #         # pre-process distribution
--+++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++-+    #         if do_sample:
--+++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++-+            
--+++-+    #         # Store scores, attentions and hidden_states when required
--+++-+    #         if return_dict_in_generate:
--+++-+    #             if output_scores:
--+++-+    #                 scores += (next_token_scores,)
--+++-+    #             if output_logits:
--+++-+    #                 raw_logits += (next_token_logits,)
--+++-+    #             if output_attentions:
--+++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++-+    #                 if self.config.is_encoder_decoder:
--+++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++-+                
--+++-+    #             if output_hidden_states:
--+++-+    #                 hidden = (
--+++-+    #                     outputs.decoder_hidden_states
--+++-+    #                     if self.config.is_encoder_decoder
--+++-+    #                     else outputs.hidden_states
--+++-+    #                 )
--+++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++-+            
--+++-+    #         # token selection
--+++-+    #         if do_sample:
--+++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++-+    #         else:
--+++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++-+            
--+++-+    #         # finished sentences should have their next token be a padding token
--+++-+    #         if has_eos_stopping_criteria:
--+++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++-+            
--+++-+    #         # update generated ids, model inputs, and length for next step
--+++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++-+    #         if streamer is not None:
--+++-+    #             streamer.put(next_tokens)
--+++-+            
--+++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++-+    #             outputs,
--+++-+    #             model_kwargs,
--+++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++-+    #         )
--+++-+            
--+++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++-+    #         cur_len += 1
--+++-+            
--+++-+    #         if _record_time:
--+++-+    #             import time as time_module
--+++-+    #             infer_stop = time_module.time()
--+++-+    #             time_record.append(infer_stop - infer_start)
--+++-+            
--+++-+    #         del outputs
--+++-+        
--+++-+    #     average_infer_time = None
--+++-+    #     if time_record:
--+++-+    #         if len(time_record) > 1:
--+++-+    #             time_record.pop(0)
--+++-+    #         average_infer_time = sum(time_record) / len(time_record)
--+++-+    #         print(f'average inference time is: {average_infer_time}')
--+++-+    #         print(f'inference time record: {time_record}')
--+++-+        
--+++-+    #     if streamer is not None:
--+++-+    #         streamer.end()
--+++-+        
--+++-+    #     # 简单判断：打印是否使用了JIT路径
--+++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++-+    #     else:
--+++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++-+        
--+++-+    #     if return_dict_in_generate:
--+++-+    #         if self.config.is_encoder_decoder:
--+++-+    #             return GenerateEncoderDecoderOutput(
--+++-+    #                 sequences=input_ids,
--+++-+    #                 scores=scores,
--+++-+    #                 logits=raw_logits,
--+++-+    #                 encoder_attentions=encoder_attentions,
--+++-+    #                 encoder_hidden_states=encoder_hidden_states,
--+++-+    #                 decoder_attentions=decoder_attentions,
--+++-+    #                 cross_attentions=cross_attentions,
--+++-+    #                 decoder_hidden_states=decoder_hidden_states,
--+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++-+    #                 average_infer_time=average_infer_time
--+++-+    #             )
--+++-+    #         else:
--+++-+    #             return GenerateDecoderOnlyOutput(
--+++-+    #                 sequences=input_ids,
--+++-+    #                 scores=scores,
--+++-+    #                 logits=raw_logits,
--+++-+    #                 attentions=decoder_attentions,
--+++-+    #                 hidden_states=decoder_hidden_states,
--+++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++-+    #                 average_infer_time=average_infer_time
--+++-+    #             )
--+++-+    #     else:
--+++-+    #         return input_ids
--+++-+            
--+++-+    # def _prepare_cache_for_generation(
--+++-+    #     self,
--+++-+    #     generation_config,
--+++-+    #     model_kwargs,
--+++-+    #     assistant_model,
--+++-+    #     batch_size,
--+++-+    #     max_cache_length,
--+++-+    # ):
--+++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++-+    #         generation_config.cache_implementation = "static"
--+++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++-+        
--+++-+    #     if generation_config.cache_implementation == "static":
--+++-+    #         base_required_from_max_length = generation_config.max_length + 1
--+++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++-+    #         min_cache_size = 50
--+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++-+    #         else:
--+++-+    #             max_cache_length = max(base_required, min_cache_size)
--+++-+            
--+++-+    #         original_max_cache_length = max_cache_length
--+++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++-+            
--+++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++-+    #             if max_cache_length > self.config.max_position_embeddings:
--+++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++-+        
--+++-+    #     result = super()._prepare_cache_for_generation(
--+++-+    #         generation_config=generation_config,
--+++-+    #         model_kwargs=model_kwargs,
--+++-+    #         assistant_model=assistant_model,
--+++-+    #         batch_size=batch_size,
--+++-+    #         max_cache_length=max_cache_length,
--+++-+    #     )
--+++-+        
--+++-+    #     if generation_config.cache_implementation == "static":
--+++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++-+    #         created_cache = model_kwargs.get(cache_name)
--+++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++-+    #             if created_cache.max_cache_len < generation_config.max_length:
--+++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++-+        
--+++-+    #     return result
--+++-+
--+++-+
--+++-+
--+++- 
--+++- 
--+++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++--- 
--+++-2.27.0
--+++-
--+++-- 
--+++2.27.0
--+++
--++-- 
--++2.27.0
--++
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
-diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
-deleted file mode 100644
-index 8a2fc4fe..00000000
---- a/patches/0007-20251107003commit.patch
-+++ /dev/null
-@@ -1,8034 +0,0 @@
--From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Fri, 7 Nov 2025 12:12:51 +0800
--Subject: [PATCH 7/8] 20251107003commit
--
-----
-- .../models/deepseek/modeling_deepseek.py      |    2 +-
-- patches/0001-20251104commit.patch             |    2 +-
-- patches/0002-20251106commit.patch             |    2 +-
-- patches/0003-20261106secondcommit.patch       |    2 +-
-- patches/0004-20251106change.patch             |    2 +-
-- patches/0005-20251107001commit.patch          |    2 +-
-- patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
-- 7 files changed, 7937 insertions(+), 6 deletions(-)
-- create mode 100644 patches/0006-20251107002commit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index e7e1c053..ff631974 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
--     #     return expert_cache
--     
--     @no_grad()
---    dwj
--+    # dwj
--     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--         # x 的 shape: (1, hidden_size)
--         # flat_expert_indices 的 shape: (num_experts_per_tok,)
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--index 2842180e..c9c8c5ee 100644
----- a/patches/0001-20251104commit.patch
--+++ b/patches/0001-20251104commit.patch
--@@ -1,7 +1,7 @@
-- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Tue, 4 Nov 2025 09:11:51 +0800
---Subject: [PATCH 1/5] 20251104commit
--+Subject: [PATCH 1/6] 20251104commit
-- 
-- ---
--  mindnlp/transformers/cache_utils.py           |  28 +-
--diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--index c6cd8757..625656eb 100644
----- a/patches/0002-20251106commit.patch
--+++ b/patches/0002-20251106commit.patch
--@@ -1,7 +1,7 @@
-- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 09:20:38 +0800
---Subject: [PATCH 2/5] 20251106commit
--+Subject: [PATCH 2/6] 20251106commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--index 601960c9..dcb85080 100644
----- a/patches/0003-20261106secondcommit.patch
--+++ b/patches/0003-20261106secondcommit.patch
--@@ -1,7 +1,7 @@
-- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 14:54:37 +0800
---Subject: [PATCH 3/5] 20261106secondcommit
--+Subject: [PATCH 3/6] 20261106secondcommit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--index 8976f10b..bbed13cc 100644
----- a/patches/0004-20251106change.patch
--+++ b/patches/0004-20251106change.patch
--@@ -1,7 +1,7 @@
-- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 15:48:09 +0800
---Subject: [PATCH 4/5] 20251106change
--+Subject: [PATCH 4/6] 20251106change
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  189 +-
--diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--index 8d9032be..b2d1035c 100644
----- a/patches/0005-20251107001commit.patch
--+++ b/patches/0005-20251107001commit.patch
--@@ -1,7 +1,7 @@
-- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Fri, 7 Nov 2025 11:48:18 +0800
---Subject: [PATCH 5/5] 20251107001commit
--+Subject: [PATCH 5/6] 20251107001commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |   91 +-
--diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
--new file mode 100644
--index 00000000..bffa134e
----- /dev/null
--+++ b/patches/0006-20251107002commit.patch
--@@ -0,0 +1,7931 @@
--+From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Fri, 7 Nov 2025 12:06:32 +0800
--+Subject: [PATCH 6/6] 20251107002commit
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |  122 +-
--+ patches/0001-20251104commit.patch             |    2 +-
--+ patches/0002-20251106commit.patch             |    2 +-
--+ patches/0003-20261106secondcommit.patch       |    2 +-
--+ patches/0004-20251106change.patch             |    2 +-
--+ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
--+ 6 files changed, 7773 insertions(+), 64 deletions(-)
--+ create mode 100644 patches/0005-20251107001commit.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index 8831e4b7..e7e1c053 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
--+     #         expert_out = expert(x)
--+     #         expert_cache += expert_out * weight
--+     #     return expert_cache
--+-
--+-    # @no_grad()
--+-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+-    #     # x 的 shape: (1, hidden_size)
--+-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+-
--+-    #     # 1. 收集所有需要的专家层
--+-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--+-
--+-    #     # 2. 并行计算所有专家的输出
--+-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--+-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+-
--+-    #     # 3. 使用矩阵乘法进行加权求和
--+-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--+-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++    
--++    @no_grad()
--++    dwj
--++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++        # x 的 shape: (1, hidden_size)
--++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++
--++        # 1. 收集所有需要的专家层
--++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--++
--++        # 2. 并行计算所有专家的输出
--++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++        # ops.cat 会将它们堆叠成一个新的 Tensor
--++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++
--++        # 3. 使用矩阵乘法进行加权求和
--++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++        # 最终结果 final_output 的 shape: (1, hidden_size)
--++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+         
--+-    #     return final_output
--++        return final_output
--+ 
--+ 
--+     # @no_grad()
--+@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
--+ 
--+         return expert_cache
--+ # 放置在 DeepseekMoE 类中
--+-    @no_grad()
--+-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+-        """
--+-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--+-
--+-        Args:
--+-            x (Tensor): 输入张量, shape: (1, hidden_size)
--+-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--+-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--+-        """
--+-        top_k, _ = flat_expert_weights.shape
--+-        hidden_size = x.shape[-1]
--+-
--+-        # 1. 将所有专家的权重堆叠起来
--+-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--+-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--+-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--++    # @no_grad()
--++    # #lwx 20251107
--++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++    #     """
--++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--++
--++    #     Args:
--++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
--++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--++    #     """
--++    #     top_k, _ = flat_expert_weights.shape
--++    #     hidden_size = x.shape[-1]
--++
--++    #     # 1. 将所有专家的权重堆叠起来
--++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--+         
--+-        # 2. "收集" 所需的专家权重
--+-        selected_gate_w = stacked_gate_w[flat_expert_indices]
--+-        selected_up_w = stacked_up_w[flat_expert_indices]
--+-        selected_down_w = stacked_down_w[flat_expert_indices]
--++    #     # 2. "收集" 所需的专家权重
--++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
--++    #     selected_up_w = stacked_up_w[flat_expert_indices]
--++    #     selected_down_w = stacked_down_w[flat_expert_indices]
--+         
--+-        # 3. 准备输入
--+-        x_expanded = x.expand((top_k, 1, hidden_size))
--++    #     # 3. 准备输入
--++    #     x_expanded = x.expand((top_k, 1, hidden_size))
--+         
--+-        # 4. 并行计算 gate_proj 和 up_proj
--+-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--+-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--++    #     # 4. 并行计算 gate_proj 和 up_proj
--++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--+ 
--+-        # 5. 计算中间状态
--+-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--++    #     # 5. 计算中间状态
--++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--+         
--+-        # 6. 并行计算 down_proj
--+-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--+-        #    --- [FIX] ---
--+-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--+-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--+-        #    --- [FIX END] ---
--++    #     # 6. 并行计算 down_proj
--++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--++    #     #    --- [FIX] ---
--++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--++    #     #    --- [FIX END] ---
--+         
--+-        # 7. 根据路由权重进行加权求和
--+-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--++    #     # 7. 根据路由权重进行加权求和
--++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--+ 
--+-        return weighted_sum
--++    #     return weighted_sum
--+ 
--+ 
--+ 
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+index 0a0ef2d7..2842180e 100644
--+--- a/patches/0001-20251104commit.patch
--++++ b/patches/0001-20251104commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Tue, 4 Nov 2025 09:11:51 +0800
--+-Subject: [PATCH 1/4] 20251104commit
--++Subject: [PATCH 1/5] 20251104commit
--+ 
--+ ---
--+  mindnlp/transformers/cache_utils.py           |  28 +-
--+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+index 5185270c..c6cd8757 100644
--+--- a/patches/0002-20251106commit.patch
--++++ b/patches/0002-20251106commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 09:20:38 +0800
--+-Subject: [PATCH 2/4] 20251106commit
--++Subject: [PATCH 2/5] 20251106commit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+index 3e05f821..601960c9 100644
--+--- a/patches/0003-20261106secondcommit.patch
--++++ b/patches/0003-20261106secondcommit.patch
--+@@ -1,7 +1,7 @@
--+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 14:54:37 +0800
--+-Subject: [PATCH 3/4] 20261106secondcommit
--++Subject: [PATCH 3/5] 20261106secondcommit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--+index 88a1aef4..8976f10b 100644
--+--- a/patches/0004-20251106change.patch
--++++ b/patches/0004-20251106change.patch
--+@@ -1,7 +1,7 @@
--+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 15:48:09 +0800
--+-Subject: [PATCH 4/4] 20251106change
--++Subject: [PATCH 4/5] 20251106change
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  189 +-
--+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--+new file mode 100644
--+index 00000000..8d9032be
--+--- /dev/null
--++++ b/patches/0005-20251107001commit.patch
--+@@ -0,0 +1,7707 @@
--++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Fri, 7 Nov 2025 11:48:18 +0800
--++Subject: [PATCH 5/5] 20251107001commit
--++
--++---
--++ .../models/deepseek/modeling_deepseek.py      |   91 +-
--++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
--++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
--++ patches/0001-20251104commit.patch             |    2 +-
--++ patches/0002-20251106commit.patch             |    2 +-
--++ patches/0003-20261106secondcommit.patch       |    2 +-
--++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
--++ 7 files changed, 7577 insertions(+), 30 deletions(-)
--++ create mode 100644 patches/0004-20251106change.patch
--++
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index 0546f318..8831e4b7 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
--++     #         expert_cache += expert_out * weight
--++     #     return expert_cache
--++ 
--++-    @no_grad()
--++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++-        # x 的 shape: (1, hidden_size)
--++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++-
--++-        # 1. 收集所有需要的专家层
--++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
--++-
--++-        # 2. 并行计算所有专家的输出
--++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++-        # ops.cat 会将它们堆叠成一个新的 Tensor
--++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++-
--++-        # 3. 使用矩阵乘法进行加权求和
--++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++-        # 最终结果 final_output 的 shape: (1, hidden_size)
--++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+++    # @no_grad()
--+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     # x 的 shape: (1, hidden_size)
--+++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+++
--+++    #     # 1. 收集所有需要的专家层
--+++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--+++
--+++    #     # 2. 并行计算所有专家的输出
--+++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--+++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+++
--+++    #     # 3. 使用矩阵乘法进行加权求和
--+++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--+++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++         
--++-        return final_output
--+++    #     return final_output
--++ 
--++ 
--++     # @no_grad()
--++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
--++             )
--++ 
--++         return expert_cache
--+++# 放置在 DeepseekMoE 类中
--+++    @no_grad()
--+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++        """
--+++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--+++
--+++        Args:
--+++            x (Tensor): 输入张量, shape: (1, hidden_size)
--+++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--+++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--+++        """
--+++        top_k, _ = flat_expert_weights.shape
--+++        hidden_size = x.shape[-1]
--+++
--+++        # 1. 将所有专家的权重堆叠起来
--+++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--+++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--+++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--+++        
--+++        # 2. "收集" 所需的专家权重
--+++        selected_gate_w = stacked_gate_w[flat_expert_indices]
--+++        selected_up_w = stacked_up_w[flat_expert_indices]
--+++        selected_down_w = stacked_down_w[flat_expert_indices]
--+++        
--+++        # 3. 准备输入
--+++        x_expanded = x.expand((top_k, 1, hidden_size))
--+++        
--+++        # 4. 并行计算 gate_proj 和 up_proj
--+++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--+++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--+++
--+++        # 5. 计算中间状态
--+++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--+++        
--+++        # 6. 并行计算 down_proj
--+++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--+++        #    --- [FIX] ---
--+++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--+++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--+++        #    --- [FIX END] ---
--+++        
--+++        # 7. 根据路由权重进行加权求和
--+++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--+++
--+++        return weighted_sum
--+++
--+++
--++ 
--++         # @no_grad()
--++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++index ebd7782e..913a7609 100644
--++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
--++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++-    x1 = x[..., : x.shape[-1] // 2]
--++-    x2 = x[..., x.shape[-1] // 2 :]
--+++    # x1 = x[..., : x.shape[-1] // 2]
--+++    # x2 = x[..., x.shape[-1] // 2 :]
--++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--++index d059dcbe..2b217b64 100644
--++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
--++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++ def rotate_half(x):
--++     """Rotates half the hidden dims of the input."""
--++-    x1 = x[..., : x.shape[-1] // 2]
--++-    x2 = x[..., x.shape[-1] // 2 :]
--+++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++    # x1 = x[..., : x.shape[-1] // 2]
--+++    # x2 = x[..., x.shape[-1] // 2 :]
--+++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++     return ops.cat((-x2, x1), dim=-1)
--++ 
--++ 
--++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++index 78f22642..0a0ef2d7 100644
--++--- a/patches/0001-20251104commit.patch
--+++++ b/patches/0001-20251104commit.patch
--++@@ -1,7 +1,7 @@
--++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Tue, 4 Nov 2025 09:11:51 +0800
--++-Subject: [PATCH 1/3] 20251104commit
--+++Subject: [PATCH 1/4] 20251104commit
--++ 
--++ ---
--++  mindnlp/transformers/cache_utils.py           |  28 +-
--++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--++index 22b65dd5..5185270c 100644
--++--- a/patches/0002-20251106commit.patch
--+++++ b/patches/0002-20251106commit.patch
--++@@ -1,7 +1,7 @@
--++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Thu, 6 Nov 2025 09:20:38 +0800
--++-Subject: [PATCH 2/3] 20251106commit
--+++Subject: [PATCH 2/4] 20251106commit
--++ 
--++ ---
--++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--++index 966529e4..3e05f821 100644
--++--- a/patches/0003-20261106secondcommit.patch
--+++++ b/patches/0003-20261106secondcommit.patch
--++@@ -1,7 +1,7 @@
--++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Thu, 6 Nov 2025 14:54:37 +0800
--++-Subject: [PATCH 3/3] 20261106secondcommit
--+++Subject: [PATCH 3/4] 20261106secondcommit
--++ 
--++ ---
--++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--++new file mode 100644
--++index 00000000..88a1aef4
--++--- /dev/null
--+++++ b/patches/0004-20251106change.patch
--++@@ -0,0 +1,7498 @@
--+++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Thu, 6 Nov 2025 15:48:09 +0800
--+++Subject: [PATCH 4/4] 20251106change
--+++
--+++---
--+++ .../models/deepseek/modeling_deepseek.py      |  189 +-
--+++ patches/0001-20251104commit.patch             | 1272 +++++++
--+++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
--+++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
--+++ 4 files changed, 7244 insertions(+), 186 deletions(-)
--+++ create mode 100644 patches/0001-20251104commit.patch
--+++ create mode 100644 patches/0002-20251106commit.patch
--+++ create mode 100644 patches/0003-20261106secondcommit.patch
--+++
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index 2f9192bf..0546f318 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
--+++ 
--+++         return attn_output, attn_weights, past_key_value
--+++ 
--+++-# class DeepseekFlashAttention(nn.Module):
--+++-#     """
--+++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--+++-
--+++-#     This class is designed as a drop-in replacement for DeepseekAttention.
--+++-#     """
--+++-
--+++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+++-#         super().__init__()
--+++-#         self.config = config
--+++-#         self.layer_idx = layer_idx
--+++-#         if layer_idx is None:
--+++-#             logger.warning(
--+++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++-#                 "when creating this class."
--+++-#             )
--+++-
--+++-#         self.attention_dropout = config.attention_dropout
--+++-#         self.hidden_size = config.hidden_size
--+++-#         self.num_heads = config.num_attention_heads
--+++-#         self.head_dim = self.hidden_size // self.num_heads
--+++-#         self.num_key_value_heads = config.num_key_value_heads
--+++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++-#         self.max_position_embeddings = config.max_position_embeddings
--+++-#         self.rope_theta = config.rope_theta
--+++-#         self.is_causal = True
--+++-
--+++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++-#             raise ValueError(
--+++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++-#                 f" and `num_heads`: {self.num_heads})."
--+++-#             )
--+++-
--+++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+++-#         self._init_rope()
--+++-
--+++-#     def _init_rope(self):
--+++-#         if self.config.rope_scaling is None:
--+++-#             self.rotary_emb = DeepseekRotaryEmbedding(
--+++-#                 self.head_dim,
--+++-#                 max_position_embeddings=self.max_position_embeddings,
--+++-#                 base=self.rope_theta,
--+++-#             )
--+++-#         else:
--+++-#             scaling_type = self.config.rope_scaling["type"]
--+++-#             scaling_factor = self.config.rope_scaling["factor"]
--+++-#             if scaling_type == "linear":
--+++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+++-#                     self.head_dim,
--+++-#                     max_position_embeddings=self.max_position_embeddings,
--+++-#                     scaling_factor=scaling_factor,
--+++-#                     base=self.rope_theta,
--+++-#                 )
--+++-#             elif scaling_type == "dynamic":
--+++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+++-#                     self.head_dim,
--+++-#                     max_position_embeddings=self.max_position_embeddings,
--+++-#                     scaling_factor=scaling_factor,
--+++-#                     base=self.rope_theta,
--+++-#                 )
--+++-#             else:
--+++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+++-
--+++-#     def forward(
--+++-#         self,
--+++-#         hidden_states: mindspore.Tensor,
--+++-#         attention_mask: Optional[mindspore.Tensor] = None,
--+++-#         position_ids: Optional[mindspore.Tensor] = None,
--+++-#         past_key_value: Optional[Cache] = None,
--+++-#         output_attentions: bool = False,
--+++-#         use_cache: bool = False,
--+++-#         **kwargs,
--+++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++-#         if "padding_mask" in kwargs:
--+++-#             warnings.warn(
--+++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+++-#             )
--+++-        
--+++-#         if output_attentions:
--+++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--+++-
--+++-#         bsz, q_len, _ = hidden_states.shape
--+++-
--+++-#         if self.config.pretraining_tp > 1:
--+++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+++-
--+++-#         query_states = self.q_proj(hidden_states)
--+++-#         key_states = self.k_proj(hidden_states)
--+++-#         value_states = self.v_proj(hidden_states)
--+++-
--+++-#         # Reshape for multi-head attention
--+++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++-
--+++-#         kv_seq_len = key_states.shape[-2]
--+++-#         if past_key_value is not None:
--+++-#             if self.layer_idx is None:
--+++-#                 raise ValueError(
--+++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++-#                     "with a layer index."
--+++-#                 )
--+++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++-        
--+++-#         # Apply Rotary Positional Embedding
--+++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++-
--+++-#         if past_key_value is not None:
--+++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--+++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++-
--+++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--+++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--+++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++-        
--+++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++-        
--+++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++-
--+++-#         # Convert attention_mask for flash_attention_score
--+++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--+++-#         if attention_mask is not None:
--+++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--+++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+++-#                 raise ValueError(
--+++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+++-#                 )
--+++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--+++-#         else:
--+++-#             attn_mask_for_fa = None
--+++-        
--+++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+++-
--+++-#         # Call the fused flash_attention_score operator
--+++-#         attn_output = mindspore.ops.flash_attention_score(
--+++-#             query=query_states_for_fa,
--+++-#             key=key_states_for_fa,
--+++-#             value=value_states_for_fa,
--+++-#             head_num=self.num_heads, # This is N1, the number of query heads
--+++-#             input_layout='BSH',
--+++-#             attn_mask=attn_mask_for_fa,
--+++-#             keep_prob=keep_prob,
--+++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++-#             sparse_mode=0 # Default mask mode
--+++-#         )
--+++-        
--+++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--+++-#         attn_output = self.o_proj(attn_output)
--+++-        
--+++-#         # Flash Attention does not return attention weights
--+++-#         attn_weights = None
--+++-
--+++-#         return attn_output, attn_weights, past_key_value
--+++ 
--+++ class DeepseekFlashAttention(nn.Module):
--+++     """
--+++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
--+++         super().__init__()
--+++         self.hidden_size = config.hidden_size
--+++ 
--+++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--+++-            config=config, layer_idx=layer_idx
--+++-        )
--++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--++++            # config=config, layer_idx=layer_idx
--++++        # )
--+++ 
--+++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--+++             config=config, layer_idx=layer_idx
--+++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
--+++         return outputs
--+++ 
--+++ 
--+++-
--+++ class DeepseekPreTrainedModel(PreTrainedModel):
--+++     config_class = DeepseekConfig
--+++     base_model_prefix = "model"
--+++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++         # Initialize weights and apply final processing
--+++         self.post_init()
--+++         self.warm_up = False
--+++-        #@dwj
--+++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--+++-            self.num_layers,
--+++-            self.num_attention_heads,
--+++-            self.head_dim,
--+++-            batch_size=1,
--+++-            max_length=self.max_length,
--+++-            dtype=mindspore.float16
--+++-        )
--+++-
--+++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--+++-        key_cache = []
--+++-        value_cache = []
--+++-        for _ in range(num_layers):
--+++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++-            key_cache.append(k)
--+++-            value_cache.append(v)
--+++-        return key_cache, value_cache
--+++-
--+++ 
--+++     def warmup_moe_model_deep(self):
--+++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++new file mode 100644
--+++index 00000000..78f22642
--+++--- /dev/null
--++++++ b/patches/0001-20251104commit.patch
--+++@@ -0,0 +1,1272 @@
--++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++++From: Pinoeer-kingxi <13022943007@163.com>
--++++Date: Tue, 4 Nov 2025 09:11:51 +0800
--++++Subject: [PATCH 1/3] 20251104commit
--++++
--++++---
--++++ mindnlp/transformers/cache_utils.py           |  28 +-
--++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++++ 3 files changed, 976 insertions(+), 87 deletions(-)
--++++
--++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++++index cadd2e04..02f8d4be 100644
--++++--- a/mindnlp/transformers/cache_utils.py
--+++++++ b/mindnlp/transformers/cache_utils.py
--++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++++             # k_out[:, :, cache_position] = key_states
--++++             # v_out[:, :, cache_position] = value_states
--++++-            if ON_ORANGE_PI:
--++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++-            else:
--++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++-
--+++++            # if ON_ORANGE_PI:
--+++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++            # else:
--+++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++            # 确保 cache_position 是 1D tensor 并且类型正确
--+++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++++            if cache_position.ndim > 1:
--+++++                cache_position = cache_position.flatten()
--+++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++++                cache_position = cache_position.int()
--+++++            
--+++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++++            k_out[:, :, cache_position] = key_states
--+++++            v_out[:, :, cache_position] = value_states
--+++++            
--++++         return k_out, v_out
--++++ 
--++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++index c695b944..d8303e45 100644
--++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++++ def rotate_half(x):
--++++     """Rotates half the hidden dims of the input."""
--++++-    x1 = x[..., : x.shape[-1] // 2]
--++++-    x2 = x[..., x.shape[-1] // 2 :]
--+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++    # x1 = x[..., : x.shape[-1] // 2]
--+++++    # x2 = x[..., x.shape[-1] // 2 :]
--+++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++     return ops.cat((-x2, x1), dim=-1)
--++++ 
--++++ 
--++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++++         if self.training:
--++++             raise NotImplementedError("Training is not supported yet.")
--++++         else:
--++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++-        if self.config.n_shared_experts is not None:
--++++-            y = y + self.shared_experts(identity)
--++++-        return y
--+++++            # @lwx
--+++++            if orig_shape[1] == 1:
--+++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++++                y=y.view(*orig_shape)
--+++++                if self.config.n_shared_experts is not None:
--+++++                    y = y + self.shared_experts(identity)
--+++++                return y
--+++++            else:
--+++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++++                if self.config.n_shared_experts is not None:
--+++++                    y = y + self.shared_experts(identity)
--+++++                return y
--+++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++        # if self.config.n_shared_experts is not None:
--+++++        #     y = y + self.shared_experts(identity)
--+++++        # return y
--+++++        
--+++++    @no_grad()
--+++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++
--+++++        expert_cache = ops.zeros_like(x)
--+++++        for i in range(self.num_experts_per_tok):
--+++++            expert_id = flat_expert_indices[i].item()
--+++++            weight = flat_expert_weights[i].item()
--+++++            expert = self.experts[expert_id]
--+++++            expert_out = expert(x)
--+++++            expert_cache += expert_out * weight
--+++++        return expert_cache
--++++ 
--++++     @no_grad()
--++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-        # expert_cache = torch.zeros_like(x)
--++++-        # idxs = flat_expert_indices.argsort()
--++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++-        # token_idxs = idxs // self.num_experts_per_tok
--++++-        # for i, end_idx in enumerate(tokens_per_expert):
--++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++-        #     if start_idx == end_idx:
--++++-        #         continue
--++++-        #     expert = self.experts[i]
--++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++-        #     expert_tokens = x[exp_token_idx]
--++++-        #     expert_out = expert(expert_tokens)
--++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++-        # return expert_cache
--+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++         expert_cache = ops.zeros_like(x)
--++++         idxs = flat_expert_indices.argsort()
--++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++         token_idxs = idxs // self.num_experts_per_tok
--+++++
--++++         for i, end_idx in enumerate(tokens_per_expert):
--++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++             if start_idx == end_idx:
--++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++++             expert_out = expert(expert_tokens)
--++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++
--++++         return expert_cache
--+++++        
--+++++    # @no_grad()
--+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++    #     # expert_cache = torch.zeros_like(x)
--+++++    #     # idxs = flat_expert_indices.argsort()
--+++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++    #     # token_idxs = idxs // self.num_experts_per_tok
--+++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++    #     #     if start_idx == end_idx:
--+++++    #     #         continue
--+++++    #     #     expert = self.experts[i]
--+++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #     #     expert_tokens = x[exp_token_idx]
--+++++    #     #     expert_out = expert(expert_tokens)
--+++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++    #     # return expert_cache
--+++++    #     expert_cache = ops.zeros_like(x)
--+++++    #     idxs = flat_expert_indices.argsort()
--+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++    #     for i, end_idx in enumerate(tokens_per_expert):
--+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++    #         if start_idx == end_idx:
--+++++    #             continue
--+++++    #         expert = self.experts[i]
--+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #         expert_tokens = x[exp_token_idx]
--+++++    #         expert_out = expert(expert_tokens)
--+++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++
--+++++    #     return expert_cache
--+++++    # @no_grad()
--+++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++    #     expert_cache = ops.zeros_like(x)
--+++++
--+++++    #     # 排序保证顺序一致
--+++++    #     idxs = flat_expert_indices.argsort()
--+++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++    #     # 找出有 token 的专家
--+++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++
--+++++    #     for i in active_experts.tolist():
--+++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++    #         end_idx = tokens_per_expert[i]
--+++++    #         if start_idx == end_idx:  # 没有 token
--+++++    #             continue
--+++++
--+++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++    #         expert_tokens = x[exp_token_idx]
--+++++    #         expert_out = self.experts[i](expert_tokens)
--+++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++
--+++++    #         expert_cache = mindspore.mint.scatter_add(
--+++++    #             expert_cache,
--+++++    #             0,
--+++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++    #             expert_out
--+++++    #         )
--+++++
--+++++    #     return expert_cache
--+++++
--+++++
--++++ 
--++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++++ #     """
--++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++ 
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--+++++        self.warm_up = False
--+++++
--+++++    def warmup_moe_model_deep(self):
--+++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++++        test_texts = [
--+++++            "warmup short",
--+++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++++        ]
--+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++        if tokenizer is None:
--+++++            from mindnlp.transformers import AutoTokenizer
--+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++            self._warmup_tokenizer = tokenizer
--+++++
--+++++        for text in test_texts:
--+++++            inputs = tokenizer(text, return_tensors="ms")
--+++++            with mindspore._no_grad():
--+++++                _ = self(**inputs, use_cache=False)
--+++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++++ 
--++++     def get_input_embeddings(self):
--++++         return self.model.embed_tokens
--++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++         ```"""
--+++++        if not self.warm_up:
--+++++            self.warm_up = True
--+++++            self.warmup_moe_model_deep()
--+++++
--++++         output_attentions = (
--++++             output_attentions
--++++             if output_attentions is not None
--++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++index 3cbf820e..d4c6b651 100644
--++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++@@ -18,7 +18,6 @@
--++++ # See the License for the specific language governing permissions and
--++++ # limitations under the License.
--++++ """MindSpore Qwen2MoE model."""
--++++-
--++++ import math
--++++ from typing import List, Optional, Tuple, Union
--++++ 
--++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++++     TokenClassifierOutput,
--++++ )
--++++ from ...modeling_utils import PreTrainedModel
--+++++from ...generation import GenerationMixin
--++++ from ....utils import logging
--++++ from .configuration_qwen2_moe import Qwen2MoeConfig
--++++ 
--++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++++         self.variance_epsilon = eps
--++++ 
--++++     def forward(self, hidden_states):
--+++++        # @dwj
--+++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++        # @lwx
--+++++        # if not self.training :
--+++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++         input_dtype = hidden_states.dtype
--++++         hidden_states = hidden_states.to(mindspore.float32)
--++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++++@@ -234,6 +239,8 @@ def rotate_half(x):
--++++     """Rotates half the hidden dims of the input."""
--++++     x1 = x[..., : x.shape[-1] // 2]
--++++     x2 = x[..., x.shape[-1] // 2 :]
--+++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++     return ops.cat((-x2, x1), dim=-1)
--++++ 
--++++ 
--++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++++         self.config = config
--++++         self.hidden_size = config.hidden_size
--++++         self.intermediate_size = intermediate_size
--+++++        
--++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++++         self.act_fn = ACT2FN[config.hidden_act]
--++++ 
--++++     def forward(self, x):
--++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++-
--++++ 
--+++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++        # @lwx 
--+++++        # gate_up_output = self.gate_up_proj(x)
--+++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++++        # return self.down_proj(swiglu_output)
--+++++
--+++++    # def forward(self, x):
--+++++    #     gate_proj_out = self.gate_proj(x)
--+++++    #     up_proj_out = self.up_proj(x)
--+++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++++    #     return self.down_proj(swiglu_out)
--+++++        
--++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++++     """
--++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++++         use_cache: bool = False,
--++++         cache_position: Optional[mindspore.Tensor] = None,
--++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++        
--+++++
--++++         bsz, q_len, _ = hidden_states.shape
--++++ 
--++++         query_states = self.q_proj(hidden_states)
--++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++                     "with a layer index."
--++++                 )
--++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++            if isinstance(past_key_value, StaticCache):
--+++++                kv_seq_len = key_states.shape[-2]
--+++++            else:
--+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++ 
--++++         if past_key_value is not None:
--++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++            
--+++++            if isinstance(past_key_value, StaticCache):
--+++++                kv_seq_len = key_states.shape[-2]
--++++ 
--++++         # repeat k/v heads if n_kv_heads < n_heads
--++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++-
--+++++        
--++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++ 
--++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++++-            raise ValueError(
--++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++++-                f" {attn_weights.shape}"
--++++-            )
--++++-
--++++-        if attention_mask is not None:  # no matter the length, we just slice it
--++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++++        if attention_mask is not None:
--+++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++             attn_weights = attn_weights + causal_mask
--++++ 
--++++         # upcast attention to fp32
--++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++ 
--++++         attn_output = self.o_proj(attn_output)
--++++-
--+++++        # @lwx
--+++++        
--+++++        # max_seq_len = self.max_position_embeddings  # 2048
--+++++
--+++++        # if attention_mask is not None:
--+++++        #     # attention_mask: [B, 1, Sq, Sk]
--+++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++
--+++++        #     # pad 到 [max_seq_len, max_seq_len]
--+++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++        #     global_attention_mask = padded_mask
--+++++        # else:
--+++++        #     global_attention_mask = None
--+++++
--+++++
--+++++        # sparse_mode=3
--+++++        # attn_output = mindspore.ops.flash_attention_score(
--+++++        #     query=query_states,
--+++++        #     key=key_states,
--+++++        #     value=value_states,
--+++++        #     real_shift=None,
--+++++        #     padding_mask=None,
--+++++
--+++++        #     head_num=self.num_heads,
--+++++        #     attn_mask=global_attention_mask,
--+++++        #     keep_prob=1.0 - self.attention_dropout,
--+++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++        #     input_layout="BNSD",
--+++++        #     pre_tokens=2147483647,
--+++++        #     next_tokens=2147483647,
--+++++        #     inner_precise=0,
--+++++        #     drop_mask=None,
--+++++        #     prefix=None,
--+++++        #     actual_seq_qlen=None,
--+++++        #     actual_seq_kvlen=None,
--+++++        #     sparse_mode=sparse_mode,
--+++++        # )
--++++         if not output_attentions:
--++++             attn_weights = None
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++ 
--+++++class Qwen2MoeFlashAttention(nn.Module):
--+++++    """
--+++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++
--+++++    关键改动:
--+++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++       直接传入原始的 key 和 value 张量效率更高。
--+++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++    """
--+++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++        super().__init__()
--+++++        self.config = config
--+++++        self.layer_idx = layer_idx
--+++++        self.hidden_size = config.hidden_size
--+++++        self.num_heads = config.num_attention_heads
--+++++        self.head_dim = self.hidden_size // self.num_heads
--+++++        self.num_key_value_heads = config.num_key_value_heads
--+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++        self.max_position_embeddings = config.max_position_embeddings
--+++++        self.rope_theta = config.rope_theta
--+++++        self.attention_dropout = config.attention_dropout
--+++++
--+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++            raise ValueError(
--+++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++            )
--+++++
--+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++
--+++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++            self.head_dim,
--+++++            max_position_embeddings=self.max_position_embeddings,
--+++++            base=self.rope_theta,
--+++++        )
--+++++
--+++++    def forward(
--+++++        self,
--+++++        hidden_states: mindspore.Tensor,
--+++++        attention_mask: Optional[mindspore.Tensor] = None,
--+++++        position_ids: Optional[mindspore.Tensor] = None,
--+++++        past_key_value: Optional[Cache] = None,
--+++++        output_attentions: bool = False,
--+++++        use_cache: bool = False,
--+++++        cache_position: Optional[mindspore.Tensor] = None,
--+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++        bsz, q_len, _ = hidden_states.shape
--+++++
--+++++        # 1. 线性投射 Q, K, V
--+++++        query_states = self.q_proj(hidden_states)
--+++++        key_states = self.k_proj(hidden_states)
--+++++        value_states = self.v_proj(hidden_states)
--+++++
--+++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++        # 3. RoPE 旋转位置编码
--+++++        kv_seq_len = key_states.shape[-2]
--+++++        if past_key_value is not None:
--+++++            if self.layer_idx is None:
--+++++                raise ValueError(
--+++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++                    "with a layer index."
--+++++                )
--+++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++                if cache_position.shape[0] == 1:
--+++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++                    kv_seq_len = past_seen_tokens + 1
--+++++                else:
--+++++                    # prefill 阶段：cache_position 是范围，使用其长度
--+++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++            else:
--+++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++        # 4. KV 缓存更新
--+++++        if past_key_value is not None:
--+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++            key_states, value_states = past_key_value.update(
--+++++                key_states, value_states, self.layer_idx, cache_kwargs
--+++++            )
--+++++            
--+++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++                if cache_position.shape[0] == 1:
--+++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++                    kv_seq_len = key_states.shape[-2]
--+++++
--+++++        # 5. [重要] 准备 Attention Mask
--+++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++        fa_attention_mask = None
--+++++        if attention_mask is not None:
--+++++            # 截取与当前key长度匹配的部分
--+++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++            fa_attention_mask = (mask_slice != 0)
--+++++
--+++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++        input_dtype = query_states.dtype
--+++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++            query_states = query_states.to(mindspore.float16)
--+++++            key_states = key_states.to(mindspore.float16)
--+++++            value_states = value_states.to(mindspore.float16)
--+++++
--+++++        # 6. [核心] 调用 flash_attention_score 算子
--+++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++        attn_output = mindspore.ops.flash_attention_score(
--+++++            query=query_states,
--+++++            key=key_states,
--+++++            value=value_states,
--+++++            head_num=self.num_heads, # 传入Q的头数(N1)
--+++++            attn_mask=fa_attention_mask,
--+++++            keep_prob=1.0 - self.attention_dropout,
--+++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++            input_layout="BNSD",
--+++++            sparse_mode=0 # 使用 defaultMask 模式
--+++++        )
--+++++
--+++++        # 恢复原始数据类型
--+++++        attn_output = attn_output.to(input_dtype)
--+++++
--+++++        # 7. 调整输出形状
--+++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++        attn_output = self.o_proj(attn_output)
--+++++
--+++++        # FlashAttention 算子不直接返回注意力权重矩阵
--+++++        attn_weights = None
--+++++        if output_attentions:
--+++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++        return attn_output, attn_weights, past_key_value
--+++++
--+++++    # def forward(
--+++++    #     self,
--+++++    #     hidden_states: mindspore.Tensor,
--+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++    #     past_key_value: Optional[Cache] = None,
--+++++    #     output_attentions: bool = False,
--+++++    #     use_cache: bool = False,
--+++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++    #     bsz, q_len, _ = hidden_states.shape
--+++++
--+++++    #     # 1. 线性投射 Q, K, V
--+++++    #     query_states = self.q_proj(hidden_states)
--+++++    #     key_states = self.k_proj(hidden_states)
--+++++    #     value_states = self.v_proj(hidden_states)
--+++++
--+++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++    #     # 3. RoPE 旋转位置编码
--+++++    #     kv_seq_len = key_states.shape[-2]
--+++++    #     if past_key_value is not None:
--+++++    #         if self.layer_idx is None:
--+++++    #             raise ValueError(
--+++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++    #                 "with a layer index."
--+++++    #             )
--+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++    #     # 4. KV 缓存更新
--+++++    #     if past_key_value is not None:
--+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++    #         key_states, value_states = past_key_value.update(
--+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++    #         )
--+++++
--+++++    #     # 5. 准备 Attention Mask
--+++++    #     fa_attention_mask = None
--+++++    #     if attention_mask is not None:
--+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++    #         fa_attention_mask = (mask_slice != 0)
--+++++
--+++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++    #     input_dtype = query_states.dtype
--+++++
--+++++    #     # 6. [核心] 调用 flash_attention_score 算子
--+++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++    #         query=query_states,
--+++++    #         key=key_states,
--+++++    #         value=value_states,
--+++++    #         head_num=self.num_heads,
--+++++    #         attn_mask=fa_attention_mask,
--+++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++    #         input_layout="BNSD",
--+++++    #         sparse_mode=0,
--+++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++    #         inner_precise=1
--+++++    #     )
--+++++
--+++++    #     # 恢复原始数据类型
--+++++    #     attn_output = attn_output.to(input_dtype)
--+++++
--+++++    #     # 7. 调整输出形状
--+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++    #     attn_output = self.o_proj(attn_output)
--+++++
--+++++    #     attn_weights = None
--+++++    #     if output_attentions:
--+++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++    #     return attn_output, attn_weights, past_key_value
--+++++
--+++++    # def forward(
--+++++    #     self,
--+++++    #     hidden_states: mindspore.Tensor,
--+++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++    #     past_key_value: Optional[Cache] = None,
--+++++    #     output_attentions: bool = False,
--+++++    #     use_cache: bool = False,
--+++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++    #     bsz, q_len, _ = hidden_states.shape
--+++++
--+++++    #     query_states = self.q_proj(hidden_states)
--+++++    #     key_states = self.k_proj(hidden_states)
--+++++    #     value_states = self.v_proj(hidden_states)
--+++++
--+++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++    #     kv_seq_len = key_states.shape[-2]
--+++++    #     if past_key_value is not None:
--+++++    #         if self.layer_idx is None:
--+++++    #             raise ValueError("`layer_idx` must be specified for caching")
--+++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++    #     if past_key_value is not None:
--+++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++    #         key_states, value_states = past_key_value.update(
--+++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++    #         )
--+++++
--+++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++
--+++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++++    #     query_states = query_states / math.sqrt(self.head_dim)
--+++++    #     # <--- 修改结束 ---
--+++++
--+++++    #     fa_attention_mask = None
--+++++    #     if attention_mask is not None:
--+++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++    #         fa_attention_mask = (mask_slice != 0)
--+++++
--+++++    #     input_dtype = query_states.dtype
--+++++
--+++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++    #         query=query_states,    # 传入已经预先缩放过的 query
--+++++    #         key=key_states,
--+++++    #         value=value_states,
--+++++    #         head_num=self.num_heads,
--+++++    #         attn_mask=fa_attention_mask,
--+++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++++    #         input_layout="BNSD",
--+++++    #         sparse_mode=0,
--+++++    #         inner_precise=1        # 仍然保持内部高精度计算
--+++++    #     )
--+++++
--+++++    #     attn_output = attn_output.to(input_dtype)
--+++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++    #     attn_output = self.o_proj(attn_output)
--+++++
--+++++    #     attn_weights = None
--+++++    #     if output_attentions:
--+++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++++
--+++++    #     return attn_output, attn_weights, past_key_value
--+++++
--++++ QWEN2MOE_ATTENTION_CLASSES = {
--++++     "eager": Qwen2MoeAttention,
--+++++    "flash-attention": Qwen2MoeFlashAttention,
--++++ }
--++++ 
--++++ 
--++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++ 
--+++++    #@dwj
--+++++    # 只遍历激活的专家，而非全部专家
--++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        hidden_states = hidden_states.view(-1, hidden_dim)
--++++-        # router_logits: (batch * sequence_length, n_experts)
--++++-        router_logits = self.gate(hidden_states)
--++++-
--++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-        if self.norm_topk_prob:
--++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        # we cast back to the input dtype
--++++-        routing_weights = routing_weights.to(hidden_states.dtype)
--++++-
--++++-        final_hidden_states = ops.zeros(
--++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++++-        )
--++++-
--++++-        # One hot encode the selected experts to create an expert mask
--++++-        # this will be used to easily index which expert is going to be sollicitated
--++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++++-
--++++-        # Loop over all available experts in the model and perform the computation on each expert
--++++-        for expert_idx in range(self.num_experts):
--++++-            expert_layer = self.experts[expert_idx]
--++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++++-
--++++-            # Index the correct hidden states and compute the expert hidden state for
--++++-            # the current expert. We need to make sure to multiply the output hidden
--++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++++-            if 0 not in idx.shape:
--++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++++-
--++++-                # However `index_add_` only support torch tensors for indexing so we'll use
--++++-                # the `top_x` tensor here.
--++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++++-
--++++-        shared_expert_output = self.shared_expert(hidden_states)
--++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++++-
--++++-        final_hidden_states = final_hidden_states + shared_expert_output
--+++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++            num_tokens = hidden_states_reshaped.shape[0]
--+++++
--+++++            router_logits = self.gate(hidden_states_reshaped)
--+++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++
--+++++            if self.norm_topk_prob:
--+++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++            routing_weights = routing_weights.to(hidden_states.dtype)
--+++++            
--+++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++            flat_selected_experts = selected_experts.flatten()
--+++++            
--+++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++            token_indices = broadcasted_token_indices.flatten()
--+++++            
--+++++            active_experts = ops.unique(flat_selected_experts)
--+++++            
--+++++            for expert_idx_tensor in active_experts:
--+++++                expert_idx = expert_idx_tensor.item()
--+++++                expert_layer = self.experts[expert_idx]
--+++++                
--+++++                mask = (flat_selected_experts == expert_idx_tensor)
--+++++                selected_token_indices = token_indices[mask]
--+++++                selected_routing_weights = routing_weights.flatten()[mask]
--+++++                
--+++++                current_states = hidden_states_reshaped[selected_token_indices]
--+++++                
--+++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++                
--+++++                final_hidden_states = final_hidden_states.index_add(
--+++++                    dim=0,
--+++++                    index=selected_token_indices,
--+++++                    source=expert_output.to(hidden_states.dtype)
--+++++                )
--+++++            
--+++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++ 
--++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++-        return final_hidden_states, router_logits
--+++++            final_hidden_states = final_hidden_states + shared_expert_output
--+++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++            
--+++++            return final_hidden_states, router_logits
--++++ 
--++++ 
--++++ class Qwen2MoeDecoderLayer(nn.Module):
--++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++++ 
--++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++ 
--+++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++
--++++         if (layer_idx not in config.mlp_only_layers) and (
--++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++++         ):
--++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++++     _skip_keys_device_placement = "past_key_values"
--++++     _supports_cache_class = True
--+++++#lwx
--+++++    # _supports_static_cache = True
--++++ 
--++++     def _init_weights(self, module):
--++++         std = self.config.initializer_range
--++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++++         return causal_mask
--++++ 
--++++ 
--++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++     _tied_weights_keys = ["lm_head.weight"]
--++++ 
--++++     def __init__(self, config):
--++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++         self.num_experts_per_tok = config.num_experts_per_tok
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--+++++        # @lwx
--+++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++++        #     self.generation_config.cache_implementation = "static"
--+++++        self._warmed_up = False
--+++++
--+++++    def warmup_moe_model(self):
--+++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++++        test_texts = [
--+++++            "warmup short",
--+++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++++        ]
--+++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++        if tokenizer is None:
--+++++            from mindnlp.transformers import AutoTokenizer
--+++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++            self._warmup_tokenizer = tokenizer
--+++++
--+++++        for text in test_texts:
--+++++            inputs = tokenizer(text, return_tensors="ms")
--+++++            with mindspore._no_grad():
--+++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++++ 
--++++     def get_input_embeddings(self):
--++++         return self.model.embed_tokens
--++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++         ```"""
--+++++        if not self._warmed_up:
--+++++            self._warmed_up = True
--+++++            self.warmup_moe_model()
--++++ 
--++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++++         output_router_logits = (
--++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++             }
--++++         )
--++++         return model_inputs
--+++++# @lwx
--+++++    # def _decode_one_tokens_logits(
--+++++    #     self,
--+++++    #     cur_token: mindspore.Tensor,
--+++++    #     input_pos: Optional[mindspore.Tensor],
--+++++    #     cache_position: mindspore.Tensor,
--+++++    #     past_key_values: StaticCache,
--+++++    # ) -> mindspore.Tensor:
--+++++    #     """
--+++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++++        
--+++++    #     Args:
--+++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++++    #         input_pos: 输入位置信息，可选
--+++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++++            
--+++++    #     Returns:
--+++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++++    #     """
--+++++    #     # 调用JIT编译的版本
--+++++    #     return self.get_decode_one_tokens_logits(
--+++++    #         cur_token=cur_token,
--+++++    #         input_pos=input_pos,
--+++++    #         cache_position=cache_position,
--+++++    #         past_key_values=past_key_values,
--+++++    #     )
--+++++    
--+++++    # @mindspore.jit(jit_level='O1')
--+++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++++    #     """
--+++++    #     JIT编译的函数，用于高效的单token解码
--+++++    #     使用JIT编译优化以支持静态shape和高效执行
--+++++        
--+++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++++    #     """
--+++++    #     outputs = self.model.forward(
--+++++    #         input_ids=cur_token,
--+++++    #         position_ids=input_pos,
--+++++    #         cache_position=cache_position,
--+++++    #         past_key_values=past_key_values,
--+++++    #         use_cache=True,
--+++++    #         return_dict=False,
--+++++    #     )
--+++++        
--+++++    #     hidden_states = outputs[0]
--+++++    #     logits = self.lm_head.forward(hidden_states)
--+++++    #     logits = logits.float()
--+++++        
--+++++    #     return logits[:, -1, :]
--+++++
--+++++    # def _sample(
--+++++    #     self,
--+++++    #     input_ids: mindspore.Tensor,
--+++++    #     logits_processor,
--+++++    #     stopping_criteria,
--+++++    #     generation_config,
--+++++    #     synced_devices: bool,
--+++++    #     streamer=None,
--+++++    #     logits_warper=None,
--+++++    #     **model_kwargs,
--+++++    # ):
--+++++    #     """
--+++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++++    #     """
--+++++    #     from ...generation.logits_process import LogitsProcessorList
--+++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++++    #     from mindnlp.core import nn, ops, no_grad
--+++++    #     import numpy as np
--+++++        
--+++++    #     # 检查是否使用 StaticCache
--+++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++++    #     # 否则，直接调用父类方法
--+++++    #     past_key_values = model_kwargs.get("past_key_values")
--+++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++++
--+++++    #     if not isinstance(past_key_values, StaticCache):
--+++++    #         # 不使用 StaticCache，直接调用父类方法
--+++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++++    #         return super()._sample(
--+++++    #             input_ids=input_ids,
--+++++    #             logits_processor=logits_processor,
--+++++    #             stopping_criteria=stopping_criteria,
--+++++    #             generation_config=generation_config,
--+++++    #             synced_devices=synced_devices,
--+++++    #             streamer=streamer,
--+++++    #             logits_warper=logits_warper,
--+++++    #             **model_kwargs,
--+++++    #         )
--+++++        
--+++++    #     # 使用 StaticCache，进入自定义循环
--+++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++++    #     pad_token_id = generation_config._pad_token_tensor
--+++++    #     output_attentions = generation_config.output_attentions
--+++++    #     output_hidden_states = generation_config.output_hidden_states
--+++++    #     output_scores = generation_config.output_scores
--+++++    #     output_logits = generation_config.output_logits
--+++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++++    #     max_length = generation_config.max_length
--+++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++++    #     do_sample = generation_config.do_sample
--+++++        
--+++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++++    #         raise ValueError(
--+++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++++    #             f"{logits_warper})."
--+++++    #         )
--+++++        
--+++++    #     # init attention / hidden states / scores tuples
--+++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++++        
--+++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++++    #         encoder_hidden_states = (
--+++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++++    #         )
--+++++        
--+++++    #     # keep track of which sequences are already finished
--+++++    #     batch_size, cur_len = input_ids.shape
--+++++    #     this_peer_finished = False
--+++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++++        
--+++++    #     time_record = []
--+++++    #     from ....utils.testing_utils import parse_flag_from_env
--+++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++++        
--+++++    #     while self._has_unfinished_sequences(
--+++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++++    #     ):
--+++++    #         if _record_time:
--+++++    #             import time as time_module
--+++++    #             infer_start = time_module.time()
--+++++            
--+++++    #         # prepare model inputs
--+++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++++            
--+++++    #         # prepare variable output controls
--+++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++++            
--+++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++++    #         cur_cache_position = model_inputs.get("cache_position")
--+++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++++    #         cur_input_ids = model_inputs.get("input_ids")
--+++++            
--+++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++++    #             cur_cache_position is not None and 
--+++++    #             len(cur_cache_position.shape) > 0 and
--+++++    #             cur_cache_position.shape[0] == 1 and
--+++++    #             cur_input_ids is not None and
--+++++    #             cur_input_ids.shape[1] == 1):
--+++++    #             # 使用 JIT 优化的单 token 解码
--+++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++++    #             if not hasattr(self, '_jit_used'):
--+++++    #                 self._jit_used = False
--+++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++++                
--+++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++++    #                 cur_token=cur_input_ids,
--+++++    #                 input_pos=model_inputs.get("position_ids"),
--+++++    #                 cache_position=cur_cache_position,
--+++++    #                 past_key_values=cur_past_key_values,
--+++++    #             )
--+++++                
--+++++    #             # 标记已使用JIT（用于后续判断）
--+++++    #             if not self._jit_used:
--+++++    #                 self._jit_used = True
--+++++                
--+++++    #             # 构造兼容的输出对象
--+++++    #             class JitOptimizedOutput:
--+++++    #                 def __init__(self, logits, config):
--+++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++++    #                     self.config = config
--+++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++++    #                     self.cross_attentions = None
--+++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++++                
--+++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++++    #         else:
--+++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++++    #             outputs = self(**model_inputs, return_dict=True)
--+++++            
--+++++    #         if synced_devices and this_peer_finished:
--+++++    #             continue
--+++++            
--+++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++++    #         next_token_logits = outputs.logits[:, -1, :]
--+++++            
--+++++    #         # pre-process distribution
--+++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++++    #         if do_sample:
--+++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++++            
--+++++    #         # Store scores, attentions and hidden_states when required
--+++++    #         if return_dict_in_generate:
--+++++    #             if output_scores:
--+++++    #                 scores += (next_token_scores,)
--+++++    #             if output_logits:
--+++++    #                 raw_logits += (next_token_logits,)
--+++++    #             if output_attentions:
--+++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++++    #                 if self.config.is_encoder_decoder:
--+++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++++                
--+++++    #             if output_hidden_states:
--+++++    #                 hidden = (
--+++++    #                     outputs.decoder_hidden_states
--+++++    #                     if self.config.is_encoder_decoder
--+++++    #                     else outputs.hidden_states
--+++++    #                 )
--+++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++++            
--+++++    #         # token selection
--+++++    #         if do_sample:
--+++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++++    #         else:
--+++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++++            
--+++++    #         # finished sentences should have their next token be a padding token
--+++++    #         if has_eos_stopping_criteria:
--+++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++++            
--+++++    #         # update generated ids, model inputs, and length for next step
--+++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++++    #         if streamer is not None:
--+++++    #             streamer.put(next_tokens)
--+++++            
--+++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++++    #             outputs,
--+++++    #             model_kwargs,
--+++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++++    #         )
--+++++            
--+++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++++    #         cur_len += 1
--+++++            
--+++++    #         if _record_time:
--+++++    #             import time as time_module
--+++++    #             infer_stop = time_module.time()
--+++++    #             time_record.append(infer_stop - infer_start)
--+++++            
--+++++    #         del outputs
--+++++        
--+++++    #     average_infer_time = None
--+++++    #     if time_record:
--+++++    #         if len(time_record) > 1:
--+++++    #             time_record.pop(0)
--+++++    #         average_infer_time = sum(time_record) / len(time_record)
--+++++    #         print(f'average inference time is: {average_infer_time}')
--+++++    #         print(f'inference time record: {time_record}')
--+++++        
--+++++    #     if streamer is not None:
--+++++    #         streamer.end()
--+++++        
--+++++    #     # 简单判断：打印是否使用了JIT路径
--+++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++++    #     else:
--+++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++++        
--+++++    #     if return_dict_in_generate:
--+++++    #         if self.config.is_encoder_decoder:
--+++++    #             return GenerateEncoderDecoderOutput(
--+++++    #                 sequences=input_ids,
--+++++    #                 scores=scores,
--+++++    #                 logits=raw_logits,
--+++++    #                 encoder_attentions=encoder_attentions,
--+++++    #                 encoder_hidden_states=encoder_hidden_states,
--+++++    #                 decoder_attentions=decoder_attentions,
--+++++    #                 cross_attentions=cross_attentions,
--+++++    #                 decoder_hidden_states=decoder_hidden_states,
--+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++    #                 average_infer_time=average_infer_time
--+++++    #             )
--+++++    #         else:
--+++++    #             return GenerateDecoderOnlyOutput(
--+++++    #                 sequences=input_ids,
--+++++    #                 scores=scores,
--+++++    #                 logits=raw_logits,
--+++++    #                 attentions=decoder_attentions,
--+++++    #                 hidden_states=decoder_hidden_states,
--+++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++    #                 average_infer_time=average_infer_time
--+++++    #             )
--+++++    #     else:
--+++++    #         return input_ids
--+++++            
--+++++    # def _prepare_cache_for_generation(
--+++++    #     self,
--+++++    #     generation_config,
--+++++    #     model_kwargs,
--+++++    #     assistant_model,
--+++++    #     batch_size,
--+++++    #     max_cache_length,
--+++++    # ):
--+++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++++    #         generation_config.cache_implementation = "static"
--+++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++++        
--+++++    #     if generation_config.cache_implementation == "static":
--+++++    #         base_required_from_max_length = generation_config.max_length + 1
--+++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++++    #         min_cache_size = 50
--+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++++    #         else:
--+++++    #             max_cache_length = max(base_required, min_cache_size)
--+++++            
--+++++    #         original_max_cache_length = max_cache_length
--+++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++++            
--+++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++    #             if max_cache_length > self.config.max_position_embeddings:
--+++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++++        
--+++++    #     result = super()._prepare_cache_for_generation(
--+++++    #         generation_config=generation_config,
--+++++    #         model_kwargs=model_kwargs,
--+++++    #         assistant_model=assistant_model,
--+++++    #         batch_size=batch_size,
--+++++    #         max_cache_length=max_cache_length,
--+++++    #     )
--+++++        
--+++++    #     if generation_config.cache_implementation == "static":
--+++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++++    #         created_cache = model_kwargs.get(cache_name)
--+++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++++    #             if created_cache.max_cache_len < generation_config.max_length:
--+++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++++        
--+++++    #     return result
--+++++
--+++++
--+++++
--++++ 
--++++ 
--++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++++-- 
--++++2.27.0
--++++
--+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+++new file mode 100644
--+++index 00000000..22b65dd5
--+++--- /dev/null
--++++++ b/patches/0002-20251106commit.patch
--+++@@ -0,0 +1,3200 @@
--++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--++++From: Pinoeer-kingxi <13022943007@163.com>
--++++Date: Thu, 6 Nov 2025 09:20:38 +0800
--++++Subject: [PATCH 2/3] 20251106commit
--++++
--++++---
--++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
--++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
--++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
--++++ create mode 100644 patches/0001-20251104commit.patch
--++++
--++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++index d8303e45..73773c22 100644
--++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--++++         #     y = y + self.shared_experts(identity)
--++++         # return y
--++++         
--+++++    # @no_grad()
--+++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++
--+++++    #     expert_cache = ops.zeros_like(x)
--+++++    #     for i in range(self.num_experts_per_tok):
--+++++    #         expert_id = flat_expert_indices[i].item()
--+++++    #         weight = flat_expert_weights[i].item()
--+++++    #         expert = self.experts[expert_id]
--+++++    #         expert_out = expert(x)
--+++++    #         expert_cache += expert_out * weight
--+++++    #     return expert_cache
--+++++
--++++     @no_grad()
--++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++        # x 的 shape: (1, hidden_size)
--+++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+++++
--+++++        # 1. 收集所有需要的专家层
--+++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+++++
--+++++        # 2. 并行计算所有专家的输出
--+++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+++++        # ops.cat 会将它们堆叠成一个新的 Tensor
--+++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+++++
--+++++        # 3. 使用矩阵乘法进行加权求和
--+++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++++        # 最终结果 final_output 的 shape: (1, hidden_size)
--+++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+++++        
--+++++        return final_output
--++++ 
--++++-        expert_cache = ops.zeros_like(x)
--++++-        for i in range(self.num_experts_per_tok):
--++++-            expert_id = flat_expert_indices[i].item()
--++++-            weight = flat_expert_weights[i].item()
--++++-            expert = self.experts[expert_id]
--++++-            expert_out = expert(x)
--++++-            expert_cache += expert_out * weight
--++++-        return expert_cache
--++++ 
--++++     @no_grad()
--++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--++++             key_states = self.k_proj(hidden_states)
--++++             value_states = self.v_proj(hidden_states)
--++++ 
--++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++        # @lwx
--+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--+++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--+++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--++++ 
--++++         kv_seq_len = key_states.shape[-2]
--++++         if past_key_value is not None:
--++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++ 
--+++++# class DeepseekFlashAttention(nn.Module):
--+++++#     """
--+++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--+++++
--+++++#     This class is designed as a drop-in replacement for DeepseekAttention.
--+++++#     """
--+++++
--+++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+++++#         super().__init__()
--+++++#         self.config = config
--+++++#         self.layer_idx = layer_idx
--+++++#         if layer_idx is None:
--+++++#             logger.warning(
--+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++++#                 "when creating this class."
--+++++#             )
--+++++
--+++++#         self.attention_dropout = config.attention_dropout
--+++++#         self.hidden_size = config.hidden_size
--+++++#         self.num_heads = config.num_attention_heads
--+++++#         self.head_dim = self.hidden_size // self.num_heads
--+++++#         self.num_key_value_heads = config.num_key_value_heads
--+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++#         self.max_position_embeddings = config.max_position_embeddings
--+++++#         self.rope_theta = config.rope_theta
--+++++#         self.is_causal = True
--+++++
--+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++#             raise ValueError(
--+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++++#                 f" and `num_heads`: {self.num_heads})."
--+++++#             )
--+++++
--+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+++++#         self._init_rope()
--+++++
--+++++#     def _init_rope(self):
--+++++#         if self.config.rope_scaling is None:
--+++++#             self.rotary_emb = DeepseekRotaryEmbedding(
--+++++#                 self.head_dim,
--+++++#                 max_position_embeddings=self.max_position_embeddings,
--+++++#                 base=self.rope_theta,
--+++++#             )
--+++++#         else:
--+++++#             scaling_type = self.config.rope_scaling["type"]
--+++++#             scaling_factor = self.config.rope_scaling["factor"]
--+++++#             if scaling_type == "linear":
--+++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+++++#                     self.head_dim,
--+++++#                     max_position_embeddings=self.max_position_embeddings,
--+++++#                     scaling_factor=scaling_factor,
--+++++#                     base=self.rope_theta,
--+++++#                 )
--+++++#             elif scaling_type == "dynamic":
--+++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+++++#                     self.head_dim,
--+++++#                     max_position_embeddings=self.max_position_embeddings,
--+++++#                     scaling_factor=scaling_factor,
--+++++#                     base=self.rope_theta,
--+++++#                 )
--+++++#             else:
--+++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+++++
--+++++#     def forward(
--+++++#         self,
--+++++#         hidden_states: mindspore.Tensor,
--+++++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++++#         position_ids: Optional[mindspore.Tensor] = None,
--+++++#         past_key_value: Optional[Cache] = None,
--+++++#         output_attentions: bool = False,
--+++++#         use_cache: bool = False,
--+++++#         **kwargs,
--+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++#         if "padding_mask" in kwargs:
--+++++#             warnings.warn(
--+++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+++++#             )
--+++++        
--+++++#         if output_attentions:
--+++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--+++++
--+++++#         bsz, q_len, _ = hidden_states.shape
--+++++
--+++++#         if self.config.pretraining_tp > 1:
--+++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+++++
--+++++#         query_states = self.q_proj(hidden_states)
--+++++#         key_states = self.k_proj(hidden_states)
--+++++#         value_states = self.v_proj(hidden_states)
--+++++
--+++++#         # Reshape for multi-head attention
--+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++#         kv_seq_len = key_states.shape[-2]
--+++++#         if past_key_value is not None:
--+++++#             if self.layer_idx is None:
--+++++#                 raise ValueError(
--+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++#                     "with a layer index."
--+++++#                 )
--+++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++        
--+++++#         # Apply Rotary Positional Embedding
--+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++#         if past_key_value is not None:
--+++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++
--+++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--+++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--+++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++        
--+++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++++        
--+++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--+++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--+++++
--+++++#         # Convert attention_mask for flash_attention_score
--+++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--+++++#         if attention_mask is not None:
--+++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--+++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+++++#                 raise ValueError(
--+++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+++++#                 )
--+++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--+++++#         else:
--+++++#             attn_mask_for_fa = None
--+++++        
--+++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+++++
--+++++#         # Call the fused flash_attention_score operator
--+++++#         attn_output = mindspore.ops.flash_attention_score(
--+++++#             query=query_states_for_fa,
--+++++#             key=key_states_for_fa,
--+++++#             value=value_states_for_fa,
--+++++#             head_num=self.num_heads, # This is N1, the number of query heads
--+++++#             input_layout='BSH',
--+++++#             attn_mask=attn_mask_for_fa,
--+++++#             keep_prob=keep_prob,
--+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++#             sparse_mode=0 # Default mask mode
--+++++#         )
--+++++        
--+++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--+++++#         attn_output = self.o_proj(attn_output)
--+++++        
--+++++#         # Flash Attention does not return attention weights
--+++++#         attn_weights = None
--+++++
--+++++#         return attn_output, attn_weights, past_key_value
--+++++
--+++++class DeepseekFlashAttention(nn.Module):
--+++++    """
--+++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--+++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
--+++++    designed for high performance on supported hardware (Ascend).
--+++++
--+++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--+++++    """
--+++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--+++++        super().__init__()
--+++++        self.config = config
--+++++        self.layer_idx = layer_idx
--+++++        if layer_idx is None:
--+++++            logger.warning(
--+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++++                "when creating this class."
--+++++            )
--+++++
--+++++        # --- [FIX] Correctly initialize all required attributes ---
--+++++        self.attention_dropout = config.attention_dropout
--+++++        self.hidden_size = config.hidden_size
--+++++        self.num_heads = config.num_attention_heads
--+++++        self.head_dim = self.hidden_size // self.num_heads
--+++++        self.num_key_value_heads = config.num_key_value_heads
--+++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++        self.max_position_embeddings = config.max_position_embeddings
--+++++        self.rope_theta = config.rope_theta
--+++++        self.is_causal = True
--+++++
--+++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++            raise ValueError(
--+++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++++                f" and `num_heads`: {self.num_heads})."
--+++++            )
--+++++
--+++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--+++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--+++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--+++++        
--+++++        # This call will now succeed as all attributes are initialized.
--+++++        self._init_rope()
--+++++
--+++++    def _init_rope(self):
--+++++        if self.config.rope_scaling is None:
--+++++            self.rotary_emb = DeepseekRotaryEmbedding(
--+++++                self.head_dim,
--+++++                max_position_embeddings=self.max_position_embeddings,
--+++++                base=self.rope_theta,
--+++++            )
--+++++        else:
--+++++            scaling_type = self.config.rope_scaling["type"]
--+++++            scaling_factor = self.config.rope_scaling["factor"]
--+++++            if scaling_type == "linear":
--+++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--+++++                    self.head_dim,
--+++++                    max_position_embeddings=self.max_position_embeddings,
--+++++                    scaling_factor=scaling_factor,
--+++++                    base=self.rope_theta,
--+++++                )
--+++++            elif scaling_type == "dynamic":
--+++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--+++++                    self.head_dim,
--+++++                    max_position_embeddings=self.max_position_embeddings,
--+++++                    scaling_factor=scaling_factor,
--+++++                    base=self.rope_theta,
--+++++                )
--+++++            else:
--+++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--+++++
--+++++    def forward(
--+++++        self,
--+++++        hidden_states: mindspore.Tensor,
--+++++        attention_mask: Optional[mindspore.Tensor] = None,
--+++++        position_ids: Optional[mindspore.Tensor] = None,
--+++++        past_key_value: Optional[Cache] = None,
--+++++        output_attentions: bool = False,
--+++++        use_cache: bool = False,
--+++++        **kwargs,
--+++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++        if "padding_mask" in kwargs:
--+++++            warnings.warn(
--+++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--+++++            )
--+++++        if output_attentions:
--+++++            warnings.warn(
--+++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--+++++            )
--+++++
--+++++        bsz, q_len, _ = hidden_states.shape
--+++++
--+++++        if self.config.pretraining_tp > 1:
--+++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--+++++
--+++++        query_states = self.q_proj(hidden_states)
--+++++        key_states = self.k_proj(hidden_states)
--+++++        value_states = self.v_proj(hidden_states)
--+++++
--+++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++        kv_seq_len = key_states.shape[-2]
--+++++        if past_key_value is not None:
--+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++        
--+++++        # Apply Rotary Position Embedding
--+++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++        if past_key_value is not None:
--+++++            cache_kwargs = {"sin": sin, "cos": cos}
--+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++
--+++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--+++++        # So we must explicitly repeat the KV heads.
--+++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++
--+++++        # Convert attention mask for flash_attention_score
--+++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--+++++        if attention_mask is not None:
--+++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--+++++                 raise ValueError(
--+++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--+++++                )
--+++++            attn_mask_for_fa = attention_mask < 0
--+++++        else:
--+++++            attn_mask_for_fa = None
--+++++
--+++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--+++++
--+++++        # Call the fused operator using the efficient BNSD layout
--+++++        attn_output = mindspore.ops.flash_attention_score(
--+++++            query=query_states,
--+++++            key=key_states,
--+++++            value=value_states,
--+++++            head_num=self.num_heads,
--+++++            input_layout='BNSD', # Specify the correct layout
--+++++            attn_mask=attn_mask_for_fa,
--+++++            keep_prob=keep_prob,
--+++++            scalar_value=1.0 / math.sqrt(self.head_dim)
--+++++        )
--+++++        
--+++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--+++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++        
--+++++        # Apply output projection
--+++++        attn_output = self.o_proj(attn_output)
--+++++
--+++++        # Flash attention does not return attention weights, so we return None.
--+++++        attn_weights = None
--+++++
--+++++        return attn_output, attn_weights, past_key_value
--+++++
--++++ Deepseek_ATTENTION_CLASSES = {
--++++     "eager": DeepseekAttention,
--+++++    "flash-attention": DeepseekFlashAttention,
--++++ }
--++++ 
--++++ 
--++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--++++             config=config, layer_idx=layer_idx
--++++         )
--++++ 
--+++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--+++++            config=config, layer_idx=layer_idx
--+++++        )
--+++++
--++++         self.mlp = (
--++++             DeepseekMoE(config)
--++++             if (
--++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++index d4c6b651..bced285c 100644
--++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
--++++ 
--++++ import mindspore
--++++ import mindnlp.core.nn.functional as F
--++++-from mindnlp.core import nn, ops
--+++++from mindnlp.core import nn, ops, no_grad
--++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
--++++ 
--++++ from ....common.activations import ACT2FN
--++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
--++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--++++ 
--+++++Long_Prompt = False
--+++++PROMPT_LENGTH_THRESHOLD = 128
--++++ 
--++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--++++ def _prepare_4d_causal_attention_mask_with_cache_position(
--++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++ 
--+++++# class Qwen2MoeFlashAttention(nn.Module):
--+++++#     """
--+++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++
--+++++#     关键改动:
--+++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++#        直接传入原始的 key 和 value 张量效率更高。
--+++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++#         super().__init__()
--+++++#         self.config = config
--+++++#         self.layer_idx = layer_idx
--+++++#         self.hidden_size = config.hidden_size
--+++++#         self.num_heads = config.num_attention_heads
--+++++#         self.head_dim = self.hidden_size // self.num_heads
--+++++#         self.num_key_value_heads = config.num_key_value_heads
--+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++#         self.max_position_embeddings = config.max_position_embeddings
--+++++#         self.rope_theta = config.rope_theta
--+++++#         self.attention_dropout = config.attention_dropout
--+++++
--+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++#             raise ValueError(
--+++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++#             )
--+++++
--+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++
--+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++#             self.head_dim,
--+++++#             max_position_embeddings=self.max_position_embeddings,
--+++++#             base=self.rope_theta,
--+++++#         )
--+++++
--+++++#     def forward(
--+++++#         self,
--+++++#         hidden_states: mindspore.Tensor,
--+++++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++++#         position_ids: Optional[mindspore.Tensor] = None,
--+++++#         past_key_value: Optional[Cache] = None,
--+++++#         output_attentions: bool = False,
--+++++#         use_cache: bool = False,
--+++++#         cache_position: Optional[mindspore.Tensor] = None,
--+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++#         bsz, q_len, _ = hidden_states.shape
--+++++
--+++++#         # 1. 线性投射 Q, K, V
--+++++#         query_states = self.q_proj(hidden_states)
--+++++#         key_states = self.k_proj(hidden_states)
--+++++#         value_states = self.v_proj(hidden_states)
--+++++
--+++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++#         # 3. RoPE 旋转位置编码
--+++++#         kv_seq_len = key_states.shape[-2]
--+++++#         if past_key_value is not None:
--+++++#             if self.layer_idx is None:
--+++++#                 raise ValueError(
--+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++#                     "with a layer index."
--+++++#                 )
--+++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++#                 if cache_position.shape[0] == 1:
--+++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++#                     kv_seq_len = past_seen_tokens + 1
--+++++#                 else:
--+++++#                     # prefill 阶段：cache_position 是范围，使用其长度
--+++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++#             else:
--+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++#         # 4. KV 缓存更新
--+++++#         if past_key_value is not None:
--+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++#             key_states, value_states = past_key_value.update(
--+++++#                 key_states, value_states, self.layer_idx, cache_kwargs
--+++++#             )
--+++++            
--+++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++#                 if cache_position.shape[0] == 1:
--+++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++#                     kv_seq_len = key_states.shape[-2]
--+++++
--+++++#         # 5. [重要] 准备 Attention Mask
--+++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++#         fa_attention_mask = None
--+++++#         if attention_mask is not None:
--+++++#             # 截取与当前key长度匹配的部分
--+++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++#             fa_attention_mask = (mask_slice != 0)
--+++++
--+++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++#         input_dtype = query_states.dtype
--+++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++#             query_states = query_states.to(mindspore.float16)
--+++++#             key_states = key_states.to(mindspore.float16)
--+++++#             value_states = value_states.to(mindspore.float16)
--+++++
--+++++#         # 6. [核心] 调用 flash_attention_score 算子
--+++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++#         attn_output = mindspore.ops.flash_attention_score(
--+++++#             query=query_states,
--+++++#             key=key_states,
--+++++#             value=value_states,
--+++++#             head_num=self.num_heads, # 传入Q的头数(N1)
--+++++#             attn_mask=fa_attention_mask,
--+++++#             keep_prob=1.0 - self.attention_dropout,
--+++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++#             input_layout="BNSD",
--+++++#             sparse_mode=0 # 使用 defaultMask 模式
--+++++#         )
--+++++
--+++++#         # 恢复原始数据类型
--+++++#         attn_output = attn_output.to(input_dtype)
--+++++
--+++++#         # 7. 调整输出形状
--+++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++#         attn_output = self.o_proj(attn_output)
--+++++
--+++++#         # FlashAttention 算子不直接返回注意力权重矩阵
--+++++#         attn_weights = None
--+++++#         if output_attentions:
--+++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++#         return attn_output, attn_weights, past_key_value
--+++++
--+++++#     # def forward(
--+++++#     #     self,
--+++++#     #     hidden_states: mindspore.Tensor,
--+++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++#     #     position_ids: Optional[mindspore.Tensor] = None,
--+++++#     #     past_key_value: Optional[Cache] = None,
--+++++#     #     output_attentions: bool = False,
--+++++#     #     use_cache: bool = False,
--+++++#     #     cache_position: Optional[mindspore.Tensor] = None,
--+++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++#     #     bsz, q_len, _ = hidden_states.shape
--+++++
--+++++#     #     # 1. 线性投射 Q, K, V
--+++++#     #     query_states = self.q_proj(hidden_states)
--+++++#     #     key_states = self.k_proj(hidden_states)
--+++++#     #     value_states = self.v_proj(hidden_states)
--+++++
--+++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++
--+++++#     #     # 3. RoPE 旋转位置编码
--+++++#     #     kv_seq_len = key_states.shape[-2]
--+++++#     #     if past_key_value is not None:
--+++++#     #         if self.layer_idx is None:
--+++++#     #             raise ValueError(
--+++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++#     #                 "with a layer index."
--+++++#     #             )
--+++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++
--+++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++#     #     # 4. KV 缓存更新
--+++++#     #     if past_key_value is not None:
--+++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++#     #         key_states, value_states = past_key_value.update(
--+++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++#     #         )
--+++++
--+++++#     #     # 5. 准备 Attention Mask
--+++++#     #     fa_attention_mask = None
--+++++#     #     if attention_mask is not None:
--+++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++#     #         fa_attention_mask = (mask_slice != 0)
--+++++
--+++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++#     #     input_dtype = query_states.dtype
--+++++
--+++++#     #     # 6. [核心] 调用 flash_attention_score 算子
--+++++#     #     attn_output = mindspore.ops.flash_attention_score(
--+++++#     #         query=query_states,
--+++++#     #         key=key_states,
--+++++#     #         value=value_states,
--+++++#     #         head_num=self.num_heads,
--+++++#     #         attn_mask=fa_attention_mask,
--+++++#     #         keep_prob=1.0 - self.attention_dropout,
--+++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++#     #         input_layout="BNSD",
--+++++#     #         sparse_mode=0,
--+++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++#     #         inner_precise=1
--+++++#     #     )
--+++++
--+++++#     #     # 恢复原始数据类型
--+++++#     #     attn_output = attn_output.to(input_dtype)
--+++++
--+++++#     #     # 7. 调整输出形状
--+++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++#     #     attn_output = self.o_proj(attn_output)
--+++++
--+++++#     #     attn_weights = None
--+++++#     #     if output_attentions:
--+++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++
--+++++#     #     return attn_output, attn_weights, past_key_value
--+++++
--+++++
--++++ class Qwen2MoeFlashAttention(nn.Module):
--++++     """
--++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++-
--++++-    关键改动:
--++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++-       直接传入原始的 key 和 value 张量效率更高。
--++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--+++++
--+++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--+++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--+++++    完全使用模型的低精度数据类型（如 float16）进行计算，
--+++++    以达到理论上的最高执行速度。
--++++     """
--++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++         super().__init__()
--++++         self.config = config
--++++         self.layer_idx = layer_idx
--+++++        if layer_idx is None:
--+++++            logger.warning_once(
--+++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--+++++            )
--+++++
--++++         self.hidden_size = config.hidden_size
--++++         self.num_heads = config.num_attention_heads
--++++         self.head_dim = self.hidden_size // self.num_heads
--++++         self.num_key_value_heads = config.num_key_value_heads
--++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++         self.max_position_embeddings = config.max_position_embeddings
--++++         self.rope_theta = config.rope_theta
--++++         self.attention_dropout = config.attention_dropout
--++++ 
--++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++-            raise ValueError(
--++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++-            )
--++++-
--++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--++++         key_states = self.k_proj(hidden_states)
--++++         value_states = self.v_proj(hidden_states)
--++++ 
--++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++        # 2. 调整形状以匹配 BNSD 布局
--++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-
--++++-        # 3. RoPE 旋转位置编码
--+++++        
--+++++        # 3. RoPE 和 KV 缓存
--++++         kv_seq_len = key_states.shape[-2]
--++++         if past_key_value is not None:
--++++-            if self.layer_idx is None:
--++++-                raise ValueError(
--++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-                    "with a layer index."
--++++-                )
--++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++-                if cache_position.shape[0] == 1:
--++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++-                    kv_seq_len = past_seen_tokens + 1
--++++-                else:
--++++-                    # prefill 阶段：cache_position 是范围，使用其长度
--++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++-            else:
--++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-
--+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++        
--++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++ 
--++++-        # 4. KV 缓存更新
--++++         if past_key_value is not None:
--++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-            key_states, value_states = past_key_value.update(
--++++-                key_states, value_states, self.layer_idx, cache_kwargs
--++++-            )
--++++-            
--++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++-                if cache_position.shape[0] == 1:
--++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++-                    kv_seq_len = key_states.shape[-2]
--++++-
--++++-        # 5. [重要] 准备 Attention Mask
--++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++
--+++++        # 4. 准备 Attention Mask
--++++         fa_attention_mask = None
--++++         if attention_mask is not None:
--++++-            # 截取与当前key长度匹配的部分
--++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++             fa_attention_mask = (mask_slice != 0)
--++++ 
--++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++-        input_dtype = query_states.dtype
--++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++-            query_states = query_states.to(mindspore.float16)
--++++-            key_states = key_states.to(mindspore.float16)
--++++-            value_states = value_states.to(mindspore.float16)
--++++-
--++++-        # 6. [核心] 调用 flash_attention_score 算子
--++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--++++         attn_output = mindspore.ops.flash_attention_score(
--++++             query=query_states,
--++++             key=key_states,
--++++             value=value_states,
--++++-            head_num=self.num_heads, # 传入Q的头数(N1)
--+++++            head_num=self.num_heads,
--++++             attn_mask=fa_attention_mask,
--++++-            keep_prob=1.0 - self.attention_dropout,
--+++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--++++             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++             input_layout="BNSD",
--++++-            sparse_mode=0 # 使用 defaultMask 模式
--+++++            sparse_mode=0,
--+++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--++++         )
--++++ 
--++++-        # 恢复原始数据类型
--++++-        attn_output = attn_output.to(input_dtype)
--++++-
--++++-        # 7. 调整输出形状
--++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++        # 6. 调整输出形状
--++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++         attn_output = self.o_proj(attn_output)
--++++ 
--++++-        # FlashAttention 算子不直接返回注意力权重矩阵
--+++++        # 7. 返回结果
--++++         attn_weights = None
--++++         if output_attentions:
--++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++-    # def forward(
--++++-    #     self,
--++++-    #     hidden_states: mindspore.Tensor,
--++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++-    #     position_ids: Optional[mindspore.Tensor] = None,
--++++-    #     past_key_value: Optional[Cache] = None,
--++++-    #     output_attentions: bool = False,
--++++-    #     use_cache: bool = False,
--++++-    #     cache_position: Optional[mindspore.Tensor] = None,
--++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-
--++++-    #     bsz, q_len, _ = hidden_states.shape
--++++-
--++++-    #     # 1. 线性投射 Q, K, V
--++++-    #     query_states = self.q_proj(hidden_states)
--++++-    #     key_states = self.k_proj(hidden_states)
--++++-    #     value_states = self.v_proj(hidden_states)
--++++-
--++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-
--++++-    #     # 3. RoPE 旋转位置编码
--++++-    #     kv_seq_len = key_states.shape[-2]
--++++-    #     if past_key_value is not None:
--++++-    #         if self.layer_idx is None:
--++++-    #             raise ValueError(
--++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-    #                 "with a layer index."
--++++-    #             )
--++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++ 
--++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-
--++++-    #     # 4. KV 缓存更新
--++++-    #     if past_key_value is not None:
--++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-    #         key_states, value_states = past_key_value.update(
--++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++-    #         )
--++++-
--++++-    #     # 5. 准备 Attention Mask
--++++-    #     fa_attention_mask = None
--++++-    #     if attention_mask is not None:
--++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-    #         fa_attention_mask = (mask_slice != 0)
--++++-
--++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++-    #     input_dtype = query_states.dtype
--++++-
--++++-    #     # 6. [核心] 调用 flash_attention_score 算子
--++++-    #     attn_output = mindspore.ops.flash_attention_score(
--++++-    #         query=query_states,
--++++-    #         key=key_states,
--++++-    #         value=value_states,
--++++-    #         head_num=self.num_heads,
--++++-    #         attn_mask=fa_attention_mask,
--++++-    #         keep_prob=1.0 - self.attention_dropout,
--++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-    #         input_layout="BNSD",
--++++-    #         sparse_mode=0,
--++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++-    #         inner_precise=1
--++++-    #     )
--++++-
--++++-    #     # 恢复原始数据类型
--++++-    #     attn_output = attn_output.to(input_dtype)
--+++++QWEN2MOE_ATTENTION_CLASSES = {
--+++++    "eager": Qwen2MoeAttention,
--+++++    "flash-attention": Qwen2MoeFlashAttention,
--+++++}
--++++ 
--++++-    #     # 7. 调整输出形状
--++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-    #     attn_output = self.o_proj(attn_output)
--++++ 
--++++-    #     attn_weights = None
--++++-    #     if output_attentions:
--++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     def __init__(self, config):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         # gating
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     #@dwj
--+++++#     # 只遍历激活的专家，而非全部专家
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#             num_tokens = hidden_states_reshaped.shape[0]
--+++++
--+++++#             router_logits = self.gate(hidden_states_reshaped)
--+++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++
--+++++#             if self.norm_topk_prob:
--+++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++#             routing_weights = routing_weights.to(hidden_states.dtype)
--+++++            
--+++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++#             flat_selected_experts = selected_experts.flatten()
--+++++            
--+++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++#             token_indices = broadcasted_token_indices.flatten()
--+++++            
--+++++#             active_experts = ops.unique(flat_selected_experts)
--+++++            
--+++++#             for expert_idx_tensor in active_experts:
--+++++#                 expert_idx = expert_idx_tensor.item()
--+++++#                 expert_layer = self.experts[expert_idx]
--+++++                
--+++++#                 mask = (flat_selected_experts == expert_idx_tensor)
--+++++#                 selected_token_indices = token_indices[mask]
--+++++#                 selected_routing_weights = routing_weights.flatten()[mask]
--+++++                
--+++++#                 current_states = hidden_states_reshaped[selected_token_indices]
--+++++                
--+++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++                
--+++++#                 final_hidden_states = final_hidden_states.index_add(
--+++++#                     dim=0,
--+++++#                     index=selected_token_indices,
--+++++#                     source=expert_output.to(hidden_states.dtype)
--+++++#                 )
--+++++            
--+++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++ 
--++++-    #     return attn_output, attn_weights, past_key_value
--+++++#             final_hidden_states = final_hidden_states + shared_expert_output
--+++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++            
--+++++#             return final_hidden_states, router_logits
--+++++
--+++++
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     """
--+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         # 门控网络
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         # 专家列表
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++#         # 共享专家
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_decode(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         """
--+++++#         【解码路径】针对 sequence_length=1 的极致优化。
--+++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+++++#         """
--+++++#         batch_size, hidden_dim = hidden_states.shape
--+++++        
--+++++#         expert_outputs_list = [
--+++++#             ops.cat([
--+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++#             ], dim=0) 
--+++++#             for i in range(batch_size)
--+++++#         ]
--+++++        
--+++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+++++#         # shape: (batch_size, top_k, hidden_dim)
--+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++        
--+++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++        
--+++++#         return moe_output.squeeze(1)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_prefill(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         """
--+++++#         【预填充路径】针对 sequence_length > 1 的优化。
--+++++#         按专家对 Token 进行分组，并进行批处理。
--+++++#         """
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens = hidden_states.shape[0]
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++        
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++        
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++        
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++            
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++#             selected_token_indices = token_indices[mask]
--+++++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++            
--+++++#             current_states = hidden_states[selected_token_indices]
--+++++            
--+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++            
--+++++#             moe_output = moe_output.index_add(
--+++++#                 dim=0,
--+++++#                 index=selected_token_indices,
--+++++#                 source=expert_output.to(hidden_states.dtype)
--+++++#             )
--+++++#         return moe_output
--+++++
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#         """
--+++++#         顶层 forward 方法，作为智能分发器。
--+++++#         """
--+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++        
--+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#         router_logits = self.gate(hidden_states_reshaped)
--+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++ 
--++++-    # def forward(
--++++-    #     self,
--++++-    #     hidden_states: mindspore.Tensor,
--++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++-    #     position_ids: Optional[mindspore.Tensor] = None,
--++++-    #     past_key_value: Optional[Cache] = None,
--++++-    #     output_attentions: bool = False,
--++++-    #     use_cache: bool = False,
--++++-    #     cache_position: Optional[mindspore.Tensor] = None,
--++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-
--++++-    #     bsz, q_len, _ = hidden_states.shape
--++++-
--++++-    #     query_states = self.q_proj(hidden_states)
--++++-    #     key_states = self.k_proj(hidden_states)
--++++-    #     value_states = self.v_proj(hidden_states)
--++++-
--++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-
--++++-    #     kv_seq_len = key_states.shape[-2]
--++++-    #     if past_key_value is not None:
--++++-    #         if self.layer_idx is None:
--++++-    #             raise ValueError("`layer_idx` must be specified for caching")
--++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-
--++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-
--++++-    #     if past_key_value is not None:
--++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-    #         key_states, value_states = past_key_value.update(
--++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++-    #         )
--+++++#         if self.norm_topk_prob:
--+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++        
--+++++#         moe_output = None
--+++++#         # 在推理时，根据序列长度选择最优路径
--+++++#         if not self.training:
--+++++#             if sequence_length == 1:
--+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++++#             else:
--+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++++#         else:
--+++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+++++#             raise NotImplementedError("Training path is not implemented.")
--+++++
--+++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+++++        
--+++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+++++        
--+++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++#         return final_hidden_states, router_logits
--+++++
--+++++
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     """
--+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         # 门控网络
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         # 专家列表
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++#         # 共享专家
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_decode(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         batch_size, _ = hidden_states.shape
--+++++#         expert_outputs_list = [
--+++++#             ops.cat([
--+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++#             ], dim=0) 
--+++++#             for i in range(batch_size)
--+++++#         ]
--+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++#         return moe_output.squeeze(1)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_prefill(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens = hidden_states.shape[0]
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++        
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++#             selected_token_indices = token_indices[mask]
--+++++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++#             current_states = hidden_states[selected_token_indices]
--+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++#             moe_output = moe_output.index_add(
--+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++#             )
--+++++#         return moe_output
--+++++
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#         """
--+++++#         顶层 forward 方法，作为智能分发器。
--+++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+++++#         """
--+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++        
--+++++#         # 1. 门控计算 (通用逻辑)
--+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#         router_logits = self.gate(hidden_states_reshaped)
--+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++
--+++++#         if self.norm_topk_prob:
--+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++        
--+++++#         # 2. 智能分发到最优 MoE 路径
--+++++#         moe_output = None
--+++++#         if not self.training:
--+++++#             if sequence_length == 1:
--+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++++#             else:
--+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++++#         else:
--+++++#             raise NotImplementedError("Training path is not implemented.")
--+++++
--+++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++        
--+++++#         # 4. 合并 MoE 输出和共享专家输出
--+++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++        
--+++++#         # 5. 恢复原始形状并返回
--+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++#         return final_hidden_states, router_logits
--+++++
--+++++# prefill fastest
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     """
--+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         # 门控网络
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         # 专家列表
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++#         # 共享专家
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_dispatch(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         """
--+++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+++++#         """
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens, _ = hidden_states.shape
--+++++        
--+++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++#         flat_routing_weights = routing_weights.flatten()
--++++ 
--++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++-
--++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++-    #     query_states = query_states / math.sqrt(self.head_dim)
--++++-    #     # <--- 修改结束 ---
--++++-
--++++-    #     fa_attention_mask = None
--++++-    #     if attention_mask is not None:
--++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-    #         fa_attention_mask = (mask_slice != 0)
--++++-
--++++-    #     input_dtype = query_states.dtype
--++++-
--++++-    #     attn_output = mindspore.ops.flash_attention_score(
--++++-    #         query=query_states,    # 传入已经预先缩放过的 query
--++++-    #         key=key_states,
--++++-    #         value=value_states,
--++++-    #         head_num=self.num_heads,
--++++-    #         attn_mask=fa_attention_mask,
--++++-    #         keep_prob=1.0 - self.attention_dropout,
--++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++-    #         input_layout="BNSD",
--++++-    #         sparse_mode=0,
--++++-    #         inner_precise=1        # 仍然保持内部高精度计算
--++++-    #     )
--+++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++ 
--++++-    #     attn_output = attn_output.to(input_dtype)
--++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-    #     attn_output = self.o_proj(attn_output)
--+++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++        
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++            
--+++++#             # 找到所有分配给该专家的 token
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++            
--+++++#             # 使用 mask 选取对应的 token 和权重
--+++++#             current_token_indices = token_indices[mask]
--+++++#             current_routing_weights = flat_routing_weights[mask]
--+++++#             current_hidden_states = hidden_states[current_token_indices]
--+++++            
--+++++#             # 对这些 token 进行批处理
--+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++++            
--+++++#             # 使用 index_add 将结果精确地加回到对应位置
--+++++#             moe_output = moe_output.index_add(
--+++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++#             )
--+++++#         return moe_output
--+++++
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#         """
--+++++#         顶层 forward 方法，作为智能分发器。
--+++++#         """
--+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++        
--+++++#         # 1. 门控计算
--+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#         router_logits = self.gate(hidden_states_reshaped)
--+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++
--+++++#         if self.norm_topk_prob:
--+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++        
--+++++#         # 2. 调用统一的 MoE 计算内核
--+++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--++++ 
--++++-    #     attn_weights = None
--++++-    #     if output_attentions:
--++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++++#         # 3. 统一处理共享专家
--+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++        
--+++++#         # 4. 合并输出
--+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++        
--+++++#         # 5. 恢复原始形状并返回
--+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++#         return final_hidden_states, router_logits
--+++++
--+++++
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     """
--+++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++#     【最终高性能与高精度版】：
--+++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+++++#     3. 这样实现了速度和准确性的两全其美。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_decode(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         """
--+++++#         【解码路径】极致优化版：bmm + 高精度累加。
--+++++#         """
--+++++#         original_dtype = hidden_states.dtype
--+++++#         batch_size, _ = hidden_states.shape
--+++++        
--+++++#         expert_outputs_list = [
--+++++#             ops.cat([
--+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++#             ], dim=0) 
--+++++#             for i in range(batch_size)
--+++++#         ]
--+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++
--+++++#         # 在 float32 下执行 bmm，得到高精度结果
--+++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++        
--+++++#         # 将高精度结果转换回原始数据类型
--+++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+++++        
--+++++#         return moe_output
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_prefill(
--+++++#         self, 
--+++++#         hidden_states: mindspore.Tensor, 
--+++++#         selected_experts: mindspore.Tensor, 
--+++++#         routing_weights: mindspore.Tensor
--+++++#     ) -> mindspore.Tensor:
--+++++#         """
--+++++#         【预填充路径】与原始实现一致，结果精确。
--+++++#         """
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens, _ = hidden_states.shape
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++        
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++#             selected_token_indices = token_indices[mask]
--+++++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++#             current_states = hidden_states[selected_token_indices]
--+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++#             moe_output = moe_output.index_add(
--+++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++#             )
--+++++#         return moe_output
--+++++
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++        
--+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#         router_logits = self.gate(hidden_states_reshaped)
--+++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++ 
--++++-    #     return attn_output, attn_weights, past_key_value
--+++++#         if self.norm_topk_prob:
--+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+++++#         # 如果模型主体是 float16，后续再转换
--+++++        
--+++++#         moe_output = None
--+++++#         if not self.training:
--+++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+++++#             # _moe_infer_decode 内部会处理好类型转换
--+++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+++++#             if sequence_length == 1:
--+++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++++#             else:
--+++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++++#         else:
--+++++#             raise NotImplementedError("Training path is not implemented.")
--+++++
--+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++        
--+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++#         return final_hidden_states, router_logits
--+++++    
--++++ 
--++++-QWEN2MOE_ATTENTION_CLASSES = {
--++++-    "eager": Qwen2MoeAttention,
--++++-    "flash-attention": Qwen2MoeFlashAttention,
--++++-}
--+++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++#     """
--+++++#     【融合版】一个混合专家模块，内置两种推理策略，
--+++++#     由外部全局变量 `Long_Prompt` 控制：
--+++++
--+++++#     - if Long_Prompt is True:  【精度优先模式】
--+++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+++++#       适用于处理长序列，避免误差累积。
--+++++
--+++++#     - if Long_Prompt is False: 【速度优先模式】
--+++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+++++#       在解码阶段获得极致速度，同时保证结果高度准确。
--+++++#     """
--+++++#     def __init__(self, config: Qwen2MoeConfig):
--+++++#         super().__init__()
--+++++#         self.num_experts = config.num_experts
--+++++#         self.top_k = config.num_experts_per_tok
--+++++#         self.norm_topk_prob = config.norm_topk_prob
--+++++
--+++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++#         self.experts = nn.ModuleList(
--+++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++#         )
--+++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++
--+++++#     # --- 速度优先模式的辅助函数 ---
--+++++#     @no_grad()
--+++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++#         original_dtype = hidden_states.dtype
--+++++#         batch_size, _ = hidden_states.shape
--+++++#         expert_outputs_list = [
--+++++#             ops.cat([
--+++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++#             ], dim=0) 
--+++++#             for i in range(batch_size)
--+++++#         ]
--+++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++#         weights_fp32 = routing_weights.to(mindspore.float32)
--+++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++++
--+++++#     @no_grad()
--+++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens, _ = hidden_states.shape
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++#             selected_token_indices = token_indices[mask]
--+++++#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++#             current_states = hidden_states[selected_token_indices]
--+++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++#         return moe_output
--+++++
--+++++#     # --- 精度优先模式的辅助函数 ---
--+++++#     @no_grad()
--+++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++#         moe_output = ops.zeros_like(hidden_states)
--+++++#         num_tokens, _ = hidden_states.shape
--+++++#         flat_selected_experts = selected_experts.flatten()
--+++++#         flat_routing_weights = routing_weights.flatten()
--+++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++#         active_experts = ops.unique(flat_selected_experts)
--+++++#         for expert_idx_tensor in active_experts:
--+++++#             expert_idx = expert_idx_tensor.item()
--+++++#             expert_layer = self.experts[expert_idx]
--+++++#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++#             current_token_indices = token_indices[mask]
--+++++#             current_routing_weights = flat_routing_weights[mask]
--+++++#             current_hidden_states = hidden_states[current_token_indices]
--+++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++#         return moe_output
--+++++
--+++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++#         # 声明我们将要使用一个在模块外部定义的全局变量
--+++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+++++#         global Long_Prompt
--+++++
--+++++#         # 1. 门控计算 (所有模式通用)
--+++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++#         router_logits = self.gate(hidden_states_reshaped)
--+++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++++#         if self.norm_topk_prob:
--+++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++#         moe_output = None
--+++++#         if not self.training:
--+++++#             # 根据 Long_Prompt 标志选择模式
--+++++#             if Long_Prompt:
--+++++#                 # --- 精度优先模式 ---
--+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++#             else:
--+++++#                 # --- 速度优先模式 ---
--+++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++#                 if sequence_length == 1:
--+++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++#                 else:
--+++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++#         else:
--+++++#             raise NotImplementedError("Training path is not implemented.")
--+++++
--+++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++        
--+++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++#         return final_hidden_states, router_logits
--+++++    
--+++++class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++    """
--+++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+++++    控制的顶级推理策略：
--++++ 
--+++++    - if Long_Prompt is True:  【精度优先模式】
--+++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--+++++      适用于需要严格可复现性的长序列任务。
--++++ 
--++++-class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-    def __init__(self, config):
--+++++    - if Long_Prompt is False: 【速度优先模式】
--+++++      采用业界最强的性能组合：
--+++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--+++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--+++++    """
--+++++    def __init__(self, config: Qwen2MoeConfig):
--++++         super().__init__()
--++++         self.num_experts = config.num_experts
--++++         self.top_k = config.num_experts_per_tok
--++++         self.norm_topk_prob = config.norm_topk_prob
--++++ 
--++++-        # gating
--++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++         self.experts = nn.ModuleList(
--++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++         )
--++++-
--++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++ 
--++++-    #@dwj
--++++-    # 只遍历激活的专家，而非全部专家
--++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-            num_tokens = hidden_states_reshaped.shape[0]
--++++-
--++++-            router_logits = self.gate(hidden_states_reshaped)
--++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-
--++++-            if self.norm_topk_prob:
--++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-            routing_weights = routing_weights.to(hidden_states.dtype)
--++++-            
--++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++-            flat_selected_experts = selected_experts.flatten()
--++++-            
--++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++-            token_indices = broadcasted_token_indices.flatten()
--++++-            
--++++-            active_experts = ops.unique(flat_selected_experts)
--++++-            
--++++-            for expert_idx_tensor in active_experts:
--++++-                expert_idx = expert_idx_tensor.item()
--++++-                expert_layer = self.experts[expert_idx]
--++++-                
--++++-                mask = (flat_selected_experts == expert_idx_tensor)
--++++-                selected_token_indices = token_indices[mask]
--++++-                selected_routing_weights = routing_weights.flatten()[mask]
--++++-                
--++++-                current_states = hidden_states_reshaped[selected_token_indices]
--++++-                
--++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-                
--++++-                final_hidden_states = final_hidden_states.index_add(
--++++-                    dim=0,
--++++-                    index=selected_token_indices,
--++++-                    source=expert_output.to(hidden_states.dtype)
--++++-                )
--++++-            
--++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--+++++    @no_grad()
--+++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++        original_dtype = hidden_states.dtype
--+++++        batch_size, _ = hidden_states.shape
--+++++        expert_outputs_list = [
--+++++            ops.cat([
--+++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++            ], dim=0)
--+++++            for i in range(batch_size)
--+++++        ]
--+++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++        weights_fp32 = routing_weights.to(mindspore.float32)
--+++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++++        return moe_output_fp32.squeeze(1).to(original_dtype)
--+++++
--+++++    @no_grad()
--+++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++        num_tokens, _ = hidden_states.shape
--+++++        flat_selected_experts = selected_experts.flatten()
--+++++        sorted_expert_indices = flat_selected_experts.argsort()
--+++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++++        original_token_indices = sorted_expert_indices // self.top_k
--+++++        moe_output = ops.zeros_like(hidden_states)
--+++++        current_token_offset = 0
--+++++        for i in range(self.num_experts):
--+++++            expert_token_count = tokens_per_expert[i] - current_token_offset
--+++++            if expert_token_count == 0:
--+++++                continue
--+++++            end_offset = current_token_offset + expert_token_count
--+++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++++            expert_hidden_states = hidden_states[expert_original_token_indices]
--+++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++            current_token_offset += expert_token_count
--+++++        return moe_output
--+++++
--+++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+++++    @no_grad()
--+++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++        moe_output = ops.zeros_like(hidden_states)
--+++++        num_tokens, _ = hidden_states.shape
--+++++        flat_selected_experts = selected_experts.flatten()
--+++++        flat_routing_weights = routing_weights.flatten()
--+++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++        active_experts = ops.unique(flat_selected_experts)
--+++++        for expert_idx_tensor in active_experts:
--+++++            expert_idx = expert_idx_tensor.item()
--+++++            expert_layer = self.experts[expert_idx]
--+++++            mask = (flat_selected_experts == expert_idx_tensor)
--+++++            current_token_indices = token_indices[mask]
--+++++            current_routing_weights = flat_routing_weights[mask]
--+++++            current_hidden_states = hidden_states[current_token_indices]
--+++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++        return moe_output
--++++ 
--++++-            final_hidden_states = final_hidden_states + shared_expert_output
--++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++-            
--++++-            return final_hidden_states, router_logits
--+++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++        global Long_Prompt
--+++++
--+++++        # 1. 门控计算 (所有模式通用)
--+++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++        router_logits = self.gate(hidden_states_reshaped)
--+++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++++        if self.norm_topk_prob:
--+++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++        
--+++++        moe_output = None
--+++++        if Long_Prompt:
--+++++            # --- 精度优先模式 (ACCURACY MODE) ---
--+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        else:
--+++++            # --- 速度优先模式 (SPEED MODE) ---
--+++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++            if sequence_length == 1:
--+++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++            else:
--+++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        
--++++ 
--+++++        # 3. 共享专家计算与合并 (所有模式通用)
--+++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++        
--+++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++        
--+++++        return final_hidden_states, router_logits
--++++ 
--++++ class Qwen2MoeDecoderLayer(nn.Module):
--++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--++++         super().__init__()
--++++         self.hidden_size = config.hidden_size
--+++++        
--+++++        # if Long_Prompt:
--+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++        # else:
--+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++ 
--++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++ 
--++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++-
--++++         if (layer_idx not in config.mlp_only_layers) and (
--++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++++         ):
--++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++             self._warmed_up = True
--++++             self.warmup_moe_model()
--++++ 
--+++++
--+++++
--++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++++         output_router_logits = (
--++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++             router_logits=outputs.router_logits,
--++++         )
--++++ 
--+++++    def generate(self, *args, **kwargs):
--+++++        """
--+++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+++++        """
--+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+++++
--+++++        input_ids = kwargs.get("input_ids")
--+++++        if input_ids is None and args:
--+++++            input_ids = args[0]
--+++++
--+++++        if input_ids is not None:
--+++++            prompt_length = input_ids.shape[1]
--+++++            
--+++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+++++                Long_Prompt = True
--+++++            else:
--+++++                Long_Prompt = False
--+++++
--+++++        return super().generate(*args, **kwargs)
--+++++    
--++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--++++     def prepare_inputs_for_generation(
--++++         self,
--++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
--++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--+++++        
--++++         if past_key_values is not None:
--++++             if inputs_embeds is not None:  # Exception 1
--++++                 if 0 not in input_ids.shape:
--++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++             }
--++++         )
--++++         return model_inputs
--+++++
--++++ # @lwx
--++++     # def _decode_one_tokens_logits(
--++++     #     self,
--++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--++++             attentions=outputs.attentions,
--++++         )
--++++ 
--+++++
--++++ __all__ = [
--++++     "Qwen2MoeForCausalLM",
--++++     "Qwen2MoeModel",
--++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++++new file mode 100644
--++++index 00000000..6dfb5b93
--++++--- /dev/null
--+++++++ b/patches/0001-20251104commit.patch
--++++@@ -0,0 +1,1272 @@
--+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++++From: Pinoeer-kingxi <13022943007@163.com>
--+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++++Subject: [PATCH] 20251104commit
--+++++
--+++++---
--+++++ mindnlp/transformers/cache_utils.py           |  28 +-
--+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
--+++++
--+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++++index cadd2e04..02f8d4be 100644
--+++++--- a/mindnlp/transformers/cache_utils.py
--++++++++ b/mindnlp/transformers/cache_utils.py
--+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++++             # k_out[:, :, cache_position] = key_states
--+++++             # v_out[:, :, cache_position] = value_states
--+++++-            if ON_ORANGE_PI:
--+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++-            else:
--+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++-
--++++++            # if ON_ORANGE_PI:
--++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++++            # else:
--++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++++            # 确保 cache_position 是 1D tensor 并且类型正确
--++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++++++            if cache_position.ndim > 1:
--++++++                cache_position = cache_position.flatten()
--++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++++++                cache_position = cache_position.int()
--++++++            
--++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++++++            k_out[:, :, cache_position] = key_states
--++++++            v_out[:, :, cache_position] = value_states
--++++++            
--+++++         return k_out, v_out
--+++++ 
--+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++index c695b944..d8303e45 100644
--+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++++ def rotate_half(x):
--+++++     """Rotates half the hidden dims of the input."""
--+++++-    x1 = x[..., : x.shape[-1] // 2]
--+++++-    x2 = x[..., x.shape[-1] // 2 :]
--++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++++    # x1 = x[..., : x.shape[-1] // 2]
--++++++    # x2 = x[..., x.shape[-1] // 2 :]
--++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++     return ops.cat((-x2, x1), dim=-1)
--+++++ 
--+++++ 
--+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++++         if self.training:
--+++++             raise NotImplementedError("Training is not supported yet.")
--+++++         else:
--+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++-        if self.config.n_shared_experts is not None:
--+++++-            y = y + self.shared_experts(identity)
--+++++-        return y
--++++++            # @lwx
--++++++            if orig_shape[1] == 1:
--++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++++++                y=y.view(*orig_shape)
--++++++                if self.config.n_shared_experts is not None:
--++++++                    y = y + self.shared_experts(identity)
--++++++                return y
--++++++            else:
--++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++++++                if self.config.n_shared_experts is not None:
--++++++                    y = y + self.shared_experts(identity)
--++++++                return y
--++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++++        # if self.config.n_shared_experts is not None:
--++++++        #     y = y + self.shared_experts(identity)
--++++++        # return y
--++++++        
--++++++    @no_grad()
--++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++++
--++++++        expert_cache = ops.zeros_like(x)
--++++++        for i in range(self.num_experts_per_tok):
--++++++            expert_id = flat_expert_indices[i].item()
--++++++            weight = flat_expert_weights[i].item()
--++++++            expert = self.experts[expert_id]
--++++++            expert_out = expert(x)
--++++++            expert_cache += expert_out * weight
--++++++        return expert_cache
--+++++ 
--+++++     @no_grad()
--+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-        # expert_cache = torch.zeros_like(x)
--+++++-        # idxs = flat_expert_indices.argsort()
--+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++-        # token_idxs = idxs // self.num_experts_per_tok
--+++++-        # for i, end_idx in enumerate(tokens_per_expert):
--+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++-        #     if start_idx == end_idx:
--+++++-        #         continue
--+++++-        #     expert = self.experts[i]
--+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-        #     expert_tokens = x[exp_token_idx]
--+++++-        #     expert_out = expert(expert_tokens)
--+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++-        # return expert_cache
--++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++         expert_cache = ops.zeros_like(x)
--+++++         idxs = flat_expert_indices.argsort()
--+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++         token_idxs = idxs // self.num_experts_per_tok
--++++++
--+++++         for i, end_idx in enumerate(tokens_per_expert):
--+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++             if start_idx == end_idx:
--+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++++             expert_out = expert(expert_tokens)
--+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++++
--+++++         return expert_cache
--++++++        
--++++++    # @no_grad()
--++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++    #     # expert_cache = torch.zeros_like(x)
--++++++    #     # idxs = flat_expert_indices.argsort()
--++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++++    #     # token_idxs = idxs // self.num_experts_per_tok
--++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++++    #     #     if start_idx == end_idx:
--++++++    #     #         continue
--++++++    #     #     expert = self.experts[i]
--++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #     #     expert_tokens = x[exp_token_idx]
--++++++    #     #     expert_out = expert(expert_tokens)
--++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++++    #     # return expert_cache
--++++++    #     expert_cache = ops.zeros_like(x)
--++++++    #     idxs = flat_expert_indices.argsort()
--++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++    #     for i, end_idx in enumerate(tokens_per_expert):
--++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++    #         if start_idx == end_idx:
--++++++    #             continue
--++++++    #         expert = self.experts[i]
--++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #         expert_tokens = x[exp_token_idx]
--++++++    #         expert_out = expert(expert_tokens)
--++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++++
--++++++    #     return expert_cache
--++++++    # @no_grad()
--++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++    #     expert_cache = ops.zeros_like(x)
--++++++
--++++++    #     # 排序保证顺序一致
--++++++    #     idxs = flat_expert_indices.argsort()
--++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++    #     # 找出有 token 的专家
--++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++++
--++++++    #     for i in active_experts.tolist():
--++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++    #         end_idx = tokens_per_expert[i]
--++++++    #         if start_idx == end_idx:  # 没有 token
--++++++    #             continue
--++++++
--++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #         expert_tokens = x[exp_token_idx]
--++++++    #         expert_out = self.experts[i](expert_tokens)
--++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++++
--++++++    #         expert_cache = mindspore.mint.scatter_add(
--++++++    #             expert_cache,
--++++++    #             0,
--++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++++    #             expert_out
--++++++    #         )
--++++++
--++++++    #     return expert_cache
--++++++
--++++++
--+++++ 
--+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++++ #     """
--+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++ 
--+++++         # Initialize weights and apply final processing
--+++++         self.post_init()
--++++++        self.warm_up = False
--++++++
--++++++    def warmup_moe_model_deep(self):
--++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++++        test_texts = [
--++++++            "warmup short",
--++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++++++        ]
--++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++++        if tokenizer is None:
--++++++            from mindnlp.transformers import AutoTokenizer
--++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++++            self._warmup_tokenizer = tokenizer
--++++++
--++++++        for text in test_texts:
--++++++            inputs = tokenizer(text, return_tensors="ms")
--++++++            with mindspore._no_grad():
--++++++                _ = self(**inputs, use_cache=False)
--++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++++ 
--+++++     def get_input_embeddings(self):
--+++++         return self.model.embed_tokens
--+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++         ```"""
--++++++        if not self.warm_up:
--++++++            self.warm_up = True
--++++++            self.warmup_moe_model_deep()
--++++++
--+++++         output_attentions = (
--+++++             output_attentions
--+++++             if output_attentions is not None
--+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++index 3cbf820e..d4c6b651 100644
--+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++@@ -18,7 +18,6 @@
--+++++ # See the License for the specific language governing permissions and
--+++++ # limitations under the License.
--+++++ """MindSpore Qwen2MoE model."""
--+++++-
--+++++ import math
--+++++ from typing import List, Optional, Tuple, Union
--+++++ 
--+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++++     TokenClassifierOutput,
--+++++ )
--+++++ from ...modeling_utils import PreTrainedModel
--++++++from ...generation import GenerationMixin
--+++++ from ....utils import logging
--+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
--+++++ 
--+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++++         self.variance_epsilon = eps
--+++++ 
--+++++     def forward(self, hidden_states):
--++++++        # @dwj
--++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++++        # @lwx
--++++++        # if not self.training :
--++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++         input_dtype = hidden_states.dtype
--+++++         hidden_states = hidden_states.to(mindspore.float32)
--+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++++@@ -234,6 +239,8 @@ def rotate_half(x):
--+++++     """Rotates half the hidden dims of the input."""
--+++++     x1 = x[..., : x.shape[-1] // 2]
--+++++     x2 = x[..., x.shape[-1] // 2 :]
--++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++     return ops.cat((-x2, x1), dim=-1)
--+++++ 
--+++++ 
--+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++++         self.config = config
--+++++         self.hidden_size = config.hidden_size
--+++++         self.intermediate_size = intermediate_size
--++++++        
--+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++++         self.act_fn = ACT2FN[config.hidden_act]
--+++++ 
--+++++     def forward(self, x):
--+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++-
--+++++ 
--++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++++        # @lwx 
--++++++        # gate_up_output = self.gate_up_proj(x)
--++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++++++        # return self.down_proj(swiglu_output)
--++++++
--++++++    # def forward(self, x):
--++++++    #     gate_proj_out = self.gate_proj(x)
--++++++    #     up_proj_out = self.up_proj(x)
--++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++++++    #     return self.down_proj(swiglu_out)
--++++++        
--+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++++     """
--+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++++         use_cache: bool = False,
--+++++         cache_position: Optional[mindspore.Tensor] = None,
--+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++        
--++++++
--+++++         bsz, q_len, _ = hidden_states.shape
--+++++ 
--+++++         query_states = self.q_proj(hidden_states)
--+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++                     "with a layer index."
--+++++                 )
--+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++            if isinstance(past_key_value, StaticCache):
--++++++                kv_seq_len = key_states.shape[-2]
--++++++            else:
--++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++ 
--+++++         if past_key_value is not None:
--+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++            
--++++++            if isinstance(past_key_value, StaticCache):
--++++++                kv_seq_len = key_states.shape[-2]
--+++++ 
--+++++         # repeat k/v heads if n_kv_heads < n_heads
--+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++-
--++++++        
--+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++ 
--+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++++-            raise ValueError(
--+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++++-                f" {attn_weights.shape}"
--+++++-            )
--+++++-
--+++++-        if attention_mask is not None:  # no matter the length, we just slice it
--+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++++++        if attention_mask is not None:
--++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++             attn_weights = attn_weights + causal_mask
--+++++ 
--+++++         # upcast attention to fp32
--+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++ 
--+++++         attn_output = self.o_proj(attn_output)
--+++++-
--++++++        # @lwx
--++++++        
--++++++        # max_seq_len = self.max_position_embeddings  # 2048
--++++++
--++++++        # if attention_mask is not None:
--++++++        #     # attention_mask: [B, 1, Sq, Sk]
--++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++++
--++++++        #     # pad 到 [max_seq_len, max_seq_len]
--++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++++        #     global_attention_mask = padded_mask
--++++++        # else:
--++++++        #     global_attention_mask = None
--++++++
--++++++
--++++++        # sparse_mode=3
--++++++        # attn_output = mindspore.ops.flash_attention_score(
--++++++        #     query=query_states,
--++++++        #     key=key_states,
--++++++        #     value=value_states,
--++++++        #     real_shift=None,
--++++++        #     padding_mask=None,
--++++++
--++++++        #     head_num=self.num_heads,
--++++++        #     attn_mask=global_attention_mask,
--++++++        #     keep_prob=1.0 - self.attention_dropout,
--++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++        #     input_layout="BNSD",
--++++++        #     pre_tokens=2147483647,
--++++++        #     next_tokens=2147483647,
--++++++        #     inner_precise=0,
--++++++        #     drop_mask=None,
--++++++        #     prefix=None,
--++++++        #     actual_seq_qlen=None,
--++++++        #     actual_seq_kvlen=None,
--++++++        #     sparse_mode=sparse_mode,
--++++++        # )
--+++++         if not output_attentions:
--+++++             attn_weights = None
--+++++ 
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++ 
--++++++class Qwen2MoeFlashAttention(nn.Module):
--++++++    """
--++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++++
--++++++    关键改动:
--++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++++       直接传入原始的 key 和 value 张量效率更高。
--++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++++    """
--++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++++        super().__init__()
--++++++        self.config = config
--++++++        self.layer_idx = layer_idx
--++++++        self.hidden_size = config.hidden_size
--++++++        self.num_heads = config.num_attention_heads
--++++++        self.head_dim = self.hidden_size // self.num_heads
--++++++        self.num_key_value_heads = config.num_key_value_heads
--++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++        self.max_position_embeddings = config.max_position_embeddings
--++++++        self.rope_theta = config.rope_theta
--++++++        self.attention_dropout = config.attention_dropout
--++++++
--++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++            raise ValueError(
--++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++++            )
--++++++
--++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++++
--++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++++            self.head_dim,
--++++++            max_position_embeddings=self.max_position_embeddings,
--++++++            base=self.rope_theta,
--++++++        )
--++++++
--++++++    def forward(
--++++++        self,
--++++++        hidden_states: mindspore.Tensor,
--++++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++++        position_ids: Optional[mindspore.Tensor] = None,
--++++++        past_key_value: Optional[Cache] = None,
--++++++        output_attentions: bool = False,
--++++++        use_cache: bool = False,
--++++++        cache_position: Optional[mindspore.Tensor] = None,
--++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++        bsz, q_len, _ = hidden_states.shape
--++++++
--++++++        # 1. 线性投射 Q, K, V
--++++++        query_states = self.q_proj(hidden_states)
--++++++        key_states = self.k_proj(hidden_states)
--++++++        value_states = self.v_proj(hidden_states)
--++++++
--++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++        # 3. RoPE 旋转位置编码
--++++++        kv_seq_len = key_states.shape[-2]
--++++++        if past_key_value is not None:
--++++++            if self.layer_idx is None:
--++++++                raise ValueError(
--++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++                    "with a layer index."
--++++++                )
--++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++++                if cache_position.shape[0] == 1:
--++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++++                    kv_seq_len = past_seen_tokens + 1
--++++++                else:
--++++++                    # prefill 阶段：cache_position 是范围，使用其长度
--++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++++            else:
--++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++        # 4. KV 缓存更新
--++++++        if past_key_value is not None:
--++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++            key_states, value_states = past_key_value.update(
--++++++                key_states, value_states, self.layer_idx, cache_kwargs
--++++++            )
--++++++            
--++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++                if cache_position.shape[0] == 1:
--++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++++                    kv_seq_len = key_states.shape[-2]
--++++++
--++++++        # 5. [重要] 准备 Attention Mask
--++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++++        fa_attention_mask = None
--++++++        if attention_mask is not None:
--++++++            # 截取与当前key长度匹配的部分
--++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++++            fa_attention_mask = (mask_slice != 0)
--++++++
--++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++++        input_dtype = query_states.dtype
--++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++++            query_states = query_states.to(mindspore.float16)
--++++++            key_states = key_states.to(mindspore.float16)
--++++++            value_states = value_states.to(mindspore.float16)
--++++++
--++++++        # 6. [核心] 调用 flash_attention_score 算子
--++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++++        attn_output = mindspore.ops.flash_attention_score(
--++++++            query=query_states,
--++++++            key=key_states,
--++++++            value=value_states,
--++++++            head_num=self.num_heads, # 传入Q的头数(N1)
--++++++            attn_mask=fa_attention_mask,
--++++++            keep_prob=1.0 - self.attention_dropout,
--++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++            input_layout="BNSD",
--++++++            sparse_mode=0 # 使用 defaultMask 模式
--++++++        )
--++++++
--++++++        # 恢复原始数据类型
--++++++        attn_output = attn_output.to(input_dtype)
--++++++
--++++++        # 7. 调整输出形状
--++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++        attn_output = self.o_proj(attn_output)
--++++++
--++++++        # FlashAttention 算子不直接返回注意力权重矩阵
--++++++        attn_weights = None
--++++++        if output_attentions:
--++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++        return attn_output, attn_weights, past_key_value
--++++++
--++++++    # def forward(
--++++++    #     self,
--++++++    #     hidden_states: mindspore.Tensor,
--++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++++    #     past_key_value: Optional[Cache] = None,
--++++++    #     output_attentions: bool = False,
--++++++    #     use_cache: bool = False,
--++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++    #     bsz, q_len, _ = hidden_states.shape
--++++++
--++++++    #     # 1. 线性投射 Q, K, V
--++++++    #     query_states = self.q_proj(hidden_states)
--++++++    #     key_states = self.k_proj(hidden_states)
--++++++    #     value_states = self.v_proj(hidden_states)
--++++++
--++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++    #     # 3. RoPE 旋转位置编码
--++++++    #     kv_seq_len = key_states.shape[-2]
--++++++    #     if past_key_value is not None:
--++++++    #         if self.layer_idx is None:
--++++++    #             raise ValueError(
--++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++    #                 "with a layer index."
--++++++    #             )
--++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++    #     # 4. KV 缓存更新
--++++++    #     if past_key_value is not None:
--++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++    #         key_states, value_states = past_key_value.update(
--++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++++    #         )
--++++++
--++++++    #     # 5. 准备 Attention Mask
--++++++    #     fa_attention_mask = None
--++++++    #     if attention_mask is not None:
--++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++    #         fa_attention_mask = (mask_slice != 0)
--++++++
--++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++++    #     input_dtype = query_states.dtype
--++++++
--++++++    #     # 6. [核心] 调用 flash_attention_score 算子
--++++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++++    #         query=query_states,
--++++++    #         key=key_states,
--++++++    #         value=value_states,
--++++++    #         head_num=self.num_heads,
--++++++    #         attn_mask=fa_attention_mask,
--++++++    #         keep_prob=1.0 - self.attention_dropout,
--++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++    #         input_layout="BNSD",
--++++++    #         sparse_mode=0,
--++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++++    #         inner_precise=1
--++++++    #     )
--++++++
--++++++    #     # 恢复原始数据类型
--++++++    #     attn_output = attn_output.to(input_dtype)
--++++++
--++++++    #     # 7. 调整输出形状
--++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++    #     attn_output = self.o_proj(attn_output)
--++++++
--++++++    #     attn_weights = None
--++++++    #     if output_attentions:
--++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++    #     return attn_output, attn_weights, past_key_value
--++++++
--++++++    # def forward(
--++++++    #     self,
--++++++    #     hidden_states: mindspore.Tensor,
--++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++++    #     past_key_value: Optional[Cache] = None,
--++++++    #     output_attentions: bool = False,
--++++++    #     use_cache: bool = False,
--++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++    #     bsz, q_len, _ = hidden_states.shape
--++++++
--++++++    #     query_states = self.q_proj(hidden_states)
--++++++    #     key_states = self.k_proj(hidden_states)
--++++++    #     value_states = self.v_proj(hidden_states)
--++++++
--++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++    #     kv_seq_len = key_states.shape[-2]
--++++++    #     if past_key_value is not None:
--++++++    #         if self.layer_idx is None:
--++++++    #             raise ValueError("`layer_idx` must be specified for caching")
--++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++    #     if past_key_value is not None:
--++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++    #         key_states, value_states = past_key_value.update(
--++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++++    #         )
--++++++
--++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++
--++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++++    #     query_states = query_states / math.sqrt(self.head_dim)
--++++++    #     # <--- 修改结束 ---
--++++++
--++++++    #     fa_attention_mask = None
--++++++    #     if attention_mask is not None:
--++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++    #         fa_attention_mask = (mask_slice != 0)
--++++++
--++++++    #     input_dtype = query_states.dtype
--++++++
--++++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++++    #         query=query_states,    # 传入已经预先缩放过的 query
--++++++    #         key=key_states,
--++++++    #         value=value_states,
--++++++    #         head_num=self.num_heads,
--++++++    #         attn_mask=fa_attention_mask,
--++++++    #         keep_prob=1.0 - self.attention_dropout,
--++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++++    #         input_layout="BNSD",
--++++++    #         sparse_mode=0,
--++++++    #         inner_precise=1        # 仍然保持内部高精度计算
--++++++    #     )
--++++++
--++++++    #     attn_output = attn_output.to(input_dtype)
--++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++    #     attn_output = self.o_proj(attn_output)
--++++++
--++++++    #     attn_weights = None
--++++++    #     if output_attentions:
--++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++++
--++++++    #     return attn_output, attn_weights, past_key_value
--++++++
--+++++ QWEN2MOE_ATTENTION_CLASSES = {
--+++++     "eager": Qwen2MoeAttention,
--++++++    "flash-attention": Qwen2MoeFlashAttention,
--+++++ }
--+++++ 
--+++++ 
--+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++ 
--++++++    #@dwj
--++++++    # 只遍历激活的专家，而非全部专家
--+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
--+++++-        # router_logits: (batch * sequence_length, n_experts)
--+++++-        router_logits = self.gate(hidden_states)
--+++++-
--+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-        if self.norm_topk_prob:
--+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        # we cast back to the input dtype
--+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-
--+++++-        final_hidden_states = ops.zeros(
--+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++++-        )
--+++++-
--+++++-        # One hot encode the selected experts to create an expert mask
--+++++-        # this will be used to easily index which expert is going to be sollicitated
--+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++++-
--+++++-        # Loop over all available experts in the model and perform the computation on each expert
--+++++-        for expert_idx in range(self.num_experts):
--+++++-            expert_layer = self.experts[expert_idx]
--+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++++-
--+++++-            # Index the correct hidden states and compute the expert hidden state for
--+++++-            # the current expert. We need to make sure to multiply the output hidden
--+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++++-            if 0 not in idx.shape:
--+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++++-
--+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
--+++++-                # the `top_x` tensor here.
--+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++++-
--+++++-        shared_expert_output = self.shared_expert(hidden_states)
--+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++++-
--+++++-        final_hidden_states = final_hidden_states + shared_expert_output
--++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++            num_tokens = hidden_states_reshaped.shape[0]
--++++++
--++++++            router_logits = self.gate(hidden_states_reshaped)
--++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++
--++++++            if self.norm_topk_prob:
--++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++            routing_weights = routing_weights.to(hidden_states.dtype)
--++++++            
--++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++++            flat_selected_experts = selected_experts.flatten()
--++++++            
--++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++++            token_indices = broadcasted_token_indices.flatten()
--++++++            
--++++++            active_experts = ops.unique(flat_selected_experts)
--++++++            
--++++++            for expert_idx_tensor in active_experts:
--++++++                expert_idx = expert_idx_tensor.item()
--++++++                expert_layer = self.experts[expert_idx]
--++++++                
--++++++                mask = (flat_selected_experts == expert_idx_tensor)
--++++++                selected_token_indices = token_indices[mask]
--++++++                selected_routing_weights = routing_weights.flatten()[mask]
--++++++                
--++++++                current_states = hidden_states_reshaped[selected_token_indices]
--++++++                
--++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++                
--++++++                final_hidden_states = final_hidden_states.index_add(
--++++++                    dim=0,
--++++++                    index=selected_token_indices,
--++++++                    source=expert_output.to(hidden_states.dtype)
--++++++                )
--++++++            
--++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++++ 
--+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++-        return final_hidden_states, router_logits
--++++++            final_hidden_states = final_hidden_states + shared_expert_output
--++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++++            
--++++++            return final_hidden_states, router_logits
--+++++ 
--+++++ 
--+++++ class Qwen2MoeDecoderLayer(nn.Module):
--+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++++ 
--+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++ 
--++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++++
--+++++         if (layer_idx not in config.mlp_only_layers) and (
--+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++++         ):
--+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++++     _skip_keys_device_placement = "past_key_values"
--+++++     _supports_cache_class = True
--++++++#lwx
--++++++    # _supports_static_cache = True
--+++++ 
--+++++     def _init_weights(self, module):
--+++++         std = self.config.initializer_range
--+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++++         return causal_mask
--+++++ 
--+++++ 
--+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++     _tied_weights_keys = ["lm_head.weight"]
--+++++ 
--+++++     def __init__(self, config):
--+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++         self.num_experts_per_tok = config.num_experts_per_tok
--+++++         # Initialize weights and apply final processing
--+++++         self.post_init()
--++++++        # @lwx
--++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++++++        #     self.generation_config.cache_implementation = "static"
--++++++        self._warmed_up = False
--++++++
--++++++    def warmup_moe_model(self):
--++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++++++        test_texts = [
--++++++            "warmup short",
--++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++++++        ]
--++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++++        if tokenizer is None:
--++++++            from mindnlp.transformers import AutoTokenizer
--++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++++            self._warmup_tokenizer = tokenizer
--++++++
--++++++        for text in test_texts:
--++++++            inputs = tokenizer(text, return_tensors="ms")
--++++++            with mindspore._no_grad():
--++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++++ 
--+++++     def get_input_embeddings(self):
--+++++         return self.model.embed_tokens
--+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++         ```"""
--++++++        if not self._warmed_up:
--++++++            self._warmed_up = True
--++++++            self.warmup_moe_model()
--+++++ 
--+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++++         output_router_logits = (
--+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++             }
--+++++         )
--+++++         return model_inputs
--++++++# @lwx
--++++++    # def _decode_one_tokens_logits(
--++++++    #     self,
--++++++    #     cur_token: mindspore.Tensor,
--++++++    #     input_pos: Optional[mindspore.Tensor],
--++++++    #     cache_position: mindspore.Tensor,
--++++++    #     past_key_values: StaticCache,
--++++++    # ) -> mindspore.Tensor:
--++++++    #     """
--++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++++++        
--++++++    #     Args:
--++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++++++    #         input_pos: 输入位置信息，可选
--++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++++++            
--++++++    #     Returns:
--++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++++++    #     """
--++++++    #     # 调用JIT编译的版本
--++++++    #     return self.get_decode_one_tokens_logits(
--++++++    #         cur_token=cur_token,
--++++++    #         input_pos=input_pos,
--++++++    #         cache_position=cache_position,
--++++++    #         past_key_values=past_key_values,
--++++++    #     )
--++++++    
--++++++    # @mindspore.jit(jit_level='O1')
--++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++++++    #     """
--++++++    #     JIT编译的函数，用于高效的单token解码
--++++++    #     使用JIT编译优化以支持静态shape和高效执行
--++++++        
--++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++++++    #     """
--++++++    #     outputs = self.model.forward(
--++++++    #         input_ids=cur_token,
--++++++    #         position_ids=input_pos,
--++++++    #         cache_position=cache_position,
--++++++    #         past_key_values=past_key_values,
--++++++    #         use_cache=True,
--++++++    #         return_dict=False,
--++++++    #     )
--++++++        
--++++++    #     hidden_states = outputs[0]
--++++++    #     logits = self.lm_head.forward(hidden_states)
--++++++    #     logits = logits.float()
--++++++        
--++++++    #     return logits[:, -1, :]
--++++++
--++++++    # def _sample(
--++++++    #     self,
--++++++    #     input_ids: mindspore.Tensor,
--++++++    #     logits_processor,
--++++++    #     stopping_criteria,
--++++++    #     generation_config,
--++++++    #     synced_devices: bool,
--++++++    #     streamer=None,
--++++++    #     logits_warper=None,
--++++++    #     **model_kwargs,
--++++++    # ):
--++++++    #     """
--++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++++++    #     """
--++++++    #     from ...generation.logits_process import LogitsProcessorList
--++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++++++    #     from mindnlp.core import nn, ops, no_grad
--++++++    #     import numpy as np
--++++++        
--++++++    #     # 检查是否使用 StaticCache
--++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++++++    #     # 否则，直接调用父类方法
--++++++    #     past_key_values = model_kwargs.get("past_key_values")
--++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++++++
--++++++    #     if not isinstance(past_key_values, StaticCache):
--++++++    #         # 不使用 StaticCache，直接调用父类方法
--++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++++++    #         return super()._sample(
--++++++    #             input_ids=input_ids,
--++++++    #             logits_processor=logits_processor,
--++++++    #             stopping_criteria=stopping_criteria,
--++++++    #             generation_config=generation_config,
--++++++    #             synced_devices=synced_devices,
--++++++    #             streamer=streamer,
--++++++    #             logits_warper=logits_warper,
--++++++    #             **model_kwargs,
--++++++    #         )
--++++++        
--++++++    #     # 使用 StaticCache，进入自定义循环
--++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++++++    #     pad_token_id = generation_config._pad_token_tensor
--++++++    #     output_attentions = generation_config.output_attentions
--++++++    #     output_hidden_states = generation_config.output_hidden_states
--++++++    #     output_scores = generation_config.output_scores
--++++++    #     output_logits = generation_config.output_logits
--++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++++++    #     max_length = generation_config.max_length
--++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++++++    #     do_sample = generation_config.do_sample
--++++++        
--++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++++++    #         raise ValueError(
--++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++++++    #             f"{logits_warper})."
--++++++    #         )
--++++++        
--++++++    #     # init attention / hidden states / scores tuples
--++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++++++        
--++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++++++    #         encoder_hidden_states = (
--++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++++++    #         )
--++++++        
--++++++    #     # keep track of which sequences are already finished
--++++++    #     batch_size, cur_len = input_ids.shape
--++++++    #     this_peer_finished = False
--++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++++++        
--++++++    #     time_record = []
--++++++    #     from ....utils.testing_utils import parse_flag_from_env
--++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++++++        
--++++++    #     while self._has_unfinished_sequences(
--++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++++++    #     ):
--++++++    #         if _record_time:
--++++++    #             import time as time_module
--++++++    #             infer_start = time_module.time()
--++++++            
--++++++    #         # prepare model inputs
--++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++++++            
--++++++    #         # prepare variable output controls
--++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++++++            
--++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++++++    #         cur_cache_position = model_inputs.get("cache_position")
--++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++++++    #         cur_input_ids = model_inputs.get("input_ids")
--++++++            
--++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++++++    #             cur_cache_position is not None and 
--++++++    #             len(cur_cache_position.shape) > 0 and
--++++++    #             cur_cache_position.shape[0] == 1 and
--++++++    #             cur_input_ids is not None and
--++++++    #             cur_input_ids.shape[1] == 1):
--++++++    #             # 使用 JIT 优化的单 token 解码
--++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++++++    #             if not hasattr(self, '_jit_used'):
--++++++    #                 self._jit_used = False
--++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++++++                
--++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++++++    #                 cur_token=cur_input_ids,
--++++++    #                 input_pos=model_inputs.get("position_ids"),
--++++++    #                 cache_position=cur_cache_position,
--++++++    #                 past_key_values=cur_past_key_values,
--++++++    #             )
--++++++                
--++++++    #             # 标记已使用JIT（用于后续判断）
--++++++    #             if not self._jit_used:
--++++++    #                 self._jit_used = True
--++++++                
--++++++    #             # 构造兼容的输出对象
--++++++    #             class JitOptimizedOutput:
--++++++    #                 def __init__(self, logits, config):
--++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++++++    #                     self.config = config
--++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++++++    #                     self.cross_attentions = None
--++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++++++                
--++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++++++    #         else:
--++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++++++    #             outputs = self(**model_inputs, return_dict=True)
--++++++            
--++++++    #         if synced_devices and this_peer_finished:
--++++++    #             continue
--++++++            
--++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++++++    #         next_token_logits = outputs.logits[:, -1, :]
--++++++            
--++++++    #         # pre-process distribution
--++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++++++    #         if do_sample:
--++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++++++            
--++++++    #         # Store scores, attentions and hidden_states when required
--++++++    #         if return_dict_in_generate:
--++++++    #             if output_scores:
--++++++    #                 scores += (next_token_scores,)
--++++++    #             if output_logits:
--++++++    #                 raw_logits += (next_token_logits,)
--++++++    #             if output_attentions:
--++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++++++    #                 if self.config.is_encoder_decoder:
--++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++++++                
--++++++    #             if output_hidden_states:
--++++++    #                 hidden = (
--++++++    #                     outputs.decoder_hidden_states
--++++++    #                     if self.config.is_encoder_decoder
--++++++    #                     else outputs.hidden_states
--++++++    #                 )
--++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++++++            
--++++++    #         # token selection
--++++++    #         if do_sample:
--++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++++++    #         else:
--++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++++++            
--++++++    #         # finished sentences should have their next token be a padding token
--++++++    #         if has_eos_stopping_criteria:
--++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++++++            
--++++++    #         # update generated ids, model inputs, and length for next step
--++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++++++    #         if streamer is not None:
--++++++    #             streamer.put(next_tokens)
--++++++            
--++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++++++    #             outputs,
--++++++    #             model_kwargs,
--++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++++++    #         )
--++++++            
--++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++++++    #         cur_len += 1
--++++++            
--++++++    #         if _record_time:
--++++++    #             import time as time_module
--++++++    #             infer_stop = time_module.time()
--++++++    #             time_record.append(infer_stop - infer_start)
--++++++            
--++++++    #         del outputs
--++++++        
--++++++    #     average_infer_time = None
--++++++    #     if time_record:
--++++++    #         if len(time_record) > 1:
--++++++    #             time_record.pop(0)
--++++++    #         average_infer_time = sum(time_record) / len(time_record)
--++++++    #         print(f'average inference time is: {average_infer_time}')
--++++++    #         print(f'inference time record: {time_record}')
--++++++        
--++++++    #     if streamer is not None:
--++++++    #         streamer.end()
--++++++        
--++++++    #     # 简单判断：打印是否使用了JIT路径
--++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++++++    #     else:
--++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++++++        
--++++++    #     if return_dict_in_generate:
--++++++    #         if self.config.is_encoder_decoder:
--++++++    #             return GenerateEncoderDecoderOutput(
--++++++    #                 sequences=input_ids,
--++++++    #                 scores=scores,
--++++++    #                 logits=raw_logits,
--++++++    #                 encoder_attentions=encoder_attentions,
--++++++    #                 encoder_hidden_states=encoder_hidden_states,
--++++++    #                 decoder_attentions=decoder_attentions,
--++++++    #                 cross_attentions=cross_attentions,
--++++++    #                 decoder_hidden_states=decoder_hidden_states,
--++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++++    #                 average_infer_time=average_infer_time
--++++++    #             )
--++++++    #         else:
--++++++    #             return GenerateDecoderOnlyOutput(
--++++++    #                 sequences=input_ids,
--++++++    #                 scores=scores,
--++++++    #                 logits=raw_logits,
--++++++    #                 attentions=decoder_attentions,
--++++++    #                 hidden_states=decoder_hidden_states,
--++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++++    #                 average_infer_time=average_infer_time
--++++++    #             )
--++++++    #     else:
--++++++    #         return input_ids
--++++++            
--++++++    # def _prepare_cache_for_generation(
--++++++    #     self,
--++++++    #     generation_config,
--++++++    #     model_kwargs,
--++++++    #     assistant_model,
--++++++    #     batch_size,
--++++++    #     max_cache_length,
--++++++    # ):
--++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++++++    #         generation_config.cache_implementation = "static"
--++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++++++        
--++++++    #     if generation_config.cache_implementation == "static":
--++++++    #         base_required_from_max_length = generation_config.max_length + 1
--++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++++++    #         min_cache_size = 50
--++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++++++    #         else:
--++++++    #             max_cache_length = max(base_required, min_cache_size)
--++++++            
--++++++    #         original_max_cache_length = max_cache_length
--++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++++++            
--++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++++    #             if max_cache_length > self.config.max_position_embeddings:
--++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++++++        
--++++++    #     result = super()._prepare_cache_for_generation(
--++++++    #         generation_config=generation_config,
--++++++    #         model_kwargs=model_kwargs,
--++++++    #         assistant_model=assistant_model,
--++++++    #         batch_size=batch_size,
--++++++    #         max_cache_length=max_cache_length,
--++++++    #     )
--++++++        
--++++++    #     if generation_config.cache_implementation == "static":
--++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++++++    #         created_cache = model_kwargs.get(cache_name)
--++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++++++    #             if created_cache.max_cache_len < generation_config.max_length:
--++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++++++        
--++++++    #     return result
--++++++
--++++++
--++++++
--+++++ 
--+++++ 
--+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++++-- 
--+++++2.27.0
--+++++
--++++-- 
--++++2.27.0
--++++
--+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+++new file mode 100644
--+++index 00000000..966529e4
--+++--- /dev/null
--++++++ b/patches/0003-20261106secondcommit.patch
--+++@@ -0,0 +1,2769 @@
--++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--++++From: Pinoeer-kingxi <13022943007@163.com>
--++++Date: Thu, 6 Nov 2025 14:54:37 +0800
--++++Subject: [PATCH 3/3] 20261106secondcommit
--++++
--++++---
--++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
--++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
--++++ patches/0001-20251104commit.patch             | 1272 -----------------
--++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
--++++ delete mode 100644 patches/0001-20251104commit.patch
--++++
--++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++index 73773c22..2f9192bf 100644
--++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
--++++ 
--++++ _CONFIG_FOR_DOC = "DeepseekConfig"
--++++ 
--+++++_attn_mask_cache = {}
--+++++
--+++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--+++++    q_len = batch_and_seq[1]
--+++++    kv_len = batch_and_seq[1] + past_key_values_length 
--+++++    key = (batch_and_seq[0], q_len, kv_len)
--+++++
--+++++    if key in _attn_mask_cache:
--+++++        return _attn_mask_cache[key]
--+++++
--+++++    mask = _prepare_4d_causal_attention_mask(
--+++++        attention_mask,
--+++++        batch_and_seq,
--+++++        inputs_embeds,
--+++++        past_key_values_length,
--+++++    )
--+++++    _attn_mask_cache[key] = mask
--+++++    return mask
--++++ 
--++++ def _get_unpad_data(attention_mask):
--++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--++++         return final_output
--++++ 
--++++ 
--++++-    @no_grad()
--++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++-        expert_cache = ops.zeros_like(x)
--++++-        idxs = flat_expert_indices.argsort()
--++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++-        token_idxs = idxs // self.num_experts_per_tok
--++++-
--++++-        for i, end_idx in enumerate(tokens_per_expert):
--++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++-            if start_idx == end_idx:
--++++-                continue
--++++-            expert = self.experts[i]
--++++-            exp_token_idx = token_idxs[start_idx:end_idx]
--++++-            expert_tokens = x[exp_token_idx]
--++++-            expert_out = expert(expert_tokens)
--++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++-
--++++-        return expert_cache
--++++-        
--++++     # @no_grad()
--++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-    #     # expert_cache = torch.zeros_like(x)
--++++-    #     # idxs = flat_expert_indices.argsort()
--++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++-    #     # token_idxs = idxs // self.num_experts_per_tok
--++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++-    #     #     if start_idx == end_idx:
--++++-    #     #         continue
--++++-    #     #     expert = self.experts[i]
--++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++-    #     #     expert_tokens = x[exp_token_idx]
--++++-    #     #     expert_out = expert(expert_tokens)
--++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++-    #     # return expert_cache
--+++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++     #     expert_cache = ops.zeros_like(x)
--++++     #     idxs = flat_expert_indices.argsort()
--++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++ 
--++++     #     return expert_cache
--++++-    # @no_grad()
--++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-    #     expert_cache = ops.zeros_like(x)
--+++++        
--+++++    @no_grad()
--+++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++        """
--+++++        优化版 MoE prefill：
--+++++        - 批量张量化处理同一个 expert 的所有 token
--+++++        - 跳过无 token 的专家
--+++++        - 保持结果完全一致
--+++++        """
--+++++        # 初始化输出缓存
--+++++        expert_cache = ops.zeros_like(x)
--++++ 
--++++-    #     # 排序保证顺序一致
--++++-    #     idxs = flat_expert_indices.argsort()
--++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++-    #     token_idxs = idxs // self.num_experts_per_tok
--+++++        # 排序（确保 scatter_add 位置对应原逻辑）
--+++++        idxs = flat_expert_indices.argsort()
--+++++        sorted_expert_indices = flat_expert_indices[idxs]
--+++++        sorted_token_indices = idxs // self.num_experts_per_tok
--++++ 
--++++-    #     # 找出有 token 的专家
--++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++        # 每个 expert 的 token 数
--+++++        tokens_per_expert = sorted_expert_indices.bincount()
--++++ 
--++++-    #     for i in active_experts.tolist():
--++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++-    #         end_idx = tokens_per_expert[i]
--++++-    #         if start_idx == end_idx:  # 没有 token
--++++-    #             continue
--+++++        # 找出有 token 的专家
--+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--++++ 
--++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++-    #         expert_tokens = x[exp_token_idx]
--++++-    #         expert_out = self.experts[i](expert_tokens)
--++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++        for expert_id in active_experts.tolist():
--+++++            # 取该 expert 对应的排序后 token 区间
--+++++            start = (tokens_per_expert[:expert_id]).sum().item()
--+++++            end = start + tokens_per_expert[expert_id].item()
--++++ 
--++++-    #         expert_cache = mindspore.mint.scatter_add(
--++++-    #             expert_cache,
--++++-    #             0,
--++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++-    #             expert_out
--++++-    #         )
--+++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--+++++            expert_tokens = x[token_idx]                     # 取输入向量
--++++ 
--++++-    #     return expert_cache
--+++++            # 执行专家 MLP
--+++++            expert_out = self.experts[expert_id](expert_tokens)
--+++++
--+++++            # 按权重缩放
--+++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--+++++
--+++++            # 回写到缓存（等价 scatter_add）
--+++++            expert_cache = mindspore.mint.scatter_add(
--+++++                expert_cache,
--+++++                0,
--+++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++                scaled_out
--+++++            )
--+++++
--+++++        return expert_cache
--+++++
--+++++        # @no_grad()
--+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++        #     # expert_cache = torch.zeros_like(x)
--+++++        #     # idxs = flat_expert_indices.argsort()
--+++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++        #     # token_idxs = idxs // self.num_experts_per_tok
--+++++        #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++        #     #     if start_idx == end_idx:
--+++++        #     #         continue
--+++++        #     #     expert = self.experts[i]
--+++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++        #     #     expert_tokens = x[exp_token_idx]
--+++++        #     #     expert_out = expert(expert_tokens)
--+++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++        #     # return expert_cache
--+++++        #     expert_cache = ops.zeros_like(x)
--+++++        #     idxs = flat_expert_indices.argsort()
--+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++        #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++        #     for i, end_idx in enumerate(tokens_per_expert):
--+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++        #         if start_idx == end_idx:
--+++++        #             continue
--+++++        #         expert = self.experts[i]
--+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++        #         expert_tokens = x[exp_token_idx]
--+++++        #         expert_out = expert(expert_tokens)
--+++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++
--+++++        #     return expert_cache
--+++++        # @no_grad()
--+++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++        #     expert_cache = ops.zeros_like(x)
--+++++
--+++++        #     # 排序保证顺序一致
--+++++        #     idxs = flat_expert_indices.argsort()
--+++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++        #     token_idxs = idxs // self.num_experts_per_tok
--+++++
--+++++        #     # 找出有 token 的专家
--+++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++
--+++++        #     for i in active_experts.tolist():
--+++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++        #         end_idx = tokens_per_expert[i]
--+++++        #         if start_idx == end_idx:  # 没有 token
--+++++        #             continue
--+++++
--+++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++        #         expert_tokens = x[exp_token_idx]
--+++++        #         expert_out = self.experts[i](expert_tokens)
--+++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++
--+++++        #         expert_cache = mindspore.mint.scatter_add(
--+++++        #             expert_cache,
--+++++        #             0,
--+++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++        #             expert_out
--+++++        #         )
--+++++
--+++++        #     return expert_cache
--++++ 
--++++ 
--++++ 
--++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++-
--++++ # class DeepseekFlashAttention(nn.Module):
--++++ #     """
--++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--+++++
--++++ Deepseek_ATTENTION_CLASSES = {
--++++     "eager": DeepseekAttention,
--++++     "flash-attention": DeepseekFlashAttention,
--++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--++++             )
--++++         else:
--++++             # 4d mask is passed through the layers
--++++-            attention_mask = _prepare_4d_causal_attention_mask(
--+++++            # attention_mask = _prepare_4d_causal_attention_mask(
--+++++            #     attention_mask,
--+++++            #     (batch_size, seq_length),
--+++++            #     inputs_embeds,
--+++++            #     past_key_values_length,
--+++++            # )
--+++++            #@dwj
--+++++            attention_mask = get_cached_causal_mask(
--++++                 attention_mask,
--++++                 (batch_size, seq_length),
--++++                 inputs_embeds,
--++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--++++         self.warm_up = False
--+++++        #@dwj
--+++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--+++++            self.num_layers,
--+++++            self.num_attention_heads,
--+++++            self.head_dim,
--+++++            batch_size=1,
--+++++            max_length=self.max_length,
--+++++            dtype=mindspore.float16
--+++++        )
--+++++
--+++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--+++++        key_cache = []
--+++++        value_cache = []
--+++++        for _ in range(num_layers):
--+++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--+++++            key_cache.append(k)
--+++++            value_cache.append(v)
--+++++        return key_cache, value_cache
--+++++
--++++ 
--++++     def warmup_moe_model_deep(self):
--++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++index bced285c..ebd7782e 100644
--++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
--++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--++++ 
--++++-Long_Prompt = False
--++++-PROMPT_LENGTH_THRESHOLD = 128
--+++++Long_Prompt = 1
--+++++LONG_PROMPT_LENGTH_THRESHOLD = 128
--+++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
--+++++
--+++++_causal_mask_cache = {}
--+++++
--+++++def get_cached_causal_mask_with_cache_position(
--+++++    attention_mask: mindspore.Tensor,
--+++++    sequence_length: int,
--+++++    target_length: int,
--+++++    dtype: mindspore.dtype,
--+++++    min_dtype: float,
--+++++    cache_position: mindspore.Tensor,
--+++++    batch_size: int,
--+++++):
--+++++    """
--+++++    带缓存的 causal mask 构造函数
--+++++    """
--+++++    # q_len 是当前 query 长度
--+++++    q_len = sequence_length
--+++++    # kv_len 是 target_length
--+++++    kv_len = target_length
--+++++
--+++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--+++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--+++++
--+++++    if key in _causal_mask_cache:
--+++++        return _causal_mask_cache[key]
--+++++
--+++++    # 调用原来的 mask 构造逻辑
--+++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++++        attention_mask,
--+++++        sequence_length=sequence_length,
--+++++        target_length=target_length,
--+++++        dtype=dtype,
--+++++        min_dtype=min_dtype,
--+++++        cache_position=cache_position,
--+++++        batch_size=batch_size,
--+++++    )
--+++++    # 缓存结果
--+++++    _causal_mask_cache[key] = causal_mask
--+++++    return causal_mask
--++++ 
--++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--++++ def _prepare_4d_causal_attention_mask_with_cache_position(
--++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++++ 
--++++ 
--++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--+++++# class Qwen2MoeAttention(nn.Module):
--+++++#     """
--+++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+++++#     and "Generating Long Sequences with Sparse Transformers".
--+++++#     """
--+++++
--+++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++#         super().__init__()
--+++++#         self.config = config
--+++++#         self.layer_idx = layer_idx
--+++++#         if layer_idx is None:
--+++++#             logger.warning_once(
--+++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++++#                 "when creating this class."
--+++++#             )
--+++++
--+++++#         self.hidden_size = config.hidden_size
--+++++#         self.num_heads = config.num_attention_heads
--+++++#         self.head_dim = self.hidden_size // self.num_heads
--+++++#         self.num_key_value_heads = config.num_key_value_heads
--+++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++#         self.max_position_embeddings = config.max_position_embeddings
--+++++#         self.rope_theta = config.rope_theta
--+++++#         self.is_causal = True
--+++++#         self.attention_dropout = config.attention_dropout
--+++++
--+++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++#             raise ValueError(
--+++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--+++++#                 f" and `num_heads`: {self.num_heads})."
--+++++#             )
--+++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++
--+++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++#             self.head_dim,
--+++++#             max_position_embeddings=self.max_position_embeddings,
--+++++#             base=self.rope_theta,
--+++++#         )
--+++++
--+++++#     def forward(
--+++++#         self,
--+++++#         hidden_states: mindspore.Tensor,
--+++++#         attention_mask: Optional[mindspore.Tensor] = None,
--+++++#         position_ids: Optional[mindspore.Tensor] = None,
--+++++#         past_key_value: Optional[Cache] = None,
--+++++#         output_attentions: bool = False,
--+++++#         use_cache: bool = False,
--+++++#         cache_position: Optional[mindspore.Tensor] = None,
--+++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++
--+++++        
--+++++
--+++++#         bsz, q_len, _ = hidden_states.shape
--+++++
--+++++#         query_states = self.q_proj(hidden_states)
--+++++#         key_states = self.k_proj(hidden_states)
--+++++#         value_states = self.v_proj(hidden_states)
--+++++
--+++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++
--+++++#         kv_seq_len = key_states.shape[-2]
--+++++#         if past_key_value is not None:
--+++++#             if self.layer_idx is None:
--+++++#                 raise ValueError(
--+++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++#                     "with a layer index."
--+++++#                 )
--+++++#             if isinstance(past_key_value, StaticCache):
--+++++#                 kv_seq_len = key_states.shape[-2]
--+++++#             else:
--+++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++
--+++++#         if past_key_value is not None:
--+++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++            
--+++++#             if isinstance(past_key_value, StaticCache):
--+++++#                 kv_seq_len = key_states.shape[-2]
--+++++
--+++++#         # repeat k/v heads if n_kv_heads < n_heads
--+++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++        
--+++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++
--+++++#         if attention_mask is not None:
--+++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++#             attn_weights = attn_weights + causal_mask
--+++++
--+++++#         # upcast attention to fp32
--+++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++++#         attn_output = ops.matmul(attn_weights, value_states)
--+++++
--+++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++++#             raise ValueError(
--+++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+++++#                 f" {attn_output.shape}"
--+++++#             )
--+++++
--+++++#         attn_output = ops.transpose(attn_output, 1, 2)
--+++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++
--+++++#         attn_output = self.o_proj(attn_output)
--+++++#         # @lwx
--+++++        
--+++++#         # max_seq_len = self.max_position_embeddings  # 2048
--+++++
--+++++#         # if attention_mask is not None:
--+++++#         #     # attention_mask: [B, 1, Sq, Sk]
--+++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++
--+++++#         #     # pad 到 [max_seq_len, max_seq_len]
--+++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++#         #     global_attention_mask = padded_mask
--+++++#         # else:
--+++++#         #     global_attention_mask = None
--+++++
--+++++
--+++++#         # sparse_mode=3
--+++++#         # attn_output = mindspore.ops.flash_attention_score(
--+++++#         #     query=query_states,
--+++++#         #     key=key_states,
--+++++#         #     value=value_states,
--+++++#         #     real_shift=None,
--+++++#         #     padding_mask=None,
--+++++
--+++++#         #     head_num=self.num_heads,
--+++++#         #     attn_mask=global_attention_mask,
--+++++#         #     keep_prob=1.0 - self.attention_dropout,
--+++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++#         #     input_layout="BNSD",
--+++++#         #     pre_tokens=2147483647,
--+++++#         #     next_tokens=2147483647,
--+++++#         #     inner_precise=0,
--+++++#         #     drop_mask=None,
--+++++#         #     prefix=None,
--+++++#         #     actual_seq_qlen=None,
--+++++#         #     actual_seq_kvlen=None,
--+++++#         #     sparse_mode=sparse_mode,
--+++++#         # )
--+++++#         if not output_attentions:
--+++++#             attn_weights = None
--+++++
--+++++#         return attn_output, attn_weights, past_key_value
--+++++
--++++ class Qwen2MoeAttention(nn.Module):
--++++     """
--++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--++++-    and "Generating Long Sequences with Sparse Transformers".
--++++-    """
--+++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
--++++ 
--+++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--+++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--+++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--+++++
--+++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--+++++    """
--++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++         super().__init__()
--++++         self.config = config
--++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--++++         if layer_idx is None:
--++++             logger.warning_once(
--++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++                 "when creating this class."
--++++             )
--++++ 
--++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--++++         use_cache: bool = False,
--++++         cache_position: Optional[mindspore.Tensor] = None,
--++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-
--++++         
--++++-
--+++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--++++         bsz, q_len, _ = hidden_states.shape
--++++ 
--++++         query_states = self.q_proj(hidden_states)
--++++         key_states = self.k_proj(hidden_states)
--++++         value_states = self.v_proj(hidden_states)
--++++ 
--++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++-
--+++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++        
--++++         kv_seq_len = key_states.shape[-2]
--++++         if past_key_value is not None:
--++++-            if self.layer_idx is None:
--++++-                raise ValueError(
--++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-                    "with a layer index."
--++++-                )
--++++-            if isinstance(past_key_value, StaticCache):
--++++-                kv_seq_len = key_states.shape[-2]
--++++-            else:
--++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++        
--++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++ 
--++++         if past_key_value is not None:
--++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++
--+++++        # --- 2. 动态调度核心注意力计算 ---
--+++++        global Long_Prompt
--+++++        if Long_Prompt >= 1:
--+++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--+++++            fa_attention_mask = None
--+++++            if attention_mask is not None:
--+++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++                fa_attention_mask = (mask_slice != 0)
--+++++
--+++++            attn_output = mindspore.ops.flash_attention_score(
--+++++                query=query_states,
--+++++                key=key_states,
--+++++                value=value_states,
--+++++                head_num=self.num_heads,
--+++++                attn_mask=fa_attention_mask,
--+++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--+++++                scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++                input_layout="BNSD",
--+++++                sparse_mode=0,
--+++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--+++++            )
--++++             
--++++-            if isinstance(past_key_value, StaticCache):
--++++-                kv_seq_len = key_states.shape[-2]
--+++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++            attn_output = self.o_proj(attn_output)
--+++++            attn_weights = None
--+++++            if output_attentions:
--+++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--++++ 
--++++-        # repeat k/v heads if n_kv_heads < n_heads
--++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++-        
--++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++        else:
--+++++            # --- Eager Attention 路径 (用于短序列和解码) ---
--+++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++            
--+++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++ 
--++++-        if attention_mask is not None:
--++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++-            attn_weights = attn_weights + causal_mask
--+++++            if attention_mask is not None:
--+++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++                attn_weights = attn_weights + causal_mask
--++++ 
--++++-        # upcast attention to fp32
--++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++++-        attn_output = ops.matmul(attn_weights, value_states)
--+++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++++            attn_output = ops.matmul(attn_weights, value_states)
--++++ 
--++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++++-            raise ValueError(
--++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--++++-                f" {attn_output.shape}"
--++++-            )
--+++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++++                raise ValueError(
--+++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--+++++                )
--++++ 
--++++-        attn_output = ops.transpose(attn_output, 1, 2)
--++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++            attn_output = ops.transpose(attn_output, 1, 2)
--+++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++            attn_output = self.o_proj(attn_output)
--++++ 
--++++-        attn_output = self.o_proj(attn_output)
--++++-        # @lwx
--+++++            if not output_attentions:
--+++++                attn_weights = None
--++++         
--++++-        # max_seq_len = self.max_position_embeddings  # 2048
--++++-
--++++-        # if attention_mask is not None:
--++++-        #     # attention_mask: [B, 1, Sq, Sk]
--++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++-
--++++-        #     # pad 到 [max_seq_len, max_seq_len]
--++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++-        #     global_attention_mask = padded_mask
--++++-        # else:
--++++-        #     global_attention_mask = None
--++++-
--++++-
--++++-        # sparse_mode=3
--++++-        # attn_output = mindspore.ops.flash_attention_score(
--++++-        #     query=query_states,
--++++-        #     key=key_states,
--++++-        #     value=value_states,
--++++-        #     real_shift=None,
--++++-        #     padding_mask=None,
--++++-
--++++-        #     head_num=self.num_heads,
--++++-        #     attn_mask=global_attention_mask,
--++++-        #     keep_prob=1.0 - self.attention_dropout,
--++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-        #     input_layout="BNSD",
--++++-        #     pre_tokens=2147483647,
--++++-        #     next_tokens=2147483647,
--++++-        #     inner_precise=0,
--++++-        #     drop_mask=None,
--++++-        #     prefix=None,
--++++-        #     actual_seq_qlen=None,
--++++-        #     actual_seq_kvlen=None,
--++++-        #     sparse_mode=sparse_mode,
--++++-        # )
--++++-        if not output_attentions:
--++++-            attn_weights = None
--++++-
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++-
--++++ # class Qwen2MoeFlashAttention(nn.Module):
--++++ #     """
--++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
--++++ #             return final_hidden_states, router_logits
--++++ 
--++++ 
--++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-#     """
--++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
--++++-#     """
--++++-#     def __init__(self, config: Qwen2MoeConfig):
--++++-#         super().__init__()
--++++-#         self.num_experts = config.num_experts
--++++-#         self.top_k = config.num_experts_per_tok
--++++-#         self.norm_topk_prob = config.norm_topk_prob
--++++-
--++++-#         # 门控网络
--++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++-#         # 专家列表
--++++-#         self.experts = nn.ModuleList(
--++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++-#         )
--++++-#         # 共享专家
--++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_decode(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         """
--++++-#         【解码路径】针对 sequence_length=1 的极致优化。
--++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--++++-#         """
--++++-#         batch_size, hidden_dim = hidden_states.shape
--++++-        
--++++-#         expert_outputs_list = [
--++++-#             ops.cat([
--++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++-#             ], dim=0) 
--++++-#             for i in range(batch_size)
--++++-#         ]
--++++-        
--++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--++++-#         # shape: (batch_size, top_k, hidden_dim)
--++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++-        
--++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++-        
--++++-#         return moe_output.squeeze(1)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_prefill(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         """
--++++-#         【预填充路径】针对 sequence_length > 1 的优化。
--++++-#         按专家对 Token 进行分组，并进行批处理。
--++++-#         """
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens = hidden_states.shape[0]
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-        
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-        
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-        
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-            
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-#             selected_token_indices = token_indices[mask]
--++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++++-            
--++++-#             current_states = hidden_states[selected_token_indices]
--++++-            
--++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-            
--++++-#             moe_output = moe_output.index_add(
--++++-#                 dim=0,
--++++-#                 index=selected_token_indices,
--++++-#                 source=expert_output.to(hidden_states.dtype)
--++++-#             )
--++++-#         return moe_output
--++++-
--++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-#         """
--++++-#         顶层 forward 方法，作为智能分发器。
--++++-#         """
--++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        
--++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-#         router_logits = self.gate(hidden_states_reshaped)
--++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-
--++++-#         if self.norm_topk_prob:
--++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        
--++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++-        
--++++-#         moe_output = None
--++++-#         # 在推理时，根据序列长度选择最优路径
--++++-#         if not self.training:
--++++-#             if sequence_length == 1:
--++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++-#             else:
--++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++-#         else:
--++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--++++-#             raise NotImplementedError("Training path is not implemented.")
--++++-
--++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--++++-        
--++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--++++-        
--++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--++++-        
--++++-#         return final_hidden_states, router_logits
--++++-
--++++-
--++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-#     """
--++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--++++-#     """
--++++-#     def __init__(self, config: Qwen2MoeConfig):
--++++-#         super().__init__()
--++++-#         self.num_experts = config.num_experts
--++++-#         self.top_k = config.num_experts_per_tok
--++++-#         self.norm_topk_prob = config.norm_topk_prob
--++++-
--++++-#         # 门控网络
--++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++-#         # 专家列表
--++++-#         self.experts = nn.ModuleList(
--++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++-#         )
--++++-#         # 共享专家
--++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_decode(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         batch_size, _ = hidden_states.shape
--++++-#         expert_outputs_list = [
--++++-#             ops.cat([
--++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++-#             ], dim=0) 
--++++-#             for i in range(batch_size)
--++++-#         ]
--++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++-#         return moe_output.squeeze(1)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_prefill(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens = hidden_states.shape[0]
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-        
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-#             selected_token_indices = token_indices[mask]
--++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++++-#             current_states = hidden_states[selected_token_indices]
--++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-#             moe_output = moe_output.index_add(
--++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++-#             )
--++++-#         return moe_output
--++++-
--++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-#         """
--++++-#         顶层 forward 方法，作为智能分发器。
--++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--++++-#         """
--++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        
--++++-#         # 1. 门控计算 (通用逻辑)
--++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-#         router_logits = self.gate(hidden_states_reshaped)
--++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-
--++++-#         if self.norm_topk_prob:
--++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        
--++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++-        
--++++-#         # 2. 智能分发到最优 MoE 路径
--++++-#         moe_output = None
--++++-#         if not self.training:
--++++-#             if sequence_length == 1:
--++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++-#             else:
--++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++-#         else:
--++++-#             raise NotImplementedError("Training path is not implemented.")
--++++-
--++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++-        
--++++-#         # 4. 合并 MoE 输出和共享专家输出
--++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++-        
--++++-#         # 5. 恢复原始形状并返回
--++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++-        
--++++-#         return final_hidden_states, router_logits
--++++-
--++++-# prefill fastest
--++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-#     """
--++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--++++-#     """
--++++-#     def __init__(self, config: Qwen2MoeConfig):
--++++-#         super().__init__()
--++++-#         self.num_experts = config.num_experts
--++++-#         self.top_k = config.num_experts_per_tok
--++++-#         self.norm_topk_prob = config.norm_topk_prob
--++++-
--++++-#         # 门控网络
--++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++-#         # 专家列表
--++++-#         self.experts = nn.ModuleList(
--++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++-#         )
--++++-#         # 共享专家
--++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_dispatch(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         """
--++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--++++-#         """
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens, _ = hidden_states.shape
--++++-        
--++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-#         flat_routing_weights = routing_weights.flatten()
--++++-
--++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-
--++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-        
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-            
--++++-#             # 找到所有分配给该专家的 token
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-            
--++++-#             # 使用 mask 选取对应的 token 和权重
--++++-#             current_token_indices = token_indices[mask]
--++++-#             current_routing_weights = flat_routing_weights[mask]
--++++-#             current_hidden_states = hidden_states[current_token_indices]
--++++-            
--++++-#             # 对这些 token 进行批处理
--++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++-            
--++++-#             # 使用 index_add 将结果精确地加回到对应位置
--++++-#             moe_output = moe_output.index_add(
--++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--++++-#             )
--++++-#         return moe_output
--++++-
--++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-#         """
--++++-#         顶层 forward 方法，作为智能分发器。
--++++-#         """
--++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        
--++++-#         # 1. 门控计算
--++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-#         router_logits = self.gate(hidden_states_reshaped)
--++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-
--++++-#         if self.norm_topk_prob:
--++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        
--++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++-        
--++++-#         # 2. 调用统一的 MoE 计算内核
--++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--++++-
--++++-#         # 3. 统一处理共享专家
--++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++-        
--++++-#         # 4. 合并输出
--++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++-        
--++++-#         # 5. 恢复原始形状并返回
--++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++-        
--++++-#         return final_hidden_states, router_logits
--++++-
--++++-
--++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-#     """
--++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++-#     【最终高性能与高精度版】：
--++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--++++-#     3. 这样实现了速度和准确性的两全其美。
--++++-#     """
--++++-#     def __init__(self, config: Qwen2MoeConfig):
--++++-#         super().__init__()
--++++-#         self.num_experts = config.num_experts
--++++-#         self.top_k = config.num_experts_per_tok
--++++-#         self.norm_topk_prob = config.norm_topk_prob
--++++-
--++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++-#         self.experts = nn.ModuleList(
--++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++-#         )
--++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_decode(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         """
--++++-#         【解码路径】极致优化版：bmm + 高精度累加。
--++++-#         """
--++++-#         original_dtype = hidden_states.dtype
--++++-#         batch_size, _ = hidden_states.shape
--++++-        
--++++-#         expert_outputs_list = [
--++++-#             ops.cat([
--++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++-#             ], dim=0) 
--++++-#             for i in range(batch_size)
--++++-#         ]
--++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++-
--++++-#         # 在 float32 下执行 bmm，得到高精度结果
--++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++-        
--++++-#         # 将高精度结果转换回原始数据类型
--++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--++++-        
--++++-#         return moe_output
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_prefill(
--++++-#         self, 
--++++-#         hidden_states: mindspore.Tensor, 
--++++-#         selected_experts: mindspore.Tensor, 
--++++-#         routing_weights: mindspore.Tensor
--++++-#     ) -> mindspore.Tensor:
--++++-#         """
--++++-#         【预填充路径】与原始实现一致，结果精确。
--++++-#         """
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens, _ = hidden_states.shape
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-        
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-#             selected_token_indices = token_indices[mask]
--++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++++-#             current_states = hidden_states[selected_token_indices]
--++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-#             moe_output = moe_output.index_add(
--++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++-#             )
--++++-#         return moe_output
--++++-
--++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-        
--++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-#         router_logits = self.gate(hidden_states_reshaped)
--++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-
--++++-#         if self.norm_topk_prob:
--++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        
--++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--++++-#         # 如果模型主体是 float16，后续再转换
--++++-        
--++++-#         moe_output = None
--++++-#         if not self.training:
--++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--++++-#             # _moe_infer_decode 内部会处理好类型转换
--++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--++++-#             if sequence_length == 1:
--++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++-#             else:
--++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++-#         else:
--++++-#             raise NotImplementedError("Training path is not implemented.")
--++++-
--++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++-        
--++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++-        
--++++-#         return final_hidden_states, router_logits
--++++-    
--++++-
--++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-#     """
--++++-#     【融合版】一个混合专家模块，内置两种推理策略，
--++++-#     由外部全局变量 `Long_Prompt` 控制：
--++++-
--++++-#     - if Long_Prompt is True:  【精度优先模式】
--++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--++++-#       适用于处理长序列，避免误差累积。
--++++-
--++++-#     - if Long_Prompt is False: 【速度优先模式】
--++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
--++++-#     """
--++++-#     def __init__(self, config: Qwen2MoeConfig):
--++++-#         super().__init__()
--++++-#         self.num_experts = config.num_experts
--++++-#         self.top_k = config.num_experts_per_tok
--++++-#         self.norm_topk_prob = config.norm_topk_prob
--++++-
--++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++-#         self.experts = nn.ModuleList(
--++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++-#         )
--++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++-
--++++-#     # --- 速度优先模式的辅助函数 ---
--++++-#     @no_grad()
--++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++-#         original_dtype = hidden_states.dtype
--++++-#         batch_size, _ = hidden_states.shape
--++++-#         expert_outputs_list = [
--++++-#             ops.cat([
--++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++-#             ], dim=0) 
--++++-#             for i in range(batch_size)
--++++-#         ]
--++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
--++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
--++++-
--++++-#     @no_grad()
--++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens, _ = hidden_states.shape
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-#             selected_token_indices = token_indices[mask]
--++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--++++-#             current_states = hidden_states[selected_token_indices]
--++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--++++-#         return moe_output
--++++-
--++++-#     # --- 精度优先模式的辅助函数 ---
--++++-#     @no_grad()
--++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++-#         moe_output = ops.zeros_like(hidden_states)
--++++-#         num_tokens, _ = hidden_states.shape
--++++-#         flat_selected_experts = selected_experts.flatten()
--++++-#         flat_routing_weights = routing_weights.flatten()
--++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++-#         active_experts = ops.unique(flat_selected_experts)
--++++-#         for expert_idx_tensor in active_experts:
--++++-#             expert_idx = expert_idx_tensor.item()
--++++-#             expert_layer = self.experts[expert_idx]
--++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--++++-#             current_token_indices = token_indices[mask]
--++++-#             current_routing_weights = flat_routing_weights[mask]
--++++-#             current_hidden_states = hidden_states[current_token_indices]
--++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++++-#         return moe_output
--++++-
--++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
--++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--++++-#         global Long_Prompt
--++++-
--++++-#         # 1. 门控计算 (所有模式通用)
--++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-#         router_logits = self.gate(hidden_states_reshaped)
--++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++++-#         if self.norm_topk_prob:
--++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-        
--++++-#         moe_output = None
--++++-#         if not self.training:
--++++-#             # 根据 Long_Prompt 标志选择模式
--++++-#             if Long_Prompt:
--++++-#                 # --- 精度优先模式 ---
--++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++-#             else:
--++++-#                 # --- 速度优先模式 ---
--++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++-#                 if sequence_length == 1:
--++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++-#                 else:
--++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++-#         else:
--++++-#             raise NotImplementedError("Training path is not implemented.")
--++++-
--++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++-        
--++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++-        
--++++-#         return final_hidden_states, router_logits
--++++-    
--++++ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++     """
--++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++         return moe_output_fp32.squeeze(1).to(original_dtype)
--++++ 
--+++++    # @no_grad()
--+++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++    #     num_tokens, _ = hidden_states.shape
--+++++    #     flat_selected_experts = selected_experts.flatten()
--+++++    #     sorted_expert_indices = flat_selected_experts.argsort()
--+++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++++    #     original_token_indices = sorted_expert_indices // self.top_k
--+++++    #     moe_output = ops.zeros_like(hidden_states)
--+++++    #     current_token_offset = 0
--+++++    #     for i in range(self.num_experts):
--+++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--+++++    #         if expert_token_count == 0:
--+++++    #             continue
--+++++    #         end_offset = current_token_offset + expert_token_count
--+++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--+++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++    #         current_token_offset += expert_token_count
--+++++    #     return moe_output
--+++++
--++++     @no_grad()
--++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++-        num_tokens, _ = hidden_states.shape
--++++-        flat_selected_experts = selected_experts.flatten()
--++++-        sorted_expert_indices = flat_selected_experts.argsort()
--++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++++-        original_token_indices = sorted_expert_indices // self.top_k
--+++++        """
--+++++        优化版 MoE prefill (速度优先模式)：
--+++++        - 批量张量化处理同一个 expert 的所有 token
--+++++        - 跳过无 token 的专家
--+++++        - 保持结果完全一致
--+++++        """
--++++         moe_output = ops.zeros_like(hidden_states)
--++++-        current_token_offset = 0
--++++-        for i in range(self.num_experts):
--++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
--++++-            if expert_token_count == 0:
--++++-                continue
--++++-            end_offset = current_token_offset + expert_token_count
--++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
--++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++++-            current_token_offset += expert_token_count
--+++++
--+++++        flat_selected_experts = selected_experts.flatten()
--+++++        flat_routing_weights = routing_weights.flatten()
--+++++
--+++++        idxs = flat_selected_experts.argsort()
--+++++        sorted_expert_indices = flat_selected_experts[idxs]
--+++++        sorted_token_indices = idxs // self.top_k
--+++++
--+++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--+++++
--+++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+++++
--+++++        for expert_id in active_experts.tolist():
--+++++            start = int(tokens_per_expert[:expert_id].sum().item())
--+++++            end = start + int(tokens_per_expert[expert_id].item())
--+++++
--+++++            token_idx = sorted_token_indices[start:end]
--+++++            expert_tokens = hidden_states[token_idx]
--+++++
--+++++            expert_out = self.experts[expert_id](expert_tokens)
--+++++
--+++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--+++++
--+++++            moe_output = mindspore.mint.scatter_add(
--+++++                moe_output,
--+++++                0,
--+++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--+++++                scaled_out.to(hidden_states.dtype)
--+++++            )
--+++++
--++++         return moe_output
--++++ 
--+++++
--++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--++++     @no_grad()
--++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++         
--++++         moe_output = None
--++++-        if Long_Prompt:
--++++-            # --- 精度优先模式 (ACCURACY MODE) ---
--++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        # if Long_Prompt==0:
--+++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
--+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        # else:
--+++++        #     # --- 速度优先模式 (SPEED MODE) ---
--+++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++        #     if sequence_length == 1:
--+++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        #     else:
--+++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++        
--+++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++        if sequence_length == 1:
--+++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++         else:
--++++-            # --- 速度优先模式 (SPEED MODE) ---
--++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++-            if sequence_length == 1:
--++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++-            else:
--++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++-        
--+++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++    
--++++ 
--++++         # 3. 共享专家计算与合并 (所有模式通用)
--++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++         
--++++         return final_hidden_states, router_logits
--++++ 
--+++++
--++++ class Qwen2MoeDecoderLayer(nn.Module):
--++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--++++         super().__init__()
--++++         self.hidden_size = config.hidden_size
--++++         
--++++-        # if Long_Prompt:
--++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++-        # else:
--+++++        # if Long_Prompt == 2:
--++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++        # else:
--+++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++ 
--++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++ 
--++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++++             )
--++++ 
--++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++++        #     attention_mask,
--+++++        #     sequence_length=sequence_length,
--+++++        #     target_length=target_length,
--+++++        #     dtype=dtype,
--+++++        #     min_dtype=min_dtype,
--+++++        #     cache_position=cache_position,
--+++++        #     batch_size=input_tensor.shape[0],
--+++++        # )
--+++++        #@dwj
--+++++        causal_mask = get_cached_causal_mask_with_cache_position(
--++++             attention_mask,
--++++             sequence_length=sequence_length,
--++++             target_length=target_length,
--++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--++++         """
--++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--+++++        _causal_mask_cache.clear()
--++++ 
--++++         input_ids = kwargs.get("input_ids")
--++++         if input_ids is None and args:
--++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++ 
--++++         if input_ids is not None:
--++++             prompt_length = input_ids.shape[1]
--++++-            
--++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--++++-                Long_Prompt = True
--+++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--+++++                Long_Prompt = 2
--+++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--+++++                Long_Prompt = 0
--++++             else:
--++++-                Long_Prompt = False
--+++++                Long_Prompt = 1
--+++++
--++++ 
--++++         return super().generate(*args, **kwargs)
--++++     
--++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++             dtype = self.lm_head.weight.dtype
--++++             min_dtype = float(ops.finfo(dtype).min)
--++++ 
--++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--+++++            #     attention_mask,
--+++++            #     sequence_length=sequence_length,
--+++++            #     target_length=past_key_values.get_max_length(),
--+++++            #     dtype=dtype,
--+++++            #     min_dtype=min_dtype,
--+++++            #     cache_position=cache_position,
--+++++            #     batch_size=batch_size,
--+++++            # )
--+++++
--+++++            #@dwj
--+++++            attention_mask = get_cached_causal_mask_with_cache_position(
--++++                 attention_mask,
--++++                 sequence_length=sequence_length,
--++++                 target_length=past_key_values.get_max_length(),
--++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++++deleted file mode 100644
--++++index 6dfb5b93..00000000
--++++--- a/patches/0001-20251104commit.patch
--+++++++ /dev/null
--++++@@ -1,1272 +0,0 @@
--++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++++-From: Pinoeer-kingxi <13022943007@163.com>
--++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
--++++-Subject: [PATCH] 20251104commit
--++++-
--++++----
--++++- mindnlp/transformers/cache_utils.py           |  28 +-
--++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++++- 3 files changed, 976 insertions(+), 87 deletions(-)
--++++-
--++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++++-index cadd2e04..02f8d4be 100644
--++++---- a/mindnlp/transformers/cache_utils.py
--++++-+++ b/mindnlp/transformers/cache_utils.py
--++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++++-             # k_out[:, :, cache_position] = key_states
--++++-             # v_out[:, :, cache_position] = value_states
--++++--            if ON_ORANGE_PI:
--++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++--            else:
--++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++--
--++++-+            # if ON_ORANGE_PI:
--++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++-+            # else:
--++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
--++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++++-+            if cache_position.ndim > 1:
--++++-+                cache_position = cache_position.flatten()
--++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
--++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++++-+                cache_position = cache_position.int()
--++++-+            
--++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++++-+            k_out[:, :, cache_position] = key_states
--++++-+            v_out[:, :, cache_position] = value_states
--++++-+            
--++++-         return k_out, v_out
--++++- 
--++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++-index c695b944..d8303e45 100644
--++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
--++++- def rotate_half(x):
--++++-     """Rotates half the hidden dims of the input."""
--++++--    x1 = x[..., : x.shape[-1] // 2]
--++++--    x2 = x[..., x.shape[-1] // 2 :]
--++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++-+    # x1 = x[..., : x.shape[-1] // 2]
--++++-+    # x2 = x[..., x.shape[-1] // 2 :]
--++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++-     return ops.cat((-x2, x1), dim=-1)
--++++- 
--++++- 
--++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++++-         if self.training:
--++++-             raise NotImplementedError("Training is not supported yet.")
--++++-         else:
--++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++--        if self.config.n_shared_experts is not None:
--++++--            y = y + self.shared_experts(identity)
--++++--        return y
--++++-+            # @lwx
--++++-+            if orig_shape[1] == 1:
--++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++++-+                y=y.view(*orig_shape)
--++++-+                if self.config.n_shared_experts is not None:
--++++-+                    y = y + self.shared_experts(identity)
--++++-+                return y
--++++-+            else:
--++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++++-+                if self.config.n_shared_experts is not None:
--++++-+                    y = y + self.shared_experts(identity)
--++++-+                return y
--++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++-+        # if self.config.n_shared_experts is not None:
--++++-+        #     y = y + self.shared_experts(identity)
--++++-+        # return y
--++++-+        
--++++-+    @no_grad()
--++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++-+
--++++-+        expert_cache = ops.zeros_like(x)
--++++-+        for i in range(self.num_experts_per_tok):
--++++-+            expert_id = flat_expert_indices[i].item()
--++++-+            weight = flat_expert_weights[i].item()
--++++-+            expert = self.experts[expert_id]
--++++-+            expert_out = expert(x)
--++++-+            expert_cache += expert_out * weight
--++++-+        return expert_cache
--++++- 
--++++-     @no_grad()
--++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++--        # expert_cache = torch.zeros_like(x)
--++++--        # idxs = flat_expert_indices.argsort()
--++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++--        # token_idxs = idxs // self.num_experts_per_tok
--++++--        # for i, end_idx in enumerate(tokens_per_expert):
--++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++--        #     if start_idx == end_idx:
--++++--        #         continue
--++++--        #     expert = self.experts[i]
--++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++--        #     expert_tokens = x[exp_token_idx]
--++++--        #     expert_out = expert(expert_tokens)
--++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++--        # return expert_cache
--++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++-         expert_cache = ops.zeros_like(x)
--++++-         idxs = flat_expert_indices.argsort()
--++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++-         token_idxs = idxs // self.num_experts_per_tok
--++++-+
--++++-         for i, end_idx in enumerate(tokens_per_expert):
--++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++-             if start_idx == end_idx:
--++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++++-             expert_out = expert(expert_tokens)
--++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++-+
--++++-         return expert_cache
--++++-+        
--++++-+    # @no_grad()
--++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-+    #     # expert_cache = torch.zeros_like(x)
--++++-+    #     # idxs = flat_expert_indices.argsort()
--++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
--++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++-+    #     #     if start_idx == end_idx:
--++++-+    #     #         continue
--++++-+    #     #     expert = self.experts[i]
--++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++-+    #     #     expert_tokens = x[exp_token_idx]
--++++-+    #     #     expert_out = expert(expert_tokens)
--++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++-+    #     # return expert_cache
--++++-+    #     expert_cache = ops.zeros_like(x)
--++++-+    #     idxs = flat_expert_indices.argsort()
--++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++-+    #     token_idxs = idxs // self.num_experts_per_tok
--++++-+
--++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
--++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++-+    #         if start_idx == end_idx:
--++++-+    #             continue
--++++-+    #         expert = self.experts[i]
--++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++-+    #         expert_tokens = x[exp_token_idx]
--++++-+    #         expert_out = expert(expert_tokens)
--++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++-+
--++++-+    #     return expert_cache
--++++-+    # @no_grad()
--++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++-+    #     expert_cache = ops.zeros_like(x)
--++++-+
--++++-+    #     # 排序保证顺序一致
--++++-+    #     idxs = flat_expert_indices.argsort()
--++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++-+    #     token_idxs = idxs // self.num_experts_per_tok
--++++-+
--++++-+    #     # 找出有 token 的专家
--++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++-+
--++++-+    #     for i in active_experts.tolist():
--++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++-+    #         end_idx = tokens_per_expert[i]
--++++-+    #         if start_idx == end_idx:  # 没有 token
--++++-+    #             continue
--++++-+
--++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++-+    #         expert_tokens = x[exp_token_idx]
--++++-+    #         expert_out = self.experts[i](expert_tokens)
--++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++-+
--++++-+    #         expert_cache = mindspore.mint.scatter_add(
--++++-+    #             expert_cache,
--++++-+    #             0,
--++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++-+    #             expert_out
--++++-+    #         )
--++++-+
--++++-+    #     return expert_cache
--++++-+
--++++-+
--++++- 
--++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++++- #     """
--++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++- 
--++++-         # Initialize weights and apply final processing
--++++-         self.post_init()
--++++-+        self.warm_up = False
--++++-+
--++++-+    def warmup_moe_model_deep(self):
--++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++-+        test_texts = [
--++++-+            "warmup short",
--++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++++-+        ]
--++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++-+        if tokenizer is None:
--++++-+            from mindnlp.transformers import AutoTokenizer
--++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++-+            self._warmup_tokenizer = tokenizer
--++++-+
--++++-+        for text in test_texts:
--++++-+            inputs = tokenizer(text, return_tensors="ms")
--++++-+            with mindspore._no_grad():
--++++-+                _ = self(**inputs, use_cache=False)
--++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++++- 
--++++-     def get_input_embeddings(self):
--++++-         return self.model.embed_tokens
--++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++-         ```"""
--++++-+        if not self.warm_up:
--++++-+            self.warm_up = True
--++++-+            self.warmup_moe_model_deep()
--++++-+
--++++-         output_attentions = (
--++++-             output_attentions
--++++-             if output_attentions is not None
--++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++-index 3cbf820e..d4c6b651 100644
--++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++-@@ -18,7 +18,6 @@
--++++- # See the License for the specific language governing permissions and
--++++- # limitations under the License.
--++++- """MindSpore Qwen2MoE model."""
--++++--
--++++- import math
--++++- from typing import List, Optional, Tuple, Union
--++++- 
--++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++++-     TokenClassifierOutput,
--++++- )
--++++- from ...modeling_utils import PreTrainedModel
--++++-+from ...generation import GenerationMixin
--++++- from ....utils import logging
--++++- from .configuration_qwen2_moe import Qwen2MoeConfig
--++++- 
--++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++++-         self.variance_epsilon = eps
--++++- 
--++++-     def forward(self, hidden_states):
--++++-+        # @dwj
--++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++-+        # @lwx
--++++-+        # if not self.training :
--++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++-         input_dtype = hidden_states.dtype
--++++-         hidden_states = hidden_states.to(mindspore.float32)
--++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++++-@@ -234,6 +239,8 @@ def rotate_half(x):
--++++-     """Rotates half the hidden dims of the input."""
--++++-     x1 = x[..., : x.shape[-1] // 2]
--++++-     x2 = x[..., x.shape[-1] // 2 :]
--++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++-     return ops.cat((-x2, x1), dim=-1)
--++++- 
--++++- 
--++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++++-         self.config = config
--++++-         self.hidden_size = config.hidden_size
--++++-         self.intermediate_size = intermediate_size
--++++-+        
--++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++++-         self.act_fn = ACT2FN[config.hidden_act]
--++++- 
--++++-     def forward(self, x):
--++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++--
--++++- 
--++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++-+        # @lwx 
--++++-+        # gate_up_output = self.gate_up_proj(x)
--++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++++-+        # return self.down_proj(swiglu_output)
--++++-+
--++++-+    # def forward(self, x):
--++++-+    #     gate_proj_out = self.gate_proj(x)
--++++-+    #     up_proj_out = self.up_proj(x)
--++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++++-+    #     return self.down_proj(swiglu_out)
--++++-+        
--++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++++-     """
--++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++++-         use_cache: bool = False,
--++++-         cache_position: Optional[mindspore.Tensor] = None,
--++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-+
--++++-+        
--++++-+
--++++-         bsz, q_len, _ = hidden_states.shape
--++++- 
--++++-         query_states = self.q_proj(hidden_states)
--++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-                     "with a layer index."
--++++-                 )
--++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-+            if isinstance(past_key_value, StaticCache):
--++++-+                kv_seq_len = key_states.shape[-2]
--++++-+            else:
--++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++- 
--++++-         if past_key_value is not None:
--++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++-+            
--++++-+            if isinstance(past_key_value, StaticCache):
--++++-+                kv_seq_len = key_states.shape[-2]
--++++- 
--++++-         # repeat k/v heads if n_kv_heads < n_heads
--++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++--
--++++-+        
--++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++- 
--++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++++--            raise ValueError(
--++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++++--                f" {attn_weights.shape}"
--++++--            )
--++++--
--++++--        if attention_mask is not None:  # no matter the length, we just slice it
--++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++++-+        if attention_mask is not None:
--++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++-             attn_weights = attn_weights + causal_mask
--++++- 
--++++-         # upcast attention to fp32
--++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++- 
--++++-         attn_output = self.o_proj(attn_output)
--++++--
--++++-+        # @lwx
--++++-+        
--++++-+        # max_seq_len = self.max_position_embeddings  # 2048
--++++-+
--++++-+        # if attention_mask is not None:
--++++-+        #     # attention_mask: [B, 1, Sq, Sk]
--++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++-+
--++++-+        #     # pad 到 [max_seq_len, max_seq_len]
--++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++-+        #     global_attention_mask = padded_mask
--++++-+        # else:
--++++-+        #     global_attention_mask = None
--++++-+
--++++-+
--++++-+        # sparse_mode=3
--++++-+        # attn_output = mindspore.ops.flash_attention_score(
--++++-+        #     query=query_states,
--++++-+        #     key=key_states,
--++++-+        #     value=value_states,
--++++-+        #     real_shift=None,
--++++-+        #     padding_mask=None,
--++++-+
--++++-+        #     head_num=self.num_heads,
--++++-+        #     attn_mask=global_attention_mask,
--++++-+        #     keep_prob=1.0 - self.attention_dropout,
--++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-+        #     input_layout="BNSD",
--++++-+        #     pre_tokens=2147483647,
--++++-+        #     next_tokens=2147483647,
--++++-+        #     inner_precise=0,
--++++-+        #     drop_mask=None,
--++++-+        #     prefix=None,
--++++-+        #     actual_seq_qlen=None,
--++++-+        #     actual_seq_kvlen=None,
--++++-+        #     sparse_mode=sparse_mode,
--++++-+        # )
--++++-         if not output_attentions:
--++++-             attn_weights = None
--++++- 
--++++-         return attn_output, attn_weights, past_key_value
--++++- 
--++++- 
--++++-+class Qwen2MoeFlashAttention(nn.Module):
--++++-+    """
--++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++-+
--++++-+    关键改动:
--++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++-+       直接传入原始的 key 和 value 张量效率更高。
--++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++-+    """
--++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++-+        super().__init__()
--++++-+        self.config = config
--++++-+        self.layer_idx = layer_idx
--++++-+        self.hidden_size = config.hidden_size
--++++-+        self.num_heads = config.num_attention_heads
--++++-+        self.head_dim = self.hidden_size // self.num_heads
--++++-+        self.num_key_value_heads = config.num_key_value_heads
--++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++-+        self.max_position_embeddings = config.max_position_embeddings
--++++-+        self.rope_theta = config.rope_theta
--++++-+        self.attention_dropout = config.attention_dropout
--++++-+
--++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++-+            raise ValueError(
--++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++-+            )
--++++-+
--++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++-+
--++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++-+            self.head_dim,
--++++-+            max_position_embeddings=self.max_position_embeddings,
--++++-+            base=self.rope_theta,
--++++-+        )
--++++-+
--++++-+    def forward(
--++++-+        self,
--++++-+        hidden_states: mindspore.Tensor,
--++++-+        attention_mask: Optional[mindspore.Tensor] = None,
--++++-+        position_ids: Optional[mindspore.Tensor] = None,
--++++-+        past_key_value: Optional[Cache] = None,
--++++-+        output_attentions: bool = False,
--++++-+        use_cache: bool = False,
--++++-+        cache_position: Optional[mindspore.Tensor] = None,
--++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-+
--++++-+        bsz, q_len, _ = hidden_states.shape
--++++-+
--++++-+        # 1. 线性投射 Q, K, V
--++++-+        query_states = self.q_proj(hidden_states)
--++++-+        key_states = self.k_proj(hidden_states)
--++++-+        value_states = self.v_proj(hidden_states)
--++++-+
--++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+
--++++-+        # 3. RoPE 旋转位置编码
--++++-+        kv_seq_len = key_states.shape[-2]
--++++-+        if past_key_value is not None:
--++++-+            if self.layer_idx is None:
--++++-+                raise ValueError(
--++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-+                    "with a layer index."
--++++-+                )
--++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++-+                if cache_position.shape[0] == 1:
--++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++-+                    kv_seq_len = past_seen_tokens + 1
--++++-+                else:
--++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
--++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++-+            else:
--++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-+
--++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-+
--++++-+        # 4. KV 缓存更新
--++++-+        if past_key_value is not None:
--++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-+            key_states, value_states = past_key_value.update(
--++++-+                key_states, value_states, self.layer_idx, cache_kwargs
--++++-+            )
--++++-+            
--++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++-+                if cache_position.shape[0] == 1:
--++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++-+                    kv_seq_len = key_states.shape[-2]
--++++-+
--++++-+        # 5. [重要] 准备 Attention Mask
--++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++-+        fa_attention_mask = None
--++++-+        if attention_mask is not None:
--++++-+            # 截取与当前key长度匹配的部分
--++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++-+            fa_attention_mask = (mask_slice != 0)
--++++-+
--++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++-+        input_dtype = query_states.dtype
--++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++-+            query_states = query_states.to(mindspore.float16)
--++++-+            key_states = key_states.to(mindspore.float16)
--++++-+            value_states = value_states.to(mindspore.float16)
--++++-+
--++++-+        # 6. [核心] 调用 flash_attention_score 算子
--++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++-+        attn_output = mindspore.ops.flash_attention_score(
--++++-+            query=query_states,
--++++-+            key=key_states,
--++++-+            value=value_states,
--++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
--++++-+            attn_mask=fa_attention_mask,
--++++-+            keep_prob=1.0 - self.attention_dropout,
--++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-+            input_layout="BNSD",
--++++-+            sparse_mode=0 # 使用 defaultMask 模式
--++++-+        )
--++++-+
--++++-+        # 恢复原始数据类型
--++++-+        attn_output = attn_output.to(input_dtype)
--++++-+
--++++-+        # 7. 调整输出形状
--++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-+        attn_output = self.o_proj(attn_output)
--++++-+
--++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
--++++-+        attn_weights = None
--++++-+        if output_attentions:
--++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++-+
--++++-+        return attn_output, attn_weights, past_key_value
--++++-+
--++++-+    # def forward(
--++++-+    #     self,
--++++-+    #     hidden_states: mindspore.Tensor,
--++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--++++-+    #     past_key_value: Optional[Cache] = None,
--++++-+    #     output_attentions: bool = False,
--++++-+    #     use_cache: bool = False,
--++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-+
--++++-+    #     bsz, q_len, _ = hidden_states.shape
--++++-+
--++++-+    #     # 1. 线性投射 Q, K, V
--++++-+    #     query_states = self.q_proj(hidden_states)
--++++-+    #     key_states = self.k_proj(hidden_states)
--++++-+    #     value_states = self.v_proj(hidden_states)
--++++-+
--++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+
--++++-+    #     # 3. RoPE 旋转位置编码
--++++-+    #     kv_seq_len = key_states.shape[-2]
--++++-+    #     if past_key_value is not None:
--++++-+    #         if self.layer_idx is None:
--++++-+    #             raise ValueError(
--++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-+    #                 "with a layer index."
--++++-+    #             )
--++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-+
--++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-+
--++++-+    #     # 4. KV 缓存更新
--++++-+    #     if past_key_value is not None:
--++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-+    #         key_states, value_states = past_key_value.update(
--++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++-+    #         )
--++++-+
--++++-+    #     # 5. 准备 Attention Mask
--++++-+    #     fa_attention_mask = None
--++++-+    #     if attention_mask is not None:
--++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-+    #         fa_attention_mask = (mask_slice != 0)
--++++-+
--++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++-+    #     input_dtype = query_states.dtype
--++++-+
--++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
--++++-+    #     attn_output = mindspore.ops.flash_attention_score(
--++++-+    #         query=query_states,
--++++-+    #         key=key_states,
--++++-+    #         value=value_states,
--++++-+    #         head_num=self.num_heads,
--++++-+    #         attn_mask=fa_attention_mask,
--++++-+    #         keep_prob=1.0 - self.attention_dropout,
--++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-+    #         input_layout="BNSD",
--++++-+    #         sparse_mode=0,
--++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++-+    #         inner_precise=1
--++++-+    #     )
--++++-+
--++++-+    #     # 恢复原始数据类型
--++++-+    #     attn_output = attn_output.to(input_dtype)
--++++-+
--++++-+    #     # 7. 调整输出形状
--++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-+    #     attn_output = self.o_proj(attn_output)
--++++-+
--++++-+    #     attn_weights = None
--++++-+    #     if output_attentions:
--++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++-+
--++++-+    #     return attn_output, attn_weights, past_key_value
--++++-+
--++++-+    # def forward(
--++++-+    #     self,
--++++-+    #     hidden_states: mindspore.Tensor,
--++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--++++-+    #     past_key_value: Optional[Cache] = None,
--++++-+    #     output_attentions: bool = False,
--++++-+    #     use_cache: bool = False,
--++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-+
--++++-+    #     bsz, q_len, _ = hidden_states.shape
--++++-+
--++++-+    #     query_states = self.q_proj(hidden_states)
--++++-+    #     key_states = self.k_proj(hidden_states)
--++++-+    #     value_states = self.v_proj(hidden_states)
--++++-+
--++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-+
--++++-+    #     kv_seq_len = key_states.shape[-2]
--++++-+    #     if past_key_value is not None:
--++++-+    #         if self.layer_idx is None:
--++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
--++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-+
--++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-+
--++++-+    #     if past_key_value is not None:
--++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++-+    #         key_states, value_states = past_key_value.update(
--++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++-+    #         )
--++++-+
--++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++-+
--++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
--++++-+    #     # <--- 修改结束 ---
--++++-+
--++++-+    #     fa_attention_mask = None
--++++-+    #     if attention_mask is not None:
--++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++-+    #         fa_attention_mask = (mask_slice != 0)
--++++-+
--++++-+    #     input_dtype = query_states.dtype
--++++-+
--++++-+    #     attn_output = mindspore.ops.flash_attention_score(
--++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
--++++-+    #         key=key_states,
--++++-+    #         value=value_states,
--++++-+    #         head_num=self.num_heads,
--++++-+    #         attn_mask=fa_attention_mask,
--++++-+    #         keep_prob=1.0 - self.attention_dropout,
--++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++-+    #         input_layout="BNSD",
--++++-+    #         sparse_mode=0,
--++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
--++++-+    #     )
--++++-+
--++++-+    #     attn_output = attn_output.to(input_dtype)
--++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-+    #     attn_output = self.o_proj(attn_output)
--++++-+
--++++-+    #     attn_weights = None
--++++-+    #     if output_attentions:
--++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++-+
--++++-+    #     return attn_output, attn_weights, past_key_value
--++++-+
--++++- QWEN2MOE_ATTENTION_CLASSES = {
--++++-     "eager": Qwen2MoeAttention,
--++++-+    "flash-attention": Qwen2MoeFlashAttention,
--++++- }
--++++- 
--++++- 
--++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++- 
--++++-+    #@dwj
--++++-+    # 只遍历激活的专家，而非全部专家
--++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++--        hidden_states = hidden_states.view(-1, hidden_dim)
--++++--        # router_logits: (batch * sequence_length, n_experts)
--++++--        router_logits = self.gate(hidden_states)
--++++--
--++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++--        if self.norm_topk_prob:
--++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++--        # we cast back to the input dtype
--++++--        routing_weights = routing_weights.to(hidden_states.dtype)
--++++--
--++++--        final_hidden_states = ops.zeros(
--++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++++--        )
--++++--
--++++--        # One hot encode the selected experts to create an expert mask
--++++--        # this will be used to easily index which expert is going to be sollicitated
--++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++++--
--++++--        # Loop over all available experts in the model and perform the computation on each expert
--++++--        for expert_idx in range(self.num_experts):
--++++--            expert_layer = self.experts[expert_idx]
--++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++++--
--++++--            # Index the correct hidden states and compute the expert hidden state for
--++++--            # the current expert. We need to make sure to multiply the output hidden
--++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++++--            if 0 not in idx.shape:
--++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++++--
--++++--                # However `index_add_` only support torch tensors for indexing so we'll use
--++++--                # the `top_x` tensor here.
--++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++++--
--++++--        shared_expert_output = self.shared_expert(hidden_states)
--++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++++--
--++++--        final_hidden_states = final_hidden_states + shared_expert_output
--++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++-+            num_tokens = hidden_states_reshaped.shape[0]
--++++-+
--++++-+            router_logits = self.gate(hidden_states_reshaped)
--++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++-+
--++++-+            if self.norm_topk_prob:
--++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
--++++-+            
--++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++-+            flat_selected_experts = selected_experts.flatten()
--++++-+            
--++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++-+            token_indices = broadcasted_token_indices.flatten()
--++++-+            
--++++-+            active_experts = ops.unique(flat_selected_experts)
--++++-+            
--++++-+            for expert_idx_tensor in active_experts:
--++++-+                expert_idx = expert_idx_tensor.item()
--++++-+                expert_layer = self.experts[expert_idx]
--++++-+                
--++++-+                mask = (flat_selected_experts == expert_idx_tensor)
--++++-+                selected_token_indices = token_indices[mask]
--++++-+                selected_routing_weights = routing_weights.flatten()[mask]
--++++-+                
--++++-+                current_states = hidden_states_reshaped[selected_token_indices]
--++++-+                
--++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++-+                
--++++-+                final_hidden_states = final_hidden_states.index_add(
--++++-+                    dim=0,
--++++-+                    index=selected_token_indices,
--++++-+                    source=expert_output.to(hidden_states.dtype)
--++++-+                )
--++++-+            
--++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++- 
--++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++--        return final_hidden_states, router_logits
--++++-+            final_hidden_states = final_hidden_states + shared_expert_output
--++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++-+            
--++++-+            return final_hidden_states, router_logits
--++++- 
--++++- 
--++++- class Qwen2MoeDecoderLayer(nn.Module):
--++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++++- 
--++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++- 
--++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++-+
--++++-         if (layer_idx not in config.mlp_only_layers) and (
--++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++++-         ):
--++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++++-     _skip_keys_device_placement = "past_key_values"
--++++-     _supports_cache_class = True
--++++-+#lwx
--++++-+    # _supports_static_cache = True
--++++- 
--++++-     def _init_weights(self, module):
--++++-         std = self.config.initializer_range
--++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++++-         return causal_mask
--++++- 
--++++- 
--++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++-     _tied_weights_keys = ["lm_head.weight"]
--++++- 
--++++-     def __init__(self, config):
--++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++-         self.num_experts_per_tok = config.num_experts_per_tok
--++++-         # Initialize weights and apply final processing
--++++-         self.post_init()
--++++-+        # @lwx
--++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++++-+        #     self.generation_config.cache_implementation = "static"
--++++-+        self._warmed_up = False
--++++-+
--++++-+    def warmup_moe_model(self):
--++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++++-+        test_texts = [
--++++-+            "warmup short",
--++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++++-+        ]
--++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++-+        if tokenizer is None:
--++++-+            from mindnlp.transformers import AutoTokenizer
--++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++-+            self._warmup_tokenizer = tokenizer
--++++-+
--++++-+        for text in test_texts:
--++++-+            inputs = tokenizer(text, return_tensors="ms")
--++++-+            with mindspore._no_grad():
--++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++++- 
--++++-     def get_input_embeddings(self):
--++++-         return self.model.embed_tokens
--++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++-         ```"""
--++++-+        if not self._warmed_up:
--++++-+            self._warmed_up = True
--++++-+            self.warmup_moe_model()
--++++- 
--++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++++-         output_router_logits = (
--++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++-             }
--++++-         )
--++++-         return model_inputs
--++++-+# @lwx
--++++-+    # def _decode_one_tokens_logits(
--++++-+    #     self,
--++++-+    #     cur_token: mindspore.Tensor,
--++++-+    #     input_pos: Optional[mindspore.Tensor],
--++++-+    #     cache_position: mindspore.Tensor,
--++++-+    #     past_key_values: StaticCache,
--++++-+    # ) -> mindspore.Tensor:
--++++-+    #     """
--++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++++-+        
--++++-+    #     Args:
--++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++++-+    #         input_pos: 输入位置信息，可选
--++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++++-+            
--++++-+    #     Returns:
--++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++++-+    #     """
--++++-+    #     # 调用JIT编译的版本
--++++-+    #     return self.get_decode_one_tokens_logits(
--++++-+    #         cur_token=cur_token,
--++++-+    #         input_pos=input_pos,
--++++-+    #         cache_position=cache_position,
--++++-+    #         past_key_values=past_key_values,
--++++-+    #     )
--++++-+    
--++++-+    # @mindspore.jit(jit_level='O1')
--++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++++-+    #     """
--++++-+    #     JIT编译的函数，用于高效的单token解码
--++++-+    #     使用JIT编译优化以支持静态shape和高效执行
--++++-+        
--++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++++-+    #     """
--++++-+    #     outputs = self.model.forward(
--++++-+    #         input_ids=cur_token,
--++++-+    #         position_ids=input_pos,
--++++-+    #         cache_position=cache_position,
--++++-+    #         past_key_values=past_key_values,
--++++-+    #         use_cache=True,
--++++-+    #         return_dict=False,
--++++-+    #     )
--++++-+        
--++++-+    #     hidden_states = outputs[0]
--++++-+    #     logits = self.lm_head.forward(hidden_states)
--++++-+    #     logits = logits.float()
--++++-+        
--++++-+    #     return logits[:, -1, :]
--++++-+
--++++-+    # def _sample(
--++++-+    #     self,
--++++-+    #     input_ids: mindspore.Tensor,
--++++-+    #     logits_processor,
--++++-+    #     stopping_criteria,
--++++-+    #     generation_config,
--++++-+    #     synced_devices: bool,
--++++-+    #     streamer=None,
--++++-+    #     logits_warper=None,
--++++-+    #     **model_kwargs,
--++++-+    # ):
--++++-+    #     """
--++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++++-+    #     """
--++++-+    #     from ...generation.logits_process import LogitsProcessorList
--++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++++-+    #     from mindnlp.core import nn, ops, no_grad
--++++-+    #     import numpy as np
--++++-+        
--++++-+    #     # 检查是否使用 StaticCache
--++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++++-+    #     # 否则，直接调用父类方法
--++++-+    #     past_key_values = model_kwargs.get("past_key_values")
--++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++++-+
--++++-+    #     if not isinstance(past_key_values, StaticCache):
--++++-+    #         # 不使用 StaticCache，直接调用父类方法
--++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++++-+    #         return super()._sample(
--++++-+    #             input_ids=input_ids,
--++++-+    #             logits_processor=logits_processor,
--++++-+    #             stopping_criteria=stopping_criteria,
--++++-+    #             generation_config=generation_config,
--++++-+    #             synced_devices=synced_devices,
--++++-+    #             streamer=streamer,
--++++-+    #             logits_warper=logits_warper,
--++++-+    #             **model_kwargs,
--++++-+    #         )
--++++-+        
--++++-+    #     # 使用 StaticCache，进入自定义循环
--++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++++-+    #     pad_token_id = generation_config._pad_token_tensor
--++++-+    #     output_attentions = generation_config.output_attentions
--++++-+    #     output_hidden_states = generation_config.output_hidden_states
--++++-+    #     output_scores = generation_config.output_scores
--++++-+    #     output_logits = generation_config.output_logits
--++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++++-+    #     max_length = generation_config.max_length
--++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++++-+    #     do_sample = generation_config.do_sample
--++++-+        
--++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++++-+    #         raise ValueError(
--++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++++-+    #             f"{logits_warper})."
--++++-+    #         )
--++++-+        
--++++-+    #     # init attention / hidden states / scores tuples
--++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
--++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++++-+        
--++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++++-+    #         encoder_hidden_states = (
--++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++++-+    #         )
--++++-+        
--++++-+    #     # keep track of which sequences are already finished
--++++-+    #     batch_size, cur_len = input_ids.shape
--++++-+    #     this_peer_finished = False
--++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++++-+        
--++++-+    #     time_record = []
--++++-+    #     from ....utils.testing_utils import parse_flag_from_env
--++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++++-+        
--++++-+    #     while self._has_unfinished_sequences(
--++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++++-+    #     ):
--++++-+    #         if _record_time:
--++++-+    #             import time as time_module
--++++-+    #             infer_start = time_module.time()
--++++-+            
--++++-+    #         # prepare model inputs
--++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++++-+            
--++++-+    #         # prepare variable output controls
--++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++++-+            
--++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++++-+    #         cur_cache_position = model_inputs.get("cache_position")
--++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
--++++-+    #         cur_input_ids = model_inputs.get("input_ids")
--++++-+            
--++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++++-+    #             cur_cache_position is not None and 
--++++-+    #             len(cur_cache_position.shape) > 0 and
--++++-+    #             cur_cache_position.shape[0] == 1 and
--++++-+    #             cur_input_ids is not None and
--++++-+    #             cur_input_ids.shape[1] == 1):
--++++-+    #             # 使用 JIT 优化的单 token 解码
--++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++++-+    #             if not hasattr(self, '_jit_used'):
--++++-+    #                 self._jit_used = False
--++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++++-+                
--++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
--++++-+    #                 cur_token=cur_input_ids,
--++++-+    #                 input_pos=model_inputs.get("position_ids"),
--++++-+    #                 cache_position=cur_cache_position,
--++++-+    #                 past_key_values=cur_past_key_values,
--++++-+    #             )
--++++-+                
--++++-+    #             # 标记已使用JIT（用于后续判断）
--++++-+    #             if not self._jit_used:
--++++-+    #                 self._jit_used = True
--++++-+                
--++++-+    #             # 构造兼容的输出对象
--++++-+    #             class JitOptimizedOutput:
--++++-+    #                 def __init__(self, logits, config):
--++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++++-+    #                     self.config = config
--++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
--++++-+    #                     self.cross_attentions = None
--++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++++-+                
--++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++++-+    #         else:
--++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++++-+    #             outputs = self(**model_inputs, return_dict=True)
--++++-+            
--++++-+    #         if synced_devices and this_peer_finished:
--++++-+    #             continue
--++++-+            
--++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++++-+    #         next_token_logits = outputs.logits[:, -1, :]
--++++-+            
--++++-+    #         # pre-process distribution
--++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++++-+    #         if do_sample:
--++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++++-+            
--++++-+    #         # Store scores, attentions and hidden_states when required
--++++-+    #         if return_dict_in_generate:
--++++-+    #             if output_scores:
--++++-+    #                 scores += (next_token_scores,)
--++++-+    #             if output_logits:
--++++-+    #                 raw_logits += (next_token_logits,)
--++++-+    #             if output_attentions:
--++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++++-+    #                 if self.config.is_encoder_decoder:
--++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++++-+                
--++++-+    #             if output_hidden_states:
--++++-+    #                 hidden = (
--++++-+    #                     outputs.decoder_hidden_states
--++++-+    #                     if self.config.is_encoder_decoder
--++++-+    #                     else outputs.hidden_states
--++++-+    #                 )
--++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++++-+            
--++++-+    #         # token selection
--++++-+    #         if do_sample:
--++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++++-+    #         else:
--++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++++-+            
--++++-+    #         # finished sentences should have their next token be a padding token
--++++-+    #         if has_eos_stopping_criteria:
--++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++++-+            
--++++-+    #         # update generated ids, model inputs, and length for next step
--++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++++-+    #         if streamer is not None:
--++++-+    #             streamer.put(next_tokens)
--++++-+            
--++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
--++++-+    #             outputs,
--++++-+    #             model_kwargs,
--++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++++-+    #         )
--++++-+            
--++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++++-+    #         cur_len += 1
--++++-+            
--++++-+    #         if _record_time:
--++++-+    #             import time as time_module
--++++-+    #             infer_stop = time_module.time()
--++++-+    #             time_record.append(infer_stop - infer_start)
--++++-+            
--++++-+    #         del outputs
--++++-+        
--++++-+    #     average_infer_time = None
--++++-+    #     if time_record:
--++++-+    #         if len(time_record) > 1:
--++++-+    #             time_record.pop(0)
--++++-+    #         average_infer_time = sum(time_record) / len(time_record)
--++++-+    #         print(f'average inference time is: {average_infer_time}')
--++++-+    #         print(f'inference time record: {time_record}')
--++++-+        
--++++-+    #     if streamer is not None:
--++++-+    #         streamer.end()
--++++-+        
--++++-+    #     # 简单判断：打印是否使用了JIT路径
--++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
--++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
--++++-+    #     else:
--++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++++-+        
--++++-+    #     if return_dict_in_generate:
--++++-+    #         if self.config.is_encoder_decoder:
--++++-+    #             return GenerateEncoderDecoderOutput(
--++++-+    #                 sequences=input_ids,
--++++-+    #                 scores=scores,
--++++-+    #                 logits=raw_logits,
--++++-+    #                 encoder_attentions=encoder_attentions,
--++++-+    #                 encoder_hidden_states=encoder_hidden_states,
--++++-+    #                 decoder_attentions=decoder_attentions,
--++++-+    #                 cross_attentions=cross_attentions,
--++++-+    #                 decoder_hidden_states=decoder_hidden_states,
--++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++-+    #                 average_infer_time=average_infer_time
--++++-+    #             )
--++++-+    #         else:
--++++-+    #             return GenerateDecoderOnlyOutput(
--++++-+    #                 sequences=input_ids,
--++++-+    #                 scores=scores,
--++++-+    #                 logits=raw_logits,
--++++-+    #                 attentions=decoder_attentions,
--++++-+    #                 hidden_states=decoder_hidden_states,
--++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++-+    #                 average_infer_time=average_infer_time
--++++-+    #             )
--++++-+    #     else:
--++++-+    #         return input_ids
--++++-+            
--++++-+    # def _prepare_cache_for_generation(
--++++-+    #     self,
--++++-+    #     generation_config,
--++++-+    #     model_kwargs,
--++++-+    #     assistant_model,
--++++-+    #     batch_size,
--++++-+    #     max_cache_length,
--++++-+    # ):
--++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++++-+    #         generation_config.cache_implementation = "static"
--++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++++-+        
--++++-+    #     if generation_config.cache_implementation == "static":
--++++-+    #         base_required_from_max_length = generation_config.max_length + 1
--++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
--++++-+    #         min_cache_size = 50
--++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++++-+    #         else:
--++++-+    #             max_cache_length = max(base_required, min_cache_size)
--++++-+            
--++++-+    #         original_max_cache_length = max_cache_length
--++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
--++++-+            
--++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++-+    #             if max_cache_length > self.config.max_position_embeddings:
--++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++++-+        
--++++-+    #     result = super()._prepare_cache_for_generation(
--++++-+    #         generation_config=generation_config,
--++++-+    #         model_kwargs=model_kwargs,
--++++-+    #         assistant_model=assistant_model,
--++++-+    #         batch_size=batch_size,
--++++-+    #         max_cache_length=max_cache_length,
--++++-+    #     )
--++++-+        
--++++-+    #     if generation_config.cache_implementation == "static":
--++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++++-+    #         created_cache = model_kwargs.get(cache_name)
--++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
--++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++++-+        
--++++-+    #     return result
--++++-+
--++++-+
--++++-+
--++++- 
--++++- 
--++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++++--- 
--++++-2.27.0
--++++-
--++++-- 
--++++2.27.0
--++++
--+++-- 
--+++2.27.0
--+++
--++-- 
--++2.27.0
--++
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
-diff --git a/patches/0008-moe-change.patch b/patches/0008-moe-change.patch
-deleted file mode 100644
-index 349f1429..00000000
---- a/patches/0008-moe-change.patch
-+++ /dev/null
-@@ -1,8789 +0,0 @@
--From 45ba3bbc411b64cbffd547fa3d66bce9545639dd Mon Sep 17 00:00:00 2001
--From: Pinoeer-kingxi <13022943007@163.com>
--Date: Sun, 9 Nov 2025 00:50:01 +0800
--Subject: [PATCH 8/8] moe change
--
-----
-- .../models/deepseek/modeling_deepseek.py      |  433 +-
-- .../models/qwen2_moe/modeling_qwen2_moe.py    |   86 +-
-- patches/0001-20251104commit.patch             |    2 +-
-- patches/0002-20251106commit.patch             |    2 +-
-- patches/0003-20261106secondcommit.patch       |    2 +-
-- patches/0004-20251106change.patch             |    2 +-
-- patches/0005-20251107001commit.patch          |    2 +-
-- patches/0006-20251107002commit.patch          |    2 +-
-- patches/0007-20251107003commit.patch          | 8034 +++++++++++++++++
-- 9 files changed, 8510 insertions(+), 55 deletions(-)
-- create mode 100644 patches/0007-20251107003commit.patch
--
--diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--index ff631974..0af29305 100644
----- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--@@ -19,8 +19,10 @@
-- # limitations under the License.
-- """ MindNLP DeepSeek model."""
-- import math
--+import time
-- import warnings
-- from typing import List, Optional, Tuple, Union
--+from mindspore import mint
-- import mindspore
-- from mindnlp.core import nn, ops, no_grad
-- from mindnlp.core.nn import functional as F
--@@ -54,6 +56,10 @@ logger = logging.get_logger(__name__)
-- 
-- _CONFIG_FOR_DOC = "DeepseekConfig"
-- 
--+Long_Prompt = 1
--+LONG_PROMPT_LENGTH_THRESHOLD = 128
--+SHORT_PROMPT_LENGTH_THRESHOLD = 32
--+
-- _attn_mask_cache = {}
-- 
-- def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--@@ -380,6 +386,8 @@ class MoEGate(nn.Module):
--         return topk_idx, topk_weight, aux_loss
-- 
-- 
--+bincount_op = mindspore.ops.Bincount()
--+
-- class DeepseekMoE(nn.Module):
--     """
--     A mixed expert module containing shared experts.
--@@ -413,7 +421,10 @@ class DeepseekMoE(nn.Module):
--                     y = y + self.shared_experts(identity)
--                 return y
--             else:
---                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+                if Long_Prompt == 0:
--+                    y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+                else:
--+                    y= self.moe_infer_prefill_fast(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--                 if self.config.n_shared_experts is not None:
--                     y = y + self.shared_experts(identity)
--                 return y
--@@ -421,7 +432,103 @@ class DeepseekMoE(nn.Module):
--         # if self.config.n_shared_experts is not None:
--         #     y = y + self.shared_experts(identity)
--         # return y
---        
--+    
--+    
--+    
--+    # lwx
--+    # def forward(self, x, expert_ids: Optional[mindspore.Tensor] = None):
--+    #     """
--+    #     如果 expert_ids 为 None，走单专家逻辑；
--+    #     如果有，多专家批量处理，保证和原逻辑一致。
--+    #     """
--+    #     if expert_ids is None:
--+    #         # 原单专家逻辑
--+    #         if self.config.pretraining_tp > 1:
--+    #             slice = self.intermediate_size // self.config.pretraining_tp
--+    #             gate_proj_slices = ops.split(self.gate_proj.weight, slice, dim=0)
--+    #             up_proj_slices = ops.split(self.up_proj.weight, slice, dim=0)
--+    #             down_proj_slices = ops.split(self.down_proj.weight, slice, dim=1)
--+    #             gate_proj = ops.cat([F.linear(x, gate_proj_slices[i]) 
--+    #                                  for i in range(self.config.pretraining_tp)], dim=-1)
--+    #             up_proj = ops.cat([F.linear(x, up_proj_slices[i]) 
--+    #                                for i in range(self.config.pretraining_tp)], dim=-1)
--+    #             intermediate_states = ops.split((self.act_fn(gate_proj) * up_proj), slice, dim=2)
--+    #             down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) 
--+    #                          for i in range(self.config.pretraining_tp)]
--+    #             down_proj = sum(down_proj)
--+    #         else:
--+    #             down_proj = self.down_proj(
--+    #                 self.act_fn(self.gate_proj(x)) * self.up_proj(x)
--+    #             )
--+    #         return down_proj
--+
--+    #     # ====== 批量多专家路径 ======
--+    #     hidden_size = x.shape[-1]
--+
--+    #     # 按 token expert_ids 选权重
--+    #     gate_weights = self.gate_proj.weight[expert_ids]  # shape: [tokens, inter_size]
--+    #     up_weights   = self.up_proj.weight[expert_ids]
--+    #     down_weights = self.down_proj.weight[expert_ids]
--+
--+    #     # 注意：pretraining_tp > 1 的分 slice 逻辑仍然要保留
--+    #     if self.config.pretraining_tp > 1:
--+    #         outputs = []
--+    #         slice = self.intermediate_size // self.config.pretraining_tp
--+    #         for i in range(self.config.pretraining_tp):
--+    #             # 每个 slice 单独计算
--+    #             gate_proj_out = F.linear(x, gate_weights[:, i*slice:(i+1)*slice])
--+    #             up_proj_out   = F.linear(x, up_weights[:, i*slice:(i+1)*slice])
--+    #             act_out       = self.act_fn(gate_proj_out) * up_proj_out
--+    #             down_proj_out = F.linear(act_out, down_weights[i*slice:(i+1)*slice, :])
--+    #             outputs.append(down_proj_out)
--+    #         return sum(outputs)
--+    #     else:
--+    #         gate_proj_out = F.linear(x, gate_weights)
--+    #         up_proj_out   = F.linear(x, up_weights)
--+    #         act_out       = self.act_fn(gate_proj_out) * up_proj_out
--+    #         return F.linear(act_out, down_weights)
--+    # @no_grad()
--+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+    #     num_tokens = x.shape[0]
--+    #     hidden_size = x.shape[-1]
--+
--+    #     idxs = flat_expert_indices.argsort()
--+    #     sorted_expert_indices = flat_expert_indices[idxs]
--+    #     sorted_token_indices = idxs // self.num_experts_per_tok
--+    #     sorted_indices = sorted_token_indices
--+
--+    #     permuted_tokens = x[sorted_token_indices]
--+    #     sorted_weights  = flat_expert_weights[idxs]
--+
--+    #     # 一次调用多专家 forward
--+    #     expert_outputs = ops.zeros_like(permuted_tokens)
--+    #     expert_outputs = self.mlp_batch_forward(permuted_tokens, sorted_expert_indices)
--+
--+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
--+    #     try:
--+    #         final_output = ops.moe_token_unpermute(
--+    #             expert_outputs,
--+    #             sorted_indices,
--+    #             probs=probs,
--+    #             padded_mode=False
--+    #         )
--+    #     except Exception:
--+    #         final_output = ops.zeros_like(x)
--+    #         final_output = mindspore.mint.scatter_add(
--+    #             final_output,
--+    #             0,
--+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--+    #             expert_outputs * sorted_weights
--+    #         )
--+
--+    #     return final_output
--+
--+    # def mlp_batch_forward(self, tokens, expert_ids):
--+    #     """
--+    #     使用批量专家 forward（保留精度）
--+    #     """
--+    #     return self.experts[0].forward(tokens, expert_ids)
--+
--     # @no_grad()
--     # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
-- 
--@@ -434,52 +541,15 @@ class DeepseekMoE(nn.Module):
--     #         expert_cache += expert_out * weight
--     #     return expert_cache
--     
--+    #@dwj
--     @no_grad()
---    # dwj
--     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
---        # x 的 shape: (1, hidden_size)
---        # flat_expert_indices 的 shape: (num_experts_per_tok,)
---        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
---
---        # 1. 收集所有需要的专家层
---        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--         selected_experts = [self.experts[i] for i in flat_expert_indices]
---
---        # 2. 并行计算所有专家的输出
---        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
---        # ops.cat 会将它们堆叠成一个新的 Tensor
---        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--         expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
---
---        # 3. 使用矩阵乘法进行加权求和
---        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
---        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
---        # 最终结果 final_output 的 shape: (1, hidden_size)
--         final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
---        
--         return final_output
-- 
-- 
---    # @no_grad()
---    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
---    #     expert_cache = ops.zeros_like(x)
---    #     idxs = flat_expert_indices.argsort()
---    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
---    #     token_idxs = idxs // self.num_experts_per_tok
---
---    #     for i, end_idx in enumerate(tokens_per_expert):
---    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
---    #         if start_idx == end_idx:
---    #             continue
---    #         expert = self.experts[i]
---    #         exp_token_idx = token_idxs[start_idx:end_idx]
---    #         expert_tokens = x[exp_token_idx]
---    #         expert_out = expert(expert_tokens)
---    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
---    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
---
---    #     return expert_cache
---        
--     @no_grad()
--     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--         """
--@@ -525,6 +595,264 @@ class DeepseekMoE(nn.Module):
--             )
-- 
--         return expert_cache
--+
--+
--+    # @no_grad()
--+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+    #     """
--+    #     优化版 MoE prefill：使用 mindspore.ops.moe_token_unpermute 替代手动 scatter_add
--+    #     """
--+    #     num_tokens = x.shape[0]
--+    #     hidden_size = x.shape[-1]
--+
--+    #     # 生成排序后的 token 索引
--+    #     idxs = flat_expert_indices.argsort()
--+    #     sorted_expert_indices = flat_expert_indices[idxs]
--+    #     sorted_token_indices = idxs // self.num_experts_per_tok
--+
--+    #     # 记录到 sorted_indices（moe_token_unpermute 用）
--+    #     sorted_indices = sorted_token_indices  # shape: [num_tokens * top_k]
--+
--+    #     # 收集专家输入
--+    #     permuted_tokens = x[sorted_token_indices]
--+
--+    #     # 执行每个专家的 MLP（批量处理）
--+    #     expert_outputs = []
--+    #     token_ptr = 0
--+    #     tokens_per_expert = sorted_expert_indices.bincount()
--+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
--+    #         if count == 0:
--+    #             continue
--+    #         cur_tokens = permuted_tokens[token_ptr:token_ptr+count]
--+    #         out = self.experts[expert_id](cur_tokens)
--+    #         expert_outputs.append(out)
--+    #         token_ptr += count
--+
--+    #     # 拼接所有专家输出
--+    #     permuted_outputs = ops.cat(expert_outputs, axis=0)
--+
--+    #     # 权重缩放（probs 形状为 [num_tokens, top_k]）
--+    #     probs = flat_expert_weights.view(num_tokens, self.num_experts_per_tok)
--+
--+    #     # 直接调用硬件加速的 unpermute
--+    #     final_output = ops.moe_token_unpermute(
--+    #         permuted_outputs,         # shape: [num_tokens * top_k, hidden_size]
--+    #         sorted_indices,           # shape: [num_tokens * top_k]
--+    #         probs=probs,               # 按概率加权
--+    #         padded_mode=False
--+    #     )
--+
--+    #     return final_output
--+
--+    # lwx prefill 20251108
--+    @no_grad()
--+    def moe_infer_prefill_fast(self, x, flat_expert_indices, flat_expert_weights):
--+        """
--+        高性能 + 数值一致的 MoE prefill 推理：
--+        1. 批量化处理所有专家计算，减少 Python 循环开销
--+        2. Ascend A2 上使用 ops.moe_token_unpermute 加速 token 恢复
--+        3. CPU/GPU 上自动 fallback 到 scatter_add 实现
--+        4. 保证权重和 token 排列顺序与原版本完全一致，避免生成结果 mismatch
--+
--+        参数：
--+            x: [num_tokens, hidden_size]，
--+            MoE 输入的 token 表示
--+            flat_expert_indices: [num_tokens * top_k]，
--+            每个 token 的路由专家 id
--+            flat_expert_weights: [num_tokens * top_k, 1]，
--+            路由专家权重
--+        """
--+        num_tokens = x.shape[0]
--+        hidden_size = x.shape[-1]
--+
--+        # 1) 排序专家分配（与原 scatter_add 一致的顺序）
--+        idxs = flat_expert_indices.argsort()  # 排序索引
--+        sorted_expert_indices = flat_expert_indices[idxs]          # [num_tokens*top_k]
--+        sorted_token_indices = idxs // self.num_experts_per_tok    # 原 token ID
--+
--+        # sorted_indices 必须与 permuted_tokens 顺序匹配
--+        sorted_indices = sorted_token_indices   # 用原 token 位置恢复顺序
--+
--+        # 2) 收集专家输入（按 idxs 排序）
--+        permuted_tokens = x[sorted_token_indices]                  # [num_tokens*top_k, hidden_size]
--+        sorted_weights  = flat_expert_weights[idxs]                 # [num_tokens*top_k, 1]，确保与 permuted_tokens 对齐
--+
--+        # 3) 计算每个专家的 token 数
--+        tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
--+
--+        # 4) 批量专家计算（减少 Python 循环）
--+        gate_weights = ops.stack([expert.gate_proj.weight for expert in self.experts], dim=0)
--+        up_weights   = ops.stack([expert.up_proj.weight   for expert in self.experts], dim=0)
--+        down_weights = ops.stack([expert.down_proj.weight for expert in self.experts], dim=0)
--+
--+        expert_outputs = ops.zeros_like(permuted_tokens)
--+        ptr = 0
--+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
--+            if count == 0:
--+                continue
--+            tokens = permuted_tokens[ptr:ptr+count]  # [count, hidden_size]
--+            
--+            # 与 DeepseekMLP forward 等价
--+            gate_proj_out = F.linear(tokens, gate_weights[expert_id])
--+            up_proj_out   = F.linear(tokens, up_weights[expert_id])
--+            act_out       = self.experts[expert_id].act_fn(gate_proj_out) * up_proj_out
--+            expert_out    = F.linear(act_out, down_weights[expert_id])
--+
--+            expert_outputs[ptr:ptr+count] = expert_out
--+            ptr += count
--+
--+        # 5) Ascend 加速的 unpermute（已排序的权重）
--+        probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)  # 按排序后的顺序 reshape
--+
--+        final_output = ops.zeros_like(x)
--+        final_output = mindspore.mint.scatter_add(
--+            final_output,
--+            0,
--+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--+            expert_outputs * sorted_weights
--+        )      
--+
--+
--+        # try:
--+        #     final_output = ops.moe_token_unpermute(
--+        #         expert_outputs,   # [num_tokens*top_k, hidden_size]
--+        #         sorted_indices,   # [num_tokens*top_k] 原 token id
--+        #         probs=probs,      # 对应权重
--+        #         padded_mode=False
--+        #     )
--+        # except Exception:
--+        #     # CPU/GPU fallback：用 scatter_add 保证完全一致
--+        #     final_output = ops.zeros_like(x)
--+        #     final_output = mindspore.mint.scatter_add(
--+        #         final_output,
--+        #         0,
--+        #         sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--+        #         expert_outputs * sorted_weights
--+        #     )
--+
--+        return final_output
--+
--+
--+    # @no_grad()
--+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+    #     num_tokens = x.shape[0]
--+    #     hidden_size = x.shape[-1]
--+
--+    #     idxs = flat_expert_indices.argsort()
--+    #     sorted_expert_indices = flat_expert_indices[idxs]
--+    #     sorted_token_indices = idxs // self.num_experts_per_tok
--+        
--+    #     # sorted_indices = sorted_token_indices
--+    #     sorted_indices = sorted_token_indices.astype(mindspore.int32)
--+    #     permuted_tokens = x[sorted_token_indices]
--+    #     sorted_weights = flat_expert_weights[idxs]
--+    #     tokens_per_expert = sorted_expert_indices.bincount(minlength=len(self.experts))
--+
--+    #     expert_outputs = ops.zeros_like(permuted_tokens)
--+    #     ptr = 0
--+
--+    #     # 只按专家维度循环
--+    #     for expert_id, count in enumerate(tokens_per_expert.tolist()):
--+    #         if count == 0:
--+    #             continue
--+    #         token_slice = slice(ptr, ptr + count)
--+    #         expert_tokens = permuted_tokens[token_slice]
--+
--+    #         # 保持原 forward（含 pretraining_tp、bias 等）
--+    #         expert_out = self.experts[expert_id](expert_tokens)
--+
--+    #         expert_outputs[token_slice] = expert_out
--+    #         ptr += count
--+
--+    #     probs = sorted_weights.view(num_tokens, self.num_experts_per_tok)
--+    #     try:
--+    #         final_output = mindspore.ops.moe_token_unpermute(
--+    #             expert_outputs,
--+    #             sorted_indices,
--+    #             probs=probs,
--+    #             padded_mode=False
--+    #         )
--+    #     except Exception:
--+    #         final_output = ops.zeros_like(x)
--+    #         final_output = mindspore.mint.scatter_add(
--+    #             final_output,
--+    #             0,
--+    #             sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--+    #             expert_outputs * sorted_weights
--+    #         )
--+
--+    #     return final_output
--+
--+
--+    #lwx
--+    # @no_grad()
--+    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+    #     """
--+    #     并行化 MoE prefill：
--+    #     - 一次性计算所有专家输出，牺牲显存峰值换取速度
--+    #     - 保证结果与原版完全一致
--+    #     """
--+    #     # 输出缓存
--+    #     expert_cache = ops.zeros_like(x)
--+
--+    #     # token 总数（批量*seq_len*num_experts_per_tok）
--+    #     num_tokens = flat_expert_indices.shape[0]
--+    #     hidden_dim = x.shape[-1]
--+
--+    #     # 原 token ID（idxs // num_experts_per_tok）
--+    #     token_ids = ops.arange(num_tokens // self.num_experts_per_tok).repeat_interleave(self.num_experts_per_tok)
--+
--+    #     # ====== Step 1: 组织输入 ======
--+    #     # 按 experts 排序，保证 scatter_add 对应位置一致
--+    #     sort_ids = flat_expert_indices.argsort()
--+    #     sorted_experts = flat_expert_indices[sort_ids]
--+    #     sorted_tokens = token_ids[sort_ids]
--+    #     sorted_weights = flat_expert_weights[sort_ids]
--+
--+    #     # 收集每个专家的输入
--+    #     # build: expert_inputs[expert_id] = [tokens...]
--+    #     expert_inputs = []
--+    #     expert_outs = []
--+
--+    #     for eid in range(self.config.n_routed_experts):
--+    #         eid_mask = (sorted_experts == eid)
--+    #         if eid_mask.any():
--+    #             tokens_for_eid = x[sorted_tokens[eid_mask]]
--+    #             expert_inputs.append(tokens_for_eid)
--+    #         else:
--+    #             expert_inputs.append(None)
--+
--+    #     # ====== Step 2: 并行计算所有专家输出 ======
--+    #     # 存储所有专家结果到一个列表
--+    #     for eid in range(self.config.n_routed_experts):
--+    #         if expert_inputs[eid] is not None:
--+    #             out = self.experts[eid](expert_inputs[eid])
--+    #             expert_outs.append(out)
--+    #         else:
--+    #             expert_outs.append(None)
--+
--+    #     # ====== Step 3: scatter_add 回写结果 ======
--+    #     # 遍历专家，将结果加回对应的 token
--+    #     pos = 0
--+    #     for eid in range(self.config.n_routed_experts):
--+    #         if expert_outs[eid] is not None:
--+    #             size = expert_outs[eid].shape[0]
--+    #             tokens_idx = sorted_tokens[pos:pos+size]
--+    #             scaled_out = expert_outs[eid] * sorted_weights[pos:pos+size]
--+    #             pos += size
--+
--+    #             # scatter_add 到 expert_cache
--+    #             expert_cache = mindspore.mint.scatter_add(
--+    #                 expert_cache,
--+    #                 dim=0,
--+    #                 index=tokens_idx.view(-1, 1).tile((1, hidden_dim)),
--+    #                 src=scaled_out
--+    #             )
--+
--+    #     return expert_cache
--+
--+
--+
-- # 放置在 DeepseekMoE 类中
--     # @no_grad()
--     # #lwx 20251107
--@@ -1188,7 +1516,7 @@ class DeepseekDecoderLayer(nn.Module):
--         self.hidden_size = config.hidden_size
-- 
--         # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
---            # config=config, layer_idx=layer_idx
--+        #     config=config, layer_idx=layer_idx
--         # )
-- 
--         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--@@ -1204,6 +1532,7 @@ class DeepseekDecoderLayer(nn.Module):
--             )
--             else DeepseekMLP(config)
--         )
--+
--         self.input_layernorm = DeepseekRMSNorm(
--             config.hidden_size, eps=config.rms_norm_eps
--         )
--@@ -1537,6 +1866,28 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--     def get_decoder(self):
--         return self.model
-- 
--+    def generate(self, *args, **kwargs):
--+        """
--+        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+        """
--+        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--+
--+        input_ids = kwargs.get("input_ids")
--+        if input_ids is None and args:
--+            input_ids = args[0]
--+
--+        if input_ids is not None:
--+            prompt_length = input_ids.shape[1]
--+            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--+                Long_Prompt = 2
--+            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--+                Long_Prompt = 0
--+            else:
--+                Long_Prompt = 1
--+
--+
--+        return super().generate(*args, **kwargs)
-- 
--     def forward(
--         self,
--diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--index 913a7609..6566958b 100644
----- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--@@ -1104,7 +1104,7 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
-- 
--     # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--     @no_grad()
---    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+    def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--         original_dtype = hidden_states.dtype
--         batch_size, _ = hidden_states.shape
--         expert_outputs_list = [
--@@ -1119,8 +1119,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--         return moe_output_fp32.squeeze(1).to(original_dtype)
-- 
--+
--     # @no_grad()
---    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+    # def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--     #     num_tokens, _ = hidden_states.shape
--     #     flat_selected_experts = selected_experts.flatten()
--     #     sorted_expert_indices = flat_selected_experts.argsort()
--@@ -1142,8 +1143,9 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--     #         current_token_offset += expert_token_count
--     #     return moe_output
-- 
--+    # baseline
--     @no_grad()
---    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+    def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--         """
--         优化版 MoE prefill (速度优先模式)：
--         - 批量张量化处理同一个 expert 的所有 token
--@@ -1184,7 +1186,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         return moe_output
-- 
-- 
--+    @no_grad()
--+    def _moe_infer_prefill_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+        """
--+        优化版 MoE prefill (速度优先模式) - 连续切片 & 单次 scatter_add
--+        逻辑：
--+        1. 按 expert 排序，将同一 expert 的 token 放在连续内存中
--+        2. 每个 expert 一次性处理其全部 token
--+        3. 最后一次 scatter_add 回到原 token 顺序
--+        """
--+
--+        num_tokens = hidden_states.shape[0]
--+        hidden_size = hidden_states.shape[-1]
--+
--+        # 展平为一维
--+        flat_selected_experts = selected_experts.flatten()       # [num_tokens * top_k]
--+        flat_routing_weights = routing_weights.flatten()         # [num_tokens * top_k]
--+
--+        # 按 expert 排序
--+        idxs = flat_selected_experts.argsort()
--+        sorted_expert_indices = flat_selected_experts[idxs]       # expert ID 排序后
--+        sorted_token_indices = idxs // self.top_k                 # 对应原 token ID
--+
--+        # 排好序的输入向量（连续内存）
--+        permuted_tokens = hidden_states[sorted_token_indices]
--+
--+        # 排好序的权重
--+        sorted_weights = flat_routing_weights[idxs]
--+
--+        # 每个 expert 对应的 token 数量
--+        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--+
--+        # 存放专家输出（与 permuted_tokens 对应顺序保持一致）
--+        expert_outputs = ops.zeros_like(permuted_tokens)
--+
--+        ptr = 0  # 指向当前切片的起点
--+        for expert_id, count in enumerate(tokens_per_expert.tolist()):
--+            if count == 0:
--+                continue
--+
--+            token_slice = slice(ptr, ptr + count)
--+            expert_tokens = permuted_tokens[token_slice]  # 连续切片
--+
--+            # 执行专家 MLP
--+            expert_out = self.experts[expert_id](expert_tokens)
--+
--+            expert_outputs[token_slice] = expert_out
--+            ptr += count
--+
--+        # 按权重缩放
--+        scaled_outputs = expert_outputs * sorted_weights.unsqueeze(1)
--+
--+        # 回写到原 token 顺序 (单次 scatter_add)
--+        moe_output = mindspore.mint.scatter_add(
--+            ops.zeros_like(hidden_states),
--+            0,
--+            sorted_token_indices.view(-1, 1).tile((1, hidden_size)),
--+            scaled_outputs
--+        )
--+
--+        return moe_output
--+
--+
--+    
--     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+
--     @no_grad()
--     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--         moe_output = ops.zeros_like(hidden_states)
--@@ -1225,16 +1291,20 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--         #     # --- 速度优先模式 (SPEED MODE) ---
--         #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--         #     if sequence_length == 1:
---        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        #         moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--         #     else:
---        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+        #         moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--         
--         routing_weights_casted = routing_weights.to(hidden_states.dtype)
--         if sequence_length == 1:
---            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+            moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--         else:
---            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
---    
--+            # if Long_Prompt == 1:
--+            #     moe_output = self._moe_infer_prefill_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+            # else:
--+            #     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+            moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+
-- 
--         # 3. 共享专家计算与合并 (所有模式通用)
--         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--index c9c8c5ee..513dd40b 100644
----- a/patches/0001-20251104commit.patch
--+++ b/patches/0001-20251104commit.patch
--@@ -1,7 +1,7 @@
-- From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Tue, 4 Nov 2025 09:11:51 +0800
---Subject: [PATCH 1/6] 20251104commit
--+Subject: [PATCH 1/7] 20251104commit
-- 
-- ---
--  mindnlp/transformers/cache_utils.py           |  28 +-
--diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--index 625656eb..41081b85 100644
----- a/patches/0002-20251106commit.patch
--+++ b/patches/0002-20251106commit.patch
--@@ -1,7 +1,7 @@
-- From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 09:20:38 +0800
---Subject: [PATCH 2/6] 20251106commit
--+Subject: [PATCH 2/7] 20251106commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--index dcb85080..c1392569 100644
----- a/patches/0003-20261106secondcommit.patch
--+++ b/patches/0003-20261106secondcommit.patch
--@@ -1,7 +1,7 @@
-- From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 14:54:37 +0800
---Subject: [PATCH 3/6] 20261106secondcommit
--+Subject: [PATCH 3/7] 20261106secondcommit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--index bbed13cc..e548b1b2 100644
----- a/patches/0004-20251106change.patch
--+++ b/patches/0004-20251106change.patch
--@@ -1,7 +1,7 @@
-- From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Thu, 6 Nov 2025 15:48:09 +0800
---Subject: [PATCH 4/6] 20251106change
--+Subject: [PATCH 4/7] 20251106change
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  189 +-
--diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--index b2d1035c..bf224d2a 100644
----- a/patches/0005-20251107001commit.patch
--+++ b/patches/0005-20251107001commit.patch
--@@ -1,7 +1,7 @@
-- From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Fri, 7 Nov 2025 11:48:18 +0800
---Subject: [PATCH 5/6] 20251107001commit
--+Subject: [PATCH 5/7] 20251107001commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |   91 +-
--diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
--index bffa134e..1bd306b9 100644
----- a/patches/0006-20251107002commit.patch
--+++ b/patches/0006-20251107002commit.patch
--@@ -1,7 +1,7 @@
-- From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
-- From: Pinoeer-kingxi <13022943007@163.com>
-- Date: Fri, 7 Nov 2025 12:06:32 +0800
---Subject: [PATCH 6/6] 20251107002commit
--+Subject: [PATCH 6/7] 20251107002commit
-- 
-- ---
--  .../models/deepseek/modeling_deepseek.py      |  122 +-
--diff --git a/patches/0007-20251107003commit.patch b/patches/0007-20251107003commit.patch
--new file mode 100644
--index 00000000..ce558554
----- /dev/null
--+++ b/patches/0007-20251107003commit.patch
--@@ -0,0 +1,8034 @@
--+From cee579410530fa9fad61cd1b8a2c5cb8eb2d71f7 Mon Sep 17 00:00:00 2001
--+From: Pinoeer-kingxi <13022943007@163.com>
--+Date: Fri, 7 Nov 2025 12:12:51 +0800
--+Subject: [PATCH 7/7] 20251107003commit
--+
--+---
--+ .../models/deepseek/modeling_deepseek.py      |    2 +-
--+ patches/0001-20251104commit.patch             |    2 +-
--+ patches/0002-20251106commit.patch             |    2 +-
--+ patches/0003-20261106secondcommit.patch       |    2 +-
--+ patches/0004-20251106change.patch             |    2 +-
--+ patches/0005-20251107001commit.patch          |    2 +-
--+ patches/0006-20251107002commit.patch          | 7931 +++++++++++++++++
--+ 7 files changed, 7937 insertions(+), 6 deletions(-)
--+ create mode 100644 patches/0006-20251107002commit.patch
--+
--+diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+index e7e1c053..ff631974 100644
--+--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+@@ -435,7 +435,7 @@ class DeepseekMoE(nn.Module):
--+     #     return expert_cache
--+     
--+     @no_grad()
--+-    dwj
--++    # dwj
--+     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+         # x 的 shape: (1, hidden_size)
--+         # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+index 2842180e..c9c8c5ee 100644
--+--- a/patches/0001-20251104commit.patch
--++++ b/patches/0001-20251104commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Tue, 4 Nov 2025 09:11:51 +0800
--+-Subject: [PATCH 1/5] 20251104commit
--++Subject: [PATCH 1/6] 20251104commit
--+ 
--+ ---
--+  mindnlp/transformers/cache_utils.py           |  28 +-
--+diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+index c6cd8757..625656eb 100644
--+--- a/patches/0002-20251106commit.patch
--++++ b/patches/0002-20251106commit.patch
--+@@ -1,7 +1,7 @@
--+ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 09:20:38 +0800
--+-Subject: [PATCH 2/5] 20251106commit
--++Subject: [PATCH 2/6] 20251106commit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+index 601960c9..dcb85080 100644
--+--- a/patches/0003-20261106secondcommit.patch
--++++ b/patches/0003-20261106secondcommit.patch
--+@@ -1,7 +1,7 @@
--+ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 14:54:37 +0800
--+-Subject: [PATCH 3/5] 20261106secondcommit
--++Subject: [PATCH 3/6] 20261106secondcommit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--+index 8976f10b..bbed13cc 100644
--+--- a/patches/0004-20251106change.patch
--++++ b/patches/0004-20251106change.patch
--+@@ -1,7 +1,7 @@
--+ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Thu, 6 Nov 2025 15:48:09 +0800
--+-Subject: [PATCH 4/5] 20251106change
--++Subject: [PATCH 4/6] 20251106change
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |  189 +-
--+diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--+index 8d9032be..b2d1035c 100644
--+--- a/patches/0005-20251107001commit.patch
--++++ b/patches/0005-20251107001commit.patch
--+@@ -1,7 +1,7 @@
--+ From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
--+ From: Pinoeer-kingxi <13022943007@163.com>
--+ Date: Fri, 7 Nov 2025 11:48:18 +0800
--+-Subject: [PATCH 5/5] 20251107001commit
--++Subject: [PATCH 5/6] 20251107001commit
--+ 
--+ ---
--+  .../models/deepseek/modeling_deepseek.py      |   91 +-
--+diff --git a/patches/0006-20251107002commit.patch b/patches/0006-20251107002commit.patch
--+new file mode 100644
--+index 00000000..bffa134e
--+--- /dev/null
--++++ b/patches/0006-20251107002commit.patch
--+@@ -0,0 +1,7931 @@
--++From 5914e3e59151bf5f44089d83c508b03132e7bb60 Mon Sep 17 00:00:00 2001
--++From: Pinoeer-kingxi <13022943007@163.com>
--++Date: Fri, 7 Nov 2025 12:06:32 +0800
--++Subject: [PATCH 6/6] 20251107002commit
--++
--++---
--++ .../models/deepseek/modeling_deepseek.py      |  122 +-
--++ patches/0001-20251104commit.patch             |    2 +-
--++ patches/0002-20251106commit.patch             |    2 +-
--++ patches/0003-20261106secondcommit.patch       |    2 +-
--++ patches/0004-20251106change.patch             |    2 +-
--++ patches/0005-20251107001commit.patch          | 7707 +++++++++++++++++
--++ 6 files changed, 7773 insertions(+), 64 deletions(-)
--++ create mode 100644 patches/0005-20251107001commit.patch
--++
--++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++index 8831e4b7..e7e1c053 100644
--++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++@@ -433,30 +433,31 @@ class DeepseekMoE(nn.Module):
--++     #         expert_out = expert(x)
--++     #         expert_cache += expert_out * weight
--++     #     return expert_cache
--++-
--++-    # @no_grad()
--++-    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++-    #     # x 的 shape: (1, hidden_size)
--++-    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++-    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++-
--++-    #     # 1. 收集所有需要的专家层
--++-    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++-    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--++-
--++-    #     # 2. 并行计算所有专家的输出
--++-    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++-    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--++-    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++-    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++-
--++-    #     # 3. 使用矩阵乘法进行加权求和
--++-    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++-    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++-    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--++-    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+++    
--+++    @no_grad()
--+++    dwj
--+++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++        # x 的 shape: (1, hidden_size)
--+++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+++
--+++        # 1. 收集所有需要的专家层
--+++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+++
--+++        # 2. 并行计算所有专家的输出
--+++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+++        # ops.cat 会将它们堆叠成一个新的 Tensor
--+++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+++
--+++        # 3. 使用矩阵乘法进行加权求和
--+++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++        # 最终结果 final_output 的 shape: (1, hidden_size)
--+++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++         
--++-    #     return final_output
--+++        return final_output
--++ 
--++ 
--++     # @no_grad()
--++@@ -525,50 +526,51 @@ class DeepseekMoE(nn.Module):
--++ 
--++         return expert_cache
--++ # 放置在 DeepseekMoE 类中
--++-    @no_grad()
--++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++-        """
--++-        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--++-
--++-        Args:
--++-            x (Tensor): 输入张量, shape: (1, hidden_size)
--++-            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--++-            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--++-        """
--++-        top_k, _ = flat_expert_weights.shape
--++-        hidden_size = x.shape[-1]
--++-
--++-        # 1. 将所有专家的权重堆叠起来
--++-        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--++-        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--++-        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--+++    # @no_grad()
--+++    # #lwx 20251107
--+++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++    #     """
--+++    #     优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--+++
--+++    #     Args:
--+++    #         x (Tensor): 输入张量, shape: (1, hidden_size)
--+++    #         flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--+++    #         flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--+++    #     """
--+++    #     top_k, _ = flat_expert_weights.shape
--+++    #     hidden_size = x.shape[-1]
--+++
--+++    #     # 1. 将所有专家的权重堆叠起来
--+++    #     stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--+++    #     stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--+++    #     stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--++         
--++-        # 2. "收集" 所需的专家权重
--++-        selected_gate_w = stacked_gate_w[flat_expert_indices]
--++-        selected_up_w = stacked_up_w[flat_expert_indices]
--++-        selected_down_w = stacked_down_w[flat_expert_indices]
--+++    #     # 2. "收集" 所需的专家权重
--+++    #     selected_gate_w = stacked_gate_w[flat_expert_indices]
--+++    #     selected_up_w = stacked_up_w[flat_expert_indices]
--+++    #     selected_down_w = stacked_down_w[flat_expert_indices]
--++         
--++-        # 3. 准备输入
--++-        x_expanded = x.expand((top_k, 1, hidden_size))
--+++    #     # 3. 准备输入
--+++    #     x_expanded = x.expand((top_k, 1, hidden_size))
--++         
--++-        # 4. 并行计算 gate_proj 和 up_proj
--++-        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--++-        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--+++    #     # 4. 并行计算 gate_proj 和 up_proj
--+++    #     gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--+++    #     up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--++ 
--++-        # 5. 计算中间状态
--++-        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--+++    #     # 5. 计算中间状态
--+++    #     intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--++         
--++-        # 6. 并行计算 down_proj
--++-        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--++-        #    --- [FIX] ---
--++-        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--++-        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--++-        #    --- [FIX END] ---
--+++    #     # 6. 并行计算 down_proj
--+++    #     #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--+++    #     #    --- [FIX] ---
--+++    #     #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--+++    #     expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--+++    #     #    --- [FIX END] ---
--++         
--++-        # 7. 根据路由权重进行加权求和
--++-        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--+++    #     # 7. 根据路由权重进行加权求和
--+++    #     weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--++ 
--++-        return weighted_sum
--+++    #     return weighted_sum
--++ 
--++ 
--++ 
--++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++index 0a0ef2d7..2842180e 100644
--++--- a/patches/0001-20251104commit.patch
--+++++ b/patches/0001-20251104commit.patch
--++@@ -1,7 +1,7 @@
--++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Tue, 4 Nov 2025 09:11:51 +0800
--++-Subject: [PATCH 1/4] 20251104commit
--+++Subject: [PATCH 1/5] 20251104commit
--++ 
--++ ---
--++  mindnlp/transformers/cache_utils.py           |  28 +-
--++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--++index 5185270c..c6cd8757 100644
--++--- a/patches/0002-20251106commit.patch
--+++++ b/patches/0002-20251106commit.patch
--++@@ -1,7 +1,7 @@
--++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Thu, 6 Nov 2025 09:20:38 +0800
--++-Subject: [PATCH 2/4] 20251106commit
--+++Subject: [PATCH 2/5] 20251106commit
--++ 
--++ ---
--++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--++index 3e05f821..601960c9 100644
--++--- a/patches/0003-20261106secondcommit.patch
--+++++ b/patches/0003-20261106secondcommit.patch
--++@@ -1,7 +1,7 @@
--++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Thu, 6 Nov 2025 14:54:37 +0800
--++-Subject: [PATCH 3/4] 20261106secondcommit
--+++Subject: [PATCH 3/5] 20261106secondcommit
--++ 
--++ ---
--++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--++index 88a1aef4..8976f10b 100644
--++--- a/patches/0004-20251106change.patch
--+++++ b/patches/0004-20251106change.patch
--++@@ -1,7 +1,7 @@
--++ From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--++ From: Pinoeer-kingxi <13022943007@163.com>
--++ Date: Thu, 6 Nov 2025 15:48:09 +0800
--++-Subject: [PATCH 4/4] 20251106change
--+++Subject: [PATCH 4/5] 20251106change
--++ 
--++ ---
--++  .../models/deepseek/modeling_deepseek.py      |  189 +-
--++diff --git a/patches/0005-20251107001commit.patch b/patches/0005-20251107001commit.patch
--++new file mode 100644
--++index 00000000..8d9032be
--++--- /dev/null
--+++++ b/patches/0005-20251107001commit.patch
--++@@ -0,0 +1,7707 @@
--+++From 0aff56c2ef51374ba385ca0965ff57053447db54 Mon Sep 17 00:00:00 2001
--+++From: Pinoeer-kingxi <13022943007@163.com>
--+++Date: Fri, 7 Nov 2025 11:48:18 +0800
--+++Subject: [PATCH 5/5] 20251107001commit
--+++
--+++---
--+++ .../models/deepseek/modeling_deepseek.py      |   91 +-
--+++ .../models/qwen2_moe/modeling_qwen2_moe.py    |    6 +-
--+++ .../models/qwen2_vl/modeling_qwen2_vl.py      |    6 +-
--+++ patches/0001-20251104commit.patch             |    2 +-
--+++ patches/0002-20251106commit.patch             |    2 +-
--+++ patches/0003-20261106secondcommit.patch       |    2 +-
--+++ patches/0004-20251106change.patch             | 7498 +++++++++++++++++
--+++ 7 files changed, 7577 insertions(+), 30 deletions(-)
--+++ create mode 100644 patches/0004-20251106change.patch
--+++
--+++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++index 0546f318..8831e4b7 100644
--+++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++@@ -434,29 +434,29 @@ class DeepseekMoE(nn.Module):
--+++     #         expert_cache += expert_out * weight
--+++     #     return expert_cache
--+++ 
--+++-    @no_grad()
--+++-    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++-        # x 的 shape: (1, hidden_size)
--+++-        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--+++-        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--+++-
--+++-        # 1. 收集所有需要的专家层
--+++-        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--+++-        selected_experts = [self.experts[i] for i in flat_expert_indices]
--+++-
--+++-        # 2. 并行计算所有专家的输出
--+++-        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--+++-        # ops.cat 会将它们堆叠成一个新的 Tensor
--+++-        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++-        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--+++-
--+++-        # 3. 使用矩阵乘法进行加权求和
--+++-        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--+++-        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--+++-        # 最终结果 final_output 的 shape: (1, hidden_size)
--+++-        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++++    # @no_grad()
--++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++    #     # x 的 shape: (1, hidden_size)
--++++    #     # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++++    #     # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++++
--++++    #     # 1. 收集所有需要的专家层
--++++    #     # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++++    #     selected_experts = [self.experts[i] for i in flat_expert_indices]
--++++
--++++    #     # 2. 并行计算所有专家的输出
--++++    #     # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++++    #     # ops.cat 会将它们堆叠成一个新的 Tensor
--++++    #     # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++    #     expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++++
--++++    #     # 3. 使用矩阵乘法进行加权求和
--++++    #     # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++++    #     # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++    #     # 最终结果 final_output 的 shape: (1, hidden_size)
--++++    #     final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--+++         
--+++-        return final_output
--++++    #     return final_output
--+++ 
--+++ 
--+++     # @no_grad()
--+++@@ -524,6 +524,53 @@ class DeepseekMoE(nn.Module):
--+++             )
--+++ 
--+++         return expert_cache
--++++# 放置在 DeepseekMoE 类中
--++++    @no_grad()
--++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++        """
--++++        优化版 MoE decode：使用批量矩阵乘法 (bmm) 并行处理所有专家。
--++++
--++++        Args:
--++++            x (Tensor): 输入张量, shape: (1, hidden_size)
--++++            flat_expert_indices (Tensor): 选中的专家索引, shape: (num_experts_per_tok,)
--++++            flat_expert_weights (Tensor): 专家的权重, shape: (num_experts_per_tok, 1)
--++++        """
--++++        top_k, _ = flat_expert_weights.shape
--++++        hidden_size = x.shape[-1]
--++++
--++++        # 1. 将所有专家的权重堆叠起来
--++++        stacked_gate_w = ops.stack([expert.gate_proj.weight for expert in self.experts])
--++++        stacked_up_w = ops.stack([expert.up_proj.weight for expert in self.experts])
--++++        stacked_down_w = ops.stack([expert.down_proj.weight for expert in self.experts])
--++++        
--++++        # 2. "收集" 所需的专家权重
--++++        selected_gate_w = stacked_gate_w[flat_expert_indices]
--++++        selected_up_w = stacked_up_w[flat_expert_indices]
--++++        selected_down_w = stacked_down_w[flat_expert_indices]
--++++        
--++++        # 3. 准备输入
--++++        x_expanded = x.expand((top_k, 1, hidden_size))
--++++        
--++++        # 4. 并行计算 gate_proj 和 up_proj
--++++        gate_out = ops.bmm(x_expanded, selected_gate_w.transpose(0, 2, 1))
--++++        up_out = ops.bmm(x_expanded, selected_up_w.transpose(0, 2, 1))
--++++
--++++        # 5. 计算中间状态
--++++        intermediate_states = self.experts[0].act_fn(gate_out) * up_out
--++++        
--++++        # 6. 并行计算 down_proj
--++++        #    (top_k, 1, I) @ (top_k, I, H) -> (top_k, 1, H)
--++++        #    --- [FIX] ---
--++++        #    对 down_proj 的权重进行转置以匹配矩阵乘法维度
--++++        expert_outputs = ops.bmm(intermediate_states, selected_down_w.transpose(0, 2, 1))
--++++        #    --- [FIX END] ---
--++++        
--++++        # 7. 根据路由权重进行加权求和
--++++        weighted_sum = (expert_outputs * flat_expert_weights.unsqueeze(-1)).sum(axis=0)
--++++
--++++        return weighted_sum
--++++
--++++
--+++ 
--+++         # @no_grad()
--+++         # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++index ebd7782e..913a7609 100644
--+++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++@@ -279,10 +279,10 @@ class Qwen2MoeRotaryEmbedding(nn.Module):
--+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++-    x1 = x[..., : x.shape[-1] // 2]
--+++-    x2 = x[..., x.shape[-1] // 2 :]
--++++    # x1 = x[..., : x.shape[-1] // 2]
--++++    # x2 = x[..., x.shape[-1] // 2 :]
--+++     # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++-    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++diff --git a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+++index d059dcbe..2b217b64 100644
--+++--- a/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--++++++ b/mindnlp/transformers/models/qwen2_vl/modeling_qwen2_vl.py
--+++@@ -176,8 +176,10 @@ class Qwen2VLRotaryEmbedding(nn.Module):
--+++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++ def rotate_half(x):
--+++     """Rotates half the hidden dims of the input."""
--+++-    x1 = x[..., : x.shape[-1] // 2]
--+++-    x2 = x[..., x.shape[-1] // 2 :]
--++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++    # x1 = x[..., : x.shape[-1] // 2]
--++++    # x2 = x[..., x.shape[-1] // 2 :]
--++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++     return ops.cat((-x2, x1), dim=-1)
--+++ 
--+++ 
--+++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++index 78f22642..0a0ef2d7 100644
--+++--- a/patches/0001-20251104commit.patch
--++++++ b/patches/0001-20251104commit.patch
--+++@@ -1,7 +1,7 @@
--+++ From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++ From: Pinoeer-kingxi <13022943007@163.com>
--+++ Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++-Subject: [PATCH 1/3] 20251104commit
--++++Subject: [PATCH 1/4] 20251104commit
--+++ 
--+++ ---
--+++  mindnlp/transformers/cache_utils.py           |  28 +-
--+++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--+++index 22b65dd5..5185270c 100644
--+++--- a/patches/0002-20251106commit.patch
--++++++ b/patches/0002-20251106commit.patch
--+++@@ -1,7 +1,7 @@
--+++ From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+++ From: Pinoeer-kingxi <13022943007@163.com>
--+++ Date: Thu, 6 Nov 2025 09:20:38 +0800
--+++-Subject: [PATCH 2/3] 20251106commit
--++++Subject: [PATCH 2/4] 20251106commit
--+++ 
--+++ ---
--+++  .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--+++index 966529e4..3e05f821 100644
--+++--- a/patches/0003-20261106secondcommit.patch
--++++++ b/patches/0003-20261106secondcommit.patch
--+++@@ -1,7 +1,7 @@
--+++ From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+++ From: Pinoeer-kingxi <13022943007@163.com>
--+++ Date: Thu, 6 Nov 2025 14:54:37 +0800
--+++-Subject: [PATCH 3/3] 20261106secondcommit
--++++Subject: [PATCH 3/4] 20261106secondcommit
--+++ 
--+++ ---
--+++  .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+++diff --git a/patches/0004-20251106change.patch b/patches/0004-20251106change.patch
--+++new file mode 100644
--+++index 00000000..88a1aef4
--+++--- /dev/null
--++++++ b/patches/0004-20251106change.patch
--+++@@ -0,0 +1,7498 @@
--++++From 04a0154934c483b9f42d997f28c0420c9b50ead6 Mon Sep 17 00:00:00 2001
--++++From: Pinoeer-kingxi <13022943007@163.com>
--++++Date: Thu, 6 Nov 2025 15:48:09 +0800
--++++Subject: [PATCH 4/4] 20251106change
--++++
--++++---
--++++ .../models/deepseek/modeling_deepseek.py      |  189 +-
--++++ patches/0001-20251104commit.patch             | 1272 +++++++
--++++ patches/0002-20251106commit.patch             | 3200 +++++++++++++++++
--++++ patches/0003-20261106secondcommit.patch       | 2769 ++++++++++++++
--++++ 4 files changed, 7244 insertions(+), 186 deletions(-)
--++++ create mode 100644 patches/0001-20251104commit.patch
--++++ create mode 100644 patches/0002-20251106commit.patch
--++++ create mode 100644 patches/0003-20261106secondcommit.patch
--++++
--++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++index 2f9192bf..0546f318 100644
--++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++@@ -968,168 +968,6 @@ class DeepseekAttention(nn.Module):
--++++ 
--++++         return attn_output, attn_weights, past_key_value
--++++ 
--++++-# class DeepseekFlashAttention(nn.Module):
--++++-#     """
--++++-#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++++-#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--++++-
--++++-#     This class is designed as a drop-in replacement for DeepseekAttention.
--++++-#     """
--++++-
--++++-#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++++-#         super().__init__()
--++++-#         self.config = config
--++++-#         self.layer_idx = layer_idx
--++++-#         if layer_idx is None:
--++++-#             logger.warning(
--++++-#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++-#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++-#                 "when creating this class."
--++++-#             )
--++++-
--++++-#         self.attention_dropout = config.attention_dropout
--++++-#         self.hidden_size = config.hidden_size
--++++-#         self.num_heads = config.num_attention_heads
--++++-#         self.head_dim = self.hidden_size // self.num_heads
--++++-#         self.num_key_value_heads = config.num_key_value_heads
--++++-#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++-#         self.max_position_embeddings = config.max_position_embeddings
--++++-#         self.rope_theta = config.rope_theta
--++++-#         self.is_causal = True
--++++-
--++++-#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++-#             raise ValueError(
--++++-#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++-#                 f" and `num_heads`: {self.num_heads})."
--++++-#             )
--++++-
--++++-#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++++-#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++-#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++-#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++++-#         self._init_rope()
--++++-
--++++-#     def _init_rope(self):
--++++-#         if self.config.rope_scaling is None:
--++++-#             self.rotary_emb = DeepseekRotaryEmbedding(
--++++-#                 self.head_dim,
--++++-#                 max_position_embeddings=self.max_position_embeddings,
--++++-#                 base=self.rope_theta,
--++++-#             )
--++++-#         else:
--++++-#             scaling_type = self.config.rope_scaling["type"]
--++++-#             scaling_factor = self.config.rope_scaling["factor"]
--++++-#             if scaling_type == "linear":
--++++-#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++++-#                     self.head_dim,
--++++-#                     max_position_embeddings=self.max_position_embeddings,
--++++-#                     scaling_factor=scaling_factor,
--++++-#                     base=self.rope_theta,
--++++-#                 )
--++++-#             elif scaling_type == "dynamic":
--++++-#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++++-#                     self.head_dim,
--++++-#                     max_position_embeddings=self.max_position_embeddings,
--++++-#                     scaling_factor=scaling_factor,
--++++-#                     base=self.rope_theta,
--++++-#                 )
--++++-#             else:
--++++-#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++++-
--++++-#     def forward(
--++++-#         self,
--++++-#         hidden_states: mindspore.Tensor,
--++++-#         attention_mask: Optional[mindspore.Tensor] = None,
--++++-#         position_ids: Optional[mindspore.Tensor] = None,
--++++-#         past_key_value: Optional[Cache] = None,
--++++-#         output_attentions: bool = False,
--++++-#         use_cache: bool = False,
--++++-#         **kwargs,
--++++-#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++-#         if "padding_mask" in kwargs:
--++++-#             warnings.warn(
--++++-#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++++-#             )
--++++-        
--++++-#         if output_attentions:
--++++-#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--++++-
--++++-#         bsz, q_len, _ = hidden_states.shape
--++++-
--++++-#         if self.config.pretraining_tp > 1:
--++++-#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++++-
--++++-#         query_states = self.q_proj(hidden_states)
--++++-#         key_states = self.k_proj(hidden_states)
--++++-#         value_states = self.v_proj(hidden_states)
--++++-
--++++-#         # Reshape for multi-head attention
--++++-#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++-
--++++-#         kv_seq_len = key_states.shape[-2]
--++++-#         if past_key_value is not None:
--++++-#             if self.layer_idx is None:
--++++-#                 raise ValueError(
--++++-#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++-#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++-#                     "with a layer index."
--++++-#                 )
--++++-#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++-        
--++++-#         # Apply Rotary Positional Embedding
--++++-#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++-#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++-
--++++-#         if past_key_value is not None:
--++++-#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--++++-#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++-
--++++-#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--++++-#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--++++-#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++-        
--++++-#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++-#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++-        
--++++-#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++-#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++-
--++++-#         # Convert attention_mask for flash_attention_score
--++++-#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--++++-#         if attention_mask is not None:
--++++-#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--++++-#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++++-#                 raise ValueError(
--++++-#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++++-#                 )
--++++-#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--++++-#         else:
--++++-#             attn_mask_for_fa = None
--++++-        
--++++-#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++++-
--++++-#         # Call the fused flash_attention_score operator
--++++-#         attn_output = mindspore.ops.flash_attention_score(
--++++-#             query=query_states_for_fa,
--++++-#             key=key_states_for_fa,
--++++-#             value=value_states_for_fa,
--++++-#             head_num=self.num_heads, # This is N1, the number of query heads
--++++-#             input_layout='BSH',
--++++-#             attn_mask=attn_mask_for_fa,
--++++-#             keep_prob=keep_prob,
--++++-#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++-#             sparse_mode=0 # Default mask mode
--++++-#         )
--++++-        
--++++-#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--++++-#         attn_output = self.o_proj(attn_output)
--++++-        
--++++-#         # Flash Attention does not return attention weights
--++++-#         attn_weights = None
--++++-
--++++-#         return attn_output, attn_weights, past_key_value
--++++ 
--++++ class DeepseekFlashAttention(nn.Module):
--++++     """
--++++@@ -1300,9 +1138,9 @@ class DeepseekDecoderLayer(nn.Module):
--++++         super().__init__()
--++++         self.hidden_size = config.hidden_size
--++++ 
--++++-        self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--++++-            config=config, layer_idx=layer_idx
--++++-        )
--+++++        # self.self_attn = Deepseek_ATTENTION_CLASSES[config._attn_implementation](
--+++++            # config=config, layer_idx=layer_idx
--+++++        # )
--++++ 
--++++         self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--++++             config=config, layer_idx=layer_idx
--++++@@ -1387,7 +1225,6 @@ class DeepseekDecoderLayer(nn.Module):
--++++         return outputs
--++++ 
--++++ 
--++++-
--++++ class DeepseekPreTrainedModel(PreTrainedModel):
--++++     config_class = DeepseekConfig
--++++     base_model_prefix = "model"
--++++@@ -1613,26 +1450,6 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++         # Initialize weights and apply final processing
--++++         self.post_init()
--++++         self.warm_up = False
--++++-        #@dwj
--++++-        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--++++-            self.num_layers,
--++++-            self.num_attention_heads,
--++++-            self.head_dim,
--++++-            batch_size=1,
--++++-            max_length=self.max_length,
--++++-            dtype=mindspore.float16
--++++-        )
--++++-
--++++-    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--++++-        key_cache = []
--++++-        value_cache = []
--++++-        for _ in range(num_layers):
--++++-            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++-            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++-            key_cache.append(k)
--++++-            value_cache.append(v)
--++++-        return key_cache, value_cache
--++++-
--++++ 
--++++     def warmup_moe_model_deep(self):
--++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--++++new file mode 100644
--++++index 00000000..78f22642
--++++--- /dev/null
--+++++++ b/patches/0001-20251104commit.patch
--++++@@ -0,0 +1,1272 @@
--+++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++++From: Pinoeer-kingxi <13022943007@163.com>
--+++++Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++++Subject: [PATCH 1/3] 20251104commit
--+++++
--+++++---
--+++++ mindnlp/transformers/cache_utils.py           |  28 +-
--+++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++++ 3 files changed, 976 insertions(+), 87 deletions(-)
--+++++
--+++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++++index cadd2e04..02f8d4be 100644
--+++++--- a/mindnlp/transformers/cache_utils.py
--++++++++ b/mindnlp/transformers/cache_utils.py
--+++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++++             # k_out[:, :, cache_position] = key_states
--+++++             # v_out[:, :, cache_position] = value_states
--+++++-            if ON_ORANGE_PI:
--+++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++-            else:
--+++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++-
--++++++            # if ON_ORANGE_PI:
--++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++++            # else:
--++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++++            # 确保 cache_position 是 1D tensor 并且类型正确
--++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--++++++            if cache_position.ndim > 1:
--++++++                cache_position = cache_position.flatten()
--++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--++++++                cache_position = cache_position.int()
--++++++            
--++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--++++++            k_out[:, :, cache_position] = key_states
--++++++            v_out[:, :, cache_position] = value_states
--++++++            
--+++++         return k_out, v_out
--+++++ 
--+++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++index c695b944..d8303e45 100644
--+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++++ def rotate_half(x):
--+++++     """Rotates half the hidden dims of the input."""
--+++++-    x1 = x[..., : x.shape[-1] // 2]
--+++++-    x2 = x[..., x.shape[-1] // 2 :]
--++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++++    # x1 = x[..., : x.shape[-1] // 2]
--++++++    # x2 = x[..., x.shape[-1] // 2 :]
--++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++     return ops.cat((-x2, x1), dim=-1)
--+++++ 
--+++++ 
--+++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++++         if self.training:
--+++++             raise NotImplementedError("Training is not supported yet.")
--+++++         else:
--+++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++-        if self.config.n_shared_experts is not None:
--+++++-            y = y + self.shared_experts(identity)
--+++++-        return y
--++++++            # @lwx
--++++++            if orig_shape[1] == 1:
--++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--++++++                y=y.view(*orig_shape)
--++++++                if self.config.n_shared_experts is not None:
--++++++                    y = y + self.shared_experts(identity)
--++++++                return y
--++++++            else:
--++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--++++++                if self.config.n_shared_experts is not None:
--++++++                    y = y + self.shared_experts(identity)
--++++++                return y
--++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++++        # if self.config.n_shared_experts is not None:
--++++++        #     y = y + self.shared_experts(identity)
--++++++        # return y
--++++++        
--++++++    @no_grad()
--++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++++
--++++++        expert_cache = ops.zeros_like(x)
--++++++        for i in range(self.num_experts_per_tok):
--++++++            expert_id = flat_expert_indices[i].item()
--++++++            weight = flat_expert_weights[i].item()
--++++++            expert = self.experts[expert_id]
--++++++            expert_out = expert(x)
--++++++            expert_cache += expert_out * weight
--++++++        return expert_cache
--+++++ 
--+++++     @no_grad()
--+++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-        # expert_cache = torch.zeros_like(x)
--+++++-        # idxs = flat_expert_indices.argsort()
--+++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++-        # token_idxs = idxs // self.num_experts_per_tok
--+++++-        # for i, end_idx in enumerate(tokens_per_expert):
--+++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++-        #     if start_idx == end_idx:
--+++++-        #         continue
--+++++-        #     expert = self.experts[i]
--+++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-        #     expert_tokens = x[exp_token_idx]
--+++++-        #     expert_out = expert(expert_tokens)
--+++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++-        # return expert_cache
--++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++         expert_cache = ops.zeros_like(x)
--+++++         idxs = flat_expert_indices.argsort()
--+++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++         token_idxs = idxs // self.num_experts_per_tok
--++++++
--+++++         for i, end_idx in enumerate(tokens_per_expert):
--+++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++             if start_idx == end_idx:
--+++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++++             expert_out = expert(expert_tokens)
--+++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++++
--+++++         return expert_cache
--++++++        
--++++++    # @no_grad()
--++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++    #     # expert_cache = torch.zeros_like(x)
--++++++    #     # idxs = flat_expert_indices.argsort()
--++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++++    #     # token_idxs = idxs // self.num_experts_per_tok
--++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++++    #     #     if start_idx == end_idx:
--++++++    #     #         continue
--++++++    #     #     expert = self.experts[i]
--++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #     #     expert_tokens = x[exp_token_idx]
--++++++    #     #     expert_out = expert(expert_tokens)
--++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++++    #     # return expert_cache
--++++++    #     expert_cache = ops.zeros_like(x)
--++++++    #     idxs = flat_expert_indices.argsort()
--++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++    #     for i, end_idx in enumerate(tokens_per_expert):
--++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++    #         if start_idx == end_idx:
--++++++    #             continue
--++++++    #         expert = self.experts[i]
--++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #         expert_tokens = x[exp_token_idx]
--++++++    #         expert_out = expert(expert_tokens)
--++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++++
--++++++    #     return expert_cache
--++++++    # @no_grad()
--++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++    #     expert_cache = ops.zeros_like(x)
--++++++
--++++++    #     # 排序保证顺序一致
--++++++    #     idxs = flat_expert_indices.argsort()
--++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++    #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++    #     # 找出有 token 的专家
--++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++++
--++++++    #     for i in active_experts.tolist():
--++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++    #         end_idx = tokens_per_expert[i]
--++++++    #         if start_idx == end_idx:  # 没有 token
--++++++    #             continue
--++++++
--++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++    #         expert_tokens = x[exp_token_idx]
--++++++    #         expert_out = self.experts[i](expert_tokens)
--++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++++
--++++++    #         expert_cache = mindspore.mint.scatter_add(
--++++++    #             expert_cache,
--++++++    #             0,
--++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++++    #             expert_out
--++++++    #         )
--++++++
--++++++    #     return expert_cache
--++++++
--++++++
--+++++ 
--+++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++++ #     """
--+++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++ 
--+++++         # Initialize weights and apply final processing
--+++++         self.post_init()
--++++++        self.warm_up = False
--++++++
--++++++    def warmup_moe_model_deep(self):
--++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--++++++        test_texts = [
--++++++            "warmup short",
--++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--++++++        ]
--++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++++        if tokenizer is None:
--++++++            from mindnlp.transformers import AutoTokenizer
--++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++++            self._warmup_tokenizer = tokenizer
--++++++
--++++++        for text in test_texts:
--++++++            inputs = tokenizer(text, return_tensors="ms")
--++++++            with mindspore._no_grad():
--++++++                _ = self(**inputs, use_cache=False)
--++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++++ 
--+++++     def get_input_embeddings(self):
--+++++         return self.model.embed_tokens
--+++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++         ```"""
--++++++        if not self.warm_up:
--++++++            self.warm_up = True
--++++++            self.warmup_moe_model_deep()
--++++++
--+++++         output_attentions = (
--+++++             output_attentions
--+++++             if output_attentions is not None
--+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++index 3cbf820e..d4c6b651 100644
--+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++@@ -18,7 +18,6 @@
--+++++ # See the License for the specific language governing permissions and
--+++++ # limitations under the License.
--+++++ """MindSpore Qwen2MoE model."""
--+++++-
--+++++ import math
--+++++ from typing import List, Optional, Tuple, Union
--+++++ 
--+++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++++     TokenClassifierOutput,
--+++++ )
--+++++ from ...modeling_utils import PreTrainedModel
--++++++from ...generation import GenerationMixin
--+++++ from ....utils import logging
--+++++ from .configuration_qwen2_moe import Qwen2MoeConfig
--+++++ 
--+++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++++         self.variance_epsilon = eps
--+++++ 
--+++++     def forward(self, hidden_states):
--++++++        # @dwj
--++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++++        # @lwx
--++++++        # if not self.training :
--++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++         input_dtype = hidden_states.dtype
--+++++         hidden_states = hidden_states.to(mindspore.float32)
--+++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++++@@ -234,6 +239,8 @@ def rotate_half(x):
--+++++     """Rotates half the hidden dims of the input."""
--+++++     x1 = x[..., : x.shape[-1] // 2]
--+++++     x2 = x[..., x.shape[-1] // 2 :]
--++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++     return ops.cat((-x2, x1), dim=-1)
--+++++ 
--+++++ 
--+++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++++         self.config = config
--+++++         self.hidden_size = config.hidden_size
--+++++         self.intermediate_size = intermediate_size
--++++++        
--+++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++++         self.act_fn = ACT2FN[config.hidden_act]
--+++++ 
--+++++     def forward(self, x):
--+++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++-
--+++++ 
--++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++++        # @lwx 
--++++++        # gate_up_output = self.gate_up_proj(x)
--++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--++++++        # return self.down_proj(swiglu_output)
--++++++
--++++++    # def forward(self, x):
--++++++    #     gate_proj_out = self.gate_proj(x)
--++++++    #     up_proj_out = self.up_proj(x)
--++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--++++++    #     return self.down_proj(swiglu_out)
--++++++        
--+++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++++     """
--+++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++++         use_cache: bool = False,
--+++++         cache_position: Optional[mindspore.Tensor] = None,
--+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++        
--++++++
--+++++         bsz, q_len, _ = hidden_states.shape
--+++++ 
--+++++         query_states = self.q_proj(hidden_states)
--+++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++                     "with a layer index."
--+++++                 )
--+++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++            if isinstance(past_key_value, StaticCache):
--++++++                kv_seq_len = key_states.shape[-2]
--++++++            else:
--++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++ 
--+++++         if past_key_value is not None:
--+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++            
--++++++            if isinstance(past_key_value, StaticCache):
--++++++                kv_seq_len = key_states.shape[-2]
--+++++ 
--+++++         # repeat k/v heads if n_kv_heads < n_heads
--+++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++-
--++++++        
--+++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++ 
--+++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++++-            raise ValueError(
--+++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++++-                f" {attn_weights.shape}"
--+++++-            )
--+++++-
--+++++-        if attention_mask is not None:  # no matter the length, we just slice it
--+++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--++++++        if attention_mask is not None:
--++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++             attn_weights = attn_weights + causal_mask
--+++++ 
--+++++         # upcast attention to fp32
--+++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++ 
--+++++         attn_output = self.o_proj(attn_output)
--+++++-
--++++++        # @lwx
--++++++        
--++++++        # max_seq_len = self.max_position_embeddings  # 2048
--++++++
--++++++        # if attention_mask is not None:
--++++++        #     # attention_mask: [B, 1, Sq, Sk]
--++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++++
--++++++        #     # pad 到 [max_seq_len, max_seq_len]
--++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++++        #     global_attention_mask = padded_mask
--++++++        # else:
--++++++        #     global_attention_mask = None
--++++++
--++++++
--++++++        # sparse_mode=3
--++++++        # attn_output = mindspore.ops.flash_attention_score(
--++++++        #     query=query_states,
--++++++        #     key=key_states,
--++++++        #     value=value_states,
--++++++        #     real_shift=None,
--++++++        #     padding_mask=None,
--++++++
--++++++        #     head_num=self.num_heads,
--++++++        #     attn_mask=global_attention_mask,
--++++++        #     keep_prob=1.0 - self.attention_dropout,
--++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++        #     input_layout="BNSD",
--++++++        #     pre_tokens=2147483647,
--++++++        #     next_tokens=2147483647,
--++++++        #     inner_precise=0,
--++++++        #     drop_mask=None,
--++++++        #     prefix=None,
--++++++        #     actual_seq_qlen=None,
--++++++        #     actual_seq_kvlen=None,
--++++++        #     sparse_mode=sparse_mode,
--++++++        # )
--+++++         if not output_attentions:
--+++++             attn_weights = None
--+++++ 
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++ 
--++++++class Qwen2MoeFlashAttention(nn.Module):
--++++++    """
--++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++++
--++++++    关键改动:
--++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++++       直接传入原始的 key 和 value 张量效率更高。
--++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++++    """
--++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++++        super().__init__()
--++++++        self.config = config
--++++++        self.layer_idx = layer_idx
--++++++        self.hidden_size = config.hidden_size
--++++++        self.num_heads = config.num_attention_heads
--++++++        self.head_dim = self.hidden_size // self.num_heads
--++++++        self.num_key_value_heads = config.num_key_value_heads
--++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++        self.max_position_embeddings = config.max_position_embeddings
--++++++        self.rope_theta = config.rope_theta
--++++++        self.attention_dropout = config.attention_dropout
--++++++
--++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++            raise ValueError(
--++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++++            )
--++++++
--++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++++
--++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++++            self.head_dim,
--++++++            max_position_embeddings=self.max_position_embeddings,
--++++++            base=self.rope_theta,
--++++++        )
--++++++
--++++++    def forward(
--++++++        self,
--++++++        hidden_states: mindspore.Tensor,
--++++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++++        position_ids: Optional[mindspore.Tensor] = None,
--++++++        past_key_value: Optional[Cache] = None,
--++++++        output_attentions: bool = False,
--++++++        use_cache: bool = False,
--++++++        cache_position: Optional[mindspore.Tensor] = None,
--++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++        bsz, q_len, _ = hidden_states.shape
--++++++
--++++++        # 1. 线性投射 Q, K, V
--++++++        query_states = self.q_proj(hidden_states)
--++++++        key_states = self.k_proj(hidden_states)
--++++++        value_states = self.v_proj(hidden_states)
--++++++
--++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++        # 3. RoPE 旋转位置编码
--++++++        kv_seq_len = key_states.shape[-2]
--++++++        if past_key_value is not None:
--++++++            if self.layer_idx is None:
--++++++                raise ValueError(
--++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++                    "with a layer index."
--++++++                )
--++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++++                if cache_position.shape[0] == 1:
--++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++++                    kv_seq_len = past_seen_tokens + 1
--++++++                else:
--++++++                    # prefill 阶段：cache_position 是范围，使用其长度
--++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++++            else:
--++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++        # 4. KV 缓存更新
--++++++        if past_key_value is not None:
--++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++            key_states, value_states = past_key_value.update(
--++++++                key_states, value_states, self.layer_idx, cache_kwargs
--++++++            )
--++++++            
--++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++                if cache_position.shape[0] == 1:
--++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++++                    kv_seq_len = key_states.shape[-2]
--++++++
--++++++        # 5. [重要] 准备 Attention Mask
--++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++++        fa_attention_mask = None
--++++++        if attention_mask is not None:
--++++++            # 截取与当前key长度匹配的部分
--++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++++            fa_attention_mask = (mask_slice != 0)
--++++++
--++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++++        input_dtype = query_states.dtype
--++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++++            query_states = query_states.to(mindspore.float16)
--++++++            key_states = key_states.to(mindspore.float16)
--++++++            value_states = value_states.to(mindspore.float16)
--++++++
--++++++        # 6. [核心] 调用 flash_attention_score 算子
--++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++++        attn_output = mindspore.ops.flash_attention_score(
--++++++            query=query_states,
--++++++            key=key_states,
--++++++            value=value_states,
--++++++            head_num=self.num_heads, # 传入Q的头数(N1)
--++++++            attn_mask=fa_attention_mask,
--++++++            keep_prob=1.0 - self.attention_dropout,
--++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++            input_layout="BNSD",
--++++++            sparse_mode=0 # 使用 defaultMask 模式
--++++++        )
--++++++
--++++++        # 恢复原始数据类型
--++++++        attn_output = attn_output.to(input_dtype)
--++++++
--++++++        # 7. 调整输出形状
--++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++        attn_output = self.o_proj(attn_output)
--++++++
--++++++        # FlashAttention 算子不直接返回注意力权重矩阵
--++++++        attn_weights = None
--++++++        if output_attentions:
--++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++        return attn_output, attn_weights, past_key_value
--++++++
--++++++    # def forward(
--++++++    #     self,
--++++++    #     hidden_states: mindspore.Tensor,
--++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++++    #     past_key_value: Optional[Cache] = None,
--++++++    #     output_attentions: bool = False,
--++++++    #     use_cache: bool = False,
--++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++    #     bsz, q_len, _ = hidden_states.shape
--++++++
--++++++    #     # 1. 线性投射 Q, K, V
--++++++    #     query_states = self.q_proj(hidden_states)
--++++++    #     key_states = self.k_proj(hidden_states)
--++++++    #     value_states = self.v_proj(hidden_states)
--++++++
--++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++    #     # 3. RoPE 旋转位置编码
--++++++    #     kv_seq_len = key_states.shape[-2]
--++++++    #     if past_key_value is not None:
--++++++    #         if self.layer_idx is None:
--++++++    #             raise ValueError(
--++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++    #                 "with a layer index."
--++++++    #             )
--++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++    #     # 4. KV 缓存更新
--++++++    #     if past_key_value is not None:
--++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++    #         key_states, value_states = past_key_value.update(
--++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++++    #         )
--++++++
--++++++    #     # 5. 准备 Attention Mask
--++++++    #     fa_attention_mask = None
--++++++    #     if attention_mask is not None:
--++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++    #         fa_attention_mask = (mask_slice != 0)
--++++++
--++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++++    #     input_dtype = query_states.dtype
--++++++
--++++++    #     # 6. [核心] 调用 flash_attention_score 算子
--++++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++++    #         query=query_states,
--++++++    #         key=key_states,
--++++++    #         value=value_states,
--++++++    #         head_num=self.num_heads,
--++++++    #         attn_mask=fa_attention_mask,
--++++++    #         keep_prob=1.0 - self.attention_dropout,
--++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++    #         input_layout="BNSD",
--++++++    #         sparse_mode=0,
--++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++++    #         inner_precise=1
--++++++    #     )
--++++++
--++++++    #     # 恢复原始数据类型
--++++++    #     attn_output = attn_output.to(input_dtype)
--++++++
--++++++    #     # 7. 调整输出形状
--++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++    #     attn_output = self.o_proj(attn_output)
--++++++
--++++++    #     attn_weights = None
--++++++    #     if output_attentions:
--++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++    #     return attn_output, attn_weights, past_key_value
--++++++
--++++++    # def forward(
--++++++    #     self,
--++++++    #     hidden_states: mindspore.Tensor,
--++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--++++++    #     past_key_value: Optional[Cache] = None,
--++++++    #     output_attentions: bool = False,
--++++++    #     use_cache: bool = False,
--++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++    #     bsz, q_len, _ = hidden_states.shape
--++++++
--++++++    #     query_states = self.q_proj(hidden_states)
--++++++    #     key_states = self.k_proj(hidden_states)
--++++++    #     value_states = self.v_proj(hidden_states)
--++++++
--++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++    #     kv_seq_len = key_states.shape[-2]
--++++++    #     if past_key_value is not None:
--++++++    #         if self.layer_idx is None:
--++++++    #             raise ValueError("`layer_idx` must be specified for caching")
--++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++    #     if past_key_value is not None:
--++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++    #         key_states, value_states = past_key_value.update(
--++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--++++++    #         )
--++++++
--++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++
--++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--++++++    #     query_states = query_states / math.sqrt(self.head_dim)
--++++++    #     # <--- 修改结束 ---
--++++++
--++++++    #     fa_attention_mask = None
--++++++    #     if attention_mask is not None:
--++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++    #         fa_attention_mask = (mask_slice != 0)
--++++++
--++++++    #     input_dtype = query_states.dtype
--++++++
--++++++    #     attn_output = mindspore.ops.flash_attention_score(
--++++++    #         query=query_states,    # 传入已经预先缩放过的 query
--++++++    #         key=key_states,
--++++++    #         value=value_states,
--++++++    #         head_num=self.num_heads,
--++++++    #         attn_mask=fa_attention_mask,
--++++++    #         keep_prob=1.0 - self.attention_dropout,
--++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--++++++    #         input_layout="BNSD",
--++++++    #         sparse_mode=0,
--++++++    #         inner_precise=1        # 仍然保持内部高精度计算
--++++++    #     )
--++++++
--++++++    #     attn_output = attn_output.to(input_dtype)
--++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++    #     attn_output = self.o_proj(attn_output)
--++++++
--++++++    #     attn_weights = None
--++++++    #     if output_attentions:
--++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++++
--++++++    #     return attn_output, attn_weights, past_key_value
--++++++
--+++++ QWEN2MOE_ATTENTION_CLASSES = {
--+++++     "eager": Qwen2MoeAttention,
--++++++    "flash-attention": Qwen2MoeFlashAttention,
--+++++ }
--+++++ 
--+++++ 
--+++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++ 
--++++++    #@dwj
--++++++    # 只遍历激活的专家，而非全部专家
--+++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        hidden_states = hidden_states.view(-1, hidden_dim)
--+++++-        # router_logits: (batch * sequence_length, n_experts)
--+++++-        router_logits = self.gate(hidden_states)
--+++++-
--+++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-        if self.norm_topk_prob:
--+++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        # we cast back to the input dtype
--+++++-        routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-
--+++++-        final_hidden_states = ops.zeros(
--+++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++++-        )
--+++++-
--+++++-        # One hot encode the selected experts to create an expert mask
--+++++-        # this will be used to easily index which expert is going to be sollicitated
--+++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++++-
--+++++-        # Loop over all available experts in the model and perform the computation on each expert
--+++++-        for expert_idx in range(self.num_experts):
--+++++-            expert_layer = self.experts[expert_idx]
--+++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++++-
--+++++-            # Index the correct hidden states and compute the expert hidden state for
--+++++-            # the current expert. We need to make sure to multiply the output hidden
--+++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++++-            if 0 not in idx.shape:
--+++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++++-
--+++++-                # However `index_add_` only support torch tensors for indexing so we'll use
--+++++-                # the `top_x` tensor here.
--+++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++++-
--+++++-        shared_expert_output = self.shared_expert(hidden_states)
--+++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++++-
--+++++-        final_hidden_states = final_hidden_states + shared_expert_output
--++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++            num_tokens = hidden_states_reshaped.shape[0]
--++++++
--++++++            router_logits = self.gate(hidden_states_reshaped)
--++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++
--++++++            if self.norm_topk_prob:
--++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++            routing_weights = routing_weights.to(hidden_states.dtype)
--++++++            
--++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++++            flat_selected_experts = selected_experts.flatten()
--++++++            
--++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++++            token_indices = broadcasted_token_indices.flatten()
--++++++            
--++++++            active_experts = ops.unique(flat_selected_experts)
--++++++            
--++++++            for expert_idx_tensor in active_experts:
--++++++                expert_idx = expert_idx_tensor.item()
--++++++                expert_layer = self.experts[expert_idx]
--++++++                
--++++++                mask = (flat_selected_experts == expert_idx_tensor)
--++++++                selected_token_indices = token_indices[mask]
--++++++                selected_routing_weights = routing_weights.flatten()[mask]
--++++++                
--++++++                current_states = hidden_states_reshaped[selected_token_indices]
--++++++                
--++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++                
--++++++                final_hidden_states = final_hidden_states.index_add(
--++++++                    dim=0,
--++++++                    index=selected_token_indices,
--++++++                    source=expert_output.to(hidden_states.dtype)
--++++++                )
--++++++            
--++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++++ 
--+++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++-        return final_hidden_states, router_logits
--++++++            final_hidden_states = final_hidden_states + shared_expert_output
--++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++++            
--++++++            return final_hidden_states, router_logits
--+++++ 
--+++++ 
--+++++ class Qwen2MoeDecoderLayer(nn.Module):
--+++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++++ 
--+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++ 
--++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++++
--+++++         if (layer_idx not in config.mlp_only_layers) and (
--+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++++         ):
--+++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++++     _skip_keys_device_placement = "past_key_values"
--+++++     _supports_cache_class = True
--++++++#lwx
--++++++    # _supports_static_cache = True
--+++++ 
--+++++     def _init_weights(self, module):
--+++++         std = self.config.initializer_range
--+++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++++         return causal_mask
--+++++ 
--+++++ 
--+++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++     _tied_weights_keys = ["lm_head.weight"]
--+++++ 
--+++++     def __init__(self, config):
--+++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++         self.num_experts_per_tok = config.num_experts_per_tok
--+++++         # Initialize weights and apply final processing
--+++++         self.post_init()
--++++++        # @lwx
--++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--++++++        #     self.generation_config.cache_implementation = "static"
--++++++        self._warmed_up = False
--++++++
--++++++    def warmup_moe_model(self):
--++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--++++++        test_texts = [
--++++++            "warmup short",
--++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--++++++        ]
--++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--++++++        if tokenizer is None:
--++++++            from mindnlp.transformers import AutoTokenizer
--++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--++++++            self._warmup_tokenizer = tokenizer
--++++++
--++++++        for text in test_texts:
--++++++            inputs = tokenizer(text, return_tensors="ms")
--++++++            with mindspore._no_grad():
--++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++++ 
--+++++     def get_input_embeddings(self):
--+++++         return self.model.embed_tokens
--+++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++         ```"""
--++++++        if not self._warmed_up:
--++++++            self._warmed_up = True
--++++++            self.warmup_moe_model()
--+++++ 
--+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++++         output_router_logits = (
--+++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++             }
--+++++         )
--+++++         return model_inputs
--++++++# @lwx
--++++++    # def _decode_one_tokens_logits(
--++++++    #     self,
--++++++    #     cur_token: mindspore.Tensor,
--++++++    #     input_pos: Optional[mindspore.Tensor],
--++++++    #     cache_position: mindspore.Tensor,
--++++++    #     past_key_values: StaticCache,
--++++++    # ) -> mindspore.Tensor:
--++++++    #     """
--++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--++++++        
--++++++    #     Args:
--++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--++++++    #         input_pos: 输入位置信息，可选
--++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--++++++            
--++++++    #     Returns:
--++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--++++++    #     """
--++++++    #     # 调用JIT编译的版本
--++++++    #     return self.get_decode_one_tokens_logits(
--++++++    #         cur_token=cur_token,
--++++++    #         input_pos=input_pos,
--++++++    #         cache_position=cache_position,
--++++++    #         past_key_values=past_key_values,
--++++++    #     )
--++++++    
--++++++    # @mindspore.jit(jit_level='O1')
--++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--++++++    #     """
--++++++    #     JIT编译的函数，用于高效的单token解码
--++++++    #     使用JIT编译优化以支持静态shape和高效执行
--++++++        
--++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--++++++    #     """
--++++++    #     outputs = self.model.forward(
--++++++    #         input_ids=cur_token,
--++++++    #         position_ids=input_pos,
--++++++    #         cache_position=cache_position,
--++++++    #         past_key_values=past_key_values,
--++++++    #         use_cache=True,
--++++++    #         return_dict=False,
--++++++    #     )
--++++++        
--++++++    #     hidden_states = outputs[0]
--++++++    #     logits = self.lm_head.forward(hidden_states)
--++++++    #     logits = logits.float()
--++++++        
--++++++    #     return logits[:, -1, :]
--++++++
--++++++    # def _sample(
--++++++    #     self,
--++++++    #     input_ids: mindspore.Tensor,
--++++++    #     logits_processor,
--++++++    #     stopping_criteria,
--++++++    #     generation_config,
--++++++    #     synced_devices: bool,
--++++++    #     streamer=None,
--++++++    #     logits_warper=None,
--++++++    #     **model_kwargs,
--++++++    # ):
--++++++    #     """
--++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--++++++    #     """
--++++++    #     from ...generation.logits_process import LogitsProcessorList
--++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--++++++    #     from mindnlp.core import nn, ops, no_grad
--++++++    #     import numpy as np
--++++++        
--++++++    #     # 检查是否使用 StaticCache
--++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--++++++    #     # 否则，直接调用父类方法
--++++++    #     past_key_values = model_kwargs.get("past_key_values")
--++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--++++++
--++++++    #     if not isinstance(past_key_values, StaticCache):
--++++++    #         # 不使用 StaticCache，直接调用父类方法
--++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--++++++    #         return super()._sample(
--++++++    #             input_ids=input_ids,
--++++++    #             logits_processor=logits_processor,
--++++++    #             stopping_criteria=stopping_criteria,
--++++++    #             generation_config=generation_config,
--++++++    #             synced_devices=synced_devices,
--++++++    #             streamer=streamer,
--++++++    #             logits_warper=logits_warper,
--++++++    #             **model_kwargs,
--++++++    #         )
--++++++        
--++++++    #     # 使用 StaticCache，进入自定义循环
--++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--++++++    #     pad_token_id = generation_config._pad_token_tensor
--++++++    #     output_attentions = generation_config.output_attentions
--++++++    #     output_hidden_states = generation_config.output_hidden_states
--++++++    #     output_scores = generation_config.output_scores
--++++++    #     output_logits = generation_config.output_logits
--++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--++++++    #     max_length = generation_config.max_length
--++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--++++++    #     do_sample = generation_config.do_sample
--++++++        
--++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--++++++    #         raise ValueError(
--++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--++++++    #             f"{logits_warper})."
--++++++    #         )
--++++++        
--++++++    #     # init attention / hidden states / scores tuples
--++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--++++++        
--++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--++++++    #         encoder_hidden_states = (
--++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--++++++    #         )
--++++++        
--++++++    #     # keep track of which sequences are already finished
--++++++    #     batch_size, cur_len = input_ids.shape
--++++++    #     this_peer_finished = False
--++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--++++++        
--++++++    #     time_record = []
--++++++    #     from ....utils.testing_utils import parse_flag_from_env
--++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--++++++        
--++++++    #     while self._has_unfinished_sequences(
--++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--++++++    #     ):
--++++++    #         if _record_time:
--++++++    #             import time as time_module
--++++++    #             infer_start = time_module.time()
--++++++            
--++++++    #         # prepare model inputs
--++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--++++++            
--++++++    #         # prepare variable output controls
--++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--++++++            
--++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--++++++    #         cur_cache_position = model_inputs.get("cache_position")
--++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--++++++    #         cur_input_ids = model_inputs.get("input_ids")
--++++++            
--++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--++++++    #             cur_cache_position is not None and 
--++++++    #             len(cur_cache_position.shape) > 0 and
--++++++    #             cur_cache_position.shape[0] == 1 and
--++++++    #             cur_input_ids is not None and
--++++++    #             cur_input_ids.shape[1] == 1):
--++++++    #             # 使用 JIT 优化的单 token 解码
--++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--++++++    #             if not hasattr(self, '_jit_used'):
--++++++    #                 self._jit_used = False
--++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--++++++                
--++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--++++++    #                 cur_token=cur_input_ids,
--++++++    #                 input_pos=model_inputs.get("position_ids"),
--++++++    #                 cache_position=cur_cache_position,
--++++++    #                 past_key_values=cur_past_key_values,
--++++++    #             )
--++++++                
--++++++    #             # 标记已使用JIT（用于后续判断）
--++++++    #             if not self._jit_used:
--++++++    #                 self._jit_used = True
--++++++                
--++++++    #             # 构造兼容的输出对象
--++++++    #             class JitOptimizedOutput:
--++++++    #                 def __init__(self, logits, config):
--++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--++++++    #                     self.config = config
--++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--++++++    #                     self.cross_attentions = None
--++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--++++++                
--++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--++++++    #         else:
--++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--++++++    #             outputs = self(**model_inputs, return_dict=True)
--++++++            
--++++++    #         if synced_devices and this_peer_finished:
--++++++    #             continue
--++++++            
--++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--++++++    #         next_token_logits = outputs.logits[:, -1, :]
--++++++            
--++++++    #         # pre-process distribution
--++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--++++++    #         if do_sample:
--++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--++++++            
--++++++    #         # Store scores, attentions and hidden_states when required
--++++++    #         if return_dict_in_generate:
--++++++    #             if output_scores:
--++++++    #                 scores += (next_token_scores,)
--++++++    #             if output_logits:
--++++++    #                 raw_logits += (next_token_logits,)
--++++++    #             if output_attentions:
--++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--++++++    #                 if self.config.is_encoder_decoder:
--++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--++++++                
--++++++    #             if output_hidden_states:
--++++++    #                 hidden = (
--++++++    #                     outputs.decoder_hidden_states
--++++++    #                     if self.config.is_encoder_decoder
--++++++    #                     else outputs.hidden_states
--++++++    #                 )
--++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--++++++            
--++++++    #         # token selection
--++++++    #         if do_sample:
--++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--++++++    #         else:
--++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--++++++            
--++++++    #         # finished sentences should have their next token be a padding token
--++++++    #         if has_eos_stopping_criteria:
--++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--++++++            
--++++++    #         # update generated ids, model inputs, and length for next step
--++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--++++++    #         if streamer is not None:
--++++++    #             streamer.put(next_tokens)
--++++++            
--++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--++++++    #             outputs,
--++++++    #             model_kwargs,
--++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--++++++    #         )
--++++++            
--++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--++++++    #         cur_len += 1
--++++++            
--++++++    #         if _record_time:
--++++++    #             import time as time_module
--++++++    #             infer_stop = time_module.time()
--++++++    #             time_record.append(infer_stop - infer_start)
--++++++            
--++++++    #         del outputs
--++++++        
--++++++    #     average_infer_time = None
--++++++    #     if time_record:
--++++++    #         if len(time_record) > 1:
--++++++    #             time_record.pop(0)
--++++++    #         average_infer_time = sum(time_record) / len(time_record)
--++++++    #         print(f'average inference time is: {average_infer_time}')
--++++++    #         print(f'inference time record: {time_record}')
--++++++        
--++++++    #     if streamer is not None:
--++++++    #         streamer.end()
--++++++        
--++++++    #     # 简单判断：打印是否使用了JIT路径
--++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--++++++    #     else:
--++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--++++++        
--++++++    #     if return_dict_in_generate:
--++++++    #         if self.config.is_encoder_decoder:
--++++++    #             return GenerateEncoderDecoderOutput(
--++++++    #                 sequences=input_ids,
--++++++    #                 scores=scores,
--++++++    #                 logits=raw_logits,
--++++++    #                 encoder_attentions=encoder_attentions,
--++++++    #                 encoder_hidden_states=encoder_hidden_states,
--++++++    #                 decoder_attentions=decoder_attentions,
--++++++    #                 cross_attentions=cross_attentions,
--++++++    #                 decoder_hidden_states=decoder_hidden_states,
--++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++++    #                 average_infer_time=average_infer_time
--++++++    #             )
--++++++    #         else:
--++++++    #             return GenerateDecoderOnlyOutput(
--++++++    #                 sequences=input_ids,
--++++++    #                 scores=scores,
--++++++    #                 logits=raw_logits,
--++++++    #                 attentions=decoder_attentions,
--++++++    #                 hidden_states=decoder_hidden_states,
--++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--++++++    #                 average_infer_time=average_infer_time
--++++++    #             )
--++++++    #     else:
--++++++    #         return input_ids
--++++++            
--++++++    # def _prepare_cache_for_generation(
--++++++    #     self,
--++++++    #     generation_config,
--++++++    #     model_kwargs,
--++++++    #     assistant_model,
--++++++    #     batch_size,
--++++++    #     max_cache_length,
--++++++    # ):
--++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--++++++    #         generation_config.cache_implementation = "static"
--++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--++++++        
--++++++    #     if generation_config.cache_implementation == "static":
--++++++    #         base_required_from_max_length = generation_config.max_length + 1
--++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--++++++    #         min_cache_size = 50
--++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--++++++    #         else:
--++++++    #             max_cache_length = max(base_required, min_cache_size)
--++++++            
--++++++    #         original_max_cache_length = max_cache_length
--++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--++++++            
--++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--++++++    #             if max_cache_length > self.config.max_position_embeddings:
--++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--++++++        
--++++++    #     result = super()._prepare_cache_for_generation(
--++++++    #         generation_config=generation_config,
--++++++    #         model_kwargs=model_kwargs,
--++++++    #         assistant_model=assistant_model,
--++++++    #         batch_size=batch_size,
--++++++    #         max_cache_length=max_cache_length,
--++++++    #     )
--++++++        
--++++++    #     if generation_config.cache_implementation == "static":
--++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--++++++    #         created_cache = model_kwargs.get(cache_name)
--++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--++++++    #             if created_cache.max_cache_len < generation_config.max_length:
--++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--++++++        
--++++++    #     return result
--++++++
--++++++
--++++++
--+++++ 
--+++++ 
--+++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++++-- 
--+++++2.27.0
--+++++
--++++diff --git a/patches/0002-20251106commit.patch b/patches/0002-20251106commit.patch
--++++new file mode 100644
--++++index 00000000..22b65dd5
--++++--- /dev/null
--+++++++ b/patches/0002-20251106commit.patch
--++++@@ -0,0 +1,3200 @@
--+++++From 1b2b3a555b7f4c777a43b806c8c7a0f2049f8de1 Mon Sep 17 00:00:00 2001
--+++++From: Pinoeer-kingxi <13022943007@163.com>
--+++++Date: Thu, 6 Nov 2025 09:20:38 +0800
--+++++Subject: [PATCH 2/3] 20251106commit
--+++++
--+++++---
--+++++ .../models/deepseek/modeling_deepseek.py      |  379 ++++-
--+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1343 +++++++++++++----
--+++++ patches/0001-20251104commit.patch             | 1272 ++++++++++++++++
--+++++ 3 files changed, 2689 insertions(+), 305 deletions(-)
--+++++ create mode 100644 patches/0001-20251104commit.patch
--+++++
--+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++index d8303e45..73773c22 100644
--+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++@@ -404,17 +404,42 @@ class DeepseekMoE(nn.Module):
--+++++         #     y = y + self.shared_experts(identity)
--+++++         # return y
--+++++         
--++++++    # @no_grad()
--++++++    # def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++++
--++++++    #     expert_cache = ops.zeros_like(x)
--++++++    #     for i in range(self.num_experts_per_tok):
--++++++    #         expert_id = flat_expert_indices[i].item()
--++++++    #         weight = flat_expert_weights[i].item()
--++++++    #         expert = self.experts[expert_id]
--++++++    #         expert_out = expert(x)
--++++++    #         expert_cache += expert_out * weight
--++++++    #     return expert_cache
--++++++
--+++++     @no_grad()
--+++++     def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--++++++        # x 的 shape: (1, hidden_size)
--++++++        # flat_expert_indices 的 shape: (num_experts_per_tok,)
--++++++        # flat_expert_weights 的 shape: (num_experts_per_tok, 1)
--++++++
--++++++        # 1. 收集所有需要的专家层
--++++++        # 注意: flat_expert_indices 是一个 Tensor，可以直接用于索引
--++++++        selected_experts = [self.experts[i] for i in flat_expert_indices]
--++++++
--++++++        # 2. 并行计算所有专家的输出
--++++++        # [expert(x) for expert in selected_experts] 会得到一个 list of Tensors
--++++++        # ops.cat 会将它们堆叠成一个新的 Tensor
--++++++        # 最终 expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++++        expert_outputs = ops.cat([expert(x) for expert in selected_experts], dim=0)
--++++++
--++++++        # 3. 使用矩阵乘法进行加权求和
--++++++        # flat_expert_weights.T 的 shape: (1, num_experts_per_tok)
--++++++        # expert_outputs 的 shape: (num_experts_per_tok, hidden_size)
--++++++        # 最终结果 final_output 的 shape: (1, hidden_size)
--++++++        final_output = ops.matmul(flat_expert_weights.T, expert_outputs)
--++++++        
--++++++        return final_output
--+++++ 
--+++++-        expert_cache = ops.zeros_like(x)
--+++++-        for i in range(self.num_experts_per_tok):
--+++++-            expert_id = flat_expert_indices[i].item()
--+++++-            weight = flat_expert_weights[i].item()
--+++++-            expert = self.experts[expert_id]
--+++++-            expert_out = expert(x)
--+++++-            expert_cache += expert_out * weight
--+++++-        return expert_cache
--+++++ 
--+++++     @no_grad()
--+++++     def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++@@ -807,9 +832,16 @@ class DeepseekAttention(nn.Module):
--+++++             key_states = self.k_proj(hidden_states)
--+++++             value_states = self.v_proj(hidden_states)
--+++++ 
--+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++++        # query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++++        # key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++++        # value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++++        # @lwx
--++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim)
--++++++        query_states = query_states.transpose(0, 2, 1, 3)  # (bsz, num_heads, q_len, head_dim)
--++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++++++        key_states = key_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim)
--++++++        value_states = value_states.transpose(0, 2, 1, 3)  # (bsz, num_key_value_heads, q_len, head_dim)
--+++++ 
--+++++         kv_seq_len = key_states.shape[-2]
--+++++         if past_key_value is not None:
--+++++@@ -873,8 +905,329 @@ class DeepseekAttention(nn.Module):
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++ 
--++++++# class DeepseekFlashAttention(nn.Module):
--++++++#     """
--++++++#     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--++++++#     mindspore.ops.flash_attention_score for acceleration on Ascend NPU.
--++++++
--++++++#     This class is designed as a drop-in replacement for DeepseekAttention.
--++++++#     """
--++++++
--++++++#     def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++++++#         super().__init__()
--++++++#         self.config = config
--++++++#         self.layer_idx = layer_idx
--++++++#         if layer_idx is None:
--++++++#             logger.warning(
--++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++++#                 "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++++#                 "when creating this class."
--++++++#             )
--++++++
--++++++#         self.attention_dropout = config.attention_dropout
--++++++#         self.hidden_size = config.hidden_size
--++++++#         self.num_heads = config.num_attention_heads
--++++++#         self.head_dim = self.hidden_size // self.num_heads
--++++++#         self.num_key_value_heads = config.num_key_value_heads
--++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++#         self.max_position_embeddings = config.max_position_embeddings
--++++++#         self.rope_theta = config.rope_theta
--++++++#         self.is_causal = True
--++++++
--++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++#             raise ValueError(
--++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++++#                 f" and `num_heads`: {self.num_heads})."
--++++++#             )
--++++++
--++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++++++#         self._init_rope()
--++++++
--++++++#     def _init_rope(self):
--++++++#         if self.config.rope_scaling is None:
--++++++#             self.rotary_emb = DeepseekRotaryEmbedding(
--++++++#                 self.head_dim,
--++++++#                 max_position_embeddings=self.max_position_embeddings,
--++++++#                 base=self.rope_theta,
--++++++#             )
--++++++#         else:
--++++++#             scaling_type = self.config.rope_scaling["type"]
--++++++#             scaling_factor = self.config.rope_scaling["factor"]
--++++++#             if scaling_type == "linear":
--++++++#                 self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++++++#                     self.head_dim,
--++++++#                     max_position_embeddings=self.max_position_embeddings,
--++++++#                     scaling_factor=scaling_factor,
--++++++#                     base=self.rope_theta,
--++++++#                 )
--++++++#             elif scaling_type == "dynamic":
--++++++#                 self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++++++#                     self.head_dim,
--++++++#                     max_position_embeddings=self.max_position_embeddings,
--++++++#                     scaling_factor=scaling_factor,
--++++++#                     base=self.rope_theta,
--++++++#                 )
--++++++#             else:
--++++++#                 raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++++++
--++++++#     def forward(
--++++++#         self,
--++++++#         hidden_states: mindspore.Tensor,
--++++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++++#         past_key_value: Optional[Cache] = None,
--++++++#         output_attentions: bool = False,
--++++++#         use_cache: bool = False,
--++++++#         **kwargs,
--++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++#         if "padding_mask" in kwargs:
--++++++#             warnings.warn(
--++++++#                 "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++++++#             )
--++++++        
--++++++#         if output_attentions:
--++++++#             warnings.warn("`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned.")
--++++++
--++++++#         bsz, q_len, _ = hidden_states.shape
--++++++
--++++++#         if self.config.pretraining_tp > 1:
--++++++#             raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++++++
--++++++#         query_states = self.q_proj(hidden_states)
--++++++#         key_states = self.k_proj(hidden_states)
--++++++#         value_states = self.v_proj(hidden_states)
--++++++
--++++++#         # Reshape for multi-head attention
--++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++#         kv_seq_len = key_states.shape[-2]
--++++++#         if past_key_value is not None:
--++++++#             if self.layer_idx is None:
--++++++#                 raise ValueError(
--++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++#                     "with a layer index."
--++++++#                 )
--++++++#             kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++        
--++++++#         # Apply Rotary Positional Embedding
--++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++#         if past_key_value is not None:
--++++++#             cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
--++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++
--++++++#         # Reshape Q, K, V for flash_attention_score's 'BSH' layout
--++++++#         # Q: (bsz, num_heads, q_len, head_dim) -> (bsz, q_len, num_heads, head_dim) -> (bsz, q_len, hidden_size)
--++++++#         query_states_for_fa = query_states.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++        
--++++++#         # K: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++++#         key_states_for_fa = key_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++++        
--++++++#         # V: (bsz, num_kv_heads, kv_seq_len, head_dim) -> (bsz, kv_seq_len, num_kv_heads, head_dim) -> (bsz, kv_seq_len, num_kv_heads * head_dim)
--++++++#         value_states_for_fa = value_states.transpose(0, 2, 1, 3).reshape(bsz, kv_seq_len, self.num_key_value_heads * self.head_dim)
--++++++
--++++++#         # Convert attention_mask for flash_attention_score
--++++++#         # The original mask is float with -inf for masked positions. FA needs a boolean mask where True means discard.
--++++++#         if attention_mask is not None:
--++++++#             # The mask should have been prepared as (bsz, 1, q_len, kv_seq_len)
--++++++#             if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++++++#                 raise ValueError(
--++++++#                     f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++++++#                 )
--++++++#             attn_mask_for_fa = attention_mask < 0  # Convert -inf to True
--++++++#         else:
--++++++#             attn_mask_for_fa = None
--++++++        
--++++++#         keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++++++
--++++++#         # Call the fused flash_attention_score operator
--++++++#         attn_output = mindspore.ops.flash_attention_score(
--++++++#             query=query_states_for_fa,
--++++++#             key=key_states_for_fa,
--++++++#             value=value_states_for_fa,
--++++++#             head_num=self.num_heads, # This is N1, the number of query heads
--++++++#             input_layout='BSH',
--++++++#             attn_mask=attn_mask_for_fa,
--++++++#             keep_prob=keep_prob,
--++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++#             sparse_mode=0 # Default mask mode
--++++++#         )
--++++++        
--++++++#         # Output shape is already (bsz, q_len, hidden_size), so no reshape is needed
--++++++#         attn_output = self.o_proj(attn_output)
--++++++        
--++++++#         # Flash Attention does not return attention weights
--++++++#         attn_weights = None
--++++++
--++++++#         return attn_output, attn_weights, past_key_value
--++++++
--++++++class DeepseekFlashAttention(nn.Module):
--++++++    """
--++++++    DeepseekAttention implemented with MindSpore's flash_attention_score operator.
--++++++    This implementation is a drop-in replacement for the original DeepseekAttention class,
--++++++    designed for high performance on supported hardware (Ascend).
--++++++
--++++++    It uses the 'BNSD' (Batch, Num_heads, Seq_len, Head_dim) memory layout for efficiency.
--++++++    """
--++++++    def __init__(self, config: DeepseekConfig, layer_idx: Optional[int] = None):
--++++++        super().__init__()
--++++++        self.config = config
--++++++        self.layer_idx = layer_idx
--++++++        if layer_idx is None:
--++++++            logger.warning(
--++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++++                "when creating this class."
--++++++            )
--++++++
--++++++        # --- [FIX] Correctly initialize all required attributes ---
--++++++        self.attention_dropout = config.attention_dropout
--++++++        self.hidden_size = config.hidden_size
--++++++        self.num_heads = config.num_attention_heads
--++++++        self.head_dim = self.hidden_size // self.num_heads
--++++++        self.num_key_value_heads = config.num_key_value_heads
--++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++        self.max_position_embeddings = config.max_position_embeddings
--++++++        self.rope_theta = config.rope_theta
--++++++        self.is_causal = True
--++++++
--++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++            raise ValueError(
--++++++                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++++                f" and `num_heads`: {self.num_heads})."
--++++++            )
--++++++
--++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
--++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
--++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)
--++++++        
--++++++        # This call will now succeed as all attributes are initialized.
--++++++        self._init_rope()
--++++++
--++++++    def _init_rope(self):
--++++++        if self.config.rope_scaling is None:
--++++++            self.rotary_emb = DeepseekRotaryEmbedding(
--++++++                self.head_dim,
--++++++                max_position_embeddings=self.max_position_embeddings,
--++++++                base=self.rope_theta,
--++++++            )
--++++++        else:
--++++++            scaling_type = self.config.rope_scaling["type"]
--++++++            scaling_factor = self.config.rope_scaling["factor"]
--++++++            if scaling_type == "linear":
--++++++                self.rotary_emb = DeepseekLinearScalingRotaryEmbedding(
--++++++                    self.head_dim,
--++++++                    max_position_embeddings=self.max_position_embeddings,
--++++++                    scaling_factor=scaling_factor,
--++++++                    base=self.rope_theta,
--++++++                )
--++++++            elif scaling_type == "dynamic":
--++++++                self.rotary_emb = DeepseekDynamicNTKScalingRotaryEmbedding(
--++++++                    self.head_dim,
--++++++                    max_position_embeddings=self.max_position_embeddings,
--++++++                    scaling_factor=scaling_factor,
--++++++                    base=self.rope_theta,
--++++++                )
--++++++            else:
--++++++                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
--++++++
--++++++    def forward(
--++++++        self,
--++++++        hidden_states: mindspore.Tensor,
--++++++        attention_mask: Optional[mindspore.Tensor] = None,
--++++++        position_ids: Optional[mindspore.Tensor] = None,
--++++++        past_key_value: Optional[Cache] = None,
--++++++        output_attentions: bool = False,
--++++++        use_cache: bool = False,
--++++++        **kwargs,
--++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++        if "padding_mask" in kwargs:
--++++++            warnings.warn(
--++++++                "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
--++++++            )
--++++++        if output_attentions:
--++++++            warnings.warn(
--++++++                "`DeepseekFlashAttention` does not support `output_attentions=True`, attention weights will not be returned."
--++++++            )
--++++++
--++++++        bsz, q_len, _ = hidden_states.shape
--++++++
--++++++        if self.config.pretraining_tp > 1:
--++++++            raise NotImplementedError("DeepseekFlashAttention does not support `pretraining_tp > 1`.")
--++++++
--++++++        query_states = self.q_proj(hidden_states)
--++++++        key_states = self.k_proj(hidden_states)
--++++++        value_states = self.v_proj(hidden_states)
--++++++
--++++++        # Reshape to BNSD format (Batch, Num_heads, Seq_len, Head_dim)
--++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++        kv_seq_len = key_states.shape[-2]
--++++++        if past_key_value is not None:
--++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++        
--++++++        # Apply Rotary Position Embedding
--++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++        if past_key_value is not None:
--++++++            cache_kwargs = {"sin": sin, "cos": cos}
--++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++
--++++++        # For GQA/MQA, flash_attention_score in BNSD layout requires Q and KV to have the same number of heads.
--++++++        # So we must explicitly repeat the KV heads.
--++++++        key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++        value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++
--++++++        # Convert attention mask for flash_attention_score
--++++++        # The operator expects a boolean mask where True means to MASK OUT/DISCARD.
--++++++        if attention_mask is not None:
--++++++            if attention_mask.shape != (bsz, 1, q_len, kv_seq_len):
--++++++                 raise ValueError(
--++++++                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
--++++++                )
--++++++            attn_mask_for_fa = attention_mask < 0
--++++++        else:
--++++++            attn_mask_for_fa = None
--++++++
--++++++        keep_prob = 1.0 - self.attention_dropout if self.training else 1.0
--++++++
--++++++        # Call the fused operator using the efficient BNSD layout
--++++++        attn_output = mindspore.ops.flash_attention_score(
--++++++            query=query_states,
--++++++            key=key_states,
--++++++            value=value_states,
--++++++            head_num=self.num_heads,
--++++++            input_layout='BNSD', # Specify the correct layout
--++++++            attn_mask=attn_mask_for_fa,
--++++++            keep_prob=keep_prob,
--++++++            scalar_value=1.0 / math.sqrt(self.head_dim)
--++++++        )
--++++++        
--++++++        # The output of FA is in BNSD format. We need to reshape it back to the expected (B, S, H) format.
--++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++        
--++++++        # Apply output projection
--++++++        attn_output = self.o_proj(attn_output)
--++++++
--++++++        # Flash attention does not return attention weights, so we return None.
--++++++        attn_weights = None
--++++++
--++++++        return attn_output, attn_weights, past_key_value
--++++++
--+++++ Deepseek_ATTENTION_CLASSES = {
--+++++     "eager": DeepseekAttention,
--++++++    "flash-attention": DeepseekFlashAttention,
--+++++ }
--+++++ 
--+++++ 
--+++++@@ -887,6 +1240,10 @@ class DeepseekDecoderLayer(nn.Module):
--+++++             config=config, layer_idx=layer_idx
--+++++         )
--+++++ 
--++++++        self.self_attn = Deepseek_ATTENTION_CLASSES["flash-attention"](
--++++++            config=config, layer_idx=layer_idx
--++++++        )
--++++++
--+++++         self.mlp = (
--+++++             DeepseekMoE(config)
--+++++             if (
--+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++index d4c6b651..bced285c 100644
--+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++@@ -23,7 +23,7 @@ from typing import List, Optional, Tuple, Union
--+++++ 
--+++++ import mindspore
--+++++ import mindnlp.core.nn.functional as F
--+++++-from mindnlp.core import nn, ops
--++++++from mindnlp.core import nn, ops, no_grad
--+++++ from mindnlp.core.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
--+++++ 
--+++++ from ....common.activations import ACT2FN
--+++++@@ -45,6 +45,8 @@ logger = logging.get_logger(__name__)
--+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+++++ 
--++++++Long_Prompt = False
--++++++PROMPT_LENGTH_THRESHOLD = 128
--+++++ 
--+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
--+++++@@ -473,35 +475,279 @@ class Qwen2MoeAttention(nn.Module):
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++ 
--++++++# class Qwen2MoeFlashAttention(nn.Module):
--++++++#     """
--++++++#     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--++++++#     这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--++++++
--++++++#     关键改动:
--++++++#     1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--++++++#        直接传入原始的 key 和 value 张量效率更高。
--++++++#     2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--++++++#     3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++++#         super().__init__()
--++++++#         self.config = config
--++++++#         self.layer_idx = layer_idx
--++++++#         self.hidden_size = config.hidden_size
--++++++#         self.num_heads = config.num_attention_heads
--++++++#         self.head_dim = self.hidden_size // self.num_heads
--++++++#         self.num_key_value_heads = config.num_key_value_heads
--++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++#         self.max_position_embeddings = config.max_position_embeddings
--++++++#         self.rope_theta = config.rope_theta
--++++++#         self.attention_dropout = config.attention_dropout
--++++++
--++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++#             raise ValueError(
--++++++#                 f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--++++++#             )
--++++++
--++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++++
--++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++++#             self.head_dim,
--++++++#             max_position_embeddings=self.max_position_embeddings,
--++++++#             base=self.rope_theta,
--++++++#         )
--++++++
--++++++#     def forward(
--++++++#         self,
--++++++#         hidden_states: mindspore.Tensor,
--++++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++++#         past_key_value: Optional[Cache] = None,
--++++++#         output_attentions: bool = False,
--++++++#         use_cache: bool = False,
--++++++#         cache_position: Optional[mindspore.Tensor] = None,
--++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++#         bsz, q_len, _ = hidden_states.shape
--++++++
--++++++#         # 1. 线性投射 Q, K, V
--++++++#         query_states = self.q_proj(hidden_states)
--++++++#         key_states = self.k_proj(hidden_states)
--++++++#         value_states = self.v_proj(hidden_states)
--++++++
--++++++#         # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++#         # query:   [B, S, H*D] -> [B, N1, S, D]
--++++++#         # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++++#         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++#         # 3. RoPE 旋转位置编码
--++++++#         kv_seq_len = key_states.shape[-2]
--++++++#         if past_key_value is not None:
--++++++#             if self.layer_idx is None:
--++++++#                 raise ValueError(
--++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++#                     "with a layer index."
--++++++#                 )
--++++++#             # 对于 StaticCache，需要特殊处理 kv_seq_len
--++++++#             # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++#                 # 使用 cache_position 的长度来确定实际的 kv_seq_len
--++++++#                 # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--++++++#                 # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--++++++#                 # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--++++++#                 # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--++++++#                 # 临时解决方案：使用 cache_position 的最大值（如果可能）
--++++++#                 # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--++++++#                 past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--++++++#                 if cache_position.shape[0] == 1:
--++++++#                     # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--++++++#                     # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--++++++#                     kv_seq_len = past_seen_tokens + 1
--++++++#                 else:
--++++++#                     # prefill 阶段：cache_position 是范围，使用其长度
--++++++#                     kv_seq_len = cache_position.shape[0] + past_seen_tokens
--++++++#             else:
--++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++#         # 4. KV 缓存更新
--++++++#         if past_key_value is not None:
--++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++#             key_states, value_states = past_key_value.update(
--++++++#                 key_states, value_states, self.layer_idx, cache_kwargs
--++++++#             )
--++++++            
--++++++#             # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--++++++#             # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--++++++#             if isinstance(past_key_value, StaticCache) and cache_position is not None:
--++++++#                 if cache_position.shape[0] == 1:
--++++++#                     # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--++++++#                     kv_seq_len = key_states.shape[-2]
--++++++
--++++++#         # 5. [重要] 准备 Attention Mask
--++++++#         #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--++++++#         #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++++#         fa_attention_mask = None
--++++++#         if attention_mask is not None:
--++++++#             # 截取与当前key长度匹配的部分
--++++++#             # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--++++++#             # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--++++++#             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++#             # 转换为布尔类型: 大负数 -> True, 0 -> False
--++++++#             fa_attention_mask = (mask_slice != 0)
--++++++
--++++++#         # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--++++++#         input_dtype = query_states.dtype
--++++++#         if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--++++++#             # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--++++++#             query_states = query_states.to(mindspore.float16)
--++++++#             key_states = key_states.to(mindspore.float16)
--++++++#             value_states = value_states.to(mindspore.float16)
--++++++
--++++++#         # 6. [核心] 调用 flash_attention_score 算子
--++++++#         #    - 无需手动 repeat_kv, 算子原生支持 GQA
--++++++#         #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++++#         attn_output = mindspore.ops.flash_attention_score(
--++++++#             query=query_states,
--++++++#             key=key_states,
--++++++#             value=value_states,
--++++++#             head_num=self.num_heads, # 传入Q的头数(N1)
--++++++#             attn_mask=fa_attention_mask,
--++++++#             keep_prob=1.0 - self.attention_dropout,
--++++++#             scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++#             input_layout="BNSD",
--++++++#             sparse_mode=0 # 使用 defaultMask 模式
--++++++#         )
--++++++
--++++++#         # 恢复原始数据类型
--++++++#         attn_output = attn_output.to(input_dtype)
--++++++
--++++++#         # 7. 调整输出形状
--++++++#         # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++++#         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++#         attn_output = self.o_proj(attn_output)
--++++++
--++++++#         # FlashAttention 算子不直接返回注意力权重矩阵
--++++++#         attn_weights = None
--++++++#         if output_attentions:
--++++++#              logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++#         return attn_output, attn_weights, past_key_value
--++++++
--++++++#     # def forward(
--++++++#     #     self,
--++++++#     #     hidden_states: mindspore.Tensor,
--++++++#     #     attention_mask: Optional[mindspore.Tensor] = None,
--++++++#     #     position_ids: Optional[mindspore.Tensor] = None,
--++++++#     #     past_key_value: Optional[Cache] = None,
--++++++#     #     output_attentions: bool = False,
--++++++#     #     use_cache: bool = False,
--++++++#     #     cache_position: Optional[mindspore.Tensor] = None,
--++++++#     # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++#     #     bsz, q_len, _ = hidden_states.shape
--++++++
--++++++#     #     # 1. 线性投射 Q, K, V
--++++++#     #     query_states = self.q_proj(hidden_states)
--++++++#     #     key_states = self.k_proj(hidden_states)
--++++++#     #     value_states = self.v_proj(hidden_states)
--++++++
--++++++#     #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--++++++#     #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#     #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++#     #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++
--++++++#     #     # 3. RoPE 旋转位置编码
--++++++#     #     kv_seq_len = key_states.shape[-2]
--++++++#     #     if past_key_value is not None:
--++++++#     #         if self.layer_idx is None:
--++++++#     #             raise ValueError(
--++++++#     #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++#     #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++#     #                 "with a layer index."
--++++++#     #             )
--++++++#     #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++
--++++++#     #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++#     #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++#     #     # 4. KV 缓存更新
--++++++#     #     if past_key_value is not None:
--++++++#     #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--++++++#     #         key_states, value_states = past_key_value.update(
--++++++#     #             key_states, value_states, self.layer_idx, cache_kwargs
--++++++#     #         )
--++++++
--++++++#     #     # 5. 准备 Attention Mask
--++++++#     #     fa_attention_mask = None
--++++++#     #     if attention_mask is not None:
--++++++#     #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++#     #         fa_attention_mask = (mask_slice != 0)
--++++++
--++++++#     #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--++++++#     #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--++++++#     #     input_dtype = query_states.dtype
--++++++
--++++++#     #     # 6. [核心] 调用 flash_attention_score 算子
--++++++#     #     attn_output = mindspore.ops.flash_attention_score(
--++++++#     #         query=query_states,
--++++++#     #         key=key_states,
--++++++#     #         value=value_states,
--++++++#     #         head_num=self.num_heads,
--++++++#     #         attn_mask=fa_attention_mask,
--++++++#     #         keep_prob=1.0 - self.attention_dropout,
--++++++#     #         scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++#     #         input_layout="BNSD",
--++++++#     #         sparse_mode=0,
--++++++#     #         # <--- 修改点 2: 启用内部高精度计算 ---
--++++++#     #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--++++++#     #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--++++++#     #         inner_precise=1
--++++++#     #     )
--++++++
--++++++#     #     # 恢复原始数据类型
--++++++#     #     attn_output = attn_output.to(input_dtype)
--++++++
--++++++#     #     # 7. 调整输出形状
--++++++#     #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++#     #     attn_output = self.o_proj(attn_output)
--++++++
--++++++#     #     attn_weights = None
--++++++#     #     if output_attentions:
--++++++#     #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++
--++++++#     #     return attn_output, attn_weights, past_key_value
--++++++
--++++++
--+++++ class Qwen2MoeFlashAttention(nn.Module):
--+++++     """
--+++++-    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++-    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++-
--+++++-    关键改动:
--+++++-    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++-       直接传入原始的 key 和 value 张量效率更高。
--+++++-    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++-    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--++++++    Qwen2MoeAttention 的 Flash Attention **纯速度优化**版本。
--++++++
--++++++    此版本将 `mindspore.ops.flash_attention_score` 的 `inner_precise`
--++++++    参数设置为 0，关闭内部高精度累加。这将在硬件允许的情况下，
--++++++    完全使用模型的低精度数据类型（如 float16）进行计算，
--++++++    以达到理论上的最高执行速度。
--+++++     """
--+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++         super().__init__()
--+++++         self.config = config
--+++++         self.layer_idx = layer_idx
--++++++        if layer_idx is None:
--++++++            logger.warning_once(
--++++++                f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended."
--++++++            )
--++++++
--+++++         self.hidden_size = config.hidden_size
--+++++         self.num_heads = config.num_attention_heads
--+++++         self.head_dim = self.hidden_size // self.num_heads
--+++++         self.num_key_value_heads = config.num_key_value_heads
--+++++-        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++         self.max_position_embeddings = config.max_position_embeddings
--+++++         self.rope_theta = config.rope_theta
--+++++         self.attention_dropout = config.attention_dropout
--+++++ 
--+++++-        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++-            raise ValueError(
--+++++-                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++-            )
--+++++-
--+++++         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++@@ -531,351 +777,834 @@ class Qwen2MoeFlashAttention(nn.Module):
--+++++         key_states = self.k_proj(hidden_states)
--+++++         value_states = self.v_proj(hidden_states)
--+++++ 
--+++++-        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++-        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++-        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--++++++        # 2. 调整形状以匹配 BNSD 布局
--+++++         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-
--+++++-        # 3. RoPE 旋转位置编码
--++++++        
--++++++        # 3. RoPE 和 KV 缓存
--+++++         kv_seq_len = key_states.shape[-2]
--+++++         if past_key_value is not None:
--+++++-            if self.layer_idx is None:
--+++++-                raise ValueError(
--+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-                    "with a layer index."
--+++++-                )
--+++++-            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++-            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++-                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++-                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++-                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++-                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++-                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++-                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++-                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++-                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++-                if cache_position.shape[0] == 1:
--+++++-                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++-                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++-                    kv_seq_len = past_seen_tokens + 1
--+++++-                else:
--+++++-                    # prefill 阶段：cache_position 是范围，使用其长度
--+++++-                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++-            else:
--+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-
--++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++        
--+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++ 
--+++++-        # 4. KV 缓存更新
--+++++         if past_key_value is not None:
--+++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-            key_states, value_states = past_key_value.update(
--+++++-                key_states, value_states, self.layer_idx, cache_kwargs
--+++++-            )
--+++++-            
--+++++-            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++-            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++-            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++-                if cache_position.shape[0] == 1:
--+++++-                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++-                    kv_seq_len = key_states.shape[-2]
--+++++-
--+++++-        # 5. [重要] 准备 Attention Mask
--+++++-        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++-        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--++++++            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++
--++++++        # 4. 准备 Attention Mask
--+++++         fa_attention_mask = None
--+++++         if attention_mask is not None:
--+++++-            # 截取与当前key长度匹配的部分
--+++++-            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++-            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++             mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++             fa_attention_mask = (mask_slice != 0)
--+++++ 
--+++++-        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++-        input_dtype = query_states.dtype
--+++++-        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++-            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++-            query_states = query_states.to(mindspore.float16)
--+++++-            key_states = key_states.to(mindspore.float16)
--+++++-            value_states = value_states.to(mindspore.float16)
--+++++-
--+++++-        # 6. [核心] 调用 flash_attention_score 算子
--+++++-        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++-        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--++++++        # 5. 【核心】调用 flash_attention_score，关闭高精度累加
--+++++         attn_output = mindspore.ops.flash_attention_score(
--+++++             query=query_states,
--+++++             key=key_states,
--+++++             value=value_states,
--+++++-            head_num=self.num_heads, # 传入Q的头数(N1)
--++++++            head_num=self.num_heads,
--+++++             attn_mask=fa_attention_mask,
--+++++-            keep_prob=1.0 - self.attention_dropout,
--++++++            keep_prob=1.0 - self.attention_dropout if self.training else 1.0, # 推理时关闭dropout
--+++++             scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++             input_layout="BNSD",
--+++++-            sparse_mode=0 # 使用 defaultMask 模式
--++++++            sparse_mode=0,
--++++++            inner_precise=0 # 【关键改动】设置为0，关闭内部FP32计算，追求最快速度
--+++++         )
--+++++ 
--+++++-        # 恢复原始数据类型
--+++++-        attn_output = attn_output.to(input_dtype)
--+++++-
--+++++-        # 7. 调整输出形状
--+++++-        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--++++++        # 6. 调整输出形状
--+++++         attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++         attn_output = self.o_proj(attn_output)
--+++++ 
--+++++-        # FlashAttention 算子不直接返回注意力权重矩阵
--++++++        # 7. 返回结果
--+++++         attn_weights = None
--+++++         if output_attentions:
--+++++-             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++             logger.warning_once("`Qwen2MoeFlashAttention` is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+++++ 
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++-    # def forward(
--+++++-    #     self,
--+++++-    #     hidden_states: mindspore.Tensor,
--+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++-    #     past_key_value: Optional[Cache] = None,
--+++++-    #     output_attentions: bool = False,
--+++++-    #     use_cache: bool = False,
--+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-
--+++++-    #     bsz, q_len, _ = hidden_states.shape
--+++++-
--+++++-    #     # 1. 线性投射 Q, K, V
--+++++-    #     query_states = self.q_proj(hidden_states)
--+++++-    #     key_states = self.k_proj(hidden_states)
--+++++-    #     value_states = self.v_proj(hidden_states)
--+++++-
--+++++-    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-
--+++++-    #     # 3. RoPE 旋转位置编码
--+++++-    #     kv_seq_len = key_states.shape[-2]
--+++++-    #     if past_key_value is not None:
--+++++-    #         if self.layer_idx is None:
--+++++-    #             raise ValueError(
--+++++-    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++-    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-    #                 "with a layer index."
--+++++-    #             )
--+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++ 
--+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++-
--+++++-    #     # 4. KV 缓存更新
--+++++-    #     if past_key_value is not None:
--+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-    #         key_states, value_states = past_key_value.update(
--+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++-    #         )
--+++++-
--+++++-    #     # 5. 准备 Attention Mask
--+++++-    #     fa_attention_mask = None
--+++++-    #     if attention_mask is not None:
--+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-    #         fa_attention_mask = (mask_slice != 0)
--+++++-
--+++++-    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++-    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++-    #     input_dtype = query_states.dtype
--+++++-
--+++++-    #     # 6. [核心] 调用 flash_attention_score 算子
--+++++-    #     attn_output = mindspore.ops.flash_attention_score(
--+++++-    #         query=query_states,
--+++++-    #         key=key_states,
--+++++-    #         value=value_states,
--+++++-    #         head_num=self.num_heads,
--+++++-    #         attn_mask=fa_attention_mask,
--+++++-    #         keep_prob=1.0 - self.attention_dropout,
--+++++-    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++-    #         input_layout="BNSD",
--+++++-    #         sparse_mode=0,
--+++++-    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++-    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++-    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++-    #         inner_precise=1
--+++++-    #     )
--+++++-
--+++++-    #     # 恢复原始数据类型
--+++++-    #     attn_output = attn_output.to(input_dtype)
--++++++QWEN2MOE_ATTENTION_CLASSES = {
--++++++    "eager": Qwen2MoeAttention,
--++++++    "flash-attention": Qwen2MoeFlashAttention,
--++++++}
--+++++ 
--+++++-    #     # 7. 调整输出形状
--+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++-    #     attn_output = self.o_proj(attn_output)
--+++++ 
--+++++-    #     attn_weights = None
--+++++-    #     if output_attentions:
--+++++-    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     def __init__(self, config):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         # gating
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     #@dwj
--++++++#     # 只遍历激活的专家，而非全部专家
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#             batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++#             hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#             num_tokens = hidden_states_reshaped.shape[0]
--++++++
--++++++#             router_logits = self.gate(hidden_states_reshaped)
--++++++#             routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#             routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++
--++++++#             if self.norm_topk_prob:
--++++++#                 routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++#             routing_weights = routing_weights.to(hidden_states.dtype)
--++++++            
--++++++#             final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--++++++#             flat_selected_experts = selected_experts.flatten()
--++++++            
--++++++#             unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--++++++#             broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--++++++#             token_indices = broadcasted_token_indices.flatten()
--++++++            
--++++++#             active_experts = ops.unique(flat_selected_experts)
--++++++            
--++++++#             for expert_idx_tensor in active_experts:
--++++++#                 expert_idx = expert_idx_tensor.item()
--++++++#                 expert_layer = self.experts[expert_idx]
--++++++                
--++++++#                 mask = (flat_selected_experts == expert_idx_tensor)
--++++++#                 selected_token_indices = token_indices[mask]
--++++++#                 selected_routing_weights = routing_weights.flatten()[mask]
--++++++                
--++++++#                 current_states = hidden_states_reshaped[selected_token_indices]
--++++++                
--++++++#                 expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++                
--++++++#                 final_hidden_states = final_hidden_states.index_add(
--++++++#                     dim=0,
--++++++#                     index=selected_token_indices,
--++++++#                     source=expert_output.to(hidden_states.dtype)
--++++++#                 )
--++++++            
--++++++#             shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++++#             shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++++ 
--+++++-    #     return attn_output, attn_weights, past_key_value
--++++++#             final_hidden_states = final_hidden_states + shared_expert_output
--++++++#             final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++++            
--++++++#             return final_hidden_states, router_logits
--++++++
--++++++
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     """
--++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++++#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--++++++#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--++++++#     `_moe_infer_prefill` (用于长序列处理) 方法。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         # 门控网络
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         # 专家列表
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++#         # 共享专家
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_decode(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         """
--++++++#         【解码路径】针对 sequence_length=1 的极致优化。
--++++++#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--++++++#         """
--++++++#         batch_size, hidden_dim = hidden_states.shape
--++++++        
--++++++#         expert_outputs_list = [
--++++++#             ops.cat([
--++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++++#             ], dim=0) 
--++++++#             for i in range(batch_size)
--++++++#         ]
--++++++        
--++++++#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--++++++#         # shape: (batch_size, top_k, hidden_dim)
--++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++++        
--++++++#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++++        
--++++++#         return moe_output.squeeze(1)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_prefill(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         """
--++++++#         【预填充路径】针对 sequence_length > 1 的优化。
--++++++#         按专家对 Token 进行分组，并进行批处理。
--++++++#         """
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens = hidden_states.shape[0]
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++        
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++        
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++        
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++            
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++#             selected_token_indices = token_indices[mask]
--++++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++++            
--++++++#             current_states = hidden_states[selected_token_indices]
--++++++            
--++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++            
--++++++#             moe_output = moe_output.index_add(
--++++++#                 dim=0,
--++++++#                 index=selected_token_indices,
--++++++#                 source=expert_output.to(hidden_states.dtype)
--++++++#             )
--++++++#         return moe_output
--++++++
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#         """
--++++++#         顶层 forward 方法，作为智能分发器。
--++++++#         """
--++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++        
--++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#         router_logits = self.gate(hidden_states_reshaped)
--++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++ 
--+++++-    # def forward(
--+++++-    #     self,
--+++++-    #     hidden_states: mindspore.Tensor,
--+++++-    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++-    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++-    #     past_key_value: Optional[Cache] = None,
--+++++-    #     output_attentions: bool = False,
--+++++-    #     use_cache: bool = False,
--+++++-    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++-    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-
--+++++-    #     bsz, q_len, _ = hidden_states.shape
--+++++-
--+++++-    #     query_states = self.q_proj(hidden_states)
--+++++-    #     key_states = self.k_proj(hidden_states)
--+++++-    #     value_states = self.v_proj(hidden_states)
--+++++-
--+++++-    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-
--+++++-    #     kv_seq_len = key_states.shape[-2]
--+++++-    #     if past_key_value is not None:
--+++++-    #         if self.layer_idx is None:
--+++++-    #             raise ValueError("`layer_idx` must be specified for caching")
--+++++-    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-
--+++++-    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++-
--+++++-    #     if past_key_value is not None:
--+++++-    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-    #         key_states, value_states = past_key_value.update(
--+++++-    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++-    #         )
--++++++#         if self.norm_topk_prob:
--++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++++        
--++++++#         moe_output = None
--++++++#         # 在推理时，根据序列长度选择最优路径
--++++++#         if not self.training:
--++++++#             if sequence_length == 1:
--++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++++#             else:
--++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++++#         else:
--++++++#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--++++++#             raise NotImplementedError("Training path is not implemented.")
--++++++
--++++++#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--++++++#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--++++++#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--++++++        
--++++++#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--++++++        
--++++++#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++#         return final_hidden_states, router_logits
--++++++
--++++++
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     """
--++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++++#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         # 门控网络
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         # 专家列表
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++#         # 共享专家
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_decode(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         batch_size, _ = hidden_states.shape
--++++++#         expert_outputs_list = [
--++++++#             ops.cat([
--++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++++#             ], dim=0) 
--++++++#             for i in range(batch_size)
--++++++#         ]
--++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++++#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++++#         return moe_output.squeeze(1)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_prefill(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens = hidden_states.shape[0]
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++        
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++#             selected_token_indices = token_indices[mask]
--++++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++++#             current_states = hidden_states[selected_token_indices]
--++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++#             moe_output = moe_output.index_add(
--++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++++#             )
--++++++#         return moe_output
--++++++
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#         """
--++++++#         顶层 forward 方法，作为智能分发器。
--++++++#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--++++++#         """
--++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++        
--++++++#         # 1. 门控计算 (通用逻辑)
--++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#         router_logits = self.gate(hidden_states_reshaped)
--++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++
--++++++#         if self.norm_topk_prob:
--++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++++        
--++++++#         # 2. 智能分发到最优 MoE 路径
--++++++#         moe_output = None
--++++++#         if not self.training:
--++++++#             if sequence_length == 1:
--++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--++++++#             else:
--++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--++++++#         else:
--++++++#             raise NotImplementedError("Training path is not implemented.")
--++++++
--++++++#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--++++++#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++++        
--++++++#         # 4. 合并 MoE 输出和共享专家输出
--++++++#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++++        
--++++++#         # 5. 恢复原始形状并返回
--++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++#         return final_hidden_states, router_logits
--++++++
--++++++# prefill fastest
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     """
--++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++++#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--++++++#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         # 门控网络
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         # 专家列表
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++#         # 共享专家
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_dispatch(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         """
--++++++#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--++++++#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--++++++#         """
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens, _ = hidden_states.shape
--++++++        
--++++++#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++#         flat_routing_weights = routing_weights.flatten()
--+++++ 
--+++++-    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++-    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++-
--+++++-    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++++-    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++++-    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++++-    #     query_states = query_states / math.sqrt(self.head_dim)
--+++++-    #     # <--- 修改结束 ---
--+++++-
--+++++-    #     fa_attention_mask = None
--+++++-    #     if attention_mask is not None:
--+++++-    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-    #         fa_attention_mask = (mask_slice != 0)
--+++++-
--+++++-    #     input_dtype = query_states.dtype
--+++++-
--+++++-    #     attn_output = mindspore.ops.flash_attention_score(
--+++++-    #         query=query_states,    # 传入已经预先缩放过的 query
--+++++-    #         key=key_states,
--+++++-    #         value=value_states,
--+++++-    #         head_num=self.num_heads,
--+++++-    #         attn_mask=fa_attention_mask,
--+++++-    #         keep_prob=1.0 - self.attention_dropout,
--+++++-    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++++-    #         input_layout="BNSD",
--+++++-    #         sparse_mode=0,
--+++++-    #         inner_precise=1        # 仍然保持内部高精度计算
--+++++-    #     )
--++++++#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++ 
--+++++-    #     attn_output = attn_output.to(input_dtype)
--+++++-    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++-    #     attn_output = self.o_proj(attn_output)
--++++++#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++        
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++            
--++++++#             # 找到所有分配给该专家的 token
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++            
--++++++#             # 使用 mask 选取对应的 token 和权重
--++++++#             current_token_indices = token_indices[mask]
--++++++#             current_routing_weights = flat_routing_weights[mask]
--++++++#             current_hidden_states = hidden_states[current_token_indices]
--++++++            
--++++++#             # 对这些 token 进行批处理
--++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++++            
--++++++#             # 使用 index_add 将结果精确地加回到对应位置
--++++++#             moe_output = moe_output.index_add(
--++++++#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--++++++#             )
--++++++#         return moe_output
--++++++
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#         """
--++++++#         顶层 forward 方法，作为智能分发器。
--++++++#         """
--++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++        
--++++++#         # 1. 门控计算
--++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#         router_logits = self.gate(hidden_states_reshaped)
--++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++
--++++++#         if self.norm_topk_prob:
--++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++#         routing_weights = routing_weights.to(hidden_states.dtype)
--++++++        
--++++++#         # 2. 调用统一的 MoE 计算内核
--++++++#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--++++++#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+++++ 
--+++++-    #     attn_weights = None
--+++++-    #     if output_attentions:
--+++++-    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--++++++#         # 3. 统一处理共享专家
--++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++++        
--++++++#         # 4. 合并输出
--++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++++        
--++++++#         # 5. 恢复原始形状并返回
--++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++#         return final_hidden_states, router_logits
--++++++
--++++++
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     """
--++++++#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--++++++#     【最终高性能与高精度版】：
--++++++#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--++++++#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--++++++#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--++++++#     3. 这样实现了速度和准确性的两全其美。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_decode(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         """
--++++++#         【解码路径】极致优化版：bmm + 高精度累加。
--++++++#         """
--++++++#         original_dtype = hidden_states.dtype
--++++++#         batch_size, _ = hidden_states.shape
--++++++        
--++++++#         expert_outputs_list = [
--++++++#             ops.cat([
--++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++++#             ], dim=0) 
--++++++#             for i in range(batch_size)
--++++++#         ]
--++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++++
--++++++#         # 在 float32 下执行 bmm，得到高精度结果
--++++++#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--++++++        
--++++++#         # 将高精度结果转换回原始数据类型
--++++++#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--++++++        
--++++++#         return moe_output
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_prefill(
--++++++#         self, 
--++++++#         hidden_states: mindspore.Tensor, 
--++++++#         selected_experts: mindspore.Tensor, 
--++++++#         routing_weights: mindspore.Tensor
--++++++#     ) -> mindspore.Tensor:
--++++++#         """
--++++++#         【预填充路径】与原始实现一致，结果精确。
--++++++#         """
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens, _ = hidden_states.shape
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++        
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++#             selected_token_indices = token_indices[mask]
--++++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++++#             current_states = hidden_states[selected_token_indices]
--++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++#             moe_output = moe_output.index_add(
--++++++#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--++++++#             )
--++++++#         return moe_output
--++++++
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++        
--++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#         router_logits = self.gate(hidden_states_reshaped)
--++++++#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++ 
--+++++-    #     return attn_output, attn_weights, past_key_value
--++++++#         if self.norm_topk_prob:
--++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--++++++#         # 如果模型主体是 float16，后续再转换
--++++++        
--++++++#         moe_output = None
--++++++#         if not self.training:
--++++++#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--++++++#             # _moe_infer_decode 内部会处理好类型转换
--++++++#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--++++++#             if sequence_length == 1:
--++++++#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++++#             else:
--++++++#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--++++++#         else:
--++++++#             raise NotImplementedError("Training path is not implemented.")
--++++++
--++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++++        
--++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++#         return final_hidden_states, router_logits
--++++++    
--+++++ 
--+++++-QWEN2MOE_ATTENTION_CLASSES = {
--+++++-    "eager": Qwen2MoeAttention,
--+++++-    "flash-attention": Qwen2MoeFlashAttention,
--+++++-}
--++++++# class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++#     """
--++++++#     【融合版】一个混合专家模块，内置两种推理策略，
--++++++#     由外部全局变量 `Long_Prompt` 控制：
--++++++
--++++++#     - if Long_Prompt is True:  【精度优先模式】
--++++++#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--++++++#       适用于处理长序列，避免误差累积。
--++++++
--++++++#     - if Long_Prompt is False: 【速度优先模式】
--++++++#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--++++++#       在解码阶段获得极致速度，同时保证结果高度准确。
--++++++#     """
--++++++#     def __init__(self, config: Qwen2MoeConfig):
--++++++#         super().__init__()
--++++++#         self.num_experts = config.num_experts
--++++++#         self.top_k = config.num_experts_per_tok
--++++++#         self.norm_topk_prob = config.norm_topk_prob
--++++++
--++++++#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--++++++#         self.experts = nn.ModuleList(
--++++++#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--++++++#         )
--++++++#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++
--++++++#     # --- 速度优先模式的辅助函数 ---
--++++++#     @no_grad()
--++++++#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++#         original_dtype = hidden_states.dtype
--++++++#         batch_size, _ = hidden_states.shape
--++++++#         expert_outputs_list = [
--++++++#             ops.cat([
--++++++#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++++#             ], dim=0) 
--++++++#             for i in range(batch_size)
--++++++#         ]
--++++++#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++++#         weights_fp32 = routing_weights.to(mindspore.float32)
--++++++#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++++++#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++++#         return moe_output_fp32.squeeze(1).to(original_dtype)
--++++++
--++++++#     @no_grad()
--++++++#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens, _ = hidden_states.shape
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++#             selected_token_indices = token_indices[mask]
--++++++#             selected_routing_weights = routing_weights.flatten()[mask]
--++++++#             current_states = hidden_states[selected_token_indices]
--++++++#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--++++++#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--++++++#         return moe_output
--++++++
--++++++#     # --- 精度优先模式的辅助函数 ---
--++++++#     @no_grad()
--++++++#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++#         moe_output = ops.zeros_like(hidden_states)
--++++++#         num_tokens, _ = hidden_states.shape
--++++++#         flat_selected_experts = selected_experts.flatten()
--++++++#         flat_routing_weights = routing_weights.flatten()
--++++++#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++#         active_experts = ops.unique(flat_selected_experts)
--++++++#         for expert_idx_tensor in active_experts:
--++++++#             expert_idx = expert_idx_tensor.item()
--++++++#             expert_layer = self.experts[expert_idx]
--++++++#             mask = (flat_selected_experts == expert_idx_tensor)
--++++++#             current_token_indices = token_indices[mask]
--++++++#             current_routing_weights = flat_routing_weights[mask]
--++++++#             current_hidden_states = hidden_states[current_token_indices]
--++++++#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++++#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++++++#         return moe_output
--++++++
--++++++#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++#         # 声明我们将要使用一个在模块外部定义的全局变量
--++++++#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--++++++#         global Long_Prompt
--++++++
--++++++#         # 1. 门控计算 (所有模式通用)
--++++++#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++#         router_logits = self.gate(hidden_states_reshaped)
--++++++#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++++++#         if self.norm_topk_prob:
--++++++#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++#         moe_output = None
--++++++#         if not self.training:
--++++++#             # 根据 Long_Prompt 标志选择模式
--++++++#             if Long_Prompt:
--++++++#                 # --- 精度优先模式 ---
--++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++#             else:
--++++++#                 # --- 速度优先模式 ---
--++++++#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++#                 if sequence_length == 1:
--++++++#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++#                 else:
--++++++#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++#         else:
--++++++#             raise NotImplementedError("Training path is not implemented.")
--++++++
--++++++#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++++#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++++        
--++++++#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++++#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++#         return final_hidden_states, router_logits
--++++++    
--++++++class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++    """
--++++++    【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--++++++    控制的顶级推理策略：
--+++++ 
--++++++    - if Long_Prompt is True:  【精度优先模式】
--++++++      采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配原始逻辑。
--++++++      适用于需要严格可复现性的长序列任务。
--+++++ 
--+++++-class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-    def __init__(self, config):
--++++++    - if Long_Prompt is False: 【速度优先模式】
--++++++      采用业界最强的性能组合：
--++++++      - Prefill 阶段: 使用 DeepSeek 的“全局-排序-切片”策略，速度最快。
--++++++      - Decode 阶段: 使用“bmm+高精度累加”策略，兼顾速度与准确性。
--++++++    """
--++++++    def __init__(self, config: Qwen2MoeConfig):
--+++++         super().__init__()
--+++++         self.num_experts = config.num_experts
--+++++         self.top_k = config.num_experts_per_tok
--+++++         self.norm_topk_prob = config.norm_topk_prob
--+++++ 
--+++++-        # gating
--+++++         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++         self.experts = nn.ModuleList(
--+++++             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++         )
--+++++-
--+++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++ 
--+++++-    #@dwj
--+++++-    # 只遍历激活的专家，而非全部专家
--+++++-    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-            num_tokens = hidden_states_reshaped.shape[0]
--+++++-
--+++++-            router_logits = self.gate(hidden_states_reshaped)
--+++++-            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-
--+++++-            if self.norm_topk_prob:
--+++++-                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-            routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-            
--+++++-            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++-            flat_selected_experts = selected_experts.flatten()
--+++++-            
--+++++-            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++-            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++-            token_indices = broadcasted_token_indices.flatten()
--+++++-            
--+++++-            active_experts = ops.unique(flat_selected_experts)
--+++++-            
--+++++-            for expert_idx_tensor in active_experts:
--+++++-                expert_idx = expert_idx_tensor.item()
--+++++-                expert_layer = self.experts[expert_idx]
--+++++-                
--+++++-                mask = (flat_selected_experts == expert_idx_tensor)
--+++++-                selected_token_indices = token_indices[mask]
--+++++-                selected_routing_weights = routing_weights.flatten()[mask]
--+++++-                
--+++++-                current_states = hidden_states_reshaped[selected_token_indices]
--+++++-                
--+++++-                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-                
--+++++-                final_hidden_states = final_hidden_states.index_add(
--+++++-                    dim=0,
--+++++-                    index=selected_token_indices,
--+++++-                    source=expert_output.to(hidden_states.dtype)
--+++++-                )
--+++++-            
--+++++-            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++-            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++++    # --- 速度优先模式 (SPEED MODE) 的辅助函数 ---
--++++++    @no_grad()
--++++++    def _moe_infer_decode_fast(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++        original_dtype = hidden_states.dtype
--++++++        batch_size, _ = hidden_states.shape
--++++++        expert_outputs_list = [
--++++++            ops.cat([
--++++++                self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--++++++            ], dim=0)
--++++++            for i in range(batch_size)
--++++++        ]
--++++++        expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--++++++        weights_fp32 = routing_weights.to(mindspore.float32)
--++++++        outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--++++++        moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--++++++        return moe_output_fp32.squeeze(1).to(original_dtype)
--++++++
--++++++    @no_grad()
--++++++    def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++        num_tokens, _ = hidden_states.shape
--++++++        flat_selected_experts = selected_experts.flatten()
--++++++        sorted_expert_indices = flat_selected_experts.argsort()
--++++++        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++++++        original_token_indices = sorted_expert_indices // self.top_k
--++++++        moe_output = ops.zeros_like(hidden_states)
--++++++        current_token_offset = 0
--++++++        for i in range(self.num_experts):
--++++++            expert_token_count = tokens_per_expert[i] - current_token_offset
--++++++            if expert_token_count == 0:
--++++++                continue
--++++++            end_offset = current_token_offset + expert_token_count
--++++++            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++++++            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++++++            expert_hidden_states = hidden_states[expert_original_token_indices]
--++++++            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++++++            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++++++            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++++++            current_token_offset += expert_token_count
--++++++        return moe_output
--++++++
--++++++    # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--++++++    @no_grad()
--++++++    def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++        moe_output = ops.zeros_like(hidden_states)
--++++++        num_tokens, _ = hidden_states.shape
--++++++        flat_selected_experts = selected_experts.flatten()
--++++++        flat_routing_weights = routing_weights.flatten()
--++++++        token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--++++++        active_experts = ops.unique(flat_selected_experts)
--++++++        for expert_idx_tensor in active_experts:
--++++++            expert_idx = expert_idx_tensor.item()
--++++++            expert_layer = self.experts[expert_idx]
--++++++            mask = (flat_selected_experts == expert_idx_tensor)
--++++++            current_token_indices = token_indices[mask]
--++++++            current_routing_weights = flat_routing_weights[mask]
--++++++            current_hidden_states = hidden_states[current_token_indices]
--++++++            expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--++++++            moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--++++++        return moe_output
--+++++ 
--+++++-            final_hidden_states = final_hidden_states + shared_expert_output
--+++++-            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++-            
--+++++-            return final_hidden_states, router_logits
--++++++    def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++        global Long_Prompt
--++++++
--++++++        # 1. 门控计算 (所有模式通用)
--++++++        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--++++++        router_logits = self.gate(hidden_states_reshaped)
--++++++        routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++        routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--++++++        if self.norm_topk_prob:
--++++++            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++        
--++++++        moe_output = None
--++++++        if Long_Prompt:
--++++++            # --- 精度优先模式 (ACCURACY MODE) ---
--++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        else:
--++++++            # --- 速度优先模式 (SPEED MODE) ---
--++++++            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++            if sequence_length == 1:
--++++++                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++            else:
--++++++                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        
--+++++ 
--++++++        # 3. 共享专家计算与合并 (所有模式通用)
--++++++        gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--++++++                                     F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--++++++        
--++++++        final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--++++++        final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--++++++        
--++++++        return final_hidden_states, router_logits
--+++++ 
--+++++ class Qwen2MoeDecoderLayer(nn.Module):
--+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+++++         super().__init__()
--+++++         self.hidden_size = config.hidden_size
--++++++        
--++++++        # if Long_Prompt:
--++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++++        # else:
--++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++ 
--+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++ 
--+++++-        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++-
--+++++         if (layer_idx not in config.mlp_only_layers) and (
--+++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++++         ):
--+++++@@ -1288,6 +2017,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++             self._warmed_up = True
--+++++             self.warmup_moe_model()
--+++++ 
--++++++
--++++++
--+++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++++         output_router_logits = (
--+++++             output_router_logits if output_router_logits is not None else self.config.output_router_logits
--+++++@@ -1355,6 +2086,27 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++             router_logits=outputs.router_logits,
--+++++         )
--+++++ 
--++++++    def generate(self, *args, **kwargs):
--++++++        """
--++++++        重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--++++++        这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--++++++        """
--++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++++++
--++++++        input_ids = kwargs.get("input_ids")
--++++++        if input_ids is None and args:
--++++++            input_ids = args[0]
--++++++
--++++++        if input_ids is not None:
--++++++            prompt_length = input_ids.shape[1]
--++++++            
--++++++            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--++++++                Long_Prompt = True
--++++++            else:
--++++++                Long_Prompt = False
--++++++
--++++++        return super().generate(*args, **kwargs)
--++++++    
--+++++     # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM.prepare_inputs_for_generation
--+++++     def prepare_inputs_for_generation(
--+++++         self,
--+++++@@ -1370,6 +2122,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++         # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
--+++++         # Exception 1: when passing input_embeds, input_ids may be missing entries
--+++++         # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
--++++++        
--+++++         if past_key_values is not None:
--+++++             if inputs_embeds is not None:  # Exception 1
--+++++                 if 0 not in input_ids.shape:
--+++++@@ -1421,6 +2174,7 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++             }
--+++++         )
--+++++         return model_inputs
--++++++
--+++++ # @lwx
--+++++     # def _decode_one_tokens_logits(
--+++++     #     self,
--+++++@@ -1960,6 +2714,7 @@ class Qwen2MoeForTokenClassification(Qwen2MoePreTrainedModel):
--+++++             attentions=outputs.attentions,
--+++++         )
--+++++ 
--++++++
--+++++ __all__ = [
--+++++     "Qwen2MoeForCausalLM",
--+++++     "Qwen2MoeModel",
--+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++++new file mode 100644
--+++++index 00000000..6dfb5b93
--+++++--- /dev/null
--++++++++ b/patches/0001-20251104commit.patch
--+++++@@ -0,0 +1,1272 @@
--++++++From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--++++++From: Pinoeer-kingxi <13022943007@163.com>
--++++++Date: Tue, 4 Nov 2025 09:11:51 +0800
--++++++Subject: [PATCH] 20251104commit
--++++++
--++++++---
--++++++ mindnlp/transformers/cache_utils.py           |  28 +-
--++++++ .../models/deepseek/modeling_deepseek.py      | 149 ++-
--++++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--++++++ 3 files changed, 976 insertions(+), 87 deletions(-)
--++++++
--++++++diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--++++++index cadd2e04..02f8d4be 100644
--++++++--- a/mindnlp/transformers/cache_utils.py
--+++++++++ b/mindnlp/transformers/cache_utils.py
--++++++@@ -812,14 +812,26 @@ class StaticCache(Cache):
--++++++             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--++++++             # k_out[:, :, cache_position] = key_states
--++++++             # v_out[:, :, cache_position] = value_states
--++++++-            if ON_ORANGE_PI:
--++++++-                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--++++++-                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--++++++-            else:
--++++++-                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--++++++-                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--++++++-                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--++++++-
--+++++++            # if ON_ORANGE_PI:
--+++++++            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++++            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++++            # else:
--+++++++            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++++            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++++            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++++            # 确保 cache_position 是 1D tensor 并且类型正确
--+++++++            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++++++            if cache_position.ndim > 1:
--+++++++                cache_position = cache_position.flatten()
--+++++++            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++++++            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++++++                cache_position = cache_position.int()
--+++++++            
--+++++++            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++++++            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++++++            k_out[:, :, cache_position] = key_states
--+++++++            v_out[:, :, cache_position] = value_states
--+++++++            
--++++++         return k_out, v_out
--++++++ 
--++++++     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--++++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++index c695b944..d8303e45 100644
--++++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--++++++ # Copied from transformers.models.llama.modeling_llama.rotate_half
--++++++ def rotate_half(x):
--++++++     """Rotates half the hidden dims of the input."""
--++++++-    x1 = x[..., : x.shape[-1] // 2]
--++++++-    x2 = x[..., x.shape[-1] // 2 :]
--+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++++    # x1 = x[..., : x.shape[-1] // 2]
--+++++++    # x2 = x[..., x.shape[-1] // 2 :]
--+++++++    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++++     return ops.cat((-x2, x1), dim=-1)
--++++++ 
--++++++ 
--++++++@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--++++++         if self.training:
--++++++             raise NotImplementedError("Training is not supported yet.")
--++++++         else:
--++++++-            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--++++++-        if self.config.n_shared_experts is not None:
--++++++-            y = y + self.shared_experts(identity)
--++++++-        return y
--+++++++            # @lwx
--+++++++            if orig_shape[1] == 1:
--+++++++                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++++++                y=y.view(*orig_shape)
--+++++++                if self.config.n_shared_experts is not None:
--+++++++                    y = y + self.shared_experts(identity)
--+++++++                return y
--+++++++            else:
--+++++++                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++++++                if self.config.n_shared_experts is not None:
--+++++++                    y = y + self.shared_experts(identity)
--+++++++                return y
--+++++++            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++++        # if self.config.n_shared_experts is not None:
--+++++++        #     y = y + self.shared_experts(identity)
--+++++++        # return y
--+++++++        
--+++++++    @no_grad()
--+++++++    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++++
--+++++++        expert_cache = ops.zeros_like(x)
--+++++++        for i in range(self.num_experts_per_tok):
--+++++++            expert_id = flat_expert_indices[i].item()
--+++++++            weight = flat_expert_weights[i].item()
--+++++++            expert = self.experts[expert_id]
--+++++++            expert_out = expert(x)
--+++++++            expert_cache += expert_out * weight
--+++++++        return expert_cache
--++++++ 
--++++++     @no_grad()
--++++++-    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++-        # expert_cache = torch.zeros_like(x)
--++++++-        # idxs = flat_expert_indices.argsort()
--++++++-        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++++-        # token_idxs = idxs // self.num_experts_per_tok
--++++++-        # for i, end_idx in enumerate(tokens_per_expert):
--++++++-        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++++-        #     if start_idx == end_idx:
--++++++-        #         continue
--++++++-        #     expert = self.experts[i]
--++++++-        #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++++-        #     expert_tokens = x[exp_token_idx]
--++++++-        #     expert_out = expert(expert_tokens)
--++++++-        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++-        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++++-        # return expert_cache
--+++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++++         expert_cache = ops.zeros_like(x)
--++++++         idxs = flat_expert_indices.argsort()
--++++++         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++         token_idxs = idxs // self.num_experts_per_tok
--+++++++
--++++++         for i, end_idx in enumerate(tokens_per_expert):
--++++++             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++             if start_idx == end_idx:
--++++++@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--++++++             expert_out = expert(expert_tokens)
--++++++             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++++
--++++++         return expert_cache
--+++++++        
--+++++++    # @no_grad()
--+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++++    #     # expert_cache = torch.zeros_like(x)
--+++++++    #     # idxs = flat_expert_indices.argsort()
--+++++++    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++++    #     # token_idxs = idxs // self.num_experts_per_tok
--+++++++    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++++    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++++    #     #     if start_idx == end_idx:
--+++++++    #     #         continue
--+++++++    #     #     expert = self.experts[i]
--+++++++    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++++    #     #     expert_tokens = x[exp_token_idx]
--+++++++    #     #     expert_out = expert(expert_tokens)
--+++++++    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++++    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++++    #     # return expert_cache
--+++++++    #     expert_cache = ops.zeros_like(x)
--+++++++    #     idxs = flat_expert_indices.argsort()
--+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++++
--+++++++    #     for i, end_idx in enumerate(tokens_per_expert):
--+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++++    #         if start_idx == end_idx:
--+++++++    #             continue
--+++++++    #         expert = self.experts[i]
--+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++++    #         expert_tokens = x[exp_token_idx]
--+++++++    #         expert_out = expert(expert_tokens)
--+++++++    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++++    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++++
--+++++++    #     return expert_cache
--+++++++    # @no_grad()
--+++++++    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++++    #     expert_cache = ops.zeros_like(x)
--+++++++
--+++++++    #     # 排序保证顺序一致
--+++++++    #     idxs = flat_expert_indices.argsort()
--+++++++    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++++    #     token_idxs = idxs // self.num_experts_per_tok
--+++++++
--+++++++    #     # 找出有 token 的专家
--+++++++    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++++
--+++++++    #     for i in active_experts.tolist():
--+++++++    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++++    #         end_idx = tokens_per_expert[i]
--+++++++    #         if start_idx == end_idx:  # 没有 token
--+++++++    #             continue
--+++++++
--+++++++    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++++    #         expert_tokens = x[exp_token_idx]
--+++++++    #         expert_out = self.experts[i](expert_tokens)
--+++++++    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++++
--+++++++    #         expert_cache = mindspore.mint.scatter_add(
--+++++++    #             expert_cache,
--+++++++    #             0,
--+++++++    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++++    #             expert_out
--+++++++    #         )
--+++++++
--+++++++    #     return expert_cache
--+++++++
--+++++++
--++++++ 
--++++++ # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--++++++ #     """
--++++++@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++++ 
--++++++         # Initialize weights and apply final processing
--++++++         self.post_init()
--+++++++        self.warm_up = False
--+++++++
--+++++++    def warmup_moe_model_deep(self):
--+++++++        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++++++        test_texts = [
--+++++++            "warmup short",
--+++++++            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++++++        ]
--+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++++        if tokenizer is None:
--+++++++            from mindnlp.transformers import AutoTokenizer
--+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++++            self._warmup_tokenizer = tokenizer
--+++++++
--+++++++        for text in test_texts:
--+++++++            inputs = tokenizer(text, return_tensors="ms")
--+++++++            with mindspore._no_grad():
--+++++++                _ = self(**inputs, use_cache=False)
--+++++++        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--++++++ 
--++++++     def get_input_embeddings(self):
--++++++         return self.model.embed_tokens
--++++++@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++++         ```"""
--+++++++        if not self.warm_up:
--+++++++            self.warm_up = True
--+++++++            self.warmup_moe_model_deep()
--+++++++
--++++++         output_attentions = (
--++++++             output_attentions
--++++++             if output_attentions is not None
--++++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++index 3cbf820e..d4c6b651 100644
--++++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++@@ -18,7 +18,6 @@
--++++++ # See the License for the specific language governing permissions and
--++++++ # limitations under the License.
--++++++ """MindSpore Qwen2MoE model."""
--++++++-
--++++++ import math
--++++++ from typing import List, Optional, Tuple, Union
--++++++ 
--++++++@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--++++++     TokenClassifierOutput,
--++++++ )
--++++++ from ...modeling_utils import PreTrainedModel
--+++++++from ...generation import GenerationMixin
--++++++ from ....utils import logging
--++++++ from .configuration_qwen2_moe import Qwen2MoeConfig
--++++++ 
--++++++@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--++++++         self.variance_epsilon = eps
--++++++ 
--++++++     def forward(self, hidden_states):
--+++++++        # @dwj
--+++++++        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++++        # @lwx
--+++++++        # if not self.training :
--+++++++        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--++++++         input_dtype = hidden_states.dtype
--++++++         hidden_states = hidden_states.to(mindspore.float32)
--++++++         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--++++++@@ -234,6 +239,8 @@ def rotate_half(x):
--++++++     """Rotates half the hidden dims of the input."""
--++++++     x1 = x[..., : x.shape[-1] // 2]
--++++++     x2 = x[..., x.shape[-1] // 2 :]
--+++++++    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++++    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--++++++     return ops.cat((-x2, x1), dim=-1)
--++++++ 
--++++++ 
--++++++@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--++++++         self.config = config
--++++++         self.hidden_size = config.hidden_size
--++++++         self.intermediate_size = intermediate_size
--+++++++        
--++++++         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++++         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--++++++         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--++++++         self.act_fn = ACT2FN[config.hidden_act]
--++++++ 
--++++++     def forward(self, x):
--++++++-        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--++++++-
--++++++ 
--+++++++        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++++        # @lwx 
--+++++++        # gate_up_output = self.gate_up_proj(x)
--+++++++        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++++++        # return self.down_proj(swiglu_output)
--+++++++
--+++++++    # def forward(self, x):
--+++++++    #     gate_proj_out = self.gate_proj(x)
--+++++++    #     up_proj_out = self.up_proj(x)
--+++++++    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++++++    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++++++    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++++++    #     return self.down_proj(swiglu_out)
--+++++++        
--++++++ # Copied from transformers.models.llama.modeling_llama.repeat_kv
--++++++ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--++++++     """
--++++++@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--++++++         use_cache: bool = False,
--++++++         cache_position: Optional[mindspore.Tensor] = None,
--++++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++++
--+++++++        
--+++++++
--++++++         bsz, q_len, _ = hidden_states.shape
--++++++ 
--++++++         query_states = self.q_proj(hidden_states)
--++++++@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--++++++                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++                     "with a layer index."
--++++++                 )
--++++++-            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++++            if isinstance(past_key_value, StaticCache):
--+++++++                kv_seq_len = key_states.shape[-2]
--+++++++            else:
--+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++ 
--++++++         if past_key_value is not None:
--++++++             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++++            
--+++++++            if isinstance(past_key_value, StaticCache):
--+++++++                kv_seq_len = key_states.shape[-2]
--++++++ 
--++++++         # repeat k/v heads if n_kv_heads < n_heads
--++++++         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++-
--+++++++        
--++++++         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++++ 
--++++++-        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--++++++-            raise ValueError(
--++++++-                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--++++++-                f" {attn_weights.shape}"
--++++++-            )
--++++++-
--++++++-        if attention_mask is not None:  # no matter the length, we just slice it
--++++++-            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++++++        if attention_mask is not None:
--+++++++            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++++             attn_weights = attn_weights + causal_mask
--++++++ 
--++++++         # upcast attention to fp32
--++++++@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--++++++         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++++ 
--++++++         attn_output = self.o_proj(attn_output)
--++++++-
--+++++++        # @lwx
--+++++++        
--+++++++        # max_seq_len = self.max_position_embeddings  # 2048
--+++++++
--+++++++        # if attention_mask is not None:
--+++++++        #     # attention_mask: [B, 1, Sq, Sk]
--+++++++        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++++
--+++++++        #     # pad 到 [max_seq_len, max_seq_len]
--+++++++        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++++        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++++        #     global_attention_mask = padded_mask
--+++++++        # else:
--+++++++        #     global_attention_mask = None
--+++++++
--+++++++
--+++++++        # sparse_mode=3
--+++++++        # attn_output = mindspore.ops.flash_attention_score(
--+++++++        #     query=query_states,
--+++++++        #     key=key_states,
--+++++++        #     value=value_states,
--+++++++        #     real_shift=None,
--+++++++        #     padding_mask=None,
--+++++++
--+++++++        #     head_num=self.num_heads,
--+++++++        #     attn_mask=global_attention_mask,
--+++++++        #     keep_prob=1.0 - self.attention_dropout,
--+++++++        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++++        #     input_layout="BNSD",
--+++++++        #     pre_tokens=2147483647,
--+++++++        #     next_tokens=2147483647,
--+++++++        #     inner_precise=0,
--+++++++        #     drop_mask=None,
--+++++++        #     prefix=None,
--+++++++        #     actual_seq_qlen=None,
--+++++++        #     actual_seq_kvlen=None,
--+++++++        #     sparse_mode=sparse_mode,
--+++++++        # )
--++++++         if not output_attentions:
--++++++             attn_weights = None
--++++++ 
--++++++         return attn_output, attn_weights, past_key_value
--++++++ 
--++++++ 
--+++++++class Qwen2MoeFlashAttention(nn.Module):
--+++++++    """
--+++++++    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++++    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++++
--+++++++    关键改动:
--+++++++    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++++       直接传入原始的 key 和 value 张量效率更高。
--+++++++    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++++    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++++    """
--+++++++    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++++        super().__init__()
--+++++++        self.config = config
--+++++++        self.layer_idx = layer_idx
--+++++++        self.hidden_size = config.hidden_size
--+++++++        self.num_heads = config.num_attention_heads
--+++++++        self.head_dim = self.hidden_size // self.num_heads
--+++++++        self.num_key_value_heads = config.num_key_value_heads
--+++++++        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++++        self.max_position_embeddings = config.max_position_embeddings
--+++++++        self.rope_theta = config.rope_theta
--+++++++        self.attention_dropout = config.attention_dropout
--+++++++
--+++++++        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++++            raise ValueError(
--+++++++                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++++            )
--+++++++
--+++++++        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++++        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++++        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++++        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++++
--+++++++        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++++            self.head_dim,
--+++++++            max_position_embeddings=self.max_position_embeddings,
--+++++++            base=self.rope_theta,
--+++++++        )
--+++++++
--+++++++    def forward(
--+++++++        self,
--+++++++        hidden_states: mindspore.Tensor,
--+++++++        attention_mask: Optional[mindspore.Tensor] = None,
--+++++++        position_ids: Optional[mindspore.Tensor] = None,
--+++++++        past_key_value: Optional[Cache] = None,
--+++++++        output_attentions: bool = False,
--+++++++        use_cache: bool = False,
--+++++++        cache_position: Optional[mindspore.Tensor] = None,
--+++++++    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++++
--+++++++        bsz, q_len, _ = hidden_states.shape
--+++++++
--+++++++        # 1. 线性投射 Q, K, V
--+++++++        query_states = self.q_proj(hidden_states)
--+++++++        key_states = self.k_proj(hidden_states)
--+++++++        value_states = self.v_proj(hidden_states)
--+++++++
--+++++++        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++++        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++++        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++
--+++++++        # 3. RoPE 旋转位置编码
--+++++++        kv_seq_len = key_states.shape[-2]
--+++++++        if past_key_value is not None:
--+++++++            if self.layer_idx is None:
--+++++++                raise ValueError(
--+++++++                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++++                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++++                    "with a layer index."
--+++++++                )
--+++++++            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++++            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++++                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++++                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++++                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++++                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++++                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++++                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++++                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++++                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++++                if cache_position.shape[0] == 1:
--+++++++                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++++                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++++                    kv_seq_len = past_seen_tokens + 1
--+++++++                else:
--+++++++                    # prefill 阶段：cache_position 是范围，使用其长度
--+++++++                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++++            else:
--+++++++                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++++
--+++++++        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++++        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++++
--+++++++        # 4. KV 缓存更新
--+++++++        if past_key_value is not None:
--+++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++++            key_states, value_states = past_key_value.update(
--+++++++                key_states, value_states, self.layer_idx, cache_kwargs
--+++++++            )
--+++++++            
--+++++++            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++++            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++++            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++++                if cache_position.shape[0] == 1:
--+++++++                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++++                    kv_seq_len = key_states.shape[-2]
--+++++++
--+++++++        # 5. [重要] 准备 Attention Mask
--+++++++        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++++        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++++        fa_attention_mask = None
--+++++++        if attention_mask is not None:
--+++++++            # 截取与当前key长度匹配的部分
--+++++++            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++++            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++++            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++++            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++++            fa_attention_mask = (mask_slice != 0)
--+++++++
--+++++++        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++++        input_dtype = query_states.dtype
--+++++++        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++++            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++++            query_states = query_states.to(mindspore.float16)
--+++++++            key_states = key_states.to(mindspore.float16)
--+++++++            value_states = value_states.to(mindspore.float16)
--+++++++
--+++++++        # 6. [核心] 调用 flash_attention_score 算子
--+++++++        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++++        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++++        attn_output = mindspore.ops.flash_attention_score(
--+++++++            query=query_states,
--+++++++            key=key_states,
--+++++++            value=value_states,
--+++++++            head_num=self.num_heads, # 传入Q的头数(N1)
--+++++++            attn_mask=fa_attention_mask,
--+++++++            keep_prob=1.0 - self.attention_dropout,
--+++++++            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++++            input_layout="BNSD",
--+++++++            sparse_mode=0 # 使用 defaultMask 模式
--+++++++        )
--+++++++
--+++++++        # 恢复原始数据类型
--+++++++        attn_output = attn_output.to(input_dtype)
--+++++++
--+++++++        # 7. 调整输出形状
--+++++++        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++++        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++++        attn_output = self.o_proj(attn_output)
--+++++++
--+++++++        # FlashAttention 算子不直接返回注意力权重矩阵
--+++++++        attn_weights = None
--+++++++        if output_attentions:
--+++++++             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++++
--+++++++        return attn_output, attn_weights, past_key_value
--+++++++
--+++++++    # def forward(
--+++++++    #     self,
--+++++++    #     hidden_states: mindspore.Tensor,
--+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++++    #     past_key_value: Optional[Cache] = None,
--+++++++    #     output_attentions: bool = False,
--+++++++    #     use_cache: bool = False,
--+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++++
--+++++++    #     bsz, q_len, _ = hidden_states.shape
--+++++++
--+++++++    #     # 1. 线性投射 Q, K, V
--+++++++    #     query_states = self.q_proj(hidden_states)
--+++++++    #     key_states = self.k_proj(hidden_states)
--+++++++    #     value_states = self.v_proj(hidden_states)
--+++++++
--+++++++    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++
--+++++++    #     # 3. RoPE 旋转位置编码
--+++++++    #     kv_seq_len = key_states.shape[-2]
--+++++++    #     if past_key_value is not None:
--+++++++    #         if self.layer_idx is None:
--+++++++    #             raise ValueError(
--+++++++    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++++    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++++    #                 "with a layer index."
--+++++++    #             )
--+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++++
--+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++++
--+++++++    #     # 4. KV 缓存更新
--+++++++    #     if past_key_value is not None:
--+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++++    #         key_states, value_states = past_key_value.update(
--+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++++    #         )
--+++++++
--+++++++    #     # 5. 准备 Attention Mask
--+++++++    #     fa_attention_mask = None
--+++++++    #     if attention_mask is not None:
--+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++++    #         fa_attention_mask = (mask_slice != 0)
--+++++++
--+++++++    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++++    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++++    #     input_dtype = query_states.dtype
--+++++++
--+++++++    #     # 6. [核心] 调用 flash_attention_score 算子
--+++++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++++    #         query=query_states,
--+++++++    #         key=key_states,
--+++++++    #         value=value_states,
--+++++++    #         head_num=self.num_heads,
--+++++++    #         attn_mask=fa_attention_mask,
--+++++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++++    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++++    #         input_layout="BNSD",
--+++++++    #         sparse_mode=0,
--+++++++    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++++    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++++    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++++    #         inner_precise=1
--+++++++    #     )
--+++++++
--+++++++    #     # 恢复原始数据类型
--+++++++    #     attn_output = attn_output.to(input_dtype)
--+++++++
--+++++++    #     # 7. 调整输出形状
--+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++++    #     attn_output = self.o_proj(attn_output)
--+++++++
--+++++++    #     attn_weights = None
--+++++++    #     if output_attentions:
--+++++++    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++++
--+++++++    #     return attn_output, attn_weights, past_key_value
--+++++++
--+++++++    # def forward(
--+++++++    #     self,
--+++++++    #     hidden_states: mindspore.Tensor,
--+++++++    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++++    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++++    #     past_key_value: Optional[Cache] = None,
--+++++++    #     output_attentions: bool = False,
--+++++++    #     use_cache: bool = False,
--+++++++    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++++    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++++
--+++++++    #     bsz, q_len, _ = hidden_states.shape
--+++++++
--+++++++    #     query_states = self.q_proj(hidden_states)
--+++++++    #     key_states = self.k_proj(hidden_states)
--+++++++    #     value_states = self.v_proj(hidden_states)
--+++++++
--+++++++    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++++
--+++++++    #     kv_seq_len = key_states.shape[-2]
--+++++++    #     if past_key_value is not None:
--+++++++    #         if self.layer_idx is None:
--+++++++    #             raise ValueError("`layer_idx` must be specified for caching")
--+++++++    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++++
--+++++++    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++++    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++++
--+++++++    #     if past_key_value is not None:
--+++++++    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++++    #         key_states, value_states = past_key_value.update(
--+++++++    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++++    #         )
--+++++++
--+++++++    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++++    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++++
--+++++++    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++++++    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++++++    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++++++    #     query_states = query_states / math.sqrt(self.head_dim)
--+++++++    #     # <--- 修改结束 ---
--+++++++
--+++++++    #     fa_attention_mask = None
--+++++++    #     if attention_mask is not None:
--+++++++    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++++    #         fa_attention_mask = (mask_slice != 0)
--+++++++
--+++++++    #     input_dtype = query_states.dtype
--+++++++
--+++++++    #     attn_output = mindspore.ops.flash_attention_score(
--+++++++    #         query=query_states,    # 传入已经预先缩放过的 query
--+++++++    #         key=key_states,
--+++++++    #         value=value_states,
--+++++++    #         head_num=self.num_heads,
--+++++++    #         attn_mask=fa_attention_mask,
--+++++++    #         keep_prob=1.0 - self.attention_dropout,
--+++++++    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++++++    #         input_layout="BNSD",
--+++++++    #         sparse_mode=0,
--+++++++    #         inner_precise=1        # 仍然保持内部高精度计算
--+++++++    #     )
--+++++++
--+++++++    #     attn_output = attn_output.to(input_dtype)
--+++++++    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++++    #     attn_output = self.o_proj(attn_output)
--+++++++
--+++++++    #     attn_weights = None
--+++++++    #     if output_attentions:
--+++++++    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++++++
--+++++++    #     return attn_output, attn_weights, past_key_value
--+++++++
--++++++ QWEN2MOE_ATTENTION_CLASSES = {
--++++++     "eager": Qwen2MoeAttention,
--+++++++    "flash-attention": Qwen2MoeFlashAttention,
--++++++ }
--++++++ 
--++++++ 
--++++++@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--++++++         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--++++++         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--++++++ 
--+++++++    #@dwj
--+++++++    # 只遍历激活的专家，而非全部专家
--++++++     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--++++++-        batch_size, sequence_length, hidden_dim = hidden_states.shape
--++++++-        hidden_states = hidden_states.view(-1, hidden_dim)
--++++++-        # router_logits: (batch * sequence_length, n_experts)
--++++++-        router_logits = self.gate(hidden_states)
--++++++-
--++++++-        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--++++++-        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--++++++-        if self.norm_topk_prob:
--++++++-            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--++++++-        # we cast back to the input dtype
--++++++-        routing_weights = routing_weights.to(hidden_states.dtype)
--++++++-
--++++++-        final_hidden_states = ops.zeros(
--++++++-            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--++++++-        )
--++++++-
--++++++-        # One hot encode the selected experts to create an expert mask
--++++++-        # this will be used to easily index which expert is going to be sollicitated
--++++++-        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--++++++-
--++++++-        # Loop over all available experts in the model and perform the computation on each expert
--++++++-        for expert_idx in range(self.num_experts):
--++++++-            expert_layer = self.experts[expert_idx]
--++++++-            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--++++++-
--++++++-            # Index the correct hidden states and compute the expert hidden state for
--++++++-            # the current expert. We need to make sure to multiply the output hidden
--++++++-            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--++++++-            if 0 not in idx.shape:
--++++++-                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--++++++-                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--++++++-
--++++++-                # However `index_add_` only support torch tensors for indexing so we'll use
--++++++-                # the `top_x` tensor here.
--++++++-                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--++++++-
--++++++-        shared_expert_output = self.shared_expert(hidden_states)
--++++++-        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--++++++-
--++++++-        final_hidden_states = final_hidden_states + shared_expert_output
--+++++++            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++++            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++++            num_tokens = hidden_states_reshaped.shape[0]
--+++++++
--+++++++            router_logits = self.gate(hidden_states_reshaped)
--+++++++            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++++            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++++
--+++++++            if self.norm_topk_prob:
--+++++++                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++++            routing_weights = routing_weights.to(hidden_states.dtype)
--+++++++            
--+++++++            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++++            flat_selected_experts = selected_experts.flatten()
--+++++++            
--+++++++            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++++            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++++            token_indices = broadcasted_token_indices.flatten()
--+++++++            
--+++++++            active_experts = ops.unique(flat_selected_experts)
--+++++++            
--+++++++            for expert_idx_tensor in active_experts:
--+++++++                expert_idx = expert_idx_tensor.item()
--+++++++                expert_layer = self.experts[expert_idx]
--+++++++                
--+++++++                mask = (flat_selected_experts == expert_idx_tensor)
--+++++++                selected_token_indices = token_indices[mask]
--+++++++                selected_routing_weights = routing_weights.flatten()[mask]
--+++++++                
--+++++++                current_states = hidden_states_reshaped[selected_token_indices]
--+++++++                
--+++++++                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++++                
--+++++++                final_hidden_states = final_hidden_states.index_add(
--+++++++                    dim=0,
--+++++++                    index=selected_token_indices,
--+++++++                    source=expert_output.to(hidden_states.dtype)
--+++++++                )
--+++++++            
--+++++++            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++++            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--++++++ 
--++++++-        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--++++++-        return final_hidden_states, router_logits
--+++++++            final_hidden_states = final_hidden_states + shared_expert_output
--+++++++            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++++            
--+++++++            return final_hidden_states, router_logits
--++++++ 
--++++++ 
--++++++ class Qwen2MoeDecoderLayer(nn.Module):
--++++++@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--++++++ 
--++++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--++++++ 
--+++++++        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++++
--++++++         if (layer_idx not in config.mlp_only_layers) and (
--++++++             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--++++++         ):
--++++++@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--++++++     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--++++++     _skip_keys_device_placement = "past_key_values"
--++++++     _supports_cache_class = True
--+++++++#lwx
--+++++++    # _supports_static_cache = True
--++++++ 
--++++++     def _init_weights(self, module):
--++++++         std = self.config.initializer_range
--++++++@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--++++++         return causal_mask
--++++++ 
--++++++ 
--++++++-class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++++class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--++++++     _tied_weights_keys = ["lm_head.weight"]
--++++++ 
--++++++     def __init__(self, config):
--++++++@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++++         self.num_experts_per_tok = config.num_experts_per_tok
--++++++         # Initialize weights and apply final processing
--++++++         self.post_init()
--+++++++        # @lwx
--+++++++        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++++++        #     self.generation_config.cache_implementation = "static"
--+++++++        self._warmed_up = False
--+++++++
--+++++++    def warmup_moe_model(self):
--+++++++        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++++++        test_texts = [
--+++++++            "warmup short",
--+++++++            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++++++            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++++++        ]
--+++++++        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++++        if tokenizer is None:
--+++++++            from mindnlp.transformers import AutoTokenizer
--+++++++            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++++            self._warmup_tokenizer = tokenizer
--+++++++
--+++++++        for text in test_texts:
--+++++++            inputs = tokenizer(text, return_tensors="ms")
--+++++++            with mindspore._no_grad():
--+++++++                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++++++        print("[Warmup] Qwen2-MoE 模型预热完成。")
--++++++ 
--++++++     def get_input_embeddings(self):
--++++++         return self.model.embed_tokens
--++++++@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++++         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--++++++         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--++++++         ```"""
--+++++++        if not self._warmed_up:
--+++++++            self._warmed_up = True
--+++++++            self.warmup_moe_model()
--++++++ 
--++++++         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--++++++         output_router_logits = (
--++++++@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--++++++             }
--++++++         )
--++++++         return model_inputs
--+++++++# @lwx
--+++++++    # def _decode_one_tokens_logits(
--+++++++    #     self,
--+++++++    #     cur_token: mindspore.Tensor,
--+++++++    #     input_pos: Optional[mindspore.Tensor],
--+++++++    #     cache_position: mindspore.Tensor,
--+++++++    #     past_key_values: StaticCache,
--+++++++    # ) -> mindspore.Tensor:
--+++++++    #     """
--+++++++    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++++++        
--+++++++    #     Args:
--+++++++    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++++++    #         input_pos: 输入位置信息，可选
--+++++++    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++++++    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++++++            
--+++++++    #     Returns:
--+++++++    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++++++    #     """
--+++++++    #     # 调用JIT编译的版本
--+++++++    #     return self.get_decode_one_tokens_logits(
--+++++++    #         cur_token=cur_token,
--+++++++    #         input_pos=input_pos,
--+++++++    #         cache_position=cache_position,
--+++++++    #         past_key_values=past_key_values,
--+++++++    #     )
--+++++++    
--+++++++    # @mindspore.jit(jit_level='O1')
--+++++++    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++++++    #     """
--+++++++    #     JIT编译的函数，用于高效的单token解码
--+++++++    #     使用JIT编译优化以支持静态shape和高效执行
--+++++++        
--+++++++    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++++++    #     """
--+++++++    #     outputs = self.model.forward(
--+++++++    #         input_ids=cur_token,
--+++++++    #         position_ids=input_pos,
--+++++++    #         cache_position=cache_position,
--+++++++    #         past_key_values=past_key_values,
--+++++++    #         use_cache=True,
--+++++++    #         return_dict=False,
--+++++++    #     )
--+++++++        
--+++++++    #     hidden_states = outputs[0]
--+++++++    #     logits = self.lm_head.forward(hidden_states)
--+++++++    #     logits = logits.float()
--+++++++        
--+++++++    #     return logits[:, -1, :]
--+++++++
--+++++++    # def _sample(
--+++++++    #     self,
--+++++++    #     input_ids: mindspore.Tensor,
--+++++++    #     logits_processor,
--+++++++    #     stopping_criteria,
--+++++++    #     generation_config,
--+++++++    #     synced_devices: bool,
--+++++++    #     streamer=None,
--+++++++    #     logits_warper=None,
--+++++++    #     **model_kwargs,
--+++++++    # ):
--+++++++    #     """
--+++++++    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++++++    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++++++    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++++++    #     """
--+++++++    #     from ...generation.logits_process import LogitsProcessorList
--+++++++    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++++++    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++++++    #     from mindnlp.core import nn, ops, no_grad
--+++++++    #     import numpy as np
--+++++++        
--+++++++    #     # 检查是否使用 StaticCache
--+++++++    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++++++    #     # 否则，直接调用父类方法
--+++++++    #     past_key_values = model_kwargs.get("past_key_values")
--+++++++    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++++++
--+++++++    #     if not isinstance(past_key_values, StaticCache):
--+++++++    #         # 不使用 StaticCache，直接调用父类方法
--+++++++    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++++++    #         return super()._sample(
--+++++++    #             input_ids=input_ids,
--+++++++    #             logits_processor=logits_processor,
--+++++++    #             stopping_criteria=stopping_criteria,
--+++++++    #             generation_config=generation_config,
--+++++++    #             synced_devices=synced_devices,
--+++++++    #             streamer=streamer,
--+++++++    #             logits_warper=logits_warper,
--+++++++    #             **model_kwargs,
--+++++++    #         )
--+++++++        
--+++++++    #     # 使用 StaticCache，进入自定义循环
--+++++++    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++++++    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++++++    #     pad_token_id = generation_config._pad_token_tensor
--+++++++    #     output_attentions = generation_config.output_attentions
--+++++++    #     output_hidden_states = generation_config.output_hidden_states
--+++++++    #     output_scores = generation_config.output_scores
--+++++++    #     output_logits = generation_config.output_logits
--+++++++    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++++++    #     max_length = generation_config.max_length
--+++++++    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++++++    #     do_sample = generation_config.do_sample
--+++++++        
--+++++++    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++++++    #         raise ValueError(
--+++++++    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++++++    #             f"{logits_warper})."
--+++++++    #         )
--+++++++        
--+++++++    #     # init attention / hidden states / scores tuples
--+++++++    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++++++    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++++++    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++++    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++++    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++++++        
--+++++++    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++++++    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++++++    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++++++    #         encoder_hidden_states = (
--+++++++    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++++++    #         )
--+++++++        
--+++++++    #     # keep track of which sequences are already finished
--+++++++    #     batch_size, cur_len = input_ids.shape
--+++++++    #     this_peer_finished = False
--+++++++    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++++++    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++++++        
--+++++++    #     time_record = []
--+++++++    #     from ....utils.testing_utils import parse_flag_from_env
--+++++++    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++++++        
--+++++++    #     while self._has_unfinished_sequences(
--+++++++    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++++++    #     ):
--+++++++    #         if _record_time:
--+++++++    #             import time as time_module
--+++++++    #             infer_start = time_module.time()
--+++++++            
--+++++++    #         # prepare model inputs
--+++++++    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++++++            
--+++++++    #         # prepare variable output controls
--+++++++    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++++++    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++++++            
--+++++++    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++++++    #         cur_cache_position = model_inputs.get("cache_position")
--+++++++    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++++++    #         cur_input_ids = model_inputs.get("input_ids")
--+++++++            
--+++++++    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++++++    #             cur_cache_position is not None and 
--+++++++    #             len(cur_cache_position.shape) > 0 and
--+++++++    #             cur_cache_position.shape[0] == 1 and
--+++++++    #             cur_input_ids is not None and
--+++++++    #             cur_input_ids.shape[1] == 1):
--+++++++    #             # 使用 JIT 优化的单 token 解码
--+++++++    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++++++    #             if not hasattr(self, '_jit_used'):
--+++++++    #                 self._jit_used = False
--+++++++    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++++++                
--+++++++    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++++++    #                 cur_token=cur_input_ids,
--+++++++    #                 input_pos=model_inputs.get("position_ids"),
--+++++++    #                 cache_position=cur_cache_position,
--+++++++    #                 past_key_values=cur_past_key_values,
--+++++++    #             )
--+++++++                
--+++++++    #             # 标记已使用JIT（用于后续判断）
--+++++++    #             if not self._jit_used:
--+++++++    #                 self._jit_used = True
--+++++++                
--+++++++    #             # 构造兼容的输出对象
--+++++++    #             class JitOptimizedOutput:
--+++++++    #                 def __init__(self, logits, config):
--+++++++    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++++++    #                     self.config = config
--+++++++    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++++++    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++++++    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++++++    #                     self.cross_attentions = None
--+++++++    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++++++    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++++++                
--+++++++    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++++++    #         else:
--+++++++    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++++++    #             outputs = self(**model_inputs, return_dict=True)
--+++++++            
--+++++++    #         if synced_devices and this_peer_finished:
--+++++++    #             continue
--+++++++            
--+++++++    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++++++    #         next_token_logits = outputs.logits[:, -1, :]
--+++++++            
--+++++++    #         # pre-process distribution
--+++++++    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++++++    #         if do_sample:
--+++++++    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++++++            
--+++++++    #         # Store scores, attentions and hidden_states when required
--+++++++    #         if return_dict_in_generate:
--+++++++    #             if output_scores:
--+++++++    #                 scores += (next_token_scores,)
--+++++++    #             if output_logits:
--+++++++    #                 raw_logits += (next_token_logits,)
--+++++++    #             if output_attentions:
--+++++++    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++++++    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++++++    #                 if self.config.is_encoder_decoder:
--+++++++    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++++++                
--+++++++    #             if output_hidden_states:
--+++++++    #                 hidden = (
--+++++++    #                     outputs.decoder_hidden_states
--+++++++    #                     if self.config.is_encoder_decoder
--+++++++    #                     else outputs.hidden_states
--+++++++    #                 )
--+++++++    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++++++            
--+++++++    #         # token selection
--+++++++    #         if do_sample:
--+++++++    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++++++    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++++++    #         else:
--+++++++    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++++++            
--+++++++    #         # finished sentences should have their next token be a padding token
--+++++++    #         if has_eos_stopping_criteria:
--+++++++    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++++++            
--+++++++    #         # update generated ids, model inputs, and length for next step
--+++++++    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++++++    #         if streamer is not None:
--+++++++    #             streamer.put(next_tokens)
--+++++++            
--+++++++    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++++++    #             outputs,
--+++++++    #             model_kwargs,
--+++++++    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++++++    #         )
--+++++++            
--+++++++    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++++++    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++++++    #         cur_len += 1
--+++++++            
--+++++++    #         if _record_time:
--+++++++    #             import time as time_module
--+++++++    #             infer_stop = time_module.time()
--+++++++    #             time_record.append(infer_stop - infer_start)
--+++++++            
--+++++++    #         del outputs
--+++++++        
--+++++++    #     average_infer_time = None
--+++++++    #     if time_record:
--+++++++    #         if len(time_record) > 1:
--+++++++    #             time_record.pop(0)
--+++++++    #         average_infer_time = sum(time_record) / len(time_record)
--+++++++    #         print(f'average inference time is: {average_infer_time}')
--+++++++    #         print(f'inference time record: {time_record}')
--+++++++        
--+++++++    #     if streamer is not None:
--+++++++    #         streamer.end()
--+++++++        
--+++++++    #     # 简单判断：打印是否使用了JIT路径
--+++++++    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++++++    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++++++    #     else:
--+++++++    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++++++        
--+++++++    #     if return_dict_in_generate:
--+++++++    #         if self.config.is_encoder_decoder:
--+++++++    #             return GenerateEncoderDecoderOutput(
--+++++++    #                 sequences=input_ids,
--+++++++    #                 scores=scores,
--+++++++    #                 logits=raw_logits,
--+++++++    #                 encoder_attentions=encoder_attentions,
--+++++++    #                 encoder_hidden_states=encoder_hidden_states,
--+++++++    #                 decoder_attentions=decoder_attentions,
--+++++++    #                 cross_attentions=cross_attentions,
--+++++++    #                 decoder_hidden_states=decoder_hidden_states,
--+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++++    #                 average_infer_time=average_infer_time
--+++++++    #             )
--+++++++    #         else:
--+++++++    #             return GenerateDecoderOnlyOutput(
--+++++++    #                 sequences=input_ids,
--+++++++    #                 scores=scores,
--+++++++    #                 logits=raw_logits,
--+++++++    #                 attentions=decoder_attentions,
--+++++++    #                 hidden_states=decoder_hidden_states,
--+++++++    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++++    #                 average_infer_time=average_infer_time
--+++++++    #             )
--+++++++    #     else:
--+++++++    #         return input_ids
--+++++++            
--+++++++    # def _prepare_cache_for_generation(
--+++++++    #     self,
--+++++++    #     generation_config,
--+++++++    #     model_kwargs,
--+++++++    #     assistant_model,
--+++++++    #     batch_size,
--+++++++    #     max_cache_length,
--+++++++    # ):
--+++++++    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++++++    #         generation_config.cache_implementation = "static"
--+++++++    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++++++        
--+++++++    #     if generation_config.cache_implementation == "static":
--+++++++    #         base_required_from_max_length = generation_config.max_length + 1
--+++++++    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++++++    #         min_cache_size = 50
--+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++++    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++++++    #         else:
--+++++++    #             max_cache_length = max(base_required, min_cache_size)
--+++++++            
--+++++++    #         original_max_cache_length = max_cache_length
--+++++++    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++++++    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++++++    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++++++    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++++++    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++++++            
--+++++++    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++++    #             if max_cache_length > self.config.max_position_embeddings:
--+++++++    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++++++        
--+++++++    #     result = super()._prepare_cache_for_generation(
--+++++++    #         generation_config=generation_config,
--+++++++    #         model_kwargs=model_kwargs,
--+++++++    #         assistant_model=assistant_model,
--+++++++    #         batch_size=batch_size,
--+++++++    #         max_cache_length=max_cache_length,
--+++++++    #     )
--+++++++        
--+++++++    #     if generation_config.cache_implementation == "static":
--+++++++    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++++++    #         created_cache = model_kwargs.get(cache_name)
--+++++++    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++++++    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++++++    #             if created_cache.max_cache_len < generation_config.max_length:
--+++++++    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++++++        
--+++++++    #     return result
--+++++++
--+++++++
--+++++++
--++++++ 
--++++++ 
--++++++ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--++++++-- 
--++++++2.27.0
--++++++
--+++++-- 
--+++++2.27.0
--+++++
--++++diff --git a/patches/0003-20261106secondcommit.patch b/patches/0003-20261106secondcommit.patch
--++++new file mode 100644
--++++index 00000000..966529e4
--++++--- /dev/null
--+++++++ b/patches/0003-20261106secondcommit.patch
--++++@@ -0,0 +1,2769 @@
--+++++From 1cf79d864cf51fd66bef8fea63047c5fde477f53 Mon Sep 17 00:00:00 2001
--+++++From: Pinoeer-kingxi <13022943007@163.com>
--+++++Date: Thu, 6 Nov 2025 14:54:37 +0800
--+++++Subject: [PATCH 3/3] 20261106secondcommit
--+++++
--+++++---
--+++++ .../models/deepseek/modeling_deepseek.py      |  217 ++-
--+++++ .../models/qwen2_moe/modeling_qwen2_moe.py    | 1071 +++++---------
--+++++ patches/0001-20251104commit.patch             | 1272 -----------------
--+++++ 3 files changed, 528 insertions(+), 2032 deletions(-)
--+++++ delete mode 100644 patches/0001-20251104commit.patch
--+++++
--+++++diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++index 73773c22..2f9192bf 100644
--+++++--- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--++++++++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++@@ -54,6 +54,24 @@ logger = logging.get_logger(__name__)
--+++++ 
--+++++ _CONFIG_FOR_DOC = "DeepseekConfig"
--+++++ 
--++++++_attn_mask_cache = {}
--++++++
--++++++def get_cached_causal_mask(attention_mask, batch_and_seq, inputs_embeds, past_key_values_length):
--++++++    q_len = batch_and_seq[1]
--++++++    kv_len = batch_and_seq[1] + past_key_values_length 
--++++++    key = (batch_and_seq[0], q_len, kv_len)
--++++++
--++++++    if key in _attn_mask_cache:
--++++++        return _attn_mask_cache[key]
--++++++
--++++++    mask = _prepare_4d_causal_attention_mask(
--++++++        attention_mask,
--++++++        batch_and_seq,
--++++++        inputs_embeds,
--++++++        past_key_values_length,
--++++++    )
--++++++    _attn_mask_cache[key] = mask
--++++++    return mask
--+++++ 
--+++++ def _get_unpad_data(attention_mask):
--+++++     seqlens_in_batch = attention_mask.sum(dim=-1, dtype=mindspore.int32)
--+++++@@ -441,43 +459,8 @@ class DeepseekMoE(nn.Module):
--+++++         return final_output
--+++++ 
--+++++ 
--+++++-    @no_grad()
--+++++-    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++-        expert_cache = ops.zeros_like(x)
--+++++-        idxs = flat_expert_indices.argsort()
--+++++-        tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++-        token_idxs = idxs // self.num_experts_per_tok
--+++++-
--+++++-        for i, end_idx in enumerate(tokens_per_expert):
--+++++-            start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++-            if start_idx == end_idx:
--+++++-                continue
--+++++-            expert = self.experts[i]
--+++++-            exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-            expert_tokens = x[exp_token_idx]
--+++++-            expert_out = expert(expert_tokens)
--+++++-            expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-            expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++-
--+++++-        return expert_cache
--+++++-        
--+++++     # @no_grad()
--+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-    #     # expert_cache = torch.zeros_like(x)
--+++++-    #     # idxs = flat_expert_indices.argsort()
--+++++-    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++-    #     # token_idxs = idxs // self.num_experts_per_tok
--+++++-    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++-    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++-    #     #     if start_idx == end_idx:
--+++++-    #     #         continue
--+++++-    #     #     expert = self.experts[i]
--+++++-    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-    #     #     expert_tokens = x[exp_token_idx]
--+++++-    #     #     expert_out = expert(expert_tokens)
--+++++-    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++-    #     # return expert_cache
--++++++    # def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++     #     expert_cache = ops.zeros_like(x)
--+++++     #     idxs = flat_expert_indices.argsort()
--+++++     #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++@@ -495,37 +478,118 @@ class DeepseekMoE(nn.Module):
--+++++     #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++ 
--+++++     #     return expert_cache
--+++++-    # @no_grad()
--+++++-    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-    #     expert_cache = ops.zeros_like(x)
--++++++        
--++++++    @no_grad()
--++++++    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--++++++        """
--++++++        优化版 MoE prefill：
--++++++        - 批量张量化处理同一个 expert 的所有 token
--++++++        - 跳过无 token 的专家
--++++++        - 保持结果完全一致
--++++++        """
--++++++        # 初始化输出缓存
--++++++        expert_cache = ops.zeros_like(x)
--+++++ 
--+++++-    #     # 排序保证顺序一致
--+++++-    #     idxs = flat_expert_indices.argsort()
--+++++-    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++-    #     token_idxs = idxs // self.num_experts_per_tok
--++++++        # 排序（确保 scatter_add 位置对应原逻辑）
--++++++        idxs = flat_expert_indices.argsort()
--++++++        sorted_expert_indices = flat_expert_indices[idxs]
--++++++        sorted_token_indices = idxs // self.num_experts_per_tok
--+++++ 
--+++++-    #     # 找出有 token 的专家
--+++++-    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++++        # 每个 expert 的 token 数
--++++++        tokens_per_expert = sorted_expert_indices.bincount()
--+++++ 
--+++++-    #     for i in active_experts.tolist():
--+++++-    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++-    #         end_idx = tokens_per_expert[i]
--+++++-    #         if start_idx == end_idx:  # 没有 token
--+++++-    #             continue
--++++++        # 找出有 token 的专家
--++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--+++++ 
--+++++-    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-    #         expert_tokens = x[exp_token_idx]
--+++++-    #         expert_out = self.experts[i](expert_tokens)
--+++++-    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++++        for expert_id in active_experts.tolist():
--++++++            # 取该 expert 对应的排序后 token 区间
--++++++            start = (tokens_per_expert[:expert_id]).sum().item()
--++++++            end = start + tokens_per_expert[expert_id].item()
--+++++ 
--+++++-    #         expert_cache = mindspore.mint.scatter_add(
--+++++-    #             expert_cache,
--+++++-    #             0,
--+++++-    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++-    #             expert_out
--+++++-    #         )
--++++++            token_idx = sorted_token_indices[start:end]      # 原 token 位置
--++++++            expert_tokens = x[token_idx]                     # 取输入向量
--+++++ 
--+++++-    #     return expert_cache
--++++++            # 执行专家 MLP
--++++++            expert_out = self.experts[expert_id](expert_tokens)
--++++++
--++++++            # 按权重缩放
--++++++            scaled_out = expert_out * flat_expert_weights[idxs[start:end]]
--++++++
--++++++            # 回写到缓存（等价 scatter_add）
--++++++            expert_cache = mindspore.mint.scatter_add(
--++++++                expert_cache,
--++++++                0,
--++++++                token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++++                scaled_out
--++++++            )
--++++++
--++++++        return expert_cache
--++++++
--++++++        # @no_grad()
--++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++        #     # expert_cache = torch.zeros_like(x)
--++++++        #     # idxs = flat_expert_indices.argsort()
--++++++        #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--++++++        #     # token_idxs = idxs // self.num_experts_per_tok
--++++++        #     # for i, end_idx in enumerate(tokens_per_expert):
--++++++        #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--++++++        #     #     if start_idx == end_idx:
--++++++        #     #         continue
--++++++        #     #     expert = self.experts[i]
--++++++        #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--++++++        #     #     expert_tokens = x[exp_token_idx]
--++++++        #     #     expert_out = expert(expert_tokens)
--++++++        #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++        #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--++++++        #     # return expert_cache
--++++++        #     expert_cache = ops.zeros_like(x)
--++++++        #     idxs = flat_expert_indices.argsort()
--++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++        #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++        #     for i, end_idx in enumerate(tokens_per_expert):
--++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++        #         if start_idx == end_idx:
--++++++        #             continue
--++++++        #         expert = self.experts[i]
--++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++        #         expert_tokens = x[exp_token_idx]
--++++++        #         expert_out = expert(expert_tokens)
--++++++        #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--++++++        #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--++++++
--++++++        #     return expert_cache
--++++++        # @no_grad()
--++++++        # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--++++++        #     expert_cache = ops.zeros_like(x)
--++++++
--++++++        #     # 排序保证顺序一致
--++++++        #     idxs = flat_expert_indices.argsort()
--++++++        #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--++++++        #     token_idxs = idxs // self.num_experts_per_tok
--++++++
--++++++        #     # 找出有 token 的专家
--++++++        #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--++++++
--++++++        #     for i in active_experts.tolist():
--++++++        #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--++++++        #         end_idx = tokens_per_expert[i]
--++++++        #         if start_idx == end_idx:  # 没有 token
--++++++        #             continue
--++++++
--++++++        #         exp_token_idx = token_idxs[start_idx:end_idx]
--++++++        #         expert_tokens = x[exp_token_idx]
--++++++        #         expert_out = self.experts[i](expert_tokens)
--++++++        #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--++++++
--++++++        #         expert_cache = mindspore.mint.scatter_add(
--++++++        #             expert_cache,
--++++++        #             0,
--++++++        #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--++++++        #             expert_out
--++++++        #         )
--++++++
--++++++        #     return expert_cache
--+++++ 
--+++++ 
--+++++ 
--+++++@@ -904,7 +968,6 @@ class DeepseekAttention(nn.Module):
--+++++ 
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++-
--+++++ # class DeepseekFlashAttention(nn.Module):
--+++++ #     """
--+++++ #     Multi-headed attention from 'Attention Is All You Need' paper, implemented using
--+++++@@ -1225,6 +1288,7 @@ class DeepseekFlashAttention(nn.Module):
--+++++ 
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--++++++
--+++++ Deepseek_ATTENTION_CLASSES = {
--+++++     "eager": DeepseekAttention,
--+++++     "flash-attention": DeepseekFlashAttention,
--+++++@@ -1456,7 +1520,14 @@ class DeepseekModel(DeepseekPreTrainedModel):
--+++++             )
--+++++         else:
--+++++             # 4d mask is passed through the layers
--+++++-            attention_mask = _prepare_4d_causal_attention_mask(
--++++++            # attention_mask = _prepare_4d_causal_attention_mask(
--++++++            #     attention_mask,
--++++++            #     (batch_size, seq_length),
--++++++            #     inputs_embeds,
--++++++            #     past_key_values_length,
--++++++            # )
--++++++            #@dwj
--++++++            attention_mask = get_cached_causal_mask(
--+++++                 attention_mask,
--+++++                 (batch_size, seq_length),
--+++++                 inputs_embeds,
--+++++@@ -1542,6 +1613,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++         # Initialize weights and apply final processing
--+++++         self.post_init()
--+++++         self.warm_up = False
--++++++        #@dwj
--++++++        self.kv_cache_keys, self.kv_cache_values = self.init_kv_cache(
--++++++            self.num_layers,
--++++++            self.num_attention_heads,
--++++++            self.head_dim,
--++++++            batch_size=1,
--++++++            max_length=self.max_length,
--++++++            dtype=mindspore.float16
--++++++        )
--++++++
--++++++    def init_kv_cache(self, num_layers, num_heads, head_dim, batch_size, max_length, dtype):
--++++++        key_cache = []
--++++++        value_cache = []
--++++++        for _ in range(num_layers):
--++++++            k = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++++            v = ops.zeros((batch_size, num_heads, max_length, head_dim), dtype=dtype)
--++++++            key_cache.append(k)
--++++++            value_cache.append(v)
--++++++        return key_cache, value_cache
--++++++
--+++++ 
--+++++     def warmup_moe_model_deep(self):
--+++++         print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++++diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++index bced285c..ebd7782e 100644
--+++++--- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--++++++++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++@@ -45,8 +45,48 @@ logger = logging.get_logger(__name__)
--+++++ _CHECKPOINT_FOR_DOC = "Qwen/Qwen1.5-MoE-A2.7B"
--+++++ _CONFIG_FOR_DOC = "Qwen2MoeConfig"
--+++++ 
--+++++-Long_Prompt = False
--+++++-PROMPT_LENGTH_THRESHOLD = 128
--++++++Long_Prompt = 1
--++++++LONG_PROMPT_LENGTH_THRESHOLD = 128
--++++++SHORT_PROMPT_LENGTH_THRESHOLD = 32
--++++++
--++++++_causal_mask_cache = {}
--++++++
--++++++def get_cached_causal_mask_with_cache_position(
--++++++    attention_mask: mindspore.Tensor,
--++++++    sequence_length: int,
--++++++    target_length: int,
--++++++    dtype: mindspore.dtype,
--++++++    min_dtype: float,
--++++++    cache_position: mindspore.Tensor,
--++++++    batch_size: int,
--++++++):
--++++++    """
--++++++    带缓存的 causal mask 构造函数
--++++++    """
--++++++    # q_len 是当前 query 长度
--++++++    q_len = sequence_length
--++++++    # kv_len 是 target_length
--++++++    kv_len = target_length
--++++++
--++++++    # 注意缓存 key 加上 q_len 和 kv_len，避免 prefill 与 decode 混淆
--++++++    key = (batch_size, q_len, kv_len, dtype, min_dtype)
--++++++
--++++++    if key in _causal_mask_cache:
--++++++        return _causal_mask_cache[key]
--++++++
--++++++    # 调用原来的 mask 构造逻辑
--++++++    causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++++        attention_mask,
--++++++        sequence_length=sequence_length,
--++++++        target_length=target_length,
--++++++        dtype=dtype,
--++++++        min_dtype=min_dtype,
--++++++        cache_position=cache_position,
--++++++        batch_size=batch_size,
--++++++    )
--++++++    # 缓存结果
--++++++    _causal_mask_cache[key] = causal_mask
--++++++    return causal_mask
--+++++ 
--+++++ # Copied from transformers.models.llama.modeling_llama._prepare_4d_causal_attention_mask_with_cache_position
--+++++ def _prepare_4d_causal_attention_mask_with_cache_position(
--+++++@@ -318,12 +358,172 @@ def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++++ 
--+++++ 
--+++++ # Copied from transformers.models.qwen2.modeling_qwen2.Qwen2Attention with Qwen2->Qwen2Moe
--++++++# class Qwen2MoeAttention(nn.Module):
--++++++#     """
--++++++#     Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--++++++#     and "Generating Long Sequences with Sparse Transformers".
--++++++#     """
--++++++
--++++++#     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--++++++#         super().__init__()
--++++++#         self.config = config
--++++++#         self.layer_idx = layer_idx
--++++++#         if layer_idx is None:
--++++++#             logger.warning_once(
--++++++#                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--++++++#                 "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++++#                 "when creating this class."
--++++++#             )
--++++++
--++++++#         self.hidden_size = config.hidden_size
--++++++#         self.num_heads = config.num_attention_heads
--++++++#         self.head_dim = self.hidden_size // self.num_heads
--++++++#         self.num_key_value_heads = config.num_key_value_heads
--++++++#         self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--++++++#         self.max_position_embeddings = config.max_position_embeddings
--++++++#         self.rope_theta = config.rope_theta
--++++++#         self.is_causal = True
--++++++#         self.attention_dropout = config.attention_dropout
--++++++
--++++++#         if (self.head_dim * self.num_heads) != self.hidden_size:
--++++++#             raise ValueError(
--++++++#                 f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
--++++++#                 f" and `num_heads`: {self.num_heads})."
--++++++#             )
--++++++#         self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--++++++#         self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++#         self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--++++++#         self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--++++++
--++++++#         self.rotary_emb = Qwen2MoeRotaryEmbedding(
--++++++#             self.head_dim,
--++++++#             max_position_embeddings=self.max_position_embeddings,
--++++++#             base=self.rope_theta,
--++++++#         )
--++++++
--++++++#     def forward(
--++++++#         self,
--++++++#         hidden_states: mindspore.Tensor,
--++++++#         attention_mask: Optional[mindspore.Tensor] = None,
--++++++#         position_ids: Optional[mindspore.Tensor] = None,
--++++++#         past_key_value: Optional[Cache] = None,
--++++++#         output_attentions: bool = False,
--++++++#         use_cache: bool = False,
--++++++#         cache_position: Optional[mindspore.Tensor] = None,
--++++++#     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--++++++
--++++++        
--++++++
--++++++#         bsz, q_len, _ = hidden_states.shape
--++++++
--++++++#         query_states = self.q_proj(hidden_states)
--++++++#         key_states = self.k_proj(hidden_states)
--++++++#         value_states = self.v_proj(hidden_states)
--++++++
--++++++#         query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--++++++#         key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++++#         value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--++++++
--++++++#         kv_seq_len = key_states.shape[-2]
--++++++#         if past_key_value is not None:
--++++++#             if self.layer_idx is None:
--++++++#                 raise ValueError(
--++++++#                     f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--++++++#                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--++++++#                     "with a layer index."
--++++++#                 )
--++++++#             if isinstance(past_key_value, StaticCache):
--++++++#                 kv_seq_len = key_states.shape[-2]
--++++++#             else:
--++++++#                 kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++#         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--++++++#         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--++++++
--++++++#         if past_key_value is not None:
--++++++#             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++++#             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++            
--++++++#             if isinstance(past_key_value, StaticCache):
--++++++#                 kv_seq_len = key_states.shape[-2]
--++++++
--++++++#         # repeat k/v heads if n_kv_heads < n_heads
--++++++#         key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++#         value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++        
--++++++#         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++++
--++++++#         if attention_mask is not None:
--++++++#             causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++++#             attn_weights = attn_weights + causal_mask
--++++++
--++++++#         # upcast attention to fp32
--++++++#         attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++++++#         attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++++++#         attn_output = ops.matmul(attn_weights, value_states)
--++++++
--++++++#         if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++++++#             raise ValueError(
--++++++#                 f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--++++++#                 f" {attn_output.shape}"
--++++++#             )
--++++++
--++++++#         attn_output = ops.transpose(attn_output, 1, 2)
--++++++#         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++++
--++++++#         attn_output = self.o_proj(attn_output)
--++++++#         # @lwx
--++++++        
--++++++#         # max_seq_len = self.max_position_embeddings  # 2048
--++++++
--++++++#         # if attention_mask is not None:
--++++++#         #     # attention_mask: [B, 1, Sq, Sk]
--++++++#         #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--++++++
--++++++#         #     # pad 到 [max_seq_len, max_seq_len]
--++++++#         #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--++++++#         #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--++++++#         #     global_attention_mask = padded_mask
--++++++#         # else:
--++++++#         #     global_attention_mask = None
--++++++
--++++++
--++++++#         # sparse_mode=3
--++++++#         # attn_output = mindspore.ops.flash_attention_score(
--++++++#         #     query=query_states,
--++++++#         #     key=key_states,
--++++++#         #     value=value_states,
--++++++#         #     real_shift=None,
--++++++#         #     padding_mask=None,
--++++++
--++++++#         #     head_num=self.num_heads,
--++++++#         #     attn_mask=global_attention_mask,
--++++++#         #     keep_prob=1.0 - self.attention_dropout,
--++++++#         #     scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++#         #     input_layout="BNSD",
--++++++#         #     pre_tokens=2147483647,
--++++++#         #     next_tokens=2147483647,
--++++++#         #     inner_precise=0,
--++++++#         #     drop_mask=None,
--++++++#         #     prefix=None,
--++++++#         #     actual_seq_qlen=None,
--++++++#         #     actual_seq_kvlen=None,
--++++++#         #     sparse_mode=sparse_mode,
--++++++#         # )
--++++++#         if not output_attentions:
--++++++#             attn_weights = None
--++++++
--++++++#         return attn_output, attn_weights, past_key_value
--++++++
--+++++ class Qwen2MoeAttention(nn.Module):
--+++++     """
--+++++-    Multi-headed attention from 'Attention Is All You Need' paper. Modified to use sliding window attention: Longformer
--+++++-    and "Generating Long Sequences with Sparse Transformers".
--+++++-    """
--++++++    一个融合了 Eager 和 Flash Attention 实现的统一注意力模块。
--+++++ 
--++++++    本模块在 `forward` 方法内部根据全局变量 `Long_Prompt` 的值进行动态调度：
--++++++    - if Long_Prompt == 2: 使用高精度 Flash Attention 路径，针对长序列进行优化。
--++++++    - else: 使用标准的 Eager Attention 路径，保证短序列和解码阶段的数值一致性。
--++++++
--++++++    这避免了在外部（如 DecoderLayer）进行复杂的对象实例化切换。
--++++++    """
--+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++         super().__init__()
--+++++         self.config = config
--+++++@@ -331,7 +531,7 @@ class Qwen2MoeAttention(nn.Module):
--+++++         if layer_idx is None:
--+++++             logger.warning_once(
--+++++                 f"Instantiating {self.__class__.__name__} without passing `layer_idx` is not recommended and will "
--+++++-                "to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--++++++                "lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` "
--+++++                 "when creating this class."
--+++++             )
--+++++ 
--+++++@@ -371,110 +571,86 @@ class Qwen2MoeAttention(nn.Module):
--+++++         use_cache: bool = False,
--+++++         cache_position: Optional[mindspore.Tensor] = None,
--+++++     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-
--+++++         
--+++++-
--++++++        # --- 1. 通用计算部分 (Projections, RoPE, KV Cache) ---
--+++++         bsz, q_len, _ = hidden_states.shape
--+++++ 
--+++++         query_states = self.q_proj(hidden_states)
--+++++         key_states = self.k_proj(hidden_states)
--+++++         value_states = self.v_proj(hidden_states)
--+++++ 
--+++++-        query_states = ops.transpose(query_states.view(bsz, q_len, self.num_heads, self.head_dim), 1, 2)
--+++++-        key_states = ops.transpose(key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++-        value_states = ops.transpose(value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim), 1, 2)
--+++++-
--++++++        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--++++++        
--+++++         kv_seq_len = key_states.shape[-2]
--+++++         if past_key_value is not None:
--+++++-            if self.layer_idx is None:
--+++++-                raise ValueError(
--+++++-                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++-                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-                    "with a layer index."
--+++++-                )
--+++++-            if isinstance(past_key_value, StaticCache):
--+++++-                kv_seq_len = key_states.shape[-2]
--+++++-            else:
--+++++-                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--++++++        
--+++++         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++ 
--+++++         if past_key_value is not None:
--+++++-            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--++++++            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--++++++
--++++++        # --- 2. 动态调度核心注意力计算 ---
--++++++        global Long_Prompt
--++++++        if Long_Prompt >= 1:
--++++++            # --- Flash Attention 路径 (高精度，用于长序列 prefill) ---
--++++++            fa_attention_mask = None
--++++++            if attention_mask is not None:
--++++++                mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--++++++                fa_attention_mask = (mask_slice != 0)
--++++++
--++++++            attn_output = mindspore.ops.flash_attention_score(
--++++++                query=query_states,
--++++++                key=key_states,
--++++++                value=value_states,
--++++++                head_num=self.num_heads,
--++++++                attn_mask=fa_attention_mask,
--++++++                keep_prob=1.0 - self.attention_dropout if self.training else 1.0,
--++++++                scalar_value=1.0 / math.sqrt(self.head_dim),
--++++++                input_layout="BNSD",
--++++++                sparse_mode=0,
--++++++                inner_precise=0  # 使用高精度模式以对齐 Eager 结果
--++++++            )
--+++++             
--+++++-            if isinstance(past_key_value, StaticCache):
--+++++-                kv_seq_len = key_states.shape[-2]
--++++++            attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--++++++            attn_output = self.o_proj(attn_output)
--++++++            attn_weights = None
--++++++            if output_attentions:
--++++++                logger.warning_once("Flash Attention path is used, but `output_attentions=True`. Flash Attention does not return attention weights.")
--+++++ 
--+++++-        # repeat k/v heads if n_kv_heads < n_heads
--+++++-        key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++-        value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++-        
--+++++-        attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--++++++        else:
--++++++            # --- Eager Attention 路径 (用于短序列和解码) ---
--++++++            key_states = repeat_kv(key_states, self.num_key_value_groups)
--++++++            value_states = repeat_kv(value_states, self.num_key_value_groups)
--++++++            
--++++++            attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++ 
--+++++-        if attention_mask is not None:
--+++++-            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++-            attn_weights = attn_weights + causal_mask
--++++++            if attention_mask is not None:
--++++++                causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--++++++                attn_weights = attn_weights + causal_mask
--+++++ 
--+++++-        # upcast attention to fp32
--+++++-        attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--+++++-        attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--+++++-        attn_output = ops.matmul(attn_weights, value_states)
--++++++            attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=mindspore.float32).to(query_states.dtype)
--++++++            attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
--++++++            attn_output = ops.matmul(attn_weights, value_states)
--+++++ 
--+++++-        if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--+++++-            raise ValueError(
--+++++-                f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
--+++++-                f" {attn_output.shape}"
--+++++-            )
--++++++            if attn_output.shape != (bsz, self.num_heads, q_len, self.head_dim):
--++++++                raise ValueError(
--++++++                    f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is {attn_output.shape}"
--++++++                )
--+++++ 
--+++++-        attn_output = ops.transpose(attn_output, 1, 2)
--+++++-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++++            attn_output = ops.transpose(attn_output, 1, 2)
--++++++            attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--++++++            attn_output = self.o_proj(attn_output)
--+++++ 
--+++++-        attn_output = self.o_proj(attn_output)
--+++++-        # @lwx
--++++++            if not output_attentions:
--++++++                attn_weights = None
--+++++         
--+++++-        # max_seq_len = self.max_position_embeddings  # 2048
--+++++-
--+++++-        # if attention_mask is not None:
--+++++-        #     # attention_mask: [B, 1, Sq, Sk]
--+++++-        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++-
--+++++-        #     # pad 到 [max_seq_len, max_seq_len]
--+++++-        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++-        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++-        #     global_attention_mask = padded_mask
--+++++-        # else:
--+++++-        #     global_attention_mask = None
--+++++-
--+++++-
--+++++-        # sparse_mode=3
--+++++-        # attn_output = mindspore.ops.flash_attention_score(
--+++++-        #     query=query_states,
--+++++-        #     key=key_states,
--+++++-        #     value=value_states,
--+++++-        #     real_shift=None,
--+++++-        #     padding_mask=None,
--+++++-
--+++++-        #     head_num=self.num_heads,
--+++++-        #     attn_mask=global_attention_mask,
--+++++-        #     keep_prob=1.0 - self.attention_dropout,
--+++++-        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++-        #     input_layout="BNSD",
--+++++-        #     pre_tokens=2147483647,
--+++++-        #     next_tokens=2147483647,
--+++++-        #     inner_precise=0,
--+++++-        #     drop_mask=None,
--+++++-        #     prefix=None,
--+++++-        #     actual_seq_qlen=None,
--+++++-        #     actual_seq_kvlen=None,
--+++++-        #     sparse_mode=sparse_mode,
--+++++-        # )
--+++++-        if not output_attentions:
--+++++-            attn_weights = None
--+++++-
--+++++         return attn_output, attn_weights, past_key_value
--+++++ 
--+++++-
--+++++ # class Qwen2MoeFlashAttention(nn.Module):
--+++++ #     """
--+++++ #     Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++@@ -899,578 +1075,6 @@ QWEN2MOE_ATTENTION_CLASSES = {
--+++++ #             return final_hidden_states, router_logits
--+++++ 
--+++++ 
--+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-#     """
--+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++-#     它包含一个顶层 `forward` 方法，根据输入序列的长度智能地分发到
--+++++-#     专门优化的 `_moe_infer_decode` (用于单 token 生成) 或 
--+++++-#     `_moe_infer_prefill` (用于长序列处理) 方法。
--+++++-#     """
--+++++-#     def __init__(self, config: Qwen2MoeConfig):
--+++++-#         super().__init__()
--+++++-#         self.num_experts = config.num_experts
--+++++-#         self.top_k = config.num_experts_per_tok
--+++++-#         self.norm_topk_prob = config.norm_topk_prob
--+++++-
--+++++-#         # 门控网络
--+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++-#         # 专家列表
--+++++-#         self.experts = nn.ModuleList(
--+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++-#         )
--+++++-#         # 共享专家
--+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_decode(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         【解码路径】针对 sequence_length=1 的极致优化。
--+++++-#         使用向量化操作处理一个批次 (batch) 的单 token 输入。
--+++++-#         """
--+++++-#         batch_size, hidden_dim = hidden_states.shape
--+++++-        
--+++++-#         expert_outputs_list = [
--+++++-#             ops.cat([
--+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++-#             ], dim=0) 
--+++++-#             for i in range(batch_size)
--+++++-#         ]
--+++++-        
--+++++-#         # --- 错误修复：将 axis=0 修改为 dim=0 ---
--+++++-#         # shape: (batch_size, top_k, hidden_dim)
--+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++-        
--+++++-#         # 使用批量矩阵乘法 (bmm) 高效完成加权求和
--+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++-        
--+++++-#         return moe_output.squeeze(1)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_prefill(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         【预填充路径】针对 sequence_length > 1 的优化。
--+++++-#         按专家对 Token 进行分组，并进行批处理。
--+++++-#         """
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens = hidden_states.shape[0]
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-        
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-        
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-        
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-            
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-#             selected_token_indices = token_indices[mask]
--+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++-            
--+++++-#             current_states = hidden_states[selected_token_indices]
--+++++-            
--+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-            
--+++++-#             moe_output = moe_output.index_add(
--+++++-#                 dim=0,
--+++++-#                 index=selected_token_indices,
--+++++-#                 source=expert_output.to(hidden_states.dtype)
--+++++-#             )
--+++++-#         return moe_output
--+++++-
--+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         顶层 forward 方法，作为智能分发器。
--+++++-#         """
--+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        
--+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-#         router_logits = self.gate(hidden_states_reshaped)
--+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-
--+++++-#         if self.norm_topk_prob:
--+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        
--+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-        
--+++++-#         moe_output = None
--+++++-#         # 在推理时，根据序列长度选择最优路径
--+++++-#         if not self.training:
--+++++-#             if sequence_length == 1:
--+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++++-#             else:
--+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++++-#         else:
--+++++-#             # 可以在此实现训练逻辑，如果暂时不需要，直接报错是安全的
--+++++-#             raise NotImplementedError("Training path is not implemented.")
--+++++-
--+++++-#         shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++-#         shared_expert_gate_output = self.shared_expert_gate(hidden_states_reshaped)
--+++++-#         shared_expert_weights = F.sigmoid(shared_expert_gate_output)
--+++++-        
--+++++-#         final_hidden_states = moe_output + shared_expert_output * shared_expert_weights
--+++++-        
--+++++-#         final_hidden_states = final_hidden_states.view(batch_size, sequence_length, hidden_dim)
--+++++-        
--+++++-#         return final_hidden_states, router_logits
--+++++-
--+++++-
--+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-#     """
--+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++-#     该版本修复了原始优化版本中因共享专家处理路径不统一而导致的计算结果不匹配问题。
--+++++-#     """
--+++++-#     def __init__(self, config: Qwen2MoeConfig):
--+++++-#         super().__init__()
--+++++-#         self.num_experts = config.num_experts
--+++++-#         self.top_k = config.num_experts_per_tok
--+++++-#         self.norm_topk_prob = config.norm_topk_prob
--+++++-
--+++++-#         # 门控网络
--+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++-#         # 专家列表
--+++++-#         self.experts = nn.ModuleList(
--+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++-#         )
--+++++-#         # 共享专家
--+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_decode(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         batch_size, _ = hidden_states.shape
--+++++-#         expert_outputs_list = [
--+++++-#             ops.cat([
--+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++-#             ], dim=0) 
--+++++-#             for i in range(batch_size)
--+++++-#         ]
--+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++-#         moe_output = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++-#         return moe_output.squeeze(1)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_prefill(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens = hidden_states.shape[0]
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-        
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-#             selected_token_indices = token_indices[mask]
--+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++-#             current_states = hidden_states[selected_token_indices]
--+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-#             moe_output = moe_output.index_add(
--+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++-#             )
--+++++-#         return moe_output
--+++++-
--+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         顶层 forward 方法，作为智能分发器。
--+++++-#         【修正版】确保共享专家的计算逻辑在所有路径中保持一致。
--+++++-#         """
--+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        
--+++++-#         # 1. 门控计算 (通用逻辑)
--+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-#         router_logits = self.gate(hidden_states_reshaped)
--+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-
--+++++-#         if self.norm_topk_prob:
--+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        
--+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-        
--+++++-#         # 2. 智能分发到最优 MoE 路径
--+++++-#         moe_output = None
--+++++-#         if not self.training:
--+++++-#             if sequence_length == 1:
--+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights)
--+++++-#             else:
--+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights)
--+++++-#         else:
--+++++-#             raise NotImplementedError("Training path is not implemented.")
--+++++-
--+++++-#         # 3. 【关键修正】统一在这里处理共享专家，确保逻辑一致
--+++++-#         #    共享专家和它的门控网络，都作用于 reshape 后的张量
--+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++-        
--+++++-#         # 4. 合并 MoE 输出和共享专家输出
--+++++-#         #    两个张量的 shape 都是 [num_tokens, hidden_dim]，直接相加
--+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++-        
--+++++-#         # 5. 恢复原始形状并返回
--+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++-        
--+++++-#         return final_hidden_states, router_logits
--+++++-
--+++++-# prefill fastest
--+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-#     """
--+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++-#     【最终修正版】：统一了 decode 和 prefill 路径的核心计算内核 (index_add)，
--+++++-#     以确保在所有情况下计算结果 100% 匹配，同时保留了路径分发带来的性能优势。
--+++++-#     """
--+++++-#     def __init__(self, config: Qwen2MoeConfig):
--+++++-#         super().__init__()
--+++++-#         self.num_experts = config.num_experts
--+++++-#         self.top_k = config.num_experts_per_tok
--+++++-#         self.norm_topk_prob = config.norm_topk_prob
--+++++-
--+++++-#         # 门控网络
--+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++-#         # 专家列表
--+++++-#         self.experts = nn.ModuleList(
--+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++-#         )
--+++++-#         # 共享专家
--+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_dispatch(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         【统一计算内核】：无论是 decode 还是 prefill，都使用与原始代码完全一致的 `index_add` 逻辑。
--+++++-#         这保证了浮点数运算的顺序和方式完全相同，从而确保结果的一致性。
--+++++-#         """
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens, _ = hidden_states.shape
--+++++-        
--+++++-#         # 将专家索引和权重展平，这对于 prefill 和 decode 都是通用的
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-#         flat_routing_weights = routing_weights.flatten()
--+++++-
--+++++-#         # 创建 token_idx 用于将计算结果映射回正确的 token 位置
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-
--+++++-#         # 找到所有被激活的专家（对于 decode 来说，这步开销极小）
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-        
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-            
--+++++-#             # 找到所有分配给该专家的 token
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-            
--+++++-#             # 使用 mask 选取对应的 token 和权重
--+++++-#             current_token_indices = token_indices[mask]
--+++++-#             current_routing_weights = flat_routing_weights[mask]
--+++++-#             current_hidden_states = hidden_states[current_token_indices]
--+++++-            
--+++++-#             # 对这些 token 进行批处理
--+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++++-            
--+++++-#             # 使用 index_add 将结果精确地加回到对应位置
--+++++-#             moe_output = moe_output.index_add(
--+++++-#                 dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++-#             )
--+++++-#         return moe_output
--+++++-
--+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         顶层 forward 方法，作为智能分发器。
--+++++-#         """
--+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        
--+++++-#         # 1. 门控计算
--+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-#         router_logits = self.gate(hidden_states_reshaped)
--+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-
--+++++-#         if self.norm_topk_prob:
--+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        
--+++++-#         routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-        
--+++++-#         # 2. 调用统一的 MoE 计算内核
--+++++-#         #    我们不再需要区分 decode 和 prefill，因为这个函数对两者都高效且正确
--+++++-#         moe_output = self._moe_infer_dispatch(hidden_states_reshaped, selected_experts, routing_weights)
--+++++-
--+++++-#         # 3. 统一处理共享专家
--+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++-        
--+++++-#         # 4. 合并输出
--+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++-        
--+++++-#         # 5. 恢复原始形状并返回
--+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++-        
--+++++-#         return final_hidden_states, router_logits
--+++++-
--+++++-
--+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-#     """
--+++++-#     一个混合专家模块 (MoE block)，其结构模仿了 DeepseekMoE 的高效推理封装。
--+++++-#     【最终高性能与高精度版】：
--+++++-#     1. 解码路径使用 bmm 算子以达到最大推理速度。
--+++++-#     2. 在 bmm 计算前，强制将输入提升到 float32 进行高精度累加，以消除
--+++++-#        因并行计算顺序差异导致的浮点数误差，确保结果与串行逻辑一致。
--+++++-#     3. 这样实现了速度和准确性的两全其美。
--+++++-#     """
--+++++-#     def __init__(self, config: Qwen2MoeConfig):
--+++++-#         super().__init__()
--+++++-#         self.num_experts = config.num_experts
--+++++-#         self.top_k = config.num_experts_per_tok
--+++++-#         self.norm_topk_prob = config.norm_topk_prob
--+++++-
--+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++-#         self.experts = nn.ModuleList(
--+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++-#         )
--+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_decode(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         【解码路径】极致优化版：bmm + 高精度累加。
--+++++-#         """
--+++++-#         original_dtype = hidden_states.dtype
--+++++-#         batch_size, _ = hidden_states.shape
--+++++-        
--+++++-#         expert_outputs_list = [
--+++++-#             ops.cat([
--+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++-#             ], dim=0) 
--+++++-#             for i in range(batch_size)
--+++++-#         ]
--+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++-
--+++++-#         # 在 float32 下执行 bmm，得到高精度结果
--+++++-#         moe_output_fp32 = ops.bmm(routing_weights.unsqueeze(1), expert_outputs_stacked)
--+++++-        
--+++++-#         # 将高精度结果转换回原始数据类型
--+++++-#         moe_output = moe_output_fp32.squeeze(1).to(original_dtype)
--+++++-        
--+++++-#         return moe_output
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_prefill(
--+++++-#         self, 
--+++++-#         hidden_states: mindspore.Tensor, 
--+++++-#         selected_experts: mindspore.Tensor, 
--+++++-#         routing_weights: mindspore.Tensor
--+++++-#     ) -> mindspore.Tensor:
--+++++-#         """
--+++++-#         【预填充路径】与原始实现一致，结果精确。
--+++++-#         """
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens, _ = hidden_states.shape
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-        
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-#             selected_token_indices = token_indices[mask]
--+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++-#             current_states = hidden_states[selected_token_indices]
--+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-#             moe_output = moe_output.index_add(
--+++++-#                 dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype)
--+++++-#             )
--+++++-#         return moe_output
--+++++-
--+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-        
--+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-#         router_logits = self.gate(hidden_states_reshaped)
--+++++-#         routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-#         routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-
--+++++-#         if self.norm_topk_prob:
--+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        
--+++++-#         # 注意：这里我们保留 routing_weights 为 float32，因为它在 decode 路径中需要高精度
--+++++-#         # 如果模型主体是 float16，后续再转换
--+++++-        
--+++++-#         moe_output = None
--+++++-#         if not self.training:
--+++++-#             # 传递给 decode 的 routing_weights 是 fp32，而 hidden_states 是原始类型
--+++++-#             # _moe_infer_decode 内部会处理好类型转换
--+++++-#             temp_routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-#             if sequence_length == 1:
--+++++-#                 moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++++-#             else:
--+++++-#                 moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, temp_routing_weights)
--+++++-#         else:
--+++++-#             raise NotImplementedError("Training path is not implemented.")
--+++++-
--+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++-        
--+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++-        
--+++++-#         return final_hidden_states, router_logits
--+++++-    
--+++++-
--+++++-# class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-#     """
--+++++-#     【融合版】一个混合专家模块，内置两种推理策略，
--+++++-#     由外部全局变量 `Long_Prompt` 控制：
--+++++-
--+++++-#     - if Long_Prompt is True:  【精度优先模式】
--+++++-#       采用统一的 index_add 内核，保证在任何情况下结果都 100% 匹配。
--+++++-#       适用于处理长序列，避免误差累积。
--+++++-
--+++++-#     - if Long_Prompt is False: 【速度优先模式】
--+++++-#       智能分发到 prefill(index_add) 和 decode(bmm+fp32) 路径，
--+++++-#       在解码阶段获得极致速度，同时保证结果高度准确。
--+++++-#     """
--+++++-#     def __init__(self, config: Qwen2MoeConfig):
--+++++-#         super().__init__()
--+++++-#         self.num_experts = config.num_experts
--+++++-#         self.top_k = config.num_experts_per_tok
--+++++-#         self.norm_topk_prob = config.norm_topk_prob
--+++++-
--+++++-#         self.gate = nn.Linear(config.hidden_size, config.num_experts, bias=False)
--+++++-#         self.experts = nn.ModuleList(
--+++++-#             [Qwen2MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(self.num_experts)]
--+++++-#         )
--+++++-#         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-#         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++-
--+++++-#     # --- 速度优先模式的辅助函数 ---
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_decode(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++-#         original_dtype = hidden_states.dtype
--+++++-#         batch_size, _ = hidden_states.shape
--+++++-#         expert_outputs_list = [
--+++++-#             ops.cat([
--+++++-#                 self.experts[expert_idx.item()](hidden_states[i:i+1]) for expert_idx in selected_experts[i]
--+++++-#             ], dim=0) 
--+++++-#             for i in range(batch_size)
--+++++-#         ]
--+++++-#         expert_outputs_stacked = ops.stack(expert_outputs_list, dim=0)
--+++++-#         weights_fp32 = routing_weights.to(mindspore.float32)
--+++++-#         outputs_fp32 = expert_outputs_stacked.to(mindspore.float32)
--+++++-#         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++++-#         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++++-
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_prefill(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens, _ = hidden_states.shape
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-#             selected_token_indices = token_indices[mask]
--+++++-#             selected_routing_weights = routing_weights.flatten()[mask]
--+++++-#             current_states = hidden_states[selected_token_indices]
--+++++-#             expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-#             moe_output = moe_output.index_add(dim=0, index=selected_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++-#         return moe_output
--+++++-
--+++++-#     # --- 精度优先模式的辅助函数 ---
--+++++-#     @no_grad()
--+++++-#     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++-#         moe_output = ops.zeros_like(hidden_states)
--+++++-#         num_tokens, _ = hidden_states.shape
--+++++-#         flat_selected_experts = selected_experts.flatten()
--+++++-#         flat_routing_weights = routing_weights.flatten()
--+++++-#         token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1).broadcast_to((-1, self.top_k)).flatten()
--+++++-#         active_experts = ops.unique(flat_selected_experts)
--+++++-#         for expert_idx_tensor in active_experts:
--+++++-#             expert_idx = expert_idx_tensor.item()
--+++++-#             expert_layer = self.experts[expert_idx]
--+++++-#             mask = (flat_selected_experts == expert_idx_tensor)
--+++++-#             current_token_indices = token_indices[mask]
--+++++-#             current_routing_weights = flat_routing_weights[mask]
--+++++-#             current_hidden_states = hidden_states[current_token_indices]
--+++++-#             expert_output = expert_layer(current_hidden_states) * current_routing_weights.unsqueeze(1)
--+++++-#             moe_output = moe_output.index_add(dim=0, index=current_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++-#         return moe_output
--+++++-
--+++++-#     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++-#         # 声明我们将要使用一个在模块外部定义的全局变量
--+++++-#         # 这是一个简单的实现方式，更复杂的工程中可能会使用配置对象传递
--+++++-#         global Long_Prompt
--+++++-
--+++++-#         # 1. 门控计算 (所有模式通用)
--+++++-#         batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-#         hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-#         router_logits = self.gate(hidden_states_reshaped)
--+++++-#         routing_weights_fp32 = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-#         routing_weights, selected_experts = ops.topk(routing_weights_fp32, self.top_k, dim=-1)
--+++++-#         if self.norm_topk_prob:
--+++++-#             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-        
--+++++-#         moe_output = None
--+++++-#         if not self.training:
--+++++-#             # 根据 Long_Prompt 标志选择模式
--+++++-#             if Long_Prompt:
--+++++-#                 # --- 精度优先模式 ---
--+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++-#                 moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++-#             else:
--+++++-#                 # --- 速度优先模式 ---
--+++++-#                 routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++-#                 if sequence_length == 1:
--+++++-#                     moe_output = self._moe_infer_decode(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++-#                 else:
--+++++-#                     moe_output = self._moe_infer_prefill(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++-#         else:
--+++++-#             raise NotImplementedError("Training path is not implemented.")
--+++++-
--+++++-#         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++-#                                      F.sigmoid(self.shared_expert_gate(hidden_states_reshaped))
--+++++-        
--+++++-#         final_hidden_states_reshaped = moe_output + gated_shared_expert_output
--+++++-#         final_hidden_states = final_hidden_states_reshaped.view(batch_size, sequence_length, hidden_dim)
--+++++-        
--+++++-#         return final_hidden_states, router_logits
--+++++-    
--+++++ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++     """
--+++++     【最终融合版】一个混合专家模块，内置两种由外部全局变量 `Long_Prompt` 
--+++++@@ -1515,29 +1119,71 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++         moe_output_fp32 = ops.bmm(weights_fp32.unsqueeze(1), outputs_fp32)
--+++++         return moe_output_fp32.squeeze(1).to(original_dtype)
--+++++ 
--++++++    # @no_grad()
--++++++    # def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--++++++    #     num_tokens, _ = hidden_states.shape
--++++++    #     flat_selected_experts = selected_experts.flatten()
--++++++    #     sorted_expert_indices = flat_selected_experts.argsort()
--++++++    #     tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--++++++    #     original_token_indices = sorted_expert_indices // self.top_k
--++++++    #     moe_output = ops.zeros_like(hidden_states)
--++++++    #     current_token_offset = 0
--++++++    #     for i in range(self.num_experts):
--++++++    #         expert_token_count = tokens_per_expert[i] - current_token_offset
--++++++    #         if expert_token_count == 0:
--++++++    #             continue
--++++++    #         end_offset = current_token_offset + expert_token_count
--++++++    #         expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--++++++    #         expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--++++++    #         expert_hidden_states = hidden_states[expert_original_token_indices]
--++++++    #         expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--++++++    #         expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--++++++    #         moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--++++++    #         current_token_offset += expert_token_count
--++++++    #     return moe_output
--++++++
--+++++     @no_grad()
--+++++     def _moe_infer_prefill_fast_deepspeed_style(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++-        num_tokens, _ = hidden_states.shape
--+++++-        flat_selected_experts = selected_experts.flatten()
--+++++-        sorted_expert_indices = flat_selected_experts.argsort()
--+++++-        tokens_per_expert = flat_selected_experts.bincount(minlength=self.num_experts).cumsum(0)
--+++++-        original_token_indices = sorted_expert_indices // self.top_k
--++++++        """
--++++++        优化版 MoE prefill (速度优先模式)：
--++++++        - 批量张量化处理同一个 expert 的所有 token
--++++++        - 跳过无 token 的专家
--++++++        - 保持结果完全一致
--++++++        """
--+++++         moe_output = ops.zeros_like(hidden_states)
--+++++-        current_token_offset = 0
--+++++-        for i in range(self.num_experts):
--+++++-            expert_token_count = tokens_per_expert[i] - current_token_offset
--+++++-            if expert_token_count == 0:
--+++++-                continue
--+++++-            end_offset = current_token_offset + expert_token_count
--+++++-            expert_original_token_indices = original_token_indices[current_token_offset:end_offset]
--+++++-            expert_sorted_indices = sorted_expert_indices[current_token_offset:end_offset]
--+++++-            expert_hidden_states = hidden_states[expert_original_token_indices]
--+++++-            expert_routing_weights = routing_weights.flatten()[expert_sorted_indices]
--+++++-            expert_output = self.experts[i](expert_hidden_states) * expert_routing_weights.unsqueeze(1)
--+++++-            moe_output = moe_output.index_add(dim=0, index=expert_original_token_indices, source=expert_output.to(hidden_states.dtype))
--+++++-            current_token_offset += expert_token_count
--++++++
--++++++        flat_selected_experts = selected_experts.flatten()
--++++++        flat_routing_weights = routing_weights.flatten()
--++++++
--++++++        idxs = flat_selected_experts.argsort()
--++++++        sorted_expert_indices = flat_selected_experts[idxs]
--++++++        sorted_token_indices = idxs // self.top_k
--++++++
--++++++        tokens_per_expert = sorted_expert_indices.bincount(minlength=self.num_experts)
--++++++
--++++++        active_experts = (tokens_per_expert > 0).nonzero(as_tuple=False).flatten()
--++++++
--++++++        for expert_id in active_experts.tolist():
--++++++            start = int(tokens_per_expert[:expert_id].sum().item())
--++++++            end = start + int(tokens_per_expert[expert_id].item())
--++++++
--++++++            token_idx = sorted_token_indices[start:end]
--++++++            expert_tokens = hidden_states[token_idx]
--++++++
--++++++            expert_out = self.experts[expert_id](expert_tokens)
--++++++
--++++++            scaled_out = expert_out * flat_routing_weights[idxs[start:end]].unsqueeze(1)
--++++++
--++++++            moe_output = mindspore.mint.scatter_add(
--++++++                moe_output,
--++++++                0,
--++++++                token_idx.view(-1, 1).tile((1, hidden_states.shape[-1])),
--++++++                scaled_out.to(hidden_states.dtype)
--++++++            )
--++++++
--+++++         return moe_output
--+++++ 
--++++++
--+++++     # --- 精度优先模式 (ACCURACY MODE) 的辅助函数 ---
--+++++     @no_grad()
--+++++     def _moe_infer_dispatch_accurate(self, hidden_states, selected_experts, routing_weights) -> mindspore.Tensor:
--+++++@@ -1571,18 +1217,24 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++             routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++         
--+++++         moe_output = None
--+++++-        if Long_Prompt:
--+++++-            # --- 精度优先模式 (ACCURACY MODE) ---
--+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++-            moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        # if Long_Prompt==0:
--++++++        #     # --- 精度优先模式 (ACCURACY MODE) ---
--++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++        #     moe_output = self._moe_infer_dispatch_accurate(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        # else:
--++++++        #     # --- 速度优先模式 (SPEED MODE) ---
--++++++        #     routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++        #     if sequence_length == 1:
--++++++        #         moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        #     else:
--++++++        #         moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++        
--++++++        routing_weights_casted = routing_weights.to(hidden_states.dtype)
--++++++        if sequence_length == 1:
--++++++            moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++         else:
--+++++-            # --- 速度优先模式 (SPEED MODE) ---
--+++++-            routing_weights_casted = routing_weights.to(hidden_states.dtype)
--+++++-            if sequence_length == 1:
--+++++-                moe_output = self._moe_infer_decode_fast(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++-            else:
--+++++-                moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--+++++-        
--++++++            moe_output = self._moe_infer_prefill_fast_deepspeed_style(hidden_states_reshaped, selected_experts, routing_weights_casted)
--++++++    
--+++++ 
--+++++         # 3. 共享专家计算与合并 (所有模式通用)
--+++++         gated_shared_expert_output = self.shared_expert(hidden_states_reshaped) * \
--+++++@@ -1593,15 +1245,16 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++         
--+++++         return final_hidden_states, router_logits
--+++++ 
--++++++
--+++++ class Qwen2MoeDecoderLayer(nn.Module):
--+++++     def __init__(self, config: Qwen2MoeConfig, layer_idx: int):
--+++++         super().__init__()
--+++++         self.hidden_size = config.hidden_size
--+++++         
--+++++-        # if Long_Prompt:
--+++++-        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++-        # else:
--++++++        # if Long_Prompt == 2:
--+++++         #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--++++++        # else:
--++++++        #     self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++ 
--+++++         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++ 
--+++++@@ -1904,7 +1557,17 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++++             )
--+++++ 
--+++++         # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
--+++++-        causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++++        # causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++++        #     attention_mask,
--++++++        #     sequence_length=sequence_length,
--++++++        #     target_length=target_length,
--++++++        #     dtype=dtype,
--++++++        #     min_dtype=min_dtype,
--++++++        #     cache_position=cache_position,
--++++++        #     batch_size=input_tensor.shape[0],
--++++++        # )
--++++++        #@dwj
--++++++        causal_mask = get_cached_causal_mask_with_cache_position(
--+++++             attention_mask,
--+++++             sequence_length=sequence_length,
--+++++             target_length=target_length,
--+++++@@ -2091,7 +1754,8 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++         重写 generate 方法，将其作为设置 MoE 策略的唯一入口。
--+++++         这个方法是所有生成任务的“前门”，保证逻辑一定会被执行。
--+++++         """
--+++++-        global Long_Prompt, PROMPT_LENGTH_THRESHOLD
--++++++        global Long_Prompt, PROMPT_LENGTH_THRESHOLD,_causal_mask_cache
--++++++        _causal_mask_cache.clear()
--+++++ 
--+++++         input_ids = kwargs.get("input_ids")
--+++++         if input_ids is None and args:
--+++++@@ -2099,11 +1763,13 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++ 
--+++++         if input_ids is not None:
--+++++             prompt_length = input_ids.shape[1]
--+++++-            
--+++++-            if prompt_length > PROMPT_LENGTH_THRESHOLD:
--+++++-                Long_Prompt = True
--++++++            if prompt_length > LONG_PROMPT_LENGTH_THRESHOLD:
--++++++                Long_Prompt = 2
--++++++            elif prompt_length < SHORT_PROMPT_LENGTH_THRESHOLD:
--++++++                Long_Prompt = 0
--+++++             else:
--+++++-                Long_Prompt = False
--++++++                Long_Prompt = 1
--++++++
--+++++ 
--+++++         return super().generate(*args, **kwargs)
--+++++     
--+++++@@ -2154,7 +1820,18 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++             dtype = self.lm_head.weight.dtype
--+++++             min_dtype = float(ops.finfo(dtype).min)
--+++++ 
--+++++-            attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++++            # attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
--++++++            #     attention_mask,
--++++++            #     sequence_length=sequence_length,
--++++++            #     target_length=past_key_values.get_max_length(),
--++++++            #     dtype=dtype,
--++++++            #     min_dtype=min_dtype,
--++++++            #     cache_position=cache_position,
--++++++            #     batch_size=batch_size,
--++++++            # )
--++++++
--++++++            #@dwj
--++++++            attention_mask = get_cached_causal_mask_with_cache_position(
--+++++                 attention_mask,
--+++++                 sequence_length=sequence_length,
--+++++                 target_length=past_key_values.get_max_length(),
--+++++diff --git a/patches/0001-20251104commit.patch b/patches/0001-20251104commit.patch
--+++++deleted file mode 100644
--+++++index 6dfb5b93..00000000
--+++++--- a/patches/0001-20251104commit.patch
--++++++++ /dev/null
--+++++@@ -1,1272 +0,0 @@
--+++++-From 1c7cdda5edcc67eb81880aeaa24f98aa46012c01 Mon Sep 17 00:00:00 2001
--+++++-From: Pinoeer-kingxi <13022943007@163.com>
--+++++-Date: Tue, 4 Nov 2025 09:11:51 +0800
--+++++-Subject: [PATCH] 20251104commit
--+++++-
--+++++----
--+++++- mindnlp/transformers/cache_utils.py           |  28 +-
--+++++- .../models/deepseek/modeling_deepseek.py      | 149 ++-
--+++++- .../models/qwen2_moe/modeling_qwen2_moe.py    | 886 ++++++++++++++++--
--+++++- 3 files changed, 976 insertions(+), 87 deletions(-)
--+++++-
--+++++-diff --git a/mindnlp/transformers/cache_utils.py b/mindnlp/transformers/cache_utils.py
--+++++-index cadd2e04..02f8d4be 100644
--+++++---- a/mindnlp/transformers/cache_utils.py
--+++++-+++ b/mindnlp/transformers/cache_utils.py
--+++++-@@ -812,14 +812,26 @@ class StaticCache(Cache):
--+++++-             # # The operator 'aten::index_copy.out' is not currently implemented for the MPS device.
--+++++-             # k_out[:, :, cache_position] = key_states
--+++++-             # v_out[:, :, cache_position] = value_states
--+++++--            if ON_ORANGE_PI:
--+++++--                k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++--                v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++--            else:
--+++++--                # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++--                k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++--                v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++--
--+++++-+            # if ON_ORANGE_PI:
--+++++-+            #     k_out = ops.inplace_index_add(k_out, 2, cache_position.int(), key_states)
--+++++-+            #     v_out = ops.inplace_index_add(v_out, 2, cache_position.int(), value_states)
--+++++-+            # else:
--+++++-+            #     # use index_add for mindspore since tensor slice is too slow and no implementation of index_copy
--+++++-+            #     k_out = ops.index_add(k_out, 2, cache_position.int(), key_states)
--+++++-+            #     v_out = ops.index_add(v_out, 2, cache_position.int(), value_states)
--+++++-+            # 确保 cache_position 是 1D tensor 并且类型正确
--+++++-+            # 根据官方文档: indices 必须是 1D tensor，且 indices.shape[0] == y.shape[axis]
--+++++-+            if cache_position.ndim > 1:
--+++++-+                cache_position = cache_position.flatten()
--+++++-+            # 确保类型是 int32 或 int64（MindSpore 要求）
--+++++-+            if cache_position.dtype not in (mindspore.int32, mindspore.int64):
--+++++-+                cache_position = cache_position.int()
--+++++-+            
--+++++-+            # JIT 编译不支持 try-except，直接使用切片赋值（更简单且兼容 JIT）
--+++++-+            # 切片赋值对于 StaticCache 是安全的，因为 cache_position 是预分配的索引
--+++++-+            k_out[:, :, cache_position] = key_states
--+++++-+            v_out[:, :, cache_position] = value_states
--+++++-+            
--+++++-         return k_out, v_out
--+++++- 
--+++++-     def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
--+++++-diff --git a/mindnlp/transformers/models/deepseek/modeling_deepseek.py b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++-index c695b944..d8303e45 100644
--+++++---- a/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++-+++ b/mindnlp/transformers/models/deepseek/modeling_deepseek.py
--+++++-@@ -210,8 +210,10 @@ class DeepseekDynamicNTKScalingRotaryEmbedding(DeepseekRotaryEmbedding):
--+++++- # Copied from transformers.models.llama.modeling_llama.rotate_half
--+++++- def rotate_half(x):
--+++++-     """Rotates half the hidden dims of the input."""
--+++++--    x1 = x[..., : x.shape[-1] // 2]
--+++++--    x2 = x[..., x.shape[-1] // 2 :]
--+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++-+    # x1 = x[..., : x.shape[-1] // 2]
--+++++-+    # x2 = x[..., x.shape[-1] // 2 :]
--+++++-+    x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++-     return ops.cat((-x2, x1), dim=-1)
--+++++- 
--+++++- 
--+++++-@@ -385,32 +387,42 @@ class DeepseekMoE(nn.Module):
--+++++-         if self.training:
--+++++-             raise NotImplementedError("Training is not supported yet.")
--+++++-         else:
--+++++--            y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++--        if self.config.n_shared_experts is not None:
--+++++--            y = y + self.shared_experts(identity)
--+++++--        return y
--+++++-+            # @lwx
--+++++-+            if orig_shape[1] == 1:
--+++++-+                y=self.moe_infer_decode(hidden_states,flat_topk_idx,topk_weight.view(-1, 1))
--+++++-+                y=y.view(*orig_shape)
--+++++-+                if self.config.n_shared_experts is not None:
--+++++-+                    y = y + self.shared_experts(identity)
--+++++-+                return y
--+++++-+            else:
--+++++-+                y= self.moe_infer_prefill(hidden_states,flat_topk_idx,topk_weight.view(-1, 1)).view(*orig_shape)
--+++++-+                if self.config.n_shared_experts is not None:
--+++++-+                    y = y + self.shared_experts(identity)
--+++++-+                return y
--+++++-+            # y = self.moe_infer(hidden_states, flat_topk_idx, topk_weight.view(-1, 1)).view(*orig_shape)
--+++++-+        # if self.config.n_shared_experts is not None:
--+++++-+        #     y = y + self.shared_experts(identity)
--+++++-+        # return y
--+++++-+        
--+++++-+    @no_grad()
--+++++-+    def moe_infer_decode(self, x, flat_expert_indices, flat_expert_weights):
--+++++-+
--+++++-+        expert_cache = ops.zeros_like(x)
--+++++-+        for i in range(self.num_experts_per_tok):
--+++++-+            expert_id = flat_expert_indices[i].item()
--+++++-+            weight = flat_expert_weights[i].item()
--+++++-+            expert = self.experts[expert_id]
--+++++-+            expert_out = expert(x)
--+++++-+            expert_cache += expert_out * weight
--+++++-+        return expert_cache
--+++++- 
--+++++-     @no_grad()
--+++++--    def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++--        # expert_cache = torch.zeros_like(x)
--+++++--        # idxs = flat_expert_indices.argsort()
--+++++--        # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++--        # token_idxs = idxs // self.num_experts_per_tok
--+++++--        # for i, end_idx in enumerate(tokens_per_expert):
--+++++--        #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++--        #     if start_idx == end_idx:
--+++++--        #         continue
--+++++--        #     expert = self.experts[i]
--+++++--        #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++--        #     expert_tokens = x[exp_token_idx]
--+++++--        #     expert_out = expert(expert_tokens)
--+++++--        #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++--        #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++--        # return expert_cache
--+++++-+    def moe_infer_prefill(self, x, flat_expert_indices, flat_expert_weights):
--+++++-         expert_cache = ops.zeros_like(x)
--+++++-         idxs = flat_expert_indices.argsort()
--+++++-         tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++-         token_idxs = idxs // self.num_experts_per_tok
--+++++-+
--+++++-         for i, end_idx in enumerate(tokens_per_expert):
--+++++-             start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++-             if start_idx == end_idx:
--+++++-@@ -421,7 +433,76 @@ class DeepseekMoE(nn.Module):
--+++++-             expert_out = expert(expert_tokens)
--+++++-             expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-             expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++-+
--+++++-         return expert_cache
--+++++-+        
--+++++-+    # @no_grad()
--+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-+    #     # expert_cache = torch.zeros_like(x)
--+++++-+    #     # idxs = flat_expert_indices.argsort()
--+++++-+    #     # tokens_per_expert = flat_expert_indices.bincount().cpu().numpy().cumsum(0)
--+++++-+    #     # token_idxs = idxs // self.num_experts_per_tok
--+++++-+    #     # for i, end_idx in enumerate(tokens_per_expert):
--+++++-+    #     #     start_idx = 0 if i == 0 else tokens_per_expert[i - 1]
--+++++-+    #     #     if start_idx == end_idx:
--+++++-+    #     #         continue
--+++++-+    #     #     expert = self.experts[i]
--+++++-+    #     #     exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-+    #     #     expert_tokens = x[exp_token_idx]
--+++++-+    #     #     expert_out = expert(expert_tokens)
--+++++-+    #     #     expert_out.mul_(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-+    #     #     expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce='sum')
--+++++-+    #     # return expert_cache
--+++++-+    #     expert_cache = ops.zeros_like(x)
--+++++-+    #     idxs = flat_expert_indices.argsort()
--+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
--+++++-+
--+++++-+    #     for i, end_idx in enumerate(tokens_per_expert):
--+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++-+    #         if start_idx == end_idx:
--+++++-+    #             continue
--+++++-+    #         expert = self.experts[i]
--+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-+    #         expert_tokens = x[exp_token_idx]
--+++++-+    #         expert_out = expert(expert_tokens)
--+++++-+    #         expert_out = expert_out.mul(flat_expert_weights[idxs[start_idx:end_idx]])
--+++++-+    #         expert_cache = mindspore.mint.scatter_add(expert_cache, 0, exp_token_idx.view(-1, 1).tile((1, x.shape[-1])), expert_out)
--+++++-+
--+++++-+    #     return expert_cache
--+++++-+    # @no_grad()
--+++++-+    # def moe_infer(self, x, flat_expert_indices, flat_expert_weights):
--+++++-+    #     expert_cache = ops.zeros_like(x)
--+++++-+
--+++++-+    #     # 排序保证顺序一致
--+++++-+    #     idxs = flat_expert_indices.argsort()
--+++++-+    #     tokens_per_expert = flat_expert_indices.bincount().cumsum(0)
--+++++-+    #     token_idxs = idxs // self.num_experts_per_tok
--+++++-+
--+++++-+    #     # 找出有 token 的专家
--+++++-+    #     active_experts = (tokens_per_expert > ops.cat((ops.zeros(1, dtype=tokens_per_expert.dtype), tokens_per_expert[:-1]))).nonzero().squeeze(-1)
--+++++-+
--+++++-+    #     for i in active_experts.tolist():
--+++++-+    #         start_idx = 0 if i == 0 else tokens_per_expert[i-1]
--+++++-+    #         end_idx = tokens_per_expert[i]
--+++++-+    #         if start_idx == end_idx:  # 没有 token
--+++++-+    #             continue
--+++++-+
--+++++-+    #         exp_token_idx = token_idxs[start_idx:end_idx]
--+++++-+    #         expert_tokens = x[exp_token_idx]
--+++++-+    #         expert_out = self.experts[i](expert_tokens)
--+++++-+    #         expert_out = expert_out * flat_expert_weights[idxs[start_idx:end_idx]]
--+++++-+
--+++++-+    #         expert_cache = mindspore.mint.scatter_add(
--+++++-+    #             expert_cache,
--+++++-+    #             0,
--+++++-+    #             exp_token_idx.view(-1, 1).tile((1, x.shape[-1])),
--+++++-+    #             expert_out
--+++++-+    #         )
--+++++-+
--+++++-+    #     return expert_cache
--+++++-+
--+++++-+
--+++++- 
--+++++- # class AddAuxiliaryLoss(mindnlp.core.autograd.Function):
--+++++- #     """
--+++++-@@ -1103,6 +1184,26 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++- 
--+++++-         # Initialize weights and apply final processing
--+++++-         self.post_init()
--+++++-+        self.warm_up = False
--+++++-+
--+++++-+    def warmup_moe_model_deep(self):
--+++++-+        print("[Warmup] DeepSeek-MoE 模型预热开始...")
--+++++-+        test_texts = [
--+++++-+            "warmup short",
--+++++-+            "This is a medium length warmup sentence for MoE experts. middle middle middle",
--+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover different attention paths. very very long, very very long, very very long"
--+++++-+        ]
--+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++-+        if tokenizer is None:
--+++++-+            from mindnlp.transformers import AutoTokenizer
--+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++-+            self._warmup_tokenizer = tokenizer
--+++++-+
--+++++-+        for text in test_texts:
--+++++-+            inputs = tokenizer(text, return_tensors="ms")
--+++++-+            with mindspore._no_grad():
--+++++-+                _ = self(**inputs, use_cache=False)
--+++++-+        print("[Warmup] DeepSeek-MoE 模型预热完成。")
--+++++- 
--+++++-     def get_input_embeddings(self):
--+++++-         return self.model.embed_tokens
--+++++-@@ -1161,6 +1262,10 @@ class DeepseekForCausalLM(DeepseekPreTrainedModel):
--+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++-         ```"""
--+++++-+        if not self.warm_up:
--+++++-+            self.warm_up = True
--+++++-+            self.warmup_moe_model_deep()
--+++++-+
--+++++-         output_attentions = (
--+++++-             output_attentions
--+++++-             if output_attentions is not None
--+++++-diff --git a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++-index 3cbf820e..d4c6b651 100644
--+++++---- a/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++-+++ b/mindnlp/transformers/models/qwen2_moe/modeling_qwen2_moe.py
--+++++-@@ -18,7 +18,6 @@
--+++++- # See the License for the specific language governing permissions and
--+++++- # limitations under the License.
--+++++- """MindSpore Qwen2MoE model."""
--+++++--
--+++++- import math
--+++++- from typing import List, Optional, Tuple, Union
--+++++- 
--+++++-@@ -36,6 +35,7 @@ from ...modeling_outputs import (
--+++++-     TokenClassifierOutput,
--+++++- )
--+++++- from ...modeling_utils import PreTrainedModel
--+++++-+from ...generation import GenerationMixin
--+++++- from ....utils import logging
--+++++- from .configuration_qwen2_moe import Qwen2MoeConfig
--+++++- 
--+++++-@@ -182,6 +182,11 @@ class Qwen2MoeRMSNorm(nn.Module):
--+++++-         self.variance_epsilon = eps
--+++++- 
--+++++-     def forward(self, hidden_states):
--+++++-+        # @dwj
--+++++-+        # return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++-+        # @lwx
--+++++-+        # if not self.training :
--+++++-+        #     return F.rms_norm(hidden_states, self.weight, self.variance_epsilon)
--+++++-         input_dtype = hidden_states.dtype
--+++++-         hidden_states = hidden_states.to(mindspore.float32)
--+++++-         variance = ops.mean(hidden_states.pow(2), -1, keepdim=True)
--+++++-@@ -234,6 +239,8 @@ def rotate_half(x):
--+++++-     """Rotates half the hidden dims of the input."""
--+++++-     x1 = x[..., : x.shape[-1] // 2]
--+++++-     x2 = x[..., x.shape[-1] // 2 :]
--+++++-+    # @lwx_note: 这里使用 ops.split 代替 x[..., : x.shape[-1] // 2] 和 x[..., x.shape[-1] // 2 :]
--+++++-+    # x1,x2 = ops.split( x, x.shape[-1] // 2, dim=-1 )
--+++++-     return ops.cat((-x2, x1), dim=-1)
--+++++- 
--+++++- 
--+++++-@@ -273,15 +280,28 @@ class Qwen2MoeMLP(nn.Module):
--+++++-         self.config = config
--+++++-         self.hidden_size = config.hidden_size
--+++++-         self.intermediate_size = intermediate_size
--+++++-+        
--+++++-         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++-         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
--+++++-         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
--+++++-         self.act_fn = ACT2FN[config.hidden_act]
--+++++- 
--+++++-     def forward(self, x):
--+++++--        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++--
--+++++- 
--+++++-+        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
--+++++-+        # @lwx 
--+++++-+        # gate_up_output = self.gate_up_proj(x)
--+++++-+        # swiglu_output = mindspore.ops.swiglu(gate_up_output)
--+++++-+        # return self.down_proj(swiglu_output)
--+++++-+
--+++++-+    # def forward(self, x):
--+++++-+    #     gate_proj_out = self.gate_proj(x)
--+++++-+    #     up_proj_out = self.up_proj(x)
--+++++-+    #     # 拼接，形状变 (batch, seq_len, intermediate_size * 2)
--+++++-+    #     # gate_up_out = mindspore.ops.cat([gate_proj_out.astype(x.dtype), up_proj_out.astype(x.dtype)],-1)
--+++++-+    #     swiglu_out = mindspore.ops.silu(gate_proj_out) * up_proj_out
--+++++-+    #     return self.down_proj(swiglu_out)
--+++++-+        
--+++++- # Copied from transformers.models.llama.modeling_llama.repeat_kv
--+++++- def repeat_kv(hidden_states: mindspore.Tensor, n_rep: int) -> mindspore.Tensor:
--+++++-     """
--+++++-@@ -349,6 +369,9 @@ class Qwen2MoeAttention(nn.Module):
--+++++-         use_cache: bool = False,
--+++++-         cache_position: Optional[mindspore.Tensor] = None,
--+++++-     ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-+
--+++++-+        
--+++++-+
--+++++-         bsz, q_len, _ = hidden_states.shape
--+++++- 
--+++++-         query_states = self.q_proj(hidden_states)
--+++++-@@ -367,28 +390,28 @@ class Qwen2MoeAttention(nn.Module):
--+++++-                     "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-                     "with a layer index."
--+++++-                 )
--+++++--            kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-+            if isinstance(past_key_value, StaticCache):
--+++++-+                kv_seq_len = key_states.shape[-2]
--+++++-+            else:
--+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-         cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-         query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++- 
--+++++-         if past_key_value is not None:
--+++++-             cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}  # Specific to RoPE models
--+++++-             key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
--+++++-+            
--+++++-+            if isinstance(past_key_value, StaticCache):
--+++++-+                kv_seq_len = key_states.shape[-2]
--+++++- 
--+++++-         # repeat k/v heads if n_kv_heads < n_heads
--+++++-         key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++-         value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++--
--+++++-+        
--+++++-         attn_weights = ops.matmul(query_states, ops.transpose(key_states, 2, 3)) / math.sqrt(self.head_dim)
--+++++- 
--+++++--        if attn_weights.shape != (bsz, self.num_heads, q_len, kv_seq_len):
--+++++--            raise ValueError(
--+++++--                f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
--+++++--                f" {attn_weights.shape}"
--+++++--            )
--+++++--
--+++++--        if attention_mask is not None:  # no matter the length, we just slice it
--+++++--            causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
--+++++-+        if attention_mask is not None:
--+++++-+            causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
--+++++-             attn_weights = attn_weights + causal_mask
--+++++- 
--+++++-         # upcast attention to fp32
--+++++-@@ -406,15 +429,374 @@ class Qwen2MoeAttention(nn.Module):
--+++++-         attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
--+++++- 
--+++++-         attn_output = self.o_proj(attn_output)
--+++++--
--+++++-+        # @lwx
--+++++-+        
--+++++-+        # max_seq_len = self.max_position_embeddings  # 2048
--+++++-+
--+++++-+        # if attention_mask is not None:
--+++++-+        #     # attention_mask: [B, 1, Sq, Sk]
--+++++-+        #     mask_2d = attention_mask[0, 0]  # -> [Sq, Sk] 单个样本的二维mask
--+++++-+
--+++++-+        #     # pad 到 [max_seq_len, max_seq_len]
--+++++-+        #     padded_mask = ops.ones((max_seq_len, max_seq_len), dtype=mask_2d.dtype) != 0
--+++++-+        #     padded_mask[:mask_2d.shape[0], :mask_2d.shape[1]] = (mask_2d != 0)
--+++++-+        #     global_attention_mask = padded_mask
--+++++-+        # else:
--+++++-+        #     global_attention_mask = None
--+++++-+
--+++++-+
--+++++-+        # sparse_mode=3
--+++++-+        # attn_output = mindspore.ops.flash_attention_score(
--+++++-+        #     query=query_states,
--+++++-+        #     key=key_states,
--+++++-+        #     value=value_states,
--+++++-+        #     real_shift=None,
--+++++-+        #     padding_mask=None,
--+++++-+
--+++++-+        #     head_num=self.num_heads,
--+++++-+        #     attn_mask=global_attention_mask,
--+++++-+        #     keep_prob=1.0 - self.attention_dropout,
--+++++-+        #     scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++-+        #     input_layout="BNSD",
--+++++-+        #     pre_tokens=2147483647,
--+++++-+        #     next_tokens=2147483647,
--+++++-+        #     inner_precise=0,
--+++++-+        #     drop_mask=None,
--+++++-+        #     prefix=None,
--+++++-+        #     actual_seq_qlen=None,
--+++++-+        #     actual_seq_kvlen=None,
--+++++-+        #     sparse_mode=sparse_mode,
--+++++-+        # )
--+++++-         if not output_attentions:
--+++++-             attn_weights = None
--+++++- 
--+++++-         return attn_output, attn_weights, past_key_value
--+++++- 
--+++++- 
--+++++-+class Qwen2MoeFlashAttention(nn.Module):
--+++++-+    """
--+++++-+    Qwen2MoeAttention的优化版本，直接调用底层的 mindspore.ops.flash_attention_score 算子。
--+++++-+    这个实现为昇腾硬件（如 Atlas A2）进行了深度优化。
--+++++-+
--+++++-+    关键改动:
--+++++-+    1. 移除了手动的 `repeat_kv` 调用。`flash_attention_score` 内部原生支持GQA (Grouped-Query Attention)，
--+++++-+       直接传入原始的 key 和 value 张量效率更高。
--+++++-+    2. 增加了将标准浮点型 attention_mask 转换为 `flash_attention_score` 所需的布尔型掩码的逻辑。
--+++++-+    3. 严格遵循 `flash_attention_score` 的参数要求，如 `input_layout="BNSD"`。
--+++++-+    """
--+++++-+    def __init__(self, config: Qwen2MoeConfig, layer_idx: Optional[int] = None):
--+++++-+        super().__init__()
--+++++-+        self.config = config
--+++++-+        self.layer_idx = layer_idx
--+++++-+        self.hidden_size = config.hidden_size
--+++++-+        self.num_heads = config.num_attention_heads
--+++++-+        self.head_dim = self.hidden_size // self.num_heads
--+++++-+        self.num_key_value_heads = config.num_key_value_heads
--+++++-+        self.num_key_value_groups = self.num_heads // self.num_key_value_heads
--+++++-+        self.max_position_embeddings = config.max_position_embeddings
--+++++-+        self.rope_theta = config.rope_theta
--+++++-+        self.attention_dropout = config.attention_dropout
--+++++-+
--+++++-+        if (self.head_dim * self.num_heads) != self.hidden_size:
--+++++-+            raise ValueError(
--+++++-+                f"hidden_size ({self.hidden_size}) must be divisible by num_heads ({self.num_heads})"
--+++++-+            )
--+++++-+
--+++++-+        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
--+++++-+        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++-+        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
--+++++-+        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
--+++++-+
--+++++-+        self.rotary_emb = Qwen2MoeRotaryEmbedding(
--+++++-+            self.head_dim,
--+++++-+            max_position_embeddings=self.max_position_embeddings,
--+++++-+            base=self.rope_theta,
--+++++-+        )
--+++++-+
--+++++-+    def forward(
--+++++-+        self,
--+++++-+        hidden_states: mindspore.Tensor,
--+++++-+        attention_mask: Optional[mindspore.Tensor] = None,
--+++++-+        position_ids: Optional[mindspore.Tensor] = None,
--+++++-+        past_key_value: Optional[Cache] = None,
--+++++-+        output_attentions: bool = False,
--+++++-+        use_cache: bool = False,
--+++++-+        cache_position: Optional[mindspore.Tensor] = None,
--+++++-+    ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-+
--+++++-+        bsz, q_len, _ = hidden_states.shape
--+++++-+
--+++++-+        # 1. 线性投射 Q, K, V
--+++++-+        query_states = self.q_proj(hidden_states)
--+++++-+        key_states = self.k_proj(hidden_states)
--+++++-+        value_states = self.v_proj(hidden_states)
--+++++-+
--+++++-+        # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++-+        # query:   [B, S, H*D] -> [B, N1, S, D]
--+++++-+        # key/val: [B, S, H2*D] -> [B, N2, S, D]
--+++++-+        query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+        key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+        value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+
--+++++-+        # 3. RoPE 旋转位置编码
--+++++-+        kv_seq_len = key_states.shape[-2]
--+++++-+        if past_key_value is not None:
--+++++-+            if self.layer_idx is None:
--+++++-+                raise ValueError(
--+++++-+                    f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++-+                    "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-+                    "with a layer index."
--+++++-+                )
--+++++-+            # 对于 StaticCache，需要特殊处理 kv_seq_len
--+++++-+            # 因为 StaticCache 的 key_states 形状是整个 cache 大小，但实际只使用 cache_position 指定的部分
--+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++-+                # 使用 cache_position 的长度来确定实际的 kv_seq_len
--+++++-+                # 在 prefill 阶段：cache_position = [0, 1, 2, ..., n-1]，kv_seq_len = n
--+++++-+                # 在 decode 阶段：cache_position = [pos]，kv_seq_len = pos + 1（但我们无法在 JIT 中获取 pos 值）
--+++++-+                # 为了 JIT 兼容，我们使用 cache_position 的长度，但这只在 prefill 阶段正确
--+++++-+                # 对于 decode 阶段，我们需要在 Python 层预先计算并传递
--+++++-+                # 临时解决方案：使用 cache_position 的最大值（如果可能）
--+++++-+                # 但由于 JIT 限制，我们使用一个近似值：cache_position.shape[0] + past_seen_tokens
--+++++-+                past_seen_tokens = past_key_value.get_seq_length(self.layer_idx) if hasattr(past_key_value, 'get_seq_length') else 0
--+++++-+                if cache_position.shape[0] == 1:
--+++++-+                    # decode 阶段：cache_position 是单个值，我们需要该值 + 1
--+++++-+                    # 但由于 JIT 限制，我们使用 past_seen_tokens + 1（近似）
--+++++-+                    kv_seq_len = past_seen_tokens + 1
--+++++-+                else:
--+++++-+                    # prefill 阶段：cache_position 是范围，使用其长度
--+++++-+                    kv_seq_len = cache_position.shape[0] + past_seen_tokens
--+++++-+            else:
--+++++-+                kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-+
--+++++-+        cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-+        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++-+
--+++++-+        # 4. KV 缓存更新
--+++++-+        if past_key_value is not None:
--+++++-+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-+            key_states, value_states = past_key_value.update(
--+++++-+                key_states, value_states, self.layer_idx, cache_kwargs
--+++++-+            )
--+++++-+            
--+++++-+            # 对于 StaticCache 的 decode 阶段，update() 后 key_states.shape[-2] 就是实际长度
--+++++-+            # 我们需要更新 kv_seq_len（因为 key_states 形状是 max_cache_len，但实际只使用部分）
--+++++-+            if isinstance(past_key_value, StaticCache) and cache_position is not None:
--+++++-+                if cache_position.shape[0] == 1:
--+++++-+                    # decode 阶段：使用 key_states 的实际 shape（已包含之前的 cache + 当前 token）
--+++++-+                    kv_seq_len = key_states.shape[-2]
--+++++-+
--+++++-+        # 5. [重要] 准备 Attention Mask
--+++++-+        #    flash_attention_score 需要一个布尔掩码，其中 True 表示需要被丢弃（mask掉）
--+++++-+        #    而上游传入的 attention_mask 是浮点类型，0 表示保留，大负数表示丢弃
--+++++-+        fa_attention_mask = None
--+++++-+        if attention_mask is not None:
--+++++-+            # 截取与当前key长度匹配的部分
--+++++-+            # 原始 mask 形状: (B, 1, Sq, Sk_max), 我们需要 (B, N1, Sq, Sk_cur)
--+++++-+            # FA算子会自动广播，所以我们只需要 (B, 1, Sq, Sk_cur)
--+++++-+            mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-+            # 转换为布尔类型: 大负数 -> True, 0 -> False
--+++++-+            fa_attention_mask = (mask_slice != 0)
--+++++-+
--+++++-+        # 确保输入数据类型为 float16 或 bfloat16，与算子要求一致
--+++++-+        input_dtype = query_states.dtype
--+++++-+        if input_dtype not in (mindspore.float16, mindspore.bfloat16):
--+++++-+            # 强制用 fp16, 减少 bf16 精度异常，并满足算子要求
--+++++-+            query_states = query_states.to(mindspore.float16)
--+++++-+            key_states = key_states.to(mindspore.float16)
--+++++-+            value_states = value_states.to(mindspore.float16)
--+++++-+
--+++++-+        # 6. [核心] 调用 flash_attention_score 算子
--+++++-+        #    - 无需手动 repeat_kv, 算子原生支持 GQA
--+++++-+        #    - input_layout='BNSD' 对应 [Batch, Num_heads, Seq_len, Head_dim]
--+++++-+        attn_output = mindspore.ops.flash_attention_score(
--+++++-+            query=query_states,
--+++++-+            key=key_states,
--+++++-+            value=value_states,
--+++++-+            head_num=self.num_heads, # 传入Q的头数(N1)
--+++++-+            attn_mask=fa_attention_mask,
--+++++-+            keep_prob=1.0 - self.attention_dropout,
--+++++-+            scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++-+            input_layout="BNSD",
--+++++-+            sparse_mode=0 # 使用 defaultMask 模式
--+++++-+        )
--+++++-+
--+++++-+        # 恢复原始数据类型
--+++++-+        attn_output = attn_output.to(input_dtype)
--+++++-+
--+++++-+        # 7. 调整输出形状
--+++++-+        # [B, N1, S, D] -> [B, S, N1, D] -> [B, S, H]
--+++++-+        attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++-+        attn_output = self.o_proj(attn_output)
--+++++-+
--+++++-+        # FlashAttention 算子不直接返回注意力权重矩阵
--+++++-+        attn_weights = None
--+++++-+        if output_attentions:
--+++++-+             logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++-+
--+++++-+        return attn_output, attn_weights, past_key_value
--+++++-+
--+++++-+    # def forward(
--+++++-+    #     self,
--+++++-+    #     hidden_states: mindspore.Tensor,
--+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++-+    #     past_key_value: Optional[Cache] = None,
--+++++-+    #     output_attentions: bool = False,
--+++++-+    #     use_cache: bool = False,
--+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-+
--+++++-+    #     bsz, q_len, _ = hidden_states.shape
--+++++-+
--+++++-+    #     # 1. 线性投射 Q, K, V
--+++++-+    #     query_states = self.q_proj(hidden_states)
--+++++-+    #     key_states = self.k_proj(hidden_states)
--+++++-+    #     value_states = self.v_proj(hidden_states)
--+++++-+
--+++++-+    #     # 2. 调整形状以匹配 Flash Attention 的 BNSD 布局
--+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+
--+++++-+    #     # 3. RoPE 旋转位置编码
--+++++-+    #     kv_seq_len = key_states.shape[-2]
--+++++-+    #     if past_key_value is not None:
--+++++-+    #         if self.layer_idx is None:
--+++++-+    #             raise ValueError(
--+++++-+    #                 f"The cache structure has changed since version v4.36. If you are using {self.__class__.__name__} "
--+++++-+    #                 "for auto-regressive decoding with k/v caching, please make sure to initialize the attention class "
--+++++-+    #                 "with a layer index."
--+++++-+    #             )
--+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-+
--+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++-+
--+++++-+    #     # 4. KV 缓存更新
--+++++-+    #     if past_key_value is not None:
--+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-+    #         key_states, value_states = past_key_value.update(
--+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++-+    #         )
--+++++-+
--+++++-+    #     # 5. 准备 Attention Mask
--+++++-+    #     fa_attention_mask = None
--+++++-+    #     if attention_mask is not None:
--+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-+    #         fa_attention_mask = (mask_slice != 0)
--+++++-+
--+++++-+    #     # <--- 修改点 1: 删除了不必要的强制类型转换 ---
--+++++-+    #     # 保留原始数据类型，例如 bfloat16，以避免精度损失。
--+++++-+    #     input_dtype = query_states.dtype
--+++++-+
--+++++-+    #     # 6. [核心] 调用 flash_attention_score 算子
--+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
--+++++-+    #         query=query_states,
--+++++-+    #         key=key_states,
--+++++-+    #         value=value_states,
--+++++-+    #         head_num=self.num_heads,
--+++++-+    #         attn_mask=fa_attention_mask,
--+++++-+    #         keep_prob=1.0 - self.attention_dropout,
--+++++-+    #         scalar_value=1.0 / math.sqrt(self.head_dim),
--+++++-+    #         input_layout="BNSD",
--+++++-+    #         sparse_mode=0,
--+++++-+    #         # <--- 修改点 2: 启用内部高精度计算 ---
--+++++-+    #         # inner_precise=1 会让算子内部使用 float32 进行累加和 softmax 计算，
--+++++-+    #         # 这与 Eager 版本中的 .softmax(dtype=ms.float32) 行为对齐。
--+++++-+    #         inner_precise=1
--+++++-+    #     )
--+++++-+
--+++++-+    #     # 恢复原始数据类型
--+++++-+    #     attn_output = attn_output.to(input_dtype)
--+++++-+
--+++++-+    #     # 7. 调整输出形状
--+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++-+    #     attn_output = self.o_proj(attn_output)
--+++++-+
--+++++-+    #     attn_weights = None
--+++++-+    #     if output_attentions:
--+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention is used but `output_attentions=True`. FA does not return attentions.")
--+++++-+
--+++++-+    #     return attn_output, attn_weights, past_key_value
--+++++-+
--+++++-+    # def forward(
--+++++-+    #     self,
--+++++-+    #     hidden_states: mindspore.Tensor,
--+++++-+    #     attention_mask: Optional[mindspore.Tensor] = None,
--+++++-+    #     position_ids: Optional[mindspore.Tensor] = None,
--+++++-+    #     past_key_value: Optional[Cache] = None,
--+++++-+    #     output_attentions: bool = False,
--+++++-+    #     use_cache: bool = False,
--+++++-+    #     cache_position: Optional[mindspore.Tensor] = None,
--+++++-+    # ) -> Tuple[mindspore.Tensor, Optional[mindspore.Tensor], Optional[Tuple[mindspore.Tensor]]]:
--+++++-+
--+++++-+    #     bsz, q_len, _ = hidden_states.shape
--+++++-+
--+++++-+    #     query_states = self.q_proj(hidden_states)
--+++++-+    #     key_states = self.k_proj(hidden_states)
--+++++-+    #     value_states = self.v_proj(hidden_states)
--+++++-+
--+++++-+    #     query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+    #     key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+    #     value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(0, 2, 1, 3)
--+++++-+
--+++++-+    #     kv_seq_len = key_states.shape[-2]
--+++++-+    #     if past_key_value is not None:
--+++++-+    #         if self.layer_idx is None:
--+++++-+    #             raise ValueError("`layer_idx` must be specified for caching")
--+++++-+    #         kv_seq_len += past_key_value.get_usable_length(kv_seq_len, self.layer_idx)
--+++++-+
--+++++-+    #     cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
--+++++-+    #     query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
--+++++-+
--+++++-+    #     if past_key_value is not None:
--+++++-+    #         cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
--+++++-+    #         key_states, value_states = past_key_value.update(
--+++++-+    #             key_states, value_states, self.layer_idx, cache_kwargs
--+++++-+    #         )
--+++++-+
--+++++-+    #     key_states = repeat_kv(key_states, self.num_key_value_groups)
--+++++-+    #     value_states = repeat_kv(value_states, self.num_key_value_groups)
--+++++-+
--+++++-+    #     # <--- 核心修改点: 手动进行高精度缩放 ---
--+++++-+    #     # 在调用算子前，手动将 query_states 除以缩放因子。
--+++++-+    #     # 这样做可以确保缩放操作的精度与 Eager 版本的隐式高精度除法完全一致。
--+++++-+    #     query_states = query_states / math.sqrt(self.head_dim)
--+++++-+    #     # <--- 修改结束 ---
--+++++-+
--+++++-+    #     fa_attention_mask = None
--+++++-+    #     if attention_mask is not None:
--+++++-+    #         mask_slice = attention_mask[:, :, :q_len, :key_states.shape[-2]]
--+++++-+    #         fa_attention_mask = (mask_slice != 0)
--+++++-+
--+++++-+    #     input_dtype = query_states.dtype
--+++++-+
--+++++-+    #     attn_output = mindspore.ops.flash_attention_score(
--+++++-+    #         query=query_states,    # 传入已经预先缩放过的 query
--+++++-+    #         key=key_states,
--+++++-+    #         value=value_states,
--+++++-+    #         head_num=self.num_heads,
--+++++-+    #         attn_mask=fa_attention_mask,
--+++++-+    #         keep_prob=1.0 - self.attention_dropout,
--+++++-+    #         scalar_value=1.0,      # 设置为 1.0，因为缩放已在外部完成
--+++++-+    #         input_layout="BNSD",
--+++++-+    #         sparse_mode=0,
--+++++-+    #         inner_precise=1        # 仍然保持内部高精度计算
--+++++-+    #     )
--+++++-+
--+++++-+    #     attn_output = attn_output.to(input_dtype)
--+++++-+    #     attn_output = attn_output.transpose(0, 2, 1, 3).reshape(bsz, q_len, self.hidden_size)
--+++++-+    #     attn_output = self.o_proj(attn_output)
--+++++-+
--+++++-+    #     attn_weights = None
--+++++-+    #     if output_attentions:
--+++++-+    #          logger.warning_once("Qwen2MoeFlashAttention does not return attention weights.")
--+++++-+
--+++++-+    #     return attn_output, attn_weights, past_key_value
--+++++-+
--+++++- QWEN2MOE_ATTENTION_CLASSES = {
--+++++-     "eager": Qwen2MoeAttention,
--+++++-+    "flash-attention": Qwen2MoeFlashAttention,
--+++++- }
--+++++- 
--+++++- 
--+++++-@@ -434,50 +816,55 @@ class Qwen2MoeSparseMoeBlock(nn.Module):
--+++++-         self.shared_expert = Qwen2MoeMLP(config, intermediate_size=config.shared_expert_intermediate_size)
--+++++-         self.shared_expert_gate = nn.Linear(config.hidden_size, 1, bias=False)
--+++++- 
--+++++-+    #@dwj
--+++++-+    # 只遍历激活的专家，而非全部专家
--+++++-     def forward(self, hidden_states: mindspore.Tensor) -> mindspore.Tensor:
--+++++--        batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++--        hidden_states = hidden_states.view(-1, hidden_dim)
--+++++--        # router_logits: (batch * sequence_length, n_experts)
--+++++--        router_logits = self.gate(hidden_states)
--+++++--
--+++++--        routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++--        routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++--        if self.norm_topk_prob:
--+++++--            routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++--        # we cast back to the input dtype
--+++++--        routing_weights = routing_weights.to(hidden_states.dtype)
--+++++--
--+++++--        final_hidden_states = ops.zeros(
--+++++--            (batch_size * sequence_length, hidden_dim), dtype=hidden_states.dtype
--+++++--        )
--+++++--
--+++++--        # One hot encode the selected experts to create an expert mask
--+++++--        # this will be used to easily index which expert is going to be sollicitated
--+++++--        expert_mask = nn.functional.one_hot(selected_experts, num_classes=self.num_experts).permute(2, 1, 0)
--+++++--
--+++++--        # Loop over all available experts in the model and perform the computation on each expert
--+++++--        for expert_idx in range(self.num_experts):
--+++++--            expert_layer = self.experts[expert_idx]
--+++++--            idx, top_x = ops.nonzero(expert_mask[expert_idx], as_tuple=True)
--+++++--
--+++++--            # Index the correct hidden states and compute the expert hidden state for
--+++++--            # the current expert. We need to make sure to multiply the output hidden
--+++++--            # states by `routing_weights` on the corresponding tokens (top-1 and top-2)
--+++++--            if 0 not in idx.shape:
--+++++--                current_state = hidden_states[None, top_x].reshape(-1, hidden_dim)
--+++++--                current_hidden_states = expert_layer(current_state) * routing_weights[top_x, idx, None]
--+++++--
--+++++--                # However `index_add_` only support torch tensors for indexing so we'll use
--+++++--                # the `top_x` tensor here.
--+++++--                final_hidden_states = final_hidden_states.index_add(0, top_x.int(), current_hidden_states.to(hidden_states.dtype))
--+++++--
--+++++--        shared_expert_output = self.shared_expert(hidden_states)
--+++++--        shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states)) * shared_expert_output
--+++++--
--+++++--        final_hidden_states = final_hidden_states + shared_expert_output
--+++++-+            batch_size, sequence_length, hidden_dim = hidden_states.shape
--+++++-+            hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
--+++++-+            num_tokens = hidden_states_reshaped.shape[0]
--+++++-+
--+++++-+            router_logits = self.gate(hidden_states_reshaped)
--+++++-+            routing_weights = F.softmax(router_logits, dim=1, dtype=mindspore.float32)
--+++++-+            routing_weights, selected_experts = ops.topk(routing_weights, self.top_k, dim=-1)
--+++++-+
--+++++-+            if self.norm_topk_prob:
--+++++-+                routing_weights /= ops.sum(routing_weights, dim=-1, keepdim=True)
--+++++-+            routing_weights = routing_weights.to(hidden_states.dtype)
--+++++-+            
--+++++-+            final_hidden_states = ops.zeros_like(hidden_states_reshaped)
--+++++-+            flat_selected_experts = selected_experts.flatten()
--+++++-+            
--+++++-+            unsqueezed_token_indices = ops.arange(num_tokens, dtype=mindspore.int32).unsqueeze(1)
--+++++-+            broadcasted_token_indices = unsqueezed_token_indices.broadcast_to((-1, self.top_k))
--+++++-+            token_indices = broadcasted_token_indices.flatten()
--+++++-+            
--+++++-+            active_experts = ops.unique(flat_selected_experts)
--+++++-+            
--+++++-+            for expert_idx_tensor in active_experts:
--+++++-+                expert_idx = expert_idx_tensor.item()
--+++++-+                expert_layer = self.experts[expert_idx]
--+++++-+                
--+++++-+                mask = (flat_selected_experts == expert_idx_tensor)
--+++++-+                selected_token_indices = token_indices[mask]
--+++++-+                selected_routing_weights = routing_weights.flatten()[mask]
--+++++-+                
--+++++-+                current_states = hidden_states_reshaped[selected_token_indices]
--+++++-+                
--+++++-+                expert_output = expert_layer(current_states) * selected_routing_weights.unsqueeze(1)
--+++++-+                
--+++++-+                final_hidden_states = final_hidden_states.index_add(
--+++++-+                    dim=0,
--+++++-+                    index=selected_token_indices,
--+++++-+                    source=expert_output.to(hidden_states.dtype)
--+++++-+                )
--+++++-+            
--+++++-+            shared_expert_output = self.shared_expert(hidden_states_reshaped)
--+++++-+            shared_expert_output = F.sigmoid(self.shared_expert_gate(hidden_states_reshaped)) * shared_expert_output
--+++++- 
--+++++--        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++--        return final_hidden_states, router_logits
--+++++-+            final_hidden_states = final_hidden_states + shared_expert_output
--+++++-+            final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
--+++++-+            
--+++++-+            return final_hidden_states, router_logits
--+++++- 
--+++++- 
--+++++- class Qwen2MoeDecoderLayer(nn.Module):
--+++++-@@ -487,6 +874,8 @@ class Qwen2MoeDecoderLayer(nn.Module):
--+++++- 
--+++++-         self.self_attn = QWEN2MOE_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
--+++++- 
--+++++-+        # self.self_attn = QWEN2MOE_ATTENTION_CLASSES["flash-attention"](config, layer_idx)
--+++++-+
--+++++-         if (layer_idx not in config.mlp_only_layers) and (
--+++++-             config.num_experts > 0 and (layer_idx + 1) % config.decoder_sparse_step == 0
--+++++-         ):
--+++++-@@ -580,6 +969,8 @@ class Qwen2MoePreTrainedModel(PreTrainedModel):
--+++++-     _no_split_modules = ["Qwen2MoeDecoderLayer"]
--+++++-     _skip_keys_device_placement = "past_key_values"
--+++++-     _supports_cache_class = True
--+++++-+#lwx
--+++++-+    # _supports_static_cache = True
--+++++- 
--+++++-     def _init_weights(self, module):
--+++++-         std = self.config.initializer_range
--+++++-@@ -797,7 +1188,7 @@ class Qwen2MoeModel(Qwen2MoePreTrainedModel):
--+++++-         return causal_mask
--+++++- 
--+++++- 
--+++++--class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++-+class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel, GenerationMixin):
--+++++-     _tied_weights_keys = ["lm_head.weight"]
--+++++- 
--+++++-     def __init__(self, config):
--+++++-@@ -811,6 +1202,29 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++-         self.num_experts_per_tok = config.num_experts_per_tok
--+++++-         # Initialize weights and apply final processing
--+++++-         self.post_init()
--+++++-+        # @lwx
--+++++-+        # if self.generation_config is not None and self.generation_config.cache_implementation is None:
--+++++-+        #     self.generation_config.cache_implementation = "static"
--+++++-+        self._warmed_up = False
--+++++-+
--+++++-+    def warmup_moe_model(self):
--+++++-+        print("[Warmup] Qwen2-MoE 模型预热开始...")
--+++++-+        test_texts = [
--+++++-+            "warmup short",
--+++++-+            "This is a medium length warmup sentence for MoE experts.middle midlle midlle",
--+++++-+            "This is a long warmup sentence designed to trigger as many experts as possible and include attention mask variations to cover FlashAttention or eager attention paths.very very long,very very long,very very long,very very long"
--+++++-+        ]
--+++++-+        tokenizer = getattr(self, "_warmup_tokenizer", None)
--+++++-+        if tokenizer is None:
--+++++-+            from mindnlp.transformers import AutoTokenizer
--+++++-+            tokenizer = AutoTokenizer.from_pretrained(self.config.name_or_path)
--+++++-+            self._warmup_tokenizer = tokenizer
--+++++-+
--+++++-+        for text in test_texts:
--+++++-+            inputs = tokenizer(text, return_tensors="ms")
--+++++-+            with mindspore._no_grad():
--+++++-+                _ = self(**inputs, output_router_logits=True, use_cache=False)
--+++++-+        print("[Warmup] Qwen2-MoE 模型预热完成。")
--+++++- 
--+++++-     def get_input_embeddings(self):
--+++++-         return self.model.embed_tokens
--+++++-@@ -870,6 +1284,9 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++-         >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
--+++++-         "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
--+++++-         ```"""
--+++++-+        if not self._warmed_up:
--+++++-+            self._warmed_up = True
--+++++-+            self.warmup_moe_model()
--+++++- 
--+++++-         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
--+++++-         output_router_logits = (
--+++++-@@ -1004,6 +1421,361 @@ class Qwen2MoeForCausalLM(Qwen2MoePreTrainedModel):
--+++++-             }
--+++++-         )
--+++++-         return model_inputs
--+++++-+# @lwx
--+++++-+    # def _decode_one_tokens_logits(
--+++++-+    #     self,
--+++++-+    #     cur_token: mindspore.Tensor,
--+++++-+    #     input_pos: Optional[mindspore.Tensor],
--+++++-+    #     cache_position: mindspore.Tensor,
--+++++-+    #     past_key_values: StaticCache,
--+++++-+    # ) -> mindspore.Tensor:
--+++++-+    #     """
--+++++-+    #     单个token的解码函数,返回Logits（内部实现，未被JIT编译）
--+++++-+        
--+++++-+    #     Args:
--+++++-+    #         cur_token: 当前要处理的token，shape为(batch_size, 1)
--+++++-+    #         input_pos: 输入位置信息，可选
--+++++-+    #         cache_position: 当前token在cache中的位置，shape为(1,)
--+++++-+    #         past_key_values: StaticCache对象，存储之前的key-value状态
--+++++-+            
--+++++-+    #     Returns:
--+++++-+    #         logits: 当前token的logits，shape为(batch_size, vocab_size)
--+++++-+    #     """
--+++++-+    #     # 调用JIT编译的版本
--+++++-+    #     return self.get_decode_one_tokens_logits(
--+++++-+    #         cur_token=cur_token,
--+++++-+    #         input_pos=input_pos,
--+++++-+    #         cache_position=cache_position,
--+++++-+    #         past_key_values=past_key_values,
--+++++-+    #     )
--+++++-+    
--+++++-+    # @mindspore.jit(jit_level='O1')
--+++++-+    # def get_decode_one_tokens_logits(self, cur_token, input_pos, cache_position, past_key_values):
--+++++-+    #     """
--+++++-+    #     JIT编译的函数，用于高效的单token解码
--+++++-+    #     使用JIT编译优化以支持静态shape和高效执行
--+++++-+        
--+++++-+    #     注意：直接调用forward方法，避免经过_call_impl中的try-except
--+++++-+    #     """
--+++++-+    #     outputs = self.model.forward(
--+++++-+    #         input_ids=cur_token,
--+++++-+    #         position_ids=input_pos,
--+++++-+    #         cache_position=cache_position,
--+++++-+    #         past_key_values=past_key_values,
--+++++-+    #         use_cache=True,
--+++++-+    #         return_dict=False,
--+++++-+    #     )
--+++++-+        
--+++++-+    #     hidden_states = outputs[0]
--+++++-+    #     logits = self.lm_head.forward(hidden_states)
--+++++-+    #     logits = logits.float()
--+++++-+        
--+++++-+    #     return logits[:, -1, :]
--+++++-+
--+++++-+    # def _sample(
--+++++-+    #     self,
--+++++-+    #     input_ids: mindspore.Tensor,
--+++++-+    #     logits_processor,
--+++++-+    #     stopping_criteria,
--+++++-+    #     generation_config,
--+++++-+    #     synced_devices: bool,
--+++++-+    #     streamer=None,
--+++++-+    #     logits_warper=None,
--+++++-+    #     **model_kwargs,
--+++++-+    # ):
--+++++-+    #     """
--+++++-+    #     重写 _sample 方法以在 StaticCache + 单 token 生成时使用 JIT 优化
--+++++-+    #     对于首次 prefill 阶段（cache_position 包含多个位置），使用标准路径
--+++++-+    #     对于自回归生成阶段（cache_position 长度为 1），使用 JIT 优化的路径
--+++++-+    #     """
--+++++-+    #     from ...generation.logits_process import LogitsProcessorList
--+++++-+    #     from ...generation.stopping_criteria import StoppingCriteriaList
--+++++-+    #     from ...generation.utils import GenerateDecoderOnlyOutput, GenerateEncoderDecoderOutput
--+++++-+    #     from mindnlp.core import nn, ops, no_grad
--+++++-+    #     import numpy as np
--+++++-+        
--+++++-+    #     # 检查是否使用 StaticCache
--+++++-+    #     # 如果使用 StaticCache，我们进入自定义循环以在单 token 生成时使用 JIT 优化
--+++++-+    #     # 否则，直接调用父类方法
--+++++-+    #     past_key_values = model_kwargs.get("past_key_values")
--+++++-+    #     print(f"[DEBUG] _sample called, past_key_values type: {type(past_key_values).__name__}, is StaticCache: {isinstance(past_key_values, StaticCache)}")
--+++++-+
--+++++-+    #     if not isinstance(past_key_values, StaticCache):
--+++++-+    #         # 不使用 StaticCache，直接调用父类方法
--+++++-+    #         print("[DEBUG] Using standard path (no StaticCache or not yet initialized)")
--+++++-+    #         return super()._sample(
--+++++-+    #             input_ids=input_ids,
--+++++-+    #             logits_processor=logits_processor,
--+++++-+    #             stopping_criteria=stopping_criteria,
--+++++-+    #             generation_config=generation_config,
--+++++-+    #             synced_devices=synced_devices,
--+++++-+    #             streamer=streamer,
--+++++-+    #             logits_warper=logits_warper,
--+++++-+    #             **model_kwargs,
--+++++-+    #         )
--+++++-+        
--+++++-+    #     # 使用 StaticCache，进入自定义循环
--+++++-+    #     # 在循环内会根据 cache_position 的长度动态选择使用 JIT 优化（单 token）或标准路径（prefill）
--+++++-+    #     # 大部分逻辑与父类相同，但 forward 调用改为使用 JIT 优化方法
--+++++-+    #     pad_token_id = generation_config._pad_token_tensor
--+++++-+    #     output_attentions = generation_config.output_attentions
--+++++-+    #     output_hidden_states = generation_config.output_hidden_states
--+++++-+    #     output_scores = generation_config.output_scores
--+++++-+    #     output_logits = generation_config.output_logits
--+++++-+    #     return_dict_in_generate = generation_config.return_dict_in_generate
--+++++-+    #     max_length = generation_config.max_length
--+++++-+    #     has_eos_stopping_criteria = any(hasattr(criteria, "eos_token_id") for criteria in stopping_criteria)
--+++++-+    #     do_sample = generation_config.do_sample
--+++++-+        
--+++++-+    #     if do_sample is True and not isinstance(logits_warper, LogitsProcessorList):
--+++++-+    #         raise ValueError(
--+++++-+    #             "`do_sample` is set to `True`, `logits_warper` must be a `LogitsProcessorList` instance (it is "
--+++++-+    #             f"{logits_warper})."
--+++++-+    #         )
--+++++-+        
--+++++-+    #     # init attention / hidden states / scores tuples
--+++++-+    #     scores = () if (return_dict_in_generate and output_scores) else None
--+++++-+    #     raw_logits = () if (return_dict_in_generate and output_logits) else None
--+++++-+    #     decoder_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++-+    #     cross_attentions = () if (return_dict_in_generate and output_attentions) else None
--+++++-+    #     decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None
--+++++-+        
--+++++-+    #     # if model is an encoder-decoder, retrieve encoder attention weights and hidden states
--+++++-+    #     if return_dict_in_generate and self.config.is_encoder_decoder:
--+++++-+    #         encoder_attentions = model_kwargs["encoder_outputs"].get("attentions") if output_attentions else None
--+++++-+    #         encoder_hidden_states = (
--+++++-+    #             model_kwargs["encoder_outputs"].get("hidden_states") if output_hidden_states else None
--+++++-+    #         )
--+++++-+        
--+++++-+    #     # keep track of which sequences are already finished
--+++++-+    #     batch_size, cur_len = input_ids.shape
--+++++-+    #     this_peer_finished = False
--+++++-+    #     unfinished_sequences = ops.ones(batch_size, dtype=mindspore.int64)
--+++++-+    #     model_kwargs = self._get_initial_cache_position(input_ids, model_kwargs)
--+++++-+        
--+++++-+    #     time_record = []
--+++++-+    #     from ....utils.testing_utils import parse_flag_from_env
--+++++-+    #     _record_time = parse_flag_from_env('INFERENCE_TIME_RECORD', False)
--+++++-+        
--+++++-+    #     while self._has_unfinished_sequences(
--+++++-+    #         this_peer_finished, synced_devices, cur_len=cur_len, max_length=max_length
--+++++-+    #     ):
--+++++-+    #         if _record_time:
--+++++-+    #             import time as time_module
--+++++-+    #             infer_start = time_module.time()
--+++++-+            
--+++++-+    #         # prepare model inputs
--+++++-+    #         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
--+++++-+            
--+++++-+    #         # prepare variable output controls
--+++++-+    #         model_inputs.update({"output_attentions": output_attentions} if output_attentions else {})
--+++++-+    #         model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
--+++++-+            
--+++++-+    #         # 关键修改：检测到 StaticCache + 单 token 生成时，使用 JIT 优化的方法
--+++++-+    #         cur_cache_position = model_inputs.get("cache_position")
--+++++-+    #         cur_past_key_values = model_inputs.get("past_key_values")
--+++++-+    #         cur_input_ids = model_inputs.get("input_ids")
--+++++-+            
--+++++-+    #         if (isinstance(cur_past_key_values, StaticCache) and 
--+++++-+    #             cur_cache_position is not None and 
--+++++-+    #             len(cur_cache_position.shape) > 0 and
--+++++-+    #             cur_cache_position.shape[0] == 1 and
--+++++-+    #             cur_input_ids is not None and
--+++++-+    #             cur_input_ids.shape[1] == 1):
--+++++-+    #             # 使用 JIT 优化的单 token 解码
--+++++-+    #             # 简单判断方法：首次调用时打印（JIT编译需要时间）
--+++++-+    #             if not hasattr(self, '_jit_used'):
--+++++-+    #                 self._jit_used = False
--+++++-+    #                 print("[JIT] ✓ JIT optimized path activated (first call will compile)")
--+++++-+                
--+++++-+    #             next_token_logits = self.get_decode_one_tokens_logits(
--+++++-+    #                 cur_token=cur_input_ids,
--+++++-+    #                 input_pos=model_inputs.get("position_ids"),
--+++++-+    #                 cache_position=cur_cache_position,
--+++++-+    #                 past_key_values=cur_past_key_values,
--+++++-+    #             )
--+++++-+                
--+++++-+    #             # 标记已使用JIT（用于后续判断）
--+++++-+    #             if not self._jit_used:
--+++++-+    #                 self._jit_used = True
--+++++-+                
--+++++-+    #             # 构造兼容的输出对象
--+++++-+    #             class JitOptimizedOutput:
--+++++-+    #                 def __init__(self, logits, config):
--+++++-+    #                     self.logits = logits.unsqueeze(1) if logits.ndim == 2 else logits
--+++++-+    #                     self.config = config
--+++++-+    #                     # 对于 JIT 优化路径，这些属性通常不需要
--+++++-+    #                     self.decoder_attentions = None if config.is_encoder_decoder else None
--+++++-+    #                     self.attentions = None if not config.is_encoder_decoder else None
--+++++-+    #                     self.cross_attentions = None
--+++++-+    #                     self.decoder_hidden_states = None if config.is_encoder_decoder else None
--+++++-+    #                     self.hidden_states = None if not config.is_encoder_decoder else None
--+++++-+                
--+++++-+    #             outputs = JitOptimizedOutput(next_token_logits, self.config)
--+++++-+    #         else:
--+++++-+    #             # 标准 forward 调用（首次prefill阶段或非StaticCache）
--+++++-+    #             outputs = self(**model_inputs, return_dict=True)
--+++++-+            
--+++++-+    #         if synced_devices and this_peer_finished:
--+++++-+    #             continue
--+++++-+            
--+++++-+    #         # Clone is needed to avoid keeping a hanging ref to outputs.logits
--+++++-+    #         next_token_logits = outputs.logits[:, -1, :]
--+++++-+            
--+++++-+    #         # pre-process distribution
--+++++-+    #         next_token_scores = logits_processor(input_ids, next_token_logits)
--+++++-+    #         if do_sample:
--+++++-+    #             next_token_scores = logits_warper(input_ids, next_token_scores)
--+++++-+            
--+++++-+    #         # Store scores, attentions and hidden_states when required
--+++++-+    #         if return_dict_in_generate:
--+++++-+    #             if output_scores:
--+++++-+    #                 scores += (next_token_scores,)
--+++++-+    #             if output_logits:
--+++++-+    #                 raw_logits += (next_token_logits,)
--+++++-+    #             if output_attentions:
--+++++-+    #                 attn = outputs.decoder_attentions if self.config.is_encoder_decoder else outputs.attentions
--+++++-+    #                 decoder_attentions += (attn,) if attn is not None else (None,)
--+++++-+    #                 if self.config.is_encoder_decoder:
--+++++-+    #                     cross_attentions += (outputs.cross_attentions,) if outputs.cross_attentions is not None else (None,)
--+++++-+                
--+++++-+    #             if output_hidden_states:
--+++++-+    #                 hidden = (
--+++++-+    #                     outputs.decoder_hidden_states
--+++++-+    #                     if self.config.is_encoder_decoder
--+++++-+    #                     else outputs.hidden_states
--+++++-+    #                 )
--+++++-+    #                 decoder_hidden_states += (hidden,) if hidden is not None else (None,)
--+++++-+            
--+++++-+    #         # token selection
--+++++-+    #         if do_sample:
--+++++-+    #             probs = nn.functional.softmax(next_token_scores, dim=-1)
--+++++-+    #             next_tokens = ops.multinomial(probs, num_samples=1).squeeze(1)
--+++++-+    #         else:
--+++++-+    #             next_tokens = ops.argmax(next_token_scores, dim=-1)
--+++++-+            
--+++++-+    #         # finished sentences should have their next token be a padding token
--+++++-+    #         if has_eos_stopping_criteria:
--+++++-+    #             next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
--+++++-+            
--+++++-+    #         # update generated ids, model inputs, and length for next step
--+++++-+    #         input_ids = ops.cat([input_ids, next_tokens[:, None]], dim=-1)
--+++++-+    #         if streamer is not None:
--+++++-+    #             streamer.put(next_tokens)
--+++++-+            
--+++++-+    #         model_kwargs = self._update_model_kwargs_for_generation(
--+++++-+    #             outputs,
--+++++-+    #             model_kwargs,
--+++++-+    #             is_encoder_decoder=self.config.is_encoder_decoder,
--+++++-+    #         )
--+++++-+            
--+++++-+    #         unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
--+++++-+    #         this_peer_finished = np.max(unfinished_sequences.asnumpy()).item() == 0
--+++++-+    #         cur_len += 1
--+++++-+            
--+++++-+    #         if _record_time:
--+++++-+    #             import time as time_module
--+++++-+    #             infer_stop = time_module.time()
--+++++-+    #             time_record.append(infer_stop - infer_start)
--+++++-+            
--+++++-+    #         del outputs
--+++++-+        
--+++++-+    #     average_infer_time = None
--+++++-+    #     if time_record:
--+++++-+    #         if len(time_record) > 1:
--+++++-+    #             time_record.pop(0)
--+++++-+    #         average_infer_time = sum(time_record) / len(time_record)
--+++++-+    #         print(f'average inference time is: {average_infer_time}')
--+++++-+    #         print(f'inference time record: {time_record}')
--+++++-+        
--+++++-+    #     if streamer is not None:
--+++++-+    #         streamer.end()
--+++++-+        
--+++++-+    #     # 简单判断：打印是否使用了JIT路径
--+++++-+    #     if hasattr(self, '_jit_used') and self._jit_used:
--+++++-+    #         print("[JIT] ✓ JIT optimization was used during generation")
--+++++-+    #     else:
--+++++-+    #         print("[JIT] ✗ JIT optimization was NOT used (using standard path)")
--+++++-+        
--+++++-+    #     if return_dict_in_generate:
--+++++-+    #         if self.config.is_encoder_decoder:
--+++++-+    #             return GenerateEncoderDecoderOutput(
--+++++-+    #                 sequences=input_ids,
--+++++-+    #                 scores=scores,
--+++++-+    #                 logits=raw_logits,
--+++++-+    #                 encoder_attentions=encoder_attentions,
--+++++-+    #                 encoder_hidden_states=encoder_hidden_states,
--+++++-+    #                 decoder_attentions=decoder_attentions,
--+++++-+    #                 cross_attentions=cross_attentions,
--+++++-+    #                 decoder_hidden_states=decoder_hidden_states,
--+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++-+    #                 average_infer_time=average_infer_time
--+++++-+    #             )
--+++++-+    #         else:
--+++++-+    #             return GenerateDecoderOnlyOutput(
--+++++-+    #                 sequences=input_ids,
--+++++-+    #                 scores=scores,
--+++++-+    #                 logits=raw_logits,
--+++++-+    #                 attentions=decoder_attentions,
--+++++-+    #                 hidden_states=decoder_hidden_states,
--+++++-+    #                 past_key_values=model_kwargs.get("past_key_values"),
--+++++-+    #                 average_infer_time=average_infer_time
--+++++-+    #             )
--+++++-+    #     else:
--+++++-+    #         return input_ids
--+++++-+            
--+++++-+    # def _prepare_cache_for_generation(
--+++++-+    #     self,
--+++++-+    #     generation_config,
--+++++-+    #     model_kwargs,
--+++++-+    #     assistant_model,
--+++++-+    #     batch_size,
--+++++-+    #     max_cache_length,
--+++++-+    # ):
--+++++-+    #     if generation_config.cache_implementation is None and self._supports_static_cache:
--+++++-+    #         generation_config.cache_implementation = "static"
--+++++-+    #         print("[JIT] ✓ StaticCache set as default in _prepare_cache_for_generation")
--+++++-+        
--+++++-+    #     if generation_config.cache_implementation == "static":
--+++++-+    #         base_required_from_max_length = generation_config.max_length + 1
--+++++-+    #         base_required = max(max_cache_length, base_required_from_max_length)
--+++++-+    #         min_cache_size = 50
--+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++-+    #             max_cache_length = min(max(base_required, min_cache_size), self.config.max_position_embeddings)
--+++++-+    #         else:
--+++++-+    #             max_cache_length = max(base_required, min_cache_size)
--+++++-+            
--+++++-+    #         original_max_cache_length = max_cache_length
--+++++-+    #         print(f"[JIT] StaticCache max_cache_length calculation:")
--+++++-+    #         print(f"  - input max_cache_length: {original_max_cache_length}")
--+++++-+    #         print(f"  - generation_config.max_length: {generation_config.max_length}")
--+++++-+    #         print(f"  - base_required_from_max_length: {base_required_from_max_length}")
--+++++-+    #         print(f"  - final max_cache_length: {max_cache_length}")
--+++++-+            
--+++++-+    #         if hasattr(self.config, 'max_position_embeddings') and self.config.max_position_embeddings is not None:
--+++++-+    #             if max_cache_length > self.config.max_position_embeddings:
--+++++-+    #                 print(f"[JIT] WARNING: Required cache length ({max_cache_length}) exceeds max_position_embeddings ({self.config.max_position_embeddings})")
--+++++-+        
--+++++-+    #     result = super()._prepare_cache_for_generation(
--+++++-+    #         generation_config=generation_config,
--+++++-+    #         model_kwargs=model_kwargs,
--+++++-+    #         assistant_model=assistant_model,
--+++++-+    #         batch_size=batch_size,
--+++++-+    #         max_cache_length=max_cache_length,
--+++++-+    #     )
--+++++-+        
--+++++-+    #     if generation_config.cache_implementation == "static":
--+++++-+    #         cache_name = "past_key_values" if "mamba" not in self.__class__.__name__.lower() else "cache_params"
--+++++-+    #         created_cache = model_kwargs.get(cache_name)
--+++++-+    #         if created_cache is not None and hasattr(created_cache, 'max_cache_len'):
--+++++-+    #             print(f"[JIT] Created StaticCache with max_cache_len: {created_cache.max_cache_len}")
--+++++-+    #             if created_cache.max_cache_len < generation_config.max_length:
--+++++-+    #                 print(f"[JIT] WARNING: Created cache max_cache_len ({created_cache.max_cache_len}) < max_length ({generation_config.max_length})")
--+++++-+        
--+++++-+    #     return result
--+++++-+
--+++++-+
--+++++-+
--+++++- 
--+++++- 
--+++++- # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Qwen2Moe, LLAMA->QWEN2MOE
--+++++--- 
--+++++-2.27.0
--+++++-
--+++++-- 
--+++++2.27.0
--+++++
--++++-- 
--++++2.27.0
--++++
--+++-- 
--+++2.27.0
--+++
--++-- 
--++2.27.0
--++
--+-- 
--+2.27.0
--+
---- 
--2.27.0
--
--- 
-2.39.5 (Apple Git-154)
-

From 5eefcda2eec1f853bbb351f7587ac8c7b3e1d8b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E9=82=93=E4=BC=9F=E9=94=AE?= <emmmvkdeng@gmail.com>
Date: Wed, 10 Dec 2025 14:15:14 +0800
Subject: [PATCH 3/3] =?UTF-8?q?=E6=A0=B9=E6=8D=AEreview=E9=87=8D=E6=96=B0?=
 =?UTF-8?q?=E4=B8=8A=E4=BC=A0?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../\351\230\237\344\274\215emmm/patches.zip" | Bin 0 -> 1011964 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 "2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches.zip"

diff --git "a/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches.zip" "b/2025-Ascend-Innovation-Contest/S1/MoE/\351\230\237\344\274\215emmm/patches.zip"
new file mode 100644
index 0000000000000000000000000000000000000000..6a7b63a5d43a9a817f27b335f5305912221d8004
GIT binary patch
literal 1011964
zcmZU(Q_L_tw5|Kvwr$(CZQHhO+cv(}wr$(Ct^Kc+larn7CT-_MGcIR4o(=_RU=S#P
z{~6SdQQH4o{C^7s01kkIp^LGFsWZK*3M2q<5*(-H|4dg8XaGQvS3m#&5S0JUD*UJL
zKRd+#DvZmvcGla&1E86x0wDii3TBpehBkEntM~tlCI7!-%RJiIOJZqvKh#z`@C}wN
z^)^fB{>EToAuwG65(e)0USG&o)z@lTX)z@1ZghL%5)cRw8=0h<OeRbMY?;?KLgRE3
z4i_Q#O}6fD%D;no%AKl`jaxf`8<^n%q`A(Q%azVGT#trhc8I=WmglL9J{>$=_O%@9
zIrj4#QLXlVk_*_IIgKoJC}-52ViSy8MC6v)6ip=>{~GyARA{J>!vkefa~No{mCU_H
z>a%o%zlA8#rAt4gIy7>43#L;}I(IhtAG%K$s5n;&kt!38Ua1f$Ziq)ds)|(JC{3V7
z>E&{+X?mot^ipYZI6N-r>m(I7s!`A}D~eF41u9HEkdc>`mXJ$KH9AdO-@M|?I$*af
z6~}h8*Dj}2Zal&+CU5W}`Rx&uvWS%f$fq86(`uBLcB+*(-6(Y*>H@u4yO9a|1XCt?
zGNlOeIHhZnHXAKtFn6F7_6VGvnFA>N5+0Z=G00pceKni1RMN-4uVENz(z)^swe9?>
z2ogIqfm|y6jXN0_3$bItGV|=P@D^U<m|sdA3&;RM+$43EUXW<jOgc2IrY}-SrBaL>
z%ajBq<JosXWbCFH;c*C-oTVg{s7#>#XlD@GRoPkBNXdC;m@mT-2<Lw3RLCLzGDH64
zXqAoHDvLc^h4X-W8mm;<5L@ipyc>5O@`%HvX|{-8;*J!in7xNBT4Xna?o`P8F&+ZO
z8<5Ebr>n(sCnpruR0x_BQX#${KJ2$@1Fsq+tV}{*RnRp`?U3wUNp`Eu+$<tEXN<96
z9~T{8O4Z@;{2xGHh7+saQTSts3k=(IH|jUb)XJ;)tu=ahyr~kEoKyf_9t#v0$3j^y
zRuW6p+X!dmsRVIsLd|4MlU#J9iV)Ym3QpT?w+e?@|8bGlSJ7>|$!g&0$j-GM4Dv~{
zNQalkt|<%aMeGnXK-rLYn-etaLRlM0rASM`BF*L35}6?QXH_K-+))dUX3Xi}wxS*~
zBkn3AZjRn`UCI09_mc!;?;)IgC{X-_@73|Q_deVl{g$hP>HC9GNKyrB!-L=L-R$)+
zzMmiL7e@zg<*O?45lE*$=j*I@{%91uFKwz4U<i%hHm;UIBZOPCE<)Ef*B;PC4F%ki
zkwLuEHs(`As*cGU3AK<|4K4E+knxKU444F3A{cdufTi;d+B2u-uhQWH_sTP^Y<wTq
z478p3Yjt(w7I&^$E2A;ML?09HBx5>8La>`rh3Za+*R({W<!tQXbBeO9@@;;fvR}dP
z;_LjGIlRywa`sqm)-1JxTQ@pB{qAp`UGej5wEvp^Esn%ap|kgOE%LX4Ipb}2_c^}l
z{QbZi?tG%Je!Almbu3SlL&mTDogRD^Tl-9lj$H2bZ1>%tjNa`AB7U*k*YJC!6#8nr
z?_TJ8xnB)_0kk>($@$+1I~Ftyl?Sv0?^8-KeV!`R9chr%8;jW!dHh=fvs99Iy}(>4
zQCrfU?OL{x7+&^TR)p=*CZ%t2WSG$jI2k0224(8ON3G@(17cHC;v^Bp0%w<5H9LKy
zolz&yTsoT^VXC>tCf3A`{*6Gfqw$fsmXcBo;=%q?jB?a_^8tAEnOoEn`+2YQ=g?eB
z=%;Qi=j=h(l2%zeV8RC?yEVq*kc8W?1zuvO(erpU$>-f*X-Oj!@BT=Si^VFYvo<ph
zq6Bm&-h+Lb(xH2U>bgVn>0cZ&6<(_p?T_rhP`F168IB_duFr>eK~+e)1yTVkXez;e
zAGI`1yxx(PiUw4m1d|F%u-X?Pp8$NaQ3pi_QC&Puj&GyWdx6abMQ*bV+O6ULwQ+cw
zh<K3rYf~S?u;i{4u6KODr9ujhZl6xJ!_pBVTrERY%GjSlNxaNTpObe7EU88jY6rri
zT`GUv=7aqHQt`A?t4djfa-H@gwAXxO)V|9QY5<VKo6!d2Wu%!86Tt;^Cp_b@^A<ns
z+y@<N$o08Z2|!#$uq%h{#lH%}3vHnv4q;l{!W$2Yg-#_Pk%PTF_GN{_Nf%*gtjycd
z6HF&Ye7&Jz&6>*VwwiDHuoNYdD%{t<Zon<BWHNN3wBTs#v}KJcvmp!+c1?9Fq<Y*=
zcsOcJ9np)I`8X$1+XbNG9lCM^2E9s|EKpEaW@I&U6*TM+Qro^;5xEEet_<!W3n7`b
z$cSjHcw;(F1KLXbR$4!rVhqiqor@V>&zBKOUTzh}4zs+K((n&<%JeB)WkH2+>hhGo
zH5*OFbbjf>t+#2pu<*(tGpdlvjLdIv3HJOtsu0Yh+kFn*H>$;oN8vH<P~C6VFOIvZ
zDr*oT#28J`2*%8S<!+Z88BMK`qt3~K@(*5DPpWhnPlQ`$*3e-<GaV|fF`>F<&hCda
zV}Kb8he9RUw<a_)U`%H0<9nhH`6w%4Ry}fj1)&**7+`wjFf5y>Id-ukb#Dnt6wZ^O
zD>UbRIT<E$nm0hPdsQ0GS_)B`n}WWtFHrz&T!}r-4($dJ?bsQ}602?FzGpud>k3!T
zP1~|OiMFh=>WBUUPYN=m8X;z)O@U}}85VOb2?*r&XUR8GO+ohSmt@dAH=>GHvop}3
zVhYdBy9^R6i)_pw37RD=Q-;ozgbl-YAjmUxwRqHQt;MQr*pI}5jIObxXb+7Dx9S%_
zW1>ssr%&sIJPsRxCL^4ufEi2B##Np%*Rp17iXMt!6iwUVO8bg~+HF!Ux9X*cY~0W7
zxSnSV{ilfdI2|<HK#rKNaKouEdKOaHbj>!j7gp%TzKT^j)OyPngfk^ae7>V$X_8Kq
z+EC*}CAM9uiuBv0oE=h%3JKF9PJ>QGDw;*Aoa&^RmAfI>!cfL*1CF0H3P(#EZBxPW
zu#4>TL?k>AD~hK?L@YbXER!X#6=T2z>K7W01?g85#F|(y!xt*Rj`|heAe0NhpN0sD
zA<wv7dJ$)@#d9eaD=n}a8~7neFKSkiSxWGL4BETAR9Hw4?F2qxKQ2H#xW^Eq2*wX&
z&Q9on(a|S?fC7eZbzs>Wh7J{2c0GUXr}iwhxFPs`sNf%#cr!X&CJKzb)-8s))?W6~
z1Y=!p*5X`@71S`rktO^7N1Wu86I0)8jw9H{ZnQuSQ~T5UcOLvg-naLCT2r$XfBzNi
zoQ;0>7mLj;bNv|q6CDqhuk$*Q5WCmY^t-?J>E3rX(A;VMZ_#3|Tx>Bf;5xqTAF*W-
zUH#LL7St}kI*kM-U*Y637!u@`5EX;%n@srPngyPF-_k6UZ_0S#^+wr2P=8*iHx#ci
zEwrDF`KZa?UK{a1yqldruO0P)y!%<-c;5BSKf+yV=U?N_I+_kBI$G$03{Us5IQ!vs
zkM}GOuKx-l5c#j~#hZd!{A(B3rTLs)#VyyTfsYa$aYYN<IWc%D{96rIo~{>osE6Qi
z7Y!sTsRo*v1WM7&YKR^)Lq(5uH#kdNb>?`y=s0|Njuc#A#UG43GNP4st}GI9njvPE
zC68FJuOPZB;e}1xCULB=(#(>Z_5C%SA*;~{nYzu^XuCOtPr&u$rmQ6;Zd;A3A*>85
z3aS{$_ll*_SD{o~`$lJVH2inm_10ipu#QlFv-xP7Fmgk-C55-7f%LYoe7%j%r`gdh
z{kOH*mp>2Dv<DXI!B+}~(*-(bKj_0%i&^u4_Nn~?XQjh~O4DH5B<m7U*Wx9j-w!Ux
z#CsRmY>)Y5Lh3?r;M`T$<fy-x0=rE$W>J;$+J%r%pg#egF$=azVV3k}mTGjkkTn9g
zHH{!eheCq&3|!n^<9Ns`OAE_eD9sg|;1_s9e?*TmRf5p7UleZ$&i&R?THv?)_Orjx
z7|AW2vqMWq@1^Ngfq0w*&txwKF^pGdtNL?92{|Kigzcojqgcp*2qrnRbwf`J=I=tz
zR+#~#q+kP~av>rjy15(xGGf(j;wK;C3Ft9$t>~)O;-VBJ?rRtt*1Gw&FPI2(go=2K
zFQXHg>G<67E}lr9Ajv>`#h*B`wlwRGj^oN;KrXB<Ro2N-Cb+g`0-EEu>j&|addnGi
zO;)qo(4F;zo$6rezt6t2*)ymKHdMAKsUa{3c9z$O;(m9qG2?w?6EGZ+%Kn!bis_eB
zGpWoQ4CBt;`zHUtUybR<q|=r3q2Nlg7L5-OLtv4kS`6D2+a)fJiGadQN^tMKE3y?D
zRy|)cJAkUdCYa-Pg=4pGr@k`YNDDYp-mc(@AZ_;RxG!lZiKca0d1m!_E$!#@KO<bv
z+LN}QFb5>fAPJw4Wk9YF5E@SOc!$qtOBmE{*#>3*Sz>kOW)8Pijs`*i%XC_sCc%JV
z(KsCFDXx*7eiR8SW)|5H?pmBaTg$Gr*M(TYEd%fx>61?-<64!H#((OU%+kzXDG%D|
zt=NWNr=>R(4AGfGH^%SSPaxAI{Wh9MbBYy8H0feBM=~AQV!0VvE7umXQdt;u8ha4K
zIcb0kU~51{ut|)uX$PQ%rNE629z~BJWYmsZhg0LIoltxKG}{(L&TF;zRmp?UD)h2j
zD&OJbgcu1#acz(|S1hv)9R4f@T-RlogLezM;ZGsbW5A995?Op?qkx19BBw9#h1vk0
zN@t~EB(nO|@=!omkMIx4aVCaUAn@LyVoCtdqZe|G#n$SLCL0A7nzLoo6h7jAJeD31
zk6K^~ff*NtVQ3HnnbeK&5Pua@)(OMv>u|U|ahl1#5&|+=rw|G56b7`h$gE#N?7QE<
zPyx5rf*I(ZsZl~+1A#v`)S0<};ilxKLJdG{NV0H!`Ua8YD6_E(mjtdCGf{Z7*U?0k
zU}5D$qt^u+7*5~TRz(coR~%uecw_**+1Apx7;I;#)7u?|(1Lv9&_pm3?Z><3PNfy1
zjX9fuYB9KH=3UXhdq<o=Uy4-n(oYp=^Yn@dhBj+YX2CzNtWkVgXM9UH8cVK$E9X#4
z7j-tV;&FKy2a$ZA#?-F*`$NkQ4vk~cg6`Ro_Y>*^*TODA)p-P@c8+HIMXt$&1V-Md
z3;>732QDoA@aEttVKJeSP=ukI@<B|;3GyIyclfMf3v&d?c@G^XwRV)T&(YxVxKz1(
zd<0Q=d314ZX}Wyi!SnaUu1`A{AB(OB#Gp;$3D~+r8<<yGpPC}cwrLLR8B73Pk1X5s
zQFdV^?jde9i!I%Z-q#lxr;O#Fh*tikH6E9ERBew+x2|{SW?DRcyJ65H;&Egk!e`}}
zG{btE+B)1g`_$}*b7SB6?)~k9O*9b!p~kSDNLywDAEYC>a7dPfgyNYCW(54}z%YN@
zv0zU#=KOF_G}@|ruxLP7x-tr>Mp6=%J%}*>uXF1b0o`f4ZC%X9fvlvi6wL8<qW~MG
zW{*;;;YbfF0b=MwKCVN<O9pT}e^{EB=A9%mKP~!fTH?b%C)%vxs?Ml3nrxZ1Z;*~4
z9A@KhOJhp5-owJuI05O5a!q+DCoDn`Z8zB*kBo>GMkdwmr){s1AXU_6O=@c79J|BK
zxl}AgGvk!On)|3CJPi}*=8x2KsOHVV#cxSu{T<ia4a_WBhO^;rr|XR5GE?j@f}3Cy
zQhTdsp2Kc;0&!X>+%z|%Qn+%#;b9{X?>-O1knh4hbxguyW+JYAx7mEBKOV!(4@0L>
zS08wDtw(HtQL8?b623wu#B34cs%tY@z7xsA;0;%u=jIKk!Q+Lshk+`E=`hkSroOWm
zHogXMl)bxW47yaL*l^C+UHL^RM_a7gPEWy7FH?s1@CJ8@d;Zo`j?b+cRfXEz9Phb;
zRN0to!;S}8GA$16mqy@6hM2{C34!jS7?2KkMC+~wjr+vcJE6&A1ZJ)|3vj8wfR7Jo
z+~>w-AnStfqS<vv?!KF7bOcNF8RbJ%T@Uv{@nh-*k+c})w#MlK4<RH)lR+k9eu(We
zuc{sh48@~PgVQL>x@*=z6bir8`2nI|YsU~*h~rp!aEgpIA#PbMHE`RiS8RbCjD|3U
zWXM`^QB#ATnS_el8A?`ot}7By(efEQpsI-Ih*W4!CVbwQV$(Hxk3Ix-V-FVy5e83A
z_azL)pG#+c>%u-_!yq8O$%HBP@5eAaWX1itE%G1m{G=x>*5$K1C4R-v7Hpe#8DE-2
z8lg@mm9LGN?#ud@FFQDE3MZ~DEqOBoMp7ptM9k$)&Z>D_iN!YAr2Ip_4*8gs3(8DT
z=dMNXJ-dk%Uf$4ak*`a`An>q7`7^oXKFbkjCf{7l5H^dakYFrK8azSakF&_kb;G5}
zk8a&>#kz5tuy?1Otq1B<%1WPXO2sXQh>5JY-mfJ{H+dR=f|P57Liz5Hhzi{FGaKSX
z{KwiMIKppfsXOccVUFWK9Yyxg0U?wgmCDH{oQ>lhIg}&>QkxX2FDU{@?hW><uPHM~
z9HX_k1f`fF;fPpBfUFCV0O?KrB_FKqkLUA<>XygT+4=C!r+r?@LToEb{v-W14nz(Z
zi7p<6**{;2XGOuql2MdB2T!a3v85WM36cu$RecDt3B7{A_st2Q;Qf%+7=h`@mjZ`G
zLqf?W>kawL)4;*%M}kM1C>&ch0LN1%QrT%c<M@M|p_S=Ra9ORf!P*~|#`#J2iglc9
zQ!k-}d?<+ovMm$~O*Jt_lduvZk-i%nn%OxeHdOslDmMyc2pT~IL72oqV-omQM>;5g
z6c-tSIGN%wdQCy?mTncLL$WmRDj!nJ?S{RjkZfV6%gj0!e+c935hHu>xiyg>3wWS)
zt3D>rmoFVXTzHUaELl0Nb~?$Nx`C1++O>;e5N7&eAG}ORv=&u)#vv*APN|0_#{QM5
z_QuuTZjffqK)wfOVd_eOQg^cL4p>WYyn7C<Odh_Hh!~xz{@I4cA)Ew)LYYX_WV=Vn
z$cHN?Jq4!}^Kx#&hB`l=T)T6w|6_6VNb2x!Vb0iQ*Bjd(?`R?X$9@iYnj=k??e^FE
z&zdLfZEVx$k&)y2cSoMRc1PR)-ooMN@H7;k-Oj(|_x<ZX6!Ox?ArI#z5ROCh)9mK?
zG%+)?rM`yW!~Oo$_9L$U&cmq0*YF$qx6$e6?(k#y*FzsHNy)VFesok5{|DSarymGD
zettpm#w+_R36Q#!36oFu<4$kW^JDtAiO*L1H}F@PfUeIc@knnZC-ZWVzCf8Mp$3{D
zR{}Lq7x8cBQvV$X;`z@5R6JbOC>+R-a#g>p#q}Y0AUvI|Ru|_#cJiXnPIQoBve@@z
z`PZY}@AJ3Y6|9Z^g}#UD<L-X+J@g+uN>X%x3a<mZ+Z)(Jz+_7vIC0j_&jCjJgx|g0
z>x%2Ubh_bZx9b6QZ=L1c{eECy7&p!Q$N%o*U~PL<bN9*rYe^wm@-px-5dJop@81z6
z?Z!3aniFRs?iGe@PGvo(v|`}Cyw_h#Hj6C&YX@Pyba0pkU43{A?Igmv@tMjqg^({s
z3@Gr-3)pny84iIb8i;{mKQj0{)iaC{s1K*i<R{aM%MPthv&$4F$Ac4jyXHrT{h;GP
zCboo~CM!k*3n-BQcn3dVmg%JxgSjUSP%dN~lk7Dt?&z>)qyqy_QWV*AP1`{dDMhBM
zT*d~A#px&dR7XzPt4TonZ?>R+IaYP(u3Y;v(E|`D=X^9fG>1bbFdw7sk8>7Koo5<P
z5kL5d;*UkWq_>gyQW6x`KWC+Hp5ryaQEc4o`9Rl+^RPx3gTyLY;#*2ytV@|C@7*Q>
zb43i(FlaalEK_}OlOq9Wj~qMCIx_jNpp6?Uk}#R{0Amcn5tyLuszX?Y(DcyZFTTh9
z#_n<)O|<Wc`f~k9r$hrS$;RerVG>SOV$e<x(&deJXiGSa6mxkzNjnKJ-{zJ9p&k%h
zVvt$@f&}5bgfRiZs6OzoaZ9?D|Ac-!>qYBDOIo@S_5#FpQ^HzxrYcL7)q2$jNix4Q
zXkysX3srGZ`H4u1rO=bmRX_EVG@@0s;|HMp<=WKt+wSIB^z4BAd<f!G8RTb#7D<PY
zxQZf3K62DUZC)e?se_?v+;=Pn%?#KY9beh@VsIPR{PN;4(o)33EWHER4;+v?I3U=I
zg>Hyo=xRh?Ft33;h|l9{$B!txpWl6SqSzfZ5D0G@Y<o{nk&tjr0)v(1dxF36DS(ha
zsU+c2S3YTIG~%5V4oL&PTl$gh&Wwd>%OZHQdxBrRPtg;Jp;#IPxn;_n;jjDN$Lrx^
z`tX}{8bx{?qBgp!*mA)Y4r|P^jVfFoAPk64m=>Wf-K#;R>O|3}a74ZK#}W}BQ}mrD
zgF1MEglir1<~*v%R(QA@oyK2$AR0(<mME17eOv{~pE&HalDJ}I_nvyeb5pzZk|(xE
z%to4-fVj7Ne+#j9oF>mpE1UCX``6^*Thj6o7J|i~7-unY=k{HwY`Mhk1h9<eJ#C6?
zw8MAJ(*y3%kB>i+zIRGL$6?lY?0WL~JYE@jc3QPP?*RJzc*4w8Z5Un{_y%9U3Uz}U
zM7QG|X|Kmxe^;}cHJDpmTmbAgkTR_1>J0(<IWVn`r*h`|dFor0IEh@s^~j_S$-BeI
zyBnWaY9k<TzS6ZD-{E@)AUwa24du_&4@q9!_Hik+!+(`-@86FHqJU7weCWoPw7I|A
ztS*tLB*eV3TF4CF(3*=<s2J!7t)hpkf~5N3;uQTo6GMXuG*k6I1Kag~vYknQ+qVsu
ze%m=u=NKSJyo?<z&d??d^7%B=dpdZ3(|jpoEmQJow|+-C--t!)yII5M&p4ttpByxn
zNZ6=*zt-%v+-G<+xc;}H<!9vR^gKa%z*T8|9CtE&3kod^+;MGFi&OIGr)Q8lwsbob
z9kwakYEJJa=Cc@L$zG7kbvBTyRF%Jm#fC^y+o8|8t$H<`d@{KF#%A=>0_{!}`r7Y&
zQMFvvSV7}Rx!?oIOdrzlxBKDs?L7wI8BoaHE=}_j4RVY=%d}>IhH6g?2L-$};+zav
zX=QQTYk0{sB0;Zt9zt#E0dQ<xovXW-B8RGKChkV2L~EnBB8GKb#pm+=g3(X;+;4Tu
zwY;^uUh6n@%~ooZRO?}~@-)diDU4Bs6^Z_<=Bg0Ig~)!ZBMeTn`7Mo=<DTykAh}`4
zTdka0i8B1>K9bd#=iDcrHau{M{PT=SjcBi7PK>P>2iBwG|Iju}mld!$B6=t85dtB`
zKR&?CFGHMvze3B+bCeNkvmJeGF{Ja4$d?Q0F&3Kln2&o}7^6PZ$jcsN4l;8&F1GU$
z=&HC`PUwEuUl?6#FCwK^fK<o<ioTKM8J>iFZs!+kyrbLl8-h=+?$FKp+)m?<7=ydZ
z++I_d$QCyj-<JokFE+1fR{Q{%SbRb6M?}&Efv1najQ?RmaQQz#pAX!0NyqmAy#V>j
zX}Mx!NqJM2m0rNx_yJY_aZjJoPUn>_g_Z~0B2mHI8w@@C{%6rHS7q9S>7gdXVB1fI
z4k8*mBm#5oBBw1j+Kal|GO4D}PV4Aoea>=v&v<+_zpnXm=F3OXPj3}GJlrJ-eh83^
z0j+M1O!bE12;s4@Yy5={MAl$)NA*a^VkN2~$gN;{nZ&&@!)$?smDn>igS(q8<bq1k
z^N&PBw<=`k=G_8m5Y<aP&?gia#{Ck$q0A|XDwCw(nuS&)4%td0R4dj}y0eZ4&ESAn
z%W_5n-h|n4hNsMaF;Ck9s$gOTaLo^Mp6->m9eILA@b&sTv-wY_X9xK_K1<l#tuH3x
zA30P!{1yq9@}b|xu1HiY<pZkx8nNRqA4A*>nbf*Cuey15=Uf`_QljE?iJ5O8_Ezsd
z%R6<~$`}WurmVD|f5Wq5FQclQc*kE?hXSv7x&l6bC2+~yI*pona}>V<ATvv38<42c
z8H4rM=xpMy(ddlOy05kOk4!Ux-T;J(_(PCyUX8meg}9G4ak@^aM$=fiHVO)%e6pC8
z1LOM=p#o_)tcP_5#MszV@Mu-s6Ozw&Y0ZCgL}vuoByY~wW^XK|PE3=ISYH}#@pl1`
zCguU$og0`Fnj+1guM?mt8jb0&c7^jP;rZ@$!lle@9_Is2!}0ccW8qT+tIEi2tm&rZ
zn3<E((Q~ZXt@wAVeVt}GeQRD%9H&=l{9fWusQE2ttiYI<3|leQD`e0(Sk5L9s0t)-
z{xiom4O^oC$qZT(rjePq0f3nl(BMVyK4Ke6agWHgfpF9!gFWrK7V80HmqMfYiLN(P
zTNxLc1g_Z({MJ$jqck;HN*#{GonR?;%~A|4*v^Shs@)4rGgZ4<`jL;ZuML1N*zD#n
zs{N8?DFdnwgY``9NODZAHX^WNWA#j;Ij#xTrQ}Tn<~6LQYRA1$x|sQSUYkr|$p)W|
zYI2=dQi03o?tWa%tZZ4~t1`48`RZlt%-MdfS<KQ~yPeD}^@!p2FQPdyEOi>=mWu(b
zsLqeM2c+jVW{Gu;^~zZ-r^C6IS{jl4OUxXEiDn-V4H_<T`kwIuuDX%&Td^yMk?r?b
zNEY{ne}F_xwA~;>CnLu4irwg($?TOAOs9#t>Er_Bv3F)FdIGh0D$hX#|IvvksoCjj
zC<#Hm9KbCdO*$rb^N(}svtgE0w&k)uCp&&5v(853fP4p$8v*4u&k53^$gieo-#P5&
zM$l>Q=C92sv>UXCw|j@Fl|H7a9;^LD8qj>(_f4AIl00;C$X*{;wp}b}_CWO85AXtS
zVzj5JJ^e!HXseu;5vjgPK72=z#@S4Nh7DN2q1!7O7{nqzfarL9j&l1TMBh>t6<j%d
zv1UX=ANn>yADG#q8cPUuNnH#yvpZmLOw$(OY3XXBB3wcTmJHuh9*@rN@9tJ|iq90k
zsk#}Bx!y<MtwVSM!To+;69`wrIu#rP>w-K1fhyy!8EB$g*Zg$2;x+eT^~0F%j-|ei
zrMj8iUv)oSf9D0g;^^19H=rFR0xCF&KNPflX7Q$prfjRJZ0_Q}+=rRm`q^9uQ$_ji
zH%DYgc^h;#uPMZBd-@_OiOGDZU3bS{{1#~X>F6xyAG{tsMzyzP7R4)Ikh25H)aF4|
zhg-PNW1TL_t~n=%jFFn1?+s3`70x$+NhDIv8pSUkuY(WQaa?(>66ZmK6^EB@W`8*s
zOm^_oI%ehBNN*MOTGlh+w;!W+k6q6*zB=CzRM7?d-`2f#UoY9(;bs75so#(Akq+l&
zh77v&(sX18W+GNU_4G4{LvPE|k;5m<dh!AJf%l;G?uMJUCwPxDt~UBWQUpkfzRx$S
zAE_9RF|^Y%bbikub~oG0ikGMxXK34fKg^27yYtI)PvYZ-0<|Nf2)^&D)!azjuGTWP
zyPYoAvt8RRx3X57y>9clRbGefrh)`cf^FI!vk<V>2rURj3v}L;sH)(of5PG-7W=bt
zv*V)}R#%bkqHK07z}Pp0Q>gwQ#)T=I`^cmkk`uHY%|bQ@Fi*$qL2+c6+O0~QrYK9o
zdsQQZEER_p)?6paqE#V_rpl*bU}IyC9d~K*a@^+ft9o&3cyp1*su?gevGTP3p1}E$
z1j$91H4^rhA%yJoH+Y8pqR+HhSApJ)#zHU)y`J-QVA)_`^|}P07=H+RL|X~a3bH%V
zYqVuT9{I=B9xyC!G!O2pTq?($jU|^-rX-*(D|N{?R=Dd?x3hlo5@#qLEX^$vAZCy%
zx1h-g#X4nFrrG6!IRweR<7yrddZlpf3(Y45*3S^sso}~B`8fh!7M!*+*cu*p3x-Xs
zn?ZPVx><?B1qC6{>RiG~))C`pD2RR;%Xt73Z6e|7Wx*tBQWw_avcqL?EXTE)TUd#!
z82N14E?$627n0PQs?&u(8NWsu2FbEYLiLd-n&AZD3Y_371T)S$h;H(dv2YWyaDmtp
z260lrjx{=+N%S#_C*)ks{PF6X)p5{w)PYE7=|S3dl>wPt<1~p=oeG#ghST_E{K7Px
zYMlTL14`|CguW}H5(wZz#?Gke$cJAPP@J^)LGTOCtYwDA-jyst5}hS1c#@w6T>1gl
zP5mGw_X8M2iENQQJbo~Uu@g%&mKb#yS@1xA>C({$Qp)l44>?dvA-SgWf#ET6R@J#H
zg_9<<4cX9j`ZXeSq4g7RO(!GC&yo?T)<?n3NmW7pv3Cflc*oGenI*xr#$;8K%3$@X
zGCB|tSm#PcO|<AhDCY){{R>PJawAM|MSZ6VjN<8uMe+FmT}e*~c`#IruVI60g^hf$
zafN`k$#DJO5sC&->eGuY5<C?`7^XLlOla?E^Fp6SmncW46A9XJIIult&5~>cov<&=
zMez}m7st=gyR%Qz^M;qr!g_dwrG^kEHL<<%%~*v~z@H0s@+>2J;HAQx>rlWZe?mdM
zC<JiDy-Zv}Ovwo{s}ZJUMIX3@6#I=zgCefHyCNBA>jA*8;1c3s7yv&01K_cwR^_Q$
zT6tok{DF>S^@QL85KE%eyWJ8kjkl_55~k&;?xPRd;6QwVPhVluo@vh0CLJsRBwl@>
zDnG#^ugx=X)v{vWlOCj?DM|z*Y2IuYM<oD2NR&LXHwvN1Nj5}K`2jh}04QwLbx6o3
z;O?jxl(yS?*C|TlY=fhkRa_9zlhl+3%lfV!<XK1@^FB!>WoSBA@d6ki>Rk!;ePYox
zceZKexTTPRnrDm88Sd$I9b62RgVw}U0HH4YfdfvIxu9$-i(%?dii(i@@QFPqwHfuR
z0dUJr0|pd~NQl5hV$q)43kj$nzsPkCax}O_nX>DR|499)ErTLXt2+#R36xqoY0?!J
zd<I9-@uX)Lsiplzczv@{_I&5lN|lS;esn3x)}YfqL<~8*C7ttqVFI(}=j83v$_J5l
zG{&x)`mS%-hWs)V_JK{5;JT^&C3Z@1dGS=KN*A2S(pBz}Vu}XG-0N5;o;?&dR}KPn
z&h!2uSOuXvfmRQ!<7$Eki6){IAf|3WIEAoImDU^x_^L_V7X}lkBzyKsfA>voO{}Jp
z=7|_=F2PEAdUbLWr>QXJslLdV-Gi4b7j(R${XLfGm6%ZR$JF%>WyB?L`}84HPi9K#
zur#wRClK<s2H7hcVTbDo1G()TGx9<cQ?^GaUaA+?erXM=QW8<}TkMXat1B&rZiO8h
zKKFSVUS{aaG2%k;y^HU=M0F%=kl0>F4v^ShZklA1&>EVJ6C=|fQ<81G3up$LCzf2<
zHO#m`OA96teR#khk5X0I+!B<<RwfE@375gF*(!qmrK0IY?y6DK1OVMo%e!w;3B9#R
zks4B-F%y~cQc{t67cbavL=J*14VA6;W}S#6AFREQQj|joOC)UcRA!S{$G$z!hm>hO
z3IqwmZ)+h@T4Q3Ywpb@tm8wka3cTE}zEF!kAD0_tReQ<|U^I>&CwkH;ypTxuZxoR%
zHG$EYN!O9jDhZAuC5DjNs@FeM4hwCPP=2Gw>8CVb6Z|U8S~twRz#q_{UNai42we~l
zuGPSB<fFW7#}SuGrNXG5^4EOGUFfP3aEu(%NJDupR`qD4(D|fxYXTj1xFrc1?#qg|
zMWLX^bzTUpTJYSEG_XL+2}JjQUs9HO$~gcE%k#*QElzXs@!xLqBFBoIgq*7;8sMg7
zdRev1hY|M4bX0vYwOxZEvi;J>3b3<rO(%5_=t0I~#Ys6K2iD$y9We0yLUD%tQAYUt
z1fWapdzTDMpEA&^lpM}lal%8Xq}1+53p>g{<UiPV8(7pEWnB9(Dr3w{(n%iY2{Mf%
zt<WHDBU`qXxrSDwFe*D%3nhI?JrjgzllvE>CyMz=Q41WNiKe;oV17BNBArFsU8bRE
zgBD3Zr;15AN22#a_v7^^nxL6%G!{FB3Fh&7K3TLhrvo=^yKA1M7%2uosR5}<u?jIE
zlIxk5Izcg%F?SkWOdTDnvupykfk_U;qEjtUMW|MTH_{}jtRh7qphPoR&DBPynoK#A
zieAAw3OJuYJ8a)lmp2h=XVsu^R|?tph-$|;$o37+kXnu-8nVy?(K~YhvePx(zOTJE
zq}9v34#*{DNw+hma9SeprXnjWb|027#1o3eMciWW5_xgiM&e@g5bpYY*`j|aPVy*=
zQ4Uv_z|kNu(!)+eHmnVg(NB#FKBQ6}FOH;ikx>|q=%DeTVL%D$T_{DvI~gM$enQYC
zc<iV$eJom>@Xx2YB=S_1L5s+9=6uEfQQ2wp%g6yL#+CJNhH(ZKQAAo#TtI<?fr(TB
zv@VHA*`~X_dC}U-ZBW9SrZAp&B^gdI5CSeFJWQKe7v;E<U=(yS#vY0#cY<(;C|8ad
zk(j%6Gb6^kxM((3vbP)UJBu7<#*RoQ2&HrNVo!qex#}BSQG-g!btY)!|KMUV6`N{x
z$1wx%7{AjzFrU1=0NjJtH5~Lfr7*5A_p>Y~^hdl%`Y%evn0Q31^J{i~%^c1Y&kjFr
zj)Ws(9T{;CAcQ~_bBaoAA~Sq<|L!cV?fsp#_C!CWr{Vcji7X{yuj}lZV=+rolsZ-u
z+>=A<MRN}j#mOdGzYJb8pP1E@?XAqq3DYRz--V{PS&8EZ&~A5tPaTzCk515CTCz(l
z&Axaa)4H}7%^%+m6N}-0pN@LoyId^Lo{#@f&o+1y&12E+%0&s&bUqYc`lVyPNBpxx
zj@+IXz2pMZHh3}>-y2XO<F&wQv5YQC{Sb`W09Dp8^0H>cG>)-lUB6W_<EQNPeE&T7
zkc_ey!*qp9zLYO+W)qjZX;aE04CnV6+fno%YJ>XzQz3L`P?xCPL)FJmlZ>IqTW^t$
z*}yds!H%}GRfqh#G4#wj-wkyUN57@z;qLnWL6ukv&X?CGiCx5{TPvH50s@;+04dy*
z0H`{l@9U3F{zrP&L<v>vc6qy4L;2-aPUz^D*+*5bZPaRpkp4CnjnI_Uc|V)0?Zp|$
znI{ye6L9WVoi?`V9M0U4STz@MF+kiLn_}4DS>7QH`GmwbCF|zpK}!wCNB1ZGj{U=G
zEZg)U!)oc39D4I)Sn*788iBV)TS{f$0TW2Pyag@2T&!O~#df@aDn@Ui$e;(_qG{iE
zX_Qe_azJ<OPZW(=<7~26rX-_K(DP44nwYp?a6IKH!*`-QwQW>Zw}v~+K3$WH*4Qrq
z>)r`Y-(pg7C_3Q<lGttd^|Pjp0$q4VcH4Q!{*PW>hMT-^ze3n0TM#Gf>MaH0pRL2D
zav8Q@NXNv}e+Vq2jtLT5uC;d%H-VvcYEQ4~=P3<w-a{$cN(wlVtF6u`EGR##T*SdR
z3)1G|416|PewvIx<G2NT_G!+duRE)tfi^O?3r72rmw{0X8>@HxgbgdTgi+D8^j@0o
z#MkZVq={F*ay8JHh6>IK-n$``7aqX6Ig37`mVuHWeeaeEZPbmC+4k3px`CDE2`NcQ
z*eYo6LQ()2tV@F=Gmbby*EF?eG%5xzJ|?iDgu`fWyQboG#^a`%8{yyEe_xwgB_ri&
zrHpGcN<>L-i&NGa$-%0O&aMX+lxHLE(dPK`q&w$WqcypItC`<I(tzy}hf4E3heKw;
z@X^hLfeVnBNRFXoi(Q`z(tBOnfA42+Un8v{&UERla&915!+Ot*-^h#dzU)}?Aas^U
zWe%`&0K=%PKo5vJtNA-p!K>^T@)524wz=H|5N?hn3eV<9kr1CrBWJLPS%nG>B?xs&
znJ!*5>ZwNHrbA_OWaiSIa=cT4gZy;n1ihH$bUx3{Bp2TG_wj<zNx$N7e;DHOi+>DA
zH<aN@N<SA1!fsLz^nS(W<Y|vnTDwOONBHo!0wNtM>M5xR7F*a$AdeV(3h1YGt9E=f
zvU46~wGMK7Lol?xh^w>TLj~9lGbhKO_zW!*(vJ<HdyUKr+is%xc6+6PX_iV-5PQ;T
ze}^raXWuBXG_$&N>nBxi2b9Nq^IZnT5`#{J(#JiWaD*pm1y0h*s%j%bWDv^Z#zFI%
zlJXQ%g3ttquV>JKY4s|oFy-c|8!Q5++>m=YljzDiofe>nifUlsj7OPJLtq2GHT4G<
zQdH_duobiTB#NCJXonMAAcg|83OVxL(ixwX&}BsyO6Z{oX42p54KIq_E%u^5eC;m}
z4Z=m0!B!&NtSU+dM`_bd#0oxL{{=>V0bB}_fQv0y1cdTL<^vY%ua@)0fYbO4ApQ+e
zq>$4gt$H{L)8bPb#D~nZJLQ!?%aKT4H3>HD?FFnSk?R3W9-?>?iX^p$bfZxh=x^ZA
z*hb$<YsldzfiwgfE>ht!bo>Mf2?q_iReSc&rAVhvW(l74tWBujlP!PM?M9ih!Z50%
zDafE@!Yg#-x`v!*Onz(+;QW^>H&0qkWi7dp@6)3$L(I6I7t7G7vbmP{)f?#Z33_X}
zCzA1)UzFc4XcLoOuhpoRi`cGUq-?FhtYC&BDdo4q&eMI|-ZqGitunngq=semr?%Ty
zCA=(g0c?`jG)hmXyPl&&r_lU_dL+jp6ur-G)a0)Lpwq8R#K9v3KfC<gQ7RVwWjlEd
z-2^&nP(>-M8zJeeR_3HLfpMl(h`NM@tj<;Hs7#omqgLfr(3m-*mcgX&V}dD_Pvm*<
zrti5C6EuVl&vJSzHB%N2eeOkBAUKNxH${MPfyyEz6o^?YeA#mAZF?!M;cl~$z1e9s
zw!SQ`(RTBFzWsz)M6F@c;aLJLgOVQjuodoT(3{zawutG_0F!*R7crtta(f-%5r240
z8?%KC9hw)v530L0!K&a*{b6DOVY6Ccc?eE}5<lqLA=B+eHd^%nVFM=im4Pl#OJa5L
z$!+kuwuE}(q)azXRL}CNgV@G5Y6l#}ofbk<9<}OFxxc}N$9Zy$7F^QulI3zA4*&G^
z8E!B9Y%N)A(4$7=K9t`PN!}fIl>2V?F$@D^{s7&}?8U76aT(3@(a_gBCv<7rdom2g
z=7dc>K;EnY1z<t|Gw42MV#3kyfB`2_OF<3ssLuA)U^9?745ig=J`QZW;f9WL`C`7R
z;ZcXlRK0LSI%TUI*i?R!{#bE#EO%|N16>u%X*FDA{+w!U+-9}7tliFvW<F{!VD{{C
zyWm@6)-4UPcH8OwE^)hbHvc5-9f3AQpTctcn0P!H+sxI=GRL6kD^SGU{X{tj7{K@L
z@uj#gBnPLr;2u7KM~^({eJTFy_~jH7vIJSuOK?`q#CVthG`B;+noNHv1viOW=Ed0#
zesDxxO{ev-kwtS3ufF!G^t@#)nn*3*Tsi-QUV=~xbz5tUBvsXk#spDgsjNJVQN7Vr
zR-f8vt*x;Wm~JiFoQW*W44<9jhghM~b5_B>%Hi+gaL=bx{sS$q&D)QjZpMAQIFFT2
z8?#?rt}EauJkW^j333%izRLZZR(F3el}v@n%WcBIb=H?N^to14jgeJnoQx9-pnh$a
z&@oG<5R4a4hB5)2z4q-;Y}EHmT$(49)1l%3UGU1+b;jGNh~3l0>${wu{g2uzZz{Dx
zuTHTNRwsbZU!k4(#xi2n1DS={QiIubbb?o%xl-!S1bPqHSzJGktrl5F8r()F2$~Q-
zHZr#(Q?gfwmT(|<f?z76dJQsGvP|{zcv^l;VMn3tNSUi(M<JZmcH~ys*cv<twblz?
zvWA0sG1v*OJ^)JR(O<eu9c#6rjH9SYKtNlq<nn1unopqOsz_LZbkOf(k3w{t1;=b&
z1l}O1SdW`Q{rV0$&(E%mZTOUG&BDhGrrldAkN>*3Ciz@3ep+DlOCD-bd--OSG|I)l
zW)w8+j=!l0rprK!m+UVkAw=5`3V$(TtB<Ci`EZ)w$Lt!>e%-i9phkG%C%R1AN}@1<
znP)v#7G%ceqFYFARt1`W^}u=bdDveM`Ru<pHshdg$Ctv%9wA(F;e8le0*it3Muup9
zPZn9S;1G?{tRglkvQuQH@Q(@t^BxK>>Wh&)<bSlv&7H*mT~`V&b$iVXx8pU+{w|!Y
zDTl8zX#G%8LlY|`=Pg<DFBqnsBXg0%U=En|ldwMHN&IxsHn-4`hCtw=5}%{Ovl}Ua
zv%iKzPSMVQ){%rMHL&E!qYIUgVzEZQmM6u87k9?{1T^j0km>HCm$dC_pg5V{lB)u)
z)hqy&tzHC^465|XY?{84Rkb9#tj3<22zHl;G51#->)UULZrN60ebb^u6}wq4bPkmx
zvDc!FSU8-lM>z94xG<8^Q|Li5_1mvuw3EmLbxf307xG<MN@+kMp;hfUYYym{rD606
z1|AGSi>Zn=?cUcU`9&qBe9NT|eL5tpPBqp+LSR-4RwVtpOwT0@vnj#DM!r1!1^@CJ
z?@Y=h4nlrFei<^x2fzx9ugbCL5{9`H{>N9v3;p)dC`q9?@$05s>;AL*EQiG)3e^&q
zxfWg~B<j;)BosG|MuoE}18^qMleI|$$CWv|;GIAK6DWLX4xLKrHd)9Cxl*D6I#_Lo
za(J5(mLw?52nIKZA#O_h_6%9e2MpB%FEDtbSPzDz3{R<g&kY-^i7$BpEC4W==f;)D
zMXhnC1uhcv1R2iqKb16Dpt`_`gOhGW&~tLX|2_W8Ht2X&d4_9itG2WyPRW%^^H#$b
zX(r^{nqkVbr5{vI0$)5<`}JFc_d2ZUZHZW{vN)+8wQR2i>4j$)SEpqb=RDSv4N*aA
z;&w<Vr6tUh8YuKiGw}a;=ReX|W>J&Oh`U-->sT#*E+dC|@=9gJcb}|o+LBo6=wTh!
zC$E+Vn(hJGWb+zk9#dHIJtCHX8~C+|T4D74QK?%vUmdowR<B~3Hc^@l&%z*-0<uFD
zP~qb1E#JdFwxzzBNH3*I;ZZoJIt@&6Po#+jrt(dHjur7dq*tjC^P4pEKSSdZ<=U){
zbAst*$50(Q%T1~Q82CiSGxE?{`t*S=DerAOaLxz4F5OfxE)TeXwxd%Cl)Ul=-Z&Cc
zCDZdTK=kKyG1mRoWS!!^*O$FWIy{hUS~$F8nIoStR<8nU6au&ov5sHIvCv6RbiRrC
zx<rsz1`39!7>YGJh7b#8!OLyh{#0y%|6K`J=In_Rm2VhQ$MdY4wSphN*aJ+JKxZxh
zX`1wuMM8pyjL(3>#JK=uK)zjHAo>URO<@PWc<X{U-XV<@`0lL*$_}w=&hK-2)nWR@
zeu2e!y`<i|A&ln*8M;pVxIT$<x83;kq5BS3BElw*W<h0+LlNI6&ze$?Tgw#213%wI
zoZ*5r2@5<V*V3!*8)Nd#yfaRmIWMY*mEf*R_5tJY-4f<KbIeJ8VI?K!h4c8VK35c@
z_W#VGg}`H3V#>k8muY*T2v0K23;0x~IN-*hNrfA{RutrJZI(`597GL3%=@5dA{NdW
zknG59grr-!?nKSJI3`Lp^iV5Y8_Jj7rqZGW36)uDkYS>bF-R#Zp45bzZHN`B3HjfX
z55U^F^2Nn?&hPhJt$i@uMh<U;d`tGt!I9%|?ws$FKov3__}5rSTZWX3fqdC;0${+l
zDnz0|T^z9d04gbeI&**f)~W0c)#_^-(4K}rVv@lHauS(t?wtbq6Ipz{&Tz9GfzwQ$
zb-uba(;?ohqwGWoWsCQNU;OyA@b)$^ciZiF&*$2ZZLy|Y;9XxvxSE3_?+8c6X~@4B
zbQ&*>=E>$}I|XASU&sb`2sQ(^b&#_M9qv$|U6KA3iqN2u8+a|;`ZT0WrEbCR1LDjU
z^AB?yMGlwE#%cWf%JonOFUuaXuk%J1p|iOAU4Xb8wEiFfOIVLE%JxET*p0(mM!Sj}
zS-ru;{nn;az7h59V!zvY1dJkO8#ewavEaeG8Vm?~T5-K1&?ms?vM=D+PMA3nqw3fW
zgawNImUJ<RnO*fKUS8;dY&nS6db<nYj{rSULyOnA`4fAHn}t(S_t})h7YhJ7rNTDC
zz`XgFp4*(R_$a5@gCSSMT{t+9?^*$je}iDa0<4DKUrXBQ#TMA5S}Q}>Rme8Y>Nd<4
zWWX+--yFX7@Q;1o9wmOHlZ*Sx-D0QH#?JzxrR@$_z8ZEq);GADDcI1p{-g8CVl>xA
zUi=^NS_dQ6&F%T^?KlaN4GKda^~Dxf%rffB9WKN(_EpUO%P!cT8`*Y&_FbbBx0LqH
z#XN;5#*m6X3=s)-Los!s4_XMuN<+O)WMTs28zwo!UBtM@;|g9;x}qRZ^sAU?S2!_?
zbHI41n6V2re5SLC#+2gB?jy6bw3dSg;S)(_l)E1RUgR=QhzNye1dy#;GyPtjNVhxn
zqMUV$$0Wv_77>0fFJEus>Qbx;tx>x*F+0ObT8eeg2$vNn?y(k6N+V0#@!RFhzph5O
z;3c*q&j5vSS^c6N5N+H2)T%VdI1iwGJdYOu0#;%P>L_u=o5AE0YjW%IQR_8%#V7g1
z*f~!Z89JFEk{7Kmy{<<CLpLA8S8fsE&3O#71fp~i^@6bB==lG_3iO=>w3zIlDJ{u&
z?&y7`D}JsFkrYmOe43rCqD?M~gbRrf!iF6_`5MkjVvwULR6m#E&`;{Q=;^r=;Wi1N
z)2+eAxVSc=VUsMEC6hzRv(pd%rDC|)<+;x6R>|g;E?cDG4a;7o{O(|e*2d7!$X33=
z0L@l<M_DM6YWeTmD;N38XtUxdo;)?aFY%|;{j@y1N?(sev*I^6eVblS9yW`OXU&ye
z(X3tSdR1*$WWgG_EDkinT*ldJo9bVtXQ<|PRjn9^vs_#~nh_e~n2KJu_N_>(NQ2Xu
zF_w90Q%al$W}&p@IsnK+m1y@l4J{|nQtDl0_=#;yedHl{6@#7l3JA`Co7C+t0|4;K
z$yad)cpT-$iIDD?G;t6UH)hZK#U+>J>fHm#9#)KUDN~zEu1hury;;#EwN4$AoTN;M
z)RHWdS#-tljMxuYl#-K^Wul$Xcui%jxRtCN?$ow%9tSp;<_lXX`gA<Akw!F~eWYxA
zh+?E@d+~Cr5+7J$>Ojf|0<@T&h5Ia|kz7_QKzo@~s~I_81i{Z+7sB2EnltrtJK(_l
zRKbWui})N6WOg-5fFTW+L7RCALyu?SEhnv2;p8KH-Nj45TWq6jV_e2FDS?!}!jQ9r
zzpHChtwmy=>zbFv=%q<U34S_(uLHIKSv&rrYz0Rwu6A7jwW`ja3IYD)FH&q~PMNM9
z#U*fX35ms6BPSEWt;OyGxCEN?I?_T{ATk(gEoQ=;@=Xz$-kwU4sZST>cKzXi-X;SP
zXCj{yJfX5H3Sl`>noN}@x*7O5*%4s?*c9+o#pM0nPqQvk6WA;yKT9w+J&joLGhaC`
zK;svN?bi=YYF7*AzwXK|B)e>-Ne8+XP#m3#DTsE4BAFxwcwCF}@Dl#8`h;n<BCg$2
zk;GEm3gdfAoW&y)QrxQ(TP96`z(%}+zncL#_Zad(oA4fFlHm2xwwh@}d8{PjEFkDq
z4$f>6(9*o&0_Pt<Bsv?1!Rnn>Lh7L0jHXg^veGl0+QFSRk+dc)(|d0XcZEZ|S!DYJ
zipCxqx;I2rI;u0#@K2U9DYiaLR?z6uZ(A)27YA{uEXT0eNgtPPz!kga{3LJf)C&UJ
z+Nu<nSUqh{aRI!hN*5CcA-jr`xR>CG(61Dpr-=gsBjjc1K-D!_ZPr}zF-M&d+M^8r
zqSH-n(!o@|`$=P)VEoWKL&d~eMY5S~RXx%?8W?SS5AeuBF>{<Y5>Pb5-;DuQ4hqn3
zZ7nttul_JD3$fh=0FYHhk+Ym@{PaA!J6#pT(QG}9=ww|CmGYhc!yF?RvjBlTT}mx!
z7}~G=KSZ4acV^MHZDZTEZ6_7mw(W{-+pbicd|}15ZQHi}axUI&ZT*G4+nQsJ-j|dH
zBHKb<*O11-L+|pk%V?n$VNsTzO_7k-zuqS3(8l`(-~M_YPi>%NJB?nKTqXB&>>&<2
z_Sjd)j;dRt*VPYX&z4jN*YjK@2|L<dA<iT})nOPYI)x;hNM;zCGE)wTH4kb5Wh7Ww
za{k-JWe{-fx6X3JV%A0FRZJjy<DdX7WuC-=VRkJwdU1zOi-_t$el*R5Qy31j3LZO(
zxui3NY3)D?JOQImqiVzc-)?aA+6#RJ$Yk9z6_*?DrJNBQ$abqnH_Qkd`Dwo}5z%DA
zqfQFvuBti;$xhe>K8gUwO67V{$0g2+kwtSNi{Q&?@Pc(%C<B-%ZLj2~M$L%DO?YV~
zyKQ6eD&=yr8kK6YkiU~m10;}@$D%rr1<ba*WLA`z*U3lF8Zh?~aZDm!4jG-cS81Q0
zpU*8+VDItCJsf}|c`l=CgH}Szh3(asJ{Rk<x06gWKp$XugRZ3pa>X|iA$o7ET3nEF
z_@Yk?EgA~fohEx<L_p8Z@`Ox?p@#xuiT4c}1J^p<KE+p94AU9m@@%OkYWyZZ@Zhoj
zCDY+lWAmlJ;nO7V2ETxAUum-mC}U6xW$~~Bn4_gn`%Q1y{7k>&m=JkzpF8VSH;<&*
zsMIWL<JVbi>ob8(sIqL8_7Y+j$nA^saDV{Pj<4Nwo$3$Ba>#O~RQ<wS2U9uIhUz4~
zjj&IAiT`kv=ilV2BXi~}w9G19i<8cbpOa;@xp>vu4hCJ@*9)eK(SC}a-p|JwzYUAE
z2_{towJ#N5gK0ygj3^*Sg=znJ2XwiL%{CDyO)$w(Wp`hUdMSh|z$TK$smoj*ZgjDj
zuJUITB3J??tlZ+SI3`3VkXR0o@Hox0A-yKcl$I^bw%dUS$Q?RHtmtMo8^2)LMzRgs
zN!fu&8Me>Dh=Q1zClWV%h-UvnMA?qwNdD+gK^1MsmPueB>O~(LWKCFsodzmJXUGPW
zIZ3v8pdc+>Dy~G6i7xQV+!2>z8FqX@JJ#GG&lX1XO@+V<Z}+T~asWv3dWPw-d}J8y
ziol!0yZ5^#fIPj4B99dXK%;{lgBQvsGsFP>IE^zUDTDHl<$2TAqNkOVsb5S%UuZ)9
z#0zUQrUnK==ie>I!H@rh*^NwT-~Kgst-)wAhE7SDb;1HD(4G#3Z12M70@-Y5S@8c!
z{k|N+Ipg{!i$`VDUGnu*&<Z2zR;T4q#Ofrcov|&=j1Kd9<3Wdp5Q8F8z8jb^@bkq=
zau1I#d>iXg>rIOKLtnfn#H6qEYJV+Py0j(30-{7oll2#Oo+`;VjzIqMbq~p``Gyzj
z@7RbQdDParY~OHTF08#wE1S%Xucbf2UmddLK_)-Rf)=wIaJKSU@v45mhQz02v}Di=
zxT723L0zkeq1Xa;29Eq;Zo^BULjysaD^RmypNR4N@!;O&Xp|$Loq+F^(pyrt-z?$_
z!%fxep@%~~ky9Aft>^05k-zdXI{uBI^iaYtGAtw_Xy=_{6PWTO;{rDk7Kz<j!ZmtR
z|Kl<kabQ{ihty!^03Em`m*zpC?yFv-j}4H=ti;n<^n-nG1%fd)NgCx&Y{(%FBZIoW
zDH1Y{RStn5qYoLPl9B|az?b^On3qnS?-AJt2e<5Hgk)xn(7-*VVsX*B$@e>akd;pY
zU9(h{byoiCIBvO0Mmfk?2sctQXD0oW^bdt&=Tm8uu<D~zSM5>5sGea`1&B_OB`o}d
z;l74K20M2Qoo^Zo2O<zhJIRHV4mFf=KY-c|8kKGUsts3m-djzoJR#v>s9xbfKAtY!
zYZ*-hT@y>rSqNe~#3?ZBlH2ierwMN+XavJ5gO{oV`ygjE(0wBpc?|XKdW|YnN~3A_
z`P$A|DOG`|Uc%IJf?5gdw_xWd7A{mXJOUZns}f+On1-7xd~8+3Q3OtDKfy6&o!p@!
zh_6z3P49d7kN1{dys$qQVh+^Wsr@Yf$H%uI6M28@F3MJx%eRuv3Gu|mvGd@BrfUDi
znRY}3y;&_H@^$4B@t6+cI+FfSKa1$re3%~8oIuS0ji|%J;%W#M{kvAk2snF%$4%jx
zx5N7PdOYGi3xPTMb}P9R2wobAIcQ6YaG73*X>X(4pSdAh>6G$Edg+usp-ZU+4xD-l
zW*pmzw1HADl{7M#j-yLSG^WQ&$&@}0TokKj4l5y&8ku-o*RwQz*mLCi)PN?^-1al{
z)mW;@PU#Bd;$00Z)&jPUCk)aK?o54U9r;AqB2AtUD_00(5hQQ<56OZg$1@phm%=qD
z7gRKlU84Y!%SNCh%DXY!ffvzjm^Lpx|Lr1UJ;?ohyyYj%Z<^3n<WJj$odNY$VrNwB
zfbQBz^wP<UtdoEs*%=!7q)3|ir(Si+o`XJ#XmOS04l*unP^|QiDMgzL!k5aG9I$}0
zAu<-4V!uUBvR(3QX3||(Tuu^X_Ta398<<CUNe?N9dt`lK76M}D=8_W9j-N#zvS;dv
zIq4gX7r!_uLB%HJtH{o6hjcwMVH!&wbR^qd8w8n?j*X2#pN>_JM~DWrEl(dUn<-Zx
zZ7YnF&mn{m97#crHPi2jtd@JvOf601`m`f5f(Shu6BOpDw-U0sf@igoaacdQ3CxX0
zQ+E9XIZTh)RBVP7jIFFpdbZ%>01DW(gAU;eF!!;o{)z8*Li??M`~8#1h`_A56vFr!
zJE_n^G$$fX1)rykf&w1;h^m2(PSrnb|99EOPA-Z7Vi^{xyGK#N^nuWe`c&)z(;+Yv
zR-V30rnWYl>rP$}kV0niv$-3v*1uz5X@banL-9wn346T)ee)o1lx-rFsv5>2kF8qI
z>Ay48zpgiLuA2CDIQ$>%RkiZEvP?Xy*sBi2-Y5frU<GVyb{BOPUcLp_^tR8NN+C4#
zODpo5Q`Ixd8_{9Iv^RTUtAPPyI;ySGY?Ym<!?tJkq!6=7s!eM$Dk>G|H_9{C(Q0b#
zK|@p|F`YwDI4r7B4lxFqoY~Uq_|bC2wCR?D_~+^mXn@F&T@T2PW^4@3(uNeo=$}SZ
z#1$Gxmtv{$755)*p!kuW1sQNF;a5Z^+eB`tr&)~{(Wf?j1--js3b3Ia@H=(sEX(^|
z)r(pjRl$`Uq{cG}`UA1wFj2dzL^uHGs3u$S8zx`I`hKfZ$Ng%v$;QHZ{Atqr$#xU8
z^cOt(Op?s?c&d(<96GIzDZQykQ}nE)*rJZ4DL#*I5L5?fFRWJ^sa_d<?n#W;9y6;4
z@VMna{0!i0OLjURdvT3O3O)8>D%eCU)kua9sam$6t+EqVh?N?pY@p5r_tbTA8c!NQ
z@%#iHCT=QXg{X2gAa?nik@!br<F{e;N|kI0|BDG1csIKkiNfGaU5?p$W>mg6lTM7Y
zVgE`<;3eGx(z)2R5rdCp2uFO<Qzb2$vddTLMf4%jGAhcTFI_P001XCOiYd}V=qoL>
zxPr1I5yS_$>na<{*wJE-NfHX>7ge2Q^%_yI+7YGm1SK@iQJL@G<4fH~74OZ-vUuw<
z=(-ND2TH5jm3yo{F#@Z#7PLNr#K@XTAHb=e04%f2g?lo5GVoT<@vQ-GUMslC*nd{H
z+XOvCi)W-stmsbTNflVWSbH$u9kwbu0xNBY3hnN8>#dxRTh;N(*H>#n=M926>+m;}
z1e+^y%bxE?q7Coy^ooAVQ_oGEHR+)lEelWi;BI$MD7kx^EzhKCWGo}sUzW>po8Btx
z%~wAD*N}itLZ1fLX~_1bZdhBn0t@h6Z!w?iDZ2T}5>*B|aC2zw+(%dIp*ZpKFsi0@
zUJ`s<=B{>5ag<Hv5X`tZ_Sl;~Tq~-+=vmf?dr7At#@wt*z3*ejdlfcKQn`f-zV;J7
z9th;ypb$r`oYsqtU*9#iCLQdOi~{d}Se;F7FZx4QEgET2WhnLOw%(5WHj)jFDE~Ob
z=TEA^LEFiHu=qzoAnJO!)ji#l9<MexXn<is9_PILe(jUJ>{B^@;Ad{}K%?LKqw}@L
zvB2jF(H(S!5FxVyG|nAM46oVF|I>A&OpD)kp7Lw-*sWL#g=_>Az8pmfU-uu<rQ3Zc
zXkJ0bgfhpGwKGIjhrVV{Me<^WI6uQR#2>?P|CLixN9VrhOsAY$WRPdb{Zppr-SN{)
zfR;k4y*tO9I?eU~n(9wxGjh8u&HO1<|G~9*SRa_X_g^aM9vTtVF9iI-G3XMA<K3Mg
za3|33VY0UPuJu4A@nVRdrtf@FwZ<*rn%V+v<`@XPCExGyDr5j%Zq)wtLS(iHs`z;z
z0{L$b;HmIjVE<*%gQB=tZN9F^0UGu-u(T-i)MC|l<#N)NaZ8JtVQD5+zg!0$p))zy
zciT$1!?-7D;|4&IiZoC?2Dj$4S0+|;wE0P~`w&-4#FW~c=QJ4uf~)m3wyULk!rF*2
zx4?un`-z}vo>xbJmedbBNxFSdJbi9g$Jbsq*G{9S`W0G0_q$W0`OLSOg0<RTRy@BR
z_f9Eco!xT)rQ?((@D3n`u#sp8FLe_8G-xan`X7z)qw8B<&4lI%enE`Teh>ED+Te&z
zpjWOnWl0I=WOS}u9w#-4;R4X%@)us#>C=V?zPqmvd-s`xuQC~ze*2@}?RN(bVpGAT
zu+vjx)VVjVKdUaT#|94Pwbswkz1QS9<9|IZZm88BPFYlL6$j(l%GqnFkQsM^EFe5$
zQ<v>i{mw0&p~sZBjoz=(s8g-imN|Zxa6*j$Xp18nU6|WN_q!r9ek1eb7avn6Vpn4E
zgSc5yR~*OgRA~vzYc!YRJsF?(8edO~+a<UwB7AV{4$3a9sp@SJIssnouDc?R`+3HD
z4fYqQ1<XRDB2psN=)C_hQH3(A83H#}en`J=UfO|?eg&iDc26kmSpnqy+L)fF*ABBA
z2^m2oB|AHry!}rvrl(qEipp38zQ;eP5gR52bUfUWRGs7}cv%&G>!2Cqhs?3pKHR}}
z5Xs6eUik;lUNYOa<u>NmtDm+(1hfyAzyA)NDl{24@ZZbHD0=10DOJEYa$tz}KVr}M
zKRT+`?bPaWiA=owSkiiV*eDRxOVC5L{r4vKv*GEBvx6=FuU3y;<Oc$zrjJ2Si@V6M
zf8a?mCZ~s`u=n?1bsVKJ9K&z7a7lWKqel99+B^ZRL^_3D*z;o`A^&Xc0+?gWfk-W)
z7Th<z8qd7auo=zIA)O=Z;K?!@Lin5#)h%L8lJ;c{z-kE$Wwapv3zA~fslt2!QZe33
zuqWV=B1ncObe-1<SfTL}4=Mbtl2Bse`5#IO9%9p7o-bdAL6I}tz%CcxK+dLeTTnGa
zeE}PM|L@@I^q+SP>t%vv27cZ1GG^q&>ZC_l0$RklN-mEW*RyPv&AN!_gTF$WczwAi
z5@6?+Slq|c+N*j8d_G_gAvw2Ca`RE9Fd`NT+PjV@iu$8NxPVORDqG}3KRQt}JM6qg
z(&-p;>3&Efer&SGYDl!tk6>2UKJM?&zIoOufK>)<8WrUJnezu2G74%+!<qxbS?)TP
znD031yvNiHokEU_y@I61F8=8la3j~(imSBC)+x0dI*w{KFipe{A+;|GznW&hsnhWp
zgD+y>-_7*i$m)+5k7p{@SYO(ZnCKw0uL4_NYExgY>JBP7>q6h7hsr;(SIB)lguP)J
z!`^UnH$E6`Aox=V(ewI1rxHJLo~zziP}EkTgqcSQ1z|_aF6MZ}KSIYepPn1-pBH*Y
zK<D$Qd+^vs`9xz+Da<ZP$9f3}*u8%Fh9qKl<0`P_RyHivAz{^S()Ie8ms0(I;>>i4
zE@$~|=iIJJUpGSeCzbu5kDf}j<D4w-iz)J~QF+9|DNIVwzyYk+|6=B>u}Xq8;&(*_
zPvLolyfq_=iX@qt*~%vK%b4mC4cfX(SW)#?4OtZSF`{|ZZBXEbGv(EeoAo?vgB_6$
z*@Hp{Y~cqAj_I!e#mU@asA3q-j<0kI0$Q7}vaPxa2?gZCgz&Yur0Dv}abk_4;k8&v
z6wNJ6RuDve@%w3}We8OZOE7D0)Z?7+zk65vzv;eiK-Bj>m@V8eo`&Zs8(Vc`6#P;o
z3Pm1%yeTa2R$<fk(k`e)bq+z;yg;sA?M#4*%^X*Kf@_O5b3|?N--bv9ZXpiA9tMh+
zjWz;!M(C!4Y%8idK$^1Hy9sW&dV;>dISeV2br^}BXCVjBNYSN<M0_cE{!qs|m%|?Y
zp>9K+g)HQ;^p~hg!_zNrYQYzpZ0D|^w@mUxGW)j|M3gI?I#VCqi0trszZwe-c$-U4
zM~P4=ugUW@z0!nHte6Abop-t9I;5uY8Y4({kM_>_A2RQDK>w<>Sr3)!!@p>yj1|s)
zw5{ZG@lM8A>+3sKZ{|^}{+VC4HXs<LLy^m~Vyn_gqB-dB>oBJ;k|_t%wT<31kARdT
zZo1Krb0_=CDUGxpjIi5~Z7Ea_6Uzok{8CemcyLf8H>Mr7OSTq$Yt|AYso2b*N&JE@
zmo4%?2cVvAAE8H8)kqdgesdJ59{8V^p`UJtp+`SB1ec4wHA5ege77q<7P$ViDSfRc
zvbuEjS10Snxg`A7G58*4LRAHo{Dw%Dqm~fCr_b|}_BaEUqwb+y0LzN)me$KFcKv0S
zN6a%%SQE;XhJ34rG?MjT6f0KuQPZs0)LPtq7Nlhbk&QkHk$eMzI;E}U@#Su^U@#*7
zy_75lJph>?-nywpFBIE7F(EP1_x%O@p^2d%6!@}NTS~PRoim!fz=Y?#Gc}Hy7dY%L
zOw&n}jz@!1aQm!ksM=?tA1%mnzXzxJYkV%Nm3LgifQPaK@KDI{nd<zWXbLrnq@!V4
z=w2W-8nok*-7(zGy6c>*(G-5kaE)=(I@uoD;mh%uBvgInbdA&2(T<+o+9g9-I}NUH
zEz;GevxOVY6Nvfv13Q_K3Bdjbc9My+UY`t=?LLea%jysaj(<1B3rul4B9v_t1!7(W
z)|$hI7WHMuApG@H2m|8`leQ}3$q|ULWK9AM*?_ry;)-qx`bAQ3MFv~{9LN}3_eNYd
zmE507WkVIyb7=CEdB1adcYi+=mmi_}R*msbG0^J{Zv6;~Y<9=r*9^xszebnK$flsn
zL%7^5oepa6nqDU%u6~(XFV8<ak+Bl?ibre>2r|v><t}QYL{g_>pKVb-5Wt{vFnj(x
z!`cG2)YF|BWVew^CNXV!Ol_>KW5iCsd$;>>X|MjLd1?^pIfOWJW+Lwd1aSN2@C{20
zh5)G)m@_)b6*YsTqT`6k#C&0Z<%&W6a~?mQer*$u$WQ6W*wb;bD|P(F-=_NqRU+No
zyw~yk1^W0v0k+;^WK_s-+EDRg0hss^UK0X#dYT17+NfnD3v&WP=Wb34?1l-l>0vLF
z_0{qAW3)Ka;^)5!=)}4*xm?iG&E11%B6s6>tt*eS3gXk=nA>pYC}n%Rw1#{rd055B
zakne*S`VrLkXD|{mtr8E$d)$FCeZ#_H57?+xfEE;{#~fjKHq@)2c{$r_K$W8F(l6c
zNuJd+H!!e3iK(=woW`|w%M|rw$Srrdv03Bhk?M)fF<WU<y&Hq`bw}_Tr`rH2KJI>5
zReL+{-2AK+&(N|-#}2CM90LtldYw##D>#Z9v^di<K0sLU_z%!#N;{cj?GuRqJ6~1)
zzipe`W!0H&28FD@34ha*N@LO?r_@Jy$Oje|!6OP>L-JuJ?`DCq$k>2VL#HbeK6P;Y
z$esMX$JH~tmDnl;tjFFZ!f(!e{U09dwZm#4+SE^;O#+*v^nVe|7!*&(!yDP0-J0Ou
zpS`?mhSzCt2`q&+HgySzC)W(l51)wXi8#@^{un6<5mzeq_Dw?dw)=<6#5Ya(yxLH%
zbKmf7-CR9gJkildFSSF)8Y8yp@@gdI=<_N&qqJMmH5cX!XkA;F&F0h<=&LHjhkopj
zPS*1HRon50e;+gR7w=K$vb@yvg==EewCVf0#=e9e0lAM9g&T!u?+{X<vuoZEfU2ru
zgM^@lWt8;J`Eg~@hw*u7QT;4G43eQKxTC6~`MAY9wYi<@pwgteCk@-Lg2x(aM)G2b
z-U*A8dHY4a6kTdZeiKYB{SK6CGT+t81x~5J<ra&iVQTyVMk&Bri$;>LF>WKCvXo&z
zfz;3T{++ZH=+^4C=Jwu&=^N*J!GTR!2mqki{G1I@N~vqMg;0_P-o%pjv|<fWHt#el
zEg?!>6nN2Au&Ut9J5=3I?LbBEQM9qRrqBJQ)&3D8Uoyiy`ZuhyJ|#<ioX%_qo=e?R
zoz)3`L|qmLt;TAV3sjwBx#2wO>DkSlETqf(^w3$hM3C$5&bxd;U8GD#XLcS<ItAJu
zqF52L9XdrdkGPbfSvwt_0cK_$Jwwei;F%4etk?nK&xrn~=wNT-xDj``k-+Lm`zom}
z%Np?Udd@Uoz8whcjR@g^N%HuXM}2fgVISZp1rkZIXWu-Zsf;(j!dmwjsgw1PSFMtz
z+60}>25{>~uyWcAxUy7EZJV-G)>6=51IVJrSynz}wdFUC`@-#ol&Idg2_?gTy9*}M
zfU8Bxs0CFFHZoVrGJlohO?M>Jhh<qh{D*XP0ZM1ZrhF7*wcjPOJ`$WV*On4?xhy|g
z-nTrOeW+03gmh)8M?*MA)KKz@N1N+FY3+oB*Tfy|Y+|LWdoRI!r5V50*i$vvf~8H|
z9Y-+30pZ^qEmF=k7Dium1rMz_ku3R9I6}^cb+J&gfG?c2B#fR^(yFNN&ZHofSsgRO
zzjH`Mwt}N+MLw;mXruI4$<k^VoXocLXQ)>Z2Zs11c}0=&=?ykCPjW?J6D{?~O<0<!
z<qW_*02qwtY)t&$$3=165Z-TUblfT2F4oQx^W!OY_OajKI1#Vme4lVx7=F;CLSHjx
zo70%1RYCsN$Y#<kFGr724*h!6tXy9k#MQKyaYESQ31FQkdIiMfFjj0}-{66q!}6n;
z9s(Tn^5Vn=V4GRAFW4qZbKK%PCO>AUfaB~#*VJRLl|tVE{{f&d!eca2=lNMu<I|ip
zQZdjoV?iR^I}jYDk+&DL%atzP0q0*w(TlpCT<MehxjzZ^MqojT3^t?GoQlukFrOoK
zH36|NPM@?ajgA0jNBUSR9c}zZ3PT`Iv1*gcba$gF(Zeu5%UnZOGXDeT@*xUZfdum=
zpO!bbzBeD_mUjJLURJ~9MmB-Qk3;xlFiokM@VmL=J(jk+^O3;_SqXp3fhHU<_#{W=
z9y*AvmkT#i?L5C<OO8S`OvRE>$laAn>AFYj`z$`BC3tjjMzEIQv@u)@#UOuKN~Aqh
zFeJ?-<t2?^MBEg`?}30$@?_MvQ7y_r_<|g-lRfAziqu>qUwW!QGjkxx`2UH<1+ap>
zA=ROq7e#Pv5Fln39`}KzPOT(sSnh8K;?~3<WZTU@k%Rz(uf_E#dG{J99j_|J1OI-r
zV3Dd@c4aUWrFg+9d~<3Hxo--83GTEG;(=hn;b<-s0y`r>s+Y-ci$|k(I5>S{E|bw>
zpr+Gxq23~tP`5f`HngfUvG>uE|3GdNSkb|U$~@V<=~^<h`uGu1(l4G77hLY2A+Z!L
z|8nIZ#NbA2vgVf_Et%taaqHxTJH-MJk>tDU!&TITXLkgZaO)UcnhCDLfyuIJj>1GB
z#EYU-Q&LppCIv9%yHAm<L5Ccs2xMmFSZ)4_%K47{hkMIHZr1=9_%}xZj%I@3SXs3h
zOvI-*%bxzMS`bR~h+~z5)zc}cN?r<Ow5@6lS6=>97uF*a53N!n2vkmOmsqKhkmfRw
z<5K)3Ka23?a#R18h1-{NmBVs%%OAr8xT7EHDSBN1BN4W|`t}#|nYq(lWMKNRtYc)H
zRRL}@FoopF*5^6A9kN1?^=X1@?OjWfcYEg)Ma>E&XW}WYY?Y6+tqrx`UoDP5$PVp$
zpZ3ScsTv9!(BVNsnAo|=*WMLw)<av<cj`zFBPjK`w+#S#IYmwMQ~iZ9+c^n)n_R5J
zncxgo_ODB*$Ft0AJsHQqU{@@^^ub66vlVRSSr?S2J>8}<J1*YjeQ#}lM%Uvv5sSMb
z=PMP9yL{fT>2$_z6JQyl4Us5)_b{e{vd$YIn84tP`ueE{G3i8PO%J(TMAY_Q2$Iz!
zK9r(zbI>P9kB+eQ0yRv8Hx4IBF{OEKPgi^Q3Yf>$HLq&e9aV<4d#{>puGib=DD?4t
z+_#pytIt0pPt4x-lhdbsZOWR4yv63D3`a;8?c@ajhPu+iG^pVW=2*R^W5_9E&~x|A
z-OXyd|JY#vIQ`16-sApcZW=KM^Y4ev$Nb?m#UDt8M^IvHO8WrLI0iYsw7TB*fDDPH
zSB0OrX}7s?x)d|lpmZTz=~6|(=DQ>G1b|8rhZxkMgi<cF;6?LS{qYc{BMnZ7TVQug
zBhYoN)Pc3d)O2n!;Z~SVAlcXA!@qk>`Qgy}-tPA6pWAJ~W@$}8S4Xeq^E|FIoo(mv
z!6yHOvG^+GryE^EP!}-0!He-}Z*XWj>(u3`!BLHHNWsId4>`7pH7pYcW!1RRg{5oI
zu71%!MA?EJ;`mvl_{3yx!2upoRiJL+wAS7JXB-GvO_wuj{of054!fu5J|nxi@QC5a
z{Cj^gseK%IN*$KE`1HM9CSGl&Awax!Z=z?gbyd=Js6KoLOy~&yn06$t%c^$KfKdou
zxO`746FF*kJ$5?piQ<!lY>9`!X+bO6ouToX{T}R0sYFV}I_kkV7RVQXc!1VtO<{Re
z!8397h>S`ZyrTC=If%8tO@&G}<*K0ME+>5iEGMQy*`W=vQCC_?E6%4Rh(p%F^kt?(
znXAiC74|=34uK{2dhB?^n=uX@jAUz(jC`SBIGjrNC#9xJs`Yh+M|mez2F6JgMqaQ?
z;QX%X%UBJ5$#__jsp&A-fSsCQ&*!d_d!RXuS1<b1!kwHxv+><pnT@7#xHth_nf)G<
zJJ`l*u1Wm;UbAf3DTd%PRw$z`?v(kVNTA|tNR;X7F|NI4MZ5CG**NcK5^lF<)SL<1
z7$-D9gC*jq_=P9dPl24ftj+%yAG)5x$1gN4m<F}X+Qjoi^>X6ZPo#6po}<6QuG-tC
zgck)bitJOa;XN%oF6|D$jU>)$tT&@bMw?}}&<E>8X?=GKhP6~LC0r5+SX>5meD9lg
z{^p34>}GTJ|9!<38!K;;y=<>rqsJd+VQ=J?H8g=Vdt+02O=s{DBznbtm=nNtKXgY9
z=4SDua>Keu>d(L&Gs7<49M|iYnDh8io{qE+#>50ylpnB%e)=K#y0&i_qWk@!jqAjD
zO48DF{WmvPTj2!n%X%(D8oPqlnZjtPId0Qx`xVyE{u2!`skoDTgl&Ki0O`pS*D`r;
zx4NmQyDNDTle_OI3p}G=UCzCv+#C8??J?UYXRWG~{&tin-pU+Lw@B;mx)VPilUl*8
zI{!J>%I7qwM$-#`ZVS1?Giumn@%Cr&(&P3n|HksV<w~OW&ay#2)~J7rO1_;iX3J-Q
z<_}jb2Q&@C0w8tf=kDI-VE<TuX;8K?+svKs(;pk=$$!<yZr}TSb-OsgZ<TJ%c2jKz
zCBg8<yjbJ+XYO9qB8p_ZaG&@bw=8FZ-sL_^L_^5766z=^2$cf~gJ_$mAMaDCyO5!)
zV<jEg4_(PZ5}|@60r&|#B*#4uwx@@f<ESdCFeEAhNQ@{>l2*{gjx>fLDYTXMt0}C`
zctvbUOh9*ac>Zx#%<Kjda0*BvIbbgVcquHZMC7XrQK2y}4WEqb6WYHf-;+J;SR%fC
z{N_C0Lau0VSb+*9`Y?PgiCm1HI7HVdJb)=dqUd9Uq4KvV@9P-?uP_UC25st*A{Grw
zs8IdIf=*+-Yku3u(Bu<XYJwTZLIz@FC)u^hchqQ)oKloyfh-S6KW+=3K`VLJ#4}FG
zjm1p-1fi!8kd}1n{2>UtphYeO7hO_2IB9A}2hbm=p<ou+Y08R>6PrdG$QFU3ZAw~#
zQG2y3-{Pv`Febdez-eGn3LxD>ImQukwUWZ1+>JgM1Y!jxz26RT18KIk9KgzdP9S*s
z>ljqR=yt@&C>yDTG<Kx45{^c@kF(Vgc1}#p5PfF{NH<pIeh1{Fv}>AG@KFzSn3*_*
zEWFI*M8p=5lZvuTXF!BBW4ErCBify3asyg|+uosPo==AGwiiUy!*t2b<$UHZiTr*e
zAHOJ3eBO4psTOK!3MoTGOI+0MMs~I5MCtp!H3Bb>Bt2h3|H}!35az^yv8lr7sd+*v
z%L_)$A#A^_p3y$k0wnQJePMR7hp`Z!A~#TD!o^|@=6dA7r&L`+jB%PQrIl&Tad5L*
ziCZA^$PftyBV_eT%-Iy6h0@;Gx?kV%YoVCtV0?yqQXAYDA{Ps6QuV+{J?|2=e}l2S
zU&tsE1d+WA;vd9}!x!zpF7P^~Cs9`F$l1%TF>^K6i1`PwY(pxJZ{INWh9-qr1gz?<
zCSS3hUqjsP5hx<um(vZD&_0j)sW>po9OaDR=Rnx3g6{7Tqn(c}vxM7p)AZ$m?8tx2
zAWG1r5!d9owm?Y?K)+<~O#aUbB4kdmH_`M44Ev}V9N}(jWbbvGOsVQ&6{}Yk_N<~1
zF={il!OBoGO&~r|`nfChN}<^9H1GKW4R%Dy&16KZn68K;^6E<KIKk!Fi^2w0{vb{I
zMeVG@`v^kjr7HgU_$-c9E5&|I)fGt3?;(r$xc13A_3?5Sdb<`|NH%y0*p5$=sV)Ll
zN{nKxNl>u4iY|t94~Yt>-ceIZdwmWOgCI{xA6DQ#W|;N)b#;aEk(EAF0o{mOV_j#!
zQ9VHz#sIARXul-C^oKA8)NuN*-oE^CIisbDHg9UPxPvEgKClEkv+rIt@B~LY(#H)5
z3yXn26tjNKX3u!6e5DlK=$Lk^{7lp-Cr$5iL8EtJ$<L`TJ6$P`8xaWTK~)PP17&uB
zq<g%-XmCzijo?+8zTs(>3;<`8CK2;LmgTF_152uB^7gjEs}G(AOn4eSfyfwLktZ>w
zxaeXuf;_IA#~R^LO(#|xnw+tTL_guG+hxL3a<-LhrwMf`(hE_S7iaRHM-{e|L}Ct?
zX5sygVCD_(@8>gG-}qCE$ynAK=={!LmqM|ej58R5Xb>&ME`AZg7hMt*z)K<b*Xyc-
zl|RtPO6jgKC57!!<Z|@j4zszq6=W1spR2hf!wxZZ>x&OmN~U$SoA$u>2B;h5%<&4!
z(aq=y(o%6QPNwiA!igK__P95r6cn8NW<UxKdOulr+@6<`X#I09y57OXKzVT6K=yt+
zzUi#l^vi(rBnwXyUPV@xA+WBII9Bo1Kv??lL%^xh(WoXg`Jt_Wt?0{hKX<0XJBWg;
z^NPYzD}!H!*A<fx!?C<q>uHZgpmhUX@IFzS`(j}R3gRCH@kB(q!a0~+9T*4{t84T!
zyx@7|0UNM8f$J$mY{^r*Ycmvzuv}Rkwu6ud;*J8D)SQh9jg0OKZfTrBMT<B3!2(SL
zr#!wx4v#uTuP-X=#WM1bWB;`TWT#u7*_O7zYKJ*n?s6?tG6gFSCpJhwH(z5!gjkf%
ze6xig+x<R&Pqp_4Q$T~C{(X&Mo{V9BjFIb)AZ}^DB=yhrZ~JvlPRWHDYOf9mw>FYU
z%;D`c#&OrqCuD7{fc0E6GRr~nEww9akIa}(hAg37U&{z_TD{{JO|svXCj;fh0`X3;
z4H-}|qU1mQzYdXhcr}x~MfMg;8{-GfMC<zWw~yV4L|LkX4c+oVa!wpC<SRwsL6j@D
zzv_g6xuCS|WqFS&0cdM71PBAq2hheruHLs-Z0OvKTip#Fw{ZvS|9<v7to=N{r#;!O
za@y<oej~u!GVX3RwCmy!01)!lR@<izP)RP27S{qlBts4V_N*?**rfp?4?fV32gqE^
zzwoX~dF94YQKOs`>Y$0n_!D$dy-j>@=qGkr4-0OZ^PW2n_ovM)U=RD6`TSy5O_7u6
zoWcTx*rfbr&DMSlPIOzog4S=bk&-W7qk1VL|M-WCUeTTO3J(-XGGRpMo;kR}`z}&a
zDZr4k)k!3E{(H|hxBB;|R1KhubrW`D=+!jldi{bKS&0nzjOEfRe-Y35x~7r0kiw6N
zg^W98l;<k_7E*h5VhPVf@fe?^kMQccQnQyUT68d7NUN<O3PH%0V=!pC+5XZ{MEJYd
z>|QB=<sj9(AtX0J-+akf2St`>ues2nj(Z%d=|(?L=MYl+b7;|qs@rzh%kollzmC+5
z#Y%tG&&U+BVg)+Nec{j7hi$@a&-D4zYQH#KKF)wq+s@e;iGlq_*M-(E+ykI<Z}U~^
zam%oS8ydb`DqXQ42$M&+3z<r7s9c}*&8r}_i6@~0ZhCI6$3V5cf17V)c0mnBR_C^b
z2YX*N4*y)o*V2b~7N^}YuKxs%3>*_ANMdPNz3*5&_(0_75NsAC`HJmnQ^A+O)>Vpg
z3Naf8*S6pvfaNo+!E9-i1-+}$ZoORYx!P<?{^IMv_M?|>Sey}^JphR!t95hk^YqL-
z#3HhpDVR*8(L|@zhfB<zn6ts|!InugPKKcb{iXkf$l$)%Kniwl>sXbPUU{NMQi>d_
z>uXxgw3m$u;ajWOtU-$aWO2TP3%`jA|EDypn!cq3ZVc5I(gzs<Pwr$ah-%Q1R+U}6
zV24l3$4x`6PET)Gn!T95@I>2A54+JZq3`3LdCO7^jyK)Jf41SneAwm-0j{J7xWcJn
z>%Zl<gVrqdymwBCy_6O3(f)Rqq&TvaW0==4=;wfNX1`jH^sVh}P<jkSLL{fzNwjML
z5tf^tAAXJQkTy`J^8y`_ezW@hVF-9yTb;g{S^qN{IjDmA&H7cWSU86uWW(5<KC>FP
zJCh*@;{u1o=AZT14mZus-0gD?j(c>WwaZ9q82z{z!J%L)5X~^{iz%7>6B6eQ&+kCa
z?=dISy)=NZ_YdfUHqdDEqZgIRBaZ&nD4IpR6xzXU{A*J&eQ#AkZAZ{t?M>-@;;=C@
z#r?c_rz9ICkHgH<Lud0tF?y^A>Y=-mqKBBU+jkr5^;(!U3xF?3U-`liT|WR{G1J{a
zHB`9bcC+IwYK$7rBVN&D*jn#DwXxEK+LGFSENMa-2{|CbU_GY)p}$8Z_MWsr-81^@
z2Mx#s=vwK&Ra{Ciz%(`NF8biV)~@}JhQ~@6?vnFXXmD<*4uK-$sa#!y%}$l=FB@q!
zM*v$<$9?r?%Ns)v(5IT0F{Ibo69`)^#akE9C4@pJWveBY;!Z!xxX$)#X<D<o)ySXz
zfSvYb_1sZrsFou$An`j<Zh?@}O4<gC=cTIa0{r%^O&oD`ernb9#v;=tTMT9D2+*#v
zHKn*JLviH}7ZHTcTnObmTd5<Zp(GudeX7=czhx#VzY7aY@^UYPW}|E>gi`Kks=(>g
z6_MXW6-MxbM(j4Q=!Wi0z*MUQ9u)`TG^E@KjcUU%6?U>c67!0J`{uGB1AC0A6?dBn
zgx%^HN+XQ1v?D;4jH8PN8}gT_koSYKRt=h=<U)+PQu`=)lQ9UO77UE4%vL<nhS%?5
z@F=X7IyR57tRDDk^Ex;<<p+6gvcjEh?y<KI<vM>{Q~O9qc5>okgk-rC+Z%tQ<8*<n
z0_`C@WvL<<tlxs*O<Lj8m~Z~p3ST$3>gS=FwyNtssc&35e=6NxgUY_+vB~3J`4Wfi
zDr^9?4WVv~n?_JS)ZOT(FvW#*M1Q(^5p}YiA?m|qhsPDCie73XN$!%Fvz7;D<Z}h&
zPwF&%ycSOCd`(NYma#Xn)ypdgDnE@OqzutdcuX!E8w}wZNCAcHG?}Gs>Q2{9C>Phh
zbkQP=uP4Z5RvDXMm~lYS#uv1wEt$eJ8HCegJ8Y2o1|GyTmD)YETRP8dXzmCtjtDHj
zK%|aVm<tcDH{%xEHVaM8c)Y4bf!30svIVXV!Ot9&S6KTU0edXQOmq1t%(I$}R+a2#
zrYbAevjSAgjWRYedu(G2bvh^jzuzgb*D5$+_vu;JiBYK^2TW{YkR5cGihMyd1C@+6
zXpAR8on71aX+Jdwzp6x7&@UrsRwt?G!Ox+x%;)}=RhM6;$I+czxYBjMG%q61eb7{O
zy)^F(v*mg&=0&W{rZl5&$#PJV1c2i9iX!b^1=L=G??;5i7?a&YEE|h?S>ifXaPHc>
z6895AZRvrrB206k`Bq9Pvyj8#&m$=m7-aZF0EZoxH0W)4tc#n%+C)n><7#CGQo9LN
z{DK*sgP7Kr?8t{G5fgaq4^31BXkry|Lilh?V%AHtqO!=v&#Y64%d|PJEAqvBaWcwA
z`f?B6;TUCc36pswF)LoTxo-(w-ij(H6u~x<8{b|@BKE~E$ldD$I>`C8r7)YWp>8eO
z$GMF;v&Z-7Q)oqJ-dl~UWfeO~{kaTM+(R;k(9BcAndz|y@rf~_(^wQuT|)3s5+VVD
zC~!g<4oa0pkN`-5Ne}aAbH-9)QkhJS(Hk=~&a8!2O}nJZ;tfB`X;2q2a&yyi9uE5=
zmBo2<DVLf_UFu1SOl7k2y45-{Y^NHLcPdkcTrb$7x0k=o2zWsbbraf}1bvW6B)K2L
z3vTl<T!;!hl({peq->yAqnm>8c)h#FARdrHV-3#Rk`T#kqX!r6X!4h1pk5`6`Y+`i
zp1Et;EWr~h^Dp9$zxEAa`ZbV4@ivlwGNmI*6i4;IYL~5!MzNWzc(==yPR>T(^Qq@&
zzl62-UJ0FNpMN&*?RTzm6#ZisM#M@FqR)rr`53nyVx2NAhqrUY;}CAY7UowNu$ej_
z#;Z9T6z%l|z{jAlBMmt=c_wt}><>_s-<XPkJ=S5#F>;lfmd#v9peg|Bq)$P2E!fLF
z@fNkpkL+V2+`XmfcGR&4%oc1Ry(CF7a0V~o%&BoGd#e9SFw#+dw&H(D^D;+a>&b=$
zUKYO%Dswj~TnME<-_T<yPY*y9LT)k(Bko0tn!=9=(XMfXARNwoJ@n#@6|ufMca|ko
zRM!w?$@~_iiw0KRjF%ed`m_v4k%>*A@eJTb5?VxQtta6+bgae0q2F6f995&iYE*rp
z;cR_$q~H{u3mNd9&HrpA0h6i>2#2tv=j6*6S0|ZcA&`&H@5RHJnm5kyOTXtv%!t%i
zrG`IFh}L4uskmY^Vp>J%<4lkxxHiWk2%wQ_2M@6bh_MB<bVWsR&v}-m)D(QS)=jif
z;|fLIlwES0f!|kY8HuPyS!XEO)U7d3X<N3Z{<;4X$)!iP`NyVK^T&R~jn&X8(FCot
z=~&QQ;0_axa9<_f;@~DS2x_kE>9!N%L2dWxa8_{M&3naO)Cpt`%QuO5$m&At86!^^
zGY%7>GUHoB7<hiONF2*pPi;&roZGC)w2m^nm6&!r|KGUces9481j+7*8pnWft%o;B
z+6q?uFb`|-?ye8J^j3cFo7ob&bNCHCAb~UjvN^VTYRXlVC&u^!HP+269D)<U{!A0n
zI1YYwqZfW~R@g{SM@J`*!w|zb5z9JSVlVOvEH}v2?i#o;#-fdN)z-^|A$!!t^%%l?
zkDtHRh2f993xeI5R4H$}@vSHY#xSH8@@4M=@m8$&qRf{F^aerbe6-tfbg^#rh9A)N
zaKGn6xQ7H?;>yA*a<pqV>h)O<-~7$``P$ZCLkH$1{^0Yde1{PN$q9Xw-C69gH|LO&
zP9-IV&T^B5%Z)>;Y$N95<yMFLIOL}!K57o8Y7%Hrzp;^Nu<SjjVjqY0q)Fc|77F^5
zZ*i*G;I%wsHk?ht=`eV$G$Qk0E`k?SrT1>74rf&Bd7Cf%C5W3Id$rq$#ZQNY`C!bh
zCt}}gqGWxc%+I{tG4Yv_D+XC)nwK>oSk48ExV&8Zj%_A#O2oCu)~Gt5;23`<Nr0=J
zORw47sx1lTd5pM8WPYygah_&Ez^x&DUx6*`8Xet4uE!7ZCu(=&P%uu8=)I$yu?oU2
zd{ARA7YHp!yB4ibJ_n`Pu8dtZd|t7#E|W^1IX8mEbb8JmYCeYH(;*cXEb0w8y!lf&
zWokm)17)>la|RNnp_{Jta(n_cp}Kp+K}UR3Pr3gzhO;>a>l(Xyv%H^rZZ2_iIA7H7
z4|j=*>}GCdC}QnNab!pJl$on7zYOYPLyMFAgP>zb_z>#*55d+`m&auY6NkR3K*^`u
zNpI6oJH!~sC=QsFMYdkL6^9S7h;jrDQ!ElP%8-^oBbR3jYyj<w+=3BdgJI->2xwtW
z4^}r!#S8&|D`$X)7tldVQ(yMeS`a%}BXV>_l5lphQj;1ve`&{KoSLVEt;e)Xu+H67
z1QqGW<YLv>Veqpma689M$WZ=~>E8|9RdIupZS-3(-k|icLxX9UTxh&G5~$w1`KsWW
zAgg9OprS`w{VR#11}vw>a8*hgfew9AV33q?ctUr%x_+_zY6W%oMbr}d?-HnZ(y!%H
zT6SIYN^i;fUy($ILK+q=nZ$|I%V*ey@6&ePvwI)K#gKo0EE~%Vd}VU@gXvqfPk60v
zg8C^o@jt6*#cP$b&bEhnYzrTP_;$BT=Gd8_(w__N^#mWd<IxH6X{%1+;Y`|;2@0q0
zQHt~4*50CmD4pjxONcE4Tkyl80>*#ET?^VUeswg9INTbYWJDKT{Y~tF5!-bjs`3yJ
zto{~r$Q1O+E->vQ^o)8Z{r3=)LlLJyU$eL&3a|tIUjLMY&R}w@ZBV<oU5s-X*5{GW
zM_ldU`sTrHdc)^oA%I@GMfosJ17RKLP~uxJ2&Xq*t~8nXbEYqT=r;aLHo=!NRk`%x
zroKboX#X$|p-b*vjO-NnRGEKn%J1``#Bg9sK@VgunoI@s^z?+OWdiV9NCHVTkv#}`
zAQJF=B<ZJAi4rJbyN-A@Y5l%041t(TJQI@UKYh)Zbkxrd)js;-bfM0imbr6Jn0a!a
z`T{AU3>{u(B`oS4^cJS)pEy9~FNRdk8IauKB^M)R*?IVWV*kd;Ul+EYaq_TL3Vbg~
zy~&|fLZa-GiNP657gYBg;f|25IxaUBS@vU{=cnQR2in>7lFE)9RQd*0F#fF&>7J-p
zdS$SXg@;8L&U;p~8Yfk3NZrjN#s(0IB2YxF82O1;tKJ&|CF=fyt5(@gA#m%M@#O&v
z=d1z-U>P189&S`GN6<>=!Ntb|MP(LQ)`LE*C-MTsjmp(xvu4thXt5yZH+TEq`q>K`
z;gy(3p-U{Ca3iCu=ttBA5!UBVQ!QiRR4su4&sr@!3Rc8smXOY=r}Eb+2&Q~yla#vJ
zxu;vh&iwR-vvFK2>^d$gY#SL|0tjYj5$k6wzR=D@?28dP)3i+-pgGOgiV4%ab%t%i
zhLM>)ER6sYauMv!@jl&5+3IJr1U9KHtF;>ByOXAz^-GML)u*xl7}O!{cFN@3vPB$f
z^`+TYQpQqsA=Gl!km>vN=H%o`O)c{$%^OvxE|!l~XyQ1f2L-6>rZZG~=j~)G=yQhp
zW+z6dzM_{$M>HjWqs{l0XgZAi+Q6U4au6<;S~3mc5@q(E!mAc;j2E|HH|>Q)MKqfH
zd98rNcvxr>f{sDI41e$)o=T8SS5RsTzn=T0g+Xsk=6!B5dvaHUI*?_#h;YA3!KE<$
zu~%EiV^<ZbzqIGRGVcNzTymvU+Qp+G0Y=OHUuXAc?#xJeoH`4UAXO=o9scBja=xbM
zV7Tl>;o*iP0&Di~&|hlGGvY|#DZ25?hhB|#=kiT6#_W53!R;EA(Qhf5Xz)!AVz=gQ
zRi`tnCk&|<pg7wll3#3XQtPguFn{4;n&D?Q-3ptg;^HLTHAvHA^V$pb!Ml`jHYqt;
zd_G%}=s&7xn&6=r50j7&+5^j9g&B&j3C%!57c-x>SZ{tibo02+x8YD4Wy`Pz{Rx#0
zHEfp#)MXv#L$?HWQY_nf)KfQRuR30+vh8>{@D@_}Y$VTmT>5gXMR7Rsc(EP23wZs!
zF$FA;n#Dufxzm}YqHMM4J)O;e-)i!Yeg1?&clERc)cV9HnWn~cp1uBXtq~s6qy5#s
zsYl1o47Y06ZfStxzHIJ7$tL{63qc3K=%?c_IlTh(u3d=6#zR{V(e((|H66z&$M;3A
z2*lT>!k80j(Rf};MMPF$Mso=F*ELxnuUU_~wZ~f0Lm%fb(?DlHn96Gurw>j70Ht^{
z(~8K3+^Q3iaSB3Ml}3c6WEnexQJVRbK@$kT%M2v6hkjTt^y;SZIzmP(66y=vsv95U
z8_9x10YTxQ^+UI|phvF%WuHOw#;)Mfw{{*-l)PmgfsnKX!f8>TErECp`|9Fp>kUwb
zX703N1J#s@SC<hqDZGxwV9nDof0KDZnk23*iXG<ZN#Y>5K2^0ne1TVKLO7b1eP89!
zg5Qd2Lcd@1X#=XZ-{{ZuppS_`8~PDC7^m$ZC9hUVrL9Q<l*#2?qJY#qpp`gD%#WvM
zA;QD2ghe0v4v3Tnx=78KqAX9}X{omuja{3A!=dGbx<4K%qld3~@As!RS!bz~Y9h}T
z@dkiK@hJ{!6_SUVW(4xa{CrRvL?4=e*wn}rfmkB#yAvj3eUuIi%a5K9C`<f@uz9YI
zh58?ODfRmn)PM^88C$WwRoc%WQM*QC{MGl}$t(qe^lcBz0ou>xLE|nM{LCUqp}5bI
z<2Xxneuofp{?I4uP2I>ab4~~j<^t8p5S|)6Yyqa_{pAts6BIvs;dPTFRwgYnnLZ*N
z4UA?ry<$(DD|;r%*qJe^K{NtKG?z;@TMud?U0eea>5ss+Xq`PFUDr)Y!K8=6<)LE3
znA3NPlM)bYQVY!{8?jh)GQm`_^dVZFH0D+Nmh;^T*Q<rzOaswBw%rVuWq^RBbp)db
zv`{NlArGCY4EZXHuGOr!3sro2g^Y(vP;5Q*y7I*hROcgx9V<wQdrD3uTc_N`cqU-&
z;MG5t&;~kti9M+TD#KRwhd1xsLl9Cm3O^N1NijdhE{)O~FeAx5k=B%x3{n>R@_fH+
z)^m{nV6v$s|Bfd+vUVc+m1Mj{RVhQZBLe15l+IAH5i0mZ=;3M1GWJhOw@cUSM#!En
zXzmaXhc*_(U1{_{rZDweP-jl;t}KfIFW{5+|GP`&vOHU%a0W`BI8u6xJ*9MMX<B>o
z0|SAbl18#+$vBM%v<<Lvh)ezSpHL{+m&v?1{>~&&c(Z--!oOD33|oSxX*^y$f5K;<
zEgUUr;Ai&c=Zex{#+vi8>ILiJ?!#*uEZP-zeX8jpnSwqBEbt(F4enqEX@PX4Mjp<&
z`D{|FJM~HCRzBwYK%mbJ`WBx0ayKBxp=M=LYEX3PF_FHokvPJ<?)s$u;V;A@Hp8Y_
zS=|Re#W7)V$gWy|jKr4R6^_Y4G%1MoS5m32MK(4R*|w!bmT%?P2fM__r{0l!IVSPA
zf>z06AVR^%1;C;N2CaezVjS@e2;GGOJSYyw+9j!dGEkk#K%s7JcZk?ff=i>*(>K;M
zwI)qug7DQ0F{Iz0lo(>VN{{ELs?Ox;8LP!bk;)>T$)I`%z6bV8M3y9YpDqT*M?^G)
zIQVv{iSWQmN?t*j%?GztVXjf#s4KJh5=4<PY{H%k4tJ$tjPFl4a8k~Y?AeX46RipI
zwQw4>QQL{<kxX29_L3N#cEF<oV;(&4txtLKU6HTBQA^?W_a$MCSe1Fuq<1R1$hDD~
z$1hIE<h}8;En5N?8-mCd&u~!Zl)J@gM;e=TWL?}v_i&ltyp4Yx#~Ywzez&W+DRdDd
z<ZtIYvvQW6q!w_u?8~{-Yut?jV1U3LZGRCb3hYi=<oKb50~Q?13D-*a9otjIS<nXX
zgw-O%6PJ=K9FU~{p~T)7pnDwNKcP;I2H!-@6U5R*lv#;gA#S?NGQq&Ec98ik<@O6x
z`xhgalyei#T3T@aME)`pOu<PNdG!rdq8=(B`hRTQV{;|y+J@`cwmY_M+qP}nw(WGB
zbkwnJ+vwOv$H^XZuJzX1wQE=9C*;Ez&$!R)2sY(OZY4smURIR<kqM`sbn7V_NcxS;
znJDJSY@E7atLv;~QX!1sYu@9YehB6BZ|yVFP7?`XMtqmq$JC$v7kXA$7?PA|s1F-D
zrY@E!yS7gYjhl9;U%C7#;}e<s+bze%$!bOc03bp_#)t>oqgH##t7(#PP2;?ymYSXS
z3E7{|22`DtVpA~(M^ou$LS)>>;7}cU;OFH)A#OBNHH?>><-Ep?gfzrlcrxJ7rorge
zVkAXZ^e`sJMDMWOY6QKb)XBanteF>E4namg&gi%I_8K&zs@;T!q<!NG*^>E;lBH9B
z__1mbxMUGGuz$&8h7-JJGJkG6`nR~*HWU;;InoDc?_z*CZ~S$(H^3e$gcOp`pbEMS
zhS5Q=<&-=oL=0w*O<?WefiFkgjajtBUc#chna~cRa1L{5(xKCrLp`-R_Y>X5EN*G&
zkU0+u*$RQ4l3OOkL>eb&)0RTS@KkI~)XB}WbPi-vqh$+q*nTA*VLN6Lq27(Wq&81u
zliE(s>*9n=V~n-olikp8@dU9f1GDw(8Pm|ndO;6Lr6%P`hepYD?GNX}_;rZ~0~{dO
zkZ<gGXHNFJdqP4qf;iH_HjzJU+0ST*@eUeyXbdc_H01rk2{wFZL1xpke|qnxWF8|e
zzt7~TANl8!Pn#?d!%c#8=lYc%=GGW!@_b~h7cTyIa7~B?xZgSPgynggLhWmV8XL_c
zC6TI80DT@=&`9))snondK%c-EM`dz@;#RTbNcbE{Ol|qUOCIP?U*)@NVW$11kk#%y
z5(*&C?e$&bSIm0gGEi>NDZjfMp$?Xd<SBZdT@CG>K$A{IX#yBDuqWf4`az|C<x4x~
zplzQ9kTEfUl}BKK2$gcmmFQKzGbpXr^!8a+j;0SM1yqRP!3)!<uN&adnG!`J_bi~O
zUmX0@;}iyOdX<i8rhBaC8X)mj;wHn<w!~BsoiCOkrcMVMR3mpg2;xCJn)@S8qjt?;
zu&h0fB;^$y7N?<ONYHv);PyG`y4mZqaSgLpA&~>a+<a8|VVi=xz;SU3HB^!oZz?-W
z7|hCY73Y=&>t$iAQc~R6=p-NPDypkfVqov;8il!VlY-N%5tKAQHUTE(&_rho)t~Ay
zcs7XQRQ!hVA(_irn+zdMVVuWDt+R^bJMda6O1B*Y5O<izuU~n-zJvUWLC~bh8jjy0
z#T2W5NE>0aBrJ;$jemA-Oc&U!enTI#%$8xC5G-0bG(MV<=84T5iY!s=5fPINUn15f
zPn*4FgIh<8YC(r}5r&F&xo{)72Y?7;Jb>=3PSV^8!g^9vB<z3o%|>vB0~xOYNr@1a
zHE~H#$WghumQDVEQR`_sTQhT#3nG(rr*v2%tJX)AI#jQWgUOT(jqOQ%Ag|2j7RQ!l
zPF`3y>7uaESLQ;a6E@`CpZNMvWE;PhO<xa&CEFE7W0FoffG<Fix@u#DL0Ll5RJYxP
z$Rg1=?KkDc5xJ6`Af+|~jNYKkYg;~1XSovTm7VZcp8JvJq*c=5J{V!SHXJ4JJa=sx
z_seFDM#_-y=WXZl;p^pXo7Wp(JF-c(`_@Ztn=-$EDU3{kFsi!Y#D2jYA^LR5TKp<z
zT=eYy!pC~=C`G_~q$jWQJ?qqFtaiQ?q=SDdo9VKF#FN1{JfA9s$55gcV~x+5%-d}G
z?fV;=Po8Zdvr3kIM(}6bgGO$pOiVdB6EKF7R5}4cL<Q+AJ$FL<9<xvjpZ34Os)BH5
z@`&Y<rp58d{(1y#<E*s?dAfX;X4lhfSUUwE@8kYMAd*vsg|w0TV)nHJm@t-UEyKvq
ztN#3mJc7vV5H(_QoQe98!7#N%&+8`t=YS+eVt#+S{fNO{WLU-G7Wecr(69&+2_FiA
z)$r;O$XPyLliz=q>m7ZlkxWT@=i8A5rW<j}jehfs2#JpL7JUTE851IAmpK5{_&EP_
zqTL4~d-Oq!746ae1Yo1(b82*QwtKo4xnJGgKWpg1krj=-zJ<HIkze;b+?rk;hq5>|
z{9T*^R53GgEN@iZj~@liXLUsWF{_%^<s%t&2_mF_)(>U|VUg5NE(I_g2!<6bN%dRL
z;wiN=yHz^5dVL=$0MpF*gurX<Ce1O3UO@R&nQg$Za@dYRov^j-gty*(7{;!oKb+Ek
z-)76@6ly9NbQ<-f*bjOazHMMOcX3i}E!>U`Q+0}1*Dy{@w{x|ttz)fgYjq=Q<E-kx
zt(J30?;v|{I&+)@J?@K%mGi=!I^Ny|B9?r%0~UhL`4?qyB`S2rJxlXnkYx1p!dsi>
zQAZRCus`)p+beODqY5#+tVMoYFeZe~Gj2WKHnZgjWR@QXhnS4zL8?pea-!JJZ$MWS
zr<BwRp*h&gI8@rCSdC|{Xn_a(B)WYww)<HaQunJcSwe*Ww6a*8^>`>vZQneUD*}UY
z+`Nx0dr9BhwyNY$*^&`l8B<-^6@PmyE0WfFS)5$0k^&P6`Z7b75G!#}1FeHy)2{(z
zskA81dG$fl4_CGX%JUj%6tV1iu%wAPc~msX=VJZhKqjBR4eb7{?r%j8x2TUsV4qaM
zU&Luvf&!%Hl*}4YFP^J&B~>W+Oc#`(ebw0J`2QG)ao7Jd5ZmlBh`=DAfxklnPrO-c
zfrv~l0S2OR!~YqGAOAHF)x5w7_8_)S^p!}6G^8@sSXn0owCoA>G|8d_mUD_p!QzqE
z9q@*F^-nWIeRr-bMcei;n$J=KYK5vMWDOy*D5d@hbw_;u7wYDuGc5%ASLg3btOi*6
z{)({U^9=?Z(_&i(5l$`5mn+jH?7ur<Hy}suuF%R~^y^|*i_G`WSW0N1C1V8Yv1IZl
zB*y}!!b+8nv9YEN{E2M$JBM0WNB;6+!_o3tbdRf6T7grr4qqH<{Q<DIISPjXm#{45
zd7A~a`2tjQAgEw*tyB_+rW~R@6;xh9CDfQd4mO!{@K65go$PTz{dxN!?Ga=Ma2nux
z#%D=g4Dpbc13cI5g-1Blr%>dq6eEwK!Mo@EHmDifAv>KK&UnEvc)y?7ufg5|wAak8
z45mz>dc6^vj_6IkM^bjj<$$YLoIOpUWIPI;j5Q#P&4W)YjS2Hd2nBqwkDmuO5WF97
z%)+0{0#W59?5_Z%Qots|Va2}-ijZjOV&L+#v{8WPgg&W=G`5PMCT-95FLuv$3O8#L
zjeRsob%<OCUb?=2&}wiLh0Tzm8IcXBMce{7{H=R}+VjERT!~RaoXaSw>`BBTl-Yj)
zL=(mYn~Vb5?avXeSCS#p9=0f^bFTP}CnwEy(xIA%tNM+&i<x#vnb88DqVw1BVc3M9
zH-R-r3^P0_3aqwtQ}G9OjgJ7S@zo1NGe?2r)CrfYHw};fv0Wp2zD$2;$y+CQTKX9J
z_a37VnZD|qy-uCF-j6c=MkX1KsfrQjOLLn{vFLj47~BD1R6kguv<8rLf~|f;qc|TB
zL`Hmakh8KvRMP^BmkOV0kU{0y$+b}T76SDC3e9ZpYh54N7WAlo-+kz9OQE;5M`|B*
za%N_VMrImpnu2z5b!O`CWo)s4H(2f5@2&QLQuiyTtA#;xuSzwRqWN8RTPl;Br?OZJ
zTHznB9zgQpzlq+0`Z}=1XI>lC?J>9ze<VfjJ7()VOVcD<!Q^pRS%Ges(<a4%6AQy}
z%PX%8d@7QU;z0Ocz2vGDkHm^gn?T-iV-pb*0t*Oy@5<&m&II({KE}Gh?ANp8|Ar?4
zB)*~Eybz?N1Rf}6<#w?}Jx5Viyat}Z6!P4y6eSl|2zr~b0`5<UdCuMhDxaWg{DBj@
zL#R^($mOC<t`|G|R~DC3uW#>R>vegr2cBDmPnCImrSUPWt$?V4(Wir*M+TiP5Lu1|
z<dS7RdrlXmZ5Yc2@*Y<A1Ej<8{d(b`PB1~JAjP~iJ$xbmm$MZ%7)NSQ+%5hw1Q9xO
z7CSGh>!>1EFy>%92aOV`Qi5X#o41l=OIx4a4+M$SIB}RuR~b|B$z#9c?rBvK$S5p|
z|98@R`=6v&uxpt#vxhJW8?;&ez5(2rA~i!OaYdwXujnFT#+(7gMS@g+$c_WTz^e%a
zJJOCoL|LMJJgi~Nk~sPw0P!d+rxE}lmIWyyXViFNO35$*0K_gF3o7<XeOYL05sa99
z!LmrR%<s&u+QS9&W0;j+xt+n!LyQ4K7Zv~jaWZfQ03ee67eMsjh02cG$@-T+Ec+jS
zsK^@V4?@1!3jaLzkg7Val-S|F|D*10V#wWS;s5wU)_#Ja7-GS4UI2f1cWeLim)R;i
z%fQ5>w#h&I;q+T{zAN7t^y;7$V3^`x{!rAt<GzhAxA;A9{MR8}12GRCQ%b%%r)@2-
zt>cibr}l2&=%gdWHA1L6Ane@+guMxXu=i=*y~Zk5&epw2i3jI})k+MsfSa8atix0S
z#l?(deM`~1NpV|frqZW1_tnKc4n~jlzkx3@p$hS=8{sC!J{=J<3aC2=rWI&mj;Hu6
zMrkN<#<B9re-huDuu9i4a!Yt?@iX8*o8ij%HP+~Z?3V+Vtc?y{Uj6Ov#eWaFHz-bW
z|I_;7&FyP^Yxdly>@?W-lPuchz$Z?m2L14?g>%A;9`eFpTg2!*LimkZoH&FImk0?r
zC#u1PG(5^Vk?x%O;7Av#Nra;&aV~nmF7BSdeV0h<CN(`OdmK#l7P_wM@{8XbXC=2G
zK?jJv=(^Yqg!&_M^Jniw0*1Br+3H*Ll7CYUJk?iQSZ(jQ0yC7I;46i!D6ZnSlzDL+
z70|1odId;RyzBI%3kF<Gd-zQBo<?!;fs3k!%;+tOqQdF~w2$9eGEV-ShDNz~^~1tS
zW$Et2`J<cpW}Tvmh1HgXMR__qC|PY7UK;XMnvAks_V1Af)gA@j(n`6Sm*w;d>BiL(
zSv5U$n&e+^V2Pv+%mT>Ru)oe3WKbyU804Ja8>AaBFbzY3QQ`vy1X-)YG7KZSJuj)E
zpNTE*$=UiwrkWq_!7NAb$P!?P#<s(x-H@cfYx{}@xE>ln_tP*f`ermS+NQR(_}><|
znB>f`03VfdHxqPIOxBnCHO`Bk3b~+z4!`E7;<oa@Z+O}pC}>-qpSz#Cx9C(xONZ&O
zc2Ve`tCMWn&_vqqV%l7{ybXp+F%Bo&UN<yHp}XQ|1%Sv1ogg*8zF_psRUz%G_1@&%
zoTEsPs%JhzqQar@$UzN_3Xt5KHGMz&xcpFg!Kh0M&mG?wi)z|x!JSElV)kh_vcvm|
zmSjwf{__9MRZ{00>RP5I?m`36h+ek<END#5&R+P0?kk?P-|t`yZ5mJ;ZNdi~qTuxu
zm=@9BqEBMk(k*2$OEy=dfCwG&nSm{2J#m+UNke6DI0pm%G>9o)I1+)n+B*PyMV}ge
zBo5d}Y>=S}2(XnfMpd`VnR>oDJGz1)x`KbU7(llfG_X`gd@MLn)keG@iZ#gkIi87s
z;grVK>}F%0d@FU{4mcmU)cz^XpdBGC$R=Nb1RC)2R4#3Etjz5c5#n04r>-5wBQTG*
z;RtY#*LR!Y3mgzlP^ensFVOe&$A1`alHo>#|BI<H%kyv|TJGdyIK#tol-nk;{Oaw)
z4;kzbi#H4V2nZn=dW|jL@+6UPx(n;3@pCW?<2MDqe{*`oXY($WHa<FoTx*n!+}yGg
z2>E(UcWL_agNif0P-Vs-E%LRH1bhKIn4kP5`)y2I3A%DvRBKR=e6u;Q^|()=aBE*D
zhu^%j{&~|Br>hRS{MNlK%*T+y3$<#!%`_hou-(B@Z?d|wblmOJdcD<Yw>q<wewFil
zx!wV(2};n;0@>7Ex*F@)zp4sbh;da^pseL)iDdN<iMQY0Sj^JE6u#lTua^{uI~}O(
zGlnu^@TEy<hVUZhqmD8p6W@zO$~`zhh-GA)GL@uPnadoF-5;}jk2I-vXURT@ov2nN
zE^@p`G3+-Q@$&y65OXauh5t(+{`g7wZX7iynQ$JO#X$zF@1!jT#hQH;!sBP64WG2-
zb;=5nE3Wo?M;sjQ`}B&KE)ABc+B|jy7F&g~%k5r`7Y^qs<0ZmIkNSSg+S#v^J((B<
zS7Xt=2E$g-W3-pTGzdl%r(V}Isd2dQ8AKXNja~(vCNcu4I$azg)HW<)SM59yjZ7~-
zP(q<Y*^3h#XpG5B^nEeBIY`X@2R{2|u^rakZD|x2Shh-AdnpU)cyy#~0#?CZi2N0F
zE_6B?&_;D4m@btcSNr%!(>$OTvDKkp@IF|ym=-29wC=VN>x|}V`hjW*hY>p`rH;g2
zO3Z{_l0|(2IBoONXB9n&kqZ2k2y}t;vH*8naf+-lSxWOdfVDhbB=Y{?Ip6r!cH4tO
z8RbOP#-0I_i5q%I++Gcqo<ugnZrs2+v`|?Ji<D0n=0mWcVHg@ZKsdyq;p9cpQ&*I@
z23Gfl#1~)!A>NW9O%iFkEPc>lf>`zd-WCOGE9g%VONN4x)&{*hij<knJDnPSw_n&U
zcpK2(gjL@KA#KLboQ~Q87yXtvui<yqD6P6zLk8&z{MV*=3?0>d+o)y1b4CKX)7i~;
zOI=wF=^_fgSSXa#g|XOm$ZX0(%_|zJr6lSdtRGxY#dhULI1mDqK8RDBymCqtLFRE4
zkNWu3TNw^ATmp8Pm$W%VIu^GG9?MXx4q|jJu*C+_FnfrTaht~c>=!A$8BuJ@Uruaz
zUlmCrp(~_lCl%W<1<81Lzm?s=5z*FCfB#41EZV#x|9%AT0t#-$yzH?_(vl{)y>&LQ
z$0TzS*EB(ix9Qh1$&a6|l#rXofwmT7PnjT>d(aiJ@rfU-q#4}CTvL`Sij*)=HK~j>
zdI=m$mSh%T2ri6*Z_Is!yReGb4KP{2>NTuO_pH?3FmN2e2<rsGnVQ;8Ct?H5bU*K9
zsejaVvCN)u6W&7S{o&A^Xk>LGcPSGh<-(!;8Jx=nxMw4n+c15sODtD5q#{$cJ+lC1
z)`TJbnKOX9ilEMETB8zcu!ovBjPmnwMI&YX9<&PwE`#4)p+|<w4A%^`O9Q_e{J0X0
zb7Pj97eh`2-&S90Ll<`9NrKQUxXP$Peih&#PCKVtG?#K!XnilFB0H`&w97Mo4pfl>
z5>YtBsDB;A+LeV!jradJh&9KxdB1mp5tbW=e_Ap*8-OzZ;~=gky1|^+0EG=2Yq3p(
zt+cSwGnq@FinAI_!{ZhwoiCqE$7ZKX=*pkhf6(GLx=E=noyfdyL&mMYk^NfzQ)Zma
z7o<f%l^eyeeBFphVo34^eKC!zra3@zk@@*IM`Ci?kqU$H7Wl1FbTmz%#C;*2w2nJ%
z!ICRQhI$NI(q>P^;+bvyhZoA;vQfJ3?T#+0b-0Z)cw0XfiCDgHSRO_NavBA+>k!-Y
zLd}upr@A!gV0p5-T*Vu`Qyzl@MW+^WB}$U50>`1n`Pp@>rZc$LDjUj9{|}_-5czGL
zxhd4%q~yvE#vVaLlXKxv;7?)aD$>fNx%mx6mGIn+9l;!&MDD|E?%y&rh^^XKQvGuA
z3&Sw*0Y51o()Ne%Q%>!CKgq=O`s^qP#J_Ve%j)4LsLJ~K@dlvq449x<3u@x{m@;eR
z^m1$)OU(@6y~upStJg?lNGM}&eUoMH=vufO_A1GhNpBv4wlvAortybNS=f{%EylL?
zCg^^(6fO{3&&xQ34Hx~~M(0Y?**=7;QP$GjdWEYYhM61(wyH38KW8L*`7NfTIPDxg
z!Fbi7l&TUMQ=)mbvn1C5JnG|ZS*A;OM=DeXnQT|t85fJ2nQ0X?Qu?^XANX{8_+%sY
z*wk@dqVWKDG%rcYB5l<FiKCJ}+bkwCXbI0bjcKBgdqkK(Mhq?E+!`37Iv&~NqWr>u
zw@hiZuKrP)JZqYBK|c7(xaKv45`m%^T4#rS+fRjqEi56OMvYxN(6IqXN~`MnT%(7_
z(yJjP2mcx?bzrORTf|78X8S{c2e%S$^EvDCpErWNPTYYIa!q@+L8?QUYb#jL-!H&2
zl9(aEHa~u*rCydVZXvW+^<%$}WF6HBXz}|0$@<}OogXD$Ovj9IXB5UD*FG4^biIhs
zhOxnZh(oObZ*iO6O~^X>U5ZS3CLW=s{$t2yuYa&#`kG}9*MAdlM8b?H;#L6sF5$<c
zF!WYE>(6z(<AP|_-`aF1LRB#f=)TXC@z6#bUvaAl@XYXy`3&YT#ain!F}iA!-o{0Y
z<P#voe5<PcjwB@MZ1&4Zu5JNj-blhHDFSnok|Y90X@q|up%CFxk!Htwbkl~kiaWc=
zhA~1Y7lcdm?_>Anc7<;vE2NT@8&t29xVXq>m%F{1v^ORT*RE%)1Ekw21HnbYsmVx*
zvIbB+x|F%VEm<^8>k$v@@2Utcf>k3-14nJrEm?*@*!N>Fq!8SorJ*Lo){OdU^Z%nH
z-XJ^8xC~=B(%<}}B-&87pz%!oqa<#|I;wz$!OAZDrXQf(h|8S@2HzSRLOU`z7}Xv)
z&jJe1SgIY23!tkAJFUqbOFeF(C>0~+a1L}GJq2D^w30#CIDNhQyXnxGf>b)iaa1E5
z>wDZ+g7UE6H;F`|>MPxdMvG>XB}}g;+ddhhtUHyt#}Y!694W5%b_e*eQ5a9OUkQz2
zK$_eDr{)84Bbh!~V%ar_QZB;f`_5q4?jm9wm}p`i&1#czR1uBLiH>aksIM<D2Rzu0
zFKKj}?BXq%sgnPoBUTvVY?avo8-jt*hdF<p)+7})DbzCc`Ipj`tqLYr#OXj3KuJ{b
z`Hzw~tkL5RxDqsPn=X}g6w%<{VnrVb^tS)KoDNb$qo-bd^5o0)kC9lI{{wQ{<E81&
z#qid{_trts<k06R&>>qQbhtzm<d9zLEeB$fG(@-SZG2(d0r2kVMxMV)&mTJnMZ5`J
zMreyGM<SBE9VmD4bd%om^(D}HrI*{=^th|S_A6^)==h#LhAur+2ZtZdxGC9ju<Zve
z&=y*I9m5pAO6{d~yaoI>>j?T&ih-wBmHQ*`+Ebp-Z(je2%Mf<RCXx~-!0ya_x_6P1
zqjSVQawDdH${|1we>}x-J^DgFS>r=~L%?*P7$y?jeNl=-pp^3!UFif*+hUbC=MUF4
zM75CE|JO)lnq6F#pn1SXM}_}8GSstX{xagZZ{3GqUL8{Ma(sjwML|u~lNGkKF8fPh
zdVj-~toe^Mcs%LdExt$2C%l2&?YBzUR=+M$yT697pC*w&&x;`@CzGp31+txB$CV-3
zWpCpVb}<i*vw>b)>+R(w;C+`+57nRFFNsy=xm!7p+Z~3GRS$h=j-<XXz4rs8PQ9Ta
zceaeOYooFg@F?9?0Tc(=Q29{f`ov(#3~8GX&<|ZZSo-$5-ddQ1o+3wwIV2W)Hf-iy
z$>+%(k5l!FkW-*6ge>KaU-p>Q)69Io)$waM7`@OAymj)D7(mCq!M4bmw)iC#Q>C|N
zV-06VB1WXf-Z`ua&Vs4vxb*f?9u@rvtA-iD+c{_nO$QhL{&W5!4eigRYF)`*vfd3G
zCX4P(Wa(9DCZTYEPG!2dxL{>6)d|x@ib7Q#JXfH)51fH^qpav?F1g#w5nc~TT1wd>
z*#*|fedN=pj)M|L<lC;~R-k~NW`3r-ynp|E<S9WYlHlqMm6Pjcpu|r&-eAqZloP-+
z8Dpyh3P!#nL|P@`yk|ylx9a-|9f|By2}XAMNU)o|!LTosfQ3R-;xenr)_)>>&5c0D
zf>nFYS%G4{2kVd#GBWosX2;6!zK_hfQ%(|&6H8T;chP0DL5I0in~mw!g}(qY3Nf*M
zfU}zKav0+;FVydO>mEYxLD2-=;I1l7tj*!r#n-Buq0ojYnK5sSN_ct}!aZWFTjQj|
z)p$a8DRqt?G%Vb_%n<jP`;n_<4r0v?BQ^%xLA#SDy=IFfbRsp3aWxJYsQnum`uc;m
zgtQ4+Rj>*QTns%ryzhav*Wno~_SYY+Gl!Vk*JENRUm@4H0_ON#-X-ubA4+41Q&ngw
z(IDa4Z#&y>qXNBu7}%-sqS~wbp$Qe;{}v_e6(r7(we=?w17<f8!)!dmQ)?@jN)s}B
z5H34#(SpZYgzdwuF3=zb8e%Ad$^=pj73Xqj`I94K5yGO$!gA9lN;XZ}f1!0#;kLB3
zJLjic7&ql{d>q^(@8@{&4@wA_C4Y?qbi^SdtrNlh9|wlgD_p-sV<wW)R0F)-lJkr{
zVi;i7yvjZrG;X<fUq(|>kcgz0kIYKp-{MDIgLy|+cfww)n*ZH_$d8iAU3_NbiLJ|Q
zX8C5cgx<>I8t*Sxu)9z8t=Mb540WxWD?)9(SwPxgZKV=5CGRx4+o*Q&j}A{bk*B!K
zSMR(<Tc|tr`>jl6<d{U(V6K;v<Se6TR2ep_UaCM27ehN9TKG_qZ<rd)E;%HyfC)xt
zR7KOKI@=M=7BW?ZlFpOHP}at~2s2z<4T5+N%E%WDANsCsw!4_7S__7wofjrq<myB_
z^w12Y9Hjk|v*0tG(d`A(S7N?|gW78<{UyruqSW)Enc;4JPq@zl1v&?j8s|z-$tonS
zvH_2#zX-7fE|LIF5R;mV5m*A0v`rQao>H!p2xY;tZvhrA$PGK}en?=5VA7%D$OQa4
z={MPF0U}!nsLX<p-5ee!J7LRL6liFN;_wVq5*$BKT^MZRPh0f#!O`0`+Hypx`#*hx
zR_F{rw+@91sr!REm%DJr{$Ua~f=)hHskCp|mu#BmW9^jSm;wmq(lN4FQ~^w4*lFi~
znMCe}<iBa9B{%B5@igIhnF#ahnL|~!*9!WASzxb$H#U1>nS@291@}|-ap}MH>Wm7|
z+tp0Wt8v*BJ<t@T6PCogc#ayE`ycVtCU-Y9aEz}lw$y8c(R9v0%o`ep{D}?L3se~N
z8!9kKby#8IcID6mAhbb!ca=%Fcx4e3nI#=#;xJw>QaACfA7Ujml}x8F9&T#op9qtX
zr^#ffevOlMc}X9&3DbMjs6L6BvL!h1n{}a0GUoED{T#lTer&-5*JH->n%K_T@P|iV
zcvlwf&7&buHDuhSvWyJXWQvYhaoBv_SxIQQ-fCcL_SlTCsZ40GUw@kKydW8y1ENX1
z%f-wzrU!@Yf`2$ZXDeqas=qfDB3<uAj=GTacEn0|!C>(4y8YWH0v7|u^YW!o7Lpz6
zrCS`{w_BS{K69XaM(O+9FqBc3nwdcmwB0XBybAMc*ffpEz^$6tCwl-+Y2wKC(5-w&
z9Oo2;i51uc%^W<|%Dh>Z`o$wb;_S6;_@_HvH#s=(z6hoOzwwTfUyd@gZe3aj{vCyn
zeGWgA^M9N~@mPS9*!({y5$=p={2>ve&zr;9h#HH~RS&k^lo)>3bKKKZ@QaxML5h}+
z2JUW!QMul3EOCfe$J=Tg)qFb~L-dT?^>2<}Ef({|!XWt+ygg`3g*o9v&8e~Ch1n-g
zbsDGLu<+3-!zQsyMzNUBvASL^_9$82_-3=rb9?>;HF{hl?)wo%vq}NaB4R*R0oplx
zy59AD%F$$kTYDhuA4X08_TucJCn`m;V8nOtg!#UReCn=}r=J-Dx>Mnus|jBxZzo{L
zrO2`=Lhj;6wnIG7y1Po2lm;WMWQl6Z<=&2EBNkd~ngh4>?AlZKwG9`Q$17`z_FDPQ
z8l$=r!V-!!kJIUTbXK4`wDUe<#o6q0ntMBHZqq;QXmn^G#T{vWZRcucd!2rKgeX4l
zKS?YP0)9q_ad|#Vu4#N(3R~(6Zp7kyb^AL)F?qmY#cVXUG)LGmR5H*r8Z|#3a9q{-
zLR?A>e}kZ1{i}Vk%0}->Ndrz%!*wEFBajIoCGy4sq{Lat7|n5*JZu)1Dw{tr#3W@h
z4Q*{sF#B`@oyWW#O1Qn9y#C8sd3u?7Wnkt06xAHA!VW|cC98CECkUoZhGA3H59$cp
z#6VE$^B2~=zf7J%cGhl>5}M^!#Rs;Ci9-I0A0AoUGdgz|z$Q77yCSodu{?g5I6I_o
z``j<wXK<hib?4nzKoAhk#Jg<jtV0bPx7wNHEWx6{XBll3Uz&iC^Bb^QW`#{|D-)(@
z5djNT9C^PPQWcPC_|55*A|LVwIbbo|X2-Ff3lZ8zlxT}F<yS27@nT=;M&(Uw{5tyB
zrMyJyH4jr%(l8!>CM(Xtc;J&Fu$5?3C{|59pC?W+TKVynPkq{Cbr*>yshb>kXXj-p
zaQr&)0+Y{hB8#!80}z8m2$YX%_$?^$J#|pRVv)70k=1sm#OF}kNN1LD3`KwHgZ4N*
zuSL8<+>2Rr$+RIV;2Vz;d7h>nCX^xr{-McG%#p=o%7k1a*Vfk{ry+JKc2&NxMv)$)
z5~KsLlT2fdH@*Eb1l*1l<1|mWfXJx<li2qO{L?!do#9CrIyzcJ{Wwqg7Vq_ljba6J
z?M7J8oTwQ0A`Hn($UEY$7b=pGKFUlWad({b<o$`pzcTx__I2B2K-AG?247u?M6-(u
zCoB`F$2~c%hp?aqh%@0!C*MKb)*!Ql1E;r#sdoq2wfjPO{St>BCo-fa;jTpMIFe^N
z<{NEao>dzlu(Lic=sJ<Q<(9{K!-Sod?CEfTz*SR6-5RmbYsc`M<e}x+%WNQ?7=)lW
zxpG1>NL3ldvRv7gEchItM}<I~o_rV>*S4nwhXnFjd_Bi2CbCyXgVY)Y=aZBFoF(fE
z<S+W4{pFA+B4LNpTqx*y-E7kwy(rAA=l=?r`?jby&I(d)Cl@#Zp}j(8l!02ui7ulV
zdSpODXxwF)3fdutFk1)Pe;R2|w~kndELah$A+bs?I!j8@Wx`GFYnuEO+u;J)nM}vT
zV24Dmd-TY!r~sr{X4eoVl>TP6ydD^<P7P|j-i9A_4<#W@QkWS9Wm893lD;qiQLF@t
z#Nw$t3#5FBtY84bYH%j@BPR}D>jqvbq>nL}KfPm5onsC^QylsY47N|AfF>qLXjF)_
zr=k90pfTmU-L7sJB8z~JfLLDx^oAt#0rija_oIkax-%7zcDTq~X+-wIfPb<LylxA}
zZ<kg54N1$TrbX4TL2eL`KNZ=gfK$-&0=6cOI+`&(zD)(?%)}Ma11sJbrUIj9^<%xI
zrgF*bgmVoItgVu#N)Dul<TTbFuaOhYG;#}Vz-%K06g3JVlbfKKHa?&%<Ejh3hcw-Y
z1DrNAD~wm)%k-<{8DkC({#P!O$7-^1m=}VN=v(GN=V>2UT8j+7;<4K}RB~!n?*%)m
zQsdMOq>lxt+a8TpWH2mbx|ScQUx{LbDh&)V&cK?)shI&^y{0e<K-Jh)fn=0@H};^V
z4|rIx7T;!4ZrU-Ms_YG>_Ylfx`fe4S$eN{LY|Y@1>py^l=RuCY;$EK|R*WNq6Wm||
zhhh|p;*Qyb4nm7~t=GJPXqOu7A7QijlEclZw7v;`AtEKI@-bc~GA~53Cp(f#zQ-y0
z>nNQ6E!u%EQJuLV3A((TOOw#2<=8~L*Ju1`wh%UD8C2b(AHxS10bny=VoO}1GDDs(
zFOs;CKGnIyFP=LHEdMOW8@>0`LFS5&H5d-y0_UTCaz7(!QeVLCUlIQ77Mnj`c)vJK
zaJS!l^=EjCRH4ABj&DY1M#d1|RnDJOPh71K{sDQqi(GI4Yo5D{C)p8i+Mi(e!g|zC
z_T!IK0C1XnK84?`$4i@q@4`hJh2ETF-s3UwZsW=*VGVG8VoP7-ob;rlw<qKFz@UtB
ze(;F6Y-uQ_S(73+*rFu($Cf6eo;;$4tawlv!FY6L6vV;V-wA38wTlaO&sv=X)o44j
zAWH@oMz<5&LPE5Cxnjfd-;Co*8`0DbcBWJ82+bI8Ap&~#t``qUrc<C$H)g%p5l$MU
zD-?$TgfG{g1A~ikz42|+$={k-YwAs<tilb|!JqAT0deA06{4_VkB-@2K*)6e#D+Z@
z)Hg=Yt#rQ`A4Fc^QhxW@CNW<*Jd8;rH~GCE<LBIgV#K-W>G<$m3lnFb#v6ZG<9|qt
z^013>6wMtY-;Q_-7}zB^T*B4qJQF<D@@&uNZL?;^c)%Zdqu^%^jd*&{=1aN4BOs6)
zXUW<w&XvhUiQ-)K>8+*U3f<YG#XKc{D-vVGpfYw_dh}>c98Fw7-g+XO%>UTOuNO5|
zv5=r$`IPUf0!>-nYhCSzCB|>^__hRd+f(yL0+qBCy_?yI%j?G9uAo~%jj7pW;eKh=
zE!&8Bb9dP3JOV|#SVKy95}B{{<Xutg=NL4`2*I7vSgi>eN-^B)4I*|IVb8#=JamR+
zhuXTQm``HjPWg^g1VJpSk*j(25TqK~cdm+-sJ;#;_N?ZKx7k<V`K|^%!_YTMgJ+73
zbH_ryHmA5UOzl!pXM;#SmH+Af*@)m~$PZ*ZVBRMUrr8x2MR3bAnKr)6wG^&3Em}<i
zyl{x;w?v>X5uoJbxz($wVPWw}P@|JEjR?Ode+e4U2tRcd5YnX@V&wWSpSWLo9e-`#
zb%`=>X>6(Dt77aJ83%IgPUbhr(_km|=>xg{mF#Ho;!uQYoHzd%yu=C)=M~vb4Zs^t
zAoG*mD4WgxH5m%L9abeOJv{7!P|x%rykQGj>Dp9vq)A87Ajd`}Zforhl)}bT2+fIW
z?IdLi#}RCNwF3d;%$UCvXNJebHTPOsFoNPJ>eLM%hqZFMK_U6{QRT|u>0$LF3PP>|
zwdqzE?j@A__S|TJSo!&XLTnYY`AP=uV~sD%Jnm(G;vvb%60X(B2XUcWDPLNW5K#pb
z7A^bL?yvgUOF+@|1#f8aC|nz$W5{PnbcH}#;->*8`EH6S;#({(H?e<vl1n|brGPW0
zj${aD_bALJ$lU+DZKC+wgm2pK;XXo=hhI_SQjBtz&9H-CS|iQrEVPnpct#t`L(tES
zBKw7pY>@I5?j*D%LM?Bi9xVIn#3}rc9Z8zRsu6gf*v_{9_sP`O7uV`l6(M3f`SsEW
zMPSDkOcs^8)R~Pm%^czs=~45rt4R%i3X`Q&N%9ZcKjnUd33n>sP3)u5D8ypz$nEMP
z)z<bEYGTiAtfBtM5uO+s0b4KSP6z}1lw7MJ5zDrbcYa2BWTj#|!(o)iV3v@)U-I7G
za=K4KQ%YNT;0#*g{!^UF2;FH-4k);Oi9-9vOfE;w2qhGKWRD*_v)ff_q8VA4F!Ull
z;)fwMY6I;ke**pUW`8tI$XfV|4x9FP-w0hwEUxo7RJ53?YI{`q{+=azBSIn4mw)C-
zXZ&--AadoVk(XZm6*b3hPf2IT@>)yH`G}qWpmSbjICOI1bc>zW>#k15s-^jReC&Sj
zRJG8bDVVKYT<zM8rS#*py~WcO@hiaXFLvzrak|Wa#P;sJ&bM4`nyqVb4*c@wWjc-<
zM^`t(<%22<?lwrgzQpSe{YgV{mM-;SXE76MNL|BQdb$Ve+5qU!L+>upRhzg?_qGyk
z?Yx6klb--USKnLcZ@sF&SOnvqHM?DGA9!P9Ck09Z<dF@j(dPy6y5w~f#k{_YvYv-C
z85i_NT8D^#-A*(reA1ZT2js&#FPjt!a-^e-#x}}Th5Py0{ScSM7x_nCho}fL&q+Ck
z`6#A&?|~^?Zj3_QYHh*3dl7wLX+eOlH<wbUZwe39L-n3>VOWO2N?APpjGsP0D#*%m
zmfsq@1#H3lxOSm)r5JE~%m%(=+<EEzdmc1QrwDjG+U#-v%W|^LwFOv&lE`r23%P0X
zt5hu>J)Q6}gqGhGQ5itSM{8r)$?g{sSqK+lEh>4~t$er)cKJ(#BN@|%arYJ!P9r_&
zz$L+_AY)M}KLa?>{gppgdoB4y#rp)p09FrH>PC*vl8lnw1?sJX^$@;=x3m9dk!nG;
zOpF2WPZP_2OU_*4grL3~Sd|gvs&Js<id^lBv}W4zY0cBuP@+qPMlh?8i&vEj_b9_x
z4#mjIHEIx@!5PBoEU52LUJdIZlYTl*k}+!Y0g^Ne6k|TjFq%oICQ{>T+GyDgR{5w@
zL{#h$hjcfSG*A|M<=2IZu&T$sKe4|S5?XCXqbPoy?5mD1pLOn{)zFe(oY2$pwGDEB
zkSRE}AW0wR3QU&GXL+~yzSe7JNk}l2*VSn}mI}y{avLQ*sDuYXf@G37ZqQCHhYeJ1
zlJgRdn&|YyV4Bh(HS*hKYTR4l&USpM4<>iz+m+<h!fB#^8mrIx50l3e@ZfC3xn0*f
znZD1N3EyLLo(K&4MsXwN({||;gqxP90s$s!H#dkT#ho<*dkQyV$;;%Idw6?znYuau
z9Xs35o-TpKa)7_#2wwJYWQu(9y@$29Cprq*9qSFSHoCTW9n(PYu*wu#`K7E-hs$Iz
zN;`|AQz-6+X?D&7{<Kn=F$<bESf-#c^C;7I`Ui*^r&!(uZxv_E`to;<5#QM#aWi%Z
zAm%%{al7iDyk;CL3%B)mXx9b`*zthbx=i2#2tQ4bUTy#TT<)jrq_dYGUovO>V$dE)
zkz$U&5Z9sFdFb1)cE5ui6uaMGI4FIy+le*%z_znL4w{*^rK)@BwAPO~N}G&99Hc7c
zL2(y~)l<CD9d_&VRV}K=xjP@x^LC%0gQN_2!z6p@ijYs$KWWffHKPoEPoDis>Eu>U
zf`BR^!mgBI49GM69p6XH$!Du+S<mcz8s?CIo9tBZT)VTIx6}J$qFbtc6SA;3@&i#-
zILA0I(Z^RsL!o36P~a@}{Zb$M?LmaWN0ttV?F<)7a`gMB{qe6eH)6BHHSJ;$K$^fW
zFAyT<<`KBsowcY&IGC9AZjPCE)P-_j!5;&pi!CkaGxQR8m_dvN!sys)<vv`XarUEz
z#M5Eb$NLFNB4`H0H@WA<|NoSya7Q$6*zw7&1)CY!P;{(df)-d`0P)l#aq7v#;hvQH
zfQwo$*|k^(9V%P>k&)G*s+D5&O!0K0w9`^FL(3*TsR0DMi=($~Kp%I#iO3|aNW_;;
zqoI0$v}|1^WW&C3N*&1PgHn&K6da`C9B^_Rwlbif?qUyZfXm%lK|W&R_s_D<ISD8z
zK}+hn>pFxT+ATqYk}58vmLDz;s%fj+Fb^>_>~_-P34~@yjb)CcQwi9+LIR-2%QQfS
z-GJcHS1m!VmqEvg;6q#45i~a@GwFtGklF)<%!=d7=O&O*<VBrN88<=kPw8D$&)eNR
zZHe^ka<hP<ENsT=k~Di5+~66nA*=09-Y;G(N#1W&S8UBC%|bZPvnO&w04AwJ$Y6e0
zfExGUh9P~c7X6#?CX|!z#kEHDp?j0^f)ea}foq7S(np!zKfrtFF1Bibm*@aPWHv41
zm+mwA{^R(A{;C${@9st*dgQK?Hp1yu2iBe#g0$G$8Cjp`A&jHq{5CMlukuTBhyz~e
zPD=%aM+6=(#GP^T+UHlNlj|%?)(sQ{`y_)|mHt1+-ST$kX#$z!W~h9)Ec2H3jGNor
zDP!QR6LqV?ew`oQNRd*rBa(=jR1(}iIKQV9>|PBo(gm(|n(a>X*V%u3Sh4pNc)F6v
zD*soSQs?o%_Bq(NYjt4QBe)*%U6WKxocg={<7%Vc$HDr2CGY0uld$<Y1n{2rYPkB>
z!wGoGrp89)V(2bJkLQoVa%E_MZ_{eIq6ElTg6{~pJFR3;KvI5O$TjklVpEw`X;a1<
z9rUwW7xWs9uMW|JQP<eQP&mElePA~jriFUooYr4Wh4ZU<{EUuU2(C0#49=O$R@GUs
za2PbqR0ET}3Mpl?8hM(B?7%sGQhbhqV)}%7IKZjGbJZN<05Iby44jY{oFBs$eUnEQ
z`*fr1yBr=x?ji@Kxk96O-fS5bSdyI<sij^f-{e5$&y87uRe%0*oYZYLN;Y*oHmtL_
zwg)4ZZN`<{VXr~T;x>m{*~6LjI%|WSAgZL};EktSaKb*YJ$pa(miCU)2mPGxlD|6C
zl~bz9uViV-2#jZyg4W8uDLX5ZoYRwD?Uw_cj1mkw7enlE{12JPv1(`FbQ8yl{_(r!
z8>v+$DlRSZaZw9vu7gSQ$Tv|#s1-FPBwwvOz@rd?B{0}i<&{*mbfMtrDs7BJ6h=#5
zElNK#CNR%Q!g9@(o*ZM6Lx#IYQ3Dk2r=Fg5S4YU8KB7LYcaC8YH`IkZk;Jf^{V5ef
zv=h{wE0o7l46`E`A*3U81G}MS*toTKShpm!HStI)K*{C^kgF17T@c5sLjOv)Sw~S1
z?h4R4eN^7qGbk$@Lv<UdtP>9jMtD~Gd7c}3g~T++P&90|Kk!H3tM6zZJlNcb`SgX~
z-Z)c5&QMJJzGauD*cHjbXbVO+vP&ggj?sqz$f}!?<XFC+B$l6Rg9V>Z{fuL%LSVpT
zkfIb2ZIfI>leh#Uu}swnyauE-5-~OF7+oOFiSG3SCgVYrA&`L)xrE63Nb%1<x~3M}
zT$GwR6J<2|4BmnC`HIaFnFgmH4f$n~>814=IAnsIQE%)w$RT9KaoD0Gwugz!!f$MG
zyu-PCWDw(E_WAcQTDSP8-PioW3_OqA&9+0k*}IVM5q1728Sh_T#Y<jG@<r+^v52<v
z!e9kXlqTuIFr@t-WlR&>L6R>mO0zp7iV;%S-~zGYSxOHjSzIE&qJtAYa2k;HU}38&
zL!i7#pOyhM%bY#^V@l&bk=oXAs{5^Ij=u{XvIGXS+g3+rHeU02R1tQ)6%-`1f*LT#
zQ!=7IfBWVEb^pyFx-aW6i|Jy8At8P)Bx<1OVC9U-nyH}S=?nq?6%@XE(OH<3j?mMP
zE=@r9YB+WY9W=`Vq1}RLPBUs4vS<Dx^!%@2%xwaEzQFiYGdd&pcI5*KHN+=vjjKzA
zrk-H$t{D0VOzn1lXXLE)=}ax}-HjpeH*LeYwmkfTdf&Q|J*w8Y#<)ybFr-vD2o^rI
z;bqMflj?@~O^x#f2<p27v${8~0r>~x0*}wp*VmW67J51TQUAm@dd4K$bu#ljuSKP9
z&{oa1D;G+ba+aH!!a#0?8p2YQut$&f+0X7|g=fJQsWm@dlJ>-pWNQASy(%yOf3oZd
zHcUdf$4A{v_ME1`gA)WhR}_xcEcbdu-SjK=(J<Od-LYz6Ei;{SzAo?hTzqLDwem1J
z0<m^Be2`)e8nK`-y*m@Stf)o0nK!!<B%)J*F1Vd=%G1L6l(1*DkL}_gsF)_Y1-_8U
zcQJUYkkz<>1JQx+ac>bs`18@Sc}aifiY6rg#4ujk>+}=OpcW7gj=kPN;nd&md1*v}
zO?CP3m3YYAv_QYGb}tbUbFpi|y@b*|5SlfpQYgw9Tjgbjt3&x}Hi;QX0(@UGWNhYR
zA*O<n60U!IUmeK_Bo;EwvsYpYvg@;&OWZ;s*$&4+{YSDQWuT%x8Kyo@36x#^Ba$Ks
z<fAcI+*~J6pD3zNPmu5g9mbb*z+)GzJz@Yw&H1akPqWDd)L7Zy*h6Nd3z+l+iPCKy
zE7IoZoibk1I9@B4QRi)MwYhSJA~=b~AKh=uG#WN$DAWxJR6Nkw2ZHsI0iVEHwZ#+0
zPGY*u{JeFs{^fzN6T$L`hU~l($3&No*zy|vV5)jU`V{#5?8Fr$aO04U-JkOle<szQ
zv@PiEr|J6woF<vgn~4tvjlS2-X9@W?6w?aqewC=@J<lh>>Fc2IjpUeEf~E52dp21>
zD%&}A4FDmN$w(06)ugK)hrxb=woKS>Ygds-y?670pe*o?byxy_P1+`uE~{h2Y=n1_
z)-28j&8-L9@SEwl+RKvP8S}Bx0AnTM5bS0}L%39gW+8LB2-yEbl(-c`J|Hx>O9t_^
zja@*)vaDcAy0GUQE}7g#TF!Q!%tQ@pkt<W|czAOc>Cl|agR0^=%o`{ibbnirzHX(@
zdrC9kTZq!MNa*<vPKj0jtI-?x_cO`RZuZaJH{xRsTjT(75}TAPZLsmDB8<V}o2Hv&
zVh`iH-qrDJ^Tjuq%8`Xij7x%DXvi17lm)V>)=df{a?-r<8ItMvgc;X%NONF>HX!N~
zgc^inM-itnv>5g+e15ve;YH*mFF!B#6wuWX!IJ=DwjJ2gpoGVfCLBPFY*~_qU^qpL
zINqIS(j|g{H3mz*R5Yl(SR<<olr^rVY!<qw51pDpy3bLS7j8u2_f1|1vcw=fWUS0x
zG%s{UNa$z~5KxuFxrgys5kv=u+B2I{4iz=JPg^?8)KU)Zm(g~z$G5Tvlk7KSCWtF^
zM}+=eO3bNKHktZ^$gR5_T*fZD1o$Bm%k%$oJ`shi6Do!O<$USR%FZm`!LP<i;!@+g
z2V~fdse!;Wi$-UQieJB1H(7fM21*X-;EX^)(Rdu^%5%FgOK!BK%q|*IM_ZPPdb&{m
z#M&2z%Oz~CfGZcG?Eds3J#McP@sq}}*Z10(ffS4%1xGnDKO1Rs@ps-^muf<v`W4j-
zRi7+=jg^~)8)NZf|G>=YnnpJ!JBPEpE@*orBjcOR-xeSCJ_M2IZK~ZL5uUOC8t9jv
zxAyy8kqcnO=g#LpKR3@r`OFkMoouBYMY&iCGW+5HMc`Wa`kx&&m+Gj@ocT2070Vzd
z@XYq0*ZXW3RZ_MnOQoKQv6ij?jx#U<JcD9zLy+3%G6Zs533$myxK47%C!5G{=%K;~
zHV7p^K&4f*LtdeWlfWsTVzocc`N5;>+gzV>hI1sWifo`B9Pfw-Fx;|##r+|pn{NRW
zLP9mpx1@$$T&)a7tbo?&%-0959#3F%PyP*H2!^L^E;chH0Ou3%>Up~Y=)?h7D9l4y
z`K7X63==|skWkj@GJoAm<{0-aKMTOKB<NoZd?&`{KayWwCx8#O@k&rJiSmW;>T+BQ
z0(HU)_CtfdWZKTfO1fEzna%d=ef0(I`czovI~@4icappWgE621XJ^))Zl2T<qEY0k
zBux=El~`6D?-9K+U($wZxCuo^cU74nbkkdalrr`#n8gbvjB)!NR}Lq3mKiBdY_8xd
zxWW@Usc-KCAea+#U4gCEL{bZ}(G&?gcEdPx%xBxEZPXlS@#IN&LUOMt7ZcMlz!%W;
z|6JO}O6nUHf+~L_jSN=@4ST{_V(j4NtX{2XB07(D7MtfDa3i^Gh{&Gcw3a1GRRZI$
z6BaYGdr70LxTMorRuP~79rWHe;0LPy`Sde9n7jx|fRaXCx`s{nY)CI^K}Jx^Ck@lu
zUZDraZ8f{8;@+@MKBCNpGm0{q!K*o^ac1&hKUiv=U|=EATUC(M0BGA|WBB%pblin5
zA+^Q27q=*}jc*^&R?qsX>!P+#ifUfL`-U&*6AQPi@{Kbs@QaRUQ5~{yDcGTIM(R^j
z;1|^t5LIH6%dz2E;9}^V!|RA;coT`ZR$~5+rhgLNA(rSap>bxtHNMhAaw_I5Z6Apq
zndwvGE$RWs6g_Xg3YIK}mqM_(2qD)9{VW_;63k}_us6C!7?aH}us}gtHwg_GpqBG0
zm8h6aPE%7<C#SD4p6G;j7*17^$k4IoY>*Ogvt2XAqvI9}`8fX|S9l5mg|Id5;6`Gr
zFltFXD1xV#JVKysdfv^Uo^}9Tf%<EsKG*`K-&#&ii@b$~xxYzMFR~lQw4B1*{}|g4
zgQFCgFH=@B@61>;w<`kC98LZMY~7#7L=K65+95AR)8r2DepwiC2W5Rvq+ye4r<s)&
zjX>vFj_$o{Rl46iy0RcnrkPV^R#?=s^4l*O@qTqFp2D%S%6#0UX^pZr7Qb6qhQO>9
z0;#Y%3q7HftP?&#nbEcwmxj%L!N-`%VIiiP$eNcn>v1Kmbp9t<ibOa0W#%CPo3;Fr
zo`)LG{&=hR{%m8o=!Y68Z3_nA8t83nJ=63~*}fE7rPjUOUy+>fCw7Lqe{*cx#&kRw
z<KDt~^)KKIj!X0DR@O+N*wzy2>@L>8J@w9LwPjlw@A=R}k&AG->q7MTQi2VeoGB3+
zV=Nmjb4iwEX7#l4B~coURoO`_z?KGGjkZWKa!suoeNpj{eST@$J@X7R&XR0n({qPW
z`a0$+t&BwLGUa4vqHS_%@zNxr3^j#?=u^p>zXwiB_s~U}*wCT(cQ~;z&IpL0f)_QT
ztiI51A3z>=iI1Uqye<ZiQ&2R^zP`j`STge7Px|{H;4r+|?V@cjuZ8UHG@r$Bm{L!4
zpJ5Ct*P|z){~y-wfw|JQTN`zpbZpzUZKq?~wr$(CZQHgwwvCQE*?FE9-@DhhR@JK7
zyXIB%7u<E%oZ~vraX6rCxq|FTF$wF_Bt0EfqGYhDjzTrx4g8~YK<;MldN<IcU|9S8
z&nK`RE!<QZrdbXGC=TXZbQy_u-2Z`a<sGm?JJk)+NKb)>`^$vJ_q(fS-{A6uTO{e}
z=5&^c<OjDB!Q{A^&bgQMmBZTVb@5}QMjs+%BVR?2diF=WA%GboUHQ{+WAVXk_61z_
zc;ZaE?4A5c%-+M%<nu49eF`k`_8*64@P&Z1xcy|5u{&Phi0|PmeT9sUAa<NqeE4q>
z8G_r=wcUZL*#&%(DzKI+hm#;H(M2aopGosNEDeo_=f9n$v<d7wRA<MSbL*su!IxnO
zhF|DO=ajR1O>xnzcgRaK+uB~AXZxGZz=ynD9w?=*Of{lI3nI(Q+U7B|Ds_vU&a-AS
z!*ObVHVAr}wcvTZy;xnWcBVT(+~@CLv1W`P+Gd%5BF5?ppoTs%R)tC~l~QcdwN^y`
zla1jv&18j~!b%n?U>Z;5lQT`8NU?cuO@U<Ax3~6Tm*yBQY9e^l##nfV7r0OCAV_IC
zZ;#4Et+8>>EA0DIhcrq(1LY}fo4_yej?4Rn$tMe1t3>}^dna42UyeXb{gY9Ao4tei
z?pHHoGJt$A)bvx(&dT3~nQR3-(Pmkm>OT*tP@|}%U=9EFV?yl>$4c(ZtXA|AxHp4$
z!PZIdED1umYqq}~8QjZlHv-h$c&l@IDYlhbOyA%CWMnd|qmAz7Q~}1DN?BKBLG|nD
z7k0whUYy`M)8wBddU2qXXnVc(z6Y>ys5&$3t);%{z31HA;R~S;$wtt7rp-{YXePPy
zE8*Na<v-~+D@T@zoHTP%>)`eo92mhJ)LkNp{%y%LT9B^EqV>Lw-Y4gx=IHc?VqdS_
z+M&-AZKtOXfR{SHnOsdCG>eUX#~Y3C>qb%h!D=uTST&uR#uAZ-lBdw`)wnsw-H2M#
z%b+R%frcI?K2<2@?pJf!TYFr8W|avA&25Cm5l{hE$YdLoQrEPlCZJfD@wz+M!&Yr%
zm&%?Zb(xSG-<PtFoX6cqnu+Ep(6CF3B!EI$pGQ(~Wg7|3nFdY-B#9{?N?j-{CN>KC
zF50jgtrIC0#l&jBmrYu7bE?r6H|D+dH474#`%h*D5B}yfFhcgvi9&rq_Zj0<gJY+h
zmR24YWK8sbJRg!TXp}$#n1wR06^1$QQh>x=ank36w~5+kC{Ec%{{!Frc&3EJE+{qP
zPH=AFiGrMsDS@OoM(_$l^=&~NKowQ8D$lZere5?Zm6dVHf_xD)_tQr+7~^C#*jz}*
zbfuF7NLP`m_sBl9k&(#_J%ubut%KWE%TYw9FtrykI>;H769-tnFTLsS2o3W{9P%qY
z`jt77GfP*m8K-q>hgH&>Jkd|~*>pcnybpKpLTBR!v`E0pQluGV1XlqH5L@2usv_5e
zl7ZWeLabrYQ&x7~@Z=vW0VsZsh)tgXvlwK-X%Pj|b#a0dE!x&-!BnCd*xVKCWWcOf
z#99Lz?uo+Gj4SU4JR<ijLO8mxmecm%Z<4XvO{kPEa37(1{=r50Sfb@a5X7dW$#2L!
zx>Nj4{*`Itje3C{i8Cwt%gM`wV1*IBD7mK+GaNL*gB5o!L2R{ouCHddLlSZxPbJ1!
zU}9c^o$#yvW#916?1Jm%hnAjvTAx?7|HHn4iHCvSY#p^=t68nV9b{^y=j!(I_Bz`r
z6hzhF@&@0r!1sEcqBt4inIglSOg;5>O{v(}OcGE6+-ykA<a9`6N;9Nrgh&5=I7CtS
z%fb}5LGv+6m<QnuD9I@lEGa-JdTra1Mni*C2X&;=DUOt56qiiIrZk^=m>0s1C>(=+
zxWv5Z&I5(jZkz$VEJxO+;%??^0C{wR`))Ssa=6ZpK?Vh{z0WjfHgL!4gps!odPcs9
z(?!d2z#V_ziyW|)qj0gL4}<;#<W8lVrS|ncidIWctYb4c`aV0wYU-o|I<y_VmhzY9
z5<;t@Uaz8NXMc$}4SlQBfFIpse|(q{dY0p_x*^TC&&O{*f}L%N9rFSKn|J=$-d^qM
z9^}$)XQ;cq7Eh)8_{`lN-iz_E=yLgk+)KAPdSt06h9<6;r%L-|NkSkBaWU;>EyTPF
zab^AMq0gO&@FP0{LbKmka}6mGuDlPfA7vxy?R5fqCk$j&l(;S9zC&dR0hIF}t~+$G
z3;%go7Vsidm@X@WNHIR+5%3)$L1<En9Aqib2%bp7{r^F62wQOeLUEdBg`j1}IDAp5
z^}hjZ*{sBTAd2x46>6iQ(ktIJla;slp@8UdY#ugsQ)D>1ziFHbv|KKvSpXiymQx|%
ze5l6O?^~%tH$W<hzjPdN+O_}GalWN@XQK$lJ5}Pe?Y3vk6H#gD7@w=noRS3Alh@VA
z(p-4QK&Y8!ESYguksF=%xJKo}W$XbqKds_7TY}$1npjA;!|f93@vp|8lv}$1(Z&$L
zb8%jNNi%`^SmD0}(!o#uN#*?LMhZ;t9Bd<+0a}j9wfp(mMDHnz8Yif*_Z40CJ+XH2
zfmA~MR&2UnK(8UKQ*gK&cNU8Bn(sPan1Kkc&Qiar952&ugt8ulGBDJ#V=5wkQePrM
zvc;b<6vo7qBIa6G_tW1zsvRREX6xfU<cEP#%mc0-Z9Bc^-)PSHYI?uZGp?~k7*hbg
z2XGi8_qs7`qXe@!HK}29t_f-^PqnsJ+v7dA9Ite~Ef28Bo2lZ<)7eyR;oKfQFCQq|
ze0!tRK~!de5u(-Wp<2Usp>Fxdv>4?yl%?nCo`lP9?~iZe8e!z|=<d5he=;v}au(~x
zjE?l(Qtg|(TLk%u$_khypPwoxgV4=LfLFfd_e-p_bmox7^cn+=(HM^C*!=#s$jfWF
zX447RCqNyIUBY&{Zz3;b9qOA)%bo3VDKF$KSuaPk^DYDmukEev#;)bbA(Dz@yiK&q
zaf`U6yP^J8WlZZ_zG9g8dc5?zTn@?}Za3*3N`3eUW!e*9g+(ZjmFV#r*`J3O0=Wcn
zg-v~wi3>k@@`EA;b#VEPvO0rK7TsO?wUZ4TJ;K>C<Ld7#j&&NX1$@ytJ8L1;tFx=P
z_~2Qv7WYk8+n|Ci);5#ER7i?mu221gz@})2AHxGd$*BrI`MUasK)nus?hG}7(XnyX
zlg1NJSsnrDQI+hB`RYc7Jw;1@pB2KyV%X`Vc~BtPLs7&*ONa=I#P3b7e~_RPeliB~
zo1S9~p|RDz+adJ2AlZW9XEp9{Ak7kF10KVe*wRJk_EBS@&{I$8<XOd%eVL0Jy$&0a
zG{_05jSDA+2;Z|&lQIYU+a9{~(i{jAlmsd$?X!5Q1f3x(aOq&<VK#VD9*}&mBveNZ
z3~uL=qSKK(ivG)%adQ8lf|jyC9(d3xT}G$j5GitaNR#9CuY|MbtIp=Sk(T9`!xSCI
zkUe_3z5|{rkBhv_UQZnV-gYD92zs34wzrvK;Gy7w^yxm5&POUuz6JFV2>y~RM`Kqw
z(FoPZ#~+wbB7g}>`oqq!{gtUXdgL!^#{caf)Xb)bjgSr|w+!7(Pdt$%imIl@%MCHo
zN43<ak5oyLGtZ+jotHENLp#XCN(i&y(d<F$#H;tA8Ub~Bzhz>nj%z@Ou(_=?PbS;%
zB_r1t=mW9Efy?yoa>PrWpCZgm16u0c_)!f2=Gyv1UkQH5+4@T-QhoVKc=5?sm5eXU
zv*&=}-eN5)PveW(OuPL^V*d&!6-M4bvY>W{fcA}VYe}WR$5pU~-yLeuGr*@X;!+nY
z;2(?enH}`89)>t?L;Ok~pGmL=TkNSFamuq&lzAiWH02T(A88A5Xoo0Kf+qx_cz6s!
zY5pl6`#+p)o+XH4{dZ<v@Xo3|u27AfP>qgIvwYuu41eq!LKOj1w3BNtvQmo3r649x
z{J}+)CVF2-_m9Tmb~?8&frg6Mj7RGrXm@ijN$t7YgnMw(1JO)+TEj5X3GKAr3F0)@
zoW%~WI!X|)mne|D6~OSBi^k^W=!0gA&6TUE49%@i)WgL}l7LS(MAF3yDrEA&-GHK_
z5(RC#S4Q}OL^9Oe*>A5yr;AHQq%HE9jz*`8Xib||$4yvK^8IkLCcFDWnwMa~P-l9q
zA$pZM3#of4Cj#CYz9=iL2&2G`PdkZ5R+_wnTq0P;NuE%jl@|l`nBKdW+)CiA+x+##
zaSdLnou4Ka&#;6oh&#fsTCYmv-b@kWZvOy0(iJZ*!^e;iu~EmNm55P)mcg7CPK$*U
zqf^pE8eY5>W+tdElW-FhRP;f?<@T?=D|EDqYsgnYRxgm|!xTSzdVv@%3F~}_a#ARJ
z5M;kPBTChUWegAe7dIo3X3l)&WE!z0n977V9FQOjWNJKFG?312u+nRdFRAQqOdH3K
zno)IreMye)HNHK?EQ24=aHMdRg-0K?{1E~0C+>oYv{|({=t6L;2~K;6w);Mg!2{FE
zXhvWtS__LiU6NI&kEUS8<xXHk@HJJvh8`<zlY>~_PqHx7S9SDbCVsGt##(I{eoes9
zyQR`$6X8i_Xfu^*`4TSlQxDI?)*62qNKcz^%8;-ylPX$H4p6Iygi6r>uuumToY~cI
zy1nTeAu?Hq?Rf{7zkV%`i;e7RPtcI$%s&K~vhWjJc)G1h>i(A>cFF*{QDUuSN!sx~
z36BPU`7vko`ziFf1`Yvo&kUe9R7&Q5Dp1rvT<$uW)r;_ODXVoO1ua>c1etH^2_>qw
zvg~wpOzvXGF`LX(iXc@;LePYYOcC!i!0bj6ihmW|kM_l;w)aeGvskyza$;!>9*|IZ
zB#)U3bE)8^Q$>j$1Pka=uUsuh1TOyS_${jWB&vB_P6Jg#L9{E;Aoc<g8b<15ERbYC
zJZ|Xq{UNUZ(L8)`Cli0oZscxN)q5Ue`TbkFu$6%UzaTwD^9YLB(*O;)1U~?VRK7ia
zLL4?l!?~(%!qt3**DGK%qIh5C`)j?;zBS6pvO#dzb=zA`094>zl388w%%GMxjm|`}
zaF|j&LY<dgs1xb0aFHgG;yS!y+0OOlpPf2^{Z1oxm@$eXjq~PGW@e>vfj?=CFco1t
z=&6juxBA*8^glFSjFm@>b$jrdHM#Nz2S@8+h_6fLnK2a0!%~mTW=>TYE@I|5Z&{_d
za04H*Sl=UgH9fm+V&8Z&^ds|TD$GzjzM{8Xi*)d@5ZX0HPeIjqj<uDQ$5>I>_pWdh
zjKMS%VJ4ux5SGd=`7a{8y-6>q<rL57L}iiBxbb^+H_Y+X!b1pWtBAq18c+NH>P$-g
z%=C{;0lfSS5&6EuvG(vMfNp}v-IUPB`~@0Z1!sijS_07iYos~31=whszQoHE2t#%F
zWQhL}$T2lJwl_HxWX7araf=tCE-$RNo#GU9+ra@5%j~Q;$F-6?j6uP(gk<g&lA#bE
z8Hi<W7MjK}YZ0h^vp2CcYKCRPx<tV-X7#JIi&IDFMjcICn7iN}N{V522x@=tQ7_S5
zvKB@lZO)v<h4td{lNB13#N)qVqrAwBeiSIYwjtTzFuD@Sa}*D;GTRe0(X=g8W7nWY
z@xKpQNLF%oDvC2q(Q%2QT<1`v3I&I7;UVzxg>Ne_8K1jv`yzF|LV<q|k7A!{a@I$I
zAVu@T0jih=P6CnWlgn=t*iBfWxFCW7us&bSNKI*zhz9xP(=Fhd>284zTAbGx&8y0I
z26SyIp;DVH9>7pJ0+*7-9OV<ur=nk3By#KWo%V{@s1BG4eUYbVUdljtFrK=X-W^-6
z^BW_IeVv!%HaI|iUqAS>R|lFDi6Dznr$9E9Zt{|b&9Yz1GyL*@EEl--`ire<;>1q<
zQ-1uVy@}OdWBwD}9S8Ala2And`&;qCrj8DD0c9u`_8{bNJ+T({FH}Oags!BwOYp@3
zv-BBqWMgwW?J336!+sHYMvN;|z1HYb8e1p_5`oX03?j>@Du-&1a``-+tv24z%ladV
z7{$qzu_6UwgsvaU!+2g*>{S~iy6uCMPxF6TJQ@{!b#(5c!dTNWe_o5;E_a12&1qIg
zG|f^CWq3$1F+6vy)>V2!_0;GLw`h>zED~epCNu`1U+_;sXKW5wG!s?SmZ+pMvSe}+
z-bME<IoU)Fb!He^K;zQ;l_cJVSSa2kENg84q~44Q0qS%Rfv=ATj%h=?W^$0FiP661
z;?F%SI(XBRfGZY=JSt&=@4lx3#P}?tKhOnnE$X$|O8-@FY5Cs-Q5Huth)EO$`*UlK
zrG|a>I*7s0Q!*K0QKwts6yt<zyO6Q2G@*LXl%HA?p{Sk8hSv){E(bIWfXx@%O{U|f
z8!1*W&b~L79E@CxcWrEPxdEO)5`0>^a)?#2JE-4l^`ZnRGuBs!(IbrKpMPRwQl!Ej
zH@+}qz?b3Wtu(wsN_JrhggTh%)}#UZz!lqf&r5djLtV}$JCZNG+OSiGdyc;&Gqr*u
zhCRE{E~m@XvlXBykUEp=ZpG>vKb<HL6-(mk1RY9fTf5!LR|-$D6k*3mY>^7l#oAuI
zJ1FaDyc#DJv-%&_-ofg^C=Qs=8`BlWP?vu)OLnG^jC&zip`{tK%$}7i#j&k&tqJ)$
z`c^ms6cSxUP~3`kH9$BwQPEuxYExWAI{e}d|5d`i#=C{ZMZ|M_?Ov<2Ey&sH9lDAe
zk@tMoFv_zX&zD6vkqL=No%J!Wg-#gJ{BT?JJ?)D&?q+^4xO;`3|B&Oer=RNvSm~Mx
z=bCA^?W<PfFRJ&U$XZVZe=O4Jx>dHm{}|ZJa(qLr`QSTP%~o8TOuhLt`;t0Pt|`0+
z;<*1>8K>q4qNSLJ)N1@W$ow3g_djPwwnXOmi^Ce|$1?2p(ryaUf!rD>nImXdy-<r>
zz+wG2*&<5Wmk4uO#Y3rhBR22TEc}CjiX+uQetW!n{6Y>N)faHmFDWwotrFzQ))xWW
zQ82%c5WwB<-s$|SB?^n+_mOTJ0!rV8PNSZFUNUgz5Sm=OI@Ik|r%O-ny;VQ_eK`#=
zbG=2s*)AM}(3_d<BF&J}riT{x5~r}54xAmfnt0&zAvNyKbAGE{xX7HQ?YzhYDv7qk
zUGyW!jl>XtOY1o^1n(wI20d}`-qNQQCsr{U=rMLRG8#D4d|cAPGCTx0gBXtz<a{Ne
zc)UItCA$*EGiBD`t+_T;ok=WF3t@k(duxoxbyQ#XHNOqDGrczYGy-IBCfAfss1rdO
z9{6t0^9j(Vg8&C$Yse6cIBBVegwRGF12CzJ>HyNO;|cOPblbTpZsP|bp|WlqU0it&
z;CTCEN-TqVHEcZW@aQ|YDkeot+!@uvLmbR1T@#$`ZVvbb=p!f&W@!JC{rc~0^a*Fd
zjfnPIet&deCTGDs`g;}dyqw?4YIAdQ{=e82e;ri9J7HI#=3znxX_J-a-q3RNi2oJ1
z@*#n6H6D%cH5oSSG44z5uUTs6F@Nn<pEQ}g0e2Yr{(pzii6Xe3pE{mg88|LQJjEOl
zmcnDZ10mjG3^F|q?zW>}cB6my!9PdAe$u3tKNa+BWj@=y=HZnSfiO8z{sH2VsHbnp
z*Z7G>B~*NGSEIGYpSobiqFW{`Hd6HZX&Z#dQwBpRS~w5mcP(fvlrrpD+A9z=kQ=uC
z6}_LGDVM49aKj%XkZ_=a605fLCCbC2-B$AUZ+b;<E+INkW=s1et@d!DZh|Vm#SY}R
z6H$ZUSfGo6ZY}iy{sA7ef`vAhwq3dtYv$N6=Uspl^lu|B@ZkWfQYbojYBf*5%;4ZP
zBw6A8>@t{78F8RI9=oq;D}PLadsVjk{-83HtKrqf1qf{4Vn&VFRn_{|)C^f8Ng~7f
z1v6)!lg6}oA${p5P5cVmwu9?m-Qb4rdc28|nA6w*7^=FqAbVd*3d?X^WF?qNBSuQx
zr<(C`nDWY|doEYfSN|`5<pe+xd+;pkbx@EKm{G)rU2t1ePS6%@)1QGQkrSt-kM|Vn
zT@UBieVfy1)H|bwl}KT77IHFViO8vdJV~qshr!RmWS|`4v?Yu0pjQ$s0&kC+V8t#-
za!6b!;yIxm4g4zgz%8zVy&*05<XU$G)BupTa%vD*E)AqqrDZSmIiZ6!d>dz2Aq87g
z1@W8lhjt9w$ys9X8Wgh*$t-A*fBtsj%oGYdjaEk4)*Bee>eAA<ySu^bN)`~}5M(2o
z+YbT?px0~k_hjBI@Jh}<$7Rv=5ZB$dNX-fM%L^k(OBa&IgDORJfCoN_01*F^z@k91
zaz@pSh)~-~2x-2DLMZM&w~S$9B5pL%>@jls>A0cO#O`Z;)?W(MxY!H@XQ3!2|L+RD
z$co+h!j<$->YNO<x3efF-q=y?&Zjiau!A$lpHo~+vVd6}9^%z!Dn=M&#Sdvga@ic9
zuX43MgaZ>ADFh&<rLU;9K7x+17BrxI>d6+-S(TWa!>TUYrW+ajL$>OGa!-9I0J=mk
zD1H@Q#LF`^2xj#E0$A=~{~Lfs=)VP6{3iGf5T;TG`R>vtKktO4Z(>ju2VbXVx_#FZ
zC4hrsC0~wNTb$kKQ>XWH33TK<f3^+()nrKkr-ifOasV%E3Y-@t;K_=K*q+ZlLafSO
z`gGEV0QPUj=lOwy8f7LWH7Q17F#>H1fhXbYtjDJ8UkHYaaHgFUrYg}rXy2s)`@Gi)
zqSJR;XNd$3Rvhl5L%El-Fqx<GQ%iH6D_AI0F^zUF6D1ma8c_-%u_hi!#~;dZXQ`(9
z(L6qA9C}V^E9Sw$zr%+emtzWtPC`uRv`#%IGmoIg!stC^bOWjxZ^Rfj(6c1W%5TiK
zu8q@!Rh0hD_Tuy`#)))lxw)`OIr@V#7r<i<FgOkTF4+VvcH(plf<Z#Wu$|IA302PO
z@)kXH`3IGMT*)};aX#@+U{YZO;T{J69HRh?625I9&pTv+imCehyF2o%3I$!HpP4w=
zLe{9>z9dPr0a+!Sq-?_`Q${U%XBptN8#_I2d~93{u#5FY5k!>_KElIEXa!ZfqgJO(
z6D%R3g&x4C%;z3?%Olc@2<DX&xN7+L6Q45iiybYUQqA6GCYRMR(H+aII#0^5bE3F?
z?#BpOzv;2M&x`VS4^t-QEX!G*QtS*%PFD5XTb;Ao-PjERLtS`~6IyJZj^0JM>5Fl%
z>QyAvmOYgm{;*!Q{CVmbZ}~XSgTZb#XZ>mtQh;GCb|6U=NH%TWN?SfIQDi-mH*M6Q
zG<~{wpxD5Zz~#sYj65u+P`-6qdUS~WA;v^h_(G>cMr24Z%%TXMY0xdN<y5s-Q=DNW
zBN~>A8`+#Ws9Xj#ds-Dj{Q}<$ee8=!WTCT<FJ4wZ{Q%!w=++Ae8WsXYKHh1_3XZqB
zDCcRe+u`2G@j*$#HPd!_fwXmM>|M8_!fgusOmWLiMeZs55O-IDh=WU0B)EcObz|F0
z#%v+q*RA&-2+RJo!c6obeIthIiy}#(7Kl9m_-{Au8r6-KYa=Kc72r?SHqYUXGTCsE
zdA)ZPWLuNf4BDmW>jl<wi@^=#@mx_h^G+Mny+(n(micsiFp+E=ocQY+QCchkSFvtL
zuR`VySu4BOw>x{>L@Cj@8!*Icy-*Z1`tlup=aTMcXs^wU+L^7Unh}X##|u$<ruhm4
zziJdJs*c^wfNb<?$DOgZtt78ZSh2ZKVA!`eX2>D&o{gAwJaeO43Zk=OaAjVzV|A3h
z%>87GOJ>CB+-i@Llr)~VA53PwuF<WgnLD6VT{_PEFgbUMCaBh&rmfmttOr52ZF(|k
zZdEkbL@nJiS?b5RE0{f()(LuZ235`@cUCkXm{|LIR>(!JE=^||9jDq}SD0RY1}=Pn
zkptjvm7~oJF(EXFMNAT<LZLzmj*Ww=2oYY_$>+QRiBbm@q*f8I)d;H%u@MuxLLodv
z6V;;{MJ`HvmgV*-#Z2|VwQhz<IG1hLM&!5;RhY-s-N7RRj%WP;immKLjdBud!@o9_
zD%K|-ut~3JW$`g>^{gASu)BF2RlYg_a>Kd{L+qn7Fe;!I<|OIG#h&Sq=x&lN6H8ea
zDU~DYii-G2>flJGhAI{k4P@x#!N9_~ac~M^>xbVbp9YT$6oh$;P3R%dFFdO%bmc-j
z!Qm=#u}{reHBIXJHn@HCF!5b)@*qVc{>~`VnHGqM+|(DueP=eeg1Q2(xEv)DkP`Oo
zboI><KU;)IsEvjtI*#eH^dEtniuGF&uw{=Pl;X(*_0)OA<GT%3u+If#$*eql-s2b?
z0#^Q~y;766=VTw%3ewB)M)lec<ZRqus-VxD=@?yH4z=Cy#33u4+s6!vR1=7f{EO_c
z4^x2>IzHR}`R;<@4V)9T{Gn3>GnE>NOcS22oU!?1q}q-(pJe!02iZIvo+XUiKJ6Pd
zPwR}SfyL+a?g=;A;#Kq7qY%lw%lY1nWR`sNNpVyFfJ$ty++i*lfl9`c^}QB?f<Kvd
zvovSx?f;@z47lXt^MmrVg5+_It5qdhpj9;<asAD^h5vhbh1&nW$}3CN6r(b%lm=gs
zI`+Ak6?KD7gp!ht4^fK4HSGBj;Wa_aLB!<Bhq<jgJ{uygNpAx5EEulmfVx!>71(33
zMU+JC(m7&=m2m(Z2q4!M!%3a+W1oW}i)m4eR!_r5dj^x>?e!3nsg7n<%#fQRB#Cs<
zDxbjC5@_Rv79Dl~Pd9ege4mwV)_>?(%#V)<&ERv_H{`%lk4fyS%MRy5!N9=ABnWL8
z<2C&eEWM4*g0k8I-IM6c+ezmQ>+3=L38*W-4VH<tdsU>UB1!AM`-+@DJ-zyh+3TBc
z6+}$zONn7e1@l&>C6;xThFyRcl^xbgt-Tdc0akgq=YgnaggFFhBz>CcYP8m)tJwx5
zWKoCX8h8_?as=9&Y#H0BLhrlm`Qd;LM|<47(R!k{=EA$k_k*vLT%Mu~f)k8Nb&aD?
z3oIi>ur`_qTW3ppkZ1;nUg${LACVaj(b4e0SZ^j=5}e~#cZ6niAYGApd}w#JtH@<g
zVZx&$5cWxAa`c_}a{~sQj&WZ436A<uaY)?PTJC62Q7b>E;-T?}8vO(3#G!L&b$oK|
zejOOIpz;22QSw(OzuFLMr+>@Ikt%Ge;A-w_!%P#*Biv5J2SNHmY8{&WG8@gwcEfN#
zr5p<(;|6V0)wjr?gEv(9kSV_94Nn}X7ze(?ZB%3GLZg9GRftO#ZJp0k3bm_5@QhV@
zfS6t=Th34gOAkS>TtrCqiU*!$ldUGdz#?6yS}ZS{_Us>`7<^%082F4zf(=G|sjwYi
z5Sn7-;?$2O=46%MdHHW7mc=^u@EkQ_{X3`-?gieB!~{?Teo<d-wy`LcZE*n^t%YXO
z2$!GLaPoop#Y;PJSt%Mfl0$_IG>XlTn2tD>!0eiwN{;~6fJUrmz?|9TB_V!BHrows
zciS_(&Oo;tK74{6YW2MF7)J9)aZA$T*??WP>iY`*TyDZ}8jCMxM|LcPQAuP)oU}C|
zar?3u@WrA>az48HOQ`EfR$}+8MFGv26KBAQ{gjBfu|MJt`GCM1AbyHQDP%-T(ON<)
zkNM9|t<G!%f=8a{(&hN<f3Hn8BgIbufxt$hRzJL>CU)XS6ok%-FFKk)FF1qD63HJF
z#HM!E@Lj{pzt?T$KubOTIDEadv#T>BK4tCCI8h2#PFSp}=abhcfyF*7`TS`0(8*}W
zB`QlB2sV$R>qUkX^jV#655LXR&Lqr~&ge<sG2<rhY>Rm==vOH3JF-A14?Bk<YA-w0
z2KGg1qr00Z`U<jUjS%(96Ti7ket&;w;gbLTlU?n#7hy=H!PVkN`M{2t;}JtD1)=4^
zjg$WA8YubV4$O(E<|yeqhX&lqeJQAo0Dh@8-N{q30JaDGp;J2I7;RbIQC~=)5$rua
z`QVEmfX3h%@XJ241{e$2L}!D_Zn)MS)|E=6>LQ3oh6kSf^Ke;p)pdf$F8;F~kfS2Y
z2;dKU`o9sDxS)R^ELm(@(tNt(Xbd$Bu^0+~Vn(VAl~oewDVK-C9A!cJS{M2YJsE>Y
z{7fWz7Y`ym9TP*f5JgG^YxUA`yG2l}m-TYSCvwm!;wy^KwA)o6Evm0!r*a^7(J%|8
zbscp07BP%I(|ua~SR_cy*f7_#qcrFqQgqgZplqSNiK3cH9ze0l;9`CXH+zhLU0r;X
zS0=5l+IB>6;-W;1$JS27P%W@JD39fl<b~x2yzR%6D$7-Kg+X$l5FLsHtv|gQicV|!
zcJU4~l_b>3ujPpig-+u|Ru$hTis$1aJiZj|E(IHQ7l2hdvkS=>Ah)7m4qeh~0!V^7
zf+Qq0*n{pY<V%ruJUQuG+x*3y3X*$ME2E4@+jiBVm1RlVWg=`IdlS;o0v3g!{qq#E
zNEd3B{-l;&tmx`BEg4SkkMS)L#(?FcR9@0Wqc)h)gR5!`Sa@SML5Y?5vFe)Fw*>RD
zin$5XNtOx=MPeI@ZQs>>M3D5^B};6%wTuaIJhskDCcjUrHmfZ0`&d9t9iV<T*7&B|
zyt~S7Xai#XHd7QRY2p5kgF6Vlnc(2~=d~}yN1b6D+8dR0*V8%>+(=tRI5}ws+^ilD
zi;E34i@c#2%i9Yv(iQEHu~^gs5d}*n{Ycn0)BFV=Kyib=5-+t`)&t6iNETCB<o|b6
zdt#*-EZ5G5LSE{*podRX4qI`r_uvcfl#MdZOg_A-o!0;X`65&yE%?f?8NuG+tH$Nw
zWc{~q!0S|F`EdKF)A|OrH-?01Jku-uZwa@sF>!|Xz)B-W^&S)d^*wtPK>|7m{rIa)
z*r2UkzqsKKzwf*@Bq1@`0`Bm;e=eLoG~iGjU;>!wQb2M1fhu$H(tJ9tFS*j|LWyCk
zU;!Iav0Q_Ydm*j<?+o#Q4@qmKK4C(m$KWhBiK8Wk3B82)q#4`;9*&2EiF`8zpaTTe
zr<2eM2^Z6bDO4?TK(vA|`3P*=mn3nE(O~(wypX;IgALX(>~=vo(t~+)3sVJBc?wP1
z=%la}!AkHml?7vu>%@v}u?F?@i3SF|5CJmH&o;-|P8!syBu6cVj4)Rh*C!_@XP3<3
zi1xu{{G<C}Lg&QsT_kP{Ca6BSWpnyBrX;@0JI3ttm;vd7KX0yjmlr9Vc5|I+UH8Q|
z7X5V-{lLx4OPQPxH8^(E%Ms;FsjO<k<;e5wHngrbi-NKPq>t>&WAIwW4fbF_!$;~c
z|FgaV4h6J=4ij)OwW-GaZm0jGel;<)5RMkv`q@tZe3<$wak61(=7<6;n2FRom8;?5
zZhf}0^t~CRITx?q3sNzaoEmI~<L=--IDI|1^gFHXWpqcb(Z$vJF>GDtb`I?WoQ(r7
z{RmJ1-0ufgm5s|kGmaED`F5!apOY$V?*276P5#JBzuIo~9?IVo*QbhZIHrwG9WePJ
z0|DsuM0IkFVTL=4HZM1~fiwN4_W#vg(cAy8?uwrN|K43u`}nuJa{BR4cO`Pf?#G=r
zos#Z*K?)jKuqGyMJG-%~cr}V>I|mzCD<u|@CRHVypwz*-)Ysdi=s4Fttk|_Ej-O6f
z*=dp(APy`q?yY|lpQ=o!@|LNG`2mD86-Mn65+0J{Z-Ao?1O|_Rmouiqok(gnWg$w2
zB|=&2Q$w8NczVo`cif&9lLzpJB`rHU==O)c^DuC+0U8Z1eP^eHC6=)LY3nl@XKO|p
zj784MT{vbC9o;GzW~W<a7NvW$F>oH|rHGNNwg$<~DEgB*AksDHHsQo|saWge$BiV;
zsD9?-<)$sN)OavLs*R@lah7umlcIt*5~eWGp=O7q!|G@iWi{$@Mg+Qbi?pck3jqX?
zZWe(8ai$D>h1X^C0H8CUB}-&_lUbdWk(-sDx_^a9c%$<8MCNph7-3FBg>giVDG~5)
zFf*K#3y4f`i~-VE2CEFO_6;lMwbMEeq<OS?8KNOtuNZ^+@CkN(;cvBKM^NyMqPkzG
z#HF8v=fG{dH)MnGUQTA%-5+ZxYx{^c{oXgJEQk*&e>6sfh`y5jE~gJtPIyu5=8zQJ
zMUVe8BJgn$>?1LBvK1p5gk>Y2$@_YO*;FwFViFggP~`FoZPJC7BTL7dtDe;x)NYPJ
z1amriPT}M9Cv>FL<JZa{8N>r%Zayl+fHaOb<?=mw58i?0#%B|R&Gdc~?o@I=Co;}u
zl#7$dd<%iWDm3YA;jVfBIa-2byrmgsa4&-6`w~J?Bhkl$1#`nw$rYx0VHHNs5_D;}
zE#Hezu&rQZY0iqtPp;PQ-LSq28X%+)n6`>BW8*eqUh<0fKpY0Nyd$l~tXwl!oil@u
z@IAj@CA|D~KKFwhuenUfOTlllGVDxh24LWJ49H|ztHp?7C@?QM^?NnqyQSvRBkAsF
zdY+E5hc%hnJe{D8PIbl%AUb{E>USy4UMCXQr(}DlbXf-K1Bf|(w$RoCQn!r0Gt?wa
zpTgvDLc9o!?!dUcU|0y<=^m5aB`uNiX{WuLeg~b_W6=PKgFLj;x_h0TOiwmR6q17M
zmkXO2v!;0mSG^TNLtKVs<o%rMBvJ<#Ch2Z9=<kf8(W|J9taq#AJMsXxuj|i!Hgg_M
z(S-wf7$lHj1(o(+Gbayn-Jy@|ukI6zJ9pU0e1p=4T1gp1c}C?sd10wCgW8;XcvQZZ
z7H~Z4oxM29D)I2OyX_%7;?y;N47;1&hlf-Lui1B5e^)s===o#;EFaFmcx(y}TN4ID
zd7mX2)|VpFG;F2n)W^92S+PmWh9V_*tS^=;E(VuOSxwGDFqYL~wya<p#Tisg?S<I&
z3Tz3=6jb`Y!%8m%Rra#GfF*;%_*=|Q0hbTZc+R!zdTQtH7@2c3D}{vUC{l`}*_N^6
zErbkdvfTMtG~#%fI>xmF!lK7UDkda`M@*|@lA=ewoQ0>dz5<aCxm?e+OEo+-+8)2C
z>6$G>nK3s*XpF5Ix0CLoL{5pmVEe}5S%vxF)z}8Vvv(^K(}X8`M2Pa(KOHZ0%taVw
z@*hZPtn=QJNE+(-g~J3!B_=f5@Fot`2K#DpavLp{(&oj@FY&)eN5UY)yC)}jS>;GG
zuQ%U`2K%1^+%6+XE&9UVF}ZTFqkct82R{ogo1*2HTDcjbPoQLPha^s(f}Fay@+~v(
zMr%Iv6@SheR`}*kfXiWIxdp5d?I(q$*lcNb{fWxv(Dm)5fBz1iVaH+>@isS>F4@N)
zr;L1WqL5uRF)c3ur7tiS_46jxK+2E;Xhk=?;`i~7)a6IB+0pj+KsjYNbzG=gT98{;
zRHmf~+znx@L+BaY#6PqoT?{b|xJIWGyEXXrjq*hhpCkNi6{Uk5M@5Ufro@(K#iXcP
zq>0C%$$U<KW>tj2IZV!anc?uBb$@GkTJvV_myM>I+R6vGQ%Tf&3m`UuUg8{|L=DT9
z(*YCEfC~<Ztij{V>=vKGiFZS2-9q;|nuj#PZ*hl~z&RpCn;<L<Cr{Dwib+MYDPiRi
z$^;V>(Q4V(CZ-w2e;2s7$oh_7crcKZ!KhP50H+hC=zkjCcO*z<rc<d^I$T3;BMp8a
z)aJgLrqKa&GBE=sXN5b?49eQ?+~I$hmH?^jKGT@EpfdN*64y7Yi^=#VgysR>exyn7
zux=8~Dk_kO37o%$`*xjEA#DUpXkHyxJ>0wTErNK$(?tMe-!~C?jSha3#3|YvQxH^i
zmeKR?cs8sTR2~bz@yWE{O6wmkfzIEFoZ;8bCJvtLVy=NnD-&ABA};NQq&+nD>e}z?
z`VkuU%>9A!=ms&`xZv0S5M;Tw;vMjyu4Am4ZZwxLu-325i(Z-SE*7D{c%A>6gFj8c
zk(f%3HgXg?n-_Ej=P142ayXt{W1)1KU;0N_Q_N2~oO{*ZDmSb(NLU>D*OpchOpwh!
z29IVEVuV(PRQB_n=|mu07h1^zL1rFH8AdU<mo1j0b5n&f^hVwHaDS@j=m_*=J$gO%
z*&;W~@uzO3w>!2Yf28go!`BH}kI4?uSm{=4R(?uFFrVFa#vyHT#1m&dS<TB7uD+j9
zDh}!fe%yL~q#YCf<h&#DElGu0p_T$ekn?4n3S^kA3=N)3;h{$5QWm@>jA+vHWK+i(
zioY2a>yDGWH-YL9objrW<n~54!Tu=wG|4SW_O7TFJr!<QiQkRVIQ>!fO%N~xp5W6b
zT=#pEWyG?U1{?!#5JbsyF0w4iZPYWHW^q!1?#0Bx$t~>A;0#Ii^XO3d`RhU|b?G*!
zr-PHlAUn0?Ck<$>YH2OghEFgim(&3-Cs%V52Dv@WVOf+}tyY4?LJ&(z@xAvxg^abu
zaBYjQaz;0)am9IsTD;OSnICa{@@x2*nw+w)dIG<@w|T-ws&Ybe8vqt^k{!qgV1YhO
z7YNtou%WDc7aB(*TR9-BNlkVt89!GHox2i*z%{K>Mc{gHB2uVGu;Z~Y_z)G{j9+#=
zwOx8=qt|1xqd~?5y~Uy~M=M@C>1ab`VC|B~g^(j6(D<LibgXtRsBKpVV_7wuqifF_
z&vwm{7^z52`aTk^YjvVkJBkpy12p@yw%<JwQZ!?!_7BbVF-UGWxxN6B{QIzVAAe9f
zrpjP9)jA(D&kqsB@YS+k!I5AP*rmbx26ze159?lD{J#Y~HwYz3q2;|tePK;|ZKy-h
zFtlOyCKecfk@D1$Z{h1Jx7+v1=6nn_c<<&Tg7cmojxi;e7;D3=sX+)r;@5Pw|Ms7p
zy#s>D1;(;?5|IR$ZSxs=Vn2yCXP_Veg19(-vS@-IGSKKRbe-Ks?~0!TnvFJ<HkRfR
zE%|h2#G)FN7Aw{(Rq79r)TUX=v5dCO84z^AeaQyb;htZ2_l;0*xcQD{k6P}wQ2RL4
z$#qvDat=RW`NSFLkzXQ=p<7Thq%ev!RDSs9-S;p8olSWpD*%xMO<HpFe;=uMqz@!<
z03F?})!Xw=IB%lZsIUy80kY6wrQkZ{7mF#{p7>R?IeN6Y)#pjM=Sh^2q`RH9E8*n&
z?T=GsiQDPGunJKML#npp;n!YfTHe;pm(2^;Gm=P#@zPnu3t)gfUiNYYtu`gzKPdqY
zwOdMBksRDp*laeR`?#IwIlXYf;kdaf=3Xu?duz^9WQ(+C+|0jpwYqm_a?!F!uKYP}
zE-{{mv@2#-H=x+zfea5{bn<e2R$lI@SY^IQ5rb50Pc0osOF+{Op6K{!czI-HawyKJ
zyV*gXsT2-&zMU7LZ|8+q*NYYEKb@D?AGRjXPsabnd0A|X#N!$0?6yBn`#po}Rcd|I
zfLXE9yFs(`>@Xf^SG|IR^yeCG!=>2edOKyU!S}00`@QvMJ9(|caI(jH;|b8RkI0pi
ze}gE{BnT=-5kZiqLxium;d~B%;aHwy$kA5T)=Ck`8HV`XAd@v6u=oi9c|>=cGQ}}8
zpn!+!$3DrZE(Bt>p+0;ReZBedSdG#Q`f$|lu!VcrQH2|G<{mV}O2vHK>%GMH>zcsl
z^Zm%sbjy(pi>?F<E;r(D=S3!0{qr7-mtWv%7%@K{m*oeOtt8OLep7_Nf)(E88P=jj
z*x`&&DZ?&vMZj-X%3TmKTQu`>?yzi&5wobVG{L=U#j*qaxw76NKCKId@Dr893##UL
z<oI8087`L1dVwBXpEv|iO8DR11;WTI8rW_;Il;0lEp0t`I07{~Rag=0;L%;><G6n@
zFAmP!!2f1m;zew7(b@jVyaaPV|HZr*JN}D#L7>lGM`A~!rT}PCCiv0D$jRyc7xN;@
z1JaTo8D#7ATBQyFB3;dfW9B+80M6h)qSp_RZ3L<>MTw{=kWpR%TtqPWP(XmGD-EO+
z3ZfR;pK7v{Um&J6J-z}tOi__35(<Y(zwnpx;?MX`<t6fO<>jXi(Bv4RAr^x=rs3IA
zg9yZIN<UBRIXXH~K0U;pvjy(W(6cf&Vy50@?7{<^{?u5ELwkU@N;NgEv<ssq`GAbp
z6xgBbK%?+L0U{gV2Re{qVrlz5yC`l36v)?!(QueCHo5BSnA*=@l;r1yqPotTrpaXk
zno_W*fY>V*@l>Rj9YoNJ-^h!(%hCBuTea~y{t6P7Xdr~dv7X3TSY_Hx#RD<|bn!Rx
zk`5K2XtjJKNDl+11!P|!sgPceopK5XOUZ$pi#+Im7jDoEAaRw4co1R+45qeHIZwe{
zaJ7!0M#=yy{VeGokvvpP9Z_1%B_RsDgRCH4;sQ<5M!|YWMm!qQV{xl*Sg@hM#}^VN
zokBZbrw9IL773^9r;HJEI3<mcn~OD2h<GDI1j$^yzGL?pagQqRWnT+A!3Kr{kBQ#X
zEzlAvB{3h&cyWvxEK>#xa>PK;%0${q%_x>d!LeHYAZAotPZT}G3y@N9m7s~AG8IfS
zcXVV9(4dT;P)jH|Y066b1JT|(s+>D%GMwsm`iUbF{7VomHaZ>JHyxPBv52354TCDD
zKZCuoNQ-Bteg7}yg%SGq0|5p#DB_&VUbH1R6?Vf4#YpY#k8y(^avm3z<5r-6Ixrzp
zc%5`R#Yqj(4KUl4<5j``Kwe1yMqX^Rzmb>8q>EVo2@usmv&0&iB(;nq?pX^dN4dsN
z<L^}JUPCxpnvXtpI26D9ppZe6#>}yQWi#{ss*je^z-*yWrF+|Ad261~Z)=9jCiM1N
zeMVAA(9P8{=N}pRegpdTRT(BgmRK=TQR?eK2y{!?_J+&~-N#+&@{9LjlM>!@vkMHG
zslp=O-AOw_b7|GO1s)I9l;5?`w4?Ze^cTe|f_HjB)`Hx+&o23B67Gf=4zu3ZrcKyX
z13+NByj)2qtydw7(yPQn)l!w5vg52KMh*gV8n2AY9@bYPUBWKTsRZ!tTuZ?#JDD5+
zHf+P9F~zdFMIe@vd*b|<wifk{W^7CNk|ugDO7rEQ3R|Sr*K^p0Oq6&^gmKtMA)gZe
z&+Rl4)i_^mqCK1sz<l=8r9Yt4+;><M=@d~1P><{-w8H50)Wofl37F=boEjDraf%yQ
zrwYIdVmylAF6KvjDv`<1n}_7uRr9yt9*nvaI@EmO+kLCL3QNG7!3o)?z7GS;Ednfq
z$n0rCWqo(aar_U<>&$U}%j=u1Yb_cYPs?Sx1EymUG7anj4G^_t<H&gR)$T#18Dnu>
zv!Pk#8l*SU1`X-u$c-EgE^Q9N%dAU;1*qMRD{9n`>)P>(?SoP?a|FqthqQ-gDDrAQ
zVa0_|8Wlr-^pDa+1bE~qDzPr+V2Q8@wTW|9{>gloVBJ0J1DH;#B*`I8JYyPfS=tgE
zfsvkS;y+OExCZF+aJxTlHt^8Uww=EKmEQeT6rW67sZ;cq@xoG2tltEdbv(u%XT&jv
zRIWCWut{@>ugpkzl?D9?CB`DD*@EI*e#o?T-9B!|3+;kiKa>xGyl1Hi>t!b?8Zsx@
zu0l4btfuOTtL^a{g(+K~p@F0laG|bSEz1Th`DLvD(W4|+-+#AS+%*<EHTI2KpvNH!
zfVh+c?zs=LU-IH?N<_y4utWj0Ue4$guIP|n``Bho8)oJg8?XB3Tj4O=P5Z;*b++vO
zottouh?m-)RHEZ|tygFr5G~(Ys3zZH%ML8|5<H*dlP;29f{kIO;SFC03Jw=m#Ua;!
z`*!1)otH8acHj$nm``9X(%5cv!S5JniZ8iLuUUu2PX*1`^5t4S198CS1MA=~;-ylD
z{3rs+{eK`{be1JseCkkhgXUfj;T{=RCS);?4&MRx{QVL4Q8;#&B(hjKCW$aYpbNx<
zBzcZdhBNyDcah{1iHnWO*AE;VBy%b~CxyGaa)s)*a@D_*9P9z0LZF=`#S0jN$*$dw
zEWk>p_2P$l<3>knJ*uD-a1sD<52)u&AV9u_mlW~8gcszA;lG8KW+$UA)Ok<gtOxFT
z(5y-DiZ!pLj0l=R*v8Wkn5JoTHW9hDK4zE;!l;7ou{CVDv1n;&`A=SN?uGK4A9`*Q
zSDalhs{tBe+Y{l^<~E)R8o~}l;NrdkxcI=`Er)s4Ie4~RcOk71_7h%;+msVyrNE@g
zX=va&A?W%0iW$Xv`Hd-5oNEdrLU5Ay42B`I(5&&o%;)p5&spW@_VA!$@=-y}TKU}a
z?{G~lnuM|EYlQL{F<f7FzET^jv5CNfNre;(Erv1q#FG0Ok_D6T@d$KKAR8Wpv7KTo
z&Po3p;6;pLpBU*!LOIV`rsRmnTLRT6&I@)jh7D5k_@>-cXlBU(y4);Gh&9GsheDG9
zkL~efd}i4K<n#yoY0qG(OdPOK-GPuDFeVWnIk?v%Hz74C2_in6@tG|qbt1IG-MN^Q
zYma)D`xMXbD*H2mC{g$|0Tal9mU(7I;x(9GO*&sbXBT*%j_;qtuai>GUsa+kiW$EY
zc=YOr6OL0t-~!RBi8tx5!Jz<LLpE2@j&sOWU1&xdS^ovR*jL(L@_qv^y}jJt>z9Eo
zV1EHG&u`lQ0$vgr8qowI54U~09uH_vC%UPw?>Y$|acH+?B82?<&I~be<T8Fq-EJ}d
z3wSYtGR++tccA66_y%6?3IkuGB7(s@4Gnayn7j<R?RgGk6mRiqdS2ay_k$%PZA}QA
zSr*S_8{1km%&%HC{ACl|UP@Z0Kf`KCU%sU0)14yDes?BbaG~5IvEy9=k%e5tu#~`v
zx8hVegWXuU51+<_Y^tK&&~kIMehi|BuLkAjb_-_|zkipeakxO?Gio3WPUF?|CUt%M
z>y!0T-p?=~Ywh1=7OI(il>9^q-<&@yvV0;jrUbN(a_Iyv?a#kH9zPZ$71LxWQO89h
zuKPWlD>gV194UYI=`SKlo(fRJ8vfpTfQNF#6Zl2O<<mt85)?big;+D@Hp{-@MW8CH
zsddHqCuL-gb%CO2!lX8>Lhl#|lD8S~)4YK2kKppK{LVGKh9Fen9<SLW7|Adj*RrL0
zn-Lq=;4mZ`mw~w_25H#b&dT|bBxn+4y9Hn)Z*bMydK|D>bLnj{%dD`|l(|BvuJbKU
zc?N4Xt@`LKwQAK3n^hh){O|twHh;ahI<NC8&wD7|=&)-62tcPlHHc!ZP;MX-eMDiP
z?2h?&uSCaBG6Mrh6M2j%#F#htuBlkQ-Nr%XIrD;x3u<-OSCwgSdncFX2n`t~QJQZ@
zJ?|exQdscE59=_`Yo$%5KGt(>MME>XV6?yS((oD)yl2LyIdD^lc<oWFUtlsEWIWkv
zugI6qrvWg~NA#}@=~^02^!;R(phV^|NIeb_54Vc<muyV3W^}EINRyOx1Xmixt?kkQ
zf1vgIrAiX_x<J=UHFUhHrH9{5AdeYJqV&q__+#!!v-P~82GHrg#L&bBg=pwwYVJ^z
z1~k)kw}q_n^T)1LLdD8<&i3kZyXG3WPM$_(cHU-lJp`w4Im8Ej<-L=Jse{+zng;^;
zBC`-bP!1Or_|=ciZES46{plPuNF0biJ%srQ;NRc~?+x%J>`dDa6@nb-naix+7Gf)Y
zmVv4GR(5vLSJSCakY+`7sKRckJs{V3Ih~kRvh%m^!u_}JA|hq{?YmSKZ7BZKj@VAR
zY5Gs!<?i3U%R>J@e3z}mfA}tJ|MXo{eBb`|U8HuA>mH<Ejca#Z`|e`t$Or_JDo+24
z?-HFEi)Cu+jg;Dkw=;Z6-($ea`x+@AwR=4!0n6)+$&`8|1k-CwR^%Ftn%(9d8ZuYX
zE<-@bv<4Hodb|l`YWCuG?KE(09vvA=@F!v$oGKX+49%U;a)kv0qZR}uMdQUv`ORpc
zZY)6>o%8DqN|eGZQ22ugSYQsZr6WM<cXg#2(^wa!emhgJLdYajwVc-k<t;uRz+&Mq
zez#g>Fka0`cH_CZb82BsV<|=XoLZ4MNM!$guir%3OMAMvI3$VNWrqykL+FI+)cxw6
zo*}VNaKKy%Vk_P<vnF7XVi?wv35TknNM3Xd1RivM+9m97--YPgcY%&3m+>S(Y%m-8
zhwmZ+^$*{L`raSt*{qc0R2Sf%zRS|L??M>zH3TRqqEydRKpZH$1ZS_0l2kq5@Pdw;
zx_qfG@ri{=UcC>Vl&M9oM8a)Xo1qKbZqcry$#vWH#VG={6XFrgno&<MjQ(%}Xl)+O
zA*9@?t{T=;sbNQo^Eu*VL~=z{R&=wH_aD5A#b3OO`Zw=_7ja7{4Ud3P!riD!v|zq!
z`Ab#AW?2p6LKJ2h2Q+Dcs2SnwC#aP(J&)LL%C@KE(X@ArR^>{BVU?D!^eXQ9T=FGY
z&hud@)ZhOPUGLx=Y1`;q$95*p#I|iG9ozQAHYT=hJDJ$FZQJ%lC(pCrz3c4r)mPPj
zL09!%ecjh@t@YbDLM#utqik<l;FlkwcJbbhfrDSX;BPgn`~f|)jw|^LGoHS5BBQ2M
z-{;42$S|H+?YhLgbm_hfGv#RNGy7#JmF=)fjGM`laD`f!ek?_B&G#*7+j09D(*C%T
zjjXcEWpzeR0^x^sSqnw~UO%&3LKMS63Ot=|?`7`wC!6&~XD!pOr>u1*&t~h*&gaWV
zMq_h7v=O(-2x-O?f6y(qd$OnWam;yjcYh>{CtK+~t~oy3;KR(p6FS5#Jh#wEqO_l0
zU*zM0GJRgv3P$>NYSSpk_GM1EU2htD<7!e905yGAhWL@j-<FITCL^`cD#mimsf1&0
znNg9hXV!&5bjZS}3}ObVJ*lQk6^Dj)&u)p?h(r*L5pd~0Zp?vn-gipfhvBWHNbLet
z&s<niFWjvmq+>nzjt?X1Sqvv?UaB7_o%U^WT9^^jBM7}o-1+7}dR&PyikUzItA7!u
zOh|%q_%PvN$^VHA5k@IB2_oFW7Ncr{$4W{y1JKjD_f@XXFe$V_XMLDnC+3@ezvFju
z2w0D$Eyo%Gq4r=deXMhbUy_P%GR$*$NP!>aacVH1?kJC_pKmGTc5JtQ#anjNDayBZ
zG5UH}*ncp)T7~SMnuc=58K--GEjXOUKXU-K=rs8A?jX(@cFvp|4u=2q$XDQx$gAQe
zc`iur+mZ_JYK!?X{Wv`$l1QCO7H|<cwi@b+($Z6KqMjFS3m~W{lzrThj+(BM1r6FY
zvTiuQEvx^x?h<dKq3a+wydfeaUaWeOLEEl52O;*~b(bEyuktq2nWm(wBw;x_NXreL
zMwYjpk3;gjlfjqoy319NV}#tbLZbC=FDjx&KRfh-lH94S7jyT&WHMxM>DbsJDXGdU
z&m4?gf2W1K*~Jyh=WecZ2g}pBJ3W>b`J4xVoJC76+`yTK@KQoU?6HeZuq=?NHS(B^
zOl_QhstXzqJV>BpuxI<bjmuTH@^6_-Z);?ColQ;BPGl8<mAL;;-DR7WVdsC>UDRf;
z|E;@#Zms=WcbWdD?xGOzG-Uk1?2gH~SratUQu4p*E=PC&SKUS7|F`baCC~J4-9?im
zLWB6<x{FJl_DDhn-$_*+ltXq8JI}v$mn-4_uDh7e@pfQte%D>-7tR*@PQWf=bN@ee
zmz*O%u`-Iq5>c{#E53Jc=NXInJx0pd780EO^}8FviF=h>v<`Xa@4Abg=8>Z?p3-3z
z@IQ4IHEhwS|EarR{CC|&1JwV&>Mjl>jhUnyAtlhY`$%37Qg_31P;f|XDCfq=Vu0_u
zOX7d(F6IAs-6i3lx{K*|-9=_e{=4qN{ZHKm^}FuUHSxLDo=ndEr|CE0F7~6|KXsRi
z@48E)0W*pNDa}EME=d`2N7M;hNi5QHp(z;lzP+~eZ^A)kMt}FgR=X6iN~)w`*j2+s
z7E$4(S-i!$(Ww?>BLcVxeYV6ZJgCK8ztJBcR#$YYp!iK;L}Ul6k|^^P@}#u&7UUkC
zQf$|v({+>GHSw$yBL{jo7eWtoD#YEuq6wwznk)l9<Q4FiC-=&3M2<oorR?e$b(l+i
zM+5y5>N(vcg}?!c<4Xm?OZUlkxFkr+fB1j*UYSA=_1<Rg_5+4gmKt7h)mdSkG@0(W
zb>?se9M|YKdxt2E2Oxj}kn_#@I2h+{-anX^hQILt`a+<O!zGTff>UAuHB##*BQ>)A
zm|hA!)oZ9v0M9W%ctB^Q7$=LhVw-AI=1!y*)P!t|F4PD&qed%9jFHlBGf@$nZw$~G
zm7kEq34~T-^#oc#1xmKauri1(^3iZ+BPd83^HXQD$PsZ5OmZMiXCUueqlPLgYWOp2
zz5p};hquh^=kyd>LJzE23ZIWSF%(_>98xH#&-|M&wry91y_lON3grPBi9q0g)m;+4
z>n`5i|JGgjh;-#Z@d*N9QU0mB^lN<AUHBD(gl7u=yYAv;W2?V>_0bS_5@=CU1{3lK
z4dq>6nLuk+oGrLC^Vv~N;5uo=FKHm8IUiYaOFQHh+^Lo3t1zEUWX)%(t7Bj{zf|%m
zFac0q^ZG$RwOq|&u@A9;?VH`i2TyN{Xi$BLI6<Ohaj%g3zz4TqX;5jf_JtTxZJj^T
zz4do(Ir;mK12+GX{|R-=B*-E)IAdM0o-!)4oo_LlRt1`$(=sJaRafGKud>}Lab*h4
zRI*#esS1n{dca(656v93d6crTtbIG976^j!syr}_<gG;r21Ug6guXORi>&{#?n<?X
zH){xYdKGZL@L<vc6=g#bqsRaZ1e6$T>KpqKdy7T{Aq?LVB@hZdUld!~GFTL9_<gCt
zIq)C(agSMg<0lF1^djSX&r&`C5yi)zmVz+}qcsUGBZD>Dpu2UCxdx0$8-hd2T_9*<
zvyf_@JEPWBpV;T$z_cl}M`Yn(>kzb047eyn7!(ZgbA<W`Tg^4%83J&9PH^le&df9Z
zE71m!UAEuF@R7QSz%Q^Dx`&DlVg0g7;sG23fwzc!m4&zd<j(e{YY$hX%fp2*hzcWX
z(MjlYksZaWgI^=I%Cm?<PB-Q~kKoL*b_j*M;&nH}Y+h@3JMl5sPJjHQxToj+kGZ`%
znE5VUHxlT~9kZT}f&X{i#h8biFFoq!CO)www-%WhYgZJDJ{hr}NfDTz9eqNzexEf#
zA-7EYj>t5>*Uh7|ZcK9VZyS8}@e*;8p_YbpLk?DIn`4tA63lI}!XIPx)B_6Z!K7w3
zCR3acP1p~zRJ!K2$5#o5BjCVi2l;32f9o#TUIbwOPu&Hfp;4PqL$&u)JGxt}KxJWX
z(9nek;OiHf{J8<_yG^O~T>G}JDlpNfPB=_L^GCB#`Wc6(P7&^MrPkya{oPO0NC&bE
zYuUMEuk4fKsD>c*x=p#m%+U)q+@{QreW^Xj!_`5__PEXYTHyLG%4@Am!K?PyQFfRA
z<ft**EOQ%Y%E{?Z88Y!VPtU*iMjsg-4jK)pt_C?uf}=i(5^&#n7pgc36ovqExjbc;
zOv`6No@`?9hYVvmqNKHr^#qBB?zD^Z-UH|NC}-OM3CK%+4rF0(%yDu?zXh_y4VD#d
zhSZhFHWbjXfi`c{0$(SGM?sbbv+>WW>pYR`A!EbTA2MX9-mL*7J``|HKUbb?3zga!
zuQ`&^_i@)KzP00FYQ-nmpU`;+KsFF+l|VX<mN=+xV4HAG^jG_Cd3fa4KcwGLrHcP5
zl}7j^u9ex$GJdV`y5+lBUl;V80&{o+l?2yx6?K5G<;a(J+cMAjKNvm%3ysc&^=2CQ
zmN#R(FJ@A;I*7M3M5N=MfD$$c=Qz1m?@K1=H_0=_o%B1W%SzR2K8>2IQk$QgFCD@f
z8I`ub0dWjj8{a~+jd@bxf<!jyx&E9)O2S^x+{ncK88M8S;qf-)5f5rbV?14zYkPjf
zR9)(8+ZEQv{tQ15?+uES??$k6JZR#Xk%U@Hmt;O?p4Bs#%#35mG@r+sFSe+>uc|aA
zxo167!!5YG0Z(YkQ1qa^1U0FB9dxvzE36eR`rtc;YL0QVzqD_Iq^Hn}y|aeOQc05_
z!Xo!aY9Vh`8^JcTTq2~gDB1BQt^L>Z^}YZHB?C%PB0`<~{qJ~FLK{kB;>+N7^=OQE
zEFc^uJT<Xt-<ROmpFfbA&J^m4P8?sXa80p^Y&~{~TjUYimE3=^vMg?S0@u&$t~*f5
zGi#tERG77-M=+On2T!XTbKjbl3i~b10YCU7%DE(F`e4!mLx<T2%G&DgSn&8e?63xc
z_fbnq&l5M95{pJUlYryCATGBkg4OZZ1H>3?VvwhTO(jFb1e%e>J?#$D3rye2mB$l;
zcy95&TVS>s>M6+K66*(0=<MOH9wS7_LlVK~;1+DUSV=|FX}YYZc?yCDk1E1~p0sf(
zrrb34ENbK9mo^)v{!MIWr1woMHJw%<ex{(uI=5mZu7y}4-R%V@$#TG;NV=Hh0k@Ue
zk7pj!k$d1!I|#EZL4uu4u!|y_EaIUoBeEm<RrLO~3bY81O;uDiM6M~*7-X<4K=&s<
z?vB_zfUbv;%lbwL*4O9i3zj|@(FrqLMSxIM22sycw!LaP<mM2FIZZ0JLQ<pAaFpRp
zSB;~7LdeOqd6X7GIOFOm%qP@TpVrCE11DIzR3$;2_Cl<at(9e!Y~kQxuBtgpH~l0*
zFk9I42(FdUnpQ2**V-4nt9Pe&Z;7&XKF?MVx2E0|W}g#O9)^KBh_Q$ozywb!GX^oY
z?r`E`qG|2i_JSA91tOvSqlQ0zbarS-0Hiqp23%3-L=+YC>$eJWL31y)h`l{NIJ#+T
zAMI*6T?VIx#be30;tdudNA`oTOF;dIgFi@vmP-AXN&I&4qRZvi<#`wz`C487juCE)
zQQUS4Lef|Wz}(h-nr%w~<q;m=XHF3rIaB(!qo;%_PT6CHM4?r2iMT74hmc}}wcY!>
zXtX_=d`|TYC}WFL)vQRZh9<cqdv;YB3*RXRHes_u5^^GI2=-flBV3DS3|5yXvAfRd
zyt}#i<HIJJ>d%z+3<Xtcp$7+Nh*6H+s-f|$W{lWPI?5n^g4Xo=T2jQ1>Pn(OpO;-X
z8;p7nCH*4m`9Z6i)9rI6H2fzE4GLJ;&ss*~_X;-o41UQOC}Lul*FLDRl%F1t`<y=F
za4Qz**Q1eP!odhhqc?&t{<f~jQ8BnxAFgnnaslSX4>W!}BT_&Kz4a!sA6f>*sccuC
zYOP(H3Fh+UNVPHROjh-2$VL>`avZ51u+zd}!fZls9!dJ#<thlQ6+*94jdO*a@!D0O
zmqn}Vp>QVJP$3;ju$YnW2}k=$7_x_AMrX70lBk9HeTM>grs-~-!@ejT_`vWM@6P8k
zmI)MJ+8?G#wX(mR*<r~^qLP9YPGsSKjZ~vp%ZOSzxRg7M#+*Mji6#Ijy@rV12!**=
zC(G2x9h`GjA?t;Rq8TVBIAc>wAle~MwDfST@rf;j>H_P)$FnP+m0v_caqaVUud2Bq
z%VRe!=kg#TfMO}E5Ueu~hzc>w)kVa0BNd5!?@oRsas6$U=`kzAjXqjsu^qf&TtTr2
zYoA*KiGP7bWeqj#f6!N`ag)~K$4A-rZ;I{EOGFj%#013B67`~u4LTx0<JSO{py6d;
zCCz3VJCG<ubs#wZjSEa|Gn2%tg9DAc%>jC*#z*B1r@W9Z5j@fA@h%ViF-3(h!tgRe
zU`_ISy?^_5yMV#{jevhjoe*+Futl&OltESD?2j021F4CE?>p5=1-25`IiScFL%2IC
z?~7<*jmlV0PxyR$?Rm?iM_vYV)!&Cblma?ogsGoVu-flY@g!4usXLZL<wg|IFrG1a
z5fOpf{UJX5?1!55Xcwvvbqp#o-Fa6>35{@^UIm&KMI@VB(Pc>*J3LArA|o_}C={XM
zT~^-L-{A~GcUL#Z!pz=Z?0~ZiE83KsR+VKNlg|v<s?MCi{^B7g77V;%1rknB-pn@G
zBE~VBU00;zkhpeD+|K%-Kt>>5jE&?>8bGn@OK!k4Ey}brE<Z^gn8OPQ_Of}w(k{W8
z#G7RJsBUHf0$*+_DcWqytc6Ut%!wm{I=8?pMr<&ofCeT0p8d(4{>@{4Bqu({4nLIn
zJB5s7F!>p%*79YagVBzN5o{w*$D-;UI|U&zx*)LEocyH_>S-=G*Q5MCxH8wivfm}7
zg#uHyz@`rj%@oER4jgNTl;Rtj39~>CWqu*Cu9(ZBO)ChOPN>Od-ji105kejT8+uwM
zs>L{#p~UYS1UVd%Z(`6~U`@z8GCq(+e=csxmhoq3$;?I&GE6aeH|}5u!4TV=g7Hb+
zj_^;!leA1S$cEWM^`nNao#fd%QP~JvF|1fUlZwQPB4AXuy$=O`%$iQUs*1lAvh{RP
z&CoRRT>c^NY|L%srEqeGFkNPFkEw!*5CT7R&@~oQ?0bH+M?#|;f^EW{?Zv3XQ;|@0
z-d>`%&J&LpZ7~nXgQJVhYsdDo=V|A{|BAk8!Yu^<W8#LY2=YXNQac$t&}t?iOGXJc
z;FC7d$;t+g-r<_Ju%tLAq6qwX!*keyUsWuk=J<oDUz<3}U6U#1W=38eUh}TFc@ux|
zqMcYerqIS|teCIrVKjc+J8L73;Um|aNx;#&L@~qaNpf;FFptHxK<v*M*BqPN0Nta8
zv8dXlwK=lkv|N?D<PU0wps?6&xXZVA-r66ML9LVqII)>f^Cg~lCz_LaRdmIfLR#LS
zoMK5E^ZC8(_aR@UmpBvAf7)}H2d{(n(n15IXi@bm?p)VQk;%~nfsu-}wk=gjnhfK8
zBz(xRO35456bp^=h*6eK90Yk?SPdc)yzkhqiuna5^AS=se@PNI=*SpJMcS0%H&}4O
zQpFnB2=bAbv=+VlU&1q7V#O}02@aq=w&jzn@eT-iyf-V+F6T=<*py8>TzB-RP9lG2
zXIvbtd&0wZkTDQWFFlz1BBZ56F{6<54IRJ<Fq~Xh?y6BEq-<CT*5;Mi`enC5TpjfP
zIu9B*wiex*-!q*{O&J&-QGdT?YF_|*8`%E<H7Zlqvkk{Aw3*;ezyY8R5;4<TZ$ZR1
ziqQB=YDT7VcfTIB(HRnFPdpyqy<U)!=Zrr-au<M(`umG4jhZTT8f8G%t!{ORFeL#<
zKCB1YH5pK@4ORDI(c6~mQRJ-fq{$%JNz;<tQf$bwH&%!v2Uc5_oU6`3m~ZKf8>@<p
zqk@Bv$@UKJ$paNZ^~tst>-^p8FyI;*)2{;>`SjD(RCo-XzXBm3{=%W!&pqR9c6mDb
z=GbiU&eUkS?)$ShXME4ajr=Q<HJ3%>=u#$+m1OhT@#G4JAnrN~Kq%WmD)XGxfrF<#
z6a=RHjj56^M0IX#rMMhObCM~hMT_kKFHR<XNG_Zr=Zw)w5Rjw#Osx(kFp5A`$A;k>
zB8}lOWL{TeszW5SybyJc(|jVtl0#9Z2*AzojPVeU7uEgYsgtD~Sn3L06nj|76?^N<
zrpqqZn|0@#)@i)!kd4cM<))=s6i8NH-{15h+)qy3K?-+3pt`1)O|Dl~opHxJh$j=$
zlQHVQz6NCwgQgWhmf3uGs_wmEj`_zKC)HBbfe5e&P5PAEm^*{y04NI|lS9S>1EG0n
z?TD~nuK%bSj)z1bt9q!21Hu26=YN|T2!o0i8^t4vQCrziTewi0-F~_34yJhcv=1e{
zQM|I{U_9Rz1hqy)@be97%j|Elw}b-+7mqrc)d&>6jhA3IcVPu`5-B)uVvr}>@~Q|3
z{S9u7=m<o)^$q_R!=qdXZgfdq?PL){QOHo2D<$1EGWMD|+ry#RV9EDM=!1xSi!!St
z5Bk0uKpr)|KYyQ)-FWl{3n0+^iWw#}wXg3&8A|3(6HB__;4Q<Pn?Gz|CaF8n<TY|a
zDhNA#%H{^tdl1tPH;wd37AUncm!Qk#X#QO%zk1-obKPGP#-4$p$B!C|mH=}1e)uU-
zJt7o8&R2{C{OhT~bMI<T1xn^hw;fv2LxyEQo25{Y#~yudn*V(*tGlrA$EUGMd80$M
z<veBAofi0}9~hC(ILr;$jdV03wj;XE*0u$?V4~8?us$ZA_2kDohum_r`~1m~?!^2d
zqz4MG>~%2%kB&NGb5d&y@+W<g^Z<h%!WRiDI<g|Ok~03hbsufhX+10(xHGvUgBEVT
z01!qY`7H5o<R}p$&v9jb#BbNasNtC5m}<a3u7zDMT#<@I1CIEC6$fLeZBho10^$uA
za!$A{>M{des>7tYaRKuix}&yt)7zUkxMIMIOXI^{#Gw*TBSI!~qZG2yPT#yO_OdQm
zL+FOwbf87K#jty;ybgbSWZduk0tKf5nrYB@7+{(d%(~Fx*(>6fi8$jDjMl^p!*Fe0
z&<w-W7gztxI(8F7jIJAM2gw+olfTp-cgXrX%{dhvFyVsuV*QjQ2>D=$@lijs_vSca
z4P#f7%focwoQ#p=8y!M5U1DvO+axAO2;98t>JvU!fZF!pWR8d%3)5JVbHTtk?W1?r
zm33fY4PESDoEgLF6=^j`0v>AkNZAKfztsys&XWPl=k#(nM%>aRb7#1<68>w-Wz2`{
z3aU3sO{dIbjcie3kLbnorRU65Y^Lea68+_HfP<N(#E%47+2OS-I!X5VqZRHoZ08W`
zj*Ih*B8MXZ)(nhF(oD0Vro4?wWHygA)nNOVGvVSP0^I5aY|rt*HKE8%N^}Iymx;vi
zKGN!)u-2e6zbFPJdx<UbfRLlk-zO%?HVA0J8ZCxZ|413a;QO0VXoup0oUJ9C1iLr>
zU_6A2`<4c<9hFkks;%{SP%i2NsMPK-mEBc>`1*xUo08|}iB2mt=+zt{A0HFL_oEc+
z>0!Wvht-BWl&gA}{CcG#!kQNG2HD^5z&0ZGTF1aX$3c{a9Z~m@{%{xoXBc!6evN$g
zVEa(ALdy=tDvc@On2wBaGs^yU5{sI$OaRA6VNB;-CfC2mkeINtd^7~Yq{_8P#AVBk
zeHjFiPj?-*Lf|6`iTa$2ZUg^zHb{w7^9v=X$!*7;3=y0MG_uhoK_&YdOZXpa39b^u
z#7_B9#f}l6HeTs_`i6MtDTpf3LV^TZLd-9PWhMY8@E3hC)yAl0g$JZf7WU=13>AI~
z!Vm)`>c-qUbgnfylWUnX^2e^Pb%M~RD;X-VY;>UwuCF$3@u~C%ZWrk&E+#iq#`9O%
z%-O8u!+?Fk!!$>2DWg;pdU$#mexUCIe~pW+Y|d7KY*SQSq2g#Gze`Ir-88qYun}}5
z0*2&wAj&bqWtF%2#~{v>)z*BcdxqOT&NgI*Ei(fZmH*iN<8j#dACH5<x5q*2ACH3z
z+rJ)%&wo7**WVt8w|_hiRhiLzGT>P;J}Mv<0s=)%Qxg1c^vC~r9Jaqb4j@;UaAumD
zLDbxvP6D)9wu*;IR&6d#+RaV7LblC4ljt55RaOO!9g}M-#(XMkyq1@-dXomfAi5iD
zI$8ORgfD(YDY8s?pYd^a-|USSy$ydqJ^im8=K&>npVaa|JZ<ySn@V!n%X0P7W)W)q
znyGISS>A<-#AWS23FW1T6Zmr3c{p3mR#FazJjC3~4=*A$pauzg2`B6l0YIc@Ceg(I
zXdXb<ik=vC=`mngd8~g4H*ri#uTuq<I2eQbw2>BfV=cag<eOnvSttC-gLlzLL)5rW
z3yck%<m!fLtWTC$!EcsE3rHvDGPrb>n3J?+7J-zs_1C(4r9~Bo85C;bYAn#M5KY@+
zL8z9rzhQ^O#_<g?@i+bpctM&ZWR#O5^DxxU%EmoWu{zXbmiu<b?Np_gda}afIA431
zvMOTCFD=L~gDxZ&RjYMg?QAV(5eM=hSj??JU4ptjsGP^8ekQ*en9Q~$&`U?&+dXuk
z0~96Ae?A1yb3~rK@~k|JeS!JYClglsL8yR{R;4I(YWk&S?9g&%)5_c?@33e4V3#7Q
z!sSQ}hIQat`*A4Cm{x!r{sk*mNVStOl-JJza$h3BSB{^Cb@fHboEX7BQLh}m4rQU!
zAd_t(5Tk^`aOo)nxX%z_H9dWgmdJ7vF%X#cprMFz1G$1kN~YQ4D%X$Q!J2n?8*_wb
zsm4E)#s(#mA@Mv0N5#=-5eA|u)R;*g(;(s?(r1Q;M)}Ww7lQ*Y1RBltD|yGVoR`3n
z!W@e9Ia$W30K|uWA$N+ipmY#)VD%2ICR$^WFxB5n7Z5cmrd#byH9Z|(qzz33b<5$$
zIHfPKF8rE}xFvn+K0zy}k~RoA70FCAJ;o{t_m&hwQ}U*)fR<D*AL0LJq{O~ye+J&W
zV9XQr6VT#bDnGhFP$%3p(k4QLI%o%(X+^nD2m&vzS7A)(kC8Q*7f7QW*>NmAD6xSO
zs`q=gDw#Tw-15u6$rNy)z^lar_U5|r8uO*K*5#9#&T{XtFzLoUwma8BQjHcPArl!H
z8Ji%oWjw$mNVLRd`t4hIL4Er#-vV<!2k?0LC|(p(@%theahamlpAJjYZhWhig+jLG
zuCvjoX!t@DS)T0c$;YTCYpEuE@5P3wqbD^P!B}_}0k0SHK>I%DohoUX6qlWI*|H-j
z;#ce#L&b??1ZjbxX;8aNBqIzjvM3Bo);gjW79zXwpj)A=EM8R75m0(TaBk1Y*lb3!
zao--p0)_xe?a(+!JNlvSQp#oJAk38+cJ7;^KG-Srg1k+`lV{RUdQxfszkU)AR((c;
z{Lnup=t3<F@IrA(#HAhRZZhF{%1HPo!Z_oB1Krr{kg&sg#|Otfv8dcuT?Z4gf{26S
z!MLk?0#DWxVGFKLTUmPNbV7>`2i?V>k3dIDh&W@3zC{ZO(jE`-DO>H1+jNm)d~Gn;
ze_`EVCQ6JMSZRNRF;agPE4twg#NxeKCNeA@_;b)|R$^_1sSW>nS?TkhwPK-IB;2r^
z*dku!5vpf7lQ++SQqadXtdte!xf4a`;ECuj56&W%E2^_R^XPQy#9i#b^;C2KbYMtl
zf++AVR2xlw*s%oF{I%l-VfX!$4f=O$bb`s#16W*<H^T_jkEp@|T(X>%m8`Jn6S<0D
zdCP`zaT^t2(>Ko_8(IW0LEGjyoSl1FRsu4k%kJE{JZ7Yei0bME`+L!>G4fv*gD=on
zV3e^nkf}i-Ct~^U?9xi)9j{lrQOE}HSh<s5+$UDm*;t_QttRK63*7C&3*bGrApuaQ
z$b)}HOk+h;3KW?YoWXCpE)Q<QRyKNxJ!aB}#Hm9nkPQlX2p8?`S+^j3$NVT*Ekj@-
z!%=+<^iI}GVU{c@N)V+XDkR*z&;{bZLOEka?d}FBidI_=>W1Wla9<K$hgxlWd-qnC
zBk=hhru76DD76A7BhucGYJ{X9Nn}D9%>*KY6B!5USvex|#RECi&l)}*IKFW@UB@l$
z^mC=1ivC{UIf)tT->ln3XUdYY?L(g;7HOO;qcR__wvQbRVCrH;aCC6hiq3m~^Dtr&
zB!+%J*^u#=Z<}%!!<3vp{h`6Tyn%PVKvVEG5DKy^PFiLmsv~-mzYa7Yx~1A1yScT&
zn%_n|HdC86edAoSnmU{Ygu$N8H?Y!_YN)zO>I~-QrmI`RZMVUajLS`hmHQdhRivsc
zE%4x4b0j=1;${Q2<&PA(i0EYW6peM6qqHbzwb|JNcH4#Ic53ek80yN_4gj!L<%Xl<
zVp=NoLNO>pXy*D7EzTbPRK5Xj<)k(7zbUp9NT!2HFGwbX3%9-!IiCnLj^-d$o3jd*
z^AqFNO_~srk*Hw$Vd-;=dHj9^E?&SvVTvCqJSoSoF9h(NAwO6l6R4D%L-?}%d_8Q5
z@%q!1ho9jpIN%?tzD_ht*ocg`7%@jWTRcdQOP;G&2FI*zrpS(0cl;;d6Use1wcAVQ
zSCDCOKF}jW!oQw@y>98|3=vorSG5r_?=@vIC1JTkvy_e4E)8-jPHDtKVyyZ4S}D2k
zv?sE^BS9VPjhsvK+v{B9c|&N4K0$Dz8zvXMM}ja(v{q2PiLOWsqr^5uBgwS9*mPM^
z4+y1JI^r+j$0;bh<fAiMB%x<r&$nM8!BH?I`X*Gid*VdFDE;y<^?!sb*@$GXw!dkY
zxt32gh8%YX)7w+wwhNu)F6BD+gDD;X;W?gx14+g=P)UwVo?(q#7q`qt^997;P;rA0
zeOGb<BPfDBArS81c;>*@V2=t8m@lssIfQ;I7JfYPG%<z?%S#lr?Ei=bs`dX73lvRP
z<DS-wyU&X~sCLquzAU9Gh{11IcCiaLrJIx^;Mrg58&qbSx{{4s-g?n|8wAGak17;q
zA{H#Otk((LJC<@|J4`#14O`BVl49%Qnq4kExee3O%KOm@%8YG&uLb1vY`Zlozb6?X
zlxh@7r5Vx)IXkC;8bHD2bOEzDwSCArDXZGvAgb8c%g_F`wwh-%g}#hf_y70-WRn6Q
zEiFb1P~CcFqs*YKkolx&61ZK<AT<R_)w#gr(yTK51?-P&TFmI^v$K1_)I~@&JZ=a@
zNZlPqKW;hbP^l`e_b!gFc>fUSN6~h^fGB+n3x@VuAz>Zmu+?N~v)sDe&TC-h@ISct
z_kCP(t;>uA%mHW1uk!%<YT8Y%`kMpe{S^F7^6gn!1$DU0|Ju}iuf{fGuJ7VV_swso
z=D&6I!h`cUJM1&7$h{&l1YAgP<^yY|$OqXxwePz+&l_qH7~S~_np+>6=s&zMQh$+e
ze{iFWX-ue*XA>eas)MDxW?Y5YGJ-u%k<akx(&y*pHgmgv9^IAQM*)~4j%8V^d9^=x
z!`X$)I2^jWc7421UAMjM>g?|mJ<p8yZR;b3@!Q$?X5~(c%zUjgKd|?jU!scaJv?6j
zvRnmlKj*#e5CtxBKBo+34BsRio|XWnzb;reQN_Q-cx*OP^!RTXZ||OZu|OB^w?jpc
zJV4V}>Mh8yGFZ2Mo=_OECK@X4!J~CFX;o>A5v&<(!nOHbO(p6}Fs#d6rH8`B=Zjhv
zlv*`Kqsj4)5KFe?YBdmlE|oj6Gs)6C8s}(>OjKpxn-X5ipNYn07GtGKDZL6zg|?=O
z-MyovJCLigWdEi7BloH_@N|ovReNhhmvnv!|B5qOUYj{HSwK5-Bl7@^@&s{)WVg>7
z_8bNotLinSlqm%n$TglO#X;mBbSV#$Et{vm&QY(4?Eyx|Y8N|F6c=MH=sU4x>UBLt
zKmOtQx&Pt$4=sKR{fFn5(`6~XX5s5P;Cl%zsfZtI$i+u-?kg64une3IE`<lGH%)c6
z()N&GZG)BTcAn<Z<+ERszy-#|zu|{+tHv++2jLI+7vUeA@x<A}BkMzaN39Fq5x1?l
zgW>6mYMBnm7n7=vXR4_5CKJm!)$13D>>6(_cE@A|>w#lZh-cyZ0%P9#4)4+NW$0vf
zN48qHi)qfpuqwTu{6&nCd#tSrk;6O!<)bH7Dd;>*DTr%iw_YB>HkrIBH06--dh30r
z_CnR<`DkH$tMz;<7=$G!C1DVpam#QJk#!M1kdPS#?2ma%Ti#fwRYiNgMZfLv1loGL
zpy=hMbaU)xe`RBho%V81XF8bG{0rNg$u|85wkI>_=8F9Xwg=GP(01J(Hb*p@+CQIQ
zj?8?JMaZl?wugeN3687bv3{@O&fistaj7NZ6O7MMaC#J79WbYp>aad~l_32IPF~u9
z5&4C6C|=Ls;dJANCUP*cKO1jfCx@m}fc~ZJ+nb%PFrME{{zKc>ab|e_u6Zj&fWHjS
z$Op=pAQKHK!qC&#W;h<jX8hhu{X^S_&j8s@D}NB!On?8t|Do;cSv+AyxMJ{WC$_b-
zKNrjx5OH$&HQVl^JnzgxCB1$@OqX7BU@w;d*Gq<KJ-6>i;Vf_7Kd10{9dWBn%hKt-
zBmYIH|0eFiU610}v|t_Q-PUP3U+(+fd4MacN-!Q;BI38x0}#Ehk_;QlFc=%RlJ)E3
z-9fB4q~*gflDgKHDpZzT6?0Yti*QVpjW}KEStp3HYRN|6cm2Y;f~1pv0|TDB2N8jX
zz5PSf5gb8Ii}PTO^~_#MJtsl>6{n|H0$rN0(0Y|>osNW`*M#e;B;bxvV@||i=ZBkW
zFHBNLMlvVHDvc|>j<7XGOW)OOaX+5AXZGIBt~#lsej(zd;5Uo|s?w38Aev18SnK7@
zCOk&Se@D87?U(}LXA!}cCS?zFIm`}C<KA|}|AXA03Vb8?=VW<oGu~q;tLg+q1G)Po
zCbS!fCJvPbJIm428myKR=7lUSNHqz_a~Giu{N3*d*_^w&F9y4}<X0A<!&7d3bUX?<
zyW|V|j3ktgtqBy|)=IeWv^iW6%dqq}WeHc8N^_4cCDqnL`0al~=s0;}%0B(6@C8il
z_K;0u%-7b6v|KT4`3=mbmicKWyf3GGLx<rNbB0>y&sroL0rO_)M|?IDlM14+6QRXe
zz)xxBGUjx^8-~9dL0`e?yn%IEyv<H`)pCb(rzPtZ0Y(~nDy@MKyYMFZ#9lG2y+N&x
zPOCRkGe3n7p?u)Dw<<;AM0op7moP;XR`)=fy$hL(tymQGQ`Csq6<DvCudVX1+k##z
z03}_n{9WgKKC>?mc$qsTB^cR00N}kUvR*ha$qCF-=lC32B<{R-xZe$UFz^_I_-*L{
z5|#gj-GluHyLXP1`Uks*iJuku0hs3RKMBgP$jmB|0nIZEJC)E3S9l6ISmY2lOE?=$
z%w+@9_yNfzO4I*7Y~1nxK==F8@2C?gz*$%o0;JCi?$5nB{{`K9AYgL%Ze(Qy$lSgt
z?4MQ@iSv(tL-!E>f$q2ef$j~?Ai{j#&!}P7LnO8Kk22r(`VESqp9poGK%B<rqi%j8
zUPF{?K3OpvRd!Ueid}!&L}=&%p4)$Iub=O-xa4Sy2-_+^#_449*x(<ZY#G&2xkprv
zAc$8M)}SG1QRCy_)?#;|bKF2nBuiU7U+;9Ze;{4go>Uiht!1-D)rL(*H(82<=CD<L
zbN2(m!2-(;r1kxLa!O>^B*zPzT37Gx?P5VhGW#pbSC{n<P_dM8^mrprVQv74zyZI_
zY0))-6ItFovj|0@IDli>J>Oqx_nT~fL#m&3(3<_}3V(Z^oAZ>#D>MiHz_&BFCa*Sk
zw4CU+&%V6c>=)c$4KHKs`Gj)j4%U3uc^%sRKfJvx+8$kE=snk;66E$ngg>Yx^<0<)
zF_^hw)ea=lG4@}^i>%lx;%6e~Y+^1kT(_~gZG}fhWRAN(y!))Gc~$V4;eW}k4$}f@
zDLl_|5u*7_STm>}io-TmJ)FBqTJo<=Cjao=5Moz<B*zMD%6&&A%cm{<BB`HU`r+qA
zK&=d{Dc)xKmvbDhsx^3utDjz6e0aIEEIE1-Ai=8xNq#X|1h91L6VS=RuAP$8DoOr5
zJwCsjX5B3kl~r^Ll$E8q2A9}^?w~BzphhRjYSD*1uK3D#n^fEsbX!5Ikfd58lU5#%
zMjc-c0TxSAS4V)11zV0y13*#wK3g0-nj{(1;`N||lX`*pfX@Cw*Y*4Pw0Ap9!5O-d
zmazbY!@PVIftvesIE9BQkiZp#YEh5`tpy|k=6GTpAyh5f7sO?#uIKQyd3;#6g?vg8
zOKkr_?dlSrH^FFikx>3D|1pmj+W+8lb}DK^3;c__wT_&o+2OkLt8;^TexPWO8gmbY
z?zuAQSIZCKmdA(|r;WdQgGJ~E<1MdiJ^Md~_F<m|p(QqelYLu+W+Tsf;5Ul4mB~{`
zKzh$E^is`3dkmO>(4U|c$U_E0o;PD`e}@QE@u=cY;|VsPO*abI1iC3?3a{58sV90^
z&`fNCCNoc!i>Ebf$^@ne?MN{`4|IOHy{>|L$INgldsKHcgWn<~ORcyFQ?!DG;FF}8
z!*~rcL1@8F7QrdfRs;&jyX#^3JDGM$QTn6w9X4j}e1D2Ypg*BWfNE;4ez`e+g>=P<
zi-}Ak?<VvMl?$y<d9j*eY>%A7ny12^oWGfrz8%6CjlEo7xa5zi+xk3|le`=Wil`JM
z)}m?O@4f3`W8`SrcwIfVKg18F9?YACJp%jL;b$sS(rHuW_y7c3mu;x32XXOCqjA`M
z?PGMDr?*4;L}KJBSopZxZ!Wlw5-d`j@USoFt99*;rz2$dpL#M~pCUc?=v0o+FTru4
z{23j*=x5~ke|f#4Wfc3OLJL)}KQy))&4WzXzoq1$=VX_VP5*FC*3JC=hE{fK>xL6U
z*wt>DL(i81PK^2fV?s3a6mLDQ`#q4o-t&z~oiT0!_?Lsia?MKUv-R%j>R=4s?&cPU
z<AjG9V%L0=g9g0H`SvCHnP=;PV2kQ`bePX`h;_Rnx2x^A)oQozdiBS;>-Ct`X06F~
zm-U+;>rNc^&)my6LFPX2;WDT^^v(PnMGa?jB#Zk}+`ZQ3680ua|2Bwt?^>CR2|vX*
zD42bETeR_3VZqt#B(U4~eR{C4saggovD6h7M`JZAvq(d+8^cDP0S6VH^hrAiuwRSj
zBVMnCE!Qtd{9hk;we?r62~imHQ6X}oQ_RwY$;IFOJNIuSmf+1WY(^p`Jur=(6xhdJ
zQxu9N{W5^t+ei~WZo~bA<$JZL@@Gp7Jagydl8815vQ}vlGYT0?iK4?9#+e5W`!VGu
z<d+WB?S`d;cM)4W5ekmlyiFClmAvh6H?grlj0kqEwsBnjVBRyR6jV||1uWbiA=vrh
zSYBz>4Q9s}ZhU^>IP%Sjq*`^f84wU>aiqEL+8?&MkxSHlQ_lu?V!W;fh7dtV81`nX
zXc7(NxJ%@HW{_efPY|=;q~Lx{R3yM^V&dfVX*q0br6E0RPH{I;^f|1s{(fbT#h{mT
zoF@ciGnh1(c?p#SR+6HIR1(Z8&VDS$?z$`@`w^0UXR$!%3@>w$-v-Oj3eHu{JPuLk
z`jeYp5HSx`d<;H$2vSBm@Z}>1WXFJ5?SWB%K9+`|6)_pWh$c(332JwhspVMVC#Vt0
z4+z38*~$H^m#%POL)6F*1&>d%NUT)_IyI!x@}$9lQCU-1(U!1?Rlj#xw`2)*<vlV#
zd|`ddI~Fy(;Spg+|DCVQO3d0w&_gPGtZ`^9a8d7tlZxoKN)#2hU!%ahe*K!X4ugR@
zu7L*F)2|eh)!IMtn;WHn4tN;$n<Ej9;#N}pv`4$wan2_CoyTIIL*OWAA70g_4OQIh
z2OyADmZV}%?nn9P6(kMWOEBbM;8Uo?ASnB(Y`B4B-2*i4!O9gR)}X@;GEOq&f2vC<
zR~z4wD0<uAOg?J9lOY5%x)7ZQ#y%MQ9(b@aqXy4sCGDhjc07y}H#$d(jee2)<(?0^
zSQZvZrch)8HC-*!gJi)bRtv_(WL3?cHSjzZ@^b~9*qB?mdKhj3PuFivB<P@ouyp}>
z+(4{8SV$ptMiXm(?a@tGW&xMlNj(60E(I9^SkJFKR%Hb4k7ZF_a*6${l70)c-N+Nt
z;J_T0s}G6xItbNt;oa^S_q~@52uRLM5a|9P2tdh<$qZX0zMvO`Y9h_ZbL|kc3XuJY
zcBc$OQzR0c=fg)5P$U4-M%nRwz7G;!;u?HG<0I6d0hY4$W=*Ri=&9(nJ2iSfY)@2d
zQfTV+WKC<F;Yk7!e_IV4&bKJ-ucDUB_?71Hfh>`>V~J@!*BFKZsLwpsu(m+3T<WGV
z0prnTy9Jx4k0iARrm%FAh*TY4<*{o_7JFZdiI946Pqbo1bme+PcPj0o4bDO*erA;(
z5-Ib0!)~M#WQLRHph;Glm)5uLw+~|kWH9UnwK3a7ij^;CvNKz1;@NYSA^z3iGChwk
zEv-qp^Mt~#e*`GocF7Dfy@Ay8C~~0smh&3pi;ajqpf2UGm)8c0FEK)1vgahP?<&&h
z?|?oig+){QN?bO0$G@K~jG-=OCa98;lT-!CC2TH}`5piQ)j~AxdfpsmN`F?5qV5D6
zMgDMyL37c}9a7FFSwLQ912iWa!S^Mi2g{LGrOQ8B?{VpE3cA!2DUuR>FO5c~S!T9z
z>JMP<YJQP)1n?95fXr&^&P^ltAtq%GviI`G8(s*40u2s2P?c09mJt^R$Y42GdV=a$
z3Y?Xio%PUF@oZSu#`|aB7DcdPe+tMQQ4dBOlF#D2`~`%2__fQ|p?=uvrVg={nx=w4
zIzpjwbfKV_cWGuop#0Iycw<{v1&nuLKLTFhG-@T$#pTg=A4$^oh?`hrZj@26;-6n*
znN`LQvLvP1n3D|I#Fr4%qD$9;HANQkQ?{@pL<QTa9jIL0MzOdG>-;u?aI1)5#z%oI
zN)27k7zkflL>1(x9mB^=*BlE<O`$L(%^H0j`9~o#aWQk#92y4_jG4U2R+HP-M_T;D
z($LMdrKW!ZsUym5b*M_qRh-j$boxQ#X0X)78!tvE8;#wyiZ!rAF8R&lVnN2!K)Ms5
z0W{*<gK!I_#0s+#dVP0G#Vp2~_UMzm$kut;v8F<Lr-+j|68(t0tjGNkWk*MlG%1(Q
zj9XyNEvE|rHC=|q%rs(6#;gL^O&AgLlesATTFOGx56m=39cZUlar<`)Iiw?M(a3%W
zEXTWe>V)bRyjiUy0Qw%~asu>lxDWqC+~V;K_|}SEthXWZZ=H9;%Pz00&syibBoAK9
zNPQ?GwU>oMG}4fp1osY6gY^ijauv?hPTU6p%kY;8GQ}y`kcKM$z^~oj{$8mo<{2EH
zb=+ZblUc!+octGZD34mWD|PIdj)2z^6tv2v3h<e;d@J~EUMVG^l_+1MjXc3(gGJaP
zZF?emPm8z;7)d$lA44J<0)@xvRG2o#19d|cnxjjIOM3}OTsiOlx<i_;lt#b3@CkSr
z<b^tjATznb>cw2$N%~_2e}u`Gm2~90p}HY?5n09;Ceo;QYRk%wIy&2?m8Lv9QZ&6W
zpXVsjA?WwU=E*`!Di_}j{H)QQ>}kNFsat`(TX|85aTY8WX6iA@Bwd5^?-BEwMH$Vb
z6ATk-#pH%tS*%Zvj$9`4hzWJfCU5d{eEq)Wj?q-AHCZu0-~8j>gtcD`1%r`VY@idU
z(15|20SeO^7e+a@T@oYCwa5$%M_;BBg7Kv(4?C;H8Amy3EhiNrVRHa<DmMdaoV%Qi
z+pu`QAH}|PgH9-$W;>=OiuLN$Q;2fE*E5bptn4LK|AWS9$w8D{Ys~E%Or|rPxJm0J
ziw_m!{d=3uj7E%*ze5oo_ve4#ERjgf+aL3P+FHE*wRaE*bzl))bD3aF6p_>hNIG#c
zG3}Z7>ubp!*guc7MKh5tnrWvfj1YWi9?|yifE&vVF=v=WIuOJoMcwJLhG$rx5gk*A
zW*QR-GBuKWDvhek={X#>faVl1^;_pNrQw9X+RZCtf}XYZ{-FicR0~(FJf8I8bU;<n
z%uB<E+;n|ufOgP_a`A$)5imM{JoK|o7Y`gP6ahKNqxp~tFo_qW)BP|%onnt*k7+H*
zW~Ji|orNOe9+4JYVM`YaCTR!GSUTUP_x|Yjwb}0G@O3}^%vmfol|t1jhooszpWB$b
zF+eIv3d9-u&<Ea`ihcXrw}E;%R(<HUbd)pHlJ9c|OK-y(y$RXg_5D7iL@vX7X35Vb
zkmX9Z8?dD)#{wgJCC_B~L}VVf@fMTf(*ds2>N?2lqZZiJKiDp54~4|xEnNUAh3X02
zrBxiCaU`)h92;S^dy&AQF=Zl7d*sgL<HV4oP&=b<D}51R9c&v(ULCSw4}(`Y$j%=l
z(IIbfJ4sVmZL0ZUfxfItUG<?#eItp2xc$(yihsKZ_LW1)l_1O5>Er6)T_WTIavY$u
zGcePcH`D-p+7!?@wUb3#gSj~y`*yas8AN{keP94NO~1LXPm!p1+NNGCGb#ZRnpY8o
z<3)kBdL5@{XhNBA;O7<IWtH95XmcKlftisI>7LR{nzv;P;0&#Jzb6glmHo)c$=36*
zhpCqwBx;qI<yEhdb><d#kS)UFu<{Us=A}5auk7-~Gc>p&hI%s|j!_HpySa4}??c17
z#xl#GKRC09sL(0dp2J)d;|fw<y+Bh!o6r{MEn(~7mr*X%D{bRATmLBrFgIY>fQ;J8
zEqpO8@t}oGxiB?rQQtKcr?IB*2Sfh5uGy(w=BG%iMytP6604$z;{a6k^rNfcC@nOa
zQ|9cZ0-=joWxjZx^c*w$F#Pc&7(Nj^pN|8D9Z4ELSuV?2&ZoBod6pN7(64%1W$(lV
zI7R@=4XhT(jE;rRcVMGSQr%4mOS$;B$iB|!mBW5bb0otev!=NwHtbTTH`+bndnPZf
zkm`J-J)nETHOJT30<_Odw|$rIE%9lh1|B;lapiTqr|1*(-iR%K8mpCact3uHygDDg
zTTrRm0&%$LXOLzm76E8KN2T}0pz0G3+@COyJPNLUkr?gY9qr1QYm<@luNBq<fpuem
z5fAh-csgc+T_VgI#ju0rXhIiBRd#%8W=7pK5O?Xaz2;MH`TdO3A+>Fxn?u;`+QYaR
zd1`5+^Nc}}KTky%%M3VBCybuXySEn4ug?+3S4tN)SXuUt7+sELT{8~ibn!R%PTOFN
z=R1qU*77?s`N77zMXAa^7|^jiJlD0leO=Y96gg0`I(nfA<XsT*5qEM*r%2s;V-4-D
z6UbnfZ{R6)l#HbCS-pR*IJ47$C0GY-z?fA5FfEfi-BIQrJW=3aJQ9Y*QQ~Iv%*!fc
z>m8DuJjcoavl(|;=<I4!^LNsbP4~gRJbXf4%+qEsCneHcFDtwF0UU)HxDxM1xYDO&
z;z{NToON?&a^13Z%#ps(nIJCNl-(C-?+6^9w}&Jmp^9jol?{U1x2dwR$1iK`48D!_
z_xAf2m9~2*&-?AEKZn#NhRA&x&7n84I7fR+<cyB$yh?YQF9IK`XYx^-uV;|fSehwC
zjLF&!Zr3tBcj4gir?X^Nc&qGh84LAh-}H+V`*w)rj3?X3i1(9n`xIews-$wHG0`>R
zp@on6y?ZIZtz$!djR~MM`&8BJ%DSA<enF(Ek<z#p%a7T)7o&yCX@KJ$K<az^z=67M
z9P2Urg;fHFt(g@hQD9|3HFn<!rRYy2pIPHmn85W8+f`sZhm*u}H2yA9_o&$MxSsNI
zazn7g92PhUi5BIOTiVb&q^uF0WH$>*Oq>r%JQI`-O15Jj2$m9-hg@VqFCv5<rS9QR
z_=m2*-2WwH#ppFkluv#K@3H~x12)&xPG6F%G%9_+Pi){OhvPf3PURw&XnYg)myoXV
zcb$ql0)GE~T{<t#sIM?+k;=QDLjr72c_8GK%Ka|_HQHs%Uy#|)dL`z&>+&_TV$5uX
zm`0yGxwN#5W~G+w=P<LbDzH<9{oAS0m4)8mNe`RY3-*Jtp!0-lFbQrdlOLn76cGgY
zH^RuefzU)m{fTBzmLgo29`JAFBURYcMw_L1Z3TXc9))as+hu-L2*t;xy5P%QcbJa+
z`%RXt`72}Sj0s8PPpCm_g5sTSqxrhAR<jf=zu8sZvCQFb{Ku7Xd885KnIvw)WfEV`
zBbR?#-bae7D;ZCq-(Pp>J`%){rvs9iloKS~?jy(SMR~6@%g(st9znWZl^Lm24LF3$
z1tPXHPOUi++Kf5w65E*@Kd`Ay5sgEA)+uGE7%*?(SVV*>F$W9Io3B00*0-$Nn{37}
zw!0j)Os|`&+Ml~R`GVo#|A1iCxW`c*G1>L3n{Ey0I${}NQ_~|6O@2m6YKyOh^#94M
z$J{*ZT?a+1)gY-4xc)NcER{^1X0?QayE@a@<DpB6?sw$}c)d!6xu3#ZAc&!~s>;)n
zRlspB)rMbKKroA7#ni{$%63LI%#a$IgA7qjz+lc#cr<FmUEzIe)dUA}UE%~2K=9ym
zBk*xquUq_O%D`-9q&A}=5_oGS{h%G>-aEYPlf@!m(Y71A(rn%@QC*J&9T^K_NT$U#
zfnID5`jk!mW}Imc;yEpVhUJEcL1B__?Ib`ND;Wl~`XvUN2n$f|txk(b<x-t-G|8mc
z^Xj)RvJ@980f-s0Hg-65qO71A`l7ZFjk;_#y_%A9sx->cpLK+3A<^iUX=={)F4!qf
zW;X0}+xs5X70N7AHWxYkmYJUuyl*?puUL6DMyrQ>z76lM!a8&WRqSxddRL`n_TCW6
zaXbB8TV^abc_lFP7Q82ih#2*|SMUX2t)EzFp$_2l#<5unoH)0#AQ)~xeo`ynT0<Kr
zEtKJ9mfdPBS>!T!RYoOe5<9!_E^c6J&C!knu1-@kBu+*aV7%*1)BtNep`;!)=F0OB
z*qB#(D$8(dO?8b9C(u^{Q5hx4ruZ%0?<7AuF~3SZRyyr$Zcll`>ttf(L<`Z8rVJl`
zEUe<-)5>iZjs^-U2?v~@B*47kk&B6cX_Q_cNl2++fBtMxS8*&+n{Hbxsm9E$K1uvp
zqd&B=k8;Z(TkwOInAVV#+R1ccqN*B{Iy5pBZ**^Oh#ATDb(Hma7$@qKdwgDPv$Zv-
zwLWo&<vQ>&SoVbWAoigh9Ad*i61_4u&3S7qPYVoam+G(1rJD1FPmdN(*L!W#>8eDk
z7K!D|R%)aV86xrjVe1{3BMrN@-PoDfb|$uM+sVY{#I|kQwr$%Jn-kmldY<Rq`~7y+
zu3A5!`>yJ~uWPM!9B1i!Ga}Y=5?#TFQ%H`l%++**=d6zI*Ifv4S*tSsav#aWn!T+j
zHICq)j}{y2L=V9NBIYR1B=4xcAq*)9+Q#4Q+L6YLR-}Yb%{sGkDX{Ji%aGlw=z=pS
zFCi!3x{dK8-cUi8aVnLRBPg;Rp%{gk)b;m(m!1}8wjUnyQWYZAxfIQ7kFnFDVGqT4
z;N>Fj93@)uTho|v+H`1d?GG*+Ego`k<WyrLFKv9jF!vs+o&b!Bm~q25=}9l!F{6qu
z)?K(_&}*r}GDe|D)&iKUHK;d`)S-4Q&?qPtD?)}V{C9>OftaF{#-R5b79tnifLLA0
zFAZ#eU?FwGY?=Y{ah9+cWNw`IytO`UG_MJQ1Iz%vTg5YaqXg@DuZ^nfyV7y&3Z%bY
zkbNS<J_E{M^Qmiko~w6<5G!g7q&kIPf35kS44O|-I@qkmgld9<KjtDyor7ADwvJJe
z_j8t_1dMoMA|)A$(x}K8YB?~j68uz)@7##AqY_HY%b3;+wYb)>*bU)=43MTm8kRxx
zxo{cMh5V+shAnpo*%XI@c(^4EJ<uwOOxP|(YSqV&yH7P+UY}JPLa@?5tO9?9Z5LY}
zvges2J8MUVq=8UP9CT>%pyiC=+RH)9v6WasJkqBl{&rx4q?aty4|iDea2oU2Mve&t
z+sA$qXZW)<B|yrN!|dfY-ZqiJmJ3>G>MtiP=2I%|5$GoOUd;W6BRJ|0q5hfv8@ly@
zAzp2);mF&|xA6CDpJBScQhN#E5lEg3_!M$MOUV8jHsfcci%Y{U^OXNKF@(t)Si<y3
zdx~YK1DV9%yp?zs>09gK{iHiZXqnck3G)t@I%=$lSBpS~y$VwVKM#;TCYwZBxP~SA
z${7gbCM7JyN%G4?%Gf%k8WEIZ5e-jpqUI!YHtc9@zqI}X4~V!K{M!dI0oZ222miG{
z^N(L~EKt^BbC62xUFRcCp(Tm|e83=Jr;a9!45u7nL+WfUzuX+|_*<`A{}3(~j*1Wc
zwb%s>`YpJxeV*&XF{wm%L{ae@%Hu&nS?iCyL$&|xCH%#DcP<-pHp3;0>RufmaK4?2
zO~cXD67v}LdhROPVNLoa#ZMUttBhNgp5e6FI#=p@ddqd?qG|EzN@_UU^|vL0F!yn(
z%vcbSqxEERn@vBhgINF;zjt&=G}Zj+pKeTL!v~YJ9cqxp8pSH}{%tVdEQ|WOs;&3F
z-r$TTeK5*}=ndYgndebVD7NwtX=w;?HM?YL)eI7M#br{soSN_&3q$7_^q4TbhHS^$
zEoCresDO={Cdvg|9Un3E)0W*-y3d~qhsuxaC*S1_h{<g>R+I(k>d%`^*bUX%+T-8i
zjjZ}^Wo-#sWWg;=Az>Q*K>g;xj^8715bRcrBK@OW-~#)@rSs(X8F}_X_V}#Uyg=xd
zn)GkMQ}~j?jY-x%2+NRR5*7HEt}0j-GB}cEG&1ki3O?HM8BYak@I^YbxP<-}7jwzt
zJ~Zqb2zR;+c}ESyn<DZ08+pU{A;Ln3AmCz3TwqiE?@y2BC}DzWF?$f*wh&y18;loy
z&(nU3P?{;}H@QGN!9K~JP&CQTL2e{)=Dvze5gxvO8eCVOeqH}4=+5Gr*LSX}F5@)F
zhSHhH_JV(nrg{?$`+*PN)}ucHJ>%bWB7y$(Cq!NwU+3;N`v<y$#fdxDjaO)_>qzNm
zjOPNlP0ivL50KZ&8)gw}AIbtcCX!Zol-a0wBND!D!W}9Ln;|(#7v;h5oq>iJ)$Yki
zz?!8?mnjnr%cF$ukPHi^e66|~XStSMYybVGN0E{>b!7MxGpMrbj2es((KZWA3LI=A
zZgF+n-HO<V^?`y_{tnpUky<Msf#eL2-;M5%b03PUh@m%;K!l(^Jb5>_j@wnFR?RyC
zSkMjsmK18zNvc1&leeqpO%YmM8dEW5X9$S%3%T68*W5?vd@jBlFkzaQ)Q4$7;A)Xc
zXFh@AY7zN(+$|TI0R(mCRew7=I_)u(ZdH77@74d_1=lk!ZFvK2MzE>rGjlgbhl>y|
zXG;~aktK%r=(u}zDa0nanm4(pJCmI-@YO{ufdJxn!h5#2oms-=0=)NcRV<tAl=*rs
z@QNZf(_G2fQ^+Gkc_e@3ue;sLE@g+2DX258BubIaW!lyJ^?)R2=FGsp9JgI>vS{+H
zVqe|a$-TV9!s}~>bog8w(x95}bhjqFCY_h#F?!E}k$O7Ng-`fK3`r{b4Kbf1RDQv+
zd?oDm6^lA&m%tmYVhQm463>5|Hq=5AKpxmt{q?{xRxtnW563(o+;Bh^v{N*^*Qa9q
z!|Vq@>Y;fi-^s<kPY=%n&CO?1KDBp*%cFS*CGaqGBj^%HQ8tm-lxX@E_f;%o9L);_
zz`jUz(+kX>DS|xS*hzToerX*&1A`N~q^WE37%qPYCGd9h7B##$E$T{0#~n(~RpteJ
z^!q7&{zOy25PZQ8fMHDnokZ7{R`v$X?AM$rPk8rfg1sN*4VNVZc0EnBg}NP9Vw0p?
z>XK#%h2d*72gf<WL$|dR->B`ZiQM<_$)lJ*c(9EaefNBO_jD*n-GOBor9Q*ENT^A6
zMtYLBD_{?)?}h!kly#3^-!=tzO=*24eJd2Pg&{CgM$F#`)6<PazayM2mv^RW#?;J8
zz=vsgErF^pli(nH!b7X5^Yd2$@gWsU)DDyJXVLGvi+gQx#xCDn^YkGF<Xgs+nPHBD
z9_AiJh<7$;)e}zwgORXvbE{yg-7~teTFf?E@*Kqbvc@~k=k)YX+5QDr#b^hA*u%MB
zPEM(`(f4l7_Hk%-CQfVZnmO#Js%Z$d5bt9qm=98;JTKm!uN}Z^fk^5|X^wdn-u+6N
zCM4udmW7lWQ9Rf$JhGy|DJ{iWl~;DuiFaH0ylqp;fLO$Ca_O}{`Wo1L7}$Ic6XPt{
zyC!sH39k^8?ZP1Nxt!5U(AT;fg8Lb!g89T2SiEN7Edbd1A^_~OIERo|77rE_BB5_e
z1rzxPD8ZnPgi-&(5re%~-#W?dWJdTS5cx6yVw>R9#|lovtU>6EpXFr6ujBTn{(LO?
z{4qT;1+LX={|a37$A@zaIP>K1Vx+v;k_awQ$|~d*3qt~<=xM%kWkiraAFus!VZ+O^
zUP|qU&E9P?U|+|4#^L!OGn$^?`onGW^FfArD*AfWH75GcPX#p#+ZP?-ksF*<O*Qai
zN(@_OYSpxPr4~Lu#V9RE>_rdTy<;*<a!_?;8y0@Kq|I<51QhM_AqMeqCHf0NLkpTy
z%=*V^4iW3%fLKJ{+At5lyx)*e%s=g}!GK%L3Fiq0;T{F#0f-!PJ24W<W+;0=Y$|v5
z>^@P(t`<Uk=~BXd`MpOX-(%5uLShf|!#$HT_4tV-J{O5RAn7>Uols!kB_k2dj+M-!
zh>5P5=vq~PS!ffvMbJiSiTo(Fa;{7yY{f^actm%lNJ_S+z+(B1)x5DQ!5toYIinxn
zLo73)uTjkgp7L)qHZ>}UOIQp}G#H$q_V=>pZQPBz8Knd2gK(n0-57|DM4}j;XQ#4*
z%F95SDNNoO%^2|94hoLIuS}FoLtclcq&f;9?rsVEVy4Ij37@cw1WSxIjk-!<D~V~r
z{FW8bS}hhHvd<nAxTa2ngsd*q{opaoRnKyomesiwL!{5#JFxX5j@QRFcG8vv#Nq>h
z{Q({p^kVoLNJ`Lw-5nigi6iqj8$bi$Sg9J%?;%Zll5}UjnzJ+dvsJ1;ehA;GC^#c^
zlmxb%?{3+l80Up#bkYzkFS<R}N}15-jeHafRiNd(y`(7ZGFZtwZB12b5so0gYp;)c
zdDR6{w%RY%g6fjB78Mj!R#8G~CqJS$(nP`y1R6xjt+-&s^Ce2OB~b#%S;AI{%a8A7
z-VB#4!AO6YN;AP1GCB-D>gE&pDdE746Uuf^l+1Pu$*MD+SWGa1A1}aexlNR)NaEPm
zZx;)n#1eRFqkj^lnJ=o4x6NT%QQ5=@-Mk@=BEPjL=9NXxt1;{$j%~u8);8rj=~j#;
zK6K45DyiCJ!1o|oQ@bO>?Xf7-6yE;4Y%Qs)WlUOHIttrCP0`l<Kn%X6TQhpD0&h;2
zE%0}qR!$jI&oWNZW^xJPr%GLc5g)jWgqwF3oLIZZO63JF^CSi#3U*QP4bS4E4VG%D
zcOct>+(YkFGJ|UkEl0M^^~B`3(d-dyB6!BMxzogfn!*Fnb{>gRaJ*p4TCth<<+}-u
zP|JOoANhF%1&tUWS>~Rvy+(nc2roDx(S!{`iQh>rW|ky(6%(7**vi6av|kx^n0HAj
zl53#0>T+EDkIP!=rJp<N`-)QJ!=FH2R8{ny$#?PX@C)M}@-<_el^di!sNT;Yqt3RR
z;S28VQ@5gzmu+c$+k?!V9{;6!?Z}t81SdqY24RqmoF*}ssf6<^hkH7kC4dMDksZF+
zyVQmf1e}l}PyVysx<<DKLMTs^hC#nQ$+qY=UQU6=(1pXToYM5Fa89UNd#qV|>yg=v
zuKMet7Txx2Z_$FsIijU}M&6($3ahS(Y1MAgqx0$e>Cx+by47Z~f0QF>l(?53k181<
zb#D7E$EK^1bb^ZyznBcgV4l~IvKU)YvTR*`d3ik}Lx^b}c?pXN9pWg>FcqV@1Gm2H
zabm(owL&f`=)!p_4v6#sJeIi_9bHp40LJFeZh9@FC*zoNGXcmJHfT6F+1}mu8Z`m}
z0?vtg5GQ2fS*#ye?$d|}g$r3_d}q4%tmxB?X9}Ei&~^y*ONr=I>JY6Js@LksWa93i
zwoqdCZH~_*FSI-{w&Hvc1X&0wEr|<6jPI(r75ph8%hXudFH^>VJCjgd!gLEhY(~FQ
z%^?$JDq1W)z<M~45}KE26lPT^O?7}yl;w;a{wYfs*&ZkLcumdkPzjuao=TmaD`+Xo
z`mjHcc37QjGhrq?8BogmRnxfKy47=1kmYi)RFKV)c;@u7#d1Z=2SRs;w1#tGYQ*Qw
zu!x^#hODGbMC&>z@fCKR=f5*2`BkS7lKiyN1XX_@P1bv67Oh>P^BKi|LgJYgYF7To
z`d|pzIG_S&fSm3~_{$ca4P~yohhv>jEWy<jdKfmva*e+0CLGb?NThr7m-7!5qO*r0
z;ZG=F;`&0KCi?`~Qo}MsBq{fD;V|P#5?z%BtT_(<++rm@5$j=^XRDVq+^D9dUlula
zlEy>f*h8r%=oSCbK8Wg_|I<EXZ6>(`v=0aWv=0(gM~-nNKj^_q@$s!HC=>F$7vh^%
zOXc*Ck(65*#*uRj;%;pT=RPM!MN$o6HjJz@(b++ok<sT#`NDS5i36Lmp=^<TM}1=l
z;I0C}T8x5m^Pmj~%4KufGn7hM-|}f`BPai=3+ip9Eo=boe|BQN*}#2b5bdkAf4yOA
zeXsO9uYk6k-_87Z=<GkvVAfwi-KR)?q%YER`!gTJAs1yLZ1K9U%4MAR;~g{6T==t^
zM<r^|8tm>zW0aUrwVHK4!+IX&t@TM3JIZuY$T<%t7;#V@m43q4UxMEWhu-_L%8tp0
z=Q2rbg4&bCl6BRJuTI~WrK&sf4eE+J>=_;Si8Kk)@%R92ZvoR}?1md^a`IG<HfDQG
zsNhFH4E&XbT#E=WF5l_yeN>Si)+mDHVJ~h^_lUDrxie>++Col(gTtR<MaW$3smJis
z_tQ=CVV?(XoT^`WAMj9HpnX+Y(zk!y-FaSwlt!|Uvc3qwSbUR8kVX?9{j2Z!nA{z2
zM7-f1dEi!74*vNbF#Ey0J{VNRnClC{d(s_5$a&<zZJ`N$q()H6Hsl>g+Qog&^68W%
zZ+IA6AiyHYxrC63hH6ac)CIa_WLUVDW?(CO5hpAi+s9wyTe4_N%%SbxngrdZ4TI>4
zXop5=q)x$0C@enr7X?+1`hFW-1hn_li!&~UDaJmarCYMR%*_Em49O)PY*?#QxoPQn
zZp5uQln%2uB0>!}s7fI$`k~_7hLjUTt!~$@s7`B^7Bfv}0DCEM`Hz0GIc8Io!OJIO
z15+Tvdkfl#N^;?*5D%xe_Z@ac{lqd{+K!yT_A^WXA*bqwK1D}@BaAfeYeZ8NB{2P~
zLr&~cqEzIZiBh%`&<@SAe<Z{@#fbVB$pQUx=>4}|a6?>wCg+Twk}<mw&kIiQ($T@(
z1u_<%D<hgIxxy`B4oZ+VWG+@hW2MeM6E@^bh2<B64nyIRUvZ*lEl}E}Gh)-S#3Dd3
z(tLgSNco2Tf<HV*XLoRQP&MEg;E&}xGU+zBAS-FpHF|@VznkQsY;WYFz7{5G&727x
z>1M&Muc-T;UV9_tGeUi*%Py-P_vFqQLhVsWKW{y3&NXK)N@tPw8A!c0$G!3GJNR;1
zyI;R@yY&lMwlALw#M!TsVaRFZMLO$aso7@!O0*nj;9T@4!TPbNO|fVsM&UU6<u8!)
zR<p|jARYheNRZO<;z*@++Y{o;9hYhyW12_lzB1c8*81O!nb1}{$aT}YDK?P{+tSU}
zidsJ9F0@jU^!*jVt)&WzmI+h~%SjWVjA*lh8i<A|5d@M=C>o$Y?VWT8f-EROYd@xp
zi>fh3B*aD*Qe=!Dq75iNA3-iF-kmvousywdZ?7FTN#@s5rVXu|wV5O0iC7rJ>t}4V
zAsD%e6sVgsDotimhgk;IeoiqBipR<?Z9<WMT!b+Czq)1_I12Zhp+)HeMVa0a&z5g?
zA|BIqnQM>GbvfXa&=pvk3uEY#1f|2$W9uIdM2?Zu856Cp^aZnT40VZ$VGXv4j+Uz%
zukVL`364t+#GaX8w}LSj<kS8tf;I;{3HG)XHM*{BHESKW>>G5tF0sfb+OCXJ!r@i7
zxI04PpB15rk1w$fvKP8ljg2L0^n8t-Pi*vtWkfW(MR<r+$Eo$c)fdz}Lk(K$lr3mo
z7cjKc)|71_E@(Egt`eCr-toE>YeB6}*SnpAoxK|k_9xdZLtY~5Zp~2b`i8pX?v?pF
z@g&!+-{22HW_g0FVIrp%PGh!V)U;RB2a5p{Vp#y`9FaQEt?qI;M;baX6x@7S2@NA{
zCMQd|OhBywwZ5xo9F6iRq%>A^{)d=s@{X=a!J8=zC^9Cox?RoWNMpC-WY12)R_{*l
z?d9cri|yLGi_Y+}esxQf^X1>%_%CP@{J>dh1-`$s=dPg&5hvuSuD(YN0nL?#N-1-)
zs+6#Y`KEz40ap6wcO|OVMAg(ONAa1KCZ9bemWhw8CCHled!^hf@#@P^f!kE%aLXlQ
z#)oBZeOcfp(ADdP&H`$`>UE01W$RU#1zbiQyZdY0%wP$5N(Yl{A`AIBL@Hsy1Svfj
z53;B60>r^Epd|g5NbL-AemFX2cQw!pN%X4C9#7$59HDAh^)*n1?@e8SnHMh`MV%Hj
z?m5qki29`IDUEHjk`I?;AA_$i$ik+}nNEIvZn#|T^mwY9{*9r1f3n$t%mRUJMYL~#
z%(66|Tl#V+ij0e1r-dL(R=s9y1L2wwX6%ACAcS}`@-CxJKbIX!fM$jRlWwaAZ433&
z5TG%1lv7_VA^R++lp)bB+c$nmG;nABq%ax7Jgt+99(WL0vsD@`i>riMt1nmwI{F<?
zEL^7q(j-MIa%_#>#=>!9lDJOHWoB1n2WT)b;S)@AzAp8Z=)lf%X12`p@BwbQegWk<
z8vN(PzPIw&G;R^bFs5hIE~u8y@|!4y0|l6G-r<NaeI{xOxVq;go1GlEI`(8{q^CUF
z{r3XcNs+e|WHO^SL}8<ART6OZ=D*s071G!yWB6MAZQ(~fhjfiwV&tA+)v>2G23l9E
zIB@ih;9G0A!|}=oC7IiQ0X_e8_?a`Dol7c4!urE$a^<t!)ump)G*h4Xpcj-Oc?kkI
z9&RAcE`9Uzbda)@4v1#P+S_hpW#?+G+k5qEio5gKbbGrSO+7-u=_vY!0&8a}`nJbY
znri!7ElA~0_uuTx>+@hDRA$ycVRLvrg;5w@2>3$9Gi3fRXF1)Su2*ZhYxOp(o$ou7
z7ru7aE#B9c+qZ0WBIW*Q!7=DaC!Y5*-<gW9ZE$HybqaqzJVG;-6Q-J!*e6Q5dnr~L
zq2;~4OEfC<U`htfOr`_RZbG!+N{k*5i0x<e>;Wx*9^S>auT{O1**WtJmcsV;!-V$(
zdnjd0a*dgPTv#`(Z!RBQhPKD;FyKCZbrC)6h=6{a!%~x_#dYg-yP}7Y!}sRwGxB-G
zv92_lH<z+hdtYjyG8#-lfz$hO%J&-M5-EfNw1vV?ua{+`<8|TVwTffF6oY=v^XqG$
z4Uk|3?F0D>AIY^7DJ2gN6sO+B8?2o|p2pX0mG^mhf4CMK<C=X2chJ=kK!#m;$@RRu
z4$&!2X;I{qIw#~~XEz4_mkopa$A$rSd5nENz#X~9bNf2JF1?Nvrw==pVz1)Ueg78C
z`58v__6QHSq`N^~1tzledi^gMW=A1C`?b#T&YGim$euHDc6Ri4u?^#DGxRDj5>xD8
zQxDxv=y3VJWEc(m4?(t*9-GZ9J-#P)z(l=A0C@BDAX*wBrH6av<(30Xe^{EMpl%Oo
zSAm3t$qfq_RHjcQ;tm0Ol(OnqoZ5ji3vT$1hspeJttN0~psoc8PQ^v4vc_`zcmsro
zyTL8LxpOkn)>{ip+9)5d3@u)2f&q@TRWU}yUDQSk3AjOI0q<@uB8H67IyXG!{o<h{
z1<jQdi29KZ_u@>OA$G#OLyX)mfGOF+%WK^G2FdkNMe0)-47(4Gz*Gt+6~AM)NB84p
z9-tuEVvbFf&tlD@@=GLD;@>^=$y(`)tN*cID=gF3x_4=yA^ZY;#riV8F{p*dvk={O
z5?CN7Rg=`9>VbEr4^M&@NBKO&R(|_Rk=Cr|=Hha@cCr6qGF4l7f5$~S&<o%ce*DXY
zxH-(qYJ=Np)p(0qjKygr=xpWJ>iVrD1cMW2#lDw9X~u8Ft{iio^ubcIIa&mzBsb^m
zV7FsWkTnWV$HN_7dp)U_^?TM=9);0-6xK3}^Jqpl*psY>e>SUW56x6Q{6rc+-(q4V
zdWxYiS<Znfro_x|)1J;1V_YR(0~m)+H!F14_?SSr9p8yRO}~00HHN&Kyp9x&p6md-
zEz8&vh52Cckochfzu=NP5(nKzl4P-$fTp2QzzBTV3HQz7x}E@|$S#)XbbflCPpgCb
z%;;QtU<nENUItgYvrYP|*U6$XZr<;|9~QmonVt<dhot34IcctD1P3FOW8{ti-OJf9
z1}61XyXEFP#BAHn*iS91K(lG32Z%7wl5<G=$g9}V)ZVW#UPj@>>bC!v(sC8}UrOs_
z<Y+Wno7qj*dvu|V58mwuGgw2}Nc7*U*LP_GKG?K6GYSm)FR0H@PVKM#0PcH<%tr78
zcayW>T#VxrXmguiSoLyng>+ImRzFNp&X0uC<QrNlFqkd{1tumD!KPeU67e{hD|+7l
zfm)4+vCvlHX@O}rlxIz}O-bx_vI=<01kV>_ai{}9Kile@zO_H%imjeN2@Ex-EQ&J2
z=#q9K2F16$(J5Clz64**Q!^DYc*vy9_?&8aNwwj{c5~cB*!9|SYJmQJy!TXu7@ub!
z<+AO{L7qnk4ln%Qm{wH~0<0M?ItrGS<_-Xo-Yo-eH-Tm5{?TOOVv5E)-YaY`5ad1|
z`(tJ8zra?+KVXZH@$ln$O&Cw2gv$wy%qbKlph1akMRIz?@+gDS(YTP=YKowmc-b^b
zB?DLiD$agk4vazi{{w5CHP-3Ua3kl`%-*1ggvpr){B@k1ciAoBz}Dh(Mz6)yy-=pu
z+9)l8eVDPnj8nt+y`ZA$_;`RT%|NMO16Cak7f)+0aB-7vy`tOp8IUWh_0>pwU(fzV
zjKwFFi*_u1oR7ErUtEh7X)3B@8JIlF+zZ{;>wNnKtTkPqmbUA~;dTLIE_}l^1>Xl8
z8}luTrK*WFWqBzr=QctpK?~MvG3^!KF$vc3<dk?{HUB+qg61RLw<|oHD+>E<9-dhu
zDqeu?mhbBu@i6i_>gPr*>Yt<JF<BKG_9E97h3JEUkL6<Jolb%sXM$dHFAoH{uf;_G
zxmiqjiw=OSfTbRKnhl9gFv<Oe0~x`->+h}SLefo_&VI<Ni{FOVToNKvOOo2E{`m|2
zGLd{tg}G57Ry^}n%Oa&AM?Mu|wB(-^MA0)W%Q`c&aO@u!4<tR#$8(!HPx%g9WOap`
zty*y~sBLhzB#h=B)<Hj;g^s3(=w4=&p_5Zg80&E-xZ)s_aDKO~F@S5}mEjxE&YdT0
zgjRN(!*$K~!i*1$Kd~=A3rukm+v(_7r-{#R$<y8ZaBAi*05R4#C1uiBnsO8Da~3VM
z{9}sh5L8h6m`~;loBd=HgWw0VY%79$A&S95iEYqzoLj2GQM(x`o)N2YHwR|q2BAh2
z;~tpnKbni^Lwc#Yk-R@PLzEW=<*KZhLp*8W`-j~42Ey%E<dqZzSiclbCJ&{lVjSx&
zRVZV_>2u~Nm*JF9P6*Gs<V<U4>*nkI7dBCZOtg_+XY1^Ok#LD7@JoJLRZ1ape|_Ke
zNq}ocMRJa$7M7^T4B5f^Df3akr=WSKM@o|qEf~(|1j$0z#-Dyp*A&h7|G-*;iPs$j
zdH^nk=(xmuM77a*kWwxGi73F@s@{RgMBj_VC6XLJGUU(4`zuibhhGmwFRL`o`4fn=
zbKQ7T1>KpRs=MTOh?<_ujijx*zvOpI^?>oRwu<3GZ~k)HgPpP{`DZ7Hs_Ce-vgtxE
zZ3Xk|jwy`nTdQ+2`R7K3QsS45boKamjYho%-YvEzE-mvTvQx}Tx{s!o_!oQ7cKo!Q
z`83HVj?!^)7VdOY>^aeXtuIS4481a_Ma!3xb6m<XXmebNf}C#;8<U>cwz}}S)j}j|
z>ib<UaKaATbN{)tZ*j7<_7{#^46YK$#o*_CZ|1(Nh#TOOai(hcF(lB{%2$#D!=R6Y
z!`Qv-?D);j48Yd8>hXMGl0n9EJ0@nd8?9`4+k;pG-y9`Gh>X#OOJ!;hilAYm5ocgf
zBfjx`gVD5`FwQO7rN(yumd`3?N*-CO-Q3`x5Zki$$eav_@@b;3M%=XTOaQhM4KN0s
z@Tr%A>_Qg21Flu^W3SX@Nn#MVGO*eQsx{RH0X}Q{febBCJthRK4GcP}j5$ib3F%B>
zKc?}u2O}y_@!h)1`l&{l%rNaQ!IMTUL4gz&WcED}qz;ChD2@nsElG54-SBeL^L{IP
zGhdcGdVa7*64pY}={h!z=$qnbj?-pFe9qX&;EI5`#sP{*W}b^*JFF?44V9&JoYVG9
zK+oT)e?`cZDRte9thO+xB4xNiOLRo~VOLjFWC2=9cw5(uBz8=+d>^kCwn!wPOO)+V
zt7a(p-9pi*vAP^EifIwr_Sb2!HT8&A3|pH_uih3y?jDt7RyRIZgI%})%`F@Wv35i}
zily5W-%yQ(qyYGrPMyKve&YM&qF(sX2m%y342GcJa+_-=-!Mv$n9wfK8S{ezaX$$h
zCeFl#qZe&RxC|oe&Fp0E&Y~QJN6c=FpX6p>A^_b&wckP+O))*`Te=uTxf<`Ko$?lv
zM+fmjA2b|AmKd=EVx!^V9>t^!79=^PPh$@Gyi)`RVw`Mw_-N<8*S{SYFJ7YnBq}N6
zuy0O|oN>2-Wb-in+7_eOQCp>!SCGw=&v{-rk1y@f{p`C)a<UTLZB(0I^ETWmKd`=1
zC`&>3D_7RMkine4W^7dxv<3XQ;Xtawz0`w~4^x_slCRaliXlWgE%g2)2n5WF#)1Bb
zcle0a9$#hsj8wdui?i~IqbxvAxK*$@pVbN53oz~ME~miyyOt#K>+|!o-aki|*-Jnf
zAD%DY<h?))*1;RgQJr`()i20NHNi5-MFnPFxmev$OgA8<zem@q+~7w0x_Oz-iCP+?
zOmlZQDfxia4+C9D)%`7_l2ByL)KEP*0lXX^ZUH;wW8kE?47ZB57uPwk+oH3gpK_y|
z*J^f667m*CC)<ffi&a?HXhQA$*ciAiLr#uwSFa*6(Pn04rGK35T-Cv@;fLvRR%X>{
z^GJ=XSZ3`={rIVvBd9xdwKi=jIl*q_!GbY3{y_eC*{Du{7gr~-S;?`tHO}ORM|TF*
zrR}9huULHx_(yX26nI){9-VqEBAaUasVuXPtY){WN<4Mie2B;9P!*f%!N{`K7hZt2
zmgp#Db#>}7`s1BjxoY7{<t5^B7V7#n@A)!y5Ob0Wb~l(hWpg{LbC8nh6`JVM^<%R>
z(y#kg-S~QzIfvmtr9`|<PFf9?Uz*sv%QrJhYy%Utj!pP2C+WoxKZ`74Jy2_M)Ke&Q
zY%N1cCiB$~<BJbwBR=a<{cPNfUUB#Hz3P)XY}^dXlUhwC$&(pcu7ZnK^}q>e5QzE2
z0jYBSF(=oHzCW<z<{~Yz6)jquj9QDWSu3Yglm_ZnmA18Pnv)*wwi2mjY^rhk(RH!u
zLYuEFyHk7zpfPO^Hn#u<0RT8(P<ejVcqng0ZY-tR`I^S`qV_TF^*0@w#8{XbD3Vd9
z1r?a*<lN`i#`jE4PSRXEeYaakI0lra?uWqRxmcH{h2xvHRk#qXhGn_P=%|!5)qDnf
zrh4t{53JzRJ1Og4uif_z?{dd(WO;UHHOcwm(p*+somJ0b?PPaM(^g7l&UbY~*_AFx
zk|(;T8@{0oXCY+>HrakbE^=7*b(qL)S#=rb2_pn((L6f$g~a0@?n-+uo3};h)!YZ;
z$%7sHlFk|2`;x(*TQ@~QmmGWb=T?6MR;!(ulR$H{$of9zsFnlnvsI{@Jaj7-0|Imj
z#$Z$*=%>$S>_S$Qh_3hnG%gwb$d+7ADk3sLtrO)hR}OLRYRddEzPBs&L*tXhC+Q{k
zP?x&Kq$`%3nvxcvauH8a4-kTkI<X88@7zC5;Fm@pmVaT@4i(J$Bhb#^P|di!@jri^
z6&ZdNNH|Yq!jh;jhRcCsD$SJ0_rc(uqH0tg@=?fJI_77ugg$}ETyoD?RO+Mkr+EsY
z4VYhl4l@|w9ybLL1^u+)n6!A*N_0FOt?mV^pUTR4Zlxq<0UOck97c($jS9}B2~YFN
zdDyKz$Oh3?a|7<I|C2j>oZ1*~YJA9v18<k&mlgd4%dWyvk$ch~ki+Uca&MF0E<xUh
zB+vWHE10}?9a>(#ttIwC_2a18kkfT6<M;GxW=V9Cs=x<fxK(payLMfuX|zXs0;e%;
zX1{O0!pMSDJ6Ft(%8y%};jV@|9fX1<{~w5YSSa{DSCq&1CDPJU3xd3#(Gbg-kc@xh
zelEa$vJGKaoy71e=3q7{RB$ibQ|R@1lkiG7^2awMGU_Jqv{Qms)KhKh7Q9(HRmAsR
zta;o#RdBhhfxAO{3O!1fTb~&u9y8^m6P;x`lpRce?D0#{mF1Q5hK!$&k3^o043_Rz
zu|21A!>U;T&ZQ~6U3)n|jz8WM)?jt0#9M_>_Hg1c&k&|D4GZ(8k#jK9$oc`qd4dSa
z((*WdsAO<S(l|py8aovVu%N4wZC}(0BnL4Z_N8BNcX$lTO}U|E@UunyA>5G&`U;aJ
zr0Ek7A&~zA<rVk-gKFL=BEBGYY-sj)MF0xH^t3?YK>%r5N*F3bk^pl-%Rm(0d`f5H
zNQ*xLhGN;$C<S~t+e&oP=lW3F8jlT=7^dKh?eT5HY|axH2N)V)c0xGF2o#GZwPq+*
zK5qKAH=C-52Lm>ApqBBNtnY3<&Fd4X?&)`dXr_x@fy&^<*sGmRY94(Fhr3tvzW0<8
z)2TsgIBS}SLDs(vTIr_Ac}vdB*}cj7Vys^LAyKjq9x1lGP|0jl94?(5wn6zHl7oNE
zCHNGncE7}`rRQ;RZf*x8J-;MM<W6yM*)5e_s-CI#YcLi5!Fc$rC!!USCaM-kI(T<L
z@U_f%#?`wos+*8_wAqcW(kjcVa<9B#)9H*UXh74gwHi*qbV~}3D9cpGaNXHfI~BrD
z`9ZcJvT3{ZN8pha-ls)1AuINOI~i0qTwqMxu_=^hWY}__<=??j)goRCe4&Dxyc<5O
zmj8#;`{{?DDUfj`ogAuE%QewCSF{dE%$EoYIWh@8fuZTf-U~6-psj!2TEn&<9K=~?
zE#K+IwL~XyZHgT;ZPig>O-Ca;tP&ClDQj`EO>x3m;?fwc=VMPlrI3PIrP}j>D+<^`
z$yfB7m!!1__wN^fxg1!;*sXeCJC+0N)Id%6uN@<(W2c&y2?DGte-_v&4l#wUE<&d|
z(wbVm{>B43Oe_N&S4dz!Li~ClnUnvCsw<z)e?RQvzHdL_d>T$HM#ybnC1_)&*t++1
zF~gTAe6fDL@1!||4H_Stn_R%mp|^zN4Jg>fdN|DTd_pE$fwEpQrE@|?5d)U28Q&BY
zQC)yPMRvaXYuv4C4|C?*{+CTRa-^L*a`vn1<>5`b%_(paOP@|R9U$5Ge!V)llh{Pz
zr{j4Q?97UY|6tx(+@VQk2E>(i1<e?To^mfI7^@d2PFESm{SYvC%21obL~ZP%#1jG|
zTm*gj@O~Ru$hJeb!~bs_VqvfRxBKfN*7KLq=bQvZxZPp25x^xWC0Ix*F~xs5kee!>
zJz9&CoBhkpB^#MsY*C)y_Tv$~KwO#P_64yhc0nDlXj}7D0hXv@cNrCPLJ-z<6npnF
zuKsa1CjpRTbd2L4P5wShhd4+D$)tiw<3tw9-CPN4Es2K}0z;n4Cqus*F8BVcY}dxd
z&S$Bc`GOq_FYiuZ5~Ek6!`F9jxXSbWqV`^$2uS5gB0oirVH!GZH4&du5zNAyX*Ez{
zk3B7KKz^mL8Yri56A%eVd<Xf0)=Yz(3-+B~7p723U<6ha5)K<%yL*7>YU|a7qUh6?
z%^NtY@qh2XM1g=m_2{$Y^S4#dAt>5HG3M^QJS8%!=nC5(b<Be}8*cXXa=C->Uu*NX
zQOL36E1bfMm?vd(vCd?fnlaV1=PuwDlni!l>@UWaC4zegFoq>mn;wn{p!3@&MSOob
z$Z2++CF{ixN%=Q<?3a@#6d(W6jMtIK&T%;syc`c$PXxu%R}=nfBwm}X825%7>%Ys1
zGsN$8*OR`DX_vTzNTqZ^A(Hb0Ds|qQjsRmnM>#l+pUrHZV?P5++-lHd-p0dWjzDZn
z(k(sv^%mJ#%@BA=*-%y4qxDvmd{(d`fzS-WRR&P_)^<a!KNG)D+#tBw{OJt|`TU-H
zdPCymp^bghtmJ4P&;TZY?sH*kd>LEMD6MpSnzhRLT7TOdee0SOSvN(;tu*NFH()j3
z)nebj=_SQ+N04md-eaL%qE)FBY^Ir9EXB$FD7N>;*1IT&63AujBN`^#$f9Set!V4x
z4{u#CIae_n3%az<8dn;Acx~LIw5Y3A;~p61uUta$r0?^%<ZBz+Q|)lgJjT?>=x%*^
z5XL^u87a5hY};=0z@^_E*wn;M5+UU<_xiZJ8>W}2Dq6gPmcY^9XUgyDoOrxENL$4m
zwN6YlBX#3{uCFz|jbL<8you06R;j7MURU7&^M_mFgX~j%@v7N)vRczaG*m^F<}g-g
z!;!LI$y~{(KP&k8Jl8?I?qczosw_ai^_`Obiqg()TtXP~?2r)`-0peRoY?>o14I53
z3vBhHRB!K!PBz#6dX_%hBMhgQ4`S&VpT{|j`tRLI<D{ORdo{Yx|5GiJ=f+f*xq;8k
z=+0=(X;wu}Uey+Gp|{^-6P@bS_3y1pi=WhIM{1Ht&q^$Im1UN&YL(q;`M0w>rSyKH
zv)y(#kNbe*V96efiJY7c;prv5?dN~?(e7bKd3Ad1Aq+S6H%6!x=i>W>lkCjoHs+7c
zDiIOWAKDhbc|5A8t>z0a^yJYc19*RW8Xp6r(Ga&aY8OTJFHEjBj5a6n*3{_4fyE;W
zQUksjwgf3<+8n^IX85{<MWq)nNi4wxom~WG>JN^nMo3b$8M-Au+tx@B049+@FQAo!
zkZZssEBlO;Ndv_w4^l4hqy$bgm+ECYyf5QL{H^yFM4SKeedXnvGwj^kV8r_)%$5l|
z5LiB*#pofQvl7;^l_-oW9rt%(o^r?uQotd?IBuPIM`-Yc`|Ua|bTJN1SJEuFsK$9O
z(3jB`s(OL>oC3;F1EP>&3x)V&SFT>Lz_PRw6GTj2^BaG35Sk%_Uaqbe7s)*jpk-RD
zjo(+_1cq1`jF6B6tbZ<70ABwfL?OSMOG%YBFUL9i3TQ23Q4Ft<#W$&?j1(9X-}WM*
zjSZaAWdR36herp!3vHN`F5pT^1-~<RLlUSGB7MCKTb*70w=5;>V4Gp$=6ruzqzv2U
zoQ05rh>hz3+OE=#^1DDLhjwSfXfMN*geS`8+YOE~KOa$bf;9GR27v1~ccSoP2!CF;
zFTsadKK@7KOK8U<DFYHNuB!u75H|i?O_6u{{Q%c9+;M;M(~YBvK8q-RcZOD$c{@@`
z0htpVd(_E(+U7p+aJQ?erxHKQc=a??uk$`tU!PQ}<^6!{`F57{=@o+p6p7z<_jGUE
z9!g!9zOf_2D_BWHsT<rq;R>-D6Ege)Mrs8k(g0Id70=xIxW7ST?nSUMn5w9>8R~&8
z%$y49d~WYZE$v(SVBb+XOlbF8QbhoL`SL+C!3JEwk3Go{3`!C4He6o7-{mO>$s=0C
z(Q5N!nAas)OP+z0T(wLOS66?XjQ14BI<gYu#uhPq<Wdkq+~WG|68sM(M9?N|*h`q}
zMyg)D#Tz@c`<kW1cr!GNB<x%k))hTJEg!FZqmI&&CV=4_qXXznF!`85Uat1T$qD~L
z)Gd%EV*jto#Mp+o$Hxj;Qz;fY+-(OvC+FA4DH2+uR-nweAm%cP@o~oc1N17&VJsm!
z@ovB5EBSnz!y;hhU~EKqrUFb+hCvG-^?8makwO?kgyIxb@t7dYL`A*C@rfH0qC)lc
zh#)4>ip3gB+cQeL<tq{UHPUUj^?G&`xfqQzDzSgK_-$Lk)}8_^maXlD5^`ZB4|sol
zSI@~RSnHBy|4N*xORE8Ux_hty_|Y;DCN^vWdhb=cnhU=VoMm06M}}E}3#3z3!1J0j
ziB_QH!5E(8YwS<Fmu*^j!3!pm3gLZF<`Y13HN@<dfyuMC$bCG1xcAob`%!kU+Uidg
z!dep*fQ_pO3R+~}tiDO=>C=k`zG#^M3bTsI*j1QV0r*Y+0W(2=SVgDK>A=x!D{+c=
zoA(~<pj-5i22%qW<iqoMsd0piP!#ZUOf6vqeT|Z8$2g`_a+%eeGAM&>FR~)D>`=##
z<I-WG`L#Iu;1R-gt*0|=s#w&m=KeYPh6n|Zfm+y6(C(Pi6iMAB;_h`b-k*bM$HK)o
zllaN|>2mh!Xo<SF-6Nb?EQx(&$|ySIYONHjC`e3SI4-bTBZcyi)4Q9)8|-i4j?^Sm
z%5oxpFh6yN(Rw5bI*%<@ply~Wh@6b}%>*lH-`*HVq159`(2HE*arO1v*5Hc5TybG@
z^58VwQ4sIeYG6@*6lUT~nSzm$r}WU$b%WO)W99s{eN~G?y6(w$Ht+FUSE|vMH#PJ)
z{^ro!H7(6h3Wl^6)&jL^jke?;4nTpBN~irIzhfmIDOI-WgGK!`IVak%+C9Vv>zK<T
zx6@AnljZ$;pZ6O#*AAbnE~-`Si-%dQA901U40oQ_@-y$XO`yR*^7Ozpc<znbJ*sb5
za`^Cu=t`@Q^*dslJb!;3+L`-69BL3mCYrR?qu9ePJkj@ZWq^F2Ky{@Do7e$u#HuxA
z)t0uBjLiqH>ykjgr87bkLAQOk6_nMdXXE9R$8@2bj6bmt`CY3}vkp(ck!Gki4hiVq
zW0Nw*lSlEadbirCB@%+a-4oVP6`;q&$nl0Kanj{duDkN}nB{d(aWL}f)`Mq9F=TJx
z`8?ZSq<<k4P^(7Bey9~1>1Od3+rqv{a3DnIi*GU5ghrC~=k!g6*6#HtKM7<N*mbN%
z-baLz9@WivjJ6@Iof|B1chwEaobC;i@Lmo&{?-|=)54kwHZ9j4??+1ONKP>)a<+P&
zPlnwV40AlF@-zs6>HR#s5lENMArv!i#kkHY4!9cjOvRx0ruKB9(fx?mOvrBg`^9mY
z)7#*6&V#+CIKpi$-^`<19-9X^7++11P?=z$b^6mu4(ng{-Lgv1HkKcqe4srK!9AFL
z@my6jnNF6<{X7>Nck?g95j2*os<?G;zCk76N(~x>?xv#ieUW)$Q@oCvTpUeT#mZkL
z4u!+?Nu1hHN$YvJ`P$EZZ?fEN3*)el+cXb<*!0!OI3ZK_10<d#@UnniUIzqLBxVoa
z4G<dEOBXr+SDx7NPk#7%qM|R}N!0mgx7-lreh@A08fEXxN)b?eZ>cCpjZyd`a$vJx
zCls5p-$IwLNiz8P!`~?zdYB{u$E*FU$nm`?k+lUno1nuh$<j@<Ri25RG<N>Lb*{8v
zxV~o{o1(H!bnM$AFvxNc+aczj(`*Y?x7QirzIu~hfKrz{?gZfT=tb>Gf(*5U)W(Qu
z<p$)LgNsn?SH6zOs~wX^R&Ouz5(M9Cyu9m#v6ozyRbdh}0VhM$D{jt&&zX?X)Bt`J
zq>8kcd!+k9Py)PPILtcYolf_cn-?}QsAK^JJRv5%Zo7OX{fu0v8NHypA-4eUd4Krp
zy72SU+DPi?-ZK3T-S#I~fK+a;{C<Q^F&>b3rCH-H!2Lj5A*>+UF??s<LK*4pH1AJF
zi31xX6oY=gSTpl2h35CYtHtj_;C@uWzktNQ6YWd*xpi*dXZ{2wDzw(-KH+`KDNQUm
zH(w&0VEHGR)yFz99(fsoXcvy$e66NrGL4Mi6(JVVYU4Wau?5X01t6t8l2edA?>lDh
z7;y9KS;XW?mVIyDzwum&=~hGNQ9}q(L~v=&%MZj>X$RQ}>XpLb>R{;LSb<X=j|UQt
zMbF(-Z1z+S_eo5$l9v`QwP|*CTDlO5DHvMcTj#P2ZD4q|IXLDy-eFLNw8pnrK%(N=
zjC;SORl;*>+glAkwe&{QGbGzk{vJnZVQ_W6UBLQHpW#akvPsML302Fhn!hlxWsU$F
zC$j7rxEYJHS@^^`L&RY7ntS#a6{7>^1kkmfyPV0=;_*KBE$Kzg&{-!lzZyokt8y_C
zU^z_y!50RrE;QYt2X&I4qTlhD5#w@|YlC+0<F&SvjHi4ml#M>^hAS0OnG97VvZM`C
zA#?y`dT|2E9}M;XVdcycK_%nsO`O0_FP%w=U8ESi)KgQ>#F$tAq!K<rdGoIe0|P4f
z{P8?F$`H?G0mcJN-Ec&2x0?;6rav{RHkYW{E((hqI157`15<_e3lB}=%YvxuV0{u{
zzIxgA;M`(Kk2Y};pdB)Bs)*fsJ}ysch;E4PG)vTk^}biwHRr#K(uvDkcm1^yF9_)@
zD{&~}E%#1r>h0^`-&l;)`;;kB-uyo@p%j;)Eq_cztuLEhHwoxAy>0-cTv&3Xs2yMr
zDrHq6{C#dii7*Jqq)|xOQq*h7C_`dZXa~3G@In*aKpg=L!KZ?Feb1y2dZt{;I*JA=
zlDO3h11&JvXQfxBPr9i>Z2dAZU-159z464h-u`}RlRZ{b9sCwilR}d_okR`Vzp1V*
zK50f%#EL<`ozdCYOP1PY-hh|t?_*^-OxDUmlP*6#3ku&RV)8B6YTGpo+-w1eNN^Sn
z(Z-#qNVBomzr?W#UZa&+kZOV5=8`K`vlqxv&!wvA2nQulv|=pVXi!Gf&ey4)e3teR
z(e@u9^5v|_6Uf6O3jnhpZeGQwO>#I2SHk-pxV^eCV#1Y^i3#bW+pk&g#zkUq{W%hi
zd~ZJ7f#^cgl~CbI7aP5dTb2`tA+R+yk_OQ{J=-Bolz_Fz6VDT!mX>F0b&t*A<=Y@Q
z<h_JLe(*DufliLYBUj4Z?)M6#tEN^x!jx<K1$~J1G`t7if(4aTaJ4szu{nxc9i!+j
zhc{u%z>t>sDX6-MT~{EbB@Xm&{HHH4@MY_HC6eDRTCIN+;#MQl&h)1q7$5x9d$_hv
zQ>wLndtE12z8f8Et*S5gt13}bI=Y7vOW-{@bRU;B25(o?QK<^Rr8y>#Z7m_q$`=h$
zkCQ{?Y^OnH)EY(K*>S>D!yLOxTI?p0dXXpDw>yGyLs0bBb6j3^j5k0}f`21aFZ@UV
zZL0XA%Uj654az^r`Nh|?hFe>wrO2#i%0xll1|P`Tes!KP`6{5xYs9(j*IS+y;9Y4&
zX-0-WB_ns9W;#OdO3BiAz{u_sxg1iQ0(XUmS$|v79-hd_I1OfN#YYXCVkPk&RHs>Z
z*S(y@)q7^Oo-cqjVfi%2e33u7j-r0IsfY|KhQoqofuhyfkz)B4rO97uQaaLG<8gSB
zYE?MW+wwaO^hcSR{L)NmM_`?O#59UO-ntk?2owcW)D7Vut0wbCl5`!nW0nQs_c3k|
zbB|C++%JKyN945Bmyt_M!|H?tEQ6Bt;8UEz0aVeb24xHBo!9DCTM>}@)^NDQ6jtmk
z%g!bUl6YGG8z!+~ZcB3QeGLZPD&vXbebjEtU2V~pB=L@)i57&x+XB#9OQ4&<3E8K<
z_X7{4Wu_Wb)G~C`B6O5EG<&Sx|L8awc(|cA1$C<Ui&%KV4~Z?&>!eEUMwDi2`_UOn
z-<HyIL{NQ-*ykkWsC?J#1Z6Vjp2pur3Q8_v8Z&UmwWEyT>3!ZrBPlIH3Z?}@Gl&v>
zs>Ynf=GHEXO;r@wKr=ILrjrtc6h$OuvY96Lrl{;2OPYG#?N!5FIp(v%PGW_Q#?u~b
zHU(B}a~RT(wetl(;Y-tz$!i==D!_3m^aZpK6i#){n}J!NUUK0>e+;sVvRtjn4nQ`T
z=6nb(J>QMBA;@)5;Llq#u^`}%j|jrE^pBnqbBGAOb=~ibq35z-oNYd%DgboK9&hxz
z9~F#_{_E3Bi3^Knx5*d({%g^@h`o4TjHmU2tMo{r>R@y)d)*%+^Lag8uQwa{8;v$<
zT%UVW*S@#6ZQi#xx({p~<s|O&(eC_Iol^E8vpiUcbHtlE!_<Kfkot)TVH6ZkmW=0a
za?%Q%!37c>*r@!dj0FOuAE_8oT8qd+lNRC`;fHX19F2_FOx}&;goNzNEkzBl%VKgF
zu|HTa#wrUhxsWJf`S8rG+LiinmoXutB^){!y;z5ec;v#=X?1B#M;jj?f7d;n{CJ_Z
znddl8WXy-CNk8YDT%H21=Ol9>tJ6{8<}=Dtd!T@-mMBrGZze>;Gk?eAy2uLzjOUIT
z+5=|@skj~=Sw$ReW1w)c1Rn#_yl^?f#3b5figl=g=Hf9cQpfQy`|ZObIn4`#kjPG!
zEZscHv(zmdH{!FDiOTg80#9!QOTmb!tajstG95?6v%P4^4IbTGV4}Iqy96Q)K=f!i
zd<2vO%hd2)<Jh~+H_HiVj5uDI23FH_fUoT_3Hzv|J0TAp8Sq{mEWGiOaTmPHRLTki
z<y1|clf3hT{w%rX2h4c2uKZz0ZdkvlVrBzXk^e0xveB3I^Hi4X=fcBt4fI0aNtuhA
zxo5mJcF6g|!?l1NIsyTbXDq|b#Y`mROcU^haYTgSo6nz3@n>P*1v~S1;5junEdZ8!
zdVGgf%t#sQj7U-PJm#RY-<vLjz7x;9loZ?*sd3DSyr`G(qwm1Wz#7ik^}wL;LX23*
z9uQ+3F4DlAEMX%>j?&$J_wt`G<GFtkcpZQYJg-YCE@P}d?c&;2*IyX%(uM!R@VsB^
z=Vc2sJsBo9wxZdNv&W;0!wr#QVU66&T=XT8>%q>YMiR=uP<0V>UhxP^$8~(^a%XKF
z?%H+qdYQT6a4{u}v#r^dxm(-`GcPZtNtN40kePwWnh+9WjQOwD2wxG3e@DX25w0s!
zCgYW^dqZ6>|Nmq?f2_RVwB2l^?IIL>qP&!>N474I2H+PC9rqhM{8mPGCWAoE``BS;
zd=sZoClC?#Y`NHZ+h$OL6q`jOX9lr1Dc4w|B!I{Hd6`*;ar3H>p78tR|B!VL%$2ok
zxTs^>wr#Ux+qP|WY}>Xvb~?7*vD2~bGrw=GwX4qFRr4Q=8ko=f-1nsbRSKbqS&1D#
z%DKR5YLa&ZzD^Z~{3WXTjVk?QFRmxP{0yuXkoj~LB~f#iKi3Dejj$f>-Wy&Z1b>7O
zG(k@&lKuY9G+ePK16APCT+JE=@pdad^pMAQV*bPqBOaC+GS5`XWEd6DVJndDjPm=|
zlZ2fk;Tr#uDTT<mf&)6qJ=7i>$&Dy8irhyD^CSE0-f~o+-R^Jek5sBx8_l1FJWGL3
z2PU<;z*(BViXO2xq=SQ1ACqY{$^v!_W~sSVK6jfGovX-r#JF$)i>%?4ekl3BoSJe)
zh`NnVo0d)7v9M$kDc5Vbu{Bj@2;uK*a;aX@x_z$G^sWJoDr*82z2S(bn8|%6Rr*tZ
z0}}V581K>!AJlXp^z55iDIU8PA$(boUQAJU@8;lca?07(FN9EEM*ezz_n$$1f(^(*
znE%XJPj~*|X?Jcm#g8@_4qdKKL#n1Lj13)!G)?IwM^Sev6%1cG6Aw<`j8Ywog>h3I
zjiU{U$4z(H1NLCalWD^iS<U9fW|FO*E=sM?GhGfCwpkiok-*b3BpF&eTAo09Ji%<<
zFy>otzCj!|FX8n2IT^pMsDM*!UL7;sp$u@sf|;sZXgo;<YC@lQXxL?t&$agmv$I2F
z*TU1*YNXN2*PsqU#CFMFXepZfW$`^f|8)ClPI>hy5F{zHgKzWUnPZOTu%Kt1JPl|Z
zF=s#(nF#LLo@o^qQ`fJvjJpB!8c0Onr>9?idl_BX9_)XPWSYEu@hx8_M8lHxbUf8N
zUFn=<%jEi`e?3L=^2d*Ew?8F>b@x}F{JW$G$~6h+O4LCyIm3{<O!oKl!s>ZwYq**)
z(i8~8H^i5pjrw&t9^>EO?LxNo-kjQ5i|gSVjQVP9<fStRs{TERDjuY`z%^?FEb4@m
zckI?-X9hjO9(YI@>d+AIHx}IROsKJ%pJpwS8~;k318a}tgYgQZ;9T;FlQ(Cm@#l|L
z<UXXZ5FlG)n#>6Jz9VjA845mz#N%gQpQ;pm#>0PT<sMxOp$qyyP0N*bj;XT;`AYhV
zKJu9p9A@&%EQU&d4O_V&6oJS~B<4v8MsXrVpVBIW5cQG6lB=rwv<o4^k@!e3@43+Z
zP~!F_QHrM%iGAPuR%7Crq}OxtAet_g>Z#K1qE18S3Wz!%e&zQM!R(A%Y9nGm2COQp
zMqyzB`pp;HpGXSiP*nJ2b*A5EV8hNo6HPCmJnV`5Vv9)4!R@~SLv+*aOs@t|E(L;M
z@qr1U#<4;_5i?d;>#4&O!hYoierY;Ph_>V&*`-zBeRwh%(@>`f$Ub1Y>r~3A(hQkm
z%W*%pla#eK&8p}oGT4F_Wx1!X0;8bf%H|32Mf56t_F818TwIh0as-S!g~|RJ*u7(R
zM$f7`<N(_e$1Lxgnw}f|QxwVv<$9(fZDM;#=Zl@Hja}aK=d|#!zW#+tezUTfM`dMB
zl-aH4qihnve}X4kydT42(fS3+T|IW<XOoRt-)TJ?TvNC_47bzw5bILINS7SDmZi-H
z{$*_UAdmT%Uc)S=wN!rJlf)`sg#g)^h<&_5CqvuVHtN?jTfbn@Lr3H!65^yW%gz8!
zHm;P-6+@`eESGMjWLNF-e%f+RONd1JR@P=ao|n7GK!7%G2eXP^BPQ{B9}kU&(IoAk
zq*raTP_a#9#PJOs?Qu3yomKCSEM7~xT2%svyevO2G)SJxusXzCbv@5Hw-J|A=nu>T
zSu>g{@99%V)*8V>!Yf=01yT|<ax5IupGH#~%l&Q@JWD6P1Wa!FMa4Xs$Rd74t>!6$
zLPodY0)`kysBlQAh{{<WV+%ySWY-**wfM=fX&AR%F&SL5@xZg&2`A%3=V8A)Z^WvQ
zIh24;3`Ex0U{YoZcy4bajVY|RmY*09--Ai12<nc6$dslHCwLQGSms4#b5UQJBK~3b
zyp~D^{?NQen?!&}*Nswj>f3Dgw9&h2%jL|S*<(PV1W86W5<fRk?aVN=dUmyT+nB+I
zaXirj%)P61`@tnr%h3TdaMy{?it4s<%9Wiz3wq=a`c~?}4%ulQvG7b1<O=p-z01QC
zAi$NnEMXvRD!&z7i{9*&TUE+?_JMVv1QyiR4MTa<k18(iP9(<$l`D~*v#H(GAN$yd
z+ZtaFCu$u8O1*h;69}^*3IgmbQ$piiU2(2K5fpUR(&7~IM?er2dwKdzi3`+y!Yc^-
z(p_U{rXdx9n;eAypoa0&J4~SG{p2aZm=&yii{dci)v4Fn3D|2hxcLj)`>5W2tG(lX
zY0Hz(W0ybTK38|fm*$Pyj%hGJjb~nc{tzraXZAGHme=>(aayW^_-!@@La={&d{s#q
zdtOdWMP{5)1rG1;pKJ}V_4c|mwkjMI*e6)Sdox14;ZQ;bVx&626mM=icDPdSHzKmJ
z1&Q<-MY_vRvJzS4<t_uQ2a*LF5d^nTqUUn(zKW4~^0{bhmHC?c86xisJ$hJ?lMdI8
zwTMHWNTBnN>654W2o%*?o&@;Jjx>tgL|LU7Q~n=XPk=Ko7<xi}X;LWk;954tx#0e<
zwc3lK8wE3}JcW0kELY|~y58rDn;J*p-&wkm{wNcg)roYN#~-awGXa}|Ib3$CG{e+q
z#}OG@ywDYwYUHaLfL7L_c}4V9m|5yGo#9X1A+!PNVqSIR-|XB@s#-3kFcf%+g@Orp
zgCrN;oG<GEqZhFXX}w>Tw}U=cAI(0;$(}i9@1mcbpv3Aqq#*$ary#pfmk?F>LzLoV
z=H1)=^+HQvT44RZC`XW8Q1_OFV4=fQgy0KBa$xrH6J~>38b1_>K8!nD`+%}NtV2P`
zts4#MFc9z@tEgaqftFHlqkW7_D$>SaRQAFP^hBY0jW&IhC8|+0E&oXC%rP?TxFdrA
zcSl25b^E7p0pd#VC%@RY(5SK8Pl>}o>PF8-R%Ul$q6|=Z{I<chD9NzG04kJ7WVM(^
z-wa<Lq70qH!f2v0NMDU{3Mj++FiZ@){%l)8$l^a~T@`3Cpt=dq%cm4kG!rEk?a9;{
ztV{-B`u+EeDWjtbP}|TQ7|f-^nlO}A;f*7bmG6OYR50KsF$4{LTrIAq-5AW?ZtVfg
ze@LgX`@gkP{juGLL^*8>YDfA45|VfrH(YSvU7uu4{to=xv})!&6dOe5fz8BH=|v{|
z5W_2>gicb1@1)E(xaEx1e?iJXI<9H}_<M?yWq3D>>(IR?=aa*e#9E5~JZp@=84i;Q
zcRy~FOA^Rt+zu(<nsVzHWD6*t8wN|8rgjUxryBB^QH3+BM#!R30OD1iM4v!y`4-Y?
zBEPH1qbCT$=fC<G{ie4w5R{*8@#?_B8K(SjPZ=!rlG3GR;DstmOhopc8|#uM>o9iR
z>_d(nW4-3|K~A2-9T3TewNp)Z)&4MoPjo1VzU;(^8O`FR1;aIGulyZ{oA7ujJ_?}>
zO-V5O{P#`#or7G5naE}vhHhNiNBU_I2nvw!&X)5vX1We&<7EE59(XNRnnk(^40RTd
z?p%>UkDoC{;5Y}&WpDYDqFkD9+feVuN(>ks_8^ppH;3Y;M%Ao-s1)d++`@PQFRoQ7
zozDIqd>E^PaFh-Er*d+0B@8osQJs5&Ve@7uQ`nP&YLutRSC1$kF26{bKmk5~c7+5+
zU@$i<v*1)}6ezl9m8UiL@>U5}&f8MSM1aT#Hy2v$OW*n}OwtA+Z4GzR_}l@^lJRRF
z@Caak;Nc^yAoX|Z-=k$s)P)o>Nac{ceg7n>MFqR-8NV(j{ntuaC~6`;D6q}GrB+3<
zznw4HKViT`d(M`o*es2aUZHt$|3&<;Bce;P(-gBrHMcT;Yh=lV+tV)_$pJ^uL6B81
zVaB&77BlOT*_4EfD@AigN5^<dWGtEE<?Cpl6iB~1vMpnav^pvcoi&%Izpy{qmKoux
z{~1bNvazR-ivBdo^6bG&o>KK!pzDCH8x$f*YaCh(I2082NlSu7rXKO!`N4&)-7kqR
zy7|;%ac9m#Bosgt;rb*&W`VZggL-czFT+yOKaw+nBRrFinC@$+LzHO;(LeL|ER&h2
zT&ML);dhgbq#neFI~U+WDP5YILg={i(zKwd{Q<r<OJn^ZOT@Qd-q@W{4XI5!FX<-=
zv<qXML-&;us8qd#fQf)cSrU*DB$UK2y)CZR?tU-m=g6WfmvY#KmQ*4o8%p<-d@Mmf
zBTk9Q0Z9iopFhPplW6MvZ+0lr=L@C?xlz)Q&T*#R6l`!Zk%dU=-dGlY{AU3iX<lu=
z7dk)5;k<I&WBMR7MZnYgec6(3Ifwx{0^*GXh(qeZD}z?KeB}D02a)yiRH0OWaNLhC
zA%6Ag)jVd^iQNIVB@M$I1qJQZ#jeboI=^g_Y`LV~P=nYHsNVKt#G-lUjxjM$=8l*%
zd@}BflP%C|{C4libyK8}YmSK#kUq)li`4M&aCjBX&oplB_iHTU2z(@S?O%6aAu73d
zFH}L)*D7qxMOus1B7%6$di>d!koJpf>4M1ZDuEI0U1xX|L1^wMB|#l5W8xJ;$8#f6
zjacAFlY3s{``N3WPe?ssa;n0$K`?m{nC(;wOA_zsw|3b9v&waJXD8Tg7#umCh;q(5
z3`|vOppimwg;S#RyxM@i<RX11_bcke9o5%ASUuC2(w-yC4mfJWqoxKtU9vJ~85iKA
znZHu3#4jvndaFdx6kW{P(b1&wW~+t{U`mXX)}G}RkDDRe77;DX9JHa@z>+@UKx5h%
zfv+?KD7?``e5-?Fs#SK##Je*J3glwk=a9tPWe7zbK@)}7Wx?7adXwErX%YiRCxVNG
zhM%3RK*b+_AMwl57)pj63Xi))#w=%79QOYhtPAQ1U^e5D^OG|Gfv}+=@LoXTahe<d
zEA`NZBH+pL@W<f>@lNhnX+eyK>bMID+IG=<Cg}2=XldCNX=>$@kXG3eqUJMC{svHF
z{mHO3CFZg%J{^~82!(LkUdc#AF#}x>@1r4v3n?=8u+6o`dCbNrD#?DpFBQG^L-o_{
z;_uWEUsODLV?0ywVI@*KmBEJSx0X-Ln9R<mQ}A6@4CK+Uxy5CKG|DKi2;$J|(Gcc$
zB3sHh<qpR}j0Vmy0VYVVC926k;@NoX({S@Vr-231W*MhE$v3y{)#3OLqmkOVqxco)
z@JA6j0)s4I6>>}#W-qgyglI_wo5Fs*p|&e^YG(56s%k&o$LTZQJ&Hhp!%AInQF$T|
zW?o{qMF~50pUXH#z2AH`47nsNhFT=vjzas>S1CTtt#iO>xRkqPU&&8n8|vhQv)|nL
zd_;?URiA?iBUQ#1v4l@mi#&vm!FjbcfBvjhibgCdY?I8|_q#s$Mug91FHogO<(wx+
z7Kt>X3ia9{R>x1w0REqISP$&|uX7j%rq_D)f1E?xp~K1sSLi?Eyk2p`LNK?6HuDvP
z5%S0zd2yjz0|6uCAcctVcQ||@;M;g-#fMCr5JkReK&Iu;$>KFrmNKbn8_KlUy4p;5
zWl(1<7Ma84;0PpJ#XZUm+P__rsJfMT+3PPNvhYdDm9@FEr+#GQl$4TA&<VwjHaq`Z
zH=Xr7YzYo+@Xj*mp`(R-R>?rCi_wb13w>devKfuiN|Hqew#fB5*Ti>bCX}N*fL@vo
z(JE5!(f*ZE2_jalIcXNXMstKFsrQY#shkb7zzi84SWfD2x&G`v#38IkFvyrI{q*#V
z@bK{Xq%)D4L;9!)4qJ-yNI0>FDwtcJ5&&!iYy4nU<2ZC}$DxrQU^a^T>er6X#^`Rz
zQjpV)TXL-}r6wk<MjD2&(hg#IOgu|SjA>GYZFMl5V$B@ubo^N8qYWpvn*Vc@$Vzd+
zA9s{O@$W0`Rpw-Lxdl@pM!#VifiG0QVV<tDKVRHrNFlVP)04~dHaO0&^k#7p;I1Z4
zj9Gro44;K|XY13U`(Yh_J<qojKwJ=js9DP3i~P`0?w!fk_ja;7UCI1UB6dt!G^b-K
zz$xG=GC!ShhJ!)CU9g8XEIp=pm{%daw;kS1!9U1AI9vjZQy}QXkf{dQW>g}<=WggO
z++)kdKsd)3h3I%|$4+?OBT&ZUZmSoxC~PRlZC@*CE<|iLvln_kVb<?x`xR_`lvZMi
z{9)i^W=B`Rt;fmA{C2Z)q$PY(P&cuR8eQV*59emQ!#*%zvddwg=d6%x`(fvP`(pAu
zLitwN^KelY+hibef!90FCa=I~kRL`g?Lydz$nZnA*u`I~+Y3&xkU6ci)n^|t4A~f&
zemyercggp^hD9yN72Ixr2nZMA0iEPDCgvbTafQBLp9Ct9m>{N5ov^Z;geLmkV&k>+
ztZ(?-{HIlkCWJw<mPj78j*IWEEX_c+#U0D;IHq9pmu-|?U9CPV{j?2ZHc-K{>NdKA
zsIx?t@WX+rSeq{lR=~|`AYyH>JpcLfV;=T!Z1GC9S@%K*s3cjzuMr447^v9_#ssIy
z#w~yNhEYvX{eu%oo~7bkOFa987nU4GS&Rzy!xOZ-wBTiA#5i~HhcKQ;V(G?hCK%zm
z26&o0mLvQ>an(17Eno%`o9fzGzE<?9e%de@h2}5{E=GKNRQ8e?g_aG~5A5nX%R6tj
zO|Ekk>;W<6%pT>J;-v{Jcv_$ph)R{qvT|hU%XPJdDS-WEx>LK3Ujs=}Es?T|+QX(8
z@U_WQxCNOgX6p;GiR+9itmyKnrCXx`@jiD&%zov!x8%?L5+@$a4;xr-j2()!XEaKH
zZ8m9D>VhbW)l{gM(j*?3sVO7(CZ|*4Wgb92b%p=l^L0o|^u;wk5-bh|crYYD=G{p<
z^alh_L~oF+xdGhof0Df<di{bP<m1I{>`qVwf5kVig_X$3|43<_SvFCBo4%H~j%Sd0
zs9hH(f$<1~%}OQkzyAs%Q+sfbES|6BtgJ2iaG}pqO9y3L5j9$T0T65rjC{OZFShPZ
zUo7T>sajTI7JpO`OV5qju2yCTfdmS%&;jLFM2#Gep;gP(s4ty*p}uG&j@;3$yr-)D
zNEmWYBR+5Y39TS-o&765K_wvtcrp|s3{9DEU*b6kEE0+Roj0>G3z!=lPxF|#q4db+
z(}oo<Zy^qrFElNlZ%!spCeKe2(U|hmV}^mkgQXL|YHRe#q+F$(2~NigBEIJUG<&<}
zK!6KT$umE92=n%cQNI1S*CNf|`(l~3d~>bZQEEtIFEbg#7p8K9G>-@8#(aJiKs>6<
z<;pN_Qtq(#e7N7P&|MMXgJXAJ`!;ZQ{lsht;IMq1C?2`k`I*~mJ+uUV>Xvk2YWzp@
zOdjFjRGGsZ$oGsC8H~W669|wZ7QOf<MNId9rHG;a0BE-VrHHAZhiybR%@#SHP4~v7
zlp{xuhwYRi<S0`1hZdq4nSNxjnjwDI>^5<@uFwqI0JU3G{Y2tC`SQg3@s^iS7)eY#
z^6)WzbgJ+fc3*${T3pJz(@yCh2DR8`epB)mR{~ffMamJSaqs2bZg<+^_xC>?wD@Xz
zyg1w+CM<krGMY5db+jO`0vxS##sb2N(SnRqv#5ZrRd42+E-LO{+!=&OSKQ*o-q<RX
z4$zo%nTmZQgNLbH>8D-xzqQ@`DBU>aSKo}nD$;_THU^Crq-=r?kXAEKyvJ;*E&mM-
zBPnhXdPk5mA1BxobV>?D!xOqKY6Yy)yk!l~eQ^|1dGe&q_@3zdNOE97SNiQEZ#frs
zt_igM#`7Tvoz*Ioi%&(@qs=CW6+VwG-uc4R&iS7D?NRD<X>@QG{Nz=u*7Q0L+-AFK
zqW9Gr<5vSh_9~1e9?nGlCfwh-9=Qp7TqDcqO!y%=Fgz&bu)_n$3$wa@lJSM#2RO@e
zAVfy|IMI~Sf^`w~BJwHuW;YF1kC$YfpoS+DeKAdl=nJ8KJEDA^f_@%AgpC(;QV{9$
z8)0|f>v#h^2pI~EplfzM#kM@M_bn}7m$bUAJ#!UuT^;h17>0S_A348PTjX4iPOaq7
z@l_{*t0Gb3nLx$5Ip!B?yQ16j>km%O2{HcmxE~7+nLw^79c)_J*_Qk+J!J(%-Z=mM
zt&oQ-6I<*DN|$?JWwQgbXX(cYQbkGc2@u%qX?nP!Rm_+>F460#Y^dqI)Bh2UA?t0Z
z;2ZBHKK5HK4UrLQi(eINZ*u)0SLQ)|DKpPLSr{#KTG;Vig2!Z{j1=76iyYu>OdcKQ
z(yP)#Ve>2aT>~l_{62~Rz1CcN3JQRSO4^U1cX<#TxMVC%MH?EyL2`G`S8YZsKQtQl
zC<#}9fvEX|gqu`g2Kt5OhrpPk?Id<;C(V?ws%!C@L|UGI7&3g@5*M$G4&;KE?w=$-
zGR$$Rcj3FroMIJeH79Ez;vn`Y+FX*Ay*yWzu<_w(U-B9~ZWHM1Xse+}WUDXezdHeU
zF=_>AK4;zUyR+@@o_NuM{E)gif>$%s{Rmzk6}Jy73&r_I^Cw8brmAK6gjdPm&{fJ6
zimAb5l0866;8%l(znolV{BHM{7De9YSb1O}n9syt%p;y5mTi4F%9z!K)wPQ~{M&{>
zgp?M3L1rF!b5(6oV$RR$ED@WhQ?71|YV|&6jLEDq(Kex>(4IL^4y{bhN=w#6^~NIk
zeq4C(_6SG3UGK7We<C?_0EowitGtHzhFbbuyR|7uflHj%zVggS#6ZTiHZs0j>22&u
zd9-uhaccMHG+zI04KESq3a1O3=O-<1fibWNLTXc7nQkevJ7eIT+2JE05Cs797_Tjw
z=;adl^e^+M4^&2z+bHEIk{$fUhGYm+1+T@vYP8{LbVbM7?d3H8+r9YqerrVHcq_gR
z@aRHfm5Da^G<%)ci5Y=MCC#kSnNCrqK-fIar2%W8LLT+swIp@#A~EEnAmV4rP>;u|
zkF0Vfi&OAPVdPDbxRuS(B*y5Lyb{MXR}?@9+g1tsYz#`wd<#`&hH>fTNR63FtCeFX
z<utIBnwO*vRbyw-n4p*t*RJw&GXX@4xfb$TE1UtN2)UHFBr6+QYYC%^^{9O%mH<1o
z$1#Z}<)dP^cAIj{X4%Dc8;IJ)k4D#E-@+^#$hFOoWkh$!M#3M~r6cihE67z&V16wj
z*32*6qWpzxFOy4X+S-T<=vBk?iFAL&H7CbjX(DXoa2o!C==9iph_zl8epf}?m8#il
zI9BhEPKdSz3JXM%F(?9!8t-u=szg>I5?9{wG;9CBLo}WQ!4iP)4U{y6W$df*1$+mk
z;E7&M$yv+CbvQo{L(eT9Y>=edCQKrtV-aV0>H~`(`Tuo;KIE7c1kSe7O1gj8I5J_}
zM75|iEpd9HeSV5hdTqX#L#CUV@0u@L5_Wd8I*+@q74M@rSR-ro>CqtCExSgy{yTV(
z3g=Rr?(=71SRV#_Yh+oUCmEFCSsC~4ar<st=p0S_9Kb!|O8Mgd!#yU(Ss%%%mhGt4
z<&l9<Ujews!$=@wXK)O2SS7}||8kG-BLCwavB#Z$*oVGE{>wdrr12E}!#xtP&zwnQ
z{?V5ky8p3D$MQe!(PVZmQuXZ@M!kfE1Ic;4U3|g<!nSBZcQN%xDlCJoTnMtB<l}`h
zq!ZlBBz{Tp<+3zE$_ZDU8R9>`A5S+5B&weDaGEAYeS&Z3f-EFeaZrL}7~q&Bp?ldJ
zT@n)dJLZEs<_0@zI_=fJ*o@=9^&M!<<3C}fu62SA2t-xQWVns{9T;c6OPy{zF3(N?
z!cLua%x?j|G%P6~xaWSURHVVjhtoE+G@r||Gxup=<JL~6d(7umwdW5%Dmhtr%iIwQ
z3(P#dQ9O>zlo`L^(y?V&fT22Xw@H*iIh*ICuvv-$DIN?=Ddq@Ava|8BI14BT87@ED
zk*1k{%k__CxNHV;vHw0QA1*(xWabnrF2yY0O%u2N*wDW=iMVcCJC2?;?{|h21P8@5
z4jkh+Wfa`|yxi`A!_Xv?8q(hSc%<ds-_r5)cE2r_$N*!T!E1s6&e~4U>3j{VdUEh8
zeEeO0cVbcPr6<<=KG>>PncLN71~>xlwfpmR+2vu@dbQf<u+!?-Ctg!5R}cYuD9B<d
z4uwpGL2O#0^Po;sgOC6w0xlqeu)MXgh`W&@qz5|Qvsxx+PE_XyR(qS?7Im~)*jKTI
z#aku6E#pcl@jlalg(7aa!{+k&FZyWyV$}iywxg!oJ1v9+7TXfAVoZHh@bxnGf1{68
ze6oE1MIZB9iA)Y&?<MQ9N|2d`VS57I#PE2su0!9)Oq>yv-n@W#A#%e?>J99`3wHZH
zNc=M)m7(j_1DT@1JC<(jX_js!ExXP473$oohFtEbfl|&SVJPkNUAi8<>+SfZZ{ia8
zXqc>syc2`wvdsGsX^4&Ge=#IPhP&0I(}n}mfF?IIE|PIS8N*3SAdNeC@_?ZZ=~zLT
ztVuV<g6<SzvgsGpr$QhX-tZt&8FYc0xRdN9Oc{i7=%F)IE}`*o()bWZ_7g<yu<@I^
zLlWK+iU~M2m?Q@Hdt#n!Q9&x^ixHZ;4hw=af=tE?-4sv;>xa=hCW%nBmVvz!k2)TZ
zV6Q0vwpUW~;I1OunIbpkuVYi2(0w#D{Nh1KSWiSIp!S<U7ODm(>h|#T=#1Eq4;Fd-
zu*nK^6E5Z!924l5MzM&5^nSnivsEloU@<5)cWZ*S;b$$p#*N)tq|#&80Tx=`i6a=2
z@js28d~4&jg;41}%%lllu(N29Ml#)`HzpB`@_BzJ4zS`a39vr$lhK3K6AE>Yf=#T#
zs}1Fl*gI`5z|^B@j#|Ky35JY*<8;&yaPSQz#FZnZmTCTqM?BIlLl_Wo_&#AGW+kqn
z!*@q-vUbZu3EAUx!Y4-kkr-Xu17IHqz+{C0?Bh0X8an#F>?28<@!o&fM|2WZ<0(|T
z{*cMHf7wSnK9Fj1@t{g0YK7b6msPRbB>?;AxtlFbH2x6>U>|wI@zM_^nK2LrY&Bi8
z&nza9kbxFMaWL>q`?QKhmQSN&7?et_pgQWN`;n|T#S_83S!{LOd6KVF5qayy#K*3}
znhUWressgNM8b?Kt1@aZUp3i!uwl}pg=*ENGDGXfbN&&XZU_!f#UkFqHX^!#uaZWQ
z%WQD0O<BHUV|as!;R!_W;xEbw(mtMzZ^rEsHv6Rtq3>nE-Iv|hN9noO=h@N6f}3TV
z1CeC9UmpPDfwu<OeCt0?!Za%FP6v^tPBPjcB1}V85dvmLCzy0^AK;ejsl8E<)RzTf
z<*Bcd)j~Q{Gn$IIeP^|lX5Nt8Fj~lx)$`nz0WSHrABdh`Iwt;;PG;UZ)%^uEq~X{S
z!*RJK8fo@8b6+yfcFBEioYn|KV4RH=9EmfM-VTJu?k6H%ePm0JVJJiU6F=f#_wj-c
zF`6fq!nM9fE-W91bc;uJ4B$R?;kGb{x~M90F{kA$Pi5MRc}51yWDEW4J|aj}VFg99
zR;}kb1=b+aXup|XrdCuk<^bGB?Iw4*C5Ik|;h=YsCVmwxEdL+@Q$nc|^9R<|{K%??
zA&nI$$Sc9#)(s<bTStJ4payS5=P&u$>{sPW``<LFITamxn8M=cDh2J4LcT_L-Q6xE
zlcqFA)D0{(*zlevl^U1Ga3w}PbE+G(2UzRuckQVoD05jV^=^@_ob@|LB#+{jFh@I0
z6Zwywz@L)K?ElD*WYI7cwtvM!1c`zn?RtA9vne6T{*@o$0S5@-lIUTfj+)a7WPm+k
znOJNTr+kVOIz1%q71j|$P6GnyHqG%qwFKoU9C%_QvZ%D8*Sq2~c;840GyX9>s<p^3
zPDa^55mi=MpAfA`m>dUa>{iWIIZ$ZL)@i?NyUDWByV$qC=;3%5Wbc$z)i)C$`iNq-
z)r82@Vk~66FT@sAY2&=f8Aawa!?tmBB){;L)bLs&G-9`NZRlzsFB{U^yF{dW=ZRAx
zTp157$?MvaWRpfx%_&h)+v7J>Q6Kpz+iYj}5tFRL>ho;9-#MR?T^Gj#fwO|NWVEHt
zmn~Xr$M@;6)dQBSoc2poZiM?DZ|<h6haW<wQDBx}Grk(eNhhTp3nWw9>G&OoBzLXg
z(HQ1jh%(8>vs5Bg>CB{nvD`~&0ux(RAihIQ@Js1^VtZ471B4VjVV(ZMKan;N6ro}4
zof{zNCQ6&<tx0#NQ;Mi}=;wb-Jy6h>@eS^0%ap;F*!eVYZ4uE}HHU>g)MyR<=*La5
zry^$|aogp|0q55-FB+)jKjLMr;YQ1-5NfqA+F(d((GZaZAds&AAdqPIWw6`u;|u+%
zpZ_lcNtGNK&+hvUKp<PaB?Vzgo4v+()oQc19{>ns**^%RB`oELmN(^4^&WSCNycVW
zfyl=dPh9+RQSgnR*sTK5i!nCe5?h5A<h?FA%|NXNY=O1-8ct77N?C{_{`ZKbXmIf0
z9j0{0zNo&tE@>Nn(tB1#Lu62p<T?8%o9;yfbAm=oQY~T4FeNcO$SdgYn3g-0eyAe>
z;Z=)*L{BXQE;p3pIQw_Wu)Po}l)?oyGrile8=;p_F7j8|y(|M7^?hf3l|1a9lcK-7
z^Mf0<N3}3hgPa0jTQS_Eb&6PCDSt*CSwsAtG5s}MmM}V67t@XEOy5T*WKblA+-@7@
z2X3b;2Z0tE#O$tKZ(~c3k6kAgqmMi)rho$i`4VW&LD*7$K3RXy&}jr}$39|5j&{W?
zCvpr`>_C0N9ggN~g4&hYt3Vj<dIkxhnWPMg#^~N5AtV}q6w)3tHYDZ5m`A{^|AzYD
ziG9*So?-Xu>IHBhyN4rG{^mBCD2Z#}*CP*i^VJIsHDVC{r8`E*?b4kePh%k{Q&f)e
z;`S}5KP^$#j4$dwmK~13rEgy}j;T3<If`8OFQ{EZ@lu_|h=lOie2xFsS?~FolY?J`
zCs){p?}9<Dgz`sOr6evUMiN?JRUeEAXP7f#ln<FJ-#mpy9-Ndk+b9oofc_wJI!=O{
zS!!3->m6s9KK6{|ib|x=lw@AUv4S%-=JA1C12}ruMRjk5K0>Z6!XC`oK@A(~ta-qZ
z5!Bi+%CzQtdQ0;k1+ugf7IrK8g%8Gg!*c(6z6AP`hKXu)dy!w*c7zdy3Of1}6lXRC
z&h7S0LdK&9FqWL+f2?sj6^<FPSQ9pf`*AWqB}_JqoN=!I{qyvk@XO`>Yxs3i=J~ru
zEL{b6O_|@2l{}duQ`{-ov6D=Q?FuG()FTWE1W<5D)d<9&*xri7Hii|;UVpQ%ez59S
zuf*r?@AcMPin(q)_bcJ=*15;H&RLPf-ngFh2;C=?HvRM0K`Ehwu*H?0Q~VBkKa|t0
zy)s@AhdklpY4G}(ej}(Y>O^d039;d>SZmNzWe(kOV=;$_2&fuQhJUT53sV(^_>wjE
z>q+1u%^`y-6A69K>s#zTx=ge~n(|p^*>Zl`-Fjzf%Z@fYv)1+j=kMRo6286t-xg6b
zB`UIYJGr_$l}TlNUPc<i%%Ix=ChHZhGGp2nO`i;m1Y5@mP?!J5fwb!S*MS^N!Krwx
zM6C(<;gT-wD1wq^_m2blhQj9pa3Fy>{^LOAbIk!M{_8+CFjoskM1KMtNMb;O{XY)m
zy*j{wOqlo6`kw>2ZF=!Qg^Hs+fw9Jq%g;9WAx*?GF~eAgM7Vcwn;VkO0?RbQWiYn9
z$RRi+|E<P?R_Mx*DZWWtPSfA<5Uk*waGX~m)vK85&Q>53Gc&ip+YB@@A}l5{`x2my
zVShcX;McG(7jL?583xT0^QXnTLm8M|;g@!wtqO7VZ_W}p?@wx9%o)}WqI)LxtO}l&
zbu~F1UsbK^n-q-;a1Pso4^%hVB(Dhb2NS@g=J^oQR~mV%0Lkz4O^+PM8PgHtfhB;*
z`d}(_<TyS9k{|kA0V;eXI-y`<az_u4(oaM7^%ai>yj@;9+v<#;Li4A}#fY4<v>Z><
z->>fDZDWN3R%|#oE%Fz7UfF+`OrQutVt2BE7-rGcV%>}XI=XEqGl;6@8mc=^*;yY%
zwvi}QcYqle;zY0|^l3=tM+CPk(x-~LV)CL)H@ap(FL!56_WiQPZGfx2Ly0<*unw6d
zlVn1G_^nE!wPg~a*V}SA1|t`vQ6_YvQuTrwjIi`#-g&#+h9vtw;9;9d7P2I>hzDcN
zjU4ZPI?O&L0+KiNPw;5n@9T5F;~i~YXVi9cJM?-pld-v^IGSt%W=kZHPVj|W3KwIw
zs-lp1?u0s(Ek-PCd;4kSdIujv2SZ+g(!mh{>Y*X{1oVZn$^~7Yt_n++y!3T|-8VSo
z&l1(1gk2B81pz)rSm<e}ERIDjEVg~s?+z4|M<0}U*tg`-atvhXA^-%5)DD0keE|^U
zZPQ&Y$E@KD()~y7%0Yxb-!Bp0rkul@`Pl;3WRXaJIO{}vcir1~C)?6&eA6PiBO#e;
zLBIELd9E*VPl8pe@{ha9J)e(T1)x8~`;%4e=H!?V$bHSh-F+UXp|91k|JF<REv~c^
z?hH?{^)zt^yhaNf?EaV;M-s@wVNE|GMeH>@E_Y?b{N3&s4k1xa?Mytzv4$7@=kF%0
zg=HMjmHXf?-`MDQ;`g|1Xx<|fcM>m><CRBh>^iX2x%CWJwJ|f0hR7OOBA&%FSP5!N
za4|e7P@_4Nm5HF1$J!bk!?6$gy%p9_<uDn=T0`H7Dj;G^FqOZnhMsloD}p+cTxN4b
z=epuJW-=;Dd5scCh-iWPc+e!dY|cT3a?%vw3lZi$Q;4)Wg2OuBa#0HCh!6txl0U%|
z(XA|ErRi+D(=Ughv2j^gNZy!0ax1tEZ{oxyvGTESlyIIw<W1)zN(r$*&e@_cB3xn=
za)ukmAYpckb6X=~0>KbwSfxl?mF`zpBUa6cdI)hAqWn<EPjiZXO2S1K+Tx|njX%iL
znj^{>AI~XTp&L|eN{be<?iKnh`|6CTO}4D#3>=frMe4qYFKx81#i4VKWT>E4ILb`X
z)~rwSh`2f|2eF>saF$Ino(2Xfjcmi8bs4({L8ID*K`**Gar$~fn~@*h`%?`>GZl)P
zX0WX4$4Zv#)$TX3IX8{mt+v@!AW+w*2vpXBxzaiw1efwxY4jg7@Qrlewy){MR;)IP
zO+_6==1LU0&&@N~HtJ%VnzJp$@0zGdAZUZ<fUu%N#-|3S)KYXiqmq9K96Uy>yB|2M
zpe42rFa565;59yc{ku>Fw_J;2E>P@jrWB8@Q3WAu@jHM~RxKt?4QnBRg7V<m-g_f1
zI13j+^yo?sesv^j4HUb`&qF$y1_?p{R4x&TL;UsR?Xqiot<_`WSM3JNd)xJ9+FF<0
z`PFZSx_Dw0_S%7wY6Da20XRpXR`B}W#*RGZ*jLPoMDy<vvJtY&(U8>{qnNKdxR($w
z<7BD7R650zKWf)M7z|Ae`88)z;(aPT@p|6h3CB2<r+hUGyO@-zn^Ua0wZg}p5|&LB
zSTkHEJn<4@LeKEaO7)Qkv6%9V(q=NUR;rJTlL_t;Bnd^~c(Ab9tPbXZqF(!zCh58B
z=~M6Ewa2&T4ZW$@aoD8(a~io&&)2)pyUw{7+v~SK!)M8`^?;1{HjW4M9*b8sL57r7
z2j0;riwAU<j5(f=H5Sc96=<SPv|nqbY_e6L6uHsrx|Ad_(?d(&RH_5~CJC7Xpde-A
zi{$O)@LJ{ACG;HS90*$C!YT&Vr54<6o8!M4_p-w?IX-hWJOC6VrrUoh$kuArml=l7
z{~rZ;W1<Bm=>HD|Dbh+*%L|ub09PsC;s?Q2wENt<?I3hhNQuB`Z2@tJh`F^#fbbs*
zG6q0FS|^Mut>ND;2K50`@u*kq+hB|dSy6#F@{dC5l}FCpG;r!T$1SZrs^7ob%5Bgy
z{w{yCr+6YorDNMovtytpBW<mpEYgxAwlee)($nL%+E{;Dr=N0xwBF=vVSDfUJfbXg
zA8D6tS-uK&iB`B)daw(NppkD0^5w9p?L*jkvGjVQkW+xKBF1^k(DZw;^)aS>%nyCD
zk84y7M46L{UgLU3NShk<qls2JJ5ygnhW{JO&ww0$!6rQ~vP4nd7-W&QzFEW99yV5f
zhRnh+$_I+vl&|aJ-#wqv)5PbuoSUta!7Ed#7-6qj_ZosE@}<<PV>$;uVL1Skm5KEW
z#4{R;*i=psxc>}c2hls4mB|01)_Vo_uYDvA88=07Vd$a~i=@t2oR4&Rpvi=Yf&_(@
z#oB=d1Ear}$UOMV9Le$<*~F`dy#n2axhDTs?M140OHsc>3~t6OSKuXMitL;DK~x*Y
zc;kwsC%ScRel8v6ov10$L?d00F6RyWEJC;WVC=gR+y-&gijo{p)+17#G^a-CF>H%>
z3lk48E=9c=aqW_t#hu*bN#*p5A_~{LtcOsOs`B|F1!{a)X0V^zT<szZGc{E=nezee
z<BjkXk@~GzkK%GEmo0z{=IYgX%AlEn=P^A!=~|$&hFdgOtPHZ?i2K?l_i9)HD~6NH
zhKlna6PU6{Dy}&y*!z-^s~`&BEuTyhOj9?EnM>ECj0V<WiTzzX1U6=|X?sCGsdg0a
zBq?0?()W<gEahm`W4G?}x@b$j>=^iEVz_T<a7y=mQjK50Y3mggTuobn(zq4?-gXaB
zZ~XwRN6JBOq%kC(bRw4`j#RIFfRut-G`<wGYtJVXCna*I$xgvm$A?Ljd}8=t4YJkJ
zmutQJUk%cPEo7f;stlk(?!Do$mH$wl`Tf^=cZySKh)sZ3s;B~>LE3S=i&km^G)POc
z6|<Y=`!Jl`e>6zlol+b~TcLTfvsMgvc6e3sG4vr%9==n)eO+`sU)dvgX-gLBU1a-A
zAgK#*+EhS^kjM&ZCKCnO!<r*dRsT;5cUrk0PJ`i?vDH(y8IkqYEGL_$haR1D+zB#|
z9o?MF*~AANP&zyrSM0344e{#|gr<HU--X{6yUkO!R(R?OETBlgLDEo^bO6?4EEcUe
zFRzv<A%RyPiG^7tNz(LaXRCLdGr8h)p?Xq^_Ww9Y`L-H9=!S@T73_gm+9D>YY5)f*
z4kpB^md@_mvc%oh3T4`_OpNqR;b*QwU|FggNkNBde2E>xqO!hf$z}tnE6F;?7UhW@
zBFqw^UxJv2G`ErX0~xR&RBt6rD2bnVmc2-j_Y-;@d=;s}AWdwhK!<q2Y|YqXZ+!fF
zvuDR#QErx!(5cY^3yJIyHVI9o?rTshEr3=~DYR*!IIXjB=y>i=*30Jh(66m`E7p1F
z#W~evQQRh^FYOfumDwq^;HfUqRa0TO?De{&g^-6bY}7-8xK|AykFJ2mvVMtmU$mm0
z0o?Lh1<-;Fjh5zDDoLKYt>}dj_A0R}0Xx6-pcugraBm>CWZ%+h2{~1W!VG(OKlss5
zf6GlfOr0-wjdhFEHJE7jI}5gms*r3|#UphhOxs&%8}mJaJ`@Xp2RZk|b2Qk}mNc9M
z!3^*qQ^k1h_0O|%2Qr2QIH)dSU6r*-MYPPjE+e>rDeG)yfXy3T6o+I|ho2pp5D|16
zC`L_Ix4pMo;Jny$KnWsYBr56EEH|to#8<#bu+(>PLmFoKFV|(wnLvez6VQOHn#Z00
z%yE@-#c#9{2*H{)V9JV`mzT_INr$!nIWrL@11#0^SHW?V#g?*Sl!4ab!?pIXT}s@`
z$aZLR3GKtI@6pEluDP{x9{L#t0JeWRB1we1gtKoEAVbIa;o!PlkupA?FE8iKiGmXd
zKoPvR(A<gkO?CsXvwxHekJBY1GeLfxe@Z{2W*D4%-il+b0uMN#gMZhXUH<~Se(^<p
zxLb;dLA6YjTo!ncJ$HCtaqe(69)gkh`EWe%`{^)p5UggSe*q>Y0++A9#>;&=BOhdV
zuNM191%3H2h%P7L6Z9=r@(>J>ckfZ+Bmo}~KkWKx3Nyz%-F?|`_nZW&7ljAa%vlBs
z+4SiM!yZeb!0j(HYRyM&FvNq*GsmN;(jbb@+$I^4)PPSgLh>k+RiiF#nYVgYY|aGM
zrH!b_av|`O1C2;Z*f2D<wW+i1$gJ5PvJH*50QaWJcqSz3_J%O%$&g>7XR+lC^Lz*h
z(yx3J5Kr<?M&l><g4!b(sTJ6#N9Pw_oU39CMvSHdtvAra{(mxx@A4g;@p?dQAZZvQ
zFx(W)fo`N`ojbe4HO7{sarC@xyO4C&4E8-;xIBqb<!=Xl5w*pLFV}9b1FKkPR{}uW
zYPhuZd%5c)q9pT+02A^pX<XC*-e6Pu^}jGAiY1%v_`fjZS=7=Q-v4082IYTX$Z0>n
zY~}3SZMOtpSCj3`P0cJSjB(j;htp|JB@dD`A-!?Us$N8r(b|7lNWM(m^PL>Rlk@3V
zJL$Ni_c!s}hCpV@jX%jJm$P;_lf1t_KTWc|=j8dS|KmdT09;5#nnX!uu3$@rLRI%1
z>u2I_aS50c&hb1^nwE|>%ES|Y*0n#u@kzIwD{jBxgBY8?cSYeI_w@K8g-bXJO|N&$
zw)^vbE~R2g?t1-S0X%}UwN=&_+hQFF^%@^NOfGZVl9!<JvC1X9<2ms_mJg|J7O~q0
zKo0LlNb(7-<Lo_v25AnUyb?4__z#>f<qxTK0r{+2;8RTuNs}3obL{mz(RdR3Go`5d
z!?>Lj!c~M9h{7NB@j<?9$W;P6lDk%kmLOdY<haw!Cxd8pix9dU$VvLsR4jEXF!{s8
zH6<Y89{FL}>;G4W^z~Qp_g7#@jT%)k$xhBzDK_y{=##B0@u=77R9_KW70#C2<@?}V
zhE=+^)R(hH0}E3f%E}Okxm!e0hxkrk>-71<yRkFf-Q?r&_i{^T5A0{Q@Q!qZbx6wj
zL<C>M{?NxR6|t-y9~8P5LbJ}0TAcN(DRz3$tc0R#gP|(>L_2mRm(54x1nJwY>orV5
zAzA#cr{%&izY@j<(S1b33;3!{#UsifNi0q1W`}$T*{B7@67Xvf2fg!yu=~V`SeWCo
z5hBK98L{V_K6CKd;W+U0G~4fwSf2g`ShuQD%%{pda?ftjbE2`@<`EN~d&{UpXp;Az
zaL9t~ktkH>_iNtjB2b#z3(%WuB&%%)u&i?Wm6S;oR(_rsgg#%`ov7YE$qm~COUqqS
z(1SBh-WFcjIBW(;0_4906?g*1{K-E1vq@@beSijop|Vwx#_6~-#ZM(;D9(_fRjjgL
zkBIM$Lgi1ep<WRdmBH{&Q=%~LBqS$5o3QKQDpRW;D|B<Xk%X}96G2wgJ(?9KQYndK
zwD>rg5t{Ot`b;%94)bGi{YV4rf>mUterQGzLmp7b{5aX=5v?a+_b#7;=|`FaZQA67
zUJU>*LM-p@QSz&pgLf64&7=gDx{o`7?;mBgXdx3>Vm(272)ltb%5)x@v<RlOk#sdo
zXtA$^ttOT=ii4w`^NmU`?PgX27qz&DxdFXME5bO$yE}cXN7Qx7MDaoOuO7AGUY%A;
z;umMY7KzN@(bG?AF}3K9TZ^?s^A{Xw8m0{A<uy_6<yAv7@zJ#Z;XTRW6W)=^yENS$
zBsaCZxbB;97_w43McZ7T>y_%>>S_k^Z5Bp80-*uaRzH5!d3ouqk1J4n<+>C*q@TXW
zws!5T^kV6w|8vW9$3{$N``uhc?zy*A+6Bk34opXIab>%$ulU)$XRBKNC&{sCQD8P+
z6aULh-L<n<a<B*}5Fih=+Ntlb)p<;wyG4UPFJRI_xWIxDw*E&sG!=g-1bn@l8bV`b
z`zuhB1v?8m35~EWCWEw;@$2w|eD3CNH63jFE@6iZ030pKw~Q7KKJ3jXzdFTE5)c_W
z?W93J6^~s6&=$EK#|{HKtm#&ow$B5Zp&h589oO7$pwD-Ui-3?(s8$`M(P?^qx_*we
z@6Y7-N`p_pE+ebQQ=+o<-f(&>>nKM4&lSL`R4)HPUVF5TCePzs+cW-H?(4$ud+3)F
z<m-{)*V#=UjS>{%)+})ZoMQpE{G3retxkX2*A>*486IMEo7Z|h@GC74v@grlvM2;N
zUv-D+{XJqUToRn-rQhGHe$Mm!&*<<Y;=f0^0vts!hW7%~gQq8_Eh`mfB8{4U<*hWl
zTE@2Ued3XZ3Q1vKmN<(i$tTYWIY9AVRP+^Av;%sdw8Z-}LbtHUOsiGM*q-6uDo<OR
z4WDPLH=<*UoK6E(XTr2r#?u{T@~H=0GNnWh9-SK$t0mY6#M3G+H&R@snJ^gbstqxo
zO-9F5KV4-=__W1dB)>uzz}}0)DXF1D@t_^}#R$A{mc(b!qo=4Xvo}my2JgL<L+?UD
zG2xA0V>%b&;agEh(;t(~)yGY7V|=49@K8VhiKO~Zazd19JH8=-9P4w9rY-W{o4($M
zYugdXjOvaQ3eRWn$&)agJt*%_5ZxFWzCjf*z+QP9iu#>TgqtSTywu#tt)M`^AEdoj
z7yO+h+F#H=Zz3;li>+6o1BS!KIQO}tZ4G(QT{zwm@<BXGycA&$JKS?*{J>^qehYdr
z0Omtta|0?xv;<V?KFMpfriCMnV4Sn)aJzKg9)(Yq&oKOF+aRvH!xnkgVA3aH>Z-n$
zAByki9jwP+;G>)X-gXb~y;DXOkAgek1w#?yI%TCgyryo2Rf5)tMsYufzs>3EG^@<Q
z-0W}+xdQs3lfhuIU+|=0EHo`+6}6#EV+h^!zf!;Kw5Uvlh$QFo498$54Y*9{5eR7!
z_dg#oQC`s<LfxP&St0_VsxYXStd$$+3zd+j(n+~kypepJo9Un|kK8NOwAbJlA!<TD
zt${R8!$K8wJm6q+=5$j1eL1VzvWQ!$X#saJ4Y9%tJ&l^RS@4^xjYiu6;wL&3xZHzn
ztDV6*HenElW<^|Y%$Y%aQb<?XH>tGqg(&l(oX6w1A*+)NY|0{92U~f`cl|>{CLS4I
zp{*W9<BuPwyw^WH=u|{^5ew9qg&j4PveRi8pd|8;zY$ecvr&IH{9J13a3ET-I4c)b
zwv&@F(>_P4E@6=&5&K!Mwir8JF=mLb`B-IFsny*Aaq1Yc5ahQ5eXG|=jGUXCzCVwO
zy7#Eha&yVvT&gMhUCtNi&Mp6w|4I)0$2x46r>;*BMN9Z8pjshY^XU+nk8vaf%-y-P
zDd>*^TgqCa|97h%V9v6gy4Gnj+2gnI6qrkBL;T||TLNO*E$SA<WNzM`nNeBG4T_oL
zYFo}P0CHTeQj~7ZW>NLWlkwMN(wRp8vivEh%eL~iN5M8<OB?NyAw{1<&(C`8JT6Bh
zlC1(`_le3Cwn>a-gjzKj{`Cv%KBo5~SZPaaa$mn`uY4V)b(!}paBs07c(C7%orm@R
z7A;%P#zfvU;4Wj#@VFO?_Kn3hzO_Sh*q_NI>aC_<M3&OIO34f2Ls0V0xhKxZNNFY3
zsz#|yrN<0IlQj!1mE3MjvS*19tylI2f$gwCav@q>aGsaE6iWS-2ux25Z$g_D&2YHO
z7z#SNcBMm$A>5+oL5TRXtqok-oF2@X>1(9xPtDCj{^+6X=%3}T_OT*!Mo~ltYT2kj
zL!PVa$f!6fcMY*1Iqt!OMdtmBtTN&4hmOXJ@Ty=ADOOSk4$J$KMxTRuz)nrjZ@;DU
z*g_R<vK{`S34w$+`salB!ep13=1H8*arPiu^I*s?Zqw!qIao$k>U1N@Tp+Y$n>p6l
zOKeTuRR&b_n^Z%in!bQMy2eI>4u@Al$y=kw>*qcGZTJ!NSOArQn@{KeIlmwNy!mPc
zu$yi@@p58*bWyY2??F9*SSn|C(Zu}C@wGNoVIc4jzyJIMr7j282raW?^H8^h3DTsp
zi-He`bg%I!K=%lpT6u1@`@GEC+VHqc|3vSLwI=<L8sY3$c5{##e&2~&&Bbk>O}*Hh
z`&}H<)d6|$nA$0!-1U61_1k6kYbwsJYkTWk-~8Lhr|r_m;W@6&33{r0uS3x$u>31-
zt806jtvbyAa*I&0ic*X^*_Wtw*Ci1ck(K`lZiD*_+j>@37Mkr*5A|x=)8NMAg+w%M
z)Hk`@N|3+c5ZE7f=B5UWc;usFUFBQAYe<5P=A^(v{cy%lLe0n!a8eZN>`Rx$Bc)=$
zbY|E?jwBd$*d^ilu}q!Tl#afCMY(<mg*5dIHmG<UfsS-V{U!q(V9r>+MfpX3HOh(k
zSHK*;D?iVONff~)aTT!shX=b9jMMwE@Nz`vpyF;@=&IvnWhBb;mp*Q5e7H|EJ3CrQ
z(#7@Eu)hwFB>-#&VysALV_7$nDgF?G8Ki~$|A@NB;7Gf+4b(9wwr$(CZEIrNnb@|S
zbexH8+qP|EcAod!yLQ#8`)B{^x_k9n=XD-Noe@vWW%+bw+8=ZnYU*^=pNUaYVY#20
zllIy^Nd_EKC)2R+UY-LJ@=S9!u!u6e&$H&9x1Z7snp(|;<FJi2wsEV?=<y#2O>M6Z
z<Bz0}#$%}G=WD_Lt`z#DKnVfcP!C>KFrKhSLM6ZGWk=;G&iw``87Yggr_4if10A1a
zqmu?kldutY4DTl&;D%-T#n3+!;t2C)64hO*0~?8B887=a|AboPm2;J=)5f>fhTZWd
zBX|QrBH@O(^cLE-o``B5(w2jX&GI8jCPljvNU?-I%RHb(Sd$<L(6v#QZ@dsHBRpYt
zjjAY7h`xkj_K0{O0dMpP8D!E4gL_~?!FQ_o+Rg|SpdreE;0*N83=vusQw|za0`UGI
zn0fC8YJl)kiTougLi8CX-jHG@_30sf`1qrs8eC*rA=C+tj6lgn+|JZLfG&tQa_2Z{
z2AaDYLh@!qyzEATiG?evrhQNVC*;!qfFsDBjTu1|;7-~}JIfgE*jRRIYQ1~s4@ELB
zie|dw@RCD&D3z-WP~g8xWPBhM1hPdHAvcN1I^r?wM3Nb7;fb*y6o!yQn8MyqJV<HL
zwWp)2*Fo%2ouHd|N69<Cm4?!_`TrT`aGY)q$~GD~p?7doB)kVIL*a^48hHp3Ilc?E
z=iH1VST^zZ1j)jplv{MtIAEnVeo&jvc_mQfC_q%RP*&N0sIx2U59*v!Yr8Q%+KcjZ
z44V|94SatZBk!qUjmj7BavZK|WEy|ve72_a;fEC!pTkR>AJM)KJ#k`vjc4n%6K0C;
zFG_0l9xFUH_SZp7nZlUc&UUt#*P=1LI>Lrj8#GMYN^l38j*ac@K9l-`6^&8qw}vDr
zabKw|i;=zy%rlN%cwUhA*nq7MLT0*!<nH1MzYZ6kD3mNEq9slBl}A%DGb!iJ8WS?x
zFksnwxRom8Nv3ue>d`8jPqA`>YgW6qYIM5foHRVHD>j1*EGBhAj6Piv_MjnUYZL_L
zOyd$ALg;n-%Mj5XH5R-#;s3y7go+9gybcNLe{^=0Sep!HQ)2H1B`|G`-e*Om>m`5R
z&-Fu&lrC_sw8Q4XnwE<ds`Kds9a1*YZ}&Ns-pJI}$CL8sW7FHicuXj<L;wK^06EWE
z5bXJKgLLwy2ql~pgBtOMb3+Oi*^{7+l+cmJO@PtYqQ*-pKBEGe8Sl3%6a+s1o+j|m
z=)|+hb%ykd)zi6X4SdyIUsd?uAjXMY=ULKj>^?QyzZ}~O7!n&Jv#8zGLJ<1cFB^Fu
zEzXd)5dncjJW34i8abe9vqEy*s?nkhp09y!76+s!hbHC5d%Li?fv`2e+jvR(TA<hz
zw+)Zd!Qq$d{n5Q;3(eTd)S9k%OSE;*N0xAEy8|MBAo|axtvSUGSJNGbeDF}v#JcvG
z;EEQFj>iLDLXauwx0z}$i3?6-y#siBq2*;uphRIaeiUN}vj0~x9#62i96)zRUy!*^
zCixjng#8&!ln3L8=t&BFIvfgG2e_dg>VHtn!wXjf|GuWXSrZR>Uy2q#GO|RK(Gg1J
zUH_R)3}-UqZCAcXA^fXcA0*W?I$X+2d8h$4MDg{6W7@GuZE-L*bEtLK<pz<9fKNM7
z^)A#PUuJXM51)HygC}dC7X8cvahHHB!j}_o0=mM3S=(h$UHKx6Q)>PEOO8=5)vA(>
z2oCXMra78iXg;*^^KxFa9UDSc)d_QsR5^xYLNml7$ER!|mtKaA?mlW-w2_YwFzwM*
z<-{qxhxk{^?axYPb@3H(0A{!mETL!-_$z}<5UqPxAe$8lv)lVFFr<hEoG3(c$J`ow
z#w+Zil|d#WF&kgMvwIA8c{CR~XNrJD0fxk#E6R*W!7+`}YcFQG*t?-@JlEJZysFKT
zte<i<bxSN3y2XwuQOSjKQm$l+OB+GXdZ=fE$7?vj+8Lp{SsWB(<ha9svxdE!iFAsI
z@Xs1OOYmJ{rS>}?um3~_QmI9?634TODgJNL(B4a=ir;BQUpxJy+wmc7_^Btb?>UJj
zN`YYg4EjrF>^Onv0SHd&0V#=MWSK(p=exrt1vc1wP%T#E#~7>zjq{6ysBR6pZ#N9W
z6HQu`s6FqQlIz#X4rLxi0Kf#q2Of2zC=@-C5x$VAI(3W}Lc|yvFqu(<<O<DXB=ur(
zqdNIq&LB%&Kc2i!WGP;lijQ=ktqw0ML**k}QWCVF92h^*=v1sDt%k?@zTVfb$!9Aq
z)QJYh1~nN?X-Xv_iAip688fXF87UZQdGWacd8ry89bKAq2X_9Pz_no&xMS5U&4sDF
zme*0IR%R*1=pNDc|4kWk%czB|b(E=$ZGUh7$?e|u(-ES=F-%GfQH}_$8rw6W%&9~B
zV^j%ia<-zm7k+a=+kFWFvD*bVM=+4g!-}P<xJ@wc-c{aSspaQr-MuZCJF!u!U%w1j
z*QZRE?3LC_L9Jw*`zfv3QGA3(>DDoXIqHBe2~=2uW4;B;0;?cjXlrO7|4T2tiSs^m
zZ+csk_XUVRG)keTzp1<=`s0p`as1={;U@=FmU;<<N}%}&@cyF5HUc?9T5SDny?Y>g
zIgB_$xM8>Bx697?!BAcOvwzBkwfJ4}V<d-lNp=el<!2bWTAN)iem-n#__4kOlAv9o
z`+=Co6?B^t!q74qTa@)_^BzG1-QO#A0_PzA?6f!F<bQI3v4P8W&RaISSP@skT%Gvn
zTwngY?7bm}$jKN;N_#S$pKF}(5KEiON!C!-h=reLc)jBpFh1rJyS402fn5D^s=klV
zROGGlZkw~c!LZ8H#b@-#mg69^!Bv3w^^Xy{Ho3{a@p{iWx%4$6Be30X-FS;@%+<zH
zhHc*S94hoh0X3$nhSt66$`0QBGGEc{k8&yEmALD3`_KGFuOo}|Z^+)wXQbERQ|G8A
zdZN#ebutQsWIvW2N_xhN7Qg;q+9shOT1!%_OA3k`X2d$^6;ebfL=b2&a;#snP)w`6
z{#*yCR|;>i_E+sLD?c8R7P}OL+_2=9T1SPO+SPhfE#F~Y%Z{BP4s*LvB7`*RK-Hwg
zxw)R|n|A!dlhUDe_5OPBd6nk^cBU^KRB_@}Y>~2t0WWgg$SrlB!Mit(0r@rzc9|s0
zIoA3Vo7%Jsj2>wZ7y|<%S**m|_#&HZdaAiS!<2sCV1F)GNPjy1cMAFGa2X!d9`PxH
zzqo2-hy|hNH+1nn(pMI?;w`5uNTqu=HXNmUd|mCbe$#>_?gBQo8#kZg-%=|)#}X<|
zJvj6b_>u5B2Kama0y=0wwV=rsI$BjTM4YP*+Kc#4MVd-;uC75;(^Uz>0z7e0xeN_x
zp6gr+mX1DwTt=H^L#`}MCB&hfWK+HnY#rm_?T+f;9;ErxdOaH?ZSGj*d=Ho5vd~9p
z=2p=~v8PTFIhvm>-jgRbPI5T$U8H6Qn|TlSo~|O5z33Z$88I4f#EQgTiBqi)jnu<<
z)IbSb)=etq<tG1KNc^FYz>O|T+kj-pJke`(DIZyaaeaHCRk2SvZ&Tuo8LdJ!#zuN>
z1p`_!pwg%SkG@^4f}M<5ZG$~Za!I*8T$RoUJ)xq+=P=VsNuvf(FtRy7$Sik4c(Trd
z-3+F27dgKY(5O;<O?0)Wja2M2Y)AQ5W|bjQ7XnNa76bj{Ch*tskQljQ!l&75nH~|_
zh418hm7x;OZ_{^>i_y-FTcrfc6xO-&K+2+UHL(aOQ)Fc+J?L6$dbE7*2_rbp`f<Ei
zD^q5j#@77c39GgbxGMq2bF`B-q$21Cv}(sB=-Mm|76=q!5Y&^4_jH-t<yHvUYzwdO
z46zJKSp<5$L`&{Ab?Y4Beyb8+7_qh;el48Fs0ZDfR&~z)CC&13B^8j3vhXMD-wf)O
zYDIz8`8&)Jx6KuI>(16zDkxQRsalJpQ*3q$2Q1*$)?TQsd&^h^mU8UT^u9s0%de8_
z9YTjp*5)3j7UKh#lGFg>W-)OsD|6Z3T2>isgK=rVsq3A8C9&Ch@5@)C-%?g9L$7)}
zX<C|TwsmVb(7P_6$^wnKhtW&Ry8dv-h&Qi$(ljkbV`=}i5HeOGE`rGcW|=`7<wXnv
z)AOiOw5@%P;W{lBSinsQ2UrSXsgvN0U6ujH8cbX$+pHvVO7F(WC6s2YQjz!DloCqg
zr=h<?*OaFwAJvZOqLpBlIhYn8ux5K@0W0afVgMWTmh$vFMmm*9PwZ4n+lt?2(!f*Y
zvNv&t%BA~THj2l3(##Ypju)1)<fy3b00g6#4s0%*8LGPT)zVG&g1*(#OexS=y<9{1
zB%^o=1Ak>6|FoeyPro%kpX|XqK+oIU{3Y5)pO`T!m>b%cPP3@j`@50^@A8U>N$>X^
z1J5(5+s)P<R*))FJjMf?jMp1fzlLLYeHH`$q#BJpx;Yx(q9mOEn{RhI7j03QiXg&=
zeD#u*z!%uE6=?ul6N`uu6aohb<2qC1Ws*V!;T_56&pa&JY7?|4Mqi`N(MM&|1WVJy
z3lZE-k;%kPHmQZt)M^98Wdp@k)Xu`Qx}Cs~=JKyK2a(0dn4=*gW#j-kV)dnNk8Yp+
z5Z{9YKIif7TWEgF5G9{xrz8F>bzb)YDh?4MI3l*m5{6f(W1<cQU$v~Z)mST*JQB!6
z&7?E5?26doKjb(z=3VF`RE6wydg+X8rg7N6dIndr<vWmuLI|-h3-pU3rDlyxw1`fs
z*QM)uLwAzHZO6hDwY=mN(B{z5X2h;OyGk(fXs32&3-AXgv?2-eA?YG8+X13*@z}*Q
z7_#v8J1aOYNJp2UdWajk)X;fC{0trLS0-JNg4v{%yuZzyTQ^uG(GcK-(IfBw>cKex
z<TXCx7pu6DKMoIuPIh>pfm0F;hbt}*XHkSnz|Lz`*JdBMr%FJkbaZpbAuEzMR9A`4
z<>9Qq!U@lmwtzIKfoKlM@v97gM``QiE_LT8cuKnZ&7pfmy@2xk<@Ab)4XBa7R~9K2
zf2+f;qf-E6eqf$lKVr;TnE3Mhuu}MeTI;+YQou0?`G59K<<|jwRMLukl|SCR#91Yb
z@>Lb4Q5;cD=s%{+#t0Bkj8GiQoM7_MY!J-e{N^W*Z_uxp2JEIyGZfmCt$3S0_H>+4
z*TJy$BU3-pov3z93u%eAE+T5rs3wxDlq8xzcD5J-_uE+a!VpXFRl4uSZ+inn8V%1m
z&H@a&4FoyDXA!z&SR$`lSMv=+Z9f$ssx*X46s`F2K?KO-^wWwRJHHHZ&ER#k+_*@4
z`a`AmC2#Rz8g|GnJ<_`0KgO2`F!J<<Mu@IAS2cMGKLhx@(evt`M_Q~5a?APn?fkve
z13|nEkaKn`jLbmAM__qF#B`+#iQz{Y@}diza3b4#v5Vw+7w+t{d8>WC5_~_rKMu#Z
zQDM45pkkO+%GTEwE@g+=5EL^-rY_9o>866zdcH2Hd;>KQ$#kAZ6Ph)Hi&-n6D;43U
z^(j!o{y4!6n_>(8H7j2bE>V~#VT!2myaKcZ7R*?3T(Vsy)9;K>ut>fPnGxVoVJv9P
z4()0iB|i0=ov&|PxQodMgZg4I96G=n9R%<Zjft|UkyT315U}JUDf~gEaD|drS9Yee
zqC8C*bP`ruv@~qxiy1k@<}P+00+dcyW^<!v&MTmm{r&4NgjYGsW6q4H*ex;<oi_2R
zd|u&xG%&k)(XL3wd?AmS5mvWp$`kWyhMU_Mc_E+J+6NvfcPZkzu9d?@2&p#s8o4N&
zsbpBAL=~IHqOOe1Pq}W|zeh=G26T^db&52D!$gnCCwiWZBd6%>HgSwkIsG3Ols;m(
zV(FFkz3)ddnUAg1YI&_looP%RR|PR3^>i?FAX$hcPrloY%-)i}lG<{}6DzGAD4Rh{
zkWxjk8txY8;EL2OSt%PQxzk3jxvw#Lj?bpI;5uFzYtM4<HO!_zp~f9_i{9Np8SwK$
z($hZfa*hTfgXuL5pTBw?_N6we(DOWpiYk{49S>J8m0y|u>~b@KX(0RQ`(c8ZmzDCa
zK8V^?`|XyvXt*TESWK^i<8vi$tsndz7+(I$k{u@dZ_ZGH^I^R(H{BbUs+u71izv4Q
z(!JdM;{GzI<PC)tk=wy{2fuc!Q>nuZS7CR5vn>TjWnu^4S+uSx?q+`iX|OVn&o-|!
zu1&_54n^%L*Vwfz%O&!>N(8c%je?^%2!QUSf`r292{{G=kL5kzA?PBCk@osW7Q9r!
zi7i6<2tF3g`4#NWYAq{ffoyWVAc}i(bj|cy&SVNRmw6@MFif^<Lm%<ZlfnSD2jcGS
zexO5DN2mm1cf#kJ=3^U6DWRV8rMKHpCYfN~8jUQ-qd|$`;Rw+1JP>kpM$xKdHKITl
z><IjE*^rWSqFysu<Kz1jqxkXvtqqqML_lL%*aHrh1C3doH7T<%2?bp9zW-R^$O@hP
zXA3g(cWQAY9WqHx)Mi~sps;_zE)Fo^zfbFP;!_d=pP*Wdf0^&5XU~`Wl?-C?(`|yY
z8+@DZ3JObh*~b{MkgYyfm-S-EX=hF07#otD82v3j$x}wSYhmxrYhV1X1hMap3F3`j
zX#I5(mbh|5q`sUjLcPZ{{q6+5nv%=<CZ|63MKo{_i@Z;xqj_4gAVJUwFz}yR8fp&T
z)Nu{bM5shrVG6201;q+~q6HG;j~5!AFQMaspzp!MOWgOt$rl!H5t&ShKK4E94zyaE
zn0<vnU?0h{w3|3l@<#ueWg8wy#0#*KZ#}L#Zz=JPVtuTsw2$fg{Z<fwM1{sILU663
zpWf2S{)i=pj25ICXT&G+Jj)*i?kM&iP2gkD%WDwmR#Whx%7sm=Tl5C$99a)Ko2ak7
z;yEd2obkJRnjcY!&>JNCc%M2x{F!Jsh!Sj?lG^5uyn$CQd9O^U+n<V!_?Z8U9)0-i
zbcnI@1R7DE6DCyPm-K=ND8P@16ZZV|u_*P45`y;|wfzL%{G0xun6pEb$8nzj>0o~o
zO9>nHtok2EA&3^{)=)z}qvzAn{1JnLfknrqYUW2Od2Y4b)_??>WUzqWlepK%|JDY~
znQe_S2=aSY;%}^~?{qs!Lc`?GrJ7CcTwi!dN)rEx+M0d_abT`IkktGr=aUR&EBs&C
zS6Z@3H!ASqt)`rh4wy>Ul3mEgrIDfUL#Q>+2bc4A=1miEsRnvr|A|9aC;y2<wB1AN
zuo#s8i9@*|I9H!<M^wY!N-!{`5z3;51YlL#ohKBKCnZaIPp7CI;sS5$`mMh}Kkf&|
zAGtp6%s$o-JR~vko(If(zwQnoV)5DGk}C_+wAAFfrKyMk+X1{%WlhhMuHN5V7=~sP
zh!35fF4DaFRvPweas2(!@VG54=0-7!2aiJ-C#m|A-yoz}<8@YL2K{L9<DA{!-OpAP
z&!)MwEaUGpb!Yor7d@RFpSRFCqw7H(&;5(zC4luQljkB>0fy{vca-eNCRRpfJ0^pm
zw%n38hE#%WS5i3_=tfBpoidi>T%}bSm9i_?yn~FUJOih+@&+Bgq4Ft7XHPxfhm2=a
zJi>E5A@mdM4lKi5LP~D(g_&As3FNil4lc(!S>rprMNlL-tR(J*(`+6=r?s`)nY8r%
z0E)el7DnKHnmGBoKlkxs%OH*oNPG6hQ@Op%68N9uKtJn}0jN}eFPRVGU7=VRB&#Z_
z;Aay=JN*xI5+Kj?A^vgc3`JZEi@5vS{|kK^S5W1oyP?cLkQ#PUY;sF0mpj6fm%2eS
zyF=8I)V@>=<@?@yq9VAaT{c-tkge~;UKi66k*gaC{`+?|9iA2;YLRkN+k<?%ngr`<
z@<>?Z>Egs|Y(hN~I)Sm{_x1fmnyRnqPMbuQUdB5NC`wjH`UHmspSY1c%GwS{DZawq
zy=b<ZH3t3qNhuLEV$nl1r1QtiA<=K^(VSPitFy^*4EI4PP!y%kaJxLS3LUS?Z_8zU
z&g=A$_OzQYggY7uKp&jg48!yniZ-eX?mj1DMgYxBbzV6*r?>zfP551zf{`L-+2Ohi
zWFp#Lo0sQzOy^&f+I6Rb!&=$<*Be6p&nv93<v0&Qakkz;R25}HPdv#2!zVkaA*X9X
zE>9?ovih?_z)4D>kmD8(?6R6pT?MnS&-@*zjpMcK)gzxzVec9w#r6cvtB-RUXq2Jd
zp|1*JtXp*;BqCMyeB2Snzw(?9>|%&uhJiBEtIf;#1J93REc<8OvsyzP*feBMGDrlo
zS<LD;y7eQ7NP}HUa!dhXOmsslFs)hZCB#Nqmw&@RuQ9;4IH0@!CZM|<^6&PCh3Ah=
zM42tn4{M7y3DA@gObN@hV0p&jPQ^)xok$JVn;-#M?kanw1NeY!l&8mTe;gL^DCdVL
z27b^EU=Z$Y8QIKfhADwMg`A>aDH=CAMk*Vj=oog@vcB!C|G){r!kv~1lD<<$Thq#>
z$18F4mC5^i&*#`1t?>#yGcYh}m^{z6FdF}|7-NC)XG#N~dMGMCV$(&)k$A=&4?cA?
z;Ij|)AOSok4!3*g-`FZUy|CCvMI`M!s3o3}EkQ$gU7gN`F84QcO1w|XCq2&kPGr(<
zB9Dt@((o6OkNG`!tWR#%SQG%g>wxkSuMum^*aToJcyUVZa0~hA*+%2mN|0;y(lWB@
z^n4QvZ6*q99)1q=6Q;YKOU<@<ot5zjjKYXYpD*nU?rGXks4bWN>%b<gu_O7Tc^L(K
z`YmtC%J<kP$zzDoqj;#ou91>fZwI~{!wDKpBcbd8cuvOkE!`&&S_c%w`$dMle`9{0
zA>-W^9HJfik`4+;4_|7CDJWcTIDl6i0O$LZ)Q}dUsx3tz&)!7R^N4?$l>^nQrJ+>&
z9x<Beir&B9P2%&%9K^@yk?DFrQdgV9YOnB08~QyuCXn+ANbeoGYLuR8@uKm8D;P}Q
z;Fz3b#EK0`S3+@32I*7G>toDoquBmJs*E%aVze@ZP)}f0qL3lFBftfs2+6Md-!W+o
zv_Krk<K2n`f&Udev|Rki?=NU|F;X4F-G7-W61Jn?$4Xb{o)rW@K?C-2jw}3#%6~?f
z6Xcj87A~8H%*%*y{}6g90|nykE`>F|&tE3{UNa|CSb3ZeLkjuS3e?|ZSNcLrwOr<1
z({BlYCeSkQl1@*qXvTgd?WO}yQ^eV;Xm!(SlR=+}*2htEGl~WpZLb_!6vVnPe`{3A
ze1y=<zWN?ES<gvIbgEkR5b|kTvnsKgHjx%oZDlqa=ouwcjybWOIR#_?E{VX{C^k#D
z(?)cFNHSZi8kQPJOw$OQ2x7a+?e*FSS09tPZS01!lBClhD60H*e-I(=ueyA-QepCq
z$@45O_aZKbAu6{L5E|Pk_z==R;`n7DEP)}4A@LAA?7h#IFo?d06gSc$`qQjL`O|zm
z!BgkCR~UH&Ar=*W^kSD}w|kkO)tDYQE12$}RU$nW=&F%3+49aj%?*?%*kV=S!}9as
zw%eg2)oqD3yi0i_R8=&by+*#uDB5FMAs9nr9=a1o8=R$e%rBLV{5?Dp@DVW<!>Iz$
z|20Lj+NrfWODY&6$rEgK?+f7M)~*K<l+FF66_+0)muvM}XQxsQ#_?H0L+rxH5zHN<
zHw(1?unvauqH3QVU8&hi<FYyLtv=d5z#7{<Fq^mn#8)=+X*|2Q8?aGARO_~CRo=jx
z330=KdW%;{F)41^Hmu|&ocac&!G_Kr1EC4z9EaK#G_M}y1ZjZFM(eNmZmeHA=5?`1
z*nVm#5#u2n4_SK?@6Z}Ey#HE7Env7X5(H{=1rJ#kfsPAzOQf6fFmd-nO#jGRtfbi!
z+k_%RR;Eu;NF{EmEAcB3OGTeoW{tTSU8-G|cy2G@hwGaV_^SJ~YIi$ApZS~$dX_A)
zXf?J9B@wSx4l~^bJRN>^>qKUIU&B=TpG(y%sQVvF?fNM20!XpqoVUk(t?nCH;HsX?
za{wu+IWFY8l{_c1h&qnrSP5O^=Glw$)_i-#xvVbT&@TCBGJz^>L_=AtF7!M|pZvgi
zp|ROQKmx@p>hAbwA=0lQR;4QAs|kW-$-i_KzCxasw}tDUEebS&B5~z4l$Y+5w~dD_
z^wU$41KXJq5l}8;`lm|<gp}CihSLUuW&zg$pibdX1sfTOb$WjSrJ#1hd(UqJ$O4@H
zP&#audrKW0!X=Ui7{dYh5YfBji*Q>-yt1yh9~SMnAA<rpTsjk_P8`jn3iWFxnWwQ*
zV(LIb>VE-bn?UqRx!_sCd(?_rkEkUR%6HOL<!Z2*I%6{MxksH+xCsTvjPTiU<I*uF
z`=(zNy*r>&qcH5b8dL#WN6mXChQkRu{VLXo+wHa+dD}bfSO4<2yjyN8B!0?KKnn3?
zdX`f$R123e;s!r{&U27%qDb4blFN~Ns`dXI_TC$#XS|2^EG+RDhk+;{Et$MVmtet#
zLfL(#@ZixSo05_x)dQco9;7Q%*Wzoy^sYqGl~OYikLH8G-t4rPj66RL#<t+gQfU&T
zh(d~!LePm5`%H#cZBQ9-iPLyf`PlB^f~V&^<zkd6)qtb&%=h-|%Yd8}ZHkSX^C?m6
zJp4NKzM9aQ{RB5Ky!bl;)k0d;P*aYIn5T^*6pobHG-#uKxM&Y>%V6f$X(ztJH}q-u
zCcfJy$0Eov_*H0U=2N`6?e^Dgq9a@?cU#-=qzQ^x-M3c^q1RXPyml;Q$`{%LwKGBY
zAM;)TtB3g;z-Bn$aT*xjU-@-j5oo8f=W;b^waZ<?>f#S5&ol07o4bP5&y3Q?%gw65
zs5TjyO6;0YdnOHfD6RmVdGeLA&U;%CC%bWfB4&K*@1iZ(y(qOKMT)jxB6-5c@=Hh5
z%cx|MIjd|o;Wp)T_1BUF+X{SqKXK)}?z~&C=6j=WYlAgnadP!k-W5BPE9$RREy46&
z&2q0%;S#4iN&2XnEF*LHP{5phO(60H$!U4W&y*JIt4Q+j(=g2$FP+0EwGQQGR~H}k
zc&gMNR2CeF_wbkQl=JD*Rh%*`WpY<m4ZdIlGk7h4a(!YDhLNS44m={iQ%<-MmlTDC
zaue1Zwd%rAka`M6n%`LnfRE&@vS+YmBH83N{hpFEj)CgmVpPX5!<a7cC3Si==L`_d
zO8oQ2##)g-n%_hNa1g02PtqO>_<ZIfxuBO0!I!e5|KUKa^kDPgSnT=5S?uEl&Uucp
zCk|F`&6fTKb>Mc%29Ud#yB)Dc{4<v66z5Y&hf4P7!E=$y^H>?{_v`)TL<vTN=$q))
zY2W56@+$cT#mzcKH_`Mmyd-vf21tdTuOz}#iHyP0Z+iOanGJ%S;iiA)=b^suo8u`T
z|5%<%4sNf(#K0d;L#*Q|0)v~A`c&-gzGXNJV)K1(ZnnB^1g%zL`EKw%7+iO|UfeJJ
z?SGQ+{Z<zF>Z002rjAenk7;wV_@7>A>tY)A^ss;X1|)<Qafh7o;JIc*_KBhuXj(wX
z%?a-X&xFy1hJek@Y$65^>R7#)Fcy~joAwq^>VSi{%ZuAZ<4<p|?8utc4njYn_7yf`
zf9=!NzA@JC%+JXSGf`I60e{wXd2h>kwiHAlAF(4>pmNl*-SGq)b@}JNu^Yq0cF_|!
z(Gi!`!t%(sDde~wO%hQ+?a9?06I(2hx7)lin?*5hiy(Y{i~bS~uP#%#-LpFg4jGFp
zf!8{-N#5pceP~tP=JF3OElsZp6@rS$xSuml*XI**lXEW*mdj{u*<TD**vRZ#`ly2G
z!#Rf8l)m3n4fNhOvSxNA%^_%kdg_OhuUj#ad%vDwYz04OgtP1GjSkWgGsFjD(XK8R
z+ACm2;Lg<;62UB!qnRUQm|ILXb1o_>Z$R0RRPtg;0BEO7)oaqm#-)iW@Zw)gqmB$a
zEm3-giJzzSF;;hiJLS*mZVQ_rPSD^-J-NBtpZ@eJEN0>1onfubqs0ubLf-UGPZQ$x
zssvJ)IWKenZDY)xX^@d1%{`ULkJTg%Jwph6f+u&P%srbhZ8*cH$r^in)*|`3`~KEk
zmo7uuD$;phIJ^G2AWVoDVp35N{HR9mUrEul3HS^{ZUy}{3$&g_gGyJq>B*_V$(bP1
z6XiaH96`6GSFgFVakpJVdA+OCdg~@j3pxA8ktVSH4#^X&59p_<O^}y=HYn~)QvmDq
z`!1nJJL4P(d$Z-KxA`SBo>$&VX~xtxzn1DyCadzurGQzRHcsnk{C9ifaAlhhnRX^B
z`M(|{=|>V}0<ogGsepGMGv`#Xo3b7u^?^LjrGGd>9WD9r)#p0_hC!2~)fc?RDgSVu
zUS0~}k3>3l#8L&p&EWq~$+HJGKRy``=NqROZE5u5T_G**gpGO;A#i7i{Ar{PA~YhE
zr23Uk-z&?VC8t?$*9nAD^tE=4<R0QSAA<P{yJ{$(Zb0}dj)M5kZV~t&IhfIcU#pgR
zvrr1-TO}5i5%p3i{H~0BJs6}}eKh<2k~$`4zB)mMu_d!MZmr!RE;h^5s6ym7wx|RG
z$H~^mvq>?*m|0c%_OuZq01jRh@mrzPSfsbz*h=_F%WI0Hn++Py>7rtCXPXbubz{Hv
za2>nYodbIA{fhg0+?I2U$vH#8TrhFWR)2H@O1hjbXn~d7MYv}}t~c%#L_is`m6H{Y
z*L=N^ePh;R2szNd?~)TnTXoDxIxgMbl`QTA=D1m#cAsFV=#XBKl30s1D6(i&5Ai9X
zm3-71+p?~}^b<2RWG~tNsGl*w{%T$kdpBxe_L`eEu6<OOgtQsTleIQdISq_B#{Q(d
zK95iD=U**J7`J6UD(7P5Vr~}<a@0(jbxRL$0vFrzUxOUoHa><vFw3D$GX4Gp(La)&
ziC@wRhv6alUf^n>T0jV=E#;vy2S>?9WJ}|Pe;SdPY-U?8NFBf`rhMRcygK~#)7N6W
zd-7<DkL=@?@CnMe*e+bsw#&m&41q&ZT2YZm`A)Cw?5$KykC!6fq5e6a05>+?{v6#Y
zpr1@H(eqX1@P$?!jmcXRo1|38YkQslvzw7+$jR0JzKiB-3K-Gv_zw13IM8ZTpkHm6
z0m=pjjgMNY9=lJ1F=vCe=JCd6KGXjgjd-!vj6+n*D?RFbMv~L4WMAqnpjom$9aVEY
z_c7=2Z=Qgb^_YqZn1`-`fBo*Og;#wdEl+<=&7X7eLsTGS#UptZe;pAdbfZVCJnypt
z2VDI1>(aShD#B^-Tl=ojOk3JQ34elJ4Mh;aL<C6z3=*fuh57F@rO6y>wKglD&zrPO
z=7RdK=1GBuzeK&RbF`V*437W4n_*-q=Pi>p!&z6SJqo}ImPuS4qP#L@%~OagDU$zp
zI^%?-u5_-*EdZlvC<PVQk3v6!4HL4>pe$qSA%-bm!37pn3xg|s8$JqIq?#;4EkAho
zHv@C91)(!@F4%e%GWBl+QJH2xT`9USDEyqQi2X^Nd@*BvzdJK_Tb3C*$+1NVuYOe2
z!_Hz&&%I?*Fb&+i`d9ZuV4s%&Qk<+qoa+1vW9BT411$I9DgT=Iu@d}14od4R7-vnp
z5E0^sKviSScP<n^7I*vm2k{oPFW`#)T2c5uawG`7MyT*n2zO%5wuD)q8p%Y5)lv<h
z(FQs^%CF$0m2WbQsP1w#c%A~*-g`(DILblJh#D=uaTX_Mh?hq^nu4jzFOH9>#?GfW
zwC9CEp%m6fTYx-mCUmcolZ<WRY`!~TNQNVC-m(yMM>L`=&f`dt#9SJ!gh0&P-8pL(
zDquNSs9q;zbKYMKQCFe%++ULbTsEMrp3X%rA~^UJ$+Xz~qt`UgSy+AAPQ`RiasKEv
zFp(0uy#*&oEbf7Of^IYM1WqD0^RD$`&$+ku?F^g2iznRMwm*TiX`ha6EGZQYNM{kd
zPc{JYRkQNM9m#FQ(R+0|9yV_~#XAEf$qM6A0Vnq?e}vlVV`P(Pw|eHDylrS!w)Py1
zRPL%(eKuFIT(t!snFj?`EPo+OrQ*Os-qM6ZIbSAK%t4RGR5JO%U&u-rj-hvwHku5Q
zWLUK^m>5J1adh~#e%EqprHh(KrdPxk)Z-u&l6UiceksYOF(wq(!q<3si0}wlsxcl@
z3q<}vlG6L-&P3<XAGId=RPd14arSHs=pwYO{ZNRNi4yp+hR7CsN$JOqaIxXdgTZlc
zYMT(*-Ywq)bqDWHcIP!I>8}ruMj6X1vq_<yEE*>3Rv%}^6hc*WBGE6zNg1B<-ViK1
zN}r6m{%!sTjBs9lE#PC7YG;z3-6NNy)0$^+JmyA+)DL^BDJN4Xd?z+!LfSnLQ`ptr
zRqWjhr@azvo94;t)^_2xh6(ssW^5!+o)$}}dt=NoE-PMlKRC9flI6M~WS2y@giXD?
znoPnhOhy6(KUFy<U_-WIVIEW^Y;*J&g@&X1t@wP(M$pg3D!`|?KpmaGe-6;~!o@3R
zhQrOzhyB1O*meGz1fHiog;%H#v~q@-lKD~q;<fWxLt*p8+DWD1`3{K3=KU9ZUb!AT
zWz#3`3JS5~_npP?x~y|FZ0)nb)^Nsq&gzSuCDc>%pm`c|qpusL!a1t0aVBx~S@!$@
zJL&iF?EE?n^6yjpe@;!>^P@8bcOoXR6fGugV<<_sdmWA&h?w<JMv>{DARokNpwod%
zBhMT;AbUr2^Yg%|3YzO+aS>OAit~-+;dW+2IK0B^@HS2SlvfypP5RD5GVW*Fj)DR#
zm5VilOvH-ub0TS<Zu<TlP*IWc(nm$pgKSHfspM6#nK@;Lmp6Lke0G0WV_bs#`+d^_
z0x=oC7X1&5#10mNso=@fQpIh-xvX&#cATr27m>KFe~anGNNr`I9<Tpzhp>xe)Tdpb
zw94ao&)bPZBKB|{NkDR5v3}WZaKM={nFc4NI$33{YL#llOc`w**-dQegbCm-@j`I7
z5K$h>gHNCMdq9*;_Hp@eFu`o9ggM<I7~-GsI6}hR^@zy@*bY(VtjEzsT&g^89Y*P;
zw|Svb#f|g12HmKcu^b`@cR5UeNFz?U6I32%LHP4~Zvn0nN-+}Y5e|XKBwCOLC><%L
zJTO}@yd!nEyC6I4^7y_$_wB<_zYv_wa>XJ1{@Fk>xY|Tp5M(v8w8pGvJL6bZ;Eo42
zbz1`O(`ny`h99qW+w<<C*qvm1Hi^29;Zn%iI?^wgOG?_h3se{p8@o;aPp`!9Ke1xx
zpYRc&Nv3zE=KD)(rO#Kbc+={^$~MrPV!}v;kqM;;N%k0E&bsq1NP~|7H`H@rz`vGn
z2Men`lJIs77uSt3N&=MvVDG!Bw?vJkGLrEdl!u&PZcEP4%I!P3`Tc}%<-?o8sp#}e
z917cI)N!cloyx-l`V|@bPmvyeSCNXCq{J`NIFKSrI*;%DiyrSm<uUHhVxLndF%*=P
z%E^-a8{y0C-OTQMXP`17a)t9tTxfA-VqcsbEsd0u?;uu8H6+|pA{o`3B2R?T`ykkm
z+zUm#A*~lI$$#7Hf)3+K=^Us(*=@50`g4J}+=){|T1{G=eB|IjO_H_{3Phn>cw7&~
zRAj%<Qq~Yl9?~Qcm*p{+4NK_kZwPtRL2s-obKAbhk-%)T%OBo~(uXi3i8!r)cLpIj
zI3wk4><kz}B(K3c5zMWu3=!61Vqp@jyu7jq*p4e)Z!DO2E6d@+hk&?j38b7^>wXdR
z<4q#b;2kxx9BrbYQ4L7aa!(FcLtgb6t-c+`O8G0e&44Nz7UuQ5aIHCuW}XF}`3)5N
z!GJ11nkTu@^8vySKwkTim={3d*D>DhGkK{-$88rtu^!hQ=<lP1Ggh(!27GK8g7zsP
zE(hyVkQ)BP-U7@qc%C|{B0ooJ<=+8IT$2*oKr(u858rn3(#b1($il3=H6I}+@#_)B
zpOU>rQro|xvO%YndZbcRE%ed(poo!-f#0gVJseP>vR#5a?Ow#Bhd}>nHIM?dKSt4c
zm5PJdQah<(b-J*R8VVzG44|Hzt<*WS%L7)U&Saed7@2h;)yvhuh6zMyE=BJk5U
zHhG#3mZ^yeBLT4F!4H)G%$ogBT<Dvd?H<Tl8io{TBJf~H;4l_hTW3$>ny4N?S%3n0
zwcC(vkk=9xE#r1X>IgN}sC;qe|A{r>Z5oksVODfYzbEN~e?gGaokz)Rj?SIt-LFzz
z;v6iqXw<0WxCcww4&Szw1nW|x#f73=A(=|AiB`#Xv>SDiwl~i*f>yKLkv^SgNzEJg
z$fC?VF^$PS&V`Fcd~1>#-Ij%t9{K&})lgW(J$L(=5|jj{?q_qp(98@@6$=H_$;I1W
z)=Ex#*F2N&&t+%A=!M2`rLYBN)8W&3^s)=iZ#)L&OslFC3Cg$F@x*(W&*byMy<Wu&
zB;>?j{T7ATICImqprV}t-JYQRstywL7*sHGWY&SyCYqD<`_HR!;$-;oY7#0{V~^pF
zN8}tW$MDJQ2bgsm-txFPk!pyg?=xY9<`46QQ9mMH6QUaCB0`>kc>+&aV4EJ?`<4!e
zR%8eSnizD&ScOo>;J~M`kbLd;&m=A4!8$F<LAlu|;Yc!TdOgu{9S-M&Tb9y9kjMkv
zHk_3zW>jKlki#G^B!KLb4?N|yI;0~q|HsuV?vdcMwt)H*5>$HC%tBEYI~ZZRPH}jK
z($HO=ZwtS!Y||ysagXhc*NW>0@3fsD<R=(2&4_s*)kk0Ak#b&^{~uSw)ig#3wNsrE
z^BRD(ok4>D2WMGYcTVSAeD6;ZPG^&MNNk+a6Qgq^l@>=#b7b$5VPQ#jx>SGwzaCpd
zcMRG>|2(7sb*)NBd`+tZIYAi02cq03i)2con(yrKfW(Z&g*ai<DahGS|7hd`hr<x0
z4SuAH#NeP*Qy%wBbGXJ1oL8!^wK&IOMO(#nhtJ>d+tFLD9*Zp$C?2gbPvVL1PJD@O
zev*qCtl+9Ls^$b9?KqhgYBq{Amo`nifXq$q)#q4BQ;>z^o$=7(5)Qv8i)07h24Y}t
zw(kf&UsrePeWG|$4BN70Jr%j#h)JJtS%zuY=?m>pdyiBmfi&2{J=(gyBnMI-Ns3$E
z{Dgc@d6mEC?~uk1wFe;#+L(c8hvZLj6>3Th8ut@f1wq@HDS`6<^cZZ86oX`%O5!Gm
z%M?#-dr-lmW9y?WLPtvu04}WC;b}CL?q-uhXQ9KLRf%3Sd~S<RYj^et4zo06w3)A!
z$o2=7Jb1Pg@MtkcWYCWAsS4J?%>)4$b<=S8Ex^cf^HXa=^c82+nhv9|nWQ|SN!Sdy
zp2l}u09cmvtcX!c?Khpa@mAkxL<+cLh_w(Io0NULuUgXMtX&JoVIjtIT(*ApCESj3
zgN;0+$;BKwI-tC{cxM|<ulDhReGmR9aoSb_l;O*u{G8YL@et>f_%^1;Km4NzO@Dpt
zP!Gd}VnnHdtm_rQWHmB^<I3>(tO|N!H7a<o1=VaN;hQKq`m$5@1P=V-CENWWm%h;J
zv})DKvZ-a%-q(nli*I)BkOu|Pr)_0V32`V7IND9R4g+^`R*}!nSq`<!CmiCZIal!a
zG?GI&;iGJlyjYf|L9lSC^ou_eZ^buO+Vkc7U>!?!$0b|^UK5<D+ptL}bHZI;Dmx8z
zK!2`j*_`fxSzkY(B2AdTI6nUkh<L#v*U=;_rOD6+8jC2xJah+DaT7Cvc2{n_1KbV>
z3_%}%vh$xP%~0I3TT~?tqq&~J5a2eVqKlaA!5600v<Cm<YH0TAFv8s^CV25+J3@eC
zqTYlFaCh0V4J#x)Q3(P!1k+8R1)DDzEotS*{P4KSa+(jNvD1zb!l(kdU1&8VUF-_W
zq}|4957p>)xTFq$7v#Wbx&=SMC#X<S<;Zyv$;v18Bb8tfj+iKO3&{O7=e-{<*tB39
zV=(wBmieF}9HJ{qBatBDmrES4FH9q)B;Jiq=ZIbU@j??!d9g)uWYdop$5W$9h}H5b
z(TLy_+GV-Gl~+E%I2KC-T_*dCcE@IKUM_d=nTg769ItGNx3^Zdc9u4_%qcbM$SeN9
z9K(R03nceZ_%Z*c1O`<n7rj`pu<O6D<yOH9v;5I&>>YT@4K8ny@|^bV5W8u}%+SEq
zVl22duVC`C!+62tG@~Pa2ghI}REVJt=d6TXQ!$Ac`Fr67nNkLR>+rDuVSaGQNLS?f
zfoceH0t^{v7&hYK4w!{3K(sTvj0IXH`w<qFEi6pDpO5F?B%+*WtToEg6uNcWzv718
z%6@={)g62Zok~pP^~_h#y~q2?i;0#QdVT@5k25Bj>L2R{A{w+&;k(Ysvf1q{<Ll1X
z+~bZ?yPc!weh9DJ`~8<6I3B)R>z$D(WRMXY4iA6yr3wX6#`{|>;WaaK{KGXtOW|!t
zZ;8&@v;c^b2>kRqQ);kTf8h!3X5i`cNzdP#oIlTZlRtw1U5xX|g+K8~nUgZ>mAon+
ze>FiB;>gQd`*nhye$YCCoQj{zvoeJxzid73@5ydYalWNAgU=$jYwPfw1GdcdGr&(2
z3DL#tI{51pd613!-%E+k$Sn#si?&4vO(gos^|BpZ74FBJ-&+oaY=GOI$JIRvKFArZ
z>`vXE)nBl)At@`m%n)qK@Jt1X@**l;5JQ>@EBB)}InNGACmrh40rZkJyV>M+jvj}F
zKSWK+{}DB-H5QS!HFO3na8oo47yn1pxQiM~oC`hfo95VgqRs;CJ%AEuhgDgvS6}ci
zL|RK!>&SSQI6<Z_PEgQkEO1g1@)mImt<z1o{978tFCub`9Uv%I6tIynaR?er+q1cr
zjMWy2l_hrP*2FvM3%ep}DsXcR53J${7oN4XN$Ql(xR13SF?$We39O~x6xZhdYe$F1
zr+we2$QM$^&s5`p%5j)pKv`e8$d#(2w&P{fP`pFF6cAnfgVbD#1L9e6l|akkl`7N}
z<VaK1YODVwv(!+P5uVw$)(j>>|EJVwikYLcRVS$|ORyXzTS~KtS=E;PrcNi9Zi@pZ
zc==SbJ=ecFn0PLVn<XGQML>EquU8~Fr!Gq*y+$ifR6rA}qd~+JA@f8{&ltG;eRYpg
z;i{kAW1Z*Q>~xPy?9)CW3L+5_U?kjI!k3#^5dQ~$x0WzzjX>zecAa0enCA5)m(#OR
z4=HTd$Zq%x-=TtZkQ-VA4?JsM;oPcFY8b4o(gy*Jbe4(OBR1%A%^4jGQ0OiK6JysY
zWrjYKXFXTNi<e$c4wJQ>!jf)c;Nxiba}7V=f07QDRyQ9$T~mWEy);C*S)CRJ>;u9?
zeTxMrWNLd3s|H9NxpT!a_<}FV={=%dzecM36<T*H>E2k$5$P6-9AsDbgb!(tBg&Pk
z8maqRh3Yeg6R((G6#JrI;G&p;z2&NaVHpO8c=EhY@$LzFXZBXj>g=&${c_|vCoYvf
z7a2VW7y>Sh8bZ~qMYg+_s_V}$;k3M69gZyGk7>S-Iq>lQkA2kc@LC_o`R-f1hEElf
z+T5IG^V&W#B!duhd8}m>!7co|MWuk1sNv<%$(+DzT|YB{f6dQFTO}I}twAoRrnkAT
z#+)6F1phHLN9POcmvKLF(*@)6z(;O*H|Bb4T9@H4#`fZvGtj(u$=5$4J8S{3Sm%Ow
zc^JQM8yA1`Yi0kFxEilsM@R!E4ZhIJ_OA2!rugVTDgI$<D#bR<EV8`2Uq4X+q6ZKA
z&6I=OP$kyJZ%Y^+LZ|7>Vg9Mh4a`4Gjou27!;a|_qTQ^Y&wor!cEKCX4^z`$odQtk
z3VkScJ371XvcFUj5DY%sV8I0Bk^r{S1Ht8q(Kh~<G&C~sX#I&lb2RH`YqvjQdLcmR
zyzLkAJ*Wz;?r)-Q^H0?BYw2nd#;0((nrON!I_cIk3hcI?Sj1GAx#V;b>DCzv1LGhh
zY3%EkUbn|MEyY|dZgKuT55q6i{xxh1A10NcZXP3IXkK_hw3?XO_me1mC8Bk;KLsuo
z9U<D{bxGtU$Kk&(WdUMRdt{9&_^OsrV&qPp$*rpuAi{|bUgUpEwrdmIvhi!Gj^m3P
zJgJ&59T|<RNs)`|&v%uWBlwO^MdxoI(5=?<SN`}^dgxT2u<=vvg~NHqEbE~%At(Zp
z@<9xqBa)5uQmm_KCiw~Hkt{@AO2kREdT^di>6F1mxP(sU)lKj5=Hn(sh!7Vda;qyZ
z_u?wykK&&B!>#2nBUy@bh3o}E<^4iNN)IT^`?k#Evo!ehyIaHhU<KPW*ENA0UNV1$
zc7=wMHtf1)xbMD*DC|o)bM{!i<yFZ0dm{^*w!fA6j^jVMPR{-C*!a^^QE>#cG&ok4
z{)bpU%i`_%tAuA}&GGT+CCb@g1D4`;Rme@1$6p%@$I{;O{3q2i($vOPu>jmi$&I=D
zH^MKpG^ST>SW_fkdjB3{HjL*bHO~he2IW1wk%pZu^I3fxcA@|{34ax3$0!#`IXLNT
z07cjx0Zqu`@3Y4Vau=FwY1xJ;P>7f_qGZT2egZolRow_(0u}=fy}*r4iIvc(BKKF$
z&yUf^TIG5f!fvntUlXT)+J8n3&^f~Y88r<588x2T3{<`@s#Y#|b5eMM)2PT}Sa*)G
zeHs*pK~O`iIYtGQv0UMun-<RrM`?L`A)z`IIawvupl8H2)_3}eQKrFPgm0?TD#^-f
z*|>t?RoG)G^2iqs3fzt%5kpV^AvH|@AvJb`5g7CR&XYaBM_6=1bl&fQU#BRWzPzy|
z{@uY%tqz;HSwVRBmzK8Q&-V)igLCDGU#6<<{~<Lm(9|mBf+=65lDR;K(APtUHE!0^
zey+yhX9awxSVKQsac-n;to<$_wvCz?%iy07&hqJ&fE^o9d~zd>&~YnLtkuUG8SAU}
zIby>I+U3<r)sDC9A(?IZk`@dX%egJZ;PK>y^k@c}Jr2>U$J$e`ZxG^>K)<EO3G}wp
z9-pJ_ww?)rtqu<NMqLJiPbS;_9|b-X0uX~b`-ryvlq$AN{22}RB$dl=TkpNVvis1Q
z{Pq9kIZxjRktuJ)h}Jdt<%r`kKwtUk=^>Fhzp``roRFlBr@+@g&REX+y!mO=&Xc7A
zaaM5`i+~AK6wOBe_Q7uq|Kups_ogako}aRs2(DN7+YSAFx}J}qCu!+?pI%Zb1*~GM
z$;FAaI5MfmWN}+(bj+ZPl?Y@1_0EM(x{39AL~B&hGF9U*8=|P4IiUW`8tODs#_eZv
z)(g{mB&^1ePv-OD{@N-teTsq+|9y_3ICPr+!3HojZMKoCz-oIl7Ks$srYf&UXj5KR
zVxL!@s>In?mfMr`Ue6y-z`<Kl;)W@$jw#2(Gg%QwjSB>}DXNg*&s?G=h#Ao1>B<L|
zv#!NrE%fYi3GyjCw}!;B3p|J1;9Nr>Y)U+kc)x%;;tb5y7H-4zSyoag=jhJO6c)Mg
zYZq>7B}zP6hqA`vvj5YnUWQ~%Uw70otYf;dd2ipgs-8ad8@%QIT{Gu&nPZe9Vek!R
zSCe*=+ld<T37AyPE=-$j(2<CpHek=<6;G5{#|QlPk<3V5!7I>Tb2}wxr^%F!TKT|S
z{2g?FX^^g8c`uY}yS_dyX3D>_6|(q^PF_&<g+fyE@9Xh;jps3ERc2+LmlWTVdP9CX
zZkN|6=S4eKDpy2Zz0CH5j)pzwi8S_zv)mi>+qScgkn^|OR<a=7i=k~cmwF6gAr8kT
zjA_%niLTjDk1meRl~h*{hKneK{~<N%-96G6cuV6KG8ukJU0(Z8#jOqg=7|&_;bUl;
z>2#u9oT<6IjRM`^*9a;57Ou2>e=Jx64}w==?ohbm+=_|7XkA6Mvb}+Lt&*1zSZcc>
zQD$KOG>mRKf9y4+U|=-yi6Buu>Cqd;PU}l78?EC5m*90m8+c0Sy;frAre{JCw<Dfs
zl)z6j?b5j;#x}3ec*YkVbSCiqx8T<XYDLR$#WCxK`vVMqkQ&xYRXn%PcuiE@aMV!y
z^`FXOFXt>czbs#84cpzC_VsYSZSO8XI^Qu3X_q}0?h8;Mwaz#a&*$@e|BiK<er8SV
zbIiwM#qjFI!bx}nSntm^H4&OgSK;JFS@Zv4@1CAB@87*o$F^<Twv7%uwmY_M+qP}n
zHakhjw$pRn_isI`*3{H&%u_X0voZfdHd4vwd%j=C^OJVt7CU74#`$JV-lg%{B`jhA
z8#8;pi`Bo&6Hur1W)+$j_*9ZL2Eb)-w(TLr0H4Qmwa>jk3)ei(BRBP@F_Opb=!syJ
z&l|erN?(d<;T`W^9NkSz;N$J@CBd{~MkQn0&gfKq`62q4<=3{JT}h-pyC%joz3IRs
z1YZ~khGCF|UyFI)5ie~%Cr<VKB+?|S=f`z=8^|}i2cI1!ak)k)O1xgvRAA*^qPC~U
zxaFIDM<J^ZxsyeCE%%*~->+%-UMO8J*HXXu?G46qL3v-UrGIU;xcvRb{|Ja#?;ont
z68aVe^$bQbj|hRnK6Cb@MN5U03M>H0BMv>kzCMqA99EL_>*icsDuTS<R7_?6zW?xM
zc3S7O;?%=8rM;l&L@n|$*Mx(nueVF=_?Ask(PHNRY}Nq|ypgHh^CW_5j85@iomGt#
z{<=%>+wV#-KDm4-v#Kk#QasL4^+<Q8Q_RtQyQdKNB|JL%aFU?6B2Ahfg~<nc!9!Z|
zrq>?FM{$Bwk_TCDJZ~ul^RTiC{D&}oZ$ke#FnO@5m3Z1nn1Q`J%TbLF1*b0Kb%{2I
z<~G|K1*n1tWl;KJZCjy*z(zX_rTb8dfIOWIsV6CLKHaPz7J$^Km!8v65Ue(6(cvRx
z!Nd(#EgLew_h7}w6NHc4Yr+u4H_0LPmq**M!cGbbl(tD}FQ5`YJvgDMcG`o>nM-xz
zQ#t`3*hC;Iq)^&hVBPasB#EN5lq{zlQ3R`o;kjxwSsp=I$0-(ZZv}&@UMS;kYS+&b
z%MCD_)OS)w7_R@}`B$mg0P##X{*O{4R4J0!Tm(>RjFbUNjd=-e$0r#;sVNF`{Et#I
zK=L1@2J_)xrAA#n_<t!irqHj89ss3g&<C-qidh6C5~<QgO~*Fsrisd=Z9tn?Ztg4{
zn6@i&1O=efEV}cZk)G)gFuWj05Pe~e{!gXGEYjE=e}RwbRgm@yPD=}JgoO}KmRASD
zI<S6K_ZMZomMV1}S-LYZLdD8>2zu*2n6@)tNV&gJgMdIXZ3PV!;?7smVDNM>Q<L+$
z>2#%<hYV6{Z^Q+ZFqE#2&GU$eKMXz-Ft;9s$Bm}C>LVsBI$dWy8Eja?oi{sV1yTWu
zajpd{kIrgcf;3yT&7omBy*XJr4o0K14+>!ylZ38$b~I$~1X$v4H?)!V4emefWRj5_
z9=C==g<Gop+7J;y=?cs7L&ERb>@=z=z2Q^jbnNiNFX(u1iLRv14MK{ca!U%#Z+ds-
z;MykXm$p*4nILEfmgUiDQmKVj4bf`?Q?8Ou3nI$>Rwd!mw-VhQqqM2meAB~BXR}P8
z0d`8Ji54<(A=bs!p=HWtb^Yb2r}1l~(bVP0etN~aTktGS<gB-gyhFfxkPrd`PLN}f
z+^HK%=7(tUWT;`ls`ZECkIh4{dSP79RYyH!20s$ewjeh0qtn4)!Yd9NY(g)%$|Bm(
zzoEyYxci$Cnv<y?g0@jc<#78rgDB8hW16ukY7ou?AM{X&p4{;DqDuu3x%$YMn}8qA
zNs;siR#<e}+z!E0S9qtAD3+Q}s-r9?Y1(16^_kPl_n%N)fI?jM;u9R1bHFLq=OZ;B
z3l+#e`Q%)4yCOGRC{*in<sl?vQf={(SoFsCuR~){mHnutvaw`mD^KJ&8QcScWY2{P
zJ1W^r5v9wq{3DhsGyh>~#(uCp&aBgxmGU?&-UwUFth1N-e)(L;wR5s9u|@>kQJe-H
zCxF>X@CWr|l*7n+o8q<Id_P!qVVjMDK?(?I9wCh$ge@y(Djxx^!SEQ(tBZpR@>Q_}
zTwaWms*^c=Y~l5f244*8i|Nn?C>N2@AdL%5_TkXWaQY=oR>{|q*ZKaol@2}WGRNfm
zM>(QbT)CpS<abBe*#9(>4etH&iLb2ctK&72!Z=d-ITr=xCj^1G>1&t3p@<+QQ|f5^
ziX#weiI6}8$31AsEtn9V+8T<pM0P+RFZHvNsJYrYdcd}q55=}OIk1YTBpn?`L_`*i
zUhLsWaz*<UMSij~o>!?G|I1D<Jv5A}Ed3d(35A?%&xpjveADyYnCX|cXIpxyyz~+7
z^E532_qz16qLTW4Mu-bP7R!w9xHNSl$=e8`c2bJ!Do-G_Rb{fG`8e6ZCh^Y^W$2ET
z)18jNLC<-Du86L!;78nFB9@xh9dI>&FwzpG7uy#BNGZq*bCOkI=Fy+j{p}|r>N2Q{
ziTSOzlDOim*q4_jv}u{eAAxF!Hn>S6xGdE`6)FeAu+;ls{#E-&7aCFJ&{-jXlCR<W
zSiEFuN_7CA+Ck%EKQVnk8M9J*6W_<7Lt-^IvaA?ND4M^pBn>K-A#cg8KG&-9V*T2y
zEHV((Bc2`cZ1;dK*{W#rAc^<alt1kc;W<`q8udz(ahA1<guIO1V*KPXO>d`x=78p3
z3>YQ2ViYl>^YOLp)=E_FL5uL@;u6IOI+9EI=CUzq?JZ^Q=AqnSAcy4RtMblU9xN}8
z;SoQugB1z}`r->kod<b|fAX|9?(sF<KS|0`60F0jJom6JSg?^1qY_%bKAXw|m>MO1
zSkZr(n%eW-#{$rQn3@s*Q!_TE`43ZLh}X|7XDdEW?CXX<G-d^O>Cfwah+};9M&?^a
zEB^`mpaTwIYEqp5OwAcgZ!a1J{yxulOd2vg0>V-rt;zl2&jp8=ZC9yl>%vhA7#NZo
zjmIVX2GUU>TfnB>vmAvV#1d(o@YJ+8A#X7<{u(vd2LdOe>?mK2+~NAP6|rPJ(>^k|
z+*vJC1MqJxk{+>ww17(cG3I?qdHPzidFURG9zDTtgXkBBRq+Y~wBLa>;wg)%^-|vv
zc}5M{AlqltElt$JcXza;H^G{)>KJqYr^aEmk>DA<*#R9{2@6JI{pUb!LyCz-HOiu=
zI3A|1B^2Io$f+}%_3T>4k=d@>Qu*c;sJ4{&_AxT}oI@-hSi?E)Vt?GM^%=kb&b?fh
zpQ%4F87IHJVM-U|;89a9TsB-T8O`nREa9+(Q6etw+g2rTBUCZ1ZP#-uCz{H1mzo#D
z%y|U_S9y90^g@)*`p$6H@mguy^j^I)-cbXFd?Y<>2ogq=GG(w6?TFu+MP%DzAP1vv
zT!umeukI6aGnFA;a>Y5<!hWp!?`z%QY>e@rOZ&+MG*4b;b7gy(11-nM<^6p_;sMC5
zC?JV~*Uf_kMX=d2%aUdl;i>V`j1zx*SueB3!vqC66fTwRUM&aFqRMFA=~N{?j0+0u
za$e*(=ciHa`pDl81;aJC7#zaLJ5YaiX;_h#EkkO0(8>eONQ}SgZblQa8JB2WbTv=)
zhEEdmwj}WV;&|k#Gy;H`pL5TljaYn<g`lg_1@iCYCX~`@eVV;*vH-7>muHxOofrw5
zz!XCuGgN^jD`Wziq(QCMVb@Vz-`}|$NWja1zVB-UimrmKiS=9lM6#R^OUwbpzO%Hy
z#Wp%d+zlNv7~L#?nhCHKiGwMLMLaW(wZYxW)n-GOL6Q6Z{^YWyC}Z+OzAo|17Uy+R
z+p;*|z^Dr?4(olNQyVZQn~Jj&H7C$%iQ97vL_RpkDdrOUmCaMS2-%VFGJ(KEoQ#VK
zE!yAzU;{Bz&i7-wSW+%Dbn}O$@psE?c91eVb7AUKgd}3NI<Q+oGfCnsr(&NnpRx2w
zRzWgfo%CB-K{tz?^ikW7A>q`I*I#ZoU+8;ww|`YUF2Yk3ZMSjv))|s0^t<b;@`Hfg
z`jJglE6GhtdUky=DdVo4q=H?~z7HNAX!rC(nsr~_I;M&ZW#Iq3FGH^e{I^tdG{ty(
zPku4|{YVF{$?IeH_WaZCKnPUmtJ_{)Zfvyrn~32VF^xEs&xIhb4{)82^Pv95_pek#
zujs0Uie05z!|Ljp%>9p4lfjR>V;cH~zBfX})7L%v1Vxy3L)M+F=g&O4p;Ma4_D0<5
zHJ4ZlRnl&P<It8}4R*4K7;7Rp$6rSAC8D-zn8(1(BN`R71VV)iNe0h<2ufqw-m@n7
zl!IO8MYbfNA~#>qQ>z+Pf>Q3ma)f7=<!E`-Lt(0Pp?J_`pE?1a5mvD(wlTO?>$v#4
zXwoxGS&s#Cy*c7uahX~A5Apdz9N@X;xq^}FzoeQ`A8H&&$p4mVMmAH*rh{^WzbJ!6
z_r;}@j#BId>%I<KF#u9c#6MEardzGyzfw((&>yy@-}j1$S80S9H~~O4Yj&HA0wq2V
zR>7mm<h}?b&W2$9Y}i^0w*aYTp`1uRa?(0ne~!AZC62LLw?n3df$Ud@gTco})UQhq
zB8$Tno!v8pBsWLw=UowBM@H6(lR8(Vsd*Mg;tJnjiFUa_utS4+Sn$LZ{=oGo?Ty&#
z`BzO#lwg67L{4T4(H6T>EB&1igH6mn(<L%GsTQ`Gg8Ve6gFD8)&R7pAP0FS<jJ%vy
zecv|cZ(`lkm^uJeGh%m@!I)fJB>OK_)B6uq1N{$GQ(sD~^e<Ht3ek4+`jxkf@7PM0
zovW^Xh=+7)3^@;ZtEhQIv#qI$+%5z2I>^o!6s+-wYM(IEn}403hYAjK2{wam7Mqd<
zt5R2oyZjwBVhsFi;y@)T989@gDH%$qL5@ejQZ3~5=Z0&NIg@#~$G5_v!>}O3+P*g{
zfT~fR%QwjZP&I5;3Ds@@s>aoZ`~u%ZA!{)pHQCSeIVs2fGfEIZ)lh!jm+yOg-o+uV
z?GLA=tYs$XScm~FA>4!Pl>laDE4KVqIayC~+Qxp~gse-ivs%0lZZW<6A=yd9-fSyo
z0+?}ppQ$-{gFV}4BRO7dTs-m#u)?hq$C{l2=K6fHPGI(JxUNT4x}FdlZp#Yk)M>_)
ziR6y9YI%u{vP#f35z=WPIy^{8<1&ziEH%0I#eot^!wJin5cG!Cm-Vf=DW;%s34P?-
zo-`U#4$zgz(UL#0iT(r{OyZ%!y|fIzm(fyJLE)IBOi2|w8?sERJaY|$hU9bUgB7^q
za1R*D!|;Ir2sdjTN2FB|%xT-rLd&CJfb%v;_=L)*`7w)=p_%T^xD<iP#9?Bwem2D9
zTxb+9z>7m+=uKjwX|)Z{l^`(`>T3X1y+oBwI?GJpfFUVLDCI9`!zpg^NN#VR3ysq-
z6FD|&Hp~Hf_oakU)1){^mZP$6jXY!`g&Rihy!iEF>Z~YmotsHjYXp_14JF}J&i>Sv
zeGY@=$f%?-^!5!(l?1aJ`HiEt%~%JwXu1MgwH^EzUH#f5hq$xNahAvEj<fS0tfslJ
zx#wcdZQ>FPm0}YPwY1{E_VpHJQ7-)CuR%Dbp-9X;l4<=8PJwHa)`+pyuvyC)v)z^u
zexiZ2PonnSS?bLwq^y@JebAV(L$b%ALsJ)wazn{OI{Fj}Z=B9ie2LDwSO?4av_&yt
zSX>G|w*|UAj9dyGpE|;i=1lvk7|hp?<SkOGt(dW&31XWNDlfaXN8!kKGbL)&mD=Rf
z-ZICNgAA+;YEQ!E>|u;@rrmT?OvS?D3ZpkWkFEIN?3{Svi#u63^)bgwkSbyyw&=0r
zLRhb8Y~mv87Dq$4Zr2{>V;{WFHd_fi^-eFb%OAY0S3Z6pqfm=#ODr~gA0DL?a=?%V
znLtBYt0%05On+w=r5gQ6L5ve0jYQ;5nDo9$AovFPm`zadN@$(6%GI*gz-`c~A*(D8
z92n8!2Houbg5F6)X566IU`o*7)0gNdEF0dlk6SgCQBQN1=pabxgxuqo7W@ybMty3;
zJLg}mCYX%}3uoT)FiZgLAqWw^&1A3DWRlZ>%!}}SNwh#&kZgm}PqTwfcIf^y`o>~U
zZpD|+c#f}<apw^U(Q^XN;K`a>mHZ#A2CRdH2m#1rB1%lk<&QXCa?@zEP<C6Vbc{n#
zD7fv`ys{_(_d_k4fou!NV?2UpNWESLf%w&SCYME)V=T|#Oa_9KSO|bt6S3qf2heI*
zYB=3YGaub-cS<&>b(V8GR6ag|*P3*T2HjlG`tF-u-&sGlivS-`@InZuMBI<g0KM=`
z8{u>}ET%rJA3n#UXcl4vH-1HH{=Y>P@${Cb;lc?SI*0i2ME*J*vIa&QLI0n$8oo^0
z!~dz(boFNM8{4)T$g2Xhn!7g}xh;Byo~ox>%4bq2SGM1wzZj^<L`&<Z6q+JMerR6_
zxTqZ!TUuvXbNZifsCKdU@O_MbpXJ4F0w}`DnpVbK{Y*chG`pS<s=5jZjJyf0skR|b
zL9L8V@g<cYtME_(3Uxw)t=x5)FZ0^YG{&EmLU(7y12)-Skg&&x1SvVnmFMaU$ndwL
z1eoEXt|;WE#8!za*aNJxziid7^@k?P&Qbp1YG46ejeZ;9UbXy=c1EY!+zkC}PLV2=
z5NGt7w@5hg-|{v3#T$?kc0;fkIXG>gT_Oo6%_Ui(`OSSVVLK$ygR$*O>z;y})Ov|T
z(gPMYd!8Q<mY^FAWop=5rF@UhCW$Vo{JzHzDU5b9oj{0MWZHSb+<#}UBE+}gipshY
zdJ^l@*<Qn4K_9Rv>%Tez753zG?ppaWR$-v)L~0(EBqKZMkg@$6>0*f1;Jt^|1J-QX
zl^G)qdBl!*8l@$B2o8Rp{i%Gksr^BJA~9+>^xAqUMXR=w;l8x!X~iOd1~yA@`fQmh
zW5vH|a<D+ewXP4o!&8HV95@I|!r7}FM=%IygsuO_$y6<JX!MXc_@UD@Zt_5<Pc${Q
zmo#E<lSv*Tg~i?M@DyGz9+Os{X;P^T`tNJl<&ZbUTOvCi+qPc?0{S1MZs(r|WAiZ4
z$nPkY=FA4Loegg3=|CRQSus6^9hI*O2-K(Ogz4`rBv1@bZ7yDnzUMQ=cntGiaB_H8
z)Hz3lF0DR~q3}iPeVxs!ARHI0`)0d9W-$T<PQ&vk&JuED5xZiRV8Ep82twS%nR4Q=
z*Q20+PT9nw)rv%&rksFTA`sN6VviQzm@K14!-{+=$quiS`6Wb&6h!Q3bOo@g=X|+b
z$d0zHYe8nc4?W}#mn5aoT68j+XX+J`BTJlDW7N&EAr$%T@)sidI5I|_@EaZaDbQkz
z6pR;yEb>`IYKcZq?MoiQoz(1anNv8a3ZQ@VlWzM1315P<rbufH1&#ifRuez>Bj{`P
zQvE#mAQX`iCnaNpD4fZPh06BF0k2vrU~lEAvwM*~6YpRhOn;{Ma-Ol5E_qde$RhCV
zH~+V+#xh~!2N#{7DFodkil$7kJiYi5x=}Tjt(|#-kNY3spzIX#$nZ77i_LTHNOFDl
zFdb<nEAge0M78lyaIT!fd$NAAK?PITS#9JHX)SIaBsLPD%!<zF{i=}Jj23YsSPYne
zSI+AId58QGBWgTWqTzK`C{v1-hM(5zVZ%1;oGjsJ|7tZeY7p~~p4No6kH97V1{T7E
zQ25B_7Z!*LeR!MTRRbD*bdlM@Z6d`}g(KGo4=R5<9{eZrGjkOQ4oudV`W44dD5*>J
zz9g6EfHhpZ;mk{9>D5dl#tQ1#97eW>b9*2ySoa-v=2a8KtTmy0>8(>M%#SO5k9P!Y
z8b#qQ{-f1U0JIuifL7Dx#+%pJIUxDN4=ulU7?(0e(I@{@qow$JT#=`4J1TLLy;AsE
zz|wouKU&Zqyc3Zv$D8~~(nS>_Al=sV%_<hQ@3dv>((3KkZ1Wgh6SnK|c-8_@C5r72
z;S}v&o%#XF>fDenMA__GRY#$%s$@isT^I~mnBhmT3)(2><6d=k-VWs;pKASe-2GxU
ziBJ}UuCs710m?dy|I%tyQbzyLYMyf%c*cz9cRP33;XKH6f&WXZG5ANTQHT7GR>Lv(
zeU<s&T8;Fwal18Dwo3BYuPvy2eA5P8IWd#6;00_Mh<3n;`afC?9Xv;2bO}F7F=z}v
zTxsv}mE@PKTtham)B#N9zHA)dKU&SOPeH&`@k7QP-%nRiibH$^m;^5@EY}JmrjMJA
zEyRm}N9{+f`n`+JdLK*yi0hAD24|e)X@59EJomX5>m5Y{-<sN}w!r>pWZx;-TSo1@
zy(Xu*F3?jSg&K=-p;b*yNiANG8!uXIVZB2bKcxIH!uMTofT3O{y7*LR{aZ6k(W%$^
z;4HVA^y^3LNAce~M8Y(1DNjexVoaFHBmp`Q@74YAVzqwiWwdIPOz4=BF^NTDf|n!~
zd^R0Znj@}ciHDIQraaY#{u=zHqesn=eyMpn)}NG-C8On?J1kKqra+b81s4-!$ws_e
z!fj<wncZ=%%2+7UWx``(2I)}JG2$g{5|d^)5jyZ+vw51Faxkx&EbjxsFRgn(n-2P5
zyHuBca?1rk!53x0Kzh)B-He`F3x18D{L9rC{ma#$%$zJ4TmFZuVE}M74c&PFu4aH{
zCgzw$&I0s7V$!*HL{w9JzE)S;&29uyb6IEGoAOMIB=a;1gcL&Y7WCT-^ixKxt5wWP
z$=w=SXK8mvjrWj$=L(mj8}v&)&iTdEXvgP^U$?<V!;OP^aVIZ)WU;8>I=9w_qo;%y
zw~gh_w;9*Xs!vViP5><mKK?Hf0~VzL#C)$#fyK|G*&+8}RPM(vcMZCP^~D|ZsHYzM
zo2$WK_sbgiEn%zRUFs8(azwiTQY@iD$dN<PoiEV#h*(WOx>m!#ujBePaxOsd*IS+j
zTV7PFP22>yB;tp6nuN9}LpW1voHqX3CBNtEGY&kQRvTr|r+B(IgxUR2=;asRzU)5#
z(mN~@YkcAoVGZVYu;)lt)2NE|97jstTlJ87REVoLfPt-5+QjZNJ2&1rFAa!I@LC$j
zM?$w41!`RWsyD(^y$p#&_f}&xfnS+H@7lo_*d1kQU0>?~pUZpS@GtXVgKUVfRGYqd
zetLWg&@WJ<xDT<1H0$azB1&yL4Vrz1=WD_wLLV`1*uB1f$KR|c&cekP&&h!tNU~>z
z{i^@D9XGH#A@$(GIiGFRkbAwmQ0wl;wz;>5-qtDkzD{K@Qd??=%cP*O(%1I0b{tS0
z82I+df}Go$odnE<;txVdGi;meFOnJD$U>^YEKncXP^*|wXjV7<IpWtn?RbHi&g|l|
zg{VYm8VrT;D*pJ8h@fV2MUY$a0G$y(JPq+<B$mqi+wm4!)+w#eOo%dSi3o$bS5y*9
zI1Hl#x$#y*0tHUGL<$`!t9Cs)AH7!quE92AmX?o~rFAR?1sE_&m8&75b?bC}+POGe
zxL)uJ8-gvEc*a%aK~$Bzj!P&3Hw|gJ@uAMlK|fI9w~^%2!KYISR){^ABB&Yx1G-+H
zREqu*(Lz#(%jA)Jx-lvDLR~4p!B?&deNhXC{mifXNdAX#c$T;hy<5WKpqg-{3?snj
zDd)*t{sykUh##xeAA>cU)8bk^&u^sZ)LK6WZ(}M--mtmvB>aD}*D-@f=HDao*;anU
zQAd^1PG+;D{&&``=F2{VLZS^2R{FRXySj7uSZZ>?uzvLKRD&~4Qunr=A8deo8@Jmv
zctc4~)6DOoT`^c?KRaJ{7qG!I6+7cJ`)k1o=13Oqs4Wst7E>%Jofyd>nan55CX&iy
z!t?~4q}r`kES;KEm&ipe3L}<Vj6+YRgt8qfog96EfkdqgyePs_V!%gLP}$}Wnn(jJ
zS2e#x*MTBbooV1O8Ky%*Wtz2LVZTiwUx)muNb1v1{7fe04ajPi(GmrX3tr(>&Q}kA
zx){Led3HP+VGO=N6D=$rzKwK1Hm}jp+tQPVA_6Yj{g@<2msrWVfWZGg46<Xa-8?p>
zoWUL>iti^hG`LFeTRa*hcmm!Sx%A&|4HLq}$>j5yWC><LA-7=L>`9UT9t{anRGx$A
zWbn)cLaJy<P+5|VXBY;~&B0;zijv%g>ShR>AZFl^OJq0U#UUN<HedeShS1Xzf~f6B
z*+6g|4HJ_~GKNu63Q1n_{Snytd8pXJc+;syv*LWrM4=n6MeCA$C2=3W#G9yw+RgSo
ziBfsGJjWdry6G~0NRHV4D*m9<G*SeP*#1ZDFkYPPpom#4?gMFXe7LNqDVvHV1i|6p
z#z0BpEdPyawUU)BLK9YNWgsLvn`dR@?jUF)SHN^Hqcm&5haLE7DTGcrUqS3F&OW_H
z*EGqKs7d>nHNGhqN#FuW8{8KBzVkF2E`2e0M^8-Ed&wEM#AqaNdmx@q=Mn;f00`yx
zCY&Umgd*Aku)#s2;}LWm&MfR77&TZHgg^P(vF5qM`6Ho)XL9W9T2K$nB8xnmlC0;R
z=*!%)+H}E!gTa?Jl6Jg(ZW8leWMVx^zk6IuSjJ5t_BP)BA`EaI<$T>{Y*9H{Kp*`~
z)&B$y?3vqZnFB=FJrwweQ;+l3=lWa%urp}Vkw7;|&L^Mqj4k0T#&WMFxrP&c(xWJW
zfztdu+nvuExUh17gAhxI4D9$2xD$saxj3+;+b&kOBD`c!u1YB#(F2Hs5f9;qF@Cp<
z3rGPl4b=mPOe0W`D3}ZB(60~&7b{kW0`|nKiwi}~h0|1bjm|pGR`T}tdOg@D@X#kB
zAkp;GEtH5cCCHDLU+jCMA0t+sd>ug-YNprq(qB>?P+i0>R5^q@LS{Pd{x?L?SHpB8
zAL25<euGQT&+3HV;unKxFu04^pceM~A;o%-ApG%W!ikZX_aB$fpUF}-3P+b|4_PJ|
z7EWdu^{jVq;F*67oFmMULXXU+`04vJGRcN`BNFQ|{v3vJ(pRDDceF&pt?sRL{^u(;
zUVuA(s6K2NW_t^aIdaLK-l6Ux%cR{21T!PNt*e*L9^2Ah#nVzZkj^cx(?-<Hm|wdw
z1G$?Rl{CVAPs+}%2zL?IcmC>BjgZ5*yN+`)j^{VwFUD_=wY~>+ov@juP5HiF?aJPF
zhhkssR`K_DNz{nlYcVz)*|F*cF?D5!l`zpZkuOJ+iKOUUvYZnLvE|h}Iem&O`4`O%
zxze^doU^!>m$5~4o}5#Hoy9x_x0lg_2sygznpO$;(;8$o9YTK_!=2g#o>o&L0>aY>
z))AMo%8ppOYsbe`8zo$xN0A*ZVm=qKwwUPTz)YTbB43>;If1`1N`-{WZF)=B1$+b*
z`l0e(B+QFjNi)x4Mf`W)xnlZshul{L_RW?rR42?=q!9_V&Weh2QhOF|rFC_TJ3H))
zaaGUe?~4r!V%1(#7JjQ%cg=)2art_do2U%|wZ3(hicm|t7wlKihMQh1mUio<1yy#i
z3%y>7J$3JXWERq`aIKHe&t$EdI|~+R=oi>mio^VA0gP<Fm|s($3&Dzfv>U&Ab}|-n
zUoBeO+=g+k2wN6=bQ`XF<>o9_&#y0Vug<iV$dPXdPR|HV&mXi<bZw+9M3*onFtxEf
z7mm{)&5Yvr4sMz5uqyP;L<e|e<oLvl;cX!X1PmY&V#{ijJNfMM;%peAPafVMy_$hh
z38RL`hl6sUb3YJRUK*~Q>2{ZTN~(C5cJ|mCO`CZWb{b3U*cS34Tqun)=t=pRX>+cg
zazA1rj(dlUuf6)cq4hUUx2;h(>8~C|+0@t@wg*rAhE%X`_CU~jiwunm-3u89=$-$Z
z2gpTsGIhd12;jqKVUfRkwh_V0w3$&RpI)n91s{C%G!`gH;>!{<t`;e<*A-PMu>1XJ
zD>%e7$XgzWy<4EaqmM{Xp6+2*A@6nK)iNk(=0Pn%SijQt{2;y4^Bx>b)6)e%DppCo
z=EM)Ik76B1$_bsQB6);N0;$-rkh+@XC?V$dXxi2?$L)LJ+H(`tj8hmF3}yUji{<FJ
zvw3>#zxz{I`@!SUFY?eL&eW9ipf=tFD3+V8mVeeUgB6mVvXRaHvuK`nO*RhVk;w7e
znIec2s>N0OwXY^6T5Y_L|Kl62WVHYdZz5b?GP9~3x_&91lvkS}mB-28No%r<wU`&I
zIrO7E5aguHJn;<}a-8FMAYM^X>qwP>m$;Z^2E9^rdQVS?)f-RL8=+g*_g5@PIhuWD
zuz}`hmnz)$bB>6}qV<M5w|=nPe!OTK5azEdes2j;UTLyT8WV6vJTXG|^kP4$OCMo%
zeVHV(XTY2008<+_B;q?WAHK=2eQELmG0{<1QIBpgVzNx|2@;y^4!r9<zCpZU(io&F
zTkjnIH16|o0`U*~Bbap^!d9lyyTL(wnWKXQom7)ZB%b31F6c*b3-VvdbuHz3y}CM_
zvuGW(xEQ2rhD=Mpu2S^0SJ4|i2Ss}zEEo)z)VJ?%$A3k2vo!}3mPs|s^q6{ErWs~s
zxtzvTfKE<%A|a$k9g<649U#bD%IL=HA+Rfq@xh$N-+pRRcmJuhY-{|GW6T2pPGr_i
zxL|5KsfO5BS;+~lMSn}?f_t6hCsgIUG_n<PX~?s96RZ9ZCBBW5IV$4`gma$OD;c``
z|8m{m2s{2_!6Di5SPwW}cFC=erV)Og-7Bw+46fZwq1CiJW~F!}Eo6w7zjutyPtfU1
zTz19dV^`W9f^R8jcwK4n`NQ6-`)IKRS?9TjW0gSQRUbCE_$LW;%?HvnB}oN(TDa8^
zoDewrHcOGKHA*-i4i`TznQ*T%d857dr~x<1eLk_6keo#n#BHWn#Pu<eMWN%cmc=SE
z70u-Fck{A;>i`lO6uieOp}cuz`mg&Sp(reaerM?&sRPm+x`y~&!@m`Uv`&H{7#6!A
zv%1zNieoxIzRz^&*O!ONbiE>P2q9F>DVeoLvypHIo?Nx;<^lw}2ih=u<+Tw6HVgQ0
z-c|PP3*4LicNwp@26$y1Lk|v>(^<|NsgOm9hg#)Cj(jH@cw(T_!S%Jnc;ZTE!$LkH
zwPN)3{+t^{e~!}1YN!?PINY<XEs#znX3z5V1n8_!9A0SvA}MRg86IDO4l}^x6Q%HE
z2donNIUV=&^YN~}A(~v$f!|2!jSIV{QxUooG=FT(cAW}*K6A-M@JrkFQF<+rPgG7)
z*+CYjM<XQxpeiu_S*+_5_O&!tgoUOa1zQ-wA5uaNR{PH`9xY=XgFeI1&>5B}!(@7`
zN05oZx_m=;_e652W!AEUeNodu5TuIVv_?|K{E{?(IRn2XRA>XF%bJ_A$#i__q@fqv
zOPs(Pt7Q4)aw8n}RI0C?!O4>~gQtTyqjl}f*O_twtG4vcDOa21rd8C*t`>u^Nf(5*
z8qqi9j#cD1tsf&CkG<kZvc;^%v419V#QKmU1dAjcR~PEJLRbiK6XMETcZlc%JH46q
zzi%z2_h+&!R0fJ)o!~c9!o0vp92@^Q4cccQ2r?JfdS~QUDRViSDPib%I9^^;RJe78
z%vqfF6GwThYg8WH$)T687c`TGN~VjM(K!#(Xc({p?_9(b<`h{paw#}IEEojt{-A#d
z!4x1Sslltm;GJ3>Xbg$CLypqg1r<B^@&yd-7_uYt6NRacC<Z7J>C@bmCrT9$!F)4i
z-jEU^aIe>H7r3zuD|r(q`4VmT4$-BYM-O0wcMgzEG59O|a6uHmq{T8Mo8y=f&iEX$
zq$|8jp{h7EelHK1$`2h7_&-ZaCfeq0@=J?}?G9R=Dn|;Bb_EE%&=5Lwq9v>o{Muq`
zM=swFLUnx&Lb4@h6WW5_r?_+bd=c&c{hgQ+U;tAu2lXN4g7|I&lcRnb#?nH^pREDz
z;!66%&L{lkzvdbu!!eFuG3d>AFCTkbP*rLvZmk7^2t+b#U{*7!S)xPX;LTE>!Hv5i
zCVx@mOhp2JN0+XEj!3Z;nev$AAkfNtTt*6pi;h^Q#|Uj3hAxEH9Yw;UNn~v=5}ag5
zK?i(8_6JE2fk0Jy$3LFv6D1D&9283P<gFno!vwaX!%Fs}OB^rtwB`N4SCAu^hfByz
zH;FlpI-}ul=5Y#9o{Zmji-(I{=#xR~X-X+>1jfhRZm4z+0O9f+A3VXHz!V52Ow?h)
z_V$XYZ2oc9$XHX%PkuBk^hJsx(58t+TX<8{9>e>waACvb8gS1^<op8M<jF8%dNOkN
z8&(?GNy7f!@AuMY7&04+LSver!FhL>oE{<H0=_7}8#wL59WNRxCe}~r*H^>JXip<a
zP;U%}>@+2)IB~u(p-;59QKho9<7lV2&qdK9y%CRQB1>zbNRdKPvVy`Htz#qT<*DDn
zbW;*AwUX9ArY!<He?FyfW-)Y|gE@VEkal}Ai(n=~928-aKY}bJ6?@)U7F9YLwBMbB
zZ2wLpO}8ADjeA5tvtoOl7ixGMR!BV5wp-4)E9QAAC2l%X;Msc=AwrW)qvn?zEL@M~
zlzUQYzHf6>Tg`64rCCy)?WZI+cHiFmX|LMYykDU@bOKgL99N-1CA~3w@8-pVnVyZw
zOfDL|*+t#wotmES4-ISQ-6;+RI!=Ub4=--A5Vp&c7Qw}<LbI3*pu3JKijj>h?>MkT
zTt|!&=*vk7ZqDPd3SaB7mb?53+53rEZR<<z^&ysuNrCr^VLqiB#MHVOD{3GSF4vt5
zP5`N#UtO)&dws@#iGsvsXq17%20`Rr>(?W&S4_yLwd|l^RqWlZ4b#XF&sv3YP<R#@
za4Fdr>yU`^b7tS+gzbiB!OmfIS-*USeK`_NnSZK*YqD-%`J2xRIYV2(dgyQP071s4
z?7(Bz)!%OgNPozYLxSM6GH>Dk2o*CbF~pQ65M$bgPT2Eid>ri}iO}P#zs^m!G`K;V
z+6p%vsNFG)BakVw5#`lg0NHyYf}>b7z}QY9zb0*S@BplLHOReW_LN^axKeQ#p}9kI
zIQC`<zN$L2m<Wez6IRLkjCkg<ySJ7-L7tiX%4~^GG)ZX#p>AX9(4#Poxs_$fd(lXu
zMpxWy`^7cXjzW>qWzB?(lAO7r2DQe#j_`LUd;=>b2{>CgP$jH7FC>PAPDS=enq3AP
z?pOVF9_I6cL<eCfHL&!7Kk`7tg|Uk*4BI$SOh9tpzCQJ<bgF}gjM{>IqxhW_?L%oc
zO~J1d*aI&Ng3J%Nt6MG$L(;fOQVqd|fzgnym1G8=;p7->((X{&3t2#bQVU~!f|jTd
zpT9AGc!ytpe7VQDiEzbjer-L^u+r-LH9H>!^e~z3@uJ3{MEu3!l61kE+2>XDa=+$p
zDG+51M?6LoA17dqpbXtHK-pU$nJbSCjXvN7d6RI#`pku!RmOi6G{V|jsJ>8<Jmh$~
zyQUxJ6{*FcqOkDut20AW>;j<m&30!OYF8T{A81alSiU~A4&cxPqpObp1fK@~6MVY1
zU`xfinsbe7uUeja)($#0KCVd#8QCOM6g$`{2mO|Zl59q*;*$Fw98KNG;$Uz!<Wy`%
zu4^Cc2f|jOQK6HUU>0mlm=a|<JYo?TzU8LRQ~{L_z1X_;`9PM}#ar3phKuGc-oy>3
zsyGiL>&Vr62)C2JLx6F|xH$OWHhX^V?Q@yy;{EaAico$7y@5XH{yU32-L&lV^v))9
zLYss=Njnxq*#;B*;fmt7q;~*e1*i^QCPc0@&m+fHc<r`j2+Z_r#$2<4nh~@=jR&6w
zQ`?FW@?R<f@@TAH2AYPUsQ(#w8b+5Hj5GX6MXuGu1ovCJi!B-A^?<3l3t_5=fn^OU
zf?i~I@oHm4L!ZWlxS1sKzavl0kM0Cbf-R7HziD{za1cgwd$c}2e_lPDZC=?yAR$tl
z_3?B#dg%qeJq-SYOH4C3zgkDanvrkRM>;o^`W4=R4K+mP|HZ+6uM|!#a$cWkWeRm3
z$cy$Y28YXb%_Cv10Rp%_2(#<i0jx$ZM|<`F(WCpeF{`+^zuoqozHZQ*(+@DgI~UNv
zgu;tKyVuX965?ihHRW+g_aF51-o2l<Nh2LjEwYH#=5Ym99*w9hM#+`#p%!@ZI3NAF
z8G1bYJ)eizURJ<yJ+5ES^5Nm;3j!Sma{{3D5OB@J4W(bV8-RPDA9}!21zcX60v>*6
zo03Qa5AQVqF(LFR{LR+u7%f<`^BEcOoSO50)}KyZkUR0?<~3FE6_DG5{eR?Hq6!r$
z)};%?$}$(y4hcelG&ot`JdXYB+ok6x)$U}zNV~d;bosf%AE4RSZ|hg}_(5u8L_!es
zo8mSOYn!Y-pe8JU$CO4J=^IRx8C>YhzPKqSM92_Ne8-*ye_$}*x!RZrZ5NPPf`kuU
zT+rb{3;lQpR$gy4qphjxqYP9GMTn1Zr97A9Mz;v9{IO(9C*T?h*^a|Nxj}v)&TXj{
zp*aX{YLZ9U(LWdCPjRACOgvXPvS?@(Ceo`+Ou-&>5uJqO_&7$PWL5^{NDiUVFiR(t
z(1_&oQ(XYC(i6j%o1|ZXJ!vE|s8*1PY3N|RHLXvCFKgO58{pa(nD!nv`7w$Xhet~)
zH=2goIekkRqqz*j5iVcjORBj-<7K!u?!RP6O&dMA^YB)R^g=tTw1soSAuPOn`MQ;f
z>yhO!{n>@(+fg(EoE{M`3m-i(avD3$lQM_jI*9rsFX!}HwfdThez|{d^Zhsf6iq|u
z&wolv3py(}OBr?;LTgwB>s=U8EmdNIE<6RgvB7XOd^wRUj_@i39vZ<kqw`w~`m{6k
zxjrQm6)Q=iw-GfMQyHS0%P>cG#}(;ilrRv<ui<dC?6<0iywn*kkcXw`q#YuqTnKJe
z7Nu;le-GGIGK2LGjghk0%mTBFzKH*Ze=kiMB~>M&;|Qx0%g!`2e>>nOqViO`XpCTa
zGbs~WgQ;dS=R<RIZ5u6{Nd_^ci{?-X6bAtt9f-)>BVgkMBT{g$=m5i^EVmt}WV1Gh
z@0H*UC%hd?;W>M<^miK8BB)46O0A1Q=exmME=ehEew~YrjXqGTw0{Z5F(7&b6?N9i
zK!~9np_O(YV$*_~MK2-^4-sj#%3vHTAEM}QF+jY>5y4P(ATut+$Q&N`{K{YYT(`C5
z<b@>zN^qTo*d9}OGjnaIw=?B}zuZJ<a6J6Lh{%(0qleaJ47%NDu>X0c5Z`aUqpO=B
z5abgyr%QgTSD_JPmlf(316`jax0H~zZ>MIC8sM+`Z+@vFb4uILg;RQ~LN~`Dj~<M0
zpQffkq$XVO%D>PL6#xH=`K743UL9X|n<GGjNq`feVgKP2#kfUM>}xK|TeHnB&5uul
zl43}oRdtf2Bj(IgV4v5nwcl-#k9P7Bt>IhgXqALpHuk5ZnodW91p-IZALlv{-u5q@
z=!sU3ice+iHv1E;vQ%6l0s1FydPvZYH%TcE?|Ts{M<gt(#Qiv!HTQov%|!KTBR-%@
zSZ@EAb0-4fysJ|KLs&UWcKmv$iTMXHdhDZ%T)Piq=8l(4+-Hd6%~oxPYR|{Ab<#~)
za}Q=CWF;HBQM=D-pFzC_I&)hU9VpC#v;V6>2VT5+ZCjZ>hA40O6|3+WCj9@BO~RP^
zH=865wUF=c5;(3NRjxHFY@>S1h%bvt;h}Pis|X682c;4gSNG|z9=$EpJri3z#ck9A
zuCAM2i$9RxA3F%rrBGl`t;-+$;Rm*T3ph>IES|lI=VFq^P;{lx(?x2!%2yk?vTr3x
zHj?e9Mv0t%Z8rMaE(aQfUmz|@NGG_}CC?=CAHV@C--zVg07j6gp?6iL!2)lmCfFLC
z3?(;35j&(xRq<z(uM;wMD?dD>1OuEVN2M<wsKHQHcA2y!pwNazLQ%nilSUJil#Bl)
z$Hu7&Zq`HAw@PCx!~32q|CDVYrRcUq{l$I&(JYflzq7k7ki)N?)$v+aaaiD*fag#|
zbpGc|Ld!`yaEVpID+303DVB<QRzV=l;CU2!K<<$u+Ke#(n&NNw)2cSZX3_Fj5cL(H
zNb6V=<d27}%AHMHLW4v{nq!_r(vVNm4*Abv@|svC&rRxLdOu4x!YCQH+37!*$%BJJ
z?pWQ=G#NS$-S>nR+MPlKYD;fJ^+o@En!K!&l~&}t$*T;Jc1-R-4>f(z%KTr!BxGKg
zfLLp5G2;7W(ygL!g;VzZ0t^AzS6#Km3UromZx^A!OgB-`QDf_@(pcPh7gGs|4^(Da
zoEcYhPf1U&gBYxMa4EI5hD{R}8^I@3MoAp*Tt_N*FVv+f-Pc{!a?My?p^&kNi85Ho
zu_Z`)XQWtMqk^Eh>IR(>k^lQh5}hbPl;F^>-h3?5m*4sDf)&!6CK2fn3xM>}(Qr&M
zW`V;q$*CeZ%H|Zi$vODu!is2Op%Pbmll|?G6FW=(EDyQl7zYun`=@$_=dYm(MfplL
zu3^BRvdZCz)A)7$r?A;`)wf-<Njn`8I>NyODfuE|E~A|bAdjCoWhpxu$44IT!q@cY
z`DD9!M2~2I|2=UR(XankCJBB3f)DE0`bYZa@%^w4rk{!;FoAhuBeY{T2cVVI=@zND
zzHzJ2%%?s8iFX}uq$OgBV(3D8tVhj(U)iFXC5E2^g#-bbi19Q+Zu`e#x)H0>kxl4c
zm5#1wHBuwWa0I`Zmbhhvcd(cPNmTrr|M=_QaFP_E7zwq%I1wJml?|YsBsKUTww4NI
zl4jWuL0O@(CzpuvVVdb)8z*g6SVxO;Gw?7^7)AE}<kRJ~Ca}j}RpP6(yM}0<uNvR<
zuxPc(>cDD=v~4m6ERt-U|I^(*EGWgWX?ejKXhYtkdVoXd9?I$(IIGk@^kb?OG{Wr}
zZgR<g58(}U1A4<&!lXXOHY_A{@Sg8?&-QGQ7&F))sV_}b(gx2wOoYTeDTsgR07hWe
zZ5dp&PKhI%8<XHQx<gnm{FMF7vETV`vGz!+32PcK>7*bS<x9!~NrqVoN{m&=OQ^y*
zy5-qDaE&BPeMX<p<z9Pq<vv%+T0^nDEO#Ch95P@=oO6E#c$uN_Qd)bx5K26c=J{Vw
zM@qra`8&@#$>$HrhfTx#4Z)|MWy-N9+@@DF<W{SD0p)6d-!(vRj%fFg_YGE1=`k?W
zI#@d6mBY{y^h5+j**;hQNhcX$;VRto8|}BVjquvHXiGdJz+iZ1t?F&Pt~P3l_n1P!
zRM1UH;f8BX*CP1X&$~MRVkT^NX<+nSr7WzH+&cEJ^)#G`XiM4GL4RAL>YX`}dM}Gc
zPbHQV)8}cDqZN`e>_{x*_LB`e71GsR{`rupTa@2)Pf>yCtUjt=pWbw<OTu!i>G_*5
zMcvlvD{yfVzAqHLy4?L#TJ#hw<^=jhgR}qiCO#>0s)n2%`=uPk!3YXd#>tfWMGAU_
zfeKWn$KS>FwjG`d_|LdFyDm5^f`{-BEGJeaS2$R(*sqs}v$1*VPWy<O$^o7grLfWQ
z9F_()u&ZTJlJGm8H-COn*VUU~)IL@d+vwmYRfx9&_0+kTK)gEp%6be~@qXtN=~wsc
z;v$T9+F}r62gph*QMMrLka7tq)7i<}JO@E3vL3x|4Uax>o7E~c=(5QlDZZ}Z^k*M#
z-N^&0B3y0E1-SGs<XC+R_!s1upVz~wQX2YhsBmF;;6Gl`+1j1MdTBwyQj%X<7%Y<Z
zM0DuW?zw^I<bFQcpNUm>=^ue_;iUWe)+j$L!2v$p4DbB?UnL+rw*S2n5Lkz1_5UaV
zK~ZXA<NnVOkUJD?H}@aOJK}cuVD3~PlP-h@gI=ycw;&cZ=?40a#L<$sA;ZXg1pFzZ
z?p202HP!|96@;^WcT!JcZlt;be}cd@rE;771Pz2Ei7IK|H<oF<xrM{jg(;7ayA@f$
z5nT)*_md*|`w6zqrhHvfRj;KLO^WOBga4x~Ob&`hTgNLa-0j8X#qG)U1$z>`I>IVH
zutOx^9nthQA}<<Srr$pWdf}ZdGwaqfYgTp4aPv;$8+RMw9Fy03xGcLJH~2<I`lq<C
z8fmGRZ0`y2R$?g?3H<Q43)mk1J@hFia5@LoX<v&mD%HcF+Q;RIY?(PzfotFZogfeJ
zfI+T)D<R>SDM(3@-aj=TphL*r&HF0GTV#W|T^*jk77o`QSLcuS6KS$-e*u>Uk0&A6
zA6thO*65-l+Nc1nN6**A=k0l8<fP0>kiRw0Pu6S1-@*TRf>51dW|e^-;5FUAF%pQ&
zz8q`xwu8DT<_*;44-Fx33Phh8FjX5GkA)@pod21O_;`M+OJ(p82mRCiF;+wM$zXq`
z-ue(nX4@9j<Qh$Gm{EVq4cm5^<i-pIP`E4z=kS!%=3$F^?5$1TMXv6r7vIh{pC|!a
z4)l3|75Z_J1<>hkfi5M>tlCPi&Qw|m&C_B9o9uFx6`%eKk1;xNk12Kav^p9^yAI}e
z8}xOCKETQO`>|MK{5mnY{CPj67~I-b2>1!uUabyJHV|H=?RgA>SzyMlF$f+I$IlKv
z!sCX|)C1{idI(VWn%3ClqwQTQu#6OA>}Iy`-FwA-?#fgEbrN-m%;POs@y3*dbFgh{
zhFa(Y$&-hx6LI7bY5}V<@;CqJZpi-4;x_MnZ1G0LwByI#5SGo!Q*9n#TbCuLHT8}6
z1J_C;jB4C7hzjiD$oF{D0sC9a#FL|~8N(xIhQ5U%s>#~ILi0kTRdWnZTo!?Lu3@|K
zy>&vho&_mRwbqu{lFQLd%_ICr&AW%%PUB3mnu|4Ou9{joE?Y&Vc8RZoT3dCT*(CK+
zvRUM^p5lxulYFuxA*jIhTh-=r&*oU#l_cREpSe+i#Il+~Nk2Eo4h?+qx~gPA2KTh>
zDuZBw*2&0Q*Yk!sE#;i$nV7#cSSU1b+OxY%s0aj)e_x=SnIBkBo8<>mLhbLocnt`_
z&DwC$TPDs!Uqmi76r-Hrf>|E2*T?jbmg$=_?kMZ5T@_u(|0hA6pJ{0M$E9o1`Zl&(
zqWab79AfD!K?MgFsJ#XIRm+QK<}Dj>SG<Jux_snx5TI~43~F%YleQV&0V<v<UGGRs
z1l|{BDgaH%T(A>J*<C+-5c4jDQ324<H%|4lx?E{mdcI_DCp)ORou(f0gel)5&E<i)
z^;}rrjPFpH&zY*bJlR|A?e22+!f=a^4TRf8)rT|Zye7q%hp5r<(aQR|I|;GYk^_7U
z8=Fv|aE%F>eMOQ;Yi9QOtwIPbc>CG~fg~u3PxbvU2$!3>IZYY$@V8vIL+xXq#Xs1-
zC&~#ib?&c;{%bZGd4YRNco*C+r#Z~LJdzX(Z@q1d5KYJeXL-85cMwJ>e3j$dxkXR)
z&{?gL3ml;9tZITm_|3RDQK`Ko)~Tj8VUfqnvxEP?i}U|R7bnqd6AbO^QSswa`s3gw
zYHoJDjQF>}Ar3E7t-z3CxvYf}$c2kv!i%mV8hou>-cVEMiN|}gkGWPm+Rf_n-mCE{
z$3jL~8M~UZ5<O%&g@V8c9GZN5xA({QV$R7W*Zcd&NJrgO2&(%f5pQLVU`=#Pa~q$F
ztHbr-<~H8S8ITqlKAioAn4svA#BU1YYGNsJhd0>Mp&YL%#a@sQXdi`5^eh~CFWQ@s
zqj`U6#LI{qrnz=5w_F9nT%=B~a;U+imNDM6)0%Qt{v8TXD!R;|->a0Hq>H7b;Lq*k
zEQh8;M<+Qos`Tyc_P;#dc|Q(6o|`mU6bbvG{XDH4z8ONoUvhw!2`QNC^H){%LS-*+
z@0<00yq}gYpYjwY@%U#`lO?hq9O4DlDU3rRM_Td>iD=+EA$fExT#`;P3J$@-x2f`S
z%4$O{j<8@S>X4CMLQa%9<`yPb9{d3kF<FtSbxza_^ID9cm7}XYp*6gU#Fll}cP3Tn
zj>pI~{)VTTvw*hRSi-!N%=>+P=s@@NKwdo(a`?QBcNuT4X#LQZmiJ8*D-m)^S+w}|
zX`k1vsFWcBUU^9=d=EyXXwOLD)}rn7T!ktshj=ACM}@&=v>b6egLDI5x#9Xtj<h<k
zyjkMr+o1~BKT*pQZ~=_FU*4LhFA~&OhQRElGqCGB``$AqrneaSGp8h;wP@E>#Add;
zIyn31T`O{6oT8_#%<^BTvi%%N_o<MV;T8|K7@y`}zfNt2%mi*NSH~d*p5(ltyvcbv
z4<Q@bq+e_uBX|0KG3wl^?#`pD@;<ltoZ#^jih&&UP>JEnc1wfZ8v*T&jT{Mql;tfo
zNivX3bax1O3Hng*yU&@I$mFm{+k<8Wd9wXBhN*=6$+2#@<?e8e$J*`TGSTY`z$(8s
zB{47K>ehPs0y0WU>b`p&pCoG>W`?E6E+kmVGa^7=JgVS?>0pNM2)Q=nH)^4<;_f0-
z$V*p-aBK=`@M8~_c6sGJL%<nPx)y@5lf!}vK9Z|_l?TS<zCbFm@SXZ7Qlh6*Yoysp
zIrVL%7Nlsw)!11yCV3_$v?@JZjJG<e@hlZ_Rk<CD4tZoG2bx=uTZ)V?H^UE<nf!mL
zK8^n$Y`tTYB-`3(UAAr8wyU~q+qP|Y+3K=wyUSg+ZQJUbd!O^(dq0ekf8$S%h|INO
zJ#$WpChemlr*@k%OtyThEA|j^wz_ot1O-*UE22R;#V<o#$`9aTb>~mX0Q>?xiCGmH
zD$I}>o^$A@krfpcOUo?<4Ju4du|{27BYM&kdZd+eHsdMqR^JuRXxm>p-}IOpFioq~
zyXyLcRj6C<5N~8jTn4~+(P26g@<;gsfyLujidCUp3`61IOnC5pA(E!ib^X=8XRoYi
zghJ1g^w#om9nK$1QnL#O8ziZ=2@{CuSj1VL`XJ&*0q<_mha8MEfm6*?e=_^JoG(-T
z?(tG%7N`BpcD(2^6rvIEzxVN4aOI@&!8FoKuV1fiUpMbd<cvE{y1;$y-|Y*$1^Dd$
z5UqGOc);sjhAGa<>0X7guAMkDHWTaLz~8i(<@8Uy?~BfgqZmY_-qg+ikZ8GoS1#Q#
zt;^*Cp}xM{$vpfCWb6!%VGgUr7*|4xcopvu^dVZprHwD6K(rpS%SGHO@+mf5gl`1}
zEQ7CvaY6tTk7M>UIB8S2KlC87OULrE>HlhbH2lMEMo~=KFC8|csA(n9Eu&RJ$O_V7
zKx118^ZOhevy>t@k|EUFyduN}&T|4kwCL(dnxOcU%fS>8@Tn7hzNJN`=|+aAZeZF?
z_a8=t=T(cq7bL>~$0P~c`^C{E5wX2#9<pP;v#EyLURBFs9RH&4Kw}>N1|xN&gD}b;
zSU#TSJcQ8K(|r4~UXor~RpfUd&{N6${Qk&oOa{yZSlyY5z^yF2b)lvCeWhKxPXQaZ
zcDi3seyp;842V+6$--OatXrI57;N_Aaa^U0`-PN@wM0J;fR}Bw3o|L?^4kD@2j~%+
zA^hZF{$h#cw%xtWSR42cD?K@or&@ka_lc*wYX@?-{e7G%d4pZd_{C6kq_N;AL(2Yb
zSNDQE^s@19d&jUNBriA!wn51B*JD}0{+Hd(8y;P~WKv{H=UbJA`%qUKSAhLlQCSAq
z;xqv(4Dk7O0&eFUMCG%CSHV+v+5M?SO?Pdf;J4Xm?$C7)ud)sz-fF{1|8$R!@`k(J
zR$suVzac5dEQR72$UPy>ENLj044v4RWZ8mFK|P-k3LN??9J8v2hlWQnWjN3KK~G{X
z0e`}Ze`EDs`p!vo=3uW<*WEdljuz)!4%QX#ps}Q7kYDoTlR;(neAL-q*%1tquZHv2
zGN^EZSv6R*d#N(qd3^jQ>`o0HuSK4qnh(3056eyQ_~4KUz{ED5&Ut4A%fgW>82?x)
zj{~gNy^F26@>5jep~sQXOLm%AS`g82G=Cuh?YOE2bWD(`FS&0LoH|UyPBddUSl8N<
z^Qh93j7^{Qu1teRZI}Cv3`p6HEHrt!t|P~qf3=l?+;cRMPnyY!$eR+pfO#qqQ&nWB
zT6RfGmap2RMUw}Q1DzmDqhiqb(T5q2m>((nr~yrg$Rd*<Pzig@93v&bSJol1wUCkz
z>G*`I+;tBoV>-c=SN<4eXbpuhn?_-Gmg&%ImLh`8TBd?_LJq0|f%~%7d|@1UnIKQx
zr4<^oe7T6Dxy`UZG$Yt#%+O5%Ww8E_n{&D_MPnoQBgvG@88ODTT%c+gnU%!v0?VO7
zXL;QjX$|K$sY)Jk3B;pY0z)9jy^tS@#s|udh>XX~xX^EA)zwhxO0*-6#@6iPD7PgK
zNW_d`9|Y<eCZS+x#HwqR?VA`o7D{9GA#EZVxeLso>h9>#<YGy6C3l|l67wB=y4UzI
zLhp<*)F^{#o)WvGNCw$_Q)r{NJcV9{w*l(w*v-vYQDy>MY3}WC%V0(|UF-6OE#=Dk
zQcPzec*<3Yu(Y;=(5<JQFiOC~#(ut38p>)&1f3reVbGb-M&@Vr6B%k%_bEiSeyA(R
z!APC$tC!JXK<GS%AkN*g_uCEC<)@4oG@7nPQWyv(2XEF0%ETh%BYblhuI<$Zq{)_=
zYa@e9I@6L&>~yAQ?_>h*Mlqe!T-b=d$<SoKZ>XZ2nq0=5q~g%*?so^nf3#J2x5A+S
zGv$;MM}$0b+o+b)_#vmuXj<clT+?!4kzI+O8VBXVWtQd`-}me*!M2Vut!*W6uz^zz
zEhu6$Cer;_GRCg+OS_6YE(|LST#`mk-i`HiiPfNE_00HbG@oMx0kV@nPBxc90J|co
z1+P#fZxEzJIZIq8fu*8I`rR$r)rxC+EN#A9;t>wojR5BxeDY5fny#r}dW4!tiUI+&
z+;BAU%qjx28^H-heZohoznFx&6~0jzn;9AnNoCYz4Q|Cr9@T;N6Ey+TCD01rl3L{$
zvV%M(lf&NuLXO59$BI=}8-Ee}q=!uC{D!9+OU{qL(O=5g9Q1fef~YsR%B<b~W(1b5
z$|H?ZuGDHq3v(ew%@MP;*O*CZ@Phmt2<D=fnDoe)2UfPe0KE=DEJym)E8~*S1HHvY
zuH2CS7hD_;{T2_AX?Mcl7CbIx;kR-E7js&++DxXs?kxyp`eKlni<*rrVUi*<5Neqk
zJO2TQR(y%&>8}P2c{#WJ;+>G$tOgtTulJATOh*UHQVSH&1G!nqDN^Xo7;gw4RweYb
z*KscUov)K6ch<=mXawKD`f=izVZ@3&wxaPS<q|IaX-z3;QQ-<E-}CEH3N31f_f4GM
z@vxggLn$q)V1;5*DufY{KLgk_^6Y`JW7RVCr1f6k4ssC}UB)=Vz?erg${SZ?=R#hX
zYrBn;X;6M2A4F;z0R|pJ33TISjk%b}-+^!h&0pIjcDZ;7DY8c+x9q-f3wR`ISk9q?
z&i*9G^mg#<rBXxuSqY7<!j@VqSRq@!USylT)ZiMz(#*7Mp`mG1+L6Z-@s(}2<XI_h
z#2)#cyze{ROz?<G^2}Gr=H$|jy~ARw)6GwhqlW9PkM_)Rzmmo|FEcgpoa>V>^YW_)
zSYa=`nM||35|Wk4e%wbBwo=emRJ(zxEUHrHO(#eVwTL!G6`?p+&2_nig+1l-yQ6xv
zL7#E1htAh;x?pSnp=TgZE_Nze`!fV~Z9=gm#y;_fez4<0SXmx<Ik}+CL55I@6aVI>
zj5;H=xCE?<V3mU+n$uJhSh)-!d{iFz0M;0sSgFHMLgj#!$<G>mj3<tlrPTuUsvI&q
z3lh}^Q83AOwD5W!Js{I^AxVoQgQErtiBqLu8gQ51>vOCwDmJRSOD6$BI_2IL$@U8Q
zlB|xP3j6W;o(`;09Gq*>3Mg>=NieHjAr)lpk>aINXm~viwgR=<(4~{+jFiQREg;sj
zS<Y9yg($(5j*5}QZ;vY#o&gM~IN2!NO~bh$fRD%|)ciVaeK5T~LP9NJ0UT$A1Q7p-
zI1Tj>X>@fo>+>@Ie`Gr5EG0!K1eR~FCbEcZpYpvi!Yv;XJSugUdd@^4g=FLZ%8<@t
z#Kku}%13Gpas`@XY)0jaeqQs0$1in<-3of05E|URk$9F<D>h*qcfhEI>d>DEEyc{E
z`}$GK@(j3r;xdxr;Ng`DYK$F@HWeP=bX+BDu876Tp`nRs*PWK_8B4?jZB~}JJ}HuW
zgUu5q2+mGOk_wif6K_#MeZaAyC`<@fN$+pY*%3-Nu<xNkE1gxc)p?4x6LpJZC;65-
zPB9+JC^6Sj&%pJ$b!+qW|HisGsE$_tP1O&s6HQY>Z<P3nDmZM+4c@+xX=9-hd2pa1
zz75@gQB9}WD;B+9ZN`5|Yk5FJk`G7~)i(v}nb7{3)g&)|isEAEoJZp7M@(B-tLN4;
zh|2NUllx^^g=|kt>=ZAB!!yeKjxmtyA_>gN(eUeSW${k2%iA!JoSI$8$t1lCZ0M{Z
z11TLTlZyI!WPx;4#vq1(@oTFTtO-7k!J+3VjTc*Kyj#VcZt}DeoTD%!9dbEBb9sL-
z`)ISEed4gjmH4O@O*)F1F#-`ST%IP(je6W?*)+6$E|`~IKQUFdmP`K)v6)_vD8BNH
zZ*Dj4e<B@YYkMh0)Q|4Q(}lY!LoJ7B6@!C=BEg8Qn7}b2cP+z3c~IF33({6)!D)$d
ztTVuSY0s0Uqog?*WR9inel7dq;%e9)8T197bjxzfiXJ2cXXmkPhUmXfM1SgV(%A-+
zbRv)SsF@R$FTm*dP%DBi{1|yR*oh@))h|^$@2sEgiJT!7Y>VajCUDDAtOWrzzT{s*
z9WwnS2}Dt6_7ghDOe$c|`>=f5=9FoRTU%hXPAB|f>6@kpW`ZOVZ;n7h9oMh>GU72|
z5b!sT7m@DY84^_T&g{*x<$K{|>dZh>oMHHZvm~Ioc3L`u9W4?l&1@frKOl=@`{QD>
zL?&#@{rgoLEhdP+#qS0NQ!5t2^{JADyF{|u95yK(DxwL(DBY<Em>&lnS}!AJ(g=6L
zr}^5g3HxrqNQZ`brJN$yx!hz+5S>YHlJN~is5mK5!UKa&*5I;~yoxiV;xpjjTT13f
zK5cTiA&aby#pu&f;|aKGK(0lsevsuj<oXwRjsT=MqSRjc$@dDPUS``#;|?X^A#^Wy
z8`s+(w7omKf6MNdk!kZb+c|q1^eE(dJ&iPkAwjPLNyp1oWyYj^dcGLs3ARoWpswb>
zhEGm3di!9kx^5p_(xnDdaK@f0P^&@1ju}FB!^!A2#?uXUk7SlJzs|H^>fPVB@2{G+
zhk_si-@W#-GU8&jzJzrz2^mD;yw3Rq0*v04*p6%Ny_|`QJ{q-iFPkY?)GM?tE+45}
z*qT%5S$sG;C*U4vd!v<H{k>u@kOdfaq&%s6fJ|bVI%L_c?nP~1a>?b8rJeq;?b~x}
zK~I+u;{EZ@^^uo*536h*6wotv4aa~i16Sk#WG=l&AXVobed+?vd0F*eBuisUGx9}!
z^eQl9$Q2(<$GImtj^@XG<i_)t^M*Y3=#pVs5LIe^Hip&d9ToqUNPR*m>NaJlw?aM2
zD>q3R6J0DKq+K|<R55T&Apu0n-sfXKl-UjucIi&dfP@!EGNW5)6*3|Dp+6L$A_k%p
z^2cd+{PjOZ?dV@$V^9n-!XH|0bO$eC1v3?RgT}1Pe;?%$tTPI-u!8_=*X=YL_)32q
zFG0mp%Y5RAoen?+TCp@2?#1d(6e$V!qNT1N_2sJg+YlOPb~&e7>q)LVAM`&rVQ#EG
z2+abLh}{eL6gP(~mpx&hhx+D8GX_^gX$5BbBFbM-v5uKw&?CR|F=5E6e83tm+8Xh-
z3ho-_$)NmT$Q_Jl!>s`ti{4Jy-_4Bv6Qxp`3D#CwqQZ<PLk9-_Zuk!g_44L*^a8wg
zeV_JcpQ4?!xCU$OJ<&VstOit466umD=$#S1TK=~#C>(V4YDRLR`LilCR#@=x_B(eU
zf<3&4wpv{LHPs{BM6;s^`G~uD_2X*o4do<`sc5%hHl7gBbz@2cf^2Wz^**kOSP-R{
zELPd9@}^9xUG2__Pn0O}u<yykWf;iNg^mSO$nD0NuD<h?Fn3LjR}?db(?}1WIU@&=
z0lKw=Z{}Qo_KI?a|L{bh{Iy&p(Yk5hC%jsfUgjOk<|qNBr~dnRNy)qSiW$IHzaam9
zq%`>OdO%26JNT24wvL^&V>S|~6#o#sSB8;6w`$8<otN_>vvuVARlvOTE~mx)_y)%>
z2$q{t^wkm&AY>r$dZp(T4D;z&h~|B<bac%k!3?%do@#LQo9^{YJBHe|=e!)(=z4@}
zyeZFNQehaCCzCnctP~(WO3O#lMongb?QkWcNK8c#Gt=hZlLU{=k0h;Ngwr0-Tr;xc
zr=5VnA_<Ube^RYY*hf(%#fooa5gG&kox)9lb!`*&q@ba!j?6Yjlb$GcHei}jcIgxe
z2_x*(3ng;J<{Uhjh2{!Vj5KANNT^=w&*RX;NiCqJi}l)1@`fy=R-Dbw(!g}5Uy4d;
zXgfArzYt+`DmDxf;=v|6@G3T2zu1oJND><k|J9GAS*FA#k!`}aPZt+1kN_00Y9BRr
zDzm%GhstLB3q3w|D$)*O|D%9T+afPiil?l8g(Un>0tbT3Y4Jx%+JXdlgNtE#a}=ej
zJvqr-=I-p4buPW>_^`AY{N4j{xfp{d>Ajt<!)QB)c%}+!xg+c(ZT03K9#L0^qinaK
zeFxWcgLzP^A67mqbvH?C&=j%&m!z!9@Yd}fd2uGv%-`Qgj02%K1;pcqUF;%P<}Fd9
zi;+{-3ntsGK|Ex?litbN_7~`PViD7x%e5imNB5~7Mh?t9&<l-aPMB%a$lVD##))M*
z>VG<$C1lP?3L+5F@;cAb?4o7RXn9uS6<f0JrQtH(zEHPGEVkmtH<A2ohA+MD*&0VC
z-Tx(1rL5W}o$;13nHr{Nu3LE?GHs1ukTLG5m1ZOv5|tgZ({*OY3uA51g;dhT$!>@{
zRfbp^`MgDgml(u!OKlkyT0b`)!Fj#;FdYYo=dQJqx*48a<5#?R*lv7$KgS{$SCyKr
zdOknM%4L8c^s@nnw^Ywqh#7t7lq8z|6$cw3I~xwom^SME`~l|~>Sa7e#wD$B+N@Z`
zQ3tz9rHQD%Fm#|#MG$tg{SCPvi^jS`zRD1z%A+makzGD`Xp^vHs-TkLBHe+P&<=CN
zD=XSV63%SMH%OPm%wCfg{l>$8NSr7DW#!7mmcKB9=tp%7Nr+?rr`PUJibKETv%q6n
zgh-LUWaGbqj<oQjH|(9c-ptA`&yid&RfEnmVuHsc?*8)?=W=ODA~pkPXEPx@p!+1u
zp9$w<(Ogs(vDiUe4sOXf2N3YEo2>;!;e5`=dRF~u*5Iea`1LSGJuLiD>mBS)vm8fw
zZr|*Bd^Gs*ieW9NWmg%NgkNUb4%g#sXE&RjGEFM&<@_#X&u^f$7G2_DH<$APM^;yl
z7SA0LfN4=K7{Ro#+wp}|S_QVXV5S=;Lx1Mt_sLl5*~s9Xzv6Ykzaw(Fy2~@Mkpy*J
zBYe0*zfVRiz_5o=N|%V@iyC?N%fN#`6+PKBMpoz&g60*$cD>^eJKg=b2S+z;e`l39
zUexSAtYx(7WP58|=qX%EAYEGa!*A#+N`}iCrRSKVhZVQ3`<+*gOD!yOtk?t1*p<86
zczZo3ek};%x3KC*tLoSp@b$6%4P@ByhE>v6iKP=rZb@^9aPV(taf>c3gIGd>_miy_
z;A`b?NPVBtbz?MWR1Ml+kP2DjdO^UL9Ob9sEmB>oDI~?&jpb)Rint_`o)ujrD`N?^
zNL$~m-s}rcR#+l8)s6GQATby0vlr~u`qkOS;x?80ODCIGqCzpy9;@yx6iM`3sZOVO
z2VB}_04ggJs}-nQI0mV{G(9lCW$-0<n+RekzC&%rTXcusD3(lO$joxrwHR(5qR~X5
zip5dZ>-21l?1IkcbMl1NU^~SXn4np*RRG%MJ7)tmx)obo!GqL?+@Qwt7WoR|m|5QN
z!v#F2H?w`;&Woi69YsG(`=l%$%|(lZ1;|VbN4OUEHM{||Zq2bsA8pVza@@@{InJAZ
zsA>L>+S!_JiS~4C%wYJP-CCl0WhLufY01-~sV@~&HvjC^0$u8&ck`d2B4Nk+KA287
zZDJCzFibIf-y#D3Q0yU=-m#PMO0<aBQBkN9w+VvO;ST=@I($C~l&}_~EI1mo+sTnR
z+-@R9^(@1b0(;c&JH)k!H`#k~8!pTCzsmSDW5k|kjYIJT=vbtW<nz<U{TQw$H%!dH
z_gI{`K7-C`cSZQhv$O(CPiA6By5|lT&-$N>zofWz^IxzsxYkv8CIruIKThFs#Tx?L
ztSZ24mn{aSx`8Kg{6tP73&>AnG9*!Ye$GRIh+E?YI*YJnMqzBmLXOQ@MPgNnM;xb}
zfS97-S1aO8lsp*CW5gm#e5gu|Y!rFNgbQVdZmR;wB+a}p$8*W?jul<-?6;AJjFGbV
z1S;ze7OP)Ic~s~!XVthhlU(q*-rGV&r~!5?(HFd?2Z3@_coI1yg+a4IW}&*mF>`y;
zC&(w2ySpYd4vHeE#eov-z`(()u;w($wc(Jllq^QFlBZ3?c_k1ZIcqhG(8KVA`fN0;
zO~Ob<i>8X3#X}yIbZhA{Pwm|+%)f{amZ5a!^RAaz`<Ua`MF~xPKmX?cRnS==X)1Bl
z^dE=QJSDGBf%>H#RZ2Ug$-KEeP4aO&1`5GVD}#nyC%D|Q<bo#CYmLy6l)o5NDnnkC
z01xZP^W#V=P%1QM93#7p1SYxJ`JD)0vSU+m!yeRt&0)2P7Q&!I1G#h91pT!yBr~kc
zWiAxiV1YC)Yi-iBT#fK^&D!1+i5lmpAX`vY6<j{z<Fdf!DX28?@7Z7hBp#B*l{tcJ
zf9_^j^^iJ$Ei~>B`!LD$AJJQYX@+k=L^xTPoj;Fn=ufRtU|zzEg0gJSdwi)DNbRL3
z_Ox`mcI7xqyr_ZIepJU#ejmhH^MQlTf(nwK%k@Z~+RJn*Gm~mxlU;tRrV&`nEe7Nz
z&}VWi6k}a@4|VQ0?gD0RA@RlER6>3~IpuNk-i7AstfjsZ^IY{iFv(kNR6{oX=3kqE
zu>*i$T?s9D-lWgt&&%OJn7797m+&zC=WN>6mhLyQt>U%JSuZCOIg5l<$v2CGY1(`{
z3_}z&`C&Z>)42314g*^?s0nMk(dp9ABF~_gH89R5{c5b;or@qoH2dl~2c)cG;LQ5n
z7m%HN<@M*%Kqigv3L?vBVsG{g2`+g}+~P*_d)&Hgu<j*0LHJP6<CJwO`&$=~Vyh9v
zn5z3Zq3m+MHrO*K4Iu(Wh^T>*%10en6uHW%;x}56g?Pt}IWr>1<e`gLQ{Wuy)J7vE
z2Pm+-HF0dkuw{J6Bq4Elv1I)!HsasYGmN=C5{A&(2Ga?=8ZT@eB0oex)1^*W54;*L
zA!J8*@z6+Kn3yh=g$<u~n%W4LLC@MwnGO3^+>AaLeBm}<{Pa%Qs4{`E1-Nc=Z&o_<
z`aiYxFde{yu1G)A(svD7d;0*LIS<I$4?sLOA*i~cKCamvZ2LvMHMn~O{hLS#QSh<n
z%@6}tKBD;I$4Y+#Lf*0O%FqJ8mc)Ar-iy@V1401?*o4PJ2uVh?6iPo$u-EEAWT~n^
zm2zrLayC>PnfTaZQNC+RGak$Kan*4rDga+5VI)vt9HhfrI&szz6_l8-W!@-{CKIdV
z)@6qAXB<csR%9hvifY)sHN;-wg58VIu84^&Nj@a*XFn5i5>B+VLu|?v8%hVxdyZg(
zeFoZXgXv=k^o?B~aPvVgRJYRV-;8omV5k)ZXy9&?@0+1ZJJF3O@}z3a%ZuS#54KGS
z@~oM%(fLQnv5>PFz1x=@V=uujAo~!tEZixxj9G}|*pxHh=!p8Ld_4d)wH-xRU*6IF
zHZ6bpUmH$q0qOrJTJo%2pj+G3*6^Z-QF4B27E%(#KLJ$iCLPu(-J+yV6Sk*Rni6+E
z)5=DUsQz1~&xcMizC?|sKq2knU$dPFM;K%2<hH?I)5V%O0XD$@z3%m4w}>4LaQE}g
zr$!FXZa5r_-vDVJ{Qr{q7cc)enIBcOh2;TA(?`AR<^Chsfk3wi{J#V{G!)LK9(T2x
zBn>59v=|pYyxSXL3Na0Bj(b-`yCRSXu@Z`m+cgqMk66R4aAd}ydSB(F%Y>l5H-v<5
zRjCyS`M>>i5N&-B$vHgWiM<?6)p;y))g1d)Wi`M@;?*iOm-nWo2X++-OQbq+{^AzY
z_R>Qkl`nteRzpW_9;<s0G`#<f-PhaFD{g%_GyOuMAf2Wx($U7+hZL25BsZ0`#dJ*!
zQxe9m9|kt(nA_+B#{)!Wk%>ZinGCkjFMxk{{m64o+O?+hNv|(F_lPP~*50EV8Z>v>
z0Ux{t=y9AVPxoh@J%&50uX8P%>#FMNXp<$mx<YJ&)gI$*f5oXiqj0EtRPzRVh(dmD
z<T!>L*;$Z#V0w6xhZd_kqU3oeBRS|rOBz4};0>xu`PPXgti5csyw2h$JIebKI*uT&
zP*vSQ5-#P253x{lR;Nbkd0n7+(c@Ez8L1zIqB6G+I#OjU1$Z80#3>%ftOR+hu0*_N
zbVLK)+k3w^6Ow!eYoxpC77xzqe9yVHc&0jd;HZ<gEZ^iIpy45LmU)tld*rISMw*nX
zM-`-BC0*omo1I6tQaOQt!y!$%gb&!GP+XaBnOlog+GadQB0^o;Xd{Bnv*NTAc%lJH
z?s^^mmG0h^?gwn_9PTV>fZ3LbVJn~}-~%LU4MK)#Ir?e82umz#oCqZ4l;l;i#T37@
zKP%4zX2|8l8ngM$mCmBq$4t#tB^B19|J%D`E>tR7MCuH;bc7Q!9fiiS%cQP<zJ;W%
z+iu?1Z9Kzg95Qtfjo6+rFRWqOKG_+p>iz$*?!e^#W8LWhRrZAnzrt0Kd0=(^P6c|n
zY`I6{3cixqol_xA$S@*y`;i$jJr8)v-SK{~-&EB?$ncMZ*5dc~tNPUhYRl;8mBP%j
z-pCy2xi|~p7Llz&8e&GEJ_o3GsM#_O4Ewj}my9Sh{;S@x5j*R;lP6N-M&|2+EMF7;
zn#QHxGoM`;NpWDE)go1c^PR`adf>N^-C9t<R;l2zh{;kF=QhfA9Db6=G8f%BL<Y@P
z4X9=e?zsD|6(ifOVpW!p@nCd^5g=-C#KVN*KwMBg*UlEHz(IDVLXY=ULrV;2OgAg|
zk>ok^2bQt7*b26#`nDd2IG5zbKJbRFhjH8Ac|^&|)>-D7HqRZ;N`4muy?M+oicx2-
zQYdd3xJtz4B3z?jY9&mIoiZZ$Y-yK+O4B2oc~fwHqSpNs*G?vICX2n5NKTu!lp5+5
zNN+{^4mY$BMw|6U7}s*nu-NlCB0lhpEE<J`D2_+agQA$}Z->&btngA=f@nb<Ex=UL
zI^<2FXWX0JUk1XvEGb_@673@8&!r`GBoEa%8plz>%)JshZw%Q}-oRDxIBH)ndIRRF
zrA2IyfNW;(nw)l7onQ0D_bxc>)I8eQi0hOt`A8$-A8!-ibjS<WhzkpD3Hj&fXgyUC
z)pg*zm=$|+D=la7P=^+~D&dFE^B47ahYn1FsZ+t=l-g!7z17%75<R9&PQSR8y$CH(
zt(=JJV8};5E9ZP5sKu|WK-tLtJX+O=jLh;#`&TX)GK3yIubaKz$`r?ic=9cRv7`R|
zMQZY6n?&%NHxkp1pie|_J}aXhpnv5MSYb59&D8(MAx>2U$Of!;*QKqJf}r7f{jiR8
zASMMzZ@UxR!cWpY2O=inBkn!<gGKYe3#Mr{*-(-kT>eVlmc}FHfSH5tnZ;!(iO&h>
zb{PNZSY^(0B}*rFnhwqNYLWk3jPy)u9Z%O=R4mZb7nxS=pv6z4I10(U7xpnLXVA6T
zj2aLXML*F@da&-ONlY(o#i}{9Em?g&o`P|V4@c*r^APsZ!W{bT?9_ee*m2!HG`$#c
zR)Xc;d92hA1f!6gI#5-Y<!;k81BdRMK#h1G4|V;yE&xBVx>F3-hX2LrCjzIlhJ`)b
zoH^`u6!f>TgOodK(b{z0%vSN0XuDxhBy113^&`xJ(EBoP4ry(75P_?#@gWov3a&N%
zTl|`P5!*(i*03ZKJ#-c`_32>sa5eV7ItU^@_PPB*c=n3gZ#548LpKinN6Ssf7Vi@r
zn<Rp$`htOR6h~IEq&E9$9))O^Mta&;S3`^7(YMi7H}=>L|L&(d$2aJ|0$%MQ$)NFD
zWG4N(BO$$HU**Y=RP_MzxOhxaRk+q9y)-7rm=YZMLUA6|Cxn<vs8PZj2n!&E7;a>Q
z(jEc>JC-|_L?Qez?FobPd(LC+lz&li{7u1hO4jO0;`T^>lqc{6!T|Exd=X#M4SldC
zH96p9^GjT@H%EQbWf#$5_B!7EfFLy99q?^`t8(d55zyzqM1Q#7A|w;Ny#k?^%12t?
z%u6v>#K;<^ZjU3JA6JtGhh>WP68|nS7+`jPQ96$<D99?O-|`K6A8P+oltiE$L<Vp=
z_N2_qL*VWx|Kfn=@SNO-ch18P^c-5ZB=PrJ+w?dd7ZnwK$Hp}7T{OiI$!E_0Vy>ol
zuc%7c$Qc!YyE@=xN4{q2(M-S9i(S>{erZN_JjSIZ0FF2T;hD$3I%m~wW}=v@@7A}3
zvXUHSy&S&1BV13IN@V^V!NK8}5Dr&rwVbn|dJqyC#Car(POO1Mq@W5M2ZJZ7{{15B
z20I{u7B2iQ&;7d__2+VW(B7H+AALsd?Lvy&0ELFmh15Pp926Fl{y?Cf0;7X`G^vXI
zL|kRsj-}uW?ojDiOYRi=)`A!MU?Y+jY?mSJYu|SS8+K>2ru=zG{F(#^NGn&l;<jyg
zT2cfetJJojzUrlZ!EqL52m75Z7Pq=TfOsQ5mnrk$f^-XCZUu-NZ;E@&CK5#sHtam3
zhM@y6|K&e!2eK!dEJcw3uguXYY=GDS75_hC2NqK{L1U#8X&|X;5kv=pfnj}QRhG!E
z%|j+J^J8T;8vZAn9JYi~7BMl~O<7L2SF8+tqj7-P!8J)J^#DB*cv9A3(EcuxC0zhg
z6n9;^u~)?(^~&V(2TwfSwn}Y65L&pW02ja0nwC^7xE_(8ROgoU3k5c^mqfDcBQ*rh
z4LrzpmG97*ikNXc+^_OEJ>i=CoPIpQP)E`N;B{4~wy3^?K0*+j7g3tSIb8@0xINAa
zsRASQusB+@ADPA&!fRE*JQK(jUA=5y@yL%6Q17e-SK?{|BQWc)A73|ip{Z3rRX1h{
zA<t=V0&|k73Kf^T608G_^%;gVOxf@y7qI9TgfAA?CRTsoUjZU@`$v-y#MhjIfkP|K
z!w@h<a*ZT&t{|Ob<z*cyN(FP#K>Lgad|z+rk(5rO?HQGISa!&@s(5y^7;%AT@sle)
z4q`3ez)5W}JU}GM=qgM}7RdzWIUNI3W^1>+P`Q&(#^h1kYtkZu!TyN$=IWh7z<B(o
zdoFQ@E7*gGQkTPUz3<}e>lLo+dG7PQNx{Z|!!}q#i^;oaOI+&UgX!f*KE{}w{7=NO
zXTRO%;0PzL04K>;7?q)H)OrZQSJz$9tU?0POHh@U9<jd=U`TQti>@n;EVO;1%uWS~
z;6XEFwVV~SGi=R<<5Z-|BvoMPFbNjJMa+w5C5;Ds6n&IR5e-~U&2yYELoXCG(kK?g
z)wc|3LJM+NG8>1;T&+>SXeZ%6DNm&59gne0)iMys*qv>uBrJc-3njprK_mYbqo~+U
zT`%7!Xm{ye1QLEdf;win<00Gn0Rd0oYgrBMolE&=&)#x5^Ru`UF6O`-HP5vdTRNzi
zUavHi1^U8-bz68zbZ&fSTusYj#lBLGwGfCkHD>R(g9S6hm~up!Gy2ZMPYA?<<65#e
z<cZLOKY@nY218^B4MqL%xfgBQU@Z!oo(CJOBzi&+cfmH3B(O?w_`Cb)9$n*MS@qGx
zNH`6VLGGd59a$Zbrn@53Ra>4O*NjSW3jy&G+^U`5Evs1D>a=QilZzi>5U1@sJmE}=
z4@e_$16Luxkfb)%=^TH?gP&+|`=DKy8CjbyTx+fZ<>#m+?5as#;0E+r{C$q4`t-T&
zjLPS~cHf@qh<d>@x0uAC-M5qVplQ)19L9{!RvVKQhu5@k$%$=xic#m7)<ML@J?kk8
zgJ(RlG9aA?3UPy17Pv6gDdOnLO=H*I_;zPERthly^<04OZF=~NKM*5=jo6-*hjxVL
z_{!5D-(P?u;&d=9yM=!wWxvQFQHeLiB2kI7$sl20rc{MjGbe86><h$Zl}IYA8_ORD
zR^H1Vr@A?TA+b6qr?Vevv&<@5)_t`y0wdu-Sq~p~)2qYzfm`wWsiikDOW$Avx8m%n
zzIzXTyg5~Sndj{%%y3aguLV&rOhfH(JXjRWXGnq{UuEo|=`<)Kf2>z%4%S$Tgl<DD
z_I>$=80QZ~88*Ane2{xK(%EOizr<mw&fc5tcX1WoA*H+|k>26F;&_}Etwq5w9A@-6
z9Qalg6UZtxLZ}FY^xrB#DmlB72&Ux^4$+o_2q6kGvMxXg))}NE8s{cRMR@wH+{t+Q
zUDzRbW%iCPc>3g#%m|DZ7Eyx}lHpvY>MN^o5uCa+P!hU=ozQWe1N!N}aVP$_az7UO
z+m{!>${0S@1BK%PBY=*%jH_%MVg_}r$Iv1)toxB7G^_yEYPh1eE|GvBqmTrbTQdnK
z%_y#n@Qp?mwy;m%3Eg>T9CH=H8)uN(M({vD?W=3@f+Bf@VNVH~9w6#_wv-Z<4!4(v
z6HXcN=g&oOXkFN@?8)!(zMkI!+R*mfU!yh1h{Qa(Y$AdgikUy3X-?x0Wj_b=g8FYK
zMaw~miS=qw)^L?N{(SxLO;ZetW>*?NVc|w1N~=i<t3d7VCwlUpS6vvW7?dCr^OT<W
zZ0&*_50p?aypLoX=BupB@UTH`;Zspfh{VaX_T%RrdAqLp_#62Aeb#?`#rxv0@&5f>
z-e`MaxjFpQ2D8MaEEp8(aI1yb)4pEnegjpET1N|WJLfZD)2_B|Z*=vi)82wusY}B1
zH@lFU6E1Cf28HYP{;OMToBKsF`qsLQUSAFRowAKhw3fPh$TQtO)|wnjifQ@wKp?CY
z$5fQ9m#R5?t7a)Md8_7<hU8c>@PNFR1+6WJd<k_r=8u4e$X;xyu%h4*J|7+zEP_Ng
zVM+MIf;WD1VCILFX<8pfggp&f`s2+4rL^D{NZZ<MF2$R~ym4KCLadMvUsp+9^;P1H
zKH}06pWS?kn{mE)e-jzIq?LfH-ee{y$LFQE^=iE{;BVYlznZNfd}#Hfa1|cTEc(JU
z30xxl=2?s!72Qr~J_N`}C}guIE{OHx%MWiQ34jeJddS3E2tG~E46`2BKQT>n>?)>G
zT0SaDuZ%4ojbx&6Pa&#A?@4?npJSIeRvY|0wi|KyTFFbB;J!3$DJUw4Z|iVD8ClWU
z&XUO4y63d<TS@9-aeG%x;cEIZ?(uo3Xf81(RqpY{6z6BIb|1RXskILbEz2DGnJE4t
z<G0ixpF$GaA|3E{t!~LVObUrA9=dP<jUtVRIw$RUw(B_S<{%ZGixTw#)v3WYux*0)
zuyeOPa59gt`GDm`>Q5pJV)(3On1gVyw;hL#yL8>BS%xHTOO1wL8~YGhMP3-SNm|)?
z%C482Kt^9rFnzRuezXG3$kIEgI3%5P7|1TSji4c3A1wJUAH(E>zOO(PYvv$Hck8iJ
zIfm#*ql{l72|seypSR=e0#@5RPI5sjJf3^=)_Xm0Yte(D)_(0N@U1WWd=Wlb7HhtF
zA=Wmz$@OsWtx5~`Twfd~8}j(BCCUBv^TaEm!Q>w}@VmL$>bViJUWwzsA@F2$+wB3I
zOa(aH)L!+=lfS>+{(=L!Hbpy~W}$bJJ_b`U{Qe#M>w%WEnbHi~y4X46pbxOyS_-kq
zucJ&lMcv16I;@u2B-z#+bc$})7pYzht9o9WNtd5lwEsLf32S28Ji9MKl}W$e_%|Id
zdO1d9^fS%8ZWWQ?+vdD|V;jJmVVv|fSvWA{1T=HKbi2EGza37LZ2Z;C4tReUi21s5
zYHo=#DyaFHhbh0{_$zt_5NY?e75tm_93A8$y#8;fL8t3+^82-yY0?voe(RUP+djgg
zpdUnEs68UD#)y$vMCcwoZkOOc@pujS&9<3<7T2??^^%B|rqNqF+_0Uns^ZkEPtt}?
zSEiLXrz4JdK{Ti6{{Ux50O0HtI9I|nz&nj)YD~Yqg~hMH-OK#!)Ai~-88Cq@eHuJg
zXRrfEj)p-R3G%3QvD+!+Reo`GxqfE)J;Zo#X!+m8uXofjYbUNmT9=in+sp3RuewlP
zN~ANztiSa!ZUsYi(5gHDIEX~;2&{4!51Td&^|&v~zRzh?n*J`wZuhsj|A<gT^L6<W
z2T>RaK_EEU=zgcG7sj*ylC|*vtyBb}!$GaKJ%p!>)x3h`Qm$0RfjwjsCxNznm9>76
zC1C|2S4UooG&Z_S8=9b@V`J2ExTJnDnF1AY#4en~3`$x26cTf)4txKB0geuhgmx$w
z8Rv#DNBqzc3vXHz3w4;wfle8G2&L)$UCH5A_k3seF(<J}t<LaV1-%leB{n$*K*1Ue
ze0v`67XA+o)=S>+i$YC$>mOgH%4OOQ`g%c>Ol7|=0M-O8`1=4#a9yBSePD1a#z$~h
zRXVZ-WklP7M7-K*2r8*Na8+ZeHv6|n@G#~{n*JmMy>A@0A2_Fjv5e<?^ryIqOfxYc
z*b10g89sD>9T-6=$@W+O^|jjjFgHJ}GO*7i(%}!wkYyaj=iYTRkvm_sz7&zcA#9Ay
zqxGEQ^IQbmK0J(BcDeS8)*!o?(N}`ij*1eoGUwj_`VyMK03d+Af@yB~^RL1P%!CXp
ziF<76oofF_ogR<0%V@q#y208^UyYxa$JsO8H5op%a3JC`*0KGj1YHD(cE@`&$H&18
z>`Dqok|mS)V5fu-7MAe8+*ek!k)KY!SKud3{|<Y?D>j+%x0exwf`Z*~^6*>0V(n(-
zmm{{&_sKLU06hGC|CV<No0nYpyf`oeTgIQr`Qm%`My)*cueN5eMk&K>R$629A6|2Q
zwn5G37GAvbWcBP^J$z=pqzX^yUjj@H@RqF3_bbuQIoy1?0pOa}1>C?-6)y?7i?w`u
zy)UtT=Mn2CGuCYuOxFAoDqYOr6Zi>Wg*EF_xX)n`6O{U|dzZr%B{K0wV1ccvj123n
z3h2%BNo>6T?V{JGc9<>la)k0wb<}ERgp^|-n#hlDR%=VQ=>jve=#U#a7Gq)m^m1F$
zYgel6Of-Xy&GZ@AJdzEOqKGB#)?HlEy30UXJ7Ea9MRug1k35R-xDy4OD!}>&R?IO=
zaiZqX%~(amcf-RQPOACslg#U{6tQ8^Pgp3duH+@iL59>T2$XCv8CqyGz590VXybAn
zMLpQWnJtEQ78#niG~Z@^?OFq#)Af^h_0ckLR^MnYYJaW2yA6-KqWG?ZBTfX-1=Pvh
zh|FoPL|7qWnz&DrM)>v;&%+Pm$JlG}1fWaA-9am9)}kcjg-z7eeq7vCWoG-;hrYey
zrBV0t;`2?*mnyVBxr^#RP=wBe!le&*fC~NJ%lw{;{q8=7Ei5h<P7J3trRicCW^Pf%
z#uOb{F5O6#UrhcBf;~i>`2Gh3bH9<-K&V2$9j#_AstU}#RsA9Q#G0?n{f5SBuo&a`
z-$eS})BhyWnIMGh)c}cfT;jLZ|AT@_0(+x<A>W>hwE=8R+r3^8oH!<O^u6!SXMj#R
zV1nhJPI@H)pp#C=$&sbLSQhLRD^`RPL!<iZ4dEdN(9J$G-TzbI6BWhlTMO+efqI3@
zu3G%dA`nYHNb|tq^>2g|A_ag8Qz!c`7k1!w4&cJ3I{sf=7?6eIf4H!pFGWDD1geFu
zwTCX)GOSj3c#%Y}@%V0_rR>&FsN2UhkY}BRDtv+2(Bw&7ESH~HzHW=pEO-lt;bGU`
zb!1W4Gjx-U2S30%XcV@GYnG;`<_UnTNR6uZd$7Lpijbc{BiA;)qO>d3hq{ct-hVbm
zMMg1CbaRm_p?H<oa5sQV*b};k#a3TM5>X$2F%KsVz}oC_AA<z3#HQuv^85Yve}pCg
zBbb^OG#EJ7<+7id`aSLS+{_W+^LB=~pW&L(3*8dsZ6LQt+LNRWUUmITdK@;CXR9ZJ
zY}tNltxg_}nZFBixM-Crp-Mi~A=h(8ajVDX9N$Pytc6eWH?A<&q|}~g+drfju&z}~
zY%|AQ$vH$C44P79gQD^y)GKLrfk*D{71lQDOEh=dVr!3sDwW{FX&Ph{izYT06y@;i
zPr6u|mS5TafTbiXrf;f(tJzYVpWP0(#0*+u${Rh#pXXLAjIo4<p8^bvWq<t6GD|R$
z$R2c0d-DX63A#_Z;&d!AE|<jkc@3OYqv8p9bx)?5S&KE&FE2Q)*3CZ-E?O8B3`e3%
zPuX3C+HO)2LWmOQQi;SWNN&f8n|ja@??ioZyEj>+_{0?+j-r^|J)4NT*RkZMq*dNE
zlzz?k{YEdd!`qOnl0VxyPA#kUFFIFw=WGW_{jRAKDGAUh60_5hV5j-os3%GNTT>-n
z67Wm0biNP1O66>a0=@F{=3K3E#Hg;MbRn?zY1R#AuE&2}{+lzhH`$1)eI)rCo<=4a
zA+1we;VtHF&gWb?%5BN_e5$VQdSSM=?|FgfTE(C`)i4wJJ1sKJ?Y#kE!0dZPp?mYP
zR1*}riK%D`HZw%3DkLhCpAq5HL7_P}FyXs~QPhQkBmr&eO7lWY<{PDtM}+wM#bY<G
z`4wjJ1;B}m-T|A=Y!-@tWSq1a<bYLy`O1_yV>nM02GbEFDnm#drMxe}6-xmqFSSl8
zgRK8ec`g2z^1?1>C4)p_In%S!2HAI9l$*fW%)CqGki;FMb8w7)mB!mN_V@3#UNhVW
zZ%BuS!P=~x`iT|Fx}`Bj)Jz(M9Yfwsp4Tt)4Pd;wP#peQVay*%(xb>uWcyE`!(?Cy
zN0LtRksUxC&H%dSjm#iXf4If~IcuVGxu(a%g0DMzN1+Ndi7JybXffQRY>grJ$kA!9
zX3%amWH%UXK=?e{Aw?J}SuX*vN=!@T3To5+ICldvuco`*a#abfMEUO$3BhAw@8xi_
zABc>>>kiL-dZ%5pNQ|*HSq9OM@-uQ^r#M7G>_|thUVCb&uZ?Ovu)lrC65R6`Zt~}$
zbr;8E652MY(Yln7<ve4F5p@&%waN75OI)MKSapBc<C=Ti<D5p0#1G*rFO3#TOHL3q
zN_(Bn@db&hlpGVlEL$KwPF-Ti*zgR0B-x8ng1_@k7BuWl_6~8{_M698mIA6=_d?vg
zr`#bQAm+aD@@t&e3*@$We64)EkRwi%^6O>(y4k#g@&q7$7@#I67?B4K1BIHtPPxpM
zD54VX!Gn_xu|zp$K*$fC6O0Dytnq0Kx!SpYy<9KMEfa%Z)j7QpHi~e{+0Y-kzIF{V
zg`%>t2tTiNns^I8{pmq2RVr)Kf@ZXTti2%cQ^j)x%9Q95*Nc|S%ZFU2co)I*sJOoa
zSx%=aU(eIvQJV?5ZBZTV!}BPSAfs0~7PuPrkCLe+!_21ka9k%Vi(Xx_r8zBnuSfxi
zejfR8`kIoB!wrX`>D*e+;ws8Gp7-~6k}th2?}V`TOFZ9T8`Nx?SWA>*wQd^vU?B2x
zois)Ab`|x&>2Z9Q+eU&)_=oPA3hhCg?G??EIckOcKAw>DWXgmU;On)-ZD~ok{5MCK
z74%!_$0?12g2YWBS8RccqBACr9(ip9mkk0Ogc`@L_j;)v2K14O+e_sYl?%9J_U4-P
zstKTm5&=De+m)4~iO^Q|=M$FLF12mlk-g^W^ojPd<@ID4n@9FHC#x?g!&ir5U;2l=
z`@n4yL<Z7FqHi7TAJ^Z9b303K_DeVB4^>l52lQ&SGSny92zsW)*%;~-y&k*O5$F0H
z4p>Xi@)76GuJdfI%6g17iRP+L?<6+ra&<f7!;r$O73$u>)Uh>rgsVI(IyJm7+SkD9
zc>>;>NhurJNH>fHT-t~`j2%5KPoGJ%FYC_4u0A@;Ts;pBmB^UYeLrGbZ9=?DDw<2&
z1gu~V(>#l%x(dO?MWqRUJ&Gp-5(Sr%+>|<}Kaa<&FMdJ&yuVuVDyV?9dJ?ci88YXa
z<(qyr{I*a&GVh*23*Kx$N|j~!b<3v$+&pkI3I0<dBS22CSoD2mk<#D9d9fjiM17N2
zE;CJvcg9wo0)0+yym-ajg~f`TZLb%jxN|+3N8NIK9Kkkd*^~Wdh?wbc{p~{rQ(>u=
zuVjJtbb3f3#NA$54-uCW8W_G~&P$yQaFlHK8;D8&rf?0W+S}eGKMSwBsg8E<^ef$<
zr=JWyl)bLEtkTaZBYdHMG=GwMDUvx;B)S{!Pbdj|=UoT755Hj%(3gH{P?iL!z=##h
zyvET1C24w!Ue@1g_lNLh)~VMiJx79Yhx?q}C#;^On}PZc#KCUVMDHF&v;?>0irHML
z;3SHoWsmtx6+DzR6%nco-ojg|K-NOVPHGLY0)N!FNaL%1`$&-zAo)lUD?&`M0#y}+
z$m(yLE<*Qle)wbGa@BhmezYj|eNQS?SbHiWRqG*DA>1kJSn%bUy;)c}o67VT@dt0K
zW6%zqsCH-)N#|%tm3By17S(YS=c^Le06m0og4nvUeg1Zrf&$O3@Ai^)=;DNjr89Cq
z%OPzmie^2NMY(F@w>DkD1|AEM!FSD^I1zWU#o4n1ajKLDWRLN3M50^U3eW-&=5jm*
z(Lw7RNOHp(XM8jzTp^lr_adpiC7NuA)HS9F_eZ%xNskl|f%_bx>0s$UL?T({c;Kl~
z@KFkjtZNd1`v)ZD-44^jilB`Y*FoB}^v(#2NEMwBFyJvM`MGFi&}IGq8|dU~@|;J@
z;fpRiZazpKUhoZ|WcS2Su=4CxE^cuXf3ziuY#yCKRuTyAE@7_Qf>xSpXv?!T{5*Pj
zGd`~89?dN~QB>*p69`a%?KrDBQ5hk^t5pswpVv1y!u$oVB2eKSzQ^)c<C%QWeORA=
zfY<h{<Zw-R1;-ZJ$4ooK%ICcigUeZD!KF6Tv*kLC*enS5^a=4@cfVQ3Jk~OC39D2i
zXlmr%<eiTo^N66<roJMVs)<p~FQFX-rvmsLyHE_KXfaW@oy=$_tcf}CRq|`4QsHx|
zSM+ISRfqhXrCWM(_}?}5C_&KgnPuG+mh|P8awkLTwl^)F=~fr%6>{|6?o0hpF6B=}
zy<8mk2)A(E$b7{JN(9)3tZ{*-vJUKnwXkG~R_!LNLcBe^z}l03ONA-N02c4-VyCm%
zm^w^D%ArU5nr2zVyrs<V_d$U3Z02sa2eMyul>JN<Myy<-lt5sL^66N82iXfKuTTR;
zlKadJ3p&f4`=Z?pw$51zXh$GtAOdKolB=p2$n@*37Mt}ycL%`<S>*1esfuDeSg%CP
zd}VKd?bt;!Bwp>uHq6Bf?!i2vy@J~L!k{D}`Lk$?AE+xB2G{<!jK6Z^&fwb{)-taE
zQ)I8>mB;TI14-y}2_cTcqIXwS|BbduZi)G$h_N&4$+?ii*D0fCAedplw}c94+{-yh
zNQ@7=a)dB$gu8clBDvA*oftI|Zm}=fTZ(~Su33^ok*sCWn+?ZfI=4?49fX2uVZ7e%
zsS5HYR-&9kMkH=#jAA(2>~C8d{#(1h$CFm0fs>Jjyvc{PlEe@4Lc87b@QE+6I?a=q
z$?iNSlkS5o;1{K{g%<hxx|HpMWn6V+1vqNW$+v23x4S})=h$YEM1KutMWbS-`OL|0
z(Z`ysxSQ24Pnk~KvPpwgR=BE9ZBqFMK1L<FEFuOavK391Y5$l0Q}IvHjzBRi8~;r6
zRY}?gFWV;|x@*Yc<x-zif*9{V^=c71-T_kSYhg#VAs_QH_g>#yDl;eUT-h+BGYb4;
z_R7SDk*LKw-NxqQk<w_a;<CH0Ah?EUK}87v%x<7Bytg%sZwZ#)tO9Ce40?y`-^KM8
zX0y4~M4;4IE201|qrg9)#oa4h8+k20jy}LeIcmamCf6`+q4;;|hQih8=p2*DLIg6G
zBw$A^E9vl0^tNO<`S}6%c@lla;X3*Ru(w_1Bv@lD1MIO^S>uPCd|RE51Yom3_kJip
zR<cLUiPQH)C}HUq=H|7{=#{$S1*L9H4!ax2o&}>+u^?iXF`PlUM!j>lHu<*^UA#Ix
z?#EG<xPNcYM=PWqJ5*0R;eTEmVjpm8_xk-^@oK@8Fv;`ne*N@$in@q7*6=?H=u(u$
zrz>mPKc~<U-|8vVorFQIu^0(qLqZAA><b^5K1JW$7bb&2$qAh4D`jBV`7l3QZuk#O
z^3OjFZQk@MVqOKL42wLTxC4Psu0ILHA&L;QLr-*GwX8w^jZI#Tj;~Y5Je~NRf?(Cx
z0Y~A92CtKw(-;YPFOF<Y-~Jk=*!%B`KZ?bq*bD{enxi4_?x9uk6ngNu`rH#LeFF8k
zV+oV>aK{kN_;&^FNB5u~s(P3q+@R5dVmA*B?(rY0(_428X(?Z9hq1{Vvz_^=G6r;l
zzU{Bp%e^kw0PPKu0WE)G6`6^w7bpUFF{U$;1WcfzACMFrOmcwdcw$OLfB85b1qe0C
zVxWQ1pskbYD||%QCeIgK-Twm5Krp{YYH7Ln>?%GuasS4~+7V?Q?Af#UIJQ5&Rr~Tp
z?eTH@)3Y)dpa~9#3SGi=RXK}v$B1~0$fv7}3v-)_%2$?#NEhQC{!M5XxW?c9-adJ<
z4*5cFAws}BNLTIbX?B9+ss|`RBC12o^)1bRM7>Pt6TF@%O{-6<w6)_U@=~^hw^&t@
zrn(hYCl-@owN3_BM|u^cGpiFlDM)Hs<kY0NsYy|j4yYv}rqzj;S|?i4Vf%}ef=ZG+
z@xP`LB*SEhn_eL4_2fi;GNLBrL#vYwDUY-RN<Nj=ME9jpQ+9o44Y`+h%w$4DS$a>W
zm-H|0H^xXdrB^dAftKmlR;d|VE3MdCXvE(5P`vS>Xq6Aeuf#dxjZ?!KJ2m{$T@^xN
z5MdsTwNkP&xzQXxFMwHP{s5u1w#rv9_p7$be`9m+XLH}7FW|Z+CoM5I;lS$gC{381
z_*}X5{Fo%@pyPG!Xnv@Ag(`ejrcT$+{JnN#YH{Hz6)I?mgY_giwHwez@GFZTeDcFf
z-KH)^nlReT_UQ~*M8%@%YRW>LHQE&}TxwNcJo*7vcn`r{5Kq>FJ2Y#=^6Z19`4cp2
z0nIs3yLqE_?Kqyt#kq4d>Wh=JqW1CKl}lfc=v1felaK|%u&@b9-czjZ9K$@x80N67
zEThc_c(8&}8qXV}AUU^~F`sFR73I3p!w;sfe%ueRbU)zgcdw@Xq$@@(wy>DY>TpeS
z&-&A7Lnd`O<uRJjl^ZX=zfm3e^258##es<YA@2xrI1p38EY3Z!uip}tf>Z)Z$s>d~
znEwj*cX{fp{cvG<`U!58I#|#U2H>_t{C-!i%&vTW^^M;8jo$i=-ujK+x*^?<NXC7V
zIls~Hq9I>AG@)KyD_!Ut{bz%F;1{k=eC;a4e?PV1Z%j?tDz)JHr~zB4_Vd^9y;0^V
z%DjI~T?A{S2jAfP$-kyPp}|i@=Fho-P<a8!5l&ClZk*;fmIt_e?JUG4#@ChCrYj`z
zo+#7L{NBSWQ`aD_HR*M`&vErev)GKU-EUOrNg{1YduHYl6<{I|0+^(jb>yUPgq&+H
z<ZQM4k%aMbLX1-S2*i=dW-tX@e)Yx5^jRvo5osj|O$h;|RA-ULftElcBweoUPtV%3
zCqTL+h%6v0(}@?Df>XBz#4rX41W3Rd$a^cp1)`Hpty*x3PwoFQ&$IE<yq}TZ7KB*s
z7ia8ePwAecF)c8f1gz1A2GlOj-GTV{1Vi9RLh*)?326`pMV{Fu>O+pOd+GWd4TkJ5
zB^8ehNiOR>3x4D6iAn)VX!s6dB?-Yxo0QqQZsV^@c<Uyp83ni5O7&MI&4q~CH>Q^6
zXGHes=Y?yO2mm~mfdYcz@PX3v0j&E;h$fb2>eNa66NMfAm8H7o1De0fxGv6GIjN6D
z(JzX}V*eXa=`W1$;#pAFkf2n!GqhoLWX+KjWw;Yl_wYFe;^&WFy%80^5f#4?6|a$~
zI9DkFdHoFv%1n*|OhqNoHcA<5S8grOK8JIbQO4z0SK${Uj-02VneEOq-Pk-hOtHM~
zgk56E<r%3QMwFG(E|w5k?A4dc507EuM^n>4>X}y;fnH+Te)~BXB7Axl--hjl3lu&a
z?ibIJ=zQmwrx>S3i7N?B1Fd6{xgO34zLIxdT-#@lPO&TbldtWC%P=|;`rLke*?#r|
z32y%d%Sgq~BRHFQq9nos(Mz$W#9ytLlGJZNP_;pGCzni`oRk_;{vU%~8_j|4Z!G5m
zGuZrif|EhApnLRGcS7$CAesV%db7A0H5UF}yY$i&LlUkIBxWOvvv>L&NxB0tz&lK`
zWP~`Cm@^=-KxO<ukA+9#>(y@kWFNamjXc<rCUvOKAf9R3DK!iqY?5mi@!oK{jT?lt
zpTbtu8uT%>^a+J%L?uHa-hE$w#Ze^}{EdevZ9Mqr8b$A<1!&3?`=@^{&0nBNQt2{U
zel-mgnbgq1{hP%3j3B<>c#;uFy8>J|$+^ci=}-T}``f<rIb`%Au8=z5)`S36V|OCn
zy~X)+OCLR@0$F1;P#Koa$<181=bv-^M0NtPhzpNSi8qnj)m!$Xk01pH$;=DOgE}75
zXy?BK1(qqoWY7dHfvT;0%uH#XYNua<T$0M+zy4keuQwy-MaH+m<%drwgHU^L*It-f
zxinR~d5omvJ}m?PchUEW)pS5Iit)7?a`nxBvHb9^qZFYlTd&%r>abpaSTtS&eP!yj
zec^MOFU;M24nL?@=c>7;6p_5L&fb~0zIsqp$(5{$G<G!5d|_<BmEiex9t1)5F!o$4
z{<=(=n@5|xkkO!Hqcfe(JqMV3b`|;OOUN#N;}OJJzJ3!gLORhV;!4(AE3~CXyuVSF
z{$jM5K0<^y>dQCkOJU*AL_u)HBvKkU9*7b}kk%NqCM?DM+&TLcv;VixKF9mOAmetp
zUJ~2=JA3*??J|(4#ko7nQ<&v^LFMX;bC)Q4=p#DO6qj6%wkmbuFCn^kz>I0C%8@A;
zHNB!-^f#&MGb!}mXx`}Ox9v9JDB7SL)mY7)cZF$O@8(v(XQLQri?LE8V`Dr~<ow;`
zmoKT|hJEKAZI~welp4mQ5?|O!MMfTpJdUP=fjciv0e#`3eQp{<Q!-&F33O@Fm)g^r
zl`rl{vj?&}yr6EMWmZ&FGX2vmBwgk%9<BiU;Y0k&F(cqkALke6{s~PARt3^Xx}sbX
zNc_9-9MIu{(tP?9hK2D;u7`K*llNi5UY>uW4>aoo*GT90>(lG~yXfwIt8N|v!V`)i
z2r-?-L>6da2U18|{cBLMoI6MkCoax5aIM}Kr-oHwZyZvLcxb{QMH2wua1%Z)&Yh{9
zzK4$;I3eL&e0c?*g6`VjIhBpveQLCXL9AJrKcWm(x<XU8fyl&^=E>_0qI6ksmx0@t
zys#UIJJ>KG`t3T~1PeCkm;^8$hHyE+2?<iM*RGluZUb)|dwv~){+3ARWgu@Hd)_$q
zv|d*ICW_uKMp)d^R2RPlQF9xr{cykb(FM%nQ{M$9@FnKMTEk%9z>w`odm#Y?W)8V!
zA9wzu!eWOL)3`JmTIT+3%!Dn@%`d+A3=9I@Jh|Y{F#o_lb&fJ`-UDcjlMOL*S(*BD
zY5og75)hE9<0pMAk-;xdos~BVyrg`JiM6DSXxB@`GO8|IW3t`5FT`}iIt+a5l>(d(
zfyvwJ^e!@u<+ZvP1?bQ^`5FCoB>Alq)&J@Q?yNdFGwLP<CZ~#jQeax7k)*(#AGSo`
z4+5f_=j@9!ROap_{|B;t>Lf9PyR&rf92KuGJ@|a-@?FWVW$DsmW(;@ozWsR0zHoJA
z`T`o>Rm8h)U%5C|st%*t`Fr-vQzqSat9Amx5(>h8f)j3l+MiQ{)`hF~$r<R<!Uf!0
zaROJSrzHibU}ZrRpdhO!w&VCjTb#$I(q+fvhTb+;PP`D^ZIo;0%G7aY7e|P>lT!!`
z+)2`RD#c%V@^I<iLsr*+Y(#cBK<>pvZtpWT#W$5}Wd-LV!RQ;ZGjza7*3M3BX@U^3
zM<yb{m8(x{vp+7M|H*!Ng-Qi!M?^;>j4y_r@2`SxfhZQ<s1e?%5q|S(1eVT4+nNv>
z;pIvD<G)iQ2sw>QaNUF91{uWEA;6V#&E)cLWbtbm9E!~_m!TWD{(6^U-UfqU?pzQ+
z-V%MWlZrTBY`vafC7{&3DN*{`ZLBw9=Cu_wzY#6J5iPe^wA^wba$_Q6%0uW^msRr+
z%w{r`&R}H1<^2uSYsMY*4b2ob>#vJu{sz2Tnjo1?yQZv)aJIlv<QJf7YeV)ok7`%X
zQ!NOXdVYJ12POca7erzqvM%n;d2R98X$ZU|NU~IjV1Rk`A+5hOJ7u4{3>Ani0}Vm|
zuHM(5*mus=zB~@Ee!9DK{|v+fzwnSmX}Lyw0G?(qX2G55@olQDP-MpW;z&9#N|1Ev
zgCQCoi!4LV92x{bnTJ)YAAj*GM}C8O{dz~z6HiZ-UzX~yRhl=N9z>FS)^1JPudc8x
zA!KD5HCI9v4+W7IbEPNN^;#NmMw2Dlw2Do`m6vym%c(W(=EAz$m=?`TinZQIYD^bd
zAFZp5cIb`EI{2*HvsDK_G<WTdZt{(8vO(QsM<bd+sKYcl@YQk8X+k#`L<Pv(p{hU<
zyXooD^Lw&d5%hyj$tJaf6n(AM4N_YWDM%G-h-(R{iOk~M-P)}M%<EwW5LiOukmINh
zaZC2cw`qN%NN!-lh`!iU+vd(`lZr<XZHsf)mM)(Kx2lEbPJF?o`H%1_U^-k{x)beV
z0jQ{Aae(`~gJ;-&%Q_5ZH~%wrgD#8C+WAj08anNn>7~=t|1<TaqfDlqg7#j0FCn)R
zL!wcwB={I4Xgn4F1refsa$)86OE%cK^Nx3v09{!7jp;C*Eo#(Yx^DyOFXak>w3kX7
zL3Njoxf^`ifCl2jLIc+#ZKe#=N0*7j&$?+cofdh>Cb_?Bt;zIhtw}wmYs>g{_UK=o
zAIQG&-O4dczD{Y;W5DQb<;$CRbArtpcR+T?OlxlA&U%B-YVV#Bu+`l@B~ae!KHOtn
z$otW$)^FzissU)3(X1cjmYB_I<M+P5O2tegZRQ4B2X(=3EfOI0x0&_x6Nh`xZ&QM9
z-`+XWK3#>>yc}EGuUxEcXvM1ob!6dc6DnBGCWE7<DY3f$QV1?+*A{$GQM*^g0K#^y
zs_9vCB6tl<gNjS<`pu{=0}a$)q3P^VAI)Z<X4XjG8K~{x2|Xp38x0P8sMz1@t*;u@
zt5Sn5IvR|+`>_>yV}k%}UP_@QX_aG?c*^L)`}ghGu^sK+yM0GH^3WLr09j*9=Fwzx
zmV$`<E!PxttOYeY<K-K42KO;J6X#AcIMbHm+M1h*O$~`gUg8UXOEags&(NE&GUJe%
zwJ~$ByE+Rq4yHBsWgOaStjhxJ9wyQy+XrtYt5hjjATO*==38%g!AI3#*zo3zC!CE@
zELX)lDnG`JP%%4s5Gu$ZXkUp~!nr4#tB>Rgnc{e%3X6nZ_mbMBJenCFgIj>Se4R;D
z*$s35H*Xf0P4oH-V$#BE$y)H@=xD`)`Y~x5<lU`}Yd|FjtjiQ8K?~=eL|kG5gF1~J
zP~qXrmkA{9rq(nWQ?Y=T<+!DFtB`d^XdHNlp&gIGc}|=8#h|IZ0k=mQt!2%yMNnr|
zGz;C|$isJ}ULYG`dW~cB0xa`D?jaR6jRm+9bBBE6n>J`d;)|drOi83gY+RqoIt$d(
z1O;At>XyS>w{82=`&+mDpH^^XYi`A7oUFhEZ(Wa|W~}#k-PdHh$D3bWP>n|H_6&Qy
zzSW+gQlqi5TM*v5L3=$@)kdxLOfXsziF$1<^(4H%9y>kWyop2xl#H65c6i-_n}g>@
z-Vd%`_Z1{GtYmMYnIi9lx1k>|5!}p=*JxRckyV=bIe2cyz)!5)V)H%^&@D9X<1P4@
z_DKt|)rNgWdcw{6gwCl(cVedkvqS<5<TYu{*p!LzUWdsYZ`-lN(vee7gtrcu@w2l-
z$|rz@9Tge26n~yLtlK4DIZR9YxIr*HqSM<Il`<Yu$_ut6G-Yi)65L|DIu&ikPwTAe
z`1jU_=3Z}$I{v+pqn@k=*))sVA(eL#j!!nsBH#m?5JeE-8i8>h027jiO^O6F5ZBg6
zc<6<IHa~LT=E8?R5F)<-W-p#`P6k=Sq-4r5GEGCQk|D=L#u;k6+?YlU8gE~VoZ`_c
z(h#_2161s*)?zf0r4pXcu=AE0`xf|u7|WMZ#X^3P-cj&%BwffN&D=^i<3ewt7%l5U
z?m+Q&^uKvwdQ6m4EPENIq$MJ46k(G^{r20aGoI-0kMyFBL@eGH=|}jHysh3TmbazH
zE9v~M-C_Q0U)ds-%vKg}GAkcJ?^*?`Oc1|2H<2sIJI0yT6lTiDFy$G?-63|O#!^sd
zyhM%|$h($z(unbZ`cCnpnk)i%LG76-tAv=EkSulDG{c`(+|a1XFKZKLEa&z`c^;rU
zDhVW$*VWTQC$GOZlIUBj*^`=1GNn&cX6mKIC7#tlcMsAQxG-%KXe;nPn+_j^Z?Xhl
zu=<1#q*Xojg#*D&6l6a-WuJUX(*@CNfF!>PH82IYu1nwp2B{P;d?T5x(v-Dx-%&`l
zs}CukuypCG+RR02|I3VGPoKkgy8oH_(lJGL;{Jmp+QK!+_(lRVEAW*e==aA6+V}q6
z-TMxtcJ0{n?t%AG2i|*s$Nu;B?%J+iENM!-To#rXMOO5nRSk!n=OOhj<JQ27iYzFg
z6;M-S?nK~FfZEP=_yH92c}lQKWLyq=>(*jsd2wUzFsV1A3Xr$FkYwaiMSE(8x&&iz
z<*LM9Q^f=0P59yOtY>v4^E!Mtw4rUqQVw%sqZm9Q$qGmDJrmRWC?Rf@Xg;4FOVb~~
z0Mnnl;IFWEqnORc#2lDzs&6fpQwN=0D{?r#1tlUywxUBv`eKBR#1o0gnvjt;v~v_4
zwa}sT5esF~nM2k9%3IZq6-0Bh4m%5G7*!8p(sJ++8YbmZrD7!qP=_O^S|oRB1m`9$
zW})!{UNaf1Qo$&uB4xSJ9TOR=MAH4l2hh<&RsjNpR=@*?wr7ovSlLQMSma~rNi<@i
zu^cUiPk?gH@`RcgK$YT{g<Kq?F{^s0hyi7b2;YS=IlS{>hybi8>?RNa0z$II!p15p
zSY!ZDDOW{@tg^}^O4~$=ajNZ0X>IcjTwz0%3M&{=>$3VW&bTOh+TkF&6LGqD`nuO<
z@xbRp?SN7CCfP!4-E&T%4ZWQ|I^k|catC0yXp8`+g<wZm*e$2+k1f4)fAqz@hIjx#
zZG)$|(OiaZH;>WkgL7F8=X7-+nU7ZZH<e1`t9Odo+>rMn1J*+k?JemCd7jOUj-t*^
zi~yA0Ok^}xWpgYRi+3hsiJo{o)}1MijpeFQ@(zvEFN1*=tfTlIm$%5IfRj#dcQ<}a
zB;qK>KGA4zc66j?WJ?#Y&7GZSbJjYtxiFs3!}xHIgS7?o6N|)9M?4Zw^d<1*K=pTk
z-o`V1nQS)QV`Vd$-ac!jKi=OTOIzu5qI+~pe>&aW8;d70u{dTN3TQv3EAc+uy8-ea
zO5ip*qye-qS14Lmx$`jI#}hg9yLeYDk=WAR6^r$~9q;YJyxQ1XP<lH)6$a3OaeN2v
z#+&R2f?lK8mVtPDpeK$xV*N2<oUng<<gZqyI)DcEZ9TB<y&)uzj-NkVIt4<~LcX+F
zJV7>7@;F}2;bWvUDJ-Y|MkvvbI%w5sG`g9dHJh_69n@y$xZzKn`u~mM-CIzHUiWXX
za)2v$D?5)^?f*vo{dkioA2xz6vLGrba|m>R*$CRw*NZ7i1z%}kk%v3lBdEU*?@)Xy
zkO!TPL6EdWgJ5DLNGX6!7i1=#%_gi^cQhJHjP_@{M}#Fy7G|JU>7wlj-Z(7W{&*r1
z??xT)Z=%;FEc@Xao7o0+!sM@RrGm*+8#;i`z+wp~XR(Yn;<Z;87@&%%OtCZxSBH(9
z;xk1QJ~#!3Q=;jFgLL=4{TTO0@KGE!G;|omL@X=2d?*hAW9sA`H&i|1TUE?QPOz<<
zJ@^8?_x-JV-rbSfw{t)%03XsdrUbymKcCJ3^TSnf<Sj`IEw>GN=5TvN=z6;m`Vn8~
zN9sTqN2kMg_))FZHZ+d!fUZL%Op>D}L6|SW3=`8yQ|<{*akYr~jePMaVfPC~B+El6
z>%rGW07R}00-Hx1*<CV@Ov^(l(((mjfZyUJqO{;F5<|=Du}tbBbgdBCDg@H}eM)5#
zSU$d;W5<5>baCzy4FvDzN>u>Suo?F3RgfEeK3%*04M>Nn$%Ost<jU=N2`I=6<}UFP
zj&m9a;wIUz^hAytGN|`pWuygA1i!@;MjDH*j4Y|vmds;}$YBG=y(iW;s5bLCeCzH0
zd0}QZ9>9LGo|eBoR(pJ$rKzveV`^^Bt_@+ux+j992ff|x%?*q!0IjBBl>fQ&09so3
zeEH!gi*x5{mmb#69>Z7h$xe%0eIT>Z^(VFScd6$x^?hFc@r*sSfX83E{)EO5!joyA
zT(D=K0XP#D+K^Zk^6)-CFP`HhU%~8Y`{ezlYacn;91IH+GBVLT_t!pJdh)fsa9L}s
zk@Hxp*w|_8Ew5pBF1G0jRPx>_M5TUUnDjR$&tbAZLASD(+&k0xK|GVte2kc4;2#6Q
zxV}JKtp__6q^`gr_HM!Vly1x+X8XHhT~>EbFb*+jD?WT<P!PaPCgQP3KU~CPk$B8y
zCV3Xv?URM{ST3{Yz#sN!(g5TAVl`c!+yR1tEdCkh1(oNF_iR#@RW8DMN*zk)M+q2M
z8N##hguvQ{LZSDIRO1Nm2xm$aXJAuNf#V4tkxPD*Nm&WYF%ie>XJQcFSMbtC6O6!i
z#)rV*JTc^!OSt9KQfR;>##C^S!W0bO=4hf87C3<zeU$QeRW8ndQ@j4kr&SXFJy2r-
zy3b1n9Kb|8GQpiVNO!|%La)J1Rgf!1W;>hv?1VcfaH$D!_7H7noIkMay83$}_}J*^
z>hFtmC-il<yST#?+Jv7Sdc-Q}7zh$hhh1?h<E0WY#zvD?H7bk7J=pC@uIqP98p4X|
znb<QDyC6)oKm>`uQ8NYCs4CvxPJe6yW9k$^vdj<jadLg0tcBNQ>slu#!BkJP8K{JF
zSr8LdC*|hYESgw2CK1z)aJT1!@gdhhMLU)!lQO=Hai1bG*1VAB(ZHk`nOez4k!rNj
z2#j2<wkML!OgkN$vf9kCsRE+!SB|E(wcdTQWkqv1XlruxZRT-1KV(k?M^oyNlVE~j
zLiU@HA9BwF+JC84R)TPRy0^iE-19pAQas9yfopdx&bb-5;^vbZ5=`KwnzZjqHEula
zUCr@oGaJq8mn+xk$*~bvhp*wQLv*S{n^<3^I&7-NYBD=4&$bi6#a;GIPP1UlqRer%
zSk4^MXIZF#H&n$uQL;CoP{q@sP^s^@6ShY)rE&Zhub<LnSaBg#0jF4x$Uods&?7sH
zP?xON2yu^NzQ7i)MD?>;hc&dp2NN5|Ya>QfX2f2Gnqn9n|4=824;kP9k#RvKd6!I0
z!bf!UQ(l`FcFxo&ewlL6=Q?j-91c328DKr}sPA}WRS0t_LG-C2?IH<`I=ynG=-7BZ
z6*lg4bFYVd;F0wntz<yRQBGmHJf5*q;aEh*i+ET>%T~$4tZiK4ho$7#5kvtcH{#X0
zQElC0w;O1<od2TX-R-QDAoSZQ9CTkh6`I-ACRgCbcd=w+(ac6h;-}GV+^HL-_-<ji
z>wvDi4|Q~Rbw&Dm8@uWR-#PWF@J-bC9x_4ZtICzi;cr!EOMwtpEEz$*o2#13TVZu?
zNt>%(pLxF>1t0NVuJcW8tvO3OT-p-!G?l6Z-FH%@)}E+3=!SNqSi6bMDAjLfLptZG
z2FY(*YK7q=o~>E|zth!JxUuuqqdET*)~iw-O5szj$y3$~;(DF4Qu}^%LGPXK&4#U`
z#`oxIPe&D0!{?&_V0D|anJw9=ZO8_;W1Y>|z*b!CMr>jmHnj;oDm8y#n_<7cBU|W?
z|K?nmb2`xO*Qg4Oa?C8cZ$F3cb5qOHPkvjp0*~|bOMCix?Z#;e3^|cym6BSl1f7<k
zw_J^YgXOT}!>g5No{$j(Nf{uk3o|gq0zkni;J)@uxnLxEdhms4wFn`#!fQz8IGr^|
z|7n0OlG*s@ZD%Ti(8*sQ1@s7A(=aJSo6K@P4?^gJj)26_1K$#j^e7YadeJaF_3s4J
zpeqF$BE0Usiu~be$hT&*TgNAI`CPg@xvN;IgqfZ>ngLv;$E!u4)6sXv3mMSIJKA3O
zFdZwgfsV(zzy>-V@9&mu75Uv)wDacAOqS2G8!vbU6mr#EIuGuW+`FFmtdvUmN$L)Y
zn3X?`hVIrS3$j$KREaaDA|Q6W$*9Ab<JaM$6OJR*-Hhnf$<d?;xJzX)I0+5@g;W@F
zBItL5*AX}w-#oMO^+!u59@q<0_PsOFC{sW>0IC*5u(2ixi5=8r&LP?pR5^rCmQX~0
zf8bD#=qb_&VwI1NA?7E;>w#tzOum3#LZ3LD#2X`<1zT+KAO8mO#lk@iZq};g4i>1N
zLbaSbc+e`NG&oEYCV5-%8&5|iH<G7*NBEq}k7q5{z@3JzIg&2t($o_V1IrYTU{GN1
zVS$CJP#Dsr-~r&*lk^XCEK=upNE(@}10Wj~FEj|WbI_{dmN**{O0kvT6bVI$I!K;G
zOyOM^%)?4ogb)h}zRg5S39z@r(~Gv^U3Y-P(W;BRluAYcFh~`vbW?>TomP6xN)^i~
zK$@=7a6K@^G4eyi>4sE~7zg783x}_}diCK_rwb8Q7)nMMrDEZSD#_4TC8YQ4Xbvx8
zmw2Zf#?h>dOtFI~ylE3{Hv(pmlne?<!<*Q*vuA5F7ymVNJk%~zDQ?RoxD0_=Ohup)
zPVcy*7*%RwdJ-k&Q;u^IqouvXWhS2JO&Cpeg40s|dF!pWT=Yk&FFsQta&I-}c62yb
zN-@tQ)~)}ay?5_!>PQyF|IerBE7rZpu`DF{?KR2_LrB6oJTifq`Q7E?CCkz_XpkjE
z4})(GcNqxSgy&3v5FiO;ATY_qkO?6M41AY6-m?8yzQV7oU%Ov>YfCm{9_%xRSbKL@
zcUM<eS65Y6kpLwz+yy6F%6Q~KRw&#e#;liNmR%9_c*Z-<OblhK4w@-9Kz^y^3gt%a
zmuOqIL|)ai#e=ZMr<`B@S82!QDe8*O%f-+HmIij994^gE;#zU#`Xm}3A7}fJybMLM
zGsfagYSA_Tekagg#@quedP>`k<EWumwcQ^!w9J8blA%Iq|F>2*GlCt<TGWP~P#?J7
z+{ENiD&fTA`M%u1#6W+tEg!fL?Ese6-C44{hCoN3gnRAuHV_sv@RBFKV>|HytW<Nv
z#RswyCJQ-K%$D{yvilvhAGRZ!rOT<3dw?)IcsFB^iiN2{o$X!?jGC$pm)@Sh<mx%j
zvO8DX=?yjEwnhrF?O+PCZ1S7B*3DFiQxpBRlf(_?zLGJtuTd%D#roS4z4<A*VGRIF
z>FuW%)CeRjga|~ji)9?FN4Cn%HK)3Vlq2E+a|<&-ky|Vrkj9qCMDiNawMbT4yXU9C
zk1KGC<^B7)yhlsWeXc>PdlSzqs8Ku=)Fg*egx%u*B$MX#z4vzQ-Hi#&*K$Lc1d&-F
zqzUH6g(7iC#($YRML_VeF5R_+C4!M(u?^&{cy+3lftXafM-1gyL{5&`d!{?07#?4n
ztjQvg&GBWUy5q4vvWbR_M~?I;G-~;lL9G$*3sVkF;SEj;*;FYXil(|^$QrMUIMw#{
zk*lA*!!$g`M*FXliXj|8X}Nz?|J9f0ZJHYFjV1f>gbXEODNDc-NB|YMkSQAxk$KDl
z?Zoe)HH-{WBm#o=PFY<h3<%Byq38a9<jhifhhjbr>dB@w;8?nOQK0kXLnR;J7EorR
zfxxr5dS((>?bdDkQg7`ZRVM>~9c#sa4{IJjvVbnS9+eu4s<VODH_Hwpun4cbEnJz_
z4TP>SaHPE>AaE5mVi|wS0BLn<hYI_P4Jmr5V4h^*lCMz}u#Z!tK0q0;p|kRx6iKHg
zFe0lOWi)B!*%;;0mSGD``01tQIg)ko>6a@XJ#^1sK{Q+$iBtv`R4mEj@Vul<P!f6!
z43Z=l%1Gkb8tiG&!$Ki0bzJ{A7ULRluR#y9=6n_SY;k}<N)mj#+R%6Ex=aSu>6H&+
z_{|)gCP2uB$=|5EuNjRshDsS+m<C=8j?60QCVTsaiR?Qt92+)wgsuFH<-0Ek1;PlL
zD3^;|TNqOcsrRL%>q~kU;Vu)L;=Jl)Bt&pG3daKANZ;(0$HvAi-oEC!b)xovVu;as
zP}Fdx()$gkI^&66gl^uaL}SJ$rw5Qh*pa(?hwVFa>v@o$2%OMBJKa-t_E$mqI}JL>
zTp4sPRB3c@%QTsl0w@77ezN#nH<dkz4|R<yn0Zja(=3DB<tY|v$0GiX6ZlzrTa8|c
zjoV9`*Pahf^ILD=I;;k3f#^+{MlCy0<e?FxUU9ZNsv-GQMcX7Q)CwhxcP!IJRnRf{
zvZ0UeOzN6^O_0lJDT~9j?^TA49i{-AE5o6pP~sF<C3n$4>lT%-Ckn=$_5gSJlCOOI
z7=m4aWg){L;{{Ei`<0K0F2xXE{GdteGFOZdn<p@27Go$haun{>2NB#xWQ4X9!Hwd-
zNz)OP^T2E6#S!4}d$}G1ll^<;cbT@qR6WT9lM#xHIc!w&;8)^TL7&E$GM*3@b6*NE
z>$WDk5fRRDz{%!GNWK!@mooI%FYpiYmr_k39&am?F(WMVVhEZ2)pDa!)6b&{#ywaT
z%=$v>F}S^ePeNgOvG~cEH(LkK6P3J4$ZXioF%79xjL%=aSSH;<bW-u!`_(!-C}6C3
z;ae4^hfyfEOD6#Rk2IFCVBZmgA&3?GkhE^i8*lW|-|t}L^sb*#0-ni;*iwY$uy&x_
zDCQxA7d~tgG46w~-8e$k(S=&Xr#19}InE<yVQ-Q&6w$6`hzU%O!&NE+#_LR*$iwCy
za?l?dwK|XAP&eAj5i3E4C4*4gVE4S?f7bP(a{J&e8Y?hl!@lh*sMW|;`PnTuWn0pX
z3Pg*RHh0VcA@VyHOZpN6L}AdE8jkf2_H`nckPeAsll05WcOXev`We@<Q>Xw*g)$Yb
zq%W4mZc*lG->0&NCALWIQ23SDxjgd~lujl3hD-r+Q#-`K4@PSF5j6U1k^_71`}qG4
zj#;Weu}ms&Qcog=5@S#b3(cdfJ|ZC)zHZ+5v3dQk;K=m&%*vAooFw$_(|;@3JaXsn
zqfrSbF&RO&Ug1i4C(r2`_#_^2EgJQM(IxU2(BZJN9I(GcF!<3CUWEq-!RtkRl^lb!
zM3iF+VCn~QcxZspj5cFwxiPJlDb@u;;zp@%8@5@TVGuadz1laG(A5JhiKr~KRRf2V
zsHInA?@B8>n57hEHEC%lvy>n^X<FH8ttxMff!cw>WIZiTah6$x<1S!FDr6~1woSb2
zq_yqGQk0;IK(x++eaH?u?oXtX@dWB&Ix|r%UoBV4%$B3otmU%BY?aCuq_Y2pZ>vtV
z_*nMC;M0*`@80`n#EPdXkFl6eCHn^ZhI$A325kjP&f$8vl7&)<UaZPt2_c=Z6@g*v
zQ)BdNvT)e<b2e9R01e<3{yrQrLX<l=9aK<WA=2M^QO#09Z8V0V*gQyR+own(q7-E6
zA8A3xl!`G;{`Vk|UTOg^Vd?tro1{_MWN+-Wz`>WUxMwb}oSMfBUBuv7y?-2H@I1cc
zes>I)3BxeZ%EBk^w-;ClX3`vu@9ZvozH;vV^5Wy>mATau&sJ{!b@|D|pO)s`ua2Ru
zbqoq^P5rcVYW4ZG)thIQAD?Xg_#Mj>a^^CJCP2C;zF)hz(7gD_{phYY#$+6=+<CJ0
z=`E;!>NNfaogNor{Wzin!r;cOogYW;$rEeG@4Dx{fcVoSHqqO^*@~iXgShYHdwv5<
z4+*?=VNA(H!y*?g-GDgP=e}_7oQ4kIIV83ldIoiuZmi9nZ(f>TIXBn5_F(OsD>AN9
zapZpax%<%#JOuat$>xnYckX2KhlQ16kKE5s>r%k#)4w*)ehG1l{KnHfHM@3w7UFL&
z9(V6wguqhg{<3o8D1JkuXUmQE#?kWP?dFZ8wWB|{&%X8r3gFHkYhJp~gHWA;NgTIu
zp>Y|!OoE-If6zk|5r;5!<FqD(kUsmEoYVqdDF3wAyjd)aWlF6fb{wxfyz($CPP<x5
zjQevLQf^sxME9w#o?R^+&_1?;Ku^ECGv1)&uCxT{CkKm;`4liA;SC0FYzLfr)~oEj
zGs_!WvRBRcWJ(f{MM9mF*EKv4aZ{fs^UP*FRU3Mh$%s7J(GPUEN4n4yoJsK!J71VC
zP-oAH8RX3ht0P_<>53RFb(!OoG|XWPm0o+Vt2E1Oio%MoHgK@526Q5BtEuUi*Hqc3
zsQ@IkG{stG$J55*riA;97DmjNYqeqHv(K_i2Q69?%0w2+gWBp;OKF3W&637Q)tar0
zR!U+0ARNw4B&y3g=JqQqBAis6C{pcpOaaq0f{DUGmNHSOGfNq%{&Fp4A?;%N!4AZ}
z$IqLycbXqva_@hLeiK9g8AHEQO0{J@6SWjgv&61~>S#J;iSyWMX+y!{9~M!YmmW~d
z`0|t6?wLnxC(a@w;d_E7h+qgLECh1hce5H8GTIURCun{4#?U*jzd1(KH~8o7Bzg6w
zu5H0%-Gci*BI=~%N-0|<4XN@^mV-=J!$}HX^;9dVzY|33odm_V|71vtO#hZN7T!Ud
z+GNT<Ur4*!oM3^R7>}BtoW`R5^3$^`PZw5}E@Sy!#Y|gv#MWjp-7cib#J=||;0Y(x
zz%v<vaOw2ASgl@d;M02*GLnd(^ekrbv{MD2Uo_np?;VIk;u7#=2PgLi$q7QGn@olr
zi8C4e7rA<oIf6_vhd(dNnr&~~&zfzNwgm|`L)qf<7e3?Rv)Ib1N6m$g+*=>6TsSEa
zg$)Reb}PrjWhYfm?34jPI;Kkx@adh+lY8rqd;dHHQ+|39AL!RUJ>j1Jnz=ISp>xm7
zxeHg}oQcoV^CxX|71G{Ft}bcNTL#J7gzVjd4qbZgKK`ot+3l5^3%HAP9Ax<ZC}U{5
zeiyMILTK&E!{)*dKP_D~0waUQJV_Qd@zpH#B*!qultmiU0KFy>1RwuxK*aAL@7zyI
z^LT2S^M71^^4I1kH=5sH0+0}Ld3o{M)pI|(=P#3@B%HDtHJM><@$vHGlcW|6vWy#}
zAY0&+62|KDPcc=PxblhRt&p`gc-AZ>Z-DsjO#J|E(f#fi#AQA)PqU0d#PFpXYqNjy
z9ngd3Cl6Qe{MkMEJ;V#YIgjz`JC^44S)g+gaN7OqSo8Di?)iJG$Dalu?CHssi{C;2
z0cUHUUU26hYB*b7oW;v%_4yx{mzD@<ZC<^Q<lU4#*BmS=c=wcyR`o)r$ABf^v*hLh
z3lK9~a_3H8eVf&)o4e>u?S!=>X1sAZy1Gx;Qi``0qz)mviJgfcOreF<d$;jRp!fjr
zA*RJdBV9yvqWoqY9(DCupjd1AJMhphx8g1sC97wSyVo9&rA8v8pLj?YU$~yGgX&(E
z^N}hAt=9v|Dy_aS4<ucJ>x-lR{2f|Z`q=&UO7q%-=H>hAk;OL5Mah8NwjYw64mxs9
z@=->~Z$imIM1;{d5Z2u;<X$*um5q^m|2vWk0Hgq)m9z=Tyob)CMD(yvP=U%OKoNxs
z9WN<sk)5cB<de+dMM_B<$h#Rol`sHi0y0iSfC83&{F{5`gnQwe<wvL8Q;<=C$(7w0
zT_58&^v-N(w(EP=(N<x99F4BcoyBv9r+e<CduvWQO7Al5@%XVDw<rP7y_{N9s5u2p
zASWJT5sO@FMXNXOuH1UecB(JGSzi3oT{@1(4;rZExl{W1NQ0}hXO<T~YA!BP^AX@i
zv-W^6PAow*LkMm&!CHc>A(uXdPTab)`rr$&%+U9wVWq9fY=Ap~L>BEhAHnkY8cUXt
z?43*_o)@~3sP@AD5P+94F_Y0a1a!%aps4Jtg4;jb6;l=<DX`lsk!w_KC#AQxQd=aX
zhMoJ`{phay=y~%`-+}~iYIga_Z3+@ELJ4XMpv3iqdVl$OS+RZ1-_=U%rLf^qdZKeX
zg2YS`4IH@;o`65SxEV4S+pI>V&U&sY8sX8DGpo-&#w3RuYo1-89ROxVuJ9!XNCKJ(
z_JicnIY;qIt$$juCPF^Sr9wq5B<>W-v?iA~v(&Jb<vcaKHx^e^Oi@dZv786IXtDy;
z+`04a{XgOn;GwMi?HN$z(lK{&5thrNhs`H9S08>a56A!+q~<d3JmOvj?5--)3RgQw
z3FQw#o9F?|0-HA$-RH;0m=)ZI#Zgq4{HX`w^CVs=)UfXP%Enj?*=CO3{>@e!1jRzU
z8T6N%VIOxNUqqw7rWb&Adm6&MTt<9{Y0lK;vrb<oF|z@lOl<Ca{g}k;1b8vQ<L7n+
zXB5Kie*=5vtp|7oMt3Jo3h;c9jD>FNaQj|J@O12h1W!v3B+`Q9ca!imG<jz-+bGs|
zkv;D2O?T;hke!H<Q}gJT?ya+Q^N{&g0ENhYqIlUzOvl>7AXmi$bDVoj_a?23qEF|#
z^1bw(^5UH!(p5TUQw@iAYxQX%$fH#sx9Us87WeYvS>_S+{Gxm9^XB(=n;)KcKf2cZ
z>^PRIR=)Vl+La&7_Dvrsnd{p<sZ^m4q7FD!hzL?5K@&g-j@Z4PKtyRk;%RUiA2)y#
zOYL|Z3k4`&cBnMt`Czd^JVun#kxo3d{O(&yugQW?Oq<L_F|2V@xG8pO-ek^qof<f~
z?`3-?PFgFpk&|67ZDr=9#nxfy-1HJnowT!U8#}olbt`kH<fPk{!Be>Hwq^2^TzT6!
zdh#Ky+w5r_^9!^`T5$CeERt4SzdV~HpREv9Nrs=$cF9eAQwXCCQv-aquuZDb9sRA7
zYB_jbrhU?E{iRwcEzJkoDAn?|P@^wOBg`26M(Bi1p79d2!h8({=!H963mdN+mS(eK
z?XXHI>(mdo2*gVCe8K8H7Q(od4oplz7%iCBs}0i*c_2Gb>~LiK0;0ezKL;=Qm1B=k
zas;{aH>h(w7U$jPC%|I>A3LeKgq)tJE(+m_1~jT-G!I;$5PEhR{J&)t(UaS2N1wYV
z&Qg6#^XwPyt?y)4aQWIw3q6t1lTI}m(zTw>v<BX+1nCWwRb$m0cqM{#2VSX`mzKPu
zp%evj58G1}ND;j|Q5HxWcaXY3YDHEU*a{d_p2Adq;U1Vp$wiK}Z>~@hq<#Vx%BFv*
z*O-zBRK^Q`VTvsRiRmg>STjVgsjNJ>=YDhp?snA5ll7_WVMN=JuZ76Xkee{)(Dp0@
z>!o*XmKVPTp*GF{MA_}dnh2nV;5y9WWB1;pwWk-%HAJzg^j<6B7k0mu2F12sLhDd?
zE5TZnmQ8GcHpJFOIDLpz(?^pb_K5&phO`6z+}aE!@X#xL#w&fsKY%{tpGZ&fO1<zB
z)C>OvI)li#s5>t1Ua`Q%5S$Gk+}W`_^6F^Z?$P+wxwq|e@4Dk5`LF)YA|m;mMd_hU
z$m%1Anv$s~EhP1orN1;UKC_-d-RpPhfsYC1r_K^z4glibKdJI(IOsmPO@X=BZfIB1
zIYcawH*fst9=+<GzX!#MHw3NJ{P^th)9Y9^r`^)!cJOJrdcj>fFFKA}1b6F&GwzAI
z)G5MjF<=X65#!1CX#eEn=4W3Bbp0D@Iy%uJyeKUIEx-Z*{P?$(C+De~Iwh&q=W}q@
zBQ^fb^#w7+SKz|4m8T2fZiL^0OAc|QWb5|Fzu_hC-ntClV<a8O3sfcODgeZLxcubo
z%7+gzUud2=YSk))Id8Xjj1E=r-}h|nd*tO|=QjG}i+=O_?YmRE_H55=-M4T1?tMG<
z?9Oc4xpnW}?R!Dwc#XA!A`a{jkr6Y(ibrt~!e4qspQY03FloG6B9^Ej*ZX?=V*Lr!
zJ(L`X_4k_r)b<im$AA2~Sk4^`j=qkmrRwCp^oVlU_8beqVT58%vtS8I<&06x5<pz7
z0nw6`JMv}-4MNv?Omc5pm5dEoKsf*P+R?M_xs%Q3v&{!jM21DoS5{}wtbO)5-kL#|
zC(<sGp0!95Vd^aWBO!}mfHdO|4=I2-K4f6vVFVd?8qdS$oGc&(=Ufnp?mU=k2;5Bw
z%Vm&w4==Dn9S?>^+9vS_v~HDnE=LuDQXfgNz%57S9-}5KW{*R>CG?yMRGMEv<gc`6
zk?;dc7@U)<3%cT=HtXfe!Kkh9Bz(#oRI(b#u_5WcS*}iH00i0aEHN>)!EIsr78+A#
zo3tHEMhj`04uP#1&XU<2NxUR7Qy?M51Rz#>1G|bPnTKVLNJ77iPGnwEQDfXukF;$>
zB+((CPM@qI0;;7~o6w$;gAQO8kAcHA1CkqvO|wXE2g&NH4lt1{&%cOxbUg`F9TMps
zf?HJ+4ppk+icYpxD9%vJHac{mfR_Y=fD7+0gOMGVf`fgzh~-a6G34YGz>JA8Q57?>
zNtUuPTZABm8QlA*PMI+hmW8lur{>9LeBH$%#SvZ?QM#WeteDwZE>{rA(U-+T%1&ns
zMbH-rtUz`+j4cvym|P0MuOUo4d7CRwVUnUbfbjpEtQfaS73C@)AIofA1(%1Y(PML6
zuxiwCKuc2UfgCgHk>f9{%fpQRct%<Y&oThKb=NX=P*F<D$$$#!EMbP}*sEB&tMe2v
zjM&Mw8G;o`WsC~sLVyknWCF8MB}o!lo?Y~N2bCO3VTvopRHIlgKmtvu#AGengat6`
z*~APQ*VgqoOqv)A=8>g5rBrh1FpAOFlO!X8zcjW<kUWVJ+{GU9q>}B%oI`ln$mp`w
z_nlFqZKl&TCWOwQeMkqZa~zp$qDtG4MnOZvC}}uGX$HD7W}khE&Uea(99S;nWU)gA
z+6>?kPe!d#A-SsR<tknWEH5I>a12EOwY4(l%$xB-Yt$Sc_%I*iq(|cn20nsSY3?yI
zY_XN!THeM*Z$ZE(;30!c6th(;4*giZWnrjNm@4@z-)uX43m|xhFYKQx7g#b(Q{dBJ
zl6)%`P+@l}+uLqmh~4s3tXoD?pebcCY)Rxp3v3E3;IQIWw;iy!sHYxu_!ClS6X47~
zY^?AW1bSlxI1>i}lAKb`HrkG)xfb|qLq<~|882GMY?)%=VEB0^slP1~puYmW7#vNb
zAv)M7u_W?&o|7BzQ#sMgEEOg^DmQ5db2>~paRDf5xfL@}Eobu}Gh3SR>p@&p!tbMG
zA$s)GWwxQi^=-2q^klJ^IPGY}-Y6AtkE7P<2tyN!#Kv;`!ZxoqW~Y8h)JvOC;)S|1
zWf_-LGmNxn)V03knqMEXv~<tjF;P{YLFM--G-vOf0ML3Ni>VI)XML&2IsUwU)IOLt
z8q#-Ze`%}65};Yf9=slXTkk~Xu>yhzIIb3FT7C$Gv>jpowQA)?HRq%iG==8R@Pt^Q
z9ZFpRtb1cbFt33iSxn6_5`IR7LOPtVrD=Mo!qx&erb;*=x5#w}i!XM>!IaY$8a!RG
zrWSsa1Z=48Bx_z%3KO;R&>&=O8yf73S+a!%)wZP1gnvNg4}J*#%5$DTl9bF8rto2Y
z3UeZ&o){DQXDv0Af}TxnHD~DF{Tfm8wL!#R&s~HapLvktVkHA-Ny<xWGz^ECgj|ZX
z%=fZ4@KKx;_CR-|WEc9MyfBThQOz*Z6_z<FumL6%;NJZ84<%s!h6jdi^QYy7j{2vs
z&y>m;k~J2LWq6K(&PO9QI*53wV=^R<{Xu7@mdQKQnAv73#cYnuCaxY)?JBj1RF59P
zYb-(v?X;8~ev_PFWs_7M7C<3Kt4S5q<fAkQINQCLa(~Kd45{ZiFJ{>HF3(gmL|)~1
z9vmJd`OK3;Lo~OymhhIK*9Yxdt88IgGwKjVsD=42SGSR(J9h;)pr;3DT`#7m0+$J*
zUcl^;A(6~6p@<^2E3<6V5liY<Yk+r;V26@Pk_J4LNX1f?M3JF~tFj(xjb-q^AOfJY
zWdZV!3Ixy>ic3>m;)7L6Jf0{EM)(++gBw{TOSa-lxmG8`)&yq1yx;U-E^wb=vFRoM
z3(~y3Sez~zm2ukRb9xTCVZq>KOV^kho08sUJCp@eU#3yvzN_jH9A+PiG%Ao1o><+B
zgbK8X9hEAmv?8OwCuPUz)c5QvZzq8OK0m$o<%cWBzjc>p-8-k_@pweVDc%6_G0r(m
zgz`~i+l<g2Kn}8M-4c!<vJVUc_<vahX3NGD%kzbJJ&X$2w8_~D`ovGdQ+Nw5Itc%#
zDE<=+zgUJ#dm7lhgXOdmRTkH)1+agkYz?1)N;5*4kAGu|qCwq(4hp4Qv5}YW@{t1*
zNvuwc>%=HRtOJ_S9L|Ttf61#_$<`0l;?t1Qg8T!$jJ5Mqr402vLL_hCo_KEdJ9QFg
zkn({@hLMIS5}{gLb$0RM@L>4HoF%DnL2pnhR^oC>T!hIPbh}#{_3}Or$9}$Ny;MrZ
z0Wi3O)G51(%6kL3z*(+l0D%^<T&xgzjQr>+x}nqq5(7#m)*C_AmT0_S5v<9uqEIr%
zwi0$EuB9VWwTRg>a;ei(vcOKiznaZQeNsAR*p?W5{dL-Ij5~8EQfHcmClXHWa?E-T
zD&#^ZPCkF1Jr?OQcN1`prJgcg+<A$oh+fNyl?mgClk!qjL$dh{Cg|of)Mral-2UG4
z#v5;VlM<)GKb7F3jNv-q^oq{{ASn@dVZ7(FBuf;7xuTOTWjvyxJQ|CCc(-9lWeQ)9
z$Gqdr#K>i1xO9xkvz9BA8?|4eZP^leRnHcQIeE(Y<$slSY@Q<9iYP}@EbhP(NJ*5$
zwY22A<KyFU?iRK<0g$#Cm}F$`7DBQ!iWwV*B~zc_Oa$69<DJdxNZC?JI8%8=8^vv{
z>Pqz}C#GZ~(MPDMFO`h-4kR~%p5Qs6iuyJ(Czzt^f}VmkL4)4cVU(BwyEG*^Hi|Ge
zcyf&@t<?JV4qcyk#|pZv((0N^t46fL#E2XQ(YOI)b<>f6$%IYc(2%oaz$;fjT)p$>
zl^aLhxu?xb_hU5r%}#<j2xmx>MUli-;E8<w>(wv6`Hvm@R+cWW-ut5-<xgv7YjwZX
z6HdZ54W<W?k4Z>x+#41i@eI*0EH6G*lnh3;dTmB%ls6i$Vfp#Z=FxlLZ+ZUS+UzM?
zKeSn)ogp1?uiqtK>D@&NQ0`PRX6s4Nw7K^lt$ueCy7TQ7_t;$#9ugPtA;lq-=Il{}
zZ)nZDN7jTDt=#=d8U}`6WK(HJ=yW-kov^x0npCri-9;AD51!-1Q(P-3Z%?qLg<Y8n
zZKTCt{<mxK)TPB=wqoiGr`4af3fc^&wcmOb4{6eeUwI4i5BQLT|36c7rk!GX^Pc49
zE{$yk;)fk71tPp*$F!m8ex0?h)?7fIN4okUE8fMs%a6~keRBoUx8e6^FOjAxkI(?n
z3&R2oO=JgACrO07wa-4q^h~JE{nHr=v3csw>diAjJcGvF`f2H^oSu)lE<#-M?=L|v
z0DANO$@$gib66lN7RkYmuGM;VrpGy)b1L<g2%vkOupLuL%IDsvM|4=i38qc?0?^M(
zp73}wE0Mq~C74{DFXZZJ>m|*upb0Z<v=ph_Z0{9R4fR~L!qFXgr%_a;e^9e~G_YHY
z2hj>}VFT=-BbqA#lN@SUPz!{S69fgc-ok0i8Ky`vzt`&JN(Bs0xoQE6T!pM&z|dhB
z5N7ZZJI|Dn>^iR5O4Y$9xoX;WrI&RP<G<ol;`P_5@I`NHDUwHP{<1cA);;+tM=zqQ
z7Xn{hzpKQw?gn{;9xM>V*POr5{Ny1EMnKX!5Wb>|FeEni+E<sGf4ymhl%LmvMAL}O
z?yaj>>fw9&DG^?tW}saC@t*taLu)G#TKC%L?$bZf-ubmc;{lU<9x`}o3?OXTnlW)l
z&iwJXq{xb^1jWz(R8o7FQ`yc!P3L<SPzLRBXnWa*ZR~wGaE*GQC_2QHtXzw}r&OHT
z!!9E5Puq#ZYn7K!tx-}dT<1YToiOkvyw#N};AKh=_WA;4r5aO}8PF`1DnU$Q6Kei>
zw)w?v2#$W?D>hvcX&S|8t_-uFNV%~3{2CTOfnXM{E<ZZuKKpv*+&xP39S}Kg@WNN_
z{HI#B+7%MahOVdJg9F_Bk`^#XV^PCWp>tcR$e4_bzPbI^zkPd5EMVAZoxCAwATSw^
zpby|bQC$l0LV?BLbSj|?*YFMdUQ^?~AF)im19K?R8f_cfwr$(CZQHhO+s=-&W81d1
zW81hn=e$?<R;}t^(A}$hePfQf>uL!8Sg&A1#C#~rX`MTh@cMj9I`^cFgaqrAGK!5o
zFHmGrSzX`dl<ML;QYno%b+5ZJxulbItf_z1iR#aTDf(%;`LI9lUg1(j8_f^)A<?=u
z^=fQmb(Xf7W1+f!y>Kr7Q``6z|1%GSX+y8#_ojNci}Qv|Q5dn;N42lX88ME|z)z@u
z3+U^4X7w;qe7p|2TOxWb?%qHV2j4%v=+Ec@mvhf)J@BFC{N30^cu;5Ko(lg_gxz{Q
zSO3DY$TTENZf$zBQRKZ3BMZ_`kJqqh{BS@l(Npp=5S|VBlEyRNaG{nPqaO4pv<9i?
zk2G_k2CFd{itfS_XbDQwAQ8JAaNr!$^M(-Kb4|@#%or0R;v;jYQrLw}jf2mG6$0T6
z_+N_k0FV49Vd0iE#|eY|70^VMi8Qk!9_wTSatfefPFj%;3kWE}?i884lD`W+nK#BQ
zc~NzO)M)k`cqm=ji6R&VjSDS=sorZMEEi@dY)<ATI*&#;9g5C<-T1M|0g{X?wJbNG
zH)BPP((()-E4C=K3E6^vj11(&_!J58E}Da;Eqk8z+mg@iv>U3av=KEhPFJGj1`Sq4
zB(3U8@dbu*DW*e>-kNqdYJbB?Y~3yDo1*wNuGfkj@^1IKt-fUs{~T#uiL(7Fm>#-s
zYbwIMq=ns(GELY!`)YNJR{#Cst>zwr)1c*8g!?hlizD+@fSA-77u;6QF<~e#x?<qX
zEz}PWM^`KpMuQh4?t)NG5E-dpxn7QNS+C*pdmwJLo=Ti0f?Fm9%~+sr&Mb&)HUM^L
z09dYaKy2a}rz3ZQAHZUKHjUOWm_+tZ5xNQ}#(y*+WwZEa!9q&-V8yOsf!ju!S;^ti
zhf{H|v5#t#;t?3LN_jpQC$6o`a*Eda9mwbVkT^fXYJh2VMY`ZGmbz7xcUh|fD`M3h
zFsrF5Q>2QqQc>&od-&(26(0<W*CKs%Dc5~V7QOz3d4Ro_QpQSA9SFlBCIo_sT9oa6
z>1X}nz3FbhnYZ3*y!t%lz1e!@c{4rDXwr~rJ;#WPk~z-=OcZ2`XOm(pXCuQ+2SFtH
z_h*D;o%D*jT;s*E3&(`z6?EB%gEV7e<@8ZulxMeilcY^(<u8d$pb8(HKlpY>0(_H(
zR$bq_A<fd?tUsWp#Z=`IYN_4R28h)EB&C>S|B&ai{i@GB`_{ufosIdl>(ZiYxW@m<
z`yw!Mn3xP5W{CrNoPmqmoLu&esHAZi9fgYrVU})L5fp79@7C$+6g0&94VZWG&$9V*
zT+Qs-*RVXzbG}>A8PH<1yQA_Xd<j5Pn+f9y+%NKu7igX3anK0bs8JkL4Fit#7!)UY
z*qv38&KArPH5bUZn>;QuaFkYPX4|RgZmDv)R9T@7qN(!QSoy*#hc~B`&}woLeVG#T
z#!J+oiPYUs4A9VB%<Nj}eWz`{)ha#lEn@rJ*L$8Dzg&L(&Ravr6HTi=5BE37;bp!;
zSbi9OogO7>%hQ+)Q$p1t(>=yP;n!ka)$x4<{5rT1+?-+GyHD|uLKCP{gMy>tPl1Oc
zSRvX1^W#FO`LhD8QotO2qP|_wZkiI<c==(!gJ&*oJZp5L&8FT-lD+kP?Q<l^tWepV
z#%t1Siy<s}ZI<eF5!ea%Kh>M!w$j_>UfQQSAzAFlYUtOU{NJ(|{fxRw|JuFxw<eP9
zDJfe?q*A*!h(xNgH66<Ne;jO|!DKz)S}3sUvpC9L9f1xGp72B8@#OlJbdZ-6qkqdX
z#b`f9KbFvhmt%YGhk5$&ymg6QRu?Hm-8HKYCZY1%3!Qu4&{Rd(^y0f6LC-KFJilMF
zKDJM$*SW3#s&5Yu9}WLaVGTWtP*oS;U^%RP;tmDN>xp5juFuRr6zglHARbVAQf9Vc
z0Q>A~jRtwUvs`_kW<-};+FN4G#Ie0k$V%B7h}h!P9gp)w+~&@_&daLKen?;C`_N%z
ze|02ZW+j^d^}8mf?TlG723{Lxcv27>Y{M{T?`;?mEL3Q`<*7pHu8|;X6AN0N!-iNt
zb0U;ifj20eBpP!B@i)ZjH)lI;qla#=ZL}hto-Z*azk7Ui2=?nys8WYUhSlKjkA-Hf
zu&Ym=wGcC}sZCB2+l0RYgjE);m#>m}z=A{czA*S#u4<$d;4hgxntojftdUjCb;Ay&
zDoi~za#+0iz*{N*XXBtxH>0S4JGN*<qlT~03V2GKxd2d-u!ePDP4q%cYpBo!zEgSv
za;*{iu2Y+KjW~SInm-v|nwdf7RHiyw^o%w$?o#m}_mJI%?elrL<=)uFaE_jj?aNqZ
zAu3B>`$gYRYo;8liqQZ==IE;K`zW68Ob~6d1#!EmM~K-F8!pSU69Dk9`0iZ0Q5z&L
zvl^Rdd?U+aP_AtFeg#5txf#gPzr=8vmZBj5rhZS<wxp~OBkI(0sCA<h7O~*_dA;R@
z@!B@h5mAo7K^q$FZT!Z~zzS3<Pe0FkuvRQ=(W&phizIfsg;HAP$pSQ4sWv;&aOU~2
z`pV92h!eeft=M24s{Em=iY$>{)uO^hK~(tdsaVBB%pe4Gf&D#WG&crN<Pm`v4{Rh7
z9V{93IxTp(&DZN9euVwUBS8^x901PcmZ+k*czSHLQN8>1hRrz&Af|`*9W7iMwNJ`G
zO-`qT#q1b*Tp9Q+&oK9xo$;6x@9+MFKh_FuG#O_L8VPqGV?97K*oxZ9cQ&zY>SCQ8
z8fzqojr~Pqrknyvh9e(Eu+&FV^}3=pr0a_Z>r%s%N5x15c<VaIR&0h(O!v@@fre)4
z)7Rj{FahRnY~E{*mEnADRCAotr2^AAkHS?vJ5udPvTF#&SGXYw%3AA2EH;3{zx6T9
z2`Kr8D;$8@6|n{IQ^#>Oa6zU025o})61s_i8gsi~_Q*c462+n2+G_T}vf&nurmg>u
zM#+DSNBOGUIkty<R95qPg$j@uB`9T$32`PGU2NXOP+L^u)3L;`_}>Jq50SlzJ6|q~
zVMVda%ByRi*%aqT)@%O}13I_E-mAkW)q_^qTm?%<A+O|^DJcsGx}k4#6M~_#@hUAG
zS1|yfG;rDaVuOtE+?GzO2Z8}!Ii#}97+vMTF08a(t)U8WEQ1F!uWN1iEHB6*${DO$
zthc4z!OM@~BG2?*=X^}FXInFB3K~H+EI7o&TNy&5723Y+YjNFxLoTgr)c{QO%7J=*
z{XH}(1)jz;<27qIF1ZP_tz@ute1V0cBDtzqk<eiqa>ui$Fo;gu=It9rzsdDjXneJS
zhno#_UUfo;-88HlZ84NLu_jPcRHb<WfWAV6OWe~61UZ5Cfnq%7ZZoWMhDSrCfJj;<
zjD5ESN0VKMdKmM}25e(6^ls2Jxs%2!C^gA&Zsd~@S#aw=EuQc&-mvH}GW43e?lHWO
zoy-g21W&CB{UVtv?gl+d?X8$yiR==C@o2YUT%|yI$8PY#a@2^xIqAcc4G)}!>-pkc
zKr%!4G9DAm*9L~y0mks!ds*4D#NmME3z|b<xZ=Or0LpjD<=KG%=8@R0O**(~?P>KM
zy>1ns^Um>}?~AQ&u83@-0||u6@qvAdrFj>Opu@9tv=|t`xWu%)fimRY<m18J*+^#p
zaiLz$B7-f_mH@x?KoZ)}*C4>1XUK<3BG~if+$667elfNM6e-LdQ*pB*!rlketW*&q
zjxb|7u(R#b-~&OHhQ->_W!9dSTeb-Tlch*OcNB%~U%5z+e^w7I_~7?SZ!J->DTFTh
z;NGsRvcd)}V(`NHCmR;Fp0uE)^JZo4*3wY%G)EuA>G4q=7*A$yr{Pfp4qFKAE{2p>
z)SoRb9VajYnZlT0f6lKIa3pJ?-7e(BH~0m}`A!g<t`v<Py<SQ41vp{;?o9sqq1N7!
zo=eaQIF9y9`IdnPVUn17?d&xS7+IXm)O5BSpwo0PHtjA7cO*uYgZ&dR@{a9AuG71i
zK_??ul%Mr$(`+dP%IT}Ll_Wvx=B*U5BFL11S-Rb?OPNhAFN1pGpbR7%Nb{-bJj5B;
zG)#P7DmEcvoLha-VxO)`HIV1w<6)S~$8x(xm(cAby^*n}WQ`L2{ngj{Sguo$CrT68
z9@nsP004S1M6d^ZPHrFX-~XHw(6?*k=gh>%$Z#MFC(FGQurIP_Smn=je+L66|L~rt
zi#h0)10@A_{1F@bQJ_kcU=koc-4?36<XFxN_85UfpQBIej#RN~t7A{Z_*8W&K)2^m
zw%;4D4>E$i;+~5r1nlZ5XG~DxmwJwUe2IO8E4m(105UVMH$H%utK|z4+uiQ4-{h`v
zm%hPhguP>B^G|pJyDqXRBnfdmQn8A2c8D$@`X<xMEJXR-g($ssuo%H<zy1*5Lr|Qp
zqS0&E_=!ESVTd#!DsX%Uz_64{Vj%x&8}x-r9fJo;#*!(-`vR2c4iT`T0Ou1Wz<4PS
z+cPUwCbf#PU`Ja}j@&`f0K|F!1D;0lRimJGp{K|~r7|mIYLkd=!Eq_)qKEDtL>JV*
z)8D_NLxmb&P8Oa#SQ#nVAT?&BA^tOVf2L<nZz%bcd>i_@tAaoDOXFx%QX_K&B8pqS
zGfljxG6~-PWZHVW^)~;zn$q^(ne1ve$G%*#Zv<;W+`%>^?PMm}u3yvR=NNcZV~>9k
z?E%O}XKX0eGBP6D*xN?Aw4=FJkDc%lgImSoe~CJcy7_zI0F^Rd5>e&puu%@sV7h4#
zjga~Z{hD5Zifl}TOb@y}96rUSSpmug4a+UXA<;s4W-8{m&4`I3olg|9#uq4w8Yvax
zC4FH2Eh*akDLUNoIUGl#C=$}p1Q(X$Y)wSL9`7WnPdI@>hS-j`MiFQ&$frGPg(7sE
z<pS#70YUfmN+7qGcfLL$8AhHrH=Y*~L7UjqV^Up*?)vgXefPv!c|&Etx<CZ6Qc+@$
za7K#o;h%JQpiYED?+;uYB2l68HO`}&8F}FlV`hsK>2B$Ud_>xLDeYBMUVRuXKZ)Vm
zCQYY3hEIk`Hx<@bS*ljMQY_`^<_Q$^C)#G9Wdb==sdg(s#O+SU7-}w`Vc$gK;*ejv
z<rU1$Ki1AuYG*<ShBHG?b4Ln%m8j#ZVqOrHJw&HZZ+d}%)W8xg;wcXgH|D)K6mNj_
z5hfZbh2<t{9)%}qww5%sEKjb|peeK$Vm50~&x$zaui|_-;tHU)X$IQ<q_c+#;E8$<
z%@KSQ6+?&6QULy50BUo>d4>Gw=qErlah68?Yn@kDS}>^|sb<Jw5@Sk4CB;M19ckx*
zD@~C|@vD6rsxykbow~ZrxktL3t&QPi>{vc0{F;g8PWBRLPhRMB%_)|egq-xp{W0r?
z=0TSn`DE8Z-w#tOhVwji!lve{(-9mxUmcu$XEgp3f&JTLfK~%QI}HyPc0zBf5vbPs
z^LpUAhIN`jC2|n$nl&K%n9B$p4M3$+fOEi~Wc$M$85W0aW5Mz9m8VepI`6i5z<`Bp
z#(g`stX5~h@q{+T&9+$wUpA*qg25W+$`LMhXCivom(e|pEp`qRX|h3SnIypD6XOSo
zD_MYIf66J|As6Zmn%BAo+Iur%OGTE^BIo=3OS^|k&m;y>5b;i`^fSR}<@lB3nNDf;
zy<O&XOAqIlfy|@f^`qfB&c)!#w{o-WRe4XmDZGef{9b)uzSYk|u-;?;r@^D)`0dbe
ztA?YKeAZbUUC(>R|Ikzem@h-RTO;RN!2yIeLf4<>*SU)hk%?P?y8oq=qI5{6?ibfZ
z9=7tgMo%35)oyTgJKr$Ak9iX(ntMs@L$+uIt&D-w7{tKA(|zpe_6UhA9f`1%SGr%u
ze0}D5UqE5QM(I`^Aqc~1CkC<UP4m;Y7J&jTiH*}Hyo}}$h}p=WmY8J-f<#cMQlPLP
z^<^_bx(T;ti-Yot4iv0X%*QE=(Z)?e>RD(UQ>6h&&vh*f@Gcg*wz=!*IN0KjOFx_h
zv12vM1&WxEbw$LOl@~@Xj*JS`;?HH`WmHeVnJn=Qg$y^;29<W?{fHqxHf1vjEm`*y
z>5Zf!;0RFU;gO?ckanibtTB<G;z+|4oe+qjiMY@aG@sapge%3`4HFH3OGSzsAOE>-
zmK{>lPQ0o*k+c$s3>(F_hCHOJ-)tX4hcYp*<wNW`)8`m3p67XXJ&yx-bH!_!$eo-<
zoX%>X&x^WBuBVw|cnAat166Jcjp5<I2jqeXO{ln?tC7%pCx*;^w3s&b;iW_($$r=w
z$d<WbZ<eC~FENpsy>Fr0FrAH<%7ipPb)g2>z?0%7c@KBriqqU>Fe5vJ+!o9|BBWRs
z=yLBqon@Fdcs3c*b=XM@o6E8CbN8^hyP2NRPe;$fSk0J5W5nCmNe3M<dM6_Yejk=!
zfBI5c2NK*}RUVP^PXvXKF-Zu+iBC((cDqff5g7Vz(!pW_@PM4_$3or_!noQqlKQ^;
z5yKH8alG48&?YNZ>AqcuT4Z0&aU*-2^<G)$!XShYa4qp@27+4TS0C*+Ow#Q10%}q8
z-rGWA0mgzwVGBM5u`1D8xs|0t0lg=&D2t(M+0b5Hpl@)MTQ#qEsx=E*o4;idtaJzL
zpIs&x0iyVw-H-b3do5y_tv4=QXf`3?HaD1*YT43i<a^<6rhoST;GBy?m$vAP(knti
z4AR+jQ3!xO*Ny85<$e4Rri$=QF<myif!2?vdF8Na{1Ht@4jTR@W}ZnqPrb!y0!E$m
z(yZYB_La9ps|n_hXkmr#@1CEr%K{^!c@bOq2_CoMm`BGJ*&YNIZU9`=BU5U>Q2V<o
zqfdOGscM!vJO<Ae+2M_|D%$7IZVCGb^6*LRtgqrK<ZBHy(bbE<xiC5X9uDE;bXt$>
z9hbGKSwrNfb4#ETm%JC#y%MMrdgu#4Cwt4!+(NQU>5E4F1z{vUFk{cp4#3SoNYpyB
z=HuCRx7&U+Z#&bLyY+gz)0e)D@m`Hc&aGEDHYVdQ8nP%^VJ|NbzmlOV2_xTk&o^K}
zw240zqV)4RO=SXY_D^GJs+vWcv|qn{J0&SwVGoZhDa{f9DMCVlUxAk0B|ktz*stB@
zpNh3k_%QzR-&$LGk&W9h07h?P3L^_bQcb~W&wTFaca)bG&&P?TTfavr?1>54-U<Bo
z;O~XL@c8&@zEC$54ACG20~EF}`HH{2^rC7f0*&lxjkmO%Po5N5+aYcdP4-d?Nj9ZQ
zIfm{x8XDpgq#~+VmAyK}?Zid8BkW_0@(**-63acO<fw3Rf4#{(q&!)a?&OA?$m=8(
z^uMPIqzE||hK&(IUazOI9F+2dZYh)LWia(it@S1U6{jo*l0BjJ#ujzZ(~MJP<0M2=
zdgJ0StANZEhp`ylv*RKoeW^R8N_-XKDws-XsOqF)sG1NlA@c1C*h~kkX@zcrE7l2@
z<(?GyWJOY03!isxN%O_g(FHHsT?CZtqZkAt;f0<Rtha<?>se?~4!MZ}Qjw|eZ{q7q
zGR)RTV4=1N4;01phXOI}o(mnPKl{s1#3RPlqt|`o-NIV#bf5Y8d0kDytH=3Y&`o!^
z%Z>hXbYiA{+>TXi)egE5ZL0Xf>g{DrbiS`F{MO;QmN^6m4_A`oeqq;Y1!8X4$wLaS
z`d&k5D<6Fi+vpffTz{T<w*7g#&cp3|8F%N{J6=b7CC9u8hrC$Nih}><60*i%=X>R4
zeHsy;ongAZ=+yYGS|Ux`e7N@ZcI)n>|Lgp5-TKm<y#6xdz4`Ln*L9NJ29b6LKx)N{
z1Z8-Vw{35(r8dOByNRw)%|4u&*5%#eic1Ro^6;1=;=^01T`NyyGckO&&F|=nLyvlg
znBOpFqsez<DWnNc=Sj#XJO-#1;$U+pcx5jz%CmFGdg&-Nu0Emr8Krg6XB%=5J`qEc
z@V%ux9xd{xHFCQz5I|p^-%UPO6a#q~GmS7S{h-x_g~wB1jTIM0R1}?_u=Wl^pBH2L
z=7x=*M_yd8cE1gOTSM<Ng7y0Q3?}$~aUrw0cy=4vT$JhRXxN#?2g&OD)%amFxk&R3
z3gPH>;s}3a#&}2OO&6T$bkU1&jKWk0a${&9uk)nG!p{wxOgfA7JoX4N5&%u?9!>}H
z*h;|!##V1}!@jhdDcJHTT}f_(BJ~Am+43&^UU&A7q1n<N=@<R-;3^&bk&P)rukiPZ
zc`ZX(DSQ`0w*C`zk@-~|9j-geFPDP^U8$=S5L?O4i6UL5;BL?V{d-QqMjjwJt*xOK
z^M^1;b|vILsj>QEx}}FOMS+}$!NTy%Tt`>xU;E$GqGAH~+F`CcICfdsg~Tg2f3t9F
zHQw0{z>@@y;c4hN5bVls1!sjCO~P$YI0M^6I8OjEgqu8<S)C&Xzo<qIa|NjBoE{=!
z>TezRxJLN1BuDg6Y`+IRO}osg?FVAX1Tf<t#)2&Mu{Qw5T=6>>pgUm;5KQwB?pp&K
z9Iy1}ZF2EQHK8UbBZMDqT5v}JQA)H!sS6k$6^EdI#*h|eFqMr{O1Cm%ZV5VCuK>~r
zUxD&EQN$o81-77|&?z|RxR@t~!~|3)ODJgTfkwB#89NC???FTvbhwg{nmdZrf<OC}
z^aSBbF(EE})+l^v79x`zUahAl)(H0d1Z9<Cs4n*o9eH$zl3F^I%6_oPCBP(nz_zHt
zWW}-CR4$byBxO+#09XN0ptcEmT7M#vlhym3{92laXVgCT=vrwvJwDF|*$ej@Tm?VY
zkcPgk+7%JKK<ss#%(Vs|qmzj{j+Og>{%F;5V9M}4RjY4$n^!7SH24B~R;D)*?Q|V7
zQdl_~6vWDtB?@G5x!&uo)-Q*ChjZqqmD)^`*lLe5Y2Nb8zYVw)ss(_=A~WyI<BGog
z8|I=@#lSa&*7tHBGXIhh8{5D`-xD#t3oB1<sV0S;yb&Hq<|jbn_ct#*G|?2Hti)%`
zozh!%k+LXCsPU|{ZbE}=Sq|!v+pxBNCv8!EwxkA-H`AAZ)}vFuSQr4hgrVWzLfIwT
zC8lqEAhjfvRK6(pd(qrHBbYG&1a_QXz=EEOvIGVu46tQVbZ*nyR!hp$(&Cqa9v7?|
z9fqjs$x*Kh-iRX7|091u^QOX-JXH|s)IOGquVvW<__tVBf%5BgXGS2Rs|)@b0;ALz
zFrdW#dBWJ(nE0L$883N{-0DXh$Dg^)dE(Jqvtn6F1beTPD7h`mOJQz>VJb!I%-}*Q
zMZ03T?CTGx8*)?UVSV7qc;p0rWxu9a=!BZl`Nnlrp$SZBkXU)kkuQ{mq;YKF#!^CY
z3zh#?f?M;QoOuQ-R4Ywvg65gBahOX1ojUPl;l#1W?E^y*KM$cPm<Jj0NBx>CYE-jV
zb7EDnj4uK806rnj<p~Z;th~CzR`zxD3*69roFq%IyiWpfmYClQKl^J0;;l&ETB7aJ
zB(JKuKkB9NBHXg_d@>Q=A?_B_^eh0QO2gPU=<F*PIK)kXlzfk1O)^;Hqd;iH^A=f4
z-=b+QG9`QiSFRP5QNE;Ave_OYqaJXfYdPyVQHKl7l@VvcSz(dhU51#}hJv%H252af
zIeQ*YK{cgetgLrJ=O?Fv%n`znQkG%Cofqle*TkquzR^UQrDobyEwdTd5u_Dg#GV)t
zaZ@FP9qNzl#>!udc#G^E#JS3=8V=5gByf>;y|?Wv7B=H)8&$llk)b9}lgz0vs#9f5
zWiS(s4L`apwJJNKEY`+_$~~w|o_+>t|1TiT_z@^T&k0UU_8#%oz1%@yGr?MV4sJD5
zcia|AS<{QX`>LeYUA)NFY>e}(AQqaooj{MPFDqSMjg4`=hd~R`=SDo;8Ys<@wI7=Y
zOGBlNgw*`)j%$tbP%+Z*O1Xmrk2@K)3d||=s^=~1KjkU<b2LV7=d!ND7pAH(_c6GH
zo2DHAHQQ1fwXzX`wXFafoLT`tjm;9JE~!3CY9>NYq!4fsGk6vF6$mAh?)NPO6*jzC
zk&cyjj(L!F1A?YBD}6m*g4kaiLGU#g=%HD?ZU?}|)oiL^pt59*-K3Qq`<f;4&9%RQ
z@>KzRnLuT($kxjkm&5eVh0?lweN=ed=M%B)vG&b>bAetQI$Wv0cF>vyH;gH==Q=+w
zyf4=Ne>*RCKlnI*mkGnNRCY`xBQd061Sx$qa7+Obyq_2D4_2qK`Pu4zP`6g_^EEy1
z`{JkP{%>@Vnh{fU#ZE9!h<vu4Q~?st<u|$rOr&G9=7C`z_>^t|=<gD;S_p$~#B5cU
zWb6<V=cEIu5KOVRu1_P6Va7mY7_stbg=Wc&fi`PnHs}$HVI2W1*P;GXvCuJ^1K2+F
zrD7FrQWCk*A-d+SwXSYEe9M-W#-^c-I3Tw;acxhn=E{I(1rC+`)Fj$^Fin<W03fPI
zsV({2m-r!V?K7vQ7E&ameKKIaB?pQzCc=d&Ux0^C+uBT}1305rIVEiMAx*sU&Jqcm
zChZ7(Hdg0;<I8L$uC{E*4zN&ZvTLxba3io=&|H$@)u=izR4MWg{OHy#RZe5FZ?BoM
z@zM4-wD{wVmY)-db6XTnkXHzWmjg#bS%-OMUtHa?C+IWMwU#HqrJ$CKUawvc%<gg!
zj6U-ylU2~7b&&V<kdZ67c)`f^;xR}dtVPyPsK(EKp_ifW=YwPmF7AC$EZlXR$d2T`
z|B&kZa+lV8e+aD;&GpuXuL~SMILr6;aoMn%q0JvKoQ>^!qt_cfl1b!^!%EL@<>x=n
z>uGfLx_ulgKK&Q^7nr_w|9kyLZS8zA-*#BQ2AAHkwO+b>vZ`jE=g8l&nZw)R_0#t8
zN=<pK3XhO@u$(4Vi72v&UF+*?dSpNA0Qwt6J|b+x|N8xidx>b1$aO#AFSMOE&~_21
zMr|BssGr7mzTT2_cr9#e3@CNK`;9521r4wLf7H%uEsqV|Cjg)v-S(((@#<sNUv~}d
zuI}ZNe5GCr9zqQN>zYXnU4Gi>PJLWqn)D4h4(^{Omng%NmGxuuQ2_n_VK=i&_skGY
zM>DgmvwoDZA=}i1)h5Q>Fhf*L?-(BXS9oB(K>levBs!y~<UeoAbsL&#G1(f+aZvbo
zckBZ14^CaMF=Ng2<EwoZXzJuKr^Btb23%{Cf|L`+HwZ?UG9yXjAY|yDqjLh_k4Oe2
zNTys&i1R%?8o-M2Thh<k9oXRCi{l(x+1*iZ%jW~rsiE==;EpHqMHI)${sobTcYYA(
zJs9t^yM%veglQT8{0G)EQ>qNEG%`g>Z`sHcOL3u*lhQY&>*=2~MxTgkq<Q%)JR4d_
zLr+<{YQdA0(vd~#GC-D#wlad&Vg*bu$XKWBVCb~jt7wuDq^uR&N+tUt>aMNZm|G~b
zg{CP$W+xdpG_(CA$+*2Y`M|h~>k)&xc?ZJ)>0v@E2zN8jNgx)$GS2jPOiWkx@+lSi
zJTb!rmfo^_HO2bS7s}{1yJa?uD5*;iB=`*2>}fh?<D-G;Pzy2~NUW4X(|lf2daXtm
z*cT`)A^jn3H)qQzsO0Kic_K^Ojz&+07*qAr&lWyFW=&S~_bJn@Yp}&Pgk4&jjR;R(
zwRYEpp4j`(7;L*^rC=d>8}!<lRjYTp;;Gc=%@l{vMSWBnyc!6OD)m;B^RP7T&rC6f
zE7vaCN#n(0`c-|3salGamM=akRbt&paErRhv;0t2;>%6|V;2|Rj>uaD%O4(r7H2D3
zLtht5oeS~}so@AL))~xEkp{CbKCvO0Rxq1ouByUlJMq*<qh*1S=RFNTl`&Qs6u8gZ
z0CPET;zwb>GwHc?t_GI-KkpH70lQ&qrrfoAaG(F_g6HA-mAAxwaveT$9NeTIF8606
zhjDlgoY9%9rxsCcugl0*x>{?uQpdvB2<KX290aRPLksmV?b5!u$Oh=7bI0P9kaaP~
z>NNN=nsn<4aCNzL)H=oj&=d5sz8d)k3^aE*;t5!J<G%iY$P7V0r$KGsgS|^*+P1_q
zPI~FL_NFu@2~)c371X7a((=nwmR1Dr#X6BKei*xwFG6<<ki5Oeo?T%dF0|}s$WO<0
zjU47R-}D{pdk1qP6Ci`{F%_gW>7L5j$4GtU5<FryLmwRc-Ov)JbK#|mxdjrnc;T!W
z8R49LTI)cbaFwfS?vHc+p5u1UUNi`E&Pt}j^x9Q`bLJ&Wb9LuDQx?a<su6bkicuR|
zny~=A?7*x^7?zx2at@2m2|WjL;n5mLxON=T7~~qYMBQBaXVL82#K9HxxpmOw^eyKo
z_BTvAM{U3*g874_bMOp#?;p&GzKm<=V&dfu1bM0onghZQJeQ%Qv=YjC1V!_ff6*01
zbJ#!7Tl2gYgum7b#`@lKJnq?{vR*#pztKp@?4>N`g5PIu%bs3AoXp_|0}C(i2_CNS
z+Qg_nU+n2?`lj_p?rP)jHh)638?}HOe(fvF5CY$xFvyXSr+=ZI_jGj7VzF_g*ZMxc
zpDxV_oR!9TV#AQ1f_vw2JbzFpZls3X_Mrkw(2FcU3SA84%aCwU%O`@ruHQWl&O^y(
z-X3P_^K|NQ>DzI8XTrhV_xv({e!-gIRU7o^HZ%Da^zIzksvY0Z8XL}?gY*(~{E;FN
z7Y2ZwQ5Y+~Z<p6+rT144xBmj~k9b$?#J^XMtK?1_zdMpR^DdMOmVi|b_3J&}Y;nup
zS5fD0A#|%5eK9<Wb}D9@h>n3nL1#_320v<H)m!Y1D%m{3Mi*N<>wEW^$!>vrB@?9Q
z1*Ousw?#o~L(Y@N9l3U{P=3`2?5PkHZO>VE{Pkfpd8vo3+|+V^9Z#oiAP@dBds38O
zOhhuXvp;wNf|f~1j2`kkUDlQhgXtQ$+=R8=S}1vne8Qb>J(FB$!Fwult`p<v4zrda
z40x`g2(ix%((uxM<h55bHfkBMUWVK!<9qKKWvz<)l{8gL%u|SDL(UtO))(H?u1tkw
z^%pW|;cK&0)U}x2&ttQEBMF~$CvT0wAjnTjTVND{+Ze{QZC;qDJ%)Y%rAzLi@Bz;5
zn$s(oGOLRH_&&!k;jTdxrIT<v^YeFby$C$RrU@ttqf85j_p<Zzb$&i_9cjk=%lBIP
zypToftB-DDgKm+?tOHy;t;6|DAj}ujZv6PQdjP^mp<CsP<c8rOV+>Wa<9JHJOp*)c
ziBWY8(#RfUoQJV%VNkk|k9I8SB$6;E$f47?S8VKIs6LT)B)rKU&K6Mb3?{YYjeI%^
z*`ipZ^XqoI+rPa9d0$-0ORH~xkSjhr6sE}$eAas2{l@5xH=xb36#x*b<0}xUAaBx7
z{!lwmG5m1?&*kBEPSRJL{4E6nferWklk77`t^a9yH32^vU%MBJ{9<qQ2U>BzUza1e
z)%c|tdwMpGy_fs=`z!_u%p<6%q~Eo8qbP=5il<*vX}py<o}!>i2I?Uz%6o6aOqb{7
zxflFK>*HSd<(U0C5}9@l-3u5I?0B#v%UrGj7e{?t922Qqg~8wH2y)BJ0iU<To`F{}
zZzW3|vx(T0i3!K0)T>@K+JzRahb?Qa4S2kwXcm6iow6>4q=#>)qP#ZD&%FEgxJkdP
z0nq-?z$!sq$frna=D43D#9CBR3q>C5@*HXrlh_{du}qO_kF97$5WG*R-L6bdVoHXS
z#fV8gQoURjK~GM{oAn&vES>GC-cXbU&@r7L-?HU#d;wcgKHq@VlD5fo-CM-gRyKHE
z!#K=Hi&vw6aFX?ps?Xfh)ZBTk<&6hK8x;PtS=^MkifvQ$i2h$V#lka$WKw#s$Wml@
zjl{1t6w)z2w7>quu8VFWuPT+P*vmP|Y7PKl0;#zkY|5xX&l76}OS+P9i(>3qXxh@^
zTY0#<Z>Bho9-~5JXi{sOxp)iw-J5v?mEaM2;U<+RGYr}?I{`#UqHhzwbI#De@Eji^
zSF##^sxBUB4f)grlqIS9d~hJx%`O?Y(n8@+J90=K{Owv+KyZYvlG#85Lg_Dzw+2E+
z!UzjJ#hm7F2X}dtit{B_lBPkcNx3CL86Boeoz8pw(*HDO<@2COQT}-=?5Jp7SN=s9
zE%ow;m0(icVTJ;+R2kc~e@l$2q{J<cv#bqCq*~8VLH^~~luIPD8F8W6c9OMDG_9l3
zz1XSf`@92UuHFQ?ljc?1+s|M0RYC98@^H!xZU<MqE;k&XtY)jn?|bU#@%$<w+@7k6
zg&sCPPucj41%rY@m4Ck$HOQI~*cxhY=vLi5EGH{6c^l)otDRm1G8`#8F<mbg)l&Wl
zC@NO?K#yw6_{tzm$#|5Nt|)7$8~&)Y2ehL1mIZ_GD|gn|Ojm5i6|7!;S+){RYP)Gt
zZNV+Q?_o&!?*2<0=8t^$d)l$JVlCW&Xao1VhJKl`TrX(sdad!k3ahRw31^<byE`v_
zc~m!_ZF@;mJHgviJCg-FM~H=PGsmmR<>-7KAJ|A&*Q;F8|NqyflJvOE)=$tY^%1i&
zpMA;ki)f{#ex=!_4)D)j)=d8IwC1LK{u6!Cz%FIv{>XxYU5Oa8+Q+BVeCaX9KffCw
zY{C8~W|rJP|Lfnxbm;GJXnY`_BT&H?@Dx#QnyoJXfIQ9K@$ERXc#?!>WwvZ&OGM}d
zp)h~@Gj2hEt7%miaC4cu^n|*!TtvK*LuqB|)AvuGNC8N_C8PGKq&_95epixn@l$-Z
zf;|pLvG>78b2M}cKaev?2j;JvW@UE^^G|!CCn<$2Byg!t_`L`&iyz9@Z2rveE+g1U
zQb|KFv42;d5v&|9HN+6M1%1*7ZFx4bvE_+M$B^q|wjHJ1W&M-xn^zNGa%v+VkJSe9
zao|z#a9tekGR^z3mg@~;kyJ0Hucgr0<zel4XB|a&>sabBI2Ll)R@^vs$@s)rc&XRk
za_Z+&T$-k`!-UdZ{X+v1+lv1;r&zzE^*0n-`0+TI8jptr9v6MxJj=bb9>w=<GqEX4
zBz#N4=gQIi&Y#MLPVYbUs-8DBc`*O`exyVO@Sr(i4ADITF8u)Jz)P}VQ9*H(&+Fq?
zR2i$A<F^!$w_~o1j#iugN4}NJPeY%VrrMie@#RnS=PUG1mjn1X>m<KM=N-j8jBTo=
zdbrNQ>EH1sZdZKZ`&;ozoyze47QKY3PK9bf_ct`vgbiQJlnCh^+XQBI{+5QW2h0b}
zznH4ox<2~fslE)bT<v7^Nrvqu<f~IBbcX(J=>bxE29d@QM^RCVWBm{j2JBi)rvCpl
zDp~2*b$q=5pGXhS+422|`|{?umTS6P^816we)}*vG5I;#xu!{2rnP&Pl_Rcj1O4yK
z;elE1dRrRiFY=AVZ^882rP2%?vvGCPxvssPhkWmN@zDQ<x?dm;MSJl3cZSG<qXZrm
zY%uOttI9~8HWgIXU+cO3r+x1;s7LR^5j!JYeoDGv+3L0&ehnhSL~F_RbO6l?;h#t4
z=IVKLpTZXMPl>GZyurpKs@XnovJY=I5B<nOPLFd8vu3f=C_7#1H@pQ8>z~gpoh+Sh
zjxBF*iZ9=9`Dt=r-ha%GtjGvZhc%!Em=-3I<$uR_Xk0W>(AZ^Xw#)bv#}tphFQMzV
zNsqYgQFLA})X7zS-Oa?E40Sg3V%x-)pdx%0mEnfw;=*HSqa9<o?^mo|x#IUf?iF5Y
zne=><?*zrN=$eeVM%qAnQ*Lp+yn{9n*V@0iW@l2F5SJp`%ZRy6F^bk25C785?{G!F
z-UYVb6~4l@SoNe*5q1@L+C{RzQFLfE^W*q7uHTP&N#eZfGRyO;UMiCUQUP7H+ZzVD
zbi1>5zosl<?7ts5s99b|)}Sb{N;n#hRw;l=aDnAz=6|l1@3dCzW)iI`)c;~v@Z3Pj
zu474fd|P$HM}-on5NCCdJ`u^#DfEA&+LIOUftof+iL9hV^E7@Nzgjk*tpJ?tb`59i
zV*3CMD`SBPY^qy~in+-l!Wto+Qo<X!i`H?q%j@0C%MGrVu|YRr%3d7=f2oHNiCb;W
zhCf;5>n@$bYwI=&>hKKHOg@AB8U=Mf%W!agg1djY-j^T98OYV2rKSId%}S}Jo{(U;
z2w+}}#jnig&I9JsVZfWp9ja&08*jfrGpQdye!kqI?yk_Ke-CoW|M7XB_D9D(?)}qg
z9RY#llmanj&X^3Uw-(E?vhwbuSc~?YJRJ$)7ZxJF^PVrUp$6IGG}U}i2t2AIj2`(S
z+%ru}pJ$iX5!{NqMrf$-^9;EI%Db_bcwS@V3`PXq+a>`cLRsx<`gEqBZQcu_3zB$r
zYq!DES%1Hc{PM5}IMBUfI&O;*$4X_`ii}lcEB07-;etbl7s3$BfEgyiNdPH%UY5Tt
z(%_gDtPVGG=y|mFInn=?8qDme2g>D98VZ}_VG%`7Q=&{SQwB{M_lue$d_@;ZQwR}G
zkfGhQ#RIt;mwTxYN@F^qb(8qNkf9Z)q36@^e_`lgLtGvL+>LUz-XChe9BNYey}=t8
zZ-ufK9{tywqUCu{i4S=i_jH%cr#80tyL(iWz~NVx7$U=25|02P&a>=IN~>qOJwGdJ
zZDO}T0o&`OBH-|`1iy2BmDn1;Y37y+yPT4G%KTiM9<a-;f6JFMoh-A9*uq(*&CYh9
zA6Fi4$VbCfKO9E>rpPCwTh*>}N~O+I?dj`uI%+(|Yjr+nx?M4mjPlzq{Maw@dVgam
zc(KxNSfh#$Te$Augo}yZs@{)7aT#TX>6nE(HX+-=_a{R<)1u+t)WnXqc5)4~l5zk;
zeBfC22a#d!sYm1<=4~XO?q`{{SFg(Jp6c9PHPd*;adIJrQG4TibncJs-Ul?xp>;fM
ztxhL+RoFaiMSwVj<<do8a2VNci=o)Cr0S;Qp)dmqZi99U!)~&sK?oGG6-KuI^WIeb
z$_Da%g<P$hh#NPVM$iXm*wqDu<(D4nQtp1&&;5i4TI-5Nmc#u0I^;`nuL5>ARvlXx
z*sc>Cv~eW7BZIf$#+5qO6*{I>&@{uUh&IWMnz;0TT7hnKFa*B69NUw#lyvO6Q@w~v
zt~-cBAEUK3>Qb8N)igh(1b<g&Khy@S`fy9RM=4W?r;BDQOux=M(>jhA8BT$|V|N@~
zYu7>MX?}|NH?&UAX@IGlkF$|(j5F{x<0mjiH5=i}>`S)#jG{CMhm4<VioP>QDf-q0
zSqEKNpH^GP=P62vV*-mNmxzOLms9+TzGjB&tX>e*5OIK>Wk|r}DqdUb3>$xNh|P6;
ztNL|b*lE$pO(OTMx+VM`(b6Jlo9)B4tQF(jV+%=WVYMJrGkn?_*JU2py7OtOkS*Sd
z?a)U!`_>)u)U16sG4CDg{+Ds)4co-KXmBib*6gp5>*bA{=9r^Urs_%J&w+8t9dhjy
zjF4Xmpf?NQeB87SSnVgBzDJVQXe)z;`{rMhH_HY+wxo@};4NYoT*l;Y{~AI0w4ZIZ
z3sWw2y-o3bOz~|N`PSl06Ice6W(J7q+FcB=VT^%%V#~5*pUFRrD-D-&Wyr#fgC)mU
zH|^?{QQP4{OXDuxmQxuhfJ~CTLd<V2&nIApbOtV`Dwn9>5Db0G+V;ymSAabRawkfm
zaDOlFZP0p@XE|n}(ehg#g+Xv3&lwt*A)MoI21N=Jso14lr)`>Hl7nujv#ezT5>gg5
z5tPfbDbI{H>qO&DR3$AgWJ<hG1*&;D)tmr?g|d-W)eIx$vYb5C7btXs2>eu&8o0DE
zg?QHFG7%oat$@S1%`EaGYIKH1wj60QsrNe?H58{UO-I}kyOvG<7|qQb2Tzt$t%PD|
zw5KJ`ul-)3ee>_Yw&IdDxaIs7stUI<j*6Qe0%G;<lB>EFvCWm85F*MSTrf4vZdS6b
zB<k7U#o0H46W4+}pE2Iy2of7HQB6HtiMcmeKts@5%p%y>4aR-Kr1W>nI*!DGU>*q-
z6`oNSMRF=dM({<JmX<oQFu|VwdK|*0>|*I8M@)*akq+EQ{|`_!OEXJMP&sVDV|k84
z0uIJJYO3EsD+?KD9b*@T*BrWrlzPgDT~A1L8`LUXN`aZ&iuq^z+=aRc_8QbCq{WQQ
z4!cQE|Afo}xSdeD0Ck`Kw=E^>-LsXyl={lnk4gXZQ-RQ`)%;Xhi~bP4)e$JLyynUR
zWiMXy&&6cI<KP<dVN?t6)JD33cLv;cJ%`xz^Nv7dtw*H}kR0yx3e%HV=_!Lhr#}pJ
zi^L*WBsjF6g5pgH5rgV%k|5fxgp3^zkwF|HtvmW(S+_F-25mqYM1yU<J`xQi6x<~A
zV+PyY5yb_F+XlvB<OL=M&U64CGwT)Ygq}3jepL(Lm7P_mb)suWO*V=&NhqPJTx19(
ztho%r$ctN5Gm;YVPj%k7&DvnSsO@Cs^Ga9amRZOP?LgMqIFR!UpfvB|@eElq+$+gA
zS69tI&o#(7Q|e|8q#@{RocKeb9<iv?728d}kDV3UAMC4*hvR2D6L&~L2CY1By*6%$
z<z-kh<8i{@8=e&0xUxsi)fL1R+n^n(&xcgTuJr()R*ke>xv)GcQw|7hDREvT?1gzP
zrGuBJB0{PhnfUw#uD*162f)V?!Cx=N@gW9P6753U<)(A9wfD`55yc>AQL1HxoUs@2
zYf#J>2@l_<<x;Ic&qVlx@~1nI8PzOK@BK1DEfi7dT8f+HTgcNaArjMaR(d$!OQmW5
z_N3fq{Td^0L2o9=m?>4nvD=X}8_?A~D1wFsH#UAZ8KAyF&5jBO_GKjL*DCk)G@1mQ
z3nh-W&Xh8(K9*xuY>y;WdriU@l13)1zZy+=5Dt;_0+Ca`q^C`P<Eal=xX?^}!<YN0
zBpgH_u8V=G$Pe{;D-VXxA>}ZC`LgL=?sm1Fx87mD`@H?(cDbIr-r{_*)4ll;*BB-_
z9^();wgP>$kkOVx7Pe8!X11Qzq@A0wTOy{SWZN2!=)+WndrAmyeX2H8u3r-xj~<O~
zB6A0A8%5K&FQRXQP}lJN-LK&^r(>Mam1~sp<JEeOfR8~gzmr{#V2qJH7l$-3H7fW7
z4EIN6axs)=CvjxhBu07nNC>ucC#FysZsR5eCxKiMZPj|IQB-R$K-pEH>1MjNt!Rvl
zlxgY<GlTpXQm%|X-%wwKyF}(#q0kS&e*)KmDb=Y40Ke#*#hypRV@Zk;+mv$CM(f3r
z(R%6c@|Rq0y1JjHl(N3eeccf?Ux8UjtJz3h3fDx_`=Zj$!@MNA)ZRmX*RE%eK}}Q#
zh<_^oBlue=7d?a8LQh6o!MK9NQV!ISb3~-VL~90^hDqg+LlRA6|98(?Li@-L)o-F-
z@N(;@b;0P=wu8o@*BuoOUFv)xScQU||0yjl0_|pp|18f#!g!>Pjz`<GL*P=MWdeXh
zULQo>`en%W<um^b)detFY`QEVOZ|HO%m3&AefkZ0Qa<mO;uu=~*}ZgqSz2$Fzy9uB
zpA}&MKrBYv>Q1i}+JHwAQ3#%;Yk$Zu<937T{@BUWy}6#e-0Jzb^1U&6?ta*L?!LKP
ze=CzgvsH<vED@>@2{ZAFJ&AIdT;H=y$$<8p6TssE+4!w4ezXwjil@kC-I?pc%9ss=
z12vWQ1Ys5D!NQCqO%@;l*0?;|mY{+>hw)$#3Wt%?BOF6f>Q#@X-7R5V5B<buF9q@(
zv^T3B=p+J3s6g$&v9&b#FC_z{#fXw-MNKJesl$Tx20Adev)LM_hQw9t3(sK1YeZxL
z%Qp_H9(_1DEHy(}R0oVWGIvp@EQA&^InJ~$e-vD(0>qL^AK>0UShtK^Sk95kxo7g=
z=;eZ#evt}glud+I?9_q>YD%7VK|ZPIl>!>esoxHzr~2JX=*`2Et%qaqcQx#~+m~>l
zs`T0bmBSs+#f)ij-Dr?72bSG4!#6I=>c`-1fO=kP<W?#yow<uEshY4IEZek(4PAJR
zfw6}#bC54sEYZz0e#VSmKO;wkm-jUzx=}j<ZuP&Iykr+->3j0Hh<+^(t7^sQI6KRH
z2MmB~6kJ6X8MG+4tYBNye;N63e#9zHtTB1HM|qf+k$;fSeFK+gkc;3MSUiukp5|WC
zm5MaK1=KW|zsUHt2u&a+a)3AVgFwm@?D74ZO>gq-5g1r*WZ~hQ0?~!Z-^!lx{X&}-
zdxAunBP2;0$kaK#@}S#x@<a^6A7RQZa5zhlZf%NG-0}_QdW8-DwgA7xOW6{oLErAk
zp}o2!;BLpQarT18?2Od70$*72u=iUxu2=&U3PzqJnYhP)vb>BU;hn9Fci`me_=3eZ
zcH6C2xr`-o&#Z2q0oKobb}CeAElXAFO16zUxlVtYX;YT4o`f3E>ZT-8Zhs;9Foe*6
zgS%I>j4&fx>AWMOf?h$&q1}kzHk~Bj!HUi5559VrIXTbNVQRIH-ubyXfb$hRwjezr
z1lAO806CGj-y?jSz9Nz7$^oeg(augI3kw9qb3%HrAy}wTnLz_1BAMSV9Z071=@VtS
zB6wsID<_IwcEZG8=yNLJHh@i-9WI1BMP@k3w*vb+I|)VGdw(i#6U?>fP=9e8>TkWQ
z1wNltI(2<%T9pNJS@*&igsa+kDmC++Q*lq`q}x<s2?z16GT{t}1K#e_wO-7}nb1t6
zHWn8IR@ZzY#=ws<<`#xTPsXIec4q0hi-GDuPRT^aTAR7M*{rv=dv7=I!*v{DF#r#E
zCLo@hFHI)eU(UIFCwJU?)-WZnO^0Tj2Z729UA5E2z`1g5fpl^60rbYa`E#<R3QnaM
z)(jFuPO)a79SDTZAU)~>%3#Xc^vPH8Hv~@klnDvd?}B@h`%@wiB7GPjZgCt<U8*>n
zL&HRHwiE%HTlNf*CNz*l4J|)RlLe$gdZ<L~P=PES@hTak0kYl0HAj429lb6I<2EnM
zJ{Ez;mYPlnP!z)SlPvZhm<g|QyJJ572DgVuAe0n>Xmn4=!ZRS+XPA#vUY&xSxOYu8
zY)eSG$lm@K+pw+jSelwuHJ!ei|5fS$vMV@$Gkvnoy(}(#((wdzhvcdCa&3HGV@UUG
z6pcO#h+T`MDX2i7A_`AL2lmv?fQdo$Rajyk`CqPu|If_u@BU!@c-VWISzNK3Yw;h>
z3}-4=32NJOf#)?FEZW}8Y)wv<_Sb$BcW21xZr5`_7+zoUS)PUyIvyYgHX4CvoqDS2
z-_&PEBrA`7ltY7__d<vv8$ub0aDl(AdS3}<oNOCb=KN|@Ev?wt5vcqbNP)CoK&d|H
zp?$BQ;&46=`{Hm`K*=(?-c)lAEgVtiNLlg-=#-zvBsojyQh%NhM+cTtNrnOqeT>cq
zk&OH**yk@EpTd(723wE_*uy{tU0EVA(mwY2$3w_v?on*)XHpuKL$|Jdf7;p8K{0Mp
z?=l_DK9)j~XaD6Z>M3~7n#}$u%%cR5i8!O?ME<WS`Ob)uJ2f^MZy|q~X-@fApoK%S
z55QF7rYM|ro((CoOTBF7^fp$aY$*;h&x&;|Zx3}|U?g)RV~Hnc=+W5&C10JvvUo7h
zov00<e5A{XvMxstD%NAl1NmL`Zf+mu*x;2#+_?JhlG9LXt<sQ6`{&{8wDl~o7OA1J
zFe_ToN0-SMAdcJ!+(T7INvERAT6I4zK%#k!DRuT}TFf)fNGaT4svcz?7}5{G`V-vK
z=xxjVHa82#g><G%YgLzSd@9Wiu2})JDK>;NbUv=5RjgEK8)7kL`AqE|G>4m`ci071
z=&q>B0~o<f7l{&tnHHnSw-J=&xNJ-Gw+HSukO@SRs%J)g@@SbVE(r{1*ZMf-lI>ky
zh3M}#MY?Rh^jo#FA;Ljmvx#>7<io;w%=v<`c`u_ioX)BvA~T$Yzu<$F(LlUNu}-HX
zo2$t^#%lFJv`*Kw*kHM225*Uo<ok1IvGG`>o_{XAfC<Ou%JDAy>==!2MIdw!<tCP|
zFcOYQwiHcyE`4n-_fgk0A%ze*+lA|)apf6(;m6a|Dr)s&S6X7b{riU^d3T-i+qI*`
z>gYVsn&oak>KFZg*t@6T${+pR^RaDP9a}55ZQHidv2ELS$F^;EY@^eYz4w3S%$yq4
zoO3f(znj#$%tcnMTHk!0_iL~&>^=wH_Z6b&SIZ`Tv$hmI`?3?8DxR8o=V<zsT$!d@
zzS2p%UhQv$^d{O5KS_?3W?1=&9Q}&On-=|ESY_Qiosz)6!;i`t5muPKu6@`mq{ru}
z@(P|6^aZO}9oyq1YJheWHY*%__BjZ|j+DV`V)AgqY#n-XpDj0g^r!rzbZ`VlY($kz
z_wf}no_k)^tEQZK*jBN-(P*E93E|`N(<!Kn2zlRQisgMpa3cY9Q*`W?l24l9GMr(V
zrC$qI-2|`!;(u6PeW|9>aJeW^WrtL{s5mk!<%atUZIu$jQl^raM1V^;jIFk-+pe27
z=y-JCv$mhlF_KY2{i6m9S6j;G#w%NO*X;e#-D#Eq8e<ZE8@eOBr6*E*D52k3%}7p%
zr>Ex;2*?e<d9XfLP35iV(icvn;BOUlhhYVJY~jEtX&p?SY8~WcuN~7FpSj4N;(TdY
zPK%{6>(ZUPJ1A}(s+GaI?ouT&+lsPtwl!iMZje!m6}#g<(LO4He^_YZgzP80s1*s!
zU0cy??R=dt>f9n+s8@iOL6xuU(_cW|nO-$XeF*yWU+V6Sht4a0Si5w079mt#{ZRy|
z?-5-ePrPKpFu|Q{7&fvo{!!~r>{bs4&<f^D;ohyks`u}q(>|BKJ@EPQ;ZnYA?~wAl
zgTEtgmId=ED{kMOK<7+9Tz$_m*X#ETFGgT_vWBwLxI=+du&96m7d44D?(sU^LV8F^
zG}8I~L9R>0THaK<fvFeWL3S4w`{-LWuZJqGPr}s5uH6Jvzt;V@zW~B1s%M3*+L&;0
zut_lRDbzTe?`_SkL>@DAFQV<k;+`BM>@T%^x{-^Vsxf{Z@3R#jKkGw>0*x^>bGKCM
z7OEzeQRjdvn<703K79Ww#fd;(Jq_=DLw2<T<1CrS5eTBpTZO!Xs<GLgdvKty88_K_
zaaZyH@4yf*V(s9~DjqZF4m9NoH1kt<-FA<_H?tarjk+arFwGyU=SOzcr9>2jh^AG_
z8DgTpScnTyTS=+r%Hyq9ha6-t&rPfs(6&;jIkQE?{N<^m6AW*Lg)alGRahyXH_`}e
z@&2ypX1d;F$Bt$ID*=@Q;+sdfWHc_z3!G!;a`QKUyP8Cde!uua!VJkJh{jhpFomqK
zq3I#g7VX`Gf1<n(y^-l*z45#nP5y<p%h#a4;@JD0qh-jSD>>z+(5<!ByIWCc121bT
z&C*5p8dFxJmQJ-0CF_w#!+sxoaxYisAZK0$e&7IsV3~d;Ypk$$vmX#vse)BlbqKsS
z%koHgs!=_B6VA-M3?YC)OVLdIbTCTi$V8TCm`oHYVl;$TEJyPDmH^HzWRf))2@$$y
z%0wRe_CKPIYFpeOQ|07;qEQy;7cPpT=viuR6XVHM*aDfB3;FZlk4*QnA}+!e;@Lq=
z{R6)*6zjNj4&oG(3`Gk}q8+K-Hk(poWtK_IX7@INhq_pV9GY=SS!_rnd%n1BIUwxf
zli{U0!~hy?u;!T$m|uL}v<oFpKdvW57<#z2=t?OZq2~d9osl?fO#I)yD4V!Pk28V9
z+l3{8iStU-+_d|X7>=FWIl?jy{dDLA{kxw4ZzdIhM!AU<I;Hv^GORV(r7~PU`KZze
zDPp6L<2<cdH(XN@w5lnN=V(Smfz7fY%Q3_lLyJzqR{0+k+sV|iDz8=#q?Om?hsnRx
zS8F_?y7QhLSebnbeF({eC7zNxG__lhqTl5x8^Xl2g0X}FjV?*L4TwH^g!g2o7oe!z
zMA2$YxJanUP`za;(?#a2d@@v46{xGfE9Yin^~Rb}8*`}VpsQI~>3CA)*&aN*X}IZF
zr}C}GpqPa@D|!yZ0rAIdBKFiH5NozXv?hDx@;;;AcaRztzr&*tO~_?1oXXSbuIx=-
z02cf<;;bEE$~n|~+i+%(H)e234UDSgP1{w0ipnYb3!A9vI=rKrZrDzM!q0%tnuMP=
zfb;>{tk2R~R~GbxC6V4MGD+KxIup37Afg1O$4!(kx>q#EK^&meg-t@ze%${OZ=&J;
zCEl30uR~ly9`;vUBO8Kxr)#GJM{XJI2e7t^P$aIdG1PkBFKmD-fcxnLYSZ5A&K~n>
z{k(RD*V@kV&=Z8OEX=3iN+!9)_)BRy9(1=;+ug}j@TO%4l|;l0tJ6QY)viE>F#gKm
zTm~gkvr^KTTE7{JQJ2tmuw0=9(Wr7pz?5u^%*ADPlW`qx<l$r6_&Z&QeKalRJa+Vb
zjADbtyOjzbD}RlDv$1^kV<1^r*&}FiKv#7ltgxaMtMT^@iCjy!uK=K8#}KE%u;`Dv
z<S-IKe)AT8<=^wk2X<Nb1p|eO0ufr%j<6o<lg@tFj7DT+s&QH*lI%n^Q0a7!s=hMj
zd>FM^kFsidREVMJpC&H;eXfpw-Hl>1OTth)uF~a$G!m&ztP?-0Kt>m)qd;;P2&vzI
z+5(ty;lXmJmsyhftlI20)4Z%-%DJ&$FEnT%d^8?j6Tn^OpURgsRC3_EdNff?14TWK
zZTBad?TB1r0!gp%fdi|`3zlECAtb7l0V-?yHfuj&l0u|=#8mtE<YL}|O#4D-Cft?C
za#H)ktF6A5tSRQk$<o@PbgcVjX_+T>gC$vI`n5ifbG-iG^Ie8yrdT=(5g8Zgv(gh_
z<OQRBMD_KGret?E@XyVsh6PAz+_nCAQI7EKtNCG?&OI<bt(*cx<j_63wx<ABU5cRd
z#|cQLPcsBLR3ekXBm1T8)hRYy;SxA{nZrHsGT8)|q;B6qk!O6YSokc!fSTXP{WfMW
zh#{D4@}y`jUG$HTr4OZrC-Uwnhcsw8M?F_&6q0t_5MyA6|2jFxu-AXs;#U0NXBs-?
zq3VuSI{Oo<oO36_AjEN_T*&kgF$X$EbUC)&J@}l{@x(JG4<=`kpL1Qo{ZW<wnFE1u
zQVaJBvyzp&-86Gb3OG3ciyukHj@lcE@Ek{#@ge_bnI=TXq%!S7tmr2$<!#erD^52G
zZd)3A9WP<_@3x&jSB;oK!2t(WtMv#H#AD>A$fj+@nD}!2MXII$Q2_^(A~ggqC>oPB
z@LHXZ;uo5*&s`8j&7=+!60uZ&kYkNNKanXF>lVuYGz15@pv3BkEc&Gc?#~nKXV7Pf
zigNhq=7h-<)MOWykePqWR{7H7L-|if#8mL6o;fc#JBGca_B_2645#1IEBy~5<o1bT
zxUJ4I!slp*59$o5l5I#iDqYWhEi(u(STuSBsv&2FXcJcp&;ox+V@zq++=mRHjE4Eh
zYP7!_e%+5R04%QV3-javb7qoOG()axd@7DePuBfD!>n*e!r-eB<jPq3B@(`x0>mjt
z0w!4Rq~ZdJSQ2t)2%;$^Y(_ov4Uyd5=lGznx$<St&(L4<sW@=Twn5tup?@`3!zQkR
zIxVXE;^9A9tMb)IwU5o)wvufnIjE5qv^KVwAcM0Tb!g9|;gCWE^{qHp02Y!p^IG)8
z_$Z?kXX@F*Rk0H&seU6ZmuS=iN(C~aG-pw!4cf_`8}8wE#|@%ZVFPtTGshWq`7YG+
z940XFSsTO<evC?!aryK5qY5YwcmKH~S0QFvRPs#nLSahA3R2RrATWJGhVH+0eHe+G
zzVHVZFC8^k%jMZ4D{#;I(jNbv0PrV_k>ZRPhiZI+QJ(;K%B}WM-i2GwmlH<QUxN;Z
z4$e6BeWCR4iHm!nBST<X{_Bi9HQY%CSmuua0wTO`We6G>?k@|OD$_{U{_l}-<!0y=
zPDGEKjQN?~RhHE0IdCf2Vd_R@ps>`^0!^cPNecdy5=>KOb90iWRD2F=jZKSM3de}R
zm4IPmE=C#WB<+ABW<4o-V=~z$#g6O3#<TGcDFsm?E%SaplQ6s&%7t+ckSE5RjPlgF
zR2oyW$$2XwuhRb;0jF|McIv;~lGhlFbLk+eH!4uiN2zkr5*MSkk}%r%T1ETGVjz6@
z&HO%x&j;=r*bO5U3wMGn3xb)T5qZ~UsMM4yj~QV^%YuZq{Xnb8Nd77Fhm>wCU5Tl5
z{4W96f;8P44Xy63ZxWm&`#LUw&cPu!44uUVImrC!%Q3wF;WhDNn@NWPZcu;*Hm`kG
zFQ2Kz(wWturStY}qT^#SDlSX?x0*bUcE{W5F(snER;GI|cWCPh6IC<Y)8fK+LHl2~
z;+lRwjc<qUw38DVlX#gxXSuI_R|i7h)wGwQ&K;~I$7ZLK@r|;!{P&8*S6mk_o$#l_
za0I9z+%`8qybenFYW&`d+z${CtE4p%`@d#1kv;GSJxMoj@Ql1<w<{q`<f!mbI?XPi
zAnth|1KrLz+yN)?Yj&sD$-k>?|3~2)kjVcYz8MVse-Gc79{z9e&F_=BE+je%%9a2N
zlr&`jtl<HT(j^?D(YuL38pOea=Bk#VcK$Tt<5W{|m%Q`y_2mWND`Og{$Hz&)A-vSJ
zlytnGQ~hbWJTNNCj=}ASTVI8F_rwmMsjGq9dpTOZlq`?~#8vt?25BH9iQV~wN$z@e
znU!!IGj;+c;5`uGMFKPPC(92$qZ8Og@bgP9rl2lOL)W1wQeEVt5_;$C#89s2f(7dv
z*A%l1Sw)SzWr*(~Re;=(EgRdP`%LN&R`h`4)KZPfOOFkWtz?J1`GIh3=Nfh!pJz?C
zvwTr&p^PyjPX+B9s~@%sjhYKs6<SRx^HM*m34;gMBY6B(B}cdXOhto}Dp5r%yJLSC
zWQs8=a*iocz_^<IAYIn4Euk&emEm6tTzh%-bbgk5D3ZSZgE((*bjojh>Mm8$T5VpK
zT4<TNrmBE1TSn_hpiO5>rSUmPV)d{9YK5)u=d{WDU~x61DMIEWe+z~T7Z@BOQ2Sx0
z1}ZED`1u#CW_Ab5>*R@JH2fXn?^X_3-Of@Aokz8{!>+hUPO{Dq)2d=()h|i=JuE9u
z$^EKaEAl7zpBCLo`LEA;aFPjUyNM$M01M?TnYR;IOZ6;N+IZAM#fPLtjU|0QFSp~f
zfr$lE+TN;k%gNZa4cgMp5Y~*fmt1h3{a~E$$rTcPjvi*!auvLf*XZads3vl9?yyX*
zTjkzERO!b-H>hK?aiT|KP?9&O9T740wh8l*ME3Yz@7QwG6tYPq*QclEijDW7--=C9
zGo3^|_5?;=?=3cB-msb+uNA$|(ayWW@qlCmm5qQ!vo;q2J)pwLs0<G3URu{HK>o%i
zcT3xS>*Itt{eCg&RiO9z4Ep7Bbt**>F~{3%Mz<a2M8+pKm*29eD*jo_$yG9Fhgo8W
z!sG_S<#geDKB7K7Ygf0W)2CZ}NHq8&jxD_T)EN$rEM%=tz>ewn`W*m8#=>U(E!kKf
zQD;^`$rZ{}O1qE`amtoJ^!jQKO5zvxOYy#}#FK|@`i-BS|IPX#33vL20aya6qpR!n
z>b!$+`yowI5k+o2I&+G#?bYM$PEsifLUSZWF`mvr3flPTtt_2dM(1tlUTsVZm&ZCa
zkwd`8z0?n=?ev@FI$+LAFFR8pHhNu%X<6rgLpT+DND8ZR|GLcmBsd7!%^vl5!k2z{
zr=8=1=DhNAt$urRefjKOK9mPVdVV~b7{1B>8_?7u&(QvW$d*mVFmqtq{Qem?DJ>1w
z%wDG_ac?(~YmI<`-sLSNR^>&!XxxW#tLgL*03Mz}*Wp}N*E4OkJgxg!>Nc=}%(Byt
zr@KvQ^J7>WRw)c=duT=ksdR4OynFMNT)_wE6=~6^FoB+gThWgNA@0{ZtwJ^1AAAps
zG;K%B=)`4~(QRwQF(p#u1$mz0C<Yz*RY6yV$v26VpDP!ECMjzA>5z%^{ZGU5{$B_+
zw7!>q-X)$_f)Dq9p2m!De;1kv*li+?%R%CEQNB?X9dQ&K5s?g>;zC{6VLCBnH%*L;
zb^pa52M1CLn3dWHHrWwAQ6%wNqbwINMa>Tv3W-fiZW)tP5xJNAeM?eO41cs8ohO9_
z!~%>>Ng@C&&Hpvc>pakXJAGfYc>QCnyzgkjF1j}E4%f?^optkx_rjv8o<CM>_4mv{
z$KWEb`$9JxC4Px}#<R!gT8wrw(5apr9raQ+2t_<9VneuSZA;;Dmb<zguNh!DZI_n8
z?~ERQ^bitW<zefGuTOFR@;b|86d@ZXX2lznLy@Mp02XB`Q6@21TZRFh;7Yu7o)D$o
zwx;hbxQ5xO_d@-`iV1sPjsQuNTXvj*r%w#I^q7=<De^3=uB3|hzCTpoWP19z1s=#R
z6EOvqXVpMfPhBQc;6#wlZj=tOWj7@+Ynm2Oq1w%ky6w2@Uvaq{e;?^2fXd9$^0rEe
z?nsjJTY;o>_{(3?^3u>ZS3h-Yn)P6C2}&rr+ejs*$maVJVE3V47o)O>gBqr-RL-kU
zDhrVbWVxE&FO6lK+8e$TqZ_~mW=AmkP61^383z0k1Ub%VbxZGM;MOEZMLN_fOF{>7
z1v1u7o-ESv)|xZQgWkrDqQht{k64qQeNhxN3e!1&o1MR;1i%Cn6T7*<s`U8In3#qg
zJYH))PR>vJFZTg(@q$+L4>P3Ks(sg$%ry=zLCFeVNX-!heYEwS#`C^@KGB-}z8ZL<
zA>(;oNPO}KDMIiUxpg+t%kj3qPp@`uS_Bnnhq46yJFuS}U+zS|bwV#-i?%Z2tx3}%
z_w?T~x_lRprYJ19d*egDXIA4Rr{n@KQF;9A#2v(lx4t;))pxuC)D2wR7m9&qF~>Dd
zTM^oBLcrul`6O<e9|cJ0B>Xs_<%N^nDVk!8d@<J#uXkM1W+2xisJPhT{LXf`r8y8Y
z?7<bnFDWC_VuoC*>EN7Hc^F4KMqTU|CaB(BYH4<*cYiSKK$GZ}7_%=sc(keIVf^IL
zdUCAl5wUS(V8}Hk5S(TBwaRDj=im0W@A*<>{e25>++lSb)myFm)^D@>7?{}6vz;V`
z{!Vlrpd1oTM2^A=9K094eE}_!EH*ukT6X6M%Gt0KJKCTSaBmtW#gdDNK{I45@&o^D
zNO|(6r{pn{7HYV8TnB9sha)7*sO-1wZ=cnQz@!hoR`Y}679m`&^``|jGdV3A{LmR$
zhWRB)z-DVh(uZOjv%8esFI#M$kWD*n3i<JrHs2yj!i8VcxhqV)0^Y^o>J0rPmcvhz
zO_KNPFor26uh6z>TSFWnj2nz>ede&6#W>sZyWAD2K_{jgl+!Evl_3iYKKU_9boQ`p
z?}e-b(ew7fa%)G}41zRMs|o>n?Sn1#cfSlR*#RpERmqKVe;lF&szqiot9OivqqWca
zhR}lNe|mOrFkZz&Hi8Z#h>tK3!ED_--zMsFJwuU;#{>L>xJQj0cYOUocpz>Yk}9Rg
z&LYkW72;f`V}y&S#UA>=VG7L4`u}RHA@g>0yZz^!tc`WnZ-3`q&J8@RuBLeo6`S`=
zC5c$~;!MVRZ(1sTM?jNJZ8}h$M8bV9#mduioCZfqk?5tKS2YlAz7y3nu-Or^0=8Jq
zBqXpVK=Am7%rWM5pk}hpk;cbw7;?dLN`nHM_cX7V9}xnkUALH}Yb5dMwLe4HgdZ~}
zCW5@L#X0uu#A|po^Ck$yn(^PX!nfc$pq&wHXNBtZyjIOT@3fJu4&elnxAm$cfUKZu
z2Wk1-Zg{o4z4~>8(Xym&T2TB9U$iY1Y%a)~5klHfszoi=hafD3ZES-V!z9>zS+R}s
zpyo^^0DH4F(+VJn-3#Vj3+A7orz~F)kKb=;SO&rFTlsvJ@aU*thX>c_usU%4>$m+m
z_>-h{u_IRmG5R^*!@HMS072s4^BY={#97T`f{oy6zMthkG9nj*l=L;<r5_GhF0hxn
z{n7|t(+&L2Y447Kh_3%6QTvAmT_<g#rGFo~yOHMz4}o}oe>b|GueyHoR><}co|Br6
zo2BRPjx}}OpwqR3%uB7p+$wn}JZu+rXw&07%x#y>A(_GeZH>oZ%JqqRmglqcdkzs;
z?JX!RLZussS^${?p#|!T^Vjxt%KcTLoT%l?2Wif$bhxXdk0k5^{_pz)plh*Ly7nU^
zaj8~2|4&@K`|lCv9<SI!V3&aE&sq?HuCPh@nOdJ$q=%62b9c2Gq4Wng@6l6*r!U{x
zksfh90yn!N`JPjb-rXKRi}mdD)Qay%>&;G^@$B_fYmWBI^>z=Y2!aw=Za64F#AGPC
z{y<SD_%@AT-n^ojTLz~tHJsdP?5Ss`N5db)6_Fg$=+7TBpqB37r8~l2`c!AgprSfs
zs|s#U2HrxvR^Cw%2dskhMOuxYN?gWhO<+js)&k8Y$FW`5LP2va1W8+2YUIYC7)Uez
z-+#Z=-rox5zYn1k43QsFzCVMk^L%65%;t_Gnxl%%=@vxNxnciGj2X56qu>0a-~6NB
z{G;Fequ>0a-~6NB{G;Fequ>0a-~6NB{G;Fequ>0a-~6NB{G;Fequ>0a-~6NB{G;Fe
zqu>0a-~6NB{QpS55$^mi{DzcaKgZ(<$G6|V7ou0`ZfD<vKhmzIPqn)Z_9@rIUw_rQ
zt{Xf^;yBcKeZN4*{>H)1pq_z?lV>h(LXHg5>*HZPB5*&VC-BpOBtxjvLH>msJsD3V
zVhHMuP>LKy62$C(+u$J~emDUCAz9CoC<*=WLXGIxH#A7MmhZmX$jZ+WvR8#=Vhv9l
zV}*!67aw1eFfu6S2>FF795H0=jf?8<;Piyd`hLCQySXD65lA;ag!Lm+h5`aL0xk4=
zaoGEeL)IUnji*g@-*$vwrsSL*qz(4<mgQj@I`hY$bPLZ8TD=-5&BaO^#RG{K&WjKn
zpK=!s=ga<ev^@O)9de_34Hg|T-w2*?SlUzC)q9w)PV{!oQK)Tg_t22h^5)9FsK0-R
z=zJw_bq**sfQXsZ7B%WuP*6~}gXTg7%CvAZ?&LP8Z~lt=X;g0Qsz)PxI()JW8N~a$
zkn3s7>V3$#<kDgjp`Ey$VIV{6*d*Onw7iSn3tJ|Yokcmv*`qAn$l*+W(iHi-_qss9
z^U5MemrCdAa<6{EP=0sA4%j)x-;|$7J>wHs)b{E^ZKNr@-oUq_q2Z_d^m$6jlI~^4
zT6Ik{p-$){vfr<sx>kf?Kc&Ekj_n2RK!AxVR>Fp7izELGGG$hY72}q0;U}kEMv@X}
z$E}9Qo-%Gdi?gm{M|zE?oi@n)u(A}Yuz@DRhXv}t(P;$%?*H&oVI1eQ((`F|yE1uL
zvUzy_an4Mml=*8HUpZ=V&46?Tn6++8TQ6Hz?-OU<L5B63kc@{ikf+@IdE(!7A7%w<
zwR}bLL21Ki3Te>0$%X55zG4-;jyQta9Ufs+%EiF>JE%D8mXY#&;U9zWNzchWG2f|@
zs_Q2Ct+x-Ga#331!9SARN0~5b8Sj!MAC%i6hm-PO=R2HdxxMq*y#m0j47*{PMayhD
z`gyvZ%ynj}&`hI#DrNl%c*sI=xn5*sa3tq3O<9!xttQ9X+xva|f|CpSzt70YNQ1=Q
zET+Yq25)>T_Kyxe!j&Jhbxo{Tmwq~9t~E=Bb^`0~Dn(c=v23MJ_>)mUAzw61boW`s
zL=^ruXg4T{mNYZu=$Fy<Z$K7d)~=HRS`V}pw_=X8z)pWYrLbkF?8sAg1@y=TZL^fe
zv37v?$op)`(v)uTQpLehttAl1f!DE`T_$qC0$d*`5TM5K<j1jxoN?}cw#Gqy#jT`q
z;)SRDsqjm*@8M@dx<DjV^Wtrk^bcr=npMlC7bq8ohR}Mu9>4#t;v@YsX>V{M=0Q?Z
z_%u*wmY+`IK}7rfj!kJo4gH&gnH-8JL{(PSjRwr^O)<N_eq1v3APx$X^3o)9BNkCf
zlg|z^f0Ho|qz!x1GV#$N#$G_HX7UYVQ_1Sc^w_YZ{Tq7^-F-m}8bo@T*3RZZ#Hjr%
zk<O_DsEKfEP&p&zNG0SZdoZ-lcNX$aG9#?(0gP!&8f;gTrkPdzpKG6&t?DIev%e9H
zFu*<MnZwrJNg25GNiCK4amEQXDshu0Bk;|tCxf@jU;Bm$xx+I5brs)7o+l)H&A!6p
z4VsY_oEp-*rGagpln--uGEz5ehs_h!dEI?#!)pbk*B1$AVF<cX%>N}^BVEz;cS7Zs
zz|VHV4js3fo35V2#8+$2ZVd*VjEIGsHe0N`9A6NtL}LNtNK}!!0ggTtnzqsczlCr+
z<{VCud1pWky<H)}xULpS_QJ7Jj}t$734-WR!!~6Tr^eN-LE>^I$HjhH(d@{V8}6y8
zgvaqOCtewUp<=5U8^v--p%NsuuMKK-%dGFYBXmSaMEz~-n$bOV{H!VQy%7b$lNa)0
zSz@VMWwNy#E3)tp1H*;V@W8MDuq2C+B7+}Wj3yj)PlzHvTW1QWSQZGyU64E;G(V1j
zCku#S(4<@;@9)v_jStS-oDAfmhnqUQ%o@>-i-PeQn&YT9E^Ws{_g!1fN;-gT8^MfI
zVE4rfkMMO*)el}`vAn{HBNAY`7E4qimcLH50bjjLuhJjCcf60-B8U9M<m;;B|H4$L
z<!-sn@@N4Z)l#*-Pm&!7{f~j02x<-A=`5novMD?qsf-0S-FqcR3#M}U2~v!%jFswY
z50w7?lCznR%g2qC?@jQ)YEe2hz?**^tm5C%?s(1NAyNj0MNEO(lT$Q?p|7$c_1vaT
z_T1<9a?n3?I3e?HS0Js#=-4&%CRN=y{gPNS+yL$X?=amoFDy~G3aq(mMXi}+{LmxG
za-dLt>-XScTS}8cAjezoS;awUIc%Yaj7%XLCzVbd>j@2i;?2OFI(}I9qs*IxhvP;S
z9xo^mSs6D|_DvcTUbop5Ajy~@PyAYOUkZK0SN!<k8{=I)S>Ht+BppAJ@K0Rd^>zXE
zagxqs>}?Y4Ua`eE=YZL-D7qN3`dxgD#|lKsF`IPKi!22n__krc;ah$E{j1em-pC_7
zl|i6(c<^w<Pa?uUBfiUZBH|fu_)fgZe27>Ta%}GI@-uO+r{Ut^O+7!!!R=QHcl6sB
zF@KlGbMYnsxI9;oI83bhGTzio3xFezs~?sdke{Wz<gy<Dd{?%eW{(I6s5hX0D-ge@
z>rmQ>34(P=r+MGAphL&5sZdCi4nuurs9-ZBVFOEkIYutdY(kvV$?x_J<pU@v>detx
zaT5Ni>?8c|cghF3-Mu{%{~K8H2$Nk2|6Pfjd_u`b=0wI&GZZlfQn)6IlcNNwpU#o|
zj<2sHGR^d)TfB|q)hz3naftAU2r?9?s6Al&prb_Kc}{L55|5^+<6k}dJ3TtzQShN#
z6Mbra9v;pNoC)%GrF1^ZcT_oc{#&DneE(v53o@olaWG#PeNF-8lB|&T7@-__A14b=
z3;tVo+e|3c%a%mva8AzgS}7eN14N35PlrMdwtXb6KK#5LB`dV3-_ai&ph>ym<YUD_
zMd-ONcHB4X<Ktg}IurfYz3%n)B%6Sxyh(~s$oIRCei46r{apjU!roc`ozBhl7dt|Z
zF0v`_#2d(w<Y#H+bNDN^11G3L-_54T&?Lqxsp*&gzsO1hUaH^A{IvEq_}8Uu`g8QM
zD5<P2v$7rkZT!x^uI`tASGQY1_Y8J#<A>wwaA(`D$5#0{%K`UK6e{99*BH4zeqFaM
z*Zx4_J$9|JA0H4D-bun9_$+yRJu{sgeJS?7JwDq@8FdV)I1uK1_q(ZTt9CaQQn`$N
zUQYN~6|K14D>Yfhog6C!n{+uT+MJyd<bX%FyTG~BPU$!hir~R06|X@aJsEiK6MBa8
zfB~*<33T>_3sI2Gp3SJ}lbBm?=#^>L^VdQnK=Xw(<8kPdZ$iEek{$Q6A2QO)O{RH+
zfqi?6)_h^hp0d<B0#^z{bgPGI8w3?to6>|mR&S%`IQNOhd;P#2ZOMd~6CJq2hmZ8|
ztC&Nrn%AwIXIkvb+*TDjTiCHaOg8+W;wX>I&ju!hk)s5UqUDI$rFd&Ozej%x<>DS6
z-v(fZ<BO=Lmr!)-x5Z<@R4sM#k!y?{-|bMuCx{QZsIU31{?Ap|Tk7-FK9)KXaD9AE
z^dC?9!SV_Apt$==M?von2%?1^0sINkckG`LJdk(r?v%^|_R$;(z;1_Ie~zr{2<U~|
z*VFBuuRPb&p9TTz>8_g(RE>xm4&lhV8?H?agNSERBvv-Tlo3F!dU;r`HE-?i;V2i+
z<PFA*eGG!sAsg<}3f9(o(C4F1+~i>T`5JlhKYg(hlfv+83buhna^PR2;>Rb?A?IYj
zC?|1ClbF40pb&E;;S11{YS0$?y?QZPI-C|V#t;z@O0qqCIM=iA@g(5d3bJk1#jKx9
zq>+fEN+H6G$HC6c{-QUMA!^?0b>!B}*qkJ2uy!;K&Ot9T)5c8&+Bl+s21caq><z+U
zYBqq~>t!?&zk(n)dd55ZWLEUQ!RsZoi#d|iA{Bf%puGOE;30@BK-PQ~&~D9N!eU6d
zmpU`GaD$afqbs;`s@ZeAXCUCt{*ltJ(e(bg*us@N8#@M;LQ=YYOzCf@=S?t2vF`aZ
z^L$@7E<q1^PEkYA3!VTH*HLZBk1YfEw#l;jgpuBvbDey6y|gToJ*^_|i4yMVW9e9$
zHSd&g=7i!FMOL;rF$fjZ*OozX7h`~4K_6%+lZO)tS7`f|(Af3M?xbq?jTvK6>>)6%
zcZ7YUBeo`^zhGR&DxU#8&Y2%zJ00v)2YaHwTH~$aVxJtQ>ki;U-`@AR{6#0{)?J#E
zGdwwS5DB?B)@2QFX?`dPboJO^!jUMUfOvm+`|`*JY`-lA^w#NengjH4G}=0K%~_Ey
z9n2R5qzVIp^>OOok0n{_5$XW?+-pQ_05Q}-P|!aWhGh%{Cj=f}tn(9hwfM6De}V?R
z-A?!d#OaTj;y67qLW~zc)Zd)CsvR$jkj}j~r3r4I+g~+y8e9H@4}(T>UdbA0;?)j4
z?%R!o7@QM8<SzE{pnC4wxf<IAI=55nE$@4-i&&0C>@MeqtJ^)sD$H!@LIeHfzAxo^
z(d=1ON*~us75#yRCO^+ubaIt^eSb|VJ6Q-kcv>Al9?{e{m!EWwx0J8`L_YZYhm=$M
zPS@XwlPfzjeJGLep%Z1&Q+TUp<;-4`v+uS~ftD-4-T2kGuQ|r{*zsw3P^qn2>rjdP
z=*)Pc0qak{i!2kJ6JOPClce<}-RDJseiZLQx9yiPrU}~xS3m|bUuy&!&?8WrCXf)o
z%EG!K{L8%~&<!Bx$8s$<Trm<KJ{@VDG(}2a(q~<xRLkwi36K#Q?ZN|P&B12kGI8eW
zGO$>Z4gTBKlIkqw+5EA+JK||01>>+AC=>(V2Bf%Y9rB0#)CJyDu}7=+5sm2GtR#Ip
zfdF}Lglq_&=i3+JO0`6rLk2d-g{{|_n&DiQA@=tG6QtJ~W8=`GS{~KVBF5gn4JYdV
z4>IK5_%CE=UFdDsifc@sOsZk;L(6>cN9xOsd*z+*x)pTnof+`=2mv5PrsV)EFK5aB
z-bbl&jf~=jRt<p;B@rnd8^ss>Aki96WdSn_TfoB8OUB?mwBt}5eumsQ%DrsWp>??K
zyyq~hd@p48f(6YMB~O0aAi#1pG~e0o1b@&+;KI1WlwipDebD(UKJ3WNY4%f#a$Er#
zv!uqUG3vjpp@||FB3Ua%Lt@hy`PB1DrX<zmZ*?fZ6Me;Tx(cYSYh_GaTl>{Fl<bpz
zbEi=h^m?)0K&F-6f>8VCnm!!|^#H<AjP$jPb-Levujr}|jE2Czr4w`d`?+Ra>&uK%
zfx_fbV~kcSm1#1^ilxf%*TsfI=iFaL%Av;Os&x>GmtA1`)<LH`-KqO7IW{ZTvFCIK
z8BD7XXj&TiDI&zh`9B`RvC`-x;%nz=JNS2+{7J}-eWT!<ike+q3(GTFUU`{w5_YpW
zA&E(*^VcOkLoQmLx+PawnN%j$md4-%+PDmSYidc{PjyXqXL(I`jq-=U<!pdey(Aaq
zU~u$hiGa(o9d)W2`x-4i=0;j;4|fCapR-@1scckDhG&QS_xO6N4x5%b!vWpVi2jn0
zj+BH7@opqF4J6Wv@Es(m5$tY}IWoSb2M!e0{W}(5Fd`$Lz%pjeuC&JBf%&2~0tP$D
zvBh_-@~3o0XU@0Tirm}M{UKR>^-1mhsFsBcHD7S$n2At(XUzJChf&~%Z`ia0cQP<n
z`Jb{{jxRlbeL|iMnzdJnct%EevEOXZ3@UuegLi0pJ9FHz62{*Ja=BBz1&@<tm}4{H
z)Glf=-*mXk6fyZ~nn!!e8e4)E9L4KKXZf^znIuDeYJxT<D%30G6V+Z-jIejQgWX6i
z3UwefFKcLX5o%cZ*y{021WWY%btf8y^iltQSB89m*u_v`vC1E(^*n8CrnU}AXlXby
zLZCQL=f5;8qVUhHj$BTf5J+$ClQQAlvk9tcUaQ5Jt-_r*z>0lysO5z>E>SZ7MHguy
zXkjH;=#->6D#S0}X~o5TK>K`#F_}2yXveTI+pPvg7x}Cgj4Gu1BE;a%EfST-jgpZ=
zz1Wgd1x$d!=a=`N$oN7Ju>5N^`^a|$TVPi;A%%)i^;=S4ho9;GBAq39Osg(@Yk)n&
z-@T(}aIsf`0OdOa_!Fq9!0{q~$DT9&qpHVKg(Qypa50^bR;y*9e<h_HuSzZJxvErc
zJ9x(#Dl}rpT&ldK>kSSs^^=89_xn@)cwB{iFa1}DYX+Yi(-Lkl!i~6qa;DC%WDySc
zaUMgVq3zIO{7$yXzM|OFC%lsJ2(v;)cS7J)GNEctdT;X$T}JAt$-ihP1Wwu<g&IBE
zqw~IP1e1_!Lc^5|=ecI(n*I$CKTafZ0L8paM_^+xD#XR4kKD|yNVr_HBEySH&L34s
z=QGQ&9RN@~+30LP-J;GFx-fdwFIYJla9aAkAimQno>;M#vVY4?Dq-Q_C_8RSVWT9M
z63RO{E2XH32BtVImm&!#kWpU4#v3{iHgH);M}(PzDmx-K%y%T(B0|q9G)EHx10g6U
zl)0R^DVSg`fsf=@u2qwHrbA2UIU#(eSQcIEP#(SaO!H;wBm|lmik($I-z^m8f!r^G
zpzO$)g|Z^|)b4g)cS!?Ay3r{i$BP0C!n-7G2~ssvxN9BJY$I_~c)pOHs`6)@WX|k}
z<p0XT{V|986MO4c^~2B^-uCNh$DJMY7>9aU_LPG(oEZ=i!7_mw9n<bTiKj??p)b)H
zeke>&M!099wrf#c&8k-t#NN@gle!D@Chystp1n?V^kjw||EXECCT95fOMSy0@-d#+
z7M57S@HPtRyiS1U&ov>k{C(sdW69+<1ByC#Y-@`2q64`^%EcedLOsG(rQ9p@gv#_F
zS&Yeb?(Hcrm=a0q#(jGa>tnbE3~_r~QS!ub7HK&%xmFQiJnp?cUxXo7))4&zQfc74
z*s8boen5*H<b&LXx>CJc{uY(cf@s%^9H$4sc$_z)C4q$VM{4PfpWzi)ioZY`A&@z#
z36}hGT3B!#qbV&{pvA4A0A%dYoJbJ@)JNZnFaL5cIPg#;JR7gpirFzH6)`FfF;o|&
zW2dCFZ)pj5y5Zjte<Z?$Jq6(RY7?G7uTL2_ospvxvP@XkGjdrK3@MB}l!yqF%lPNv
zzzOtvm!yl$%Gi=4T@VCA&NdmIb@hc*-qFIU0UsjUdShv>0h$GjP?I0h=}>0OmVc_7
zBXF(-xkrm3pauvj>s=zw;$1K`X+3U{_^;u%XBSPJ+gNER_-m2^UAJrw77rMtf6O(T
zRwXNz2J*G;p{wvC)phIFRIzu`YuipyS|N<rWy$UJ_l`GCqAG>8S*7&-u|y<?o?v1G
zX&R=s^BmC!>hJ1_*2FW?1~&9a58{3!@=BWGQ#_M4m<@LW)?9+kC5@^SX>bXsQtvvh
z#5kX$EX9w*T0nRLeM5hBd8R;V_*WwJ8DW^VA!&zfwYLf&7|<d}@?h`$blVfkdQFP`
zR*doqYLid)W1mI)<!ZCXdpDN1lV#J*ZoAX{V@DApCV7dOGEUSq|F*tsk!!nxyoNJv
zte9FCVrxwA$Lb_U$VJ*xO6@AEd!^eu1~Fc%xPndaqr|mB`<DU;4i1@Le}zi#G$t(^
z@Et7XV<Z_Nwl)DymwVL=-&$7oCp~AEXv;2v!G>#PMEqPsj5kh#5LpMWtWL~v&NxUr
zw&<9Bg8)Qgcf%48fq}mN0DE8C#Io(so#9Dsxjg0ayqyq$#}xboD&L)g(h8>#WXs)2
zE~X=`A}M-!m|N2_)ipFO<vZsRV*O~Uwev`m=qImjtN~<1sbn6Sr#1}86wH2*Zft+!
zFgA?fjYooF!_nxq-r;)w8CfIqMSW}~k3-l!wfwz$l$xT)oDNd_aW#%F!XuAx8r2)t
z3*?*>33N_J8MEf*5i5xwEBPmC9zmfZjRVRihf<jkKdD#2bLa~m0R_8WQAJP@^NG6{
zgc82KQ{nGuMe6t9dyB2&#)+ZZHqS8HO&gNuPazO&^3dIZgVfupV3RE(N7r1lQn(}L
zztK^d&^oeoa}ALgn5{k2YG57%UU|5AX4x46>_5Ce_{BH}^t#@?iR`2vu4TV_)z<oc
z?M|QC?v^Xv&fElLBcYg%)>9`S0gj*nyEH+DxavScdYDNJz(|;GPSE+@Mc2_Sp78mA
zts17y5XQ|GrdtSeb>Lrs9@Le7wx1@ujNSMP+kOhG1qHx|I6%09b=Up;Je>J}TSWNt
zuMaTj8s7spnb+5psuCb(@(CUGJbA8&9uJ;`OuUH~SqT%|qcgh!SO+T(SkOBJa^(gS
z!t>C-rNwl<3}fb8M&@ye{^MTi;6<|E4UpKD?|NPDcfAf<g+je{YN!FPDT7AODaH!j
z-rYk#?axS{`Cx;?YNt2Tbr~!drJdy|Q8uucxRFJ0RZK^zg^VJZR;3)IT1s3S*rwC*
z0NFs{LGDZIwV${5?`?ciG2nr;lJFm|b1(>}*pa|IRf_j1<!4K+hkXp4CY=Il=<hB>
zk7pQj$MYGt9b&^auBZ*aquk0vV_~7DzEml%VanBplp&OL7}D<I?ukiy_m#`3PX|4B
zi=vRieVCvw@f=NDD>xey!qiL#p@%^HgCs?)1(703j?)Aw=#bwFkvbHibUHffKR;P&
zn85}_#yPm$w!{(BFd1rnUce?Lr@u9ZDh;6dJ;3!q%I{<G=}G75EPnz)o`I~7&!LTJ
z%5R@Nb;cmQ3yh_Ua-{FhKJ?Gec5Y)a=}WB;)&&%gau$8Gm|l|NxbKc#=pI+Qw4(pF
z#CW)4oc&z-f6=mAlb0?NdQ1<WGg<8~3oLhUCB8M5@MM}>U0&~l0>)yaEBcJX|CpOt
zI9~Sl{}wbhyFBVVe^XVen|oysa+N8KqKX#fEoh)ze{M}wbSx9fbv8+Rd`?><s$rOI
zH>KU0{Gv?V3K0jI2@R2zs6F5DZvK4D?urAQ?tm@ZLUmWK^@9pjulsggeZISPv=yBX
zl*Ek7%ZQ@1YArQx&&DN2-klAB8h1f?CZ`A4X8V1=x`iCPfXhJ57J=eGy=36qyHbcs
z0sk)Ay`b0$IYB!bwEG*iBT9$A73_%AEv^#E3BmbeI(4TJTe_5)fDDZG`5)af6>Nw4
zZrx^Fb|3?r{qhx>^4Qp851MibgJT(-Y>8gER}!8MMr23j+ma~*Hvc6Ymc4(GouqrC
z;16eyyPfZUB^?5#K*NNxJz-AZ46;S{;<#ua=vX&I#(lU{5jhxSirHq~E;k<~TE+=z
z+;e_%IwpF`B=%R-{o#6eA(`0j_c(fCU^|OMR&`(h+d}Inv+9rhRx^AK09Vkj3^Uyd
zEV?7XFwH#0EaN5%)ct^ZTkZDMhj}t&WuK#%jOOoY|Es!aRwNsL@f$0nv3RLsqi!({
z(1uX0;wI*}TR~2&Y!Q3qB?AUPXqTTWlfC<NnNqWYd!)Y<LA-aI&+bk*qIA1^{;s*=
zKqk&;V)<D}F!u)r0{8r&)vPFyf8_|~Zxr6<PDz?BcTYMi){AQkLt4FZfHFo!j;LTO
zL>5B%2nmI?ptUH+)L=@oLYVpLe3<>-Bm-ic$|!-VE=t()I(yTd3-`2{6B6I(lM|bJ
zU$$ulh@Jx5hsbb0IhuX3VmC~#n@>AA8j*s16~7|Y@dP|)-mr5D#24nGyeWhqTPabY
z%>WGNpqI!O=`P1Am3)=-DO+f^Qm$9wp=V!;0Oh$z3<9o25F`t(5QBlUEKn6y-&a~b
zfjC6?Jjl7=I1?<M0W(24>rG;fqz5BIC8pGnyU<@7%FCZMil0oRbK(@aMNTR%X2DNG
zWqdOD_!P+W=;6Ss$*RQx^JVHaXn9s>EUkF3MSTAb$W7a};8Af^V*k4#2U?pkL3K1u
zqUTDDfMJ^y$+JaX<{nNAL)DF~d|}q@nzwAXzwhIG<KPM_1wGg+x?g{xrE+SbI(eZ1
zzBEI9tybw?AD{5q+!<}@1%fU%_n(a1Q0~RWRSdkA4&%>Q?tOK+Drz6(mfldCcrH~>
zEv`>>UD_WC0~6%NR1S{Iu%eWeb#$hVt{E(Z&4qgjv&W@d&BXmf;I!}d?b|99C~n(7
zQrO+Z*#z5KaE>?dX~jzXC}EW@t00Q=O)b$v#dox0Km<FhY8<>js^p#f0oD2?7}+HG
zT49~lghOd9v(#6Rci>I#ft0vn;#YO^PuH>B<*hH`(0%V1rj*i+qZ;wHw4)eN4KXF{
zJ0WZR%$F9(?<(g$gRO&KZ>=^J{QXf6C6|i@YQ6ocnI~UIe0;s1pH0i&;@@OgXfLL@
z`U%0O^&Je)jPmSO$LRD$85JX0qp&6U_kTPLHxirxnO$Jr`<h@pFB4?$8LSh^uny`|
z14?30u7<#+Iv`Y7;7SA7`lXOva#!YEdZE>^^O$45IrC&UE1L%~$8A2ZO4B15eeMb4
zL9KXx8LL^y^Bzq?fzBZ(V!!Wc?<eq?qWBSR{4VRyF~D%C5zvla1*j+Zdw6WrynU{W
z><cx-*UVhfu39OYna2*AwsML9fd0GGoYNnojMEU-<ng0_rRLVIpMva`Q}X{vL+R;G
z(t+Jn&9Z?(=Md{u^E%+aYj^{lBV3zrbIB~8;BcQnT|sZ5*Q}5-Jpg*g(?sLls+mH{
z^}!FG`j}zp_4^nK<g?`uLK0?!_**~o_8=lnq1+4l(FOf4OjYVIh(hVM#&RZ7%{;Vk
zapfW5t)QL5UpkCP8N~aE)bC20mO&gjLBm+5o=cgiZeJYwgjK0^##a#cfMo_nMeh+^
zXjcy0MX)ohK#5>6P;tO!HgA?3rFGyUOt8qqNfNN&DXW2~WTY62^C_vRBf`M+%^J(V
zUN;4S@@qVnuMQoe&z%@j5*>lE_xppgzg<I1L7yZiHc$x`SsYfpNItQih;Tcqr*LOR
z3XL@%-gCAv=}#sr=4z_sXePOo0s_Yt7n4d1CT}sjLhmU6DVTUC6Z0T@;ps))qMNZE
zS_7-*kuHAiW(do7CfYxb7voH+1MS7M(A}up!yP3JsoS_K1N;A}|MsGn4(-^hOd>7<
z<%}ZS5)|_NnD$P*QFMh4#g`_ubI(K?JC3LEamJZEN|Rym(n3fzLu|*<%!uq9L1ec-
zSYr}>B4&eOzSSn8tI|9Q(lADztM{{RA$D5%`<q@`5$3jgJ|6wF^nn)HdCcnA>nrDG
z*paCO9v>Noy;4*wa6RCP&vc5nuds~uAT_Yw=2SICiJh(qT8ddvIK)5sGIR^FmV<+d
zD_M&3&ZCEcgOO=2&t?*mUM6(#+VzCfWYRi(3$z3qIGr^pjpq)3GXl||B$Je$Z^?r!
z*Qm9*H(~oG|4%(WavMoknJl^$8#BOz!#F-Nq(Ug?N)J9dvl1?CC%5nDoNK0WE0T63
z&Ow_LeBBL5eQVtQT#G}QSWfnFQ_g4XZUAG%Vr~%Y-8Mo@HEF9JBB12)IjNZw4Fy98
ztk7da9%6-5yRp*-^f~BmPvxxJorHj~jS&>BV|W1A(k4QNtgfFE(dM}Nar0ohc3@vf
zGLTSxPbk@k{$?J<CC_^9d}rtZh9|6~UHziqe=WXcG@Ep9wYYOou<B33!-qgJ!=?r?
zr{iKq4iRJjHN8gTL8C&aSsL^FWewd4Mbm49JEnbit^#i#(ypE<4tJ~}A{V>vT?#0~
zBGAu5`}zT5vxcY7UI`$$AkDJK9p+aHVohw5vOMlo?9q$0dNK1>wzBSGkl_MDwq63>
zTO2H~sHB15Dv=0>xpfN8DU4c_t87EGpj-yv1<?xQUiV5-hdKb29DgbPQP<CHv$U$`
zNWC0IWljz+37s15`gaa)UFN9(9SWI5sH%V>sPrk?ed3(i&?90*cqo@6psD^Q_q+OL
z?m4LsX2CGs3cLj}L8;SC=#c4y7c02=T(YylTPX+e#k~><Ot`}1X9BGI%~Y~ssG5V;
z-Ov+MMHwRNd1|*a*Wg0w8t+MXiwn$&Tq==af29I1oG&4uKV#zh^~avBmcy)#RwxqY
z!>!F}zR5)vFQQ;2sKzj7Hs9MsJ6TTFhfsVzDm`r3w!1w1Tdw+bQb&Y?BLJ2g{XaAK
zh6042!racNQ&W|6A>ajU+Js(czss253mJMmCg>x63oCGRml?o#q(}VJqcY}j-uw<5
zt5#cKJLPQMzP6TW)S9!lcnQP|1#oh|srNOv<3+rGNHn^8XT3&^T&1<u^WK>g=*ld#
zMM1X$PsF4>5mB9EdTT6*yQY=VuxLsBs~Q@EUkcPY6W;H3@2dzL)PlaqY)~IL<Z$<M
z-#+A_&-z$2Dty4Ih=D1;e&c>tc<NV8gq=vR^BiNf1IjbV;1`Hz#YUGE%$<jJMl`l`
zx_YLhEPos|){B8A;{|NZ=-x9(55{I(wr7*;Esx?jE}uuB&2eBcQMRqRz?x{F>YH8G
zRxN>vYMi8HTzlocf_A+sOM5V$c?#kN$CA~BU3hyHa^8vrkD6_^_|(7-nI+XdY#3~?
zCN=2Dg-q61kTSm093hc5UevAv$3|hsHI)Q!J5A>a{NJWp#FWjp=(Vca|C6d2*A$?5
z8sE8ZT*^K&irhB>xRCKw8h+FZ@`F1;dMQ0;p>A$}pp))`RR&4H_CFyrDBIr+qR2io
zsnN)3a^&ydmXwn+V5kpK%&WF=UO(w}#Cv!2ufM~TQb=%=?6K^*<dA|9oRndKXY#T*
z8Q9?eDq-Vg5d2HR4oC{#@qM=^&l11>^kt>qC!-)8TcME{KRn{n1<e~GtracuV-#8A
zWXE$$1$|C~g5c*KsJh)F^5`!AKSa!Czl=OV7?5m77MzT==#;BrLx*CNw%Kha72UmC
z?D0j_SG-L_)nhfKDgsk%zw_r4txBqp#`$jfjGFRM%GstEo;ws3@g>FjCT7x#44I9t
z7?;AonNos;EH|xMvCTE>YJ<aKKfPXdI7roaRPpVM<j6lDK0>`9-yv+lrDyiN+7m;R
zBJ`>!455j0>d^xYFO-cRt-opO7iV^Ldm#8d^e;X<CZS{iC%%e;9`VAim*TXNkS*(q
zj85PtrGE8FJl&OuODgNl_IWBc)>;NTb=GA}4hHAz0{Y9;af<RodO;KxPt{ADFkJ3v
z+SX0%K6Al+dL6CLfgh|NZeho_1bxswp!@1`y!GdJcyyY^xr#EkX6y6fJ}K<KSZM7F
zEVPZxcgwS-^9SoHj#*lMzV**Lz<1WDS^NKUiOlHRU|?mR-|StFEVK`5zd7>u6aLu@
z+vRE~GcGbnf=RdSAip)jxCJOH0QEqKxC+e`-2OJEoKLdSeyh(+v(@o_*DNBVtG=(j
z^!={CX*0zBZvRG4<Cz@5`lq4c@fzv(qlaH;_3w{$(@pw9TO_@Atn$;fx_QIh4w!MB
zFAPEk70&*e9gb+dnpwV$@0W2-hux0NT^^mD#+Fy7h}TVZzJVa^52N1<-2;(|xdF;3
zosNYL1VfU6fS>2G10H>ES6h8G(`d;o-@k%n`c9rT8d#IbTBI&UPTeK!4~g2Dwk&j>
zM8PwSiPl7`3CTJ^-;b_3)Ox0aA2Z1sVqK2U#SD2>alO#lG(DGOcz2eWgE|r%d5oHu
zOk<T77{i!US|>Q@t6!}^8Av+Y<&_yo(!PJ<rg7;w6*MFFB*$vB&!(@3EqAElvh%rT
zT5E(Hq*1TYl)y_(ZdS#r)L@fX|1a#FLvSa7yRMT=Y}@w4wlT47+s?$cZ709jnAo;$
z+dBE5b2hi?Ztw0^^`fh*df8pw-}k=HQ=`^eSk$Ha2=DkLl`LF(@pu+AvT}|FU!N2%
zl&vu)@tA5w1!I9RkKju9#Ou42)iwWdlHm0%oI@x_wV3kwX8}65b+K|@c1zXh{|X8t
z8HF85CNv<5JO7ut*b_G!v%@cV-oPHR<?%VmSn#ixpXeUZ_pzkfrq^gQ;GK{7tmr_B
zN~<_Eo_)!TN!NT$eS6fqSHlR-u?K{c5ucTxGg|cXf9!fyv^qm~9TDlx7h!-&!*(5s
z#BiT$jUE))GXLYlqLikV&-$v>r=icPRWS>#IDO_MI=;WV%=mnffGp1i?U#|R$KA2u
zL}Y~(zZvruM;>ig#lrzvEcyE}-ifH2)9lGauk#qMZyr!3Ur%I$lVv{M-JI`?Hm0+D
zr|F3{7`5H3@ag$`3L9s%xh;umdW#t?@X__4i_HLd9nUMiuZq{z%j8U$)<|bu<CiZ=
zBl8Ylj5<Og^Z${0@zd$Bf{+9b@kpYx-7eBTk`6NrC5d7$0CkZggA#n(2n<}yvrBz7
zvc#5_aDET}Y|L$Z9SdJS$$+4UQd2Ryb$Fi(y}p>Jr242b6%w(I&<TF?^GRCH5{|br
zLf3=IQW;e7eWNM`htuS$Z$WEV1)5DN-hqc*Pc0zqpl@Es2x=dp+*iV|3+_Ig?B4x<
zq}Tr=q}SvcwEEkv-BmiuRj^$OLAI-$pcojrnyZnsPt^7dcSHVa^|gk$R`90-OFNqz
z`ZQk7Ra@c}fb*2HYrJuIf7o`T094@F(NZD*2$jJ<%RNk#_D3lEc(j?U6}u|?c*s5;
zsn;$&eETB!YmvuIih#hv-sNfHZclu2P}GtSWW#(3A8{z0EX)+L*M?kOI*#>Vu5>0`
z%T1NQ8#o2)Er+G5ku@F8o+56f0PmPY9beCA)Y0P>H(QM7w=<c-W$-r<&~GC{@<Ip5
zlLRb5g;h1MmcOkuCA0#H`Yme2%Vle_Qg$}Q%_Bc~*-9t94uVY=0$#hXQ>{24*-3i-
zHW|S^QA$A@;G|Yx`Rm{OziFClUOO}`x=`78q~x4!#pBW>3;ani`cQ64G1!H{jWQR@
z7Br?+L`iru98Dirhtm%1jV2pl8h`ansGmtE_?ZTL5_&jKDRe5H=b~4I$Hh99%PYWy
za|bg4Mh_=xS*uJq<>0QP2hq@!7Uqm-kAZ(@R0*)UU3GeW$GU^MQ_}qP0#|Nxp3u<_
zI=nkExF4Nd4Ba08#Uub;-ZsG<+o}fKQ90Jww+OE&^eQ|<?g`Y;a}+K5{Cdl13ifQ^
zfQm-`_e}7GCtMAhwaV*XJ)^2{b9!OR--bnSL209ZUe}qK1_W*ddLo70B^Z9LUf6wI
z+G5_J7E=p{KjeLK#s=V=Kh{^?pJ@XZpze&lC~=56j_=z&x*q4iy8g49l8GQZdZb4A
z>s%ZEFpzoh-U)l>=YEAZ<{bu_5ew&({D>Q#T07V`i|++piKS#G|J(UGHz|Ci1&m7k
z9=>Nn!iZ-D6R3^$!=}Ar+{|obir~vWAA6p28;f)ckWf2KlVQc(0xXsW6UM6gY*hS4
z(%Kl0G)otsx&MzXHgkL1#@Bu8^F;H$VPCyk^L6jHm6|WZI=fGvakbrZkzz1k7}t(6
z5wS!#s0<JhhvBm)2vV_v)7^lD_XdzGZPUI3Eo$D+)^{%IIB*d97E?h=oa=VsuEUP1
ziyTIX(S`|Quy#mhw`8T<Baw64#d7E1U+QH#hXr6p>?dy1n4F%7t`(Z^G%wBKGB?$1
zb8+Ci0GP;5NVL@gm)mV#PZ?EmJha(984Yc;Z76k__&tMf(Yq*JE1~J|d1syA-7KU|
zhUF`?PR4#KLlnjK2kvIIkF8wH?V&}s;&8_!{YkRnG1W4^k%hp#P<eb^DEc2|G=tFt
zPCg2o$B^_4^gSzT9>$eDopq<Jdd6f-Q$KZal@k93F(S?EJy_+lI^@187Xu@hL8HEI
z4ci~?1OE$wy8ifEm@tr*58TcJwHRKZQKW;Z?Yya5^(?pCT%cHB?~9@{WJk|}cFG_c
z?3uENSsr_Tp_5gMo9`4A5joo$JfEu^4(Ug_pRbSBmVb#ly@ysbbPpn8a$JkX63d=k
z^VseU$VKe(s?jP=JIRo9c<#lTx|(+pA&RIYyw-9mgrW~uA{vA?8-iyb<!BlP`ZW6T
z9?(j!!Cw04WEh-F|JsBk<KCduDKo!{b&aHm<*|N$1XDFbl9p8J8%!d0mDJzu<h;+q
zwXQ4B$SsxDha%aC|3fEc7q;!o5?*gWtV-Q#($M`t4_f2+g=o@(VMXkR9dOPN8N2Bf
zyOg8Lu#7lZGUo*d(p&jd#YSECgaQcf0zj=4G}-Ea*ATUEb{~ioWe8+KRwzK6vF6}k
zoUE9WdW35puzrqjptq>M#JqzvkA2eezduk;V2Sq&u3B5Y20<`{?b*ET&4%wrDhnMt
z#P`*Xx0<KB&SOFky5G1X!w#I#i$mFnDq{&PrAi7K;}+CXzUDe#COpAiYPO3Z_{h_8
zxum^0fx^7Li^g(y^WG2Gz>ah~@^vJS5$goDI&TZT9xA-P@KGtY5E~Wm_Z!CLYK&I1
zT%}XjdrJ?`ehjdkCmgm&8vu-{R*EadQpkouKe{5(D2naE&&1fwTpyyDN4v7q6OqW9
zf<~Z5AQ+&V{cY8ptWw*vq$1Va>0XW4g%|c^i~&SFf0+M-<NZ32d|~K02jUX|>jd8+
zshwWuOzyGq&V;|^lk+Np5U+Rd5*ug;_y)Z5J&wv$DiM!5H@6tMl)L_JH`HCnr-o*)
zpCtZ%o4jthmc3kS;B9c({&(5&e7VxV-C%RP)aBK34Hm!8x&$4;Q7bGWHUUY4KPU?M
z_Y}z|MXIvD=+-h^y&y#U`usBTkwX4i-jQ%=+YVfQU0MR~`}W+B(~W`SjjJT3BJp{S
zWXxdQi54gnDS~q{fQ&MkS&UH{$e{^bL|XM~B$J2*Qs<m;w97(;iA^7P_hZ>ge7gNu
zmMwRhjerLU3!PC<Bb&G3Y4zCM3S1{gqC>)SiC8A6yx}G`h&BLtE{dP02<Ye~&fh;2
z$#}<t2O-y~-VZg)NszStUcZ9Ko}Gv`RdA2Dvj^IR2NT;ObAVu`U1hHgR|Qhahjb96
z1vlL*Dx)@y7ydS7#)(?q!WTHcdoD+`4}sReUbdS4BqDUJDwPJJu(>bshRR8P)RJ&G
zr0CtWjVl=#(YU{1U|<2fCHPRw{7<^goGUFBU6BPBXtODYw5b#6ZNkVO{~%NpKlMwv
z3+fbO{1i@Ns81%Ycn?X@Xa{lLxn`tzkx*|gpZW;MvJ_n#AulO-E<X((aQJ0d30p$8
zN#ZU%(#lG%9+7&UZ{oe#M^bqgi6S>hTNN1&<?;v`u2s!^4q>|WlNo;yU1V4g`p#}x
zF;{J9_@_hS6n>(?xI{G|X7V8sm5Bjzjj<tXqySV?`|Cn3lOYLp^uFrs42s-a7mQF;
zad#b&Q{S9gLsPRhZ9!6TJ#+*r?rb_#K|R7+kkws=W{DmHM=H?n5Qf_Wn`)k5$%rGM
zzHg^`!wqVcq)KmZ+6EUiZTteU*h!b(3WZ@{I>~Yf$B8oWnC@D71m?GWEDJK8aiE(e
zo$&=hQ7Sr$U(`hnL0Bb5HHN3vQ%XGyh7`-DkgAl)jMQ;3z+AL(P!nRwq>{!SuCv#8
z%8r!5RMnVD(S}_>P%Qb#iB0Ax=si_tD9q;CTFd(k7KXHG;wg7YilfE-lwx{v2~m(=
zb1US6ccfDG7+xcZQq4F{@Vf>ih2yH*Zeroh>9Wgdh+KhBM0izG4#xH2)HcG{U~q7S
zMZ1{ASSMp?K2KC!+f)RO`fX8(gYa19%zolLSWRNV2Ne~mmF{@8onb8fjHqA>OJixw
zc0(`#1t;&Fe($J6{Agy(7f99Ow&97|K~-NX4;+jxFyLIjP*{3}2~_~cgVAeBaK4m5
z#X*gC!{rT~-wxit+P@Xr`lAdo{&InmcK~&P;LUPtyAB^B*KwsfTPzv>lBCgEvUmi#
zTB=&V{s^s2*PgwC9M73uIf;B3Kj!z^x;`z$lW3%!Ke(V0j#8ZK=G+NKnmaXcKerlp
zniDmcL<9%!9;X+5HUIJZf_^ZxXNe9u(JvfG7&-Qkwe6HU6#3ntoOV;TQG_>f%5XRU
zmREMDIw8`OH5oWzZ81~>;h+H39h(q*GOLuj8UnxEGU+;#IB2-nScv3RsNBOFqghG<
zD?^ElGky!R-no{R{YIwn+eRoH<7_I)L|bu^VJ=5u`7Oc9*JosIj@N5gM8yX@dAo2X
ztIG{U;5?YNgbyg|v6%Sy<^E0KU4PA7V!QaW7=SJ9XNu=_<N09Ig_1rD4T-<gs+nc+
zN`yG)8uYqck&LsLm8d#{n!(?XVi@d7=>k@J6^+!}2_9g~$T}?Xf`tZsYTuVqa+(jV
z1+$Sxy&JSayi7#<Zv_cLNmk4J#(6M9^F;=^HorZw%0q_g&%&5+U|f@pdF^l0QFL_W
z^#neYFW_YU7or!MX%7%dMJA=RzR*{yiSRH_!#^jvO$*A<o@`BvS@~YiRw9$j;7EUY
z-H=?pj|3wklVP*SsD1-x8E{RJ9jTU>w#T=6#2pk@er-l4=$!n$>tpE?W9|Q=rhZh`
z1D+=D=XJl#pXB$7gOAxfnVW*ab(Z^&Pl?)0Vo9ALWtS72iy(UCgmAAoW;BlH2_idP
zll=ORk!fZ*Y6B-zW~--2Q{@7&BCjWCO&ugv<GiBli;QDHvW%jKI#4h!fLujBF?Nn{
z$NJfD?31dJu_))bGtFSTgGXr+(Ia`^!`?mafLRV6Zgyi!vrj8Pm!Q!KP#<hFStg8`
zkABpxy(t!PwSl(bmxG_@(v_A2txX>03EUtaBEmErL>#eDJy9<sPFt=A0N&yaz>^}E
z#J*7`Mw8Y+77f9#9`h_>P3ip(fb^S70UP0`MikJaq?d6BM^L4@buXaB+%+<Ww>3_t
zU5%hQ)Xav}LP`6)^PLZD>+uO0EJ^+NEph*3=+JJiJY*3&ead+f0v4D`r(<MkE#;Yc
zDUbfg=O@VWUR@z>pXNh3S0*6kZs_tGnR<rURk<Q>E+XnHCkd}mX}h7aGaD4-<>TT1
zv=Z$jp_RJ-8W`p{W<VMfkG5eJ@&e)<sc@0z2pHjGfZh2vK}w^;QKFs3*n4E4)}JRg
zqd7t0hmzZ+Yds<S$Sru0WB4${qQA7zzlTqmmr~Zyu0iklimqq%2BDsWG~xS-riuV}
zf#9M{avT)X{WmTS|A(}Wi?JG|5V8*5T+E4s0QsA&#*$2nUp`V!SPJG2;elysN-mjc
z;T|%+)8A{aZS{IVGD~m_f*4B-yFeny(qVCRFtR^z7sn~3s1K}5uv|TSoXP=-+-&C3
z)@EV+odfs%HRn;03JaoV2VSxCf}Hvq?f}HBEdv--bHK4q=z#IAu=FHE5kUf$GF{yc
zR70!aHbAtNygx!{6mwl_CBF54r9>mevT~y!+@D{TYy^*zr)dUSh7XU+79EUN6ZsO^
z7T7rw98Kel=bUA6FxechA3jg&MdKSXJ*+97W9|;eWI4p1bQG%OQ53RcKcUyrBj&Wc
z%xU>tI!T9juW1~+&RrR)Ea-3LfJmm0bH8~SJ{Z3uk&WQnoaqEw1=ynq+2*9%u#h`*
z+OOTs5Tax%Z(jWS7-kfb)aczgM)GORYI)lQ-|%J==>rvBEz+X{s^ipem9;vL2y2=G
zT$#GoU($qR3dqZI3Fk4fvEyx8D|?^<Yo_`M@oid=qf?Zo)|e@Od_tI!%%`K=*lI{j
zq{(*w9HQBvwlr1HEEZ;^C4`F*<d^(YW?JHo+ko^8I1ZyeTOY~NkEn+6gkA!L^VADf
zFa*DtOFCGg?AzEGWvysbJYVN__SwUuIjNVbw4w)K&%&dSIAV(IgGVjOh=Y)$u=xlJ
zPbC1?Xn)#2PUWiE;<_B?u>IbQojqaahB_k5B9cKDMb`4CxO=Qo>;5?6cfHWkXz%pA
zj`d992c!cQKUL*8pw?N^@J0B@iMkx@)ZGV|Mfz(yU(^+tFjvo-u<{kQ%sWF4kJ&TD
zul>6ZU%jjvO07R<OoLrRoO^x{wBE&T7*qmnN~3823c>lGCS2p&rp(Dyi`s^fo2a{l
zc4sJRsC-6)INZN{)_Z&n6xBw@X9^2UOy78!V!ZtMLv7c-GbRhgNpe>VjX(VEF(`5l
zfV@uL-V}6K6Y9oz^uMp$)q_f@=zmuu%9*oN|E^J+Gp8;2o~!T_NfP)T(v9%%2qS-e
z|Cvp9jX0h3O33VBX+{epa4ka(y9vRY4=H(fS!&mG0`|Z^JqxXJ$GE2F=WBVIM3dMK
z+0N|}$td9w`X%ri7KVW(9#15j6zOh>?;UlXIiKrU&Wv*4sGR#}C=7ZFi|uw^T3qSq
z$#>Ka=`h*d4cZ2DGWkw%td=`%;9sRq3c=m3dcK&Be{4ThzMw_ycjeLN`iG1v*DDu)
z)&PnkjqMOdfd98g-~MQo2uMSczN5>(xXMyMDoaNI`9$rkVv%D*mzq55g3}_jxZ6&)
zm_qWG;N;b8k1Zp<2u)8gTy0>Cc{S)C$1)Q^Iz%y9MAR8im{K~kG~&Sz-O5xNGzOl-
z3NTtnw0ciq;ZOsCv|jNG9wo8*^Hx(D6^Pt}3XAn*-6hee?+Cf@Wxdiqj4nX21U}$(
zd~HMH`JfcHE38@<(%a>;8hO4Q=K#sGRsd1Y=}YK`luwHZEO{>Q5<%Y=x3XfTr(^r|
zSc0VJs|$lHL&DV2Qf*isdVN?b;_i{*@Ax?Eb=80X00@|GTIF`geT#S<FGEK|{JHWN
z*(ql9t2MD28o=YcAKp(Orm03o<$qU_yws63;f7(&zSa=HY8=kLG$ClNd*1C~Jqs>4
zK~$=Dq<hOzbTCEe2<zdup@6SZ(G_YElLXjjLqNIo^`zdYs#W`?Qt_^w!z3kJ{!uX%
zXi%;IWek@odp%w~!e}Alcnhvsah5EyKrL-!ng}31?E4w>s{?@rXtcI}oFu(poF_Xy
zQ%<@G9JKbr=>52mEmtkoBp7hoa{>_H?eC>EdkCXN;iQA7Y|4p^n3&#<p7g$-E*b0m
z3<_QDLW=x*jClh1{Juq8s0Lub&;#eogP64Y*)P0gqThW7e3ygqX8MKd!YAU`535|b
zg!k6_Qw5goBqdl*6!7h=MSg`B6xa9Njj7B%VwKX1GK{-v{8n<Z$T0J1x>o(THAcNn
zwV4hT-_D-{G}C@zUXW}$GVjOMl^1K-Iu)pxbVyh7e|C&L@qdZK7}s{mzaFX${&hN#
z%L^h7_*EHXc1TFB*YgqN-3#&XGv}JbiOn#>@J>Yk5h@H8aPWb$%Y6^-aN3U@S3zBC
zO}>ubg@kq2Z>sQJMA~k&3-@o${~1SQ&#p>Nwu-z8W@bSG?RFoKjh&2)C45cTf+4Ua
z-_W`f8LrlHVbX=FKyQ|Msi2<>GF*&Gmv@$2#)wm#fsfp1g##6mUa^AlSZQfKVkRCR
z5Sf`qmBOlSXzIJAsuT?BB$y~L{<Y_`gaEgyIYbp0tc4*mN2!$&epe+EkevsKdfbS3
zT0le@^q|P(kuW87Z}Km8)sy-kI6%E~*IoX`Rt(v|Z&YeMMES}9Wwt6)VPzyc-eES<
z;Y4f__}Q057KD6orgg3p;~z=n5FKwXqI!!$qkt{=>tIh-ly{y0-<%0t>T1AouPeps
zakIIaN;6?I(?}CWw%=eyc*2IR0)IdVr$P$S_?1apm)0bHLgQwB?;Q1)ew$HbpGRNd
zbHyureC3ZX5n~mIAaLrrucZM0j1F#9(l}xB0;Iz+Tqm~jsrCwyTfb@N5BEo;2WQ5E
zAQ7z-8H9_TLqcy>M05XxbBHe<zOG0*yao?u^bPSJL1e7LxmGojIg^&Nm&w?TYDRIs
zNDk#xRDG$M1=9eor2x2|K@jDdKGB&w(vGYVUP!Bfl~g)Ie>%Aj1?Z+<F}^d2$t!u$
zg|kUP6P26#`7S#t#wB|@z)pD)R-tOO3VUEox}{|Z-1s&!vw2z@ZxC+}lj36B%YG(h
zO{o&!SSpT@p7kAytcc~0Abe&jjNz&V>V@rZ(YfoV7QFE2?kn`k4cw1?nY5<QW?^=I
zN*POewct!Mm{3T@%3-!!b?<=7Yn$yZ8!yxC7@c+IpY!J9E_U|exIxou!C_~4W3h4A
zFvw<)T7_oLI>yUZ!Z6awFPuc<)W(`D!_CrL$GF)Y4CTn(Bz<bF!g*o5d+%7=#2sjj
z=>!Ha#aE6GJm+&Ez8OP{E|BxkI{9hK3iPDtiY#Icl_$19(W)p4s!^8m9PhOYOoszt
zGlHYZ=vV75P5S0D(pdgS!GWE)c%(!te5jpb0^GXnhF>^J8hc^M1o%*<Nsg8NVUBX%
z-L4M)Lws*gxtCu&>mNVT;J@8Xa|`0fCzWk}P5K90DmMa8zv!yc;5>jv<(^3cEsMSO
zYC-EX^22J-Aqd|CqNVoQifdAM0{J2r0+=@AXXW~Lk_t?1+7(<J)J|sr6&kQw%8#|>
zkF0<2mJ}13jSs^fQllSuh?_JKyT)i>8d|Hk|5kduYS|ulNG-nqvAY@VKPr#iY<T|B
zCt;9_rvQHTmsYtv438*tAN_f+`3agT>}EX-Fm=dm?=evL?Krpfd?0}z_AZ212bgbv
zm-)#dsni${An*tlArf$`5N&})DPUB*`F?Fu!|ecJJl@i;m=M{xdt=jchMr%zSLuY=
z%zx1*`k95=l}nSFqqBL8x2IYcAX#-<&(@lvuu}4u)!Py@(YfTF+2nhnSZ<|h>9k&d
zy|5dA4m(RmA72GK(a8NPF4{<-RJpT^L<HEH_GW*5^f#^Iu<UZK6qxr|?xpPwLxn{y
z|3y9X;s921l93mu^S%A}>4*?qD`~;1u)mFA+yi-CxP))$$rYk*ngAkx(|8~Ka7m7|
zl#y25d7p;Rip&U)UiYo8+;Z8?uB#?>T`(|GF{TxlP)kTub%1wg;yRYDQL)|MnAU3h
zOnnn@fVT^xAhj0dCTn{!Z+>mlBOeTx>P*)RXmcn8Dh(SuHjWBgDmucDI37EbaUY2~
zUmLf3S=QT5>FK^-I1O&>3}-7Zr%__Ow8b`FFd4?e>0|US2xGzR8)j{O3_(K#3yt>O
zS1CR;6Q}QBL#Ybe5gM0{2lFWK`UTTR5)LDM1UrD{uBPq(qaA1-tw?4OPfW=kn4j;5
zdwUfE=+gaR(f<Txr(eix@0H~%#L8}JmzBb{;ctXs`GeNW2Vk1C;t;t21V-nohD!o}
zk;&lewTHs%ThyO*Zc{450T6K#V{NCOi}=bdBY(LWL<C&1L|_{=y#-akF%V4pfs+O|
zE<qY$0B|gjLKAo|=&305$LR;IZJYFCaRX{T|M=2Q_Aul#*U_UFw-|HPi-)@<Y|pIT
ztgEe$#nuH2cE7AV#4<|KSo+#*{(ZIJ%(bi<@-tvaZs~qY<$o)J(4buxbB=t2o(!_#
zFh9Bk0iD425k8LCC4ZPuUq$5|o11}gWy1?86pG5pK$4FY$7foNK>jrXdaSY|Wr-XC
zP|l$?j#QY#gc}q0Qjw<U*h|60*ad}esIzwYJ7Nn_pjvkDdf1D-VQG!Z_&zP0*y<8Y
zZILYlQfIDO@9~=}%ZJTdW^z}I^4oLM7V|{)6IEAymh`L+9WETY!uvqgG75Pb(XSO0
z{GGU=Gmt8a5WISNJ&y2nPQTA_)zf*Q-3;Y57&H?HhL~#yXd=5r1=YpfeYKVPE2KYU
z(NPE~DZcx3?M%OINe*geEI%}AOW*xe-*;t=dBp63&y@1~{5$xoMsT-YzeL(da0rF)
z9h$>V#71VYfpt$8>+;OdAW>ulJd!x|5J)lv>pGIHDv74o6|E`7ST53x2CXzHR?5%Q
z*h!{rHE3~s<o5`8NV*Pd2SF4AP|oV=g~4z!{@Y5qz%gwy2&>Z^0^rV_=1_{oKsc(&
z?VI0g`3-F`{<3>oPod>ifJ9~<0nxa)L1bYhEKDfDD>V{K#%TsWKXV3djH6nVg&12;
z(wmGcgoAPPz~9{{3L)z6#HEv;+!u^eN5vH9BUh{~60@7*5^UXc`I<_T)pE6tO7|L$
z_6Srl3k{f+1Y?>7FNx%<q-``x{i`AbS3bI1{Z?#>NZ>7a_D)3~%1muH^ybkCD`K?Y
zy{)eo&Xn_rEE@uMHBa`p)3YBso-74pEWiq*W~U?kq=ibxhX{AP-rl-!UWuW&y<M5J
z=(ZlANXYgC=5&)^0474VQbNOHUCTri&8j0K6%^L-JXGzup%zaCE2@`lG~9BEp>mXh
zs3Y>K8u1gkGAt?TjT#oXw`$6K`z>Ap!yD@{`P$#Fv2~2ns?Ks59P4u3atckz@9cA=
z+>);FA2d%Xq*<+WeU<k3%O;HDWWfD*%XHUU=3~>mhOa3UXK@`8!9&pGcF@vhC=BTJ
zJJ!OgA{D||_a8WqS7wO%gmBZ?25e?MrwzA5YbyQ{|Mxdev^`U000lkXzST4P0ki16
zJwngRUK@&=CVI^!@Xf`#n}37(3UmkS5~~-f7Dz*c?Vuv4y1594djJeE)DA)?w8GnZ
zw?17fL?(0+X!u#2AfO#A(%53qX4>OXV1u*I)u-X+2xD15W{lz7xGg;_=rUkQJoss-
zXo+Q@Zy|5ZLs%`dCnHD$0<H8eG(1!Mr>jU~r<5Cw;u}Tz;JA9b@xT$^4No^WQW8Ak
z2;XDQ1lQ?$Q*}Kmkl}k5OZDypPkzM>&fv0r-Fr|@zybFIo!Obcv~#uo$pfRuw)7)D
zuiIjX-hNI6c5QZDJLd&9KNZB_XjmvDJHSEWqG1-29W2(Lgd?i(VAY!U4Pj1TVLd||
zu+H=Gux_pGQ+)rKS;{bmGX|;p<*p4wv~NQj7a@@~{i1<*V?FgCdEfJ?p8ZRKl7cBd
z8*^05I<#tmIv-yoG$gB|>GpPy@kHG-&!C^RL1{HSVQMB1I9Lp?yCA~gd}-Y1S*?h0
zpI~!bOPiW)2Cu{`@M=I|Q8XMfBInJ0zucgSGzlH!-&t~YHJu^fLrOPzjV|j7V6+?s
z9sk@{#=IwaH>$Mk`v9p7X@tw48vIV?qn}f~T3BaL68RqM3gKo8qP#mv(c7DcJHD7-
z|33Ze!oRyCQt8tgY!!?+lVJRKg%9z8P;&mE`I3e<#M6`O*<}x7qJuGMdqJe)hZwfD
zH(=QRO)wc7onKPYZy#Uffeijn!BUnKxx7S~c<~3x)qE6zg7WH^<t<FoYr!xZ-~cP<
zv5(3gn-D&*l!E?|0X{8NOMx#}EY{o{6Vv<!Rk0M5As(H6)c5ehVW49)Qr~PRE)%W}
z!b)y*s!Z(PY!`ab;omzNzaYW2zP83S$F`JUWo0$qTqM5682qHI#m1N-)o6`lMZlV1
zPH<${%^03w)9Q+M@E3AkYAuONUhkhgOH_GDVXIoq@vPRUGja0TQi3e$S~EpU$!!tS
zEnCqRREL@Q&WJOt&0wHAxFA5mP)nS8Hr`e|I{XQA*ZccnL@!Fa?HAtJIqu0?pyK4I
zAZc@)i#!R;BFwT(MtPFH7*--#KgZeyWW*;qAE$KigB@h?dNYxJQNrghMC6;y<0Cg+
zMv%Y4qt4P9#%?*}9pu_Aq?fi9NCQEC8OoKlq)1H=qFIm<k;i~NA^+?wEIrxShtH)E
zb0g1+S;v;50BX1GX4qvG0C+G5Ls07WP1Z5cOw%8REcb<sxVhq%(=bhWg$9kh*%CN(
zfS-h7JAf`*x_IMONNkX1AugtHNDV+4RhuFr4I3LH_F^YJ_NdmHNVciAafro{R57Z-
z!#Nf3>`gW#K$Srdj8}GN9hekbzNE3sTI$=v4KG>Tg?R%d$sUj<Ez`dImq)CBnFC3}
zA`N1ftmARSRN12@tfY>@iz;Z!!NZg^zPGb*cNM?5si6g>1+)*mESWEfAZfp<)x!d(
zD9V+Vr8`PiTg^uwT?f~e9X0@kLc}DfuVV-{*8!%o&e!yixxIm!$M8E*4QK7WniL=6
z8zC&DTUdb|H?>{9W*&|BU`aw?A~wVafjta-4y?ICc$A~QnoqNv)h;`;!8W&0-jZSg
z@Vb+fl*1K$KEu}E?$LecUVZoCCGzR45E_?k_FNW<0lli{<G!rjMlQcN5XnUvrtUj~
zN1GAAsxW>gPaQKs5xexSU7+7`{%BT})7_hDoJd@t4M@V8kx<zlQAPD1!`}`AdfEo?
z*nKcm?V4h^-;zOCs6hvsIqa#jYB`_g^i!Fs7=U1!qyQHmEKhp@XF#vJub1uPY5=Ay
zg-ECubiPPfU`3EA&|np5BXM#}!@Fdq-j~zo(eGpmc0g_opsO|by^p8c{PVN%aVP2Z
zG*~}924Q4hsuZ7$3ulNd^QSPt*KoCAdaJl1I#%1SGco{Vw0bx2;I6#Bk*4ewn+#U^
z`2v!a9E7X9<R*k%&=`^v<1DDzFC!*2b)yttBJdq9p(Z@ZQY4Me0yT;hd_Xng!i2{G
zRww<dX-ZJvyNP1Wa3#r3uGDZ8P;q(qa`D{Qv*SOBc>E81mpa_;1tYpaf{0*MAl`Ax
z{RlNJRIO4;JD*g91T!yjFahb1e~vPhq_}%%2I7IQjU;ntdG6mL@V9Av!EIbp-y-xS
zN~9zIE9!wYdhl;W!)SHNZwKLs!-XYNh7=mx`=3Pp+fZ8NfKGhtC-n!M%(9Ioh+4cd
z%O+4$VXy;Ot>dF8ByS6n4_bt0*ci>=3!894aP;Zt44JjE)4&L*xET=Gm^Q$Eq7Y*s
zQ_*lE<Rc9<Uw+DBL}WH4rm1kqCVWXfBQ_4zw?zLQFT6(8I7e(0fg%H-4(AZqP$x?*
zA4Nryx+an8(cG{4W<}6*nd#>+M0p$)c9FK!v1`#TUFPJC`zhyu{fIqe&RdOfi*SUC
zy|XdzIzlG4_DJ*VM^^9ZH2WLXT+X`4nm!b;OGR<!V@73$DyCq!-Qlo|T|z23g1>g;
zJsLfe7);1ls5CfAUfT@I-i#He=1Ed*0RBA2%%U19Tp=AJH9V*^f-x2d`$%1jhkuMX
zPQFb@anrs*L*`&KXR;wB0R=2{PKpBZ5T^CDFA8Y`y2}gQ`dyHLdJ`qu9*z*f1j<B>
zD@MUH<s_T%>A-M9-arbDf$eI^XdO!|&>*`5Ps|kBk->ow=AML$HQWc~h6`s;t5(mL
zgSfQ$j{_PX((ph4R4eL+QSLU+VIh<`b*FK!%)c!b41T_MXyK1<5=ZiW#gPV2TZ8x$
zzeQa$=#UF1wy}j{`$V!gqOv7NLqEb{B0YZ#siTwQZx@d&59z4dqZDyIOKGEw1vbIg
zxy($mGhI<>OU<gF2gbCa-SWMMA|sRznf*$XfU;$WCMC{TP%h2ZmM9|Hu*0g%O`p>7
z_)v)mMDvhm|2+7OvK1(;A~yECZ^d^}_U1=hESJK}YyI`}J#$}jIit+ziFP)`2>c;o
z_q1__jot;|=c}S+ckIYGBL0gRCr55O-0LWV;Ux%Te-XRl#{(F#lwFqAw~>;9Nu|tT
zMwW!>6sEDTpcYL!pT-KN!~C0y;_1%Wxk&utcY8H}PO`}y6^fXeG7>0z3gzpRDANPz
zLZX3Y9|}Y!eW$57+tj#5r+UvAP|GSC2z)|nhc(zPYoiFCJL%mFCl-J7p(sqP5|$+I
z>D-vME3gYP1Z84Kj~d~aJJYQ^DDn(PAwKF*eVe1hSSq+oK`{s(r7~Hd%r9bZB%P*D
z?pP@s{$h-J3gPD?vFEa}{-~Gv-$v>Ve&Xm*?okeS`{HJGZN~zE6}um@84Ygw*902~
z{XU<0LE`Bf!^GF<G3XJ)&?owJcBwiR$ooKYb=nxvf>7FM)IkWyi#6J2x0>DJ0$A{b
z$0u%Kp=TIAF?89+Ae{)+n)NW1<}pCPCLV{1I*M@|tjz?hoNXw85&qQ_I-a;VgC|1t
zEuhwD)eO{on8jn?c{6-m+=YoiB-h6*XE9J&O7m#vfT5INvU<J;?96yRFTk)8eu>qC
zWak=552INzJ4g!-MFg`wcc-7J=H4mWq!e7kkMKA^7|&Puq}agj*X&nu^Lqbam|h@B
z|65nG0gEQfiD8OPqd-6OE9|{x^_x4nwckV#S}jBnM&(&vQ0TT*U|3Dx$i6n%dnUJQ
zk#+$v`DSHf`=O9s-NvQux%?vXY=dAwSW}TfY>FgKeHK|wWZB_S<(_%)xXNO0g!Uqr
z^9*4XIkyzbR8IPO)0d4`vW)d-NEv|~o8d91a(RtO;Yz7-+pNKp^B^NqZDl1R=int<
zZgK#(U%>ME&tnK4g9$66MJ*?DJeKzuV~ppH-$%`ACctrx49hx`sI_0_6$J`Eg$#Gv
zjdI)s8cHGLs#`ifSVFabljOhn93^O%1tJsRz`Mp{WLGvgQ)Egq3HL4Ry06VD?q=>2
zNkL|O3QtD7!gHBHfS;{1;NoK>cBZn&?%WRQn}JoraYi01$$Q3awvazHI(n-iBBQ)8
z92IxL;W(i}B@W-GwP#z$8f66n6KXCGqnI6~#oU(OITaTEiwC9&+pzIgg6N@ZZa7P7
z5R_SpyQl)681?Y-g@Y5R_fUk(+yY6P;fwvt3)y`YL9l7~juR2jFi%mWsM@V<OqxsV
z%<|}bjV5qooiYC(aK)DZ>}7QRbJ*B&j_yLL{AbJ;?<3*hS+xxw*9?-kFJwq=9(h~i
zRyeA6l&<y2V_Kbc+5FTCs!el618*;fRN(U++)%>gT5l|$Oy<k9;Se=)A>EuoiLI^k
zE0VJ}wWF{C0TJ6Gi2CE!b%M&ribFb@H&2Jch2qMB3he|ClcY9;JA!0z4^X(exSb$z
z0tO8Wa$vHoqs(g`ChOS^yS>9Ci!oJ<s~hrnrhPNyIVXu$9%u|6<IRUS2&K$U>cG1R
zd`;$l1$}Y_UY@jtjil6;&q|oT8K>)Ibe4-EJw>*{8E>e0rQjtAhtLgCBO`iDhy0FK
zFb1!vwTAzueP1awg^{^&Mvx0vzLnoGRA3C$fjr(_yCU7=q!wi_3~oFOj{;{XOD|4w
z2Gav@p!=fnpFV6*wvmKeP|G_mnnpSWFALF{akA4c;>lXz)Qf|lB=^eM+%61?-I;8E
zqw@h%4#;uD&=)Qw!h0DpW{gv|?pn(cN-53MxRcl@G09V8l*s^AIWb+`MFmXYv^(`H
zKk1i!NH*a*`2=i|&0Qj%1x~Q+OJ?J<WweVCL9rA_pZ#=5<E+1@3B6A`t`U|quwhcn
z_=<-lw{s9mCbx~Xf|=Q39l((^i9NV0qgjlIv!N}DIex2=?RxoPI>=`cU_5s$iZ>$i
z1MLWP9(|8C7c|}*21-fq$nu=^qW&yisPt=M8!@wsy<6qGL$yg&=oj$C3TWmK5qn0o
zH1_Dgfx<_PKJF7wa*Dqe|DFuv)jLa2%8GUxu$uT9s?dn8o#@KA=5fV6LLMRXvHE2y
zkb3IX{k-U~7>;G4@dFQSl0f{iIZzSf&glD>^uv|&e8ug)r#w+NfuAQ0%a-vDKAVB-
zhZ&>Dqr!>tH=JJ-dA=}pV-BK)Y}u+0A47!wvv$iMKsa-|^aX@K>=5I!CvZRp&cX43
zASI_wd;cCdvZH+zqT&O6?#^5Ow(SY-6Yl8v_Cm}=Q)*p2yw&iX2PL-{)`8XLhCnz|
zuU{RMt81woLV#)$&JPX3RnZR}NT?8lry5j^#<|Ovr?HeEe3eQg@>hf$h>P38`8^PQ
zyEZG-3F;MUkA~K3B?Mg#7Ca0E20Jj>4+%wP%OoU0>6LgZD72ax0ADYi#6k}S$qif%
zJ!qtUf83)02Mo2ZmX!lW5(r~Jpq5zxf!AwhH-f~@&<`BrehNUgB}Kr76A+SB<Yo~n
zIf%BAA1?fqepowH?f?&qr~lgoM3N&$*eNl7j@>KD1c>u#srxK-l!dXLHGx@vyea!?
z>H~`CyXV5qI^oaUZYP461_uuFO9CO-YolFaLH!w%Xp`8FOzEOnkBn-AnU74NKr<Or
zJb}TkhKsZ!p6)Hfq1Q*k2SMNV6#+w!-aQTJ3vK-x;J=Y)wORj@Cy!a!LcM`F9K%pX
z^zUS{2X@s&B^0yXNhJ`Y{(iv^y}qFxV&srG;FC(^scmf6wftl4JwskZldsDZHXb6E
z9YXMcg#*<$3<pPW2GUo%Q#L4q#!}NeB1pDQshf#Slp>`DD_xP~UbsMUL3Z7FCwx7*
z<96&Y`&ahlJG}D{PxGb2Hd1h3bM5Mu0Tdo#;?dT#!2SwSsXUaoQ#*y0anV3A(|e?Q
zNqz^%(Cz50H$8$@lG0F-|05b=FAFPHFnwuxXv`Gk#LY&;J>EAx9UOC4xg<bQ;1?yR
zdC#OQ;jat7pjP71<UmHCwEK6i_veDoYj~FHi>vE1o>pdhX?sL$ztjOW&L;6Y6k98V
z92w5~R2Y?I+WeSvVQbnKyisZ7E(eI)EW(K4n;3b89dhqlxxBHDo)k5KCfwnBZFz+K
z#il}c@ETvW*sVntU5cv;`^e)H`w$E#pn!%TMov%tg8i_WI()MW!+9Rrl~)m!tQr`P
zU#=vV=J5=ku5NujwJsd2(EBGofmuO;AP(@ub#1wmc)@BSVqwq$dBWVagJCv<;M@(d
zdG_6G=-0>1F>Cl$A`H>RJqeb@-qZJor*T<(X8pN)C}F*fUg;u3Z6SiX$8W9$8FG)=
zc;)$1{{A{$zc$~!LfbvL-rwCG1HtYnI-<hjd9DA5axE}BsZo*`>$7vp94JOwQ6+#O
zaSyq`)Z=L<Hnt2l7YrF!kReHvD=(AMCdneKQ2g_)N}GPj@}9^>lFiCKs6Iw>Ai?{B
ztt;!=isC9_@s?jS-L)izlh03ZKDWgsJ`+s_zo%DiYdsRe=5Pv3&F5VQZv%RA!@`Nf
zvFz(yZ$4;_v9^&J+9%;OVp-hEs2uCC33K#AsnOOzoKfal3c()L`$*(E$swzo)Ce2M
z@h-ox)fhYpQG@N$(x(m?<Cebo6L}}CJ<jnb!huMaE(`3EoQDYP9_n%etP1WHReB9K
zi83TjjZ3jT0Xx;kY-P6T{L}kY0s#tlQHgf9N(T|iS}i~7GE(EsRPUSqC3Iavh%tu1
zPX-m{mezI%*8e2Qx-VPjSNUcao0FEN!Oed6Wsi2mwE*cI=Rg!irQ92n<vv?k6oSj;
z>S9xi-Cwuso|&sph)~z7>%LXuox@J7FVcO0jJr0AOUa%MH!<z_*Vg9d^~h*16<6?(
zh<5ah(m>x#C=Rnm^E21z`@g{V=SMi$$BUnz82#y)o}9prx95xNRj3m6icJ&Y=&OzF
zFeJLd3tAtEMzT<#`o4|D^>KN{JVeuqj17fSlC`-yXPT>s<|XxtXeT6-d-Hj1kyWsz
z9b&0+8utdBHVfU>f1b9z?VtPO$uJ89muq*Yi<R5yay97)vi`!;4DflXskUkSI<MVr
zO4|Nbzxir-zP7a68$2!A?q14W?l0ETw^h5}>h3OP$CJVCGnL93qQdj17RHmE77#2C
zPdgvaf8C$5HyX0>#*}#+9Y0n|FD+6Wy<WdGlkb~9uA80b{)Ho*@`>seS`5ujK4B=B
z>>~|rlED-^`bdAA-%d~cr``2d?DF`}WmmW}Mts&yKiD#g4UaHQG1Ln#O;7vvC4Zx`
ziZ;XS%7FT)g+Lh1K9=}?9%kWA>}YK7ZFPL9DR@!Z44{Ru{I`|r4z}}ttIH@yZ(x!k
z^%yYNaJ1@io?5)9BsalLFkV^Obj+`bac#IoT*Ibue6(aYz3Z74E0;O0#t3&l$FGLb
zn|S-S(a^NbhZG5c(G+{F5lM4S@MZY*)J35|=>D4IV&id{i2#5o?a61nnldmDL<D~4
zV4_7Qj3`22kXzvEb|TQ(t<zA<;pgEl7ul{dE3LToK#pg>dZel#Zomcs)`ccMm*KPx
z0JO^7&mU;4ocrl`pf@{k&=^K|JSPBk);?N6w$Nb*1*hp3ZZrnn*58E=C$b;iim=2~
zT2EMsb4w%xX1BqiK7JZ~!K}{!m#>Vrk9Z#+b*X2&!v_$Y<dv7EYF?JyZ5`h4vb&2M
zkKNI?^vE$ZoLaTfYI5~Rgd<d;qD8!y+|U*Dqm=BxnVK6EdV^gxZWBAW0VG#uTfxvo
znn5$^(}vGo4<i&FySOw|P!iM47tG(})YHGK`Sdh(+G5lonf`v+&)w#p6&1p#Kj$jv
z=<fX@+(RRP33s{*Z*t!Hcq?3_xzCC_8QstfYN93(j|aqE5(%Isx!m)qsv`1*Jf*!p
zwjAA=Xd55CcD6a2ecm%49bQ{dqFE>kEd@6m&0*BG57N8_dvs+PprmaCjA1D#h#7fr
zn%}5}`}!fyyPj2oC$~TKdA~(JK6_Q_(d}B!H6x<GFvc)mKe>g84ho#J6KF_xcWS>1
zQ4<Hkla+aWVOp~c$>&pXhP{%*>V6ZOUr=---*ls8yx@V(`?l5lb&P1f6y05jpAQv0
z-m?<E9$x_~dwk5V73)ooFH0&q*&ZH_aoT<H`f7S`$=#)!sZJDAv~ozX_%$etEa`g3
z910(%KF;$qY<FqN1#24u(WF6h&s=2E5%ij5Ajkr?KSy{P1o|n{<?vNh(U*_;9xWDx
z&xZ<|?7j`AzO8ls8u~Q(wVkixpFiPB@ZLbVj#Y8DOpZS<%kn>)tlz&bRwCrV^|U=s
z*QCM3l1`~e{3&k2$W>NVg>)$Yo6aQ%TOd~PPa2$!er&LCW$BJmO}50)T0b`<99p1=
zR2`KdSpz@%(QwHVU6*ZF0nODzemU!p*fsU!p|0PKQMR#eC-{4M51SzM($#kV?`-`$
zU9g}N{N!|J`C0h;D=+tf+p_%q_~mI<GJt#BQg4+;hy7aTV;n5Md8o$`5?-9a`MZ?j
z$#MW($n_{q!CxnK1(z~1_?6nl<mriATg?`H&E-_R#Tb~-)ksdp0u8*~rklN@5ZA?4
zzTf3#P*=xoWJBFh7W_#4;yIS;#W-;uhY3qVVn=ts<V6N-Z*UD<!j@$l)!O`ln4wt*
zoI_GEku<HG6y~LBqIEl5n)11FEOcTJ>}v%zrLzrzYJ5NS?)$Lr#}JY!2r`f9g}UoF
z%+oA#ciF=zOQ@3?j?W~qvH=dx=ARd$8<DN&(^HvxxrgZI@AbunQt%%q2q$F>Gq1Th
zV6%aQ9POO<keOzvU~Nx}Zk2lcCiiQb1k|zD88oGg7+-^19j@#y#!&s+_cUj9%bg>2
zw4|>QYBVSraq!F=TnzKq4LrK_w0MEo8j8-04C$TYK|tiYDWD+0LWKST6wyR@8JG+=
zdL=%_?@C{NS{=5fw#O$DDWk=+Qc=#bi`DDZIvr1FMS9W96q5=Gb~Fv8Dd>FGw6yYv
z7!&a&<sd0J=5;CEfA@6XAzQG`Z0LBu4PX0?XQDG()bPEbxX9kJSO86I$+HX1^}umL
zjhb+)3u*7@PNmW?W+w%EtOf6pQc?l*sP0}UCIqaFMK~5om>&Kj2Y4Ufz#E~$yb2~-
z2_W2E$g7@%zR#Xc%!>eeI$d4PTfqA+-t+AJ5QUq+69;F<!vQ*21bZoH3zd4oZ8iFx
z#xTPuE>1M@OKf&PA)9s6pv(1BV?1j2iBv3cU9|{^IxrVvXid(pAM{A`5K?*nH%0Is
zUn!R9H6xcG{Jc?%ePHzbeIts%g+y)b0HSWt^bF}c@PW-PY_?v-=m3uK;lMAv>@ZE{
z;SEhR1ym1n!(8oOH}DORNOF#&nz2%IW?LjyJ@ioqk+F;|kd2lVuxOhoW}EK>`wlz6
zbl;a)JEwc^SA8JXh2jV}a<)}GAA<NPVCMSxO$n=7Rxh&M%BDo;5Jw32!3MPNh~$aU
z=P+0Ih5LWTuZ2NRlU69z3HqK@Is?m2{mgpsO<asCff%k^`fD)<`c|*`_+p2I+~H2w
zEwhE(Sz|ZR=*`t4XWLI!Z^Sta?D%>*nCqF5F!$(85Ce8vwFIBG&JiQV4bKtD4Ka*3
z9a-#`;T_qw+IH!4Y)Xy`%$GRslD+O(c<qg$F~T?_kQ}IpktlLiL)!ZMN&-vFG8ClZ
z-<&+Y@ZbJ;+^8}by3Gx()<hqXkKc{3$ceiG{mwFdQkb6ZP^+E|+P(@^!`8zc<iVai
zwLT9VRMz065ejZtF5oVf?$LEy{h6U3knZ*G9v4gP8NOi~sreXowGhw(GGzd<?I~Z2
zXw46HFx2a|e3zFrHmBRHWj0uTDQGP}7b~>h*n{p;zH}1bn%{$|?N7Jalv_XSoXF@u
z*JH3#zif2krN5g2V~2ma^Cp^Nd~{f=QaHVcxKz)xx+UmnzHy|c$4tw?WN1EdG{s7@
zxF?$cXGo|;cC5wtzg71wo4*5Gs#D-uX4}_5!+&<JiMkCW`OJ4#i8-B!P;B2gl>N3|
zp;e){TAHh@67~I~1H$9=zpijUuJx<RU*f;uh;mV2kY$CAAFBhID)s#hl?+h6-??uF
zH9g_CAjhxyg~AmMrgTc_Az$iu`1rw}kY3uQgV_&q#7vBdS<0YeBZMZ$@xsb;d_d8k
z>UYjY0gEKln?|MOAEgd&($7cF6?MFsXAE;R(spa96_E6uKBvJgLcMn|CgQIW44q89
zOt3ZDc@T}d97R1sR~D7TFg3QBLNKR02<Tc8bfW<<UPwj|q_^-KZV0G71Ieb39nz?r
z<<UWC=cg^+bz$YG12KI(V$}WuE4<$r<ent8X&XU1fBzyYxaA*0JZ~V>1iHyIv8ZHD
z;R-d9EwYN+hP{wx3u^UxU?_B#76*6qP!D5<L9U{fpISyWEe7^5TI|+&UC*A)J2m5p
z9jjpZD-5NE1;w4LJSo92;eEBFE0<a@<Hb*|XhP9s$Dh4PsSLa#Q>w*J&E)FU3w<~L
z*7E14dxc1NN~8Jd*#33+=U>zo;{pBz6&IpM=<(Lz0rtdrpqVH~2z=+{0h%zhJWDVW
z;VNRD55uJE<MMjjwYd6waBm+1lm_it-fy|pEAd}1N-(+54*4nPgA(x;3J5OF+y}F_
zMkrM@ipeWNCC0VoF%EC2y<5-2&6mUT8D7N1RUHY<J&gX8+e4by$F(#J!EIu=U+(bC
zN-sE<pOqhuwA?g9h^zt7XbIv3!8*#B0R$nVQtRDCI=9>3*Nxlc*WsUg6t#pl=7&bt
zTcNYX*c_O!2)DT}W2wxmHgV>?F83qL(baQBpptfQf_EBpSCK(T_$!Ry_mwM_Ag>iG
znieXA%U7){uZnzt5!al_Rk)kb)fCYbox~NJyen;sB+_G>IO3rRZ|cS_l8{lSmq-1T
zEy~4lSI(3Hf%j!=J?-r=6*7Rz{U6p+;lJK5k;cCa{+^+sp@k%$zf3735VHupHD~m~
zxadylm7{bsw}e1Y*1%*c<^z6dhJ9K3pfubIJy1P8YU$}jY`@jC;1|nj4Uu!9>dI@0
z>%zR)5Rt2vGDY=?CfU+Usg~*177P*&?n1J8>&68Pu02I6R<GbZD^+0@dqtIAw%4Bu
z_@axm;ni@No7?=^T+3deCFDNuJDTr-an)~_-0<wESryx2JZ3%j<2q?K6>V~H(`MHE
zY};6hiV1Qsubu0iYO8wnyH4uQK=Y<_V=XMPf4n=r!`4~Mcy@+UKi+<xs(4*23d|0z
zVbCmyZpuK06!<+5CD5I<A_FOsn{n?w=IOfNEYO?MJOTd4+iKmo$1y)OH)yYg>+G$h
z=?MgHTbGo5sh7Fnp5AUUykt9Ds(UBc(JgVT5M%G6gspX|+I9^0D@ubF9E8?a1R}7?
z$9Utv0I)z$zindaW9;G3o_IQ~>7XB;2D&p|c*+*(gguki%XU5RlzxRaiP8_zPI}*Y
zqVN#bOS0U2jh=TpMs>jNP664L*PRaw5Yp#Psoj>xold9Qunbd)Xgl6^I)sk++IcYX
z+4GOtb97YynEg0tSV9Y|>3@}-)SmU5vJrhO-;^>It2gDnkHwpU6Ycoyg{g0JrkSOj
zhUej}<XW%ip8y7i-8ge+`P}iw#fNmAWj&qzY1J4{sqANnr%Xa?-h;k1ZK+#ib2R{s
zv0t{Kfjz(tA9TBiH+T0Zh3&!0^QV~5fzapqchtH-8LIW92(oz-#agm=(@B)Ct(#P)
zRsTh==8zwB6r%HMv!e59A;^MGpZ&FYH>z_?3NJtTv~lgK**nul&%-}I`#kkF@iW=x
zHSfcOy%Q6>7tv<j(3GA{)8eqMsX~{wlM3|F71W(E*<(UKxGSdgEQ}+jRJ=VmObI|c
zPMC7_@Gh8AtuPLlQt=@7OQ~3U&X)|pdb?isX`Kz}c&S$lbi0(wwd8au)oa0IL8{e~
z(SkXoHJdHGKxa$8xmsWI0ROpuaRvh6+$`n!2_>KM%7@hYhjp;j3KLrHtNH)goGX1=
zf1%p{791-r<#hzrmc9ep5Tq0OHR?fpXvVr}Ld-p6*M+QQFLYiXvPMTr3#P15C!#q~
znzI_@LMast5~-?75JsrlS?5U&WN0~|IT>qqoK&E7%xzMh;4orGqbKbO#@6C6sjZ5S
z(8%01y4Xl75hOSY=P0Qa4JG_(%SlqJ*{W1VKe&VBGC}T<T7fp5BMm6);~Hr+(<#SD
zV>S)9NU4Mmd1xvgYR4h+kAK>_Gr42eR{y-NG-oT%>uUHW1?F|NivwsCFR!aI%3u-I
zIy+<fwh>N{6zGVlo>Kv0o}b?dQ%|e7hwN(n<@beB_JB`XMPDzeHj~f+*@WcKm73-<
z!yek|)Jo;_0c+_r{tG5Z)|N_@Y5cByfagf()s^R@rdGWh?IS)=HmLZGqqsbnZc0&0
zN6n;F8eDHq#5zDWSzD09F}{F1w(pMcl*0L9&8bW~xjgP4=|zfr{gj-v@secYMrlwO
zR^Sd5(buL&k7DPaX$x<{ij!j%k(X>MRPU6^mVP#EFdb?^5`js>x?j&7{ImQUfo}T<
zCd$Vf&u1GCo`9?3!l&*$SOl)jo?89lntKckWavX}w{RIF-7HPSh(y|@KeWZfn*=SN
z@TtKp4<;Jy)+QRE4sCvs!&G=msWc;{IioreWb`HECTwvm0ds_DsU5Vi;;|tTd@7i_
z;oV#yd4@T0hSiPO0owWjFo2|bboxzkqAT#D2#+L%$HmThRTYen`{uyOVMCJKO9Tzr
zSD-BsTVY1YXxn`W>z0BK7pJI{$~QHTHs+1A7uEW-*$LWklrWs|-Ik}%K9AZH;5t-H
zk+=^Tk_GrwzL+VbmD+6P{@bbG>wO`App`Yp{0Z2i8cU%+f@=kwiIGs9tQV_zlR5v5
z>l4Jrv?+%#dc3C@N;W;bH!%qlGmJ+u0VraP06SSJWpY`3dABs=*N2#>gr7&q_Vlo;
z$JUAt*|*i!CW$2Vb~N;_7xTE+5$m#q!w-Bk@dF`uC|x5o5soUgIlukjRmB}L;KcGZ
zXWC#*w7OK<mQqgUF%Q(ZF$$lNz=sP9e9+X=MZeC8%KB<5zsI0KTg-9=MDq<U5J3VM
za>&_wrGj}+Z*SW-^O$`=Ej7g^PjK||0!Gw+^iY>m>-f!G8nS&mEwen05!r1zlQR-`
z${(P~O8g=HL!hMW6BkskTB=vFPD+`sKpJj>3AJ}s-b}q{Deqb!@IEl{$)4XgUCL8V
z%fc1<H9-);0@HTuYb#DonB&g&wAZ&=?Sk~~$l=(>P;1QQ#L`?;W#T;krv3Y;nnRsh
zIX8a%(LFaunQyQN`r;yr>?mzTlK`iO+C399jSukKr6<q9LG2C9@l+k>0F(o>+V3tk
zF3eHWC-?r5m9LMfv98?tqs%Itw77cXq5J*&XoB(f69_!Rez*V@vDY6pPR~<2Q1|g!
zckZI+HTe1Cm7`DHtM}cx<L=py-MRb9NIqS?c!z}@g0L)UdQqFgvq#O?Cr|`@!5*J+
zPhX%U9)X?U+_@fNSSrDi>oTVxQJHiFx<y?*dsxpqlBoVSEds!+(>QZ``RT{YPp-Zo
zK>#=(0#4)ATDfuO*DEEEN8}#+e(CXd#zk3-=+`ejc)eIk@2g~T3bK<ur0-~J%5I&)
zHnTg@%j+y<h$<TMW7d$$hZ0bz^aDQsm_4_0`|IWFM^=8k=RW)3AG05?elq9Yd!!#e
zsRZ$&y>?{f!~-SRnFNdZE}r{%^~70Rpz-PHrKeXZIb-L+$w!U9-azz_>mfcqf@D(4
zYyvgAEFY1f1S-6u21hx(;Wk?ZydJY`QMfH-Gqs5AoxoS~`oXR+<0^kMmLJXcCdPZT
z>l18V9Yx0Sw)??26%WLF{O)9Vj}UlsrSGxvpsDEw#SdCN|M|+jdrQyf-J7R>TD%P5
z%QA=Z)l{76kHCe3^hArHNg<H=A4LjB!^A57GAeCN@Qh^-5NQ{Yl)$fuW!d<I1P_VD
zG&}hSqbD^TU3zp0AF;TL)6+eOavJXZDSUFfA6{>KakTOM`|c;FG&n#j1ZxYu!j*OT
zjdDwg`V1<tw~Tf38jESyo)+mE<2L@e2pdEbxbdBP>{t`phF(L2y7|$y#;KF+yvKF-
zQffuinV4T(e)0izv!PEgc2R;ChL9Pk<R@0scZt6gR~6oRHP$G5)1Y-u(AGpk7Op#M
zly2!$)xK1#U5Qq^(ku(8&PcJ=D8cHW^vVJ~SZWnoSmnPC>no(PEt*USB%0q;6cs3j
zYDWaMM)8yW%s9sMvul0ya3-l`ask$qCl$a9Dunu_0swmZ|NKs?OlW&*J|BGOoW5_Z
zdb)Mf(XESq?xhFGOAnIu@gVsXd4{}n=y(AR9lruEjV?LVGmny6Ej!u#R30BDz~VAT
zh_3Z_M$qT)*Y1q)($k=+r$MiY%*}0NI%6Nhp*3Y@+A)*~y1-lVsFK)^!fjtDp{SmR
z>Yi3-Pc}aKOXKS7(&8m5rZDFZa7zgbV4FQM&#G8ML}9X7JIGz94p)X0+|DMN>{?Wx
zOY0q%2bycOGhB4lvEKOg1gr_~fh#3mv%5EG9*~v!yUS0G(i9Xle?;Tj)yC!b@p3LL
zoTg!QL>iLDhqqSGe@^0oopg^uQV$!J6I#8qRNFR<d6#L-j5#?*wD<C;59JJAJw{aW
zUNN8ax1Cnyn~Ikppn)3vK*-9yf?EjqwH++7v8vf6tg<WY4f6ulqt=_UY4j;4(vCXc
zeu@4~tMOOv-D2(?MB4yaWr(ANobG37;jVk-x~!+9qE<>aAymYZFL8%gW>2~I7FSL@
zz|B(c5*i5te7J~{^y<a=)vqqSRII;LtiM#Ozf`PuM*k$!`Jk-eFSXAMO7Lv7qxW4m
z741u{Y`eSSm#^Ub`E{EAq-xCHp4PHsO3D|clkBJpGFaL8QmtrLEB<5l99UA_{Tg3z
z{xSP84OJ`ii{c7HB@iH)I62$6dXisx?&9)|QxH=c-*<jiwIvC&MQ;o&bU<Wv_A-Pg
zCq36!@THcsGhT?lW$7vjbEfQ}x%*U5ia@Ghm6kS>$i9@Oz9?yG=jEIvkeUbxO9>}X
zO(ONdq;ci>=c^}9QJIiTL_t1FC@UpJOCp@N0vjRartW@n%AG$7k}F9H0!g`!KE=eH
zx@;geZIB*<1O|eP$r@mwM7i{uMYQ;2|1Y2Awtm|8xeDHq5P|>Pocrh@-GVem2%aiw
z=IBG){4OorgwzLQn!uHYP9C07NOUnKbJ-rb9~aW~<tqy;oOQ4`Rz6J>5wQPK1dn+z
zro?`sAyJ6MCxn@8S3d0d+y9(`VlO$-6gU}Ixf9#3M{*4D(XY-fKbey`sh<`vQ?dX6
zS+Ws`I>ZM{&kk^CAaSu+!m;DW@J|#GEH6)+&WH5m7UR%38zvHH%W`Ivh{yj0QBgCD
z@Z2d-7LjORxHq(AcAhPegmt(Vv$ydf2eR%DpTCq#zm!YAluN%jxpcl-2GaYRmBCp(
zBbcU4B72l*HZERYnSTt&F(aBQ&o99*MmIVCLvtUVVJf;O;6CQgV_JTP@>ax6R#4Fi
zMENfBdJ56qo`11&?+B)gG&vC@A9{WcNGT@#Hy(qb#3!flt=e5YOB(<t;M^$^G4agG
zEaU4a-6gSqpm|J9Uy%1ju&ke#H+SCs<Lr9>;463W0t}JFvv<G0;6D0+WH0!fWhmnZ
z793tYX%^v^=;hipF<7%4lmKW!0tVG}XMDvFOXN{eGC^6)^7i32%FXN{J!lpyC4YDt
zP!<M}p8g^j;wYl2U#KgOmq%ssFOBohtU*5E)<Gg_vdD)gPm|nB00g|{6mv|7B#Jo*
z0u1!ncP+sQNyNj(^&j0Mm#HlZLvT`w`cvX1r`^&6Bf^%sd=Bpv(Q(`&<cgJcrrBbE
z#b`i?O8a!OiNw3{i!a4=X+{4TLdmuhrggo-tkMEBV~+dd-<F@8r8#yfO07IU0mPcr
zu)zCg3dkBod<XIbGmv)&7<iJMk?q?b|Ax1`d-F5M_C`EKEs#4Et5uJ&ig*u~o}6C(
z;2{<LTEn(#z%0Dq+!go9W3Cs;9YQRC#r@;*O{Q__x_kcv$h$(4JOidsUu_z~{+FRZ
zvxff+ouDUFy`Ledsj^y)lg~i(NhSYZeP@I^oKy2`<2&Wby$6(yXxzQ!F3zr=pKV+_
zLXy0n)S&-UDIMi{7EtO;<2yQJc6{>r%Dr1c8$+?&Qny_VX45XQILZY6>g-AP>}NEe
zoVObV|7fYt6@=|+M0stK9kg+YO~L3=F;yjM?P|z%!&3s6iN{xXFbUNe+aGO$)vD?>
z0KMurAuT(6ie(kIB)Al?d(7XSVPXr>h(5fIX)ne{y5Lr#OHi*Ps$1*yztr6R{0^ap
zW;IAC@luugQk81OA(}A?ZmLA|1E&Yk;7B4Mo63e}5C+rf<ILN^J@pvx3yV-)u*xK8
z|F`amr;Q6hx|SAhuFPU4^(mF)FD;y>>|~HQ#!#VhDc#!CmcNit@nO^I>AFv*=`<9a
zTG`*K%F(J=e5n9jn*xwQ_!a8=pdi(oEu8UWbzJ=?)xgK69O8^|UX#<)Jjv*jTPx3=
zQR5Q#=55+EO&Y5<jcF*p{gaxEZW4JPO$Y>cXPRXE>^b-J2@FoleWK-MrdfX)59d}t
zzo{-G2nR88-8jWezo;Di$9YKV%^h1jarwP__?0km;I1xbmKOd7ZAumo>R9??T+~SX
z-0(us5rZ=S#BmG^&n>xP-gJ-MfmM5U=A~vbSTng^%Ew=&e(|44h4|ZcZ3z=IQjQCW
zNie3rKqPySM)Shlqe_+hK5~9>`MHJH4Y+!>p^$s&i4!mz?Rest;^wPfBCw@}j~XX$
z<Fg6QQ8+!HUBqXlhbSUzxAkt2o+@J)>lk>EXlJ$F&@6u-i?JsE^1H-ny_S51;0mYW
zd27KS+o4jwbzh^95g>)x0^@rKYXt0*BzyZsSPw`yikGfU+JyYFwG5q645(0Ex;DLZ
zZCXzO`O9#IzcA@^hx42KLWJAhsPV&{#s_Ed1%Y~NF!?gEjy9SG8w@t*Lbwe1J}@`Q
z%@esF9F<KAzRhInXyBeZH!#n(wD4r<>8D`U=p|qV4~{2y-Q%YzuNO3iPPrHntDM!@
zPnMs2&W8h1d3_Q{Kg(nWR%TDB+Xh}zZpO-u(pHR{Dq;zG7cVnW@~x+GHfswId@7bh
z93A1Q-p}e@WgE@U>|+(COE1pR>Nll8=%9k&*C^!Y0O)zrJl!-oa+;)@re#`6H{JJM
zrxXn!e7bhpJvT>1{eF4}py|ht5ktP4%ePNcq5ksS&z3LTQoVYX&wtPC`HtOjzn^u_
zURphI7ENp~;hng<QkpK;CQ#$dZFlY=*D6Sr`+$;3MPZM@Avj8{?5UmZ;wAUk9Q11O
zEbg$fnyV*HC@NXW9E7N3K}kR?<nfue^aLMU7lg|Zy|AtxeJcAK={MKa+4q@cAEERj
z#}pXyA9?N`J5ALG%Mb1?-@eByHyIxhl#Y_yHPILZjZgF5<*H)I#Y(blhcq0$aMbm%
zb6s61M69ie+;R2N!^ZsISI+$CKD$UokF+bYyA_@^HbviE0=))NYP?ioyi{TQv#Bsx
zE+1pBLTrg=$J`J9LM=7496l{5J}kG;Fa{Z6uBGd!40tL3XHYsX|M!5gwdPWL%Y~uo
z?V7j3K-jw%3X;DTLGGreLXewpDys=A=WkPVzm6O2r8xTqh_hcxuwP2B*GYoiaZz>4
za%swDm{*<-uo=wwGo8ueDFr#eOG4-YXwg;oB^_0_?yruH{x&>%+90FtJL9a0n0CM~
z<`<&l>qhR^_ZyeaP@N0dtA2BY$8rE68ichWvNP`AdU@&5Nr-DD3C0YNU_^T99tE&G
zKkJ^p05ynh1&!MP4)IqWxHnHXzIY#A{djBn&PNc+{p>vw^X4+`0=U{eod-Xz?{84O
zh%S^alqNF;Sr}y#U|7NsvQRzbFrrZt^z~S~^7B`q@8ox$Ux3n+bj9B<=NG57?3~7r
z=6_Kns*UR>+~*fr8W@74c72&t$4@Ec$AVeXGy5S4oZ$pr+t=+GGv?>a<MDHCd-bsX
zwyM4J6N$a}iE2?1+I();7;7?MLYqx$)4r`a^rTPcUMfakDn`kAx1|{EZAl{uwYb^{
zz6)=pHgu9<G>@Vos)8h8u^uizzOCvriC)qx+pboUPGhHalGH#&jRD45;@U}Sy|c7%
zt8sl1^M#mM1on|QAPJQvZp!`e1_MB~*{!S`(OLVd_@vcsSLG>B-qOP5<qN04oow;3
z2=BQ3<O7TkT!~DJ_o3Y^z!z0%j`Fww;1svVaSnh<(ErX}r5MuJIP(dfj6Qen#PZ1#
z|2zAI&}Y+bL4(h~Q|Gu3gQ8)>lm;+N(6(y+88S%s*y8GqXKcI+XM|Ij3|`&|Eviq&
zuC*&qz1Lw?r)rIGic_tX&}vi9I1fJEKpyd7VnXbcCRK$Rq(?>a=f!DJMU!4UEVa+;
zu0{=Lu3aUnWP<s6?%<!jU&uZC?dlOs*p3^ac)&()^^0qGzk-FG@Xi+0r$e`TfqU>-
z^YvAMw#Eyr0?R**i+B9X`CR%82yI=04G5hx`war!8N*+54gzk$N;%tT+qwrgnZFip
zJPIV^ft^_l61oShb9>VC;0_8&qN%lNUs!JLUoJeiG~?HUCK7j}4IQp;o2^Vwd*b^h
zIMISc-@yLRGJU_AVWj#-UF*X(h5ZJcR@J8d0NPTVh8t|KK2>bmAjN99hSp2p8m{&5
zxnEV+A_EY7sFK0=b)Y8Kw^pmpNCuD)JG3L|hQmN?U+iHd_SK@i_{#qxdv<T#`Ucvu
z>y52F$WL<(5H!6tSx}pCTpH3UbY5T0XD+mT9Iw!-t9+1coV<jpWt@SQ>ueV%w>7Sq
zqRDTXn8XPHK@-r9IUEPowlSQT<TaYYaX1|^gyZ1WW(F5-H!;~U*-m(CIpu2E0f}Vo
zaA6(HLHNj8jr888MTgiXrAke{=klZ83Knyp`=EjbgLan)ImBh#UVk!Q%$DlK8ej{(
zEvAgieJop_hP#7?$zm~ero=vijT<HA)CUtoS=;bxs!qN%HC1&0KwMo6b-Np58#M?7
z>o$c-=7#fG@`PfdgZivpFaYA4n++!Jso6LgSE-0GwC<QaE9SgmT1Ve&X~}PZzRPES
zF=lIU)a#N-gIPNa8=9a;+u(zZPXfp43$qm_**Z#J$SR}c9bggLXn=#U_b52jY>T!n
zfpgT3jf;ZJRy0jD!_xrc7<lcg<Bn|JvgJSC-n`}ic7}Uff15+=nuVu<FZgU~%hr%r
z{<$m-dHZV&Khlzwp=IlburajMYB{2LCqj_7YF%in+p=w;4M;~~g|D++p)&bDlT{&a
z-%2*aid=0^dws9a?E&->CbXdH50*r<uJG@qVI}WHkYOS((}GzduhO}QENeAM6w}?7
zL84r{)8>diusdmt$eRc<MO1LI^M;645+lqIr4AY<f^x?~Gfpz2<hOWjS-;5ue-;Bl
z-neHlW=>Euq2LRdoj|KVq<uoz15znuXD88rBAa$9P?*!%UW!vtBE|hNCHEyI$YA0^
z6B{?B%$>FZ)aTCn>7)%H|FI6S(_g&(AOF!3GeGe|wI8E)QkCt5In=DjNDHAYoT7+u
zn80X{f>lc!7Eh9Cif4!<J@lW4?!c3v+iL>;KvshynB(}yg&AXQlags+BAbE8F5|*<
zM~t^eYt5t(t@pr)G~?GUGUsvq2C3VCt;>j1ODepu5%HdyEEhR$Oc%=OQn7HDUT*L`
zCR5BI(<IC^HbbwWI4$gf?nQ|n^e?<HJwqyKmaq;J*)ow~D!?g@zy3PvOC(1|qC==R
z8BYvHM-YCb?z1;bl`WZiHB;EWBf_8Ut~kUb+sWZw=M<vo4^Gjkkon({Ka?+OJ4h^V
z8guf~m>|`0kBEt@y%?0Lm&pkP>Gldz<~T14f~Zo`6Mg{qXgsrJ29c9g64a;7vw1X%
z+gjC`Y;y^%1-;&A&qH*`ppIk}`v(W<Dvk_Alf&z_j7sAvreun#cL5dA)FT~;@;=%M
zm(jfnni2fZs|OCkH%-tl+pNL|^6no0T)^-$LAm#jyT=~Vd{8teB*~aV4P?QS?L4@>
zL0ZkTU#m8^G_&!-w-i+4(ml#iET8|fF?WvIF*CE?lc({e@PB8&5H{5!VFP&IEnbFf
ze<WD8-he&|^AZ|GyWifmWB1<l_N_bru=lO>-nZV~y63H3+uty6r!*^KJ_q<l=Pb6*
zsYSZP^Dg5q<W<2KMz9M=2vjxsK^Z)hV8(vEfiRU^uL`&_8J^&ay=u&qFyWIkPU+H=
z2Ie)t*|M{@vVFKmctY~VYGq<?YR(5^XE?7QP+UiDQAuo)ZnULT&SMU43O9jB^2#ZE
zX~l;^j1a$StWd~IXXp>$pXtx-@K@T@G0df7f)6}9)%%tz>3w25D@;*fNtJ++9qQ4T
z;W(i&iDWXmJ|w0aZJR;|9kf3)<DhINyWbf_1*f*5if9hs34uKmsJ0)IoqhY!1Syv;
zm#TRHI}t^-61k%zIAC!x2i1!heX>rqil>=MueDaU9?CjplF}(LiVp5~iV&W)3LZza
zL1%i>$yK8g&QE6!qe%x%=V>u~L{##QKiK3bs+Oi5<V`f1c53@e7*ei;@P#Uy$MgUO
z3D8Q??gA+zOi8X(+)zVBhYSHK<!flaQ!!afW!ES{X3d?dAUfaERW@9uv^GmdWd<P5
z`5bMVd+f$QGC`4Ocwn872tMyw4k2x?k`2W+fyY&Jqt^=u4@tX|TqD@M98Uz(ez3DH
zA{8|E%cdc^UxpKYL){C&w!qW;R6a{LpwFl+gY;MvaT0rve2-!BEuGHb8-1D^2az&e
zmNn>#ny*QB=!;x_Y6|uB;i*8GjYJ6KR5!-s@kC!To*Yah;se>z^mM)!Bk#~;^D+Rn
z=p4is!-7MW2z--<1_t0ck;S7p`@~{Hxv9y)$xZ!)_3rCK8*|Rg#$vrtAmhZ{4%r?o
zRy-O<y@_ZdIh@2RMF6}B1UZo%&gOENK_{2Z4h=h#BZ-lbc*e<Ok^@tlMlzX!p?D&h
zjVCa(QAB$%(Mb&B9*&a#P!hK)K#ii^`C`d&Dt!m=9zT>vzf1JTlgUj3{qgwl>xrR$
z%;!zNMvA|IkB(8aw~lYu19<z*Am}-YZyHS`Mh6q9H$D=l#uIz$lYe%ywNW&-d-L8c
zZ;c~$h~7CR<X|A0Ef&ff<%4D;rJ(g%9-lVl!xD!54?@Wi)XVC|VzG_%Oxl=ZS->`m
zV~IZz0R9Ih1~#EyOXYU}k$}DTYKup#`v0Jj5xjM@4<AH7L60iR?gyo0E{Zk{4`GT^
z#TO(%{YY<56pakyt&7hN@}N($U{Y{4%ydkKDMdh21kGl0xug>xh{fW`sgc~kq>RG^
zQA4%Mkhr&n_5qPc63J*{0QJJZ$sv#G?1B4nb_-OBkiQm-8cfBy(O!IDmdZe1OBJ*M
zV`6c1l&ZzDrSf68g>2v_15=CPQ&w`uCHhx5b$9IEgO^|ipW888O9wz8H8vVW_&@as
z9|y)(<gvHaoe8Wf=eMVGuANf&Zolj8%{%|FHNAV=s96F&q<2gYfzOCSCJPJ`R}PZ5
zB)hoUI_SV7JyEImJ$CRjf#7GFfR_hoX~5%Ujdr_H9p5TFA5mB<p{+rfmBCCFQ<zhZ
z3$JvoggKN#=^(i?6iY}IxKL(=qeqy@T&GD~!{pJerp}WUO|&wdM-)o~okkU{2@hE;
zZNCt7X*#LvHklopM4HV}t4(qNB;Mc?<UV@1v~Zq=?w3;PYLIBi5qJI)NFzQw(YWz7
z$fT)Vhx`24>WwD~RFFh0oaZHkcOD4*4zpjGLwROvq2H;s!4^S3{TkO>8SK3_xRhR9
zwdJu+5F0!m0I?N9jk(X@+t9#Ii*q~h81@jfTKVQk<NNo2T09x1)6|llT{<F~F;WyM
z4~7PM+8i6f1GB1ar2n>UFIrywY~|j^OADtP=kGO69l^Ky!+j39JVENJD-RlHZc)E(
z>KeZC_mAA!MLY+MD-UQ;CcLKZu|;?O5rDJKL^qP_Lf+=bXXG=U<eHd2=^neYeE9>B
zj>CeXkaKJ9IK*?h{K4{ruiV88W@9ZK)k@7459$Ch5WBT88bILE0$wpHTm)mPcbK4v
zi5MmQ&MtC+%@oG)l0p~a#AF2j7!AoBhGKwyShg^AWkzyn6TaRIU`8@G(jV`41_nbi
zlA+rPU@t?%0H!mUh(|}@QXh{d;vUb*a|OR~xR{yFXLs)X_dVGR0C~Gq%Tx|;1u;Vo
z|BUd0+H=mLwkad4lmN5R`!j_p3I`BGdKft*^T1tQU2m7D3KU)v7M7~;!1ALC2Nygd
z0)rSA=n|gfPy!>;p)q{J!FY}iF}mBA7zb1O<hWEY=@rzAp;3uQQz=9mlQnz~q^WmU
zqzPi3QZ8WpTzc|#<I3}Zc1f)MaIJ+I0WTVM42KfYL)=+~^f-bJnN_%14U%NmY^!r8
zrAXf)T<j2dwJ4@O(qF*A{*l2bK2v)8M~0&VNfQZol(t&q!|;Q}5)Vu{Bf`WdV)vqI
zy<8?{=IF3fi)rF`Klc1EH%Abb7ZFV_PV6#?F%u?bAm+s1s5OOW4VLKXp+8;)Gx0Q;
zY>j>9W97<4)l9%5&NsSw7|aa~yM}5cp9A4j?XcD$TTd&u$E9t?Iq!9UEHN(G(->#>
z;gmXS)}UHZb?*G8&aZ{TK67dmZFz33<|N_g*r@J5*FKCs9iWDJ_TXP3C-FBhSKaHr
z8&w04Hs_(c-5Kci8N2sO^SKbpTc30ihuD0mzC!9l?qk6Qp|;G*5FXKVtT?3gzlpOL
z4;o}+_k~qGH+0wBoATqbp}$g<4nV8N?c)RL3C}*eZ-C`;=2>hBGk9YTj9KfIeN54-
zth;I@w(N2}or~61Uc|!jHbT2gE}7qG%Uvr~vimIyF4e%BtZH8}&9zdh<Lh0i0T8~G
zJ+W-Lj{jm@Dj$yM{**f8H0u-jhkH%>CH;{q6YPx=FG1!h?ciEeVQh5SoIChnA|yOj
z;zV;tY<#HEh{XpI^`XSL4G<6?4_1nI%GxM=6rV@!y&cc<)^_pRw3tNCV+GITnCQ<a
z>x>@&;g?n=EU5xAsE)F&6gb8@tF>a&^+Gyg-Tn5Cj|TxH$RDd_L2Ob<W2#-xI_XF}
zs?Lpkn8Yeh*})ufLgAy8%;`}?0i`xz#NA*t@3;Rg^j%G3()w-}C=-gbi_)R@xv0_3
zF1Pwgx3Q~LE0DH!Hj+QB?(05_`AqPhhN}|j;D=G~K!1O9c&NpjQVaX2SslKm+SpMx
z&>X=$nIHaE6B`T!y>iJY3fyY7e8Gw6J5AYYJvIyo++grgF8LNFtGh#oZLddfLLawE
zozNRmsny{l*M!~L_LQr)vpE$2?QBr*;59%6Zd0Q)e3Zl2C=q!4+Df;00Q)r9<Ouew
z)r@!q)NA(`_5=CC4q~MPft14RU+|@fVN@GicFm`!4y%pxQv$KZP1?>D?K3xME8DZ#
z=4@qau6bj&vn|`(ls>iAaHiXV|D30~)T1W5^#z^e;Ru3BYiJ~DX0v|dF??T`T{-dK
zH%LJ6U{5}CPdsj1JxReK5${(oYQ&i^DGYkeQ$vW^jtEC*qdapFjaul;C_y#M=#+{8
z1y2KyPrx-Ac4BBou87r25Vb6_uH=tXV0%cVRtP3plfT^%3lffi{xV6UPc)#`X(W2p
zE=BaAkb!d&l1%!|yFxzw3dMcjv`lQ9^n)w*wGyq;W%+VO|MG11n{&C%^+WkWK2tfo
zy;QA6nEpGKg=x#wYbBuMu{Z0*EZ7QoN@)0S#V;}NPQ?4cz&nu`8BlC3`9)ebX6MhW
zX5E_8Gx$Li^R;}Y01l-*HUV+jDVGa}sn;xGJ`F6Cy;qswXt`9a5r<V>AT2!Xs285+
z&mv7NoLjmR9?|=gGfbDNmn&c%(>3-dQe#{sL+AtlEAY#{_R;EBA1oie>n_f^w?B%-
zxJFb!s5uZI$T>ua@R%v)?xLMR)&2Mc>WW$l?A@OyijoY1Q1kU^#2kfqMbOrQu^LcR
z=oY7>c)#RwU}-M@)1-w$skqMooO7!AeMRb=QLE(l?Q<$918yS4!@M#0jhCgGpDa-S
zD|{dp>N&@=<EJr!W-^t0hB_!>VA;|P1_gF04%pgCg&`RZ9sz!RnEru|MVtKYQirDL
z9>_k%I0u1@_Bl1&7H3<!(hMO)k*+9FNoh-o4*?Gdb2hVRVTd6I-*>V>2bc@uMMj(P
zHr&ernf2w4Dn(-e8Dy6=y2&DnYArMEq)U}F%$!A=;`yk`<K*Xz*AcZQF?Yte3+J?l
z$d)lvR}&dr1d2u(4P!x`s;RE&s=h(sb1;wb*(2|1!EyR1N7L-MioE(NZ8-|YoRmb$
zx#!#1$MdHebLajs`#!g9@aPXW!IIpjz>udZ6AsZs?lec$teD2cNQJcLqQ%JU5OEnw
zB!`k#^P(2WEdRXr+H2l2#ArAJrqJZRd(8RjK)#%2{#mSJB=AcV_rb{&(;lXfEel@P
zm?*RP<*t(+6?w}!97UOmgASMK!0=SF`BJ_5uV_oANFLQPg#)nbr=5TOui~~1)6~qK
zmy4oF%sFgBSzMZz#I<6ob19mbnBd2hye~!4J4XF2c%vN-!c)>-$UGjbB2k-zcEpt8
zTK%mJ!FHe%H>nZYgs+(!O0Z#EliH*!)Qqn`J2^Fyj61PdZXi1}IW(AP&5SQhOE6Pw
z?=RVHBcK{cKuiGU2Zgy2Oa_SY+je}DD^?xh;z2gXR6dIenc}{BW}k!h!Er@%wK)~y
z9SLiMPdg5(ke|-i*zs1!^HY%lGe;4Oy`DKRb35Bg9rirSJ6e<-7t@%c(+AW<@Nk)!
zW6_t}Nr-1|`58rf>*WI8!2eNXp0TnK*$_~g{z2-mM_^%oMa&p;v*UmzELJ@EoXRdz
zkn$1Sw9SA<e0Ml3jbPDL<y9oNQC3`YfmB<FDUb`LeftEHNQ=<(uucJZc))9@@lmuX
zN{l24HOBuW5~f$<+dKB`#F*!=KqBOW@Hh}O22<;NftXz5Z?@AUApV<v@^vGg_(y)m
zPMJ4jmFa34;(_UgJd~ahCOc~Honbv=TCH_k(~2Ui!A{3|<Pk=4+=h)RkMw{)YK7(z
z$r0}klR-`6lTZtlRW2QjBzvOB9x#pA4EGNZ@7VrLH0(&{jdMxOFs7}v;4jxW_r)60
zCWrf@i9x(FBk^d`5)lTHfCcYvN?t@HBuj~w0w&QaMj|m1OhS96ZZ``Q2#f`>0RN2S
z3}bqqqCQO;Nyjwkm?K&@*tycdq7QTnEVI>M=$ULSJq2WV^On8IH+PPyOM>5yHz&@A
zbXz5v!57_+NbN<`HNktH<$)1X36JFSUR~f$HlUGrq(3A;a3wW%9Dm9{>6NMn^ZN>Q
zDb%bWtYkNnw-FVxpQ)xkNExz4IOe}RlJrbWj4a-iXG^cZMk*J#4PJ=iKU!>@A?Ycf
zezpAJL-*_@L}SO1uyNp|W62|k7bt0gCC!pCktCOiGMHGV3g=$rkdVkrUDscZM}-L9
zYtfrvbKUZM)i^}JDyab<DW&fc_SooFyIwCuHJv^%L!ppkl|NDsX)}Iq6cy9BFb#v(
zB$lo$3GE-)MEK^RP0>x}fw7ri*8GnRA(<FOlciFD3nrs#Db<5aa^}*z37%bWA(ZNq
z;SkQ>C>;#|B)zj&9v>gK8Z_H(!jskijv|KkfkEKaYVXvY%3+UNBUtnfMIlSXd3uc*
zhNHW;f0O+P53wThrwi_TAQ<nfIQuFfbDjZ-XSM{g9V(bQuyKYgPaZ^$c%Cx&mN=a`
zfUk)43aFh>-V<m8r{;+<Y0tU>S|<%;s23k~9@Zru{9|zxKQP1n0l`384TuAAv(oiy
zX0pJeS4O<z><ranC8`p=DU`3~i+J`iFC0-}M`dqJKN+3WIoXervvx_FX}G@|M~8<h
z;ATs3C&?E%?p27F9!U2hvInQ2=_wC#k3S~M{@M^~3^W^Q1{!a10^YBDRP-r|I5_}$
zogt58tEA3FaKJJ~WoW=I+_87Na5G(_w6QMSF8)qh?^4MTJXRjO2h@KLmyqBs@OEi8
zlVVt8Tk_!KSw+T5*2_7thViSQFJ@F3P$)0v7lhY_(u<x`gqG}gGC2|_v51eYH2w8E
z{DXW`>nUjD9d+{j2onM^jP$-rsb1C%RkextfLSKkA6%d1BoIxIn5AB>1#*4P)WBj!
zMRpPZ9nO4=PU>RgJ6u0zV>c3^SFHL@rN-_Ucxt4myDBNfGby)BR|5T?6z0d^P!b~|
z^D7Q1X(L-)0jWpif5P~wBcxdbK2u%dxDmR{>i$x_kb?+n_~uTSzIVfR<UJ|{&sV$r
znnUlJLw(OuT3#(}qG)&1L?5SzK`WME=4(uH$)g@0bkLvc)f$h3Q8Sy%5*<aF<^50_
zVJ`CUpLM@z2SIQr%{>{iY5$g$8P-r(`PnPCW#8rXGKBt?H~>r<A*#RNUIyYrL;x|6
z+!XB}9%zewA)OM}F6oz-*g|5)^xLxa98p1%DkaK=NpGwaz$P%%OH5}DNi>t{#PB0=
zx_P=ZD4mQCjM%u&?v_x4U#-*%qiF2+B*FTgckurQ#%*<=TsoOEktrTU@o^}Hx$Q9q
z0Fj6m-!`uQ*tq&vu*Q0PYWc|nfgr{Z=-+ZChupdQXiS=!s60*9j^SE4C&w`zh$tR*
zH4^ay(j#(K&=Ij4AyCG66Zm7Jybce$g13<RC@~H<kqDnH0BT(C;X~uWrnQAj$*t+t
zEPP-vETJIlwa+?>7Y+PLs$YMp;=XztCeg8__Nrh<6R~xToNy`SoU;|fWGE#ac(x+s
z&`l|)vR%@SXQH}4KUGV4*FnpQ!tEKRNiJn8OO9B4I;FH@%2t+483Aft5ucGeM2Ik%
zO2p!*kBR0)y<DYKE^)PzR<)YV6fzacjgZ>@{DHMTRUu%m5jT+P`u)y5Z|G()Y`Q8u
z24*Um7#JQH=^q*xww5e9hiV~<=8Hw@ahAm#MJjGB1A{jf$>`fu{*eFsOtw}Bguq+>
z9k{22DR^Keu%x_er0+&qU139&MTR!pavjl5TQ8H0T9k>Hq>UO?sz$YIgbagrR*Oyw
zyW0PtB@HW92FBqGlz#D&d+M|0lXIBxi_k|a_m4v8qsJHA@88E|!cY^mJb%%>dya?I
zChgIn-R}Ig<umt}79KY)ome^cZ287tm!3TQX>rc|=6$rehGC)2$)6Tat~|f8a^uv}
z<KvAVzh?<!PJJf8c|hG`KdhdgZ=8SRet1V3Xd;G|Z$DZ6{3bv+`4N5z9iI^Qg9$_j
zg&|H@2QY!$<HuHy-f_=-330_qSg1ezaWjhi0qzY>uJ4aP6p@&_9z0tL3bCkBi`OB7
z`I#@>+aEy>@M;o861~>Ci`Q3AoNZi~TRwB5apl44cbAk2OU97<)iw9S>v$0E{o{@6
zC)^Xq8-Jf)e*cks?IWFQSb6%_#_6vhW|ZHE8YgF0ug*ef^@XGE{qqpn>&#!4uOGor
zXk2uq_5K)ITDaA?zPNhi@9wj2eW3!pv+p-9+~;AyPQg-6Sjf|a0$;(!_Huvdv7v~0
zn|8+9vw=wO{q$dI2`_m=%IkrpTqr%6T1A{nUIlpMO|*A~*4kp{17|<wwq-MXzxwKh
z*wO{<Wm6FJPR+aHjZ5xKbC`ZywCI{o0ShR60^!4Nzf;S2wY_JS3dYgx)iYk%xDH4m
zqOQ%W+GG$;RbMX44bXm9Hv~6RUGl0&-qj($*Mp|vmWwa$x%^C?+Okd_M&9DEI^w~5
zJzZu~hQ3P7*gK&Ws9uw=s`$%Pou0LU+QNaBTF?=?wYFwnU|S`hwgTp<xsuqnM4tE;
zHzv$}w1{rznyYP_?|rsmDrgg1z)WT^2dV8=t(Y-1SyyTfSgYMkg{GF)uht<|R-(Qv
z-EO0{!tzP=iS*W|lF5`>baJ9}pyp1LYEyGZ>c42sU05xhaYF<y@$vJ<?Cr*f7u@?F
zpg%^@zemx3s>NDU=80;GkXxc-ft56IwnUPw)%5XTh7q&1jSCN`_I~NfE%(%;)nlg-
zxf^)QD)4LwSTY2h;P<mS95Uol{1xO%d!p#ASKk;X@*(_lXM#L>!w}}+0dK_p9~G67
za;>DbmNvKYPnNe%m-$JOX7!FN$=>5c<emVjd%&7VvQz)2Gz#BNquQFvKi8*WePJ*o
zPZU{=iyz@T!qU^z%TMQ*7eB*{zcxFq={7@X>2w{H#!35oo4_X|0|cL`E{K6oJ?g5p
zN<CWx9rpf=NI1YEU(YxdFmFaP1F`<07^JQN6MnGnual%OloLv)$;CRI#@~noO8Pxy
zVTt(Xh3VSut^4ZQjpEig&t@QEe1XHaM0}ZBKKZCI|Dk*HgXMF_CFHO|p}~sfaK`Mm
z%MqVaB*@iu;la|vw`{T8o44KjXCWr`)8p>^wbjp$xo5v+wwZe5<5MTx`O9!s#h3EA
z<5v6%sRJdxTpIZ1u?xGAy<^a&3(wuh-!#6swR~e9can~TOhCY_v6;Ji2eH6SX!X*=
z#{A!ZTD)vTw+1PFg6wl*e_HHIjN?gDHf&%E)Key?0sP$niJw8L!k-rB@bWa~K3;nA
z*T%)`jUO%mNC=y~v~YLj%#ZHb&qz@c65A5_noZuq<E6*PNi`ZN8@EPbHknq69V^d2
z$Eab>mS13Y1-rH3vvw(#1IXQnYx{AN?)UFQaP4DrG-WNse_y=5I{S(LkRCKHK3uu|
zwR`*ri2i<K4o|J`Xc|{%0o_U5aQB<{8`rM7XYZ{XeHt`pPmeF3{~ksF)3*BgId|@%
zHf>7_vv@nLJpXuUago5**4B$K-%H7L)xlhZ<gjHltCuqE87v{YB~ujGh?w*eS44dm
zZcwXk=%G&TaeGh9;PG;}e82Frbn2R5JOo=OIxaG23PP;hyM-SD;s?PGu{7pP(#=E{
z%Xit~qi#bB7<<(KQzP2zW|Rg`%gU*v?v)2*(~*Gp#~#vc7_zf#!M+z~DW&m9YqpoN
zGPLVwFJ-WJ12OdP|B03tKXvb3YFv5H`0W1Lc(smMRGFY#Hk<N6h_;NWd<9$bqi}2z
zVQ}<=g>|_Hx#!MUS!m?m|DGgz01m-#V|62$9?|ubb^x3ol;5)PS41pB$4tsvSS!jw
z`Q__aS0<{HB%M)NX%avcL57M5RG6qA|K^@L=AQd*>Cs2-Nl2N&IL}@@W1r$D)TuZ;
zG4{U~>3ec8hQ?M;oW?7NSN+6s_vQ)d0=~myAb>x1T^hv$deT$X4b|I#s^!>2%!CoY
zU$k=L&hpL2>=66vyQPJ%+{L4K4nU68ICIh%BWZDE_SDkChmC~=s;Pnr()3EevlE>Y
zO$LJ-jkBg8LCl5Ep%*uAuRQn?R6z7OX<2P-DidVjAfZP~W>T<IzQz3K(!v?KoOpTY
zuA+jEfa?O@(?lyrLpadwGm0XzAr0=qp`NI+8A*YiUWshIV!c+qxtS_JA;<8EZ`}{?
zxR0JUKDi4V!pYgCC$}g}ydx#B&7cya6l$&JXA;Jaya2l}t)D_^Nb!m;@hB1tN+fy2
zdp-_drQjnn9NXeXr0z;)G&%~>JVPtbKE=p~+iRSjr+onOMqC*aLnQ0aB>O?~C!L{_
zOBIfKU@U@1ihHFZR`R?nWvr7cn7N5q+j4dy-XEJmEk>=y$Cw`k>NHh`dhUs{?){JP
zAn<6G|Mm><bMbw5VFC8eqlb+rH&!10Adks}86+t){XcQPf=*tQj)klqB*zK>p)dFl
zCX0>h3-0ry<4gzdL-pu<SV&eM!naVoZ>V11v-yp(DzeHR{o#+BtsoTj^ahZHZh%AH
zeS98`{hm5J?es*AdpI-s78Af}n`y1qR$_&NY^~TD`deCwRSL4Rf{&k76x?gbG4Mw?
zNpC*D8#1ypVZjB@RLa=;)=sx!Bn6-LO{Cz{+(3%7A=v}PJt0rdnab1)wH@RHymP}{
zJljMiMRBWf<SX~)X?n8A<SYnXWLsCfkt9;&)5tIv^8?vD@tB@ndLczt&&BY2=riTP
zTY<Q&bj~JoZoxd)r-?wrTx0O6nG@UcOADu&snYZF?v-neAMP|hIO~3RrSZj4%z`a{
z`Ips8f46kOVx%P>3O>eODw7Do8k`D*S1FQ!6Tmlj*=@3bnW;lcb1;XWFrX7f@k9*s
z4k%Z0DBt55iLpW=6qnkSwv5Sq-B0D`WRIvyRc2ZlTCFO~E8AAAGHb!MRjb@^v_<7A
ztrlLr%3N-nDp+Z`wWwITU#OCm_O}f+D>n^qs%VwWjayT-3IpWUl&z9cay#l)KCZPZ
zT&)9NPd%t*!d`?X)H>lWQWwgXEQ~gk%~5!LXg5DHLP^=w2&~O?qH4^|0IjH65I!$d
zFKRabVl|_d_Jeh!Y6WX7*!2@Tvm>A*f@hP4yb#ed-%&xr=XN*7ItZYpRc&7ct>VmD
zh0sld<q|1k6X79?Jl#~jCYB<Ugw5+(=fqAP%(etZd`f-?5%`v#gVF!;`;Sm!6uEQP
zsi{A{2e{9Vf$ajmzfu_txqeYG7s3@ya8!zEUfDp>^z0+BWtU}7Pj0OqdF~!NO{F-E
z(_gwbzgNuKW#^n`0w}XHZA(Wa7p6Al5Z>wpN+OhPWS2#FMFOP}Ua{t0T3*?3q6fK`
zEeao`!0c^_AEeVgPyivdBa0xcB@CobVx&KJ5A?IdOJ()DOB5lg<$?{-={MDHD}g{h
zJ^vRbHzS~!{DXy&L(~sv`N2K+!|UL<QY}r@rZa~SZAfu?k=r4@K4;L@41`st?nFxq
zcYzCzF$fXn#aR^*>kv5nEIfAaJz9Nw&a$WIlq&t#i}>eLXeyVnZJ-t;GrXZDvYDo>
zbV9^pYb30YMXKv39T5jfkbFcsD1Uwli4xQ4rHtgIjO3RmBl(qyK3)neUXZ}z*CB-H
znh=#IgufY!H4X9bN)>*~?|OO9ZttG`@($hR9eVB2ru^waoAIXnF17ULC%F7kL{;3B
z2Nyj4^5S0_=bzaxv+mV9^xDW+`je-L$q7Jl?;lsmHe8vX+@jFjE7x^*eGU?{@{Q|1
zx<@X%XYWCIVz5DLH9kGP^z<s`>S@PxjvuUVE}wH3&x)?&Cc%7s{*-&{4z=5`G#PNV
zG>f6-D<^pJY2%A8y(#;rm7cUGmsmHk0xiITB>ecd<tJyU0X{{qmFFkm#z<<s6hPqG
zQN;J;{Ilh!^I#)|pMnt)u?J;a`NzLuD&XGy3@qPB;+OT6P3R&8#5=k4<n;0f4>7T5
zoH}9^Q-r*|w|FNHRq>zvn(cdg=JEd4;}VVj_@}KqlRI{8O>f@2ck9l*+ji|tZ`r<i
z&z`M&fIWGIH-x$zI7GTe%?LFT0(KF?(<gc^6;}t#LJCtZMh~4o&_56zjHBL>#87l_
z&<tF+hoC?H^7}$5d!R`aeT+R7C+8*Al;iJbS@aSkcy@|~mryFFjbgSC@M;wZZ><Cq
zFoTZ}L^s09OG~V#<G}31*>6{moOaI~Z#<uEJa{4!b7GRSGJ9(Ei);8C2brOlt&{Yw
zNg5DS5E2l&T0~@|IiGk027D+$b`2iPk-?|IQ2Z|40#tB14UrMi1FD57?naoIgDj1B
z&*f`)JTyi+iPxZYtjKeKY9JH<NDAa`yMFhXbzzHpl;ceyRaM~9v=JitsZWv+Ke&YW
zIoX;(F&_T3Rw^He*a}a<tMmaiTZJ5`lOC_7%5)lFkW<kXrc@i>MwUvWK5e#2Te77r
zkyh&<D8b=Un#qzBTq5%b@|d20+0~xFv17^DVqGed_%W^Ho0pi@9DKyXa~nQMILVjQ
zFMEstYwcM#+EsMWetZUC_~2JT(nGNY7a0&CpJBxT%97zZJCO)(D8Z^hvc-eoS~lrW
z2`;YbWUBeXVXAmX2lwak4q+g0;e91A0>Gs}Qcx;j#uc(mIXM+Fqiu}I#k6gTCHc%0
zAWmc&_dlXzX_N%FA#~iSdK?|!hcU==g*Qorp8Ij1+1Xwyl@Uqhn8BmU%w+Nf5J8BU
zf#iXB#z=H%;tPWxL$rVLG+Ua+*hTX{;s2bB7`lcYr3xP+OP*c<yNrm@Y0EBQ^{8=!
zmgv@lIc`=Ymuc9jhZrjI1jo{3%MkDZUrpCQ5GwC36E5UHg_WigP{9mgji>x!m`_}W
z$kb4DW7r@Uf+?|qCP3>Il2MaoZbpA{P|=~Nrr3;3*9)~gq+Nwdj5VWe*dVjMO&&%Q
z+D4y%wG*Skl(U$lI7_@RBX|n?5@by9O(&*grl(MxyAwdJU2+szmJ&W}aC95%XVMrE
zZPP^?7kp~q8KuM3X^@OQ5v6fRjiJrN7->05Q3v`m?uP_)PTwjWbYSaDki8GdjMFfg
zcuA`DGRat0D^>8GU>P)N-fSoe^II)pg1rH6x_Z^|YXDYdf^=$v0l{~;3QI?4Mwqrz
zVawB)=r3Fh2z<!Z6T!6BI*)!wfpw3l?3#LIP~KTtfF@8d2aw-4UCOiksU~Zv0VP>=
zETF<4Vz#Hf{xEyx1z9^Ur+`xmrP=n#g=g#(U_f?-=er%Ua9Y&wNc<C$yAuE{-RLlr
z7X-UQjGkp4f>=JKpl!tMv3*S@u@zoU0cAdE!rP?_`2#JiH%TsUSqJ?Q2;AUmtH-gl
z_px5A;@#!^H&4=wFP6D{R&+E&*$ofnO)SFX593fw0E*cBfhH@ZOb%FVOFMpDh>1%0
zd4%js53}?XU+~cZ`!?Fz6lE|!Ipb)UUoYlyrz6%y35OgCXU9@S10SF^=d%?go-k#C
ziREj~v~6%w)lh7qQQ7{Ki<o^-)Y3zrzKOc}QYyd4pgo@s3JA@2wm^a)c=o3XJLG>1
zjM>N2Qd9al?$4H0St2~^_?zrc|MuHqjqH%%1Eg3F1}=XHjFf%a0;*L@^-9)BDRUK`
za>FOoo^6vs3!>_s;nuts0@*e#gG~6P6@u?@Y?&@Lq9S1n;Hb(7g*`&o!db5KpqpS&
zYf~W$qcyqn8zk06Wji4Qy40Bnu1AI;f#1mRK-7{hF0i^S%PRZ<4Fgz6_>1Xzl2KAJ
zou4ijoM}v-h@4|wh`9BvWa>0^x745^<@aqy%iITnm~6dMkfdwZt=(nY=(26wwr$(C
z(Pi7{vTfT&7rSip&$ZtFdpBZ7+(&tsk&(}H&pEDf@dl&2LHxZff;nQ*5u;40UQi%Q
z1$|Z`vGH5$4x@i4Zi48<AoZH-y%}JUkR+XcFiYp89E8GBFOY`e<UkG&-C}7PZ;97R
zyp4F$Rue9cE@2sGgPq{&+6Z_y)@6%1NmK$Y`3-Srs<5iE$Y~C~xaZOyKb~kh{Me~t
zD%UJsBVm81if$A3#4TDVItO;DrgY*W&1xWh*w|F4BPy*XP2AFm@{1C=^ztlC>lcbu
z3w-Xb0;z1e4&&ZnR#?$sIXhstSRi@OY%*?=>wBzkpLXe<2P>5<W2K2Ll|ozm@bps_
zdw!9dr51BJX^Q$pankr~oGgikqdyK<)wS{PNOz^Uf2HpO89I&xnWW!`DL00U_Ecmr
zPLot>VMO|%w~sDA*;M^5jS3P`9D0{=hHPMmR81>cjM(Srbda&ueCmCNPUXtjoU7g@
zdmc~s-)K>Z#xCH*1jy6CZjBf_l9Z}9P{~J|UxNevSs8ZRv|JRfm)z+<w#FTL$Ej^^
zv3NH>pqD4pyA-?wfcqhMe@L|h12@*x^Ct`(phCU5CATCpK(F#?eVzp$6}fK@Yd$}T
z*K1TsXKDGU?7!zbN7OwSA46@5as$1dP_b(Bgx7TENA-Q1BA&>fFVgJc?4#Yr^dZUH
zS<KP;)#i?jy7mjc<ORKyNrHqTzc5-kIKv-bMyb%~{yr3>jAK&uiKN0qp8)y5A;y3k
zL4Zu4H-mZ1zSxnt(qYb!%B{jv9YVAXDIE{DCSz>Bt4w8X{#;&zEhYyr>S^<8epZ-;
z+vbpq@GfZdb9`lD`#eemb17z+L{Tn^fEMu)k0K9&kKi9kG!q076zCI-oLNi*<#0@g
z(~nw~^bm){^aejwqegx(X;dvyfJKc#K-}8UsxB}@vxVu|V0dPKK%cvcR716aOUO2f
z0T0>=CJIt(ru2*ETO{YOTzKiqy2`3Q=78Yu-<*xx1(2?(j~^2GS@JJlEXDl0+?=SE
zMYUrU6&8Rh@+IB-5sv0`fD&EoKh?8qT6_V54ARoGR6II~MUaFw@60L4PAJBxB>0O~
z@>l5|2XCerxJ_iuCP#*z_a&>nqtU~iKt>N_(Ga}G?S3aHgW3={b|W~=Z}gl|n|~La
zUkdxNoMJwx^%X}Arg3LoN4UR`(`+23N7}TekEI{adJAbYh&)v$Q6ucUH(po2sq8ix
zWaV%bf0h2(3gb*Eq=>XWwag$VBTH`Lv1tXSxXi92NcK$h=*m}|;gGJv!i8HTN1Vl)
zwP915$dC@2n#kIsu(MT5+ZPAbBHh0Q3Q`9d2Xwno%p*+&&3bR}_D4OK44kkTx?ANC
z7Su?=(l6#yltGS-AkTPGYS+$L5#pwu<PMu_#Yf92U+0!iTdXE1auSuCz8bSrB=%9Z
z$>GnJ_nE9srtxdEe?Hzl68}qiJr}XG%(*cIBWficLlB=%^^LuqwNmT*KHG0T(b4<0
zI9(ePZxrs#9Z&Fe66Ud7vXl~nR*(s<IfiRAO(=`F)HmH|B<PJL@^3FB(ZInqiW}?E
z@b=DgI3d`G&+cgF@LFC*w(vA@7sCMFXb)*!&x#R`>_`h&dO)l5098U0&+Pfa?)2@~
zyGTIXj<<YDxcgZ%c#Nl!J9C>RX=~(ZI5v{P{j#WQnHy!UrODGW8?pSfkW14aEns8;
zo;-}<tI`wVnAzS1?!SXbDQ{`}Ah2o;rF<Gwl~=h+zM59%R+vgY+0G=A$a}rEH%t6S
z*l9ofJN`^@gf4~YM#AxONy224yYZsk@4@j^#>%%URX>Fkz}+P(W&WT<p54{-^7*G!
zDSiZjpWio-MUFBYK9EZthJ)odVF(CmKeT%93obE7zu<F{GOEeJ;bjU=S&$?Yocq(_
zzA2|Jabs^t+XKH&KsG#!O+P?lKPzuQSEE3HA=2S*oigY<Q&6Q`6Pga@#Q|N-X4>=7
zWyFg*#G{C!l9H3}=VlWcH+qfSGp!a<f7I^NlG@;1_XJv-Q@~f+NNoetl~~CLl(u;j
ziSYD<>UxS{g`F?W>T`lp4Ln1YU-*;N1`o4VaXO%j5Z9cLfoLc}K1-fiFt$XXS&b@9
z5G}R(kg+p$7LybfDiZ2|p{M<CI%x7Moi=tQ$BUE3yz?a0=TNlb6ZPHM?_}p9Ivc89
z<gl-*T#ie;?`(P2LTwzQU%K5zB4pQKbYXf*h$2Ec+^cOb@x!hJmfVo*i~u?CzRzRU
zEVh5QAqjN)B9ZT6{>IhxWV+^l*|%f@OMn@b?(1#t`vr(MGa~Bi1ncuDd6U22^qj`b
z(5z?}eJpS~i#2!ZD=~1WhD>7mu*$`L5ZJ`Q<uOGouRbee5m;(q1nw5zlM#$J9<p9{
zDt!bU<2YT~!8H{+{I$^)#>cXn(Wv$wDGOM}5^P#UyZ{rdA5vaq3Deq)QbJ(8sb>9~
z%PT`fZPby${(LO9{w!Nh7g+z*tnXl|tdLS8&vc|wDyk-{TCRTo7PZ05$5Dwm`L_8k
zU0%EReD6=6QQF-1k(_rOY3D8(Bk=j|<-<3i9*2c6a56e03mI4~nIjTqU1NKj&rkmk
zLmeA)j-;II{tSXStGGh<y}XE-W{-rmLWjK+Gq}|oxI=wa;@954f6oW)z6|#?u<XVX
zoi{ql4av8&m>EsK1q&Ah^PFBTh=Ju%DN|92G%Dj~@ZGGGVB2-#w!gC*_j%OoW(`O9
z@x+OHG}}R5PX4M_o(7KagY!zV@}Iqoa@;9RBM8pdIANA9<pFh@mJxq1*odu3R(#x5
zqd2(S$)nhbiMXlN&q=A!6az0}-a+rarJ>Wa*YI0q=RV779{G7dFyFnXZNHq*gZ~7c
z2O?L00hDq0lHsSl&~kgDK?R+RPH(al0LiYw72J%kX!bM!@Iui>Ux!s{HzeC^f6(^U
z2fG(V4#Sir;yYj6*5oHC1<j^Uul%SB83s3?Y$!4FIA20aFy8>A)k6V_rVx48evx;z
z)oC3QzRWG8g;%3Y>B#B;pNTU9!ciDy>-Bt(+Sqfk(o8q0{J>Su6x0b+1aXgjIza;k
z4Aj3A^UK1<o{->W7D|-CeaBKC5si<kU2r=+FEn?5{uKQ(;Q?CCa1*LH92yy4pRJg4
zRP7`a;U~){R)xsMngfNBMH5Alg6J}!z+y2xg$e89`bFfqkA8%c=TdpD3-;j6`=n7S
zJ5G=tme}G@bX?SkRBtQ9;|FVk?oA<t+%5VMkCQ64u0pe%H9gv%$VmaUE<9MG3jD`O
z8#fd^H2hi>#)4RzQuX)w1~7i89P}W2(F8akRAn%M>TC~U>YnflrS_%~9wFC|=x&Dl
zHR3`{-mP#m*-#rSmY@D$upo+KL?ob`kASZyriqf_jd{K=nqF68o!xMy7h8kZsWk4M
za2OUFdbhbyz*Qok{8rL}vesNG0`x(lP!0T5UqX5Odpwr#x5#BWr9lgw_=VZs^5-u{
z!DP~7g1<#bzd`-0I<MxW{;*}zaeFM`R$2M)r&M8hMmDX|Tm6n3)lmg~LdOUz06Ne}
zG|!V>UQqd#se!Rf>5TTM>QQPQUs}0)@lxwQ(F!bx#Zuz+yDn?}#KeT?GpyYDtf1Ue
zA#9|VrNPMFkAV+kO$a-dfFvstGZ}2jvA6T}KA)l0Ui5CgT+8(HK2O!P@2w|{vYGJt
z$u|-KtXxnU7+yykjy5}u_R9(bHjs|;$zH|CUg@hU<`Z{xQ#co97tzedYTU=P(j}wZ
zw&1(O9{PgI8Vh+-uo5x&y1$3)&KI-s*>p`Irb$Xmr@6T5=;&qqtf9<uD!PRCm(+;D
zW6zS)Zp0oECOKoYt4yGEic*0kzh_;odnfG1I;+*f;xlmhMd!7q{aIjjMU&H#6vZw~
zX}(lrf6@`&yW!&p4;(;~*@NQ8yZQ1iezo;^+@OIkSlasm;ue<BsF?xnq-GJgRfPhX
z|8c0nR8{Ik121EcgBIgq&8&X6-iH*K*XpvG7`|XXt92~fI{c8L#4#M9nS*D{y9LkV
z(`l7EY3AC^x!i$MiEK?RxY@8M`8F9Air2H@nyz<w^s1?|meaZZazVGstXB#RO6Jw}
zQs;Q5{QU3<*t~=+B$={yzc@VZgG9#&4}f5D^u{Y~Uxl%pgcJ4*fqxtJhv%wv)!{1>
z`nN?OJv-xmUN6g!7bj1ow*+s+Phdt~<AGKPBmlMw`WFIb(t~;e8X!I*Piu3ye15VR
zIl`=N-78(8+Rc#hiv5kS7IgU}Nwig<G$8^?lWyxEInkx8%~ka<z}DwfTAM4+zjybm
z+>1sYQK;{qxADKpRPHdvE9iE4-OXOdb)#0MkyM78R^k`z@K=iuR{1n?y07rGUH;++
z-bIIDk+(4S3<YYGeror$L>Dou!h^Z37`@DXPNPBm$U8IY>NwMMQwiZTiHjcE<_}GL
zz-*CTsDfj_@xH>GtNJx|bb-W6=gAAEc#f~{*xS33-rddRy{@aFp~WG4s#;#70R|u9
zBD`bVt8H_$JZ&)qosH$~Rtyv^LF}T-Zzx(u-jConc~9W~q4h+gH^qy}2<f3BCQZFs
zeAM0=Pxc%_H$1u&nOh~pEx>vVh`p=zwv}42_?)I#wW92?gW5gQYB#)yDBl>aE}i4T
zMHp|r`HaG-NBhi+L0sxQrvhu9Q9*qQchfL2m-zgOU=W~97mVM8?%B%-TDAGZ=$~oC
z9kQ&|@hW}qx9@s(U<TIi)2ucwzhcPEYN<b_?#f!=)5-j1&Hj-ra@@0sKgW{7{N6Mf
z(qU+BEDvV`XHim~TqTH%csK3iz;DJ&J;VLzGn~v+@Aw)4`)U8KT*<)ol3sd0mnDQ`
z+~?M))(mB4jJ*^>a=Vgg$U-8*JiQ;-fN)=bbYjC<*tV;M(lJi>a(;0bSMn2+@_}q)
zaGwH0miQBrG5hW(Z&n~%@xa|E$De$2ACI@mauO%j-7w6+56hlPtNf|_1crLA0G|y1
zhh~JOlop=t=^qqhC>^H)sCCX*bx{vcy3A~Xd+K9y9Vfc!BULvAy9E@qB7BwDg0P+v
z<Vg$V^(itQdT?}*<c`--C8s{6h=kSi5)n=M%PNKBX3<E<iS&$FlPXW4m!3DUFzfu!
z1yw=b=G_7EVN<Y?sOh!wPwQHUu2!5$_l38wG*PuJ+=moL`n4OXh~0pG*xwY10=_GS
z1!ciftnHP!^3i3^0@>QZ#z|N|0)%p9;D@m5<U&aDgIy=htP&W{E9BO2k$3EQ5I+OR
z7^w3~fo}E&VMbNLI4_blSXo_C7w))q@peK`p3ulxSXqgj4G;%Z7;<2Zn-1AQ_%`Uz
z556Hj{lR-CWwQI*!6>Z_P?Gy|NsO$iQ!9{z=OpA2q9u=pJL)8&G`-S9+8Aql$;wph
z3da<GOICJil<K%ewMT$f$g3fS>Zq;F!(s(5+o!{-F}(|1gUB+F=E=8tRAl*KGsW9V
z4+~`H+~nwct{o|HBDc!LH-{s3$Xl8h%{RfJH^y)G87xGS_B?{x^>9V)ux@(W3llKF
zL1qy=)GT3v1=(V06?g|71Ld+T_f8hN%#k8hwA*lZKfJv3+HP92yOxpX#HDXB7eS&c
zfsglEf<^{orDoHs*Y=H{D{3io9!^SGOzSkUH#)ECZKxF6WxioExlQ^sISJZ)3-wQh
z1;gHeWH<{-2(KX}-G#W6VP>JJB)(pHKTq{7l#a_x_@F`u4J4FR$%E{ruZ|?YQfNRF
zY|kl;gw`BoiB7yVk?M*;@&^idkhM!^#0gP|3o*T88{x>2_d-|M;fMQe{=V+KvsoUl
z1Y>3t5gdY3D$lHi`nG)pv$;w`9Q$Qnxy+Ld&+3K3up{ke2);E%xMGifDNko$)q1Cl
zetwi6FS{^q9qVJJ8_X)=49ziV8jZ@JF(OW**CW#{K(qi_V-zvSW|P+~ryb6hQ2|UY
z2G+Y|N86OcEwX0~jRLE3BRit;{3rF&Od`D^pFyJV7b<qsju*=e&4@9E-H#ig`m4~6
zgmqTSnAF6AtL--lB*`6c6hy@^+~tu)6u8CEl{hqQ|I8=SE!9eu@oQp5_DfreIN%y6
zo7u7V;UDSKs11+6f<ua(ny3(AW7UYTCoTjp`+J)Q&zT@_8_<|8bLWV2Agv(a_%D2*
zKG<HfOr(L#NKJ42_k1A2o<F7aU_f{k`f3Xg^VoX3oi}IrV+vb2S&R0VHJy>1@4OI)
zTpXNK57}GBItkxi^cLUI%{Kp#NP2!H(bj|ASgz_T9>Xcg9=TyR##@c>^o<%5C;@eW
zo<_I)s1N`L;iJE>C`sv4T|-|&!#>NLk&!p+@F>=qXqn5eIc;)a{)Hw5VHjFL6uOxO
z$UU7_rXUrx7da6v{O*VJ{&|nW@|(crTy|DpmTLH^JN@m4wxDaAZGi&noxJeb_#cJ>
zrOaqbd*72hTNW$Phfafr9q@`j2jiI#SUc~6!6*IQ32nb6ZW54#0L2Jkq^J)ix9X8$
z+p#Ul6C|!2FXChU_?CSc|3gQ{yDa8y?p<e<OyIeJZ3Dk#<xQApSU{~LCZBr8LYPJu
zUT(5quf^ckQ7jdM`1&>xk83z_h-r4Jca82SfB4h!vxdICJq4<7k1)kUmUS@V#XUch
z9*h@=5b>!DV~iQmX=montQr-AX2NPCvn#m3?-Mr)@PaE4;Q^9feQ}mr&Hu1}Sz%$|
zNm;laV3Bd`$JrUN!(Ih_D}^cO`u;y3hl?DSvLeXB!wKw&et4p!5+&qd?<C`3VUNkd
zk(D|L3n$GijU^{xoy8%*nnRR1&Gg;eEE(``1pf7oQx=Of+!|H9!<QO4fP%L~y_~_2
z{-u1%9k6<>fcHr^k1tQg4)-v6Fjz|-_I6v(@`;`72LkGDasB0m3b&WTAKXs@p=e~_
z;$YC^kNG$vzHdVNv_$@WVPaimVe@?4t-T}kiSs_1V4GbVu}l*AtMAi{c$kXEQm*}s
z1A+~DXkXswXb`pa8=`;&4|Ot;MBNWU0a^&#t9h7M?bH9}4USxaewguz(4lx)vhNIb
zUk`zTgV>)T`_KXT@f07pGA~~m&9om8n&Yhh5RxO43ytV4;=$;P59?Y!=&QdV?rjXb
z6aiJldWW7bYl6>!P>gMYn|%vczC|CyOVEAT;rqtHpDiP1I8iwZIV&+qd8dc4(-@Ao
z_`Slt`e!IFn?#Vre(yL?1c4;;ixLP8g8?zyp9jN`ygrwlvAl;=UGCkjr|^fY&K<el
zdTW^M3;Q~@7DVK2(ut>1(G@F4^xpSe0_$#`CTvd#@}LNFym=-aW<SWh?tE|)!j;No
zU6#aewO6!_XqQ)bXIPcK7}73_G6*vtDMZyyxDE_v)Lvk3%w{tm9TMl#I`1yWMrEwC
zKwOb55>`HP?!gBk<F>mt=!Bz5B%PapbVXOD<z$c+mHm$#uG0x>SD<9|r_;L$k9pDR
zU?TRvl^Ciy;`pYsh((yn5gxKD(FTwSlS@q46LZ7lKL>K?128~({w#*HQ`#<(tYgex
zn$C(0KhN|D5OK}Lc|m}u3_<xo0Gbq+QS>4AD@;yrkRu6h!;9HGbwTCP;|%H$>r<h1
zGf3$c+kZwapz09?GRs;O@tEk%H*tIYQtqK|1~m%z0<?r~5hl(J*+5@7R+dQ77OENi
zB1R<ACn;@N&Yi+&Ea5ujiZMN3tb=!i$!D?7`)G75vd{KYQm&9w;t@*-c1$nfa&_P>
z)QiCMT{4UK1AnrtDOSZ{v%yo(n$L~|(x=fAw}8rvUj|Ax%x%mDj=#>Zp6H=ETb;t5
zHCP*BI!8!9iI&^JI3KIH0%2{dfo~Y&Vj&awSG$1ZC?SeMz-vGy5Z^!&y`JDIU4D1e
zmoPdyZ*BNRn^tXlpr<z8MhQS1MTsw4-Y2q=c^R=zt%wl4a7&kPg<-UpkyAZ(jf(KJ
zY#x~<KL@e6Y_|BccATy@t+zMmXW=A5FPl%^Q_<Rb>s91b>(?CgPL+=eL88vlB7wBK
zbv6a8hRgqc`0+>>Ni-IpsRY)}-~&5BfMFr^vuVIq0kW|jQDs1h61JpH9RxVRE=fp(
zKnV(z92$*W{U7{_fU>VDC8^@HNq`Itz$gYN;V{74Gi*K=z$Cgg8jE+&Mts71W3dZg
zc~?wmA@~gF*kZ>FEPEMvkYq|U_324Q<>%f05YX}u`W1j7@;F%4NyQgTZ?!sB^oyv}
zB0*gGs5gWCWt=mSNL-}2vkLW0I5sH`km+&7s2$u@&YNd=ylPhlWTd2FcAc~!3*h8t
z*lKc<W14v}srkR^?mk2g_FeHv_wFF1Ve&fe5t^N(XL|u#;D(RG#gXRUfI`OSwVzuP
z6Oj;s2Pa}zTNqcDaeb3X9ut5$<JnGkaWYghDcs#W$9?4c3(eKiL^ttocDzCN`JWXA
zI1ujQ9Bxo5)T2!-jxGUz(0q;Y5Kttw(dXlPue!yr4(_iq=q(w9uDr(tLw<IY|A^~?
zjni|}fB|)yiP^L|uP_~q*=Xn>&oeYZqWDczqOu@?XLCn?!fNP>kJN^S^sHUPGii+7
z%)nR_vSs*Hr2<NUG+Pjib0$6MeqGf8wl0B<aYapPyJn^f3AJnEEat(*-0xVWMs@hR
zIYJO|Jh6IDjy#7YCRS#>a*gVR(Z^)t%T1V2MN9RC0`K=i2t+{)`4LnogQI;c^D;~1
z&?quUMSJz5D8XVqcy8jdeP~f~W%3z%4|{q&i<LCLUUQCi4{RyZq|_7Oy`e$_;}6Ke
zg*o-J-EFL}C2qDs2yVyUnYtrad0xHlm%j4Z1L2HGE;cGJmK|YGU0x?5dn`yiR2dQ=
zKy$JcLOf_c1cnfkAzxd~jTxWmz8}5T@j(!`VxbXNbpWR~)7>UKSQKvn=Yi4mtj~CA
zO0Lau6lT`ICgRvrfrrT5W!qoAk&6V|$e<bmJc3N%kOkx><9EB>8O@fwaEeshHEZkh
zrSTCAaX{c!Pwzwe;DZ^ZyD<YecQ*BzD&x|zSB(&@?{*eYn3~KDl6!D9UP8`4!4p2C
zD*=KKxg(|8?X{#uWaq!dq=bXm<;kL_y)uFz--I&eq@7RC<&=@|XTEoDNJFeSb$#Fw
z$$Y)Pb)0&!*EXUV788_n1I<>m04%b#3hmFq;DLvK@Q{ulnwGINa3-9YTo4r-k|%4$
zM(G+g<XJSbFP5)`+fx%^i^b*0Z&mH{>Q(TXl5HEHh9@|<JT`kybO~A;(=VvCv7NWD
zjghLCOt$(ErD_|)3N34#wKAi3E;|1g{^<P0u%>DDCzvA?Aw@*$Q0T<EPd7_B4Z#6=
z<*!3=5#inb)nFSo48&#uYg?g0hNfo>ImIk#+kvDNHxhPhvKpZCKmN0gP)r|zO>|R`
zSMh%Di=o(ik=#n(0nql!J_KAbUhHtOBH=Qh+h%rdrWUt)2^)vgNauCP((K}Xb*#&9
zUp#~muPfEfZ)9K1yS<iNBBcX{y>0?BMSK#jr-#KK$U`(f!9PM<ROgR{fIfGXeqOp{
zZ-H;sG8LC2`XRTmL6P>AktfK&H%#3Sd*sGISuHHjOd)F;&YO4v{72gESKBSt=3bLe
z)80=u+aPM0g(Y<4jk6Ulv6<GGj}tP2t?xrTk|oPPqQg9}u<$^+B;ZN67Dj!gr=b@Y
zA8D+i?LnKo6DWJ|<;kaFAe>gD8MOmj0>$AQUs4r4_WUB=LZMTJWlCrROkaw&VA`#9
z#j@_;QS121bHdV;6_k}T+*@QTwLSrYW$NV_mSyT2$#HJIx!f=22NjI8d{2ENMsHIg
ziewIWF$>2uh&DfQB@%VHlTYHvXk9L+l}uqyFP?r9I=0~0qMQPyG_htlwL7H0QZ-Oy
z$M{8bu#gK^NI8dzcul22FsA@=y8E!6Kx3b)fhwE~gC7nzcXtL2`x%gH@_0!nQy)2z
zB|(qH@8d)8ihbJr`AvSLd%9d{d2IU9a!ye9)yXr#r7`He(+wT-^y4bom`W3tWyD)r
zOBwm>7?y{J+&{6AixiYrBzy~f=U$Z7tg~UTlsTyy$o%L7-(^9bwSn|wB38+xuTBGe
z_@gM^MbCP-l=)KeW~LWwaN^BZ;2P2~@Q}{~N?h%OJSwo7QMm=d>dJLSxhaRgf=I)w
zRq2E6;b4Fv;L%M^n}5y}5q?CfV3Ic(A(X=S8f+}IIS+RK-!P@oQVwmCurzaTo`HM<
zS#tp(y=lsHewaucysj~oF$ISZN0O78ttHe1(%5o5gZy6Ke*&3)EDj;2<!mEq?`eIW
z>c87<y*bm}pbx`wZX3cMxkf;vy?bn8^YW})6}1OG==<)%p5Ajwy0y2R?oP96lDO6S
zeyPjtb}eha3wR=U?@C?!o?ljb?`$-G!s&qDhntu!WwNXcJ?V0P$J^<lefxRa3$%X~
zi_abwwTQ~+uz0YaN&X(9Q<k^r?K)3hQr?x^OO11ndq&sa9ot{uIkKoULSXUy(;eOi
zUj<8ezJt@eD%gE5k+#$(B5ex7;J$}bsQ`M8!KIvhZo-Kw$CDCkc$m38B@abZ$NRa&
z>xsR`S%GWzIws0^HB~H|0UGAXY{+po6-hL(J?=vrX!+q#cH6JXg}Gl1Ko)#0_k>q~
z0J|R88NgsbEsn=ME(aZID-C_7E?=$u_8=}6KhB$Waz^QaMq#TMb1%+o^7xv=)S!CX
zb%MgoG55?2=z5!3pRK12jGCMli<Kp5POJ}Ra1`Pi(K)SrE3##b^0HM+LvuR$PMVhD
zXU!Q5<2mK6ENwoyc0GyKo+yYBX5TrU<=uK1p<3#G{p`RVXPREbf=zR?dW7!T=H0oS
zwWVaRRB~l1639%$*N_BDf=Ksg@XPKVNw)cqWUD=fSu90ix2z_Zf9Nwpb>8x8)}jCs
zMMzHZ!?D;3a)%gPuUwIkmyRU!+*)o;BD&-~1m95H?kJwpR*<`N%z>MGWzWo(0zFrS
z=6G_&Y=nJI%AXHEei3mvoYs}B?CGrx1GW{c6d_L$bFC&*-V%4hP3oDgHR+~23&hej
zIM{~<s1;Qr$x&qG1ei>L0<hPNU3GOB1Qel6&LLJe@f5f|ae969;jwBWEU-4H-!*6<
zb2WvS(L$x}kbKFHL6eq9H)WuaD>TD@n3tPli;V`^O_&a@-$cMY<7D_C{RMMDNXbRi
z$voEJAse}PV_*IIes(33IIE?}PwHZd;z|tb907?7vJ5pA3LIBcM3x7-({@cQ{wZc`
z>ilC;iFmpHL#);d%mvKBKX3f61iD#;dJk<9Dk=#NxGnl1*+J~O)x*geNoouVI5uHS
z=yW`wSU=)CU476=u(f5x&PIoWpq1{tzz3*Lnoz01RmgiCcjO;j41EFG-onXU=*J_B
zJ_*-f<v#G=N>qTdDT(G7^=)sn-wCx1!XT1RolQ(TcZUoNl?aA?vFdCtfyLoxL_L+$
znh0<F#~m9_ExZZsw#Ji?1B=Oj@0ee$@R<DN@Sa3KC>jv1cc3Z;hA4_a7aE^Go)%Rw
z)-O?jbIWOSBW#Xax=UoAxZ_d(&PJFht>w~lVWB@rpGm2VwyCp1p=VB+S1RQaS4V=-
zvKH89qT^(FCd8)rV8f(SH8K}NQQFq|78%w&GhB#mVe*Uak|DI-{j?gCjV@NC{nNOv
zELc1M4pF#P!jhSv#vBHb4a^<%FylQh+igCrAza<&Cy9X@rkBu&!}Wtx-I#kBtvU!a
zAY|a~x9Ksl(7$}6#>4(gvYGo(pHYWIen~$fpErG*Eaq?#P<*lb)5O8SLD88j8ea=9
zgXX8bW1idr9#uL4sqktg5@T@Mid>=l8LRN_W}SsqH3*@VB3I!`)@?P$mMS8BQYW+?
z6FE_D@r0A7kV+Ja>a&kLDrq4F7KczX<F%@QGWIs?YB`&rRpZ^{3`-S)C)xCaNt>-@
z{L};ir=1hh89ZCCdL>D`cn^ZBWq`cKF)#qf^JeUoGfiLCD#@(|V(#H5-d7^@ksaf*
z0N0=UGr_@(c#a=cV5@w>(8DT(?u7lm`#rKZYTs7|gG!&_RlTI{_@pY}&VHaEh}mto
zAH5$K%lz?#C~7)cO?XWSzIV2_?RE;rRp+@N6s9+dn6Rsjh5K?MF{&me6;xeQnfbk8
zU!f(6_u|p-4YK!G%ECi@=QBlQ+>-#+PNa8R9%lUJ&`7$5@=8Q5fOn98!BD^~=b7)S
zz_c7$i5hI-ya>y-(*xS0|5|>cqQa?E-Pv@cJKEIAlQZ9`+m=(KxS@q%hv`o~eSW$n
z-pqW<8^3x<$*l?L0QvIfX8Ej!ZKT}$77vyjRyNOS%W7#=5}^;~D_ArUA@jI98?mEx
zWJlIatexl78PAN4I{L7B9se0rou5elnq)rq{I+>0bDOVuyoR4UdpKbxe$kf+ASIVH
zhh2L?Q1L|-mv%uiyd1he>(97TGQF4|H$gH#B^G%Jq_37c;zMFS&9yaf_7=jbKQ359
zmcZ4PCN*P%kON1vEGwrToO+V1<qylS{a%Neh~H|zOysN7lsri#=aCx>$<*es>B=sG
zA}Jt~?+KoYz+ULBlad?NZzXKSM%I5!{mDO}3Ze+Ulj0(#z{dpMTOm72kI&5e;|WIo
zQ)QqFA^YOGWiILYW*K8vya$`$)MFL1MwQMehsJrKo3~N}W5YwWxnT=t>PvuM=1HYv
zDf}dkgg@T9=(z}|TaUwoB0|D7iMva@2PD*|&4r1N!DzRQc0R}MNxj!J?lV{B<X;rh
z<xHp-iByrvJh9RPb$GM@wZ|S1Q6A&<HMHE@1e7Y{sDDF`yvNDY@_YV~_x;~s$gFZ6
z#iv|BY#H)VD9U_TJWl`&8BBEHxfYMvA4rR3ztXoxQ7aP3D6+DIM>?{{g|~MPrXc$f
zFxw+!xiB|~1o)RVUuvu`>z@;e;(4D^kF1PHM<{uVZKZnhZo(nUSX-C6OgI$fLMq&a
z^pvL@t!>8TJ6Bo|w4oIqvz0PLtvaHD&y79%`cR=YO<MDyhB?Der>;nErXdVW=P703
zvZSR6iY%=<-)HwJgKaWpN{JX&%!||6*2CzkU`CrDO*s47ivIaS)-!}tYfGf^{F(c?
zYyODY94-7+q&P7$a4#_7t02(OTaj|sa<SpnC;{AgV&9k+trd{goWq!0=&krD2mr?N
zfrHrUTl!&CA4ctIK-N~?6<2(1W|)Ki<2kmXSzhd308f5<LvP+K=C;DsP@<W`GL$kQ
z((`S$7@6F*R+4asJT?T@Ix^%^tNot}+SZ2`jtyu$h@q|s4+H{xhX3%6w|gxghtfLH
z8|3ZydO*$-^mKrpc3&oCU<N+OQyhR-ilra)1f~3Ty=Vjh<3He{rH=m+2k$jr=Yb!f
z47Iz-Yo8`FKIV$74{jz$SI5)O^k{_gd5}AHofrUIcEt526eT_aiBXn%Q6}SxvV<N3
zki~yUk5vXS;YF*U+a+>mz(sdx;=GF6J%LTkxS*b~-fT>9=nq>>o`XCtgfuCaK6=|y
zq~q&kCn{a;e9ERiy*__~*QufL*EOQ<@qjN6$Jy>$diOLA3T+L~cQ&@$o@u}H2_U{L
zzYeHpx?;a3PR+6B-WmmTGmV>_9G{M0Tn}b%dpF&E!gc9M&k{gqa=RBgzEOKJZ0&}{
zr^Dk2Md(_x2jA$uV+wY`Ke2y8IVfO}e0P0$NpB|4V|Mx|7tr~UAxV695d=dL<K9Ly
zg)?jnKSJ4h0mb4ZZ4V&}Q=Wqf5@|4JN01>q#<0Ikjwx_QpcI!Qhi)h<&%@U?CRWOy
zmU|F+q6W=-5QV9k(;G9jF!>vMei4yTUu`*}_CKH`TE?qnh}2<<et+dfOdKN0v-I;5
zrs!6#>PT18DgFi%O)Fna%!HxrycVTsf2%UqR!39Mn_a4E43Zw^x3ybk%2Fyj5DCfn
zL0P(x$<&^cK7(j$)wr4Q3OirZVLTu)A$oee0ZUxi(+fV&F;r#5U4#k{VLi<OL|6+y
zH*lU8aj{FL^WSMJRbe9v6hz_fXGNI=>$nQ&eSq?poqc1qB!NysPe|sqo^D_k#cQOC
z0CRYu$87X4DlFl<$0kSLeqTQoop{Q$VNMFg6zGY=-@os)I+uG=g`l)glq`1qT1Al@
z--lNfr!HxE`G{-Uw=~N~PtbiznSh*B+YKNMc^3FY<aB5c8-LZJ$J=iGP|b&5TG@+O
zN+eHat8AsbTlW4#ez!ciF4E(L#h*~CN2eh48{>(jX~g99ktN1Otqw%zw?y#cAF3}!
z2JOVnvmZW3En?D%Aq!>9nXu?jNo4H6NGtpQfKUQr4-dE~&lWvOAB)C_rFnnb;QQ<~
z#Vt&!wC8Z&T9M1ySSmF^yEc+vA)_D1>!DmaT=q3a)<#OuR9d+W1ISyJP#A;jj-}u+
z_8lUb-Mb5o4eSojI$F(wjC!$Kj05)F53FZDL?m-b{JJO(P<C?55Rx%+@q*nHX`<$e
zW|N0DW_y}9KgZ^{tX~Jq_$?5sSCPxlcRv2CzDzf-f1NM}YSn!tShw`oRIOV{LaZrW
zy9yzjG#|E-udbU>>NNgr(on*&8P%gS;9S}4qjp)$VYy|(wl!X`+4R38)0^mlC#Tes
zzo(Nn?itaQ(sa({HUlklAGZT9OVb+7xMO!;<Ten@UYXwZscbR7AbYNXo8<mJ%ej(6
zg!su)B=}hDJjAeSOo3yNQ9xgVa89>lP1|DrwOS@<8F&e4|5m1Yn!lH~<NL;*y`HC5
zy(s4Y^=H-Kt_%py@`ikw$#ilr=JU;_(Kh`}@_Hlb&7%7*;nkvq1{uFPB(6)G+LZaG
zv8vE=$9!;rU_L>OCB`z3@o+B8S8G~rSXj%Ie=nX-zK6_LtN%wZUy<ozwmh{F<1zDl
zN&4L>^`80IUwt{|gGF=hz6g1$%B#yhW1^G7QX4hJa^c&s_E>S8+c3C$#QXyuEHbil
z0^Z)ce8*$-Xr;%OAb=N#L`v1UAu#nXeTVxj;{kP@64rVXB^L)zkQd7Bo*;JF>;H(W
zRst;gGO0tdr}gU1BMPB&MPedD7kYfEsmP*d()l!(WDxydeG8I#^X1j@m^3dinlWrX
zUWnIV5)mJlCJ;mkn)XThPWw@g@w_EMit+!nO3UX4n-(OCqr<+d^>!|`F|2YpLo9^v
z@HgB&GO|fl5xResd!MCyUvX~$DvrZFR^Wt%wosdiOgvj7LLl;!%57t3@oRB?&-A*l
zXMVcR`o$pVnv&$z=yeUEW7i&wY)sZm4VK9|tRwLqdK!G)>$4rtJm!yVnM(wbu9{||
zDQceRSX3_ItvI_josopMj4Pu@U0D2{&(R53^rrU@K)nj(ZCb7{+;{EM(i(rQZdIw2
zIdC^Ms%NHDRnPS<OSa*RQ7>~me!5$XOUu(~?%xSV(u=e*4BL#<>aZSaP{1zXfD6lK
zzgok7ES0@Ty^LQJpoYm~>`?=_w#<B^g+v(2m@W_R+U{9+a1bnn5hc9Wiy)Gg(`wCi
ztk-Z8yf5BMMt3DEWSWr-Q)`Hr{G|f9g&wskY)<_b!50hE7!Bz%K{Y3jX)XzU#m^^2
zo>QOdEFIWUB9a~x%fc#vMX)hh-9|E4i|@XG4oL#2J_Ba?&1MxD+0lZ(jh~FIPHjM0
zW#UbySASJyd*r9{O4pSu#gMTMfnn$6>UnvkBr?0<W&13>oWr4OHiWO)!X!#kHbO03
zG~lKs(gMTT9U1GCSCa5>0ID&N6>yz?84wt!(oQrMukAqjurg8O8BubKs%j6^e^~Ue
zGwu|X5|MHiyP)n#S}miGR@eYOL4f2VDuX5k-O)Mqw&Ag`Dlav8{P;`#d_~Kf=Wd_-
zWqI4+{mn6fJE5-Z7?Q_2!>^^~A>%#3N&*ea^j&0ysp!Nt_Pg|1N*jg8cpXIUm#*pf
zf3-pDhe<<+llot#j|;nLc)A1PG*|l%E(nF$Vch=Ed0SWEt2p|&wm#oI@8gtk@N>|j
zxFOWyO|jU&F+AV0LJ~72@l*`#vCfZK(eHn>2(Nmuz4%iNw7e}tT*Y2*8INYpu{~fD
zEltPOu*&6_{N=6bOjDvhlVWuDd4=+DY>UHNAjj;a=!X#=gH=gf(9bvqKnxt~qH_2!
z%Q(^JF2N?-^B7T5YsunP$d>#UOUxC*yUco+-W#mS_kpv%bKYZF#BBel!SbsInu$p4
ze~pSgJ%pV9N;rdlQwFdR)JW%jAO#gBfDr17nOVgGBO)I%t9%ASam>_{*HYagz2Y^Y
z@~|b)aCN2+`ql!icDRVBZ^^RD_+0vB2=a2XFMOqKtWp$MY5*&iQHx&1pUK8DI`uep
z?_FU*HOS`H5EES@t43ZH>|ED$=&*1tmJCD*mb|qWFUdC9kzibJ#k{m)w+X(Vey)`z
zUZm&6$&o)qOQcLlnroni{LOHU-_}XRr;EZBlNQUS7X!w@+Vm6Sp$)*%oEc?)2-tt;
z?OLWlEg$&SMpCN+YGe~B%=?fmM?O3$MTZ8*p|^^Np4e<2m#5!L6<n$7DBscY&)~sv
zS>*_8ECw<BbD9%pA!>}-If#NjmD%QlVy(^(w#M3~JNj2WrF;=}2i=im7t<1~rGNk}
z+6W4vdXSzKPFcNP;S*EX82He87%B9M8LMX(<0In!mO44ZG6<$|Q?DSe)kft1YL2M5
zQ})Hi(1D9wa=Z*MTkW~MGA)po|1Nm7K<g?-=j_oqk9r-fqI7AQUk6@N{G&Ol_<or9
z*ghf`w^Rc8L*MJ!>v(B(aQE)PnT9S`tCyqudK`hs!OqteT5A=bVDwcCVD*koY&GOy
zf!$KAHoJ!0mzeUc>08H-#Nz61l>B(iYRaz}R0WD=itgx0x*5Uv1O*mp6_}a}g=X*G
zOQk)1o&eILDXPpw*P|ce-A|HU%RB!_k9bV4jQ4((F}Tk;oCWh?3H$M$1*dqomOq6_
zarT9OO{D=AGA6?_3j?~(6Rw(kQnen*Hv5+n#6PR1a-FVeYqC88vPybzm%eZ?9OQYm
zc-_1n_dm|m)AXPp^#A8@6F(=O`ULoYN>{LlFL-+1cSMHiO9gLkqxJgqpUu)sf0Co+
zeLWSI{6zWHL!fCPE}GXbG?TV3)5&p<-}qM`*!$Vqr~h^IVsw)II4#$o@c*qp5>J&-
zX~<L6*^nJTO~3(h&X1eR7j63c2Hlff12}9+HEzQB;HC)E+hsvYph{rBo0>92o=R9A
zMJ1KF>Hl7ceKHbcT-D>N&CV(hK&d22_%c|AK*s!^0;FUO?7CZVBwFA9r9ZkQkNZb|
zlo>_DW;@2~4sKJ9UR#^X@f!$SDp|T#QzTmc>HAL8?p!>vyoIdZwt*PJ$fM5rswmtA
zhVOle_ZG8MDwFeVj;&W3hCL0vr(_Y7tjp!@CPwJ)h%saGY3#EtzJ^|3(}BNhM+cWW
zk1Psu&7U%8lqaHME}?cStE@s|!S`+XE*v4UWZd(4qS@_eurQ1NcYd38aY63DGg_I}
z`?BOOM!~-Dfx9HmjHEZqTP9MNAMvECVFc|ybq^Omgr$(*!<}fZZjE2aqaaf;xF`?+
z^%)}tj%-zq|D(hm|G&*h|51bg|8gWBVH68?Gy4N!c#~~>K7YzIzyBOZr&FJ1*y3k)
z>flB#6DC=}7fvy6qtn&H^o{w;Sd2wIS3gKU#?I*r;=&y;Gi3mXwHdVC|1#3{3VTxT
z=MtBnfEN0sdGZd!;=pyro*Ero-vboORyu9HcOSU%N-{6@JmA8S@^X9H&7B^pY(HfB
z+Lav!{hZ89Q*T|9uY&f<(Q~*Ru(=4bVN)OP?~Z3W{j+ti@qLcy^a2R|99<rVET7+D
zu8Qz){@e_;J>vm|k0a1<bgd;S_(kf~$f5+ZP6|Q=ppzK5VPDSZI_B{l2eoby4)HWz
z;itt-jFg;TeIhWRfI&RRdiLR*h15MfJlrk5>PLjEVCozkGPgx3NlvNnvgXdR;qefj
zh}Ysc+uW`N?J@fu?SOPt#dfxC4X-cXH^y3vZ@rrXZdtFhG@z;HngSKfv7Tg8inRQ%
zK1VyWgg=WU<^?027U>kv#P@1va$RQq!qhi?eRhQP>hnN`1}-JbJ#C%6H{5KM8Qy3`
z(G;~iv7NRFN`T0&;%Xhrh?XgObs1ZwQg3tC%!RbrKF}tq<Y);NE_7ZunPsy2s2>N<
z(^U+8iec<Z6hR-kh@@KOzNIDKX6-R;Bs%n?S^#Ggz$?ChCZmNf21_iFq;8Q30TSO`
z4~Mk|6K>g>c^zeV!d#HbV?Yhz)T_70KvG>d)!zG*#dZDJ;{($v>&EHP#fd2;q4&w}
z&_;1^32eQr)rv7TNjFTMHH0Id<VaOo!)5ft+14n|UtT4d3h`197&8$Jqe5ep%YEUA
z7XmXc8RMsx=!+(&<JBqVv*pm!-JV%&-CXaa;AL!Z^YgU}nK9ToBu-PJGUj*#V81vb
z*k^K`<?{mCG%jKccyd=qA)XlqMeNR6|0@kjzcV`QaEzjCJGK2D^>n}WHS9Tu0=wD<
z#07}LaUOCtwO(+VC{JCN>J6NL(`wH7m8|``5M74y8#o+sK?a?r>Fn4EtFiQ&&CT^D
z^n;IT&q<9wRizZaGeApvZ9#SzkT?&^aneDJ-(W-<Ff3W<r!I^C$!q<^*h7n*1oFgo
zD6ka1cv;xtFksuLXJ<3beOWpT-&+_Ek4<X5T?hS3k)1nBx8JxRjGB<pH{n=DDDmJG
z)=a=;QO!;*|GmNSi6+4T`5i95J#zYbbe-3>$_L$n4e$uj6z&-n13W?nY#aZ0grd*l
zZ&^c}!fts(Sb?7;6D>Z8Tu6SZ=yb9CLC;GA-=c?W)5msz@V4xRMZZ-#KA**#w{b`x
zEE6k{xsZx!izy}iz|LYJbPjbWPyczShPtJb7i9x>VCrPb($klNlYR<##cFtXP_FAC
z=tB`6O23%8OX|G3T};-5!4@u~Dd?PE*nj-5J7~b_e{%<UAkb5C8<~<@Qje&j%vaq@
zN2g`bysRi|>0)<6gW_8sAvW>ztiNei_t_S`X2|vrx||NL2Lrlye*8?n`Nl4J02U}C
zc_irmcTOX2dgK@6onJ8;N6Vk1uhrJ>^UjhIMZf!L^EyDg{Nn!l@q9HsOYr<w&wcgW
zc(pG5n|!7ait_JYP^;|r%NgE7BfvvHFBcE>fCE>kXbCee8;n4Wi%lE_b040%vE6Gw
zq>wVX3N@k>vG`&r9zKs#TK=ZqQBZ?y5oh;ozN|sB!S7K!$2V`?97&2=$RN0TH{tJ9
z{i4cO&&QvgyG!pbr=(jX$peb^T;v4rxfd34{F?q`ntD<YjtJxp7;#wyZuR9*A%gIN
zpw4A!1e|++ea`!$_Y0zw?fIH9wRMk$HB?`BC{z#;>Ki49f6i<vh(SSEY&SGr`RuQD
z{I5`KL<_<&z|~u$#?kNsbYSazMKSYBDPShwMK9nA$k=*%cn3YK`}8sSzm_0Oy{l5P
z;52|G=mL_O!e05Q`yWdX&kNXCuzpkk;H*l5ZaHe&PX_7!*Ahg=_>Uzh)cfUMOVD!B
zUw|d3<v*4nT-kptLGKa&T7q;BFbYZjmn8@S(+6M)>huXhgzMUkyoebeJf^4T<$~Iy
zC+SZILat(<-!A%*`qzAW5hEX*Q7-=~2NCyu0_6Fhu~U52?C0&uT}~Xq{`tE15bGB}
z6XX#9&;&gf{-X&Bp+hB;q&GE8mJKkBR2$-X8#ffmp?(BNf}U>F^BMk?1g*ro4C}kT
zL@#2y$sLFd4pAd712X4V_t7+3Jc3G{JyMkiWvF+?{WR-EAMv0w4nZ}S=PNN-_xFsq
zFYPTkvKs!{%ek+SLwX+~&E*j`e67~7l4uMC(`3z-yDPW_6J^fyXi0!$(!Hr9(^8y3
za(PoVe1mkwV|zj}Xn|Kyu&OWa8`(c(z|s!&NQN9KJHH27Ol{YGa0ag%ji{1$pDLPf
zVgM@BN(N$pa7i@B0@^qrGDE3832y*!zZBf>|5yQg3F%svX59e%JbpOPcNr~!%|c`9
zuRRNbfk0of%p*f@r~Di+=}D(uj%G#7Z&WD`tE5S-mIt9nTg^g}{;e;5S+v%sxc|nY
zP%Wuf9CZd){jdB*L`NS}mDngoQQzyO?n-&gB=tpBJ9nf0gX65xZ4eka?}e?rt+xd}
z{x&5F)Bc>RFyRZ$c_*Oye$Lu*+CRT%(()hwi__BiQk%atEPp1!6gD;_x-94~_8(T%
z3+Cq%AO^p)L(1Nw*Tg_&Lb~O{K7_5SMPJ4L$TSl|%zGJ&Z>Ts|HA5?U4eu4>Q60|D
zdKN?GmMmi`TTfAUgbj-HY0MUhuq-45gYw02qx6QV<3kPh4cduLP3j{wUL&)<Bg&e}
z9Ap|1OzAvsQBM3}#3i-tjEotN)+FiIc)bk=lvZG4)HX(N5@jc={R00eC6jiVV%_|6
z;oaWeLgcVmkFq$}#qhTT*}n{>tOU4jT(csX^dM1As@dE&FR2BnW|Y#>x-8hd816$$
zUIJ%oCOd9LQon3b(em7!4=1Ik$%Bc?#Rx79QL$TexNsxT=SqRv9sQ-rMI}}c<e=s4
zEtiCsF%G7r@-n2+jeH5pe1QES8>w-4RiTSqjRq|+NStZerCN`UW2%_mq7)6Syy9(0
z82r5|!lfGtMQt)Z;rP};hKiRe;1>#`7lLx+6Q%=0Mc^yZoR@FDWS<EGI1PP`$wWuK
zaWPiD-GLc@ajfA2Gmz)jPD87-Z*xb`${d+bn8LIfblWDWW|AfmCuKF2TA#e7$^qHC
zT~odiWxw6NeKV$AH+Ul56-K#=3blCk^}*Aq(HqY)8l?MOy817YLN+$HDtgA9MdR?}
zX~p<1or>e>P!E%CV1nlCS~-#i&&kwp5AXZb?`XFRFJo(jyLx3pflPg}m0ht=DWGta
z`60+NSrU}N`9pzCB>^Vy9EA)hi!YlBh4kB2Hq+Ft`Z?*5W>%mgD|vZhkPl83&F&qQ
z@}v~0Vew|*%c*p=ZWQOzz>f@b+47Jo6jQ|y(Wi@@j>lLGUTMJ%{%VVRNzzPyqUn<t
zmw{1xsVp}dg_(u*Tq=I94!r`kv*FqVi<QN|i5yF)?bWmxaH@`kC(rt%@*TEnlvfx5
zC(&~(yd%B2E0pZC5O08-2Mw%wH0TE#%WQm}VQRpfp)I#*KHH->N^qY59IP0Mur0F!
z8y`#dzUIFCa_TODhwi#u44&;}n43Zjkr5}<)*HpvXj2d1e$&0m(3t3-@t2>1mxW9?
zzig%<#>!q4<%T?#o?7&#@_lh7XZ&YpODX&HcBkv#9otT~(pH<@F7w$HUgIiTB_>)K
z7N%yO3JyA+n;{SHrWKpA)Z>MyaRnz8;cYeU_0U?k<xk(kM0nowyK^l_JkuO<h>Ym&
znb?HbR!Tm^pCj*f4h|fdm~78KN8HK~_j=vbag7QO4#%7)FNc{@?V7417z~{hY<q(z
zQWj+R!e^_X>ZwqcK)Llr$Jq-{@`Il^MIm0~llvdfmSqCb*REA?QL#LV$V;WB0mLTr
z0r*6eWoQ!KSjjrhJR)ThUwZB3Xl=uLj!V1B^k8&E%g*@^WIn6va<uW}MvjurEse@0
zrW9YfSak)z^(bX-rMuN9>|MvjfhCqpZJD3eQlFN;|6n1P+=<qS8$NyTbT;XUZ>H4#
zh5yp@EUZC<&3F)`QTWq9foZLYdgPl<sz}A6jkxAM{!$9M?mid{GLPiE*{xY=LkP0o
zGiZM=I*)(C*01{?(NHJS-6rn$)}}PkK(F_AoA(7_?0qlOx6SJ*k7EBk0jvSEAKt~t
zwsGrLqdxPf4^VO#TUkDK+LfZ3;p<0QY`ZgH@YsgG)A#u8{#5CG2XOpy|KnJLOePQl
z4}N{yHyzs-8elzxbJ+F&+GL@RE8Mof-l+Gs_j+H+@n3sa^Saiyx8J^BskyYLc$pXK
zE)QY`)kRe}KG$gZ!MBU9CGN<b2g3O`Fql9gZf`lxCGxNCS;4Pao?2+K0D@$}3?q2K
zWko1$Hc=EPfi%J?@<(@6ec*SHr?ffpq`9>5;wbK5^;p}}5cy-|S0FoCu=~ti>~&55
zP+bBbK3KA?h5y89;H~~Y_U<W2(yq}GaM`x)F5B+1jV{}^ZQHhOTV1wo+f(2FpL1qT
z#9U3pTuj8=<#k48WWIZ^wVn}#i1JGkX-ua+(<2v%0S}LBjkPdrhbtdQ1q-B0;ZQUr
zylD%AOb(%Oaj^sC-f;@|N9FJ}JSu2mX&41(eG*J?zZ3TvmY17a;6Q0Asgh@|v_Fws
zew{nO60mmZh*QdxiV?nYs5-F45{T|BU>sd71xc7HF`<8LL2gd9MMsT$q~m<BxkFT?
zd5*?yV|aFOWTp+FoI>PyHckkiIQ3`QnDs<EiKNw$YD}Rm!&JrfrAZu2B=zu{V#YF4
z9ePCYhD;cu7-{aQtY{7Yov^>z#57_>N~=@c#E2O|ekqlJBSX6Nvov$()QO05F|BvQ
z1X`qICzwhfN5^4?MLb2ieVOS6=p+vJVouCR$I0ASz>gSO4Y^qC?xx=)kPIB@VK|<&
zq+WsswSx>NiyshQz>XPa7$QR09wb2`tshwZL(6n>P`{d|XJSvnz=$->o+f3lYd$YV
z#zc=IOVV%)Vw@9h`jfAJ0YZ(5`N!6A5%T?^pzep__m$q@4kF&+4ri~CK5q@}=lOlO
zc|Y9sOZkGi-moaYz{na4Ol>y9@&P>`{t{q7{fDi?0F8jJ7qF%H6N*l#ipsCs%y_nT
zv%o1cATwIIpywbusR<9d8`dCItdvQjnwKeRkIVn30j*^4W9(>r{A274>gK9ZHzOt&
z%5A0^q_LdX#PZ&f1NDCA4)&5QSd@LhA$Y|GxcxhNKQlUtaJI<niSP-+Q%=!=Kbd1M
z-)QCv!Gy0|0_0JEb$S_=AEzg58^(Q-@J9rt6P5`#K~Dm421TGD?GNqgAUM50S&4`}
zZ~`OTv%D}YG@>nw=ezGHahM@9gRlGUb|H!II`EwD1zITzp%0CSermT<LD!4FHr2qc
zW_d0w2699}!NM?^=8tuebk+o$`ODClK4_rc2kxgMruh}U>2ZTJdC(v;xiklefDs)f
zg7_s{<h9?+Zg3`a=-AxO<mUvr7;kY7b6lNus%2xd^%_!jy16bgg>=Kw8wuho)N5%)
zm;hQv)51(-Q$SJH0tfe!$}Q1#T^Eo83BorI(#g#)35*wE+PEHdrYs6&1a3j9v{srv
zatxxM{k90C&7pRxQ5NWv2p7*KUKglzi$o<&2+z|mb;K#XheZ}+&Ek|@X{x`A!x=%5
z48|EnrGboBueRn*crxdxux!Q1#Tv#e6t9vIcr4x7M?U&vxtr~BM-~5Y%ku>CbcR3B
z&Zz3Yp`B95(!R|9f_AF@1?`Z)f*gme)=cJcH*74>4Xx=$?1+A2Ex%)O=l=`ZVX<%d
zH?#w(<d5Z$)7O&c_fz<E-eI^PG;O+@iW`2uzNhYz=9Ip682sHBN~A%ONTGlS1N+d(
z!Rilgth_GvZMCl5*!i+L-WBt66v_5?+ai<G-~2zP2?RJp|7GpWjzV2+gdHZ3t9*UM
zadm_q=h__tM$P*!FYq*<{N)3(T}tz{oYvM#epln%7k@c^O^EG!zXK-{Yz7q~i~*It
z^LPbL$6B|nEGp0_Tip||`dj@nbbuHMhS1vRs|u57YpdMHhUcxESI#be*jxb<Wddd3
zDxL>zr@t8F6B%p>C&bXpORbsO&PdGB2&v-;Mo*lEeH!!31sp(#IW}_%Qk#;eCzmG6
z9exdYBTIC3o5}sXg&h)+cIxSTyydf<Fa;3XBKgwq=6e26m@u)MgQzX%@mGD}Utk)r
z1*9rtTL#10lv-YF+TTH8{L^U(Ncvx0e2S_0wg3W!A#p;C@EePZkIrFh>jRSg{R&l7
zsVmC6JN}}=!<hR-<_eGhtG?q#maMy;iIW0F*5&Wcz)XdBYW?88yOYOEX7Pe2R<PT!
z;Xepd`=|HiXDID_IO!PhGXAD(R|A>}NSpZr1CBI5q!pg&)D8rfvBLKKH;Gxus>(4U
z53yH>ZJp501$79K!2jOg!4jxE{PA~)gn#^<Kx*B8`#WHBKmN}CEffd={tv*DVgv;o
z7iwI?Z<#1wiYhgpAL?lUBs+G~67uDW%8fr~SkLm57LybLsWQbb71y&mHBh>&e;^-q
z;6RX7j=O|qcy4z11TbB7+`0T&H`r<zW0<)aqIg!RzHceK*mj@6*@Fi6K3OK;c-Y4B
z+m2?cPJYv;+6%*rlOOi};+9`fX3s{qx_#M;t(=w4>aUPX2h-0Wr}K3R+D9CB!2UdW
z{{tZ_ez|1yTWrQ`gFq=w17n~l^Cv01*{99k_b71f`ij>yKDp8PotpQbXu49RBH)=b
zNFBs)_w4QXO{bxjg!-It2Jqp^M{`=5Mw9;u?{rVAy*8)(Y5r75=};**eAYz!)BGo)
zbBTF`ZjnXf$KAobJmy$Jo=84_p&Kr`+YP;fuM(1J;H9*-^mYR%90&0z7`#maMRl>I
zDc5FcxBQFi8LEhMKNOA}#beh+D!g^H6!9jfzX1jj8Uy}oh|9<dg*P0#=4mVX*v+Ej
z#Y)4O;-1V=>cJ$@X&ByjPu|&73J_CCbXBEzJ=Ldb#jz_kTFs@WYkyw=pbQhz1!_R4
zBim^%&Q&Q3d#I#hFA;YVksO6rV@w3=H;SA&u*e3i$i4Ubw&3>S=uy?RvmaxgSO*2J
zdrkz_<%PXPW&o?RQw>yZR#w!3Y{h%~w7x!aXUT3kb1X@)3GnCV>~0>KjReg_fw9v|
zvblliD!&5=^i6xBq=@{E3i4G#2ug~0+4=ty$PShXuN{7lk;=<IJrTs-Gd7X`RCdr9
zdKOBx?PqQCbONyiC!K`@_1VUm(AP_Y{zf7x-S3S)go;)EDbOPip4uC(M1I{zS-~T%
zMD-V6I<R3&B(R)<JfTZqegkPW&l0(dWg?_B+{p|iN$|d}j+N~ZwA*H*L0%{O+zI!p
z60UGm@6Md<;qJRU`eW_zDc2o4{R4W@DOVsCe@7eXtua1oKvK9}Ot~{aHv?rOVe$UC
zm9z{-zWq8u81%yX4^-#Lac8>nD#0tg4!BkF$l_HB(9_cm0b>tIi54AB(Z<-8wa>9B
zo$gBU7cve_N;%7(P$hKX*m^+PxMRC^@}}@XS&##C_`<}XW>)E9*j7%aUb|A)AHe(6
ze`ct#_6;9AZp7Yu3nGh^b^hB76~=NX6a))+Ju*iK1R@^O(1uncQ)xSX4!@SXq+i$j
zy|8?|K<r=xZ&WHYTn)sU2E7V!IMn7|BFc5jA2{l%L!b0UISSNgOkW2XmL7OIs$%g{
zm9gvfV<oxb@M5O_M(oi32eETjLR<aM64lDM9}RACI1>iSN)fxR#V^$0mls!%mq)!f
za85L<yADa*^kNT!cZ~f%L2pz$&Ft%QNyZR_;f}w<BqTH?JRw~9Q<*Y;Lkq!hymTkP
za`yf~pkrK%Y}wU@8?B^sQKG~G9wq3goRvnCSF%VmBiAY~)H|YHL&3@sz$fQ3>5MLA
z32S2-)?zQ=5pIr5Q;SwIvPGo8GGdN&8&FZynpgG;=mpOdCb`Q-!zqceXP%0#7_F$8
zOD@oz6c;bdjBbwd;`S+LaJj;7xOSKTi}|uerapr_u@%ZLmtq%&Z;QZGZqUa<CPrN`
zql#P210ie}%PU;$ECpqp;H&9+N^gN-IJ+A_N#B@<{lF+Fbtlk~QJs)soi-3t5f7}Q
z0Q#QRUmHNFH?o_BAse^!9@$HGY(A$swnZ_bg5&&C4m_av-&d&MdaHAszaD$S*IA_$
znLM;<!^s$qYhjJ-%(Rh~!^^JfP4)Ht@fGj7QQT}X9$R*zoY`${Byt3gZgo+^>JNbZ
zZ4qVG6Zf!&D*dgHQw#CPSp+j~SR8~PStBhJG8}`_Zy!y_OGM)B$k8TX4MWI6z>urb
zZiCFqTFMA@FenULqU4UJPJR~r`!^uSfCpc!xVpzsE23?039t>Qa()L%+3D~!Obfj4
zwC0~`W{x}J<v7=hyfp6TLcN}PM&a5xar2>K(mSB^t(>O;GojB`g~mhZxrs*9@9SeJ
zI==a)L*UUOJ5aybbBX&;((o0S(gj{e=6_S{Z6g5^ix>*=o4tV-kidZX1-+s&7*MZ*
zfm7=-aQHR@sQ2IRDmlB!<Gi<;WdKIKT-mB&_2pNLS6m=()poWX;a4;xC=#Gcygxyb
z#i8XL>tzAm-1sib2`!l_CSr&qnMZz$tJ!CDb{R~|M7n)JkqGia!~n~LrD_~Q4<2Co
zG-Qzhwmm&0)L`4uNktS*WYmp2vcRs6I||`ROhY8#Ha-8X&1R@tYOhKgj$PC(G2pp_
zAm*^nAk{}70xpmQKR8DNaCJ$AF~`aY)F`d9Kh-5!MgIL{Xgz4$>H}~?vDy;`)^pNn
zM_YzD+?a&+afo%?ttPZ~Kp!=bgcBy3fcE`~yY9lax|!G+x}%hgq^GZDrcCb;E{o#X
z0Su#u8f9|k+`#YGYZO~{Gu_IHH|Q*7%k&UU%!dDHhQy@%Dy4jst^9us@D%N#{qF#t
zN!<Sh;L#K|t<M-+4~1HtR#L<-gjprRx!1uHcPAqgL9-ceC{ooWPKsCuqZ7Wxr(ylO
z0rXr@qC?~JGQ;BmIqPUol7Pd0Dm=v#`CtAH|K@OF-yowh`!i8Ups@B?`<Fd8mGF&r
z#1=@#dGc|^Fd>kNIP8lUrB_Prh)B!YqdQ^w>4P68R1i+ZN1VG@Mq7A6Y3gv0nPEQ}
zq=TWj>31(#KO4MNxHThu<kUy~k&h+B*dRqZbrRGcdQ4@>GCae0rj#w|n+t#FJf$3%
z5Zf=2Q&Rwf)2yz$mzUPX+*N1f?8+|}6oy84Df&VckCH1TIFF5=Pnin^lseMbjKw59
z&SuwV#)YVBC~6#^mhCohpR3N$sw!aGJ&YK418MA~T;IY^9^BCFj=+BK9whIpm<jv_
zwN4wb33=QD$ZBVSn}O_7U6ud5&=2Nuui)=M`(S_%b@U#%aVsAIhFC?Mw3u#!I}4Ng
z&a6C<ww-gZ`R*?#wj@s$oxz3}x7|=EE-DX$x3QDLzhmBsnN5;wQUjlx?3~<6fNYxn
zT;koMb~S>X5B%-uaXjpx(z0%`9v=+qPs_uxVaCK${QERAz3)je;k>^Q$-xO~m=5AB
zlNQ?Od8iW23YCI`Tl#T5TA<nHStPK}`!Dyd3m<8daQX}m`#ktRpKT!_w}5O?oS7(t
ze>aoD%VaAXRvLC?0$X78K)vaUBmoB9>3a*fT2XYJaQrrf6uvKy=VIY)p8JWA-@xBV
zSC#yzP77oA#8ME0^mFf2MegcsN{eJ$z{Y91KDJ)T3H;0gM2+rl9;1&Oiog+=sH@`s
z&g}OLoMjPF*S4RjQJZ3c+W+Lru{sQP%Hoi2%L@^v3pW_GARl}bQt<<X#3Mr|YuF15
zqb%(}3-RK<A@@t7V~r7Cq@2R`MNSwi-X$Le;iNfj9mu7&JM_5j@RIzGE3~|)@?56!
zx=ko)YLZ#y?N?~iGysy(B<b&2$l4CVV&OSYrXS6tM*2$Arpk6D7DZAcxIaub>DB}g
zHpCrdgDuy3jEsU=P*l91#;sOe+`3>8$4TJ;5=2I&s9RI~UJwxZVMy41zlYGIDGAon
z6)Ntnk$#5P?d&6_Vb0r3EI9~+ikBx+pp$jgDF9^j%FzMdE^?=fJL$i~v2Su~VI|58
zgNs=HF>#SB?W<8S50YnodZ&=k75s+#`tEJ5f`imdnM@l+iOGLqPwd}#!T|Oa6OORK
zUiPv4^Z857fQPr~<G_y}!2sgHh9g1fYQu|}Y)VQgo6KUwwkk7WKNU_G1-=6->;>?2
zxA>2x@P!COBEuHQ*H7M@7zl&-QeGd&nh23s6JVirYam3Vd@-LpK0&9kSheBCudg=H
zzjB%PTp39#$Mjy|^Gy=<d2yXStcJEp6_hJIw94WjP?GXEvkHSui1M_z0hKVsT7o0#
zGTxaRdGYhLOk=zQPD2I7dYWF^o>EiYesFJ|vItkPx!!ObLhd~LC31sMFmOzX2u;*W
zikbZEkgCW&tRktgJhu6lV%`Qs+2u68+PNYs&=Hm$a9Q>C^~Qa$HP}FL#hxvXD<cCv
zH~kzRGrc8^x+C18Alejq#KJyx1V^~ih6Wgz4R!o%OW6nE_j8(M6&!?L%NLieETrO1
zuavI1+p^epRDFAYFjWMMRr4Wdepa*gIBgRJGJ`laOSw{%`7#71*kvVJ5xR5Jw}51D
zu^J{u2$P(9_6|C-H{zS=;g>vw*TcW^maS44=KlYVTh{UKHu7KvZ)Ei}dmeqw_3sZC
zA^#f5rrCZupGyegqxsvm-5JBWOi$g&@jAaZmUOJDVx6hs?%8>}|4hC6w`>}O9lq4#
zB4qLYKI%SZ1O8eJSbA{g?vlJBXBz~9oUflQZy$Njo8IpvmwpiP9DwZSd%tov%k|Ty
zCRE^rgnkU+vlfZt>+R|rd;CNU{n^D{4lt1vAVgy^zvUb><hw8Ocqcvq)RxW{%U{+=
zhiS9Zvwocn8%_eR*w#`YqsA1vQeEDFuh|!9eU4%KS5n>P!^`{iQ}n7S#<5!O@Kgu8
zq{ly1-kaDwGgGUJ6S{8?J-YpQTF9`5$w$508-y<}ZYyc&-N4jrS7rZg?<}sTW4ik>
z2<ql<_japoP6!|us3eCTp)4|#Cn);5<VgNVFwv3K8m5KpOn#c`+CSvZEEO$Dx8``z
zTtb=hVA+APujrPrhpK56jWe(9lcKdI>%enGjdm!F){HhQsKE6?C)ZZ%!C&T((~;j1
z)vqYEzWw9~ydO_oKW(jb592ajoK0<>{iLY#-jP6eXNSY2RG9v<vk9X%z_shM;2q;~
zN$tnc4$gyT_Zj95``aO&l$<d|&84w{QB52bQEu0ok_pT48PzaM5AsH!MT<LSRG+l+
z`36=wP4*cDXimYFrKL$|%Zh_}m0oTZRiyRgQRMHv;lR}xwAraY4oz0~UDK>mpjP_O
zsberl1WlXes7%3R=MqzZ7EKvtGYfNXi=2hy<4og8h3i%eDr&rCym4D^N{nd=5i&A~
za0?q@3RdP<VxQ<$ZO|1(OxB^jq`cl+sa=~dH~t>qzz-u7Z_gBOvQ26h<J|Jni1w0>
z)V1+sf0M=Ho6>plCh7H?o9%W17q{qm+77eb8{IFMiClnV)Iqah0VRWdkw3guLWF|3
z!+ZSyn8~7aJ)U9j3-bp0TuH@>+F<nKaPLI5ei!5`C{!mkF3laOe2L>Gz@+7nn5kTk
z#$WKr2wy?Qwu4uOk#IfQNEF;$GM87CeK>ewEun)k-3iAnw!R>C6I1i>bbUB_yWKKP
zzF0KOr&}`xCd(|#n`Jikv_N_aGV<S~o%v(DEEH60R4g>G%;;TDhXw*u-#tbu0^S=l
z|7TBYFc81;{d^}Wh=M#(CLzL!GpYv>Jn_L}v>)kq8hXBIUGsUd345r@E_;f9@C7Ku
zChUc=#q&<y;7jQF2z=r=T5?n`NH75PC%L_9l|sy0Bo^y+(H~G0WV5Q@#%9y;JS`}7
zQ8KdmYEhAc?=bB>VafD3I#MFjdY`ge9!~e8_Ef!~Du7FPwlUvK8K658dyM2tqGxx+
z^q6_tS-cK&Uy<NKruq7+8!)*%!s+{E%$ya4M7{QWH@dEdTEi;5fvg@6el@P7V2mUb
zTdwq?m6k}!0l#zrBJl}zQQXe&;b><MZ;VLFeyz2-cHItu0zm3Dq@Z0@XTHwzykk;I
z2<mGR=`@UKx_17QUdlv>jrNx)CduX{>I@EFk1<##SMQ)Ft94sgJ(p;Q>;c-|ZLRpM
z=RUu@AgaAAqm7q?Uao(bEfc<5{JDVPp^#rKJ<c)T0JGaRle=Y)xYGM))VW+?*jL{7
z-ur9H&lg=u1+oE07pJ#}_uB&yx@#(Ye8+!z7K*LR^w4Zdl8*7|XlRjE*4Eq+k31;b
zsyS05VsV-yjUN??c>+icTMWa2un`lq_rgoB`(e!&VR;tDc&x9(FuAqiQm)aplV!A!
zBSUo0g-3oLNeP7h#2Rme_&*3<2$P31#oN<P^j~SYJQu7?0S#)fbj^i5HY4YD%+xuY
z7P{_yEjf{HzE*_VVTGKXX;z;1sEe5K%%+|cRDzv9FN$E$YzTtCKKW+v_DZfcC4Z=K
ztej09?3yKp2r6X931dZGh}mR@q#9jGZ<J_hBA96J+FIIYg69p)U(it-@7oE&@V$gD
zR+ZKy^Th4hb8x=*0e70LU*VxfC#mA<&=$WrD0)0`&&P$G;tQFmen|tB5{=3s`avN-
z?r(1>j%+w=)9}2F@<38DVH~eUm|Q4+1ahK=kPm-76M4zP4bQBch0VKy-j?UEUu|&{
zYW2FP!GPS1{8UX``DNY9+w<J9<(hCjLeuE?o+DJOH`KddQ*rfmyjmJ>k~&;brU6Im
zuuNpI3`l6IC$lv)UTW)V`B{|d1=6)4U7hnC^2c;-3+$FqNJ86gI3D(nE|b7@{Gmd5
z$!RL0q>cPxa7Z3Z$E2W~$6Hdr|7frar6AM6Zz?O@Alhp6dZe2knnHlLdx^*kvljZH
zg!9N7F~tLMbs?>~54l&lJ1{Bx3N9TWZ@1%b+2Qlpc-o`02PQN~%38~(x{6BZW(bgq
zP%dapO_N08x&b=tY48Qj#qObM?a7m%C5qjJK6~PHOZ15vvNsP!iESoLnmVEsx&^Cb
zdxgirw=A>s$i4#{M71)K1&b}^i9ZD#$Z`sir%>TM_Nsxzpgc?bt~_v49ake)rFTHC
zYd*XX?&M<)(1_HCh+*J^STHMxdE7uY*?h;}7ucDl{=UTAIo^A}>Ib^W5{1W>wXWp;
z5Wq`-WD-*_e)HM$2F>cE!i84n5sO9z5Y)LLltRN^M&CFV9Z7@R3OSp7t&OeZ4YbK~
z-InX>Ho3<(QZ`KDc^klI(-fdMJB1~H-UJBge1GA_dgpO+T{{iKdbTybqapL1XPwum
zsv3j>U)88%d1z@wsz5O`0mbL5plDr=HQ0%<jo<S0WivE1>Gy}uE=#O?tje+;5QE~q
z%n<7<Y4bxd6RHOf_Ylm^WJ)SSH|H9KbUhC4aBU8g2;WoLA631Yntw2i8&gMUA8oF7
zB^ECnEs4{kJp!+H+hgGAcn}P)%#PT7o-r8@XaTZQKh2ar+VFR)mPPgpZX4cFW+HDs
z{JAz@QY<<?-u$(U#kXPRX}k1!Be5S*kP52#bw^i`@@0MMKO`o}i{csCr9_74&%%xh
zQKLWtNbYNMRljZs$fQ`q$)OL&r^inoz88jZgezBlAlE&@gk%sTOhMc9%L+5$wLNEi
zdJIoKseH1Lu~9=gRkgWF&1lKrszRl0*EC{WS%aq2J*jPjRk?P~-a@|4xkrxl*QE`%
zxr#N`meMR$8g@&uIV#v_6{RL=!GZHamFFJ|)m9Dg?re9tiNG!h=~AbDokH3~I&};y
zij0Drw;5}1!@&6ZGDNkEQ{lzb3_uO}CIV0GQtbuHuOV|0noIC=kwvH_wc2xXXJb)*
zQIKgeSU19alkNil!0AitNGAk-$%?gZ2#uk%CaAm&unAIyL9%`!G`IFLPigAk6p$Iq
zO@G6(xKphlBVQ6f;D~N$c)$z351;Y#1}W^1s}&V}JmGC~@PXGsJ$;H8G3-C{^=TK#
z-(T(iKfCk)*`5D;>`tE1&ir97;T(&(NNF5GZ3IF#@2o>vE2kQoLaIH3`^ksjm*0sl
z&SNx5aDG65@lHDr$mea!HoC+ruyp>3!9^K62VGe)jt}2NehNlkgCccjpaO>#N<)yZ
zY6ZWm1!<};re`2QJEDrVhCT}}xH0)(?+*8aMvo)c{a4HVKIFJm<b=$mRmBROKLf|?
zqcB~TS%ogrbH1tmvA#2l`jgDZ$T7XX@SvmBeE}SdFU%O>+eMnbP%<Tij70AIP4q*X
z5KIoSMTpmJp0F_?!(!yOSdG!BZg6t2fj|J_ej8TH*Srn1mUc1ros@;;P=C@{1xffG
z#gX>FQxN#bHIXTX2XdAq4uKIhcPU+ycyo;_Vu79J_M4Q_WPoApsn+@v&G*+w70eb&
zrc}%%n-fXr%yB_T+5bo|smJ?E8sZ7y%_GxytRb^IQ)6mYO8!RTD|Yg{N19I$h;enb
z9w3J#f31Hrib2$^;yD-T2!~sXnFfHt|DY0cr)b_G0wy|Os$}O^XXG{^N9uzmRbvrr
zc18FzL554!H^h=9#iGM5g?fFm^ToYI2m?afSpy;h(261kB>_6wDws@3ULt7R7{$<*
z+ALj1l9L8B7$&v<CF)2W6~Dt87|o>|Q?UO|59U^VLl!pUz<0&OwJur2NK}Ty<f@dG
zCe;l<1i`=*%Z)IN5e6sisdr-#`XrleEIE(SVfsb>k&}QZv^-Owv@2vdHvdaE1(AW%
zYL&DSkHvZhY15s4WGeiC5f=LqLFjk(j#|1UCJ^sV#VHnq;NhlG|BqZ$JLd>Ql}cE7
zo5%-qA2xxIh81oNgMA=KPs%z2V%jChXkI@#bs`H##fh?rk`Z_6o|hcbSRofyXWR&j
zZMf7jHef2aeu7-<tk0ZkkLG}45{7OJBCDt49|_rgI+!^5fD-QD0yNqr8BH{y&>sRD
zZEf%=g+-ii)o!})FAK|E;MmFu8$U~)&kR?_(3plnOe0Ep#;-H{8SDG^RI?}m&N2(K
z{+Te%4vP$#PP|E*40FP~2x<$NI)H2{1NOI`cLk|bE!&Q?Y7M71Y;U<#tc)~0!3&(K
z-s#CC#JmcXr5hIVXHW#7AAFlTQR<S$@$5B4?G<8@mSPReIM*-e84#!=fPXb}lKB=5
zVOerXDxyJ^UW6u}8kg~nYdQ%mjt^06CGwcp@!D=8vw&bmt8YA8gwSLn-f+Ww1npgb
za*MEWjqm(*T|#FYCWR0tN>Cc3T6vml#z>m`Cn$3#$)#O89DfLpphvQ?Lz?qkJ|I8r
zKUR<I$xpmX5z3t%ncqj^?zcmXDbwN6#uOd9d7WXuQ$O@@!Xfz)Gx%l<ixS5GBdp9u
zYM>e9A^Bory-xL#v-hBRId!*1G&AyHj>t`LVDHa|xUU=s+bMQqg?2C0%1w8G7mhre
z{RfeYrReKCr_!2XWy-O_X_%?jbL<ft;KMcCPv+U=Rdpl|pD`Mb=LG=sUjL$rA%IMO
zDd~lL#P~71yK0zA@Qr+~)>BCan&AEj<)_=;T$ORSNS?kZ=Y740@P2XivYPx&=4IMr
zoRmh@{fgEvfoly`jHUugT}&J4%Yp>!#YVA^J&l4lxY0E8Ln2BfmVh|Qn0d9LCmQS%
zPI!*UFy>ZNhI)-;F5Ury?WLCFhWsg<$bFBa^dvXfn0TQYb3>piEk^1=e7Sl6(2zb1
zJhF@u62<rPhyF}ar^s;*k<f@O*-27a#aGE#%!MD_Q>JO}1PBc#GyPXP+}@6PnZXkj
zMfJ*9<Ad=#+Tnpy@EyxHLD`8<KX>rL@NekasbG|4g=t$bOyoxFg!@#EW`y9Pd$q9{
z!K|xj3X+U?v)&Wtp^kNlWUIjf>LIGwH{O*K!H8N4;kk~F{M|5T>E@irle(OlwCIfL
z^_8vODdm%8&SR*j(%s|!Q#0NrVvP{(1e+32S=MDg!ii_HN@9iSn7P@YvvWVg@BT)u
zPlX9P!KtObd+hQ;$b?^Hp04w@ju=0&WMX1wv(9Adw8@6)kMqm=&#MH1uy}DANujd(
zpv0~20&!w@_mCsh0{3_~P#$wPDV|lscu?{e`P_2r<j&5CbmkSTR2{+HIAMR1RN3K#
z*+SAR(-(#cVevIrvk&wY(8T%+i7y3Ez&){DuHjYTm5xD)h&j{1ngVrOsb6dM+9{HY
zSeWg9E#8k3N}fIIAk30g`5BQ2d1}!kW}^z@m2(KnvZw6S*j<_CtxA4_7gc+KGv!)I
zEl+G9srJkQ<MHRYV<w=R!^;u8<bv~4s`BeDu{@D0xMmUwY)8+jAc^d*iB{-%)p|Zu
zcd(yteNmuAIGHI%#7KDJJHjC*drxxfb(oS9;A<5bM;y;Ip(`Q2^uPdpnt7#@dJ1(C
zWh1ihjlsgU4_z<LzqHc+v@aA9zM0K(<$(A^V6Ff5b!x?h7g;26Qyj_{httUq4G6mV
z`WPJK_q_lT-VIsiWhZQ6k7ZeqDUKy97KCHuN6j$5+D?&>scJ9U7dQ7{W%xai9Kb&?
zu8qsW3N^mN=&kux?}4rzFNkG|N?^3l!3YV*(NfVYEn(Q8z=!%E=*IuX1VJ1U<RCLq
z6v1c`2<~jt{&jii>v;fLSs*cs!4DRZgjK|k__{-oT~#E3Wjs4P6LCotlc1y5^>f<^
zxC~Ghwj`$8N+ZT-N?O6#3V{+;pAf87Q&CoZTHfujBZQ;PvkF@a@=lfZXOQmFBt0uV
zPF1IfMzpOdtiP(*6oNvpv4#X^r&iF~Q$Ow@X7#z_hs5I7?_Ekse{kghHe@7rrzJlj
z8>A4S8m%K8n`Oo+5uG3PrI~fYCZD&c={lm3iPXlUm`B%3Tmk&0?U@2F9KOPDu;gpB
zn17#2!=CnY<B2cP$uJTcL&Vzjr};uYt0i+lelH%jdKE*pV%4)u`%|@%RFNdcJPl~p
zN|x*UO=rjoyLc>&czhNmbDh3K0fp3#hKJ^Ua4;}FIpk6WpePF&Es#+bDF&T+9MdF>
zC7?j>4P=jFKW$1xg2O>PR^2V(=Y(hKOyMJxze1Rsm;c6`AT9I>o|MyH{l-nUgIp~x
zTMA>7>_VoBiDW@JY}W`2<q!pMRjr?c&gdLfgC=kvX|v%)3fg|w>}+$s5%ZFhkxI0X
znaAe>>^BlS=<+Es7HTSt$Lk&V&-sXUxA!2=ZKK_;cj7SGnv5}mpyb?#E=d+9hPuwD
z^I^sHm(}{#;n1|om6xTk<^uNGpH&mKgM{L&Vdg-BMG^?ED})FYxA0KfWHh{AbI|H{
zC7SN>MR|HprazFuF&mcXmEYybsz*fwc@@uWa)@(?V_%nhNN15u7E1m`mGHt^O75e>
zk9Ya{1?G5iMf&{7MWjnkyCoVf<T<}eOvW>d?VfRg^nPF;Kx0PH8V01P_}=(8u|7?_
z<F*9UuRU$Mxfg^!hWzNpUHlayz=7_;TDg{lv>Wt=V4p=Q%x1~{OrvBWX9;UfqquQf
z1lR2DK()rB<6-)xIOtmzJrlDecCGeTU7jepncw|jb%{Q2XP;=(?Ws!nVm5uFfVP4J
zGQsb?S1!qz=}K=G$hnu@;}W{qVamf9Lb+pezz}1WK-*jrwcRuH90L$?_7iXWaDU6p
z&My0XC3?!d-M`)%ozx{3hEFL7DPJMbvGabxG_Ll}hp&*6L#R*({$kHIRramZ9Encj
zO$ZJA@S=H(#gi#Hs+A%BspRJV<@Lx5)6rEow=Y`#Evw=5&D<%|OA5~LOW8Y3Oba89
z&OugwstPP*JNDSTw|!gL`Lv&c(!h>o!SG)B%Dw8KN{t1<Q^5Ny<(RQ-f&0`TB7y98
zX7^@frFAH!KmRHjOZ)4SxQ&<ZFbe_+2^sWohS{ER5_}oAjU<Io!FAw%xm&9j<+w78
zd;iw<O0u_J2$DWj?Vrh6L@wFf&$%xXakLPctgCQI96jjj9h#AZ@)s+0IQ}Ut*SErv
zSb-CAfgKA8X4ZJ$tMAvaMG6BtXcXFDDXn+8VKPa6Jn&J*gUv`4{o>nl$nny(0rMx)
zX*=Tm41Aid!Ni(KdubcJDUt_BFt!?D^byWQh!96=NDbHeQ_JtxxruYwd|Uyh*VxO5
z5ErmMplzu)E!1DJd<Qy9O164v|DiS1zpFL*>EE*^apC%_Gq9K-A$^w-6-g|(k8Y2t
zP6=#Xor;mZnfjMX4lO>|t#TcX6nDG){`u-~N(;ftXPeDRy||cpatVk0WIR#D@c8|&
zR1eU}w4Tm~lfw6l^+Tqg^XQg<>((KM?N)~z3$y4H@kxUAmh?Y-z34J)GrfbqVWPqW
zBFZui%PlUVs(QX-_#z`NVNB{1nAnj)G%n>IXg`o=+tBEcG*M_f=PGxxy8S*X%t^kV
zKRmrwrue9Feht8?+skO8vs@H<IFJ=z@QEpLo+!fQcnR->5e>x)f=Hsyy$;AohZClp
z*S+UBsAL{PJ#kFFjO?2Zvi))a!2kS^VBwj08ObFtI}*kH#t_vme<h??Q-&Kzd}GRO
zxo|wtNuPq?qD<21b^?l)`7-OD3LX7Rg{Bb}9;Bp;C8WKp`8~IHIQ%FYHGkhtS@CmI
zqbq{9U5<_aIg~3zz`1h(g>U|YN*wGX^%$Ad5D6fBwz_4IG3P>d5Tfv|a!#nCZ8(&e
zHp5@J57FlTT=ft<CvU;>L6KM5+Y5KsoF~s|LsCUF>%c3;WkC+p`ZD#MT-4E#6y?Ix
z@1^5F1SQ{;(WaVipS7n&Oo}FO33BR4IZ)Q2#>%SWk{r0|nbTmSTsPTmjw;Q|Kr-KO
z*2D{@?d?LyZjeI2ocZtJkhZ&<Qp5M{ph#00A*}of!?$bh4sYpON6-0<hvT^r7fGJW
zIWI$zl)1vr@y>zDS#7=?crpUwSa{~k1e{KgSC(%oc<PG|LmEIv?bU~%HX=_VyOHm?
zpCIGZRpIMhGlDnKNQoIZFE@u~z2ND{O4KYJIuheIsSp2KEWafb-+o|@A|t%Wz2TKz
zl@Xkj(8IK~Q3BwPbwt5T{>@F7IiQu{uOf3NEzO;R8F9TAW2IH!DpGu@H2o{9+zVPp
zzCH>3bA(xOmAk)@1KX+A2ZiWR-s(?IGY}{Vcv9`ALQ){6P!GEU6BHIi4)6wAoe;>)
zj!}wBn-#N*2C7uAwDB9Yb#mUCoJu7qrV>31&OjYYi4?y0pykTFe9g>5o-Az)hDoKk
zOrVoXhoPHN-R;{9w4wEbI0v50`bj%6PRqC&kQ*ZNM&L2&0m_%Mo$-%Eba78_zpWl`
zQ!B!IxAyw!nRRg_?=Bq5G}6y<#+(SN>|GFJ7FEkx)7CWbbf&2=w0_YyG6<EATz@_^
z;{)#luuCO%@Uf6)QW-uKg&q1UX3qB2lFiNFK=v6fgj%jN-%nNjbQIc*EZnb~aW&_E
zUe}*rvat%sNvG8ej9|#?QY|pwAT%G=?zOe`%U}8v&X5j$MM+#pkLM)S<?Oq9j5_RM
z_eV<8q}50z(IPj|JoG$B`)Qfb2<dZ)q3Yl8tSk$6!y7zx!(fNUsfb5ZBqug?;_#_!
zJuyeQ)isXDk71N=rD$AXGjId#lNi@UQ`%N3P%&SU!#_P*rwDL$h)BD?X#*sAJ@cGP
z^VG5=sYGi)9(welD4DUk)^8Z_Rc7au02X`qyY$zXVEcTJW_4`;K|3cNeMs@?{Y!7p
z`b$yZFBDB|h7%BAVh)aE@S;-mkRRmF@Drkq)sX}*HapSnee#vYo!r}{`b6hxb85E&
zR0#${#k@gf$hu#Mjp~%cYGpNBiH3`ivc46msirfhxX)_s%z$bK+<ZkBH(vyFhXWgS
z&gP-}@Pv|IuP{71KC6uPS$6mL{9dM<dwpdwI)1gbr*52`m@5GAF}gze`erQHC;ec`
z$_^j(%S=Kz`jY|G0X#axo_BP?ch?(bTA~8xymKtJ0w(e8W&i;y1!GdpP6Y0#rP&HJ
zgoSU+C?0nDz(~z$zC-D7^>TWPZ3P-2Bb941;1p=0VSP}NZxDi*`6{hPuET6&aWadu
zfUcF+Z;SP}pQdW&qxC|ox4zL=_xt{KpakAN#o}Sd@n34bHp$s`4*&AhB`k%t%l&l6
zk2Ua$hUHCzh^r4`IqzBe{Xys96n2D(MxTjgXFJE@@x0!@8Zh5&J#BjcJBnJ7s;WE_
z(c$zV7j|*2?<I2acmvMj=}myf2Gh8}>5GVMnLPO;q%|c3=AZk&uAY~%8d_9M=sKXG
zf3SASE~rJ+ieA51>rsyjmk8MIcFZU>KPH3eXl7kRv%u<e^2(D3l9xUkH@t%wor&zB
zh_(<SFu<5`Q?tYLM($^{xr6MPqL)<8*@PrB@Y3N<hUe8Y^ktN&V7e}aOJ|~w;NHt_
zBT2LzQz+ga|2C@t&@NIl=h}7GJdSSdBsc4POMyUAi!{%tH08H(apE*0B3LOX6NJ5H
z9~-ygO*$<1k*uO+NUeq>n~&L+Y^RLZMkPT?%@Wn?7^h_k*(LOAc{N$0jf;rSWpGaX
z=!VN1aC))5agKAE9NUHEMVVg}U=>PIJc}5RV0Rk))kvkukXuQ1O{xMJD=mIp-nW&H
z1gZ!){=6feyCQ`?%!>6=IcuM#R6u$x$(9CFWN@Hqu{yWhlOAs~P8N+Q%Cn}0AR7mW
zPgd4PT_Z4Y(9a#V8qQ(9Ss^}Quwr+C3DZollG+yrIL<>PysQsql5q9WIs$Pl-;}B{
zY$7Hz5}68oEZu!f_mf}yTyjYVCIG0LARBHdkTWf?N*r&V#V{Gg2h^%KdKdJ3Dk|>-
z7fMmksHie=C?Wi*p#m05b3Ej9MIY!c$(2?ebJjEpQ6UQ!WZY9ly2QhvC4->g9jUQH
z+IV*@4}7*B;rlUfqO$FECHl=+Y#L!4&^_uLuO@W942_AsHZyO|@7qP^5S*fqbL9@G
zd{ve-SJdA~rPCVB4~}wWWU5K~X19)Tp$-<dmW;pva-#`jermu=VX+{NwbrXrG>pbP
zi;x*Y{qIrW`S>#+#)tFF0KA^AyQ$;!Az4Tj#le3SG`veD1~P2>yZZx+i$x6KU<6a4
ziWBrAlRu1mMc<9QKE89mr+Zz}!kE}w2#jDY{TWvhge|Rx7?t%Auuf0)7|J@;(9y&h
zB(LKk*p42T4-&_vQ8>(5#FNk`5+`IB(BlXm$Ds0KC!n~}2@PuBU=#WzjPXyrav_Tk
z?jh0XoaSYP!0N5aFhU!UyYGL&cl@Ce-CKgJe?ky^ro6OdPH*nhN2|xikm9TxY{~AO
zQWc%;F!s2%f&eK7)I$#3O5!ZwucZ<C<tRJD-J4MS=Axbm*G@vjl44bof`|XzV&!pJ
zeDzRxkC@<LjM67KPVmYWMPNri5PyGohRW{7BVj=m;;9Z<l@{@2?0mH*nGwDqt*9e3
zY@HUPLK}LiLy@{_8w==7;6Mvu2EnDTkR5x4V$k3(!Pw(<osFrs#+)YpBk2gpvEL;=
zni1gKucb2LgPs#KVrz`I&NO*8h^gl^7fwLzk|RJ&lekdc%R?NBi5cq%Sq_{YsQN8l
z?$6kGJbI~SSGB7(>5>IggtiJmDc`f~_EdHIPJJQ)t=>vCrH9PtiP!)QdyLUe`etwp
zzD?zSj)6*Ds(6p<cj2Jai*veF#Z6aQ)CQZ%0GhN}#sPs%yo3%-q?DnH38(ie!`Q6^
zUp6e8ozIXnDenxH?xd_$1vNW_P<Tq~`TIe2k71nSZPahVKr*4CfQc`}2?i`_#tf$o
zxnA_c0%%h#b;+?Yc92EQDljf|Y2s57jjihf5sk+dthu`9>$T6k)#;zh&UXiPNF-6c
zdXbzJcflUY_p<n1V!HKU05qftHfs}vvYWF-?DZnFq51=6!GY*-dwP6mJLC#z0tUdj
z;{v(wIQQ-VZ-@BG3SFxglh~5^t9+Ymndnl~7>rry<$^=KC?|f*AGNqxufK-KKd_b<
zGkcO*(CW)*!~3Vg$yA4b45PWR3FB{DxNrAcVrP<2PW#l8{xk?L-+qmE3Y3>^$QYn0
zOr?UqxWWii`KAzf&7u*Yj_URv2jNyVBDF%T`?m`UY0amdj!B`%BucKLVJ)y^p;-^t
z37!&E(kmC*ddy3aGHEq25;40d$q2i%VCdBtc>5cp23mKWzE+4At#qR+;_Ln@y{e-@
z2^ah_6XU_eAyblYU^uLT|7?g7sS5Q*ms(mUK=tHSzb6>*;4?+wX=OulL51X15R`+k
zZ^#Yw?`Hm@nyu}aO;$)NgaiME;CLa1Wmk+sg-Sbx`2>u&eskdZ@M=(69jMOL1n{|6
ze5FRF>C=d3^s2edwUlr4dOyO=xCKTFN_WhSaU#K#gd3r-5Y0&#*_u~<VY)p*ee?bH
z2K6R2_{`7`pr~(y*^2NPU4BCf0OqObyd#65<(SJEeUg=yK4Jt8+Z`k$$8$ki-4OF>
z_Aw;jP0QLyoPf0%m(Is^1@Xvalz=I4^-TN-iNAIA_r?6L65q}#d!lHM<ikk9m^AR`
z?aR%1Sf01kVTq2%AuNJ_9^0>13asG{Q!rpLxO`!IlyA0F{;u0N*N-Hg$1y&x)inyN
zpQo_v<Q6$kR|6tmned;+;;w5a{y_TQWo<W5B&vYtaglZkY=>yYYR<oYQTkdh32LwO
zzoWkW_{0N`kptZ@ejWt&T@k!s-T**xiLX&`T2!U(!i8v5E<(bE<m$q&pNBGw+Pn6A
zmx(a2IPx@cF7A!#^{%VB_F|P7PF3w(75TLJKJTHlkIJ|~fJ?H5E?8*UVXRAJf*=Hm
zx>DEifs_M<e4ZaW`yIaesPaLsdVBdxzdG+E{!=xII2A!NF)Zlx8qgaze0S2xI|a+q
z>kQiq_^ex5SdjD4pT7aXhS<5#cmwP3mj3j}D$K$S#5j1!S$4Y^Wm7A|Evk~3@is(I
zNCxqnv3N`W^3nqF^a>J=B02qIX7wU5?z;ijZ>=zf2(2aY2o{ph3>zmVD2ePq0iR|<
zevJm|4WW-ukiG@?yAww90f-#+r!`itk?$_*p;T!u!$OvkqNf&*gDFNPvL{2GyqSnQ
z2FDp4PCB(KXKAVpr@|w!_csC62NZEFXB5uFp=~lZVZt~hIBP6!a8!(Y-;i<4uS}Xa
zZL(L6J8p6q`KuAXiT?w!=*xo;?9x~Cb3a1_88YVW`SgIV&Q1=&hqYAIeM(SBgfKGk
zz?Xk%N-z!f>}eDpTm~e8m>>8vc{AilmX&TGnTIQIdU>~RSIv@wgPocEK<bUQ10Kg<
z)oc8f`rXjJo{x#0ot=Z5B^DPC?=Q@QZ1he!@#?OWClx0C3NuT*183*O`#Jx~p^@yc
z=)34gG7lxqd+c^Sd65zRTuPVGaITLlGDrVM3h&KAHjHL`9^lrWjNrXh={hx9E)K@f
zl0I};pghb`1husoBq=>I8$(k9E|%2PaadSsPDNrx8RJquUiwVj%yACjfdR#U&+~Ak
zFu&P8Klw?#mt0{!AxHyLGW`dpLHda17(8+ABZ7mQ-N}F6YY2vVGgnh8#GJ?>GV-iH
z`Ft6VzajA1!Khvm8;Auw36MwF0;E5ISAXVv$D|d3$2N+k2NcKREIkRr7@X|suMF*l
z&Lcnq3V?|VZl%4&=yPX+KABs>)lEk!7QXt)MOZ<zz`r6N4NLVsA00CGKF7o-VWh|$
z!d!oTw$I)J7!i7VO5x!Qi;KZR>czq5{l0en#-zSSby<<OnVCa#;0>akb#>X<IM`?8
zap-9SA$$D-uUF|Qc754skzv6|nPbPBVjxWwi6D?aY$Ou$!n+h4;o{%pJs<T>5KZtS
z@e~!)gui`-Zh%}ojScFwDN-e`w_&KgxUtA3;h$MM-?|JR!P!W`GQGjSnNm|Tu`;nR
zLSrm})B$+2$o4<Oz6zPRIXF<Bbw0PyRC<<GaS-saE@kT%*;Jgwv=&Kpyb!z|cHc$L
z-?_xEI=ieSLEP8TARn4(kWZ{MDQ4E|m2_R%c`e&Ep7q<Swy#1$-lNt(ecijRt?YZ?
zbFFJ0zJRP>x+1k>9>_EO#3zUDCwRAOyKr0gYcAurC2d|mXTRA0+~4+u9X)?&pXS5k
z(=FzS!C?7Wx;*JCjjm_JS9Fa7F;F^LGJ^Du^J)mk6W)m8B4gV6>J@^SuRei*f6=@8
zmK)QHDFpGt$`J))9sfMVaBQ)2thc(yyF&=!1qR7(zJ^>qUt1iS`98^?6Z|yiX=o;C
zX*?rL0W8zw^nT{Vu<(=G3+eb?#<e<g`I`#E@f71fk4C~Q`0*Avl7mO4^hgc<DZ?HP
zmnFSoPu!!Q?wq6smb9rDN#zzL-lkH<CSFAq6b(&WHAMV`k6AYayEOuI6&+WUVd))|
z2%4_&U5;aDd0s0#K3hGW+zqa9f@WSWcq_NG(SjASZYvrlWRDgu+6mFt(E(Zb<B5Nt
zK<471+Cg(Zj;IktAJTpQ>>^)F#?Z2OxBjq2P}-2s1Lsk6(X8D*+n)zY4wliEi!O;q
ziEq^>D>%36I^K6?KMA&EkR(t9sE*kriOC8|#f)lnqNQDtrmX-hsNPy1Lj@5$1UHJ$
zC5CviL~UuvbftP}`y=2yQvIb0Ur2%R!Z}=GLm*@D(NxY{TVbj48F3^sKdUU?vHK0k
zJJKDeMAxzg+;m4!18cQ@Qi0Q%jYjE4kiez{YB)&BLb)+gnmMie{H2!AFHybGLHO3$
zz5`uH1c!ovvAh#q3Tvgav?P<jF|L129FXM3_U8U9L{Zah7!@$QjKMi1+z^i84bq{F
zpB><gU5jJC{EWg)NZZ}bo`p3qw&#x?{On#9U@^}pOR;=0ymz}84ulw8DIm4r3L>^g
zdAKIixZHADldl~tAd49imf{UAt|m~=q7@kghKdN?b~l5)*zK|qoDgr-RglYo2NYA+
zd7W7l*<+>ncE+<_m<aIywlfp4jS`}^RoBt~lVxWwrAJkc=&+_Pp``ltWw0NisgPw(
zlfdIkxx#yc#52lD<~BN}9ob2xD&b?i8Y4|a5uJC72Yuqc3Qq4`2UgoQ*Ow2kdq>56
zqMd+LD;k@j8EQK?56AR-6VG3)XIanXi3;{+(tRs|)uj|nkLSshA31BK2yg<yx<mpM
z=&79Zo904gm7{R~eG#aWiE9a{)xkq$$6xT9UUmKV>jQ1zF>1pp(ELf~xMLubjXu?4
z8}5A^7{=qO<yZoGdvz*;Q?TdRuN7qd%5p_{u7V~32tIAi%Ow#@9pg!v4)W*@Hd#3+
z(k?C#0>&YnTYV0c0zM+d<_MO##e%6)Ko$3ODqW8e@C8kDBu!MkEBv}m(nW%6Nq@az
z65c&0l5#q#sV^;B<t#Do@Ob|9g8<|JctEJ+c{s7Oidf<}o)u2|A9F-Qw19R}7iQLP
zXTa=7_RSVjP8>zIfFrT|D|9T(&T|NchaY29-}Iow&@Z|MJS!*0)--l+Cf#hNW8vZa
zfjy@pji6P8@+B79RR!Kf>unQ}-y$}rTh^<|<8C$cE3&KY@*9`@P4cqH<2r7c7Kt3M
z3$}*)^B2jcm;D6Kqv_*{S=xZQ_g$~oM-<4EyAHTit(3MBIluH%I3rA>(=oOs#W*9_
zWU)eZ)a$~z1ygyC_Ic*Gbxl@|a!;Swy}9V}E=E3Op^A5Bi|3bzn;cpgohQXLK%eT_
zJi@&s_Ed!m+24q!wUBr009%|ip;xM;{ZTLYP5d&I5f2hs#00icD~c%1ZQ`$O&Xowp
zECX(2m*V)gUA4%JXmidHIkbXZDG?7R;J7Ur%&4bkeWE$+zm|>5`fQxKYf&07qBKiI
zwUL(Zjb>bD5x^Goct-5>sTi12a3c+M7%jw)BY-~lv$jtK-R!l-&~s0=1S`P{gMNj9
zz(U>Zx-2Y7wz{yvh25?*gpvxu$e%Kvw&o;hp+l|mWqdNs4Ww2>sS7zGm5Z5;#4mZ6
ziRmx1(>DY9^5zVbk4?N<(ABmT3OZ8i0qSM7N@FC9e;4?vhiRlf>jt)KnKO;b@I@*&
zn;V%h@-$M?F&J{?V2riR>*$&=@~T0hQ^U&WkSbS)8<)*BRw~xf?1`1aF~jZjOb6z)
zS341?M*yJRxeT?|*5m#1a)R>Ab<C@4a{MTrT1n7@oUKF`%ddU0P2JlMT%CT{#Waj7
z*jME&$`ck7sPYElnzd%TeXL6CW!<fncCY=L-eSazB>kEpNITmVTfvJM3*+#q3luML
z(y!|j-QrQU=bhdeB_Dkd8*Ae_3TGct7}RCcvQa5i@cc1p-Ey%nt7;tU35dEfRH9g0
zMT-ryraDh#8{>LnRI6p<5pZD(5iT#PTzX?Jh1<WYm#OhzEErd&0Z?Y`#EU1Tm(d_C
zu=}`f&0P9j$kcF_>z<++_cvhNeWo~T{-QwhWsv6E!J@t=zMFZ{t6Oee?L^HMBL9P}
ze~PZGjn+Wj*tTukNyWD9if!ArZQH8Ywr!i0WT*bM)@f(Awr^&exAS6-@r}{@`@CIA
z{Z|R}dIZ&ob-K997)T|`r#Og><u}Xx!%_LcB*v22F$qkd4%maz*)K71jV1}NEsz<E
zAA1s-%!_a`izxirnPtzAnIJ_%WJ`y{iPD|k1(by=uo!ROXbuunDBuM``o$awxR>x#
zVfS*eFgoW!!zRycjoMACv99Y@+K=&?$6>h_2KY<V?7dfoN$Hj&Jd%KQbju1ov`&K@
z<F=rJ8KU9$0x1gPldXoBM;fE9+YIA=TZEkyz&7AAn8~Y#+d(0WXiV9#noX!ofKB>T
z#;P`RY~!YfB?_@_YYp{ra>xeO2ExfPY5fppmMs@^Y`MA45flwBMi{x_Fid4}5&3MF
zl#AyboWbudQ_CoQbflInC3UgFEedz(Ter*51&SwAlqlMbne>kh->6_ihB;F?)Lo*P
z(xPUJX3U&!bz$`q)`U>bWR3#^#1iCa8%d`#Fz`d{%|uYv+men3W{w!WL1dxqBFOfO
z6JMy>6!2JYF6AlZINRe!hvzuvnRC?Cq^tC2pXvQ5{Qgg=czas$`mT%v_b%b{<Zhsp
z^KKp7Ui20HzY*}_R~Hz^w-9<h<$>UW>eXq`L<*To1K4bY$Ltko&QKSie`aPVBAEjR
zpqYY)nfZ3N@*V0x#dG5iGe0eJhLq=!zzTHwJh~qD&IjWUeyU@8D66XYH{MB)QoS*~
zU-Ol>(q`hbWo+kT%Kkep|LlZ(p>RCT6|6sa&29FdP4Wh8fpR+a=8kuLGVL^jLlUxN
z$ODN&z}2>z-)#Lh8om(1d%Qd-xB}g`vGsc2jvo_O??CKoClUiQjb~iO?4xXRx1-8T
zoGfusLv4`xJ>mTta3L(XEY<v6raY?|xZ>&$0-2xse};T!4f)NOnMtI^Wg<433ANFQ
z1w#<V4i9Jb>DVzM?LGSef*=@(Va&x&V-YM{Nvx0wGggt-qaZ(Mhl9{81v97e0}w13
zDYRW91w#n_ryMrRUp)#N9gIL+DYHK%3JSPvBx4GSJCLTsAb+Z%ZfB)o)==}*Hl&Sh
z5OF#9!ch0rNZCX)nGskxgsVbBL%dUPsN&}^3!(eyGTv4v(%G<WcO1+%A&}`!C?n7t
z1Y?a2-(ql>EDlHBK9vMVkW{YrD~GRgPUWIRZ)`7LW)#S@w{BY!4(4=OD?FZX%d0Z|
zQ<Y!*YBvH<AU%+pjX3f4o4(&6iW%~Mn^w~-jYW;O@ou87RyCtXAXHXaSde63_7bss
zVA3T!r+BUkTprF<K-{xbW)>|($igG%@udq;myoETAiTFU18w*9KD@QA5_$LEYH^<g
zX?I%jl_<RsQQV2nlI{LKYVL{~l){PP<G@$QI4wzM8U8Fel<yDjqA`JbKQJkjNNh_L
ze2yA7x&hD8P(ysTwr4m;a5g!<_zvox9bt-7jQ%9vRoISs4(}_EUr_URRFVYRCCfKc
zL#?Qf1j(ehM0&9+5gVmL9>Yl7q)#H>G`Xe*V_BXi$^e>!kJhM+;kc8X_*~6UuBd=d
zr44BGs-hcdmd2X~#7H9ccRe40bakm<0)6UH5n&G$_d8V)kaQyrY1j+5313lu;>VoH
zt9{bN5Q0EbkU!}|YwCn71@|u5A@5V2TxE%ycq`_-KNLc6SlSMPkA?v!i~(H>3vT`p
z(|g5yX(Me*M+**WfW+h#$#F{MNmMF1A($lKZtn1c9*0&X@?i-)Laz?M^bah*9bvA9
zlYbT)PyGT-@riLllxmVp?$cyVl4{=AeC_@y6+I@iO)xT;RL&I|C8{&QKL9F}2ZR=*
zFzK$HSFWq7DtVYy0(0rFuB+67@!YQy(^psTp`U5{WL)?kF0<P#=6$fzQlwNPuTLB^
zJgZ1PHONc53ng|U1C{A1a?Ek&0<o#PHD|Gi4r&fpUh_6CjjAoINq1*Z?m~G|C#77R
z2HbHJ(DhfYr+LnPGp-D^v{I+@41awGVXi@Sj`+GxK=bI-F3%PPI=rQOvHl8QyZi}O
zW!{ed;MwFwP{@T*IN~$g4;ykyI>r-n5N-u6Zuh!!k)9X06>Cz2Ui;QF=-Dt6p%hQ#
z6@^okR0U#l#(YXLnA}J?n&prFDeH@^{hC_2ZyH+kZ%ud%ohCP$R*87&b)7_DL~+C$
zp72M}jcQV^axLbl_&Aqsx7@gEeBc6H2*5lh(;%nw&iGzxCw--H@KX|?3N*je<4RbH
z<H@x&yi>YSTGnA^B_vniQgLOyX%jh?K|wo)feEC4H?U5|f%fHzRLScDCf5`(Sn|i!
zPAp&T4i&0*hW10ZC*-R&E>K%c8%Ta)Dl^h>^@@o>zV2?s{@!;CW$YxMw`pN|u$|4|
ze);;CSc7N&c$R_0Kd{R^e~;bwH$2lv?ABTDPIkqL7x7fn70jtpBF>M>URLX!(7WM1
zNXjyM5iPF7W0hWv^nCQ@QsO1pN_mX<dtL;LaFINtvQb{6okt2lvR1A~>w)>Q`g`-<
zxE!QvBNMZW&coj8Kp*oOII{b_BC8^mkDN@_^Wv4x8io>!LGyo;(MLF!RHhK)i9@GA
zjY8|h*ah{L2FaW7pU);MHa;tfgmSao9+*4N;#{is3j<ZZhtlb$c1_!^(*4!H>yAo7
zN|9yC!k7W)Td}sk>mhx?;=h?jtZPgc09|EPe^Vn<s5zYeM$<dLi!zE?1;_=0^ChAR
zS84mCO9v~HE&&?+_1a)zBl#+=`pq4&?Vr@R$fCC9o}*B6ghZIz;D%_Cri%gzN>R8X
z-~ZiF*QmU49PUhEe%2rt_#169h0IIzS$3<hG|<9UobxS=Nj#gbpHI~i7eXQO{*OgB
z=8nft<<(=0P)UY(Qc3XSq}KXw{n#%fX7>|)VO`Z^*O-bC?O1a*43STN$Nf(TGpSDG
zrlktD()$@_a}E4Q>5g-R)APJ^YhlF`qPrY<cp&D9?lFgo{T9E>fL8yEoz}gdiaXao
z0}je+mdPG1V9K_>q7*m?WaS#*Qyj@uR;+o6LX1t;P`GqU3L)~N`Vv_?s5pE&!uOiB
z$?sItJrRLpFHu6>bO2;y7~c8^Fz96>3Ts5-f!Jzzj;P;P2<BO0$7Mlm|Fw-tQXs~^
zYu)_4t-lkQpMNQ7QVE8+IQuV6j(JM_(=FCg;BP9*9<aC+iuhY$&Sp5s7)rdZQZw2L
z@|IKY5M|iNpZNaTPdDzms4M4eAjw=*d21K*R3)Qtm^--lg&H3<JA%q8$yutoOQehw
z@(AB<X8WrD$|-t&xx;SYmHX=hZ}KRoAu+h<(2pxfptu8V<6JU(#Vo32@yP-1PuQ!<
ztQvR4V=H9&w-L&xC(0)|;hXB!Pzcrf?FX-|H2@7;am&_eY+;xpe|QTs3Cd#3YRiPN
zBGrYS_>1_xKqkvIS5MI?cwBPtFqP9P*%TYr!xrr~9Xy5+8}?7?hQ=sCuM$;@B1-p3
z@*^jD+e^GuBy5dgR16|tGqL=AM(<F0%yT4@=15y_<&3GsT&kP{xJKcex7Sw+3t%^j
zR^|Sq7aKXgBP~w`RWcom_wfCGh;n>!&-Hck2&Dl_$ks+wWMs_W2JzZawnQ8dXY*$F
z0p87%B^rO9TNY2RhPpjXLaNz@J-@iAwaD1nw_JDBzy=_9JQ0rXh(Swd*pzS1d0s>I
zr{d8!M7>ksjaZ%#1mGMd^gsvYP+?C>4$@Jzm%R&nHu$B$tU)(W!gGZUKo2(18Yzqq
z_uNChx_atHUR#tS0R1%&NiyR&Kv4?2ZeGA?7obO_4;KW)I9wg)>Fo@PlYG2ZwFp^$
z$aKT__A|juTmaT$hpJ2&teD%`OYJwS$c!yl5YkQv&|#N^#xzmpEp%c?l0cC)DCDQO
zMp#lcLq1K~ebM8E3Og$?fGXYr?j({AXZV_YR8rB%XbDdce7&;V!UCU2jE~j;5WFIn
z=qmV(JOOFXT7>^Aol-su3usvbqHX#5m-<t-q|6f9vwEe`zJl|7uYK{p8`!p83DYZ@
zR4XaZ(m4xfw~NAKUJ!_!afy6iHU6`y;;n#ue-nJYV}QJ%v{tq@!M((IN!VkF#7S5f
zkcKfzPJT31O#PF;ltmG56{_WiRVis9?mSAPqC4)S(#OHwDcNGC_+zXvEn?+1LCiz)
zRqvgq0ispidcIrjz&bX;Z$fd_S749=itxV+nJ?E{9ga6GI@}9dEw<Z@rWZMkV6jQk
z(8NTBKrDC)dHta&CgXx*mt@83JrtwVoRTh0MiI6?uA?;)*=KjRdD-C?L?mr;yLRYT
z$(!YI-UX{HOwyOxiDLqppjs4G`!!^X;fvTgt1MWvTd1b=I7YHI;MxGh>$Z?wxPZx4
z;tLHTt0M7=c}9du)qa%0%*o(|Wpwu&8>~X%UuF&v@J{*+HrS_`#p7s(srBAsgQdCa
zWu0?ua)0~ELBra{j<Q0&0h9`>l#X2N&jXBP8%3XUtzY&&&VD4JV=fTT?oPeZH=O3|
zP)>4}c4xSU$XYj1inut>1f7|k^bHxDG!VVCf__c@Mv{BVa*NBul|+4i$x#U^gC0;3
zT-05+2=4a{noQts)$s#Wh{#+FrZd4@HxMd%XR@;9To#87YaXg6E~+O+$&X$R(`~Xi
zU#bJC{CfK%)FUG=-Tq_}*z<U(Q6OU7T$a}&*kk<X9ob^#XyL<aiO*Ee^E)W3eX|wV
zQRWLUjY%n(8}!n&NSwN+YZ(N3*1{CSA~Ar-ac7=);%9UkPOEAUOq^WIIe-b>F%w}H
zy_&<wX7QEPT=8IgLF#Ayj#+R(3oKWgA4y&F|7{{II95e+XXj1??mhJKUXPy2l!n&D
zI5lhlGcK3fz<1+5lvpm^<+B(4%AQi1K9?7(+HQXkvi+ELuy&Ir0b^slN!wm7ypM^3
z!Hj|bkrg{8`b+i*w~P#oSMAjEl;$>p`7`10kv*DG0A=D?GRKY*WPfO+$RAwipQDHe
zzV@U=@cf>6t0f7{6ibpYHwbVZ?3;)LT?gLQNy7*d7WXF!ri?~~9i6nEn|a-@zMg|i
z5-eppWjxd<?R%}4lxNMOH~?OB5_}Zwo_}GlquuBQuM33v?ecqOyVHf;-WYmdXQsAn
zM0X|<XF@O`+PX`W)4-1;Ymd#J^MP+7HkC%odX_*cf&`e4(>>K61G~<jWD_WeAIM(<
z3>cCS4EvSdbhF`;(s!>~C-u6+xsr5(EWJ(5ut;^Orb0)h)8(oXm9g0gqz3uZ+hB2I
zXi1xbl|X_{#yR$V8P#?g<;3vnlA%B9xCg%F%G=h|;k$;f8-#FE&<@KBA57_kWC$o{
z3*jw!6!d7Bm|t>3xLG(8^RI&5t(c`BI=GoxZ~W%zRSS|Qa49Jj(%~k^!k|i+3@Fvf
z?PQ;{8WiZ?V!Nn1?^g$EJ#_x8UAkC;QEg(98X$M4$}u-4I^<&M2lR-1e-nDtVo29|
zjCI}ABSqwL%aF+uLi@An%l!-b<aLA=lpz;8khPgy+G!n452jemit_<mJ5*#6tn<;F
zZq@GlE|1?09w(W6gTU8owdAVTlTz}&8fnA)?vN%(2)F<TLyXU`0T{5SGK}m<26gUr
zeJ5nhP-7zExY~l%Z~_r_r9Xy{Vc-aGZ`f5-;rSx!>*ChaME@clPYAHF2Cc7W!TaGM
zJ+x_nQ^xJqAwz#^Sez##5o}QWV?k$SbzgnFbDkP&^y4oVTwy6_wG@2s4i0vu?#D7N
zZy*&(Bi!Y?CY7q)w*p2>b1OJc3x~7G!+W`t%RA7h!1L%*sm6on$=^Js&w~NHA2_|3
zjB^o#9nXa)k@VA%raUx!p3hOdclu}V<OuXH!&gJ?=pm>P?w+3KEI}ZhU*Xt-YBp6h
zg_;!_Cj!+*D{sR${7#j4M~G|d!G?BMfQ1kx8i0`y9!pR5`jJR7L|^mhah%%lFY?pk
zF1A}tP0i5E3F+I@W~EEbqy)qop@F9ryu#+f)#@lfkG;PP{c?JTXYr=NvP7&UA@FAi
z_@YDa%;B*{L(!{nVNAUP-xd+4OJn@_U_&3viOXUe@3L7}*E&g~W|Cauo?M0k2hq^V
zA}icjHlw|&MByErf2%ei4MDq1RxsFyLr(_s;vX1Fu~X$4*N^oBs6q`9FAj%#6%c7y
zgsdVG_o1Dow)Wx>NW@jw%|uLD;Xx$_VVqUG>)sq=SeVXxm77px0y3KamNt)@DvjR=
zbhhC+*}8rif%tVWlo~~9Ty@tr#dL4H>7GAb8O#WuWg&7gTn(1bhhw?aZGY8wx&TN;
z9czK%&>j-^O7B;>%(@dDP*G*i)5I3xuNqt5xpTOCdL;uNI~`<iF{)8RW><7Kru`iD
zvn~0!Gq%RR1Qt^u3cGEq&fXP@N&3Zp;o$2+kWj5oOy-gs+`u*r`%*cwuS;DYk{yFV
zXDaAJnqws~;2**|(<LoP8teB)gkC+_m6Kh*N4b(XaC=sRB6&6t0*P+1E2GLLrvVbB
zeP@q7f4)oVXFF!h%HgOF#E>*q3SDS75?K^k_)4!UHN^PJ#sy^WS7<USu}pp+dzxEI
zBh<{K(QROS#JsqQ_tUuqiljze<g77*%aaBX-wZ6dOCcXegA=b{L0{Z=I#U3Lg3SO3
zw13bqc`J*F9Sn|KF3Bc0#k06Gb-`n4h5nu46eq3<AZGSwA(w9%lOBK}*r_UrG@1W>
zDZV8cL@d*^UDeo_C6$GA`Sj)V)eoP);*Pphz)Dflu!1+EB|+bY3Raj$nuxU0%pit?
z>~7B5P%)OFzz-?&wtuUL^``k7_A$E=Ao|+HbPR;(O7^VxW3zm2lrHtT8K%5OXNwNl
z%-EFpuH03xkp+bbVTTu1t+<8*@_S`z4fC2&J^Xb$>3mhJ`TUl4sW*nn{}_5Q`E5@Q
z)ilsQPmKmsv`4EtVM>G0xl}@{da4P?iyeb!o^+?zn3bHV3pnZv)@@VL8OT<+Ts1LP
z8`s5qL#b+e^f(ECr9ogTP!0H~Q>J<TRy?#;wg2UnHM!;i;MuHh81Yi267BA`eC3q!
z64YS}&afP~CrR)o_q#Z(x(ux4vO-P{RTcQ86u>Y!vnif-J>PcLqs6p(N@HeO`)Sz9
zSc|QNlGmboUz#1*H?g#1wh*h+Ia~H#A4H&(b?G5@jh>FC=gZ-3E<Bl=e)nPX_5N_d
zR{$sg-19h@_pQL=0R}WQE6KoOPPrq2*8;L#z!@rV+hqx%CfF=W%iH3poUen{Dv#OQ
zxZFUrG!C2s16Hty5Z@4jMgH2nNYSq-FQ1jvHw%bd+WDL)%~;l_?Lb<b0qFgLBpT||
zCQCFUQQloqTrUfRBvIEbb+lA1zwCtG8F>TiAFNm`-0h^%^$%8zx^0m***LwkqRXU;
zzM@N-ApC(9{ZjQn0z?2vW4pLB<N~YxDuY5oseQsbMQGL`*QAZ=x9P@~8*E%7U4~Zk
z0c)XK8^*5sKi4L11cj{;KJ3vA3&9TQf+B*swd!LQK&Cp<I)h!na4e;gQ|j^oT%Dx}
zM`hW$0?IzY_fZA3pocB^74o&R681k6#{8PwU*)3F96%v(3jkaKSk$1(6vqXt3~mkZ
zB#uVhjBat|pkY~|62svKH_b$&C<Ona#Fj{E7*d0`IHghH4dwOH4FPFs<=Zi;a7KyQ
zp=fiC;h(Cv4Zondn?+pvUmA5C6L{<qRt9YBTS-YJ&_a}oOQnhWrVtv3f)W^FU}9zu
z#s%JR&*cG6f)e)5vvtA3JMepW*B2_zLGa7opGvEek(4hJ7)s|SflOjHQL^+$amo5j
zx8cTHA~~9Nx$I@Rpf?yy4muRk1V9VSF%%j|Sj?%(Hbo^Z!2-eG9k`Bp&hK&XCKXD!
zXa{e5M7H3#2(^-qw%A%a;m^pbmC}7V{v~O(vO9xk?<=4iv{@90x8vY7ZptjzV2lor
zgSuov9kp_eid%DY-89ox22Vel*|7a{B`Xxf^_L;|n2=hoCKG{&ENWb=VaOO$u%pPK
z%7|oJ1LLJn@+k3IQbIxP&JYUucgW6H8Bw;i`wkX#VxB0%CPeMH1D6&C?Kk&&&Um6)
zvhp>=<vZfV&bO`4N(rKc$;`x@IAEtdR_gIj0ticS)cUubT^zpBeh*0sCD0Rl750(c
zt^yzr)Y(X&0!amE6tJvT>b}KLRDd;ZTHX(K=))?(*p8`Z>S5N#j@L{SFRj+$S>`EK
z#mC&kAI`X;mW!2m)uOvkDaFidHw~T$X!y4Ud*|@}_k{f&#e=$7yYknp*?C+rT%Wy&
zRsL-{djf6h7|z~8OyKx$u?AF^u4u!*Ny5F83Lv$*D>!MtkBse;v?sVVlW@@yNc?*8
z$O4Q+3}ZxmH>2sLY+%M1qWLjJea9O8TnZ31*yS^(X-OE7f`4Q~a3h{T8ep0#Lk9qB
z@P?o}gJiQqfN?Rq1vXf$yRE%ClmHRaWHkd8NLg>yzq}_?IhS`<x;BJF0wI&&aEaoP
zvk!JjKz1vDwSCC>9eATa5r*CNg3))s>eC3_&vSr1@5Ucg3FhL0dJpt_D8tL0*~Fx4
z?#LI+`wN>D*@R;Z%F+kyQVs5Kxqoj*Cy9=~&JOA*!MIcK5=*<RZgNj~h_-54CR0=)
z{~0u^TWG>-4~oZW#YF4WjLDYR9v><O(hQa)EaJd$9Zr7!XC6ZB1CJT?zQh{kyJ=w{
zMaQq=>$gLWNMts~>}TuL{d{N*#93RwMmb7M5tLLn?bfbvxO&5yAMj@Ht)lTTchA{%
zsqO4}y4=*(cfywIBq0g(+fb++ZXPrhJj-s|fKk{ad;my*vfR>S{psbMm=%k(1mBRk
zsu5uCy%!hWh0HGL4&7LM4!m=Hd7e-T@Wm~Y$%?IPkt7Pdar#Blgr0B$a`VQ9#~7WD
zq#J(H&C@q!#ZP!6?k(ZBE9tO`_m$$WNSz$}8~#YNdtZ7^Pl(BwUP;C|j>tIBZqm~O
z<Sac9L!pb(P^!9DlnRX<b{lahSF0@N5BRkh{U2SfkI5c8*tt_ES>xHdp3pmxp`Y*_
zjd>Wz`7%8$SpwBSJ3803_+fBO<uAA(T*R)8f=GSu?FO%94q<f_L0~*Z$r(y3=DBLo
zZrv~oF84!<V8IxbFG;>4Sima<zHsQk34>q`(Q?p3;<ewIoh?W`?VgnwRpIy;ISRSj
zScmA!+q9&faiFkqau>DEm+mSw#vPF0Fb*}6ffo@3(>08|c1}ObiLCg0fF{CT%hVAK
zVi&psNl*J=Vt2<O@~R{r0~A`*2|@Pe)FcCNk=tpyOdb6B-A5LDHaIeP6~L~|b2(Gg
z?o$E0l^dllb*7k!eTKXPrr|h!mS-Z>X0r~J7nm@bHt*5ZmNI7|Gn9f-W{=u!N}Yh-
z-+n>LJ~q7OnC)WOVfaFBx*4W&ohy=IuKt(^m#{Shc1RpD6b9&HPn;@%k(loF1MoHo
zhV~5h5F(q$$e2p}HK&i{EmdG6DuJOM!sE#jm5kywcqKwnu@m6em_)5EVp(qOw7OIA
z{2k-+zu1z{enhh*H4Y>7;x{1uo_>;49->u7=ucQAs0I!U)NalU#Q1*YgL@`)_t%*=
zy`;cAm&xDma$r{LoxdRm#*tEbS|Y8}UEj<YqbVbL0rlrOv@fH$H>K2?G9d^$_O%AG
zd39jx=^cVp_OxyRZY^d4!L3y&{Ur3uQv6OG=%$>205cri_UG@+)M~>otg0=|Z!42C
zS}VR|NmQTw;qSi1z@(kUn|9np^`z$*BH;KbC#_$Y0m6ivUTZu)PR^%%{aBYH&x+ql
zr-9}|_`lJVFpcGSML&T|`iU~4a04f$RYPi>2nyU*kK{+8!;pJpqr<i!{Fy{8dv9(I
zoZVJYyMhnZ>H*urgE(3M%3QibPvmPDf?lGS>>=w}n=}03A488QMt^*CbR%-tieGGK
zv|3jgcy!!WRAk`eY-n>r#@La`G*RLAdzW^kd%EDT%;S#k;l<Lwy92B<$2;!oqnY;^
z9q>pu=%qXAExO|ym);d;Z&H3;^VSKm1{$1FX=uaa#X3JZJ-N8JK4C~5S05z%i8Q|P
zJaI@F#SkH~rug?+CKf$eb9iokyeM}tK|urAe!}nNAQkP#7xs`2=@5G@k5@opSI{t(
zT0HPtXIopv#44#}*+@L@(B;;D-F27`BqaO0Z~zzUwVBB2X>c(IL;gX2H+#FI{6Y8j
z8$J(7=#Yal@`~3gpM@AW?KiZ1K0*1EFW0oBepssM?lRp>?HsperxX0hhZ!G@w1z*t
z;VmjZ>K^UaM1T6x3VX_pG<^?;&gWbB=f2iDJ$M%MV$Q4m(Y)2uyyM3uut8ougl5OD
zKlejAes;$=Ux-|Mr<?F#h`1;r9A-GZ1PyX>RaqYTx!)FN^J*+Ki=MvQA%Zu3-F@Ru
zf768zUF}vv>Pcz;se5}Zy!P<G^*6bJc)M#P5hNbf&FD6@(mys$jwaVHjqW?AqYq1@
z{(VGaU+)P%Pi**pAFmD%&-ol4olkR@vqQ&^WzJ+vb)UD`^T`zi=5b5mCFTQa_7Qvb
z%XX#b11W^Hi@!UmHo3jFs>3I#dfrsc+n558jxQCh=&G`RjsLt)BUANu+BJB*F~S40
z!76c@5y%B1d70BDNRRRSD4RY?!J@eZ8_&_OB9j|Q%*!}Ry02SO-=wK_O4X!sID$HK
z=;Cr@YW9svQsJk`7|SIsZg?)dXC1~NH!3?xU!vy$gx1~k5FVw+TlOQQE+uTz^mOZI
z8DfCI6zuKQS{vS`WqGC?_UGom(%ay?q(@EJ)pR*W39>4<bV9K6-fC(fZ9=pLHQ>+Q
zt~X{O{^R!&c{7Szs)USe(yV?uNgQ?x*o)69aiUo1N4hPo30mvWFt57TH6#B%=~}bW
zYjvC9SZsm{_E`Qjkj8Cm<~X)nf4+nyYE0<!C@FLfHRYFU<R1Z2HwPXDxk$06Agx#L
zh>M!#yW9rN+7uJK<!#OPWN?=%2WD_9`gXIr3N$;d!y>i~qSsa)yi}UfZ^ZuEOGiC2
ze4>2FvT!yAcA;KMhEd0dH^E@k((3-3WO0Lja<jHDqef=Zv?$aY0rbz*^J1#$CltCK
zz!#`-B8b}dwHyJ6hu^D>%Y^!~o)77G>&JfZ^}?CFZ&=+{zuZwo{G689JSBR`S!$Ub
zrLKpX7Ec~gb-v{I8dLY`U~+bJIp2>u`AWT7F4v05FI9S8h9bMZu>q3f2ZO+Kk<_rd
z`Py-|K2_~`zTUh)%rWCZVCdyP&?@YeuEq7x#pSvD(}+Ajw9voLAei45F=RJ<*)oCt
zgWoM##xgi-W_`rSq<Fvde7*t=4*IyS=X5vU4+VMeahqh8i}K~<G?~^8sG#bNm&#(V
zsPW@c+#M<QuEgl9Moi)c(&_kE|9NeBKk6>#Zfr|*A^`#9Q#VTMd;_TOl`%V7Z#=)0
z`$MC>QlJ%>#3%a9S^sOwbr6hUz6kbW1L!H~&Jw_PnSbB440q_~Kia>bl>P2x|AoI)
z*;@Ha7c35e&C~vR@BS_K3R+EE0SRdX;VpyE_IhYPa-j$-Iw@MKkijx9xhCdlMQ^t_
z-)MrGp|hzYd9Tssj0!yNV}vTyKJhs*5kcG2(8B#0ZnQS%mzH*0H`7=8OI(g>EuU7;
zceMArSlE}3DOBOIBg<0eLEWarrrpOz;*u8o6?ANhT=(O^+QfoXYECZ%fHk$Ac2j8%
zrI}V1_Q;0$+f!*IG&D#HQ>#tTn~P-41llnXNGNT!Mst$Qj6TE~ZCXLE=wMN58&O54
zU81(L61KBLY=&n6cr5zBk!Nsfaam>kKH)0YoIPTe1hnA~@b?!9tq1?_&x>B81RF9g
zjhqFz*e+OD{Ye#X>1eZBO9iZ`l%q4H=jc-7g?PUg<e5`8nANqVs-CCDBt^F(l?^FO
z%=PBR)Oq!;8i@A=--6=cnXAy13ihAh%A<>0yUY?vh3m9sqJ#%sKxjlI*O&52g^?+Y
zf%BwqpkpKSJoF%nhP4)(8~0DYbF;2eC7CDY%I*Ij{f_Ma^gG~x`dzITuI0EeYvMos
zjsQ>@(YTBSn1SH@1po33^5X}so0(ZNCHM_?z_Y<P%r|0Nretqtc<D{A_;Q#6ajuu1
z)yEuW>V4nj?W)-3cf*JQu{%;FRe0f-^Qi1B^u(5Dk?t0f&XDow{Wcv~+)2Mm`26$|
z>8Q5>UCIE*AEUzSD~e5QZtw2pVSjDqY!hQq0U!|Z|7YJR{Ab_&1%2GV{`}eb_lFAW
zA*Y>)hAsUIgA;V5keZHinRLVO-Ok~fBTJrx*HKxvJ8;tB_rfwaK8FD3^zoP<XC@x#
zY1c(dV=bzvU#msLtI4EWxxFQdnimPZl1l!kafYuh_fL(_GXh5t^Fidp3x#wSF=$<L
z9#SESMfLG-q~RDI{7#1A+?e^e8fvZ~@6mL5qA}v_2ZdJ(5PZbFG%|J)CpCsbg;atm
z=BB^rj$N;Io=fQ(>)-^oG-_01Tr7h<aepOFC>_fC6=0)8mg5g~DZ`VsZzS!P!ciYL
zg{!O?xe;$-$MVqVtAo+g@IlU5w1YgJ4?#cfO5G6LSfqY+1w6Ie&F5qWc0c^paQ>RU
zZ^R#BJY)DI`Q7nNex15AM!it)xAb|a&PT8ac{6NW@8LP^<89%0w#!?FHx%k`>`dSV
zJ@gTB{-l$+!%#_7aXI&Mk23)wEH7pEr*T5Yxqpt7D*KLI+%vDGw&3YaAA3m$kX@6m
zY;JKdS)Eyqn96T0PG*+!aZ$Z>(-7(F4#5HT4M}{wR_Y=Y@-zx*-p!CS*V<V<1B)u!
zHQyA5%2@|fEQ0OBqd13hXIi$i_!<2Q$@YS~UIs(6Dl0l!tH^m-?_}qbM~YAXO^GzS
zh+t3PkKn;N+cWeb$V{X^T`NXQFn-d8@%8IT)a|<IOevmDH=cG*JMm!IvrMam)n#)u
z)bg>n2}!5w?lInJQ~A6TcuItRUc+9Zmoq@KI;;FstNk%H3mO=o;zr?y;W;Kv1Y%hy
ziJ(g=2P6xyvuW5e24rH=Wc53Y^KTwlHbn$DahfZCdVt6q-imk>s)L~Yn2z~q+T_Bg
zvX|U*Se5ecl4Sp-iWv9b0}(1Fw5xR%$=ET=<=Vwd0hMa$6V@`I)U>TwsFoKUR2t^K
z*z%1I3%3~&s(qVnd@7Yje%iSWDh!e~!dv4b%^ty(_Pdtl;H(5zye#>mLA9lHNpczp
z))WH@cHU;xznFj|)(#=6nw<(9CS~xe%d`=4oR=ccr)q0W7H50@*eKWNM}c;_7@E3(
zu!!7NEyOt^@;=}mk$c%V8CCfMXr?U<i_G3<XjCI~nAB+#iBu8t7{`}dj;~~uLy#Ca
zb}sYOCqg$8`Xh>s;S&Ju>{~Lt>v6r|486zkI%j6QA9(tGdH`9W-+m^K>yJJ~MzkZ;
zlm!+lN`H=nC6uV0^rq+}@lNe%ao(-xq~$ASolmMgZ;>Zxbe?7@NdXkQmKdCc?e;I`
z;q+(Y>PZAOMNc+Bw1(BP^wX+tNKMdUOFt4kH^}Y<BBXT_-UZrqd_WNdxd<Z$)d#hl
z9=Ofvd4arZ>=!rOf3kKElph6{=P$@L?5M){dwD$HNZW|=GroaC6Ni@JPtPxiJ+E!-
zwkguVsEI0$AX<w%cpz>N_VAs|O``;<lw1o(I>P%y$1p$;9uH}_5G@nhy{3B_{PO6z
zzb`P5Q%I8hREu;wQ7WTMWwg%YB4O^z$7)E=sAdrwG_)WoixVlkAjP$ZL;Cd=W+&g{
z4jGQJU@j!uKuG-dG$!t2R1gd6Kg0z|<AW_2mLgDkpLbw8Y$zkd(<K<b^8<`tuSIV)
zvR}`_Zs)7vTaRbrv0&JaUnol1`oN{`gqB5Q-tWT2{%N{>IWJZ_o~ZJE0PJRCO8J_E
zVdMm)cs^KQS_`3m_{$45)nl*aymCUoNUhiX+33CB;k}2LL=J}Cr26!bK9tJNs3yl<
z;_z4B+wrrY+ph17%LZw3#I*D(9DLDcV76#xO_4ezr8G<b>)||Aubtv!OHvLiQK~9k
z-5*U~0Lkae!2An{wYPM_^6iOa6GI{z77(NcSJX>xx<}K46Piu&l@<dL(Nn%9+4l=j
z3Gm|@wa$w;TyI&GEk6PU;pTAv@8mrB3`2dHY%W6tCD8WvMOPwt#Vhw86VL4&_pU$9
zZNi1WzJK}_Uvs><6F1=SHrrozYrH)ZH(YJG?etFo_2s^z*@HA`ID|FdA0b%#fvgCF
zN(au`<+br>ped!>F<6~^)J)afq)?h)hC2qdT2B2h;Np&AfBj`t6UQ${6v3~}Nl3O&
zw1~KSS)lO<N8{n{>z5KwPSj*nf^chx3TjOWfIRIJ8x>sXABR?=DLv@D91Olum7QEY
zkyF!)Rm+{`4(DVX)4<EseY;2F8xb%$HMt<^u8ap0Mh|)os&OG=n$MEs0RrYjA?O5y
zKOWs((5KK%wE<7^<$@(AE`U~sRN9PSU<TK~y33jPxjvdjIU&^woT=xW0I2Yb1H>VW
zZaIBk&tCMSxl)dF#we=<NG*tRjLGpP1iAH*4$erbWs&~KcmL)ggi3n`uaxe>)2f7E
z=S3ry`78+_7>CL%!IUb)t?9+{7`Ss<#kA>)DG-g$FiE?1lrqH<P;}%E5Dw{h_?J|+
zyQ_@GR*bT!5`WW_S28oxRPnt=bxum2NU9|Gs&&_laO#G0nhU($p>c;z0Ycz?P)T@E
zGVRF!(RVZ&{2Gcs^qoThHdGx|oYfshTMDd(azLb|X1N0!yXX&nx4K93AARRl2Jk=h
z9Tew(^c^*^(ErkRu#F_S<v2g|-TqhTq6!vSpEU$->o}d*$cq8Uz4o3AQo)JOA~1I+
zbTH8meW&E6TwYji1t*LEXF}jkF#Nyto#hXGw_O5F?B+c8SMK!}^%0DY23{}{K9N+8
zE>LGw+_L!qoxxL?ePF`raxiC}8cC}8YT3w|;vr-VsBt91EufyPDnA0%H61S9A%Ads
z$&Z^9QrEBaRhKNM9Rgu->R?nUKm$U-=ugHkCPh+6HnAD?Ws%<{lA2-6%*2#YX-s%U
zrEFM*X*~dv{$y<i151VNjc`1T30so&2BXtB2+bU%@H4M63Xr2>Bu&E;Rk)S~Xm=S?
zFiw+nTircnQ=KXDAzCj<WZS(h?=ek+QY#kNzlW5F7P5c=MgA;0V$k85O$DqK;kSSH
z+52%?P2(=T#Z9t6jQ1%iW6>mr8zl3CUO*=DBpytJ6nbsj{UD#oy3Rx=Qq+JMr~`|X
z0%0I}29njexOn@U;-Rp&MYF#8hx~d&wRs7pEu-BYxuBvvg-uotsRJJ~<3Y;+K^+0s
zgouM_CUPqpL!jgnR7L=(UI|S+Hm+eCb+Y1?9vqN3%H<QRW2NneWr4tqSMI#ng<P`G
zmbj68fgD`n_=VUrCJ*2GY>)~~djs(&$Wd9M+ZbDF!H_QYj!5L+GqJjKm{NHaN4ydb
z??c?>_56EbYg#ODq#W_Z9)5RqR(<dPXWu2taO27-xD4przf<)e!q?`90aF;?gJV&l
zDq`x_!(O>L6YeYb-v}A&bx=#m+=0eItv1fR%O~~!Jyej=cq@FPxpo@mpzK2n9bBtd
zp6dl4opQGV^(U7;)hAkuU^T?bl4nKGDpO(mb3|kkOKNeNFSe7_ajq=6^O4UR3&(|E
zabsnQ_}a((CwZ0d#wc%xC#NGramL=ZrD_ss4|7D%$k(HF2g1@g1Oz_LkQJlt6#)a1
zdEKib8M?5zDLKv#HOpZD*=&YwH1AE}>DbQF2D-;8GeLmtb&)IJM&Bx(imiai0C=}<
z5(1Q0P;Ee?<}S~&D(_vM5ULQh{Q(Z?qM#`=4p8M>8f2xLH?Fs~wUpXQ9Q{OVE9p1t
zmtP1V4N2F~BhMHGW%^_cisLdwiJb5><8D~nQk5}NhUSHZeB!+)TcPoFApc~I0CZ>I
z!|PnTArf6-(s+tF1+q|pCm|BsugmRWbPvV#;H?-#>r>x{eim^F%CSe2G)~HXNX+DX
zXyo={b|~cbX3t7<qGXZ-!lTZz5b<MZi3DqmBxko4gRPg=G*#iqI3fxSz69EUj*_fN
zbumU<Sv1FTmULC^KO(BfNm_#uk7Rl#eI{mqkV!N_vg1@KgJmn_fm?*xWT+!ll8l>K
z47s`Y(ufQOt9;6hIr2>&cHd!>6N1K^5w;tf$u|Sne<nn_87#UIDblx>{=UsREjzqS
z6bek`#^w}BEs67;FN<cvGB@&_hUzj*^t7>(Xi9P+(oB2C-iKusnWXf#jAXEEV8`ew
zcD6L4KR3L>JG;#|S(ewd-vuOAT_j)7mq3yjE~MWIU_biZh-*}1c%5r_GFtH*gr3;K
zZdwZaQ5PNBI4YA<^2+lhx!ld$Gzp1@E?}Q{&||MQjUao1v?^~vQ3u<Iz~1hZbZa&e
zZc@iHu9SZZBd6wJO51rf-ZCHx7OmmE;1!^QF>e{5d;}>%TJz&~G74T^WS)_n0Xbx$
z-NokkN-j;dhZ%j;#><QdQ9K%G^K04Qi@*hk!P&<Xa!ara(EVI8Rf!uMEw})x|F;z3
zY;H_UU65^<+8~BMan9y!p4MMo-hkMPtl-;UoF}<hu!m5)BW1l)>S?VLO7%{~Nub3%
zlF4_ZRKJcX;BQ^K+RPYy)DhOaxY#og|M7PhW)(@Kjbf;b!r1v{_j?%<(lwhEL&`^X
zoDL}SiE+PXC+v9jl2HpbO(7QGoKWRv2`dHnc@+C+6O5?`Ddz=?hsJRZqW6LB;DDqX
zC;|zHa;ZfJt;Qkft>Xx7v8z$w9Qg6>ftD4@%|HkOha`g)^Wxm>5N4JIDIwKa7$2uy
z5J$#p>ukghGwPQ06_Bgr8~bGh5(KC<(E5V&%+j)1E<JSq8-8U2!`r(#|7Pz+!l5c=
zBCKu7m2j~L`ELMryiA7FnvqAChg0ZHxd#x6YIlGfS>zj&Yc;3%3>XfmrDrjJa14Vm
z7Pa8WD8NoAil`_MMJo|gl0(6p;n67_UW3h%zU}j2?dm}Ec+a2SgHSk#rhd2I3BsT{
z2!_=NMpKD4WW;4!d@tepDZd}hDe8s7xahFfAmzN>I@~`<imlyvr%7Ad7(93Xr|_zr
z$6ZC@D9pxl@crdGbEnxnD2qduGb6we{aE>4Rzh1!W`c)<gQEDzpc9p!@+8PW;EX(c
z0@~uJsc|r^Pl4GR=q~Gd+L)L$yPagTs%`A&Esu!`rO&8_YQ0%gWL0I28_nk0&m;MG
zz=_dOj=uIIh=EgP9e4A1oV*%o3teI!MJaS#G-=v{qDPZ(C8(WtfhSt7xepx|YqzHb
zTcNht`(Rj7$4Ts3o;Uz`wex#~C@9&0ZpYWy{ikh;O>P~zo+|w>xP?{0PDn7qCXy+5
z(fM@$_@}9xls{izKRYejnr#h#PnBrb3)W{_<_Q}KbNyVf77zlI6A3QcglII6SP*q+
zrn}>c&<hUbnjcA&ga$6>($FJoZ<K$INgH#}v8ux}7@h^v@K<nW0Y3-Zw`4A7#W2HG
zLjxndzsEB^k6v$MxTmuNmd|70vc0m}U4bZ}uO$cMP96npkAQ2WCjS6DgC777CsT=8
zGZt;WoUrk()Bv!V>5sK3sh>L!W2l-`gcwGar@w1ulO)nKIX=EIPblO`Vtp3R3gla5
zeF#%3<WZ-p7=O;|0{1oDCw7nd-ghGBMOm_x)jImdVo3sv9#8#6c_6&&t-<e~E4)QJ
zErc~fY0^B1c)Z(L-+tr+^^RU3i_h!l<nMWoVjL=vbM33%d<|Ga912L)w)-P=H~&Od
zGha3t0o9pJy&mQ(IZ3m&Mm@Be$TO_;Ptw8S?>nYLr#u7bV|toDIfL5TbYgU2PT#8k
zw;lx23&e|qTg~>nr>o0wi!{u=082{6DZt~P5*06Xhtw;x9KvXVdCiH`;91}LA(vam
zTe|*Y6|If*w-FFs>t5K)%d^w@Jp3XHf3k(!EV(9VmI}YfM(G+4gKl9Kkh>7WbiW27
zV2szgR|4eExnn$8upXkK&WSQ|t=USFI$uKE3B&01IL81<{<zc1tju_C!a!n*mih@<
zPIRtvRLfRjI=Z87?5`xkdm|I4QSMhjA=X?qEaQj`UmQ_lD=@kcJtYKDzN}7_uJ`ah
zS6mZr(4|=xJfwZk{Z;b$uWPe#ppqo<9hR~d?-wqUl-D!gVF@gL=486>zoR}+;q90Z
z5ncx=pZCn`ziLg)XiWD{i2Bh*0i`qQae=?W_z95^NlXyko$dR8j_#fxNL#tJmLUoO
zQ(6!>+kh`auP(j$-3pVfrq7TPFC93)dc16`@E!l`{<<P@=_1t?REUF|vP*}8BpKK}
zxl3242)Lp_#Wd4OLB+<UjPZY^(t9Mwt1f$=a(k-uM7aFgZ?#$v5~)c25%3-pWL2Yo
z1_}%>P}HW+@B0tT&y_bR)4vKP{=`4UXm!0-^F9S99J_@N!W4K`4nF~V?FGTYN^6TR
zq?t9PjZ1`Ry}MriJ<!|6wrgR|8dDWLkVY_T{+JcJl@mYG+!VJmXuE|{A7JGHeoH+P
z|HAitd#}s%mFEx*!EQL5#2{y-nAgeTr1*dimjw47-))SA;8CE^iwo1w7v}n9q7sDR
zkj~cl&G|za)#%3#ME#-@JMdz10KY9eQ7bt1AI6W)A+f-B{yP`bC440`Yp%uxgM6=+
zyWOx5?^_m(=%OgO!*lV9ll`~6Z;B&6`$)x%gbGa2-yG%*a_1c#y++OUSMBtqM@dz~
z3}0^)PmhY)%%@J_{{eVQ<<$H3;l4ir9*Kb6yZsBF%ulq}MTo+y!8yfdat*!I{C@$w
zC8-(0my~r_dZ-iiBjYeHy(L=GEhJ?$>mIyOB5vF$4S>=>g*78NaYX@m{Xy0>bST<A
zg)M^u-pWc7IwE|TYltx`zOmv#QGUGNDsgcK!R5`;@|(HM7orteO;CCMYVOG(l1C$g
zJoc$V`tr~yCsGB)(Zlz47+GDi#VkOXl()pwi)tro-&c_{MCfpQ=i<Z2^4XN+Nz$^?
zSy^vcP&jJMzh}=z2%L(n{F#BVNb<}3&E(Jhuwh7p!C?PKz@w1bEZ{jvglqL8@&9U@
z!kK6y`zPSdqS>ha2zbQ*33x4O*}N!3;oE|`k=P}o>&i4snPVgENnB(QYrq4=Dqgz|
zS~%foprJRa59Y)=R7n$->p=Lyl2C!{#1mlBwC+_&2FKfA<JUY7$TZG%aERH)l$N;7
zuk;<*oM*vJJun~Zwjc=-@MJnogk(UApzpW0KaZken?k<O^!-7!w7^Nuu9i(r>B*8g
zi6m@QGD#~baEp|X8H@DoxH~mVBv89z{#2y5v9vJEdNMXJ8e|qD%5aU&n1<|$wD>Je
z&;=BX;GFo;>PNRD6&G;TAU1_#1;JudedEvMHe+)<^zlv~{}Ff-3BB99{j*!exKc#V
z?aH(gPcw%daY^l+aTDiKDwcUI2}3K5NV+wo7@B{`mJVKgJT_we>;Z5}=C^V)kl;=f
zLlXua`o{59eh9pTe*|995`^`7^UGE?{^e$;x9R<E4KH`%&-b32K-|p!(V|JUy#v&E
zBdSG`TZED`h8jojkb!Z~0S!9HXIpiBTcqu(s5hLE4WYqcQwZh|&4Ah;A^snM*YgM)
zsAEbhV#p1FN_4xnvm)H}kHCZYN8kxWQm3LccX@d?bX-~^Js3))79y#YdFfSj+78}A
zL?xyTah3EYD-x<T=JKqrh|ywQS)DDMosIA*KNrSEvperB8*8qak|g1OD9M*=_!_Sv
z|Ekf>DE=Yv`ds^Mhbw<&H<se9qzrjLFm)f{*D*I`O(9W@YUG1BFcTs0yN}fitFs=d
z<D@m7N9kwj$s9}KVRMk&K7?Fj-OGDiX#j64Ye~yd1}DRxEuAqg4`1~Lwo;$8SE;NF
zmCtYSC#!2QuBW8S&fAu#oYtr{!Yx#Hvv9{iZz#D|s^K~C9ynL<>lcJp&QzH4wVL96
zH!vJg@M@?oRL-c@@Uf0^1Udg7ftLjGAAxtsOXGGbF$C?&h`K)@64jOqeCliCBO9kk
zVYm+GAVg(E(&LFx1QZ<7Na|G&oK6et31|R&i@0W@d3@`}n%x?sw)&)bhgP#f#IFri
z^PArqkW`?>h*X~dBtWxe2<LRuTvK7fqO-TPrJ{h<+Pn7M*{~Z`#a51PhOfo<t50F<
zfK}4@?Pp9O^#Fg?P}9wSl&rc_z_Ky2boyE?r#fuAfeih8>G*#HUgAFj?|W4B`R+(M
zS*RzQ730g%*KrK6x3Gg*SLKO@iwX#C?SjYtMJ|tnH)JY-#RzzEIENc-vEty+!6h6o
zwM$lL@g1H0lDxIz;6-h?cEiyz(vdky7xX+rVV7T={gC&$W9VIPt4TDfH@)6e35y|&
z1!(07NtFc~Qup>^IPlf2`BJm%)RUqJ@wTpwCR#f&;bfaX<5EH;BDNfvS0mlU&tkGt
zNlmq<i`1Nj=PcQ(6?vZ>#O8%ov%yVR=~jQ`@B#-mtZ?g{X9QX;(rQ~6d`kP}87KXm
z3G!2@n?_L#dJP8Az)zDQn4lHat3NFucdeIrn`OueVxn33A$9$8@L{oU3W{mXp*RT_
zEBH#e&|9phq5^VgBYRn|aj%qs=|%XD+=BA-<s2RIIoQy!Kh!GLtfT><Mj!#fVdmSa
z!K6n@6}(s@sm}P|@DU-?W6u@*M8Da$QO15gufO2B-;_zU1M=kXVlAY=el^R0nYf>c
z#Srjs_=ozVi8VW6`#;TCPKiRd%WL7W`M4OAk7S#}`xGb6#GeR^S@$@+1N^{>26uCC
z9H*H0Iqyd52<_*8z%cw@7jslN<x6(N)6}=bdM5Fmuf3g%oeMkB0L<?|Y}Q>N)7oLu
zMBYQDF@@xBF_{wSJkU!Rsd-La2_&K8W5+cSgz)AiN?9DNNO5zt0-^gQHaooxNHWKR
z#VuKoh_2K)5iKLr41+)pr5=wJi-iO{uVW3wo@cx%CU&<uGX}_o&MCrC(rDwK*R&Cw
zEf|yS3-8>eBT8GicPWN!ZPQf{t{TP$wJ1=8aEA;0=LCu~v{ho}_Rnz;PL_99jllr&
z;>bsT1Uyk;2q3PsR;LQ2bM~C`pee$=PcX04{{%cGW&l2(E<0h9w7`PFvhsX&$%sGY
z_qSMCb)^jXCFO{H{-O}bUrtP6a|!5RJWYDTE8p<%-%U?+O!inO$E8*NGV&mv3327s
z;uX>`f<o5TG-Sy?O`ZglCTBW^eZyVP8Cin$Aw*&|WM%h6<%V<Prh-64a0&t_cA5ra
zjYFrkJ_E*eNn9Wv_+|l=Z#?}&&|8o5asF0X6kz+8Y2-db2I$a3@{3^vM(~7Axr@^#
z)#k`FCWM{geMMoUJx?GuEt{cy^BW&-iS!Jh5JCv=;-^s{eJrp~v%I0leO%?`zUU%n
zc~V4*j&m+QB?f2TtIo4l=dyBK@Qm%2qI>j3kBPnu5J)46G@N=SO(5j8Qs&M_apctb
zERIW?liS|l=1TbrTZ?Lm_1L!xGobZ%(4frN&&<6qzo&5pv3g_DlE{m=`&McGBuv`w
zbeq1XEX$Z6y;t%hki_s9cwIn?{b3_}RQuwBI0{LbrE_xLp+OpMbv|PPdWC%LUl`L7
z^hM?8M2cHeLu)y9y3bT!imuJ~y;^(8a%a1ku6ko;6c!kSRV>S>1^c#s1g*MC-JOUj
z9Y3PUf<yH{p7lY$>}P2!Kd9QFS<hAHU$OZe3cEbvLzo$qxPtLE$pKy&c6{iV5{X8s
zUl$dt&f{G^hZ8P-z#Rz)w~^=V!flQ7E?)tYg@O84v9r0?<r*QlmQ0#KzK<Q#dizt0
zEtAloB%q%loD0b;@@~1%&IA#!(+=6s73IRF3_D&2dDx5s^K`p91||N`+RrbJ<*<$^
zr^POgZqc_Y?OmynzOP0xSn(Mlyu;i^$m7USM&DC(lOC(ri7FLuDn5ACl<ogV;6eQm
zcs!ne?XB~Ceh9qh4^9B5oh!e-k1sp`KG<$k{&0G-|A(q`3bHNGvTWM6ZQHhO+qTV{
zwr$(Coq5yNP21{x)zuZ%5&LU@?>KSRT62u~Y4!tq!_1JZu0F9TnEwIwdQAGZYKwol
z-SX1I_Xzmk3cO<CI#@v&aB5{SJ!5EJl>9jS)oy#k-*Sn>kPCiIbZr3nqVGRM_VMME
zZOZAPBD5!fcxizWueV6W^phYYKA12bf3GY`Qu<ZbP?k}#tS}{GU`n(+ighMi=kseP
ziVw|jCQxGsV3b9n+Sp}Nl!#4<iG`fSO|vEHu!t|+rlsd;f)#4mb^TcjP&QxS&V6k_
zk9k{jgc8R2VJpakhC-yQ=!m|~1%G6U5~vILUz47|BtY@ywfM(xcV4hXu&-nKbwfYn
zk^xYHwrD5{^7*a&ODxuEEW8UOl){7kLohxQz_KaDojR5!{`dh(vVV1AfA?xr;2NmU
zU)B3H4Xv~MD)4j?nQ-kN@@)*8y<b{+H*cZ;OMzDu>_&nY`65DL4ermhzHhj^;0VF<
zj}1K#N8@;A!GdK6F>bVN*l6L}#n6OAJ2HNEUg3%ffYly%$lNHl6C_`QWn+5<^v3d_
znZlbW!Ncb_;Y!wYyek3DJ%1AI>f-Fjl2)W)1HZAueUl$+!SQFqcz@P|<<E|1IDiMb
z0xtF+c-PWan{}pT=B<?LvolFoSGXr5jGnVx6&DV?s|~KLtOg?s_|a3qst6<A_Hf@L
z{*b^C90lLsK~=xL`u9f0BRJOd558g6f~U8oiW%$5ll<>?<Xlcdo8%oXL8dS`d|a3!
z`_85~;v<e|+w(|~Z%$7+?*?{>vx9-lKYO*##qjqRd+zoxUzFV~&VVCBF6WimCf;E0
z8M96+hS6%H^~P8mH#9hZWXjlumXBQs*Jm41kYDklum&qa%RJ~_w-LfO_6S!Omtxne
z3x_Wv8O11hxwIC`L9pS7_W=`)d7n$7sIr)4y*tn-=gxK)DvneQ5~5?s1|$A1<~EQO
zSXA$s0U)TR9KT`D%5{2_h9!Ev0a4ZnFBZ(EaLg*{cK#;u(kA9rb7!jv5pNDJZ_b&A
z&YGY}R&`;Vjd4p|XI=eqXQ8fPxAw;VZf!gE@9ad~RaU>mJwQ`ZWc&()8>|DN^*6%l
z7d2T>Gp&}b8IQP0|I{N{xZ!X#QU!z`J}o99Lpw~S!iWvG>?I=Crk&X%>@lIs|GZo&
zR%k{!g-LFf9BUCyE{=GZjUP7-$cpnK7lb_un&;lh!<~0JZ%;WBNY3P8%(a&#N3Q3D
zj$`cT%Gw>A05k3n3mbS2(RiT!YZ>RPQB9fKe@kH<*lm3+<)D7z2$2zJp(cXjgxrLA
z{Z&(HR(u{dl8^P)oY|?048e5DZY}yd-|Km`Ivu(^O?y0@U|LaJA4#<4Wvfun3;6nu
z7O2$mYGq^VXMeyg2w1Q%ogRGhVv&b;lF{&fZRoYr1hXE&T%`TgcI)5f2}_NUywp<U
z3A#L;`%Ts@>0G%)xR}Il25Ob<YSsyTQ@iEFzjFkjq1SHK)ov*NE}EQQiNI>dTy9NK
znVPf-SMc&$&E;ule};N}H+mhqxLgy0%Jn>9m^~FDr3M-Noqa6*q-I-D3{HAhi-08S
zt0AL-t&EVLG<0$hPjXrwS9aBFznWh^wVuYNrt0kXFdOD}YW+Mb-;avF)?ETX@i@!v
zn31<t09oNxaXzFU^3?6OO(q=EjV<@S=vRzR>?$V5E$B>e95v4RPQOI*{}>?;>Wa*W
zUiuqJR`>?zsh3Xo0?JcYSmBzd6ii@|r&J`BI!~(9o=}#QCN0iV5VCk&G5_(DWbu&z
z?E8y-1$pQ#U~kTYbMbebl^i^neF+SHToZ>KHApuR<iWl)B1cJ-C|j`-@2QrG7*XF}
zTBwU8gPfsuL}(o<l7F$dT=@70stCT~p{GfSu$SV+EO3tIjAN(Ns@B*t+DZFbo{yHZ
zv7Bo|WuCZ=>U%BQ4b-ZYt{rtlTl^BLrkC^cyu%eGx(;k?0I;hl=0Gp8*R56X$HH0&
zy4o;~t@zA(Gz1hFKmS9E`x{tMO~m=Hz%5xO8v*Iqa6*y6{N%w31VpjiBd#MUqPUv|
z<IOy)$kFY;Gnkjc=b8oDeK|&QHQX|AZV1WR9-ycvsa_7v_~Td$rR8U}5=Bn;_`gov
zJtSNS&S$H<d#dw*6wORRA>>5q+ec(X8>(WPPMnYaP(>DXrlxHBvk4G3v86z(c&<pc
zjB9q$c=_}st9jN%!rPD##3@o`j4v|BbHCP)n(Wql%x3o=eIlY4nZU$P8JE5F#;jOM
zP|Pf*5}^j*^j|reGdQfp5)z}4h|gIDX3bn+tFEjng~{0X2{rI^gt{0%Pb^W)ko}UB
zM<*Y(>6S<+WE;MfWId$Y<OF7hiK%ccz*{wew=|x->9`6=?tZE+Nq{)$qWUp>K^)Jc
zBtCKi&u!2M%EVw8SdJgeeGKSm2eC)#`1??k+yQN~2kUm?5H;!+A1%b8`pIZz%24W|
zBD5m*pa^zwhV(0=G^{qU#qaP3Ci^SQoCDPJ!GJqyk2PZ^3Cf#izj==QAwsNQbbB+3
z#;9sOgPzLZawqR^EXih5?zP~@1cX?nY^%Io)R^F0JJRvCi#if__Ku!)a#OYOe=q~>
zHa~<yP3tQ7!~hsCP%>=(=P-Z$Zx{|V@VrFCz4KJD9|A@`GiBJiG12Lw>n2*YG2RlH
z2}&}M7>1jZWNt)zvVU{_XR&OoeVy0HD6_STV~OX;l+2q&T6qgLrsEO`3U|=kOQD=$
zZR}t0?X~gT7weW#M1MZ;e8qLQqp&g9t~GvQ7OLyqX3mT@rI{qrqogS<rx|DIkOf5+
z6?ro*l>N-(hlFlS<%iMA48CvE3_4Hdi(vguj~8|cIF(IHydh`1rnw{}V_!*GAHaI6
zsZVKr((uEj33gxNpgW8V-fcf;-YuYKM^X~ZoG+R2P*pNGuGFg{qr8E^MT&w%YD0aP
z-2jrp0Pu(^QXxxW2szz{c0RzA#08)cN6IfW*DGLw$c&LLS-}iB%@_Y`ZMy9Ep;=`0
z2>^oODj6tiIa978W+L=|s=@Ry>@T8)^Qg{-{K;UP#_DvFd6Mae({jBT;J1|#x|O_L
zE4j6R>35jYzZbs`HTHZIH2jg<aZD@(!XF$|q2eUfIj183lNj0b*O?O|NWGR|pm&ZF
zfWWs_pGwH>VUZwWU>2*AetR4gVmN)eDHCroxR~I;@uK-&&a4vac*je+G0a3QJRbXw
zK;-4`DC+an&cAQ7S{?=l11@Uq^fJDTB((X6c|J^a+%ApO3D_m)zShgrHc`Z9f}U&i
zJKqG29!~cWLWdU|N*qx2cOI?{)U!)QohJslh5YUy3lBGmCf0vr3iowk1KuN6ci65|
z`I>sn5lo;_dXX9YkXXxm<Qh0$yeRB~pekhl02YvX?=o8{c?v1u6G5c86{C-Jf;kT)
zQz3OJp!`NlvqpD>Ow*`o3sK5OqM?DQ9!tS^!^_5fL$L!G6Ed|M@N8ohYWQ?x|8A>2
z28ue!nWz%S3e4slV*%^L3~$kRJc-IM7jF$@>LO<}>o0Az153so!zxMp+3YEqO6Km;
z4sFW;(p}%ewBhjBFe2|em9S*i0<iLAJHbFBlC@YuFAk*{o&tVd#^&n|kzBd_QIPY;
z%sbV^%S9vF#jQc%lKNYa7<%@u7}dECq+rzGZStT<R|#EIC@6AD=0W7dm-ehkS?~W*
z9Ep<RPh_NQk&4%EY*G+HI-jdRmKNfOODjxcYes|X6cwK0S>u@9TxdIDWCx>Ii*{vR
zEkzZM1E*e-7`9yI;l?rJxNasH3XV(2E7hK}6h^feU*El}JYPow9Qy)m*<a<ecBZOf
z)W*Z6@%0g^)xf=P#Ah`i+w!;ortPr#%KZJoNbm?yY>q!$YKVv!n~LRvWv+=4Amo}%
zX|6cYLsi~{m&&E{0P^aL@kJT^os<nC7CKSNwDC9wd}AMkYKs8VR>=RdqGkvbMX?}P
zQ0@NfcpB^Wi4dQ6po$-5RVoBtrR->Hi03=uk2D^Q%7zu)iV8D~<ex&lpord4Lai)S
zUJ{k0#jTA-h{*&&SXmkV>;KKwKzdDtuHTX}EIX7QTX7ZosmPF>cmunWr5x@Me?(PW
zn;)Sv`WSO`S`w}=&c-jZ&i9E8LS#<Tgd=!kM=YoySO?iAK%G(s5R2P-+abYZ60?^^
z@@;{o-$@L%QR&i+W?S+_DmsE-D~!hXnZAU-(5$2~mJNI1T#Lai^hQ3f@+x43_H(#H
zH5|Batpw<-*RSGpv=kXPWMPq9eEmMo0%9*3hKMNA2j8;&R5Lw~ub`siVdA;BG_?W&
z8sCqlbvFr4x=2+8p^g^cK0>6pykAm$^hwnhqi8;W;atRIOB7KkkJ3zX61NVMuK@DJ
z-*#!6GIp>Q3dTUeKnLaoM+#$}te6>AAu2??J(?I&N&*uyrI8trEfb4$9#fry#R{++
zhfbIe^}`YJrrIqeS(cz^4+$T0BIp?iVh9s0%Pmb*KnDj8Q$G^#z=t^%D8ryySj-Ao
zWyZW?oHCSfA71jOy3B@owJak#fL2<B5zDQ<9My!nU%N$VrYQ!jzCdh@qyclv`gme|
zPq{E4lE^4L0^4iDpA?F!HvU$6<{Oa{oC$(U$C*I0-uMy>{XGI(n|3YU7cp2VPZzP7
z?D+8Sv@HoUq0{x%0QZ|f!hO>>3gN+ENtZ~}=z;BvGNQbGkGe^E&}OT#42UbdF%97s
zxC0n5B@VPFT}<Pi4vlH-z!ir`Ov(m{G$}x96%f-H)7enze;5#BWT-Dho~uENx~?a9
z2fbt8@JOr%F({9zC0^Mcn6NuiGV9|dV?_99_Fb)s^+LQ>so9&{R4FA$d~Q;`*)b`}
zVoWh?(qf(qvw~u_?%roft&5iGq~SrlT*^J=^`Gn7A9ny|LQ<YbDLa|u{(i%2kuAVe
z9|)hgG*0)xl(HT_Z1M^(259mMDc)yUC+#O~c1ze*`^_WZLc=LC$dmKSOcDjb>day&
zKQ`IHTK!Pbr1o52r_ae^!<y=h$JFzB)5;F!)rD9|E2o~C^y1Og%`izTA77euk0dJG
zQny(-utuMK3_j#VguC!Hu0wU8Cs4Uz(#4DjLl(VKMTPz`2C20&z?s6~6rDQ7<jznT
zGi<92RScW*d><G(n2#iX)u2L7U;AN%NsxRyW(q_JTVe|0S!~uor9jY}VKVvRSiYw1
zm<qjqQGhr?KTU$BvONlTi^t|bw+FnCGD|Z#02y{Cq=lMC_Vhqx`t5R0<htE-$0X%0
zyd^R{%5+0y`YZdt!_t&uvL?N&Tg8#=dYk!@I+|q8WUfy=oW^8%xXQ#cD!PZ-^3J6J
z)lNAlvOsD;nCy{Pb#a!+_BC=tWO_ApO=Nl%ghga}>#7#&P!J*d5fKYi10n{!!Ftrj
z<iw4CH<l-W*Dp-}9Yu4dXC&1tT4C8nWSQ8?T5PeJv*atzn0wr<V9%wa{maanW#Iw`
zoH2t*dI6&`-lFtH*Z0}^cxavR{_du|ETHIDmY{$^PG{aX!Tid|uJnJGClKUW!k`k4
zS5fa0dzKZkeG~x96INj{!KFWGJfndSlvZ09M1jdS$S=)YTwXeuAQmSL@tA!3gbae7
z4p78*I3H&T216OnAE=R^#!NQ~8+MRXx7s8(_`F$n+mpB0)rYg&X}IWr{@qhVSVVP{
zmISqvshcVUER*N&A>=+akSS=hZn*<#B7q<&ziL63Uvy=#0S#6r(}HA!iqXpNgV+9!
z!`ixCe5@vv1P4NG$P8NorVjwZfZJ643gc54{q<|6sQea4L$QvB7=rSdR#wp-2`6jP
zU-WTv(r@&smBLT->a~>5M`t32;yqokY|v%5$>GuYeOe;o@F63Ln}=CKRK6g32~R>2
z$R_FU#AXbP*6X9xAGBEe+D$d6+ki@Bp)HV?&6yr5H)(whfo>*9$iXTO$iPNGj?ueK
z<Jas36Py16(6vJ+0*P;>?6^G~PIrR4p08x>c6+Y+19QLwwBOjvGSH}9$EjK#&(3`e
z6I$b8clV=JDCl-aEpSEF@fh1Dxe<;^+Q_$JHjM9TEjZ9tg1hUxCA$I(?&6R}gLcYP
zQGJBeLb;DeB-k2J@V`kk7tQUHpR6G;7!<k!dX(I2w^XS}rv+<A;`tmHUukW}5TIaH
z0*-9=#)n!CSU$!cY810vLPhbfrQG3N?IT5rge|F&Z?+R_>E9jVQvtB6Zs@C5J<Qv|
zTax--sbOpQ)JUMPGDV!-TXh^mjSQ)jpo{vBIKK?J15n*(6|QKnEkR=EBa*K$9+hsC
zrsGFP61&~-kY#*`mJ(K}P{nAF2(ALuaM<UpKOH}2`^>1r-IKZvyP!>V+__D{HZ$rB
za!sOp*PRW>eGtN>48(4M$?!sbNJ7V)k~Pp$IRDvh!jJB5Ul?>)LnDBGwxQR{)F}3s
zgZlTEldWpLGY!YctR>3Z`TL(f^Iz?@S>FsCb$#$jT(7m^u3865JZ0_6(=?U<-z3G_
z-bu{iYkTDQdhc#O<!a#(Z7O@`q9rI@((RQr;K!9~><_BG4}HTwU#<Hyba<ccyukJ5
z@w|S6bHjJ{*}uRU93J!#abdws-<W`$O)|vzE1ifX8JG<rA>3Bc+A2;mtI0pba})m}
zwX?Jya$c<=hrB{8bg`ALF{2~cwiH-te~ly9yB%h7dmCNMD!m@?`)FjA0tTky<Y{%i
z4Eow__&FgJ)q+Ddv@(0F_`6@{?vjk6rt11MvOi{j=^vNxyMwK$^OlL@$X|2bcLT@I
zWm+iZFBM~<4Js_WdC*6bT>xl>uAg1|{W<J+M|+h4tN>yy|Jm+j+3I!H;rkrOvZnz;
zyPc>1c0eb<=?wP+vyJaU*B%@J2P=%r0%we{UPi7a$6G(=-{NvsbzUOAa)x~K<%tc|
z1CFjwy>w@8#CTy)4|^zc3E)Qog^q5wiTuKV@$t?14>(ub<TpXF?ruWArI*pI(8JT>
z;@9YPcsBX6iZUz+a$IM>Zf<`VxE3d`ZZE%)R93}5Lzm-Yx64(wGn%UZUBvakD#mb+
zrTR4eIyvLGh3I{q{Pl_yrpWvAQnHEQ=w<5T_qI|d^?@Mhajp0CJUAJdcd5SXIS3ko
zF87;((ffWDd76-f;E5r3Rp?Z02JqfyDsRrOmc_>w&LH~&BPv`m)Gbxnh{$`oi2gMv
zmlQyqs3T}o7jLGvtgt99$xQATxX{GE*XT3Q-2DAJXgA%}a)S8>F?Ytkl7tvh&Mg>Z
zkWn*uoHP^dl}uqRGX{XEF9?8%NnH0P;O`EN@Rj|LqqGEDT*?2Y(_;Tx4eaXtLzoUT
z3hVHCtN6OVC~2BQ`+rm#i#)M`V<Zad>lFDR7YWlN-)BSKgv{;+BLFju{s2AfFOfFV
zMQ}I8eyI4!xpH9C(=Of<13`9yG7wUXMd)U?QgwPuYjSryy*RaA5=#AQRj_YCyfJaZ
zyCF&4h|jC<<EST-7+Duh^R|$2y0YH%ye*$fmy3oL1)h!Fw%<ujwrSF4(~C<nQmN-1
zQh`naS-51j+hfn)8?dS4^S^MjOlon?It5p+#oQh^^dn{X%Zequ3s~lvVRRbX(4!G7
zG;VQaFlyLktr?f%lUczwy1rI@AMLOEoy8o@Ek!P@AQ1g6#?Nh^0F7MaW~VDn*XOd|
za1FIBiGqJ}sU}1)0(0~|ZUbO6mnHG4MexbTy)}^k-+%-6_~ftO&znbfr9vM4^hdGB
ztLy#H6yOcN3$AdaBYf)Azm;BxQ1t39R3B~jHQ(X?;%9}t4F+?3V^@mhtnt#T6ZaSN
zw?qq!YQC?qVm))OXIknC{aU5gcmdb=RXC$D8}f3sK3|c%jH1qF|8W%2ZRuhDOxtV7
zSE=XG?*5GOeHlMZKAh)JMjOl_s@lg69e}F@?$vl*l=|*3+*hQ1r+a&Poh>3`fD+gr
zM@2m^GlP0Ykp%v|zmP{qM}u%Njm7khgUff5n*F8FxT!*k)3G0{G-%7TfddJ$YtR|a
zm0&uCvv<yGw@B2thwl-^d)^0iC}n}ANO*)6J0PkXz8O6m&FO+8P#DN*V~)jNZl%}2
zzG*?yiE#5=oEIb?Gy!hMv6leGs~YWH6gFP8P#}uTJ>Qdk4=L9p7*M<qFt;-UySh79
z{&7c`Y2#U;m+$(4)T%9`0tUDKe-s*T9{Tha!zBr(`B%v5VpN=L{{@@mqYfTy2Gx7S
z>ae=tI`tr0g*d{62X-TUE_>s_xZfv;1@b;+H@gls5+z!WuAzFrr0^RHvOd`<3VWkT
z^20rWg7usaYuY2T7EL}-GPy}VOCd!5#s9~mnHgMb+;c~_&D~*(NT~9XY<`rM+^o<d
zF848U2A?9;1xzeGZ%gj~=_6O=D^wqd)5Mop1v!Q*UyQWM77%5*gJE`9j&S~O8ZF`F
ze`qu^fiN{*|4{~|d&~c!(azTa6#vM<aQGL}YkM)-?{bA_#Waqh>HBxMyg~mP+y0o6
z?*lqU^_AaH%fgnUJU5yUkt{TF9UH2Y^9}c08|ip(ak^7k=n)I0=~ajCUX8R%!1}W&
zyCM#=D3E?n`27QMJ#yDt-z+K_*kc#@XfbhE_<YS)Ij^hJ-plir^ZV=j_38T56yC3y
zjj5mACth%nLj_O^n^wL#)nT)-C?oapF$nQ(3~rKVxs3&U*VcJ1-#Pm)<b}}OE0UxE
zM1>FvA9M*6GtshASSamB9Y~aSj?)LV9~#<eQ*LX>R?XbZl0V=xA>O9Jjt5nJ9^nHB
z#QvH`lu0%Bpr?`NO+{N>Bqh_(uQ!n*s+*-+xB;qKU|KLlhmpdXpGKnDGjCdP;Z{l8
zBd|j%UvEgX+)`ez9^d~@c+gn$;}Eq*12p+n2j`mdT%PLb4GsT?B2ETmrTu$%(2l<I
zt4bY+n;_sTS<C}ip}apkS<cF)^YexB@7=yXm`<r1BT@@|uuZ2sulzX3Iu+k6MWhJi
z_7~tcVmJdG?e}wpd;)ifUu0X?5|xBTz@s(LY4!pxTo68-bNJrz{lji;VL#o@r88v~
z!}8m=cw*k)OExeIgN7a$RLEFMB5YiAMM!=c%>6tV3P8o$c~VK<+j=uQza&B8^F1Ze
z7)BQYl6btX0Ix1K4|4E0y%Z&h1zJnRA+Re+zf;$+m}OOewIYFVVeRPh`<ZbUpPHrb
z|GkiYj?rdO?e={Zdzw;C&nc4ZkJ)C>)Rh*Otj7V7TB|kz<hVy`s=<Pn5cN^*&UTaZ
zQIcP}Z#1|0UI{6&HpO^WLRsp|tW5u|5!c+Byz7H<W&G;G3S|sd=$rh{OG}v_Fh%U8
zzA9y$w%Vri^lw`U&*>j91^lHxSjzabT`-mL#~Z5T@lSi^<#92P>Ke1vfcB?_D=fsm
z-VNPv_H@5A!%93@`Y&vqXL4L=z~=6I$m6Qhl@f}J>a!K@`khWaVR$P`?-(Mfr?M85
zgr=cFNg=14-s+_E?4Kv}WcFU&zakoR)3;W`#{(oc1E43ptEk`Z(#5R~N#C$UAI@ZC
z`Hyo~M$Dmx-}jRMLNwnm@O$p9uUOgCsd9Yc4SXi@+of=*uc?N8xd76zKK7+ZgiZ@Z
z2kh)(RWR{V+D{jR<9&KL!1jQOL~Q={$HUg%MqyGZmCX9<?s*ikz%cuP!$zfWH9|^<
z4%b!j;qA<-!bC*2jPw~1+T~#?!?gcyuxnG{sDKc?wc8MeYqrWUME2mUAcR6%MHC~%
zF>Qs1@50(&Ay^iM<t=_WeU}QQD%c0I{Jh3D9x{D|WauWMNDQEf_siW3lQ;n5MvO*W
zuEU6TVOm~ale`|-TH)hC2U8MPoP|=9G2#Z&16NnPD-eA%5#27dQ#>GhTtfySlQ<Hw
zNyNVz*g&gQ9q7MSH)?uSdOF08MT|!wzGTvf?D6J-^F)-pK2}pvR&A@;{;_2lhlDtV
zO>Lz^9D(mfto`yYRtf8WyDQ_?<Gb%$M3Pxsch$hX)kV@ImUW(R3Al6$+qj9{e$KP;
zxHbhfv#5yzp2o59#ZN6R!7OzAM1YDVMD{%`j|U5ruj;c{{q*DKD%r+>Mo;^?@7TXc
zFigFa<Y_?_kSV^)euLpm@8FAHzjgAz8-JaScLh#>GtN+NkdwqX`Av5Gc~e{YY>nI;
zVBle8<ma<Iw9{Ema`Zs{sDXwgcog_HgGicQo;eJGNJEm3Aw#tFX9YRofXI&VWA(br
ziZN+)dHT6tjcjh|D||=&D)Z|@E|fGJZzNR${mkp=^LRaYb(I%)dU2JGVJ*_`eDgp(
zsN9xGFGcFxQl5JOyB4Puu57E6)^oRt73|v0g4Q8$k)rEnr#7`?8R}rDJt=DS^34Ra
z2?@;<>>6NDye1I^Dk1eRt&Vvq*%nCi&gg~~hrASR4ee9Sj+z$N9N$yi<%SYwwCgP-
z;VWKV-u<0fI$i^D1Tw^2FBUSnlJ{oP$#cDuQjq}Ut!=q-mAEF|0J^GeXNP7!DVztM
z+6JI8YuxIN`%>YYX{|rldP+IJJ#ic^;&?R#Eb+Nxlz0Lbv<rh+@3=7{;KfI`)oFTT
zYrh=7%k(eD@7#P~`+gC|`=UZ&KWZ<pt3ShnvpT;DlE(`?P3C>+*<8Q#?L~G})6te|
zc{%g2!VFiDk?S=I+V<?{rPX9VEiHc`NgxhOtZGvjo(uk2`gV%B5B@N_RM}}~i`S!9
zW3G2KTWj|8=F&F-1}?QR!D=CBHk_@dX8U(fKnp(!?27EraCXmy8bF;w*2Z}Tqu6yR
z9WuE05sO`)VFGAaeE_5J<o%s;rlzcHx3kfDhKJ{G|5zOx=~H*0rTd`4x33UrGvAJ{
zQ<U}9COP4K%nAP4eUpgO`#<YY61qbzG>MH^xNG0?#ay#@JgWMvBTtH}w9<4f$}gL>
z3x#t{2WQ%=5ZlVJ0%}8eBMs8YL-h;Wa=IF`S1c(8><5;|J%pflOsN&@2d3q5cj8oQ
zCQIA#AqCI{oF5@{3ci$YPQL`W%e=$9_n2u#K58p-PkmO4eDW)4`a_<I_z4Disy)+W
z@|xH%9MoWd`48o#OnJ>GSB?7>{RQp>c$glq?}1-k!b;WLzYu9I!Mlo0BZbN<OIixT
zLb)=@xysQ`?m|&qfH7}DmVbe`12yz2hf(e??xVOL;m51=jBmjE3l}#Pfx$dFamvH1
zG;@^dd6Kze9o;i`AwsU)cActSy!%A}#2O_PdE&^uW_}wi)r)n?Vyu)sEI*!4e`ST?
zVM{}<ogP_;lWz#^CqwGp$iZfMFpjn%R?ou=Eow6MB3@(>_o|6bg~4+qXAo0*$W2BZ
zg;DjuoP<y(+5ZJm30RB+s#fcVe^e;Y#y?aj?5sYiP^{)3QU=$b*p=a0`3H05SfPID
z%7q3!ab`oIUNGh&RIZrw8DgxS!&3ZRT3CiQw6-bPi@AMjR)+XzM0G%szBW~Zu>43+
z?_8mjiQr}b5X^q6alY{Ro=LELsA5%~U!!UY$s+s_)J$y9;)M>6)##RqJ`{UD-&1Y%
z%NT!~w#XHSu{LSpMxP5<3Dnkolq)Q%+fd7k5)9ZT;fkZC4;;sXC0FlQssOF>q$wiK
z1RiC2nPZ`K)=X-uWZ=?UCilj$cqS_is}rHlOR7*}%akvt2bU|g*crxob$<Nu_R$qi
zL5krDvoiQQPOD;FW+_*ICFPo?P!K^rQ%>t{HIDf>=fB;--fO-qmnYN55=lSzSk+<h
zX8z79jjn;qxKziohIL=p;wJWFLsA|+xr?aEH~o1e-Umdk!kTmf*tl{R&VK@RMy!{?
z!pl2|HOmIUf{GNGM)gM-!P~N0g%Tx6JbEPQm3~b+W0NXb>^4r+2^1gcT>8dR6;*(m
zOXRaBI^a$|^=P_)+awe=hxLe98K5iOkqqP>8k<=9`JrN~$=g5TG;8*RTFr8J(!>#7
zk{@ZSqr!l+1z!@#JsD)%6tdQk(Dlzovqqqw=kjeYjV7iVdPl`FkgGza%|ELvw4}gN
zQZ-^~oeS;^WB(*Q0)@`JewGiHrzel6Gae7CSzo)kAl}3K*T>1UwoAzRqXPPst0B=0
zmzU*uRV2;K_eBhE2MCtf*h2)f1O`IGHvTfy^PyABL|HHjRB0xntc7#reF<?Hn;{Hg
z6to0meCx0)44jNK6dqEUr(3YO1eoP)har~mX@VGCKmq}4U8L_?=Vxiy5JWs#ptu-E
zkVZc&W-!3~q^9v-f>jmr3UC$rk<5f@dK}xfoA{(*U~gXtE5px|+*4_^C&R6nhpX@<
zxD}U7hRTQS#FT|t*%SCPvfD(2PZY!yh(CL#%t^9>Z)n$FvqXq9V+yar`kIj(Sgd6u
zy1Z^r2`Jo*!hrujIN@pa&mh7$)ffnM@J&>A6Qz{gwrqYKrOf{ApFbP{LMIyC!X-jf
zNpdN_*mE}*q9fJ*3t^-7w(DOr8!^zUb(@|DqK;i!Q~KzWruIX^!y^z~ir2jZbI}uz
zMj;AP922MD+^nIRsE=a6KPJ?FBW4FEQYtv;330c-=&AvIN%V>W!=>?gQmYHna=)Cd
zs+OpHZ^F@H8jeXZTHL#vNJJt8!*lRxQy7}g?c5%&ui4`&W=Aoga-sq_5(qoUZI`e9
zMgR}h!fpUF2QqiZATT;Ir7S$O{((oZ=KPcNj=%xQ{O&`Y`rpLiiTt)J2cUb?^w`7;
z%UPS3MIX3=&#O6n3`%1%DRX_io^3p;VSr^XD;wG&`M9OUFVnf*Bxcx&eq~T?)?Z=f
z!N`J<y3U&buQh|48qQ&;UvcO5beeY+rJQUGj$KM$S2&+LD8~X4zfsP~-azsO^3%!j
zgQfG}X0TIkqbZ(`lMd<cU*qNZXTy3uPM*e}xnLN+$y}Z>%6;)@GROSa$6^t$3jy{k
z*!l9ngz@<=dO#^k?PCPg`@Hgk`E4ZXSegDREAFad={Z#Xs$%S3gHONXJpj}$$bn7L
zYX*B%qL}SHm;yx3fqr(AIgRp3+~ADEwIOD0{7?Vz<TU(bg)A2^ZqZJyZS~D$0ICc)
z9Qgw`0@<ErWo}>J<NZ>}Th;}c5&*o6#-7Jc>J87UEi}%s;ySWCn!Zn+ep`;>mox;?
z&IEd5xKu@S_wz)ll6wbdQ<s>ZHKvp$stO&81a7G6VpRlc&;&P1C(s=K7(LzYkJ;IC
z<Z&MUdL7@}>1!Evp6q&DZWXu+k(f@DeU3)14;6SKasZ?JY*b#Yj9x!!W!@Z3Z?_~R
zE;sEgPu=$+Tp_t3wxgZXI3?*OSb3OU6uBh>CTv}WP-3iSJ6f7y&P!G+txo%5h1B?K
zQB%Ov1#}lM?D(a^?`~of58FbS&Y-?%Wo0psRgVs=yqR$0#ItUy^KoD#z4Zp{lW`fU
zlf2pKEb3T14o!b<!`<9ZvmI_>UumrASR}ckzTYzBQ4H2lN-#d-pD+^ia4{?2J-w+D
z!@;6>1Vv1Q{+{+YgFg;!OV~)zXpU#BI(18^u8$uq{%Cs|F|ln#BkA)c3*XUIQ(KjO
zZY3Y5r9WE@5D?bE?eILKj&=&P<V$w_Si6-=`kl9(Gvd>^3j2cb*2GQXxgbCo_wFV#
zGAjhW7dGPJspMnjk~xNZ05yJbadf!`V_65w59AYbq5Z8^(J})@%92`Y+CRBXjvXOm
zQMF4o&6)uD&{ZBwJy=SedAJ(2fKoym)S#xYP%~KQX_@{^5(@2_`05NGCS)J~0SyPw
z^f{mE2ApXQPa;PIxX_J7I)6!ZVWrv?srKfy4a2fL2kT;FmU%-m2ZNZ71F-2WAs90%
z&3vO!dc5VieztPgM=c!LW&T<%>vlJ+p_T<SyVp6J*SYHO-$l+et=Uk9X->ALA2u8n
zt&xBoUQfGD^WyqG4WDHIqZu9n?Ztj@f#IEgKe+KwA5pr$XjgANFvb(hM1FX1)yxx%
z$;Kfy9K-P@DIrHrJ9sQ#G*xyvmW}DB%+uCx75msPXQMa}4j5BETig#NMuOxsRfxpC
zlQKage97m^zchSf)~aNFJxC=slV(ajt>3e~AJlO2G}c6KfcgfpC3CmT+bzx9MKi~z
zlgTgo5IL_y@GCf~T9rT0)XmN+aE!{HW?E@_QKRz}B&yG8ASV!ya-2XW>TOe4XASbq
zMx1Wb#1xx>zrV#CLCF;sGjFtGoU@sbWoMgD;rCPUwT1Z<lzu_98{S*-pyiBTfD_Qr
zGfM)R0SkJnNEnq715ehs96#tkWcuc=XVLI;iR)-Z5CuPiA%Y__k}CV%u|3CZ7M>?h
zx=?b4#$!L|@8i+`P7`rxspNSFZWY$Bu;qYow!=j}$LE)mtE+4*KiHW+urohlC;l<6
z1_y$BS$clL%(d2|Syl0R+%tD!wH_-UN9+YoU#JwC{&Q9<{*t6t4=SY>B?Pv$T+8}5
z9R+)jst&(WR3eQ)-cPXI_;1c{N?012#`Ia-h)EWKzm8_zzarf}X5QiBswD(?9R2j2
zcCZu?_TO?bG+)AICO_Q5>7*TN=Z*qL+d}`7zHW@P;q;{E<z-{4cu|5<25D34Nk8`C
z)(h%ZU1(1#NYm4hn-Ub%amoa8!_1D)_o$Q!ODbEW3hIa;$cd6i%Y~RpIzVxW&gacM
z+0CXh0V*;GH3IU@inR1=Wp~tJ2k;4g2>`P3mKq_j`((*lEsSXSpPviXi_tR2pC&DA
zlzS?j?8ZF)BJRXdoq>P91&jJv3Edp-=#5$kI)TiHBLrg9#Y|`DZ%!`+7!Jd4((ET5
z0#5!NbsqB)i)J@(<*e7Vh?LcEu*|~M>VyOiLyZ>f3IkbaQeBZ1VFRRHmmJSurNzq#
zSKlYEi=WBkE?<ofIY>|e<M)#S9^`xmUbO-6;E-aYOmjK8{rZaI`-z*`BL8d@9$qlm
z$28f<VY?`n#TqOg{fVho!x;6l!BkNphr+sqv&_P`aU}I4x)}ZN^8yGo_&>aaA$kcx
zb`gU56bU_AbwC<F>FIWuyMh=(a|946d294~dU!s7<ZQ*{)AaR&?W2pG^dim1c;;9_
zANFL-=Zv_10+@`3^VSIKgCm$M!=&j%*1W%t?~9PrO%@+j;{DE_MDrV8RoWkbU^w}*
zM%$dd_knBy(?g?{Nd6qZ9XpPKDkaJUQ44;g&)4R4Ny1(M)fucs#mO@(4lLDOg-$WZ
zB^LMlIj#GB6HCh9flg_R$+K{5djPQx>MTfi4geTE1*riQz%5`^)P*v|Pmg6EaVp6%
z1gq{7Zb>zi%9cUT8mJ!wD}YV`DJ#(pK+3O&TTJ4gj~$uKX})?tEmAw9sI}L}dz8_j
z9cN%B20`ZsyyTlGLpA6zmI>wuI(-#TYLP?=YW|nig^M-Aa4KiHu(#CX+H!1JA*DXB
zR)6?o^glWmBp7JymHFbh{VMr?3h>EHCI{PZ$Le&G%Y%Jc^?fm*%eqY-?actP=_Ws)
zLyMoFnS~j2b;qko0%MY}$o;#zM_UXB-ihSsLBrDhA4xp9Hq<|#z@<RX+tl%p@f?ep
z0_^nVwOG*TcXD*Ib2Ps9(NGyNcdn`yWOk0{VZ1*<7sgCuh*9A8ugxCFm3m#a{NEsn
zSMfgk^%mvc+%wX*){1)>#!X=I^C;y9*pXb+Gn!A+)bX`Sx<jkG)`sL|N8MMiW}Tn^
zC{5Wa;cwQV52@2etJ>t&;^=Qxiu$Z)$}|>0WW+dnZ-w#Zf6uM>+s?}ZE+bh~?9~!A
z+tg@)S<UrKextn(p(ofb?qx&vnE(!~rook(db;NLvO1==9F@bee`m0`u|2*S^!Bjs
z55V9j2|sE+stNpI>wNqL>^%uf0L3-Ty>htGbja37|AnikLkKUIHMKFyj!GxVb!T8C
zsjlJhHaeoQEsyt+@oUOXT~HsbDf}k%U9D-zGr3^sx<zt9!l4!MA)QEzdLzh+!Mbjj
zrrU;c{Q_Si1^?XKu99F6xj!FGotqL^Mmf8ujWKF}B%Z^eozf{jFR2d~Q95_PIHa5}
z>C12Z^mm}ElF}%5SY@iU-1gO<FIDet%?o>F_xAyxVu)WskXAg~U3XJM9cf84K=B!l
z{Bw#{#KywhhqOAqq3!b$g*CYBda*oYd8=RKAgg_r)k;BBA(AhqEZBpSTUyg^x0sVL
z@{nr9Cf#P2{ZR2``xOA{n(_(0N1$)ySen&|m0a`B2`zeD&mKaZ;`0vjbDp4Y4V-=+
zY{Ac$8^jbumog8UW(@Db&w=5x{I+uDj?OdqO|U-I^4<ZIUa!2t;#xA`o@H7J`d4|)
z?q-oayzI>)Q^c2HR3j^!%VY+6wGxM7fOSQnO05eYMH~Wo1BYL_ds<7-w(kcuv(ZU_
z?LW)13vL~3b)08>yaEI-q0O$TtKUVQ*J82Lci)*M;LsTnSYtsYl}Dw(Eri4q95I%K
zJamFm;-ce}?BP++e|QnQut^O1;n%7O4moj^qDHF|eWKjz1cltF8}AWpNwH;j`9)F1
zlAV#5FksK>{ytY1L@B!w)NfTN;q1H>Gj-zGJAW$q`O5vA0QKq1{h^S}kq_-_9P5z{
z%iT@Z<99%t^0r13yFio_+lP)ZJp{1O&IWL&4`^p+=zBkLqM5g+clMf?e&+djg*AnL
z3|Zy|{$L>1MEke0{D9&kmjkKD#P^~>Vx^9i9m|67E9H(&_b-|?OVb;@<)YUzPdxRc
z88x~fqxe{$0cTUWA`G!#=$m29VY@RyFITuev3#g6X|pKP-eIfKa1g9%A`>KVa>iMa
zGL?m%OPlQ6IU=Lt5Uh2w8tZ2lVu8{7f+Kt`ArGV6T`zuNR!V7jd6=Hv^Ny5R{@$3(
z5;^Yr{g~|Z+gq7D0b#G{_O?6vR4^#jz!aZ8eZ9{!ie?b1+yNqyp@w>cog3rckrW;d
zL#>~OQCw{Hfv%f)TBwtf5)(D*wF#w+=H(Fh0eq<FW)GdO8V{XCZ@Q_=^!~u9ZydXu
zev&lFl8gl;Im6(f$#u7qo0abvot+hxe8gpMwEX_dZ}dHim2*MuHCFk|>iE`jU_qXK
zA>5VX&JMM`d{ifcLDY64+S2KGuIBn!`i~=0UA@95$xY0KYbf2=<NX`!pGt&~sjzQj
z0pD9rmAnQKx$T3HL##UT^Ln4MGZJf{GOrf;kStrox<r1}Ri+gIn+u$Ejk7~RgXXCh
z(2#D?^DnbFWRT8{0h8QzB6L|^*ov@ONBa#~t!bg5TAzw-FKpARXY>Fo_!gG$69alI
ziypyF^jVr6;iqZ^MLt-qkyODVMiYU;W>ExFpvQg!yHRO-v`b|Bm3C)j70_He_+_CA
z(h6xO&_{G^MBUiusng?@Yvs}Z?|BoM=Mxm-m_Q66VZ<?R_u|*5D6qyLI9oadBD`iD
zM-fu8?&y8Oq(az=$Fd9>0^5GK+{>-}bx-p7_|Q`Veo6rK6Dm#;H1E0re#`2^2@Pi>
zf2{x;P4+a48m7dlSZkUZh;hK*xF&v-Txdwv>}eTv{={BjbKF*}A3U1!I5Gx(wpz|$
zTUwex_L@M_va*@<d{IR+2d`O&r^Qu}SVlA*rN~VB_N+zcFq%}-1yxcQ#!kOWns#Bm
z78&W<?pITg=5r-WP^Cv}V+&Hpv{;6(W^L4JcB6*-x>V(jt+ftYQyuwHsZG79mq&(j
z&dEjfdcdL2nhy<Ic<Z8FSSgR>c)7k9_0dR_axNUY2x~1&$(2<DU82S;6~X!!W%ZBJ
z(C=y1du0<kJ$yKtv_5lqe^N2!p!KrJodcv)sm>ZIa%W4qj$+n%Xi^O^G|G83jZQuo
zdwUR?yz^x7lib6UL(4|17x>B?aJihRC2mX}z`q|j)fHqq<=gy|OqT}tw`svJ*=0>R
zN!pz3$NW=TyhM&0jJEe%PtX-SO!Tl(lGY$U|4j8$UL^f_cyKaD-DceR6jz@EvbUN3
zd}WIZdtKQ6;u}xHnpz#Qyh(KfvbF(L26EpLbTSei8+g9d($TcOmrI9)@1lIbbps_h
zX5vfF8@E>88`j!EbZWt!Y3!TKNA&qW(lLWXcdaOE5VBBe{hdK=j7vb>Ro8F|R%x!i
zYGDsGf8f;qnrAo~zz4ax!$wG@+*~+ZG2`DnoCq>(+Lr<Qgv)vWcwbku$~?=#sNE84
zyvCfZBv45oV^f-6p>N-toq~yeQCw`Yl6o>LT>|^1Ib9)p^WjP;mXmJ}B5J*P(NjUW
zAgf@_h+^Eiz$Vl{3b>vsa0Ptx<VjmhB`S-6E46J8Axmj1EQf)=Yi(1M?^7pWmKL+G
zegAWC8Kjy{Ig7-tLQ2@nP4A@1jg@PcuB8Id)%uA^1sqVLBj=w~7_FG3LrQHJ+_gUE
zQ@CdH-Pj!4ab<rv0+lPy49z4|eI%#f?cv{SWBPVN{5nN<0ziFYK1VWCWIo%-+7`gP
zwY2~n2J&x}DYrSm{iv|8%V&=2(VAx*ow~nfallroBY5F$THZYYT~Uw~6VHR%GgQA=
z-Z&4naYqY-(tc4DH-~?b8QE+pWir))ZblQZSbMrAF9@m%LGzf1$-&mz3_w`pn~Ncf
zexRu^eHNsw1cGE-Y=4Zf?(%GLWp$J%Ocp!zboFW`j2Xo6#RM8q=>C2qC_>?6^7w~7
zwtk)*lg28gP7uLtLZrq>2!tyJcY*NmDj%}KmZtE7iA^YvUUgAwf%985w;D0j2CS_o
z+@dqq4F6-6)}`}@N&ZYVVJnSQb<*hm2lmKo>|bmPUYQoh?45j5yGh<Hq0&2mG~;Wf
znDRbXtnJsr3c^EdOGRvT>%tSKC)UkCpxn;q`PQq=pZkH<tDiDWogFR>39PbOxW@MD
za5dx9gzjhTrl=!a4cq+GcN-$uWP%9M;sUKN$I~n%vR*rF3FYH=T0&WcOyn8gni_%<
zOb=zY_-<e`B`l3p>)*YR?~ff!Vb*COUQ;kuVKi7Ye5MuD<j20f#Dw&UvDm}~3!~pm
zr7Q(*HM^MrrJ8-`TQT@;DVF>_n~yAc$#Dl-Sdv)fVWuop&N3|4$SkN?9f{kdH4=8y
z5;9>grnA^ir@?9|aYbsnV-CZS{ZK?QdB779stp@v65m?&I<0=BLed9ib^=i7b+m-0
z0%Z=^6n5r!C)<Es=S<-rfx{)8DA8|Q+MvxoNNdsW8a}tc8ie)nm<`7iH)K8~R!8C(
z5Vn3O;=vdo;~nrYHdss($P=d8l;qn*c-)%raFMD_sLwzV6i)`^`{<&jxlD4>q}V`v
zsDnKi4l>c%{LSSP7?q-LK8plT9&Ir&m)d;+R_>M-%oL-uP;-u><ovKz&t4fZrx6pv
zJnW)2A|F7hW9D&#*3zuuf;PqER5!!6A`R^K-EIb611ZVWB7~(~qJ-$l`SnEq-@6lH
z)XQ^Nm!dZh)&=AX(Wlgo#|@g)rgG7-irq^tf`40|3F4*`b#Pn3uW0P0x?7cKsh-@M
zCrjNunR!NX`a8~mi}eq;Xtgu%q+r7W5obQZl#!ei8bxWXLnZ2RF>SxS)s4)$?yyUV
z3H3Z3gOAln%uHo=hapo8>Gn6p4X}$OdA11`cbiSY()uQD>`V2wkJJo#x)L>!K4G$G
zS?;Gm1N=g2`@b97`N2K^9WEyrzKVB{i32n|iD{pAXz)K-=Tm+@8@V20$bJjc`6S>-
zthb=2J<@w;;FN0gK1CR?OCW2`yNVjW25O3%hW~`XmPM`NW?+)$ruVduf1>@*Fv7Vr
z*!qth_PI>FNC2l>(3%7Pv9B|1{0#K+e!!Et-)w#C47?<P@BSE^2EM#p?HG0*EDooy
z!_e^k=^Q7lLb7t)Y5V6+*#@U)XHo%Z8rTVjrep)~$KgVx1s_U-6TTj#K|<_|VcCw?
z>>J}@8kOS7S_AO06@7!})o5WyH(ye4-KvADA&qrE)KMV2#X+s7b7x!YddE_96MBT^
z<TK2XIf8^4ca+`0$4Lajk>>x=q0d=<+pzwj+js*qe0(SY$7S!QLX@84>!(ck$J#1{
zb|!~Vk7eXi|Kw@#;Oh|IEJeG*78s#%<#1aRxKj9Ro(i&6*Hy$=Rv+@4lED;@8OS8V
z;fe&6=rn1Pn*iJyqNzZgPQ@LN4vo;TW)4%5TPIDUt*9u}v$j2XY*8gQYBlh=+}0+A
z9k!S}G#1cm5(uf4;h?n*NLa(dXjl?UMs6{cCbjq+4t>jYFdE{cY*~y{;m{e%WdQfJ
zHMycHKwyyjQA4)Df@B1wO5=A`1U;i*fT&EV|JAE9F%~b9B^G{SJ=Zp$zp>Fb5lWAs
z%AB=ehESMCCFjuJlIh-I=*sZwTDPR{y{&mDP<p4?(6<tcz!jtRByqdxY}Fo-4cfh9
z5XxK(!i~ghEY|zCx-ESzAwOW<3j(y^?wEx)S$2~0irHgTPfc=^wSG+AjF;vmO{mbN
z$#AL36zfD8ifIjo?Ml|I(Pv!VGMxYGmk_L&5t%eeVHYo2&=BXr+C_>I?OF^?NU$Ll
z@)H(rnL?=dyuj$SJfUGS+)G^9UJ>;p&;md)l_y!u!NlA6?Kewmyurn_a31CA<mAts
zTDUvt1-C^uqr-|VeAeJVyNNusEF#ze{evt0Qp}vOj7FU#=l-H0+|#hLV=*aJa-ch;
z?X6r<du>6J8#~GnoG1>?Ao}NNPwzS5M**kpV(?t}qv<WkG=(d6@5`Sj>KoxAdh}4c
zi>qiM*WGA4CtV!JGb?pBa4wC{#Q|k^v+H)0=TH6%MP~~=+QsJY<#}O;g87<*f$Nc-
zTK7!&>!S^K>-!rZaf~w8ymlBCWbjSLvB?9^wp7HeIdLRh$x%kb<tIj=8@=UWGC{(G
znYWA0T;slNX7+Amm*YPqZD(ttoAMRBq3!_hK+Scx(bEt1)AIpr3C}DC*Mt#e(WBvf
zqS>H@wLE*NgsIPV5G<mVJ#^6yUU%6Yq_Ccg1ey9G5nNDG0YM3cO{c+9g$$XUXctj}
zg-<*C5-SO<RiguN%QU*HH?M9$=r(p#xUp_I?strP7s(lbPWRGgfm*~TtLBdZ(dr+#
zgQXS)5Goyt0$C5;laKfEL^@rH*nv<pWPZW)L^%A3pdzLPs3F82^B+JSWbr#fd`31`
z5E1>~j>3?iB=}E2gXp_Eu@Sk^NpWB5Bxl9S+Rnnu&8)4Lf!3azE1B!v6>|z8_%j0K
z(7hs(G%4Vuk(HDq^~6FwJN(Cog4TM505<qnao>IX8bE!Vq6*MJ_`h0qqD8N{=bbZy
z$(&NeAszVB>5R5~$BVZi>QL!rzYKaplz4A|NhF-9{VNq@$b5^?krl%Db51n%gOegu
z%|H-i^ZX-eQ8OV8mhi<zlZ_RT+ZDnFkI3Ryu9RzHFJWu&;ke(ojpH?vY%778o|?dd
z1MTxAYrLaQM)l4c7|H(f>#!ehgOj-)5cGXahje$yVJ<|_Z#N;WE6>!7;POWY(+*Gh
z%&yzA`NoO>roK>39R?m|!Z1R>m;g<dg6lby&-wMcsi74--Y8&WXKEj>qOa3^S}1ei
zyKD}~S_*0Lnq27r?Ic{-PdGLxx5l~hzvPNYMXROZ|8?)%m3o~iy1s){CXaH_IDG|e
znf%)0=XQ(qZkC<>-pZVxUx7x2pGK9jj-ljh{wo3r?qFuqOw_=1^ycTN$MYkZ8^6x{
zJRDqdk;n0-0wayuSSgzm2BEpiuj%#P(&Ml`ujO0TuF|r@yd~c`eC-=ef-qhkq7XmH
zFaU5VOcrRAC?h!59Jdvs+~LR5HXlR5Xgi?)9&{fc=wS-qN++QkxqG?u$6i!7eRulN
zO^myymR9g>W{<YT+qqyM5WpTQPIRb;fFL5wSTT+D7_-DZ`dFEG{c$JQpDSJtFc}^y
zMa9yF9D<hV)jHz0PKP_&Cb+Rp3GlDRy#p2dG{68C;!aqs^t6+-^b|N~Wm2d<WSw+M
zq(Poi7%{9K;=b^30xojtu?I|D`Qg{ATnIWZ*X+2XxD7xy$N{j?M7y>UZTP#$fvhN&
zq|C+tVe1~-YhBwkOviRIW81cE+n%v8W81cE+qRt<+qTnLtLp8l>aG96`1EY!y3b?3
zfsH3sGR)%$#gO^Q&z<dkuq|QLcxjTl{Q_b|GehXK+A+Y635szn!Ja=w;H+NRCFK`S
zjBV(e>8cmR8L@BM-{~gOB9-{=^}o2-L>dN{^)R)SYo)5L8RZGeI>P7rSY!~<N`Q;S
z|M?95UCymZjRfq@Y@$L~4%M5Gg}0B29PKAAuCRvuu4w2N1c@E0cuQvSu$x{tw6=Eg
z<0b;|O2k$>tb2B9FHo#Aq%uQjijDLdD>ihIhty*rn5zsglK!1c#}E>WFJS<_LVq&?
z&VP;h(;Gu#yn#WK9P}H;onW>hjDc>^;`g~&-P~OfoU-Mkm;9A8M{a?#Y92?sCcpAm
z1b(ISrAvub$vio&Cf2ZE#mY-iO}R^0f!xz5h(o4<$Wu(bo`WqCFH(`SYPeleJN%fv
zGYSj$&reLc&(n8{y;J-VuBG;e-@!EYY+ix<R~xjip2$eY+y#(9`$pT5Npw+9Gz6^6
zW&Q+>d&Hx9*-)~tx`Ju<lFE+oFNYS<fdx8r7z!pmip2R48Am$hKCBw6Z0@BQ<<f&w
zi*kNDY?*b!(>?O9h!quk>Ozi`^4x?<wFA5>!LzgpiBg$mhg5bcMm`5<wE|45ss%8R
z{(!KA{VZe>C}G1C;VCVctK3rHA+X3;U-GtIq)x3Plij#Yz9gWd-#IL7Q9iPqrB0hp
z^t|s`w2yUPu7Y-*_v3+aN&0W9@lR<W;yHG0n*Z>KFc=j6ZP&ov2xGFLi3|9l6eJ8X
zcrsdJ^32rxV%%2$@`$3Gq_^AF26-_C!hj^KyDXg&w%ZAW=w>~cQN@0n*OgiN%G(sU
zp4(h1n13Hl$nHe{v>BGnud0geDP4<R(__3Xt{&Je0ImDtE^hs!Bc4`2=c~+Am#c)?
z)`~u*K(gvtRtqEgl(VnH296ny(cf*l{?ziI%HCzyT&<6ic6zpV;z+<qqbr*<O6)tN
z0{nAE6pMDal**Scm^8q3kG+4Lvo&gvKtx%o?;0=bzuIb5v2)*@R{px>%aIv6=gH$q
zuXGmmP^plR^Q2_;l=bM-?n%%*^W(dm^-$mE9a>f;;&Oc~ser1?yH^3IUz1rMO}?bJ
zT5WH(oSM>HQPuerwKmIOZI$3DuR2;~6Y^;Q@0l&pTzA==e)DDS=l3&fv@=QaIR^F2
z#?s67e5;XExe1-=tS8e`w$081h(iw2psE-;nLUMbUtjWKSBK}=+R=17;=*D@$l<NO
zq8Cyg1xAU4!d>W7C~!rP;JFa{;ShW6@>obCGFdbBm6t>$#wI`PM$(dV6Cc%nOgSKG
z7JlwIEfb6)@Y71aR9}!q<Zq<kkJP^&zshVRwJWHiyZjUj0qbeDZ&vxLE`~d468^Hf
zb7xExOB!`RA3B7d9uk@N>RU7_0%+p7xiIH|ae)c2!Nn9rkuIOtnUSg2j2L$5#8;8`
zBt@XUe`NhN(Q!ruuZi(&j+Z&?Jxmt+gO4lqGf9o)L?b%t>$^v2Z5pem`6<-yEmlDG
z;?xM8rIWwhhMENf%R0*t|A9QHT@Zd9LPEc9{7KF#W7yqkI{A`ew$^WRw%k3ENtNKP
zLGM3(I>v<-<U8{#Q-}5#<B%D*Z{us&=6wnr4!jgvKxRl{b7zvIr=$2w0~Qd4x=G%q
z=L&!^apktnGRa!rvW^?~M(ci*s3Nxse#L@u7$r4?+$|CnDURcE8Yjs^Wg6WXb+Wuf
zmy~4Uj4SXldpaD6G9{l^^8?f3$RyQMoVA4c6+w$Jkk{ta^;)Qt!%)QBAl%FtJ-|Cr
zYwvqRFtVx~3pBIodLxjZ#j5mG*|V0Aj4cN#6_Q9V8lDC2Irazje_TmZ;v3rbv*nw)
z04+zmmd~+v(!o23g8>-o71bjGz#`2<Z8{N145VPz#Rn-O&6+A6wCZ;5slO@^JBgG@
z!L;5o1`}=SEEM}Gfny;WLW~DvI2+|Pj)x`3)i-PHa3l{&t@8lH!~M)wDrM@0Xdafe
z8ZYTn3l;AuNPEqU_5K4c78<z|$dlM`khVtqohoSLlgQGw0TZZv7`7u5u*CngF$;P(
z2Y*wdUPoie<e%mBez?6~wrrk;&W0jLca8I4q(QI6Suu=tE~>AH^Jmdt1tyJ?ZHfpz
z(YuxzIZV}Fv?jJY9eP<(M=Fv)NBt7@6QcaTvDs(<9Q3-c&+j7_Ys@@akWOuMSOZ!k
zG|Bz>PU7lc3fi`FJ3G0K;bB-y65gY5U<*)+aBwR=i`tTW9NdL9q&`ov0wz{Zax2EC
z-MZI}lKNDGn!IdZ!CovD3*GWkgVPVQcDCFWB7|wxPS5hs$?93jp?+&XRxEiF#`ll@
zqJ&_#D6Qb=CR+67_{$z0w@LJR&Df@>^>V=t=7~~=@KMEZ8Yc+Yrp!h;DulcR^hz(x
zPWp%SM1P^UzC{`0B?*nMkOR;fEDt8Rum321i}bn1{xTUEgCKpmfMs#Ura}^8d^@*6
zipp583!dwnGla7}>p5sUt~CZ)fdwgbgm%^=o~5_k&yD85ahTo9Pv*v|n&(xFXTXkj
z%La>erM9<&qA*gLJcqDps_fjuDQ`rK(?5m^fosbop%I#yQh@5$LG293b`up+sbrH4
zW%(jQUC83$TXoc@(+%f!rxsL)kOMJrqi2~p3RU3vGUm1S6udDfGp2Gt;t8YstMFh`
z$gC(UNX@$R6{^V9pBFO4vnb;Ylk8G84AMAiWwzKU+h&}AJ?ZBNjL*^>uT8|y?l$N#
zD6B*DGO8??_zQYO@V1%$E>vAiI-jc}6xjY)6?6JG4Rwsf62E9i<3Y{eJJ(O^pW)ZP
ztfI&dtGI#S0FKnRw0f6#!5&v4EpiqiFLG}7dLXV1cbHZGEp3NSjsar&frZ30RkjJ`
z!CX$3qtt6eYjNpo3N><vy93+%U}EVjS41^tfIqk1j2GF{(V~}){&CKa(C?+)P>)0h
zF^wQ7m_XoWLXV@5;e+kjPnGslX=^XDCLh$ZjyOnY+!@9fyWO|Nps!MWRezGVQ4WQ^
zK+vfl8ml3|Xe<FN=affmRDe2WQt;25?u#K;{1sQcq`&`b0!v}%1ZbK@c_x9o_<xi|
zesAJnJa$S9Md`h?@C;wb(fJz*#bxpRu)=XW*5GI^O2EAPvlw6lHSitQ(>;xx9_kz(
zI3(26S^pIIMMuh|F&FjAVQin!JMJH75m*RL?@N|<;xTl@AW?3wpPAgC>SQE-qp87?
zYPYSUCAj76c#PHi{GLMvJ5CG_lM=;28w^u5e2mxG-!fxO<;F3fqX@z`GE5t%anArx
zyx;a{92giFmv#N9<5B*#r$6)(=sWdGatL0mYCd&hCywQjKu5k98|Z;?5!lqz<5=}x
z1JBrIumsd520!7NTq8sh|BM&vZpjJISWNlF`KM06_#?X58G07oWH}N$gXpn~N#%kG
zX{bfRb!Jg97wISs5h>mm2BAFK6xuAOvj_2!uDlV6wUrR3`b`Lg;2UZ<G-hIs-vSg6
zw6JrTFX1QGpV*=Qp7_h14Lm6nV!UH8r?aGlO70}*#C*7+zhMS)z2857vCHYZ0NOfl
zeDC=yIrN)Q98oAgp+e{inbz!AVr!`gxC_JsSb7q1KolE~-?a+wg%f2YyB|PVuE4_>
zsnoDol%0L#!z{abt1VvBPO|2knFmd)`A3>lu1K-Mji&o8LnaJ(>jkK72t3q#n$sXL
zH!?VRKC1aWhvuX&Y>XPHv*f^hC{Vj+%{tOblkEdK(c)ok(s=AnlAC8a$!QDU4&6b5
zqJpOUl>)ipSe>(7a6Gp%Wh5#Kur}y7Ei-Vo*G?G$N202@od#P9SX)>LrT!@qofpzH
zE;?{${ek(&bFb{=s|~4Qllbd^Vvvp;z4xI07?j^2#IN;YCI2rqpnDrTLyiGLvp=%*
z2BsOF#AO{PK0v6YDW!QZ+ygYz3I}Ft6CuzAZ7?9UR4_#Sgx)BPRCr3e9?ctje{tW$
z@h(S-_wE`gPV#kRkq&b)gJ|`9F-qwvDBv}{CIx-4Xw+P}<HGZXnCD`NHNw%_exd3B
zd@)_&Jx1PK7aOF2S#6Y9L(Yvm{3B9ExmkbA5f0j;t%6815f}wb95{(|gz*<pb+c^z
zU06a-5#l`3UJtaj0Ymn)Rl459Es2ClRHl><mU$Tpphgdh*G!9BwD-$Z{KWP?%=E1`
zgqIvC(CIQ!ZEETc>Zdbxh~i!R9-$B4P1q6o_(pW5n3t5N$Cx>>q{0-P10)an```@*
z7zaSGtz!GF-SrAjPvODv^CEdshxU#BWZRH3;OB{csa~&~x^QxvZL8++_)QZ$3TjLj
zHk6xyx#L~JtU#Ry3Br_F71-OmD+%r_*fr&=iWR^W13@%{Z#gvdPTsCZ!uSWt;8CMH
z>F^w0dPSw>b3bpyJM4vYzS6tO7Tn!Zx-ci&8g;;{+~_Kw0~~4L!mK)6LTR8KE#^e*
zJqH2K4la;)WZb`OCBX(2Yy)(PeQ%2Lq<T8`$P0=T`AAwFy#S7%WhM?Ch`KN{Lt>~%
zerbRNKvsB~VxGDkv$o`>180&{68X1axkm%C34pSX->b}qaQ<P82mHlsqzn48B)6zJ
zA@2Fv>DlGQ)fofYl-3B9z=X-Y+gW(ZFt#|M1uY2qY>CwQw%t>!oZmmRWd-m;ApCAU
za_K=_!I#KT7P03tw52;ydDFb|Rk2i*13UC-U}C>?B#G5dtujaDhNv-l<%}e`#_A0(
z8I!p-Yvk;-BRe8$_1^`x<Z0<$;Qlc7rjHjC;Y@e?zupc=DUgB|v-1}!@4F&o%ADr0
zqJ}M|wlrBjo$Sx~?$!=&drR|KKGf5E@4pSkzn+NKRQq~on_uXlO_TMu)w`LxUiO?G
zkIATQXCHo{Wdkh#Z{*Ux@&83G0UUmi%U_&)C)&(E$R#=$ry2I({~(u8dqWAK+W(p8
zBK$Ma#ZMpY)5980?aWmltPWa7$oTqD`}usF1?8?0t`M;uf9k(X8iQV%P?9<(t6<Bn
z_Vke%*8J&%{ntR(;Qwc!>wqyq_pOliin-_P7FFhM;(2()E<BaP^_#C7ByVZ@+w1fR
z@*`NZGbQ6XGWWmEe^g}N^?F{%kf;E<>>6Dkso;hg!H_u2vsD6ZfNlz`$uA<X<06rd
zi~FlYzar&(l`zQA>=&GgH)K<xz!uiVoHG+9ssL-WJ)xf+idg>m+fA!j={O1AfZ%V}
z!)D2QSHcEtExLtl<b`cQ0xU*=yKw&Glt7^I6kdF}tyC0koS(DGg8vM5{A$XX#D+SE
zi7Ppw0VN6?Q!$-#%6W(dFpJOIMYVqu!Yixgvk@Y)v&F5K|EH`Z;~X*^%OzGBeZRNQ
zx>OE_8*nr!uo2?-tHY+kcFZ!(TpzRoqAiTbnI(_*iW7^gJv-_bH|D*73czENeYUn6
zlg%`Pnxg+X#*wbgmVB>~7U>6T6Xdi!)P3gomC6JL7&)aK?7k|&qT&g~(9(pv{Lez0
zKe2*MQO6Jw6hPScLPg5dScX!tq_u~1j$*l#G!8)BN5)V;E6;lgr1);={OslP?C-?W
zzT9liz=@o}Lylw{r*niXk3=^u0{gdSII&Ky%8~n8E`3Cp#_ri+$zs30{Z}^6r{`}&
zusE%ti5;MX5Z@4?UM^mY6@WPV=iM0mHckZ5>z*bfUx?hBl`e)mix66>{(`N@k=f(L
z@Q9_oBMYeTp&IIyJW72O;f=c}ErbuTuueWOLZ6;8#KER5YYK3Fp(;uxDsP>F<fgxU
zXaM=j6}mhjbZ8NDrpda~v^3g%Y+%2HoX@lZ$YE8y()xYgUp9J?6E*xzTS1Yq<Vyzr
zO$hn=f7*>pkoN7R;5A9=k3~iQ5#j70itU8(MC+|GNHhk`*H&Vbs6HMe{u17>i^n5E
zejm(vYyhEWyS9eRO=1f<TOJVK$HMP6-CN(8|38E0FbtR|78=n46E>4>Nb)d$VVjg3
z+G(#5lgVz6ZEsIN!Hgbm>zOUhSHu498yqH96<{?vxwVFs-R20}C#909bLw2(WcC({
z8{oi1E0LpWfs+3-ahBmk^#|e`WH|U|;#^DpKNDw5f^h*R1fuoq)x{$&Ltu2<`J+c$
ze@^JvCSN{F`dhA1o7ahcl<fSo#gd5o?^4|^ozGiF-@w`9AM7k;=G#iIRWLlh_Ne&e
z5XQ%m4plFc^wF%<aRVDv6S%=CjJoFws0JQKFI(ZMNTeFymqi+lt9OG>l_PS!F()$+
z;uKkl`I|xbUhiL7b0+tH%bJHircg#Jj@<vqnr9L(F8@c?6j}c-SrZ+=@xNuw)PH47
zi~p82TN!(7`hEXMc2A(1&<oG7ROvLO+E^Pvw9}?e^ac+Grn;Jyci8_cYevE6o<#&!
zU9w>J(D$Nvu%Pk=PLaK___La;Q1i8i==HxnTjTdWoQxJ`gv+KrZwwG!iveTDw0nyx
zH10Q*!ii4ZJy5(4E7gCBfBA=*IthVYUYD!xx~DsrVK1IlbwqMsV`{pba_g&se4g^n
z&G#Qa4OlK={mi#<Ym!qe(*aWf|7@!2>QWUD?9<^9q`lDMqlv8IJ@y)G{uit%i}eqz
zNsZgf<9gNb18b6V*7X1JbquOEB7KU0a5-s}&;ULb!JwW=q_&(eVo923TBF(lT*NVM
z4D|)o{`?5w;W5zr@V<TJc!xO<Ht00x5?woffqiVR4iJ9d$ZN!DC-Hw%BQ?F@-W$u|
z{&5mfsxLjHQK4TLDy~i(ay<c-5EVcXOPxrw4OpYtMGFqO;FGVq^^ezAw&|w5tm!iQ
zT>;G1A3rd>G7HBJZ%>EX8kX8P%29x+DKM$^FT?&s9DVjxiE3!YLZ95B%|>-Rr{s8p
zwv)bPGgmdTX5O*1AtXz9f$7wmcMoWv?QOVmR=Fmhz!fke_e6kh?Fd7P7wFAaErtl@
zC&6D0{4-x>_|JUV6WqfEYpGqZ{%q{}q5=NmxTYe92Lx_!^5v&D358-S2WTTxJE$G3
zx#IS>rfy+&CRFCMte~jyJYw1j&*3OZ97LkjkK7Z5s5R7A+7TOxlREvyEFz9HMbrC{
zLpQi}>}-<y5)6TRZyw{Ew5&LI6)m3aoO_6V$=*U>bksi{TZW`AamSRhny)T6J1)G5
z_NZ`=hcO*=pQZT*=st?SCOATVF1u5o?+Yac7~bqOM6Fg4O?J)Eu{JK7uVQ*z!~3a#
zof=r4&miLpI>O_~gL0yf#>|UPAr%kCdar|_ta2;6K`#eW9~A&MFqUvgc!(3ByJPRC
zE(xW*en?+IqzRMz{M9)HBQzgn3)FT-x&v^5mLmU+R~E3Mt7hWn>zH-T#F5?Jq-qU+
zG}&0gf}SCTr$ukt>`BwNbIZ8q-+{n^_XTr&v038k6Z|wRUD4lkFdypIwjZ@>*Vh0A
zP3sD@Y8$5t!c|mCMK-QJP&TLi8=eCXZ=(*4c5&HzE=ZN<w#myS7?+DM>0XB`zl(ND
z6Ugt*swAWW!8cd7JJ&QrRdOEK4)PoDP1gOk`+F~$(+tUE#mMAhF1W|5E*kJJPz0j?
zP#cDc(BoUY*S_lG2>6m5?W(D*au0V<TT@ZZSGU6@nG&6}s_|ah?(Y~<rX+;@e#)Oj
z6F1~CVCA2PzlJcY*>m+t1i@bcn6CLHxRLs(z`vnT4`a2dSXD=(T#KEG89dl*ZM9Fq
z>#b3oX_+8lt7n}j1+mml8SjUOliD}Vnsl3JtJFdaQ`!htR4?cKD-^Gqs~g-?{<IN*
z(zX1lsoMJmQ*y3+$637ST<!*Xe>RNL<!L7)|A;BpQ~ag|yu2&|Y&rP$cBcQ@2|C};
z$i0*oVtoiMNq?G!P1sUQs~R16RuYvY)!)$lrr!1GygZ7@DKA7uwL%Ymt|RrAl?ywa
zXmi<ooO7(@_3JLlj;&K_S#X?Y@{00c<L*qOsgR4G--xiFk|f<2aKLinDw6_|w9D~*
zMc6GFx=}y?MQjoe|7~~A^2R{@|D`oKX?LIle`w80@*9y<s6V5hQIfL=)n-3s%A&U-
zcwzOLX+NEA^2PL?7Kgni9tNS`LQ82i7aelO$X;`xWyQu5vpK^gwEi8c1@W*h$sfP5
z=DY7=ZCZbS9AeqD0oa(^?E3b@)RjDjGX=i9ML{xyp!Vt&{s2EvER!OQJP0EmJ9|X(
zD9C7;N!K-VhZ5Elk>}ao@i<F{5P`c5KBk5oKoMig>Z`oMI8n*jy!A6%U^STz2Ca<o
zoY7`d^-?jb2)8DjF@K8j0azbXH8VXRtQ-UVAQ)_y>yZf&{G=8(px@{2lTy%RJR<j0
zF#S}A=@dk*&11~PmPoF5t(W3o7OhGt9I%wA`bgLWt>wZZ9OqxgwFB=TR`8wU`hQ!G
z>aZTr_PDoA*vhBZ698$5A&%OLmd`PsEaEzOUZ}g;K?tam+vf>P3>SH^q&!8vzm#9?
z%Y+^t$`D{i9UZ!H6|aaLd(VcHo@Fs!PhaYr?_MuS)o|MkJxzf>+>Ox!o0^3FhViY#
z%SkTd(C$9IR<QfoDrk}e!y)SdmACjb%W1dJar;9J76+<Uc=I;A+>v&}&sLYYfYu_n
zd(Y{%?l|GLKpx#REIbVYTWA_uEF904VH8X%ozVX*Jtt~F8b7B=7^5UPmt|!IplkV&
z_`YkIef-){XN#IY+>x%b3WtM*2ja{E5x0vPYJR?Q{0-2&p-tolE0Zp-a(sFN`?@_q
zE1$%}b6C*+loCP>zPGXSH9U+lNeh%OkVQJg?9}_Z)4ljv*leucZ=u=an%P%f?VtQ-
z2RVJv902Ze#nWoJy>;-kxekZD>IR^m=Hs{~NQ0WmMC0Wih-c5!j1a!K`=Cco2cIV9
zWg;Du4U|{WK-EbSzUE=Hxl5z<;OhrIn%s-`l}c3@vp62?tEMzI-dd(UBn)Ma`X&PH
zlC2NOQ8+G9mC6v>qYb_nC#8V)P)lQcc%BA0c&h49rGK>F??Odpa&cc$Q8Q8@cbw%}
zor*~lBUj_;295WIpMetJ8IQXxexEO;_F)*sWx!y*Oov+wl!Fz^5frj$3_B~%n9b%2
zp6pYMLgriytpZE38BPxlLBVs&I;naSYv$yEv@Lk5nSC6sVjzwXix9N!nxqyGBWS()
zLM!6761OWEYtD*5=2)!;hfdcUNUwPs-mvB>z9yOHN^DHY4Z}wgL#`qI)9HUep-mKx
z2MEN@8Xm11M__B9rYInC?`WhklEpNbkM{e6YtvH3EHQj!oGbtoa}Z#72%9Mt+{lf~
zB#IPBx>R&b%EqdXe?aTLFpsoMmZLjEEdZw-v-RJ;W(bH45VnMPFg@#|Y`nC%n?}9w
zMxkrA+QD!#imwVj!7;;44OV?=E4o^f;-QsI<QizMQAmU906lu^O&$<B#f~DrJNW_q
z2z1zRWh78~9QQD;Q9Rq^e1EY6LOD~I7=ogjRcasvA;=bG2RHt93ZnR*U>E7oJ8%U#
zm`hre@KECq>0Q5~=;C^gSPS-5zm0fwOX5T}d%*r00&`%u4BE(@AW=u3-S5goCU@s7
z6iLrS;q(MC5_()n^H0`G4AnSHee$t*om}AH^aBexu4i_wByPQCUwcFl6X5YNs8Fq<
zDHC6qFh_%73RKC|kjfhrIV&s#7Ycv8_N&!$1ZY6b$Xdw7IV6dmaR<MG9!+xFITkXE
zmdMB>N)7R4=p=M9F)RjulP)Ywp@8veyg&-4;eb<fULiF)dns50=0Ag@r8^0thLG(c
z3W*)E{M{xF(rG3rxU9VGRZI8fIiqu@i?HmKRo|~gK&oZ&t&xI?=doA90-}88oYHCt
zPA0t`H1dPpdh8~BteC`w0C$FDfCa9G;{_t|E!B$TyPtgyRr6LC@+h&|wECRgkhZ;Z
z^^)rcDh%RpETg$QQu<-0<ck4*Smq78)`O=E+T`NEu}KtxkhO>@x?+JNf8t&_U#kf|
zO^y&PeEYBkpJv4gCKRvEqzi$P4rCd9%a1*ox~!l5)@Ws=kX?Bpvy~~wnk9?jz~ueG
z8dt;M^uaFK?ao%HzCetDabk<JnKLjRfVZfFX9J=2w&W?zF`WZG3}B^~hwde6VQMUH
ze8;q(#Ch(cYX%TR`7_`ie~I7iSX2hVKb{<!bw{#gJnC`xFm)%H)SH*oP>jd6p6lt!
zrIPL~q1XciMP|$Yvxx=9Zp$O6;T6u5eDMmkQgVqhZ`<5aAn<6`q)S&!z2&x`Sl4IH
z8Ty37C?P$ejAQ3dw#P;XZ4;>RBA}0f#E~V|suPVQLOBE5$7@={t!blPH)_M=JYC``
zspl-2fA%(;t%!jOUDMjsfa?a}(UsCpc!r#_#Z}CTDkJM+`&Ko9M2uoDZPN2{pM?NF
z1cbuPozoorb4wtPWODGjM1>|QWllvvPzjd~C)ca9flqfVb(1l$+9vbj=xX7eyW@BP
zu^PZ&mktvSl?59+z}SQG5~}kHt9^ODx+(rlIE4CA6=a=DJT-RSMi7I!jukQL`H7!)
zWGlP6mOwAp$IFOO_XQXVws!MI72-y?vCx>2OL0<zDAfYdZ_JlSWCdxS;>4n=LP92I
zXEQc^0CCT#gOu%pa7M`EI9#H2O7TsrcvoK#3cxjx{qTd?6HCP09?g|2yeaQ{3VMN^
zu83G^&49KTLd;AVkZ$@Ud5e7VWcjM97(<0|IVABqewZkYj#whuLLPO`EyQbf+fWmW
z#*nhq_*J|af-~QN(FkXVm{Vi2Xx`YH!538{K-nhFLeY;uZRj#jM=n+z%gRvmkXWKh
z5NDP6FLM>8ieduR!p++Sim_w_B7Be>$v_!<2I9)^@<igyx`@~8DxO{3Mcjy=$3Q+T
zi%NkW(t*p2M8Me?4d{p}@m463>k>hm9u_Qx#mA*OOp#j+cdB6*b3A|d|8{n&Q))<S
zV<?VYBrL@jtUW_~<_!^THZtV-x0ZSsF{m%rxiaHMW5rJ|GuWkRLv}EPeFItgAq7{B
z`zY)CHZIub?RSz~m>Ft_hOzgrtTDeFA4a7uL<Bhm!RQ$+L{ao<W_jEbn6k!GlLhLJ
z333|HqUL2tk+$o`S?DmG<@(;BmT)OV%}9P?#}x7JreV`KbQtRJ15g2lN1-z8<|RXM
z)H61^G7cey?rXk%OG?7e9?sU4)Fy!BwL04zp35c6@-(6iU%np`#)(IS8+exPMHD$_
zYD2uihS)=2f*+7lt3(1aXu|p50e`0m=8odv7^Hn8)Px}d!vMA&3(UOGc!A(uazO9;
zI85_3X^#HtZBhcvtfS{kEdjcbq<~HaB3BuHuKh?YsO;Uo<t&gu(f@6Hf4X5N`tU-$
z(p*rcm^?|S@`Z32VyFwyqGCvLq)Jmv88Q)95Ni?q;insX{#ss*^2jUv;+5O#nhdA$
z3&}J=F+7=Ai(EDil`=`y6w?+HEnH67B;FT`80%G=fsk}h)HN-j(tX{m$SzjsE}pNp
zTk<y%0OrqI!9?<)L8YM94^&AB{-RsToeLcM$`4BFpX(=d1l%&)oGM4eCVB@ZbHVR_
z&!2w<2CL5wg(t){)<tg+O!d`>K{^NW6e*GKq+v6R!7Wd!TFllBY~YCO+aNJbChR3+
z8$|5%0U-eya^8+##*$g@IY5Rd7N16^p79lMOP&q$T?BIp3xiR16R@>;6x{6||D?qi
zu{qdueLk&U5sRE02Hb@ai00GKOAWgq?hJ$)upQy6W3q%#IZaC*CR{1zE8{Swy<{3f
z{25%O#c|)pghn25w$mShZpoZR6`lBwZZ5eyP$5Yo3V%X8-~P7sUMWU&K<79#3t!er
zEg0h>v8SQQd;Yn8X4;2Y>j4yF6BHK}<uC$0m;PFe#)z5BPR~Of{cdz2ji5c1C(ZL2
z@~Zj8)>2UpD|iJIAyS}Y9OOLogKS)>J1(Ph<568JuM;<eLTsH>@3J{X+2`qMS|yPR
zILf6EVJXP>R6Ur5slx0vh&(g!K7L>)Sa7i{*@(BzK7{oNz~ai`7Rv$gyS*;BFxxR<
ze9S+}C13kAIK2n*yw<zF+NX-ztjW+r8t|G+vd0g?U5wazw|qQ4f7hMw%=af9ishUs
z&@~k|C&9U`5<tC_<dG0$vqNUYybt<^$RekW?rlIzP!mF;s}!g(iZM?UL}AsxyvaTe
z64eHD-I)FUh-{-9=bnR$lU-%Qi`6S@p#r7&LygM%JZwvNDK?Xmv*Aa}(`l09xf3Wq
zG{hxt55dV*ZIVg9_Y7Qb%Pl`CGa0zMqv~MgFg41fDP9uHJ*h5ZS(fzL$`t|R2@pEf
zqpHk(v||U;9Z>WbZ^!%UX0bme&7Ly}dZ-~csyC&Qs3AkX{U}L@RD`!}-*UBm;|%+q
zGpQj~rER#-y;{r~weh+aq2q+p*h&sT6`*WwGhy9F9?ag6sL0<vDFy%YKb79=7iOX3
z;I%1j!Vv|}9KXDn6F7p5J*2&+#Kve6T~9AncBS}xgE>@#`_b23)~O~+uxQ%*@7?y)
z?gWfDL0vyQIw2c4%zJ~c{=0LD$+ts?gM5i+ry}NM0~|T6AxV7YS*69f7A5bg1=CfX
z*wl#fCI=C*TD^h2ppnBQ$QD!Tfz~%^Rvih~*5`8WW>x+LD<$R*Fa#TgzdL@=K`$ta
zHv!I+3-F7c0tv~kbF`X}*v|YO8>aNr`Wq4)p#G7$ouLha{<RQ|@DijDj7j8bP(y^|
z$MZ$hCXe+LU7@5t4Ez`qB@F9QL)DXW>x-&m{m)bJ$<gX()Y)nIrfDr(`Br$3=Sr=a
z`VZ#69ey%`pkiG4@3{S4o1XYXDYoESA-b*zgOyqB>RoS<L8JKQ8W1Y;7^FyL-iPZ{
zOFkQO^L)}2yKCm!PVZOFlScO5yO9Y{{w9q&U$1eWXUH*s@G~Eyw4Nqr%;3A$^6Xo~
za$;6}>EZORCSlX;#0!7Lh2p12j7mjG(U7~e=G63T{;ikgfbDHm8QaCt=d&`s4gkWg
zGIo?LNo7y7M0z@OG<r{6fc!QmW00f0MJk_`H4B-r5>Lfl=Eud$4kA<sQ8EbZ4%29x
zi`zpV9laaU6fXLu(t}as?F=7}ru~sDlT)h=`^)SKtE)7D{rBr5?BrWNQhHbS%SYtr
zSx3=*%!#a-+37whowuc&SKh&hE@58U32%jS7~nO(K>+YI+B`EGXHl_X>TdEE4%hcf
z$}?Ez6zoV#(i}SmNcc1F1L6m=VjDIaf-d^iEYO}E?93v}Zw|JBk*#i7=|TjQ_%xl6
zOmU{+GT2BrG+Vf}FF_z+!5YRFSK>PrlyNlTdUf)+s~C-61mSwcQv<(q?r*sV8do$%
z@KV0*dDC7YTo)H;6y96+V?_0l0O16`%)ku^dl63?sd18i3|hUAzkkR07KNd?#ENXU
zl0M*6zZMTXGix{Q;+f|$kRN-ULOx7#vemU)b-O-0IhoGWYA1Q$0rY@xSi0P=t}(5F
z+(5_-UIr^<br9i|KB&0Te%A#q#QiKk%c0Vraq6*Rvl^X@rn}i?l-li%nf{bi>AJ&}
ztt@)GD)Nk^k_$OOmj_dwCg%d##dHKwV=V5vj=nmvQ_5XJN>ZMHM!nd*`{$8#o^fTi
zn0ieyiXC)~Jrbr(k+o^BL5vktf+63%(r>0iM!q+AD@Y_Ir~*=hDE}e#Q+%yMnAtpv
z#a;v&dL1`2FAz|`#wD*=pvk&0f%<r0dR%^OTGkyNNrZ;X2_SBQExkz`kSeAl4W8MD
zfX01n{!ur_@STdR{LaBJi3NK9Q8)duG)aHd&8|KK^%VtjDbM~+qmh5q&AuOXlW5io
zU>qW(D7RyylAX+2)M^I7l+4S3i;GgY{wt^xm2)z-aY%iOuYk;z!dW`@5*$pr;sx;N
zNC?RpYRgw^swU2<OrEyJ8rc(@(ivten%a<_^{`1lI4uUxZ0CN`pnkEhUHGA(sM2|H
zujd5rxe#xjk?GQ*M?f4*JT-YPIn`+{SHL#}EuK`=nD$M}6_K>n93$7WTdSaGs&e5j
zo1*Twbu%}cYp#vNPGR;K?5hhcKo~#^qB4cwx}8C4TeU;SG^2uVu|w+X1j2i^)oX2$
z9<|6Vw>W~iGJQzOvS+dx%}s#}Dh;<2oH|3ieMIz;TU_Bf%UOc2m~Qd|Fqli!k6zr1
zMCK=T7OzYsdxa*E1`N~223%$#1EN;{qN>&})dDhbz7iCqZNpy#idd`VTc8?`BkwV~
zt9>RPzpA4CT7ZeYERKXpOi<lOnJ3fn)ZK9(kmb>GEseX;WNLaBkhT8oRcoY#j!^~E
zh<dX&$`E9OlnS#@nX1;syJT=ExksG|^3FzW#}A2TGqRsKvMwSV>e9;|Y~bJY!)~In
z#5}ao!Ue=xkca>oLa-3tv?Yf`xn>PN`w?Eyi#mu@)oUv&bKNhEQ#Q0nOg56}OiO9k
z+|g9{T_Fll&=(u3zNWIONd{?OR^N?U;5C^kHx<d1G~&I_ZpZMlZa;cuB!rpe#QA$)
z$Y~z_vBaPJ_vA?(wf6|!E#=Lm`+1tl^Vr%Px+27RMkM0rm4En_Tq`h*NaB=@5e6_J
zMM?@(L4&+Mn|Wv&xc_0nTz$7TORx*1e^N&vS9XlbiZ91P_L%IX{u-;BgX>_uG1?WK
zd?;PYLe-I~;*MR3YDpuib)TF4kYO&+F0~>Qai*%5jx$npS=O;c1>cr^&#9t7e>${e
zy2O;T&J^RLk?OEmP5pkkWJ;x$lYT(l@9x`oKS8}jZ!77aj%ny%b%%k${eZGx#Df8U
zyK5-5u>|<Q$Hqq%N`b;K1+GDyq=iH$3UTQtIPe##8_}OS6%5)BAK~T0iiy_El^bj3
z=YD<RUfrs=VqJhw3%R07;JcWRyHTe_FApZbw6TYnw`Hz92V=?A-_u@B!Wsu82jEQ9
z^S|6p287=8q;%9(v>Cc7zCtPPUc)3boh<0Zf!Z3?V_UC!M(_1(^;*-MfaWN$5V9kl
ztp|Vf#ZR&3`WQN!^6<Os+P(%quA>QDINZl9feI+QIeTRKG2LScr$CVK1;;fZW>YIa
z4g|ZIko2TX=^a{%1Hw)7QLdZ4;Lih9=Z~-P7iSL}8+)5zq&od921yzSnUKuCx-C1H
z&%9N?h@4F7)-q9H{Nr|#x`c6MMgr7;9}5IMx!3<#+do{Q%%8ZcYiFBmLAmd_m?#AW
z<dgP3glBh1fAjx(`k{$`7M1adq+2M`tdvl<RG=-t(O|7Q_vBpbobN<pgHh+NBDRcz
zC&XE83!`wg-+tiD2S&(3VIGP9f;WSFX&Hm?OE5h9Uw+`t6Cz&WVJGp4CFX)m0G%P^
z<-RDa|G=A@DizStJE|(O;m?9q6SyW7{?c0>n4Xv=hE_4jX`Srsl5onsM}Df%&|z#K
zZA9B!BXCB%O~zjA=`<LHs#aFG@#1)4eYMo*)%MX!#*Jk1;#f@~zpFPO8sZ+Ug%QfA
zU>W+63iHzKHU18G#=P#aX6kL2)_@5dB6vOo?iraufI)9Uw$zEQxNQOw2wSyxBKF4q
zKr$Mhjr;I#!(ydh=%@#6Z#4qL_`HBes?myPT<E0j&&c}sVmd%ynrrQGy2$>G;r?t_
zJ*bnF5R;@{LdVetWzlJO_@)nxG67z$h)?<`=z9`LO}%m4s>!mRiK!}_{4G#u%?bDc
zul?ezNzideW-k(J4Gx6Ms0_pEkmKDau*H%gKrY>2`8XuNE+jMoIp=0@c$j4c#H>cm
z!tTArfhP_TWsz8)vz9-d*((7tN^g&e$_ulUDz+`bJoGbq)GA;SW$p3DM|75IcVPQz
zI9U~w5C9WE5CMeLzDO8>Gw_8JSrx(0VaFUhA>!OU-)}(>%D%`Li(^{fkd)|A>OXd!
z=qy4gDiXJBDto#SE>EbEi>&^Hsk0Eq{W;!X4JCJj0U`5KFK^!h-#N{1BEfRUcOQvD
zwIt#Cdq|8ds1iwGkd~!YQJ^14{(<PibJUJ<)F-VgLe9$M%k$V{k03OL#46O9uhQZG
zO9bVGKr?g~4$aJ#=3n%F)J1T9<&fzHn4&fG--3EP&W`v(XybmX+(%Jn_76jX3@9u_
z<Ev5-Y2;0f8kQRnkr>Z5hWr%45+Fc8>{?esc&D)5onv_EKtTqOUc^pfU<R4d?PmEx
z@CI`rwcrRtPq_nxNbYk0p8rj64%p=2hTsyqt+W>eBP_}-%I@1>OVueyrOhE>&(h+s
zk{5Nh!4>6I90fmJafxF=N1Zi~xASiYCbrw8RkTLaFPQSh0Qq!p5i|%`<w<}Ft|3R#
z-P4R4Vz;)aeDAf($o+VlFoA=UWRbR?fTt-lfnVjpYeH>DB#l}IpU?4ux7J>72>{jb
zUjWXGsk3^Z^P?bL<m46Fj%BY?)VC_sB|e|dJ6&L_A6yG+5r+ir6gG51&QVRNE)96b
z04dNuCcG-Nrn@htvlW55{Pe#U1WKYCwi;X`6d?n=X}GNDho7KBNit`L#T}1M)nzKF
zbA<Gm(*e&dJ(2dqq@#|+eet$mVVp?8FPqnG?{7jBfB(eFQqwfoL7TTiAVQn`{1IWh
zVa@Naw;MzHP)zv95#l{a{F+m1WTD8=cISxL1c0O4x1+F$>s|WkVMziu_m}zz<NM+5
z5aLB~R7Dq2;pRg`WpR{ZES^YDtSjZmaR~+Fr>ewjKC02JzC%640#aoU=L<OM3E#P`
zTG^R}X;btcN(B2_P-l33umQ@qt9xrdz2ShiTuk*}x_iLQeMPyL^@s2<x+8XEt~qMI
z)X*l@1$y5rP!5Xz558%`97<tHS4dO~EARsS558IO55Adz2i{t400uvu5_rz7iMI|Q
zH;|o9cph6ygQDyjCM>fBgqzYM{!)io$TiMQ{P7C!x_rM{)YNOahGK@M`5$}}!f1SD
zmc8Q#-?Yd12j5gy0)=Bs{TJV4DiB-<{GGYlfw_P2G^tcfGazp*r-R0mhe8BA4f2x4
zKF(#s1xkbi#q6I40iQry))o3tfhNHmB~~5wwIg?qMS$ZeZ2svx(AaB>Zr#cS^#1FP
z&kRTb#sMuaFyPbTO$<_JE4&#koEVtr#V53m&3rK1fjC7fs_+C2p<<z?WB0lvXdduS
z$R=C_2Un&HlliYnN{WSB));Rz(GC~e^#<4;{keW~cur1rwY@l3xxicBZ_+Qq4}kw6
zIHB0k{Qmq)a5gp&;YPWP{1BXs-GN6;r~RYi%}4|!v+}d?vw~Zk!7)qNk0JbZW{n*c
z&|3?Y?g~%BgjyXPErTtZ2W;XQKi~?^W+vX$;mg6|(%JtgoUU?gfmOu%TS0DNMIj!|
zChi>_^8;Fj18G-R1eUEXtCG$xq8;w@TER@JYH~9-$+C^$MBlzD!gaVCGrV_r;bH|0
z;L3PwIa6rV6^r&DGe@`7Z$gz!m@&4OXOW^FTVc*FdjpmS){36s5MXnC0|f?)QRe*$
zFy!5!L+x+PPLIw{xld-ciPPhLb9p@)r;>PQ+dY>%PtOX+N7I1Dm=`^ej#7?Vej{@x
z?X&bx{|#|2WBv<qs(f!l4BgJUno!q0h15>s7Z+-6x(%Iuvt>_YqZ>{{^Pt4i%Q2*u
z(6pZG%tImJ!H{nui{YFJ892ue{bO-%0jYXQb)tRP{7{@Ffor!*__&vm0Y?M*HJm3;
z#kPgcY8|GuibYv{f(=%vEl%{(@GQD1P60Lv+0)tC(@xyDVS^zp!_C7X!9iKV%@dSw
zqpa%h<?m1#HqelVVc)|Y_|arC@6uDly)V(JEek0G-l`3!`92Bj&zD{<D;{Gm&t9<`
zb+%k~fL!WK>yRY2h>|U2SqI0{eV=n9Kuea#3Kyd_cCPGoCqk0tPQ${x37C!X)ag2D
z4(Xc9CyYH&&<bLF#<cNBnO~_00Z*d}F@%Fiy&!rJd*MF7?>KE1PQ7E`vVJXL0j4AD
ziy(i){i#UwDY@<z6+c^_e4!Fa#2r>oXu%s86_c=$h7TS-#4tYZZ11#3d(sfeP>}q)
zF~P;I_I?vk^Tt3;=n=}w1|!;&;!Fkn6(v=Xps)<B0-LAd4#j4-F}{~bx8TTw$vN=a
zH{Y5(9nF+kTKWa;;++3fO`|9Z)6MXN^2ee~2K$bZ#0XJw7=M>e!uUQK`FhO@+$eao
zUwLtYIw}q6#x?cG(GkgSL0fcQ>WFvxt&nu+P)l`;%%xL?Hfsu-?A7vNxHqv>n_NXu
z684B07!p1c`?&Z_=&xrxJ=7^Z(diswYHTT_V*I1;M}u2~=^Ch}3K+klUvON|3d=;a
z<!-w00OEXm*<Khm@$7fQftkYUw1y#u039o4;FlG60lUSMDxj>t%g8J?cl=`$flf`j
z8I7eG;`fm{D~=0twgJo&7tg8DCcRonm_ub*^s^K|mnBM|ry&8Pq*gBH4XE^<VDRsa
z>$y)r+u3R7`M+x~N+gQ1zzpPZm4-iarSRb2i)~{(M%HmqCP9J(A8Ci%mm>+A%c!XB
zTPB(TkszV4W1!mMa~8zFRg47cIDk8>kWj(mU^;Y+z2}Q!P_9*?qXPCQ-@i#ENcZsj
zs$=*=m#|3Tu>rUr;~^IPgb4}!e^Lg6^EW4hg47Bb`s77e1eopt0uUj2s`z^b)!@uT
zf63?QeG3iYWXU8|O_;4ogt`|2nT%A+^W(weIDO^d0VJ*ZC)^V9fOQiE<=a?eaBk2W
z$%OD}7jm8WOB*bTeeDdjS^{_5x0V*LqKD3Qhd_cBGq^Eh*NC<U3^t<!_@VjZR{F?_
zJPvdm)yfu1k-L(6^uO*+i^Ua*$zXPz+myk}v)#HG-THJ}VSP$!W}SeahJIf1+zz!}
zvxf!tBA5<Wu+4=2Ii-y~%<<9o=F#YK&j3$r{TkcbIDWcdYTDU5r>j^XOFrVO*=jeR
zkzGfIx8s^fkh6p-{`FT$Z|!-;4~zPiOjq+|!`oGc7R4&#%(%$?i|MD`xzTVaMy7hz
z={sYS6Br+beLc1N2&(Rka|HBtg5MYQF-?t2iNwm7kC^uJ&*CLo%Ei6joPKPyC20tv
z>x(R`>oF;a^8z#$f2^Zfuey$mEVkJ}u{rNO6dkrHB$cHyc{xe`vK!7r^S%{=!?MO`
z8>Qu;2GmBa-}Wg{A%df05H~+$8yrovk3W4x4DDz}d{mM9Yn}FD<5lDWK_xfS9t3=8
z$wag=RGU>B(tH?6TKkeVbWlSmLpG!mju9*!Tpa#&2p1!u(pf)I#}icOvkrGP48`nx
zSbIYjC7A@ia59SISX#9Vn83KjsytXOEva&;5;}|B1dFReRr}2$&BEN#Si*Q~@?HCi
zc5XZg2(K)yw#HBC(3V*8Z6}gQdA{2<Tc&~M)9~<22?ro%ut?6<CugC=kCjlwGQnXm
zLjI-~AOl3T!yv39Rv1Ayf)Ib@sOgTxX1YRt@jj*9tq1dyoL(STvENt3o-Q>SrafN&
z7c6^&&leSI-@^%$bX2q0#;2b&ypAIANW9mr(xg<{@%S^;et*8XE_slE$Dixxi(CVt
z_KM*~hNa1g#<o+YB|5n(>1k@3k1qGqB!jLJC636I_Gh}jcG6rlyFdHMUT@GIXAFnO
zzC3QOCiY*ZaXfSAhz2hLD;>&eRd}puVeXwZK1r7N+D>OZSS4eQe@l-OPlpA3%%_XQ
z0mQ$-5);fH&yGn~#C1)Ax5+bPzF7&`7JOra=4&q|A*R*5O)3Fn(Buxj9+L{(mP!-C
zZp~Z4sP@$wlBKIeH$@LEy!BUPnv0$Dk9HSBnG2^o8We4po=uTE-&k89rIA_LxubLw
zU~<p?u6^9DuU-j*uP;}AoUu05wx6xd3wCj`HHuBCoc!&?a%Cb^*?)trU06v=fDl!l
zbed4}eXfxS1(G85F3P8o+r#`2>>)j1_B|+rBeY871tlEs@m9p*M@Pv#VkJh<<oWxU
z0iTnRufVzoTMYOb>XXy>UO-A#7V<=NNmq)}BHGc<Q|e{uV%w^<LT0lsYSgZJyFHlf
zRxsFK)$4!|PiGabf-I`wF38tHDbo|tgY#sb1D9+Wzq-PMAX5(PP@suH5Cg#wx_tm3
z)G=<RMH})ZXMl3qR5?ky(MDKEX#XK%XN{E7C!j=)yDeBhIO5^K(9JBobY{I!D^jvy
z&enFh)0gD|f3?=a*5>ux3=IAU7l25)rRhKR7ZWBb6s*uS7q%zl(4qZLy4A`0G}sw=
z9VsT*NQYF$X?Q&7rR9_4T%RIoN(<vow+YEft+4|XyGNo-nd)p@9(dLYBNl{!nKDEw
zXig}EU<-A;Xt+}GZ_@a!ShKfPuX`fv7BRhh42(aqrnIE6Vl(eSaZmQIk4lwIMV3p|
z%{{O<tAB&k3woBTx3yl&B}>o*ngl*hW<#)}_!TZvYMB>DSL&whHn|@ER8Ay9T&^M8
zgSF;p$)XAqTgg?ZkA_oi$)MB6DI%28Ot#4r&8t)8Gj>;u%I3u#Q2e<e?GL?31C==)
z5??8H%Z}=Nvs`3V!S2!n{~q;Vs8$Vh%+r{tQ_4cRkoqBcpa$V3q+NvWy7U#%o4ja@
z?h7Kvb8W*0shpd{e{wF<pe;@-wStV1=b;e*0v|Bp5I77I{-#BG6Y0piYpw<6_@1x1
zF*0`N_5mg3gp_uI<W>^2X_0=bCEAOkBEBRrfHAavbzOVu^8KR+?&WmJ&o#PxI~JX~
z=2ABpfuj;$x3CO|hDsF{Qr>Z{|70)_h1z!<I&=aYf?N-^ST!>s{gJeYj<0+GP>hbF
zpQhwxH`4w8JyAm31HxP!DoqfLZ@UzWTF?JFJ7r32X1v{^ITato9DjIt7_NHb_VRB8
zN!9Z53FFvEwxeq$3G8q`e|VDkUwoOs+zR*y7TZ9Mx-qo0?ntw1;c*O2XY>));3t9e
zc5d`0=*IbH>NT}0ar|c8cYv426Q1<dR@-w|z%?0s_jlhk@b&e27cg3XCy7X`b_(eI
zYM1S`#~47;Z3H~r8_GpwM4>Eq%K0BdNq^hY^rgv7k2}9DvY%9MymBp2)zNK2W-Pd?
z5G3Dz20@zCehfxB$a;{hE(9pPCm5BmS<g+vhRZf>d!o#Ruedcy1*^xER{1>$5pFos
zcuV~QMgkp425y-R)%}q_f*T!Q%giKxWY#=-{;U6{!^W{xUp7Q2hnImkBhn^gv{ve_
z3!>tGy-J*AKrEqszqtH);aZ9OWV!$|PY1m~MW(xs%(Ov@NHd*fnl~)}6TGhA%R$9N
z+Z;dnXJAxEM0hcsmlKiJoXnFs+2|n3)}Wev*30{?@;Ko^AJzY%fW>J~z=0pI0cCN4
z?e$FV8O68{dDx+-nf^EML56|?dvu&`8iY7gP7;y@Jci;Mj_w?d5y;yXxy0|1$WZC5
z`#Du4(rXvIg+0ULk!`RD9JXzmjd`X~gu+|>a{?r~XIkBs9aZcyOY5eG$DSPskWvNJ
zu$zTA#B3!DYn;s9KyZ3Zeq>z}P~s0Fio!u*^eDJXAa-K)N6;V&{`E#GkW+apN@&j`
zYFXn1Q0@~cgfAgPODxXxJP2;VZ|EdP4W!C<^kM1UROnjxJ`5Q#T2}byE>u(@gfjsO
zr`|Ag>0vM5gE?Vv(2XYHNFgLWOb3FwRU|L*Bf)-UFu}w5SUfPp=I4Ei;3~jfe&Y{B
zi+<n*yBBLs$Qf)gNsJO}{ec(LCSu`p=kYpvd(D!ayh$u=ulw3$sMz8eWw?L2?F77T
zf8<4<1H_C&BSvWa2(VkkL~JKcrqO|7{*8?|M9(x|mB8b~LM_W%i+0c&RA@z?6t@d?
z1d-c7j@TL+X`u8SFBm=(^US{xYdH;yze8CZKx!K2kWT3N1?`wK2qb!fY~0vnkm6%c
zh#PfaG9%dyE9Z;ps7=q<n{o*@)OV0}i|XPXM;t5Ai2;tDR(WEf@duwobkf*BUY6nv
z(%Sz=*d7($m<z*!0$QRfPh8-M0TR_O5}cFVBT<oPd)#yxru@sur4XBXBtvm9tq2xS
z(2fbF`##obhaXhTon}jsYAN(Bm7}+BLRVDp%Je~o|A)+qG$)IeHYsk-0hE!-xL&N`
zkh(At2h&jmp8Wse>mH*kYuhzphgGrdRP0pDif!ArZQHhOI~Cik*tVVYs^@w4?%m%v
zy2qHm*RQqKob$e}^E@0OY>xQy`gHM$VEZ~^xSy77p;x{F;CM^g?+*7^wz`V9-vvZ7
z?^y-y&CjO}t571wN$*K3X}eoc+%U1mipTWjsf(hBoSO13bKNyK+pDWkxzz7^lQ*&F
zf>ZNW4c7FO-C$VAtfTSk@ncgsUxJxw<r5qgXm{$~ki8rGAt3bONKt;|p)$cB{eg7m
zH4{MCU1+DXxQRD+d$_%_{b4ROm28SI@SX#^n4*XS0)|utfUUFJp<q#=*l7o5$My;`
zwOBsJ%kqO?zX?9y1#F7}Ho)I}k-QmI@Gvlr73jLJQ(FcB0m@8hcO(}Dbri)8t7_4G
zbn9?IKcuk~6szg}_tC{@urgtaNX?s4mqDxz@E4Z0)9sZ8{@obGS-N~7QenL`w8kJg
z>d4*Cf<z4mpIb8t!g62@rC8z?s?R@#$xU+jQVg(o)>;^34e_Dy7_G2w3fTOyWBYL>
z+z`$izGU?1Ak((vd=tkCb!KhySW+s!5j>gpAi;#5y`mb&<lTF{O?Rd$g$<nk&aeeP
zJ?{oseawxv9aL@%CAZChNn`;T{^P%i&Ri8Pb2%~Wu7=(5!?;P2moB2n@ZdM$B_;K~
zqQNePd5_kAyy>n9!hAmZ2L5Z)@tDb&SU;_7P4<`eqrk<k@NL9=hBtmo8FH%VV-$Lw
zs2536KIpc={xSXe;7r;cxY0v)F?)6xv)}CL7lF_k0VMHxwUT_HL+qozdmEBi*Od6q
z?x@MpnlVxuUwDR4rrGbO1!(Jv68KiN;4M)+33!boWW=4x+MY1=J)lBVuyquAC*HDu
zekTZvFIGz#Vg(UW7D}fnq=!-MEAeG5)iMzWNZ(FYJ2tsfB|spl^Qsw+NW+9%y<WhN
z?4{GvP^wgt?~dL<X`MCN^<v6ifX+Y<=f9wEP6*4R(%&SbcQ2oW$CS~ZS_T+Ow{GtI
z4Pgp(ue#en3|<yJ1|v9u+Z&DO4SzCD*2bk$2&8m3pUv5Nc}1elyK+M{;@ed3{7^x0
z-~E$`D&K5y2lZ0~#V+fnS+t0NQ87QBOm|$8Kd(_!GSvu2FIL@zFIyD)+UAhu#EMs2
z)}C|s_Cf0h4n~#~LWFFx_yNXX!GM}aMN%pZ6fVZT8f6HzVh;>b<_G@bY)Z@xIMyDq
z#$!UdXJOPIYQr1ev}jM4d2!<e;VI>6nny4Pu^IHNG}L`izjk5>q5(KFEi<93{b<~$
z60IshyzMz>9mpTFxG~NT?iUBIBXU%46`M<MU!wFL{{$_gj%Q>9`B0~iiK+h$TGmv4
zz<>d`JwWn8rR&4t;gF{%XlL+#?p=W=U+@b}0n<>1TlfPcxbH_h7sq!s<tmtxx?1n+
z>+!r;bc|t)u&D4^!aqf`=`{CGH|mM90-A~kfShIZ17S2HkcBGcHO?4nFq06$j#}p9
zeYmjq5`K1u#jM4*F+n+@$Q-1s*XdhtTQN?83=9-#t9ULIEVuj|h!+{rlEyZIUkJ#3
zepa-CBX<0|6cgYQXhY&Jw$82b^kaVOYTp)-38bh^Xr)AnNpm2BqSGZfcXW}5*l!2?
zmB8v5{|Q_63&dZ^VdBgykc_Fsy>6~Ye+Nq^fWGT%jT0Qlf5Q7SyBpLVKiDaVbfc;Q
zAUaK4RANNb2+OQ5`c^!%g_p$l_UI5F$iYKriYI4=ch2p8BT;ykgarAIf(0(Xs@{*x
zVi^r~k$AEsE{%ZEU!`WB*^dEr=Cz+}^ue|f(|3&KQhLlGEL$Vry|WfolSS1?t8Ed&
z1;bdy2`XxI|E<4O_q4;h09%IE#}w)$kLASkcwCx$8GgplO#gyf&M#JaT*20*dOcY5
ziOwlJw&&XuNcc3~^7Q?dw(X6eR0FUYTvvVp3Yt#&9v<&bCn~xBXICy%aA3eTn`sOk
z_3!RFPm|SOZ@C=a`Y$&WZ-XuJ6RmKn+VLNHAcJoRQuFP%A5BI0+B=l1zNV+j^dGO5
zW8U|f80A{ECDn6FrllOesF&c_>oK-ZFT^}<nNn5dGBJS^uBMA>@Z@_iD#o9FtF?RS
zAFMW<p35Mt9^D#nah+nQ<Nm&CJZuIu%tM=}1(SW>n+2?g7hL>{w}u{gti!sz0}xAk
z>W!>2Z#Sgt$K#jV1Wwl_G#$QB4iEEsj=Lr%ykYpT*lgo9bGD9!e);~7jdAHc+*<V0
zNWbwZwA*RUtA!RT%!3<*LUE?yd2`URY4;UN)1Sr5lllS{7@!3E=G^Mb*IMK1f|S(4
z1wY11$k3t(^Id^=;JY?ps|a+A(iF=1lrS$I)Tzce_Pe33z<4BU6JvyC0CPUcjbxK6
z>PhXv#-VJEDE`!H{$SMjsH9MOKnC$Etn?V3Jxt^oS#htJ4P&Xq#-V(%zzPuaFR>7~
z4c0`L(g?Bhw5TvSyXKu%w^&ImD&2Nr#z1YyC7GMUfRH>mBnD#zE#kvcqF=|!{AQ-Q
z!sRz`Cw&7HjzSaWA493YxjWIvSB1fKRy4%K(rqm+>2Q`!DOuG@ILY57jtgsC%S=?_
zY%9p^J(Hf_stNyRZ1lG}cp0df*Fu4*lZX3<Vu8h6`4nSy&D7$6(;$&(=en}Z^h%8k
zJG$jqIJ;%CLJHLOKt#=ZhdqTir=R{%ZNm)c-3CTb<{D%7q<*#S&7j*+`h-@|>V0-E
zW9BXLRyaw3=vUk=LYgCcij+q7s%UfEhhjXXqC$=Uzk)xQ=J%OWQ?-l05>quAQxuwt
zATFVCb3UJ+wd$*=pcZU*=T-=awf#SBVacKr0xO;cX{uz{D*3O`;4G{lZ)Ive$?diO
z=!D}y8R|o$1E^TPiIW2%Qf_(f!RPtn`3hF&!Z&IcIW03o{xdm!ch<d{o`usi!x5;>
zPU1PrV9DD&v?O!1o3Xuk_;MS(g>7WUo8C}iJ4Wz`h`?6a49@IXIomG}qtCO(#Gg7;
z*!2lF?KCW=Bt#`k_T5z(Q7|^LvJ&CYzS#-Ja&%C$4RR^>?l{p>^zul(%qx<kX<)x`
zn~ah3CVYS{dG%ye%t{M`b<2k(#VS`JG#cEgs8IF2Kxa%NYLb$c@w6<kD|9bd=KI90
z>j>W{bHvmfG8%iVSQja!ESeBIG0XUnE9VNe@MO;$i_y%2mpU(I3;Vt~vQE3oTqK3O
zWUE`k&K?*Pl|1&e24dCD-IcE8MrHr$7nc0ib;jBy(EZ*_r|o3{^*z$sbY!U}r!ACY
zT6>~ff!C^uZ+;LCRhoN34A(e9Jj)o1##|uR2Q;0U@*&;2_88WO`7j)6A$KU8pE=;2
zBR5Y!1oNytE&tjeAfC9$Lff9WC{--H5alMGkYX@$C@dJ0Ih7PkrKm<woC14<S`ORI
zwm`7Rd_$hBsJmoBOem#=Jzsuq8bi)vT{pZm@h9ewQ?G<cghh%QCjNWc(sY7y(<eq=
zlrhS+%`c{#jw_0_KNL%yBeUy(!QRMhIwJIpN47p{f)B##ma96qPYgJ?$KQN;JQ#R|
zl)`@k=GOGzhM2lE0Bsoe_ki}pMic0$&X1$tm+RPD4K^h@q9%i10r}B^$x`CYXnKl$
ztpgH_Dr|{WiQt((C(76goTH!n@g0Kh5n)Ym82}aS=GUqpFDCS^#O#ozaX+B=w#Na>
z2f;VSGF(JIR%EVgBJOH>ndC=QV4TCKwsb1Av&*x6fXdA9b5uW#{rAF~45zc}OUTY6
zw&qDyF1h;uEW;4mX)Hj+AD9u{5&_BV9xDBIt@AuB^17?zc3SPerudbAQ0Lig{{Wm}
z$5j;|#u`QktIawhq>xIr-G)KyrIgkFC%9ZeB^E-mazc?)+Dha^fbpieuA2%}nI88~
zzHByK{p=)ZfIg%*U~7jc5QF2qs7@YSp_yoC*M7<k;BT0LeW?-;2?_9fopiaThkcuM
zO)o#XqU(BhH2W+@d?)M#Ed(NTr!J_?8+TVlC{t_i=`DVk0ySu$eazfwz`j*04ca+5
zYG^ij@?Pqq57cWACsHPjc#|>MyHVvc{7J#nTt?`-Xl98mtHBaEmhD>qyzB8(rka6r
z0zNsju$~_$kry|G5<6J7W6&^nG!pYrI<Tr35uGKeeZ1);zeCDPgRwCiI5Nq3{_a~j
zC=(7Fmow6-(g4&i#CNb*C@sIEoIUh1-z=!@F93PMyu0KtIV75Vtw9E$R>sgaK<LYJ
z3ose%!cLDEBj{}8#fPOM7^f?~TKoCEn-L2fLo(&-t$fXon=lv>rM^Xx92yap9SdrT
zTGzuOQQ_Io=+b`_Ta6<hVfZ+r)h3tTu}NHU5MT>Fn!MFt)cLWn&(n;u6j!5>Vb`zk
z1ECrVuQl@Z?(MJS{Wnx3_`pn+(yi)!Z+i1k>E&{m5_I;sH&5=wr6Dbd=&jOq?`d}l
zei2nr62}X$iV_UE;_n19d8$mt_H_q}VpK7m1xUB}gJ1Sc%b2MTy=)KJwW}lmb&&_F
zG7-}?JN+0>_J)P9a?JzTSwkTJbk!Mvt}c?1T}&4%@$w1~_aay0%Cp!QGEFBC4Hi;s
z@BTMhgl-rh7-L8FTcZ4LSJq?J)kRnsRv<ip$v(Lwc3(VUjO6S|f|r*U#uY7=rkf0S
z3jY8^X#<wAih_GfGn?xJGtIv3ndgzxgGld^fl8gnn4<(48q1qRj@=ikh$A^nA5a>Z
z8Gp#Moj{phIWof-BlAKMU8ao4mc@?wzQQa+#SMtleytQ6D?OdZCgZ}oGo?@OI`a?p
zF8h+rUl;~a10G4znsIgzkG=T6|2PC?RVn;dp<R|2gMD8^s=H1<Zg2|A&G<N4Rwmt+
zS&FUBm~9~KWdzvH)Fv#LQjC}}hh^ker`&;Vt5`~z)^m6Q?Mo(a<4iI1=C|*+Vq(d&
zE1h$P{})aI*B_+b_g`K?vF8Zx@+NseHZF8kru;XFp$Cdm#&CkjJk0=eQRMvRge((d
zAQn%R8Ll)~2Ab4%n^rx#6_~Ll^+GQmoKpFDm+u3n606%Q38zYtaI{ec6E;4fa4O_0
zen2#Hy(fzrP?rP2yE%KsVeSUWcCg~PaLY^((&P|s0SrAzM9E5;jlWLN)59OPL2rqq
zj$zi$u{0Seg}d0ww+sPHe004M<$5~#xkwz{{Wb|*DJZN-QAJGB`r-?9RIZZ4&d)+Q
zm|x$Nbi=sl;8U}SGt7l_Pz+-V)>~(|t}FPXw`IbI1joSwbSJJ|23NL`v&*gOl!~-c
zf58)^1FMpwqrdIqA7ROLj+P5#T$RfS{z^XN?qDTE;;zK>6iQe8H(<Wv)ZqR>oV_^M
zmX?&UH@=l^yT1-_mae1V<zpwZ>&WnQKbKqg!(jV2`HfOCIO4Dv8*%oHhu0I44|PUr
zN22JPaLJ&34GqPAgBS6&3y#8@*W=xiE+>X+{>=_Wz?0StS^P|CSJdG+l<UBhvq^hf
z5H$#HnaHF4vx0igfi3GbOGuZzmH4XvGRs{qls|_Y%47RU_4dgw*W1YUG}CRTkvKdT
z@q+}A9G+7}9N(LNEC@OnEos#{o#da~umkpZeHj~=7+YLPs;RLmMP^xm#6As#b?U={
zUK$AKq2uZeZrvcTwy93^xEPql202!`O4tg-c~T*0``{io{~{r2m`v!ePFZMbV^H?d
zI}Kh~P%(d<zKO_H_kM=$Cj4-CpD*HUh^YCEq9%C+CuoF4aK^@{-Q^a%m4>%_yLf^e
zK}^1mjXvHSA1k}x-n)wsXzEM8>S0C8I?#%(zrL-%t_<Ntcu`h1tVj^SE2)<}wv_l`
z9cs0#V%dRqyP&*)39N1|^WJK2mMpJGP$HNeXhu3Pn3-8TM;h1ePo5g9DV{@)y6Cqj
z60h#XVl|x&QkF`f>Nk>GqOIR9W2KAOj|*VPzeCpstx+YQD~|AHGmI)h2i%-|`E(g?
zJVWHKy&Cp2vzn@a!Ml!me@0%8sN}3A&;e<IJRVCB#iJix0{S0B?7-2+Py#TjV%Ix#
z_Qy8@bbAnw=r&RtrZ?dTo>1Yj#8%I_@tcOaz>$lQb=F)=N*Ct6UmoUGOa_P#|Hi2S
z9#ct)jVAfV={^Vk5G!tl=8Keg+0U?^??e9w$5h`cAZqXr$D~Id)+mUH{dN!b8O{%v
z0>hp7joBy0&LPvNB0IfS=Bts-MH<DtK$YqBH)4`SKL(w2K>6uIibTrqo{c=IhJNcy
zjr77Uj{i5yw3fo~e`lE>B@@t(rWmDmxKxkV8d9ftaw9S9<n1BO3GyY=cj6;fII*~J
z6DCe&8YIYaROooLy}b}isehqfb^-}RPZ!@@tZTROY5*DU>@uw{98J^3Fw(}x8`cIW
z8C1R*<YU%PW37T~_tv80=NRYvm*}iPr|4zsWUHrT*5zn6=X`~Vgw`4GxWjQ#@XT&I
z8Xya#QMrq;V(?aK&xjK&5USthZ#^!+dVGSTpBhCI9hN-;q@qmanvG_BAAdDkwvDTa
zKB)ZY|5S^7niuxGK9`$l_c|p*xpaSazq)t6I%ZP~xyz;LJB%E{Hh_%BWY4Ws{pvKK
zi>%Sas@dew<LSO0?@NP~YxtKcI;^GtYGV2tye;sEUXQAdE*7u0cnU`nTJ0Nq#(e8T
z<+^%|khm0gW@*iNZsSv9tEs+HnP;hn73ta?>9aL30iDJ-tn6|88ic)6C-J~%qLv3#
zc6Muc5EKDB7z3Ol0oYPjPA6~IAV{?5%j~YG6Ljr#$KA=mh|0c0h|8#kTEdnuW;IR=
z{Rdd~*I*p4Bzs4!eLse4+g1d(tez2UaL2_m8T6ws;9%F1w4YVnA0YPL$NI<!#_O$7
ztjGu_ZCB@`DY4xUcl3X$A~JeF7>qf>A%(o+a9qJ+&cXXcy{dDST+~?KKuZ0c6KN|w
zs1dt<p?E~zXDwfWT#O{v;V6rKF`8JZ(jT5?_9FY{mjDqFwfe;FdlvwIE=8K|_Q+xB
zX81%6E@6Tryt0`qj}*=#oSOre^vv!>jcr|2q0?+MAtj{8<mAJUIexr7A)3&Nllba7
zZ755kyQI8>JREh++ekC!aG_*>c3aYBJ4q$(00|a}xt58B3d5$z(aC%R#zJKhA2X#7
z^V3+n$SIo!$%f@Hl}-GQS2TG=yWifg@O1hfoXMJ+=`u;&_FGBq;z|nZ&4XSH4rNC$
zQs2Lfk#v%mdo7M1CSbz~s_2~1egN@NzC$Ip#rr8QSTF<Ic)`Lz*ebSaw;>5XEyCfb
z9KcwE!boed8R%#Yl9hLuuTLNg8*@Y6bT6|ALm;|Q;i`MT#GFnn6R;9;5&g=lU~G_(
zIlRy~@}xr(9|ku7G;NKJcW=&+z{b~Zg|xZOxROM>hu(+>=73~NAf$hoCo)YgWLRR9
z65XuO%WnO1)%2@a5p53z2evf_wxP}r_&vMxS{bGVreNfnMG3rf26KpXLfcvVd>gso
zYSjQqg5%Uo_X{b=!^B(frC!}#f8QOS?&S^Sjd4PjXuXrTgTI((Ol-Qt7ru+^)R^4`
zG&|i)Cm;UaHUK$k4Tqdt{=wG|puRdGw?)PpQZZ#$Bc08J9#6uO6yVYR+C_I=DUIEs
zK$uX!?!$qlgE&H0GMemIqmH7m+rOVKw5xY~Rh;0amnlC-QGU}zkWz&6l|5dH^s);V
zKk9M>=e3lPq3+7m%IxTH^a&2%Nx{=rYxk6?Gt+Lr9{`+hAXCIj@L;ykX=<c(`hE-Q
ztP3c0|Lo>|n3x3%UF2LKZg8FK$6OS#f*j=o2B`?6NPj?jQWHKK+b=9^K$##9okAjh
z8-PQ|5skQemTWu!qdEV!J64vG-`nJDr-{ikIxz+T=?sz<(7PM9{5SxU^g_q&)#&&Y
zr_iy45hmCWvu+cXx#bcW?t`pS0u_Ad|AGf*XBFa58kN;K7D$M2A=|q>)?~7{l9LRm
z+WFlsZWPeSm_N%tgwsF&w9Jdomxzd69s;}Af2H}*lP_SAfu8B0BlXAROS%esp>*cK
zlp3a`hd>#|uR6$rvTG}Lxu&UeGib;-Yy1zar}qUFtxZTTy$d}CxlP&cKXO9*bj>;%
zj!3GMTeYcPG_0G!BW_)SQ`r26u!MJ5&}kX8lGcK@lI(~d<qIg%ftX693TBb2ty4BL
z!kGLU){@wbN*g1;g;rnb_7vxV%(Q}7C(0o#kvrwRg~HpE4Snuf-7mNnt9^S<TgY19
zsQwy7KyiaVZ&uGA29Hb#EJF!Q=Jos$Pntwf0Of+M+zy!K2%M0or%-+qHe5|tQ;NR>
z1*0FQpOW!11IRnk@!}vrmd)^c9xe{3pB1><1e);E2hs|~P!H6NF+&MIkt4i#r72F0
z%O@p}7m|NN3X%LK`M_#SDx~{GIj&PsU!SJ_3u{{^enHtrX0@iKp6$!9%iVM&P&`|l
zoKAQ$k$u9Vtp`&k`H~XF9c!~w(@e9vMlT0zY3G&MS^Kt$)vwZ>ITvt9Fea+wty&ef
zlHR4^v`trYX=N(N%9JlP8@f~iNg~A_R<mP=lY%u9^i9q+K6%sayVF4PKy%=phMWgA
z+8Wkn7~jy@>l^LWA0<d-rG;fW<)Q#0Uv(uy5DyEq3Q|IG3{o%UI?YC1H{~IF3j!Hk
zFJH1!Mc&f6Tt!}AU(VfRWb0tzk>_q72+Ct!)8Apnb8?Akd@x3LCw->fAaicX4a)Nq
zLURQ~J@!IP&Y1v8rT);)a+CTxgtC(3fR$hB#asmgk5&VSs9Y+h(s_2Mj0V{C4+6QR
zKc%-E`psGnE-&s-bgHg?at|1q%gP^ch9Q1Dx*a*I(Ky;e#>>*E5mVO2BSzI}Gv==u
z8}o+gi$z}<1<SkzRrU(AUY+(GPJN!*htQ@dg9lLB2CVAxpN3e-wt#of69$x`HVz~5
z?h$jr<b22BPG@9bv>G*}`O9qeF*SkBuGxekJ&V)mKTWHd4HfJLI0P2%eQZ@sN1lEm
z4z^R!!*Ow7<=cJ}{I6McQ5o6nN}LE%K2m#Q`s5SckGo6<AS>=9Wm{pj;t=n|Kx&^a
z!vLaiv7$w-laP!hMLky7uq0SfpN;blVlso%mw&(o{dR9bE65K{fx=%-AzB_%zbZhH
zQgVnCufYBYi$_o@Kr?}8VL_G=UKaUX2|tH4SBirR;M-OdD43s1;b+R5oN}`mwb!Lf
zL@4~$7Al?#u{?CuYQb?m<rsR+3j!trates=#*jv8t3P>Xd?cTFF)`<xuMWOv*%F>@
z7VVDxK87crl#n#5Kc%l6@`;hcRMS3$3XgL>#J`(0(yKut6r3?=FjwZ11M1{Ea<Z06
z+AQ@wa1>rQ%Gcrvh>SO3`*>!ZHyd(q;V~iD_YfAN{f8xL));&4_f9OAmg*AWl|0@?
z#A&<4`7)eK4?e{_G|_nZ0%!E(3i?K1(CL~M!UVbyp1~YS#C30QLz8IyLPdpYTo&|D
z_fD7VWAkxd#l*pPCvr3vyY3I%I=^NXcw5?HPorCTZbXtPiMtEnG9hjy`aG+(mal^V
ztF<rq)drj4DVBXBE`eOI*09Qe9hf9K0C!cV4Ty;kyZ7Bx4;@5+g-OjWvTu93s;MXp
z`>>qO3`~J%bBOTmGd6xx68JH|n$Kt9Qb$Nc9T)moMuOFsF8`atRYAAgY@*GaoTwg#
zV|d2zgVW_w)kpG$5-+K~>8C_uv>?lzXJ+4C*mAnoUD6s;Tg`3D;govqi#sUeK_gYE
z?%?G@%k6S%{cFWq+-)Tz=>c>FX2$Q}2ih~ufSRs{)o>6W(#pQ6&_t0ytVq62{B+*2
zdHV_AS<5OOqxxp&U;_=`(bH9=XeZVwLrk|N-&LEil%$L;K4mvE*19<+7Q8<sjFF*L
zWVX*<wmAwS+dH`@pK#=8I47DuVmn2^6jSy}m>ItwOiECn69&+;xLOfDosKr2v%NSI
z*5L5g+n%<oy*v`uTmTEG0G-BriHjo55k^El$&!#~yA_)wz*-@L5zjoFhpt%q=9d%2
z^4c_-Co47nB>}qkv1WwHm47XRMHHc7>LasFwB=CMY_%=?W`subeV_sNtM)*|*1sD&
z5vmAkn75mbXFuUeQ7^`bBpay}*)8+*%K+o<-jdZ^M(%)xcmPgxgRAE_!5>oD3I0wg
zzT*x+grd>dER(2FPFH8q<OJ^vCF9cyKvxEt2E?6>;?L1bNaRV~s5RIleZ=vb;|KZK
zp?Lxq!gQchuSXX94j)Ky|3EI1jwqJIQ{vWBb_U;1s%EeD#Qz@Jx?yHN`7=t|H9krt
zJ)#rEtZ7jq$O7NYFd^V~^kbQ_4ErH#qVD%-&3>DIT!J^7!T}H890-e4j3U9zz9_S~
zG_2*ItD{Av<zP0A){eFpS+@qc6<NSa*RTao@T`a9+V<wvDBkqpA}o|xgtX6&H7yxc
zz{hhV_f6!Xv1+zg)4hGNDX>{26usqdFy-sV0q(axRxV9H(4?W9lG^fRJh^K9lR1^e
zvWSSgyT=tyjGnK^t;XqzN=>h8o;@87KH^M>Dc$Jbwz%=Pu|28k(9)t{pZu&T@C6_E
z=W7jLy7o^~U01lCJZ{$?@XmOyK3kW#xBO)V`l{+&h8rWb_gh3KIIca|W&~QND=?x8
zF*u-@-e7bAvJrRPLhSE1b;nD+iHL<<<Vn*6rE6r>8C5%~Flf@O#+4iy9ro`XU`quv
zH2v(d#3N4DIM&?FlLpylrUR_4e3&z9+6@PGcCi@HNYFw@+ptmRn=MaWg*FoD*GkqD
z)(BxojlrM3^aJp4)#XK{(H=PwiJB?o!v0K1$SX!fTa#*P(i;3Wy$rAgfXHx!mg@TD
zT$EMbx`+)Il>IVup4;_xElgnDSQIh5Mww)wc5&b~j6CYm3=vi%e>oD!4o$Zf3_AaP
zIqWL1U5P|%vP>N54g7UuHHm50(KjgpBI<ArnP1QmwsgPrqex<k?F-QILR(k%&W~cn
zX5}J0?8X-kb=T`3Y=2tQ_$44pwn`OoGs}E47S0J;yk*k9D@QHQX)$w@W?b3PTbT8u
z+b3^I8!4^2Jv3#)>i@3pDM5{PA&xFAT%QE-0t>rHE6}UX4h?49zmgdwOmM72jdgK(
z`hy2)4Rql@q7NfdSbd1~R&+E0cb-B}T<w2t=&z3}{snys9!=T8#W2BNX^V5olP|?g
z7|LjrNwY35>w?`4y1Hf)lf(x*f?FPo<+V0uRk?7Z^s=d9;8cYIZS-eN@n7Hv?PN4n
zy-a;rbZRT*hvO_iXa4`I`NFm)Tg3-1x(w?faM9{|3qdg_1rS34`5JHfb=@?ndZCZE
zdH5K(uNg_z_j#WmYjA+YETB=Mk|4I^)&-<{Q-JE@P?R{95&~U>SfRKnnX+q2l0hXB
zZqSGIViUOA6)M2@SND;WsqSR&zb%}(A~@W&!GZXS($~_E>nM8+v8DDMnEY?w$GN~6
zpc~JMp-EsC!3XRplSVc_IvO|^StjYWBnn(u=`E}UjEw!D;~>VzG$Z`pJ1QG1kHVr=
zd!k5++>)>5+B+2D!KfuLmw%ZH%}Ot++Tr8~jOJ`ZjTh~d=lsk{3vmwTn@QM(YaTSB
zlMshZ>RokA(EW%WZYzl{Zi=j(^3wgw94nC)W%x(X>=@~k48va5dg^hC-~=PDbM3#p
zPlFkizU3*z2-(|PAH)|~!XWT=$4{kv20nv>g6<?p_Js0^%G2(yQ7?SUwYSZw{ZB2g
z;2+mGdciO~Qo7k}DdGW@sn}w1I{oC;&G33q^am*4W)6YBO6lGUn71Pgn3@P|lhwzo
zOdBmB5kd9FaI`onO>TbrtMUfe!^sdxx)#ay=s*pRbdl=t)&oJK2FG=XgAh7aA+%wE
z)q9ccjdg0ma0w@%N9F|;(_t3D>dj)P>*N?IAlF%h%p4NN7x&7SCy0*L;o^uq-kAVi
zjZHNTeqQB(gu9!pH7xU5IvCZvA7RVN$|mr<Jq(Oxxj|BgS{?#Y@X>{Eh7Shk8mF$w
z8hPa{7Dk<J5X{%j1^x@qVn&{jP2KuKHiAe3KzN{#*ZzRv#kPXWG99#n$2C1%>|)dV
zMc8RVf;7hw)<^L&-|o*05kitcNenT%_KsC7Ts$?n;P#=}&*AC-Q?s41+XBEVzHxa9
zo>GZSuDDrcKSy?RII8HL-JaRQJ2zHvUv|1C2V>j&JiCV{@Lrz#9tEtqU&+!x@lH@S
zMiv6@R*f{HF|;Wkq<D`<n0>F`pJB-D>$Qr6z>t&RULVNmG8_Nf_wm&~qC&Q!kLM+5
zK;q|<{nm&2T_@=oHKZI2d~vJ$ha*AK*{}ojMkta1(|ZveIXq^l%A@X?|3cF9<Ym>f
z_d?Liq@;~W<2>L-owx<F84r(+g#(d9DAfO~4g58bPirxjKzE5TULst7()Inzksu3%
z93SP{%E|I*BsOVuyM|C(*6qr+zRJ`iFBKdP#oAt9gs8C~PLq~7KAf$qP)7n>)nK&0
zY@Hqy>`c%Vn}hb}I1LRAK1!(s@VmIHR{0+IS`Pz`*>$+698ok|<^nffKh>!#PdD<8
zP)u}(bG~jPE<S?9&_=vD`vMu#mGZsx<)0hkVyEYCVs~h@`|$&Tb*cSSPIjLer%}tl
zXcFkhfHR<(AhcPo>7$4HhT9xiPaGj|m(OhPB#6r^RyL396b1fef4<QQ+40Ts)Dm!d
zv;^Mx&2O%(Azy}{QQ6DrmEdV<ayd!%Diwkfi+X-{PNR%Lix{}aj#5J<YYhM$jR3$u
z_432<B!#Mq2=f+572l9s>3`&;@+#3ASr_PRg7Cx;^R<sz-QC=VPS)3j@r~xXIKnhx
zTgr{)S$^BsO8VI(#?oLsgpk~ohG{!IgO80MRoV#ZO-;Brg^^24Z7>7vmn{|PmZwam
z!4vKTspD0;bo!RNerC@rQdY_DK%eAW`i{=0dbWBcy~0u=r`gC78Khiu1Ff^@o3T%l
zS%>HMJBG1xWJ>f_;*z9CB)@S=9I;ydiGaI0IzL!|96rIOldLsc6jGt(tpQY?OQehz
zH%NZ=Ci<rA_|+=y!>RL);`Q)2bjEhM`uS#NilXTc5(3rfL`ypqN2Air^L>P|#qER&
zRMEZO{Cb45i~skWU7!-zSm-3KtdTCw<JJG6{j^0@{MCNqC2~wozvZ#0jqB|qPwaMU
zv|=gUn`f~QAmHsL48G6i4<!1sktQJb>_0?u$XfP5K!(OcQ}=clS)<{tjH^-TYCozu
z%s(F$9;+VZH*KBB<DbQlfsHm?y8Z^6Z;5Z5hS_x3T>L>)&}KLyia6nge;p)RrPXSJ
zQ+!fjMuiDTuDpwC12$Yu_5Lw_<bzk~^L>#ruCX=?Tdu?qVH$Pder@$!x-3c4mtUd4
zcVA^uMxDDmxP}TlyLEu%=T$`ylx#+A>eYjn*NyHJFQSrtbI{7l+R46jr&E78FIP``
z{$^q5lcsyubhsBvnG8d;6?N2_fR~~v8r0pArsDtQ9Uzj7k;+!TWSlj0K?@7zjVF=h
ze<fL_-Y{*5>zehZRJk={oJ7D>uNRb{7I5&}s%iInq@+Q4;LDQ*wdN`C(X@UE^FpbL
zl>4Ua&o%o6xWk9EdD-{RMu986s5cZ1ji~eO#hhx&+TbY_;1P3HD<X~N{#w1W)oNx|
z3oQ+&2kfRitJQ7NhtlSBgDsekNxXOd9CNEJC(bj_IS5ej^6ak(Z#zGBtaUfwoDBlK
zFMmbwnCA&9kJ}(*ZK<#v(qve`S=sDAs!!cr)9u6;n<c?7FT;&CJGa%$*4AjO!2r>v
z$o>Xp!?<~42e#&Ra*DJx{L2x!D-hWw9h%AeP@*NEn|(r0koYOAP)Jc;?i)He<!O1j
ztDMu~fu{D|KiK3#SOjagurVL~4g@yihRsoc2hsYnxj&cuZ8)!AD;`gmq|ugg?_Zwb
z?T_u~nB)BMIj&I-eF*Wk!73mQhb}2GDbx~HGXxTVP~$K5w((dX(w!toVlYt<%y{}y
zU0wcAV9>3DNR8&NF_q`VY7rOi4vqSAL*wHlj54LBvnGq>>JLNr7eP2<dQL4m&?=cu
z7OgQq+<;u{kl}e@Wb9+bde&b+i0vV-#2RwmROnh9&zub!W-;c`z}kgD?t~g(GNyP+
zA#8$5l!KV3FR*Ydsh3MoMFHYg{>@4DRnneS%)o3fFQ&~LNom!h*+1u)`hmX(6;ln2
z=8%7cX@RhxGs$@uK@D-?7Wfb-4UG_=4})F;A8}lFeC_rx)@<r{UqyGYoq50M{n#lp
z<wkJ&J}1jQ43UP%EV{&L>LYuB@T-htw;1U=9BODWX#_#r**%K>rAQ~KX>9KVoGI;n
z<h*)k)U8`>0s0rDMZPL6?gGJ4mii8XwvJ@2Y?nHl>r7d`C9IT@1mj;rD5XR>m0X^c
z)zcoW<SzQEl~>C*#;^ih|M0u9Z>lL?Z*GtG5O9TF9=#!3k3bL4^bF5To$JUGoRMYu
zRDh*Tq_@&(oS(*?rL;)!PLN6=ku*RM@&^b)A${ww|EC~S_KzTB`yW9F@~<GovEld=
zAP8mIGaQ(IE1oVK$wQ=FsJkr?Jr0*o!LT)80tiCGX7fE^Q4&&t7y<Qm!W*eAwpN7(
z<;53hu015eY(NZn73K;{#((nA);iOgala=hivKE#9HcovdufPji}1<e-bvGjPz=+E
zmRlAL$-rAAR^NSye_N3oq$}oxln==2_yLY5iU^g7CkpU`(*OEFF9#_f2coh%WX0MC
za6a2>m?BO6LE^>AcJ|<Siw(m5X@iKZL&*%k=3==cBaDBx88>jHD}wjp+ORR&C022m
z7IVSk$kV8h%PfDwT1_X9RJb?0VjTA*-r0Pqt+ttkw<BEJ+cEjIMP`u$hMe)O%KSH^
z!h3srn-rwhige3}oEVZAg*Z6b5oJY*-u9}6HPkVIA~O&f*0Got2NU&<xS6@_B}zXu
z2!mhN!LT2L7K**{l0&=N!bbBP=fsF_z1sZ3{5};El0)NGPvM$`BEf<!y9rOHvGA|C
zdg#~u#X##e*Me#Si_-Vl8)Cf})JF33n;mUAJZAX5@30j>x@Vp$0_OzR%0DyR>!Ly{
z1aWYF72gmm35~3ieIshH*`L5MaUl?A_T1xBPQ@itNZoE=U0ARy=7$+QuIUh+v$P(@
z68TP0YmW2L9r#-!F)FnO76|zCA|}E22N;&IRcD!jn<ASjV%aEW6YdCVHhM|B3PkzJ
zUfPivP}nfi_mWtK=7G*BzG=186qWUmBo-v3bN!cXzMO_aqvJMqDZXdZ-ii@mzs53g
z*doZfp|3)i7t77nE+2T|K_G54Lh3*=5LPM4NoXN`#b};1nM~jPh*ijyFs>`b0H-O7
zuv0}EP_;9~HzM&1h!8y1Ct22)pQ@WqSa*|z4S8~J90v(f_Vf;PwgRopMV`8**SV2b
zcwn1Eq^Ie9BE?9DRr0^EBwPmLNE+~ehFxBOW{N*q7ug9e_1~r{Mj19rNPMQ##ISA(
zvvG?G<Q9d*-!u|l_Fihe^)+*mZ~{{I3UQ)-{fQi@HG|*byGcNrYtvUbjlR;hdH2~w
z4PHbv@na0pi-*<Rm5@9TTA0FLn)#baZ#EfRCxoi-uRUK(M?p4$ByCv0JXutxB|*YA
zn?HD8o45;Z`?ht4^ARo$(vy;0`L2TzkkcbasRB-bB#tj9$vWfHK#j`QR)hK97n?c6
zp7~mTju3u-(Eyb8C~>N@sb;bS-H~c!zGEfuX~4=Z`i194$+!9jA?-VEzK8x$vYTtf
zvh37%+e2|}_k~BtKKlJlq_RYZ#^-vb%NBf=I)J#BfQ@qS!7ECI(h7uBGMOcZ_gT2;
zhPa-Y4clOtE6I1p?P0jCwFdM$<NioZ=QoHBfEYrMH0>otX`(G^f5xlWNZ*5(%Qk_-
z*V7%@8Q&T;Y0Ez2h!)4xOdSe`kw_vQ)$(|2M#}ckweYO)&bHiZ>Go9^M0SHI2(xb+
zG0L?oN<CK~Yi%ym0_S0E)d&X~NgY|_qr!vtC4M)q=3-$nu3KT*sv<N0LKU$@2P;XH
zr|jmwP&D!U7PGyuhLR_V8F*}DiZJzjxmk7C@M~-}ROb6S9v|tJf)kuF-+~v|enlI@
zNObA-*>y6JVHOm@_P`yuA9_iB$W4$#JCr%S;`PCC&)N&W!YZ>8*AHeH^%40i;Zx(P
zFokCW`zjvKKo4<$%vKz#;K1;-Y9q36u5HN)x+!UO=H}AVVueyvaaUM+c-#cPVIdc#
z3ToNo>|@ihw%*2Bh+jQHMH~GOKu2S?MHGo_Xv$%Kc_9M;FC=m);yFV}j#viZg<#T6
zl^I}IHt)f@t321w6mjf8faUoDjuU8-x$wW4AeqrpoaF5c*lul}^w2oL%s+D2&ObAr
z@P<ehY&YHSYO!HHI;=o%!r3G}{X7X3b0vk7%~UU}I&vg|XW?pwI*lJHLKJd**`!DL
zk1qrcQo1;Dfs<p7`Fg?n4E<85n20dO12w32Yl+*LX9%Y}K94#Hq>|=?pRSh;`X641
zT8qa4xh>4(48RK&{Celb0F6|;t>5#|r91R&K`2wte{^n|swQ*w9*~>@=t9Vp%~B~~
zA2o;*9lc=(uo03CF8+|5Yx_iE8$w^8#(wT$SYtha(+*srZ09EDQZB8D0?6+JK!HM@
zWacWHMSXylnEq8fcM<%H4uCFHPcct2B!)}@k!x87LnUh>FP|;<RvPfhrd4t0`ol>e
z!k$;{f51WzmjGBu*pe&P$}At%F}d40B5H-@(-9nG3fdB+I4yQ-)x`ca9#k-QHf0m*
z3YUp^Dy;7yHrj2OWk<tbC_X04GsFF=Ig^1wR=!LUlVt}TT8$oxq=^AxZ3-~8;0e)j
zBg?J%6EJsTxD}hGbYTf6_k18({WDV72U~=~6R9)mi>EUG2XrJv9O%3oZcDsfB2IvO
zsuaEv0s<t$t=8U<Ow^eVctZWKNlFga^{f#7w-1fc!#GYC_$a75WOTCebyiK9t9|xM
z(4awsnn8UQa;^0rQLy0~RDPzU{a6^dhe@3?^P+A^SOFfAk<%Ewi$UPmGd+Fc+fsc$
zTqn&QBqI&znRL{atuj8aMzeWQB8j?IYi^xmYnF%&yJw7HCw%mWKI0s01vy}&4(P+M
z5T}F;FztWEh<<Ms>WMKabS|Y=!$-VK3SJBvXPfwMT?kt)n$bXy!3nI|&dCya8)0@}
zK$J30kXsV4_A;R|o>X75BY(=z(PEi&OIO&Ql|>Jl`wBf_vr8(>;oV{KRgr!Jm(f(-
zpY(8|xTU)#d&Rqz+h#h-t{kJ@EK4b;(us4(D;$5wPY?cTZ}yQvjRQ>{JGoc_%t<fh
zicQNICyw_ENmFrMtk0yYwwN_vol5G!x_yB>q#<@pieh3wweMTBMAF`vm0)n+Yh|EH
zFf6_;LjvJgFyg8k92v+oP82)@-R{Cz*Av0+!ru7h2Xd<<ihMxw5s%ZA=Kk!KSpZXO
zCnsP(;l)D&ItE)sx+c@J+4HVOLI&@%lkLdc0mQ1pK-W~`^=OchYPzOsovY*iS#huA
zjrwVBMaI(=MD=MOz9QuiI|Z)|e;rLELp=AuCa2Qd8QM4V1>p-S3;&50YcCWI1{9YS
zP9LYOM7p}$WsUobB(R4Ju}#~@0Xw`44jqAP>y8vbYJXA(J^LqS9zYfv#740GdQZ7*
z!RJT6Zbzy@>!O&KT#Fia7(bF>t>#w!_zdT=I03W&{o41j{S7bJ<4SRF$n5F9Y$)b#
zoD8o`$FtNPlS<}mmF)?2;H*BX%+A>P;gy;gMAdb=%LS6ZBI|4XHfrw~7Iv48bt{8c
z0$PXFnz|bAYu4*A3zZtE(>CAk1rL6}84b;GowsG^;G1EgB@KQIR$>IkS=n$Mi9n1#
ztGVLPs8DCH9mQ2cnJ!aRmgX>L+dJ0vSdPt(c6H;BG{RI{cvXZngfh@A&Dsp)!LTjH
zeeo3MKD^YqG*~VR-uhW)kuN0Lul#FZw~;cFkJC?11=x$wCn|HEG<Fov(2)?X<pMWG
z{!~q;Mad_z2r%9R7%At_aeNPvpbu;GbbpV#z<}EzB5?C^&YuDA%=%89=vU5V%1+l9
zOeUKHUASs>XD#h&Lq^4ZsFgqVN#R%*1H;#)Sey-su&Fe#pi|2sPA&F0UT%%oGfk=s
z!Il{-yVg7Ne2p}s!K}=YGsCD4*^@RZk{MxS6}Pd5s`w!k{)L4UFVbw>s7Dqx%4q9C
zL4bZRG%wC!swITaP~laI-G?`oixC5{XeVH(nUv}<`;~dVcm8~{+6x%J?0TBbu)FVI
zdxbC3OlTLX$tKo8htRa85sIgsp>*K3l1&{PsI=X?1vS^Jz2CQZIxkpEI6`PsdWNB;
zcY|<c1MA%TP=iIle4_t^h_CptyB|7K84d-4^4UoEQKP+RB;>lFag7n&rEXTIzN#se
zrhh~AusjDA=^v`mZ3pfDMI853!n=>($mHL;w3%!V)VNSy-?+Q%WUi)x>j_LC-ZWH#
zzdJrSH}5bmbCiP41E7VRg3}jTgypcxUm1eF;I7;J$S51YrVYOnv*b#LG`Tx*sC);1
z%d1S0^dF?-*;VX~$HqO8xQ8R2nG)OW0J4Ui-!2O#Urrp2J#!-s;6Y@~oah)C_;q8l
zM-4nNxkdxrczY<j;f`VE2|Dm3>4~lR>#p?Ym!HSk$3a9181c`5yt?8+(}nY)307bd
zx!8=$+km2=t3TmwE!|F6TD~)fGBRy<I)ESCT-SeWp&aguDV?A-+U}L<EQUQb!7ge$
zYlUqQ2#S@k39SHWE$_=;pT##X%j2v^cC06=BPd`}I2oO1Er`&`fD;=o%Zab&yImz3
zL3Yz`@`h{38LRR#<}lM4jcx%Mxl{anG~3fylJ9U%1q;&=4z6KX4)`?OFBW_A*irbj
zprR07-Ho;;2k$bMK{BFpNK5M|U73?xeS1a2C8FF|mpH5<e!&yeLUBn)IePCiN8?yV
zI{fxVAgcW+E5L`{X|IIq#U5W^f}%X^A_>d_fW;~7iUuDjjD>)q5_pyFj|Pv^@1X}i
z4sg@ePFzuE4ZaV=?;UTGVMFN48_+H)fOa)DD;XIYopyKw-!zSn^L)-H*|fHzAhpga
z<!qnE##T{Zs<SgPnB(I8jzisV0{5Ept4a|kOK8ElXZ~$4mZFlOVIjrs*sP;UtWLAf
zK;_+{e^#4!z=D8_8_@lq5m<7Lmq!}KRO{@YEZ&i>d20P2#tURfHxwOv&n@6%I`Ls`
zsht8M^G%2QOX0H9lNzqtmhoaQL~hYDJw;g!7e1>Yp;_lDB@S|ztT&AI-IizDC60jK
zkd9rx{F2dmoUZ_@3LZ1h`<v=+ZOxdzn7XW12S)vo$&tT70u6UNLN!tJd=@}MoI0S}
z(5JE4t@bxNWa?dko^&@%vP28z9JPWjfR3Pyh9%F@5ie)#66(KAMKGu?B~qV_8#Mi+
z3r&EQa*miH0z*4I%x|qXkZnY4(Q&`daD%eTVc4ujkzE+P2fJ&r^5uhG%C5p)$mGs0
z&L3TK><;pHjdeMxw>Dp~s8MeZSGE@%^S%I;6XiqwkUh!t3|(Q`#fVb2-qh}XNX@d*
z@%r6(oz(G+m;F69n0YE^c33Q3GmX2W@tV&-&yR9YHyH&Mt@cr{L+I{vTREYUW(3^v
zh4FZ(%0>mx6Y{@&p)^>gHiVf%_5j*HOj~-iPG&n?TFRZ`jv;bsVl#svHn#a-g|l?W
z%2>ibXUD$JSvU_CMTtS3-$?8iR~&Nwu~;izLcO)g`JDRLZhkv<+mmte4PayJ0Q|B1
z>kE;q>eC8TomoCwoHM~N=qtFNno&@A&nDpsZ?+?a5kNA>2FB=vp(DncMH%pi2qELc
zVw%~PC7IwGFbzntOij*P1N9QoW|aZQo16Q45xO-@&0`jGNYg-I@{y0uDk4chdKdrr
z3}e2JRIGIRNrrkAH%pDN_h;8&f)%7Iy^>ex=^p%PQ^Xa=m3Q37<6COGYoC^?Z`A#<
z-rneBQV36IKLg;{q4}h{w$;S1q+W70e+eEb2)uU&%^N6TY$yu+>BYlf0;Go@;-oOO
zR29C@MWr*MRL#$(1INTkFar4gHDhgpo_OP#G`7MT1g^)+6Ef;EnZxzs#whuXefi~E
z!d{t%#pI*zrucXDfVw7_TJSq%su?60yo_12^`McLYtqby7QF2b%(Vmty&#*>B1BM!
zVU4;Se-l_V&1=Ehtshko&f;P^(uppl4fBJq*7m<PyH9AfA(p-61TApB0N<GCzq}Ci
z0Pd_dx($sVui6^sJ|!4IX$k&(gA^6X32<wm504jFmn#uQTM-<fx?wUEQ)3mPnQqC1
zx4B0@5J0owG~~H^k9Iu@?S-sGJ<+w|Yg%}I;vFs}|9&4oO~RMzqF0LxGYOu=lQO0x
z912nhQe`GVssvTNfNL1UsO6q#CX<2r7-qf}l;O_qq5d|Of;&d|qHI|ii+>||B5Cnx
z?fgG@p{VMahY5a-e|Vv#e|RA^057y0_?H(7sc<&yWu|c9WL52w=va^UPtoWIsc;4F
zKOs%(%%*TkFxT$wY-bjt!`H-C>>8S_srX!^x4_Q#W1e~)0`NlZLUVofZsz>(YoqXq
zrgNO^g4U&fc_FOAe|Vv~zq}CaUtS0yKOxB?HQ=GT#4-aH?jp|mjL9fqf6dbeT)b6d
zljMx>08QBi*4`S^V~P4C8et}NguGucfm+wK^?XVr9`MP(eXJ5qUU)T71aHD#U<G+J
z-Qj-003M6-7U=DdtYwG6J^gHRa=@sdJQ0iiaqWEykeC*T*g5Q76=lnHUU>r_&IM}6
z(Sy`wy0y7K*SX!I$AYnrx*cscQl##M&<kZplk<40Bs9TXs+38vKPkb-Sqi<3;;f&I
z{133b+aRpXymvzX0z;Gk07F%Ofgu9b{|yYq4gr86t$%=_#j6_G`@&`v;m(pALy;yF
z5WVMjomAE69(@K>v}SlYk%EKoE*tRHzADmHTxvOEcmj<iD@7ufp{|TsBAoRIKpF0m
zG&JkL&FsCr1F5VQlhBBvJw;|abskT%r}GiT+`>nHfgw_IIpNmE;l_gW_^0dUKH^__
zuuVeN2IeH~`@*GGNv*DAQn?dCY%)Hu^5_(rOVDlOfU8FP@)n+4%8d|Yg%@#zCsMJ%
zV7-F-^5sw6^)`Zkt>yuLKs0*FW0U5p&ie4OBfgCQH63S=DRX+tgR^8!%;twKhTI~j
z(Pr0L@9nM*R!X*-_2A{h+P}ci6yi7r(B38|iGo9`C(lc|zeCN&3_qrA>NC<z>rH~k
zd%4FBts(&nNu&DSo`Zx;&E6Z(MY_8R5NBQm`o<u%_K{tNE*y`~61Q7MdUEP244Wgr
zL84j^oXjt4&(9)ytFK+8G-VRpGqRFga8-h^oX_U>eGn&_+|uUTccyhtu_$yTSebl8
z5zmw<9u02}_+YJPnpAGY2Ib}spI2;Bn9C}e9`l&OXEhhS5XyYs&KwDuDVpD32nw-7
zD|^CiB93aj4e0Np%sT6TBpxH!;2T@DpX}}r@tx1bt~NV}8ZLVLa6RD8*E`4>yq23k
z-dxVVD+i<cm-PEN6dS0<TodgYe9sf**8U2*y^VESFd4c(ra*Xgf7|^KTi{N_3VnJy
z0|tFcpz-5*Z&sbcg&Pm{tt&g-|4^*d{}F@({gEHUiLdt;7{dJj07GDR4--JU_Ec@k
z^mleU9gBNM>vtts(<a#P*{%R0fmt1JA9{U3f1dfiQW(3RbWcw_89}QB8P0qOn`!*9
z3ruz02ArngDk`-E6h&;%)$38~&sgMI#^_aQPJSXC$j&Qz64s)Z7aby4u$J(QCcZ4r
z<x&YH;O`qq{ytREiRFo6-aqt2F=mksqv9{w;OCm=V8yO$=?kgnXzv8S2JnU4+w9ql
zDY+ndjNqvL%NJ59Vfb%fsHK*gVqW3De4*KB|G&P_3J$;*GWzQaq0kSGVY8*cn7L6?
z0em52#f7^BY;0qKL!vIoCb0MYaf?Z${obbgYUB$V_G5%!#J0%WqWlbjbwy9{hWumW
z<)Hhd<9D4uc$y1=D&Zj6WBh$K0tE&KTgwxwP{SD)Xn}CK6??FQdwYW5lY2#hQ$S_~
z+Ctg2MVSuwQD-9ee?DR)bc!XTv1kO!W?^#qr-s+rA^iwNa4k8!^YzprC0N1S{P}YX
zpSX&#u|l(&ApV9i;(Ic8JPit8`OYNVe)7Qo;_9A)EZN#_0hevN%eHOXwz_QFwr$(C
z*<H46o2T~vew=eIRzyaui@eLsIp;gZ(<mzFj*O1~ftA1-*V=xEBFdcc4*h*>+&x?A
zrdV^QfMhNUsnYL8T=KZy^jlrT!Y`j#>d8L_(;}gg<6^@IhRVIi%3wqUAq}?7<P6^?
znKcuTk@!zAYH6eVFt>4bshi=tUz=3?YrflPogg*HykU?`o~)Y0n@S2a&S$>>N$6gd
z@dhJ{P8B-72A#5D7LN50M8>0yAy_mOmoHAr3^sg8X)*V(?p_Mk;QUG=VPSwARyfH#
zPk6$HGteDNqR~WC!A;%mtaWvcTA^@^Y^u$ak#*o-I@CJJWMSqAiE5z^WD`bFiFx}7
zfthp$F4~!J??SHye`{u0alnJ&KYdYq!R0b);#(aa?Gex5lg*Ni^+F112b!4kw=!<P
zvdhYAq(;N+r_1bt0d)Q<Ir;MNU^Ye5A+3<LqjvfDu>7he7^Hn8?M=9dDVK!z&UZRO
zZ_`smLI3bVyF1G5eg79PWO4Mv3thtbt<lg*VgH908f=m5f&33I6!{M?#PKgLBy0@K
z^TP`@Wn#hrZPA9zFr)i-<txuIpP{`D#-&?D94G2y>IUEVr5o<rF`fU17t;D)UZ~m0
zu|s(<bNZDoXB%@{@;*3gd70Wo-0Nfc7II!VN(K^Me(D60MS<q`eP28_R>LE~uFAl*
zk#Xi5d&!|opjfb#b@QtEgyG12SfN$msUqMrm#KC}nL44pAN4*vJ(NRW%Rjmh5oeZg
z{*^SQ82Li<2%kkW%|E(OKuw73TF0@9>ay}1FBI(B+lfEK_8A^02vAd3N*n1rB!dN>
zV&XrzP_}Pneb7&D2E=+TAdgvaV1r)(xLFfgy^a?|65{;<=wfAZX(h*R;fy*`Gfr+q
zSNq5=ZTx;)c%x%B#Hro#SO0?AdNTdkkv?-qa!Myw{+P)R#RDQ28LJbIST2~vj4U{f
zwQ7hKJrV=KicScVnV~gdW?p89cAQYK_q`A=|Nh75akpJ~a^^ml!5V~}a5IT<X@d$y
zrxN-_LZCzjxriE)BBheb#&S{??|^OsmQbRh(I*+36_K3KCt6iTyUgN;R?)%!Arycc
z2>Xdw^A|?Rxjo(+o1`<r?@aU(d)*4mlBPjzY0#LNA|PGu*{^)!s*@%E>Jro+Txj%P
zTu6!XxA?!fkaMUv{hGl*G#mq}a*0cuJRq(DO`>azK2ml$N+U-@4+ei|l|Y5FI7>Ny
z-pJn5Oik=RxR3$o4=$Ab4=xmp^Z(*P6p;g^ImgOb)nV6+D$@vmR&rnas{?NZ7P2K_
zVRwC-%S(}MG$5)?M8T^53l~bN{K=E(bPi%E&N~JzM-?t-Cn%A4I=Swu=Mxi2b^C==
zWhytkfoMf2zvtxfIjPUdU?Wq8iSr~$0xFiyjXQB$i*v3I^_e)3T#UecL05rdF<7a5
z6#@YlW#>TS9@7|zAx|<&4k7j|9rf|m8c${EQY@btzhGtBNJB22Zf3%JRK^={5P(Rb
z%|SZWgKTOu&NhowJWiJt0u5+oT?w=BQaw6yXMhI(HtiU=jBo}7)CWPytkEA3qCt%S
zVfwL!k}1RXIYqzuC1VwwPF8ic1Onu}Zg<bruIYMRM|HBCdq3&p#huESqTp4$>YZZZ
zq5I2r>96uW{WgLYI~edqM*-!1cD@@RP}=8}sPBL6vD^*;2=QOJ&oKE&<qn8>&I-Kb
z7Cg(TLk}~yxlsDu!ySQ3DFf)%qV*{Pm{`8xsb<_Jp}ug9Fl7h>(ih54pqAbjjFPpt
z;Et<88>5!rg_`l9+yk{$BT+H*T99b*DNEt_D>EIyKQL}hsi7(@yd&&B`|s_kaGoqk
zWXVWUa5BOubP?hzQ&!h7Z!z;G`ISu(1G0&;N)Ag3N)JWTXZ&LeIdyya2{nxWQr9bP
z_&FFAWpS2FGTbW{)b^QNgpvNo7P|ajTSy%IMW#swuL0XgEzq=yR4__lm7ObWMWvf{
zjzML<1(ACQa|Im|@Y=2}V|uiz+<@sqKiYardOgO`5hHvOH{YOzyA*^&3j`c39s%LL
zy!UU;$q8ewkbal6b?k>1JksH<2OX&g83?6-g*YPbct1;85%{uvE`^2%)$!6KUGD{*
zVt_6$c`k+#@fu*Gu@^cZZxn`$5nMpALSbV%73Zqrh!~0FB12;MUkcX#9E|l+p?^+B
z_uAm#E->sZ+XdY_o@~uLTZDBN8$}Atavi=d`J}kmWa0<MsKk_-u7+@A#;L+B%^|x-
zM@J8%q^#b7=no+n@>6^!AT|nB<+)6$(|K5gXyU%jucgs+=JKTlza!ptscdZ2RIrD<
zK%__#y+H=w=E~i!`73zX!Txqma4oWiTLUuMIxXK|cZ{^j)6s+!W6o?SIT;g0oV6xr
z*33mHc(t7Hj+E-g87a0YDY*$v1e>bEz**e5HKg%E0+>9MUYqThGQJ#EUe|iPA0J|m
zZqCbLr(O@6AC%}AlM+mk$v}!FtoVPvJYK8zynSuE;;HL?7$07Vc`Qg(ms>9eqHzfu
ze3_d`gptV(9PRVg@8tgump1<SuqH-}2Z^efrPLt8I*uEG)$s6XwEv)8`6(am<n$cb
zKsU)Y3l%5ZIBy-VU06>JK;cS>$n->Rio4nrqk6U#Ouf>!m)m#}EZf`TDPjl7&Q@WP
z#W;SZX|SbM`Y&22!peR+(X%LPQda$k7HV4<_uAko0{8`BcwdW(8q2|sQ+TZpsVCYc
zCy=w|#vs-ucNB0}5&JJKq=Ku^bDlgTk>ukwyxrkDw9NA1TXKS9A3>D)RKn_Pen4y2
zvm&ybiRKHnH-WE`xvr_k96?Y+TZ|G%cRzPoJbJ9w+Y5AxbNlX?Dsi8bV}@Tw)Bzq=
z(iX&CO>A>;_OojhMTT+ubxIw)%6@-2G2LkJuPx+$Tk$`(5beLV(C&C}@^LA{xf*ps
zB<be-<C%1<BlWTJS$s)APC-sj_MN3_uE8$?26WVr_5z}nT0r;RTX&OzOUz078eYAz
zMKS8DZpzqd)EVY=rwngkh~6@*o4#N{FG#@;{qBheU>3cZA`;d$5i0*eIJ4ML*b&Sq
zu5uHBIK*W1TY6%sNdEUYg8+P`aLsP*az6)JH5z@@+PK9?vpaO)q;y5N#%|?{37l>j
zYZY9t?JA4Yu;M>^$>r*kR$J1myl2a$b{w-O^Jk}ySLJ_Snut2C2{zUr*A=~9ZFam*
zg24r**4iVZ<$g6FU=q_n=Sy-93KK5Y`rJ$G*RnM~z&lE|x%_nKbvvWpt7qx4?-vK;
zVOp#Cbe!{gZqLC5HTFCOzdy(|g|@hu;n?gQ6S3k)0jF^}3YBmW&qW9eB21GQEAJ~p
zPvBndzqD4$&r_5FriCQqZ23G5x;*hx_MLj<pEyT^C$^Is*p{<5rgCK8<hmf-A5hZe
zpIrpGv`vKsof^O<-Rx@A_=(V?ik@VqC1>VTO!-AsDMJ!bW|!R_ua2bZ3Ec%mZr2?m
z&n(LOAN9;edW0*f=I9Mp(dEs2EEbFX-uS&uliMrhDkxPJo7wde&Gnp~@XIqj$h6>{
z`T_}f%YOh2e2cEc+{Q^#W|%$^x3Ko=<?>qZ*F7s6&bF-XzA*y%R_H9(jhytM!w0{c
zeVY(8a;T-c64S!L2F&%EcXpaM0VOe0CmcuoAIwzNyQ%m4BboJU34Q((6}(@u!aWg&
z#SOw}lPYtcn5DGBx09|K+D{z6{xV6g|Ka?4`G|cUpAzdU%&{Tb9mVx!24X_m#XtD5
zn_`673A<B@3mt^NuoetlT<+lW=Lh~MoMv__&h$^@%KsUKo}?)(ivN>V242E-!1lZ_
zUOZeGjU9`1d{hlNxdCspZwb&-!p8d%DVoWT*Dry=dyD0m8x~6Om7G1*pgwRuR&D!#
z*+O2{{oQUUB~H74eqYG0w>>~8)|P+PDf0YHBbWQv7TQn!v4ti#1yf+o?5G`habqTw
zv;-tTt!KjEUGXtA2UNHByQ&{qCUN4QNtYv~^OJTR%!!DlG+;>&?hQIvkTJf0e-<QE
zn`AMnLzMlr$WEk~8gjd4iaE%Gqa3Hr&WZ#UxN*x%lxlIVPN%+}{ywXIwp{s#7J~o(
zX(5Lokywv)hO0+ooDN0DK4;+QM0|aFW?JWdc$!3!{g5qvh^ljPReWB9r<M9d+-vc=
z?aKy`jA(w$-8vMfR4tU_BAT#E%)N^uqL$V|L*jG2sgRR8!8tS|_wT5joy7|UftFvG
z^wmfpX_@d63ltu}KFxWe4vP-eEj=<f_GJqU9$XD!{VPeZ2BiPWLQ+4n(AwZXve06H
z<IwJZWFe%CbQkv-d}3f?=}9vw=?=3Q{JueG31}il^g`S&2;>c?C<Wf_+Syr?CG%A^
zNh=L4nscz&1sce#1*Z1=oj)S@h&!wSXDW8sv~rBgDmKd+L{KwUh)lljhTBSIxDF4I
z#pF5Qkc81tN4Iksf03#|eVWoE0z>}*Qi(G9%$_3ACERw3dL!M8Stj`O?bI;Ll>SoR
zBeavy39%R&|0E=G^keuo#aDsN-thq3KC8fRuB7i$&`}>d1XI?G1Ly>0w;)*sY!1z$
zHib70(+O7MEa*3gHqH}3+LNqAT5odvYss~>R{#4wX39Ui)7ALr+WT6o)+U0eFxh_z
zXe63Y>BQ7;oW42&^Zc<mhDr?y-d=R2*WMo(3v$PW`j~r*b}wiJc%&Yav@xZ6_ORe6
z9&mYBf?l!uw*)^-z9<lHsQXITvboo<vOYK?VzmXzNfD7|rcqVZ^zLc>5}JKSUBaG;
z)+)3PF__}Mv<Qtuqc!c?l*nD3!}!Im)0r}h(K=c4ZP>it>n_Wlb8uEN<XHnEP;a)h
zrisfC=`_!~&_$_x=7{CLFSM^ekCND59gfcfB3)>wW_~E3PvI%&S{bkoG35F8Q;86D
zR*VdY>TS|dLiaO&Ax7U9&9qI{XBhq)7IKEhCjEhhnx#0i6s2qB;M-|!4z`3R9Y_~u
zCcdU9d16!}n^Ov^HOk07O$Eh&In!t=_rBeIP}5%i*|hL-2tDVhRYV0%)b)C+vp+I=
z-ssLnKjv<xn)G=;9bA<Am3BZ3#M`&?YLa4+#%97V1Ixg{3K|82=Mix`r#d3!)wY7<
z>J0Pw_c3;SD2XcVD3Cf~f+ifDDCmj5eq0cB8$tk0Ixv_Y0m9421ycbJ&vKvSSUp?k
zFRI#Ei8YFF(v3ALp*YWYlBWwa*aW^=2p2WtS@#I)bX^NkKxL(-Au>fBkeSCFm%{$Y
zY4WS*m7MQA3<!OX;ylpv-s>TmD|v(%zbfWwhOm63fu8ztubZ#)Q;4U-Y_>1R8Y>n~
zPTKRzDJVnP`?H`oo!E~n1a<X4u29O`R-OuI+esTF)3|P`xKj{fQps_X$p5%PGm^Tb
zcDX1H@mCr@uFy+{=Xd1DMQdfF+GcQYaj@f0ZQXg4f`xz?=fPCVW`NL@w<eY7$&_I;
z4-Ezg(Ja-vFrIZTxEAoVT=0wf`qRIz(EN#qwy+$NEhrBpXJh$*fC93<hsa-@65l+i
z_mK^<%3NOp3Ym+4T%m~nc7?2N$*iohy%3A?gPYz9yr=9iDUza^W(SIG=@LY0<M|~Q
z5{W&Fw8xMYAWjGyAsjOL$+-1yn6SMG{l?PZ%-b_elaTW(-TFP+?1g&dE7hQp=j%EL
zI(8s!R1S;W=oUJx7_?B0@19H`%(5ctX=s?-?H?#s^}sX1HEC#xw-63uygYyL1ida)
z+&?+xTf}MB*J}{R10n@VXe{A6#LSPM;0Z8TFQ83ow<DSoVXuVYE2?1`29Zb#<^P8j
z;&8zJLwD7PE-23t1i+L3f`Jv#FX+0bLp|p#;mRL|m`VE%QYT?1azf3!b_?#kPo(S{
z1%0RexlC?^kPBf+B}M^@1B1K?4q5kFSioPG(sC*RAB=CH@Guf+-#-=6XEDd-bh$#!
zQm=^SAWApSyb>B+Csx>pAXiZW8go$J;IvmbR5b_qrqWiI4T4FXTne>n7{x3$C^wh0
zD5|IfX%v-Qb`iOy-{SGJw=G1vro?EcHiz}nE{lGfAI~Fb$b?o=1a2h}M%$V&{XPqC
zZE2*^CWc%7V%?~*I!mKvYb9Eba>jsWZ75mRmUMo8yYO-bJH)}ou9A)mlm;3H8PErF
zG>%Z@Q|Hxg#lvdB{S3(_3`7|NN<=6RY3hZN(+Pm5%Sq6<GJ8B+hFPGLjOeRGa%Irf
zs6&ou10rH7NEAoZ|G<>NlkAK~Ey=cj{vG(08}dmwSnB95FhQ__gi3E8o{lADY&jIf
zva~o1<O3-n2vcB|(oTlzyn&t1NvSr#Eat!?5SccnN-0b63z`2hjIw;}9J(_r-!$zA
z9(l~-3BN2O53|IeDfkur&0=xdA4Z29-0(ihyI^=$Z6aBHj$2_wS8z<YyqHFFUsQjp
z%@r9~@^_jrKE5Axv%y2m)=p<;Kt6^13U$6~(A5}z)Z{FR%)<!*z6^*bfPxh~CpBR{
zI&5DmUhdm-HaNA*`vZGZ*n#(X3<7~!Tv_(WCVPfSIoAEW3VWV6r<J_q>OZKE+%)Xv
zs;3_-TIP~<va^4Wxnf3#NzE?jfOKvkH&no@N9>eneK091g!wN6FAaSKwEe!`YB2O=
zYY))}wOM}gsgJN*YkK_8JC1^0C<RaQ<u1fa)s>HK<Dr)pj!y}Y<xqV;PE4FxJ(0OF
z;U85<y{3KPV=(UpYvwff7yJc#-_o{L4}D7LNl9Bb%pyw&!?h{j7z8bP|CCpYDplXD
zLa7>Lx1at*N#Gwmqwe3gXvIiipPB}9`cb$j2vY3%(eX!9ll3X;8i1iaX3W4#%P+(u
z(COK<2>^Wl=a{F`F>BVn+ef={#+H3u3JgrmEs<8;@Q4V5|Ah*j|JrC8)`x2Rg#t0&
z%^2`Dtwt7{D0OoZpP2_Jw|OlFhqTINkQS0Kdv$xJk0`MV?ud{mimN8NfEqg&s`?Kq
zB<P9!$f{a{2A6<ecBERQCZGn>;yct`C@^Ib*?vmcKV}dKiyj&>c8C=9rA+>^WO#Ys
ziI@j=2?fJ-a!kGh><Sk0>%}t(%&bXhunI&@6d&nh&J(WcTgBL&1L{RTIUp*-cjaj*
zC)-o^m#D4%nP^~I!;|3~hzkLP@9u%+7;DE07b54YO;9D4>CKK8=54lErV;jz`KpqV
zF|O!9Z~zn?1JZ<6-~K&1kw6vLX-oqgQU3=xH@5r5oaPRTHB=KcjkB~#DQN>PcM!v&
zrCHXd^vHdiEXt&qL3ue)DE5?T<!X6Gx<bDBz@6@ePPF}#r)jeJB)#&w(zb{k`S6%u
zD}VrL%w5d7Y`=IYP_#bzz+jP-1s%bs1yOj(?HJ3fZGgEqRZ(CpdFd~^G&B&{LTFbp
z{a(X<a2etVfh)o!yE;9YZ}R*}uGuGtVB<XSZ+M0$(&!Z>DU`_7a_UbYL+3YV^bhZ9
z?ajYeb?boX(22n~iK_h-35+~+_b0CQO_s|oo9j2wXf#<*d7-W(+2Zh|81Q301-%e_
zT?f5S+dV_u?zOrFbT7}TJ6fExoh-AlhU=6&^RFKHTZw1a=N_lU%laGL$m8SHLU1LQ
zT8XRuP7SrKzVyA$_?gp(Ek}-lKmQ<W-LHnDtVM&jZkOjR;7gK*j{A~lOP4^T88p?(
zQ*y~RAJgZQ8-YG#D%&z<Z>=q7O2hReV4Vzu4=zm7bwdHMHTia+h-uyGas>buy#s3V
zfu#y=j^`WS6@Shn`#L#18ZdX*uKQow<{iwWC_BJh+k+p%#Y3#n2^iYktgH&6TS#r$
zQ?iV5bPa1!eUU1p=d0guLp(k{71*@zt=9=6#LU*7b{neLi!d|uQ>yxDb2ZaQa9`k^
zcu*!^3<76TE+8MUneX$9B&kxBxDWf2QauR{1D2^f-ms2iS7cH>z0xF{Ms67L)v||;
zT3%JR@A=6yY)TIc<Qvvj@5wi`!aG;~mBJ__g+^e`i9y;14tZ0OxOsD~5KcorA3;z2
z@|o#ie-M#dmg3$grsk8}t$BVFIh&E9JA_V5EGs?6Z+o^enAL9yQzpi8kZsa^CdoK=
zaB#?DYj&=qix)Q~o7eJ8{ItTalUbT$%cHkE+onRGLzx3|+V6Fv(RDv0k#>#@E*H!e
z@y?<3r$IWHUsoC9a4Scm6WAeW!ftK<z0CVS8Pf7gsF2JH$PmfCcv~5*#vlM}<~_K$
zx~6-G{2hjY;vhbpJ`nAh&}n2A@TN1Sa62{ZG!%U;)Q#3dW*7N}WEcALv6vJH@Kgl3
z7>Vw3J;KlLkcH9ulX%SH8z7rCB`y=|!C!;eo;#1uC@XSc>;q}SYP<8eIlK0^?)tmx
z(PH<yKb>cC^_Yu8%O0dxxu+Ut?E>#cF%<n8#fRZ_&N9>c)>^0Mw*OQOuBR^Fjp$Jy
zw(aH0_sEt`tzh`WjMV1Kq!u_3)MLr5dmy83h-WO9=dsQSCcJ|97SQh<60ES-(~Dip
zEc>xESm0zPXchIsWe6G)x9CXCknyg^IV+TIAz(w+h|NRX+CZ(0G{vDp@+fb7%w4yf
zxk2hiejJlh1}epxMSTc5XE_wf_B-aI>+rV~S9?&U$nA!0{^4gN%e22dSAJXU{ibd$
z2I8Y#!fn)SAPkk$Xp+L4B8opnJ{#lW?~2`VpMV25ia6mj1#eS30#p1ISTyx=G^%5R
z84JHV9xALxOb_(RPDCetIQB&Z4g}3ifoVc_BNIP#wg@3CcYaGtJ3G(PWeQxy=YQ$a
zIakscnyL%lASU4~NqsVU8{Jixa3-b?VQkmnnNL$z{81Dg0oJfGID5{h57TE7HR?Os
zVjel#BAsb(90qi1nR}G&oyW_WA71$03xqRNoR9Tb@{s=BHRR+^P?(-AwtOQskQ@qf
zeBV14XA8s+!tkAk^)mY57Ax%dwUX^C4vhWT<t9_R*#-v)Bq1$G{<Cjn_V@sbWsRwA
zl<V8$@Z9!Z+IaO!=k8RrnmiBx#p_9!r{`uK#QSQF@PYJz3|8T!KvPJ~C}#<PBi21&
zrFMXIvk;M1xr1i>g7A}T@C2M1`*|5prU2xTy2V|>P``es5yZ8}<4S0v+&fezAVK>0
z#U(5Z(5{*Qaq2*Vjr+vH(d+SQLV=wo(*w$2`g;H)!$Rvx>F4|%S2e=OTx`driE<`D
zCf*lHpr<+=i=Lb#(WKHOn_rS}ugN?$?Bpaj%1<jwu7MBEnqFi|EKXx9jgyRh?3U83
za-s+0?}N7e2r0RR8DjAYXQh#m7FS2_7o2*pffZ+G1m+hv!)<)j&C+)79f@(SKS<t+
ziNs}+t$9kwhSWVz`%JuO7aa>^7cl`z$*`^eH4OEUSK1+v(&gPm#!@n1oy9XqT?|SH
z?SISbYy0@XQBRFooExzqUj6}7)yz;P<~UNQkQ0=u4}Y56l;c*zu0JF&i6R9;y+nLp
z>?BDd?Vw}w!d=nogTdd#?|Ow&_<hmk;b8On8A`Q{Gb*-vclbBSsA7C@*mVSZv92>a
zRQ5r}bc4Vx;YwFs^CjjHGPi#6*O#-&*U{ZaY5(X?CiY8tj7y8A4u{T)ls?ob_(~c5
zlN-&g3sAfW;4!|GBCgJ#g8hIfTaqz9h@W-W^2VS%DxC!>^~PyoM(l=+`KPY4rr>Bm
zQHqris3>kRF#;o8f8lWsE3$=Dv)N^u3)}KW*|WNLuh<2t-TZUn=yvhIW(X2oYK^M<
zfJPkoj0^|8%*LS#bRx>um^ip{3-+d@pWB;<REDPO7J2y1yc8Bd6!0l6zagU`<H0`>
z%ScxMHb`$3L<x^DgpuVknShDaqQc~?!c{ve4J|lUzU1+xVHcF<R#x7pFySdY%{tXl
z_4`nHvRS&`ZY7)o_Zq57s&;=^G6bsV1yt4_k&RA2mGNhqNje9_Ljc`za~)0k%)Fbk
z5D_?|Z+UV}I%{}>npncLV~ErGAnL$>G@6Y9QY=bj3_`3-DdYB;dW*-s0P~kYz3zeF
zgdu+iY*XL|uqh+r2vt?%pAL${Pfu2HSL9|K2!cv#$F?o^`WAbq@jUayry%lti3Mm8
zGBl}Sk+r!+lc3_fV^h#%m_~}8ggNee;{_yv#hWZ7<(M>HGBolCeF_+eS}NQ5pQSPR
zdGkl>Pc@gVcoi@e1_L6(I%FD#Cnh&cHc&oQH60FyI7Y6ke%*m{bryOUROLV82<8i$
z+o=(yZ$wa)LC~<llEeF(L<0kRBE{C>Dl5ui%_o1YdUVHseG+*Sz;uPeW_>TTVBQ?)
zW}jfxt$yU>?bWHz0O(d&;JeyW0r+?Y0s7(S^AN@<1F_KZXm`6**qgT4f5Jo~07G^{
zq;x#Xg$17;8m_E7j{J<zaRaqvM4XuI)fJQJ&0~Bp)6d=o)@}?>_MF7Mp+g%=csa+_
zb<!U>zK5ue9`qL>k9j}X9Uf^+OwSX{#=8lG4BcH22A>j|2b*I-kyPs$nVH{Kyk97}
zUg2RI*UMK5=__)sWI!a3tBWUlmTtU0r{sJrcH2*M+Z8gn#;e+TIl!w%sHT{{yj|QS
zawJy-*Hi=<ouwqidJ}$zbKA%jkWM?=1}HQg(tS609F7)?sbTSH>$>~E{_T42hxMAb
z+2Gs#I<<mmAcZT_8g?6i=3io_S)-=od&M{yoI}KDa_x3!VL)AdTX149#WB%7ZJj}t
z*(S{v&eJv`p5z}|8@Xcg0~)czE&js`VM4_4An1+;28R|(kkx({RLUTIP*S>cmb$QO
ztbU%4;~lG>9Ea3}Q_Q^p9^`6=yH|-k-$C{lCSP@DRMH|Qk9bTBN;p~+_Mz#m5VQ@L
z0zWLW2;3ydRm2hsk1inYE|bj@s+c+Hp8Gz3@@pvC!yCgCmtG}A1kr{OiCxRx<Iwa#
zBw!XbAIwWa{b{zWc;AS%F9)+IyA)MwgrSjAE<K$Dm9{O71eJ4(WIrQZK;fx-BB9w<
z%F_`LhPXaU{26&Zy!!L@fnP5%!roA*|JQt5hj)p4-~kflPxp~uPCC)}X15cmnns!Y
zDsjIZpm4P1-_)FscKait=fHJw{&+vubuz5egIkxs-{an1fB|fKN@1XT#W3@HSvVqi
zctho09+*IIlWeA5uM9L`kg%>JGqL&xQRKPkXup}#YK-B|6C4Jf7@L8|J=Y7U#Q>!e
zY!x9JM?DHQJD+y@C(ibn?xuBj7mX1)ts51=ElZ1@0v#%_QnlIDhTnL<LnZKnhK5K4
z;YNhJVK!{5q|j><<H7(0=;`WlEGlj}V?Usb)JT07us`db%QKq_%P~*P32~#gPDTuU
zh&p&>GJI*XS;vb$i%;<K3m4^g21UH~5p926oEZ$zWpk;4$dP+i$rDLh7o!%;XzG4Q
z?awhK3_=Q<U(MI1@VP;_z!esJL|p-Z9=6>~4>hq8`74R1r#W06>80jy52*z#uF@Hx
zyJ=|;^eqWHoq>#W>jjkg4b7cPQflOo<yk|RD0GmHeZ+krd>Zd(BL9R4<}(vi*~IZy
zq8=*T9@0|n6VuP(X15oJ*4+4&?kiH{%-y=~k?_W(x!*4*JmnPFEVO%$|IjKbVG81W
zGGVj)*(M#jt<XaiMoF^E`3#zqHMfkf<Q2!Z^Y{Ff@6Gfm6epc@B&(IB)6IMyC`mTc
zq6v3x)kG7Wg2t==;U7(vmo^!*H*#CM@l*Jx{({uyy?x!<)pPv}0EyILuw=lfMJ>z!
z2gzaO)eEGceCKn``vpp-D)fYPq}5~##uVs?SEu_GqY+37`grqRZ?0<R>O<rCz?#$C
z$Ix_utuFFHIiFRes;=|XbOV}9{ExW-=Pm~;^*p$}lKzf^fjXWC|C`tN#Ps@PE8gTG
z47=eLohG#OuSs>h1TW{;Id4*OnuE(}`l?2&$E2+U2#qF)nH!e4&OoxH*1%F^JzNBn
z^E7_ztA(?1NBjp8DXRRiR)Ui~&jqX;a_wLFv_&-WmO-WFcr&?mvi25ikq@!}8`Qqy
zfP*im3D;gKIn2SEj=8zo-LD{j(?*hK(;fcu4kT7QQbuVJdp%qCqB9@UZyur4ZB#m?
z`u_&0xw^pQG~|4p>ku&|ihsf|-E#7P1db^XiduqM<@lLmX9#DM>HN>U-`&N73jwLK
zin^ZY5-s#lEocd<-fmheS<pBVW>*buJCV1JxN_6Yy}=(RUUAFf6=1~kM$VE$ZS4w7
z@Y4aMS|y_f3PSD(nK(FIRs_K^j+=<xpUY;#2(T^NJQfnhRgk@A(EKND&KIK&hp<?$
z-8hCByG1{#5Z`Y6^+2yCYaX`)+q_i1YiL!M%lm>Hx^0fehXb<qrbW%*U(*TU@xG}F
zRVURgR5(%k%Eq9EO~$je3bWB#P74AK*HcHris0690FaA|NQYq#JoC=cw5q=7zae3t
zQSh<(KBI9f)V51;0;PABE&BVeK>{v9@pyMSek$@HvB?rVw&fX>>uoF0Q+*{Vx0g#s
z*lNo}{Nd%$KS$A7=N|8?c~4v?nOAF*d|0}<>=za|aGVmE85Dt_i5=Pn*d!9}2fe){
zOmu~8<9zaIZ-a*mhUhj=HyvI!!L~_v=7{TS77zv73WU{>ppUR7#Rl{mHKaC_@y)$D
zU7}hD6ajRp)I8m&(cfWH>c6ShzWRwK8wu9{EiS>;#EmWE;F(DefiIIk(J~MOEUVw6
z^!-@!^(?b7?wYlJc~lj4kzWO0-=vRUVaVaN#rEcZeAFs5Oz#akzPrA#g@mG(azh4N
zAd7{H50$gDadSK%la56<j_$hMUQTU9ZWFEYeqA??l0mIv(U(M6sq3}^#CH_zFLo5r
zTN(DkUS!z)?C_ssTQgx*aC;f>ybX9P*hxdj)bvIvW@QYt|K?Lj+A~bYGZltwaGy4f
zv8d)payq)bOYdc=;dZ>%w)X|t_1@o5`$;t1I724tg5nu~mW!_?>YZoh*N<k0v9@=;
zFxZUSg{;6PCIJUPW}D_BoU$o(+0<~@3h4(;gH`0s@P$7!+|mGUAV>gU6kVc&=C(Oz
zF+pCVvdP@e$02}|=Qjev$f^~bDQ8i`0A5cNB=QVsmll{krnPQ;aSm}}iB8J>B<&qC
zB>7?gcq~B2_CE|2OzCK<M#fFTBzK*i0CvX1wz6!^jxam)sgBf>u1JBR!yh#TWlVr5
z6l!UxR%8-56@;G(GVop4GZLSL+5rR>16<k@#$zy9VbXCtmz^#*d}oeuq28JP>QpSz
z%t1R+|D9r|g46^e6jOfA0Mu(i2Kh47C>d^LPHOt}GlGW=!WFbt4#SXx4rVlUqC+(e
zp>{h-U--U?<sY90BH&5JW7RBQQuMbi)9)gujV7H5`Ry#7sM9gGW}|C!1Qo>)g+2;c
zJ2BAt%+b&g6_iu~z{Cj_#g1_dfXGT>Vwzg~x=uQ_#(OvCEs8=}MylT9Bg-?#+>ig%
z>l#HX0^WvSNAM4Kb?8t_ie<&6?D|p}j(hNmj864$?msVsE)Xn)$<Nq$3-n*T>t-?n
zYm+9C3!;NF6(hF6gMk`D%RnzWy_Pe!a}Iwtm8emx)sg$P7nHSv^^KW!37}BVYl+QP
z!}#X?VQ9cic7Km2sW}pTHDyYsuc~hr+><!S@(hYKdVaQ;QAVU(^J!?d8M)E`9i?hz
z!_0;KF3%?jGf6y?gWd7fQ-GD%s0q?p#VoaG5l+54P-HgM5^<x@tZ-=nV11DbAqhRG
zHfh0SuQU4VpY8c<+;Y9v;`<%*SqOs`y+AVRg&R2zn8m3vNW=~2fan1`$-3vQc<6~}
z0dYeTxo#>t>z^LCmXAm0q7Ht=mHy?FFeEGW_*|lu=@Y>|r)Ky}OWt9~EmRj|e#dTR
z1HyQs*v?53byU6i{=CD+&<^Vzz7LW58#%p9c99W)EO0w~I)UN&87_eV9u{V3Ug#}P
zk4mjI(a*BeW%Q-fTjdx4LOkok2BK#4x8JC_(JM7XF>8>NQGSKpaB3PxylT-a|H`Kg
z6J&N2t8GcX`z5JT>EGSPvC3$yTyBdGyL3FxtMO1Nx!cdP+X?OqIfp^5p8N4Tle$@Z
zI)5s(BjQbH-zP!Deqf3I{#S?{=9-OS7|><{O^>^w&mGd555C6bO{Qa2zV1)H##~SC
zPV|kj?B0$hH|L)jm-6&V5S_D+enWRcEfk@cpRl`R4*4q@(_=UVU)M8kKnVaheb=2~
z2}D_GML1~R4HL)tMP8Cn5+5of5+6{HS}1NFg$Pqr|AdOJsDJ<{?3pD8xq&9_4oWka
zPaqHxQb<djvKGCl>AWBbRuek6VteQ(<(4BBSqNZaxd^uh_iSBSb3Y=NkVRu!ftFJD
zYQO)%lCWW(ufD$J_l)pET7dygkQgd#Aitc3^wDlo@wW#`?5OsZ-ngnI+}F*zA&09v
z>BGlGNWFYNoSRBkmgUAjDGL7j&v%y#H`L3m_p=yU!@}o_qW6BijeiGu2#`E5@D}Bw
zjoTD-tVGroIs9^u{4m7&KnK;^hoB$3q89Az26?L&u`F%5@H{*?NbmAS`!o41@CCNs
zii6{x$O}<REPQ<bN(hd_+%gZVIkogrJ5hghHn)Qw!A~Jf_Ht{S$}nln4J{}~{WRLj
zN~eAlwcF#h1b-|PXL~}46KvWP=*EmYg1&DZB-6gIm`)vyF-O@9`&VT?rW<Caws4hw
zI`<K(%wPmyUQ9qRk_Ij}y-@;!GQgiS9t4$uovN@P^teSE>gr|u{A{Vxj`ai_<`*fC
zWPcDCZVpI^B(GTCw+u?3pbV^%?gI)7g5OXWC?bRc*irJ>A+k4a^^Us?Z$tTMvTL3E
zWIHP7yCYALY{H42&wCn2uYXX6^x;Ha#7NKdQ@LF}V0NW+D_HJ~HO*QUNdJOP&eaSm
z-eF$Z_=Mb6)o_0szK-L%UOn8U&!2qrFVv6d*`C#e+}yYoMZgD<DL#{&*&*XY6IDSi
zduw}y*j`pGDv@n+sqKth0ZNm#E%H>Jp8BtN<a(XtbalUNSt}%f%Gm;K9B)P*n}f=D
zu)6+Lv$dnLFn{`}O-LB#e*wC<lqqIe_k8{`d7g!TNeJrf_z<R-u*iw<puha$!_x;X
zrq@1<23vwt$B7967<d$+uY=6n*L^L40~)u>0yp62(i6#<E&yYnc?pSvEr(~bYV-u<
zchuu6PDxW%+SC1jm@oTT=?XK$*Y~E<ldTjh*{U{oFrE7)9i{}jGUUx|DGiP2Q^$@T
zrs^fS$5iYqO&#7w87LA^c^y-Wezzq~F@&e3+WjUr=*et-L|lqkdy`k@i(X#xeyjI9
z!XZhoJL3Cl^4g29TlGPta(Hbejq*&Qh%jCgQr;Wr&u9I+8J%9a@Y5g>W!-EzV5B=s
z4~%C|GySxre@^XI1LZlMsKK-V$?zwRmK;hJnI>kJT#(^&12tvzm?MKBT~%J2kiMe%
z{%u8mQt9~Hl#vqkg;LIfNcYnmhw=5S*ZTcwySLf;{TsIJ({)`)GfeGHuv&0l{osbC
z4?Vmk6EOvi(at-~iF@;xoPQa6Hw&s9nU;|@SFi>Qa-JIya&H0SkTjfQJ@IhG1kVn3
zv%tzs@c_}Sr#UOFoIVGTf~V(!!<yHumy>CJ^nzW8JNW@o6$W;U=`n8oBP(;2bZkNz
zk)bV+Z7en)$H)+lWhU+};Sy^QjM%Uq6r?ZU1Cw#kD~Upklkg$sgE2$7L9?_HuI+P%
za|Sc=?0Zu_`N9aR$s=n|L?_ub9nY8UgA-`xWTfL&VJNxiA=AT)MmA_EE+8y?f^w3g
z3%Uu!oNdZh;QF@jgl5K#mml)3N!kcWD+^oFxoehi0pDWg-)9okXeoA_3N=VNu@wNq
zPC}2wRD^Nh^a<FbyH~5;#J`7NVA<jM3p4VJ=f;hvJwL|{t5)0mjLCmUfbaMVNRb2x
zGa&E!nvli31xd%M2g?`i`2K>d7M27^TVrAf-R0??EkVOt1k*Dr@D7!^6tqh)5TaS?
z_K4qounC?Lh%1_Dxk#~&hTkw+t%A>e0)GPk5hk2Xm4quM4SB1cwDB*b=`7ANStXl8
z?q2=+`U>wK@7dVd(Qm1BHH1*Zoq#n>O#G=7SQ^jonm;n(W5yK&o@*#g9PAcqF_D9w
z<ki=DYO1>cK5JmhrfPDvqbhh+x0d>q7VAi8ArS)hilgh!{US`5dbatqsIEkX1<E%c
zLr`B6g(L-UJqiVTb4dIwAOGm<3u5A6J~amSNc0YD{~GU_XJcny-R`U*k3Yk{y2if3
zZORv>6N0oh<1~C=9GZ!2iNQ1HKEgk!JJ8OCpc_OmG#nT&^>Az#9yq<m&gffTh(JJN
z$D&l9NRP)~O_eEmdvAGqI(!T9AZw9*D!$ihg_A7_br|n)Vr+FT;5#+G2R@Gk`5g2X
z8XT`?B|b?!0h)%bqVdfYMmNh}Lg!^^VmaZiE1k*?Q;w?j;mnnMdfH#iO_MHtzdNmR
ze!R?M2B|}D-r;v~Xz;~Pb)m;S!M(^FdxCpBZWyly{(Sk=Z|H5UwWa5q-CkD3ajVu0
zD--|(SM&p>IA5Ki+wtS0#f5zXWk#52j)63@q=Z1e;D{q5MBPuL=SY`%d@^XEXNkX-
zfs<Z=2je1#MZOT3PMhgvSv-E=e98S<7&87g*3)x<_=VWb;bkgA?5Wgk{@c#b5Wl@A
zEe=l>4(7SUI*lIRNN_C#5}1*+XgLb9+==t&CKA;@H}Nb#z#%EG9wuId;YqmY_X?cf
z=3Cg7y9ILn7)zORsxPDK?#5JqGv<brl2f$WU%V!*%yrEA>aR}Y`|049HEoqOY{;EB
zWT?%MIeuf<-_v(sNz1@#s`YV2uH3fP@&3qlWtmZRme*a>?RlefMozbee$Ol+ySCvt
z_jH{!-|p1<^S-s|u=jkG!qn@^mi^{!XK6bT*WZm@+nO+dn~I~X)CNC>xzQ=4*ajXq
zoZ8_FDzVXdD(eE4tqgZvCme9>c+Az^32VNwoHE+|j+*dLlxuo&$)>SbsnFb+{!>0s
zxD5Il{7Hn>DKAt~_;W&Af-v@5vC5i5+RI;69SOPK*iu8fV_Cp(NS>5uAypQZgu%8i
zSPW`HH~~`x8OymcO2D;qNOW_$4s3H4L~IjmT%nhigE1@y=8mLK(IcdZlBZ#>4{Jya
z2HXL-{bNS|Tc5C>M8XB94mz$tBCd+P&J<J8k{y9sWX>L+W~a{&6+CKy6cBs~U$S9O
zAk$a^hU_L~ozt)OBpd{tYSo@lI%6S<eO#7kiBjGI+J~Hw1eebP(+9&?5-dMK<*Nxz
zd)I-4s-#MwZm7=4?0l!sn35x2+zvAlm+PXS*sOsBTeM22V*xKJ?of(T^g#k85Hbx0
zi8K(yf=0bBNLrOxy)h}gP|(M+fsMx>6N1*Rp-?a{i=rWKJOaTIjiosVfnb0Xm*5RF
z2%%U8%sx?rx*4rWABl`gvr6M&!8(cAsj>i!HnBjRa2$(pGQ9)T9Jn10h}Nl9Vr5}7
zsjHvX4q;g44Qc&A=0#J&1dav*H;ug96IL;lB@A~7-&H+uC*OU1a7W)^hM4bUWJm`U
z9|rEp0w{=ABmjX_5Td~QrB^Qy!C&bie=BL8&A%2{J7?!|uO$p|iCj!SAZW|Y7Ks@0
zWm_3UA%I-UPmJ-=gcW<Vqv{aozRD#OiV`B?tR1r3^)gY?37TV35ER*A7tI9O1%KLD
zBe^8(3-)hwg|eAij|$|)nqv)C2pYwGg<@&+sv7E;%x`^_<8D|Db&@#I)-W`tdMr~u
z3*IHWg_yKN6~Kj_a0?c;N)Y4eSB*{zz=Ll|kQh!pB!#Nuz!g$SGWG9}S&n?OO65O5
z1sqt*G{6ypyTW>LEwgOfkwvfzGMLK3YorTiR3^wfE&R(vt2WRo&@PV_=_M47c-aM@
zh4#iCq<-ehAkDIqUg)jON{nz&<IHmyolA7qUzoSw6gc6uJp&C>Pe!CId^`x(u_Hqf
zJj{+61_EO%bxyQRo808wLxHQ`l%9U`q+Lyo3$ePw`Y4%c8E1R`?#;?B#2E>`9fOV0
zqKs!L8=x`y7@ONVgVYSz&P%bOxUqPem?zcn4$f%L$1!5f6w%RQ`x!DcH^{@b{X{tn
z$n+SjMkO><eRuZ8Q@3gw0I>O9{3Af_N8<(|Ki!g78XdfXPHH|M<v*nbh9n90kjU&A
zP2klm;mSW2it!zTpz56%3v_D94uhZ#sdJK8FDI;5A{OS1=cWy<Bm^Bzk9zFq8Vy^&
zYZa1feKJp_pL(he_Kc|3ll6>+@J6R#blzrVl@QarT3HJm*9t(ljOPkKdvxcD5L7y~
zp-WpJ<0~$N9PQC9PtpD^BFNr=2CpseAtYWiBnC5p)x;2Hl<sOKs(XeN05Nj}qpm%H
zCQd!1FtQP=imk&!$|D$E<9z9cR;jy_!}q8W0EyNFa_x)}<Qoavz63F9sk5`8bKk&U
znZTBZ9=V>b*ViY6q79REnJPi3Bpw&+fr3}>@XE<d{JSm5_M@SnRvoYcGTfRre~3DL
z1Dd!Z%t%g*d_}DgqCTyAzof@gIsB_ux8INR;FG_8njG-ZBjc<<m1KJWaErD=EvO@l
zvJz(gimE@+v1AimLIy8w6@a3HCmr5HG-$){0p(m@2-do7yN=9a6F>RO??ls9bCR=&
z>aD+iM=MlU+*Nu$$8R&={So%))OCMT6Cb`gu*^_aI|wl=AVJJh009s;NE>LAOdV*6
zVMb~)`djJ6Jm&#6U9Mc0tf2x}c3<(;bCiw>P&2x*HbV5L2GkWCxy*iFVSj^rI*$f!
zir6qD)B$tHm>bqU$7mvq6+4}muS^R5$`ME{+9Vd$LRy^oiO(Z8$l~%kky%bUfS`r2
z)t1&T=zDpPF$J5-IPs4$CtsSl9g&6#Jk~TTYk_^0&C(q2{mK!HRsq&n2DR}!3|a#i
zvPL2SjfP@RrDk)24my)Y2*PK1{?~&(W2(4a<SIZA1btGp8HJ@-So8WsOBe#dQUj^u
zDsNzWRln|%u?iKzkfyZh@}_ADfp~4=MJ|qHDH5a&FSZ|kzG|aP`ZSn)EDQFZ7H|#a
zd5y$NtK3DGC4Fv7wk93yJ-4UEb3^`f%vED0jo%G)@jER|fMrPfg4nYmOA&b1bp_VQ
z9^@`fq(W=Ffdtk42}{PdovcYaBTeUKGM4q33ovBu-86&9s}77Eu+_#~kR^5?WF6}!
z;_F3^Mblo248<(zW13dx;ckg~T$(Kj$>$*?GWyO8s_SQX6csA5C?>uNwKnH1nU)M&
z+L4c@_eYt5-ddqitZAflf4&5!BOBSuu?#{q4Av+pKE5mI2M)M;dTsR^*jR#Gxf@3x
z%mqEC=$9=*$6*-W_fD(z!z8|<^9F6E2N*Yo65`wf9Xc=9tZ#FES-24=S+CxW0ToKj
z7vg~`6Ymq{dtq0{Iz<m^PQ@<xi?Hw`RWvOFNdRm_KqjysU)BpirSP}d1Qq@9LN*oe
zXUE-ylkl5w(a`*1uE7g&MUzE4{haFn;tqoHQDi7L8D$y;%@Zn|E&B*(S;|GRDKaH$
z+tAiZ;OQux6DgZhdnEF-^6l;}T0=*IPk}T`9p12*1&#-9xH*dx@tf^imiRA0$S)ZQ
z$zp{iXi8E$QuvwvJbb4?P&D@uoAd7Ger=shQTcWppau1mb(y=CshGC*yYyV5-RttD
z+s8k9>ri)h6q9mG(V4?(F+L>VSH)(hHiti@xoe;kR>Y^O5zLH8>H3#gb$OxtJJa9J
z`L#|CNC9uiGReTVF;>P)=QyQCn$qMrK}=(CpEhSU-!;H)>!%%ju62zkRws{*aO-A2
zt=zfoEZJlhU5QrON2`^<E^)WK!Zd*(PUNe?2pqg+;D&wNuTAp@WLqs+fEwaO85<g9
z9*{=Sc4?mLgA)?v44)OOzY{q!lMXsP1rB92O(cFuI-t2a_E$2+|H8qM<O&Co)89zi
zVG-oo%r97m;P>;Efwze+O(@I!tSsJjuhjml`rAcXhW(X0B>%P{+sw0Fv9HM;X@S>J
z%uMx3vp{`H)&yCX>}1&jlU4pmzJ*<)4*(0|`sQ)scQS+Ty=poV?F;N8JV(^gAZ#TV
zo0OXfV;~s@ld$@?5d}MgC~~+k4s4_sEb)j`?Cas24@WZ!))RcqA94iual14xOO#l=
z$)t(AE=~8Z)w9#aB%_iI(c&<BkJIs7?ByKrCaFKQJ0l)10DGfuUj{(jV4X76LbWwK
z6bHz)s+18$0o%DoJfn8yMsYi7XsEROG*%{=tke@*bqkNJ_PndTEV+Sg!l&U*7Iubt
zwZYanGm;B3H~;}B2jJl18TucwErH)g%aO3uHj~k-R_6jVKJ7Xyw29&^m0+Qy^QZnj
zPK5o{M9n&=v*e5wDn<Pc#nCSDGHW%$Em6$rqE8vpM6fOKqR5gRT{*bUKP3FQ(Xa8q
z@e(qyxBFM0u^*6xB(RQXOpvi6+!(;6sb9Q7<pP2!k$F|(DdU0|HY7pRQxcSG$(XAB
zYa{_is|60`p|q9LI~=4n6J{XA&T;(cf+X-{VvUA$J%}S?x@g?2VcCY+ye<QVzFwOO
z?KLFv+=#=RDSqDiAkCgcjCfd_wqm--m<YT?5=E+G-IRT`Dp}iAcwQn=p=j16Ve6HL
zK=j-to(7|;crMV~=A&33%(!JKvUJvpcc}|g%fn8eiV^T^9%gpV+q;55(?VQ^d<NGd
z0u4omr_EFz=4vM+UpVRG)j6%-KF83((!$ci5XmuXicogt#hNiltC0jJA4Lm1WUn01
zYho}IP$#pkTGv(5kKl}@Mhgww!G}+)HZ^U9PPUi6T?2>CygG;DV#+wiIC4G)ps-ZQ
z{DI};GOqg&k75bnyTh|Xo$di=-~d{Wrm>5$+t}vLFF%{5sajq5yOlfoiCxwQ`AT)T
zd_>{jP^}$-wr593sZ`KqX@1E26tNTCK1+Q0;SHTIV&e4SO-O~Qgw)GVt#OvpL?O?+
zPp;dzkqY|^a3~ZDtKA2|^#T;Q)XXE+M1fbc_9;cuO_gfID$XgoIlop)kiF<oDlw01
zi9F>36DH}^P^Vdgo!v@cOSV^Rp`pu2xhd4E(V&?vXx{J0!eB+*guZKv)TGnUQr75W
z+NI>19anP_{tLNvA)+b+Bg|TB?Wy;w6l&?W-u@zi+9uh4CF)<?C(6iTjf+TO_|hUf
z=@%;+6^o5-q>XV=sNF*M9j0Ry#*8mY6kLbKjMoK)!EXdJMf${G2mJfGd?WcpF9EU3
zP~?e1526}fXJXJ8e>tsWTu~TG3Oq157A@K%mVG;*>;$O&(Xm{)V;xov8Q_dUxR5@=
zurq2nUP0DEsg~*cr|?{Eu=EcG?I;)T;(<u>G|FwCo?`JFB$MvuTsRm>qf|PahVD}S
zY(x6BC~pg^X?~AG@T-D1Rt!iPk;p3#+SI<P>?Cg!zRqsoeUT*6N!vFkt$d}>K)G4&
z<54g<3%xP`#RV?jIStPzy%r*Kp@5z#KdF}})}W5zuxmq+IsH}agMgvSNbG0V`1|(w
z={iCQI%dMHGl^y&4Xh%1!ZOsaj+?k_*vNQ8`_TJv$YAqi(c;H8=-bzI;Wv#^w>g}g
zlTouc`)@~Du%-c!;|Vg;Snl(RKv(H#Fx38F6feZ}R`XTuIGiBJ2#=qx^YwJ6&h&M2
zYx#gZ$gYsRTK<7?B}22EJ!&B9r1~#0<e!YX@Ou8ul5{UT+hSP<-O|QKyk5&|L-DI0
z<WlFdRpA4Fa5WlJ9x%uJCz8AnU)C)evaDW*wdISn0a^Itg)#Q8pcj6;n?8=*Jjo~P
zHoCtd)k(o{(I5&8RCSPEcroO}lj`c3HYvti^zcqKjF0INgX0Ybv|?O&{G;j!VdGXc
zG3~AR_)TG$FjYkh>4;Oj2+|kyPuy;(6mZw20EM*X4<^>{b7#mLgubv<n3xPZvvI(9
zUha%jzCK`$A;H>I&^MIT9mjpwy3~RXBL9cIdy0`gd;>h+I&IswZQHhO+qP}nwryLd
zZQI>v`#;FeOeUFR?>4(hT~zA6D!+Q)=lOo>uN2*xLG>FCUrUBc9u^QitGeyLZrKT=
zUSr#nZ(Y^PsSXEPO7PLhpGnG|gb&%Nw^1uPN>q3F07@HMh6YWfY9{P-F+uFkA>TEn
zo=dvg0SvgxjjC79Ms`jc9JY{-8SOpmOzwd!j_++kGLgcPx4Pfyb8Y^7ho`ItSa{Q{
z{W3}Djdxic59{mc=hau`#uK>sR*Qh6$<1=2)kbJjm>mY1uU2EczFNozaZ8uJe|FKD
zT<&YbJr*jNRCoh-_gCYdNIbz{)Ofhd%(X}J9pJB57-Ynw!5EUistoQv21DaIjkc-l
zFvQ*)h3bY|&LMN%r$w#q#P%%danV3nwna0f|0t6#0Sc0MAF|jexs(1UxN2t>DI1ZA
zC&i^e&&~RmZ@3AaP83He0kHrI<%v=9&o=W*@+Fs#1wsis&PfNobjZ$fyNkP>9M{`Z
zybbv|3_fX$GEqi-%durxk#doDllFahPoZWHQqOTb2U%k!g&h8Ao#QybNL>WcpHp*b
zZT*bP*x#AxG+PJ^v7P~>Lq7;}=?=N()oR9ugSBT1O{~`h=^28%V+SU#*yd5U0~kU)
zMD1yl9=1Cg6dA!&yE1SzH+NDlSZqi5=l0r_s-+H2`5_?<xv2p9iW!cxaHX1oS>O+j
zcz!Oj1!)gV6NqEz)O(HFhdXEZ_Z#P)?f)cWairJc*q2X1`~%(LadsSoqb?504dE9C
zVru*|;{G{?XU~=SC_`dz^I8x}9p{CYV@xjsyGK(e`vd(Ynmizd<$RuYrr}a>&hH=i
zC`i}iuC2BjcUX*Q{ZDmVxI?o!u7Prc@PS>sMY0xpT!-8t^RQ`#!J&#@Ogv8XYt}*t
z#X?+r=T_rPEfjF<Gn84M3h&huC2gH%3e?KCPkst*(s*ySXDJ}-0=*Q*b+7f_?CVu0
zef}GTOf8-I2p|GnOv;6OR0S0@UmnL~qKwMpOK4hEK<BxAd4C>hZ-)It2m2e010otS
zNrt8QyzO&$9R%G59@?#tEewGq3TVRY?JR-caGY*YwxGHv{v^{xAP@S6yLYD?EGglN
zw9<KxlVnY+={9>e!e~@{Hk9y2WT<{P?>JHhSv2c=@kMDV4{;wpf}Q}AB?7+^DyQb(
zo6Br!d(_u^;_TOR4uel=Ynqf;zld}%+d5krEFkQzV~aifmkL5f^!b)?Lw3#5P+ggP
zX0bKi4-DRDw(%<BS&L2i5xd8{)gaCML@K#|Q5d#2?Vt#N%zoONqH@36qb09FevRbV
zkU7;g9fc;BaxU!&Ly`a7B?K6rZ#rO|zN^2v=DBiNHw>12EjpLrJK4N42)`xTzr$V1
z;edTBMGPn1e&ueY`3Sc|bHh$pq}Nyra95E~2f}buJj>6wn(BEwc|~FFK-{;d2NppD
z;`&fk{gI%FK?84uASx*FZN&)Uiv|h`j=o7dqhdVyk<N(5H&aRE=KK&AJ8b1z|DU`N
z=y(%elHC>VK!E@RjByTTL-wNRMP_LlQRYP(Y6<hn@Bn)ZDI~|BiG^@aqRV1YnEqEI
z;^2xwK3RQV6eIF;X2uJeeS+3Ej7g|y_|g9L<}gGeN=J25e}bX#*9mo4<G>9{k{Tj^
z7OgwnDm=;Mo!V6s9+R{V-coWFGGOZsaYVVHs##-HJ->6&kT%Pb`6e4W5fn+iyI|8O
zmF*x5E}|g02u9M#a2&#5VZ+MHqaX$ad54i;J;6`0BkXbfvpbjo<AK-)EbTKQTkDoR
zOzNo2CMv=j2dqrB{>4_djA%QNzZJ^2^4suaS1O2(`JGVe#!SqCSf)f6|6a#Q;l*{4
z6%l>_&dDJlL#JDT!CbF6>YZ+Lr(bloUT$|A{E`KTNtkkLkQkIPqbBtC3q~>q73rZd
z6{MYvol&#Jx3<_Q>IKY}cOd?rBPJB4ykFuoca9BRz^*bi7%6;;C|H@tuRD}2eUo-|
znU{~`SeIwju!|Z{QN+Kh<N(_raJ>JukBFw&9U(pAFE18O&%N$NL8AcO#1Z_6Opj(y
z`a{2efxjD4!yJDRQwDbD|M`(r3fQMwYfENI^n2$^t)n8g8{zr^gd!`!pDBCu&fE-{
zE^?fmX*N$r$7`y1X{?6%O5A%UYf?(3ZAD<@eM&_wP~M2gFg0>BF$>)%<aG<~^FUk`
zjD5EYoi)#;1My*h9NrLY_-ss7c?L7GOV*#A8lRbvyl*y-rZ^B^$5*@y6Y`j^iFNOR
z?I<G-lP)eA_YZN|_90o`O2hzEdhN4){<(dAj{O?+wAZ7_URs{u02~u1$*<@u?pm#V
z+nAq8>5qfa^lj=Uh(+l8^mm#P&6*)YL)0Mp@XiM`Xcph556)A<Gw`*HERY>RS#lf>
zYs>8%BAu%+#pqcg7%i<TqC`^PrT(+Qkc?WtUUmXfy*g@*RyVJ5Wr!lrD#HhuqEWPb
z!gspi6F+hO=ylc&8eXmM^Us5q+xCdY>lPdmM6IC)9-mNEQrOAyfDghsG5R()VHbL2
zEbcH(0Eee@+uU5eg<9NAd2T2vnqX5ELCekSHc#i;p2&J&lX#^a)hq0UEpPS1{Jj6k
z#ePhY+VjFXCwqRUIW)`#E~d5M;5SY(M}s@mcnP_6WOIi#Z=P$SPlPZd4@dHCRy;h{
zc+>z$)7DMBJ|*n<ZzW|+$)_CaM+S;|$afFMX;_g!^oQ{RpUEgk+kAxf$Q)^OX7P?5
zLz`5scAA)ew>csz@jrL4yn+n7_?|!jGhtR&H$I>DlGqeS*})AJL-tG<II%?rw4|pG
zbqo^9met<G^^NWnW=wRdbkc;=r8hoSTe)j)#xK{{tjGk(I^n><b(?$2@dZo)Y8;}z
zB!S``-yrfEv<g&<CINoL@IMLgZwKfyG;fH6J;$<!2>UG}Ch^(N`kVut3}5^$gu49d
zcb*;5eaUf%wg|f!$5LS8RNmAP%mtS$304{Wt+z)Vj73q3<~Z=9^MYFnq838@m_nCW
zL97KxUA{=0y%E~I9DPq;P9>Zm86p*G@7Nu2xU@9dn!W!(oOENj{6<xx^Ud`s+<qhv
zlf+`=WZ7(gJkamo!fqaus@U?x_5F0YqY54nr_~8KKta=)uCRoof2sD7ZPVcU)c&dd
zrYnVVdclh~(d7?#Xh=6IRAvclpgZ)ttCvy*qZ_0*=K0shgKb;x=x>lN)n1OBcE5w)
zigLn-OuCDs=zf>jbe~EB>Vdc?!~ELH#T_^&`UR%g=Pegp{_%*M>Wy!6vwf|@x$WVV
zgL2*M6Z@E>Y)Ncdj3{PlgY0(4(v+EO-Pc<IS6BFHfy4E{!&kM}4Wq8+>ro-sKk)tx
zvm&I3l)qhZsknJ)bAJ7TZa>jp4=)cZuLRTgEm*&)XPELxJK&pb3h+_aXCH!xhHXu!
zvUcT`4W@1SksUQ(uR^TXa5UsPEo$&r68cU351jh-{Iz2MvR+~Z>jSzW1&?u<`;6nS
zlm6A8h#H^t-D$XIQm>repYktLCfz10x2zI{X+;U}0FZy-1o4DfzoVIy+_J4a0*O}!
ziL3DB6<m(x#JY@v)deuHY0J|6#ioL!Ex61T8TAspBzFjnZsmNjaM`x;RIepOAt9*Z
zv6WCPI^{sodT{nAmI92fBWGNP%N=JUVWvi>X6?qyadYGuIgS3&&yL6RE#icE<wt3m
z3NOXtAkrTuOzTv}<Hc3mt%g5Zfr=^wS3RbBzxolV3lmJBGK=bX5OFJS5$lBN>NGmg
zW>yBEA9NAL;G6n;S_Xp0jr+Qd2jnqEvrK+V1jg2~rAbzvA&hX6Qq2lny5rWdf2`tw
zVC5<~s#fNtUn0b%_P@UrXP2w=XgM@<f@L;egK8wJK5m~yf1LpHF!9o>NVpgLO4Zq~
zhGQ#=?iQWoSwhMP6_=RO?ZaOg>x3Q6{^HU^lR&zdO}fkJ-;c)->;Y3Xo*heuo7Yj5
zPUXbgMxE{~k_l#UdF9d;q|%@T4trOr=bK$x`n*3M@p~U%9$VhTTU>`wx*x7gN2LKY
z>3EZmJxQYD+__}whqj@Uf+K|})^fwV4hz>Z#{4tvuAJ^~JEsWW$!_+at=CQP8+HeU
zl(lSupdCw4npE-w>ZWbS9Cyk|U$qzl84QX*8X%gNBnzlV-b0pAw4#F<HwS#qUrt9r
zgeSpU2ft12&~T@Ju0)fZCKRF37yS(Eu=sKs&5{g+^wt#Xp@nJsQ77nETm(cCRSm>Y
zm)JBz#P=CEwv4DaHylltF<D-GZQu`AK3AU#z%R`lVD{lWHSP~q#f)-aC$5wo8Hje4
zJm~8hR;G0j28^*DWbc-++m-x<lJb}!X>NVt+zJ(Z>fE1vZdEKy)}}tSL07hEZ&mr5
zn^zLvn7bA{IuS4=??}V4RP{22^{+0kW8g5MH3I20XRa#iCB1`M8vCT{Icnb#MZPL-
zY}nvQqeu?P?GIOe_Y6vJxGWA`G8hQl`v!=lprx@jvIf7r=FfJ^PaBjeK4z0ETRx>%
z1K2wBMm}h4R3QYZuvFPISkwo$RH<O&yg98z0_V(>?oH*)rIHNHnrk%4c_7NyA2@7+
zIvmJamV@*+3ts7uct`B3QgFIomQ?<%F1}L&Xk3=sm`uH(vsx={u{bxewy3Q2Drjw%
z(%33WwX|S=nx6Gt4bVMNSby5CYv3h_qLa%Xe&ubCp{3^jXRqk(VrJ~l!@%|UDacQ?
zGEmI>B*W{&;qU_k65hk(sJU@NyT^VQ!Ex%s8La2(lGmGP4Vg`QWm4(dU2U7+I}~Ea
zAhcW?h}@kr*jXt+7}30Fd+m4ho5RHDQW}-R2A4s_GBz<W<$X)WPNPuqKbn%V`6p(0
z83#^AigT#6v@nomj$cHhrWDCFAB#-HQ0KwMFQ%mV-!P>i9=oa2HW@rBWT=gc=~~Ac
zR3p!Kkaj*&{8iBTk728&66|#b@J<JJZX0F?Itdal0t#D7mSwVyNY-23#hOf`<^$7%
zHP(-ew>~W#nPQ7rW9(aCn+?t#Fa<aJxXW3FOrULijobWhK)NH#9wJ-H0Uv=T@I-W3
zIReYFKzEsW{Gbe+dF?ZdvKDu&<41idx*w!f;Ra{}b>wkf`D;&%m8w7C5HPyCN*jyN
zb}+q%#|41;1}gUx8ric^3mV}_rwP_A{L^3k(C|nzMCi!tbL{#o)X25QmtlACvHnd|
z@4z@wVcxg$qoo(z<1eFUU<AZ)oG#m#W=_9}ExO;tmIG$B7N9~pa=v(g#}Ng|3#m6z
zq$`>JH8@~(>%NbNzYrq;{3m6K48R#QoZs=0QDP{Z;azeSMmld2VHoJO!q{XSVF8t-
zj$wby0(z`7UYQIq87nOdf+6(eD2B_FeAMu!_?WWH7R_BeIrtA?{JpwLRwL^WY|Hw|
zO!IX+9-Un<JWxx<*{A1^wb9MNu9=*BDH@Q<L8{U6;bz=7`2u2HT68PZ|Ir<i`*nwS
z&XUJhPy`=_6|U_I5-S9fv#k?cVuaJ94rST@)7Q$Rey*=G{-5a~L?%c>F$3h9W60z4
z&Ax)`4MY!>IaD}cob>-**it|P(v1z(T#s;UXQqp=YH*kFW%}>#kf9(EHZ&=;A}lOj
zZQy@%hct_M^_O56jcT$rQfq^dlwX@C>&Vh5+k~%XWTatBl4J=JZl_?Re@|00n}wrO
z<yyL6FjO{9un2pHnS;W{NDLPFt6yFBsSGMk+-Ip-aAhjfX>hV<ah>M{*k{85(Rh){
zAX_jsD7-N}>m&cOE}G-F<epebXncbdl(50(X!@?LQB67E9*<%=1`({GZ&zM%C>?^H
zxS;2@FO`Nj&m=n(FuHIn#hsdO48#TiQr?zVJX$zUZ*?exBPTk|A0;j`TYk-7d)Ljb
zg6S2&fSv1+3mWfE?O+c{VI(zuNpJZF8i?v&=d0;|0(r&F3Gw<utWf_GGjwO&8CciK
zu2NCJOtAWvr#u-S<o5c*gN_3Hq|VGtd?%bfH>q!_OO9CN_%-2uaOdBdY?DwVaY4$R
z>(ug=h$B)*QmMq^-F@&w@gkPIi2%!EdsAThU{v3InQUH<hC6C)zT7MOZk&WY?Sbm1
z4QnB}LwQ_+s2StQQR;5^TDpSj5tm*LRmd<g)X$b))Zna!-8h$!PpKaRX(pQ%r*yu<
z+h5^)yUK2iqlGqQq{x!EKuhy()ylo&j<x@!Z^yk7mu60mAqRK9<@;LeW1zK@JSj$j
zNRb?&C?@M7XJvVt{B0KK1V1S{gId7SzsMZk-3IO3*%;gZcw9sR<qk|Q6$!!hW?h?`
z0fN|L^BM4D<t2I?GSC+*La;r?Oht7cxCcW(%8Zeu2D%hLi`y3azgj~GfI(Fg42f_5
zH#O86@>bY4YCNwBtn89cXkLU&t6u=Zk;BGUD6W#tTmy<#r@q7fCzp0_Xk_%gZUEu(
zT<!Pr)tOFPfSEr8t&^qp)~Z|40*T(BvH2Zjkc~2iETLv#1D)%n0wj30iZ6D=2giT6
zm~Zam1Lf_)A>LnK5Q?>++=XIg<W>E8727ZTf!pZ~99Rfe??;w*3I+mFo+1b;h{`ml
zJ-p&ia*&pO$A5)}s?hPG^2lD~wRmkHr4jbfl6)=*%eeCaR(X>ANnIItYl=jx)`aJ0
zMgq*z3Ix~>yS&;m=RU}-h<LG3|7|XIvwXGa?i^|Ep6snfp4lU!SpG?rpg$cp-g8*c
zS)qiAM$Of!MLQZ?yxI5kU&F50ieA1%E7#hRLI84^TEd`_W%EuoNA6D54HkV!&H6jf
zzvH;oM;v*%$>tf2a3L6kJ9NDbH~fV~T<ZBmB?*3v7ji(2F-W;Y9sTjGp7}qrJIUec
zN2IwmK_zj~A9q1`{F>PEn1kYOe?aI`PSl@5k+)QXFsl@FZ4hEjBCTL;tHHPt4ly4G
zw1lw2+(x3Dwhe6Jg!xrPKWi)6yR9dkHwG1eCsQZufm}#Qjt$mhInUzaof1coDzG$*
zGngpkS&A8>XaW9VyvBKo3WsOX+%AU4fHl*|GzZQuuGuy)Y>npC9?|C@tb<51kx}(n
z<V~DQ!NLcK>UmmqyVcjR(2B)|QF&Vt7Wb`wk_KK`8V`^sFHiFO^X27x3Sh`o=sBD_
zH`<L)SGux>k$*Xsp3q}`(?D?d{<Hb|XMloY@f$1W$XRos!VsHaAP?DiQUC=^03g_9
zvHe<Z|52l-@bY$jp8&?wc`}&qOR>i7aC)6Dr{{BUXq@&50@0DpJ>->F=YG+oHVrxm
zQ8W%Re3it{oTvvC4f7zuod><8d{?oAzF}bCH3CpUL+y8}SI_O=64&TMXyB;Hjc|Yt
zJ)yenv}vLf?Y~*^wPE<^*yQ8+>aau^G^CaYn42+qia$MZQ~{e4zm_*><R#puf~lH%
z?2;IXF0r{^iV67+KOJ3ZkO7GE!bJw!)owdiKcfVoNcI}Dd%f~gVcIchQgvx><juLj
zy284^Im2Q`qv064P?$oMB)N!88^RtJU(^OK<=3OF@k<IZ0{qk1@xiPQRtjM^WJ0Dm
zF(eik70D&>R*km$B&uv>GI5ON&i;wDjTl|tadsJbQ=rA0EHqfZG-@&ge$5?*WN;tp
z^^$bV<n-o$M<`pf?FC0D%hP+c!xu0mOu%C9@uTz#t3Cw|^E>qmoyU9gg*J71OGCqL
ztIg5l_^|OX(?`qW@vC2Cf6hQO*Te=C*qs0VeX8c}Yju3QT0B&W$-C+LT~EPz+=tNr
zo5WYGn`x)*dE{%ag>6F_klN?d__09);CLeRMdj?f-DMt4#7PhByv-i&aFom0@jDy*
z`js}7)?56)z&8?od;cSRb2CxLfc$%yPypqw6QLTlll=D=!6>6+rk$h7<NMqGqW}C!
z^-x`cH2!@7^%{ibj(*dl6oXavcc~1MZ<a34{mA#eVA7<T``hYwX$8J+inVk!V`JsA
zsU7cT(Cem03SRU2OE1+9d(2MNEU%jTQ{1+fEg%)O*LGW7o9CzK^NNc~9l(8G`0#-T
zH~gBK`m8;`Hf9QDTwJ2j{xSbY`9d~@awzhHhn^gbVO1&4NNN_Kz+VM}4DCMYJ_etJ
zvsarwb5*wX(2W1p5<M^hOUsB1{8W2nRfG(L3P6dGxB_W1ZlC^4CdX|YFLONyoZp7K
zeT_p35Q&;kqfzg#pIK#rS;sx){}Th}P>k#Je`nxS67L#fst-^K+8)>|$pjmJKw4dm
zi?XV=GNaO~AS|zqx+Ef2dYwp?ew{yTPR(FJHvdZWp+<5pH)+52%@hZ1oK^@t#bLW%
z_ek&|uYc-84Ae@NINQxyWU|dR-T%?N@sO1LuiT9;6GiE_=8Q=?D>P(NdOC`&wl6;o
zbZV~mb*@jd=fm%D7-YNJ-ah<Y?5i0Izw!3a8<BYJ%h&=|mL3if5cwBMW7@u2-9;lI
z_XUkR^pI{%vpS7UZ7YLDEsa#utVC^%RM!nhEzEcT9e{Af>f4i$cZa&Cr|tXs*TVtX
z+i3o*eA34%xulSaUg_N~ATQgUuu1CPM!E9($tv3{V*l@E4kAC59qdX?Si&e9zxUj5
z+rTkX9Offld)tm)ue&5aA`?2Mm1eZ)jP0~Hl04!{yCYgs)&Dm<nl?s1q+x^q93bte
z=X@*m=i@T-`m{e>=lxaU<EQ?Y<>c;Kb7-NR_otd{fA_DQEOT%B&2JO+VP~n|!pqb0
zcdHT|czXN74DD}6*yJ<~&Gdmy%fnYu$%qJ$wuT;`0Y4kbE(;hFdf{p2YOSVpTPr<?
z4%)PdzL24yG&l3g-*$=G&Wfl}F!^VZ!PQn6GID78oF4e$u=;o5KG9N{!N`;@y~8yB
ze*rMU>bO3P=cR>Erg~oPN8jfDMip4b$Rb?0qAqvK)!9CCJum*&e2IN~M?#$hMK5kL
z>g@@pI+S27u9vh#b6v5kI-GFpY5aYkaLwIs4PK{`hnqp8WUs4}Q!O(9(*Ru-RCHLT
z&Bk9S_{2!NB1eVbpGCOr)mu8_J?=Z>7DTT*K8Yj1?O>O@m%@ZGg_o4pVC-t-T&9M7
z;r3y%S;;K+y2%5LrN$b1xK%aDxeK}mxcq+wM5US}hP1z**j}$BUJqWS;&`3s_+^nt
zvs@1)KGFLMbk}6!&-}TPTZKN6%$<MR58biu_i}lTb~&iE*IhEGP;U%XH{_i1y#5#A
z<$>VJoMd>0tTF7OMJZWrX?8rOr(0^fbTr<_w7p@cfW-zek7iE}iKJ?#bF?(x^XlsP
zk__o2B0+zuc@}IJy7}5yh^?R;0d;<$I~^{!Qo!^AYa@EeWT<LnOo3%+L6|LO^PyO0
z+|s3VG}-0SQ0fr>8Yr70IydxVWs?J1JVAA)gepikKMeS=ucD-}l2X)<(IsTnX5Q-8
z*%WE6{nj?A%KDu6+OK&%cixt8_V8z?X?fl=fkw8J{IMWY)2HMwIe!&9E|{R`_2gVn
zP02_-=aR65wpw9AaX}d3{UY=M5fEd|!}NKBg<&xv(M)a2<Bc(N7>A@7r^jY2{yTL?
zo&VpY`TtFt|CgII!Bm)g3IY|r=fDW7O>hIWddQYM@1D<ZPOnEClvtM8pNl;%daMg*
zI`M)eTA@G9v<?W{Et)CTZ&k?DTk$U`BAxqb5UF-=$As-A?;%8xOvAaXRI~60mu;l;
z@aq2h38C|e<jeGE3orbbw*gUZ0t(XF_Nsk-f(uETis<*bzKJ+1ir~TfkA4P*96(VM
z%IfR9!Z<Ohc-t;~?SQ%*Ef2;%lwQyasm5uz4Fm@>&bZg(f&kVR3@y#}VCyF!zz9Y=
z)hu+v_&{l;A!EM*>?%3B%op?=H46~UjC^5?{YBiRSWC&Bp)JUyw_w&~)Q*^nl(R&v
zf>xD-gB<1nMznqIZx(T#VtWm`Bii3rpJ&GuUq1732)Hq@2>w5M#h6%!x|jA-qp`st
zA@D)R6et)@UG2>^M+-j(JiR5KZkUGdE9AlzbKTV~h8|we*B}F7mD^;GS6i^DC_f8r
zPty*Rt~I-*DY7o;Yz{GMBY<eH+hpmP{fF;gBWt$DRS3$Y-{3%s|5xw4xv#nj>5>mv
zPoB4oiJPP0>507sZnnD60lW&X)pbUrbI%#G3FP?tFChX;O+!njy?KZj1ydAG?)Ucl
z_q4(0?<gr_%@pVAyzD?!{QzR$kFAF!sO1*_!p+IYCHq@qmS16nJ3<uuUWfCa_4_$D
zy6b+M{)_Om=Elv>xm`MadmArc6;I-U+20VH%5b22Us@iHwyz%{X#q^LIILsb1rxv-
z-8-TcZwDi~4Y~vF`2)|tV+&b+>kVW*UbnITn;Td>^^COcR{1dCQ|YmK+<md(F<8L_
z&+Z?n<70xy^10~q1tt8a=a-Ssq;fBE7WmuSHDJ&iGA$s#H=BlRF6>koKqIA<iO&l4
z5n)IU)DICTH{L-YNAVOUO$H-)zYf@8qO?NVGXpK~lgkXi;bT?D8vT>QLAR>0xsFpg
zmfSg3T<-ga4nxakMlaXrN4mQLxJC(r3954g@k}(GPDEs@ZKx=TCf|F1l(%jFX$tOQ
zQY(Ma#D<+c9`%6?#qQ+{US*bKXR<+T{NktKt$O(}p~8i032&$pE0SC;)er&!enDj4
zLls>ZUdR@M!%yU67P(L=fzl2B?&*$JY<kxIV0zAu&M+HpVw)aq4s0fr+~7RMuoP-y
z!)0Yp$h0aY3;@AcS)(I$lW;Bd)Z}xD9-Ylp2C|rXb1?xfxVA0pSYo3`#wq*&G01eo
zW0-8I&}Qyb6e0*gCd!NVi5M6r_{W5uU`-(3hvODgh=)B*&$Wn`RBUH(Jj8Ygdtw5N
z!SzKi2}XirlNCV6q+<`=1UxN80M)SIY_Wkp+rfeZL+zD`Rmc$xOEmvjT*^I|A^p8U
zuqpjwf2n}80_|aJI$}(x2S{_#2XwD!i5+4os4N=6@>ytH0qK!-b_fJv@NT6?5B^>{
zr1&f7+jR6}xWrYAO%+--_=&fS5#TA@3DijZl{-@~N2vo1qJr)SD0p8O39RugZ4XEy
z%$e_ye@=}%XDZwjYVYI`%w@rq`rL?1A2ypz)r2hl@`)s${A1886D!!RHw>XE+<L9_
zhlJqMpv#Od@NJS=GXNNfr-PA88s&z#jk8MJ4L1DSCF5W6-A3vJsDTy?gQW6hRm9(v
zQ-5NA_VN>j9Ap`7Gcsvcq2jC2C>dm9Sq(vEJlPn6MpJV5V5QDtz?PO2a}DVnq+<3h
zt|Sr^2FPND6VG#nC2lwY-!mr|Of(hT*Wb=sRp+V{2*t{z*iIT+2Ts!<)k`D^F-MA5
z3I0YfViXmhvyb4PO{e3aoelRY@Lu$@Vv-dDJQ$kki{1||mr@bi>GWuid<GqBmTasS
zkV`w(M4!Krb_14KQQ9Om7-m0NWeW_X^;6EtmxBSbEt(E(1+N*k%g2M_Qz=0s=^tuu
z#)40|CcJaL*Ajf2o*)SJ>%$h<+gED;-m|)pDh5b8l40<(IQ3@ixPtLtqo$R>aOZ)@
zQKlGZk?ev1lM4WAUWtIw2fbjkKVPEw5jF<kiYv)&&On0#+Mx-VXGZbw$WvNix<q{$
zj7zhQJWA9<*9p4!|MsccGhD{j48V!<X~I4ri9PIDRtLj0J2`ZI$)S@kw7J`8yOMXo
z*-NWbMq=Kd%eP<)!qL)TFmltU;LP&WCXc;w7?_QZ_&dt|SB6FzAM7Q^u7M)K*4E7{
zW)p@(k70$@0p|+;PaGy%8D;7OcK%ccZ1g`J0$aWC>0<~uvW4<)rO`yl7ovxFEt{!p
z+Nm}IYC>c-e;+BUtSEi(LO^YNp7?|9T;Onk05x@`w3EJr(OKXsCPtT&c4qrl)CGkP
zv4gMI{BfHH1vK~vfSEO;)@gY`Cc`}*0xwo3msYZw2xZifm~nE$yV^x`Y2)|W!Wtg4
zz)$X$zxWr{){*MPjP{r@kWo0X@W)JiDIO5INLihG#&AL<WMspruT?>|>JsS*R&;_H
z%?)h`G4U{iwPS^Xe14<C{d*sy#@%;d$e8+F25R7U!p+6TrSvKkoJ#4I2mur6WFl*b
zixf*M8_G$UyaPM&nL`K%MxUf?SA=syUT9Pp?K6rWT1ET%hL8a6!R*Ih%-$Fz77lps
zY?3Yo|7M_;*y&cFl{O7%OM%496#?mL&Hu?Gsybixt1d-SVq6+cx=9=+P^2&sOR<nc
z-FFW0qF>YRi-x5`QYvw2m;1+-r%rH*(MQZG|IXfO=t1WTt>Uk67H2Bs%Nsd(nXie7
z7sIsD=Qt)7%ND>|Ckn=@rchQ)#8|ldxI!|Pj3kTfE6q7k&Z-W(WmKAin_bO)?X3>H
z;a|v>gofVrZ7wfCv{47IG7<r=Mr%ZeQ%bN-u3Q%n*8VkssW|5tv>IKwl%1$Z<mv3P
zua-|tDADa7PLZM1@Cl?5sq~SX$K#|vFO7jf5iZ7+ECHxcIydIbWhKtBG2CP9Ky)<%
z{r6`TC<eWi@}EKg;G*nYNbF;3eNlu72FU@qo|V&Ho*JX6Y+bUI3!`_83|lG4<<rd!
zSkKCMeKvd`3DkLThkBqbZN}Lq;mU{UvI5{fjjUTCW?sq%2kuPZAQO|$zUxS*KL9;Y
zq>LK<enBdvNFatLcZMX&umet!AAZRgIp@<A?Jd3ldGDM3bJc6wZr360Z0D|Tx;QcC
zGR9~aWv@Et*m#KkvOT)nyf44)pygk5_@YAqa=tr14Uov~bBfe=xcf|ZLw^GNSMD>6
zz7o0pqMkGSFS&(Jvg!~+3~eryeh;w6AQDP|IyESL3janHZ#b$M_esbvoI?zmLV$Dy
z^5e)QkA=e|?X6hjsy~g9%I`wWc#$4}+o};M>3S`RHF%XIvHX=74`7}cb|+Pl6c;|>
zcc1(Y_mnu!7sRuqCCE4!q2#&=u#_pPYM6GIc#{0eriuPo#8@SUBm|@fqUbXMsy#NH
zy1e}a8%J=}b&4Ax{-8yfoFtMA_sRsceaDudrHk$;cQ3K*s=g?worTMPih;jKH!0yZ
zVj8Lhm^6_HM)R$+afGibb+OLVD=oIbaSx!cqJsV1*wtrFj8>QHF<k3Mf1?&QV;vmP
z!^g4n^jf(}f!Q>Hz)|Ah;qS}8cW2Jd8FK{mx}~jRzP(@(5AHo^NIghEDfleK;CV-T
zn9_?uR^@Za)IBLqm%nTGFK843ba=^f(F};!{u+$EQ2=<N&|M5*1B>Mg8`CH`))a?D
zh$R>46T_#-S^9F(*GmO#*%|B(+F)R=(d{kT1>8EHY|T7cgmo7igbK}a9RFVON^-JF
z#|@5BiYPSQ3}Z`=QifZagLi)K%N$2ZT73Z19fQ;5rT9!hZWO4>ahOu4b2AH2$Nx3E
zkw(#;%aa!Pjri1|u(efH!Wi-fmLyK_1{(OBD|fr$ui#<_o$4ItSY{2k0%WjtTD`&O
z7-^TMp$;uVo8M4yHX?{TZ%xXmS&WeLYB}Q_D$$KIP;6IFbQ2s4GEswqvAA<<Naun0
zGkz?&G5<~wef<uh{7Sx$zxwvgd0EWV>jAUl5-npA{7F)2Fp<Pnzwg(F8<n2Vzu!q1
zYC7LW$5$eri;~sl)(e3soI?74%+18ZNMr_&4teT#^G(8}jm>xJh*9D|qATVp)d(?8
z;)kKsJbfGNzGzqablW;PJVrNBjk3)`#Yn&Q;*r|LwWL5Kj?~CZ4}_-p>rGM0CtHD(
zTWvd;trvl^!#%DdcHr!6B?eja<0t9{TPno~sp+7-J(Wmn`^f~4;;ac7wP`h}_JuL;
z4X&a;96-bST4dxHHg@cSTYWGc(M}n@oK1H+(GHoDfX9m1QQZmUWMwS*?z7|pu_Ryb
zf!$xegG<aGJ|*W^_K^e`FQu$bW=Ax3y{p1YnJB(L4kq!HGd49;m?Q9NX^N2JX&&b<
zi-(U@yLti6v2WiUQ^g*Va?J3{2!DZvm9z!3RT0@7pC9MwN0FkP|2d})T4%qz9-nN`
zLkbm5|5is;q<1Hx#R=;A;nE1$z+&F-js}09w9}udP$xu^Zq7YjO2;@-ohYBhl?3Dz
z<aTA>S*qsh<Kok!AcwRU5U$qzyYJq*8TVhJP0-cy=#DOkP+xUZ##SNCGi^9$dI5v=
zlv&;P1POXU2z={xO+ErL>C6`qv91eK`W3*M$A&-;p+|F;8}Y}(C!^fa5ko}sea7hp
z;wyw}bZeFQInt_7>8sSnFGQH%qW~wRDZ(~%DP2uscS>6;VS9g*70$zoX7`iI)h55+
zJZrq?OQrT~^Cyenlb_dRGw)4=zpwB&*B>?%z20nhyibEc_@>v}Bfm<tMmTh0YKS~Z
zjsao9g<9Wx@%>ts=0_L@i8fbE)Xh$3qzCmZUG}}Az<hKo72jW%ydFCXut5#I&q1FL
zvP~f^E@oJ^`$q&U_));=oDPB|Y(xtYLW1xUM8-;oia*D&uMS^ZD&^<NO90YClCXAs
zp9WlBcqn_$J@QYSBf}HgN%d{Y*&9>YvTt%+;U11DY4R?vf?V3BLW58BpptKQ)N1{N
z>5xQEGSibXaw?|$qN<g_2r075Z;#f8Q*{I$10#0p4-jXT<@`>&XCghr6jXC{2CHcD
z=06vUMNBtMx2bcxC0qq0%VRRSU!ysn(i3q#(t=D2KB%q{fVcbxLBO|YOU!JXC8dVx
z6R`_wZeK62b#We9Sh03wbq<W+$hSjhIB#U7jvT)D-R#>1ArV6?)fE_)j<%t0*S)gR
z#qcPJ89HIv;-}G4SRN<e9*?Ehuf+8DPn2<SVuX4k4T>6sQOA_$Kha8Pg>T1Q)U==3
zai<uiHfA_}-aliX#wSF33UX};cZacjm;o3O_V5p4T<~Iq*a^B*iwghQOSqut;(8C8
zw=g&@f11&yIMY9tBR@S1IZjnr6gQJz242E>$nv^4T0B@8jS>4j&{_>PxdCgtX93V#
z!ovFzDU!jD+ard~bC2ne8y-UbHz{kXLA~#CwA#k3@@M7~b^~TyR<pNtPnTOtsnh<H
z$+gTz+Y_W>ZP{17BG*hhnXL1V%23L^lssVju`PiVs0%wPM;@%$aYapjaUiSta2OXn
zG>t*kt;5dhN2W>axL4ApNU8j!eMd7wVo5b9(!)Fb-%N<;e@(s$6RV7}7}OxjuBIR+
zQcMiFT{A=+<-n0n(`RRe0t(!?<;6=hIaeoBUr$WWtDh{^bcaThVZn0(iJPFx?vwi@
zOK3<#W*mcsVm;UBubzysIu#syodKf~@b&DNX`OpvsS||`LU!~Zt1ie?@OkuJSL+k8
zZp7=it{OlyqWRJG>X4mNG?7k=sKYPN4z7y`TU!bYiO+PVLeA<07Euk|exh=BmM`V_
zTXE3ost`caGhxIR$UT64n({;)mK~~EyQFXI$`|N8I2yzH){~+1Nz-F+%y%Rw^cvR&
zI)on!mIE9IcY4lYUW;%LGtyn$=J1Jth@~dYB&0gb=J0#_A;lpG8Bq(cy1@{(oTKD<
zw`;#P+hy|=6-jG#O{z<f*hMOc>_x`5y!|O*Jh(lUz%yk#Od46nC1u-XHNqeB*6@rz
z?uJ{6Wmt}n5k=&=Adm#nKacJfGiDL1Kzy6hBLhOG{wPEkeP_-QXcKQbMZ6I1M=TP2
z`u1z+XG?I@4hZeVv_dQgN4^OO9R2BjO!1YWvi3dxb}lN>ohs-%<+RjCj=+>O;{Jbu
zuv-$Z{5OSUQJcV+hHD2Ya2EFJMH%PuAsk9nB5XE0PFrw(YuQacVkiBxf4LZc-}k=O
zsIiSCEKKrS0UnAbP&hO78>g#|L_dEhj-^mVgs~G{?Y8#=#(>y!r8?o-p*aX%0UoYH
zCuvNro;fN!j{9F85TjSDHWA~8$`b+R4RKo!Uo!W?DeH$ZBvM(X92XXDW*AXbP4Ae}
zFQwjd&>`p@YpFv0B??v4mmaBpV6drGn;N;ReHgdAbv9RSF;XvUz6G7%b=zUteFn}-
zia4uJ2<*j@);x9<BAw=WAG#><z#O?W^G^Hv{V5rP``h8QPpBK^%*-DN^eH^`Ofv(@
zA(lMvaUubt&YF=9POVKkTHtYJ7JT?^$z0oLbCzxoNdL4RU$)#C0+UphnY2lQBTG@b
zRu-m>#^z{OXu^SHX>Q_Ynu04<HKHlCuu8q0<jX`*9M_3TQ>o|U;ftE)dTPtU+cET<
ztyU2kC_%^Tv(En5;B~Vz7v+?@nPSrS{j`5k1}E)^2#BYD_r)m1GM&YkUmBW@odqNc
z3db|zZb5ZOz^iQ?(Zw11bNVr6Y&eM`{V0Gcahy6Fg&^paziwO*X%~zSRXQMu4-Uf1
z*A-0;7uRy1=u|yhdm2gQvcw8WDEZb3iBOztJjv7b2iOFjc?c&J{8{%f(o}s5VPHjt
zrXd1Z9gvylJ?D4#{Au!w=e4ZQ0~82dufih0>;CH@i3@pz2)`=YS%$EDq@J$&QLnp?
z(`$&Q<7}1>$Oa1rR&Lts>M0O?`P-|YH;sr*MdXj$`rbxA&Z7w&n3Rv*d?k>!^LB8C
zah((~=ODP`lG7$(CR1xaHhBA~kU0q*Qo9^vhxjY?5stg}OwT_NqnE9fjjEf$!Noxi
z$>BnmQSugiW*kS;E!zQtSKb;FB4^Wv&0JLIz=X3D8^XBOxuBYWlQKc?YMaki_7B%c
z3uhkM!m^CEz+8}=4dwm(atQjK!c*EMKKT$Iqno6axxV;hQddY)ER>PAl)$aVC{Whc
zHzd|p*<SF)dBIJe1zywkXk<yzP4oT5HnfStHF12B3yDM?McShXauDYP4G@l*J)~T^
zw~QEG1b$=bux4!;rpbtTm2Q2W?REma@|CKP2#a+c1D!jNwn|6EZZu1s*7O=kMvqTM
z5N6qt_0-giZuXDltGb{W;2KmkL_2UtvEH6I+(B>46%VgYc^2`S^>ymRae#<{V(Lp+
zj<IuN=eYcIR?8@p+HLTr1Q=^!_zEf*hC#%Vg8BC9D}&xK><*YywAT$Nf^uv@|6F-*
zXqW;00<L>n)bq|_F8twenKU1ub>e11XH>lFx1ip8gi5~AKOVI|R>=(Ea=|SrMaiJB
zp%FJhAnRWW3i#?$TTjJdg7Nj_o`(bN`lcfKEazCAuGXoU>J(5MMQIk8S3{!fMGE@i
zWGYKQVh`#Xoc9Wbs^<XTl-ldEfzYXwN+H$^qM1eeWEOLlL>0Bc45O0DFCupIT0H0-
zj|5296d3JP7BFAhWl?YQ<G2J37*HyUz^%l>X<8E}-)CX1Ee(|0MX}0XtsB)>XQ?%8
ztVQaO&*)LC3?$0hlh4oYmfp{x2iZ8;mD8|*(?H`O0{Wnj#^8#4>b=^nxL7Q?pTSs!
zfGA^u2ngjMOuUeDIskBVIq@1-XOD)<&<hol;e8c|uk<?`w8`LXfrLy22;&L*9~d*a
zlbmp<C0X~+{{;TY4f!VQFLiX|AH!QmK&H13OUIBjvKS6xUS1di_66e;fGRLgZ6`%?
z-o(h`q)-`T5_RO}i%1(&q5SSqK;(N0rzjsihv>}8Gf6*&K^U`m!7IzmLod;94t_!T
zv|OC<htehkH+W3)E*zXu9ZOPM<dPfF5f~RPFQV2s5Yd}%cSQh}Fi8`_!}o`5(tC>8
z-Tjprm`5hJPMzlxbTft@JvNUd^?ZVlCk^EBCvOGAK~0#C0^O5}ll%Uh1xD@i@x;~?
ze&}@(3x{VGUzR<z#hz(Yj(I<)#GdEHVI?oQ(hggkFFOf+x#Ho+ijui(o$Tb_Wu}n%
z%eZEbV^BILfcs~_yGP8lXk9P~GPpU8zL&b5Jj!8jS2ZZ=l9h+(qskng*wlCUjTJrK
z_x+ch2`T4Ew%Q4IrLz9fWi<5O%Kjk^ycDA6&w-9Tt0S~9F4RRvre4#w{4tRChB<qh
zivx4T*1x=^(L<jSa#q^b4ZXk|N_T6*GX_bE(l_PRszTX&D_^P#*5#)+SrRadYuIUW
zk5YsP@~xq_pdW>W3@6E+9~E~xIaZgduJ#wwW6lJ;wDLwg44IZilkm^$e~xx06}xWT
zwR^lHYh=;iDNoPP+!ATs2?Gz`=N||M-3@JihO^x?qz~DEgA6{>#Sr*DsY)80AaQ>h
zmzno3yLlx7i@3_6mlm2Ze|2}M4==t8;s}>0ilrjDh!i^)qAH865G&}3@Wi50g94L)
zS9YRWtjebX-QqjgS-?MO9NBhG&_Ajl35^mGI&y>%^`k`gwrY5J{|hc3<oYKR=gA5A
zE`SSo2=0qV5}0|jz(6&Sj0hgW_nZf8^&e$pH})UzdPxD%89r+-tGQX8I=G@Xc9$Xn
zX$>z1AD}My;6D2Y7Go@(s~qqgZ?-`d7$$f7-srd4qL~JmyXLD3ieG(ZUr-<<4ISc`
zX7BzzDgj?L$XQGyEJ6PVI5&pd)q>_Oll9MLNGfM(qY}~vEbd_X0}J!4E$N|$Hd*9x
zQT?)Vzz~dS)5?|d%yjuYvw;WQOYJE8Cr{I4vk7|Tb;WHVS+b#Vzm`8fl9<PsO__eN
zP=F{s^1i_$NlRM1Z%cyklKWBS8QVZ}FUsP8Sn?7ayEIf_=t4*rQN3QnUJxnb2)-MF
zB>Q?@sbBK^@qd~Mf{gP)|H9C{l0>g6NFqnBlv8~H8#uo^p?-Q*Yj4cl*024iLnZ`c
zC93pQBr<Z-Jf6DPH(RW>ZfxHEM4`%h%`XjgCdw9rAw`1``#pwCNGvZo!E1xHDf^4F
zx6n)&GyYAIE&WugIO4D6W>G088@Dqj?1l!^a*RC?Y7+0kk*CbSKD#91n=kf|cp*kS
ze;VfZ<K6v-cx?)|ndla4O(BC8Wi8l6)b>6}RZfN>S9qh`&6XowsHR(huNx#GCsR!3
z+4TMWjTDsR(=jkGV6AsM3?Isvj5SS4wlvXOlO*Ip7UFuC@%L6)-V*-RG5sryzBw7r
z3jLZYc`{b%nj?DcLLKm%bR|o=tnKBQG%2XA7}WEAf%%z{<{#zw_k4iRk^{4U+ZO^z
zkJv=b(D{-5E&PQnEXjuDRbK8}QBoCL(SpwwthFC-N!wE_fk*XVql*{B&_Iqk)CeKB
zi)_gWdsv%$`0y?<B@=SVKdgeCftqmtdBAQMa}Wk1AMhP2(1SjO%C<T*mngZoT`YqN
zMem9aVq`=-XlkDH)PVP_^0a(;nB#sysP*fo{&`jV+Fz8$6!CKO%g?S6uNe#0XK6@?
zqwy&g=p)2UCR@)!uO6IQYd8nXW|mB|PU=(ORm!)VZR7IknG7+gCn%_=!&!4%$#}QA
zAkDkXrLL!|9_%yG$D!M+bAPwD8L#D0Mb+KSLyVu$P}1D&PJs(JgQl4qzeSw@j<M+@
z8ndM-Dd`mB7%U>X?;zxMrC1c6Dh0Ameg3x%z>Zo5QQ18`%FGe<l=wd6S=LOuTX@Wl
zRZu+1S(?{Itd`GXMH;thyARbZEgZimbo*s)AU&6Nv{qWAQ29U|F`&}3y>OB&)hSgx
zzttH3IL1t}$?X~!uEN&jyCrMWQ0c8XRCMsiW5cxij^P<y7(;+rgEaD;;zQanSXN0$
z(Fa)9+*Bf`G&e+0pq1GzhdhNpMhZ@?F2^!E!$zA=XlAeHX!GV9NfQPW+hLtO``0&+
zf@+e2+9klWhz%9j@qA#3VHMV)PsRHk($H1i!+aQxksMb;e`HH4u#l1+l1-Mye_a?#
z$ud(rv7?>ylMIs(JM~aAPZ8yLLRrjffQne2VQ6l4U-5aV;Bt$D=?<rJ-b<As<4<vN
zb0or2yxsX-^y(=g^K-GwcBI=jpVld+$IROfLgC2=bvAhP^ZA{!%+5BZ0)hm|F9njM
z2Z`i{Stca65NX;0d6d!md7!IO^VRlrF1a5%PB~q>>ukr{HuwI~J^X=qm9x~BC2A#u
zE$^BY{(YXY$)@WCos#3V^>)ZH5j2c#U4RHwj={Jzxl~#jNq3lcFaZH<!w%4lj1h0c
zU<q*<Par)GvOVBuDhZ19%$9J;;+Tex3B@rf^~Ho^R2sN5lAa+c#WOVIe+3UicY?C`
zMg~T;Bj-RLOA9a9apq>pcXHR2jRD|=!-d~>-{CM$l_*{2!yGD=WQo`ltMDSuuVu`5
z<JCdQB(=~B*sn>pl9m2SIGN8W|5p1YfQeAt-Ju+GJJ8^~P2Vr?S62{Cqz;p@9{nKA
zCy`uOmdY7L9Gec}Y%JCB9kG~t1cR_R1_Yb>_K>k2$S9h6Q$1{f8u|8!Z^+MW?a-d$
zS;?3OG+S8+YYezlX$Zf-f#)$HF(L!et_OliH_dnBm#qoH;{0uBp+1PD31OJ$lT?|i
zkV#RNkb-O@vB`sa?g^HAST^u(QJV5uoC@oa5|4DKo6Nh8qSAP@ZXzX=xBgZzQ=f(}
zYTcHqN%I9WeD`8ylX>bh=5)ieMveQiy;GFAuFkg_B7E;YFmE2Rzst0T5s&7S;SRKv
zeLq=3x8vP{lfzDYpJ^c~ltB#Im^g7v+bDUk4gRyKI?h(az212J4F3J3o)`lS$EV=r
zbZ1h~v-Up)5Wt<8dnQ~#q|Fav>LtRSR?<<d7XVweM>WU}S;&$P^d<?_R6tz0j4`aU
z#i(_hp%g*imq~;^Doq7?#}`OSigsfbu?A>~X?zYjh&sz@b<o_2*c{>y37o{#6xc#H
z>W^x-2vmUeQOVS!eW62LxzTrBV68)nZetL%-%tiT3eIXPS0uX40McqeeguoHix%IJ
zd+jLvIiakwKssCq9@MfRTfocyB+L9Fw(FgwzVNU;y_RcnJlQ^C$m}0y|2IcO2wI!;
zLzCaO_*Agb4&nzFaC~~>iS&EXw{a-G_Q+S~pp+BM?NL$_?T1Yw9t{m6<+O%Y1DIAT
z`Q7ylm9Xwj_=6ELIat;4dE}VgCopmsGDNjZ%dN}2Vm~xYT~5-#*>9)=;LGiV^UwG&
zWJS@DIaX;6PEwk26XVB!5*;dxx4v5Zic3FeY1v~~Gib5M=A}as7<o;C4MFZnaZo6u
zVHgAOJ3Usrc1bwF)X`q>zeQav0$K|uuUi{Ok|N*8E=UrxDQlfz-->GTJU!lB2WExw
zRncY~C_9?uZ+oSh$}HpSYrOZ6Ejq1DxOAf)KbbbkIIVRz1|r3`_Xr6Lipz=>oIvYC
zOkUe{0wX>R43cuQ`xOu*Xl~D0$!-zdn?2#--t!}l2Gpkc(ggfT(=gCh3nEAirnJ>l
z9p>L5uOL|s&;z4Fgv;buSAM|Uu5}<I#;mHessMR&1So8E#R@K>^bvgX4TBr+J4eOn
z9XSF~-7V79wr{KIxs&iOmUc^L<Nz@-B=_t>3ycS*^cdvm+DwfwHH*JIV7{-0G?_Y3
zTdg5tMu?<bzIvS6lz21+ZIhp_ce6>*L~rS+smf9i2S~akB<a$f&p%TH#MKu0AdkYg
z!T#1$0v;3_BQjtFqA6A|{?1)!{P0BVUOYU+kQ1~chaeXy-PFbgf7t9&WjHhb5+6-+
z0|>0Q42%RJdVcg}P{qZ>V9YJSqOSW>%Yd2do7YW~kJpy6IEuC0?#KFk^->7pl_ds2
z_BtRY@!Ym@0RGe|9TUO3<p;<9mjH*G2zP_IwkQdvG@fE*3>Z`XSC<87wSZk<=njKz
zDqu{J_eKGl)d_3RXbY&Q`UwD!A!i0`xpXNAvFh8V^-lX;c%tAq70tCY#kwmx)ySh0
zlBEJdPQ6T8GSBgctRA=sghH5yAoOEF8w)PIs7N>UHum)U<5#8(MSQTQOl8~?ty^K+
zZy;D0lJ!{P^Xm^B*0JXBqbZ=>-Bq(tA#TH87X&}B)*3jT%xujanu0`2U?VCh2w*$%
zpOZ*JK!3Bw!+wIW4W6}&m2+*J@}q)kGsFmbh}r*I&kzOa<or)PLt^oPx}bbuQK(BE
zAQzTUB)2xrt8#XrG6>=aA?$+`SlQ6e+kFaw79_eTQsC{h?2ZsoI1gV@!bu!V97}^z
zM`Rdvkuc1EY}y8*e4eMUnjxZ+ODD+vbQ+TT1TBuL3d?iRxzHv>(g9RK^#7vjor5Ed
zx_<3wV%xTDCllMo#J0_eZQHhO+qNc7I?l=SKJWQXeO>$SuCA`T_PzI7>-t@o|BPT|
zTf%rtG%6E=n%dmsInnMlZ6pIAeTcHL@7_xh)QhMbJl@Un&9K1nZip8GtG%}_)pY($
zR&7>?Op8=Mc8~GgIPa}+hgxLfwRwDCefwaTy7&!`2dwktp%?~~^Hq2>hX_$ROK5tc
z3+()aWqh_R9YR73W;*;FGT#s!1u<P%GcC;*?Et;?P<5N6Ue}^_@PDlo1JaWqD->4r
z#td|g#Yhgo*wUkmo@7<d=Um?K4W^7SpKpc?jl9hYk=#lQjSprWhsmJ}r!$mRopatA
zkxL{}PDM%Rk-+?s$gc!^i&4n0Df}h|RG0r{vJM*CFr>!#>!y0o*4Zh8hJlhwlRf?Y
zUBpiAM(W1Ghx()Ek(|Ih*d^5AEzEBQ^R&SU7lq+gZR6oyNB(917#B1vod9A3-AjsO
z;Pf(*!#J;1e3{>r@|>20Hn=Ct`66RXA1gX5eadtEZ#%DXWDRXXKR}oiNaeka<Qtke
z22Q_fB8>$!<}lmewhKobWs#I`lcz)o>`ZzpQAh?ZuP63=g8LJk>^B&xMqn(f&?oBe
zCM)yNoNH-PJ-J`DoP-0?WC>-XC3wd=biI9w>f@*YY!+K832(v2K~l$rb>e&-U}dmb
z+ID69IJ0T&{<i)o^doO<rYv34J6v)?5^Nz33^HfaA;GDsVJNr(R!h<DAr~bSm^M;_
zsNmQCgkp@K7t)+_Fjva#=kLo?mNE*jfjb@IyGE6{|H%XoMgO0s6C$f=WztCu5Qt5G
zW{oeUqF1L@DqE2*IuNTg+N~{#Bris7!!7=gvsAlUEjTeY6sW*;ku(}lTHn;f^%hBK
zWC^(*Wh53BKQJZm4;}dja(<+!Mj-40@~)mFI#RmVz_Iv(?P^4eDd_M%tHm1_+(i2(
zn-k%9LU>km=iS$ie*TpmW4;8W|M|%Lz}k-x<Kh6R+AT6l#n}Qdm7RA_#H$Vo0QuOJ
zz@K#tpKQV9<f(iuv~Q15462|X&VcUwR~5wADArMb-s<(-+W|G~3&S7k&@@T``Oc;N
z5ISlv%cX2019>J^+Jln#a-nQ5)t(;gva{NsH?i5m?nGVdznvyUjchD63<Hj(6y7vs
zEf$hwb{JK*l)|wVEM>*N<S#{Q<I)Y>x-h@$`sHVoW>S7|w;IV(zzG5G!(fgS%;@j^
z3-grE%)sMQyKXSyLcD1p;eyp9h^%!b@K?%;Gt9F(Fr!v$PlA8V{IgUb+W5Ux1(LNJ
z*<3)ghHm&8UJ{rVIr^okLovs4RJUqybw_-Qno=f?tp!<g&%Udg(j|_u$?)b<84KI$
zv(ilRg!r9Be9)ZakTFg#vN>}d4gdKJg$!r|j=fMk;RBuFrWo4}a(#!zBAtr}R<z`j
zuoV-DqTw&ifV5|t@0yXU2k^Os_7o`LhY>B}B*+r2k#@Uv1?doU54O$jf;U{PsX!UD
z!U9m<k{A2=sK$a>Z<WO=j7E90-7czqi$W%Jf0ndA;SV*>+yz+Z3_A@-=n!)SP$}x&
zRG_qaKT+Fg4ETS%ha<~N)Q#^}cW)y&2>P$2L|F>wlOrN~^Tt)@(4nQNSGF)Y7S5=-
zgSMN{QW&}Bt?P#5tdoAB+tq5m)9BT5UDD`P{@$d~+g7hzorEN{3X?d$m_M@WC(-A~
zq&<wP?);r%GQu%~KZ)ZiYDH_f8l0IJ#$(!T<L!OKqV;0_%-7|OQ{^R7)&bnYN5}A6
zv=e`k_=~uN=WC%S^hUO}x7}v)nxl1?t^%qnA^3>#jyKVFWiIn*?Q`1?NcI5hL;<c~
z`3;-DA#y3wjbcN8q!cxogKLy}Shxo0&q{@|1EVY`BW6W?MKDN{_sK1ig;JI95F+%x
z2#bsrBCm~eM*Xh{W9xlT!V+{DBGJIzql^6(6n8XNyUlN&CIA4UrK`i~``N~>yJAL<
zlm;4!XVvu5e~F?(133qd*;yfzZ9AD5i%S&M6AXvhd~HYX;TQ04lfz#K>t*M(AAM~0
zE)EZuZnY~>m_u0evY0XejdjO<6=4ny#^i=|s<b%$7JeWf(=C>%ETY7uLTciND~U;)
zW|=q$i)LxWNH)XdyrmdI{o_|t!_@8;ICjL+@1$dyJOR!jPawz`GX;MJto1EJ_uuw^
z<7g!)p0%f9|Hw5(s>6#C?rXo>(0wu~*@iYl)3&9gsoiLWkp=r!A;15%c7BKWQ{a%K
zzXU1ED5n<p8;qfi+zV1ZCuQAruP=rV!u5JPalOZC^Xrr68Px0{VOQyIBHx(`PpFGD
zD3S#aB+=O&IBinujZ+o!ADsl;f)g~zR2EiD0>SNaXMN@zsf&F_jf5)QKH!ic10zis
z#|n?Wu!2GW^;mp>F&fCncwN^1^El2j`JX9KxWZ?d%>}8y<$?{6nw&<VP>6%bi!Cz1
zr-CGs`r<LLc<3@boNJB}HX9h|cpVHJj7)R6Ql#XLa{PQtxmL{F&P6Q-VjFWNg1gRk
z-iR)Vy)QIy5dz9YTErQqF6#MefMHt)>ZZsaTPyFu5CE)l53lXbYHJe=w+)I)o&9V~
z&*(vfn-KkYSvsxc05mIFph^R`{w%TzLCa$cRPVTT)9oW)L7*wJD0)-1W`AOnNMtWY
z3Gac4<#GD@lV>l8*8qeqfPhh)Div9iH;-h9uwo>@Zqt|Wa|7S6<NYl_W3N|)aeK)X
zJj(tSg7@^uS4MxN?KkW$aQZ*HtF^CAA1}9St~d{y8|`#;fbA$658-AX`b(nJBVf;G
zv5_tt^TX!n@c3hYb?vE9^M3?ef-Dy)A?%#MRD9kyRK3v2nDku-tn6&R!yCK)TYGp5
z;9!Q>akm=%;&{IHMAPVdjN_%FE77=R?s+^;*A*5!F=H{n9#WFJy=1mhl44zz`An;0
z{kfIx_;zsy{pjoR5MQEZ*Fquq58+xdUt}&hM1%FZ4st$?U#D~q-}HYXQ>00_Yqz}Y
zUKeZtcP4-@C`cb1va<CbW9D^sq+c!{5nKEHN8=AOwDrpH)5_M?a^Grm^gKRlJkIp{
zDFWP{4s!uAQOvdRf<!hK1Q>Gi<AZ+0<CY)JyFMRq|1uC1L9TK>44)Rf4;M&&#CZ2*
z%1})o|9<)$w!nET1P8l@kcT_n9=P-ShJA-4`B5Z*rSWBeZRcsPPS2ys;llRF`r*%>
z0XMAo_q{3c#v^eJ)T?nV<W%+o*!PV02kx<#_l@3f+x3Oa?yHR2638xg13IGL(ctsR
zckr-%{Wu?KhWMGsP&7j*(D8c;`SBd`q5^ixN1_+6&y}MIPE+N19eZCCy0@$U3*fn@
z#M|gZvouz&nmNdCg#fm^QW>L0UVCYG{m1qy8|Bq<e2QFwKzJFvx7+I5JU_$%>uzd&
zK=%XTBgbApT2i|$0bt;PyM-8O^+fo+8J>}4aurfI#?&OLWdVtgcGIiFBrS%5q|c1P
zq-Fz_a^N>rQO3yg6$jmXG1m;)Ee4!GSg<G@Ob6P$_9sSf81ovR6x-l-7(IsZyb$F>
ziTseL6dK3)=MdxwbcnOxygB-+y!h#GcAg421_A{x9&mU15)moKqq8`TyX^dJS36$0
zokdn;-$bBw1|w)J>(P@4Tc!GRez*4L<Ez)};*}J$7VzqQXei}$_C)|QxV*8KzjvdG
z<VD*b7noIS-oMoOJ}y-G{y;+U^!|!9UaL$+rzx>P(-#d~?~`C84T7y-w(@=>>gl27
z<7@qJ_Vu_!@urwRJ^#IyhGA&LxzpxGK*z_=K&~|GCYN$@F;zJ70dZXoNM*=DjvE0#
z+kk6hip0`fm(8|=lN!KQzy9M0UY>Wh(caqQa--W9P0z=7JmXjYwk`=6aAU<va7@Y{
zSzo;VP;E2nb=%D5F&X5UGDV4P%R))4XQzLTA<t0Lg8Xg8t>y&CGfK_<fj=$0zn<==
zJ+@T2@tNWJ7e`BO-)x4MSM!!>5O5v@zJ7*oER4aT$t^mfI3Xnt%a#(c2OBADF6zY0
zV_NgrZ_VJSegAeSUi>`eT`JM-u_-wp&3EU~BM7O$N4+<^{we6!RaxKT-M3Iw7c=xf
zUq^0^j)Bj>$N$^1g92mwRlH3In%&r!s-cSa8`da--|RmzMvVllps2^mk_NKMOAux6
zraykK%AZPGrGEEKPF5k*!j?EfD!N_O%omlt0Rna|zs~FP>~%T8f4odvS%Xg2c*=)w
zv{%4sba97l{~xs$H~zoW8pPD@|59robn(^}G_wL)S~{eA$u6c~0$iaTR35clX=LT8
z2u8ls8x2|tt=b@xJR;bPBFZweLs7RE*IT6O+e6426xG>=F~ucB(NSx{->*)X4RDOb
zj<dSp<uo!m!2~pJy(0{U9KozehUt$~#hFp`I*90fBsU76q*nj-fU29`4VNJ0=O@+y
zTUH2TDZ%q?6a)*h*xjpELlb)rP21*5y>_3<U>ujCK2UW7+Ik;r-A=_0|AY<`z5eZi
z7HUe3bkjan)!u4(-qj#^F0LfG@>+~fF)OH0^h8eiAeBh+WoVps#(CX!#T$uVcfSkP
zK=vZleRIc444+ZMXoJZSPK>0lv2)RM#~T$!rAM3E0NJgapadwI8?p-uiHyh$*7rsI
z|LL`?peG|Ngu5l5s9i$gt_<QM;8y0rnSb(m^UI5;w-J9tIejL&U9=EkuVz-@aW3{o
zt9*;fgghWBwhJog6n<xWhiUR!26Z?-Jl|K3Ua@{$ycdW3$X@}s*8Ze54qs6gO-G9p
zSL?+g2-q+%D#Z|u#pb(vfr6|~lYrbdeebYM*@l(>B*A#UvYd@o+o|JwLuoO;<*?Q?
zbEK;-Ej1t|i>_oKu&<=O!#j2Sh$PR6577$WV<yZ38K~S3QB=v<oQrX+(=cZPa8Le&
zs&B;_={i_+CV+T-Ah(}a{8IyLuvh%n)9UJQeB=4qiLVP!qXo8sXAY;X77}ZzG`ehl
zjUfS{==xP_jB#a1UY*(F_WNC3;j$anaXAT>*4h=nh^_iydQ{4gCaC5iNxo2G%a7&B
z(BPwRfA)&SWdx~<%6r@NQG#>D{%pyaHd&day1;=FAiKrjw}IY)<ZBp&0YAT5M%jmM
zhn=RB;0#i1snz4;g*JR(Od3Tj!UYI}DG2Goq@oJKK}c%G%LIz=uY>I!(+Ph-j;vZ<
zh1Xl4*9{3THgDB&FCXZZF7G})Z6CPTW8K}+$;2SRP(M~bxT!4dC0|Q$ZF4bQ6f6;=
zvH=h(C^oEt<3?Nk-j9MhN|RGVf-D=CCHqq7#Wu<fgDvYe8}6Vg#T*3~9*dn4ywG?L
zduudx&Kyji(Py$+o3Mit>8v0$$BO#|mrr&lc=2@ww9C!NOUab?4z7vq4-z^L{TQ)d
z_P{vS6@S5Swo#OHPH66??9B2e$M!JHe3Kh0FwSW25A|3mo-BbNVlTyfNk0lEj6SzQ
zsO~(M^zbOdnv~dtrUO%-+moo*g%tYsubIEK8F@NR2RHxVX#)N}#A&MF>Xi(zfe}q$
z%dykD=WAbm_v)4QP^fu0&BuftCru$cG=wC@+aQCL_5$BXh8O$r+uPBae~MJDYFVKN
zJx<!t1{jEET~u3j>9Mw^sKhT*N!jf_L?f~w<4ZTOA|t?DNK^Kky||{tkA$k~e3z}F
ziibnTLiW3*tmVY7kh11{cSU9O(T--q>dj5|!s@G)4%+&VwyIHVozV-YNSS@+OUx?y
zbvMk@gk4+KBISIS+&E&P9PM`}ErCJ*n~5%U<2<%$Hv~?N5xX;y%A_^AZ$)o)Ep84T
z$-ji5%SY2e<GT;LFjikUFEby~`vIw1Npb?&p50&-{-ct??`I0)FuC(x3CG`d0R>X%
zpS~@s6X>+?(WJPCetw|*uF|vp>c+1_BuUs}*IPfadx5nSKY?%tEP)JrDpR7i5h@K7
zV3Wy2N&z*#{b2nYKVaH}5Nh-P1Jf%17ff>rG+4x!-7&pd)G89dZ|y~gitPPk$c%0=
zt65CzTh7Q8sn0xQrALRaSw{Ta=20J^-)Yhx(I*@nt`@u^x*t&;*CMds?x)iQL+x+?
z*8KK~7SXm8Gt@wFRw`T`e2y*3Gl)AW0Udzp_r)M%FOkRe3crrmZ6S)+^S^+auj2m&
z)LK+psq3i@)Lrra0j=7KcCY@tW`J073<UoePe#csBUd%kgUy)0#Qp@vspVS6SRTMi
z7cwS6lXAot$(P`*7JY<&!=(y_u&<C)C6s5I9!Us)0-eXY35SgN1*5#K5&;+Zt&)Ql
ze6yazNo_oBU3u<pig^YmfN!2bZ2aVNx(6fxdh-y>5kxxJJpGXwd$YbNEPQpl1)|L*
zhNvpSOdd{-N8QxTdLI>ldkHlv+QC7mMzXaZ)p-rgc|m#-zC`SPH3-lH8H;54wR||^
zlXFPQK>KQZKE^l-FF3)vLCf&oWINXGmOx(T@b)nG0+A=)WIM}$wl&@6`!&RapD^*d
zvIe#s!t^B-m!4t`QZl6(SGrG;ZBku0KnXH!w%=c+^SIf)9+W4)j!MH2C}hfRZH}bd
zIgoMZ@1RbI26|idQq+jY+xWKm0#{<x=oo(@r_LD-D8S}CP3R#LU6=EIna2z}W}9wB
zB(G$+QLkj$?i9CC0Etb)4?B*ooXF7*MjXh{uA1*8V^Xm+vWx#gfaOK^C{zi(4sdZI
zPT9Cfi1G0_SQ!q`u>F(0{r7leW!>W?SzSTi&yIR$Ggo*!C@APIe}|v1N|MwUIj6*#
zRj-=-iG{kPR+WPU$a0`rp=>$H$S_#Xito|XGKf|D<VnY_Iv$Q-Mc-(tkj|p4B@AO(
zEqN6zX9vZ*auHj$`zSe<7(bQLI895EWLr^^mlvl*j=81Bp@P)u!IdpRt!MJ~XPDwQ
z+7!c4N!fv>Hr#$QmG$4q0$}q_JXAVa>o%_fi?5Yku6fos_6~Avh-I}=QIZWCf9-;m
zYlhRLG8eK8(1>`kmLC)Og5diWQZ$^Xv)IxFG`)99Ozk=nQ&+Fmz(5{S)8?a)EN=(C
z4P{iu>db~LEw2KOv0lEumvsr6-fY2b^a6D2dXexz9{MzuGp$Y#Rfu8~VNc{(ee-&|
zqJ^D^E)GXZuGM9E1HN4J{?W3p{$g!B5*~I4wt3JN)-n0&cDiE5v_7YkjDB87((1Iq
zB_&d!2wh>*=XUUDu~g*7g<Uz2U21D_lgEb9uad_m7kTM&JK3~^^HDpmZd#vyAJR+)
z#hG4H1ZlL<jRSouV4L~Qg#kg)=jx=3)-%m=<);+dOXI-J4E18@LRtP~eid)MPoHOE
zyv<!R0ch;3(g(Gs?Fo9T<xqu<618`s@}zlJ1!}`)oTS?#wwozRL|9$uR!)^?L6()!
z>D49*<G_86nO(ge_lmo7k&gR@FGcSR4y&Y!O<C;Xaq|`RC9TygQAyS}&|7!2u$A9+
z0&P%XmcaKLJv&WiLG||^ROAta!Jc1y&GibQ)_jJ6bP(NvvRZH#V03+d6eyRwMl1B)
z`gTAf7^9juNVGN}TYfLC8dc+0-$x}L{I_Gdy(FDDTo{|}Y9;;({t&6j9SZofwH58K
z6YH)aj(TisUO`>!RKaL2OU31R7I7ihb_F49<Bg%SZ^H_qlP_F(-3k)6l4bWBXu53&
z`l*0x2O2GK+cqSyntPiW@$$xTv(SaBVX#_{ZD?DKa=oarTA`7O1wLOdF%-9?zqvoj
zNF>|Zrm<yb1o>HWSHmg1?7I1Y_jgjsD;x2qfT&xQYDA(kVITMbhI2)8Q*Xa3qAb@E
zw&j_5WuCBh6Q6_t?n>2Z{aljE(qr0ls_bfGi*5t{TMiq~9Cn@AG&jMt#|rCce@&zC
zyuo~37wLTP;6K+4ReyG(Q&w3S99&Z}cN^zaGzy-Ky`{O_m0zA#PD!e2<J2b$fMohJ
zr3tXejsos6>K~Z6Qp-6;^@5#2L`l=%fJ&)OyGp6Zq*UX_&)Ys^3D-PJq?V~3#8ag}
z?jI+LXx@AR`F=v+dL?<Y(@8=JV41#{auTr0F(qoxogRSJlbrdajpmYRYq_3jTTv^B
z>=nj!I7VSmV0^N&A5cv!y{334_%C#7Et_Nc<f|>BB}v#8=iH;EzLm2k?NsgAp1@^I
zo7?Z#-!{3R)y^H_R~@(c5ey{gIeTGA$UQG~uxT0^?dL?wfwuibYkO8lwVVaAo=ae7
zLU`E3m32y~zx1PlA9iQH7cRe6Db4Z*2E;!a2fy1VXGc97VT|1=fG?G)wv7)jC0{I%
zFoT<o4{A4G9BH~7QA|WGiI+Fzw<2XsiE3r3m~ZsXsNai!Q}zlOz}e#_J}O6_Hbz6p
z_bIyU8@b_EG5usN)cAgdcp=NgoJrdVID}Je%MpPn5|=CMXBhFy`9Ci{#0=1)k<(wH
zpR<W<qyMYFHV#v2hb8ItfCEBmdn_6fo4V5m&gCW1XR)7Up9s~tRbgTq^6HUrOmr%M
zzr<q!Wdnfc0U<q?zgyv_5;h}>yNi;zdq=s&C7+~)gn{JXs>;<|6H&54PBG8rrIj-l
zdNe)dB)N3<5B?TlR)vvREspYm`6q4TDRl#F!!L)a{bD^cXpy6M3}?^Gd@X?+4LP~i
zXPlozXG#ibjs87Zo??~$XW!n5)87NX4t@n)%;%RnHm9mld?)u0en@xrfe|XCfUjK7
z+v9rwWlEL)<Q5#Gn|5WmCFSn_C#7r*W!aHEcXH(-w8Xkw!ae!+9do^lf5atyQ_)a+
z)5f=<q1|&otk62@TnQM!XR1?hh8Yyg<{NbsYjP3nJHSN2VX;f)YqCWLr}Q5T5jHAD
zDRTlwyr|J<38Lg822>mj0mDpVZgo!43#0qu;!SE-JYUiY9tcW;6Y4fj1O&_Dto_^)
z&8hY4n!^sD_)wQ&8zWof=7)&EN)9NF?7+u4?Qe=C0rbl}!P=pS5@(N}z*%?=LzJ~4
z!$|u;79l;5g0EI3msWC_iDtARcGQAKboH9#$oHH`_OiLqYUKU-%KHxin%9*Xu-n8Y
ztS|_)=z3|*#IKYzxbWD*ljl?o@?>{8zqT+&GjlRBXRKoiYvOBR12AA9hkci`S@?0E
zAL(TkTJvk@`g~DE{G-1SL;<Rd)G9|YN(J22O3UX2DIZj(MOJXl7*ax*&hF}4=;RcN
znO=7v()jV0j~F|h?i#Ip6iqK@B5c1?%yD-9jv~4J^|8;qMzia_#K|FV<Fq}BvDGf`
z6FWbPU)u#(>h_(@<~i1T8O!c0jXF78mbYu1Q%3}+6k!NYbf+%|9pH=z0PG$<T(?g?
z8VDLxa=Fz<roBte4Q)(4O;*Zvfa(^)_v%&wF4C6IV{n5{^~GodRx2I8HjhgQ=mgq(
z0tTS$MSfY2(zN~D?*w|TgmKfo?!OieZ{Sj0?To)^nWNREj;H!9*=wzTHB2sL*D~O2
z`)yn@PSSUWPPB<t6fvrK;+2bu-i1rq?Y+KV4JX@ucX=PaOvt-!o85f-JQpqtOt-rc
zl>c3mh<MTVbMQWX1SONm1&V%c*84Djdi<qRba&TYT)IkyWNbE`E*{;xBl_5OMXaZb
zUtY|*4Yso6aMu@2`Mj^nrhzO11h~ZB`z&*y8O)|jtsb4~^~9{!_gf>glsU-}gBc9D
zUeZW)3<<KSA)Ybs_h!K=9Edbd{j}A%9S;?-d(%_7fdjt>?r2z+&aK|+?tJv#>2|q3
z9j-+5X7H*~z>#F8!)_ao!?!>(rh;uCACx#qPeY`XXep^e&gtDUBrYXk#(P0$p(V}D
zy3MaI^ESgbJSraPmX592Gs1Cmwyc0r#2?8Ph<pY9$-|1C-L={h*#A1O0*5k>9F-B-
zwRch7A>B?oNXZX*PG<`D4TAdjwhZ#ycx$ScSu!{cjic91J_j~G;><+!tRS11eJ01S
z3#`V+e}Svud6@U&J`qq}h~^W-6q_(nMyepfn(3~vEA5KaOr7z3S*00KCS;vE#wfag
z(JgJ2GJn3<g1)YE>Dazv;wWbSFJZV^6Mj_<E2}=lU|k!9xHZp3(@rtd1L=1lQ5H55
zm?hCb(+Ze{6?(zBT{FqNGki^|?}BJiI%yEw=_C`Q5o~{b`B>lt4S(l0^uuNa@=^rV
zzoox5x_{ea-X>iZbbD^`J@@&V?si1J{rQC$a6Q=P;!g=iI7&reGiM5=R;lG`7&ZGw
z5$hJGo{B|s%}t8_!J0nHpL*U8R2D(iCfMSQ7ithSPlALGM)u)1n?|A1>M)*|kjE=@
z_aWu{Tm^c{#B5!jWex{>S!;f>a`f^GP#3L1W+d~erko^CbkTbu<A-IQNe&RpSaQe#
z9F6(hG??nA%Z``NLA30fMD(s!MzL4jz?U<fdx4Saw=;AZm1ZViGe3W-xg%!AYVqCh
z)5tQ4yF)7a%zU4MUPImpgnq5wThve&&tXn*5|8Nz#gdPFdyf<JgRpYiu+x3i6v3W@
z1hr;YLewMb#~<MF=1n{Je44z#K5#9e;J%b=rmdNQ70)*Naq}|5z4Y}nA(kbBZp&(}
zL|w3BOD&d&I^BzJlPIE}gzf1q&#;(Z%)&^c*i1I73j=t74LHS2>5$Q+DdDpO3jG86
z+Nm6u8~U0oUw@nu-`}kCQ0@RE!27cNsd{h6{Q&EJh{XWRV(=Wfej0gIBEB3i7053F
zhNfpN2RvVG*5k{vIp6$#+ufO0aYc}C?53XYd&?a^wQO4)w6>qY7@I~*c8y=}!pqpd
zm)`*Ekp}&)r?W2+Pk{AUb4>GinkRZi6)BK~M3%<rOS6~aOL+QSPMG?wyXZ^x{3$Y_
z<1^imUv**CGkPJ$z{X&26+LrbEt^hH`XYKVARm%m*4OxGAH;E98b!VPN+%fi`})HC
zUHp#P$TD*<Ei#gHbDG5E14GWouV$5{vr1ek60SlQGUiNVTGGPyK2Y4k39$X2i<)j<
zd~t114cHRnfE{ITQU!IqA#y~e&NxB!xME#UI98fm&Biq>C&+ILUGKxi4hFp!-w<mo
zjMJZ*<2iUSll5&96pMs@hMA|uJ~}Ba5eoRViZk}zmT*IcVtgVXHjGK$ug=@1ABKv6
z)&PB$P_oWOKk2p}Kv5L-o7EYiekf#ax_Ywv6B3~p9Jnnv$r>4V&8nyIP#{AT*AYV}
zSJ#)B;@(VvL9e(7RImY67*RYDAtQ%tuSJ-+uO2nHRPLab@`PzcS1*r_zluViWhS=K
zEgB%#QR=Is^`elJSE$1L+KYr%Fi6Re^XoTOxZmG*kxf&IeGp00@OP$nI(~eii0}Rf
z3}nE6*Bm*A#(`d$s=x8Kv;;g|4}**yC8D&HB~I>+M!FyDUmlX?A6~DIL(GOt|Mx>_
zjv4eM#h%vL%puDj`|(EDVY9f`DAMtnjJipf<IBfGa_1-$TQSrGA@KS`q@e#@yu$+%
zX2!=kZO?IMB!)RkpNbBzG;w5(e!2-SN%0D)Cz><R;vU>nE!3jfEq2Ma#|<h95AVx_
zZWnA2Svtkmd+N5hK->pOZar-1$5_x|EX5*87H0QV<`l~Y;W!y#M%=n0kS8JJ_}TPi
z8mY-&7R=xHL;q9;{wevmgG<*DhG83^Qztp1==IGRiYoIdQqT*SEga}$r{-u591ZCH
z6)iNG7n;o?NnZnkbyt=Otw)T;a3Tf6HUGH59CS-4%ZtVlXdkUrn#2$ND8hak9F`F5
zECELd`RDvZyz^V{)FCr2S!=DegdOzC4OF;eqa<0Da;G}ik4h#}?L(hN)TuH*c9p@N
ziB9mrq+Q_C=5_p6#(w7i1jGm)NWzf}`As8^It;l{$<zKyZ{6DvWbSF${#@K;G_<$#
z0nsE5&#gyt<tZenWQs((IbBF}n!FJxI<Ph(G=|HH@6GBw;rEz&5_~lZ3sG-(lj@RW
ztJ$)@=|;dxk+|^aBfv)TC(nF&qC}(%aLSaJLn}XFKt<QkY9C2ABkU{Qh_3Bf751ND
z8VO>;dGYOZ1OS^zV!QRYVwkBjOtbfJUhj17k(K!5`=VK4_m4kZ&twNy)ywm1x4$=+
z`W39%Rc&|ha|n-WTsba`TwGxR1}InwihH&BzLp>ZXr#55ksjW-J1q``Nr04!JsJ|$
z*0#x4kmTWY4Dj_2`(=$Wz6p(?)z~L75SrH6ndrD6y(X`==k|~XY!&>xLc>`^SKs!)
zOO`REsX#QsScS&eshXoB?}SS~FP#ZiSZDf$6EUbyewL}q)YXnmu!{XN;&w7&6*Enx
zxvvwWr@%0;=2m0uT@O<k9B)Zh(fZ2_-)EBpmc;hLKy|X_6r@1$MnrLlbdoL{I*x^<
z3B93CB{o1kd5Fj}P)-$R%DFdVLY?MwwmPtxQcJ%~BpUBa{Q=^>DOVxs0uf{8JwdDn
zHu+*tL<9C}^ra)boGNGQWqg%pHb;>J53thO<dHF?WrMG*3*fGvq@;^iUnnxLPo)TW
z$Ht)VIF&Q<m+~bci{zbtUv}qg{fXIhF%;?wkH0?AG&L4C`4E`yjHKYDAlFB;2he|w
zv%u}!%ATg8e}Pv+rx0Jbd!p!QNPXbnCCoVl-3^{AW8&r*IWBh}=XAkc0<Jz_Y;2r1
zHlCD6zd$dwztPYtzPagT<_jt#kW`rVM^tdeEIE|QjF&QX+O;?(<b-O77;l#YR(sKt
z6B)c~F^lTbEv5kzq*A@t6!1U0N}a=rw)aGTIxNLqqx@a_Wg#LLH0U*1`Z#oG!%V+q
z_?Ol>$}}GI0*Q2i;3k{Y`OiZGo+{~q!Aij1?DF<pdt`ymrB-3~)&D{>5DZ*MK8EC{
z%8ofOeG=l8=`s5ua;Cmf3?IMWWpC%Tztb{b;g12#?>Br8a2YdxtVG^(D;Pm&#ktY%
zOGUYeA1pc@YSFdAx7J*Pb}w)mI|+JKwWXZsCMiGKK7^8`C4u91c_%-COhwP3<@>~x
zk@-xz+be!Jit)O@JtiF#TaW`V|9NQQelI<y$h`{)_!7#7U)Y6Ck4#=~yiW-vpfWSw
z_pspE_{6Gz+>caG&-=J6MfGQeVaD_>Q6t9XoSqxgU+bS?Ezl20)orr1X{y{7$nPks
zo{00)F$qNo&K0ryG91l$yzb^<-h{_pt3U0V>GopYbhFv)^&9o&jdKegtQZ8Y)F(Rb
z&iI#;Lj0W1mjf?P$x!*}UhKA6sR5k7{`0PhbW*7|Tx{&Ku)iQ>D(Pze_{UYzr_u8;
zet%r68e4NL4M<l#9^btF)C3G9FE3)uz0@%8)W268mrMEJvd#Ei&hR}pQzHB0=lKXb
ztVP-<!7Il?S*mus5A$(B_-J?ecUD(DT%0c^Y-lmo5m{O(L?YTjc-zwR^+W8M%$`T-
zWvcIbQF~^c15TIMQKmW)vXJ!Px$UPozqa^1VyI<)3bMLi&o%md&RL>hyfU_LqSDB`
za>JlmG|Ru-c>2YGkuRMG4cCX=!Cdwc>ys#um;T#MtodmL`aDyi6)aqb)wsF8tr`3H
z{d(jQO@9z}`R^hTHq-2C+J>E%&B=Bac!6p5m~U`nE52vi3N6HJ{h5`iZQM8v$q}LJ
zxn??9dvlr&-p@eZ@0r#9BRHiS1vyzmhtye7CN%^v8b|qi)uc*@t-c25F2vneWcY#M
z=<Y0p*VpLi&IY5imC~?uD#CZz5~Xoq%7JAaH%Bi{u8T00JFG;vOMgy#vE>V*xb<5m
zN1HBgvzb$;I{d7q|0fE&Wcf<;I`@XHCad&~v<~;{gfkWsM3^74P1M+H3*zlROgmz-
zEaE1Xe<0IMawqE0e@!M!L@gAo+>l`EZO?t(E;V<rP%r1DwW)S~kWi@fTV!`>H4Ae7
zjmL`~w_BZli5H&E^`7;-@pKWy%$b<Oi0+BM5{XAKq9^lf<#|whzX>z+d~4rS%e*Ab
z0=;W|i#s-+EjcZ~DI`I*+8j21?v&C;q6(`eIH;EE$#(3nGilU2p`3l7h^%^%?8=#Z
znUyKCI4xI=!=76aEeqsT?<V9`7HIY_w6#n#mU~N~@^<FmE4Z%98~qs6`xw-Q4|Q7V
zP;MOM0U;SZ!mvFaJ*ISL28CyuAthxK$zV=iZ67b%nn9C~b$?!0n(LgkHh;3NOL#58
zcvjF?N0c&4fBJkl9Oz8F;pqwn(&0hrr3J&$?F3tog7)3MPF9l605y}6ri1EAkKu3~
zt+o#DSK2lNdj99YOnLzig2S*{u~Ib8IezrKm<or#9$8TKAzL?`fBTaz{7h?=yUs1L
zzEN`0@ZGK-@6#8*S>D@AmrHen$aDQ_XWS9M#Xxr@Wev*mOw>NCMDwo4hnF*F*UB(v
zyy*QMLGS4;v@EXv*<?<vF(B$1(yoSA(5q=#3Tx70^A9XSiJ}pRk+=O@B%HJldK#7R
zW>@mMVxty!OnmPFhCcEK*H$Nx2CuC>iv3djh}nYKt;Qc6x=lhSg{CD`7x)Ibv8>ZW
z{dnWkT4tvT`kQ#$w2w^Jul>NL6=*8odDMvF_*IM#nuK0^h#lpcp$7qMmwMN1K@Vbg
zq$IgmrrnkJcgk3qgAOWXKRKNUPKgk##;ejZpw8A4SwWX$wPS6!skfcDZ!WGHfoKC%
zgofJKs&ZORx$+`Rf!0DwJpW%Lc@t_&$NVFzmd)g43K}MM(TVDaD<U0rbRj&qM&^~x
zr9rdf(#VSZE^af#Jd0Fi%^z$^_c9Q&Ji$DEo;<w=cvI}OpL8bK@#(@oYhA`dGQ?G4
zIL+gf`p<XQD|fViJMR}U^v0_Pps=!k<x|#-ZPNn4C@%9mPg&N2hly$zywgGqtVA4y
zRJBFmbXT_*6s&Dzm@ZNLVAz-+{lVnLtiVHY*RZExx~cRV9f<xeu`de4+y4s<QFOye
z3{CtO+M1(Wu8JF1v0@f+`G<Q@e9fCbB*jL*i%z9xW%!-l3@gf?$<7Or+FZfBG#qH8
z^Es19234y4-nkt9MdYS}B%_WfJ~JoKD<X7v=b!>M#sgKm1#+qujfZ4|@-CvGTiC;~
z;RGnGsDqJNXl|s8zFQE6(wJE~%MGOz9fe}a=qePLw1*`l@8BzgTX003-moiM4JrDu
zHVFV>o%PpN#OgUfD|PW>u&jLCuveKpV_NIa3sNAXl{w*>28DeWutKKcKr{)QKV75F
zS3d?zG$bdHQ|<r=BZmbXDyMR_<}b-H8_|t;`zkml%0ObZY(MMG=YPF^R-4b?oBX!+
zBv)L~G;6TMAHU=p(f@IJJn7FwBmxYkiEJ{O_E-@^C7v@wJ?V@g%;hP+Pv|&go3-mU
zSpzJTyFOi7*OOBAF&fv7`0C<T&xHRhZeIIOdRY;EK9b6&3kHu1YcU0z*WOXzNT%&m
zFsvr6Q`9cam<w2Oygrim{tl9__kEw{qnq2bx*ErUVt-0vbn$`Ziw>4xd_d*4Wb8%d
z!AyrrOsC)c(_*ej8Tb(r#o+N!@S4r9Xc#yNmsyM$DP6>iWKCCI)wI2c#af(?2ZKt}
z+*!H+MU8m96P$#QnqbEWSs=MUGI#f^tx?fv?!957Ki*p-{T!e_(vsk3laRmHMy!3H
z$kTX1Kj3KT>RyyR_H5`N%BgN}^tb=H^tTsNfAA>BKC&M8td!Z<>-;$fm>;2=1k_Y&
zmBU{#O$3SYD6^cbkTM~oobp5tN0)i-=%k`oJ@fdeD;tu<s+v;+7wrxg>~zcU&{Xb|
z<ouIEvKS`iVW|=w>Y2Gq)E+JqX%A!O>P0xf+$5!B<a=MIVg<~nonBfF|1zw!bhxBb
z$e1?Ttv5v1J@kcS@!!w`&@!HE8Mu+Zh`wYuL#scjTkGdqEp1kAQN5e$E@AOqiKFie
zRR0^lwY;XwsYYqRGo^hE+ruJb_=AeQ|AvjU`IW!r9FnCUr_)$=^f&1t+Ff<h@&Ip7
zq4}o@%J=6$tAWj&W39__c=}((RhNZk^zTo6sv8pL2|?xYj^}5<q6^-Sum)n)!LS%U
zijho9xW}UmvWx{%a0X#vrA^d<#oYT=AQOQgL_T9fxCJ8=g4r$iZ7S&S)Y0ju-zd*t
zgtcw%tC6rAflV3Idq8Gy8~j6Ed{t@jE%Khj>?#^W)3{NkHh7;;j%@w5RG*QloXr|s
zUHDG_+-h_LC>8nV9v`TV-Ae{eT*{sq&DRb(G~HBpcLMOnt~TmVpFr)5;k)iuVm|Gz
zwx23qb7Y(f(eM4koxxEcvOTK?y><=TB43jkuaUYy13{BzbxPO3lJ2IcntLeYl5yve
z=b*8wM#JCH(#K7y3B`+++9iK2O}4x?GA$K+_DIwaI~_3t{kgHqAsAtGkQ>^Byg)~+
zfYPo&v0+e$PBY3Hk|j#maz)lJy}L^w9BNFVL~m5qeEZer_U3}|fi@-xvWs-v56l5x
z0l}co0>KB8SEm<XqB<b{+$fO=o*2wtym0>F8DApT<b;gpGM9ga1aaKZE;7SN&HW_H
zCh8Oz>*dZCbMcq4o!cEUI?*v*s$C7J@6bVo0R;;rkOT+AY0epnM0KWo`PNxEy=&z<
z#)zv_SgEl%bbek$4oVnhwXZOW{ro?MaxI-TB;|E1*8RM45V}x=bjyAykSu($=KG|e
z4=QnGxlaYG!HHC)o{eMZG}k^sU_H$>A3~ZCi}6(H%!{(4I77i=VLUv?3R8Bq4P~V?
zlSSK<7NFf%Hg@asE@th}3s}(vCq-x`M6C24=Lq6ugWV1yG~OF6coq|e#OaZizf;xD
z1?myvv6AnpI;Lq?K=QDQ&vlC|<Y;XFN~=Mg1>j1oMFfC=S_d}=jETfK%1zLtf}S-A
zZh`^&A-FeJ+T@9J=qy;fTM<eB7^~#Pw^aO{W4@CZWaJ8R@Ok+tJ43?ivo=zUnbR*M
zDHcD=RT>!7b26xxP>wiQk@P|b=(xo1mYdJwjzQ&{0($u0VCDt#NWGB513r#KzlT=z
zTz{RN3u)ia#M2d@|A~*rK_GPOtcx0VM|LSaD1S7-t|bD!$Buhcv1Lr1HG9wYrsrTh
zlY#w7$4k(p$dm=h+Tu-^zJn)}JJY8R$jCzd8$F_C3-jWZ`4X|%Y5=nSe<-iTU_Hp2
zYz0TUoZY`mS#k({Nw~*9Uyc_hYWu3*kZ^^di*mS6`*dDl{g4Wdm-f_bTR(m6@qAJa
z{&fMqJ8=qi0<psP7QD5u5zGV#nq03yYlNY>;y)&l?$$Q>&h8z0eP-$=Vts?(Vw8Nr
z9o-w@qJLoc4qJ{iXL{*&m_~Qq3BPcP-&zW6jh<S8r}qZ-R{?rVK;O|nIe+ZR_<oe#
zU@n%FZjehYubdm)oc>g5aQZt2nL{0&2j=F8cCOy=@fds&(P_cUisYumQdHns#P;-=
zv!&U_`o!@v^6<y+<i@lVVYuZs2;-fZ7rzMK6sYm!pYm;AIQaG!D?jZEXKnskGTpD5
zmj_!IP;mRl9ytCocWIAcH>9P5Lr_jOtkFpi=$t3_U0_Tm;_v-MVOvlc`rr-d`E(bG
zF^C~Fw7q16To?O$2>DIX2w~KxaTpXQ)jHK23${WoUZ$h})Un=4(Fmxv@zUFkUvG6<
zQvt_c5HE@Dr;uaH1RaT@Q5dgvNK*3xY5}dUX=4nf*9H{_!Qn(+@aL2<LLo+|Zr%<r
zxtm@EkcK8dIIBal$fTtkm4GKT4qoap7}1w$@Jrb;6f-CM7E2;J7(=#ETD4(bWexX6
znRqY}!YN+zB#g-qe=dw!2n*v4edd7l5vFj%qiBv!lw|3qev%xCUaIuY_@7Zj`uRsp
zr9j(X=x{EFGy1^e;}oSh9N)0_Zk9xc`~vNs58+@KMdUE4?ohJ=C}2NgMeS#e{zeVO
zo41KL>AY#b_0<E5RcMni5^A7jsR>R+YvfzTFg$e2DIbtr@Hbz*tj%cghFrq+364=5
zfc(eLkfhdX`8R0i?GUPREoXur?4wIFuW%q*$;1iL_fI;6xyM~%a_0}vm}RBYvB~L$
zyIje_2jAW&bl+};fPLCq!USw?5RgnD;B1C*r^8yqmuEP53*GrY^Ois}%p}<oRd62r
z1i2U<pb3X2|JS709n1-y71HJ5R7!ZVwhT!9A8whQZS@G4VcF)*9RXFVK9i&n5saOI
z7`?baiBZcdGrT3OPi4;OADQ}vwF4>xvoo2bjQuS}4BdW}DQU~bz}xR~n#N*K?K_hS
zgvR36{2k-k#Z`iII-7!OTb{-ZFjND>#d2eB$Y8i}cj|MeI^LU?N|xd)6{6N3qxbV^
z1JFseBcFEmIxuk@?-%yruVeuK;AVkLtrocTB<=({1sACx#=gr{+LWrBvvZefiPWE$
zbjD$oYA7{UHqNU(G$dnEZzi^V%a~$DOFW$6R?05WIVQ1UFD0YBuQ>d~h?36&OCeTT
zBo}IeaRZRBn;cd?^hguoD<A6LQH|2_Ut4X@ym>YWpd^-F^CSH*;soRMDj%3`{KTIz
z`9_{(wl%?|9=s&rTX1wMPOuof;J#83M9KFI24*)3ez>_}+ub5TcRfupIh~U{wDA!y
z@y1am6xsBO#OC!l8a5qDL^Z(XBjpsV%|Y3px%*;xIe5}C^^}CJ4MWzjiF}Bw2l7ko
z!>zs2E-v#7nf*-yBINo;Gp}jw8>;M0jMQ{ZH8cL}rKSW*k!bgXl$EpQ(G6@Zw{bm5
z3Wt~<0Z}C*q*i3nFlkuvW^<VE^z^fZHrW4crZmhnXtoAqaH5?v(jDLg=ccq7P@T)5
z#WDt0e`n446=+^=`EOHaE;jKJWLM9Mmb$<X-DD6%mmai`I&h7gh&@9k>O1yg*>PGT
zXmAbuy|`u`ZK@A$gSwAXJAny3#p};TLv?EzNV{qxQY{ftv~*<n+n15^w{(4R(4Fhi
z=9t$Ee9+pQWsT00uyDF+(UL6`{*NrmY>i<sM(+47ZFOOw!9fwBP`A!B9&<r}0)*X2
zh6ff~sdj$K@{JnszQ?vJnPalW2gad>33-XvMnxS<mHrXIwY-)5Jedkb8Xg6I;nd_#
z|5KTi?q!Pw`hLIY<b#A1G9P(;aX~02_^ifKkdl$9f?$=mIZ2mMO6a~KuVFIsiLOFj
zQ;grNJIcj7`MxIJlPB9$hcU$dE9DC0I}I>g4VeuuRD$90vFCr}j^w6A$S2pazk*0N
zi4`XU%u6%+QT=}8j(^76c*PG4lghZ!ARcz9!tHJ%V6vk~pv{d@#7<>H;nysJ_jTeD
zp__%_cu6DN3Eht~^FZ~o)yn*b2YZTjYqb`d+5rENJ3e?tkxLqO;?r9iG%|*DynA{b
z27$`%E@o^P88*MTT3NKF`#Uu~94EghZ*O`N)}c#}DDJUWVMNHuY=<ncUo{wtjy(s7
zBMD%xx6ZWSt#0${S|0SJShCft7JJ$VG`XBj+ZrGVT4kz(%uj%$$J)YajIpmw3>M<<
zYNf}4ZmIN~lWZ-RpAp$sV#xggv25{;+w`~HQ>1qx3eGt5f$fZ7qP4ORY2ANR&2M)9
zf;L#`3fEpA;46XK+IoiD?V7gPX1Cs2X|buyTv4)<%?iN!V3@{%?1tW+#TkPWCBq@_
z5GCng=?%(^-G3OknmC8}UMd1dc%NbcJRXibv7nOP|1r7@X?`K+W)o(3hsLmbHp-do
zQp@2&8-+u)NFyXN&6*(d4=fED?Z0*5U_U4ym62SUMD${4xjsrzwsE?xJ&~Z*%<bKN
zM5o{$|71f^mY%9mE2G}9PVv_=w>-NNC9QsTi3_Xg)?nUiEg5$<xASI!ES9lS(d{O`
z_v2YV_R{9>wRzLX*BDbXb4|8t#%#opI5J>vTzq2A0uNs;k>;waM^t^jeW7;zlaVX+
z9BASDKSIZjdzzlJl)ha}*n76Wp0^HEXcr;YNLdLUI<N<Is^zF=WeDtlzNqz(p?bOd
zkA9Ac6fv_MecKO1MPC^O-wCY8_Q)0RnM1aLSrHvyc}l+)3uER5;<%0UA`<X#$3btt
z8TfTRFFua}*KLuF*R2HP$UQS_L5gmKUCB;Si1&(5>J5lZ(q*oYx#oBdM)9D3p+!}b
ze#e~|dO|a)GwGo<Z5R+js$g-CmNV_>zVf<5g?$INDAK0FTjo8;RomBzZrfih5A+Ag
zp_%lPm9Y+H2Hm))TZV)aKaCEGTRU*yyy#)EW-R~cBIAm$t1vZcyupG%cnBb_azh5<
z)}Uj7zj;Nz1xF8vc6pZiEjrK}E}6)M32TDn@LEUL3u0hJ5NajFv&v4IOoqk75klJu
zv7h9JXhUyg=6S)R*zN)gZQKj-_wr;AhpweUP=Iib>RdXYj1mf%c#e%HmSLyz75^T+
zb44B=mnN@aHM28Gj$*=<_Qvjv%qzDNBbmcOk78{b0Z&?z;3>2DT=qFN`PXj=5HsV)
zTB+ijt<7U>cBQxALkwz3X%(xGS6@B6Hb%GqhvV7b3=M0F!0Cdzb~2z<O&pJ0Rlb&y
zRGm*lCzD8yI;$BCdjN94k}if=5M!w+31jio^L;h((?n)116O`_?3FtSb8Ax=XVm9g
z;X*E>jPcqxmc}CR8!G{?^_3Z0*LkmL;V<?WOb;)#UKV0GwlFBSY!~+Ec(yb5Y7y02
zZX6yI#45@1d!JpH`Jo>80#hRkp{C-mM7eY_Qiqz6D~y=22?ziPTn?w6l8o!=hC^h>
z3P@d+)5DJ<!_;SN?cbwI!Xts`cRmJbKxz~XL?&rLigkv6By3i&EM?X+Gt+RTD6&O~
zcT;iFFwf-KtYomN+0yD=A2Y6oi4=m12+>1M-$P$<Waz_Q_vyV`X_s+-j(^sf<HGQ0
zjm1HW>p}3`kfV6hn`bKF6m(S>{B%AB!viY$Gw@>;yvnLbO{ZT$4nH&9rZaze%6Ll6
zzsDjQ4^5$SAwJO9*0ahHQ8?co=#e?IwU0;U1J0$8>>hsrwOPw`e><QvB|Cwq6Vh8o
zjH78Hr~acpGTc;NzU&J<r6ecvp74C|#1{*PFzEQh%V`=Y+tI4P1J25pgt$D6NeBe!
zD@r=i*;NbZp;2T#@tAX?w1y>xj5)^-dFtZ#zMf{UZ~UlCp$k@FM@WQU0kMqD45;Dj
zykEvV$sh%U7Zes0_b6WBcIzHbS_RyylY+Rs%hnMK!UH>_)7`9VROwX^JkjM5@xy`@
z(VzF>)La9FX6}NPm^AGZ-?KZJuB{HZFoV0FG;c(+50I=vkT|O-FIw>?+5pSQV$z$3
z`+GqV14<$gU`!aYk=2`g44amgp#-|k20AW1RGyNfwfJU*6~FdW(1hF%jGY=iQ%^it
zK?T?A<*XrM(~io{%Kv;SDhrN41iO@F3fW@;YGkZB)t<a5_z@)9V}b{F+2aX3E9sKh
z<Yrb|6_VTL*JDj*44c(&rw)E{v1SM5ioEJ5<L(=Xi;xMxt_Pq=c(3t2Q6?R*P+~L|
z7QZCJP>!=Z2&3Z&32{XyKR}UL;4AndwKLdAiB$HC#;suTF)BxC&|kpKJv+1actT8l
z;nx5$*?$*iZ!%L#A^ft#N0#Kzz6=k52MNSv%-P6i5l-M}TN7hd+>51<FNy%ISSD6;
zS|sUWx}45qcc`!0;BvJqoO;|!wopP@KA|!ze6QGvhMU`#%l#2`bQe8!R4A;0{28Kb
zHDNs6&^XN+{ZWv+n0L=o?;w)y@3)K1t=H}twZzz}@n9;!lUkuyh3*pzV&)(-tW611
zq6=}za-h$%Yf!u7M9%u3oG2Jkh(ejb9Zw##cbZ3x0V~1eRoohvNtP;z5*OXuKEx-V
zAN9ZB|60#~6n7n8GF2RwC5(nxgpI}ARhZD(k@H9%J=KrD6!mOh?TwDs$&RiT%jB<H
zHcu^AJ2Yo`Pb##XmAYcse)cn|1`2|U&ERXjx<gX#XSUx@O+_u!ZM$ma+I%p@tQDNa
z6$n+oIZjqHfHM>2*QHW<U^NQQkt}mUf)1ujHYy-mqzst>%@i2GW0rS{U*s9;1KHT-
zLw8VB81S_C1juMWEtO;1!Snae@QmKxHLF52;!^**@QMQmMWg$dS>hSLK_ScexI)n#
zS959kVgxH@Zym7ve*noqHoqYSXLmWU6U2(kw;IlDna6;Q1;yEZtVI`<b(C|c)F@T(
zOkmkNCPR;l_j&C)N1=n_O<XQOrp@mDVb^#1K4c$qp$aFWQiaoFo!smMZDmZn2kh?z
zxd65XOQdfxA^?Qh1ItdH0mp2BG+-+q_j?sboMnLS(Wqr*b0*`YQ?Ne9k7#&0+`U%@
zaAN~gw8;V7I{usVVL)bF@LGEDD(o5uxmE`+jhCtinAXLpof8Ks;umSN+Nu@7^T<{~
zo!GcEq{#Q<A0xFJ%z$*4BmCy5>l8^u(F)%5^7PlQ@DK8rT2mseV5<~y<vV5g!^t11
zS6emRu~?ag0Gt8cVF0zLCTc?@3UH7sg&gXgCU_{x3xOmj0O<h}nM?rqESbT=@^+%O
zOEg}tH`&PsBT0&Jt+GvwSFv?EH|T%mu$%|mnLjpyZGU?cmb&rsD7{PjJHQ3KLTf4n
z&)k5&P5JiT#?fl4T!i?6)j322%2%Rx=2fcmFEs}24Wd`{0l#W2ug;%>G}`Svu~e!-
z04r6Pi6+y?@|fetUG#@mqsim4G);{bh<zr{viZo(u)qxPKjV7RwvX_Rn$~T^=EGW-
z@U4)x^mABj&CUy2H3-Tt$P$>DLN*RT7Rn~ah=L@Wo{VS4vpot!q;vkUPxuw2su4sy
z^#pI-gJiIzT9wLn!WS#AFe^UwQuEFUf!k7BA%5lWl7W>GN~e?A37dR7*d^KU<DFb(
z3Qhkyj#7B5cp3lym`0kk7L?4Vi#n+#<0v@;rLcTI%|IX$Z{@4@^&i^T{sw+=_s^|8
z_+B6$!%*mdYfce)XKtfu0ZVZaWtQ&YdPTR$R|BwLjDSWgW&<<i=c1wGVy9E!r^${j
zOi%GzJX8=~X!5Jn44i^u9BTm9pn8CZMqAD+)0h+6(<>S!&2V^v;ZSu%5>+ZRemh7%
zxs3i<ULo>1okXn{+HHW>Qq0^*vRme)J=0tev#*@6eVPlAtvn~~^)|Z##!TaAX|9<Q
z52%KjhO<2&PA+FIPquEn+vSvP&s>_IkbpMMlobyyQFzSeQi&uQVTwk-X0cwaRRd$2
zQn^uZ%1)h1Jfyx|`mjKt#uF^5;zn`<zutH7Io;)mA+OG3$mY_i?09w}GnO5<l&!cY
znvn~aDiwN1v4AC(T+&hs5Mh`&t&el16JaHsLbC;=f|<q3aAJy9^w>g&vf}Cz{x+1>
z%~#ZXruhXJk4DN~F6NX{i!)W3u(0D&^|*4VQQ`2eD{+5;7ldu)qyvgChb98R&araU
zJNNF|*)w<}@dJIW-+luEeciv}efv5t6NL|<wdMD`Z!QJl?@5a^yuP>m@!Eykt9S3W
zuby5%`QzG+zpXyF^V7;1?~B*bjwXhNcBFq=IlKPo!}S~IR_~u`|L`qKA9e0s0iX!j
zJNezl<>mI}d)`|&r6H#hXzj}f8}EMxaL&GsKf?eH`zMyehz<_JJ!~AvVdR}Ux$(wL
z@4_dL2Y>`ad*L@bQ0xUb$+*Rl-vCoZJ`RU4))ew$(b86~L)L){pLk!s4Sm2fOiXO_
z@a?T!-#C4-edWyBh12a1zu)-!s)D+70(qZ)?7ejz55&8Ds(t;mcluQOAIod6-}63x
zTa`-IAO5X<{!@sRW*b%e?9#@yB}j;H_YLp%Wr%He;je4gU&C)`E(4|gOaiUm{k(mB
zW#hGfct3t+g$ocbzTUoan}@<X2h)1kz|amWEhyC4Th<>n8X9pu)OK!L9w4EF-7+Yb
z2@<sAf_@lFMzkTymHgc;r~<D7w2Oy*r9HOCaG_UhU%Y~0haex_4c$;aMgT(3|GZ1V
z=)?|1fohZB{q9)`7=htE3~!=G-KG=N58fH8Xre$+Q~P9-W*~WxJX23AXd&{kI(f#^
zsZFn-sh;KrL^_DQqC$Ul2+hMueAY$9(n5)PBF`?OV5(Rh|HZ3Anuj1fMjM`wzn3E$
zoP&m+mg$-qhDGlUxs798wV@Jvx2-Ka*4C<))<Uoq#jtLktU$qy8}!|*lsJbb8jyQn
zDq&fsxv)iU1T^blDOFj`T0!G5MXR!D_!^W>=0&x<di;-U$nznH8vUN?OCqHJf#|8N
z)pX(4XtDImcd%^w<$AVk5(qzsWiz@3({N@58T9_6_R^Q_x2|}%-$cKOqkoH|->C(=
zHFf$GCDp#b`8rfnl=uP@vsBbahJ{irKDV!YPrV>kAAIheySH)jJR;|jKqv;O5P{H#
zK-&CvNryxRKZXAU-Ppl6dhzM!W{8Fh|J;`%ubwlgqVUkS<8e&+)zV_Uw56VcY4In^
zkEv=i1>L%Qeiy6`Nn&hBfxRJgh6D|<y)k8f=&f00hQ*&-)wDiE_~1YceC_w%#;20i
zhv(NGF0ZY;i;o%_mb$e8N3vn6wmU^?54(8;Ph`Ogp1A?YRzdG&Hk$QTp$XRiqmG{_
zghioWaO>c-jux_s%vb`_SHYzXxX;ldX`!fumCutyem;-?BA1c!uOg$U#h=IMr(kYB
z#7{x1V2QS_2LZ=NNPJSo$H}#`_u9*Ed7r(xcIlKrEEYI4RJa&SoSnM)f}lJLvix27
ze)aBGY}&lfzVvQigb4KyPkGB9Z@hofyZ9CJo>k+HpF8a>e*g#OH*R`oPFd(JqOF+R
z_EQ>&66tdgnL7vFy7I`o|3&+w&)04&<BrnN5F~`+j^XdxO~e9}p^d9|+ROj=Y2^bg
z&OBHnQiKZ-KjF$qY6hcDTF4#Rp!cQ1n!tZ+K=C_Bi2BpY89YtxGk;or@VEAR*W2G+
z0gw>Be)aA*>lc3TF1|~OlF;<V2=okvcki#>KSe6inDw|l3bg}rDV?l8dLI*sjxl@6
z1c}xa6tI>l%>&Q_i_N3BQSaN=A<X^BGc=ht<Yrj8zOnS@uwi}Qe(%oumw)k2eFvEt
zZk)jgwvMQMZ3&2<MDh2&c)k7cHSglB^*0`dWA5RpwaeeaC;)jI?_cuH+)<FXdUpvg
zuk}ZNT3uNo(3Lq3@;?kycHVHY>>*v@(;8OG>AoO_u=#>Z4=hZ~mdV9cbyaSYt8N>j
zp%pYUm9WNb0}2h-OcB_D{viPZvF#DeNu;rU>vQ}PmO*$65vEATCtX~0(n1IT9(iFJ
z0NE;sx^L0Gcj7=8J?rP*@IL&WEJTuy;^ZB=G$VI;bJX~Feb6-dst>+sMzMG+ebEd;
zVm5*P?RRKx<sI*vtL+cJZ@+tc^Riyw{MAgryLL*ml1NX^YM}}@@nuvYiwH;RM#Q>4
zguF`^i~>3GZhuQsw18w`Uv`5?<Xd!pg|z|u4VBhR(iu?^(Xo@V2EmNVU-qm!TP%Y0
zcLQJ?SwIDtRLFo4feYyR;qTtLlisDTSMR;;orR<~OnM#0czXxGq1V#8VccQoEq%R?
zCeZZ8>GOCtF%6tP<$ZQqIPC8=x=`T9Y$>EvL8Sz0jG~q(Fddz|g9SNqa~!STxViS(
zeYW*|`t|DFPra2lFfKr+*1mAoFjms$`qH`8yKlAc-lg_8K$T)EgA)L;qtO&kxY;CY
z4AMSbc^~@m*_Z3ze*#7=`ku6{HaF*lJAnw~(xvMete>y2EW3L50-atwS#-5gbxY_G
z1g~~tbEJVo=(3tZG0_DH_u|A*Tw0{0z`meFp;fmW*q+}(O}&uy{Pb7eTQ|LXkJ^9!
z1|*ELOREn)r%>@)6##dJOB{KqH=xbgj%}2o9(PJ#eX}T~Gdkf@$e&)KStM~8lJKV#
z21kTsn(CO)Wy$?WCrg@9Z2iY~FhSy0+vk^QPk;rItF5Uql3#3={UG_$E>JX6v#K8O
zjZl_!<yKN_i8pRVN))T;dB|A%VkR=)C$n=iCbpIPSiS@fHdlk1-sy|p?LXn6;4!WJ
z{YRkLmDjzycVSiCyVHJfWBtx|;>b);!}N9fy<pt8@Eu#`c98*uY12Z1=mao^*=PIu
zUGLEwGtB?M!W$`OjErpr;S(=jQ`D;=@WGC=N}|I8df_)aEI=f~{5H_hZUgbayMGx?
z|C(NC+ZQN54|4JJMWzna7TV@L*8CX{_g7<c8s@F$&sw;z8ay_iGB`hxjpH}43xD=K
zUaqlyDI<LZj%FHruynpB7c+P^<6s6)6!$XHj^IC)3{*`;cg|^*n|sObdGm(1axs#j
zjZ$6vwNJgz&eJ_fWPag<B>EiV#U`+=kfz4Dc>-AN>HBo=(~C(q25vk!NZ&~>UhF{a
zQYTGX@~+&aEsb<=lQy71b`E3gVfF5L=G^t@viIS~?eA{3-@NF(^<n#?H?Uy0_Q_v2
zuKvT=5u=-%xTRQK=42rhLW8(<2>eqaflfd=A29o~fq>L%xJ7b>;jjjun8^<(uq1+t
zRhP<AftwvGBm|aIyVIjfUdZKS`3qS|sv()VvUXug_T5=GVN7QJn42&sbC=z&4a&4)
zHzs9vH9LY)nU>s~S$XiW8<uIWdo(R`2mT1gWx)l!Tl2E-6yB|YS#T8Z)x^w4*bgJK
z5d~Xo_cfsI@mPM1sDIqHUp{%!tiKFb-PnHz`F0agD=5b>?rZ~=<I{#(f#ss`JZ3wv
z-hsz$2{!bglPy@TVu_}_wF-2$ZS+ZnuJf43rbOqfEnJP>`;yrbRk|?Cn^C6AbhLSO
zdZeJ!L<8DU@yX)6N6-uUQxj1$7<96qin0J|&k8P%o?k)a$<;@2C1dUNdnh%9yffG7
zH4J?6@E)B6e-3=2rrIKM0Hn$}ge#f=QeCWmYy{odk8i^r15sP{;PZ{w9(gCvQ;k#m
z{3qUL-zwJ)#EVv)Rc59xZ9<D9Tutj)R}#!t2W^S8&}{mWpil>mNl-A#J-ncFRP~71
z+pZNPLYjmgRV0F(&_PKev?Z!ZEM>GMk;Y5n(k-x9llw&*UtgufO1(SG%YgnR-^@!P
z(7Z4Im1+G5Fs7$rVG<F&NwxO<E$^-CaOJ2`oomiJClGB)xv(U*MDA-{K)W0WRZcI3
zt=|0x<m?245o5P{8-7ei1edn%-uG_Z+jywn{}PcajX^IIo&hC-4#~8{y3``^<~r(;
zI=0mxWss?vZqz|SeY+ON--^QZIKsC3%h2RVk;tCta-QgN{@HXn|D+0<CrXvatW^1D
zQj-iE_NyQE?-jC$?hy5{TIV{5fhQV(&1(RjobCH?w(m!zG&|4eIaZqem`%^1{>fqr
zQI`NJ_50_FwUxiNFaKyeUVGPW(jzQ8Mw~rQ9C!eZcl(sg<lz|q;ByMk`|!FR=79sn
z$A|XyAH3H-@Gjnh8pQ2|)@#3We)Zuse3GD@Q%@P-(e%M3Z{?!jecULx_b;FGPTr*6
zFUCegcB9TQ;H(;q2k*2$`b0$C6D<$gjP7I$6(6(!3%2pY-`5^oq?aZr*{wf14X0O9
z<B5tyQIW(EU$@IYu031^Z#4WC9JPpdE?e+F{2lWO@3VKo<B%kCdK}slRpo(rS63gL
zUwiWoW-IM;uNf67U1=7&))qvn`sDD>4*%9OJ&$m?l}CH@o8Rx+m)^U7SANH#L%a4J
z+P!~Ye&?PY2M_K#2x8AuL33!pg{^2{N{^5vk>>zmRU`f=gbMP&{lMH@Mlz+c$Y!$f
z(Igt4NR7ouNA*CB2gx$QfBd>!Egb8Z)dEvj-7N+Qe8r3y1r|3)3s9e9Axxyod99c^
z4TD?>0?13@DfEChWGNVuJ4jQnCep#8%EhlXUOVqyIMsf%)c*c~p9UB+s`aIF8y|g)
zB?Qocin)PFFFU01(N#8~!QuVbq%`jvkMDuE7RZXj1DmoIXc!*5I=}}Oo&?9wu^NEY
z!hH`SEH*;^SG+PyO*|@^mw?0rQ94)QIfpeM5(p%uP%x_nv^cO(AJ{>_X`c5r#u99r
z<i$_BY)$cFgG`vFTWI=($HN3StF>b>Q{g%Ils~3smk~qQ(#^VBpU(pnvcH;x`6{E_
z&Jy6X=Jl3oYv!aw!rUDPw*fdNI|Y)K&(9o$M3E{4&&nIvxD2_MtP@8P$>vom3=(9U
z&=3n&fQiZkTIJJh&j#fKw)Suk?XS4#DBeYID{ym$B#C20;O9|=tf+Mtc#y+$10x}{
zrvR-9Nn4M@CBs>l>ZWmB*J+f>i_~oa9Y0#a>xTisg^yIhi2|1bT}HKxMP<k}=N47i
zj1w}!8gs%qmPptsL$uU99z#r}-Z%-gL^!|O2&6{7P7{#v4lkb=l_QcNSlUyq)(}Yy
z>)^3D3r?vFiV`0~ka`qjj>KIjH=pop$h|?{7OL}@=4jqG{6E+65ug!Fway1hbVL<Y
z2M?E+)^+0wVl}Cgh>~zSfI4HUN)Gz4lus}M<>}W2w2LsXu+YdiL8&XRJriumwhS{*
zQBlWYXOpM&Vrzn26(dNZ1jwjEECi@A11Dfxb&{)>WtT_4cTvTq#3y;-%(u$T5~O&B
zN=)#ht(al6{>?6;!wM@n3^VDE2y@v=k<u}_ryIkl97&OJ!CyM#B{-i$Nq$WPa(<I7
z%s8>|up!cAuI^>iL}^baaK@L0JM3*LWR*{e<P?*dhZHsnKBh_AaY{(go0%|@pxAw}
zdfbI2b(nAlNU@#=wBqS$v}z=KVY6DtYlUUyC98uJ1xz=pn7wbqi?P+v*quxqhe_8C
zGcfoRS@)Ax>T%nR9NX|V;rAI1BBTY#zZJt=*@#cu)?r;avH&O_QlxJdA)*5uEJ2ix
z%vVb+OS3L+Dqu;5ECZ<Swm#F_;V`(v;^g#0Hk1HL#qw;~6uS`=C2+u?{Rj$X*#2c!
z)5(lK5#>c6sIfDB7aGH$_f2p#;v-y$l!}^0-m4(l0iE5+h7vH-yAHBqzFa!i)%@e*
zG=y8wW<`AkDnd9~>rq>6J#AGQc<s4=<f*~&k+xVW_}z7=$Pz3Dow6{a#B`PL0maOB
zz-H@Jrw9VPp(R@<5`HE8K1Nn;Ky7+bKX_Eo)@Hj$m4hYJ1y@-Utx5@ZJ!YJs=;R_v
ziY#$GND^|B)<q&nnK>O$qSSQf&BGKbM-()*>bCb>X>K81LqDxO^lPh=EB&5^7Oma%
zVRT;mK4FAIw7r*vEBod#Z5~zE&8f10UC?W=#F$1rbX>n-edtBdvcZA}n71BQUwjCh
zoEe3o6&ux7z2N2~taZcf;fcJidldV^_k#Y&fI%A_gh4I)Q}oI975ULbJzb$i)!GKo
zaapX2z6EbiX5I3kcciFpQKG8T70vl`B*IF44`CL%-kd1kC&nSI=frq6Zb%r}0f0GY
zGyDhaC*W0LH$CuV#iV4uG+!&b^O%<r&C86hsMm85s%RVRs!`2y5z>^F%@e|w2Tnmu
z8^EKkmuq=A4pYwDpdoc)Dg=0hDDr%}d<G?H`2hMbN)4g^%?r~|IrTiV7-Dg!277E$
zg)dF*=tL5xc5-aeG_?on?xDI<s<_3ypa_d8zh&b}zEaJT?B-ys#1sJQVtT+tB?D5*
zuYBWJsRo9rM!x7SVD6o-m7M~aXbfRMHd{3Y<f>G!&|o2uXzz0*N(rB!K_v~=pim2z
zO{qXD%hj}G*fbXc+bNlWWZq_CY8E+*XLA$W7OUrpI4w-U_~dwe91W*Z6BGDfZ4M~r
zEi<CG&A3BIhBVG%l^iM6`c6`3&)$xWsp(ZZSB;|=+=^Q#3wLknM5&^XPP2r0FGDUN
z1wuPz>JX5i9|o+6%8iLIBr=~!rGU&+>0~;dHl!x*X7n;66P9Qm{};p%lryYf{!s-2
z%2fX*As0wtofB_3%H<J09v9%6V&#aj54CEeNl>6m_~tZlgZ>fxh(9fPmd`_Qw;}y7
z7fa9uC$nU0xZuDuJSrr-@af{zBIMFbpT`}rx#n9n?j>!~Hvnxw!B!2kB<e#^CN!dl
z=c&~}**7r#ds2Lc62{2h>Mjz(<KwqCK7DiTjc>e_CGX3(6N$uthG@L~;uETSf*dEN
zP5br$?F!_ntT)XOMFtLmxds2vL10mC&9fAch*!n<0r7#H7NK+e%s-FCfU=A5KV|=)
zj!l%SaNAXZTy(JvS)r=-X1xUN3&?5Uqg-XtH;&-nnB-{k3m}nFrBH4a#Y?c{<hS6|
zOJp{w!HHN~6rTq;^b*ImplZ!&9&IERAZ-x&5A-tL%}<SDwDS}>DgZCWQ+32`k~pW7
zbqwSgz4%22sP<c%Z@hp4FkxY`N$z$~l~l^LgqXt}VUh=(_>NYydWeIv>N8k(rNRk-
z4SqjO%BN!DmP77BSL=Der}+@!FC=li{75Z!My(Ch2c%A{WIo-SLyCiSwjlzILE$*t
zr`WN-ksFwA=xYida#1?aG=Y$Dr0x`B_VmGdw)e)Kewwx(=l(j9beyB1%=}9iN~#YI
zEh1wezFz;e^!k7*gAyRXh`<}0I!Oh&I%H<#kAUsrG|LAttpuqe$5|!Y+IW&~YZKXc
zzjn1}o_QvitOPy0P|-5Rc()6F*7z&~VjJ;R$ASjSn@4e4D7#K2A5ax#_gF5)n;1iL
zpfL|P(~It6oLujRYyOyu8--G})%X{*)2WbGO{Yw({`2m?{Cj2hwt2Eo`Q>N|-`!a9
zsrV&vEj<<d;lqcOWLj)J12|=OF%p@+1BvpcR#Mx6tQlW5w=z&LYVS<`XVR`s0y2xN
zKr3!(ShZ8c1X4>SlUYJ**>ozN8B6tv+TeLreW`g+u?0cUM;DQQ-q!^jdO?lRY#v+*
zC~>lJMZdCEXw_-eNY~Zq+Q$20*mbN3cex@or5uT-#5k>{n_f02EdivmBIrvRTbrO@
z?SnVhzx>PE_1C=958GF6$7x)wJp`i=a+PKQB&iF*HU8RH>z{u8ue%Set-QN_>raO0
zqk3aI>PKvDl)^;N2%ix8og_NI-D0s^&k<$I>fQU2zD3q!s{XfRvS^@58hd5+(T(<N
zx4@_O;;oIPvzFc{6Gpp4y5U{BNxsr+z!av~wN%{FpRgJAZrxk|_6Bt7o2%aIH~pZt
zxcCSu4v~PDw0t3EEWbonk`2+^DNYy?Zm7&=*^J_aYQdScIpiA2YLLYP@aY3CI*^ME
zdRcKRgdMbtRV`6Y&i^a)K0BV(od1`lq&)vQ`KP6hGQT<Hx1pAVO!@P(!Fv7!KJ(%K
z=gaPbTh49UpV~I0v%*lku_LpOod6MWWu&SHabqYd9}y>-@()O!dimz+{R<mkUxl=u
z`2G1Sqyv=8=rHJ|eIF9dcM1V?NeIu4kKV^ifY7J+=W`Tj`|OwNH`HvSh9yCR`~9@?
zftb#fdBQ?oi0`gIjujd(<H5!CN2js8ST2(TD_!T!`r?RtqTtq=5%EE{Ti+9aig@-L
z)d6{8xvL`~i<jYkA+!unAZHU9(Oiff3W}vdGiST|tV!s45zXjM@hy011Z8EX46bqF
zlhNshuwR@0L0v3#EsT|~(Q0r>2@Km)MlTX3&oms+kVKbWe2R%AQqYd%nWD*9q|vO_
zYG5ub)Js@QEIEcUns&zoXR%Vif-}|ivRm=S=ISm!!q)W+)>4o0G}Y{do_?CjjP$aj
zDjP-RUmK^-d#B##Q-CN_fJl4SZc14KhCyPYhbIKlh0k1Szjud48zE_T2-7miStQ}W
z#ux9l|8_%5>TyPixlcped!K!PMKdm8ob?lY(9Fl{Kiu+ueACtt1S0?NWAEXgX&-G3
z(HQFFo|D*uFcJ_7EiD=GNK_$-grEaV$aKe5B2=zCz=?5BsbOHP251_Ia)1Zfhn?(w
zWQeV1sqA-)>9DyiV}GT*xSw6M;h%O9uj@K5p_vhdhPY0M1VLggRPZiat%092RdX1Y
zH7nSfuPuTiu2SpFYz9&LFH7xDK8Iu!m%d=rC=jlx2F-D#m-TbwtUvk?%gsPb%O9-X
zJL~=U)!KzybW!xS3UHg3zVObxuOwDqBN4FZn(KHJg8P!ff(MZ?rdf~-e4J`|<_4yp
z+x6?;zA)o2b=c|Mq9*wu@C}$kufRVsRVoVN5XRwHF5nK=uqNAR$k{et4ZG{bbQMuy
zS%iJafOXzD!O_tl;Lb<gZ%J%M=vNW%phTUEVwI}ID64fcfJL_{#8E5`*`gfqLjcc9
zZruv`6G%(`Ulxf29(C`_Mc54aqj8*Elx>nf+IE?XvVr<-OB-oU{u$b5wzKj-I^%}z
zveK`RTA|x6e<DapeoS?lK8>u$$WSmZA|Jg|Cs$WKXkWkAKEG^DSLeNT6CWb*xxfAP
zUp0OON)FD}@OIcf^RBqn;;(GqR&*VHPd@vSCL!5ada(ZEpCDlcSy*cy-t{h>qj`Jy
z^*eSvZeP9!nfKI=*g2u^163q;4b{$;4@yw2yX{~iD<E}1Hg%#7NEO7bu%v=kaOARk
zC>&bC3cwfFtpIdqB2iu^CzE`slINOmBPdH*9Vp~(Vg^1FoZz$Jh~qT!ZWTNjREZ~t
zoXTR1r|<Kb5E_8uoInPKXmlB0OseRd%)pF`RlYHr`s;t~ys+^O7Qn)rB*|d}rbK05
zlNpXlF0}1#UBo0FIPCZ3Fm$Hj!bKHyn1ox4qr*}sf<QeEI&A0<hQK<HVkL4RE+nq+
z+`uam@K=XyQGg>cOV~>VOX_yzR@QdJ77)4TWgv2YeZ@!$$e4}jVWOl3$C5yr@5|Sn
z<AO;cGGtmM6f*aClK55h7J_P_trY6jMq`WH;3H8PX7q++O@+BFE8Wboi;mzHP+>Pm
z>m#gF#8z7`xeG4;qK76RJ&~x*xaPygA-PLo=0=-hic6Ep0v=vUnjKO0m>%Fom=b7U
zh8_xn8UsUSlUm4hiA`%$W?$}+IwrxB`X;y3p?i@S2HcWC%{ttsuFkQWyxc{#K}@ku
z8H<QT;)hgf9CZx`QqdLwx=9*&&6m6N%UZM=TrAGlTs(dbTJ_`!%C;)JOg`uxy?cSj
z-;m4^hN~%KS;JN)&IQ;0<*ALH97yuAlo$&R5>nL&Xl|D9VX%(LxDG>{Hs~O^-hpZx
z65vt+8?mH&piZ50W#>qqczw8)1sxD;CJ#-L5Mo-L*tXsKp5OJ-u6;Xq<qz%NyDR_F
zuATc|dTv`BncW}sHZYZxnRz)6Qiy4CENeiY^SE+|@kuDh{*4PlQZ7s&vdIB>`G{Bv
z3D3t6O>pNl=(JG}&#;5$9C+L|9TG)EsFHxcJKrl3PviIiQG+GNSg~Z~mdlEEa8A3d
zHhY`L(p^4~&_wE$&t^UKkT`aM^B22$fVp<FUM)xHEe~V&%7Dgi2c~o${Avhie&tn*
zwji`UnY&IIS?jCeJXdMB6Vkks-)&r8rlHb)!e`LGEVcjfws+=sXU(xi<r=1ofmNgB
zh?fvT_828tp#_lu%%)!xp+(TIad18z8UguV&8}e91s)Sl42Dh{^4Qu}v;l*vM{J0;
z#I3}10CWo+;vW3nv@Pe#?OZ#+uhQv=)mVEe+MbQ4(qAUp45-*<#|`SQEid8X-kHzZ
z?|#GWtRGM#2vdN(cc*>!9q;_@pH|NJhF_K?`Mo<E?|pC7wFq<p2;c`{YUkdT2WvUR
zG1(S2IC$s|$BWyA6Y=4yP~6)Y^q<$>A;_uL@LmA=O{xne_c<XT3X08@>i8%LI%XPT
z1wv-v&2`93{_xnFuec|g?0B!-Gw<GOLL`mAdm@S|l_1{tq%g1w!8ko&OD*6xuNbGb
z)zP88YN$4`20Mfv=3o%Dub*CDzU}?+ci&wB(=PRN@Gf0gd+-e<_b7xb(_gp5YV%0!
zIDn7e+*rEqom}yj?}2%bhV&;MNFgkQ_yPWFskxt%IyHQC=(OSL29IE%T;Z?)@JFti
z;l+!rfKT?iLV`);P?~P!HeH{w33fYzU+w5z%g-YpFi5T|J_ng2?tkcg`*C~e%Z;TE
zyt{W-@1CceEn;_!st>~KA>X>~6y532s5N5*&)jbN)PzC3(nW2X;-vqmu(i+wHpIu?
zdf+9budJBH?&BhvS|l*&Oz4(QO&W?F0&_6BD>DlR^%rQLd3WQZk0rZI2v3teo?`qA
z!u|6_zaMMF4y*VlA0cUwChepdGHy7&lqzO^X_M-AmdW%tNLVU*TgF+amWo8B0tQeA
zK`uM^3u6sFGZZAkxQ=NPm+R@SLQ#5)XKsYMSQJZ*X1z4qg2mRc7rsVVHHnHRO0Cy6
zr&~ekA2J;w%~LJ8rER8)?glRho8WsvE5?N+DuOzbFZkm}u~1ugUv8D^CM_5npGCnL
zrgfN;Rq35JdWw<Z9Exc}w>cNf%`hL|@MZ*^z0wrpHj$d@93PiG>`pc(b;IVy?<;oT
zv>Za1ZBl#;Ex~x9Kk*?Ji@c&=u80wLvk)9GJoKc~79VYMgx>K%svqNmvRMBbX%ylU
zWrpPtvEk!k=7CTRIe%12=wKi28ugt)NaWN-^Bsu7EU&<<(l>Mt8LGOOF&m4cr*Rr3
z#mey`a%PCq)zE@<@*re7KT>0X=@5fDT%k`QI##OlC8r!C?a`Q@v~P`EYA9ijQ4?9p
z(VKMYN22$?6TW?G2ES3;fD6B^GTu*|1riOjR4094GeZo?HK!Ovo3u`m)#e|0wBoO)
z&1RKh*^%UF6icBIW6`eALiMoRX#k@qu%sX^1;VDw+#$d@fglN@SXdAus1jio3r;gO
zZIld%&_qI5&U7G2;2X>*Dni-`czE%IER=VH9HS+Q;wvwHQ+4_srR@;qMg;dPf)9;x
z=EC`ER97!MHY~P2<Lm!}{y%Lkh{B;kA`CM|!2+p)*G<gSTEb}%r`y=j5R0ZJMprTD
zx>-`DRP?bw9$Qtl*3ncYR}>7G38z+bE5(=qYeY1HburPf@)@CaHacR~S$B?rbqkte
z=1psQ2BE`tt03s=g@T}-PlaGyL_)1vizO`xGxiihRZ4Ab*P1s&V}8lo7*vXC5~qPc
zE;XjmD|(Bss<*&Ya0(cPD)frbAq9qcTDs$xKCc7{isYXlnh3ylMyvs%7yIh{Q~$dA
zknIuz-7_uf|G?>_lae(|t52hq9Cuja&SI+$`*^^mO~$Yg0Yz!*J#$n0_8+3n#ag<v
zwZVE|#7uWb)y*7cnEQ6v6^RZ<0-p&-^5-Ts2<y)=K`y*3iw6;$c{`NXosM$)tB5e9
zb3o4cwgV*n2%FPib+WW5XT_X$wCv$jlQX{U(q<FkHb)1DI(y9NuOga(A|EmolGiQ_
z{ZrBVW+>}_21SHW1EU0g9y<$2#4N_$d8om&qvUV70~+NvNr$6=(`b}ne`(UQh@phw
zeP@8~oM2;M5ijK;CJSF_&fr@f7H&+~$&HcyEHksINMs?>f#}x*uH<;b##8FF$|PP=
z=*&jkx})|y=(+_3vrYr#J*w2EK7x_&BtZ1Z2vEV%2cM`kItXvu=Guh?6$*u<)5)Px
z%lS)@{1QBV=Qe1<cR~mPF^S@!Fj7Owh<X8ayoe-n1T@5?fyb3G4aK8$qPI=S)TfU9
ziGHA~)=Ni7Oj>h0(yP$p30q6zI?_NEoN}R6CZsuKb2#JR5ILiQqoN81A9<|3el-f%
zumysFhkcF&KXj4c2136_YJmorG$5t}KEi3?XJ3DCr9q{cXQIIGV8oC~^!Gbn+P8b(
z3sdMN&h2O@S=L}oN8%8INFi>qfo$W8y%M39uSPoMVNnovom_Bpwo9=Bp5E`MijqYx
zg2qu1HFBLr4|oFcBNr-(kRB=1k?SCy!1I{5WLga_keh-ZLsJ9u&it%1z++2O!H0%E
z&gcEh=ZSLlILtXU9C43dr|!%*EP@Qp2v_u@A!~wm&^5dITn3_g&IY#<g;I<w_|p*T
znJKToofM9=RnUNuB0$W{qy`hxS1Xq^jYzJFeigm}^u$W|m<syFhj~NTK!$LC1o<o$
zzWqexAmsEy843Sv`B#fGKZtf#Yb87sa=%3keM&<fH4^2rGw;wJpu?v>_rQOjuhtK;
zL-0<Z?9yDRz!YFGzog8_Gd#>-9PQb&WA6_9f<<*_T^JeRok=Is;|VS-J|%w?zmfkI
zOLKE*WaJ3krrcfwrs6iXCzHv{NIIDwOQn)y4SbNT6bse)`BF1MN}$=OB}o$%_c-2q
z%k;JfN+pwH*(~`$lPV_JlSoXCjg6)!UHTD{u5T~83)?HLa@n7)?pl8KSu~Q2C(&>!
zo*o~AsG$_l^Ds%N!rb^|abhf6m>W&a701SA-MNW5*BQ$sv*U%)xuTmLAD<h|puKn|
z4!Sjz8pnN|BL73_WHJ@NnnDLkm8$F3M~>mSJyAlxN@bGi^kg=ZOpZUB8q44<ZvGis
z{y9)QqC-bpaWsax#R3ME9!06_)M$1pGmeIn6G=8d2V1j$a0|^TG<{&lp`9<H%=XL-
zGL2Z6&S5__JzTQdPH7LL(XBOH_t<uy{P>$hZIP~r|3D~>2k2)*6RW=rqz6GggZlm}
z2H-#NypE?3?*e@oBw@x<Fk_>*|5=Hi#0Nu4kEhXy`Y>P_LP`x(0GBh6;YZUGa1Gh5
z6BSKt7@sQA$xIq-E-rmF!WKV$C6Qt0zKOV%)(Q)0vt3;4(eORndtKTOx^BiZ<Kvk^
zI-N+Q=O$B=>Df7PZISh}iC*#*H@x}o!YUjk0afvT(^-h7N^^x%XE$&hB%b^9WsL9Y
zeAd`M`JMasJ-_>f{PX)?%0IV%CrAv>v6glchCd*cAEtRUAUL<b%fIrf#1w~lz(DdH
z_8C8qK+ISAk)LrUdEtE>jHE1ZV1o*&TDtKDKiG>AU|(jJMI}JfDKRqeu?0&R;;3M1
zL5!-4KZ3uJTsJx6r<AAAhhz+}eIM$AzD7Zr8o`$BkZQ4MTzy7|%Q9m5a(LFqu#8+a
zDTso4<*Pgq%13!xl#gZ<ilxL@yv9M2(|tqCYrv!R`friXYvvHSW812LlS$EOIto#7
z@*thVgDJyZl~N^7lXT}4JdhNCJma4^Cta}Gve{HTn?b|bQ7P34UoU&ByY%EWJ_D9=
z?uf~Okn|5pTlB0SqFi_hLx%{^RKd(VshL&sM4Tq~^ApO`6jstpcF&%7OwFRqSF@5{
z@WhZ(yi`0v(iQ2=ffrPxT5rbGD(u1#q_1FUe9iS|OO-;k1sPKlh1NXGOwtG{&-y_A
z*}mPHMT6B<K#-KNapYoJX9<;ET$dPGVp>l%Y!RvgKLwU2F9{31MEp4DIY|RCT^iy1
zNX_*t0Lt+WnHq#gb}=_3c~-OTl}wG|kG^Nti6HkKM&z{LnJLy2OMRFz-yQ74v;;Av
zN>#`!7n8uKb8M*uKft#yW+3Wm%LrydVEczuf4$?snu$gMI7FRfmlI3IQ8F&0FGvEL
z8cH;=_!Nubzgg1eX*}seLvf++LuPVhd_inVDIqYDK{IO@l$paOnF4p6Q1C<&FI!my
zuhb=b;MICz5w#ZnFLS6eG7JfLmrO&f-#f?P5a8*SYoHM%@eL7H1l}PQhsVG_#3B)R
zh*G>KKB6}niI*6vMdK&tioizlJ@XL*OsY%atfTH0TLm{C3uPEsd~=<q$}vQS0hgKp
zP;MKh>}{qVSb2wF2kxWDU>UbvXb>qjGG7oXZC-F7V%y|s98>_=@rigU)s^5T7{-3q
zO#^9t3g0#!<316M5>y2t{Hb=4yzDc{`t1hMg!vm5ap3G3k_O9<z4(DrYy`E>-rG2R
z(ObC=|2_9EO&j4|I=6avY4z?`oO420jk8PbYiA`F`~b@O58tgnI^BNnI*-xu!JDgh
z-|?33XeC!4ebhep+S-G6+8<x@md|-7Z{n8LPk-NWM1#mX^O5)E+i*{q+`3*{dB^+a
zs!}TA(lxM~_Ju!t_b=k^t>1cW<HmjXTj-cjv?n2;Pp1R{Sa(~#fJQn{xOabJ84ni;
z%Nx;v+1{;t-u=tog->WO&fw-M2NE3xeie8b<c|&VQyXBAeM(J##OkZ*t)q?JINxV~
z#6;wiv`Vr<SuLr#E*GI^{5uvgpXOrzsaq@0vM0w4qR|v#lUVkvRsc#YxX=?^%Md2>
z)<t=lveC4!+*Zn(D9|`50u9=*M`T7l`Ff&REteY2n1PX!Dcsb_QrNASsG2iXGFP#Q
zlmvx;zo9TTr7}Fq9cW<!!pSJmP(>o5WH9eeBTwQ`vV>qk%5~eM5Y%i+a%B*@lUs#K
z)s{iV?j*iy^@Sf#n|42|7^`3$h?G-yu`{tsUK@C`x%lq-t<U+wq8t%+LBc$}OBdL5
zd*|+Nyr*tnKCv68d3wt4)yxo4(eVp21~7PVfKmc%*g)?AC<%s@IwFdB)^#Asf7Vh-
z`=_qu#ZFzRa#f>J#Dq<8X_z(?N>xBnGd>+$_=HV0-5X7Fq%E0XxggU+sf*;*@-F=e
z3tqNK3qJw{;Akoz%6HJ-JqPTaG%V48gp5m7%iv90Ja@MJ(Hk45&#$fgwSD<V(*h_s
zWsKjh*EGSt!89Ns@57J1Q}26cZc_;mB$;o{u0Hrdlw@?7ML<I21t3*c0jRbD{v>T5
zdLVlZMu1gA0n3cpsYu(+jHed*sZWP10V4fxR7IRtu?Hb^s}yqP26~pz>3@4t%xO>v
z(e$pIGr|j3&>5@1W0j$T|As5-Y*o4yc|w}*B>03??WnZtP`PKtT^QWQr@Yg8(Rugk
zNy*afDO{<z`IEQ_V3%v%2&f+m9-f}hSq4~LEp`5cF7#l=BQA4RVUvqoAhhn5xKOd%
z1#YRj8C##dg`Y`zYbNC=OaHOU8WH)7g`0)z2mkwXr|7DVU#iy3&UCxb4r~pZjiS=(
zf;*elTG6=mKLrntqGO8AjC<2F^?@OW&P&)?)2wc>1L{TP65cKS1gfiEy4xC%=^rq0
zPA0SQaWp(RHW44w?VCHe%bUT<`IOW?Z<=`NRas(6@`KH|>Xa#~W!SGlrTypeucKj|
zpu_eJTI}~;h~qT^$#mVKkS~T@1=!|;L%SPjN4bptZ?%Q?xo&YA?qLm5&iMCcaBULg
zB0yIvogM|p*;IOB0^?0XSjP_c6KjkH{KoS;4jtOH@6hi3`|>;Y>^OLE*Fo^H6<Qo{
zog;32U`n6gxL9fq1~KB}O(-u9FD~FEJ30n*lp0MZ<EgC4QeJe+G5+g7-91!yN)@+A
z-fJ$80-*<MBe-BHl^aCaBK2)8H2_->o1}TPUTqyY3K7px*;#b!4MF|KAK9O?7tdJQ
ze81r6eKIMkXgMkEOv8PddpO%`gu3De#X39X2IWm!;K65$$Nyk>uOZ3e=WUo^D_*ox
z9k8O6H{pa(^Jq3rR{2;e6HgoX;`7z|PSWW;dn42IAlmJx8<kR5Ikj4Ok==!5F>ZaH
z<JYEnR;mq}BPgaW4)Xc9Q=iYbAi|cIE+A}XNVQ`N>>d}Mb{@*vf9I#(Ag@WhbpFjm
z#lG-MSDlG&{35}e0;jf@g|GQyY2H*oxPX~5eHRk=&~*3yZ&PX2jhkdtPmL)h4LdNL
zsbt*HbCrMz@;4nQ@<${kuoC)4d!ot5ur9geGp$N<WTxz&K^D)0EMA9uI<cT<MjPER
z@eCJ})|D_0vB!K&L;Qq6hhTucu!XRF4Dn!b$BaD)P{qqFsO8ws>;sI8wdDO*Sc^ai
zunb!@xQoR?D8dCCLJ%5H;PE?tPpZri{&EoWk`Zt$YTvx!ef;*ur*E#k@r}2#<bC;e
zB9Y)1CnH~d>U38u{5DaC+O#k9`qON|Ev6?%3yFj~TO1#sNV{ELe~Q!=V6P8dCf<Fc
z@d>z+5zkI^dj%@L^Tk~||Lws3-TMxS_n+WO=yv!|DlrOEII<(182|NvUrD<##qW@s
zE=-N=sp765z%pyCDYNvE12660d*D!h&#rwh9C|T-=*5?I9ei>Bp675WESEAo$+da3
zls)_Ry^xPm9lv?$P)Ny4TIPb{Cf>vSt~}hVRny)HD^lM7r-XCgA(Bp5<mNnpl^;$5
zuQBmEK6H|)HGeJA^y)za<b_xQf|RgQWP=dD$s-8wXTbQRPDuP%cGdIlenHK$;EP1D
zM<rjDIpy?2*)>V7J;?67K@7u(?|5g=BaEl|BBBYXg@XKQoPtOR6uTRTVafOu3i;RP
zOa&xh?eA{3-@HhNM()0OZ(U!#`xfhn_vdpA)=w+vHeP$=ojlK-F45}U3oHN}YX9&<
z`-f9v`{D&Wy1z(Q@7QJ%4<<DHGM+}^w(EFWLCb}_L6+pgyYNZ-qt9VZg~_Duff8x?
zP@|kNI;|*uRjCq@GW|K_w_049<(%|OEh=o{>L-6La{gax5j8cFF)b4>92%hm3JggT
z!j>2DpcOt?rWDVE@IffgC_($kMTH)t6ZACctVt=dTMtA7J7y*|5l^MjaAp+$JD!GJ
z6TniNCYF6;a!7ui`=Lr67bfg(2#q|W6;QM!k+DTK_=LRQC&=a+XnRQP2J$Y5-N2@d
ztDX4k5CzV#qH!ATgT^U=a2ut$5>4gOYPhITE|Hk*$1#4Y#{+;<2)1=Pc&U9jM-H%v
zk5Cbcp@V*~YBKhNBqb|4B;ao#VHuV5+K?-YhV+<1vH%mO$`-07hpa&mqkkGtHmno~
z#G+u4a2XyRh&M2#7qRESQVRLagBZvE)`~&^X1^ed<EM8F9xDF9!MXx;RfQyv#)9z7
z4eTa~An_TW%)u*`=s2&E=>XL@Xse*)VZS?vdDt2VggifwPbNT5cydXOz5#371DJ4J
zNHyQ!VSqvP#d9e^mcKZOCxcfud@RwB!3lJr>^cpI^l}UnU8@eZ70^K}l!_$BywI<Z
zwn=&#5*&s$jG>b;C(Lz*Of=?CA!i{p#vThDBfK=%@(+$d9fOL&(jq%uawUh@1JRxr
z0s=n!N;6YOl#fxM9I01ZHDB`*3{dP+B(Sz%Uw#KI1<cUu)tZ~f#MN{H0Kq$Rol=8s
zH%LkjsFVsOtB3(oI}&R^`(z^_NXRNr>eCAThS2}g=h?ja>rM%;jo$%&cfr~UTlyl=
z%(og%$gNQ<EtDE1$W=VM7^pT-><C`phlL7<rx>pg-o0vHZ*v2LBOLZ2`t|)cuMTxs
zdkX4bhU8j*KqR}$o>IkiSm->VLvfW3pH{SUXtv}uqzr0GBgeXGL|sl@H@EP3o2y!y
z3qhFP9ny%D4BfT}v3$J<YZW)h_u!Yv%Uyi^nyJ;OQI(5W*g-CgnV2C#gvgx2HT06k
zp@Pi8wl94kV#ML#Z71PVtDq_se2GP^%EcEQgqTydHvp2oMF2qp%GvAbS<cXGwOZzi
zBE16M@7bxfLqS!A(dD3sTz?2I*Ju^uA+UsR=E^fOGgC65@DAHn%53A0{)s=2`zp}~
zwaMG662(Z6Zd2fq@)1Yp1u&Izy<?H;QQjlhTnMSzsmq{XCnqI5k#0=<NSzc!<j8y~
zF7t??I7-D)y2D82=7LWljVT;AYO@NW<c7tM{c2ht_(Go97IFK+P??uUdPb{(r4o6a
zn*zzotdr-wh<H$eC%U)-&)mQvd_tg_+D5b9Dl|d-KkE3A+z>vR6kN1`kBM+}UC1Uf
zV+pi-4lUx-tAoqZ{2%sHj$ML*X8;URPODiRsk=w&VDnsXsVa{ieDQ(x*!Bf}Vu_=g
z;iOUu461P43C`X0INzZIqz#821rf1fScE2cbV6jage5$3>cb~Pd5+^^WKhGwmp|Q;
z6Fg$L?NFG*IfeF!T>}f_ST1GixmZv<u27vm>_QdROR2({R3M5-tIxN;=K|n5Fp)3C
zWsX9u;&}!gFK*I;t$f_?R$SVWt!uc470GPmJXP8&1Gu$;Dca@$ZXW+lIx!$`pzvCH
z@ha#U57yiQgcSuUeyMt37Y%r;AGye63d2;mpE9#Faa+t*L7m*VG$xaw^k*k}4ZtAN
zHWKDQ<!Z=-4Tg96>sQEj*q6vF&{>MuvL3@b?n41Q1IEK3T2U>!fw7Hl-f4nsySxf)
zTQA<i;wUo&`Mz+yqlw1L^`?9_4qF|HzfOeJAzCNo+K>W~{m(SjW(KDWU-~(0P`eu_
zrBy3n;uN+8yx8VyzU9)2YIBNF*)7i&`HxrxGwGp3vnoqnl7T5=(@fc+G+oP?)g^II
zjh8@CNel2_mPY9bLZ&RaA(4_h&<Y1fXBAW|RPs_qOL3F&$#C=9m%~CA4)dI+vo5M)
zDGH>6SGwrUtN0j>@ltApYG7W`25r&USP`dBLa>S@e^9d=ELRtMgM}&Rk1qNOs)mTY
z5;PlPpY+}gqG!v;Pjq5wgG9FwF#b^WBrcs!W+&{+yGt%L$o{LI`lo+QP<-%Z{QqM*
zFQR`D<5;?=v$kX$C1;=%KJ`vB7|1(+W%ce??dw0Zul)@!a@;?+_Tc+)UPb>~bBf42
za~n+yc#4Ziv-J;>__{^Teu0=U4jQqT4UX?g#RkUM6D-_h=O(77cs0tzA*!wNDm4S@
zff&adz%@|~JV(maS<Mnjgd4v>Lrt>)91c-P+G(GO1~HQn*R5Q}@LgUZ0p|y#6VGX(
zU6K+_XI%6zvOVOa&D>HF@0vMbW4Dwd>^dijAC{7J7h|V!v^3YunYBy7jDy(=uqPF?
zlqW0{v)7!$XD!A3fWA57eA$uV)_^0;=2D3y8nM_8Y_;Kzzf-4@7y)3Hj%*-kIyEds
z<7RRLzutH7xdAH%>n^)DlS`+v<JpPKSa#f2SjgZUp>(NIp<&Jnc$>^6ZN&f``UD#K
zIafLnTEr<dTR=6Ke?Vm9sAZ2WbSf;aH{ownah-#J?2kFVt__VMQ|jy>DBjz}4DJyc
z*`0v2<(u8f^@N|zm4LLqNk@VaX*^_jt0dUn`p4(JOJ`Q^pF+Ff4hnJVBDYCUFeGpJ
z<FyO7@rJqf;1gVndWCs!eF%=QAHDZqORqDW3Bd6NOTT*8Zlc4?U~m|<-@VuV?uvKn
z<i;B}+t=^4&o6r?Z+hP@c^9v4oW6(-3(k>;;i{##w6gx_!}S~Iyti()mtI@H{f2k{
z3pt|S2+*43RGCBRDMV8NZ+v}q?e9P0I@*|MdSGx|`|}y^);;2DchNiZ9&L_B67)yL
z8X{X8gWSG;+B<y;e8VW-43O!y*|kr;p^!0V*T4MB+V$5quHI=c{{!MGUAc`1NN?1(
zm%a>|6Ixw=^gea1!&8KD>3#MtM%_;<XS^?7M=XQr9?%>7v~reV>a+9hk6s7d-MY8_
z?F~G4t3N{gK<~`y_W4icAqDrd2utd_JBhz)ZO$Y?0jn#?v*6#9Nx>ziQS#5T&vv<t
z89%aVY$Z-)cPv5b^Vr-<w7}ty&#6Rf8ABi(?6EYjp}eC<i&R<;3r6T@JQYu+l9-b4
z9}{D}ky5z9&M(%hxsG;y%Z+s-N_VBV!tGVK9*|0)jinDT(_3G@v9|mkW<cJVQz(Y{
zEu`Hc34e*QG>-o5cZg~jQILp+KatIT>Rrq(jedT;u1tP@tu76INLizg$J{q!9OD)n
z`$jA%+HzB$N|g$ZH-lB7<4K~JO@VG!X-`6OKhPSfRRzoa!E2<J)pZojCg7KkV#Cq$
zW`x;PwGGgsg4D{v^VsaE#?~L7HPzg_-Il7AHzn5S+N7jGjYvFlI*&kw6b7$97SzWA
zT$K$rZL9S`Jtj-7I)bj(_+zxvZb`ES`SLN_e~qG9I)TKaar^pR@6j9n0iDWgfh^`%
zriwlB*sK&#&eYH3P|6%K7&>7-TcenN`o@Q0EIqE?zu+IU`9-lIL+9X`b3(PmV$n$E
z2BbVh|J^8;3a-^E$`*F0X;cMk>UK<MOH{WTK=iD_R{LRBVXGz0Dr~isU4^Zd?4WB_
ziwC-94R!V&N6|C;LIt<B(hya+N?4THE;Z09h!735)|#MQwRTC-CBu4C^je@~Te-OU
z;B(MnJ-oz}QQAE@{;0bLJU{2QA42`2<<uy&c4besx6{Fts#P?qsTJf|)!r(aOzNfq
zm7Ufw6>=>KwSgbj8Xz95y<?MYQNv}Kwr$(CZR@6O+qN@r+BR?6wr$(CI-j??JEA^R
zb$3-qtS{#eoW1v&bIfrYhKxl`r>0Y*zwhbfdlefy0I!`lFW3UddrWL$cg@rM*R*z;
zM{1w-=j6cYXpc5>Sup|ORLoM1v=C#9$3^y^&C6w&crrxo4xehDN^o^;btYUgTtPTO
zBumGiS}N9kRdqg9!>ydbTt4@Za)J(=plRFwu6}DP*9o8g0@h9SIEJ>d?aw};64Q&V
zQzG_lJ#nt;wHyUAQ@AhtqQA@DD!9wbTyvNi&<^>3r84Wb#Y=^ojM!R)n>Nw*X(rng
zt^z~SJRRs7Rj!ws6fRBVyJKAezHXFSo$+F~Zrl5(jqDSQ2XW2^3gp_J+;>xFHLsVh
zlw>(^rp}Zo<bJS+j+Dw5)x)HHFdlmpw`h|izucM|+3P6Pr9rKY#ijFeB*pQT#vBJj
zeS)HjZ=2MwlIHm<ZKWv97G)?+syr!+v)1b889K;Xq?nUKsoRtK=d`I<L7ev#$``M6
z(~AM>D*vfeg|sSBi3;OVHa*MQCP@(%A{TG-4}~aYV|M@-cNHD;mGAQ}dMHs;IR?XJ
z%B|KbYVG>=bq@oabTjb|1FUG<=<Dr9c-^r9HV6Llj(V?IL;4+lLm$sBNI})i1A<l)
zHy%6M_t~4hCuW9@Onh8?0inuccZyijU+g_QP4r!EKQ*(ho8!gkk%CAt9Q1DigNnF?
zC<~Tto<)n+5@iP%DvIK`tatc5bn4I}&WkY=Wkx;aBc>l=_`ODdN%HgfQYgR-B+USq
z5yW!c?%@;xdLnvi_}oFdvxRi6?JGB5-z`7J)1N~!<DXxjtuFjH&kCbik7#@LvboQ6
z-ZPrp91h@eK&KLr*~Pjz{^;@-WCOjDI2wF>eO%b_Jl1SIPiNiJeM8!vj+<|<x!;gw
zdh+yyJqV_lV{zRwtauwR2A1x$eoB5!785S=J<uiQMKq<`jMZLiWt$hA5H5SUOj5NI
ze+W}8h%NUF?J3gqzr+w|g(IYecd@Kk8aBI4YUs*Y7Bs0<&m7dU?t`;Z1H^AnbmW|#
z#GF+6>`gR<`DuFDPi^)ic9IlnP<!T;wdQYxLQpFWB(3-Hz6{Gc4FDp2Ej5BWrkfY<
zr5``ExXl1A4{!{2J6$~3)(3|P03+Ws$OP9V#R*&Pn!0d>nKqWl5<kfKc6k0+Q9?E_
zG115@e|ZlHX{1Z|q*7svA?A;H7jep_mna0*9Mz($rMKTdSrlH#du=h+<wM46K_Cf;
zSAx*si3m6Aox2oa84#j?=27(oob;s7DK7D;e*LV>=a%`kqdhW<&M)|VuD_q%pqjY=
zVzHvlJ6E<!d}KdV!q+O2jXo#-GSYq>eTMaLh;d%IOG-JmKvw6i7*olzP_SF3E)Oa(
zOn_Ac8&HzK0GtsFc@_ss$8AZIdOnfM_uV=A<w0h~wbN(wDi$29%p~MWWFBWK{gCsv
z6E@QdR~C*E_!vCC*QDcYK%Hk+io8cytKb?jY;-+6z8A~qg(f!*5B$~MyIXuAS;)m=
zbP=~5zv~CT^9c)hpvy#%VCjD;ISJ_33<lE|y-FvMhV}N+w2Z;+?x{`FBx19sg^gZl
zrnd(MJN_HAX!NwL+G%qTJ3RwWNHyR-aZ<;#+vU>rdp&w(r}y&hEf)mRDXQeKcg|=c
zNUn*R!$nO>>Wv(>Tt6ct0fQw2BH(orE|c+=JBjq$EMMrjYjV3c2CoUv5e%kZ=QRA-
z{6i+ayB<c7TexC6ou%mo(Bj(Z<#viK<B_B)W)gsv4;k?+Q*$7s8jt;XsSg&*nZ4)%
zJGNachr2{1Rl8dTGE$@={+>J^*ucmks0p<$xJ4FR3{?jaAx*)kRm&w>ce1C+nCL-`
zpdA)k;3g?u2JU02D%Dz?h6spDI3CTrLy}Z>(!XGVjAtM|dZtih{dk<Qh`5px{TV#Q
zVGT0kGeIMwf11^%<*Vv)eS~s1o8{5+07&r(unB^KcOl;nQgfgHUG0gBe-Hcu>@mO=
zN5JG9u>}5C>X?#@omYfd6f2Bc(eF?a1v9kH3W48-P(4Smj4;!D^F3W)G+oiCcB!oS
z9F(0r9|We0NjNe3fC=HNx|uZNS&*@WWNVS@(jaqHHjON3NA2MWLxW-15lQ)vg3r(g
zyBDJGlKE$5N@y{ahp8I1W!)!Am;qB}QN_MB#Izfu6SXPCJ(J(I8)1B&vLO9nNFMTO
zYx7?RWRoUW8PwguuMjVK3=_J4T*or!2+SkuBOTdeEwr+sFj-^*4)A`^iWb7b!?E7X
z-k_R*(wm~Qu+r;cR_x_ybzj@Kdg<aWCMsbq_0G+|^~#NvlsXB~5PD=bK_glZ<1`1!
zTRLd#w%+>9-uNQI*$?)Nl3tlsuDe0iPR(bFzM5r|u-3fo5xh08pp)x;+S`vI=GQ5+
z$I$Qhdv+`M?L0^+fyA%<r{|HCb^Oj)a<`}Jw-_tw3%$<c%9RZ}pIG?Vcqh}hDz|&G
zl#HoI`AxcK)y$5d!_37W!(D`zs$;kc0WhIhxt%5zy?}p#lEY<>v`l*Qs%i%M4?1@|
zZ54xWQsT#zq@7l4nm07D=V+jk7y<*P#=JJys&-SD+yAIT=h&Ld$Pdbn{~{*aey>fE
zU$pR*F4I&b2du)P^WAlOYq2>0LY%|D0A$%u@rdlW>0My)9NTj0v`35DZzL3lUGm~l
z=EhIA`(N7-pTk}g*txnshLi69bxv0gk&TAYEUpK$iCEePka>GlXLiiC-q9+0b5A=z
zS{%RsEDra8@ZSBn58e8y)7s;!cE`!~Y?ztYw$49CII=}LKL7q51x$Tp_UsZb`_X<`
zTn*o!7siQ{FY=2h(n6ZcPt!HkW*>#s*xA25x2>p2%5>`mzq`c;?+}@xqD>Qu1v$(i
zo;JLT=zWUlZANvwBLqfBSY1xqRJ@Ajz0{feaa*P5o`(6secdjf+JGyP0m>}@*NTeC
zxMjc$k}Rre2}1SbBWJ}3(i)w4HY9zCOw>PhVCPQ>sCxd~hm;+Fn9RJ&fV65mo<S`j
zmupd&fYA<Tz|R%68oR9SvH-9g)k!<et7y_-zZSXfJ4ENfEsK2OsugP>`1feH8#tXj
z{_cb9Zg>i0hF>KWW0&Vk!%~NyVsHR3n<()xElFH18tI#URNCDY_<t(9+wFLweKgmC
ziG~||l-VPEkseD{xt_V%rID&Pd!<%(TZG-*sa^2=*E%;>ZqMlX0QiRviL*F-#BDJp
zKi|2s#IfmM+cL{|S%^msxoUiQRlovDzOq0H4}uH4;5a&x+SF7(?!ShrT#94p<ylb=
z4R}7YM11s{b-S!bxi6!q>x}|1+~UNspO`WvTrrO`9E@>1pGMaspH2LAwhhP|-?{Ho
zr_+)7RCg~B+HbA69P`cG|2o`kZ)vR8#bU4;&p!}T`9oSMYp|fEl7DPIx-KpzUJRVf
zs$9S3R9}Z$X;UkEFFc>FK~3wt0a6h=O*+9hi<Ge2Dvlnu2g0SCOA49Lz$_rVCuKK&
zG<3bbbU((gp2SoA0S{<@V@G?Iw+Bw|@fS6^c`|{rR%bZ5Pk6d2ddBnwM=Se_C88rY
z0Wm$5%Ir(b&QJ@NZN8yJxE;UqYVCkN-^bzKU4`uy^-iyyJp)4&kCy8Il!+Fe)$i3u
z1<7c<XW45^pV>ns1fA2%%g!zChy0gaK6MN<c>-6bn`cUg2^+nx>n&eoXmbLfO${*3
zf>c>G{2aYsvj+4$3wG!#-v=>HM0xk=LhVLe%K8pJfw5#9huNyizq0|Pf%od0Gm@5`
zw@s}->#d!O-><AU&qKeBP(DTTi7xh7ff4SMk-Em_=iRR-t=tbRUMVu4<#_Q^jHBen
zm|v7*d&rFu)cu7X-1w^YRe{y5R?|&aPp`Mc1s5>hlP_`UdidIvF)ltH#18U>{}6rh
z&<d@q^2S6hcIFgq@bR$dO1`1z1W?P&YktRXP#eaUqsIl8nN_AZ34mrZ8o@Ys5@dMH
z-H`Ovq}|?%>yupS<M(`1URe~uP(2T93pWM?zS^n%G|hZkJoWAL#@wPjz&=lJg$>fR
z6M;1n<Liz@LBGCi)&7Dlgf6HluGhZJ-Ofl8<tjf7o-a(v7yY0PvWhb^)|qP2nNLeH
z+|`uDKVg&+v=zJLP+CS|KcyZtAMbxCq?)qa=wA(neO#(l!0jvz>ww6}r>18bICy^N
zBlLdm#4W`he~a#HJ@t=}U}bFZ9xyd??a@xzq+DyLlL^<ogoz-;=6aon+5E5xd#m65
z#-n+;J3IXa7<2gE+ucs42VXOb5umA7GS2MNjO+jvA+H8&S{_?&4RA@QEw+2|-FK1P
zx_TO2A8zKVbtdHpeb(TyJ}Z8C?98`GzZNHt<3~yuxPvqtT>c=CaQAQD&Rdb(&1^o8
z3zXrXL9D%>2&$bQPPyedqTz5w1BQcPZS2)^2BlWx#lMP8x?Do8f<y_H;z;*wj~q*d
z?~f5~96^ha%v~N~na-{5;K$e(=UXisW<7%)IG1B}$Um=m+it)}NKDZfN1r^wwmawk
z)FJ~lj}~8hJL@O>pf)fG%y)F#d_A4dSHL$1Edz0+Qahmht{hoom3bcu3|$W`Ka$<I
z2?sr|U{qv!h=ed8tb+6}QU8#hmaHEJ0e{RI)E_502?jj&{eOOcI#e#_g3x*t%v`JH
z@`jtoN_scv(e=or%VeE29uoo(3g^{Ih5`&|Y#!!Ne7=LKihK}dOYRpc(%NqK0&Sgd
zfmg!s0OF$o@v(|+zS33kz7BJ{<Uj>xi_OdbJP9_oJNg)%&HO$SPk@14IwOCLPFQ_I
z@AI4<qP^l&s~RF|>#?q%qTA(vvF$dsb>95X+W7q>03|fy`*vyT(F}S{Lx;X5@jgos
zzkoJh3tcW8KJI_m8(kllQtoD~q7eKaSMUJaOX6m{Ql}NX{HJ&TwO<#<n!+Y3-T1XW
z?s=9zYJDEA*OO;d)vtcXYCLnhqBzmZ`g(Nk>2$X^d=PQb9O3cBuAuci&8|U)!6AFY
zGb%7@0ayD=%BV@!L<LY??_p$<BRi^h-3}ft#yqKBJrFJ^HIT!xiMMOj@ONQPB5!2n
z<TJ7Dcu;i4o{Ub0=PxF)V{c9Qad_ReyPrq7J#yADm%DqKo{y^YJVzXmrp{0}36Kwj
zUOAF0bvB!+U+18w!w`(zFxbioo7pRo!vwTMP~))GZyHm7>V5aXZ$@6qt`l6vAInEh
zjYi+!snP_)VGkpQ4*u&12TuceT2*4!(Ehm3dtKBnJ_$h|ZY38F+J{H)8zEk)v<)o>
zk{<&$y{3T289dCOzK0$N1{*QiMF29oF%r>qBnVzs7#bjOBM2fQZ-LZ7#(byH4U%hK
zjZ`f877UBYFRsRk$85RTxWq+TP*t6ZSMfnoPg1kopfGp}Qr`yaoD;P7bUX#`V5L0q
zQ}%q_I0ux0yLO~|@n;&g`?PvL{PrH;h;HaxHq_}AxZyZ9eebY(pkSo<!ixl)y=1gn
zKKstQ8W3zn75-yog1=nSLK$4YPjvf+S|p^yeq5&q$^N<Z`u!NViX#{+eHl2hV&4fJ
z9YwcG5P-&?mFvy!37fDFxZJ>A94@GAkTld$6g5>Y#2~V)r$99RC_uM;dOHj2ddwXZ
z7@tw?fuN>5lSwOoM7Jcqsx?U{F^y;Gy}Bsv62Ga3fem)HVdKnC**Y~rK8gNwU>yw1
z1VDH&rGdzLxn>Ik`p;lSJxn9cs~%}R$%P-jzn!hib@-kPxGbcgnc0*3*>2%(u?pAQ
zFw(~`k`=zvP1!mXO=Dg&o*NhMCsGZbZI!}RZGEv}rXXR_+>#|HM`WN}#gd6HC$Q*@
z*I7@uFH?EBf_XNmA_X?yA8WA@;j-zEYG&B2RT^P?lh_ofrVU+6ix6QS7ofaxb)Z72
zdUYTG&<iOA1rB9Q9<tmQgbG7%z7F}a!VZ;LD5F*dN=3C4vdRxlTfHfS42tFhK|1ME
z=x4baeBW_`^$wjxB37MklYo%wn~X+wrd2BdNx6KY^~T(A#a|#EYYY^wK6L(*D2W`p
zsXRm@^=N_mN&UbmFPm!+OtYrI^0s1R{4<kc44AF(1OwqH1K?)4+P8tm^11RnEoK87
z#>g`Pp-f96Z9yciV};hO(}|cu*bur2q5unKQ!LbQ!ed>`M+PY`opS@ByQRN;VrOyb
z!A^?is|C}TXwA$n&_wip+pwdl-tBRI&1c?&nd_e36j~j3uTzf}8v!G!_n++FTe3*w
zL1Cd$XW~<YJJi#)+=1rf&R8qxK?ZPrsnd#z33GJrWXjd47`)Q{|Go#@F5=``b~>gD
zSl<Gs8xvWK`CM+0h(XswEp)FGt?BY7=69@sBcq!aq!{zw8)l>kxUc^ySupWg#3g^U
zMBpy7%vP`I$<jW#xUsU<y>%^tH0g0yBED%g*Gg4?U3OF~X4br@gv42&#~%QnuG?3*
z7OL>fZM8w<)HJtRm>Ukm2<lW9v6sk-mD{f4#$-n}a%XJJTDLQHEtx4eQYe<I`5%mA
ziFL14B<mbwtU*2`_NlW!?(YThYsqj)yq#Kc8Bdb(Fl1NCzv-KW6Y7U!`Y&0KNVfuc
z(m>g*0?7lA12gqP6O|rE|K*X*0Fo<94DMB!9P2IdS!I5H4tla~GX3z5U_)j~6YHZq
z08|m(ut_$qX2nVg@C><|N`r$6RWt5BhHiYVH3fRfgIsUj-7Zy8t-)(al33?^g@A=|
z&U2?SNasn3%nUv{OIVzojfd6sueI;$QBfR@{#kdt74WpE93pQvI+@<-ZZtai5%qfE
zZ?S^-!|Mr=Mlq5$vNt8nntD*~ZV3ciI$wV0&eZ`qMcQZL7ah#k$pEOqfOvmnx{$rv
zH>!-WS|l{QQpqX>^7epn9SR&gDlu`Cc+Q7#-DNtG1(e6%s_g%Lxr!{KCl-u3nOz?Q
z?t0~UebTGeI;BjxiP<%{Ey+Tw<*(s>IeN9)rogMv<5l^2`LHRP)s$=0{TV6>UALM~
zhu){gya6ZXSHu&CoqeCa5;(61az>4|b1L$>x?D^$1o@aIDIuv}CXO8}UXb2*S!2T&
z-R*_1AZX2mEFTj{kb)aiI?P5Go_@4ZM&>Sho&0f&lmGUPx36Y-EqEwh<{;9kyr;qd
zdLn3p(@b412-w?XZ^Y1wPu0mnG(xVY`l*P%sXvSGNgf9;iP}p4_M!7J@AYyse#7Bx
zzS1q+@p3b9v)OR4+ouCw`X6Od=gss!i}ccXv|r0Sep`E%HBg9SIwo^<V`R;B2bVv5
zYD^)qhzGzIcAXnT;$Xv(`ERMrfmH(S)R+kyf8@f+*_kH?Yh{I87uuF{*bld?W3X?G
zHt}V6A<;x7$lMLoAoRH4SP^~cd?`c;@y09c%Y#b>cV8Md$(JrRw&OHh%R#VP6s~5)
zlEXxEw@k}Tk>+DlL8u8lUdZEE4W0+{0gXME_IM4whf;~muXB+nk{>m;{Y|q;F%Z5Y
zQQzLmzn^$TSrsVju}Fuv^qmHinl<|!v1P1MqGuAd?1%2Hwhe&K;>3r1$6G?sWAhgz
zZj;(gOr}=tjSJgq+fBcC(p+EnX?FjFALsU11^_Ig|1NS#ql(ChK_*QOXOmS*p8g9`
z$()@fld#0emk}d>8ujpI=|he=o72@yiX~1zYZ44z*Wz5DGPg`)T)_3@TetoxgSCC7
zygkOvk~{o<A7=%h8bO#CdJDk*HL<&Jll{5b@ruqL9t5942AS8pu5PCYXnb`~e#M-s
zhR<`m-5x;rc{0y>bNLqYV*7nEFYfCP%b&-lO|S6=;7F5zP~?d$SOoDc=Jpd;_&=`I
zphTV(m}(ag2?;<kbO{tFgt`NuZA(!0sDNZ&)gS1yN$7iq!@k~ts}KU-WC@Z^G6~uQ
zXGtLn9#KXFk{y(z#Lr@+3^MtY<im9V?WmuoTV$3Yl6;&b=MibWm&a!b?Ne`-z?@@l
zbgsqXCPE1H4`U4a2{U^=-*^$?Z>N;(CvQk)vfP?fm+KM2JQNFOT2Wi11=S6tX=kn)
z6E*!!B(-T+oet5Tb~cm;-Wr};kXOD(AJZW?UZg-(Bb%30xUq%h4-N~fCMeMG;PYE_
ziG=i|LJm@waQW;?(73OVV+NBl^cG0396p`HwMJqBkFTS%WBE#C+r`{eTJ_wK1&hrg
z$;d0{G+5rv@|o0`q~|zFk!~fmeFgczVb;{(g5nm;hX9gSXr<04FC&BC3)E__cMrdt
zAo)CfsJFlz<r>l~$1~2)DQ(fn$P^|feoczYx)LG3{;%3U%J%boVYv#ER@wASY6j#z
z?363rz;3$<y0fEG@xQ2J!Wu)Wbesv(GxdGEfZ<4|hQfxw=^{j9c+CakVg=5jvY5kz
z&jlTIUGr3_A;N!nlJVpV_l$WWJaB?OcueakR59pLqNdBVXi$cSx7H#S2Sw%`aLKc?
zGqdcoE;X2v=}43{6e{zAY+?6^b(lCszr*HRnStsko%2-6+iiUgHOd_iMIgVZk|j$P
z{;tU|aeiA{(){hk^$k^ZhhsjYbS^s@Rz$NqgN6VLN&Kr^zPLz%MxThx2L#t_`m>9c
zp=|!%8!ODMmnC@r`iBx<t2-9u(j(c?#;f641<Psay;IB1y0mD`GG#nQ#72=I?YH&M
z8{o}Bksx6mR!+s<jQ(dC`yKny;xh(*pI?EBJ0dh@NiQ@8*x5_a=LZ$|7T?^^5C)c7
zo*@Z_FdK{t2Q;<85JxsP&H%7<Z`t6B#c)8_frT*^B0B+{c)XY(;$I<wDcgT+Mo9L3
zlxPdKOi_>=yzGjT)8i_9LXezXDtSk?o`Au82~s1<e^_9CN<fcx1!wh9T!or2-fHJ0
z;hlyBolnc6Dp*%%D_t64l`TvxBRyGAE1HHxnwB<)NfnPBRdi8qj1c*eW+g6p?GfZh
zh01p;JVBS285P3h<cAQyb0-<i1gu9kBhT@tn$U%;kom}0)I&CZTazDp>-9GwI&Z)o
zr7|YFoWohP0{6W@E^Czjp&1`;3IEM!)Q7{Mj5^F<Y^6`w7!U#n8$Q6oMhI1;xe-*h
zWFOKS&EV9H3}BgbOBbZE8777~Lp<r#pgF`}F*3GRrX>0=5=QkzLEyKX)NV4miGuV-
zlyql8h#kZ-Z`aSrGUqq1h=mw2GW%^rlp8y`B~<i`n9j12u9OQ>Eyp`DcevH@pU>tP
z2HIk875zD_i|u>WS+FjzcAo=!eXM-BsO`eP*z9-`_JYXHFHH1%RzQ6`b)&wV<Wk+@
z)t)lyq1plFFf$;_3bM~Cp=rv1W$38U;f4~u0|WsM3Lldlw<{XYrOC`*cM@<&L|Q-l
zP0ua0;2u5P&e=I+W1#y`k(Bjr`x2gvDY)kz!NF8A$y3DND_dc+>*F04E#HlvFWw*b
z)xURf3Ju*H`#BV?O=LxSZ?_-MAu|?}aKB-4Q$y*>FM8Y380awfed~n<2r^UW<~XF)
z4Zcf*5#W6<Xp^F@6Avqq_IT8=J(g*O-+4BY#6C-79Gg!$!0+yK_3=AnNnLW>zPVf6
zu48vY#`DpWo)YW;0hEc37!|096f5=`JrpOl$U>P$@Ursz*6V>K{~xcB_sK219BsDf
zl!RfE=wp<UI-2wd*<C$V)X@CC2wA#ALt(<uJl#4JqFRT)kc-k72VYn(f4MP1-~D9f
z46Rz$o07p!z8|-SJ;0>2s`sMKCM*91TGXpqG`h+*IM5Bj`k(4#0fi9YywxR;E6W0H
z*W>Kq9r^bLT~Nyje3x-29HjWDTP|0jlnQV#Fx`rIE#%vDh@r)O7`YhO)nC=o+yWWU
z5_h>bQHbQlAPb-99;&|_$aKwaOioThWJ%C}JV&IAPlAuuZ?6$Nrv9&&lFTx7mj#1g
z|I~(AI^K8buYSFNjk5iJ#BJ!;Io$V7i8&J@b9Y;|C3}ocgLNak;)uq4r_%uAbg{8H
z1E6#Kq3<05aEd;m7t_-*Vh*;<IhYyoW=dS{H?_7*AU^19C5P)+=C6@RF~GP3ST&ol
z94y77Y?;G@_k#XK{zPe`Uvz8^qU6HwE6N3a={KW`AJUi9p=LFNoyL{!sY&M$=pp~A
zjy!Iy*GPwrmuEZL!$V=|m}4}`A+Q>c3zjJ|jiO*6D|?9;L4eccrX+OrGEnN${xX_J
zUm~U6nF|+o#fVaeHudxBYpUb8M*ND7FpjIip}15+h^7+EiHIJ15SPL52P~6)!aava
z^l#EI8utZ0t3f$Q?F@*U2aWjq?1}XT67}w-5x(sg@#-8!;FfL{b7qI<_d;ppP4)To
zLEP#bOKRj(O%Gqwl}xx$ic9?PDXL8E?WVjXDT^xieMo@WSYY6&Q&Y`<7A>Q31~bUg
zbqd1~V~E%Up$bN_Wk9AZyX+SwzzSCK>@yYHAj)(b(JlYntZ!T+!I47!2P7$}UpJKL
z(W&j9Zt#^{!M<8!3qBA2cW~Iw0zGxJ@40wb<+2eh>OPvx;z~|0>5VhmWb=<VjV`1R
zF)4#F2*?ywVB#+J4_2sn8%RwOLv3JUvCAhfy)FYxJonX$Wu(g(oKU-gyP+l!maep>
zdx>=LvuDJc8Ye!pBc#)dS-+rCDUL5Mg43s3-_%H=1E%<VS9iOXQ~hUl;-M?0l>?5V
zK$}fkRtqMV{t05JkKi=?EHwLV1jT;nZt6`S{Hu%S9#Bdd7H{>t0jh+Z6@G+#3$@?i
zY2%{B(tsdsCRwcSnhY5q-|`b@btUU^Ihvj=Z#5gTy__qdo6}ID14=v0*F_U~5p|Q$
zNtiXtoWWaPVHtk({)YZe^TnP&y-U4;bW^)Vg;H0y-8$E!EqWb9Uds@5z^MsflvWE^
z)d16fghtlCLL)fr)^hI^t=8hX*v!^yT{ARfrG}T7O|VBs+ao1(E_>vzxa?TN=y6ZY
zS)WxORde3!9-uT?`hDG$+i(j&w-#@1`t9o!TYNNIyNTu%QLFM6%CTOU?~@eRU|kpO
znK(tP@NqU7?{c5LneDHrOS++$kJ}c6@T(rm%Dw@<k3zTiZu~L17!`7~ShS2$_SV<c
zxVJI5#`{Z0*UjF95(q>vhuF|hilKn3;b9)`5!Q7k*Ri%{&TAbvnv}_z9XkpGu)?}c
zJtpR+V5^Ie%KLZ;L=Q{xOA+JhXOH-_|0s+OwN|X#jqHHfe-%b{+B&?y3L}q<OWdLT
z*yRYNw!=mA7ZvOnj0bcH6oB;Kj$z0a!JT)d5&%B*oH4w-J`p+JR=F`5A6We&`M<<k
zYX1EwH}zNU36~ATDFYC?m6Qz9Qka(^k2E4j+$uF{`gyH;`*MhfBe8Nvk!m@LHZ>cT
zv216ZJ-KdtxATWLHQ8<$l0qC;A&TXSgjI+^(tualcpb$S{KfNKPnQ1-X_<xY?i1j>
zKx{Gx>x1I=blsh0TG@FYRMU9qb!SU}uo>ZtErfU*T|@SdmM+yA6Pc$f98he}f$a|y
zg~g_XmyZg@Ken*Otu9gPqE@Z%5<qDG`i$Bbb#;b_Hjxm1&Q`_RG&u_b=-OlE5qXx<
zGV@KloDajb#N~jexRXV<aXyC^7AdAu&TRNQhf=ge*;_LOYbEG8o{;VoK;XqDmC+d}
zZXX=_e*RFva8li3+@6TO<DY7D4%`5`*k@FSRC&roxOm=xw2TZD1n>~U_lyc;r5!g~
z3~5OMgzDI{tmlSn!RVzIc)@U3?L&%l-wN>#3Z@f;ZzV&K1F)SMWI7`blL!|w-yByy
z@$Al;3l}g7E#BR4NnJ*3X0{d?MTTzi6WAS`MzG5XgJ1>qI}E?9+MJ8o>UZbU<GSwx
z#n*`*`e3eD@=P~W(?35hG(K@~EGsQjkvCl%WYgzb<+eTkw#~r135ja=;p`r3$|kal
zv6#H>yvFIVm6TbIMIn_!u4a*(@T3usv+$&`w_4i`r$0T8)6D`cBxj{{Cs!wKc>8@;
zI4p6fm+oUSv?%U4Dq)e>>0)=6?Md-x;r}eQmIYyQ4m>$4EpWnjp3)5`D7(!WHh~hU
z7AjOUWBmlWa-rT=oF&Bpnx(n@9PLndv`j5rAa-iwOc_QPvs3NPcaqd3E6{5d4urvj
zXJ)lJetD}|&oIm<{woK0v85e)lpU%#HK7#r785u2{u`=EZLYX;0RY+p^y@YvHYtS`
z+l!6BwQQJHFdLUKaJ41Mc1DgcY-V8NX3IEVPG{1z8o=G6EbJq+atPGDi-`ku+o|&o
zS_5_9-wc{bNe;cx+aR1Utx*v$DRnG5!ugH)R<F&W$XfXH`a4J`(YI-V^^fMg%g5M8
z(S}uYK(6$D`WkUtZ5u8II}l7r0)L3<r<*f9I`MDS7fqb%#k6-%Pk2ejxL(F%3WX8w
z*<yXQ`g%}P#BpAS{pn4_4oFy_1Hy_4@WW_C;lSzUIM#+;fv^W&9WFkGFiMF7qJAAm
zn+y90sAMB2nd)5oJH^MSNgGR+)R2&#ZJvkW1hMl%&VIjd*MP~lT#l`zz~f5D7X1@<
z{FK7e1*y+?n_(8rlUZUKa0&Ul`&7rz#Y_&FBYtbH#qriz?{Zw>fm$74aRn#HPj6y)
z|IjOJm6Ons?)Yr4i~Hoq4tY7uC%M9D;FNqW)q7RO(57!Duz|M9QH2iLh3j~%fTZw2
zZcrA=rVhGPA?wvE@#2cZ_D55W6WqPc{t4)vO_IQl{bRLVDzas9GuC`ey;A#nXyq>*
zv3P8I6k*C?8;n}*I1v!%6Ie^Y51!H>(`;lbMna3Y`wo`R6yU6?c$U+N7n49}hj+~S
z1{CfcSPNQf5|-CCB;QXlt#4I##QJwgH!za8f+j?-7N<^g3p@Mh%H8J%HT8*8cZ16f
zU!1$cPAd=p&4nCOo;&+BFbEvVADuR9DU*XgiA^9-5dPoGh_FfW3HD|Q!~72IFgdz3
z58Tfr+Dsyc<o@5Qi26_LNw5AZF1@4E6=iuN8RLdbZQ6g=98peT6GFBx$wzhHZ)7P7
zUhw~7cI|JGgxFH&pB@Ay7MV5mxAno3_D18B$z#ojtb|+UD#k(uN%XXH%MJJ7_p=Td
zAk<><qqKYTta;?9ICL<;>S{1br1>+EIVOkX;bYCjS8m77{d&h}y>RD3schb?N$ccd
zrA<q|!-PWjUcMEg<!3LGOb{QC&=97^hX*9}YFfj?%xvT$-IBd*hV2(k+)Tkmf~j2D
zz3F%n6(A!)acLr@JG`j)D8UNX-)nC=PlF<oL3)Sj-$0QDEO!38FJ!)oE|95J7HAtu
zwHj-qM}Vvb<U%!!9o(1Y7Q1&~RS9RxsRFh5LI4X!rMb>7`$W_vpJ9JmE$>OFv$AwM
zf)TyD3e+PEHGS>)h>PT1x@2!U!4oP^LoxkqM0;E5DO>)(Zz2knrQ!Af!)XAV_pkQe
zyivt*NC9jYFHA|EN{>cf;KhQ>Pr{$OwQ2RLh1nejD>45I-0Jelem3Ag*}B{GxUag(
z3g=i2r;A0e3eQj-rN~VVr!+`EZ`!0v9e9#Z9mrMhU?GukGr})s$({yyMkoFXJ6L5E
z@5J~2%|vwW5Z@8?f0>8?`4IkxA|ff!n*S9Mkw^pGDN4A&o7CcTr<BWg@vwiF;A14;
zu6oST(wlNT8CVH&8`P{_F%<KhG1i})jIY+H?^#P05ABqaaz)O=+D#bfNq`^b8giQt
zh?_EtEir4pDi_EdC<|%+u%YG4&OFtiPryK$fOJY@Np$0UYsY8`iGOEf?xvgZ<dlgF
zB3vi<AHjN%DQE;_k?!Y^+wI_E`jSml$ydSjYPtq$-si;xHTu0a<<$s1_ikgo;}FFQ
zW0%3Q-ME*D)~|PRHB2ETez@Y4ovNMDKYlXAvS|58f$Sy!KY54-aW7U|SI<yhE&drn
zFDQ4W;IrUR)+#aELq$4N0cr%D1L9$UYes}f8AJB)b`>bj!-SDzz^RIpC>9I-GeXET
zqbu}FQ4jZ)BmaXC?25hHgu?8Wc$va!F~fpBcu7reoqH@~Og$s%)iqV)&$z&$nPGK2
z2wY)MvS$q+{&ymM;%_zfjVq_g2J`PuH%VR$v&m49Gzwlrn+j2cSX#kc?Vl=jtEJx6
z5|vOEzyGO*=-M$Z(3y%mCwny7WQ5j)`Z~h&u)kO;WLuF&KY}#1F$qWngG}z?K>JKY
z1G0418n9g7^o}VjtpJs`c{abQwLQ28w%FwXl{MGG7jU>R;a_SL<@e&gj4Bz;WvdEr
z4UVM>N9To<w)HNkffgrb#r{No1VEkBZPMZYOhEKT_+=5rV&eb5CLkL4-xLsS{67hZ
z$U^>m0g>!NYfEj`pAP%>uf3PwAxKq)R5^m1g}APknu7rM7Wv=Vs%N=hIg*jb@4Ei>
zh+U=9*!w_F(~KWf{k(OdWv<S8Ycq%>-gWohRyR!j72T)H)w|Uz?9@zNW3FW0Qb%q}
zZLM|ZGyKg+ZIwOX{?N+rBIkpG`-#3LJT}I<K{03umzVTtk(e-XP1yRhjr3d*$6y9+
zkRxV~jrQcvh6YmstYXdKjjbT6J8GT|&n5L$J#Q#<*Dpbc3d0TZc4~VXdd~=i;Q4)`
z10_c8N&%&k33B5$qtpulb7SC+%zMt3o#%Tyts#mG8kn(8p2-r_g7F7ymy8#mR)kND
ze8!!t=?s-+X<^e}0=!;b<+K(q--9D!roBu<lR{Dca7`3N2DS2_$K6HZ0U>zAeBj~k
z#nCv(Y4&!3Nr!o$?zD@r#i6Q_O-1OYxhZ61U2|z(6Qa2QV|nT?Dn+Y9P<LiB(#fm{
zup_(-Da`~6dh=+EzbP(FF;(RsYIB?tb5%B2y`cHA&axEsxSs@Aj{vWxyt$k5ZjxVz
zR(Q_3rYyQ|44QP0H~2(}UX9rVS-`E623w&P1FV@^yx<MLCB4s@??t5BhwNh}$9oIM
zd$wksmU5|{qS6&~R;@6rXaqe2qqL}xu?EG^zSGC}%ObSlQ{TyUn-@0u46!2~WUyg8
z6e5HVMA4L8s<VXgt@5k<Yz%I6A$qQ_ZK!$c?TioQ{W)h46%@P=H{M8+9xd&%wAMLg
z8k*m{m30Q2@@)0QiCf=!Jjrk<$||;ynZzu@2Db8Z*mj1hsm5oUs_cbwFg)B~?~3n@
zjW^XFA9vrYz5Cw_uGg}BgK{lwdlKvt%MRtD=ekVsTM@LX-E?e!Yt%<w)51S?s2TcS
zzr0w4kOs;-XbYU&g*jp7Xf)~UHK(Hm@U>Ip?lWEik{X{p+}j(^ca*l+cu!<>0h%9P
zVQygC7Xm+Mbs*__J01uQKH;4tDn9wj^tKg!&ymmb@p`c`P`$?PtiqllgM;h&F?8TP
z&8W8f(Z_FLtlr{*?_<8h2a#xzpXZ3yuk-F(@@8Ls(n~Fd+xNP$<r5GmZKw4to6{t}
zCA`_taYNpMK&og)(D)k1`C^>-<{SMgj!&<nKAKxfTsR&fk~_4=iX9PJA@skYBnPF*
z@QoY~`Y{rSbON8j)WqeO(OedUZstMTsZdrxKh=OkH?s^<UyY7n^2#5RxoY^I)nD6B
z&u;1fB*C@6Z0*#02!3bOO=1BFOBW*$ukha1N9BurLL`eRuWg*rmA|$W`Ni*Gt4M*-
zWDb*HfeR>Tm5I~?z+<oB1BLD4Zwnp5$?qO!>I=3F^S4+>^TlbcvGbeLHhdrHbt>-&
znucPVF$7i<Tk2^6?UO5hcG2tF`FT%uxLbY2gzcqyxm#Y{w=h5nY!82=b~3x699D1Z
z^iNoQ&E=}8t3f!M#0GuB4<u!?Ytwj1sJ}LbE;+W-hZ9<N@6H25BR7~hI;iR-ZyZ)^
zgqoX!PRWo~Ic)d~4X`SMxaRpu_zPHjfjA*qRPDUu^aPtAYzqZb&fiI*qh~iF0QrmS
zrhy4GkLO=&HQr=%Ac)Ppx3PD^_mD#;6cuuKgwain>TKh%^6BEU$abFx(Pt>qY)k2}
z1jVPx=J|c#YW~;4hULe=&Bo34c4yTJyB{m^ZB6!k70LYH2F<d5S#M0$QKJ5YcEtJ`
zc*2``&_aRO&W)W!6k38mtEpQANN;G_dImPVzQV<<f1SltM4T^+sstGd?^Ag%rqQDG
zzqSA{#L*N0$xxZ}MRtMOeP27=u8Z8xE|$FPyvzvCVE^idF>Y`7*_}2$+G4n^dPwWl
z7DBRgIlS}a$jShh%6sHyqrrW6#n7R=GHO9lswz+hU=@~;Ldl4e-{=d*ZsLevu-n%7
zxaQjU<of=$8<p*jV56aD%bV!4Sfi60+q#`?pRTp8I^4>?U=@m2$9F*lF3nknkZsw<
zu!0)p)wjcZy`PWXUvm8wqq@m$DyOD6_v8qG+MLfk!7fgyX(np6y97$n<<WF(RJM>j
zLHa7QS8*(M5!$Ot6l9zQ0~afW9V`+Ofj$C}Le#b#^}T$-S)YW+UKaE`?6w}Sc`bk5
zu;Uq~>8kl&vxJ7MQ~@?I=^PI8xBc17xSJJ2JvS+Ro~8<trcZ<A3H>}HOcMr8B>Hoe
zFmMnPEh>zJQb}KkvuP~E>Ywm;`amMIY;yHVcmdHQYMM)iC4B4uJO>?0cL>bMOjUUH
zvJ3s0Pu7yGCf#mDrBtl?r3#F$qS8m*b0?>#(pJ^3joy$rR<&U*Jxl35TYiGW0!JF3
zU+@D=)pcL4w?kB2Zsb1tVxG8nB8Zs<VHZ};UIHC<`rWM4H$=e@N@Sry3y^y;OmnX;
z+`7c=CK-N!q3w#&7@3+8gXKHjWff~fd%?M<sUqN@PeXPt>i`m99ySETJ75QaZ@x`f
zzN|d+ZS?X!c}9m8(9^jTw9GSKG#<=&HuYkrEYeys_^tN3EpJ!7$9F$P*kX5j=|0z9
z+}nnqi=!VAIWjCR`{v(rv?q7hqJuDb6gMX%xC4Jwm=su1w>V2xZLdi?PNuN8$A=_B
z*`!>qR`{OyxaD6+65c|P6HlF@(KCCSg)*>Cjv@7%N^cgr<mt)QFk0_+34F-A-0Ojx
zq_bHp#yT8Rn_1@jWY_({);Oc<!<ITu(w$ZcC$_`zTY5Uyd>;y4mcy>%wtEA&*2(?0
zTHoP~#G`F(_frgA(*K4iK)~)V6W+>)fN%kxdG}t<fB`90?xBsLh7MqP8<O#7^(I14
zsp(KjgW9@Q#Up6QUCG7Kc@U=lJ)$=zEe|z59$yckTyrzq3T8}yGqUZ^n$&UaV53-w
z+)7JYTe^f0GtqQ(GCk@WvDGN)x}IyC#C;{kNCic|wVj&iR(5XzGwoCCR7YmUH{g$+
z6hpDD-dgc*A2E7kE%<iW@O?WJZTM!=b}{<FJ=~BLsm{Y0Xl;Pih?U2-1Wk$Re@MIP
z^#sRjAzhsfl55s)izLmsU#6N}Yn~%VZJ##6%S6hqZMJ9iypcAQRypgZu@g(DTbMgu
zI!g$YLxwdMA0m%>z!RiDm6sKn(w`mkr5Xn+<E3up@bm>uAXmrlOO9^(H5)IY+)rzl
zt}xt=4R|>teRAGLWLQRx0>)FF%!Yd}`~@y**Z5k6Xs^fL3FQVWXuj+_!LH@Km&+-|
zD*dGcfAx2&HQ3?k)m)X6GCJ$QXyUCI%MCRoY1MG+NEZ2PkEDY6a9Dry^pm{Y1<%UP
z5$juYs3n3~AbBe!KsfV*5=DNwwZ9tU=F&tlse{E@`t@WlQEc7aQW43XZfU82U((&_
zsX}=kdf(Iw?}+*`@&o-OM4o>wuRgT4wO01V6$`mmvfShd){`6!B&{PhSK9quIT0$d
zbfL7mK43@JAi;=-ZJ*da)1o2hY__^?egVfjg{~h`B6X}FT?!sxDG=O5iwKu;(fuix
zMPe80O8#*IW#+B2px}2`+Sne{kPC-ko0ru5u>_(7-nk>5xueg9Gska9R2KOps<=Pp
z_cKLgm>iet{Fgh_#)w;l=SEG_E(pN#EC87+06d%<FMHNIaS|4;Ide``EFX-SIf7q*
zhfoUW09DRkW>BrD{cQBU=Uh9VYqpBSHC&@PymB~h0dRh6#pz=MjNg4HC^;{CF1(Y#
z;E{3gzkslNt<SDbCn#(926Xj`@5$%+fYmFBAa`jFQ*Yr}ko5MrXwOHhqWz5?-${0-
zWS3raOz5Rpwo&gNRt)%U3%|#R@OE-dyhhBeF*GVZb3iofgRq`j-c|{ES%rUEe$H5O
zwME0os1XH35yaiLseamCtuWi>ue2PNh&S5JyMC|S$Ee$8H-RIBlE;r6jpP;Mz;`t@
zd>lQ`$GDSf>nFO*R7&mhG|ttu;VMsKd%3vZ)n=Y}dydp`&!_)4E%q5J9Oin(-RSkx
z{W;Cc4Trbsf-s1u3zY+i-dqqs$t#p7k^>y98uMC+KsEXcoH#jQjwOu~WsVJt|220f
zVnxHpT>@ssBbfmCxw~*|vad2ygO<^8X~7`5MP)Wb(b&YH74V4-VMw4lX)-)!fpH{-
zAfOK^odw>GFV>g1CXnsQkN}UOLho63zv^_`Q~8)v`uG6%#nPWF`^6h<ydN1;5Cp)f
zj={*fF5`#A)z^Z-!d}GXAG+^e@!YKfsyKM+D6IlQ_v$Nwz4iAb?LToU0X1khUkLgQ
zi4XmTs2dgs3ky6TC3p~uoltX}G=PZ$1cabK8#D!?fT5oPa6z(;J@kxd`bs2qKN}~O
zQTW+LgDu)aO61*nWb7L>xzU(WbAsgGj4WL-8G~uJsw$t}cmxPPDml|?nZ~bZ0OhQ7
z?b4e~<+w;Dw;b%T{f+)96Ii@`P6|q(lndQfbq2&t#Cwf#Iglh+>lXJD2sT5)*6V|&
zmC59(@=ZNkEAZ)C!Gq6zM@Gf&>wvTXv71RGa)^QW;;D{CYPto0bMmk=APIRv&1S^^
z-c4&WWx15wVy(Dji<NguY@yBG6_R%PVjyV0N}y&@*i+7$s7z?6glaq4DuzsP5F%GG
zl4lu31ZOjeNKYLw4yv5T{1uQ6e&~o1uspnA0mQtyy^m1jB*F|V=3TOcO&F|$!^>>g
zO&ZUmxMKt<7iT<{*mQAxfb0p~#6Gmi=Zdq4J@y&+al*w1#OhPI@XiwvR}pl%wU4mS
zVXq-j`^yzU0oejv##~Zym<KQY212Zm<NGTig8+txj`2oLYQJA+?hCb^V0b`5b)j)(
z^N^x%CL2{{3?z~AkLDT_^2cJf2WnO&Z}a@)>hb7>uBt8Norpf|UAajk7-?2E2*SS}
ziOF`cb4kQiRTntG@b^2MA+;KX68EoHJ7e^C0tzPjXEC#G-~(sU;fe^s3Hw@qG%_?i
zHj&Qpxh|V*(z7#=nYTsjz;=m{+FO!@9AmkMghAo1QAQ(fX)s7vIwUb<<kqjKT}MMH
z4ARUW<-$XNIwwc0QtTf(Tct~<nueP0KQ0NyUgyHdwGXyV3@zB0t+7FeH0LV@;2`+~
z7k-Okyq2xedRRMT-`qU{Ax-~JEyl^@nAb)d+~s6qX#aac(RWisBCg&(B$2aG@20_j
zEEoGFKq<zfO#bJ#7tCF_a6#NOa&r8}!jO3?G&3o(_1J=nK;SUzae4ZoS9+$b*^{rx
zin@;hSlkP4`HV20J0Rl3eElI3<z*I!T~LX*vuoMb6oAZK7X;MD6U9b4Q~Z2#!mXe-
zYK!W5M8Snkcuh;o3u?syD>)zHIaoOO7E#54J33*JG<}A%l+eslGU|}{3X|U8*3G&u
zkp#Win*Usg_SE}tjEtAS7E8#Vg2X{>Y?{4PI5mWH?2}pNDw6D1jD{g2<>(TNMv+#6
zNC??2bVve|(iZ?prDWZ<d1Ue(MFf>AZM|lakJ08?9~ahzdz!i0=5H*_=hkKM#ZH=T
zki|GeHoPQT|JmU!94oqPWl=$|f_8yPM_UXEk33;7hv|aQV$-<$dt4Yi4TbfNr7L1+
z)+^d9LEFmpYqD?R<&0jmi`Ub8)1St{U^LDNVl4jS7tOWSOTd?1TtL-SskQ;YP5sS4
zv~G`$muc}m#UszrNGrH>=Cb(3iK-S~5KQwgrXucr3~}Z?<)Y0FLW2`<0n(h<2K;(`
zfpke-7l7r}WTcLzADaGg$9Z7RbxyuUdh(*@M>hA>U@g(06Uthw@eo>j?4k@d0`i=D
zwxxs}FdR+7VkwcA2>FYL6V0?tm?<Qtz~{jiE+V||bTYsuE6*@T5u&+S){T&<zRoYk
z%8?jYl8=zdDtqN!<5h4MM`~?CN7kj?Yq8ZFQZMn0*}@G=$pF^Tm4M5rFcB1gj<d%m
zTk5A-9!6aQ9dEyBu2IHE3?qOUH>es3)JQ!+!J+|0JoAYcB;7({sm$xcLN)2HB|$a*
z)0UMEm31Pl%TC0UIJO}xF->T2XJhZPsrSqYU+71PmX947`5jATWeoUPI`%i*W+L|7
zh~^G9mN=kM8%t2Cz9DgWzPwsZ%g`-$TH^5Rt8R7qXaABSmqZ11#v+~e!gPqFM9`>Q
z0tX2BZqS=iwwVSYO!N?;0yWB$ELU=1A#AyCD@M=EWDs$D=O+jw$T{$Sgs)1FC-Y>7
z<QRam%y}1DsExOG-~d=dsTPazp-{<|0Rq|{Y$pfIev020_qvd1_QnI^hCL7OR1czK
z&m`R(3Ju3IP$<+7ol7T2D@51z=&=cyY1OTIpgT|I13U|cV!gy5UBHK8B1G)zo}-DI
z<O96WLq23iVGw;;eN9Bj!{_UDu)w~u{W`4m=f2yUJ3#`*JbZYAo-e<heAK~kz0nSS
zwniXl+;ag((RddjP3!gt2L$bCnMWEh!d*2Q9|3>V0+DNE=JP=VhN1)FNBQ|THWg`u
zYSBR*6-A$jN3F25TbADK_Rj+28&^g1Ie_?%k&Gql0YIWug*sAG>;#dP^M#h{5;6UY
zl@;D3dgtLUOz&U~U9p^XJSS!dI1iTaBS3$!^|?qP#x0*v2y|=6qbj~h7U3lIXG9=p
zE=teX-?FFaZ5WyY02z%e^UxMtRkB!HI-GM8k<4Q@;>Mh{GK_<wk}t?bohl1pb@yeV
z$eW7Hmh+I#xhV(SI|Cl^TL@`nB5u*pY;j${N8o%=>Ug1xvxq~(44Y^c10}Y>CldDq
zyDi`?w7X=N5WN^!78wFl{9KOFR31Oe%**cbtkmMd;kuSsa)c9We6q(mzvz8;mWT;#
z|KjO4K7}u1q5uSnhUly>|Em&TDQqhC4Y7(R*WeAXAQeyLn5Fu!I)?nLnJppa#Y)}l
zKiTP+)a(E>Dy>npTgR?h%nTDjGoqafjd@+C2TEzkzYCRnKvhN8xFe;`2u2ogtOtMQ
zuHWEv?8&$FIIg@^kwR4~#!rwt8_HwJd3^Ak|K@w5N8Qyfll&t|FSg`%D~2>TQV(fm
zUbzpS&s#&XWN4Y$U)dF%IN^*tp2E(Ov<Y`K<4)Cy8W^<jdHT?g>}s8apkWlMaX%$g
zm0MJXph<-(owPAUwsb$eUM05-Hb;FaH3J+~3jxH(8S(AJN2fO*eGy~oxG&isi5y$~
zk=iOmEn6+dStq8(rjtHYXC%@-m0C!7Ji%Nldcm^&%*1IPNq~wvu)C<f2@-0V&)w;3
zqNhs7Z;Jb?&gaNAYj|i~m;s7V-2zP@u^mDCpw_49`)uWRNBsZndKriFRj@Y~`VJn8
zQObbC+(PE>NEQHZ(xOc~;V3p&ZFkY1X&LjDocl?WMS*1$)vK%FZ<+VFLVJ7CZ?9``
zS)Ii&88k{4VQ)8O9#VIK$rwcHNDt9|L1rr6EO$NG*$I?(x1D$SF2GwxMa9MMBS5-K
zPM{kO-%A^lt|3fi?*|j^g^pXNOi}2Tp=JHa6s}3A0{g=QkjuAQ0Nl2kmx|gVv?|!4
zv4nJV_}L?L({S)->do(u)c8a0Uq8)riib>PUu;Xs(IkF;?5Y#~U1t1mzwS%A)}zhl
zuW}n<E^3$|Y;LBHRY(Lxb+HOh1oqH^dqjr<${+v3@>ILuRgSv5Csm1yh@IRXAufTo
zXt*HV^%uf9?@|PT)Zp9A*HfQP5C|VnKR)q*<{95MU*Rb>OAREJ)*X3j{O~jFbaI-*
zkk@OdLydeuW;|`}-}ml`s3RuMx_OuFD*D(~awywsOhkO=jwfEgvj2mwbBeB{4YX}+
z+ji3F*y`A32OZnCZQD*dww;c4Y}>YzyU+RWeY|fqM!nR_9=qyWYt0#M)~iGqPo3Jy
zgVx}Vdxb4LaRcGVk$^fGvQm(*Nyp~Y(m9<OT!TFueTA_dm?o~M?mE7MM=gMxV+-@%
zd`i49EWA9R`HEK!m6qIB$8YjLW^gAv2#P4>d^%i~E=dM?DVa-~87+o4X;}?*G$7m^
zJUo<WL{ntpCTovH56Qr#qWiNG)6Tl)YLDDncUu3)BaVW~q~6JCSp`7yCvRnglImBp
z!&{fznU4TuI16NWliOu!GNnEa1(m0}FpBOApybKRO|uC#(KU?eM^LsH8j`j5t?j!V
zXHzkeS#tV2TT}P*DzqlsKH5Jm!G7pT@2;=GAJ-$CHi*AS?vMqah`utOW|vT63goZ2
zBYwb;gBHxpC8EZOVkSz$@bm@Cy6I2~AC7}(<2)zhK`mMOmnP6ayT|*ijBQDTt^lj8
zzOCs<S?erM8sky#t7`aP7TC8PHTP(lxJu^X=fRRF6idR8*A0t!FWf5ZVFp@_fz_{T
zWHftp%~JqVl;-b-g|7oANpXi$c#asb-Av0F0trtoLC&eSBINoFgr=td-6w6u{i)V%
z6T=p>`wP<QaSDG8c$YSOWJd#ys%ejvAZ`0e%ixZJYm^B&);Cg-YG$kd)~?(8bqUY<
zr5u*C^}{j|8Eo61_5tn#U8SUu1j&FP_7LpK9?_~HCN$e-f3dDpPgVmJGgWLQ3_<fa
z2AzQL6ZFb^;WKr}G}ssO7D;-cokqPvI=bOEHMpj8mN3EaUi^=7P(g^Bu7joOc)@_C
zgKv`rt{tW49-ZLHctO08F|k#u*CLy=b{hQh?Uts5ql3HxA5&iX?;=*)6-N3`xC+b;
zV3$2G`Cvpf?R0+BGa3Q%p6qUx*TvE0FzD&g>+*hL$=L<y^1dYn%X?q9`&Qf6$UoSz
zMzjmNWWA2m{_!>bWFxk``+U&&(*i9qrUd$2@Cz1AJwBxQX;=4pjytn8>`$-J$UczT
z5bvRI8J>gwCZH&0xU6_-{hAN4*s+d8)c~dpZLC6l=CAr|15Sp{98!X@Gi;!MvI@l8
zVG~ZsjSB71#gSz!jn1P6mNT#+c`-ovipq3{wI5ahKbA|L{93&~=vRun<7frLKle9*
zJlFzInkiUg{c!QL+u`EKsWAjVo8}>YQAO|9@)SJ&wLU0AmsBEwxW2UN6MM#P`hoEJ
z)smR(UcjL5A0!boJVqVWUtv;h&0IVTE@UtZ`}u>RU-~dr_cZEc*n7fzZ&Qd}WT1X7
zqgaq!XpO<$s&&P-8nz}j64syzUV2bu&&wLo#s1N&;Y3u@zwH@di(h-ZH@O+pF~~$p
z)r3i%;;1A3h}n^rW35RS=-IHfv>^(hr5K3YNiM0TuQ$w15_F!x*n(oZIyDIl@Q7JA
zKPGnW-2L=42b)8WSc|q2uqH_C#4xpOqAhJ>%V5~qEh%0_HHkX~)eEK#fHFNPASq{N
z0xAbXWYbofI82@ni7JfH_$uriWe7X4BMka};4V}R+MJycbdF8vs^Sk}oF~_ZYf|`1
zmi$v%wkN@~0nN`UPFVJC*Y2$n``bUf=RG;Rf1IzTkehG)cVOZmXlu3;+x-J^jKH8R
zlO>hi+GJ11{_;=m+R0b|#ZQ@t`W+HdwHW^+Q&?iC^YrL*rXh<Uq{MT=@)lB1M$9@?
znGWc)1#S)3*n^1WFm0_F(0bFm846g^(!~gy_3rCe-Jh3htxg(WGgsZ6cfjlR?~>z|
z;e^%CUv}UsQW+U(_aq_dR3i;g58QF|VK{lgju+c)Bov1O5M)^d{%LXHDe*Vn2(u7w
z@<H<Wk2eUs=2Jkq_dKbRXfPe@OXqzz*)3ZvG77t<V>G6Wx@~)vIrMSUgk@u^2gBwe
zIILW;-mCSJ3UY)IHkV(07IeB5=lP;z@YWrZYb}ELhpWEyaer;7{gH-VvpKx*dQ$s&
zazb+t-jq{HC@<;ce`EnXx=U(rsLu)IcVgfnyRF+2fwpRmK?jZ|`|ntHSW@FSr9jNz
zD6oCjZ|tbQAwKb-kLL?e<G&X#;7oOQn0JTtPnt;Ms(r-pnaVwfLd|71q-CLnTKh<^
z`nToQ9cwnGn!!Kla$+@On$rp|SKaA8O$sOdBW+wa^1s&Z&={|BUaLN@I5(!=K-mMC
zAV<C2*37nVr?$T=4x{Z}Otcb#xytR<v$$@Q5Vk*vFG{KFm5oPTst7@njX_XQ*p+=(
zn=ThBaShyXch0YGk4wV9Y)iyz$o%c%4n@a2bg;tw6`hLMxWs&;Qc`siW2h~iFu(zF
zT=#1v5SttOJkEcA)lDTj2h&AeSgs7a=c51z{ud>B9=v4%%=bRh;kPy+;o09%OLVI`
zkrM+Xp#AOb^zm?g3)$~^6O-vuSpoWbE~FWHThTF2xQ*r0MW%`)ZYUP~<z+Ia<a}!k
z)9#fh{n=d7Z!Z92KscOur#Ch8ox4J1P$NtpzsB|k25KA=2_D4npb}TFYZUIqxR#zG
zT@<P0RNQv{5jRV}mHcHfcB=V<?+!HRpkvP|o_p^rY^#;HHLYhu^R2Jb3kd1(7+)oG
z9Vve!0TA6)uglNoAjSM<6~6&owkXKqEYRyNw9v}G#ldOA^XrRvUsOZu3BQd;T3iS)
zCRT`n1x*Fpo-Py@tdKH67j?2U+AI16JY~3X3cP7HOa$Mst?twQn<m*1{PXZ$FMd%M
z>34^k2xT!RpGmqOx-U~<c$CIb;cJ#5>lYMUQrK#e9b#D_sY<60;(~I}5Y)@Gw8JX$
zi!Ay)ox`C$_lgdqq}BaMye&$se!+Z6rtycSHKEt8Rj^u8xqIP)r;SqwX7U&5Jn9_Z
zVxsn%Y2}ISjbj~b4|bEGj$|QOIqPenpf3h|v&-|LdIgXrS%0gAk!|ohj9V*Z%c7_Z
z2Y0F<ox?fH3HTRye~V}*KC?<M$83vUQ89JvusV2U?4-<TZTrz<2rWNB8oWw1JTik8
zjs1pBY*;@^|8^LO<G=~u4>~BKz+-be%OP<>NL>8J%t#aUSeqbDDxhSliG+$TKx9e;
zvlxFbQ4);wv|FRJjT<Z~ugkU_Km)jg1JL=19LE<_X5lv0b9v&GDJfSG2$GqF(&!@o
z(4<BdF1;sb$4W<HE+52vqbM{g;rZh@&OApH^D6beG+Gjhc7(&8P(sixRpv|K`OUk#
z-fDV--;CIHvrf1FIiy80%OvA62hG7c6tCkH`Kw9l97D_ID?x((%X7_I9XFLwDp$N7
z1CJT&67Gj8@<a|4lP8!A+@3Otbq9bDEE^prkt8w%k~EMG)wF(x&1hbH%pXRMBKS}i
zIuW9$-@A~kILAK>8@NsJ;2dTCT}ab&x5uegwym^=q+}~<KTY;BgJZ^+T`TVL*kCkh
zc+T4MC3x~ZzZ6JAfJZUFWQ4&GkLSKQBwJ0H4du`ggNRsfZfWR@8B~)+X(BB~{JvkW
zre;Mbr{*Zu@1J3B$=<3Q4O=HJO0J2Gj!2(}YH3zPZkpqA45QhaPiwl>RBDF8W8DfD
zSffV+bM~qV^G}Rop|V+XE|AGcfmF3*Vac`HgO%+ltt;kNNja4R9?XHV5nOj+>r164
zyewpyzd^ZhEexTuetKC6ukY!-bbh|%7i?h!h_%@e9*Jjt?UvvpP+hvpNIJtmU-*Tb
zEzd)%HQ4mV_Iu1BjPi=1<*Y6aT4&7|0UlK`xpa$_v)S&ObWNwKgjXBC3R)cC?54U2
zFbHk83)4+Fv8TAnu&3~|RSv5AB0xVFm}qcE>Cb2t@z^}2DeH}d5#fP8h_>Sunv1LB
z{#2uuiE>m2SV&9|$+L6suQS?kpxQb)`<FM4WPi~W;?mraijaC-4$KH*cH4zEe3>bV
z!mu+*-yGLB`ilY#H*&)8_;{tiQhD>0Y?GY1;J9MMVV}~d_@i*7VYb|$PNrZvps;S0
z84Vk5QjQRM4h?qy1?OzwtWLAAW<`o(kr9Ub^6yB;a$571$p4kyqYRm{9;)WGoDQnd
zR%AM76WS`0wmBomd6fqf&5H)XB=EV3kS+?kX|bu%)l@7{l;p`Fz-!L##Tfh9zwc%n
zMCDB#OoEa`g9Jy;o*Z3{tvC`TNhTpAB8%s)gFCOtCr$0So-FHgwvu-O-C^1796VIa
z-b&sL$Ddo*Oi@FQXxvYW{o6!1Q-vdp6$vSUm5z(9QvW7lWbIs|$RAOg1jl{lSn@^$
z8`hTO!K)a{++PaZkY|TH8_W0U5n2OJHPeF%?gL32Aw$%!cfZ7^^SPTTL~m~+HZ;)$
z@t-pV5@|Tb9v1#6N6!j9s^_a^u?i6uq`~qSB&S&?2K~<IXKW>n)&ZG00j3i9vett?
zHHx5PK_bj)0t;a~(jF{doQF*5;tp6l?F%y-JJU5_ng^V(g$>i&{AAyItJgc1Mk8)8
zTV)XS`NidI%6pJ(bSBzfOc1Yu${bp^@$a+=N#(J%J-rgma9f}tVnJ@k-P^j<uB0|3
zA`UIs5J1+FiPWdrQEEQ?pz=gx7hg!BDmbnp-_ARoGZ@hr=2;?s+HiMWK@>??{%3N+
z&2*|?GuM5PxmmONfC%Zc_Rvfj+kq0f5+m<D(kCd7_nQ;)$MdV@CkZdaHN-gMWD1-t
zDAc{ocr@SOAi}@dTz=lCtG=I_s%}dmsJ*PGqMm&HA!1)CJ{Z0Hl)k6}e3)HGKm^V9
zYi(}f&zI}69Sxp@4B6<yl~_C-SD%6f0;w3&uw2hGk3MNvTSG&Z6gnO!qp>knM-Eor
zj2uG3tQ>oB>LjD0u?zDhsr^BgwWu)Pnd*WCE@`T=(Qieb3|Ukrl@p-~WtYvjQRe6o
zbjpXns#F>B3Z|@M2QQ5k^$1(?M|mD++%wk&LDkX(a(iO!{-DJYN-X>-+1b)Dg}Vo$
zEUydwOBci&!cK5<#48Tw$5Axg?ZDflVk#8}6t`OcW{zYg%W?y=uk9Tycyp#N#g_k*
z2>D6B{*OdxZpDuzS=tmO7h*=;aL&eBVb36%=|u~U=U0!^#G&AhphdG`rUt=zgCP`M
z?62X_C~S#Y#Z4V%{3aM8J=+5;36Eh1K0PNppf4#n^3;|t79NvzQ#BhWqqc*5>2@{}
zeIGr^T+=6apqO1F@Z`IRM?gtdA1zFK);7mH+HmkCK4=U_C0yoDGy?zdeFE67KA2Rj
zN#ib|=UIiJg)LG{G(iCG^tu*P&Lvq|gm_A0_p(o<sgN9x)9IM3BU?f`jp3X95m2g9
zC%`e!xx<`ww%r)tMa~X)3b{i}DthBowE(n<)0%||_C7Gs%5lRt({+Puka!1<%RGyh
z=FZ(QF&=4_^VS#{k%pMakw{^Pp$+_6le&ZM)+Z>wkoh(B6Z4H^fKY_)<|7+iDfeMO
zdmR_vc`j=!a|UbM&^95}okysrAl<O3;-G|rb6iRKuN>Zh;|`he$m|^~yI~1gxqnua
zNH9RWk%I5W+*MZA5LMcg&5Qf>UQ4iw^AnLr(Z;hJ@me&ybl`rzc0O1EDE^@*gc#G;
z3El9RS1xUwLzt9=Q*JJZz%_6P*9xS7#z>tw9g2^iiIF;0-Upyl(gN`0PWGvt$fgwK
zoqE(_aYIwNFf=@Wl^HN(1z6w=&c{!R=0M;5*+9Q7OEw8F6DiCya2%~hTmmHvcq;&a
zqfkKU+{|~nCWXpAPk95a1suT*9@j{2%z~{gcd^fti%+X1PtA>N-26H&A18?6-9RVj
z`&rz}x!;d&BCxzfo`g`+WwkFZ5i&ii5F>9>$u&89qc89Yy29xs1(be*3~h@=d;HNA
zxUYpgj>yG*%n8ZSvfSUO>g#nj%}TC|^NITqa>{JNjtt#3@f;~>N7)H<u3AqArwJI0
zXc%{C34(+3tf95WrVQKO(eg6!uy@j86Ke`=e-F|Ag>mt@vmpn8CV)VftHbwwNmDBj
zL#w<I6-Fm@ZkjWdGGE{h5>XfoDB|n6Rmm`$l*<XGpRpo3a}M6#%Nm-mon2U(qfvN3
zx^m0HCU#7P4#2W#mXC9KAf7!$0#bIhCDp)#6nh|3g3hS@dzfv|-T@I1pQusw7(KsC
zjSaz2GAYR`-rnagKD-$~Fqe-qB&(V&2Sc*gEX4E%z`-TU(HU5#3*x0yO5w+Fq-QGq
ze#vBx9mo#QvfVqS;1Giy@D=OAfZ#s6oTL|ow2A*J@F{KrM+*5NX-=q;>A5x}8{!pB
zj)ODievwVXP9d$2-wY$TGWLt%2-IX%%U(FCO~@8Up=2kSNgy+lq35+&v-&*41@e`B
zc4(wuQ<k!_4MA)W8hk<#6@#eon~%M)Kqyg7QPBv6{X9x#&(2+TPd#NbZmN~8jq|>`
z6u)R?i~`p6Uy`nYL>)kkP?sbsLq$I_5$A^;E?*glNv;a|>8(57+@xG|%*Y7U!)&+(
z-wQ8)Ojz*N6-zo~aM2+nypS}soM233`gPRgH%ST=w-Y`a)hAh)VNh@-)pNit(a=7k
z!UyJpMw0JXQVr#YvcPlZ1w7<xPoSUh?l(b3j}Sk&e5u@zR&3=tH<y<mu&=CWZ(ig3
zm<z23H;J<=HeZbGu74ce#z=||F44|48(qfqI{^!n8b}j!lxe{u@dYvpgNqY%n2q+o
zCWSIk#!)|FABT{=#jEYXrJzgiz+R1dxdRa21bK;3++gsE@yKS{;N@}I+VPQVYJU2)
zJoRVQtpk?rTY(1W?nAFY6!8jwzbB8DxJv9z$M#6blrx9i@QXiOMk%xH;7<N%#dub1
z+G2GMzaW^(BED(|nHT&!^+qk&Z=*$=`5o&K8HAcGtJPf3%zf);gX`6pGRJZ7r<2T?
zhVG*1?YvMYtmS<iw81{V%I7!4`*(9Pv1U8jne6M>1>>l9;to=AOELM9_Yc%@sm5p@
z)`MW|_7Mx4MW*{FyU9dhGiv)`vDgW`B}=f@ZP={kt6SP)YOHXsWq$CzBN?;U2GHmP
z?S9o<>?Da*#3Bjdy|&HlZCMs6^2feMOV~ry!X5N2X0s@Q4B-X0ZKuQedPUduKoE{z
z!BNfd=yzXDwZyhNEb*SKV&H*L#5)iUtB7_Svt;uB&iuxHD2&qP4OJ4q-b%YgVVbhC
z#Wdn$_7~i*+3{`kziJvW@5`_O#?9k{hSCLB9}W)wk0DP8|M2rK+!h_1+qil)CBA>A
zzSGATxw~*ZI>Lfej=p!#b4B9YsnfSngCjEB?+vs2kK&bgLW1-sNY2iX^nC2xZC3J9
zD1!_d{gwAcWXp8XtaXnwXZ!7fU=>`=>#0`fazJB|r8i`~161T<+*{-1lib4^i!X#c
zH95UCPrjE*#%x|?DsTE;WT5rI@RpL;g<Yj3KUaO<lr473;MeA8aKQz;zTUxK@8vD|
zwKcoGYU7=2FEKleI*LJhZC(y5FUIHKR45etXyQE=Jp)H$j%=g%H%pIC?Tz@$spUth
zv5V8mqL?k_-(TGWH#Thjo1otFDW-ruRdRxM<EFX4);(KECM2Av=fRo|O{zM<mFoUs
zq8E;<YAQXy-8TdHZj3W|>QAf?qY$;WRNqL1k~deIXobTOV~SCx8zU3<_(3=pqA18m
zWoZ<lwY7_Mgw0h>T%ZxV46T{}?nOLOf(RO^`h-~QEUF_^zSOKS0k1fsE8V^512pkc
zF_!WskWTecxmU{Or6h?3v*i_Jl|_DTP?=2W4_!0l<&EgD<`L`XhY%dfGOM^^1&ACg
z{$?aqX;t*!+MdF7f#yR)6)dBlNJbLXE&FO`SffkvM*I{bo2A!AcjmVyN1c<JUtfWy
zB`Zm!rdO|^WXhsbKOPeVA;pDW_eJ{#hpSe#G^dFz)*sAZ1HX{z^dQ5_DoDB+#z};X
zKiHdD)irH#!aZsL{y;FIRu3537}|^~77ofU8fI{jT;xu4Vrg>xnJNr#X+<}3l|ycY
z>rbLPhJ_8RH*#_$<zy)cmmTI-+s9x>#$qW|PK8-a+`NQ!U-(JSUYubZ(aj)7%Eo|@
zCQG(qf;@56>9U0<hwbYK#Ip*6?wMe9m^|JHGYl3_Tc#y3VScAtn!68Z;Zs=q;d;5w
z;Z+OZgK=E0sydRf^D<{jVvOXddd|62&D#bP$#s@wo2;45H{zN(J3G62Giu`y6T3BJ
ztitNJ6f#{=9M6qi91u{A3MjnKsuRRW&F9L?n^0NL=VF!m05~o!4Aw>#EUwqc8Cm7|
zD=t)s&uTdEn<>bHgy~o_hzPS}DU|n<rVcKud>+Gu=fg>xnqtraF$!4J{~AKtXBTuF
ztgV-FveKP$>SbAY^>A|NJ63OR+h+l(iK2A8?3p>^(d&|AgObAdnesTe;y9kOTVv!g
zBLw0ESZPD6QzkgS+KN^qw7R&^BUT4a%v~GwkKwVn@Mty1Ga^VbT!7j&TPp2Qd1F)R
z<A&3{gM$|x&%$^>X;R48>TK~j#z;#ljCffCy2_XoKB_Tgj}TOef}i>xe3RledL9_E
z%zM{5zqA=Srct9HZ<@SWlm&GH&Pa0T^On~6#riJ`2>qXy1)yfOw$4@YuFS-v(Pm~j
zfFbjKDf!r#z32V0Sy>KVy==nCrxd8%+O$1$o%{fd8KLnG&S*F`lxP|1Af*6=V8?D1
zuuRzyrNB)wcMfG{aoCDgLn>~LD9+G`17k;PhNODsK*D6;$6%qUG)|7J$!2uaH?G{|
zRLqn~;}~q0m@!<@Sj4I^Rf<&Seq*E2SanJ8gQ}x7T~dqEK)Nif1wLi;SaSRp8)fm_
zDU)>FyZ5v>-9Ir1H)m%sJIHjj><1lE-s9i2{YcV95MD~Glo&d0&faJ;)kPTx*{s}Q
zt^TS7;gp)xx;Q@o@@<W$d+wID-hXL9zo&g=anfi>jf#RDGSFDEj$vt%(GcW68h5wz
zD%(aP?t4x`r*k%d@>1iv_}uYQYFNsh8b<5drg0OW2E-eK)TCPq082Z+KecDNUJ&l^
zhp%f(yj+QCJq;Hxrv|}f>@+tn3Jm<!fw`+Q1FIIp8ntIzR0L6rPOy}GjG@1GVKZ#D
zKkSf^u=jn~x<LNciRV!Rt|Pd3<txjE2^hGE;-cU1*+8~J^lWCgUM}v{>l46!df6=0
zt4EM^;~p||H`i_k!ObVclq_Qm&vpO-zh)r1H1SL=eOBp}4qzYF$wd0`RI&S~uO!`I
zoayN3A4|oULs!tvJ9Bmc!t?HLz|qw(LeixvOY9-v#~(h*zVN>D^5BFqgq3PIQB0_)
z{aoe#VHGIx=70duCTz}5yJ^rrB~z*y3-~#!v`2(!`$91D*xtNX3-tNN!*(abqhe@G
z*<}Jh*)ts=cDMRP@g>4YG>iy7TZDsJ-#9P|sUNcF)iQ<mKUnQZf0sIRTNf(t4hOl{
z?!e#h4e!u&-!Y3b!#IS*qBP=t1cZqy$IYZM*F-?w{rd)@n|iNbH-8@FZGS6&wSV&N
zR<debUr~7OgDTs)D_~A<fEtpEuqE&1e!lCC61!V`Z-Iy1dAVMC?aoBKiu~+pI^T(U
zf98W0n?r9y`Fflyh*!>FYyTTT5{tSx{nncM$0Ow_-s*VFtIZDH(ngt$%F1>B0;)rW
zw4ZDr;4XprhWNvU3)!AUOesA>5)6qI-N>OI+s@8q89u&6`hgC?S_D9}vyCr2FkHap
z_?wz(4j+*Rfp2P6IYjgFp5n$R-w4X<U+pR<S}Zb`Lb(z7WuV_B*k#|0e4Ci>sXy&A
zX|{moG$|8dJM%&3-x*Rf6<kNsA3283=p{;z5w6TqwG;NO<=%$A=b=_s8y5q15=?e$
zxr~muS3n0JAP8;oSkZA@@xttk5Se*i&CRp-&Gam3$yQ@|qa`vM#-wZvl`2bWWH;6c
zyO@9FoCM12>??o9A@69l3h%oBsr|!lCx$^n(q4py$w%RJeYwf<KGVQJJFL`O{{|(M
zOb_$@f6>bj@#c?LKFX+k48SrjuV2U(>b@Q{F<xbBZO(8>_q0I1*t*%ZO?J=8e;bT=
zt?OBd^H#n4FYl?&8`$Zwo+jTnFUJvvVk`IDU|UG}qDX#66oWQt&Sv59Y^M>dc=b$n
zqB!*$FomLMNHpL6Up*NhQ-9W5!-Uwu9uRLmz%u`Mn=g!0dP2&wE{<4GgC(_q(y?}F
z8;pm((xCXTc4C%VhNtD10u(>%Bc8eP6!Z&bUCx+gSLr-UWmrW<Va8uxdO%#bAciiS
zSSdoNTgeud<Z!ojT^j6w=cGfhR1tw#c6&l;t7-&yWo$;LNX{Tc*#d9~HbxXf?3EGW
z8I1OG^-1;)Bj43=QX8rw{*n+xjj=+a^<C~p)S)1YduM~S(f$vu)+0B(LJ;Bw01_;<
zonP6rEIl$otc`KXG4c3$bSJ<U6^1{7ecqg@Q*-bG$_8g1Lp23A1mSS+nMnBnA^y2L
zjLcFZb_eNy#YT4<fxW0f(OlLUF%j`4l?ySFSE=LG$>VfLJ^xeX-UH_2C+`q}3t49J
z^*I4HX2?^u-5zG2QOh^-tw7eW3;Mb1KJ3+y-1)=SG9m5T0BbF7bcHsy#<lOTX9(wR
zbL)3YkGGM{Yd2CSara7jy)TDVyjbA_8z>JZJz`nt$u{e5Cvopyx&2wL{y<^rjiQhr
z9V_M)$6X#Fhl<HcRZ5#^!+fZoFcyoG#3!5VgM9Tua!Mme(t{ChxC!8b`K;;mQY;Iv
zTW@=28j2wHHs9yAqlYBmysRhJ_8i^^Tti4u0|2`l>-$p{pZFXvb>bUEsRroOHO6S9
zPjJ*?4YmLgFsl5qDLkAa^^@y6;I`gKsplwhqlF6Rmkqk2OZG*fxl-RoQ$=ol=a+3F
zt6ZF*tO<1=HKw3cL+|t*dPbMCEOzGi$<p5#?{;TE3~dfET&=9a^WpDY%dQtxMNl-(
z>4W%Cu2eZ`aY+7K=wzH$`J4?c$6<|iX|k<b_FE>82ZwtpSuz|aTowR<oQVnJ&JLy;
z#qyjZ-$;{=bWUoZwpF3-0dcq=zRAi!Ha>+wiLkvT^sr;05=e|0f84u;c#7sG-#=rT
zuZ~`xEsuc*-X*Fk>7Oy!)j?#z$Cg719*kv+LjLFZyZNUw<>d^u+d!DHVuTG)ZsYmf
zn{T0Z=`VV0Pr3qnom*g}Ixj<7o}k}UZNgfRhB6;%(r`P=791))tWu=ZY`C`RQJGIX
z1$!TToX$bLXn=$IxBi;If#M3ohmuS`n6&SEMNsucy5*NMSg?!lo}!dCorSqjrPWS^
z1206+qI01Cj=3ONfE1<T`&~{9vcl|*ESnO0s0i#ba8^{o#$P6j+qvE232&!IlC9!H
zLUS@|L-x&aeD(=%^}l3j)P<C+xQ!A7njE36Zr*dwsac}{G?z#!{xmw#xbqNo4S^e9
z?`dAAEJTbwsD52ho}EyL)Kdq^ob|G<Lr&llYO>RJywkng=LoHO+xM9RfK&Zr|8;F{
zEr^wF{hX_trM=>B)lbq%3~MvA=*?=MyXBa_-<sWS&7Yeoj}pX&AA^()aa-no?;Sup
zVGa$agQPCEX@QV`1NVk`o1WJVQh7<qCW|xk`^u)XHK&r?wFtWe6zR877hq`X&COXK
z1@dlJT{UQw6eHG^4TWw-HyVImB<OUhwVj<o({!{el)(lv-uT3uhbUe=9z48LQGo#p
z#sG@FOa|*~_!}AX1BC+$_fB$=Ksr%IeyBr&+}R4GROUdjq>lu5*_(-b3%8ruJ#gq~
zT!J_P4x1hG!6_hj=UEB%tMt=Wwq5|oYVZOEnjAK_8H|61wTB!OH&PkWmx95{YHjWr
zm)=;AphK4iOXgpF6@Aep!wej?S=$@tgSR9lZxLTn1GrrL`98}tv#_zk-b(@zWiD}2
z(d~m~i#EQ%J?L(ngZE(J!RHDFNoI17<09|#)!Hg6#Tk^<A#ksD;#&Hmj3b{~{<&MU
z2<>N)=d<p0PK72()wI}CX#h35cBoLtHrsP!cKsy=98Rs&mDA-+XnjXZ<C{dEPqcQ2
zY1GY>5RBa#jI;fqI5^Q#(HL+MEUXqvFDbJqmr0t+4W9&{1P+fU<vmyNDwF1YqD?5M
z&R@#Zm|y3DyN@<hXcLuD9n+q9uh~|^`szr1Mm50Vm;G4@r349h312nL4n;eyi+Xe5
zMC|Y#ofHxZXJlzXRD6Yb_3+82Z)CbG(jWh}N^dI07dUAlABjF7&yIks#<BLD<Fl7*
zR}v(Wg%$WrjKvw<y~OnPMMGx!z%U}?)E(Do(j7%+EoT>Hu>EgVfSQD-w&#H?sOxo(
z^kOa>mC4&drwVuADVTJyo^q#f=yU)h)m&)B`PK5agCwA9&ENxeKJe#_<=0a=k?JkZ
zb8m1x^r^jmUBOs~Xh0<IEkTJ>U=QjTa(z)tue;CXfQ9GH9DUdsS?6LpZjr)V+*+#b
z#tbKF@Jiaxz)j5DnA%#PeoDu0pA=90lD@UO1oP}kB;BlTqzLG83HI<MS%74RIjlO~
zDgmhdwi8VOfkV{UdCHp!7(SY{g!B;fZFb#FZEa|B@R0IPk<TZL0n@uPm%yvu@`;kd
z;NT8aW?64tuk|_ean%C}MPA`W{&0PVhxUcf2igfB=l*4@Fis0MQ8v&+6++*o3hH&*
zYnSH^A3N<K4iDITzY^HNOdyzTt+zQ4%OL%A`{|A?&|j$dzP8<U=mcbsVL12>Vbht`
zk>7b&Fi+Qjz#@8J{~6&*OMsh>pv|xgKZh99kRD{4?PfzUSptHo-!FQUXM`A!*o#~H
z8AfI0-*)8iro(70WrpKJabSOQ360OB!f4-_%~Wveumn>F>cC8s2t<+4p|MZ`#a6&=
zYtU5&piz|R3$7p;<;KaqP%=9BKK>s?0(x%E;RuLb5LskiYFhzFyY_0LFcikj*!E|3
zB)6FI0})WncMDP^mvlta(dG2aWcnEElE@Zu;LCb4dNz0uWLBQ8mBlZ{mV_bPQH2K`
zPt6(3bH^Y5Y=DVu8eszc$1jiLjNxV0<T@(-8@K}HPJ?|0>bdcee^%g(BaoKfsvVgA
zyRHehu7~!(CWg^6<o<}mkoGp0yp(Z;i8vze@yTYkA-+qdC6fOfci0)!=*s3X<rLB#
z(9kJ(OIZ~XwTpALYal5uLKgPnX}uY7w?rjA1dyo|?KXiOC+`FctHZy}cVO%)`3B1L
zhX{E!9)oa-7dHPVo2IpO>E6X}JK(@12w#AQRF@z?lCV#15Mk~(JSnSZ<a2SR6L$Ot
zFe*D@SJ(NIEoz~wrAdq3x5rJbouH(VAxaSLPzrwJTz2xMMGky$o(J|%V+1-zT*<|=
zpok*3kDwBwBZ%>KQl!)UOR<xB9Ht^f0VG*D@%x8K)M8_1J2OH;W4#Ro*2SY+Dk@y7
z61cP)gb~WhC8m+ENZmQC^gE-pSO7P4a%Y#`B#6?8Qg+r;%)efLeOC=BNWKVxnf;VH
zC}|Z5c9Ku7sGn`97iNp(ct076p-enl_Q`j^R|-+;C}r^<sbsowfl|mOY~5HzB@%PP
zTD(^}7j};co;PP)+DOM*sneirTk<6l4}DdDT5ORmzYgj(A@QJV+}Hhpvun6$y4&#r
z&Uha*N$6SqxYt~GY`#iBUN}RcvCA3|LaY|U-4rKOzTGh;9)1Fv*N^prVrXtk?iqbz
z4y?exn;+J!$1MEK|5lbB(jv@W{1vz8=8Wv3cHN@5gyq!>&;7pb1h}#}2sq^W?D4@X
zjO6Hc34Ju;K2$8{xn1cF=l{gSGeCOMr<wTxuh2qtRxrD+&H3bwak}z>P7Ma`!@8rc
z_l$5%XXu2;6qFC!YNa^C(~k6z$=2v%nrzqXg+FQ*CW9+FQi)L1_!=<<H|_s~j$$a_
zr-zPHQ1;XEwH7f-W6Ip;d|--h(qEBaTarCQcW&IoEEjN&1|x;RNQguLsb+*YP<pi{
z%Ub$w(}h~ZtA`^&axn7jAxlJEK$9PvT7Tx1z#M%HLRD?LN$Z@66CyX6(q(Ocx_(Xr
z=`)Sb+K@cbDx>EcE?mB*`>Q>-TvcUZ1P{@~+wUOyVUkRnl<{HOKTu>!n!)_|I)vu6
z=n+HF7O1aU*rhRfe+MLV?+PnK9_gbjZ=8e5L(cDTc9~sF9dFdOok-w7|EAc}MKMpP
zNmAYJA{`Y3RT|5o>7bFY9`ORu4BO%Y5t_2r0pj)yGQumK(h6Fe&9p{rwM_be+U?}*
zPFV0KpT=(1>+0>c)#Y}K0TCuJh`FR7HykFTL=H%96@QZ+d)j#Oa(bze_6k={)(qMV
z4WE~KU=!06oMUS2pVe_UJU|Gt<!-%fCA-UokPcn%7~j}<wRLSqC-IF_d-23+xCzu!
z?7n4$xW&92ySfYNwy*T&J8aUHw0JmBrF4rb)RkT92bfSFrGSwT==^HPS0dFDz-IB+
zytFVXfoaU58j9&JOd`YEdOxb$V@J?`A|Zwd6IAHv3(N5BAfNvXIh0mTW~2uI%|E}x
zIi0T8<Yo=9s(s$|AFntKG#Z6MpUA@-NL(vm6Q9W0Gx1mtseEE}epg-x0En>CbQ(;J
z<*>DX4alsTxW$X%4SbW5E-1!>QS|Q56zIMqU2gUM>~)%s4T5$f%r>1duWO$3uD+P+
z#KbeTqbh~i#51-zJ`r>fbxiDIIQr;iqclpU6NZoccHk7j?VFR?KM5xWwFQ9^IwH-X
z=)X}e%5aXm?=zQrldFM(ape~LJ6@-0qDhD-XDKZHr()?igt@y3DE|1MDs3uKa3|P>
zLnw)O^Cn03zkA8qV8eYkH_eXrF|=mLz2Yum2$|Twnq7Bwp|$wi))8pU#Wt?1*ecwt
z`*8BY2GjSFR~APY);qvAD4#dtmG+1DKbIujW>Hs?n>z7c(E8`2iDvu3P21CCr<Fz@
z<X9(y+gcspneK-S3O`)#qr`O*z79ya&m&ffXnEekSr9a^iD`fNt<!Oo!{2+mHTkXc
zHYy160$R~^hzxvD)%<?2I=i9|N_+P!*2!@hH@s-$Y7iCjmYy4n!nLPKC9yZM?(-8Q
zS!3B*i~JK1(p$Ld>mW9{^20y$+Sv^up!BNiqrhTL^FnmS`Ci(3S*`-g7D2P-9sw?j
zz=CCHqMmwQ8n(5H{gxgB&#l1;U#c3>sg9qY?ns4-UZyC7Gk-;K;1AlFA<RoY&a9jP
z)h%P-xND>V0a0VKhz*gZD@%&QzmdeGh@q8}m6d~&A#3aa6W*3t?j~yq6z4zx-a#)W
z9{%J}enClJRd+yJ$#?JW3Hs{RV8zn%2KMd_x|a*U+7@Q>fk&*XGZwD&gLk^}hdo{S
z*)F5p%75GzKQI01o<|0PJjn49@rW+(hyjjz-7^etE90bi&G@+v5NPZO=~nAAqpZL6
z{?Z2ETU9@{ljbRt3R5WYWA*O@YlJKa2X<jb&hvga=S^6ZWE>0EAyFQh=Wd)(wi}(w
zHVO^gTpOpcP7({DL+0iEHJOw3W9JnlHi99kMX#3LKTJLUnvE$5-QycadSxrVj)wz?
zDzeX3{WA_$QrC}#C)vQC%;Imdus+O;qOP0OVR-gR3>)OB!-Oa~<K~@(;f)S7oUm8Z
z^-1v$)bz!G@EWihWEj9h_*E+r!IV`hnm+j;8r+{yI#VgqVrr^UJ~hd{QHtUk&||tp
zTTm(P9H7Y&o~8|sQ-Yts%x?>X<tIZbpa|j;?PH2y%(s7`Fsr;giofUvr=0J_KmfbB
zSZ@9qt^M~=qd7Vy1M)G}Zkc^$|5W2-C-@G%mtr{i7?Qd$lg^4f+q_hbE@}kd+%Nk%
zw=(9f%oK{g!IchxPB??M0a)ZX;lFxVx9lGQfy7AY#*x9VyHptyf<S?4gR4T$Lr=T(
z0;s@{bKoBteKj9?%&fO0x2_~k&aF0D+UcD!3KIVSw^Oj{f{Kf&4J2#E9;^0fk=+pE
zZ)T?9I(XR9z0e8R|Lg3z$GW;%1D}1a^rAWkM{MUm6j6sk09W5Kw-+Ii5t$4B#Rmzy
zC1Q}hRXek&N9wV7pN?bx>@@nkwm=%@iIo6K_P9IcZJ<G&KRivW?5Aouc+2<++gJ<y
za1D3fJ`D{N70Y1+oc*x63N^(~VY#<t;(*floe=qj$B3~r3yaBd=v(}oyqO*n$0a>t
z)b)%tbse=W@4^t<A#*paU3BPq2S5RAoSGIk`Qc4BKHm^tUwDiw8GL=;4z8U)4?(^-
zwXf3u>AclbR>T~0i&(i2*%Z45?cJQb%DUWi>f6w<6OcYSy*w$PnEE?OOsLZz^4`CP
zcJFc$oyx<NEh#}5GZm3BR%cU?-lGtuf*B5!zb4Z=|EwpT#P<H{4KP7~Bc5bzDO9B@
z52IKFi(#Vvi{*N;@w%SodhyX~uohwvM4+HoF7c&YB<@s%aZ^F2IR2ztoO;CW+|t$}
zz>lX8FwVoD5m}bR57!q2#>iV?0oE+5sDf+S<1c2=YL3El<FT1PjWY+5*f!<9stQ+A
zOAo}fPLVtQLbM2u&~z42p7(IAFTyd{TJd+*cnP0;E;4}>u`Emr|AggF_(FJLKJr+l
z2pme%{f1!$llrnAdU1*`pq-zYmi8ofzQP0*ofb}3E-CN7dfR%=z-L>ND11J<O;i@d
zSBtrQ@dkp}T_3-e_tXsSS-YM8b-F<hmxG00N#o1duSyk5wxMVx&m+<f#`kE6gK*M*
z>oQ}yOet}%w<B=AV^G1PhbS$}QHZN#7&2$6P70usSa3_*K<25<%_7|!KkkPZ&f>a6
zg$%5;@$x2Qrc3hF>=!pWC85VA5z3U#g&5OO$XRg0T7-F&-lgJ8Xpt(N9;nh`8W$5T
zDbJO*#JyISHu4cyr%tm`Qq(-kfSNFC4Vy!dUom*dW*mm*%_j53m-z-BOgGK?Q?8#_
zAG2gb{hKQ?Dl)pVCQFc`q*iW3x0D})iD$#B!)%)Ev-afJkB*I<ik&ur-{0St2eyF%
z=oK*j<s9hnWaMOL(RBEffl_Momj*aH(y<zp1T+bO8>!<-%1B+h;GOU^+n9PM77HU5
z1kG&JH!8_EX3}6W;p!7mpoOTKqM?r#=THsoS|p?DZH$v2{iBXTH(DLoFt<dy!|rrK
zcdB7cl`<WHFLDy&<P{L$%v?2mlQbz4Z&b-}p!YGx@iE_LRg?-<iYT}+HK&XTBa)^~
z)#-1{nXo@g&vA*WkS&Op{4W@Czg#c~DxM9g&JQ6=)S}M7o0K<g2)Ph7b#-LSVw9JU
zoi3q<8?)xoO3x;g>yUS_{!42J#AY$4WH%{`s_)@2j5LeGYeg8<DQ6V~E({r;v=}J7
z^Wj&dE5MxsdF{Bv#Z3PtLnHEVq<w6bFtk?WF#j(R3j6;jLK!4YLr=6M;mJPmxR@|x
zIN1SI;*gmr@V220Gsbku3>Vzpz})G6%+Pj9%A`TFMs^a5b*z3h{*xRcT^x(l<cBdc
zPsVr$F7`KTW@Zk4&V*?ax;yn4#?2E)7ND5L!p+;&m7#snI3ZU$R?cKjy#Xg_jLe2I
ziaCQ}?vo@+m4{Fo>nZrMF(HzlUB@9Vsp-=2%SE0+>Pg>VkdQD3htNbdkhk}LVe^Q=
zD03Jn!YsKE!X(@W0a&51KmKn<)c*f6qOkwTi1dIN(J)$ERQoD0BH|0i3hgWiF`)mk
z^9@OjXcIYdNIUh?<o_+<GR$A)En4QT3GlSz7K)6kt~}g8EwL;w$J1LR>lC$E?Adpz
zvOJCyUc4QXjf6R!1CRd)d&r4$>Bx*EHlkE9*9ZNr+&mYKh#VjrKs{a3D-n)XtmxKn
z@>`QB;Z1amqQT9A>oJ(uCUj{yCx2p{&`kxh>i{jHy}e%6(vk1H_8X?BBxmj9;`bWt
zKqE!Y6pb7Uzh#G6zL*~8i#aai-3y%1m{r)D=`S9FZRofU63X{90tr<Um(f^qtV?G8
zsnqgvoX=wFhSLR6v<WmF?UFeKHcB<oA1REI4+;v4IW(whDkjQmF?gzKP`W8nMPj3=
zHmv-A|F($!8BR$x3fUWaIdLcJSJNPcKrO6~EN^H)A391==SLHTZcdT1W4D~q#>LQ}
zdQ25X$(q%*X}^oKVoRzRt2Z`aN!anMLIFE1?>Uq9!yMq}sO~>Qt}o|dNJ6t(s2pN3
zufM>nokWeT9bH3{!y-?K7`v@+2ln_9cBHFOlwoI0#8geS#)^>7sss|k3TId>n#2+D
zegsKR6u+zyNZD5}VO2JB>0(ePWjruo@wA1~2Qv;}YR|H3yPmCd+5Y7`J1O2|sc03S
zJo^#kJ5dO;P7q^{xr}wz#;}GUm($;@A8l&vBknwzT0TosCM%3P_^E8?QZZjg&rSiE
zCgor<Yv1pYa?0+hXl=i_SjvW<rDS6*-dc})DE_<_b9;NxrX<E3?FOPI!7PwF>x`kL
zCN(Q1gF248``rt}4|)8Go{^lDGkgqJ(*Tp|>c@terKZNDD@K(o71#jwofTPFt!P&y
zl{*ChJsgVf)^|_cfmuV5t_^+27(9Z=%u@CaxA@{Y7dV*ft`xjb>=;L3Admdw6wcKz
z5#M%%l(B!0|GC1}Wv&&JBK-*<sPexO@T!aa^fg9F)yJ!ie#jJCVS`R^f^{psFMkcj
zl=K-=pY{l$qybXf`2g2Yzy#5@VIoTv{^HNU1vWSaK^|{|V1e~CU>dAQW{I=PKXT}0
z$cyFE$g3YQ%a@7YMPErPSFxlQK>=g~z7ySHi`W#~m$$}Ez09x&UL0F~es(n*q|8op
z4I&Dj5$Y+R-$7H-Aor+lc+H0Hkr(bu#9)l!UMrQ{17o76=(PUDTy$0AtU^rS5=SsL
zW}2k)N!KwXYzrCLccS&eg4}o97_U)?!U7{JCy#h9Afve|l5H=A1g5VsC?qq{PAOZ6
zx#K3RRMS@o51(|)f%p-2vM&}q9bcwRVf#fVr(wrk)MI@CV%s^8%6m%MPAL@p*s#rp
zOw%{5i~EQXS)WW5X2P`q0J3_=gssj2kt&qPbol7^F#{AeR$z_Yg}f5WA%}2eby$zW
z5IBnC{-W2K=zx-S;>Z2vmQ=^EV?(#B|1*Uq?(4lLO4qQ?;dU~$YR8}i)ejN=+EZ^B
zF1lK4gf!;Cj517KdSQ`46;^lp%-gxSFvL^4z7%oiy}1%G|GmA&=S;DUGPMZ$1l9n=
z0AU|IP{Gfa4K85c{@1KTW%QxN-znjsz8k8-`lvMlOJf}^QppDl3&5WD1CJ$owCz8e
z%I(Wh8+2rid-=RO0p827D&(GA5Fz;zN!z7d$r<8$90mKzA@G?SE07=F*F<eLFj}t&
zIL3aW@wA!Mf)z-#N{=BfFh<zU)HU+Xf#qpdi=p^3|CI3B3c5h|smMTrXj=jjZ?HW0
zgsZ=zB|+L=nJebmGKD_&jydtKeR`jka-Qa_lyMVR47Yxy<|jS8mEs9d(3XuMuo$a{
zIL+#pVTCCBn9qx+8IAM30e;hl5eHnjAs3i^Zg=jVh@t5S)7<RE>k%{1%}-x{`I)9y
zYrI};c3?Nfe&{CUE-e#(A+osfy<}4bVo_3OPRInBEzp7Z;Gp1Ytg4NqP#{8Gu03Z9
zl2{muFw>`8z_!vsAa+^?Ozvy=w^8D>AW99?$Hb4Im-_%^m>(>Y5I>Qo%I8DF#>wsN
zIw5zvjmFD!Ip89G$>yMLTtn300Z*dvbigTN3!(&IA6xP@SQ^shMXe=H14|$hzZ)q9
z`Zg*Cq&l2{g#xnv%>edwYtf%O<+goLX4wKfU~Z?e=&P732k_bKuVDDu0WkuD&Yqvc
zOO1jzOI#4FXE<_l(WN6vA)<lrP+>|E=`!}Jc{?a)CqWwawF|RjKUWTLve~(z@cDvb
zkcQw~tjD?C&%1+gn~gUPPBuh&QY7sqc9|?sfsNywAbXrYt6fp{c{CDEXdE7}W|w*c
zP@KCcd3%$}N;|Y&>@@>De(_0;8|~I){oSF1CsY6}uFBTB9QC)|pBq1aEAfZQF%D`a
zP1N<9t(^Ss(t5ku3~imWTRFn0^aycWHk4<N)HcVy8UPMmU!IiUaIdna?OSUKIJOlR
ztyT^Ytex;ndDOm9<~s%JblO$a5!G?gJDTr=wW>Q6Q)@S%QaE=rJ7~8T<haXRuXA{E
zbF!Jd(lcxNV`HtB^;xARhnte(EMST#zh6z-(j7UNG6}OLim90J^|wmF%1)A!PD{IG
zGqNl=a&sX#RqpICRJ`J@icmi!PO<bG`k%L9ep;gPbSl7xcs=JefSfJ`Sc*boO@>I9
z&nOVnZQj+>Qq$SkCv27n^Y}a!p^-cvNKzX1DCDZ{N>9tRDO<c;|I47u?(4iMMFV+6
zR?E6Ay9;<2Hd?qCsKe}mOEhd?%}R=Vo;s=0?=W#0<Y_Yp$XCuOxSFxJ+eYr<hT>!#
zN;SU$9w=7$>-jy9-iBUyw8%O7-w;@55%14MA9U*kVO0f{T5z#w{7<vi`8<rFtnTR+
zFEKjyD_;JPN0m&KP^>e2ux6eL?;K9YuSL#K%H<PYT{!SqcW};pPk{OUY4=P1Hnj3y
z3Oy7D%VUSl>Ta)+B54^lnp_jlBUrk)M_tvOtVO-5E(M6XZCUmcMpWCY&*wi+YRCm`
zXke=ncSNq4$7;U~C)IqmIHTyc+ng?TK4~=W^&RLO`d#%qOspw7+fno?=Vd2&yZhYK
z&ZU-Za*qC7Kz5^`|2?Ry85~$njf{osf7M6kcOCRC{n8BVT-BzJkbA*sIc&z%hX1jl
zC(y*8YvPVVfd5t#B`J~-cqfx$;g4eKP1ac(^aTvA?!CN-uE|lj8(v;5bSeGDyDC44
z0fU6tet0qFOA#Hs`18YVUS_dW<<Ebz5S(yENaVtt#ow?_SKM`c5xP<{H5B6}W?ND(
zoB4w0YMM%~Me-i81E7{DdV5ZpB&J3VCna=;<~OM9aUIqQ{*AxEV=HbVjq4TcS0l2C
z<csxXGUO@H9M~wOf5rPS^OIx9LPIVZnoF;2LXr0$h>;5&!k2h|i1sK$lCQn}Pq8qs
zsDT4h&eBA7`!Dj>+}Dkg<b-k5`OF#&vB(F6LX=$oi2h|o%88OYMF>RAl8Rh{-wnEl
z^#KH<D;SI2e8U`RW!kV!<G7d>U24a`03($IbrfVcd`hld**~+9hHi$7I@dzRb0(Z2
z8-4F+irRnQ=2mZRUbzewK2n0E!>#NH2gkapKKD<?6V~2_j5$OTrRcC%FBLx<%fE6H
z2nFP}I6j?UFP65cI_As9&RLBf-q_lFGB-t5RiUT$yS;pE6V9oTk+X9N@qb`RkcMDB
zf+13-C>~FbZsi#i&mn(L=Sr0nL6?s~vKcvpvcLFe`iQYBuV&+)7~O!I6ti-3aAf8>
zH%36X!6H)Tz~tuZb_9kxW(#lyJw-)3yD26Kno|uCFYKV;9SL-JB%*l>w~o>-oZXap
zeK3zG<yEPs^^Sh!Cyo%m{ASL?y~<NDUl#BiRN54l`^q)#RV<Z-Nv6oMf4%5`Wdv%V
zoPKf;ic(;AA&8d^qt<w(bv?YSP_PJ(TMD?GTB|=+$jF(GQHJt*E`~tOA=l#JIr4W|
znqkR&zr)|2?vssf7+bRN^M;kE9c@uN2KjnC+H6(F*~g95N4Q%e$UxkTq}v{C%vyI~
zl3-rSa#hjLz83UbaL~(+;FJlbyE4X>wafyDov`oW<=YRX6~&=aLi0v?V?&z+$Rr$g
zdY@ayC+9|$0LwQ9i@S@b^)6Y+NSC(xgoi9|y^C0D#~s5qZC329L=q)H%uCKli2bEH
zxRD$f(Sbx*(k#IP({*ct9mWl|A-W?RWLMuAk_FtiokY=&dZhVwz%~^QAM8azQ3(`2
z|5woohqMIU%sZ5$V!d8<Ofnr#dovJLrAsbnexFpAFXkziIC3nLqRg6ScBa-<b0jTM
z>g~@vcFje{8)eJ?hb}i=_}yy8UnQ4OEtB<yuf6uSQW<EjID8QkNAe9m&RmrLVL(gx
z?u;NH<m1SJ``fPC+=B51rY$J<lrX4Fhu?9inmYFXP<Btjxjliqpkv#%ZQI<@j&0kv
zZDYrFcI+><y<^)>CcpogshXOpI#uVU?^e~iSzW99?S3ELM;I!u7}KAo)>U0gHvAZ6
zNEXl{b&I53zRWxyf^SvFqb;8=RhJ)<4u%6?Zc&Kpq<%Yja5hz@C6W@JD^0dq1A!u7
zpIpuvbonpjc!rHqQ`Dwcx7J9?Y$d3{tmh8(h%RGckP5?y;i|)8P<&g7+8{l}EU->{
zAa{lH<Pd8hH1c4@QWY%uq$YlaGci7w-(rnoo*eFxOD59_Hp^UwU}*n15J~9~BhTR*
zB`esP-qaZk<5Wvu76M!W@?xO6aP~n_iTDS#KHZcvZRTq5<p%o8(~&QM`B}srGfWEU
zqgB;W_EivYlx`H5+duBaI#pcqAjtId_Yedvn2bNW?aucZeA&N%h&;?|Rf6a?YQj{8
zRS}k6848k-N^djx$Q5Xd2t~eS^U3ee>Uhb!IQ$@2*TAi5Z9bz_1h%c%=BlHkN<x*3
z*rG$<f9iz`1l_<G?*>Ie2UTJWp|15>>RQ`@2Y$Lhkv`wPvem=Aw7eNrcr6fieVr95
z_IMsM*6Z42<P;`kiQ>#;f035eXn0ufUW{~Dsp^tt+xzdD+x1#m3WAMzeRtTU+O^hS
z8A%ipW&0^YoME}ycl`_5%4FDw_-D6DVZ5!0O$h`F!UV_X#UN9z27b<y;Ali|wkzMx
zG;J2n5#2hGGcY_^;@V{lyb%T;)#&5gaOC7?D|Gq*goc#?9m;%+mZh->8!Qs7vD23#
zvo&Uj9Gq|%(0;$WAWi*76pH0(*O{^XHPa9`mGi;cW#xhrC`pOxakDnT1V1EBCy@rj
zKQ-vweAOfR{ba(>kqf)g7%GX1as=V0&8=2JP*;#%3F&q_P0A8k7oqZY7La<&(yyD3
z>hw1M?=Ck$L%1vil+u-Ci583((6zutk+ZZ|`LR<Pd}H45+vWj6h-Vp=$SsI}R4<+e
z8l)euRY1EY$l#?b*j%PAK!In`yhEkGEMFk3p;npHk&Sn8fYQg-;Woh$c6&0_+{`TI
z{Zwd*I66pf3aBk&UN2KR0bd^XAjV365ESyoQzgVN!wSLb1#rqXv_OHOcYfvL3fwz*
z$!LO_={Tr~Op1Y@CVk3u0-v3sK!y+wTkz2e4f+NriS^cJ*o1Jx^wW{)Rd3g9h4roR
zw2U*_pSkCtqxk6{XcL4`E~f}mN#kR~!KmY5_Tv;3rQe&jis)9Cq*C*!<Hj@W#l=#V
zM0(1GIz;1mL6i(2P`_7BZ3Uu4u@nNciom_hZt&o-45w=|{L?k*RrGR-kRDuWr>=Hu
z|JNY(Sl&gyq61Y?0gNTa*beq82Fr16{T>8;%_eheZuIAXAU;xse&XEGvlDMVmv^6P
z0hqTxK4CefNh<!5W)<DvPHE$97V{21y_pCx_cf>3RT{kx!xMCOo%KC2n6W;hqu0Pu
zd=99OmDAyel{=AtlxX@#!xkt@;OesgGeFn2QwZMrFCY}k$CWP7ZRXFA6)bm1QJ6L2
zr3|nKL<C0rQgz(o7O5*H>j!xHzc%3V0#C7#zl;v#26#(w<APkbsr4~;lIa#mpkY$%
zQ}{eW>kj2^=*XR6;)uZUn(HW1hFoQnj(ZUsldU+@Hfy0El39|*!nbWKo%rrq!p%iL
zTn`BiR1v6Lf#dinW>gDLD2<|5#X!3vk<|#m71WnFcO3k4A)NLurLF4&|3ouSc%&+#
z*?^Ytxf4Adz;f<%Zg{ImiBA)`^VQ7~ppBbRXp69Obi4i1TL+?sxV$N1S^FEviG+ly
zg+9|<8!KcWJf*^ZDe6UZQcXDbvu$x73U|<c?__)L#N8yAY2D&4`|MwRLLWz5#a8KO
zz6d|oP^vmKHxP*|0e!Mo46!hplorhS>vGLes`$=$M2WL7Q<LFH9{i5!ZTP9s>Oz0i
z@YvJbSP7lXu*Asx!zf)e2?fX!(WeC<+JS#4z?3-7kp)080rlITBFGfAKR1Jt`09tG
zSc$C7TN;7X4-cHfd^2%``f7D*(&PGK^~9`<-8j{2dE`5&<5Fmxe=`BLZy2ULEl30N
zwgYKuwZXD!a`1J_M1b)G&-Ol1e!*K1^a9GtQ3TdkqM=diR6G79uFMl8H^vq%skKL#
zLUkq2ba*HL3ccWEVF1&zx4CyKcB^{E7ci-HeMBfAl|MS40~qp17^0dgU1lKQRz`|b
zFSck+4K?98$`XnA35As<A|7}VkxDnSO#e@@R6vN#O&Uc;eqp&A{AJ&y`yR^3%^LhK
zu7Uws@+1p@8UCR%y<lHrtx#$L`QZ*LeFI`J@@iL%Vhcnq6yIu4HE$(Y>pdwKXU162
zg#!2BKU1|CLx`jWtd4f7T~=;TR#r?(KB)Wh9Ipu6Vg?KcsXjF5mjOca27~Ca)Cb&5
z`O=$&4G2#f>Vt??1G~+#xEvWu4mPI7(O3UW*r39>3YENUP>}UFOx=5n!ds_qYHg`-
zIv}QO%Qnx1Ehp)i=;KXp$o%Zn`OsFa8YtdvKN*gcO9KGUA<xxNq3&wGwiLRvEWizG
zfR|a&EgFARnW=gxgb}$zDusNsfF2qz)dW4Uol?i2tj4Y8q%<eL!^M{b{*6FjCTPKd
zA~QyzaQ8P2D>A2;xpy3xb@?@L<~m5cV$@JDB^A;ZEhsgrSml6!K*et`X9>TNKcbya
z`0L&N`hLfG_>hQKJX})-NlBjD<k4*^QvkHKo7uZb3z{~D!{0N+p?#sSe|@j=(9n&T
zoFIRQ#xb;cSR6<wRKnX+pMd9I7di($N<7(ElwYT(59MRpOeDFt=i(0}jeLrhfz&&Q
zi(8`g+#N&R4oI=oy(MlZ9_2^kLJPn8k7H6mJ5hFHO(%^v+flf3;1jm@-f(11_fDDl
zZZ0h5*=~-j9&-UzrZfLCN?LI+Am*-w#0T5qsIDT%1Hx*6F>8Z1p!kqYMf&b+ldDzl
zWriq^UVW;DFUiJ9uzb?Yb5zx+)0S2fH*^n19KZ2*^?I(-ZlJ%Wn%O>UG8B{T6PPiS
z_?oLtRnJQ_3^06xp>fAc<hJ{5q|tu`Vkdubq%s7#GGa<e9F!;GusSCr@3LonnQSWB
z%L1cP_zLTs`0Jcl=R5(@cRVNRa~0`L5#(p62rD*CKH)}+`LIII3g%+K`4_Z^=^MO5
z{9VqMB<LvIRItKV_l0N-Es8~%`t(*sRt3UPJXG6LjV>}8Ab~4A+QD1!jI{*%+B9U3
zv<9@mL;_4#EBo2Khb{V0weJRRd6lcuX{J{D9<HTRu?O@KjPzJQh|||O?LGc~aM;Uo
zE`3LUW$$v~mi2Bc8#aR+rgouz4_e*d1X!#Os|g1}djmKBc2ll9e7S`!nF(?^w`k(F
z9hR=l^Hl!LcxhA0&kd!)t&yR;46Fr)Rxr`A@^od2k&#7SYSjHOT53G{Tzhw_hz6aQ
z1X<>tlnB^g4H<69YhomsdD&EE=B6@;olS-zJT|>~hIEL4{}(}uyiX*R2EHeQO!|E&
zCyKET(e4$K`!vTPSCLvl&C=)dM@w7Iz|$IcMH5}?k>{UHLk_W?X8=EN+=w?K9sspb
zk){pWS$kwBft4R!QbmV<M%9)z$!=N4o4EO}$G_tj%=7dkBn+qL-e%)%*v>mhH-*e`
zq-Dksoq;lsHi5*l)xIWtQvJvyl`CNoteXUj6zoR!dZhI}k#GorAx<_Do08W?Fu;7!
zK9+{RJ!hagS$;1!2*;E!p6PnZ3jMpBfKr4)AZkJqQ!xw-fC68k9X2UXoX9AGt&h>D
z^pkb%Vr88Oo>Ij|0E{iZnG+Qx5A2AQ-y<ela3`wb1(eT3r3_NM;)=#P-OV4b4qNOA
z4UlU-r}Q1rS?O?wC$xLk{hsySw-zpO{WC@Q@JgOf4u)ePQ8s_lQ^Wc$Fy|tv=>XJI
zl6Uakb~Cxy25XUl1X_o95(gn=ELM>-Bxf2RsE$xx3|MhAQ4j(M2}g?9b@P9ry~B-y
zH92e1l1>RoYj2o!Exo><WvNYb9lywY8eMluW$Cz*5-I-AFw#7J9Kx*OxZCL*Jzj6G
z&z}k&Sz0X2OT=D%ZN0l`{eISqT%FsrYhW%|35Z0EjZRnl$1iXt4j`w8v>7i-`&!={
z3bC@Uo42#AcZjT^W60}CYN+7n{%oT@tM+yVPLB$^m5om5nJ;_4tHKEn;s{wTI`q(B
z46alrT2zmeSyJ~=@nN!trai(OEkA0^$aW1L4sIUa%poI7N0Q{pwzauYBTIGl$ajbI
z42d{da^g|64^oKCsEancb}kgCE?*c?bd(4y)<5N8MUX{%;zc1&UYo!b9Wc^Ms*08J
zAth%tOsRc1U}uwZF~hfZY;ZgmHauEl_JN2MN1FT8bb#<@B+m0RvV2{+$!NO;^?k6N
z>b-9g^&W^s7V>RP44fv-8vLR!l>^2~gHpQ!Qfid6lrIk)JeoVO*DQxx4;J+T4c$?b
z0$!mLU0t!kNh{)|*wm<k3tI-O*hcQZqY3pebSR=s38!d@2^UV&M!$xM;<Fe?m4}Pq
zW1l!MYOt1Jo&zl5C0emofI<xac@-3hs!}DbD~})M{Zy1fSqH}(w!laXPVE@iyUW38
zITC%TsbIp{3*Uk0Tb4sOADJO-PbV;@$F(=0u4{bA(Jj*L;g&6e;YoPy@zc{VziO_O
z$+AW&FXVC(GHED!8nb=hWYIx%qpWRQ$J{zlndt9Vtir9%GGQgP%9oq{RFY0y;E12B
zZTyIyUQSfM{vXj(1iuRl_sCK%fRyE5hjExz2HQ=0u41GhEd^3jiOiq%peD0IMeE&f
zBb+Mwfx2Zrz~--YgJaF!Q9+{*$WWoj|B%no8nn|?Xg<L<0*mOajAdI?jNN{@=vlOk
zvE)8`nff`3in=Ta(~b}cMXO;rm<&}Q`Vn3;ODuxP<*X7I_dF%p40YkLd7iJlXJi?R
zAX4}B>m{`FBz!PAYJ7HNfAWIemf}!KVV21*9Y*GDnqzrtms0wzd}v7N%e+{Sa<?<j
z$y;r3PDH!@c?ZB0m6E$PLt3v&(xw0z#Z4qB3{@+9@;!$r<OSbAJ)?tTp}XA<suBRc
zpK%+U@aFts06fK&aq!G(39(AZOZ#2Qr$hJEPYWAcE?gB6vc)C~tbfvs<<>E{PCndO
zOWnF!g}!IgOUssGC+7mhm;9^hlu<xYBbt1l$_1N4)|9ifR>Tq;VU4PA@Ge+VNf2wO
z)$h>erho2R;$MA<mM)!Yh2A!&ExuPPtOnTZ#$Kw>Lu0^DjW+1AA-475uF_xo8o6-z
z&EGxe^qcb*Bl4SSb*#}p_wn8(>_FeJTAeCj@!KN*jr@{imN4a~KM}a-p=l;~)(4&}
z#dKtp^#C*nbZYnHV%odQwj#o|wwi977q%>Jq?i~$d^=A-h}7}a9*<DLi<#n5^eq0*
zk?^4<Ny3_KFWdcL-<S8R`M1;SZ|x==zKbS7ud?f(n?WT+iTOH;j2CEGCE)6#^3~3y
z4{`<$&>t=;o6l;r3bv1;j-ZlRBw`WZWmez?1};!*!#qq1JDE=Qs8ck>WO26Kod-I+
z=9Nq2(d75{&$wR)V&?6=CT5j4pHmL1&PF`6|CQlIR4)l2qAavv%7*!zXUV%?$nQY>
zNjZ(Jsu~gdVnm$iF4EYX<58kX&w1O+vnoP1{sMt+!^{d$B(KLM%d=O7*)b;0y4Z!5
zJ2>Exr^yW59IsduQ@eA$+~t|7KVq6+=*}wWcr}t&?DK1qknm)i`3ADPuKfm%DUI&j
za#5HiyQeVy0=h)eS*zuQ{E<698moheR!pr4l>Bi3RY9t_cwm9FqQ_jE!&OxpA=w~%
zg7_O$>cqvdAnS-h|Edeky1^Pcck@2FR5TB!`tAAT+2|PM82=ef)D&W!hHX0NI-HU3
z{edEg$tYtmP~17*5BhFWY&&9$-(#lhb2Yg8|B9VRBli&-fJ?^)3!<6PK}Hs&vW6v9
ztP*bE7`!tn30lYKU0M~FAFxs$3h6H><Dw{D3Utt_ZiNGg!Lg?gb1POdTrAR3xz-FQ
zLM#Uv4x}_b<8}>Ty1XS)mZ#KW^%L}mea!RNXA>)6rouD#VN~F!`CsL;E<41`WI^;K
z*As1O`;K|Gz71vhr8OhLM4{XRIL%>&b@zwPm#pbu)~AnsoMH#6o4c`kiiji*;}=8M
z)cc>4XtAD3LB9=on*li^$|2lkBl8lRwY3?n{&_!Qr-2sALlh?K6M-Y`0Tx_f++Mk3
z_3sHzqMHO2vs_Z<&%HLxT9q-#hyB?;Fm^~*r?B(W;!<Y<k2&3#<OCh0EZ7C>qW`Vf
zi5IKdfQ#;6sfV(%*IysxhCnEQXXCUtPTnIhkY`v-ZmI%JqM{LXnjNJ&R~0G+hP_?<
zXBkFo@tH@czb0Vm6})8Ua5^P(;NmBrIi)}uvN5!Avte7RVKoJ?bq#(>0VbCBz}ju)
zS)0I6(9JLbpALuu+G;qp{<dZP`>v_E<B4g`0C7o!PEtK{kC_}sb{vzQ8#`<u+Vmq?
z`M<({nM%({EWY0mOE3FevrR$FZu`W(CG~P_Ig!ePdJ7q{quKXBBn7Yg!&`|uta66U
zVvDxbKA7SfLASAPN+zkK<0N(ssYgeMXRN0JWZ8cGZOH8#X7bzQiPh^TPwpF143@sK
zbc}#W-oKxsWfp*F1gp9U`Zlva6{{%xo0L!>8ipc_R}y;iwO%L$LbL1YA!Bx{))~^c
z_&xKl(~Z;71a@$HrmlQMZzdXVLL@2Hwo8KB5K5YB&=$n`$Ty9UQ8R75KsXIe8p6lv
zjscW8gIBr@94ZhDBpD8ZA_U83r#HuO;;HN-wAD@3)m^t!yU`{I`&lMtv2tCtrOIlh
z*7s5YMJw)GnvAU?$Ng$6bfq+(CpPQ^i~#1|)Kc#99^8l{UNG|)Ipq<vsWYH052uNA
z20qt+V!G<tD7Aq`Ai|N7yop){7TTL|=~>yhthkB52D7{w*LTE#XlcODQ!Ev4mm7ms
zFNW<$hCmM?kxA8}CTXvBr%dI@{ly%K1L#4K47wHyCmoBUDkdYFv+DcV$16p!^Cj=}
zHuxDhC5hiRX);xlJ9Z}%E$8%GH6OE?{eL~VN}^`p=sb=TjrpZwTx>Q_CLma-a7vpF
zgoY>57^Y$6E4=)LrVUqB{WpfI3~MQvf{+a%W*Jnnjl|`S{N>@v{^wXGJR0H!$0F>2
zSm1_rP0I^y^f|5OJ?Z)q2dxb9n+GkfoKWd<@C7!EHwJ3ON-T#^4wonABoMCRM&^j@
zK-%7xb$+g(c{vV>NJj}y>afdnXVGeD@bxp)QN@M*%~~wPmZK^aa$=Ap;{BsECK;;3
zVsEI8?`o08X3+<npG3zLkoo%d>b;ZnbeBD}FmnOBf)snZJG>lQ>5Z?TsSF!Cm>Hy2
z5E(p>VDuBF<tLKX{t7zp`Pf%XQ4#U5^*T{IX1<;*1VvrOSqV{su+C97Mfb?@a_&(O
z>&_SO7!4GSvp6pCIIu?fRbYgJ34ytU<CvBVJ37KUQ~PW~@!@rCCP0?yy!is)ythd6
zAAqdOIWff&hy)L<fAb{%-H>+rmw@LofackbDNC?Z`4O0PcCA};1}!9u{7bWQ^f2sc
zDsW;6X59d+20@!)j7n0eCySDrqDf6Ggy8VgbBbjof%s`Cs^^NAp0E+t^H6E|@!T6d
zxVj3LRxZI6xop>{deA)3q2GE}+yTw4VXFmxfPv#HOubVAUP^&m#NMLXs_obtxY<8E
z<adFdyrHG1BzZioJeN&Yk&5#SMrl82nQ$wi5P4<fW&&Duiq07;d%YqGC&<SR`1cWH
zY6#K}zQC<1`R{wXLZ5RdCM#~dI%bv@sR(%Ciltd*A3iHc;NgVD{#?g>mDbE3*R`qy
z*uRjTaV9^6O@+%*qjv~pQY=>5BzM(4oc;k*N_Kp&anU7q9<B<l5)VEb+$`X``CXb@
zU;x06`D{CS^r?k;Lr-tJYfZVGm5C9P-jwm?&;>?OT6B0tHUj$y@Qw^9rflb4-&$E9
zV<nSdJnCUjDi+2UCKXOp)%t_!Q&~bZTqtR~1(ewl)8FR8@^8U@&wc|}hhzHj*R(n3
zi#Lk6-?ik+ckuXpaf?*-rO7W{t6QolL}BX5ZA2EN2!Qr$kKg<8+f;9$?q+mTw6j9{
zP!TrFr}p$g0{<r#ypu-M9Tlq!;$5>Qw>c-!>kw4A1PKL`ouVuUgBvfn7Y?G0p+jHA
zVF=6suDnkwR*<$LpdTk-DS}{UO`j@uCzo8_ILcn3ru>lYbJsCSbPXvIv%<k~vU4dv
zJJyWRT^ZiIPNTkFN=<~wTa=hRr2CJ-&_G57!D#%l@&~vo>%y;T!G@fkE^4x#o@UHJ
zJzf<FKCZ-z&v{%d8cu>Oy)YuM6Z`ki+V1O03Q0Im^oa&4DGIMQueVNb?>9Kp=;cJM
zlYv*G#5XE4=8>c*Tp3^zHq^2QM>c;zofw5SZg@#h<dCF;F^u}d&YytBkE=ZaYniGu
z+DZXda_<NJiS`=%sE<VhIbZSDIqJMU;kidkNHT4NQ1F0c<F72=r}4cCOP_?cz@9V$
z!f+vdMurWAxLp>*6C|yib}K<JQNo^~k`?_u<MF)q7LTvnfs5$3mSkC#I*jWNTf?h0
z#`}dQ_mSB&oUpLJWd50gEe~(Q^SMp1jK3EB{fv5F;+|qaM?wFV`RVkqsq{W<3VaG-
zIgx~6T+h(^dt&b7@C5sZ0mF52rWuMshVLR+L$btNpsGxa5Uw8g>UBM^!-qY65w;eL
zzB?p0IO+P6wR7m;xfxVTPehB$X7}1EYz-P*yqgywdW1zr#p?B3zHUrVGA=SIFy;wr
z_TIX@hFhbYB{7@$J5LDs;vxe4d<y;Ei+%6JKFczPjAt&j%}r`eP7k-wj(Y+%EK8M^
z6yzDUa{%}a=J6ZxjKVgZ_x`$FKh<%Xc{$o%kC}^p?r`*z|Ejk0^ab9MKhL~Lbpkwg
zH9qf*DbfCEW-J@R3*)9~C8$x3-CP~m6wLcou3{(9f%z2uL0%9&Jm5GC*^$X$4>rPP
z)HQ|~i$nw@5xYhhkw3fKS6UQ~pKlnO#|%E)UEVWbnL%|dYKU+a6N&=;dG;@p{de+$
z5E^K;x3hJH4Pz1LIo8_!_K(*VoqZuuTjg~OFttM}aR7mrniq{RKsb?H)}5Pm026g$
z*&M46nHX<+lJ8Q$+!h(HL6t6;QkFJKimEs3RTlTYmy9qZf<#b*Ff+6*`3HSkZ%|o{
zqZI3?h7-Vnz_BPHgwWQvj8J1~O)hFJunJS8v7(ShxibvkVWy^>Bf5oy<Xu2(xLQix
zD_;u^My;ScZ>I)h6<%T1j2T`h1f4$fk9OOJ#~V?^dF`V8Etah5xiWOZ^g0!B)~4ey
zYuN?`%3h=iTQ9hxy$J;IZO_&B&hPWV^!>K%7lV4cK~d$eRwGf*iEb<SQu>uxIM$L3
z-&%b?nHnODqpLOZd`5*_mH8ClL0CCL5WD-aVwgxcKOldwwbg%AZ#y;a;SnLQ{-Gak
zFH4;`N@xUc5`1yMty)QS>1>PIMF*#0*jksHr^DEKpNpp*>_LnOV*Tp9UJ%<d7<zfy
zKHN=Q{^0uTXmsqApa>;{DX#7)D_6&!U<ww)FnR2<4Yl@u@Rw*0m+!08ROoPX%S%K%
zFEEkzl`lN)9%r7uN6o&^3q#SAZ9_%@qN1Vq{eZQajWwhcv!y!zfd^~DCvOhhr>*y=
zeORPFUx(1%N6UBVDo>%;nUB?Y?jIM@-bXw-8S#SkWeQl!+ETjAw@=A9P^hO(_@}&(
zTOA+kez&@xRo>C=W)7047EmZYjw3hj_rONL<Fj`y=8u{Aewn?sfJ{hRu^>fYBTAhM
z892j5Xk6|B_%p`C6|k>rprLC;N|_H9!Pv42fzO|_#@lD=_ACBfa74sDp92?zCHI0G
zn6(N2(UBbg(UBL?p_r6X;y6tNt&)Erh|+75|C<!V3ymgoScEmsQD8JCT&6&0`B7Uy
z+mznNn4z2n*}K`CA&aeN^64kH_3}P8?KKq6)${4MzLy1VNlnb!SYwfG@RUA}$msk~
zhPJ=1G(MN*-v^3H_8C8!pYH&BC6xc+k$35sm#5|^cNk`rw@+GLKd<`17sxQxkpBHA
zIqAMK)Dy8kRKZe%p=>?KV&KS`g1?@bS(75J>aKQ^NNsCYSd^B^ipDNJ8{z=BHj{+O
z;8PR?tZ;)g12ygEw72&Qxee$yhKO8ZltM@QN%1QkkOv^P0@$ST!n#NNoj@#*$t!J|
zpw*s_9QVhL%<z(EH9KLbvth5TfzKMyZ|TF3pTMXqrOyhJzKHxI_(7?a^F83_QRBC7
z`8mFydhbQwq`m-bze+XLCYHsbc8}h6%Q(oQ8N!`}NV5EhV4}}gXF8hr`2sCH2a*%&
zxzR$JXE*nH_ym!|nWDYmXf7ruHu7uS3X65zKeOqongw=w9W+461OdH3nHaP*a|~El
z@Ow*u*Hr(*aQ7Aa9gVEpaMPM{cmwkV-c{B<#NmCbb@8)<It(IRA#boLa;0w!^=T`4
z)6~CgIRr$|OP*L4Jw>))oT@SsWNB`Zb#fGW#<AJuxPNa8^Wx<FC_Hq{{^E5X3zJBt
zDqpY!o7U3V^6p~b;`Hr;ZmAczl1vwS6DWcpw$x)_YFN`a;py})eKx-HIX%x0UWWcH
zubG~cGwYTUWH(JVxYWGfzh|WD=V?!#ZjZ(jyP`deZCa>*mtx0Kpy9K&>;WWQ&?r|!
ziE-ui($aytdXX0%Lk@=C_OPmt43ql$Jg?opZ-Jj5uAleeeTF$%IvJgkfA+e^ff2%F
z;XKRP3&vLC@Kbs)+?DlpTc~u+T?QZ%c)P_KrS@0=yG2M4;o-&EXHYAq{U1`gX>3KS
z^Hq#DL4ca&nW@=4AZU}d3~9dx<fqUBSzd)Xxur_0KF-EQuyk$V>LKsVJKDs$!lNHK
z@;-AkN8mIz`_?GiKk!wyz@W&q+%mt*K=2C=5j@`V_PWksYMQXg+}u!|-(M}iv+3bN
z($9pa-0zNOi4p60>KlMjp=$x;H_xU|bOv%VbkOAIz3k`f6R}2!WkEU=lUDCa?6W`Q
z9i_$Y!Lh2O7Rjk0bp7oZ4IELBu?@ts4&O(x%E<7ie_JK|uWsIrbKPK`UmqZFTtd5j
zD2sRh>&eRI*Jt(8rsjRCYVe4O3dA#|lfwt{uiZ|CTesK?8H<&0bBodhNYUeL7Up1c
zatO1V5N1ZwONS>FM}@_dCJEO4(Dh$VUIJ~#Gx|^^7DZLHeh;|sSKM9{w}2Uc;#-ap
zk0wWxiJv(`k{`JE=>XR2sbUBWKe&i*RdySiQ2<KgQiRe~`7Ug-1U?olL)v<l2q&ER
zuAa#|4%@xE$7H)r_4`)%a{>106!)<%&Uo1F_Pm!t|6O_mxh50WzZgsYNQ?qaBv=C-
z>O+Mr@K*q@{lZ2Qa4?%*ixVK!P$N)b@nDd7GO|EttgtJ^+-wBKG4Sx<ip?_Vct3~k
zOT;ayTFR`n8ooS9*LfM8Dmjz?2xf6CZp>~F##E~TIhDnE3gOT1QgZcDH9LLhA(l(m
z<Vw?#0KpKaOjcm~-`y8Dp`4WlM{q;gs?5BkHlazH(LqZ~?^cSf99jDGXDeFI=W=>n
zmyCm|q;K19$8FHm=xD}RkmmtIFU>t#1`Skv*42<|y!Fdj2J(nOF8xO6qHKpt=mky2
zMdBnbsl(NoNLy>RUq%l{t&^1z^X-~?kxaTelC+p*;V?Y%WCeji3K+-yAv&|USqVwP
zb;&hoDV~bi7!KSgulQnlIv(&0*wG_-MF612GJ6$HIG_7^eT6`N$Q~sXOXPMS<z2i1
zbXTAXI<;<&Ae)ON>>K8t5>w~rwD_LCWz$NLDxaGxPg|5poxs-?=yWU#u9o1+c+>1P
zmYR{A+Q9QDhCde=$o2jJMMkPK#SrGH^Hh6;T-y$UpfJdbB$J@TDL^45IoRGBp@?Fe
z%97otwY@;GEdzQjetxoC8kElwJ~;~V#b&y|K*ruD^)naafsCDSFes3f6e@?x;x86j
z`wz>taB5;r<9^7<gL*)#*hkBBku5fq>aW}Z3^PKIe0*ajlF6#pvtbjloo4p;0lx>I
z*OH6xAyRd$V69O>P#GxfB)e#KDsiVQ*c4Lmze_lV3ALFL0AU^9CbSX$sE<Rg&4%~$
zgafF=(PdNa^TwR*EV^S(ngUJC%IXHi!&Wj1OI)`q#7@6Zx<XztUpiGhY_W|FFoQ#s
zfX7Z>*sG_Sh!nF@Z21uxR{f?^{KalfI3g<)q4uAgrGe{v<*h`$^t{FQ9=Lm;(Hc<U
z3<!YSS&dhS5F@C)b$+JedmpxCp9}^?(<eM@_5T~M*UHEjgafz;U9R$XwA($D_en0u
zO}582m~;6o<oEBt6tF{^d^ew7S3xX)jQ*>Xzd*$Wk)ssE{4D-iRp+@_BkMUUV?tR)
z^-=(?Q6HQC-NWS`$Rj^%+WPQDwy7r_>xvvnMiNP4zCGQg<--lS{KG|JSs|b6{N45c
z;w?pg+M+dZluF{R34IX5@FXg$f^|N|5cCZjlli<*sv#;N9upay4^mWJVf?<hv+r8q
zMP^XHw~GIIIkxRg;&r!|wO?zs->CK7aKDzZ-)b<u{%7kQc=^g%$G!Mtt1faBNJyci
znd1_x>RwT`!R+;I^=%@=7t5N?Eo`A~p-JX`M*ce~mA5Jw?l#HZLF{ErX@mrA(%M;4
zZE(^9(H9_<NE@0`0$D86fI|J_q(xn-Fi4~p1av3j0mHrnzXp?afH1H)5$VEI<+S^L
zSaf`GPkB~Hd9HB0$<~*lNH3qK`*ntKgd_wpe*EJ_!b$BHMcMcS4|X<4TmIvu?12=}
zBBG5$T}^2b8obFE0%l<PCgZY;Q$?&sx$49>{EK|oyO*0;x2rVQ1x7{?vDjXi?5qJy
z5+I}l1KzD66!{+*iRY1W#IJ5d*Y^WPZZ=O0qX|-shFEB=Wp&=<F!i@6zZ!_Cd!YHr
z5T-Ou+z{KG8tPWYlmNqU>p%AUlqlJmKsyYNET&mnzZHvaZsJ2U0k=_@H<HokXLCc{
z(xoEim2f&ZSUNFcQ^t`c$xl?x$|lr^iUKx(J5+~jRL!%d)(Uj`h6(-_ZHdF;_mx4}
zf%E+*M*fA`<f}|TZ!rRfALnWhM8;fC4ps{;t_klSa{v@JTI#Hf|Gp1}w)z`@KsdVp
zIT)}r@Gfg07APA?SM<BDn&lv?$%(~A0vg*mmU@>;mMn)_sCki%KvYbxTM5M?QqiyK
z%rSOaAXa#ox679#B|hTaz*4f}w*(op4MaS>GgUHUKmlL~shD6WgVb>)@)*?*zMXS)
z!^RS>jyvla{@dJC2aF1~^tU5~jER`|Z(!K;h<Vpb1S9r}#E2A!jbNVQDj9~i`qcP(
z@Uvq5<JsxKzDW^wUC=+ZZO<<C#2YM=eB$EZO}XrpC72ti0M!_xNCu(5JTE_HB)<cQ
zTbGE_W8=l$$qWXKmdh=)8kg~Ys1|`TpD6k8%beOnz2>(B#caJ8Oqe0X7P29Y_T-$a
z+=mM(R;n&E3&Tq>%8wbTgyz2apBc$L(ELAU<kr!hCWlnWj~N-RW5=~)lT19}5~cqO
z`Ol9TNxEyC(2QRK3+=^v&`L(-CP;<<*FY8?bsUI#c_BcJkALZwWlT?lR`o{DP}6hl
zibx!IVF`ue8u!Kwua4&6mk!=wvGpJvT#vZgW^t<3rW^gxa#w7-Z&);xCGefSIFst~
z-hzt@mV4YNP*ghFE+;4|G7zLQ((9*j3E#wgbz?S)MiQ|sXk|l@!t>t+L_11B?wTEV
z@BonpwhfMU2`q77JR}Rwpdf<3=aZG4ncNCeeskcJmvR%C5`ri6#&|6+H|8Xvg&m2U
z6H*WbP0arkfl9qG`)EsHUxVSN1bhB7BTdx~)QLz!s#<VbmKGOojnJ?8&wFzgcV1+&
zC(O65%%~>zY$04t4MieXR42jF=H3Hhf6HXzG2LXP$>m10r{26nt+o8KQb~U6Ck^J{
z*x;&U$q?l?k@hy~${5j&x1gl@a6;x1PirIKf^LxBr`T<qOo5?FtJ5@oGjvK7Wyz9p
zUGSE)u+!+s^26~ySx%!=e-VJ9$LTyBfcp1-HE<Jd;l;?Lel%#TC|DVmo$LwBTBL3o
z$=u=rwb|?EYY{fb(jNpF(Tby%?+=xeLJ-=6UR81{vn&IxVI8tQi0Y-!Irj9L4_>ni
z{Ij6R5GH?Dc^Pv9f8EvKS1k}9TF04HO00qL2dj%Q*M0m0Sz9ilL=wF~KK9y4HXvgP
zNsLjsnFF=`{!Ub5?ys-#JDG_EC_uoR^#k}5S|c?>74cW2kupmzAuVP#v)n|)f$1pv
zVpThoX>$wbVtHZvjit=k5wzh1e-RM{yhC9v=D1}Wg37o~39%~)3z$b6)!nq?D@WAA
z9l^t|loO%v4Qe^j!{76A#u00|EIAF1*gZ@w*>C$lgE!}Sshh&ky875bw3;0y)Bwr6
zyHa}+L)tu9CpQ)CElhwf^J=L!xvj`}oy(&^4_s7*6^}Xoz#^|MgGr-FA#(_alC7#O
zYL#-kNNT81VYfme75Ow+i>zh+db?J=?VYBUW_2mqU}-zjs`MBdnK1>r3jJ}y4;z`+
zszENlYH0`a#_=rvu4qm&w;lhj`X3v~L~6AW!}DSy<t$a5LR#{BP3G)k6?-)!M~}SC
z3a#57y#{8jjrU&iPTcZz@h;<u<shj<X!%zl4jAAB(RNtua9s>*faO|&%pj98>DGj;
zLRx-v^h8`_*mjQX3)Tio#`3A|z6cI1q@tGc&tSrU0Ohhn$~dVH|8!8ckz?;)#6#s4
z3B>}Zmp3+t%PVxBUS4|3w3%SmR^rcD%;-6I$|MuxHiH=!K030l+tZ`pBnQf6GlST(
z?WH3VajDI}{sCEo!o;OBOBZsWfj3Gl-tZm5K)tU+e9J{4&V|mkG&RSD+_brxy=E(5
z#tSwdQ(nHC8#QQT2U+-h?BQ0rhUid2xE{OVUcwwyR4+ati?2Tgc7T7U3l}I65E@ex
zC90tBL#{$PX@^X6yMHlHLZH!wd{ei5gK;QP1-4d!ahn?wG)jqIw#X1nVk(wK)j178
zt8)kS?vqO<>-V9IVK#-{;$4Hu+WCdEr1E9TS2~caV~iaEGjs#$r>eB37ZH9NhZET%
zy@GbfaNMbCGLQR4>7WN->{qPYFG=O7$ndPs+jO^yua`e)XWDlK@-g`tHcDT)A*G_h
z9%AJW;_s3pvfR?|x6BJbrYPn0chI860o&BSS19qI@5bZp!Rl5P1LwcVhIA&r-14Yf
z^{_hGYWUVKMy5^bKEEaG739|336UytjZag%UDlPZ4_BV$1%-3TcqdU9{Mc_S?GfC*
z4q3U~iLL(%Yb5c#@dJ5(%_*AIE`+9>4F+N)F(GN!`O6VlCwnS_fI%T^7^{#{GpdJ6
z#cU$iv*wt+0u`c@ssgyG4m8fA7mRCqDc37iE-zcRpD1W07#UMqVwl!#19RDXCV)01
zs+X9H`tOdEa?D`WN;96+(8-@O7>B~KM4Kp?pQ#ogm&H=ro`bUm^@f&gdf4<n-J97}
z$m*;zLFV4#HNU_7qc=EX5}n8CB~@PRMsgJ9=p(HwDt5xd!%IxGYy?5MPF?00SRTRl
zUgb-Qoj|ZzduYr|i41=KJH40*QOO1~Ig$=k-khzkE{kxGrALl`PH0(9mJ{prMy1!X
zz40OG{Y_WC8b)+dls(#gT?L2Ws%UJ@0bdH{;@IdI>aIBvr*5&NcxB~czwbOZgHId6
zs2)*TOgjfeSf+Y>YRWRdCnUrBf6I{=b^aYL-y1_4l9$Ph=jnu!mO9n=U`0qysb;`r
z6|ntmUq{cf+aEWC8_)F9`kG!3nG|p3>#IK~Uotj1Co<arWr!bTB!WY$2Hi@N*lHuw
zyGTN7gp=IbSDG9KJe#y}9bH$Cir^cRjTa-IRvw?}Ni5s{z>!Z=kO%5#KXBxouB;gR
z=WVvq``mpArF(%2V%^#ZL7T9x<1K?S9tp4l&8x_R##h1x8EPSZ5tp4*+aHi@Q~oH#
z)W%HYWZ_yj&6Xw)psP0^Nyxp`g0#bU5iVie)un-P>Gph|&$Zj?(oKc1q`i+aKwTKm
z{6}f=b(R(g%TWII?bE&fO)lbhK7&ShwTRW=6(9^D742;=RyR}k!XqL=hBTEvq6**e
z6iRR|#ZTyNQ5F$HFGv4t7pM3Ua6O|-fnV@jB2+l{Nmq((n;LAEoR}}Y_YK7!xKZG8
zW=6T-QRO-PYIIG4T{{Tk`Q7x3Cx1Fsb3jBdf{;-(4jqTd_-bJOJUK(34sA3v_3B>B
zv~Yqasb#z+;iX`t#u{2kldnm#rIn9)8@(;T)4WZ8Oe<_;zX>|NoDskP-tO}^jpbE<
z7Oz7~OsPY~$5{<WPjilHFp-Np8To+tV!4#CJTjCflzE<A3qB`BsDZzrg~aefjvW*w
z>1H#!(@3NNt(nx4!eSnyVp-;pr5v^?mCp?I9$Ow;yrU?XRcX>_ICI=9Q@!hNdsJv$
zJvw`E%d9b<ys7h_pYM;4)YUD;v#9>T`t^YO9gseaMkIqqH2-f3QY0<l*G|UCA+)f|
z?rHR6)y)5LBa!Ig^MBk(I)C?)uU1|e<#<c#BLR3@`|a*HP1f*1u0x8Kg3S1IKR=V+
zx5Y&yFp^%!3BOnF%c2Tm9MRkt?pt~XQsC0r*(8j3F*KBDN0z>fwky0us6*Oku4qTe
zo=V~x9E))jSD-6XK^xO+QBl0QLs+}k47iOs(+dU+3O81D+~YD^z*)y$isE8~j%b`*
z1emZ9J=!g`HcfCYE4uT~(P&yO0ku3JVVwp%Ih%E*&yBISm1iRCp`q(lHYZqh_245e
z<%y1V$27g|J|EWVlFW118HKgF9(}&=2)n2wH(<N_J@==FbY-JQ@gFzxs=coY=|$EF
zmq24UqEL8#){*8A+ZNZyedGH3;VYqYX}7r53FbJskHY}S7w%ikq;@n~6f4x4JDgW%
zFvF&Z#%$vSFtZXpMl&$L!t|a1e4gVLG+4Afm>b<8n7A4<TZbng9La?#9z`Cgm_}m)
zgsK)BU6gUoB+l)?0Q^hOX<zAtkD&FykHrD+X4G7t24Hx?2Qwrc%1x0g$<3f*h@jYN
zyGtY=sqgzf`Ah9sl*w+Dn%3vO;?VnoM*)gqEVhE7R)PL>4h2mvdNlqivLf{L{BdLE
z{Man!^z32*@rE2^`~BSft8$hW$%)GZo-q9Jl=ed@+TZl}FWI?+x6`wLyhtM-ck9C$
ztCWT?$MRmR=KG#zR_lmkukp(MTgmYc8+jFxiPFKL$X&RS@MvRM2bS8A%|KBIqzrC?
zLRlGLBC!c2OJxBjnX~1)JBuO##L#I6R)B+g`@z$mS$sd#ObWdc#FC==9~)V^v(SI*
zx2)sHvec{!0t=D3Fa1Z(kz}v<+Bp)*DtK=i#PYmg&Qyn^qFk8;o_RVoCs=7k|9NF6
zHdcWN5kr7X8i7=iT;@1S%E86w&#8+w5ryMM{YJXsiQE#1Bs!biK#llZ{VX)2q5=fG
z^;<A;{dAF39v9#h@Q82wt^xEV9Zn77KxG7NPF*IwMdfK~D`o?VEXX)T;{L22^20`E
zVDQVO)bDDbNOkx6kUM9nP=|!cRlSI~(dB`8q4$8-W!clN$y8|R9&+R)j)JlJh!z%$
zY^OPH#jr?9GQm=@%*@GPo}<kzsogCTPrQo5%L{ptRC8SBJ>!NOFoZ_p+D1V$xYEFS
z<R}kZ+;oi?WUGWg2d;;ml=M~2E&oxkc$LH4Ze}q(x+JJH2I2c-ZzK!61uM?fCkwGe
zvm>vIW180$7ipDRZo~>cNpci{uiWF+H?Je*sS7kpVR^uweyx_<V!`e}_bZ*A&kwTM
zW^6wdpOMFOUhlg68l1piRZ-GHN5#vVO%huWGwy*=-u2-?BnhJ(5aFJPR2e4ZOdrn}
zjNTPP%LEXPg_}?0RkRd^lcU)o!Z1NxEO%~BSj(qSBUHOhfQevF4x^BUVlMe*GWeu>
zZTs%9QCqVy|Ni%SGwsV^V|k0Pr;|A9;Rw+R*CQ1z!;2eUC&JDN&RZz=zT73!F<mk0
z6ZbHqIE7HuWDoS_X}4hyH7}kxYllIlWI=+S?~@^`$iPv1B7urTCl$*F;fZ$7e*XnT
zDU5zZs-fTzW8%)s!Thpu{lXT<+1)Z;bYH#BgVVY)98eD%)0=S&T)N&OQ-M_2`U|Tw
zegn;GgQH-6N5O~cek32O7`*2)a?w>tR!i`<tX+9fSgPrNGD}g>lFy*)g6&_+{yfGC
zf(U7NxMH<_jaaMfGI1Wj#0N2nLFhO?Zlv5DJAoF|YZmCVZn&1vRh^d#(>_GJ11E%h
z8M~*=6t7mW@EXaoq?QpnDGiVpm>U<yM{G0w(KvkV<*xdf%cxa#H<;~=n93>V*7}25
zgLM4)l73DmVCeF+^N|)2pa0yo63h@Jp=#!=tfzj4|I5^GUG<}pZtN#f(f6%PZE-jU
zLE6y?smSvd#c4i78Zimp-r4y>nWM5fnR63Q`r}68i+nrZ!v^IjW9NetZ5KBx?#)00
zD=C_=>G!KkConG{(IT;c64v6s9U{iJzZSX{57#~?<$La{p?=KYR8tN_{J4?2vM}%!
zJ1GmcrHem_TauI38b0u&sd!5qTLv*3GvsBt-}}M*r1AT4Z|4z=$EC6?<on+1P<yyb
zd$?xX-rLVMVHO<<D5S&r!r|D)(_$C@J>H@>Td`n+&>b-`0O3aPu4QXlxh$$K;Fq-y
zXATrZ5^Okd0I!oK{0wg+h``!82{c8bh_qKiwu$dLMTUFG{yMVG#1!c@vXrEn4+e{5
zSDT?=yJ~({FQ4aDUaoEiza+Kx=#_;yVaTdzkLv<G&&hl_TM_uoq@DI+h(y=CkZtl#
z@nbnt%Pbhj%gAPjIeGC_b%e-O(FY5=gTXBJHfe4p8yP=Xslmf<GaLjfM#cP|#*Za>
zVa9#Y1+nO|2xmygi+j&bLSp~|={uz!v1J0qVd*KWU0csVw3nCV-dQWg=!3en2F)3L
zGWYNC;|M8mv;u6f_KO|VVVsXiMWp*8o(@r+>ar}D&{O)nc7_oENqXHf&1<6Ls!d?G
z1NlIsOiHg3O*aSjmXnBo{^^Znp3E4$Evu3XsmGc&=mxA7^EkQEE+BI{Owb32wFUjW
z=+;M^65M#76^iS#7@??;ivOgb4C_?xPXvogvSY(4T+{8NeOU6EtUR+7YGuXFOWTE^
zz}mF(3Z7kDuMtp*E78j-#jE7P|FDsl1`VUpmCoWJ8QHHp0lR;wGdaGg!bu;dv5P;9
zQ&2Igdc$zQze6$&nZ>HVJu<xd)J3lN?qr~55jkVt43Gw#tO{NO#Qz+t8>pS)@`S^w
z#!hBL0_HZjVv~%DUx=w$#ZD_}`gwAROlbI%i~}o>x_RP+^2J=}(uVIcMM9b-js50(
zR8=D5G6W>#Ef^$iSya{td%lHV+*+{XByG|0{6=73_{Q)^=9uo$|8pb75_P5zx-621
zsaeiGMB~%GpVIVd=BaU#x<W~y7fa~Zk+jP3mYH$-PHx$#M|VMXXHAcg4041Lz?A6V
zpmxO~nWER^=vm4S;h+}8L^Ke<>7)yACn;#e)}cLe_Lvi??ZLRU<k5!Iv%cliP9dE(
z1O{S}jx@mtVRq$S7nbTv^~`LrTzuD8z7gNPQ@Z<D!*Swf6|li^i{aEbk#(&b<WjM=
zBzSl<*w_U+q9m55l32(G;(cvhR7;!+DrKrishNy3oz`fMNbsr|^q`mf#0IC$Vc9iK
zBc#pAn+Th=XVFybaTZt|@Hwo(h1F)z!ESv9AouxD#!H;Y?1VD@mLXLtTUZy(cM;$;
z$^PprQ2w3F1rO%b!xJE&ru`xI1d`k@ZGcrdJE9X#hV^D?8Kc+HjH0GU^$w;d4%ER~
z2)|Raim8|L5`XzF_Y3M_k}cw!zbKiAxGMj5z633Ij$tIP^FBJ)2tQ4YO4utCm9mQZ
zqx|K1cRU9swt8$0`#LDS8BS+q?E%Z+Wb3C68AvS{12Tf?jJy@ne7;dkj=v@b;y~v;
zM2dQBobp*!0#k^$kjy&=P?A@ud_`Rl8~wa4bBPXpcH2^Mvpw$OTFA{2R6u#sNXi>c
zF%H!Oxv17%)ZnbWRd=0<>8EuyRQ^(*S`cr1KOBluH1`qT?_KQCS6m;VL62lG9dMQ<
zD}w2CobG4~8HZ(eoRdsa8#ubX5)HAk-HCWBn~BeXSVzisj>ddLGc~lx$#CACORv5Z
zJi=8E8P<Jp6vi=kGf{#qL+>lst$?6JIrJBPn#EF%pg+DSL~kZg<pw9g_P@t@ZtOQq
z)<dz;VM2=+r2oi=y{7|ShjLzWaSNbVp!#6}fTrM0DRy$vaqxp%lB7}OStMi#g4E$f
zWcY>1BLs5Ye2{n4FA??1gXiRo<~O$x^_~^fzv(Z&#Q~FMFl)CUnpvO}!K{#YB<{bN
zey8;B(8f2c<c9#Cw+b^k^yZ89KW~w>0dH@Mwg`dBSs|eYAbz*p3_laLv5i_ET{7K(
zreN)%eBP%ay4!9)egVDB{&8bVp|fTxBxwK@E=HnN7cadP7eJVsP!QR=W1@vB`!#-_
zW1CPDQ07Ufn%5*1n!qJfEkTp<7%*ZisN9qD14pvqR{%YM<*^GSS+YA2t$+)L{jB?0
zj7c*&DF3pV{s%|8th)-tqsxRFPFL)WHoQ#zW-(R;B^6~IINY=tg(<ppouU)K3>-~s
z%2Ov%k5U46$MK<O9r|od9YBtQ!C^{*jEKyfHIRE*k|K8{MX4&>ekBBwPed`7(|p_+
zZ0@%w!fzAA^T)Uq^b=7=a`OHY0(f1!sY5aE!MCMPz4xF3^oAn(%7_xE8i$IQ(TVm3
zz~%dG0sC*$sy;Q!-l%D;z0UchLH8N>NoUr!<}w8<H5-ql<!ax+Vl3sl{;8Tcm19Y}
zh~}Oo`J_xI#244qeeXASGRBq)SKY#jA4%37^KeojLB5_!8y>!E!Fv6PJJbI-L@yiq
zRAQ8qLdpR4yZQMaFteDLr}H%EGVxBgQYLKQIG5KLnwib<*ER4M(eO6i!t9@O-4)gO
za3pghE!<fMB(J*$1G*qj6k^CEJ6N+hY;jI;FUOu9A5FbFB;S~GzHIa<1hQhDZ3r%B
zG@+YO_dV)VB_ahj7l7{w2X;{K>Vcan_31R#O3sP${*rn=bc(t2Wo~XR;Kx6AC!bsn
zm#wUBR1b6#{T6#ih>=ucECmZkiUAtpL?qh>-TEwN{YU72WP1HFs$Xqm3j}|))@XkY
zN!8xq4HXHBVOGB|!HsyOf_qvyz|}miI>FO6r>ilYt>GC|`|vGM-f7QD!o-Fo5abGu
z%*&5-MGLXtt5n!ohU<E#fVo0wH;*!1{0Idb<<Hb?qOTk>d|7SRSWrm?%VFAAOzHj3
z8C=w$N-UBgMcKpYGQt_CJXC1ku}PFgw?Ns76U<9lsJTi_XEP<?Cm25XA$9MnDR}}h
z{^&l^S(7K<prP-sukX&0iROmACPjNv*7?YzJ!`z0!hf4b^-n(c#$I6_Hs_6jw!5i$
ziqVk01c3S?mKVTBE@PH%MV1e0sADS<nZ166bOZg0nYh@yFQKX?u7uQ@7B!@SW>6`^
zjZ%U=wn?|jb{%(<#0}|v+%DlHA<M%To<eXLab!UuO*)nNJD2O7f?`0P8(dR+5g6_$
zoUOgF>&!(;S(nn(Oh}h=qfMYUZJ0Ub+#zz}qIf7eXlfcw|CxT-bsu|XB~`)ABZXb$
z-(Rm#587uaFf(DcaGEHOF`O3VNH<*-5TGd(CHRVU*WCY$t$T_NCET_}9ox2T+csuw
zd&ahH+qP{xnXzr#x><W2+=tUzz1LG!`)iE;^@0WooLYn1P$UXDPtBp-;ZWpf)#!DP
zz<TS{Si2%0U8G_mIP!Cmc(RK94a}gM_KLG4;YX`<GPuT${4J;<i<IxJKcS!WV{Hcp
z4^B?Qzh_G=u{MgBbV-UL{CI0D<~B<;y{4JbE*35(cFLAm)C|0~JZ{*jr;h%&K-k3K
zyXE7&l;bl6rq_oqY^PBwp%8R_6(fpJ{`wlk>*!^0ZTozM{2aXTbQXys@ik$Tw&pEG
zgTn^RPL6(3GQ4R?xE!!Um8@^?BAw4D2%EOg2VKKG!t;RE`+afWoQsyDSNY@Se)a^*
z!~J}=@tY4=$;D~pSKBYJO!fJj+Xd;IqzbwuRm90$&+)q<U6xIGj^M;&7a$Dsm|A2|
z!p5c-X*!6JGP>=iwpUKF73+W>PoE8(;UNrnBPP$G3w1kvkSp8#XHjawc3ZXJtl~R=
zCKbYQawy`Ep(F<}h<}`9<ikMCaqLHZ8T%r72V2z3jVb$BA(Nj;JRHj!1YRUuqlDDY
zCRzG{?tQ75PlZS&nGFA?k`f<5e~Mb;oIrqqapX{LDf)_UF>w@s?khSc&QcZF89}&R
zbI=WvD@n)WRkAM~!}Kzs{<{In^;&lzwSaA=%9s3``ctn$>)2!)%`ny6R>E<-?wzQ<
zd^DE8(2Jza4z!N#i1U*C*wzXBF0pa2t}lzNg-F(bs)XL+k07j|356Ba#HHDYhh<R|
z9QfzH2Zsm5bcDtMDO3V0DxbQ%Xkt88%9N3-8iUjLy`%7}%+b+HsVQ^TbLAi9V2oTs
zVB^p}ZX&(SvU5E%n7f@>=VDtLw9G=FoQL3%q2vNOcfWWM^gqj^Hg#i>sjhx$x|Z&U
z78Iz{pQMpFy2%OazAX|(5S?t88r~VwC8K}ctBL}F5_Af)+u<D{2q`p)(}ves6Dx?{
zw$iLtzr&i99R6YB>ntF$^c4!IdlmXQO)!W0BdTHt^g6ec)K4TVMKUVWq3G8@M*4j}
zXvfN|#iU8?MEd2BwD4vg+}k!N(b8ZzSxo#^_wuCFd6uImF<NNV4I4?$lZ-Nb$|M3u
z+(TWeeaM(o=ziw)mh7OIa>0s@eV@mA!Q5v^MIL}qjhQ%^DD%~;$-2g`J|_>DSzL`&
zH`=%|JnQ3;4JbF*K*u^zLaNG`H@hppqj!B_r#Ol^VS^M@pD87!1B~@RYBfP&r+)(v
z{^+D3f$pusZ#dzy<4-fS;6t^^VPO`NGf2^xRFv%U7F6BYK++la_cCuqpT+?FItN`7
zo6^FpFXku@wjR|9__EeaSXXuZUifS_Nm#dMeokOvI`KX*pr*dXEV?f3j<!)=r33w}
zm^f`ZEGP_qgG%PV3$1Ds9*Avj;bI=+^I#l<KVe8$y0Kx*4=F(baF4x<<4keDMl9;M
zg1aZ%kk1*D_!A33sl>>5$RR`@cQyFOR50)iPy^i`LuNnGEr>#lmjO<IZMLs@b<tPD
zv12+hTz{{}(5qn@NA}*y8f)NEaI3`UB>0qMDY-GWGt9sHLo$+GWuUq2PGw#5j8FaG
zGnU}kwrO~y43NcOqz5OBWu?N~?V`8wjJ4G6B!_13a~)ITLo~s5%o!m2DeT&!;m0Sf
z;WX~n{E1Abj$j_Cj>NCL`uu5iq=>9T_A?+6{?e0XDNGd;GEqulGIFnifo{^QfDsg?
z|E*TO0~L-;1L)s`!j&>$3#yu96aM)+Nv^r+MlpVsv@tGFGiXprDjBK;bjDVvxiv?F
z;ZYT2$8Lw5a=1q`jFuVgY`U-U_EpbKI4&7#i``Si2%l@=M}<d|4vjn=oX@eI0gwp5
z@QQO(svJMvpEKrs%amAUEkppeO=I@5UrxS81R8JuN=Rz(%mo9(@2Qo92q^Lk6pVWD
zqN*S?k=SP-rz<Q8$+V}dJJhn3bSa;@M|LQ^4pe`efGG6g)a~AROb!hRc{H=ZjszAo
zErVs@YtjpO8l!7fahQBsTe0@c!ls7zO@ktwc%6~Dgx&f$hu{E-O@<SU3nAiKM83FQ
z&}L(FnX79nF+T^E$YsYFs^g+>GG`EnzsL67sMD=siE<!~)SXRtMZi+kl&IH+x77Cx
zKq`~@2Nb~I^}=Rt;^rnuQbdIAQh@=>8+YgMrxO|L4;s&+^Gl{q@w$tAi|CnBCLT3b
zQCT$X2b^+(p(8t-@VZE?r4bi(jbR}cwQTFgTqs&PD3KnvqiFc{CODikX$L?z0VjZH
zb=-ESwVMjb)pxX&82%q3I-6|$^tKRmw4>PXR&3kKU?kGM--q$WZ#b&GvU$R-1#Iy_
zV1)HpB#p(4)@iIl*+37d`zkvBs*vQ!J48X#x4kIV2{4C^s1mO>xr1u?J0e&<Kp4$=
z6|8P3<9&BTIL3`JVgjqaAhjrZjEt8`vUp99mmu1>>=F$TTP{%q7!TNJgf@I8iDijo
z?9L6<aGv?$IZTa*YT7<bxD3~tG8Gy*?0&`^-pG>)^zeSpvY4hwX2kL6A89Nv`eN!o
z-1*w1718N9cPwPoazU{76w}?i93t7Y3Gpo-QPAgUvj^73#4_{%?4i>GwoUyF0P7am
z7XaT8Uw**FYg#Q0hLg}>Q8!adIjb~h8v-PggcN|tN81X5+JLfuM+zsy_=TtNijTR6
zoBt|;*_7u(c5c4(fvcRV-8yl<><QH4q5qZ&n6x(>3vnHge>o8O=oO|bWSpo6!|4{k
z`;4e|#Q*r?bLzhT75L;^f{FRjs9F^;cuc8KU(RmAGYn{2Bhp0CL>8*TfHM1evl_<i
zZRDu}rJm9XW5?}jNr?#UyOillO=GWu8{tdHP2vJ2{)4N;pRkgNm!k8wPho<4XN$0y
zi@0P7)7=^z1Pta)x5-g06A#4U)i-cLF>cQuqL+ZtrQx*|Yhh8l_@=)u{UHPOkPjwK
z<E6|LDqmnyA|c<lvHJ2k=Mr+Sm*;xtza+Ii-dTG)IH*>SbiC<@`?gw2oGUUn#v$4N
zQg4IdV~GhY6t)B`aJ$^aqSY&G??ad{95BnNyMiiZT4ga#pNTC$UB#pwCP0=JHmsN`
zNqceIGm{<56+xy=&iMrvsEsPZE4wd$7ELT=QWtUiw_;;hv7N1r5QY{8Iw(tvays2K
z?!_P@;m=rCK21I^C9FSl5_u3La!r!gja4ZazFlG8BywDcFHS!WgKlTh#aLc-aOg;6
zzL!V@18T_eR&YW>Y7`>+U;m>2ETjm!r<5re<@z<(!(0un1ghQ@i^<MSxn&T{!I<=K
z|JX;xxoY5`G^p%WYtk;ZO9WCVF4!hC#hH{dGl{%<Q5rBstlk}f+Q4fTW^vIEd?8h&
zitGol5TVcJNujV|avg*fh#=*m+04sL1}~O5qcx4M;rFWzYfCb;5S^UJbzQuqP_|ym
z^BeLQccdo0@|#YUWBJBZeh~$tZhq+m(v*1v7{ToM_)X$V9&1u@B``tHtVmrmY)%s|
zo>mIizXJJfSP_s`C*=Hn{SP{^5s2RjbTvV_x*r?T4F!JhGnpBwsz)M&cY0BZjZ^e8
zn9v%&6qy45s5E#uvN0(fPSqM1-Ik^}u(~=*dTf5$w}|SWhnAehT6Z6pt!++Do%gAW
zM%UXMZH3^m0^iyA{(&<9D-DQOpq9C9Ak>H3Y?DVg1)P`Jh23ozjfBp})M5k_P?u)R
z|6}B0Xy`eu9!?p}ZbW<xYz&HK2(*~I9tznvpb-kG67YABW1@~|lEJ6?!xR#IeDqu+
zxZIl67#Yp-rly=X?ceFTRv_ebZ8vS&zfVn#0M6(NmQ2m;PEN4M;=xy(aUmEgL-!ax
z`y?zktk}F0!*#~D^mIW&Jz{1X^iVd&12wk;Fw^_z`OLPlVpZc!BH#yB*|}@IlVGIz
zSzJ*^F%vEX*|elP{Q-<5i<3NeM<z_|R}1U6#fmT%4cary5M7L0^MjjU7D8x#z<k)T
zWpe*akU_fO(qb|r00lFK5qZD)m%0%{xwm84VH}8Qtcdm}Y-VJ~ieOeG#|iNCFwP?s
zV8#Zd<vyONGXabP2A}CO*}=o)H!|2kuNP%~sCKZ?_LLF>*!m+`j7WP9C=05Q7vn3K
z;Vd+4He^RCHgl3=Wvg|ODZpLIPb*RoLvE2Ezh=9EPQSg{oU=4E>q=V=Q-Kf7jOcb^
zcq^6>>wJU|y3c0~Z(<T#R7=@7Wnhw0A?m4Er$`sssMw6$__z@9^>HuZ7SU~#-q2c|
z-x%`bNVe}L5-Wx9Z4eUHP!ga_m!_yM$;mzbH*#1~q6UHr;orJa6B19U_QBS|Zc?uh
zMZy&4BJDA=2ZFI`brWbMPQSqZ^9v|xe^m*59uk5PihLL%Ogs))8*ji0Vr3XX1rA{Z
zcpye#g9wV>+h{?zT=Y5CJ5Fx%=WY3}x87}CdA8cT-Rgh_@6qtQI(G;$!;tXj2j|>Y
zCBuen);#B+PJ9zJL6G*12<d!Ow^Z#eSDLHs(eYS*K=X58^<l}wv7on-T~WJrl`W&L
zZ`7G9=hGIf|1pnDm!W9N6l944sJ2{WrOGBKoA|Sp1Smc*l^IPo6rUkW_UZtOAv3ll
z4C#;^G*tJny#bUt!HG)|iY4sZ5~c%;K(w+5xFgTB7I+&`MJzNy;JSIk_z{7*0$1Ab
zW4PZ_9a3DzE;I35=i+azhKyd>4b8Oh52t&GgBW?6BQPxv$Hm4Y=+CL*v2D#BicWrg
z2zAZ>GkOXA&R$c`M%}nm(F9v2hA=>6Lkv<diOa4g*oDZ|95<*u?(ehh-zWU#J*T^v
zF7Z567~Y2E7G@%{0Z5~y5X7D0LPVWOS7dwu*cyUkA-Sm6m&+o}X)AV+7Z68C+YGbC
z?OZ##RnjmEbZ8cRiqH=pRdQu9`0gK&)D4f@&P}o3?n`p%w_e3e<KZgb;#%aYM1+Pt
zuGwz<Q}VZPu5+*%Ox;4X)_|3~v(URN@pD+uf-5kbNl|4^q>KB0DT919@=v`!%2)ZR
zBAsh$9o6+&m3yowPT&D2nXx9g_m(GIiUj#<gWMMpS?6n)YaN0zU4;6ObSbtJ=q6<8
z%uTD!hhvY6KKM9RTcPrBbE{CvnDsE$u{YUiX0~T*Ke+oUhz$(CK9>^J2$P#~Xc(^A
z5@4^%MbDvUCr8)IUm&7@pJ(9h@?v-l*gbfRx5EKhJu#dITvy?uR-ZcS3;gHH?H((=
z_wFtHlAWz?)Rn)1QHa=k@<c7=&^LuOwBzbKHyd4=ovz)jU*7FthiQ2!O4dv!Yl;Jt
zrto-L?Z;uU*}2bNZJsAN-e<TQF5SI1fsZ}#e48yVfm__$pwEnJ^o6D4#+Wu@xj8tE
zITDdL_E=Nt>In!ZP}fbA##nzr1cGu4d#=r2$ati^GYiJ~=ti;v+t`teW-PkSFdU_B
z2u)u;SwAk5hm{tdGJSuO2{f?n-H)%Y`^DhTk(1AfMhLMSGOumJ>6Ooekw0lXWG(HF
z?~}o6H&^Lektg`{Q$1yR-0dt+mUlBdXvaBUO#Ifqt}xc+25#ok_o;76EJY6kk;qTE
zxvTU&h294O5qVeT_^I^Mi}bv$^89fjIbr$3AC8lH+o0;DGrK?xQ3fwud{L(ejF$#i
zb|LZJhL&%yz#pG0k*Pg3dLGLBo{wMrHiGdldl@!I9p4gF**Y>V`kUc8eej~k1E1EO
z=XhYoJN&@ieGk!>v$OM&APpzB^toLMy*%A+-^O1f3rF}hloUY{hdY}M@;f7e_4v7E
zJA4+oJc>USrgy83AIq3;Ep3nYwoi^b`pU!$W9Ox0skU<aHLH!ISEH#6QSWcPblV{i
z*OQ5k(zSd_9sqZ(W;aW=v-J31<33N_h)jK5c2z!ack`~|7xFQJ1g<>&g9BeBuxTGH
z2F0)@&F|_y5sw$_C|9+=ul4eE51KYTUz#@4Dt;A(-UJSN#HF5E;(sy<jOU&Q_S*s}
z?I{LZpX*w%lf1ppnr0Q^ZGCx?$WgX9M7VV7iQ5xmGz$Z}GK9d?)uHU~$sz1W?k^+Y
zEodQhmbGdq;-QCI8eVrDeD1wI?+stCN~7+2wC|a)Mu8eSQ)w&o{LOkkjc)J1REziH
z^5Ll5F=7XkVO~TEhUYR-5|<_*2=LQBPBM7M!xh35w<q9*$mgks>=<estqw8Pw68kt
z(26XYb%pr6%3cIF0V8Y`d$~5J|0O3l^#o0ChUT|}ClwR6X8GWzYIG;Q5N0(thrWwr
zmV9G4RW;?`<>JHpl*Pw&X(YP5UF0al*@$(U(}RWQn)S{aSyIXWm^N4>pE<~;o9$M7
zy}oL?E*^u~P0eqIiJn{DX&2ILhcch%B3?(_nzLWoxj-Iu0ndCqM9*L)I&_46)G5A@
zC@l+8Ys;nS4W4h6s<%PkQ|aq$dp84fO@oUlC;|Y6iDt_uK!c8`+4Ee*_pm_!uu`o0
z$DBcKAxA{;6-#uuW(YwnudW|hE8qR=6RSTLweA)7TqIoYxb)6xsgz$U?L+J9{OUb6
z74Xpa?@s8iHlWM?S?jS2MrkHL`DLp|kz;G<zi2hC;=fzTR*5~ZytJCM|0uVk0>kR>
zTP+^I9*#U}Jt1JL%xeIEPO}0R0Cp>W&X&*V(&t5%=@2u1-xda6PxpJrpM2hGJ?|E$
zAFYp}*eB}Q3)d3HjOfCOovfg000`iE{r@A~0&oA9bVCx@`(M%x3SGjot9a#C&CMde
z&TBU!6YvFbQU3O?rfHQY!x_fRY&L2CX|4lO<kRBR=OD+N>=$!(2%YS73RHIno^-=w
ze+g+{I|oaa!W5?QFo#)%O*3CRp$jBHza^(hC>DRYm0m>=v<^z6LalUaoEv}8fw~RT
zQURXO<|PoL!xcm@88Fx6^4#pXOZqE^5y$%!WwQdt)ib|@$&j?X;b~c8_CsTTu&yW%
zBLj|p(p!hGTe;(=*?Zr{>Sf45EdNSF1MWuK^(q42V<4=M3%ZaLZtk}l>x-GgNlcUM
zWn2&FMDVfiPHPl<IV$fLfT%lU%iThp(=fMo{0yBXlQg+t?<~S2$|&%w-nb@OF~?Um
z>ujp|C@{8;5rCu!nEql-g2g>Q>|8Y}JrCof<>c1OX9sc`ThK8+_w?B!5Sm5DyY^yd
z+2Gqo<o6#Xp4@irKiS;}dB2d5%dMuD<1MNT#OiKHszDsllwn1m{}VW3eS%^3Ukq|N
zn+Y85q+b<$n}&yO+*y4XWfT+0QRDZO;x@du_`mE84`_mf7k`sD49@OuJj?q6#)e}i
zL(A=JY5#~mpWO9^p7#rD(f*OwLdV*Ug}8i_l$qXth81zVn(vGEI27G<c42%3S^VX<
zJQzgVi~*G(L&>sCntcO+@evqt=0D86ja;0oU|^XP)Ww~&;Q;jj^|qT<*}0p+*Zb`7
z<@pT!`C;X079YsW!P3Ft8ZS8Vg#nxk#@NtYf%dR5#c-aR13~l_pIa5&$!aa`ykJY|
zcHUXM)EA&VQI;eVH4nr7l_JeHtj-A;f3M@K_m9N%#1y@<*Kl7Xtb@TXrlwu_1##N|
z;Us&Y$lY$5HF=27pke!$?h!T>L?hhhXNOADQp8jTI1eD`A{K|mF4p(S`YMLs#QTuu
z8u$fShg)Z<ls)*mUJvvPsZc03O)$<^iRhQ3xqE<n@wDGxs8#!KM$eZtmn|o==Z>x^
zIbUP3^6)F{8m5w0jg($>lo}>(2t8+nUwx-3!t`YIPaBQT7YD*qJp%%~lp!!hwa|Om
ztlL(LHiM4&VGWjDl3;Eb82#o`75>F_dfGyPgqzze*mh5bU%^oK=v*Co2L@lKj1S@*
zoi}{LTAU5kh$!?-#c{EHr(QmEV3lZf@<!fBOilO))($KKJ8}-^|AlYW6h)Qw{|mmw
z+!-^O`$ImT6+?(`du3iYCQ%5a$@k=$mDJ?rLE59>c%E_Z4TJl8<`0)VmY5E#4?Pqn
zdHJpx@O#Fjcl<xd8=3uo$=mop%x>B)JA~lrXF^a9=~Ta)U+VV8xwjB@6F**@Ids9A
z@ulYNs8_|ZXvjU-s=+DuW4U<nXln#@?_wsVT0~Upt5!l{#*m?yxX>F-IdQgQSK;vP
zT0C+8%$`SA>YJ&W#_cOyK6SQ3mVEqZt7Pf;?iw87cw%d%e4?pb)k3BfIM6(AVpXbt
z)x0*dQ}_Niq<(~D{n3`VL26F{x_Aj~wiTeeh3ZJdak^zaxOv6e>K&nKi!Ds$2wzQW
zbrvoyCHzC>y)gvH_rx}C%$F3~B6;Y0GpbhR7GEy%U)U-CZNlN_778?RTCwloIL%f#
z6m9b(*M{a@c5f4F(ZA`p5AIv?U(!}X|9?rF8M;?({V!>Ig6bJpT0?EO{F6_J->eQI
zX)<z-ib~>AJL88APCuC0+3NM!%<CK@)%OecTW<Iiakwz<g`w^rw>0BL#=8lB72v_N
zcHW}cYLIDrv>J_q!vQL61UW$}JblTY=T=bONac1{S;zr1P%x{uXII;Zy3`t-WS0&d
zAJ7)h-L2+tR8(hVv0=DL^t9BgJ`55sD1I<XmdanijNw6zER|$8Z7S?<$~8#=o^_-=
z{#vPR;DTB+izfCYj$jkxutk(@q}xok_(tCXt|EAM@LFhJ_b5gZ#F}7vy=4^ZihrNC
z&2#YA>D#^Z5_+t7BLkdHJpdMpTuJdp0IYO#Rxjb9rvGmooTb>RjByVIykEZH5YIr6
ziS&B+yICUnbSHTtd9^p1t3RQIw{(E+wW9>Cu4q`bqu1rm&v~ZyFqHl5eSH{hR|;1T
zGK>B6S!!KRr&R6AfusRbL-%1>E0P|MR^I2T2I4cIN;TSuMWC9$OWJo-LU@o+l@1fg
zDUBOB#69e<8*}aN(EuRiU}$1*^r#ZPm^dlNu}x1u<^472*TCtVKQGP|r)x0W^YEGE
zzFHVO!?@nZqnR=YxVQvF9HG&q{hlo_11TAbj91#GC^lFo+@Ta0b2}c-lX<_~?)Jlr
z)6t<DOUtt<uWi2;y|p6^mj(+(n8#LB?1Wt)o97_cs0~~M(I%@*fv`LRPAB>sH8so!
zg0oTU5$2mUs}|#oniqAEZI&z#WI0q^-vBJOi#0FTp>Qii_47B&hpA*al!!5Tshx^l
z_WFzxOre5LW7K)sE%O=8n4K){TDj;J4Y4b)1ZZ1Y)~^?Cx&k}coXHEeGLgy}YisG(
zceV<_d?80Y1{Ts($Eugh#3VGHsF%U)<FM=t4=a9WmzV_ymgG8K)yW-F{I!)e6-?EU
ztO`ob<;d7o1r8Gxi5H8(T7J|9YH`?qDvwklNW;)iP4n2K3D5E^_<5{Khv(UTlQT5l
zAAIZ7#X_Oip{TDW6L#B4Y9(FUnpUf|dG(}t=~!j7FQY6?_0QXBTqrgE^d&Z*)sI7q
zaFxwIY3swSXu4@qgSP*`mCmFkMzLsz|Caw0=Le&C=vit(TzHl%A8(<;dh50CnAoMN
zwP1OJ&p8mokUm{jG!Zd7*htedyx193CzppF#~8Cb*m0vrG@_JrEFI$xXj(2e3G2jB
zGhY0YCU{LX(CAS0bd-Xvsir@{V)m-$wNV|8WKKJN-@ImFpRL2o(nf7k{mwMZti0)_
z%zKPJ`8>O8-E{E8(r+H7zt)eWHECtvH=$Z6S$<EG`loelKy_J!hFK=*t>f=n{;yMP
zYLmsxF_KrB*IR(Ykww?H&Y|{?dwugrsq*i)rg*u3FT{;J6@dA2rn2y5X5%OFxLsmS
z5mTJ%TfB$<o#b=uc%+ltPuk~{tyfKj&Yy-!&l(G6af6i%S)5=)j|Z{*u(c*dbw=5a
zq7cCR3l)e;2@7!+W5hML-^Mpu0l}}&z~7#RbG6(#Qi3zCw^GOr>HMD9tzy5p5FO*_
zWZ3jNw@wjrfE(oPMM#%OqfnyE{I1qg?z44&nr@$PNm7VL-5-Hzo<oC=EIz=IXq#~{
zSZr3Onc1><>gAxy8nyP7Qdks28fvx-78WFZhVt|@)Q|g!F!Cexlb*7jj`FMWWrhW<
zDn7CF{jHN}Ok*%tBlVqFgHqzTyc`LrqrM^|-5Zqtc_`)*r@8bc4uVEK2*mx(*!C5|
z?AUfJ*-ku1h72d3V?vfgmnjvPGpDH~2s_SWDYh%SDg7${PW#mLi`$uzj0wS-5$!oK
z-01d-Xv`!#RBTqtxtKasF;c4w{l7w66&ruk6nmFdvGa8z-L->J=iOBXunkytm1&a6
z#e;GQM>3r#*qrpMqZzRr$(EJps`M=+o5VVUSVl)Co}|FA8Z$CTuWXplQ&ihbR7HE6
z98L_Txis5jB%N`V3%L5M)2Oite9%hCiU2x};0BQvf2(peADad=)B#qd)C};B2uT||
z=&A+O+l0ssELx`SnVkzQNX-A_17K>JpXir6sS~O<#`OO=A`?+!%0(c^g0h25nP9m=
zr~=Srb=5B}M&kme9$RV$at!t=m&TCPNR;dI{d{QA8PD!)eKQ3vwdl$IyBi^_qrhg@
zeK}Z7Y6grRf=AJ=<!xqc#2hWL9Dox)Y&560|5Rxwl=5(NO)3KD`|6#a#A`m2YW4W2
zrTDFoH?oYV-A2agQlA=P+5CsoTS1F3`X_)iqrCf%3Cxc+S%5zxT)^#ic@>a__(wQ?
zE+EhM91@P5GI?iN$-+8K0FpVp^fihgoP#jTlAxJ~>-8$~Z{+zTlC_e_^yt1VwTPZ_
zRHfRuI2oT?KqhQHZ?3w}gn>w2J(TR)L;+5u%ad9)Mo>T(Ye-=Wh02Nxr)1ixj8ZCS
z>|*s>?)p}*NDX2NCB`tqx8U2O;oAxB)5)y2&a5xR>G$w@uhx1=Sr=VEzXW>>VS4rr
z71x2pnfE!5AngdkQeOj&VcO5oZ_ujPihEvuhAg50LWv^7JdmYCfIujx7GOJ!z>S`g
zVpU!hdYxgPo{GtXDnrVMNfB4|1lJ!L9(&r4#Ra18$K@IH!<(9mkP$}2Q-Oj<e840k
z%V)|a)K5;Ho?P8-g{ldHhkg1-L@f<bXx&R~!0e&_03u05^CZ<x7VAfBJ^TDMZ5?|(
zkisBjuaWSk0<U@=d)IwPl;oa|hrt`U@XfF!4o#my`#V*IQe`3dvma~)=s2ctN+TWr
z4OKgAcQu;j-^PFFqQpfn14_d;qs8RjM&-g03~BH7v%(5yNw8b;u(klAlGk~>6lLP1
zX$7mSR*B8frqru<(4~p(l`{rbf<HF44VXdtUs~U>1eb9)#@J5$IXPuwX>fdx8OTKL
zgc`&OsL;ICve}k9%Yz)sJj1sP){=wJqH|?}bR@VRLHLY<v1fc#o*1v8x+GTbr?X-$
zy&Uts4}vC(u_Xcl$P&nFCPae}d>@ytKmC_=;&6{2z6$&eWLU<hPPMq7Lt;bQP6{tp
zv~*3KTwe5#MG6mWVnI7+GIFWS>ArqllZeBIp?y+kikJ$`npm~Q)k%KcdB}pni+wTZ
zZ^Cye6ranh;RXPa8eyN`^?!1h+_8=q7ojwWvC@vUtQzZ!2eOJgTC;t=tk5c4kKjiK
z*ck3$_%p-`iJ&PTRH(b9Jq4Xsp)D=-e%{PJ_F1~I{jyip)D(@)EU>Wv&2blBb5;+$
zB7AkyJYlcXZVY`YEj7UNankI6?5cg-Gc-rmVq@vNxr>L|8_i;v<w569@3YN*$5Q5L
zI=mr>9D?6&m{Ya3{I$86`Klw&k<ibNDzkdeljzc;dh6`?8w=!@(79_1u}61a^<Gba
z(0E{biSv&UoDqsr-P9q9Ae(r4S*^1=C1zkiRtsm5xI+#aAfsw#+cq}d^Mt23Pwv<H
zUuq%MyF5;(1-^`c;5lvYZy%BL{90L^7UsszLpZyU(9-y`NAua>KRAX6K2FbziNWan
zr#0W#r{Cd2#5X9?XJ5%Oa@&~38{>-t%-&Ao6*qx0b1W*xy6oDt;jl{7;VJ`CMkrh$
zv;1WAbXz|zch7Uhx%g|fd|#u7W|O({8?iZ);5t--S^sQtb-3P?;rzz@jdXj^`L+1m
z0aMHWTLU14zb5H;?*3)#iomPH9mbt09YWc^GO+ZV&7V3bm5|l(`yT{nzeo?%=l(n&
z&%tK;%yLl}{!VG_ZoYc0)wr?0u*cx{+2Q$cXGjp9db+}a?t{&yVG~08xO^{@UE;Dl
zmln^Z4rl-3Q59Bb_1+!M<_d#Xn4h;Cd#Ch5S3}Z?-z=$SP~3MhL_CVYZi@xRE@24#
z#|;}u_1d#pbTFdV0!1JsKtyk83(^(AMGZnm2k>${Vpf=%RdwBbVDWF;YK@F=!<cxU
zr!=OxR!ePHdbxCc9j3lNtDvCFV<v-zU+la!Vn{ce3qt0CPuUJ=y2*^rQEMn2#@lk8
zJ=_royf)wZb26X^(od~L!b;%Vt8#gVIpDRu@ks+jP@~&7hTG-KOEJ9zR|ty>_s$9%
zu4RZMrk|N2hfpQk)GUq)q)<&S4if<9hs3y=9T@Y#+^La%ss$cff$DSd^e2bBsnMUB
zRi{dA2lcvE!jxxSeN=Ta%+h3X!t6$+Fw%A6x@sgfwkD38GFB$~%lT>5WYsE)cV$Ed
zupU83(#k{1j+D`aKneBu%wz0rG|8=W*=Uz$TLcO_GS(ND=8I}R<f=d9x-E9fcglnp
zJCGN!xEa4vHWSugcc0~&ZY<X(j>R8dqkzb=-aeZw{?SIaG%Fr_WAfT8ds|>V6`v`i
zdr>r&r=-hp0u0_N%4W!ehEYAfbxI0ovfqbPk`_^MRN!dm@Tl-Vth`T;2ftY<a}<hF
zRq+$Hg#s3X{1ixp#_p9cJMD!E>LCBNT<2J;V2+kW=`1{DL))*Jr{d)6QhyvS;Me`}
z2Du)0Bf{S<UQ10e#iN_SElu|^kI8wj&V0HUy~UxQ$|;O*-Tah)T6jsJ_m(Vx$!RDh
z2uv_wMri1*RNmkVaW7$<g4N^{`+4C6)S1BK$&JD~>Nf0YUsdbZ7gh;)>wjuqfB>S~
z^u<v^P<k*RQPq@&vgtpYW<s$-TCO$4UQD-9*e;dqge|n&)tk~aI-BD?o)=l<j=PEE
zK)EYl`yIorv)epCnNk&(9HI;LO?gLubg{TE`>^yn-Ba46bE(YT((Rs07^2eXWovvN
zuXkJx{qG9|_~~K}^j<u?7sq6)fS|7+Z;9%2X|b{PVoA+_{wrj0z88x1v147M`cuBx
z(59Hn{YrVXS{3=jDsUya2bRM7htVCe>ITmz^2Yr4^cV$pb&RR;f+luEM-a>TpHSUI
zdo%CQ(S;cy9Dh&4m*wLGNeX`VX&nYZ{L%PD+=G9V5OeQE#;MBP*%vyJOyX(8Ku)HQ
z4CoXD2qhV%BZT{N2f_O^WPzGApUePfM}9peEEtfCv3}YFUH7-c+o1*(;lUXrL>pv$
zNzz~c*?0oK(O3ejkIyPzN1P(}tXFU*COqG8dFb<!APh7@dw56+7tgmtP*gUA25paF
zAe$-CPrV1Me|*y)q#<iPall9WO<$jwTeU1?2x8cwqVMy5*THL@agibVus9PGbc_Ln
zxN`)vAD*G3ag}xIU5po`ptq^U52J|$!tAp$X?L~}ko|e)93+M+FoG>=1V%4HMxUBN
zSGqzSigqux4sdzBf@j>u1#O(|9PHDL4!^IG(jgH2hQEpkbyJueys>>sp+Q?l?G*Ad
zd$kn%_xrfIBFtv`jxQLgr*#2O2TIT3xoD=dxBDJRVWU6n1xTJ4C&P_-CB**V_Mzqn
zmuXp~<+AqTWH$$m$jV@)dbN=+E5jcb`(HbLZv1+*z??tmNKzSD-f8a%g@s8_E4ACY
z8Sa2Q0EGk6wYB~1Vqv&=QKqZBvyUZ?yXF?NyAOMSCU;?>9nvYGh$cTx=r&<2Z|{}G
z;^K_?#gyj+pj#J_0-1O24kw@(171EYK1c1mzWyzMQ=`L;B<mM%#gZcie}v=)0tMk2
z+;y!&3j|;zlterA@N@NvqDp_qgQ*5XB-vqKhYeq!;tBf4(kXNEoXC4#PDs>9EzwZ0
zO(VqB2&l*yun@qQAhH;+{mQeliKK^+i>pr8qqNDwtIHk$tSweO1S<b%+^CPZHW6xo
z!%AKY@c!zSRK4{s(|0osGlp0pg`-aGH|KDPBi#ey+|r>|c@pX}9cx4IFbKPq#g<C#
zH!a)#{KfB<gN<8NC3q@}WzvOha&+KOTh(5n-{(DY!DojmpK7<RvR)j8StDPpL2QO8
zcYE=9VvXf$9_ZEQxH=A0t~Cs@(DW;jZuI^B*xv-rAjqNXEM6h^e^12ni2E6+FDo2d
zYnA)MDA5hZ4wrzPX8oG`6JrtV{Q_l-Kx(t<v0DwRL_3-yCZ~peJVWdA9tAbij997S
z<`Luc9>V)FFvousLx5I&d<XeWrrPVW;qysZv^1bL$jw}rWR3uuA$Yh_)W;&qYC|HZ
zRaRk3W7Ejs`=nGWASUxfx$~J9WRz`s327vGOY43%r!>*Fh`-%<LY_)5yKIZwjIVb)
zk5PN!7mRo+o?oz&!w1M2a?+#i;WVq_HJ6=N+XK|>kk(UQ_E403H2zz#N6n&F-s-92
zqY6KuxW-mp<pNN4Ssr5r3oMy}60vS+9g3YHA4S;q-qp4T5{_6%_*z5l+uiDBBxsu+
zw@`G@#JzR>1xH7J`Tq2xmS!%D(zwvd2s;X$LI;9;M`qQig5`3RRYkq(d!@S+@ysm%
z(O7e>tLDmHdTqL5OY{~{7)~+60huPKq?-Hv_ct!9Kj$0pcC~7zx9eRrr_`3kGoE|j
z?R{_Z#T3`~`}9gDraAjaJ`qDVrc!uf6GZ$$*^K#g7(PGL(-MmTYiOIWv?vKub(*!j
z^Yu`vlicg9NoIX{(^~+YP#F-?s2I=Uy@6!H&8||1%w6!35gh@GiHhb7uRpn_?e-9b
z4ZrMsu>dz`aw-N+!`WcBFcAv~=SeCF22KYEPgn1?o8xmu5>8^$bmt{4uQuOreQ!KF
zqQ9YHhPhYj0=UUCN%2=MynTq`^__Cwx<`SJ#oNo(ZbThTzxW)u+{xFNi4et9be)PH
zLHd#Bp?5mfPqM!U2?@QEshM=WDhFo>ubrvAh|XNVt}W1gZLNO!Qd_$Cu57mshH;+^
zZ60R9MRv+-&6Xg_-Kz~Kmh$1}q`*xM3c-Hq6=16-(sAWFn5fNINLRJ<U67_|XYefX
zIp^bp+B*z+cwbWS;_|!*&r0=a&3H7&4AgkCs~5?qbB-)Y2dz({^e#HXBXiDmj1R;v
zZ==izH|ZqWf2ln&N9XgawkjBKV;X1&bRCtX_H{NT+PpH~>Ywe3^r4=}PO&f)5x9W!
z`MqrAe(LZnp48}X^^zY|dfAc%xhsBN^S;kOy_)m+BxKa(WnPlu{JLQ?A=LiBk7qIj
zUZ{6+#S|$2yK#uqgi8#0^sobGj)QJ=gr@HW5Y_<VpOfx_n4zEHZIaI_j552`*Sydc
zmuX5N=hw^nn8(DW;5E*Q<4c7UN+Fw#z=rWD`b-R$ErF#}vma`ho7FROc=M#E1VKb4
z!tpvx5zZ2Yjh|t^GAP4ds2!MQFdp6ZtKH)-O(|)nO514y3I;yNQT#uj$HlFe8?DwW
zJy)Hs#jV$?4X!sW-EJ)S59At|pkh<jIiDLUL{K5o)<a{>lS{-7s3J>+S{KN+=CprK
z{RFO;Ie)(smP*qex$Ts*(uj`pT+wfG3oR3eL)Fmeca%QY(CB#$Lh_a1FUb?1gROqc
zS@ZUANxLFnl%<T0))^{tdu`?U51QDC!srMhv_V&M?bhKGdJ!yX_<W~xGD5DW_4;^I
z$-UnWB?aSWG=LCAxlWD3<iYtg(2sA&-PWEu3fXUt4(Q!exu;RR-0v|O;Yq>6G7}MS
z6k`6T^h|boe?dI*1>gQDa8&AW8dyYE;E@^7DB0@}SC;?8X^hiQ0IXP!LzWwZJYR-@
z$Ok|~rqk<6!~3}OKNXm@HGd>&uMlqh!tBQwo(Ea~ks^xDcdVQ1SU1(M-78|(qu5M5
zhukvMmRv6Q7Kd<Jp0|uV%4T3YV<2b6sa>k9G*?}yZYvA$w;<u@gdK+<i`nfq4d<P5
zW1^*L2oJid{+Yeyg&;1n6!2^yQW-n9nQ~_BTdxFM9{#@6K>w)y9rGRyd>s$JCgm#D
zn$jMarRmsnB@Rn~3664sE)AM*etBnu{VdGw`ZC!zlS$l0Xiuq%glX*_K2a-|wh(`8
zj6PYNls`jF<@R!)gJ=Ne6N=OlH3*GTf4&bqC?Pz?Bm$l11qj|Zi=&L}PZfDzGbt0b
zFtFr71?9EfE%tcUd@nJ+o}G9`!p{g?M?%L+I8|OWq3>G1JAi4U<!<I{l16X#C}54F
zxLIUsAQgtYO%yoBJ4S;s%}psuFV6I0vNk#*e>k6JNZNYwPWg0O?yFj<T13zZHfZ;H
z&--{Mo17C>u9n3gR=iPs%E;tS9K8`R!}0)W%v1j<q~c(4FLB2)UZAtAnw++fluhVm
z4{jQHn(Z&sHAvF({S4`|*N0cbd0gHccJ_Yk;N#Lark)dbWvh4#?#OzE?dhm-mOb?u
zEOUKpDXT~w8ux5@-R~*ma(V0(ibgf}rS#CM{wRJHmGTo<PAdd0IaY^md^vADVYH3F
zRMj^K7tXH48mV3@y>VmEJ_rCKPhN$`U@hC<tQF{=Ds(h>e&0W8+-2kZ8HlFs<anXf
zrOm?2T|O1^R0j~K5WutBxA#Lo)t%&mFxCBGc2A_suX%r)W%TH{JWGr$mHVDORI4!n
zyuxXEdj|)-)4{-srlDx^?+*yky2szoUsrmVbk(i*pUrXQ#V<xH;$3L#S>L)Avu;)^
z0Muz#r)(<Gw5&wrZPqV=l;{4Z<)v;R*s9=33k3I1hxl7VX}Wo%0*6fg@<-FW)>~8C
z|5~$Y!JFLmm3{!)Snt*6je;zZJSj{|zpnikvv$q(FDJlz>?YJLOHyw>LCRBK?*a8C
z?+k(39Xn~_Z4e$UozM4MJj>MrSd`zj;XkIN$6DR(`PQOGoQrK=PV)g|M{mu8=y$k!
zpe@VG?(zS~QsV=8o=d5zIO{)jfL{5yF2eq&C<@}A;%P>_Aa<rTj*4t0Z$PTF(FB4?
zB1-`rqY7L(f?`GHHhV3Vq5x}t9owL}stK;g?)SuB*IaP(W?}Z$i1+J%-VnK7+V<f}
z`u)O7tUjb(D(60^nRdlWyFuG4)G4nl08Q?au?bGP(M!fj;bPC%Yb{qgO^z>}8(eX*
zw15DLw2))k7nei}amA)8G!Qs6Y=&58ixxx>qJA1A{IwB^MGa1BJ0uCVHWF-nhrTc^
zAx`}uc}0@a65#7nmjpUfwVP1$fJ<E165xi^Gz7vod}J=<8}PQGmD;pg1g+JwD5|4*
zqnIYhlnfc{WA^OI<$n%tF5$HDgQGO|z7zyOg+FA6IBPBi;0;Oircy>CS=A*TD>DBC
zq19i$Vr<$h7GKLXOo5$^*4q=0(qyh!s6SAamYDM@V&PV60ua~8#7Yn**;7gT6~dyh
z2__F7-MtQjKs#G-7X*gWXK~fFO|H|$F{VVgLB>)`$2ireJ&09{ij|h|=JsQis{9P*
z)l37{s%dJ@WEet|>K|1Vf%*~??I3>mTL)RJ#4N$LjitJW8j7%!?WR0tLq!E|<)tUk
zH)OD=A)&RqT$KpXYzrunQy4e#hW%aZc#_ArS$p@@9A~Wwe=2t&!Y0bGJxsG&K3^>V
z0kf9NrmNz4l`niQ!Qw)kSyxu7Psf3ym1_MLlwDrq!K_-s#Zf&GR`-rpGN9eX`SGU|
znk-?%?@{Bo;_T*l2S<j!QpV^Q!+#*DB#KG%ug%3H>8ipK<yu;Lg;Hg4eY~cA!TF3}
zBUlBGp9+IKtDkMXeZ_*5pG8HTx!NRyigc!1qjF6nmyu#mV!j4zI4iTPWn)V$&lI}}
z)p--WI%LHd6jE>vVM)NHaQtIX9Zd25O3J|^dab|CL$YjhSMB`U#N5MmS{!EjALlwH
zqBmrr_!pHr@MOhuZ%td+b1%K@M+Dmmq{X*wipjkitU2wt^?nr>gllD1G!ofhHoZuq
zfg_*;et=^M*~~_JQ_&o6%oipI&MQH2>CP3jPk$$RQHDyU5eqEiYy#4S8hd9j!{0C$
znfmo@@|o3&QtHS@zKGAg1+GPWX9e)`>PPkRDT*{F&k15H%5PUSSEN~$f?R4uFr)|k
zpjwR#V`$T{Z-qOd2llXqJG=+ElvCc%6-tS?66rKkykUh7*&l7p-{ysFcW`IYgs-pv
z!~>(q-wVDVdXZX>J&Pxnr`%YTR_mGfyEc^S!<u`O2eFr^MCLwJ=PKeP*V6eD=qtEq
zvY%Oc%%cn{*kU@x{uV@&=I^?t<JiV2HRlz9Bqt8FElE4vlk8d~yLQk>dqEjBsF<r^
z_$2R#gdB1Exdb$ubn{8x2s=rO+l+&_+OCU`xAn4TTxv~YqIJX>N_D2W87QI^#>4_>
z_U5*nLQw4hT_Js+R0oFcC05|ZS7q<{FJ6=PI1##!zh91$UzVG(nBY$%$=oL?0$bVr
zRyclV#d*99v_`D439;N^1grKwhut&(Cx|AmSM+>0#qqhw=~ercJ6@~p=*`{=%aJ;x
z7Yb8L@~^i9cG_-{+ZpBR&F*D}_hX*B<;uf%2mH(%->c2~2E5a?%c|q=!eiST4vlog
z*s4z@nuX&|>)Ew~E^QaWI1vaJNON-()eZ#8{U!g>w?yC3=N`}|6K3P&fG2Buy2SuH
zd)dE1Mab0+G7&6J`)z!|JmbA|cSMkSHMYI*I}Nyl{cBivwcDrH?B$r2lnM7Ki<tdm
z1W6a3h@0bAm<Ysu&1h7p^m>>tpeg>Z4&ZNxN5YU=j8{8<Xz-=cRI0NtI+M6|)|~+;
zMUoBeJS}f~p;n}EW2dDtRql1Z2so`%f#bN~hV)B7D*)80^$@Cp_JhzZQMM>ca0bc^
zUjW!dhck!r4Dhi*MP+KlbdZ2XT=d3&ZiuzDRVvknNlJIU>$fe$8kI8hH>6<y_JCCp
zfrQoi17lh&qNCN@Et56~jM5=l4D;)GXhY=lrMB;bvv35K=F@_Cx(XR9UGjqbkE9@W
z;2Yg#6!psh>V~ew5JrIv9g6ouu&6YR2^1*xgu$yK!rxF^(hyEGJ9~q_b`b)pusak4
z2hwWz0uf!)=)(CRwB_ZQ1ku&>X3uHq-1y@4s9a&y9HRXjx``awu1>QD_K5dG$X*D^
zqWd5WHOXA2#D*Hyzna@quWNgPAVEI>4prSU@u$tsT3Gdk`Q2hw9NGbIR2Zmtz2r-o
zgN3-J*Mj5ktuxjvBQ2%l_&R!z?h6acDxry?OxTb()Bi{OBL8pkJO2L@zXpE)5x?kk
z>c&j{fVHiqp<f~__!Ph5w~C#etHHw9vDGp9^jrwraXCMF_)e%FlOIp?wZ%a0V2<Nl
zMg)1hha^V<e9BYc0eP+GFJ0=I@lnItHlI}y|J)A#ta`dkgPD_~FWJHZwmeDW9f)aA
zggJeQ$*=d_`1QV0`z`L~z~=Hj>J3#=D}plYF^6+ENLzQt4vzA!JlijF1+CZ*B}TT~
ztlPunO8~!vHc>_R?Wtez3-llGiv;|;>i!S-jW&9YiTxa&SO)vIn?w82;=>R$ggALZ
zOfL$u=UT6@J(g>OP5Jwa<)b@9wVd6QV_)K=OYv;oCv}>Z_?%#DN1CMru@?@f?KXyQ
zY*G)lLOJiRSpB8`>0#jA9z%q0?5CH5X>#68^Nw+$ih@hPf5ayR=8k&wCFUw#s6o@n
zXP03G-+VHlw-%zhPLyKS15EIN+3^g;IHyj6X3Xu{^ARsYhIfbNzBOhdn$N}tp_W@(
zkJ-`AAZR|mM87^WPwtg{?E<l}n}BM}CfEb<Ud>h|Skjn#J#_Z!4<?02c>voks}>GM
z9@Ny|avs2yD%hFuA(XV&x)Yqua4=lBpKQ<PeRs~Fwa~Qu`apsiY@9Hn&)hK?7(k<#
zSQ#o2W@Y|Et?aB8B!IPk^Ou^(Yxven4__^h*=Im&LN^KabRh_QnP3^TG*0niX`kqV
zqD}BRt1u^#b^|92Esn#DfqXx-*ZA`dgjd_*Cl{8OoB13^R++p?A>p*q0T_bDz`RX@
zoNrCY2if3x0|!v5lsJmS0-B^oQz0UkCxi|rkq+)ZL+KW862qz{fEMfO!K5V3L^MRQ
z@XMB%phRx0h#OKt^Ig4200|_dcQt{Ozl=9{O`0V?{I_=#NNf;fjd?`?&+*9kYo?Xp
zsnnF``J()Y+Yv}VZg@)9L8=X|f`B4he}_r@K1VT{hCNP92ZT7VmD8lafizw$J%Nyd
zOh?n0QDiqBp^{NKw+ATK!_y}{cRL)sF}13>_$FpK7w0#JtUR`@v;ya%6Csgep-9qo
zdUa;*6K8C1ZiOxO^;<xfKm0~59m}Vc#;ex*_?rZ!s8Av6pm6{Vx(r~?S)Sb~L_{{<
z3@*-{SxtPvz<<J)x4>cJnfq@NvoF1FKIZlOx$p#<e&)doczl-GaU#c~&h1Nyy)fSN
zPRMlV3>jhg>b>;8xY=sM>hThtirwe_V{%}1D-jU}Xp5E@r36WXq3I)Rd@4;lWuf_s
z%=JOW0AZHRiCj)5_!ixD3&gP~pTiE=6QiVNsQcMw#I+7@s{8o8^I%JLw&Zrxin!W2
z!+l+>51=$tW~lIspe;-s**<p8xnZG~TX8DtsIl|;%fOZm*p828i77eUnm}QEuNJX1
z?=j~fGBXKPRjTfK9-Jvy5gnxp>)QJ;Y{;jzx!GI2K-20q+iq65AF72<g#&C46)|Fn
zBb>v)Glt|Ai<>FzM#qgik2uQBl{BQUo|uy)Mb^XQD8XES<XL$&VMG~mN7G_}8SOI|
zgDe@^?YAi3TdVhOz0^6Omc%KLFmyl3MxQ|(X^uE15%tUi1Xt0DvqVKA>6`g{_jOCO
zBW5>2ypK@2<>L=u-=C}N9h6~l6*E^~y(aSVnx*Vw>k>Qp8TaB?6bc+(1EGX^D=p_P
zZ8AA{^MQ(CWtGrbXM=%yZu@C#qJ(zf`WMm`n>3672@Kx?GM1b@{er?kA9v_1w0UDB
zW!-Je)<R%3TERyTq~b<o{f>&&I%VQFoUk2BWXs0CEwYzn<o?R5P_`_A;e>N-QU<7G
ze~9p!>z@mV*ks2ZEsO`E%!{3vAy(FrS%cg_18zXl=n0*==Fb;L!C}IeeVG=UYlp8<
zt9>#?B|o6*=4?g+Z-t&P9Gb5~KDos)LPYpE?urix52Gc|K&d4;)Rve}!?+3n@FmHG
z()LXEJ^VPXF%q!#d?XxX5iJ+)Bvf$4;eIM=QY*LfMSa^JVZO9|VR5BUcwFw`)=p`c
zZHw`=u)I`wD{E`u*yod5y;bXII0|Vjco(ITF9E+IcB%GYHJ9Ye&i$uUn7Ih08MM(4
zW7J!CVt)3YRCB`2H)Bk8EtZ+@V-K`16WyaL<(J))+GZLsK+tL=oIj;To6Ca0k(pPZ
zspS^ShrS~-U@WqN6w#$%F!ccuANq>3YSJ@8#9g3LB2L*ww-*7m39bs39LEOQaGki>
z`aF(+2cH(JkV-WCuoPrh!+|KakPxUxMn<5|!crVg#Tifl5-;P`S)m6Cq_A?q4>YK8
z!^K5;?m|_n43lZ6TF60sdWk2Vm?9|h7Qwn2*iSung;`n5Lej7+D9?9mO6tslwek@=
zc2zAysL}@DS;Rq6v~o?u0rzvoBn4Bq-KgSV`gr00Qobvg$-BBX)wbSQ=^Am<m1Q)(
zWo&qIoz6f#7UjPxO>adIpqxmH#ov@v9YL##Y$pzsTr$)~<!wR4(oFt)@VKhxp5^A%
zTpA0fQ?@d`fkM^tt*G3^-k#$LB4cj^C`^Z?&ANhW%CoLk_7msZNfr_$@2U|xFzd=+
z+f~|DW&~NfEAI$M+y-%`o%P6v7km`GGTEh#sct=fcBPgx!_;0lVFoup%Irv|-74Pt
z_y6&AkKL6mYS*Y^+qP|2Y}>YN+jdg1QL$5T#b(8}ZJoK+-p_e)T01{rejRO&arfRY
zc>80dpB?=*8_{ta2ORd^sJJw>v0k%(0`4zBm+N-md&)k9Ev1V})!EMY?|-%5gYeJD
zGvxZd%_s!itSEFhDgd%Pd}2W)oW>nx%_z9yrAjAu3KRR&!h><N1*sNWf0f{~aENG8
zST##Y1pBF+j3XRjNJ*8{K&J<-$)Oyuv_M(TW?lS5TOh0$vr2onoyseawLfUf8^f6l
z<&tzB()D|+iCT#j(UMQzCL;tB{Kkhq{cSVDAw43SrW0Y<<sO!Pm%SI}Ztk}oIO_~2
z9KgQHni)$}YR570hHGNnWMkK*?Q}(UbLvk9e?}Ne0U?&RQTjthd2+tslc9Y62(GCm
zW!&*&%<-Ub0A>FG@_azMZ}BYyvR{{1Mc%7ZXTYSWsOB-=;?{>is^iz%_-ArA=MOuG
z|1xhHPXmm<4$0f@XzHeBlpZ8EKf82{GdNu7KMT4v@Exn>oqfE>AQT%sO@OF;%n0Gy
zeah1(YXesKpnK3FT_k}yK5>b^;%k;H^d))~K_8`oj;ewAb`am3ni<l;DHfr~q{eh5
zY_>a-9aty)r^^ZZmTL#zi{Q;S7^gB7`ZI04i#mI6ucTjsb=t?*r#iS<A95AfZZqWe
zt*hESDfn4N<jmfH=k#5Uf@MX$NR~9eNVhL@z3nLy1o)FAhNiPi7_lEV0oa#U@K;d0
z7-&x$b+L*YUntg&7<AC|lMeev(I4@a(p4I#m}apkZ^c|)-!)IiOwKa>SFv9nIW^-t
zw}Ca(dG5{N*BgEabo5X?Bf_UxK#*1rarWbp1=o|YTBpk7@R5tvb$kkr{*%)|6s{7L
z3b7Zrdl2INI;Y-fiWW2k_F&Qy*kYzeyH%B~4OQlZ{VrSym|Y%`C3;6Izb56IKw0ko
z>NI~+WkRYn(o&RKnQzbJ)F>Eyj2tjedx{kp#^(|o3qu|0sYUIzce~-qR664=lw(z<
zVB}z6Z(flzxI~#cDBF1d#oBbwYuol;|CxssCo`Zr6Q4orBEgr*Se`XPa!vpZgg&%c
z&ob29mO9E1K?RjL$QtXyFiLRw+Ls+!Bs(OGTI#gZU$yFM)%A95X;UVtzr+$2B1k*@
zE8x3#=n{9h;yglyAzGa2xe?#Y$*P>J7F}hdb?)Yd$phj}XH_nIJRbG4Dj5N9YgyAk
z{)dCF>3F9lG;(QbYl)P}4z5OjCE6MW+vv~n?Ru7i)&*Q^#I;;}>`R5=8o)|z$|XKv
zMaOc5fX&n=uqE9!$9BqpwwH`3KH~T>0-IP?7;uh5@Rg!hr(YI|x+t3S#Xx5n`rOa@
z#8^WdA)eZ@Qi}{=p2Xpo<F*X@(CitOpLL3@g+W}ucO8(~WR<k|%2==Ps5TrpRW}@&
z!{TSy@K}6sEs~9WKDp+VjP)5A=8Arb%YgN~|E@go%9A;Y=oYN)<WxFV2SH|)Wnjq?
zsRGs4y$e3BR$G-D@1%pXpr{Na-?BnbjQZ$ExnA7cXw-rqE7!E5r08QzeY!sO($tR5
z`?HT0+h&;~ZQM*oe}?iYC46=(5EXic^RYYK{rkrFap?@%Sk#jNiatys#5afwVn^X9
zWr7l^^k~pdLYIW|i=B6ID@JX&$|_QbO2<b@TIvcy$xp#;SWO#t)ZI?m#+2p4+H9FG
zSq9-9;?9+b+c;B`Qk4Yq;2{S3YWRkjF6@9d13@AgU&%CfRT6IPw@)Ol_p>>Uzh#T<
zu+(8<_}}J>x=r(sDJ^%(_s-%dInVjG{kakfOEj`jI5LD(*^Z>8hcKMDesLS>RdT6j
ztkc{^k;i%>EW%L6PuM+Hh4I`N9Gt)8#i=-b8*A3VGieBz@bH#>mGF=piDQ4~iv`!(
zY&SD}9hM8Yq+XT_>%qPwha2O13))4Z9N{GfwFVx7&2F8UVe3>|MY6sy+Q{(ntbNEJ
zYrvl|m>Wy)(}5Qks|lvB8Dm|L+w_d(B7tAoNkE>NG14I?K{?0K{U{Ssesr$viH-2V
zu)D3Syy8H(wy!0!-ap=xvEEHMU?>PQG3tu24U$LMLnGK^4>o&VC3`#KouOSMgQ*Tw
z{3B2{00c_qP~5<O1j^-L&p2;Qjbqm9$udy1bo}LTtpl4JZ8vVmSbvu*gFzr-znetZ
zNW6n-og`Mp;%Esa<~Z1ez}$Z1Ve%|zj~VlU^Jn{bdv%3k`sp8w&$7;8Y@n7Usx(~5
zn-$8655EsK2o4!S5Xm+?#gI&05AKRoBj5V={1-F`aB%J#x;VGLC91yB#cUJ6%r+Eh
z0@&=;DEK&%f3+o?7%v>Z7GO5<@8IzK<~;9+aJ(N9uybqzmlM?(S~CFd{~}Jrw~!%v
zFB0ww!9`5Unh3eU*k=s6{$r9A^1{NW56n(H?xPL5L--?z=`x4rFEWqd+Nh#~=2>SQ
zG8lIPzwe)Zk$4aK%!io%CL(DA@2+fYYAz#hQhg_P>uMtdF{2ZzagBR1KEh0kgvg!Z
zu0Kr)wkhW%*HsrcIZ`1eA$8}<=Mk5|)9E|HX^cFyO~StVLNqwLdZn8pB!|EnLANrw
zh=o8x2trM%=CjO3MD+j6E!c}CiHHggSRtJ>R-)dbV9_KYQ{$i`J>LL!7xV$U3oY;-
zKmin+mZ+83T^RblAe7_!68fQSV-cl4WV4Ggzi&YglVVdl@EMQ$^v6|&^0q3N${>QF
zf1M62pOT&uOH$1O)25Ib>Bqdt$}Twc{xKM(EUl^^?J7<k$+Y>ghd|xktBk{F16NkM
zdb49ye+8=q4_DkQHo)%F;TY*lqyjoY<D`?QirQzq^GLw}?n`(%{P8GGf*GqOP_?*H
z3S}Y<Cs&K4606_i$0KzTP|BnLWs%D(80`9dvH#SwRh`a*^G#_O`i5yE1DLBhA~33j
z==3gN(;pRj8J?#ixi}s%`h5?r#(q$3$4_6Bu*4itR97Hh63Vmo$fopjoEa;IJTnO=
z<%@@SA5rIoaepZrb-S$0j#H*t#45dA$~I!C`5&%R=7-kD@2br)V&0wgo<Mi2ELa2H
z*Q1ZoKc3hubdGS4WsQkeVHR25d)=Zt^C=s6msjUj=@umTu61!6oQ#ynBk<#mK>4WB
zV6rsnRJfy`dm@5x18Q;|4U@?6Xe!<0{)c%bdE=<kl>0wE&CA-m=fn<lk%o9cEpMKc
zQu{2%{^>QHDVHu4Sn_14Xz24r?};eF7{Ebb!-)<o`x^9qx_lmO_qAUStqygRe<;tx
zM|{y+IFIJ}$3k%Hig^uYc}LP7{mN+)2;?xKQzb=4%|YP2M#9JWy$@0u2g%9IwWI11
z%(UB0*&_|tk598_h_G`8hJ7=;vtnh%q2e2oo$dX1k&vWrQ1!ChlNmnl58|Y9mCpU|
zcB?@dfic0p7Cms2>d3<K16es}%IZQg3OnknRQ6Poa(`(<*w{0;*ut}*4@-LT4OjLi
zgd&i>0@Yc?5WM>rRavQ#;zBMV@}SyWc2*ub5^-!Pa$~2(p@S^@CEZe01}ATUN_Y03
z)TT745+p#AlhM#%OB;p+n4Brr|B|29aHUuXwDPgrsw{I0%M06^>kAe%y3O&60~ynp
zF^gWTG2A(-8>(=#g+GN$^NU|ed&F2)*SLP+%8Ky^+h~k7LL!2rA9>(CYiZ~fixqqG
z4S5O$^e4yr6E|ck2Sx3-!xN$diI)COf*HdPLMy=@4ZU7f;358*Po)c*!*suV(WHT)
zAh!=X5?9yy@WM5y;p4RG?+lj#EmAVOH>vi%E5=X?Y?XW@TW?`cpBK=@^j_qDc`(0N
zJ;WQJlj-v`@zMOb$=qBT-d{*RW^klH-`&&TdVKMp5^+79u|?et68P-XV!}j*Ibg7C
zy__&|Uw&|Q@3RAtBmk*UvADTriZpsYV~srlVm>jtH2mL6#EIKMgM~id2|9`ge(fk)
z9kl`=b7Bkxyi7+2L&GFXQ`_ObD9636*ZD#6I~^2n{NDL(|KIlT*o(|?;^!3o`-2@H
z+WT$P+hxkr7@*;7j*;-A%^^3bBeOIkpgR8@Vs9~PSWSSv!`B7HS#Od6&|(eV2|oVr
z^n9oC7im%*#XpHJ3|sYiT)&@?2xv6$a{z-QV#KR9@V@{f<%3P({Z;7t#~JmG#g5@H
z1ur}vOSvkdiy0bCV^Bhl8skp+uB5B>Bqn^xGP-30b+#I`j_(2%1*}rc9ftdZWgB!0
zcnC6-3USs|^o<z~jYiFp=MX;eTA8Yz;%ob|pfHdChyUP4$fvK$j1Opw*AG9;?52uF
zAd8!D6pvhfMF+(m%7%WX@T=A)Rh_D_W{$&ONR&T)H()+!c*kDAxKQ^?F%L&OL6*p2
zSQGzwh!Vf5g_IA%t|(?WL6~-+yLB67nQK7<R-M9_6cX23RQ~~<)5Fs4CoOjNy^tnV
zTCdSCUlfo1JUe2Fu)Oth@CvjY@}E|@S3B<?B}s5cHl>?2>1=T68g4cX1;Q}O?lkD8
zLBgK7MX<rk<ZO_FhC<v(qFna1<t&rl;8i-rk_%@Tw)zXwf?e2hHlS~+n}?X1e=+OM
zQ=ZSu8P^j8!~$ZA1VrZoXvlH@c6G9Y*MLZ%gq%sefyQ@cSzCuktZGg=1NzXV={T@1
z&lgQI8RgT&&MCK8R)(a=j!{rA3zR81ieR9iFQk&&PvUcs4&A(>qLwe`wP0S4wes^}
zBfZE8W_>jo2ecG&=;WdI23qoJTCU^Jk78Cp<#&;&auOh^_+O|)(4|u~QX>*6m(r+H
z0eUU1?DwIJc%rMC03_7F;(sBb+e)gk?7uP)%9Ya|BT?$E&jY{}JSpJv<~SsUg!MH3
zGk8=64as7Uf|V|_lc)y#?(14X`F9gH1k8M=1|*(tpNtXfaCqK|ghsObmU1sS?F;O5
zmE~AVh8D&T^5Yj(Do$D8t2LB|FXWJio-d&u-(wfQ%UaL->E00PMiDb`QPS3{=FP>?
zl`VqwAQAY%VL70|+SJkXZy7sK6qHe6g|#icH>Z)&cV;r^u>RIUVU2ZpciQhG=!f4x
z^Of*fUDSK=qHqP!uS?18A15n^%TF=aX9k-(U92I$rAI_&cM|x+2Yw<V!NNqO*ZruT
z`FNQ4a3j|OvJ^xKr9xUiHr6nPGErDr6pB;KVD9C(u>hy>hvaq*$jTp9QTtES>9M-)
zTtFWpF)`TC=xKNUHUAP;LsBVOUoc-R(8%d&2=L4(p^)PX#4Rxdy^+D3m_KK{l~!^(
zO8FMnyhB0JTyQ-FUHsP<RRj2<qiRg-a;!}vWpM!THzofa;lk?ktf};UdyUdRln)nB
zxQ>rA`KNIGLok_nWgEYGyVJ_Rv)T2-4CqcjPqXU*Y#8`O=j*QM$EWPT_P_<%mbpwF
zpeCMAxl`)(H1FlAe5c>S1bC}up`o7uSCc`P+XJ1xfa0K{m><m7*viEntit87RE1IF
z9kz2{Nc{L?t%3((3o2<Q^bif^HSnhC?TXkNx`{L^8fm4hB+ZGP%hjs}l+Z51o*3Km
z2|s1|98!TSJ~zL*{}fklDwBLcqB3(r0l$5YUOV2xk*U&Ar>;3|#eJ;SlC$8H^n>{!
z;#)N4-e3i>ZpneK#=vU>M!(+d$W8kMetkA;Sz2Y-4S0a}%g&M@wqE`Yc&Kwb=pyF0
z#h15{P=NT=Bk=rYHS997+{`tvBM#&k$?V8=5*L|m?lz?nrzft_f<G!v{ile{ZAae;
zTLiq~dLlC^2o}@-S+f>WCmr5)H(vA$+E(oS3?J>d-jr?f;T)RXxf^rwl)qKx@*3#z
zNk&D!SX_7Ak*Lcoi3zzwQau+_(=RB>_72zjv8SDoe#NKR^&XrXz5P+o^d9#k0B;|Z
z%;oXE+;yKkjsYOLW&Y61&n>Ou@5?Aj3yA@lTbZ-fZ{OsyfoJj6`$FMye*T(Ys1x!L
z-%c4$dlF{_HmBZbv|5xzXq|t~+z0$xp&WY#d(%r*c|Sm_UnjqU%W4w6a|l8)v@K!a
zlWQIsor8|Leu6d$zb@0><Jzw9?GRui$WZnI<}K>;dus!}khl_6QUK2tD$On@?onbO
zR8&&yZp%hEQ;UjTuYyG#4Mdq`+}vm2gK3r|#bhE3K%aUTWl!kToNvHQn=NjhPBB~D
z{&We`(`qjq04`Ws1X7`3Zi_NX(Qpp*B=UJ3Q`c5fXrSsQ&@uPFY9hrz_l$gGADmaL
z9%~}Le7MNh!BMPvE=~3J`BKd`&#|bsDCpM{282t5M6|p-Yt(xmC#*I#)eq$fRL^W`
zc|VZ~aM5cDB;}ux#`$16X2*ar@qq~h<p`in37-s+wfvVC)e^uOB^OJ``wQSj6^40Z
zanw9p50unn+A{v-MWLe&(d&T^0BsMCe_PbO<p{pN`KZtIBN9FXrO=n`mO*~<cpQP!
zovm!Vtu1skVH(OoCX5}xU+Er*zLMt5dF1bH;;yF6)}hTH$P;6g0kc^EL(W8BZ0|su
zeTr-EU3Bi{HPcwcnGeIZ<Q~}iIgaG@BB_^4WEg&)Z~}S>4}E)9#(h<Lmu0t4`;l1=
zHh%GEppiFYm=Cs#si+T_@%qM$ew1A=M3%3x_)AgSxUBCS%_ydqHGje^+lC{D2QIkH
z-Jaf({r1&AM<5>+7y+e(#a^CxufCzW?@r%KN<V`Kch8R~lh!lTEcM6=V>LjQG6^gS
zIJ`&B!4C4*qUX{$@^P$DDR5CKVZ2c|_@Ok$VJ9E2pn7D&cDZYXR<)3Tf!!%(V+R^K
z2W!8bXw+r)ZmnqC6r7|+(Wv`H9Il4hLN+{gQ`T>9EH@7NPT?^3cU(oQg?P@IMR#A6
zfr=Jxlc#X>^;$2VKUH?q2i;zWt*n#j6>Tk^nr41Y5CJwNPJj0TF5RZ5;ldYrSbf<`
zJnTIXitFdanMYlrHk+{6R}9|i(!(rj9%rl(|LTI)oC+5pX#X;S@#tPt_I595mX@!Z
z&tXqp<n>_`RPPP`Y~s1|wr|xSYrxb{g*AO51`R=dwXWAp3p++6b#X1H`n5QBZY4wR
zNU(wqHH}C1iBROv&;@~*0#MPFjDc1c+A39`dm2~j%KI9@45lifqBn>tHxPHDwIfwG
zs`dGVPM*%-USw{J+dF(FBn;M7jw|r0{jY4^uTWRrf;}i<iDMR?tjUv!r{U|oPGdN2
zgzhknj<Ly^^ec(5V>MV(_X{=nn)|}8o%TsE8P>&`gfZi@piq80*4zeTst}t+kliA5
zyg;cQ4|-Yu<BOgu`=I`*1^A*GgJ2-c-vD1!F-g#*kq{#tMM<OKzrJXG)&KaS=@;=s
zoML?Q|N5dd3ZGm5_@X8RCW0iBzdOz#`bBrrIsWU5?(GemXkvwxsVZ9kWJ;#q%(Kzt
zIBdqE%Z0J66+@xW<Bc23<bvW)CVu_}lL+rreqSfXLk8xhlYv+>5#hgoSGZk2X`v^3
z4>8Er%iJNeUau3d@>psr^g=vqUbz>^w)S@jx(0>v*95dS1*N4F9uF(J6+cSuj9;$4
z=_c<OP*EAoL+8$xB*yW#cJITur$U<pe*vbEJ21Cf{0cTO(fciQ(o=|_;S~Y6<#u}w
zbkMHg5N5*LW&UOH@+E9S0ZhYWA^N#Nj*h1n>b8*O^T9jxioRa2<&1$NWS|bXM_5+V
z(;}#KL9xTqaV8A1$qW@j)_c*{(lW@~Uvh~pLv^A!EjcL|`aY24kB{e}4%kKfP=<a$
z0DKog77)DC9VMO6EgD@N`?n1q2#xO#TV^&mnmHE;NP&+<08-$ZGRe()He~ThVpOB2
zmjfFeLkjsu+A0D*s<c^Fs`7K~C&a3oZrmV@5=_yMIkQF8^yGs7zv<IHl26=a66>a&
zW@}tUQrW2Lmblq^1`j%2c&8Wr3)hU&`Es(;)2k@^=;G-wQ3uxr3U<LG{YJN&h;2;i
z)sX&GsT=ERO>V|Z*8DZHiC_e3YH($`lNd$r8Z=h2c_bU5!2sg;2ppQo3wS+$1Sf1N
ztz;$>m)-x<7`>Rw4F9h&YECKmm!>3@;s?CsLdkg0NAc@#U}LKz0ZKd=em|o9Bvk!$
zb|Dk2*HH{z%dgkC4bv_HU3M}b-hnO(GT7mt>;`{8i?PWiKhP?D#3(gdV8jB0+CMNM
zkV9~iF^PbZZm4E`YJIf><z1fJxqnY#=g1a73$-+|d%*J){<JB*>yNCxq54SWuP}w*
z+lO9Y?N-H>0mX8L!rh~l0o@#R#*Is;v|D(B(<*JJ=7W2JbKIUbl?oQ;WJN>aVj?t^
zao90kAX~Wuq<9>;=ouv$$2u`EksWwXF!0*!b5Ebf-VTMIPR)4N^qOaZeJ6Lq`_v*2
zpiL+G<yv5)Txc<n%3@O#aBh>-xE<LgFMP20=q2cAphnVaWZ4Ze3)In1Zi4u><l$f#
zx_ZIH3EI)1B&5M7O+-3{Xv={tz>*sdG6mxs7Z<WXz<$Z?D~TSY;gRoFjV2kY{or3%
zZDW@7iF{TLy~A}fkiDcz8;?9doZi_rLomAoPwZ~j44!Pp@*$JtcX}D9BMyV{`bi(R
zdnn=Zwfp@mHU`w~FqLLnGwPr+lEyF`Rmr<<Tj3WCyJ=E#tB+wsvs=^((9;ugt9nc+
z3EeVOgbx9fId#=3PGo#(O+^?2b*B-yqOU4bxweaPdjq`i#sO5>iL_~TC?nVjvmmA8
z2rXFyYYjGF;!Sa`s9KQ;69<=Z@h%YykA|5)iNgeF+MA0dD4cwa2@drv9(rNRIKI|$
zuN|-d<U%?zsh<*_n<A1>%xQ!i;%+!)UrwO+re0~@jxpaQL{vnu+-cOTAY}RncC&29
ziy1odT12xbiZq1OzEWpi0k6mxZAj5e$h;A3>oOBD+Nmr<2E@jmR8@S26vBjdW$-in
z#I`I43^mM2Z93D^b|B}Gr&?*?9u77|H?Fzk4SVz6A+wsuh0SUEk=^7ayRM-mYN8c=
zuv801c^T=D1!`1Ik>_t@e^w`K;6SE`q-_E9@utBo8WWm+a<5so0-PV{ZVtSD#2$Al
zizo!jfmEq7P4uZYk%eeGz)SAMvL`{m<GbHva_>V;>GbN~zMAto9zeHwAXE}kfb3PM
zs&V}m@GOAjaSw>WE!Q%D`OG>K1y)vg`j(xbWC~^DKVVD~9NCw_CfMEGyEKP%=H3`7
ztND5mkMFeFMzF^hJ$Xg+OftRq&@o(q#o)FaNYj)lAU}G_s8tw^?a4p*g*h79uGm)b
zc9V>7*X<zmuqVd}8cX0=^e6YXB$^i%M>~<Z#@TqEnW~97thbo)c(av>hwj*ICYjW0
z-8JJ?D;+-ilcsUg8_KGW<U5xRwJ>@Rg;S35It7EHf_{sF2;u}Fc{1I1uv8FgqQZ}F
z1ZR={Y6Pp))&;w^*w$zJlYkx652~t{@*WrfRgy}O^u_*PX|%%T1TU~WD0ho20?Sdh
zIb81n2YL^8X+wY4Vp9z()C7ioo|%Sgxi3C?LjzRe0S0?xiO}x`Fmyy;)uoRz)hiRc
zpz4nFR){=_dg~uRXzF=+eI8}a5)L86RPq!qP*4l(gV~h)N6J!8y6oQMFql1@8V`9F
zuA>u34o4c??-HK9`b6>rN9Dd#n9Xa(HYy4&C-5cA8Ru#t)m+0$vnUb-gn(hV*Cnqf
zZm&5QvImu%XZG72X2DV<jZH<vxTgXx!co^sFlS%x(<yU*s{5y#OINw^Wwq&@7;PIW
zm<{sj?~)NIOqupq;U0)_<t_}ve7M%rzcN2F*XZ?rWHN?u{K}rzieNHQ$bikF%W{G%
zl3PquTQBKU<x?`7D>(sDCL4(1Rl0O8YI3h7ur2XTSI?lS8e6d1gK^JhF`LX(s#Kfc
ze#oCnKBm%;LM6z_kLj86o8t#e89qGk2c(S^s2FDA$eqVpzL*cXK-Z`4l>w~Ls>>DZ
zje;Bk%69wzTBB3{TBBaXOLv)UtYj%}Lh8yvYgGSOqx{rYEzq)qVcYqVeFF+)`(u{M
zDSe&9Xpv;`>upvan3@==^S0K<65ueS)_M`r5iwz#3Un_g%z-!Tys%r@L!WQ@sWMAA
zLLo=ahLWGlS6fHVzwcAbJ2bg|63wIgHDGOqlJs+o*gnN>{s|s92)xo@7iWG9{E-9P
z;^5tNvEq9^gjp|LaN>!LN^8BTL{u$?dRd_htYo~tlP~p3+$q}Wp5B@SMAbG;7jS>N
zcN_8aQR4S4ONU{javO(e?Yq<xpExmE@D%A#+#QrtnwCvWCbQ13c(e1(0;RgXZ$gJj
z<$Pw$2HYfcl`|_K5zG{>6&@URBLM!Kzrz7T?g*4FcuY6xfl;a0t3DPY2R^QPYgf;;
zOJj25(e-9IkD^KYYTU|{xBnSDc6h5;mQ2|sou(`Ln-tr)Z!>FU&r5A0Yc0t70;wdB
zB%mN;PQ0gzR-!v0Ne4pFiqGJ^1}BPD#!5;jn;y4K5;v`}x+$JB1#*Pb{?)RSP5j|;
zNH}Y4FYpS+1saMmAhFv4aFTxFuAjU^{%ca`3Ktk+r5u-B3l%_sOnD2qmrAnh3T)^N
zC9HO2!)_{_{b31JQ)qU%!eA2hhJ<%Yi@*(<S8|VKMiypY;!t0*W%7(f?V#)fu{RAM
zPl<Uc$&_!}=l=I(T4AC{=D0+S!!YeelXTzAwIhrIW1I<?4KcB!S(okuvIDtyww^jX
z)Ji%@;e~OZT|lE395pV9N4reN^Y(M{K6!i=)p0-S{(M^BDj7OCjBqYS7GI=i!W;1*
z-nol}=0B3qEE0`gwe25ATi60AJ_a&3yM-P}oPs|7q5T-dj!Z6uM1%#RKmUJ;qs&5B
zO<$ki7;O$%q!O7r@nxP{@!TavIw;H0E!6Pvh=>zj^D7XLf%E;v2KHyCWR4ZmN{5bj
zV_?6q`h&n!yvm!>QRCfED1MeBNIR)aG#Zl)Fm{UaZHcJsDX<`4Udc6EZ}#6u-Kn?S
z8?T2LCS6f0co!IjRyYvo^jp>2709H;)gn7|UwDS8BOF&sJoEB%f9A{%wAY7O$#$y3
z-w?aB5ZH^mupm@)NuB86eB}@s-@cX0JooB#`7QFlHu}FhrHbRP3o*g&I#}&VB-f+-
z`f~8^eV9W0Zt&aQ`$#L5$UiOWv(8>uUUg^&IW!N2&<>;eBx47^!iKH!NtblOFgJh-
z#*@;e`H7OERh-0c^^qO6>~SVR802>sCNYsxF8i5e9`7FQI@t3X_#ypSOs&}C5>-(t
z;jdyN;NA0ZfZ%$*%l^Is$~nozuq75P;&~*+z8(X-OqkqlCfa5)JOU5fre?TMBk@qE
z%)x5RM*p|2-W`LtUyo*IN~nvIE$nq4C4N=hy|X@C`HC{2E`YC_`Z=Ytq8&wfPeX;=
zs!V0xLysM~QaQ+@OYRHP*J<#(9QH49b>dn*Z-Y=CLWCUiUV>ha9+}rkr1YP(%ff2>
z-AY)2<J2laip~Fpj(QCJgO2JIG)sVnLU1_>%^kEMt!9d<0Q`EHF3M5s76Ad7!oLce
z2A#zRxl5PGI4ZXpujjtr2IkYSUO)W1;5wWwo0B?VW$F}+DVh|CFbz7NZPKC`&L#iS
zEDMu{<xSMafgHm4eN1%^MRT|8xJ~hbeR-#I@DU`)bIFKMyO)n1HnpVAG3y>)vxlMi
z07ZLlr4G{4;ly+8(OGDN_?2&-J%F-<&PjSqvL&d4LDJ85AYx7GVL_@+1)h{5`;X`e
z?P?;e{bWNmJ)KasDRst1%UqQvXv%zztxY81ofpogGG{)Q1D~V|4eS|;4s#enmz#Gf
zT=$$F=P%vzj$z4J1XbOw7+7cnO!8fz8t^OxQ4IwWMQ{>AbrA3b%@koWgE<wHGybjP
zHwn^_)Fu~RaJ|4D!)^H3OnckDqN>y`D;<Pp;|=nA<~cgS%R5N@*&1d(s{i(y^m(RY
zB7-FSHBA40c-)*(mTvjsvv)ki*&hd<DW#&~t&74fLpKbKYqCe;dO)8_y1D;r692U-
z{>i$J`g|#<J>#=Jts4>1xhtKq9p5C_MdD}UZ_Uh=DZJR6{8nU>yJ!{RC0l2DZWc-u
z<7jrJ!4aoQxWDVFfko>lzYbnBy+BI*hMciT7URzVL||_Z<d_(P|43Pi&cGTCTSl7p
zid;bR!yANor;*Z>u7Rijxc}ql-m8Rb09yyKV=-oLQ#mm^Sb!;GvY?6UH@HDXUq~t`
zP@k1Jnp}GPxC-&S<@(O=b^q`3=ZK9sMA@J;6)tkHBUsQAlNU-77bShHQL2px?uT8R
z;Rk9p=74d^uO53@C=`tB(<-3X>1nI-jv(%vo8uw?uFXgBKo^0iMe;*D$ykagw-g3@
zAXIIiMC*trHi$R}H#B8x%U78vVMM1RU$zUH?`|v;#zmDUK{!~s5JBl!31LCCVt>g_
zkIP(TPvhIZ$z}N!2}I}u4x{H)?Oq@?ml93l5MmG$vpOB;)C#Ul;XvbkaB<+{@$K^X
z(S>!`Gz>6Z_yW_&=T+r@R0jhOIu<2zcB7@0-!nis_@d!nH35RwM(c6C@WAq)t=14c
zGShtet;`n2ywb5OJV;UZ1;C^%LKNm<-ixBdvMa5GD@|i7<7HYwk^g>;c&bZPVyv<Z
zdh%p9ZV)v1)QzMnkbz)xD0c9~J}$B4MZPCNzOMEBtMOV=w~Z0@xOIVW9z&VgBJiF6
zU%mMMz@xBP)YAXLqo$6oV<`W?qdWk3)W};`zN%uL13wA?kFu&Pn>;Tw{l+7-Ct&jp
zPBv95m`{!8O9DAvk=XBFmK;23HEQGhGn-olZqIF*3Dz1@WK@W&zfuf`^Y;Nt#Bft0
ziSHKTA*O?G?WqCiGX=pU@+k2EHC|i3?zfV&#Cqh4JwQ+B1i!v&g&KpzVYUOxWHfqj
zh%E@O07;uPl21X#VXN6gTw|U`q0)Sz=(7HrGNmoGc5A_P8YWqoy`hf+K@>4LI%FO1
z`q8gmII3$&tM^Ms)IoPlCxY%6lRXmjG^YOK0j<x0D%c6L%)ObO0(I$3a>pqeT4k__
zBE(J}nL^PTw}=y}PFyEKzxbHb!gqbV>(%y4Q4bXerW^Gq%$cF&9k9Gmg*q_h{XCV6
zt@|&#YEkMOYC~}riHp*^CD8OMk_lP@!857i^$xzyVVStUpbF=%s{@jbfwV{B`n$J9
zDpZ$liEo*PbgJN&5ySB1Z4w@%w<>#*XbX$hfg6VNB$U{<SNOt9ileoMxA=|08Lr^r
z+!d&HJkGE*HIYFrka!Eh7s1+P@fa-c3xgCxGz9a4#_*+uN+?Bgm1wK=e#<`VfZ1i6
z^VWx+OWw9pfsIzT=ZI_jNRkk}@G|r+eHhu*TnG*^PKh%7LV17H&ZqVm>Y=YVXJ6Iv
z3=5V!XiwhTid)$5(Hl9Y%*y%GV)3dTjYTCU!df#)fb=5|#~*Q9w#Rt?GOQH-V^p%a
z!VMGlKH3Yw^PK_Y(WW4)gp~c|M$clK#*{%M_}HaP5Ad?%f8^1!Kk}1(0C`k;gQGzC
zSi#rwX(T`M0KNx4+SEf(#zo+ka@ctyDDuDZ=>8l)9<BX<<xvbt+=KQW6#NyD`ED-q
zz0e#1Wlu>|7T|0t^^p)*(rTBT4p?Q_vCk#LcJr+p<g4hy(>98kh8c+E0ws>K$CP%>
zp-7k@f<*jyA7PQ2nAKDf)#rM;M_$_>e~p7%ElZjoe%fbw!>(0GJps{8%kJGhJ#4<E
zMa1Vl^Qbj3VUI6aJgOM1920!A2wYa>X>nXu&!pG!FHP9AKLSfSGR_YlKN`$T>=
z_Ivo=%r4AOU;pU?(4+c8_yqr<M+*S-C^8Xi>7Cl%DvXiozx1ewVxrzc9~+V(SbkTd
zZCZdYk0Rjwo#`p)Jm4@8A7Vg~_d|l}l}@PFX1%ZwKgYjoDid|!GO&bD!(AE)NK2!Z
z6rG<aJ3HE4wb#$t>ZsxKep_!Jx+}hNH_Y$$V{b#G^Ff5(*om5>%Q3DjwLhf&@$~M%
z&c}Ky&kNyAG&?c~OX^h#-MHt4@O91JVpoM-M{$I`hg!Y@a#dmos2{(_7Os!?2}B)0
zk4AO>LyykmhDEG#WS&Rt|3)d&=3TgTHeEm#&JWulWVByVuwSCQW>nB`v#<b6y(&3l
zwu)%KLUV)PM8>t<^x>Az$7Ap};oGa>;t<P`F3Ig>sP%<cMwufG2=HI}Jjt9<#88zB
zrW({z@Wa3D2kfboMwOiOvBl-B%0>uJp`R^ir1y4`SNMNa<XFa|J2BA5&j35bo5njC
zo1+d*P{MnZZYX^YBVgwF&IyXdwFTCIHzHM$jbc7RjpNZAx`?uYYA?eD<AvWq?<4q|
zdq%z{6K@hTsxM+Og`F|w4KR$%lKzi9>eG+!BsSGZlHMCbgU^`_=b{bXYjt}Hgg$_)
zZJ~TMoY>R`!9mn)ohlRgM@kAG#qk9Q0B8M`*nf>yuNoe<O#JyK%^)r&<wry*I5(dq
zjBNHH{%+YoY+j5vi&T!5h+{%VrE5aGht?GFTEz=%;Bv>^lMPI^60}o|ct|Lx!en})
z*c>7j!&nthifjyh903s?eI$DHQH+;6D`1gxm!Kq%QuBplkN1IUoj3H-1uSAr0~LVW
zbw3JABFVaW0~Q)c{Ht1=2dSCoK^A_J!B6BkdAGVSi#7R&cW(Kv*`$P!W;A@mOqFz?
zDrM(S>7rCD8yd;F<YG3WUDlY=2H!FyV)%K%>Sm86)gn^!$XnDrY-QXj)hAA-P{HQk
z)~F*5?J#*W`{2HdrU*$d#jh}@;6I6qh8QfI^<5S2WwECa1`;y=!AFA@sMnhR2OlM}
zt_WOi(CwqTOXV2d#oCoMSI0Ia5kdkd7Lb+PJxF27`VT&e59J^h9F71^FJC}5sY)w5
zjO3Yf$D9=B04Jh7g(;>Dz(@b$fZZSa3nw#$n}E^62+6%Jtly|x@^dj)V71hJ6Fz)X
z^M+WExos5Z-@&%ZV8aEGbRGfXY|N<fuCDcpZo-;4aZI^~>(Xp2UoFb<Y2*E!%ncOf
zym4mZX4RQ7j(we;Ejl8s2#bOA>r8L*vVf1dp9P8`N166ZIqD(jEeZT6t{}QnilAg4
ziZlB~x-=4d6oXbiu*AY)e>rIInF38HmT<8&qg_y@f&-^DAZJIcQ}U?@99a0E5&mGl
zWu7O)4Kn?JpI6R^A8>}TV(hOb6||6GL+qmd0-S;YuMqij#lZx>Ak;!EU{R?CA61BE
zu^1y*>EdkcyFf;|CH}NXa|!5+TVGW5H0b?)o`WFWuWD(RS1XiI2zlN_*9*&p?#!Q&
zn6(=qpWC&amqK@DE5SO(RLmTjzOkPcLRYP>0EnV=NV_me^F8n&rTU>NXVrp-@vghF
z)Xxj%SEAIH>*9~1#CWQ~f>oxhr9Ch;J}?17Nq>9y5FUT(e9&r}e)HJ;anP%0-<7w-
za*wt=z_TH(a)a;T$ow!6;T{LmBl|}mMJT}Oo;-<mgb3+`V3Amn)SOhRj8z}WBwWfI
ze+PHjm-n?J6I&pwp_~4Rh}-{@AV`mofBE(Ra4IAFNWI33;NSh@zIEOW4NsayQy-}s
zDo>9q;-f8iFvDcnj|IIwp8B}6-985e-^7LnH<p+x{8KiAD@rAn8(Ec8khIn+6mvP+
zg0Q3CS~@(9byP&0ESJ)dge3L{O<WZzP8sG@o>r$m_)|p^uTf?4wt_I&xeki{tAX80
z&)#z?2rD4DGy-cHhk#YHkLMYw>(<|{93FBlGwdIL0#m!K-#`25kFXqg?cY!T@<mV!
zjkBN<@{=ePj4q0^7qeeMM4!hY!iq@Jj)Y-Jz~R+#89;&~79irENr1jJVvy31@FIb!
zH8`u8;usz5BW-?=9e^XV#yMh?^0n=XuOfJ=@8nLf5>ztn$wr|{h3&}h+_tBr3KR?T
zkWWd|MEV3-Tqf9PBl@1Y0aq5*v^GqxNR~}OzsYfv;>7Xfx_{o702)=9zE$<kvgxd8
zlbTC|C$k3u+?@O^1DQU4_eqN}il)B}W{^{8hay9HvA-Zn_v|;MfuSLx;%3G~1`p&c
znJGT3DRMfQqul1MKNP}_Bx6{set8oaYwvQVB5V-H2gSS)N*7YYbX+SA?0wmP6-dF%
z{@dZ?Birl!Bav6iTwK*4Ow5{IXdq}L*n1V&d!0t(vr6wxOLzZa$v<Ig&9qP=ySy&X
zH$t`6U@|dH{{kM<bNv9|M=$07;YT}k4|~UYVFeZfoL}dZ7MEHG`J_10gVr>c;%+^=
ze$!$rJW-(BZqf9>VY;l&uW0sQkdn_zt|T|d6}e#%&VZjy1f$$~Iw&B1h8-=&Hc?`%
zZ>;~O?pQVMb>Gc%nAD2kz!-rAcX-3G{rq1JX=5E`a4U=Ab$={*W3k^)m(T7Wt8;Qu
zbGi=Kk?d;%)@I(n_kZ}U$`x~0m)AMCs&KAFm7D#?A>IC`mpHr+jj#f=mcp1$S3U2B
zH1m+e_oi3Gn<s03bs9hNz9seGz8~UvVW+;Uhh$si2RU3^fw_<*f~Dz<D*XUwdxm{-
zC6asl)y}HoGa~V2Y-R&$={S?_rz0S^iTgVu(SBuM669@#@`iR@GJ|xuT&-67Z-3lX
zwCmR3II3&;2dA(UpDZINfx$PaW;~MG(tm4+GfoB|+_?n_pwT~ngH3Ky@I?|Bx!oc{
zq+Q4qF}ykzxTS=tiD}iRfF1>?G!k|-*1#aYi}KutkzrrU2e*zC)pg>{3vNSKv$T7V
zM2Btu5eBw9h!W)bmB40@;TCFN(sR|*b7jkd9YZd`Ny$MeAs)6V&anvdc52^0jq!lh
zdPsrt7vo#*Nl;cN+b}QtBxsJ*NxFcMbo%vLJ`W%Qo?q!*+XV=GZe1ZRTwiU$Yz+PW
z)lhSwYnfv0I-g2~B)Dk?&Hp7@l$9%QE3OO@PsRx;I+#Y2T&FZ#Ys3KL+Da#Gl6Kc7
zo%dEnN{Uo;>p!?*4lWr2O{{<4FA^?17%#jWj&y|e4fCXAeE6c6+KO+s!)waMBo1sH
z0w+0;gQA>?U*X&#q#Ga!nO!RAg8fQ>xj>SfM#nghH!2XLLh?;Bo%m5rkR{AaEZXQ^
zFRHJIj-tY)l-j*KN-A0$*-nsv*-u81QE679c2tWwFUD5^(tdmpW0k%2gPb_jG{iMo
z;0zVjdA+`z68Z8}Jom@XP&|%ljy0{dGAV_FWJ}_f<U|M<sqbVmwd;1J!`HSz;K)U}
zp|wfnoJy$#QL8cuN_>I|U5YMKVCaplF{`%RI0mzyeDlN^Gsp3X7pm4FhJ9-EYSzf5
z2rlWtret)7O$2?CJ(=`zBFdYZ?!ItkRJ9Rl=G{c!wi}kR(R-@tL5Yh4N~XJyGtQ1W
z?HqIWqN&qWT%<B^7u1^LgItM$d<{H>%M9SdhCLId>dxkt2+HH9$*oeJWW3&T){flZ
zCi?3PUrQ}?ynCfWgD-fl@2q`1CLlm6$7@aa*l2=pOVm!U;G`gUX07loJFr;P;%qX$
zvrLi10y|er%j6(0D)j18AgdLLbRoWDRM8S*Xx?MNsQ3*y<{c!+A6aH^1cNEtbI6B0
z+eX@1v6?q=V`83f^Y%oTmn8j~)P;_8d}OvszMJOGyNlMTrd*HN#JGn}9~oI8r%{nW
z-Acj(pqjwQPJ(?4RK!Ahl4Rt3^-Kq$jED*GB{yaRiRhKSE+t}iY<OOCi?CA059~pW
zB|j<YaSBuo_Z3s3({N|BPa_%O`{DA_D#t`a%ZYcEzyp<WZ%DQaE`zhO9mh}A8_~lH
z-<dEz_TUvL+qLNp8K4r{yj0KzVSf01&WRQ=GbE_YcCYS(Rd&O0DgJ!`g6<$giz4$?
z@00>Sx1XJIE_1oL8g$}+Dy@=VCgDB3^ca@I&dT!u0*v9zc6hHv-RuM5(@+y^Qwku@
z#tViXoSc*WYnzYU8`sk?-T%w!^l|KMX<a@t<vgEZ(%)!?PbG$->#NHD>11yM^HGQM
z^54t%#^3p|a^mo%_>0rJi+78@r0kb6#_0effi4P6*x$KsrWO8QF7HF&8zX>qxQy$r
zz!~4`_Qt6On3_%l$5|XxI4;}ye&K>(URKZ1$n}TFVb(UnA8iJ@jWYpqA%)KI-3Y0G
z=Jem7HvLW`#{LcnDDII|Qw_Ml9HJj&5D%yUXXyHhS120lOd6V#YKTQxZtC0adNL$e
zZ!ztdo~3*;UHD6mN}!>k#_|oSU{uI6g;Dq7yGIXm;7<FEw~LGd>9FU2i`=c-mn_AJ
zcbS7)(B@F9(_mh$>5E7kO=&dTxeupVhd0VxTj1Pe`KW{rwy#IQ6sq78F6gG^MEip8
z4h#y@RW@{hv#hg2fz8HtjO*>J)#n_Xp@P+iFIbS^8#{Gt=G((r5f41kfi^PQTp>%6
z?$d4&VE<|zcOb!P)HC%<%!L8^yI0(}aVXMUl+bd3k1`eu_W6DdzG~bX8)<d9SN*_k
z1|s;PU&rmRr#m~qAo|eF@>6@yhEal(6J%rtmRsl`BbE#p&#yH`1g!O(<QkdOX>-PD
z+ti)=;k8VfdLmcRT;J?JUA;Z6DP>lLwyuOaBN*6{Wu|&wu#e^^n=l>2?uh1H<x+79
zb9tX$TveN_x*tonyR0cZ*G^Vd09G(#K3^Nx_wrk*#u|Ro6dF?=?$)s>Mzb_DLJX(<
zDrHY|IiX4(a%NscImVIyQ5FHb&!0k)k9Z{O2_!V~u=~NdUbZGgG$I0LH}cq&<EDy5
zR)v83!rqp%<l&%_01=tD)a<F7+G>j1g|=Bf7lm&Q%2$=XMDS*i(|ioWKrlreNCMWG
z!%z4`u<SFoNupOaf>DasCHQn0h(&vSmJ>)%_|gz*PGLNUsWE$;o*5Z-9mn!zAzRn;
zbD@?{puU9<F*#7c1OJ-u3wQhHTm5?8lICgI@n78#oxJMRn{duWy`-MGjlrKNsJQBG
zRN6NX?Md||Bcl5tg9+a_WYxf{V|D68$0D(~$n3vRUg61UZ31m^56p<iUL3->B2e_g
z%ID>x>$HWcK95J9Tdfm!GKJX5hi<R4ju5Joj@Z1NM=9*5dkdU5@vftu3a6*K0V^SD
z+nqzqD+^Y7-*OjW3xLdnrtzxu7-lPwDXWUs*1oSp2G1?QmwR6uB^q#^eq8qoH1c96
z(R*Xs{v8N1Wh6X<VDtZt1=ERG6E6oqEmI(ia5a-BkI5Ng<VZd}n|x45!d|(=G#t~x
zv_YQIzdzZnw>Z3C>7CZ_wq9)hwqwjhy02-~%KB6<wG+&}tWdN?8=gPhOQ<G_XBpow
zw@_jHRXtc{%9njIXLaYIPQ2#}STcE1V|gt~BfVC8m-GMh`5Asch%Vh)oQKE0KBWO{
zhxAbDUZ9M#9ROrJT_|imP{kDl3Z}v~C)`3^ZLk)U4y^1`iXJ)_ESy|e^_XNbdZ{wB
zJ-tGUIk2}<U4YKl)|<lnfw_qYQwzs&*;&a^EJ2MCidPzp!EWm2a3+<!x&8Abo`T)=
z9xxA1d-@WeySwbkyLMGpik_TdHhwq&c8_mP9QDKcK)oE>O<FA*m!{|f<L-S|){O24
z!tPWk9a4{qC=CJM;K#-diX(~B@9T{Znge!d-|LSuy(NZcS<WJN<PNj0RdDGm6)b=G
z<2#czl$_tz!a9Ky8=?hX+bu(8K96(4)G9I$<6<oq9J+I&WOFt+5y*F0@t7U^D0i7o
zpT)UKUbZVW2s3OIH^HX!zW$Ku{P1*s;9cn?`o|C6e?mTLm`E<z?dg2$a-H0`JkWXI
z@NYz_p*O@*P&=**dJqYEa)KjLgD=oLgv4L1@06yl6*W=oxkn~;iib&bgGErZaA_JG
zL5e@J7O_cEEf~vK^b9f80vtzr+{|BpDqtoEG#^u5zFQbKWS-mMtP#?U#!PSb(iCu<
z!#zU(4=zV%9(VBe_Libv%x|=-4GPy<v^=Qkq|!{dp#%u<!oP=={%?XC;Wa9Re)GR@
z8#G0ujSS1*=XZ{}^gXeom(8>8(N6bN$M0uD|2EweN&n?MM33u@CdO`)_1foWzr?NQ
z{)qJ_|9YR1-|bjOR`{+_kDHEcyzbZD0{AdM{*Sx-`9+&jS%P<z{wlQ&RyQ)~_at}k
zs-~C_?dJ#l$;sLU&+=f*oT`vp?!v&n`Q)s2FIkEBJ@9wJR3L?hn#|78Py*5WP$$16
zFo)tJ(R3OGnk{;ZvrU*Pi#$uZSY@(Z?n+IK$d?=}@`8{|?+!7aDqWyVez62uio5ib
z><dxts)7ycLNT);xA;;oPbumMhiZJ^3VklXx`qyl&GP&C7(KC_$9}$QL0o3MHvK1<
zhgYrK%;b|7b|~_gA5i=4D|`ZST_veN(l3>S&#UC%9<qUv=H}q4t16A7hq<ohjS3Ll
z9_z09!CY8n=n030jzADxOrSTc&g@k@sjcWsO0K27jD}SPI7{g~ZfJWu7LuI>sLc*y
z_1<jt<gLB~sT7Mua&TJDBfFWtV~qT7V;u!<&{sMY1KhY{DL@6;T0n9*!ZLlP=!m&Y
zf1?8aDWJHyI}Ch3k?$6Ox2rxzzaB?s1`ClVRAdf^L33XS>N(JYqj|}CrXu$cQ8Yp`
zI}`adlB1A@bln@9a;Y4TcH}(!?-j{w4KA&MAZx(ws^1lRWiMKESr=LpD+pf`4r|ij
zKLH^S?SP%U3?cM=HpdMehh{4xT)Gt}rrchWuL3d#2oZM7`+b9iY#oc1WNE;F$$2H3
zr9x$&QF3r@Y>3<DKf^&tTr3RGk50jC-1M%N<1Yu2DbgK3fH-Dff2D_IuGvdrSSl@W
zds2ZyFF%n01l1C#rHoLEW;@<7QtgR_v6mI3{1a3bMxsi{U940*cV4!OQ25}{za}Ox
z*`TzrGb~aUE9UlH@O8x1%G7FA@}d?<E!<YEM4YCnq4)hh`2Ld%Co{~=<%VWeNEXLL
z`T_gsv#)`v`-=RKV?;0Uuk7;4=t(Pl&1P_Uu{`D#B{{?Msywb?5JIN?KJ5Uo;L2{X
zx+KXcbIsI4T_SPdIcS;ttv_&Ek2<%Oc0Qr7@1ibsc%s&K1ebKX($P_ld1ZqE2L`K8
zP1=k$E<=lcU+88Fo93|kf=<|2TB<UW(}Ei=`K(<NbzI?S2<+5vXv|M$z!>t);LI0y
zRERufMCVWdcxv6S`gI<}|1~yF!zNLWJVPM0?uZ!rX*86QM=0sj^ADm6dN?H3?ar3p
zUemYuI#e8%yHF?424}u2*p-+S1b;A?ZGaHj%yhi(+<5&-xNd&YU9zuCGI&vvZy8P5
zdWtspiX!dZmUUORh|Pxn*lZr+^x{{ddWD8K(C)8PCKY~Q9Zb$syg9<;jP;yLTg_H)
z>RQcVK|*E!#UUV=J+Cq{mLh6vEG8z33&#aGiAem*ZiGGcC7j0kfkWGaPK2MuLH)<n
z_9(#xAG?N$nF0LhOQD|s)-?yH<lraK>dze%rL@@QPx?mv3<+Owso9J8``Z6weM20u
z??0?|d5KqjA4Vh1xBnSXE=&hI8W-K$E44qYDEAbapM}a)YuM{|M5k6msV2Cq##ymT
zyh(|{$Wx&m_?CpH^91}`rlXmUkEfCrRbsJhHul+4;IEN7;Bx(q(m0s@N~rU<)%+r2
zUiLi7r!OvDtCB2YktWjzxvhr-uU=9;H<Z$-j_Zc4Z}}g68_|Hi!m(YuI@ZF*(Sa+`
zN#ugqw{lGT1gXbr;PF{e;n#1@@YaMPXkXCjoklj$*jN99KWBXl5aSJ#o;8bV4Nn?^
zgg&E1jl$kO!x>wE3MscoE)|W<6_=O7Dl8`X^__evh|oU@_lXz^GJSMZcd$eg*{ZIQ
zd*TN6?T~WL85QAQH4iWMJc_^Qe8PBc1{Jx~R_G|3Fh@JggcGI#$aS8o+Wy&v7Bcld
zcVZE?UVJh2n0mB9sgQc0ag?e`Ndrhze%V1IKDlg0G=y&N=;-mb`ShEUU#8Z+lnq-W
zcJ2MGz-zX+SHNGgZ*vrRbvN3cx6q?Z&*?REqM>2npGH<Q95`*7E_$RY<M3)Ek@Fw(
zd;!QR;9k+i0|m`%>jr<`G=!R2r6>w9Qn4dq&keBzdp{Ad#UKWR@4<@rreif;cTmSc
zD}=fjC0U2PTsYjJ;?1lL=Navx7dm1S+NjR10#hB^w}Lbui>2kdbl)A&$EK+!)ps9>
zjon5I#{M75?t#0~w$T=JY}-b~wrx8V8x`BO&5CV1E4E#+Z9CQZ-glqwz59$aM)&#;
z>sj}D?m6f6wJ_Bla3b{s3E+Bm@ZQ13qBp&lGX5LO{XQm@PptxZ|INF(ASIuQh*U|=
zm6FG(IdL>IV!S%uhNV4X(Uq$>txHtsD+dz#2i=Z0OS7~c{vHbI&pk|+odO4-`BiWS
zc+*)Ef@A^T4fItQscFWQ2t&%CtMCx%^d*q5-F=9+duYI<@y0>r*L-$dibK|*0RM>!
z<-q!KV18wQHOBcpUq3JAgOq79SGm~9_l*Jy;AbX6bD<kdSR9y2hbu|xjXb#iDkc5z
z=;w#`%NOwo4W4%z8AJ#_1^e!2(vQDF@DT;M;YRcsDpLQe;F%4HjbxJB`P~}y`FxuB
z7E3|hQZdajk8>ar7~fj6$B4b&m~am2&<~Z&FGvo;TO1KgLsLSKm10C?{dHIv+1bUL
z4H!m5n!fv;i-lkWMkq41C?Z5=;hb%ty=&lLGhf1%qDmns^rJRP8cV%Q?*j*oJ#SNz
zDL+obceN^hY6xzyZ1n++Qc~>ctdn`=gci*y9K1+En*+*!@N28!ff`VYk9tV&sLG1b
zU<@li3i>vsGLKeutwN^Y<|)xyRtQPbQuI(oAY_eq5@)Eawcqzes6iO9B}xzOl}+ST
zGpdK#LubGoZbE^vaF_-QC0!eS;c_JZ+w5$~K-Q{Hr&OZOI5|B(y}r6UWl24n4+*2n
zVZgAQY^UXyS8*`r5{xvXX7j<HBD`ft8e1MVgf{F40j?3j<ahs3$JztgKC^a*vj?I9
z=}l3_yrS4u7OUuvp6RUMV52*b#T|dd>jkuzb9*$2CwWi9=6He<CK}o%xQ9f)$!)Hm
zt>^9lVgqW9XA~#iWfjjs?RB)g_d4=op_~1Vvr+(I^K^=9aQMEo<R?qugftQ~62|no
z33vs9XAv7>tkCz+`+M8)59k{7?-l-chWvf3$Y*!$ZVc8Uo#h1&;#k$o(f9N*rkKb4
zaslxyNg|DauvOoia_HwpU<jpVWQn39(46zagn-V^k9Qg!DQpZmA$T*le~=7Rr@sH_
zA8Q2g4=3=k&7!e_5FNhnMLu_6Sj|csmVzpN`$M&a%ClIV{|1hnU)LJnCKZqt&+d$Q
zMb9F2a=T`lM_M}{-ejzB!zp3*7XV)1h1Y4XmhYX$;fF<=zzyNweN?7jn|vC7&#-o`
z=kALs-EQFMy8?G&pB~2E+U<&z9DJ>VRhzf=hBk-9y)bKoSJw{l>qgGx{eP}d+|BS@
zvvyQtr!Yr`g1;RYzXu+eU()I)TijS%JQuLW2q6`;uwT+>BDQoOM^4<Xjs~8|h@wVt
z>M6g7_}L{{Qwv(2q*`m6O1}5J!3@y1kIZ{bA7grxpfEM@df+eg7iaK9v1l-<HIypk
zL6r=|M@1YYQy8uOcpLB|p&k|0F(NsJuJuygP{&>3?Lfy};OSVxS>h(~Sg%Q#u9xAN
zURvgK#hz_?dlNSekapLDHfygLiGbq-HZo`z0-mcDv}rRVmu-b;-OW^QQI!-6{z;tt
zCMlJ&SGl1UE0L6^LIpLmM&;T(RhzG^u?>h*)+o=DE1qu?mI=UhMUSA;;hak*9K<Mh
zT!HB5=d04Ox#-uXDXwP@AAbJOrjFJq*2QP&Snjak`<og~{AHxu!)M*Yf2DUrr82_w
z%hraP@ql5|5&Bg6%N2$kZZ{{N@U4?8vaQw4J8f9!s6Xodifz*&>3sGPF=sBihcS}@
zSH35pC~S1CGQ@kC*~)xX{8?je+}?#R&OQdZ!3RJ7NbJy2!&~&wQ7NcQhyI9^IxPwU
zy6NyPNAPD~o<i*aRHH6nVD4l7#GozGPIX6oXoDF}fT?w!fU=YASvpZ;;4A6Cgs3^I
zr;gDL(Zq;>fq6CcjQoUmd<<j;*9_j99kh0EiMc(d+;s{boXRC7(^IM~Sg7%r?r@5{
z=Mu%jDzjcftZ3-jVOR=A_&0YWn?Ki{co<%ZH`lfAEGe-k=B>CaGT<ulcK$0#U83Zt
z)dIVOqc;EZJB~IUmmG|=wy)4^k~%|$II?F)MWWUI?HbBG!?#re-`fU#yKYm-Nj2^X
zHDoS1c^xU&djMn{gYAEtl4YEvZ8$_R@XHws2L{bx$ZTz((ry5w=5SfNF2aW0vge3J
z$lBl)v*QtC{fp|7J&X^u+9KjIR}fejc2IrwN?{=<cPUp<C+b5TM`qlIB~xY`Tm#tw
zz@Fep7(G!zl_e>1Swcthn!|2hWEolq^BZR85Y!$|F^ebBdHDA)(|6qE(MCIfVw>3F
zauz$b8j5BZE7sxoLQnprSh03B8^#h`$QQf~HT9AXxQLPyEDZ~4%FZ`yd`N9|c#{t=
z+Bw;n$AWn1YpcCS?lMn%od|PQtgB4iSS3Sk!3eW7*|bcySZ4)#V6J=`f0PG9fWril
zLd`);nnd&>Icjq7nh_vv`WYjbLT@1_Sgh;`s@NRY@LI|%#Xp>TR8VnR%8UFJnyX_O
zq6zj}&bL)zYS}HSg*&s;cxJ(w)oq9fY8sIepOs%;P|RW9^7Dfa<L+w6^t(HXaz~$z
zlm9Q0onuIlPVs2Bg}1r5632~t3y@>w{v!O_WuxJEd}Z_1w?wW$7%7PY;G+9eCaMH(
zjKQ!aZ0QG3PhmVUQXVF1W*s8*)&F6NXvO%4+&I2FkD~PZKNduiN}@~f_h1ebSPcF*
zn}ikhw8%nY`-S$mgYg+BTBZHt+yK4fW1#u%@bv8!GG6)pcCOg2ly|be<sxJMX_e}I
z53y<PkZ0!R*N@P$Ha4q}@jF43RZLbvBZ=i6?|SEtVxRl^@9Hk#2AluuzqF*EpzZ}>
z3wk(?IBgDpkgE&<17mHDr{`}ELkQoyeSxC(-Yz8@j^~|L44D#LL67>_ErIv5%h`7q
z$PEU}D+d#t=22;6p;1qRdN>tJjB>nX)kKGlN^3M~!#6XgMh$HJLP%Q^rKEO%SjrmT
zudls{8H{6F9Gum-=WwJJ@tA(}C1+jCDshWUdY?L0z<uv*Q<#jK+oeA*yOo1G#P5ua
zc4NeI6EqM6?Gz@Ow2{I*CF0*u8YT%+0<NRGEr*tuSjy|mvC}!84L5s`OarZ~+>;UN
z8>)YFl5-R{F{!N~Q2yhcmCIR}D+kw_^zXI)mW5|skZ>z{<7ccY!bnbeRE`d-0;dGJ
zNxo23`-YmljWrz&(Ep{Hl#4;sxIc{VSgbs2!3^bq5^A@(JDl&ze}&Bi6h_Kqc4GeT
zHj`%j2Ev~FPNJdoO|yNo&oSs_!&rX8-DFEer21&kQSmx4HUW_YIs{wU_LjUS-N4NE
zOWt`XLPN{)Fwup@z7IhF3)3R;(J2Ks+#%cGp&KSHe!<?Fd)RkFqwTfr2it93>1yuy
zl$M0Lpj)%|=|r(a9ERS^AGeuLX1`5eZxGYigUWKw{_F4Egm1V#uY1o~9R!J)2h?cM
z#1Z1wsp2&MU#m$ycJp|wHQY=bA1wr%;Qj~KBVO&{zrDxOkmHpb;J9~g_T_*d1I9Di
zLsdaOZ?2z}`;U|(-{N{jp8pY4KdX)aiOt;~_c^JmR3t~xyF%DGn_(?axnbTq8j`f=
zcdgPi%(R}@MZkR!9~-(L^W1tkk{qF<mVj5J13}vxzcg5R<LpDk5Z8@*L#{xf-3@n<
zGb^u;>N0-XZzXH2nI&-c0XgE*{L&x?cyto%tplziN10u8owl6@bm;oL5Z;Ghc-Rb|
zP?v=jdHu3bWu-5==(y#{%v_3$d)ml}nN$1jJlr^^$%#S8o^0POj}{VbvS5EI<uB(L
zS{XK<0<--7f#29v=x{vF-j&Lr`^l#p^Hw#FK=}8Z1CFx~GqWtMHC<4_*Fm#6yRy#2
zlB^)n*km9V@;y)t?rp2L(nHkCye!bSCM$>qz%Tt;J5%%-wc&r<A!F9d@f+$easGv)
zVCqGW?`f<wi5!>3r+QGasQ3cCE7WB6Q>s!>Za^X}5@aLZypSW0njP)W6lC}ZDfwPN
zV%yPwv&)L^abC59piHV74@Zs(mk}LIKu$4DJ35A7*v|_Wvo1UzqFLAm%IfngAcb)q
zKwm{7`>;SD?QQwZu(=`is$9*EY!D;4s&bv<eOi`#D;~YReB;#3jEt5ht<h&-bfNcx
z1oz^@SaCGjrRE2~o}e$25(rPmY@kwBy*aEw7~u#E2^d8<9E6Gv#-C>_8Wxh|j;iWy
z|0XdXvy#$53%fEg%895kCne#2kiTGZb)lIH*AlDtGnTz4koe8k8oA($nO<bdJL#>j
z7^LA5^rZOQog(J+!^ZViU@II9bg~xp%})*twePdS!`Jov)3(>@@r9eY?Z0S0%u15k
z#&;=x`&OoIK2a0lrm`DK2@Les>#)O%M{AUAI@16tnbA#`PSGRyjXyQvby78E4AdJ*
z#eh@8)puKOnxs=AUi;)Y#G!^dK)5Ea326>*6U?mm;Xo^`Sin+&UCxl7QUUr?>}4Jk
z+4KXyJ$D&a^Orax5k5J}0a~8UaaK!ZO^|zf*?-pnexVv|bd&r|b{j{BQxEgfV+>n;
zta@Yf2U_pnhljD!Hq)~?(cjqi!0#9LhviNa8s@1z?JYIZS{lVwmCDO%RB(&660D<y
z<X?SE%jj^xFrn%)4;m^L0D3e1knRWEMCHHKMf2y%7R`48DTD1P^M}nzWUbm(7YulM
zX}}yrpkE^gVv5589~j$nUrev7d%rV6R`hS9tGr+f9Wmoa{<+^e+*l*nyL)Vuz?766
z)Q?GT{~gRtiiwYbPAUwVJEvrE<fhBmO`oopVbFo!-w5Sp`^JRMNQzXHUNlL3uzDVN
z!a*6XNB3&aa~2Ey*j(zX17H=gWJEF`0Uc1qFdEI_l_59b<%0m`Af8W7RO+Y%jUyCV
zetOmNI^vYP=e$J}UEupd$;DW-!e*i+`NrDo09ifV-gkj8xKLX3-A2G1M<idh9`F`~
z@>rYcT|A3Vml3-G3hOhubDwW#u<r-3g8z>{^?h-tQWfCCHAF6M|G7rWI(q-Lwaaus
zO^abSxmIcymiXTVs{VuXf9l|>uGT#tX15P4MkdZ<00(PuX4lr99vnljPSjYKn7#EB
zwQ3K64~XLH^yr-+GY8vD^F!Z^9{C99-XiDhU|2T@`598i4RWQyA6X{3>HYdjgNH-;
zar3=LS4{MChT!KTr5C6Ibdx!|Lytd@<Nn%-l0Pv{N0|%BJ-<~BP3Z3-%nh$HalPw)
zR(j1QjJ>~HfTi@Eu6$cmF1KGD?teLmwq`UY1gHOo1Se9g>z4oyPJR^kfA3qQndt!`
z_LkkupIhI=ARC22f(G4%tlrDoaYOoIKEd%55-4KCaaOgPcwi|#DR}J;SCPt@_(R)#
zmS%jnuFum58M*;bP1w1VZhHR2lgH-g8aeCtm=?Ib-a5Rni9zl6GD*<`MZtJg$xGgl
zLN69E2keSLd#ZC+D<hQYQ2VDGPamdOw;Cg93OS9rIxV?>!$%@-O$0m<ylb5(`d>#W
zA{uN|E%Mk1EJMoDdo2X9P9&Z$PjR0%U?s3JcmmTY_xDjh>UI7ln#_*<71~%pD<ir8
z(01{*zBnFfkWVegH29vO>r|FQon;Z8)l!}-9FL+DrxjYV7yiOvGOm9(q56IPHsJO2
zad1~W_Euf^v=7@enX8SN#bMS(S3u}wm)`m_`~6!a<kL`>T#=a|zcqXlm-=C2>T#Fs
zvkRjK`xGU!h2tjW-^1B$`3e9P+??x(A%IBqZbt$CSVQ1o&5gH%$i-l=<+ODz5Ez3|
z1t&=uaq|In%w*+rD_!S93D{~xJ-@DHv;-Bn`L>yg2mY~{B1pe&CTZwzo9Ud)_tNz=
zvfn^gq^^co6CEBHo<o=c)}9qvi`31M5eoBsA(w9Q6F!<qfl<Z${Q`D$t1!}4aKmw*
zee0Ltn)&xq1R%J{G@T){yOWhG1p6novUYm6Xhtv4mF^rgrv@NJW(VHLfdWj!WczPm
zfkjfUhc5N6tNWf3=oT8V&<52+Zm%tP)a&u!A?cUdeGy*994nJw(PEt1>7A_)XXt7s
zYu_n&1X=~(7-DLC{{yuDxJ)zh$Ya;M6e=@2IMTwLiwrtVGrxmpy)PaNc}rIoO5+%}
z#YK^y&t^)}db%nn8PgN?!QGE(8)frQVsgzdUX5!`+bCb$AE5poR7~L+-`&GD`h}Ts
z(Pc7|$VTpLxxbsA`2`+_AmAqpQz{s^W6c~~d0;je?MZpN`0CnjW^Z_+AI|sKwBPqs
z=)CGr{o*wNBj6uLC^=2f1MR*PMR|qutEE=Edmj^$O4mGO7ukB-9?ry=tHJGdI@Fjt
z(WRjoK5laQpr6bs&+Ukd6XtB+?ecX&&z;`#++!KDsD-i&@9%k=vkn*3KU<!k=vkWa
z0c9rW&)Qo6_}ZG?j%E9t%k~zFpYfX)Ag=H@sIa1Hn~M!z>(9&zc=O_Ghz<W2*%a~R
z>ta>0Y!$PN57)Hly4h#2+2=1m+7Tp#`rSn6WjLR|Pm!5fN0yU_WT03K%owYV*v*{>
z4hI^WJ%OnvMqxEm!2gLoG8H`DrTe8YK{)SEgm`7}WM}AVyL*UfU?#Fd=@|jP_d3)b
z_7vfiYy(FiYb|%bKG*+oc{n!n?KwM(L*DseQx?sb2F4eUHSRt66H+}J8#!8vjEmu{
zQkJB~323JGOc$L!>nTH#P>XWD^wNfLO5u=r+pP%*ag^wXBngkkb^|Mf_{rnTSfl^w
zv3!I?EIFh`LNfg4olXf3RFXK@=zT=_P#Ozt<3H9?WaxicOSH3?41D$1dVL3QMOO+G
zm#{0T3VMPAI$vjosseu7IljZEJfcXtj{VkMlsmz=d8ye4;lRY}i>y<%QC{?<p8qkH
z;wG>Qkof2=7{k#i_@^+AZY-Two0Cf2mi}Wbk$OxOyL=l<EPJG)>vGgU6ovl|KiC@I
zo)!+h$F;rn8Iz-|0Xq&*;KcaXOY7Oh7GyY~?#NOSgE>-W8{l6&s8b*X07b&Er~R`c
z)a3kS{=TF}y<UXBrwhS|fIaNn2C;@&;Hjn2R7In`j84ZNiGdNMySlYA`)M`K|EXbx
zSum_;Co59e9u!&+1nohk>+(%PvhT@->=@GY%@J7Gq8rw%$^V3l1SA;-WVqmXo0Eq?
zYzHwK{f2mer%@)Tx*FEO%&zU;@zzzG&T$cX)rKDbNsnNscO&<T{b;q!>>HZVT*9Ii
zZGuE!rJ~3eO@X((Y<VMq3;#vwlxh_SRy#UiDHhxY&M)303~s{H(Bt#{z|66V?lpMn
zER6lh3I$KR1(tx$yt_U57;;=nBnwXoz9<Bax@8^rBYG%9>V3nsOtea(43?D)>c5O7
zbTa>DKS1lt4xDHu;3j;quua|zcsiEByw~_@48LM*ZggCV+DvcFEOpSrD7oGp=Rl{h
zW+c<YDx%<tqcGjY)~E;JlJ+#FHu7nuGSjA+J&9roiEQF=o$_);L(3MvAu>i=uqHDu
zacc14lm#L)73l`bM5<QOrF>`h0PD~pUd~Iho}QyLgH`Hl3vC`}<rSjPGsNNa;}#cW
zB!ExHA)VU^bH%0ioO{PVp;{DjqwFY+^~7>}RAU9Z8hX<X>yh62{6G0YBNZbR)^=<c
z3Y3VcIRIEcLl$;GTcO4epj65N?AuxvNsuaCbmO{b&s#g3y){E-%alq}SPWa6f;wsx
z_+?o4ei`yMX*n5kPOLT_J*g}DiV*w>)h7{fe4;h4f&dpae@`OP?}jR?UR8izfN7ul
zL+pRp97cNwIhm{EiD&aqO?KM@cX@Q}fd;ag8utv^m{@f>f0Py3lrll(;Yux(o4S0(
z7ck4%bpMeD17E0bIHxcyG6M$RV8cO#V^mW{tgCK>Z6e;eDgA!9R!pX}L~dIMw^Zvt
zaLy8Nt-c5#roI@pE>(Rwv?qGVZSl(=7kSWn<Q!4YM4ea7N|b|sN9AF2e$2550yp{a
zrJ#x{BX*JaQ(8Vmdz>Oek<xFPiW1f1J#>aVjmwH8b7R~|t@c9Hv9IdR<Dg}sVgZUE
z4%Jz|{i=fOPXy{OhJr0c9^N3nR!7W=T+di8K^zV20Yi*Q%o%0h#A@TZ;*vcUVlZq-
zv(sb3G`-v^{q89v-FBt@#RAq+U^edkAjZs*itVrbMQ*_RB2Ret680SyMuA212@-|@
z1Yw`B1GD%o5JqQP;=TsQA2n`o7rExAlcna_z#t9C7SoX$R}l5<S@cY=dx@Ia-@}MH
z88uZG$_K(0I!cufWVt-^J902Np|L$e7RIAM!do3h0&DjY5P3X2&7UwXu9@0OtYgT{
zo2^P+r0TTjyJX=e6{l<N?FLNiQ#;s550e$X*rdQ07bJU^6GKLHA`m$vg(~Hr{;+Q0
z7C7ALTUr?k4W&^;sD(g0@IYMdC}jAe45~KLJexs9OnGtRXFK81{3Gjmu->D!uv-)y
zR`a@6Vz_tsO5bCPJ4(s!#SRfgOv`6YhD=UDc=KEI&X?g}Gl%}SrP^@M`4D$}(=%#i
zG|O?Bk>9zWw)98mK%(JsV}Aa`@UxfmzM4Tri7@26ux6mmKt&bt`1hdG=|E@nXqb}G
z)asvqA9ho#|3Y4Fd|tI6n6R^u*b8q4EDsKbkBh#e%!vw|$&#V^%$dwcn-qW*)6L6w
z0Qt0=>2aK*F<QXA%7^Ai%flbBjFzAqd}BmK3>OdfBqa1Hku=A_Js!UipU9x-`g6e^
zr;Ot@S}hYw=9>ORtTs7<>VaG;Rk^W7M3<xw*ee5c-Mty{{_yFyD^I-B_}M+eh5qj3
zV79iM2n5NzRxW%uTspOjKLt+AVH+uTw_)FXT&agO@o{=tYIDs}-Fs|HS;a?Xeo;)~
z38xTK0lfa%Y8TYD0~g9@p4yxnq%c_aWEVs2rZHTG>$11t0Sb}8;JKq;1x~>qL|C4U
znYC?I61}k8rC<VjmU3Ygr_`whBl_(W^YCHVo(4hIKA~lnm`wNQsu|pG{v}Wi?08nd
zZarnA-Rr041i)wA^>Q8HrQvxVOXu}$yUsJpEhJr{OPVE1r%#R09$WNnAi<lNaFZ$6
z%1$K0EaP0;-~xWDBRV4sfjQStmB#I26)ugWG%T*vNHth;jCd9JxX;+GEn><&lzU;=
z!B-+V04~cl2fxP06Q19GraD+$T6iHF8ky&>0mG)y6pOs#@neT=LgSXWe!fym18~K{
z=$f{vwR;f@;c7VarqC<`K-GtY6LQ>mj3Vy>lBHRmfr$R3A8DTE=wyEGwTO)uAs$;p
z-YBess(0vA9PL7it<3_Y6?c(os0+Z&(P1R~q+818S1L@Y->WlKBwozQglyu*^hKb(
zwG=ur#w~ji_fEcBlp^>uwzMe>ficMUB>DYmFYzUr9ly87@V}xyl(PkTK~Lj&e1wuH
zXzf0ep~Zyd(-kQPo%=%NhB+ulLoe4CQ7<)~&%f;^b>=dB9=T5T$gR?Km^ef8FG)a|
zt+6D7>z<6=d;6-we`!gO<MWf1-=_vqOli_=|AjvC8^yZPt5r<Eqwvo*drJyPm}=ik
z<U=(K1|QvAKpP{6c`hJG=ws$U?tQWSuoCgh{Emj>B^-AfZ-e}(R%DzO<w9)%0#k6n
zqqu`bbngEgq`8~7%(GI}ls6$Cn_Wby{{>t?ie*W$3H6F68Hz{%AtwgM5G|SY4v;gX
zxY<2h>Tg0jQ2e1XNnFN9R8u@PCkle0!eF-9@v}QKkQ*YkbC&CpN||ER)s9C`P8<OR
z0A29n)~1F0XT1cQRZkntEJXCr+$KIN{N<nS^G|0%>qsK4Uv%Z@FPO>0zF~}uh|kq^
z&26^y-VDx0J_QIcE2_d~^YA)71N=OTKkhJS4}N%_({&IB!}l^=4^oV;KTz%yiRqDR
zL&u>m2Qk`()IWSL21m`E|3=ynDLWv@7cJBrUnC)l9(K!~Qx1*<5kw_XfdKxyr?tqu
z1(x%)1YYx`SE*qc3w&o4S!#uSbqQG=@sKZTxIvPP#^DqEi8oj28t6e<tBUnCO0$U-
zcd)eoLN>jqr@y7~x5PAcF++=M#w5q4Sd*no<(}pDL|kk^P0i;k&iF4fX7|btCW1K1
zo89z{WXVCVhT{5qZ#r>b5z;<Wyk7M_kos*4fI&VJlx{}&h?ke=pP9ix(8nWj3HF#@
zSFuBT7Y^(m8XD<eoGM{+^-SjYYnpfrXfG4GtO?(d_-S|6*oa)vH$GWEC6e{DsYB+h
zDjoLqb@P}9%D4A@Y?>sgr)b-&852*>{Rk8GoZbyx<jdx9sBQTD+kbGPubWwj*&OeB
z6Rnh*px3=GQjdpDQ8(U<jg1BTxaaPqlgrp$`JT%~cg&<4;O`eQkp%?aAB4h;%xK%Z
z@se=A`X?p3wnYZEZWDSmSJ!}WHX2R0mk~qb!@*0_!`Ku^U_FP7^1JHQ$IvPZi^tQ|
z%595eT)tb}s>X-VwvXQ}fHB_#A>n$>U!<}>^E;SE9I@uEiQ1h02UE0T0Zdf@+3Pbz
z^ucj<eJ%ETejuhKi?)b*oB+tUT?ftP>U@Y+X{n+&wnChnr=Y9{F+ok@TPf00>{tHZ
zl%gH*-*}fZkI|A7fGGM4h@Jj5+cMI+n|IVPivmcO{vIiLq;^ab($#832H~%>7&xZY
zUlQ*Knq{h?g*~c5xHp+}2ylubRo_SvGEK}1J0~wKOER_WjjzSv4)_5c&-2C5z2lGE
zW;9E1>kA@DeMraxsK1mb3&cC$uYU>~<V7Au087xo$v~qQx8JeJ&2O9U@vG$_=Mr>H
z_y*4eg1EQXz_u%XEKp~INshEfTuJ!~G`OqxAr)6VtUg2pK!*oo5kOcDfDedM)3%e?
zSBr}BgGN2krp&V<AQTP{87afKaqfRuktY8rc23`@jU*?7*;TYe^RT8MGtxlJrx7XY
zBo)ifNR11%vwQ`aZ>oopk+c%i6wM^4T4II~y|E%`Nr5Vcz9a)8^L5|Tg;aUEH~)r>
zEvoF8)=R;nJ@41QrJ_;if25+k<b1QV|0NY6F;4wQDoPZem6WZs%OLfb;noCP%M{WJ
z_eQpI4OR7*N*BtGA)#r&(KTR^+(=3a68|5hqJ5M>$cG#~@dyDIWU-<=vH!1BQPsD9
zrW4e}^)FTgEdMXzN&o3QoRdBBQnu|bJy5(8(NcPJnQ;s>9h~Ais1f$7@gJ@TT36uz
z=8FFGd~-$Pgqd1j|2tM>op1>-0KpVzgmP|;R>Vp!v@h;YV>y-<Q!C1*c6P(&PNag{
zTMfM$jA$qP+N;Sjh-F2Cv^78+SWIbK*_wJ`Uk++t(eZQZT0dJw%ki=)YL;hFSck%*
z-j9^H?FRpn8srmSk?MTA5@ljZow-_j4(ncp>wT>Yz8*te9~Q6DLGX*=1TTKZ)8F`~
z!L(z(yEW>@F3y8XgT`!}k-o0-)~p=plExoT72T0%uw;KtDna(tSzUcOfZ_rdunqkz
z`?zfoqZML|q=zhJAB}pSv?hV#9)piP_)Oo0KZ`tLggJSUBx1B$q@hk90U>I%9lJ$^
z$Nyi55j|3>UTS)5>VHR!P}F(<2Vx}82{iu*#a*!M7_FaBHfz4v7lDm;${i>}5HJg@
zd_h#??O(jM*+Uf>S22C{1+Y+1)~)1`S$DpDdA*k=oR#48>=#8aTxmITZ<n2Xocmrs
z=nrkHn_LNjmRLStgAgODL^JxKP(W^Ige7)>iVEd)bcFvf@pES&i6SZQ?rys6H|J5o
zZmfDjfvYQsixsU`MS?eiiDNT#vljb0g)1t1NJRCGI7>Pf3j7k9x*-Yqo)Rt!CAHB)
zuxze4#bmg;G)IuV8o}z1J#;7lssSK)_Ex%5g>aaXO#APc;tBwxQ=`enih%NxrdPZZ
z8?hgjt&Y|MzO<QG_6s?bcn(<3`}D=#-B{VPPa0)f?U9+kAofSE=^rUB!?q0fEpsZ}
zXUKaI<u2uXcKkTirAPm}>a*9&Q@+Lz&xv<~Q^kUvdX>9B1-hzd^~HS}OO0=5XD7<)
zBhsnhcC_Ek813>zzC3;8t|`*vRVpq~>RYHbtKoPsiqi4_ZWje2jrJ~7nFm&uDg>g{
z@M|3_?MY2HP-?c3sQJ?1uN0sV>?MqRpzII61adYfHHsXw#C0edjRL`jr$f>oI!`UK
z@;1bEsfzYc-IXBhfLLzL&5PXTt=9uD(-2|>mL4Vu0b2>O8B$vLP)(s58}ElWi^_04
zkYd63OWB0s5lVRzM4MMTlM$fblDV_1S2cvJJ_Yw)!)8<!hJ#4k^wM|*)j;yAia42t
zQ{iP6AY56&c&WFc<K)BI7Mqm?QejCssphEwFOV?_ek}Z;I+_2e0iFolcKk>d5n%Lx
z@`@ZZj{bp)oHt|Qpy-Enu6ZVDNG&+}Nhg>ey7auxr|;$C)Y%*hV3%-KJB5dqbI+(>
zLfu#q^T(!Wdo9dFV9h&0=q!HHpzkEixD_kTLf2N#zF^x~u)id8sKE)jcvsZy6mbaY
zPBiMBh#>hWd11Rp8J}0Te{M!DXhEzzyiNjZzn0jV$+%f^I0vnB4Vu~7)3h3qa<s=}
zZ4HJ-(ijsw+-&o{_de3N2h~#bAY~?6l&gJy96v?DPU;>^!@f*HcD_@V2)JLA>a9}h
zw+Vr!r2BiH+UMu!(8ck<B6IMBDCc<8NU)h$`BzCY*s6x%vSg}YVMkbnb1GFzh~s<K
zATs_Tx+HIWc;<1`8BuI{(U3QVSGvt4X_XHzj~`?QixfuD&3#2WsaI%k+m3@@xY<E)
zSjJY)FNpwnt~d*DW-R{yaEaK@BHZ1YFblSxpP^bhV!Qm^<{h>&HsU)~lQ3?$rdRGV
zK-`yj9CE{1blP>&^N_)N6JQoDtpIKM<q8;X?W)zt+deV=xoDi(<<Z3>6N!J5f_*ue
z|HG9AClf)S4%P(W<>C~JwS$Bz+XVShq=Z6AKke<h1x%2Oy0`tmV(Cos5tkt4TJf6G
z+`^Kdd3P)sC~GvTeEtf6mNLL30Mw(Pa-2vKf{#BmeLX`3ll9VI4ayn!?ReI@>(r6!
zCSd0;v~7_h^_7Ofag~*?K#q&-nPPR~cF%T2o-BHdX(Xz=*A2dc2AhC#o_yR3Eq(Mc
z?aXIzM?B%U>fK-)N;Cu07+<hZ0AC__>V}TtD03iWfL~6YU!L)t|FzBdi7GxKF@(F4
zdYK!%?VMrZ+WfDSHnFt=QNA*HdVU*T&dDW}+7FUto@w>NYWLn{GzFfuK({M?$W#ol
z?SEIdnp(v<Ohsqi%!aF+%M&3)&z3Y;$0K>ojs;OW3KuEGpWGcl3wNrW`C!5~75|;x
z5=YKx_4sFY>w?t1cQb-Vbe`Vnm%3T$DloRkIo+Bs+GG&bYRX*>NBy<=^l~Y~m+>8z
zc^8Qy<A{75_l$O|hlW}Kr>z}+oJ<AY$OD!5SzV(}(Rw$uaBEVxRQ0KrvGcmemhZcb
zou|=aXH0nx{<FdjaBRU@pEs8--Otr7)Ylei;Ri8fPF2i>EoG8UJj^Krlo>{TERCUc
zoA1>=rQ~u=ivm^zUQM<vh8T^%_%>5$W0l$^_DB{(>5tccTibjFuv|Xv$HnHLj`Ugz
zY3VPM2b+x|8Mtu>2#%%&eQ7$mCjK}S=&Q{K3~-U+j+Y<D#K~rux;a1-wG&w!?n+A~
z4GV&gW$<#Q2(Jf1c&NPMszWTgdWu{MbvBX~evY-FM;&@Lbdz>xp}?;Ph5o8#nI&N*
zBWhrKFJEzLO1pm>h(@9y9`tpD1QcOZrk>}Cy3SefN5N$5ecev_!d`~$$5X5}P#b%A
zXZBa^E+hx3kH#HZUx49{oK)P@@hMPtE%F69ADMK+E3Nfs*>|A-abOLNgMG$l<NN&s
zb)WC;9l;wqx`TaKsR=O<Sz<0X@}*N4HsaNgA-?iTP6_28RB#Y3DM~i=a}=0?-otU;
z2K+Ac)<n$}me{<LlSK-g_Ez4>U;1F?w|D8X@5<6fIWbo*&E+82OUV4!CoiIm%$$>Z
zcHk{WjEWSTS+SUIfq;}G-xp>6n9Sq<l#A>Shf0M2f_R`el=owYNjYjxdo85XSz5;6
zk9t+i>T)6&GOKG*%F>P;;b&eIEH(4mf-Tlg3M5H}8Z@0p2|A8{coM)_i6K&O079?C
z^7&kZ<*);%n5Y{uyCRw}cAE6~v1B|2qv`iaSWRxvz1lLab|rH*>Jw(!stA@AauZ)`
zHpAQ5rfv;!5MMG+BZX3O*)iW0GB8v~>5`3Jw>M#8kprn_Mvj;`CU5HywZSv$>ty_j
zFz|X*V&E%%MmXLQ{PaxolcAVviN#sQBb57$ioxe7cmrD-w!Y)<iNH_@)}6&rP~1@y
z>r*A7%bzV*GJsZV#cZ;Vb4ATSKT+e=VSUAoe9^X0(mh&3PhbaXkI8<TluN~uo*_CZ
z$gnj%9_R_^Z_Nm;^uK9F=OCx3MuNUE;k`c5A&0>Cy)lkr+hhACar7<jb4+o<a`^17
zfKXuk8uo)p@9%7r{j}HX)ZVH2Q;2o=5gGZZWS=2j#g!kgn-m`>f(wjmfDQsoVWeq=
z{jhj!n>4a5&|RE}k{;|H+nJ&6@0l)^ECfYOR~=llgjxryrNdsPx2bkNWYFM;J`s8%
z$9Nt)v1tQADG(&FAy#_X{c?y>i6oQpO};GX+U-uR4lTZz7S3nIZI&_gdAPg;<?<a#
z*FtHbx$8tHujEcR{&@R+bfNgUgo!9+;xo{wcR76z3yaZ_yDjg(gymlvX!U$3$!7B9
zA9HPL88?3<8@BihqU3T}qV*=<{H;$ewyKT^`lm4eHuX(mup;ZCTl;~7y@v_9X48F-
z+{4@=*z$^+(IfW%j23{B=1%1bXBgN=S41u7DrE!unnQ(Jm@YTM&j&4vfqM)su6f51
zioZV@;K1WOa9;_O>cN@`4`#usH4}%C*DMOx$o{=%rK-0oN25Q5dFySE!$%SrcZV>;
z-yl~L)m7Or;6wo*3lFG~sU~cJ<^CB6s|Gl!nuNLa057))c1v+Zj7Wym0Uu^?;SUCZ
zkY|;#guq9Im=7%XUc6LC;}QiU%Z3P`&*DaA(8eSIu1uDkv-fBVBW&G$T7$5tv}pr<
zVL27glAPLUzS%?NRC?mRBk7Xkr@b_s$LoHHBm5C(QZ#w#&IS3TTmp0VARfZRh9qks
zhY64;ltL5|4j_^>NRfoQ$w)8R_6_TaSCCdSOz@DPiep;w9K=`l^ebl7TgH-=Y258w
zFxbQ_z`+u?j2bNc`n&42@>CLQqV&SkQxNAqE!5vo_sbZ!r%)Cu{@YKWXJond@c9^R
zXoABr8A~wf<;l9oLh0R*0tp5pjwcR~3JicIhJgD><I`*Sr1IOXKAX_?`6(j<h|lP5
zq%gi{?x^gn{LiPN^5eorRUp2y&^)Ds_68K4$$v{#+5g@Gb94o_Opj6RJx}{MjHEX?
zQ)bJc#9y!7hh6Hx&qR=7uzMPpNUWpvS;~6bw=z|i^$XP~c$1+ewcniV$i^@SyrZR-
zS&us25E30t_2Yxl$_;@<MWh}B)-u7{BKEbeJCG{@=j=ezpSe`Xh+e~9{7mP4Lgc~l
z;HoEj<%|R%x4Bk>ldNSEHqzAlx9WUiL#w~jN_PyI2Wbru=o&MbBCvjN)c<GT1*;?|
z9t>fpfz()+1ctWLXl_mfoqqFoR-$fqYLn)i3B{Vw^`9S~DSKQ5dkw5*=ODtaKhPPg
zH4TmO9>{|Oz<y}Z9Ws^_&O~NxqUd?>l-u16&i}^W^yw46T~m*c#6w!*{xsQ%A(I-=
zy9564>yfJ9RsAtMb3vP)yplkMAEXI`00D{zk-uvVu-^D+cld;KzVh-k`&uQ<Nd7fo
zC@|s(z9-K|1d39V$h;Tm355<TFI0tiFy$Yj%}W4AN&Er&`hD`m)UuJdr+|19V`JFh
znbRwVyoE!@Oc!Zm7lYuZ{nd(*lL~{c7nGXmI>(Av@N1Cuv($tQe#%#PsxE_m!p&u_
zEX+6akFZr|=(PH+f6hD~Tj50&FnK_;X^}2_9@^$3OQa@}#WWDy(i6JG8TMh3b%u*7
zIuS&3ATZkrJ7TAi_&Nb-q>8HY$+8x#c$mo2kAV#5il8PZpAwB`#p-A^vd&y}+y#N7
z3R&?2pumt6BBV)2P-c48q=@c%>u2?-LbMUXC1hK~-=|kwSAd$FtF63#(_=+G&r_rJ
zNqV~@q5d&vFmw^;KP9s#$C9^-77024{#a*FodZx_S25bozk-eZcacfHt>~?qbLsdq
z_TR!nT~oEb)Tf2&&Kaj=A0;}F)a5S|;XCm>a8?O+<c{YB;`1xY&_#cq=usC$+*nL~
zo83MMEd9$<^5<QpAE+Tx39qL!n_ie7yzo_0sZjmD0Ze{ZeH)Vt!zrk+BYKnDZ-!T^
z`}$_38+PBfa4tX+>~3GoUXHtm%^^R{k8%j?K0e~jO?aHNLd*PJ<38Km9XcFtBfQRP
zxUZZeC$~LXUWRPIT<%!CM}*)!Byx%SA)xZYs)Dddu5OK^2sd43d_tsXEbuy7x7KC%
z>~<4IshWDiWYgT;Wr2>IjrJ4H%<PO3qn-5I-Q<la4Qk4}<9;7+u4YA1PIJ)9P37{;
zwLMwfzSD4jajOh|zJ+?7QNRuP*mpVH2Yc<g@0Xsw7AkDumby@x?Y<($H0m2?cjP&+
z$;;F0<qm=oun@K*(i_?LIr(Mvcz|uyE#i7Yb|{<J9%T(rtqze2v@-x;!J>&~2yJvc
zi2{*x`o{1Z>^q3Np?}M!r6$NAib&v7P08^yN~9fqR}xB8VLhfqxUSv2=E^YSnp0ar
z-&}4|;(Ww{@|s88h=~~{_~IGzg8daAfm35G2nz)bwZZ48{QMo$NCMy!|8^rlpw)>4
z$sw?>`N;DfgWhbfo@0>&=`X#Y(Z2RV6h*@Kcbx2oMlQeMt<7`<tig!1vY{kFlYHeP
z{Iw5^-dpwkLmaXfK;qOykVEtpeapQSH2}S^j+_f*T@W^mNP^q{*|Ji}I(@kmnHI4&
z`jv_^7XdgtLg-pMq6VG{A)-wj6EGO|j^SoKZ4a{~rmSwtbM4nAGgvk2$LPkNw?{b8
zwsFZK2Zf-?>ZsOaBxYlA#X)xt5q8chxE7&bMM;GaL>3}PQspAhnL$gxgix&&W|67X
zus=;zhuK@HZ)!16XvX(dGVyt9VDcz+I7@urF}~AF{2-S4GD-fmdUv*@S_Ok5^GjW!
zkx+7juB^n=ooG04bhGUt+{0>;?|qo>-sJX=maqadfGEa@Oo%EW&-=}O9wgEzkI+}_
za{Y?T!|QWds6seMz^(YSvYK5EGujU{x_G;Cg-YWBuA+=s=PPORibb_FQCWWpLXsj5
z!Rl19u%6VrfUMZr-uipjaL~0mBK5CMa|_))7$kz#uV3E2II%2FGc(fyxjDJraOP;J
zDs=0`g&C?YJOvczVOu@@lpRQ+Gm>yB@pYAz&7Lm2`VuR_K3ojYa5Logz2yad{9LoY
z&(5S0E=eC+VDwC|M{r|;rHof;)zD2EZ*aR4Q}|>rj56HyiZMgFK@IiL#AEXOU(KN0
zID`7NFsBReDRWs%6xv$_a(e7_NHU28Y3DofU~_UehV~3(;qJBdJXtyJ@pE~epYWgi
zJY-pKKH7MEjBlfiuU@a)c<Kn)zb2vG-l%ih1v<{<@evC;!=J1@Kb5SBm4TL<2e<ex
z{uHNOE!ce%(ky<~I344|B6CQw{IUJ#ZqSG@5*W3-wz7X8CN9)SwScq*=aZ=vw~rmY
zJdx^3Y0u1J>S&A&ZF;XERNW=kB;V3jsnRQk7b%@)d@a0v8*lf?!6K-jj?Av;vfo$v
z{4y|#Pr)y=z&J6?bEf(D<gFV9L1&3T*TH(CH8J#jfs>8)XvzMD2;bqLxGzLMAfI)j
zKKjg7yx;sdqUK~DQXVrdiTVECC*_7BTqy0|kHETpoe}&+zW=q634#Y1Tkhb4vH~PG
zyyNTD<gnl2_4U;+F_T4f{d)Tvh3J46-ZR8N-=-4B_Axaj+t2*d{dH{Pf)>1Xsxf!E
zFsA-F6VxceGi}a7i^0OPLKB5h(pI8PFhFYIz3c5vr&L`x%|MAt_B_4mPf#ctfqK?%
z-57JPD#l`sWJ2leNdKqZ4tY%|g`8+Q1#0oHg!%~BLJ;pobgKX~O`eUpXqO&r4iGfD
z?AOumr*L|yOx=S&#x0VC&z>2ouNC-5vZvPMyvJ3E_Z6Pgt3{(%Xs-jgqaPRMSKTx*
z)^Rv#gjK<`ZAMov>L+yXv)~KkiP`6wKduj26HFPYxleJ?u6+aBFsHQ+UHACiCGW5E
z7LglZR!}n1>OE31V6H@D(;K1fRObeV6(+O~*eP;C$kerMplw2)!KX{ff90))#%3t?
zhS%r*y~^P_(tGi}8nb*RGh^Gkv%$@~VsRh1x@;Ys7MU?iUFu{KgTPS84&ee<=5*NT
z_7PS_zG!-P?ltC+W!^Jj25a<WO2W)6i{*n*iqos6gl)Z=-u4}L8>;Qe&3aw?{Rwpw
zkl8-mp44(XFOEquGdY!QH95(sLbxZ>IKC?spnajq&S23R)EbHnp!W&DZ}&0Zozt-3
zGrqRm7xyA<r?90$S#3Ha3zIqJiFZFXo36JE<uay;v$X0*q^Wx-k4PHQF_fvw9W?#>
zLv|udk{UJ@<IiE*Xq-lnyqle^+0$YBY9%Zsz)4}#CAaIBw<jWbd{UCr48YL9a928>
z6sUw=eA0_)lqLQ;fA=at+m4TiwvdDU`g1VLv?9Mqh?e_CgDo{4Zk&DJp?smjIypp8
z3#UB<GZ)xWcp#>6p^9Y`?G%068Bva!WAbmF$K5oO4vS0;c^d&a)8Sox0fx#I!2~PS
z3RC0EEy3vPv1l!bPE){{S#BU+B(IGLTxAF4bSENzz+7+L(cdqSa<`{49+n`Pi?RLT
z#-Ffpe#wyL;J77sEZ34E*O?bb)S%G_#InSg2t&e9e7(xJ9OfHogz%2nVL+~9v&P3R
zmtfOlb^YloA=ffF2hApa(09vTSwVR}^nlM5w>uF`A)imhm~S9Dds`<~3a&65FV=M`
zE$BPZZ_4k@01*=Sx$BC4+;&fy?hCu92X~?7^c{s=VO|fH5_O*Yo%62Jx1T$3p~$TM
z#U8!8jkJF`yh%KI=2FReY7q2)s@aP>?iatwfB&p^A5Q1QSuGfo{o%Z8_IjsXuQSQ{
z6I@}^W2xzj!UAhkt&z?j51s2i{v<BLHRMZtfzr9kb2zj<HL8DFCil6Ox$Y)NGu72?
z)28^j%(;-B#Wm4At*7T)yJ_*^-g<lM@x0hPoJJS`$N6_@8>9Ba<T`2-QDMs=u=`Cl
zADy&<Q!TncwM58BF)wbl`%*C(r<M!clgoJ2!e9F|Oqr-IUU<k*Iqe#96n5<PJsN>-
z$(r$^rsXMuo_-*BO4t4;laZHg+XR!b6Xg6MRzpaS!+FUs^+s}I_MRGoq3o$h`h~pi
zCk;l+##n~cGP)}=Q6_7d=?gmC7QW}FRJrxUiTC-Kb)$3|qpg-b!<5S@jE`=Ts7Ca^
zwOdEDW{d?(8T*ywo%h=H*kViU6+$JXe{z<}`gWJin`!xL*S>x|?WFw7qtmsp3Q|7@
z3+dFeq2t?DRF_t;yUW%nE6231ji~X(J~wxHc%1*mAY)vsya;o0K?(s;`nGgrrACk=
z5IH@URZyla(NjuO5pdr}FXBKE|G*Rd#`TP1mp)18BRu~(_`73_KBYHd$j1YYPiDtr
zT2=7@Z0y0eNVoHhnuyt@*5gLBcs**tU0+yzfbZSxc2ws1k5<WuRrcq2_oreHH?clt
zR9;l$lNDDc@hQ&Md^5vW728734;ltV(ET*4lAv1|@ecPB%(yE11h0*baj|jNde!16
zGFeuYPRZ)A$h17+VVKikbjMx(%<0p^@}ch-?7Zn01#o5HXNjJeEUN0g!rvGgbSl=G
z_gzCO=uT?(zbAn2v))!|g_`(TVY^6hbFuzNVpaji{B(j17aaA>Rrl1?hDaapP{82p
z<xwUkSkg{&Q%334CuP)342>;wXo`}iG@FbKsu+vuPba$b649usW)-=tpTzJ<)xVdW
zWl@5Wrs7*Q*yiA+I`0LdTqU9Jp0a#TI6h~4+I&y+HIMJJGLJMA7e1m+b&W*yK8dgC
z$r1bG0@!KP^5?@pOXM}FT>9-i1JM^yt0eI{4CoskB%O(mU0mZIr^!Bp+o|jJ1Kmh6
zazB6W??v@Gb$ET!^FZB$j*c}h&OiaTBgR&C5u$^9Kb(LztvTq38NpOJ-C1qFE(qGs
za(d#_%FGce|GZpWY}}vQLl|x@%ut?qxjY8)cx~YVHASwVm&~+{Tror9dsPyzs|I6v
z1)Rm3=duAlq=mNaGD-ayCuWDMN+?3mcYvjQzurF@?=QHaNc%$4C~x3otv^bf^5bfp
zq)_S;OZ6uOa>X&Z-J@zu7st}?zwIQ1i-B$*>~3<)wHl40*g3Tyq7SZDtE;bZgrK#S
z*<427^|*>s3*ASg@+&(}Fs(SGG6JN0F^!)xf?MzRExNyXwpUX(yUb@=biG=78fiLB
zM~*f-TQ9|t$7n(oCi^z2m+i#UmCM=0Jv2KxFv>%@y#-u5D3w=L5}n;#C^=l?TL8`q
zj{D*)f=9L;&O2!w(kUNbN2^l)YZdZ+3h%U~>|^(NBI42kaENaOs?#__ASjk)&5V0J
zC?f-K=%Op3bOK@05aOw1NWZ52THOz-ysq5$4^8&xSQxxmZ|XwESfcc|fE++vs->kZ
z*A^%&GiCV{-C6w*!D;Vvp~65q5zTl}c1=A{X+xF5ZTo^Sa<)fhhR+AsL>g+ziL8wp
z3^c2TxZ$Erlk;v>nyymBYOGKrL4qV-89-Fw%p~=K{bl=)1?@cFEWx=aHun!N5T<v~
zz%rRtDfOQ%6oYMRG=`r>Y(gY%R9t%!*d${qC!ZdqU09<GPAKq@^tEtLAad3%ycd6E
zHDkuBNoV#4oy_<jU+UygSUfO`Ad7*(3bi~zxUtIwR>!{<LWebS3%yA#SGR?~PxOh~
zqijZ=B7DP1LI|ga+IRA+#P0@K0iib)NK>ezV|_BsQ4vm+2+Sqfqm*(7qZKG^+!RlI
z;T&{f_$5$b<MK7?v~rBKFpl>6$f>AdAR*7{vtyvr`Hn@(AlSGI@Z8{_oaXU5WRMVC
z{VEE_(pYuuu!Z60$=OA_6(^}Xi{4x+RqJRae8J6V=6B)}KT2Fxsf~k=5U2ZY93dTt
zP|oytX${S^dkNqZ6&SqxtY!0sw~uHF1$}9c2|tbxc=6JcRmvmlm8VoPROjF+Z6CG(
zIqqslqXpzP&wU8m=gQ5%HJ2jU>Jur<iPk#(-Ed%{k=iY>6~;DH+WE$jwnadpanzC(
zE(!fH3`m}Gu||<fB&8_rNClKY+#x)A#Sy=?rqEs+MaB^dg@<%4oV!#RjEwB71`kD-
zWr6Is26myDi*7@SvlZ=Rd%Y&W#pfRO9PwsXb4o}DwrMZYNqmEy!)ch(IdW|6o?eH-
z;u7h+d~$K~FhJO2f^vX1e{C-aooCBXRJP6YmypjZG7I7=Fnji2!G)?oh}^EH1R>DZ
z5|96AXcs1qY73F7h>4rSOMR3^UxuIAmc@jji#(}O^FOZ*3gOaY;mVUH6=`x4!9&0k
zsq`gKVHpIZ#QoeV>C)Q=4rLF>Sl;WcsOE%UQx?WZpr)!Q#XbBPpF9VKB#AZDU6?|m
zaA5;ef}@iOIR#R&q&xT2LWky<#VbZbB0`On;;01(hM|K_hT%O}(@TctvA;H|{ZypP
z2@foWT;9q}!K5GPt<Cng9Fan`RH3rQ{Y)gErr@+u3RTV1(q~VgYAoLST0Q}a-N#Dp
zPsjU2KO?xcu!T;={j+-O2%lPTe@#-!n;ix_)U~pYhidhaVmOawuaH-l4Eck={@tcc
zdkkH1tCbafF)CRDdTZpA<mq%eaSE9b-_@W~Mo0rebQLtVMR0gR?8Ik-TdsOGhD6Ei
z6#!Wfx@K_`nJ_pP4Naw1#Zw4uoMI)BA{p}*Kp5=G|9=65KzqM{j4fRa?xa^;pJnCm
z0c<k3ryh{z%)ld&CLjYpC^A^^zA|7R*b*AlrG;RLu*MUlT5-{QJ%{#UDe_n#Fno}!
zTXG-B4~3MmBQm^ZtukKCI~fIaE$svNw6M~bC?Pi4CRRpV^c!g=H=4P<qR+Z#Fo-4;
zTC`(Kac+T~RHd&Lw&-OFxT`<<=9JYkT1<VtrkdYMgVR)ZQ2t?2%v1G#UoU2O?d$DM
zS~7+<gAkC~8-Jj=2K`-vjT|DOH7%Jfj*gX_QQ&Y?g)`{c0xW4L4fL&TsaeBD64scR
z*A=75`)*5Ir62<0m&USqo@P9|#=;WC4G{Q9G5*DV`WoqA6@2h*B%L7tQxs;=hN@X^
zt^~Pg3^(BpqXBh}H}-Yl9B=B`WSir6tInQku2^;oJ5jL~*Z$V*IN5RqnlBaWXe<R<
zz;!YZv(ZaTP3@Mg?Jtg@HLR8`I0t~SXU9spJe_d>5mRktwU}0ysV8bYniLyAhDLbt
zCTgvo=xnW24;NA~&#G$`&E)AUaRuK^xiMxm(5O`}2>#D!$-gL9%~Ba$o{Qd1y~$qE
zmQMHe!N102uqg<b8cXY(X@q9P^BAy};-*sFLaXiA*}OqLJx=qwN%EFccB*tW?<^iF
zmUTj^7lAc%BvZPelvH9q3Muv>7LwxLn!rRQ{JwM=2|wN0(V6VDB(QD?1uBb^)@>Gk
zqR2urmPIW-8c<MI>s_c6QY@~1>cPjh0RqSDJl@tU@3r@0tWv8}G*}J?0jdAidZ+ol
z-uz^FE*8lrh<3$ZEX5X~%CLiDHv7)*Q9%JaVC&V4QOs_@9?YTHH#<Hie7^1K3#1Y(
zIz9%>ig|aGD79Fk9gS5{l^7fNGc7;JNThvdWg88Ea^v*k*B{S+bkCigbnl!_rBX2y
z->^}`QP(*{4<iHCEk4G2gE>p9^?;Bpu{~(uf&X#@P21z6JQpYtbpcH%Y|t|&_D^UK
zMnPmKIRySGdH*zTqg25Qz#8a+19E4XDfR2sBKlO2Tn$cx<q6Li1Ahbcv7{wPE5&lY
zG+vPJ1=GXhfn2pnXO$VbsC7qk#K5yN^~Lk6j^*n6YN-R50*d~?UM5@l)1(~pJVs9$
z$faSv_BwSMXO{7jSeBEGS0u)??B=Y)V)EgH#w{PIRz|74zFZn(zBp*R0x5w2`J?WA
z^LV|oN5Bf!kMJHVrBetTU6bmJb;afVh+Yz}RI`ZGfa8a^s^l??qq))@qdt-$Qa!cK
zc}pN5(mPs+Yckw8mQM0*jH?i8nb>H}vcT}6m&knu7D^*~tGPlvB*k==Z_n}d>sk9r
z;ZUSV)fpBp&bw5i=V=k(A~Fi<nD&p!qcKx#rC^XZp?f*==JNBbX#B|=1>fjdiWX#K
zDOyL1c1t(C-VV0u^>rs#Y|ne`wb%SfOR>WblSbp5sXOR64rd{h@~8_w$Rj*aBExLH
z<mAd(pTO8Y0HQ4Ha2(l^a^B}vZ#xr78Uz7@Apl9&^2N${?e}C$u1sImb0zA6FzWpN
z|CG0{8>L&ASB|B4-VPE{*((XPQZb{Up`jr?Pa0p~2vFO2?Br(dPKtzT6g4)dkW9da
z6C7$VjdwP0Ib~a?VX|fJU=$B(SvQQtWui-WbaYdy>+Vb^yL!?qMR)idHr|?kUVM2_
zEV4D_YY!}f5WAblDL9M14UAOz=vuk0mmjaP%F%A>vGoo+XV@jQC|B5`G^QP@`sI+h
z)*Eht?qm{BeUY%YEc`h|#QfEd7w-IZ{^k+)#Pi0b2T2xnYzM_GhQDQrA8Eo$^!mU4
z-NM)3|J(LG^Rt&1?tfy70&8h*b9HZ^7pOENEYK=O>8GhFpl3X^?is4ZnVWg4C~`D)
zsqRWlXN^VrWbtd}UfyaPxsU$I=kG60o($@bHgl{~v?K2IyYwr&AI*Tuol7T!dK5Ob
z?)@hV4{u?&?p<+@-u1!^L-BT69K$nDn%Q0gaQ!}ATY=~nZhX?Ppsf<0Zabn6RPwpu
zK-XeBg{|eW6FgGE+Z^=vhNZB)UE(&|%1T#gDC7MWe(N8mH{<;lR8*yejQVF#C2giN
z+TXTnDpIuqultMqFK}>#|7T0i0jHE%_fC49J{>|#3B(^ie0xap(UH`KtNTOO64AMf
zJm-w7U?$y*cjum-Tm1eCrgw$+XD`t%Fy3PU!U7z4pjh%-jBHG!sV;u`1*{25t?s91
z7}UneI}5jrytB47!eT1^GJ91{@5|kJF|WsumoQfji#GD?{KCr<AY_(G^x(_Ze!V);
z?i|WHWA(^LVf(V@qrjxXko)F7v4W)5oUY{yf&V<DjgK#*Q^_)*6hBiGiurmb@M5?r
z-&+b9{s0W|vMY&SSalo`>MSuWP47;7)!8r14aB0|3FdjMB3{r!n|ptNy9<|qYB^NB
zv?A|#6ckMaVZF(6MV7rxtzH=$Ljz{MS_BcZn6ni!bVpW56XiUJ-b}&FLkQSgtU7Qi
zu3D0wE8T`U8w)A5emxU7*$q%bqKeMn7EhdYkAESigD{SPK@P9qRdPMFp`67IVg%zr
zPhDty{+I`Dp=p&UPqT?<G;hS>x0f3~+%i(NOlgS|Sat&U_EnJRgiLbMOWVRSWH0=D
z-+l3MU`rT6{@M-q`KPRxK`pWP_w=5Z+=Mh16efe3vg4BKSyCxU*_cv^Q0QBjMEb6v
z%pJv=jrUr>8T2Y3?&cr1@b}R{j@OGNuVY+sF03i<luHxu@C!!b(>Cf%UKJ%w%cj(l
z&^gf%Q@n{XY{!)`^xkF)6x%}QCC5j{CQ#c~9*e?+){@3wCmUaVgULlMe9Na(B4KM4
zmdna4?4>_iczF$kZX~Mdt8-6Ix-Y()KX;$4k0mY$X!OFj?i8kmkhko48fc8Izvc%j
zyg@240F*T2h9%3!<(U#`BsQ>h+Z%s;YtUQxC=@saLsBB>zR*uT#DC(t6y=A3OyWsj
zLLbx!OrKIy^QRmRzw5@8AW@+O9ThO&*;(r}w=f4j{yg;kk%q^_o)ut|rP^Z>FIN=S
zS*g+?OgMEZl4T(g7#KpFAVfC%;GI%DzHH_F79^YCqwkEFC|jj?3>K7$w26wx*i18#
zwpXupWt+{Ye};FP@4Moc2ngbPu<}<}weU?@Jdq?SFCe~3tX6Jh=P|&a=vVjnvANl+
zjhjyzXQu;`6}qSH!VzR~@?_)m-wYmxniIk5V6SXUU6wau_{w*6U4i0t<@OzxUS@If
z*}{uYFb4@;XY<!)+zV$|awc&Bk)O017oT9_MZG5>+|hI0Do~q>ZtctZC77x|@KmGf
zIZa4`T1yj>F1-f=Qwj#dqfr4u0kIOn5Ih+_5Q4@3NtW^Hp(a}_XNB(F3M-(j1uEw0
z;#Ni*Apf)NaFnZMoeFwb=t5D3<jNCqk-^bpOC*NH1)anaEKH9lNtI3BW@FKXir6!4
zJ@&pvS>>>~`OshorOB$$q^WIrDhINOkAWLsD-ok(Ltbx&usbyeujP;-8rUvLhLmpj
zVUKcT$kHQ#z`IZ4MS2@BEavh^>_Y|m;Y47F=7{_gw_3@ryFt5-wt?`~M6bBn2;N(5
zaheY_9<%^(zLe$1GlB+06{zM8N|ubspjnl0(85Ve;b765@T-M4ldo24wbgD!j78_6
z(OXg#9s!L7lg|jh?TNk$WqyUW+RnR0?Z(xjbHEW_%peZ5FACAwIY8J@6z4ABcC0O~
z6Sdk%@Dat7DUwB#ftV=56-u!|c7pJ0#1eLkUf7(;Z5#6zviKu)O#!I%PHn1L2P3i%
zyc9(0RlH<g8Q~Y3g|}^u8ijKu5F`uKU#eK!=hzOivf%->o-DeZC!<?#sj+%^(>gom
z01^an!`;X)JYE)svVI5I9TUJpO|gDhuHp1_+cr6QN`yu+%e)5Y2_~ybl5zjsp;V28
z?0ON-kX0b*Dh_kNrrH#G3#P3|p|=Na(~9nbd8Uq(a(lDXC&eirut4GsWwB8jqRprj
zU$=eNo7>*swrk6_?4IpAw`Jeow&k7ox2{W)0KW`ND?l$LZtTvYm|~lwpcP?Geo{O3
zc*K<9|0X3lDibO*1<DDqoMhC51w-WcrWnlC*z_?c)zBmA2)Yl~b2RjdR40YMN4ZTU
zALil2F@|f9vuM@ot`?RJ^MaCjb^bOBVqn&n<YZb_PjpM{By~hXXElDi0{D5oS}8^8
zu!tG>Va(vfv3`>~9}eN!A07_UH-t8*vfX|w&pjN@jP;hABF;Vb<Ko3>7LV>1IJ^FJ
zvhmYtcj`fNg>--(H&mm8Ye+j6XH=Zf{YMKA8H7xP7No0_!9~&433whsS|NE~102_a
z7kpegVp#fZ$#WdCz%6LhoqEGGE>1aaLSP$YLR?R1sdY7*a)jzWmbHFIt;btTvlat6
zE&IzxranOjT9SPL^V5@1-krMLxO`98d9N}v3)i81{<v}SGxzL+UuLH~(=*Se{rTg?
z&mY<KH4=Rw62u{zS=-lT(P)pcU)F_=4<EMsiSi)=jQj}oDs*^|{O9%0D1s_uut%WI
zl<8OL%~lK}L*gUFDx5)4cg+IRV1N<Wd3D{Y{DgF%>>R3d^?-H>e+hmZGKu*99Fg?1
z6#brWhlO`Y_VHL?P6)39ntj|@Da`=rwsIS*2m!IAsaQ)IH%}}~KX8BkyXQOsbj%zt
z+zaRCpWS1mABl)nQf!%U16?Vb58}qhi<39qW3%q`6SOC?L=4nPDvYa8_r-rJ*546?
z&&+JiI&Qg}B0@drcT@r(c*ED70pyFlh*kc&Ok;0kNGIFr1C@u?Hu%*Kad#y`wYA*F
zE3r+Xgg%40Kb~H5AKqw8-dUWy>dwr}&75WIE^3FY`V+bfz=#7+;Z0ARMmtvc+;@1T
z+OQ~TTB#nSepv-<1U0b~gvd`w`@a8@xya(iE|8STHAqgw{)F%N%xIz+I)JCKJ)7H5
zn7cw_>hj{3Hx$!P7?)GsvEt&;BJLc>?nuEgPJ+cm_3+8UT^VQ8u(9KTr&tc)p@FA>
zEhS0_H5%lL-3ZDZs1yrSHG@{y9Kjrh_#5C3KXW{!LfMdQlUFNR9GD_?7^bjYIC>O{
zwR*KUJdSIxc}F}|vu=o$&!R@h16_tCt3uecgmi!@;R9OdTJF}6v%8KyAgpRqN{AzA
zNd*fyxDO=Ys`J5kv1-+-@i7MIO+z&oBdVspsU*PA(v~4{WBh^s(P~57N4ZHul2Jfw
zjI*F<jYWu4%pZPNT@b!y=M#4|o3LO3A=Dl!KbEF&$k?G|f=7!ln-@po)DtcY4IC9q
zQ<=+;v2|kq#5nbjNl6oJevLK``H3{oYJA=%L*gcga233Gbb@GRnQnOkBEV4Ss+A5j
zqYk@JgR{%tu({;;YH8*I*&aL2v+y#(TR~Kgjj_*aMu1ci-OXfCMvOHRq+dxG4QTkx
za%HrbE5&JpEYK?JT`iMNqyl36L>Gq4MswA@EexS4+C8(0xL91no6qAV*lKbIX#CM)
zmG**95Vc~D<qCdu&)}H~bUTPXV)?K2b^}hE>eI5G#M7kRTC_#fFa@vhTmTgLMjIj|
zM%<IRR|dLBP()$Kk07K-`bnJ3e6AiJu*wcpwxU5p2b&QmauV*U6(wT`KB9a|72{jO
zkk#_Vkd~Fd=~jd0@_Cq&B#Pr0#frgDGvWL;x))jqP6AAf*4O`w{9j{b$bzIzdJHpu
z;c^*6N1(W^xm2!3-Gt+b1P@iGgr%`4(|NwFbSvY3RmQQ(SaaK$T2r!}rgCFrPPq`5
zfVInJx0YrcxdA8JDE3ZZs@2?HG^k;7+_oB=RzvD_h&>W*0#Zpd1Ej-HFXLjYG8XR$
zLY%d?7$Q_@ZG+K)0C3!;dL0MLa89FHP|(F%Kl#vX^00B^O^2t!;rJpSN?p={1WZqh
zywu~PNKO&`C1i61+U%(1L3YKTdwTrew(kkNQNcFW08@kS3)57I8)iUYF<uVlcpBHj
zconyMpNCtGaAC5H*4XFv#&*54hcy@<H@(GerV2ZHT0HMA>nOxs(8KP0G&@arEIOJ`
zI;}#th76bdniS@-GyIctvkInHRK|Q26^L?#$XMSt17#kBGv=$N#vSdznK90%OF0u|
ztZ!Sj6Ggb7(hQ`@HZ$g{$hN2G2NaF|jq6bVa>TzZBKz+{8)0<8X~LV(DB={g$O&&F
zW_4|TYzVsl8tI};vx7vgRx9ELRA+}7TM@}&(AGfi5FZJTIjWVh+8B!?7FYSOU~bZm
zZ;hlFG)c>g^A;jT$gW4|7Z3Y7Om1$xM8i*o&T-@&K5FOv-Up#!I9Edvk*Uy`XJh1>
z78v<63RrYTf>W7658;gmI-!9`!@$rCNg1*+=n$t#iYXC6bsaY1xkmVLtz_|08my&c
zR6T<}WJ1$<A|~R>*pu3*63HmNSn8f->s8YpwQ?q`REv9Qh+un5%B%3B4PRICnlfPK
zbEW)viPC6)pm&^rN9a)(ohx-Xa9#@R_u)uz!`2Ew5BnUgfzb5=ErfoL)&z^`X~9i1
zlr$lUUw<v)Jq?|7UW?K|GqZ_Gs(;@6{;uu2-s&gs3+~4P`tlazCL$9AgQ_@%8VMR<
z{KE*{JRI?W2tp(7MH&CvaI0bmKFheBD+?Z(DC$g263Il6Oz`oglT5gvLOZ3+N~W3g
zg3qhpu30_2OK%HEjcvt7bECt#m<amKBqJ7$I-B+G;ipQKgE;xjU?p6ZbJg5vElAK|
z$^Qyod}>25_xl#2saK)se!u=jOsN!S65eElJCv%caSH>aY#=N?rwkF;o6=%R`5MeP
zZDUgFlHY~z3QJ=xVpMr^Lj+zCHk1T(ND|=U;hR!qjzZ5njIns%gMYCk1z1bARK|)h
zBzk{Gjy%g^M%7ZKQf@TI{y>8P`*R2We6v#B%}>!=kj#rC#XQ%k;Y`yq?XQV=j!Cj(
z$L5`z;RTPrFZ$5#cc(Me*_-mDO0R(Uu6$mjew4qhk-tbE*7&dew`;^Z@6{UiLj1i3
z|LRP2ZAxt<@y%m+5%{fQy*<6LJF#ZXn~1iJg}%N*U+<>;rrf69{7B!X;ogznTt`P?
zWYfm3?%`Y~{NildxRLAx-0XJ7NV=DFboA5zNM}b!nu6&kncx3$&ztRizyIL(nQfag
zeVv(2U72m&nK!m%Hob{&Hg#t<Z~J|Q{LUF29X+rg&h~p>-#eNsmSBe(eQnKF)HRYn
zqYT+awmNwL-MNvZd-~H|{ppRQt)s7_W6kdI;lDWfdOsO>ck`YtZ<BP#hBSKMOHKD=
zHf_oDb!4_}%xvn&^lhVWHfG-F&TQ+YZ+d>8S+jHd&TZ{~0y?2#3)6t8EfqB6*4y{c
zcqACRcf(jIS1kWOB)<>uaMd%k3BMdAHmBN7b(cU}{iJWWSYHG9ZeN2}$u=+w+)!|w
zv6|!T-{29Uc#|5NV2k5F2<h!glQy;D2S|DtLDjG6z2a5>0}#-$kyaeTy~?R==zs}n
zN7SX$9o>M~(PBMCzbPC7Dt7jE653a(W~Zpx6H>Da<{WG4mAkNV4_2-@z!VA=5dT3s
zdwVw7DtG%Wqtl&ZZMnC5Qy;G^9tFro0i+j@U>^Yh>Fwy%fT&P<Bq*JMP&UDYg+b|+
zpmYU6>Fem~4u#U!4wKy;h!PH^w{MdU1>vMSKrY8T7OXxjtj<k%Y2K+)tz5hf&WqiB
zn*bd^UJiTLo>;^FVt3z=zIwX5n^RY_T6W56Rx6C8n$@*aRI^%vB-E^8Ak{Rh7)U71
zDw?#hW=)|uG6JNq7ef<l;L))*$opb6t~9*#x@%ab=Dq^VKyD<RO7*3C`$mR=OR*6k
ze_T%AaMN2>|6YHcw0Cx40P!~br^k=yyp@;QIswvXF~4iiA9m++SpWS>Jy)IBCeBQ8
zQBZpi0_IXBl4=FD+}VA((#RUB)RkVu4<TTDVzJo!73NfiA80~a?oC!Gj@IxZxtFax
zg(2kHnY4vLqO%||!^XDo@{7e2;tnItT!|TP$lSBPHm<!ShX!E(Ns@lTvj`8grw7S~
z4Wx6BxSyU8pY@OE$jn<hlm-z$7>ZY{(=TR&Mn`)nouuWwUdG8Gv$$~cfm4+%RSGwu
zc>5tJawwhf-GK-tf$nry5*TD#ce+E)RwJf%XJwm`70P3`!t6=Y5#YAJMrOu59Llar
zeY-fMIRzFMCANpx5XTp;cLrLSaO#+L!vn%Jaas~-8tGeU3;i{##l2bOTA)^s)rG1A
zL$Ncv1;1%F0>?mG;fV|nx~i5}1AHcF-b9)(6%#enUszlUKeKH*kyxXH4}wm6vtmt5
zuoCNi^jL~%2JCWH$*HC(h$zW)btKcBNHd+?Ez!)9S~P0Yp}~5X!1QlM(!5%{SU_`I
zeZn$Sz9=Sinwsz<?DFZ*rhuq(K!PU0g70>}nU=;a)r&r2sggQY^Cw*m;#Xv~B7iM(
z3xPp;nLk4#*>eY^M8ANAkHmEGVbLEYfCK}AVl})__R|9zTK+S4`Z0@K_RH)wvv%V;
zMh~SSOR;3*<dek{=ULEqyp8_y%Z2+#fH}M8z7pZjCKpaT`p4wakbc30NPY5hyGSM2
zZva&5qFzRuT9Ky(ceeT%bEwj4;8QCV^NygsBJfB|D^bJ^j~ZuyuLlFQ!NY`+c&X(t
zSqizP>>6I0Yg&S*-%7GRuw9`eVFNlo77mhsj}EX<Iam${>6NenO<RF_1CxDY-%NpQ
z;pD1upk^I61I57vu|cvv=$>0djUY?896T)v$>BB_nL(XYaotK6SE_?570UUs+59lb
zxqLl4QjU8##cZU!?+InHI>RXd<Is5Hw=@kHNE-AK4J|Doq*)iyWzqZF7?32Tgn&@U
zIv&(}8M*3u1?iV(1ol=Oyt2VDUh<v5Ykf_-Tw*PVOZX7MQ8|NU9)$d(TtV!N%zP}#
zp*q~gnUmb@Ut+Qt`|`_=7Ehd&%hFN$d7iDD%1?7$8^n6F*?ux0GM`iGp}kV(6L2K}
z69HP~STbO_CdHT5DV;H?b+zG(gaPJefB`|>spFjg^Zn}C+4+wjFMN9hwzBz4UufG@
zi<^soQ+2?g?v>nhqAxP*3IQbDKR-9~9q$wDvGb>YSeU*wKmB?5_EzD^lQs6vXA3Vs
z6xUOj{qbU3I>CM)4Bg%kB+-ES#M2T;T)+$*@rh1nnOH(%oA_t=W~q&tH<DE7bimR_
zykKkXrgx=~`Lio61u+q1g}k0p#L5Jt&X>_xv^g=VtrXt*Si@26wf?%7p-z^j{evC9
zaqi32^eiD52hV`6yc8`c2*tZ$-o>UvUeui>aAr8c(A?K%H}HBG$n^ENg4!BAoohwy
z00t2Dprx{s?;T_FVIKsmu_aSALHLzzFx4Rfl+kgo48_=ht(L8;-gc3(O^O(y%Io2<
z;l%^;c2IGmQr3x_s9Z=^|L#;PwQQ-l--&A-PlVQGN7m9<mKo^90ZxOFv`=6{!x+Ev
zs3L)#XMNI322PmL(|UEnU<NGfxtI8!-hiy|7(dzZGVioiGExVgw4S6C8nG6kBh0Xy
zme%h%SUm;m6v48|=!LY&z)U`58~q2ZXA#CL&9f=GY32QMHvCpk)$-^@T+KosttgUf
zTSwOshBjqQ$O?^Ay;ySM@pMw=Es2B`mPVfIHBz!*Q+~yHt~?IE0A;8Koim0i2-BCu
zLRAqw!wHpj)n)q2>=YBd=Kub}eflki0tp5qXCLD$0hqCu1jy1Ue>BZD46bKf(tcs#
z7sBHtlmcynoNbjGWbTbHXhRs}_XYKetkXeEHmDXWCwQO0yw?ds*!77L>aag7Yfr&c
zTSl(6o_EwT`Eo5qv-W!7;9u6XG4-P~+4S!fn`)-M-`JB*c5NhW-QAtZ&Q^4>?PTHS
z9~WMpXncNC1b4jpF<TIFAy^TOGe_p1eb%^f-JL$;9=pr*YJM4SVKDr-Q(wAwPGi6*
z8iQ+o_A~e16-lMvj3$W;ta0vB_vv|zM|Nu(dO{>FNLbn;_x=<2=|xZJhl|#0E0Ub!
zm2O?i(W|yB<EUe0Cm~MYFd2MW8kZjUE%AtW2*6(nwvMNUi{*S}yiA9vG;{I=OiSbX
z$>3S1asM1oE1e;6v)R{3M<P|OlyUOnxmuP+cjq~NQ;4#<Q>IOOP?5v<%mO%#+Dxim
zDHUt=c)~FFxaTh~-2X-(hD|ue8l|-1UO2}G;+}cB`1vDM=P$zGngbZ<_jyJlg{K;E
zuUUlv2O~r=h0|uDU*bGob+!K@vcRDLxpjcT?uT{;y#b&5ijRrWxC)TBC<(S+6NAL|
zFbvY1rtXDLVB1@xcpSITLA7F7M|3P8-C?3dzT@uP@vequnN0_<_PElZfoQ64VydP2
z*}pX|zR*Y^pDV%8<c)yy_aGr(8n|UVnEUMb7w*&p#^I1h-8(t=>}Q!rDcndJrso41
zBI6m;DA!GzpFzH(`@yD7$sRP#^>*4!`o?zP1F~&uB)ZVDLZ@6^T4NzwPfig`zwYI~
zxylXqJ^}-d6QptNCED~)o_0^&m8T}8&6PW7uN4841*`J8`uwA>-Pue3dZ%)Aco;bJ
z`6Y}Pt)9PNHP|MD#^F0@=|}j{Ue@+Xld~SJu-W~l-u2?+#cRj85}UHUC29sVu#|zZ
zU77*M*#u4ogH?@@$_R}^yQ+|7ZI0Ee1fyx_A6nh%HG$saofwxCO5m>R00!w@TXRqJ
zwqR4rr;#WT0tWrCs&c*9gW5MQEH92-L;ip_`hM@XHrkFM$A|9>ej`Dmy{7B5X*m6<
zXv?tC8YFOfwsG#8#>q(#pM3f{@zhFRs)`3KeVQ7B<PH^U)I_SaBJE)`=@%+y+~;4<
zpL&c|^8r~K!&ZPsazd43VYV&lb2L^a=kL4q!`!pS?x~xLlV9N~@RqnjflE{P=m<=D
zU&6S6qM$HM9;$XoBHJdMMeCZRK_cqIq>pcp&YwF0Xn`p->V|2TqUb`{XWVpY;9+^6
zrkifK1IG2cjk~wp8>hYFS>yA^A~Nb3Cc?;cjq)`mL*VU=D=*+{xFZEK{Lp(6OtCL)
z#`Ro&f1EE>!=xuMDetlkC&NLT1A){ES^)^LwxU}1meB^&_FLP~?SPS5q41;V1DF%o
zl9U4Cp#&TtPIwXx3XDB{-O27Q(zdaqE7{rEf&zgxiZP<bNAZYdsQ+y2#}9Aaj^0vu
z?h;(k1I(#khQCJkS@Mz_bARE!D2U~CmX29Z{8#iHm_&Z^&ca8};cy5Cb$4=>sYj@2
z;7hEeaqcb^4X*lC(<xzbKXaK$4$8(tjiBi81O*;^Lf)OI@58*!EY;-f<l@CiA5>?G
zY_8?;sJgk6%*~vV+*)m77~|VQy>R4f_x4#IR#%E_dH0VS-g@_s2njP_g5W%oE0u;Z
zMH3bEBqfYHePjOI1HX26iZD+s&OJUoBbosbc<S@TTTej{o|`!bM_dr<8`po3bxJVW
z#Un4>V`rJv%#=<b<JU&1Q+K=HjC=klU|T^aK8O0%-BI2W0JvHhr-zA$zEr+&>+byR
zr)oJ1jo5b!qXn+f@$@VUKi{9f_>dogi-q83IMoGgbZN6C8M5qMA?VdB7<9Jr3Iv~Q
z6fIIZ!F6-2P0TrZ>?|tK>C~!vI<X+w<a@FTK2gp*z8+9MGvU+n3CKs-S{b$L<=tKa
z-0JCw0rs^w+qe1(lSrq#wk40usJc8|q;@QGUmjb$^$h0LI6bjBb<aG3={NA#>{Lh3
z$9HvY7h^~auraJAHeh8Coe9fP5D#l-PBL@|?XpCWkxz%jcq}Ak6H2AQL@_i+YlgA}
zrX#n_l6(Kj;`0kWGNYe?b^J%NjVrUzl^|>{(@)MTqum}Ymidiy3?eAYT?tL`-JKw^
z)lY1&Q??COQQT?HX)7ewdf+t&wWjhcX$`#Q#kNACEJ}{OUFl>W2#%ebTD}CJsS^w$
z_mm0B+dwq}DiD0t0bL{X*P!)>kFL+fb>sDY$Gcwo2&UB));xtKY@SU*^ibT<BvXS5
zCeY7GfRP-~0)be)o=hc4Bu6MZuia^w(6)_TeUUWC@}r&s^m&o>?rJ?n>7uOKjR_&z
zTC@$gdVl^q7Y@XURav{s_G%F)dr5=KwkEu<PqT{3Caz{K9~8YCyOQ1A-GJckO)Vif
z!3o!UVN(+*$n)KB*xvzt4A#o{)y&yVl9u+oFlCHQ8k6KGSyT>F#;oPWL0tlFs)X?i
zDK}^WTzve}YrUhAI3U511W#Uy{L5~p$hPz5Q)2Pc2rS>!Q%J>u#ZyQv&|!VMJWnAN
zRwPd$Wm4o<0iHsy*iiM~(H#>XRVf_NHPdK6yh18h!~Dr{C4X|4dX=OFJC4!1FAjw`
zc>1+gS}e6WX)Vbi)H55Ylj5HL4z?^VX+L}9KK+(Swf~IfA)bRmTuH!x<1nlw6A2j9
zAe&ATvl4UASCPYhV@^%D!T7F!?uGA(dS&4slShV&xf)7<nt?@~x9A=VMpra>N$$5Q
zE$PODKqJ+wcH?bof&a~+_cw=L%Qh|DJ^pn(OWY@q7jB(#&s^3#OXM=`+^-*1{7i>o
zn0`_3a`Nf?$-grznfoK!02*h%cF$ji(+aR&4IUK+nP%HlgPXFUwa06T;!<huAt8>w
zDjcqQnS4f-=-hi@ea-m*BbTkWa0QM3DY8$fg~^nLXd=@vml;`q3Y`;@N%LL_?=wHu
z5p`vjIvXP3BvMsp%*o*pU{#i!+<_1$=zt59yXzPfte?z3`*{A@38rtf+@E9@BZYUR
zP(tz>?5cHz9%q<Z)FcmFjeNLElO2PEMgcfbaKwrpwnK}DOIlJ*4M9|It7!~PU!JvO
zc)VCD^m~#FO{*kr!02wndYnq71}#g5)iAtre9kAn8L|`eVKuKkmb2B$xP)Vi$9xg?
zq+AWHvvGYsGWgZlNia+SuawG_C*t<Ylw4rL0Hh9e$5#bnzzc^lXua^PU_|)Is4$~n
zeYOIlVIsoTPG>)7%R$`Mx|)#f9=)r0FEDrYsRyuVfP&nq&lhgpUBMZY!bWh4Ls~PV
zcc?hAiHeetD1oxx)|6;6%!{enlvG-okgk4}A5ouIZkM%9#p02(?m2F6LEH6%<9sSk
z-Nd@Wb(?2?lf~9bO?QA>u{o+lo*{lx*XvogGx6{hV){?l24GPA@c%ZeOA>9*?vu=W
ztIJ!>;UN~`Sb*Qnvj+Q1na-&PmOCs~s)^#UY^kk7u!@=ZP_shdR~kw!<6}BhpaoC?
z@+?JpHJ|a(ZYyOMV^ut{|KUFM0`H()zbk8R$^64-b2ImX#ak49h4V_UUFKcR6)?Z=
zq0ZE-3D1|B34!xp{cZ8ePjgSMG=989;=`k(2_YyO9xBu!Oa{NXiow1$j2<F!7AGjt
z&%G_&*_+T%Quy8JM-k%`WsY0|xi|^Ex#!(sVIBfLY>V)g3~fQkD}D=_KU_F$qQ+Fw
z@Xz(**#Or}W=?2bf4CN1iGbDG{r7GjchBEk`03}ymx|~_ZGf-{R>)f~VS`P5^p@wL
zXN9{$etD8!T}5YSio}=%8Y6(z#p{zEftjI8Sjt%$lPYQlECf7g1;EiIV=sDT4<{;;
zLI9ZK?gBD8aooLq!j!ZRA&gwPK(mn?O0hrUap8WTX%m7P=uRo}ofNYP9mP0R$W>^8
zil=S?eiZk6lGJgZqG2Xs6LjJoNsNe?PAK#P4Mc#2c(+4R2eUE79UFiFEq?xJ{?y~f
z<Pk-$Z;hm+-x!vyvc5sq+nY=z_Gn0y!urU?;E|?hdxE^~{k=U&-cFGA*W?0kCyeSC
zpmgBP?f>r}4UepU1z}DKaOClug_qZ`YtMhcBkQwom}ii@t1yLhQw=yyj|G8Ao#LC#
zxiS*6ZP!+5G$w9)Q>1b8N#pD^bEuPt#Jsv2503(=%9U2y4b~=nD6X*Wz7~xY)CCOo
zR*@%`=kY4AqsfBxN;SVP+)}x=1>=@n%Vu%0f$*=sr1E3q@RR4LP36Z&5vd(Pu(P~h
z(d}NMI=v{L#CxGs<_|ZS*szEXy;;c53>|n>DN@S`ODADoGA)vJk~VDVoXd>J&@!T9
z0S)7$nqu@4zJ*I>RK<YS;k%bR`;ep+61OL5S9q6#K`6$YLjwxX@LI~fe%;y<UJvuH
zXXD|QKb7tehJ$4JBE*TP>J-NFP8OMxG92aT!c>kZ0ZYPNA}Mt;Nf@BaIw0qDmJS4l
zBCz)_ihIFQmh4g%Y|s*Iae?a9m$^XkX?1zFn~ReoGUCz{Ptd~jC$agm<Ykw>C?$%C
zyd<SMEiA}@T4HT3Z)r-vBj_#^9R&$pv!1#^Ub4!XO_(&<WR<?69XzU;$gd$h@0o2v
z+}^PHF#U@&jj*^FTG43HF}%3z9+r`23h367G0Lat`$W>hTly(^`JpyISo9^w0Ol7h
zxW<aJJHc@?wmK~_NID;XzvWFvbB6iP$p$rEy>x!qG)jfa4FmDfk}B84w?&m}7P7k!
z8WR$QsW$0{a!B?|MCHFiiObw?Hx~+<#}5@t#awk_N2OMa^UNA4%oCl;jn^y4p;K>;
zm-FZyf-6Z<^08m{0AlOaVtzjv;l(f&MUE!LAUOe@#NpQjDM2GiBB~zTSIqB&g_|#p
z7n~YRUxDovopN2Ivw&K(5-E&V5g_`T!r^-?x0j~tN?AJeI?IPQ>eTmD3gV?4>pfpT
zL^l<L>kkPZzzK3P{D3UfDa8{suYz0#YBSaa<U|Y7KlBn_Mokz%mJWcI<{xTw?~`ls
z={m&bCv+f)1b-w5h#$Sx(a;aPZ8ZPL-_Uu@)=dR~4g}dcz>&DRhm3?}C{o9gJ{*+-
zgB-v)v~}y&iC;Iv;yvUPgitt2OHyvgGi$K?0K&&0WdT7GW@qV5@+_Avm}yPLo0sHL
z6jcT%S>#BD*Ehb~F3O3*_RU$D8ZW0I&v+Vh#;BGt#xk68IQ%7#g{)2+r(~^U#^8Vj
z*deRMVKggZLdNXDO0`s=h1F4eBlAFMu<eDVQC4$h83&pz6D5}@0LKKJCgZ?TK_l>l
z7?0p~pq!u=#%I9x5+7@6@km!{Ld)SJrr3x{oe{YesWhy841;QnT7IaY^IAs5=6d^V
zHR(E1Vv+og#1&X=cujT*0IA%ibE!zf(xNy98Ov?e2lbZ;pZkhV8B0rB2I*x|eiJ(H
zSUnn!N~SAIqOWQmHI9t720$0=GjE|%g!6rUON^AGMs3OrChtwP6)&(j0-CYTwAPbM
z(^=!Gm4hd(=}bwBz)X>&qzF>?O_7lYI;oVB%lJux^iskU?3cA>9!e5PqVi@|&0M5R
ztMbKY)Dy@peW3!m3`)s|L=RW&R{=#QCoNZ)hUV~&nVy+e@~|4@sbfj#lE7zzOLWfY
zSbZXH8NX3OXc;}zvae)RZhII*rOgd!=*y6z@9?j-Zz~$uf7FeLoL_Q^DmsQFX{Y;<
z{6R59`RR2`=o6`WMMDe|c5NB2)aop~mVUlMAGFd$e8MMUrJ{AaS>;4s=BUG)vbNF$
za%(BTEc4Jj1(<R8@KS&YUDsOjx^lXe)Q_Ub3Me0gY$SHoqg*<gGEuIvJXN9rDY_z2
zuG+HYL9UF?k`!?|kYKBWUJ@T)g{FTEwm4WywxVdWHx#C33#bulYY48hB*UK$PPK-&
zX7+1tLNLsJa`UP(3z4u^(`@+yi)9IOu2oaHmcnfE<$BHE(`%`+b1=(v*m3Wj@QF<l
z^M+$W2j<}iN9}h&-D+%Wm{rE~TRc#kEMgr}Iw=GXlSR;_rCnb*G$7WtHlp<4U|>B@
zT2tPd(wboMVOj}!wB8BFv<%Qu$-w9??xDootPK#r+AHH@R^T|VOysdiDQ0=!54wpw
z6{3;Il}doOEdEmiQLtoTjjYEOYlzVf)2L->W;$w$i8KxdKLE8h2o&7`pQzu7^$Yh!
zv3?Wwn5LKniGH6`l_^)Hy%wjC58Yo;+;4`(E>GGBdm!C>zUkcsHVqV$HI3XDfoncG
zK{$v+Jl5e-cpAod+Cqg=sA(afx`qK-*%+P$K!k-jR;ksqtb4ZQD3hhFdn&0$(kdka
zcqcxrBLyMql*hQ4l1_TYM5EyM38C<Y8sb{Mb{ezkYPfK;zlm7Hk)TkD{SYPhf1pPC
zq1iUFACFN*@;{?1-RO-8<BNtI`A_@)LP2SJjE2Gi8H%x{Y`b452{qd~P4=xIRnbH!
z!Knk}QRVc{CR4gv8CBR7k8dpY8)=gO8XRUsxpm6I3|KWjia6^vXBTG0VXBGtH)yi(
zT3mf!v1WvY;1eD!{i^*^C0RT0fW6PV%cc0AXmlw}Bkzc++pVZ^s=WX|jBz0DQN|N#
z|JZRPXdCLV9(TheA*_2yR=}`BVP)K+;iP5Ng@iT7O^u<sania{PuE*yyH~RrMcGe0
z5UktLb}<`~SU8b7`%&@Fa36$o@1DO+3}a)E^irO@ZI!uFH2RCIx-ku_v>ZLj+4WgG
z#j#l@SafF1vc8%P+B7TPps6Jn&S6_BV@0PxMyi!jLOs)A6$4Ym$QT7!HmOo6H=1LA
zV2}~^C)aQ8Kd^?L*}OkheN#WtQ?CO9h6CqVKPtAVKfhjp2M<GsfY)CKM0cgTlIaf8
z*3+A|MIPIXNqf+hAUMF=erH0v%CMiMBG#17N~Ht2(zt#D;()!H?6?6FS)EW*Q2WmQ
z@<&Ek!?k};5*ookif*#c$rVJj6a1at4x|?zL^L7RmNG82CI3@MJ6MPfGir+(Qq(r(
zk1N{h>lk;#n-~ql%fV?hpvGm5B*q7YezBG$=)6%2d}XRuVshBp*tl^cT^pM^tWo*4
z(%X{N9k~goYF{1}R|)?cZBV(+#}&ZRODzO1Rr11*Y_47}OUAD3<~@71?b@^bon6^2
zJ2vm$y>0gZheb<fi=$&DXA~AP^$Hp^oVhfkSP4gjJ88!IO|F3kFm)Ia`xF}i{OxPc
z37!%g)N&d{=!aL9&rxr0cQV~g+S2LHZhPoN;Hz=**Sl3`4;q}D0)4NuGJJeXyYgC^
zIq5pRkCg5Lp#dC<H8O_YB?VI72gjDZ`>3B5?SD<L;~_%Ce!#dkW4Wr6?Ji{Vx$zoo
zBx;$s4gw#wNYYj$&az~RV<KG*d+YlW*1|0607yi`PN8Nk0VAyYOQ^=78ATJ?3z`E_
z!K#4)bWWifOAi5T*hGKyK^p~vhfL|FuV+)Ts}Ja=Go3~PG6?6*N_7kE!H%6#$cL^2
zdKAN@E-!SNpg=2lFjpPTQVpCOgo-8NP*a>@$?>r)Ds8EVkS#!8EXh5_s<7k7V*`Jt
z<p=37c0-rj@m@E}^Rf8#$MYZEb7v=^kyI)bQ{#F$(Dk}=h#mR#`_M7g4N}_&@^Q?#
zNgvw-hY}?H96`(Z_$Xn>CZZZll~c|;^pJ=h5?*K^o)iitNBt)heyLL4Yk(~{wc_3~
zuDJ?2TkhTKR7tJ|3$i@n1=WGSVR~xCVQ7c?<}klXpOD!4J&>ywDVMAvuzckJfGQM6
zMnF_9*QLT^AU;F?;qwIhfgMXW`CGaAZN?J>QmwKzK;7GzdOnurB;XZ^B?*tx;n#;l
zl@9_CWsPS+MNnbolSNSi8w5-&**spa>=9t}+Hx0_f+>WX6@l`cf*O@0H|k{J*oycv
zuG!Oq#`uJyf^|Z!gXDuWimC;lV9humB7%<xCXNM@TttUjC<)gxvC&%0=+nU>(5){u
zXFWrxH9Sa`3$O9@>shNwGG22S3*)ITm$pi$&orhk{A2Q{=%9!;-IOPIU0V`IP~;7k
z-U)-(I>6{tV-sZY*_DOq+Ze6|M0Ge9)`)xjm^=MMN&&<ACO&(-#9PX5JEc+uq`1mK
z0y`a<sEq$UrtKVTxo-x_Ear1%Qmc;_@C1eVHgk3IzjI;**SHUK9k>#DM8naqUIa-8
z7W&wDDOV+<6SX?7y)rCgIBAhLi`814AW357b#E8puvMK{mm+`Om#eR<@vWK{F}oFu
z?H(T;1qK1q8WtU?z)%jNb1jlrKl$Ts0y{*F+TF?43Y-HAWKVT`bPPRKNiA0_fbdFN
zMteRSb;l;wF?z%CzCRA!35H9J(N1*?hx?~e4#d%7mApG%FM|B)x45%X%U2FUpJ4=h
zVJVOtU@KQb4uwfsCY(9}F7?O#bV;VzK(?2GfmRXgRcCm7KkSRFci6Lj5<?`=aqWXC
zqTO>iPu*7>{XKcxf!6i|%Bbf$@H6U#KIT*a<Ea>YhOR)rNj_I9K{eWA63Y$ODy8wd
zQ<@<8YVO}BV!SV`XE<ITWq+!aDus#Bi7NU39N0i=`+<Ass-+?TrO5j;NRQBXy;6pr
z#A?M7GXEH%Ji6-S5rjY@LqiS7u|^!U?^knsvELkz<5l`stX8k$egeF!RvWJZtaiRH
z&`Cw{jke7k_iOMKaLO>xhk@s4B?GFyQ>q`Ax=;H1W6Ap`1o*G9Nxq4)y{9f97-J(x
zo6xpQZHH{-Pbq25pd4d;Nt-8?0`X**?m{WBvp1PeFIx)Kd$ZJiQUMrYmsHsH!q1<~
z%}g#HIjb?_UuMrN-1*i${|Sp^z_Mej$(Y6Fs#er9HMF$7Dpf-^SE4^rwpMlAFTYqk
zamGD=AA?Msy4g5$62r96$e1D%I8P?U0&|jg-q|S)KS($ddb$FieqU85-U#tYx~-w2
zmEXn?(Pr46HDlE1-BT-;Yjt2aPMnoW%21&Kl5J<lMmV!<1hVbw=|%coD%twY%lBa9
z{*-dVFm_9nG#~A|_Z3I#qy|id?jKCG{yS1eu39MOVKddp8Ovu&mA%Y9&wd#I@gqt4
z(Ho|J&|02#B*S~e%BTAW_1?V6@{H4cjBB>FJRLaH>`1=Oyu-It;m>Vl*aybg9WQ<H
zslAo|^c{HQg?ylr&kggqRbk_h|CBsr)zNG#=a3asiUa;x#jUketc~jB!eG}cak<wC
z8$S5;D@2P@IH`bAa<`-P4M>8od7NlLoSV{^HLxw+-LWy5wy0#5z*F&I2^jf?nPEBz
z^)Fu|G0-uHfpnOGx^vJ3Hkvz>)c`Wu*D?DC=pgL+Aif|)jhBHX<{hTrq10t8hcrkD
z+u7|r4qDvRQ*|8}TiPkqw1UhG#9XoVIT+@JO38S87pSaWN4ex6+HDL9sDMz*TjC5F
zD?%%!@mdkYUl;*bOyMSv!X+=w95CZfYHtcbmuWF<hxLW2fuNG!BK>So6OctyZH>%M
zEM-CNP|JHxC*(CBLB1!7Qrkv3ULAA<5Wf%fCrNuesG)Y5z7oXbGoAi4%ajNstBv9+
zts*-t`^~N66YBS6U!{b21yP&6<Mae)5uB1TDHLn@T(w}uI;}sP!^y0a^**HZDFKc#
zqIB|i7A&@cHVKxAm@YufJ7`)wSgZkA$?d+Du{4}OZ$UhK0o9F<I#R22)#1s>Ty-gP
zp~7VF#gZGYP;`>L6*BBhRLTX4B3SapQ4qNqy>pc%)+swoD9*zk;IAJ|=1oN*bAzk2
zNa#v=%1qPg9;V*z?(ANUfGVj4DzzeQkWDP!l-*myOM&s2{K=D(<sk?6mJUV%ttz4X
zdCFHEylbSC<wQIJ^Nr#X)!c$4Uq;svD5E8YQU{=|VA$>S6d}T`8?QRF<ApS}Lo&P+
z!f*;%nin>%Yo{|91|*#b?G8<O-~)-vpOc2Z+X_*vR)0IQEKaRi`A^E(%!Nh;D_LHK
zc7_yP{}5e1Y?q)bgRa1XxiVc%<29=F7^TWLn5}fJF3l0C)txbgSQOAEt*9gP$wggI
zRK4XLI6jOiKx6%CBZqYx4JvL)dpk%nB)5Uf_8Vl`egl*3H+*o=DR-hmZG$J=d#_OM
zHN2eKYiO)m(zCIrXVb>M;Z&;7)3?d#>M4+PM@LV0_Zo~vx2*oHX=`g+W*?E;^mHZD
zeWVTlp#~AuHQEg_Sv<&bRwqa=?t2!{U)-5*+2IS-g{&^q9yD%zyf}H&J@MR2^e6>I
zrFI(9xN_fp{+WC7EVC7p#Y-<SLLe>7O5oWLzQI9>^I8F5{{aDk)KN7ib{|%mf6eIM
zMD50pcN-s{XZ=P?GNzcDn>ocB#gLQ?f_Iek(m7+YKo&ojG=BcM@$+%spo3R*6k`;(
zMKeI$b25oX&n)*a;oKV55Y~p}rl6i;U0NsH@5C%K0JR7Gp|RBN?zyiTU()o((p<=@
z00~SKi)08T;1g)3(`GND{mra&echOKu&;v|c9%qsY)JBljQXe4Cx4PM(tNS+o*JLi
zG>u`-J_TSN4f_6vHS%03@7YGrw)eKqwBl;;9xl<H73ck(yLZ8Q4)8Ld&NHZzL;QgV
zP%KrD#NjtVN)axq`uhSk(@KByPQRI|j@Ghe?1pyfjhQt0{ZngiOotP7AHuCcxeiK<
zs6;E$3lC>(dlZ36ESDI~^w;)_sEWKXQmW*r_p?`7bU{c4POhw7nH;MejCUr`=M7i_
z2yp0!wx>GIHI$vbo06M=EPCu@Q2{nKu?;sr)L5(Ju~g5q3id0zFIO7jzGwc6_@RXF
zKlXlwVFW)gvjH6!dWhDPlW0m}?}yU#q7Bpb$d{e)+D)ft5<S%E7t;%QhcQM;Knr!J
zpDX{QEcetQlFU8(YvbC>kQOOV4{3>x_E0)Wd*s25<E<&ZICG9|PYYiWPO)Ctul7Sw
z<dC>ii`pKx_PaVaB|Bm5cY&<mXHGKSWDj2VELJe8nz!ufd~?@;>O%xy2ZMP4&Stx@
zkx;N+7~i<mO5D3p&ORMVc(E*apk{ofFLg!cq?K7P<1s%TmUzoSC_YaIi+fAsLCM9~
z*SHEbVPRhSpk$~C)#Ji~BcBWDNq!BC$c5!pz8M`)azT(gPR};ZeS`Tio_{@m>alzN
z3YQr?-xW)&0ziJJ21mk#fh;7#0c8Lpsp26o+MLv7{o`P=T@7XLBhj3fqL`JStilqU
zW>X}`4sZLX_o(gK2q*R)II?$jNpj1IHy64q;R&#x9pLFN{e!9DY9&|5<Dm<7EJ0dU
zPe|H6ke%vmsq!>E7z1U5Ay8rR?O;(nMJMWJQ%cV&DfQ@*q8!pUIa1>ieb@e)V4etD
zWv6zJX70x=#Yx+V&IIk)QAbkKupX}S8{J+bn?d^j-<5H)%W(?p;9xuER8s^X7e|Ur
zv4kh*TB%5l$_HUO(e*(9=qEcW<-KgQqG>!I$KK*cYp8=I$Zjqm*Vs6BgH#S+_ei*k
zso~z_JHOVjUi}_Uw&)I*{p1~*@Gw^zD6%x&R2j6=alpLxs9nc%HC674_+@;JlInv~
zs-P_!ed=hI&I%gOY*!~nV%w>0tzKkk{g4?}vy?uvS@?|_VY7!RhLNoaP56-4i6NoL
zsKQZibkw2g4{*K?;u$bzYc>WH;8d%Xsu!h?`Hrw$^5`v*q``II;6SzRHKdn9io9EL
zay3+2?g#1_ucA2%y>1Q^(H^gK$=)p0<x_1NYZCy|k*18|T3DxOp2>Oz&rzEwk=*<P
zApK~z!#RQ2Amw)DC=2%5HddoFuYGI+fL=88?j!gkk~`82BI<xWZJZ-SbE#Va1q?vd
z%9xV{QmN<sy8b((c(KOMp->vn4@LohCJF$C?b;ly@ku2JRCcSAwO+$eg8UD2{)~kR
zXFDM1KOyqA;erTQr7{VOj@RmB*a3<8K#{qn3{UuX0!Vx>EX5(I!cae7*3!Gf*7Y$G
zqw99a1Ljrr-W*O?*NTSM4_LYjYE9lzEIYYsT<?K?Q$pZR;4?`ut1v(I>G1cr1mE0v
zqqlV6m9<m}Xek6kW=|-5;bUoR-BWZWVY>zD*yz~k*jC3*$F^;+*iOf`ZQHhO+cr-A
zf1fk<zBm_kv##r_@m8&OK2x37m2C(dViF85CDJlhB7JC(?|p1Cu+jq*D(#YK4MY_N
zd;nt}vSh)WOlizmIWePQ{=9LZsd^y^9-ekt8StfWNn65F|5GiC)5$B8-k&HT&M`%8
z%yXR+!&G)vod_(a?>YUVHp)0gCe-tH`yvq%67v1<WPpltJUO{8`0k$=RxGudY9$E*
z8ZsE&85(b?!~?9S{-$OStS4~3#vrVx-(U4bSkJM)h(@hR{GH!JkBXTAYc^GM12->`
z#BC+`B*FVA<{`tTBu&GPJf(}}`>&i5EoK+f*ZnU^;2{1kTLRnD*!cAbfZ{fH1+#-}
z-+1&i%3;f0rVgk1Ve()y6S%ES9jB+fecM~#ej4@*Pl%B6PaFg&H*WDE8V8On*Ra+&
zJFw^Ikc$ee4L0y>!w~GG{y9{}8^-=piY~u1FtSk_Y?$;qc@LwABnmJB%Nibdh4d>I
z5kuGjI2Z@)=KX=sJ@Lqxg-|)Pf(HZj8f2emm8HmH`0*Dz6oC)YQU@SxErxO6{;-0c
zfHCq;e9}Z(di}pcv=yXHuME+!7LyP+wcbmA;no7S&C3~`lKYm&!uuC49VSM9&Z!&8
z@H6$bjbW1m@sS@lNhFWCBRj834?ms&+QX?NRsSqEhAz+J*y56d6OQ(<RE98)B6?3v
zx3?J0uzZl3r&ne~bPuuN3f)dXT@Eu)3ViefS#>sd=AX0$M(QYu0Rb+&PVLRazcKJV
z<7e**jfiDVk7}mKE;aqpK`qAZwHmSK#J7Cw7g!TwIAwRH=>|NDM_!(A${EOf!GnxU
zON&_kVjO;<6L75xWhz*u0VD*;u+ScnP!2|G?=vtZQ1m7xp))G4rJ9ybG=_Z5<o{~o
zOi<HHSF=o6V?#x(g~-IuA7BZsnr>322hg>#<JCLFWF;4Y;Kbn(ca_s_6@VLo5|NUk
z-}}+u>@X3+d;<(<|H9tnGD;}w=j09vS9R|c*rG$rS5h`#P|iB#%1s91!bx$ejc5KX
zp%!NCD&M|=I=^3)vqpFCaRxk}p3`%V|AL_YWZ}Gl7+kwT-1)YMA$`Jesym@cp(!F3
z=Y*(5#hr(6b=u47{6P-2_m8F{&-wJ8ii%B-vU%zl5uO}rhM95@|EByMGphwaouOv<
z?|A4S#JK)~qF2EtfIdcRspAyPMSKXy_Rf&LRCfOrU1{$}43^5Ee?_Yt49|y8h3INs
znh2-PfQulhm}A%?3XlHKXHg{iSuj^?AztvCxPZ+9GxMjgHMnj&q7Ef3Nbi31FHWAi
zZ(V;xoosqg8A3$bR`>|!8;sA=45}X#lJSH>f(Y>Yn3?AUjX099pSrybUa3%I<^MRb
zkY>KLXg`xQrK<5AR3#1#t(paMGj<FTuq)ZqP}QcUA=`rz9=;RReCpNJ<<<Gw=~ZI_
zHCM=4EDZX%@h=HrEBvNpsgOc21hrHI$!yWE5dW^D9nY|M1j!drZZ8)4rf>egf81#Q
zc3YA@mY9!-NqecGtcuf!cK>4ErFu^wbEUZZ(-tbsn5(dp^?bZBL1MQRgk!o1tLP{j
z9dG@>M8UZ}6P`EFoYOYXel*PkbjAIBL0vM)&9JGsd=vJToc(QhN5NWdyRjf`cnc`z
zvXWs!+Ec1rIi;rfSfnDTLFK~+U|mTeJuhqy9mEDf612*8TW@68>v|vLMi+w&Pb}%^
zVm;?P9bB|O*c9*8(smBFW1IK85NIwI+wLB}Uxi1Dd;Vaq_dY(rpD)Hw6CZ=^j<VD8
z#hdq63&(w8Y`ISm*|AzZU%3euL@K{Nb8+-4l?uFFp7No#Q$VMP|Jw7M(mn8AUR?zF
zBnjHRGz`c@uBkX%H2t=ZN1!zU4#=|;%lLalek_ee*SLD`YW?&TKX{DsVsOY?Wi7n}
zJ|x~Z4E2mc_^FZaE0XZ<)X1rB^z5whRa`WKCY@M7s@SC?hc@Ej!i>>~!Pw#vTLR0n
zP|n_Bu2TFogQ~7)IDso^>91l7)GAnNcnodFYxPxa5afmP?v(DaEq-l9$6+ZKs@%A$
zYYB@GIlz!}wY3(dsG-;cTG7IlRdo9*^pv?^h|O$|!|34`B7VK<26OVUtW578uvcRE
ziaxx%Y#j#z{6d7+NpM{dn$6A-dwpR`So&^XucXUdM`l?ZRT&yw4@RH!kXF-ZpVUl;
zZXOsVkeHF9DP0p;x{i(}yK^|J;mB-)zB<#+ydE%)=auMijN+f5b<fluJY6Qyy-OCY
zw>^$!7B_r$O}o$*H{xwOIFR|s+U0LJl=>GW9p04g7v=rE&!JRe+s@|Hf$hFm7nsZI
zSxtQWA!J`-aU=bcwDitX{C5ccmY<!^_XlT)5frKGv#BjDZu>5KbBnKye?yI8J!>Q+
zyE3nj@Ev2f4=wLIuKP#O&jYtZvh(+1*)8lc=N{vI_XIy{CJ;t(IkPM>G6rUT<pk~j
z#d<*^hFZIOnF81^sDc)pRbh;BZL!%+F`_>Xx?bphkqL77G}xc^_BD%PBSz;@Q%G&a
z7{g-5<wydat1pfp)5mAK^M0-ZG5L)LL<trrW+vPKqSu+vBdmvX(;3?=fo?Yl+Ii-Y
zYs2v^e#_E}rw3!hJDL3j(&ygcgHDoHp}3Dz-4SCm+Av!}JvKTXgJi>-4ELkaz1sSr
zSC~OT)L5vopG^T}3P8%|4Rso(Gx_NHc{757A@D~lwY0AF;=?<`2>i^sceO;ZR`I-o
zDR{r>`t*RGV-50;W}LooWM}M|=AWia4hm*|c0BX)!SeUyfB9L1X8-tYt=zBjcEL*0
z9~0x}Y>avnyApiAk>F+?$?}>@WjomVaQp1G2*aUcF5B!O$@&Ytlq$rq@3TcxKl|PH
zV5Q5(db~%y?{GJ;KYwuGwqta29A$k+ZF4C<|C-}){NbO&LIgS9Wbky`6yYQ-t3eR+
zmDmJ5ifg!7(m;ZNTUg=ojPi2=5C<@ZS0CGTIsyi%qkqKf&mX}eAp5sjnDo4pl<i{D
z`h0*dow0s!&oLvaU{C)w1ix!Y>*CK%>2Gc><;C=T5i)elj>D%T81voag_zNM{r?%U
zPUNOOsXSP|O{v9ZH9ZnnwdyW!&<%rxYyuSd|ALZrfdo=gEH>V$bGY<2BC;BD?^Uq0
ziF;hg&)#SB?ru(g6rJ|rqLi?0uRB+pr=z`tVu60Qg_n=pzRd2nVIM0k@3|IM@W1f5
z;`?tT$Q*f)iDHpB%bWMl7`*QI5tYZo$Mn1dTf}|n-7PyZJWIcZqy^~tQqwaFV;cWF
zUaT?%0MHh2W56%|WZ%q=rzD{RgdT*thFQCXv|M|1%v0pN7AqCYtM*ePBNmKbZX`9E
z{EV2A3PgG5(q=eB`EsLRQGY}ZnvHJ=9imc5p=btDGKAGV{diP{)t8d+DUbmb#+l@V
z$7U=sU;xek%f&KWba`AbMnj<#rb`w#sqTT5;|hMUZt5+^{TGULwMzIoQjz0`Npz-_
z8M!K;8+D~ioR5-B+ZqcZciHm7xKvpunqb#v|21L<wSSD*(2u~Y+b5Cwq!HZU$t>aB
z78C~cQ8JuT#_I|Lq<C){DPt$_oJ7Ov3{#1IF=HM)l3WHP5x(zD9><B!o6n8-`K!@-
znj^F+4V>NB3X1T5V;cD!e{7{<{RUA?^{4cbWN}?=vty$AxZvsOTrb#2*{-dUO(5V+
zCizhYYKI9qX5R~pc9%9^*JWSX9v+_^nRVOi33^B5RTns16mYOUt@ZCFHYQ4kCgAEf
z2^Cm0VI`Iq5SYJjQa$!rSXO&+(mY-dKMLt<5Sy=QxZhyli7$ApRc)wdLii8)X<wN3
z!(I@K23u1dt?u+VSJw2yLil6xmuMShYj-rPVjtpLF6N;s2_#4oNyT6<?j#R}%aeqr
z)6-LLZEc>;`u<&SD7tKt)8k}s^=Ez_9BkCvVhXD{A<x*Qj626Oyq%;@VdB+_Uy^U3
zDoCrNsk%UZjm2-5$XU%}CAW=rVk;Yf&SFEE><*gpo6N}k?9*7{tfD{8$vgv%(Z}n5
zK{A!4I>O~n>_9wp+wH{)GD}2I%y(<%)3<UcpkM5{o0R}$73O-vx>9WB_(khw1uTTB
zn3;kbH&Bi@A+!0P^>HUm`!jidLgzbgu6HfB7fXHn1c$RHzl4};uzpwM$o6^^k7W`m
zBg|ui1IZOPL#pB3+hQLy9RcYK+37LlIfE1ns9{XFUwTlGxPh-SNN>5@)`R;RgldOq
z3^}!kx69JU`LV0*#A|m^**5=r-a3+?Gp?nX-d#+e)^kziH58bt+R6ZHZ`(sT$O$ur
z8=Kkg_qur^R<pa+Ho218wp>)3OPXrBs1BP}=v2NnKPs2l9j{XEk|^AXI2e+A%mhH@
zcPRp!V2koa66>}%PuFD%n8@^!<hF<^?;%?<`aZ)PaxitH$gXDPqsy~w2zqs<cPi?1
z;0YE|A6BBR@ZWrbJF#fwA$3tsWgN<|($lvs+q}pW=z4<@&3&l)Xm;Gi&o*3<me<@L
zWi>hbFt$&}++bQBF*N;I5sTmqp<RwoUe>W>C9KO1gO2@-un^Xwwrdw=CVFpG5+bVJ
zOr!kIk5Wc_>8U8I(}g&w%UwkAMDMF{^70>61}a8;nrt+6*|cfstJ9>I7=zP?X&99O
z+C1ag;Ns=B@AX8li)!Bk(??HyQK=>9{7=kjHWI7YX==cW+C(!5W){FYxui2zAIfVW
zw3Q+1pzw>SU+%06wHg1aeFc{e5X&p#(D(up>m-Rk(NrfSm|K(<tyth~Sq>|d12Op6
zi~16+E=-xV6GC2~)+oBUs0&M=ms5i%)CZ=wcMk(vlTQeFeIB62pHW=S$Q<Dyp*Ma&
zSK1ppGNUzE_Yr-n-|pZ1phGO4N3~E+%=9w(&-E3RZcp&AEz`CY|3kSt1RjN2m(|U3
z)e2Aglb_Pv;$BY+kBIB2<2U`RaS<M{Tw_=`?<#TOv;Jut@3cI<xl+{{yVYL%CxmdL
zfU|Z<@!2l0kjcM*ah0DKyQO|ZCoYb0M#H}#DR1ttv3A;Re~An}-~`ES0;cc6cto)2
zBD*<Y3CDd0ra$=?gYZD*#5&q>&l|l{hKbz~@lb)@0lZsF=iXoOSH~UWeS-ESlM0+r
z!JJE@ZxAfYyy*(M7)fOC3s%{@8I0HK7_=tJ<RXd!ZM#XY;a6`PxH;E5Ya{H%v4RcE
zxl52xmCP)g@XCesrq?Kd-h4e}YovZ!IE2{vCe$uT`Dzv-(~u{0cDhC<vE*2_TJWL1
zeD1l&vD{M>9(#l;L-rC%ZLr?_7l`V!78Q14v0%o}UW^A(b4Ez@YwiN|nlKglUA;df
zK^v3@ml@|lq_#y}Ye~2|_jh5{nhPZKmri$@u7T|#$a`Io7q|wCfG!q82H=a)%@2C;
zJLQ8!imHeXgpbbKdpbTYg+HIf*FsrX<;@l)+9^*?*id5FtCPiYZ|F*T6mUFu>U!Gz
zoW8w**xp#Zy`meALZmx6XrAiL`0a@e){bC{oRaZ+)UKVG^{e&f&TW5J{^88?^>gn)
z_2hFNeJ*ZWB$CX%0$II>raFNI9f(&BS<yX$iKG#Hw6u*|FU|L8zn?0D62%l_BKV?#
zwV^B*BxY$I53h5yi<j^TAHQU;^QRDbvGqRFdUev{$!fjP^qs4{PSe?HtJ7oM=3r3k
zBG0mVGXVP>Q4N>BF+?N|!3n7r4)tanr%=MX`%=MkC-NYP9&_LM64&3r>9UU2$6TpI
zb*S#WQl$O{Ws_%;jcQT;g+Vl^QNOh6PN!M)s_+yIPA}zMv~|Et$5cjxvshbu!S@aG
zCZ;oAtsG|x7}VjcguDCR+6Zwhbi5ah+|pBOMz}&9U{IQO_NPi#NRSb2y-K|grkWF6
zN2kpkt5i|KSn|0~b6ix=hjS<IL?a1EHY0E*{|qX_IhtKZh7F&m-wc5{yLJ9Vp_9SQ
z<aQkAiV<LML~lHYDBfbaCI{;9X~s-l9J|0w7KXku>0FK`3toge>X4+xaOSRl<@FL=
z4v8Fm9AO)B?xtH-;4h2TLPR5x<dF4Oym8N2PV^oMNWFG?ar@!nH;f(n7&A$nDar^S
z+2PJ9dai?EyN6(}sV@WV_0M+1ENnpBnCXSfScARI4&~{P*kqH)gaT`^p5lPHt(>ai
zXw=Gpms^3_RAQZb7}dpDH`k&BttQn~G<S22d~S4?pVKFrM0|_+VETh~hgN{o{s-4W
z0V?c7q1=QA-UOrbF-n6tHFtRfB!x{IuiI)9O(mEUzvOI?<MF&~pP3*e&We%Jz`@<)
z9<W&c`K<#ICr-1BATyCKyztF+U#?ZA1IvcoLU2Xk)pCG0Ccna98#6XL1Cg?k$v{M%
zHNuM>#3Z3Qo8DdvD!+eHmiqg_;zF_1s|S!KvzA5T+oWna`yz-&g8tEPx$>{4k01-(
zRM01TVXMIWF_gW-e_`c3JL063hD$2i2}zd|BX?{flT)i&xxZXhF|f=-c1bDlHTVq0
z6!i)<j62d?m|(p9TdSb+N2!z!i5}q!J;TSML4o>$H8;34D4FbsiK4)Ji|eiZ%xw~4
z$>SQp`L&3+S$p*zZ_7s$Zcs7{a|@_oW(nJ9GunQ&BwJ!Ab1ah|XG_;&X55n(e=)v7
zw;#;DyqZRgiG!=48>1*g4M)p+^<BPlnf}a-09)N5YJ}mdHbnBxd)1$qC%KS<=g4pG
zC7X=)fTXQ*0{Sg??(YYg{f*66*x}=A?X2#HN&0$K5uJ2ro;DKPFnRpb;!UG4VmMJ}
zt1!2DL0NDZ4B3p7Zr9`>2obL2W>LlhwS6+qp8Wl9uMWbWO5N3sd&$V?sMcYhwDdod
z2w&j@JmhwwU)h0_!>NlBVRM7&soY}$DtF@yn6=|PKHHsM&R-h=LkD>F;~4#C0$f+6
zn%p)TGcPBlc?8454|KWv>FKSUKBsVH<|f&Dv>QJ@9RW|a&l95*iU=|Fy3CexmBbUQ
z&p{>O{-txq=Sl8UcKq^cMI-x^7tE)%OR>yEHur6Iyl$BJpMK7P1VLnkHQ*?~tvElV
z@6WQgby~1&@G*Y6cE_VXqT^A9$KK0Fr8#B*<a#&^&L7d#xZ{-t#FuR%bI6h`lFiwF
zzlqqN3axL!gq9N5R?hG(h1N9Ya+5-Z+s*dbpkoJIy4$VSRqF0DE7y_#^1#wZwHALJ
zWx2}GpNFQzRwq-hb}niO=w-<e&*MY@H5)u1S{-ZMcO)zrC)JQNzbKUrb|xqWlUkah
zWNal|tbs9ilrEzaag4$3r?iA2%!}In#vI|B@{rV}7W^AW4Yx{e4)G^_nZM{Gt!2DC
zfB)2OHA4Jva-E{2+3|CcD*Cm8j>#(=>IvorhVGd_<YMs9`Q6wrraCkWm`N5n0r1{8
ziARsEvj$ZHx}^KWTL-+Kj{%ol`OI;`RD8T)kmm@6@U~q5^{p)*q`~N{T_X-&WtKmf
z)RnM*QdxDduuA`^bql3Ja2o}kTqaNsg}9{l7MLN4p<pG04hpOgTH(#?_NthHLT|F9
z6&aQ#_|XjG<!c`L^9i2!tkYbA!O_{RT@XN6T|U0^V8fXOqX?js{Uji661H=R;Xzi%
z{>^a?5%!!32xaty2iUKx-9YJG#~ijCvlmwU{zg2-9FDU5&_WN(F_JuH$ftIGn_&4Z
zXBcvm$~Oc;gBE05Hu}}>jswiQ^$&!AK8A5sef3D>Dl|P#L@-YC^m9C3b_cFE?*Rl1
zJ!iDjE;#cqN1hp2T6}!uIXM<+95Btt$NrCV7&&cQCp$3TKOVG0F~i{)2>o@%0#)QP
z*r?;i;tT_G+!q%=i_FGJmi_QZ%KjA7lvNc;QQSb1(;x`B8mm?LH7x?i;X`jb!Ph>F
zKz!h%Z;~~H*gVsxGYm>!p+Z2%l}QPfAQSr&#rBt48HJF=N92GP2OBuex4#9)g?1|+
zj&6OfnwZ?U2X+SOc|Rtd10|gK&CzH#$NiMD>I9Iu@DH<dXtRq_juGSEuw>sTg|r(u
z@KWn|M13pOpC&KREB|Lpa|=^G8*5$o^X6OjGk3X;dfgEZ)%(}qGn*qd$n@e*0GH06
zLnm57u=dbeQ3VEq+Gx5*H@49NBt-HAoa5^Gcjb(b(6W<^Hh=?pPn-f=^xsvsU<vs_
zYv?v+z9@+(X0sbV5ZYzyf3jjJnt~ybW0PEkyEwtmla83bc_IzCA)oumYs_a(l)emx
z?1;B0az1~8(u|61bxy_%E$YtZYMkNqXS}B?6k&+E;AM80!RG&jwB3RB@6u~Cbt{;E
zoqpui`g^GQ{;q?MZ@tkg=}2{iI8!6X+BayYTo*v|q!x@DY$yg+eq&M*>z9Jm)F9Wh
zQ4d6GV%*92ez^abCFjMuy_<WYUQd+s2^@yyxaMGRFAhsH0gdH1B?Tn>JwedHB>u~p
zg*o2Om0|Wz3X>F?oafCJmou%P)7p@Mm9~ZkMj=`RO$-!yMBwPB(>%FsT7^2sXu>g#
z+bu)~ieG||%cY|nVIq5#|2&nv|2-ZD2;`p-o~r0u*-~EUIKu=MT*~%d_9xCvO#{t+
zD{{nIv0x<4vL&o4k5tJxDBIP(uG*6gGCO#purJVQxb%%C59#8JC$<o&^rUo6fE!B@
zSG$M#S9z9>WpMgY9Wn2bU;jM&sJo!zZ`r-{W~<lz_w?qE>tD+H!p8cwY@(ACt=t)D
zcCA%WTQ%UTS-)GVnO76ZtUK|%RI+C&&iV;e5sXEXuLpMh?!?uwXxkn46q_wFMvOS$
z4y$P=xiU=~tCmh_?giAssQ-zu4%sk;$VYEhVEXyGiX}cZ21+tHq)|@m0-I2{QL2a`
zjmpq2<AzJ=cjuM#4~KOXbt0XGqg_M|+$C{A;ur8=puD=Vx<04Xn?TJ+#A0bub(hrl
zB|uYiuvWNPcI*Bw6wa<kvvQ2=``xB6!1H8xb^0^M7Sp|(@pdvC^14awNg)lXP3|2m
zzJ|LL+T_#L#Ar7;LXQ0X9!_wtoSwtuF8B<)YS7Nab8RoFY@VkYV;}}NK|aDUHLVsJ
zw}QA;kcK<lD=BTofIO>fxnx4H8;t}IxIzNa2Y`E=BVx+D1!M)ON2wvTlxf(;Mkgdq
zpo<MlOi{38_77EBqrlsiFTs&!TulM$q8lEaoL5hdHp}XZe}I`x5V;7&FiW5C*J*!O
z4IJv;hG>PpzJPxI2~khRDzEXSCye162X%PN7hsNIqyI3}mrM(hrK-~wjPmB5aA1mU
zAGa1*=a#{3ofYP~=f?WF)QJ_dq0JTk6cQwL#RFrlFvXwp^WwQWzi52(SEI)iCKd7t
z`;UgU@E<}}z`oULM(r?tH6N97*9*CkI3Av;&Lme|`YfY*>T>N3bBLYr1DnT2dYlEs
zoJy9FBa`5gYvAPD;~z#W_KSGKLZwxl7f_Mz@>^_;YcR^*TMxY#HR=kPRKQ{b;R&iG
zBZ00mENFvV`hUU$r3pBbwS0D(`aV}Z*$s%LH0_sA0A`)>M71yHY_o}Id#726%sJxb
zRKG2XK6g%!6YWcYSTQTRsU$^z$k0l&aQyNiZ3KO_(t??VN9qA11rjn+Mh5~atnYGe
zh7XpcR_Db=8VMv`-7=+OB+@8SFoB9*wxZMD4i8HrQqW`uDB&mv;0&ltD;I$sl|dbJ
zVr=8Z_X_(=cAQ++h3T~y=C<ivovy6%y+U0bjn{FC@s71q*?pV|kEMI}>9z3^;_(t6
z>_V^UQMh*OQJ&VIOrcW3Yx4)XDE93$+?^A%cJS1W%TU!3QL*;BTFSXq?2}U&n?BMS
zMr%pLPm{v7AZPR8shpq~5?jW*0Pl<ch~Dt3%;S}fH`LZTPFgTJ6m+>}J(y=K3#g7f
zgJ^NX9J+^{8tuM~oSRmAVSwAVX7k$apeDm=_6QnTFe+`EqRO@K!WG=o_ij{a>B1Zp
zy`HLxr@0hQ$ZYW^NXHWgNUPYJn#eo!@Spjo<U#h8cGFm2=Op3T`b^KNy51!}@{-nN
z%AC9<9;ZEfYLC8P1<XN}y|AU3yHq|eu6+2k(}JT_NsTZC?2HJd2z{jmrAy2ENLGnZ
zHvX82O85S}q(d5|&8z}+LbwMMkEDX|f@b;~f9mPOL0&yuSIEF84zX2-={LczQ|iR`
z)D>+6l!nPfc|vUFjfK_Ji?h?d!rD<PX*w74l?|#NOXG@^HL331NxY~i4l$mvzDKG>
zxVNyTWzS_KZ&M<kOUHPy(%$CN!3qa-BUr>)`L`vozEH0v*cN?p{|yY=^X|h3>q<6F
z<u5#ZiyLzr*P>q8y74<MieQFpaLE<v`!K^Wzw>Qc&(F_p>uqK}54YVfa+cMA6r@(a
z$;D3A{z+LuO6DhhUg|Sr;LII!#wj^*J)5IahtWR3Tinc4ML*52lZ|oBNaE*N;)9RR
z!O?h8M3#kmo0+1`bC*+9bg^)O`1W`iY%i#Eoae9jppfPo;^(Rsv1|8L{v1~>+9-2H
zvgnd?Et?%LiET1utV=3w$Qn{!UFQe9k^{&7y*>b;e8r*Pb;l}r3r$nXpn9Zj|D##C
zWa?+cvnG?B3iFn&XexykyIs@+ybxu!oU~r`JM|eoA!zTn-#VV1M5@zu!7}C4O0Mh9
z7^UsR7h(LG+5?A1T6`6o_8m)YBkcS%eYK}gp}jfGc6O&Qjxn`X)#Mexl_3Qd!vrz6
z6%oDr10*zlF7Evl3nB_D;&ME5Q2pEZi3+hx<}`HMW7x=%Pp9vkjK*fa{vMWIwkHWJ
z!~k`9#l2jWd@WPR&9nla$zZ!rvNXExD}kV6LsVi=kT!7Rpry1MW^i|31gThsXP;Tq
zc2$s3cRKWD4`TG}USF&ANc06^P*%0mjJ`n1Z$|Igu^Rt9NINYW5)gC`tAYxOk&9RF
zGvi`k=mDS&iSu}gx|lSJr;TS#JRHWpH2g2!{3za~QNgunAoFUGLA}t1LU@bXau*#E
zyzHx*xly7xdBboenhjKI*ff2<%={-9JnQcHvF5Scb+5wX1J;kTF9{36r`=ms9AhcC
zz0)qK^F`V4Oz_7m_~^r&3>8oABEp-=Fbv<;5e<Rlq25kN@(wp{cd*q3_Y=HGBkzRs
zicBZpKpDpH?z--JS+N4S$Ix!y<^4{0q%MI@a)4HRDKK79xRwuPT${NOd9)}^060sP
zCJ6fCDVI6*8D+bGzv`JKlG(xKA?G;XBWk|9L)8qO55_H{(ZRqvJ#Zb)4RCP-<-ask
zO7Z<e-tf_~1X-4BWl73Iviftubir;XA%%QZoJ+&1W8?|2@;}%?<ulRUuoT~~j*qPF
z>YZ@F=sKJyLdw=iB1LAEbQ7n<(9r=EM<_LTJKKnlzY4!Tn%*=e*`$IOvc%|eDrT@x
z=^&q+w;Qmk3=Ca;AoA-bl%M&B1LGY;9;N)#o9+331m~od>0CECTOh>5&jX<k@~Z1M
z@0=%+G651Pfyf>yH&#~VtzEUg<63}>m>OqU=p%eDisDFCql^wJ*gMCa+=Cc)0sgy!
z?<WL<(ze*%VOU=WmGastbTJ)xB0Y&gFb&;+g8pp16~*2yDd;u3;qsu^wir_`PPi+a
zP7L(6!+@a`PwLI&h^6dm0@GPGFwa4I!V9U@gXgfqS6vd~O5XqqA)ae#f>Kx=6Nt4N
zsBBL4ZXhSSN8^8Epa4A?ljjBPUuL_)D;%Hf0k%7gsx=rZdf1A|ht{xNcRVv(m+oFI
zULl9yhU{nz@wlYlua?YLLCiGJMNr=Zh^mp{vNK4!*Zz^jZc$_4Wm~hXS?b+jih1dI
zb*Q>iFntupz`5aZesT%^==>~Yh7@2beF}>x9H8Z4{LtRY;x<$0<Lc7`hM>rDDiFui
zXqCS{ro%9^{zEro1Dsw7g&Yq5wT$pyGD0Bt?-{Ha(!8b7D<w!@<uUXLidb<q@w)3?
zN+LZ$cJm@m@HalKRpr)!q&FB{;UyiAlR^H2ce|Xo`{(LXrSn^eg~=cejuIX*cJ@{y
z6E1(!W@HZ1w}(q=PXs&kw~3`A$x|-KlO2(90<4jC3$Zvb7Q31A1$<z<C!Qk%<-loW
z5XYn-J|dBdaCaStE55^aeLRZCRZjQcynZ8P5Iu+f_A#Nn{-<*Ao@rI;#<em5jLzj%
z)d#YM@=^_IbPG&<CgM2_gzDcj2o3j9B(cAh!>547JQ;=D=9bW~=ax*>s?lT-^|&gK
zs*-+o?mEYlRuf%r{)0CsLe>h9LA<jXKwD-gy$6m&V#gb3rp0pR^DDgNHzBPQ_6$Z=
zEu93%Zp%^;AMA7)cxRh}m(G$%wjzhhf?J5tE<ODj1D06o%KQ2U{_%7m&1@m)rHgdg
z%peFnRWB(~I5fUzxpWPCj}ub6h+0@ps--R4T&F#P`NWMcpFApJy=ik0J}$8cgt`=N
z7g!4Ns%%dI8g?X};jwDY4$dshLz-AmA0E!FFm^d<V4rgW$^?bL89uplh*1yl+<elu
zwej#-ucBSA^Ju!>Nb>SoPjcDXZle<yE|Ar1)-TfkHf-tnTn}~#(3xD7wD4~ggTVT-
zRY0a>a=z`|5cBIewATcsw0n^6=$Xxw0d+*ICkUQtMN@(9G$BKi*}XQY;79Az=$jn$
ze`Ms!P?2bwh2}&tF?nKxb&y#~Rx^@NoJNa%cO-=Dn`M|Fc@7@L(OQ(CY_KKL`UqhR
zC?0&1=5^r+eD`u)K`tUJAWe*Z)J$F3wYxfE#frd1=gcT<c?F-6^2xON0IX%Y&uHoX
zB(UqV?_>KK1Jp$Ecn>!<W_&amf~;b6M1xVvl97D5!|MERx`xtrei{FCDKr7(7xQK+
zSDUVAIOaNyD~cV8aoNl3E=dG*i1E~zH9WGW(K<1IsZ0-NE)5o;x#E~F1&>az=(3I7
zna?z6zp@?|ljr7eIknrrX4Noj8;4nXZ+v~&?;n_6twP#`Jx6HMG~sw^e0;<@7bPEI
zK~f=@O$0a7g|qQT>~&4?qI#}O{f1d%vwuaRVmw(Ve4ReemM2jf@_o+B3O^k%%TRi_
zC%=s~KttduQ6zzsr_8R7i0XAtoEb$Y(2O@ejtLda6H&JrjH8xqw4H@lEs`&AU(y~Y
zN|Lg#6()q*L^YLj(v#k#G1Y8qYC~ddPhzpjQcLmP`TB5~Zh&DKzNozla(qCbk2B}!
zEnPs#97*AF8;II^akalI@~~RU^zNfy{^<n#K&yaFbHDk}to@22lr%Ady2v=R2o7z#
z4EbzUeH<!E$2_EN52gMQn*%aTf5c`GY1$Yxj%OKEx&-~hWe3T`fs8<v8#pq35BEtg
z*L+4dQ~SFzE@ozS$E4e&BJH2OOiIl{)mcy(``%&9d|+z5f|^7in?G%0Wf%Yo-BCZt
zUK{g^J44IaXJLkAkqx~%@ze{rQ{lD>Lw(9CXzc(vt)NRgE$YdNqR7(mpVlzBh7%Ru
z{5QFQAi$)Wy}~WqPV8R4onKY=7-I+w-H|zk;Lmmq*-{)WDQ6I&I|l;i-7+<aOO67~
z3^8X~j04y_(Y~!O(#Hg3YTZ5PI$6n5%qD8zJA~s5-NbN^L8T5YLiBzdl==Mp=jjp2
z>w@9KEWkH@p;6@_m}L}@34CcUVpt!0U6oIpiY6oR3Q9_vq!tMI?1!lu)bCh(hRI}D
zqcQH?h9{4f*Q-f_!@Tv(2w~cGSDMor{qv|7UoZ1JKJSEfCEd>Ng!Xoki77O0|E2RR
z?V9GR*L*t?T?mhh_e7GqlRCa21Y=Dolc`=oe3Wk^&rHcTBpSejEHuYQM*5ij+9*DK
zNbdR6@NNR!d2F!(v{~f(@Cx6M+ZeG_GX|qCY{6vbe#zD5Ga5AsEhgH4#h?z)1z88X
zgub<wfMRe1pG9kK)RAE-v!-K!8PE0dM?*@@+dS7550Ac=O$CegNWy+iAr1my_*rT^
z_fN_0WUQxi+lowe6Q{7bSMA|pj8zRu%28`w6^kCYc?vJb<uFWnjc%BowKDP$27E)U
zaK{B^50o|oNtE=vajR<0MsafIT%22atx_Z%A=`YTqud*O@4WV3jJMtSo~RB>hl0_<
z*;ERnop@@XWBl~uKIqbBLKyPj!+-!kqae#v4+FE;ZE3hLe@f*UD>l#aqU>#`AG6tg
z<rnmvyBdNMaIPUMrZQPZbkLu;6;W)x-?d#|x(DCb0BTr~FcQ?F3o!$HAp+l)%FG|K
z8SixOAK0{-P=CZ~Ljt#-Wn1Y&r-X{{m|7?je<e|>i$ZHHW3g7Kq}-ZuzhFFNKibH@
zF`GExC#A_tG(V1(UNJs65QPG|RT}aVuEemMHWur_Ozwu-u@v+-t{O4SR&l@XUNeP|
z{!WNxdsjH~zbrp5PmSJpxvNXNPJX^DK)6^3LZv7=)_9ZhuQ9^`c2l62#dQ|w*6>QO
z)bIa(ihI9o*Pl~t+O1Tf<=WqhuJlONQ)Ci1VR=&a-R5BTl{Yp$ql&Q+E7t2Fz8tw;
zw68Dpr3Z$k*~LV3%OiQt^v%ruSzY4c9PtrQ)p&xJ25jS!3M-iVMG|rpxOx43vS4V;
z+o+8uRHZGc6)c9{*nS1OagU```#&zNnUfge>S)B(FfTR7I&C9)ri1FIUQifz!N?bg
z{qkldV$In8rlbUz18#LKRVtasOy|>)+8OvGA^IqNQvrfT^7}S&E!BV=wY2iZu#wbr
zDHxS5ER2Rf6&t^|_zvURyw!_23~qbiU;%`eR!jW+5IUOCelu3t37tSQugFb|DM`2)
zwzH2c0B1qAhLX<*E(<ucdP1Whkl5E};iuukUW2}!Tp^&pS3@1a%@%DgJtmW9_h$My
z@X)&PRtiLYi=1HRSp1o@`XTsjmUS(}k>d(=r>*h?c(97$0a4WbQWpDvN%C0;G8@<5
zW*ZQ9hau@>L-3mJsQ9gpEcuIkK-G9v$D)%dVjt7#$UuHg0*7Qw0RaiR>d7L8fWi^m
z0^vNhLLbN>L(b%Gr}voj8q=6T;d;%$eTSVxWV$;^xYEKL@DJ8VIQ(Co*=8WG>$Yf*
ztPUCMTAb@YI<w-B&iu)R>k0vE5B0};s9O@vO>@YU@;T+;^AAdy&>Y)^)CxqDtXdCR
zsTcfcYL7I2w({ng8hFQ-X*gwQ<iLMrX3X}aUQk9Ob*)~jrFPB~DrJPd3r3Olo5pcv
z$9z_yUnBi)T{8PaHNFACgr_cJMIB1z=Q>tjco?Oqy6yO3pdY0Rjdei+aw{ys&GN;F
zm=SYI60a7!>P_+`i6hs2>UQCh+7ZOIHY_#H{hZmpS%2fCxicO^D(t9Y>w{$C$Lqdh
z4+jYU>n}&^YQ~`d$6r<r*}<LtkH4J5_rLx!LfLUMJlHZ#mY$xT8igE1R7A|vQ>Ux7
zDQG0Y)dCv;04P}J;^Jg4`oVx-$4<;@v+}k_?CzN?bBrO$9l<(>kmys4nu^;!;HB9_
z-(X~9IXle_sTO;)SuusV1<x(l*49RX{T=m$+&YvDZWH59r?zC~mSQgmtGm03KBY4X
z0bT666bEdaL7rX3%UD;E&yP+A8Cq5K9!83~bjzFbu~!o=1*0-4VVhH!;?HqWQQmI^
zr-4gY8Qym`wOaNW7Vq9ED?%c7?EqCV)8nG6s;H=~<#`_Mn(T<KrXnxuv%HZk`;=ZA
zwF5|~r%Zs=nQZYpdr_{f7stTne!q|X-J>_t&D9Z)<`&s0yjOrQ9;pIuc;Xa&v2Dz$
zH7K#WGmoB?poY@7z+Y1W^Zu|>B@57e$)0#-6~~NE9PT+(+m5wwfw|4}zv(*npd{oG
z4@atxI=5}#0&j2bIYn<XT`sf#s$;p`b8=4FN}C~Vv;HW%Uqv_j0c|czQY6k`Tt28Y
z35LoK@!=LOq&bTMUbS*(P5}VWS*NH6w6+kQJS)=y3~2N@-2szWj`z9DQ2@JnzYka8
zj4pC>6gB&Jrz!1+E#lTAzL1VeX`B<k;o+BeQa8OWR(g+S*7?9xpW{U$S!i?R-dlh>
z=9S}ddpq>Khy>Q;#qhe2S<Lw|@wJ{P@`i~qb&Jw1znMrIyN7aOTTOLGJ@xz4d%8<!
zBklXullLVv@4H(ZS_5(4C@?XM>E}eygJm4x>~haK%k@W;;ner@B1C=iBda;IKb?66
zo;I_mrRpp^*x;buLL*V2Ru~8CtemJj+$mV1MtjM3h|y&uO>0oX`prwqBC=yaXj2Lm
zz*?|3L;_ja1A`I>{uUM&2$78vRJgcE9$plxg$Gszw|8VKJ>B`CfT%Om;*ed?4Aapf
zi$6W7mErR;(P`gie_bs%2-I&t*fOtrLp5-S5~=Ak82U`JoA!RUvR7#e^S5K3*-<`0
zt|~z4rrRMTEM-mE9Mag>DPjRO9ZNAW?gjJNx1mh|*^FA@#W#1`ceF}oE~QQCM0)6_
z2E<u#*+mhi?;IB~S|NFv%*$9_;dPa=Y9s36ObYq4?LZ+|xLB*)Gy)+qnEPRELLD6t
z`64jWkwZ11G*;(B`=1#(Bgme;ipO9{IfeIVNcO}U)a$|WpH?ha-FPuM%dku~AbK;i
zI}G9SsxckYXq2RXu$pg(5M!9ZclUqPNe#(a7nMQ{>i4o>nQK+=l-=<(!NBScU#KZn
zN8Vq*^<RU)_{ak>8c~km(=UJaMt7sXC>%6MUYYqL!W5R{Pu<z9akm!WnY44O%k#%i
zw=hMD!j}X?3ozh;#(XkQ5v0a^zYDlLZ*8gRo1OA9aA=QQE?<921rUCB=$0Orp{xfA
zQqjR!T$eVq3^<xM`ZvUHCR0ZO{VD)ImHvn!4;?b7eWsppMCz9N6W2}yVp_hd5>Q>D
z!!D8^HtQZT$4?squ|oq4CWY|2s>)bte4InEc$z#nE(dwf)}+K}WsUxVoKc0Y<6Y}2
zxY1njr<Xw%Jk{fAd8je@H!bq>JhOb7#hSf5ox(C=VSGc*@+`+6qa=(i+pMREYRz3<
z3|l@hV}wGKx^&!6#DC@GwuZ<lKANh!wsr0E1yz}e))HZu**s9}|2WHFd|4j)A;YEW
z$`!V+2*}hz<5uu>L~yxV)DcGP`hYoO5|`NQUHhUHM<Yu#_Mnq^6N*AWMH5IK*zxz@
z`Dg!N4pT#&3j2T7;#WedJI^Xwh&zH(mbdx>z?)}#q0_VlruP-YF<{MFt7J!{Z?Twz
zXwmJ}w_QKK!rrr+CILkGdn0K$)TIVgQ1U7_C~OZ{hZ;ip@}1xSY$wne3zVYOP?oYf
z)1t7@Iv@Fi3hm9s#iQo>f!b>Axsv6zp>dzYIrBP<mTau?U~%@4my<<6@0nyQFNz0k
z<4`w@qHhwzeawa?_W+uFMA2yAo<J!T9M;-6p(In&PZ9~THN8S5`tc`QXRM7-a$qXg
zL|)tY9Eo@9I@KWLoPsN6?a-xOm-Vbr)?2JpbpTs(SJ8x1-_~&Yfcd21#de-(;hoK!
ztziErwl&E@L&A_wxu&qJY4HhJnOxzu*8XJ_FHiCHRvwUkS>ro-w?*-<&w5qnAT5C`
zet;6A#yuV|gI#x%`!&`;byO5Qs9VsrHW6k5gh}yZU*WUl$jNFT%83Gotq=&lu?wDN
zW!gmQ|GXS*#&QT3tLNDEoQv{Z2Q)UtiN4MFg3#Ca*NHYL&GahYv`MkqX!kySAwUQk
zqBF(nV7Z>VZjBCc#ZmDZnPiu78X`cYn~U&DlN5gKqN#x;s6zr;WvtiR;%Gb6tVSd?
zI`Oqxjt6vR&CZvoyAqVos}<3t9+cO}l&#%TkzdW7Vw;K^Jvx5FqRQf==+-1RIJr2H
zdU$C3u@ctCo1_X2nmgmH?R|`R>s22;j1xz0^F14kLs?#quJc7clY*?q^8A>urrWI0
z{+C%Zd5kstBv>fepj@b=KKeNWw$Y~%f8NxwlCUPJZs)1WvI9oXl~2L@HD%_Bu!UnS
zqPdC(ng+8@5KtC)=Nls-xkenY)0w8NV9w(@na%_I$Ivd#+zE}&)@2^Jv7uW$F}eOE
zsqSE{5gfy<HHfZhMAF&~w(`)pDNqDm&bU;w67R0MepSK?TGPF}Ygklm8N1jl<9)%)
z-TUe=8TL7vX3IUSds>Pv!}USe547h*rpr#2+ggi{GLKT{Gv&}#b!_&=CaPF&>{mQr
z%(5p>Ylt!l9Qytu#dJl+zk(q72SscUzkCHz3VS|IVC7cL6wEui#^_S_LZu4eHI)!+
z=&aOQ<<P0a6S$Ahq>wKYt?TTtZ1WE1e-V)t&F5nvD;hw^L`-bd-e@+aw6pp?n@p;3
zmRW0ojnhc#<Xn?-I3!%)Hgu?L(F9aUVXbEaQ4*Et$&Tj*4*QQVcPk6&+%r(kx_d&j
zAq&yWn|`V1>vN}FHiN1%84QOkse75HS8H_Xy_%B`=gTaLpG}n63>Z}zWXnS7wDNR~
zu;Es%yoC0P7t;+dgm@=5m`Yq3h2r~Hw9iAQKTqSV?~^PpuG6hQnG*8%X{vbpLZ;_a
zzsK+)S*)!h9)SZ=mCkDmzMQ7Jh!*FLM@keaqN2mf(els{g9Y92o<wgr2p<K3z1$Cd
z`g)Ro;*Ags{)*IBwDmd0j~4dVIuY+ts_bJZd88jWGrH6U4F(yED~S^O;9_BjvWgvZ
zp+htYNI*Y81Q5l8Y|s0b12#y99>QkGG|B~}GtKJ3bqWieI=M_TVT@AZR$aMd^PQnO
z@3T%UW;lRI78k<!hVTU|6AhILy3A<3D^$N0YiR)=SfrfhS(Y+6FFA3D;e=P4#eOg>
zK$V0F^E+PWK#}lmHTSR<`f0j7M_nL>#XkMvREeMf<6|uVdDuaxhvz6(d3-?BlCe|l
zA}V0mwLS~el)1_HbHtmdEHYb9Zh%L(uPuXjKBevEr+l<F@Kn3M>gmlTyyNluQ9=fL
zY>sVi&XW&CtCn24)xcD?crl2CN)>h5vbysqpKGw{+m1Okw5(^p<AOfQ*wRo2-*Bs*
zu2w5z2AEuR>MgLRN@HQj!F4!8e}24l<fV6gN;u90GNmLUA0R)DLA~F&9Z%cJ<fS~F
z__Y6q=j8qEGq%1Q8P<|ep%n+$U)7m;klR23-6$-B{00XPKbtA6W2AGUHdWy?HiP*m
z9Nm&$-@#Ej-&B&efE0**JmpOlCBjI;y7#^;_1n9}P%(b3@4bwgk9%%;b+<qv=*n~2
zZh8m+yQM0<Zg%Cfhzh37?^V(moXzv#3>6AilTdm*hrn@_U>S8xB5UqdkV0Ol<#x>h
zk+mAW4(;UZ$79|A)sBRC_;)uUEPs%5+hb@PQ&Y}A%=?Kyri+=ZJEuN!lXxFNQ7*Lp
zeipYgWL&G2%nV32))ff>O?;YwUi?XKD}&_KFcsWuzluLat)tSCI0Eep^&bg$L+DEp
zZULcG8+~ZFFwq?J)|q5z)d+tAH!60x-zif)o5d@3IJZ+PnC9fUTG_?scmcyx=B4ER
z2URkAV_r+kU4zH~H9E?cI2PGQyiEG&()M<%2S$v%Z>2w-E0v-iw#!b@AhMH5(6$l+
z{atUJweDnCejE1;%_F*PFuUU7EVh;A4928bfIS{nCD|p9!X`0`Y<?rmH42^10AN5Z
zHR|Oe+{Sexkt07Z5U;ba&Yjz!5ztBWN;uEmrk&{^nTJ8NL}(>uu`J-ghj}ek<JyS%
z3#0Sc^+4w=X>~D@Wp1^Ray5jcE>0(=K||40P~|Lp7S`eM6P?TB$8d`D{pyM`GR<xk
zcDZt8w{p_}O?=M=|6SvXqSchBb9N}O1bT!}Oqk)NGwxcfcD(TBQ45#x`@7G;{0b;5
zWncBKP$eEX($7k9Vv))H3Qq;QAZig<s|1$@(zgkSDIE?tb9}j;c!Y`Fc6I*ss0!E#
zWanHDlbEN#HG2f4ZuzifK}stVSgjelPNx^I%K)eH)U&~}1sjEI3`YYm>|Tot>z|e<
zp+w|b@7~s2i*B!}Kj6<?wLMzJqPz@23Y^O94VaW~!S}5lVfl{H07c<z%R$|c91qXU
z>FQ9c?T!E7ef6;{&yPp@n??}>c>|ILE1{KbIbH@YCh!-iDGNj?Q}EA6m3df#yM<nb
z;>YL!@uKg?^m}{rDOvf6D4-BwVu5DLC6HpO+`5cH&UeLpW9M(+-81?xm1jxGD_4an
zLH(b<U5Zut9zf<GMnz_>q?I5pR}(gj?j9!hgi(4{JNG#x3F#j!#7jfnF+|()@?5NA
zAp1}g&A&@mtWc$c`D43QsSyznEWGp1QteuvuEqN(|5!r(AzIB;+6}NT=G=u9NnqpE
z3(S>B5j?pb{H(9EjQ1k24t9L~w@V52*l?+AYKyj*3yF(vjA$%YX0_VA3zQi^L^TFP
z(26}|zJdr@d`p$aSfI6_M0S^GTcKH-Q$+cI+wL_`MPFoP{77B;ju?s~bz)Nn0iIjx
zF=Ux?E~RCE7k-a7*?w72o;1Ucl7P7J7t-b=t-xk9xg!v4(VSx$qNSnw;D$Tr6(;O)
zur8;Hs#7&kqpZcVcEYe1i@=J=SMG2;7bwYMsUDY(Mg@ke#B=UkE^eeX3i+(f7g~PU
z_MhAVVr(5Wec62oqc8rssr5k4pO;LwGE<H7Wy}fvgN61W8K9B(&$0H8hbn5D1K?3J
zlA(@UIG3L+ZK)(xzPW>!!{Vq)V?|A6phJ+COclQ)eS*H}XkE8Aq|eNDfdowMi)L&D
z{B+cX6Hj}y1{Ohi=`kPpFfR!#9jIR8I;s1_5h0EL9w4<mpFB{a@5RrikDvAo(AQc_
z?I#~4pC4c5C0FSfi)R-b)c=dv`n4!-gG`A1l>jRy$nN<jm|y<%l5DPQ;hJ7OP6DR_
zKyqXapQRij)=A)e^H)-&5;5O-mqBp`-ICE#^FYWkFyt}MN`j@KhR&okS4`*julFI0
zLCaL54H4(AK*$F;v)=vtol|8Z_hJIGFIFYE_dfb}{ap7%bANkrWmRY-L_;Z|BC+h@
zG+XO$uiB{SLqCGQnoKwKv9(n)S8MCDLWl>Ondx!botu@C$PL6>wIXk@U~-%(Ei|yF
z$KsLe<vnJ*Fev^7^lQj$33_AfaNh^a>?XF*P$-avTwHDMO0_oD56~DP$c=!X9wuV(
zxt<_C-|uSz7An15E7RVB=qQp?%mD?$K7^aKKpltgD*OB6p+|xlucOn{SJUW~hGz9Q
zrX=cL6Q-PmyT2)k2QLs!TI^u_O**B<ZV92F>Fb$rp!oQH6Q4tY^3Kpxh4RDJATOw%
z3M3rwg9t$}>iceJmcfCZS~@1grYoGC2&!_Rns}Pe*J>J`Cd!kH04Q~pkX54p5!ci`
ziCfR8joatb1G+~k+N0zl)I>$sJR+_E)8iEBc_Pqo#8sEnQ47o;6Go}D+;T*89Yms(
zZJoVzv9nFE`Gg|yLB@kk)-eb?f{3Pbr3iCXq<7=!KP7B1jT&Z;7XMA(zVlXMcaWwh
zrqbaYFaTx#>#VhWUP=LHR*pqs`QPv@Yq&VHxZpV2xUsV|86uZ!Q9q5ZkaBVF$@Je_
zxt?LIF5W0+`140l)Qj|_H6Oy36sgy_Ym44Ms$Sw|IElwYo;8T!dG!=FM{_^YpQyg{
z9=iK68*L0eU7xRZBtkDc8IA{`gH}?^^8qS@9z7Mbr0@xE=K(J_jBy@~2NU0Cs|O+v
zl&fJ1kK+-S%iGXLE9C!w%2R@E8Ti-g#_RaS=nwddZj+hn;ofn3FZ6ucT%ibu4mqr#
z$xg>>FT06GELlAAk!c#PyD=0-F58cBaS=l<jCdOn24ps}cKUk{pReg<R{MEFH3d3d
z)ZFd7bCTo>9cEc3_mjs+W1wFZgy8@MB6^3(W1i;;p5N{|W9Ue~t0c%j^Xi9e4kw>t
z{|PZ>bIrCqnf;`Ad4o9>2R`3|J<iAg+g^4VqybhREId9B|2*~^x+LkjzjN)MEF-p&
zS(1iEM}Oqy6iv4y;^$rwxN^rU+k5utp7g=dpQ0Mx+w?1m*w)nloAV?A`ESnC530uz
zg4|}LPchhYhxh*>>m8#bYq*8mSRJcl+qToOZQJOM?R0G0wr$(CZR4xwecp4<9pnD3
z`dxeMT5HWYd%=e+FP5dVa)Gr<=3_es-TbV;&HIZd87fM$JBM-tpWAgyS|3GduP;%(
zEj$nR6S4^#RXL8)=Q1|RE+VWeE3%+yBdzfJ&+AbNPeS=|TCrt-AV%|ai$P#c?Dai-
z;ViP1re9S)UED2<emqiRW-s94&v%y3*XUyNqJ2E1&Q6OY#h2wKV*DLd8aLvQs)i>Y
zp~!mPS<$4xqhNCB-G$zp2Cd%j<j4rMxFlJamC#(K2xjjte5A5*q{eLKD&GN!BXr4^
z(6P8~%CO47SWx}5$V^3yWIc6AS@v6hn*+yN!)xotuMPW}BtmN?aKT1^XV0o;Yl~uq
zT3gQ#ku95>ro@Jwg-lSOBW;f>%r!S@svy-Cj45}Tmm9!%x1|nKV_8Grh0zQm|Dx{Z
zl9DG`k9Lm`>b*D}A3De!DwVnlh#vKjxC<e9jaM!Le~(cUn{AbxAa4mS(n)*?)eyr3
z3N<C*`UiVO-)uG8ZZXTE!Q+-HX}J6mNI`&<-~%<LjX;h@@FxPmzFJEE5Bquu`7isb
z@`%RJ2RC!<T`gLyT{vHnm#gp<zgfzVpc40{#LFouxdUJJE|dSp4)qxqM&E)P3EjMr
z$L4jHT|QIBghBv#JWA}}&l9*#^YU|ld%Al+F^SJJiE7qFvoe32JR!))!rfF3lZV~-
zO^d?@Fw}?hHM%?DOby#xuP5L8Yu$5HLS|68hs1$ymR4n<uJQ?erp+FvnzR`yhNi~u
zFw71j-F9GwS9Dsmi!4!C1(UP(TlAemPN~@=r}4iQFnW!3e%^ir?IR<l5U6*^a1xR8
z6T;+~UKBbU^3`B*(B7{)*lj!DXAIv|PDl3xyWfTzUA<nn<Y(Y@+)qHeyisM{?4WWx
zd5$`>xHF0bY?53oYZ_i7J0HzSaSXC|eVBeNvHXdS^n>Sx(!eoTzDCB=4x-U@+erC3
zO<pY*<xpK9DL$Pm<nujzdLY=5)QITaSq1lFIJs#DM%XgQDh15w^d+&BvH^yl+}qd5
zq3C2pl3@4D^wwtkE3@Yh6U~HRhsFFx$Bapypnbl|exY_DC0KnFYbHl^*H`cv|G(r`
z1NuQRjehr{n1aL|{}8n~ywAG|5fC2MmIiO=Wq-1GveG+SIbJP#^zm5vd^+rptFLbH
z8Od%dDSOO!IIxz<W9uwJwVv1W{|~7RE&CU#?ILdaFH&2>%FYZuZn}F&{=2>&wT3O*
zduE?)$lD6u`fX1&Y_>>uput_se^T-a+L4UdtMOl)R*Q>flOZ2#yf(ek03CBE4Us(<
zwCD*vj4B8O{mKWFUW2X?Xa}jN{}+K4i?#p7eMEHI5)A{*qko2sx>>pdlE7eJYV?Lx
z^hsSFV@foy=?_{-{A6j9>OK+CB;%xGQlms>wNWY2#O^kw0|XzU`}%CaK{!hlucfwk
zjk1R2V1_wM3$4f4im8{x6f=5Z^oI+WL~t14TX0W%Dgj`Tt!k0ff`B&{y4RAfwLGRt
zbdSnrhipN-ZAKI;1E$DKs!@nfcA(zhQ=OGjZ@7_iFK%&BqFYQ%f=aS?eB9k`7S;N*
zE8D|yjrZ^EN-JQy68Z+aRf8Rkq+!vM`b)zCTgh1ec0ePAK4k}Gcypb+VE<Bfa}73n
z^P|5g<8pZ7t0w}%oQdNJV}RJ5GL~kejoas)a{G7Jm$&VamCF+`2D_yf*0F5ae%>Ul
zbXICQ=VD5#K`{D2s41r4ZK|e>IvuciZ9sFtD>$8(pH7RH+3u{G`LO=AMZKaFajSq*
z>o15|ToZdjPsq8QjR{UR8wM2XK?Yu-m>&4>M69@#Nc$$1Hwo>hmL9OkzXf$O=6*Wf
ze?MFod8RvBQ#DSGOsnZKUZ!}Rmc8!kZclkPdgWxNc;AD80}IJ|VZkLfF;ZL<W9SjL
zbNzAr8}QuG=|uUh+1~%MyD0CAR9`ypR1<0X!49xRJCx@49SRXLC50OP!`JH0NdLpv
zVqv0n)+Ju$T2KbR;<Ho_aR$qn{#5c`lQ&eLi=ni{^!AZ1qHb+SZS`-aI<7+GW!%8`
z^ES=)*ADhHHHM{bC7)_X;PdP1Cjb<-Kq&2>O1PN=!uLk^hw8O{MEfL!<N?DfF{HYz
z=ZN{W7mP*aEx;*#gDt9&H9@Lt?H@(F-{lM1Lc{Cp!UA(0Taeoi1uXVRk9XOb#w90L
z4joiJv!y?xai4N6pj7p5u6Vh}!glo1g<jk}I<yD<dS+Wz8I>~D@aEW(hH&4#X@$Az
zy)`AC0fo#a7{=t`mf!lsyK-s9@5KtgmOQV&^q9v8U%4<Z_hxyanGJ`?CgWp4Z95Q~
z+dUU`^}fvflilVqp2=(1T*8oNf%Cf7>6K&y%tQ`T0J_SLloU814!hDRR2Ug?HK5eg
zOhJm;*)YR(ZF-&X(+(1)XpS&FkF^Zj1n5H_U(%_eObu4A<~OrhtIzm&_*ElzB>puU
zAK!IW+tSM&ag6U#_uKkjV)jFl69QhU-MXi*axUz(AGb<GyL?gdvI0-!DzA%t21Yd}
z_?a({v2Rh+ST3sdeu2D0%E7J&;;j%(S=7ScC9jAJ_Op)UgauM&e3#U4)06v+qa@R_
z{VD?~<|k-`9@Po23fW%)_}y$6ws%HBN!AZq?_m?(73s8?Ux6XEEvIshP&U0U$E~4F
z`d{%IqFT&AVSY68VCJ}|Wvj<D7@AH$GST|0heXMbN=g%>_DS<HZ7cOJCL5(JUHjy<
zGO=puMl_3)`i>V@&nKGnatEj8Ttg>iX>A}Rwc&WF%+|)23(|ND5}is7Gr=sCl7HP&
zS1Kf$+WqBHNu)|DRDpvgkZ>|Fpkl&PX4OtU$n9ul8@(QfIo0&^rKOSt1Mv=-tvS-o
z!+!$s5hR;(8>Q(Jz}SIS&xec+QT>F`l!Pa%K^9Kp?JlDdJ%(eP_cZ}$i9d^Hr`1(k
zc-~EKN2H!tG>ZDo{{!dFxNs&_a|K%JH1N^L2ovahtB?KN$PU_JZmlDuDR%t!xM|&>
zpFfZc*JJ1-)V|W79=4|pwm*k-uxN+l3zMQ9330fu7aYzM7zWU}NHMyQ>K|V)#LT9k
z?B>azMY@g>D9RPG-;w3P6gX9&I|kWM?FS6sCtZNow=KjH716W&R^h<LJw4hO?A&IG
zsaP9MSeh|g3jj7F9LqddKKyFanD$dQY)2(5wHV;02%n}uu?pOkKW)9+L?7VRrZ&)m
zi+w-_@lG+#qr5RW$FQblS!NP%tiS?^9)~=GD&Rd|UO=WG)b^f7`6wcHC3W}r0fy_b
z6MI(SY5P{n9x)Quog}=Lg-0k2GbrCNpFrrU-G<LmfY0g9q$kd#PsUUtb2XtSEsJ<O
z7><(6+Srgwu}FY!%39#ze{UWd{F#K3?eR&+ZR|k!dZ2OU<x8*uIjj;D9x`UO%^(lh
zIgKqE0wV<O4^3uo5~~4>5VZ`A`}I_j(^FHt-SCb~=W%qx`QL+!Kdvs^s^@3$+U{ku
z9ilf)jyyX%W8er`L}hn?-&V~c>AUyL-|X@P%^w51xXi3oef2iJUa#+`r<jw?M7aQ$
z!8@;EOD}e=SaQg<ZAE$${Mn-97_Q-0L<>&Q)g#&8@ZqbSUYMOJ=4ZYtoF(UC)<kb?
z&leXT!_fu+)()pOYqfv6p%O=xc=7Gp83v9=JB&b^+G&X7`qOQ_YAW;Edfaxa-fX>!
zcYS*`akI|o{%UjGIp3Us$c1ZU-RRRa@Q0uh;~4E)r?;fZVy;2KWWQtR@m1B;!4N?A
zoqQ!>yI~HIuqGpc;q8{MJbH}~{!~E{noRd)7<0md5TGCUck(9uLh`UeVFF6GB57E?
zYAn)}IRm&*lNGBjg^%)z+j{S~TZ_kWiPxp$*0JvPBsKXSmSdprAa#)5t_LEljXn!1
z@~K23#Z3tX5w&;`n`Hcnkj@LVdPw|==nod;R34SOTs~gL&B>*%CKbM+&?E+2AS@0{
zl58z}zrKRc>8E|Alu4|XOjY#)nc^?Qee<=d#Xj(}>KAriog9!@5t`?^k=4B!NZ`VV
zvC(B-7-T}Q^TgUu2d!PCkAzKn5s75to!ZPQMchdupf(xI*{=ozo1NJ8VX?_4b!<@{
zPYXl%yhBtu+g2W#x;>F5a_uQ7!Qv-~XD}%kARAS4kg5t<S#KtOn+At$Uz-D>+4K}X
zD|8TqqNVi4gUmYNC*S~xnv+1;djA{BM`R^~w9>MG3PSR`JS22IX%OXMFm=dVy2+Y;
z<!JfI(Z#SrXw!63$Y(Ua6E`o4j%I&RLyDl`BT482L20@@RFawuT8CPsl{qN6;v{MV
z*Cv_=o+>5%xZu8=iRIGQx_EUpN@;751$?u2yAZ!v+{7`AT#ZIYhLjCIZGcnI0RcN@
z8@lFKf8LxG$PE^#(tPvl7zHB2I5@~RC#T&26P$9Hp)n7+H5%NbLw+^a<-O#@P9-Vi
z12DX0;|UUq+9si_1<`SSpiV27b#tRN?xOpa|15!F9A}0C+aA@3j#Dg=e>_VkpP)mp
zdDwKDHq@+sPa)!PIZGnuw1LFeMND1)iG%|V1}^<#c*+7iwSz?{tG@4HCsr1L@LCye
zpT*q}uxGWlzOqDh!XU8Cr3Dw(hm1iGY9n!haEqalw@{nSNqsN^K8jY-+ZUsNQ=N4f
z1g`^wEB~zD*~Txj_?9`QV6$aJ27z=LCaT^j*tGk}m>1{6oPcq|P&%`T#}tu^Ua*Cs
zVRLV+G^xxR98(%A3UracC8{2)<z9cRYEIP?4oKk!dSM(89N)4Ol@Zt)%fsBHX3FYf
z3VWrUluJVRiMg<L0`w|cU=>rvL;J(@R(^J1VD3K-*XD&I^9!kcsSvS8<~6=yb;m1P
zP2d7n@w461sgiz$2_uek^uJJRP{l{sJJ(yBYqo+aSi|nA`H)0t5E8cDBw@4+eAR?@
z=7#YoTqp~z^R@ji7Qw7yO{l=5p3G1%!wYkx*s57<s=yE5s27-aXBoB%toBW^Oj`-+
z*6MH;I(#_Epk5lR=b&%sbK(*J0QW`=MwO}~c!DKuBv@ni9aZl;hmIxRCCLN&pAtV>
zQIYW2cdS@OT{<DX{Uu6sQQ)w^!FtOy=n5MaI%rc&6wxwu7tzRB@yE1rQ&E|5>RfU-
z0)DAsYn?l)v-`piUf;;ow?h@XF6mAScra79LUFQQZI2$lh|tIrZtEIr<$dVr3L|(e
zJ78LR5y@KLUdY#CWL!0G^UDZa2|*{z#9Pa)QHqhjBUZN-b#F-&_~1;t0GpiWO|K$~
zM%cmuwS7LcXc%7*B{!{ne3>+|MYvX-cypQ&Tz&$2@H9(hnxYeRCzsx)pjv&NVj<OP
zm+r*$!t#1%!vPFTl@&=xKYj_SSh{;no(73`K56}+0*62}h)eF6ub|xTa+1>6ncAkp
zB3RzWPCqtQ0;dfZr=1X0B-=)oq``Qc{cu+FF9EqD>cMa{?pZ7Qk3tcz9vhNOf^Q{M
z(^|MP4ci`IXam^hYIx{|l*&jiTn?>tceYhisqro>1j&~OjamzIF&6Y*&pGK^#`TQh
zUuA!3Cf_@@Q%g*KOabesuhuHk$sNN}4J%s-(;h7vhHPO)i}bdgA1dpflbfkiCyMOR
z9ciFd*JR+t1W_He!>jp6Yr=v*`tOwB%zv?lStd!w!j^ZYyIZjB{L#Z7PbeqNDj0zk
zsH1gij&%S{Nkuo;mYl`~QAd#5>QGgbD?gWW?}UPlB95FQ$$UhvPr37~Vx)!3XWeNa
z80un~7TlUmy=WkD3Fi<`j1;FP|H1E2PKG&Nhm4=-Nw)rJ#UupZpD4!bfB{NsZ#}+`
zn^6}__@_d#e14ZIubelUj{D6$W$F%NHEtQmW<{4~5Itb*!&(8_Yk9gvK%cU1k$Cvf
z$R&wEj}sAS(t@s!_eO|g-ks|=J$)ynX+@Z2?n+O##i$_L<Yh`b7QN9nN9X?*1va^h
zl}eSRhls3I?adwG6?$C-)^`jmY$jNhE3uP%jXdNn;J*){umC0mH&uKh%ibLaqrzj%
zi%1Tes6JH<N2m{_yD#Vf-Lf4g8aVS_;MdX#)bdA5q{=oDwSamtw>~B3{%rwnnh};M
z_u33V#OOdv^Pxm&9z`jDkriG3vegu32nF-8;<h~M@a)~P6cmFi`>RKnOzTyt2<(+d
zz(oHbkb&4ElPj`*NY#yI{BJKcvXQ2oo2UWO3nB=S<y=7mkBWz+tSqIYlO0~Q+7lE-
z(>?Qbwjy1S{%mY6;DSs0=;nb-w=hkl0h6e%8UCR5Z$93&zjBDFFJxNb5^S_@<V&7O
zny`*PV96-ag?W>2MvH=7IrxA#JMPxC2J(CeK-&pzWd@TD(=|Ll-Iq)iTZkwW^@D?)
zZZq;eM^-@qx5v^t3IMeTOOpIE*q;t#Z(Q#Wt7t9P&P0^dx{arXhM%L6wgjEH)}W{7
zO{9=EPT#JVjmsYr;*7iLqpA)7Z||Q{Fby0N%(Fn4b{4tbzM3R@&Q`}Ia)coVL52X{
z0OGhZQLnGtCg#|JpTBPNC=d;3^(rEkz2%w?#vW97)$&B=6^@_3i#xnFDK-f@9BD+>
zoMSM?bR|j(N!oC*u<q&wZVHbN9iCU)WSPj8j}?;@M)W^5k7)aM!JXuWxYNxd>Iml$
zBkwd>!_sZesE#>^GyaG{PA+6!T}hL-yU&dm<9fo6gEUKwp|VHZ-4>KWL0lX`6gSam
z>R@WzX0f-A$KqGl!AxE`)Hp%5%dCJ0--ILV!kPPfcH`X0v?CcfP{snil8k?u_A;Fk
z99Mh)Gs^%>99zh<*y(u1>!66ZDT)}Zv89NG<2Q5U&L6MQyS~1*o32x{J3H*ZQCQ1T
z=LU|>bi?VB0=3ZjV2v8)?E7aB@PMvhd8?-<c~qh<G^5PmzG;Tg+R6E!?^O3kydKWE
zeE`RH1*?!Y@akgb`&W<c57g~{hk8bFCCMjs17-f{bN2OXtt5J=ZRERj&bY{TWTFoP
z`NfN|@a3@F@)f%LpbD~zTJZAK-a|1H+w+s;W4Csgo<B9B?PflcVR^$=`Zi48$4>rx
zj--|$1c`awoP!<9pn!SJULV7ij`=88-@>Mad1=?^FA(>9=j4<62_8ju^KCuKxs$+j
zz9msDMR7rnivIz047Z7IsfZnO#jP$CSZ(b#v=TP7q4x6{aQxf+@`)zB40b)On>ShK
zRIP0Glbe@-Mh9(ZCcn1lA4IMsQ8DQsM6TN5ZmO@6Sy0<ogy_xWsVu`goO{AR`w*Yr
z;KITA`*rm-D4AS9nkw-`Fs!oI-6?OAGr?u(EVtj3JHZUngemIrDoakCz$;K8+w*pR
znMlCXb2U~{WXOeX6+P_LeaLZ<4vB>|xG^?21~dKBu*^?`f&$D^_^Q|qZIymP8wcS6
z{+P6af$J(?q*{;3^IoSjsbAQ+o_8aC*Hnwfihdjo>1|oFlZh=@G|{sWDo>8F*qiME
zNb?e!qXE4oEJ`nJVOI^8gII<6s8DK)vF9%3@%`;l8qCnep2UJ6gP$gg;~ei>C?~dt
zHI&e|YFlORq$xL|2h7#K7O0qxh0kZA+H1b%CV{y`2t0DA{pIoyU?hlOSX9z9*F^uZ
z)aiwGfLJ@7^&q5LAK?J#%ev@!T3dkjd+PoR$I+C)p(iK2y^i-3e&yd7vE}VYWMwE(
zN?gV(j*4dFq%J&E<rLx$YIb501oVMb00u$mUwkk%x9+tEPQm4PK@hK*juv~kR~dAC
zHu-c3DhfhP_2M>AnZR&Ad;OBfS-PF$ZiMUFUR+aeXKSGsh7*SU(LXikUe1m8%k3Cm
z?t<)IAH<DI&`fI|Ck=ILo&b9V0oY%6DP3I82L<nY?XSO|F`b&Fr)D;S>^4j38#ZF^
zSw0}O*bVx)w$5VFDaOPcgqf=yrHf!N{=!*rJ`Le-V?Cf#+WwX8jRAwg;(i|ud6Cmj
zM*h$jm*9Dp81cvQBP_JCls@f0(t-5eh=l1y)naV!(>O;J?W>C>N09L;=*_8N2VaRR
ziHaH+M+?eSDQj3qk<|&?QWjSSWMfiQjSU+v+4pV_p|06~`wWR~n=tLJN4dft#aPr4
z{vBRPVUlU0sunRBA1m>JaEiM&*h`j2z4?@W9oHo^4?vO2A=wS0P#GN`!+t<R@bvEr
zF}mKkSR9%pkF1~f`zr;I!U^I86PxZoY{TmMo0;`G>OWgZ>m;$WzKpque?wh5?LFb5
z3#A{j4TetqVV%%Rcc9NO*xes2MDdoj()RQ}ReRu5Z%sJ1SIB8I-HeNUlwvR`6`4|P
zlp});ryC8-9nbdDPw;0GfABV@fKu&KU5-}-J*V1&t*ej}xw5X!Kzoy>PVlXR;h8|_
zxujr2{4!JVonEYy1*dIjgNh|=z5AtoV0Ehw{@1{P@x2zT{pNZU*92guyQq~%;*P6K
z-dfIag4PN_0DdCDV#AkVSqsj)gGiS*Syjyw!H<A*<HhKsTfq4POfJmQA`JlIWZnsp
z9gRoke)#sZ16VkB{sFvkyMI`?XZt?n2l~MD-C)wy95(ZxKiLpvzaKyXB}=x(Lg!)m
zZVCDZHRO|Ergx1tS}Km<CEvD31T+!}asactU8-5X_eFoKk$+IQGrymo%bc1IYzsyW
zvk{RLqAy-}4M|C_CMA~K*D#GP|0Ut*>*k)&6z9~e)Q8c;;CF;8(5i0B($1Ao<;euM
zb|4M&pwxa1K%`DwEQgX_IK#bHj8tMV8|jq9w&nXOx)-n^Z<NLqMpYdb=nj^9B_er$
zO23pXmKad0fe@Ape3FMXdPTY4#Hn<atQsnq^Q<bjPWM>u;I>?(fcB#Tk%oYEuefS>
zPQ7g!@J%!HrR5nlwfH>$DaZaBVV<io#w10SI9Xpdfc9KyRcJ{-bS}d2s_cha&XEDE
zl1n-@dC`y|q{U?GHny7j>K&Qj2q?C6T_zw<FmT+^u>BIQz!=;=Z@PB>z5c%LY_poQ
zS?_h+bn(94)N%i{<RoFJ?u$0Ua7=2OGQJ04*WV%0dBj4-s<1^I3j57YDS?{`1l?aZ
z4bnOu&<u&Ur(Uq)ch8>*e}s%~AAuzp((WFmi<c}}eDh2o=lLoQUH5=d1wVxxq#{pC
zmJSQPcpLU%gASHgLBBfkr#MSg^90$6MbHfO3LKWFUbix#ubn*a!4reIMW<r>z&Awr
zWiKRTlG)Z=i7Mc{*|E8Q*LvEH8BFl17XK%`Y<(zWcaO$7fU3=wZC64_D<=eHaRXih
zDR3eFS>z8qw%iyaMK*^h8xti4fm*eX6vH!D*LTx}B}>%=j+q>2CPh!{)>kn;<4>^}
zH053AGUiZEw{DHxx65W2ZV6F06UGP8sLNK}t0hsdN~0b1X+q4DjYGS{+<o@(K$7sM
z*p{7c8^yV*T$*)C?joDtHsb@Z2-!(qrJUXvsvr0Ia=pD5Ytt1-v%{(2*kF{}Yokdc
zx`lsl^PS!4cSci~YnvU9mai9lh8XqH{f;2YV}jtpbTX5k7PTYaMRP^Qr&Qf-2yByd
zF2$o)>eo$T5XZo`$~QTd&%~E&w2Gu%O**7}wwaJ&`GeY+)wx<%Mrx%$f!w)lrCbkz
zg^H)G(hR$nRM+Te3|37F0HV>0@LK>vP38xXPsaNikKK)rdXEU>bi`~)aazQT!BfhL
zdUihDjE>Pb=={Q9@YzB-v?o^C%&4zQbssTAN@fc%mzg~kTDs1+twms`PT;d(+1EWZ
z!-#IWdF&y~xzQ)_aLrMeOm5~M)#o!Xx}wtACeTj%m=Vn%4{7f=qDFU9dj~2mrlz(>
zfbP>Iw*hail6N>KNr0>$p&3CwIJNOrX0zuBI=DbXn4s7@8m<T8%~CKu{}P)QKJ@Ru
z18`gE38T{q1_yt-(aH9+W>gk>%vQdJAGdhgpBD}Ra~_H?ckW{aT%J84oS$9Ibtr+C
zTTL?7iU{H~#=#Czb<s#cFNUi-Cg_wdQh#MlBO$;Hvd&vU6Ak_F9%pS6iBs;ud_X&P
z%=qS%fdXo5;w;ey-SQ-!^_kX!klN)d7Rv{B9kdAzCfy}PHbY6XlOt?3H=aqnYH%kB
zBbihz1;VvMRy|+Z&&}se2jdk993<Guk`8;l2<}yO0YfXUqEO5V+8-Q*(Isz7F}w+I
z9kjxGLr}C^p;Y!)#28T8ipQ34_GB{^elsl|AG<D~;IW!d$POXJ(Df*YAQzH%MiyX@
zp*j*|u7R_4r--<K7l$Xod%xzgmjrwx>^{D~3a9w-akjG`YZcdb^TzOtp-z6FJ7s$W
zho!C$7Vu45jS(9&YcN5-tXN#LF4y^{&^`K(f{jS_q}5jtxW((jZ`g{~GlUH{;?t1K
zr3V}sF@sT+2EmSg{v6Z6swva{7l1?kp)%yIJYIl8NjxNgiqvaO0pQ<C`w2%LP*?Wa
z)ZwmcB9nb%kKt!hbO)(yQu8y{9)}#k`Q4M?zWqd(D2U^~HSN^$noH4NA4#&vE|>C<
z<a?Ce`q+My+^8J=5|?bLchF)^^Pv>FiL2*yl@Jb6O85`?_L(G!!-BZ1&)~Ovb_7qf
zdz*2KM5<I~zJ>#-Iu0T%rTSH5<=$^B`<~0a!5I{_qOYH7|Al_FZ-{H>YcTZo<`esE
zC0|P;P^zeu;s7k+0lSI3*A%SSiskefPL9m;0^2QMrx0*w4RUI_uPswAY(g_}ma`N~
zBUQSwlA=U6K+14gN11=Rv0HXPf^G~4GNdwjDRK|yf4D@Zg<)K(r<x9TW?sltk|eVp
zOA&2f!GwOzfMjrq7QCRSxuVZv3A6_vea;~e%<Dd%3_`o~HX34T6tWC1JV#(35ra@@
z`?3i4N9;%e&qoLZs`hxyQoq4LKC=4ag6sL=1)#<-x)n6llJgB)Zte-8*zH?rzVLqQ
z#gS`O#;D#SJGmF0G&k5DZG_-!X0JKV<gJK(X*n$}r3!HZ`^Bd~t3Y#Hm1VFs(_2?H
ze3ka<J=d4eXxbu;q8_o0`6Or3D<b0)7V%o<<|NWHslE*X%7~zWd9M;$rp-NP9DyQl
zb>{{E;@Y}+{o9=x7~*A2Ce(7*-YNX+MQektzWHfe>0*C<oljPK*}itTE1{5Z$$JFd
zXy`i^G6*hp{9GJ><54e}T(lH1-gF`pEypK)#z)h>LcAu8u0Qlzyr3FH3KzCe*GIR2
zY!IblQs`Jnp(VsFyD0I^DyUiS1v}iCh?r`~y+>`k7P&0jR{P!3---ppBds&q5l7^K
zY4oeY-uI*dXs(;Fdv{V3M?qi949L9W1$NaO4~0d7f;2C<Qn)8eV=3C3FkEbIhK4jf
zTlf(eIjszG7~pV9TM&&C7q#d{o3cBXnAc_D_WQ@u#uAf>+jd7KesFXysQUw=!Lup>
zpI*W8y!oXcN<0uRXGTIY1PIwvUk`!Z4@2}6i2dR>^Gxmne2s87JLnXVu?7QDgm2H^
zFV<y5cdB!+TTRrN-ZJyVhp&#;Pn~s_&PBPcpBsO{%lB;tWdcc|bSYh6Sdm%EUP6$#
zTsaP2K>SgzKrz3GG#+fQvq?91;+n6eTl^*0r<rr#x%s~EvZ2?mEPk?RoB5jnd8&F~
z8?)Y!70@yeH6de9M#3K(5_OQoQ>t4J01!Bl)PE2-7AEj+d-REa5I7}y{ZcU)-HBhq
zHxC{ri}7WTPWU|En$gN(^d<<ig0}i+W2=1v6cgzpy>SM>;|gn%lzKMS6KwE}=&w2)
zdbUod;jsska8OH*>dz=MXc7;9L+qe(sk6pztaqz(mrYN7cwjI7aSe&qSLA)S;`l_0
zl$R7r7H*hFefm*pJ{`k<yaKE{>Dk*&1P17Uwndde?`odOnjT7QVw~}C)Fd~1{|~%K
zC%7V0?fI?LvkMtN&&DEo98AK?OOIR4*<eQP30WIqF)=YS4>Pgjp+9Z;<a(I6v`b14
z*qFAurf?6df*;X2&s^|+1>9WjZSBi7zqo?7wA+5i$4eD*y$HXf=0398*w`2g_j1<c
z^5{}ky5a82khXJibBSw-)dh5Xy`?>|Bjuj%PVH803`Jsxc!;&#cXoERURCY#`7S13
zPdY{XasFe6bxO@)QHN5~w2n!9syjD3EB8$dL^rwf5bqLx7`Csx{@jm)Llw7O*v`ep
z%?>~p{ycPLU7h;V8ay<zUTP^Ih&)|j%f;cTr~Ma@yS_NNJ`3o2+(mSWM))2aU49?!
zX-|K)c1!<bdxA}JMzO5F_G7}!PYs!kLDF}Z%ZPP&wY!%X;+Bk+tylb-AL{gM^A`}e
zjpiXj3@n3pir%`%VEH$sgo~5Q?ATVn2ce9(rnGpo5d0meFP8_miogv}2Aw1fYn<3M
zI;sq}#~a&j5NEOE^v5k*Ol&(B8?q$lgA8}@>z+kA39{uD=|HZ~Jda^7gj=FW|K-ix
zd6mU+!`AZR?BO3f=|cB*caQeF%uGYt2Ie$9(H|`sNsA)d{F}I&5;<RNAI<Ia(MwH<
zKQ}a1(PJ*68QY*&=ExW%o00lE(CN|Ti<Pp)DH=JfGGcVyEAE}>C-U+RdtO|mYx@Eq
zm}p;eNu=y|O@0^)?k6UHcx&~EA5V3M+h|BtJBV)%o73g&NUIU;Rhnc!GU4hAIM&w#
z1cj{R0ebO|+T~<*c;IUHcXVq?94VN#pFL-~-oRcj5M66bj*X8U)lInaRZT-bPH3l@
zy=wjIFO=2@J2K_e6{!K_)Ah;lZetTfLR(U+EQBzk3st%xfjd`vBS|JvWO($?YFxXF
z3~O!vQEkW0gZQidd`Sy*`SR_x&!NQk-^}s9nYWS9PeX%x`afn_>L7mKXZbuda(hVl
zTa2PH=O8LN%;>=;nL<7SI~9e#AKU_l7BmL5*yz}i$Agv17&qebk^y}|4uw_$#<ovo
z?|Z*p&VcdFq2Xy=RpQ`UihK`ce>6-5>|Z^I{hs!=x5IOp-nN$3(SO8F#@~s!j0MHD
z=h4f~@L?+?+Ye>v1U@M8i^ZWLLdbMe{ey8ZpSbs|{NGLbQI@vFhwI-%T7VPT+VfQb
zAnpY~Uy~H<jbeJ7Y^KEm8iX5A+D1z^*voFF;NBW^Ylf<2_QQqRUVl1Jo8ehXA2*0*
zr8cA?s*2XlyD}+WQHX8HnjZctG$9x2E<_H@sjsjZo~PITU4aR*BXqY#asnlsWkbAD
z?P|aU8_es9+9H8k@X`VBmC=-FZVtVm#>Ac<AEzMU?;3GdywX{UQd6aLXSh%$<b+bP
zA5){#(@T{kImN9f5!##H?#|vnfa9C%J1S}x5(Z+&S{8yo<OnBSsgg_JEGZPy6ZoqH
z@{x%(y%HoNJE;mJWxgcM7aiHm)pg-@iG><XYlpLHi%4!dRJ1_HJSzuaMSO8{_eF^b
zg=da@OiX!*J`yp*aG<@ufgd;z1nEc$YR5Y-GaZdz_Ck-~b5?0jAZFGRFNkf@x!B0s
z4`8VjT!6YfX0;{J@GN4#c1=>0+X<8dY^)lC7kE%Dt!yqz28ERCwq5e#ow<odDa8$R
zOR^j?v!@I_BCY8XN4~2f;&q~w)FC@_ZxeZ(xEAotO4T{^`GX-HVbZ!spV#!^*Oco<
z5AO*oRXcI!(yiQzwF%m{K8?a8%3*P2u6ji^D_&c998<3H<trl^^F0q<QX>!>{N59S
zL9VBJz&JngVrbpFC@F0{YPInXx)PDoZf{{ZKwO1Bt-A|K(QfKxOg@eh4NnSA(c%<c
zOQ>!X!Q->^<9wMWls8_Yrp2wfBO&C6!N?Cg5Bh=9hYZSiMHQ@R#D3qz5LQD98+zqT
zA5cg`7Sg6DyEk&1+OVsFAQx~hFaoK`xS|=R89SgM+J%spS^ybyf2bSdg;L(pDOYh?
zwpk<k0d#RJA4v<fzC)*<G-bTTkuaK4sfQ>`o32U+N^Ho)C|0e&Juk*fNl|3pyr+d4
z{p(H(5t80o5BY_pQ*c2g6wMt50TAn36+`BW@aFjGf-72;5z%gr<@^+Y{)He!szGNI
zI4T2S?+1qlL96>z^+h-H_)kV`tfHbDyW;b4rm8%zPPRF)SNtRC6zXO4oT(<~!&B7T
zW0*luk)&C3*S=)cfBIZA|Ms~q0DZ2fM<(R_akr6xqs7Oq8xALhr76gVmcA>(ONyjS
zoM+~kx7=_?kZ6!<2_|g-#4r@r-;5sB-d%|9<gcf@4(UDEL=9{cva@)hB}H3jgRm^G
zo$AT|jQ@FLLh<9zc+q$bgcgw5#}jT;^`1NYfQ$MGgJP{$x163WO89n{??SCsIM_{`
zGzNK{u+Vo_#E|nu9F=TPcru%1a1zasEODwp1KdCj&=ELn34h%9GWDW7R7EH_glzI$
zB&_vm!hoS^NPlsd!jUZm?OcaQMtfo=Kgw^+$LSr**&@8`z}qFtRe7BhpxOxT+6Mpg
z%8IQjUY{4ZZNZl&T29Xh88?!r;kz%#L;chaJvW#16Pm~d`1O1i6J7H^*_e{2qRQxH
zPfDG(HZpRAtrl(@nDywvhpBrZKO5U{8cQC>#w=9IWO0KooF;03Sk8k&Sqsshn9>8$
zG`qK44EFhqQm!o#Alg|W;$Wo)4t?s?@NjTgo~kjr38ZK7h2i+lBPC=%#8vannBUXF
zJE^BMm=*|~vG4-o(VYzLWHtI-eQX*Wfv#_pM*Xgo)v6COz{E*}sZwit22zSlM(BF}
zZa>ay25<$!F|!#jYrw_|4H`g-0|vjrusfJ|POA~pQX$calrZw;AYub4gzjT&R1?D`
zN1y1uvi-l1W1Hiu&KMmi=g3;F%NCm7!;pt)44-<nhHRUdjWMH(`VvU2n5U!ksOx40
z%jW$@1uyqB@A#yOWgD6EvyMiy2WpUL#TKp92q26MNm&|@*0j+QE%n>a?9<QrRmKn&
zeKi9GRUFVLYTv^PR!`BhV^^}g;GgFKe?I}$Gsjvl&S{khpzsOq6sG3jaOhWExiD+|
zTxg!(3Hw1-jTex*o`%X=oVwKwYMGrX(mY;01}@M@M303V-MhZ>vZSLaB9w>42)Svr
z*=~%JrsDgk-cc5>P>*Q!8P&Ipq!<=u>r87rb0?wutPzakoyz;(w*_qBej7fttJ3$5
z%)7B2<KQ9Me_j0x6#1e{&-97Fo^M2y24ey*=c9~2_((xCektnou_(DF2?s9<<U%IO
zx$8($8AjJyYidJHD6NE3+#0}YhCbFi8_tk?!1E&zL%0|rug4nVjRuBAPmoV%$dt}w
zea{pn4Ijak<xJZup4vd&d(%sN#gO>qNpF-b>!}f+Z>@YjaTTcQY_}$LEN^c(1`DP2
zeH7I72C*O@aHr0UqY$@>t9B_T!mS?k15c5x$$t|dpqZyc3+Ngo&5mR*hS8#uoud(F
z*<{}{&Q6$@?Dgs+BB81fioebC6UH4l?d6xYN;sMz7806E6?y`BSt)b!y1rU0KF`3a
zUVLFo=^@aBA1N89t9sP(^no$G1n0VOewP=Hq^DaP2sYI2Vvdg{V)_QW<E%`Z!z3xV
zO5}eZ?mnc=xfL!Q%Il80OOB7`CI{oLG`~EOTV5Nr$atJHn%mVl`;D@f{6PS&_oN*B
zd3<==<5vI@hjjIYAI##DQiu#`>}7uL=_|X1!^4Vi#GCEk`1*o-8Ur1n2x85@kuWQL
zY3^rNgd2+)%^X||rp#Zcgj>Kn;z(8?=5S)LTr}tUKA?_;!H~8FWs!i`p4iG1d}MN#
z*Q$Eias2qRSKUUIeUqOlsdG@7wl{AYBVw6G^i^o6&vPO#K@W(3^aL`yaY8u?gkApU
z9t0nJzzkG@*t?`OGZr6>C(wE=^PhJ%>_=0c{OL}c5zjV&7}1naIqa}30^c--g008D
z^jd0>uVhJOT%%7h)nrP5lX4z-e`~CIT4`>mSu!#PH#DeQzblW06kaYKnL-<gyBdg`
zIAi@CNmE!{nyF`?ke+ZeFAUgl7WSo!sO+rYTIb|fM`9_e#pW^CdGn5~f8@)nJe&*c
zphw`92)6<56`m21@r$9XLr%m>Bm+XPETNMdH%t33G4iTZcUv-b2mGBtfpb^0;G@g3
zgrKOQ-p(>jj!?)6j^Fwc_Sa}&^_@{+O;|SWo7nBnb~<8vb~~H-B&&z?8=FGPuIniJ
zqM5TWH0n(hsnO3+I7nf#ocys&H|>40G+Rvdtu?Orm+&-W#{|!43KS!5BD&{A=P<Kb
zYWppD&2KQ9su4xPnW_hI!R!3Gq}Z;%-@mz`w0zKpG(+m!3(|UI7Jn@HEeA1HG0)I<
z9u!a6gnQFVZbIqOOTLcq(OXi}x_4m6G@V={EBaV$(ZTeXytGl+r@6OL6kdAM&AsWA
zC5KBF3QAxsigZ!o(U~$Ud};|Lt(M=}D6DxDX3<KHS1(kHSiGXSHikZ-(v4{=b|$bM
zjtTDUxxvB-Z7S|24sa{5#KegeKM1BoF}+ZZ$cSpXD`XfOQ`wA)i0WYk&+5_FDF5yT
zfCh%u)$F(JIPW9pfCDhbrnO4W!4br?mH~B#c9IYbT*ZP^&Cyk+TN8)L8kI1n$(Iwx
zur%)8U&FSjKU2P=FxY;g@=@<R`0KMI|HLeIye4XC56zQubVf={-Z|S-iM%)=vixQL
z8%absL;Dxf>R^!U!5LResA+mj^9YF%NAn5GBXw&NSgen_f82?Yd*T%BuD`jl+)rrW
z$xrPA2(|Tbk%sxO=<pJh&3=&u^Vx0@<$JgS`v@^wz!Nafokd6PhC*%ddN3?g<<T9w
z=j!ae!&NwP=a#6_*Pgjs`m_8{Zx<zLv0do464502RKtL{SLLBx9l4G8P5C7LL2{Gq
zqtpXs3LcOpbL*daW+MkxUSrTo(6MKwU+!2O@%S_T-k%+l9ZJJe5IhUz6%MZhM_D+y
ziMB$UVmrXeqDplXUb0mV;N)+Jz<?1WEyqyn>sCJ@QAxn48iZ7v5k~#{2-L^SVfWcB
zEI*#8mFC&mF=l!YU)_IevFUppA8q;&Tz@!B2Uo7U*sXqiEhR_-><>1nXGyj`WN#=U
zyus7d41DJ@XJ#Sd!N+7F&-(jyLXFf$ib*3ftP~yt{TmRs0X01fha%-!!kX+f<HYQf
zV{dLUjb!I~GQ@SdL?g}>DAkroqU|bkzm?N-Mn;(L{r-ZRz*H_7Iti$1)oK@jOVuhf
z|8gO}>)b7K{mqQrvC}VYEfvepy-*eo+spjmzmqwBJ*<|1P9fF%Kw{@B4LPSRqzq32
zby}Tx&v`#QNCLiv6;>hX$WhNtsA}Pao!@k#{YQ(1H5MN=1hvf>u3)8NKCdUvdOk~*
zyxna&%clLw+Q&@Ic|2;e-NkM_G7bcx8S%HHN2X=PcEcA;UifU#+7+B6Re6?C4hScW
z5MxJ_0l}MlJZ}N*=IUmwv`dNsf$Ll(E(LGk0LY#~`h8m~8}X`KXCF;nP?_uyzD5=|
zGX^E-kN~8UEove8m}5NGP=Yq;L=4(81Vu>rLme13=}g&zknxnMpHx4n9$V?9$nnM0
zGwidqbg^Q;Tk*_lGvl_<pQ$1nmgwnKRIoUGGqt+phS}^J@*fyHg7x6{sAk+@jcwIe
z5=<rvaUZqPnBa3v^d~mDXVfP;Fzi(caqP%8cy=Io<^hh33XQOyuxx@DX7xl8efds;
z?V=jyDyTk2<4ycO&)=PA9vHc)SW8;K>w#>jmkqYF_3fOmwt~BB{1nV7U1vJ<D2p9u
zUO^Wu3dnqIx9W=TCv|vMPDQmRZt4*fMm-UsCW#_AlTTNpIwO8ugB0vtBsDkNy!N+W
zTW>d#zaMd5n>BaUgHsJM^xZ*x#LC)OL$=v%Z={}i5sOlH)SpjY+MNb5;xPYS39Uuz
z$_GLDqJuT4pkwj7dD?v5UVXjFx`?uA-&(tOcAj3pwp;&v;^uD36JJ<uuDF@iU3k3$
zY5q1<;iL<%Lgm3};@A@WEt!|9dN#zl27riluz_zovEws46&Y~-5>*h2h_NggQ=(@1
zBL+A<26=zGeH|$8zaex_WEsq|XsG^vJ|zW!Bd|&sr0P>2564%XKfY6V&RCmo?!sGB
zuQyJy?~m~&CIM0f&RYA6!H-J(qb%Mp6X&tdCyG^VNO|Mm$Mp5&8p^}1#lzVA{C~2&
z$J+*T+<-QX_~~>rY_&c2vf9q08UD9dNS)0Cw_P(XytQf@N^SwEFT7NRfb{%QI~aY{
zcHEWrzsW*-GP2v|n~7u8we5WGzj#Z?zQwF@`B?p{8y7OqE9^l1oxicTf$GIJ<fa+i
zl3VK80Rav>?X)z0mpq5d*}Z2qV`&5B*=oC%V%t7M!11s~)a(Bx3yb4}j`>-pyx}lN
z1@|p7GPFDRpgZl&_#E-I!7w6!`=odOjaB<eo=w~F^2-({z<SDazJ;^h^?>|&(8W&x
zY2!6NQVIdRi+9e|6&19m4;7x$&=r<+VI>Aum#hczCqrWXNBj_D*xAp5#TQzvC=$C*
z%UOHP2XG~zkFr?Dy&^UFywcyXy9f_w{hK*+XJ{hL(B_t`8Qvginw<3bR2-|vq7I0N
zIQ16du>B|kpB+qu3~9qPFSJDi)PE^+!zyUu3=?hcCE1e^w|#@-vwhy`7~P5;&f?y8
zj2sq~C?DyO%z?o85GApb@mi<5v-lrfuU6TwQpv-Ir?8T#<l_o7`4bNCWG(ei)ZcfP
zfva}3x}6Y@5x)n{S#K=Qb5tQk+koA-t`|=esGZiRKG9G0(TH(pD4rMCOK)E-5HF9j
zvNL`7c=<gr51i2GyMuncTq7eJ&^lc617Jp3SzU-@+AO~PgZQBb%IBxy`=3!#zYIVf
zFUhiy4`IzejH$#kYqJ6Ko}Ms<L$Gx6=O^6fUN>|5t5*k(E_!c=3Epp9sev3D`T~ce
zKyELz=|p?)EH2x8YDBry$>a_cHA*JR=Lk}hg`AK$8tnW|t(9CU`px2Pw*PWjv5^7r
z{1_%UHve#04%z?5Wod$9GR^e%`H?FaB$NC@X6Z|76d1tOWH1=q05(TN9sOWU-4n9C
z3oJ!^0GW^Q{(X8RK0Tqhv;sf-Z@b-%>8scAh7xYhcL2Ld0QgO0dbnpSIYv*m&CBI5
zLexRaVj9qN0Iq8K22}R9^nmvTiOq~F;j-ArN3aRIxD~b`Y1<H@>AXK{J*>k>wH&hl
z2hrN`{r@0Z*vZ)}wlcuFpu=-5e1HE$VF>9c?+-aWe|!c-XN3@4yxzWsBh-F})(_It
zwJ1fiW=zgbQ2+S1dzjPcb~w>yglHN|7n}xaDAZYpe?VeY?oK$557yX`Au!`5_Iow9
zKL4!TE>qJ9R}N^^;1a5vz!Y1O%;*R0;eI7_Dd@Aqw62g0A(3=$P|*8!k5pD2CzNNP
z1>@`pI#PN`4(70!ju$$`Qd+Ozt9o*}!%u<H|I%8D{YWjtc=dxnD~rc-+IG6CONdk5
zO<$AYTI!Sj6wer0ct0^eZzcY>e@Gl6a*rqFuA%9+ebaciR|DK?g22oTM>BdkQRW`^
z9Yq4Huq(37<!Xj9&}lzKg@*FnD{Rd8=Nmy?*y`_u(+Z8`K@>L_N}+XLd|K=x6)WTJ
z$dH+;2A`&2)g)(WATSHmj6>ZFSZHDP_f;0>k?xE4)u_eufkp;>7B2M6BW4F-!s##5
z7T{tW8*tnI))0^M>;A1F!q0I3Zv51AbG;y=Vf(a3FiS!)WfN2y2J{d$8TmVcv|Ntu
z_z$@kmba%TJuj!c!Scl-mG(Czy#72i2aPiSW-i8JW}c7^Oqs3Vq6kz9pZ@4KyNaBj
zF@?0V0Elbule>CkH@fO?PN*AtYMvVm3M*qXOotva)licbG<VX67F-AvrY19t;-TvM
zmXPK(z<zw!gqCtfaK6=o&&y@T%Cp}BfB5hpMaYpP9{x`ku?b}k&_x_{`2TegzyIwb
zimUr0FOUh8ZZ-p6pm6(1?EzxF8~(2B$90PN2(Heh=0!$#!{6(29jOIa%~Vy7@E<A|
zlPU~N8j(_&pZJ(_2CJz8u#j`T8G*xHennVj#Txe`Bj&z^e-H@1m@tog2+q?uhik^H
zJ!??WWqB0;hNQj2e9YzL!Q|09Nyd-gvm7|+@sDW9N^0uNVoRkE=M=9N6KoRVV`27N
zUj4nl3XszWV@P=I?iNVP_)jv?(CvSdiNKfy!pzhv6Iq&lKOqX)!<4yM%H@H;Y|=b@
zKUr<3TIl()4w9~bo`w(&?uwj2%*546CtR<%MabRYR_UUq_Sm4%&7?PaH;aFc3Ix8B
zp3P|_?ubq1;fX-IEGT9W51nrJ>)Zjlh(5PEr!fTTm`1T7gETPrWV*WyqUi75_v&KW
z>J&9GV%(=+&d*8Q9(*fA1EHl32UgEd8cjAMJk)`FsL6|x+J+qHFP%cSQ;Lmsyb{#;
z=F>r@p6njQ9j@FF)@oDnY$ZYV$Lems)-Kt)8~{bcEx!L05o-?GuS%^V>G?q{!c4ST
zwb(NZ(QNSvN1D*7$+ZM~*-f9#k!#IMrhR63e_=r~%<MLs{uSwL=ux!dWU4HZdQxK5
zt;>#`Otu1#OSbGW=1;cnJ6P0WG5qJllnXN@VY2sDSf!&iHPZ<|5|PC$EPE0*?ZWJw
zOy;RxB_Hr5Nqx~vt95^o(?GeoLxcVq+6qo(inofY*e$dFW~>o;-65c}SaPrE9A&yB
z=7a*>#Gj@aq-NpAqUBS_CN6cKt17NUS=_CMn@MkWO%2!-y$+V0dksz_PF^D~_l{Zl
z6sBs|eX*zEappmA2)z9hM@%cZS(z?!r?UgeGjIFgt4x8Q&?fy^)`7(D6XN0wps>2%
z|Bu3&#Qo(0ps*gmefvq3{>~^N&$HbMgi^5-{bE5-EqQQrz~XH+VV$YBLyzwG6P;<o
zl{mCoy|Ip9QfwQwb<{?Xk1w^T>N<C;$O`%u5qJub@MyRQ|3cdL0P0;DvtHpoD?SF)
z8d6yb`jIB32qnJmlp-l$DWVFq0)xS_1<m5U%a_J|8Wb^jUCjVu{N(dPr+z!^km&{|
zVgOW8kW5CNl;@MZsv|F^h0kBnRgK&rS?0gBL~g9cF>x;zm(UvLgL7O*<_ErU&N{ei
zGh}BYz&QsVOdTaWf$S(Bv!>K-5;7Ql;{=b>HNS?Zcdxb}Y`WxiBeLq!jIxxW7A?{v
z$(~tDQGo?0b<S-~Ba+B5;o^@ORo8am&@Q31M{k`-@b^}h%)zuLblZqo=MyK~xAXEg
zT2XAR+;+oDoD65Bh_3g<tG|D(e8yWyMFl^&Tn5bN%bkTMBu$lBbIN}#RcHIS15RG4
z?b>$13x%%32>L8H+9$U_{72-$xC*?6@FZzmLqYe2Q@w$Q4SyL!vVwlMyu9+-xEQ`e
z{Q=Y2!^3f@-Zlw5$K$SJ;0>rU#mD^}{msJuJMEGN^@XKpZynik$c}(@f|q|7oK8d_
z|KJ+hEaUC45(e0Rr0Hfai2G6htM4}Gv>5R8Oo_&RsB^f3SJofNKe_wK^=V^bcj%@)
z?@T`ZZQdDM?5+FSa}nWWC32%td2#_<d%fVm)-7L>6z!fVZ&lbh9%ve0sY7awP}*G(
z9%vbo<Q*v=G=G7dtJ&5_9r&xSZ;l?3%7C24ot~8jT%&P3K$07}ZLw@b<2F2i>P4l4
zt%R*1EEz)2C$qusLXy0ltw=)4&d#o`x_liLe@0ofpS;d?yL{GPSFdQ}@Z^D%0I8Zv
zf|W>5Ic!F$Pz_I1*DtY`Pp{m<04nWzv5~HUTIr{y2I62sDhjJS`cpy~(_30PK}X%>
zLKO`84`yDB3Ral+!Q;dfoC?}*tlt3usJ*n8B7J6y_OdP>eow9CtYCgNjPnkQLhFi^
z{6|k>#{nwPSE8;%-LAD0J;$nOr_geGMSw=_xQyzZ=3%*vXnOb5AXz^4B<?|FhixGp
zD>#r|Dz%L24iX&;=Hp9Zj_(xRIg`>oA6@F0zk6pK*`@QPYqw-=6YtA7bE=2Fu3xJ*
zuh5G0p{`xpsMXj-WzJu1;100+76}b0mPl+dtJiC&xj7i!h`BX)ljS_!OqxbR9G^n^
zU#w<9PrswxAHwW0C8=a|dsQeJTU9!t3iYql!b-MrPkQ&9XjlFp0F*#$zk5Zd(ooM8
zW1U94p{4VZ6}z9M0bYsQS;|6KzGsfs!=>HKQu-9pwMX3wSMFApuUu~Gha>Bumvtjc
zr=Mlt$5H?;;Wk#lCLZZ7mQr^KH?dUC+p=eq${1U}g{9YDfjd|}MA%3EF?oTF=O2@w
zgbm2Ei)t28X|*iHZ%bKfR-Y}UgyFBHe6{*&DMebp$rYux){SrvFO8O?dUv&6D>&f<
z4!3dc?)-)0jn5ymS@rE%^3$3NnNrp6KBkmu<vv+pJu_j1>}Z8xfx=}^5u|fr?$@pT
zZ@$~r>Dk;DUOorrg-Vu}-!r?q5$|4FoWb5&UZEB|wN$1P;H4#%XzezeU$4gwnMCOF
za(ur6;0ODCskmOwO*a**wCMb^&l)$b>%G#w%6tm5dz7i<|EqYDHSfWYy$eI456Nbo
z(2RjiQ*h9Z$;yo-g+*^?Gp1h3<1Z;kU{~^_H_b2PNrjhk9Z#y>`ei)1a@4DMQk~`(
z@ub4R*YKo5OSy!{tFP`AymqCVkuKn=RYI=cNu^r2d?(e4Y@V0uv|ymu2eM2P{VTh4
z7lfhJG>8Xd_|=%iqg=O>@e)Df<Cl(P>yCEOPAf_|a)7=2SGr^u(Do~|c}KorXDOX9
z-mavzfT&D+S$2W|p0G-m0)3s>Yz0fF_?5O7EbBs@g({Y1Gf*zm>39sjN+%TyRz<2P
zZ?247$xCz^z=(P_9RbU{K&QZ3q3d%pQkyI7v_33N!L;;?bJ{FeRWRn7P?a#+_h2<l
zlnZlO!3fHfC0v%%>b0il(6`%QO-JxGIqmb3F3A}%R^y7C(a6eNkTY<oU5}H#TNZs2
zWlQT9<KFx8wq2b&-`N(Fvyo+F6geA>+bkhD8|?>ftZ-Q0&%+fxt9ltPu!ZI-nUITk
z>h2P#H2JkJ;pqw2cGI;Azr0bZ<o8<>t_9wvSwhMxWPX^$wOZG%Tkw}=je2D)yWd(i
z3qR3O#?nfqItpLL_KRG_qO$T{52Cb|?4@^QOnC6#QQYUvD=kS&2lezf8dM(-9@a5a
zq#62cKup%o9q-0PV$fo_?o>yeLJ|5#dyo<LL@G|!_(=2Qk}R6Ws(5;d>$8ziBY)Ym
z^}zC~It5;Z_z8O=^-HO2>0y)RvXLes9hh=x?dyeuf03@BqNVl-u+`&@my?Z0&+xL^
z%xCU2dZjH)o>~0zhI<U%IoL~OEeNd~TMA8-Z&YKXzcgcr4Wl*7yNLlJ2`~eGO7fau
zSDox}m{e#=nJj0E1*0-m5VWV|2F!SVNF$y@qaCp4Z=w?oK2_jWuue-fXQ*JZIG1rd
zFtY~;0U4yvao_ZIUj=%Szz8xh9=<2522R+Tn}Md=1x+w5F)0)%k!Qqhg*ks>Ew`tQ
zWrL4UQK;M}APtE&)(x!v+W4s1JZmyZ1ws7X%;ROpp7sVM1eGp4NkN8V03IopbET|O
zlVj`OohlrjOU3=s=NM!O^n5gsufIak9(S~O)9#IzYp{%*e}}r1e1YQ>;AkcKm}6|v
zr*4yKfJFh#q&z@zOSH@3Y9&|5!{OM{Oi&L}UL|-Rrwh=hq=eaq4mYsP=*XdIcpZ(%
z$IC_NZ`?W+&8Y#l7s1*Khq86b=Wx4Go3c~AUlF=uKuHzr&S>1ki&T^8sxr#HI^Y8u
zTw`94wSp5z@c?qPbk5!_uc|(G%HIRnoEc+T0nmK;dXw&fi!{{yc(n?Wp}%tNOLf3L
zl*r9+gn2TyJYjLQ4+(bF&^m5?6-H{?&Y>)CQ&h{D%@qp4X`hh8N-<ybheF8MkriC4
zRvEA6os5E{5Poa|X}T^|8A?6sDBmKp>HvtaCl~jQR*Fn6vdC^h4bW+~z|>D&ONbW|
zWy7+)Z42noQUUrW(`aH|(^=)Xur!ra1uhXZWhv)AInOckn{E93vwLBRiNxrw^5y3w
zzO%B83?qyL^Q5H?1|P%Mxo0o&veH(NPHK*`AIqWJ=J!_`m!_CQj{D%q!e5WcF@*kq
z_TIg_i6dJS|DR7$S6=sr9LqwIjcvTfIbjG%I71*4m^14xA1_&!+Mq#}6g`Z6Gr7w^
zz$QGCKtg~dkU(IP!H|R@1{?S;XWX*=SH8lpcD=e@-D(M&WHR)cAy#+QuGg+zyY|za
zKgO(!NqcM8AGqHiMWc-49!20M^y5XaxxDtUab}*{9J-IrxpSX-o@1XqT0QpIy?W1`
zJK>)D(4D&{jpF0APj9o3FAydpO)pF1c)qAH^eBpgXV9aw?wN~}SR=4en>*i0Oc(_u
zxh8WL5miYypi9)ruaouqnNvvr=C0d#EgENUtvvp4W%25BxO9Wd8DK76nbqsJ|Ml+U
z@<`m{-!DJ<PP?3nk^C#)(_bl;(g!M;oP^F~hv+-il4Mt<lg+GRkPbj$_L^4E7!<R-
zC>=rsAkzbQ{yB4g_14!b*N&|Ibk}|I{y%3vT>EIwz57r<G=`GIH}Z=ktF!l|pivSe
z=AU-{!?oFS_;cgqGs};!Qo_V&`pJilzh6i6XzC;$AA<Bz`l^G}J1ZY*Aw(6tngT;P
zXyNWy1-u<LwjSJ5vYA@c{35W?WdGR(VnpRYhVz5@uH;Ckc6EU*sUyR1-sI9p=mk8{
zX}=5Qy{+xdkGgk8{3fFVR32#U!e^^@?=C-`cW<2edFe6)fXYnet3wH<IRW!}(gP)m
zAgxd4dJrl61`|~H$$GREz;l#6L}W%p{s6zCmPO)|;6D@=5$fbajDXZIarxmTd>rD=
zO-ytm$`82nr|}u<esHbv`LV|LN8OK3D^OtX3q}+AgRAQD5aost^BInBKN&;iH56B_
z8!b>L#tr<|$rJc_U;oZMe!K~VLN6miqwLhFarz`XlX1=cl;#jM9p>Fu7T>2%7W56q
zAWF8Hks||H{KQcz{`<>0W$~t0{+T3J3Z(TDPc@M&HSfwZ$%WMEX;&W9o-C+6IgkNn
zJ>);nB>P!6xsL%kSl$y-%;Puj8!6r~7fIgW62)#J#0e1Kv@X1PCc%yR#yEQ9^JXLT
zO*+A3w`|socgMzzA_Oj_*J*l}{_J)u%wRi;EbFSTmp<#+bWt0mhuQ!g)C(_y7hVMI
zdlCE#_y@c&?B77c{$I7denbvW%cFi)%T6{wp2tT5u${~-KC-dq>FS*Q>zb#(us-*(
zKJPM-wwVRTdRT98SWQ@u){Qv)FUdyULlW&yI86(63)N~+Ez8=>$;PR_HLlL!i#-(q
z=raa5q_||8^F`)W6&Ls?@m9J_ooWo3u$_%FS*n;mf0i314Yt*4N4WTBoqA&L1z4}$
z1(!IyN_TG190RNKcUBgU(Lnb!PJQExtBuP?@d7R{oS{KUM1q6H2RGL)d`4m=opg^w
z+5i)m3FX~Ws_mM<yv77(o17dY!TY(7eK~_yjFF4HSIm;2a(&g%rotfzWT5sr0ItI4
zbKSOI%l?88tCCp4A-l5KP>QqOwAPGG7frc`))mpVOY3h=0lj+nCNuLVa(Bo~K@9KZ
zG$G3icib!2WDOh@Oj0rkuK9~!;_j}_oObUnt<K)ZjZz;2>U$5KI>eQA?bG?SuP(h%
z3%^hczfcRmPz&26buyLq&%ygbi@J_l%nedHzR)y=!}wpb^6r<`*!^3n>3&Okxph*+
zJr~W}I;z@&)oL$PX#Dx-%z3crx$`x?Ui@?BLmEy~=H$c$g5K`o)_-!QarGp>aNNP=
z8>b<*Exv91A__VZc!=`t%xOEiHgg#QB9k6>`kzNHw0SLhl>PQ(ktEoYvPb6bQNbdD
zJG(`?7%aVfA$#0N+2i`k1xXMo5wwz$L7;&|f`SR%>eJ8GW=~W3icAtgl1b<!B{0h@
zfUp7?A>}^oestQMKL!#YNxlGSd5%5CWSW}pBQ{wOVx9z2f$XIUSfDSt^fQYy@rnH3
z)&;gc?fcgF?>UI-eSXe;_<(LQ8iN5(ku+oUr8%q13pXH$I++}Bqo7lU=MjSB56c{|
zN4iIZ?0MzN0tuxXEWDJ@1w~luzX1MY-5OV-XwXm-#Ig`VVzwto^}L^d34yAYq+}c%
zL94ZY83`-IV7@xDvN$L6JU=g8rduAsv9tsbqlFKWo&jL$PND^|^g<_&<DV!hn9)u-
z&-wD=O~y%ac1a|ym8G#LnMnL^L<O=iy7Q+&VM1af;l9vz*{QTZl2YM5%-q696-b0X
zc=|$m`$BsALVCL%(%XEs3?%e7CC#$f%`@GQM8GKdYkYcbb^Z|?pp5*jKD`9L7zN~f
z3e8(~mMNtc!9|EA-X<In%l*yh;V6>wo=e?;s8&xuU%h(-^2U?IB#?OI>3JZEn0#M<
z1d9HTPUBmwyL66%heP}PX%Y$U?CK2Ty(k4GkyD^?OcyuYHa|!X&P(gN>fQ-<jlTbt
zyL1r-Mxtc9-(Pee{zzh<f5tKp@#6-L9-bVBa5eNYtE<VMQHn^aw;)f_9ikQEAdy``
z$@*e345O*41C7l*SQZN!KWMg4h4i!*f!+oYO$9+6I=tKpOMh!zcwz}r2>d^Z3do|m
zojgObtpEt{hEj|GAqpZs@d+r<&fYOXuaRhOjcY%-M=n$A35LX^3iXS_3rxGD2YrKW
zarr#n4WiSyJ;>lF?LED{02{@Ckb40UAMtMc{7W`X|0W?XTMa3>F+m||0h%+z{pp95
z#d9>vB}JXpr?WtINsWKQ@R2dZcNb4I0a-VIl_r_g*pB?^2fUfx8=pYdCgSV2A(18Z
zn2?C~ZF%v`%KHzfnA93%QV+tYo*;8q+{H&+Pmr5JEJVP)6Y@=>ap{_S?|sO|K+>rK
zPEape8W{VRS0Gt}7=}#H5vta64#}vjP~+qi5E@dc_*dT<A>HQGJhu2Aw|e(JJ^CAW
zZn{e|YZqo3UmPKclTZG;soLaP1{ig2@l6=AuPuJIdiSQ#O;BuJC!JlB&ajxS1oqm@
zN%!0*G#`|=+XR2-t9WZi|Htc^?3#^>YXU@<E~yGYYd1rz6P^fQ>>pj>&iks{Y<mhs
zl_^Ir`AtMi4t!!6#Z3j6batEg;R)pZzIqR$^j`S_V;)^#3lSwqwhjtP>x941Px=CT
zJ{rUqD#I5l!(f#myBiXL3ebO|$B+aors9I78(%o%o?sU7?&(Ju?=2!{!N8Ib;NQBl
zj~f?(6fG~@Se?NP<YOx3US7CB**$+ThoLUylCEb_)BWNBslz41(lvcduV<+9e&Z@O
ziz4lX%Ii1lLm)I<P>kw|7S4LoF0Swr<?+EM2YX_?(b&WUPxiNXbM?s+Y82t#xJ8?#
zNgCy5F-638Y*LR=K_V}s$w1()N)z;+JMW&E#lW<@2wJ97n)9RaU~cWR8}g!oXbmIL
z_0!BCiOPz9nunyg+}XhsDc`+|KM7SmcYip$yzm3GC~5BHarDHwppN)8;pwMi1?BYY
z2@DF)DY@F+aF5@HwR&>)g%-{dto`3JX#KyJ((JeA^3XgIo*eBC6H`n{f#h`|je<41
z#gr=f1LWl4(q#+(>3i*IQ3>_JFXNg0GIWvbRWGs6^1`Xc$y@kDfwK=zyeFUH^UXsC
zmhslgpQp#m7{Uexl_R=GtsgWW7f3cN34Q#o5L$mFj}f>yshAu-#9eE&=C|l=5-P5p
zP`5LFgfJ|?)JXEGUsUfcaq=(RTG&MVT4n9)AtF~{yl`uI;nuQVV&fOV+kOSI+jY!(
z@QVw6J5b}t+l}|nVJ4kg@-qo6v9C3n1=|5O%R>AL838b_$Bp8+#}k#_3f@X*yJ+x}
z+t)EywY;#n{P<%qrSsC6f_4Am9rwf;%8mK;em&ewh~3EA%ttGWpYcI}tXdyB>0g<Q
zdv)ftx;5Y<<v=WaB5lLC6C#!ZbLlb@w%&X!=Yuss;B&1U;t&W+$9_TQ8rw*IK@Xee
z-FY5vHop;(`nn0t|FwkQv@)Cv%~H3LgCj6?D=pDN>elY}o?Y+`LYyzoxaa4n@Y_!g
z50v@DabiSvW98NvD(GIh^U2D^o3dxe%7yQlCE4-Y?)Nk9xl3!a=g{cx65e5NRZ0`(
z+9+zAz2(k5pi*{Gu^Z@;R1EeG9AAUfGMid{E?shu&q05d&f%^~YqmB!tEd|#;{~E_
z1d%zhs>bKl@*+NTE((VedR1II_E`2dQSXdvGe?<K8KKuAhYT2glg!&xn7(rV?#ivZ
zv})kEh&XYO+-QmP+<#n}=OtHwN^UihJv0RE?}CG>lbyx#!XRR?OQdycmmW0c|FL@Z
zC-=#xRA5KDA-mY%sbUlJ-6fFk6XE^~QT_{2KKjjx@>xa?V<&=V|C8hH2Y;jX1xhB5
z(Bgl~?JtZGLG$b228qaD$i>%DF7AP1>&hkC7S_GVU6!}N0NA@00*=3QUGAbLT9+Gd
zA_WO218+mLy<r>ah4^>F#J?|OzAt3H>m>7Cckylu(p<_Qm{*WzWDuCCVLFq=(+D^2
ze{&Moy36GY`uHzMAOF=oJ=!3T?K)~a6Omj8&xv1H<JE!OukST3ouzsSu#)`d2=}rF
zAr%A<A+jm%yLfr|;Yo-LB#E9hfMC3M=`O9lGC$*<xd`Qn?Ev+t2Pft$_uU(38lNA9
zPe0vUxqS+PN}sz+;)-0R9RNqG$MfLT^Zj+ILr_J!h0<82Ad7u;(Sao>8w-R&juz_E
zuFn6nDgb}wT0edt`NdU#q$7U58^0p`-TG+2X!Zz2LfN=B>puOIC6OTLXwxA{HSEe7
zKeow8iRy<Ca0U~!Y**7|%#xqGhsT4p?f1cY+ln^JPw@17C#WTLVDlM$#&~<a(-=IJ
zP5brC;rFZ^d!ZhDp&ksQ9_(sCLkBgeas~bp{wi(g=)&mYSTj^5M*<8zSb20y)$Uy;
z`b|=<JuMxbw)NK0QL_s<iV|ywYwM^T%JRa^#<eBPzhR~h*gWE(B2;*|A@_spq&nzu
z!><$3)%q&F#7efO{*xzcdExTP#na$kwDd>>xLa9#AFn-Cq0++L=q(l?iK_PoxgR&U
zN9}W*LtxtSKQmV;E_64}euO8W+nt+TIXU}3GoK4hF6|OD_w+k?R=Y7E8gxkkk0FBA
zQM2!t0lLSR)~-KcBV9Nv+=M*f9=5lnNEG|grbhHWhEpP{<-@2DwH87u5V-*vJRLyj
z@L{0CS|5#~3e#Voh~(#aX%j_*G^&u=*9}%C1~k^5Dp42(zLGoq7hxdip8Iy~2qsb|
zjDRO#MYi_&7kDRv6&Ule74&3XZqdrBKI6qvf%QyRM+K69S_tpB_VPn?Yqi_BdK&fD
z%j7f&@Ol`X>SOo4d8&KaGRijYo=s*g^vy$oqy4lqazTP>-*s(A1|8HjA(1onIqi#>
z_5H~O&Xz{}+HWGI)>_b^`nDK$TG|nJ2Vy{I`L}`NAtm5`^_t7TjhdEMZ3)H=D6Odu
z{j0a3JPb3)V2z^KlK!g2FpX@CzA;S8;j=W#t|$f=c&H%W=VPD_)2B{rjw=R;CcAGP
z5`cz**S=7}NVus)*zlF3Mfbh6bLXq*&AqSg>_m2LE!0<ZwPg8hrd(+V=DyxKT0UbT
zt+{yd*4&-_ExF`HlkK<+^xIG?F1e)<$$UzFA85cOD*F$*b>l4#s7=!?F|E&JxW%EY
zi`f<j*1pkJnBBr8tz^64t>cudWe22rwdumTn>g?hwHAZAEjtCV6-t$we81#Jv=wL^
z@|WWZ3Iy6wB1jNdTzl=Yd@)<97i)kU^u9J^T&CmM`ULRv3eJkf(-{eS@3(A`m`5wt
zg|Hjpl~lcRX?(ouK=p*Y0P1Eo#x$sHe%4(Ik<19^iR5X-!~}I)o7ceM`;!eK?xWr?
z8BeK*ak1{`eJbX>!CA-LWogAeFyG~}e;BqkH|TXpr>U$Jw)Rbsp-tey#uS0$^nut4
z6Kfrz4_uYK@eZP>Z6v^X*!vR{RI^1(jlkJy-C{#QUMpe_1!Ja<p$B~O)#*mJ@7VFD
zx3};3KhMr>ZM0pSb(O+Wx951$v|$g&tNt?9aJ=<r3Kh|UC7fk*H?@Pa)M+sa^RtIe
zZp{MDR<lL>Hyem`h;zK5wr|Rm|00%eymbpX3@5;}z3uY3$hHU03*jDGr5z*(XjQSj
zenx-12Y!Zg{CjA#IbLDCqGmjs$()$rHVo$Ey6bB$=L32DjOBO({-$yY_N}j>oRxSm
zGdZbani-qiq0r2Z$K3cWRvVUDGPqyBh>bVRO`Ob$X(n*n@L374TtnIu!e))i_ttaj
z{HCvIhXO??>)Auku!k;dPh81!M@cLgZO~+-O=#`<TB_+YW_{MnPL2P$?vbCLuZ<f2
zvxS2ka6q+sqIN!&-Gq%(Y@P`0&^9_TL^wfUga^T1qeYt?$>hQ_r;%RTN8S-Y(>pxi
z@DIdWFM@fAZ`_Vy)+#BP7G|0mh`uo*Oi{!rJC(Ndo!@%vMkEk>o5(O}*`UFxwo4l@
z64??6FJn}Eqh_*2js+8ia=KJ3Ow+3dzExz3Ii#C1nFdnmWt5;rJ-}Tk*@^x)FHBE@
zN}8pz!sN0{<btxn5?<3+UP0Z-)WATj4|Syy$^O^?!mrd_^jfL1BU7(t3cKHo@^^1l
z9AW_M<nW$y3NiG$Q*<h1#^20O<%`M=5zCpz{PhGTF?HM*Vr*$E1ZC=Fa)v+xt^&&`
zMPsvzq9r}i0C0B3J6lEqIUyZEZQ3lGH>0@Orq256%VjO*^+bCgqB9zuBWu>Pbt_%7
zfxcL(e}fiHX?Tb!Vd5%xu3Erx0tTe@0ENz_UN3<L0srTvLx<rnMKtc&O~M0NGf#Xb
z>U())+<Pb7;}2*y8JY)<WaFR)n&9Jf0UV+recrjRWs6Umi*(^z3aD}EE<OEME_~UT
zJ5TLqnMvu%Gx$pOKQo^T3tkaE9(=NvE<@HY5(HQUulC=(`o+*&Z|{Bct^MiUJNLZ4
z|BdwiH{Ra4?~T2?UsdmfG#^?%2UtaCDSp7IMI+*UM7xW46)+wV90H;M6?INc1`Xed
zDOy(mG%07J0;o&|B)C+s5;Jj2T5?D!9U0d^yzDnOHfB?{KjtJP6ql-2B=(N#_!*<Y
zPji91I@)d$fwJJ~Ks!q1Jm#pzF=#}R(~aY+AtvQ<LcFT+LLoDep+A6wr9XGWztR?s
zV?G%ZYT%lwp0iX*9}t^UVpjsoMYZEI#%V==g3yX&DizxZQqh5SjibX3I+&SsP&Si2
z=nSHQQ`=lcH0S21z>!f@JBZ20frDt2luMUO)jR+ljiFkJ+<_4shq#!7>P3twS*KdX
z(@W*NN}D^UvQC+#f=Ldd!v~!rgs7~7M+$AunHY0&)tH3U6Pal==Ael@ErtnvCGYqF
zO%0-IX~IF?^r8u;cCduu<Vpx%W3qX?$6<f~t0e6nkk7$%<VwZOHB@v+_n}h0h7LLv
zmEBYJg%V;@--QY~^W9oy!&J&@v$M&x!@<=PoD<PDFlP#SQ%Q<6{k<E4H1NEq9Uj_V
zB%6kzIz9b6&?|+*Q)2HC2A<t{@xYljfSpWHQB2<t+xqB!=ui3$Z9f3o0dMo;`7GUB
zKBF`S<*^Rp#PnWSj!XP+I-S9H<TN#9A>SgFd5*;NkE8?eI+q_GN8R0c*eJ7wh)|sB
zmP8_v>`o<8Ta(E|Z?-fsk*~$cH#F9~%%<+5a~NN{3JzHY@K)*T?cLN(<kBd?o_M@3
zH$Jv?Y+Dazc`-}9CFe|TDb@=GJg%;;aP7d7Bw`8Fm5e1*{VBX2xbka2PLtXGY%Z7C
z>g2N7zJ6zHAUQCQ$T*oys&{<bKqk}Mmq@0vi6mwMifA7u9m#&&vqADdl)^1CK!fP5
ze6i#>mF`1$uTJIB?~*-<RBBsqPa@I(O0us9^IH=y<HE0Ef<K7%*YQoY7jLCW1U*KH
zZG*|=;MOGSN(>}G`M<9|_7^8x8$`ozZQsA+jS-{{jGik@W(4BOVxhc6J{q=A+E=gT
z@%d1mmT>2<2&D#47pWPK$G6ZkWJ`|a1==Ex5&lM0|0_!NZbMy0)pr0@fSLB{idU@i
zU(vt--VWMByWc}lpo+2wLFJZ<p>6$rm||4%B?Qnp+SM6D1O0d-VtP+rbTj)J1tY^u
z!dRG67%)Y^Y$lgWIf>qQJdqk7$n}nS7z#5Vu1$iZU7=eBL>)+`V#!|A1^-R;d9-66
z+%~g2ph}edYgm3@I@E#o<72H<2AWu^pv@TDii3kx^^+}?r-9$!%qclj{ou1ua;hXc
zMK~MZd}|+`^htcW#?5vPf$)Z<&sLAw5nwDuo@le?WMEA>e>BXqcB0^0^4_<%?|FS^
z`mJ4q`geFpznBCbQ}se73ycg`e37ptr?1*F=yIc-F{$63HsF(iz$cpkmxpG6=RBp+
zW(TU{8=L3-2dg9W69_XInE7F9U&;yLwXKyf_fRMuCY*Y)gj8__Wh6L4G@rx`nY^|q
z4K|cIKNhsmN~||xSir9@YG@5`<gm2+9MGfbWNy$TuG8evjBQ$75~V!(DxVtn;e+Lc
z3pA{_NP?-sp`jn#`Acxa|75mt{cDiZQfmbF>G8GeiwaD*-7j3=C4_Gl2*{?{ugp}Q
zSPAHNS8bR@(73)#H8rNaojR-`y|Qd^Vx1T^Xxw#T&w?6rpTJ*zy+1F_y@`9jk08?O
zH%A)ZAEg<^n{=1jWHZJ;s+r5gkn*Chx3jHLk(Jjgnj`$rUHj3>(kH8TKU`io)3|WA
zary|pKTmf%<l+K}Z?4>LoV`hXbE$Lh>OW4oGfQ~vjVt$Q$Q!(h?(rpe{vm*~%s>Z{
zYqIR@(pgPn-p`+OkKbOo{JuyAV8KVo*F%$%Uw(h({#WkOMZKvO4qc^QOGmD+V1?b7
z*e)k9;WIBn#c^Qd^9~ccFrl2J*Vs$0i<!bOUO;F%K};O*kHJvvTqsu6ht&#G*JJ(q
zw&4p$FJ}F613ifzr*~^8)-Plm0SsYC2*4($l8M*=T%HrLWWwVTd4A7Vr;C}1e0I<N
zKkUn90L0s+TBb6+6Rz?({4>f6YVRpi*`~~?QUWYWAIucS0Scgm^eQ^#ajcO@<n0nw
z@WCg-a8h*@SSwWF5Q10aVjicWPr^}5B{AAe4ddGf#%46dC~0?c1kAEiBcfQ!E2bAh
zgM#SNyM7uID|{!TX+2mR0Afo~E?`_-Ui`Xo<!L~hB&K+nwwm{y7Yx_?sbp-5J2#NN
zM$wd6ft!CIA=hOanY*w=yQgrWDe!@ij8WMC0n>U0w#M+u(A6{0AL~t-sP<-QrzI)}
zKOT&jN(!Y96a9tVR;u-KnV2l2X{Q!f#ME~7d73NIg{?qT)9w;GKw@Hp$rFeN@n2N`
z@7X3LJ3Hx*m%!vWO(s`iaQPUy3QbjV8-i}5i_>7{X6WOq(R>aBI<;w~F}8LVzKlw(
zjMLfc^l)-SSbP{K?(~p4Q`XQ#F?GKDrppe0+B%Cy!4_xJYWNU#R*lMjvukC*>98~m
zvxiO!IdT8OGt{xcyG=D&Xlwi(?Tx&{I%fX2VxKS6l0Jd$r`Q~*J|yZ(ZjVp8F1N_O
z6aLLK!Jkt4+{9&whh;IDZ()tijifbqm;8ul9IjNP-PUSx%WB_R!lTP>O>SI{Ji8%b
zK&_43n5iz=ix9oUIx1IT%PZHC`C4U#Mcfe&$=h36$uvY;*jlNQJ!mYgR32}hs(pDB
z2Rf;SuMeG8U3k!S#<S%*{ukp+c{-|k&FOH{tUu&G+&9jiMntMgFf>N|!kD+KgX>Ur
zsL@@0#^Aw3MR+hIh^~s*o=^h^i?<u<M#&Kq3=k6!(h%>Er9pTk?>6nL9h38x7V)>V
zm^jam0ngsB=*b}KhP}G*MXC}uQ32>*L)kP6l+K_!@rimN9kuRidzVN2!4VXXSF<3J
zsH8FFt!JHdG!awhL_Q4S6{qZA9yO`(wMzct7@~lNHe(dstTk@m+ZH;lCQfK|Hwy#_
zL8L|5ko#DaZ)X=<yj@$~wXy|68~YZ?PpiAO+c5PA+Ocq10iAL`>gw(3iS_rjbXN(z
zSDMw}+ot7RV*|`l$&;JmziMI=fq+yl8AE~Fsg^G|QGGWl+o@Au@qjG^k93tbSfP$}
zIV!t6Iui1*RBD9WW=frPJupqkt!zHIa_gH<sou^8<j1A~C}0~JW#N&IPUCyP;b|+|
z((&ojSl|KbS0|V(1k`Hp5cLE2oQ_eY`2Z?l_ATgo!my|1P5F$cpAM<T^G~Y&Oq;Ns
zE!eGZ!0_8Ic=Ltdde6A=THki<Y&xGhZJ)+=(7)7qEcGUEFFuD;ILubp=?9IV%j})6
zKZ3s(W>#nK|7OSn9^%O-?(Cz+)sqwy5;08Gf`-4pNj1>Ro^W2wZdACZ8s(Y8U(7;1
z1_?r8#-vmPD0ukXZ=A~cZN$xt9}usVAl_4S!^jz@ruINJ;mIFahaawsMQBbAe{F=$
zC(cT%6c4>*m&EzdJKvcIN%HLTtPnzbk%Z6tmPuw{52&hI`yQUM@+;9Mq8X(Z%I&$_
z_WD%5kk3@6cbBTwDAVW0voJxKdaVRhIR09_m<63TSF6H9)tSUhIGN}HGvQ=%pjWZ=
z<JVQ$1e(9I7zcMLiuw+;tC+9lGX-!4<o@BrX`@^&OjAEc#O&TlKy9xo!M<{-S|g5z
zs=!uwPEp?~?%yD$5KbB0v5M%e$%&)Nn#&b1;fM_XnUo(9Vb{CC*9JUhzc{t_)%z>Q
z?zl@c?yXbtIMw%w`ZWhatT<DIfDY>d;0Wyosvg8AMkHo@-hVJp)C3s>!LI8Qh`EXI
zYM_M#!zrMK&>>C*@eau4z*<@UPayn4sd&Hun{%r914U}GU#sK~9B?Wq1Fi$bY2Fn4
z#>-I6j}@rD4L-gK^_=5be$&V*lbK3BLtV}=uxx1(g91AW2kbhf!jM1%_aA?rrvE_4
zVoiQW<bf%&c(OAwc0nMR15ORM!`X^RnxTLw5{VI2jIw-~e0w06TalrQAf^X=SIH&_
zV6cW47H!9yY(EF1*Ot4e6pRC0koVQ-CW<P`q|AhqE>+SnUxr@6^DmUg$4>~}3Aq+A
zrNvkZC#r|C#z?9Afea`L1!IgJvA{Uhp~yrvV)W{89%HUYn$v>+vnv16?1+iJ^b&0~
z1_p$b)Op#S+t{b`ryFzU|2cCs(z(fE6ePLJfRRbnBMG8s+&PM>5HanDlj3Ramcs~U
zA8{E;rutG=vzO2WlRq!N{IWOgaq39Uly=-di#gmJ%9qp3Cx>;41k#A%9ypm|+QSL5
zA;Bva6IeE<)D_O75N|lsF_ft|Xu4DfcBGokm+IC3Ksz!;@~W089D*G^;rz#c7k6!*
zpyt}VTnvrj6Mh%U;?le%t`(OZ1kvc|C_QN8T_=)UF*<JGh(_d@_nW<lN$3sXq0NOh
zs!KX8{;-C~H_*j)Qwzpp&72j24YQim-oy>ada`5V1F3`)kLP-`ePexFldTw#HE*T4
z30ZR(WIqjnusjLA*~BLX^A31>6Vt2R_})~kI>H@*p#FG1iwc?IfqLeEgATxPL-R8^
z72@#*>w-@!3aF5u$k*86RL8SYk--v|0*s%Yi79h9+DToqxXl+7W8{7U(`5KbP12^z
z#59P0+D(Gda_hnv+Fvgh@W%X8k$5(GhD?1xJ9@TKk1PTQa~5KLm@5?rD)3bC)N(3&
zNijI}sY#Uqhxl`FC>rsBwc`~ew?9@`b7qsF#}%lB(t!hvb)yC7`BkTty_M$W)mRhK
zlq3gIgr4I6B$K9}+S_mL+k^4SPh~<#0pUR);0C6)`2sQ5#eeCiKtSZHPR&e%CI+!s
zV7wNuOjOel7fClQp)`rGx-omNG@B0HTB^gE0uWgYNY<$@9&sZ_W4K7xNDqyo7H<|8
z8S#!Vsm%mFx3thf<<jA3sxyXc;i8DuX-_ZlYwX!Z!#J$3i6^Po+)R=d`*oUlJ|C0J
zRDVw_xfQR&Kq8j1#M6M}KS6tzk_!<j#`xYsI2l^ONF7An_O*BNW-=f^fFOu;_7@~)
z6w-SW^Jz{?GNi%AoXQ44&Xo=qePCN4nQaEa&SY!paUiGLckEBSwr5yf2K;p-n8;44
zb~7@AExH|*T8gUcfcG-X86hSRUhz{|Ud;6at&wP?z9Il{1vO3<f6KtARjP;c2MTp5
zAf_OYWRH@sQ5CM;{7)a83|A8+`7MPcX%bT*i~Z#JQH!(D$%V}U3qka$rN&v3Oz!bl
zD<3>?&s{<^LKX>*1<o0kj9_><QU)YxUW}<D*$<Rq#4}YmucA{z9xZiS|2Pt30%)&2
zZ%WN|%JWs?@Bn`#@K!mV?~--ul&D>Q2clL@ADZ06y&d=q^;k7yd&W>PjSJIIUx9Hx
zJ!xT2?>54}_HB!8GfDn-el_wV{NkQJh{j5#0vF-M<T9#vkmRtVcMrUG;OZr6lR*%^
z+bA0g03&^~R~{J|F-&3gTjp5x|HcqQ?0_)Sa+P=LPG#ETqX?$FL(#;DW=*dS{czBB
z^=z{rxJ_qYej;$E0uga{#W_#`DeNRj1+yiPjZlHkp)HeSG4ddm!}F5CH?fJ#A$$$1
zS3s?T@}7tnxF}E5NBb4=ZJU74P_HcN=oz;bHk^15O>*BZFsfApi9igYbiJAxEAY5`
zQLi{V_ms$Qs{Csl-{$j0JmZ)*jjE7ivM-%orX_Vu_JZR~SIRUa_A}>S*C7h1*%I6e
z@<onk72<IP(y*xPdnc&#kO#NZ4`*d>S_roUlKnITjJGcV?pHo0dK5!^tAmWIQ{N;;
zyqKV%O^kNX5JtF5??!L~kwMx_1h<I)ChbO4A_K3L7w-WT-^Zm57}eh{y~U&ghGa+{
zlsuQnSiO2V2NoKB74!v+DZ^>%Yx*28SX1%PlZMcVgH9$#qQ4dK!IP%HeusaMzvNm9
z%6LbWJSRfoExMmRP$|{RRqY<BOtlBfGNpdNx;OdU21tZOFIGFbsAg(l!J#5F39<|)
zHAWnDVe#Fm2Q!#kh{z~jeWy}m+&-QV5eIaW>cS<rO6MK@uOZAg!O<edL1t7OHPSXU
z9J{Dr+ke5BsXLoq0G{!PI5vcyu6nRkFXSMG6uuu5#_Zj&op_JRh4a;jy)pEzImq{n
zvV3i{$)Vj$6RndT2&!0unXWMjACJ3w*g=1(S8F`(KuvEbOXT!vmLWiGf;lV0|E&8#
zA?d*#)c0G+X8l@K=2Syf<!6`NlDU8DWe6}Vo!gkuL3D7&Z}cYm$OXSQwJp}u-`fVm
zK{_L@J<=~P48FuR={Hm3EKtFb;w8#2Nneb8Z3^1duM?RmiSkk14}K+1B~N1nrBjLC
z0i74@Xkq#JwMH#Ih=zYp(iHD|2mk-jh*<+lrBgW*T@o>r7=col_Z?={5s4Y_ZR6Tc
zjjMkL8=yz0R~GLxB<Qkrr<}<lckUh<mS!U+PmQHZxK7T=aV!Q>hzDDZM*To^irDsa
zEbJBrG%gVceRz=9;2|vV=1^ZHN8mOP<+B7JJ5880Vr5!esv)^8wUWW51;b&Ape}Qk
zS$taHAclI>PbyZccTN(8S8AsUmK{;EL*!H%QqCr`5GE%>(h+4AAjjj7a(>!m&v@k3
zgZc5=5Wjp`Ru1l`FezRcvm7}J@o6%o9V=!@GDQTOb>^*Ob;!khY$zE|pl&A2618%b
zQn^GGC|bp8HdDw{D5pT``k&yI+EkE^IXDb_DDwL~`(BOMCaA(=_zk6!z5TrdJ$=3X
zmU2aBsurqXzF4IGLs@*{4<#%mV8Ggf7=0SgPx*b#WNUR;cD#|_f!jgzVuvOJ%E`My
z`dcfhswk*x#L&SSR{-ri<oP4yVoWq6ZM>LLEv7N<ygqbKi=hbn)9)B04NCfN<Jbhs
zymZMu{mIJ7IZUWUFqqYQ#~>KYql@nMM{$`jGy$#5f9l>jPs8<+)@UeTcm9i&v-g%4
z9yKn_t{#7~a{ceiiw}NYnsdK7iniA<B(y#C^U}%Hr&m_5pI&}+qVd!BEJeiWPZ%%`
z&^`X++J*VXg@^73x5e<1akO%4aqY7kQ1|31{0TZdD!B1cL<fUmjaoZCirf>&*N)wG
z&wdGUib;T+*Z;U3MPCOl-N|+T5r`TRbJmGxNkS7AJ!k0}L_<FNrF-iXw2xPgs7vUD
z(OtT>HhZpdac<@8Y~#xPweK!T)0B!M_p2}553b<>xc5#puFbl$CmR2lUpe~F{o<5<
zs;@r&d*jSk5Ie_jG>wxpYgcC=Nb|xm_ud7FYjpN+E7y+TH#Ekw)OJrCEic?`Tw7W@
z@(=gPx4tj|+PR~Ri}!d4q|>l`qZVQ_DnXYpue}TadZZj;LZzIBwwwT(v7V{rQNMeE
z-G{ukjZz_WL23nYns~+Gl{dBgVyd;nXgg;F<#uILY<q3>nrUfX``82ky{Gc7cq5WK
z(hQ;>Co8(;Q@{X(&kuZP9dv3LudesbC~h3JUM=I3&K*D^0d+NA(x!m$jQRo@w=CQJ
z*ARP*N8~k*zN<rhuM<tc4H935bNR_UwT2v<M&5F;I^xB9oe{mEPJM`ytT&88&}-}^
z6;GM+j%&wN_=j48uOn_NElobRmP$S?1<X)0g{QeNo^TX5B+O8>*j4%(tF7u^eYV+9
z&?Ys2jAbx)r|m|ikTM)u1E>#CsntwPB$w5%t)Y}NqP8sQZMm+(l1a6ROw(G$WEKsr
zlPDXYnG$8%(oB)s&r>rMP7|fw$AFJ}^t3T^tMS1__ul*Hk1_Oz82T@{P%G*xQArW^
zN>nMJiYC~V2#lqYItt9pVRo}|@jlf$FE8G7Pd{8ceg+Xf-D4twqe8$iA>h)!pHZQZ
zfezw7K_;^=hTeGT)e$0H!9VvT$*Wg2aR?sf7ToVaQ6(kUNm=S>vnl^%`J{Bom?U>q
z@1c?eIza@_NsvAJu7V_0^lwO`(XBVBt)l#MBbrs012fh{Vbl2Z6z15MAD>xyJioH^
z31-8US!qRQ*HjOs>tZzS*3ZKLo=}VmJmV3FtWABPs<lcTU$-lek3s}4XVG>ioeCJj
zqRHNPPhT9;R)9e_*m~DVN(joaq|@Xwn@;1uh|@v(J!G&h{CQ6LVtd;@`eLK7CBm#3
z3>9CW@C^-LrB+TpY|MY)-gtlI{0RveEHG%eR5_q3yJd0&r}PFg?p(Z&Z{2K(+#9#t
zd*>hu@8c8rBEI(7arfM}%<@l<y?T1qoxcp1M0^#VJ7K}6kh(qMsiOgJ7LBa~**gW@
zx%kw5^iAXQn=9AnaR=!r$l!hFi_O{9+lU3UL2H*DH0J;D^U`G_rY=aYlVm3oyT)R7
zas*G8vM~eTQy+;S@c3^7Abtl)#C~3y!^_f``*3;j?~PBdHGaGZAR%1u^1_|fvp>1#
zJ|RU(C|)DtE}OQ6N6U{+kV-T@FK&xMY%-w~Csv<+h7rP?DSP;51)sH{vsNiy1HiYY
zYX@<I?)OI_r0?-LnivvdOfOwqoB7CZIQJW$K3Ki=wR_@6h%kM94o|A@NE%mX0KrM5
zV)vV)jW4dc=kBf^d+a}3k58;z_#S!>6Snr*d3WxCHet&PGkE)~KK*ccX^BAA*3^q(
z-9^c9)xrFLaL+Ls)ytSVc}uup$%Fzn9wx8ExlG@6o7JkDJE^;I!qyKnM711@+a3;+
zPFfQjhrrH6*F$DYL4(!1H}OkAb${p~mc*Pxx_Rg#`7Aa(>ZY>*u~qani=q8(Mp5u|
zte!sRUb#;;7zwF;`~lr^p}N^Vwmlzv9)-_&Mw=chE4ooOJsN}78%KZmFSN4sv3uuI
z<I4TUC->Utwbsq%M+e-pVUKnB+cNgC3XtR%VHh34r053)>uM)*&!4q2y~w@yJqZ^N
zC%t{^bRe0|(Djq>9L@vEG1)jIA_}2nC1owl5#@pGk^VQ34O%}U%_yfdd=TN2K_UVZ
zrs$_1+|$S1^WQB$JmsE*fcuPV?80;PF@8f`Lc<bRzx#{2@3zL#@Y?JdylQxzXHU2{
zW~EE<n}+%uKXz>sU3q$1Q$+{Wa)4^$_yf#95wA_Odj0mwjYsS_`s%yog|FPDV|eT!
zJ8GOgsf~@aw>oordEtY`!UEOlz_e&O7~ryreuk!vz)dDtLy$`1;%Cr@8@E>Pe+dc`
z`ku5aH#MH|SKdqL(1J-1?1^tN&j`7<$@1X^p}U8QAbeSSymN`(iaOKNZ8C_WvN;HD
zeX292Y&ue4k5?jFuUIZWuWhGlLCEMi`>p%IZTI2R#z%MHgg-g6ym*sB#QRPH8Vr*d
zSWs&!Kf@(<c==i{X?+wbHHt@cQ3sJ&LLxaMKF$gFQ-pkwq1cu)Ds@$ugQ)06vyZGk
z`4}S_ZmDr*p0*C8i@2^O`$#U5G4_MxQ#nhgl&a#?073+76jwS$tiwI&WDJvwoB2vu
zyK=@7-WSt|6C={nBh0@6<ryzSEqC^ud+$TseLRkpAD#d%E**6j7GQ@we9&0DzWU%t
zc{Do2AQ_eE=ZE{`e}bwMDpchl36d`Wb%FaZxoccoaGxF<VR~pEdPZl#AV%E?-wg5I
zpqg#Z#x%w%$VzMU`X9Gj04Pe@%^-8!3`e#5=mHx4J#~oL;|ccman|n*Ch}4?%(Gbv
zi52y?6Jl%TXC)+7#otB<9zVMtxNi`m{zo_sZ`{XQF1jaath;CIV{B_nXInP&fu~&~
zA9$LX_z?J#eM`a<ujHKZOubNhlbm9=ue(d<e3ds8gBnM^a&MfW=Yve3{E<SovBaB1
zA}<#3{ajoQByjc-J$uw5ip-k}wD-|>%8NGwFjeW4O@bTR9Mq?Q08>zH$g05(+s4Za
zXPDv5(+lpEFB(7IZoGfa{oqRD^JAD<TKV#CYnT3EG;Wv}$z<JXrlc});6CqEAc#eg
zIQ|1i8?oCJ0pn4J^uS<RJ!-%vs?yOoKJ8Ji<WL^PGjU>tSnwjb8*P|D`8XJQZwdCx
z3Pxt?7gEJ2%>CL_GBR7XHr0&We6b}(Bdrux)yP~an<yJ;srJ>49nVeSNPF6%%8?r#
zH&HrDrmd~09fh%LD~d<S0Jc@tBOlGPDj%)G-8j9VWtui#GiaUW=cOCu3)NgZ$mS-j
zez1d+{SZ<sHGX8Uj!=yw=&L1Ei^208^@L{Q&r?%qX+20+s1~<GvfLP9EhFdaAg(oO
z#D)uO`L6L7*|vxN>n^sHR<T{dt%@JdCb|s_@<}9dfdV)d3%7}!N=$tSNtM^3^%CNE
z5L*_&@M-uRL?T;$3TEXiM<1f(AadufQFC(4mAg-mgPtDW7O7-|Tz;s)2+_pz3NR}F
zGp|b^w|R03Y^G&N&f?9rBTwDqXQ*VPapp_+#`lu>wCvCkEY8up(WcZva&>9zj_<8T
zz%5_dB({6LS0Lb~?-gq1!{e0<bFG(q*V4sa3I*N9)m}Q211|Sc8}jwu^4&n|6h`aw
zcR?FTd>z)lyF`(WYUNE-rhlnkJxK&o-1)yT$rAy@<O(dz7NTApEBEiZA6x_XhiYlO
zHj$Y^v>C~9LT-iliJV0{GY|rnI-)Eu+yNdn&gw^**Ire`^FrV(vhc{g`*7{?d1KGe
ziB$To7O+QWZzAikZI6)B4&FqdyrU`nXBS-9ng}Dhkm}i`4&oT_mpe#D+b=75P@?0!
zkUhMRJ^T`~hkp^F!V6)*1`7jz?N|KBsHi$B{E%1_T!<8gZ`$l09eHt^4S$<`aaV2O
zuKKJId;Ir$TQPg=S4eu_5*+s+qAFX;5et5MW$AB?3r}oUNB8P&dR1fW^~p2DNCIHE
z_fDvU7_O&_Hz_Rl$~D#Pnghgab>rGk?vcyxxw}xBm>SSJjgQYPKfa1NY}zTE`vyyq
z%jeytbE4b0K`>9AKkXjBO)U<L1_Mrt;21={xADb~8=rs4r|REID$)9#e}gXVv;cE%
z|LKR7#dFk{o1)R`(^<HsksAND*ZUaa`)&To%Hw&k+`(_bz=c>KvQ7Ha4|spOH$DLi
zERrN*qi+8Cl^yZ!Eiax~dH(?>0gcl~tU`dMH{_OXzM-o9TV7~=@2Ncc*?L}A(I5Z3
zb5H8cy*ty}_wV1iXaBCfd(u00Z{N3X=RRN~Hu3PP4v_!Ipcx@Vpjd>Ek#5mjsjxag
z5)2ji3^`PHZ%=P*YXWr*B>Q4px0->x_7SwkfBe2s${q@g+KsWJ;^e$UcXG7vEQ=vw
z1oj<buK!Bqv{B3qKdn}P;Jb?Jw;2e6ps^OvS}dg;)E^)D=e}J#a>hM-qVaU5aeq;y
z%EQECb>{Ti=U?E{7G!KZe;(;olQaURc)>TYu82cN^Sy8%c}&hBN<R-^$Dq@Y6!xpF
z02CaUU*s?IKq}!?I}m2EAeS26P5Bxg360lG{OPq$6?slR1%Of=N#Uf;Yi_q*54NpG
z30@PDM+GJgL@!bhTGI^pK_x`Y$<_oi@vx4yQu$ESR(Kpfr4Px`@8xiW^gt|CCei?b
zoMmPJMr~wUSWy1@gxMl(#Y}i0t<Pal>B1E-lO^ePMCJm-*^+=c)!x9NVafAgEDlN3
zmew)LO9-hCIqG4v4Ra*i<I8BzHX(o-J?cPviw-)7DL94<R`(<^3|m~0y9_dsRUDuf
z8J<fIvG8sZq#7hiI}8p*V-A&b;(AV|nlDUK)iFAJFpu{D1Aq%3D1j0imx6nEsesu?
z$S>pMWVno4F@_UUt8tcqFH?XBd1>7DsE&;>5>SLtWvA+KTYR6zA-fIU3Q>AaCoHqF
zyHqM8l6)<L$CH`N<O^`Q7c&Cszwk_v7`DVa0>6g%;^b|%G=cGm=7GWgb24Jk3Nn-`
zd~hu7a0RRhqDFU(oxo~QrM#A4(t|jnS0fiz*oIRK>39lJX^Q3g_)xB<Yal%I_Kyw|
zG84ke(qUIHt5)MFK^T4$*AX%y6uH>y%Y|SnOuz(Wy+ZQdv8<@*&kia&6tNTwhlzTj
zmWNb`P>C^5v;`9`>(|&c8r8P(D6Ezk38sd{9K}!KV;99!*PSHefxlD&N#=JPCAir+
z<hUe9im`a`uwl_{s-G>xL;y?|YD9350cVX4Ri`g9x<r+xA>_U`2g9V@7)2K7$%t*)
z=}f*+I_$s}86~?L(uSsC3h^>j>t&Lqs8*_A+-KSDXf9eP3bR@*VG_I<Z?1aPv4e+|
z7$w~qWgzekszMS?nXz!Kl-2S!F8T_LI)M&(J))R`TIbJ?v>UrYWv)~PUioG@z?;Cp
zCwTtAL@CcQ8=7pK29jjruz(789NFG>`N8Xw7vWiXF9n!VB+a%#E-WLafC6GFJeKWn
zg>#yGTj5VgUQ7TpI<T%xSrFuPVa$y2`!i@tG22MnVbYpRU3lJ00i^$K!i&+==ONdZ
zBxbhEy#5NrS8%nAHtk)#$g+Cpc+zB=_MCMwLs<n6u}zG@L=EFhTmXuiz9D0kQYHsX
zu%#8h4#Y(z{60!{o`*%9MiV+%-zLLPjgf90CZt}><L*YS3(*`E6i$pKXEk%CTM=I3
z;D$^f@qEphFoz^n48g=1Rc-IMK-LEpEq$~0O4QUBPWe3yty#M!>NnrFV!{2P+1@D(
zj6XpiwvQxqGx{0q&qP&O;w$U81NWa_+wCwjHaPGAVO2vI%MYnPWS=VEO4U-ml68iZ
z8462<;c2=b+hnEu>GS&VRUZ5R220685q=?spg0^0r3z=LIMo6+rm`;Kj==W$zYQI3
zAjY&Z1zeP?No`*xQ3oo!$p%+twnXSV&=0A)2KswrmSj-@mCd|}@DIq~LEG;y(&otl
zNy&760$;-?FwG%Sg%KefR`V0d)6&sWV}?ZBrwJuX7lfAej4{~Fmvp^TcV^K7Z5!K8
zDz<Igwr$(CZR?9uv29l>HY&EwlfCynx80Z9+I(0)V6C|(Mju^bkS=w>5xo?xNm@h_
zE2<b^+51u9jcrgYrG@B>DeYeTJp;RNgOtX)fN!K;40o&}98Q;RcP<E$Zl|-uzR_q^
z>08q4H9`c*Gw}sIN!{HTxH5#iaV8-uoy0o9;6I-AyUgVeX<5aoGf@gzOr(v3D!K&Q
zo?3~=F*WSDkk19#)m7rWgR3L}c<WwEdxGXSX-cURU~OU;><X2ANtAAEf-c8pHl#F}
zyoMrY`^)Z=bj#J*FtE&UIXOL9>oLpDrqNR-9G%cjJJ*wn%ooX#uiN+Dnu*V-DHNoI
zw=BT+ZoCl%mr{|8UIr1D3B^+zGz|6uSCU++@ko^zT}De8C>*a++OYI8b@Y3%AwHbe
zCoUirI8ac*6V~Yn{)wakzZebU-?;&XZc?;){@Q<T7)8`O^b0<wAIRy-cxU!<WXj-6
z?R*+Iw+N}Nn#02%YP5!g2Y#j6Q<5<gyX|u40`uu~>K`rV-{oO1<H9UXCD!Jgw8ENR
zE+L^F!a(H_4+W5mNt^_Gh%~x7ntksru4`P^;dbrQ&$PG6kL=$@GH1PL4z)2@MAt73
zy=h?;&4_GQO3yr$C0sx<@qHOU1hv-l56trr>BG+g;|?9DgW>IYtU&wLVBm&+s~~+t
zgBef;MPx&;9UJCXONzC(NaL%{P^yfqWb8E~{odcfinDZd)cpDh@m3t0j<Gp!5Ovf=
z;p9+KMmFm&C=89nV}W|9O=YbQSo@;P6D9@}W;Ce#Q)3lvQb4`Kv7#_c0hf#KZO+*d
zNU`K1i4m)IR@KmKBjH2!A(fLBQ0d;sggNW}v6*BUa;RI8e&@nE_sextqQZh~H?&bI
zX(ppi3D_&wtSU9Sl0DM4h4J$Jm>lOJS^=2`c_7n93z{-t)%ZKAW+tdaiDhB|-ZiQo
zL5#W-IY+bCej+8Z3nZX8aZvo^>SyJ?EoR^y)rzy<iURY@DJpUdRpLN(!4-k#Y=YXA
z74Qc)&h-ouLNi$z6pg{XLqbqA;V`s4bbMIKi6I}KOaBe^!4vzWg)GzV)z$0uFr#NA
zQspSG(L_mH1Fs%=q?@;%f4C8Y;E47ZA+O7JU;=Fczf4g%){D!xu>Q1ESu>%y`=9Js
z<S)AR1>@Kn^Lv2zqrl&I*`)3&GZ@hjp8tC#NoWy4DWP(5IhZS^lseq9h*nurQlcD;
z*op}pD}D=qz@z~7ryLs)t1>V?Zk~A@=s4wR(X_85FN?ydvfV50BvaBE%QLC?g8)9D
z;GW5mneh2dxb+izZmYDr!5XJf75@s_)KP;7=B{PjlnKGfB*s1uaN$U|?GdhKbu%jR
zK`xLGvi~Lb!I#Ziqz4Xhxa!kM5jpcB20~`U<R`SBm0a{&*FWHY;C1_dV5{p9OW*V@
z(^0}u{25$PW!#{St0f1GzPBp@k4posH`B+f2yIucmZZxxP<9n&t7I!Q4ybY&5YheL
z^`5@vQG|wG|B{Gd|Ds0OzzT)L&S`>9Db#VqZ-W0Caq(C&m)G_#qJy}EvD)tO&hZ7%
zd!yxV0!*jp6AG7{ulaR9zqnnG{*XPN*dfOvVP2Du5Lxd}Bj@o~#C6NL2iggDFeR~q
z!FBhm|K(3h0oa;Pn_F5IN<-e)q1zycFAYa((@_xcIckS(hQWx#MfLf9ir+@VNA=MH
z7Txfa1GFb?&e=;pm#rfw?)3ZiE_?cR-{HRkI(KEc!-{5nQgV3I=FvDAE~qKZ`+qsg
z{z|il*1EK9-8md7dB;U5(rfSeIJsE&$RM=C^ba^9w#rlnZQ*wdMpv}FVurGp9Bi!h
z`@$AC2)4MAD5vWGA+oAh6y8xGae&T_0aMQ0?B<V<P7MLwP)rAv_b+=LKlor~8#UT^
zSnSo5wpr&!)+G0#c1puSiL9k1m5%AN?ZaQVdzzqeN}j%t;1{TbF%J~|N!os_eO-Q?
zUyZLs?r(L;MNj~j3D!M*E;Z;s00Q6S5Z-4|xi?5PD+$of;S)GI1bNs(TU=Gvhgif4
z^WCTIk3~$H?FY%SxU#Imp%>lSv`I6?ds7;(Y4$^wbq|rr!j*zikAJu+u*`@mB~j5K
zb$V$;F$vV?c=>mYJ^eBaN?%dj<P*H2EN>0$KBIiM*aga>9F(F4o!&Z8;hW&O;5~4J
zk?Wiuy|<^JM?wmNU#Npcx5T6sck3JltY7|HQ2hMmBba6Xe7AmOFue~d9HkH=am~=S
zzb)uJPrSE^6AGTEF>H<z%=Nyq37br!2tZ<YvVs_9)7E0$i~l~nZ6`H|uI3!BJ5Js4
zjv-mT@@-j0OAfNfUFCl=g7hJP(e6oSh`eI-qR241ro$+EW=`?_vZ82!tGz*qK9jHx
zog$TFM1bhgB-Pq7iPY<Bxg3XCqL^GDe4tAIj1q#h_T^i6xg!T#;Qxl#A%`Y<oM(*)
z){-CIH}LZYzfTOLxX;KS$agL9?Q(vi7kgM{`qk#~^~>Xmh3@nOCaO{RS&?uG(O-T(
z9K4m<+7gP@Tbc}hKk1)_$JduaT|y)sY<YPK3nwJl+p5q5{Rc&r3%WjS6_zd;;B}DA
zHzZVJiSpm|nlIlfA2%%=>;h~S>mN-_=1o-}7qYT51yX#>2Z{&<I${*jIb$gzM*z&}
zOS4MKp2Z=zUH(8W(a|eU?HF<Z?{AS0Gp>n$bF+m-<Pj(VaMnzV7wvlqPPXuDJmaD{
zlR@mIUk{fQJpWz@2JZXic@8ra;CJ>D1b}rTA{napveR_2M1d9%?tpgy85#5(zxix)
zv!3O2_Wd1yu`Im&S?7G@2<hz)%S|EjWo0|!PfFnZNXsjj@6mnO8>Pd+%_|!ZBit}`
zsnOMPoGmEh0%_lX@4927=MBB!@us*)m068Ch|JYlDKG6oQ5Ci^nO7sqfF34qWDYu!
zt!BUl6r!U!k-~=x$!OBRE-$w=S_cx{w}t8`9LHGE3$+B_QdNV5>LSTznS}iFTh{!R
zWxGrj45@L!7NJzhjlWCf3-&C8LOYg3IxS5?1+!6RrAicMbs_WDTzjMHeRELut@32N
z2@<;_39ORju~9WP+V8QYkRM-7Oj4SQP&yG^0uJ$Dl^%Q<{#!BMj_F7oB2M?&kEB_y
z#Hx^M;=4y1FR`r#)VE?4Gma&>yUUxxYOKSW5Jwkj2eJ2FL{z&<p_KfG>MUZo9&2*c
z2s%%;Z&CWFp41%yK27~LY_e7qjT;reQ)8wr%od<pYftFQ50W<?<{;V=mvgD<WRlCp
z76j8a-aSaYRl_3stUke$g79tJ`H_rIiDh=NRoF{oPLattXP(I@hv_jiBtuC<RPa57
ztRD)^Mof}A@MJB@eQAwe07%MVOCRgC6LFIa0z63BkrnC${UV^fRH_BRPXIU_#f{Be
zRf&$IQyHq$N?h>XErrCp8>Jjh60(#e`GXNub36Z~-bKr#=kct;EwyF)msXAW`rN!i
zV!aVJrllsjj-m40tA81pbetnoJ+Im(o%z~4nI^_(u3L%2nAgJF%c#bvt!^kel${&9
z({*acgJ5mXiImpG!Djf2!4#o1>S>EgzDf}5g_TuIcoplgB*){@%kFrC=lNnaYrF4!
zChxS(<7)N&sXE7GTve{R{`L$rGn)|}H_#54w%zu(g{;XJxHVkE_bBP8>39NkL*5Y9
z=OOkD^!+e-%Cu6MbVitV%d1}h=zw>9IuZV}@-64b<#hnzjwI8)UijspB;c=d)uAaK
z`IwM;N{^h$AF4|yL1v69kIZy$F*uWc&u}en6Kn12R2Vt$aS`I4UrZ;4rhiN0uE7<j
zJ$9NllR-C>sdfs}E_yI!5dq>M<P-nBjP$}uKeq6dc=s;5O;EYo>^Yj3Be*Rrm<F!j
z8851m1u1&fPNbO{1|}5nN`kA5>W&OS;WP&)?#(7=TZYF;Yqyh;pz}Jo=$dEN=|nF~
zaP6lOpP781JUdt(Hrh~iSp;8sI8r*djHIr~Befe7e?MXP)$5gMwXoymrg3E9sN``a
zzMsOQJL{eWv$F>HUVw~CA?UKW`s`+<K=Y@C{a22-%-`cMj#26s0el}rv}9o~A0m$S
z@x}2D4~6f&cT?<cF6iHWg7EAJ_Bp$hFkzZHnW-Ma8$Wp!S%3kCAUJpH3vSRP2q7(k
z9C*&ff5I*28a}pdJC{|}a9(?Kzn0yqqwlS7z9oAhL9S`pPq(3~EGcellp)rXEj-`<
z6xiA6Jm1=K*`kwqjIh|w-o+g-`*{jk<UZOi*|K!?+C&rMLHWimB8pzFEySPGqHY+`
z_v7g}!jV&euOh;E%hdFHvGp;gc`SHc*N<;h4MO=V6|=_qjF3J(=0_c)batk`h75lc
z*F%pIaY2+?98n=BZw5X~+tQ)s<p`goutZ|28}EffY%b8b=l?Bca<sfYZRTNQ<n<2@
zAWYP0)N_O?4EHMWX`dv7Nm&9-V`*;Oj<8E-9Gb}X2?ln~<127N_b2dURp+sWV|ojP
zP10FbTm-hf#5A!r5zj+CBgkY@L_vaF%VKT+m|yGt16B-kS0qODJcaUjr-QG-Hpl4E
zwXIR(Y$+OofXUvFe;=$>z<~8MCyr)YiEv7r?4MeNo4<Ku>MU{+Fy$08l=D`<fv5W}
z%P;%Q1mjf%rIvW>`MH`1PqoetYQBuRW$M=kZrjZM^sr`Sjl2mCx{`ACeG#eM3zjY5
z8D*(jg&bL49CL)HCBZg%n&HyIo8<90hs7aCrVz~m%vWErjJ^8X7pQ+9_G|l9oE#^K
zfRbw*VI`AH_7G`w-X^C9NQQ}+7IG{z%XLx550NLv0;%}M^pNm(2DYML+~5Lw8GqW^
zeyk#f26+t7R!bZoi74pA+4i+no%HI-1Sd(Mx|jZk1STm*D}TFnpI58aWb1apZze{k
z<~rAOfj4W&6L_8cz1=gIs!&^Y!-30g;2Io1k(Ed}nN{?LBvNlw05Vw3A}82ks3o%+
zN&7FvM3Z?WF7-I*8Jbz}aZ+y02G*omwBk}a9D?|#O77H)*c>JtVJ8%Gm8{7iM?ZvY
z6(Xu*eHS&yqiRgXxLMg`O6yk`tDG5rCF*n*E(Yx^H*DX>?jmuDZ|@2E-`C1<p!&jd
zq-U)d@NDp^;^XMUp4_~rynDLnxW2N7@Y0sdRJ+LbSwK=3;52D7i^W6Bs9B8Uqz`M7
zKve_&7Vb21!cId&F{5hc?9-x}Y}l^$^LHG2>A2&7H{I==%$Y=|U!iuCnOAHqeGLig
z5(K7xpWj8_7Q4;UwpO_6iOir#J#gtLN;)#PvT0Z};ygTBrUdw2{lpe#QN+nJW1S7&
z@yc9^GezpjsoJ-)9T#XWO3?CcHN4Obk@YIrgReBjj8fI62!Y~Yf-GtoY`!gvTwSeD
zrUS}ENZ;gs<|_D>9jfu<w5Z0H*rCiS>#N4BHa}G***Vyv+;PJM*@80~kaJz;HrI9t
zK!6~_h2ZTpZsH}5az5TqlxEm^BE>%1m|USY;nJzbk*m8B)b9?PzL~=8d<FhJ(^*DB
z$#G0lsxtl02o_pkjo<n~P4gw`osGlCe^;_!ev*{=J?~a5bDdU}RZ_)qT2KLWHW`%`
zM^!>cTfnxBgyC|R8d6rm-Vs=+2L|!48a^IffsJJYV~MZSqJCi<_>H~rfyVkv;nNyI
z^In(wtfsmp+6Pw=H4Zw7w-NP%h=JzGbwu}SsA)o!WjZ1TLr(uR;<_%4ryN`We(G7Z
z^_nCB;vp5%?NZ=s_TbI~7zOp_7+1iwPG#CNpmudiKP^+#TC`S3EGjj1wDVcdS}PX^
zV{}gqLPidyDf2kklgt}-62s7(US>Klj4>k-#QIfmy%8Z^j=RxpT=+vz%jdt%#!a%Q
zZzNSYrR^$(H{?ZF8_AId>Opo0W8Dz!SIn71uz_HLN`bJ1b?7|c<6NTe+wF3In-wv9
z21HHWrE&5Arjb6i{Nkc0>;r#GCJjy2HMvcUxc{yX7@o^4__Ov6|EK?f^mid%@7-jE
zo`sLdHYP;7ZCPS1Ce%9odaMU0rRbHB;VK}c|Dw;qVaEo78S0d2w{TX@2$Gl|0{^9l
z$`vPdni(6|Ww&p;@5CSE-Ug3B9|4p{@TInRx1g=B$6_Ua1#)g9S9>losHUNU%PJ45
z`#Mh-{WXr}%{~&p7t`~8fEFVcK_nkj4|a^3_wx1kMEScS>S>1iCL%b&_l+lQe<A~i
zpm&knt0GVg(5wq5u@$-_Obd8*s5voeR#by*=-6keEF`=}mOO04yl0Us)^Z9pMth4%
zS#nl|HWqZY897C<CQfYC3SF0^I(VTDl6StWGF5BSi0iv{UFEwaWLQz6D~_uyM0ix<
zyr%A}HNdttsbcBxb#Rp1RB2!WnGL?IrvoH>A?mZ(UR?3Om?$cYV7tFG)NaZ{0CH0s
z=p|*ERHJ@<UB=MPoFl=KXHjC-_+J7-#IpYoigjt!@2lXBkcL+Nyai^$zce|N%ihiK
z2dTr|ueZOB+&_ZFW=3W_JVYpzZ&d?uu^QyI4#2;AsQyRZLH2i9%V4v2t867pl6rTl
zM5!7uXPs2<#pp0)(8<Xj*f=SJ#D&pyKqrg`so_8l`C$=ClcV_c^yFAN15CeLr&}8u
zYHvM*`AP$dd>SiZ>qgUaSvmlk3~1~<AOn5Im0!Jf6*R~)g&O*6i=Q~M-poAPeRLdu
zS90)nvT<i0ALF?8LHFg$>L}(m3bwfOUA*7l$J(9>Ic}$~TqfI1D3h>AzeOrViKYI*
z6}R1uB=4(sBK~vBo-ElR^FgUxY~9WmE&fh3H9+K-C)eLz{G^8J`}-C0h;=r^<Ao9;
zX(u+ZJy6>d!uQ$k0Lu8+@3Eneb8Vdv$KU4t3(ui2pAM8{oL#+Jc8e6AhSS2fh%*H8
zGPOG1#WF_l*B7~dep&mQMyRqUp`M=h4{xVOSdpWl(QTae9lcU15LLZbo+E%2-Apf~
znNizb*8^ieju@VNrch-6IS}X#C9SH2A?q4Awv~+o<RM%xJPb2OKW$fz!bc^_7}<R`
z`YY*6Rk6~m!0-!=*TRUPb=-$(vpc@Uz(a*&<n=56>nq=u8a<(Ww2iC{Fgp60e^S=|
z4M*4S?pa1Zh_EgleEcs!r3Z;C;b^%uT*y#Xj5hjoic)~d$n55Op6BVx%WmcTbmgfx
z&)@ADfiuj&?5_vXAP&)C=RC1S4JX@<UjkA&J$__#H{=_gF^xn^ZsX5cjs+=Ym!@D9
zwb^#8Yp$BV(V^0(=cEVtxMQ+}6EExOeI8}3U80w&=m)SZ`=WOga*}kK@Kx!Lwwrtr
z7(0ZW&KXSDef~r&+-V6J5o4-?=;g0IQ^?uDgwV{4YR^Xu_rN@)TNMEDsq!D0=d|cK
z;rLAR<UaR(Wb^?%@w-1RvO&8yGTKvzEL{l^7%kl?*p)4??dB~Yi@bg@MJi>HpEo6e
z4-$S0x~FGC(*g19Y^y5N_?YwahpximG(h4fYZXx74IKZLEYc9Q)OZ33hg4#(WK7m_
zVXmE5jU_)vmQt~*f;}X=HwlwJ(SmwKSeOTsbCVXk@1RP4>bHfc5vc-5(@|qu$b%t8
z>y!qyEa6RIHUYRzsbeh8Q5nynL1nXB<i48BVjNg+<Q#1&qoD4Y4h%O<qoD9~BcN7F
z#IIAeh_FC94b#8LfqyY7xbCIya8w~4cLmX(c-@l{igc5Bi`;mU*QSR?dSm2uZzFvX
z)~J+q=dyb?hKoT$Xq~5b*5eG~np<lI9jSDa9>zye6LRNh1QHl_p|ubm*DKUX`0hr-
zrmg0@)M<BThi&N+VVfQ#Go?mWE*z3+%ear)WNt-@-|ndJn)o`GHnEUDcBzOlwMHD0
zm)6-fd1UHzh-^*I)<<<jYv&*)7yj)`^9sXgWvQG0lTN>K+c(`lyt)gsi^1NZ+t#D^
z$7X2ZOkQG3^1=;UKIM}hGfZOR{_dFkU7u$CZkr>=y7QY(EqejThivujwDqYM9KWaQ
z{!-t``LQM?9>U&v`f1mvuXpND4nZCs$kLK!ba}@6e3_5o^m6$$@(H{8Gcmn^aj`d@
zns-A;PP5QX$HdX>@{%BuETt%mQ0nzI2HB<bVWc2e01{{0H_Z$f^+Vzc#j7FygkrrI
zndp__sMg0hzVt6rLnpnASf|qd!GFpXujbiao*yV!SzCR+9_|LG?*`j)JA-0gC$fI{
zqQT~skovcUglYwgUj_R6y=NIa2Kq0nc>#r~+Wv1|`wvNJKUc&Cx^Bk`U1JXW7(am0
z`%^FIPcdazn|x3K{;i{~*4t{2ixf4oakbS!q0<w|^BmV^+ESOqPjN_apB6=_F&#nc
zc|V^_;2^=zTQ?UJv~%|8#mh@-Cn5@((dj|ZnLyue(N_w*Rso3(iDW-s%-MZ&SMT+T
zP4_s##bms^q{72aP}`7BQD9h#L`hYWr}{^7(drKS#LOrL51kVYokOaGcZwRTss2|;
zsI6Oc=~2Ft0NYMF#>Ewvn@@uhEB$uL<Qn4?_bZD1DT|mVJxh{hw>tGfbxVc#>NrY^
zUrw?-^ynV^8g=H#=PDDknt*NwRDwD({=}GKwqd}-4JlbsBFVqn@^LW*(2em|<0LL{
zI~SiCcTatVD#a=t&pE>M*$SFZ&k6Uvmq!2jY3eoGkR*k<P?O?-tCXL>*XwAZDy%$#
z@b2cl#hP<xIYF(Tw?XyH{%PhECoiQp%e>UESf0MNE;blGoGbzGqS=5RVr+$j-fJRl
zGI{}e?}Rq7QNR{+0Q#aYw5aa7g<q@bLo2d-G+7`KiLu|T|3rFXzRcUlFU#v9-bOw&
z6G?i6=Z}cYs5#8xZ<yVn!c$O6KS75p|EX=NspY%B1FOb*<}~FB?EC^hfmrkZWer>p
zX3SwczqGeQOx-1HFNJPyQ42a;o+*#PHV5)jreXvpHiIT1B}3rDx~0W>oTu;jjC|IJ
zCvhHovKIKQs}0A%4i_2QUEjR%=n+v^-yE=r5|t{&#O+8zh1_@ToN6C-(!+!yz=FXp
zp~GwHKf*+o_ii-3kgg2^h>{F$)iomm$wROr3$24dO32YSP;kHpeEEY(N+1U)B&LD`
z)$KF10Tx|G>wzux2Jk?$okstDf+R*ksNU6gV;&{m)!4>4jf1oGQe+3vePRIVT3f{e
z>&r9ukD_l7VIfq$9J`HU_tA(#c#NujhLX2SrDVsI2ZIvQF?lqRZP%3oICj*wW<f5w
zBpDa!w`xGb%mLv)XtVRiL<`c?-uo+$v7Z3<7mzkODwu~@k5}bhu;T&t?RQYLwN5d|
zazbb8BmD52>}RTEO|4a>tPP%VnyPVB={RDmbte-xI`(b=Pjl;{?6!7@E2pT{z?VO&
zakr}oGdBgrNGXN+U`|iyZdaA1Pf_5PLS_6Dr@TGOl^piqI_!q0ZkIgj1}f;Psx{42
zA#dRENM>TcJpXyk?jtHZpzV66&t>*Y&*!<+<6^tlVdsyZ;qjPuWFqcyl8{TDarZE%
zYqQRDjH)`GaBN(chYi85kh2Xnvdq&uYg*z!=67T1yP9K*%G(TXd+Ie_1&74t<(*n4
z^i5tJUQOy5B!Y1BFV%5bp0q0Swc_=n7VAoh96dxVa|2f)Xu6Wn`a14wM;d7#tDlpE
zI?4K=LJjo{T=(w(wEJA<F(LE2m#hzJjoGI{VRa~}6L1WS*7i^}!oDZ-HKc~oPf%41
z?V3WalPP;sS<x^bIvZubrZi{<g<?czvxfUrO5a0?@;oXe9^@PW9~;Gen&+Vjuuk|=
z-7w!JrtjrJJ<92)c-DX9#D@jnmhgy;0pG6a&=jUVIp2`wO(~oZkXe+Ke#ybyD5%s$
z?|4gXiKdzY%RNZ}N>gC3z|Y!Gw*+yhUULv2qVj2?s7rhQE~EP+Js}=Lg`52jpYHuc
zy~(lK<)ATiAb9jPIZ;KJ@JOg=L@f26;yW-oIml(XY=rP|4%v-f+Z_l%z^3zA7@U%w
zG+ULT02(>Kd5|^nfxy6QmFa=&MWIba$NHNhrv60p%<#=Z#V`Ey$=^NJHPBw{^xpYC
zQrS|sZNTL|rkCyHkQY6+ziWxME?1O=C{d3on6kd9W<+~_pbV<hblc97etu0L$wk*r
z&1zTD(Il*bG%RdoV+$nPY|;Z-+fGyJ(cR!OMzFo!Wi#u)W&o^P+1f8K@uIF-b#+<k
z`F%~|ehhKtLB_-3dQJb&zBkOP-Fb$ja?7Eo=Xs{>sodjw^ZxCNWBz@|)91LC<LSyD
zJ)_?jCm*&2!}I)lyX*MTwaNNrW#b)G`mvPKEE#ba!<T-3*Q~5O&xl8YPvmkfiYln=
z%Vux{`x^EO-XL`RF4Mr7^|2f>wR_iKfG9tWa5%G1!{=;=BH<-c931aqI(jqAp@JIj
zg6^b7F+M>|m7;PoP8)B!4a`_FDfbyFp^tgy?x|6D^HVej7dZH}z!2*Uw*h33Pwbl#
z=r7z+^ZRh$@aOLO-5_L*(R0ZHT==*Neey2`hFr4&Z^ixc0vv(7UAxJ6x>S1_db(K2
zvF?!uo(`-`OBWX=7^`@CJX)YupQowOV-6#~(y$_O84xnCQpRp)q|qEr1{auhWdy6f
z!{W&*c&few<M|?rK8$-F(ywFzj+xUxuy5YpLu2wxbGEQZGQ3ao7G5_W(gd1Xt;Lg&
zqQCp?yhd7E0yhIHW=0}Z$|$&yWXcpZX4OA-zR+@6fs)VS80*T*lG+sURb}@FLpNQE
zuQK95LWHa4e9gy+ury|eIk{qaD}iqj$*+084ZBHoO$Q8Kp+GuGt;ANiC297chW;cR
zTV+0ZF9|IP=MVey#U2;Au7Yw#?oyV5V%QDwQ4AJa2$4|+KSdd4o7TKbRxvdjWY<aR
z{)&3Bn$Qr^>w`@4t{6KJ%a7(J8Wb3Q#T$)*T1Ak5o3}-#?iUeq+|4_)@WF?v4YWjA
z#rkSfh69m^7N(6)UPtz%+yKSwCf=2TM#0Ap+r>_}Acju5`AnX^?&42E-X>9j-;+!9
z6r50+i&Mhy1Onlbmo=pSbpytLJ7I;|(E4j&d{?k@o*K~ssu(qYABcHxSn@~HK=RKd
zKU_Grqzy*C3`YMuf{}QMhVVVS>)MoG!Z}Ua@6IYCG@C*Q#Igj78QW%)c86*|PLsHI
z%PN)xgI&<uY8fd5x32rp+r}ztEBFPVZLhUl@S+-_AiQgQ9{ALN9SWMJ+(T>svR!!a
zn-%;NGgZuWx;zlNQkj~^Dwee+R%6Y_B7If^`&5EBDPDb0ClMDy^zaR?{#Hr?UXs3y
zeqCmr*RA;5>~~3ri`}WO-G;h2M1BoNM!~ekeWHwm?RuAX$q8^bhXeqoB*eBr=6d_5
zJna^0ZK4DuyMkuBKDYO1d#>ymPtx)hQ<S*;`VVFv9IO_p8*$nGvP^a=WZWT(T2hjz
z&90&h{E1Eppp3l0K!vQJ41|2*0>(;^bgkKrd^$#c$bSg_*5$_T&Y5B-gq$1)h4Zm#
zRvjOP7>pb(vX!m$(z7L%nUr&9jR}Qqv~t;Iw4EyKQKoJe>ft9#tyn$f^;f&DX8aG}
zj5IN!Cq9c0EG|tNoIX<#cE2fYYa9gTRO5%05_;MGG(>VhiwEyd>RqErh>eA-suIEL
z8ms!&*i>I=H<-+hLF<=BceZ*~mXvCiF?(F<fte~<=2~n<06<&UN)>1e>4E*D?x5Qp
zaVox(sA^855CY&axgmOtDl&&F1Cyxa-0_36mMZlz%2}h9uu})<CYUb`$d_gPfo!9J
zPq6AxA$hB;yOQIx%$J_@gu6h3<K_LQ3G6va^3L=u3-DRe2ISWQ+xRk)6kiZdH(KDa
zM%9Tss^wh7JwJ@h{dQuFbFfJa%NYH7I{v!c1m;CnWHdc%QT{K{h__fOG9vkq9Km22
z(JJdK&mTUCfxv%F)xL|fyTx>x##ZPAA|O&H8ae)(ZW@X=uXQPLexT~S!aTTzLBXa1
z1Q8Ju2ddgea)^rfrxTmz=sdH6@(ew}3JZY1q65-Bi^=758}DU?3C(w$2-&`nu{*39
zE!doB1%d6L2(ne@E9**Ch8s1?Z|h{i)I+Mkb7v3MhJ;;uDe9Q=;xJ#I-|wl&v$PG`
zvC`X^66R<VnB|bMy)Fg%t`;YLXk>*Zqa&0qxLz0B8_8tG+o^nxDP#&ThD`BKij(tF
z9B+V%R(L!27<aE!`#YLaFwwN<a|y>wz^j?A^%!EFC%-lwhR2JVg_GG^g>~rxw?{%2
z?#)d!3-Q~7@jnQn%5@mK?8*gpwqYmjlDdsBHQ|%M26Hx;)UV%nO){vb)>w)R7EHxb
zr8o}hx6E_gukz)5X6e@22bg(LmOkFfdAHh9KOx0IBGaY|kUBOU{Kgm{W5V#65TrPa
z-9B3Q#*ass<%+oR)zc72TsS>8By7biaf=PpJqE>UKdZrt#RtmS9k!!Flq1buRsXgy
z{qBgm98df^HG`<Zu2Dg{c}SDsX{jgoLgQ-lyA*4+jvuGE;5#`im0m&4gJ>sU>holg
zefmF@O#Y_jA)9G@#JH5yL;qzBU1VAXrMOr7-fG=no=xALdP2SfEQgiDC}otnQXWW+
zDVab3H*yY{OA<fz`BShv?7u%3+uuAGSN&bR&2D@1^8W@*W#Kv>Ulkcy@f+VAtiFLw
z9OU9sL<M4bb}v~jB8Jyf8-*B=g0brmV}0BZ)(J;Zi#{*t|B!}nzYg!g7~Nz8j|GcZ
z%`J<d_6czoth1&~7)jGpCv#7!l`@b}(~IygbE4PqAH$@`)U%X%NS7oR=ShU~BT_*Z
zlwk@KMdk;rsD#E36P-(Sq}6bF-_8a;JnhC)gFR`WtdTOYmBy9h6B%W<!&$j4sHj1(
z5wwnX_zlb9RE__K8xmwS)snx|np3BS5T$)~!0NbVXM?$9V0|baJo`NL;57VE9x-~l
z5~(zG2)GthFmULrElfqOmzwZxkQ7-qy=7uetw#?ss)jW=UGemve0G5_a0dpx*$zKR
zFq*{6hNGsuMm%-sspzcMig&X4vn`lEwNa+uunb>6piGzQozYK0t#o(HP`pk?ClF%X
zrV|^KTQ)@sG<Sw+v3c+EL-`cu85+pL=|#41-i81B6VTxNgwG$Tj?p*VQUQp4A3$fE
zeE(SZ$pQh=FM!YpG#_ex9@%lsKn|4_Uk;4+$6~Ic_HD-O|98jjk+9?5SKu_Or=7zW
zAx>uUTgg$$*&<!NKwqoJfBkdU-Q;NetjFZ4#G!j!dMl748h8~+8lE`Y-~O^ohDLyQ
z;4seVZ>p<d|NUI9&ofa21NZSCKk1(=bndvNCdt8xp`sPLSA9;=y=joN-lTeOw@A@m
zmTu4E<75j7>lp9wYP%;QVyt%#o4Ng5nWjy(f<TG6m?zcEW(NoVfXWXDnfpGYG~8%#
z8SZm6D?_n1y~(-o@<BPhv=vdkV=!jZ`~han)#1_<$70|$RQQD&YFbkbV{pNp1G4jO
zskYMx@<hZZ<;cG({dGS{LuV@@&S5(*7-qbh_9_{WA49!k`SeP87pLxIC+x=FZDo=T
zN(YsioHwXcSuTJrLyn3ib3*{?S6dR*7c7ou^|N;C>L{AbvagIe$T&CzFB_iV)XwRM
zg<d%Ocuad$U}|tYmM^3%iwG!^>b$jr6mG|Q8wwzz9ueMxSyZIPU+b#`2T!O#)$Sp&
zZZ6yun(tlrT#|2dll@h(dis==46k-Ro>?|?+B99f_zw*EiA-plq6Hv(<}GWofYG9*
zSKwv4Ue(Q!J(VD~X721qvfNtGJ54x^p_C!jJ?Ao?rlsT0pUv#E;Lew&q>ebgm25o_
zhylX1c(bEAj0|?X)?)GtL6bkFK<b;kpBM7-m$gR%KgG;OIBVst&8$*^MZHiZXL4O+
zoDJ;Ng2`U6v=_<bb7ZvY4XFVEAYr}d?vN;3M<1WwZq>g{RLSXm7*jYI_Oqtl+&Lmi
zzE<QW*20%8$+@Ad*t(QYG+&$Upeep?ImS-bU?~G(etD^45dm|{T5SU*>CA3Jtkj!I
zMYJlTK1T9U8Q58_r-EXIwQOvAh>=;&DbeK;8(t^4`bF&2MrgfK<2lJ4r&dy#--d&A
zjqCzbjvf@G7~nD$!yyE5|H6w}LHWh}ak?lubI)%M%BZ;<u9emMmz(*U9YobsoJ@qZ
zN-kAxxJn^P&J<-%N)OuvJp)ED@2~<mSMxA_6rMSYhD%p*(5z+iGyJK5^KtTgGkPxc
zGkUpmB6Me_8ZQKz3OM@Z{xwtPX1N_gR^QS)a!o9YQW}w7CE1FnBY%loEW9QEiOO{{
z<Y{75wyXQkOuS7Ti7y5$I-j+)<vUu3{?x??i}upVuJ%crH1S96kF$C+e0?$0#&wI<
zdhtlRtsGi2IF5}EX6xn+n*dv`l~?ZwB0zng#OyjeY0NhFUlM?dpos)f%HMQunhNk3
zHBSd*to6U%vh>gAtR$AIfzlR4UuGIi^jNPUO!wWrxsMb~Jr;<4bI(^6X2mlNKeTZC
z&6t~1kExTExy&qbqd^jGf-#c&+YEDV9=L3GDuZ>;Qak^KQ}7Dvu<;-ZiamKFTU9b;
zI)s(ycHUH-l`Cbio>XS(*D$Hv%p6WC_HLh6VreET!ZKxjYHm)R{t9c-96Td89V-;h
z%J?H}B6pml-WZ{Kn|{emw;J(<pNeI7YhtbqG*UHxl3=D<INq{XJTj1GqF6XyUdfdG
znGolJ2Jv0lP<|<(c^<8nZL$}Qtd@mKf&SIYH$+G=N~AF8*x~d^8M*axo%{TvjM`uD
zy)XIs;I?><qm+<V)!=M5FgDf&0y~qVzG_(^F8*I!!p|hp8sg5m*IbN&K;(rQfyo^f
z0Q#z}X6m$D9)VN&Pr%=R(hRY3VS!E#@jXYon$7<Wx@Jid$i~_tych}J9?sBpkIFeg
zfG~LY`F7=Sq}*_Y{7?^Uq49@^W~2YZ_UGp>;`<z5<L{R%3u`l)CvcR{f0h3o-`eK>
z37l^>lnBouTDz*Lk4sDjyh>`Uw%_zq!i?XgaoyoUvFSpv7ovnSGG<e-Z9WyfV$XVM
zMp;T!D&Fw`PB6f0`W(jaQ#He*WO9W{N;tumr7xt4jsux(Tc7f%jbF~&Qgk_33Lx@X
zG=3oqIC5Q&p`-_&Gh~>-VUrC;(3+Cwa(GmiPJ{jc{D{X#OD89NysO4rEB-$!n=0r>
zAaM&a|3K_!QuEU?3^qZlL}U6)T>U;f_7l0(RroHV)(&Nqe4}3ap98U&M`{SC<lM>7
zl&O9D1wu7mRwyB~OOtZ^U1fvce~<xMexz5o*BJzU6eu9pBm=Q(Vh4Xygh@ir>Xg^!
z@5yHXBDFfo2IRnM6L!>>NPgwwasEG*bX8V?G<bn2k0^-jbwH*VtK{zVS%+d)eK%St
zj9K{o_p@2|ss}P+gWtKYiR;riLL14XZl{<MKPRg)5X~s0RUbu_LpR5xV;1QI27u70
zxrY?QUjco8_72=o&JBd#DE*d%&hpYoE*hlD{{_^L+cgSqhx`-MrYMRcDhGmKw6*2S
z|L*^f6=z(JeMEUJt^&h$=ykwX!?{#N?nvUU=VV+^g}dr9PH{q~7!|uf^fy>%OV6v)
z^Pk8aQ^m{v^{`p>>uo)M?(1#N`yZh9c^f)4Yf>Wu1*-2y($L<06ZsTq4W9aD`Wtd-
zHzKrgie?eJ8N@TbF(MbY3pd$7X^Qw(>V^QOX@}g(Q=|7|V;vAcE6^R9AhOvJ6(ukF
z3heR4%5SoT_?wMiVEN^%;(g8v9OlAL%r&@LyagVs4$})JrZibp4FCT+a-IWQ#PdAM
z_IEhFwBOJCp5HvKZujLV(Y&#dam>o);}$5^vSMwn%2^|_S7!4xwZN+VpZ|frL6{(s
z>CB8L1<@)i<7|bhRY6$PjSm|Y=?+1sK_uE@zcnhJrrb%(7h8|<@o6wVaP5cnl;trW
zzQ344oc6{(-8M*5rGV<e{YEF#V&E<N-PGiz5lhPYH^ZALK_^lh@&Z<htm+hJfSM&>
zx=@IhOq$nJ7)eX&vZbP^&;VtMN>9lm@ZS?Fe~is(;u^Y=oPh?L8zrk*K6P<pYk&}5
z^&C$_CWz|p-^NF2a06++$KT#q>poj(Rl?>y;3VfrKl_<d<i46AmHPHm;)!3n#jC^o
zspWZWS)k5}BQ*t*Geqxbt2U}KkJ8K7%HfF7>}ifrVWlvFekQnlcU~l8;3W}~DJ~`O
z)_?5RHbKQU8<<KWz2MyQ$~;bI?b`<7&XB`4ju6s{t;+7ku^C0H2n6sYkmDSgfuGYV
zdcd1nJIM@CskOkF&7wh68-)x_Mxh6n#9oMNS@@X!j<XHJ42V|w?7PZxQZ)#6Fh<{(
z;Pt0exI=C*zRZ<?{4%S{nx_MfHG!$o0w<8tHg1<TR0kJd9)gfCl(OK`kZWghZFApr
z_RrwVWhg;QX0aq$&@WrK3{3Ua%q=2uYP6xk=gq7`95yX4B`t|XWeWtRyv830=-fkD
z>dqHYeHaJ<k9ze2!_}_G5RdkHd3{Zdp0N<ZkiW103w5*86O6#NcRQM2o=yK0M}|VR
z-VVt5as_)n-cxMtw=sEG?J+B<T~jT&dr6g7{NZja&~b4V;nuXK;y`B#Qt?Qn-c__v
zc*Tg0&)iEUOg#PjAXQtYSoskYrHo%d=uh|F<Wv@_#pSA$;peR1_*~XvPV10mqu4G|
zwr0l=`_i4WKiC)i;`?|dS5;f640>bs-ztjKnsH72_I>>h@+rjAmgr=m?zM`H*RyG>
z=d1a%3%^aumi=;!!S+C(w}0!mEH(vy>ipbJ!xZ2D@7QpkO{{FLi@1lyzVV(k<d8Id
z6PCw8*bkT~9y8G1esLf({h*UVUy+J^(LD%1r=*0#fayg7x;rjlITj-DI=1occ{!er
z_~G|63jFC^ocBP49G)G|1-KU4y^{?%uXgW#aJ_UX-7*8w?p2||4y4|^4q6E-N6hto
z`S1IaFxtBveT+3=`I|}ntS$-g7^tfYj8o(Khr6%~Cla%Sa~jMGK|UwKz^hw=75Zrv
zW+>J~Nkzt@kP7Ff@=9MZ@+iUyQrBb@Ea^Aa@fW3^QsX@L<AfkE$z%pW+8&bxrmV8?
zh)6k%A3`0RCm&$*qh>&>7c_7_wZO30Lw)V5g|+n~eKoO}p9$n!`85`j>q5X^m~$~P
zc6C9LNo+NGBIBZ-foV_6;44D`87p|2`p+x$05HT1d69I5Oyr~A(YX8L<jT5H=Ep=c
zs{96dNpPbnWB4C%Jjna}-$o}wcVx^>8iQsE33R*<F|aB;Vy|<~ULMY+^}g6pbO(3Y
zO%04=)BNL~9JuQgC}(;!28h?4&^!we+@qu~+N4SQ|Kt8qI?3nA9m$DJ*}G8QUv+vO
z7D&gGpP1K_bolH;MjszM`BMN9HnA|PLGK^_F1%q@K0nC8dTVQPgu!1WNc=IdfH}YM
zzwwy3eN=5mSky;ll|FKoa2ieuB0gcTzhF~+pRfyLcHr=MO7=2BNqDT%kK*d}k5)yR
zE;y)(C&B8@ZW0(d15OB00{JlhfriMu%Cf>h_TT^7P3oO)x6c;t1<{yZd%vL2J_+(h
z+cZ5+alsKIo?aImOZ0fVxQ9F$q=g1pm7<_<8G&bMw_%e*?&sfrcX|Ztm0SlL((8o=
z`T5)0J;lC%iaq~J@RaoA-t0mQdfy*|$K|udrqmE&XzI-Vzy?H?dc)q4rYv{<Pm&Xu
zmLoiUf3yw#8fl~3d9>|Az{2ggsb2tzTQqt)l`vbwoBxe2$=+hbCeI&4Lzw2|`>3~n
zD!czhEW|STHrsHx$8|2y*YTst(HkGl%ek7LUJuI}#gzDIweZ*He!V8=OtZl@T<rK+
zq#b#yV+iQ<_^3!`Q^M4wfVE0lQ1X@3|KJOq$NU2{=lq?!MHQ>Hy~c~~r5)YBJ6>&l
z8ljOse-}o()Na8w$RnZQrJPD=w39?x4{qgiYLYd)Ae^xv(_yDXo1q;eB<i}ef82nn
z;NFma?Bb9MV%?vB=ujdGHM${j=umuwHLqc=-X}r!Ap!jiNZ7zi4NkL!pfA+Q)M4@*
z<BHMEh^d$WNS7+q>E1cMPpIJW91yVfHG__D4#>ZYVGdhsV~LX>F37MP>K5@P_z9D=
z>6rA2_!9+IY@vN0+jp|U8Y<+Iq=n(T_Z;-G&5^hUP>=yl%jv=xP$LzpcK>Riny5uX
zih21}P0RI1J!0i#>tB$5b{HB3yhUdu&!cqf7wgj*Secu&!@9lF=S(mz$>U$?jE|l`
z85DcX8=M~7d@Qg^L0EYfyqPO~?;cD#B3Hv4zYvd>`MZc1a55u}Njv?sEA3Bf-^#r_
zPAeE;AL(~IuJ;s#D+l1kX6YF|fas*T5xj6SW&6{P*A*5)a1{jbXd-Vd6buzT<tJKB
zAxLQkZ5|IUV7<)RwA{ABlE00U#{wX4_Z((9o4tlXxvGH4%QBNeCeDR%z-`<`DAn#D
z$0tPEIYZ<q9Q^9g=%UZ{HS77Cz^my=hAHp_`hUJMHu>#A3Yc6!$34DRv;qfw?s}*r
z^=%ILf0i87o<K?S)}NYwAJ3y%b%NG{FU-Eu?0=@~e1Ec3PLl<ED{nMw%Yy*Xk%OBe
zIPDU)KC!M%#aq}=<KSD8P$S?n@nrDNVw_+%)U7^7Ab%sEk3pc9omHThu#_*@j*4Di
zyvQTFL0@(@9a9i#V_8#H*n#p@QRq`rFo)p_oHn4{qb=2Ts#TIam*V{GwmOqhOD`j*
z*V{2e4naYR_eiOQwX_kQt0<P#qPz1?@6|>3!o#0wkcFU@)_)WdxLDWkq%^>c`D!U^
zf!cjYVp3F7lW#rTsaAxQ;r?#kjGUq8HB%jfL|tgNq!KqcK7vfFuaK<4j<m0k%jy1#
z7cpmVCsNgS>tOp^`_4xFb<J#*5Rs_kUnTCELMc4j-SBPN*!_<#P2E1Ph<QJar=Ui+
zn<^*ex9&@R87t!W`!=V~RGyC@_;PV^(iz@9BQ8nEB5tgrE7BZpL5bCfqXwWuCnry>
z**G~q&vHa`+1>YNE;il^P}1y}*I8L_7;B8G_Wsn);hLlkhuT6j@)%f;J#a2}K23Y~
z(6HYEsDX}&k-CH&I7@=feeKV0^0nv7Hy)<N))W3Q!o<bO)Bo)ajnNJ*>2;<t`*pzo
znmXBQ$zj^1FX^a2anvP4NKN5<hpBvvsqFNSlM>xTSTQOO<}pYywuSg5zO*ZSv+y_B
zbxMpn_IKCojCDxq41?(OIi>NFV1wJ?pb;$W)`9l8jv>&69xU*ffg+uUQi}3a90xLG
zC@Mu85!rTq+=Fm(r&01E=H9~3a6>8h-BsEQK0s-PG)*(k!VEP{OrJNgXBC=r-NzMW
zGc>yo?)R}$lJI?o0Bx5z=KcwtA0_!aB<P!=hWAYHYULRx<B#$u5Eib#OK8_qP$CR(
zdX#OnRPGT2g8(n|KPb|<6bPgXAcif#(oZIQlLfPgS7}IvgCA<o=djmjUv}R`vh(oE
z)#dPsD$>;Jir&e*ciMTA0CEh@k^l9JJMFN#SS>kG9^?I_7*vK7>&Q>;M}{L={v*RV
zoA*D2Vf~K`r^WI|hEvsSkc4mN$Ew6?+Cp4ptDECi;6F*Mf$G|P>RgKB4v6|8UQLs3
zMW8rCptW18otVuhW@!fY2iDyc414Uvst+hUy?ap?leK@p#`2pa66{#P<xiG<dLLy9
zEXVgKF8quzrG=KlA`-g4zC|}sD+%nbu12m-tO}OSHqJJ*E%0t+DHC=LU!<fkb`4W!
z2(||86|>GU#bU#EJ|2`Dj!%>H+Oi^+gtG0q&Mg-}+e>Az=YH50c!3m)wc6A>e`>X>
z;y0dQq*mx?G=0tZfzVm8n!7}O$XR)x-b`MNTU8mB1>t5;AaN#3JzM#KHx-2KL}>y3
zOw9>~Qy5VqDnKp7mx)!C&w%+C&m~|WXg4NRzF;PHx5|G;TV;w9n>Ss*d5eZ#kr(YQ
zgMpC>R@gJ!%gy&qbpgbqe>+W{YBKM`t?>LZ0Azi@Z}bJ9L-n--LUl4E{+EolmcjUT
zA&3|-VDU?s<}{7E=aha#)xLq{I5F$uptONK64{$VHq~MsBD=UPv;lwLj3pL|%ss7<
z^;{cD#QDj`V+^hmkt_^soP6`C1PnLELIUfZB0?;U@!}yop)|<+p#vE9gl9!?#AePl
zDH>E!9y^icn56f)jgh7^?>?+1)0uQhPo1MgBP61e2y4C`_Rkj9M$1<`j~wuuC8)%P
z&{bG)xxCZ(N#Et|$d3^>l@B)W8EalE%ZK^Bo>&xsF+KOF$R<38y~>{|XBPgj*FOrJ
z+t$`>NLhe>*&gS@?|J7sFHdYIvdCMVV)Fe<&2M(?m2xe-lnJibOx}~T=Pt;B2`C6G
z2as4`^(SPuJmyJw7oqxBW?!Z@x|1$a0S4J!1C*LtGL(v0^l8z9h$UE-Tx<2|Ij6B^
z9L&)Hg_Ur_Lts87745E(R8?l<FL$0u<V$#Hgc6ZRohek%JZzMiq5ZhXntJ^8q>A`R
z@bQ>#4&$AkL#5D#nk3$cq@^5IsewGkt*4iK2A5@p1ZdrW{UzJ7KpvgsM{lxZr$1G;
zU~s9U%*8BAIId|&Q07SAYBBriHiTlu{wC6FE!NX_Y(_p*<WnkVfk0#lUaOYO<|d30
zT`44adF-0o-7iyFY0`Q4ZSV<U*b@5C2DL)2@9}s6-ahZ~-yz$Z?`MDHO<Q^i39}m>
ze%W~mky(FEXgUMAk*Q34X;!t#Md4)CS!fVKYx++Us}S5elzB^BDV~rOd&dSgG?xq{
z4)6sW%4cvupOM1DlR>1-*3PE>%-hG(pf53hFjUesDaMOIVy=8_j{j&B9ACVg5BwZi
zl_#I{M-Ch2Q6S^Li?Uy;7UG)l`A~g!#M_`5yX8N3Mv8Y}C|*gzM)o9uF$;FYG#`7(
z=XuXI$?h=3|4BY#;0@->;1h2rgP!%?lNR)~mIEUaw_HZ)(n_l6`@L>9`C@w>Mb-wg
zztF{&eqobx%k(<Jc&0uoQFYlP?bz0slX;L;8|JPbLu9-Jsrx{<8Ik(6uDjNqz?hKC
zk$h}G-lGb^@U^@aD|DCxgz#N@3n+r>)d<+`pm;pm&O5z)f4<&q3Rv0ZJH_j2yWVJX
z*<rLvOO=If3X!EVppJ(Upktbp2hiPb&f)%IGLnL)u2{2vCVnwi<6D)ZANpG!*`p%Y
z5e=V*Qp#`v!7j?a*edE!LvUSQtMhI$;qM|~<m(Qj^mSevEfOo6i|ggENzU!|)=(YE
z804gQSr{^zHgC!wF`J{*08z4d!eJm*sbO+bAA-Os2KgeAI`DY5BpZamd7_GocInr`
zLO+!*6~xN&3*s&Exi{r%2C#-X2S=5{pIe7FT+j%a=h~q<GZazL*m;hCgx?}R9V|Zy
z&0V??i-m5TH3?ik8bi}JJ-C>k^kL&a+&0zB=qdY-raZ<``o$XKbHpI>O1!GZsN<9m
ztX^Am(Wc=X@lE`xt^o=mwT=;kTn`2#j{{riYsdd7eY-mmftMz2E0c`3)YrnnZfIYz
zBAte33e>gpu%XWMz`X_0zTj|FekW#DAhaBdTHwVaCW`NIo#&V!P<T70aa{1T=%$#Y
zI1@)a8pZ5qU*q#%;Cp6sYQ-S55;Xsx&1Ug`Hk-cg_7i_cyp^wltm><YpIz$q&(mz!
zP#07*%+Z`o29W|7v|V{{`S#{VhE*@F$p3%9L~Sq}!+Hb_j%b_+m|0-73H;`K$r5v9
z+ZVw!BmyC%f-8Q78wdHC<c=s-fu;_G+=B2<Fv}Q0XhbLod&78wv{0Pj$k=-snFGWd
zggx-@uRPU^2~wEet?J=c>-l5?p|7Y7C86$%HhlIc9j<jLN{Bf0*1(+}kJ)ppZ+1AP
z;&pN9j6IB4a{ma%nzrY*9dY3tlfJ14?f=8tJ%3mBuI+-4ZD&;_72CFL+jc6pZQHhO
z+pO5B*mh@s_kMevxBHAUPLDo+!CF5(^O^I$?&}jVdXmvEqj0%kq|NRy3N3I?+}nob
zenK4H{8Jhmg`{O3)CVsK2zl>yfz-8iolRdK2KD5-7#6dH72%SrIuY{-ac(h@p@|F%
zpI(tiCw#twbhP>eLD`~w(2HBf+LeO^u6or}8)}O|bw^nNv4mj*6_I`6WK1gW66rMP
zN2WneLGV3yto}6&lRlu+KMxzSCVY@f%%o97KB8aH6nE0in*_JN-AW`3!!4{ind;xc
z^=FZA2F*~Zr7|0FA{24Y&cbGrgs?3ua{~h1hy+`+N;hn{1#&h#NWcu{PbT2H&iF7P
zUE8Z-oBVhYueO`ID#La6y&Eu(Yns6~Iw8pGct@u^YIak5$vPdfT1io=wC@ymwGfrx
z31aXAfW)3G?PBJyeOi?aO~%EU%P>SdC}>z@!dRjENDI6XLzh>W5k-cWL2+^DBfD{6
z4Q1FM-penET?D%S!Hs#@^ZPCW(yk|sldwL!5GQ)o6@Ppfw1d<g4$>VOv`4zA;iO-T
zh#3_J1Lga-8>8YWMG7&>_p~(fxo2b<VD?;f@#fM=@4slL5aCW@3F;+XC%2_gVHCb}
zqhQ=$rqm+0SpMfW$fA>rn)*{0Spq(nMu8DG#@4TlR;lGc?cV$KhY|C8ROzQ`;`wH9
zdt~_)P#;8E+<yY?5D>4wl^^mpukgv_gjWOf!V~^2$k3N^p!dF6G*&PML%bsQVczCv
zu%nTwVVj@~_b2xIl7pBqwrP!v-y48QMIj!zk8r^mhhiZBb_`aMf#o6N%hc*Io%(zf
z<7>@-ZB+h8kHI3>r&#*dV<c_M;2@35{8<dGC=n18Dq^h4xNvBfEHl_chNT!`7V3K-
z=H>>bUcLicyW%;Mrdh44)A&eb@QA}mpE(jkBEC*Ti-F?}lj#Y5%+T<+r>B$K-{)x8
z_a@2R>-4>52gJkG^-qIwYql|Z2lY`S{`fI#ozXv75+~FF3rr-A3O$_Gy`-RivMj(f
zh*Ll_X)Fa(2ud{^3fHu1IC;6kOI*eGp)V?`u&z~l)|N#mpYj}1mmotZ3c`$~Own9{
zW;|g(7~HI70aj#A-c)A#?1z6)>8W=9U@|vCu-U7vc_AFnIWn-Cq5LtwHPb<_qJb7l
z_e_$Fdb{qs?FR?2D;4_9@5t3CxO}3BQSqc%YkdKP{w}Ibc)5FCyyk<T7NWyqx<ccJ
zfyCW1y(OiMqr&okz?8!`K;h2PD#EAOX)gpb7pf-17gnJTKty%OfQaS}Z;)QA>R!!A
zI@s=QT@i*hzD~srKZ%;lOp$CBu~O`TsR_wb(+I<n)4sZMt^?53-MaGG`@c4tzTdgL
z>fN6P4ucq@U|oKc#N__hW`DYvr0mD`X4`5~LPNzD+u=0TZXIj&M#<K_EBfZX<mI~{
zkZBt5=+Nns(TjLdLwaCS7B6&~BW(}u-iO9Da{lQ;V`j>B3AE)yu?CxCk~=j{+iPZc
z&+1gpg*^x=&mBST_lI!MZlCK4P-Sj$isCL~GbTl>VOS_>($FsCPkOBIEZCv)*IEtz
zQ};^UZ5Gdm<Dp7jn_SYl{d*huq1PZQLSKA1XuokK6#B`L0iSs27|8?mR-G&w6O!9i
z?zgX6+6-VM+m!1jTlb^E)P#xM9C;l%581SFgvJ7;X;4O}6r#2WyS1Tz_G$)&zm8`l
zplivz=oGOa$$U}fD~=OK>O=_QS|k@0e()d%Cs-i#?Nq>oAaxSl_OYQK#)m74N3tjm
zJU{x?VN1IOd+Cv90FZ|ibm+T~0BAxjpj`@VTuH-&Fxz8h2R6y-k*NhquvnNWm_0UH
zZh%=5XjMRQ)nBh8@PP+!3hazM+_J(Nc_w3(9SldYW1dx!zw!tLn~JToXl&K){P;+3
zym}fly~|iU*zN8RuZCUZs7qT^7mEV7<0Apc)%<z)`adUDt&22hQ^M+~)156AE3|y;
z>Sq?TFvupc^V6%d)VkvNFjKjPkX|C7;fM?7WnTA*Yolco#+0$NctmiKl$p6_hjvXc
zOXO3AsPX_)M|`)687crnOViy6paklYhCe5}2cflz^fDVK_)f)@@`whfo-fiVIG>G9
z7*YPz)5Y>VN$SiLZ&TGtz$F9obd$JA_y>a@urqQa5LYZqA_Y}9E{&|W2xqA;+Y)kI
zw-5;2IjL=mRtQQ@a1X?+gCyE1nnv5uUnZ~p(&Sz?dz-@nA{mn{M;Y`+XQ#it2P1ll
z|B+>&c|m$KvbV!G^ZH6W>c9M1><x?CnM5JiNZVjEb=s0A|4>kFSXchekfhYaFN4*j
zRIZ1NDoUy<A-W=(niF+4NwRo;ZFqfw{%@pA?diWrnedKyZh<zXBt5gpR=WD%qX^Z4
z%3&fd57+@AtwnUdGQs9}QJ=&G%B`c*zFAyRDLOtWE-}GQ!DNWtxS#Y*HO^Agni_-K
z^Yx%{tYHKBqyd#bzNIe6xy4$L*`DBfb4OA9um7&e$PYkX!UU4ra-S6zgr#zjhV#k}
zk#nYsN>g}N#T(NFljY`3VF|NZMy)?J2gy{>r)h3Zf$$3;nai^r@L(LSuMT0%+$FnM
zr)af3ctsqPENQP>V+Dd;sI^w!7l-02*~-dR_S6)0NE~;~S3+#FP+tLl=tWBae>&Qa
zVZm(3^mE%t*o7r<Y?alx_G0*pwoCk|lpAUPnWT-21MQ|p!7O0ISTx;^n@Vz^!jkj-
zBU2?xwx*!Bh6_MwweV5(*J7e2H{3hdjY-xg&-dn3*FX;ys8Sdp7ELapufU!%Bc=16
zK<IRD7>?ZfD+{;9p2G{WA&Yx2q{=?U(ks&#=gX{m&li}TFa5S%rW>wRwjItaS+ZHp
zvWA?;x~E*p?x?MpIRoTBO)Adr3;w-zVbbIEf&4xN^7StAUxW-HGLCq(FA*jjriLKY
z(a^bwaIRSamqTKtdg%xrKN+ozXh|rG_~ARt9>&iUL&1N@m{f`XkTH_@<>_$)x<B&n
zprIEjhW7Ef1?K+`i!mh3X~$s?{E=&ZIz!7bx~j$<uAkF;2qhY&0iasl$unrZv<V?A
zcsBU|N{G8>`J;jqM=W{X_-ZBT>9#3VA?L^{>q&OwOe>MhKhz)b?MUNiGF-RmkbDjZ
zF-zSq9}LPHPaVRf{uT=Uk{XYRy}2APJcHQAPoH(YsCZJ1-n2_G+0ZQ6vn#(seXvA0
zV6ZEO0>)Ve??qmNSK<zpjTD=F?)=irE`?@*ioabVMRFM0S>l6*4p!uY&lAjq*~5vQ
z@N24n&*6IxQn>L2W;s&2gS@`gANRLNdiMjB0XHr;BT&rzE8S<?3AgmF=I4IDvtJ=M
z!=BB}<|LmvsdPJN0bAKlr=2mF+Z){v+#yp&H~4q>V<e~Kgm<=1EnyqkO)s1(`&=qg
z8{J|%dVSYQm}p9l4=rpkltM(EN2;3}Y0_?NRv2jkw7bFx6^u@805+m|U%ELbV9s(v
zTM1?um}d0N!Vy&%-B`h-Up$4vJe5|khR<vN;pPU_-$kHWSjwC}xj*bQuaaKD;7~3J
zSU68Fa*putlp0RBloFvxO>Z0@!IXL5mwJ+qx$~gVKUGbjpaK%7okH=}=L;_jQEhi$
zne2d*#4%)6n0#*?1XhMJcOw4#NKOoSbb?|WJB7?FPk+5fV(31phoVXo+B|3yA06)(
zW81a;Mc_d4i&k6I=aM102f6Od>f|h|?2(a*cr_t<ux!tv#t>8^!M%KJDZSN91X471
zlLL5Dj$j6KNa>W*-e{$U7lp3ZUuL4kVmeC<L4vgfq?upOS9{e{sEb+|DG_<}p8Zz>
z>RRb3gH3z&1B3t^oFd4m&NHD;IFaum#at-YL);y?0gAhz>u7%_tbow7w)VU^T79Ue
zG+%tW%l}73My{n(?bkzCUnl6+*5JJbYd0%;Mf7h-yi(rmnS6m_*W7IOom|l08?%du
zGR6ckvXp9=o*MFeLGMD{!w4lD*-7DW$*pL5@ApztY3tiELae+EFa8FR8?t)oaJ*&W
zYmRBD5OcB};_(VL+E{{b_-OiYA7!CFw()RjufERrR#9=WpkN4psRrL`Yd?OL_A1&t
zo&zxij9e)_-m*DuE*6ga3cj<mp+!CSuHC>u2J#ZpR=Tl`&+fx43Y_!;54X>O+`pGE
zjBiQY-CC?5noGokk_*V*_{1mwgtqNzvqc=H<V}Lc2M>e<4q=qCz5_R?j`*6F_4Ad|
z$VyLxcqni9FH0siOYHw`$!J6U50=dDaYK1Fy5TC@YAyLJmq2lAlIymjAT5d%1X0B-
zdLzkIp-LGKcVJxw?sc<NzX#?UqKEVh@p*$D8Pu60!}zq*667!i&@i>}Jz041?~oD|
zF(!do%uSb@(XT51xV=oaC(EmYDI-4LTIpEZb6T-!kLo9qzYCe!Fxo*;>`CzcnUpv*
zuf5HD4C`?IS!2ud`+PFrqHvD8^-OI$-1C_jU0jyJg^3Y_rXwds6PofL;49<gPZcS=
z&b^ZQ&eStV4Z=AwFJYh^7z;Mq&tyOO3MIrNM6}c9*2e#RM`kr+$Kv+6A_)<<^@s*P
z9WVSK80082wT<=0z7S4qz7tCk($PW<TZBt7=#GMe_+U<<<spWUp2YWY?XGxHtAtCJ
z7y<EG6vQ@Z|Aj`qL?&$Q!3ED^={|2vdSl(MM=nO0QPm6%eW8^usvRShYajvR@!p2;
z-EzBmg(cy@_DGeGx1aX%CBUpwiQ+f1jucISJES-Qr~6Z{2V}KQJ_|P3Zbwvo9T43-
ztH)tn9P$8~N{?~0@|P_f{UCxE4S}j#$)!>;H&3|UI47EPWFyM`=`13owdxGZo8P+Q
z;~^=C3k4Fgi)tn)dThUpzY?7$$SM^2@y|5|aSTF419mQ2q-iSx2PoZvQ*^nC+)mOx
zaGG({hdFOy&vlX=#zo2q4)@`zAb8R(!Gw(#*hq_|Qe+}Cp?^CvsQ-3k@I2cy1iI5{
z_pnDg&6R?FCU~8kGf<Zq)?@xOZC5D9uFGNnYDbXgn@ZkMiKVsYsz8dNj13kx-sa@b
z3gK$a@@Av-STJZ6s?Ra;y?G?={TLq+iZKYab&9ofI?VtxKqq&|U0<e55H*K{{|e~1
zw*h1Qj%!Z|{f_vKYugABGhA!9-QddIcxkQq%70wUYf4e*i3K0<k2!=zE>)yB=;iII
z^6U1dGGrEa7ta6(Fry=Kn?q&!vWY6&`wT(KrhmTTeDZ(BTauIQ<0#)|LIt%Pd%yA(
zDbcDnc@#1%;gU@xz5<_Z9XR~Jn<MJV#5u@MFY^Nr37;ierO#Z%vRe!qaw)-`)U&}S
zC~$qql8CZ<a}^K*OiGu-rfg7Tx!m^z(`|!^Mqojv$=fZ8tuOvQ&-b?e`j)gWt=#{W
zsoz^ca&JHFQ|O8nG(^~lfi`@!S&``!J?ww67~H~g|MT;}LqxrsW$j>;Kgx&#s8x-s
zPwG!<U`z=P_jzv5Vg)m<OKvf1VX%f$hTin}HSd|XoS6G>$cGP%di5TaB=#ja<(F0B
z7Ng5;cWeVZ$s-mL$5{Bpn;%_aJobH8($*0V_nP+9OvY?trh`qQujr=u(FF|ah1*h5
z4Se9D;VEQ2L|*idP2^^ZJp!5JG5>2t=9i$jRPlD-7cY~+G9GY;!v7&ghU)IWF)~Pf
zN^y@A>>w?EWI4n@tLFWMbxcjqP6wG7)@4a>G}KrKieOG@4O+x=ZDH9KrJ^5&22Jiq
zL~$8K&gFWm$jqEVGDY}@Rt&KE_6$Tg@+QFcTnOYxCDM<PqB`8EziQxbCQ^(Kcf4T|
z_LAr);tG&*NCemF43dcyg?iOZw>VTnv?fP_<8Q!DP5P<iDOJgnB$fQ~l>?c$(qInJ
z#I(OqHxx2p=IJ3r{1An`drY@0kk`SKe^z%%6MMXQK)S!Wzfm8CPb*O?^?m2NKSJuC
z2Y*SZN%Z<Qfz!V-W#`<!rH?J2)}Ff(_JFno{2fA&hDc0s3Ak@jQBanbTMEq!V6FyQ
za~oB>9vw<G6IpYmFVc`Q+;H<c&c<Lwdo0<e7Yimh>4`E#C1aFL34?Bef@pBR!U7BU
zchr#vHJBv#-$_UDXZ<w*y!H7%iSMo3{(|A6)unw|mFh1qx0}B2*Y-<R(YZ%JaA0%E
z_iVrI^?v;HVc|P2&gA`?U9`;h0S!6~+&<!?)j_x2dOiEG(~PCV5UpZEUG3!zdYa7@
z#LFKd3eL(Ceym4VDQ`I<8t1XcMx@C5Mm(rZ0&K7IweO7Ph12Rt`sYIw&q8ccoiv*m
zT3YQsv&a%@^W{9{+ZCMU(P7W0$-@*$S7`kIZO0tn|3B>*lg9sjJH`m^|F&c9;r?UC
zL_GdRzG;!C2&56K!jnO*Ve7mtPS%xba>lYgip<@pwhFhdsnTJjp0fOv0hIfsw1sXf
zWG|Y|f4^@e+~Ns?<$L|&hgcOLL%qNBu!TJoRj^!P>^a{CDs{S)kX&Ith=!CcuZy>z
zY~1k$Fo?Mc?HrzSaVouX%7<>{-<$YN!<$yJ9Ba5Rs)?W?+UZ)un9sUchA1EZ5G)Vn
z9@SO2N3OVKchL&-B`109G-zE^G<LMCFji9Ji)~SS&4f9O8od`Bh#ij6A)#y7r2J7a
z!}0@Wtq>x%4({(|<7O$ELNK8v&;ve2Q+iq0LH%nJbE5q)<6$CE7xf=yHHI>lICG7P
z)XDrjClq5)CdGUy&2(~wH#dvoe)IgsH>_*Bm~0jk8)lx?<L{Ko2Yv7-lHeCHriHLJ
zF$MXaP8bEa(i~U^ieH_Qf~a2_`=kvE%~pxV(cw_W<w=O;$P8Y8iH8)?ZzU)!5-_c`
zCq@c1hFSi1(ov(wq1hj_T^s_1l+Q=mL~ak|BnW*28~$(HWaWhi9ORNX)+~oPi%WyC
zk>d8M54aTKxu`<dnP968R}^9j!e7BmfKB&=HHvXw)_ip*b~;08G?pk?S^a3Qmz%Wj
zAv~pIjvk9vT9S&Rv-E>pJB`S`%?B6U4)joIFC(ts=-MJ?xkkm}jnfa^4+ImT9eWIG
zm-#YZe5&;S+JG{cg>T++x^CuAG!!oxGiJ#e_CYQQxc3;<(|*3oydQdO_PQSu(U%NV
zXT+lD=GpxL=$5F;^jt~(>g~Gu9DhMYVNlbX{at3yr|dj&03~w*Z-mtD`fIy=6{U^U
zrtN8(A8l)d^jg^SqGA8C%{J{Y2iN1_zLL`in<ng;g3G1fUl!lVV^C2Pbc+h030dy~
z)9!Fn_I8}{y=|#lJ+qH6FCW|YHjSClP|<=6YZkCf!)-SCwf~)}5|Pr5)gc{|A#six
zx;%i&rZF~z=q=$v;orI0hTwZgaLPOQJ{4Tm0S>=tXg|AdPMU^eFoB5z&$M+~8{8jB
zpEgJf$)JZUg*J>Any<}X=AaeAhL47<<wK=O688pA7iQ=LQW=dAzbbbcSR9urhAe@e
zmmM$gY-YbR*n#)lJfe510+A>G#4cXokG%I6AS+tBon_ylkc(?(cEulX#J@v}cx+)x
z=iUyDH$%!#8=FgpF(0O9H(^Pz+k#j6{mVR@SJT|HBvx+v5Vko>CEtIP8hLar-JQO&
zgN(hJBlQ>3CA&z0JGLUJEi@)onq2Sjz(EpQ%1MV!gNBm>sS~Tr%By~GX>x$w`hfnt
zxTXI-(n{K~`=}MUS*`mMi8plVB_j7=e5!*OE21`0&QP%ErXvPkb9lE2ePhoQ8BBKc
z(!$6F<<l4hA%4^MfJc)?iFJLX`N(jl;F^8LY{Af2n035Z)Ns~){}}NrNF|xa`AW+|
zt8M%FK<BM<xqbEh*jJO~ypG04bC^d3IR2dgu0Svq?ITxD`$Fiu-c6+#v+b#yba!vJ
zu*EhDfoK7{-5Vu)z>SZc<VT2*5WZt&VPz0|K36RF{1-$$|CwkZ_F-Z$B7rA^n2=sS
z+Q)r>$7`#{XUAbk$wmUw?;N`*ZP#j^VcSwkGc}Mi7a#ej&1E)#P6l^?Dtyh|IrP(T
zr6albf}GF(NVRo7rL3%F`EjlWf}HeEA75;;y{0ggns2B3CKN|h*Q=%TI;q1EE(XU)
zG^&BLsV?qO#Z-ot_8Ui0%jLjH#gK*s)nb{OnQcUF9Rst3QV`dAhkO0KzkK>Z%dao6
zo#1>}Jk0nx?g5RXL3iTEeM9{h8()}Bou?De@ktIjkk&HBg=S7u-qgKa8bV%)1=3H+
zEQ{WFAr8c&6J^zX$o|F6no}88Ab5V&b`#O212Kn%ryZ2^eJt(!P(dTuazUBZOC9X>
z*HC=Hrl}j4_~R~qRv=d<&y^p+5ju#1GKPI)C)1lla1a1D)SM|_R?+`cNSHO<%jjWj
z1Z(cMWEA6FqZ6L?pcD0~sC`^y-7lh?adnO&CG-*qBm%X9)_HORoJ(<1!&I$}N!9+f
zB*x_H?o^RXthi)c{ZDwZ8CPtJwuYk(k6(M2`*xMz*|pzxC#4;=K%58P!p`2q5+v_<
z!5lusSg8~*?*{1?s#1|$9tEU$tS4|O>_*6#_Wqjx+x-gEg5cXMBL^%5{i(>aN$3;A
zl9dNbF}(`EiblaV$+ii?CqHikk+yHbR=z2UIeUFoCOC;JTiX)T;B8A6mr}2vJ*6O4
zS=y6@q{+xjiC;(Y$s%z5y!z1V6@c`>+i&i=&$DGe$1QHS$!++{P*am*y*>?*>jLo7
zyD&Kz(wlOwiIVL~d=zPV8nl*ifY#x)sm*az(V2&}$ab6Lu@A@n%LwE!2CY7dc#G(}
z9>w?SXwM$cuOa4mwE#ojXiF=%E2djfrxV$?OrpdJWn=blE)P6ds=tMx3cT|Qi;fQ4
z0Gw^M&hHw}x#t__{SSvrNL4`2uYt+=@f07;j(fxcyMaM5k~0-X;-oOpC$|z1eyZ@n
zn_p~ep_|oESTXKPW5^4q@}Mm8NzubM<}vofA-81NVW^aY{+MH%W&?2JBYm?TlC`=j
zbPS3;u@oZumw$@m<5ugN%1jRaMo^Oj%!_KvMgB^x3ayGsHr3diyV}5rt}S3FM{gXr
zhRw?}SX&EUkVR^1%5zCT)K-<_zL;ezznI}%4xfNiFErky^Pe&C(r)7Emi}GiXJOTU
z2*Ala4~f+%yCDp43bgpiuS=b$0hfJLO3L)ZCp%q0@{GS-psAHO{%{z|5}REY(ymIJ
zkVQ*x!X>I_w3+)}*QKtiG5i9?M)}>9WzJeJR+b<P7Ol5VtyTI!$>3B_th^JUDg|jI
zV5J@?Sa#s57vJ`UJT<B2=Qkr4Fq+jNGBrl8ddb6$;O_3JSr`v*|B?qnNaW^br>FF3
zs*}QJb9aEYA~g~JotU*0&2Ar=ksA}+pG>)ox+1<aU(F*_%PepykS+^2IGta{+18Bm
z)$Fbpw|erd;p>&&rd*s18KK1ft3mrzT@=xU2Gbr@(d9@2sdZVza&Xpl*x|QK9=UvU
z<<AoT%u(s(5=MZtG@el^^`+3|9uHO6dgo&pPYMz`hOU-ME!4%Pa=Goq+X``sxHRu-
zO~4pr#u&dJxB+v6${yitf(J(BB&4p>3d&`XxfH`t(^VdN6x6G3v>WH~w7a%{H=Lb^
zT6V8Ztrs)BCA6%+j*DM}(;3F+CaU;6L(^Kzj4a}aztkX!Gf2Nf>k=E?tgCXHD>UMU
z@8OE?@l0%l_(dQPgxT*4j{)(%R`VPuFYl!Ej9*9AibM{#-MGrlal6Oz<CEoWuX?Rt
z`$3)X@%ieZ+2C=dCicFYnC+5ZNVz$Y5aa!Ee102%nY~gSeYv!LUp~BYmS7qh2iE)h
zDoYNUUP~6!RzdAD$+CQ%88USBbhs&JgD_>&!6lE0o;lYE@a@-md!jq6G%9jD!6y@>
z%zWESLM?Uydf8k)Yuw%$8b7h%Au`eW=tDhi_alC>@_fb1F7+fS<v(zKL1?d;1OMIl
zTtK9ZI4Y%Q6)yPh#{Mly$oy$t^F%-RwEmv(eZA3Z3)uGegRSE;%D6B2b^YmV>j{FP
zMkGEcvibf%$+@?wsl(N9`w1!q0w{@53j&h19i`s%6jXj|>70GtQgpj-c<8Lo=lyg$
z%3jRya4G(u6ruErSncg9<KTV8@iPj9^3_4%6K>|WDCj%+5@AtXW80TrZe?$+EQ6f_
z#2$y5ZJ0sbE(pz{zMj4E61mZ^ntm9H4vo2mT^^Wit78cEL!EODYJDY*n-`>Fduvny
z^|_$`v1LR8<2R(vZBmn9FXiRAo$=~$K8W;q6#6HsoG+FX5;k%=dD1k~(_2D4y5joD
zHD5tu4kmTuZ>y(GuaJHf^fqc}k0Af*2E+-G6<k=WYYm>e^S!s@lLn~(U9;v#7XjY3
zcx^NO=UU9H(_~M!#;JVN;g%dI@l*_k)LO@(<H5bwnp@&JJ(i1t&K%D@&VM%Dq!*C_
z-$16BBsMI>sWNFiB)_?T|C-!OEZUnkgs>3Opo}370_uQ@5tPn?aJ}FE_7CYKhB>84
zKd8AELVyHjwQm}gcncu|Cf;;1*ksuw1OmKdHl(4QSblS3w7QOfnAfO0Vup1p;V!PO
zh(|Lt@ZzyN<eH}AEWcDAy-{sDX_)@XUigM_Y*^(UP}k_SmV0QBO9I{!p7@&nH)8Ej
znu+?yfnmv0KfOG9Za2WK6A}wZ>zNNaaitY)V?YvoQbfQTo0N3GcrlzlGgTw10n9TE
zL}*lUzgyJFDl*QOAdrGnIc9zDjS6Aa`dTG)-ho3uEorkz0Xy!%5}PY~(uhRYccLp5
zS?nvwG<>j@cnkTNc@UG-wxI|*Qj`f1?lUI=IY1>gPiZ&6&^&}vPa282-SQkvC4%Cx
z2`mA35V;G`R@={#X&^SD7%HJi^=Bl+o>?1oH?PIVF*P;~NhKMf!(Y7#APe1pozWT{
z3!OJQq@qg$rl=1>f_H`<{}gR3EbQYfH<uS$q{b$-2;0O}C=e(S>6TBO`dhtcxOlF#
zIZ9N&X;6%gnd&c@H_ERLvh4OtB5cI>A`~%(o_(^uscHNXVs?edp@0zh2ph!@D{K-F
zy>>Z5{{T2POUF_Uuedqlawd5ErzUnRl8a%odOleZs|gv?+o*Yc$mUV9)y-HICV0ir
zlspaE1S;Vs1@L{I31?A<IRW|9B@5_;ohes`rHW`AuF65OlW8(gKU+EUc(d`yAj_i4
z<QBU*RlP-M$I&z7A+$vfAMe7=6?h7VB8Ka6)(PM(FbJN$J4g}<x8&7XqkX1WBGeFY
zrPlq?$L0w{os>?fnQ;&D-b4cPW_WvjY<h4sB*oo2s}Rb}g@bh%AAwr~kkmOpC*yH{
zyc$?TlgRBHe#P)rku`*bWuw#nI~^Fj4;NIO$YLJEwr(Q&2IIdEs(9W06~>L`SEs_s
zOPnJKBnyp4l@ZwE6t1Dx25=-6Sh{;4a*Oxy*cFKk?0WvxRnJa%G!(l}^7VIIRHsbo
zvwg2D*xnYAj%RfE4+gj{tQOj#e@k<J8&I)Vq%m%WU!y>>8i9lzSxyJC6O|c2kxZ9b
zc#MG8^OTyMWHhPf7kB?2ztT6GT4R>)dFe)ny+oXPl+o9FhkmW~BFf8{o6++jDIZ7C
zRUg4={d!-s2}!H#^Miz{0Q9SQi+$j~DZQwaC!Z!&G8RtCsZqemE_|$zP#((W{0XVK
z-GQW2lfVeSh9w6Z+EpV{W)mDL!>FZCa)NdNYo7VCKK>0#Uz|j~ldmOh@kSBGA-5{c
z(n`@lEU@k8Ew;owyZxhqC<3O`>aTF14<7{Lnzv6Ji!6eeMDZQ!Gl0j>93hS(5^A?L
zJx44%IXW4@NVLZzO{U(C*H~#9*=N(ug<_MJ8C-!^l&TC6=odqy7QQ<aU7C1Cksf!C
z?U1WS{jk+Z4Gt+SOnZcCKrv?9Mq#k3v(kh_skK^vZC*{5mOT9NFhz;Lz9Rl8E2pxX
z1m1}oy`sWPpb%at?tLRT?Z}+vKteApodTOir7on!nP1LUj$8xi;#5^vLtHbYwQ~tc
z^ezyjfIrTjnU~P9BTgfVqMVW;A^DwCTSj8RE@QKr>48g>#H7!?djDv9MtV}501C<k
z-jLjwI9WbprI|3K$xsPYwy-xSNxmE4d$xO+p&D@lok5INdPx6am>`;*w8Q5|VW;lC
z6B^gM03g%KJ>p{!k7q7HDN`Gd2cbI`QwAZlE@Aq_jCNu&TD5#HfCB<^@7D}vw5|OW
z_JCAW3#I#2zZ0&+bT^qSxdgzjAAfi~FfC>^r$02u*w$jO#<O0J7&?KF7cXFDDm0w}
zV8CqUItr4<A(w&v*@Zh^oqu3xc}%>Tto<vOs}pRsDb%8kXIT@RN~X$pjE4QmNX{XE
z+4&>BD1h<}vt+s3-RE!x$HjcO%ZUOvMl^A3a0Jlp#jY4wg}Xe{4`q#eE2C^t%<5&t
z8+vl{#Q@b9^Z=c_1$|@Tim-!4(n4^I)YBoeW7xEV_nzbbEQ|Ln0L`_?RLTx{rzK3$
zU!{DTYax<G(cO(!67(DAYuHREAYkD3C}LHBo0PLUZXIFFOL}odh@ZdkA>A85<2;Bp
zTA?w$h@i5c5DVe=4%!vjbgS40b;!fNVwxN0s*b?MjB*rh^Cp(Q8%TzdH?M50eG`2B
zJqcwaev#Tu*QH9_x7Sh3#`gB;QdyhnQ@h%)jF9MK>jl@002EQnr@f={4k-}BJD4*3
zF;E=c-&Ymcf^Ngm!^owb%ZJip8R+(Bsipu87wE&{q+DTc=0Hm;6^YbA9ubxWfIvR`
zNswSRm7Ga7+|j(wOExZHwIs>4j^G10M6tXi^=H@%V}E3<Ouak&H7iu)rR$AP1mwEb
zO>lz{xKoi15(|;YgnQiFkKWAZ<c>`6)lm0YO_WSb(Ds~Di)1ofCE`Rdw4eG$k^MdN
zd%#a_dZsgFcRAiLhLtH!eWVMDH<+B*1r3rWP3m_?JN%wVifowmWnmZ^F=x;&Rebq5
z8-*VzQ!C;DHlUAqKRa42@4(nOh1k;tY%%{*h36Ah7Zr=);NT>oc-D47=`n@d;>EZT
z{N|Y=GtPt&5n&nNf(+Cg=FGr}u{VhCO}l+$4%Y-M8E?o33ycZnr|ihtQedB*NLGVl
z^xb9mS7{<;NFi@R9c+~|AT62)?dkv)d(p<!zA`%MjLy(Cipsus8zP>$VC>h;_O9j%
zp)V-t$VRpnxcRMyX;H|aBTireGl_6chqrIvUpjQ!oCmYKHd<Y2MfMxcNf;sU@6G<=
z#qU8)I@Nf0ye_$X7e;@L&y!ZT@Gj<0lto$9{H#rj7f+BXXys?qM!^HP1P$$TN`nDU
zz{a4d`(}%2X+i;vB8LHL-tJv?2Q({W*&Q7$4)gP{*7RnoVt3y+)B{SVxq|Bh8e+p_
zZ?LN5@0;}L)1N#>7n(Lv+X%hDR##SX1o<DP(Z~1v)(6)0pjN2kQDKGndOP6F{*Psu
zRK|>kEP{fd${eXaRca1&<s>4D7VIy_ehU-_G)i;^V(*7%tl_FwV(-S}Mb&4quL}w~
znl1PjEh>Ui$xpB7wpSnE+qTzV3*e^n<Z#wo85<jA@x{Gv`l|f&pj*C%Yo(5oqZXzO
zA2hVMn@6!w7yTE9qNabkJHX64uHP6j#Rn1rr4MDOmA(QFDg3_$QIf4M?q=v-QMep<
zIi!JEWVSTETSs!(W`X2*X|<CV>gX$YLnA)+g+uXWvqmsys(7-7I+gSIXp9_nmnjxD
zr<In;|D;D|;Tj7FO0E)gm==VXr0PNoiLAaHVh)PBa&;$ME0_QeV;$YDuEkCv)-CuX
z!3mRPNfskNjh9LMruo=O_<7JqYBA4SD!~buB^EI?v-_<9WW1l7pJ(dH#|+x?^tYs2
z?WS0i!J26b3}^VrqDqM-Ae~fm%&?g}XRiB)0z}s*Q#b5Us#dI9DnMH4r{_gEoVury
z`qgz}$_2QNd)Te)BE9%u{Pek)rd7tvu$kvXf_8rV*ZHM&38n*@9jaqvApXVSw!0|V
zUv!A>D33Wv2vL~${7HYhUkY!Y&P1+n*r~he;E&F>S|e96y%}=Q{6@`9`j&Do+hw;}
zm_dLwYCLy&K0_a;YLEen$<OSgvqi{23r6NcsjT+YF(bq;3R*+>-rObc%bGvyoR2D+
zieup%_j@1f5II-g<YrHsJoz0%#yH#HJZ$Uex>C{(?qoT`jLm{l<dk?72(?J~{q5+E
zhX}?_aRtv`tE@#;N_ph7A@XGF&Sl{^<o3_fhErJaQy*V!R~W>h%2JBi(I)K)aKBC$
zC{shgxYBw$->(bb4%U+zT*&iVqE}nH?zUSUa?Hg=RmBp~+e1Lqy{{eOS*U82<pl+E
z=M*W-v0~tQ+V7rx+`WXX)Y-VRGW*!^W=9b6U3Pipvns9)<cD@B$fq!?cYhFdmWx39
z%=|s;yPVV4S4-6yEwmeMkKHX^%b=uJ_93`w&E;kHv_Ou-3&)Uz@uO508!5?3N8_dZ
zV}1ws7mj{r+nhW4dEX9ak9I;FA;ud5o|!%I*-$Pb&twGMC(_Webd{fEO$nh9|DnW|
zcNKc%3o0AZd5yfs)8ZM|=N)>7L&#Gd{ZHjVKORV`b1dRBO;@L5o85)}{$!`^8sI5l
zJvD9EWqG))2m2nJ<{%Spx1fyvDa`X=x90pk@$xVqlhtN!=a@;D25gli!|-t2-{X^M
z3b}XFW+kb{ZWrHZdyZ4TM5|whNc3Q%orm}$DHlcKM?y7Bn-m^PToSa9sfO1s8F*E0
zD4{tXoJPNfdTn!ek}@b<S`YDtJBO~619Ulhq^vqR-hsc~_|G43R}I6jrsd`(U;v}6
z2`quMJ;%h7bL2r#;aoORe=*XiOgy!O{O_E8ej4S?!1%K4&5}>Tp5jn|A0k!!FJL9%
zHT41%*dJHMoOS}k!I6>SJdpyV<p@}v1@b7vq`@Ld!(G0p*i05DkioqiQTc9rKlf2h
z?YM5ML7uy0G=+xwKp>CO3X#|I-K?zmtr(HE5#h~;c*5cxW#oB@f`{Z-5G0R{UW-&1
zLWt`e%t{->X_ah<^JmkpXS*!Asf{R$ESx}h?~sp$Xgmo|tv9Sj!hh^fmO(4E3jak{
zIWft?Z*P&<N^^Y1TsTa*=j$Ugyn=YprswPn3}YDxG;eSHfvY2+4*l}}EBcy1suUSJ
z53X62aNb<C;+ZOHqq2_c(fUWZAPiNJtOX_TCd|gOo_UM~`ghNJ6r(Mdjk78-MY9xe
zg8{q2QCtnKV<T(^l`2FUcGNxLv)WQzfC_L)uj6W;;US)J1$2#8*Jw{k7q0xe4cBxz
z!F+RS`_qGkXe8IRmPS%>yK?_ziE=g7DgD00lJ807;Z0DP|6DzYAYl2x=zUAtcHK_S
zVZg6+qpJ*f-4)*fPrC6+#!dmezqcfh9HoDl?AK<e<%}ig_Ewwe$n^cEw(X@<PnW~U
zR*UZDPr+gfji4yip5Dbk^Bo{7SiL@D2W~^a9j%;U=m#qaJ&B;VROrRe;bYnvU)fYr
zmNcVi2IY3D#qy~u5Eg{xfF`(`)|Bv(U0a_o&=-;Lv?~THOc9EpWCd!{GdgyJ`t);a
zfP;*A3}}3vT-QM3BBcQq=rH5tJhtM3Mv;=)ydW;h$z~`&Bw+xFwI{8wLHAqsVe-$F
zWQiAuDx<3|J$Eu@0F(H?H{Ta(xp?<E*Zwd@`DA`qhB2quyoN%4e47p*A_&UHC{R;>
z{I%3c5P)9BB8<eO4}^av_R>=#-7GPc)!EFJ4q9t+Sd0;xZYL$L&Q}3>6@tppTc{L^
z%9FN`AgmXs7tpj8v&LwM3M%Pe5Seqbtcm)pUQZ23WqeLobTIG+M6^4w0&+HX-J2(r
zo~LO(a(K)#-&eRDu8xY>^R5lmp?JM&A5XjloAIhR;Zk+s%6J?c08IHiPd%%)eAlz6
zA(%~!ptl3UF0CPi$3E5GQX~aR!;OFgJ0)k)_eQ+mK;h9xl+Wql403>QAg!4j{y*mV
zj9iZ<5z|Uh+P2I&Gs}&exAPtvS0)C$41EHiHmiT*1sDuqJE+o^(BdJQAs3gD5}K>5
zYc~72Tylq-R+nBpE@gS4niz+L%d4Dca<tVu1?O6@3Ch`)U~X3{n?N|<hu$n1|CqU3
z$idIejtrg-n69I0mVHhBd9+U~f=Tq1(U=@=bE!4rxl~k-R8Vahk2a&TJ$H&@GeXQm
z;}){Er=WB-DVv%a53mgNEXT#)x1U@<qJNu1e|n<@Ft#k4&~(7UeyibR=rD8eXUsWB
z2BW;EE)j6;eAv=z{EVmrPC&NEgn^YOq<CuEyKnrqb`S?6Tji!*`j^S2{PBhi01Y-A
z;(*4W^VWuTxSxu+?i>tb922;_Ph;PFFj+g$FO5Qt93l3v>5DMqb-0R>j-)Py8WpDJ
zP&?2&Y|`3~mwfSUE3KPo5cOpW=o*2lJH~_}&*)G%KJ{e&*fWUF{sz#T)!+NxXamln
z<xWZo(I(u374uWdCvBchDq}HA`U7CiRE0_Hr&8VK*xXInza&6Rb50P<k|iv>){OS%
z=p1TmAljWY@JYafQiN<g3%<+Mum@qfM~^7m(L$m|M4^scCU8}U+W14LaJ>E?g#!i}
zv6aeq(xM7@UBz`Onfu5D7fAf~f7OS)NZBVg{bbtkDT78oKzesnzcn-mABOS@bE?Ox
z6<%23l#ufE9G)89s@GEXjueD(AA&6Qi4^yjyt7=`nviWsQW^k%jqwXB@G5F}pHt=J
zvt7AGgH%<OBQ|OPhP63@*IPRN(Jo@8(O(@JMKqT|7KNu*G>lJ1DwI$L=-YJl{h1Iv
zT4y0;rsGD#M>5wJUKHk8j6`fb_x!t2b}7%ubR%*ODLjELV_XR#@XTR06Goi<DZRp<
zQMb$2+|wjpNPa?9Y~AWHOrb7w%x!MGa$=|A2%eea<&UJZomYy*&u?IkM82iTtkgW0
zL7*CMcwtxk2>GadXUCAjR!$&69609p1ellsty-B?F$PX^_GlnIe%_1-h#dSQXM?6W
zO8EDA%v8<Lpik0YK9=i{klb@pf51nxbacBlXcZi}j%Ov<=U9{W66LkX^NM$N_J#Pt
z3tD1qBDr7dpNQ>yY+Rcdx~36sT0W>ozqUF?@OC{4ios4Pi56$YvmCYLkT~vVfzT9R
zxEO9aZd{TYA@RH2_)t7lI52A*X<8RKNL+*10f>nJD7~z`a<`&rdbLRy4>lRb@0s<4
zG2@nRqR5^}KMl!p1}`yfei)-b9TlZvX=4&^-kj)j7R;iXBXo;d)5F#QhzuSq1LMAW
z5xK0J{bKhdC0E-Q-tW~|{Q#~Voj^O8+BxM{*7J<^n>EDJz2_}5sYfs{zd$#hIQ_#Z
zZ>BzUtNjB^W)g1GViGY=NcZw!$s@_*`KD|}*s`>sqBM&2q`<SKdWRG@qbfYZ6@p#@
zIK7ockAwv3Jm+94-k1Gk&sxVicp@`A&o>u?e@fYT6Z>EdgmzcXV$=;2KW+JxkG~IC
zB5>g~@>=(g<P8LO2VzR_giotUFNG7J-vqn@qG5Y1Hmor<y)!5;8pf@Hh5#>&03qdm
zX^K*N%2?Ru=RoF$--vo)j7t`{W*$+uyKAar2I2v!X$D@#anqeM?IxtWHgz%ZFdl?i
ziOZvM$*VS;1Twv2l<Nw<>Qp>v-%Cc<+&?D+KvgTpZm^nLbX=$faVM1gMDzRQEhN#`
zoB{c9%zjY8=+6-Pl@7l<SQ4-&f#jfwz<ZI_EZQv1G8HpNtk>akgX&jcNeCJg=Z~p~
z1ve}o=n3L$p@VUiA$}|j&tt<b_$?zqXzHdv7Q7G_D~<iavje}lPEzeY=UD%7+fCPN
z%ot?V#W!m+YtlKuk8h6%IzWR1X7c6b@utHV%P$rK4C!S9!-;f5f8Bc}owtx{lspEN
z3i|c%Dg1<*s(a>oV~@20+;58x{?++$!QJuL$rHNucQHH|&L~QFO8iV{$Ls2#eAHHJ
z&PDM4x~n$>(WGw25$m?|Sc7eO*L=oBnrAa4U96{Z-<$0DIb_5TIM}ld`1|c{JOo4y
z%r$!eS-iqmv6Nbkm<cs)R4US#fa{Xfn8U1f+);xqaTvB}pf+2nuD1$@F=<k0sb6fC
zijf_8XkJgL%Vt%$o&spofA-EOUZO6Gx_JGMyY!|gCmCE&;X<w<1D#Z0;_%+MDy4Bl
zlu%W$H>OmjCJDGFmHM4tzXzuoP*y7~iY*#E4@tTlQ2&FX08mchS43(Tj+tngRQ!X#
zSJ%(a=-{`~Vc4Kaq2Ny-H@c6uAD>=TN~<F@zMMdkcl$3?sB69LaSWTZcStJqjUG>r
ztS$SPR2DQg+0F#$vJWGE$m>G7*V`HPFE@6Jn_fVj70rqI1Ds3LVM6&WoJnHV(j=%}
z>U8<P_YQslEOks!af3i984-qG>BthLh+f@2Smy#I3|Ff(Jz9(z|3PD9@in#=mzIK>
z4>HH7hurgOPtc(r(9f<mS$x}3>cHF&lcYCm`p`(-8N-Nd3QAblIcFK0tvgwh)|;z*
zxR_(5?~74ZtqnC%Wk~*nYz^ME&X5-v@k$UujD5*b#EB`e1sqp-)*Y}VT?)D?7vCPc
zjAaW^G*z{b#vS_*ul@-@j0sC&b?Sl1r)m7s;|_kk?#>dLbxGXdSmC5gT?u|;0TSF$
z{q+`+Pz9gA+tPfseFJg2$w247C~reh5%2{}8cB`Ddi^`ge#s8O8u_?ydvIa_jr1#A
z&K`&rjJZf$>Q?-G=p%FV&kD-;jIKPxQjUQ3y?jMUC-Zg9RzOLF>Ju)y93s#uD4#Oq
zRZS-NhOT^hkL?eJ-23ldGy)FZRMbn^he!WEd5ZY+A!e8+;|ZW%p*ug_7NS}xXWohN
zb??H=HKus*r7OVUf2nA#`cdHxVWdd}4g}Ucp4prhb22h_Q@2qwvqZc6UEb=x$vyzu
zZrgMW;Sq$bb;>D=s9kUQuJVuf0!8}afwz1V^x%<%j>fOF<oY-Dy9LdfJBBoN2B8X{
z*{A}8I|~<{X$|J%O<L}+i{K)@6bAQ^pgpeDBSFt)dHzKZ$SZFV$6A2fpQ%>LL2xxX
z{#NHKVLB6ow7+4&VbSafQ+x0V$S0f4stpK#sUJHZO4ztj$=FaJQ=*x4ANsdU@EB-}
z4laTNZKE+!e)9cr`1Pll@BJR0nw!k`e0IM@5*UsildiwfYMyC;aK#p(^LtRx^05Tt
znztU(`8UZRw-Pm|P_BhyBEL943b4KVucGcxQsz)_EiT`!2YsP!!V)eKMLvoAWNdV9
ztiw!#<0+DBd!&v22nj7Hf@Nny@JN560sPib)iw#}_++~dg=>LOWF3S%$!mJqC0V<B
zjA|{X#h=cH66Op!grB<#Bc1@#zg~eV^CIlkLsPd|3|pbZ4D*5>$o)~WCf4Mlm8!D;
z>gissUJ^Ud(L5+9B2L!M(&QPXwq4EdYi2z=>}ql3ds~6)_me2o!7#!4PgS=|6b8;6
zxt>7(*?uAY_!5JT@l}>O5289ao=U~hYp$&%a~Zd^HDOdE>TTy4rT*pj=O*y=t)!F$
zNU_mF(mST-rxm8#Vm5am+gbUt{$srL^ZAaa^EYEIf#$nXUy{>sR;p-$)UG>(v5}~Q
zjWKs#0}lVr!hA-_3afKASIfNC5Dg1?(PvXjk6TQuj)Mi~-p+I{%Q9j2^FG{giioSj
z8N6*FJ{w&puGL^=PcXWETi)FjnrdT;h1$6rv6q_<7PqzpTkkgrg%KO-JweG}f3i3n
z6e#l$rh3C<rK7}%vnKfkbMb&*J;~-m+0DC9Xn5?*=MN){s~Tk}P<|M7y}pME?eu{$
zj{XM74Fi3W!t@Tmf-o#L1*$h+t8>^L1W#qZEOwJ|vt{<FyM5G(Ias4s8A#l%jH`8v
z9v;uX1|nV_Gi=wB14h^qM+ki=a4C`zFz42bTAXjE<Hh`*T@#ZdVJOfI$cC1b#Y@*@
zmjt!}&dTw%u%~d)JpLfn4HFR>+K-XeUG?_LN#&?>)m_mivH}Ec_9Z4wwV_)-l#Yv1
zxE??g9t0PkK9Iu3u*9gy2)<1L9XBJRLAD-($<RzGAF#sLMphDzI!g7Vjj3}Fnp=z>
z(=j>G6YJJUTMeNDDjRh}y#Y4)bTG~<f`3A6nwR3PD08LZBuw@gLio5%jKAM)y&6?)
zb941YL@pR2<P#~`??J{;bz=a%A!=%Lj5;XTJW3>rKm_QX!jV{&M`{mW!?T=Bxe^*P
z9^?X7V0A=X9k8oQ#x_4K4L6<=EjqaPp@}h&-*%`CS=X_fB9F4#1IjwCDU;YN86z82
zVVz4gj^q|N5`QUMO1~~DP&ntdnCA-+WThd+liDH&h!A}hx|JUiE4{3wx^p?rPx7SZ
zvd0#4p$m>-okFKVUy1Tb0&KTLGS;NNLw>p1POS-gAu|Fp;sZr-Oznz<9DggowEuKv
zOy*Lw-X5-WNzb1<!_K%Gees8!ER{D&aBG56IMWdhUPC2}R)Ve;mDi^EUOxP*+&q4j
zM379UR%ya}H;7cAFzG>~tYhzIDPMQ9cu0CQh*16=N275(Y>7ds6|#g|o^6=X1fh;n
zJ@zb{UrwF^PrB){eNXVnfnH(JC18FzZg8hm6~7aCfkLEfkKP*HtIP*-O=f;o{x5RZ
z5LSp3FtmsQc}~zGoGU)t!?oH%X(yx@iXzJE_?cn}ZCGfTW-8H&B;%+nE7Gy|p<%ux
zeb5DcvvkA{KMUuehPe*p4S{b|6SYrzuji)cst@im;9lnYxta^D|Euh>PfNON0q`-K
zoD3dxTajC1BXwd8bn|Hl_kkb$8G&S1q2=R%ZvFYu&B_AWe0n>La6iH?bdoMqh20Jf
zfRIWTcrUE~3?iZr;!Q_dNV$ph&Q%L5PEk`W8PHJV>DWWOm7_;!Kpy3sWB|N0Xw(Ju
z9~0L!&afX%1qyu$P6qJf2|$&Dkeu{*2PF_)q1vV=-noI2N@V2X6cO;uSgbis`@ESi
zJtZ8UEk?$fxDOIV7{TU+sJL{q$zU_Q!wc}yzp6$R{Ao@A!pJoY;7m@?FC#$w<#_ET
zk`E@klS1S#gt4u&&r!A5{!@(7TvnWYmD^>q?+AlYhwzn$qeZ;#*T*!3B_^2<sK65t
z-OO~6>v<;^t|vaZ{xp=#mkVgRPt4zyfrN+ZsyiMAH-~E>7q|y5xaH-27czW8tUfT$
zcRFPe#<~J<o;$_^Vy6E6#NA||y%HlXzxEF%(@c+p6kLn5<O3Fg@nzp$BOy_-pM&Qb
ze-*G7Cy0K#hpf59wetkluW8xXd&8-Jmx(k6vVO$!yG|Jd(!i;Q#;7OG!b-^o-$%$o
z?vWcl>7Dzx_5B*5TL_hpxkpd4U~(<>fa0zT8NIP)p-}M@)Ad-7UA0FH>zNe5Lz_?{
z<NgU!xZ=+$b)nb!DpOQ!G0j1pWDcadlMsyJx>3WB0_DGUtx|^k8$@1|R7e~Y%)&(8
z!z&F|YFaWi9$)05i;owWjD%b-e?G<62DuNYUJQau5Lx+Nzj@l2z`&G3c#VP0yl*3r
zNq~x42iT)ETZo_;zCPm#ZuCSkh)hG`&BjkqWy3&v*9)#hJVh}up5j_IJ*Z^YOWbz5
z$LA>fqvP>X{A#7kYr8w^E!d3%f`PyM;!4ahAw4ERUCPucAr61lT~I(uzkjtZTU!C{
zj#I!%;=4V`B6Crqhak|`8TLJr&-P$C@Dz3^&F@o<w{Sz+x=CF*s24l<px+i(9*qu3
zeWDJ~bfA^gE<+4C&vdTW%vlI`(~~>AO!SJ4btyDz+l~49jp3k;fp#$K|47XQ|8DOc
zC)1o}tf8rlH_`b3b8}ugIM1K@SM5js70KjCtz{wOR88t$R%o+i6{#NLo$V_I$_Stj
zilDc}dznh7^3aGn>0SPoRP1!S5NlmP`H&rPl=$KLB(O>5v5d1b;{!}NhQr3fnj~t`
z`A-yi-Cof*e3KN|%U9jj_f?zuns$r6EZh58!Y?leOs=tb78Pglj1=*7>ysr>22kD`
zmHbHdIL_L!DmP1EZcX^V*t@4L!M27=&|%xQZQHhO+qP}nHZyFS8AgU38Ae9Oc~6~B
z)zvk+d(`NQt{<@P*IsMRXJUh>PlLtxIgVTW%tq5QQ=%J!hyXxIwpNg8nwVm*D=#HW
zD;n$-WT`q~4d?BJzj0)4eO-+4zEI(jBl+WpcE@+TY&<h%b56$<F6-=J<~miocX;<F
z@J^+wp1n0PW78|_dfJ0cgj?XLAElQXlQ?YK0BKj$<{<g!9~-*#pXa}te)jsA+YoRv
zB#d)p%2c`gYzmMqZKg|<HAR{P$oi;Li{0r?Q{mm0KYOX&{Y^-$lC($NEY2JxYkP%g
z@?dX~Ih3Cgh280nTM5cQ;?HRH#~w`l{vJ349%ksRHqCxBp~on>P~8PohwkU8uk#q8
z`a*?;dB7CAYkBAWA^6_*hi^O2w?LoSJba(M9chhhU~FBBHWW~qWZKNBGbQ;nb_}`l
zE>PztBfL}>ux%Ht1%TPUn1<j&fC*Llsfg~yUW%)T6uz`#ys7+F>xIS42;J?sNH*gr
zDmeH=z~WCOPb?%&+vEELn;xZsa16e3Q)jD!2?t;A-?hFGU^L%L<2XR0xIi|r=bdrl
zD4luVyM7I2KH#CAX%`3YoMul?y*<8j7u-wsS6It^$W7U0Yz+d7cQZ)SO-n8h9<0T`
zs1vg+?+yYf+$Muw<*;uj1oo0;1Ez$T`<W%ff0}_Wq(-Y%^vC4C%3O&2ZZ5O}tVgY}
ztyLN1mkQUUqAl@+NgGWJN0=dZ6eg>`k_?6|os9>jZ9%$-#X!5gh{n<a#^I#7;sbxM
z;L5~iiM(L({F@%FGq<<V`4OE|e_!c{&-YEDHr#&_nfYV6_l~N{aIip^aQj0hG59cT
zcKmrHm`08~x#Z*Rb@NpWgwzy6qq{|t5{j~XqGnF5k2{5BWCym)AZg)dy;q1HG&Uhy
zpgP26)KMww{TBfdSKdq>ishV7kQ8xtQrgMxc=L=X5~PE)i%-hfob8(KFtnA;QDx$O
zE;*8rcU@Vln3%dookbi{p~r$ddOw&6HsKTBP4htXn<m!scy^{PCr9^ljMCi<GQP+0
zhz~Y6Og-@!Qh=m3kP#SHqHY4{*VN810*CFN67TO2o+(7ijb0~MWR<MXun#-QJC7e&
zx_0N+cRX3qzA98$y>e|av3joZC3{!=)QyB;pP3#g9YNuo*%rv+LGv4#87l=@NZrQ|
zE?n=7b$>hjwjSF4Vp<q4(y>n^A2@ong&X|}xBuiV+{8CTR+<ony?VUJGZ`i-urfD1
zGAfkIm}dX^Y4E7vya>)5>KFh-LD&-|=ZRze9f|D{xdPuqHssL6FqXNla)211F2adG
zR-`+kC`-byEepH50ybvOIxrKSKy=s!JS0@3IYM26P84|5oi_eV9o(;EAya$NYkPZW
zZq>}gf{H7!>I`sn^t`(mO)UqsCHh`tD%`Nzao1zexhYR#R8o3ijVhal;(4kizd1^h
zMB`=O;0=YACFL%q=ycI$mbt)O#~LCRMjsdPRwk?J(cGw<6yVJ&tX_`6mJY~0@T&`(
zqIQeKpe)bDLX`F>sJZ;jg8mYy5E8<od<9}$&>jrYdnCLJ7y2O@nz0N6w}?f;&3zW1
zE|8g923h&5jOAjQ2qo~AeHNuRbmhT<@h{5(J=ux+vFH^C0>gSq%Gl<L#s|}&>B!}Y
zYi3DBQ~qC)gT|$#?cnn3eKsBUhvfwg7N~ZPY_T78vz5wIwk|cWTzILWMwThArS-v|
zQ1c4oVE6120xS0q+^g8`Auh8&&(tGIZvv#rzt<Hx9%5`oI14G+z*~A)y09zgGvPw-
zN1UQBxzUTEGhKfq@c$AiPvIM)Ww?o430(|=Wzlt$AY)E`J?0HwcTcN>I}z9Dx9K6e
zYEz4j9qB`(9YYL%w9{}I?7(THe@K(?w;*OWtFo4;dqldZ&<Ulj*O5N9PDMrgfdW6V
zsjVfSSxRqHt1^MdlBPx#&n~~x%ahOspwDZqSPLfY0A5XJwx#Avqd%Q??$91(KIQsB
zADx6Zw@OZ+=Qaq&KJzC0hxlFIdb~`H6a!0Gj!}H3g;3PhH%KHX7HfOO90m`mD`$jE
z0zh^=8RL4{Y%G8wM|_)t(<q2=zZtZIaEDMj_|~H`<KZ6!Lc}hVvT4T$DO^+@Oq2a`
z#c)D|FTBzaw7?(MbcsTNEBIVsHP&VMNA;C1-W|BOtL_R-1PPU;jOT?WIjmXCB*L#D
z0zJn`xMRg+DE&>$WG)EGPJsD&vzVgn*3m2PO`-DF#_b(DGzg?1iRdQ6=SGU%S-P~R
zTB!0+-Az(;JKtTRS4-D5w`AVx0EqQQ2hB5C(gBC!_Hy!7`$l_+j_Qyd>KUtB+ok)u
zMnE*ehmb9pFID7h2#}J$Wr$XvfynbN$)9q$Z&BbjgdQc;Edx-lHB*X|BWck8-^&ne
z#5Ehw2hJR!LpH9tek`+ozrIhK9}mC(S>!}F`<CXv%>EG~ioZudX>xZbkDk-f`P@$M
zD?lZayu~G|!xAJ}=cn=<cA8~noV<mLc~&~zMroFf69`buWTZp`kRF{#lsNlhI{;;s
zy)o-kp<b!-x?_pVbF$!mTD1uL8Rz<N(4`S_c6zZUcEQ56OFt+~zf9u5_d4N5CX~$c
z%|jKkpijE>++~`8xdc>IwcZ_4R`Vm>5Me2P5bESj-_*;WM8D6Re7II>#WH={;&`zc
z8PLVAaQI+NgGjAb-7h?sF^bEVc^K};;^&;NUVm$~h}$itIM;5Qc&^G4d)<+pvrD~^
zHFTynfmic38`t~H3hvTy{&(QP!M|pii1CowU1h2Ol03a${<Z|N8%ugks)EoZLRqjI
z*om`2Bkc<rqE8_R0TxRGQKU^xs8tHjs5r>~Or(rsCOc@?1JmL~M1#x62lbqj(+T*F
zE&IQi4nRs!QgK}*#<<sCBpv&S{sqR3WPQZ<{V19|KQl492+TGC%gIXLqf0QMB8U*P
zLR2;RB!DOdJP5g#ElJA2ebgt&Ab3KL-MS~9oEf}qJO4^hNlFhIZ*p{26v+fx;FL;b
znY=6sF0SLgy}UpUa1*W(m9->Hq2*WLO=1z3YC4z-C{kb?N`h31YPwP|DaK;-6iq;Z
zSjs{FnXCGcZYC!Cj!*GC*FjjYinO_yd%wH<aWTIB;5UyG<6_s0393jgN4K0IhAc9V
zd1CsXat@!tlvf_JVrjlle_;+n4KXr-u2I85QTuotnMNB!kXdYnTC)OSTwLWtkfL#z
zo<o^&i46=0cwmQm|2aFw0mAJy_!|F3cfG+QIMeFMd0yTFZR%*LhqM;dPosxjTwMH6
zkm!!vSe$U$5lD=cSdRCTPgF+89s)l0Rx{$4o%+Heg$d$oRsad>O&MCuSQaNW2(}q(
zWltn_!lTL9=y%AP9B=N??0xjDjZB-7z<Ni@8M*10VcZXkKrfEPiRVb+_KB=az3Z;0
zQo}WbTOf5TlF}p!{5;|F2>^$ESO`2{6ZzFDwFa86mZCt&k4WD*Tp4#xN)$|52a3VZ
zi>!>s5ImZH$hNltgZO!`0QNr+4tGbx(JAQPYKzgSyeQJf6tl@W@Ww(jYCPr_i35W<
z;c<AUl~Vzb{KUe5a_sGLmBYDmG8Hh~-N~K;%K@<U7_|qdn#cayNH+U#<<}w@-a3KM
zi1QfNV$jo9RdY)1;cvi!2OH`c4OKY_M*`6+>ML2s>sLy;|ND%uH-p}5rHlmx+nykU
ze)8xtn?Awuj~Dt+ls!i>%Itzs>!oRy3h(j}pu<)V)1oMNjxEBOmXJk~!Sm6}o<$oj
z7wby6P_8Hl1T^3z*Yjxct?`8nV@8)1i-2FXIvTGwDD|mDS=<)#QkKzwu?_@bQ5)XV
z@Az?LT*e8aVIoQZ(E=PGT$sHjUVrNjM!*4d7uE-o=dA3>B_n>hWO;LAB+M(j$kKKr
z7YqpzrC`&(@5k5eK+iRaZ-VbTJUM}CfX$IFF#YY6{UB;H9_039z=MDksB9vgfLWp7
z=r}b{U3xnd<DHB9MesJzB>4r$d=1~XX<j(vXHU5A%Jo;6YUzOfxR7K`%2nGj(+1IY
z;-zUt=UjU|TNA}1jgTSgA1Z0+&YuF-Bv_U85CA%Yuy%vHObk}3=ldUlLfiBQaGzMJ
z`x`S~4+9eptcppyN)kzAjHpfP@_G2J9N4)!>o^SvBKcZYIX#edWbu7kuX};s*GNTv
zS44G&(_7gdoJwFMmU=Ojz+9{~FvNM4gPy3lNcR?%n{Q)@7{~(NSDwmNb;^15Na5fw
z#DVRTcRuMg%H<+gZ`=s2R&Vpa$gZ#Y^()4zNDbJi${Y<H2&)mvF?s@gVw|2@-Ld!e
z-`ge(Z8#>LBSQ}tP`>(C^e|VrDRgdawY-fFwX2O#KhrR&N}8yMZ3(RyySN^P1UFZj
zXZ-fqa(!Pt`4Dw|H`>5mSfR1yx-T$iRg?W=4>x`CO8h6e6j7aIo=TgiT7*JS;4EkR
z>V#1g@@da&aLw5%PHTQhW`X9ap3v(|Yrbd`{^cSd;-sqC{(c25RZToX?>CSg5thdi
zBDdg?@96Ua6!Q-kp`xo#3u4+)B%O;S)uEV4E!`$gt$Y&DEL(=xcqYr=3_ufz(@|k&
zZP-&l#NzW9P&g7SAt13b7m(p+%WeAh4XW^Jl;K@1Vz(-hq^0NOm`Q?g>XYHu*-gxU
z%SF88B6=H>)ReQ0KgTrr*tSgEL6<}Q8(%Oc$@R?_2v9Mg{lyoEFlQ)6c!ek&*T8%=
zUxSC1utz^vUAHI9ytP(MSB6zh&e3&}I)ANBY08k6IAbIKxO0+^`cH<zbK8JJg(5+z
zzG1MmduA7th=Qc6es&PsBlYmgJw-l*huKtBxO6#E4=@+~`oiG-Z|m5+e6yEl5~Ax~
z_uDV{*bG*Q?gf=uk#rFlQn3P*W8ZL1R}mvHpUsXZ<-_yc0kG4Bv(%Ay7|0(s+mz-`
z-?ggnu+pCq6UxSCNWY96op7Q9Q4R*wiWT|4n}C?u&|RT`vC<$y_;|lqJs_aFIhWNZ
z5IbN+UTFZw<&cj9g)Cuiz-%kSezh%2fu*Ctgi{9nnAS`?Y<;T8--sNob{~fs!)AuA
z4D_G{okr+R6G8=#@AhKg*TX(Yr=Oa<!{OD#Y|hcg&~Gk5_g>@rz1nyW(Ji>2MvJN)
zJz2e`1_UYozd#<KT~ONd9B(V3jgo+F^bg(ZlI8^rCIVP0lz6j0TnCx#cA4P)oS2;8
zl&KJ9$XM|nBfyw0j=p{tcg$2cD^Wk@cfAR?1Gcw)JTrfRa5(AH;Pkfl+*dqYyga`n
zr(XN&?S9$&&dEGy7h8D43JYy2^V8?*_<ET>zMUP3lVkDw-tfJJJw*W?0&E}h*Xd^1
z>v()?sM;{}nBvs!Xls03fUfd+gZTwR#X;EkqAm;=s}-$>#pAq=K3D6u&zI@c=)EQX
z#T`Iu*Zz<0!1Y}=Mu_(XN=v+YT!r*$=z{K#@0ISl|F`sj1Nl~{@pTV0!rJNOu38=L
zfc>f&ntHwuMe8=Q_&55Fy_U8b_c7#qUrR3&;C5|sc)*JXbjCunTM@b*BQ;Mt4J~n3
zYh@~2`jR$%N5fdjYRs!VRqXdXaZ{TLRS2y_9WsMh+rFd~Wg%T$YZ?IuGOre70c-UB
zTF5FevnXsmqCEv!=LCLb9;arn+4f#f_@wY#OnioBK36tb68mR9v6pb}8v?1OJw~WO
z^VXaWcirG_gv{Ji8XHl4b5H5x)WIbtmX?F=BpUU!Ld>rQv}6{_jkdSe8I=YqgnX4c
zJwj6sD>GHs2&vj*Pt~5Ls}j{z8BXolDykLOY?YPjMV|61bv6NJ(#e+6OrqBOax<zN
zGO4b3K>X)ZOXFj;!{F%ao@dWI=1v9bb1FqjmpIvaDQHTzM@0iOxaREG=;z8+PtUuu
zzqeV$xDXAH0u6=wOMUr^g$RLhhy8<O%=BS;?iQ<wMScYN3+Z8goU6*>1ci5kK6@*u
zNKUvarE$#>4wTPP*_imr@2YOCKjd-Y8H{F8*kI-74dT4tA2L(NpQxwt4cK%rQP-Yc
zW}O$h=*En0zj<QxBS$F)o^+M~w6cHxKdJA_9xm$2c;D7$(<XW!rg`7si0u3N0O=2}
z@o``0y{BpZIcrpxpZoDUeLpFpe$bhm<^@HmU7W|2WpAda(Urivynq`$|3`eUrH`NU
z(HqY-0WK(J_l-{@XO|atV*s1Q)nf4I2fv@*51Zj-&@<PhOB16tjZ3+>y)z{?GXS4+
zVnh(UFI#}&8wGT!Kiw?h>7|U%`}YUC*!g<aA0CJTLLECtq1@Hx{!VkgLQzQy-D1PK
zY_`zxY52JWsTMwZTWF!{CPI{QHGH2i#37u@Pq%YWeAdF|wMyM!0rs+Ma0g)9H}OIe
zyNAxzzt;qX9==7<#n#r$_>VB>zPq#F6>a@4a=#FWrL~jWlDD`LM^`X}91$wFKF*Do
zgDUUq>+{C3Kh2LX!;_&mh0hFz*7>>aCwLafZQTsP?Q`c#@agGk5GJ;=Js7LHDLCr5
z)9?__IK_%RF_p>fA<$_u6b6Te_fk1i54!D|wLdzOe_)f_d@=~DObv8g?K7N}unssv
zT*|ocnzW&?X!Lj^_kpN9hvOMZPIH!Ihtnm^84VTNaX}+!i|RLHZ0wq$fXa`ZpBZ1L
zzMBLmCTxu1&(oq;WnUR@{%$-`e919yIcsbx!ZuwhX0+&Wd<g#%qL#+*a;Iah+p_m@
zs`u5i-nQ{`>8Hhh(n#;4HO8+7kn}?gTOgE)@>6K2b0d2G(4=09-tyi}cDQp`+-#Qv
zPdbCq<%gWx=PAHRjuj><Lf}$US{=ku#2d#u^9TM}@JPHE<0Lf%j>v~eGSdR7{ICRn
zeRg>LdXt8q=Oi!G?>oxsw%_@N(N{=UXb4%e`zgBZu6=0XVfzH5$J#yHOumKPyjc1o
zEBqOY-_;gb$HQA&Srlx=DZuLZ0cciW5%g>2qFvpN$7pOpeV3Nr=ZFq>$Qb;m;^Frd
zj7>?;_u$18G(HO&T7?{Vnds60K)UP$Gm9;V9n%0-kO~qye=y%xN2}MDv_i)GbBS(e
z<y}nQy`E4wnv9o$yic^}=uv31%j<2>{I;+tLT?T4PN#^yXztk=8>WcjPaKeTz7<Sm
zHeAYvG3vxxb(-}2wKnC}z5B(ZmH(wBewt*$2~nm|zXTsD^RpBs)f_B=2!q%IsqzTd
z4)WX>LnlXOeMoEdS&hMWBP>TF5iPHfSM~y)L(__***xbP--xdGvBvu$<o=P;;_E@6
z%Y<-vtmrj${11wFv2YBMVy4dzhx`wDg;b%?DjX8Qd(Z+p2YfR3zn3tHW#F%b=T2!b
zhAXi5vkAZ7ahtDxW{hfpYFftr#o|B%(6&D0`PSQo6>s>u0K$IdaSdO71JUms7tdhG
zo|M?fKWRdTquw?l8(DuhBOz5E*b{@u3wYT{Eq_6j@5jD-Oo={|#JWDm_{P}Up4m*9
z@!hsq{{oh0M#cxypPHR{3B-|I{3PQXvD0@?pM9mWrvPJ(^Euk#5S|0i6nG^LwK0xl
zj}+0P*?aHoz>(OyB=&l$5JkykN4K~szDYs12QjExq|63kZ{e(v2L>!F*ho+f#%+sD
zN1tUF$}azF#finQd)wnH=Y1e`^kj^ox#j+7ca`pQ$YP6uF=^t*ij1V`LBH=#r2s-S
zk>dw=9Gg))HPC3Wo~zSvi`R01tV(IIqtC{BxnzCQo~RT121D?@Bg4ZE(q%eo%=9vS
zNt9y7duyeMNt=yr6=vtFk#v-p7G+IUVy6<DBHN<URW&wuUiM((Q;TRSF*}!CkxPn<
zwzeWSl<}GxN_-NaO?8#{_j9Z&pH_I6qnF?mi*2{rJlD*eH2XMuRi|6RjI73|0XX?P
zVTqd6$G8D*A(r^Ox^x9v@CoNR<jiP(dD;A;S3;cvtzD!^C*z=&*zB61j`d;$Ec*Jr
z?je1XE&S)2t_^jqQCE;QDu14=3pU2DWQhYHPzM?`J7mt(j5mZNs(RrI(&0OTHkyF_
zrCaU@k+u(%=}GlCK9LE*U$eT!$G<YD-ShFld3k$jl_tPCKNJEHiy}`@|95ZyfA{wP
z|Jd6{439hZ-sii|$PrHQ!|RU|EGMMARM!Ti;9%$**FO$a8LW0Am^KomW9`nqXXQb{
zs?U5`vd*Qt&+&l*DCJ5Tl+0e+vg^UK(T+#zCL@W5lEI45D<Nnp!@LcaipWT=g)})Z
z#I-KHFRZA&<?xZhPb%J>+J`s`)Tojv0R-Zlng)p37;kGR_;`eD7Tt)-ycN*OA1YW3
zr8XdwUc8E!290PY1$3_J2^1bmIvLQwNvfc&HBL)MP{Maw9x}r|mgpEuU&yl+0$}x8
z4tzqxbwN;efWfS}mNdd>>mYK+G$E?y5TIw`NyjtV-zJe@l}K{Kz=;~gp*k6U%#mT$
z&nT-hc6l+ty8#3PK<85cB4wikeZEx!Y)(wjADx(Rz-%>wAum%qqyf|~13<iA?6^_H
z#Wp4>fGC`lTRCid;foe_!|p~cddQK}Ff(<pR1qV2)dGtrXU?2J-*cij16lOn&m>~B
ziF60?gS{J>*tD?(Iz*fS4*ohZ87@FIp-_F<)7aRD(m)o8y~g4WTq%<Bq#Y;$cND1?
zqpN9vEyq~ALn&HBnR>h{)RA2i@HoA~$0ai%3rQ^<sxwfl9aRou_#mqzHW8w5YFtT&
z9#F{$fE@P<{ncZvsko4jtKvdoM2QB6{3t{VUx`qNWSnOKO{TM2-)PBFWqYWoLCc6F
z0~^g5mH)->eY3GWoJb4fpRcbN-!mJHGbNN1YuhcxX$T?7F=PvLGW|}&uh&G=Di};-
zNrHKsoa~MXVQ@_nC=h@kLBKF!U@(XP4EARVv)#sjQs2XRt@dKIHjO4*CQL7~v8&om
z^{Q&4)mlF?UKgaej$CjQxUUi<Y0n*Jr1K}|uMQINC`QlQ2Ezs0s(jiZ5dax-D-nkT
zK=W8Q)pGiKr$`D};c;JB@S&1z3SFA{WW>!DWf?yOuG4-rOwumw#jrBKP!FuCWYeV5
z3bm@e*14wb1)UZJlm;zYL!@pv`?`rLV{mw8MyVgnlLC9#DxoA<Ovd|Kl+{PJG7qR2
zu0Xi=ULp*jtp{`QE!bUxrgbV{daq`j{n-8e<Q{%EB9wDX*jY9?U`Zwi4FOQ@I6rsX
z-Ulwy$PG)o(}83tlgu^=(9;l=1ppaQ@uu85203MWYj4R(=*a>waV@Bo(Lg$BV=@{2
z{J~@?&9E)0Wwd~oLr;281m1RGJs7jzcz%5(4Ns#@U(^fufT~YP4DHCGX{z1e$Y;qc
z<u%ipQB?2I+BlCG6osux%K$`W^b#7UDv^Q9G?b$D!3~R$e((vo<+Vrsh#_-n=#>o<
zk4?10MPSmLKD-vIL0<5nq{zXLp0zrbX{Ymu43^FmFnY~mHiVO`!U&3FuG-jf46K<`
zYI&I1pjK4Q=Q#RD2)OG-MgE@&0ipj%2uQ(I?M<yJose3Uq{M)d4&Arxz}Ge@{Wx>>
zexZW*0|+iPDTLrp6jH#63snkbR>`pf#8%ZzcMvzY_}Hz2V|N%~2?nB@Tq$Y0NvZ%&
zwKWN@X3{2lz{Lqr(G3RnRg`Y32$(d}6L|q-@|b^mk8bgF5>R5w&F^2kg_LoQP%#vN
z2w(XSVbbEtwwg|ecxNdzISZO^%3}oUdcBwE2zA1$3raCkvxI;oR%}6;Nyk^UKa^&n
zyb!z=i==z7?`D97SCnMFLNp_tbm$#1T{qfbgBdm4V0*G}k<Ct<WIxDO$8l;%zRWME
z8DxgO@ah<1d&eazmB{*qzI!91y2N<`P*!o~M3_vP9AWKVK^12et{%ISTTN2~e$|-X
z(WNZaw?LU1cJx0I0x)NJ;gh7Y8uDOzfU&_Pew+>O5y%eIkkXQBJPx$6YIc(jK_zg}
z%5|q8@j-Op>Ge9cR^&BkY5}$Kl=;JWnrKa6Q7DkoNyBd$lH9g{L7`7MGb9pBWF_C3
zuvkh0Vz`x11tzvBV||4f<CfBR$NHxL1_;P|Ksz48J`p$K6``Q*U+AOhBu1I#{dB6-
zA{JKb)XV>vejuY&$TPc}AzcDnZ0pg8u|r5@(GnK+TCFiG+=G>3PeH;=<Z{5B4#=bB
z)jVF!yU)cu#(|PaBh+SJutgu=swO0xL`C3oh51TE!7W7I0iIYKNc|i=JF@4~<nh|g
z&3C)WiyJY*g<`sE471mtB^*?Pzc4oh|2H9E3=d)H@LxheGBQmM^1%wje<uXU3w|I&
z^eO<svwklG5AdibMmbog@>FCgmPOUmk5CfojQJm${v`xFd<y}CCUR~%#O=6A?^zX%
zkwHNc7p$KwI+qcQ3F@sml~>iHZ~|B$_rPA5w#V|Oq7=`NgM7fT!%IzTm5ChgE}QNp
zKeG@-xhh;#Gu68Zc@X>v<s|u(JC4zzk1}%BQq044Sv0Ay_6=&D9alw54Pf?yYDHrg
z(;{U4MCKb+CV_2bPPeY3+`qh}CVq?Mpw!T5-$Xz}t4)|MsI87F7)oG}qMKTSwGHbE
zqv92VmnaVUx3P<}=S`idGn`jPKQr4ku5G`dR=lMqw2!CE;u6R3mNuB@EJ4W5)|mY|
z_K+Ay_t_61tz;E&FhchX5pmj>gOHApi9vZivPAsN1LWY*2m7{xG|T4G+4p_6v12k$
zb}PHl44zg2vlV5egRh2btR4*I9BmbLOrhn-FkaQ7NJTNyhr^?U$x($qQcQx+s-$!*
z1aZ^qC9c@?vS@y~w{R+x)LUf&E$qp0eEso{(q5BIXckT>rhI5Knk}M`I?%F;T3%RC
zq#OX>j0O-Zei=Prln0qD$1IIe9vB}t%Qy*enDQ`hK2(&GLE=!}=@oO7F8&e2J(#9I
zgm*)1{hcw0czPn=Xo%W-)!1L9h0ZRBumf?jSH*-lZy8+u%Ls5uZ2ZISW;`LMW|103
zN8>_&@S9@Xwm7g)JeaJdx<3KOTmBiO5wlOCphQ(hf8L|aq#|0KU-QTN)xpfrbtL+u
zG<-1&-xOOABSfJ%Gk7A3xIvxw=X<HW-%EV0H`@2*s@H8Lny#F!iLz<^Op1&UQ|tlQ
zzKdjl#TF`B{0$5u2+cIFvI(FUk)th<MFJz1wZZ;M?b|-ML9xf5f%^wZY5JCl0$swH
zuC<%Q`eo7h!Q`%#2(33n7VYcca_M+GLGx>U$M`Me0SL!CJSChWHaYy&v*3;Ka1-!#
zgnx_xwqQ3EJc_@K0OLv*rY|kV69X05n2Hsr`vu@~)p6a5nuwH_nKgTqISr`~Gs@Cg
zYpKs~BOtpy<>}i9Aei$0jB9`X6LQ1;^5rtXxrd^3;YPOVT3JGgKz~U^!LSF*Q57rI
z9#Z4Rx^Z*=Yrz*Ta-MES+r!Q7HJm!34Z4rtJCa4FGEgg@YXFLZ#T^rboy72GTHhNI
zy?)c|*4(T@z+<v|x?S!aIUEbX^4xFrBNv<LbNB<JFAqqgapjY{laqA4-2Q(z0+e}Y
z3u~0?STVSHCqu&0nnc~^#OyN-{YV`cCuiyF9eD>2?{Y)ZnV=a)Inw7=mDu=)>Ap0X
zPzq7fWr(j+l~eb0-Ubr?HUjj&jR2(oU<4%mn-MTt-kWh3_J3&vp!^pjpoE66s>U7T
z$kaQqeW&OH*;OXVGeYda_;pzG`6V5vG{R0HqSxg!4+*vtn*-VlQvjjW@mT2d9{E^M
ze((?SVBsARNyYuWyZ-W**AxUFU-1Zf*}l*AraI&Mp!{(%5d!B7O~=C$@beLQn;3z>
zMJx7o^qj@+2Ua1YX=Hvdj4l=+gKU}_On0KSCk@#|y3sk-!J6}={nY`qwRW$THN=F5
z9nQMllOeM=KGcQ@tv<rG;k_K)(2gdJ%#{h*>&FtB5vJBYDauOxGDNy$f&mVUSEEGJ
z4{^jUKkLmz)T|r|`M~*djVE%j+=@R}E}s|N2r}=3?lxH^p#N?JELK}*gon5?dAshP
zNj@JvN1x!$hwgk#j$Y}$%;vr-Npt;>D5{y~MH#;j&QESJ46Kd9zygZRQ<5C=7`yG#
z_Iu91iI1R#E-z1J<^TtITPJ)#{6JplgrY}NMX5_d>UDtKCD>PGm3kRt(}#D9hl3s(
zd=MgwZC;Ip3Ach*2U7i#Ja82BcY0YSA`(QQOhW-uuS$rM=VYbyQ>&Xg#+=49%IjP5
z!nqqe<Qle5OUA+bgpp$9_?Ho&QhXT2KiZgT<&qG|mI_Q;`sZv@-fi$sKj^Msj{Ar{
zAFFo^-yf(04pCoC*ILcV5CKqzdmP*+T`!|gtM9eN&ux|6HrD$b&#de&x7F{B9?Ulo
znv-1k#iDA?hY;WMg_@hc)VuR^<f@B>gG)9MTCi>6SfiWmM2|z*8N{I-+wG`Y-v{ca
z`-9v%WqJeh1R`g9oxG$sX+_BDXi2pY9dd-Mi7C)RrkXy7lpr<5fdm#ra9V>JT1~CJ
z=?Z}Gu{~sa-Yoi(PLLJop|Tn%L?=N8(-_#$drjk?OsfsbKrqb{CM62xoT%Fr-XM<x
zIEq<xL2>l|dq%(|2C6ScsGJHegNJxANsK&7ELE&GAUTu8@*+G;pevT>{V?Y!`Rswm
zm^jp4e3uO(CNLyUj(L)pP0w*-BVOZ-fDeC*DD{(MNQraQ3$0NGjOk%Rh+LVz6{)iV
z5_fo*RP~F{ks1+HE+qVR_3@@qn^KjUIwBu_pxm!P_M)B98Mhj1;~Y-bK&bZd?m;Tv
z8fKAawFz!;oIfKjucW*RYAKEG!EbdLxdwwG*@lBGx?8aD^o7$QK7Zid|3U&57=<ux
z>czy<b!!ipS)H0kAJ*G#E8r*_Qur!sL0qUE4uea1t5o|B8(C(%?>c_!rdF&oflfyq
zMdXV2{+L^!vuV=7G%;gYjNda>6^GLT%>i061&>b+PN^m9c1jlg!+Z4@vEg>;sD_%@
zHnQB)C(mtU;^S<onsu%Q$(*m&*%UDrL#+xtuQ_!Htqe*(1qEZNk(_+z&E8`>CNKjF
zUgY##24-^@I=#cP(91mnOoaf(2P>P1z$W%O{-c(AXY+nP4L7^J<k@Dsm-KT#ZHAlQ
z$6XgCv&s^cE$<UZ8M!PFxIs3+ls5Y}=1PWtBa71IZpCm$$wvmlCTC1~UkqSAgFH+o
zh<GMdE}GTKu{$6(_{D%$n8%j%i1BS#-&ddy@yN70W*ands*rSLTC!_~5A1(H0@Mq=
z!Y!-RhZ(?P%ri=v%gS1%D(4Hrc}Nz;alvpeF_<h5#{f}&7su1K|LSj3nPfL1^~MKZ
z60TezAn6?Q)9K}q9=iXCxv|)lTl@0c$n{*;_;E&z_mIrlf6neyK_x-Jq7Ut4Du@Gc
zpNu|{fHfY?LD4u`BQUJKR5JQQkQk}m_N;<5KF3#GtD#gI@J#|d$7r}lCccQyP8O$4
zmQ`HWK^FcWB*5Y?B;c#5mfhZT<H22Td*r93&T?+2?$-g|TC;BHpu5{y-$#kxz4P<s
zQ2vQ|$Y8uFVfTm0>%H+e5-{@*5}^Ax5-|IJh6MP_B`8uEYr)=gP<0XeX(4a`h>re7
z^_c;}q<|F#s5^NnxL`{E8wn`H>e{j5%b&6AI;wnb*@+$Y1NjFDXc|Iu(`2cpB#B+Y
zQ&3lz+h}X+aYOdF7J1AjPcy^+!T%{+k$b0AvSIZ+*g10Ye0Fc!g0RB6GJVjjX$|W3
zGD<$R#BTXtm4NXzw|i1)R0_b{mW){E8-od-t@0*}(mK0T+y#KYm4I@EB)lRv&Qw&M
z=H+9vqXAYCKGpaHN6ur+h>Xw6toKtf-?!`g2UT}FJHu<5QX#@0hCRo~LNKr5pLU6Y
z=oH0})D~t&>tMHNj6&mCen3G_aJ%qc$ruUv-qm@JVV&AUV3Ki@Q}cso6qqHnCS$yX
zQ~iubh2_M_HO-fI%=$F`uEGnTFY`nzZzO}S?soDtn`Rok+qIXeUads~;xX76a~y$J
zbLg^fW`|Mi=;JMFk{+m5x%r$5lP3}504b)JAf0x5>@m16(_z@Zi!s`SRcec~Tv*Nt
z^HTBZq~?yQTc!MLkl17lri8RAt7m^@r!6RF-xZQLJPEoBG^;3GERi9{hYbu|W1X&A
zhE|}U=pl99#d<m5n;}rY5arjOD&er<;S=WU(t7Klo`B&txjbxBqA-S8GFPY!Fk_4R
z-zIfySO+PFg~@`9@teT^&ip04>NM~0RSKq(Abh)IIuT!rib4E9x-nwZk0xt;1HlM*
zkI9bhz2v0&$_Ph!ii)4{hmi`BZl)gAo1Q<`ObHhKlpl;N?kRQlUOfK_pvQ2D0<Nx(
zCN*IOOr|53JwUVA{uGC?B_tOSTe65<5lgUNDppuQE`m&%HK^+jfRhY1(Rd|ezWZJ$
zK&B`dRT}wKhEQmu=x*O4zp7G1efr;!fTyqvX9T&>k{s_D)qza=rf($R7$CAJtv;a*
z^WTwxrT>Nm)VE(AyxFaA?9rk~7T{SgC|LZB1f1TM{El!|b+~0rWv3{FRO%<*3IdS4
z1Z7U0Stc4<Mb2a(BYF5K(XSNzIeV#w6><<3jviY%ZG$1A!HS8(Zuy!`Ck^*Y$(Qa1
zcG_&b19mVCs?0lf+WxZmX#reQ|BL_hi+QzW+Qt$`ErAIb%?p!;l&CGuD3gj&Eyl&Q
zah#9mK0tJK5=oQ<E8f-h5vSCtAWOKGgtFBr)9EMWF%eP@o~9>~5t3nf6X;2O#2`s+
z(obkCM1UC;-PPL_rIX7Iq6ARsP=Sx^+krBkd8bO8NNoh8E6flk<qZvc7Hc7+wyX;b
zktnePc-eyEYT;Aj=GOQ%(11k&hL(c(k~oP+*_QBf{UEmiD~M(KDI)XvT7<JEo5rr&
z%NKu}ZDZz&vXkU^j-Sjh5J>l+l2MiFUjs1G{i)eC{hDWr(z==l&ZpM0I&JKY#wU0i
zGA4^GEGr}nU}%H-Q(C6ynVnV%p6&qIG!cfv@2W~!2|FmpK_63$UsgAE=nAVWFA|yc
zLd)tNhbey>0clM7ppqx8-inhQVyqL}<u&%%33uTQ0`LK5N%bkORe_TPC`+}p`a_QU
z=sj-Qy0m?>nNOLfY(qwIADFO3RQqJ8DxIj~pxQozSehN+4a1pMtne*8Vy_NY)h3N4
zg3~zv10w*<rX2^7eD7DPEoDCq8h(^V71IkI-0?sl^Xi@l%%*mIR~liwX;nY5;yy)d
zDU>ZY?7}#f0LT!OPcXwVXNPX~1dJH~1xOMQK`5KXH5S$h%Aw=tDxgIn%~Me1_+<(U
zm-Ga}vzBj61c^;hp?J!GSZkw;=#Wc6XYjykdf2ChPb8)4y|<)}Ah*v^Dt*`NQ&=}W
zjKfbELT*ihDZBCi&IoWMP7sX71BQFsLU7v~9P0<}V-JxFPSQ+33;_Om^{ah{oUVU#
zy>t7Q5#S2>$8jl~yY1S;AHsLM5)~G694k6Q;zH;jBOo#U`{n)czZn4%o_a2_I2X4q
z$6aLk*Og$4^j977p!i?k@{qt}IyOPiyh0w70pgx!FmckiGNhp0AZ1ZzOvKEXcDCop
zPq`#Gq*}H#SjeJAfgAE#5(R2^l~HLzR)r=OWR^J*Ns$IteC|3$m!t}KiWZVbo~#me
zWfN$pf5GaCZ(GKIk|Y-#H_^|4sDu?wsb8Bi!-mv~^=8Y^XqSa@+h{Tq40&@^0O+#c
zE77e;@{(a800^-r4+-d#^aq{Mn|t9{DU;*^^xcc|0~hvLF%}gnGZN+#_}9S0obB!J
z9wk%uE&#hQ)hzru^UQg$<Jgol&+PDqs3O}<Hrw40{f3Ipj@LZAQxT>#oCrYTr|di(
zKOdaOhf7fZsz=lH136b;9z6LclEVquNRe)S?W~OPHKV_X0G(rYUeOZ*z6Ge|4jj^h
zyFb0XyASgqQjSl<Miaj@zQT0uhU~ovrcU2-oDF;(n-l<iT*IwTa@`f=d9cK{BQ5Ni
z@WgMrgQ@$#Eo5Uz$Gn)a*=#VdF>a>dU|>w%yvD#BKL;)a$oTSNo^5gPr#T^8a1ZU;
zmBk_fx0(s>@dpc#VCE4vQt8uh$`_mQ<wpv6-fWB5^JMHUFJi~QcO)O450H%yY;|6A
z5gZD6-^q$A@NZUuA-rM^u)~w?3U>sTaq(k-d)?*ZJ|`wt!{%@6WWCw^1ClPh#S^$V
zn5ef|=BPUKYs#yE4#z0fam~NH8W~=)H9Vv0M!&==+zp5aUoKXyzc5!pMC^W7MXT5R
z@h=v!*Qc+f&BI%Tw`4-{VR}mmCFI9Yaw4b{Ixp)gNT-RRnhc_aV`!fEGZ4G?z04f}
zK)4wDdcN}lhS1IZAPWy!H`?SbNnPbwPC=E5IxECelm4vB0TbV<hwdIk_h-w%zI*L#
z6CoP*0S@NDELpOV8(@X{4F4zFHuWW`O+~MVKFUp)-Ag<D8R(REdg_+m#z8tllF|UB
z&$+B%w<|G`FKg0VU_k}jN~xN0h_jU6MdGTv$S-MC8z4uB%8oEmhZ3V_Va2Q&5tG!C
zPJop3>4o^b<Sdc!w7rw|G`XV$x2?nwg+G~d9zI5!fi#Pn0wJEwLqo5Kw9jGcsc!n!
zV@vv=%gCBsSDjoml*v67uSCHifxy7l+mxL<+BIFP%}RA+@=_itoaFW!oQFG;+#mxm
z2s6%MM0!YMhi2QB)<TswnNJn~5ZBfl5Fkv<Sgu7PFe~(Yn!Rq_eojC9K1o>F&@iR9
z#l@EoSOR_~xI#Qq!=oLSp&$2(v&5jp3PVs%O4=Y<P4>RLV7vcGf>}<Yw$RX2eMPi)
z*4SWORB<gbf$5abr`hHbW+qrFTtI{rc5NMP&`4AQ!x^2V?!gv>{;j^9p1z$KZqoqj
za|ej7#C2K)m9D9tAO#=a)hv#l_qEdFxy9?>%;y{kRN;SzU}hJOgdb%1{`PS28<X`*
zD#G(g+Y<|C;)(d-v=UKbt0}?Dcv{%}?xK%I|Mh)viTV+rp0?*lnE3#&YxAF#g|}kj
zYjblfy2EumU*ew?1yE(W5C<6HD39!Q%_vyYjtsx9)y^4`t7UABnh+Y}SgRhly~q<P
za$x?jYlWpEoWv$MFkf04Pqky7*C<kj@mbQ$faWo`U1yfZ)T#tCi^)?J=Q0rKI2I#^
z{hV+27zQ9`#lVj+vh!(R$CQ1@j!B8QyB?Dp07in|?+|F{%r8@7KR}^ciAWHx_UNxK
zied5TtdXEi8ulil@BT<Eqh)Ct9I^`Wh}!A`^fN#6y?at-j-xVG*1qQhZ6Pk<EYx0}
zFKVOow-K*K^3?*ak*cl5)GKAvM>l)S!NDq1587-n=y4o%{F0#e?kccvp@rnLUfl>{
zvb@SDW9S!l*|s3HK-i$NnClL=EszaXJotIT4^Wm&BNY+CasIKDtTOm<GRDdbs;eyo
z7aF*8N^rh|v5Rn(`uiw^4|Tm!%*iTMBS-o^d|ax^O(==i&D|=tvQw-Dt?NuQb_BB2
z-v*rf0Z$-3Ir-oiLGhT2-g07<YD0yh;7d14gd2yv9nL*cq#o(sTgN+zbk{JUM}tCu
zTY`>$t}Z020QreLdB*6vjYuikzRE1u2<~eAF`sdETPc7f8XDKG_ng#rLO?HFr&Q${
zkZtk~Nej@y7|93>uN%F$GcgP#XT}SNi4w@aR|DJ7tcdByJ|YweT#xko^<i&S>g{5D
zX*XWQQU8HTO49GvilAp9$fhlUKjB1qEfJakd3SNOBr!s;N*gH8Me#lll=#SotPZwE
zFcbYDVxstBuW5k4=OB6LO>*~NN%L-TpTfAIJa~hI^MMOztF~B>lI)$oByP*%k#a%I
zfcK5_vJ^+>>2x~JD>~O1^IZt?i5+K4@?KlzK9(aqlG-NIb*GgIrFDgjs=8C*q?7Ll
zWL^z?npD#3SoK8DN=u_S(w4im^sIjs*V2knb81tz_xSaIZIv6w!d2_e^Ra<6wku`P
zK+~;Zr*CDBGJK*86<PCv(iGgXa9Vc1Sz*?<J<lo;!e`zSd`0}4wZB}7i27`A*>@}a
zGL?%krPd|}-e7Cyspq-Xr}HO}Pu7Z+^^Adyx*XwlQ>kL$8g}b@xHiuJWZn^Ed#L~*
zr;29h5iscJc`$dHE1cEXycZ=Osv#)pJ*~kXBO9|ri9KRoNofS$uZ=~!v9FT!eXMV|
z1FmnrOOihXtYGC{(@B0%2!TaiNDD|OO!9ejT7A{ZNw(D-DF^08_L=2xtF^|Qemty=
z51*2MfuES7&B1xOBtr#$V&Z!sMHR{Q5eTGJk|GvOE5dRL)y-mai^^pYsS&2szl3!c
z08HXt1n>7C`<^fc5}YSuBVe$X5?6XzC%>p2T0Gg~EWU7I2Sv978j>^0CSMfMm-=Kb
z&>I&8kWYWzpj>D8PUX<xw6|G;9o;NOaX)?hf!zpRu-gv4#Q=Zy81qMaMwbSUJ{?-D
zqwO9dk>KMKD6LE`+VFN!lA|g{(x`2(3jV=+5n*CTHqK7c0VBmRdd;Wh%gLew-I|(R
zuab|!{;I+xOqF0l##5KBre)FqN%Od>odk>5>=C?W?oEK#$e3lJpvQKT-E%)a9^pCa
z%Ajw>1dUt3Sn%0cL+e&qowS)d$9K6tVsFK?rRdd2ci4s1*n{-k!su|EO@Rv#enP}G
zkA8vMpwq%Ywovgy&jP|yVvPBG?Dmd7hH#VE7$mmUc$p3yQ?xTlxV~^27%&!VUYSO$
z4U;FM0DuaaC7=Opm-&$vBAx~&_#xLS$P+P=Eg^9KOm0Gt9@<_Jd#<loW%GPOp91!*
z!MHyNpt(@-qJTxRayU_2iQ02z@Pe~MYQ}=I#m=n=9yiF4=s6R?h}NDDIJ_09vq@J0
zGa$}QB6x`92i2nXS6C}zctT6nCV|GP)jqyzW(I%zjcsPwnu*6K_tT|w#=d2%TITd>
zz>ljb2x9gSN%Pgfhbh%{Z2|aqV~Yz8$<Hjp6b22wV`t~E^5Ah1a_iE(sOEMc%o%u&
z<xV64r0V(43wVb{b(B=v13G$6n8Zv^lvpYGU@dZ3lL{C_L~Qm=SzJ8mW@1@Scd2YS
zqzb1dD8_)yGuvTmyp?07J%dw`Y>8F(N&6z*n4WQzggxFBC`}2&h?C$Y<oI0GWsP~z
z#q^Z5S~~wEL{)c|36XrWg1a4pz0XN;(flGPZ<C)QGuKd7ptL~FKFfjFXBb7Yu6t^C
zH!mk+0i)XqP@X*bp`gMqvQ|nKnW~kCSfUGM8KMeL)dh1-Dra^CvR}d|pL|h&qUYVJ
zsSI^dYu~r8;a3MHSw*UHPvM1QQ{DS#DNz)uF7zeX1sdtm52%^)AWJP{R8I?J&o9j{
zi{hW2?=9|}%z_i1vrPr=&0CF-Ll?BGh!)&{yv8j`I#m=6=c5r0>IHZ{*XR-D@8j+m
zR=#05GMzJy4IL^xfF=CY`sdvUuTNeQdrVgVkrJvh1Clv1-f6G<5y#1<eR~9+OvGU;
zn4b2?5I>MlyshaRnM5Ftj=6yooQZnoK>eikXicBTSj-mX3m1l~5?@l#qj@!BIKJ78
zL-YkUB@puA)dybsn%sg9`0*#;y_VqcI}d+Pi5Dp4!5riUFnmy#J>#q*JmO1Y?dMOC
zFEkJYS+?be)h0ITK2b-{!bH%(Ig&bdONzOc7K14YVr8Oom~^NFb5bM6<-_u(Lmsj|
za>ejQ5XTlqcBtWJGAbz~^O3q+x>Ug>*(+)86+{T_5rin`so`X2Qd3T6nkFh4a3wI#
z@1c|yB2z+%w5E^<1ES=`8>w6sNBAvHJ*xnM4@Vg1bPwF5Yocf{cswlYz`PSmrw;8b
z)a3ld^g!1wn*-$oc4^9l7E`Jeg)#%*nlbK*HhQ&B=hCwC%O5tamSQLdWlD}bFYcN#
zfKAK$wOXXNK1Bn<6`j<Qe#1=rFH8g(tLY4p!j}}cK__6f?2=c=iM=3CF%swrqTPF#
z%@sl@(%>B)Bb#+rg=m*X5FutXMg_Q(_H~AIMHn3kzpc1;a=WHd4q<i<8R1gv{E6AZ
z`c|Z5`Iu+;5&@x}wGAPFS)UE33AXq;AQQh=_lVM8I2`g+oxu0~unNW4bO#5)0Bmam
zLF%Qn^q;$MyLNTMiTZ%+t0^}2ZBMpys$T>cxRGZ?AnlERC|4f6arA2Z$~AJMG8=H}
zy0RHwZPxl3QP0Q5HtDVpjjcDi)+<&9h<1Th!fS1!UCiE<mc}Z!IpD_Xq!_Ia?6Cyf
z*Z%m0I}jrT3g4lYk#vOP@Dim#zNY|N*l}l6b{%(4!fu^itO948S*!wKpI+RyRHGWV
zVA0IR2?(IoGOkEaH;g9^xV(-fPI+w}O?-8lM(g;C^&-1adHbo_IMlcec>|2xVYfE(
zGj?%rLu;2pw!Zc%cJV<=W9Jr3bwi5QTGX$|=!~H3Ml+&bsK)w^I<g4pi@+2=9`mpv
z^LM|vzA>I9c^K1CqPmT97|$CUqb%YB5-m32X@JiRM9k;dm%^Y_Bp$ncw{i7O$ypps
zQR03a>R4^&)oH<?tfqGPj7Zju)36%#a>$VJus@Vx6*AsLA&l!EQ$*XJhPN#vEZ_iV
zVVq1(seECKll!LH%oF>q+Kl6fu0z>}w^W8PjvhY=lyUUL7>5a)e@vQjbnlCaaooQ#
z$~rutKK+1k@Oc(Cpvoc}ZA~NNfc?2f2IEA+6T&c-sIBX3jI*PpT*lF@gdE1vEw%Lh
zZBcncZ7M=U0s`W<rU#h>6!wGbm{Sd<w$3pQK5A`{MLdQr#xSl07Yl=6T-)#wMxYzj
zjP*+)mfvGPHOp@u?XU{VuO;j;K@5M?q_$BvoIUhrzk2N5f4#-;9iL6qH{{e~5#rN8
z$dqiOx`;o4`R@4(6eP?XPe>I4^2{=Id?%t(_L^?LIRr{4NBBsHFB=@tXoQkGg9{J_
zX0fJRhLtvF$6r7&q`rm5qRYF0#^V9BEN)z3x_C6ovz|?HidhsivLdmTb+y?z$9}iv
z{n_Hg|623wdUZRS-S@>#M!OA`<@)|jJJ1T}OrByNo5#8keQlHF8n@8p09F(b(97Nx
zwq1IYww70QjTQ~5)jI@U2UsP!z0obwQt*T}es|deyPZ#U!PbZEaZ?PDzqPF~L)Dbk
zVy<X#+%BqsMEVg8O^2e{Qr)DwxT(7Fh)I^B@u(_LNGJ~m!EMqz+W^^tO185b<E>cu
zVMc%=1Pbf-W#*-yhv+IOA!|bTv3myS@U%X9{be6VN#BMg;|XkR6G{!Qf%D36pjmv7
z%JGJxILcAxZIn~dApf{Mr-^uWMPv1>XCfNviI7RkPK)_m8m@QS^;Y6ex9!sPwzeCv
z`W-e}Q_;}Gr7IP#aG8-7%^+M;p!e{4$>6yxgcX5-jE6w9J3+zdo<Iw}D|wRdoduB{
z*s{S-)i%H`L=?-&(5>0?s$VHJ>!_ukB?zi%u>6ZOS5^F8VX=Vm7|T>D_RGE$?(vgL
zPzmg`Mer(czs^YW)uO(C+Sz4A<ZMxN**K^ygE#sR7nI+dK8-)Iks8-ERwqcmms7GM
z^ENtCNAHxehXPPDFJs{f?Cz1jB!_t96H%6F0C(DaT8|-9id1k=MZYvlR6=U()Kqd@
zHn_KjFmRk1Ne?Mb^>=`662y+2y6q5?xqQq9EwED~iO>jPv)G^x!#rPiZFla+d~A2w
zk~l52E2FIK!ekV*qgbb@Wfs^s-R%SQUv43~G5q>b3RPlEZz13iw9~1;yDWd_{qp9<
zkb7tACmZ7j2P|JU0!qAGPFTp+K|7m*U=JmT=IVJ3rp@;6vp!kH^<KQW`e69Ho}KAX
z0~Z<BgXc$Tgyr?Xf3T4n1@l`Iw7%TB+gW{nTttx1*2;9b&F!lVDfdUm6Ym@qjlcC;
z3+HaX*+|}LGaY;P=99R2-5a9q;$QE)(lsbg3if%S1p`FK1buUok<L})9AL@tZ`eEj
zn1r~Z)C|k2#MytrKaG8|>t|+YAk4x^I6!m!RU^4Yw4<@)7#*W0RI~g_eVs{<CO^Ae
z|MhtC%FMELwm4KAjegVl-AIg&6Sy4zKgc=<-ddxsUDvj4+qQOX+qQRY+t{^j+qQP?
zc2}#mPT%i6UviR@%pb5;GLvV_G4AUcOh%Cy7K~lbj2|X%5o<e(Rg6CmcMl5(x?Jye
zrsrzE<Ke}I^ys#)&YO>kxqDWC$FKQPoUl+1a{pXGsUbTCK3@Irv&8pH-VH{GS<o|r
zcc<3_+s^})=VgAYj7Ti)?suQpsZGW2BdZVOPJWu&n4$;-=poz|cYm~wv3PpIp3~fS
zuiJyQ$7sZJ3xvZ>j;L-#4JpdCdq`t9YNoYNr(^bb!l?Gp*K9xoFII&2Gus2u9I3M)
zuV|+E(d@3SfK#uZmHF*f<Xg+wl%THCJb0o(U&nrKXC&ml0H1!}??Ha2?yu+d5mJS1
z+_J~b*~RF-)j7Z5g{Th+RYC3^A731gp&Z_Gf$j*S<+)F@TMDspy<ETEhdY%Lw}?O<
zk1{_Ozt@}H?eyJu5pd7b_+>_Z-8X>a=K?ZRN;IXj>i#irz`zxj+8!+}bAOgx32Wb8
zxl#drb~BBGF@hD`hD{9ja<y7E%mMotX;g;0oF4p45nOQT8j1?kk%<ku$ar1NO1<`-
zO_i_tk7@EQn`k5>5Or}>SVFc6{Lwo*C@KUB=8i&a3@7}b;_KE4MB|18n6-jV6soW@
z7@dH(T3)Y?pxcD+3$#a+TJ+~qh)q8oiMi=TIzbm>uNL2<>HaJJ^YqfOi3R*FilIPc
z3bJS4uO%KlmObK+3#MeM-bVn`G{pxr!LOH<E>bUYzOBdhXVF4s<=WKdRWDR#<5W_K
zgd_0b-b%k$wq8`aZmzBuV|S-7*y(>Z?Y~iV2G<OZG;nkZQ9BTGe=Y+hW!n%YM;g`f
z(bC%`_45;DG-jMk^bpMPhFrVlB=z0jt29P$>G*RE0>cfkNb!HB9c|a0h;xS+nEH7C
zxpn@(r}@A6XPOIg_{#lGLU_H=YfE2H1YEW{dznOKKKvWy#q_e2I&9ct%OC_iuNIXy
zcp>yUA0~aD^ggS2qub1#D9<FJKtSq8nLIuTjJeU~?@}or*ZaQt{MBrTL&FZ)s=d)d
z0+m!RrD6<=sc|{#!LuN#%7H(+fKmE_$;Jie=Jw7i=6rsxd)?y{=Y^m?NWyo68=`hb
zSD!0@^7#X^xSW^&`~E}9;An>bC#Lg9)g^wWt4YDh2F)C_d2yxT;asMhP1bhh+CMrG
z!0X=G*|Oi{entbC&{b;v?wAZHBAwdxIUvX`$aLE(Q@}WJU{LTs5iA~ioKi>w_+1cu
zH++pOz13jHXzwm8*C`A+Ly`qrR{0%vdVdWpE?^3Cc{};s9*|-I=i9wb!2+gH;q73_
zW@Z<w6fjRtp>6H#u^;2&^X;W;s}N3ztV=#Kk{daz(pl&=gkq-Ep|S9+BqW^m^?WgF
zRjO)D)<q7Va+y51Hw~4kZJ-zt-Qd;vD8vBkGDJRLf2d)N&j#>6k}STTf(ebSWn~v<
z#3-jcHuja;3czD}tc7&NHZCMmAd1Y+rj?#IOd8J&kpK;puDD^Bm(uD6LeHBvpk;ba
zUzsM@ZiT&A8CvwS{(zl6<-Vu(uaTkDTdRGz-9nNP#tXu3b$uV(Ao~rjSqtW7Q<hWG
z(IkTqlLRQ6B0}owA-(Y)$Jt3f(&v5Be4Y@_n@!~$$x<;_j*=EcO3348+drvUb8C~T
zN;?zMUh3Jt)|AW^4{9PHRQThHR=SWv%L@K3b9gR{AN4m6&E9u()7UbRq0CR*e!)=K
zFbZLx_F`{-TS%fqQ$LZ5*C7|bgLd!t5Z9}n+zP-N?cP`|b5T`5?v1Lr=p$>sGWQc0
ztHE-N<JoG@$mZPN{l^(ZSeM=Q`(MluLiTEc9*bDSuXa{%(}P<TZ(DdPlE7YUpU5}o
zVr|VY|2n)6+3eY7v$Q=QR}M-T3ozTR8Th|&RQVtHt+ecH`HCtO$nSANCD^gm$_4KE
zPk1p;W)~)glSQA=&^^F(QE%cXHaM-zB{}DinE!w@jON{bzdIo!xQVCMNWxs+NSjj9
zu3jvAXP0NW7=4{DMqY2LJa3iNo#$;s&h8yO4BqpiQS8K{{FP9QT<p*ux4<7NcJic(
z+7onNhF@PTCztgsE5j?kh?2iWLP3=#^FaUm>hz*kex=6MIuR?m{b@jr)|RW4VKx>G
z<*Y`~6|7U0ms7$Iv?wvCN&d$C)+c0giAZGba!O@gp^bbURe~Ddq9&#q!+kPR5s@@5
zV>0uzB-8ypg3{L1?iaZf^L0FZ8k)Z5&$1g#s1&PixW_Hj)9vuQ1L3^|y00tp>e*LP
zz!u)+pE8B>IR|?FS0#dEXjY~I^@+<WdY@tg<toP&P!*}ypF~v`Ve;_X4EOWO0&=Ov
zbAcxsE{5|;@XZYBHFjm!vusaMbyRg4B&{bcR5Uj=-Zh{p+L9}THnufUow8STs9(8&
zM4BB_I*I_;KY;b&$)p>flov&qc}#h<6)^?|-IQSaNk|EUHW+EY7)vm*NU5YXdFgtG
z-s8(B|1Pq&`fcA~Yt!N^lO(w@f28p5P8tacb|VLKf9M-q2P1#egGiKVF6_3)lE-8;
zBFLH(?o7~UmZj+_ZM^jTtx5RBmX>+g!+D|i6@=X8Wp1bX_c@h$w`G*R45aM4{xK62
zDz+Z1ND2dR-Ih^X+X1m(ghOzi4n^eHKbv0NmQe%;_7Dxqp$a`KX@6^Ll{ePzgXUzD
zF0&Qt&-Ko`lobbcUupj^jr_&DShezpQ|XlQ9Gh~962WZs;=4R}wTcG|*h$s5CM!3Z
z&-Fr))NkDm=@p*Km1S4$`4)^$dD_l3`OOLcumukmxX?Av#XKW3tI2^y#fyquW?(M`
z73qf3EQO}mmK9&^4a8MH?P1@y@D&s~*1=tjg+?jCRiXB2e~8F*oo%!#BI7jkW;bFN
zE-N=p=BDeB!0#<28#1C|LENeNVX=^;!r%Bdd6v|g2Wvu9!~S0wAITW{fv;Z&iDgvG
zAzPW*%a(-qAAl+o!_)wW3e{PWS}oOt5mM#Kz4AYc-C2QbXUA6c(&@)CbJui)e#Zp7
zGUV)!O$J?Ugwzn{a1+EKvgR7vX#r4ion~6WoUDP15N4q3$ME`zLXp85bmMaxj6cQ-
zqUS~_;9!jwcI05!Qb-u=Vs}&K;Fgevqh!ws15Y3?jRU*Jk0JC<ybAFkyUeBCyAcu!
z38VukR>TT0kk|W5gACi40Am(}#y#)k2(KkN0l&T6PG|(G|5*m>0<{o^c8cDCf6(B9
zU<d4=CYBR~aRLB6#Wg`9F~uyVsAOJM#RO@~>9F_-s7^65Xu6|`S1kJXL?HH9(s?)7
z<;^c*el-Ptlo;V&Ar<|_fu@v9w0@Hk8oos)15HF2TcZar5Y$Nx-uupWN+GpVIxQx6
zahv&bBx!TyoOrPh<Oa!ocjwkI_E-<zf^y0Qa>(8>n%Jr_$2{A)74b(noxl5W^`HrI
zYRMp4Ym9N+It7mmSo`zK?u~#HIyOu*v3PS{DpDZc+0vG)iSZU5w;r1^>x?j+ad$0_
z(Q9_EHvoHg=n(Me?F!vBhVT=c4s_aDFag~5F;3^0Qz8ZmL<cswzz*MX8Y=x%5o1_v
zp#`M8!b~o<q*NBzLc@GJi&N=<>7fzWv2^o1`stiP_uuV#KiO6%_$c8hR;%4)gU!n%
zs)&je;)Qo>k6)^NbrJ!-(kqWFH`VGXzTfk~=a!=Fs+H&>ysFk$Y-jRT92>=~W(q(7
ztGd)H3^o0U%HYfMvC8y2#i-;-N(~^G(!ihu*{X8XLK;u!?FuyTZVgMCBTDajlxi<k
zXIvGn%~|%zKjTm@jt7du9$b`iZ$m>Op1+$G5L%YgvjeF5*+i9WMN*3Dw^7YdC@J;!
zTFR+6@>;|^RqnETX7sAKPrr54TEcd_>l&qte&zHVx<N3O$rIIqZZ#$B&`81+yds6-
z!g!*;VU*u8N!n9&!p&`1@>I*!C8iDSw8n=GQEip<*)Fungx~V;bv-;T3k;cp+R;9p
zHwxKSDWoHQv^@T6EU-)c?65twPv=Z`<GQ*(Ze(a&c{*aj@s;z_l@Juc`qH5`*z{@l
z$#0too`T_z`A$<x>=Jx^@G$GuiS1f{v%bIb)JeUNp6qB9O~<BQ7=&4Q2AS0^2eiD0
zso62|EGHDHls4p?dzr;H7glm&YE@Z#-K#V7K@G{bETPKU*j|0~uO_Tp?<&o^Bi4^T
zES#nlZ?6t|`b&*0XutSXT-DT`rU&(|rNI`=?KVn0F!WbS=+_Tmu#eWlU!;~Ur9HhC
zTP18BB$i%AabUq)`Pwev%AKicHj!Tau`Qy^qHMCaB_W$?(>Mt&Wh_p8&!<MxxfQzS
zJ{)?@qfXVi6<C6DzE9Jj>98HRmxY{1=wn4B6EON>_6<ZhBV5XZh?OcgVHdCN_Rwab
zS8>nXu9l@X&Pt)qyE-wQl(ro3rI&T<p@grVZ%)PE0I7Mk{)!Ye9Dj6q#$d#&Gw>5B
z(^$-zP)K&Sm)0Z1<pc+o>sWYIZ#Ml*up0nk(!Xw8hpG0mcgfGj>u#&!q<;94sna)1
zf*8!w5LEop<jGE1AsH%CNVA*BmM)M8in3HlCfobykN6S(r4sAYFwH+%5(;q23VT3*
zeW@5HH^a>H*Y9)M{+DU=aYD~UIK=rVf8!OigX3Yeyc4dp6KPqbZwWIVHlb=chbAl)
zt#sl2Z`ukW@{*nyZ5m(63w;n<iDo;AwotVX{FhjVyCtVsg)$VSScPi<Y@uo;Ii#4n
zf9f`(_i}ESi@-7scn5*>NR|^{N@XY~S^^Eb35{WldC4e<&8g$R@KUvvsrRDaHEj0<
zy6|ROA;@IEM~0}kM0v2S&7eD;R66?@AVu>=*OYGwb~@&jyQh4%)vv%7$6qHOPz2bF
zTi8%XYa7lf)aySsYx6WSSqsd3t7U%`@Sx!QgL5QJnFNpERa*{6@@!oRR1C&akF&@(
zDwz$2sdw){m@J1UMqln)D7>)-mk-{s{cFa1La|WHs~rgRDo<r0Tw{twEJqs`q&-40
zLiv`7M<fL8h`ij}c0yzcq=o)0RFulW4T%%6tOr^GqAooz8MPFovPbyUZuZOheU|>`
zO=J}%pF6VTxB)$RVAa=`jKddQ#?mcBy?D+;`o@7Qv4MINQBENIwV1I62q{1PyD7)P
z^mXhJs_fp!KbBi}swmSj5x^ocxGOE^LS@8AuV!81?A6!ei1Q6rMWDl5G70vjx-)mL
zd%rRHzKgYA!Ly)$fN2UGDaaLN5^&ay+Tyx6Zd)AqSWi5U*da=P`xN%tY`5D=JJ>XO
z63t%5vVU~f?oo_L4M0$FUL5_Eq>Wu7ETa(!t0<P2!a(1fWxznueMlo8w<4m?Q>v(^
zLZid8Uf$zcULgr~lydGv>2ag5m<Odg9?rbNukX*U=l_A{^LQfQVKGo@|5KUXX>e|t
z7<z~pRAvXme*fFCB-$svqxvi%SKq|4Tn*`hHaB7lG;E_^fOrJlK|7WNslb$A4@_gX
zM9vpb61N#fmOxF3Ynp8k!!BXDy!NXMsKPmHgG^9NK{Lr(4<pPjLIVM(ERl|0N4jVO
z6N-DHQS(u_dsS&`_(X7q$jPH773z}c^@oGtNB)IjEfmiNNNI8)^Xc9cERywwuxxm)
zEitH4Qce@hs~3g*4%wyuy?IHsyaU8oz<?BAmH?cYKXyMNOI)*T#`WGO557AJ`}!$2
z&3?kuAF`AI`iZU!w>KUg@^9a?A<855-rvctF^w5}avvAx+B_`jmqSJN1>_Qax0f-9
z;xLAXYqj#^QR7Raoz++!EVB}HUwp#nznTgL8B!j@U5RXAq>0r_OIp9ij4{19O-i!q
z#{O``vbPS=Gn@iHN34q2S6kpZ3P3C{MOqy<D$iVjTFGF2it`PEcY?$N^50*;h<%&y
zbaFD^K3){(=gH+C{BT!V#zwSAIM)`;p9I~GLo@Q~G#RN6t&P6d2%Eo_#6;ojg4ok3
z848~YqdRr9$LudB9cwlx(vIz7pj4MK|KxTk`2{>oCb~R?_e-uTVJOyUXH}c#qy$wE
zN!y7)H>jxj$@EZD?+&Us5hlF+oxRSXI;I9SGkW;YCc1YBwA$Iwj%-0W;cM!-db&`l
zLh8A&<4R*1K4S6F#EGt~#y-{2?%}1;YN+6L{4C4cf@MvKhj`DbV<;MbTEt+3Q0y!Z
zWNHX}0rS`M{DRqP-Wl<cU2%F6<z^E7+?W1bgli#jCM3{*d&)pbpUB~#q$`sOo@&c&
zkB`e&8ZC_{aZGr-Maf9m@zECK{gF^_TbyoZPUY@yrO&0s^*YuGx?03_oeH{H5KJDv
z_`KfnW3WLEMi-x{+29q$YJ0^OLbZ$W^PcARh@EkP&~|0Q>J8&8fs$*O5wH|8oI$xp
zy>YuY__YyVywp7$#8H;`0RRWl3Yd8Y2FVw^FWUnwqb|*^7dwKdEBXYn-j@r5O@t4$
zwOBKPuc_ZT#?(D+NfQ%Q*|yle&l}FnHe$_**oZ7bR3Ao%I5@l}>Nip0iBwdsAi4gs
z2AcoHm}CU`9S!{M;?ES)1+$$9`Bo6qRD-x;cDTA-4;t_iWI#r`9~pc)eGd4(^uIqY
z2cs`i%$cM7@2o!S+K~ABIwPv0iK0_paF}AhCM^uG&LHc`M3mx8+e6fi@DZ~&k!yL@
zW$BGy$<oSx`x<g4Ag3B(&mtaop_1xt978@;^fH9qV_^7)?3^Cz2|QPHw;l4+S-o2g
z6H?~ny9rNW_UU@v{6JCF-tU(`To7@u`iWw1Rut<oJQrGmez`CnHBitaaWNB>?AJc+
zuf<Z3BATu|sNx7k(61;>^XMWHOD?*j)rm4ser1`d&@zpifyj4G=7mtI`@PxRfw2bf
zbMw!tqQ_sqeki1J!vG2>1oY03i*sj;<2d9;Y$YKLY|ImObd@?fxkQFh{*UN+2FKXG
z$6)VEP0+j09f^>D*M!=)%;Zb3F|}vKATibOj0+u~FX%^>0*4!~Lbe*_8kW_#ym6UL
zlrLj7@vL(-%}h<%&=mn}P0%y7jM-}0hhpLivqY8XgHjA}Ig0$&hzxZKed|2Si4c4`
zWf)qiyOHlo=>)oDatE)-V$LK~@QDcR1jHIyj^Z8`1gb|>ZS?JQVY&6k9QGr<9p+?c
zWV18=9R0<<?lnBp^(6h7Ua*Z)1DyhM+m>eIssd#1L($vO@IpNSv~RLmWQRXE>gyHz
z>43_4Vo2nZ1<fim%51{9!+VenwG+hoh7I*soRhDLn!DY<TafEp9@((%1rt&_({fz(
z5`5CI=9Iu1_N;;w&^%P#v!N&9l^W%P7Qf$2KOYRv*UB(Y>J$rt0=YC5qEx!T?ZEmf
zp-6}XyBF=wjZLA7u59xr0>s_uHfl@hmbIGn_6JMcA4kdv4MU%Sw!m*qxBKLpB<N;#
z41D}j%<1#FA<sE$*Xjrc?%SC>(j<J5xoOYK0CWZ)Nn&L@X9U4?NNJ&Rl6P^+b{AkL
zQWQ+#$46LpJ43<fDXgA5Sl|=hBh|06o7GgDsRY7IppS>b0jcl68h!8V_+)ww)(s^i
zrW)b6S>CI8=D3X&Q`GSbLA6_6ub-*s<LLxjMhpmIM~J_FMGOW0fiX7%^m<>FR-+Z8
zC`uC$4-eeeIr*oT&b1?+q|5V-)+j^5;tT?9rA6#s-yN?XJJfbB==A;X;VgMiobVK4
zIpxIBXG4hy$5^haDuMMa`nGjO{1{w`tkVm9sVn37NUeb0EWyylWjEwq@f*PD<(%Y`
z%Y2SMgEfqG{nqgj{2?~W*j_58KO%9VQZWPY_K#dY-=zFZI^w6+=7e;Oee{0Yb&8-|
zN3wiQU-LKe`Jx|aC+mBF<7qgsMc2G+d`gGrk}ak+T9^|k15<zy2gpUO3(3DpjCZ_=
z%(_nzN?|b~8>cM(B8n5^2&Ufq>8PhG{3%Gn%~TJXPeD?YcEj+*vjk@onLOV3yo}-X
zMk|&{8V6})!S@ZM=5o^!f-^}F3IMAP>~5pcCF1FLhypKrS^g4y`rRr}^NtN}CmgZj
zXEf^j76RbLSpq4N8pQFn622PTvWtjUU;>LHLvz5&On@-bJjvuQ6d(!Nj$S*lWI8gW
zuJAFXa@P9Rg2K5D%RtEw3A-jzPeST$7SF6X<@o9hY10qNa&pYBS^cUg?m$xdZ+I?~
z|Lyk|4LJx<yjgyLA(v17gAdS!A4*~5aVC!++;=2NkiS_)?Q1~pu8}ULMUB|GiPl3j
z%I!-ZG0kV_tsWQXISxt+j7J#9bX+xeQX!m&@f(A{n9zsM`3kPm?SBtVWYql%OTwB#
zLH_g!Rllk#id|7kgWv1P;w3AW1-6;4mmBz(@o?fj8(Y&hcPXwn0)h%%9j{Tkrt@zJ
znqSCfB{9+DVi?|!D~rKH87~XVYzbZ`E8zL*8Lhis5MfYxIRkA*9$$Q+h~e*{tYkd9
z9VJ$y6-o!vB1=LS8K$+NtRneGIhdSJ>qCc^(qI2cDWb&0g$Q^M>N)3qNd|5DPp~N8
z-b2k&<Wfb}3q2^?Qr&9Ml#)GWBdLClI#O!LW{F9|lYhsL)W)6&A#$YM2WZ*)`3zJL
zY0dL{RYuQyi$;_J73|RIOT@nREi?8uk0gXtfkv*6A6(|}$s_@Q;BJ2Y5jL?r`3W2p
zog(%wr^$Zei`QeeU`KeJGAUwfu@wF}YlKCXyENhuWcJZHcsHk5n^SPYlgdVvL@;n^
z8gKzpO}VM$aMyKJ9e3q&P~Y86@eJg*Zppu#W+1uDgRh6Sk<(MFB-PHCH2oZ)C|T=_
zy)W?t9|`1L2fhiYVm)AnKSx_8*$-uX`=3`kUmz@A$QqKUj@}{x=MB8b*E@svP0@LL
z1t3~BY>Mb<6B9GKE_s57Z;@NX!z2uRd<?r{=Nk5%h=PG=cx0@MbcpAkvPC&nLI&|5
zQy4}pwn1r%E316wr7Q}M$&A7bQpY#DABG`o*-G~$hnstg!DA`H<-2`{vkHs`e`<HR
zWEY+CR0F@O{=FYrQxKt#ylBIByyg0LJnY3OM$M<%soN>J{Vf-y!(j~U<?~(S{|2AI
zJh>7WK+Ia{u&yN+C#$n<^x&wiLq^kcF0m%5!G~uzz`}JT_jHei6jtPq@m-UyXHYpl
zS<vwtw_I#ck}YdL1#+3ICGq!$UO%oA4B?tuv)>b!q}-6uo$#^=Z(WXh=A2wKh~Z!r
zinx*ReU<Q~rpL>of~w6BlLj3FRVtKR#ED)FDQVB+y&$l7`UYfLcoJ%wZUtSCH*EKL
zD0G(_fS|N6d3amlH2q;L*Y!05keDHYicy9gBuy=-7c+d2i(w*WGRtXjvw#;BXsAYi
z(Wv=ygzn(Z;QIP)+oOssKpdBkS_A%2${u=g)`1A%9&_o$q~c^}>_w8~?P__wCgIq(
zw_PuWSv4w@7|GJyhUTc69wm++TJm7E-!G!gpE9;vC>UC5Q^3&&GRVCt`?EBC9NU>a
z3Au-3F4wS5M@31O`e%g}Pi>71p&AN$1_(m4f&d<zewZbP`C?PG{Fq|SQS!jyKG->6
zd<Lo+1-!9(CjE)Wn^x~j0@=S)E)EZY?!J38XCGg$4_@Ciom~c1*}o!gLY$#QWG$Ez
z!L|R!vMsK@;HXv}JtqxCMmN#xR(+3DSgHNR4M(FT&~}t$2J2tz2~mJ`DZ8M&J}`ll
zm*m1U&>t<!Z{zTkH~WuDiqQnA3gaZ6O0=bRe%X$QrJ>~aU01NBUfPz~-ct#~rTGB@
zhg^hT&n(A&5IjsW(C82n10)x*6wT9y+Wn&4UaZNSV0eBX{o-%*aon+7&N%+7bhW<7
z!slpYj5hX_5k}$Vt;lo1ohpB4f~J>ck3eXj$k+M>roWs_rm$9wWC8ut#BG;{vj(b~
zbX?5AGDQVufbBuqzk^%%p_qBW!ujcxm><}=yB_g*CDYm8KpTgqqSLx*Wr9D24LKn+
z)cE6VvOwCz-p@C$92Z<bSzu&Tdr>0hg4`He?X-?%9gu_R+-ZagA@DRwJR3fLbmiuS
zxjuTQtrsKWY8m?x<Hw)$@f3OfeFg@o=HX_=p#Y~SlXg@#)Gp2}?&l|L6W>DwDa5td
zhGkVf0Z@KJHu`>GF$Z_`cs|pi*D>yV8{aPCkJOp{%#Y0BJT7zJEjRf6&0dFpvvt;l
zcx=~7TNUc!ia)w|Y-du+c3!Jp0SA$zhkSxjZXvY@mO<3qYiyaY&X>I2x3}GaG30R-
zPzjOK!WII7Yw<VWk5O(6y7<uOp%8FBw08JBgkIVg>G7tQ#WQtriA;;j6EitdO6^vl
z`QP3Kk8~kwZtpc9uOB0IV4`;T_WxzuRjo};{S!@A<S;%v4^OFu$=i?$Eiz1y83C4F
z$5%1;<$JcJ0GOImrIF+%;l(lXVZQuu1b2lMYq*h!{m-e=Agy<07+npShhmKHDEktL
zC0b@<*mpxr!!^ZH2}jbHR7;-fOB|^)cckJN(halUf5X*_`i%=0)Z=0V^eXdkEN^|_
z)A1+~l$xOYoF;Zw)z+{Rk*T3E{UOo(bGFPEoHs+qXCh;$5smcq)Tc$6vL{?Aq7$Lh
zpE*_Q&^?wp*;F>FzjfK}HTJd_K_+b1+0Dd5n&9I|`he7QD4qL!HT#_7tcClR@4JTh
zn1G>rh)`e?-JBd|t6&urME1y~l1fCuK&NRDEkA!C%jZk-av@u6OC%7-$KI!(Z9}bJ
zE66VoOF1LnWyAz8uaIQYQ)(NaH*DU_umIV3dnLP23RZ@cQ5JBmAX8~Vx`i7}Q*7sx
zc_N?<zG}_SmuL;{JKX7uHQuD$7q7~XM%D1E#qNa%qNt-Lzx$B)bCxL-5%p3v7jm^*
zC8Sqo4Rd3?z7k*yHmySXs=si$ufN+o|7cIyGgNoYSS@@MYudyXpSUK*E_4oBhJUr&
z2F!)=>p0zrdav#LquBjNk-aSE&RK*V!Xuju+6VLe2g5bkNHw4vzY>P5Av>2&psT79
zU09Gxy0sj?_@7OieZoc%m2jn|sIH8poas+reA!{vL?4kgpdSS#9It?9n7*UTe&J0l
zacejP*|(vtbtZcW13jK)1LLHYL7xZIUhRHP%x7TaG;dF>-?hEj6>(z(L{Vwv0O{ES
za;|<mu8DBk{>$QxM-PMl(IwA_?V+8O5<?;PCGV^*geB7*X2bh2IB1IiE>faJdCw#v
z|91HU(=-Cm@OrGnxNdkl3w=!m-P!z<y<`+cZ@psZWgvRaxB^qqoUV;)<d#0T1)nRQ
zANW$w)ZZeNT>%0=Z$|Q}#VS{7nHX-X{NA;ZH65_3y=>yu8gqSw@Fn#TxOKnFNH}_Q
z(T>ay5wnxl*BQu3`M25$(p1=d!vz=f0zNl&hacULw#%0$|6|G5bF4Gvx#(Bj?&RkR
z&^$-cI?=ctb2j_vwZFew)VjpRBz!CI`QWbhe<VlltBjI(0Kq<b=uS{?%3|MAm<l)U
z)&_$P&CSIq14x|l!N>n=E|QZme%vcT(x11dd#NvU?~5W1xsp)Z>{=DOfykghPyc$<
zQPK;;v8)$Sxm*C6F)ue)m)gayyt;wrM^DEEMc`Atynbd^j6K(=kix_Vl^FW`8`s@n
z1k}n<!j#lp$t>+-=9;+O)oS}2({A?_kj8LS#0mjaIhrwDG;=}+kNmn>OMRjVYvVpT
z#d!utVCYOSmt*W`PB*ghBXrNkNsR42LL5jlFZAl4*QsYYOwev&M*XXo8Hx?LMDWgn
zU`M`cR^X0^dX{t*+?J$Ln}(l&xmpTy#o4`AWX)phX)}syY?4C5{u(zK??kIQEe{BV
zp@G`aAW1`dHuZjN@L|@;qFXggoWIbYY)=_$Oh$EtRZOqT5X;%FfA;_Mid=85>tmvz
z?@p@(XN-;4z%M7%C>7q`9&HbEb?)z!H9~221W1jNQ!tf}Z2;pF^!ls|9oXZ@B{Qc+
zd1}e?NVG+9JA|wTdgyH2V*j~rUKfPh+`zr_GM(36h0_6pR~N6E502#n`E?H2h+VqI
zBO%yNgLs6mjJ&BCz}w(K#nfw+OhFb1Wj&B0rZv3!m}y-v<SVDRS5#-u-Dq`x^8Mks
zbn?B8P;0wV^HQCRp0!tVhP)j0NKp5lW&7-TPYYYQ5VM_YR>#@8(WGk4q3(`5sKaSh
zXHm6L-I5D0!Aqi4S6h-B^t@(LHkRF%IG&KCw7F^Dr{idI(>?yazj6^66jGXcX~(8=
z<a@*Ua7aFw{d$XI7tP!#f+`qzst)&YG-=0ieGj}XaVW4O5@40rh_q?<*P?`St6Jgw
zkG(6U2MwDHS%Z*Y@@kY^Cfv1K#@20XD0y1s9|@9f_I}xf9Q?&xBn)>|)4%E;uLt!;
z=^)2Uc2|cEWJ9DpJv~1uFXEDJLm3g3RSuE?yVk0u1#{mG3$-zS2Y*XIOKbUIAE(Q5
z@=AV|xe{;jB*)-wMVJ<tk0oj$6^Sk>)2yhW$YQfpqAA3|`kN%<C(=B&7=V}m<pxyb
z2y@_agmZR+$tEL37M64gZkPrenS#Z*T+iABg59#$m()2*p=w-cuM%FdsZKj~fhtYf
zQ-Jh<mi>wT4xCwus!V2N5Ur<Z??SRPLo}=-)oS#or{Sc%8oU*YqW5pt)5uF!b&<{+
zyR_Jh#4g1sS^RUBx12H&4=?7a2w=*`jD#=tlG<6bL#xJNK~a@C*m{NhWv?+poHi|w
z;J{LbB!0~+UM7On$L~*Npz0BmvC3vXNy&7aIlN8mUbw2PXxS(mhc(`%j$}3LDdelv
zT}&Abq(ARzF-1dMMnNW%!!dJXf!4-Wc|SEetTpx9cVQ>|Uz|M1#L=VuUm;3|dciC_
z&6;(UEP3V<1zEFm|01nR=t&4y4@SE{*rgp1JMgd5nor1mPBjolE#E;<E46rkY9i0b
zpsnz+Z4(8{PNFRMgd>dth|;}PH)NlFqSS*|2lQmMtVVZ$H%x`!G4L@e2BV)|UWd??
zf2PvqMAM7>v*3~ew2CS6Ga=r>Nu9_jCrXA^a2@<Wx1hvAEux40ZVP3P-BjH3vrod#
z{Xn>myMMjQX4l(DQt9y2sbQp&>BtjLi)6UxZo#fuH;$)LA~<+6h&9f14jdkg;${bc
zD8N}j0bD&ZMy9?^pWZwzz5WD2#f!BT`1dBFlsbc2YZ=BeIbLv;I{8x;4*{0LAfkUm
z?qllo#m)A7BuDz>@9D@`lbZ-KTD-AmS=nookajhbP!V$y(QhD#jIR!OKWT5yuytG6
z>A<tnw^e2G9*s-w%pqvHawO_TbZ~s^2vnXVNs6K>5ec4!RVUYZ;dd5Dyp$Q!XKvB*
zz2#Lk<e3>B8jNYG%B!Jbvugo1>B*9*5^I9|I0>a1F~Bm^teoz&aI$!}d82$wl#;CY
zq;-)Ff>{oW)N?YH2;?)jI24jLI3nIgE-UlNQKgbqqJnAZd`2+ET{D&dFwO5VOlxN-
zH5pWdJI5G`KO4O$5Gl08KA4jyx3l5IAoWSj*+O7YP!p+97uMzf5`cwDeBiHKXn2y#
zZBfZbD;gM6TV^;NQ{mO9tBoY<rq0K>4-2oX2818TQC)B09%^O}0#8j20~Ver)<tZ$
zUE~W;dsMhQ$2Vh+5c=&Guz+P@2jc1@G-m%2;=Z4KVM)HgJ0S*az&ZS97528hKoL9S
zX#>uI>!bd9y!s1g&g@>5N7njQ!ZkidwTD=2T`}R2FN2^!oR=39h@v@v;sV#o<E<y(
zLUIfxCmG6ZudS6ai9vLtgJrc7HeB;Ryy55eAW;hNKdx`@Z#On1$iM!Bxqf!sHo%+U
zdI#D8&V(;qW&<BbCPaWo!!`WdTMhjV7P!vk#ZV^#$#U_^zX;|x#oURqr_4CGI&540
z`>}Cl;Sfey<tdUcad){gCLSKgP$z8p0<ptMHe%1Dje(x+BR;~D+T{?epspq{W#UxM
z*5jXV|9{BUSl{lzj_Q*^06lCzSIP$xFqC}>6r}$n;VKX_wiE?%8-qL?<}!Ld`kVL1
z)(#5(dZXFm9+FfNA79YF-4!OJctqrPk55o_*^yIY+Sr)nZ)t7$?^s1=B)6r?22ERj
zew)=(O`h_{gZ6y~vCnUb$lloU&FXgxP450w@6-`m_4mGoJ)#}5jUkD784NLWv*$nr
zCQNZ$$Tw#3iUV4Bx;se^4K#R!J^UyI@VmlaSYXsLY;Y8#gDD{6uEOT59lDE=&m+dS
zM=yr84zk*frbc1bsSF!{o7ka2d|9%hNP%VeZ;E&R!A9KLhV>amN)f-TpD~84TOt1o
z99?Rla~h2$zP|TJ0dZx-WZI;PqnqX!wSbD)ez1x&7kQREm*h|4HP*==h;W2iM)X}+
zcDS*zD*O(Lsg&fpz2rQwztUO2ImqtnascxthtPiG_ABk7wnkI}*i&*X0|f~6TZ963
z8c|~;v5t(nd4uC$CaIoM#f6^wKS+=QLQ5b$f*fdrn<9y-97~j4VE%B)46*_W;S9zB
zIMgL66#%q?Q%%TgtjHOW(Q`C;@uVo##lV_4lvphSh|Vv(bv>Fm+bL1n#V2s%z}*w_
z!WJTo$dcn0O$vDAPVER=Oc1Ltff^QQx;)QxlQG{n)1QYtJlOTI^a{{IJvQJHJptFp
zsnG92Cjmhv8Txsp;aF!5J)ej*qib9S>XZ#>SgxUuCH!tVLAVdnhLk>t<mCr9XVG-E
zGo|inH6}0-N5JI*@Hg}dKV~xSwSD#e2vu4SfiEd8tj@ZIep>CCS#J>hr6(ts8%PV)
z-i4CC7k1ZHHu}dwn5?xWUI}%c^G9A8(+GSznLJwkZiPRmjeOAa&Yi|rAX#wrymbXV
z!o3(&c27(C^4!Z#Z(lCf2$rx!JXy9)?g5u7Ko&zHlu1=01UnX|M=pPkXfycF2)f!l
zA5jx$1{#@&Nl|B9{2!A?#(n~C=`pQY-0`VVGdZJQcR22+#@V?-<h$a;G>@P|#e%HA
z5t2WpI#Y9wb~35q4l_ee4u{b|znRhEE?`Gi-s(+^&ZY5;za@MW&W7-e?dv&rE*|Wl
z#)!l0$O9bUU_sg^Oi2jvL!v$TvLd{v$>u=b0jkWbiF1Iz_wU>5aQ!bo8;n&;TLTP<
z!oOo~f;PNIy<4fKDpL&kOP<YO4#D86;m-7q5{5V2U0r=N3vqtENN}lCa)D+?(cmfE
zMCsm{$WBUY#)(3A`ys#|zQgEm{6iwG=Z{2Y;!=q7JWkT*?m@%3i(p2_D`o~CYU9o-
z4+WPek{~<=(NBe)>#H&FMxUpN{3fVuK-xYf7e&}pB=L&ibxHt8%o8_3aH<LCqGev$
z0-8|IxI<*{K`ap0-6cK(%+`3@iKF6U0XqYMmIjNu9-A^cwgqx8>qZjQ%5i~A`>ub{
z%XqXHP)%&vj+c!qEw{qAa}PTGj24j=ROg0s6aRu66@rsUe&DjJ+{9F(=;=MgxAkZ6
z$R$+Y4f*`h%`3Csms{^#n4o4_VYP**O!WD&m9|i~aC1a+V8lrc$B=GaygC{Wvnn#j
z9ap48G1_dEl}E7)7QeDFnf)qNYSb#$wqp|^T`RweHsxKCA1kfl@V2c?`1elvOXbx1
zT41HIkR8vpmFrFZy?unRlJcja{dPpgHOy4-HdD^naSD9Zw@5=P%q6h2xtFkU5AmhU
zHyImhgNbpv1*hwBKM^lUaU(6a;ZcLbN$^}p(Y<jgaes0=$?Ao|aqxBn@8dQlpZbq}
zRigwU+qxrb5}~Szo}}G#s&~XHEk>)EyEc_(-S>n}2K^?ia)VO(ks_t?g$K?0F_hOx
zgH9)RSn&3ZT0xu*Qv}~J>oMYE2p`<HaY)i_!YdZ#nd=1ha!*=S(8l$z5H4TFU#PlR
zE~QLlZ*1!*Bc~5hekx|;Kv1z!(9{FHMt11<%YS9Qt&Er5HA)}M^z*0O4u*IL(X((3
zxi8C@1X^l?x->|jl~i4lz(opm+SDA(T9Uu2dAD~i5`jETd&VlYkepQ)GZ+-ib6m*?
z)NV-#F&d&&+?i~$6<>_VU)|7@geu%TTWXhdp4YE1aB5HbEWJFUQ&GbFq6dwhTgvAr
zYg;=&IQyc}aii5aSd#M0+nnO9Qc&7M_iI5r0l>@KYn!<E{~Q37sD7txm4lp0_uk_W
znRPS@G~xbLk@VmqnjNNS;P$oJy?bI0Jtela3@#_Z?r3~>JzGcFJew|Q?%vJhmsXXk
z&|TLl(pUUSa`SRm5*_Za*-90=lRq)FN?>YB9h}hp#5wgMq4_gw+O4hE^CjII=nIVs
z;1a0v)dT7aNT-@vt&&foPQxdL8}p&npC92`QdWded3LV^Qr{@IKACpO%wdYU*)U*k
zYfM|~O4?cvud@*%kg6?Me^qa=n@Imi>FH8C!HsYI{38jL;5NtDZp$6Gc3G04B>3*M
z1SdG1NC!k(JYG;zGm()@S(C?VLQ~KSJ9y&&2OdY<?6Xg@DvXzh7Q-d3gkonXBj+2U
zIU0DE>ei$1v%ZK_s|O`$C#sRT{F&V-`eqZxoUr}4O3R31a=pWp1ojnKW0Qkf&~fL^
zLnMY1`Xy=nFW4be;w>fe-PbZJ7s4aRYQ6<T4q5+NU)=A8?I_#1LoPY|F4U)#rKLu-
zMj|7ZI}KddjKYRIf2rY_B3f&H4Azd3Pj*b&HXrp?zdlf&4B{P4N8Il@^{V@oH~YBh
zd=xepc*KW3k2;`>?75awfcDtgD5^7<#Qa#HsKDE8B8V=%lKmj20y-j;6Wfgo)eoP5
zXP;|vdnQ6Z2Tv|5pi37xI5}=#FbRf~B*+iZ6F)HdHg@SJPTuJG(90`$wZo4e%?w@)
zr07Bv6zdX^@YhK6WJ}mPsPg@0ViB3W?|MC*(klusNb@)uHR5&R#w#BiJee^+eV$UD
z@^*CmxF193i?TgXq@sN-Als$#wMYECOtI?d8Q?}k1&V+=LcN3W5nEQ2MM$gIN9r?+
zPT>+~u_r&YiZ`x$hkYIy9%v>5$wXEE?AR}&MgwQ1d;@fIO8o}PgJE~$5zfM*1}Vfr
zr*Da5X;G&#MDN5=Ogf_;M74-aeoo|u!W{Z+$b?9%f-FKu_QH`&Pu{<Spuo8C)0HGZ
z7%OLZR;I=V?I%*JUXVjqa@Ed+rqrt}2DU1d9=h*O85>7LUVtwlU`7~)#XvA{c+46q
zk;i1IhDwQHkscXDR8zr6T|sEFfV}ra_8#V?$NJU9!ZL$U_#dsSdyyf8)I@#?TX(20
zuXm|{%Rb@KFP=_Sni0E9vUFmm=n837k%u_mD5VmLQc7lJ0s245SoBwv46Ko+{)^U?
zsJUr(r&n#ePxFLDy)Uw%<2@@Gn+sn)+%tU1nr)H+NmPKKWa`qMV_|nHPh^wDDW)l1
zF#fFDKvs0|%&6c}>E!-C)uz`))5qO~6WPS>N;8;uqj-h?wYn;*xau=RjWTi8qyKk(
zVH7nW+0xZDB}ep*18rg!E0R+q+8xL-Pskhl9<F|z+KylhS+v&ro{*z>I#6Ijk+}I&
zhjcE+aHtWYrGRo7qL!1FLm(rDw?VU(aZ_-$k9Z*kUN^$s&{};<tFZ?{SRxOo&;1-e
zBonUtIP&T*hYv8VI{yBPko%ygKr~F|!Oq0AKIB#CVTAdaGtDp}k~w6aIh-=FU}9av
zk_}2xDalaD4ytE|S6I{ZsWt4QBC5GfF%d5U1=9qDzsgc)1q-uv;cjbU`@ohuOVpEW
z8mwp2mqntNpR~E=r;xH2ry?Xf2sA4(h|+XJOz>k`%|XYBs(X@i2*07Z5d_u2$RMzl
zX{0P!9dU<v<6>4`9*AIRKYhv*;P``pUU;rUN_#uoQ(rjtpdg=uIfLv7<2zY01K>?)
zzuN=K$YwADPmB^`5tu-cfpZEwOoaVyo{I)NFxKh1gpvj&;WvVi@NAqKX^ms7e1^fC
zVvH=nZGD|-A7QNN<EEV~PLjPZ)75#HDBv6s7V|6afr`!gHB7h$`5G1#L2eIJ9(G-o
z0Y8SUt8g5rE*fj)=?QslZYP$FPonEeLuv#GBENZyamCN=s#-1?v}a)$f{!@()^ZW0
z9t=eY+}>1LLd=w)Nr?5<+Y+8M7%eg&zemMA9U3%<Rrp_rATf7S%1%MKLa<jOH9Xge
zy*jLGt!w>1&@GTjTQ;O0nSo3^XvKE(Z-u)a{GoHxy*koZs2k#$gJ0(J;%lVJCdh6a
zH(%!K!eAKtsU9J;OB)jZaDabp@PdG7D3PY!w!tN+mw7SuwJ}kPGolYC+v?u27lJ%7
zTQ5UWfpVCt_a>7}G=oW)Diy_0)bQaJ>29#t5<?5}8wmbtMCg4nzg<Rs{y-NSN==kB
z+sikW!U#Vlw+c#{H!(~g>rj+HnNr=CjmAR(ajH1{SyM=r9^eh{mBD^WJ7*CM8d5?w
z&=w8vTUI+ordZSIgEZpc^$iRXf|v0T>-CnkpjlL&F&b*M*RnTWon+hC=0Ljd`_3uB
znMFBB;Wu{2j~N1F1QnYq4bDIxLldg%Pd(u)bxrcZGC)6DB`+a@<x%9w1{C)&qyO8~
zblCXHW=ChN@|!d2FAriAPOa+OeX&aBohD<W{g*6eEOy=>eIbZFF#urOQZN&DF8IT>
zA0EDVi67m<0X(D+QV1fGKJYRWCibaLP{^DdQA8*v53YAj&h+U}X4@2enUCsPDlPE(
z4lKIa1pg}*^#%L9Mnccn<Y}>V^ZpBT>jGXje8rF+sfU_hYbFNW({eaM<4;bUwWNUB
zJV)H!iVyN3{(Untx~r-?$3tR{lwR^S7<XP;GnVXoCrh(O28eAdSbCICVqH7TtvG0f
zj+nawS_tnkFyKzFxAC1BtP5mDBujV{H%sW5mJnSpilp~dIBC@kfJMroU^q@Ep=fzT
zwN(|*<A{qd+FXLLMfH`#wq2l1;91lCf<^UV&oI_M^)<A}Kw-xuu5j0H8bV@%Ksm)-
zPkURHuVA8ztpGK)#jZ8oSsO_p7UF#A!orP`R2MN<!uEO5_JVkqtt_X-x1&U1B4K$l
zg0$#OTQcFS))!BM^=zocbOBp4mj47I03CWZazN>^BMVQbuzV~hVm{;)x0b*&^76G3
z{d@6Ii~_ytt3%lY6Fdf;(Wgq}feGf={R}MsJ9Ur`;}5Sf6IjCmUeZ&|*{wv**B7jW
zyzgfW%1w+%1gW?sLe;x;awuNN&0xCS;HWQJD=wjmlWXtB{u0jHINs@)xsgK>8`?r`
zX(Wtn{E!+Q@TiS!*7(o~f$Mx8MHM?iUx9gZD!FzLn+!xj>%d3%zTm}|Vui0GW-LBd
zqtUKA`m*(u`fHPp{bXwi<rEUVHqRC_c*ywmANsTCXVgGp180+FkNi6H-Vj$w-R$DN
zZXZaL@RD68ddg6k1XU`fRp~e5^-Bz++$UcW7fF(6Rm4z($mA&%1K~5%ZijhOBHnB*
zxDVsvG(wS~K2q*_k8kYnKL255@TihxZkNa5A}Rrp5~d}->2p$)p#Ar^nY{D0Z<sLI
zxmh{^|LOFQ9pAU6KNG|rC&V~ujwq?frWfe7DK5@FdZ0POq_q+xeoUQW1fV>4?&+X+
zn1~R9e8W{Ydqkdn<#i~ej_DDOvcR&RA_$1^Oj)bubmW|+D=h1shLS(|dAlKwXjVA5
zUzHl~r3|i*ATlI0GInYMMC8V2=-ge(lhEeWp;B_vJQH-}<5F45aoSbmxF-3XjDq?y
zkvf0OF!noP|0cp{Nhg`0=J8Npy)>Yamy;mXG#L^$iq^iHREW{Vd1Xw?s!pj%AEPWC
zTe4UESm#s<y1^?h{r5I2Rj36{KAB!r$FAKk{o012%3$IfGNdWqAl@Ln-`^o@!J}sj
zwA+(jjT3p)&6vXzqBf)Y>mMta-`H~nkKVbHYvcxT`guR*97mv(?7su^gYFO#{nnzb
zvrsCFn9)W~#6@@o#8Hq-h)};Jaak%=W2`^=AOjziVLhRB)}FV>Vkc?54~xNJ1pG(%
z??(<Mux01P;c`pDe-T5>HOc%ieeJSXL4p4TY3q@ui@9-j*K>LrJOTTITTO4Ns4`>v
z3_EWZgZGp_fMNSCzHqiR7TQM2$M3JBAK$Y~+brGxaJ0x%XwE0F2yy|g@t^HJPCqaB
z3h(n=&QUQq`W@fipK*P;-yPc*L=FME?I%?s=-{42O%9<r;MQd7*LBBfZ`jKc8OOp8
zZ688%4cc{m<n;w4h6vs5W+Xci9VX6@)d%fVJLmgTCc@+QyGXinJ|LX`{{(HKQ)&$H
zb2cuZBHI%RFm(81Pu)TtyYSD`<GWk#ane2oNz?ym@q5CJ&^hAFeVaHtzF0!?_Xd=|
z-;jW%@jR!GWQX74;_&O(cXNjQcaFzM_|^4(=WUP=1o%jND+%X=TEWg3R)R&xNgbGy
zsc(5e^ok%cMk*!vvl@c5XhE2otlbZLoOhSS86CPHTfsgYd6*imK@c`#r(qMR<UcDq
zNk0|#Phe%GtEvQaV@v|+-&MNDC~nGDY8cVyi{S!u9tZ`QPyb%9U7pI6XKIg!`R_x?
zr%LQRS$U8~$@1j8v@pm)`}2F#o%QNAYtdSk9byIdBT{_8n4TpFp!g*!XNquK#bmC5
z`e+8Gid5fcYsIK}*(^>ai78x3)$6LKq*kj`1x?$hCGb;@J)iNQG>fO+T`tup3r0v;
zRIS>~ozIHSUOHpW(r7rQLmihx;N7-Mb-t9nAjXVIu>4BCC53}Zi{2_mwPsz?sswG7
zad7`Ztgw^L#J;z2a8PO`l4wPL%#=xvJcA0am>M-ydnqOH+@^{ewgPjB(Y^nP|Np^$
z-~MZWk)d8md6HdLCYwR&fT=8+NFkpzsVad7Z}QI&uQBkFWj}<SyKcA0Wgz5@h6=xC
z&g~B(-m6Etaf>fyXEJRTebGpGyNoMD-0|`ViEy~d<3%}I>FR6-k9Xq-yeE_=r_hqy
zrLq-I`Zwa4c4En+yi~SnOMT!|Dqn*^^?p){qL{w56(`<cY%Ei#e7UG?tek+c9*O-{
z?eoEDdV^)0W$4Y&`|}NQe@$bawMfdL9)0FT983B1*c3pCrvf_X2^k_&1~6XHHbdFn
zp-o5>11qs$PDbQHuYVPXd^7I(WR}zgNxL*W5|DBZQUp-sP-q;u;(SZ!GA<I5CeQ58
zX<Id2#lBlL#V)oI4VY8u_<Rp~@cCfs@Vzz*cu)3xP8@_DE*91U7cAHxd$wPePGa(s
z3nj#zEt>dvTihS*^j-+?Ar7P%;EiFv9slG9PWvoMW4!)tu~KfWS2@lNKc8hDwz{D!
z?@H>Xv;Bh)K5`@RZp#-<*YB9?t@3xXxLu_Kkl;gN_X<v8ZPf0H(nTO?_udr?yg%oi
z*5L#u3LEDV;b#A}#_&cV#WI<sfVU3lj077P>%&F5<sIjJzW;#B&3c<IgPr@<?(=E-
zIwS{zCQipZ0WjvgPy7U&sHF#~vz3yuPfl{_`FQ267l@|XnPD1%<<AB+y&lQ+(J5+u
zjDOQkY5zDa(ij{9+HERZ5!?O=@PIgF#?Q>t6(KSEe4X4qSoDH=>Gpk?o&nkbw}iy{
zFF&NQ#%{xZNMjyK;Ii%OxMc@jk2yh2o;MO8BgQyJeIKw-$UpS{zXx)D7w+2{+2c4A
zm1yKKOwXatZ*+SRPjpfLuI0&4yZK0oQGXOE9SI=cYcf3n#zjKaIdCbv?sc<XanoB?
z;kCbtz;@7$s|BF8rX17^D2F3$i_M84l~#wAKziKBXx;+viONPtN_0m46MLQE;yVX4
z2{o)faPAf8*^B8hNvF-?xz+Jg=){Ui3O<F{_d4?H4OHsT9+IZQYFB_KrRwOh!;<9$
zUqlz|eq(C?`QGaBFY&u_`iu^z-5=R7FEkah-$0n?MbKuEzeARoa1gl_4hdaAL|WZu
zIMU~`IF3zx|37@)V|Qh1yEf|BHY&EAs#I*-wr$(CZB=aBW+fHdw)dE`*0bJst=;x+
zt^a`Wq4&{lT<3XQBaeXtY5C2_oC+~t<lj-!dHkTD6f`!>3*!rpNlj|HmPr!66+f)a
z&{DuP|6W*%0>LNEHyjQn*E|0|LdGb})QsKrxP@nq?;)>~A><w1u6Wj%HevZY2dJhu
z>pLer_Um2V0__sE-yd$C>u6ezLGnyFmdfOuVaoVoxcg`^>@9^QTQ2D~eTHQ!Y<<+y
z-iEVzBSsKW%3j+NU!TPSB^OLF4rP{(4$EGl><Z02{Qy}!!VDs?7Hm^G{^dAHT>%Q4
zEp3l?HZzlB&-vQ<WjFHPEItj4DJ+7%$7g)Ul(_S3DUuw__QbOL{x1~27AE<`xgTIo
z6L9%qR+Ry3yo>0{?1WUc12pm3zgBD{uB;2%#l+w}cRc~c4o@eQuclA_pg%aeuHO;B
z!@uW)H37>Ryz*D|y#M;O(spQTQT(8Ae0}Uu;K&&?D1`$3%^oUtphjmM4m-+Pg+#%W
zTM^xQXD*3Fl7V+b65+cFAP&F@G#J%-P4+4dlEL%{IacTxvzu)3)u<<J^g<~U9HA!m
zj5cm5O>b*bu1`1NZsBH2TT<pGJjFZUg+E(x!>r^eo8?O26CvssTL0XVT1$Nt+v@08
ztDVDB&fC*;zc1>}2Qf;>lzMU(=810CULi!IbwNm!#0zF!@Wuo%N3$Gt@$mFD><Caq
z<@M)F@q?}s-Ad(#+D&P2joFN{IXu1)Gp&%&;BS7$=~f3iu6w&6!+*v61iH8;4eH>C
zeFjd<Dqk}VZk94I0i{h}S%3M%6ut^wYY;Fpa}IS$8T981czS#-_68y9Tr7>Iw?UYh
zy}fE+N|UH>p&lNE3!W}c3^rjydOdM=KkH`FD8D&7%l^rx?d~Y>UdV}K3DI|NlSdK}
zG2I6I7t*Q>nk{blv6?2<c7YpV3`NddgE&p$=tA2sukYTaozH}3C)9Wi;UNQ=O|Oke
zMspRhnoVQ|nVx|5@dhXl=eaJI!q2^~SBj$z#owp8ctc)GyT5(ayKh|^ycZxON?u(g
zsgq9v%Rptoi38BYXTkIWIL}pNdkdrlcFh@a2-y5kr7z6jSimV&9Vox=t?JbYaET<%
z$E(mby}#2H|27h&tdKS-z*1?Wnncr7wq@SyP(jaHf2`*~r>H{L7#mxc#4YDAuS2K&
z$@yDuNt|=7u|9E4{=0&!gh;zI6<6@8jUu(|SW=67F$KYV#8S>OQ%wQSd_X1JteA#<
zm&;Fa9!;Xg1UTOOhlK2RIVci%MNlvHS?P*7ypN_t3l6ts1zz*!?h+gc$vgj_5>1_S
zi^mq?LD>1Wp;9YF*p`Iohn|#t^y|AiW<6GF8B;061E&(|`ChI3get^Guozc$NW;-a
zdIm84Vtt)DP0~O2{jK<G2jXuu!$4ZDVSepn#&QXV6YWp07Y*N}WxML)fkpy*T@>x2
zICvJ+QwGuC%$9}Cb=vv!9Ro7SUn#;u|D8!bp!=Ii9xK_dUy}gkh@VSTA;V74l(kd<
zwN(k&hZwJ}lR)T1;jag+Q4%aAt1+YG_g^146arTex>5-OpYOvXr3V~pf<rWe3{%VQ
zD*NmJmkEcHjNBrs9-|bwfTSc#`zmgvp+E@@H~8uY(1C=HQLCx32yG}~5b)Z`Zh&pY
zSeSRR$8!Xkan@^wt;5{}afCJ-5vnmfG^wX`AWB(B0f#YO4OKwph4kW?T&BNmpWH^m
z9^x0QD0$Q*U&D56i3J#}G2|Olb|&jlNeqzxA%w4MLYBcNUb>Gr3UegpOCiektI^EJ
z2gCOunQ<eTc?KV4`3&DjvmJ!BB-DAPh{JY@ifAwFe~JL7581YQ-kq*Kz_YC$p3;fl
zTKCNCeyaX~pYU~QOGlhAqn)Jl2dsuCsEj(jem+1^H|tCCV^{JFZ?VfKR_7VZ(C3!6
z?f{4ouuP@)4vD-3G#sP>@&$(p(?kX#oj!LIPuI#%@1T@2zlh9=Uyn>Nb+kr0IWAJB
zTLPsgmS8X6-RJDriPJP^a_nO_3Pq4jB7sH40<mPfRo{+sfA_p<<ellt&Cf-k={FSt
zH3Go`T`z25<O1~XQzvn{uAjiJ+zNZTS~^I=UXat>T3$@Bha%5(Y5Pcm63w>0ueh7H
zmlq7Z9#KW0>ca8?^dNklp|CSE^gg#>3&F4H11jYLX>Tr$zmMe}KAo?IyH$1YU8*;}
z>)wa*+N{Q}cRNaYEaY%&x}2}JJBoVTHA{E6+OERWLQ;XqhJsQEjl^N=4{6%t1$Ezt
z@kkag>Cd}01<}b1RzE#F3V$Dzf6^=9ZPyRP=Cl5;e)^ocg)XuB(Kq+|#-jWO2LpG(
zw`T4M5Id~=G(}pq00j<fv<5IFRZG4`qao2vxP1QQCj7+pOcgRiP!ptS-yW~$U!Iq4
zA4^UjH;(c8$P3BECoB`X8@-||qqjnTgy$O|%!#GagMCSjkm2G?-gk(=8HYmQ;JBg)
zL1p*cpVt$?i?*|1iRZE*72*BY!WnfT!k?}oCVpS*nH5>1`0|~p#2+~*5m7$Gqrt8_
z7+*43^qV|V4tfeqm-7|BfD3&}cw#{F)&})*tY)aGaxI!vDQyzeq2q_jo_t8CfO4>t
za(|ah4rlcvgv=o$FJW(epxx@<4_YO+9!Z!J$S4WSvkd7lB9}yA2%kg~vtt4dTiWa`
zHxNJ6PbW#3WvF6cW+*RXx(sV+w2l<h2=)5uT^|8emZIZakSz>(F+hqB3i%y00XIy#
zLCh9%?35Y-DIRX7C(O->8(Ik-<pd9`rh+LHM%iANLU9VSq|5{tZ#{KHGl0K0_$h^N
zEJm_#s9UM`ouUZK!2w23WDu(|B7EhC<Z2ofUtB-yrMHl7x;#osaGYi7>5Zw^))3(k
z@+Ky-N6tu9+Nuh5GV<75YM<cvx6yrALIxx?|8qMmP10>fdzii&A}KESOgiapGGRxc
zov*AYtd@3a33MKwl+|_+azy09k;87irLrBrl%hpY4ueIK-?}TQp)0R@*ydzh6F}E<
z+S0%Igs7}3K&guwg0M>rYxFLwhZTEkhh)}Gpj3{j4J-NR!H#SB$QdbUNX3aK8X`ZB
z^9lr2(+nb*nAFP-i2TWw3_3Ba`$5tktgK+7E#5rP$fcve7e)<Ys+D$~u=<YA#N&|r
zY;jge#qdS#Z440I6~^ohr-`_%jGWd^kER*p*%wZ5kfls#If??8sA4*;h$YtIMEC^<
z7|nl3{DyNf9^`pL%O*%d_*tthGG-DA*J$2Lk`r}ml)hh4kxcDIP}>>CI>3YmzNjLZ
z##}RiYh8c(+#C9WMaGR|%y^CdOTaNUT|ctwe)*Q2+yfSl2N)4si#Y9<hW)YVEgi1l
zLBG<p?u!ZM4lYn1h1b9QWP*MMTrNc7zGj^tjHkjs{G`BnN~ysRR|beQRicR|F=mFo
zq8@NCy3U46Z@;{3*D-*fTp;gF_cmj=!vJgNhRX^M1d@@VGR3u)T-O49XX`7Zii^U8
zcWZVc>fwFoN?hDZfu7a%*Mh4qiATcAqOB*r<;Y3I%~e#Nn}9PW?^L#_#*;d^9uJ4+
zLylCPR4Yyz1D>%mAFhG?n^TQGt}3N(fXA!0&78`k2wWWmjcHz?h;(1y3v>S2{|h?U
z(2&DtCruT%p~feCNitpTUxX4nRfz?z#dNW$dp|)p*Vz0dr^~RQf+ui%YhX63%N2xg
zFPgT52Po^Ixc}&R;Z^@lFT+fXxA>#2idxdo1kdx@=ia&tJ!b?43O}z^Bg_1S2uZ;u
z=;e1s3OS8BMR6V#t9J;+AcA%L84TgZg??WrM1UI;+lbgRHaf_O-DC>+Nj``s?0VVw
zP7oKdYA<c|Wmur9jH21K(NKokvo=a?etq=0yELWL;CLWtT$8n39)rag28PmF0uQQ>
z-zdR3(tGd`mp>_0`n2r!#5-7tkPsIW@&1g41vNN#wltgR=;jBG{9P0Opze%Tkan(n
z+^zrzysNl`WF;$HXnK%VMAPI<eanq<7TP19YoQS4>vx?rk(IKd^i!C-Z^l1>_F(L6
zKmSoEKmSoEr*cy;f9~dX^Jvo+i!ZCmrR;EoIP<LQYeM4boTjmk8TM?YSClwT14~&g
z90-t0nxLv)$vsdnA(Z2H1*obAq5UzZ;`A!b(koV^M#Lq|EnX^uvAC7T0n5tPk!t1D
zR}NWp$iiSb_nc-)X5CTc$bn=>i1RuO@8i!{)0A|RoIMM4sZwsPtHO4MB3kT!Po>Tj
zpR-;I#GPFLWv#|sQuLpIEl3&Jk0*+sWKu^4uTV8vE#ags*R2BH>;mUZ6N_Wr9&=;P
zXe5n*<JL;P7xScz_633UQ%Hdy<)udA)1k~N`7r|7k)*~Yj}mpw#te(EE-cx+`wvDr
zu#9EW)46qTKyR08Qg=f320P#R4@QZTZ1-o+(F3Agzi-=G@+l1`Gr^+)fKgU305Hnt
z%2V`jCIk@xjFOCNt{rofdY0HFiH30g6zh|Oic_Gp+tkw>78=ZM@Ba3CE-il7)wFDf
ztL%tD9RhmkGtl7gA4f>~<Dt=p4S|n94spsBDRy9Zo_tJxUlW9KnlduAT)JVu2g>{e
zGuNx~WZ%iT{A0HR0<U?&Z)6$Y1)#GYS!+2U@63g%X{(iQ^S(sVsk?#EOM@5*zM=sW
zPp1fHgu0U$<c{?W1Z|A*gQr4mGD*ZjoOx7JJ3eCPsht%EQr5a`E<g$zaRjpyW-PuH
zG=g2RNYl7w6gin)GSP$BMNWAnQz;>o^ZdoFr{}v<JAK#D9Fht<0qc2clq35|txyc6
zaThijzmgyAemuT$pA{=GL-_T5D-s!#7UIDT12rvAfB@52ai|mAXS%H{Jq}SoOu_nP
zRM3Sy@dqUulQlpYTLnx)WL3VudT>c;M5t-t9NYf%K8V(C{;JBz0Xa7*#@vNeWYZ(E
z8TVlL8_e(Yf}QXn`HpW5Ie{>3bK&tbyd|+tPU)r4X~PPV*bJAQLW)8)2ZQ55*Mnm2
zR24)VUJ$$Y*VyL`WK=ev=k9M8EnVW_@`t$&cE}=jc#IxHuE5(#VQq{&cj^gb5q(yD
zV`vMT1^KKv4ZhEf`eMc)y0ODv+fu^e4NO1q_72pi4kZ(pd_^-^gl{WSsF0j*gYK@r
zN?V-)_ZBKb92o}oG>3c<1*Fx*WWVw7@RA){YNw$68V7o{L=Me~esC71+3Tq>KVT0F
z5puHt&f{W|NRoqCl3g3+*Ox2km13`llEOybMlDVy0VOOS;V!=@+;~*fz1Ns2(hA-J
zHUrp10$<N+lhYu&ShXFj(Db?7>J!m5%IlsGHgcZ_VH{M<_YddGkS~~-NJIHqx%7US
z{6t}N*I)R~x=)LJy}R^J=~{=|U9IP?@W=cBWwcL2v>rk~_PmiiX&t&6vF?}8NhZOk
z;aSe#xvC1$mxYW~rB%zOpu}3~z!Uy`LzH>kr;uP#E<sW_SjQPOqQLknolIn$gXm#P
zH8<x<pJ(6|%^craKu&EL(OSjKR19+c>}gc>6fit8o-H9{%?+Kw{?xZnEZUOC`Bc*?
zdKrEl*Xo8yg^<r$iiqnZXn81DPw~s_;81p+k?|cQQ-V_fF{wVCsdB8S&EjV|Lw%Z0
z?Kg^)T@C&hw*V5l^J%4J5sCN1EA7ZqdNQ8|bVbu2#h>Ptc~g24Z<;l`MAEO<F)c)2
zK@8&mL!>k}qCw(4SD-@Lf#<A%mu@)&h?FHDw_NiZi0U{0N2KhA|5v1xHV=aE`w0ij
z$Qq9)lue3!`;SOjV_JXuk4PEzSEPh4z-GN%{zs&|_nLG<-pv4rln%ghGjD%IN;vg=
zK6)3Ynq}($h?GJ=+;>5jX$}fJKDnC(1K(t>|0Lo0>0I3gjqIsILz+<aJ)dgvXiBPQ
zvvvfKPtMLM6gf0?smduYI?lm}y5`9imPz>LTRe&BwP3*)p!e#7-scHsPyz3KtH4Hr
z4cb8+gfa69T1Z)tiZk%3S(Z3prp&2N<}zu!#qRPW8ek-p+M#iYfhI}ktX`EE>iga%
zF@88d(H?fA`$%=OteNyuu&DuaG?&&*oBIP$)R$0*Iky4pUsJuMXrSQ=$}7Nt0kvy6
ztyQM$(;t8w*N6~QeM}1%9^Li6f%(w0$t*BFW5!=#E@blbs66>_-SA!n#URTfNB?o(
z>-`emgbr2Zc@grq%kqb_N5zawZ}*5OFm)JcGNlR|)pdWGAW}TwS6X}C^G2?NMaMJs
zT1w5S(+tSCn&-Dae@#k+^?yuCJOcq0df}~qP0ARke@x12fJv!jowNv(>K-RY0q2EG
z;9ShCmOdZZ)y=P7J{Eq9CYBO$KuA0nehTp9db&aT<`}$2N+dNhMC>zY252c2=xG5s
z2DY3UV}mMYPL7kVPmkLru=X7tt;70u@YQ-cLz7igBxxG_hHM`u<n?<3JENT;vY=wV
zq9*ky))khI>%`UP-3M}=|4yE>14OAmpD|YuPk=A+Pt4t5kjVZEB?0W}9o)yRiec}s
zmCLh!`eXh46*ZI5Oo!#4*+e%NJCb>4yy9Z4$MOgSRze`*#6t-Jw-V^{4^*Y(!i*En
z>e=NS-=`Y7|H#sOx^YE3Pq`cp6z0m6i-=<Q!rjJOxTW2U{8&_gkL8%Fa9A&1<o)Ov
zpXU9P1U0H{kb9Y^iTK(X&E-Q727^?=%t;PNa{^vKy#?2u1;p@|ZF#D4vmc~2_H)`p
z)$P5Yt8+gi`5m`YX2el>T25JXM<Izn$7NJs>KWSY-{|2zWm)J1*PZHBF>0pTFsEmQ
zkk)U)Ht^3da|Le6x-fYbrP5mV!=ki*pB8nYDA8ISTq~<*Ko3!4v*aIVRWlG4=iy=8
zw8Dc4NUm7I`mD4xo3Iey4F=809z~kdH#GI%P*w5=brMJw8iDS5FEhaZt5Sw&nhO0@
zDSt@^Wc{O3{uU6$gy$z}O)U^lbW+HmxpACaR=i;|8`%XN2@H^BNq_=XQu)JFq=1MU
zX~G@VFLDV`DZ8+OZ2&6e6+orbQvj%xQ=U7V|EQFx0F@FKpi;7oGC&;uqf+k2cKq65
z#+~K*aY_2I?lGlo%x%CK*xGdrt9G6NHFkl1qdCh+G0twV+C{tBkvz#S{s#|-wrUp;
z#cQTaqTq6?$?!#8l_=NHo`QJBa%#uyMt@MSH6QI38csLGJWh_B<gT6{XcxTqOYU^+
z-O-ggGMk4q)=%f;D0_>DhbvKP#3L_d*+09!CLVVhJ@2!FWatMY#Un|=eX?+JT5bC2
zb4Su*IT=4gj<7@qo;FX@l%5yEMhLpC=4XL!2SyjoawKHKuSPcBEd|VC#bFSeP;=>c
zADKft2x0y$x`fnCXDTpw8JJ<rX|)gZkPm$Up=R4C9#hOSyMX=Ntx;Jxz=rAU>S_Uz
z0Q&-PEm87Er~ji=d75tb=2BvbuW0%9IY+rqvJqqeXL-o>eN)EvjA^!az`lHaqa1zH
zji}+fv=wO9E-*ZdB_VutJQBZ1@c$o6sqS=$<J5@&BW0crfzii~*qm%FW2MwVA10gt
z7@;NGBs-WWRhu#I#4}=j246DxL6$bIbaF2@z_nGm_9IPTnVD3|Ux^2z2hv@86jNie
zDpmiB5#`iedMTh@A-P6OCAmx55QFBzUg9=G$(BdjB?{}-uMV-{aM-C?g$f<VIbkIC
zwQL|S4o{GL>HUXaiqC!BMGoFdngmMdF96u3CqUJL6Gr*85<7uuYJOgrd$V!5cJjI<
z+rff}3oGqy6w{pmy8<h6sSS>}&o$d0u^I;%_iiN1TqaUafgnlCPJ?!{WB_EeRf`<X
z-EzN_U^;7>s3BK^%ls;i^^Z#_jaWf%Lygz2%oeF1Sb@*p)E!bjpdm4v@(V5g(^%7E
zleozOscSs4VWIUGr(?(GkazV_hs4mEl#Shs_{aXy<!UJ4))gw5WJ1OLW$gA~q{K@g
z!xxLSM@gaTw4IgGgh-|T=N40+pe%{ba;SbB0X@QC_BZ}7#}CJVI0ftlySN*9cfuRB
zKs#7U=A4@CH|$Xrb0A2&#gS6IRAcjjy#>2D4kg3ihEqh%;vXHIU|if@g6hQl4YOmK
zk6lHm7q!C^81R-q<qZ!Ym6sn1o2xDryQrL83tp#CB^wYoazDwFw%xQ94%$3-)_3pT
zoT$EQC7s?{SP*GUO$Km!?<F3bWU}sX92c1NS{S75ib98l|MuOt<a_}sInBt6(Kz32
z!)>x6xKh-FQ&xQ)k8$_oery+e|E)+VE~{R-Jpq%Op6}T40--F#t`pm74}5|h=KlHr
zf+@+Q^?w>9qa(q<w^@GJ<omZU6vES3nNX}EJY(n!tyK6Jgs>a-B6>B5#S!GXaCe|M
z?g_z3;3cOPMiVF`CTK#%dkSFI8)^v6$dGrGQlGs}VggQx<QCz57RA}3!|g(H*Tg2s
zNgh{#T-c?$lI9rchc~8df7!{8V$!kBj0@Y+$3a&o=&>z=5;^82%^=T#njJGkIehE0
z#S*<vkLEa(6tD{Yr6JD3YQ7@Q#m%jipF5#ynf$&&dY-#I9E_!;x-#a<TG*hnATC~T
z5&GyCQck#1s!Z+<9~!p7HhZ%`El<_kLprvn`LxY5DXo&lD|<*2q7)WACfa=DsYsw&
zJmg=>D9rB)BOFzyVJ)x(krr*j2b#dAVHp%3IULOzCOC$FC^?Q<dw{mvtqDI$61rW<
zn@AwZ!YFkt)fg#!LYEP8j%<*<|J{lG{cx!H*6^GDECUbwhoMqJM20TzldccWbSX|H
zlOC4T{sqJ5K`hS+Kh8u6@<v{Fy-7d&H`Y5B2oRIl?tI4~J5&#oUq%r+#^%T1oaqRH
z@<bxCOHgGaB#7xo!r{P10e94PL~PJQDzwu3EWuFbQ4j}(T@@uL+DwvBk#=yQt7;|n
z-;G*8<!OX&*WGJjbfOc%r9N38#n#)sf*YsF{mob>(z7t}rg?C?D-HpTYydL58tYi)
z0&P<u9_+LZ13nyv+G}6KA__$o+3yD=#1ZjAXCQSJA;kKTwOE{yIio(Hnuytoy8-%3
zAYdvQjP7S1&}3GLJesqc+iKeQXUIT^vZDZUQcBOs{Hb2|vMki>XkKX4CZSuop2x}x
z^Qh@Lj|Jt~*%#!8?=Va%HcK`xtU^w2JwxD)$l$UPts>cBa(p#C2D^BWRV|c*IzpUK
z+D&z{L9r6Ta%EW)ska8LB0cAjmJrAyNu{O06#2PM^Lv1kUfPN>vnM|^=yrO}6uyhr
zX?|=DjL|e;S<6E4K1&_jqD3vb%RZ>)2}!h{q_T_)ugS;6vfvCyIz(}UL!o!KM06j&
zx;}Cipl8mAWiu<PplSq!asQov4NN<*c#dz2F0zHcRFHJFe)XomVGqy@9lP{13*UgJ
zz;Z^1NG=SEG^wNx3Bg3|XY;xdc#KkP3XTL0?<b$-etlOw9_Q0+2ytRb@h|#EcKNx%
zjp_p$V3s!cdR4^TTJUl!t3b&J)cGs}Ek!>5!^q1Gq(GQV3@WNSWh@{V&FuPaco0LK
zXNAK`zF<F>HVJIkI+H61s<Snddl;gq)4<+zBRb=5i?ULvky=a3RlAAjA%^`F%;_4}
z>5s$Asiw4Fgv~)_=B-1JOSRw)v+TaDA;WL{aLy$fQ=m6B@*+B4wAkt=d@P}AvFg7#
zPCD{6$73}oy#oVa#3?G+P*UM<szz{OeiRrxBG%pvnNs95!ga3&_Bn2^vvCOzxU8Jj
zS+5wTQ`aghp-Iy93TOrilD3Kv_C14c2}Hdh*sNHr2daa^1Q<Ds<-<3_Mh=?~x$Sp4
z18J4jcCFAdzmdVuDlkuSs}qnDn|J8lBO7<hTeE7<F>;$duGc=4=*#x62TCNr>!2LY
zZ3f(q=PsGP32l-<bXt|fWz)tzd^c)h-K|AxoAzhQKmgss^Ll%;7fj<nBM=1Olvr-t
z_`S0LoDxEIuUGvVIVi->+_%dU)@uCW0Dx0&^Y6MiQ~eK|5)*!XR3kDi>`VZutanT~
zeruGNAozH91fE_x4Lgv9<e&|S-SK>2q2PCp*K;XO_%f>v;gTPx-^d=4!sjH&Nm$5>
z5)vfv%%5+$jR?;pe?Y<nIPp-^!|S9V5l1E3P;mHJvoX4x>HsMuDOzPSJI5(0G8XwJ
zCLGp$IhjFpHZ08=U7mSe^Z=W$V-8#K;uR50v8V6timpp0FhWH4tizdb@QGx_O=-7F
zGUrx?@@M&55Q^7_;-Cc5%UreBI*2#|n3qw5T*3i}u_0*w>!_6agj<UQ3VsJNjcrQg
z9JzhId@(#?d&Nrs2{`Vsd^qC1d(|kdPT9m|>W>|c+^Lg824IsKdu$%o^%t0}HA}C3
z-@&A9XKLJ95NwZ+B7-ytG936?aNS+9^>BE(v0q$hz~q`(PW4CmH@er^LV+#&NP%+m
zN!|U9?zlQXDAbKXLN=?mlF)NP^%L^OUM-LPff70P>FMF4B=m48qJaWqZDp&6sBNxK
zm(cm*KTKs_nQcLn>n1<<joCM6ih?nJAE@<E{!YjhS)&n1_CtubxB5L?l)bjkPbO0@
zyY;dAr{On<dfZ`0nXTU5=MbGnyh69XN@Wi~sgxMrgN88sKM6k5b2<DE_BmWx^6!ri
z!+W4tq_3WU>FgVJ`geM$<=GtXMEOO)PXvY|Z38BI4Pn2Be?Z8)eAfGBqW8ZNd}XfQ
zS5Wov@sldwrT1aGDbG$&`db~wl}A>4bbb+klB?tt&^-4bOYH8<MsYZ7@DAMihk582
z-$X2Z+H#qKLTk0)VxV~fMX^#qrzOI-4tS4K#}mMjbHoX8zk<ejjDeIF(5|2X8+H<L
zK;<OM1cy-(Z={LK4tpl%f!iHd;;$6k+Zr3QU7`yY)}N}EMLpa?z=v+sEuZbm%t|xP
zax%_lN|PJb?LQ9>4)3qDu8{d%C*Q|f<zAvcqBJi;pmZaBNj>tP3uHtkXE6ICV=*O_
zFp#`!k)eWpm43haSFu?5k9A?E(KCUwB4uU^Nwb=baS}3g^Ex*YW#fG3rn?Ut=Z=iZ
zSwKc^@(Z3vn*VBe*GHLZ#&qAV>ejO+toqsX5TTS0TPCbE;|=i{jj-Jc*9y4+#TWPu
zrOCic$Y{4Xuj4sxl?b+p&m>=86dH02fvjtiv6_=O(C|bN40Vlyv7A^wT+UYfeUi7l
zETyMREwyJQo-`tiF=e?g&eKL2?BGzI{)9#-y#ELBrGKI@Bi*R61-5l?TpPJ|UpBp;
zIB$2@LSv~%vqQp8IE7O^1(AVfQ4Is~#hLJiH*V7Nt5>-#gZ11zG1PonkuA~*8PZ!W
zX@{Gr(C&e)UJ?>?as+RBF}0#SUc~$_Y2q9DDVu^$qv`G<3XgG#?w7_W`Ox<L`j>U0
zgq?mXs3?t!VhOr}B-3eBBqePl)MjP;3CxCKaWNyz2^krn*FEwqR?F=72A?ilJ=Zcy
zRr2z;g3-RS<bx_FJ38XESS&mxd%+ag<4Sd$q90aMTyvWXm4WiUKehzRLUQ!w0}^(%
znKA-I>sOnKW)~Nz4x*c{G4<6QM<uO;?Ad8w83%vrRl+S0Te%~^(`>(wrxv~QBZG^m
zhT&X8V(AIfN;<EU?veQT4WUb0sl98`sZ8}_)Wp#WIE+z7h$<(!k3FMk*WsAR6C=NK
zPDF4-QM6RjP&;=@>RMYFPDGF9a=;%qT3yLF2i6!FI?}a=peHQMcXzo*n>7EZSr}($
zkloFlq#a(rhuJq_@78J?ik_()SU5iv<sM!8WX=;I<)ep<m4Vp5iWc)%X?AzeV_!}q
zRXYtLgniWDgP*U82ZiOM{<e*lx4Tc?AoCNY^@^!*-^k*1IQ5C!%?T)P@n@&qZ|teH
za=#)mtKUqQELypNS~_t%k8OZ|wos0pRCb%NezsOyJ;n^NprbTW@X0fhJFyFCpnP4_
zU3`OfbaRaNN~g$9^CJzksk-B=ciF==FNY9PDdfJn>TzQaxsUyY13)kROXl_6BqnPo
zV2KXs_P_66f66-ccl;bXd3{LC0dIgbI4LDa)yXc??H=2Bukop8d}`x-6QXOrKCJ}Y
z^hVC`)@-$2PXBYwJYY?m?e>i^NM?W9>i{r{DHAma>16-u`1Fb1CYE+lni&4tHnVPe
zWbd1T|4pa+0nX=$JARzam$W7DDus~E9!L_aFc6etOS8c(Ozclq*l?0F0he>Gw$qHa
zbl?ut(N6@T{aRsnx4q)?y97@Spg2dPK?QYLI2?n)!=fCoaLh2Zf^q637#wI_=^U^w
zy2%+518(8L;zi1t1bGp<tT7}a^Rzu`bYO`=M>9Q)v-wW$A2l{V*kb^Ln-2VII1MuX
zjpI^I$oNxI;<%N`FfSH#1*qK2%2m3fMUFvC!Bm>?lJ2~p(S+x=Ek*%FOF^P8gGoqu
zwQ^h>Qf0;{PbTb&(us0*v>Z_hfmrHDb3X$|E57&PORz~KVj!@`NwQK)`gR&~CV909
zWA{kJ>_foH&#ml{?7U-2A_5}F0^!(MvO$-Cqm?>y=NT@|?sv7C=|YV31r9FKSC)BU
zBYACwW@Bs!e5CC-zSCw}Bg7cU-fl2~TcyXPDm#rwV*A`D={qBTZc1dLjK_`MjLtvS
zM!D5kYIGz9B{SYr<cauN!$Mdcy=%aukF!I94mGWqJVW1NIaq*Ip%7uD5&6p@{&ZWX
z%8t-xPHEyEmYg-SIvpATiZB6w1pV|^vCmC$T}DGX$gttUTY{eZ;3lb19Pei_=Uv>b
zMz7d9Z`<`dMSA?K8ie`ij!J`qVhvssdYI-Gdq)U75I)F5&2Z2&E!=|@6NmR}H(n|d
z^tTUeiE*Pvi<BNsbn`?!4@)?PkJZk_F(7^np~wUZnp&Wf#3n8-mz<$DY{KAMnk}?K
z<XJPq4rC<;HR7c_6HA41!aPgTbmsQiZ8LM7?%73d<;tf1jtY1UmbKr2S$cRlS<6&3
z*t~1aRNhbA-<ivdYbnV(V<pMU@~E0B6-$dD59Pn>eZ~mh&yaXER4fL(J43YiupIOY
zg@K7{bt&AT-LJlcVZyz_j8`o0sN+UL9J0v8Duh!|0|q}6piKpDB-~*B1cVtElP(f`
zb+@oaF7^=!EhL0)Z<@J&mj*#j@gO<zl}opddJ=||dm8@iGwUg5KrLT)u3EK~(j+lZ
zS2jo=8YQqoalZROAL-}I*#xl!cI#2;q^IQC|G9z?@9aV7n4g$<i-5Fu?5sonTBY1r
zF30!P!5Z%JL&gU0QUO~56Y2{{E7PrqrI9pY44zi?iEud9KW*Em8JLqXzo6;9gxj<E
ze5JwpDCYjqI@9H9vmSR6%bf;K#-v8^1dG{}jyfVr(AN{p2f${D6?naS<G$)^ld_ZU
z%?ag7C&H}OzvwnM&X40<Cg|ERd@mNFIGvDJ?Ds;K0|AD%c%v>J>v|(xMxr(<$&yH(
zsC_QfSZ1>h7Q?lHMQZpSLKuRfkbtg~&Fg3=pS%0X9;=fMu^g+TFQ~TqB6GQ&d!Guy
z<7H*^K-E~-ONkN$R9MU8RsZw*J=xghx$M12B1St|DGy?k2<Mzi{G+tEMgmnTtm5QT
zOt5p2l7Lci{HvsL1}E{nu~*=?J66Q5Fr|w~9|ip{;6u}Z(j-w5^eZ7+*V_!Rhl8ys
zWvU;d?e0<FMBSbZ5J`)rgkBOyx2VIaV;3~~UP<*oXN$UbZ41k6U?uCv$zKl)#?>Kz
z!=VG;R}e*~y~~UQ34JANktz6^K`P~-xS$`Ag`nqzNd?V(*VA0+ZC2@hBiwZ!UA#L_
zwPiszwiWYE^&A_S8*;b%V~uz1zXU=n*I&qZ@qO*lZU7cf4yH~~-Y;TY`T1tV6=jme
zs$h)yIj)w{a9fV=P;+B+-TW)I=p8B_s^jMJ3V1E<wM+Bzf2Sdb!c0}nwxUHg>H2(-
z;Vr&G-fW|cv$Y)T1lchhhaEsc1I0_RzoLh$SRc2|-~zF<tiuoj+&vm?YpDnTZvtYo
z)#pcbwx;WOy!VgZJ1g#Yu|f9UpohP-*9~sR2;`HZA~%mmbF!VTtNv$=R-%~A%>Q?l
zNh{Rh4G@_HHwujY+BDu(DXz+(uTLvQo|<g}D&5Py)i@p*{HMEz3~_fZU{ZqunZ0;-
z@kzEdX0M8vXO|Nroo=G^n>3)WfQ(_#6pEua0x5!yr|bj%I88LOGlW}$YE`1maZBDB
zFD+W`T`ngce4h74pq@DX)R`Ia8`Sl~uH)0bdJvO*x&d~9JJn@z;m#On|GE<+8oG98
zklh|rw+G|j?z^0rfYzPXQXBfoY=h{I+-!Y0pB&x>ujgdX^Iu{mU5u_V!LkAmxjq$N
zPFk!q9YFFhH9x)>9q=Da#%J(G)pAur2@8{UP!`Afb2^T?t9_jz$t5$$ETRwSP=IOS
zj__JhCpYr0vHo<HH1;IdkHeKD^&~mW^OtvGmdS)~C0%)_z_S!WFq~W|A?kK;B{gL8
zpCi0z=C<{xrSM$$Tlp+8gl#ryI{dUTXZIcWsM64ev15E7hbRmuJ-yuU`@$3J-vMJR
z_kg{NL}m+W5YLcfiiSmXSPb^qWQPg!^!afg*T7POU(Dp-=gthX8z{Z$&a=YErX|w9
z^%9QPZhRx=iaSWc6qJ87a>uInWwgY<B#;yI#2r>A|MM95n}D6iJd<biz(;xD5J#Yf
z9K;0%vci~~YhskMS!_^4Od1o`RWMm9^g}a&05vaCF$yTPdvF#a(t<mdECsC(R5amU
zlpG{}RG{G1eD~Wv<(x#Aia-2rPvrq2LK)>yfR=f-e<-tzoi}ZN1{Hq=Z37zD9-gK+
zJ)M_7S=i?&1^v(n*b$*MI52F|H*`Gg1AP)g>Z3t0lU4npZERBUxSs&zc-wi0&tjcA
zRsp&IC+8K!fl^pp46;)cpiGN2M&)aRQz!<z!M@#OWc>1z!ngSidCQT2p?aF6J3x8@
z6h6bJIg~eMW|#t_eTssjIAjh4&o>mnE(YkjADkMi)NA8i`(xlr<71z?mg3Ci^RSZ{
zcfHb{=WPXLU|Y<pj^O}itL0#dGjJc7NYZ|&*b7pjK+S31>GdCETG#jp<W_;fT)5MH
zOkYmvsFx2ncTyO{{B&{5Q*uBB^HulIOQX4-EwRy>U}7TRD17<=alg`CZ1xBHf%b;H
z5yR%MLWSTtqUM`oSAw_m+#7`Y7E$mY^6H&CCq;Cg;sdw2L_1@-Na49Xbh=K=v;|3v
zv1Q$poAX8zX0*BG9_5ye7?0nV0=i{0EYEsznw4+OX@F!+bj4tG=vB{_`{B;fta#Va
zc*%K58CtIC%+(W07G*y!8(I|kllnj)b_<GGQS#6iARwfGIE{!8?V8%7e{eT7`lMrI
z1?t2$AOkq(8XedsbpCFo+(Gr*awGC|e#2q~e^}ne`DY*#)s8%c*QKs>e<UAw#Ahh1
z;zD4*BHPDt5)u-@TQXF9$G%wQubS4+68od1lc##Q%H%NCUP)ncdA5gwybJv_s;2qR
z;bh7txk|k2N{kihxXXkIh#U$EV&5VdQ*3l<_3#SM4w?uQ`6M`;>^~__RC#34tRWVP
zf-!$8-s|WtEcUqLO#V==*Ru+g6wAh7t^2oYN7sf@r|<R64Ag#J2km2TV_&TVuG&z`
z{aU6@T*bLwd}AhlNA92WS>}hlzJRIr{F}*Wbp9s2gG;dQR}#PVaRtv~1;&M!cRnh$
zEb(t%^4<{g8+B>HcVVxwwVqF)9#A3q)&935?gLyc!p=zQt`~?V8Jy8sSajlfN8B|K
z*;HqlBCf$xrwXbtZ}RdnHqh9x8yu)A&f+S1V#$$p`a{t4Z`8Z!u@;h&xE9S|9fZ33
z`@?&XN!(<ObySEBsl4D36=(rw5({vwn>Apax-VtNib@>JbZhmi-or|zT>Mj_c&+g<
zM1PbZZP34D{wz0`A)1G7ju0xYX}i=Tks)H7gcfUSnpq83tX6QbMu(Z-jWVadDbJQN
zSHO+VHg;9BRjFJUC9u^mQ0>8_bMZ4te!bGw>qIAm??WFMtv~B%2g)KKrh&J0Z=b6r
zPdTsTv&NVEkJYqyeoTvbtqaN9VixK5Jv<Io_*O42MizUoTDo9Gr*zSZs#XeMP(9{a
z^Uw)N;MU)ow$O7f|L!;9h*ZrM8oD{eW>92HV^Fzh`>iBN)sv?#bg`Iu9<d}rjk^z+
zs%f4u0@mm_NYc(g0#~=iFLG=I`kGiKKA2xRC~PKXP}&0T1D;z2y#X~2aQc+t!5F00
z3^y`RvhYeW4ZwAjgKl86v+-RN;^i4mRf&r9TGIV_6VIfQNjVTej=H*;q7-#ayI!iH
zdI@}JnTJR|K(#T5>mt&AJE46t|1`rH92ojuE>F?s>WTJK4k1*MV$VCUS^E#1|A`Ur
z!`jXD6%_t|#TZpWDJ>E>5=n>tjWJ#u+ou8Z0jxOyp!VDLQ>o`ul~L2z#pLG2ke6A|
zH-&{`f0|2Z07a#MIpWtw09UX~jH09(ncM7=*dSsUW4QjFeiHbjZYMBc+EgbLMJsHo
ztV<%Qm+fc2Jvd*qF!0Q#fcd0cATokrV~J{mU1ndcn+Xf<aHWvCAe@~-*Q=;=mFFOK
zoaCfzakH36Mp%_*kXLDv<B|#-fg#n2UlTgfK5KE-XR`r1q+$NQNk(-enoX+!7znxj
z<oaY%QiQ~wBoRn!V`0kiBxicB;fo}Hzo=rAx*w-E@eZ1m0kC!@0`B+<Yq_{}%cbI0
zjJgrHV$Sb2BbTn6=c5%rCP4|4rS9SH;U6EL;MF80$|-Qsr!XT~edI6*RLfhN;NWyJ
zi6ys&??T^uOs}V~Q$RQGkp@|m`aU2*B*#5PUN<N!t=;r;ndq7+;t=+i_TkUbt@_}y
zSGI!{xf99Ds#DTTd+?iB%st2?j__@<jKp9`*%#-d8!m&>gH1=~Fc=#t*nbXM;JcI9
z|Lc8*#r_&P(%sWo93A0yhNng7X@8GBA?eXK`2fNou(|K@{J_UC4mg)r`r0-B7?J!N
zJ6!tre%v{B!;wI`<EREO84)W&2HGTVcj{1jl&Yhx+3q~d?Oem-?RK&w|Fr6=uwIF;
zCKREp2}2f&ByHRJK9D@r(=~`5l|P#ls^@dR8uhkK#T5VZYFewrdQMN%iSHM>!yser
z-xy-wcIfUnNXqLjosV~Jz|e};Ux#5rHD}TE3jtV$o_kQR(8lB8q~jOe=0BKexi(r3
zQWWpgBPN#ie?>28Q~$8Hasoie1pTAO)mK$fAYA}m$=x1oYOzBG_Q%0gW^u-cBs%CN
z8A)MYhX%9`wu&8MQ?(<wk5g}EJ`>0_3VuJ=b<^BLli9)mos-&WaAXtcWaf_=E>^td
z@6qL6GYnOVAhU-xc1Ap_gFLi@I@eJHY%w9!@E<z#&#8T%7|W7=31Ug+;=)32?d>oH
z_-(1@wqxHB&P!358Uc0l3zfyInvo5uOyKsWF@;biiM|El`#0`TyWNzpAHGD?w1TvC
zK)(Ga>1e98CXG@B3T}9bW%E%X&|{j`@b{A3%l1_zF6ASTbd{Bw3K-cTzmxS8P93sF
z(NPICl5iD~))17~Z4?HqJndFn)r~WQkTfFu;gw-T+?;iXvhAf-!7N0G+?GYmO|3nm
zm0TX1z0q7m^hv-x9V6gDb<)raLcB~15(&ku<rb=Vj1Crc(y8To)nSDRExc!{lTqxH
z|17$9G%eC)bSf~1?8BOT9A>?pmbGpxKtwQ!7LcQxjit)IRJzc4a|WZNT&K=9f7nD2
z?><IcC8k9JKusAf#kLNq)J+PvVU|x2*)}c1)gSZt#2r-_x9TIRZ^P^rJuot%=4aMI
z7Gk5}593wg1`mhcLUkTYk(4S7yCe}yk>Jp>5A`|Mi@gc53<FRyL+UeZ?yLrI)w{UL
zb=dJ0I$N?u7L6ohNEh65`-Qc>tQ?49S%)(WTs@%7*_fDVULp?|yRxY={yGEl-)i%W
zhq98z;cBYyY)Qq`O&6JqYjB#+9klQm*rBCDWk4s~g6*L}mkMKWX2bh#%qH`4+Y$zb
z=(WC0X2%{wm3?ny-jZH*S*dZQQ*(LV_@hyO7Pqh$w)K0oBJeu}n*D9ig5oqiwJ@?r
zeR_hz*<7=MA`Z5iX0$ohZlKE)HIZ)mRfuW(Fkkg(mI$H(>Smfq^$Hhmy#fs(o(_xl
z?=lfT^tdg2Ep6O9yQ=0H5Haf-@V8vQa5ZDg@>#l;Ih}E{2KOlS$G3=!;-u_v60Hpp
z%?c^a@3h591rPyi#!@9uBNj3xm>R;QFMyW23*3of^A4um3_A7bF>cFEzo{OVOit8Z
zRLE`Syc8#`V-Z`c<X2q&V-^e3!6DP8rWC1T4<)P}?1<SzM~&!5?3ss^cGPhvnaU-T
z0S@2es>~^K#sTap@npC*Tgt#TeAJ2dwygVdGU(`Kq7E++8Y2nQMi7_PD3wtIQqy#f
z0io&oQxGFAjWMJ{`(d>KoDp;M%%nZDChwvjKLg$;cN-l(MeIcyLd3IF`dPzR-bgbC
zPEfZE;f{2r*n$`0&uyV7l9kx)kUkLG^~E(6(9A`XG;Db1oRTyKu0f7jWHrOa*uv^+
zxT|wJ2KvdlJN9{g3iOk+mB5+VKh}GGfv_2Y+{N}q;}+sVK*+li^VIr_JbVlgw496C
zY_$#X#{D`(CoDx&wSp*s(!C}MaL8}`u3PyjB0*hk7_PCpqwAweY5ZiVb{bzd_?dU;
z0@x#gIC7%YWo3-zLxEl4f>LT@DFTdCA*&bYe@JC?tV8h^?mlLlmr$AqNfrP+MmV%L
zFd(^wU$~-w(b^!o!7{QA%R38^YRlcX!VY(1C%Fg{dmvBDj0GfP5VF+c)$DX1?D>(;
z>A24QMa3g>VDHuK^@5+N%N1T|&Hzphzs)ut%=tWdDF5f*GS1mxz{nCIVv<%e6}ymj
z{%u|xZHHP<az6Dly!8_K4AN>ldjs1GXX$|sp*s~bw5NHw!oa<GPg8A{qq=RWQhIOh
z_#m8#R7F+iQE9A|LFE61?6d%FM*eO=iRg=h0vK1yh`o8N;;Bq(vYYLu=JNxoD|l;!
z)m<8BIr=<;Yb_TBc##5j)PJ?BBUS;d*mePo=*K!qM9B(S&EH{vs+n+}RtkendEHjY
zo96VYe-Y2KNW_y1IIEXiop=>K&gYjf=uf6Vs!CVTSOdf!k;!hM_}6NEHzv@a#0Le<
zkTPH;tY<nGk973T-R_$SQahMzzaxSL%(B{an8Gin4=OfD_JCtDvFUZ(PyjxXVjV~(
zr-UqgH@kE700`-Hw#t}e6h2erE`NDMf-{~ImrMoFD(qVJvcA$l@*0hn&ZV5YxJQWe
z2JQy`p^z^n8Uce~va%7)K^eFer<El^-y}*Qw-;g?!2K_=$D&#+0b5KLlh(ED&o&hL
z!%#=ui}e0X9_QvjLUXRjyS;!j+W#hdHbO__|22C^-m86XPWsnb{=k-G$^jw$LM%t7
zhQ3HWs-Ur>W~``x%l_8kn9xI<vbtLag@e$bMDdb`+4DJZo7}^|)fIp!v)&UPPA`-W
z>G}xA6SJZC{Y)3HiudQPkBbO6C@Vkm{B}FCwc;>&Bv+rfz*q}OEM;D06zp9_g!^^|
zxi;6+V-LiI#^)jG(=y^^ogw@vrUx!E$f02Ssi{mo>NnL5L1+|b>NJk-=Wm_Z0M+;y
zw=;)Q+ES`As)@vynHkQh*uzdK(uoeEo&DSIGSIQIf*HhQSE`yM(r%ted0B1vk7>7^
zDf1pFV^E$^{w2bzpp}0D5B?5Nza!!rDe=FrETNaMh-?xcPZ=@Au7#;V;sHq(x1Zps
z8<>pe6D77vH2yRWxOUk6Qz;KB*3(!X3P)D}9hM39BGphVo3a_^@d{QcZkbOvu}WfL
z>4}pxjKhSsaM}lhuj5j$(1)ol3tyF!@fZ{8S2?S1dpPPPv?$VH_Re;Ol+1=_P`@wD
z5btCFOjA#g`3sC=rr^=G$YwkSDwld`-iYk12<@96u^bsi&a_f+-d|WKVW4p?u%vHp
z9;M0jx9epT3H?>t*q{`2N9ls(YwA;mA?X9q9L+Xua=&>=<QTS`PywZGw-|Dd-7T<d
zZdezzq_cBwH^t%zjM=z2cv&pyjz~01$XWe0xb{(^(d*38GEShN%3G^6kY4`8E7ot8
z68rh>SuQe;(U=`q{PMY2GE1>TkjDhPA_1({eT6n>-5yb4bpS9OBo`s!Q$}m?-_8&d
zrr--6s6C;GveG3+Bl3I2x_m_Xp$1eqKpG0rTTi^8nb*RV$BS%MOtdu4IEOj%TwB7-
z_2C<SfYj3tr#SnxK1Jwc_4K`Yk;l2jR~z~_Jzk^B9dKl?e|diUa8i7Q06S*;v54n2
z5&#C?K(S%?#3*Z_z9Q7oU8uBnT1;1H@fgFsu9ut%*&ijjI#s5h+gzvw91$UKp!wsE
z!MI_7UdaF|T~m%g`>iL1gu7SgGaKIEbNamXftkpL3s{%&m23mN`X;k!xgCYvUZf%A
zjdyGd?Wg$Q6X3c)t_nI?#E;;)t!AU6)Hry&e;q)3NvrFKGngiE?QV#E-2dPBkSv4+
zQZ($}j)V>`cbAt;8e5^$|9+&yi;b8suLwUyo+pE5E|T5>3+%^ev*t1GYuHq$ZkLn8
zo*bb&Rq{To)IkSp(xi(dF$h?Ktx;??jy>KI2ET&)&z9$Zu`o>ADf-S=36sx3_@HrE
zz8E`)cApQ@p1SpsgELbEhPYa_8tJH2ae*VGl3aHmQ8PlkKWEi`d#BN}chR$o8lq!Z
z3XhYMg4Df2gkZJeOt_=sTTmRTx}&Q$FNPGS(ZI%k>qAohV-OJzzH-`!NZ1%@Kno}e
z@KG)0v&ut869|9atzx@AQi`RL#)AuSaJ_;k+F8hlV-3f7?tD`tn%dIhvDXQP6{1Mh
zsoO@&{qr@#GqWs;;>1oN3cnuU&BQf}{d9ubex&1bnd1gwmQX9At0C6`Ku+{8JtK%}
z?Tg(8MuYZS2bgEhkspwW&UHH3k8;{q<s_&Hup_jVeK7)tI{>lifvwh-83p(Jx82Mk
zMcFXm5#gsl(HO?f1+g{KZfjJ`V6l6ep^8->isFtpk0U{VcP(lX|AWgn83ovH!~tJ$
zMVUv=gU4@&C|pEP9r@{*>7611>J$By0{-{~Z`j#>xx(i@&GY>Vz={~pF_3C}eHSY3
zVeE!A<<q`9tsIin-JPBdS&e6V!jrbq%I%JvVM5}~^>`3bdjI$7lqlHsYJ%H;P^F}7
zSY>2JPZnCEH2$UA-NyO`H?1y$4cO%P_h8xYbHQV^0S-(J+v|?>CN8E8JiKG;B?p;=
z<HqHM)T#Ci!qpjZYOzs!UbjpE0b)QWN9+A5$gHH)nk_e|V9?C;_F65cZ%%*cz(+rJ
zD>g503+vPmHVLi1d4=e+Z$W{{l8B&WSDmeOGx+$Xs<;1ku<AMK;%W2PFMB(UdB2k2
zV-AM{Hc!`)?_2>5;nEPs{>Q~Da=SH3QsSrIiOTBd^j!}#-rlh>V-66A4#V%STfN|x
zYu23zd;(U9e<a#Y^;_gay!`Fz_z)J8eTZL@?GCaeMbGrQ6Z{+ftlvK_)(@G&$)xgf
zW8ij^I%@Fwis9t#^0IID*{glKexJFc_+lGsa9~4?i!hi3*T*-%c}WF)iBQ-W+OXD%
z_w1*LWR7UBxA)>t@0e~e8>X0D57f;yJ*~}TZA>**mm+KA)?mUtW@Q0}mZE}Fn1h`o
zXz#0RmokNaDq1K#RKgj$Cte8)#ZlFnvUk-$G$v(Y0J}#csCPQqjX@=727#Y*><&W9
znw&vvDxE#?)?WreGqy!jfG59rYOyJYdC86oDLr*iOEXC8Azc<T=L<cmFNVcY`e~A7
z6iqLJluya-f_#&f&*rEb!1nmDlBaE@b#$?M!?6S_!PPUm#xpAcXH^rK9{$cr7I&p9
zP$f5G+pqirA(qY5yNd=(;OVislP9tZa!_*k?kW>BhJGsw(0ZJVRLzD9qok#3M{xl?
zEVNx-;^=CZ7aff|2PZX5ti}3JMc&yeMoveh@r=`k3AbR?x`Rt?KA!=ek6=N@2BkoI
ztDX2fK7Q_m4qe@f|Dk(&&(RsmG)>dk(>G^*Gn|iNp%hiGh`+XhKS>v<)}J5N``5Sf
z#Kk)b{vYb@@jJJ#UlV+6+qP}n&W>%{wr$(Vj&1MQHg;?~mEUuobGoYfjM3HouJ1Ab
zgmsTK*PPe)`b2{y!L>1OeMGFcA8_Umg8uD~IKV&~p>B8E3WfRQKA@ZHMU37&A*v?%
z^g2Vn^P!et@TEDWBP<dhUnHW>6%|1<J78O!&@%EIr9|4Bh9ehz%go_@xBM+O-z0Gp
zWAGalo)+T>t3kpW@6PB4408TF)bYOPgM>WZpO|<-c~}OEQ?WD2T0qcne<Wf=D5zUb
zDM9zE$@>H|Mhs3HEVKB$xY#c`{<*#Zoa|mU8MDUt&3ZOqOU5Fy5PY&?<%~mz7pxG&
zfEQ}d*#{}PU%K+U&^+&GHzt7DV;h5`m(kwWXjdkeJpgW7!XStPFLNK7`E(H~X-p`B
z@F!;~@@;s66sQp15V6K>ThuqTT7jc_qXd>Gnsa!)HDJB?Z@k9>I3VYQ{7L;7E$?4`
z&qdmg-(y6#8G&|WM<?&s^70|ghZK){en9R+lhjWyo2Qt@<z1W|8pBx}gMlt0vfxiY
zsb@03#!YW-W4B5L<87ZJtne}ma^|uwx$$H8#FPmC7(O(me!LAYPmmkmry@sPo?1gk
z1g7l{%iRpZq$|Ek&ICE5akM{GIX}<bKW2~8f0{i~u0Lka&nZSc!uu-k^IVhH*Z5t;
zy}8Qx3P<GG@l^-2e0ua|)p`-R>yQ<eTn3@#l-M6$2*22=S#9UGRwkq$f==PVDG?E1
zZYE$TOTBPU*<_EC50eh^UB}Z*i|<W{ZF`B%Zel?^y#Sf8oXDLieFFDiin_^OR2)9d
zwol&~SX_(*K-izlM2TMDky4Ki`;r3VN}7y#L5y)*4%v>?iV7J7paMwLYFa%HrM%Dg
z`S!fcw6qHA7*`QCt&tbQ)z>{JiIWKTuNa1ZmKuGunS)hK4P<QUe{el650+i)H%Q(S
zyfo2dd_%+5OT`;@B<od}mVr|YX(C!2*J5hY+h9TZ*V-5YUycs#Ntp`Tw%rNtM8y{!
zqyZ0SHr4C{>r1H1J&3bEt3^X#!&TARDNt=(I9)9b_5P=pi#}(}(0Bm!4R*<Dto<+S
zl*o7SKUk0V+&@^4li@#Dk1tFwK#1ypVLe2B`gfg!n^TCvy1PFue>yV$tX?jAr;-o{
z1d<KT;rb))e*tTHIk+!#yFpOHMFCtag8e5}eoC`vmU{z(t*_+W(5`brPYRFm5%R99
zn!tA<9u3jDZ1A=Uv4CZ5rIj$okw}!^^do8pKNiO_gms{rJ=E)wuF-4T4%pLIYP*Sf
zZ&mj`8y4Lz=Y8k&5B|&QiQ<&|v3k5neypDPmmjMKjPSp#p7tNBhg0t#t49~(AFHR~
z-&W5<z&}>cs^ouJJrtz>SUuo7C*WC&@N7kxBr$c#jW-bFY5zg>^cGZ@B4Z<A7mmf0
zb0<hxN5hd2tXMa8|8hHo`YHNq$U;>c$cKoRdxV<VUK>ua2yYJ_%lOA&Q4eme5ZbW2
zuLi#f?o5$J)$B--esn<ZQkv(Tf<oVEdlmx4g}7v#TZME<#O+h>NvD-Xvm!V!#Uz8>
zQl;VM1e8(Iwi1=8@GWbKGx$MR(v%5VNtqMQD$r}@U)+}fAslR^R7^q``E17b40Ouu
zAp>7jHAc?teL)`8xebJdz&l_muBuBsm|Bz&>6ax(=i3dZv8zx)w^CcY!B4qkp5O;t
z(89X-VqVyWQbmJY=D9xf+l6+`KLgl^Nm^nT@SDad+$-C{ZFu^KS38R*=$b|{lXXA}
zD?BhmmoeWhWm&a<v%QM5^9Lm_1a*1Dn1$&}WJX6eHEbr-<Y55~QRQM5#<*r<+$c;-
ze^6>~O2h-=kx&_K?{}0Vrj)M+nQLldsUZ*QFXy38D{RNqNGCdGgolc>;X)40%k$H9
zb1C+ECz+)lBgQWpvgER`!X*L+WgIlvZG~0__wkCsi`q4Za3H0eFxqe+pt|>u7APf0
zk8#5+**1Hktb?^effv+dJmChholx_D*br<uf@mW4aDnH~FP`&$CI4n|dl_>sc$5w)
z-^wnOx8Tj=UL1l2$t{g7GVtd&toUswJP&^$6HmQ9&0wV`lw-i<EPQ~$xZ#C8+-W$h
zgh=5+qAa^YHUS#8Ht}ktJ|~*VCBdU!7ZmBvjv7#*84AM`CvK>8KnUm*EYaSXZrdOY
z9yA7}9|>^&1W4AOn12%A#Slh$&H|St<{uk_krUq%AlVMGi>TGO9s00T4OJ<GUH8<J
zIfY^$C*2@aFQSB?`IuctPy!f(k=+KTM=BQRHPpLg*mjBarzAZuTS>kKyV)mJYX_>@
z!;=JI^pJ4}TT;Y^=hR-_u()ahYI0D{pGK=<IDtTS^~Ie8{Z3U$gw1CABk#u*_~@N`
z%DJ{Z6>?2N04mz^(O;SpEU&gDK$L|kyb@tzM38-Irm7+~w+cFv0=WOn(7qk%*JPe@
zC>N4TRmug4p&%`VjJa(!OX=d}u7H>#H5|5hO{*_mwFTyFfuKh}(&jl5NfPx$+u*g5
zN_+jAHWWz3UUO{Zh2)H#$j_XeZ11+;bkG)bTy%Y1{`J{)S|y!R^L`Dn28y(JGsSfI
zrE-3f0F`+@mvuPLQ?+UD=CRgV<7b$zq-HAAh$p3K-_uBn4cPJu1X07B3k$!K95BCI
z({>!k<|2lS4Zvx!l`!+fE`_t<Cbhu1C-7JW+b6C1@Mk7V*xZ7tsZ$pPky~W9D{Pmq
z<Z1n^>(~c0Sjtu{mp|u0NjOkFmW!FG*catyPaI2+BXZaLX_S^bal_44(r&%cY$BFd
z%k4_SZo9?e`HHqPOk=RvSQA6o*z#4$L4>0uqQJExMkAJVstS1_>oC4UCF{}vyeC8T
z`7u6<)wyc0OfM!QZfO{rsnlIHizv(5sEeM<|9LeO7I<#RzkT%P&J_yrYvp&Ou(f_R
zj)&d0P_%(_VU1{TL|l;NJC4flWVl187ZEJfBt|Klw0qTLZ}vzyF7sMtXa77AOl`Y{
zOv+iH_w2|ZBA;v5VH80&$^_G+837(fS^IgPpFKO#I-$ut_<gg_i?VB85?xxQ#|wgD
zS^Fo#o+7oV4Yf3<sd9&Gltxi%#f6JySJz!y2}|=Z2|umgEhsB(HS2<J$)aNEB0ue4
zUhEu|9tbx4$+w(i=;g|Ct(%<9e*Q7~MUQ~`#HZmV5ZTZMk*Nv=zOZzN67L`jf7#S%
zJDGS{wz>j(F7vDUXQ`AaV}y;C_sb><&pcb`&Fjlo;n%{BM*`IGTl(%A;=&&#FRpVm
zcQ*Z2x{)4TZxZPfJ%$NjWC6W@ex28Kl=qv~MG&3AV$u3v-VX0K`oE6&>TW*Tz0S~n
zfAKG@Y$chc(eJ$f7-u330Ek6slikO)LhJBIA__q>*_*F<rJSxPmv8pRPdZ-n8!Z>^
zely^wy!hU2CePs9t=-i0QDl{(DJ4R`B4EZouqM#@Cf4#ToEeXu2m&_!5Q;Y#c}^4H
zZhM3|t8W92minCGj43FTCke|r4{Ls4I1wN;Nq$o=&y9AeKrUdk=vl&`#571I(3R^Y
z!wq%=Sy#gEm@K7$u1j|2Uq?CyKoW~kIdLs4bvDPb&kKMx7(mj^DaC{=bQm!oXa?r^
zR~mxU4!A2l=orm-iij*=dAre7A@_f+Ny|_YTJ;gd=6>3|Luk<G^im^Wlld`ZMg*1G
zp_-uau71g<=MIQ?<|T~{pDc-TtWcx=e3h|?mB`Qn4aiNd#HA5EP(dR(^*Et+RX+66
zn(^@DRDrKvG!uH+<wo3CR*<V>Ugf;!R6GR87|atU!?JsX`^2Uje(Qe?P|t;r-bjTW
zGk0+z)e>faWtr4)p$DxoF!nIz4)O(wC761|Et}G(;p2<*^1ft7F`Y-itW^At8$E#Z
zh9LdFxi$c2C^!hLF~`twnf~?`5$;{RweWsKnof&(*!UX^E%oeZQH|hirt<RXe0__%
zV;Xr*aMkp3Xiy(_NU|j0iRIjQF-8L*yW4{-2&4_bYua>;{bMGLpTN_r`iAxlxeiPj
z#yc+~{W7Js{J9dQ2auDHsB^Yu(Kp@Xc=$x$1teF%VJyL}f1^l8D;_Rp(4X<fi+0F7
z55^8hdbZuYcXLGi>|osP{;_SyN`%d4ijhTqUO0ID^-(B#94Hbn*8SlhUkGrouEy&5
zIoH+^u(q2mW{%*+;<?s$7>)ty)_}Ve>lJkMs*EF-Zk0?1DpXb$O4u|e&@1cjlADme
zunP_e3K0}&(6rze6v}_k)N8{$*dajOvKM~&6^ZPKE!r%9GRHf|%X$2#a-+|Gwv5LX
zsM|pBgE7EIhdU}ZIppIvBB_=EaTj2nUPTu61B(2D_n0H*p-48t0z)=3kODOBAZX0c
zraK*azX_Cyg+F>5DJI<zMmLpkuVWKthl6%wNQfqTQ(*tdN(RyD+?&qbf^i`_EVD2U
zkyIyN52~FnVZx~>fzk|H(RDBn$Eq@x{#&7L(a;im?>d=Z%s~2HfnWsG7H7B9jD7GW
zz;`}G1A7S$xqYL6viGe5#152AU&^H7YFgn*P)9{5r(}|2v)NAEeLpD6y}gsT-R$y*
z0$|3s)IUra->Ec22JNtHQ#;H%rm%RMtu56Ei^8RQ0_szM-~kvmf1-$a4_bG|yct>c
z7cS)pmQ3O-PLU-ZK1g_bz8bXqxPGKn3DQ@QbNnWGBxx~}pMnLB$2k!Qks5eF4RP!R
z9jbWCUxPjH^$r1=n~g4!CbS`<`lc?liDE(_HRyy(Ap+UlB2_X*1LSMRi#GW@nn|5v
zM(u7GeJp}AoAYU1K*2xg<Y%otpa#6G>_7c^T0M{2fKWp5?NMF9@y-D#UXh<EoXKU&
zv2QxRuq_a?hjvWI*@CVY#8T(2sL6JJ9o7XLY$y3+m-F3&ek||&bYM=p^7wBV^V0gb
zwSApHNcXJgk3I;Ac@srZ6oNj46a5wS(^InoCWh`Sx4``CXT24KcHFD(dGm8JzV<c@
z{>U7rSi`h*fHThZ7poYx?WxW83Jw-+cRE|0$HVaSx@_yzgXin5d6eY=AXnyjwTt^5
zrEjecV9cw#k|wLhIW|dQu#ub){C-JZgurxSAlMm##p`@Y-aOT)sK#{KsKn5ztUf^f
z795$VPF6lY@wyO!qIY*ZfS&GSTu@a%vC_&66KfhJ^j1db-ocP4<QoHO^f6@L(r2}v
z!G#pA-jBGa{=oo_k$n~W+U1Qc_yA$B1*wlc6jacJ<(A^;eJ@XB(}>Iis>Q7=G9$v<
zbu@S9dfF6VB#hMCEPI{l2mNw<nSr1$?>6f)BVX61Iu0=V*<f#uVO$zlEBui$HY^N#
zn2~~9R`%2@WsvLZ8;{=>g_Eqd5f>i~Bejf%+kr5XdQ8`^_!V*hsf!3{932rOIrTpq
zIPqpV5M+I0h_%}D)MaTx!g!2VBG)Kzcv41Cp>|FzBV5w7;xeb>pj}kxxqe!`PgMD;
zNNj8>MuZ(R(xEtg1tJd!ZcC(FG5Nx#u(kl<JSH?7YdkH+#m5MAE)d$DCGJ=f|Gk3q
zpgwlDW&qZq@T1`9zFb5q=g_1@U@~SpT26xM_0>0SC~GfMT+k6^vH~z3ktPT+%taF2
zVIxaOnKt~(z(FVYw}HdMR%1zE0I@qICs$w_^(excI1@lq!B)aHYC1PE&_B?EyQe!7
zhD`jE#CM>PFH!N<7!(yzt$rn(Dhr_GW{OFiB3eq$U=7kPYN1cOl;OtBYFWP^q!lW>
z&cl(-sp@`L(MnvlpIzHO;l4aT^D7U47^CLE^bx|uH%wQ+RbEWnoX&pIvq(uHz{xws
zb(gsG0CwZYTi+OH_R^)K)N%{ECsyj>E+M<@{3Yuy_wwuh!}UTJ$Co$RhwWhx`{iu8
zi`%?4najHJuO$_C^|Wgo<3^2KqqRWEv_p-`*cG#xM$2a+Pcusx`S7t?<^K88`7y&1
z>O?B}t$dg7xH96@oEmtxDX)@tt9RhI^uOqvYdN&~_5PG{rG=Ba05<n<HMo1>NaSG-
z2Rc&c&7;;xPX)svAn@>LaH`=>V=E-QHF?!{YrRzut)dqrP1oW(1x^|+<{{60Wxd;C
zE|zKmnDW94qCvg!UL|Y8hWJ=@w~p==V!)enAF$`J0cQ`U^J3!L@<{aIqKf@w+{tw^
z6kqi*9!yab1xsArn)8QcAJ@*1&?<w*tnDqslHh`!Li<%8TB@doDjT@>sq1~)5>IGs
z%rW`KjkeJyZOF|5`;}8^B;emY+uJ7seoksRH{4~4s67#xJHl*#yj4NlFvdxXDD0ge
z-TH+iqw*K;|9?Q;l=xmTKd*V1{y<@vAJak9pOOD%-4G$9AX!P#VbZBY%tAoWh>IB8
z74h5?#b`>5BWUR8ZYI$oz9&(lSBumNW+yEX{$tw&sj>dsw!uNJCd~N&RT9YiBQ2XG
zghxb5^8ZZRT$KF3plz&?Z@10)MUe3>l&w);x71t#cCW2%19>Yc0bf9mtX^dRyVN`;
zk@iuP>3(YP!ss}3^*c1C(4VRPQ@6pcf(#!U4ayi7@ob=Zm613JK7fFdNsZWGRn$iA
z=Hkp`0d#8z0wc#A^dmtP`%~w3K=|bU6O4-J{+J?<62I=eU%1*@sK-zifC69zPh92*
zgNVRoGoHq6A}{ZN&!fRrynoO2@hB|&Lm;ut!5NT_4OI)CsmSmO<O%7s?~G&@tqo^g
z|JN@Nn<6>V3$~53EORHy3@Hg$<=@bavM6%KK8hXm*T{n6R2*u#5i;K~R1)VXvqu?S
zHaJrX4<iR~5R1$coW6qo^`3u7r3yx2)dAq{jPe7xGtK($HJpiYC7i#tde<z~VkCa+
z&{}*fixy;|gm`j!(4yoJ%63q{cAJbSM1B13hD=K`fEPDUw>k2*ix&q@QOBlqi7^a7
zkV?H6yQ;{Xg$Y@<^G{(^aRwDEmW}{EF~5iid$Q*c2L`=5hj;gq6<pe?k&1~rA_b;N
za-@gXuYW@~yKIDr*G>93m*am}xWkF;x<xl)envS-gOq;Hp*>w*WV}s=_{rd(o+wqI
zw3-yg+QG5^+qv;l`?qtGo0EV?dnu2gnRy|vDeE$@p!C)C`Lu!eX0&)8Qi0GOh5m9O
zs_5d^ll?n2$2wP>raG}sUJ*ntcDSjP=MUVlD0BJ)(<I!eYJ$Zm__PX&2^Cx>l`PP=
z(*N2QbrHtPxA*b>O=+vcE+{+a(N2LixY_qBdAP(wQk%N;C8%(2PSF)^Y(wEt!hmF3
zn|2+r4+q?5Jl)J+RBnuD85&45-Q>r+Np6tLkX1;8!lDXp@v}P2K(tn0&S!fFUhjJ}
z9UTErav(>Wcr<w*ePbudYzTm!lQW`jLlF4$gjJjyIp=%yf8gDaJ=;?udw6H|U+gYK
z3sN1D-?6%S${H@J7`H8Y0nw|?*r##|+X^P***CH(#p8k_ryD2rR#<GQWMHt)*=<ND
zY1vX^ih2=97^8feA%4@IpfC=efhW#)5=sTKBS@q}B8D4cVsfb4ttBp)FvsIZgO@cZ
z2g6<%{IBj!y;_^s-Fzl+`F{`Jz*ZLi7kpE9*#Bbxeue8^@M>ubkwCx$-L`9BnN5!n
zWc-okp#)O=>wI2TI=!`E*z&Mj-J~!xlt#5f6sANySUz^$s}#Es15Gby8Ath#>C61@
zEBE3!oI=<Jv#9w1z{t-=+NifKdKnIu0C*1CP1Sa1hQy0VBC9VtLzRsUa4Yh)&IB=Q
z^VUzQ3XB1O<ka_)_1fP#%@&USU_nxFBSn%?_-{`(oq_C3q{KuDi5dhV9Hcd1M+~=$
z0d^*JRO-A3ISrjM7zpgpiF2QjE5n~Vp_G=Px^jmtMSQS^a#VR0QXjQYI8N+Upj>KP
z>JNanfF^8M5IjjWC8Gh0w(DKwuiLQ(E=)Ho31T2G@us)L%qNBCh!yqWT-fe@jU?kh
zQ8(k<11T0Z#LhAP#Fw~$0afJ%OD|er5>?8)={s8KO`xb`*lFG|<^Dd=SdW0?{;(M-
z*ToV{v;l}3ixfqY)H5R#X)Q2X5(DxxD>GYwWm)JuG`XJpxxInr+YE_JFgPlON4~@N
zi7bMWCyWdbB-ZMfk~GR8%(RXY5JM%g=lJ4<+CW#VCkD&ewLE#Wa0sc1AUkwy4>=ul
zNdnDhcY-Lrr=Q>u2#)%9>}RJ|N0~8&hFIvO4mW`dr4kKe+dVPC&zCzGTjseCSueN~
z&)Nt;!GtWos8HLQ7%EYeK1)cRNxP$VCqP;`YPm2W4K`y2<^VY6*}eWIBp^mL*Lv9D
zR(<VI7@qS`bVn<l&Lqj1j2m_T>3C(<VKL=H^aRB8M{e(QEBZ9H<$=Fj0Ay#0mwQ&p
z?nXuU!UMoGZUJdPr(#+EiFIeB@IU}E`VzElikR~EEXw{8m<qbfHO73QldqHDUf$c`
z2*@kgwp=7-Go`ob;JxPW>)!5jR*xAJ9I%)9#cTwMH%`2dYF1W+MX2bu@JH=lG}>))
zK}=;<acyl@+?C8l))o2CTXWu!;qrWZcH01|ZVgGB>Q1z0pK1slw%8VNjaL!jDjB?O
z!tHhZY7~QEj;0vXxT^rtx%W<Su7qMg3!lV7=O{J+LJ<lNrWFV@C^>)DUb;;U-({DL
zE7V*Xw@vYb+oD6bgy%-n<Can3WlK9YAno%f3NEv*ADMd?Fa{wyf+Lseq_4KGCYU!;
z&XkJ~+ium6?xKUt8xHZ;e^yD=5qacTDf5R*{{VlLtth3zyQWBDBxbocggEU^nKy2)
z+!05O_Nt>ve+*eRQt$%I8~S``Xo=^xD+Nm>Kf5EdXxif#wUWp&^!`2@<~j8wN{Uqd
z)~=|J0vd@-6IUhl#0YWhbOxIJmNLMDp2lUw2vm1~lYC!(a4(Yc`vxU0>+uzXbRXd!
zN+4nhTk<Ix6M~E3v|l5$-V`_Jyk$fcKd!>%t0{<^bnvxCdm@t*PQVeBDut<Ih$8KC
zkC8<7_*xHwds{GW9ywrNjBZ)fPq)OLoWcHTyn;_z2X~uW4IspRvQZPPX|Yb>@BCY8
zt>xJOLs(NXh^s8{hjL;qvB#Ys5#xcBUCaxdrh*mgp0rqaf1TtL)#$d8nCZVKaBvO1
zJ&Y613T~(!96ipcDRQEpVKZTj&6uMC^r9E<gw0>v6qEa$KW1LOKZ#SRNK7@E%zo#(
zqYR7ZqnKufY50f;F?j3szAAsd>Gj8-u4N1#$Gs-bcFO->*>Sl-ra<fNffPr%*jlUn
zB=@0zvg2attpSFE25y`Lez*EJgTh^r5J6E*Y7vNS(jpwC0HwZ!VPT<sOF~eIz&b4d
zGlAopBBDoL*{D+cB&&0B2AT?Tkh)$OC@3{FPwnVhl8ij51eKJT?3}0}6`#aXg46sf
zfqg{4MbEb@6{(eDf@Z)4y^a{QA(>>9jL$`B{nm(5N<oxB)2yG*I86I_R4GCo=aC^J
zqdc`Ld7QL#V$M>?vs5PJ4z0N1JHW0=uH+^Cd2m-5w|ArdeG85{mkLgeksR3v*e9s_
z(+9XEQ1n=VR(s65UIMqOAzfHv*hW-;^=%R36Ke+l?lXIAnaluo7x&Ao%`ljv^;<we
z&?8Ku_gs`!7IJA0Bhv7mr~o&Qh&4$mKEkgU7E_f<jP+|LNWilK)GMUs+{-qRaO^(_
zoay_YQhPeX8ya37C`R{O9Y=iJev<(Nm?jRdwQj$Fk>t{;Isc`ri<n_2c=V&(rP^$D
zt|!OaWj*v<{#Os*k<TlLb(O*V+0w(GtiXQ+#7+0sZr|kN)KgQK_3X`oioGsI|3)=F
zM#*m}KX);b8Jgc--T$Sx6Mjr8yJbIlDAzw1N5n+{W4FBgpm2=C*W&Y>;eBQWStP5E
z*!~_QL-D|0^&wxm2Pg29*sXy!lG0_Nah&^*h1m4DvU<L<I048NF6=CDQ^>4s@wG6$
z-HiA-5}4rE{{JN3D0u(x@(rB!|G9i)jq_jf%?C~<YN6=gVl)KVeWrc4Ms<v-WKH30
z4qXrTzXs}?uFa_!kH<X4n+?|CSIRowotj)f16Lpc+rA!VU4mme77+7s@MyUAl=?yg
z;4plvPn^xsYHMC*sqC@FwVAhLha@+G`FLa6MkF@^tFk`1liXR4zij@fj&L82#d;SH
zezrl&_T`}H)jPccdstp@vW0YM7`O~Y#dUH;r*=#_NFg0DL`v0mw=S|yI7bRa)w0#~
zR)9T-ZmKAQFaNGclvf6nuq^+n*Nr8Q8J2~-$E1x;XI!xy95KtYNrPH%B1^7^IRbCq
zsz{v&BdsEhNvLGSrmRUGcoXHw&i&J<mSQ|PrS?mmqk;>uc>O#brnCYCIV6b1q)@>s
zlS#?PhSh}d(eP~;Znej(_H4oJyo|0Lo?p-}yl_W+&X2Mrb3QjhGqPBNNKMd%C8I&a
z!@9YtK>rE8vHVdwuENxN?-loXa^bI$8RFgz?+CIw0|*WpP|wv|6If`pE5tXHQr?#0
zL%-H_EW$19my>v4m#11<%WnPeQ6GFHmqf>s;+n)zl~?(*0Hg3dl7L#zs+6G}2ckzg
z|J@BgCKAC?KWQvTy8>Aoroa4ZB|0{848n5B;*S$$j8(li`xpCBYd`g&4Bx*EODTB`
zjhfScqtCk=>hVLn4=}R+&}9N`&S!MLYE;Q?FLCwopG`y*Jkgk6{gnZ;^qlkQmpAZ*
zwMHXIxiQ-11Hq9BH)XkGe-E9qe8sfs7W0ddWew&v&zo*wKQEYKm)HmfPARm!e1%-Q
zztFX~-&y!PC40Rm4MUHXh#CM36>W{*+ONr`(7Q_c6Z9^YjJ`ZgAB}bUtoXR&1yc5M
z#L_=Z=;1ld_|3>*KF+qA$+b1A?d3*XFEEeVoKqF|B(%|4%wmI$or^r#3Wm{jJLdVQ
zeV<M5*Jx~ta=|fq`z13r>GX!FBM>%u|CRK4y5^tJV+A4(8sZUTW2;}Hc_I;G7)h4J
zS_E)Gh764JV<*^mE5|1F&CTmqR?d5M-h0(MPqSP0>jEJlJiHs&{48<e|FL)oL?t;)
zm@SzGH#Zx9W>+L>v5Ya<#RS><D0MKTYkP*O28L4PY3xiitb)v@=KsPXxXCU(?V)Xn
zixbe@*JEbFp%R78>}cnHO}Yuhou}o|Y<({Cei08r_p%1vo^hwXJzA(Alk7IVZza6N
zoL{|yON;A)5L^u34oAi1fBQH7rKfLWmS@SPos>PWX?i$Y2zm7Y(#%q$C-Gn>olymE
zgv!lJoU7FR(}Yn6;+;ClqYrFoDD8dqu+=J8)A2jX({i*1m+>HGVs6Q2AXTZ0Rs9U1
zFzX}(Vj#J>?qL_s7m}I}z&qT6aZZ9AF^8fTGfaHT2aRGm^9zipX(FZ^T5N1Sqv)!6
z)VL}E;`HZAXbSyz8MV5Lm5Fz;qaby&K!amdm1#h+lf4h)!~3^T)O}v(bM4|!&fV|U
zp0rDSLnayKkL(8F+KQm`h$P<t#j7j@Uw9qE8{k04M#v`gadj7?BR(HdJAgm*&6;G6
zL@BQ`UWq9DF^EaIH4F-3`Rr|C1KM^(g9P6g&aWhCOX5q@2bBSUa>zd%h=5S)Z>M%w
z8Aewj_-MJguJS@+pd)IoM$SI5I|_Mgv&JjWzl$yhU>RAuy!UBxxp}t8V>?^rCDcwY
zOZo8lUH_u#06fr?VWC#a0iyn?Cr=+HvaK;cUG5&<m|2y4oPz%u5No>^zkLw|QssA3
z!pFC;w|M($;i!1q#8Jfuv}Lh@XE+c+7GVzFWjm-Q{e$IrDt$6^*(n{@3pNGgBaiiK
z9cu<89a&<c5c7a|6>rCA8mVDE$zteU$PFn)_eBrrhK<aU2`kLsHiHl)!A6xpR##1B
z&A+%0u2#gg!LXCEyEWpBh;ZdH;`--}v{_brmbV2Cw|n?7@B>beYzHGPf(b;Nf;WHN
zH8vO?XZBX4EK3&*7M({-Um_qeOOn_R5%R$GdO9#0Gqh;LNaQ@qs5XU&D^JsOf8fw_
zZmNG?&|Cr_Fz*kfosdJCkF4U8#7%QXs#Sh21hOXlE#g3>G9=tnrl@6Y@5(3%ZKgS{
z*z2xOFWg6E{f4>s5QrL?P?+L%*!Fp$!~?_|-`~+4sLbZypPuS>^>{1(`dFS%!v711
zNf0!oT{=U2rP_B|iBx0X92BhZh0qjnHA+(F>oo1-;v8(z?<0XD8X}qJiO?gzpCg2N
zo?By`ycF+pbn?k>-5{vIGITEB+YUGF@Vq9<^1*!u&@`BYaCOA94=&?aYMY0;&J>XW
zTW4(ObZj+BVoKf@35mzwNZeM0VEGNsL2buB!p+b?Z15Xk25oHhqy>)ME(An=f=%Lv
zrAdH@R>GCtL0%}yX@of`WZl(qWUKS<bRPUX^4wn){*M^1_n!@iaR+LhkTJJnmkmjw
zk&S_W)P4X*uDjR_Sxq#N{BgtG)<e$Seo%f%mGdMS7M$Jk*@95KXceFBQfj!(RY-f|
zB+<_W;7_{rE$;a&FWXIC=SS{BxmS(S`!;zU-<f)n#iqzO+n}vbDTObD<3J6EP$CgT
z_J_Qx)8Yk;V$yJN95VM_1dwBCE@GTS$=4n$*o-y{8iu;XP?QqatY5slZ%x)sf-NX#
z%ZM;k|69tSY^~Zo7JaR$YVWu)aio&n^vj%-+=<?%_^5dJSN_R`wy-MsrS-zNoUQnl
z#1<vyDO<@6ifN~81rMq6_EAc%w*>VsFU`@e$Ttb--^&8T7Q>J0rzEe|A+u6UAHmDg
zu6M$O(JV5sH){gy6=H4=D{_^8cQvFxi`U*VEh%WHz|Sa_?phRzsz?z{gL`n943bQ%
z160zl3jo^fqr3_}8w@Piv{K=!s7*rd_P_$vBeLn$yGDgD*b@UHC?SYJ?>(2;PUFEF
zK>OjuN9+ipHV3Hd67_iQP)I}Lxt)@*MvYvzeOzE@z;2q8BBlRaMvNCK#2p@u6euDV
zdF%lN%uX!mf73>V=k93meT?(ErjBc!nf1Eb@h8gmo3y5YIT96-*GMsuwB{}sgKbxX
zjKL=B7^UF2lLi`t>sExyN$buj1Ol~(O+`Qzhi}(FKo<vbOK1}`6G6jRgGQ_74pn^&
z04qcxiO2rM|2<18aS1|0o%L1P!AOb_9&PqL0HOtgl%!f$XBys7LBr>}o7(}|gS#~E
zZbRe*G3RRBiqMX^59tJFGbT`{duG+h;D(>Rjsc8dupOm}C;%J4J-NyJ)6D%B9et8l
zq`u-omwFG~wKoA|xJ6HBLcyK7Unx0Mwn-4n(26*lN2K!fXQb>{#Ggi(3sA3>MHrIj
zK`i(7XF<oCeISofZik~U3v}Jc6EL4(pt^{MT%$s3!}PD+4Cm-faBpkJWOZZpwm-A^
zT<bXSkbH0LXo`^Jby9RULF)QID5^4<lz^7?aD1n|4W})0m;cOScHc0Ld>m>RP5=sQ
zhK;Je!Juy)yHPOw1l*@cbA*RrJU)-)ulA~bUS<l}>cX>9|1s4P>*2Icn_~x_m0E?q
z`FdY?&?d^zn#+5D*C`%DG>HZr8xO{q>s|Zw(^%y9*etNpn_panN;zOD31|+82C!S!
zOUuVEKec$8sOjkfbmdt*(%aTW5cvX^?bH5Zh&CB>qs1^r7@BOo^Yy^q^>=GS&;Joa
z61*X@5JU&W-yI%3H&5r|5V92d=dP|sDTLwq^62+m!P)2ec8pJL7te)ur(Mqpyx(R6
zal6lP>VH?uq7J77k_!he7amK<JP_Hs&jsOgis+LnRW(?AYYC}U7^;1Jei>yOUUbzj
z;^#aF#_Ydt+pu^`-%Fj`@E%lDd$6SB;Hc{<!fEZ53cAZ6#8{?Y2U6m=PGJg3@O#<c
zd@M4q3sWel+KQK~oxMhC1Z<KZ``h>RM&o(&_O0sl?ckVbkf@aU2VhI>?Ffi7jsF|b
z6jf{nJ1>&X1pO`X$FPm%03M#wpOVj5f7bv6(;aJO@Hm%-0Q6WJA>xWB-4bF`F8Gy6
zqAQGDy(bi0nAj%i1z58+k|(VQO0XiXSbaB@+9^&EnXOTra2wfC4wPz&p5RVxy;0&N
zfD~kUY<Z|-zktoE<nAlH=7z&t>RPP<>rXx5X}d<HOfet`=M4=bV{?Cv*((|u`6QbK
zcWN}+JPR&}4)Z1{vqqx3q;aW0U=*Z4^%IyJsx%Y)1WrP*uZ2&7hooqXgLqeojh4tf
z*WtoxlB$S6^Ft-_B@LH3KG`h>M;m7Rrhs*(s3*6?FEt1MaAWr`(V?Pq$$ZOriE9FP
z)fi5d@<?i~49gw{QPz!<-5#E8Bp4vNu2v8!S8XV|rxKtee!}7SBxL|*q7ijfnV}Jl
z@evuoK$KF4YvQ-@=)^jjU~O*IIR1loIxx!Eo1XC5dmh{2$rh!GXo=V!7(AsT0Wv)R
zkAOZX<FLV1BEYZ+We8tbgZ;4$YYQNH%1mI0?_=Ykwk1`?%A4Ez$z@aqk8liLwr9{P
zF<9^>k`lZ>h(&nqUjscnL(BLx64ZMdVE%DemK|V3QWo5p3WPbNlMJ|CpiXYAxj><`
zuKR4lbwX;GTA+@E0%4#`0h}6yl1Li4IuGLDsJI;iD9X_hV?wm>kSqmA2hHZlXnoaY
zM#|<Y8!JT&7KSwGVrWjzB}iq15Mp}rfzc4Y3xCK2?>HizkiGsSNyTD2!R;9k7j69F
zcMFYZ%8*+L*XIhFCd93lb2v0XR$LEdfx^NHl^j&;FwLhXep)L#`N(&k4Y*{IV&gW>
zmj%YVHC7>?AW>7TQR#p8|0`OI8CyV+-=aB??18&&o}@Yo&Nn$qaXLp46i};=n&~wO
zqh(#vn%c@Zr=@=b>HcRIso!`hQkXx{ZViJ9z0~n*oP7gw2LOU^@BW`Dn^c$u;}@26
zlATCUmDQy2r<M(L!&V+t>Z3UnbEyoL4W#y2m@FV@it3xQ7GShk&oj|5e!>hk!08KN
zuZ&iL*-L|685|IMM=8z?bFKX&tle5jJlO!_F7v*|66n{@z2l7HuNE2qFQ`L9d)AoX
zlb>>J(uA4Y?9G?#@km`$3g*2UN2g1DsW3G;8GuOjF2fP35*avmQ)y;j-tY*i7oT8j
z5j$R&44V^spABm;Ud%}{Jo~0$K8?u(Jn`xUL{nJ`9PEi(=+UnAbe~7*YTfZ*q=}Vm
z$ceV%OoKMwqTi3CX&-Mi($dctxX7xHm?~d*p+^_p$N*S&#*B}EyVdlle4}~G@EhO@
zBEoN!moql18bB=Pt3l62cIP!WJii2P0EU<LWhUSQ;v0x3dw(^vXpC`Dg31=0GV>6+
znn!7?0i4>xLm#ms3`JFNiVe(^BS?f9wXjISqpoW1imAepA0oB`nL4<L<8(P@3I&l<
zFAoh|C;G#t*ci`P2^kRm!o%^ewh03EQIu|}%U@4@$hZL(`O**DrRO<@B#|6*_65>A
zB4;2a+K)ga^jlU^raU>duH+Pcx>^a(tc4}|@;oBCc$*ADK_9nhlUCByEHU7iBso;7
zHt|jEj;Y-*t^@DDCheH*y&Yf~5F?#ZQA<Cj7zD&n^Y^-MHDG|dZ>{$$viWN2nnnD6
zcsw<vlx-LZY=D_ztiIoa+u%C6Q5r3l#C!joR$<L?g3~w5qC;oWV9ILq469VDaFuvG
zLG2nRDxQ^-UfbjT3qqZv<gp6yN>f9os*n^nPq2IOU^wszS;bVGdeD?^u;Y|lQ4HBT
ze#=wm;D5+04+A&9xve>%6{w5fWL47;V>4AQ3{ik~)T^>46KT7Fx9OjYm+#PBo(uJ3
zuQl_K#lb`v$Bso(D0DMTux%Mgep<rbT+y!fiv$(}%8w+?6_Y9&pX^l@i?175d?2d>
zBsW+Kd?^M1+zIWm40H0WmJIDjB<cs(CTQIJq0($XO_N^2!fY8~c>nuZitZHf&Jra{
zzor*|bUAG3yigvzfS0$xJPQjAW~bFLy11IWn02WSrs8ut#?-2#TD!~eu3+2%L%A2R
z+%51c0Ij}A8!+M5=ncN)8Y0&t?}oIN1c5(xSz7O_@$HF^o9}wJ7Z+NTOF73dI6=<{
zXBdPspEXVvZWDNXP_<HHF!9MMa3tV|9$3J-qhsLTSD`LEF;JZKLt9XvefB*_ZxD2x
z;BU%szra0osokcsfGN&&d1@sChnht2Pxf(@Ti9|A(%{z@yRJfW1vho@SHl*Dj3l4e
zKOqq$zsSd33Q-8x0c9=Vo`ns24^?T+EZ-~Zf#c@`Km2vYvR;*SFwcB~?Y9MZZ#RF^
zkR;OtCBV2AEi57la8$#pX6SZs@S%2-8i`QohhW1(oMdG~qVAkQ)@}>NHBDiI^*h%Z
z^eKsq05C@ctWvf_cWVMT9MUN)1hWIFQMg2L33|MuKew56L;9r1moe(gspehzh}GkD
zL<&sfv5DV#-DWn9`x1BWACZv`LxcsR&gV4|Dd_YgRyY_q>D`?Wwn9zCXsI;`P4h4?
z{f>-Jo(8L+S&fXV{@w1M+O<;$A3#A<e(ksxrhKpT+9tdm=S%SYtDO!GxU0+yIw`6t
z|NJsi#$tzx)-U$We_dm?FP>6s=$G+xIcrAvC&#+9kdnDx6<?ONGl!A5EStL%%~a?_
zYs?4^Yo82_9bZ?~PLe%tfoL05lttsa>F!eM)(-zTJGaJ?9)y*IYu@8%Ti#Tl%8C1@
z&$*-cew%s8G^k${AZhwAE1+d+-e>@l5pB>1ym_P=g&5YZI%R6l5utbHG+%pLp@b=v
z-adHuO<qK@bl8I>Ml!iYre(c6Z=~y&4}Ypi%cLjAqORz_i(1`AiPnB~IkF7z>(T@~
zbS1W>CTc(bNgy`7LTljwjBmg^F)qGF%QY@TU0{O|F6EobfM_yX=T2HlXevpxFWrw^
zj?B_jO*w;<lW8$Qg<wT>bmh`(XbsBU<s^-IBl}0$IS<SYL=_C$UNJ~k2k&Gd<z$QG
zV0X`)5Z0u8dxF!+b03B3tVVK~BCSOH%fLt<D!6Fdhq2`hqEGan%FalN8aw~1{=wVC
z0rSq!Ja|g!oZS=Rq#%cICQT$s3y}PwRi4e)U%QzL-X@3d`J<_CrJ@hYDIP{1%ZFg(
z|BcxZXMs6)>q$z6(QNTF5i?I_$Er2>?0-*6vkUC~d(PhaYbNupU=r#Q=Cto+z3h;>
z`^VXd%xPFC%?*~_csbgd+PD^W>v8w`7%$Uxm(b=BjufVtT1sQ>DPVDmuZASnY=2K;
zfPv|qD4ma<HTo=FVCaxJQ@O|JjJlBJcaKGucPPnk<LysHb2E`@QoQ$t?)iasejk0B
zQjyKr%YH4W2xjzW-nD8Di)Y>#r1cW$4TX_>=yjjtIdwMUf?wcDhS-Gchv8E3io6WU
zQ41@7a*2<b(-BM)>wLzeUhkKk?{v33wzd-bVEg)g(ay8{MT7$Qd9s0pk=FzlC6Ng2
z&bQncc#$iydmUnhva>&p??0Ukc!EZ@-fq888F}%Ybb{NC^m;(s0Zyjgy=v|FID$YM
zY}0TqPBe9}a{shZ%evK}L;!Rgd$br)NrG;^KSaPubL=%y>H@rtCX(~?OiGb32t$&o
zwQ^0BWZov4GFcC>VAG0Hv2VPCnh5i%)z~|SZCgB4&g+%0@glC%js%&5w$%Z;&Uy@|
z_)EeOgD(O(9D>9F6}H;1JLXU-JzN)+eCY#(mL1gviT;i)LYsBQzzRlw3EwgtmvF^Z
zvokr_Pui?A?f4h9!w*4G??jH;b7Dg)c^7^}uVpqJC*M0b7j<$L(x<k<mtv?{CV*D}
z4h?e4x<V&I=QsEt)J|<V(LdDA<88^$Khuiv?@BVk91GF?%OJJ-*`C*{27MRLltBCI
z<<u>FEtXcmY=_zow%<3{rH%qM3m!xJF<~J32*5yU1tzZB;bh(vRAhCvjKar-JZbYW
zN6O`Sv`UW|K*=0l*b~otT~>D-xUEOU!XwvEa<z7pZ~h9egTIc*Zr&3@*a{&%p;l4w
zueRx+U@n7QFZZf*_`Zo`yo)8!aVc7QI5~gQ3U&x1q_~1H@Tw$!KahX0oyk&Z0@)gl
zsgwfr2PHadzP3nE1%h;JwdLzv-0z#U5i54eDSnWI12B4T9$hc-v0t5nforpmD8Bao
zS$YFNU=}n58dNH_(x{l1YlHGvuS<5m-Mh5<tL9s&WEKPrm@`F5H^c=}guWDYEZK6P
z$$MA<20P-8o6pKeopcxec<7Om#-Ux5$1w|-$UA&JJkFn?!6HF;%{L=rEb$a`=h8H_
z{3JU3dvuV9lbVtTCYE%v_tmvWTao-d(fgmK>-T$#ZsnSbsl)g&K8r-`%wOhNnzW8t
z_nIj!otYnI_G)Wiqka(rU>ndSy@8b=cCjCjDGFV?eN20poZ-{x4E%(8H-J8$Unygc
zok{oXKuKsZ-!GJC!?%R)cV)enJm7M#*{pj$Rt>I_n+UgREuE2JKDF5RzCwfB3o;W3
z>^apbF;#9O=+ko|2p{{1+`Mvh+aZV2mh?bv8OPi6L|m>*mCDc33e9zj18FW6Tqs4R
zGGT>M9RdEufu`z*BQ_KWngiM!NnyZN)O^H+GBO~ao@HvsTw280{tZwff8P6Pq{s-q
z%pPZVqt9&dX*Mefl!WE3MFP!i?p7f)L-w=LOp{nZLdd%xoO6pC=6~f~8<pmI>J9#)
z*|ztlaAzluVCXY0u@+`}W{5B!nWC^goL(c=rH>V07o%%?E+YQ%cCc_0GM@+CI8MR~
z$T5J<*`%EROy%ceq+hNjAss498nsPA<brSH=;viA|E`=QY+?G-jGp4(t(rv8-i;&(
z0_#LXOPsnjVejR3f`i_)Ro_2P)vw=f6gA|rAM{)W%07qlBwm!)0LTZHay@J(%D0G(
z#grtGwRHj1=@_RIhi$6v7LIGb>JZod5#`I7`5;I{<3k$eqUV^{pB>pU_~4xBi;Jf#
zk_o5C!@)BV#Q-{X0hBWGiPb3Cs7h|`JLKBT38g+%-$4SH<Vnhoyj%~#27nGdJC8sY
z__T51Nc2pAgJ_huFbt#>^-|<4M&EWOD6e7|#1<>2d}plt7Yw=vxRvA_7#%(xI7OTL
zBXELOH~-2l@~!S*E|vtgqrawCMS&e3@lqT3SJ|>I(dUZE813G~mC}+zF`df|r&YM5
zGsGZQb>+@Wdrt0hqkcfxr4m=CGJTip3a2J{51{RJI#39!LfD8zI6;k&;pq0K{3U0b
z+4mLiqp(-rmNI|a4{07Ge<eX2G^HI7dzwEHo$7*sD08o0XzgRTFz(P#oQU<Ffo7bh
znV)UAS$6A)Fo%Gm61A7CKaM$fS`h0IAl}N>CPc^i!|f>WLHCAjbs)euscYi*y&2Ff
zwq}lliWX6o{lo3l7$H_&NlOvQj+5t6a*N@3^sR+?I2rMEvRt?oag80w=VUK&9A0!y
zv&x6u%qNw}9AC_i7^iaB9fp++V3w>}QWTE(y*Qrl!(zRkh<mrZ>$}`N=8agj7N(Q}
za*Ya1fmVakKK~8|Z@{N87?$}1EFsuIj;NA+6)|`hJ_RxSZfOs|M86tLkc<6h`G*;!
zP6C!f85c8%1;&=XV5JmWHL+7_)K+!c5JPckv~+%j!$n=1&1UQm;;A0l{C&itx!?n}
z-KP4m$*2{+9Gy@9rMFtSXWl6+ogXvj`_aos4gPhl2bp+k&D|ZlR$A};Bl5t<VE$`C
zqNb|bVm||PJ(4R(EF@kB_H6?oP|%CL3&GVP&|99B9a8WEczRe+Y}^^}SPUyft3T4^
zG(z5tK&uomN1v!q7c?1cF|546Ag`NA1a4dkbiMNWjwsU2<xN6|NTBHqnVtG`pse#U
z2K|PWdVPdtGX5`(=D5sMPl3mZ*-wm2ZT?1&gXNcv@2BY9bh?ULZ|}Y47%x2qrC1WF
zjIL55k)kXO_p%)?7pq4uS$DV=3aq+Jj#3Z@p#A+v8M9LWa(&C+kmu$7^tpOPXrBkb
zmR`cDF+KakJbidxTg1;RRutlH8b$jP5P9tdPQ9<;RJv@sdEE}6r<_0D&eqcH<uh4T
zo^_2b=1zN~?=g>wtOJ+fA}Rvh54aN~S5Qz!Ujs{QeMH`ldjmTj-liI~2(u9#Kyb4_
zB<Kya{`e{N5Rx>@;i5?*p2R~!PRh<uc#Ekz7VCky#hr1Ln^{e6mDaNDBhSPR;y}K}
zO4bkRdqqs!5uI)XoIcF(svtJlW^UG+Uq2vNu)uu7Q-sppDNW`s61X&r4KZEsTPCUk
zuV3JVaL5t3!w|d6jP1l0ACk_d(Vn<(j<ogz?qJ%^+PR2H56T%~F3Y<g<YK9?u11)#
z)HI5%ND3C=20z7zRuFOItCo7igoPic^^M3^6^8(P2a?6nZynI<TR5J4xwbPeHVIkF
z0CsfuC3$KsqlH}!pnz6jNtqk5z6FoNGT=|egMoxJ%|jTW&&7xg6&S;JN{>UX)<NHP
zY9U@E2tzdGMZ^_gVvsqL<rph^dUX(c1-g@i!sfvGc{bO)m}_A?M8(N|rlU0#zO9^m
zMads}kJn=fhxV&%_n+x~l_%ku)@NhQAwH63Mr_!xPG*5Zr1QRoPNNRWUSu#fP<kep
z6rfL9bHRy+VDJ*t=Htcj6gQ$$g2ls4*Y>3>lAvhS_O3c~K+a>pj*EM-R<4az6OM@T
z1PI$rV&cWDT`DYkQhE4#*n_lUrFEu%a*zIaJJa$1csuGW;BpgjaAvtN0?JNZh>?Dm
zEZAU54S54{$#se&>MUeb<#!2M6^v5e!T|e77#C5KcW|Zlff#Lkc0z&q(m{|`T+$JP
zBfMFE=t)*Lg)rs-!S%0L6crcu_DmWZUa;Y@#U>y{6?8sIEvp9qt`m?7Bn!i1c8}ax
z?tL(<GgIha0Gd)9ocjrWS17hoq*u(Vr{9A7@dcR1hE!J8FcaCNMjj)9&N>XSW}x`d
zb`ZFy-Tf>FrTh?V)e@{ApPH(fG1|I+2>~P{PO;XTBLAW%?q$HnFm)9^rob=&;Aw2k
z7>tGHg{e~xnAV^GwKV01sM&W1{}dcxA{<%eh9D$wW-4Q~1{mIyabt)_$va%;0NlWw
zDoXNnDsv6)SJG$L%uf<h69E8wkni8%=>aQ7G`L1v+R$Gh%(R}nbpPyJ`1v|8et;{>
zQujYwe^!G~B?_f1v7iY)fRi3w*2Eh~J-b($l0YWGr-;#JV{iEGBV1Q7)Ob!-UE1cN
zD&^SVeMs<c-wb`N3X@#LtBQW=Cmn{g`psBDSxnFwb(M!O2z8D}$@-~^1CXe$!`uT4
zT!jB(`BtS6<nP#4X<=4xF$2bWuwiil#h86I*mr7OQ`#HZq&$SYimJK%+vMX13=~G5
z{@&t*&$MG(*)|S|ARUq#<mD?5q16s%+X{<v-iAjm&zUO+p7e`=x;r!LACm$-tFFUq
zRCgM1n`&9Kz~1y341|K@u4F_?gS>^ux1%tQLfhi)8%4j)ai7QWZUGNFo^6lX6g_ZV
zy{@-dg*5{)L{%MDg0EY}+e3h)gCgPZ+f@!{+u4J~&he@%7Z8lEhGSJ>!Bt0*q@Ex?
zz699Kh1`gnB8t=-0DvVu_=%4j5GA%ii{-_SWc?CLk{LbiZ+Z`{U{BzUaDqF7=N6Yt
z7k7f5uhLh{t3h^%F~7-_Yi-{oePFkH_{ZAeoDN{lgaXdP1$y_QLNdkpH2PzZR}>KT
z8xV%~!HdNm3w+u?j)5gu=AGZwCP2Pj3NLOH5U<oWP4adpm!r+^`b%%7C@9-&t}mv#
zIXJeBBv1g<)+#tqER_SZD5{JIPF5Wwh_~oSk1*SGBkY*a?lL0Fk5;O=R06me;uHW+
zPpBLe8oHPm>d49f3517#=1KCr?^9hFK+)XHbxy;YqmhSzIz`GHWX^M|Q!DH8DQ<8`
zDG1DM#W{3({||}zY_Je5{qJ(J)wfTm*{^W~m2dIyyHtfpT4(svemS?h%5~WQiRj!B
zcjN+t7Lw-Fl%C9t919vp-iIiAI6hvh-6D%KY&6|-17V5%X@~)3BJca~Lx{s{0Ziam
zvTu2ax(|%?E&TZG8GAp0=zpx8=!UJ|jVH7G0<SXPo8aGmmHghhIRtLNW3KBtWDCIk
zFo{eVqaM|DEbFb;5oURP^y<z==KR8)2@^--8btL8X*bhetisT@y=Iy@uGl6_;(NJk
zn*c9?+?kCuRsVyrdu+~Z?b-z#Cmq|ijgHx|ZQITr+a0sxj&0kv?R0E=XRY--?|!wb
z_8*uZX4R~^<~7E7oT(!kEx9&YCS$ef?*jvqipgH0QR)MxS3o+c?d;l$e%HinwUWWT
zuoKlPlPe}g@|MPR=$_(JyY~j<Ns_-?l0B^}o7T@K&jP}%jvZI1^AeL9rz>?>eiiA1
z&=zgbC#P{X>D1pI2=$GqMaT@5CPzGZuTcMi33e<~pv^I3O3rqJ9Hu83OQZOdhv$&o
z8P7Z9>gYU$XJOdk)H=@jij&Z!l%d}wt8AL2+yKTjyu8S7NzE6hmHK3{ldeJdLvmh+
z8xLWU+}i?PRs9wCy4C?RHuMvDeN(&+=8tp5-_<k(Rt2u-jm6>hnuD3e!Xaap@qTJQ
z7I4P+Jf)v)gB5IS?R=K9OH#4<!We_b-My7=(x_rs<|+zI80j?Z)>O`uS5Zl4PZU)V
z&cr;$rg2u7MlaFIXt5x$Ny$6y?*?x&&m$CTT&;o0Q7kQ+w#X0M7V}}s|C)Y=nrT&^
zXTTUUX{J4a5C2LZ^&UpI2M?DLGBBqeURk^J-h^Es%ZJokYHY`F#%=WHr4IHjWsXd{
zSfjxdxNcS&*Ck1Ay23;;>m6dXW)b`bS3iiEAJC!Yw5*I!JUgVv54)C7y;}Le=k(%@
zf22$X_qz}*NsHwKJvZi{Ea0+kLM)pA8X_}sbq7{mftjWe^pYx_H{1~{*)SxNQMv}K
zC|is|ACtB9eA_@r2B{RF1r<?h`5lz48G*+Y93KLa-x_kXe5?=k#%F;FhMdYs;$Kr|
z?s%PEQC37@rbe+espzK9pmX~4K%;1Z{b0Sh{64bAv2<Bpx%CV?Bd)}nPqgny!6aEc
z_dKZPi?!t|<!3SM=Bz%dGpCVbu`N)|%G&nTIGXZfuWie^(>J;>sIV_L@7imm0GgIt
z){u-U|H8nqe7*s^-sf}lDRgYFLCZQvhxVm!K%Q$m4lKEaF_x=IHpcGF=Y8{V@3d&B
zN-w)yC+D6b#kS`so*@yD2#=oDMspK6MPxWr2k?HOAJRIFAwuShhC251e%%@0#9;Pk
z>;1#hsoPeT*Yo0^z%tCwJ$~Kn_Se^ubG1RcjL6*eGa{D*%t4B|PxhUVJ&(Q3^n}OC
znsNNIQN8v7S`~w6naSB*M-BsAbIoH;@$PyO;uz%mt+l=aQWiU~?D$bW(jw>}CbTL^
zDnCPenv%E1Ynqkrt72R!JXM}7GL;qCN;2TAEV~iRE%-^Y)!4=aw-P|iL`QBhC;eyq
zwlz*Pep=`^M*Pf32cIxgkXcdp=ZrVis4b-TB9<Sq)RCOXTR1+bw+_LQAEe&HLZ`L;
zslChXPkn=r9E2ZvK<GB>olQPp8FTL_H9MA~kkAmVeczh4QnMv-5ZKl3n8&K><yf;z
z+e~+ze8WU@T6%IUt=|e|KN@cy7u4PkfDjM@=yI*V$&Rq)ZekbqUY8uWkD(%VW5Q2g
z5WlX=YO5C#XOLvmqZ_JHvA16%w)}<!u7x(aocM5AH*Cj?&8TFFz6Dw^U18$4H~S%J
z2Nt;UWz=98|4i6`ty2$Po10XOZhO9ON^s*xKKYrP@kc7PLGYsa#i3DlL~0U2!cwEz
z@|IW%GT%weu1HfHTYv!p1ZWQ-2B>j?e-j2snvWOE{96hDCbJ;7hqf1UAl_e012xYu
z`w4#Mx5D7*j-QTJ*_1t5F1@P`0ugga?uk_3;Oo-U`0;h9+0hAVX9W;c(Gv0CaT4u2
zNt}LSIqjN2KDu!Wn0tNjRb?2aAQap1V`r1UK78U8cf9JgY(hWm3kk-J<)Qw!2;QD;
zBR`M~NTB!-#c^UTsBx!8df$T|HHTze9b_1aM$S1_o99vzm38Q~vQ8zN7i3T0iDbTZ
zpAaFWpdolQQzUo9Qqwvl>c?I^?XY=eW+|Y+vt&yf1oxbtN;*C}{lZQZKfZUmFWF(Z
z(}s+4rf>PeOF&(Cv+}vQogy3#&>8P-s_k9!q32+kFr{LF-po=Lh>bV0!q%I8CY;FU
z*P3|26V{M&bak{iTsM9v4BUo5FrnKf?1XjiM;TRj#pZ1DP2mpolya2krpb&WmFn#S
z`&I*|DUi%zY#|Du`-Xr(!Pc)TElQQ&ckhGL(Be32Q$KnjIXPik=DJChQhtIuIx@_j
zyO0n^D3CawRA#XzMD^g5l1BAM-**HnYq<|@IYRq67QM)TAX)4kyl8DuKz}!w<B75F
zhbpZ<?wJxhnBKM_QWs8$91F?!t{$3Q_;|8i=#nNw&@TRr3DY@q-9+B$b=*<9D@D$(
zM9mOg=t5_#Z&4|c0Yf8Qo?Zo;r-3!5w>{iiW&lh=VibW~sd}Hwz>x(=BAfV3it-ZJ
zOOLd>QP503w1UO$rBFn(A9HFbSU~*g<jge9;}Mytl@v2d6NzK2!!IbIFQ6wkWUI9i
z3#xVgDdv@Nbz@PA1}9|L%R=km*ie3$=wk)W;04>Eja;8DTb~S**+xt@90X6#OX~F+
z^+-A)RlghyGRKmjTdWryX`3>wO4yq)V~<wD6k@TKuAvD>Y_6xnEH-SD(<_pdoV6xQ
zOU~G-T?-63EBE%H&n*WWpvlH2+32|Y>!S;6Do8A~`N1r%p*tNmN+;JIyk`Ckx=gg%
zP95-`dibY*WpqmdzFtOL+IC!AsJt5rHeq~v@5n88kRzI*K76h}5U3kTD{PU!cBKM!
zqvn8hqq$l{OYH(|p+BKGHXIllY_7j(Wt>~YfMMP`J?q#(t<o?Db+coEhHJU&B(?Hd
zQgYil>s7BKb`mM}u*U7PBq7{B7&ayNtblBef4o0L;X~hWN?E@eVGz8~1{$`5lw(?P
zvgCmF0-bT4`x12{4JUUhdGC%zB9bmotXTcnf@*=Xd1It&n+i4Cx@*BXS3$c9B}e3W
zvKD2^+Q6HfG=Zx@r+VsC3g|J()R8bww!h@fPf1u4f?(C^qd*}lvToi2Bpp-oIj5LY
zTlIC{K;G@Ho#p5?)Basgwc~!odH7i1C%v!GNgk}cPz0AD8txc9dg-%Jo%{$#>A6s)
zN=5RQBQZz4UvAo<P_JF20{uH^o;5KRIwy#@*>FdRFb~Ye6$Tjr;BVqyROv#<F3nrz
zS|G_DFh3uy;98j>oE4~GY57>pBbq*<LdDk5rd+9AN{lBh=;S_%23<OqphHE)CszVh
zR3zX}maDr>j-kL1fVAuT2|KAu)p*)e=E}BYZZs0}tm8h$*?@PU8uNv#fWXVssm!{5
zyBxfdl4y!@#~ME9Q<PM$aG=8J#bG*7Py6U`ma)_&Q{}%+on8K{eZ1U|!CmQP{P?_7
znm4MtB{tF!Z|oVgN<;~X4Psd*XEgJfy19-wDS^W_mWVS3rJ%TEMyzg000RLQBJ}Sj
z4fOv;1cCEJZ?<0huGZPB)n=>HWZPi8LdtR{`Ex_3q0CZkp;G6y^bgOH=9%T;66Iw~
zCL(MhO#JzdRXPTBw)Xa-`^a3A+H|rP{`=PM_tk^%D=y94q@KZy7DEl;eT|SSFdp!3
zp$bQ-hiMXd-H1uTTnRlmA=oq|Z(GTdgH*;dmAt+Y4hg!@vBn8SeG@9}`_~r34B2!T
zQmMU0Jjpl5u*XQiw$ei5-GPt2$Z7G|atKi0wSMVI@_F}D6*;c-DQ0bH|K{q?Wg0#T
z5qna#Kf+se%Z^RZ{5C@krm^=|2TFM$B^gDOTE+D3!24?^vvCG}H`^vH(3JU_D>7P3
znv^gr;4UsEv}dT_5T_o3ta@*dEc6FKNDC-XzZz!Wt<WM?mLBqC<HV-D7Jii&RO_KY
zVFEM)7f^~P{s^Mbu4s5LUuRPH5H6fNqOhm^0d(pVfivUgHF2N4<iV~8y;N<6IPa}h
zW`8V+a$@6Xiq5#=Z($uF!_xL7c&P+S<y5lH>k&mA71P>?St%(r`Kn&NN&wn+(h~Nw
zxfM*8T-^7PuU%|*^KNf!^q`SN@N**_#MiNb>-cF<_q}*-C4$22rNpB&Ibaw{9vNBa
zs!q$*vh$8Q<zW>?toU<n=zgoSxv+@$`&E1H=}#t#{@SF9aDVG`jV!gS2a5FcVHHP2
zJKue%8%Dg97`L-(WdnfUxzF*&5Z|j=eU$Z;0kEtxx*fnZ$KLZ;xKxOQXvTFPS~iXI
zo1}O6iIqu5Yh#`{X-Xw&6dshFP5qO;u*^xBZ*Y`c`{ZCGUle4{;+eLQ%&agHyRwKD
z$iB!!r;x?F`9g1Vtc^xb5hyo`*Cc;qg<vY276C#^_El@pTIZk&?DE*tbc!*e%MCF2
zW~)RLsz!w6FdF>&>N1t9Vr*IniU!OB#9EHPS}eccjB{axdpRLueHa|nc=~J{Dc{40
zhgIDjCt&i3YeT9hIK8_%VR<*4btXmm6Sf$`5vhC5pmT5u17@&Jj1guCPaa3`nY(_s
z(%Tmh_@G~}^+?Dw*u%T}P}JKabS)3;Ex%qD<(6tZJpftE>N0MkQjp+i>5<fJ*`WeD
zxT=&Sm9-S$6;nV1S#TPy8?)tI|KeyHN^VKhq+7yweScqK48cXdGUe_HU%}eRRy|E8
zSg=@?k}>xS=le3`d7yOApm{DZXj1124FVk8FEec%7*k2p-*B*Z=DxI6kvb>T`>uy|
z2x|{l>1nWTsk-G-PnDv$aRa<IV7UiTXcUC2=D;#{gB~xm7<4}+Sj#Ih(gct%@0Y6y
z`i4FkL#$)e@K_fc4cc&xF-;=jJ3?x6c+HpnzQT0O6JGvc;PSt6fWg?FvP$d(0ix8O
zcs~ec!3^~qOV3ytj<pFvl>)#*j<x;I?&G0G%=ez%(8|_$pJ57!Y+Pa05%t+wt?}kV
z_uvaZN^S5DhDj_T#2T=zH5+DoUXmyY%2d;@$_iC&#0b8F^@&`tkoRkDdO^TTeXy{{
zIl=$fA5rGqF7^m=No99cm`4eMzl`xKWbN$yCCcVJ>RTz=w;D!4NKGr%K^c|0g5!*U
zwI)FJAi9m7;Hzw6`0;gqH965u*uk<{bL%!xz@%Qd0Te>dp9^RXe;|=(VnND$KY%`?
z%Lz?4G=HV!QpBY=rm{Gb^mYF=HG!4IPK<(Ah2Azg`B58Kau+204i7~7Ey~`$4-CWe
zhr$T)#ZB!X%CBEyGsSnT#V)e~X`_fpNWE*eCs*pJcow7F39HWW+Ddz9!{<Z0@*?fK
z3O4kh@FA}C1<IoYf{9xqo|v9-lO#Q!rD)7Q)O%Y%iYV~8apE|@oc-^}j$O+7vW-O+
z@G?XA3NX#vIiLW4r3#CRd+{~c74q5wR?WlwmI)emL{}lqm28?--nc)WnQ|QLMn(k2
z;nW#}w(_|U#>yq}mMA0G2WS~`^}xN8^CaCsEnW!njt~^?vV$76RKA^3n{Qk`c6Pz5
zRuU=Ni233QBrZ#cWK{47E{C-=Nh1~1egw0M;btEQ^-@L^-Ukj!N*bAPnI~|a+B~1f
z@(<??jQxTW+O$gs9?4#afI}5M?U?nU$O(l#R)_M34%7`o^abBN=J$;P#Jl|rr8+p%
zmC-)_zN$$p?I#sGy7X!%(!qyYz$1=NtilA2Gl4Ou;*sid0G*d;(zitjN=y=#gz4c1
zBQXW`!rT0d^BCl-Z`Lnxic^1S2Gd6(p2Xy8fx~oKwqmZQKH(t=HPK1x=l7@YMocqd
zjd8@^i5d$p-3`vc4txa~34n^wczzDOjtTdwr4s+i@MC+T%mF13n;8b6AwI-ay!zz&
z!hnRK<5+rFX{0A@{cdib9uDX>LtaXsPcqZg+(x=~N6JLz><@+B*Z2y|xjdJgeUBNr
zlwb|g?2TkAutpW1l~7J3^=Z-B(V_J(OmDU;V@?h@1j7|_u>A#atR$r~i~gY+d@L|a
zIF#%dq)JK$HlmpPHRmyof6*z)q`vUy9+HAX4C0U0zd)`Mp4Bm)+3pbrMVq_jk$`uT
zsx!Pvcww%NPM7ggd^4p2k8gDTg*-}H9s>t}R6*#cCP*$ToviFrO*iF*G^NHG3iJ&H
zKOmFB+bmT9nWDfBOy`;g0dZgS)Pn$<dm$!w2*U%{4QR``Col0Po=xe_LOKSi4a49O
zMpp?lJO;w30<5Xf&F9NVlb|!@<5@!9urE75^zY3~6tlz~9mi-IQq)`>ar^l<2om4g
zvD{yEhA3Z<U_Iv}zlqnBTetI<(7is`tTouZUg(rma5tW9wAow|28v2r0#k$%Gs^?i
zn<szT&c`eNl-e(#<^*l5Rg3U+_iXl&&^dnkwD_YObypE3zf50axF#O64<CGrpd;|^
z-A~`#=9xEvPCE{=3y(pJ1-pCNqrL7qI?4AvJ>yKax&23d7jJj{sWLLrZzGx}R^niB
zX?^U=1Y-zrmOS>Gre(r(WZszz8Gv60&R!#MrxvXpSqK8RtEd-ONpvHg&0`Q+2C?7z
zuk~V$41?s}c-`{_iNQ0&q%+nrzP_c=nt6c4<S1R5H@D{z!ytYiEMi+6m)MhyN;``C
zX-P1owe82mvidG8DNGPGT=^E;DXypB4>fY%B%yWkpE0`Ka&kuG19eG}rh+AjcTe!W
zSrqKY>0OasKrRUpK$dhAQOmCG(W45l1s^{wASsu|aKLIOji%w@NAxK94L>0;!cmVX
zM+u7~0eFi32@t6a^S;~%hGfWbZRGbl7X@x-EI(&#PyHxVVX6hbO?`KnwqEOlV_^M_
zYB>MnuQtQ)aXZ}K{$R1s0KV}6uq=RJ)TfYz>&H=0-+m<$5ThC}Y`*Itaum3JHhSKU
z-*9Bp;`?q-%C$aD_8Tg%RAcKJWx7K8$!>u9y(aZk!(sTvUKj?^GE)E<_R3}eacLc}
zEUu*D2$+4Zs*#NpPMxrUE${h7BVabiXOJ1yhD7SYp>&DZU|XD1`Tm{`%K^k5fWHyo
z>$F9_;n&qA*R(|;UTlX_WzaiRsZ9qSdz?F-U$<gQv56Bo&?fd;*FiE{3s>vRu>+3T
zkREJs{Xo&07p^=1?bB6CtQA#>TgWH=y^>k<T)VrkvcVir*EiT7n7LTFP@N08&ug5E
z9x(DVW$dZB@37UH1D+M9GMM}E><8<k+xYCXU^A!y#3(R5`=9t+q93v5UqL{1mCj9M
z>#-!KJgB;eHvZ+>(+-;w-ZjXdAVKs}NJ>`)Xt2{tsGtB<l6@Maxe}WpKV8>h4}W-?
zyKsRc6qmnwET`~@6-(}!KasA*LD4Kw(w{11f1r!whm;_cwCT{V2?m6NeEXW@={CI`
zQ2%XXP5j%&`i*As3{ul=ca1!wi5S^Gd!tF4&<dqPC(;P*AK7GvuIpW(5+aaFc?~}l
z{d#uXdsZ+o<diB`%KL<_Rs)pV<F@{*tbrkGp)<8~<HHr`8p`gX3}vk85H+W3JbQrA
z{aYa*2El)43+Vx)e*ueG)6f=eRvbqE_oTG~>q|r|Uv0iFPQc+d{g1b}K8R?~7>Cul
zD9bs$Zo703-(T6A;gHoS<wXUT(Tn&5uO=VLVfZNMX+?u!05=>E{)j}3bPw({983)$
zQJFDHqNRh~zbbV3by9iN+U}}dA1F(cW1IyeQWVkTP@{Y#jU<}qlp>1?lYO}MWGDBo
zmn!*xe-pYkZl(CM@yiToG9=iQRLzBePYd9qI}!!V`NK@Mh2nxi_Z1AJO1G<l@pZkH
zk*VlHHan<Wd5I6HMtz|Xn=>XUA}Vtb!f=4#ja-=H5DW%#@fa;&4F(gBF%v=^ciPI*
z?^ZRV_~MEHbf>wK?W=kYceii?M~wZ^5T0pm3+2+mgCpdN$$Ay4oX3i{PuK|m9u(v6
zyMM*>dm_Yp{0i%GB?5>A^Cw%sbWtn>2c~gI4eotjgL4Qh>s(9nZ&Sb*O+~=9E^U?o
zY_^OV1jX6+$f@Vb`;L?Mw0s2iY-Q&b2GoE|G}GYqhmE3p-H|{I?hop%WV=YrcNOa0
zEB%#~a}w>sqk&wdx!Bolrv<}f!iF@WE(HxLw!Kry*k+%*nEM<XIV94uw`x;!O+RK6
zs+`6`JPvMmFc;>LRz;@xu>jaQ4-uQ_iQK5*z+fBh-i|B;a;Ws(K092A(x+X2-><G(
z^;_XIIZ&~)VOB}@2xU+K;bAt*SK^h)`nej5vMA`++vmdRM;tMwSIbL238@8E_|<e=
z?>6uCT?Dc;SF^G`3#Fc4_ydWN=fQqFNmF>v#GwmWkpdseKN;r9Jj~skDy}w78j5Gi
z+X7pDt5=@<)M$1^2VcRq=E}j2Of3hmK-*wsP*W7b%hC}e==5jk;^9%m^Hg>T$5S<1
z8y}oIV@)3se8N?WJQxRdN-lr~2&<f-pn)dHd&P?0Q3F=550cA){5`bZ<FtvCa6?};
zXqB*s_^)56@cg?5hX`}6*P*`;c!+$&y-zeG6)ljRvPEPFC2QvHxpyatOWudg7=YSb
z=^F|!!W?Pu2~a!Qe++zqt-ywip{??U!~f7!BY)FW(vhVtAT^oO{A(DllZ%uesmMnp
zC{qfYlw2&`UqBT>VKjUKV+wS?AN44`V)xNpnQBB-)S*%%E=(X_2R^i)*n&+GEEMM`
zm2}Dwi!8J>OOnEH=#TQVMBKlfs^+hq9>g!G%JDxzRkwW|BO7$8l9(@a(4iYkxk4;$
znwvw^Iiju(MT#BaQ9t*KW^APAP>LA?8X4DJJlZ7AS~jz?%(Dl;NRCQTd=BT!dN~yK
zJcl7PNYl|2Ip4HYtf>%t|1qmnoZKmLFX>F2TvAyG>hreaE)U4J=HvBuLD1i^E?8Ac
zgErhG3l_7A$&W%@^R(N-CyM9oOobzy7W`1_>7shChRw<p2#=f_&}`4V?k(t_g6_+d
zLlwc4t!@(U1DjP3%V&KMbr?M{0CfhZ#n-&8vvUfN3s=vS!0Ic7)}~UfSU9!*=jvuG
z?A2&$tw`z1Au={X*^(0k%U^|6m&3^~DD9j}`x)>JuW+S`+@HT*-+Ft?s*>^Xbvoa1
zeX=vP(>!jbe$ZTfm+NpY%389`$nB0DhA&8xHY;uUCN_F*K8t*pT!eHrPJ^;;JWDCZ
z){UnCBY{O&2E)QQ03y&s8H+smgv^`cV##l`Ky2*tvz++Xo-?~e*({U}Y2EeXnjv-X
z^oSl2_`JFET>f;H$8ZWFL=Fil(_K%K5Q+C_|2rS6v|J!}59^Vpe+X2STNN=s0I*88
za+ZAE*WyL5!&jT9ULum3#R)NK%@pPe<jnPO0U!c>0sKA>4n0;6e-$L&1ht?<3IEU?
zP_0pT43u3+%yFOqtEwD;>3Kcb3nBc?5)|G5St)^Ig&?rd^IpysYVo2i{AXR%6Qncu
zz*1}eZIbtS7HT#+Du&2X%^$!!{WD+$V_ND6r6>VlWL&2oL-&cYF3<_JL(kewd}P-U
zR(gQjUkx6!UgNw2^*1vm7^vBcyX!>HpOuuSkEqrj?#w9$JK0SLm(cMI7W#}Aw^}|4
zI%ReN6aa*>T`cY+FqG@^VXGuscmgyWG5`Yya(M1m+DtYXQeUtviTGnRmo=jrgph-}
zCpI9fxR?|j*(-e{8Cl;B64<YCTc;_ekEwL5Ls@bV$UXuR!Lb(UM-lk7OvC{y;e@iq
zv_0hu<}(p->+gAz!5%<Q@6Kt3u2XYAejagOgun=8o-&mslA1(gl*;OO7LuHE44?>X
z&s<Hka9I#lqbsWs=RP-#Y=r#M+;M9%qe4dSZD_xy5k5)ML5utZ<#eDep*ZLWFK<G+
zB>Y(7XQG)|zL*pT85G_5=DZEt!Z$go=i5asJNKg)g)WRjq4i7DVKk)P3=xF`{mR5r
zfwws4ESE@;t%zOv@5e%;TKE<xx7w0Mvtx|hm})-?Z7#XRa!F6%MYeiiYT`%GhG;&c
z8w+9fRoF85p*+3DpC3P|t@SuWznCe`lcP>N6MG9?<RxMSyLQo#(9ji5QafS8E221n
zQ-e7q(5=NIl~EQjI%fe=6YPH@rrtDbaU5})v&M(becjf>2WmTh$&jS8lVg9$QVBcN
zP6M3qD-S0WJ~^x;KzpWF$rO!7+*jb`@(w|J!ogy@BSkQ;Lvycq^#su*hN?kR^)=cB
zLZ*<}rm%{99LZ(1b5^){g|vMIn<#)5kB~arCgR9zlJP}xL04SANbcwKv_(8Bq&)H$
z8_3RmfB{S%>yZKKdx=0^sf4L|x?ldCHK~OICFyt&l*n~`Hf(v5^~iB8+AzXkpi-gF
zWe~6ae+8v_C=qZE%I`dPCnnc-v<3YMl(vNuy(TYbknb{ZiWWM;577YdQafs=zsYB|
zWQOTJDB`M(JJws=Fm%m3({Qk%5di4pkOhovlLz5~%pVgxQ~*l(QPtwOTjx3CKTtGX
zhLFQ;H8<+u1i{T}Y5$FvI){Eo1!lJnVW>EX7JC^cN;A2RSs;wsuTa!{woR-;CrR6~
zmTOwYzJx)67wpG&0eX6{r<~nD3eHd@6#-Z0?3I}5S(0^=V<^wO$eK_XW}bI^RZ(OL
z0;Z&Uv}(}P;kKH)=*%)VB*}{X0Mxf_t&elNoqgLJHZ4V{5C}s=(u8BfVy!T1S8`4H
z)+fb^PyEmhoDuB(_@sySeN3wpJEwzn4Z<FVGTZ*%r`zJjp&7zvW9j{(Qt&TY>bD2j
zQ}utNrQEL6bzuK(OA-CcmJ&p%k%}|^2bbbZT+1eSgmD$=5>@Wz6ODQSGU*N(A9YhC
z{gvDuMr)Zklb>W}lqjVYLPxq|_Q&e*GGJ~Z(hgmY--SW@2P>-|6b}~LBTZI{ElOe#
zK4_poFGNyY|18?~IpWY@0qv9K4+qLovTfZ!eA9{N(d(66AOk9it8+u)t&&s5Z_0J_
z1R~ezT;B!#VnGwS6y)@zfSi@ZcY3!q)yOUcJb(|X(aP;EO?zK$S82uIcPRq&fQVxj
z!Gee$p-Vm#iTX<BPRejAv{BYl9~bRwYS?<LNotB#!1EDg_b0O^9uitUc!sC22X^l*
z)+tb{G*+EGZ7m3OWv1C99FuTIV3Hk(sgH`!tdB^zVNlRDD-JG4r&qs3Cz_v_)_R>~
zXpHJ>OfH}hW#X_H447YCdI;^Ry7S?b<?O8?z`+a2dQ*mrE;4t7#EF3+2Epmo7?=-k
z!%_=>RFu$T+!9ApYl7She$C&#76&*WB!ky=PPf8ZGHhGbybot`=zx!*xVZO0GQ?K!
z#M(lctsx=0D(I^`*d(ki1eTvLQJo{HsbE{eG+IKW#O{^f2)>4BR&2o`xiw4_u_4E(
z;2aK_8ra3NqCN-r1qW+Tf(({Prws=x;W<w05o+N^ZppK)=Ki{%6z6TGxxEefEPoX)
zcyJ+kDCK4}?(00KE?)TBhcqi%PqdMWNnuWP6%XYu^vf&AvYkx68{|Op7qol)V=kv>
z$Kn-^U8nRy<J^2DJH`ix0;arnhHjAcfG3phZma;7T}NirU(mMK@lFo9vN>$HjX}bz
zHnV*x0odyx*!@I+*C>s(ui4OWy6^vL*##?LoZ}fjAwdq<d0nl@w&M7J2$BA3x0~eM
zkrKM(d)JUOi{pNWf|L9nJ_=QP53P9fU8A$oUv?KsjTng^lh_&;2fl3#_&E|XkN|gE
z1z?ZZt~ak1PV|4JBs1U6?*SZP>_!qTOclvr_uay~k|nY@jR(bTysJ%xxiqI8%zZRs
zbOrO>qm9U>;wD-pwSi3XoUu3+<O6h*%#l2H3ChEYvkc621;rRLD_k%SM7OD<g1_r+
z*l>Q<m8+@s3OBv`EN!~2mIJ7{wFlNSkIqleFIVQ*%xKi@;nn(*#?heXYp4S`gN)Zz
z!9ptJqGqaQ)(y6%j-_327lAf+2zt>-L|Ot-1i=sRh~FzQ7FfxY#f&EU(EwgI6P9o*
zv*IFV5;vPA_`m%9GcEFJM>oG)_y<hsxlH-g04%XloRI>}j%OUgdT!rpU!O4PQK(~{
zcgN8lftPK<{v_*qhfhJf{rR)|ECy3IWu;b>EZeT#{@xVaQq(mr*S3WxtW}DMx;pzl
z_Eu7$wv=L<tpo6>eYiDxe0HnX!c7A@&U$iqo|QhIWqG{ww)+{5soBoa^mc$(>~jM9
z0g;K1ZvD=f3);IKQjtj@dNG1BHgwr`*rnTk#}*}Xw@J{HGwj)Y!M#$+A5VA=Vuy#R
zbdzfk7x~BT>+j4E^P+=g8ki?k#7+1z=i;s6paiC2Y%cP!$Z{#0_S0TvP{j?lWozA8
zZmU<=o*=V)Bh9f$cF`wIgXdwq!(H?cBw6>Z!u0|tOlzMbYqezL04Jdt<KoQWafS?v
z*5#urvMG9(V=Tc*ih(}~=Q8|v@Jd^Q*6o;>ucMyNB@Ak}&Ue+ew=l@xTrdgWt$z#k
zF$3JcLF}ifp$+ZajCtaWVLNHRoaH$x!aR|sEJXy*NSCk)QIIcbSu!%T6s$nZ#X}02
z)6=v-ID$)zu!*}k#akWVNQ+lFAk&DeRpO&AYBdn4DI_F`jnD7lxv-?-U=6MPOy>HR
zRy6AIJQ$3wXQR(n8~MuGfZfvXoo1=?$9^Pew(+XKxxd$IaGbAvvSGFQx@ASXj~ca0
zxks1EMqYDM*CpR_0#U=WqgRIP{JxYUu1>8Km&>F|jY4Elx5_@YP#V=zR!Ow9s-loK
zUst+o8F=mSO~vL$_x#S}afX6NFiWwN==@hP8lOX%T2Wkk#h^UnYpBNj>H&o^ODCh-
z67Q?^b3Z4^2<%l-w*k@iDZZSg{;)aZRTbg=3eJzqDqpj(FuSYlSa+x{53#=>BHM0n
z0z{YA1Idj>Nvk-HMKfkqHk<NG<4VUxbt6vwkPONgOSMUTRo|U~mH`Zw$Xh-A51j#N
zVUPY7OC}wL{OkTsAM~prBg<`#^!eI{#wE#g8@qM6beqIR@NewGE4!~4oqIcN%ey6J
z4Zr3tGUd=2vJ%mQ;h>^iX@>etI8-{9sens$yu-qgIy>sz9WU3N2z31PM;T;yVcu_2
zDAr-NSd>J0oiA3X-2Py%vs`<7yyV;H1kZCIFOk7{Y!JGi4{y>PO&_RT++JGhZRoV~
z2DXV035Hz8Fo!haJR-pm2PIeWRZP6yjlHpvFfxmeqZk{ZXiW&I8A2IJujC8D$#cXJ
zK0UdC68nU5DeR^^Tkx<o9Lr(+`A}_hEPL290s2Oif?-NPoB6rYaRp8#*-MqKkbw2a
zG@F5kU&3;RaIBdLx)xscxL4ido=g{wywc0ion}lEhsP?J-p;T2Q2ZXqcE;Nac9%Ij
zRX42#4y4HJ>*;oMaz1~!b0`{U<WerYPqW^Tc7wORmICkbq|OIP%bk$zSKlwx&;P(Z
z_VyL*F@PXG?C$Rt3{CrXuiT_yY#bM1PokSrT(fI_l~VV<@c>s$lwriTC&V8{1Zsd!
z$&Hw+Fc@pXs)O)Jnc`OiGu4psxxHLy6{+eh3Gz1VEk>ta4;~zy@EJ%^ZJyA&K+VY9
zNrM@TYN~qxFuH=!2!Qzd>r>71)5ByFw|@5JUGTx6TFLl=;;$Zf%!B6{AAfzSFDSJO
z;z4&Pa>svQmgdmPTN{}96kBqVGzm3Y7F6m4W;vTTF<<(8a|fQMHXzlkma^wN0rkkb
z<tF5k5c?tLv1g|_RC}ZZ28nOvs01SFnD0Oa8|MNS1GlQ1m@P5*3EDv2;7@1yuB3DL
zTrsn`p-utJ%IpvkP|8OD2@RM_5e?F8WB9TWvW8e!77uCzpm{Lk3|1=|{o})~HZ!?q
z3}1DC>mV;U78<7BO8Cq@r;o6=sc?eMR$Fu<49DQ~_RTyej~k%1NVe6jFu&|hR*4{n
zpv^Io9>^nEDGFMp$PaqIUuoQMMtNPT>5^g7eNwj6&PP4<eK3<hdI$lzc=cK7<w-nH
zD!~lHRBp-CtmFmCKF`9_4~rs=FpUVT3CoO5a4~jFPl)1TUCZN!&D`|JbGCYR(VVM0
zLqHQf3X`Dc;R)X<8K&;=n043vH+Q%s!y5xI00)H)D@s_PX&fkGxRJW~y$?^p1T-F`
ztsVK-b?tfK-35ASq;_goBkO&K=S%17naX(`WH)d|ns#cNl;ExmrC<eUVych)DFZ(Z
zjces|x~4@3Di=>{*3mjVAya|~h!}3~-c=a}hcJXu`a;;e#+338EdcAm`0Uu!b$nmc
z!sNs>JP?yn34;Vk8qx#gNS;&HZj$9^y}pd`Go@^Bgu3_>>Zp|ry`5=^0o|CpCBTlh
zuo&=Tf`7;he>!&tS=p(GzBz_(fTUYk<Ly;SJ?VC6eMMuLekyD6k1Yf5y8?bJ@Yi=}
z^1`#L3!YI)Jp--AEe>^nAeeq6lM1$IJJDJdjle|Lj1Znz%y_JX(fcUTu2gKeQI`}|
zi(Na3+wnL=#}1{5eXHZSr2%=^{Q8EB{l@D8w0a-`_2`Iy?M=?9)HDTc^Q(UxLY=v$
z<+i~Hxei5t08CCm2q1a>31xPWNdbTqbtM}|+g>5Y!r55d8LWuiwpfMumeq<bo)1n(
zz+;3}%S!hq`^5!yolojP{mlh&zeIT+;{JCo=y%LtT#zhpky(POXk3w##<T8=toyak
z^sEY(DGFRDht<6W(QvK`L6fCaKl@|v8K)`I-ATOXyXpDRYl9qpb5Ye&Cs*`y=|I(I
ztJgh6o%bAsWZ{de6m`;ZU@@pH7I6To_%xV)z}lIrTzBqIwG3J&Yyx&a6q$2-SQc<f
zEoVxOy%qf$As*4h**H_$hPOAmf+b^N%2FBATujv#s&UkG6+7m=HdC~W^@my}G>Qr|
z&5@CHsnZfpiyAaauC*ok1qtr8>e_@gg(W37Nzqm~DxTmKTVZOsk;Ep40t$lJh=r_0
zrm9@N*?=;(X>m=54p%M-J}sj9cv$}5_wna%@=zpjN}yhgGcu(!c<&7fmYje^MShFM
z&O&TSsat<8$%b~i`6Ek-AgteZ!P3k5SXP8+e>^D#=-2mk%{#2s(<TZF`c5P@etR{`
z5~>m(!k}N)AP(Le>l;G%iubhZHb`^r`&$cD_r+avhk-O*L6`4i#PAA-1Mkaj<qlq_
zX1eKR0!0FQo#*c&J9_5UQU=jrPZx*HwA=d&935a)e<uGBadJzE>2#FK$h%2i{VHzd
zjxkZPU$H5O&KN$DsEdaksVL$s@#mlsbObe7QYVAd4JTajS1YZYi&tY!%n`USFv<m?
z%5`Ju1v%b^N5&AZ7zP2=0yau=+f(tmCMy0BPBL_ZpmvO$?+TKfB;%{Jk%|l@GRWd<
z5<mwMK0>Xo&LXm*jGoQ&btVGYik5V!=St@LY0h1%6}Aq09mE;aSU{xCaNnSj>J21q
z^Ak9X@p2#nDkr2H&-9_?^=joNGV%|8?y|B+Rniqq+n4=exW@3?gt9$JpGtCrT$2#4
zrU6M7pLpRe&N$48SRk1w)2~V^?Kc>{2gwwGWa<fRn8iofF@j?^m<^HP4Z;}0rZ4xR
z`l`dwKa%J>PG@tSmHS$HnM0o19>?Qm$Ylpjbm)H9YkFkZz7s}43VTl(0%56SVR`-b
zJjUUtX4~DY2c(PLdJzWSU}`p(m^CL**y~ZgWcE%jw_6H`Q`Oe2_AIB!c<hqx`x$=6
zA6)>{KsXy%h4Sm$dT%67Do4@A?6YgNMT_f9W)~q>v=*4t>94wr@Yi|DUTdaps1lUD
zkw;7ZNB8rxD)811a5B6JaU@kKXaSgNARzuvZ-)yIHl-pd%i{&q^?UIsUk7(V)N8!&
zw~p5Xu&FpZP5SY|uv9CZ@6VTju{O6(pi`!Jcw@95-&&!7uJOOhRJGprsS05l>csxB
zf(HMq6;$?R1;zMoHkNrFUfBFEqfUlu+<Yh=8~6-F8&K0AG`oQYq?cK;7uqD%P>H<-
z<VF#=md&*}oDEE|%=d7anJAnIAWP2?#i6b@KD}{V*yzq^HK~;mk7J~R>T@h~zdope
z4C5Z;#8IleR5HJ2P0(C28onD-D7f0n`_%o-dTJ~j#z6a>*Mq#n^;-(7Dsn5iKO!MN
zq}7tir|5lMPzWXn$51RVB&JUC!x>G`KjLD4FbbqyhlZkdqY^MPUNHv1@<bi(ILV>M
z1~}1&iMZiQvw_)i_j+NGZua5OstpAGj%$>71Fp~l&l*z*&Y2feq_rPQY8fm3mcEjQ
zcnQr}1d{Xz1M!*WT!Zwj8yV!#xYSVRT0Z@2Y@y$;3P9y6fTz}g+h2~oTc=GPQUnUX
z1th1Yu5Pned%^4(nWop{Nr}Q(q`~`FVqL@^!%DQ1GAj}G41f`;dV&^3ko^iZi53^?
zO6(kN&(9rfCfS^Aq$9S#yE(oap(f7J^jL^?#lUq2N&vv(uEUDk60%Paci@qh(Qpfh
z%<_B^@6Ozl%BzUyyFs;=isDi$51?f_lcn$rePKb<{<!~OL0wS)U_k>s|6oBX|Hgtu
zzpx-@{4XrX{y$+sX#WEXiZ-kX`oe-(@fAIYGsgnk1@ysd`wXn({QCN-z`8;hulH>V
zzlWrtP5=jeI?WocpOjK2KR>CPo-(KMi2TG)e+^K{3jx)PS47y4Rfy5{HPFL1J&iHT
zPr9N7?G|$*QvpXRsLzBgj-5m{O^xpO+0sTW2i%9w?T}8S{=4jF1xO2GP9MmxZd`^0
z(s0@-`j@&uXm}li2&%zlKxbMYSektJwPyV{82!b%I1+R9wWYp)VL`S3#)8!T0}G=3
zU$7vEzp<d${{;(z_!kx={1+DF)dd*-!h)p##)5$V!h!~+zObMM^}+vy1u6FbjRozY
z{fz}N9?vlTg$0@Xe_}y;kY8Ak)9X+K=l{Th<}eJ4{sRj_T|-{^e_}yae_=rt{|O69
z-~Ds4T?PC%7F66<S8&r3_&>2Afd3a3bn`bB6sB89^*^v6(a`~C^#6ecrT&8jf$>_5
zlgoZpixV*H+sBYB9z)fUqee#^JOvx++KhL;kf}0Q2!$V%PNte@D^4)3XDKYb#7p^j
zCl?fWK17F?zQ9uZA_zG>Y<>f}=crA60kU6AfzIBOJN?=Yv&bQOPkcCRq^1Q<b2;mI
zooTu^1>n;V0{nI@9SRRZ1;kg8{^*q{;xd?J<OdW^-(h7Ta@R{#()c+yg@E5;L+uXC
z<C5%{DMRJ=F=?gcheBJ^+2>J*qTGaO1J^aTlECY7!u{?^K(jL3aTD(jxq+`eWc$sC
zMDzG4xY+p7xmir6r7mwH^&<TQOzc}Ne1(y93l^VmOwAF(aBGx`jC9=}8_RE1Qi6I{
z&oQ4t<oReNGO+}K<jd=X<m&xLFygiVb)Afgu40-I*A&^2YLS_LY_nI*L2(&$BODp;
zXbWI~ZIG8?SV}YQl(-G_lcu}ld2N{>^m(n_qV7Mipb{abAOC>`^{}sTncse4LE`wY
zfs=8P)Th#4SP=EW9M)f0Q2G}ZR4?%t7S#JMENJZu3p)P`3u=M=uUHV%7Z#Lu@DCP5
zRRr~g1+6!=#C%~vcaL-bI~JtADH(CGjP*Ab<nu2qXaV=%SkSOi$6r{G>A$g{kH4^>
z%V;;THo9<PVFWcY-dTL9o$PLqoO~0|n;2-YSPbYQs>$mfdNOpmL@*MssBu3r)t>l@
zX616_k!rS(6^2WWj#QZSB6Vwzk&ggYI|lT;bWyU5ho_7ze_M4%z}hTt;b39@rgi9S
z&WTi1O7nTC$5)GHDDfVd_&2%w3-?;+`Y^l#=CJxzh)?NJ_?eQ+RgHB~5a4y5y1%+l
zh6tK(6Ta?qYShbR2B`ReuGuCzrBiPN_m-{mc>QpV4@${!1=slGVYfeLq2`nKjA*AY
zcJCRfx2MV7XpT_2zQyd2e7_`olhb`ef#uf&E8>N-UhMJG^oNctJHS2aQfLjZ69^^|
zN{#s(q=|rV33Z93F}~T;SV@V|P8BzF$<?NiM9RgRPW)rfPx>IGyP(+Wk%OG>mx#U(
zdB?muoP9DXe@PN_3vm*DXxgPj<^Vy3XD(G!Kon80Kvl`r25<r}dKpc|D>oBzQ0_}H
zagx#Zj$s+MX$_VQc;iOD`%HP1B>n`}XM`+KbVP}pg4pfWpe+LOM{m?VLv;W7c0%<v
zlnNw+p!sEP*I(NxvIice|04v=KVHE#v6{w#Y=vFbg2hubgNv64CgbjP?AW@WqQEEu
zFFfx4)S^wncKo+}rUZ7DIA+5b(<x_hEKVlo^PAC)x>l%rnx6%2L*n)Bq!jCqJ5AYg
zX(+j$i7C0*@qI-MvIyB9x?1tOgHu^en;v87iI^rIRFD`3U`xPxYF-#Xvf-&Pdn|t=
zKI8kD&O`@v*phinuc2}{A<E=e_vYiTlW~*|><~pp?L0KUv>n=SF3qi41$0&B+;e5|
zoj%^Xpx2h7;*5SI=24+H8NufhO73cEI&SRj7;w75tYrX+bHk3C7-yje)UiZhL57ei
zWgAF?&9dxePMnByDoT7;au>rHu&Fo^x`)z}4hOWSloxJZh}4@Y<)WQ_%ZgD^-D({F
zR(RBz)1)6Ylk>r4O;#wtAd`ZN^8*thYaH-Pwe+$>_>qFICLlxQq@`uvN~BF1cGv(B
z4gs_5S2UQk?(!WR9H$gSJK#K_z;la!lj}sbqgclWwP42174iQnK2rNQTMXW}eZJOP
zP2WEUX8YvsDSb6rv_@YmlI@WjEf{&jyrhM`!dKN-VP@e$1_lo;#Kw%Z3sx*bdF?X*
zIG8HirD#4d&oLN;6biYOm){eAAwGVddwkE5-WXH@ElT0V6%>MVfB3u_@1EG54|`M|
zm%Wkjkuffb2Z$2?>?;J?nooEMJOrUttiMTMLxg0TDOrdei7+ML*Q80Ol@(?_6Gt9H
zjyV`oRiY)vP<)^5y^uDsAsw*$*lYlt#P4!wF}Z{!oMbmbufmcyblp?q&*mf8dE(qK
zJjDln${l2WqsZ27;;yL_CeiYLGDwm1hq7B%urY7dF)PU9LyJ9(yYLx~a6A6!y}cdd
z*ABk3d3tXd%eRmuVFj8uQ1|;Cenxmg8Xnv9wvYngO;9=XaZMH~&undSeziuN$Z=@$
z`PD^UbZAr<iMaGOG!_fv=dn4Zqwc{Lj8*HErG0j!Tk*BB!x_pVTcU{pd^e8LYBVbG
z{N(Ks3Gej1$4bc+$fIKn1wY1&3^%m^Y_F`YMVigRW%djzjZs)kDsFV)L03>k;cH-z
zyW!6ll}r5|T)IKxzhTH4rV11aHysFXUI_NnsW@Md7BNH)ktXK@NmfgiNigEMLy<07
zG6yi4=#+XeT5t~%0^8?j52jaBCyhC>?l7S`3g0=Bl)#>%yZGbhXS`ha{VCP$kB$b^
z$vi%9jV!Lithe|)jF}A>|0>&cHA77|5Pl&Bbf{e`S-o-{_tAd<pUNZJzW|@N4UC4a
z4R)bXIV-_j3lY=DNw)F(tGc&DC<ZwedHVOaY47Lo1~jM&&-cRDU6xhuE>&|L{oO;N
zz?4Ct@#G3D6u13Jf=G#g^3>Lxr;Th!OV20jwYah)=P8g;bx+(?-3J32PXhRL09y(M
zo}rK`{f{l3VS<$jT%@NWoBr*FajD`}$`UmCTf*P6m`@;lClO@T%mo;p-hs7>dqxE$
zgc{U|0bwHCrQX~cuP^jqVqH&o>GT#B*aH@ouyr}&E!B4?T&4mBJ>%M10M}9P=l9*`
zQ0EDbnqh}Pj4r35uKC&p`gnbIXMS)!-0oYc)$Z=7AFwjPvNk1zM)fvVnKoUYbDqM6
zpnZa;#&EKr{-XdOZ?I1hXZn5^aP;7{k|1X7K8`cbA5pI#H5*I8I@5hZH8tb090wUL
zT*A9+eJKJ<+2Z1?M+)c!)*>L`#KE~jHxVYY_aw3^5&FsJ)m%zWwrOVG4cV%1*8tS(
z^z-phG5+82_tCT;nENCekBoaU4#i|&URAEjL9<k;?~7Yfrtez{*Ra0X@0%1ow9mdc
zet+BuDEM-E9vX0_V~|3MeVEQ1hH<^J4G&drj=l8yVP0>7y4@#aZSGsOz(3t|*$L!c
zwhM(_u?Qk>Ntu;5CVI|$lf2juS*DsH&Bw-7w7S_Y3|R%?gw1<MZM=*0yn*{t9t>W^
zzZmUCkg?h-7bM)tiq)4#cgkv6Q6f~BO@(GSb@T)z1-WT=9dMw6lB<`{->U6wCQL>C
zgn!G+9z|2sHnj@gQ&J6ta_3JK9f9b7E!V@IO-fV(gKL@$BUNa%C+)3c|K=F5n)6)`
z#s=>T@7dOcCYPC&uxK2fV$zZuSjxeRM8HLW=9>^FhLzW_7>bqTwWdwGXN1OSsk2eP
zGsJ9FaOiF)fJ^X`P+%=*y$rq-&ohvcV+32YK^;MwBJ9FU&9fn(isiqk4jC&KzTIw4
zxO2i`RZD^kJBQtU$Ys#4u_7jIQtP)9?8~5B3p07F1zaZ2g_2shnOE0G@T=P{=gMbY
zZFg9H$QX|fNSd(WY#U7<>prQ@OLY_wy&_|oWa;Fn(K9|aJ7S!|&wwU7O$POKrB2S|
zBaHSkyE)1~qLE-pmzZ-)%bE=@?d<$WJdRxn>|xgp`X;^_LdkKlV18O{<?{AN%yuc2
zBwGfzR0^80K+Tdx5U)%aqPqr2Wo#Qt6W3%id_ACADAD_v$8@>2AD>K(bNdK`S0fB=
z_B*zW#8rD5<d@x}8($J12s!RgU5KaKa}B}}d&Bj^X8GEjCIW23aIvdIjo-_--sziq
zgr(=V$81H;rm&$l8Wqqu&50dPRV;=XW>_nEjIL`>?v9Ef0Z@dY0!OzZRY2bJ-3Mw7
zoINxOHtM1PG&TuQz)$Enmo>MJw(~%nwnkg2^R-SVt&>0-EjQn*$yUnRqj9z)0N(zu
zS!QtB02_R3nzfAOVpko+NYasyD9LuI*%Y~&tT_Ppu=y2C>43c?V^Zbx@$V4#PPHZl
z#^CZoiBymhFIXRho7Px{`cy53o(}`k`MI=GaFYURy|{8}&-7ke)zQ7geYTQKFYGf+
zrhQ-|0@I18W9tU(@ugGJ82)>ObU~su2<g&i2OzrlBcnB5?rNqaQh4>_ouQr}MO!vR
zg^O^46o!fEWwEa9wvDEQhV04axtJ3e8E@n0&Ulz*7|{zIaD;uH>E4KySjgBnV>#wx
zk6H?NDOz?K)awO9AmgoSq;TG*yM=hOX|n`P`9d7#7YWP^q5g6igi?50YP?nzwn&4_
zQhfG?&XAHmP08uxa@4pF6D^NT;s#5^j?qY#gywQ?@1lnR(8|L$(ScWKTl)ix^ZCPz
z7O}6XB}@{@n5z51@l6s%p_fpaFD7f3vSP(aD=MWak!mm37E_O~9Et8?s6i|NJ$!#A
zuF$9Fo>M@q;kP;a*lPuM!fW-wucZm*G@89vtYK9PAV~Z9p(6cI6N|pRIr|<?ZIdOV
z38F^9_qKL09$qgU4PwE%=@G4mj(ilGy1_AYI4dp%qXS5l#rwR*iVLL<BIl;O)ryjW
zUt$+iz8I4B?erx#>m2|ehYMhxQ5TGY)^=>oFjbbiZP~3iWA_{MFT1DpFS`e`@JRv0
zaQB{|<;1qrucVc{I1jxIa3M;!HuO4=F1(P`S!;x^AJajP_<2uyQdm^3a%B=cH9p<4
z;gv~Ej8i4H+3a^m+t1_cv*GPk$L?<Dev`({-Nxt{9bd?ZS2GF?6&$kO;L|kSy_PZ)
zg2rB#U=8jS$w+Li!p|fe$+QjGsX#g$BiELt6W-!L3{en0IyN(iP$Drx7c`tX8+?|E
zK2w)4a#1qc%KkoZqsD>e5=h{pC}TvNv2(~iTRRfQ%kGy6r^H~w`ZLnO?G}ed8`TjU
zW}?n<M(fr%fQD2Z?p{D#4%JD6*aHYo2U{?j<OyyMim#4o$V(#ZJXVFqB81$9JG`hU
zlIE_iMY16S2P^a)WxW|O@C2IsIoi~D(UOZwqW*mvf0_{u?6rQBv}y;jrU{zMN0W>i
z{3(Uh-Rl&e%dcgm_I3wQX)PYSoa$xl*~xLQprVNjfL20vCRgakEL$K~w((mrhhQcA
zmfs+X*;2eEp7}lT%%C&h3s_5*=%P&}l*bXO^M&QPh846gXi|h*J#jO@Vi=!jQNl;q
ztj|u{nudQ63F3X%XpUc$tfct4!RPa!=)$YC1{*Y_M`ZJ3SMTE}0YAUS8gs_wgvN^8
zx=)bB&Wuq1;RuII4VL4#*`Q?MkxFuiZgs*jBPyEVo$^mV51yM?3DrbQ0rKxvAQ*}X
z^47ushpu;ut}N=`HIs^Mqhi~(?WB^5ZQHhO+qP|672EdN-}gV=XLOJ5G3Ld(+82AR
zjX8hs^St{SblI_SBW4PG8<kQ_^GJ<DZIGjOY0<@#AHpu&h|NMIv)K^T|CTv&Mi2RU
z1@~8uCW@TdphOA*F^TR|yXW;=Rn}0P{LfX7PJ=w<W@U|7BB&)}T5M!AmDjQ6B}~#B
zA|QKs*cs6hcL-GuLD;t0%{YSb1-${LGYg|R4-=H{V9-n)XiX;p;51IzZi@5U+Xlwu
zFSt;+%Htp+3R2(2;^ls?>RiOU96^A)NANbG|FN#eIBs^$dr4ts^#|eIHyVwO*@TN7
zy@<!(SohaKbXaw{Mu*T8H8CC@ja35Vx*pPD11>>0!-2N-uv9&2mAay(<a?7whl&IA
zdKhS#wAyM=hU{9m)f?DZA45HL=}RCocpo!wChuk2lJIX{w6UyWB?1$<*L+oc+j5PF
zA?vXEXLzA8%4#bH>=r)*ladQei3s_9R=K`oDv(p+#`f4%u>OVr*n47&YQ}(QPHp{K
zpgO>%3uAh;&}{?dLM0lE8xMj{$bSq!_$r+_c!#`1R&%<A@(>v%C}a%@aitoatk&-U
zh1&SEtVt|BpZ+UjXr9EKE;og6q9d}>f3z>H3kv)@wHMWYnK<AW)#4B7!zlgP1y4mK
zFX9|;EAb1M!r1O41V>||RZ2LnWB@{K;WGEe0UhbOC7o9HhX%fKNM^Y<h~9xqSZR$u
zMH9kYhVW+|*Ie*YRhLJK(O+F@*h{&q)E$$h&+^^je8~jZdkh)^){qN|_t1Z<45Cj7
zZGP^ZW8Xk9E}cn{LJajvVYz=b8W|OX$o@9rHK{)?x(TtNWVc{^frF+Z$u3)w_{%oL
z4zTz93rm^i?Hj?!1=xEQUTqM-XZ;=3Ur^yT^=n64^ka*t`_vQ_X_|pxtdZd3|DF4T
z9w2zZFdlQa=~vn#n4nQWCM*+%vu)7PV3(pD#6PS6+vN{D8#Inxrm_e~k2Aa&d8dUH
zT?NdGC*6rP&pU|>KIY9k4z6T&@IpBvP^CD20Q5bcFcZ|iYPnY^?y;D+OlMvuJn}2M
ze#;ACgF=T~5F=(>uwJgWS2v^1`ry^Ru$-rk-v9hPfrJr0b$;_igP8V<BVzw5`Fgsh
zt)pB`9o5qJg9zVTE`B|s{e8L7bI}d_2wbYeVn=LAYd};3BAEA!Tuac0IMNHXAoBNW
zJBz*v>`YYAiO||uJ%W6qj_3VeZWX!aVw=Aa-Zc<@tgys8Gr-@2eVizW3(8~Z=(WK0
z@3$ok0rXW<bn&Rzf;GV3bMzm7kBZdm%E<k&Fhjjce*ZM<;?bkV@W_AsJ%5nYm-MW<
zecmK|RUtdR3QxL7lMQgR3AT4n<+fF|C=r9_7BOv^SY%R?RtzT8x$7IFMXN)7Fl731
z$l<(LupasbJbBF^@!FV?A5fwAc+QnT9pv+Z{JoTYC4f^l=T{8E552&DfH~g@qEnT^
zj~>qIS-wDmW*=wqZXdP{-03?7ZGuTNai)#aa3M|NlFmFLN1-DN@BpOMR)B8Z>ByXq
zI>wopYgH$`7zf*@FB>~oL`TPSAn|$#vXxh910jB_tE=;9Cp#JPooxF4nnn#a@$Q_&
z7ZavgKW9*E7f*Ahc(|LFvMs%jY@+G#I!dIoYpm5q+O*A3E`i`xNELH^K10qd%93-4
z)x}k-qB;+@cy_p%yem)4jt@>nFAStylvC_u5xdP6p!snl^X%e6`sF$5(Gc=0y~oL`
ziF>Ehy@~&VWKJ$>xLXd2^xf_Te9RYeP|8jM#7w?M!70mORgi}|)bZ3cJjc!T<#;E@
zt?TA$s081uEUFu<co+B#Yr|EKCxUHHmd8JDm7>QzkDG69F_!D#%Rb<(`+(Tjo@p7F
zrOkA*r~0Pw8je?30*^2=PcEebV@sB8PBj4OD2Xb@(JoMf;#0QDDTr?2N3g8amid<m
z_A-Bv=r8++uJ?U3kXwF=`Nd7)ipbK)7m<t<(A*Rq<Mc5E@a65*>VSQ%!cdw-=n20k
z#Jaxi%y;%clwxHv<Zub{<5ACj9KoHNId*N9Dm>{!GTOTL?R`wu6`sh*dcmFOy<46I
zFYQyXZMgOC9JkoTPDb>L3&49c?&RwE5quee_850vIFsyx>Y`!W#U9}*PX4*Mc{NP!
z6^gd~XX`kVBPn5=_|=Z**Ze~?E=aNj9Sm~Wc|3eIK5?~p(kk+sw-zYLvJoR9N&<9%
zhJ-BD1wU@_nWYH0QpT~M`*;`rdU(ZTgc+94nkA(Z`~^864s@Ig7b~9BlCwi(3snTi
zz|EV6v+z*m=6r_P6v2a`sURA^x}1|XoD#Yz9Y}B=lorNdx0;?rU1Q4Il3V2<dZQ97
zp6z>IQQUY_Jv4~;13dreO0i#jz(jw3)gqR(k%EQ@7tsG@chN7<GNg7zfh624EuUJO
zTa}$rwap<aTP|9m^>W8p%>#$cu7@Xfbv_upJ|4V&gA~0e3L-=|wOx#)Q#Ss+pJ==F
zHVLgF-BTuL9DWqiHEm$Nme5><=R6t*z@Qec8d<8@@bK}H>qb{AU?ae&NCga1fr|r$
z&Gf%BKJFv9zrx@+V(AtB=}1H9%mXniBPw0YOS&Gv4s7jiX_0Tkv|21;PzG&a#Oi^+
zl;^Wz&e+P(&B#`oy9&v)-4HDSNtRIMBMDfm?XQq7qZocm0t$Sb-_GS>W@}b_y1D@S
z?;Mzy$LbCBQOnra_T231-V(!)M{i@|SnRA6{qO_8Sn^?G8iz&~+OY_>b30jTN{>bx
z4`)`qYB1WO`=$Ml#FD&?z!u44QG-83LP9a?qrH*z0`@h7l0^uA1PvVcZ;=*!%@{(h
zdXLb!97PHKfG&bX=U9ScmbptT&?$A_F8YW+Z{IoBW(JAlp5bX<J-C=W1h7co*r1gp
zy?eSe{Y1$gB7OzSS+~{Y1PHrolS+joqQne&#RkLSW53N)CgUf3#pgqpa{pGzhFOkx
zbNzEop~SY8q}8rw;%lb7Q5Fyu(m?Hzgk!ZRydO<rp^mR^7N}a<+Q?V4cx)zI9qTO6
z_auFVgImS1z;%sxYAOZs5XVWx*1OusR4}PQb3}vo2%Eu~289>af~ijbnxk#9;IJ?W
z2t?5Xn-Hp&7%PO|*WR|+3E3($5z;?39}}EkMY$5s&OE>=<aG!sx+MqWzLD^0CW>T{
z`bJ#0TDB?sH8bN<Ey!9LwPQiveSGIv{aBl=XR`|y=GdUbKt#l-eNtSyt|O>*@BZi;
zd0>`$V6?jD6aT6wWbS7f`;a6J+k~MGUIsOOb^NDSLD+MExR@Dh3NBT2<%2dkN_nh6
z+w&Xsp-mq;q3AN-a7MKYme*WjKM~-a(C)M5)Uja(vYir}3V=OJwza4N2DYfoNtT-^
zjiXixO<S%EaB&Mljtsn+gL&nAu@LRba+0h?(G+CJM;lH8<*gXw*9}?{i`yymJE1qA
z1pcpRBCr5DJPP__UYzeeD>mE<7qd$#<M5x*U;)OU7luFU1!fLG!||<3;y++(KHAo!
zyM55#;t;<kRokM;i=0nl3K9JYq)>^{41;1hEfBE(>zOd4hUdckaO>CSNK4NcGnQ1~
zEVt~LM#<$R`YD>eN5gDfKAlrfOk=V=8WbvyVpjIS%(Z$drjA!v_Asv`o}pzO%vi;L
zGqa9XRGJBxsJ30p6MV7MBwn;2Z;E}=zof?)l@1_(4k2?T2UXyAZa&SD2gae{(UOCv
z9|9;QuDctT#{|hlC0K|@1sK?EM6!dPHq!p1m^c}3X}!|NvhLK%(ZhFU1Slrvw@iq|
zdGH}cpATqxXjC!r^$rC*Ug~#!u&s%;d^Gw{nul?avKcRT-sAqpm8oyU{Oy-y=7uZh
z=c*jDpRmUjC2n^jHNTW_vn*bKz_C!?g{HRl-fwQvc{Jw*>NL5*wo1fLuj?cl5xR8<
zPq<Mc6V?s%Xpp6T{iin?yn8nG?9LiZ50i%ZkI|9rAIQA}4mqNamh^C-JmK$*MA!li
zpSnI5s`7Yptv`5^IfB_Cz!62LL(n3Ux+L%|g1{N3OArHQT*|a5Ao_DeY7`9qQtAn5
ztOSv@(<@j1#tSQ5=z}pF$b@H2DovM$^p(Dt%M8|zfMR<{{@LHN4VlL3xyYx1vj)7m
znoVneeBXk9=-1iqCf#C&ewc0MLPNj|3xhoh+Rw7>m?s<n{jOg?rILi=L9Qr5P#Uk@
zF<q|SCC`sCA|OBIJ`{$V;2W)x;{xIYS7%;M)@lOv!!&+dW)fD4<6-5A$<X<0L=8>X
z;?r#<F=q8lfU&~|iDL~*iJeq2`(aYc!@^?339*@W!I>V#Hm5L$zmAUGiLN&GHD_zv
za<%8mi*?V?U6^`n9g7(gp1A|*#vX%4-pAeLN9JM3>z$H`Ek=&5h5--v3Rj*Q@`WYx
zoyuvGi%2X8VuRB%sTQKxaymCj<JcrH2>wbEX<;+sMrP(W0g6TdGLWk3s60SCxj}GH
z+3--L(BFRvKbs^?Au#S(+^mh^@*l56Jv=F`wF`?zon8yeB{P#O0k!>oOO&QgH@FJ5
zQgsL_wTJrVJOsQTYdl3XRC+plbHAZhIN~N9BHVx^HOXwgrZ$?Oy};yq-r$(iGOwa|
zg{3~Hs=Zd9Z2zXf)6M2>RLd3jKL*PUsb-#>@?Xo`ap=eHkcmrCAZK0Fb<2KoHWB$(
zg(UWN846>1l3BZGxByLr`~R(p7=H;8=9L2`tGDNE(+_}IDkw8TDsn&b4P%)hc2p6<
zS>rJ$lqF3HTRZ-9^Le}WN@RWx&_q-vwB;Y5g)L55MG))Um9Z82n~HI7C{Z275=;~n
z@J2yNp2X{^H)CT-*oKz|D`UojsU?}s!7OAKtulM3F-?b6;PAk8OEIpwl_I|S92FKQ
zD+Wh5*p!E~LN?!{Gs^K<3|=+C&8=+M4Zga5J{a~L=}1)YW=!@i5W-S5hc!*tEV)OI
zm?=*iXn&~AnQ~V=w!)YH7@~d=Mtmp6c~w3eXoFRMeC@of_EWJHw_KgZf;>Hi_PtUX
zDN$YMj@8B1)8p*pGiSgTUqvOK)yZaDSw<Pcz9QWF-A$Mr#JT1g%iNnb8-N7Qd&s7n
zbK~%OEh*<#tTBv<L*{GZseGN(BCFo-+{LNt@@`T&VVV8c4`H0(fpEK<0Pg2aVu<$?
zAW}p}W<~-*&3oA45Ue<{v}cRBpP@J=1>4;jJhThCO|*5A<>dd1!>O<T<hNuVONeg5
zX)2bNZA^NT%7qJ_rj)0&)WqJP7_Qd@V=H^XMUjSFnD)7Eh(@R28?gc*D3AqQ=z%us
zq2ivjJd&eYAG;UoTySimnNe1e^ZOVZkS<&x9F#~oymPl|_4M?O!e1RbBkg~IQWWOr
zFjBPEYbfC3MdOtHc%Yz0|J6gBL35H%)T%9M(+Pe>&}@PWHK4#zhN~H_c4J8AIhl^s
zU!5}q1sg+>_E<y3J&>7mMp_Nf%U~yf$G4!7JQAA|Dzyv*HRw$wF4d}EEkyxoxOZIx
zNxvRInh4(&knoDJ9TD=daDee%B_5u(veK*wD(NCU_cxNpq4usv_`lL=m!PshNgIRg
zC@q83PlO&-RM1}5sg<tV_#H@%FoN&s_hXJNgtlsCbCqJi^$O9tNLEm%+>WJ6_Cp9F
zE%FV`K+(6@|JzSYl0`a`&YnR+0&5tf;IYI}#WlPDNnMig)SzC8x`Gd$kDG^QAwAJx
zoBD%_gX>UoHRS1H8_rbLuf<Q+w?L~8igw3dSv~<#xlYL>Hp}`a?Ob3P1d|`~1&F6m
z;<?Abe=-)A<6&Z5*seQTK{yY792!!<x;DV9PKc}S`la)^rx9M%2jSa9v1xMi(32D7
zzDF-=B&(iiXUpkB$@5du3vZk=cq%>(ZIzZC40&_j29kvDBORC$>@(WWjnNl{Af0Uu
z)-Ue4D)}vT=Q;!X2Qzw=@hg5(6OX+w-|lw%>gR`WN-gQGtn!VxVGb1D9ElEmYlbj@
zOJ{xRPC5H3<SI02Bfly%2^($=zMPiTiqnW?vu;}fD98lxZ4*b`fTctbT!*PN@v#Xb
z>DlFON$3vtQEI0O)lc;YIvp;!6{VsPX>~?S!rVTZHYllOb&XHL+*7tH&@B>sxEOqZ
zh)^}(4O00V7PNv;Y~VFn#MFO0PHAm)Dt|3H=3is)czhG%AQtX;WviegvG<||EXgC$
zbf(y8+^03p)YllGvBV|ZhF#ylsB*W=KOVNQ#yuWwRTdlLtWIb~+wx>~C~LWHSdDgT
zY0z+F71~e%6@PVaD*U_o&U(|~wrR`{QUL2%m?>pBCxi)aKRA;GcYisP2?seDhXH%v
zStuR$ZonU0xGT<rPNc-GIXc*F(OEY5aM*|ubzybWk{RWGm!Lm96dP+(6B{~M-vg%s
zCt(h0#6rTxgVrj<O)L5j?44%x8q6J+tOm?oMb58wO%Z}zA(0tXIPzj>>>H{-NPq5{
zCfG#y+;nhYh#3DVJPR-El$l}^-b5}pDcq%GEc1yr%Iy`aTV)z(rBqs_FOb{|YZj*^
zwj76$@2K=d+S1*8*-wUK`)(mWA#EPC6N3ha*QH<91$;}uPxF70E8ynXL#q<7m6om(
zag&xbx3`ALV^a;=!>8W2%fO2y8J8aFd4LHp$ViJCPne%A^+SPS^%S)Aeckjo5G?-A
zV51r|0}5uqzO6t>1TG;U%$HzVerE&1eBf@q`~undD)q+MS?_VZNI}>MKcjBTVfE`m
zRxfB4e@BsPf|$)HyL=Lx=9Z>#_J-^iij&gOny*$Kl7>-K9gGweg=60XSrt${imu-E
z%TOGh)L=-yJKk1Y?$6wnUGDrdB7a~*XQLVm&(&4NfXE|+n3xZZf+Lm`{xn}-Df}>B
zHyMmQS4-`DAwlfkBC*DMW8)v)Z_Z&1ZFo8hKOY1Fj5J|CbzRB~v4{4DI0#g5A%E8}
z2I(OJugjrNDE`}`x;vMhYr?Y*-Jf*@Svn&ddg<h1lyH8(bkOoHJiMRKN0#VgD}aE%
zn~#2}NljOxNJ;DYYO}?1_j$WhNz2u0z0-A5=bwNClG{esqfhA;o8I|Ux9GYUn@cKY
zfr@6gbUanqEu)S`Tq_}qq_3_M%DIfw%Sd5(8&*b8@xnUrr=%V(H99kIVP$(_#K<4i
z7oys-0H@b@gFl{MNUUbCUSmZjdmcsB3dM;kPN|_$9!Q3L4q>X4we3paPBynpPDYy^
zgnog-BP-=_y!!Ljp3IqIY}PVE2E<t8o7=sNiwaJgh*v-c>RW#_^0s;B`IaPo_W8~I
zu8B!nIi=&<h}FwIh$<=2+(hSdp`(W8o@SsU4HbREJ+T6X8A}c;!L;W(LCK5s+X-s_
z_AMd)t~Z)`f)+)J1az*lOC7Yh2k+W$KcD=4>Vr?<un?~gsI&==(%Xbqozlh4w{^MR
z2JdwXBtbNM1#S#CAWRt<Od}xULW6gkT=Cw%x!xbU+J?;h{c{O%sPq9Z!jvmK<L2;D
zL{GeB6IgS%wDQtw{WX}gz3F---QD>SxWwJrx^H2Pc;p50DKW(6vgbYDT+F@l^1-HN
z{8HoE+FObdns>UyQZii#$sOUwc-bFFsoGwZ*tPr6f1U05T<l`C`LxsXqWrP*qqn*9
z;(Fr-p;3SBg8(NR;)sDo?9nKbnxo`Cj+$0gXZQ4pErqSJAj7J}O+;2uOTN*FBrm!<
zV&<BQN)&NmHZ3G_2c8LuWc-L$us-v@akUPY16zPxr1;J(6(3Kx%W2o!XB_~w{PGBa
z0&3=P9QL-m`<*yi1fHtn-OljQ&F$!T_gNUce$rg2EKROMxAVO@u(harT)KS|lTWJ(
z3u}G$ZTyWI9oTXEX>@fm3ZQq)Mmcz($~^Z+kEEHKrDwMmUyka@49xC!w(e71friSN
zU#XqEHMT@Tk?_#LIj!&p7#!qtH28dWd4Hbvi)$@Y@mFT>hhN-K-4Sfc<XSFbhe}lR
zYC#U9Eq?Q(dcr)kZlSm`qQ2~4o&L&EJOo4}9CpUCdz*Qj^IhCM?mktn-V*_BuN54D
zr(OR;t_}WwkZZlXwB4^-S6l3utz)(_&B}?B>}2-*Tk8kch0>T}SD(5Vwt^%FN8|0q
zsks%?!eJZ#4{!|~NasJ`n%P-rVe{xHK^|kyjQ)|(9QddGX5EHg2MGx5gUa~>b?iLa
z4x4o416e~N)Z>$wi^UqBJjne<A)lzsWK~QDxK4=Z$_4ZmXh9om<l^DxnnjxS8`}yO
zH}yco+iZe?TmPA5SRE}o3{0~?<v>$Vc*8x}{<vkNTLIH39K6YLHD1#U8<uS#iI0ZY
zcLke!t;>7E$0^D+UsdvlELeA+45^8aWom94ea|1BFDL!$8qeqT`*6pQNVwZdkqW4-
z{R2r5M64&EZ|AshhSgqX&<AMW02Fkerx^Tdf6##)KXC>E3+Kt@j;rMzksXb|F&q+d
zuzI9QZO%Y?6d3v$+McvOSwFva5uHA}pcd|Gv5N}v+7WVTdHCtc1%}x9QjkxU8aJlQ
zUuMW7G32Ephv1LjY1=9%+)>xz(rtmI@+{PU8yJ9{8J<S9yI!})(gCEJI$2JpHgvx3
z1Kqd0vc&~>Qe`eK6WQ(Hty*4*xqRljrfFbgItp0i1vT?!TkIL1s)4$uNQ?Fxo#5FW
zpA+8S+E<mG0P~I=J3wp(9e_9*=X$_r@P~2kez~+cJr|IlDPfHl3AD$~;fD?wp`dai
z4Pn?xn#S1zo&G{y`t`l-7rraVbOm_6?@*UZ04|&lt*eRZjo4J+LqKdP6#f6;W9M=+
zKTN(8zkM5L0iCrZPg^Jt*Hg?}xL}$PjZWb--S5b?2q?PE5W5ClAF(-f4sgI#`W<V)
z$2r8yAZ@ZQclAfMeD?A(i!4&tjirExI)$7c26HKKBe_)irN>$RTjOK@&($-_NX?C#
z1v{DkGDwbql3J(xVE3)v*=wXgQ#VJu+ZkgNFarPu6?J_t4`~NWP&PkX2E5h-Xd7E^
zRMu#y0y*g}8U!=O^1k4d<y%uF2D7!5fZTKnbee>@iK2s=i8p(Va%GL-nehRiw*E|D
z%N|MMvw}JP^Er)|OyZVfs$dS3nX(!ns@3^gEJ1d&DiC(Y7%$0=md)_f4rri{Md#pi
z4ir*eh*>S<!eo^;jqO~`jk|(4u#@L0#$mGZRd0e06q_@6;QbPN#6dt=j`1+m!)f98
zY5(|V0r;2#4)D|_MuQvIBO`F<t7308-S6tbTn1gdeJTqm9r<<R(>}>aenc>Tf0gNH
zU;mdV!^Z&-Wgvt^UV%68mf}p#1&vhJpd5-6#3qJaOHp%bIgt%&E*4GaTdLCLJcS*|
z_yYBUV$y*IKs7$!ou1D@o_C?arlwoUCwe;?_$*OPvv)WmkjfxK>o3nnmow3fENwi_
z@UuiY@8yZt*_3AuJ<QsiW=nms+JfR3AiO{2bYWBh0?v;6lqR!fXom||ACay(jgD8D
zztK&4?N4G2Y5X25UA~hzrS3Nu4%dU%7n|4rRDAxzHP4{ZI}1m*yt~Wte1NrKo!QWI
zds^N<qR+>1y{6@T!*-H-<Tcl{v}UO;w>AbL2sUHIZYXE_;ysqbFqvJH9Cj3W!AMR5
z)uy<8CY#{1Fq33o0AjrV6+RaF`V-iO5XxOJ(Lw^|;!etRVEnOtu}!Pe+QHy!ZGP?v
zpoU-8D>qCL{2V!0+Zg~@$8e@{fEEVLyqo%cdz4|C5j?yI(&wmLKk%kS^RTn+OH;g)
z_k23OfPqdpXdSE*0ic^c1cZuU6IZ+l{wJ3t;v>r>t$TD5${{VMEfS|HJ2O8U$Ur<G
zn3Qc#^7U7A9lu}KbWZ7OafVEnU`k2O>122rYvWY4Dw1e?>hXD^Wnaf2jnGiy--eaK
zgva!kG3+{++Pb%sG%Zj~R?UEj>m{(4`BFDrH-SiBU#JVG?=)_1n9bAm#f`12;!^&J
z!3H3>sX{dtPfNJnv(yP1qp!YC{T_xW^WC6vqpOvwkMmc=#rqFL!3bm=unVEro(NY&
zR=iu-wZf_xD>R_vFBeSoKn;$M?UU3wJ;+)@Z>?%2$lCY%&H=D13kFRLUYu`8@=E|(
z<_bW|ehGJx!U-Ybx>Aqrc%li}{{j49iCh}ch^K&GZ%EFJe6()A!8|?wbxL23@j2Mx
z6i2{K9{{Vt9S&>5j1Nncr}qA<FR&=M$UIF^Eb-OO<4f;d+U3UYrr)|SvR)hgPBwWz
z#v(870yLXr_^en<*S>1I5VU)JcF|lLV~8Jp?9}qqZ%|Q{g_b|CZaJT=B&t`tdAXv;
z<A8j(cl)43YR~8$T~EzdzmtiOvNEG^QCL&V?;K3~weC03l3m#CUQGMd{;Qc5WUg*M
z<=aZYS=u*5VP_$iT4}eXw)|J>x0O)vr0<IQ{zC3_nf=-92wMB2>0-s-??$kRzZYxj
zs)H8&>lvpS)f~FgwA4RXQnM3vOMWHE->|fW$}`#T8){FG&<m{@O0mt$Z5da!^)t&0
z{#=^snFNYI4IT@YE+kE2KGq9~tnqpex&BO-C9Y!@WA(-9+WsWmSu#9LG~EAY`h9Jp
zmeKeZatRL|4TJ_HPrG)S6->}*6soAL_W6AwdNg2a`*8H{)|dg)`s+v9^8i137=S2E
z@3((#r;3~KwYiEInNVQGZ5hQ<*Oq_UIRF%^O9fN^KcHCd|A1nuhMpnk3=7oKwi&4v
zkaS({XTj{k{r51&;%^cRT}=H<u(f~lAewYIiu#1E%?pVDiPJ<On6rNf=vorA;{h<9
zNQMwj2Y7aOgcN?^)YJdL#P*BWkXKaYeJ9bN)1(A1DW(ibP#}Z00k~KJ`>)7x*|GPV
zGqELYv%7KOmq*&$nsQX}O2f0ku$H2Z<1sj-+@c3F9F7~J^Oq)u4<L;rOb{f(Od?LT
z@cI@mL=LiPK39NH&BuPeWspKA841(aD_jCHJjR-zm0)p?|3cFFoNW<#J*qmnZUL*j
zrdCBXi{9S>?F`oYD*^4j(S05I(`<}Cu4(jdeaF9-<_9>E>E<csI%z+a&tK!4tH)?-
z!|1CKUcaE?R*#S*uysX!1uhzhg<!Pf>V3ZUA3`S+btA_|>BlNmwuFIouKMxb!HA*Z
zLNzjrPUhsp-4I~xtR$G}?~L&ju*oOxf2Y>+%+0WS!0%oAo$VDSUPpg8F?N#2&`nPT
zaPHorT#x6Xo6O!OhWq9A&#m=?bNOY~v7zf^nZjld042y0rt!CsFAc&A7?jwZ9Z9>s
z&+iW>N;A&_m|SHop{>P@a?dtsQDWgN8St^2>gQ2ZmX+)1Gw<iSVHKDvxdKqB+i0QN
zDz%oNzOmTzv_X#K3f6(nBnw*Ba|H{w%`$Ftb)X=!Y{_N0>Y-$nV3pl?724f097?6z
zdSr4XU{L(ZDLG2P)i)~6I4WA_Fc2-#Wh#!?3b?9@i?WTS3a*%*FShfIwZyPzE1A5R
zo}L~vU-??B2qMrE2ziiq7{yY;?ck0A^^(@Oz91LD#B-K^e&_|gn!BJbD<ZiDxxLGP
zvH0|l#f4%=lX+F4@E%yR`(d-+F1mpMTCzSWU4fIgVuR3A02(bgCA@gu-XCJH$mXkN
zrJ6OI7v(C<V(zHc>5hgo0i8(V%$^!eGc)Tvn<F_Z^tftg{ReZsXuLJ6G%f2F*3B`k
z<1Q0C1sLtsDrxIwywxZ*-PFyD`FI9+I5n;{u2k0Fx77|BjR6QpH6}}_H9CCSI{B5F
zUpX~;%Wp5;%@#haXL=WV)KMvBhQ%eIzveR1kWN!kI-&eemA!N1(`QpM;3+DLDI0+u
zyIHN7^xEepZH4Z%Xq~+mHMN9CTbmC@G~L9Sc8;p~<{vwi&oH_au5Fauo5*HtECMSp
z&rAN@vYOPN!qvSZ;lVaM$FJJpN&hxP2z?$}G?#_>x@DQGz8g?(k)XazZ_;7A@GpI&
zZjwy$32RySm_Z2+t<y<3I;MW?T2xno`MgGrlXc&Is{<nRzE0YZb>BL3>VTSLNDgB=
zur3ug3$M+F08CGQ(z+zXpJK=514Brg`ZB)N*CnU_^itx?kKW5t8rFL#_k3t{QCb#Y
z<#sbXMb9iXCM~YGyHU4~T)dI{`RlLF?CN&8HMks6W}E(zC+CmGKA=qgg1ti}VL9eK
z)mlvH5M@&bnR9!ZWMip=Xxj_4dAYsL$+6M*1mt|d$eais-Jd|3=@ddNcqs#{c0}q!
zF1MYKk1Q|9lFgTV=PTRC5xAh|-;)2#zcnEm36>iA8t+6?#Yh4?4YQW)wrSnW){k9J
zC#JcjJ7hYE);eE&u$$bBQ=UlGt8lVsf7of^FEXf{%(==1FtP==u@bw<9s@fKq;3W{
z45WrL9S4(uSPp_H)e&w2D3_rR{3y{79D0)?SWmphfvQ?Aj8j+E<6=@hngv>=%l`_?
zr7CrU%b*u&!eP;-32WA`4`6d?z3*8#PlJ2{SYhyAds{TdpFs)o?nlBFUxQp47Qs-o
zFUz03Y$sB=E2}CWyMBhn#a%rD&E>ZKY@laBKEWP{zccDBG7udR)}0Bd*I}wxL#yWE
z7aEk*%24LBC}gX^+ZCk10!1p$QG<{wT2|bZK(I9B%!mw!V-lg8QrcxFAmNE!SUeW0
z^sGqX0cY^9kkM1xg^|%s$btcdb}<Q>^9lpGC0QiR8PQV4kAd70bSQ`T<!ELLoC|&p
zTs84x%+(H2EOq8>qEe(@W5NL5kr_VT_w%7edz;(U_F?uy%CY6Xt0u%&Nl!<;lfzH#
z?!JSn5Efmrx`DaG7H1Ds(my8coD+@U*!@c`%|mmz+erOD*C&_W1pd89_3<gef(Z&Q
zPF4b5yT*c!`LeiLu31A0uPH@tFIO+X$huque{fd{!ac@GN84TA!eV)I-e(`gI547@
zBVsb}tOT{$X-P)2xc|wH=oq+Rd?NrC{vK+p-B3QL6m<rzLe70^{MhbCf<9sEqV%*V
zDwYI?UCMu;4%W3SjE^191C{$nM#~K)C7Y%)tjkC%-=kEe<+0Hwuv`Hp0a6Bg<=f?X
zPWP}E>9i%*punr)*`ndu2>vgcDZi^}ACaB;{Usn)xQ(JmAgpWA89a%3W>QWpMzTQn
zRZL5^18MdaBoAle&M>dn$~F(_Z1xa|TQZ<5Rg6hFjlCRBP)JS4t{-U#MJ=9|vLOgV
zjlU@rP#RT^vor0UV48+vnzkK=mskB82t=YnymS|hyaON|y_HCC`+J$CuG;_FR*R5@
z%SJNZEi&il4Yq+?k7kk-)v}#jXG7`M#vxj)(o^o=#Aut>T^1Fao}C9L4mAkXyLtb(
zepl8FTqCSaiTpw++yL}@8g1b%NhwqRx;LAtXXnHb0w>*ZD%X#;f)VYVx=pPw9TRGK
z{7wkWmg%I|-n*fW(RyM(692|LBSF@7JChm;g}E!n;FrP>Hn2<4Ul~*8*LxfDv04O1
zMX27`xT~}38h9;gE~8yF9ARY1^DDDS->$g9?ih_y(yjMCaTI>UhEQcyA(Xu`rOby-
z1Rd2aLJ&C`b>|V-6_HBmVr$7GBm74JIIO~3EGVA88vhz)Xtf_NvdXGvOYUC?Sj-~-
z)cBS4_74li0fpc?m~{Oe=_+k)y~=c-<I4-lFc?F@?dI`EYW|J4bhO0f<Cq#^wJSGC
zV{0hlk||{xJ(PV9!ty7azla}HY9XdA&m(!?jq`v6%?WJ54n?M{yYk-TTvYmP1<3!!
zKV1HK>$~0{#z7OkX08^h0g;Bmy0xcde<_Ib+RB#OyK|XF;wA<+#v7J(6qY>YxPS`f
z<c@uMmwy@ed}*Tn`C;~Y*nGk(P;^dRUBSrI9tRtk18?a%J<Z%JpqFl{`xivBE5oR=
zeO90X{FH@nn`S4EtZeR<xOi$0Z~jP!gJ}$NrQO2*{Z;ApfuUL2-SZ$M&!A5iEP8!w
z4!~8Rz^(A|2Il%6al-m~^Z-z$Y~jRioh4oe%Wt2FPe}0T_BiAQ;|0{wP<klh46hZM
zTn<wF>d~8>?L47VMvm>1vwJsyV5CXb8+T4ujw_St__bY7lY>`vLMy5ABTxIQuTh!X
z*Y<9>G}4iMMFJ6MCbX{P&Wd)s=Vvn+d5qXWo=3;+JJ4Pco%sDq8exYHRT+SYRW+=Q
z<1e^2u^P9^>0hklogF!+Fo~c=zcE@$90@~FsJsKu@{`fiZGX7jt;`qa{_ddXe;Yk4
zoy?W(M94i0m1h#{GPA|k&hRJz<xbRN+#Qb0rIXk01GC7@)pD~BOkj4w&+<_F*t8Zw
zXoufFAeJC2o+49doV-WElFA#Pwy;YY#Ib2a(^w5mp@nWOmg&H*_eQx`?((Ulo{F2;
zg*^s$$PUkkzj_DFRM#a6<lOwy3yXl*$MbP*$P$O-`FrMRMFGirmG>$*ncDN$<JE3&
zIm+{7i~eDi!0XSuT}?t1^A}Q^LR^j+BMT*Hzz=Yd9g9rWK7HZBF{mMpW1K{1SP|%V
zghoj;F^QW5P04z>shhjf;(V~F`W;eaCkNY#KSk-@Ox7@YtLMAgRpkNTJZx)r#=_1|
z4RQtDBbfT)9_J_p`3Xx!&lc<Am=%4oh8J#ub;SD95{WI0%<co?Whk7h39dUB()pp#
zvbnKz%{}(wx^O9?jG3{wjpGH%xiR1N_PhX;9+r?fB_UTsc9-;9>94BW#j-~JrEK5|
zm!K9N)StHNDus^X!V33twMOJ>lj1+<3f1uj)d~U2LTAZf9(7ApO?4B#_-g7B#+fYX
zh!QNlH560|u**@97Qy@EU)lCiXQr#?r+YF=lZU^mDCMdAuv8$Qq``n5pMQ?N#M0-E
zm%R8c@}1v%lF!f7WW`&l-rL#nNo;u})J6jj*$=VqNAeBfaJhUybg-=*%1rS|Ta_G`
z+UIfkWc-GWV-T$?Z=&)JZd$LSZbugLYDxu5+tV&K$C^g?LWC<Pr&tUrP@Mc*4;4x4
ze+O^&C<rO?A}sUL%I8#Mzgb?591eV{Qk5{=L0C!y*ziI8{zj@5M8!0#6?li$^1)<6
zu-3+KvQG46d-LvtdjZF2mypOB%@v&996#1QVY}iX$LD|W(;y8j4|?|39I}Ff58_KU
zJ7wD3A*GJ`&$w`v#0zmta=Y3+kEG!cE)%@KlDt-xS##&w!{^V8Q=L)zCR5`~1>LdY
z-W0`mmy87o4-s!gz1GcCyDD9vz;H|dAcWMe(YA#8YOnO-qgAKrs;N2$b{OdOcrCgp
zGsSWph^EUPu2N5ZEm*$+XGv4tfo3=am9X|raf@FxQ+#%{y9rN8Doax1WI+*WAK0#L
z=J!<}c0T6kP&O$ZsdEyD{{}^~ic)K*XnY-?q7A<Ouhq@^PM$skVGQP5=xw49P(L5F
z<GQTUWZTp*eI23O8}RUS!SW-zoc80TIhjI9xhg@vu7qY(*KilDIbWj&P>+h%6?xp!
z_WT22p32x2YR4GEXdzMr0!;Q64K4?$z4ttQN0z39fF6sRF12P-s76s+SQ;k*078Hr
zL{84TYI+-;(#Wb4_A|AhihL}yF1#39mWS%E(oaQOO&;$}EUQgzZgFvxMG}_*2`LRl
zt{_YF)L&$OwwyYtY+5cOK~_MJOPZ9NPymKBqvanR=b-G$&wVk23o)Kd0_I=zcT`tz
zUrw71Q5dHvl)&Ggz_0-vlR}ZL*#L=_Z~bS|w4+-7fls}k8=2@<H_IiLw^%D~1hT&@
zfk9k%;}(D+Kq&o2Wbvnl6U85)AnCwNYPX8rLV!AZ-+%gpKEtJfOPxy=0lDT&6)@Vx
zSTf|3PpB9+OdgbOt;Toxw%IEuSf83@hME;cy%m1dO3WGZddg0QaZ+L|cb@uYnM4C5
zkYWW|0DsrN&<KKyE%BEo#qRPD>XSSIG9|TC_Cz?NbCQE)CL|nmqah26bpOs*SY-*8
zxXTSaNt6T|^BT@NJ=?UAL?7>Z_>k9Y%)jd6v69{4?0;5sqVx$BDM9y76B9?JDd=L7
zq){yL^hUB4U_-daVItyqwygRZIWIf&Gq)9bk{)3LzcsprB{O-Wyc5wj4o$@sMX_tF
zBTv||`ClFJuv0u4dBCA?E!{85;h>*qM~0-=OPU_l&Ib%LkUr4AJkmsDIl37}UU<cw
zB+m#evE^sj?z*6p@9yr7C%jzaH81i#_9yg%h>|>>FL%Z->#-!B{i!s<Sp0SWZucb{
zA$vvWS4+bCBaZSw1wfD)%4)a|s6#Y}Vj~iuCJa(qf43@Dm?E4BHfm754KDgMFoZom
zV)P?IAj*CTX}fpvP(AooYkQc(-9xp)BP%9~^6@yw9P0cc_(xlsW++6WI3<XvmpAI8
z2PG##gun7%6^od);@U6r<T8HJ9ES~BUI0{*Zi#UxhlDpuD^#OlZkj<0t`P+uRMu3O
zNQithQrm-mx)AYA9h6{ACUapC{HGYqYy|h^q?dk1JCD=lEJ4$Q6>KPDsjlxs9eNB~
z)C=MslsV270@E!aJN{}j_a~;c%ku<g>{*OQlr~Natx`3%KEe$F<9LSF=RPUIr&?c&
zRS$i~_PvAqCF?@8AruKl@9wn7E3sU=`-txy^u8vJWVTMJtd<Q<zgEy&7MjKKb#@NQ
zLulC(PEa;^y&~HKsz9FV?dKSRcxkUG!xU$iG8LaM-%8}FU)$Bk;cda_>R2)1@GA-Y
zOIPZQVztpvi81=-G;ZDgda^jwHJB}=#zA`9gf-2%b7-Fe<#jT#bPGmz&dXuP2KBSB
zm`@#%trY2~S=ME4gmeY0P<mJ;xvaU8@C2?9ln!OK_CXK1g1P0V0ZOg1Vil-q2Z0cd
z+bhBxt(Bw=wa#;DSJ%J*W<E@Acc2NQR-9{3tw1_OM_Q#-?Xh=$tH&o)LAVi{B+0E%
z%DYgixrQ)<5!$9g<PoG4V*Z~55kcle2%r&g@hov_v!C|^_XZ|XLA!2(#O&wgdMR?l
zm_Qq*(KUsPG4VY>>K?;{UeSd~0@tuip*_jQ*kV#wvh20a{K;vXe9;LTTCVz%zj9)}
zLc^;N=x_S3IKfxA^N&1&<;>}v7bEAtU*yYuqoug}QT95-hK4-jr(4^_sr2Rc;hrb>
z)){2cs+l7!sJq{ivSRGvU9n`K0iDdUOg?)Bo;J{diw?PF?&;<$xf_dTNAx#ki;x3?
zU2QhG#wEHc@ZS%xBu~dNv^i~`d%~ZRX&iRysShxXzM*WSkr2)_yo_`fBtAB7M5keb
z>HN;fAcJkB>QmeR5fFnV{cP{N47eqky7b#?<`VrkbvfcsySCQAP*l=^kO6Yk89m~-
zT+w_N!%LX~XKQYj?q469J!g;_9a)OQ9k+c@y+SN`T;(mVL21wy2G0_gb9}zcf4ISm
z4@YQTY9276MprF6a7u-wu!d+L^vG9LD>xZP2u6%6hq^wgnANKZwTOTBmYjDq7{*M>
zmiRks6~`ds2CV$=c!OAfJxeBZe^FII6yXfxtd{I1rR1|l0HaL+uGwp3D=7(lQD$z{
z$+B(ieG+xU?K<(6eY$i%4uAC!DA>Z0>5{F?i!JH}_S?|~O~&1TI3{lK#DDZFt~dSL
zlqrb;zi#)%d9Z?MbcCVjgJ=|^2WED&o8q;T&nfKEnZwjPG7zb90E~UKvOW?JAd2p}
zkR|w~K|uf-x@BvS-<0fzM`GsGBeH1qpxqCvX%uhQFr1mqV32%h(u6TFw<<Ty9qQ|1
z4;kK715V~wxYgg~sF1WOh@a@epSVE0^5><QWa4t!PK!-4!e=F&d^X7iz(Mltv+WqA
z<z>6_H$W_3hfoG%uBLZ8oK#D?Ubn;hN1YYwvj8{YRL;$i_sd<8=id4K5*SNYVVLWF
zj)Q%b3Z7W=!^_rL>EB8M)$3nUWN5*>;GzG3nj1=8Iv&)wQ5NSa-(cDXC#>k?Whtrt
zjN=BC83{@X&>Y^Sw=#srR!<Hbh3ROd#`ea2ML%X<+#fQUtcz20k1>LUe2VL~lg}tV
z_3tN!M!w+RPdJZ=gBF7vy2!GC3mz<XxwcnmIMgT5Y!bqWljAz$UH(-kheoorBy#=R
z^WMbvu(<G+yPc|Db{g*@>e4zOVMy`5M|H{?rgRVu{hy^P8{F^khUtlBAl=2Ym-9{C
zyfo+V%98h5w<R=)X%y6+Qd7Y6FZwfA(7gL|7i8o(L;8vqBc!9%at9hj|3-&G;#(86
zJu?GZ(59!RimWQDS91P~%BG2Z|B@Bav^QQ4uxb-PJC6#^Lyi`brWtB^X!V%W?o6q&
z`zgRU`*4I#fz9SrU*PamTd!?nV!j>S9$5s%GF?ahATe6|>rT~*jIPFP+5TRx$tmT+
zs8j#CrieWP5@Lb06gD#5>V4W6XjDdAl|qifBIpl$JQs%6vo~4bb;Ytk+=Rw%1;*X4
z<+s1ds`Pe-<@@q-z5B+3Jo4Ei40i|o83g`&r8DLCd-Y&bg}wl$_+dESNe5kQ9TYQ#
zA&5o=fiuc%3DlcUt>*I+?v(p2`$RqOkwp9MW%6y=oK~y!uJBD4HQeHACF$cFS{g(c
zy<A(DZ23-E>aAWc<#Ea<5_HC{;cw>Z6&B`$<MG{Vdr7y4hA+<Y&Rh7<3cey7mC(h3
zOK?3I2%;uJx${TM{#+towr3%xcOfR}!*-HEQzKzC$<n?B@vF8QF>8JzY@s_UoFW<t
zROFM>-5alJuH<8a&bmo+0%~;iilh<<8Y?)aWw<4I`P0n6<@QFk`xJ(O3DTPQz8{H7
zN*m?NqcymnMI%l+XF=n$>YZfcX4QnU*v`OZ^YEl;f40?kJLJuOwiP-Ef5T-WUGsTq
z`Y=?@;cwf3{-L(6+QSD(&i%;k<WUy^#(2z8QFN{z7I#ZaKXN{#kLyiZbN?!TtKhoo
zE3jClEdYf;ZKzH23xl8Kr0Mp2B}BVEciBNF_D^!>3qF;vHM<;@zi?Kat5hS&*;h0p
zYg(UUo%AolA0quGBO6jPS5hq#^iUz;w=5M!ll8c(TjHzjz-G;BH%Tr_fI8k>7xumT
z7P=0;NsToaa9MyH24VA<)ZHHXK%=<~a&5`uqwQFZY=)SPsI7%=LetSh)A%<3+jXEj
zbFyG8V0MmdC194lZi{%jg87}n0oevdePmT=eUASZ!WVAK4OL<UhPx4y#0eqo<hbap
z9}hJ_MA>q=t8Qzn_ph6c`A98<Gs(3W4%gpZHXK2D$Ndj3%L3r=p89v;$;U5?koa>P
z=W6;))Mp$!Y}L$n5Hl%zY@53Nca?hcK-9`TY)x8k%eIxl6CYLWNLYbe1=7Hq69L&8
z+iJ1wcvu2L7c=PvxKRJKuC+ml%SuuAyW}6y=j-*BE4(Jh7w%22xZGoG0y7UwrhK-8
zHq;_jv+kY@L%Jq!+}gRaO0PKUES}&B<E(lb^>sQq8%qgB(Zt>*#&8s%0K~Z{PF#rP
zCdMp*duZ7`jx2Pn0ed@Y=0KR%%(TliEAp=9ET!T@KamCxl3wUz0xzRzF%R2?0%5B}
zgLbN2$2Ub=u*mk=F9pFs!A|KQ-2DdYTK)uTqlklGsw#93Eg9_84-FG9*lYL=DfGm0
zX3}RQt2F|~Qa)<Nq-MC<lJmpJ!a_6%prPe)h1O2S7T`dEIveFOQ1-0Z1f+;?`11yr
zexNX08@9BzQj-Itzr(+D@(*Ix8--SIN}M0FQ&K*`Pr1FH5dvGlA-d?&S<;#Z)r7N;
z=2>3#4ifw7-v_LMBwJz@Fd8N(+#xiE*s%ZUUGhT1vu@^DOB2Lo)B}z$8lBV0E~*!S
z4lrLYD-U9<);pf$e_O4;`)Un#SNBZkJI@R(_ObG1W;C!KChMja6WaO)?}=kKFg00y
zA+m*}Xu^rJ*os2li&eu|JL9y_HUXPUR^6($P^-lKWK==Bf2cKJsU2E)Zk~At+ra0g
zGUNlR`qq&Y*a!Pk!%@4_2f(7Zr-M%+yC<)R8vQAkED&&21`l_*Qp+TM@3SdpB(fzM
zLsk5fSj{fdlz^!pLg75efRx0Io?)MWMcO-VIq@zF{;CNJ>VA;ZpBA3yA=J;>FPEoE
zu#kiJ?56^~u1WGv++b7sE10wTs+a?eRA~!E*+Ew+0G5o7rQTk9V#|VB0MB)$G?~rs
zGgdV$yXX1Y_cJym4Y-3&<%7|JTy;k<O?cWPstIvpfHDFy6e{W&NTZoW|BfWR#4SW(
zLE4w62_gsdN(bc@qyZbX4|kNEZi>J%Gf4&tWt~OD1q-=^+n@_i(~>OMtP4V{2&>m6
zK74c;r^p7Rg^E~-w)}QTY{=E3;qO07LmFKvG?+KyM~A>0#xo)GbTa*tdMpo^VRL;1
zCvqvLJl})hGb|;!dO&UBSV|bIGO*n0>!xjCjfktSFCd0@8AJYF@CDhc^s?SLe82MO
zMw-1_&pfbuMaexxvN^u5eR5cM%3Jy`3F_PR82$u$10H!Xb0F&zlT6~EO?+r*SqD}s
zI&};R^z8Xw>6n8aCZJ4d3B?EK#jLB*7(6~vC81b8)6|V6VpDTD-O3TV8mxQA>_*sW
zEHB0g-IL9xsBx{Aa$_4nTa~fo7)h0+d0;4DWOx*TDDKE>?J;#c<ze;zRPnn(`YeKe
zHB3|S#JRp4rv3^^g_`hoTHw7)Iu0!0B$OP%wIG1Wu}bZ`f?Kcv=qZ}pi_@b`drSY5
z|51uDxLa6lK8sw6Nl$;u;R>Y>{^5R0t}91}!s+|4`NlT-AntE?0)JFYOP!<G`2P*b
zjdo=(RE4$ERIx`iuKAQ=o1@%pMs>!igVi!Lbod1cs3!hS>KBGGxm?A9R33k-@S~33
zgX&qU90A-_KF)(Ax@Zf%o%_L*?C#{NLEZtB1a>uQ6i;Q8-%;6nT>%>V@B8S<LuBUS
zT9L)L$N$9aQNl=SafegYyeNhs?<$Ax-yQ*FzQCW%Uycx;No>xrj%R8AKxXBhdWIb=
zde>$EEWCyish&}&2TKXPoS95PCwB)INeQ0yZ_!X(Cj#ERJQ*&>J}d$E(vF${Gx%No
zdXEsrCm=HJ4}?5@B|23qH3uLUj))3T2XS$`zI_9DZ2EskiS|@zQ6^N~E1KIb90y4f
zQl&#cTuyzI0&XkNZn*Q{+4V@|TH#WbJ~ZB-(Hv$p;*opA1zC=t{VoW3Qub4V_yzN6
z%Df6gaSftD^eXANB@he}ericB7!p{(r0WLy+GuFFOrT{+l?5Pj>qfN6+J&I$tgaOe
zb8k5{b$e9BtTb53SbiKQj!o9r^*G(zP#t0K9_r>u{SWm=hzrkD>k3vVZhu9&J-B^O
z>zcU7Sp&8w8Hv)K1GAlx%JbTF*=!3wis|>*m%M?jsJo`4kC~i6-lH~{Bw|^S$m?Gw
zUNFgei<Qj13gzt&2$V{unINncADGYJzv8B&V+v~MqM(y@1cC(QjD>md<(>XFCc7hJ
zZ$+aX9dBz^<b&m>X$QX-slP+04B8`^>EXG@8$Vpg;b~AWyFMQuO`Pcod9l45%x07w
z4i|jql@o>H%5CJy@aN79*O{y!{JO`6M;WxQQy>l{y+?*EbLZ5EIbGI!M#8tCmp{j4
z;p}>6e){Xo>88Z3fEz_#&!o`r*`OftbdVR9;3wZ^s{)AC@A(i~bd=H+Z^-l04*rX7
z9_8kGXKNEB_N6;}Z7J}rtG-p@&h>Vl(eF9>VAGj?#|H({BW6Dn0y*428fppaCRX_%
z{Qrf?A`m)<?QoJF>Jenn9!ldJ2AY5r=N&BL@>xrX7ok1#Lpr|{%BiuVM_T8ZE`-ob
z{%R4#mHvzd`VS=AiRG+G<qHV|wdPsd;x=6IgK&N`swM9EO5ibeUzRrotRQ7lXa;Ee
zld-3I4FHwqUsy*H{qlCZ*l!H%I!&NSOJc<V$0OF4LYi~(H(6M>YDKw;9^T(B{94~Z
za)(O{#;(wA3FF2w`;G44Tfm)hDw=(3`$86bei{5>Z$hmS5toY|i7hj{G0*3vhb}Fk
z8)@7GUpq9=2c#Nl98paCzu3CR?nu;iZNsr`+qP}nw%I|)wr$%+$4)xv*mlQeM{o6<
zYwf*1ykq1y)JWBHpV#38t39R&QHC*8UUL{D`%v$6cD31zsW<>8&~Dn6h7crV<3tp7
z<9~)M02WjG)6}t=7g<Yl+ELNDE?4Enq3<o2zH56UZN1sIpAH!D6n{Z<mz5-c8fa)Z
zHTaNroT07_f#*g_Qtiw)9RFR6=p~Y8_Xt=A=f*?3c*~tF&M`GFo@y#}cA02u!iXZ8
zdWw21)Us1&7*D3xBVrR9M!W<@kE(evT@uTz4h2OsfUk2&jZZs0Qlsn<c4S_tIz4JR
zq+N^7zJB~$W%Q!DsKsQ0R5Q0tt29HvFfX3)-$TFVbgL6T>e4!h$S^!-%eo-8{piZ|
zOGjY}`AeQnP0BRK8EPeA6lU+pbfj%hq8r}9PZ`2#W?k41g<AMFpg*VWKei`~u3HDl
zAJ3}j<I#ofZajd7HJ||7NdY??kDn}pJ*e@<jM^HqJNUKy0#5)_f+@Ew?crh`c`hcg
zNXoOZvb}h`G11LPqc!$Hu<cqDAOX<vzq=Yg>0Qnbt*;foX!w^*5~ov2>AQVMU%u;~
zt;p*SrT=68nc^)1%s=7Go1Fiee^HU{WGkWFzh-78E&0aTnO!FMo&9guDM|3d%|k_|
zYfqDrW=_{BqyPE!EIXFnRgIP%m#+eJSm<JT)Cwq27S>~=!U;L1Qot;Tgu*1{fwyZM
zoFkWTSQSP;0yP<MlqMu~%!CGdG1L@VNz0m)&2exp&uBr`{K0=qQ2yOiN`l3e7x=>5
z3aj7MopOK9@5ka8;BI})<xX$l8jreU>{}+F&}-mKh9DwDt3JI;v?sFoxi3iCUGS9G
zTp>U4b3$B8yAt&DohlR*R?G{Vo9@2f-rk-Ntr7-J00kqh>Zbp$7z7ftn*<~n3-bge
z@6ZW^v1BOtB`kP_n`i380e}z4opyzn1Z^1z_h8{9M6jiblrU)zJqu-dg^OnqEmN2B
z8RWyd>@3&y!C0japCPX3WFX%Z8IL(}A@fh=s!PR_o-}BOW-k#*Tl{L{`K}V>ia$Dn
z5Ud$(az^?Q!oIrD#~d}2GM)F?YvtJ90refDUHf3@CU^t|QI4B*aG28MxoE<pQfz8!
z4!Im>VffbX;uYB;j);ztB+DRY)f~l?LT*B6D$>7r87+@ptEf2mvyotGSQTsy=WS|i
zA~m`yGeA7Y(e)lG0~f~~{%@l`Sbm0FE6kCv;Sevo5#zFb{%K?))l|+reMKsbZo6Ed
zqm2IDuj1ZzF9J~^wg~=U*+v0{U`)czU+<0Sk88mCL^I?UOysaXh+b>`9|tvfp{!w1
z?`ABkXY8)$^kWy|ly-qrG6Ytiie8a3*(=ilh{Cq12_%9!*BY*x$G*5`EJPeL)8|li
zeoohw&FpWYO@)0bsR%;VW0wb@XV5bOU5)`q-;TBIpjkaYAt9D|%n4On^4v6Qhs*Tq
z%3bExr=qpEwk53GC&=${?K`VSz8wdc50>J1B&%PjmUU>zxcT(ZvI7maH<6}M#~$Q+
z(5J?Ux%$iCe~WsId>`te_(=McN*zCoMPU>^roP<OQ8Tu4T^q*d%YcGbOy!!}M8+kw
zVKcc66w{|-HUkA49t%!A%x&U;iQm7k@NKWZ&uP_vXD3fM@vJeDLx}o>YnXlkM)I>p
zA55Yl3O=LSkO?%MYSy9U;K)-Huj$&^zAYpQFuNV2!A8nXQO&T4x6Dx^BT#oHC8TI<
zQvJpxnWOY#MegE-rYtn>=CN1Dn)ST#h4H7ZjM>_!J(?UfswH~9@VTR6Zi=D(k2Ggv
zwK{D=nL9^PzQnUz`iKvOhvJYrwCjy;MRR*O<$8PNK(-3#TZI9BZ)w)5GRLe>peKvq
zUv!WO>ER!Au=-zgkj=c>ZB;{P7a~})<4`xF<_QD8EtjAN+IN%2O~g^GR3kcG57pgt
zL$z*u>R1hkty#Dycm)XkhkWe(4OMu(GRe(LYpTNy(35q;H1dUJHNh=Z#mWWq1?0D1
z2qvYVfhmI*hBu`VwG|X?7g36e_!_9c%fU=Svs*Ih*UcSfCE6@|ey#mn*1H|Gv(V*e
zr}nj}-IZ$x_0KKuTrB44a*I{Ca{?PEVP`{OZ`SVnA8k-kO6bPuf3(5fgox*AO!j@b
z3^a(Iqe^11HqyblHL4jsv0@A!R-rT;3)d*{fS|el15~Hjuw1OHL|T(25;6Md8k*Mw
zv{eF?O5`Bv<HPIrk&qD5f<A@$6IA#LXPn68jnW7KG!`UE;_(u~ZS=cSO{%N6QezX{
zazeev`c$hFhRJl+KgpKOJ4wGBumy7_f6pJ0@<ye~gq-%7Vhibdb?bo15O{y@P5{;g
zR)mH})O<$)y%y_+gzNr?8eC2?EHO8A|D3j2uI1(rCKMyq{8f6tt2AMXij<zKLhT>o
z^^4?qp<iUpJ9Qj~9-UTwzo81YBs3iVaq#|Kn{08gqm*|Kxkr$D$hFd|67HDUKG>gv
zJxq)xgyE(Sr0*izGOh2`?DhrpyWf1zcM6OH9A)6bGob*Y-%P}V!DSg;vF;G+2)n2i
z^fuHD*FterGV@R(zvAPGkEw=fgW51+mI8`I+PTO_-pI4TWnEtLdFT(LLq8yPYgETC
z8I^v=*l@S|_o+%)JE2nJ?Zpo4c<9h{B=SA>4MLju@tu3Pzq3)yigo75szQpx(W&P2
z7xXiX0PE^jGLa1pkoM=t&l1iyTIKd_EEZaKLN>+UW(5mLyyen3L^sphQ@0U`CALv1
zOIxc(A_5~qxo|XF8AlL9?tE_&hRlH7mzL1I;n*Al&U`$IqY*a+!Zu4+?;IzIvZ3cc
zE&(`(M$<*|w?1k|&UY@O=Mg%|8N3Iru)kZ{6X;FL>L^%#K%vF5;EV?GsVCW+qWcE-
zyHJI%G)r!)1|2*o$Ca%*qyL2r@xV<LWJr!S@8vHJGMfQkV7@Z=M(J_+{tU@CkhN1Y
z9ajDf5q(JD{iV!KTgJ?+LVVO=#mv$b`2&u;lhaN2S2PY$^SkGHZF{$M#GDg^T;aIy
zbH@GBfFAY8d$ZP<)NVS}Rd6yxNq4G98=Z%~Y0=cpq&_7-&0VhxS=78aBtCT<=#c98
zVbnpYzohO0_+3@jG+~Io;PLyLKYCj|Zn4Evw;o!o(X=t|#O{1ISsl7+uI${+4U=Hm
zp)YmPHuIUIHs|^t5g++w=+wag4N5Dwm82YX-O|oFk{u+Eb23h${L5KKAsFuHQl)&S
z$gdYK7SiI6EoVxM$tMerSis7R)l>|loNPd4>T~)<o1YRq84(50eemEW6b0z;3jiM+
zN2iH%x+TIjWlI-+bv6F-@ahYG8gdqQY2`NG$A@+aQRGFS$;X{kU%TaqyH!mCW|c(l
z_TqOjI(Bc1<+<=o``$vu%P`6$Ud~t%xtoZ>JXS>bJppNE^Nuy@DK`Di1N5u1pU>2s
zv`Jpb+KWTQpC1JFo)*M(>Q@|SnnvUP`0teO$w4XcyJhweZ2Mh9#@1lb(=_!Y9kEUO
zS+FJRB65K_j8{918DcsidEp0W0qqDL@azInVMSv8oQ0z=`C@|)Ht$BixAi@aUTG7c
zUWxb+z+wj$dC~y{)eGl_H44&~yx{9Pvf?{wpK1BjU&W9d{I=g!lt5+%DjC1^dAcEw
zxpsNI)>=Gq$*A0%{|<_j+hIr|Rnn`UuvO$E;xOWS_;|tSjdVrm1PwZ(cqebsPXeac
zRk-H|l<U>JUu+jr=dmu*znIEuTXySLxW3!NTX(&A6-CB+So}GdbC4Cok^;f#?xOZY
zPE4R9$AQTZtHz2PmSTdw%<bHwQzBx*7zTNVO=68X^^N+&W=)2_P+%HowPa)&9S$Qf
zOW!c15nNc#y0l1@ut%@^?AF(1tCWlK?p7h-EKsvSGY;2#;wNUqSTqg2eNTrNBShYv
zG~?|Sq`MX%v7Pvt1T>Jkxa#~)4_XS~uO=>M;$;ghz>UefGJRqz5<cYEAB86p=m(j<
zP7_NkL71C7R7cYc0%O$}9B{jlM)EW3b16@$4SkwD{#1JJ(G)cvNUd@?%{Lw*52G&H
z2a+3)3z3bR7r{}a)%AB^a?_FBsjNK4iz*WbJ?lIm*-h)l&ESMXRm<4)OUj(`hT?N{
zF4e5SD$AE6C@4`KY8}-timu1x<BEr(F~|+=E=88o%8Gb<aiaEmE*zg&-1n~zGTMHQ
zg(b+r6(pK4&wwW@x2Bv4Q*gVSZh_4({6La<9Cw2H?U)h=0hH9cMvDj!uMMAN&$;(^
zdj(ndDNuNK$%ZhF_lT6BT|#3n+!b1zWO~6?ju3Avmp}Vb(jx*pF`whdb)Zpq^~D8a
z@RTPASzac|PxlI=v)wRg7JhCdZWF%JTY6_YUnE21vB?5=5pKl-_?#@t(Gq@JSgTyN
zgfau4MNh4CkDIZ5DnF;7H_)KK791qVeT&jj+S5e7cmwR)sAg8R=+p|C2MOi6BwC{j
zQv<G|!MKU-3_lj0N|z==a?8t3`?9@H2RwR$s(@_j!CW%UUE6xO?gybx%tU@JaCt?q
z1m1Veri*&`!cn2-M3aJB49eY)2}$w%qKIVBgca;NDGpg0h9J;1W*SjY_q{mZ#jtE^
z+=AHFJ@?z|l1GkIyaC&zkujl<=Wo!^+aQE!&rH-|;%(%J3ZpbE7T>y4z-_VmVE&j%
zqymTDnHUti*;DnLaEn`w15@UbJEOq9tjf&w`GtQG^;JHmm>=T1vG|2R2VzOs_;+v_
z%)_g*4}161K|MjPm~%smp^{ltrrQSU<u9l=WGG|uLI)sN3K%GynH})zAzGMnA{A=Z
z+!=RFTw!6ct1RlLqM67|&@W^PeNjffGIOO2M?m?F5R==RMbi>JM!YLx3A%-_MfbC=
zQ^esGPS)U2#r<wZ5{BQ|gkE@7UXAmSE8Xpo$u@L3A$6%B)RmcR2F#VabBaGWdttSR
zATt@7C&>~#z9a;HvKUm}abc-9cfidlmuPJxwjv*sAbx2-N4q0|eV^Y<CeD6x)E;PP
z3rQ8^J!-q?^63UE>?212N|uN?WBif6{X#(!aB{hgcX@hlALUD&i?YIS%E;IlIvjJ_
zZhp>P6hx*Tl>(dGt3@3kZ&Z<vB}<rQMX?mQr@8NX?`o~o$-BMbc)+uPtAq1yQ>uX_
z6x>G%z5MJK@g>M*Rt<Wj0uUdWNB$Kb+xdS25uwcO|5tom*7`?$M1~#40{LI@F(Y!q
zs>vO0Wr9b$cRny;-&hkMJ`&-RSc2mtcOiqA$E*Yja`m6zG|+m$R@S44I-y27*#RpZ
zcp)OjLT1lXFi^LbX*S;k4m5-W)hxuhVKCynB%8&5D(k<U*=LB2rL9*7;L8s$vws^T
zPj#8rfJ-hcaot^uN)Y9&%$abP<{cF~5^`wCdZ-Id!w%hGEs$m~r<IM<^5G$GHJ#8%
zD~JJVosO9KiBBSiB~4-Cyf`A}Qe@DkM~B{_BVDaCUUGlRawH*T@Bi7alV63zv=V_U
z)%Fh*>B@=~R>u+_n0cY_@bvifY<G6qoJPkLUQHZ*3<G}VoHl?b&f-818q7&CezIY8
z)$sM~TE>F_3*z{JxR-}gwmT5npLAE3=*Ky627_Bg$I`Nvb{^rr?v-h_SR^-0X1`dW
z!d|6MYKGO)tmCS(y7UQ2Z>`nH!h~R1@BnxxZ8oIK%Nh+a3=z=ra>WxN<nOx>5D1$L
zDWtG+K8||ZB|)r~UoRH!+RDt1{!K_fB~Y)QtHX2q?e=L0<2VoS9sknrJDh!`iI$b@
zGvj@Ent{Gyr?30o>-hRiy!K#5vmuldfztDE{A9X?O?qBO`L!&l{P664PY2*tA1FlC
z@<#IHrQru594QYKobxJAfbn4-U!%`^ulJ{I1g*lL$X}O>>W*L&H^**;Fp!EQunQrE
z-W<BILixU(DCqUIe#_gk0Mdgak|ff=cI#DlQWeuCy&Q8!aj8yF`*W`<sNo9hTVKa%
zUT-Mmo&b+_7yFSyZp{Z<v&TiI_kGMXJKewOEfVX5qu9UcEpf}Af6&N?i~pjLYz&?A
zUW#z^Q`@y$S%w0ClHcyU4J&}I2OHOqB>CW0wKFD+VLfp)c4=)JJ2nZu?5Iq(%W@l^
z?1<*S3x1^Q49adNJa`#VW6CI&X&{#;X#yK~Mm*Kc>gZnh>|T^_w%LVUYZ~?->2?<Q
z*1!gBS9*T$b{@~IL|llE#?Zf~HUbV+B=UK|cdD|kZ+o6r8Q|*d@fxMXT>+*RTkO!T
zCk}S1`s1qY5b)I1N&ExbKG~B0E)n7N0D%pnUC+mP=9uE!Vw&3_)E!0m5;CK}l@)6?
z%Y!<IDPfpKftP%p<_e>#o7Xag4J=>)N!V~Uww1jx%pTo$9j0H^m%PzFitm=uK`!IN
z3VCaaTXTb>@6B$=Y?h)e09)D=L1m*>huPGsscsRkL8B`VMxMUbuvWCHM$A{cb4$!V
z^eFiZ^|V%zJej)n<WTHX=vOY)Qps`LqM_|CeJ%Q2XHhVQ+t=##!=Cli$*W?;S;FEI
zMCFHko5s-;&!Q}n!?Y7EBXap1`b08Yc3-{><6%4d%RAJn9>a{T^E}@+pM5-%@1XJ8
z;Cb-CvS9(JK75r!#G(ZKeL;HGj%Yo7pSi>&goFN9G6|a3Sc7<cyYN50K#Nq`^{O{?
zWQS_sDJONyC*R4Wo4q8{%&whDz9|MQ@*63TtyTJ~;)FVuDcdU?32K%b`<gJ3Dxzt(
z(pui&8~?9cIchkhVA}Z$Nry*YvebG0bDQEO#4KPJ+ddfUWwE$Ug+AW0s$fVU0RtLR
z>+l@99Pap+RT1ZMGc`n|ZGQ-1&D)r?3f2U4smh^AuFT#}1gOlJshM%)OS46%yZ3o-
zr+e`a_UUHq2rR?h8+v5JV+=+9>R&nT*FH@iw<RrHHA+?uCfqs9U6*&Eiqt5FXPxh6
z#!ucH+JJojsu&gt^=$+&Rd@U!^KwIu|L*o5^HNX!w$$fOiJNceVjzkk1H8}2YAC?G
zWImbyzvd;cw4?|UQUH5c$6xbu^Z%Nco$OMlsAl=7Zwr6TOH%o@hCq5H#moYTn`;QE
zcLWu5z$Zg&c{#_&DPShgM$-Xss{(jxwR;-82rj)B?859N5|@^;lRdNz>t6c}JENFn
z!=}RF5Ia~CtkKEym#Iw;i=^-CznC~%4x3pot;Q3&;#T1g$aW<p4BlO*EVG`_iTD!m
zxAnYZ{UYYC)c5Rkz1Dp`j7R$Tcw&>Ny|((>DR(6Q*t_~ZX2a?D1CxLO-yb=o8Iux|
zkv|L>Xg#g3`$R!-X||1tH?n<Uz8aus{)V=*cj$q-&rQ+O#G4YALeQ0H_M%?M8$fB&
zWf2dIjgV$^7GM0n$J2V&iEwsE(EdCON5B|9`3%b_tE_1`BHG|qbU?zTiZT4RQ6{H*
zCwd_+8yzOwl5=hHDaR*p!`10ut%k_JnKM`^X2YWYPogX@39_B<A=7UBbGpv#u{O6f
zWt6mb)nH;`wl1$HWnlktbe8ZFm44d)N&s-i{#o)gQk;5gL*HD%BFt&Wanwwd#Zq>M
zZZ}>(*eD$`Fo%e-e1vdlz~^XQnD}<WEd2n47ql~KBgw*FbWeQ=#3e}8XMRv0p%oUF
znbO*zkrI42xzboAr$#HyQ{04tB7`m+2GB5t7H9mw&*V8Vem2}(#CXG~STIdDqncj0
zyod3ihf|JzK9l$=I>ZzX996n_K-|_S=DJ?XNn_sRVt@j_uew>g;?E}MRXJ2_L#WYH
z=9y5$ynGauZaiaN^I3+kathK-`5jCXj&T8TYh1olo5qxkl#VyUHkaErH>;_FL(M<P
z^f#mCqRK-A<Hk0Z5eZKL+GNas+hjE%K%1;>Z^wpjwo61GA2wBDRPGc1Iup8f@5*!g
zFsm;PP6)gL-d!H45f|oEuwe-#ZYGMe8T=Q#Wc@FA`Ka><055?;@rG1)y2`J+W(<}D
zyn*lrE5}1Sy3i=L6H!*Zlp>X3S~c!jp-m+X0`xPM6${x)&Z87=flyFo$UKL=1rYjV
z1|-e6+Qwt0wo<389>}Hc;pltb@fZa*4K6p~E<qDG_a`z|#hMTR6|&T4$vs+!+Fk|Z
zD9O2$bG_QIOy2j>?z2M}3kQ<rww&CE=P?VV==9}xmnlRLHynCl{_kf&ZzrmJP<&Y<
zeSIO1-2PL!xuHGx7uNPyPxA|eN6eW_!DQxTUzAsxv*XqCjR{M^hg!fy572=0*yO+I
z@xYrBe-t9@_?gHHM$|L>(z!QBpR$Iy##-_7=Z_H>5oN`CQ0)^QuEW0p^6quz&;o<v
z;*WIJ<ZD*ooVH928+&pKuAj?MzY6<GQ!`45cxgS}Yl?OaMB@Sl2c^A!LvJS}>O)WJ
z+{l&2)H+_eqDrXTusM=|EBb;`&xe^IqCJQ3X<M@K=ha`40sfQUxOxxep`q?<vytor
zoCp<DM$1hHrA6DFM{y>JMfu>I9+>zMWvkAdu9ae{m_6#j^`-Z!8gvu>NK2hwGn<Cb
z`2B|s*($>p&Vb#;MAr{+FwUy(<u=~q*Zg$wu}+D(xv*kuh(AWRy1(MD-}@)I6c&S&
z{#EIQ`AH>H3~G5dg}7TL5G)6%x8s*XGWgUKR|y-W%Zo86k1`tK7}Iq?lX&<Ye_kRM
zEt!aOrM~q<#?;D(nxEoRM3tI(S!&@@)nIp%^$1NfhMi991iU1!THU<O-ct3{QCnS5
zYMO3jRI9IiYE4U@e1&$0wTHfVwVA_aMwL1i5oeiQYD`kLa{HZgt=^su{)^3-Fg=tm
ztb7Ifo^2)Ukt&7+?t~MX0pa>|e2kI`2Rn!@e{Kbi3#$-q6-J1q{9}>Dy=^siYYSlw
zt~#xj=|ElHY9>A~Ju+W5o0+`WjE^KZbbj_0Ry9#TnnkN8a!aR$22DsEE?v45@;MZu
z_Tg#g9Xm66A(@@QiVmgwoe>8)(>~QJzSyp=2e`yn^jKcuFwnPhM`cxiA4Dere7OPc
z<6p{x_3fj`gdv1BYqtVzp}Q%t<P^{|o2Q#A{!h=WlJjrRjNa_N#$BEVG;txfFpoSQ
zS}DyP!pR$$1Z9r9W{zx-C}bU?S<{=Irpb|WB79{YJP1k6<gL07dFqOYCM0$FlLX2Z
z+H~&wfw=!I@rh+*`o!O&IjAHJwk=e_U-hF61?q4CP%?jG1&JCL)WYHx`*8XXd^wi-
zU-(iqY9V?P;r8(#__Bt@jZy~yU+Ou<2uCn6PDe#(h4=4S{ujPX;}=)1dN5?%&sQw0
z(;ICI#7UUC2|N2~an?2ugB)F(*dT&YS(ooBhDRoRhkRP#^MQgwUbcXXJr4M*SCqKJ
zMfL4Q)~ZSD-)|=lvBJej4G-<Zj%J6yy@@jSn{@PcnQ^$v`#46}I>h%nnk#!h6;8e_
zduM2UIHvsZ4lJg=nF5K)_qqRWXq#@Z&e#knlpofAYW-3D9Wf3q<V={WHqN1!`TI)6
z>grLqvr7OnQu}qiHh-&lWfK;$kdJw_p!MHw2h`=U#gu0ZahY?4q4OS7Y{fa*NI<}4
zncq2o-CipTj!XA!WcDr<b;{S^?aUU@{Htj``S=i}nnI%hTq_|Y|Erhldmzs4!cpsk
z4U9uf$tW%`SRKB&kK%N<o)0r9hsp~*7BaH8Qd_G3CvO?ZRW4)yT`ZZOw*FcOgNjgh
zGOWvQ3_;JRNb$=H<?0NQ!^@rlUeAiCul*IfoekFb=XRWQtA{Z`$Bm^Pm*=*{G~c(j
z%k_5O=eosZeBGz3tv1Y6R2_&s4oHN=Hegnf*cPA-sJ&AZ{`9EoU~}S7YRuyB>FdkW
zScjP6>CfS^lhIg<-Oe7f*Z*{n%aKL#nu-AVQkwnf<qaF7XBa9k*KCjcANi6VtXl_s
zP?vT9Y-*R(tPt1usFYe`!P&@tfAD!ya;D?A;<ADKa^6(ra8AY{oqV4D=NW~-n6Sl(
z`8ionbuz>-S_oK39R@PC^)w|xATS|Jf_^Zhi`k7e6I!2aI}mz~f~c9;Lx7EO)CRJ}
zF_A3kwo3;M0b*>T!alndKJ5)&0S**6P!dSTM%|9Sh5qV3!I6iilsftkHv14Mzq4er
zVMGREKZP=yG%Q4a9r`o`3A<3qJ=L><tXdI-Md|nz6HhvDrk|Bou&PzDj-axo%mNl|
zGP+Hrbcq({NK^xRDp}JBXnG{~P<I)5Ko>TSyQ__6c8v<?((f7bE7|E8%XmJ+y1&~V
z$!etqYW>zs!L)2=*5|H*f)RFEfrcRaV3G);<bMDb-Cp4)5tP!{X4WFboJ65D(39v1
zKvtTN!)2VD<tg>YkK&a^czIlj!oNd-2Etx51|b76fF}i+5U|!Im`{vXC1Bf|$}yDL
z7t@Vf@*p~ssQ&mu_Dz+LcES}I%Z(bFyY<Bi=F{{@7CY%6d>)wohrXm4V<S-ihrWao
zF@i{fG>J((nH|Q2=nJ#@`iH*M#QgR34}E#Eeg~j0k+km~SXRx&Via`Up{pfhaa+XN
z38sjfNjl_)B>1P&NEDsuh;_pP5MN}fV1kr9zh^oTL()vKi3?$~hN$P=h1dG*vxq5)
zrsL|ue<K!(gl7ZhF`&R^*FLs*u!Ka%>s&<3PHepOVTZKVEJQK=WrfS}v&NBBa<-vu
zF5LKP*ox_h&g?Xc+#mxvX0Q$n8}+>mRW;~T8SVEKSb}Xjo_JbiNH~0)gXxDH4q1^s
zkF{u!B0OKQ>kl6Iq;y3wYdjS|X^2HeFL^p2*bDbc4-&QT9376g$}?nkEy{9$Mue&;
zZQd=$k_I?G=%&<ykk8`8jRNtb6_;yR*^}oqaDK-L(8i|7#jy~&!LOT|pUXrpt5sXQ
z<DmEhiwO!s`0Y`QskU{0?wYFM3OQ9@wH|JQlNMG966P4_+duZD;F1|UdjVN>9Jtzx
z=+sl&0-;4YA2IFkBQz(OLbMHnwVRk+5V)UpH$Ls67z)vU@FN9@J9vR-=VLKgJ%t-|
zhflO`kcRK5!R<*M;x4O$hb26?!Bbw#5FFDdej{)&?T~eJn-e@}5;+A|*Ak4)Z;t^=
z*K3B~eV$1MK8;hyEJelE6Am7-+j`;|?|bZMr~xD#{soEOm~+7~Hs<3DF-?k4{0NEP
z=5@8bq8L3~`+H+K!iw0ADJk*nkK!@E{a8RgFA34bgRvY_5eFqK;P~B)PL{`H9rO=@
zY44=6q-0m9y!M$Eo#C5SQ5r7bi|Y1RsrE?_^5nxqMYOoHW`Ybh(A=lKO8jMNG%Ju7
zmNh@)?B7;1h_a_WW@zdEr+o*`)+`(hF~Ni%>&H71mNf4=NR@ij!BI@IniG27w;@!U
z8IJJ+aOKBkCtCy9K)dWTVT$)fn1SzHZE~2n%83o&PHN^Mg9L@Ndvuzfvu+wnrWFaH
zb6K0O_$cQ^-N6aK`e$-rD<8fI{q=qIoHo*!m|9zD=IDL_VZd}`?;{U<<5bj{6S1jx
z(^zGSV$;lIVw+wXEa>u=@a$GnufExHEYHG37BdB*TLYf&CqC5R>B=(Zv}`Gn<O4lu
z(Ak9Uj(f2D)RzNSVvz_Bj@?Y~Dk=)hJ=hisGEO3+>L>GJ4j1u5Vh-CaYaEF>Nw*PC
z?KNRi=uy;3_E;%y9&N^3h;3^cV%aGqRO;Lrw1J&vNfSJ%83^SvudZQov~@dWG%hsu
z-^9I=>l=(%TXv*T=z|obx{^f7^{aq&Nzx0^k7#t7bZ{DTbDiW8nJhH>)S7dYnmupc
z!zQQ)4`D){?XRMr3a7=2`tm7aSCYt4`QCO79n&wr=flaQ$k}M)L><+ZJjCU?;mj@Y
zF4ki$B~Q^#7H17DCKD&iacL8JR-IXlPj1vm2iFF^mXdGgOC^BaZ{1#5wUg&sy)ajc
zNw0fQzoV{#r?gzid@6$j?@R6Rj;@KVb`8r!&s&Am7h^lg(XM_oNRLXT!S0;4eLGGr
zfA(R9rcTooVnHG1Z@`S6i!Dh~&nK$P`{k^~<-xjOUoH+=TIUO(Fzw|46ef9{Uvan_
zQ0S;^Mz@l80G~FZ7f^}{HkifVcg@i76Oxm^6y_k+|A)d9r5B*N^_}IG)A}QPA7Wpe
z8{p%5R7IaB3ZO8>|5BI_<p2tE4|gA(Sz<SWl9TNOqfaw_?||oD3RAT_&<FV)i~WZ-
z<h?sMO<G-=OF@d{DNL^zdK}ad@Fy-K15OMfhFD#~c)X?gv&6UGSyJgPW=|C)VOFGU
z@&bCSk>jbFttoW`6~h!n{Lrt!Juw`2<!vG@Z*a!zfpz=P>(^@1Ig&D1>ne7k5hV6%
zIyJwTx2Xt_zKHtCqFJCwU|v<^rpsb(<{_}}(r#^>2_X+j8IADOqeg*Pv})9-!J~dc
zLIO(WG}QocTv5ihXg&qM&_|x4<&aj0=q#()VVHJsX^2`gaR|)7B4zjahI6;S7UtY{
z1is<DrQ|Ij*@nKHX^HoFzp~Xz?0?=ikCrK5+OtZC3qt04lbiyEU^;Fi#peYn;#;w)
z-vWjMY8ly2Z@)qOG?RGHQ<#;HP>xYcFqKUzu4Gh*F9T?pMduU>W&dcHIaob>N@a|L
zRE?k--7gHlG3bqfqcy;jRpPf9<GiOL6S{&a9)Pt=LUFiOy|xSG+~-_U<^3f1x;2kl
z4VI35kE5NrW!J#&s<L~rcK-i9Zjw^QQAw&gIgI7V2&W9UKD0~?4GqcY5VYw6r?@L<
zVWfB_d2|;jPjkuEVB^6rP$JD>U)yo0Ok~OiO-*EPQaRXbr0+`(H^~ei^qA>Cx>FYM
zXQR+-Yv1_PY@yaf=Mlv?pt_eYW22l^4KJI+pdGmaH#LU17)->ZTel*`$-2($v(o<*
zFR%NqxMk6D{xbkCasJ`G5f@E*H>Fj<aIT`?Y&CjCK)dt#a=*6tZ{&3(sf?p*1vx+z
zl2-CMAjQMV)YN<XG0!RZKKA7M?+w;sL|X-COKJCx!%xb>+)=g=q9rmY-YcYN4bQX)
z1`LaQ+Id**yK-Hxq{>3$C#$LGP|bE*Ny4DZlTg4?Pk9ahTv@n#xa&ZVbG}chLM~d-
zX06!bdV0aD3t&UY%LkX|VYkZrq8Dm4GWvVsKGKfsU4^vbLzn<j*KnMkLjyTHIA@}z
zi;d*@KOiQqzYc~H@BIb_E%4MqBzL!giooFRCTx#xs(KPyANUK=0k>S#PR-F`heDOV
zj;)nT=He&Enbgl@!Jf;@`M%w3j8DPT?-V&c-?pxXI}vhA!1p8d6~P6%2P9i>Pc^Di
z<Ct26^HgL|iNyCy-(Y8E7{8p*=U?`oe={VHQXkC{EB&?s4!2+raVj9w+Pl*<+y_@!
zE_hqhhG}v?+XfI410^A0yl>{8SvhryYlb3ve+U^x6(DiijIIXftEs0(j_wbGq@EpH
zeM%;|bDU+CkhBWIi)U?Ak0)nnmvw_OD65*)%qc5YdUf`0U-Spt*P@u(__8O}lHC!P
z<90h(7st@L2HW~+D{)YrI^ZRJ8rsV%Wc>@vfIHQO!Ug7Nc56Ur)`5qP18;Fs*?!^<
zI%AlKhj86l-&0dmG@@0gMVF#f46+>IS*(1sE)F3kW;Vtf;C!Y_4q(|amlZXKG(?#6
za?mc#_h6Cn6JM?`&%t#wR((xnda(37%kjRGjk6JJ=NQK8ayE>V+lJPI>nS0M2;uQ<
z1T4`+p9{^qBQ=>i36KrH-hDS*{B+-tLSw2G|5GO1y2ZP~)I6;CrAlHA^h>St;a>Fl
zi1NvTi0bY!`zm_I>gzqLurVhxKN`*u2b{@z*`4l%6eLtjE-7PVacKk`Hf0PzL~h3E
zjS$6PXpM>-?mjgk>aM*d+>BOtbf2IkUwq^Kyb}5R>2T~fAZSnO-feCmqM9K6%PA8E
znrvtn^CnY^F7T2L9UDtG4IP(|79aRly%0@~Z(aFvV|;z(JMrqr$n|QAbo3&fqDxT*
zxwuAbspbmvGYqZe@!!blYVsO$MzCbEFY_@@xBDfR_oFoJ*duWmuE@K3`VrJ;uLK@m
zPH%WN1FI!tLRw-2VrT8?O|XZPtc^8iTw!@v4#}z&a#r$2QS7%)x0Idoj~f}SA$A|o
zSItGKPr|p?KMmRbT3i}oxJ`#S%?dVJCEi@9RBxz}GLXI#`_1uCe99$CY7s__ay<US
zl`_nSP5AC#F4Ix>dIJLC*#3NR67LpA2)lDL^=4b3v4Rl%(#wKpFQ#~&<A1^ntTegd
z06r{mk9j%iS1EzSs&wW1kuK%#$)O^?gM^s7hGt=Eg_aa}8S^1(@8kawF`Fg;A|~tI
zi-!JR5fdTsBp{jAr{b@O87~VEF?*uGgdYGRX4Y%~#YLFXi`n^qM9ejTALr;@5r0L@
zO<ga}cdDF*iE)(I-a1_wfQU)A?#Hhr?oNT#1vCv-+&tEskP;-&A7T5A2u)|Oywj-2
zxBA?IfrU{13aV0%c%*b#LrB2fEg|6`w6fJsse5MBjZ|NnTYR?vyjupP6jqXi&pDll
zsRBOgo_uau*whmWqTEP@I=J5=`aPogQQZfAyZ43eZg;mh&WsOuDbrn~@^E-`d~K$3
z)>(ZcOk6OmkQ5y<2YZ$YS{_||NCvXoh{~GH?i4nxpC^uldmrNnH{Z48k`(-54F5^1
z%x#<~CoHMWDfqa!8Mx72gwHQ#gziZxgv=}Yh29|p$6U2h&5?uf_qRU@WM+u)=^D@h
zllIPNI%ovFn)C&+avfTgH8!w5Xgt(#J25o44XuaGDiWFAxKv%RQ;H3nBRpcRNk=)X
zdpC?_xZJ0qO}%t)JXT;)(ufrLEyNTcY~X)xZdfLAer^sK;W7?}(A$fqxINpnxUHtj
z^d^&W+R3Htq#-O(gJ;b%w&wQNEs?|QN(NS$-^A0$u^7rbz-bfPjJXmtyJ?$nCRgJ*
z#lRNPF+*|_#%htjm~Od+Yk;)Rsf`a59~-H^q1GRI1g1-QeiXI+@t8svH*$0~UR|`s
zlVy48P@@-roH^`FQ0eH37(1g?KQCy@8d_;)F{m@gS}!18I(YW;+K3Ny2Er?w*vijB
zi923+Qyu;`KrMG-q=iat6&GT}Vk5R}zw>z2h6wl>pp(AU@on0^60yyG>tlfTyy$1y
z)XLrgdZH!GBH9g7+i63ci%VGXI=CoZI@ud1O(UnYzs9Iv+`%RB!60V{wkZ98W~WGb
z<T}P5!#@%Ob<J56jrb)n%rE?_D`LHfrcb?KxKW83($UpvDXrZ;zD=DM_AvJ*)bYhh
zniXD#Ha{GPZ=hn5Ge&sI`j@^j8@A{3t*%JgLtNV7GM%1|0joU~$JcT`_AjgHa2t`Q
z5JIXdm980?bx`{0^|FbA%6$Y~hwLaU|Gh{StYVCQL^k^~DQIMg#hs0=me4E$duSm}
z6mXYpI^J+L(a6QrcvfRHteM@BGy@SQ_1<0DmByFi=e-=nxrMQyDYJ)0V&Uc$<L1PB
zb8}hc$#A~(=v3v#sYbr65!7a0p8UYLLG!R&tsj26WRR0N9J+PN+sa7b#DBnEC1g|-
zPOE5JF7SPZ|Btz!4i&$)#sV!sOX6$%$yMS81o|6OrD{Vz9pr;)I9=<vAf3mF^a$)5
zGy2_-c+8J{(G!11AB6-JI^%VOZ=y7&WWC-<B_QGwF6KUspqcdW-oVE2H^^(oy2m#!
z?D|d!nrn|b_ZSX~Btix-dFDdCz~sVh4JZvrAky?37KpA4S<8z}qYO|rwbfSkx;yoL
z!CH=zt-7kRtnr)qUkqw@9<eI=z23L)`<tMhHRp^<Zl&yPRF_ZAZajYX$w%Y5SCO9I
zkFlxxtTLQUcn@UpBFxwQ539Ofzm-iC_D}1;eRcG8>;vtp>c?nm@=wM^1&BU%3T=KQ
zS7Z|mxl3oY0$Z5Q;DeqmA$R2P7mbnETN2xR+@D_^WVeocPbRNyt^vkST~Z7}&!W@;
zQ{<e7{7+pY5QbYZ5;gsqjh3=F4H0Z2E03reEjZAAH@Kt0?=P(v+T9u6R3=C_72R~T
zddX?Ow?(oxsr94cD^dBiGu;Dhrfe4s>HOVP=d51lX_p)*2OXg{uXNkYZX?Tfhw3U<
zxN(riemuU$Vb)=E1jQn(cU;wavMiya{0aBcE}@L-phGqe&|!)tYDa$$V+PM(>o@&m
z8*zuKZgss+|MM|?zc}&>rs+Q}=A~M>FR|g$Qpnxq^hy2JjVem!TajRaHI3zybvm@$
zkcp`yx*fb#X+=tnFbdM$FP{@23;xOJ@Pm#fYIBnVz@E>aohM0kW7ivI`P<n{F~CN;
zd$UF>w1p|oVhGvoa+Wb0nHcNieqdAh>+0Q+9e2@>9Uc76apITri^H*bgcvk0)Jr4h
z6xgyvzYr{i*SI|60_e(ycPvP}hZsb~FRWCty5kij(uDj!%~cQ+a5&JY)sR2aUyEma
z3+h^oT~|O0=<;UeAsK*q(}A0%c4oOvp<o0>-RH%x2iYt_5kC;bt%>iHN1#l8it=Qp
zNW|8f4c^;033EkyYf{A>&A&of$4ZEld{mL4;xp^UBqaDmuIY7l3);;DGt-%UZH6}i
zex^L|^kFHXlq%`h%WRoB4ysU<J*~lvov~E$Y26~0r2cB%t_wO*lc!iNmI86pjt-N6
zOttx@&MgsL?jn?<8*G}44HO`fM+9jCuZ2tuePXozFN|4*$`Ky&j0>aZCxn`js<;L8
za&t4toN9~^jZ$3}W<m{sF^~4#xQq>rMf6okgFu6kD1uMTu!_kA;a;rzYU;lTAHHzT
z^)2?{rs<`$L9+|Mp2(1ObsJqnu|gzxc(mkb7?(rERi@^<$OdCQAKAG_RPe>)bmUcz
zM&?VCRDVj4N$EEQQ607n##ux&Ykh=`8*%p#J_yW$souSYM5lM2mJ@@nwW+}kuR$BW
zPYf}j16J$Ng_j9LP6SGG;UyM0w5CMd@q?w2Wqr<}H}%Yce9hXPZ##vIaFD`?9ugO^
z!Gr8^a0<PHjt6)utNqi(PYYy8R9x2Fy#QAoU#q<{4rdCAJcP_0mJ-NxC2&dp3s5NG
z^K{&X?%i;d^^(@|M+uCyh3qaP+f#L0Z&s>_&I^kx35~ipb3efwUtz&naS>Sh5zVH-
ziQ$bU<t9+)bIt5Dl%$W_ueVr1WwR+*rSH|LpkO5PT6*C_Obu|8kE@OFL)=AWaWFf>
z95JNNH^N-8iQmCM_A(C{tb1V!@u0^qY+*N?cwXge&W=}R1)ne8runknm~98-;FOV^
zgVCyvtB3q`eSy#4$ibQp{>qr;qN&?pG1p1@8Ni&r8IRfGZ^^J3>$RH|BVOE8X34I1
zI;8p9zm34>azLiPt&xDkr2!VE${G3dD3Pr}ESZOmaoc7L%4<alr8NUki-Qbp+A-Fr
z@r!PjLZKmOUMNhbyqzZ9S;?l<l(5K@B~k2Kw!PS<=|)X4>p*Wr8m~e-6W7?SW73ie
zueSb3Axm#VKt)iA#@QU4MueS;9E?TP3c9_aT+yi0{eA*`sy%U}XacW*wOJm1m;g<c
z+-KhMX-9y%H&Y?NCF&Am&7KO~j10Ap5w1Yuw4gJc6wDA8KsrFe3EX%?y|6vxfs+Sj
zKxKWbbkFoF85oi~h6WN)?P{hU=C}Ja*uK?0L(D?vWX;`V)^tQfb|Zi><LB6czsAzv
zDMtwS={@&`u5oZlC*^&aiP{14{rW>Y(JkCmx2a%?k@T1W>A?p<g558h&})U70B6Ru
ztKD3jssW;DtgEDHRbfxZ&ynPCkob*yLnOGfOl9yFR~jQh5O!G<nuB8&L-`nuhD6A5
z+(eXAJp}26+mzyLP0(yDC%->?CECU_$RluD&>6voOd-`_e%NC2fU!s^8>VXSpIN~T
ztHr33?*nE%&?*CC$;^l>-Os@gbAjHZ&YFq4q~u_sM4YfPR40-L<tXr7`1bT^QfE##
z@d?3#^8u`>S_1h5vI~Mj^YUMtz=AvUY0pixCmLFP9dmkNk!eN@(wTMb9!hOg>y4(9
z)ATRk(IWCZvrFQ<^Z4gi(aOsRehn+zy8r8AYHo$0BtZS+V?O^A{*eQGOmv?>Iy2Uh
z#zsAM37LY#26AUm@ZU%REcc71K!Eun`JSE^cKK=yOGUai&xNwst!(@ku*i7M|Hgm6
z#7XJ@ZoEcG9#pU*UtVp1Iimvq>|#QXoU+<Ew)-mEo;n#FwsK!6L$+-3UzBz{i+p}E
z=ZLe&-R|w3i{-bX9DOz@4?*$$5f6GOxQHK5$yEZ}La9l`qtJ0ql-D=J{m><**xlvr
z;;6{C{p@a{iYSmax*ars9k7dV$;(6_o_#Z&-*42(StNGdKe`>5C-ArbOYT_Z6}ZEs
zRK*tx_HMh__#<$(Hp?1fcVLFPGCnUH{Pqi?vx;dxM+HEoI3U4B_eG}0&9;63D!>Eb
zMmp(lM~KGK!cwCDsr~Z$dclIQ1CLKdccq$>1J>3eytk@ruwMt9I^Q~#(?5}v_{}=h
zlpx>=1B$m-w-DpX6fz?ONOCj!uH&9vgFgof>pOo!lrg4+YV;Z6Wlsng5le9V<mJSR
zlgz4z?J4LgsxPmvr(D>=k9Lc9P_Vlqz<zv!U&E(8A>LBVrM;xOXpAL`Ap2a~f<-2?
zhA^}V0Ee&}hRErSxk1gCMtx0O@AO{j+SvIYOJi{S6T#M$mUzy-$j(y6tQN!aR+135
zX?a|<I?>UIB*NnunSO}<ApB+x)<y`!q*i22ob=T()UdEeDxHvJObC3+XfWE{%W1G2
zOu2QVz?)h0ma-vS*>*HsjKvbBYY3N<H?@%9r!X5?<t_Ibb{QtlaB8?(1f5~rZjd*)
zgt}Q3HKbkT@GB?!=>(-4NQ$DntEYMQNzt-!Vz$djbk%fmeA9Qp?_bJd-4tW6Mo@|J
znhWbmg~^IMA-wsJj1WEp<<Kt4>lM;XjCmWAd67}qx!c=fE$xtQPY!A*lsEVd%|GN7
zeTXB<lS+qV$%2LvlhzOIk@>W{g9b!~b!c48(|*YPcIT2&^@s$UW~+?7OY#(7xH{oh
zc<o(8iNL>7g8AJPE-ixEOx-Y6i8i?up%cg*Cluk6n6a+mO{V!1ZH#!un2uEd;vT6T
zNT%^=To9MzW%QJiJ@t%i#1xI&1HM_U9HCBf5f-^P9w+SQI&crqWCVJ+d^Twt!~=y1
z5Z08Na#qprgOq+UPs^#%-9-3g`bf>kO&#uL?ssV|2>hoeJ>y0AC<nL(0WI61cPLXp
zAP2*@10zlyuO<dOk6Z@x4xDC&JUNK}aQc?OXy$y2+J*I{5v$J9a6L(uoubDQwXckB
zZ!kY9CRr1;mL&!DHLX&KurGae5+TEX&4fv?arupP;=3@;y1j0%{?-HS`i0(F{a#p!
z$m7~ww~XjUo~Dh(dHU$BqQ#;qF&NBASVW*+m$pv-`OpFtzmUFE5fn>Fr<yPWT)Z&$
zA&6ULu*MH)7-J61AQ>&1&_c8)8zCYUc@-#FF`B?#vBC?pF;)c{6e$=c+oO_d9{L1n
z;2$kO1&2J{VJn%JZ?oXiU29Fn+m^$A;XU0PqF3IP6EcWCgL?McSfSN!j)wkhiB<tU
zNod0Lw?INV{y}j8>_vZ!kM+^>2N-QuepaV3$jy<&r%ilWZ10wrNk`!!CthW#RKPAN
z@ayJv?oRIn<y-a@7@uuLTVh&JVp{b5Zb1^k&G{>^l4IL?Go|@`0w(0tE+^;-$oJMM
zVVm>2SQT5p!PfUy<??rOJ$qgr!+$@n@!l73Ft<LF-sM_P<o!tXzJ~MgL{VHDd7asU
z)J$b5Y%38Ku=y?Ml5g0btK;S2`ugPZqVFn6vpPJmg%YO^vh<Ugo;VHxsk_K=X4ja+
zdXfa^ZpHUy%GYn4|BD@7;-@bncSM6ICi-q_`rh0mf^DGCbSAN`J%6iRhO(~3^P+6G
z)^L(KW~th|6oK6x;|aa78zHhSEy-28glF0qn~kvv1XRb+yLKax5=ovYB<Es!+4UZ{
z8*EeJB7>lc%(_Ol8%fEd#Sr%1=@i&lqYZO@wHhgeFqvvw5O1#QJQD|tb+t+L?$`}-
z%g0BXNJUH44HZE(%3yF>4CN7YtI_^8UO6o_N@P4ago=yqd9u)KBS>D-l4D45N)_4|
zy^kZ4sr6#or<NmcyAQS;_@vC!rB}2{BSSyrXeeVE!S)eu_&gWa02H??W1i&#;PA26
z=O|J+e=MA(+|5bo!Kxw{s*~qXY?}p%k4lIPc*$COiGT<?XhsKWJmj^;EtK<~8uj^Y
zmWdE^Gm->-!5UodVmx}sooaRol-5#e3UEie)-s>`D+k$B)dLq~ji575su%xxzdVD#
zNnUvteHc_sH9$!x$o*>dV1;|~=^i7>9M^q)oyWWCY3UYl0~Y=7XdJ_}98di1+KmDE
zD#4vCx(aa4`+TmSSO@DaH$je0#+&vrx9gP%j>pfVTD7HW$y)OLrxkcc8_=L|$%AHd
zhxg@3cMxp{*auNr238clKdd=elNHNMx<7dK>3Cl#Jk9?!LpP=Y2h+t_i0f)t3GdU}
zf_wOQ;9XlLWK7nJnA%ATH4jWlD~<4w)bprkqj`f1>(kDq&xLzgDx#W1k2-Ygw5D!*
z+3(`4|H2@m-I={%^PFrHvN58)r!*q<zQ?t}ZQiF{AD&OLF@u#|!rAB-K+j%rejV}$
zymD<6)%G>6q$om(3)i60NDV&C$hr)|p!nt9Q3{ga{6Xl&F>CG(r~uZnMMViM?CJX!
zrDJG=n%BZX+M&(t44(n{<((&=u!mUUY%Jjv{;-kHun48NnfM1(%_IAu2LMfVc&hrl
zCRg@t4F0NvqbGPBAY8_4M4)YQbvZ=jDDND|r#vL;YqhQIjsD`<-C@HgQQd3ka~F{-
zW{`3_H7k8b8K(2e1qyA`oK5o~_}JO!T0i1ibNgXPE~QNTPUd8fA`?(dIk(l5XdZSw
z=D~ubR$g5@fRbrEXJ#|m>$~B8xzX#bX>%S&@BQqs0jrT!RLViwvqRkupW%Y_I1wY#
z-UXIdGJh6O1cKOu%Q?U!ZFdnTiQ2;<&uC`(+h1EXgVwf7)RS}*IF;qHf>XAZ%muw4
zUd*e4fC_W$nJ~agkt$zcKZRCAYm{(Fqrn+gcHPysAmf7>oWCU{DXVDT=cm2N#{?vn
zt5@h*rn^_E{HxX5-{m3ZZ5*rHMo12d1xQqhOd&q3QhB<BrOw9o?2C2z$ULj|(7GO7
zI!E%{twz?vEPKffaW5fqd-+ba2}0xEC^{I*V!|SZU*k$OvpK|x2=@ntuAX%42+m>M
zK!>0ALREM<Mt)o#UY>xd1qHAR%6e&6i_c$rr9qD+Z{vFSMFE`tLe?KLgY9Y@oECm*
zIfp36n$)?F^4RpB9Fu1fePpW6Cdn!ay!d;|MJ7SpdL@a;JEyY=SfNQx5@)EF!K3Me
z1_ydS3nr8kxo_W~HXUJp?6AC;h;<8SyDCB+y(x>WF{*qW<=r*9IyvOIZw2r*+9ZG*
zcu3U*r!Dh?-KwD4P`d}g7%6we0fOuga8(3d+itL4b~-3LF0F*h6?d-qup3Ji<A~X)
zo(u-pKx^5{Py6TNOaSSl&G*@#o9}v^8#!XKm`mRxD$C!OZ={&W(^^QmSZPjnVAl1<
zX<@=_Nw%@FvUACHxQJ4OpOJ|EJ#97KKISRd+OMU(k9jQZ|H5L)A3FWoj{kUKgDr0M
zr*V%Ty|L=dn?^qz^RFNlT8kriyCLu6N~`3Ox}JO*ynpBUR^o5C_q1PsEcQMNuC_gY
zZ1HXnm^1lHF%>VDG*Y-<Y(E-Q`KiA@L%xAbJi2;6$O+isJ}gQHelN~zk-1=b40)RD
zQjq3<%tdo}^?3F$ynT$7%e&M~E2DW5O2*v&4t3E#fz)*6tL%-x$Q$n}9V^?S#sTDF
z*;foB-r!f5QKUyXS;+J#!V#8e+r}zL9?}1_@AACJ9pVglM!HLkHrdRS7R>|;^JF#V
zqB#~r-n%mMOY2|!cU=!3?pEWQ;*L%R|1X|hfbTW%YW{g~H>Y_O=|U3tHhAHQu?Vx|
z_X${}maaEn(~immkIF_PeOGo|f5$J6QMdh`%MzuXQ@*a<@BRF~(prBtI3=wr0spJS
zoOCyI?;!YlM2D=}m00#+$y2+IR_3pgyL25T9=#<*kaOwVY#kxRHWTTV!DxtKrhpmz
z|Hs!ocIOr?VS|os+je$r+qRP(+qP}n_Kt1awrzLLIdAvq{?cRASpVReYd*EA?hEOD
zErd`tWxsi5&>%|+U)Y>=W1W1E?n&m&nT@5nNPtvQWdIs@x*<o{mw7Il)01`I_oID}
z8#qrHI@EnS469+INT)2WKNzy@oL|Ql6O<%eP#h<on5`tIm(k(W9SL#&LU!A|``S37
zRX(KW5!Jzo>OO56ZPdsRVqhlv*la!IE+UZX4JUlx3*w;2Z4{W@Z@0I2AVo9OQv*9v
z5_~BSVVzoVCxP*XbPRru7t+?6al)-gEk&D+d9)8eQ6roJRYqQb!5GLNV>{5z(14y_
z9tzYk$l^AR0>>*<x0gONT2+`C#tP*}hZ-VZM}!eASo#KmJLxH4!W?i}4iTwZBchLK
z{a<#G;Q*U)z+J2tQJ|M}9DKiC+zJ1f!CbNd-ap{}s+eelZ~SGr9LWTZ%Bj*L)=ps@
z@geO!KrsR4!Nvmq<7$fVGJjWEugSUHBE}~2n<8q2t9=4p-2;<Z)ZQ;|qEu3iv|`=2
zf6|Kbakm&v8ZenrY=0}~@@vT$O|lqfAq;4&LLTQ;#k5B{u~MMhn^|0p4*CJxoHgK&
zpaFQH;uETo*GJwNo>L<aNC~TN|EFB{+=>)9@`75`jSnRx_CuFvJXd($Sg|;ux|k7A
z8l<WYXXem=i9Qj+wBC*V(<V5}=M2AK$#Ty7F`BNlw%tHZ-(daM$?ukNZ)HQIHYRX9
z6HlCD2Jow1Z*l!sy-xan)awrK|EbsUQ=9)<n2miQB<#Nf#L0qj+X52*3iK4!2tT2=
z=qBpt$7{C}9dXQOu&>G^zn7TT)rU7*Rh=%V15`{-$B`8^^*u+2v~3gDVcQ#hqI~&v
zZVtPaSzBw8L|XvnPUXN5p|Y|g4AGU(fpldta_^$7A{M!BN7|fsRgPd;3_VG;?=n0_
zWNpm?j<#WCme-h}k^qpPr0CHy;Ui*EqCIH$7NC!mGItPzp)H62!NP{zu6^`c_fz1(
z`jL1ys^MW_VaeB4-mX6PiXZh1d2<GPmQ{gJVz#SBhhcU$xgd2Z)ndMHhe#+{;=(9I
z&LcK&2ibP*)&#hvZ>roTaUi}CtE5aoC{W5OZ$02DB!v}R_aatAyXOLB%%A>k6=eY_
zpBl&4TBzea3`bc=-1}E7@P|6^nlLd^-n9<1Q<2o4``)`u3Ygm<9okKF<JrI`dz^uK
z>E5jbYVy^;DNHK(mWA*j57J8@korK_w#rlq<~Kk|?6w-QA#P#y=bH_V&j&b%Rlc)m
zq;jaHp|J<1y#q27s{hm(Sq}x?CT+JRl!8jb29`sP5#m?G_$Ng_>pzu<Hvh;O685=0
zzS~<!&&kT9h-Brgy7al?qqKCo?>_>)jSW%DR9?+QG&4f;=NLOZj(Q(8*5Y^3)20N%
zlSpkBWYh>J375=iSOUHYPO^kkA=wbY+9slJcq6s<g&H3<JAz8A!a0?-^CHb9wUp_-
z47O!~m0!zp0QO)<;=zOS1@KDx6XVd?Q%V*L2&doUE3efQ8@9e&&wCU^a$w;!gEf=6
zS#hv7fHz;kM7ZSrhK0_y+ToX@j<1$$(;3Btjxc&(xrzG<sHP#6yC*jqeR<b9k-p2*
zu)Bnx0dH$qL*%L(o>yUTR$Xb|0gQ+(pg2GCljdpN_E$d|a8T(=f4>5p@+rE;iz9up
z&=D6L{%(}1incQ!Q2oAS4q6<pz@Rs|MM{F;-j*&TElGV4so{ywwRk;3pNKl`@o48V
zTq+o+mWW7+gJr4@Shk}yN0$~B$jbt*$EWT6X<(J$@N~v4pBlAf)i<$|cR~|L5^yKY
zMofZ-sozy4JHklGz&~~e6~j>;?1ay{#AlgJjJ;bzKOk0vjC1lbhFPIPuP2%iZf_Ao
zy%5B%tx!$z^k2h24IRsoOnpcAia!K*uzmDtDr|WOg#$+{cX0@P8FkHzgHOGVd3oMo
zvk`JC&Hh@nUHjdKgxLBZ{Qk+4)j72lmRQVYkx=1|*cXzA+F=971Uby<=w$ftzFMAX
z#`Xn1_&PB~KgoGj!hdf}nkI66N5o39Lrz9Cxxp;zb85(n>rdz-t`#Qokq_d5$3`M%
zuOmHp!otAWO{QQB8O>O)97NC_A;!@ISklJ7y_`W8gW<t35D|71<-zvEf{`g2Z8Yi@
zR2CWZqF6KXQc>#>>i`@ZoL04nC83`WNBD}4dnt1G)STFBq@%<gT>cZZNro!gr&)8U
zZ>0=(C|W6zIXIlDA^!tVQ)IZ@5Cmd)pSE~f)LaKameDjFxpSM1G@BA{OoS$;!|UMe
zsdHN*n@X7~T=C&Z?*hPlgm`Rbk39Bo=VtvwHWPk&9PS}p+=blZ!{ft~_33#nQqo=a
zB4r8{iaAXQ4Bv9BwF@DSKbJ=8N7OMC7^>)w1!ozo1PmdPMSCm|xliQCo8-#|ZP^^q
zZYNFi!<ZTV@NTjC3e|cTBUckWL^(21H4P~}t+a<Mncdt|1nQ2?oWxiin#5{le#-HB
z=q&hze2(>&o#0qx&=bHt(Ee9{%<&C^Zy9Q<wWqbnJ<REbJ5JKm`9&2N11JG8@Pi&B
zJr8)X{sKK3DF25@9GSQtL!KD9S+U`%%+9oz+jPa3Ye%p7MHcu666UrTTKn1(AEat)
z=(l(;a=P_0x!WI4TR%RzIH)wxt0;z)>BHN>5oGe)J_nY5S3U+=qUL6sGW5Bh)|x(U
zG3)Q!cV+eRqvzDktfZ?j9Z(osf4`|+FG|!Y7rZv~7RyFO>ha~-F^Z;Mdv~~9oc>Rl
zBxs*J5_7X#?5SsUk7Xm%)b@BHWx4EM2(Ulhegi7JOnozd=`ad2IoSYwxzjtAV}#k_
zQ2RK?*uC6qEFNjO^P=qxqA<IDRQ}~j@AX@IU<ny%S%VNd=54_Cyq_5Z+^{cgJ&<<u
z8sa}6o;P5vl|+p$?`4A9U(<yNzBGsfA&7DAG}DDJuJ+%9SgQpNz#1;p4Jk~z2*iz-
z9!U$rg>h=JKc#+3{-Vjo6-c63$jXRtb&g9`&=nW%M;|Ich#rJvXy$Z=jIGX3UJa#F
zr&YBskSl;J$cdK-D4HPk8KQjc@gXLU5$2c$@DV2KRIF$VRnRF^0*a)TEyQI&Q+C+W
zCTZ&|*H>3WR8D}iR95XOuH=SnB<m}kJ7tA2AQKEJYbwQUKskk&DGk_I-eMN$IFAIy
zCq>RbHDihjd;7u`*oSEhxrkD_EsJ}aS$kGHg+32|F^7tpGoAVbTA%_OmK7I-yqyrG
z6R20uqSpYLJ!$LZt11F$B4?9mj^XJtc^_6VTip6vR-803C~q;6=*d~O{M}L;83Ijr
z(8vlzQYp`q?rx{bZPvQdl1QoxrVBp4a@9gXY|-<=9opRPe|Xr>n8*(1jI!xt%7BZj
z90wJGIEa5n;IJpnonY_Q<o{<6tnAMzuN6wHD3v3>UbbB{s&?a_JE}%+iu`tN@+sWy
z(Z&y1t-G2yi#2+GVUBTDr4Qb!8V`2Sr}9~-&q~s=1o%2`9i2h~SuAJDj7f7zE@J~i
zRP_c8?dU2X`nXSo{9@Lv_<6<vv2ejr1H9Z)Mcni!8P*K$OAAsN8*7OMNEg<Gx8D$#
zgVkR)Qx-?ct=Zljbm<BXdpDf+$!?OMYOP=(m|c5+Qm6J5BMrNa^CmhyZ?!gL3M21z
z>otRkPa%OkoNsyjHK^s}4EQL#bR<s)C6egzl!>(dxzYB>h3`+(vgWtlBluPRQU;Lo
z_67bYR!`fX4SXIldEgX2hbgkvWK=EMGXU)KE?Mz_s+7;?L#hp$V9S5OXi^q|Q>)aV
z(jykwZoxL~$z-~sg)#(F|Gw00_KPns>=Fu=m~R@Gk1aQglUVLH{}8ixNx3kwcP&yG
zj_+*B-1wR7G22`pD>ZW%v9-C%sBgci**d!hu%8k{fh{FYBMxZC(yjqaz<4`&tGnLm
zSk`kq^w7ad0$Yi>E$~%KLc{(%UN7YCO@237RI|Q6fcF3;^1+&k2lADtP-?-MUapu7
z{4+Qw_qLQhFMJvoJ1$Ia5cg_A;=RYKPMYeS`4d^<o@<Q?79P`7U?BYwMHtF@xnRLc
zg*RXE0U7GV{g&})4hjtO9+e{F$XgvYls0pjYB*jx6O>Ds>&r0}B{$;YhnTCZytUy!
zB0MW1vsP6gAGr_fh#gC|4~u<F$Uox#M_PJWz{BUiA!}&G=Wix2cZ^8I6})TwkxXiv
z;OOahsOsg9w%a&ct1Gdb$ame*%(4&oKb_WUFVkBaKm~x7H3$x@o<li=5#t!)ZluSN
zAC5~A{AfdIF*FurVZLwP-Dw<U#FbJ4XdjX5(91sGn3tduVICGop+pFp_8_}X`ax~v
zIY@+O<gXX~o78nQN%x}x20OnyK$fYcmM1qt@3a54d-~!Av3~KM4pX^Sbl*H$x*ea0
zdK(O+{gN_N!if$M5^jKOVe3loH?Ol-=$Bz)*Z1Gxa3!x(D&jY<w{dUQIXBi&CEXYm
zEMDuhiI`i$8Q=%K_jXqAdf?yjCVnKEHoAqXuyMf?L8W4sx}3^Fm3W+WGP!h$hQex5
zR|i1Zo32kl#a8s^dDYU0@D%||bJCNRZN+--ke=kQq3JMX9rv4T#fBq>ob2^C&q#V+
zovaRnz%G1xH`0n=OdE2Gvl_$+LGz><a|Vzu@15ncP}X`a7k*jbY8vi9ubSWP+Q2(n
zaG3u8psf>gH&(pa5E_Cjqfk6))!gnF!fFFY;p^}b=o#NQ#dHheei|JO<L@LO*P!!e
z#kKjLcGwcZDZ~COX2^z=DUEfZ_n76<@N3#DldWAFG6eDiQaM;9&@g5Oi_0jcUo+?n
zcvJ#F<$r3BA|o4W;Lk5TVsT~#!Xh1KJjE7EC&QyKnM10sOeva(wGY%hGeg_iDlmr0
z1ux4}`E|t_NwX$!`44ogC}|DE!bKxSYK+VutmU4OPFV#B4~KpQ2$BMhLyRF#U;^z(
zL(yty(1Ya<HQrxJ_E2T5{=QBA8YafY{1PHkj-p4>{=~%!S~!I@!%iW51a6{IC{hr-
znnxa1lpw4MN{#NvqxpeQ=-R)VTqnP7@-Z)G_<X2ivN6t{>#yqlJC3@LV+R`!<iG37
z4@)d53+}%BsB7*Rz?i~uYP_bB@`+k6jV&gA8<!@-D<R$e`lOdy``vVVNyTeuk3KTY
ziEzZcncnz%=pwgM;qBb{{F?SV+X)Ld3nhXRL@m}7jfG8r`^W}K%96-k*1P>@dCHRJ
z7|<%D=FaN&M={9yEmLlDgRhlbb}`@jlvcPb6JMt?n`da+Q`;54SamVk@cnQf$IZTV
zCTE__%~>G;JtPiON<z}tBm-Xxb<Y@`!<$CgmZ4?=Ho;jemx9VbNw-|0=u5RUM*{af
z=ed7%p(Z^8-s?JWTWA}n{=e%I)~Ge8gzk>qtPO?4+OAQ`YO}d_f^aQJ^>Ew>8ImbV
zDh=C{G%D{<L1G2R638J*@oT!VST66R=R8&w?6i5UDv2{eTJ0tmab;yQO&N5USdH28
z1?~T41e&WABpLgV>Ntfaqf#ec?=Q<ImJXJ-U*zLZ1+QcX(^JFNt-zy0OmRyZ`h{dN
zq#;N$<gC29NH)n11Y)|%X2cY^%*8_WvL)?1v+NU~i1-!CkuW8yD+6YCFh|mST*ee0
zP7Rb#9?kEW^&J4OQ%Q<|(gcRIVUPi4H-1uIuuTM?+;*t<C6o6lOGw7C?1VEIeRU-f
z@9Y|YUdqC{V70uRAG)cPx1g-XeL&0F0|CXNmBOgkZ-?|us}CRiFIHh=!m7R%GR^u#
zo1g4$^f!-N`&Qf~y%BT<Ziq5U=!w-*fxxcl@bjR$5bhO^SbYz8lhc=KxKaD*sq_l|
z)E<o~H8c5EZ>0@N!<)oPfBJu_)+5&$#O`XClqPIKtO@ET)hNNk96Vd=<sAPBKW}Zy
zsTNdQ)Hh|H*lgH-eFB+T4i44lia5FK!#(Hg`aj5#1;FRgfA`QAYj6h^+88`--i8n7
zUB!ZX+g#r1x+?@kBhr<AeTuZi=4uB+gt|)gt!vWm!W2O3k5(BvHjkk2)cXTQM-lJP
zav&_!*zuOQ$6=IL5I`9g{;|afD4vNceCp%dwSyHGhDtNZ<z!$S=k@re?d`(h%!W>r
zYoooGQU=#qyYoOg_ux;LMQ|zj8^tTwWCuU^um1*U7!Uo%*8{)2(`xWa*19EG?O%x#
z<NmJ0Ipa^$%kcC~DeA#l{6s^yn&#5xvbT7d{C&%;v;}$pf2um(bIggix92;0F;mFA
ztJ8f+c)+%(@4xVo{?pTh;*ZEs>5td^d<vx4ia6T3;+$z++(R_`0^Rg~;G@}H-d8j|
zyKly~v;C{|U^Tt5&w|!|c}Ng(GFh2ro2-4c9Pn1JtIIi1q~y!y2&oN;G$IEheYi(N
ztfsJM?c6@L`K90Tm|t>%Sj@Z3KNC}ykKnJ+Pfl4*9PN%TsYzyTesY+0Swsd-ILmpp
zf18IP&EjVXUj+XRyy<0kP8km}CW5-x5Y(WgEEt^ki^dmedR)_RJ!TiLA)$mA%;o+D
zLUy&FiojTp^St9;m7m(t%xTX91d~jXsa6+`lzaZUPq(=smQcn{TwPv^?_%U$HGf%C
zY6sQxxt8StSC&XCq_=+6N$YB4_}~#iPP<h+G>8(sKN3ivcpm+Vi>sm6&2|1gd+1}5
z<&Z@{F8NUVjJCtdXXk8x&$}}5dGEWa{j<`+`NIUyufL7=26Iy@R1`Mo=4E1c-b3Ze
zzK#h-(g?Ttb?j`kj*2~#dWFV-01b!h2qdZSDLJ_M;u3~G@-ag^?m$1`L;**^aO3~C
zhqM%vYbW>J#Na8K!c-{C{cRf-kzo*oJ`>Pzx}+AF+B4XdPuuul^~{yQ-@)n*kMBTm
z!TWU{t^NMr^0+#jZ<B-+zu~KfnYM@@V>UrrD->>YIa;2dbYzdgoYVGjM*TK)N_^nc
z@A)O5a>`GSd#$#z`<gy^kE+8wWAX|H!tQmBpHP^rm@ipNBAavnqQ&x5>i9H0o)g6>
zO269jfeVGtOz&$obNTwS0!a4`HQEj2aP=^cy>d#r_Sq>%PUD7!ph1uantXM3wmslw
z8?Stj@3cOo8ieWL>g8~r@OEGFejvib5f6L6!mvyG==bdnl?2^OJvvxtU{tBYNdhnv
z05&d|x&9Xmg5Lhcf*vM4oD_MfEY)n=VtM<dQE)a}O!c?q$%BH?Ptf!xKP5?;hIcJW
zb^BG4Bn`XY^8f2t#?bJ2Y^D8893g|ry_15^oi(rKX}y@1qtpBSR?2<O+kX2Sf9q>P
zSf@P$Bss)<oX#ND%>MHGR_hSt6h#+-CLMR~5H80L?%{SH@45xev3_|YwY`h@!XZGj
zkrkY>%-mS`XL%7YXG>WTw!Xd;_W(u-%V6*8lg))%F8VeV(WcT6bu-8dVZM5+CSA$i
z>?hvncyvS0Xbn`mxX{8--hUA<+nOvIU%Y~((2n|LOLA7b!>}6X)PrdKTWrMZC3lb%
z)NrNX@nsS;Z8AZsncq$Z*S7lOE(sdB?WVaxoDfQ*fDNG+Ew{&jlH4C%Tu(rXtNPQ%
z&(rgVkdq>cW8?joq~qS8^&=tTnL8P(6r%qm+|s#M5Dva0AyljP7uXD^lfyZDI}~Lo
z#ESzPPejs;2oIAic7-Hd2uwdGM^`Z-%^Mh8U4Omh3PH!4%|aP^IiKjhav33~N9-oD
zLebJN97nu`^in&2Vz~WgX!I>dB?R=yY{aOMhxgY*!oO*wGG8`VXOH)(u8Za%zo`NL
z@q(aiJT`vZhqL*}CS7^*dj~>a6Xg%>0=NH<7t|HQi6$+(JMVm9YB3c{9uR|}jHX1+
z$PlIFVBZ03K6P8w#r-4tjYZ?DLyLG}s~GWKOG5E=!nzBK94{2p?k~acdv7G+iWhz8
z-;|9w+VpYoLDZdV#ffVHM9&}_jIq6p2Ckcn>Z+%F7%U~@gF*j0vnabYV2YeoXy?)0
z889Phn;#Yhq8h^mDo~)tR0SF@VA-YNao4imX3O6%@XF_PzC1le!<*x1n*-U7W!9BK
zQ?O@PRP!(_Xw58C{Jln7AvC;#FpfF=g0Kv+N-k0GR5=^pByWE=?5D+T3)8xTVb{aB
z3->(lflRX>J-L$0Q?Rzn5GoffnZBBgVUH;z^vV*#Oz0SFSC)o*sEV?wof}~VdSK#U
z!ra~awUKtFnU<~PcE_Qvhp-ziGnjVypP}8CLyqw((CB=Xl-VmYgYV`0L(@n{z56}C
zI!!U4Gt+Q={a-Y%{d$ksM6BwP#CoXJf+)k;$KV?)5p>5}pi;_6&41UZ8YSR4lw7NJ
z32KuyW5o&zuSCky6#iST7|<*~=%<CE7#%>9v>g0w>OZX+b1Lc=<V|219(TjLp~?Qs
z>><ySGG(v3KJ>`cn~LF{$W*)z*KWVf3-arZSn@4swbW9&JOj#;6NZZHo)cNVBSZIc
zhtwQAlv;cCAh0q<JPv67DksAXGKN81)w(urmSQs~QfklwN)Spe#-ah!>3$quRyybj
zV~+zZ-fuxxspVRqiNujKZ?ya-A`sTq4_xn~9Copzn@Bl4x^2%<>J?1<cxJe`3rfT`
z@Mt;5`#mh_F(j=&et{JE*>;Ogb&e|86d(>5RIOV7A0p^7#zo}c`0)RTAPTTB)!k^g
z|CF8#c&{KVwkN8#yk3r-+)pwVf*D~5_AHKijlIDoNdM-AszS!MsNZy6ldj)2ppoO$
z&^B5~*Y#6UOF#c=2Rc+j4~Fh_30YujW-aw41T}@N;&WF7_Ej{u8|-19t#I#eRgwVN
zv{khj^wF6xO;XE2ufqIIOp`Qakle%j4ngn$z<N4$7ZOXZ{>UsW{K1JKlZl%<_7+wZ
zme$&>xuaG{XeNHy5NobHDW3+0;qBoX*Wnci1KYk4b{Rf2U|K=T$qu+hPtczPfK*NU
zwG{v&nG&ztCgcM#$mL&TA!5A0ejPCFD{ArOlkfi;L1C{RdxZAUJ)#`|c>45zd%I5W
ztSeGygZoL&ifN)6AQ@=pBGSb8IHJlNi+S;>SMa{UP(nVZAE;-s5^7#6;e4Gn2bRq@
z&C=XN3N(LXzPj7{3x*O!ELPMcoIjHUeQ451J(G5b{F$<7R*s!5!G=G*q`7@+%uSJ(
zcSM^_eTeFt#gqdhM&Da?48$t{f|i9Wwz{j?1(W4YwkgPVD9FaV*oxDv#EB)=_cS8+
z5U;sqPDuvL@bdFpc0_f;qr;=;uBXbF;>E3Eo=FLG><4hwTu&u~@;Ill(fkwq5gD#B
zVEnq8BKegNCZUiyN|h-&>WC`E?yJ$k>*#us?^WrS%&t}G0eib{BP7tdsr9`KcHmKA
zD4G_8r6`=#Tp)<s1WVX;%K`whi-}m2P3k9KF>V#@yD%A>-<=nEBVe&fYUF^I(hfx`
z%C=`C{zJd;4?cw}un-MnJe7xrI_C<KsqeE{V>So25iXisg@BUq-dW4r`?98ja(
zw&Sj#W1Z;{UcjzKgvv@;egi*c%YK9kXe9#cmWF(48D11l2WSWY_Z5Wq-huI33%^_8
zngduRC*CUiB5s8Fh1b1|#rc|5vrzm%3@Rf#AXKwGi)2>ySX2up$=b7*heEkOamJ$b
zH8NuvSp>U_sM)1KU{ty%g^6l9#PtreGTpV@o)a5jxUE))xxZcLKkF3yMK36iGS^mu
z2}-Zj<(9(a{wFH2ZbM{rZ-^%G{~A3)V-yQYoFAX_+gUW0h-v$ao#joxvZ4xN!|7t^
zK}|pDUOu>EK4e}FQ01sfWir-Ul&E;6v2apk?N@0pxxQ|`hiMMnGmktE*2r>Fh<aA(
z#7Sk{b=;_MCKX6tC`hhq2Z^|ZtxQvaJ8A;Y=3weynqO#vfQ8YpwDLl77VfT1p<+#*
z)y$PI%Z}Wge-f2O+a0`*TQ67-fWn<nQ&e$K+&Liq9>7pt#xJu(j`wHPRHW!4nqQqo
zMl@q!fU*414}ep6@Dx{?blsr0uL$f`B+vPsKhbYYA4XkQd#uP=5L|+Ze=B6tM;v?L
zH@x&duFJ$W_1(f5wE8y|7bZVJ;>5lsvVp8o+(A)OrK*m%SOGTuvSThlvKXWz+_!Sr
z?m{Kb$-albxImM`;tQ{(2krDhS&d+G%s`hTA#8nhwXbGSUMv(hBeamfy-jgyAlS{M
z3y7dOt5%kz!Q^6@&%4g#;zxwbwWqNf_-(Zuf#8WAxze^!u-MN#+^hiKnXDbzd+vI>
zq}UI{vsykK((K!+Tt4ltmenL>vwl`-xS0j8&_Y&@7}yK9qDmGXm8v0O?A1>-4>QVq
zNz0!J_Rw-TLmpBC(L~|%fAmsz^RQyvvmCW1e!1|2SP>?z7&*5OMfFBKKZlg1XT|5-
z{2LUDk|m#5frcD7m1{}<@2}mMS$;0M$<#W&Y@4+@>=Tv7LF59{XRLbe8aXSq>Dj|j
zk_yTq78KCREQgS9fCeysXx(L;-}*Y7j1>rYse>_Nb&(m^<UnYEGKchA+4>E)p=N?B
zY_70+YZfj-L70ZzUkl&wBW>BWqod#jlY#mz)%`ngUJ4k~(EXfd!tYsy8IVu$wLuY}
zw*CJn?8G&$Sw^xXC|tBXgBmR)2s{VFHgq}TYH?m$lM(YAZ#KoIFxzTCtz+Du&OstH
zr}@p979_S|d{p4rxYohQ_`o{mMfBsgd!x0UKI4?>1y}b<mbd?BUCAXZJSLI!hPDNL
zLVbPi8K^&KOkfEZjw(44xs8UmF=VI;MhW>+8`QZ6`B!;q+M#6c^(|noVbhg3G57x^
zRRrR<T{#Ua7PC+!7@2JHs?t>1%eSo}yb)pJ|C9^0odbSSI`&E=M<nBTg_%8cK@sdJ
zSc+6fN^?XewogSDOS0%}-zd9-9?@&rTZl>etn(gl*+`~klz2s!8_87ou2Z>%2#;w4
zUMi&V@XgRwugtj|{b#mEVY`e8M2O5A0U3pzc08h4I^v#ZGKm89Tn6H@<NEa&WX;tZ
z+q3ZN_@8Hs`nDukgHh?uab&^EDQln3PC8c>wOI$^{l@Q<*@hHRf6vEhk>@^T^g|Ex
zgTvhrj6&ZW0gOJB59H<W7P9LW^v<6VFTkV_*8gEYss{hVexMy4ug^AGzkj7QJGpGF
zZigRV@*X6AB;iqg2EE)KFb!?-R?#lN>yMcq)~Sg?vl;LX*Xy;^^?K_S&yQ`nE6<-m
zYtNmXt^-^XZ)SP=iYm+oZ4~+A6Lm6p-W@beu@x6OXou;*Q2so;fkyoQSU`(HxVi4S
zsL}ue6oK=>GyxUB^A>A3G9>zKR{dUza#AZ&`o6v2Nbkl-???g_txRr9t0}Ut#&q#|
zyUWT9wr^AM*-vV#ph&v|`u(&rm#bYDUCQ!YA?nK3r5i~KC((a>LK!;YpR0?qTm2n>
zq~bIa&Bz6t;%?n^u@5GHi6=p#3j-C=x%Ve0b5AJ^n#DoUaoI=L5dngBEGJkzJQu$@
ziMRU2-$2~xIj%T#FaB~uS*C(Es**}Xgo^#jJ~Kwa+F^py7ni00gQqYvSTO*(4=O>{
z59^4}%w3L7i*rV`0MAQ-6LAZoY<yRUrGaR?zq!O{-a@!8gVc(tq<0aly(w<VB+J5o
zT>qgwD}1C)hKGNk-c#@<nkSj3shK+vHMY$uEG`A#OjC6Ek49yWhq!U)^opsx3=0!|
z|Jc7&ZOPzBeJAU_0XuM`dF-vUO?PRJS|US0T0IA@h>gy#QlyaP$450mp0FcWJ%kaq
z@r#d=@e>>-4EkdWMltS<REhoLJEFHzd3ls}m5XKm6WeuiP9mCdR&iZZQ}g?%kUbe^
zVhYxb>Ly0co{+ElJ2$vhH6}Pm!ju+IhZqZAwD1~(cbVw_c{o@TzNbus=F00-^ZXKO
z4u(r~`ZL|h(b1?(kG3Te7cUM+CZV?(7C2adp$Cd6fz;Ng&6~Sra_rdVFW_iM!N`m<
zld)#?8^Fw+G*tq=A8t&S1MpBT43~il1YPFl74$8ZFbjje1i>Lsku1}=&Zk2m!vb|E
z7p}~mjw4&Tf7jPT0;b4lb~@4Ys*BJ>d?>%gERLmoL(^9i_j)T%#Ml5W?e`kDa2EI!
z{xYR)9TbJT>TNc2aUgunM0+4VvL-h8P^nno(5O0<QZ?q_JY1#8g?X`L8C-x`$qf4M
zH7@QD!QL)To#r2MXpz~~Bo~`y-A?PrU~Ve}FG=_#!`T7=zDR!Thdy4@$0*=W91JXM
z9PiNpSA4;xLHs77lkYgTv6~%06#PWh{$^c)cNZAuuT7hJ8t6l4EWE$LaF_<66jAUO
zCS&6GL4qj<3`|BA?q~s-2>yjh6gCpkoyF^GbW|w3C|*2~_mhp%ssMLz%Wo48Bx1e?
zi<~jemgf+GW_AKqx%-G}4KPa`3)6s?wauXzlX}IHt5xN+wGSoJ&<pO#qgUm-8yMUG
z;x9A@Q%R*c78}fnC=AfgSc5GNtZ5vtRjw>J*9KaOBrT#Q3;WJ?uKUN<)`y-aspD=m
zqA%p9G?y%p8KIws*5t9>N{T1eTnm^)MY9nAZ<tvAQ<ukMqxIa33qQS!ypq2u(fj!o
zq4NZa;@X~io7nmwm0u2lY*DvLQ2aoAP4dKTp;uImgro_PEq(kw$=xb33}Y@bVs3z&
zRmGUod_(xKgWJETg7V)*nh3?<5t@v4J%|n`l5<DmV<uvv4nhh4k~4Q?PY(afQ21pv
zx(2KE{v%r`GPGbTdcD4{<O%5Ni=BN&K*zMk+^qHXZ}I*JPiks(cSu)qShnfR=~}D8
zT{oMGNW<br(x^h;^xTg(E8$|tLaNrw!y3}K1@}_`8D&Au$ke+H{`c7CA@AAHz@+Nm
z#Md(C8<3vj7NNp5kC>3K6mjLTAVY++KK^#sOz%RkUGSx+$DJK^Vb*0j+wae__i^&-
zNWPY7Z()n2zddWC>B(5S;Py{GmS<=aRr>UFR$02MpHEdhEnv)MI$bmO%)M#rrOTH`
zp<%L8keyd+fzH>wXinTRKb>A!wm0ARMr=a=LWJ({pk2mpQt=xr>%Wqniz`S+=}?<<
z+pp<O`TFAlQRu06Q3r(BwJUlfLg`gj6|(FOpItzyXV6CU#p@yM&zC^?L<O-z^I>)Y
z5@}(o%*5ef1_a4rLyUbWQ03CkbO>R`W}liE>4gsTJ$dbcmmy!}h_2VNKEJ!VrW9RD
z;x>Fwrvlq(h}>IXpr?<inUMeT!t)h&=G|vymhl+NwJGv1Af*}`G8t8+*Bdf&B}TVd
z8}n6~99B)nTiGY<34kHuiDDPP;)Dxt(=z#bdP&HryHrHbxu8+ec?rL;8+k=$1h69Y
zibpf5?*Zf{hMX9NcTbJk=j+c?m!T082gF?y`cmRq8uM56amJYpxL%_b+4v<B-JZM`
z+G?rgHkJR*ti!5lrQ!XOJ|DB<7_fK)(posAuzV0NdL;d4bxcd{qBCZ`z!5AAs~XZG
zf<Zl_f#vx+V5dpli?ez0vhV43dT*$DNR|rqW{N@LoJ7*&K&CpzOVp_61Tq<aqfe;q
z5JA&UWV%z&IKzZdeL>F)rIo7vD3~C=j;-9FVtTqRN2D;`;_?SoCr!82a>&$O4sLDV
zJ~G!xys^mPsWnFru9d6Hl~1Eem)C`OSI*sQ-;BcT`|!s~J2DIZEH$~a!lb(h6hT+y
zWAZEM&DEG+>_l8d4D;UVYl36ER|P_yoIUslk?i*XRD0xnSS5Mu{`<WmT`A^~=iO}6
zlfDsqhnzDJ3^)zshvFm<^Bvke%90;uWdB(deH@WuO?q|>^F4%Ibw2J!k7cT`yjM5S
zo}{PnZ-Aq)v#Ani$U<)^g4cnxD#9q;FZ7SK@Rk&ZwoHdPgz11N)tnn2efbde*$|?8
z_Zb_|p;mJgV+N-4sDV_ll-y0l!J_R25fuX$iFq1!Nap+m)vaJIwBT-v;5cZKfHHz^
z2j>SU*|ePPRU<ENg_OT^TUs&ZySfz$OvqBNV&|zZ&2Ld2z3BIl;FE~DK*+znvx~m}
z^_GWXQHCo!S03C2wCFph!T?S11V%QWqu8S%1qnowScXZGuJnt{kKnPPLn5XYBh$-V
zo-eR@x8VpKiT4By2%0ZAn2@8f|5z8L)l_IO_7Ta7yEHjrmkVGN%;EO7a(OL*Px(fB
zr7~r@@@?p(K$5O+&F7!ALDy$%)J!?nMQd!46R%EBlI@pnxNp_k*PHW|+LtHEDvmTd
zQJKKNIkhRC8{#<!>?v7bGf9SKFsWyjN!1q#tjn^nc{LCQOGC-rEZOsKK!)I($Ja78
zc;v)4Ts<7hm^oAT1?7YxEi$RL`D|{<$RT=hB`OKlWka4*>f%@YTUb=X;!C)HN;a$X
zaW3vq?gd13k?#|b&khp9ZoR75BS@l`MwEgzqo(}4vh1O7V7I_ldM<A+jj+UiSK_}*
zN_65<YTDrw=t$4WCNpJIdMGD9b}2b}CNlw8R62Fyx&M)?rfVu2u<{nj;LhT|oyb_8
zH+?>LAA$aPr}A+{R{x5GL^!=X+qnu>VLM69T9Tr{qI-(;gxOQ%IxjJNNp=;qf{o6?
z%&$5=ia&E^s=r5u?P9o0Vmsyt=cDs~(&%qTOmmX{w%3vDU)J#Zc8nPI15w+W5YKTS
zD+;obN{lo76Ly{O3PXhg@3qiecQyOFQCORFs!A6b$@HA6XV<}%yQ5=^5)o|G{HD6r
zP%!=GN)u{4v21CQbXc!;QH=8r<rX^ox(D8~Xv&T#hY6ha;!Z*Prg<OSgmKoJYd%2!
zfocj4lS;D)wMI1jdQZZ@Oj4#u$*RH31eWQi-3(PsiJ5k9R&2Fo22<ZoD6@k%p=$c)
zDFTCxC;{yW4Xs?wtcjCHwe;j$IkddvJejI}M0of+Q7~#@8$T%L*+>|(L)H6Ycm$i=
zw_)Y@a>LY!(TC{Vuq@q;D}F6c&~3733fCZ-a2auyA5lnYcQzh~hhhw=bFh(DC<_4x
zF08cqv?>G%HFJoRax&Emk7sJgmvd2`Cbh&X?4d*@EkFP?{r9!eznG*!su4M9EG46h
z`5j_V4}~m(wgU=PuRfERqs9kqpFO1ty{HD~ea25U4Mg=Q#W=U3N)h#Tmk)m1KK89Y
z5Czvm?68v52+kSne^NoC3V6Q%SBm6-AxMkFaF&?TghBw)O*~7Vb&3+k`Ed6^*6Ebr
zRRF`d0lQ?6FlH2<o>iF&OK7syRl_Keh6=aZ#B@@+Ekj`3Tvkcl2dp_)d&Uhmp0K=w
z>6Y%?JT{)iyCjGBHySBI1Xe7BW3F!5%y?Ll9Kwe|To~a7b7dcf9cPODoa%jEs_HLr
zHb;6GN^RlRFt;x6ie~3+k0W`jX&||8D>Pap2y+{8EGZOROEC9f6mDZcRJa0wSSFOA
z3xU9pl><S!GyS>X<o(P9ea|JVP*m_IJ9q~7JnR7mtf34#6=tR3F%)-bkqDC&CddJw
z6lSc_X;N^32EsU;_sIVEwbh|>qj{Xw63LVUE-W6k8pTRAUn0*MlDF~%TB-f=Oq%a^
z?L0Qu-?U5~deb<_I-G3+4vHx$AZ2N|RMJp*{9_Brtki2B_L#?&+voxuH)a!l!XA%V
zn{dul_|||(&Gl~`Z9T@T;eYt)cHFXrf*rTcP(tQ67%PAotif$}#SX}cLSwC%kw*({
z{^9;Q;n$n5{fmoh@$@igoEstPw<V0d2YJ0%fQ<avJKF)t+GBsV5a2l($)GtDCsRt!
z*C-RE^V<zqTGE!%fy}sYh-Y1JRO%U&$~exe;U3E)!m{}hcXelQS0NF-gBcL_Jhtl#
zV%pOH!>92W(EI)dw;R=`+RjPEwVjj-z!|*D27Zb``alCkamK^u6-iwEF~fq1fWAac
zHJmoV#;NZeQU<&ds0VpusI>Le!1-vi%!J2|-r8$L@gEXR4wP=2tLtn<`t3v_Br>2+
zC@Iy%k47+ftD8b_^X?)zjAI20hv$<d58uiOI<Aa$e863UPeb$a7wiYFvtuMd%(u%I
z(XHIdl*+7cHkVp-<MkIZu;kFy4p%U#g4t3l;ku`B7mEJNx6PjM9KLB@F6=UoGQ)Mr
z(TL>k6JN3}ja=Q@w{DbbX{+53>CHU26qxcOQ&x;t86PPa(+xbOcP!)`C}+M`_`@=p
zpvV2&YK6iaZcax1JaL}(u8(ON6G#cA=0IUf$awwow#7YV=ty_7PgF9O3>q^i_Vcrc
z!U9!w9VLXzyGiHXVpeCc$J|za<>eP5-v;u}G_k~~_aM%bWax#`kZQ^?xztDGVcUGW
zuU7<cfu-}UzN$BMY@Q^E*xT}iBh}mDBZG(R#J!=(!mg-IFx%~ziiLU-Ib!tZxX~WE
zN<hL7f7iv^HP!GZ+75q@BLXDgn8Niv&r|fTG72s?)Ja9mnEw0DN@WD+u%ygIX}D|^
z0}3Bo=Xs1Iy19)~UJS&}gBp3kd1Et_t7A#K)qh(!!p1tyT{ojgc*)7(k%$9!hVjh-
z1Dv$Di&S<<Od{FBSk6>m73@L3Zs&R!TvACt0_zIBT{%Zw=er5zwa6}dNR=Ppr!(G5
zc)2uIeI?p8TX;74NHZX~MCD6DZT_P(1V(9+$1owiy4R0mPtKf-SK{)}TgP~*hSBk6
zNULt$%7sj9T%b*NA1xt7M=BNv*{5y6^!rZ?JqU>V3a<InyEvLPEXdVk_wLTVTBy;6
zl-uB|MF8iYt!G~xj7yV^YGzKeXxk};5#f}`A1h3zu!;_PmMXP9LQ&yQVY{wS?1L{A
z$BGEg>BiQBbn1g`xQ(=oZ4Rl>Qa`O2esbw2w#4iZQp2u@5!s?$YqP@QV{4;^6p{5s
zS?bXS3SKnss%y8~STcDpkCYMMg21M%wIDLqa;>#+qL~<^rwA@iWeL6VkaCnNLC7-r
ze|E~l)?ssL0-jES2ZQXAHtVF$HG6#qI5Vx4blcjPD<r{4CtXz0^8;I5Wf*;D<BtmP
zj<yWsk$3BU={Z)$IExHJl<#@>Asb|YcJ_WB&fQ<)rb)({uf$X&fBsMo%xeh@@P_fT
z#3QO|fSd>4G(CsW#N5_D?y^h%1bT&nl=OBS0Z-F~{I!+3%o11G#~$+_jooGt065pj
zQVbOoVeBF3e#Z4+<PkA}{QJ;wnt`>b8uImy{F%37%pM-itwKIO;!J4Q5uaX>a)G*}
zphDOgDlN3D@AQQwv#Tn=)a3E<JbUE$F9eV2PvZjgy21%C*H|E|Wh+WrJ=a-@+9aoD
zHHRe(L9`uWEe(`^g5i`E$!Je>08fbI*cD=NZDmnVcoih5F=G6U!H)((Q4fR<8F5X!
z{mh?g;v)}6IWlSOHl@Owz&cotX_r1|RgyB!d;gxw5tibiIYkK4k}ITQPzUq!sH9pB
zuGCMWLnAWx9-AdMyr_&GP7a-N?QmqG*dOzg$O4FUws}5sAyr)MMqz{C;No0wL!EaG
z-CF$+hdv;HxiB{5j%6uMUVb)$`7r}xma;Yp(x5j%>?694AU#I%H#cSG;l-r})!0!A
zO7V0JgMRF4kXdVt3Q~ri405KmbPR5X=1kX=kmAI<$GGY9%d-fU*Lg`W3_>Gc7I7eu
zFNIfM4{dOvg=NwQpT7~<B)l=jz3X_62nZQ-`?447w$-&hze^BHze9!B9WeVBKp$FT
z$KhRWsZo$gjKY7+vBBisnPxa}gDIRgB18k@9v#%^O2wul5J-(Wfxqh!;SR_c$i@o?
z*=vLgvNXdMTRKPJCzLCV-HS4Is%1bIR{gK++rW*zJu0-AJ9v4fuw}~WO}10j2AF%=
z<#U65m=$yu%0jS{AG2yO!txaa_Rb>Yi@B)GO}=Eg<oh`l)2gC9CIqI!_RlfLNK)pL
zi>c6>7Rm}roJ=UkC={>4Kpr=XG9i!)!_~7MxwIVVw7WHHlHTeQfFf30Oe!a_@@M;=
z{n2u83H_EJP@_b@+kbX-mtcnp)s_xVcE^N07<iJf#NjQZQYJ-djpqc*2Z^Q~^cA9p
zn@#E-x&`i=ts7<`d~&Cj&PNd#o?`boNu~SF*8Y(WX%Kt8B&)z%ts2DNFl#m9C^S{O
znxtwkt{EbcRS^1PjsFoG?8B-wJ{!`&tuixTA)yD(br6-9p??MKyD8ya@AO1c!mrkC
zxXJeWV^9RSrvAP8eHM^Nk+n23AGa}~XG^oCqo-l5qp`~U^fCN|zk<*6YGgh4eGRgv
z1FUWAdLP@okSSMVqs!OQ&AIb_mE*WK-8MnT319Ago?0C_Q5^|?#k2$cYa@*gUx{hy
zZ3@Gb&h$%Jy&AZB+ojdX0uB$X!{LxUf0kD#?rccy=$ANUpq)3nx+I=jw%RZ}eBI>9
zlh9px_-i6Tb7&MU*BHn?7u){c8S6nAG9C%SX)V`GMysVFv8AA^D_>fCT8R*8Fo~a|
zYp<AA@{pgm-j<s|Jq@~_F7AiJL!VbnRVoc0?qaLuZ1ktg^6;8vXRN(hLdW*iWY=Bk
z4gb~__t<KGK;+a(|IRm+V`%8LEw%sq_cT}geB%mKPt^TDu9<rMJ9&0uM*(j+xL^R;
zGZ?wi5hF4ME3&nLf?L^>Ct_c_nC&}8x~Z8T_o1Uix5$wWvu{BP4}k9;^qev)sT1R~
zdt9pCavFjULQfyYA|tQsBHXvU*vX;WGdS!tjA<;=Ap=d3Cjg2Ck+g%@nDUg@Wr_um
zd;YxZRqLb*UIwY*`Y=G9im|Au5-QaQv{<|3jm?&bXunv9UzQg8l7Ojw)ZfpemqyE=
zw;WD(2R7nwkB@T`{aSgSt=5LZh4l`wRC!OD?wegGJF~%GW6)lSr5y6xwOm72SL4X5
zVRbt960gE0vi(4(a@_Jsin94;OucySflmu5ydtXyn-WH&5jXiqHRPp(JR|vqtD47!
zn?F{(RO~!TOg%%Xwz{<q)?|DSdF%AT3NUJ$WHEiXId3g@y{Wjt-Bx%@|9(h!NttaM
z9HzQ|p18buwVW>XycIjf1?Ij$a5Fa4&Nb=SS0rO25Yr5*_^%N(z6ez50~m}rt+`t2
zW$o<iy-@49EmW(Nm(a<27qil=5s`TeLNnmySed(Ng7_BoB^eIu@G12<X5owQwT({z
z|LV#5m_O82@Iy>HuLvS}QIN6;)wiLSWG5%LZh@vYKDg_Jeak^^x5hJ*5%F0zuu_ra
z(}hZ9So!nKi%Jz74949fZt4A#rV4Qx-?!n;&E4qg55_l?b34}tjJ;%Ojsj(hlU);}
zk02KRYkHJEoEMsc=~Bg$HtbAYtnn5!c-G`f<6qRR%rPs@6s)j#HnMVPK983bMyf~#
zDT>5==nHAE?m2*@N&5oE@Yj>z?B)wgcy7xV)RklRXL6ztP@F0x1I8BncLU3}DDQ`Z
zXy^wZN3;4m;~_&%Bh%&y%v=eQPVdY4ucu_LY5g@e+YQ$lBiYN5hv)khExoLUnH_sC
zrlrY|6ST`Cdi!#y$onC8THj!9;Np61sQ2kMXwR^2gViD_xyz*+%^ye0gZDqo2~R;t
zd9X`3<K9m^_DF2;)zt8s6FuMc&&Bi|EZ6hor`)CArN!Ta><BU$b=H>p@qezU{BC%Y
zZlfOK2k|W<8Mv#5<W}&%R^LB0FH)t#kQV;d)_yj%J!X1;|DlysGPL}o$2F*>p`({w
zs!yZDx;Sj-LO^{GGexckVEI~(nv|xqZ&Ums+Z8o|c;x^03PRa@hs%GgS&p`0!<q5t
zW4*b$Z6=_@spsh}aB5>`{h>$y`Q;*#f4z6RsXIf+rPO`k>PEJy`+J=C=izv4<jU;X
z&F5gB=Mpf%`XUngc^^2~>Vn#lA)7VVtG)Z5s6DmOg-P0t(f<0Mz6b=Me_l%!#h>3W
z8A)3K)KId*2wlu53faM2#<RFxG^0E<w&8s4ooRG@6kRmLI^Z!Q#utxKk@M1qVN<!b
zeL}{cF0Vo`(YXm$f9m_7a9ln|7vuv);n#<k!kXI?ivSy}a=6U(y5z`lW!!wv0W(7g
zu|JqMU-r3+QLxX-zlxi{t{xsz@6+ARFx($_8tR|fK1!*7Pd2U88{X{d=Sl90&$14e
zOep;@YYrt)8>{fx{&b1OUIo<1g+pCKJ@xsF*}-nlSlvfIG{M=ybHV~{!vo4mfJOYt
z?fEq|C#{gD^pvGE)yV*;_S%CsEQJ}xA}&Q$0w6q**&3t>RLG|dzDvi_OK83-K5on0
zVQMCviAH#eXEwL~fQSL;gI(u%PIA|w(beUCa)tngbLfZXDNqBzZn*;j0Q@fvtOMYZ
zv$Z)`u9m*S6eajxh(h}w<A?VbY&JDqQkR3!r)2Esb%{K{Yoq2A*^hT|%grpVmTrfV
z=7xURia7r9Hl+2-$H{OVFyQ3j?26lfG$}oOYMbrt&L3Dj!1|M>zika;R$SoV{ygmt
ziT&vg;FmK`$)v1V4BS?#_0h4c6QF4ekIFH{cKQpZLBb+pSe=6PD{*ARv~@ew@s(=L
z6BTN%jllAqEbhPX<FV(*UgEMTUw*Kiht1a<Z3yz;BENsgs^>}lol61Q{W!@h!}QYT
zrJiP^uMx0!5N~|X+g%;y7n&$PJ-LQDo|}<TCWK>h+!a~!f=HSK@Ma$!10zBy#n$ri
zq$&xpyc}%<akr9g&Hh0EgmG~}WYL7Mq+l>!sY<qM-_&}0Eyo_0T{)h%V-1P_F5@xG
zEvwsf9j91UygZ=)#{)s)2k#OS;VkSA2VFz`Wu}6f=N77z<Ley_-LTBj%JT-e7uCj8
z5+b>nCHWA%C(kZTltPWM(5DFyOYkfvQN8b?7O6s}K5YwE-!OC`+Qv^5pcSny7E8ki
zIcHW0_P8#GzV71N6WE=r`n;mtJ(;(6wMTrO9sr*jX;H=ZqK}aX#U!Cz_~Ntg36RxI
zj}fWDBN>4L$ftWnAd8&JjJ$4i9ZrYa7JNGO!Vp)*?{AUsv@O%!X?croq$(K4>$)F?
zBZiRR=>j#**V>J9#n%xu>M4Me`|wgY@R^qG9tW`-|EpQsoZQkc@K4nW_9{hnn+kMI
zqtB=@BES=t18mzb@$uQalhEix5ls75rnsrv_R<OaR|jhPj}Fx8k76!V2QK+8m=nO1
zoJq;@oPc~gP3w9l7$zC9ullfRdVS^iXpA_ffzTfIuf~lSvS@UElmYz#c#ZohEJw$J
zP(pQP==S4`>ujjfzm>*$n#9qjk9%caIM1Ml)opDu-1_Zk$Z=BnY>o6~J^J{?HRDFt
zZl`1U^ALA3pxWy_wVB$N&8bhnh!hVB&4@l-B1j)r7Ph}KNun2WH?w1sRik%uaxoi-
zF*J8J4_V|+80Z16Ox~Vi=KwprUQm~Uu8rCnGf`tpZs^50c2rX3uO_BO^?AD5ziM@p
z<pAqS)wVse$Z-`-+IElRw#^VKb@R5TGW8BNl`+YWjgXAw%#ctjO;E>!W##G|KnKfl
z)jo<gTxV+hm3P#dVSn2qyhR3s%zQ`|+1J14f^P{Tb0G~J5jF5v$6}%+l$kjGEYjpu
zBbzX@5#ykHYsy_^+n>i4qOe!s7otZ{!|HX|<Su{3_{BhGIAPaN=9qRA1o}^0*oMI%
z@`6>Y^+0G3COv{@^ZOjaQ5c2YMv3G?$5%Rw!$=GsI+i)=6X6?a{Sk8$^aMe=k4*I+
zdmKIqC8LR69%U@fV{ZVj8<F)p{BCNM!Gvn`lqZAVh57j4Vk8-#sIkQ}a=PLFT^3xP
z-Jx5K<g}4b|L$KuY?fYM-H<f8%yE!Q0~EWKSf7RM_Ai#;^y_8wO9VAVPc}gGi03i)
z)1zG%AEic<c_4VM72FCyOm3&V3b1bhgCYoW6+rZ>3ktn6i@ry?3h+sFDld1p2_hzF
z(xo9}`y_n*Q~c$DpnVPg-#n1&^^WsS2V@NO)3~w`rz^@_2uUe=J?{R6Bi_}xfV#=K
zY*Xhetz91^7~yay`-ScxCkPy0Wb9@K{7Q~C^Ccx`vEt7X(X$s+zQg}T)jcqWwnbZj
zj_u^cwr$(CZQHhO+qP{dC$>*)vvc2jUDaK+f5Wb|)*N$u7=OVm1sI30pb}<+TNfb_
zqbd^01yKuXWjGIT{^No6^FPl>h!v&F{xK&0etCJ(M|skZciQ?55<pRYa0J4u!D=3Z
z4{Syar3ARTC8Kt4I&(McSKCc_y9JxwLUnu_$vL>q7<QAc6(wwZaZ`9A$|Eu!_Hko+
z8t+mMi%JhBD?Pqpx>y)eeka%r9DwB4JxPq~p;QltbmYu08#2o}c|pU~-VZ16*9}$I
zAzkvh=+^U^Gw`!Cd>yeJo*N7)hu{}kRcPBK(7ONQ1NGYf`aoeDHt7HPK!b+n8pj}U
zeCEFQ6?`T?t&2&|M@U%z=L6w=x8A=3vGkXnw!B}WY@tiU`~!l}=8ApI)pqZ^HwLsW
zzSW{9Bz(@druw;MlLp<|#0xyelK5<FXoOTL_ID>s!^_k1cmX6uooO12HjE5!dqKT9
zN}&AC(rc<|GsC^HBcfe6jrm;wPTTp|sJ)B@26w&Us{d&5S<mruz1;1y;;ScXhT+zM
z!zR=z@(jiV1nm0tfgU_*QPRMt1fi5@MPYOJ=J^otqziWvmgK$d<>9s4^78q<Z>tnZ
z{Md;jc}H_Im2Q(PBkx`ws68W4xw(c0rNmR3G#Qj2+&iFx+foA|PY1*x3at!|Lr7Bp
zmj_z48k_y)fmriq*lzfqk(>+@n>hVkuadA{vB5J6@@kSEX^COOX(1pXbgp7DPP`8g
z(QgX?)>+)?#lZ5wgc@`@dE^Jqn7gCHdF4r@9oPn^v5hT;y=vF`4+|I;CAz^%j7)NY
zl>rF=`33Ra{yufd)j?}*m0{C)DXKuy%3$r2@U;*_HVlvq&rPs#qa5zbVWph!r&rvf
zd?xcyn^hqIyDT2IC}zz9!8lT84kl9`ZjLHiOiN$XET&3RNChu*0!iApC!gFCgQlf<
zfV@e|^<Pr{@<0s6)(o`&@jxnfGjh^Z@VqC79guk-*+)~BQ@Do#EYH<gJ*uwr1m0l!
zuX}0&i$V>cFO2wtyhEA5$s}V=u7q^$1prlm6R2_iR_nz=B<<~`NmXc4PRe(&vDlOc
zID`CqgWy#9ApTMT#rcYZ(NxA6j!=xoqMPVRQ5KshQui5D0@>r#*M8C?i|mlLLZ|IY
zFC3h`@W}E1pzIRSd0vuI*VmOQlo2J4(}#*CXHFtX5Cq(rfG2?O{(n4BFJnwY)QR8V
z4+n?cDU_$OrFS^BhbQbMqoHOQY*+Ad(I~8Bp%&gryKaJZsX-C0N%=T3Hgpn=U_ZDw
z&nAFCd3%?0dpz(T)l6A_Sf*guUmnPS*8zi2NoXO)AV?jebQmNPAg3ZBI+NOo>Afdt
z5>A9=!qnKabctMWL#2FJooVCO&A!+?!p@XueFv$|VZ)T>m;#M#8ir;HEvu&!83oBx
zHk35=gd+JRbdEMi#1afB_6>rHHr08OM#IgsXgAu&%D{3JD790-B2JN$FvFJ6C#9nk
zQV&sF^63s)iSwP_Esw*2wafx9eZ)<(Lnel$6!R()qfKHYVQ(N)1(Hr?LJR5po}ftA
zv~EkYD!5v~^i@d3%Sh1?yu+#*oSi*`Er|ZGcFVYbg+u`cz;}2WrmtZBTX01|e+i$i
zolpZiV<Un_00MOa7zz5DLO7LI*%S&TpQthhK>1E!>bYqZwoXMm{_ep6nXODd)hb57
zX+#DH+<4*9n?>Ly8+HE610i;B!RHrX;h5eB>br$6Fii^~Op*Ozj%nv<r87p-+CL_i
zf6qeLq1B?qwKeFGaP%1BEU)9=15?vti7n}fV>;&C(V6wTBkp-M)F40Ui9I3j+Ar#k
z{pEoSuqEC74N3j-Cp1aa08lsQV{4!x>wbA4qL0NfYL*c+Zc2Ttq8b?mxQC>s*uXcY
zW5R3a>2~skc%g$!^-A+S;G+|2ErC5qCeC!>hJEQQFf!$sVN}bM8M}_~^}xbVT+R2B
zWtB9=cV4h74bD>mh~0HuB#;57zLYhB4+eTuTsbM3ihu1nG*wN)?xS~MmvUYoSM{Q%
z;<o@4szCu!@Bs-7a$5W?o+hQKZj6iCB4t~*!W&ICjOGH$Jxp7fyMp+erN;3Qz0J9M
zT$oy=kul}sYXI!DOo9q=^H27vSlp)wFqSry!UoNRZoU!2o#j^q#{4Oq$OBK~^u)JV
z7`WkBizA?_Y$h5ZeE9|d(2!l}xnvk5Buyh{mJ=DHj}hY>qdbUhnJLn_rPryLOU0in
zrpePj2;u8&<_2!|xO|<;GlM3{itC)PCWhknbi#utVcXEJf^w1{^<D~<;9qvS>0lHU
zCK)?0Ok^tT1WAm}1_a<@iS_ZDVXaxHiV}>tO1)8*;np3AB<rC98ewW!x4zX=!H7D7
zk@?nN_6<<ynVzkC(}tYcHK~psjn(bGY1PveZV`XZWO`=<XXbs)#G4^H@wXMBa%>C0
ztRoC_RZ*(QCoC<7U0nwmL<XBRz81;t1!q<UUa%@nAd@a+UT>fGU+K?fN<>A?7u`uT
z7&FYKKC7-8zicP+g(r#=O9~V>B_wawu@=KKcL=%-wPhIT3dD-SE6B2>8}~r?B$bzE
z64Tr;lEAQlAE3#(6D1Qs9?pz8b6Ja#V|Yz7$RV-lqJDt7{24=aCj62G{Mi>Da1N~r
zFLew{L@byGR~NY1OH*Vx>Z(#agJ5)yTWa2p&UkjUf~lvd@X;gUanqwnPDK+ZDoMmE
z?VPax!0k*sZ&EA>R9NjMV4Gt#F{#)_RPCGs!spLR|4Bl#gqJ6L%?IPJSmD=SVR`ma
zbk8Of*xi|=MH1Z~{jkdG>AIh7yI$=6SpJ9*E|kj9^j1Fj$ahMve>8fB+!Wygd^@#-
zRpJh-1<sG`|2alDl`|{V7-So!I=~1Z%H5jJ*ZS1uiHW;N47%kbxyg$Mll*gQq`ZfU
zJ-wMyq28%L6{v`NeEy20lKYSh{>ml#9}a|2s6Lx!3RUPu)ZsL9Vky$-sf;Mpj2KCZ
zSDP_>G+C!P!veRA+!UW1{28D-s`XZ>AMTh(Ak8(e(#;Q^0~YZh4k@ITY7nkg93>f*
z!ZM0A0&WNoB;80JgkPv#Njhvj2~os49`82IObO18AJy&eqD+|yFs|RQn2$nEii<tm
z)S@^wNS%r9Y1b8gc%-KOJEWEAXW_RjWVxr`LLtUzPIN)khKU?qml&#DeUe^v*4gH`
zMF>;Va|Naz_^lm5Xe#-dZd77wb)mB1n5~f^wq+3#;`b2!P#gq=qdFKRLzVbR=i4bM
zxdcW6ZtYzv9-G6!PsFJEXBcQJ7>!gMh_020Ei)wH$s7Zdj-=D(z}Ry;2>pFPe0m(f
z=))iyfm44l;s&W-9S+NE3ahS4!x{IMm_%jf^I0&A<rMu&Zm2Y6GR3>>v78ZwoMvr5
zGuo~>hepNp^uy2PWO84bK!v#d-2A&7ud&l(l9|n}%aIxA#CD=QsVku^B{Rj%z(7!P
zY|!<GpW<IIV8LV3@ITNN#|;g`>Ams{-T-%5FSEvk#93`5W7VxAO$Vc<N)*160u{;(
zY<zc?c-fJUFh!gJYWE%0Ye>;zGGx&<$S$QS7)TctL-!0&5e^Z4)>QLZskaQw8q@)Y
zh-(e+QqYdmV9%Fe)*@{&(UI`v&VO<1`QiX%`<y-^ML<agb=p2|PgQejwzy7mJQljW
zXk`r{&+!<+^NGxS=n`gNqARO?x?h)BdN;iFjgR~1+k6%asLP?QOK_R6Z^0I246p>^
zERch-T*HMbxg>>A$0K5ikAlgc6{)$#7Ut<Zm=A_;>oq40UHV*}`_~(v7)El*#|7I2
zI`(zD1bvESvQP?_s+6{uQuFT}Zg{o6ZqS^MSASk!ap?5PXm`XSg}nd18FBI`V0k88
zB0U^A2GJOhw?`mpDLpjLs@``<G~JZ}26SfZwuV9K<H}9%FDD)#g8kDy+NLy+82v!m
z3yNN)KlEWtO<Lt7@?%+%UtuirUOk6MBKv+U?34QOx6^$6v7X3qk<6f}D`|Z!C#JE_
zU3*%}g7>?1F!8QTvFss(vO!2oUIY<Oa6g;9FM6=r)ed0xef{c~Dt3_MP=;9Q*a#A8
z!W3keN2<2JN{V9$M8<ah@lFl2&V93dT3Odu6dcO=YIjzVULC)rBcT0yNh2r&hSj=#
z9XxMRN{^yS83#?c-W9c}o#V)KtBiqY;)fTLtCGo}%TlY3`^ABv1XLCv&&kGA-MxL&
zZf!~lqiN_+9%7=`gIdVkub7`0a&+ce`{^jK{=<RtUiIzd>S(|CWq_3{LPP#v97tj2
zKOE=@dmO9K0TowN_&*#-gS4ChrJRj&nX-KFzJ<_qHIkzyn3Q0_V0u5xD?oxj<EF8o
zATx+AjD@AGdP%MuT}3OQMvLN~sLq<PFA3C_2A9h$&l^MD|KUKsVv*82(Eo=6dA=Jy
z6oFIM0!wTPjK_|?#UuQ)((_&Wyr7{`=(iwQWP`1|+UbqVAn|?AyGsO{moBN|+j7b4
zwXy{ZB-HEL=Y3<k%B#V^oXU9Z2)_+c07x>m9;5dY&W{&|Kxzt(*3qu#_vl{bn7pD`
zVHq^{FCicVZ_E3tPwI(-_*L7fJ#-8cb!f-@YP2)GVmkD0^IQ?Rb};D*$j5_C+oeN5
z5)E#rUZty*2V7GlVVY^CB4?!4{_%UO)Pf<$*)Dy0_r9H>AzXeRq*<?n5HC;t%_jGh
zpsN}^Z?a1F0-e@q@BF-t{&>7dd2pk6ZwtC<ABocIEaOvM#IB1^7jU)<(dO$%lU<wd
zA0A<ji4cvd$TqIDx-8M@d%yY}*t&)?ty7q1Lju)2lY2scM^tD>;YHBEq3If|*uz@k
z|8!zc{McmG!FRohg%}yP6G+W+92Sb^IQz|ZQ+&|;rvp*${f`b*t(_3AUn0uM_m4Rk
z-7y5W>4(E?#G_V!%3u^Pujq~!qXXD|>46H_GyhJkh%Do_D3)&{Aq|obq{p>nh@Qv~
z_LPPT=Pj%r!+@;haYplha3Bb_LlM{1hIw&im|~#6==cBOKy*4bJKk&DZTd1A$GvXg
z|KUK}-amAhI6w=4WNNSSR=MBu+_ZOfmzOET5W0M}ej>=2^TFE)P<hrlDOOR}BTdYk
z;BGyJXbUGLVewv4SD`V39;zImgu7}kQW||ADI!^Q;Em(5;6|wZXZp&|YfuuD=76Hw
zNx%j3NU7(z$Tl^#nH}NYzu6e$sAK9hBrHIU$x3{JF=!>LjQm$VtJN57o(pAu;JSNC
zjAW|05%&Yyxc+=v@>zw#s<6@fTlDNk*%D?Tt@2x<*(J4`)xE2zIYK-*-gEI$WceHl
zvSevl>Fn&c>_}|27OTOFF%akCYkwx-O#Fh119QQ1e>@nHFj5$w0R;6ig=+Dw!s<JT
zvd*1E{=RLaSQAW1uzhBf=8#Pkyd0Zw8{~t#B7&y2Fhk-6t?AIMdgjGa!-^eg-r1Hd
zM1gT!4B8qbfV9m6@n&*P?><d_tT)XP-Q9WAS8`=V^d1ZiR(;EfKnkVl9@ymHoH&{K
zg%J)R=Y-}$hRZ8m=YKd*uwJMS4l~e1F?bT~1_CnRIgpRLJBEdf@U=m86q><cYOTX$
z=Lbut=d>ir1z`lN6|~9bVge_LK+-!&8}eF<d?A1@O1T2?&)Ou)!}8H+yi1|$bPBze
z8on#j$UA{p7k(s?Q~bk1xC*Y{?-u90<Q{8g4X67OcD1$NL0UG&1CmCc-jXU_Tn8=W
zB`htB5b#C=@hopzrAftihBjIwqZ`_ThEIj%_!4X6tEcMFsb)|?jD;eP?@d|T7YI6d
zzX9d;R|UnsGKCwx6g67*eKk}lOG3zrI0R{psP)vf58aKY(`s){S0lLV&31~9)9UQ%
zj<<yrc1H3WrFS$U$#2OC5WImV98vBC6}B2JI&P=&TAxEe&%~Mr#sf!eTNpdq3WBM%
z<%IX~xpPLjO~J2i2kM#=gB@{&uPomGPhSuDjyHP!gVLH@R1=h@R$FoD-tp7eWqp(S
zWTcT-jKr|qThFrN|6>EiXqKRC$l+dI)X>Jby2hM8K3)M(eynVcW_jvcG1g$SAWgOf
zQIs!P-~QS_y?R~E{J%C(_)T!$9NW)#uh+Y7JM{|Tc=LZYkU2+INd>-H2-5frdjP;B
z=@^nh$bU8v{6(~Wx1P+2WGXB+@~elC8>|Ppjn@jjT_sJaisXK>k>{@s^z>^3onB3P
z&<>ryJ5@TfDgN3(k*y}BNsn2O&_3QQ7xZ5nRL~FY^<M9;4?}hxxp$N*`1A`yV(^(}
zy!Lb39}Z{DWsk-hGJKx(&kH_5(L(*e=<I#*ird$z)Tdzty~C&<^?L<mf9R}@D2KzM
z&>i^37JjsU{G}yJA}zSaVJo5&Uatk>p-{3YS8hk(j9Zy6L4#TP#f;-1XZ9OdbX;Ru
z*Pb-*P;Hk9`GrAu8S%CGB}g}$kmwPB1ZXx*;GC~nYRXMobP&|FR2DK@dux27ns#C`
z*?uFLTTXtjS!Ic1?2;~T*CP*9aeXD`II;PRr7fY!rXTI`vAgbWHs7UQh7&sO5R-J6
zrrB$=>Jh*V(jP7jcbH;kmDo_1)dRPAA6Zgx0IBCY?kdYor{xe%<GN`PU4CFCXAIV<
z#TZ!ntj}8v_|>s#Q|n}&+V`HQg)4lWC%$e)96!q&pDJz+4jy;dC29$shA8X^i?AMX
zHf?KrRaq<Hi0n+ym*T)+38F`xtOg=wWCLJ*#~uiHT2s9vQN3Y?n1FFn<`zep$=oks
zi$}SNC9mn63rQ&!;N_#;9`8dfa<<}_uv9&XSFVPiq5W6AB2uVQ%%IR+q<a6%`LVB7
z%y8(mX(COKahRR9xulFME{GrDpBl$!R3kwN2LW32;RI}{UW3h`d`*Gn)+{2=5K^rw
zPU!Z(hA)=JW}u)(Jd0GbvjeVB7JJF`RF;Deu4FIj)^C*%GPKJ6hX!hGo(k?%Ff;N4
z{*MM~kFOR$_>TrULNMjAn#D7VWYL)%9yLM4*s%LhYNY3UO)q_z&k|s;={Ids?f73B
z2>(v?KN?7o$>5HEeaPFIgA*&3{JITQkc|VRYgX19=Bi)3fb9+!Rji{L>ufB1KiTEc
z8f+sTDk{#kUtsR_b_wq=a9rhwWe{SR%v<G{KSO}X!0-v+jnQwZ1%4@|m3!L&fcPk4
z%NZXky#q#F<UO<&BP;)u%9vE=kyhGBRsGbBSQ0uR?pzZ=2zODUjM>qe1SeN72zpRr
ztIN@(Go5Ng+=>~A@Jfv*>0Bb+Fj)9V>hZ*}B}l;wmwPDpGVeovWuV8YG*~Wli#h`3
zS8sn_*F<o!WKp{<y3;@%sl1A1m%Q6lZGtLPW6fBLG^AKEqsH{$ELTpdn!Ia%<pdw@
zU}0y?00baEnmFRG20Rpo0OC|{^R7k2V#+-ao*~%(8U<c}HwS9EhlKTqho{Sq$2cih
zGF)1k*O-j#tGK^KkX@schG1bKbjn8@l%~&#Hf%P|0*rc|dVT-N|0&z=8XvaE){1Wg
zV-@&wDPdUJiqyf$01@-_Sf8%!TMZploL=H4A%gWVYUN0|j35%LEhB$S;<#L?L^}*3
z?o${@e$e<U1C66s`wkh`r*c7f;+Y3hyixBTgxGE{8{==HMh38RndIm>tSw8jq%zCK
zH-sT{%2}Fatu#ZTHpOob>Mur+^ge;uHt<I7EN*&vN#WCjLS*4Trp{wP23lZf7I@*#
z0CPhw_eB*t&yyflbeMDXe4ZZFx#7`gb^cA7B_w^BArA~m7Pz;=6gUu%CHY4+y|j1q
zL{U?V-PnjE$%L1Ywf?dJxS?z<%J#Hud>Q&?ejK>=9TJ!U3y!e|&iD_PAV&XxWgvt9
zlYw@BWuOaECX6o?VhAL$T!vp6s7Mcs>HR+$$Wds@9IZ3e4i)2YGu0Ub@jn^p?1;9g
zFESrHV*HXSyoMv^Q%T?DgZ;6t=Y1u`tm?W(lO9<JMR*H8r1DMh)vkz6zg<rdpDVX&
zHOiio`BaAgL#H14@7_slqi>VBpCO=5mr6FHIvse3HIl5Z(Gk-jW;Ic!QojbRX0iVO
zBd_7z{jsSeF`>*Jr6{{~(ewIEv%_gp2dBNE@~za>3gKpl5K6Cnx_*A39pgqPc$+n=
z(4aJEhyap{@qGT>N|C)OByN{IwEqjrg>N}lf)4ViIXTAJPE9OI!jTO<AcMuYg9$g!
z9l4JAmpa1}Su@>2tzwGi9ezZog{_e1AA1>nFVQVJ%zp3M<U9&QVQiKx5eNP7EvNy1
zS<pZ_+!fqD95`lkD*br@J~4mXb{zWG{#~N}vVs>FB_wuafvO+m8z%Z>wT7b*@_As;
zPe#da-?6L@*BdVrl%9zS4A{Jh%&4=Z)gb)RAf<Am%{4<ik<-J?f)|?U)&YscyDu^=
zL+Cw=RXv3xSjGZJt`Lp!U<UZxR~TVZ|1^9b1yp?GasB=#3o)u1klMl0{Mvc>RcBjI
z#?=v`W5w1GFqauq9=1c_3P%vtbV|kc9*f%4tXeD##7wTrJrU1kbb~%4R)HoM!IquJ
zujSz19x*Q^Jbfzin}!^eNa1e_aUMkM-{h+k<e3`u7gLOAZMZLn^vWjwA8&4tFZ>}d
zK63<~b~YflKhV62gz^xMP5Hrr-{oH^g$8a#IVqG9IB?&HhPPsv_NBSxDUN?R-a(1C
zA1`oUKO9S1^K=B<fWA@7B0Weo16znq-gWl`)(_0*a&f%e_Ry%ISkCzIF2p$#aAoCJ
zVtdKzzS>^>-e<j@pf%I{enbk`Swi05;R63|w^GU@l=aeNNW(pqB%KW?Q*vZWmp-JN
zof#4d_=+YaM1BFmZv~9x0L{UqgC`G;fP?u8Jk-Pxw~>W~zW;il#9rcag-`ovJv}|<
zS_2u|s<{*?zITMd-BIv%H*y6##@rFQ-gjs|bVGwMwsNrSUvS@M^eh}K>@4*0*z3CJ
zXLInRVx&o572nNF+#DRq!d`cmH*S^WEYXP*3io)DWJwfIW+*+n)N0seD{km@#|wn-
zCY5{{vUYf<a-?B;(y&j5M3(O%dXGIIgxrsgHRxhP>0r}@r@WFSVo5?pUlv<!i0H@P
z854Z$Z=>XxUOr<1Idz2vHbjTwSme5yyovCb-FB@C#e+nyl`dl(-7`Z}-=qpTMde?+
z5NzR_ftiEKrLW3x;#L&AKd^p}nHIWE<;$60@=g$zD>&U0(4p_CnZUkT`qan5c24sA
z8Oa?(KQW<UNkipoKJ@tnfi*6|S|i=xx3~6uul<}XgeehE2UTP90EF!_x{n5vgGvZc
z;<y!3^}~zLMgelI5s$<hi*l`cFK9m2La~o!*#Q&#!h;yDif>2ye7p5N&PKIGplh`k
z=Lb>5XtsXcd=Z`#eS+dsY3c(#DLRg{fW&yBzNTPwaS||r4_Q$QOciWIs+*Iaq9vS{
zmk3o~K+iUKH1~yQhbvy7i4+jjk|svf>{QFq*fGCl4!W5-nV!O)6W(eD!9KohoeK&O
zxBljEd?*ip!yB|Jl~?A&l{XQ3xN$*(KK_Od;4|QVpzv3KSn7J#yIvVTOnX1PV_42#
z9^bq4an&Z5-5rU>1exo0bak`$b-%I{-gTM-OdOBt&D+y11^OO;n_VxOKvohoEvF}E
zm9xt>0RX*Nn7eD^sd5Hvg~6-Z<&>-spY5Hda}T+fOs0pW3I>vn-bij^iA_nVZI|J7
zl-)G!r{2l{dK;b9UqI=bstH(bbDhz4H+{a=UC$(4hh)wCMWYBl3476hf~zmbxR2J$
z)AR7wlm$aEXsCfG2SB>yC6ohT0g%9)uizW_3{~a*n%YR|qvPvd{QSQEy-~Z9Y`xgB
z@_CfKZ9BZonGH2?IPh;Pgep{a6&g7WoB15yfzM#AYRsaZ+@QM@Nu{JCs@5{_C`Dp~
zVX1k-Ycu&9{Dc=rcjvFSPn5+bFvp3(lb8!muuf!v3Q7%Es{tVgBtw>PiH>up<KP?J
z`xhcI6bO<@qyx0N0Fll*k&UN;*A)#*<z9^!Q=vN*A<+U#hMkn!6dntx^GzJY#|uN1
zEUD?b(&{d~QWgFvpcCE!+d9+ppyLg1o$7m%^P)#!eTATs>zrU1+|%;tcr*gsF@n;Y
zA9sySMeQ~&n133Fbx632TT9+xJ#_OpWfl_^z<#znr-UqWKiFR_fg<WCSUY-lhAF^e
zqH7mp{=Qyb<E`J;^B*c^zZH>-wmHDVLtWEu4_^%n$@0KQz<?}=wX^z#j)0aqM<Xn_
z6Ys1p0r1g&f^olxyIpQnI1S`eNLlB4rjDR4>ZjM|jZoi3LNf(g)qC$4`BXS~j7~6w
zRFn%&TbNNO=zM*^&&c&6oG)Q@d>Tj(;J@iTLu{rWk&t;2`_aDwe+7_OEk@+K!Nk0v
z^Uo>M_}{@_0BDtCQ)mGs<rM~9FH-<|fmRH2BgVe84oNGt#o-UWz^>)=C<1dM`u~*9
z;8gOjh4Wy}3zNB8^fL-?RtSdp_fXFHL3gP%o1DG+vH;vn$FcxiZ{<GO3l#dLS6vl)
zd!h|KM6Nr-|2^%FbPi^++fkl$;xbJDGk;pQjz~i5sAVI#htW8fT0=IdZiH3lDq;}`
z=haoZS`oF@HJz62DwFMOlT&~q`E*6$KbizJbarAE=o>b)Mx!dO63bA4t-iIY@}I%M
zi?`Dm->igi`U$_HEdYD#IbfJhA0GRR9bOhGnV76h=|q+qRlqdQa<@0UMdTlF%mkEh
z`r|-cIG_VQO6ae3V}2BZA&t+8;LTp+<D6IKeJ6=B1q*&J^{2oo_odGnST2^#@lDjV
z68n%Sq<Eq^Yw}1#6Itc<!e%boPiV5zLs*S61P`#m?Ij<r=ceoMR3-6Hocz`@P7~`9
z%82gf`)5=kZ!n(_DSX!s&4UcL2eyv)jp_)(^E1Tjd4J=Vp9pKZf9?G^l>*GX>xE0#
zPw#G*^cQ@NWQJ+B+Lf{<Ut|QEE>*6Lc~iciVygV^-r@LD<IKfZ)j9z2U@5w~4^oI&
zr{>q*<#@*BAP*Kn$DQm3px2OLVXvSn+O#}A{8nJ_ZtsV@$CJ#ZS*^qoyY@2vg=p#0
zEp~vy;1RSIy3~H>HvH3+0io_ZnY3@xKFg3H#i?{QXiu1ciy>YnniyN?6b%q@j!J}L
ztO``lCUQ@CSb&HXpzO=8M}7pBBgo6gd?_7UEGnAFL6#;3FKvT2aSRrkSmDf@0acI4
zUxN;bZ%JYgz(@o4stPo5RaCKbsvoHZX>JNsVFR3-(GSpuc)BbFdtRcCvHr|(u3w=8
z8*@8x1u_1x3sE8#9=jb1jC$ceam$De4yHsDbhdBIn=t-MG!P914Y^|13oP3AtImf4
zC!I}vZ|&{v9o;VR`?|Zcv(58y1X9Hvg*i=BI6LZpEQt@CvBP0!Midc1f-g=id?Ql9
z6@y}<Hy-e7_6HCE!!=vWFY<Ngq)UP93FOy|880PblfCC0?JbFOriky&wE2){?Ane+
z=(}T@f}pi6hF9!gg(MUNd@ku*R?;&wpGn6^b95AZANAY&+E?zuz`)DFKHJgz7K4d`
zgNuHI)x0ju-3npl0ZP4Nq@qvOf~zf3(APlrqeR)v8M|ZbDw3p4awq{u439tW7gQ)a
zF2>@pieu`oL21$xa>PrOf`PZbz#oB&zLT}d!4^Yrw!zDlh}};FFpI8D3=}6(!Pqd1
z!o<QwN0YW`F&Od}q8l57X`{!JgHg5MQj5n}!*jD<nri-cMk%bj#9okr=Fl3zjm8hT
zn)rS!7GfA`Ltk3u%ctN6q^ANw0hnUw57dnW?Wm|2DDd7GFfQ<O*JoViV@#hTX@iG7
zco~`)_znozA3WOQ2S9yn4Fv5$g&soD7ntN?YktB(Sy{!yOJQz|5}cBXZY>)}l2ZAB
zP_2q*S+Be3F-(cr--)j|x0a~k&`Q*Ns%*8{THEY(F|zwOiRq!)WP^jZ#$(W+cs?jm
zn?V5&3sYel`7SmrUBfagGbc%T?aK?wFr?eu3IY~V`l>+e?#LJ^Z2R!yGAM7Cl^W5t
ztYk-!vjbNWxwmU~Y+vnrN+e4}EL5K;SGWe9!jZAs+B9f!sIy0gXIe*$Y_)1K0$j*k
zkj;Z3k6e$H?{;$K>MZud3FVs1C(OKtVBxgvBI>spau>_KidDY>i2~ka-Bmc@?gW&h
z#}sSbL+p394BTkjzvFlAzn(9)zUkK0LD~#Kq^q#+I+}tiW&v8M3XBvSEQZP<G9-Q4
z<{VFFZ4M8XuKcT@NGf0(ls@sWgXoA#8ex}}Z_$L8Qyfy@YFtPPc)lgvBDV&DkHc6&
z8Yj*;^BmC`t;?c+^n*1{!lHy74j+(m!0%np$4E4phl|)b4;eCZ3EQN7xqLOoWy5;?
zlt8j_qQ^GKTd8L4zbPicv>NM?45BYoT~tluJj_1X3WO+A)O|0Is<1fSW{7dDHSV@U
zFB!B=&_xbw`)eYRsG0i-7NJ9*&4STvK%D_-(*IecYBR?&!RWAEI{aE|Y=|$3C~u|L
zO^Hn(be~_bSuL{W&2ouUGyWQ-PmO?KEJuheV7;U`I<Mpi`Ml_?p$OEGT(OWgz>P33
zGR)XMU4hmunaxrXZ#QMtJ2m-{f{7g9&gw~Xk73P-9ygq)=WwqNXDDtZgK{B-9vULD
zAWK-wJ)eVxCuwUVgo3pr(MHFF15TfRynS}3jPP9V!$q4k((cx<II?u=au>%-A+ljw
z6RiRIAn_ts%LF2?{~o0Brn+C#nXd2FD0qbQ4Gh2L=H%-|Ti$1cs1yGMNsetHc72&a
z!3Wg|6T*uW5|jisTMHiB)u^w~l%N};X;4Il_U{4<5Z+A5aX2z*)c7r$-!Bhal(_4N
z4Dc&SqQ|D<`0RAOQ0ITI1VEeThdpsgz8>{P^?gf}-1c1e=E_)4!j=7YT@AfO?eh!D
zap;)2dU2Xt?W{D)$us&D7d9J$j`m&zd=YFmo{Tw|x!9QIK7Thh!#ho`5W)w&+{t+Z
zJ@@f!`o3-F9<DkgvZ@}6_siCuaT>9Vv&`L%DRXc!#mSJig5|e`7pTFBGU1*rRq=ap
zm~by`Cg6(M3g7hW27K#x_4&-{@tHF)5=o9pN3J#yG!YGXM1x@n7SARM1_TZ06ZW2i
z0U;3eMbQ^x=VK8r+K8=@2?VVnU4q4-AsCFsAev?hf)Oql$eRg8%BZ0*0<G~dF{SO?
zz>pwqwbA|Y3s^&FF#dqa$GK$vaRt**9umaSc*kf=r-p_t4RK`}ej<o9(=ivxdP+xo
z&WxFC-;c0U8HGW>I3G7PLA-+U5~*UWe<W~U*rYfd+*!Gt^eoCiJD0L)r7K26&WJCt
zIULwoiRo56+h@L~DbZwmIv@LM#@uQ7>o@XnH{)RJ!I-aTodp0cds4A57H8UptlO=x
zXU9iAAJP{aqGtsE_@Sm6WlN-Q<Zk5VW)Er;F%dqzxSLsztypo3mJdD`B)D5b<Zay=
zh^uW}?P(AuXQ|Z@j~`ZZ5;qzrwzGc?5^Fk;zz2gD)M;7VntWcxw=DB65as%9LU(+>
z!x>P9&;{yhLVJ>I9Rvt=U6_ki1b^F^fL%Fd529|BR4PRg(fQKX$|zn!<Qw>2EKANv
z^wStay`PZH=5u_9KNF?R9l{C+4HACt#(M5hpw0KL`enU_pwXuUa@&`yn;Fpxo(Fi|
z>0`X>h;b;Ft5o`kK)`9y9N!l!fpoWGLJk>2)WJYbSrl(5z%R+7qIM@|Ia#h7?w^te
zI?u+3J5;ONMmH5}E;bJxj<lTYc`X683Li-#))|a9PVrHJ67Z#|MVBPXbAiZJINs(f
zNnuYcPvR5PC;OXKTDfHex6>ZDP~r*OX%KSt$pjt-eiuR!I5|u})}bMDUosX?WR5}$
z+r!a4B@m9*&`M3|kVI$4`>j_jQ?_u<VS*+C=WcN3QKcLZQ#1B%^~Z^18c+u4j-^~6
zG``jcnc)#(hbq;mm_~t)f04V+Y;aEyNuN^Lq#EfDDi?{3klIanPlF4WK;fbSYJ5y6
z>#ejlW)4az;miXLt<-+7?4|7DzAda_eiLYs%R2GIuf4b0Ksjsu#T{XE5qx979v#p$
zDLtns{Va&sjsRFf9oo6bF7by%)ukbeLTpHQsQS2(W{G^V=4DfwOR)zXkF^y<iymX}
zKy0Nnx)%76(nI^ZQ+phV{^1AC)Clbw=i?Ry&CR1^$M3i#=+LT8Cu_*+<p}qNjLQ|?
zb{i~jV3?c$yuZEK`o+8~()t+;K38?s<Iz<%((@!%xlY94@PaZW8(U(B@Q$puU|JP@
z>cFfqz09-`8{8&J>C;}qx?EGgvts7AxI&e>0&J>wqZ4JbNQCUFCM^WwX#CM9<Z*SW
z()f^Iqd5i+&P8ioMzr}4UmV=fe{4DfK&8`mz|8}fI)s{;3|Y9rx*oquQj|chwDFw!
z1H&yDBR?tGW?BVD)(19;eFYS>a}*d~`uHApIv#W&UnE*yAF!m>nBIy%zHVyiYHy@S
zy(@1psy#75t?4gS<($E9iK;RKC1-z8Aqe;Dvr+o%AhzjCH#I@c^ruwwNiT4p2k(c1
z>25d0{sP=fMeS}Xd>WP5<mt@hMVEUSqDFJuFv(Ge2ny2VnuQDKr?b~L7e&~9x&3lc
zXoRIhUa#E0^RST3M9l=f9A$tV&p*&lu9Z?~=70bbGlo?nbcceCV+za~6#y5{j&OBR
zn%KMS>Vm!fhFWcr86l@$h6m32V7nzP1S7zrRG+4`+`PQD?BHGMvtEtpI{L2SUdDpV
zt~Aw29|bmcu069io`gq`o?>+eo46CWhue>&1UC21Ca8*wX$zP9g??-HN<)7mNR5Xx
z0M4|4Z+?b@a|Did9~~|(FgW*WA)BmCfc{Xz^-MQrN1F=*=K;d`5z>aMw13fNfR#&^
z0uA%MHdfk6x|3e~{1NjDhT&LZR@?CCRjA!ZBFJrYN5o9mO@;(5fA{E=Sq<??8MW1`
z7Lt%k^T32@%5P&fz*xsK!}jJJe-2GfijxRZJ|ny|>R8!_Xd(ZhTFq)%QKED!V#8rv
z7A|DInwb79fyK>wWns{AjvXF0N%8pC?mhi<Y8vFQgvU<yo#x+~%q5^zL+M26yB(jp
zEHGz!0T)0EpSdnttU`-)1NnwOpLBmlT}EvS-A?g}MgG|fJEy!FAOt5>p<=gWdJmiL
z8RnQf2xyR)={8@jR5hdu%*m0=-&HoRZ31=-5dxDsyk5I^>x?*YAzien<^;EbE!}H|
zbM692FA-QZv}xMGPZL`SCV;us9+_VCN*5-6=*=ib()=Hl<)|ze5x%tR{x5877!6Wt
zH%meF57MBC*?GqVl3Y64L)2KxA6Rat&V^XZ7UnSWs7PHiY4O@%Zda9xggO9Tw5jaD
zS899dEu4Ai#hmC}CYN-=neZtrZt$K-#x?gcgxA)4hdL#Ph!{rOiqtj;7P5j*;OliQ
zo%Wnsg$+$KTXerszunM|Ej^d}cp+H%F#2-kRYOgkf(NJ=g;dW{&3?XN>px4&u~J{K
z1W$Iu3w8&mZox-bEofTj=2LageVw57>S-IUMQoyoCH_D*X5yy`T!x#pi4q0XJ@MxW
z#sXN03{I3~xcs6(S>+1`7tC5#)16wrL~!UvtXN^~A23EpeyLQg^5`~CQd)DOe!Rhd
z2?wpskB9>0X^bm<Uz)Nh9d2HeNOCT)ICVCdud0*H23RC}!PnY~!2o8}a|tjKI@9xI
zQYbsXc#hs9k0~aX_IGbHrYMYg0(5r*CW)_XVz>TfYi9tlJB|<h!@GSp$Lt$+%O>pA
zUWJ0pNu}I|D=2AhD>nX;Hz@_NkhrVhKHF1@H5nx7l;xbFuuI#RLoi|^_XD8}2H?-x
zxBrX^WT*5#tI(f?dJSy@>b!NumYHSNU)*7Y?XNVTmjPNR6Vkz}N@;l`k&J{8y9L5i
z&P^oMmbWwqR^<$b!yav;Y!CAHr%n+s<jty$cPjj}6hZ!1I-Fd%*gCQQL9!4Zh`H3Q
zB3}X}>304~J7qc9yXy=>(hCVX?vcchEke4FMhc4&Bc26?^q#<gK%rzHs6l@`QNCJb
zYbgp)%{|JMPV#w3Y%0W{n21Z1bdP{*Sdm*&<`atjRT|2}JzkD(fXgiu5dAWt3^VN3
zfmxD|!JlH|F2w>za_(2nVr*j&*T6}N>k?L6HcNa5(|sg;jK9h%;NxDk1j*5(TXZS_
z37I>tT@k94*Th8eieIk35w^x5Ku%CvBTInr24bQl{Yp~oBrF^N!w9uIFN!*r{@zc*
zGKi}h*<#cBH*o>ZB2uHG8{(wmYr@!@>o+F(E_M(au0n@6_O^k$=N?lJ!HRw*|Cv{`
zfrW4Jo633x1}UTt|7_xd!(MB->AzlmHu-HXZ1DT3{W5~VrKlh#z>AZR-#4h(dZ=8N
z5pDHJh*z^uzBC^r-u=7|*Fom|dw3|og}f>vZl4!6q|>ZsSVs69Vfc5PzQIkl3|t1)
zI-x$ID^s3K(a(LSRKdSj*9oql;RNM*5R|^u8!i7YqWnZCJq;I67r7#oGqpx0BCeB!
zk_dg|IpOY?4pZ`ZTv^yHKR8%YX+W=b#TBtDhS&2b^_Q~fc8u%C0lX{?|F8`0Kju&H
zbj;%%&bFu|G;F6#*v22}F8E!!vfs32tx)`Y*M~^voWi<Cx|3m=mRYQxcaK{zkcXpO
z<HeiZ?+_eLI*?9U!GJnn9Z5rS5|iuhpF}-B$q{kVqTdtK&_^WKyAGWeK#%I^KUIjR
zJj}&2!MqnJW!-yec`H7HqtZ=x?Nb-+Q={zX0EgLb8T=pR(KG?0!$rB|xQ0&%wM6Q2
zF-v^t=r<q5y(HFjzcoLO7D*md<Sw#Dy~~O@s``i`omKofTXbVPT_pED_G7`z`T`ff
zK=%rb4iq9D-1>I)u`<4$??95U8WttcF*8x-{zkQWjJwRbmB?e!YOBka;z9O;KOYP_
zr$GQLGhA(3ST}V(&)1V&yPZ-G*J?LpNYZl+1lP{2Y2Y*e0=%{o7A|Dr6>!dk+rSO0
z<+gD>xlUwOD-VV2g#9>^%CZ*<V%2+`{)O#4=Nxa`W{bmGo9!_4)QBD9U|=j_<Gtd)
ztE7co4GC3_4UX5J)bp1SZUOUWz(qm!Xh-^&if7LKfwCauh!-dRt3(tMAb}Vcboma8
z04)6qTLPFNrg-LV3Gg)7XAuCp2EV0~iw-nAt|tj*oLZR`mAICJVKcC%j*CMAJas~~
z+yQdp*J=+b=Ub{e1Wt4UEDY?Ce`deE_3`f84bn8yokZW_j9!auROG-sIkK!x&O$3e
zE8}7Yd=bfJ#`(|i>HuXh`ozladhKPh_VS`j><KB@9(usy)(2hss>vDk_Iz}_PMDUw
zEf+v>gQ$|kb9&DVcwjEn9}V2xygzB{^vV_43%HJi66t6Q<nJ&`xD04z+U?|kBpc<Q
z&l1PrdY@NEDm^s*?7bqf<cY0(3GK0?g&NYiBt)5H#vCyeA>bU*KP8ZO8Zb6?P@TcD
z=b<vDW26qpD}(n<a>*MAttchB2gpaf-u2z?7Qauy=>?X)9~`xYfx<#K9nfl-c)ZrW
z?v)-F4+_J;*Bx`R)!Uxb0$&bi)L@fR^hy3l?C5l}4R#2_=Gu0V9SIOYG(oVuY#9pT
zzeo8!MoSk6#bB;K(CPTB?%V*}4PO>Z++PIVn3%5)XOFtvp<f_1s(sumpVzN(Qf7tD
znYY_V^n+>P>3$4^u)_(DhUWQYGYyE`^HrIm?-$$P^OK8VRc=4c-R`v>XSNTQ3`^xR
zPaI+ovMm9tQKQu*4IEp4%T#BxHNQSSu0N4-gHBpLep4pDZQkxO+sIPagF*mi*%7wH
zD=s&@+P==Z+=qZ~TOmx>5nNPiyCLo!tG2Xsyo~zfCs*5l$q0zomfgM(IvtH@aF!Kk
zP~k0oTzl+3;e^}HspH01s8E8e;L%0sXQ0%D=c{b}fwdCNk?ztK!l(C5+ru119nC-X
zj-COh#Zg7ylfGvA^@T+nLBSVXyuug5H7kfgP6Pag0eBK(3TS4*vPG`MA@X#!5b*Tg
zw=B`~cFF_1-3M;<*|V^w%R=mVX+Rz=xQgZM9`kv3Mqljf{fy~j=aO~fpWup%hEY+=
zqbl8*w_<>*Md9sTOcWZCMj*KUF{if=gPR)6{cB%ajDsTAux_OHkCA+$Z@DALD;H1A
zBzP_nuNQeYp-Klfo-n4mY%*lh3Ke7_5bdn&UG3rs^UPq@tHhu}<(J+_SHd*DusC`K
zn2Y(>z5CUGvTtVXJ)td{%Yxgck<0CIZLMTxWiZS5+dh~2Pc>+M1KVo(%E`vy@eJV4
zbP;B#VnvO_Ct7;dHSBIoU}g1}$EmFwSSO9qOW(x6f)^JqZVu$(QhbxP%$~$yQ^|n=
znXB?yUs_W@B`mi{3}*MPKU;SgD9Jb5#l5-%w}LhMUsBh+kVe)~Soq4Z16`_y(4JUy
zS~Edkl3Z(vEMS|JYLARy$yEP0LbRId?%bS;E0Wcuq1$6yWy?3AP*60>JsDLtIt`#`
z?K=mozvucW{cXm9usfLP0Me*aZ<8=vcSvvNFyB!Q6`bI^rlkaM_KPqaQ}4pxh+hwX
zn3buJFJp+xwDU)w-FfV3;@deHYNp<|27)3G25{to$N~aU_QRzWy(nNu)e)Zxmf9f^
z5b*F+L)|9V$hp(SSt3m=PYS+mM;{6sbUv+CvV;Qxy#>d*zoQ#x*8uvF78{UKm4Psn
z#Wqxtux)aVH6zY46jzefjiwjx?(qGU57p*^){0X{l)XrAj0d)qF~i-{i7jQ92e2I^
z4~o2oi*X%<0Zm%<hqnt^>^6VE-e)!ZUER9E7!$<1)Hv^XUo4xOrb>Khge)!8UM2xH
z*4M?oHMdu(VMAbu?d674EbwIpe_$V9#=>A#2?N+lP@fglae7NO))YwRu?M^}@V=LB
zS<xXpM-m-S+Ltj$5|NeMGMkq@rO^@E2lf(Z-@r(#m=gAR^dIep4^=CXJ%}V*IK7X~
z>$Ns>4WC?D%RusTV9)WoG0XdHDpNwmIK5kt0><hpoTI?&O9tYJ)>NsY-9eVDUeH?u
zwwi;~&-w97=0DK9DGbq<M^klP&MF2i4?j`(tDTZsn?ybyVK*k9VK%dCtY=*8rd?Z2
zys{P*t#HgbF<#|7DxrCx)@!ny6UB%iIBP6&f0{l{|NHBy-r#C=Jwfo&&DQ01{?Li@
z-|EkCyU6ia>hcK<5YU;VYci+Vp_t(iu$9Xi%tQBL4z?l4I8@2ia4(C!48uB`Cg)?C
z{#Fb62bKwpv!U?A1b|KQ$S_mJqcFD?9K|W^MOVm5fh1K=?Am%jBgOcCb4wHn_HC9W
zoDeIkLSe9zAx;$A?2z1BP$f5iFETd3o`I`fQjutuOdT9Ye;*qJyP-gFnZ}|@jcD80
z%9ZHGF?_iJb$=ouPy>;B92{pj;#z}xlQ45@-$`xJC{Fl}n`TPJz=*e|(sXG(QjWse
zxa+*8$k?KN&$0GFJ6IO2#eE=bvf#fEFk^46hdL#bny-v(=`=I#scTs|kHCVqR~f+x
zxOQe(2w(9b)In@$(yIbTLTw);quitk0#^Ul-U(`wtKd6_1Fb<XZk6YY)<86b4upAd
zs^uQW47hbuGcS=-lky@+F@Z7^Lfk23&n!m{tHN2I(W_*S<oDB#0BUI!*t4v%2aC}|
z*JdBVJ5%G{_ekMp7JA`-pdVrj({Z0N+iPXpoJ}$Y-$pu*SUj3x2s%gt5QZislqI1%
z-ujSXI{SCvgQ#x=wCw}JiPzD&{?E3C25!y-#W3+Qrf_t{QX{<+m<?16eNk*u9+R)w
zQReU##vyI?UtU72QE6(ibHa9Xlo%$=)Dl#m|6SL>zeV8l&FePfiTzuMJ=1td`9G@~
z*I~9(@ix1EuX%OZH@SW8@E*oGOoHV6+96S02%d}JB7=u62#U6)XCf`61C{m5D)>#q
z6oaQr!xlv!3Q@<Y{dC>;LH#h0miEIanBh519Jhn_7#x)Ow5v@o5Ee<)3~`9=W)Iro
zWoxsa7)Zo)MyfcZlh;3EN&cK!uU5@!XM?Z=?unvPUA!5Zlz!uxo7GC|vN)Cu)nAn5
z{JhTHU(@_un$ArtN&s>#;-;OEI?!Y;j^CXe9tPnn9rlr+RsuP<>&H5_-%|jj4tYEr
zAV*pT8U-w?k#cA;6dh=Vla`N5!sN>;PTz{I<KyFCNtES`5H`T)W9DE^x15QlQ#733
zc`6$<?y?DXpInrN&wdy%_DAo#8er>i+`iN1KIdLhpitvwPvSbw53<)r)FSsjkwLp~
zX%u~JKFpUKIWr8xv14-gwItL(ArEX=Wdk?W;LyM(*}acNH5mg1fy}2XkJ!^rNjF-+
zd)b>x#1mqaC6XUqu$!XJ&oLKf1xYUTizgN>tf%igm_eVzp8|kZmxjyF;ZMEwW@o|V
z@V`ggYL*ES^>$;Y4$)swIaWcJ2~^7a<2T~f)aKr-o~k7-9#8NPFh;yc_{g1I0N~BN
z$Kp<?ybG+6UbrNlDR1;CM`<FS+v)bh8|~o=iUcbG!3}qk)tIh{CN+G7>BpEu%B@Ld
zB6Nfvn#lp<hJm`<pmcI9Y||MFFSErm5+An6G-y0F)AP65%(<$?He%58PAVFV3+Ez1
z4t>HwOTVG&crD|oMq8UQU&UP6Ffz!&PtzjKEo>qmpEsTx%=?EON$O>oU;#bi3otdy
zKKs$#J^30BfLrp#E>5Vm0+xQ3grz9Gbuo}n?{I5|rC0|3WlSl^-9$9;?lHV~=;hjj
z$Jh-^v5TCfFnC+Oa=c^ALgXr^dmBdnllTQO5z2f|gWWH{E45%wnGLQrZ$l@@*55=z
zSU*Je@6M2o<-dT(sgKbkl7YV1Fywzll{JaudtTddkXIu|ECXJ>VPeuoN|1HK54yU0
zvaI>>ZpFRC{PrXq*6@B){1qwZVv*odOs(unkE`i0>!y_zaEig!2(jyFV)3~i$V8EB
zBkPJ(^$C$9GC?oGcQ<q_bAEu=VsgA%9ybAA`%r}of4ax>9em&qK_#*DA)AV2h~?WD
zEa)S&ab|ng6yyl$oOQk&xUVDjzEHm)S5|Ar)nbG|s*y13#VKl}){F|Zt~`4XmYtt(
zsY3)~RlX&8i(vt;6gZ-wgQg6E*hMQqkBHXIvcOxBdp*p%(5oZx&{xXkcwwQSC~wjd
zdBzLD#><}AIG?*IQ5m;`iNHA2N(N{|5X@H74cI%qKH9S4ZXu`$d9(hAtS@%bgg5GL
z?UC=SBTPmL+d=)g5J{w`y%{!Mb6CLoOIj3#vY}Orx;MJ8U*KuKigmRfJ%NRKsj<zw
zKahKz5si9*!7FtHp({l<SST=6V{1xp-f@-(Q^!(*##07CQ?TH)FM9?4_(g)G`1bZh
z%zPim>EH{q?_rV4bEQiq(+xjOKj3Qb6Pna-B?#FiUOU$TranDB4#)Q@5I!<cOn~Ab
z$8Ric;3!hY*I>fJUW>`SghZi6S~H5@;h7G{z)OqQ&LV8<F4pbdq0@RIj;~`<Zigiq
zRaGLLOq0!^Z_xq7w5y*eg^PF@9`p?W395k|9lC=(6E1F0>EMwO#r<ivMK>vcfa|2O
zlZ2Sn+D1K7A2_(Us@p*GK$}-_!XSdk4j_$LR{3QS=Y=5-hX^>#@9EX;=blgVED~E!
z=^cd&pnK)<m|sH)N)HLMjD(;wTb7}V&#we0r_Gszjn3eZvrSU~)~8wFQ8nr-VyJTS
zL-hS~+g<?|@tj3_I-VR|bi|%Kxb=(<>@O}HRhPx)XRiB0rEbK-+4Y+z2*ck%INi0Z
z_22sYGM`S*<lvAQjT1C2EgN9OiJ-uJ=}B!0d;ofDL}bYJ|FHLtKcYn6{$|^@ZR@mc
z`?PJl`?PJ_wr$(CZQGds-g`4MncRP1l3!A<>Q$v`r?U52pXXVExshWCuZ9JkaPS@e
zEwP_ef8f~SFmh-C%C&FyIMm9e_jw9vv_!3Bs!Vo)1N7aX>UAC}BkB@9mhd+slB--~
z;8SqfP!oWSu_I0i>!XG&;IhNs^Y)dA4|Ks{smC4N!iuDgt^1l~jK5x2hf(VyGsVW>
zpb~5<*DVgMnz~UQJ1$f|o_h+@`)be1Ike!jWnp39BOl>mVX&o=vkIa@MIw%vp4cZ0
zqlpq&lKuKDBMP6a%Uu`0J?M5)0RwwAe`4(9!W3_Bt@I@9S0Hkl>dX6ssUjxJ)`8?S
zZLu>B4jNHPG?Bhu#!ap4I&Ic#jf`_~r}@uPs#{I@v^X5GkiF?2=gzzS5j@TnOkDpP
zu}`B|1gWC&JoJVpBlq^GnBg4`3Gc8oJo@bVrQ)WjOH|~zwcDOnjz-|UwK%W5-EgkV
z^wiA6-bad<D`u*zwmCaKPmLZzZZ^eq=e^x_J#(+aZdT}MS?GD3b~=b*Sd#dub$N6?
zURn9sA9%kJIeGWC;K>khQG(g^ae6@<<g(Q~PNF@&7m}oM=4<6^I_rRkbo@L5lukVY
zgb!bitHEcI0|jX;7UyMN2GKD>f4S3H;Ldd6IN_S6>-niBCG_+ZxtdJArA`;0s|@P-
zz#`<)ugl<%Z5eKem+i6RDW&VOgsC3-N*tflZCBkRER~$kCbw(mP)uWNg@?({^(VJY
z*nfApFWaP{m(Jhx)NC<(Y?co(syW{h7F~RP$>=?1+gci2&;MS|R#Zv>Zu;+zk36_x
zR@K$UA%We}V|S%m#hFMSN^g|kWl*sS!fm)1SWy^lm8R7krQP=aWqeD~?UiUu5O4&x
zYthALOIPokmLwz1Q8QM`Sl|7w&3WfBm>rd!#Bb4ae?n_70tok#<E^_9QWr9u;;4w{
zb`wfHfFtsjC(Dl;7}Q4?avh#4a^w}i7|6HF<8f+RCWw-9x|BUIv+fE@|BNy^`z1s7
z?hiZ`*cL5MdwLYph{a@QhiUkPsS618>yF1_7Ox~W!>kGW8%^v{S}S2u4^2(%xSs80
zm6Q6Vh~d0pZGyv&=Sgpvv#O&%DI3c^wp?jVFvIEL>-5F2GE-2r@M>kaq~wjd?Olnp
z-d`?Eg3wI`<bluBB4glJDR7IMZbj>hn{n?MRc#UFMGH2ey`;SE%b7prZ`;N%+a25u
z`C65KjI&j%f~<PIi<093FAn-(h)parQf`#4zU~j`HyhwrWHT2O;Q^_kXfT0fZ#&#j
z;V|zhz9AfnzU&{*ms5trJRy97HkDtOq|zC{;a#z>e@ay;X_qCjEv+Rv4JOj9tXK^$
z(ruiW3~;F#h~-e!r((88{skV9pQvQhRML2bV)_`nXe?NZ&d!%{QUxV6dt~$?#{v4A
zCMcq1VDGSKceFWIRBKb)L_So46Ah!YdrA$av?VdaXF%Vywd`oOxhnF*5u;#Ls==5}
z+H~ODQHogTv_+IH-0!bg)opI*w!P5tOJ9<cZv|mn;^)}Wdrny~FO80v%Ka8m_R7QQ
zk4ogQ?x^$O5?>uCcLxm7U5Fh=a-~vo+G4t`yzhs`t3Wp;IFXnD0UL%*uHSJ7f!Okz
zsNvJ0eSYlE%F(v43Lh!!XTrYB+vxv9lgq5v`q9eErR9R%n!=KJhqx=8j}$Gqx}}%=
zH2$U?r($eadwO692(6y)_ggAXe10ECJUl*jHF$Kz>K+ZBmXA}Sd`^6=^$!%Ykp`tp
zCGQ7DK`^gobMGaR3-y5Ml_CvzY|YFcxIMD1n`j}sE94SYbQ{0V^Sywhq72IipJw<o
z?_X-NLf7j$`x7s}XNNPYWSKMl_{oj`9}?T<*S+h*u=7j**3!Rhkg2tV1nQLT0L0$L
zDlgADo-h8i=A$9E&McLjFLO8ZuwIqr<rsz?sWIoRKCQsr+M+)d$@5p3FoYeTN+_$L
zsgFpi!p`V6aTPXad9v3Mj&hH?cLuGVny%`c&V(57eam6d5tC_w2o?W&dg%f00(QmJ
zu)N65O!6a)wOw2ft<3-otFiPfggtOY&2Jkkv%N;TTym^9k#4^Sf!kq;U0irpTilWr
zp<)_sJSd5UI_#enu_D-F`Fq}QHPqjGzn&!@*1vv3qst=|&Xf6);9K=;iJ*k*OvUw3
z&4a3Qt0euqrQpyP4)6~2*=+T*qg^k!5|02l{?>-h#Bq3YPbe+`nfQr2On0inG(scN
z<5+6c;{t2<&Q=u2OA_(ZrIiejx#0vtMVP+&`QEP?9?yXucNs?^Hyx$~#$X5B9(R#m
zzB}?I`=^7e-t>wuhk8I!Rau-Nst7gjWeL|;O8uaoHGTX5d?wmhL%{CyN=vXy8@}a7
zgTRx4bA>k*E!UNs-h*&;Gu`SspOf0|;4^@ub?GNFW`-oR)zdlJI-WIct>S<FG6;n0
zK)Sl*JLZq+*%e4EqmYEQeKJ1oKdV-}@PVQtc*$ubV`q$mVzNV=&c&ypTqf2q{<L+v
zX48@D;I&tj?h@~|`@XQujn5&#IekRtr~b~3_poaur?VDS)XUK&;!|SQsnFRLK+XM)
z>#E9mU3LE(@p`R%JF)c>VrTC8e1(QWz7Yqs!caCy6s%gZBGbPdg|3BGdOxFtrPP@t
z#S_`~N0Pw1%a-dGBYX{o7-?)}%<J&Sj0*Q>nVCIbfM>Yhdrm(=GsE}otkWv{Ken>}
zV=Mdrx2?=h1Q#dC;9Qk8mW9U)-)c={ZQ+1^Ihw~&F0()}^SDiJYhy>k<SO4$HWfhf
zMtWl&u`j4hh9j_t(=Q&v96QYz(L0XcJW!WrQ)+@9Tjr7Axn5*103o%D_AbD#9UO`v
z&_x(Aur4t8(LC}I{l?!r*|n_F%Q1+Eq)Cf`kn@Y=!@3Azv;_UT|Ih>kCBdhb!6Ed0
zuX3~JqWd2*j_P?**|5_!MLw*AEWH8O;K~X2dQw2`_+pls+nwgFKQfG9q?`S6SEw@t
zjyD?qe~bw{ZDy-VPLf666(W~!D1t{!#juP1M!{?-L>Z9QMaV>GO2i65G=kb0PJ>+H
z{)fc<0&fda;)SWQGsYyp7at$y7+>1yL0hju0tng<j$mjtWX*lxiS4j~j1YIHWYo@m
zNA^y`Mu#b1k5G$Sh>lMqB^Qqw^LFBuqLi&SUJ74$X?Vu{E?!Je{bR~;QOWT{rN<Xs
z7b{cp_aujb1Cact2f1-Qtm^TIuDscGQ$~3QFL;>x+wm0srlIN<^uIiA#?{=Wbo@+B
zZwFk*r$$4n5%^^`Rr+=*^v>!7vOZ7Gy-JULxTXyT7EHV5z@gdtaY#I$+0R2IpYd<&
zBJ!&-Qda3oe&|{PNWM?&qdQOwU%5HUr!~q}#(1n5P=r<&oI9SjTc@KjpjGj`7Cm9%
zOMx|&?>*a8sMZ!fkO{W<S7SqC<Z^MqJ6SqD-j@4oU}>6k(-_Pl6hzw_+U+sozn^S9
zCYrW0+}k^%TKRL>Jv&YGT`x`AYq;QWm+SxZUMxTAxn3?;dwf>B_2kSjUE6Wkh1*4+
zApQUWxghp0oIGh!(7`7Mp%iOHVsZN9`VjD@3U?8g<-P9a;<eiF3HW^OsTN6o+lixi
zMe_V9*(F~>Ik-7de?+2nbqNkiiKQ}aGAKs4bwGu%r3FQv4TwPyUg;l)mZB*+=)D{a
zs#%H7a-7NeO;Xx!`&^P950aQVe_yVXv)yvQF$wW$lAmddV<BiGAR)A`M~sR)^4>wf
zyvqAtVsWPu2gn8Ft<r7bmF_xW>4*yBmm(FnV(Xj0)wAsLs9Npb&0?I9=mgByG0gy$
z`^Ewg5XN?#y>FGQ1kqkAMW^tTm4l=dN8QHfXv7Du?jr4<9Ae{z-`$q{wnXeEm0h8{
z$MKRI{)YN-p51Ab$(RCyaiGfROD5e}8<w${6hEn$OBE-Q_FLcrlCo<_I=aRCPfT$8
zdlr-Hxg@dKU1BgbW2A(Wcu$tU9+MX@f#W~iZ3fB#$=(|`8OPl9VtFh_=}>c>BJct;
zc-~eJoaL?hdtf3E=IBoWj>jKxaK)i-%KI+)8byfpHd)Q*A!%tXj4eeJag@7?iNT@X
z!RZs&?FFDR0QDB}E6P*q52MygbAVzp6<9}(4K-SalfF$R5y%>$z3>$qm}i2t5ISv^
zec<5kgg}lu2C+#*W`9UbTwazbmxmTQOztThnK%w1g%Wh8`5pSX4vRaP?L!CX3N`xN
zv+AY|uZleI?fqnCvN?hBQ2pT*PUh|cbHQY+S_szeKUX*aBb}>)chss8r(391h+$Gb
zgp3XvLnGV;?#{OMB~aef=F}SXgQAup&ke@x2a|gnV905Oi7U%H9j+Ci0Z}pllJT8h
z6c?3AZo~ZA7BCLUPd8#_;!(0dA-tkeHmJn39so&uvUY%zp}_V6TAs#+EzUdw8dBd6
z!W>XqO(8P$m!o1NW#$Qr-^>DVxQZ_rtxdeG?VGc%%$6_|Zj?c{-aJqOo}oakmJASn
z43mfwvVbur8y26i59gdsvByT5<?w2981$=R7I@|*VV>?c+9xfWQI;647cB~Y0h!2^
zcrX!CNZoM<MZTnUU6@+L)da4kLM~K7hJolFSW@HU<mqpWgTme>>-_E?^6d@L;$f7w
zjCwHRf{OGIHd#5O4tUIn1BLMO=kVR{JClq%l3P+20;L!$Hv~ZaLSXK?Y8tpiMK}8D
z%mk68N;KXmK**{~@C96Z;@+8x<tY()iXO}f-OK`?osEKHb`z-O8a%~3CJa4FfJztL
z#M?k;3ah$xK%)4Pfv`=XMTKvv-6m=4)W=;^C9n&qsKOeV-yBJEz`db6>2ZzM@vNgk
zdC(DaOx(Iv&>10;{$_w9<>qHh;F~q7il+vMur?c64hmF-?1_o<Ix|d8--pISt!<iH
zE~W^18&enS`%H5{{Ksjcm1s6fXzxPvm*p1F!7;gt;5HBwS2};yu2?1jxm?r$x@r8l
zMQhMXKtTwe*6ZQCT#C{QKj4`L_mLp*#tI(Ne>ZJM@-p@d1A{rHq^Nk&adSF(Nu7xM
z;8p0kr0dH?jhLCl)fc6ruXiY<cQmzxHdmvINoj%$)4V!g$;OrFYK;}Mk)V7V(`L$+
zFxFa$X;fHOL+%zgidJbzRGGvI04D`AuT<>RoqYlp&j|voi4`@UK_iZ<N7z7n@fm?B
zS28PX*WCyM=`{)leh~I-AE+XuzOo;GmcchDP+NK?Au>LGZQrTsP}=Za_z>qHJ5+N@
zl1xtNpA@WxqPHcp<S{SA;3alrEf;$XzIM6E9+PC5RaP()BS|Ydk^aNLHK=DEd6Cy@
z7sXP*HygclK;mM(v~6%&0##<L1ZI0}Yyh!@+US+gh75EW31$rCj>sS3X05S!tAX6g
zfeKhxzGdV7P&z^p`DX7{HBe_69!)zFM%-x?sWxtP6|EjAmE*;Z!SF{iUDG}jvmRy=
z4G`^E8_FQr7J1-iVMaOX@V|*hO)LgnTzjcR`s<b6<|yoWCKtOOuuJuUV^3tBt{--u
z7*EBD#KbJ-o$)kjlMTk-OD=2PEk^SBMv7ze^1qRmqL)ipb77eq1Rcg|vb2=h0)@fF
z1Q{}QJAZuA@^h21iH=B#fGEE=P!v3gB60sWCnj7u?D;TJ{08a7z^IfVA3tq=%z^4d
zzHtJ6oU^xRdsf6}y1My%7OcGsGd!Gy@#4&tCCM&<;OhJJWzYM^>>FCa<RessNMUjK
z$sr^s;xVHoB$A~Tj%X-wTa)&y#M9ixrW<j%#;CP4s+PgOwvWEx@@B>1W6(|D6o{Vk
zz`4qn1Pm8gA3T)YvT?b#w#Mj?1$GBtO*7ltZYP@m%(uVJzl4eAilk|IsqVaHIV9EI
z8NULr^0RzBotlCvvxQaw<wbUX93UP`m=|dFu=i2#B6|_#Y|Lh9y=$_EN1Vq5UUTAK
z<ivrCpk5iN?4aR}uf~;Wwn|Y2C}SL)J;SJG+{S@Faf#7khv6X<Xik2+Uvph?n{6hJ
zO+@NE<)MV@5F!ciE7ArIN2*oFSRv<O>f&;O-~99kRi1wtgxh8iig8b=baDe^0|wm+
z1M{dS7zU~5MT&;Tu@0j4f$P8l$<z}C5ESVCA{($82P$A2jdGb_B?05ais<xmQlZKK
z7}cto%R`|c!^7>bX_Oi3qshQ@wP+98-&0raAhwb1ViEkA!*av0fr^LG7;A^10~0y8
zEYx4Qa4)uOufE1{4IQYU?*L5R)uUAnU7PndSu4ZbXisX!Iaww{YDv!{%;hfdqTB-r
zLA5_Xjwtk*#J`zWd<F~y)YP?@KRAIw7>itSWE5Z{6oph2h@_Q>F5V;INf!l^4y)c^
zPy4Sfm_5?tJ%3s^M!_JQ+TDH!2!qNX7*+!qO$A!ki09Z?QUmX|{C*UNs3%HY*-^c5
z&Uxqg<WQItYpaQ&R@F&FQrf#uZdMzU+tO%KsEx<^Tg(s5%{G&aEKUuMOkW$eBb9Lt
zNp(@Nadv8I?EHQ07F2GcW52KbyTty3zXrz@4Wo(OqI3>FcS#rXnz)219V8RAjYCyy
zo%-K!{D)bpBxyK=PAu_~{V$=iXgxG;Tk01O;`s!~f{nnfvcJKB?Np4Nlm7XcL<HDU
zEGI>u(orfAyRE`6G`tJ`T26vp9)cMP)x-q`!jfALBUbVxz6o|&-T4ZBW3{T)-5u{s
zC%=WowvucX`kjb{RY8wY7(p|Mbo|IdW?<rr3*8#;=c!*Aftvce-P3hWqgjN75SIn`
zRIq1%<3egMLSZ_H{s^wY9Mufc+|d0#v4lIJ<hQbvTs;fRBwmzT-RI@6eOkK~x6AI=
zHWFD!K7K!Y9ZT1`aucgtrbHe@;X;kd_DpK_^{WY1?dbr;eSZ%3_>4xcMMl3N9wO{M
z`((_@E`{xqa0zpBXzNR6K-m-yt15HTFr|LmEL?kC@a5l{wA~m0VTd6=zO|6Jg9vb>
zzqd)MB{}edx)m6<O#QVsO^jJ)#r0rWQdp)bbXz(Hjv@QF&+QcX^s>=@`ZOO-au!V{
zuPAD|E61a;%U!ykPlfQkwleW9Nj7aG2eF1qNty!{ig!AhzRkTg*;eyo@qK=88qc#A
z;ZT8`YhCr_slyWDP(Z4*K97xH^g+V4`|wH$u+DfkxnEc@Q0DDTduz2*6<ZoTry!(!
ze@w(K_yyCtdg}jfsV9e$BoF<?Uu+9lS59!IxRyi%)pJ7%N|uTzQ>Lhw#RMjigL2DH
zL7ox}skwQ3q+VH-5Qme|!{0=QFZ<S!J6+P>Q>Uv<HFr}Iqaa&1gXKJHTS`<@ymN~I
z%YT`V95Ylq6+qOhQj>>X*C=<mL)bZXX!u`=`di~04e}xe)FO701*)?Cn+9To<xr-I
zU_$IE!}0A@S1(EK<iE99`Ki8?f%J4um2+~P$nvVu<*G!-0y-<E9;Be}$}G;4?9WuW
zKlz&22GQC*#3E!jpmf1HiU`7jJYIEgbHHdTeuY;AMq&q^;-L>s^n6y|Pszy?yNyU^
z8DL8;wz@-;NPJ&2t`fjz#q&$}*X?pUf7pNp;%c_+az4|ZXO>~2j-)@egkSq9d?lEh
z4O97u<i$xq!qf$a>FCsQxwkKHj-F92u=H7a=MmxpHfFq6$F{`+*%m00JLwq5=u!@P
zROqI<hBxV7<vK!fZKG1;QH%tcwTcCS#u#3WJ;_wc^|+)&z}AzEhfInq90|IqR0G1q
z*etwxcRnAXAzXa!q?)aO;?IxFWl_2cQ~nx0t^Jwo0y?JA()M;8_IiIBf9pW`+UWbY
zxj#g|wSZ4;9<3@qNzmRRSdG66O>SkTw{MU>EL<?8EX%a?=Xt(b_v<;y-j-`1?GlA$
z8pKb<1EmY}XHdC%C}t2X1iFU7vNiZmtdDl|;ZO6_O32n{K|dqYW-RGRroBReY&)On
zcH&pEa8Up+>g}2sNH&E^wYYGtB5_uZBbFdcn_$e^PiDhDw`%PX!$Hi<oNG?xCLs6u
zTM{JC>}$bXg7mAx2>#XRL?FJuPRG1{YFrQ4V{%sXr_gF-ZGz&vNfol4(>-}$#vKvY
zgqj%%6_}!*Im{d6&U;EqHLESJRkmhraizmfM*yaqXN67o4|)`w->Gi`wP)!cIUfqV
zlvh=kXGtWG8vK@?d`MWcejA7&x#sEdCXttY_4KQN4xPFPvxj-1(XJww0nt6~icFxm
zn~L_LN?rf){HYaSb)%91M(AB91}YD$AQIF@0OIN~KY24qDd(85W);;bEn(jDO~m>k
zhh(X+C|?@mzp=3gL1j(Szwz9r3(@LqXR2I3RktLV@uU+2Zu=B5J$ZDL6N<SdAwxG;
z7+LiaMJzvB<yS=0^NJVC+vg$EICyY;XOaWR(&<EGNiuRW*=bK%k?4x`KYLGxK%9#I
zdeZ=9VrP}@SqPnb;e+Fakb}GRASw?iR7h+TmR}2%v~4AFcdi*nm|#f*?69EL`>n%c
zrdbD@A@5%2V%Ii=7~@ZAO@wUJu*{X}lx@lIPc*JW35;T3QdA&)C#>yBG*EE6b!l^>
zJgXFIY|WrOQK%>*w<Bq=8Jdjx5-E?j!N#wd9*`@a>fjKwjH*s2IX}_2qB0)^`{@U|
z;jsLS<^v?qtwX~Aoc(yYJ0ly(h+Y_$hawpCrB>LDx4y8nc}$Apo#O_Bn}X`C&&0Fh
zb0xmwHy|%HC=>wrpcYI0e5{Pa-z*#qM>!R^jwMs8s1Z8S48CFuv=KzYI7Z*h2B~8B
zRJ8w@k=kKTso-*3z^OFz-APR+x`kEl&|6f*jA*5RJcFSJ=K)@+A(`e#E!8XkNLNFv
zr*%ZURdy@?F|xoG@#Lw{cdF`}9cC%R>v>t&@CJ$m-epL-{$5P9D^L7QhbKeJx~YH;
zq>l<Zl7J+r5H}mYbf^65YO~y#)?Nedc)1?$ZnHeGwC!#rjg}twMCBb0MN-N$@e5&~
z4MCiFLXD<Ihk@5-y4-2)-8Q<cjC{ix*$~QvwuE42Z8GY1aBi1gY@YQ#%K++_7L6Ws
zL7*bpb*!xkbHyLN^g?AtA+GXAQ>CM{aO3c<Z@;ugdoWPXDL`UW?xt_rWHWFD8J3td
zz*X86tx|-#B8Pu|Qb`fz>JW8ye|G^$^184-km0HQlco}l1$ng5hoWr3>RP{vyi=pC
zkqcPtJ@~JG<|M<%XNSj&UJK0<(P;e$zm~Z%b822Wwm}f`&?J)|z&OD$tXA+y4JS3k
zS(IVBhRl(40xTNxvoo(Fm^-nV$1;^|32lL@$Znjmi<4*Ay%pE>(dCFcMc?_0<8M25
z8ImQn$OhAb#QRiW5N~(7bE@}MQjnY0Dvwvk+g{7>K+Pm7gw)f$0`Mtj{MM7}UuEXH
zl6!q+F+Pvl$7#2~FySsBOs1}Qnaw{bw8sGiU4sZOwcFW*aP($I)cpaG$kv>LQ=e)d
zezGFPp+?-pVC7+P&ldu5&~O=4%U1(%CQYnopnj}f;`))m<J;AAIxaEnOZO^Q=oSk^
zf_#5lOa!`Il7uUDNDK(S+~g}JaL$)>mBspXI*7`uax)pt9hIIT^&8Qtj1m>}>o&g6
zDWx$(oT7Hmm;JY-ksZZGI5Bz51q~s|1}`n~5!=oVR-Xl)x<fi{kRx<hW?3uK%0WN1
z;;;5q*Qf&Kzi^-~%X+UfUsI&u08-Ajo#p51PRfBCMl}<ITYLfYj~J{{^O3L&*dEsy
z2r45JN0-PvbZ$J*3K#g=jy)Z7nLifR-{l>xt(~sX^A+OSj8HfbXThB#&6-xXOEQ*&
z5Sf@BE`@*r;=~Tx8MXNf33>rL_uQc|RmMB}B0EBiQNAN0jVul_lGz@><_>aYik{M0
zr(@%dz{-Z(++TaYwIh<K;3S<f7cNF`p<S1qe9{QgbpIgjgc?T{f@qhEhG_I^w4wSi
zxb(K`EYkW#XQcNKclARP>LDOReP0cR5P}v|&wgf5z6L+?OO_!gh>0fUhm>3IedqH7
z6VM=oE(L1Y>0TEo^PPm6atpybmy)Lyt5?d<$*KkXS(YSb&2?kJt<r|N9>8X>Mnx_0
zWdaC@Hxph5h=%MxCo#-J>2$^h2lb&aS1n(Z>KM45(!Z0-r*czS^%~SGHbEK~VJ!q>
zU8$!Su#oc7YF%?L^?93dvS7tfUNoWeGI0F0P04wKUGxfPGhE>#3bdA^9t{WW#yQ<t
zfvv@ZhQzpbb4@*6&tvQb4$FVA^g;HKyUQJNCv)Q%>fC)j)A-KU!OtZ%v#%KdknV+T
zI1xZ6wZJNgyo6RFW#k`GnG)#S(Tf?WX&$+dih@SPoM}S|;m-<{usE9I;bv<3K=(+l
zwcD7qCzA{c|6oDJJy&E)JdsE<3KHFux;wV5_m%R%XY0#7&v;Q>>~3=?^^*(PBnv~)
zEQ^0w(S~w1rBlDoz0yP&EV+o`6unwksDsW?XGmL#)S;X+qeXG!DwmJ_J#y9X%mUuu
z#KOs#1OPy}H*z3Y_O&Mh0mQA;<Xs7k!jyCFKaRciKJYsQXYo^i3k2&82|<?~g>hV@
zWVEn2r7<4bQFe2MAh$v#3rxp~<CuleCrgzTWz?vj@)h|o_Wbgh{aLuxHri*IrWIEQ
z&d7y&CSg?44AaWO2po2ISCgjVTMiOXoSg3{%#U(A_`{K85lSRhO;&o3)Ml|niEaQ?
z$g3cLVy~_*#%u(m(xcC$Hh~M$1<%r#=$UMHC&*@v#T07|J;a}xRX<(BerZvJC51&k
zx+(;*Mc&jPW4R6zp+0J(&u}(~wBrucqJ}SGdv?{;MT(FL6exq>E_E6S($@%6HOmuo
z4455wz9XW*c^V(OsMVOI<Lz|6(gB|;vvnq6ikR?Wg3>2APVmMSMPN@d690f|Vqtgx
zo~WV<r>+iJlom5Nb?I>hU`@qZknwK8<TBvd=pbPABiJ_?8~}3*oaPu4D_ToTb_97g
z&A>`sO7gJra+Bs)<eZQe`Avlc5=K0Ou>;mWR}=N?E~|wTY|Arlgw~#5i-^3lmTHfL
z@CNd`b#h3V(-o1A7B+fD9bC?o{idvK_riH!(eb(%Z}|J7LYo>!2t{NQE4buI_}Mat
z&ag>O7>hfzbRoiyi}gsB`$e}JWNzmurrx98$XDlEy+tL9S(y?b$O=hb!+4)&54(&w
zMY&6ZPOI3fhlyY3`sTn)iWEm?hf0jotnhL9Z=L;dObeHtk?OU`#WL<{lMqUmLYhIo
zj}6m$D?pPqgHXTNzn>tClgVt>^+JKY0VGDd6|~nQ+?j7NYP1^CkR>tF=|)8?T>PFj
z)i;&pu$3u0+clA{(YrGB9eF*)Otnn1$~9Jai=~CI3#yf@mWTKn9afK5b$liff+#wD
zDxaM}*d|DizXYfc9o{l#2M!#o5tZRIH;0%XeiIJEbN2>ucVX5em=ZEGEO+TA>;)5J
zyh80>5arYl$QzT?r{`eGtK+4MDSZ2IF*10@Xlm$j>_Pxxp^tJg&f2Q3h3N6_YW4%o
zWMh|P?Dac=whr{R@uG&}9xP41Evui#s6Q3l)f0>exmPN-n*ur((uiSqt+60U1xQt2
zNnX{g)RL2pC+%WC&Vh16FreL}Av?o9Uzr1lS{jvdbDKqVLS`i@5@HJX?=4Z6MP%&`
zV@9qzSe}OM`?rN4W%Er}8GJ2j;>)TmlnBueOJOzy95Pi|Ti|0w(07_Bv8r$n4B3Ts
zEEI2cr+4gL7d}fU-X^wRM>r@xB_ah#o4UN9fY0K$<XmmX+_Yp;AsqM*1mi0)EW46i
zqGa0{&KFR;&AT)Fr&pWe#!PiC7obms(hxT?ZJ$OwlNbGMu9bbG=}dGN*DW+!P`XoY
zO^gdkmN+~)g(bKT)KD4a;>te+?$hSKemLw;)?-FYTPd=|;7VnO^4&~4+P{NNZOxcE
zVfbme#BS5n^63C6eoywWFufk>?6hOq!bDlWwgDOxusya>>&~j(NgL{!L`f$!kAmix
z{fJU}Ou77R<xajV>3nou5&)gxGFh2o@(;&=fPjPnCE&Ncy}eETpF@aX#(4Z`(vtZx
zzbhqTQ1YPX{kP2<Wd#e27rVsI5Qf@BaEt#(Oh!NZzlg~kP2WKkKS>d$@_<83_RSz}
zuGB$z#D0Y_XNP7#5a%+{BxhKcIcc8(3AT~>bu0gHR^O;3Z7=Y}?isq8&X3lz`?^f9
zhp@N}Wt*<qLt5NI2X;ux9Z~-MVCQ}BT|tm3*Zhd}Vufm)0^jN|Li|K2egS_KcE3Ej
zc_E@v456Fx3u4{%)y@x?V<xxgzC+G0D&Spa`P$3EM_GY~D3*u{<J9R)j=YeA3ry|K
zvU$<u5=@+t?`pb^e1;{t078@1%WM|RnRwKHwHIPLd7%6c*Fwa&$;Z8!BUciHE8INB
z`98|3a?}0+rD;RJfiTihg#L{y{y|8rYhe7M;<AQ3smRG0GBe9D8kB1sRXP$5ieUZc
z*nKfP@^yObMg-|AADvtU^_JgLmde0@QMSx~&n)wp`o4Z?9tLk->KJHzMn?upM9O}e
zUn0XvnM+tAoZTZ>7!P>UbI}9Rf&e{m`>t-kHCbt-MjAYS@i6QT$kIK{cS_wga%}7C
z6HSK|W5CLcc{Kyk(#ANeIdi~3<_;uStPk|Qx*>Jf@K^yN&Cg6l807)E0rUE6tJtLQ
z9Lz{Q%ClW&3kuYK@@=YJfyysE1+P+@YmJ+Er_(zaE8K%4{x>93fohi9t#BZpu6j*J
zKTS{X9c4zqf|cg@c8AU)3+F7O*7lRu;iL{`If}_jwx?;oxx*ct93^qtuJC``&Dz$_
z5Y9%qTY#IUQ$uP5zo}iaV&B=LlT)hO<@g-rb_@gP_98G^@fP}4{*F%C1Sxm?o7Z+V
zeY?}$${^o@X3Hp{S45bCzv+|2lh2Q@P3z(9xqofUw5IGI-?&roBT-ux&IYzHNN6Nb
z_3?Kz{h!%ly56ouZ(m1CrYpDMtv6f0StQwqj)Uu*8BzY*K|tq>w9{g!Qb(mBQ=k8_
zn>pYdt6{nQNHwLlY5up}Obkc$-L@+SLRrB|ur;EzEi^!mhO)`_@nJl1jcwnY^sn_^
zB@+v#gJK$|IcJIVk)RyqpR7a|QOLZ$ff<Tp(z$0s(9d;o2lnV2K;o}oG*8@gr4t9n
zr5F}3w!zhg%E(0mX_7CM$VC!0%uck7)3Sz2j;Q;4InYIdNls6z;@5zM>{BSAK>lAG
zdg1g7Ua_F3!u1qTj{3ZPr_7Sp$UMeT^Jfl_#=VgOZDW~q%x3wkq3Jqm>00q*)$BIe
zboK1H99xgO4X*L<3iHSu;75VXzD|7i^9%ovnWZ3frM?vQ{Jpb5U%p?=tXEamt{(-)
z941caw(tm8;bAH}H#ArkjZPGU-(pxdeHF<*g)HSUZhLSj4&K1m9pYjK4+q8#x}!mp
zoqKIS?BRMvty;}tL<*?-btsw0)s+fXk|Ve|?aRDWJ~~H=_ZGAcs24+!tU&0kFi3;u
zr?Bw<SG2)_<TtgTyt~e<lTQn_5i<w;7<BmlVC_`O1*l)>@Wy?dUG|B&_v5noAlv5q
z-1WT?ILf1xOTveY>sohRVmK}^Tr60ncm-TMdAG1B0K8sL63I-|-Do0;{giv9tb_`+
z#u$Dy%+9=@RDzEu7dO23Vn$k7JtK+9r=UN46QY4+<=^GtAWEjbCST6aOFg=+I4p(9
zjOzDm(O`qzo%4PCdlYa=8^L>KO%xBVQ=3k#zB(JY5(C*?*USD7s|KZyuk)N_TQH^S
ztQu)p+3v%MAW}CAyMh&tBU_AC*g6#18pecfZ-nTNl}zr6{Sl{`BpjD3EW^D%d+I?t
zEt)t8U!LvrbJ!0NV(q*O2NPN3Uw5a$`{d~z$X^S<g2_1g6*)1$9_NuNsty*l8+b;h
z<&16Ir<qO6oceTi_?7KOv2P1&wY+^B_IM1UFHYrOlX5)ps~Fi!d6ZIC()JY|DL`U|
zPBU4w10&*-m6sjN3KYTxG9#3DATbOV2Wg#%s_|Zr*`N88k=8X@Mn;(pZ7p-W`+E6I
z+If_J368uSfd=4r+1UtG(veK>1f3om?LEcq5QwR_0Pag&Y~E&*ryf@^hbAIh56}Ox
zWT?kOcqyt&@iurCADGprR+u*8LfK9|e%MJxm46s)P0f5aPN~vNflJqJcd1|(fm2yE
zr2^u9RyGs{rtZw2)AKi5<K|FalGJ5qSsk!Byhi(t0pqcsAa9e<|Fe(R-!i?oT9s5k
zl_v`gSNHR)QWYgsAMRgwf=>1w=bmjF7fcjQteZYkLr2lWG^Zsf{&m3>p+N+qF$VZ0
zC2}V8T>|$F(izSKsK)KzdP6as$BXJNN7GmF#$mo$>P?M?0>%X6+_IA{e4-ySSe|Y$
zHq`^r&y|0`eICCN3G0GP&;^MZ0u;A<Yw~JU<4_i`QP<q;<C0TS-4fE!6vZp`msC6C
zQ>3w5@}vY1t~!81K6`Bg0-QMpC?eKpq<{&)xGbx9or%)%6AT+ITeyqE$NPtlR4t5q
zq?HMdTW?XR3o%s)jYhcw<a<~igo7K~KKd}L;^E`7WhH4(D&<s4pqmMs<wUBD(EVm&
z2O6(es#o5u6#{t#YCisY>XN9~Zfn~kZSs^*H|$F4iZ%d;ed`bWg`c5Jk7W!EHVt9X
zL>7P{{jtF4q*alcE$twiyibQzE@gj|ocxrS4FuyG*q^<~YaLVOH)g^m(CBB=Szrwk
zLpc(N*E1&%ugerlCC)ZVn++=8HMMFtzpP+!`Isk?v~Qnu)K7)#DucYZMQN+C17;vo
ze4uX5pF%8eDxR2Y--z0F8qGvjUuq<eeNg2v^1X;l%>b?OLtcQS{MTs0CQOFA*vNV<
zK5yt25I7bCt5r@j2HkG_x<T1LgqvMf4IF}FLZtQIUW?#guZ2Ege<AA)zxylaC3kL)
z5M-lAcwyEw;y5PFLBsaZ-Wk#TR+?y0%`r|TM{4y9=OQe+FH*$Tf2<azbiY;$#H^g%
zu+cZqb+Mw}zN-W>2j%auP?<b+cYv6<ASC%*wgMVD$OBHL5ViG5HO^!780rfwhfJpe
z+$9ThIE8Aou4%O-El4+<Y7J8OGP%2RyYz#Gsdy-OPC@s?JFXJgzlDVMZapWty6Rv!
zH`w#ORi2v{f3*x+Xs|1vz67gQyX+Wko%f2{Ma=`WTdhAaf4s2CocIb&bEu91!yv?@
zU^$~IXkhsBILG4~N)MZ%&dW!N;6kzi{>No;NWwrR-Ml!Ltb66-46WBFz_1y$jRC8w
z4-`Q$C;K0Xg-y4iDE>YoOHxFkz0Bfiud}2L8`}KHC${}0#@L$tV3!$2@@s=uT~NPS
zMY;T+tn}~1z%#oe$hlGp5C!d1tFP-54$XbK`x9y{iOo991yc*tB5vqlrTebFZVqP%
zaPexvbEt4}W9c1nZyop7C)5e0#H)cG0Rjh6rVGf&dpINz)W>u?XeS3_sNGRQ@SF$|
zp=B0tiT#-4yCN?A+$zbD0W<0=dP=Hbvka%(y^+lC(BdRn=kbkNz?1Mg;zS=H84-X&
z**m`PDnT5KZnD1&C!o?pnt)lZT+w8v;^);Z(n7V=Op(X}k00_IBeupt-eK`X`-qRd
zB`GzC;RJqv8Vx4EN$06bpj6RfTZ@n+7EP5D9lVovhD#gt!rK(mSP_Ni$e}clej8n2
z@@4+tIhxNd6NYz}g2Wgo7;%1?LYM=X#)-xH7IAapEp;S$l$gMKOvz-2U<n10?!*_x
zqOiR$h9rSy{d}?p-l&EYXDm$Tw0n7T+!XY&gpq!A=J2FUR1Rt;zJx-<A4W9bOnq8W
z@yjrJQ>>jMSRqJHe9Kt%+4Un@IVMa%%`^#v3%5uS8UoEgoqwr=vaMNsVgGQFg-xmH
zP*eX#g~Fi7!b5S1yid>cq*0W$823_UzY#gWnIJel9SPKH4KKma-$SuAX;+iI5Ci7Z
zwGkW1jt-V5tx1>(9j>p2xZHe_?i>102=@kxI)o#J4O%|TLrZG5shOnvtT&j+{M$or
zX$UvL?ZAjBaj1>yq8xU$Dp8|*uh>PSlH5t8NCBEV0GUQO&jx;%+{Vv58p;Q%>-x|f
zug}HJN#}S49`c_nY1P~05|4CuOzAx-={d@i6QpF!o6e^A77^NJ$?Y2zDbg}w-z+40
zbiE>yd6<GIB*ko*26;v7ZLJTIswajElDgU<W>8P65uNR?pD07J!r+i?im5Op4ED*c
zP_Erfc)uIQbVqbGw>8Zc`}lIP2D|w3MuQ!amZf&*gk`&nW*I<&^=7Mhz3Q222_V#=
za*1*TqXm-7cV{)0uveVe1649uY~!iImPLPBrPJw6VJ@<=v8iT*c&r^OY_hVusXD!V
znR2Jpbq){A(eWhWSLsoqL41RYNL|Q9bauoXadf2MxmD?cTz+5u#9BGxbdj*s&2ojr
z_8@5!Y>PB!46D*yAQ(EBj|88;{`qcRwj&7RAi2iO6o}%jh>7;o8-M&1{Y10};phu_
zy-Yf=mAXR30b&Tf)kzvkPL=THPS4&NGhPU|ahV-~^xESR*taBo3NYF4=C9J(?%uC3
z+1#2m8Ljt|@-bQO=KD-!Iui{Rg_k`BWYS#^_P}I_OwuOPlrKNdBGZ6eq{Y+Al80N;
zj-?URkI<$v0@cHeHc6U$IErODo6ul0T$_?7(_M?2Bhy`tb@R4L2@t#qi9{`hk^Mm7
z+zTSJA`_bCie;0>wrPaOZ71g@GXtZLEE)-I;-miL-Puo@@fW1e-0oknWit|<24_q+
zc7P#HnZQK6Agc|(O1(36zjodaM_YUBtRl8hK;0rCK@ET$OS7ws*~Th8vGo8W4KhdN
zae_!ZHCY<%kSvP6sNQl2iwI56S(K&H`y(_SE8|M{0YHISlvC33&}Q>+Q_F<@oko{N
z@YD;&gas&xmS#5NSAfC#d>_p8qcjnRN%17!<OQ7h*~b0w3DEht^0TJD+U{{2$)G1?
zF@kGSMzD9#1_KrMB7%b3Tf?ygYWh3iMIKKoK*A`J+vFR*lhufbP^;LCaElPBk=+lk
z1-y*V((dAAF`>wx3B4uTYX_V(1q=h~Q~@AFn8Em)ZZbyYGebJawm-m0%jRBM#5fVr
zFwA(@#mkX-qffRFc%oN@L-n|K%C9ZZ*#^uAQgR&^BAdOz%`X7&-!HqmlOasugQ<b&
zMwkb|EcX7j*-mOa-;N826MJ4XHs$m2j~CXm`emsg$DDsbv9iuaOB)!(M%WneOA|!j
zdZ4X+6`R*%<K7xvFmT+0==^VC$X%8HgdsCG1xl9xlvg}}l3WpCch6v#$!m2-&2fg-
z^BCAAp%acyjM{lZJQdFakGWEnfxGFuB|HNNA>)unf_BYRP`yT0LcNSf#90|p@_vrh
z7tTCW{$hf9oBUTq&yrhh4yt8|wIFS1hVQ*YyDg2lLKH0WfblJH#PA_Q=5Wy`*oqlW
zK_d8gqwq3tu*FDWW1ULkF)+uP5xh7>q^VI)FX*>_w(#*qyAJ8UC50+x{)YJnlg(`^
zRkC^Jsbz*Q?w__8%lfRs=?!%;D|<wEr27)H5E^rVbuCUfnTQ`4Na%D#LXz>uosAAi
z4k1Q>h;!wlj!ZpZ^{lsDybKNJ@sQZAW=rC(Y(b{&xg=qylBX5eK40&E;t%E_r9yuC
zhKUtrj_%b>B2DCTf_EAI7e?Btj)wt~@B0(oOQU3o!eG&df2*8uLF0vaAW8;v)WD^?
z)%3tt)6?v+<99hx)0O6R)|Az%gDzaY1LmU`*2mi^_wG`NJkWE#@2s=&a(}J5c#p7_
zz5d`7mS`XcS}=U!D!c@SUOkVm<MYteRf9izz1DP-1YBbR-}!!A{Z-T5d_t;sFi*Vi
z)QFk7GXcRQuZQ(hIT>a0TlFL**j3eB&rdY1Ogd-u5efb{_zG~92?jqn(<j72&ncIQ
z8Xm&7rp8KHHVkF!bQ<FM*c(q<n7Z)!kK6<nqtpE+D@p3#rdY~<5SYXzio9kOG+u|E
zuGg5i1A~ZPW===vb6vZ&&f$Cee!t*2NvVQ7yL$8e;{E`fk1orlhdbJZg@L7!`0HRE
z6D!x{ZEoxse&?6=Ddc$o2-n;CuZEVHj{9l%zh9+4m7hwtd*}0=wXegm*C(@+Pk+a*
zECDAAjB`JGB;;W>cir<e?#pKhbqcSwPWuM8$7K?mOYSPQ-JLU(mn}ksx-FQ;ZDts5
zFML-2FQRiDM$drpQt^@p?`j?Qang|JCHB8?bK`>oAV)Jczo-5|VEdaat8MP1Y|gdM
zkN)wykn5<IrH$3|gdWxzuV8YFHPIz*<{lR>ynln*o+DlVAP3B{e<sFdLfN}me_Wr=
zauS~j{crctr!u50N4~Flzt*A933S-EJ3VcmCJd*3-{7u!XDRZ0RX+6UMKxp*C@34S
z6^wEEqZ|Ta73c+1(fq@!I^WEzdX*+txt=#zU6Vn(wYpqsSh|VgGqUIiP}$Bg=s<l-
zGop*(^D?I?g{W(DZc%)sF#B2*62yoREK$m-+HQupzGXIe>>DOXY#hhA3V7q438sb5
zMzp42y;~*s{|iamoYSCvl|$zQX~;{}x6$do;eE3@>H?pL>9L#P)#9?3eP97XOA%M_
z>-n_N2@wwF4d;tyOgsDK(DZ}$s9%frRO!v{=)^zotzA(HbebB~=&j3Mxc$e-L>LIm
zL_@PQ9|~UXXpP>++LFkjsipAuD}q?XW(u-CJH)uoM6b6(CSs-|B$)1!;z;>8A8Olf
zvpqhxk|M}Kjp};aT#&0f4C-ZECXP4!6f2LDI#S&Q;tATVuCHn~J<#nkCgfz>q1l#s
zdG~Z)(~rj$rl}XXJ;t?Lq0kTiUALCVY23mxPYt5e{F2H>Q4v`MlmSU#{k3V)i<=y$
z%SWcy1@sv?o^=kEfd3MsgQW#`H|yRyyus_Z$<40D1CIN5e6-rj8{-5FCDKg@;RGky
zsa}eP?-nJnISOIZZ>UQlKh=T5cLWrl@l#|Ej4iUi7w!*wSozx5KEIiKKcB9Fe*u7v
zr3b!NbY2pXt+yillxYy@z5RkCF>(o!m%;dsTZ9&|ykKd%p@bz<d_$5s)5^E26W4R#
zVL5y4G@xasAvb{ccdY{io5io|*LBmIc5>N;O!eyl@ChllmE#{AmdcN*7`#gFr#6rK
zkK+*dj277g$mZx$H!r+c{9-5&+g}I&Z(Lt>#|??wQBJm6>m5!MFrnr7PUPhg*!VtK
z0;p=q)JH={gJ?GCkI5qsN3=U7M-5l%jhQl>q0QuSt%ht17?1$_06N3DB23$0=Jt8@
zCXwn^|2-nPwU>T%VlkjGQ*x}+H%Ra<ADP^*U5J1vC@n$W@aY=;mDDO=7mbJ}jyK22
zaZUu9AHel6${fI`0tW*(@tG&`cjAjW&quV!74KRo1Io)Uf%b>^pPSbz279QiCY}X)
zsn0ir)|MFoAei;<vT5DXG1uwX`MlkJf9o_m4z{NsW4|EpOxEeN2HLKd$VcT_XZj0+
zP!m>@U#}WSs~AJDe1~_a!)<Fc6z6fnl`9iiQfu34L!wC2-risBn-F|&PS!0wO=)8^
zMsZ?HD33kk>74w?tU)6UN+CDuYc7nKT8;mGoXxYR^J((7_i>%(SnERbhMV02e~T?F
z{+F6$!+T-=<@|pTv=t=7-v36>P<?Exz13>`@f&z!_d$-}O6S8YGWkWE_5n?HOA$}b
ztGK9J@au?g8f=r%`5aTW`<MZl8js(YgX7}mDjrNz(0459=oERdr4#d!fv0m$CsevU
zF1KCr*)oixX!$;jj_yZhV>WcFwtr9(RUdiGHLY!^N=hS0kx9eCn{ePO+}<rG2k7Xt
zOClmt!ms~O89{cs?{>*LMJ&oE+LS98#DeuF^6me?AL!s*g-K~9qO6{zjLt05pf{!b
zlMlMSpI=&1((Sm}vd8DO{C?BirJ0_j=lk{-9OC*5fCHCOx^YotustU&IU;%h@oflx
zSPE$y<@eg*^+Z0z+`d~ds;NUHQ3?20ykD%(Ng&ur<2-ft=qE`);({Zj1_+~$pyeuz
z-Y#Rx5xNnd|65$Vc@aS;9Da`E0|><Sjyub!oO{T__-WA#wkDLSsrS>9Q~}M+yjQ3W
zN{`@}z@Qr=u?ZiYWW7`FxYP_|DH%_{8mV+e8P%e_gs%1FJ21q5r18F`#J5grWNS!d
zRS6!S1%F4w>$Z!X33;*i(G9h$=lHg)8N@{a5QHqM0eCLO2O?qmcuM4Djq6}?xK|MU
ztP>+l1HZpTt23wc$lo>@-!w_M0OVG)7JxDJ5y)_tw>@x}h+W(~!>S;^Py`|djgd~H
z0&wm$?+z@pr{(5g3Tq>U$sojbtuih+cGN@w>=GO4a#GJ8k&*)IKoW|DbzR57VY}^!
zPCGPLv4DMr`#H_y1BWL{xF0=IH2#khDOoIyrO<#DrT}v20<#=FnK?>pxgMVvd6(-S
zkOeM_T5C&^@TVqr&Y_(I-1&%;)V<{fsp}?emcN}J+fB#hH+nA7WPofg14hoYxTJFq
z2o(Aa^{Bg}9h1!#+ytn1f9p7|;$I6@7Vg{su>QFB&pS6@HdBV5>B{_-{%OvqxjugA
zfpcd3>c9(P447>nPyMzQF_-=kRlr{8xlqFSXIp>%drQh~@<&u2f1$^kGWKkTReALB
zhBI;W)0TX3bQp3~ZK_&-pH-~IGV^t6>H5BB?y<wFSYz&XzGu$ZXTA<aV1hbF-p^e}
zJG_1@U8f5;qw3<%BZ2gWC5l2hpLJnqCN3H-QvAiEal+u?r)_O+=hsy|VZVcZ&<ZhK
z44GRD9Rqx4^=(f;nB5-P7q-ar39Yp7S<cRoEYyJiQ4(UH+UE^@-;LE9CcEk{_8zKO
zFv{bVz^S~e8T@exqGqw{!JGsU7lDV`^Tw!Q?5n(9t_v^l{q*Q{i-tgI5YwZ^^*qUA
zA1#@be!}Z)>r=uGR|^?Kg2wD@h8vF+<ER_LJ*ZoT5Ej}vWW|KoBOj_fNRM%eU7Z|F
z1%&9aQI9ZKvssNHybC7?LMW)2Pcf({x`ANcNl3R1D#LPztnqOSCGyvzNq-yUQK?RA
z_z>FPL0^R=L6AUQ_Z2$f5>ZbYO{iJO2TQ@HH0k5t){Bk*g?ty&QtC9WNF(LYow!w)
zt9F)HM{f$I!b<hgkgZ>fECU8(y%^KN6NqIcxZlOnKJm--Kofa6dJ22nJjSCSU?ORF
z=4j(@e-Y)bC$Bz#Hkp(7mhlg=_95|Nvy??AFol{5F-{X7xH;_acXxw*+ep_WF;C#i
ze(Fn+aS9{3a-Jrvp8+rL5Q<;n%VIXd7ixTJ6GWomOCu&R_afdRh&mM`At+Zix>!yU
z`?qq;_hbjqL$F%LJ_0m&+|qJp>smi63_?Si6;chA=+z%G65Mo)vh$&@lOXK#^R+Iv
zLj_L2Q(_m>t98FE@$!xvzI+{KSOFcO=lePh`Et1P`%iq4r33N-3@{|er@*rT2&duW
zn8Dx=(<k{DFhpB>mX{Us4Qm@+Dv{e-cu}Ft*3x!u?0m&*@)`89&VGYh$#WbtmCJ`Z
zlp4uW;F-6y7kTn&;m98+Wu@8!<$-=wytPtZj;QU;xF02MTN?kXVORcl-D0gs#kSQr
zWW5^z;i`RMd`lyRxeA8LpSnsb&sa=@L~gu&(~O7WHkH(05i$G;`$&-b&k}Lw*?pvS
z;78?V|1jswxV3fV#bpG2es|xua;FbNkb6i-$N}*euzPPbg)|X&m6>#>WR<xT%B4<8
zv9K@l)|PDPVt9jgAYH}Q!?BrTBKxuPrXFZCEIg~0+d>JQN$P%ZigHTA3NYMOf*3V2
zmc+~fY79YRvYCFUw;1spA|(fRM$rl*YqvQkuH%J@SF}o}n9o-8NjB6`$=!z1T}49h
z|3%k3Mpv@MkG8RGtJAS<+w9o3ZFFoW9ox2T+qT(p^7c9B{qMW?j`7A=pX<}E+N)Om
z=9~xXk1oAtGe8Xk1YRi^Y#VwrEFQSmRc1I|C*Ds{!smr0*~oir+w$^Of5{)_7d%nM
z6bB^Gxli-V0(Ka`d47DDZ+mE}?o>_9Up#pp|8@f)<CwFb2)K9@7Ub-?;OWQ@ni?${
zL68<sGB}NYqJ}d2E`EVkD`>7xXD~{*FJwap#du!*p-(dg-mfr+(=hY-Ogd&;Sg=!G
zYmy@%z}-E-l#{GtXSkBXn(C1+^=BLDkhgUx@Odu`|B!r$g#*(d@$i9b6hfKfm;{mB
zn49ZDR9#<?+z|$M4dp>B`kY5rrY_2{@xqtljkK*U%BsbG;&}XdsR&mg;_>0x1+Thz
z9$b?L*1#(-3>%8HG^0?yot>H|*1b`uohWIgMG!(2HvNM6CU95w82cLOtpA<%tvox*
zbJURUn)0mr2f}l{yAk_D^0!QghZ67~#q3~w8%`=Q;M`~O#SHnrHw>Cr%^Gw0Lr`J6
z?4Ab4J7a(uz`5^dR_;rqH6z7RvnH$*>RkRpVN*HdcZ|6rWbd6W61=Crxcvp}(#!tQ
z%KJk0=fHUbdD=(7?dcl(Vy_@U^${90%f!QYlW9VUB3%b(wnC}Nh-}Y!96DbFL!e8>
zn??o{%fmv7L6&dUy-MOdbu7R6cB}1aUSHCzxOfG2o;TuanGDnbH>1Jz5ZH|vPbFbQ
zJ}XNf^35NTUJdc-P`>trj2~29@}m(aAyl1Fhrv`yb_?LD{<A5d<?21)&&qk4q$kRG
z-nkcL3e~bB$`A$vTXOG<Ut%s?OH>dYnK0m5P8{e|&4xV0Dpph8{S4JlxC++A1!Xq<
zY^zdze0zT^N|5G;R5A<94e5J?R2G6pr;DU4gif!HA^Ee6%8Fj6SOv|8$`%y_2UNkq
zx$=Aj*O6;=yCQ#wSZ5v&)fUjSr@Gig!O!+ou7of8GhZzAl_C^|jKMn-qDw(U*c=sw
zk+xpd9+ZMGx8uz-{Je0(N$z!-Hc`numT;8qVzz<ZT0WvFuZ)q!DYE+^OHhOcog^XZ
zjG{U&hDz0R0(frOXsi}wUkB`G3#=#<iFCx|;iSPk!Y-|!r74%P6}3y3Dhx_7_aDjG
za0YeXz3aTsd+BYU$DO5;-5sU-vSL>5%Ji6?fmHl3eM^SGCHY=AH&e0tp=hGg7gKs~
zfI-UthRu;zp3d@|SQW$nQLZvZ*laT2=o#FUWrJi+MT%0xidI7OcV4|rvCI?^H`44*
zuQrvbMYT-!fUx2ahL3VFb$g+l%HP&G?!^lm_#g*wEM3@bGy<2)YE--!=&#wZ7Sbra
zbrcQfaH``?_KFeDj&t3nz2$;jUYM7}Tm0U%EIN6`rvyrC0nI6$l|6i9ne<}85DeUS
zLCCYo6ku>^RK5tKEJ(>jSZzRSQcN>NMdDZMH72b17g3i$t}}m##ow{xhS%fIk=?Sk
zmw*nSzfpjV>Iz@mUN(?4!JE<+pLUX@n|xSh!sz^B>hNk%Vy?p@UMNFkFf??>r$?0n
zU?(ZThMlWIGZAJjSRo($!-ujC-Uv=nfMkYm8E%7_gQ)<GhgIh37J@z=ZZXq-fN5ls
zC|U=YNWjVm<*VN9uQdE1GJz~aY_tPdy)O<k1afX-!{``0Pi635{|j(WRAzM3!)Uhc
z)Ou?M_trSEV!gkB$8&4TT+E_b@ssat^FgrW@0sDwsN*q}Z6H6Af~e7wuw0qll87U1
zO6n!Ehi5&xDy5l}h)^v}ugNig#j0kKbL!Tl-aPEc)tP@7Le7&O2C%@RG=212pLjPm
z@v*WVXy?p+Gz24+eo!>#k^QVnRmb1n)!Ph>#up;}6~Q|rqqa|Xl_RTBFzf5q6{w<)
zJ>28EX_S}_BSynR;Sm%syZdIMd*;o86hHBd9fOvq_Ev^`=7D}jz?*5F=qHV<WTwW)
zTr(0bxALKEnD6e*L&A>Zcv@WLsA=i^I(x@zGzSFb`=EC22&BT03J|#Yl`9NPmv%0X
zw|Gqnlp~|-C?DxS_JxKIvmX^&v<~pFTx}K!(x5Z<Yy!i=8`=W1TU5RM)h84)7ejl;
z(n{!S4gX)xa5}K81A%<sHszJ$HJj{n8Rn`5XeiY0Zr8a;LW2nu*1{@CKNz_9b$L}c
zDu<#Z@64vA%j~NEk$!zneJV_P>Ud~YU)%IqgwLGUO`Tvr*r#ZCV~Wt%o<6%6k9(ED
z=N>uYs_IfG5vVLQ!wFS7!R@YQA~q+$YFj#uy_|d6fCAFAe@ELwmrwSpY13=u7e9aA
zAfglIwgJXX;aHOv;^s|E0kkV|n!ql~^QQ^JZP#n9hi!mnV|y%JYYG3o4;jC2<J&aq
z^BeE;qvUgc4#0_ZSmzBWC*xzZLiev3pRwA($dfe9<p?m^2YNYg^SY%g`GKXxl+gc0
z!qq@dQ%`>t^k|Ya%QafuXbOGO!LDUmh#wgHz=cA+WdbPNn(hzfsbg#GG9-X`7=Zik
z+N-wPA7*Ink;IR(Uu<9J*xy_>aF=?5iS~yae|puXP>;6Cvn97rOr>wJ-s7556san-
z%~H6a{gSL9`4vWXqi_nt^PSey`X3@y^<}@}^LEXD{N4EXWC3tGat@p<iM+CpJMv3@
zD;0DMx;KsdWQe~itKDCkg`cD4<B+S!`LdDYq4sq}D=a&hcd%{jM@f=3ZZ3`|ZFb?D
zF-vDLj3n3bo~AY=4<WOSR;&GyQ*_#$IAXxl1#Aas^u*<?x<ljYADkOSI{n%LRcmZ&
z9D2+HmG!vGy6uYyKA&4NnGLstKITREcJfBovxB7$x6~Zk)|MH-s5g54z)=jwMLMan
zLBImkSu{?6QXmmEV+1Kin4}fRq2BoZZukTtSw0JiZ-5=+z}uY#G%f}#j@vQu@Gnc)
zj*nXyz9<L!k`gZ@6Y0(04KHzSI~{8uhgrvNBU}7As=&bTckYI#Sk+bIzYD$OHkqp9
z5SV>u?CA1zvoy&(qqtvN3O$kh_m<hO!%>NY5xe}M&Kgj<j@Ou@xO<Zk6_mtOD$^FU
zJwE83nDgxzssRIP(c>0rCa7B$xFd8dzl?``lb>K-RNS)?kAF+-<-3}9fw0*0FB5ZE
zX_+rK_fCRU3J2Cd!HjWH5;6B7Vnm6ihJsg)Kuq*SMUtaH?kPZ}x`Io$uO~QGsr46i
zEh2D#25%5$<pv-y2Y^~lVzTQWV;C|hkAa|5dwZ9-Ks3u%6~J?HmHKJ7EZZNjgy0s?
zdSCNDZ2Flpf0%uZZ5n2CH&`byPw<BZNu^@N{Sl|HR?G5sC=Zjz(T_@plvpu+(Kl~+
zq2CR<+w399@Dr_xSw)WW<T9Qe==<%&Ym@#LECN*K)m0Koj{Z8J1hD{yyc|O75C4Mq
zH~rMb13)%Yx;wxS?Qf=}4}L(J@R}m})3=j4Q6ps4=iIkAd}qqKXnJ#K37b_frHjF@
z$~6>rm}`a5O0Spdo~|{0zv%rh+tg)!`|Z{EftxuWat3Un$ARa=xkG39MC6J2CJB9m
z%tIIbGM$eMNo^KD$rpq1V;q&FyJmisH`*i9VzPx3Tx<&Q_7-~yPghvXEW(%3s`)Sm
z7x#2Lhp04w4Wci%=pBmPu$Gm843cRb80yr7jdIqSNIhrWV$#@I;5E6K27xB^&JR>N
zXsJVwVFe@q)IU>juunwh46xm=^13Q6<tk#*k&+`m96peDh*dk8X?IHn+&CjwN-H=x
z3ZQuFVdAgr;;>dZIv&-pe*d#}^zqZ&k-&;m{i?B7SP!yPCG0YfU)rkOx84tPgpS^r
z73#mZD^$OKCYNR@r`x6kHa48k>UW(uft%$9J`g@&_Q70k%-q4B5kC0k2mBVrSm*#%
zOotfdnx15@$=P!&wsUwr)(C@)p_k!rgGd%(FIa)c@F8wy^Fz;{Ox(7zdeuHy7yO>|
z^I)V4XC%KWTX-%Nzz9g|r%8rS{^Y~lu&rKo{c%`Env{jvovNflR3efcY;k<v5W6Cm
zRWwf)-yTSj7N;$hlk!=5iDDmIXw?Ha=)UgfCPGIAqmE0w$|6s@R(3-lJVr?HivyC0
zv(owvcUTsu)53t2^Yv|^<#Mpl{jISI@w*cSQ@#U{a28+EsCxGd#Wc2?MehE7-)7WO
z@E9gD{1k5A;<pN!5sBV4A=p7Shu_gU38Dl6%`-Y8j^w(vUTm!cB3WTA7t17Eo$hb{
zUYm*hPXTcLCWUo*8CGz*HJ6#Xn;a=A__^zvnHnBGS|%NSI-R0m5Tg1{CZz3f#I$Vc
z-2uVC__|3aa{1jFierZ{UR@Grw$c$j2HThhYh4^y#nNb{vBPgMm1=mSPIn9yRZ=+I
zYXqANeAh@)AL8rb549ICu)a61nQYj;oPU>}60v)oE_nS1J>52QM<B!RECFOnUK?L$
zCzn0s%=M`3YQ7GT9W2SCPL!z_k8BGV!XW0X4*#<c{P8ekKf~BwIim4GY`QiigPS$-
z3b$B;1jz+u!M8Fq%z*KI<-O$}42Lfd8s6M)sdTd#{u-_BDI#?|JzO)b0#XT*_CL)D
zQP-!O*rDg>Zec>Y+;2!x(Ko**b=u($NF`x5Z|FJ@d&tiy`Ao4dUcw<taOHEw(x^Qp
z*|7vF^=74r8i_Q2JT)mmB)q1j@`#T^o7PhCglUz9tUJH$!nvvf1F6>1F#J41>AX!U
z2m=(4d}Vz9W)}4T?1QZ02ESop!`;RW6|*_TmF$7hZV8gCrHGfQk6EBt6srBKA}}Nk
z71->)b_qJ9WKFT!6EzoC&}w+E(s0UA*k+d^-FMt|Mr4Ho4Gao!GT<1z;i2#?)0NoL
z-Q9fe%lJ(!_x%3U_FRE4Zq>TCHUi0}9e&-mB#&U4N0>(FR?Nr&<CC+>0DWJPoD3uH
zB=WQ&k?MkM=N#S{YrTf{7*OqQe<3jeqljywo=$@=sT%E(Yy%3oC%?9R1b$BrzqC`a
zptiWbWxT&2uZ*C@kYFGlZcE;gEp|FCbUo9NZV<or16!1OaZgF-27Z>IxgW1q&Y`l1
zmcJy5WF-T%=f{*1YGu`X4R(#Si6uv!S1+dRp3i(HY#BSgdvsGZIvIA`9LL)oy~@#F
zT2_2R5oG%J!3VJXhrm7`cjsI5Jgi7V?#f5a=w*9dhRC&&uvQOdo3MoQ-a%j2k1w*Q
zK-I<8%B&L=7ncR$6>y;iQk%wGm30mAD~L0&+QI}n2ZEf&Jx@SzFw3{&&ybBvY?L+}
z{hkvSTXnlkeRP6;UJ>C8B0iosWW{4xtY`gy!WsMer%&UfD{i%;Kb-Cpml`_*gknxC
zw*U*tIlpXRw_o~D(!LexDV|7;={(Y0$g&oef6SYEvmy4Dk5S@|_bbEL=w+Svsn7#k
zV*P&qYRO}AnJ=$B=x9hf=@Sm^&U{>(4ph73?ePTlncwkRR26zcr8=WnmG*<a@{NU>
zBj<vcmi)FSz$vQ4H8?@VtHpgA6Wo!uP#rv{*3g-AxKvh7{sF8Ba6IPomW0#W3P3QU
z7Y+xc4$@lNS1eS;6vBDpi-QBXIb@Ce_VT^1(Lzd$I~3=A78>QJAWK&Gx~nS5B?uYw
zx40%Qmr55uL%}6*X0|-M_dd2dUdU%Y0=Zu@^>jaQS*X;FC<@Nx?KQ~mo(C!fA>qX!
zO5_a9-F?ScpPB(uA)2j)hZ&c`Ip;#$V(?G1C8w_r3#M=UO%D?Ad$b@I!}BOfW`!Bu
zLZkt6%({Ol20RCCeY@SbAUZpnGVq)ZBAY1K1i@CHEqSDt&KKW@NwpWCpaqQS)e^!z
z)TI#|e)`Tk)A+1%Ln2%pn8eb^WQ4ikIHE(wlHi7|z@!-17$tkeq`B@Ts4Sa=T0ca!
zic$kk-UXOpYGhv+cRIoT=LSZ5B-@`eSsh>4s(ASGlB4?B+-|&X*XLVkWgic%3H;Iy
z?jwooAKrs!Tcxj`S;rliw@-m@B@7O1j9<$*kle^@PuVoM^Bwws6F1ZyQw6ACo?Cf9
z5hVslw%UGcbRG?jOvv*A9mw!Eu)kFDd@b{O#MFoM4V-2Mp)-<b{`r0wKZt!z6F|?M
z20OD#mKiwz(C|R*i%mLVWUg(*zM79z?>;H2ka7GRLS@uwhlb%}sX&%X#t7S!ztA(q
znUTLX#wb(%b8hjsrnu#_P;);=rEWJs-%K`G<oJ*WM%q{owkc(_^X3YHibJ5*<sj4R
zj%A5q^$}m-Rz91kIIKmjn@e03ntI2#)$tW}sKnxE-ffcfZV>;u!{fO>8K5kk9n6N=
z&Z=3VT38wev&{Y;=l$$mFb!LJ_aB7*ZLrhF$tmRnUHW=AkN!Mpak@GWzG>!R0r;;=
zNE!^>f>A^Xat(X|-c@#ZW#n6-^Wsx`obLELjXJj{)$vh1MQGL>eu#DJ7!xqiUuSk5
z?xbQ@=lFd+uIARYZeT>Jw5I4NPTn5T%SiP#rR_({kpX@D9%omCIo(g-`bG^K5=P63
zY+J44F{}OZe*Q|hSwg7zN_G`>>KtMgGqu33CQu@TMm4iU4p`sE=cp;8k{g{!nFnXk
z&s{wVPAIGa%G8>e1F~!pYZ8SOS0R=}*%x?hnkT!$MitXfU?Cl1FRWQD1Nv#)Snx?Y
zPQ<P%idtb;p^<(hRv7EQ(5(-Lz3A9>XzE^_1piqOu;bixNOs~cvYg1D<%`I2!5Vd@
zP98jUL`v>*xF%3l?Tq{-RqJA3W`$MXOu8~Qamy~PgjJEfuD$T5W`)|9V%I0n8EVtY
z@Q7Gjz8+DbJwSPQgHR;(9M^YVF0XsoP`G5HYtCu<P+muLziaCmZn8VzRKuzu;az|Y
zK&!Xh>~gGp9r1kM_9BzHZ-mjlVBI9a@+uk;cdcHGf$>(mJ49I-G8Q{j(kGKe+EO$i
z48y)h4GH6wBSO+=my1BlSNvd1sC#~X;;~d0(gW*C@QeSpAZLwc>5C;TvjKc&=X(!3
zdL@3boSTPEu8d&7Eije3w-LtVIDkH{7A`BD`ZMJ=cJj9FYG{aWt7B!JSo4XNxKxh8
z<{h|#ea3A5OW2I9(d`1^!{TDwsp`_HsxtiDQiFP9BZCa>+=GkiWuH^8DGxR}|MpQU
zzg!;0;j(YMpvG8~YAF<^Xk$5K{_L`mE=hBWf_Pb7S({o4KJq`~Z9fmGDNVo<DS}#d
zIa<q6w_BvJJ{7<%5O+FN<tgfrwehPtLK#4Ex*XkW*cWJd9*VB~^P({fCm$(0@tn7!
z0mND^gMV~~+DmBPbASioRP>$RBNty(<MB1)F7@_boaHU&Rh}oA>k6y!<r3GV{GEXY
zCso!G7!+K#X5Wvb-&vlfYRhvxq|<%-;dkP$^n1SQR8>nnt*6%a^Qfj)o0Lda)x4Z@
zbd|molsvnFocAi0zpG3nbrAgG9?9LB2z>QQF_yK=_4DwBJ?OfXRuHvjOt(76J>(v>
zBfWQ*D_mk{qNer_21@PsLx}BueZ<p_YA=y8<#}L4#668~cy*ALZSGpI$?WVAGt@#J
zULUTgzqa!@VF);MkbT#2XY~dIf!Fg&f(v0A9kQV^hSV)Tp_9IbCv_ph{{}60$j19f
z5b{zMRa02$k=U)yXp7kWBUwUf>)sB7SzR}`QeG_1AcEu6RKz#eB5uF${l%K_CQxDR
zyHTwv2)bB5zl@N_+hj8c3|ZmSrGr<x61}<@a_04;nUGhuc3-GSFfg>7<+2BoD)DMb
z)3swhS7f~*YfK3r-1yxQjTXBRhkS%eE9%P6XQeuIc6&v4eJQB&%?T*aa%OBiqw=Lb
z4%Y~@@!0!sKJNV;4lN`K6w5xUniAvQf`W4a{LA3*ut^Z-T?xYmFQUjUOQT84h(VKG
z)5+A6r_EoETse(P2ltDnmB`kPw73jw`p%_H%dT_($p%KXb*LZceC;Vg4=q-v`!*R;
zRm9uLoZR}sk?Bi83WDH5UNRzN+tFKa`t|D5YNq4x0=S-QljcH!BEF_Lws=%ZUXkVX
zbjOU=8)OU(Y?)D}P=gm^7$C`IW$mCy3F8T)?|Qg9rE*N_E0o$yf|);1&CpRv7R@e$
z5s>xXWdt0|;HMJnRL8M2%<zOy#=-q6hbd248%^*FU9hM4qtZ3;p5Bdfrz(j<Fo@mf
z`9Aau(t79C1Dg9)a7Qd4X`Ph6_Ar>EAzSgBQibxrn$4Y2Ib?0HcO)ci2A5?h+95Ko
z1b{25d7F(+rgysAj849<&{A8VF%pK0tO96k-}08SO-O4#6W2uV5h~dgq^;QEA*7Io
zNahu)o?0L9p;340X$#97FH+*jD6+zjyXStz&BJ`Cu*G$PnkwOFq*!@wSoQd|K+erL
z%|)yUhEWv29|Xl@1m63C;VRuLRi!E3bKytnHQ1s`*)4A{=b}}(ZF<jxbQTOJc18;<
zNJ8_n_uJ1)xOg{6<|VcZ7Ob|7UM3I9tzp#ha`PhZ%NtFPYGGckKT5)&GTY9>fp0L7
z9EIz5Nr3;s&mQw%gS+~bD^fb}i8d_}AoNULNK=|57jy(K>xVB_%Wd0GDwlhARyjoS
z-HhCK<p*OW`c=j18ekA+eJFO-YUu?<SeDTmFGhl^KZLv|7QjddF8rq%t{%(@dq+yj
z9h(Jzt1ncfWe4UXKm^T%0kwX34%=udF+pRdmmSi<ewa6r#BP??(g~t`&XcfllxI05
z-hhkUJ~tz4eK<<0?y;X5??Fmo;DSj!@l1(Ti!$~2v<gbk-|+~9DXr>^%;6L~sqlYu
zZy}~>rW{cWKDJ1a7BD%7+|zf5y(yi=$G;$*%aSV(+#v_+jp5!LN!gg$7H3mmt>GTr
z=+<x!Hg8hy+MFispQ4vsE{-x?(h|_7Ea4x~I$<<?FQmy$?aCovXiIZauqpq9L^IZ)
zksJdVT13}vipXdLf04b)qU?xFt4^{8Uk7K)(R46spfa;nmobv;;U9m8(wT=Q*O>n5
z6Kv1?A$<3EJ6u(==p91YgmRBAB*^u22pQoM*E}fdrzHp*^u6Z!0_Cmr1f4KM%M+LO
zdN@LW#=n&6byMiI4=np4${dt{C$ZRznfXi?P(oO!-gVn%$RdNPPwy*hz*@6e)HyO<
zB>xWx=4EY~=6#IUt@<_dL7wf<nrZ>6jgVR`R4k6tt8FES8|IrBU9hs_ZF6%0S?}d~
z<#p6su?AWL&_aAy)LeOtIS&<uGe&|7@Nv0>2g&_dKI*!ezftnVkFo_@@~;=r0fnVx
z0}^d@$zMkVugrv04^;nI@Q7vEL%{4Oqu~TD^_igr)TRH24OWk4y#tnol1htKjr?zY
zv4;U3k{K;erP|$A&CyMkZDWHHt>LsC4116S(*mV8=rFl>Akv6uzRI}l$tk9O&)Ev^
zt=j4p^yJ}zFtECBxnK2SJ!F>;-GgF)DEht_TeW~+NMi6;;r!VPM0}2IqZbUm!))GW
zWzc=w2eBDia^`RfSAJVIIgd()K(ufgDLzjynM9j)lM+41W;lnX%4im0pJY@dr+IM{
zBjwHFg?o8fuC|Tq%3P=NPm2zvg40pr61YBxi(LazE$3qLEFoSx>wvROybR}A=Xi_(
zGl{BGuwL2Nd{2{g!7u9qixP*@KtXgWsGae9QJJ6mX+YH^muyJJ{e@^$N3`%NuA@Y`
zN_7DUoUEwAR5m1(<2431QtC4;9$D}jRMnG7x_<;E9GfN9UP0`b;^Y*KaR^o^i~~<>
z4g`xXb?UoTV~{xGY#$BJMim`868=DSoE<<LXkxlj3=bxRd^Y%mUFYm2*^EL0ZOXZ8
zWl|lVr8Guuv#94J-^tm0tIo#D@|!iZDN+7;s!Wq;qZ<IEcHnywb?fy0Lu^CI!H=4e
zDHD}<Qe?=Kea?(M8Iq+C4yhLwhQtCtLdXUtkZMg%pt>zm@apw<6IfXo1bm4!fs%~n
zNf=eqGVn4tx8>KKyZIXb7h~JmdZV}zoGd+@SjvtrUlh9FzD5z28emv*KkzC<%zlb?
z|C(oD=19uQlrYKJrtPV3`7S}n&X6Qi^e~2IsxWTOUo25(usNZ4PbhWZ>Z4w8`efL)
z&(YL9;QmcWo3B^=mU#B~H{i?I@nvXHf&Un}L@)P9um5C%f{P*O<_gr^Xqx1dSIU-9
zwnz+8xs;tV!FU+|-Hhye_Yk+h{PGDO(zTl_=FlPHB%R&?G{C-1Ws59Ze1jrB%L45#
zPbCXA?z#tADmGVFmor|%y1;Cu8AST4d5XLH1sAd3uVAs)b<wK)Q@oeDmHcI?ORe(D
zuOJ?|GcRcbSDarkg=?G;gw}#MrWp~@AVOtig?1IGZabR;2Yc8HCx_usG{@NSC$U2k
zXL^)UM1L{VpZ}{_2weufFJ_tk!^MtT2WSwW-8(1)UWENF4)soGB&6GZ%~GHi`^u>F
zV~D>HgS<V{ECWHWUYx~<N<ErWEH9i;D|fQbTkDrcDlQxluiq_|Y3#Qbx$*Jg*A3V3
z9wm>G#fO+%SvAMo`#uEq!^1N2?o_Nxth=|?H*utsT2fU@0#rElyqre5)m%;Vc6co*
zKoHM}lz;CQ6edl8CJrqnAF3tf>)H}NKGbEmGWZj~eyj<+G!WkoF=l8Yz35jR*|G%P
zlyz^=(4?2J2&KLEMPV--E7QT#yB7p{WGEd$wYeuzOQf7C{0%XZEQH=F4cDFltM?(F
zNygSm62kRo2*KcrVUofHna9%H!E1!hT8;iQJ1t8(bycQ9euk?=f5+)a$fvH`DUwJV
zmY#~Pz@i=5VGnMMN0@E7{f0-nxA-#X<7#-dYy?d+%8o&Qhm!3@0)=t1+Oj!E!Val;
zb29Eek;~@Sr<`M<25RDs-PCR9Zk$QV2J8+uSPu=5FG1ymKmhs|U6_bQ{C(9+^s%fx
z++PRt=@OsCav+!ED9Y9)`x2gvEVB#Ow@_SI=Q!f;|G$yK!#D7@vFa_m?C(I&F6Eim
zrJkOfY5F}6bKMU40`0vb7kQdCSVZ7V54QGrb?kd@elGg#Z&LZue(dMLtmKLt&sP<=
zQW%XTxtw4q4K+ZI;2)fLZC2B&q*bM9iJ3>Deel{RiWF(IGFTyQoS_fsGFAp;h%`Mg
z=8V-5bvk+Yc3p@)VXzTUrivaH7TZszwbn^$N9o~g#NUa4VCd1nI+Op)4H~5QI(145
zZl5<70A!aT_w(Qf5s}2OvSbeH5q6PY2K7mC<f{*)Zv%f0D23fGoVN8%IwVcB+Z8PO
z(#~g&H8vBwCYW9IzY5hXLqNdUP<BS*KTX>zDlVpAmd5&Pf>(;hMQY{C2I3)V!fuK8
z#$uos99e!do}XfMqZ$;7i(_+MOGgJK3V3SMr2myJKbkF-=q!`?5ov+nEjVv1MZ*Hg
zSWuS2P)x`F)#-7iAuvH&BQ$tS0w0(f%@b8n_M&*5B+P{`EdIo`2?iC8;@JG$ruM7y
zhJwxwZPU$og7l0~VDu}#kAu(N^IbgT0Mr8i2Y^aX54yZn9znB_<r2sT^EON^dy{FB
zf;*kdT#zWG0;1;TAE71V0gX?~ZzTU)GzbEV%7Ik0FTZ#?z-}b{^f*he)d$2UO{R0y
z{OaCTtlVtGV1d*aGaWEiXzHaHiw;XPZ#h{eTc6Cz9Fk2cZv^qd7*#YZ`apQ)pEEJ)
zWEQ24k!m)}7O+fqY^_|8Y~vFVokCA5)AWm9f7~V&luHU6c4}^kMDD0(G}y)N<2va|
zWj!nMy4fsmgUxrZ;#_C{ERi-{Af{d;{<GjPb@NYftcXQcGKqH4wXmZKzGzqpUTmLn
zC%Jnz=w_0h!v$50^fjE44i`)ji5oCf`Kbqch7OTTD)-LjWW(bMxzL{u-MJ$S8x!7N
zmW*gf)aR2e5#YKi#C}$dT_1scarO7nQnY@fjemvEJMYeqhm=2MeoOsQQw9$_F<(y{
zRWrNQi&9%^OFPys51g*ojrVJXEn`hhRQR*&n8_i{#P0~HR)Av-&<R5aL!y$;aFa`-
zg$>e$XS85#3MvvxMdK0!E8Yi~xVKDB_5bMbqXM5a*JELi_M7%pKI=H!eZ_ldXBmGp
z`n~IMvIPBux(}|B^okieu6x(9Y4A~o5|>#C%xPx?lsF+PuJl9@NqNFBgD=sozRp-X
zoA9nyaxt?!2>UE)(C>NhH{UQac+y&+gL~w5cQOf<c~53^arD;Bo*Ya04SH<%O&8VG
zrDp@G2X}{dow9{p#g97+mxAYXI3FQ4&dtVNHhruognvqjp_*OhP4yN1RCn;p(_c(|
z$F0ZGXm=$cEn5O<G$~O9z0oTaDVb^QH-*tFM#I6opAfM$j0_IOTUlJuUOqTzpFAm4
zqC4qIO>#M7T(=m{73w`zQ&fWHW&<$f4G`4s8<s<R@$zO>61T6`RNLp!HWo~mQO&Bb
zMdZNf8+;`;4D2_BDV|6&+>Znyn<ZGbO$|XSe(mMG>s)%^L>F6)SJrEM10gmR9U->M
zDb43^?3P*^jZ?CgWzE?}J8+*Ryf44`e%tKGa$o$IhXMu9pX*Oo{i+hb^>FZgTKx1=
zEe{m;J}&b*SA2NGhC%XBOlzvwj6%q|&c*cJ3T3}woWa$mp7Az2fY@i2x<z0XNoEr0
zQ|>e3a<@c-jb9w_CB;F9g2UX<uOGE&;FMF9PD-kr1f&rsP?GDvtSyRv_A22HqS~-d
zP|y}K8yk|KC|rC<CoO4GS$vn7NZ~D`&uj}&&O@xF*tuWk*t!~RD^gshaca__J2kcO
zCVTP@Ki-G)OzPVfD*|igxO)_5LUKb0vLnQnM3XIFlo|7@*OVyz08OZ(=t+*mc>l;+
zGu3fHhOCKotW%s!?mbEocO@c_29T;TT$n$O`}!V`UYelSe^Ea{IGitm?PjkKyU3z_
zxDQ%{fXcbZlDa}4)-H&=F5zH3I7feGt6)FhYd!{$<FwZAxcgl-xT$TwF6(5t^?35f
z->NhcgoS|KrDhfc%fw=nUg$LPm-<5GsOUH-hl7Sjh$trt$5HB=;xNin>Y}oY>YGBc
zV0sQ)&~A(U23ehp{0?SSEKfqTjDwZya<7%)Rl;f7?+cQHKkDTy<{8lnx^ee79~wh>
z5*{EovnbwuxrSlcowu(3HL2Jm3rC}u4rnj>4g*&h<S`WT^b7a$&&=`9*13;9)0tS8
zPX}4sMfoC8oaZj|RynYhP*2?UQ_rK5TQa%`-f`{;8~D1Bp(E9?@8K#oasyk5cP^Sk
zZl^+Wkb(l$)hMlj`9nZk1lhKy74A*Qa<56z!Y-I89lq#x?yJ8lk~)i)NyD`=;ta%J
zA2eF*F7}B9V+GYah;6A;RQ)TTy~1#hL1VOhhZI;gQneXhr{P{S)M}Y;KG08)8Y@Wy
zD#-HbvTycy+vh)y^(w{$QQc253cSEdBUt*Wx&M^FjifY0U(5}y^n;qCoDqgcM7vn&
zEU$5nv>Y8~G6z%Fy-zO&EE=kBv0^hZzAYD@?pMR4bhJw}*kbsB97mhuvN%qCXrDBG
zq7~ajwse`iny$<7K82dothNaRN1<o8dV1M6<a#=FqFS`}8N+OV#R>mcL#NjSiwp7Y
z(Y`r~6LyCpLF(~OhpOOx&qy*dvQA~0%~IWaVONGU1_@3M?{7sX%Fn}E2|d}bOU9td
z)ytgfv3bz0ZHwg5UttD(?0q3V#7!1@r9Tb*d>t)Gvz$@F`vrW>9nBe*@_!l@HtRQY
z&6?D|iV6IP4|Ipw1chs$O=Cg2=;D2mQm^LDW`<oO2iS9{j>aL)93k<Ykz$xL>SoH2
za}?Dny>UEi|BWza`7ygESdtiDqXxyVu{)Y(yLFU)=j-!~!ZaKh<Dh!~4awk|Rt+b{
z{&YbMAD+5WKRE4`X$oe4vYXg+QfCUl1p!&)m)OFDW|>@fS02ek;50`r`#B?4@)w74
z3=3g~M;=syCxwIKcT(NYV;8U%&0o5A2&(H661CFFL6APv2`PCe$8g|#1>iL#;@NY>
zRICYPgE-%bQfJfIcGp`yhy{lJ1d&J>{iuTK1Ts^RKMhr4_|xSx{gJ%aq1U22MdbIT
zUD3KSOJ!FYkRoN9`12QI>`xt>5M2@pgN}4s7=dY-aB*C!mHF04A)O94fh3}<NLRv;
z<OuEv7!_Ao_?VKBOqRyu)!WHqXpk?$B%Qy39Eh%W<ObTTgO3=?aK_A2Q$9Gg2<*C}
z?LAxhvvCHcSrVd3;ECOckhMosN{mM$c}%G0HG`=%#V5$c_Z=y2l0MA>K?TQ%I1UKo
z(k$ezer?W|W_!y4$F`M5gX)&zQEK(#`kzgO_|F{Su2B-rak~;o^mN+E=wf|bxTE#^
zs5I20&_it_844vbdYCZa>tp+Mz6Pr?Ipxy0h;Jx_^{iuzk)QQO3&!k^>@ewTR=vgV
z_svHkVXppYH;zoyGh#BY050cJKyFlo!K_sX%$)v1s8~oPm+>noT8+jo6GR^K_Dku3
zXUypBup#-GpcS*42xl~Q@D^13r*MR1r;-IKgF$`cD=DQ_;T@@?3kR-|IB&B5tjE*H
zU26^K?VraJ4ZR)`JzfM9BpLHQGPEYTqV^YpimmJeM8#|~#2F@r!`q{q!v^hGcRcYU
z8akOEln_z=l*8&z&<*Gr-bUVT9e#P&TP?lbvi)!#u{q-+rG2K^RwbFgWvJR4i`9WS
zxSO=1K*lnM=eg*ykOC;a|H>L#R;~ZsEGIsFNBn4p4K#YAM*rqGSUmQIh}~Tzq!dbb
zgqoe0Wyrki`rMc@^GRw#&Y^4N4V3Br$5;?h>V<Jw@d`5<Q+j)G+pJM9CyHdh|2n?M
zed2u0&}~1xC;%T4(T#`i&aG!I)lnQFUU)1C%Xz*fu~o_H0OKc9{S<1}RzjK<Fd-O9
zWT@d<oc)L94yb^!hlAT<i72K1)E<z29AeT1ofIB1-gTJMS)zhPYbWQ*aI|HxVF^UE
zH#mE==k0xk(YA;%iTTaC-vgyIn09)29{(PNs0sd)r0H}QAz)jeA;F~fu9*DR=IB7T
zWIqfQDh>umB*(h$@D&Ogb3_6$Ix*z^?=MW-GOMKKmEV7cb0i_#jd8$hiZJa4#%gmW
zbZ=eV8}7LhW72}n_JM}VMAtp1Rbcf8j4E>JEVOYHMN)5Mg&lO>#bOzSbPzb(K%jLs
zhD$V4eBXb53Gq>){|vs56>>=7(So$2SPN<zmr-TzRXyGbHBCDK0yuXNiqKKCmuHFu
zZTA7C$+yu``TjLfBY#F|T~}|!%aP0*HWF3o#cy@1+6(AxLPqs^A)>qB2%CkUg5T6Z
z9_$=-od)q1f8`hq%>H<jmUW!o=wp<omgd3nDC=x19$Khfgi<jVfq>Ok$Po9FbfY&>
z+bQpP)oq*DAa)hS{<Tqd@K{Sa%h#DkIm*TPU9BB}Zq7hSiKyu_DCn0~#_rA&ACfo3
zGZ$y95rNr;j-dne#e7Zp6a{r%Y>+Bu&68_mk&o`2()~9gdX4Q%27SL3)*7TZJ+7|L
z%n>3%SR|J*ZkynSkia~4f*Ajy+NPhKKPbGnrR>>ug7OFG&t$43161F-IPd}mC2Hm@
zZYm^|@|=O1PSjboOTjZ|NBzOca@Up+iVsJ!6)aJso`?c8KIpHoZ~n@Bk}5FphO@vJ
ztuYR%7-7oE;#fwgFfgdsc8`b9<ENfL$+dz8>e!q2(|p9Buc~7h0GELn4gG+MNj|a3
zu3fc@#&He$d&sC}#E^}GSJxMXi^8=AEI%F2O5Zw3Z6Da?wmLztXn2nR^?(^d!NmZW
zhsE3Omqt(2YUZz*K@JGvhh&$o<~1{io`9^eN2{Lk4Z&{BM2QojKQ0X4;n?ru(|u9S
zI5-mqD9OmvqDI(`d}f89n<Tp<%u4(!nU==EUPT5jdrSxqKhc>}$VM3XhM2}3Olf3_
zV|rPuV{Ab1NHWqU3jsce5mJBMA%Hii${mNU$d)&U`p7cLk9g_oG8h|ZNl)Jwl|BCK
zZK?Z{-eQ>BdfAdk%z`Wn2a5pfj1UKl4HKYd!&RKXm@qkyNFBkGAhVzYQ_iJNUG6E{
zpYsQ4^EQI01ziZ{?>8Y=8XIWn6CKJT^<IXx^88WWG_P`9EFJC00ecpl)GrhD(|Wg6
zg|m87%$TBbMv6jX^%j_%*+QEwYIfR*13Be?nwNw~x`Sm)UN-`RW&a|)2TY0*@R}Mg
zqo05CF{!dxB!1L&z_S0E-qF(a1`JqOJ`FA;Yx(K;e(VBDde^BVb1@JQA+Cjf+FT89
zm+R+M4HL0)Y<@2>NAO(ld5H2r7uheqSMx#u^YU{S_rAF@bmRTfK~t<9XdLs+fnMGN
z>b9s6$XtLyF6ywL4AJvpcoP85JmLTCen(2OLTz^`P^Y#FvXAz*4U`|=`FhdEO!fmU
za<$1B)o>x_d-3z!=M+R&Eq^JW-Nfzec)bjEYNAAKI>%DCqv!7wy2tW7^r5fofX@vX
zL5SCl@xw9Y^Wxa?4(6{IerLPR8b|C3`LA`JXY9d?`sgwbQ{SsM9+4@%f5}~;iYx%R
zJ7x3&AMH-hzLO&`5vj*wM^}aa-tqouMXe6xzUTgM&r1+?MMHBM9AGOu3O_2brn`^H
zN`c9^tQ>13O@)=`uZm5MdAHzV{2`YHBet+U=F*usQ3G7_@rrSMBxQ98wwGD4+I||W
z4^FgG51*yr3l}PAcit)dH9L4M0%Up6*EOvHlcvA-1BNgzG*fK4=DLS@X*R~aC;x9^
zx^Tb;<2{Ovs%bDo+j)8Vbs`3|KN)t$IZ7)3-xQelD?sS-`B)MXJ)%L>vYeYU^n=yV
zr4!Tcx%8LIEf$N}_TVRDz2S<TOGVU#3@Ej_=%_57Z5cFro0he;D>H6|Iu2}FIn=q;
z0oUuj>1L)`bs^Xab7j|NXP&=_mT4ec6ZEt(!b9fxwdw>m1O=5n{J}5cV9Lj&{c{uE
z^1qXyMB_^vWps66AV4Hc%}evx8i^6J)C851j}c83lB7UvdWnF~P^EgT0cGBgm7L$I
z4_y=2U2b!@4v&90-m;XcMYr<R<P&OO!RT63NC0}G<hE=!ib+GmU#%Xmd`%4Mz8}B6
z{_d@ZJB4jhc|%~P@qqE*fo$J(c)%lJKLeVW2^Ig-!~}xz+e#&=(VaIDbzj`L#S-h%
zw5Zcu*OpB;yrZ6QDTToTM{e;xzySM}B)O3f809y&0CTA7qCEmKuTV9y7^=ZttZ(Le
z0TD>D{3n;08u|a_GVS2b;hIYAqVri)o=jlA@pqj-<yB0fv&-lttOc?VE~3|Ex`AQ-
z3Ukt@gGK7`T`cw{;7%cszJ;R1bwYV#_Es5$7=!0)E3-*gpRCb*iEP@}+x<X&AIzC*
z0ApacYXO@b|0C>jVNiGnkF>*lXJdafIdI(Xb$#WgMwg8iHwYIpoPklG*nrEt1C}Hr
zNYXO#oQm0FN`1WjHQwb0EcAz;^Nem&{rx4_YtPEmx*VdK?_UPGMaR2=-)BC`4Ed>k
zuuV3Akhsv8oyiTJ%jGrw{dD4(T(5Fdvv;e|h*%kk#liZ%dPk5(9YeF@KDX`mnJ~55
zl6YK@IgVr_dv!78UFpDmyKoA36U+qz#uC_XS%2RxZtMP^(q{PoUD|wn{ohNQwa3M$
zN7i5el{U}k>`z)bVOWe!DI@+VZ61X3=`B;dqa;wH_Tm(eA~5LXSmE{CwC~`DLY)Cv
z=Z_JjDL=qH|B(X<;nMRu7qnEv=O*uqSodL<HQZo@FaPkyqt6TdeeLVf>2c8RwRe5a
zEe}$11?Xu=@AP5mA20KDpJw~!PHx`s^KAdB0Qkc?8%6x7FD}U8PKypZuJs{=AGH_O
zRU^-qF+y@L0^8uMC1k?FLyOySTSiq|9%o;G%DW|2*PHjieghXbU<SS*vE`huj5bQD
z*Lgm}exTg)*9JgM6v&fpsW|stS|P@E62jZlxCBMzn~w~XB4lSQHQcnWTqj<M-(zOF
z3$Pn6epSQ5vM*FhALTCFZJ8W;u+O$jp8!LmoU-=_{~w^Ur$<{1wm-Ihi}jJoKvZ*7
z4@j`_en*%#nt|UUXsAmEWE<uTE~m}m_Ir%+9nf`O`%HVhM8Ro0%m(l@`glzGDm}?&
z-T|2jd?uzzZ6&1!Tgu2eMUh#fS{kH+TihH82*&YwXiuA&d@E*;hVM(a7ldOz!*(O4
z{5s?d{8PJafC%JfUL!tNnt+j2$@HGfV5Hx9q1Stoz0HnQy?Sq~hKu-Iz&)^vxCpvP
z?mXumP=m!FBMx93zQOSwnR|!t-PGfb+3gIU2ayQgDx1F`Hj$-`%g@#0uIqT8L^Yz1
zh7yNH|0>)y^7yl_icCW*4)zqlHb?)*Hap7#*k(-&8!Ak5N8(&bD+rxhi|%3V^ZR{|
z_Dc8UQ-lJh#N0^m?OhRQi2}XZDwYVt{A9%Ifw}C~Mn=g1AK&l~nJI5bFAvD2cA@(7
z{{v|z`VVOa8QM_x$Aw8rn^3sm`tymy4IdP>nw0ISDHWmTTqGXfQY}IvAp}plSAf+g
z3TCW%kTz>DHz+PFmZ5F=bOwn9?~wQp#*8a)8zyaL8CIB)VXP;yUHRY~QWm2OD<p;_
z<;apes1%q}7UpLN&-(-2N@r4X$jg`oc9^625$YjYpr*K7VUE|Ouj`qd2Z1LyxrY~A
zMu+Q~j;>$K^CkE7W%)elX2<{=aKNzPtf#)&(5tLLW}|ok8QveLZw|{JENSv^od15A
z<zNb|PaN#5B%@M^xYt3wExK6Q+oBWGz?DA)<ard0Z;F**>kSL|Jp)1h3o!eN;&=Xi
zWt#ZRyzovQyIZVkvh`xHD*~<RTh)YE4fFsQJfR_Cr_5omg$}*lL8sR?p6|Y*uB6jz
z`&$m@!vfk4u2${)8W3RV+(}mL;(vj07ZuWzQ*kJ_r7IcHa}wTcI;PSETlG-nv&Q-y
zZ}W-A;>VAJ@5h+eM7E>|<W^q6IiZ6fEh@*Jt`#RIJOXL=_vCaZ>UQEpZ7M;^_Fw**
zjH0yx+rYeFCeqSp6siRX)7KPuI}UX_=03jU9=2ewy;d@EI)mDplI6eaU5{>xb<-+G
z2bl*<W67A4kq-LH1S>O`pqGIwT~pT#;a0QL+m(qzzEE-8^GmWN^f5rusSu9hys6oK
zjmEkcI~VhMv3IanKZ35YMR2922aBnldYa~4RXC@67`;pB*gmb7Yo;z+3D!+)#to=;
zH14)1b!`2q+%D3wmF}CU+8R*p4BT^$l+v9^Y8Pd#(bwI|&QC?G1})b;IAvG)woGf1
zo$JLh^Ko*i_1+?}Fj(Vh$_)k3Z&EC0S$HJuDwx#bLh$p$<3IP-w0$bKzsRhN;qjs7
z#>QDE`#jl|nx_y#rzKikGU_GoZMu6ves0>_Bt*wK!d7%3aJEG8UZgmSLz&AMCmG~R
zVleP%yPnWW0<OH%gD~FUoofS=|1$W2l21s-4@wx440w69L~ARtQyKyH4zSP30-}cT
zMlKG-cl0YjVF{+j*;N!Q^F0GaUS4?aqtipP<oz<dT4jD7<fOv2%zj_&`pmM<r*9|v
zE!7P7X`#1I(rVR2y>_ccsoqR{MIGrpK#xkZb2}kwFZ}>3j%FIhYonS;Jh);fpG#Z|
z)JFoJ_fxUdfG+<kfMpvL5ZcU7Yu)`Pw3$!dRK&2)`$@`OUW5cS00?akI)Y&&metn!
zgmK|i{TJHobo`&tChZs<uc7eZpR<2Ln?S3JMwWokrv4vPIFTrihUG4#04VK@{}tL?
zH*(f3LN+%EL@^1L_LAW)|LK8fM=i6Dx~?F8$X5I-U)nZ!{a1LT+dSB~cJRaF36V>D
zzX@+@2Ty9p$K?^-ckq1D2}ztRTENGG1Zz84=Q?V3EIuqE0x=XdD;>+Jr>}+HOws47
zkGIbA9m3#!y>ggqfrp0O$m84T7NQukYM1iyRtJ71&1aFvS=rIAOWht}`kYH9`(wPi
zC}4)`u0$r*(23^*dJ2x`by?aZ;Lvb-;Hf|Sts~X#nAbw!hRQ+}tpA}t-H5Z*Wp3cL
zsO_F}I_<kI1TE0Vx3FegsBWbe;bl|)!sIMs>8hu<FG0uTIWP-<?daky)aL6P9APy5
z!4qEg?5Jbu6;?nZECJ)$eO-x*?sRjvq&N<PZfE@q06UBR>hGPvHB?3`la6=5h<5)!
zBifaJB_eX*F<nbZU<TjYK3+D*I}jPbEwXt{le|Py!5cmMVpSeqqr0qo!X5-7aa{<-
z?LK^0x*xY!Tut2YIBKnS0=8<dCayOdjK05g-UE^ie(#<+?q;6hEdBThds!8wP|8{H
z%j@JZCU{C`<n@FJ4)_Xq(MNfTDGb>U3i9mj{WO-|MeLzcy$>CzC)T$v=LF|RuSk<1
zo)PLX-<xKpAqrLDtUzamxJ$hd;Bllp4<u;*t>2xEEud}ta|Onfj}9>DcA4q*W&E(_
z{qSW(pF>?N_8-3(fant^3nQs3i6}hZc{-n*tA3!XE|c*T30c{D0%0t-N6e}L3OEOQ
zg1G5}nNHd-ntseH&nxUeJ_aH#BHs=ff*?SNww2%4qR^p!*sP<1g9bs#zs0S&x4cy>
zg~#0~p`iC-(uQW5g4j2iZaVU>111Ze17M5R2gDY*$>)1%Ww&S_NxYTD#K;B3^UAGC
zf2Ki@9G!7~j8F-04%Zh>%3NH{H%D<=1dDQmtb4R>9p95;|4G$=faX!^MWeRc)n+O@
zs;k73M9NTAoRPP+P76qjeI#2%xlN?6VU>=9T$%5%ebihN480`|_yZ)4l0Y0q+m%X+
zs-(M&N5syVeS-d4_Y+(nUSiY(?BjwApNei+HA}loJ)^)C_;fUnWSapFg3h}ZL=2`C
zEsTKb=j5GGK4VF)zbR>K-ALM@Y+Yk*8%pr`<UaDCN>p?rl3;Kzw@VJ>;X4!X*z|2o
zxZ)R)MOGefvJ`rJR1POk?WT3LI!IXvJ9nHG$_yZBBCG9xYbCKRuzgIufHh}+P5+UT
z_#jrZFIhA!E?LNFq5^slpSEnbkZpio1{{(=vj)j!TPj(7T<a9eRC)MX{~H@qP@gRq
zklSokUAKl9jpM-ll{qa0{}2bCy&rA^x=$y19J-G`08@t~6N$_gz@_}-OihE%{?Q9E
z89%})!C~T2rc5!|xJx12!Lbqfhxu~H7$-~K<p;F7kR4B2b_y(-SGblXoV<}<L=+64
z1!;=RGn$m+3iRlTObq2hME}6BcuSL~ZZt9{mrlDwY!f_Rk4i6<D#{!MSrfvUT+$fU
z57`k)2tl*qt5yN>zf0rDkj-;o#Nct5+_20Mn8kd`KRqsRl5w^PX;!GFh7m*!5-!kX
zMp||OvUoL%KT+tYg?gxPFx_Cw?jb_FSgrAF|Kv6a27U*!0BNKfrUBue%)5%HH%Csc
zjJd&|u`|aafj~Gx4FxSElSsE!Fvr8FByf)bl_x(;JvOrgS=O}D;P1YYx+|5@Y*78i
zz!HGQnKh*6^2csG=`GW%c|nbbe-y~@oVPG#fO0Rv5?as(yAM&4TqxZ*L=-ly|Cucx
zXyT6E4{9nsG*DH^-HjlO1#H6!C~l4gRn%edEzVB?XjwZpKylO8%$u4PP~6mWN+Q0y
z(;Fd|L1kq7r?^=_zW8r(Q{2{kbUF0D;-<~{L0ZZNJOO%r)6%aJ%24bX4e>Py<|C48
zx5-u(#X6CFAdO)MeY8Uen|C370%opEk-9w@TycuU=wW`#Cfb@dnvKBf5V`gC6V+cU
zDsOxc@au1<0Z=>V1YBSsjh!j2Wbe?7rUXin(PiXaS$-7_!C}LkP#ZPCyyl@n_5Oj7
z=8YJ2I^Ix8Q6Be!=S$SgO`VDHWR>6<2=aqEn`HEu61Us~tFN+PPVSbzxu;f_(5nTE
z_8D*yFj??%2TVO^uVFfvxb1TX2m^%}CBiAL)PYy25@~Vk1VY#>b!^Ddv2en4Bk_9y
zt;Blz_C7{Tx-XzG@U>gD)L`x;no3Msnbj$o7BWp?y~dk~`Bu<pNoVJ_Ws<Tv`gRgi
zdr)_EK8d)l@h5p*(n~a(rj(ym3-%0*;6OY=nU20V-Tn!Ut%#kNMV_<Y$G<HyQ#Dg4
z{~xCAu{+j4+Y)eWXUDef9ox2TYsa>2+qP}nwrzFJxjk-ojrE~^KvI>wYp(fR(W5B_
z5iytfqNzIB5kpe7dzp2Z3nxWBF&uRi-i;<xsmB*#CI-1~>ElH@r`Z)qN`WR-;nAw_
ziz>~aW~xi&H!La8$}Xb;pT@I@J<*ntqChOFZ{n^$!GtyTp%!b(#4pk;`bjHHE|e>f
zu(UD2*38K87dNIhlY3%<$_%ZAztSo;QSU{5rQvP!VE;Dtl`W1!e5+=-|J_Rtn$<8d
z>P=^eK$IsL6Dev=P+DzZt0Z9I7ImFscORnQWT_W@L5)G3V60~jPi?IiC#jIFEfQ+F
zfqWNCA=zqV$ny_fBr5xZ?@XB`HL?&~*gylLMXK6wGr8Z>tC1V2UvZCzf{uUnsCCM2
z3(1+TmMl*gecQwe|LtaMc>HKlw2eQovg%|gQEPLv?K!I%MLhvYpjIC%lNv#CdK@2C
zO`}j{D}sZ9u)Xg@R(_{mmU9q?P0rQ8q->*F2@L^2l)Ea0f{bfXCP$c>H<VDe%!j?g
z^6a7Do|!MWvpr6W6u3Y9VVTp_aX;O1J>U7U^bsbMFPWw9S-Kxl<d~=hF$@!8k9Gxq
zNLtAf3CPe6=X($QlOmWqnw4gf_VZg|MFxfr<S-MEeW&@(%(E^D*Z-5)>C>-4uH4(C
z2&iR8&#P7pbSqK?kqktxI`UF`pIK7byTgz@M-1hQG33zAtcbwPGuC9ys!A2FTsV%G
z|0OpGz1u8NIrv$=q;(+2%>Z3yGKzO<+?iBESzP);(0>CwG*sa<YaYv<k*MHgjFImu
z++M1rp>5u2?nw?PpC%4u6Kj<yxfGZhilxsCuKpCACUOK2VwAvJ+bv;?41gJFftg4?
zG^iNV`l(`}o+@#m+`GZ~yZ=;HI}KMri-o(*FstA)VjHdon=;e>?*jB6LS2PB_X3Od
zN;4wYAjT>hB;bwR@3r(`Wto1%rCyZvmc&N1qa%9Q#MCUkX11b#CXGLEgcwBV+v4vc
z)a5C4hJaqDxLMS@8$9gZwF?co3mGU0pQR8Y=WrKOW<J2us5g<iE6edB2u<e}je?&J
zFHts<0qM$Dov}Lt08a#_g8{s6zI1MFG9^=s<Dvkm=sb07@Z?X1o^0gRLY`_?8@s(0
zIQYNF)+f(8M#UfaMkZC#8q78%`9@jJpSzr+EzROF{llaJ3Jn*-=;GguaZ_X9HZn4@
z!!Sv+7htMGV5*!XuQABAJXI+UQ}SeXdLHWNPooQI1nrp|X`auJR}BhVOGP=X0gr!C
z{1~9WKJeLZM-2HA26n)S^--2Z#$Yo*8e6;NE3CGmR%r_Az&y3tHO1e0MB%56@tH-l
zA#xr~C%gkix={v-%`%Ekf+GPYN-%I1S1xs_91vf|_XQVbJ0^_J@g*14o_EiqV}_T-
zgySO*d(1X-_%v~d2XV#zIO2bLyxw0IOEcYXzioBDPamqczY=ht6f7-7uX2ar<Wqex
zHIVkh5^CRCWvN|H)d`W+bv%i~goyzQEE*-2e?LSE?FEx_wk&Wvp#RmPT~|-#FuVw>
zldTiXL;e=|+*3XME5aX&F4{ld2B9hRc-cF5X~~t`(Xo%&xf3Ye-}fqH2g%A(Ws*TZ
zd6BNSrB<Ajo($aERgti=pGx4=96KSUF{vcgIyd3D##Qiz2dH~n2@fB^#*UtIsSTvV
zxKWHPVbz(8d>PY{(|L;gT+$L&Uy8ift|6Ame>{lzUj72^uHO4;ATeF0FmCQlUY{?B
z>*#$mK-0F>_@oG>3Q)4Px!|574{mKwRN(8LlyphP%hLTVXex9ZyqVD^ol<nq_sfkw
z#l=tGN7`?UZ;i9e@r-21)+_wIul-~4dpcIpG94_Bj%j>7QdD8^E@L7D<p$c_1JlM~
zHW@O4rjnD@U#VLrM4Rp;%Vy!|HT@f<wH8W;au5IY-NtLwy@I_02ep*YkHe$&c)<^Y
zMoy9-TTH13THgg2c1G1UJ(O`ZsdCMm&zL{J5WMBrcKx7(UXcP{``nYxz%9CS#U-s8
zBUFGyab|I2Fe91JT@+^g@QEnt_Vva7p!z)%On~qgBRz=}X@DMSb*l_k>M)d`%^lK#
zkvkr7nRt|~CULlDte11VcB?PUBT8lsQ7JYhueB%Ddo9+;?m6Vol{6`yY$o76n2S2?
z2)4FdxD>wLclCbELc{x~HkHt^^1Am57{xYH2U4EFAcraSJRgNj^=!}2^Jc2_H!ZZD
zUSCzDj_iN-W6+_1Oqq4O4`{!fbtM18&wOgax=!F-z+GR@vu_Q{iCOifhts>BgiSk+
z6{A9e<Y!Kam5-97A$NKGJJQ2F)GosT+sj;ec>COd_pkm%9&nmDoTNrkDsz$rvL;E=
zv4)`p`FnxZAY1EzL@6t47A|EimY%!PkCBIqO1T!IxF5m`xxs;HVTmI)E~!Fas0fRN
zH^<id9h8`^^_4`E2T&Up#q1iZt2Bk(_&)&WI}jAj{|9h7ku?Urw~j^VwwLg#+#A-!
z&PqAu1Zf-ea>}jl_I?a8PesX;m7$xITV57v_q<<G9c&*5A8b&VWx@areUQ6De}|Q6
zMdLxx!MT{}@7uvzS_HReX6#va)F2~M2n89LtPqgRa~#ZzWV`uncbb3DK*B(DQSKhZ
zjjGC5YQ>Ej6p6I+o-hZJdk5i0Oz@l>xCNVUsE^>~{M_-Ry+ODxF3=!*Z{CR#(WgOx
z6ZkX&Ga&DUK58UKk4Jv8lnz{89^wg<1Z5S>vw0Nd1|;$3WyYW-Yl?^yWiiOZOSz2N
zUXLOFmIGw4;jzxps39k}-E8%(0C-B<{zu?s_2cqSp!eMWufSRSD{wMS7(mt#{hoi8
zQJQVLjN0)1oBzK8=X?-nP93N9?;c33tN3|9;)%;JA83xI6tntQ;DqX>ae-E6EFGvD
zeRE=`l(hyWL22YC;@R}mGlSIhig{>WUZHjr+Ro6T`;WjmQ4|$RZOoDyI}>uYET#9E
z^AsWRAAwVkAh(_gBav!m$n}~v<}jC*e43%4A_PqM&Na7Qrp2jVv-EahW>NEIrM4?F
zh5`?X2VC3&OKy)MBt=SB3MS_WA_DiR*|Ps)48*DE+V=u19iP8<-%Q}#A5VicgcPQ$
z4?%53zEaGy|FX`oq<uSHuV>2;k{Qd7v0tC!l(vOpT6_axgK2P0T-3@|bVzKuClz*1
zKd6vq`2HJ%yU((X$lr?hH?BYeLC=O4U#j>B_G!T%b!eI7?q!&<dmvY4`1sEVF`$kD
zt6z!%k9W2=C2hc{D#khNRMb+~ba}7m6z-)E4=FwEpW7@xF)*>zgqfsNyV)#$?*Ozo
zG!Y}3p>`((@}^UiBF|3E+>+t4`I@SPmAWR)1z5}?4P@pbQ(NBdv@ky6E~`H#iyc<A
z6vK+L&5Al1)T|aHlMkrDj$$RA!(&7dc{VsCVHDKy-F*5CQWdCo<63z@$TT382&4C0
z8Y5l&Q-_Eb((UL)ym#+zH3M@lp4vX4ow#<g#o*W%A(f*q!;cBRGHlka2;$DU1-i2(
zeJ6vq+UOCOl13~*2PnG*$r@l&NG9bGoJp8=kYe`IZmmfDEIy=tk#d;zTG!2yw(a$L
zi=*mu?{!y4{qOs?^$a@eFrtDazh$7|C?v&G6TdO~s&I^pr=n;oRU~-({}P;l#z3t&
z6C5!u5$pudzV=qZW0gnePHm-bM!nAVP?1Vl97qU)rE$J9`nat7&X_yk<+TIoV<=7I
z4)XF>lXXeTCbkW!<`eIkDWNs@G!;?Th=LUK#Re*GsjO@kNeZs2ZbnTl?abxtEai&p
zEx@NYnzk7??|xHbLQacf{5x=@RSq2l$jAR(cmhT3KM{3H{)h7Y__#jv*dx@5aAM{Q
z2l^12c%qpC;}A`pbvqUZQDaR@g`n0V9nNz*H4QxYvS6;Rw?0j^1EhakjV)Ph4}wi7
zNl#cO!I9x8UMUIDf@gEE%`@gewlFjHF-gJ|tr*gfoLi<|Oz>tRD2!=Gr773-GS<gU
z`6s($<NZpr;;2(Z2}RoVddmMW34Yz?@msn4yjV^BaU}}B%+qcgp8)ZD;CzyzUar3#
z56Z$ac({s0&**weF)XQx0e`n=Ai23@|H!+-i;t{WW{?hFFGkrytP_c_4m?yG&f-Y~
zqD~Ks3dE0p`?h8%<mOr#m6BBe`;S{yt4y<59znjpML87ziCFP^jRu`OAQ$7hUa0g1
zQ<ZsOJsP6A+9FK4uoo)#Xzc$tIGGT7UnQiWHzQ6|aeXA^xq#}&#a0OYHDEsa|N0zk
zHNE|oHj}vQ5twb5*|{2Q8<wCA=3cK14O?Qjl~Gc*4uH*oU8mA{u&bSW1gN0w<{yww
z5_@Qq&LJ|AOV5}`m`$$&IpXhT!{;S+O6~Gc91?Cd#JcRxvAlF(O}*7DAJ?=;MEo|E
zkRwJcV-zg<&pM5!S$BYjY<@JVM9!uSS(s|jLx|+7HivQj<$-So)5w9kt82d6*gxH%
zD4u(24b3*$hH^)8GEoc)$S3W64A1V6`r$))o?5`G_HqzZNMEH$wNm<XpbBL(L5sP%
zx-YxB`uB}cOc2`qMff9ik5Pdd%e){+az~voLWj&Rv6M{Qrdfd+K_23i&=JBRy^oAb
z_m&CEi_mv870$dZ-6Rn?ufnzOz13bYPtA5QJb|gUqrZI@(nk5H(3Nhn-HJgI#pwRo
z7{V+wypD#3$<6+eVr>T?16+fKmUtWCFxt}tk0bDHG5ujA&0q?@$-uBdj1ZU@OtHR_
z^ANivVJ?H8!FmyOLaPnYga~Uj6kkCV(;$#ULNL!>eQm%qhQk4Cn(n#*O+bz#5P&=H
z6&=&RPrzlLn|j_^-Gwg{F@yF4q*mNa^pu)s{SMsgfJn(F68c{2YlYkpAqT>eN|XW?
z8wPm`9J2PUAfK-`x%o&OJ_ui5{&6_KzHc(T*J7T{>1vg_sa65iL4<CBc{L=eR<xiO
zLAJaaH2Scv-f6#JuyP*oUAeV33j~8&wK$?pA(B~iKz2TRiK#3*kYQLt=~>v8UX>HQ
z?STOKiUOmR${goY%T(fZc62I!HF5~^3{Vp(f8sZB;{7PNv95-4EiX#xy<ws9?{Rz?
zOH-aY=si+cGZl%F_UPT6i>2Eg*nS#LM%g3;zy#39AD=$JjS-{_k6QP?3g?VhKWv<D
z_(5MNgpxl7UeF1xZ%B&FDD}TbHwKH~Q>FhXoM9g5H&$p8fmr-;X|oVUq^aH_4JGx{
z0m5e{|D$l$>KeT|(b9==fN17Jk<v0K8vZJrskQ%6ID6>`#6tQ1OfkL`=0|%lDTyXo
zq#lb^lbInRxBZX8S;`|%g5u_-pTe{Vv+$`8r2eaLDkC!wFh@4i{|M+-wR6PgK_Bs1
zW!K9#vcLaF;Y=qRUzLPNBX4Syw%mk>#CWmM=R1Q*fB*rptK$jZp2za^kfdh@g~}j#
zOrFKT2(;(W$PdHi1}O~FgU1iQ;tvrfe#{4ay)Q`U+4kx+XpxN(j1b#yxrRXW2Mb#_
z-;5&SNG`9VR}27NFBt$UO3b>D!+amzob6tLn;kY_YOkuNon@1jkZhTB))GxUbBq<`
zr`NrSTO)6tEd(OC!Wd2gMKxxC``oawv)?KvE9eOa6EHYt8gA+HeVi~A`R&R1y`yl$
zxKdet?;<wz+(`c>_P-FM=gWdIc~p&KW|XIcf=afeW%*$y?}l@FUl{@BgsXjeNv4ZD
zB50?up%ZkD(x2>DgJ<-g5T<9!saT1y^JG3%7NEmVtv}2CWjJrtJB7;v`Fm6{8PN^F
zK?jj!%nS=T9vmvkR8)HN=rE^zpICUp?gWlaB#3)ucilz)CKUYJxM^{FO=$Z|Z>uyZ
zRdq$Ec_RoiXqT%e6ksd3{tnJ&&5+uE2@WdYP-8Tk`*vPv=KzVz?EuZZTlVoa%1`J)
z2D2c?xX!JELrqBWsRw5;UwSYe>b@`kK094zcrI^do4-9w*|g7(1hTXl{z!Qm-uE2S
z>cSlQrbWT4NQlm1QMb20i_oU!KbHdqx}#C)<;@ASn5*rl5B`M>>bX1JdkgFnGY=5;
z&nz`*cML+vKgWH>@2Mx8nobB1uTmja{^5mld$Yx3hpOOIT4u`(L2+3}DyJ70WREUW
z0B21E&4WsSY$w`Ic>o4ClMZ9Yt&yb}FuR$XMR=J2nf8y8YnT+dEnv)yu93eQ>^zP!
z4rAXZI9G<-<-Eo&i#1duG>x<LNimtDHdi3Sk)>JYmeuk@t4t!Ls9sqaPzcuKu|?Zi
zdzyTnc|VlSrFNA4vxiBN`2@YvhT@K}Ecwu=ZwmlFY4m;chIpS?2vC$BdH+D6qy-(p
zmjzK+@!crP99@997iVEW40#QnT`DRFYyr57s9ujjAGkDeIR7<aqFt@7^bdL7#D5)5
zqg?PGc!p=vsMY^<I1QZt*WtXb<pfOokHeXu(pUaphx63s|2dpAna{rtr!&ca98Pq2
zvG04`K)hYqJa(3#_jfpd(Y<EMS-6sJnl!2Bm!gETJS9pFrID0b2Lj!+Y+6>NLXh-G
z4I~ZnUy`#<BmW!We@RYdx?hsh;A3`H&%Hy?JD_oG18VxR`Ch7Q5Ur$0Mb{JYiR3_h
zF(bSrBn01z1n_~eQj(IFrL&8kkvTR8Q%V{oIG8l6ERKSVZ7~NMeHvZ{CDU(judL7e
zcZvV)T@tP^(3(?l$eSobs)$=E|L9()WuK|Bqu-e>8gZIZ_HFe3V|t_Ak?}V^Xz%7k
z^|5Z{fpDZg_K-@aAd^0jICC$#cd~Ql{(xj+p4BXtL^vLnhr=z!4}f$p83YJ2kPpaD
zoD_^S5S{*3H~TL6O7;!fR#o@;`eHemnlp{#G6~(|=KB2l{P$IUUVfO|z3u9G+cd~W
zx7|sqOERPjK7_ySV@LUWawVHO>>K|@KF7AX!?wv8+I}Et|Cj2cc=%`E$a28U=w|tA
zV_Tpr!$UOAymlM=-$UWP-GLbs33DWOiI#JwU<S9wfvv$2hQ!lPlpGR#l<6>%We>Cw
z7z4@I`8JixPtTne9H|@|3FosMUq2U#UiQ2F$_FtN5*`dWN2(C_xuE_N(tcS!wX3Zk
z`F0MrzekHqP^lu})~i`;<O^xv{obr9x0Blf8`2ibby|#)xpD0*b;b}04&1^))e6a0
z?u1dTi(M^ShSadZSQt$h&Db5JK1tkBeK1cw=yKO_cYpo`SsvHjj1k`8U7>kS*#<iM
z4&uwdcgFq~3s&t%Q}>PRE!#_1rmx{_m-(8TH86$Vi^??XzyueFqK*Zfdq>*??;9__
z3+D&&e}`07xH{DDq4AVikn*l2U=_p>CaOj8#mOw~<FW?=i*qh(DB_UQVgjN39%kzb
zfv!NQyU_evcWk#iaTUX&U7ovtTvOOxo1+A!XRF9t6@ZHCX=G_w{4Hb3`PBW1Wh1X2
z=&6X5#|ZSLWM+8BPg8ov90`1wh-Ao#LA+StqBDmo1k^lnQIdOvGP9)c9nzf2A#AYP
zq-Y9e5emTOX}Cgxv^Gc47MPbDBrtf!K86=M(-z{~BFby%5U(zTx4ga-L}0q<|0OkB
zmPvCxQ<4}WDh%WAk%=2Uq@&%e`+yq;kM=7qO>jk}A>N859y>lD`7LOP%uD|B%6t-+
z4xdP=jF-I@OVRq9!zpmTavJ4FCEuh}CY*?QIS)h3b?cCsQ66&=u{Qswa3IpSgj8Q(
zpH$3z8TfwYcxt&3y0IL_w-6|b5PX4UBKl%CeXt*KuKhQkjexZJqwdhiPJP047nuu=
z1=}yy9b%8o;!y!WHq>Eg28TQLu^wBkD&3gO))?C3mMty%8EA?gK%^rZtj;#Mq@T}a
zTTVoT#ODzeA1g~u$bSZ<4E_y}(i-X)DN4PC*A(y^ov=MW+_Pk`L`@$6HAYFA9GUN`
zB$Ac0$ux&jp)XI4)YsGS*+Jx5-_f2gu+kyLJgVlq8mrZNf}SP9I!nRi-^2iLosg>+
zYBbwa2dGoRq0(#F)DJhV_)6v|&d2}``OgtXpmLkhV;5l8JXtswmzWU}QOQt*AURmn
z6xASX7)Fp{3u<8&3ub^Frb&PS-jRD3<4h$j7?4=m(T+#n-a%2F3<vTMP7U3WktK;c
zbQAFU5g>PwRBAc{2B+2@s_$@}$G-PBcF%_&Z?49Jzo_0I$46X8^Ciz-{Q?Uoj2vcS
z#L$QTWN8B{b|t@iT&usqPYxva=zN<o&wXFrrF716E?jd&xBB(A;7Cjnj|O%N-KGRz
zp5xu=e%{&P3hbTz7FmpCN0+Itk*cln8r>iU0b}pp?0waEnsN=S{`NMKgjRB%Vtb2^
ztm|3Zcw5H#BSzEuxIgipRxlyUTcz7`Je^}NyE$}Knd!c&N1rvC^)cV|fT$Hf@)DMU
z#_R0UN{9y4BIVqy!21&RamaPOHr`cG!Isl|<DC;2UuAW3754<HCT)ET^nHrIhxs<;
z#-&6|Z6rj**ztnfAxqnS(w*Oj-&K)@Epoc&!Lpl`gRm$|X9d9joF{MS%)((`8kbT0
z_UI_Zjtfb}O!;RCN&cc9&V8xF3gPaQdLJjL>7E+YW)c4e23aMvs7(PeSLhR(X8Q}|
zx7;)<1&stt@@w9!tu*X9a)CfD8^c}*Y;nPO#G*)`TO;ChGG1y+BRZ&GXAop2u`IeS
zSXzL1^y5Q1&5y!)FRtbpDiod=S2yHOxYPc(m1(&-5=0zHkb*<8WdeU}Jt{*YpqT^&
z%6Vcj%)A4zx>5!0=UuqDWy5$OV|7vIH7umkePHj#*xQWC_0}<TSUu0`0Yr)uU8do}
zWwu+r`&%BIuju}KiHp(d#ex~yT)wk7$G$MRtDdhkP{jt{J`XWsU=+E4gf;7UN+E}4
zUh9+OLcMN)cVa4`9EE;wek-b!CRh=1Zoa(j^NiOJX{wD^aM|2)Xb(^9Sst;l19cl3
z&v1RBTQyJ&Jr8yIagYOgqveT?B{(efd<o`4zY-JO!tr!wwU3+U)8g9EP|q?Q58JE<
zt75OKwfh%Rr6rU*ps8w22N|!+$+e;z++YHp3?28E`X5<lej#*kHTRVB@~47hLi>r2
z<1#Jay#mD_n@m!wwbGBovqAoz#uGVWfI?ou^RY(vDanM3LOP}aTjj}OUd_1-b3TZ?
z#!5~mf9S;=49i}_gs{&XKX+!$DAJ%*-(ajzMt&Bovqj8BYvmeLkbeVCmKtQ3HZoOa
zg6UkGR8PyE-mHC=Huf34jrx8A3+D{K+t{?Cshr%PD_N&j<Eil)YkBC!rNwyEbC$MT
z8kQA3(+0NNi&CYiM=XOP+gf78=(5x|m0X}3-3-K_WvNdQWlfc1%x}IP+DqoZ<1$E6
zt3{q5gYh0O6)XXaKiMbEg$Np)8?fmyz6tsAYzE*9{@%o`vg>d13kl2vA92sfQsJ8W
zUUU45UB-Ike{(mGTO3Llw#vnP?n784xHcuI!3o|1r0dT6$t`40Z?Ys7Y<GL$z#3)2
zCuqbkXS5Nd%fTHA{&En+K-h=w90Ce*mYZ4T`F^{YLtXvL+$7y>V=N@RyN=jhC#DSu
zDHdW}ldT^l3kjz0W)@;Ox8A4~F4?qX>vI12zUp|nS#99#vOZb%`e?oZCR6HZf+KV6
zEpvjoV8TR!lIFYd;PQkVI<gn`@_c-P2iPKhPq;J7pD_xX<_LoLv9ujp(;YjEkcsY8
z?E>)%e@^jru~nd43VU~01Z36)CFX~STJMuEKkwxaZ3%)(!zfXBYI69D>%!B##Vv|z
zfsEWH3Lp$aO+boM%5rJDsA&?2Z7J6+&)X^2-OP=@@}zNjrF*x8<wV^v_0q70n1cc>
z5Tvr<1Iyd~?j}~B0uI|Yo9}(uCRc8bgS**8Ne<?mgQJToOl&1rq(w7X-I76NPI5%8
zVVvrgFQ3<9h@%}?8Auy9a)<Kgg0w#ZCJj{j%RF^Wu~&9n=iB9?qXKrH9=LYA1EX{?
zR_zpqAvQ#{H$!Y^U@8BaI(<Y2yAwne1UX<z5CbhOcKu~dUg^%3aJdo&*p}Oz(@!8K
z0?7>liH_g1fJkUdt#C}3r`w2CZUs#hJ3&Vj&LvVj@{SUC$_z1w(wc^J*I`^Af`cTh
zg=h{7X^7x4F|sf;1KRlU_w?)u_=+H8gpAY3VrK-2j<xEKVMi4>Eu{()o3hVR-{pXR
z2IW8&5o!TEZ6*SxTnZ07)19e;VWM4}SE}Kouc^$}we-uoXoap%t>-7Wi~VxfgFH2=
zp$mZE&b_%45#{T(RKR*6)noCD>+%>$$a_OwDpNKVHnzmysjWxI1tdM2t1pskyRSKd
z)?|b$EQ|YQS@#Xijg~L{n{1qpmk1ZHUGD>v*SOH9fx{;&>0vORoxTe*mw;C<u`e6B
znQ5f!Cmm{+WJ{!p>$Q*RY;BL15*M4DujB8Q`HvlU(~s_!=Sxo<wElV$;aIH{(1+DF
z)0d1hgrvI&muK)lm(t;tGMgzG-Ghn29eEZiOg>j*SnsuVd>gU~eNdh0BeM4bxVxYv
zp2K8<j48u09O)n%Au>7;p!l9(R6?#jx8m0xdT3tLr2qI1o8wh8I&Nv2-29Q@hcJ>f
z)!!o(XNYrf%WTLxra%eqv%Su<c_QflQ8I-dLqAX4aq2=n3-MjfqXiQa&CwlbLo<1z
z5iI$1YX0nU;UKpy9(lVOn(qlNL%t+1!OYV|n5x8d+sS4g@r0R~?_oo;1fTQ%X#Knu
z&bGQ%MP6gh2QfAa3w?3Gv>O!9)gQ&DsZ6_0RVEbuwVUx${<&wBDxmjS43QxwhBr!8
z<PS(HsKFJkmN(%l`m9P!Hen_CDoXsrYIGFA21s$kRuq^XFp}sKk^&2b0l3pKXs&k=
z`kMHDL(>piN9TDP(a@&aLr~3ac6RP5`5>j4xF5nIOIrK&L>inQMtpvKHSz;Tst*lP
zx~LX<v>b|^uWf2mnmQZ=$$25bU1I@Cyd|tA6qogv)OM4hUjF?Z4N1|r^T7~mwrpPm
z|9iPw;Vc=2Z_8Q^U<kp6fH$rLlB>8Jt{nBASnWkWGHqTTZ<pSmJ3U&<3U}HAhdLO4
zF+jmA2W=rW(gSa-AnX~s^&%7_gt(jeP$0jKBsh*bNT>`h@VtnC7iQ23uTK$161d-Q
z^p$8c5WINrwxbbg8Cz5WqsAIx;Pu8rJaqXY$xwg4QKnBYh>iUn)h}5hws=MfE;}q5
zhd2R`#~*pI$24r}o+k}-Ryg1tLL%BN3*$gv5#Q$ekeEjb0A-JZ<ZSt`m9&eVg+2b0
zOVUU*sPZ4#RN%MueBxY4osnAx*<X5ScH2LpOrSDOFm=9|Qf~kNw$F)2G~OF}tDa37
zDL(d;xKSG>Gm_n;a=r*Tnaw#<a~8pd+Ai#NUTuu?;7u6@6Ts2q5_=Q{!QdmmY7!2x
z+x;)KB3oh%fiHGxV)1yk%phLYY976uwfyh5y*d=SpR$B8S(JEr^4-P9O{DU@a&rMb
z!&HXCXigC}n5Z=iOy~2M6nm_IV)mr!pGoIDpOur9)>W!BE*%&jph26Ep+B|r!U}1b
z<;(aD0<^qYHOB-+m#>jE?tF2;t{gFtD8s=RmI&Kjw%WSLI7VMFZU9~9+Zit%Kr89~
z4=x1u^STIQWpzxtPK?yF8M7Is5}q~W?~KgZGfG2+Y7$r>=@x?8{8P=WU3pbW)K#E*
zk~gsCf|GLA>i^V~vi*j_G7d5>IS-Hib?Ze5S<KsvlTcUiq6~6I`r;q!N$4YK@DJeg
zXX;@kvVsuERQ=S)QI{pF!vACw=Pw@`gn{u-{;Vl@jSUj;;m?g%E<G+Pn{_Ht+L1)7
z;MyKt)?#SMD6=rL;S=WEgk%tFyqN)z?`5>zpPru9y59zkbW#sj{2bxTF0^JK$;(|d
zw-L_izEU88%8=tJG_t@ykg0P1DaJgacNjAF7y{b@{bJ=f-(7Pg+!wVtMVljJDkz<V
zZqFx26dv1^AFq1L0a{l`wHQ@FAs)L5$J<z}XFWe-ZrI&h4zg#Y3p*_3Uyns=A^<6y
z#C2N_up5DO6_6yjm2$cq@#j&_P_axZvw|&r4Zxt&L+=NVtPCi=!+?*@TJWmd_CsH1
z@H4p`L(cguOX2TTAEHd{T>uoxom8q?RJ4av>aAwKl9R^bZrC3<&YN2J+(Z;M9&jVN
z!J^rl-N#Ei<KSxbnimSjJw~X*PZcI`Wxh9jATrWRAybv<w#lk3E`z`&pmf9^yiOFb
zA1wAgvY<9)gw+_N!&MHI#&T!|p(55jq#jG5FMK*yZ&u{7f=wO`&SwNCKs{jwj#gzd
z?^WSBf1;$cZc7&4_{1~<Gt2%w?}FP@8lktY1#XG*A>%%cbrEx=>2k$$^7|N<K{ZeW
zI_p+k@5mC7Sc;Z5!VDsytdvPpOb??nsK}MERKkq!BYnG2XxHdUFb9F8%OzwqA_E&>
z{km`&b&yI49ap6acdf_G!@6SSVFxY;23WVFZkMf_vSL&OMzc-3wf>FH7w4C@K*w|J
zxq}-nxlpmZnOW*0gzrcmfD)`j9gfEiM1Ps4>k~042Gf46Rap)=zo$}~GH^q6;?Y%y
zF|z&9u(z6qqWGv=6PQF4)UvIoQlT0HN-{fyTw`32Bd1AGG{F)5)41$5OEwSmN#~gH
z$c&GixEU?h?MK}2H@61T1c6ef6ehnZ!L_bgF>EAt+KWD;z;T_6gc_@N<5LsX-5CQv
zi%#}t5xacIZ9Qm05&>%b_x?+ms!bY`M6<1V6`7iZU)M=rWfng0JU%xPfwR!L2)Rpz
z-IF-g#Tm)VcV+{aCA8IvahuIZb7HElK;~JIyDH{0Tkqda$Lo(E=0aW%BCs-JNkgsc
zpCQq_KRor(^a!Ke9_HqI9q&;&7c1Xjs_5C7>fZ14p%imDM^bQYjxTeezImmgw2~+I
z6Xgj9T&Snu@I9DetG%w%Vpd9#@O_j~Y*zY4sVn)BnEUyZHAn?^l`(difNH>Hy^gtZ
zm@6=TLMC_X3sjg#YkIEU)xsMvN@027?7T(pPc!%Kx9Xhu8SH<ct>d{=F+K8o5TA46
z|Cu}TcM=o_!ft4V)*d7hE2cuLU`HmM_k9Y{7%u|9HNX^;@ujOxYo*tUP4Xy0;jyGV
z_xBY?IvNf6D}psMWlIe1myYC*wI~HgoTWh}Rfu_YG#dKxAsgwdqqD`IzkXeN)wrz6
zsoJm7#3?b(v8}Q#a?Y}uQBV;tS!=QDQqPZSsCRx9?I$bvLRhcEcx8^_K~l!`nY!cu
znIy$8v>DA0DaJ}fSxLU_$ZH%X<#QeR)LKytj)h$kzCp7&lK^_ifU&G6gSEe4wV(NE
zntUp}Yr?qj80?G9qlpW(pjw$}M>|f=&J^0%I|sdsBy1s%<;8XRe!p+~aK+N{en+Vm
z)~?uuuko)rO-cD-@QBQx0FPpmexK_4hZe;!^o3sT^W6G<K=ONSxpd%i!G^msGu=UU
z-41WE^`NREn<Z-KPqc@wh3#aDrmCZ*-R3s@b!+4Gd^-MhaGhXVlxok-rxX994@DA!
zD811B{*`pp!#kqf2spP`wfp`wmg#ZBV6Vh^Gp$`*H7DonsaA<oZ_Lp9gjKSsNY*zL
zzCQ_&y8W&7eGU$VZs>e8%{IXnU9ID$l=A~A@S)Wn6Cz3%yli^oTJ>gK-T9j7Uc5Kf
z=jvqZ6+HJmv7h;I#V~l!eI3a3O&3vuk!+Jy<>Ll(`|sv%JRrk$IXPP}^smdc(%0q6
zylbb2Sm0CR#;;scgenXB(=jD`z;cUrF5R1y<$XV6S;Nv|g>h0xu0WD)__WS<-XPC|
zr4e}s`>;BTISc^Nu^zh2;>mW88V40IR|@e4Lz*8ZgpG{wZETiH0tLmK>{g_~oHVCX
z(~?&ydR=iV(;5PqeIJIdk@W@S*pl>?G4YVXP-A~0lMiuR5vw0$WJp3eN&Y^NgP{CH
zKz9f6o7&q<gbtos+vv2IQrqa1m|9zp?ZjsroL@G?-g#eNbEiI4x24Ju2#4}s@t4or
zu+OKF_U%`)krA-tl*25R2OYTTGfxdp8bX%fzCnT47n_SG2T5gYqM5G5UJUPRkT;Ps
zJisy;WNxux(YF|aHAhWnIX9ypn2XF@?^ApN;kS&~b0E^A;{4HoEyOL1Vu;Casyd5e
zimH~ZXrFAYV^vL9Oo}A9v(xOU+wEq*#;GkMKMqsZ(JuEzhVQ<2Ig<OdfFg<1H_Zb-
zZeR#^Eiv^>A5z$xPjK93NYhO^pZ5Tp5h4}yqDqD-_!1o#bedlh;JHw%inAgBPa{??
zE#?soECzvXh09f5s$Yk!uu!KlQ=+K|sw_8d(yPT(qq(Z^W64zaz!Cwy6o=XD7e650
zqXd+f%us}@K)4*&ucG1qF~5k(KM8s&I?dxFYj*WMJF}kU;O^*c6g`(__;xz`JXhVA
z2OiaCR>cgOHHl1pHt~l|)!<HmQ}4@=CaN+REyO#QYcs)6#9i@}^LR)F%V`aZ8Dfsk
zpyJRX8BALL+{eHH?eVFZ^hwBOJT^3z^k!^cFo@7$bg-bgseyfn+k=HN5sw2n<uO-z
z%7@s(a+a<!cC-peh=-`2vXR&(2B|7_&VVrf78C;ycWoCDC1C+kWU`SkSW+2GZ}Lzt
zL&(7L9o7Wus<g6n>hh}WZ`zQ7r)nEY{;U8ZHQw@?7iig1sbQ{KQn@wx4LILQB&fKF
zH7&He7h-5Md=!WW!JgXWJY=s^MSNsySmDf}>=u<g_lgP7xSpw>H=h=eJ=db7@OykU
zyO7Im!wp*MN$Us9;lN^ifwe`HK3E|?cL7;c(6rM&m)fNx%B0(ADxEhjn9E}UePPTS
z%a^VE3lZl0Y-**VlpFX<1+@h8ei0iT&TUe8VH4gzg32N;v0~*;T$CyvQTV6sEH=?V
z<V0v75>pE~N(rMhPGLOQL2^+9cbh`KY`t+=yv%lqqwKW4gNTwOAQQ1^7`8f80waMa
zf17O$Rz3@<0y)1c_uRiUpL-Ngp1gwkV|R=Aw6lcTutx2JP}Gq(F_}=*Qii(s#w}cV
z@|)k>=Slx-(;D^9<rv9J9|O{#N(Nc}-(R^cJX*UD2D@U6^?H(ZUiQFt<hwp^Ot+6K
zUfZ3l%_dV1IWg0tZ*XB8NScgfPn!HvfBP`;2D!hnHOiow43lN-1>VuGDLZHqYl4{5
z?naq)yk!H1Une<zdvS-vxiaD8Kz?*1fC^~C86NwH@?+k<J+$0jfYosV-3GUkT3_E1
zU-MdDo7!U19zUn!Y-H@!YLeHV_Air=Fa0l#V@nw^^v^7!Vw<Q>{zh)=q2Ca_39W8V
zGdaH)<*`o3&2*O2j89~XnHzOZwNCeh3AXHo+WK$9OTbE${nDT0-xB%Y2;HR<TBp4C
zO;a%tQWcY8oSJ6S$GkMx4z-;mK&n)@D|zx6H1$)H6#hDp9pukOL;)C_{|I8i92*3N
z_8cdj>ndf8VZ6y<j&rSYy6$w>Ci#NQ*~RZ|oD;uTx3~Kg#5aHrpn@*~ce4EAtT8ur
z=pr?V-K}|-+sbNX47csmWuPajg}$rEYvt8?TZ%Isl>TbwcwxfC!S}+dE0+oa0qK!p
z3MP<R)^N3alkq_5CNg|-wkO^3n6gOFQD1n9l1gX*Qu%>`@c;s5ODY^o8sxxcSnU$?
zet*7#Y8$JzQ&X8OR`4>DuYJ9@`?j33Z%9Dk$UlLua5Vya{~Q2?J*wMJ!)GAnT1C)L
zUK8G8nN5w@F_1AN@2}JGvXFYw7Jx|1Q?p<xt^Dt^wIb6M@)E<2p|g>vo-HE+-i}bk
z_7NG14OTvuM-3Q^angWcqy{%JN>)U?_#CT%!SE1lX7tHW^R@dyj^P+wK+CYdF}07?
z1@(f5V&3dKS?E{sOk43*hLlaNYq3Y9qGgOs+eNTYxO6jcNANtujm~VCkpvbs>RqpI
z<6Rz-bjMqoratPQ{arpt#?glrW$gVo=V*9zG+cRIcbCja5RAwJ4wb6)KA{gKPf}^v
zk*${mWFs=8hc%)4-|QNgun}84>|f}L7x$_@+Fkbx^9NL}Ov7E7nU?}e)b=McyVBcV
zw<2E?2T$N5o3hiTzP=%1;bdBz`PAF1N^xYO5@PC|gZ3<^^a(GY#|Av3sA;$fGT)*O
zT}w5*cDCL(+3a(^cX_Ia`d%u<s=SWr;^3|K-s9t^pz&=96S8~H;lIX)SAcnx!+@QZ
zOaV24X?kCqMk%DfZ_4XfFS6)9rYJ#L+9F25U+%M{_`ywyYK3Hz<)6r$YTsuWwgEi`
zz)1Ri_pm5ci8Sog8M`V7@yf%A9Bm5~XUmYrOS9R6G6UQeWCplyK&~{~63f8X$3YWJ
zj((!sk3=}3NpPELF&q`^vb-4d%NkO>O@;}^r{HXikHaOU5-piU*eZ+{hMMmB=RSa%
zT6x=YVY8^<w5)QaTfa@Vi;0igwhsb*$;{21-!?tOwmn+Rj9GT2ux&w^>{0AMX+Kv6
z=(b&FhM>!6`3+SCUSpB)CWR(u<zLSU5=&1kNj=8QEm_};u4zhUXKxTc*lEC`mcOY}
zlRgNzC;zAntJW@J2)9^0*CJ@<hLx_jd~aigd`<#oe$qW@0pCsh9O#fvJZf+aHgk;d
zSR@kQ5alb_G&#jrF^#5|dI!KkX``eSrB;cjP{MHkyAKBlhVYL%EHY_PP+Uoc;nw8f
zB84>OV-^CRBV5l1qU_bauHZ3x>$a613G2S4ISEZ)Q->0VYR~7WV!b>K>QYtJbau7R
z<H~!JIyI5J`M9r$9Jb^%tOzlZ=fu70xMK?|J#N{@BT)FiVzz~H83uWjf@7T37Cn|V
z99rv^8s6pZU2#qycrL@c#V@-E9d_NcAN{v8`}B68z<Jx4r5fD|AA6XY^ZEX@CCdh?
z9o(nK%_n%<17y5H{50OH?AC}QU7|5$yx^^(bvRyULT8iH)tRm6HY&2oo>=`jPJIh1
z85MxB(ZuecBmQ;~3J8mKjAzo7#8j>CUL%SaHd4)ls6a{W3OehD(v8Sc)k!alB8CCY
ze|fZ<+bGxUSo5B9M934oNpA*6+twDs`RmA`ymr@AFVmv=e!u^FfWMr>%Hv7GhbLz3
zKZzR(S59Wz-_TTNx`%EQwWybNaMJE(yLGznU!EVmE!EU?GL!x9R5ln$Z$kL@BX=sb
zU@cx<EjiTW{CzJ1a-X{R)tzEKux2;EWtW$}B`r)gN<y~Ct1La7EBnS5Q#2O%s?K;#
z=zPuJLTqm#&+r=X=yo!7-S3LX4ctxf%NGE&N6yyXn-)GNAAxCgc^#d?K!QG&mg=*J
ziT`le)UtyhV&aEC>tsx%SZw~zt;MDcgcm_oS=)yaK?I+){_;5M;#arQhE3)AKkmn}
ztWq|x&PDFKgYzM`ae2QiBrl}nGz7GVMT~N5q&BKjf5?fV&oBph9CmNwswFbh&9{SG
z6y!ft2pEsAOp`BV@YDvag4aQgyMWEEhEI!bD2%lwTJBThdYvCXcpH^ic>K;AnKrF^
zKvxlafHQ3|cylOkpNcPw$b2cj-0$<Dhebh}lF&QyT7DsnL--<8#4osrM-Dzj=ueAp
z6RK#N@V0@oAK_S0h1Sh8q6PLZhQm&emM_cd<xfsIUR|GiMEgjNb6xIT#@dZ53%c4A
zTW3dqb9%{eN;E!778MLp{n#RpN*wcNpC<Ov66FDel80~vFdqOHyg;!9#S2L&S6hG4
zCtmr%g!sIW`P}SnlX+~kweu0#Rx%$AjaM-xivl68?;eCD%K>z{OhCn@2)YzHujmfJ
z43mtl{uX9&53gsqZ9x9I@4bi=iQE4fU1iAZnq)tkpNNW=$?u{`Ju*9?aoLL$+sa8^
zKkbaWg(Ni9UAkd~wn~udF-ho?)S_Do*VB%N)R~<kfAqG(Lxxobp&;e~0<;JhX4k(a
zX-_%FDKXsDy6rvp`#^T;om88iK+)JanSCXWE_YnkfPq2Krn`mTkDKGv2|yH*V7zK9
z!|wE?>v<nmdfK}vFqU`U;>DD%uT34Uki&0BcT0+<Ti_|1BfR^F2A8cbxB*h*F3hW2
ztAlcT;kfg#a0r&FjYtsErOQ^I8(AIaJYM}t?rlV&TS^^13Y#()sy3}JyuE~Im)C43
ze*ULJ5k;n^DLGR=CXK!CujeH>1NO<3F5TbV-v-U!w-f}z?s92{j-$u05kOQgxbtgO
zKXx11nc7|Knmyi&T|VC_Lko1uQwGAw^cZ_;{r&eV_X{(s*IUnzHi>`N%iC-=smA}9
zLsvAEY(3=cxw+(M{U!}HclpmxP@VTW+q-q7Ct4Y>pF)YK2BNZHxqV{mfF8ccRNs~;
z-=6s=D4o;v4;q?4DQH@mJQgmjEqc4wsvllXQuX_`w%~2`zJtUQWIEIA5h<-=_P5!g
z>7Jf@NNi1|tG*Rtey3Z6`!DQ0AE@i9R%o{j!eJCt`}^jIr0fq^v@2N}Q<cXt<l*P+
z5^3&at2;0qsop&FPFqPf|2rY|0QIIu8n@q&OnIfcMyr@`6JMIfLiU5;ipJBDh%lof
zmgV~X5C6GlL;Q0QZ4*uVqL0lIMPec5Xpo~$s)cL<rAl8$vg=j$7|%gM0=35E{K5v+
zOpYQ^w#NmWwWsql6^yJow!pS-figzSU+Ko8h}1VOe<J)g6<uDRK5h>vpoBV{z_CDY
zI=t<Hc@&Pvooe26&9Bd7#+=>kNXV;W3=_wa^njd`ts%o~$@&_-Ru_1mdnvI^@wd(S
z{+&GezWl;Sum*JYrmfesZS~%UUlS=fUXhG}Ia1f*z5}%F`gW0<_eEg_OJ=6SCVkUK
zMf>W9+4}MaB;`;JQD%^wG?mkE8D_Hqi(-;~bFKInpvBk3Mx<+U1a-r5!nF`@dYg}O
z_u4FlZCCDWxu<b7-z#`O<8%cM*g}0?JxRHXK39y&aqB0qEFsu_c^`%#ptLO#q?-9w
z;r(Dal@6T@bvb!odpiP`-MYX7#ijl=lBT!q8so>h*Jep*>T7>6uJ`p1l3b4oNU0Fr
zpp0nw&7U(E<{KAm?&tY&!e%7FpHkq?>2MY^S}UFzcZVfm9Sva}$@0U1@9uoxM;T%1
z?D-ZjLtb5AoU`50_LmLr;y&DNEg_XSFU8HlFrtG^T0M{>%p{>@Q?k9F;0Qt74SZeu
zvoyWm13d1jQLg+$=6J|%WEvY?4V|A8j!JSK82tWe4gK0kX}S8w&yk%FyF^LqlR)Iw
zBAktc|MS2k1+_;n{bXat%nK2a@lWSjlHetjG<}2a%E%UTw(iy(w^9k~;>MTm)b*@P
zLqBhselPf($N>O(P<4(o-K6|g=adS-<nv+Jk?DgZbW8G0W|76~5gw0bx0B|bxlZ?M
zZ`Tusi9&9QH~mF+Prds&7bf!kaCF}9AL!ScL^DYEDmU^8lLqM_+!m?tNbylXe^khh
z&6XuMO);t}N@T=YCHtcMSE>0MWIqA!q8lIo+)$)&<ygAwXmNi_IkhgsMOQS8q-tX5
z_vzp}+Np89_jB7PpMR#{ZEJQr&QxrfMG6&gNKLm0&ET<%9R2o{l|V%w`+wkq*k1*E
zl}223jt3IRUC9mYo+pVuz2&G(e_itQNI3^IGUYFD3}X*0yr}l#@<$@lNyGon&ww;M
z_20&3*FA^p2juGzpQ3G83#C&NYSdswJ#@NYxsiV7rR|=@m713B?T}%Utcz7x2k!$S
zTHBC5dKY?Z@{VQQ%hLRNv<=(JzmQc)w(3*e+0Zvd<8B2?({xFbzjIM(WYfXehBCyE
zB=lIIph=8)tdurw#9*+;{u9qOj*#9x{21R-4*ixB*YZ7~Rc`LrRK3SKnT~FSWmEJS
z4C9zJ`gQF6HO^t74H&Xy8*_hAlbaZT4k&~+vS|6BOU=y#t1k$`odJiODOR&tfRKD*
zy48@93w4)}tEx>9`?eHvUDfbJM-W%=@o;t2fye)6`{6|l#E;GlZ})to!L3#HX_vIg
z+YHk*o>Y0!GR+x3_*Itr8E~!Wlts~6XI-!PvxHvhUUA;GPec42Nw5Bs29s*T{HOr%
zn?&0BYnb9?$Qbn`XSQNUfK;g-GtbIDox(9SN=q+>tgr(tAUl|CLm5-|-11EZ_QLJ~
zl{3TpYK2dw0d;m5LdUSM2EUSZ&}v}2isLq=%7cz3|97nGLRn2ANu=GvYWD7EX`A$*
zVGVDIX}X><&&40f-V4TPN;!Xv{KMc#@eiCIyi!yz)BjXZo@c05%=aPkWtAiOb+J?^
zFCdV>H1L#Y(Qxu*H>LIeJw;Z-+l8z^K`3u5MM1#djeb1}OvpoI<hjA~h5C!WZflVC
z2xDx5aPVf(%&c7L_b>aXa*5ip?sAbhRLi~!$2NQ^q0$XIgAMvV52`}^D@sn0XFZ9O
zv>dKC)N+9+L%Y%5f*1fE3n)lhZV3LI&|CDrpmbyv5m|~FDYbW>7OR7ASdG#EDSKWl
z+Nmz}LDX7Eq$RJ*w@;&F=~I7j@wj%UV$Qc1J&!pzv#AlGati|69%0pFaecTB3@}9)
zc4rZtAquh(#<9c+CwyKj;cp_JQB%I8e7j-smJ-B1i={#Pm$u}Z<!FblbIILcdNI+z
zDwdN=jc0WD(e`V4onIzJcXtmd^QFIIR18S@md^z7YjSl+d#1CTJG8i`*xs1o&#|t@
z9!wGtMpyjejgUGCh&PgN<>%+Yc!c891e;nX0Uc|KMy!ZYNwksyJLdz$R0L-rKd~Lk
z{ncVgzg$MypT1HW%`%X>IT1<}QbXhx`8LNY+x%kw>IqZ}i_$EtQpoT07Fpz3R_kn$
z!J3t23YKR}+e`)1^etB7?z&{b2-O~S!Q%N4&ucNo5^UG>^<k0RAbBDHXTSJv^nXZg
z_2xfJ&*Y2WrWX7S)gq6qy25wWYIRB=)3{?w3r%GXefl~fedsCFHSI$v@YtWjg!^fu
z1DZ5~!5Jfl^KCD+;LeGsXC0{|&641OYw&tez8B5D$hqQnPG;WtGl}*VqGE!*4<fMI
zh{@s8M_jW7&BI?smQCay*rBWjIHD{W6uaFiLdX_j#1t$VE`C};c~O`~-d(V1UT*gk
z-&LpszdzV6$6Eo37UN9?$3XTI9`G^UO)yHrZq&M>Nb(7Xobw6jSLgM<4-6=!2`T4c
z<aavhvhZRw7A$>2%{Cj)BQDK0A3J6%*()tB1Q`!7en|Vo^?=b#N<HkUp`^ibE`(|5
zk)$6QD|AT2%|j|De$cVTUXnOJR*iXKS<u|h*?#1DW;E%+LZLn3pj8_q!I8B}lsuKm
zW{{Q=U(LJHes7kB(`<qmamKYABF|S_-Cs{pBTBh@@)R5B4TGm*PDEED6kIButd|<O
z+r?8}Nk0}hl|dOby86YP|3Hx|SF?`EeX=+#Yuj)rX21*;cxuz@-ZXPv){Zf=pII#^
zpzhQ7g1Q(eVj!H+16+BT{SucPCxPc|e)pU8&5t6;U-(8(Pol`cRnY1pX)J|Ln(@X*
z$GT)nI&0#c%u!Jx-SHU0gDc3ypZe@GWkj`FvyVN%;Xx5j-(KQ-MSv8zZRP$`weM?|
zpx%Xd{3Q^o5I>z>E)FZ%S7%ipFSa{fJ|o#%XV#o8*BdQfI}n!HiBcZXL}?n`O4v`;
zSndJW3K>i`7UNzzlUeZtloV_mGjN`)mAID#;NGWNAx$p&LTOLK2#iu6?QEi`2g$Cw
zP6#za-`{;^26DO)sjpcAfj><x`v~xaWTM*{GTg&}&iOdPfoaJB7T`Ea)rrLq9Gw+W
zZVOBz++z>QiJsiyaGa;UicWkw!Wde%<>?f&K;T*70ffKXnV~-@+!P5W*QdSz7~t)H
zgLBLu`%OaQh43eq;CCgL{AAj+$`7{$`7)pGy@8_Ds4NQ^jrt`@IrxfM8pRI1Wp|kK
zy2JO2;K`rN-XkObhp&Hbj=X!IhTqsuCbn(cn3xmW$;1<LV%xTD+sVYXZTsn&>v!MJ
zId!T|Rlk7z(cRz9-fOMTe!~$(dQ)<|L`K{f>*951rqT=X^H>cI3@=8jnv_mS^#Q_{
zm^azfw}_1U{`9hK=s2ilRz4*^g?N$$b56DzwD=o3=sGmWx3@fz*YBSBDl6g;^^VCu
z`NU_^omRFtuf_;QF&4mKq{8LAFB#|=X#73C0Fn{%<YW!=m*>k0Z!JLg__n?5joh)l
zf!e%rw{>j?j<164lhs$I*D6sMcJS{|kd_Z%HM^pMK6lqIt<RbLVA%TTuGe!HBJF_s
zvmMp_3{HAL<j2&A-Vp=!9(YMlAm*{PQg3zjdH?64eQW(o&HYR3%6jW*F)OorJYZII
zutbzMz{Xhp^+~-6h+{jFHzv-VA98hGBv>ezpx+qj<a*PEGF@O3{ewG&<#!UKNf;N{
zuVAsZ6%<)gUpt5z{e?B%G3ZMa*J$BpxHrL6!X+O2=Z9rxdG)#J5L-Kc=A0it>rJ>g
zC4Ym52Ifg$2Z=jhu6&qs*O1A+P&X&AmJGmZigb%$?()1X%`0gSd-Vu?uabr%F~F7z
z*{xH&Jj|m#qKp0EE0cem_=B<*cHko3CNKLtfygN5vCdN9QEz`gGg5CEV=Di#u>+HP
z3d(O5N!1S7X!YsushVI3&CgCuiec0F?COvMiS#>~n_{b^0zGrBSve0`IJEGUA(Y;H
zZVa^jt<Nzm@mYV+ig5do?dv6$k3vSLXC&Qk##xi}-|HqNnY`(Q;*hmlWD6^kDEt}o
zr}{KaumN{EfoaJ<H8bb5y$wMg?zcg?#;nP?EsNW@@S`cNgObkWNq4<gX7`P|&`GwY
zUH6#LDwOR64cKEDIL%Rhf0FIq_;vB$#y;cc_^l7Pemy5`s?+|#Z}aqU@4F9>`zGFZ
zn8DK7%Bf2E99(Kn42Y_4+?eCmGsE1&PY5Xz^Q253Fn`F=r{iHd4F#i!h}=ZPkCMKT
zH^G@Odx74FGULeWr<R92a+w6k8{RxKz95p>%X_N#o7;A&J7j#H%JsbIiRFFO$>Ye9
zu{k-59EQ}BhPxlKGt(<P&>OaP#k|ON$SXm#otuLi^54}8LJi&Vrez!mex81kjr<xM
zT@0ET*{pnP?qmzwkDU|7mDaX`c5X1t?PygSMJq=w94k;RwJ9p#R2|1CtG$g7Bmso;
z9iZ$fDbd4!2Axo;a2%;;p}lgHwEhPOiEg3K|1A5Uw}#EJzbp5sR{!`C7ZCw1(Q@ms
zzeh1F%{v#eTOSjfN$Lfrty0N^sYtRcXr~Daf<Vrtz0H$oBOyU{z|4pUrV=wy{yk~x
z&{}|c)en}QfjRaNDJ{Ao0?olHle0nI1{A=TASsUORf6*=3j@>cm18zy<FAQ)JGZ|d
zl&VLu8Xf&n$ZS1V2(ub+Qpd?FqvKy!cq>!Es=79CAa&TGoPvBshWrFKjd#A<jyN^o
z#D6vb@Pj};_DAf{cD{_8IAI((4!^5Vm{;)Psi)Skl(tcqU9Mzq-u+qJu(grD+F-J#
z1eqc_908gIOA3Wer9BR7jYoUI7L#i9f>QS7DrWShhkZX0e4c~f4mV<LB$Q0j9M3c_
zJB309H5lX7;wd}4VgIcz=x+%uPblMBsnX|@fDrE}TN_{uj!rF0nEhRt$f*;l6&<YZ
z7scL~3_S!^{v?=@dF};ieghFwWbQ*GQF>5_jzB;YVk|3sl2WGVce#!~_WS(a4Gng&
zLu}}GpHw-=7uGJHFeN|vc~rY!;4vvn#<Smc2Tmu8QPdSyXtnprcxCa9PtBI~Qc2t{
z1Ix0i5^vCn=6lMspq1h&5^oCUU&C;^k)_}OfDma9Y}u3-L^u5XM;&J~ETR!XZdIaf
z^dbT7-7FW!F$W()98SF$nxgQCXzaaVPIdXh4zdH|JX_SGfXlhWczAZ@`kSl8j`hcG
zCm`otqLzAMA7z~B06*)h?WHc@k{*<6cO*xb&EI%CCvkaBJS6gQDuO)^B46oYIFsoE
zw>`1!d%x{%l0>Do{3RuM`7#&t^bwBOF?|4`6bMGjigAxYFztC7w3xOwSXefBxdVab
z;lsQDVFV4vOhbeQW<-c<f2|lwU|%Ao!1!@NB?8GDh|JqO9m#1g+V2AwoJb@xI)+|L
zFQX&1iM#xJO2SKha{~Th$Ro4sv*y<HISHg-`dY9AfeL@{2<Djr`AJMU%k$<T)zFkV
zd!zFU@vRQa_f`%%Tg|LVK8mLm_mia$mimfMstNTcP2wG)U!RoRxp{;$s~V*-ISW&D
z^8M6zXmeUGM&%TJUthU^=2()rn)R8P8@W(jD&K0+#X@%7Yxs)3BmgUf&URSZFSS7e
zEu$zQew_-1%E!I?7W#Y6*+GRi_5QCZR5Q}SGh-FlFEN|4w1Y}IsnruFje;@c0IYmE
zJ7;_PLl3M$rjIh%ntq<pb!EDrGcE*wC%Ij!nB+J_H@IDoct(42kv}>g`-JQFIG&%?
zwh|f!=?5s-3eP;jAdZ~N)wF&CL#gt9)vPmLTT9DX%I|VX9LbbtgmXsKOzPy&GmbQS
zOI<ri#5mcE?435HsdzF#Cjx273Z)!^rG=8Xn<=ntdu(0o%N6|6tpIvf)gnDQS;9Ig
zQx;AfqBEbRWNkZKDRSR2=c;V7M7A!K4K_&d(bS-apkLBZ#VL1|{W&!ug2%`=t|UST
z2c%MVVp##83gHj71yysYSUrF$*zn*^D^jbNBNU;aBz_S?ooRn9P`w>CgoXb}s-3pL
z>Ao(;wu|ow;9Ep_euL@{^Td+!9<_zOj-{-uR}w9ow~fJ9l$hBTrbLXl<)wJ$h*c`G
ziQbnebdCC_3gx-*kmuz2*p7SZ4R6Zd*hED^!MkjX@!_v>6{3>>*XGfCma9zXD+-;>
zF|<{VvgpyM)<McTGQ4g;b37-&+Ks_G<i<Uqv|hmI{Gip!>JwnnNi-u~ZM6U9l=kqw
zZPrY7gK(a_0hi3>x5l~)Kobhmt(=1Z>l3*&u6;AN-~BE~Pacog<8+4gkTXz%jKWAp
zpdRKjzD~<~OJ`Q-n|gkCSafsL;9C1f;w)sT&|r~(2JFOiBNiFjRG-Wu6U%05rF@XQ
z80I7-mL%nk9~&yxz5P1fKBZXRB-4o$Kk<fS!?K*y1QK=@TIw^2d0rfV2(1QKxW)qz
zAreg2Zd}i<z=f+BInF9TcA@*Sle)6P-O)8fP0g(tB(JbAYM^)}aMS1(+_Gk5r(hwO
z6v5G}vva1`&lFK}vZ72p@fpO*#4A<bQ4_W$gc{J7=?Og2q2MFdA^P^ZX;Kuh{QwbN
z!Ne$KoO!Vt+3tmp<}Hv!9jFJ;gPgAbdJy>%Ko2S$1n5B~y)Zm~^`Mo%dXURsJ?Iml
z2c0AQw5XfG|5?5mlMSX~?UvotyWWzic`C$sHSMvy%{ypbp7^cfaYGe+arToMNaHN;
zr^)CeCc3rqc;k-YPs`FOuiT~v87=LCBulH6$H{4*jYz#?<@|$o0}n4re1mPl2gmFd
z`9JmH5dSJBF*QCyRRBNewbkP^6hl`wG@CpL9^?CPy_c(cv}^5M<L&vU{d|vS5cI~2
zy<1yHOG_5kU>_;Y(2fQ}(}-I=7p|6aM#99zmhG70m9X977VT8S0J4>z+bt45k%k%Q
zK!9<M+PicHfD5wunrEf3ng2Gm1>k}}){^91&k3?X-K^-Ff1OHp#Z0M~IwzGaK-G^A
z*ZKAW;(%Zs{EtH_Ot9W?zklN5Phrb?=Ahne9s`LU<DbTRioKL-cg*%~rB)|CmRw@m
zg|q67o59$jrb#RiVdRY+^ucw$z++f5ZdD$JLjv9a*5ieik?W0%XxsaDJ2EEBe5y${
zLz3n}@CY!niKApbhW8ZS(8l$1DlZ2xoOezU@2M(SWRlClkn5nyXWN?}57BDfpZYJ)
zh?lz+9!L4=6D%|Cgq8>exeifC?0kmP6)w}CIj2G$%D+Tq$sP$NP^D!dgdCPir3R`9
zZqoFHAUJroTc=t6Sluv>9rQ%&z85Q^1}fe7LU0|a+(BNy8wLehCcgWD%7L4dnKLVA
z>{Rw{xw@CV+<<f*2D$v{Q~7hlKtD~>Gne{3tL4QHhtM=_4ioKpigTNd7A*)4(iQ6Y
z4(<aOFP#0kT?8xL@{VvECtfdba|FN#F%^?c9s~FwWHM8xU&1&106u8xRfe2!*R~HM
z@7M)oGs?AWMy-SX$x#vl$WX7ajtK}Jdlp;A33;?HIn`;l@H_Uj8T0-$6vnCgi`qXv
z2)Bg^t3T;-4!{Qu3>Q)hM+mM^?%A9-*|9A!X)LJ0^4gJ!F9XeHl~}SY7OfPayB&O6
zbw<-xqIyeUMsKdEI+~h4q6gfjryRKx*^ymxiC2qruRoMf-=I_{zh~jGy&HC+lCUjz
z(Ku}Tf$j8+`W4KOm0=-FWdG3fTF8#V>^vKoD1YxGSigwKJI(dv(xKy7;f+4kAnTpT
zgQ^-rpW~m2VT%YiWo$Q9#}6$J+m33^NTnQ8!DUrW1A#70n@lb$`|@it3ENktQ|1Ko
z)SrEc{!(7AwTRON7(#jGxYAfW3h4g~p%jO7%*e<{xAv44g-&e=Ya>R*m^ypcA2M`>
z_nG~9UM6m&yg}jvPX2$?Y0MT2&$hwl_$pCWVQXhH)AfhDApSx^6o$6Wo`#fIE{3(*
zdd~WuIRY~U6ow^DZ8)#QK_DXNTS%Q#*m&1Az6Q16^$@$A&8P&>;i|*!6T*w5qWs}7
zsk?rq6Tpyoz84YX$M)3mZXJNBmYt1hf7KEBLo4gA^ghY5keHWDOqq$Uc$Rj~c09$Z
z9UjEGK5L%T0l4hW+^Q%RlU+&i?n_ds`c?ufw2R=LGR6C0;!mwz>j7{ZAWQ{Yp9C;8
zqUjp*z1*k+#ho0nJaUhDcKsii400yXNGkQij#G0$aX8Yr8GU2RXm{L@9pil+9S&aV
z79`}o^a%*K*JRR~L6_;uXaVx*9bvdgk|L^xlu{{dpA-foW_N$8yxN(lAeL*PjEI<#
zwdV%bdG!e84xTygqEKTz71xM(VqJI^wlnX<E0{H8MHaaizCs*$h;t<-`Q~-J;j&yZ
zhddoe&duk$<X)`fttZ~mTZq=vl=vB0T@{905J9exm4C_Z(kVmVsnO#_k+$cJ#H~cP
zh(0|6%aVCCPjeEV?YPO51v#u1lKG5h4dYyqWZ)JMO~;Acn9m|T=`a21j7;J+;n5Xj
zRE|~f>rdcTuNuTb<R=4ZtVvttG5JW{7To8QG<w0%CWtnsI2^`uS3&VWYGe&NX!0zg
z9^GVir4_3rxYj<gn*rGblCf<9ewJxU$^t^~{I&MEkJB!(AKI%Y(a(5El9rt4DtH})
zgq#*J!s7n|Bz=6jgQ`2)=f}M8^QVZxr=?aV={sQt=6;O-I~{1zLK*%a0!o?GKDT%>
zxh`?zX*EcdUpd6)`RMfed0>2*!32th)Wovoy+brf9C<zU;GPYDU!XYjhJ0`CFZ~ji
zFEk;hSftPpJ9$bfdP}ZFapd|Sb*!EsdpNN}sN*UIwh_2DG9bc73(@(CTM*kECwm&%
z-MrZVVh9RA3>Bd@$`x|IWR+~J4iXsh$`hoS>W>_bZ;g7iW}R_|g;{8)424-IP)p6W
zKR*5>Vt(pLI!5;rSQ(S|0&Wj0zQq*ZD^&736U_h4^dgsO*C=~Iz3%8=UkK$J5sHh~
zusZ`J*~j5{Ff*eVKGpPM&8PcOO=(z8$T^sxbhANK%)AHd8c?)Z2xc-yD2gcJUo-6T
zWW#jzU~nkz$-R9K1^COt9CJX*E-j|3O{H^q^2Vw~(u+rV4}^L{Z@~R9C|H88{T$0c
zOc@iN9*zmuUcM`@vZ?YBv&n0YOs_@Wr!^MI?L?nge&kwsE2FIU9H=vqs!XV`vFL6C
zn-P24H_esRiY-`izgTvK9083_BV)O$bW~Xaj!ZUC8PXgq7%iNBBewL-UhcU2){}A0
z?uTc<(CmK_n<JC#lKqI@DE6GDri?8aguTRMt*t1)vhFyB_SAYQSt_O6g#&5$4Y5Fx
zPK!klG){^_Pj!~NGhny1dDcVs3vTX_+ivdJW!4WNRifAGWbmzq<$SLVqXT!7>cV6d
zIsR4#Hz!21VDd<X3zDC=3Fb6Gybw9C<F?a)v`TC1m|eUD7EGZ$ZjNhnVfMayYe!(j
z_PbbU&U>Z8K-|&UHawYEj#3)66+#Qlr<XYdFJzChhy!J1jCz~*A-My})O^pJ1i9C<
zAQ-5O#%;r%qdr5DCkJY!X2q*b;ZPI>rhp;2S)h>NUtfqy=qszQ4(D8bvj!@y6Hi8c
zrHCi)zU??02=+eduP=m(x2d+T@uM74S~(CG+eHvwP;}s_&-uWs%ya4oPsK&<)BQ}B
zEj7J-`fqwBxwH`>U*mAH6hG{_xZ8-;V7&X}Qp?C;)KV<X1kkau%9>-4>umdv5)V;3
z4)}6b<2~S?(jjUi8n5+Q))P(F#bZczx=5&uQvvUbT`_YQ1cWe=UHeJqS2;xXa+p-X
zP7^#$&_09eq%R?q&80_xF`U1;(03Jm)Gqbq--=YFiGIt@{33;qT<>FRz+V0M335Yk
zK@)kFW{v@Uh-MCv2CTPll;5{Aw{_mXh!W8>^hHB(QzlLtzD4IfQzrBozC!u?O8UlJ
zopN|r>Am5!B=B+7zu)J|)4wtvPdShWueZLLi}uK?h$gk#wrKttwO@rq2p`ss6tz}p
zYmvalLF4%WG8e&Nsc)R}<5pOn<UrLB?GF=j6q9cm4eWNVUtnTSx~S-N+T3X>*s#8N
zS8df=&<FKsmLOb2sqRI22kw488+0l|g9i9hc1W#{F@cO!du}bRsUnqu9%~ZOo8$M@
z9y%EBh}eIzbg~Q@_<Go2`<`T#3H3DVVOI=ks<qTwMj^mIP8oPOAh>yHnDS(X_VfL$
z09{DCP=($NoxYI1z;Rrq<XyOIZ4jCT$g;cVmfVJT;pT1QO?N571#^)afn$*Z)yv;Y
zq!ZN3|1eMNNRrNa#j{qjde@{DEG{e^78vIKWcHOKRscnTFuqb2_?LdxH9fAga3U`&
zp0<j%RNs}m!K967rCQqPhEs_myBSepx>8DMZ2&lKBDF-~TnJp?g9dap94x=S8M#OT
z9!bLot}|E`7q;zRRw(gpm}qxCUi#ACs6_@-%|Gdgp<O_GduGeRpS7if$H(hs{i86D
z;(?}EO^#Qu>!rD{INDny!?Bw$sA-Xrv4QPneD?$p!uVm6rQz;S_jw1J<Z&`##?K2}
z;jufs5PtU%9=(Bh6_zhMFnya@T%NrJLRc)&2T+BKHieAf{;EPoLqZWm|6SW|IO+*O
zd7{LgO*S{@jilQF**oyrg3gKJf3`6W8UHp!Zb!EjZ2551YjxrOO~36%rbg$kRFv9u
zmVB5rl?m9@YI*(mhGjL1>CoME>Fo-N*x%_>=e?KUaccZg{BmFv4Y=VhRT>(r?_QYl
zhQDdEHY&@`(&=P}4Fy7vYJXx7Hg9zF(dn{B>&_K=KREO-0(A47KZbTsXR_CiwlsP{
zdhUi#lP~WCt<Cg>hv7V}WNZY<#};pIN`zWN>a*JLw3zQHz4OJ5GPDP*Fq3b+IljS}
zHO8O0@+9kl>b^Xq{{e_X){r}K08z*lx)(-}-voy{*V7u#KWpAC_&qmt0~&aq+}|U!
z9*wcLPYZ365RtnvLT|&ztStBEjknlEyoPXZdPv8pBTrf|GVy*{y(zc6CM!^ysd=7A
z#^;C!_4!l#G-*cqFkZsA{84H7<ulyvQOLrLZpT)k`$Df#O@>>D2S<a!lI_E{)TC5!
zwd=5$bk2DdRBC&UmF0VJcC|WIbl)<hL#sXBmsbEObo7rDn)r_t(kS|k4j_eGA;vSt
zPXCcY*#Af&R_0@>!G(Tz^|RC~H`?O`ty20>fGL#s*AxQRhz|vrLe_sxA<Ms}&}JOw
zs%Z%Un^1}8Z63cz@#hZN(<kKd1?8&<+J_X?Qo+xea*C5Ba0m=->6rr9jxlX{95i&M
zT1xHi`xtjjTE5PUe^D`7;$j%R9&jQ=Ke%8)XHUj*VB$Ueq3+*@mU?KN4`XNfqQHS)
z>zRbrx=X(Rrcn9~W>BZ5RITQ$wyaR?Geu{0Ar&V0w`R{>bhIz~yeoO%F#&T+uwMXE
z$OP29RMo_CP++Ol<frpBurKDqX|h0H@c+U&3=G~u@^I8aX${RmOxR_Q%wnGiEdWzU
z-2gf(6aiogG2s2z>Vi9mWYJ}YM(_4qE%l|~NV^PwKp2>gf_88Lu40qvm1PqxEOJW2
zotm?F$7v0U_a;akp!l@Tpzv*VYp3v4+r%x?LL`Zl_kGR?mG>VnCRjQGh*(}Vo;wAC
zq^69liw&5OX_+Z6#5uGu4s$y0Ee+Rr=}b?Hcu${hP(MG<3jm+q2f%&)%T-E!DR#^2
z1eMDtEa$~;{KvW!OJ-ohc-;6;n0F*1GM?C0tLai{hINXq#qN$yPe**0_^7}fLMVGU
zx1~#BbXwTTf7OqZ<Uxi8qyv2{$ruxPPs?z=P{jxiiyH_IAqH6RMqi&L04h|1Lk9SI
z=vP+BeseuJ*#fLR%=>hJ5fa_M%=RJD9C<rW`G|+b{X>P|d|>4P4H?o0^A^W-T<}$(
z`UFumUC*c!k>O;vu`bbl7!dXTgk|8v&D2$)^`f_JOi@VzCW&}95!SCkobvi?Ft}5L
zkK6o2o-IJcEI|E}<A~9@a<5b`9lVTUXKUkMD$;RkON;B17U}1d9KD_=kP*;CrMGu)
zrbmIkjfDX^LwVxgpf5ej_SpL?G-w;uTNSr@50Y(_#YNemlP#>s^sinW2y{}@f&HMO
zHf)YjRgPaX08*$^QvH9V(6W`_Us7m!32K$JOuf|wryJ}Q(VB4J{&=JyJWMXd>vAV{
zDIYkh&?ZJ1?ZTC$%3rbG;jqQyfk?xf)TcL8!bvFT;wr7gj%1@v1={jrQ%5}21+TN+
z39E)7=<}MQq2syqtv~T~A^2hHvRXCCDjhTh*m+J@mwn89^o2%Hio6S!N(}i%V#7|o
zxjn{XCqr>o{PF{mG#pYAaq-t;s3^YJk*YdEO|^pk1gSUj0RkQel~qhfiLZ(5Xk8bx
znidz|UMC2C&$^I!IFx$N!jzB8RE8acXZQUx$rJ0os6m|thxlsoGkoW-4X|(oP7w>5
z27w@PKm}>=E`;K1Ug~L)YG=7?e81M0p#CH(4m1(8Br*g$q^H1f7a4tSWYE09;qLOb
zlET{O?e(oNuzF)~4-+v5zC4?!@~<ew>|tuL)BoBB!WIUcga(Tq@;J~YcK5lhwpT$n
z0`7F;csx|eLxb22qJ`=upQfgnJ_6RJ4P`u=A%a1hc2$$m)?gPUN0UX?)>ra}^jO~y
zgOdql_87y73O+yN_yF{TtvzqHj}+YbjjT@j+a1<<Waa!R%uBPp*O9mDlGl6Q=O`ke
zX_}?u?w$@ZIIirwhD2+hoV)pp{fn&$2ZF1XitnK!A))J729d;8H&i?(06V0=+7Ju_
zImRN2L^xOsgCG*q!l5+rDBF;EKz4EJ&zucVKgkbnR8Z1|dCoWg{Yh>KCYhfZnn=vT
z3bSI$QgmeNEZ<J#D{{>W3@gLAN?oHPY<g3M(6jYhk)Ibtv8daM|Fk~|4DI!RYJR>s
z=FAToUG{;{77Rz9%G*})zAEA-0h`Cv$#%nrd!V<_j91WEe-NswBf<ULg=8$R*iyIC
z!zD1m2SOK@j5XXf<b%rAxuMRp0q?qUNLWDTgP2aYwPov%vEEA;tZsuQ1$r74W4-lo
z@3m^Ku3sN6{L`F*FR!C%^$2-q$~}z}F=^t<kiwzmm!*HboYAyREBYARY>?ggfK)v-
zav2ax?I}+miIu{$8HR|#YjwT-n<C%~-9qeIIqc0nFM}mEm3rPe_T%PK*5$_2=nR9b
z;dQ1O5Vt>RB5`OHTt)u)bu){Cnb4RiWFrg|<bDdhvDTmzhv#ec{50b-8{YgoRO>WH
zDs>|wy|yq?|H}{$v<?#82uvE*EERDfJp6Lu?1A8FQGoNx+qLoYz@D~}&gWrifkq(J
z-QboO^$9LrAdOZ<I0Cj9q}DQwR2?#870)!93CW|VOfHJ^HQHi3G}E0nNDMB4hD$~G
zwsTpXNMJX0Hg)lR<Gk6v5Yc#r@=*H_1$#5+tTnL7X4d2~vJ&T4>ylNysg7nfbU%H2
z(E4(NG;Gyq%UWyiq0Z_-5?sgnT4!zO54#MT*^#qWB@T<jZ|mh+Ecds)jlJ_$xF2n+
zdhqINNU#;iXE;{=2Y2XWFyq(r0eDUEwv;;!E)F5<K~3shD3&p7YV^=kL#sv)j~vUy
zR>y*%=@r`jxD_sZzt*fB&7+;Qj;L@7whqr|6#6%AsWlMkni<=zrX9!U>rG`{LVSlr
zB~~QaMnQr&7B}3KQRF<1o~k2XX$j23!ziOtwuJ8c3o-114PKUw5-`Ot*Wummfe8rL
zF(`V%K?nj$4TCebN+94LBr))~c*`gk5WRBx+&__@55m7!7B~O+w_Q-!mixL=y{?65
zjNx?yqa-q!lq*nar5$jlS!v1st^u?7Q}S#qZ{vC#?2O~xfvBqRB~XN#Wgv^a{14LZ
zS#!}elBlnKV$gJ4$vh**%%UJlwIA$LkAh6qc7?Z3U>wCmA>%CHhiq2~z?_21{-RMH
zM^=|9?(2Iba-T7O+2MQY$FZX+)=VprHH{)%jq@y&unt)_<qmE#BLrgaWNK`<1`U~H
zf*|g*saN}p3RSVv@!DVU>3TW$HVjezA5@5S@i9eBMfnwfQw{L2?GIYbYi4Rh(RC!5
zZ?e?x$Rt-dB0ewf2Coc9qdzO$JOI2ns$E_v=8HO1{H$|e_Or#yXP6Eb@b4~{Yp=JF
ze%Gk%eD{w^4t{LJUDeAPUUI^>6`^I|2{va-OMY;YiH_cU(Z^Jr<N3DPIoEr;t9OQx
zttB^j+1P%6J&FKYJcTla3ADE<K;`Jj=_$}D+uQoX);t%EQ^pgjWb1i_?={fl6JLdp
zpR7UiW3FC$s%qyB=pxNs4M-rDfxadDsPoSrdoP~%XN1Qt{VzJEMy#3>)(AyYRc_vo
zhR=f^AgiCjR>qnOP8Hc%ztQ#mNWDMbZs+dqErH1z0qacL>r$2gAKeu@l5)4)>GS2b
z-p%g3&$jm>w573b;p1+L$;i&u3qsd}ba1MThDYTa`0j_cySwv?zeI}v{1h(RDPP|i
zNE+XEaM{c4fNlf9fLqSs19x<O^Z%ce3PA{S1rjeF!Z{0Gzk2{dEItpE#Oci|Lqc-!
zL_R-Du7teb^wJu}MRfjU?toxJw;m#9OEfL?`}h@X;d!q33KAomYDPwOn5s-cj0s>j
z#qNj-Jw~J~V=)ZN|9SiuA$}{`FCoM=VM@IzI6}GRI4f{if}?INKKDgFm%4w+>eA$?
zZ;ttiD=g)Dy2I)G$K-*e)dj)u099!9uPU_JW3kA9UjuT56T=l4JZ}s?Ey=uy+5t(!
zsT`KfJr`US6n81A4ib=%<%)G^<toYA#usXv>r^@7I9$d^3MCoRXUjWQ-Yb#IPV1SS
z-?4LIM>fKW6Ozoelmt2ik-{iWp><ah8C+1AI2KG?)Q5Ua<s~^POe`*2*lt(KoC-l`
ze8wssr5e<dY9f`IHosQ*6U%WHD&85h;?AyZ(pZX_ouM8M#kc?nfC}mB{Y8Z){-Q!^
zQh!mQ9>D<$cT@}L`_ba1G_nzYs}miHRb{7Xp$@5S@@-=1QKX@wP3bsjDh3*ansm6S
zyRi4=IFK4_(tTdIr?VlvvcIU144Ym`t~c=UbuX{(KU63OOb&nwmEeB|r~QixQ55@{
z(ng04ryR)EG}NSm1f0t3A&aVnMaBV8p<W(UJ&64?W(WW(6c(!RLTEEM{YR-ZLUL#d
z4qx4rJD$N;t-l{(KWRXnB)4Pq8{s>49D7Xj$}A~4YuX#^*O76@Ou3U%*|j3dPdQ}O
z9yijwhqVS%b+J?5TvF*rzbI_WxO}dPb&?2bw=QeoJz^+Y2$eZDh34<!lR#J~M|}~>
zv??`l8hcb(%glsJg;3s<8%$+Lu)vxTHPG{=FjC!P4u1IpJSE2pY4I6afD5xmUedF`
z?=o@uk(fgDn~_sFOeP1>y=d-xz3&Yv`Sa3Mh2o6Z9w^2t*Xcus&}|e78G4z%o|DGu
z%#)P7)}Brp<!=j6Gx>9+I1i$0+aJ>46%J&v5yVPoF_)r3Lr@mn->IOuSuBge!<L)B
z2ji)W8wGq5)IDd_N0XQNRVH)0xbXd32fuoiYFZ>-7)V?>*~Bo%4mhW(s$06uH=_bQ
zBRCj-Ftz`gLL4}oLjY6A@1H3|oC1tvZ?Q&9J1&-VwqGA~KS@i%3gk}~@a{LNlvTLC
z5GV;wJ(_6@v@r2$>b`~t)Tot_#6IGQ!dt46Mwh6K3X%f`?^=t2*8{iO-cw2dFol%h
z{+U8H1^`p29ZVRN9>heXw7_DXZY&s{aV24bqL;bDk`S~A_C;iJf`|W~Da1YhBWKh<
z|36cx)bJa_|4bq3$%A@bJr)(hn-jfaiV0LEwn1J&<UQGvo9|TdprPy2F_nl+)hONs
zMC0?tOmxjSywt{4<&$|-{@_y<5*iSIfzL~q0Zul9Qqj1-rV!F!Q|L?>3168zL%r)z
z1{+`s!BhNa3em3#{~uH6NafW747=fB26V=?)MFnBYWh|)F72FWAjC~lmc8im-ub0+
z422lme%ibf;y&ja^cV;X{`QRKfb$KJf|BVz_(FM75w!*d8Bqmv7J~eU>Uth6#>n*=
z$zM}Q`mZT;?RZYL+z)M+h7d*#Fop8#I?&w1b>cDGHur2qrDXF&7M*FrE|#QHV#eh9
zuvi<}6(bShh3SWLd%H^huPFo!0e`(wgY%y$l$lM7ks&qF7}94vKlj%ZLUa1XzEEH)
zU&B#wXczDd_{|N^Gjcb*8R&p5SD)~uW&y}Yf&k$Y$#HGL>xtQGu0w7{qUap^-1M>j
zktUmra8?ha>PN@^(G$+HDu5P@wZU97$eRCs=r!cF+4x2hL)rph3Pl9EZW`c4IHj29
zivUcaD*UnJUU3!KqYR6QytxBma_K{WDHKB|xqwhgt_r2{&lExfm_p=5$^cV{YTi#Q
zZff@bnnKP(rVS|#n4Xqge+E|~o_`ckTM%Qtr4Xz$W8#T!gbqw2ef|*P6KgRcZb-ty
zsWF`}lu?nYQ!KM4`R06SIUkdjm2W>u={`rr{jE;i<sf=LoVj-XHU+Rz7Pmq%uavX+
zg!wUTAjR=hri2jXL4g=pG*faYex`uv#2OVec_pz1d+wPc1f9}Ez3e?71hgS@^BXt4
zR$mx-mPtY=iD$)R-$zFtlcRg9N@eU0AUn)NrViMf;@_SWPdf-e;?<C(m>4v4vLBUM
zLN8r($wz>NVz6&RS$M9v&)VC8!{nV)Ho5#!K?CJWoU^Qg#e#oYsz8_kFomdo2;wa<
z-I^)6C&Y<N0pb$?Q|SL^RqITFDS|`Qv&JbZ2DCTCfaNmh!*@L(F^!2(eAt`Od;5zS
z3a(W-p7xGlmyO9#kdW|*_ZUl{P=2?vbDQv8zW1$I5jGG-`$7fi2$vQSy%eNFZPdRQ
zXiVjXtAcY6i{^SO<&QZEcus^TlUB-DC<p&&?N4sy(Rhu5Ap*k>g>~5NhVLB!Q|LhE
zLZ6Kt9$*R~9-B0z3aEZpoDvb;gp3L|JMmQ}v1g|$K9*z=z6o<usBGY#bDeos0Lf#H
z2V2isp+p~JVnXKVGe&4TtyFHD^+p&i^3nH792){+<y@GjR1OPihiSdlDk}g^9Lo1?
zb{?p`&99mMR!EP8H_cJ2<~Cpxp#z^ZQwGHZVv=u7Vkq%oS!PsMMn~d0Vxq4{23FVa
z_njQ0wkSxpK9cQrjHcm&(j_>yU>yy<{y^t@Oud+$J`onMefI0{d2JN^9Y5Hg@%g8b
zsV72v{OTveJeR&4!-62Rop@_)fBQCj_=nWE!sa^L&ezlOPtw8T9sjXV<e@w~JLw@a
z<e2^tBaS3Ob!@h<N$cKGv^jn%#j2=4nGSg-Fb_-XY{brk3DMEPUoAaR(9Qqfm!gD0
zk5WehpYry@MN)NGZU^RJi+JjJRPJQ<bac5-%gZSl_MvooKt<L><g>FbSVHdeFqUf(
zWc9^|CE%|D7n!5^(yIwl$9nb~R)dolWv2ci^7^5!9nlKvdd$cdc#;jlL)&h8ZpE+U
zwPb?V;y>U8a;<*Db`}nn00Lkh-=nNW4X%Eg6jL`CrtoMv;qB=#92+X_Ny|p!C?T-+
z8`>%KM^c2J#RDmj`@BX$*pU0{eeU`Lj!kH{I8U?O)w-PozLRiVkfsDMX0wvvN*O|a
z?k#m`y}Z7Ct=Iav-P9*~5S%G8G!`|fA*jD2fxpj9$05mQ1WXQl8TN`yA!Uxt_MuAA
z5mRBR7OOH#u>K*6#HqXAHT^i|yY}bvvHkVf(8@5zJA)7_2cVSWs_&a<z^R;Rk=dTe
z&51WWqJQA82he}m4e~iJ!RCgAKgH}KIXSCMavBU=@fdHYD@^G8Lxo_h9VSOI=}lPn
z&^gpl>3-GmT;ePC-~sL4Re$@|%(aZ$U#%aitF&n=#L;+TT-32O0e+w%KkAcKPFbI(
z^tnhADw)9Fb;z*My}gJ1DWGFllqqaEmo-pCAGra(8;<A1*YHdkLhY~OWmB%R!x_Zc
zo}dgsg~DBuvof?kKL107bcrXGy#c5Y#tfez>mb>H1(`hwpyZ3I%$q9-Saw4*{kYFe
zOgF=}|BDI%&dUbwGbcjPZeiYCp&cBjQxN5+syjFX#kz(4a9K~%s#HzgGe|UwwP6Ec
z+QQu*ovEueuHfFlLJLd86KHh7-sq4Rc)KcO3iyn$$QO=oS2V{QI)5bAHiD#?r01=;
z_+E#PAs}i~4HW0}@~Kxde+0HMn6IMXT;(Nh=OL&^2gCotlHo2j6^unzL%*RUjR+BZ
zi!lu&R1USQRH65`r`H-aQe?=U2{(Vm08L0WL=W#&xfrAWs|s;GcWBJ~Q-xTRo00&T
z1nS2b53jHU)F^?<Ebg<c_l$hJnO}%In^hZ$yW=kY=OT!_P=<?tbV~HR2mVuqXih->
zszR?9W`JJ*{H(wP)y}Ggw$sxhhWgO%ZJ)(|s?hBn-_r}^KUHYq?6O|o9P#JHH0Sz$
z)<0DU>8~neC)x0=HYF!5e6(Pw5nY9Qef-4MtgFmX4wNyHnybU>yr2D%pSGt|vupcT
zSZRDKnSp#QV|}m=_fB>b>eD_9L+<K%fOE^Apnwwtc(j{C4QhW0Mpg0S@RY>#G}<ZO
zs0wwP0Tph=kMxz-AI(H=y@9qX_t5*6C4!Dy$6}oWR5T1#hBFwF=AYq(!^Im%TXfm&
z5<i5c3!>9It}A(-TH^>jQUgr$&MIbzAZ#FEU~!GRLrhy|xY$(tWt<{9x;~zKJRRq?
z)@!_YxMbsm^j!(WuX+WLRmW~Xcf@w!=m-&(YKlxt##*s9o40-$QU}(=Ozd*+i*j3K
ztWKvq-5lm#M3XWS-ctuROXTT^#TrT(h8Mz|d`2jv<~<*@(f)XB#hqf2S$pLEe7=o-
zO4o?;mDt!&ifrcjG>0SqW=OIh-9%I*Y#pmRKC6}*_XT&#xW@Cs06)|BsCdEvP=z8B
z1>ngD7-*Yi;>7ulcJWiLV*aW^(&2jZw!Br@na$+1jshLw(ZJ6sx?jiD*tQ^EU}grc
z!{t+6<TgdmG;66QjR8xsS}x{gFlkSCbvQAJ4W0lP+b*ccX(xGVIkn0AhN(bT`OV<K
zJ&Q##Y8hPx_QxuN(6{ZR`oIdT`5pCpPi|b*qNd+*FsnJSJpWUL<Zl705U-=?N;)pX
z?2BRA+5f8wt^8Gmre1}}lx2Cyu8|Z5Q_;rI%!qki(uW-sAkmHzW@g0v^IVw~6pysH
z*CrF6FU?LXpBiU0ht?7hp|AW&8{t!~6DFk(8OZ&n>H-6!-Pf27AB^GJlzwfWLST~!
z_2ik${8NSehyJGuk;-2NtQZZ;IdxF<dUubbn6rbK1xzZ{=-8;yQ~VUSo{Sln<2LZ1
zp%$+D6tfOMU?HVrN_v`dMeT^=qCI{T0WDJf4dnPl6v`85+f$7`FWID2rnb}!&5w}C
z6J#r)#)O$(hdcq+z6Xi^Ox(_>abloD{5bP{Gu8PS<D-x7Bv{n6NEM$IY7`jyj(HVj
zjvkv9>iSxEb|9p0P!f`%4~+v4P`G=ta&k)jsri_SqP1u`-qs`XOB#gYLB@u%#gRxI
zOdCh25k?z_Ubd#Ze7(B<J6ujG2>PePKsSXR-|;b|v?3oOwkY<uiTJ`s9_a@3^?cc6
zJYFXRb^ExRaJg@8u?<^n?i9e$9n>q3a7%0uulMlM8DLeV(#qkSTzUm2(2Tp;a0O-X
z$lBw3$~xfjHsC=Vc{H%Co4%H?hTF|RT9pMnIiE48ffIhLH|2I=bJ)0xlGxa!L{$yc
zv9dhQ-L)T?7wyeI<*&ACtvp^cFV-GA?p}*1nccC>>GW4A8U1ar1K{Q=Qc88RPfHv~
zZ=fg!KO<pK6sQGPYtQdoTSK5~AtQ`1$<@bIPMj2<Mtt{n!8l77nMnzQ6i6VtIob?J
z@QZKUa^PqpYPDy|XfeJ`{>jkPFnMH;3~Xv(a0|Vs&aTru#<x!ZF{3_AjCOOZRHA0R
zq>iq5)D0d|<q7&F_vQ`s<#pwCry)?ol^TkG!*%@DTcK*x4BWSlDo~}HQLNbE-D@^#
zYkDT&wf|?swqm!@TmrO#vPP~!dkckqgg-Cf4+{yh{)&?!am^4n6T&2k0156@`Aq9r
zQw~WVaL=@nNS4wW3Ok|vFN#Jh?hHfGYI%-U`pf-I(Qyaz`JhCv32HusAK{HjIh7hM
z6d&WFG5jZ*6(!ye2QM1htC{VpU+0iENAJJWC{rGHZvkas`HlJ~AB~K=iNCDSfkb&S
zLE1ej2w%_ki$RiR3P)kD3_KGTJ9q>fep~p>tXi*#XUi&z%P+K#EvsylfrPP?BY*n1
zQ@T+)ai|MFV$>HHe&pZuDG;!Jgh;Mm=ge`uyo*EOe`2wDvS@ykravkoN%cLcLghY?
zQa-(-!(|B7z`3Xs&wE7DBpLh^1N-r#a)4Y-AAFkimPheps5J5A?Q+WJ2^Wm9Q)vn0
zsju~*SODMv2vso8uu0{h4|j=Yc7OJsp2I&4<nerAR{Uh;1f1j5-ofrFJY~PE;>)ei
zqkUU9=%0k}X4320(J#G%*JSez$KU{tKmQdgDj}Imx=~D5#$^P14g%c$GF=mw483j^
zs9op*af`s)ddlCTo84K@{TA3>Q`2CV#9V8a$z(|fB_Euz(xhzX@0h(9hRl7Z{!<<n
zVo>sVj6G3e?MB}!$4w<9y7A{0hJ>CRa%8oH*e&b0+H`ylUph7}8Vm5@52r1YM@Y+>
z>M5_L*Mnc9=7vZ5b;^nX1XL{y?Z^5?;Q|JzXZ!wy@N@;47#BuC{gPeg`;q8WURr)K
z?ulSZc^p#AbA_H!CBi-EsX{SxaNhmTzo)G4Yk|m(RtpX+)0E{xs&=8zcK*P(dN!(!
zM$7k}TF!&Z+m~Zg3_9j6nuM>F<-|A3iwN$iME<l$Jqhp6H`asq#5Zp&iE$N~=D9ua
zdL~R#ruoyZ{1PT)H8^BCBMJd{To~)P_<Lm#=ko*-ua?kdwK|YYNU+x+3l&ze4q!^B
z1Yr-Em>{-$OoXo0_}uV#z++MeBk5D15r~K|svpZWgp0M&pJE^p4-}6_0R;u9CH&dV
z|LHT$!7uz^q%wVUs@ESO<S9H+iCf5O$0+lQprGe4`a4Lk+G<hkCem+Fu^k-5BJg@j
zkb0@D=~g3kOT!}C^9cMRiyA~6BgA1pqAV3vSlm$~lamRFP>n*+`*J%I9%yEDTG>OE
zKq!l-pzH#^@*)%+CZmYNn%(eC{UQfKrz0G~RTk#!@&eVCBW~56baB(P0n>>el8DO*
zNLn^r@pt(ItA|6Dc8EMmH`PBThN5g%Jodh!N#)=h&5$byFBGopMl0^6b08(3>;EwE
zhT29KL;~}}nixT9@MiQ{k)dNYTz?8>6$PP=f<Ps<gFf_}=X3$)?QoJoukGI(G{?#-
z`5wNd66ry*vQm{`ehwzan9DSx!uSPuByTe4KX?pag}!pW<nz51x918w2=|a^Lz!Dz
zL}elenN@fDuuo16fx80;i6B&2$JgVbcrGIu@e;=(X(b%^dc!cs6UOD3FpvcwLdZ*}
z?_kR_VogcL2~1-WCx98*gjgo}%wg{s?iVc6LGjuY5ath|dGZEllBEwM*CQy67>JDX
z=4VYS?MWC9*V{vZ3o*%&p`f-se_?hOb3A_*>6^o#1(Drn{bPl=({@Ag1;Fy4cMu87
zE_#E+Snu;dpJ(IazgmsD2W_)$g3%K@t~O=>E_H!ru{{TZnR?mKx(YbEc{U<WgpzZq
zh{bVTYwm`}<J=Ija$9v>)ieiDVX5*Vb`;6P1J|**P>uFg?FwTTfE6;hpd$YtD>V0)
z6)L7E?1q69Mmbde#|rhEefL@DInTwtQ(7&pd2y5)dv4--lL{$E>;?M8o-(T!ST_>D
zM@gwtrBnJ+OmxE*GuO?Jw`}1%+0Ee<DT;5HzWd2Mzql;v%CY4SXMaGfo-(CU?zYTt
zc~6;(fuz0SG@VLP@kG0sr_!gdH!hEvQR@$42+h;LG@IeZy@*0Xqd$j{Y1;FKeKp`p
z%%TjAvQ1&fFLE(z(0=i>#`}IF^DTg*%(Q6JIk)qP2*l7RXIs=?US#zh;0oEN^zHt0
zg$fk9UychT9hd;F5Gw0->R(qV&-I@xw9fUPD+C0IY%#BBBC0dP=w)uQ#Pm>NVB429
z^gZz%`2V;<A-#*yEzYbVD>6hps)T4<);E7$AujFTa{yN;`>!h`05+ea3UGzI5J36u
zO!S}qw)$BG^YO52P$D7(iJu#na9zL32W~AfuY1YCQ(3<{9x^uo4`Z%~&7Ji~2qu#Y
zF()u*A}F7o{jw4Ewh<2^?icTnd~Eb<eIKcpWQlZCoNbGjq9RJ}0&E)aZyLl43QfMl
z$qY(Dc@QU2%!sA>FQ8eG>&_N*E&p?cWK7A2S?~a^P~u-#$Z56R9*-sgp<hKA6u~8B
zQiVBQwLV99F5sYJxjk+B{Ar?WF3F&ztN8O9UoM&v=q4CgE9V%E4GuWTPZ-wUxloAp
z^rf9C_X|TPvU{;sd0XIjuWIU$1d3`~+l&lQ*y0$MQH0)Nzd6zraQ!pHDEIXLbA@<j
znv_zFbAsOp49{h;Ym2dI(3s<~ULi{sPO^D+j&KUc_??3`AO<qBN6b>2U`!JePTUeD
zy2JA_0rlF|FufY{od!t04w_MQ2S7YY@G;aInZF0|an&dMtN5zp$Cr5SZTA;la(0R!
zA0Om4%I+a^4bH!IA>t$H-sq<&F+XPXlc{uk{EKFUr}$oO*ayx~DCnI4FozP57$tHv
z;{C%1j%*J{f|Q)`8+pdxcHwNjo!2u5uWkd^k&G?9+&7r7N9+K+2!uh8Hw717Y>Dd5
zM_WGIo{Sy5QXe~MJ>tzRoa_<n)RBD%lof45T#l@ZsVGFQz0q}jLqnvCKGgats{8@M
zu;JiQA4;rf3d#(F1`a)1F1f-Zq^v^{xBGTcTi(?5*`@HeXHCN#j4R9d>QAkyvmAO+
zTa{Zy))1~i*58-;q7Il}OW8eMIKLGy)3Lw8<169wV>`8;Y&~G%ZSb+hRt0b7?_=(F
z?QYCj3ORTU|HuBs{m1^if58jHK^=oZm&n+uJW4rI@h7E8={n}Dq;)Dd0$Hlzb=Wut
zy(Xo3d8cn59t7Cd>k*~~wDu>ZfeKF+R32_8=>p>H{=o1k+||UZdY(f)WRN@JwaP?c
zd6FFb_i%)Dlp|YwDONYuNqwhK3nV%478`ZyEa_tS-=a*1w5s21`JBjj{N;$lf2h;4
zGcir4FEcL>|8PgtnXxhlUVKSwd&VAZT1PavJ`t$CF)ygDH*EjDb|8ZyxyGp;6h#|Z
z-2aFP*ecak+40GL5iv(Mg7JHo3-fb;6>d|)!wHX+j%1L?V~qw(ivfJxSlv3ScHJY&
zXA-1akeLa<J=;;Zk(p+0K`wA<6;TE6^YC&L1S|uVt#(knj6(@SDrT6`VfIEG-q<5g
zlrS?9q2er*V$3B%45Xioz!2OD4$4ZeHGT4li^Ix!PYw%z0w#8tS%wVx3P-I8W&8<F
z2u0kAE3v+<;G{J|$R)xX(Yk$hH_t#b90f|>WzJ8;(8-U@$LZmxAe+v4UtO4zgiE54
zmVlBGF^0z55C8g+eWaZjcI=vSn7znxJM0kzV@DDH5C7U_!8)HDpB}8{IbZlVanPpU
z?AVy{QBS6?Z*Rkp6t8vu{{E?5n>R$-Zj?MJq4K1AURvZq4svT2+uLZ?5XA7Ld;Ogv
zYS@x6tc?1Feg%S{pb*&RnlZ<C>%#2qK-^I4d&B)n`>cM+>q&mD&eF6={j(YZLqVD3
zzjrJWkhRyxNXQ|Y_k$o_>rg*JL)UwHT3DD^cZ?z2_q_a9K0`C+I5;_Hw%Y&P#~$Nc
zUf~Sg21N3^V|khHu_#C}<fDkJG){j<LF9{3qGjFyG<!3+gPPzFWJq`SYL}l|;-&9R
z>zz+)gl3QX)TV`zZ@Of7XSSLPn;F&sGs0aac?_v*btZPiTnUj+WK1VfPs4bg6Y92M
z#{`!+*ip8%N}|~VObB%yg6^fjo%5O{giH-&c55}^M}UvGTS3ek%9U)C5im&hHp<QB
zCy){pP?4gDpazbRF**4wWStbGW5{N2*Nx1Tf)6|D7V~uU7NTKoOSLT{$L7|O>F;TT
zRA|DG058}R45&VW3_tQ$@H8aYszBQ5BvC?*<tg32UnE-cGh7u`*qF6;;GpD5p`kK)
z%~QE5yEI5n!cgWgP{0jvf>3t2V4BC|4b$CZGe`H_&)J^?&!S(VdU~EvwgjCT9r2OF
z9;lsCW_F&Qg!qE68<W$*&8j;{1elky4w}<Nu7X*+1ro5LI$+pE=xW|^_@(VyOWaP)
zRnEsYI#<&hZM(Vc7L)zk`kvDT0>?uWI;w<l2#=O90sLbA(;I?;NCR)zKHcDC!W?_Y
zBAw91wzve4E&RP&gBuqLBo&I$)3V>7J7+clRqN~SP^7W|8=+*N=<L(E@i-hJD*VZV
zlG6?zvPF+T@S8XqtZ&}}L_N%j6cu~78s%wviGq}5Mgq0wWtaV~G0h@<r_l)cs7plQ
z78v+7JKX`3l>n58wUZn{g9n}e*AYQ0VpJ&rkoGm$6FmKKNk=z9Pv;qIPUxH_WDHoV
z#g0f8r&Ra;JNYFAVKq|5Pi)fvCpxYj-;FU1OB0_Bo(@aoa!-PWW2$0eOFOGu22yhP
zU=UB?EW%ZVdLP^@iMup7-`wH`_UXmM<&qx<z{*t$bue?^m%*g!@DnXuW;-241F)NP
zkG%qqPjRnH5OBq>bN-24Z~SAGp|{2r!0MXA<q1-~O|ckwoE;xIE5b2fYD;X5yAiD>
zU_0>tj7~lfFjRK5gdO$&f38gi5MC^M|MJ5UFRP4wBnsQ}emN<AI9pOT)JJ!rz8_1r
ztSz@nUENn9GQ_o4IT5IR0ml8T9@rkiDGMv?Zi=3BQyUucl%rgwPOkSgj~f0p?%QX_
znd|!YoDSK5w=RpNBDqOEZ%$)|y4}>XBx>CpSndY2Ha>@z5qxCcYmA$g4XwW#n6v1X
z3YiC=5#cY)9G$>i{r0lsW^iXV@KtO_+65(S?NcYP8XdU&xi8otkby+=I8AE-g3rLO
z=gp2dsstu!{?a>F7)bD3<3!c4!A!}^$`txU0UP1$9%bBKDEnUb3@%;h@OpDvdPfYL
zKk~e9jIl?~nwl{p`^n@Xv%ayj^Kw#P9Vnp?P<2J0_5F`dhhKUa`zk(P9rLvo!T#yF
zMUrgPEoYxm{!W3ZGOVY>jn;Jvh3-gs@gInO)4m212+KkT>|{T%@*=Wcqbc_T@P=(_
z33c9%tbvj+x4KjV`tll&RiiPdAKtI6p9eo{l^T;q@%!K|f!5LaC)8-&Mq3RdJgI+t
z_c5^O`-OjUp_;ElzN~BlYy56NXXjYhF9vITTME6r0rA<hDJ171FR<Tj(~P!2H63<s
z$hT<KydIFOIP0@;nx-mPF<u65o=<niZq6=TrIRZ@eX4qxoBBYb`)=_C!!Bq$hs`c%
zvq$~-|4{bN&y_t;!?&4qY`bIIHaoU$yJOqNiEZ1qZQJSCeDl5Ucb*#5)YMd+Kj8eh
z_g-sVpD-cQNcG=Wxk5e={i8!n-G7`=+mQe65#@x+Vf7KSfdzv5<w+0fzl`Il8ndhc
zi5sJucHT|Z0mQ6-%?+l?EM@#|`oU-@uClq#wz)*t057+Y0%NFu4<qK|F+Yis4K6ed
z%yp;?oqPLFiUqYicF6I!<67L1`1lB#HwUQ-1;$WXZov_6-;q^H*u@^Yjue-|2MZ{m
zq~+GN9Y6^6`vx@$BrsvPF>l8WxkHG1weJ0rUe96)xiCB3t8{%%P0*hr4kKA1?lMmD
z0(E!A&cyNYqfP3Y9z+=X|7?KyJBiDwpsuf@&{I~niNesrQ!Jb?8n<EkfO7531?%2_
zZ6k@=#Sg9v`M0z;p6f4Sx*Lsrb4ym0kE=az8}(Qm{tJI_>bkqMi%WVtvw5C8SVA!q
zwJ?7@6Mi0RA>t^21`6x<>n|No!KTHvATfWxT6r}ue1+{VQ?5m;EC*8VEZ<p)(zt!Q
z=h|In!~0oEU&oCiOvkY~I`N#qzz+jqE=%3ZQv@HpN=py&a~u!TWrX(jY2CrwJi#2a
z8~Z(`mX;uX>HC5QwY0uN4kafYNZ3r&Vn^o}^u02`3|JF9NNZ(|pM)DW+>}sl3JkfA
zhCBPcQPtMu)H21}+(9`4xQc_X-vUQi;|mO}>=Ru+6qhPaswr3PGOhxxLuc0zL;MWS
z`#RL;!HIN;T>}b$v`>ONqqHIqYkD7V7Dt5ZG;t0PhX!TTCx>iMs}e!9a!6}J7VYpd
z#qxjPy8Q~zGbTbmxM~0fou9R8rfgXQx~elBS$jf^Jm;!Fk)kI?PrR<|b6>tP?_le{
z{V=OCZa|07oEala9OXdquETsTf+I)<!dQ>vYgD~655%y^7!|%PSQ;yk+b7+OU*46l
zx@*zFk+3(@q!z8*%4puWWM8Y|l6Cae5+koaF?PdWx1DoE*4~4ZacY@}Z|>l{mdJQ7
z+?TA7YhFqL==2wj!~nXCkJ_*MarYjK$8M>Av<aL21P=WwZXNjwS&YN?eBG^OZ}ye_
z&NlSxspBnni1%h#EN_MmxE79gh@XvzE{7$khIowPti=BB9w>6@?)~?7a54`RVYeY>
zU+En3#u2!`RX(YcNDcJqo4;aG4|fpRE|+;|a5~hH(d`&)TejM1v+r@Z@Ssezo_Oke
z&F3FW{{W+nzlu}rh+PwJ?bo6|6Fci0;Kl_}{d*KZ25Q+4AldKa<rDA29$n6gQ=Vks
z3qh4$*!zqF<**r;$L5LW!Te`m&hiFU83^H5NNznkS2$05&qAF<)~4A4PJ^nKXxSLw
zkaihiwP*8o)9=V$rs(;a0oslRPj1x1%Tpaso9j$hG9!rMRgIJtdg6p!$Yy8-w<+s4
zPe<-}C!V>u?@^`&#d0G39U305h%niG3dEuRB>L_-08{gEmY{|BmOoy+1=qzS(|vFA
zocjDWmKoXEvof?$p=H^B|Bzt6noy7eYT*6RSt-*vGBv@4WRTU_3GL6Pdk4#Mba14H
zqz-{W59ijw)5+5N&uJlPVEt7c?&z*iug!e<Kq7QDY9LX!LxY}v;|4B9VteISC)>Iz
z$mRI<ym2ID1f<1fhs2pxGs~)z$>!#`ZjX5uMT8<$(*ZR8!peu>?xd;1A5^biqXHw$
zogpDI8g5L6L&vRYxeq~vj3;tMwB=bCU@jeVb{R;L-bj@gq}tA=*aKn3jI;QJRx|Bp
zYj00+HfX4Y+?Cen{{kpRtsz{usPupX1|3{E+y$5u#k8;WWPgnRJhCd4n@}-txJ#SK
z0Bjg5EsH>5DE_k=2=x76gIr(Sjv~j^yWLAfurRy=C;fSFN%#y@_Kcxw#1|hhQ9{B_
zeMx}1hs9~oBl8Uuss%IH3l2;5=+07<q;4<!k~D<{4oPta4y6k0U&Pzhe;q81N6x6p
zW#u6=C$o)%NiRsr^_1y(BhAvS^#tHt?16+OWV`ng*RC&e@dy*#rKwS)iK_O^%*1CC
zw=Twuq$CB;@|&@?CD|<3GFS~LuILT7xqSn)(f5%_M-s#PesmgifP(<F0fqw*q+ufd
z?=bj0+m}CDeZK3jj=+%<jz;Jk9`=!Xc4*BLqzP5!eR<)=j=A($kq+15HY&kR5wI6%
z7B|`!1$xGb-PC!!#4^?KU1Tfbfp&#+!3nZ8Cx5o;JrCbr{xF@3lZ`;U{cZbFkLr*F
zSNIbbYOIbi26S#7m}p$FI5#RYe@MP&b(hG(A&LD-iFU4@SabNsEyZNH#>6n2v$_8;
zWVmU^6sC#YA;oq=k2R>ae8sp%Ha_P?CL^2J<x(OMxMJ)Vd8Ao3>YAyIKO>P5nR!J#
zl1a@;NGa=R;7vw~&uOocuROoSHZo?~b=WT4E}Nc76y2RjvjMJV*m#Ap7BHas1MnP9
z>tP4U+gf#DMw>p@9dE5B1cGSALCNPFhXF1xlyK*eg)N2ruR^NL1?Lt6t-yzN;j7o;
z!V~a&;Mh6nl%%aCtR-T8M)Qg8gHSL%GBNg6PeH_)h7-N=ER24x?k^?+Eo8PwZIVky
z+AWoAhj%{3s?<zPdP!e>52gyFIAuTuB}x_#a33pzo)f%^mB3#o)n!7oJAV-y=)D>r
z6f5V2kI4`)*A+yyV89-Fq?F8C4i4G&Y?HGjQQh0h>}?W2G=t#WssU1H<lrZ1P^Z~B
zL##6O!^pzYWEn#+5y#j<FFjRDl}eL8U)a7e3wA|v-w}>0tasf<JvFBADTG!kC;}am
z72C)WI77#Sr2!9-EV(jLlVM1=q+?O((-`+nPU&Ub7U+cr2<vQspz63Z2dXo#t5Jq^
zCGgX<pQNffj!D_B$-}@wk^I4br7fR%HrU`VLz;u8=t%dT3R1$FPPG_PmL)DQxjd&`
zKc#ToY2}FA00uOFvwWjJ@rOzi2b9!!0@oVIVO?_bbpE78uw$wAV~jQUMm#MTB(6Dw
z%%Z0VGjDr5#SQRU3%Fp?Wu>Y@vPFvJ%I<)_@>09ue6VdJ4QWDzl8SbF%*bAo3sI+x
zjTmaxLaLR*@Yx{JUYN3Nis{ov;OX&irgq;qf+gad^foCdMw9AAQsgvt=O4*H#!~-e
zXyyP<V~_%D5Dn%1Qbwc<$^(`)Zz{XVJ4CPMwg|owCwGaPG^EWv?hU`T*IP%%rMKky
zLVz6qy9D(FF!?$ehV~0Ik;*@`n^Ee6hLBH57^PR5e7t>e8zp<OW=%<}FoS6u`35t3
zMB6i{{5foM6UB1Fcysmkh>Gk=7&{ocA@7}VQI`gu(8)X-E3<@)=B7x=nO3_$P#40_
z5#-w@hPX+`L!pX-WBwxO^xSFhCgXxc!Mq3ZSn6WGc4h5Z-Yx<kLG91jpZ1y?tL4cT
z%E_k!>mpL}MvXpUUP@{f+N(<Q#kVY1aMr18dp*-@c}*yK1p>X;vT9an&+Au=#2od%
zJoAAn&9j?!3lFDM8mq6`=MDZxvfPzO>m|Pr;@Jyw;uuiA)-u-}1*b&`gD(l@z(~y?
z{*<Ap4ydaQu_YvK41kEJaIz`Bv~RZVJrajNJ;j9k#TY-fT>F3y8idXm=%1FvmDs-z
zk~au%G!tkRKYzmq&2X6%lC#__R~0jw^kJqyH5&vREB5<lcOVh?yA6v5`z_5XBh7Ec
zVrlFB`^FKuUoOBTxxWAk2nE-d29F8=k1@wx)V1M0f|S<zJXrSNxLv-PJhr>890!v0
zmhjiZ(BU__Y~<pWBK5T$P-5CS;a+Xg@*z1rTmZZCt9Db-+q{OhpMWBSR>fRk7roIz
z^N18_+{5<hF4X+^!2+~1xvl(FvRWa{>I2B_{5pSd?9qB_KCm!)cN%qUOicRIc#LC6
zB`^~??yj`NyKy0xOKoKWO0jd7%jHptDAE%q?^VjJd%Q8n({zM7dKE`vk;Mmm3*Q#T
z1uo0)M+VusRB1t@-1Q2BY(yLwWQy0EkiH9)4Yx&HBUA4w%<<&k9wgPW$&CATm{omp
zeRKLWR1}u2u}p$s%4DZNMaX=&=>STegQf3?d{@F0W@mnYYW}X9HJ5LgCu4RL_A!xY
zKgRf}N)jZ8#U<U6%gZ{^xC7T@D?_f3o#iA~r)$aYPR~)kqz6#Mc(L+2Y4vr-CSike
zg*J_94x5@%)oYP!4$u=3HP$oA!w(Hx27%j|Ri|yv-$>-@fcAnr)zu<TAu$Cz2Ez~j
zU)W01C{?T#1gal@z0;S6d9$jyK)ZQtk4NGc4`n+A4rPR-4eYdkdr1T@%)U-g4))>x
z6!$(MZ~^e$nibL}R=sctraaUT{grGNxfUgg!O0LGe+fcdq;u2eQO6Odu__N5vXA!8
z2;R2JG|Ni%f0uHk*2r-`OhW;YYOEpHWFMEeQ&8b3JJ}mf80{VO{LVkQcgJ7)1<@jx
zJ}Uo7PGkz+@C1Zc07)7r;3}E6xe8nY$SyY>hJi=rkpFq_i2Rf6Zd04(lwWn^XaKHI
zhW^haU$WSb2TH9fffm@|Br@~l#qh6)Hq|WBd_mF<r%fplLiqRvFUY4P^RC-g9JuvS
zfo@HHLaT{}{!|Ud`KwSc>}K0Dm%S41Kw}^LCfgV8w--KQd(EZsJ<2N$B2v70S@a0U
zL_ZK}b%FG&GG{2ih&9@xMyw>RZ7WD8uBdNVK7V{xuQ|h-vS@}+&vliIz#RCHJ-LS`
zrb07Ct(RCV9t=s$2hla^_&SV*S_92t;&9f;2#&K6Q~j2DFueB3p@-E|@;!vZWa{Xe
z{9uzgCg4S*dR4^WAwqAx+v<%1i7g%xa;{YZQbkAD`tz(MoYn#D^(Eoy<8df8rp%^j
zpG6xfaR%;bs7%;zs^v$S_Ioc6Qy%`X$(J8rJUw1Xq?eViL+~BQU^Gix!t^krrs9%?
zzf8A>c488R-n|ioxQ?_(6i9p_<wHHXUF%GU#B!^czZL6Deo8^D!lRKvb5xh_eSd5X
zWap>xOR3`!ps7M#g0kxazoZ_Hk$;7vK?9CkC;fgYfp*Y;NmcGoz<@-%26Ni~q`=N8
z2j1UihU>E(WgrSrf68l?+Zl)bgdY;^CqDFdrQab!6#}s`2Q%eKrTyIx!N^hXH%yBZ
zS%8-jpD<DJ!6NnstWp_C5hzzYC;+3Xe%~)x#Prh(9|h7lQF^FJ54ERYKn7V0RSe-B
zFO?bCBr2-2HInF8QUo71p^;gIGyOyPD6%Y(g)MM3A_7_3k3Uk^89|xkgpKtNaD%A#
z!IBK7D8j(N_Ag#Uk`x{W9FwuIb0_LZfIPKKemVVD`C8P9`ak{uJOHu{%8toL#y@3T
z7Kjsy4pAX1S>dpJU-QAxc1x0Q&&|9^$`rnRkW;8stq{yEg5cN4q?4#f8zPXAgW4-&
zU`qM=`w@_O*iW&ioN<D4d)&XqB5;k^Tj&31uUhu8x}me4sEKGE@UhhRm)O|x;B5u8
zql9ngl@iLXUXUCLn2?8!nwkf(P0BKTJ<rl1if5xLBKZJaP(Xu+JS=~{w_I=c7|mQg
zG-qi&|J&%c`(bTK7qV+m9F(!6NA>sehSK^J?O`P;t2voCVN%35HrOd^1<h8sp|;MD
z!}HNToDuQ14iuarF49)(YCrP}Sy{wCS`){75_fbs=J!*qPg%7bqWTpT39hR+-VH|n
zI(F<LVkvx~XXJw9r6U9LZg|ntDZ*BA2R))O;5d-o?e;GTexU>A1@hU`fA@l1UBqU9
ze~6XUN4G{_ZlcvWso;8(ukTSSbC3huaB-e2%u?;5Cp(vC3+M2>s*6{}t5|3!zT|SJ
zWmEmF2$sIhs;L9b8}}F_!r&(7X8wTv*G2p>wor-e(%XZDp4ZZWe784@WCS&IF{Y|q
zhZC_g(w7COO;_%}!MY4qW<1<LINJ&7=VYiV^1=%Jje`91S#|E92ect03i-T>LciCk
zGZtDW7Wy-4-q!x?2T0xkRUs;#9WDKc9_Meav2T?KG%5adJ{-0~+XOWa+rj*-CtIW(
zEiy<4{ZsJ0fWd_I{XYR;@{p7`rAblvwDsq@!AKeEAxyoyUeO^*BC9(!+(|x~)-t9|
z46{p69)RdIdJV#nWVd?ce+<(iCaJbD{xK#mB2=l*bFt@V!B0X)o$o8KgX3cRl>+^M
z4Di@yX5Mv}w{^GcibRPFyTT)+$B0Vr5oZN;ng7T2{R)fCd#Ql?Qbu?IP3T@l_<ZYH
z=lN83fVLl<E|TP}`DR0KYbsO)4;y>7+QaSOROQ;-8tb=wjX5mbIXerc5cZC%yuD!?
zEkR0tE#;gkbDQ>j`16qA?ctz&wVgK6Wv&)1S?bmk3{3I9z@~7xG0R+N``%6=0n;VO
z8iN!M8qp#70en5f@>sI!r9cdu!8#2*LSkQ0!w=yIJ0X+NN=3=7QnpKrIsvyY1GM{U
zAGOuwik}^R=CvG-mB9r{o*2JJfqiW0SpAKitXwoazTlrcB+NO`sm{yqUz^%~l2!Gi
zsAg|>7_G_&*}D+JBl1}e6a}yhfo2~;zDlCTJH9jS*6ZY}6i)o|<wE>&0(sfTl%ajW
zCh9tp-ACGM5_OK#e$Z#@!*=)(So)PMkb0w87W7SlOEyo$$2<`VKdbtpj0~N35j@&N
z5Ld;GrDp1=09^&rhpB1KJw{;zDPm;qZ8Ta;Xt;uq_fPonZtArNLyu~SaFj;!{}_%A
zuYF21?n}Ne;A)Hk`Tj+luhfq-cq{z>QI1m#`?m;N804yUe966E^={z8N2I9@LSCr9
z8GbNFL`KWSS@aGz;;0tL7FeDcykHpO!WjCZ<xqY3R;_pf5=PKnM*YhtG6nA|<`<jX
z>y)839XHH(Nardq&&IFa)t_9S@M8w#g>h`ZS!|eZIT7vuf2N}xPBe?OQC92U5`bKe
z=Ia@;s|(UWp3mdn-AAS86LPlZ|8DT~rq#`b4~}1T{p25}!{7Pk^SFK2c^zC|BM}HD
zZ<la>QUI`p5pp0`n-Y-2PeTwwhemBp$1V<KmJOb*z(d<P9}glJSB)5`HCi-~C#3f4
z`cJqwtGOdbeo|fzhO2}B!#c7bPSd2YLo`3c+MNyDnc9}JZ(lq-h9<LTw6S(PSv9ak
zXb`~@#_i%Wf-0J0*i9ru%O%*z#gTZX7P5j$p2Oo(g{QzGS(yh8kg}pMl%>T<R6)pC
zl36FrKzRjR;-1Iu!<FDXjC+sC8{~lj3f6^WAj*MGRz5d_(o-^Z7+<G3SgSp5^VSAm
zv8u5i511uwOK1p;s+9RecoRnLl<%dsC_Ir#evy41pxvm6qZC%mH}8LG1u3EtUUnVp
z<+?h7J2S%sD6=Na_>(oW6|jw&F4JKCTFVFq_Jt*eB6?9jO8HMm#B<RS%lm)j5wXL_
z6sndG1ls=D3;ZdQCuFrkg{IBB(*rfgy6E8o^W4I=pIc)>=dwA#_)<39a&6&X=T*BY
zyY+9==vVqI`MKkfJq+W5rssuVRdOREV&x4C4{@XaD@o$I{4f5)oJQ|2S@a1gdh90V
zFTPr$NZsn7u%~uIu1%=3X16O^TqNwPOZWC5&1P-Lgqtcw|Lm%Y$Fr40klP=Bj%qEi
zxDBIxH&~5TP8W95<+k|u9JPUC?}&{M*0ymfLDWg)+B|9|iqYB~>R0$#IS+`hjEvC3
zaVyx5m>BUvJj^U_u2>;s&KZ$u7_&{@dmm8Y&$kk8p2bFhh&P!DiR-Vz1s_F<&|#kG
z;b|ctc2^2PDi$5*NBP>T*r|1Fy4SZ^0ib7!MQ!>&dSa-g5cKaVV%?Mw4c|&ceX29R
za02Q+*h-QrW~ey6J;$cipXrFrcrlaV*}FM`h^14dsUVV)gi$tMvh)0|5OuWZrnMnj
z$)&y^7vb2Bjv-YFMZTY61_PYk{6>o-D9H6mfY3z6PC=Hv_6{|k&z7f(rpDtPSBfBO
z%XkMGA}y^m3Evc*Q?{*02y*BDI*Le`?;f57f#81tyMe0A73B(~-MyL`dSt7a%U;n$
zUscDn?DAxhqy-fAdrNm<<oVUy7JJTVl!R_weZ*F<(#RSa{U4sQr!%$hb;^_<<4NUB
zo^s3n>}|S3Z?v?kP(su&s;g<N>U|Tc)bO#L+*YCi&t^)urmp5viP2ByT1_DzB)NtY
z?~QQB!~Yf)5dz#pw+FyaD8;m@DtDW%tEp-MPinxUniw|JQq3c^`Wc$la2QrfQqKQi
znu<);_lcU6wI-VFLcT6+KZJ6yX?Nf@E`;wOK_JR=-Uq!Mea?nbsGno1x4nM#&`<s}
zpk8XmRc^xM_+90A%k;QqfQjg1Gga9VtBy)=30o?sU1ux0{|&f0%Gi&nRPi*$xQfC}
znx)R-qK8R}`UNNu7G^*jC-mYRbq`yTj#A=inEhajf5(-}IAtYGkFVLgETASdd>eR_
z_WoJvXCFHmt1hnrF%n{xqJ1NqH31?v_*tjIah!+l+2tb|dz+6?@ZLj1J-5>*Y_hl#
z=nt8b)3S;zS^F>V7+b$E1J!YQFO2M@(HeF~2v&#pZY$>wJtT;>VI&+-!t?mX)EZwe
zNVm$l#i&=LKh}pm$-Z1KL{Xro@-h#+b7?(C-hUj#D~BmgAtgT?mJ8fQetVcU#fz|X
z$(xQ`EKmv95;CMmCBaTI^|-z%csII-sC&)MC`ZnlB8+c~MUgyE19j9<UHN~SP)k(q
zVNh@_-Q>(eSliei(Iuhae!&_;+-3$;<YH!oDI6ut$vh7CAnXE@EG3K-KMqu{l{$s?
z^dkH|q4a=>+8qcNy3LJiC==V#u^@n*j31KFy|-*{xixhm-qdLi?JR6we+xS7BM3+o
zdXl&x=_TVPnBpzVuZ0-2c0FR{=;zW0NZ_(UnGHO5f#)MRF;X0zFRV}L=g`NSOc)++
zy&xvpPE<%OseRBNCy~+mj87&5T-HmwoM;d-DVpgDP>dJ8q&#A*qD6ZXM*;Z-9`_`o
zl2OMt#nv=RT{GKU;M3WOY5;1<ytnfBk@mJ3>^q5fuT2kHJ;*z98g0e@R4^#jsmruI
z4cn7?Jtz*>d>!%IjX;pY-jTq<Dcmvy<o;wyP4BUUSj>)#@%|jRL_DS%F@ZdyJoN6!
z2`&H77`oDmq+c7=wg10EpE=`GxOYeJ2CI;x_||gBE4PWX@Pt$aL~iIrbJh%c8I^Ak
zn+eph3e94B$d}&2d;IT`R-*-yE}`CuIeD1>8$YB)uld>ewYHwz7#a+g)HV-_?h_Fl
zAKn{MsZE-9g^zqviCly*Y!@`EOlH2AceW2W%T|5;cc^%j9oDsW`!jRd3?FjzUh}{8
z{}M^e4%@pcb7>(gpwjXZKBQxu+TD}*@c|L9yRu%y)hoHSO#oD}WGbVm2+}6-9H&6m
zW`q7P*Wrn;mdTT-+i<+<!5_xm!%sbwQH2qZX*=P4gO}W#4PMnWs>!=NqtFfiK>KNG
z*%y}_ilu*#ovviP%pO5JM9tgh{s$e#l2Www!U2Jt|06G}qhO&~cPl*@3&S#xFCD{c
zf?Gu!j#SF4!$|GjYdf(38jB`{mo+f4j&R2vt`ZQ?KiYZNzWmS1Pi8YjwjMRzlL9ot
zZn2k8K<3c6p=HacwN}vPQ!yryOk#g0ieoz7T&=g@BpUkRMuvodra?|<SWHgcgY7HF
z_%dH}@Z{2=X&ss;w8O-@wrP5eHRrP%CXZ2V>pFa+eiM)X3k$+yeU^89WpvwimgHIe
z!KqzXEcHsfnkCqde@AlDg|d*^CEqDRRf$!hD%cu6lP#gU$EKFU<TH*9^pd9z#2c%j
z*3ZP}Qtd`Vd&;KBDUEIO<(3&=11OAfwJ>Ik6j_rN=xASmW<1+&_<M)?b}VzTCFT@(
zGVo^;KL6t&c2XwCDv~HupcKVqo@XyFZc|>&fSv#)!DY}2S@{)NAez`=eJH!*`kzRK
zMxx(<>jO|xTyD~=X?cNZJT96)$JkwBM^OTMVnPfy|FF<g-}>#s5fZUr#hZcO`_tjI
z#QeW|hhctwONW>e|4-cEnvmCA4H8!460izxDdc)-Fia|ipxnvcALX(L@deA;+lZrR
ziW_TSeA?xit&m9dnQq;$n|RMJ=zL(R&e<vjZ=HiQ$#g%c(fJ#wC_52sIU1#(eud*8
zGZbW&8h?zKH;&&<VXT?=*PYiBhd2R49$1FrB3JVHp=Z8}6^3uvEswW(x<6qCy)XIo
zQ3xnlY3gwB9+hddHj)wy1x^cr<MaPuhv<xa=sR+kdCgwisHsG~^kh3_Fj;qgzto?l
zf2FUFc{Ri1ROu=SvSUDG<A#76#9m)*8E#(;Oh&(0tAAKY-7nv)c(^Yty|F>`s49n|
zQ0Pr1N(X3=6M@7=tf*AA_uB8R9eD_<sju^DBiy7DJ7<J@eExE_vZ55pwZ+QX+jknc
zMve_wnU4&K?>sVPXNhase5#SyZ-0!oH45sO%^k7ldIn+qh7T{p`;0&qeAUVeBNbLR
zNe@=w$Cr9}&cnL-4I~FQme-gsGDt#xD-FlvZ9E@}-~(jgARqxryoH9?soY=ZhNEb$
z1!q|)=H4XA0pP6WY^}q)5s9*zgs_9N*=(iO%{+j&@+A3H&$Rw5GjwF3G;0F$2aZIG
zrVn;86_6Baz<rt9BQPz2E>~!39&b8bD6kYaPSXzbVW!Q0j)z2K+1x29%!W7H9AqAx
zS;XHlG;9syTb?=OCH9}>mr3zV=1o{a&nJM2?L9_2_p?T{T=<(4ulhy}h48a!n;pqm
z%?L)Us8ALB*T*aD63n={<Y$Bw`GL2vsX|i&r_f#~Gi&JD=(WV|t@ri*H$qvVICZ@K
z)L5ph)-6RutPmwlaV!I57%1dxwa>xHTT_p})LJ6*gdG0cVLn3O7mb<YXjVt880d$H
zsBzjmI8=M)?-K9bTDQw3@j>t*n4(eeq3PsamV;<)OswM{+Y_g={7@}ifn!93V;HEs
zn#S)$Ade*`C7#KbvW~MVTf!bT<QV6o-DVYoSpV5lPb<o&`biGKx7s7P7E?-b_wQdU
z-ov&w?V&!*GQ3H-MaBNMd4>+2o{S%&ZylX~OMNxg&GtW>&u#NQ7xCq|U1mf`4Ol#H
zliR1qbLS%c9)3V`Xm9w<INz?;uGBTxZdSYT^mkb>Z3aF1nMx-OKtR#qFQI@xpN{4?
z+1;OC(8l#ml)3PeDBHmUvZV>3#etC#)DkZ`^p$JTrAy-p6ZH0fpD<cUF%|yKt)OiP
zw{%jnLk`H|r2MbC7c@kDl{N#Ao7scwkK|}_{N@nJ^l%*WwueoD8nB$b`!IQixB(zx
zYrEfaZhLROUQC=^{bXp%^f;Rw_h0X(dT4pQAO5$m-q;l4pI0@ehb~vs+1l=SKl!tS
z{g{K_D%V|vd%p*zzx6+-C^dRsmp*Q4zio&@0M|S^pC9NzoS#Tv&{_ER@YqIS@US4b
z%<zV=ZRB#5cs&={@0PFoYAi%6KSGdbJo_Hd{-MV<*%=1UhqFP{80Zj@3+=YO+~%<S
zCO5ty#!V-31v)QfM<n}?1y!S$uEpnC{`01H{;XzPNoq9u`PTlv2gd{Z{5AF#t#noP
z^Y8vEFS^wD7D{6M-|@M@i~8s?9}kzub(*Wt4*Sh6UnHT5toMt%=z^E2^UH<g6a0y(
z=i%A;{(o=LLrVqxZ8JR$0bqV7u}kz@C;qF`y+JWpFUzT$e$q)KFzg8Da!pr@rA(En
zJ0O9MNM^jw+|7ruh|ucM4vt9tSbETubGJtUdzCJZxcngPY6FN-o2%Fxf~|*C&2+j#
zwZ9lhg%ke2YJ7Kzr0+|guEp3M0Gyiqr!ktmT^W(>!#3Qh0Z1grB@N~*a$0O48D^F_
zLRsHKUbLro9dqRAsRQ`&qmuhY6<4p}SeNMwH~Q^s#jf*RM)hf)!7iNj+KZNQk{v^I
z^*(AL+kJT@8J5w9rP8{yXzNO23mPC5!pW6k_XOl}&nwB|kF&eYi76cD#%x-h5F<I~
zs}yEf`3X+CSgl}0s)JU&&Y_?KK0m-+G|W<lIQ!KKbh6DR{r^vRkB78Gv07pyU#%t$
zb-}0Vm`NH3JXlnk26}~_w-66pO3tps46i1eqk^>X*BP2I2Wb64pavWw)-%Hw8jIR{
z{%Mt(E<PzV#Rs`D{lNdHwP#~gudP*Vnw6-nRpPSsR~sh`1Rp3&p~~l#*sE3D!^P!w
z?c;9W>}4?5uvEg+3bP>3l3wZX+K-$4CI}6%yH={adc4d&jncc>xrHJ?Z3n+p6&gQG
zF5opiSkia25PNZt-`>Kf-fgc)fXs}EZKW9{HYGdhg(Q!z(&LEGP*D_oy&Bcla@G1g
zBW!+eM!xlWQIV5lKjS-+DYMkOWG35ouh(mVk#l(>n|?-9r~}ZpRoV@Oq+AQ1)CrW;
z@z~>gZT>p?ca(A8n&CtP3MLq4G}_^UQKzqA{y1Ln_953p<C7Om0Wl#EK;yZuq;)9~
zW!NbZ$=(X(p+Qj!_R#^3(j&Y@aXYd9J3Ie>dzY#`UIkwVq(D%4Gb!jk>YUFFo?pz0
zNkOdaqV#4jT5k0Zha_cx2C@#0DQ3Y_i%9D4_Ww%bb9GlLJdqNqA@*Vy55hA%|Mw)%
zwgoXlAoLdAmi(HL3-9^yvt&4Yen}1-=%L>8VSY5tel><q2>71jjMBMqJ6|#NKbLrM
zecmwoSpeL;e^!m*hY{ImNc>q@>cs^5Talw@e}YGus->AaG+}u(G$Iul3EIlk<J0eF
zJKkLZkIpcdVWrt&$+)%AkLaXhpW+V|j>XuPS5`eN<G3WF!@}ohNskO@w&4&o@S}Xf
zM&t{cB7b5HU<;>Gz1kY4_x}RHfn?<6VmiyuM={OubUS=B_cJQPIYbrZ-VVFaU#!gJ
zywdd+2jGwC+3zFi#2LJIkWp=lx6r8sXm+@!C>-njxu(+rzl_Y||A3;U_c{NvFY@^b
z%s0GSB_FLH1DX15HKwFQGif@0LCYtEUkgbph<qrp|6cw_cc{x@med&U`3*d_4?G{}
zoaIau|7Xq}4FGaUt^AugTu-D;ctmC#n}nW9Uqh+BvL<RtRd&X$#t(wQHEwR+dfu-h
z=ld#~b5Hlh=j`^+mBnayzwv@?RHNK2j>kni6gsYb$h@k#FKl#WHdo{clH6BW;#HMW
zf1P*Am=Q3)=>je(1Qmol4iYP(zr@KPF->uN<ed4@@z#-+-0IiU<FgcuUPt5zBc1?`
zD!;2HKDmXX$<yR~f9PW{m?bI*4JA4a;=gd`$m7qxs$mVSIGEG<li6r3S5-V8ux7Hi
zY?i7<ws<I~HpJ0F&QLmamK_61XN$exS*kozPQeOU60?23=<N+b3+L!h6|u#T<|ZJX
z_r%qdHB$?FZF@vq$_@DfyXulPb$+s$>YR^R?|VAWw~=q;{C=Pg9xgvCu+XUVa=^C2
zwSzh#9xH1fE33Achl9lriw%m3Z~4r;QJwCj*n-HG1n@l(NZKL-#hfvaxac}wEQW<+
zMXK4p5o3t#8vNXH9l)@$q${Ll#lvg(BzDW35eAkAive*+0SR>Jl9rTv*=6FshH$*Z
zXjVFt5<{NGPB6nP%`c!6(Y-Zf<MMO7&f8tzjCc@uvv)f=AVqX|e=*SY33+{#eLa3y
z?AZNUSb@MKn>QWx&4ykb4O1I~3y8A*0DW^5em@P9&*6ag{TGURpjrl@2vih|3&EGm
z2sI>U>KpHrg36dvM!#(@&kC#%W1JmOk0lub;8r=h{P>*(7~Kx<tRp@;=kB2gxAS?;
z2L4p`1^;b*YdR3GzRor12hhNg<3=zu!g~HrxcAGdP&aQe7jkLkJx@oA$==OdRj<}v
zjX($t9>R|naq55^#myCzqs;J0<-F#|%UD;t_Q(?)tsaWJrC6VT+gwsDxbYKU^D(D2
zktu-Q+{%Bjjwrtv7M0{qw~Lb#E`hWMdSbg1bvx0dHx(gd#xG4bqiAhF-!U#Q<9KT^
ziPbeZCcH{@wVi5oYrfrDzWUss2Pz)vV~s3qUMqU4bzjF8VG=RYTFq8J=8W@@4YfJY
z!c>$<`nUuZuNXX|{8yP(oE-<}e5FC;E*Fg$X$*6<$MTR9u7`%9Uy){#rW=p1O?#rL
zy$L477*Ci|W^cfCW@sCo6Y2iiS#On2bm?ZWUd(&QV7jca9u61lXxhsRZ%xCoUC-0C
z81Ei!Uq#Dw1{^pgis{ZIw@WbB?CNf%=cOQ2!xifuoUSZ=S@y1nsOv|o@2KO@<h01E
zs)v-2UDjrK2}6~1B?=$1H4cr(%Li80IZC*(E#p<R^z@-?2t#168XsXF>vvt0ZJ1^V
zdp2%x$*mE5vgzshV_yN(#yrg1g4c8cv$cw|S*_ZIL0QPS#>L7d&>1@GuBI~*?AN(_
zd~FuZUInz2MH~h{;Pdh6y8O6dCr0oJ1{oYrY-b~3UV!$xIe>LhUkIepxDP+XSj{2T
zSb7U06hCJns47qJ{q?#_7d<>YA5USP2Q|uZ%5y%)yB-Bx<}hjqSQo9a-WJ-2MO&+v
z%1sAyiuAWa8mnp6;d-QNeFo?;_BGS>I0~7BFD)9*VF|l^tf1ipK_OKwAZf!T;r=4o
zz>(Zyx7s7X4jn>I|2$h!xGU;G6K^f-97b9~fsWt2_AoX5_8)ShqOr!$$A|{tZ~xrs
z1GhMq7lp4IA4BHTj;VPJhK$-2|26jn|E0fxqa+2x@3;Efh=Ayl&1wk-DQs7Ap&#XW
z`(VioL)t28Uj^G=m5f3`%*ZCzLX@@Wde4S+{)NT5l-2`BiF%KfUC5?lcxc1omor_z
z&zsh4yZBzc?zzs-&*$DF#aGB?TncRzyj<YRq7fU{NcSJ(bqp>H0z4t46k0XQshhjG
z-bCK#k-L}7$t~O9UH#uM)dF`7xsivL(=3D#B-Spay)ABB8v3s$pWC)0<xAZTDTJI$
z2KzUf+bCd$tItFx_RxXng$>k9&&Lq-ale7VP`|h6oUH=A){|>Nuze+=iY0#Dj)zNZ
zK7`roGBdCnl(+qW4!7+K9$S>ny_1<%@+-v^WTgz?4A}up6V;-D1-T17AJW`^zO7=#
zDl#K)XO!l^=;He)J@ov%ay97kidyat(9>Ck!&1>9C39g!-I6Cc^!ynRBXl3=o7rJn
z)LL8o-?+@+&&y`JO@9<WDm&_hRTK!eXNpS9Quy71IK@9F6$-YAJLz)QKzvHm>E$j(
z-vu)vG_`B)IXvEHHeAVF|E*<jxBGrSZN0l*{-nm5|6X)^Z#?@`3_$lS?Dg_3FjoF~
zK{~4ov@gV~`4)0_ALTh`M)7z+h4k+7p8MiB$CHE=d4GTO^M4;r=<WIFmLJEnOo9Ve
zQW_e2EmI#7f!08O6NPcjHU#D@9Ji=SWen@lhS-mnQowkqr471!oCMl;qHa^Iy>a;J
zSSm2v`b&O6Z+d}{<MFD^$o`YwZuq%h?mP>$RTRII^jIaBimcXwieSA89xBx6aqETd
zv=cZ+#ZydV6(Al{m%G8M+?A%(J(tEW$ChkIF@Qo;^w>8^t=S`-JAESQ4xVo!n!&B;
zk0T%;gzmYer~$_hUa!6Jj=2jd8Av8ruqMzw*4Bc@#L@4^%(9*;V8csjR;9v^NQIW`
zEqNd$g$c)GO}^?c8aKxFi>q3|<W$AlHwBQISSs43G@DFH5KFHc)J_kau_cm@t#(c_
ziO0qNlCSLa_a2J_j4&CI2%2k>^76OU0fQjroXQlE&XVWoS^)?Us*7B+%;n|;A#Tb1
zGyEb+NQ5ynJm@9J3Og$~dTp6_W~lFsq@a5(<KpfaHfwNPOWQHkoSBxq3_{+(at9#p
z-TD|XTW|c4IVg7(b3G`~Y6oG$hch#OrPuM0;TZwg`{w&ip%KfOqyHc(P_ju1fg*ul
zD|=(b)l6WLa1T(E2^|Htq6u?L6)_QE@iV*Wr=3Z618Kq`aK~vO8$}1*k*WFJSc$I-
zZ2#N~rp}sQ(SPJ5-ip=COO*Bhk!a);RR%tZ4_mfdNH;($0|`l>S%cuRER~2ku62xM
zC_j9y2aAd%sKu0*SrOQxqGbay7SDuXmaUWTV=|5|aWkCo(@r6J$#WVh6`}z(XY41F
zFQFp&q?!PW`MS)_7q*afv`x#TTD*&^#Gqo7h+#7Xo$_Gk3mi^L<_A|gjfS{iFpN>8
zZv$ZWnO{o8DfHQ4N0Odq3pn^V|GHyMv>9h9xT(FJGpV{#BNmC3EwNohu?c*pN2?uA
z7Ga7IuM%!Y0mC9LJfkQgIGrrbLZ>j?p~7|1$BIT$1#+)j@KfMnTq<K4(^8{Wxp~uX
zQ?+1op@0^z1~B071Yztds~p)dsYJ#5L^D#nAgb?SK|kv6ifvoF;oN(`LO{<08Jk4t
zgIf<4*+kaTU}#h)&S1~8H$=?>!2%HPAjqWL0gh@IL};ZXt_i@(%<mazHm*R6who%>
z1osj*g^Cu<%8oLSe#k7lL%L2t%;wj>N_;x^<-{3hK`hQCI)eKMcK;zm%Ctfz!4)MG
zE3{DHv-KJ)G39_x+?4ynrqUfEurgo2aKIXXHlE4r`I8~U3~Ro~OR<+ksC2Goe(%3#
z-IlSW%y@WD$@>qM_-17@$PF%{-UEq7vF8m_7fHjDI0_3l^9^=NzIX@GDCG<{Zra|I
z##L3ZMHa)N^OWG4U`eChx9%VvECv-JOZ+m%8WEI_!)dI75p<J2vp0NAXSVpAC(oYw
zQM%OJdE9Od(}vAuagnX2k)?EYjbuK%{0Bkgpy(=Im_T5{O2V@K8j<*zkaaD!ePQFM
zg-R;Sd{D695qmaJm4pI@!p-lG71G8~sDMOTJce?OW_r1CR#H$cnHv`mp_)Tvy*9ys
zU9A2II}*)a*{gs5OkIgS0(q|m7e%=fn?TI?o#H92i;Ur^M{EZYDQh^q#zqxIvmUvD
z5EVC++3e!F2rC~8WZUmh@b~->FaN&B=rOlF2uzk<*8y9^-Eeb>F(;pTZF@4t6(Z<F
z1-Zy-(n6)#r8VWmbl$%0xby*}eWQ0e_Dg~pVNZ;43yvwpFRh|IeIjTew;+xqFLp1q
zAqxk5H+D&<yze2H1rGXT7Up#W%3@e8b8TSyY4dOS-=<Czz32;3R9F|p<2wmM#5y!Z
z5&%mE^nLeGXNGx${AhSaH05>=3O#Tf1=g&;#f*`%DL)m+G(Ff{;bsCA?!t6sLU|Jh
zPxBQel6mlqR9Q|5#S(c4bO}DG3d!Wf!(o<Az74RnIm3Cwo3bMg3^V3^y|nVeaN|oO
zANvV?Cq!rRpBCtg9^5199w2;H)6}{*oiUP8fpqM@%@^g>wh9GdOOKdg)`C-+l2heT
z?5$+}1p4to#R;S1@i<A-9C;~!%^jl$;ZV}8M#j7V@kQZqP()`UEIF~nkirHTSZ!0a
zeoN_t90JYUB!jZMSZLT;7Z0`5R%<BgbT@zkS<Fox8_e^4#PH<Vkz$irNKN?JaFTx3
zcH65TJB|eaMd-$uFpB{ra&Cx}Sx2i-b0?aMhM;rsOj7xvVcs()lv)2qS}wUwudbQ^
zD8kQB$z;;4v_!~XJ_KE^SR2?`dt@|6;!Hsjp7-a7Rsu2N0>8@Vv+wQ}e{)FVdHPSC
zdZq9X!?yBm-#n)fJScI<RBPZ9h$D)s<**+n?t9;h|Mvs|!=bE9{iJW$%3v6<zhCBk
zuGvQtUwF$?K;R8O*maH=Evlq_lM-;|-&!8^Lg*V!QiSYZ6iP!Uwfj+KmA$@3Tm=%S
zeoS{m)<46t?H7`f<}+zhIn#J-->^qv2YSDSpS0F}1%SCJaA}pLFrAP;H)YQBOPNxX
z7Xir~pMx&D2t+n|aIG*ipLhy#h<aYIY5s>h6+IA!3LBS3whuHdP^TCf6xoDK<UB2J
zkzJB}=OoGa@_$sJ!U37xRaPpHrX;$eIvlqvO8{5Kj{R9MMj9E9?f-9DCg3EnGpKol
z)WPV&sxDUXKhv`M=rG)+in5ro^5&>r!W4h+WW-T;2&y#g!F24Of!Xr^4$L-j#S$Eq
zSSA6kl!DXfeg475po8A$_zfdm?UV0_9P~;%#QZyfgYKQ%k>T4=ff4c8ia~Pq4sgR~
z!>o&Dl7Q{;?#J_rnzuyMf>4B?86sw+OD`=}j;H`Uam<!_fWE~d1zQ6cxsvpIp-3AS
zI@2xR5aMN7oj2bDfEk4pV`c9bH`%5on_hbP7sNV%GC{3L+8=YBJnQ>~e+V~&Y|Qj9
zxqwpBRX@D=G<mS8F=*S!$jEV+wAl+dwLUmiTx!Pz^h|-e)P@-_nVnvMe)`?$N)}6J
zp+uJFJLFu0#@<qq4oAf2{}bKi6Eg4~6KQlMn8U*kJn=rtI?osc0QKSMuzmzpL(x3|
z&<o~S)UPNx>yd)HT9UKMXCn-*tmmAgT)T23C8FtBS#8BRXJI4z1mW1TX{zOZ38a0T
zabvfp!TS6xpr~GXHFzDAdY)^1&VN!wZ#E|CBlmmFC)tyP<G#e|z1tpLTz~%^`8pb!
zQN8l6{L{6VzOKose=Cf7BPT2+#N&X>hW{K6hQKTAMD(-ALedmgsIQ!*I`(^xBb3E<
zbUrfUvEca!3Haz*T3a~ornnd37UA0adCu*g->60{T&)vX+|0-gclvRv-R)7@9|q8#
zS9R!vsOYwZBZOzyG*}OUkb17;oQ(t-T^UhA0<|b>$5@G7+moiEv^Ldc8W+c|T0KR8
zU4g=8e&!^b$NGO^djd#3B;0$zwibsI)9!qjF_O%9Nxv<s^CvAOU$tsTh!oyl(tDh~
z17fXSzMcBgT%gNjJdrcnSzIu}=v`(XXZH&T%3&2B+p1iG4>0#H6=i{6ndrB5JREHw
z->KuKk$U(&!&&5B!@&b|MV^7?P*V^~k_V%;KkxoZ-v$wF$xX|rJ}$Vb+Gbk}8ki@w
z>7n-^F1Rw0D~?j%9FUGq2^#V7@b=v8qoRL8gE`I?TDAW_BC)inRHt<3Ya3;qrd`aJ
zC?tS<Y=vq!q{|k8&YPEdrue5QlT_uL2{S9oiAP<P;bmxZfK7zy4sbs`6*rH45n@gN
z__Kc3_yfjycdmt$b!Vh%KqQwwFCH6*8N*p&I<Tmiy2fBZ90Sp>f5PdM0n()9O^R)>
z!)?~2v7%jGV)XfEda&}Q5gIg4icQoVr#6-q`@8q1qEdfooY7~b*EEll)k;_YTt8AO
z%5r=)dYo5!-u;bn5n164^6YR?i=f|ut4H9yAqdmpcFps7MS_VJoUcc$E`*ULQT#k_
z$yxe#QIh&*C<(GEahTiE($mL^oPa6mQbuGc_%}ZDym^AJzV2_otX9)8!<Wa~*d%;l
zpXkYF2-43X(8|ni+)p^CI%pVR`4b7nUeT04v!^fV$$BlH=7j5IDL=e<szcaVeXj*7
zv3w0!p)8g?1_bdA^-totoCW!=%Wfu8cLrHDpseq4;UE`>vdRx;n3cie^(kZJ(HpAT
zTGsP`aMc^OAt{}SJpNEtdRrlfU>Myd3*>PXdB3K-3oss$ODccPjYahc@?PlycsVXP
zzAau?csfwD@qD&-{Vi9UCz3vwCdldG)eT~yn60=JH1=_yMCnIjCyUB(%yW!)Z%;el
zw^Iss4g*jB4p}6?{r9{vyCA+rlqp9Lf-r)a8~r%62h!8?s7_4Xc6?UC%TW7;m?AIY
zP|hkHEQO7>M>0a2`4f8mW-ESg<AwX6iAIoy(*mdp-<f4)!uC@x>b2KS9C>CKY&I@l
z4kq}w?Myj;;W@iXv2I_z6fmYLhegNnuZN^U*NJ-2N@<x$eleF0SYIFZu!z8H{3+_W
zOb%dE{bEV->a5*llE8kA3is6S7ViETmeA1pDD>R6`nYVT*G_)k(D8xq8hLyc^jLO6
zJs@QU@8Q<)J*Ty%?bleF0hW+;|JZKjl~I{*yN=olsLtOlfj#$QrOyZJ@fn$|zYUGV
zdC2?=O}=oS7U4A1lBd<S=mAyn)0#%FH<b@Hn!7)<S58}pmZUNd6uYvA2*@L~e&ik7
z(A2x1L^82+>=ZwVw5U#D6RJ;I(ih^xH`QYHzvaJxNS0>(m((jPfM%vhp_CbZvuT4j
zA&4uVsU-FX11@5JOh7ct?#!B9ZNDR@tYv#e%nHj+iI1ccA>pTE*ssJ!pVpHK$~S?I
z&UbDzlDd+}cq+N%Ux8yD1qvLQ&sYZIW1a?!&HMl^tgrJYo~I4PG!~b8?lv3r{gX~A
zcMUNUiA}1{rA?zPvk|L~RhW+ZEa+cVDdgB2MjaB@WL)E*`ZoW8%#Gx6ZSDpFT&gP6
zugR@Yk}K41;2uqR;x)-KO`RRG7Y^ks%x+wbA${v{F$R>&7<~2J$jg4?qEP4X)j!nk
zX6t);4=~S#_{eE#s_wG{BoHK%6XsIV9p<$Jy#p{3D8$eh#JHW1N}Enmikv$&b8{xm
z7Hejb)*2bK7rzjeXrVF}ncMPqr-Sv8>}^1osO)W8S^BE09ebv!82wD4)n6u%zO%g^
za|`fkL?3y~5iOJ%<N(W8G;^Bvv1~dnH?y4jLlVRETq=j6^FAx5S)e?3<;M67+AEl+
z3+yxw$gRb6GPL`LUhx3Z-Y{RLxN<fTd)inzXEf<{G|cU=YMMh2kaF5E|D9jfO$k?i
zZ^E*uOc3*fHT{&bR`%+|7Upq5t@Gs~Z2wk2ho4-VueY0dyL?Bg|DCxu+il!WUu5Ku
zo|@z~`B@b7!nRu=GD$ZW7JGP?7mUXb5@({mvr~x+lmS_c)Xy9o5fKgIF~k_8>eDI}
zEr@OrgTzFl>=tTH#Q9+gMPq(!pN2%cWe%VCGgUGSJBn7+Yb&#G|FtwpUfW<nU!U(j
zB%QDD8n?{z98`eLEK616JDzHiESqhu;Hh8t#Kl(ohq=u8?;ZH#x$Ak#?6ddSn2gK%
zF!vgwLA7lQ9P+Q+N2`ZJw%F|(SPp4VTGjVMJhy94UBpU0F5GN>FV~!-x6~@Zdl)j;
zOq7V9$wGK|FZRVC2TSZnlznH7*a_7}Knx@VkZ>%VyJVHrsvpl%)?)RUeFRb$-_~J!
zYoaF#?L?Y9j_MO*#RG>n?XqS-v*}*;13il1TcK55{@YT@iJ60i>y_$f1Ac4PJ&a|(
z{(9s@?0E^l^UU_QiNiE=NBw0>3jWXKx%qq|<O5LPNvMXU!FCcjGwa~tDk=k$+bP|!
zlolAm?VhF7=90pLZ;iJkq7s!+HbT32c?(GkCSfB)g!CcxiwI=9J~j<#fWYo&!-V+V
zjXDlJ*B{GEw`%Po+x0(!Bv3nQgh9)RxtsS|4EErCOq&Mb^4F}@>LE>NNZLA0$QkPX
zXaWnP*CLNNC922Q9sJ*0e-V2hBz0gmVNL)!Ujh<5pmpOSD<uA!zu&Q)zuq7!8rU~c
zBCkG-YMbYFZw9+Yrs>0kw|tMSJUJYcl$7m5;n;CrhP+-w0*kuY&6j}abT>hz>sLgP
zaS4MHoN}5=kE3X7VL(5Qg$E>c$?S8`9FwoGM!9be%iniByq~7eFEOtA`{(;L#s~YI
zhfA9XcyJv}H)Q+ZroA<(#ZD)Uns{k3QHl4cH<a;gMgpt+j}-{{mYcoxetUC@vUKC2
zjV;h<3CaaLSzFr6PlBfE&3UJ+;g|Mx^F6Pfr>u|`gl7eg$%-QR6z+Ck<_AX8eg(-?
zCL#o|G{(8;HpXJX6TjxivuhSU*nlyesUVAoZ<CBkr`&PoLj!9WgA2+szrY>9@HNbh
z{7Q|oNCZ?^$Z`_4N03*Ba<d=aEoqYS^MO@VC}8jmRas`D&ID_w(Ari6$Zqq&0Y@vb
zHW!%`)5FImc(dNP=Pu<7?ZXLPkET5D1**$daa_wDQG-{Z;be%SVC6EV#yM}nJvl@~
zU7<#p7c{#ak4H1$ul0(7Cb!Qc8qpYFTVkdp^Il#tlG_e^4P9Td>X-f&^2?I<f`#kT
z&*Pq>LqBUT)e{Iq%%S7{g^8@Gz!e$W{3*Ee0=X2cAed9(3&#w83&q&tm^9zp$FSrb
z_bY>+3JO%qB16nW7!nyR*zW&g>K%h4Ys0ne*tR*bJ+W=uwrv{|+qP|UVkaHjnmGCL
z?7hFLx2u13b@i`aYt?;S=W(1o=@>tSz0>5r!Y6Tt6_FW@^(<cLr2~96F}!$?VIE4V
zbPm7bQ%YC}$;?s^=IRDhu6~UhG_YpOqLCDH;`k;0tIm{i8Hr(18;oM`3eK1eoXfEf
zu$Ue1uS-<h%e_zmw#Uh^M9E^(C+0O!7P9F$lKC6&XOV?lHNbl*G18~Ugv;G&dK0Uz
z@GH<OMl?$RS~bndn^!E>9w4C87_6J}N*tgKB!~oN#z-nW_crio%F;3(;kZMw@Ixc~
z#*vBWCrf`&JlF!iy<K3^97c%8EHaHKq^9#zX1k6xG4E0;Y(PaD2pJ_(9{6C%=1)OK
z#3;(Td-@Lk;*Wh34U|v24~h}4|24V0kHW-;F{Kg-Yh7Fw4gP@=6oREPPwOmCa@o!?
z=%PZ+(TF$k1V&>-t4z;5q9X#dOqy9NG|N!!B$gRV>z^*D#thlN1UAD6TSPbAH>X}@
zi{Nr9#Ho-)aT$Rju1}vr>c~<U(-V>qsU)4%G8)^T?tDWIk!+MMLPQksux$943)p?n
z3M!ydz)GF-7IHU18a=a!qVW1aiXac+4=8Vi#KoAOi$>U$mY4fFiTlLp^X<gfQn~MS
z7Y8M?NT|#i-Ts|zSb^iLjJ}rd%V)zVwO)cyQZRKK=616#j~gv>);rlLv=fqCDwkpX
zJC1SnBp(-iz^P}<Z0JQO2_~eaEL{gfBYEWat$fwMh-+SM@)tE8lw`U*_6`K-0%9K3
zrz&VuxYM+>W!d2IEE(0o=J6#3aV;Mh=+e4@lt20;3F^Q6e1#6TlH<Iv;Z*^-?>85N
zGGO*^z#|)qXM}Ck`;>yVG3nm64<hq$O0<C?pK{Hy;d}XfQHY*^gXyB6d0ai9$s<|`
zI@p(v-_T(c0Vaa%a&A<@`S?sjy1F_*R^KHP_{O>$^&oOu23;Z`um2UUg(Cc_b;thU
zc8o<82~eJy;k6n14-gI&5gQnU3eN{``9QeUJYonFO^60H(UU0jd0do4GBM)uI4(OE
zV1E5f3K_Yd!zd#>;c(*NS|3q-3yLRFtSFv>>?%6+SdgMSwp0|>1MQ7Nqk0c1fu!P0
zvrv^!3Zc!vzcEi_#u9p|5~y%kKN21-EPUtyE$2^-=3~|H@AVT&Ay^GO4F8c6?xjB^
za1YWCaS<ri&7xn5kr^a%cU8}M_p=UVs>-k*G7HUQ;p1V&-R)-W^O(^q+?fMsIJpIL
z9(gU4=p!_#ie&SY<R<j=@SQl$N~1sclk7;Jz*f#CpzJ6sv+y**sv1qjBvM6q9fUx?
zR{}f|r<i+^r#R>c(SzM-v!rErrI~7uzIll$BLi>N<rZmFrMAG)ZPD^()U{fH3r<-U
z{`Zz4yYhsL+Ds6PENbe5(T!D5(+*3OoOW2jTkf0sV~A1=yay`M;XY+uyDeJG&(A@H
zk?>4+7XLF^P{);KjTNtr^uwIK&^E;P5z+Jw9_l8tz|w8!4tha{23QC=!Z~3S#02}Y
zeHMk#--v8VnI;AjT>D2fa99rKT2(8I%b1-K=Bt(YUx(z`CsV(#VY}fXIs1|1nQl<0
zt16YAs1khAN4Z^Zbz0N-4HSkh5<EWE*Zwk|7{E#bo|x6~GXi;f#0#hj&YJ4~m3@Bf
z!IOI_IhyntDdIz<otZtoYIKN;^P@ZqpyLX^E7i<z!&e|7AUwSwCA3)R{_SnlVB+*s
z1fF(5K}byZ?>L*Kr{lLOC1Z$NxZOj`s)GEF_-;GW!k(Gn$A2d#*bOWpA44(b&GPyA
zc@T{1yFM^5Aftctf)vJ_j6Xe0k$BfjhbqiT5$I+(eeFC+$rRzuDf1<ar8OAN7W3~o
znry7%5pU4h^;f)K+G#9BlJ=X|!QH?r3TTgqb*4{Bsz3gtpT+k;1tm_uv>%S(jiS$#
z12f+icSz@fup>L}d+F%+zKQjag~`pz=O#dCVy34^f}o1gc3W@Ip<S*yLkU?dN4b{k
zZ@~-!@f}CQJsjEDw^+Hm11t%V3(yS+w&1h`VbF6_;ER3o>D7hk_Wt6*vsij1SY9Px
zWXLcis5An(slgY?IoU52QO(;q5JzrK#%6H(RW9^GmPi-l*FFva;tG6t@7Z1fo~D<0
z;0KPo>~LK(y#`q7%>E>N_`}`G#e(15QzGtp>@%0|XTzGo&8~qxdj{GnnkI+!DVe_I
zw)G8t^+2zOK3ZL_-Afz8nfR4Y4mv<}gCHRyop}Shd-<<`87v*5#D93o-)i<Ev8&6&
zOHZ*^qK3QG?%STs6)2+;r6Y7$F6kPaANieNO4@olx%rR$?!6vBUvtrB!AUPhS<x7G
z`c5~2ZZVrVQPiq^2*QAh{Oe_on?I3o`GMlK<;)_vl1JLeL6yP1ne!;ok4)**R&xB+
zSST`V>^fQF34O(WvzAUR#ogLNrIUzZQwE){gdmWTG=h9e1esqnPcnAS(n_MzJ|A|5
zu3~YiDNG@OZbp|vYFOcOzi7FN;-bv_VpUF=!-Jut^?f&8?T5I)V@D|SI|3{WUM>u}
zM~Gz>5i6zT0!rt@PSInI!hb><X=v?X-HxOvcSqu%BPJFbiHr59KXEtTpS;a+#!n7s
z>LKln@$9sF6J4tWIo>Dq(fm??#MK#%j?Qn7ukZ5jwmB%nE-1dvR4J6g(^k&bMA%Z7
z|3VIb@8hHWb!^i0EACSVESlI)s)e!WeK~+0?t9hj#|$tb*(#JE0Axv>4<?>JXOT}J
z=tz_CG`hCkd{p<<>2*|AGNyjcuczhdh&~(rz8CwJHg<s+*O6Bq$Qiptuvu5jg(|#E
z)Pk(<gOHl%r`6{dYbs=fX*<LM_VXlKnOgcOJwiUnFPek`k>dnuL&N<e$6^I@0im6l
zjMy0oKa-5Wb?HXFY-`5C!n$EW!}7G){wD*59Wc72WAsK&4&j2>5ZPc&TytYYv=2W5
zS@eR$w3HR|^EveFR0T58$p9!d-i;f=0P{rh8>aD^N(H&%2c>Hn=sETH@X1-!9INCz
z42XS)iYv*!c&4y>_nW@rBYmvO?mXSF=TV!-u6t@90fR1`Vk@g$2h+E*fO2kSRfZTk
z+Qdv0;pS3({~<d>k5Djfq(Q;DKJrM|ha}9(YS0Ne(K~3|L0sbXKQ}={OJ^a}YEx~}
z1-w?{6`?^yCZL)E2w}z3Lf+&#5!@@>#7w%wIo6ls#(aJd6wCa;+~Q14GX8@0Tr3-9
z6XBoYNQ8oGy1Ornx^(cG4E)fmgkLvCk)oSq{vPX1N44oHE!#O#Lr!s&Jx_LC!;ivP
zi+%0`N09^qrAerRX2_p6<oriVanu`Hl=<JfnL=Io6m8JY3B&CV)1>YHlJoRpfu9fX
zfyanD2z}*Ayexwxv@>ilq&HP|VsHJmt2E2fg}a`=n8>9!(P$+{MaA>#-a6Wt6MlT#
zR4Lv9Zq6Su;d3GE{7kS~#j7jB9zz8EwpL$K#H*73IXn0!CRzIrupc}?b-j?^M;uBL
z@_Lw$+(l_%<o@ajl7Xk$98)F}MPmOtq1xAs#hU78nQsUd;jj3aR_y~n=8oh(@il-1
ztZDpB;PJ7TzL>z$x8Akpi7bnmBOwPuDJ&e?a1#ekqT32_C9*I|@!noR-~y-`N4Ox9
zl@ig0503FX&BYe<v^$|!vxejGpr#`+;nAkbo;d%~DV*yFKuGAy773QdJ$AunSv_N(
zPv;2zngk*#E#zeRvdEi83Moq0SNE{2J-YI7S`qD&y`NtoIT+MwM)$7&v2*MIx(5%B
z;wTA+1UMC_WK(!OHwp9CN)!xiOMUBtQ$fruJg&1@$!E1jA$R*&RwsnX)g<uA#5Wn~
zt|EBI>L`nDguZKM6Fr|cK@u}p`hFkr04(VFbILLh#>{;tU_=piMfu28uyD=bggu^n
zeS2hGkXjj!1s~77?L*p&XR3SLC;r3$l(#Y%xUi2eNFJrpMLGRAdy6c1@+wN3(P8cB
zvOg7+Ybv8nuCC<2C$G|IuON3dZuA#x7OpT>bcZF-{QQiR#3sG0dJqIt9ntMMY?2yo
zs$)=>m|rY{uptO|G06bL3~kQjM*1$ya@8ZW-Bjqq>z8x=9CRi1OwN}0H-)f8LGV$H
ztixj>R5ViJT=_t=x;BCTi8!QE1Sp#$z!79BmU4bNl%X`;(zywFa*_+56(b5xS|eX=
zzws{CWG4I})ZyhxZcj}>;*}X?h3z1s{{$_nvOL^!)pU7(6YBjHZE`l$ZoRb-8Wb5L
zj1G_bk=8xRS>Ct-JQEGU>=y@<t*8WjMyra!k~G@@{+uBpZ1Bmuxu;i13AUV+3}oEx
zXGFv0PBp*@a?c=8`(WmEW#J;k>m?r90zE-#q1B25>Uw{4xiWTs9Zqf`3a+&bhz6s0
zz3oh=h>MHIzFR`V+6a`Eg3>ZJtDB*ih%Zue7HzCLjOqCS&<f%eX9ThHen(E_x^Lix
zzt{aJHzK$dL>7G;0D?oB$fOvt6U8FIXTnk~4F<r;-?oDRY^SnwbKIlTT=`5ty`PT;
zGqp4MOf`XMQ+WB}G8Cr)lVD`|4TEE54}+oNOK3uRmX@Yc1Fg7`N<91=dl;s7D6MA+
zja@36do%@<r00yJ!vTmrH4Onw)aRx@jL6@-lU19^fL}}23qHl}9jHAANQbC&YsF<(
z4;0<=l&gig95f!m4T}h|UgK`7qM!I{t$dFTURIthtV=4mVA`6JSRO)mnCe*YKs>1U
z?}%K}Hy!_L%pm(QW~51kXgaJ2F3Od_z#_k#3&35a6G=|7gcLzoBVI`XdvHZycvh;N
zSNw*gMlgF>MCi$aQ;7V&E0|f-fyNRw4Zfe1-xFnx;1e#+Jcht0vNb7pLr2z^h$a9b
zXlf=)5c+Xt7^A4VSHj%=&A{{+W*|^CYul4sjBHJv3dI%yH*8LWX3YS(4>78d!&ung
zm)zGG$7~@3C3y^Wp+0(+C#)S;F1){=6_{Os<I)o?2G@QqU#%(sm8zMh4I3Tqb>{uY
z9dOZK;|jOP&h7B<;{NgA+kf8Mm;~PDPDDWhqU^8oU;`Vh@RGp%MC;dG6U6ndSTRP-
z9@bWLim3QV(I|*z%bHbbv3T>`(*j+-S?ql+VvoqtS{TGg#iUYD-IFlTH7+A|h_WO?
zp1jPVRv^6k>K(w6WVq&!6drq)6Dy{Z5%woC?<h(ZO{~sBOn1@m+_>ZqvF2#xsE!aN
zkTB7ue4k=S<i&)zrm?c?Pg#z@N~@A1&PV5=bC_=ij=&{_CiR)GhKQB1o46ClqHF_&
zes+Ek@fXD2J~m>_=p3%L<S(uIiox~y2{fvJAxQjg{X{g2A1Ov8P}T^=zx2peYF+zr
zYs+{^O(r-%d+@V}+tHsjPpV#2zy|z=b&PJoJvx>Mh(L=x*L^(E{@U$8We6Ed!fycV
z>FB#6E1&2>Q;#{0NI=hU4BWmI(37u}`tg}NB|QpJ0ZMlX^l=$urLHjdKNo$H2d7Dg
zL11(}5)5%(_6y21t5@JSy%d>SZR1fg@JPf6-*gc<X769gle<y79c1p4p!eOH!oNAv
zCxR~JsfGZiYO{tAi3?br9h7@4RG=&@m=ruv59P^T5f{bu7!DHsXwWbJnJ^5ZCsH19
zGh|C|=WZ!flq8TF2DaM8@u^beY%B~+{`|LY0uwbZ74o!$fwExHb7=`~IPXfyv?0D~
z1j^T}6mXLsVWQVF-{T!$2x!x)LE<Ka2NwA(mBisr?VYB++@&iH=^!)Wjv>VE2G)cL
zYl*le6TysZ1jBA7>+MUdCSiUKxri@bu0^+<fs7imTzG=UNp^0tz7|NNKwZam-+St*
zlNs4D5JN@<S_~};ZD`;oUmm8@y1~G%@F>}Kb`P3_WU58F2H)~0_2iKjt=)7oxT!uw
z7~10!!msQ^v!@1XD}tDural`j-MIlgzmcQ8@@_H36mIhguZ@jlw($g7FJ#afiV0wH
z^>8x<#`AfWrRPLy-t{6Nr~(>WI+Q)ZFaN42{UK7P)?*z1BXvk-#z9@ogL@)U(%L;h
zucGw;k6%8#uu{e0P~&nKi~oH?iJ?jD@zF9EQ~ozlb_Z4qj2MncAGp->hPj^So9@L>
z@=4gsP|`K?ejriXdH__~x)c1n3J9ylTVGOO_0*ee0OrV(Y#Hib7s^Y&R(Gw)If4`j
z9ZR(9L;FIO)#wiOu+p}KK2608{Bv`n){X-|O~n)ZedE5Z)mBEQwKvtt&&}PVZCvQ|
z*M4KHJ~R*ZFnXTriBT!=I(~b;M8bP&z0DApeWc2E^F?ZBe$3xrRs}8Xf6D8Fx&>q|
zQod_OvmYSl61!5Ng&zKDkt8G4YKd*NU6z*!!SZHZ<tpNbW3&+|kJKLm#Z&R={649^
zO-H=Qd_s9gdp{P1mdDL$0LmUfmmcogxJa<m==PYB)4&smrq|GWD#fe!s7bh1@Ad#a
zq0y6Z?IB6(`-)TVenqlVC&|Fhsa4VXK-aAeN>wN-HkyfbB+aQEgk6*MHiRCBJ_#ms
zGb-Y#_wb&1)4%hY@zlT5qI=W3JMRtbQ4`A76c&+Ihb_~GC3x+8HNj?Ty(ZAah_TP+
zPvy8w0B8(WT3*VL)TTfEySs~=J>4sMeZIZ7y}f4oa}M$HubBs(_|r=!>7}au`bGI2
zf{&x;_20@&7XX0Ml1_drgOWP5^dR%DDQcf&`;6!hk~_>YT|g~T^+e}mf?ff6BiCFP
z;!ag*?-Z8|`)Wh5RQ(@Jo(o~XsA8-d31;#7SYQcFrAB51im)fllFtT}%g!~k4if6r
zdJBxC4r_))GGO1>GTMvkf#GJm{S4WB693N8*K54fT65D~Ys8R@87h%u{opEaj;cln
zJqy9fRX&sXSE?WzPl%c~2p^l(%D(PBNT9IsAxPY2$rXdo0aiy==i_a?ij4vSnY0r&
z_zaK6n?7CY6altWiJc!D^==9PX)qh(6_>#RCaGyf!fY&_X;)<)#3NCqAEZ$YT?ba;
z2K7lj%K=;UQ@36p@cX-r#2oH(ofrVx@4yod;xj&Og7^cxaEF)*ay9H2d!k7f>UKL3
z0Ym$@?5r*jlBSqxR1pPPM_NABvfYNvwW;oZCqUnP3+S*SJOtzHL5rDydy3o!S>K1>
zK3bmDyg%FT{47Rtx6f>Uy!@MOzJwC&{}l?$9S10%VXg4OixgfF#pI9S)+^}o%{E9y
zNJ=_IWUdnF{GBs!0dl2C^u4nnsSTeplx%bh0$U=2z@5@cP~8Sqii!nP5s5=2%0k_e
z&Nzi_{^2MjHd-2nANnVwy&x4{A=C(@`I{61EW$9-QoA(3+2Bucov?VA9hwaH?=`33
zjb;W~E*E__gP+j_a&)YDj*e3~|E7t{se&!A*v6bG(T<7zPu?df??oO5&A(KpT7-4x
z{I5^Tan*LAiY2C~KdskvWF55Hz^22~YI_^gnuD=Ytxf%(o`G9oE+UMh<>>UXefaFy
zKE(#gpL#M)!sh#S$nMFTHrsFKUN;8{N!+$3m64P-!La_T5u8+Gv(qfy_tqhX`vB-A
zEWdvI5|tR!+knzYVU-lnolaZ-``B&1&iA)K&D}iV0K0Q;P`TePm@&>ye5DMeTclxi
z0gtV1t@iIj0A6ml8~MJ5lbpv3@7BF2`~+JAmJf;HH}Bidv-Qjv_^;lfs%G<NTyB60
zjer-(dB_s6b3zW845jy7{UlrMjH&*=-X`-^z~@=crh8w<cg^Eh8(_WnriPbW9}=07
z^H)*H2pX*3Xj(7cgS5ody$y~|u!a=0pN^TVROE4H1tD^&Sv|+bW;}&?J%1)18c3dF
zjIGBm3&0^daJJb`SmgBo2kn;YKCjY7SXc`mCO;3RFmF2L>3?M3f~T8|f-|Wn3|;&>
zFJqTX6%#3#c?@(uCx7+z`P}_2_ar|jD^gHr)9k-&&(4bUF;2GZ;|pro;1FzOf21p!
zVhDO(PK{p=pg+3cW@|nX=z8z{vE}H8r|9@?#GA<?x#79{oU*FC9T0&s{=pqXC}Ti}
zC*1s=VZZTcK@9QpVdV0G`?j+_o&HCq-<{s)^gPE7Ad19(&~szrsay6VkeAlYuv?P1
zA93VhVgIN0$QEv_=K<LTWR}^9pHp}iXk53u!Ry@Tb7AYgL;0x+QN6FP&<L?m$KQAR
z^A>_2%KO{obCl|NY2r=(;U~!6waTv~@BV+#-6(ToY}Nlkcai@K-3^uJ{Xlo{5pca+
z?=DBLV^)wv{SW^ya|Z&E@ITC*{_8)B&jBVVq_{FS8Lmv102csLDSH8mWk6T)bj6qC
z{w&)8VcC)wcFAQMUP(20o+SL7JxP$J$aCC<)u~C@>+okM@wmW1dm!={NTxR8SX;pH
z#>tsjNZ0ObBA(W74g_?q@zV|xOiO*r0td$cz0reL*^z@kBe$iHEx=!-T4}3D-=W18
zntu0%eBNbWdt}dFkRMW=YWI1}K(H7(4GFq^k8{5Np6<irQ*!}svpjmdj%r;F;7UyM
z8$lv(1}i{OG5%QJT!YEYb=`b0&hY+&s=OcPuDnK2(A<64@g{3kX=pU$w&?m|LF@ez
zjK6~6?v|_czn}>9Q*(24{n-1uU$FU-PCdM^*^f=vu;e)YPhu0Pa+i@*(!urF&i7s5
zeGz;FO95Wk0Nm*XDmQH;rsk$}A|2dV|CZvNo+C(2$>n}s!+rhv&^}-C{~&4NNyJpz
z*p+#lM}Pf+%`bocA{|NWokK?K9mr{}B+ScVBY=64MJZVLiOjg=E#A|$63pu)`wM-5
zUN2{Sm(-Z>98wHl*r2sL@{~@nS{+a2?Dr2XiYY}(#fnmP#$1?B6kBK{Z-iznueB75
zQ@{xb+UW1IR=DF{ibd(8{kvCDIqTc~GlqJa;`kXQfh7Et`tWOMy%-*C&0fFGN+`O!
z-KC8z3To*5FQRsqtKo7&cmP4Q2Juod{6CZ$71!T$exKk5!VqAodl(dr#pFleb3n$1
z;MUJJaqWDSu$|)AB8c_*?&ku>t!l~xL%1~jyf6BxB5kMeulE1f?&pkfu*OyT)}pBa
z8bCL9uw19_JiDXK`Be>kpPQr8`vNoy{G$E&5ybs0F|dO&k=lB{e^4?a3Zj=`K6-ml
zRG!uNTp28ysC9;vx>`+F1@78MHrLEj2+YAc@+312Rag7>KUj@6&~<AA+pS@VwOC~m
z<85}#FxWJJGv!B7WT6Y}iQ-$m)fKBhXk{@c5@ETZD20ig(@sPkAg)&cA+fSC0IX&C
zjO$1mQ<%aswq1odkr*=HPC+m)iqpGlT|Bw#*t~72#Bcv84aRXH;>&b9sIB+0^tmVX
zc|P=J{=7*H8TODI<E447slV6qxbI8)SlUB%<+6~NWtv|p=Yy&BMk18d$=2}Ko%nf>
zvuHSO>m4yb7ul24;PZ!Ci=NlSNQJr-P>ErQv~^PRDjF68XThD_09vn};RZRF17YtM
z5FV8oY5*Gb`|9#O4)NZ5FBv23D4QMpXov~Na?RW4h+wQM#rm;*OxHWT8(Ww>Eot%;
zr6bAZxY*~Qpn-NVFAIlpy)b1FQdugj2WqvITf)jHQV<v-$G_#*ZgqG0TtIKa_kH<U
z?peFv{T_v|*grp!%Mz3J)BYg^nr{t#4FfYDfwZHZ;1QsPncF^gE4)8Rx9qZ8S|9hP
zj->O1fATPzo-gPc6Q|w_?JEEgF(&0HITaLbYrZuN-)nB#GkUu0qNEt}m|E`2JZQe+
z-B5X%jHUS$`g+5C9!Rfr_q*?K2mUx0(UK!6n9G0CCS}yE%G<cw73iLpchM=zI~{}o
zSHiw>|9emgf*%UwXMSsN*N9FO4kTmqxh%wYVL{(OI;05p-O%H_CNgk54V}~5ld^!D
z^K5uH!#ppc79c*nr#|l~1Yw$a7cqC{FHg({StpVEvKAl-=R}a#9-CE@ms6$-93}y>
zLk4jh;uH1*wDp5v{#BGw_NLQgrzs;i2NRoa<8-^zR0{o*MjnNB0Yqm6OuB!eqKQ}-
zl$!lKiQ<K@xV4W;>kGo4MJcTMyyW@(HRi$Up)uv}kI>xZF|Df+0RHbq@$+mZMVvn>
zm?a2)I#Xl8+Xh<ST3Q<mN64VGABYly8LR)Y+0S^u*U=~0^S6cunP^<b>=Tg>`)E}(
zm)uJld|`F!DJn1~vyB3r+yoKZ|4?m6?gNixZFYgDgChB`xL&nq>CP@3p15u&+K6`>
zv6ph`XT$DuE9s&|YmR~huNP=?e|Mje>96rnb(~l~w&)w{06k~)xrpXHMPM5IZ!O*z
zwA+;;ero}xzuK;qL<~%TZQ-4SA!Rw`pGAZ@C$t7?kls(Z&qlOkCHi^*C#`COoQct-
zDt0}Mu|9jW50u`H^F3ZYUkBT7kSE)Sbt`>5ab*0NY=AA)BG8WJ#Sj_1f?-nWG=Yd4
zM(>5Y-};AgaaRZ|k*L<L$#oEiN*dZg{c>RzRgsxrOR-Cj=E`R0a(u|vOe%|c*37Cb
zNOV|LXS1c#%a7)i!EB!$Vo9ziVYoGCvlufgALdv?Y@h3yXPI9%BObE;wG%!w{;eYc
zw!?#5+R5;1n?yg2obR}{s>l4o8!6bWnT@oId9tTyCAu^*u9|@UpsT4)HM5^RvKJ!z
z)~L;?kT6Y?=4*wYwmvV<rr1VF<m&58*xCN;EX2kOTXoLU{5UL24|$Hy?@vEetyc&N
z{I4c{3?Wx3|F8ubFH|(Orqk!98m(3XR}0D`tnXLucH_23kn1Q~Lhl%qp&zq?U|sm`
zLFD^B*h|bV=x<e)KS?CZm(T-FrV%OyRfLR!4gWq!+h2()H*x`byl;cNj;ZIB=jMG*
z_Ph!Vf{e}Ki*4BN%@|kl6jybkL;JHDr1;Uz$JJUGeJZ59(fW-8HJ&XV79~+@2gl3Q
zepjjJG(k}F5-lXn<kadmq0LW!hA$2Vc^M;<q!p@m^b10yKP70bgT7K3Fqrg%EzC8H
zB*~D<?@qoa(Za!S7H<-k=d(l}2UqWhysyJKcJ)+RSQWFT@dTUcp0rZd&^pxJ5NyM%
zwj%&~+FCEbA$M4yD8$m_Y?`qRE1gI>^x7W#0PpTMnXYnjBMqF;0u`bz=n`M%ns)Lh
z`44krc$J~ijbc&emc<wX$#TN{&pWqrh&Mt*@h@A8fYaHQvw%~6SpWptmDlQ}*KS>G
zH?$Ov%>nNm0dVym2BqgW4FLGcP^4`XHy>=e&iy-ywXe2~Kh{KIiK;T{C`COrZ%aGp
zbCMMNBS@jbjSb6S#lKy&&nTpz0qIfd4Y3D(?JFH90p9lO^}fkdz%IKT?I+~)3TYZ%
z{El^v*6sn|Kl=&wLT~Xpx|sR_>yXFiImmxCH{bs0Yx)Cm%*4~m7SwXUr7*eB<Sb*5
z{1eGh;n_>9L38Z_%h$Nn@%-A#>FV>@JDmJTDhWlbm?giq{d~mNi6mJPE|~r`u7A}R
zdEscX$*a>B@PtvTWAcfTHfKAi0GInTsfR*xUBUC^fEj(nHKRkLre*!5)BtdKP>=Ek
zF=+NF(70aPs70NsAJidFwPy8C!J<mn+!QrJgyX^REm8@q4Rq}ip>0_vvwdgYUpWG4
z*c#^2CrriFKA9S->k0FE^G2Nqe1V-4;Nis}JpDT}logstx@9#jy0c=l%u}TDt!c#u
zRzOvU4b_s>rG=Zft_de7WgmAk$(?;_#aTZ!{G=1(-AHC{#<(losJ_o6TJ(;xN3YqN
zmezY^Ja>!~W@o0ctmdYT4mTf~mK^<o|C3GZ0Cr*vlkPw5$bO27;14vq$|`5oHF38%
zFq@8|^FYnpm}qpeHZM;is9!7JPKB0_w%wT)cq-Zq<Tw*{ptWRHS6^OfC@r-iT>(ai
zf3CU{EG+YVSCM1kL%L(im(go)HMx3^qa$MfBhKM)1iN$zmxhZj=EF$JJ+G>_M+QM#
z1h3KG8pe*b_MT{CwkT&DvUI6`!C1}b2$c=8wJTvQL+y7ZplD|?a8O^_o-U8Zrkrea
zR#lIcjE-I|`{^yl{~=~zg<+e6&^1ra(s`bWL1Fhlnr8A1Bv8`h2<|BoiLyJzE~&X8
zx6`TYF0Xl}L*KpXw56iU{pS%##cTUnTC&tet~2p?)z;HS>pS90na7|w+joqcN*&vB
z<W~aLJhGXB0Zj6^Emh$c%|YW@3rIB2nRd|bQmq)YSj>?kqz2UewBZECT@f}BLc;%T
z@}ya@GfinZ7pA0BuKLbbnLOpDOsRi^mvULyxSD1}g44CSbS$q9H0y@Xs<V_I2OFx(
z>+15^Gx@wH<$7rRRP@Pow@$9y)Wsn)Vf9vD(pt?Hk#u<jz4b7S0EpJ}wZlf71K*DV
z0L3T44fp%lW>NVeUU$4t4GLj4+lE2BkUc@NUhwB|bp6H(l*=Ens(sf1J>Y0YSk_PC
z&CSS{(dDyaTK3J8SQH~^|Ff^0S&~i~E0WJ~vlo8_d#Li{V!s|q@4z+gCY%?;SB&X`
zS621RQBD!GmYSFsHdpBjnc_pTJJ26M2ZVJ6(5P5Ez108}woJ8=1enzVV4enbyV0q^
z^t;euTD{Iv12w*B+IY3ng!S8(tJvG^Xp(L28*3Drs5lS`^pfK6N(Wj7QjCQe);H=~
z0e?~4s_m=0MAj}^NOFG~<Ueo<J_!f)t5u_tbcy;Q_tm>qtS$Y6KZ!Cs6|Bn(l4^ut
zY#aU&`8sPf<%|s~ugotPYw5FV&u*wqc&&M__yn#9@LIRNTBMoF{zmBKqCtaKgP&4H
zk>lLDmTrQXBY(^$A8y_otOl!#T{RUo30Ka>`2mWB>}@ekQ#2bZJ04w2XgZUb2P$jp
zF-=kiiBnYR^x{;{+3SRrL<0=ylx|xzN+hPzjM#o(xRR+^m-}KGv329r4#Aj$U`axS
zs##+i?tDG`+n&(%ZiPJZE7i>Lyf(io)ePR6H8TaT`*q@%I=Us%(R!WcSW^?C?$6d9
zL385~7*(+|NT$Q)dCu4Pb;r=hklYwvJ)vCFwzHC;hJr=^E2nNL*;@Xxq(iMnMUId;
zRsJAvcgxhWMkQldf9jBxpB^Xy(A0?&A?FgXWy>-&x`TI?8RhpxYjs*rwVVyIJ{Hfx
zlHj<3_t!Z@{=7EFo0JFJvsc5JI!TT`{$0W=ao@dNN>1y24cv?8C#Y!!F6V^oTH5&-
z5i_89f2&mUn^O({HS%8QNwL!A)HZ^Av4EUnp7Bcm#JXe1|IZH5EyV9A&9{lS==Hau
zG6zN1J!N--ZYFiJ*=j$}ZC+q{@yC+Z-1gCA`Z5R@#li|@{Y+z?xZtP7F8okY>S>)d
zss-yXJ?-YP&Lm=~qlK8yV^$!My@ezsT&6L5aIgDppP7E%9ZFos0p)=$q`Fs-3E96H
zT+{Bua7$ab0U$DKC)f5KN<m9PXnV*dw=e0JM7T2zP{<HG>y_CyGo%U@NFk=XKb}d$
z(FcnQUg!&_;P5IxlUj^~T50TevyHU#rD6d3hEFn7>unb&Vi|VPByNI<$;O{wILN6D
zfssB;$`aB@SQy_^70H!nLSG8btO2G3w+JibV!q(wIBTk;1zlXK)%#?pADCf+5(Lf-
zczka6UuKk9PwF9Zy6IPj8&RGCB&p@&2PsPAX)`L;VWl=b5uPcx?@*fEd?T(In~H|o
zo3;d+KPof+D@s`7%Jebcc&su?o{?SKSbNeA(leaBy!x35*$wq5eQbB>V6>^#Y{}sK
zqNUQepVN0+{7b*ersHnJ?%p{Z>?;nJ)lH$gZjSzcK;l9+r~t@9I=II$aUdM;<E2VT
zP>HtxQ&!r0gr|lq#~9LC4&g9#?6~h)p?Z9*2V>3toG+s@@6aaJyhIrbr|^c}9WZ_E
z@SqYQ2$iXrk;9CYWo4qP63RZ#tuMa9=@Z?Q&png;+saa_L15HT@JHZ;QtqT3gy-T4
z6fr!g%w|Q?$cuy_yY%Abb->>im?i$2a8zmG+22F41rz18Sd;e)8-EA%WU&5mJW1Al
ze>8a;#znpEx;`D5qOGfL#Ek;0VLmBQ{G}({t&^75?=Kl%Z9r3X%@$Tb*vjqeQ|e<C
z&zatO7gz7)EEqL$K8a_s_ggqVpAo+fHnxb#AsI#50{^%}d7;_y(D{*O_&RS>oNRYY
zeZ|ht=2heQAM179{LOKh_5aC|sEa+Eoga14KBE5|xEF=@!A8k?0vN+c9NupxIG-=c
zI+=)Wm$SR(M7Mm&ER1a3o~EkgIze@dF#WnTZkCy<P7&}!PW)rEKxWlHw{nim2^si%
zS_3Dc+rwa%;>;a>1SS4{+o8XipNE4>D0XlWt^ng-sxCNn=+bAPDKc`JFeWJ#H2Qh$
zonOtXu{>RmIF-9Nm5C%8XRgIK#LE=aeSs&0O_=|SRGt1W(eh#Z?_gQ}c*{Hf=NO(s
z^juR%sLT1J0KGsVVEnAb;M47O&O@*C{dl?1WStD%*mL-J@U~+|(y8nAM_ZSms)T17
zY-P#OaR|QR-<~R)Mxp@F$p!V?XM`iwPzGIAh3K?CZ`4XW|0NO|wX{6x>z<FhQL{AX
z-~h8awD~W4zKj^9Bhl_DKlbT&o1w9+UW{eu;6UGi8k?v66Po91x}J1*`<zeHJ6NdR
zG|t4j(*5bEf8R~dH`}uK6NA4$zI~$nE}wY*7;8%0Z(C=hh9yjKZjm#Z1I%ANpfxuj
z;2Vo2nAhax{H8*H$oJTxe}PGWK>w0q#$q;%7=Y-V!nR-kqS4>GI-dggZhK3oN-`hx
zpwQa7s=KTI_fl0KV8dP@r3<1B6+AWplVnC!1_3`?N!A>EnZhUb;woGWv}YO;aXJh^
zj1@iNet<${aJ|;`bOE>DY%fw)){U5CydKn~MQp#>vG8##y!-6B4mW3$=-1{k#CZ#=
zs(K?oD@JRBqidd(k$A+Fu^6kv_)U^z_LZVZGF;vMn&W+M0;Mj(Ks<<fh1cNt4=#z!
z;R-Z)_NN9bRZ%?+P=r}?iYw!ptSV+2h*Pk>6#S;ucu}sp+^latq$L7x^sW*{oY{M-
zyHOt}C%!4>;#_laq_sMMJ<uYJ*=ESAqwVT2iV@^$FI&D9bZpwBYI`h+h8<RKEuv5!
zO1Y?|Tx_cmS=zVUDzeDeuK7<AHOVX!ZY8^JtUrheY^;<TwzBT!QI2NNzDi7irlfh1
zpwkien0$kQW5TV&)!X9Y-`qnV?;}k&YP0K~6HdLu*X6C5Go+!?hw_D%s3aO0j9YfF
zeb%nGg#uD7Y)kIojL$_~-t|t^h2fe+yPWYN58{?7(8!@^B7&CV7_@2~#$!|RKMFnl
z$hnqrTD|3>cD~m#2f=-;wX-#}`URC}sO*8q#dBz8Iw;*!=a|He=@b4UcpyD7WoMr=
zM&<M{qpICna@&3hBWB#g%XYSxe|k7}w>=SSclWWoy`zuMHMQHjI$eUu!wWohaq{%-
zb$l`9uP668aj+((F_t{O|GkySOHOyOTuVxH7?=7FZa*hq*vlAdTEIkm$mKS?g7P3p
zl0};*Hsl8B(-L*Ky>An{fT|#^2{1$;tu1hdy=$&%A*$13<D-Ldv2gQR#Nkw{*RYx2
zN9QZaTFW7~3V-&+unceXY+h<1HZ%x`Q93n<$!xYXdaU-i2{#`r9Wj|UA$6LZ!EPP3
zbTnP}9JMSWIt)GPA;(ICw7ri%y+6rgtn_s^9%a`5&l_puFx7gHe0^^?heYA7!L(&H
z>f(0Oh-yb9_+Kh`moLr9su9>1nTGH8(M<PuHTyO@jzeBnOY2POx&i8CY6V+V#S@`3
z=0Nb(wfalMQ+x;iBI8`4$~k^vc`8^lh50eg>g=uXs;*A2BZg7O0qQElV7heB<a`qh
ztO2riUKh*+$T-xa=62&(>-+I}Us!KC*h|vN+J|rDi)j9Hy@mf!%e3~}UY;<2H>a02
zip(5rtBfYyt!81x;Izxtdf7$UtP*ePxSQaGj0cmc7LAaDFEp1>BErDuZFQFe2&-LC
zErim2jz>@%n3pOf6PS<{Cdw&HW3zRBp!pFCt{S&7^j=?elD&^sUo>W~9<lc5i0AXX
zaHoJJ9BvFVaPVUZ_99Jm8<8QjAeK&9p~Y}}s0o{K;jbHe<b&K#gQtx`OeH>z0orT9
zM59eUsy*Wyc?slbi&KLANZ5ib?N#r0WCA~E7zci;^<O#$b+2jo{<b)d<GK#McCMCs
z$+Lc1y;7Qx0VY%-L~$5|tlQ4J8!(YTV@gQ5T#<c+=@c67o`J0|^{I9&6Y<gRv8WW=
z%KrMA9|BAHMJmj%eaNbXLzE1;;KJCE{^9S!n`Yd5B$8&4?@Z6|00P12@2@7h*!R>?
zU+DvajYH>^c1$aDgY|EeOyQ|*OfwF&bTZSHSb4i@I-anZoQ91SbM;-^gI1l|!dK;4
zmf#~49Xc1xTLOo0Dci&SmN9Qc66*PkxJjg=+jVp6=Qv|KDb!>^kVbRNK;Q$qgJWX{
z*4JBFzN4-wObb*Z6&)UFlIUFhEMsQUqm^G?=q^AHyC+_K<kPtW*6C^|pUN^<x4Y8r
zugq|%Cb<rMeV2T1fro^z$)=2l+5|TKJQ)OkhEEN54s+)rWa)wDL;l8r%z+9@$l2?S
z{-WZ}9-z8MvMWQmPCFlB@dyGJTmouRC{W3~1qq8I$~=$M+rM`vr~58(JgXp+q0e@e
zVAaCFf1Kt_jD+v+x0RY;!YI5miVO(=jsDg(evv~#Z+1Koe|}LtUmuFdQz?3OU_$<p
z(F_P1a_aU(sQ+2@+#)wA!C0xOm;w6U4oIkFZ)OhJUw&K11lefb|JEYT)Q0!%_6&@+
z1kPlb`n<ERlE$<MvzM~6o|OZQreKRf`eqZx8$;GF;b}lK0R$O?PP|O2J;(bG#SHX3
zF_}dEdX9;$-^BjZh}Deo;8#!mThsfTP}vk4n84;B_UiPN+4Y#d8F<eU5vtzdFU2Ft
zR=eeJ(+7#2Dsk?aPl$sYK%O<X`(*<mBE7psj^ZL(`N|Rso@!R>cuYm<P~mzob<>QL
z{~&W~FbKi7Q@iagAeuP3y?K4V^;Xg`dNQ{Cj@dojtbMHBP>P%#<L>^2W)iFW=5htF
zo&6H2*wN^6^|lL%Ygjufj(zH2xekrD<dgDm@w%@-2H{9;wIV%!a&=t%6)FZ)Z1rG5
z+R)G?J54HvfM-CUe=s0>Nb@B)jAmn#ib!xOifg3RNpvaO<SrB}48<+@c|~RR7fJWx
z6E|Jmn7Rbaz;GdiXwzhgU2Zg0_C3whSEK1SB#msV%(OQ7v6;;D<(5pcO8qm^W=eiF
zGj5fIpR<D(|A<{<yE);3t1%<4hY+V^(_M}Co8$p=eP_yuwzabclAllwI!$z1c{@%u
z`@;03&QQA|%QXXar0^qHP9<CVx*uCoo!VR8CYUu-vt5R8EdG~z7viBNe*t9+F@5ec
zTkIAV_3}VgJ=R+Mge$7DGEc|d!c|L|d__{6_XX|-kE~G*dwfT&cg~7wcIqjOr4r-s
zsic12SXj($=Q1V%vc5lwqPfSPRy^7I-;b$W4f%j!5%&jL=f)zgFG9?%BljL@nO>@0
z0PjcA3C>_0Th7lVj@wM_5mkP0ChK5I)$trr77~Y>O(>T)xXTzfq4=EYeUC7`v-qIB
zvT<D9l4TzEiN2}*jSgS=&841RD5{Q5T5dEN+reJHY**|sQBKj})Z!3v8MGy2kWd90
z?@xzA(RQuJBw?6td3Bv4nc}xCkJsM#B6u*(;gJ^1i|w{wmbdG>N<rL+4v{U*oP-W%
zniQOb^4_vQk;#qLES@S7*lPTv$lgoE!XrOYo%2~+T6x{<TaaxUZ&2cWMkLoqCWZrJ
z6O4a}z8^r6W7l5YoVGqzH`KRG;}i*Na_zCb@Y;M}2e$!#`2fGz;UG?$XOIfqiNxXG
zvg~#TCX_pk#Vh5d=)KZ<*qs_Q>sQVZW+$Yra}rnd<4&{!@Mg^YVqVBscDFN%bTD<a
zJem;SH99Yg3D@S5cHdS$DgTAdp6I5>K*}e7;Y?Ud?>h+z@edLmzqk)QDEjBQc%K+V
zSYcxF$J3H~;}yH%ac@dJqwwP&DvZEj5FEr!MG6$I91G9Ad9Q&v&OF0zs&2C#?@g84
zcgS&KD~_;V&nO%r)K0kH_gIeiLf7;4v<<(5!AUNt{pEJjdV}@**C+S$L0~tXjT%0J
z$0rQc&9cpNGuj>E)t!xpkmvg47){Tj-VhAs3?KiAUUpd+d{z#)Td*=kB+F6Fvi)C0
z(9!elcy)#yHG7vDy|-LX)!gm_zn`zRH~PJSaJ5&(s9*M4c~&a%N8p_FIi3){si8p&
zAl~v7{H_*VlM+9Tg+5p9@fqReMD*3}4d|+wet0`uOkCG;s3$hFW{ghQMDW<r(Hz9x
zF}+#9=4G$T^B@8IUYwo1aK2%_k!lq8;5^+m8oe{Wo^W0=XnflRoO3*XoEIGYqTWSz
zTA)x%+;ybp=<iU}ZMUP?QVba8+OF4hW5O29`7Oe86ba$_IXN^sxU>5A`;_l9gj&(k
zeYnaU%NI^_M;F)|f$7*L!>(0^L%S3gnQLmlq{b}5?*6g({Y$@mHafMZiYh67j}>pE
zgKXo@BWZ>}+b!<I<m7gs;q@dHdWeZ^&@-=XI3T?X3psULi^NTpDJ23wB407eZfY&S
z(nuL|Kh^82qw=Kf@y<e;@Z0C{&Q`svt=zC|Fv@S(Dye>O%aLUrPmIBWqIInVq410b
zp#8DZT<u@-@h|V}SPk~ljTEz%eb{Bw;o=O_Z23~;TKySU{jah&iU)$fZ?-rbFc~43
z4oN4kALQHlWDyHeimbvWc0drzO^P?#@cCf2_cikpaW4fc7ZjL!`@jAk*V?=FuG_`<
zHuW1HAZRrDEz-O6+6B3O(<usA2j`pjtfiHrs^#dTNE=0!KkG}BVf|UO>1gY{7CH#N
zejA1_pD7w%uN~)@hSy{{(4Edtur152tAmm(za?>YM`IVSo8&b}D1<uMH}e&y?0lLc
zokvyf*av~J0}}4Usxxr@*{HYptk)lLYgdoeXwcYBcIH7E%4P)&HFOXBt_m!N7q_?g
zX2)||-WZ@ux=oc_<EdVe9o&}HGAJmuM+&wF=r&>4)GJ<A3n?!hN(FWB>iB-d+6<Vw
zX*Bp;sA)v4xNRC#(8S{m#I=mnG%TH1n%=n`j54QkG$6qkQFRy`XIMBIU%zwkB5l>~
zZ0VqE_AIAU&3Igx=^_82iB?x<$P2v)o{I0;Ev=HiF~qr>Ic*j~+!zB{`nf?v#j^mb
z6F+nRMRq88%j>vEw)4U{;UA7Tb3Bo+%jNv!ch2)v^W9*xxL<dt9`$Bq#Cegk0uctM
zX>#fl13S;v(d&Cnj`h0hb_Z9m6VM}*#f|N0v1P%22QS2pIpR{B8Ih8f#flak6ZhZ>
ztk|C!2W#`iFu}H~|HO)%eP|}|j-C9ob=b=f3F;&eT$2?~7QUMHF!uG!Py9O6el_0M
z%tHzm1)Lz^BF{HYoy~&*=J}0RZ<(DgSUl;rc@K$hVDpwqE67Z~>o6MC>8peQG#MlQ
z04K(+Qx6i@KJB3e8Z(f>siN3&i7t0Sm~nkcQaZ@g!xTdz7!E4%0Z^$+RGpOzq0%<f
zQu~ji#@-H!k=5ivIO3^)E(&IS!>Y-`Y8ek%icEH$ve|V$BxO*IBQr;+%GPtc5m-2Q
zse57#p-{BQaY+zTJE_MM-$pHNW9@i7tLPt*mzd>KWjX{5ft3I>C6dJ~Liy?kI19YY
zu<BOn;jyAYYi;^GN`&>FuxhWVwJ%?8_ue448(&j%%tq}y;4rfx3S}E7j%k6Qq?!6{
zXE^pf$MGswLNnsEO@(ZQ<y8gJRoQ1&G;Lh<t+(MFp&U3~qJeRXnSlh*%>eI$@OyY}
zOeYCjXSzcmj%y7p*A^z*-<ozFXPhZoBfvFQzOC(4wJbg%GbhRx>(3+FLPu7*Joe6i
z2B5H}TEp{$AVS?TDGj=f`>+Q^3X*B$)6NC>AFH`4sc=y`&ZvHuSv+3J)>7}Li#OP2
z5q)S?uoT9l)KR|EBvVjWBLqlPqLQ+J*~XkN!FC4-k!$QXW<_XCXDh@v;6fz4Oj`s+
zy<49HVe5B<oVjU{u`cu}0SN1n5x2uuE<s!5%3i~nl#@vP%4B%5T82+aLk!sG1nQgQ
zc0EAz*-1li#c_bNF*-hC8O(5yC4`RoLctB}7H}z%lxsY}C5LT)ZOl7V!aGt1k*H<)
z+q}o&`+GRQ$9-?|S>AfSIOCYtW6Qm=T$<6nIo_Z1=V4OZ41@`9u$~QA8o>Se;DCS9
z8b_GRRrs3mw#&3?+ikN1SuFK<&$@1+k{y~oZkX^kFsz$T87^(x1`dB-5qLY3%ybHb
zS`2O8hnm+rQ3WJ1j>+rQQnktJRpiY3v0vYx5%-CP%QyJFZ}Za4ZC+eX;6l4SB~!XO
z<MGDENH{#8aalR^p>bnrLM4ULZ<@58DpCf0WMngVJ`}#^#FaCS961juMU0Ut<wEtQ
zs;F<+xj+=KHX8?8mAE~tYzTsnZf7Oj?<xI*M-XI{^aA4p_#sN8M~{|mf2Dj}l$sE}
zc>2)1u9Abp$k8-X5InI|U6YWM9fyl=L|zhNh>JLZ3;Kci_-j34qUw0~VAAiIGOBS2
z|BILFBYuH&3Qz9$oxpLoh<OS~p7yw&(2T%gz3$-hBvq%VSsa(lya~ma7zeO38@jSU
zR=qFSE{v<lv+^u6)xo(EGB`$MP^&Q?DLJMRl-*8HX-|?SS?650hwBMRYHdLRVwH4H
z{T~Kho-52Ww4|mpX+(A^15RjmkJ1$7sd-t^2nDv!Osp9BLT+h1$$GVFugB$98%K1z
zj6b$}D}}QUb9nU9<e1|0oTcParbn*_WeSRG80|Y*UZxpC*H>AuF&$}J(L<aT(Hs2*
zon{IX36=hG?5omMMY;<JEdw%`A0r~ICl>Oz4Lp{kvNlymJQtgBqThIx0OF^~fn|xF
zS64S>4_vHB@`!f(!&3NIree(@KKBaH3YbU%IfR8Zj<NgE26qGgdOS{We3oXg>-ty(
z<6FMx%7}4!Q!_pOlf179W-djqhLUPD=lO8Y=nMl#ViIlMdo;@z8OM_PAE_YLIvwIA
z$;;CEtj7#WeNM+(zQ=fcsb8~O=DBHIX(Zs+n)Ldiw0XbEy8$757@xoK#NPp1UEVu^
z(?3&xJ?>kt<Ngc#>#grMfNov->;x-^6)|G61XKF&!zE*Uz&pZ_=5$?%W&9jIBVM^p
zBG(9ce$0tn#gk}Y^<hM6fZDhLujE$UxDLBAFegdLUs>vWy~e#+a)s!xcap9s+CNi3
z5q)*o{4y5%i4ko`OJqbUsHrLx8;Z4P^~1j-vxI(iT>avcJGdefOgE=fq%|vRe1w&_
zdAy*1Q8xgE_7UsHh0Mk($M4pe!Mn!_An^iERmCRMoF%cxkwF0@OO+jDjbw7o<0hS^
z2P2rrb5W+v;Jb+_%vba&2(wbW2*%eU6uucw*CODHK}vSWl<3d^?FKulFreat1kvGQ
zI?uU4qpJR`SZ3_3kkhkvKau%ck+5V_f$VZwuN*Wt%+{_bmgQ1Pp6x?NUma0#Nt@{~
zrxJ)UlqJ)$3kEC~pReUPDabcDpRCNUGg^Od8WQi;0dkDbs1Trzp+<Nr1x)wqbQa1*
z&1r(AP@y0mo>QqAld_VGX13m(SwhyY?iUNMO<iX&rH~US;Y1~62vJD1m?o!Kh;zBg
zS{QVKKV?>0ZQ8iDcArp^BKPQYw-(dvH(?ldhQMf)-V%P5%T!Tr$qSD-_r#js2>%HL
zv;}Mrn-q$tTZ}=A1$h<C2SB*#gINcxx5<!e)0#2&cA-;k8m8yPwO0V0;eAu-r)P`s
z@?!B)cEy%AWN);ZFlU@gQY?N}s5COD=Vnqbp&oLuBI|_>(s7R6&bOQ<oCRlDfcpAA
zVjBH_0J}g$zkfhU_9PMm`N@g?iQ!&$1nB8O+w#uQZKdgAk&PP<Aac7fjmdZtb*JK~
z^k5p#Ap`O%u=G@JFqh9~`<;9)H!$c-45fyKl36F4P4`U>4`s7`1Ibi6mrP-yriAum
zc9j~$Jslzcp)_t;05*d56-s5tsrDSfykVw*ew*q|rqjcHy~*U@%c+50EMHE&#45ak
zH=z-9VA_eJKD=a)A{YTm4v(Z#BmF7Vog7MX=ac=@6Mu4Y^$|3-Z~K8Aua6^nu=2hq
z==MMjT`E?#iQCpT%7>@x1-y4vW(7L@Z-mlAsN1WZNF=t=J#Jf`<^0>`ZFBsM2jag`
zs&5!|8>_zsWCjNgzrOd10r+n;G=!I#^7upPC8SeBxx-+#&BxL3-~i@WHGFu3WfJS|
zild=Hyx8$RL|*hLK4KD8M?uU)lu`^hCE#2(pHDl<zC<FKo*c^eO~_PE2su*Q9+7u9
z)e9i>p;S7a>O<Y|H$C9<tNn1I&+UL}G4ij`Qwj6HPILfo#pMd{>T(smfT^@JGD1y;
zxpHL|&PFfrorakV@%}Ab;wH9g*w1(G+m9#YD3%Bk=C+Q2l5K1xj_`l7crXrvHQorR
zcFSD*Xkcx!h068STQ__6WcI$fea~w<GyC2cF_nRb^pRO(@Qzc==0Gsx22}EE9HrGB
zfbT%8D=v(o%htfrzy^+nG$0Ps*n}uUYi)O;X?*haZxX}IdA2Kr1tu)aF;@dB+2MJw
zm$4L7EFUAMmr@zYDkCZoabk&v)J;R?hLFe?p^PsJ*(jwpgK!K@w4Zd8MnvK<wEvvQ
zCltAxh}1SBlV*BV>XVEcsaH6f+(!>r7BA8C7*c|61r`l;<1SnQ&B@278#liK{X6xL
zai5)BzqusA1+B^AC0@e2p9u=gS@tbEQ}8@i)U&=a<Puo(U-GSZ+Oa|z;;0Hh^^!6o
ziXj0HgZN>g#{9?dX`t_?<@w!sB>M@Gt$lsG@%_6$EuV?jb$b7S9e84jt5+OJF9!O$
zS{@;x3R7kCi2wbK188;m<F$K#URgZXxOA^^_Bg6nXL}rSfP>_7*B&&^-=;AEXkdZ0
zzklG)En{3Xu05b(!|)8dCzsuYM*z-(kxnGmhP?2P&x^7F$?>vq#yxpw_3C?GLL39l
zLf*XjQxJ3P>U*mXzI2x_o0@BKgO}<yy}bu2t=P$taR>sN7ZHn6RVg^Fy~R{pOkpb6
zx%QGHb+$N$CmFhuBrZ7k3#XQFQ8SWQZ4v#V)Ro23fnj`@>ce7beyBIu>-6<U7E7CG
ztJ^NDHUR}h*mNox9|C1VGM-BM5-iU<{>p4AJ5|W-Iq)C*b6EiMX1SiN&h7*iNFINU
z@q)^GU*K&~r&lclwPg-xi<2A@kc;p-HX}>b9UUESmZ^~#ekMXOHEV(wO$|0XctsQ$
z313Z3q?(x&Casw<e2T&=0L?Hi-IE%J>ksL1sc70Ssun~e5>u!8kql;l`0Pp3W3gB!
z#NVe<#5B9I^i|{9vw*fq4veVnMH>e%9DO7+srU@Pd_?*kLo<3M?yiI+eLQ0Z{2E%U
zX9gFXfqP1dBbD+OsIzydKaO{!?%tuncwbs4&)wynmPkc>OESi?lXjOV5wzF|tTtV#
zkgEY`)~P2nvx8mypXE*+o*zq0@hBjEvczo+vprDP;!o5!#rMBUb#>7%FM?}-h9F-P
z&iR<R*;RJ5Fd8kiuFitnr{*70ixu*qD67wEjj}nm$dJC4P?H$^fsCcbC2u873YZ<0
zF=z}27nkv9L!-ToSu2LM!YyFh=%mvO#x(#zn71A%=uj1kG>5N?h|qr`taUzz)hfHw
zw2Y6=)-lm(#ew);yGP@hxGhOsW*BbdJ7Mxmes9Jbj@koOj>w_<b(tA$EFsqfc}Oes
z0mHi;!7nQ)*Vzi=@}&)_G98LilizX(tm)nGG~e_v9u9q9Iqx!q0UH_yyTu#G=tb6P
zwHhM_v8G98*C@h^&*U$b)=5EdaZ-q&Uasa28_7VZgBN4XJdN6=TcMWq-mL<{yLH=@
z$W^BCznI1<voZaqtqMEC`bGZm`?U5HT|#w2*>Q4bl-;s-a6M{L*Sc-OB0QLBc{h@h
z#EwaPsi-54;btuAL8);wC}37Tv{BwMOT+LeH(`|$=2cLzw2ptzcnJ0HCSk0OdA%B8
zU9yAl?kLuX_$vVot|cuz36f#v<$8&!>0%~k<PzrYj|YP!%OX+Bfoi9k!5n`&=VW5Z
zxQrulGfGsQii73vlq9!z#f47B5d}2*0w(7d3>~oZ2*bW0ClzXi8+b$%aYOLRH^B$I
zI<0U4%MJUMR=w&;)8g7k;?wNP?=iY0g{^uxsDaPIAnNYx?Trr(G<{BMYK?`};>)e2
z)n)?D$<EhT;!kyN@qxlxEEz|E%dlQ3Ix%$>O3SdzMiC*)4<1GRU=-gv+qPYI`+TX1
z4O^<!#8Q>&wSCjpp*OpLMF6cWUI|1i3s~H`HDDpjSSt^YvVCi1LN;(y`KGsUOQWHi
zxUF6|iwUUN`bKU8`W&}%sS6?8MDJtRQH)VpOAERUyRQnZ#r;<Rv*AVD$`bA|E#T&s
zZ}{SEZt1pR;kLGHTV1r4dK;1q%;0}+*+A%3=z4yR(m+(UVd6_PqC2~HaPu*IUYuJy
z{otRP)!<>DdFq~i+_-**f<#_yY`w4+-$ds|=q2Ca;$c7LU76R)vrENs14kMmM2H2a
zatWYdbnrMBzU{`wJoT6<iFz4g@5MHqU2?8%4*t~~izV~(x0~MNL}$6bWM*oKS=9<x
zMK78)pcbAQik*n8)jkgr)3p~%2K?4En-1-b?*^fiX^pF^$3*_OlO4Xd=kwdAX9~qa
zwmQ43T&u;Hzd#}f$jnaH%di_HUY#!Gz{f%~*~8<T$cbxyD%lIJ`Ki=UpXBSxkND!X
z1pdzAkgr{+hdVB%LcNeJ!lh&$7lT~bs#J=z^bR&+_f@<k5`J|;$(3@gPA)#{3U}`&
zCcUpI-d~bX4m+WGVUzexkX=!?A5^N~dekxYM^b0pOHtATck<x&|K}g9fBD|(iM#Id
zocqlOiG*)6_F&W<2yf-g5Jo<x{z@i-b_vxE;~lLdZY*-(aDkYLvIr7AOiv+pRg6~!
zt2Q_Y0#}C4ac+!PS3VE71jIj~Z4}F;LmKG3Q!5-Q(Q7#MYT?i!r;4(0C8#vZn}hFo
znrelKBE64?x7gxz-tli`&}ddiv(-YDUT4L?a^<5K6!`f#;QuTXhQwBQ4EX0+`VVw0
z9`e0I9h_!sA*&u!A_U?(<kWF{TrBFyF!Jz<bi|2iPMcP|mH1%T6-1-`iMYk^l_|Q%
zfU_o^aI_sS)B_xp34+*Nsc-^dgC4d{7hy~?G-juqOu3o?92#v%{yk@LynMG3JEPVn
zF6o$}VfXjR+c<i9tw;nKgTisf_gKKP+GxjAO<y|j94lZd_qT|QC*xX;#~HTu#a?`o
zwjKw^TFP3bR3@#g^M$jG`HTOUd)N035qTANcoVMAfwNRiW-ne(`DIFKipIPsL8@fL
z>w1hi50EQksq{eF=*HC)bLP)WFTLbXOM(W~V8&j4uajNGJ5s1**j+u=ITENSj(g!`
zOBtVj$fAZ@{g{z6T>DpTeNOYbGaE<Qs)J_B(;!mSa)t7A?cdOjY>B+8XNyOGGfX-E
z_FttpUYMfSBY3$un!pmt8z_fM^OCq$LcN%b4jw${Z*t<w^^#z+J;O*#%BCXRsx=qW
zHx;X**L3|qrWP_5w5aa`!vwH8ksT?+E*J$nCWh2zouST@y}60Wp>)zoB=UW^fr)|s
zRI|>M(OLtHwWHT$Ee?SpBn7c3yt{8$zQOE*xF7DqXUkH}@viNVWiwgGp<=dlXgYhy
zL5E<wqj?*hD!EAu^M!XyA6T(4Rj9KKavEc*DuVW||6tnoUANig!=3axr|AF4i?i)!
z3hxK}6?NI1tq>PD`tvRlIGuat#?gW4N)a#gKa{+i)v^QM0Pv;WetMgez{66IheLJ+
zlLMD%T)OG$RQHmiTr%-p3^kbKo)(9wFgZToypnM3m{m|N&Z?~>B<RKRp+kZ=N(<5b
zb(#Y4sf5>&WBM3SmKsVEUXK5yQu-a(H+S#fgK0C!oJLp<k)a^)5$5)VB5|9?pKQBQ
zKomrMC+$RjVlVM6+pk_tRHtegh=ZrPO;KJ*#QV6pmj*v2llNe!F;y|LI7qdzUp`YM
zTX^&g4@eDTC0B`(3jpy>F^$?3-dmN>aFz10Sh_2YY#{@Q$9HcZxntcsOvB`Cj$1XU
z8_gY=7X4+rRsTHh&*{P5c&Z=I)lf2?HpDc7WPo9OfU+Ym(wDJJlOe%qB`vKW32~yl
zQx~KW5(I>UXqkUPV#YzWXK_ojh9ravd@OlwBJ_OuSjht408MW<41P9O&rAaQ-oE2N
z`qe#S^7P<e$D=Xz8QnijJYaakhzYI6<oUt8f#r7ckP5F<NkX0U_BNwz4O(j?)2IW7
z4mqA8f6Kt>)oRBIhl<le5ND4elXXvijmfa>Znf%QMc5W~U(l2Z8bJ?Dq8eAkonD2F
zTC8A3glCNSV7YOgB)om{<?8zn-3wO`jjBijD#B$gL-tWTWoaWc0cRRoNoFExNQrC>
z_RrXiXOj@Rul_h57mN^on?Abrc`Wc*<S>EOgeI&ySnHH(RNzbMt=)Yh=Kai(qZ|&|
zqWMdC%hzL6$5APR3)2vhq4BC^EpKn%Fp<{>hU3HfcC(!yAA>d>PnR-+Cd%a^SGUI1
za`G)h;i8|~UAXlJ$5aUd85lWxXyxMppoHJdmB+`&t#0YI)A~g1|HTno6d@@Z)T(by
zJJnfVCL|R079}rZRtS2e8U$gWyLZ?u6hxXO@#%oGD3}9!s?MP*XxNW}HaAxW-4s=@
z9ocr2OjrTbsTglre1e_I9>K@g=_;7zP{CK<gEZ)?XldUx)r-_VX(CH+ebNi*x~_&l
z#$W_Tj`BOC;Os63c7j-Ondw?~qR3;w#{A-JKh~m)$~wwPRHzk7cqXvy9h0HQ#rwQ=
zoukk}@g^=8Ak$`d|DfwTZ6C4+Iah`gQK7`?u}W@sg0?ay-UIe_f>9r~21}%GF(Lqj
z*#pZ?h5^THfHYt$ANP6{N1SDV?$M}aWpgIuBrI4T<3}_+9q!)S9k{WM5!z%2ZXJIn
zedv%G7rd5UyaT(&ey-KQOXJP*KBjdsYUjj3iugs^tWH<*;CW=LpiXRD8dBu@@z0Uk
z4Q4>P%MpHa*vaNeM9~u7^fL6@Z}AuMNv$c7mcLbsxbmDbyy0XHRm;;A-LY7i1|OUO
z-9Z4gs3vMdBnq&fD}@~Dvvu%Lk{1F=P5{yaCNh}-@L94K3(MPx+AdLht6FC#8;m3=
z#<j{eF<!;i>D-|IHHzgt*v`DM5o~+go3PZ4mq+Pc+W!Px&?~g2Lhwv>c-xd`@2wp!
zPZ#qLKd?NBh(P&v)Xuy^b^e7~hrL1cwm#r@jOEq&Q;<fxogtP=H3(p-3^P$@8d)B5
z{FsCOG+nFnxGZ&3qd8)q$*^obax*M21N>)PPulho+)>lIjo5rp>k_^d(w4q<i>=vt
z!E^<J@(Z#AW~PvhLy(2~k^@9R(w82N_YU^8C=8L#dB;BCo1dyi5b@L#ymb$f!IEla
zD%%OaSb2q6@u`=Z%FYPfmf8yOD{q(dt&C7Qo$MR3$+w+Nk_|uJ$yG+s*ze*fg|~{g
z@c&0N(xkPZWG0>0Ni7*i$#E!!<@+%P0+Dzt-!^Xi*tq^z@QZtVc6I52Ks<(_(0?o0
zJaXsnqcH(XaS>&f?%{fQC(l;{uwRUTS}bM*)8*x&q2pqwQ{bn`&=$r<cr6|(2ro4G
zRcahgK{1Xs0BcY^z(b=gXOw9i72DG*8YRtOc!J?jbwm<XDl~pONI#>!`p@zTk<aNQ
zYQ4~I4ZN0O=1!8`a#Y$g%>^<08Wpxrb0M;ok4k&J&8~nkQ#)LktdEKZRKrZe*&Yxl
zmot|qTQ}bAMwM;PT$-SefHuyQ6%P(kcub6@5=qp<6pdcZe6?ID`^Gk<axIrFW~)@<
zA@%LjhXn#Po?uB8H!|AsyFL3~(Or%h@~S+B>}WdGH`q7SJJ2_1DO+-8>X8c<N+o(n
zF^46V(WIpmAi^+lS|2A1GeIS?x%xDa3T75>!HFqa(IZDg%8IK?_-rVxo3E((O!Esc
z9*vZ}T+At@7H6t5VPVIm>T%^zqr%}`SK|HxF9_PoNe2{P4ow7ron!fmd-mhiGxK;O
z@dACV-9G_=z8+t8zke5(iNc4_>cS`P-HU$sd(t8eukS8=zIy)t%Hre3mD6h{pRV5g
z>&nu@pO)v{uir)6>lhl^p8je1%-XYSYd6oXJU-R<@q3m&>g>k?KoPKe@`v?H3yn*U
z-1qNDLrx{o>NiX4pWXsEXFk9`!vGF?CzgYV4i3XTXdK8v<eoaYe&UXM{tL(hK!TyY
z_WSK9_8OdIoP5vkfvF-7hg}$J3VE?;Y0Ebt>%jRh+;2XBKHwQ9CN_Hbc9(CgpT5wz
zJimJWbmQ8C_3y4Is7oi1`{n2E`#11F-20~*H%_~!Pc{C&u=?&J_wx@_sbuZRUmNGX
zgji{|Q8muYtzVymgb0f#-20axw%z%^tll_|-_TqJO8dPDw6b`+abtP?_}|^9-&)}U
z#0&2>F5l;&@Xo@t9yBntgGvhub+(rEM~#L?To1LK+m;7NC}Fn@%4PfnEu(%vj3p!5
z5amkV?&ep4S3cUs!@klU+he%UE4D9Q!LUP+kM4$UC_hF3LeKxaOaAD@4n=`#li<DX
zSqd0|;XMp*qKBP&*01lsGgi?=0l%j9lTDg|<U#UGy{Mpt$j9pB8BeD+y@IBCn(Pqi
zAojKj{hcl}1t;+d2jvS#3)B;NVix&R#p-x3-s#dj1mQ8-@I3sz71`kIH~gYZ*UT_1
zdT+>W9BHZzmC&1Q?dWrDt!!y609#QE>*mSw72LQ%&&^7Sb7-Ogxd)~amSs8`w8)Ks
zCbC#cRaUcB&^S!ds%#p*1|`hAsFqic|B(%O9t2UN-%@=^q!b_!J+-x(F8mrTmR|V~
z%cfVZWy>ak@N-x;qgyZyXI79wA3tl%ebadVvU~qM^!qsak2v~IwP3TRPOqY*+7~!q
zNHs-?FEBAnMSWygD8=G)<MIRQ1+lVp+dcbe{p2}B&Lh513{oKip$~zy`Td*@i41-O
ze+1pw{y2L5#aG6Oh6{h)lOnHPF{q;O(6`}njCj@3V!gDbo`PxdE6b0mYBL4hx_o{Y
ztPV+HY)FB<A#jES4Y9p3Wq)X`S!IUBubb7hK1KN8Kn#41Pd>n>l9eatR-Y`aE`N-V
z8XA_GwE;)6VXC$}MQRVac?3^n!3v(q4#-wP?`78N)#+RvtpA6zUZM~dh5o2h1*dg%
zv@g*+kbv}6aA^bXb4-)8P*lRoWXK^ulfj?JWu(kI$S7*@=Q;W*nA;EVQ_w0{qOI#e
z!0{0hpH%U2a`nuk#=`sVt@l<ho)U<~0*8hQ7lVnjQ#W4_lxIPfzsnC+7QbcF=HB|o
zy?+5B)IT}pE_}ZJ=}Gs(x6FH1jXQq!w7YN>4$LR+xbvqh^cK-pOm6!r4Md6b*@?`Z
zgKk}Z=05(q@!9Rwn+v$3bTkABfw*J%yM70;0A*<X%EQLO-+x-Zs>PWHYeb500pcfI
z>Pd}b)JY3Dqz!ssDyRwkSp$mSK|<7@mgn&_HRk`kvh>%+CpQ{DTn3O3zJ6u#?%Meu
z-3uR+q9ioEF#<h9;o{?!$EQd|8nYg^N1=uwm(t1FvrjRR=oqu7Ops_@K>=%-(mVj&
zH(Ni98+E^b7sA}1oTtgOAveSFjrF+?gNF5>@yWxrZ@zL*{Q#L6Zq8!_TSwHmJ_p24
zqWHUCzuWlyx_jZ?+KDH@n0s<+_0sn+3P9farx)G%hYIpm7U%HtT6^~AmE~muU72G)
z|3f!r=QRh*9?}&)tzosC?h9fFn=iQZz{13AnOs~|SLF+G)fc*GXa&toC9HAVfC9rc
zQv`<4KO{gPwmpJ5i8R*k-Nr9r83eZwVTyEo(#1t5Er0;vkr$=`kgal{`xfnc7ze`W
zSvz~ez4m}CM3Ro;<U_hNBX@aY)cAaT&@}m~4Zdhbv3N6m(F{UjUjqHdf1=gpkKDUg
z8rL2)KEA(kS+8yWY9`=KJEd7kq$OvyK!uz5GOCb8gd=q$VqNb-?#1&)fgHK_zb7eL
zK(eqeyG|tXEjqu#+JOCrN^2(RjHrm{*hyJ~U`FLHd)A#z7Qx!P0Wgj%paM)PWWb2P
z1$6!RH}~vG_u_Xek3MkEK++o~y>?@~eT3i8Yw67}?x6FQzFzwiXl(uTIXs(~22P)H
zZ=Dtn`@4-U6!<Y)3Mo}kDS;ZJs3i(aM<*X*L5|!UM{76ltloOew!SaFTUq?lT|R+v
z0XntD`7?&Gk~Y`o&aN!J-&kCv_BTM4Vk?6a0I{Rd6i>L>Bx?-PK3)D4`f=-<wFh5-
zQH#DOZL7^qW`mtT1afK8^$ga}w^){4Sv*gt7f%*lZB*S7cm%<#o!A^{;1Ig3Mo>(2
zLBhS5>55B>loZ(Gm&i?5EeEz&w^LIuWIaFqt^58R_tCS)hj&53I5W4hbelrOYgGUo
z4wpFcP;Wq+vmM(g13m7PzItX+N@sM!N02wYM6*cZG9=-n6b46xWSZ)j&}GT}NGD60
zQEct$N0=aStBrFDv?ss<$<@}>0Ld>l!M>1uY3C`LsaaJI_(mv8x^gS2wZt2@A|;Ad
z^gLv&eK8Xm?~~cN857&`V=P|+2b-)wP51N#_x_*pQ1F;m|MnDUcKKa*aS>M4qlb;9
zn`;k$5JzT$8l<n&?*-$&1@G80w~GuQNShW2L??g&%sv}87Tsqj#+m<tg*Q^n7#Uj+
z!Y5w5rl?ng?}Hs@l|+XH^xE&YTYyN0`4>P(`vQmu?&C{n?057++a6!}xu1)tuQPR^
zw$L{2vF6Qqu)i9c(;#m(Z`Oi+)!?!Dl)?FlY#hIbUHH}myj){@Qbzjl9nCcMVCj5I
zE@tp-#K8=nDDGvX9l?Jp>8qOZ&SZAFSl>-{&pS8W<qMGvZItR7$G>!Mouhk_$ozr{
zN%T3!i%no#0Zk2Z^8~Qk(~s%irx%lK4BU9IpT3h`ydFaBQYTGX@}}ISEscb@NgL20
zJBKm$u(EiLId?s~<X-!{@xz_Qdl%gIuQfh9fd#wOFaENA<?qIh7~R~&Eye0GCkvqv
z8pNqW;GYr+bOO?OhuNPE1f=PjlP6ag4r=g;nfzb^OCl&=cBm}nyV<cqLSQ+yJ1x58
z1zb*+e<3SLH6$}v)+S8Jo;&LXjLFO&a|7mN?y}poL77%;#-z-yW=Ak8(~=uAD|bG3
z!!qr4i>77nz#qZ5EVzI-YhL!8!kaZP3y$Kgnwa?r+hJrjqF{6Fz6R7iAIq;1_0QY(
z%O@|I^_StQ8T)T1-)<sm1?3pV;Wl77K5d{CSS||BbG8HP9eCcBU_%eWY{7CBOEl%p
zRiLwNqfIJwoyR;kB|2Yi!D{r@m&~T9(uG;xh%#NKqm8T6BL$r%8qiS1CyVnQK`-b{
zO+?M0-^q3=%6zCTE4Vm%ehZN&SDwL@jMaA^q0|U+=Woz!82IGjK068i9QZ^{wMFCr
zNR@L4S2O{nx>)_#2)ePSAHW?0QCqfjd;Rz`_vAUMacZ3V!oBsqa@|0@Xcew9Gj(YL
zS{&hOTFbf;f3`xjCDKB(=}Y`VAsQ3EV3d1!e(9*{5wW*TD@cSi2`#Ef1UVr@Ng}i*
zs!1$mv?Y<oOXA`^uvnA(MeE;Pp~OnPJIu>~ev)tIr4VS|7yiPuegqiP)37j!h~A`H
zeQ?ix{{~z+s+A|}Q`s3rn^G<;i7k=)TIbQuEQBhjm%>&S?}D72U@&6rR&ULV$%x?6
z*5YIL-lO#=>isVfxzZT)LctkOBIuAzORPyP5^pY4j}+Qgdz3+@W|~n43H9w-9B(TM
z*5e4<?k_`=BSj+H(&cRFa(->PoL{MeW=pB^oRuoSCN)XNL9hBj?_ME`=nhdI%T=y}
z=-AQ#Y+M7db+&KA*}ff-((F8=<ydL<V>Ug5dMAq!L|p=?)bE`uR+s<Mxb)O^ymqhO
zp+{JDj5u?SIPd@*_x>rF$-^;z={AMuUb~@(dEh|t@u6|!NB8(u_rg7>LEK(wy~ao9
zR-RnPCkfg)^^^e~O;<0v%NM-v<3_=~f8nfq@(%TWF*X{o8->S!vuZGwK5BgSg^0Q>
zEf3m=?qm}cAG81qw(;ZNR+lc&OB0mr)}Eb)(<`a5r6N&OByq&o?ZVU5Ckx<>hTnpt
z7V*wy3;xHyVP4_h`WQS8NiwJBp*>Mm9*B2!W$E1Ndk-;NX`DT7RHQVeS!h~Y5UJ|c
z;h!D;t!H{3;c_#N_UQM2+_@*cd+*N7_5%lY?m6(r-aVNeySDG&zjHr`J-ie=9S&?o
z9V2>#B#Asb2&?MxMj=#?2kr&t<}#8gl|^4~U%Wqwx`$E&@&0~2P~(2GOz?-_70bCJ
zp;;|3byc0bpTJklh>>G)bF={Uqb!7pR5_y+GpAvYD?tExDLjQ9@P;e}Lvjac>eWO#
zSX8<2?fUU^?)g)VXLF4QOI{ja%&6Ap&aQv<IhGJW2P)<UCcO+v<D;u=0)xYQu}Nv(
zHy+;uZ!M4&hX*!gEzmGLc6ERUEI0{{mt)ljtA+dSL|ANu{I7Ur7V3CZG%o>(2cmSY
z#B&a7KqL@INTFa>3utj*p+2w!pVPeRX^bV<G|7vXcG;TZ#|D`&O($3P2#<#eu9quE
zVy42A@Kff9nq5W=VM{mba&;;LP{{sj4(6+javMv4Gd-oZOj|Q29TMj57`P3<F*%zf
zY5Ba&K}Zy-0`RQ7fsM<Md&xR+B#~@Jr9wYJwh0X}e+8JROrTXh&Gu|i9$;$^JJH^f
zgAU_e1h)b=XGoGbMg(3SRmh53b$|zDd2V1Ng!UAm)gfu?F}P$n;ZWT)uIprLh2kuA
zTR_JS7x4OFKycwhWpJXvr9hWaE@Dv`a?Lq;6*l99Ot8kBaFQhw&K4nBY6g!Xrc!U5
z1X?1T->LahBVVTp$asgBPmIbD$p9?vDwivWB!<o6v1N~D3q??rco>4zqZo4}?mD^o
zgkMAM4e~Zup29Ro^S<H#oUDfcjbO@EK2V|~DxoTPxWu%s8&?pkNu5NLgxfyUaZ^=t
z(1)cw!w8h8Ul-6W!ob2pEmH@juDte4up!$r%sfR!6^os9p3;l033648Ac+znqYkkU
zpvDZGfSs<AT)iy2Jo=-9N)9DH$qQ#{x>zqjif5?A1V7q}88++R#4I|fu!4gylir9h
zmo4Qf9g};yF^tNd6d4zM(itzo`6Np6Ya)>In`~jmiG_y^kuGy}FB>CDdpd#Ro-`b?
zx2cd-J|&V<Ollrd*eLiIBW=eiAwh4(gGhp6_x18I2bR=9!WkgNdIr#nr>8buA=wM-
z<tkn)EGsWr9i%8=x>m;Q{RO-jr)wI!lZoRX>DoaC2A?9UUeZcEZo83V8{Q_oKEpwT
zv;g_HVwfu%@oC#StSd(r0Odo9^qWP92!Vqoh{B<%a)D)O*2PT)EXj~%0M*>qXL{Qm
z1h-q9oOZ~D5<sa~hAo?XGlHT74j8l-LBR~$yUc1jneiv0yyyWncBbt@V-WPV364g5
z1PhT;QPap<6(mE@*_>=B0W-Y|krgw=!jY!tA0MY7+=4bM>MKwY!qHlf+G^|Rbg72d
zp7Z~BYH)m{%@=ZBce7Mv@fU+mS(s5`x=MI}V&*$w6V-Ay4+6ZQC0i#FUM2iKMpms)
zZF*8acvR5VW}8P<7E7o{9c4{SmkPM+G2;Y9Cl^UlWQprRl8~FUE)qe?9Mu6O3Uz17
zJWQc-L_t%lZhOy_<`&X5^wZiyueLh5()TfF(b`Q9M)=zI2qPGx?Y$&i**A|d^Qf9`
zPL&1hf?ka!#x&v~bo~bPp%p>P1`8fw-g;Pl@gZ<V%_t14SSwFgbIz!QwPv_IJdxLR
zi(+5!UeF#H&~GC|7}T;qMW1X>ksnRe(-m4&t!)4um&K~+TkytY)=eLJC`C1k5>=h9
zXwF|D5mu_Z2(!@j=0y2EGze)uhX(uNhJ=wJ0L(d?;UBP{fLDp#^uUu9laiUjRHf)l
zVO~ZwFXNt~Ue86SqOG&3Mm5VtKvP;aPY7G?I|VUq0FSy}tYqLgOgVFnhSZ6v5a1D_
z$n)*;C6uJ)edygN)rI~yFHA$_R5Q$Ch{c@>?6FA|zBIM{LrIw0;elb()b6V~2dddZ
z$;s~qMOaMvE*n=erE-R3HwR-SrT|zMV;v?c>5x)>WokzX6);TIGI{4H=H8h~F`FY3
zjUjZ%W~*9<T$Sn-YAgg2?foc;Qo<*wQ%Qr>Db#{xQ^?WEay2a(HqC{=c1mU-nX;Lf
z>UqxM+1&WI#i|)1P76~oI6N31MBS;>&=CHs%>l){Wk&S28FvWCkj7c8k|U*D-9hT?
z+8x@MnqDQmY8<`hl$<JAxVsB8g_1%#^#bO-47r392<;T9LqLN5&|ytfZcKzBk@-+6
z1!SH|C)4q?AvJL`qn8<(utYQXUl2pksA2u`A5|crO!aOOa)A`qIq`<0Tpr=$aSpC2
zmJS*FP$}2y1O>W;Z%zZ(=^eojdDD_%`8))78`2MRu>@UkGE26G3-&$3qe8+9pDs=<
zLN2}Zc`OSy*UWT<dr6!04L}=EaJqt767``d5*pFN^Hi#!?CTi&BPl*k38QCsc_#_s
z@%abqU%t0`;;y?q=YI1+B9Z9O5RJEAd_r|*$Z=xKv~PFNu0Wp3YTX=Bq~idXTkwBb
z1QzA#DV72f@v0a<AU=@OB6N<Q`KPcLP;?OfQ}q6XHc>3YZC3?y-oY|tiK^P`)dILL
zpll5v<w~=jaRh(IBuA590ErYzx#DzQyaY>5en+#_0+~%}a3a<g#peMIy~MH2uUg60
z57!b$A#D)(2YMNA=Bq|A+Ifl`6@VAxsXFA;Nt{#4Iyy3pUc4e5RQs*XH(o$Kn4mD(
zBzHTgN=n5_Ld@ZgFv){XeEW30e1L<p>N8k(rNRk-4Sqj$%BN!DmP77Bm#Z1Tr}+@!
zEhKTg{75Z!My(Ch2c%A{WFFm{LyCiSwk86NLE$*tr`WN-Hrg>&)7KO{<f62%X#yeR
zP&J#6+0zGS*xnm^@kQEtocrrY((xz_W#(PFP*QzxXb~9$@%8%0+`ApB3`&3iBLZ)1
z>Llgo>X4a{Hv+bY(<~prv=XF>9A}kmYlBI;tqt|X+qJ8`^wLZIWF_d~g^HFj#=9Nx
zv&Lr`5Zj2iIu<lo-aLxaT+zvvGCoyNc8}#!yooV1`x<kfGrjK2#>w@5xaN<kxRxuF
zr)&R)c4SNBRXtlIR{tsI-~Oxg#tTzqpYqDl6uxg@$*1I%#I^KP@COebRFY}2^$g&Y
z-Ni^``VJ(@n_5Y22eM{-)!fQJ!Kl46`JYL<HVMcqvI4ERrD4@h5fex)l}z>#TI)-v
z;=Kc@Hc=Zq@2D>|4=T1G2>R$E^3MCZfI}~+F`CVQD*+`=Hm>Mb)^gKTS~b#jHM+L(
zz8G{JE5co_2u&$Rq7gAptLdhf%}GlDsjLY4lE&61C|JGv-r6@`t=>59o_^A}d_PX(
zV(lUrg^;T>3m{2d2(Izhzg_$CyMKS<!0PhHYxn+ah(4+}w!M1D=0+(@1dZ?sq2EcO
z1KceZ%k?Z#wyZ2Zmh>&M9#i$dC6h%1P14vaE6;8=j^6{H-V68E=gwGqqf8j>66uC}
z{SNt+UIV5u#jd5|mi`3IsC)0x+V?l1Q+Kbp@80o(+T!9pq&P$Zp40M$n6dmOSxGiT
zbEi0ANVuUQn`JYKkCt=U37bQ%k*qpdJOGbA;GzS$*r1mcw?f!Kn^@Hn<&1it0`Ifq
zSsnF0SxU<DKPrE<)KTVlRQYbGB_UJ(`DK4S{|TS@@c)^jbJQu0zOXm-LYK}81M$X=
z%pP`pM8uVmsvg9(uBd!OoM_5FAbIMgJ1dXRuYY$1(t6_e=Pr{DP%fjxpqKVNNHpIm
z1kfcRJl8+_6fXfnpYDfeDbU85Z`N+A*+vaZf(G~dY5A&{&Xsw>LSBd;E<=tL8Zcw&
z!rHUbSY9j^$$^!w^Lll*$C=4Fm3l;c(Cyaq1fU|G{YJG<o>=Y*C1mk3+%JTd;qm2c
zA|sj$u|q+=kgJc{?mlZ0x?V&xx>I}$UK)N`*(rl-ocLsf-4OO_(+||eLf67r`5LVT
zmz2P;J!14CVe(8Z3mTH>(u+qikwgmGkqlEb8H?2F<w^z2g}G`0i;0D-p^T>8F~ONF
z<*?vPb-nCXys^2egO9LPJ%hE>V?0e2d!ZL!q%tGD?5N5{QTf;U>2vO>Px%xe$`l~d
z-t{|DmVj=MnCRgNL3H8yi;YhnvS=eD?G9mD204o)99aMQ<HlcaYDqojm6-c9w7q-l
zDi+PSgmK18@If;lul;z>efplQAqYf%?Q{3ZhqRBjhG-0Ra?eR@K^O@Lg_f3#cqFQj
zL_*L3CS<zfDiJDI?&HL`t57qrRs%GRL^;6y?1vrfePoE!^+M6>7SmyKTgKi}adt1e
zYQulpNxZJByo6>(6dK|>Arb_MwNS#lY`Fq{&Q#4|Sk|oI^i*XQ6mg|WIJ4<Qjj!e!
zU)+Xd6c@i{(<l(GsRqq)q?h$_<E%ZqhUI3UrG=|2kIuMHzg<0lk1mSVRsnAF;@9r{
zr%Gb=RT2S<uDQ^o5Zsp(7CeZIG0lQx;Nw)wGubir%Ff^Y;k9vZsl!h1<Tc3$fp5SF
zdK>=4RH?|1Ll}o+xqv%d!<uZPCTH7tC+Myh(^W);WfArv1J*g?1V=}|k2?=_e;~0L
zp<j8tgA#Qvij~R|qbyg+0A`)K5JxfJWs7pa4*@(cxpgb#k1s8GpDYpwJnG&z8euc!
zkH&FEqimD>(YDLcC>yBPwzQFs%3lNf%yw4(AsjbsmzBN+)C$~o`4d4(@?xsX^l4;0
zMuviU5&6+Qb#i6-YU9SE#<>M+x;pp$JNOWR&;5-L{-W_KP;zj#hPT7U{Kw)}i+^SN
zwxa9sdUER<nuKJ1ZfWi5pCMrdSy-#r7Tt?yY2F@w{f-@v8<!qI<~_9|c24N|K;?;D
zL$$MI{1Q~_ZabLB3P>H0O`WI%QU!4<EUBOs9J%Zs2#1!i0`SCjD*)Y@NR-#f$s|)K
zWw<8X2+C4c2MW2Hn1Rm($N$-I#L3n&P8mEHREcL4WlOU$p1#jxLZ}0Za{}q;qS0kM
zF{z|;G6OTtm-)tM>aX`}_`=3Jm=6nYk|c)_m=cwJO=dVIxzM(|H4&4z<Dl1<gV33p
z0~b}$K@x5)jt)wl@B{T^(LqChFa*|l6f2MmaRG6CCp+GjfWOmaivk>pS;Af_SW>qs
zx3ab&wt&bzF9VT#>nlc5K*nrD4-+LVIF<y`d|$qrJtmkGB15KC0wHscCy8H0Z^5q?
z*h;QiuGKcV4L%Z;VMcFA)>N3=veL~YyXXjR0VQ^GwA#ZuMQpXzf^*d2zv!U}NKYhc
zGp_ldaY*h`n7PrWnBvl8vVez|l4eJgJ;pkC5vBy{7^jB<zeY!w*`yXQU1HPPl-ZYi
zq>f4Oq`t{5g>)|x!+={ds9uHJ)a6Nblb5@w)`%%KTf`z_p7<dZYKI-efmE~wfNqjT
zUi0K`{jwIV1{aGn6$g)>gH}Cx{Ib&}UMA!Bj^4e%<F85P2*cHsfj+}lCe8)d{$;3*
zo*YOrvXmI}4-!(<2xzVs@L{lu$+!wb95d)3x!!?l8xr7B0UNQTd!SC8lf~?z4DtGK
zN=J1-teFfnNkWKebz(2PvFFvDZ|>Z)V`t{T8@qRA-rTuk@0+i@5JzVB2fYnUB}Hal
z&VUqRnjFg-(C0j^9AZ2Y8fAauf{-*CBoNu;0K9xetOSJTV~8fW%hu?$Q4i0cgXSc7
z+}5)siil7p0iVO~6^W;De1NFHl4Gn`vU1C1MLRgBT~?dD&0y&+<4b5F^~z_nmU>7W
zyTJL2-8{ftyIw69BlMPsv3t8i<F_3nIuCv)fHS}SjzwD#*q+Q?M~tlXonW4;G#rLB
z_v8=jmlkNKw4d-9^sBkX-#>8Y?}uxSEh^U_T@0)mEl0eB5VFT8!3r&i1YkD(ng}g|
zevO0k=+Fqr`_=3UW?kSh;lyC*v>}hJeMK8EsCvW(XiJ<@Ob0->z%K5=-%Q(buH5k2
zK7N%>N36!$OVRdhJe7Vk(WXzuHao6ge|2FF7kB4xH9o$}?W|X+5rio~K6%(U^O1Y*
z{!h#Ep5d2eN&e*F`X>*Jx)y;h00I0UOzqq+WWZVuaZFwa8XP=yhvUWV!io5BRVeQ5
zIQpOK9}(n~D|jye{U+6glKY$x5Cz313sroS1RXOCu>v78@aEcOCVzPB&6k{+Iy>Gg
z_spC3nh;6D_nwHNN+pQ*JShyULNHEu*is94%`3)fZFPjyR}IxB)?kOw!W?v>#*Nc!
z3-{e0|K_<XVA`dg4(`SCt4nt&xkn*fnf{t3R+~p+=m0){Z+-5Ddve)bcm(D>8q%M5
zAO)}x;s^Neh5BAj>eTQR(rLrh4IaTjxx!%q;Eh~0!;2SL0iW!3i3F1vMQOT`+jM=#
zCfMx=ezhaKmX}A~XOLW1d=@fCJig|B|9NBXoAtS??&9Lg;yKFMB6i29`XJ06@~zuW
z(P4*1tr;VD=62hrCJgG8CTiOhC+$att%Vk_AwKrjeJ>$>WyLgh9~a5gB7s3?0=IN(
z(opOWn1j(>nOQifzd&RD<Mq!zm+UeDJWcj^it#fD_s<voeykBYtm2=1grq^5w3BAQ
zxZ(IxD4F@CO{(8fB-39bVX5eCne5SWAx~5)U;xb`$YlqgFxKE_hJr*8*D-D4ay{Ku
zC`xbf%=K^=i+rJ0uNEezVX=kw!qW(=CQ<Q3srB0CbSo(R1EwROd8#G1w9Qn}&ERE!
z9egin#ki0}MNnt*1%K=?7HX@`ThoQANejltXHjqlX&okIReIP)PchOxielQ(ZO+AV
zGt9>~xEVobuQbKDO{Av6<Kwc2-O1*pZrI%TeZ>$?%OM2WCdJ3l5{wu66YpZN$V>X=
ziWqS>3%~)xLrXes@zFL%=p7%V`eR&B7VE!88in{onPE9ZZ18xPc_2_j&L5Q$LhQp$
zqdpvjL{3dKA3_vnc?D*bzM*r-P}R(g*;pJsjngP8R*oN$GeeZFnii~+2O-n>ks1q3
zhv?Vg3VjmMu~eQaWQ#G<9*y})`&JuGbtTL(Y9dQHdXw4eq3Hb&!?$P6;5TY(aN&2l
zi1(B1Q4$TaP$hj}GeZo?m2BRRHffzAtIa?1XvJ?Yn$0TRvLnfhD3(G!#-dH3h3a9s
z(*Q<~Z%IL13WQCUxkG?+0znc)v9KUSP$j}F=CbwJm{BqyLK6vLIUYiiz&DsrRD`q>
z@bKaZStxG?IYvtq#aBlBP1Wg#O4|X-jR@{p1RomX91Z5HQC+?4*s$38jIaL>`v0`G
zAPR>Di7?0*1q-ACUN<pQYl&=)INip&x>z(dF}jLD*UgeLrJ|4h`Piz;l~7ZaTv0G!
zCbE@^Q_9B#SUsW<tci(+mCpz@+~|l|XEl2WtXt3&GjCecGYB2FTLnQ^FBAmzd@2Cr
zA`&X)N-SwXn6al2s#0oeo7TJ;8uLqjfkCCHCUF`F<U(x(y{)(Sj(Q7B1*d>vs6uZG
z9a3PJr=>Z5>G4XSph*4+qKN=(XT%yHda<uOKK1W!9I#zNpnIl8{qH-Sgeh5rwE8q!
z$+0X;+?k)Q!anYEX_GN5KtNHN`o!GSp1lWXbFpdN+1g+|Fk+^;qv}QuGt7NE=!!(h
zk-%fZk^H$y4Z`|InIIQjmc@hc&%7b!b*H0I{Z&L5(m7z%_%;NReuN#>Uxis(l(XWf
zcC>8aR5NOP+oa7V!flQah&p>5)n7$414TY$C?v0482VMw`$j11UxOk7sDV*}H;>^0
z5;2Q$cOGi+43+#XcR-`uCJ8wTWNWno>@RhC7BQ3%yzdOqvomZAEaIhH#AM+sOd5R4
zgTjpoJGn8kpJiq?6p1WAIuQMOz?B?t*mz2_(?t?5DR5>ZZrxFP?swgSf{APm<UOj?
zrapp^?<7F<$p}!v(FdQXG&%@w+veJZ1r-W~q|+Hi{g(5WBKakF{LUAk3C{_^55y#j
zgThD+B_rwu)bS#c$Pv&GlLj7F#?%#$(uvkKB~zb5`xE^@SFRQgk(jjRcBEH<#}l@e
z#C4>B%w>zY=^`P`5u3vq2ZzWR6&w{+F!;!0?e#lRzy>W43_R%PNbmy}32q?p_ed?!
z0Fws9gy18X7Jl~i2Ui+Ynt3S-{179COrn3>{^p)H_PjQN-sIelhLU9s#&jfhA&3;>
z<ZH+_zS!FldihSIQyvxtVb>Y;Z_YL;R>0Hx9aT}X7>%HDR78!@aM1%EU;G#iltf66
zl<61^5l`TG&Ra6A1{cUp!H=P-j;ZX_M7D#+mZpLa4Sk%+c$d!;#qu$jb80x^9>3XY
zcB*C(WN1dXycZ2w6SV!V+12MV5Y=<mzm+JIVqC$ShCt6udHwC6aHOq*28<K|VrC{a
zn2>(8a!J#O<f`bm!5ctJtb~s#r*C|iHv|o&3-?Em&tk#bPecwvPA`;^@SiPzwK(&g
zXh*qHz(XPTTg1?(G~`h&Q7mSsvh)k+@aflG@b9bT>V9?z-T{<dm@MR&0u1Jtl<9ei
zhZ&5cUAwmL-i}|esP3!_Jw3cL=|p-k!G%To$NN$Ijr^N0OirSno<ne(a$5zMic{N`
zOeTAK(#iBdDwQ0l;e%``pDRyI73v960!>6MNt!4*$MDu$q_;g#Dw!PU>mxtVq>4%Q
zBoe~|1O4e?hyDmj*SF=JquWZ;#iBP`&9(gU%cv(APonNrJUuu7Q9~)9S7DM;xyixd
z{LnyOZn8f$nI9OKa3+T)o$NqwvTrcgKbd#>1_vkmd(mz@6Z@SCN)6(^j*$OQI+;xQ
zutw0nLaFRH)t)1GZf6SUx2fJ_Iz8Oin@kSAoEqrGTin!3wEQcecti&dPsh;!<`zdW
zsB}L{^^NrRjr0zp?&MIC&CmYni9b2H`Uo1^xBbA5*HQ1b-f?6au`r$8UTS){WVMab
z9!8^6sX5M(Z65jYH;KwDT@U|_P#O==%Z4Uae+x(tf_nP(y;lstf8%)_Od&>z>|v0E
z8B4*8_2d5cN%SN>7*cvLje69F0qZ5CR6{v%IRhEKKRpE3kew<~(Zss(sUn^1O@qzF
zp|5(_;-{}9G7RsVh+Ao`u#h&}#I+s`-xICZrTw7mX0Uf~us4@ZClcw&;nZ+?Vp3dN
zWc_TQmwd%_Z@jy(3j0YwRs3(d52C5kT;bH&4IBrF=RSQI<GVVaHTEa7WAC0<-*_$a
z>fSdquk76c62mL3r5%LfcSz;CX&wy-&h72;Z@(ik#cm!jko*q&jF(3s<|+Ni*Eo~B
z@V*X4QWiL{Muk)@-FS^3?8OMMZ?Vgw5}>h^7#aA;QA-)(s9<VAjH-)&1b-vBZgSj9
zDNmty$rxb!KGX$$je;^Yf-T!2)na3~`nV34WyJF2@IDX2GIG_VAPVZ0U*(BVew3$0
z`O%C*v6L8#*EmRWy03|O4S2L(?=A9p%_JgsY^O`$WRlO;vkFmh@*thVgDJvYl|m^)
zlXQ<NcpxbNdB#6;PC9?J_4TFVeZ8o=uU|@a!q?00@=iT@jmLncoI7H2ASC@EX^UR=
zLX-<HVdxM6nktx?CpEK5&ctbQKQEy?O<^UyWcTcO$J8v!d^IcS1!uaH;)VPSNmryd
z2VPLMa<v{)tFQ}0kiLSY@io_<D3o&LX~>wG$W2ed%p{GVGOQ2e&$eyWEE=q?0)nKB
zjUxxsI!mbR;JU=f64QFBVT(`|_$jbFc}ZC41>(m+&q*4Hu|f~$M{2HL0njM#kf}j<
zWEXQol4mvRUcuBT{?YTyn(=erVMLC3of%<0vDAkd^W4E^#w3UVRmwtMxtIh-onuQS
z_yWFtF#}PLSw=7(0NXpHdg~p3swZkW;1E@kT}~_+N6EO1K0gU;sw+{);!`Y!KeMFG
zV|dcLy5d6LyUgUs_=4D!QbJ%Poo3e1DKm!+G6n8Bq2P%mUbeCZUa3j;z^k>wA}Z7P
zU*=F{WEc|gCYgp<zjcnmA;8lt*FZf;;u|8Y2)siq4v&F<h(#js5T$rad_-?D5-%}S
zi^fmP6@iUpTIM4Lm{ga-Sx42$Pv@LWERbPf@y*HBRgNJt47k(;fJR@yl>LIK2Ugx8
z*n!(9GFZlK7aBy0jm#H>N*fm(h}bsV9|sjc-{4R@m1;_G6AWWN>!yLUJ_T<Z&vBoK
zMhU8d5I(A1Brn@cvR=D>G-3XRMI1OYPtss{u@|o@#d=WV%%k<w7u@9=@bB4=Y1#<)
z;@OqOxs}ClIp>718fWGj*Uv~U_&$`iAAeYTcDnJ&4IZQ8>U%4TAGr$;wUR5(K5Lvk
zzPj{L<MZq8!dds^9o*8|=?9@B>O}7RXYMy2z&&Ad>w0zhBlqqVrBuYFYhX8x^B=m8
zFW~O2-8;U1^D%rDIwlltNeF1uDM0|%+?Fq(kq#8@-QQfm!$rdKMl@iyd+(9^_>z16
z3mS~mzq!hRL`Q*F1z!63WBvTp1{h?YQqv!?`f7UXXrnjI_nD`dh&+;3NmeMUB{kRO
zBJ`4X$0Fv@T+BOlYvoz?<hVgJ8X;^F%YNk&K#BPmdi-k{!erjNATLukn#Se(N?8*H
z8Ye}dK^yjn%%~?{Pt?oBLaiP%Fj6vw8#-AEyA=~vM@^N?Rcs<9LE+zSDvV9343BaL
zT9|-fGV(Q4k%%bi&wI9(A#o^KLNGt&x@}SjYBnUfG6>DdtwN>pv_ZxW6JNFZ!j~6K
zyPs8zRWJ@j$|*Y^POOsG2HtEge7tt=HeXnjBf>67n5TR3JezL!?Bn%M)XmEycH=aU
zjd;Bp?;<KXeqqJ{1`iHUN`Q46=-me;!LU+CL^0304kY={S}JM(scCsJtSe=%YE+7t
zuqiGL(}qH+3Mgvaql5Ebu&JhdqiK$`CF3s_WO^udk-S>&#Xn=g%Qk7jM<5>@P31%R
z4%)qIpS_cYCF+xqaj9w<ylIPP&on+ev3~m8>hfP2m!6syKrUOv_-%Sk6YLvI0|N4{
zeeRz6)SbUiB|wm5?w(m$`caf*beTm!LgWP?RaOD0wgUbnZ9lX?_8JTytA+xW8M9N7
zwwoDGE%c{09j*k3^xvq8IIUs}Lg;2G<jf7UETPkXTT;wvPzcequADQ%3s%q>tH5KG
zq5OY?6?L{M&5Aq$O@|3SK~+PQb|IBpR@?=_eSXS2try|DS4&EkW>4Wt#f_iDO#qu*
z>qbEBSn%+)e9kh!YHF$TCvc(rGahl7vkDtr<N~2Jx5R~t%`R|D)s5Kt>@ECE%3Cuj
zM_KyMUDk-mXDr+-SU>pB&z+*HI)15IGdt65LOZZIY&MEYs|oIGR%=D$*1rlK97V?z
zof)^LXKDjO4xE>uv!+?yVh7ZU$|bm4`UzB5y)?HqAk!Z(aSkW@;)AGrcwi_#pxZaM
zbC)-Rm6QG94{Mru>6KYxO7exxxayQCt7X`?PNn@<@wdZ4ouI?^3|j2_wB2Kqq;113
z=wg>`+je!??6S=++qPX@w#_cvwr$(y<b6NyyLU|N*)tO{F~1-`WJF$ho#$HXunF5=
z1RZtC>WE_q(FS+x1r6OQ&AS{Ve(OFoZy2_}W7xr0B|MCBwf(V7E=cd?R5W+$W0xsY
z6dJ;~#Tpds(vqca#Ja<lz3Q{mGsO$BwbBlF4lX}^zPnyQ5NPEM!c%yTl(|Hm+~Z(K
z%k&X?@N_^O9_|i;cRV~I0O1wA&5yd3Pa9KBa}Q#EH`ScsmYTqjQOIps(#L}2y<7Ho
z>r*EIWE=Ohb<I-XCPH?}PrF7Q?-NE59^t`@N~J2OGyboUugog9p}C`VaPjK|Nd?hy
zNpn++DXoqERb7%9v}FeLtHkpZ4s(BQy(xnCgx$^%(#V`<gMccxS*c)sQ5xGmsJx@E
zE>?|0(6TMJwH^#TIyKig+3PjDv|{dA9NjhrDrc1@MXIT~_FCGZs$s{DHP&gVRxwAg
z!G%FVAHL)&+26%PeaWfxgk~lZjU?i$y!1SmJc;$sQ_`J+x(RNl9~>xXo_!2D&g5;k
z5igF=nQOY}K9dJ9O2r_3;1h|bu0;L_GOpL&97(F1Es|o73x0}owV?fJQbTaOYEY?!
zIi~{g9K=-qNi!U4C^CE@&ZA^GOe*OlEWf|8jBjB__EcI*Q0_fV8fNj3rtO%_b)fbU
zJtu}22IVlFAb+fgb09T^65-Qch5d!#D|c%xu=v@;;t8~GNay~IwnU5wLfd}|y)kG=
zgt`j?PXrlFar1)bOR6#i<+~c|E{VX2&UUro#Q&}TIsU9Fx8a-u5X$%@NRdDGj1Ma0
z3|t2TY*B(2ioNrebI2WJ`FIy{q)9WwKX!KH*UazUlnR@GJgrZdZ?Dc{5dg6>w{7`)
zwOnRy-R;(W{-;ZCqiNvVFBEO4%^%-|$e)bCyzBGfz1z-iNewVcAMj<Teah3-%vW!G
zwDqcI%+lAZ&qrIE@9@ShmmBwM{>v;7y1tIbU(3%|9^ff`4pjXK+8N@~tBtPHmT8oW
zw;3D=V2Kwp<J$5Y16nQHN&Xp)7HixFqGRvhVRl#C0&`H28<`NbcYTsOY~)`t2wBD!
zlif$=9?spscS5wlf>P>Kb%J5tqa=`5CDSj3vJxNaE{l#B-NhyvH=_9~1%DXEGacT-
zFVabuuZ&%{2t#%gVmDZh5r>qV(PzOHgam!qWCaVsA~vJ@YmL32gFDV0i$RcUhY(z`
zuXS>TNnE!#o^2~>yQaDJeB6?I-^^bx-u|7h+c-t8qRlo1-jUxQCcxp(MqbBY*qQwl
zZSxprPiW!K>rJ_3g>Kdvo`=??KX9%i#khK9z|PqW`H4oJ&;}t@cqka{Fa)Wk@K&x!
zM*RCtJn7oP-aJ{n$ksx^07cFB?#OWAy%1eNi3Mw(`L@p=i7Qtig_X$49SK2m7h*hd
z&mGYRag0T3#^)ekC~_7`Gu>&a5OuW*Jl`L$o`Vflu1LYvmj8=8HADxcM=^su+1toL
z!YAuJPw@u^`9(F%V@DyHd;@9xUs3j#ppI>dk)tz9Pl<~kcT=&OJ;hNKSuf{6;Ihs6
z_GdPMLsb5DIKiVJGKb7Mj`CINgiNck1|NyBuR!96R_9JPitlzY7y3v9dBmu~@16RK
z#ux9BQmRe@J$mAO%+#liL~7`8r%VCHz~r(?bjmXD8i9mj-<a7}>QsnfqP@hs%sf1Z
zwz<*_hSUB<sCVfQOfkMo5kf#FHH4A-C9bge)L*^5ZD39+A*8WG0f2H=haj@x9t&Su
zU^CsU{T9(YaJl`As^Dm#t}FaqEgK@Fy9IR4FTd=YgFm?5cuP&(jJuq17F|RAK!InG
z$BF^8zV^s+KWeOA{t5{HpqS~(PpA?sOd&}<DXstnUhxKIg|ZJ)G2YN-*d^tJ1Nvt|
zn4Fl8f?8!$!iPGj4rqghR*dqDf>Mq&w|Zx5jO3ZrHH}N8s}B%83D@pH1bU{k=Oz}A
z4u>h_|81<v-R~jDUnr8+HtFl>LePOv%#<HSB=iobOuqqrxE^r@W3Su9CF06c9m=WV
zgBH4wF8MtrL<bQPRq&KCP`{iDeM{$<-k;T&z_?j|Q0%x;R958{F(6MaI90*7vfTj?
zTTNgGG`WelayV4(pyg-5yuMzfA(|l9>y_noV~AZy2;MMV@9Rl!9U@CP!B4F??Q^*>
zW3^Qk%AXqOTyiJ!+C}{sXqUL@l*X2dKiG=mNo3ga=}U?;xZk`oXtdH{1X8!~=E4$T
zH=7{;dhx<oqH&Bo?{JKcXL)_5mo1hmBt@#P^qL{EFbM}h#uq>@&q!m3LNi0GcRl!r
zA%%W68}yZ^3aU`MAxCN^MKW+fGL=|a0+D(~0tJ92)-TW3XXb@Fw2q*OdVqIq*DF~{
z2&x3n9^{5;`vTD~o1yIi*Zx=KQruW*7q6)*F?{X&UiJfPLoJ-IRAx&xiBjytDFY-n
zF_x2y!LKaw94}ls_71=GXx1;?K<5WI_fM+ba3{mP*ab;Jk(h~+J?7YxhyrEYB4IIV
zj^l6Cnbbb}0`>|*DI4{VzdADKp#13uO`;oJjLJuc@%YOY`eh=z$HfA5%;yI`8Hn+f
zcf#qbdstTOfpQQ_i<?wyWRK4XzX+y0;x`ca(hqQ*Pzg|WPdxtZlnB`V%GdDnnB1#l
z{SN(1G)mFmIRy$-2CVd5m2riZ>)-R7$*bV=zU+aX$6JfPhLgomha^b#a+LPj5A(au
z?|Vr=NZ@FApo$EH7$BwZ%tFOaA}8OBlp*B69{!Apj2DE1(EDPQlf4PsZ2~oLiKT83
zzra?<9Cn~mnjZYvvkyLdFgPBo)rBUTD)dqi(e%##^j?6d21R*Q)IOpdpw2d!x!ohD
zTNxYfv>7a-TWt%`Qjebyn^G$3-1OHv$IP>K**Us3oePI|2Mu1Hi*yy3Zt31TAgrX6
zzs)%G<~Hxq`Wz7@SHdjr`%FwL3ST#Df<8MKDq>25E#l{%wFKfyYl>%tpjH#cw_@y=
zx9t{f>hBWK?V3^?NPfh`Z$o&!fyG1!FcWP&`7y-aR%V5#y?&^ozwU91jxOG*6L$@>
z%$-@<sZup=x}Uq4_i`?xj+gH&R1HUkxawP6#?Hd|`(<v~CU9+2fNDMomt59Bt0&UB
zZcl^0s+7G1GimeAD*ii6w3cHtsWjzxnKUz_s8u=@cVaP3fr+yuuJS&(yz(5#M@Er?
zM?f5vw2egJ0ix#a&PioJM6i@@xfzuWGd`A$_Pc}pNuOZ$y;GmyZ(31*ryCcPPIm}`
zekNV$Mgh2q^|f3&Y#f@s2gw_?zXu2w$$^JwhPJhg;GeUidV-6Hf7Jp^gf9}dGGzZ5
z#QKFPeNNg2o^uOPPGNuQRA)QV)sb<an02A|xWNC!9kBOlyZaq`(h=rEJoMK|*VKBH
zQz}N7%Igfzh%wId>TYbWIU0wbz4{xC-hqd&z3qW8Rc-cvmNez;k<6RE1d;V9rdJsd
zQHBKR(Cd3|dCFFLFg*i;Ghl5Et%B4Fe&Ob?QP(K$=i-OicK~anEcQ5-s%Ov8#*4Jv
zVzVx=!u0ea1{Y>MrIjMFVA*eCU$TAB*C6KHgU*E?H_XsWh8MHYzlB>N6wH~}P=&ir
zDi|0wClX(HC?oUJl2yPD$zIwS)bUv?(Z905n=tfbE6*rq*UC@milgyaQeA;R;IU1m
z;9<#vbIuKzQOAq$qObc`v2WZ?tCo?90j(5~;6SF!!b#7zq|L@`U#@;jg`x$jgl=YF
z$&|ePcn|S_c>7@;91J^!Bn9YZVjt0lHjmC0R$=A}=LiV%1G<_1qW^R=<>8J95V3w*
ztdfZLLr0#=yjW3JJ0iw?G7T>`<|!CUW#PRHYPF~K@(Pi(qr^4w)N{9H-_*Av$1QQ3
zNX(E#hr-$hgSBhDe`I>v`yAd<HK1JOB9_oQWPrU2J7DnX@%j?jrdCMn?PxC0On84|
zbN2Z>jq)FD{pLjR+lNd3I@9eAgl7)y6<m0EQP>Y1zbZ>Or?_IZ;jzfnvXk4g@6K^#
z(AMDh2F63pBths?k#@Clzg%quEG}<8U$Y*?)NJza@_lK>oGgPjkpEPP$$x;%FY{aO
zZm)Z5E@_EjCr`)6#9ijgbiL*!?sCD{c$G7AWc`!N!MudHx`ui1&w}fz(hiQ;t%={S
zn#pP(`xK#}RGs&uibb5^<r*F<+7G-vYMriF2npK-_M@)b3=~Zr&nGJ9s(z&)`%^%g
zyx^+j(Ea%w;D-{n!94r`Y)ToVw*5Og>(}lXyVTVker$6{HGtrBEXOhBO@%Ppog=EQ
zxV|abp<?MsehZ_nBdPo2yLby#+whFSSJ$U?^!LO|yo?czJ#o7clw!{Dk!n6TyswWe
zndUeq$gci$kvrUB9zLC-`A=HRyn<e3o+K$lsXl^`ADSdZ`hi($l`4*Smv6;5=N^ey
zQntbC6?VLYNdo7NYE1K5S}m<@H&7-5S07ZU{U7LaS8+kzA{yA@-?uqo7MSrlVIlnS
zlRjrU6Xel8x}C+pd~~bG;d~@C3Z{oQmW^=t$#LF2Xerm6lyan!pkp`!RPy#E^HXK~
zZB)(WgdDhMu*+1y$1i&=3zHX}Q76fFl41?}XYWkFR+Ss!X7U3T#(M8JF3N_Ma&p)f
zH##<x%ab_Bmdw@)3k9PkcpSJ8^ny{rYIxBXjzP4O;MlDy9tC(Qq^r0BikDv;8|G}J
z>VLf6{ap8r&e!1bA1JWjAIg8f2yzuDxl3n=bWbWXl;dHi0Z(LklrEtB4UYwz#h6p2
zFv0D;-_MwguZExV9JBh7ubz_Uc=O8yv~j?2R^sSKA&<zn4h_0P%p&y|x=XIPGPvcL
z%P%u&<z^g#hRQzGcLOzp3bILRtHwl~;HqTUS+<HnZ=L!OF3;G4lQiDE04*)^Flucw
z$}y`1Y%??wzaN^GGBeeh?S&_Q3_7Vi=;Z6@^wc%l;J7L&Fw&T%+9`$!oS*%Uk2jt`
zOZ6Huv4fY2FUp==gnpH2iWa163TCTZR^^Wt+MLO+nu9A5E$7SQya-kS?*aQV=KbCC
zTHS>vujV*jzB=LT=mDq3V?CcS;``HdG-bqX-gGumK&pG+#mBFPzm;!>hX&<uQ)r^X
z95Fkov`o70GVS7~9DcKQsdOkOI`g*HaVmdG7y+3lQIGRju1KQGcBn+QWG(q&9;lWI
zT{p^fazVaAGtp$iu}MLz*>C*_nUG;MNkFS_!M+_$@34RATsWU@4drZdGy^ZrWj-ms
zXkcB9l9ksAguzwT0CjHd#HGw_;>fK$ZcRKDkO{L$uO3#0b}S@)sHy`qsM~aKZU!;)
zMGr20+`9r-O>@EX14B}5!k>@!EKw^KLKmv)P<4}jF|mI(lRkLwVVjaZMXIQYOZ%)p
z_gl=lS(&<Ndv$KBCq$bLtu`KyF4%({&s~{f93STijy}1srplru@k_&5N$O{Ti4>L5
zqBPavL`+i1DqE{cdKs(wY!PbAkzyH9E>x&gk<I-WEx4afQjsBDPm&@P_K8GUqpd@M
zGBj2e<>MC>ed5{13LfPp8QZfY+PD8yf{;NHNstwn;INprzbE)QHfZMiSOX@gwqv8e
zw?D<z;5yg}bTzM8X7{9FEpOL_&o_N|FZz_-e|6Ne6L4NWVBrvk-yuTfU8#Sr$QhM=
zf5mb?7rI_=VqvCewLS<jNdy3!lJduKmK!D-YSJYmpq;TnIpYRWpOqGnf8Dej{&W=)
z*NF|_ddl*!V)yw#;$fFWhCVY2H$+T!0?~Iicxa^{KnB0@d|6H^GTm9HIvD)XJO=2O
zv(is<Ol|u&3_r)#cstJtV_A;raPDUFp6Y(YG`0HN6J>)>#-nhEb#iLx^5<uPypTE=
z?9UHdQe?L*U$@_vY{2*q;j}TQ$1d@=_7~$%U-w4kkfn#f^Ng_RsYmUca?pFvrgpCc
zW8V{yK&7i^tR{DyWiN)4dt4cIsts4>bJy_~GFuyeYT@D!5_@gm)CWWQhp$LGR&kkl
zibowD7LsdP9-g6oH41&?l7j)wdAcv7e0JP=u}%5bza30hNH0&);r2-jMKtC^L`2}U
zS%=jUv;8W=+i<kqV&b?UnbneMCrk-f#m;8p^KEy%G~$gCzTht5Iy``JgJfkSLg*KP
zlz#my4YyG#Yr5RB)L?FWHF~Xv?@K{upehy;JX)SoqYg=fQg*HQyPuMnH7%UAglZmD
zYNqkyv5#6wt&7jv>C?PM7!)TR=;u#<V5bJDo;^WU<M$$6@p2?oAqAlFI~=z!CfE4#
zr?Rte0!QECuT#yLX;ewM@7G%Z27_wq42;c&KIc@?BC%iKNF!IRL_YHLhb0!<w{z6+
zuM1KvKzUNkwgIs`Yr&jMl7WWXG;zLHj%Mbg<zI)E1O@sT&X~)zuXxm&IMMBaT6O51
zWl$D4{JUoAZ=XaSH`^8&g(A88sdC@#yv?|EjMAlfoiqU^kM0%O7`s=;sigw%A@&M{
zdh`;5cbm)g)>u9G^HOV0a!uQ_T;9d6Joa>v7oGjNf~3LW2Jx;&i2#79BmzGo?Sg4f
z{<uX|A9S0-y?3zMo8c9SMgE6`!prO;doWEr)lIGYW&ACvyN0JNpxJt3wyca^<Ts}4
zE^RUd;AjAFK05pIrSsfhk4?Hm_a_2IEtU#~i%Dm%tX@&EkHg+9&)Y;Oe_Ou~VvS4O
zcy6MbSz7!1j`#V<^=dSMNE@j$2vV=XabReb<yUe~BZ54ic+KQQj;2n47WY*_=mT6a
z&lq(<oj~+VkccPwvV9))2%N49Bj_-$^jSxU@wH+}j0IAWnw@fhks@uPPf^2;6|4q7
z9r$Ig4N9LPxO&h4Sz>Bwa=m-)@vbJ*ANO$#?e?~OS4rto2=8=yG&^Y~5)dJxu_#_8
zip0|6zIk&Ld;@XJ#XO1ClS#HRl4?4v*QilWE3jejaas}mlgw5vT@~l6L)6=uO!ww{
zAj%J*4KP&vbD1_UdZ*IXQwubLJMU*u_kQ+RLT0D%#rFiMBPw!^O;Hk2>`)p7--G`M
zPl){1L~7ZBr9>HK8*eQBj22?iovKwlI+H&OkfnNNXjn)`Iq5-Xw=J~t%?VJAq~>Jk
zj1ks*Z2DNSKAO@?h5-Upazg4@z!?o+=X}IXh1mXfEi0|cC}X{*Zs1`NC+we9QWZ2)
zVY{x#G%Y6KxRlq5Y#4vAENBm4dcY{p{+jHNOyc1fl8PV7Gsab?eq!s$St@Ip@C2?N
z-ZeY<wyY&{;WaC8gy##bND&+&w(zUT4RixYa!X_`PI3dn^0hpj{(eGeHgoLRSSdUp
z<Mdw;ltFw1v>v%tz>wzM2;=^r9Ub%)G4EjqP~krz0sw<69fW1~Yyd^=)?&2;yj?Q^
zZ!f?D$J6*4F1sD(x#t>VdS_GXA>?bVZ*#E-g;fwrqVC_WyvP1oyPJa=`ZQn=L+kym
zr9{`Z!xx7Dm*{w+4-0VpLd%wMn$x0lLAAVnlVGd(9T@|<QJ>CT3W5dslZ$s*(HndW
znjEWO&Nlw_PjwwMNQQaD@u&*flo}_i3jL1G_>xBpXSo(CsR594M*L%b>pv=#+jupm
z*!oI-c2q&SoQHDTYh#)mphaK2L|dNZzXEsMZP)!O!0J>*yX9Q7iTK$4&?31?t6R&u
zHng^(@w{~hxDaYF@W=`Ec-{xOS`|O1n6C1ztNl7mqB9ywW-=O2MFQwhOhyT;u6wdb
zn41RD)3%HE%^J;J;nX*DS+4J@Q#GR2AJQ`@-{UkP-x`fQ*2=GZ9G^BBNv*w7>vV@!
zsD~H)NAv-c@BfWM)of|2yFaB-CV7NXmXiTrj;6H^J+gevIAf&P0`9AX4DUXP7rL`Z
zKhS?x)Y`!_VMj&A2;?l%dEX{?wV-+1Q-Gnw|L=6Db=idsAoBC<4};G+kl&3xrs2iV
zEJ|Uj9&faZM&VW%4O2lFO2wk96~J6Rq`@nprklYix%MbdY>4%b11KG2^2fRZh+!L%
z$GyJRuS0LAFxX~08m|)}i&N@CFAV1gI0S;?8=2@_eIA$6JA%{gvG{Y~Zx1BT4@LrM
zeu8gvQKtZHh9+`bF~}d3U6f>qjx?44h0NVI67%^2;-Ap&v#p2%W29$X;YyR7)XT~{
zQw`Twk>@$CN@bf|m$vse(d}mIKb36f&gSJ+y&#Z|gkedcgqNs;g|n@9y$ms27PyYI
zO8UvjyH&YyHL7!<xJ11KpariPr)gkuHRR=pF@C`7%L`3}djOl00{+sRHTkRkZMFSg
zq378(;ASuk+bL0s{F@_R$_evuuwI|i^J(?$P|n`hY~6}N`vWjvVSO^Z!Qti!M)&nT
z-NbsOh5bwy?P-=Ut+B6Ls^a(iTiOhsZmVv>hLTZw%mOE8LtV=aHiO%!(Q>-FhhxaP
zRU2G)nI`f$t=9pHo}>lF4WUK2h}}VDe7_R{De2^w_-|U+ImDOL4EBHbWC{S+dY04l
zAfQ89LzBZhW_P<9USqB)3^U{cr7dpgvLDjN%SWjb60K}4j~4NF!f~_cix#Q8fy<0l
zQ|~2!`c-er)!lE*3sZ_Exg5uT0V!Yvl!B4YXUIbJuEXA^;o?v!!-ra1hiS7w@)SQa
zAyd=i%c~DI*Hb1n%p+68_Ey*L#Fj($e<`I<a8^WM8+stR*>Q3jxv45XjTdn9bU0zF
zOm}E;!*YhVyY`{_)}cT$q2z}!mg(X@G6cz(OIN(P2^%e*>pIz5-K@(mA1$8WeRAWb
zO{i-Q*U=<1w=suIv{hB~|2*rON@fd;=S<+-gJKkPCD+w+xc7BmFqQlOSEzDJTM-D#
zj?LxH$B&QTR+`I5KxM}y8kaT~HYoRyP?9&vH!s-T`s8}`W;)&s^$uQQ&4k5d6xO!s
zoxBJuT<Zy6PpFPeExR9Ia1#rRNs1t~b}T}1*32lF6h`AxH{(vbOHMCZQtxy{-tUP|
ztcpL--VLsa)&vAUI4Zue?*RkDZ!|4P*@j)cSF6KuE23%xel7=Va2+jvz6?I+**+j;
z6}1iZZkE<pq{&KUU#7ork(15?!0lxerl+lNRHHH;7k_bAQ5Ag;CJvyh+$8rct?c@;
z#r~G#hY>AHcD(JtFxgOAmU6x+CjUXB?=Uy8fEw9fU+3g`d8%X~=49(FoS~QV4g#v2
z5z6yp*<5F+bqX=-g84`SVHZ&{0Ev}e+i5y0WW>(O*MPZjdm7@omQw5iq8SI_c$vSm
zNS3xm+bF-?J-HQFo^Bu(=IX9JJEO6tvc(nQ`v9w^W&iOtX|P+cB7jaB55a`n?xOvc
zq%+OG1RyF+u$0Id*kKMIoVdXt6YjlwdiY4^Y-IAe-=hw78%E!B=Z`(>4UckZ4q!4-
z!<uZQT7qg_ib1DDc`#F}!Gw;`OOPsFN(gQJz%S8M&arV1W%-9xn&jRn>>uQ2+*VgZ
z=QqoI{c-J3OVKa;@qp5C{YFY^jL9_iiUqmZwF;~j7NT=Hzn0L=KIILyhDmI?r_*6?
zWAOM+2H08(h$WNS2IqI-%p9%AeU)ddx^MO+(|Z}Y)#Cw0OQ;D${1K3zoANHI5x!>8
zwCd3Fy?TLm$;>cf*Mxi1jC+&QY~f8~Hk!=WuHx-Nk^htYb~T{wgZVeJefE5MC{Pfr
ze>OQ52&|EPf@l8CIfgRQWw;Z0c&u1siz5n*6HTNvjT|2c0XhgDo504)R1N*6vs_j|
zfc}ca+G+PzxN5R(<+OK8+XEE=X6e3)Q{0y-Bt4)4K1KuLQy35}YFu<_Hmvubb*ibc
zJC)+&uFoKlxLU%%UM)kiKA&0G$oDkf7a5W_@RnQg^QHTz!*7ZPmxjgU+w(?INP#uW
zSRkEwz-wyc{Q|x|iyROJpR>c`Xy4f0zeG~!_kR&dI*n=!-q(PG5J!6XK;Qp?BvBsV
z^ZmDuGz0-95Sw0(Sp&S%Css;Bx+*Gw=HifDEH${PdeiRn)?mzw_Q4bOh*}9X5RG)L
zMkDRk-+JMmFh76v7EX~*)9Q1fo#UC}-jB3(5nKQfJH9`?k#Jm^W$@VBxoCU!x+pv>
z$y!2k;P)#ErgvYt+QojodPkgrlRzkH(`KcxZDp%c78k@CNrk6c&wWMx<^7*2pOPH;
zO_Ix)6Q$VM;kd_5G{!(goRQ=~qj5l^`xY^;-fwHoBE)iBPOk0xJGf2P%)cW8Pyv9C
zSMZ|E&!Xm7)vW!E?<RW$On{r=TRp>pi$IK<;4Tdt)*6Xquo?|(sD=g;xZ)2Pk$OVz
z2WR%c;Y?&>-k4LN>M0!^-L$YWEgrq;V(lCmVNP9nB3{W3MKeLenlNeb5TL3bsdGk{
z-re3fXbU&~rZ#5#okow?*ki-taBV95s~IlE7Lal1kv;Hn?zTVVY25egzg!-ERneEf
z6C?%CUpHE7`?t~~(1IptZ3+a<SF3iVfLH<Jc-1;CAKGFyW7&=9^tA>G_}f2Z8h0@N
zdHZyhg+R!B7wkzo)2Ffq)pGxA3``&A?VGTs?&QoaSxHkqVr79UEJBA)7GS0K&{xIB
ze_5OIB?Vjfq5T+H`{&OTSele?9{b#rCc3%B4@uCC7W*~tC#dlm0f}Xk1Qd(aXkB*u
zsZ_QpF-}5IMCPq9G==``%b_?1oF*^lu*ViOnmiee=X=6R^Lb0oWDZOLI>g4^6S&Lm
zko__Z?v~#Xo6Ljhp^I%)O_DIQ=9AM|@UgC<72r5lsN6KxXaBGu<@}2TatH-i%G7MQ
z1iu959?;tuYIdTjEEloN1Xad=NC-GvXhJ@%RhCZ*ys%9mW=U1;0aY!TMrP#2>1X*-
z)Ts)V&X=nS0{Or1VW7q&ip_*uJt8VI_T=eMEXhw(n}#qcP#%uW7jcz-X*d{7BBlaz
zT>i$%3pjVhn!I2c;&lM-M$)1ueGvpo<&ed|-J$LXD6WuCvd&r@r$QF`rq)3I1WfNE
zlY!X1k}<|3N{0z#kU0>F{;IJR#W?d2x%kLD>qUY=+THg=EXoN-6BfJH*<+GFDh(U&
zww(=s0lgp<5qV7>r5E~74--V6wV0j_t~H^P2Kuc`2Ge}004&q<SZ=7onx!HFN?1+u
z1Cx~R1St=8a0uO5ri}r&QZwP*-SMU$&uGc(iz&{Oct>8<a9~|4PVDlpj3zuc9;#{2
z{mo6%JRUc*mQ<gGi@T1!jyB$~;$J29-b|QBh+em}XlPiM0jA809^td~V3d~pU$svc
zPWuTsw3-PRC%<SUg^!e;>6sLBty7N{@duBPTWC8wmr6#=xT{b>%X!}`!I+&=lB5!D
zB|R%Qy5TNri~Vh;z5tPvs`iR9|LE%~^3uySjp&i>25%LMOVe$YqAk|W5^67F;ZR%z
zf%y=nMx5Y}UyjGbZs}`#L%HGhEMU%!f3{NDv5MNvU0GeI=dR3bgzHa-9#r*&Myo~2
zv_WDB#+R5MPo-uJV&V-_Q((K=@-PqxK2{Cq1jmH+c%_H5dqC$~-@+hVGBH8gQi0X8
z#Oi{ibBFV9+URb;?3f8bdH5a9O6~s)ODOd~sr*62jQEMQk_zFuvDMX`UA1Ht%P?yK
z@<VC-ccdH8?^zcuat+Jqkl(qw244HcfdP4FX*KR3bYIGHJNOHNILF)XiREL9`^m_X
z>j?Nl-k^At+0g3bu|<322JV#<Y+H-J1CRG<)UO#-5}Ecpn{avJbk{6y!>zaf?jzXR
zY~T9MUUk-7TnMSg;)9TnF&(+P-OEoN>QiT93vHG;Sukc_t@|DQ*{S0d6GG691ENiZ
ze7!cA&symnQ94;K1Q}W`XOjYZy~n(2^9~x3z`>I#e;25IPRDbE_Iuiq1%9hHl4JD7
zLNZ}=9eTQ6Z@o{-0oCedmn*X4^a$@vFw<lQ{(G|$kF(Y)z$@SGS>a`Uzag60m}At{
z5h4mxyOKwa(QCr8j#$92n0*{A`#NDMc*+p)fEH=%MC@gAuJ~sU%;Q2vLQ=m}94AOT
zKdtwo+8RHq$&)}{(5i(a`wT*`vW7n!d&Q5?B-yqgeFv>o?W&(9=%}JSOslLFBAy{{
z2<cWfSg98>6Qa?6rMmD99B8>OTJYPa^kyy$H`Yt#R$Rv}%SZS|?gzmiDLcKp|2k7k
z`@hKa5E$_ilCflSOE>9KrVMqqCvA-^9vY)iCXP>}nir`!v28EHd$m;JT@j6ep6G>k
z2F8<S%ZMjPtxZ=5)(M0rc;5<k@V&o7?3W38d+#;wpc;Lw1Mnf-%uF74TOK+MvBmH$
zN}`DwTxI@?T&YH{hPdJPEkWCCc*W2cA-E#F@B$-vu7gC}I9mnu<&5gzkqtctC1<(n
z+cY{*i}(W2zf-k%Z;U6jGJomu_50kW6aQu&hJNgSDcLZfP@-Cfz6mTRJ=ws`y%~8_
zP6DFfHCH20@lK9PI$J#>Dw(Q~*q%Tu{ic6i)&|OHV98FfCC|;}xl(woXpOl3!=_2;
zPk_+w*vqnf-q_Ii?(nseljwR^2@E2k*C7d$Mjf6VjY5_b#xA3nbR!wCm?jHfGIEA{
zaUF?p8hO9y;0%E^Q@z<lhAmz*`#T7tww+Ol(%cfASt09fLC;#391gJm^L!wjE@x!D
zdwP}tZWvZb#M2}5jr{V^PGQ@Q(RCNV>85Y~2MD^-{tzEcK$YR2GsBj;hJ!9&6yX<u
zG<TKec%HNIn$tSn0feI~VBMly>xJe(8;@A%f#OpL`6cH19b4dSd$rklEnu=d#an~t
z#_PEA$l-eB>wR6Vh8kg>Z=#(~EF=KU*eO7)7UTwmz9m7`9S4zh8E>S|E}`!c=InOO
ztxN=bo#{h1!7OMUR9D0kFh+t54mYesMVJRn8~)go79!*U-c>WhGS99+B=Iyw<0IPi
z01%Oa2h>_*uok47-0ShUNs)qrqM5?JLrh#QH}5scTj*p4X_}MR&37U-mAXYSk1`<G
zmsIB%Ap}F2TG{`OidayPNw1mw&4TI8JRQ&lYl|)b6p;@Y{<I5Bl=)Cs!WJduZf)ZF
zKq4S$35xvPQ$DIqn4~x<`vt2@w0Qd@W;jY?dd?(|ybn5N{8KFUN<Sg)r(ck>dx3dM
z1F+n2ao^Ir4YOSX>Ff*IAzVGk@de40)sa~f=Ov;|I8L?!|8Nv~vb@1C1FGQ8Eer?Q
z!^<LAuehBw2RZ_yPg)RxHy=AKs78r8&0+xU!rj9WOhQ6w2}I7Ff(y=UICrD<s4;VJ
zbElIpb<)LnQb{rNBiJjPOi+q}T~<CQSL>+Ite_I28dK6Q_=m??y>L%3>=8NH2RYbh
za%9P;TC~l1lp+?ZF-G#H{|@`xQAU=`1)48pMNu%t8rkU$!lD;B1h#tdtzREFwgNh5
zhLmiqi4$C_R&BNpZj>39I1<Hkt;WzW7Gw?EWrSXZ(&XYJzLe42BtSTV@6bhc`^ug%
zs8XRO|4YhWW(U6eA5bXm0g{XHazKG#YYqSkq5c;rq~0L`?+&@b&wEQjp++ER^xj0U
za{I>lXLwr5edW%nP=A^Rjb{C9(W;DHqgwXo;u^bDI-r~VH@S2I8ti@>vM4kT0)h-V
z+?r8kaIJ}F+(WoKkT1IXR^Wp}>JhHM#^|wGe;;TwTW{NQ5a@#g8yj0-b8H`>C?!r7
z?hrn3T!Swr1OlSgPsy>o*2m)SRW&Czwn)&_NF@HDVw}jT0wN1i5h|!G6lcoh@w{vu
z0*al710}$1RdPolJ2;W|PtG_6M|_i~#*=%@vw6utPYiq)42)j~UNk(=D@edUipo0a
zF@Q*-sm)8NI064vR1}YNHk(MJW`c5f`Uazo{F`2UP~(;l{gwncdf>|Y{Wb`f?f_S%
zre}={<ZR||JZIncD6S*!fXhTx>DTPPv{2vi!CQG;yQ`eZb;kO0M-xl^bu^L^mRk&?
zDNGzst$+@|780KPY#*tyPgL~;>cwSLLK~X-ZMkEn4=OH<jTmZfC4?%`*u*{SG>HVl
z(l_BQ3skDw(++EBPhcU+6U%u0=z!%*ML=R&@Q*FTteh>v_g6&tteD7ASmGd3ekdi_
z3U*GS9cp++<XuwST!a)~_<cIuODMiQT*8`!*}R&ih!09L%Okj8)ce5>@Xj2~NLS>g
ztUs#-h-9P9{P6Z_v%jZdjFvAEeO?`X%#cRbo&lrN-6P|XAxwoZS#7@f)q7v65_>up
zCyf}rOSX+&KAJs-D6Rr@CJIiH4+yt2O6cP*q<&-a!99F7@O0zE8uJs?HP*y(bonk%
zzvIn0d!&<Mgh^8t4{U{#oDK@;sg0A6DjSqr;oo+5+XGe6`vOwU(|{8j?363EDSYwA
zGpksZ1V?ITiEf8W!Tk?w?E3yJ{!WjhhqZtDw#iKNfGN!xNJ6)e&4lr*J`y0)UdV%l
zXUqDIvj8iBzCwmfd$}!*&bMp~r7eN_xSt`#u-$K4XMqfxhvUUZ=^HymZ8Emk<UF3+
zX~{wVB@6{9AjgQkL=DD@EwECh5<aWuqO9&=W*eVK{dt+%WysX*6P=VWY!rQnR8&Kk
zh6uU+TaFM}^qRM2Tr<B+ph0V$Uj<~)2t}z(krcJ>2>BsN<_(o>y5;q|DKU%s;0gC7
z_Oq4N&%Hh>wymDK1c0vn4`Aq0gnKLr1hU)?2(>*uS4Rdla<$B=@0{XO^MAt*omN4;
zq(KfY?7_-Lv#nh0pv}&cdo6O8@g(#Wvay87DY${>Vr{RYb<EE1E6z6PDbt0Ckrr=-
z>(b+AjOzago6ctK2OFoC+F+m&{Z&sRt;bf)3uBfMR3EdZ+hBjwH!IGZ5HWqDbzQj4
z#9E-Xw^blfm&0PxQ?NYbVPe;d)k4Ss(IYsfmpiqU_r<vt-R2qkW8d36i^ny^6}zu1
zQtJ;BZFEDs2vkU~tbXIER4`{_fkw%dc;AhH#^AR`b>ySAjbX$z#4RO7uMgvTB*|Sk
zl@<8$1^wHhXv^XsqLJkwM?MpL9xaz}rZl#P>ZyF2!0}&nsw06vs`x6{#*S8qh5M*$
zC1Ae<h7|YzET@ozPsEI2vWmHckGnPG&29}<qV;R;qBB;}z-J7!{2+w9UjhZkD;J7C
zl3+zbdXo!koI0-6{7sObd^@|!Mb1^liP&u^C_AdP5=M8sVhUh~EY$(*5LDm)V285S
z{$YnWjNO?*0qjtBAe+V6o4SE^Ob(Swv9N-L$FqNc#oP1JL`ot^8}*hrv%YY@ezSvn
z_>)}K(1>8A`JYv`h?7}LL(oD(kBw0gc?LDd&}d^q4>;&%tfWZ4Gf_`~H&oquNpGM=
ztd=(PZHQelG4!zhNPPi&p5oLwn{6#zDoe>NTP<Bb^8sHiE;FUsETbM2sx4{cdiQ5w
z*k5~{d_C|~k@J7ShEjU|CpHv@0qcG6-_%f?4dzsoq(GphB+Rr^_>3+C6PwP&K$Ldh
z#^<em+yYgo9>T!?M21dv^%l!G)xTsV9oiEcIb$euXE7LmcI%(`Wr<@xf&b(epxNjl
z$q&N`Q0s!?+gUUZhEdPAXlOWVS0Svb@+Re*tp30+m{(jW4-M4ROXqxhe0&;$fJo;n
zVQU=fv$#E6nzLX5J~qa%<52t87Mf|-<tEW7L*v^1T@V&`2oam_!x&+kWxZbA?%lO8
z5JI<TVOS>hxYlKUK9ix-L+rPNUIUh&1Ws$SiCr7_yD}+Me6_2QnFyz)%xhk&rD%3W
zW<!-Ba}aF7%A2Ga*gHn6bs-qhCjXh~f^w*|vIu-LWK%78Y43~1Z}o=Q<~P)HD!QHH
zIEU9Qd$ZGet(}%uQx(0YbSIqT50e}IW-G!6YA0TPadSfIb;jGGp;v^>VT=`>h>Y0X
zdc?5HoW%deg)T(vlt>oC0ntnHJ6L-BCnMcD&?=txOgs*|LL{LepUeV!2C2Wc?SGl3
zZ(77MhUkATvk!M##ti;>eO;F!o6tv1ewQ{WPPVhIvQxoFrXrSu8<Z7UvdDo5KV6F?
z1NE^?mkE6uat)LpYZzKwG$Q^UpEtIhK5mwqMvVxza&}Q`RgGlw^%sGUOUtTGG2oEs
z;L;iMj}11f3mxW&N6EQTuH)bX0`#QbNcGfw1~9IxZ(Nhln|RavA+@XN7GPG1H!zM3
zB1W8RrHVMEZAbfyh$m7kgXf4Ex$+LAT2(1r=bVGNUq98GxRTPUM^gNa6Z>T$2a9P)
z*+J|j34KtE)fO%xlBiqooufYE7qBQ=R$_-YL7!oX)O-Iou+Rv{(f<uB^yf<!01G`v
z{6E4%eM-@-Sml88QTU%KiB)-JR_4hZ5Zkw`i_j>d9_t9Ue+N%H_O~Z<)e2)xD!}*h
zAn~J8OYjGCd!~nhzfJQP4@!HCdiba3d~=O%<F4_}c38DwG#*otD;~C>Y+@rt@vy}3
zb>m|=nI_HWBbt#3VB2@h8u*c#F}s*WpE2#022f&}_JckA0+|I7+iB2bfvjfwmrjYJ
zBtwQS5)-J@Sp0+w;KGL>6^6%MY5uZUSnvIbp~80Y&FP3PpxEMMLo$Pf9>&@>YA&-Z
zz*A!LKJ5X;@C%|N-XCpI-M0_1_Wld_%Um_w`!~5x^=Fwg%(>UDmC@H8AXfJ1Ei9rt
zh_`j3qY%n1y=eTj_n4^1URZ245s6#~y`D#Q#1lt6@sl@>z0IC&F!S|(PMYC${!ea7
zN9x~{ZO@?hYJ25>WFZSQ3bh@3RZCJEZQS0fUB#gig1S=sSz40=*kvP0nd6Py{|~s3
zRrLSBh3+)|{s$M@+iq_}{TCNXDFVC2#EZX`!M3Q%74@zGMcsf_&l-tIG{A@oz(WzX
zsGpKI9oyO%IFRQ$B}IPi+UOY&@GaLc9W$$P4sBBs3lv`3L+sqcA-K8jGI;$_53}df
z0+T~0%{JWLB%C~HOcWZiI+u5PK?H#pa2Zpq?@+D;-l5g!wYxb7p-TG;?aV4Lb11bu
zBDeFLKefqw+1e$`jUqov<i_`4aMgt+_!;yoqRnbbH4Fy#(;Y*yZmQE^{{$!q@ZIqz
z-na%IB0+Qk??`uk+9Rh2|3?{0CJqHCLzz}a+<wspADoMwhEenQ+k^6fl2_*TVbBTt
z_mebucUOw{&|{YtjY&aZo$7o}5{TmFL~Y%&N`!n#H(U-aWT0b;!sdPBw>gPKrn4FC
z&)2;TSjMs>w4o#NcJ^qF??Uk{llyW_I#h%>C0z+F#NSKRy#*(WBF^Q>{ujaJ>!^pp
zA|0_Ap5`}7%<c#-*!A$lk%7uM?}scZb;0-VcOe3<7QsC&SBSLJ>3(ydz1)MW7D{fm
zDnb`3W5b50)tq%j?xgtJo!$a^r&|Wte||e|luK@!U5zxHFsxU-9ozfLBFvpQp5__z
zSchT!^<3_f=Mz|s#|WBKC)aFXFG5BSzx@i5%MkckS^gxe6DKBt*oNSc`2{81)4v+L
z>YSY2G9WuZH)&{J6IV6*ryZ6+S~mldSC3D%u7^WlejP3_XL6Axp{J|)nlsMFc{-Jc
z_htFk?3cILGXe?%<`$!&Pz95_52LLgP&l#TZW6U*X~Nc)Nn(|tWmQ5@KoH;@?JK_f
z_s)YwqfxN%pt%yQOWoCI4HH(=W|b;lCFFYk6-+v_AjwIdr1E}5%X~$xLL}Nm*i94?
z0WBai6hdSE?=5Mb=Pq>qD23#fWJ@L5@-kO;T?10O9}8Sn56W;M5m_DE3cnFcwv5n8
z*W13EHlxss_1nqP@!uZP_<k@=l=Rv&{VJt$1%HOJW|!yi)Mpm)7za+{$RmbBI=$&6
zeDNY!`d<aEA>^Zf8aq0@`$cpKFk=wT*C$<kykOUycyz`f(<xITwKuWV#z&iBgCl(`
zTz8iUaFxg)jr9*M@{(d9{E&TE)bI%in%j!jWzSI>zcIwE6oN>xlq$N|E|pEa)8gb;
zv=Td_$_h^6Ef9nJueJ>8<q^%32TDMv^7Y|Q{z;wZ8t_A}ZD?aTDJl257_TcZ8-UA~
z7<LF^PLy3Bd3j}32WY^fc|Gz<1a?IvKglR{N?az8*F7vzbjDSmTf7*-irF1db_+vK
zT{%49A$ys&C1+9IVe&C`bFRlW_SBrQ6$w?z_p$i@zD|YZrQbSkJjV@&kKnLx1xjOY
ze}~Z7Hh+)f9cg9G41*?<ja6}|M`=i6y2Y<;Mu~c>Wm6~ijzj`xv@%>BW!y=Hi~C`9
zxUXBY7gi5D!;9TmX*Fe>nR+!V0a^4B7Gm4K>Ddm-vBV^lanVz?KV67`$Gji#PflHb
zd1bnh%gww)V{5p9UhFcHQv#`5B4hDX3ZEy|3T+`gT3R^5i~sAVVtX&FgzL{fTS60^
z=*;`u;*`8?pUCbwISZNahL0-6fu!TaT{sHIQqAn@lbFFV)|VaF6L^qPQ8on}Vujex
z4V?q|oT|XkKOKlW5X8Lt#0gM9Ib@!HYdfM#FOeGox2`UAy%K*E{VS|c_EoI1rr_T6
z3e=|pFDGztdYhU%p`<t6+!MEoKIMAAJB2K%!KRHvn`PXew$^s<oS0(Cac8Iy{qHVI
zv)n_^K42Q*Jp)gyy?d@P$1Cnj2xVY@Q8=LGZYyG$<WI5I@vyx!L42CGsUF*FoE3{5
z#&!R{K!@%iApb>&$aMZs=n(cDsVH)Y#BqmF&JA7f9LRtOk{oZZ)-xd%ZJBCuUDi!x
zl3FTSjt5m3&55Iy1<gQBht6Cxv(UeRcJgf(c1iXRHwrZ@L8I=-0@N+)t-f!9S&0PE
z(x9E_(+*jS|Ag9Uw_}+Q!v?foo9y?8haS5F^fvbEE2I&?_LyLUzqCe=;<<eZqM_6y
zA}Q;g5SGWe-Mg^;J`KSZ44H6J>95p>hX!R!dgK!F=%S);G<9d}3Ol5t`>800{{GO1
zD~E71!!;vAr;aD<czTEslVrin*5_2lix)|P9*q&S%Ko455KO@Tzwi)q;p@0+*?{ob
zdJzH)m<3PoA8H%phJ9Dhr|$fAT&4pRyeow`*+;l1xJUR0IL)!-%ruXKzF@`hErkh#
z@ZYTFRN!HZzcQDGNBe%shg((i!4(Fi4QkmLMyNiF%*NI2ogwq#NSz<iSn$n%fJBLt
z{AR#V`Y0PrEB~dwVqfN={d1wz&TToVWu*rupvi$*c`!v71z7)hhiddzY!CcRv)va2
z#QQ&SPI{DJguKOIxs7&=&bY4z(RAh>ofogik552sE}EufS(Z<y^A(W14CSYr22dW6
zP0LKI4ZNT8cWI5L%S$%dI__TOxBJhuaLcQ);C{sV{~6c9PQh=z-i{B+l*(PSO<v;f
z1h<>=^79P=#(j{bg+Ri?4e^}ej0+qAY}kDc;5~m}#I5^*342~XuPa=zagK7nW^aQk
ziQ>bzGcKXt8P3in+X|2e-|Mw(djWa(u>AUE^>8CI-@-vZU6akf+47|S*-is}HV1X!
z^V<6U1;>N<{8jd!JbZLFLuJ?d$ma*B3blsk-SSW?1CeUFwQ@9z<N5M2;b{LJ)bkfv
zn6>K{Lj_07J_jp*;GB;w2q5QVv!Slb|K<98+N|0FblcY+J^t)Y*-_6}1px<pnL8IC
zlvPD$tXPnTv@C3W+*D+yh-)~REZ7>g%uavg&7K@(CY(s)-kzZcuKQ|+f$%QnX%)LG
zRI}3M2Hp=Z$k&F`miK`P49WX_Pybqg!i@@EE)($FW=f?G7V^y45svS2rqSK?J*6&+
z0{)S{CdY36!<MOzwMWJSUo*_BQa1b6$zq<yth{6tJ|5oCwqj=cR7lAVImb@E%dSCy
zH=e>Io?oRh=6NrReo`0_zW{E!e_JaaZpx>hVa#a~wm<(lWMe44cugLvb9E_(Z+U4f
zc?qNzZLroRn^vzk8s}l7GLbWs96QcapV~m9xIi#+1JaB*vPk(in<aR3KWf!j@vfzO
z-$bN^`{N72YTR+V8VvSl@|iSo5s?2$S=FM`?y}k6hp|v~_D%;nEq17&oc=nj%>Z-G
z`T*Fs7Y)FY*U)>)=KFQt_VBFlz^rdMcm|uZ2O8O8SI?_ZIhB-`f4Bzq8A-no6?zU%
z3V}RBSW?mYf0AWH4*0jhhH=nHklv7mlh?;i62><qFC-^JPs6k43|-8!tlBT<*{JT1
z5rSyJp<QsXMgtYe=;uXrkEp*Q^-kH?WiqHvmJaXR^_?Y<jf9~u;QU!g&B89DsosSx
zC#hLzeKsqLAF28vAPjdbz3zbdk}7$AHoqAgIKLWsEzHy|RKjuofm3SSq*C;pnIv%|
zhEWBS6a|`~Js9f$?T6`q@@GCG2iH<Qh-XoARuvR@hj00Vc^LiCUh{j$owxHuP=KOI
zAPW8X^~0AqC7$`3u%L7>=lG;V--f?OPOw5hyr~)jcvxJ*OL1^`bP*7!g}H8T-qPxH
zlOjZi!NsQ5h9#ep-SiTa&jwF}Vc=KS_(+z?@I1$P{sIq^|2{};zj=Ktq1c4kX1>;7
zyLwrjzv@%rWXx&VpUSD5(Gt;OYP~FBMI@UurC@f2Z*zW4dw)3}GHq6~rQeB}n?)oA
z+l$$~e#098Nz8ksLBN6W{RV8MqMPDZB2o2$hvnFnQLx1*P|zL3v7dOC<&xi+!cqE!
zwkeze`Xf9t(Xae9w4aT)qtQbMDZr#bHF_Ey?Ji&9&way%lu$CZQ3BKQEiZ0>hK*wU
zQ04HYLMhQmkyG8%h0-rluAvJm6jDt!`5w(;Rs@0P$t8q+vCZr4V8&;PWAiK9k;z&w
zBk!yJTmxP+uRFWTjG7H4egTt^jC;)9B_y_LKyx{P^6P5(nw_uvO1GuiWlxgvC_7WL
z^MN-CB(C}ERMhn=FQ6l7@MlNIQue3&Z}&6-wWF2k?-`j@_sQP6P^VxyOLe-E_W;bk
zBN%$#T1F-<atyUamDd36$6$JFO+gkVD9a7@+#b70RDYre1SrHX>sWnR;^;=t=38m?
zAH8%Usl<3WPWH2sFoZyE2f^k*#ud0HU83*!L;j2&OH<i)HN1o{@r4DPuW|H~BRX4{
zY<zkItTJ5|feh&iv|GRQSOeo$QUm}!7691}nAAubYJ`Z6y|&d}vdrjChjN>jHCIJ8
zAJw3f-zD#gqBcg_mDmhZUv7`{yzNjR6u<hsmW)7;qrDifMhW+doT{m1*YD1oOPAy#
zp(f^XQCuQSRd^lGb~a5At$nfqiXwq21xkR%VkEW$*5Ut}<n^5EeAyf;K-gf3dkB%N
z*~GLZ*yVW8^k9kNKItf`S^fgU*5dbupC&K%pD)?aDhclYvMGQ8_nAl*l0s9K${nJ#
z2nU&;9Sv2FBjFgHU7O^tXvVLvx*op<xL8gpa{AhSvQ_q9Q3jQTlZ~zQG?CXF=on?6
zaPmbfV>@617iO%3C^qb)*ntglDg41c8*e6;y%_on`PY)`7LEz4`V@%%+UZWbfsgle
z$-e;^+4<AK1W`2{m1Zz}!5a$-c5!UZAo>a<R1}>!!bb}=ZOy{M9{g?*n>LG|wViEQ
z=^jK0@29qa##KvfHXC0D4DwJ~E8v#}m0Gs~WXG;$AU@OsbS?98MGWi2zT{z%B~p>A
z0EI80<CH926gh<milB&*lbCc~c{qq(`bwNjV=l_jQjWh5EL_VvQ!AGj7+s>esrZM4
z&tIsg-~(xHx!GTn<sLm8LK<^PT9Q>{TP<i*3YGcCL`Ij9X(Mhq6H}9^E6D^#ugIJ$
z#^4rhB@8Xk-%+6f!wrDamq7jabz5X)Z8<)#744?B=Sv_bGs*&?ca{$W<};n!aB{1T
zF_7R|g8(5HFs%O5YRlgfK5nL|4+Pqdc*CKQ6%Rzd+dVeXPHZoN3(g8sF528>pYBZ%
z5$1_*Ag(nG0UkR7Jt3p+0*tk%m$B(jZ65EeGSJ50Ci_@6fQaz9xRUw9y~4d7{i5PW
z>|_b-l$`PY3-3i=GycvY_Id*IT<En~@m~n(V=V`!KvNQUlN%9x29it2FC|I6#Y9)~
z`uV{(n>x>})>C@DFk`a<<z>!)L#mqzB1ag%B=V=3bge+vq0KDfGAR5OvweO&^L3^g
z8|?r-`XTFK?-9`=h0A>I562<BwK?g4%$nbi)t^y+Lz4y_CtA-}$)4AaqY189x7v5Q
ze5-C(p=ejCd~Q=7X?!_c22PI1ZKzaJeriAl&k)L@fqW-Q<f6n)aZh|%5faKQXY#c6
z*~tgWO&ma=s3Qx$)(QWn+MbL-Lk1Zy+x$Q1dWQhXqAl7sZQHhO+qP}nwr$(Cot3t2
zRHf~o_ult=?=>QN(TT=BXRkHq7?um9^!rXtpU|3;RfL(HO>Blz&)b^p1~O*f8Q2Wu
z&gna4u#+uD%w(o+tlUA0nQ7Y@nw<}w*lCva+AOrr;NK8qrh}k8nh#B9D7!U*nhq%T
zD<jk5>u|))h@jclFRTUEZyJ8F75=+z`h1=XHT^K^Gwc4`Z*Itn)D+_kaP5Focb34l
z15Sz_dCj;QHv<oCAlP0AX3%NX8by@$bwIhc*tSZAe|b(jDHq<Xpw)`o`%KtGQ|YGf
z#ZoSqaBX#R@hHz(76QVk_%w4~AiM<at%$6^zU11J6;Fk2H3vr*9<(EPI`3iXOn%>u
zAWFp&xc!6mh6l*uUp6Ff&j!e?s%VM81*&p{dGBTA;Sw&5o{WE=u6cd{?l2;@8;e&@
z|4XfnRXSe(BE(-NU4ie4R68c8mrG$S97C!tY5z)nZa^)hNkGq((ceLbTF86`DPAAE
zl@?ctkL<G55+E*;v{Vow9)}>1M$V^IkTs?(q={hBM@GL47|C9v*ZRAMB2hnGGtdLS
zOZH`^3l(_kzd|p)BO1ua4-b(Lc_mvt4f4KxVLO~usO8njo0=ooN_94nw253>=ZbST
zBUI(+L1}CDKs&hw4MsF-_p;KH5eG?YN5}8{_89XkUiw9jrHtpHLCpms3xt_y8_8NF
zdO8(&l`OQc6u~mHEX8n1mfNizduk~(;)!j#`(b3`P$F%l)0<1D-dcaXL#0qzw55~I
zYE_?E%Otqqy_ev<-k~%_+#<d->YZ>!#?rtIj<5r(8-1|mA8sO2%?~kY#-$ta8;>xT
zxin%3u8;*v{qA)dy5G`Ai^^Mfo4tPFc!nE{htr&g1HcZyu1pg$`0>}@Qz_5;Dh%&5
z5O{Pvd@L4!=SY1A`lSHO(Yx>o3qM!e(+M}9#NupO8O(!4xA&Y*|F$b0`&U^O+I&61
zz5Fq{ZZTkof(JZXHK_4kakqg^blKLzV?o@NEUV%(4@9#*7VodE+a*9RtI(wF?S>^+
zUow^!C8C%(f~emPEw7$W6Tpt;2OXz1!kte0ivJVl8S4Gl_?ClYA}25WSwz*7V0R_8
z)VcbGcf_XD@wr1WaWTcL(9%}0q)I#7-|e+M@4_==YKCXCc<;yYEbsBUmz#MD9%sGt
zYyjK)#l-F3W}pAh>=(iFt<tRvM+2&B%sFBb$^AcqnyPW^Pz6D}4)l!dDTy)_G~Zs|
zyKj<9JSo_bkB=vT*pCUOL3}@TnzOhNv(|xSRo8nzAbmz+vAEHh1)$~lCIm92lNhz>
zoG`&U5WwToTNJMW>?RbL3EYxOb>s^J#gn1-zvFYAXY#ChGpu+to|XZO#ah|i-QOE$
z8bScL6&ctWk4z-7@pVm*FmN}qd8fWdcf7Z&K$?yo2Fj~Iu*1gc;2r}($@?ChYQ7Cx
z9XE0zqhf!4qSVa_<HSWDkP<uyq|T&x&Kkf(fC58Mpv+nW(7@3yfw^xvXR7nYG=gN4
zdps=DYbpI~5+IjcoLu975v%)NolYcXR6LOUrg0S;rXvV#F3oi6<xBvf|LRW-`X(vb
z8$mdhl)a`}Se+LM4X@(@w*YXGn>mxx=B9putcg?rbCcNX9Wb5gHOS&f6J{z@kH{w3
z^9T>>0g;m+S|>|3b6_WcZO;wJSG1hx47;KZ)DF&JB$37<`zMN_H5IDh9tvq^U}8cR
z@_<^nNo{z+(qYJOE3!vja&2l9vr6rtF8>?=>+!)r(D_ur$f0zgTrf|F&{7H4a+1{z
zndSa~SWiQeYZ4vUAQqLH-C`I~J&j2gkPf}&X3?|WyJiK%xO=>fRNy4S3@p|;bp}Zo
zt<&IXKU=gZfQUwh6H4$#F^V77C1=aOvL5de*gKz4$~d#!V1Lg|ctOlVl<AxRlX4Xy
zRRx7h%lz$f1Z~V(mK2ffxDv)?S0x4a)tF?C5%7F>1-v4Hp`lr3a#gsTw4MXOZlWEY
zR#s7IT-Q9K>|M_yP#=Xzq5x#NM1TggnjU~$zo42}nQEFU=I5$7Rya)VaVRT`egZ1-
z2$PbzEw0TlZ2p#=E((UC=73<nh=p+8P|90m>@3F~uJ|XJ=r~B8dxw-GUqsSR2E^Gv
zfo7JY;jxZz!KD7u#8}i)j?8f_IZWuC4W)UaOvE`_a!ME)8x+sD=$3<o^k~KtK|<lC
zPmk|xAc;$o1E^GE4p8giWwnHw)Z3s}ci1}Abf2^$7zI>jZ92WHKM?KMRo2Lz$pM#C
zhihVROWAZuTDyGIhRxY_U_11e8An6{I>Dbhtg{I*zpZw-%gF>#fv-fOUo#{EfkPvR
z0w;?KT_ep*cUxs32}To4VaC?CtbIq&p?h?~GW^#YloDmS-Za5ZOd+)&An1*VGTqQS
zRa2IX=}$Bnji+F2zDdUdCTO3%Z!sd?fIx}L8r!2P$OMu*GZBWwu+)v9$*4?&V`=q$
z=ZG~#YtdZP2ZTz9Q~Oj=+m<JnZsl>s`4e9bHZH#DLwM%OE-i}Y&v>2^ib+~l9Rf~a
z)(6a{NNfH;Fo3Pq<Vtj)I&@=fvhu!MM*b8m4r*)L=BZ-Z7~-|8xH{re8PJW#ET`b&
zlt?0))`3R?*(v##2(e^Z5m%u|!M$vH7==n4g{Aef?M27Zj!;*azn1sWsydn47e5%S
zcIVL`q~154A&78uS9&Pb_W^@BzUtysMRS1ed_}`Zh8Rz1&0Yat3$ewfVIT0cHhcvg
zAIN+;W@sR)_1=Z`EJjMhI&5ng*;BWj;u-V?^ePeXi%lp*koH6QapD=}Z9#-rw@`Ft
z6Kvj5vnHw!)LzDH=dmzcNu`-WapfZF^b-}yu(T_h0b18yF6BlzIAPV<qJiE{0s|9h
zAg3Ag8}I|tBXwOCo=20BGLuq9HEI^+)Kuhw>6-HLNhgGAlPlXwrCA4rrPTBxq~WAb
z8DR<Rw&L%o6j<DF3iFzUL~aHYARb{9&mNt=kfhdb_-?F{?z#WN5Q~Z&>o852#&+cZ
zUbf1CwDJ~zN=PLx9tNB2RbQ2JK^ARF&ZnJ1$Y{z<lOw4zmAQnQGlQBDRDdhvs$o$Q
z9Oa36nOlhh;IIm9vU{A-b#5g}GbhsmhETXoTb1h{)k?j>F(HD5-S{L5QlB8MR2Gd~
zAzBT#P@GyC^<QVtk`9DTxd~VjWp-p`-Wg5aOsv0lO%~x{3j=8IaPXbra;_9C>_Pt~
zj>5FInTUP1V|W0JWzHxSr_$sKZpm!jPGM~2h3ZiC*n+j(L}gNF_kU7)lqJga2D?)U
z(h`C5a*13ZB<JwBt0E_3A{a=S@TE!ur<E>DxboOa$n4BsnTe3nEWtmZj3BXCtMosL
zpg@x<H)Mq95{9|R_8gNZh<tcw`&mX8361cnPPK9h1=6plEDpK5h@T1Uw3umg!Vc|(
zUkvEd^PuF-Ce|GWd(1pkBt5qmhs#9Ai_-byO$N?0GU^@aEs}cxEr6iq)ET6T_)$cK
zh`c=VrBIW7agF>*hes(4UT*2`NeFoP{=daI?`rbvt|{kVd?E=_BpmF5$HyqSig_3g
z`IyXy8~7E}ab?Zw45BC(TtFx2Pr9gqVqTsRB>`f6)E)34Fek0h*=wdhR5YL{_yND@
zjl2c&DATr^D!~~pjc{5;Rj(dxzz5Le>aAiO%U?|KTXz{JmN(D@iNX|WxwBE}hT~&*
z7H!&ujS?_&BU?ouy#0A;oUPBQRGTjcT1S*cutfiY^gFXYE0p5g@;{1z9*lWa!g9GI
zoTW9z#KY&K)Qf_xdz(H-pvS?GP|31R?ZH(gDm5e{IM)!#Jm(|r@@rGz92C`j!LI65
zIRNYXFUsXURAhFX$8;{8S%5xS@QAxYNaIuS%SPtQn*zQ9mm^ZA3%#5r92(}<MDrLZ
zam~I_xF7#pn92?c(@A|~ks&a_sW!i5@cKH0UG@HNY1-KzM}ix_s9?5R3Qf8K9*&mC
zh@fBZKi#`kI9UJ!!9a97Rwk$9R|l7g=}Bzh%`tNiC_^n+^~X_dqKz%1WNTAFzel%w
zQ<}c_SY9m%Pp`rP)3|2~kl)%T4X`oRquN+7_++M-V+J+XhVn=m)$~P!ZZsQ1xVfJ3
zg>&ietWA#2uQlL1Bg(o{$g}I&-a@=hL~o^8)aWVS>HDf*%D#s|^qI0ANA}<YEX_ws
zPf|x(9@PJKUyw8tO?wVNZbLUlr0hnZsHZY@*^R(@Ons&n1|?>bJNYNG#X31)nuvNn
zN*hb<3WYInS!u!qzp%B>g=&{O!3w!`@FDJ9>7l>~I;bF?j_9VR9-8AZ2&;GlrUOag
z(Bvq}&y<#3w^Wr_SH0L8`zBg&t&wQ^Bx*jzi3AJrs3ptJYa^!xBw;eDfZo!^HYjMN
z`li<RVdXOR==y^Ecg}glLbpm6qzKU|)eMld9J-I?|FzXh-~aK#4)-hF{!h*;?7Y~^
zc&j?x#x7+@WF8_P;XRjxYhcHyQD+XxbZYX@SaOf(FQe*@Hj^gHJV|VC)BQiQ-@D`O
zMX$qedp4$&eKKT$F469{h@bE~kTZZ%{-;G{cOgF?)AfQ^>+6rq<@yr!ag9q~K})wQ
z93@P$&q=EvfjMQLlxiZ(;vR(@7J^OHq{&wNYQdSF+rW8@HBps33VeGGOvjZF4eO=p
zNF9RP5~)h7I9u^TdCSM0x1xBHTw22OQ+%&3jce-nOJ57qLSj0}*MDAn$iK~m|H-H1
z63;z6hxS&cce$uxK(PP!=bz!YxQGsB<)xs`->6CPk;t-7;G}ZBuIBsK&v$oqf?9Gv
z|7)fTfD<Mz7%$7cV3CJhApl*ZA?L4V`8y<n$BSQp9FT3!o7q)amhEx_iG@S|Z#zBe
zLMM})<~a2AfazGH5Rf^Iu7;O8_GHv35l7>Kt6tsQJ0~+oxl()te8-Mg9)Rj_!(Qd%
zIP#=UK?se0_)EwF6HkAZjH##-fg9x*y7Rr_>bjY%@UJLKG52Wue(6DaYA#d9Sfsqz
zI;>$m+n0bYjSFo|YW;$hgCr!(7O@yT64~5xQvqQlU9SfUvSdjC?gWEqa-&48UY*K-
zPDV9aLx#p5tVxSaT;!Y$B{=9=m9;6mQS9t$u2%@{6=n@--mxW>Ms%+`Q7Y!eskkZ|
zlvMxPe7>{1_Y`@+A{juUE8Tu#Q^<3$$Vj|A1QGPS{`J>m!e}<3#cPBz(}ZS;p}?nl
ze!I`7vb1Gmy~s{^;jQlWI-`XcX$<oo5PX)0yUqBQ?ed!qtRR8oH9q%zABhhO>xeN0
z@|T<uw2;IA1q;iG2=O8)NfHoT1EDhRI;29U6F#|^ZYWk@YgU0Ji4@@NIluM6UGXpu
zU3!#WJ4Rf#cGG{fsOB$C)oh>Txg&qOCZ#RRL<$M(kck8nG@wa)rqZbQ&tz6`!c1qN
z<m6S&1QFG%LYTAVmS3&v?nZ4T#2oLe*<yhXvaIHD#Jx1>*=IMKtu^exw6r?9?$2XB
z9=BE=Ue1dwY*F{w=>4u{{i{gS^hpsL&a<F-p$2^=4F`k&U@Xx{fbq$-@+9LL*Z=$6
z!PmonN#pwWD5sa~gYXI#Q_y}OFOwSe$r0knkp|r{%vv+SX64it>mGFN#i%Q)qJd_J
z&t#CxEa%|z_|fs8@B)sMjrg8A&P}0sR-{o$!#L@b3S+QZ3GImTpxZ_P{)sT~kfhx%
z^!>V+^fE!@I8c1;Ta3si{B40_&N9h1fp6Q56WbK%Wh;$+QSr&zFlS?af<NRq?5264
zEl@jPo6e6|LPB5GC6h0<YHUK0fu4xJqfD-zR?qfdq)|gzwewzoFg)Vx_TKA0Jyt&u
zDY&_{2e-|%KaE}Mt9qL~br)Pcd3#@>q{Ql3+G^f6CX!l0Xseg)La#^K>D9s7&H7!3
z;{~G0t}?gLSx6qdO5QrF3b*zcy<qEgTX1Aez%t;*awHdlDnk2E5|q|p;?#AZg9Rit
zfV@sS(0p#Bh-Yq&3{$95nDtO>0gb6^K|*&V=BtduXFjZWjt%Wm%3!cys)v|j8w>W)
z#Rm^r2v-3Ic|sf=mhKpGvJ@9NGsBTjrhQD=w^x4{y-iQ>Tf;4xq!UIUrQ*h#NtkgG
zv~9OGGGin+oR=>rL~bRggCZy{$-(O3dBHLyzk)m(F3TghutBbPBTy#-I>MTJ(~&LY
zEBEhTKaiL~`mZ#I)HU0Q+8WykI%4~Q38AyS&xi!nWLoTzL2)t2Sdw7o<)nJUI7pI^
z$wZ43f$7B~=~eX+?b9l-_PFXq%QksypI8yo>@mzlHnfwLdPk-yI&L3L!H8YFnzu%I
zL`|EPLmc-z8ZRs0QKX3NC<`7d{Fp8@6Wcb5l1`Qk+QWm=(oJN;xF#6gP^uu#(Tn2N
z+QsG9G6=Xt%2q3z>DwuOnE@hynZ2zf@j}E1(?J@1soEi{n`grA*%@sWn_<)}N<(xS
ziLX@Xw$6wnA*u~<PG)(`Du2C7Z~c3iG;%0A-oA2LtR#=qbL$?OGWp${??Cf>XPLw?
z=*VK8GuxOR>Ns5Khi#ru1d<R-M>75p5UR!ooO%J`gH>n5R5Oe)2E!%&#jR)~1f)Z4
zK$PwRzg*_joW`3(sE03iQ2`&xnh;1P3CU0+e;&g2*4ueIo#o|rGlT1C^fo`8$>o0T
zdz*xIG;#Bv-x^R+GC4hgksvIaXsiMBojsZ)93w7Nd}gPEkSrDiN|c2gm=Yh667b*)
zvk2<8iIu~Cx`pAgkP8~O<&`dqfKWxc<^0r3^uQVikAR}#9E&2QZr4$333lEzZQa;{
zNxNq13n5dg_hGZH3?FHP&goyD0iJICyW+$sWbcX5-OkP8x5ZfWu<s7U@vwEH(E=KH
zHL>kF@<(eA^0ZTG*igb*{=31?GB8Z)12SK*Kg;^_3)ubpS6H}S<OI7oI#9LXG$IW_
z=pBPpGan=YV762rIZt%H#=#K}7aP!{KT~J03W|p;2NNcTeZRh`-U1k0C2kYYHbOb-
z0^kDu?-1{EriJ!w61M*8Nd3ZP#Hywj#ddB?`GOtU_FcuM`RM&g9sSh7(fwQP=%{;p
z)+-fDFazi~IXspbzw_++r-kPR(@!()SRSv=|9+st8u1(mVE}$;dG<>PtmXt^{5Z5=
za8THtLvNcdCjwqQ^l&$>_^0z5QBbZk=mlurUP+gta}E+vQNgB3VN_glo^c6*8i5Je
zzE0QdK74%FXCkw*`KZ@vuE(<;GLgmSCX%8`CCFngd1!zN<j5tQwl!d_$0%o2O<clv
zMZwkx)(wJoT9Djoo1?eQVejkCKCc_7O!x9)UyrB1%OyR@8)Yc<qF&Re)srL)ZvWv|
zVrbVvp1u<*Zm?TS_!SAR1OQFw7x1H3adT242dht*%cirlj|h`Oje`cT`$v_z$0HH-
zc0w;*f+12dr3|}McFo)-XeVx8oqNdI)AQIDY{GTaoX~mX$ol#vpZ&B?Gr#+ij;@X-
zuQ-R+$UUYCe&`?JUe^^0uA9Yj7GsdPUAwK-P=TIhQJbveqMypTmIWa8q4DpI0clZP
zO~&}eF;RvV3B!fVLAsm_b`&m<6JsYeGc@=en&p&V@8_TM1Re6gl1Y3y27j=_kI%x*
zI9B5@)%S$B1XhV8w<TB%b{u_ZQl|c9$vdtn(-$jJTGVdZ1RgE=y+|n#*gQ>;j@vDy
zF|3~%g#?Hr?h+Z@+LCT)(IeWN4!nEgA9SlHD?=kJG;Irdp4bK}Bvknnt$JIz_Mk<6
z11_MfaxEw6Eg94u7`^AU!IvPl7-&fq5LYt&+i^JK!nVsER^5uM76Tg}G*I(|RYN7U
z(s0{H6ylv?iZa-pQym)iVIQC1Y($r5Ddm_g5i2SlK05F9TpPK|u$kCBb(n(|oL_c;
zk`D~+phqG99UWS9Pl<kA1fqL001g9BOUV)%zO8e}?ww#2{}`xg<F|E*VuXAOW}5H_
zSbTUUI1uacry?cc2;Vl#<*=Z{@v_JRp@?Ccx`EmSHdi=|>dbUT?C8Q0M{JVO$txl!
zv&co2<sg?Q0#h!2W#fQLgdc9*JUJq+^hu>q8wT;!LPmazed~gX`VdYrSyM_eduA<O
z)b0<1kM}B!UvV24x?g#;2mb^f=>h|^>LNHB^GJfu#Oaqs@+xK1<|lEy+UIl8M$IBE
zcT!J;5#_Qm%_fRg1zs9Atij`J8p=qiz`9E&HwXtg0*N3BG_(+iD$%qNopyXICMhrx
zOCmzr5g`erPw<<Fs0244Jbhk5=)GBjaf^tekLk#sO0Hi;3vR)7#D1RWZ6V|Af<YhF
z%F0w6Y;^71>-$f^kGwW?k%L77L~x8^g8<dOD>8Fy+Mm^kCEJ?nXi;V4;%W?+9U4ky
zs`vH$_}VHRNLJMeR8U|uBJE0!QXiRs6=JNxtPBf{w-|-t*tiT=wdU}FE>Ky<hcZ^)
z;CaJtRRotOAwh+?av%meqC%y*ktDPbOmC<{s?xSL%bDy##{Oe;43#J=r1C%kx@EEX
zEwAXCiuM5&2Plj%s>il@LeMZzt*oPTym}C*sN;NsEQkR&u^PZ4^pD9S<)7~{z%5dc
zE_1ErpS~r@j3icxYWd<+oT;Xf+_wv=e@A@jY?#A9ASk7k^3$?c8{PqI=-I2ewXmze
z#AaFCRd&Y=!`tn_brK;skUV5K6ZzT81U3FCkWPc9nmiCc)7VOS?p#>E8le!vTwt+S
zdtrhL@%3K?QAVjKCv}{a_!OEPYw=jS?4}LT_9;YQMf3Q7{6x$O$HHMk_nV=G`l!A3
zAx+<`6hr}77^EH#7<2@P%^2=yg<x|E6MIhJER(iL3B>|!YSaVYO6BH=!jOX>vjDu@
z!(#(PdXu6uXnjK^gY7&N+ZheX*u)>2W@f^Pe}GDW7wW-vCho#w%UQT}N$!$>%-IfC
zaa+&))@Ug*{;Ue@7E9S)5@Wo{0g2?t0HEUd-bP9l2k*Dp*6C^}6e>t?<ti@QbNW*z
z=?9J9_yA_z<{*3u$P{r<#4IZ?73m4!(i27E0Kzg_?mL>w=-yG}TG%QxJyEvsFTlH;
zSREilN?O~Ad!c&p7+Oo}h-3ZHY0Jo_Lt4aX<jiq&h{#bKR8#~Fj^|m`>qiE)nFksi
z2=<Eu;de;f2LtktUj~9@uoRGi;0v-mFxU5YC>1QtOfMRMXEgdl@;-iLf9+{JXH9%A
zJ9}k8p}A@xBW{EqBta<G7iM!5-HTh8ej~zlVyKAVmaF)4YLjdP%2T_ih)Np00D&bg
zx{Lvf&bxm%8owYQ8S*HR8MlCN2a<QzL%L+;5V(!<iIH`gF)1f&!tG&WN%cBEKb{%m
z{>Gz7gCERU9!7k9mtAXCZiOZoW)?E#y^z2vxOJ1I>q`f$$ZP(k9VtXPhT3BZ@i?8b
zwg(3#zDZ><Ac06=W~LMjd7)07&Jrt87v+cD1K2Vm^_t=IJ{ImNJTK7Y;1BXfgVysU
za!hdiD1-_5vvIE;&Ab=smO>56gW}XL3gas!jJJX`YUD;G-5&s#@3#Z`)2CH;Z63al
zE3gqdArwbBVBja2nV0U7G03smYW;D2^$Hq&IomaV<Vuz+0Z-P^@W<bjKdL{`|BZeK
z{ubW*s(<un$$htaZ=US!W$*aE{lqhb?Wt`wz~|%THqzvY!o13+rDR}qLUO^~;>i6y
z|6s>s09~1(Cu{*sDbi$MU*G;hOAPOiFmh$zivtm&3B1Mh1U3KQ@UZ{Cm3U~c7bXyY
zS2$Q`_@83p!W8{a*_moaPv3W>&CQ5N9ervR#`%=2jNIwR2N-#C?^|GCWDfK8-WvZr
zcg|e#jz`!tH);x&$iy+k8+>W<1WG7#&@-x(>3HQc&fo*DvOvG=6WP+mLEo;4)9uN!
z0R8TBnclnX9Cbm2$Va)fBTG!@xFQVE4m`jI;{eL_76Od@&*Mmbj*c%#?A4Z^!T4Vb
zi>-N%v6$d<2*!7B2_>%%t=3p29K5RkqaRxMpMD7RS3jh0Pk*Uq)d$)FrpWVEwM(sr
zz9u3J+)@_stOD=CpDOiyu<+|?vKbl$w<>B;W1g{wW6^F#w}KOx$qHWc+f~?8YM!%K
zCYgh<eSXLa_Lx3})3$<xU|o`3!9lTl-n4=v_5M(3e>V^qHep@C7~CJ8Q}CyPMDO@j
z7TJw<-FepZ<0rO?>pdEjO*ofBMKdU+2M~~k4G;m?B5NA62<2MzUfX+73}@uvoXf((
zKO2T6=E}gokbQ6pUlhg|1JeSiZz!Nc5(lNA?7Z%wn2LDtQ=*kYxuPlgbb;vUC%`+Y
z6aGf@r1yz%ad9o!?cC<-e9uu~9QeT`Kr=xiB}QS(l{KBfP^T-t?P)Z<VfzcQ*JvEi
zj|(dsB(PqHdd3z2JEZZ&{eQ26SoBTEoxYc>;qG3Aj^E7ogjysNy*5QEhoip!IP>hi
zQjF{JM9<Ig3NTG3kIUIeJjxR*2=7c|qhbX1pZaFwyS(*%cUq?_z4m5$8{NPn3+n5i
z{|J&qz$w>yt7FXV@UPWD<1h>0H~&lZ|9`ia3naWC(1-SRc@Pw;*kzxUL>~GC7(C^v
zd_QnEh4I#tdcZyZOt~XIq%0j<=j0+Zd_s=-pzSw_!7H=Uwjiqu#44aXFMd+=3IiMY
zrR<#K)(^G{QEtVKJ7mm@AYtG0^oVP?MRQEb86(!{R4*9rJIei9)$dvOP7sEB{GBm#
zwEV{Xh));yZkKs|s2tc~bPGb@=G<Ux$D}`|7gG`3&uj`sV8Tf*64ONk;5zA7px-12
zW|>Q~DPpVUXz(U!N>mq-6jzKwd1T~$+olxK5A4+5A}!lm?wj=>Wb3-k<u09$$2cYg
z&SWDP@croKRn*bH?ujZwm>y?$?M+O<d0r(56u&5>;xMAVWO|RN2FkjUGe@ioY<(9u
zRc*#gV`Si?#KxB&*b<NnZDbY&HhL{(HhO0g+{~+k&uuC{a)riw+ilch{cl$#f(n-A
z2CzLrlrLWcG859R85smPovSaf-eNsqN)~XwZdf=90FVSTBq|`%v%mfFLEwJ==z2&L
z>wdNP%qa9a;s!}238X2sIG!&20?Alv*tdQBo;NmokABVAI}iG|8Jf=4<@le**E<7s
z2cAe>q$H=e(ev^cb&ImK2~1}^3t$q|Pf#!DN!0_&)JB0J-uDN&dx0W=pblrmYSFe^
zKm}NULYI6UUU%g%fHS;!l1;T4=`tB961TQtDJ9o&6LiGO7g7u|&)&!g4=2W@1{r#g
z$jd31RqSjF3lA$GXjr*6tYMVeqH`LCJ<ynfnD|MdI4qVdbG5#9T_i7x)Izp9DR(<6
z<XfwvnOLJGBQ8l4=QeeDnKYw3Jwjd*s~>chV~d0?0C#dcC>Wb#5)sxa$cW6c`y?&N
z`}t?SYvUw#Re%B{K=GR51VtKpJ&l=kY56PKzKxj7QBUnQ<F{M_AYv2Qb}2j&%vxja
zl@B-{idBQ4Kvq_{hK+z;N=IDhztqO*d|iV+1j)N*<iF)}lA#k8U(?-U&nM)#b+N*{
zi}w&%M{nhjl!<W@&e4fugv4;#X2wJ&l~jY-i?pfQl&PvXw;cEHe1ME(J8#N1<_23E
zi=8Vkcy+$MI5#fU>HMnfjb`k4KVc^sayPa+iAD504_Sj5UG9daciOAeJ&^T&w`;AT
z+11ebA^-eL(@xY3zid_qruuJ*0J)H;UCO)8&+hhj;0=gCtK`LI217(;4+!(WRm`AB
zLBgQ)h1rx%*K1NcM4F7l^oD2^oyB{totCq;ttKZm10<WGoG_WaW4LU+ycf#E>etvF
zn*qU%x9xvS>B%!m0{@ot$)k~nQ_sq1j9Fcf+6#+O>xA52-%ooq8fqDDzG{dwq5stl
zgjRf|QHY<Q2vhe9GvNpB1{}fwz-M1kTWU|%bNDuvORpTfwayNxIvNQmE9+GPaW*dV
z8oT!Do~W?&>M^9^2IQ;s5VYy>P0RmtTuNM)eOk2s<NP}iDbwIM97&mV2URaC|1csI
zX4U(Ir~rt}HgA#^#_7HyFAmCk^_7azE|<Hu>E)_^fXEu&oYI%O{z=u0$(nYFbls+r
znkbIXmN!Bg%$Q0rj9aV@I|b70DY_p4H%u7^NRqBx0;)8am%9i5OfG`a&TtMMDOb6b
zzbep>TXrhAt1)xnpQ%8q&K}Qz8Od*v6Zw)a(xc#N)S#kk%B`bgW(d{MSAm#{P`xNk
zdEK6xr+d(eeWjc`X}N@UAjE=8lMRlTynp|)^vsj8srL82<onJF>umr=Le^*m4bp5&
zHyE2|thy*g7^ww?9DZy=OUrC5%>`ww)v(v^;g9Y5_)xl2@mHzMaNsHms0`}!R>?5u
z8E@gNgd2{~i2a&HeEARiVx1ZhkQWAUH~vaEwA|#c2s1&=19G_3rjn#Eu#zsYe9RW%
zZ7mrL(1hh0aiAor!qMXJlXdz3Bw+IbLJ0wJ2z4J{-Aa~Ta+r*6zfdN;-5vps@^^tO
zD0|ZjHd#nPV$E8<*G1hvnhNTRD<?tBD0o7oLG-aN<>0~%M^}Lh5GQ<IonL?&9`t#`
zxkXQbiV~<B({N8R%Loi!Xwqt{V@s3Iitl?Qw|_oti$=u?;X5s^gDlY0BS*>Xm-6*c
zWNK-D7rR_1x*f|Y{o?y?@^Lr@*8uTJl;;<Qshg#~p?Cf`d0bY<q^U%}RrW{t#nqju
z%^Nvp=l_Ji^gUfl?}kr)!|4p)znH{*VxOZA_Z|4TFh#q=fG($Lp+Epr<BkP~-mXW)
ziqg^@CI?3Ebyi5CjrV02Ml&1>oeR(OlCqZqEOr{JC*z*!k=;#q7Xt*)`vWzV;6H>Y
zac(RKR1F0|4DvDG)2-w@&SpgHyrAHdC4dTHt@wUy=ZRFvXu;)WDc}$dAUHbFIs;Lb
zyehZC>loVxWUAtdNG=SZMmxoLDOdrGK%6R+41Z!Z;HN2WVs<fj{PrnN#Ni{M&{<I*
zjaUWReLrJ8cQ6cn8CJtK5aGm-c&tIa3e0`Pffi9qp?oAYR#!7hZ;sUd<aJl+QlLqX
zHP%oU&(O?6k(>X1@!lv+%u>u%{1ZSv-t3s>jX~H*=I|@|h&&s2@aN&3U#o%5e+#%B
z=jgLjy!W$%LXedzcQl`y4i*QyXS&rfEwnV%hZ>$h`Yo1gYra=Go&x_G<|PO+2h_Hz
z!Y+>Q^5*={`T76^h3>*efdrX3|E^WED2_o0D(35iRjl17;^3FbED06C<uSUFYP=9{
zLWO90A@VpdiOX<#6R|N1Lr8+`G+j%C$~4U>$+nYs7VUID9}rHJ*?Hz<<&;-ESazp!
zy*iRU>8Hr9b9@c4(E#3#icg~vf>F~UHARzMnkM8M>w@X>NA^Jyh@fL`fmQ5^?k9!=
z0#DM!cv4{8@wq0axeSBvxRvNGV;^`-+m9MNB%cOaPsYAG`b%|*UnIaD-wrbI@kfGV
z`b9A&X49QeZ_=gYQZ83u>e)tUY^$aJEOtDmn2-XK=6xJbt-Rm|#gAoFbQV_&l?v3o
zFiZByQvRs$k(;6R{I?+S>e9&f0$KxYe=!BJQRJmCYP|hH8o?_O4jc?CyGjbBf-~lv
zG7I@GM#4W5xB2?Mv+_`WF(_S+QxR>WW~|1;c2t^e$~PtaPuy0T-R^wZvF1q^7w<O;
z2d?Bk=iBUd2ze>?yPVtUG`)Nl&M5C6FCKg24)N4Oy$r$%aawK0S3aex{Uve&go4pI
z{LESG*?cjSES0_irk}0S&p<XTLbmj#4NK$M{$9ODUou4ulZh!vn|-?-Kc!z*IRP>F
z3~tt6!forXK7a#)dO|feGp5m^krO<_|3&qk2X2SXsn6}9SG^oOaNdO#S!3gmdJ1cp
zW&D_Yof^|{*3jg+&-n+67XH?Hx}eQ>lHDl|KetarrVYf8D}`*Mx=P%kVLQ=xNjo=0
zONN-4i5_4ba}8dlEHlWu(%*}w|CbY{28BH~(i8T=o(`HmCP@nOcv^gsQPlne&N3n~
zKLn^0)+Vx^#2+lzJD5bsgLo|D!F8tzqL`RKd3P#bo6iH@WzIB*H0VJcq7*bYcMwU$
zh%_sVl`JCXk(Gw;+v6HV;iq^Q26Sv@Dh-9HAVB^TtFAIlne<EEKYpJfjbW3`Dn56b
z;~UCY;YK5~M4I@D+e6HX&aU%?(!A2>nj*0Oh5bCk;Iiccdwvr|7f5)+ARBBSg$Rl_
zx{Fzd*OaVeLuWW=I^67fw|pNLMHTMSjj;4ZZBM`Mm?qM;5!@UC6L!BsAFTZvsjH>0
zr1jB}X83(RGT3eL?^7u8<VL#d;y8KLSFEn(x(-+Wu5}zbK*k@Qlnkfq<{?-i^&LW$
z4y%6^5e=bdSu9-3qJ_9T3RcLTp?zb>(;gP*Xqs5D<f!fE9Of(hu_DB;l}5xnq3%G*
zd1+<ue)c4MSHJnMuaDBswf6wK<CyIPKVlggR81TQPTV1-Ntm(g<h;C*jK|f5O33MF
znoh|Hy|y8MXhjoQHC;;dp)-C~%O5?R2v!_-pOpVQ_NrTbued05=#%|2XmAcC$SLw#
z)}b_;)C@&kiDeu_c)=kZM0vF1UR%kxQ4k#D;doOAe*dD#p9GXVS9&low6B4HW<TU|
zZHXIIe8zTN5L`L=m^4Br$p^=Uft1o_dpw<`lKoTQGW{}XcRk1W^FDW1GcrHnHV|Z1
zFX@l8eK^Ef4u!IDv8i#W1&ZzSLL{~{$xQv0z!6&!RFR9UDL}lc;r<0o+iyvKX9E#W
z)E#Z*LGiDUL##*}*oBj^QrEWLiJCaGltZnrehkcg{$sUmmckyGzqJ0nirIMYE$yz!
zhNbC6;K!@A7fnKHW<Smqm$cIWJ?Ow52A)pNBoj)VDluMBTKV@KRoprGbSt)b{IJ;Z
z$6_aXzwmu)p#L;({;y$)V6&>q(h~}p<)cbb>n4`yCNh^@(t9Wlz2MCMo36Q#Y?k(H
zgZ50*L&)9j4(M*qY`$F2|9T@HHMmcuBAjzQxKQ8$zQl}mo;TOC$LrKSmRf$NTYVdU
zAZecCrPTYt*av19MhjHpSh5P|h2;Y5ZgKmsoQSrr?l%Ox*`@=~XWjBlNM+Y)S_D<z
zMwEUHr@f#_6gsSouxV1#uUOV2NQ5$pka=BSaJlyo&Z&@0Xs%-_Hs2$Bn*f;$hXBWw
zv`TJtVu<0^i<WDn_}{fA%U|N|&yIe$!Ai0D%SG>M>GDdbRhO4UJxdb<-cg$FRJv^I
zKZuc5{;(}Te^X#PMK6}f+?@WnEi{vPgoLKRr!9?CQDs#eHP+O#b||?NMP3Lo@*hqJ
zkU&K@m<Lw_U%3(`VzAQ^(I0$e+?_Su7Snm6-k(xOm&&yA&=PDBSRHhcB*B2;DVv+2
z_^rI)L<kL7kv5fXWSmfuB)<~Ck-I3Q)-ZZuBsf0cnn5ydiwlBnPTQIhb0EPHdGMBs
z*IaL=FlMP^Fr2#INX+hp0DAXrTz`^Vv){D1(uLk4!~g_7rerj6naM1c5tu_YOrz6*
zBEh<lLgr>WIWsSD6FpQ}=XaO$nX(xi<<#KwpXHf@Q8lFQ@sFM!e_<<R-g3<m5Kr};
zWSu50_ia##=Tn!#Do|&&^L{so3in!^uttEh1^wKxb%!RPId2d>u`#w(Pgo%W0MXrH
z0z+lY(miIgG(Nnjg3D4viaKPls;LWRfU0^#jE;XZNAj~@8uKVnj5XLWF$(&!vR@YO
zX-tzrVdH8^W^|5~mU}R}*Y6?Xnk`(uNK;3<2iKVe!pz`Wf@vajFV7>4;7C1{gWc|>
ztIF%7VVbDBtyXI3IF<feN@?4GLeiV0>&~TFMEE8jcB;My`@aQsjbX50+sMFa@Zu>2
z+qpEiig}9M5xwA=IWBz{Q(_dWAZRJR$7f(lfu-qNhk5ltI^Zw>GQ(YxK!&BCUdps{
zz{R8&-JMtk8m2~jKDVIWCW;F;1pb^$pIoLel(D<t>;><Qq_{mo#uMuV7P2VjR+$1Q
zfb}P5lNsOK;F1{QV_`&;hd&PX|4!Wq?jGE(;_lo(s%8P_f9rxf2z(2}Ao1PfN}ULC
zc2R!gM6m}|ER(tV>188^;^XL&JPwbC(JZqr3Fhk;&K@`Sm&Ie_@mTyiW!@>j7uiW<
z9|!NAHUR2$96a?KzaB44orG;^hjQ4d;_J}*&vV7w8s3kT?R_ADSn8Jw5}~gWwY0Qf
z8NlJ@53~CYue04(GWQSjOQt)(Es#<twEFLoyw3Lx*YzcJaO#U+g+|^a+UQ>gN0W%a
zXKYC_I<jt*rG3u}ZV@eTU9AV|2neNuBC{KG!GU`hH<*C3yWY3TpMXyvcs<g#zhx@G
z>|iu&o{*rTaV(gJC+o?Ur6J18S+BHEG2EEDVoKJ-I5zU}QE%AEsiER0FKHoN7(RRy
zVlK)Vl0A_HO@@py!NzT2%cp^+;mW-mgMr5mM2z$N3=G@+EDItA8}n>K!c$>E1s_bi
z@CN#2{=8{_h@)=mO{!VJ{a<w(bGVECLF|UX9~HinbVNEi^*NrEb^8F0{+FtV+Twjf
z&D|a0ez=r;{o(H@&u^*I7jGe9<<T<bPx6rLRjB#cm@3R-#m;jCL1Kl$u+~<Xm!5!i
zWG}Kmre$qw_orBeentupM$SW3f|lfuc40@t1fXfl;<WRiet41-i?H2eToEJR5={cb
zC$61S*&DEh5#<~?b+2Vve^y3T6eCE&TI5Aieh$}c@xO%pu18#@K+5A*x#cNFOv4n^
zj3Xe37cpitcv~#(&O-sNt#_Noo#buY<-i#N3e`DGG2(v=P*JqQs;*C}Ku|_+iU|K8
z>tUN9tQkOh@kodISbjgdX%X}Mi^pZvsx;e7-9nyDNKP?thUn+kX1PdJRw8H7rnw|D
z2pw{wCNgu`ZF%^ngGb}?VdEa72%|1_d!o}QdcAml=!S1$9>jycxrb3)gHzkvPa-FB
z^~wfi;;r1U+e*e2q9rDI&V1J+SokRt`>_(QifcQ(6^w4{IX->euD`SSdc4k9U-%J3
zYp;K0hf(D`JW#Zrnj*>%UQj<BG6<AOrs3Hqbi95ARKuIRu8Y52gbHAuhzpSZw1Fxw
zH7d3za{Mmxo0|L%V#>Mw3t0gG;Fm*BEn(%2nY|ZVx;*R77~z9ZlOINg(E@15vcs?`
z$m^H%9dZ8s2ej+k%cOJf6`tg$0I==6dje{ZFhFX1mJYy#QbNlSg1jJ*n?eG)8&wOd
z`h=cKPZ^lUG3q}Lo6CgYUTBGcL{A5^K7#7x1*2Q?75r4K2Oa1?F&k_0Qx8Nt;X4Wm
z6%_x5YId7W?JU|9DVgiRjEx;iQ7g~u4}<{maup1Puf?lamAaJz6vHzkY|8W6x8nxB
zOwhX{mSFm)Ns0PE&B+`$E7*}szm`1mcF6HN&1V(kpdNl+@YVNxN$3Xhr)KotS}|`(
zK~)od(j$2F*L5H9Znt*+{hs-{AJa3fr^P0Dy9)56j(WD==lf7}+wf@V8Rf#}EhvW$
zM7XXjT;yhC!)>8P?|ig!sq|GWCrWkmy(4}s`nNOpjOmONpB8S7sHIGTyd;%q79pIP
ztqs28ar`}<zT-czi*zp7#~$}Wy!Yr<s9D&g;qs_`N^Ms}tt|C{&7-YP7vCOGDytU<
zDm=M1iU8Lr4wz+ScPmVOvZfiCIB21gV#ccH181(me8tkh5{Fz%HEwxzE-50;*NM<p
z62+Q$GhKw07ckxHiSZ2fHUeGLJ|m}F3r#_UtP9d3-ynGs{$QUN4?Q*rf`=1zz9E=_
zB>rqhpPYZ@Q!LNTNqh0QUZMbs%n79`l~l}~eeFM{U9ql+7rdW*Br9`ctHwNh`rPIH
z`tM)4@2D6%GWBxY@p(lUs{o3+7vFJo+>j{m5hP!!OktJ62*1oQ0k=y&=|@#i@`{~^
zdSKW;j>DXVJN?EbN#2!v-+c6ySU*GpL}IxSp^yjzE&fE@ff+}aP|BWwIluX|?M<9H
z0SC{HC|FrgWV$wOxa=L@DzI&_C&-jKPpAlFDJ{$7DTyGb;$^rUR>(`JaPcHb#XfpR
zO0~+AR@wp|N#J4P_2^kC(1WIonxGaV#*_h%rD)c8V#-eZ7iK8&ErSiK(G)8el77I#
zkZ_HEj_*K*6xpZZM8V2fa+(qYAXyBkEr<;(Q$p_5{Pp#P9GvXiIXF0Iqkl7kP{)&k
zJxxZwG|_i3QP{KOhQ+~yD-MhVTarxNE3Aek0nNf-P#V}G01OO<Wx>^Id$Z4AB$;6A
zf{?-OMr0)^hx~>^1{@PCLX>`qI|jmJQyyvHfn^GU-hLl3F?jGfFQ9MN#Lo)(7M(j$
zC`ryepZyPe`bqvYw{LK8;(J%NEM8?#!hcWVrXZEi3)9!)h*?3Bz8Fb(BX(X23!XPY
ziKa7%d}zwyG9$y`zH(QGF45urBaH0!uvIx6Hc1S<EeI`2;@LF0^1JU~w@+tJ;b!F4
zM4yYFhle8*Z<6$5abo*c#D8((6r3%a=w19hOe|?wE$Td(18mB%c0nkAgi)h1RzVP|
zDG(eXkoW3Aq{Il1vgD!C#GaolG;+eFi#~sLQ<4xMLkt$s)D|Q$H|HPzpHkduLND+S
zvZh|)uTC2$j{^LDe3|E$=O@@x^7oHT8H9Q@Xh#+b1Q93*BGWygchI0)aPxvlzd*8)
zAZ$WtmK-P&Xm7cj-;D=PQ%25onb${yPG;7h7cy`%pyzl111wFDl-2~%ABU>B3)kEH
z@8uDb9}|6jCx{!8Ue2CgSrTvMUQ2TaBO`*&4U<GXNw}EDvRV^Hf@IN6DKuav&5|ua
zv`QDQ-;ONg7C8~>hphYrr`Td!^xGO|BgsZ^C~SGZe?@=3BINbwf^F%A(SiU3pe!IE
z$&n-hNRlv~MPc^5(5&M-cz9#V)?RkbG@Sh56=CX>aU^^8?2~ci_~5n;11l$n0QAnx
zyij{8+O7Y$q;U43pl6gfuhzhiqOKz}HU0wSHCpqoHeAA?akbU20;>{>q6$pC^$(q*
zR_l#Nq3V^6Vg(n+b~cev*CRlQpwJtuNOw-W@PeQU#menqzXIKg(T*;G^8SSL`3qhV
z$b>4cLna!UObMSNQ{+EgEE(QGx$Hn!NAmM~H?Uk4IQmWzBPXW<c2=JCv)Z3)hrPuH
zwjo{VVS*l+#f!%!lBKUQZ!oRnC%ZD@pm86@(9e+{DLw9GcId?e7iWB)`_L2{eJ33K
zNVF_mahgpdubk@{1x3bpN%*HyWnoDgX$RX>DgK2N6~tDRoh*Wd9X4K7cjY1xcM)0D
zc0tt@dj(ZY7ge%YG(zpRnU<QmCZrLuN7WbtC*aP{d4nGVqCr)4KoG=&#;A&>CghQ^
zR5e;8k%&zx@`x4z1YoxyhU#{bp!JCBNhy>oB|S!<L1;%0f=p~Vk`6rT00E&Db=ix0
zppNk{R*fq{G}Z7@WNCV0v}B~Ig3%`H9yT$KJfbNnHtYoxMkP!vJe&*}QgS57x@9PW
zQ~+Hy5!19(p3Q<;D`Za!DB4_}cM`oJRxPm9G|9VG#-x2MT53@no!Ze0BL19zvhoIn
z!PXr>=R_KfsJhLj<#Dlp+q8<~7_QRfaKC_DB`#@${icc_wlHEHP}5Vz*y@fh*{qqn
zR9(T=f*7+BpD-6#wReKVf^(Q4O<mfhdslXH<c8~(6Hs*CeY9D19P~`RV$o2rb=+5a
z5ogB4RIsrZY;8n37kb=`9T+$*R1{r~a0ONrVc?#vxT5$}B@xR?A&z5if9Qqe2R*+P
z>?bd>2hO1Cnr_qGyjf7ORaF^uMeTdoAt#|e*-=Hc39G306M1qofhT3sdkq}Y_OBgm
zAxhPS-(_x0u~G4HOjvfpb-Gla!f~aPF@sD+7W|2iA=Ru}Xw@4)+F{><WQU{EST!n_
zXQv0D&QF`ctJm$t!3-hG?j|LwHA7^b5jfKwIC8~j$ZG_wh}*Pgsc8f^JnRmx?cCS{
zd=obCdV3^W2X;kg-53Q1?>JMwAWgN+{LnCDoBHfPRY=Cjcatsgw;t@*TY@htfukqy
zL#VDyJ%IMr)^j5so*BCLw;XGmxlkWu8M+Lz;$}}lIBbU(!8CIG85>|>;)aEw9$3O@
zpQp6_#J~Ab(x#d8%{Y@j8vkwO<3+rPr}&oUX&o=kzdG9ZfNiv1B~kM{`gQ#1_L<6K
z-zkwTWq=Y%7O#&itDwVA*O;o7Al8Ej0_0uMET{Z{<zb=}L-$*UoPd`!A|4igRES%E
zx&0f~$shLZ`s?UA8FYkrVyleovvikz%R${MiV2eweb5$Z@TSXjn$gV@5UD^&u@%Y$
zY%(>hqp^XY0VFw;viALLeAK+9M4NO1{lK4v2W89Rs+o?+aU8=Gq}G(i3Y&52#S~;m
zW|%p*rBXMjN2Q3@`(lgUJz{Wlm@TV0xFbcpi={?n)z%V)ijmzM2aIJoCk2<2Qc<m0
zfz*+=@@8Qr3jM@pEQ>wRZmo(f(C)13y<F%ZO(X6;k1PZ}gfRvV693&}fdVvs5L6Vd
zf;nZZmKzEdcpxicjNHn*{HLQyDB+xPF=O=2!gZMpV<u)xrGF!Fiwv*sG?(vWm+LCD
z=EXl3P~f){w_TGfIZxk+A@5K%0z7|f4M6+3%j2+x@P5o(kpG1!SZ2}oUBXo&vVpux
z7?Zn~+rhN}S)^B&VlDRzinQDhy9bNOjsSouTL+UG(muX}Xx(Hc0^A=PFOgIB2fB;S
zcVFZ8Y<T}39@}=Tgpssi{B_{XOM3Y#!{R5UESbDxGEEJ2RxsCy7OG%xwk(_kFva6F
zt4$t=6MV3D%}?#VupgegBIyWG+<K+!;nPqV|E{-xu+7`_PRw6+iThoTud*In`ks&L
z6FQB5w;Uj}QgSTm=^%3zssQf{1Wa{J)$^$&uwcJ}`zUq-OCw-L>Qo!Zb}A6qZmYo9
zBlUm|N{lvG$)U#B1aRaE$0Nm~ZJ-$tdbrt&rLgcxVh4L~tRe656vbW(6r`^VF5Zqk
zNFCO+-$`=^m=umMvM<u5tx&y*31e==dfXzYG$hGSGbP`^b|q1^3)#}!ph!?Ruyxx;
zgx-MSP5|sIhph{U-6Ae|6LrG`ddJ)`w2cE8+dItC#S@zL)b~L45yEsuF+b~aKAy}e
z667^DjtH(G-Jqgx5O75?tlnU)l6Z!G!vy2+FEMOxVc573sFxdmg9kG&aAwPke@Q;b
z!xt+>g1Di;6{S1;9h~8Y1bs}8=!R^IaAqXek4d-9Uy#^3Ai-Zz>{FtA<PT;XQJ+79
zn<{IVH_{~>>JJFIAf3?SE@^GbQ{7&n+oRP~;ti9h9P%ej*Y}xiTxfSGd_C-jvoF|f
zoS{~FggUJm%dtjjTOi#HCm2zDJ}5$6+Gor=G}jce?JF)>ZJfZ`I$@P_tK1_jckngp
zaMDzRRmVc?YJFNNR(Z8j>bKuh+4r-$2=|s@Im_8@a=F&oc}$)eSapN1@R#cbpUGI!
zml*Ju0K}xf-%L9)oY>~~*cmpmv4(5{JhP#4ie%-=u%DSs?aF-b9vM0^llm29u4udB
z#cuISwF4%=_WX!@DnudN^O2z1ulY>@7i)l3<DqD>pi>nJbNN)dPK<@+R5tgCx)mW^
zH~)!~3fN5~N{#qr^Hc_<=3zUOF5d3NFsm{IvLDEj0$AZ!u^q~v09r<mS8RkDGW(#C
zjGAy{Cr7o<p4S*;nY!+9by!TxRV~W7nKOCVMI2y;VPu|2wr);8lC0A<@_NE;j=T=l
zAYvQ%hGklWy@ewBBL9;IT9He!4FQER*gR#E$IBet?2&Ti4!!oE`nEP{T<%OU$qwcJ
z>Xn7)`MgllM(iqqY}q-$E8i*e2EOX2*?&s?5ci~K8#k}jwi|iTOm5i7n#uw(O6FOu
zv2B?!oZ-Vy+*#?t8r-#-lwmomJ>5%OhcRb`_1f}TVfkb0D)>}T{r?el56pqB(VBqc
zj&0kvZQHhObZpzUZQHhOv*S$8y)!lcVOM>7;ayKz^hsN>9=FAYoz$QCq|FrFZVO))
zDp=_mm7Q4{c3Tash@0pm70X1Xgg><~-zpmqO&?$$!5jNA#v)dAP?7TR%}4!&z;ih2
zN4uMQy@=k|=7Tp>C--Bxf~wkVhg1hDo6vtygCT8Z2)74%BfyQF$IW`h6Qm~zr|3vh
zUD?zgIg#G_`}3O;#qWkTrprAFo0f1Y&Z2=pL3W?^?75SLx36kbTF<8)d9<>(Iav0O
z9?`>SLEQ>)jwJz|ZY1is!*wL)YKS9-uasH$T>}4Brx=Ksz?wxSBE!lD-XZ2Lf-qo3
z3fZMlhy~&)eANxUfUs9C@1;T3ilcsjq-Kwu8<2u(O{f#bhZQ0yKn7ezo;&30Z_t&Y
z3K0$5tYAQp(qRHp>X8o{F{LYFM4A6q^*Ve$DH9vY%;m@oKDE?7%WW^zc_t0R!A&cJ
zqwu-=7WEKM=v6$;h&+)0isJf~H7h6{;QkWEK2Pl#GR1jhrX<5ER8!k#s?bg-)-R!;
zy7I2`Wz7m`mput{vamDEtxgQUT9REfzy=6xfr0%If3W-%(^~v#x3ZGBlzQi&>6tdb
z=eupN&TJ*qG7S_;Jb&WX<BZ>n$$e*xytu_!-(NhkS(HtkHT|+F;PT1n;`NHw%>`?f
zYxE+|wA}URa+#f#ar`>k_g_Q;{(AG~E2=Bd&GRopJp060|6mGJ-q`<HXDko$^vlrZ
z_ll;?1H+SVtSu9z#Bu`mf&H5o<s9+{?0s=iR*$e>M-W(oEP%{Bu_n(SyA)<7@0yTk
z8)xYVZUbu%&e9$n#u0CuT>^qPKl2FL(&crL=R57NOiZ<6QqR~|e*6&5yBdAkxpkJJ
z$s)J+rpz{<)K{)ik6elLRWbpl&AVmKJ5>^{q#Z32mM{TImrk^F7@@`^wd-lGkDgKR
zV$u6z!N;dUPRCMQigq67LIn+7oUhGNf9!d$;+OmU{(!naKO`JGvbSXCVF)i)IoTm|
z40QE)w%MwVwU6ER&v&;3kOH|g;AeBPtzFGVjEit3y;4m|`kafaUX_v_MK9G$bES<b
zZBd=Yb}(fUYaF%^3s_hzXa1!8c4g0d#^s0r{^{}RP^Wvs9`T03Q*G?x<Z0>I^B1yP
zntL%s_}YD7-Z*4`L$hC{lx2y5R7x&j3WnO>SY(M}foRLc!;uwDoEwZ<6)i~_?1j&i
zXH6Z1s>&dd+5c%-b(E!ueYyG&3g}Pl!)4bltDirF^C(qiR}R=&l}A0UJeHKp4~o~K
zFm)=jDuS^2i-_7JUMunNDxI%fX2vMp9kIInliTffZF?|=8uBB_^Wv#{q>h?kQ+JPf
z90Z?WAEE`s^vF9oDwR^qY92=-k^sZY(KX56cVdVS(F#Bi$FHBz-xmZQ2=ke0w$<=O
z;k#e0qq42DdXqw(X*|-6NYq^6s&ZZNyC|s_>h$Mb+i!F1ANfS-Xtg6{D$hdbOH!V{
za}S?d8SLK2(T$-%4wyS-A$o9pOWG!XrX~n8T$2cN=-)Xs$IQ{=ty#(Wt-60id$67*
zuFyH&_kDns*j!n(X#9QoG{Cd=lQDwD86)H}LZ<pp6f%rigo}xi`*OigVabv>BB4&p
zfq9Glk(j8En9g2|;5rn@mF^3Eh8n_=$s<sE_OP|>s79MK9$nT7&0$rJ?dl^`22Id@
z0w?~Z*ls+LjOX6Gkb@zWsGnh4nAPg*=o^t#YV;&Bcnx>bd%<D*NfX2M*d<jK`A)nz
z?A~*f@)^=tt}7<edZjY0BU=<tu3*%L72gF)HQz;{_!pHLIyc2VsOF3Q0LopQcN12L
zpXTNWKS>iOqS!PTyMfVIB(EnYI+&%GVNXWr<bNiH=MySB9>K=kD3I6UpW>Cv)yEOI
z)nmG1FZ1&Ka4~bZsDe|?Sgrx@cFr%4@F!o)NwP;(UQ)A__;m!`qs4|OJftvvu-^49
zsL$PeIkE@64rl$cuj^VrU-<Zri3*b`F&pGr*-|QxLg=0Nm}9t~#9;Sic*j)GuZ%zf
zz1)_%c0;tlHPD?2I-a-L#5OTEymIAv%n|pUoXl$UizgA<^%Oi9x=C#P$uTF2sU%EB
z2eg1q_>|QuA4oMkK8)<zz6$bv@USbtDmzs?w|?)=o1=kpJase5i#d57evM?$IAf{Z
zISj&LJdRJ01Cqo0>F>>RRRvY4(BA+fkc}9EZ6rMvTAP{+y|zh98cXYG4s)5-$*kp(
zOR4;RnL@QJ<kxooKs3NT^dXDsl%~(0d2@td?DWNgZV73__>&DHyB<I1MO2=6!RVg0
zmub5vv-J_vMX$@fmaim0u^rNT{cd#A%*LY(Sm~3@%y75!9i9Jv9DHamvY{4Rg2fCF
zL$tn?*_Fw0Y4c(uy;`}l#f%a8j!U1+BjfKFM|85096uCyILQF#16~}^WG#g&RPSWN
z;+>D--P95*Inmfqg192vt&14|2AXq#>tUV*PCS*x_ZUvvp7RY6&<Fo5x|b|LzLp5t
zlmtTdJpcYXyLhl53N?NezJ}C7(_S#y-mv)``}+f4du%95Cdvqulz>VLThu;D#p%dU
z9mQDg4AOXIe~`lY#7V+RzK*2VJhaM8b){6tAE5sY-Nd_$ad|esi9oM6m)DjI#AmBA
z<V3)FGULy_<%BL{!Ztjk3G~F6!=FH<@vN_jN}>N|xUFW63*s8qPeS@)cU=R)kx$P9
z5mtHKsM>A2gI4+Gir|$vt_EVMM|CB&)2{CBxFciDNun&F0TrZ?JhMr<>nysWRdsV0
zB$o`+d;*`}6SA?IGmFZN(tLGBxE4I0CLn}XD=w+q9QF(qTrT`<QFoduEdk0y@v%lY
zLZVm;qq=@1<u*_HTwPP2;l$oA)jV1)JaAYq!E_T<Py=x7qT{WWaEk}qOs<>s)hTXS
z&W|moM70pW=4|9RSLRUZ(i#FDiqA=}#hw9!qWE}ZRYQjW;2xy;fYS`~tWc{so|i@`
zJL+$$)D%bqF{5TS{bfcF{A+Au@^J>uyVxTZf(qWH;HCuu-8I9IJ(@JH$hH1Vppw2C
zp+2_(mf?E{hqA_S2=<XvS_JvpA5I$J9{17^3Os=mCNuwW_G^kSSx^yv!UW?0;r6tR
z)PlU0u!u4GQ)Ow{lz=;-CBMi2tUpABn{3K<S7afgB~|>%UJymUF^QhxjN2|<uSlj8
zl$=Awt~_ScO!V2~-xDWG(*0dNcx)l5y!CByrV$}9+8NfQPl@N!iGs4mc<5@S|0qv{
zGZDV`?XwY}Wp!R?)3SGRZ}$0*T;D}+eB!rFi=i$X1X|KT@@4H57<U%O{9IT}{Y_w4
zfj|3K`OG0KaKDiJ-3a+B9UEQ~XF3FO<L;&hj;-%vWjse~Ud$(KLGp%JUuv;9a>&Y`
ztV~<OiC|f5x-Lqw?c>#u7*Ac(5SwZ-zw$|EF>99gorur6YiHg3HM<-G<EK%*K4##I
zrnYI;<H8`AOG|K1n1Gf5#E$^}<#d;P;PvY$UwZG4DuxB7u?-dKNS}&qmswhCIT%RR
z#IWNHfcq1nB35GOzBhwTaj2nK4ZSpm&?gpeF%G6?c>cfEf&dc}fl^2}F)E!r09d@H
zI}eCfFy;Wg11)7(z3a*Kw3ro19k`<ArqLqnV|@qa++l7I-tkasZ&E|m?jR1(g45zx
z**34$+p2lj;0Lul_<O)*?whgBfIFi>v`WI0DZj(lrBW=hLUZ#b$gvM)HgG(ja9oU`
z5r7LXDxHi{zC89`fN{!xG0f7xkVG5HO~6ce!X&W!omgA;{6VQ<!wR7iTo?AuGoUff
zpcSvy<;*>ch+p*N{?D7>C{p?ndmviyCSstoTzo+efqymP9x2K>I-d?Wr1=vd1y<2W
zoJq?GOy3WJzzi^xljt0(mxh37A>2iPdhTL{(&~pNlb^88RZ8yQ?E;aHkrRS!=Gbm(
zVv4$CrOovx3D5s7pn(V&bx2#-BBIWTcAZ!JF-n%3P*D)tFa_`%mmZb`yiJKR((W>Y
zDZTGbx7Bg4T!M6XxXzypNx+F(bMA@qVz4!Jy+cqOT0*Qx3Yn9Yj;*%tYdAu-LbyqV
z*C8H2EQ(M0y`WPNCccX#_Lum|wfHO|D))lfhY+R%mG{<f-d>FxcROGMDZFN<jFfZJ
zi&$OAgY0ez%-y&co!Y$%Xup~<j#;3H@)NK?ac>Euy!zvx-EOz<XwG)I4Oi>UW~+}L
zU!a6!iFP8jp)3o0{NC>nlc|`(tRpFjCbS52hGUJj-Ahn8Y-JATh;xawCQfv2O)$(o
zv~xx)Yv(>2YMZTN#?vdz$J6YTG`m@U%;lsFO+n1qA#VUIvG0~f>to-(siAX^5Hr39
z>U!>8ejF;a#3pMRgW``Ti6GWv+KBof2R*wL!S35vGsis2^;XoMoo<R6tk;j>2Uf%7
z1}C+lIW17AYV%OC0*5ka@SE^4zmRG`q?#y|duoK?V86Fb%Js_k><GIlqVoBKLb7;P
zflDsg5e<t}L)~bd_G*Vy&YK>hrOD)NM9%%slFf`81!>j9%3rrQp1LW@#N?VpPaoVm
z5CTJz)VVisBLOmtn}zhNhAJEMT6f_vYD|TwUPDfXvu!o+9mzqcnubIA><Cn$=0~BX
zR$&7vHPB5n<S<B*BC*Vu|2_UeW+Zn%BMGQj9)maoq1H^2s)*!7>~&YQT|Hd*)z<NF
z&+E#81UlLo9COX65`1%59{QHDGA<f>;iXR~S;<zKb~G#nihg1hu^hD(&48ef6`RIC
zlZS{|Kb&T5zYT{X*(unvsQ8DjXaXRg?t`A@+%Hb5`H$O`mRV}=m-YRSkk{XV9$!{e
zr)I9oBR>mQD}_%@Y&w|`sB9xC_fx5z=3sVx_8OXNnWw(0-S%RhCb*KB)`h*G@h)`k
zKdCRtZECE0c!_5_2cBZht9WRgt!|A6TL&=3Qm&Hg3ym5t66e?RZAI+n5HB}z@J9Ag
z@KOwW&Io1;>(-?xw=u&9*@NBdHv|sFA8*#?`7N2-OijlI_5VmY3&U-fV1Ha4sYiJC
zsF{Y62nD}6g@ud{HF56IJFg32+>24)-^T!y(hV)RL*A-kVx}mWG-(e9Q6Dk7CV{gc
z?0I&Y-SBea>a|BA&Nn$hQnS8$$S<6R9%IUE_~WK8tYY6*PHsP7zu(^54T93Qc<}7D
zzxrR6c5r+;7=xai2vYB8aJR@5k+Ixch>p|<-;llFfq}M0s7yAbVWQqx=~3qZ_@r%+
zvWO*ZI+NF|+o~1vO9xBouZ@${OY<agXNjh~n$?H&B0?b&!)p%?mO+CEb_^jxmsEsK
zBI3@h1rhW@Oo);-qk{`A3f`o|EQ9p8GhSBB-#`f6P~ta)jcIq%HQc|NmGK0D3WvO4
zGp<f&QBCP+c+&@|HLv&G+9GgKVGd1ZwJUWzbUPU}Iv;iL90wFOYUOBm(~<VRIelRR
zT_ow5B3=Q|<6|1_;B&0_SJ9o24nnj6_#kHiBnYR#$AzRP0PxRHZ`6m-6rds8(b0!>
zvddUqKzVI$zFq!WqT!D7HjJh8@Tqz^ey+YP?sv$tdpkPsQ%MbVCd88#h`ZOfcUJEs
zB&g(9R^i?I-mJx<p4L+Hyn7qCcDSA^TyHTb)5N;{Zl|l3N0`NqgqGABU6qdyFD6Ry
z<u;R@Fwk1E<<QPZuPvtDLvNCuUZ<Y|%=pEt4z0G}@#wib<eb9jb8Ik`UMIq#*7}0Y
zm)mXWrupP+G`@c8%Vh`9yRY&c0;@(&FMISsxm@z@xVXFaa{Z(2h#u`~3%V9mXfyX$
zfLLVo7lG(@{9tDs3C}_Cbx@tO8;)NykYcl2XKpISYAjygsc}zT?xtYY(oCgtK=KZ^
zIBFBd4v_BW^NuT@3%LXvMTp(EQADTA<yZdQ3|hwJMNd{@mL@N4h80&)6R+q9P^`wM
zR?+b<XJ6>6o~hZ9QSioW4utOPWYQM4i-8@#(72|V8VzylEnJy~VC{5gh}BY2i#qgY
zYWO*fxm%7*qF9@0m`htWJcn=2CCn}JOV99kB^Fm#d_Sh4FB3;NqMB+q9@q9WG@(uO
zw*<&{b-`|Cst%TCOPCC(YvrtA{w1CIga&Bo?V)%0-;siZ7#}$fB@hB-mz)ve32|l2
znOhlbhw2pDesA#5UktYa^}#)kK>dnNO@!sNB)N9r`9Ff{Ryu*(6BICN0kJD14D*&l
zfIm92I&(dc<51f9`PftI3W05&A<DdNV-eU2BL}92`IZQ393*6Wb2Vh8r5Y5Ze3zcS
zw)iQPnndqVU{I8`(I>ZhiNkrRgkGK5^SZF@S5~r=tT3#|(C<e!hZPu$+F{(C4TMWq
zHdN}6mX73R2+EPiQv%nFdl!4t>hp3&(9rSs^EmTM15&`do8y!YW+HzK$<@Y?#rA73
zCX+<(^}0qR5AS~av4*yzGIdH-dPjJ?-+@j1%NNEbV(}_tim4FXd~LYmk=0=fHaMvJ
z{lbw|6!M9q{yRr|dC^K*ETEFUur7H+ez?f+RB)OL=V|!{?(`a!Cy+7sQQCsNP>vxh
z_qP;i`NlHrW7lq8%cC8}V1P=JVlr!bfMvck28R%F%uIAEk<n2nN*&0YUm-VeuYpCL
zvyx!KT*t^IuZ0*?pr-u;_OFBFsx6YJC%s5OyE`Lfcj1JP<n4fTr!X0?X%;~z+V*SV
zBjx^`VS>2ewr`wgep;mnl=#<P2&&ngJvv|uYdd0}O1Cp^ixXk`K(0L6OC4Mi@KZN6
zHRiD+E2Mils;jtA701@CA+DLQZH-sl9wgt&E^K(<Rc{FjN1ZU0?OiDGZuxaEo6=N-
z`XBv7;do6~T8)mZZF^_CbAMXoD`ToDm#p6Ns|9P_fFA)cH@kqsL|N&5ovHeOTIv;X
zanD~>CqS_GIv!l`_CrL0;T)8@^$`!18t((QK%8M>9rK|8mxQm4=2W7p8$wu}k!1BS
zGy_lR{+JGDw#CzI<!|;q+wAU5YwfPs42;-VZ?bk+z&xuCVVw2%TI~2;s_gyM6|%*r
zUZ0yU-gu+CLPJB<_3eH-TYJbxyYp~0w0Jo0WiIa@<o%k+@mA>c$OsU&brfi}=Tw@G
z*7$|ck}>mJd4eO&N;B=P?UEmq)pk*r<%@V5z!?sesR59qU}-ni0q#LEC|+1qX#^z?
z#wrq}LM_Q$nlFg!jsJc-jAm!u5&vtImL}=wMXachlQt_V_g{mM(42n6hu3kcZ=1Dq
zNj&e{aW{omOdl!h@Ro7@b{TNAn>KBY!5u$#I)8YbF)T(jcmZO#gNha?Lu7|Byp2U=
z)Hmv{yxPB4J{rm<MYs*vng|!RYt_Hp@Z*x{G^AgM(~DNqxm7<zq4!z|BZo74yY08x
zdfE+CXtr;yTtUb%)+L*=%hv%)y>@ISsh~MvpDaN*>cGYkSjqw8P!Xtv0Sn+y&Ub=k
z8R?|T1eF8)&h2A;5COL!@^=l)Q;RIqoDsOPy-UQCKpV+E+2hI~#Y{mfLBp$mo2SM*
z4{>jcDW2po%m+t}N-K#mkk>yo^8aE|YfGxZ9Wp`(cGG|O!-)#>x<d#jJKr3q6RSoz
zbkD7?;@}0VwUl5uIY_gBSX{qsg5GWb8-;!$E=<J`>F`^1;sTFkz~fs}?Ua4s$a8{w
zy9Ez0_+>jDL~5*u<(UVSi4HqL+W+QG=Gny`+_M)WZlkm_(~1sJey9GeUCo7LE9@w^
z<{*$_7~4_`krR>)CPf9dIPT)m5Vv9C?c6mA9$AKBdPmi*=9da|;syzEV8!`&AKoY0
zHWvqcWHDYX1l-k*&^u*;9ccba#B{cD7}sVM=F*XdMUY(ipa)$X0%1<4C$iNTYC1#f
z9y@<OZ0s}*mxd!Lyx`P99brQF#T1k|lBkL);oZrz&9aFYz6#pVR6F7xR^&3kNXd&C
z2+Z1UVyPHUQ{bxa$sxu;^V{z{tBoCp=O=vgY|eo^RAf%!08P;AzYlvMxmcs*|ACic
zN2}BWo7h{F64xAq3_fgNU}HbQa{N6-)WW-&nyVo>m%k@eLXRj9jZm1Y3=#*y)g#MS
zg~S~CN69}^<+uC>Q@rOAMbXJ9G}xW(K!rN#YQ_p@vy)V|n#O31x$|O5d|={LpX#o)
zI7N*s7Ca8hz_%5Cn-Uyx+;Od(<*xJQyx(X3?iUQ(=O9)!bC=GMP^moTy}!5soH84I
zRB`2pJ)B1Qn5Xj(ud9V;(Iy2VZ5a6Q8ZZ0cs?DDtWroVKfkT&&)c)x-`}N^OL@jz@
ztxDmkGyrQvfq8Ee+PT~cNg_sc-+*>(I6rYU0}yS6onxd05M%JQeuCLEcre|aGz3!m
zp!AqODJ+X=SC&L{)Gt*Zh1Y)fY652*<w(^K$azeDpuVsPo<~oN0^Tl}-6bn&s!OfY
zuR++3EdTg5S6DD{!eg~(+ui29e9=5ib}ye^FJ)kfH<irv@ip;W+0hVJiB&A>vYdhI
zVQcQK^{UI0PZhU(FA&{LkukHNDX#9Hf)5>wt0d=TzpB4v=e!#<n4-*8s+jUPLXHW;
z;jnn)QMI-Jd4X~{DpA6ur&{Sk$sz@*Lg1R(f%qNHnSD_s4I~AGL=$FRU2tG|w3c4B
z&URf7XJ?Br!N(S5x=Lcl{#3lh{BDcJb0&lD&DD7XDeHdwMYI@~mx^Yc)#(mDXP(S9
zaWVuNu4qM&hM_m_{w8S^wyKH;nVmRI5wJX`H70y9M9N>Gy)BW~!s6wi92#6&^gXLQ
zjD*nFJ~^|Bas&Kbt=8*XZ(p}w_Ft{Z-(L7D%o1-zkABNQxJ{Ml;8TBN?nD^QSUgyb
zk&Hmj(<w7P2CTSQRTyAD!jPZxlm@s|wGw*<XDl|OpmJW1h|s!GqncYCw?$y<l+7ed
zcuoX~ZsgdZRMnvHgz5}opOR>1W_4y4IJGHtmo~@4RIe?>JPMAj#a4Oy(tU!nd5%4c
z!W=YXjSiY3rKax<gX7?Wxb0?=4_26iBxtoH)Yb*pX@;7U-k{n+2Hm7?ghEBtf3%$K
z<U@Grqg==c7TZyni*A@mFVP+dMX`;(txPq%F-x1DcgMrE+v_)^!DhEFl)y0>E$7V?
zYx+j!eK!%V8?v9ME)QOrizbL-m2P{PzX)6Ac*|M%)<~rSM-P0hqxB>$P^NK^+Z?HF
zYrc!!rd^!}BRY3gcbpLAh<;4TkzmMZBqDsKu#>w=?F5Zss8@A%*|YdW&>5$OYogQJ
zqsB1S(NRADqJidsNjt*njS7y|g{li8W`MS-vx$FYcxdO|Y_;U~ltlCtGEx!Z)q}?W
z0VggFda;b0lH#WP(6$gXSi-@{^3U1;liMEC`2$e7AY%+hqF-{nm{$iBio}7a4^H>a
z?yr0{JW$s(Sx;4v<&I2}oer!hP3R$uXbCDrJpp+$CTB>j*foA1dIDz~ldbm!s#74b
zb2IZ^;l{(mr-V3DFjDBV7Wsg*a=P$JxR2YKpl@(f5jCxp$}0JEon-#Re4U0q*sEM;
zZqgx<^p=yuZiw|xtqb7p!UAef%VTxAMZY;kU^(2Vl)fRS_Ql1Kd;ZO<DIxy(%@*SI
z#kp-924W$a)9!`DLS9T>PTm-0^&AxMvLd|Ed|`{NpR~@1zIF%G&bHNooz|r`>&&z4
zjM?fBZ8%}?Ez;)?fr-U}Ch>}^nW~QEJ$VEiKgHDUKc@dthw@!NQa7f119i3{og$s&
z+J_4<AiuOE&Ek2U(Vv~vBOb_^oZ%jtG&IcE0N#ed%Y+F@>22kuSZQ6kKs_<xtn+(k
z%a4N~2e4$klhOUO<cIn({g%UU_c-F{VE400WQ-E*<f_Y#nZNcOB81lA!_ms^9i}>$
za<QUK>D&}y%xcx^>cmwA@x6oySc1Cq_16c|%CV<l)Dk~IRhSlKXZY%~db%ixdwQDC
z26efVM7i1Huf7(r&?#5(wsk}BLC)+v-2V0aQ%)ob>1sGuM|LLd@apjDWOe(8K7m?A
zv^*E=6b$sNFr^<ug29#qBy3J9dwytt6Q&pLT*3({1&S0Yw>N@ZbCzH4*YJ6Bz-uL0
zc|ldd%S7V!#537mZ5Q#WY$oj^@-|PKb09GPZ1IO!3pN<kFVW~b!{=t=;GDucuFbzE
z1vl4^UyqJ<Q!eI^(clbTGppT_4@3yBXSjG(Z{KJlr@h(z`+ndma-(^?II$M>_S47U
zX1(@N>E++({0e1ANQ>)lUFdoK99_(HimIDc@%1qD_nzk(<4@cl>)cO^w*~3z1*eZG
zV1u!6bj>cl*3V@fKc{!<ZwxT{vmJG?|L`ELGMq8|2Dxl)-uph@pXq~QJ9GLbINps2
zLcBhikqK9CVMm)bcI!d)#1yp16jt9&UJtmy`rF)K-aQObN><OemQy=AIo&dy98GRM
zjm~@Lw^1v^3r?f4@0S#x6IR@?uUF;kmm(gIikHW!gZtB$>PAMjhNp+y=5xDWKFC73
zc@b%njm%!`Y8~n2PzrwY=5rU-4sXydh3q6n*N3)w7exrF@t<YuX=?13b?<vWI2yh#
zyKd*_8bq)gY>A7!&>UDXYSFR;V;2_(R#{UnrRzH}6yUD8`OuR*N4pFMK^vm!be{T%
z)M}cr;{kBJu{chF`Z&*a*Kc;_;r(R|<B-1F`=bjg1S5c!C1nogd=#Gl>}&TtfsM0A
z$RIC+-M!T-1V~d~_ra#N=bx;W$h7m(it=AZ2d39Bgmyr&;?siQyU~Xu3%gR%_5eap
zOxyicfg>f1_jy)kAhvq~C=(|m4$E+E<{Qvzw~IEqm=+4ufID*+ciL}Wh-&*6%#?-(
zhGBZUZh{tvvKm$($3D;&)|QfL6qxEaeX1x!9X^d5KP<Mh3=IhfSQjh)EB~4xgG|yd
z)^=3*B5n!EP7R7OoRW-rUF;qp5^Y|uTJOU1D<1G@lg87F)x@ctN>tj^ZZ%&1#`6>_
z^+pN&E{JJUcbJls8A$H4$kpcXk7lOS>(N|}cJ1SU=L*j42$y4;e-+E=E!u31Q%Qy|
z?XGVW%>n9Zoo2aSJ>HI*5=vmubo4&pZPR2(l*Ej$yq^fhMjr)dgRMJQ_=vQhC=Z+a
zYPS}`3hrlE_w4Ummf7CjvO?V~k~y|%`O7oewpshja@jRnM6U<@$llGBX7|;``^QfQ
zlj*~6RIe5J`sKPf_T)GvRh@}#u6(mZ?t)CJcU2k@E<TS|=B0UdyQ8uK<@#YQHDG|F
zYU18X!>hjyh*>Kpv#ZWr8eQ&T8dTt4)+Y;&bjEgjCM1u!()AjlL=UQwug;?JcBd6j
z&ky{WPHvs4HKmu+LhsdIV|Kz}bXI1AX?1_V`@o;aWuk6J@FF6-HFt?WO+2?+e3#=_
zI9h#)ytUHX)#_pj<dPH<irb6i3hB*+gNKq2NvY+joc{d0{B$9i`6U4mgCOC#_65ZZ
z0SaZ;#sF>PeJOf>Ss&0})@MWEVD*iHTc){u7VYF)x6`xpWzhrxV*e7&6puFhXSl~c
z7?~vhycyvP1JX>7x{F<eav|-xjR%qipxrr}y89cF4iUxZBne_bGxa~##|qc)2B#Kx
zD0)LKMkwtgQAvu)jlbjNbA90R`Z%2KGvv9+^?7~a>-I{CPrx*UO9{Py1Yy}$w(o#U
z;q&@7cGY&fiH=*7^L_!(iB672*Au7^;E2Z8Wj@D~l)%ow709l(cr}-9Y8L5i0o&^h
zZ-auPRz2+>y!LBS$gyJuO&F;)uh{}H7E+?gK^x`m1dQP_j16;BKtajDOPd9M&wd64
z5bz2gF+Y9>PXLF31vO7csNMkUogKjta`|gfW~6M!ERdhrRvIRM$wbeIR@1EjGI*ic
z`4+^-h*ATB1ZPCz?AZ$R_HlCcNOQA*(rYE@T7kv-!E%$xB7OKa!_z2V;qV*Xb?EAE
z-)W_WTbHADmD!60=dL4qAxf%<z=%c9AM1yaI#DEz=ndac$4=;F_)#+idp$Nc4w<G;
zBdt}3>fjtTThG?6&qJu41;<TPSlIchs49e^@H3U;a)z=|f0_pXUF1X_kP?MaKO-G`
z?s<~$%8Becl8=^~&nh>hh%L&?X4L%yu{YoL{2AW1_ami4k8N1~=i!JsN2UvmRBqjk
ztbOvEwi*KT?^A|g2PJtbf3kuy5@<zX;w(Lp829M>LiV3XH@&*T+f6())86LO|6m{K
zYS-^3(_h%9INS-U(af^3QTYd~EWz2t9vI(~IpY9=4a+QAV54+WPk`@cN6+&ehP?UA
zYoTRp%YI`QJV=YyB_pT*`wj510jlxz{OMr4pffr`%exley%i~&2nJiQRRu5Nj#Ru2
zO>CoJfRqH()r*9&&}1<9`m0)bxs%@4$Kv77@5}M=cDaED=Re5KQ%~;}D>&?x42%t(
zivA#P#h+M@!$s~xb5qpQVX4qDa~AZEz|Aw<IDW_^Y*35<8Wv81Z5g3vGT<$~gW5v0
zzF1A~05n{yT4r`G2LQr&Jx$_oJ;DEjeU>XM``8*9KvJ~?s)oI`9;o7L3r>DvpU3pE
zEZ+0z%zML3U;j6$JpDnEG7CA~Iy~Q)|6rfjShYIS)7Z_%#=6QJzRKxs4bK-OK1Lj6
zK9B4Rw5X30U&F{HT4n$~iwrt=7eMQgqefS+mA^iQe(QMX=0w9$DRnOR-iO0p5t=OS
ztZNFY;cV*sc-~FR0K)Q9b^sXHVES-ZXsPl)olE&Ptz@5Dr*)-xj1GAX^QyM>Co^rm
z-I$q5d{!)OE1x#YdkpJI{a@kO*m(UE-v20eYT~ycuHWyVr;#7@&%A|5P4B9P`k-^@
zPfUt%X8uy%gl9CY*n2YTZHvgx{x<O6pz@JzoR^nle5`h}OUw}EK0nbA?VO_6u*6!t
zGcFvKkoYah^yV7nXh_e{v5)(C-(}tJ4f-t6ILi|gR!wMa%!&J?H3Pxs*`wk^gV!xg
zx1Qe<d=6EwOaC;_(yp1=F7@#GV{{JNeYbdDo7sppnc9BF8#1ROrOSiE7$y_k$@znD
z0^W3q*v*1LFPINcAfkl`fbuFN!i~_A1vH36J}Bm-<4_HOrX+SQ=1JtlIyJmPtxGfz
zrl&Wo<V(6vYpCT(o4%w`$&+!BcDhg{SqytBPfW3J1u3SdXs2toyG)}@Hm$-`q~HL5
z!%vvZfOP3TH&`L-)&`_AA=@brUq#hnNSe|>s;b>q3T$tf|7|YQv)Zm@^4M?sb$y=C
ze6_L<NsE-_qYSXCfKd+@-!1)z3+}4wan-N;G_3+yc)%AxBUPa1lhoAwW@4gA!{<~m
z8%+hB&060g=dpGfD+wh}O+82koE>ngwX<+3&zR%=z##u~ec&Ah+$;qp!MbHxmE=zZ
zY}EUX^G@O5JskI0+Kj1jePa-P#&SAjw(0Mt2mksYUUA$-XGlhlfyP0e!q=1mi<{J6
zH2*$=wKD;+fOIsOovq$i%|4D9l3A9EyK8Q_;x}Mjg(`Jp(yPTN;tuSqO(7;EJJz7$
zQbEzC390VEK*XfF{q%p`b<gV#5PDi&pv2plAz&iJ*?9^0#Rr;)BPF0sl4$0`TZe|l
z^6E*yMJD-l3nP;?-OgG8zlc3gNZ`eH%)Up8{(_(<F4ob9hj}W}@cJ0%w!_ehB_G(k
z5c`14mk@`N^pG6Xy8PZO5Gr%LKyIP`fPcM04@5A~?M7iEB?2m~UPpT=Je{O7`JuyY
zA3$Fc{Kk+M9E)?-s*<HhSk%tLrl_kv0S|PZULt}$xX3JKzfm}%8{LoD-VGrZ3lnak
zozzG@{@gL<=v4xo?vOW$4yNoSW0084)AE^nrZPs@Cb%%FzZ%2WTp{^QhXPdXUT503
z+KCo1GfkVbt7g1q$lT`LOc%f@Bmzu4cm9hQe(PPP?Pk=ez0QO?LrKvgl9fXO#`QDe
z<0V1#Xd9W`F1)+KYF(y>gO&I8Dck-Y=5^ifxdXR^Z$#6qeK~#_n9O`772qP<W$Y_B
ze;4uk4%qbBB2W7_o7E`Co3}!@A?hPrxQA{p3Nf;k+FaYauh2?+EI<{XT~)9bh^pVu
z`uA=0U<>quSs-<L;ZmZlI|1{O5K%%HQDmkk+oSo_5v*$dPK$ntaHZXP@aF+en(C@k
zt?wX?*hv%WJe>N0U-tJ>YzZ2lSID^J@8qN!znFGhaC;#@US*Z;=jy?(Ymp0qR_)O;
zmiK+~Wjnj`#ahm0yVv*Q=8Nm)O4eqx-DS7u`or(qL9&0*uwP+mp;S~HF~tmu;MY7Q
zZ-YGO+wR+fjV+QgncrT<*+v$-?hc=xm&siEHGRfd`|)<B(U?Mk8a4NpRazQ!f%JHZ
zL_>zB6@Zq=G{BMa9VIDB<OcCo0{|Wb-62`_VAdfsj$j9tCc~ZSD;@VkkBt^z-jteG
zk(x`LY%%qv%Tj*h^}aq(PT=^!$4<bBx?n~33L>mM{rlT%Ma;LENV~vzG;t`S5Y`i#
zc?RxN2LNbkzKGatqZNNtA^vSg&~Jjj>D|vvsoa;D>jELd314XUk8@N3An@bU1_$U;
z=MDJ>zw^U#OFs5cF`()%{<ixzOpGG%kc<VHtEHE9+-1}DEbE=g3#hoE`HEr3*G*pG
zTb}A@Q^XVjg0pJh^?Me`*&0CD3yCa$Q6HZL0^1wdz;yua#6^r>)Mpwy$USA^&+Gyg
zdmD3mYSe$IPsx$W8Ohj6KEc5?z&7RKN(IyOiM2efz7d?(d`>_x?12(UD^Ly-u>`gj
zh%K(lScGPaA6RkrR$my@^|(N#;KIt_o-w<z!a8$}RWVijCp(K4e<<9^-M0b1{XVb=
zeL-I_e~QACkt&Acw1x%-Yhh4S{Ya918VRE8DS@UXWGn$;?G8CalQ4O&%6*%tS?&n_
zA<hmj!tZr_pSq?-b!vhHbk^w66kMs(2Yhk|29R<wh7<8EmxH$;J)s+z2UaEQeyf-h
zpTQpu3pBuq0E(1se?$xgz~ue`k^*O)&Omh;OJYKjt(QF5vx<aBUh88bYk$u3RgR`d
zdiq9r2^E1dsv900>#-JSCOHL#5o@z$iSywX5C;?@@dBv0YS<q5JW;qTKOCAw?QZH$
z?)S)GNK_r~0F;?A^PY@A<XQP}hQCcHJk~3{`M(jg_yB=42{+-i3DgE>93;MNa8VPn
zfG8;5@}Sf`Z}edtHiWS^ncr~F(7<4bcx$_N$}>_vPj(+*Di$pJrtuhK_CczWusN>c
zn)vG(nT^=FV89OaD-A@%He6&V&6;9h5F>xYGjbk^oSaFMa08lZq)C^Wx~gsim$*Wo
zb8*=C2Ur(|7!_nY651&K`3634pk2HPn?*5ZtBzDXi=B~e?mnTQrr)O)f^>58D>Dwx
zNH!6p-vNop>vW$mNMAp;@J*jM!)OC@<y8qtit*TjU&Za&Vz<q|z|9#sStfTOz&v^C
z;TD+|Tu}vpKOh^?d3mANoQ)T@rZY$hdd~iqT#L>n^KhPv=wdXypXuWHWwuArj0u2e
zRk3Wye&;)&cM;{@H}X1<v9*ivwgTxWZzy5l`<B(BHZ0H1P4fL-v+s6fEbKiCrcM~G
zofZ=f?wJBQ8fo$eE+|fbB2T>fLdS-jP|#f^BuZq4wj`gtg09uA8%srh@5b_?r&nSt
zm`~(KsV(H9P#xPY5w1ak^=N@g#}HS6$p~02wurG))0F{-9aXG^`J`(c#YL6MZ9Q+n
zr(vYj7UhiKwl^M*Aa_6VMU2>V(EGi1b<n@zWo|)FCcNFBF2z*boEYi!O`j;K8A?#$
z{6AJ};iBm^N!9NF?o|gb9^w2GmkQ@v4U5?|XVfJCt)At*+y!qh!Pa?y8uK`~^HxOx
zn97WOA^Fy8@cFPSOCg)e)U|*)DH~MwWA~Ept15Y)`cPEpB#j+9h&g;?h1Dr;iH8J}
zGm-Ya?BI%?uddHyED=S)0On4uL-_Le04ot2FZv}^r6o%oiHf^e0xU4%yP(?wJXz54
zqgqxvFjbh*cx5Qoxe2V>&5(zR0A-h+y(vIf4J6a4n2ThELjd5=M0Pl6cSRjEcLH}}
z5N*zUxq>aqvK%8Upcy)q0+?872hmDgmF%!fTqjRxI?A`TT3%8Y+XIrfK?CZ-o^+IJ
ziXexP$8BLJB)fq0^`2VKR3#$g9;j4)XAS;_6$ulXMrod<l5fr-(##=qgca44G=?ga
zYGrls;!Xb~5Q~S>^HeEdRH@XUT4ZR2s3Mn>jMtdafVe2q1B{1?3@c1=5NA(D9#N@E
z!MUbo1ZGom%;7BMh&{dTdnbNfA1qU0n@oj2xzHA{;mM9g$djAm+1Z^ySc!|3AZpPC
za8&^+q*!ihH{vz;G})H0i@G7&4aj2~<4S>k=wJE6X=U1(;6myr-QghAkHThMOdI7A
zGTO6eB7!3}^T^*&*5HzsFSU<_&>%tOHDvjNaRa<$EA~kfMBY3zewl{*el2jthQ*Td
zd5*8|%=XncD4%{@G!!YbflR-!&m3ChJPcWau~D1;EF(8Hao7DX2}O9MSUNjsG1p!q
zDixF5B)bD(1q>OJLN8IkitN!Kx^%}~%n1IW`Tipq0XG*sUtigl9dy^`;`+8Ziz>(6
zbV_#hVP;gX9vNWn{ds^@;S!`x_U>}(3iTiCL#_6VAfova*zpVdh~_Ea;uuii$19;6
z{#pCus1-EB;r5L%1$IOe^fTW11Hu|l7tmS>!eM2A+rTe!z3d1(g(lw-Q0o}%pvDo<
zdq^UdsMm)$j>7mmRJ#R{vi)mx?ie#gs=|(76KVVikg99AcA`R4T=|Epo<ST-_?w@u
z@Y!2MHOeuUPz^M1*u4T)T?O$pDIq2WS&Pm#9+mP-^#scfA1+ES%{rkIN0=non_}eD
zUd(+wP}=*_yt!=`&?L#M?gld02mtHqZ$*-<$SYCkt03AWMPKPR;=V2M->#YDjNhpm
zmMR{#O96?~YB%ri>Wvu{_Z&EKOoLN|&Ii?aN_<2|=^jDMLQWAxS`_fHGpnI2UY4Pm
z-Tw~OMs>yZViK8?(l4C}ir?+?f{CGrRxWdqMzxG_lK=n(K~~n5#3v?~2@r|gK(3%m
zHF*8Zhs9ACW+*vUJP)4KFRx=*FO|JMu3NjuBjzWeOlXLrT($JgqHh`dS`e&Urp)WQ
zJdjJ%yOL>(x>Q8Qc}}G62}b6wB&K*Kp7L36Np8OOPGZ#@ND`@HRKJ03q|nAHEji_y
zzKK?RKh+FILl7Kg+mFp5cX84lDcVRJXjo7__xAFVq$97*0LZdwH$+Nu__=$_layug
zjYialtDr%mCb+V{((NsOVIOV^`3Nk`lxJiF+j5eFd)tPU4lbnmn(1)TYF77{2|5%g
z$`%6f_cKz*+Yib>A)KYP3>kk4K%FdW?R}k<--1_@C8-m0d-GNM$!QdFe-fpD;#_iR
zbkt(m!!tdqi9K#1=KsPz!<V@<+dsDVFAQD}Cf5&#<k6AMiTcQbH4D^!#4ZGBBfFml
z=bLyxPNUzyeJrV1-c?~d6|7B7UVX;GBhmoT*Pyvu;(5Ru)W~Dc7;r6AQ$6qpRD~Fd
z&E16Y0)JAAtK{HSH_K66f0^50u}P#-nC^JeDA#c3!Jaq2PswdR3h5rJ@Uncqzf6{V
zo;^=Tu+A6zEFZ7HZ2!4F0{^)_5F~nGxRZE^*~WrRM#!Tu5Nt_ija6dz>I{jvj2K6j
zsVY>FfA(tN?49vrU@SGh$%BxZVsvP#vWh;&S95V@Ef>Cw$th-h6+bP@h^Fln-)qWA
z@RdH_aP0VgP<G$$e#*O8Opg<>UWi$n%8TaE?5-xR3`d28l`+e$1+*=oq{LpsOP>B?
zeLU;#4n7}ZSN~&u_<Yix{S+}!z+5)m2<E*@;rUa8?zF$(sk%VN-kW;a#r#)fclA1m
z%&?ZNAuzXYSKltZIn_)dpgaivc0hBen)gMuLrK+=G4!U*CPGndpDsVRhq@3OC07vK
z3&d)zBKkGBo5ot&c^LLk*%Lg@*!RXaL&uJqA>k_Mh3P;xv`e#CJou=xJ2!_FIG4Vj
zR55oo6exrcIXIFMjR~z)%kaq~L8(9*6<IZ5vQzjQcFI|a47{dGViFVYx1+fY2N_bB
ziz!Oa711b`qe_v;qnS={Op%@}WH3fJi~TX{itCSJN`7Xnb+>UcfzmUWv;?!tA9c@=
zJQ1|(7V!|hxFUZPH8@{=IBN2_pfZDmCxL{wxIF<TkmBdEmwtK(!tc0S8s1noquv0Q
z=oAV|37I8?_><DxtN8JOLsCAPgzS_X($adXJ6e@KXprTK<UKbvD%H!~pyy+12@Vv$
z+h)@H8SpZ{1RtF*`<eBM+7=I>WNt1IDMknxKFp4xFRkqsBOdgS?3F#-#-uhMrwYSt
z63z+muO6?Z{$D;pjH;7ItCn=g^%>&}^24E5T4nU(qN|~cj;kcOrEm>_7^!doemyGW
zYZ6UL-%NT02lHkI3N{W!bRd3>JR~V=b=k|d@O7qFT-3qAKWohP;L1t?C+u>Q?Jahx
zT08w7jFp9{=it-QYc)RV9A9BJ5u;9kR*!oxuL~-Q`gg)`Kaqdhdn(|cB<(P8)JDSd
z_!ejE$q&%2Pra>Q+k8Isd$um@=hamMZ}@dEXF|CHd<f}O4+rp~`<gTRu&eZ>n`L3^
ztTFFzo(viy>gr&ozpRgp6&XACJI_{o*#}biH2BxV0a8+!lR}X^GPp^-G#I_DN;`YR
zI;&C=@G0Wn5eXY^@XVou)D0Q%C}6Dj3yBbR$V{PpmT{WVf=ALJBgP4U%O)!9!f%qH
zb@s~Gg%{*7B)egNEmyN=wB|8rFJUX~;^2KL0hdOgzbWj~JeRYb-nZNJ!^6gNB&}wK
z_Ypvk@UFYp^~Dad8;BrOXywj9J+or~@0ay)wUugsZ6$NNT_h%+L~@d<<qZu&F*c&v
zNo%IdDfIBd{>SCYQ={VvoBr=8CXFKB3OJd>6Gme;<v~)uyz|_U5E5U7pVXlMQrvVX
zx-KX{)P87v8x5Am0A#P>QWT(q0b#Z_F;jw4O`Idao`zgV1BLv^))0~x2m97+8ih<p
zLPMlZKaL}ZSQ2f7NcNpK6+9o6{DwL%h4q*9F-e4^Bw2(%w4UccclmPhh1pqq1$or}
z9poO-1E~}^*Mx)o9)rq|ZRIpXY=!Rd!3dgBrnXlW2qfOpM23OPz5{wLn$IiYLkGI#
zjyMVv%97LG-A|DN-4olCR`3%-kYbnOkhu^Yr?Mh4#0q=$ut06^>?QnFeTJIO3<`o^
zMiclizK$o)2eTvzn2j}u(He~IiRUWsXxx{*wZ@YolP4%!_oIYK)Aq3tNl@NfqApF-
zI5-0|`29U%Qn=Jh!)jWx8b_#4(O8%P&%^6Bn!I-GumjqWNo|`bNZXdmsrMWtAsPoC
zaldF}5lBHSG4fJ@YDJ|ob&F4!bi5ZQlG0nf%3>hyf_lPfK%ME3`vK@MlxYIlTpZFo
z9ym!WPzoYsP8lN8NL<C+Cj_LN2##F<@iU9!SG9YzTviH}`|8<M8tz_k<{C(1)m)qQ
zk|aF33wzMqKT!m5@*QN>x2icNUfb=9d?_jl$<Pv##z;iFB7@{vKr$*hyfLT`O|(M@
zUaXAcVz&Gxu?+PU0sqwFd*z8{557J5BiFyzK^jrwuSi2?5EBd~amK@Nx%T#c9bR{~
z+Q?n+)?0a99A0m}cz^DMkS}ENmD%j6B$!<sBij>nj5AyTy37<Vkc{XEO1#)586_nA
zg^mx0>)Q++jgb3a)yFwl7OVO0snZ-baM*@LrXb8-#qf?}tv0MT`VdMOq6M31I3Puf
zdh)VAyLr;Gv6W(KyO<j`TCe+HyDA?Sagdh&+c4)(nx)$0uy@!Fr@<Akjl!9YRIX%T
z?7RGlTr6f0XwP-{QnQ?tX6*Z-cKJSj(K?y0W)HKFQ1Ms9ZQicT(qCRHIUd*)Y%JlL
zzYXYXERr$qfCNWT@c}59@DP2{4=lKv;BOhgGdiJ~{5Q3pO7w?dvG#0WGNmjY*3(>?
zfdcD<D`J|4D1?*%o**q8sGrYGG$#`<HP`!U7tUk9st<_Gtnjbuv%T@8S}zfMxvZC!
z-Zy+*-q}nAhr@H}QUPKB6jwQGS=!rFiuc3GcTD-8>QgzJQp>wMq1omF2yRb1pgh}6
z0J`BAX4KGcXMZ<`G(%x)D*HN$@T>a3=KXj*N(E#YLga$t?-n)4AN-H%W6Z8Lq9zgd
ze^np68k_fHxY+i$0GFbPntuto9$%_Rp9^<YWCLM}5C=L^kTB(YNsBZkOF*`d367fU
zIDpS4BF!<(X#~vlf2zw|y&Nr0>c1xWw4Fe^T+53CeeORGwuCyi2C20jzLq=HWl+QY
zf!m(fABJo`j=Fdrv6sA<i}>pK9|#DBVDI>xmU(6rOCxIiWUaK>vcSO-ph5xtxEwWL
zX1N-Gc-Jq1AV}kdv!C$U#^0M{=x@PCn~B;|GR1a?;$o^^v5jNg9R~k8R<b(ydK`4{
zaCOpi$xvvG-I$Bu1+5A7xGd6eo-LFy7lO`C*=8;UiFPT9*(B~2y_L~7&jGN$j&60B
zkrdrjg$dpieloJy>(62A5@lB~6Z3$S=szi&V!~R|$!B&nd_L0%&>adb3P+ZPIsI|C
zdi3bTGa7;=`XJLKu!ti)E;(niZR<IL^z^hixM)2de^L`yr@Wv^#HBiE5H13OoMZm~
zs!!OP0?sx8?TV62@W9ir>Jw^Y7(cC2alAE0Zr#$S+lFYMK{ByRo~)Z0Wy|rqqu~6?
zELVIK(vnWzf!lRe8E_3+jeeX^W`7`UHi*X?mcG&OJZ@f7l^9Tej{%bPS25H;Aqm%c
zPARI1OhFi<W4uk>I%NIzoo#UZs<;Hb8Dd%4-dp>HKG)i;+zOU$RFA=LffN4IIq|d5
z0%8Q}E4-y%{RniqoltOU+S|VGZZ1wb(>G}l!SfVa@ta`+B63+zC_2D*V5$zCaK)Ef
znrDxSzzx@f1mrCAQq+eY+<;?w-n*}0?vaYV@(CteD5S#EbV~T(%qB}zf<f^!E(xQ+
zStUt7TNa)%8F#$CPdQ8%M=Vc{kP|h6sNI=YaHEic&tjKyTv$sQuaJ~EwU9NP77u*)
z2j9DM1A4Tu1qimc(8D{YXjGheN<00@e;QhWICVjTZllN{B8u~8uJE|GyI8HVDKdoE
zE?*qT#XNE)7?m>AB?^q*lUoMTv5lYIIh_Lp{eM{>U_o|;U)G0IC;G{X19;#+)(03+
z3h@lAF&r}96$rP~UD|l!QXobZN#s8Hz&Fw4-G8ek)}AQH1GPU?kUjC+nWd^c6$1-Q
zC->#~C-9rkgzj!CKZNK>Nwgn~!hcytcvX{Hhd{J7zJ(Q4=2svuh<9l$k`BK!(!<hG
zsYfoWK%{bnoKo*lZh%|||5F*L7O>1PU-!5!2&2krxS%162XKwbJiL@S@&v6NI;|-X
zx9kKez@2;7&mJ43SdScz1z&Pa5mbS^u~Eic8#;28z(!w&yi6<$?4MQnlrOi88att@
z&&Un|Erjx(bQuRC#7FZyFO{k~010W5+gAzB?<U#=*oE>j5UKRjZqz#t;h&QZcHe8}
zj7V(!oOrQlP7=CE?O^8G1bx~jR_roCj|V1Vc~Q|*zU}q0ctTiMQQs`;l{YNyqqdx^
zZJPGhk{c=9m)u7(kbl)K0VDXH7#06WH+dl7?zZFiP%*Ed=!1e7Vh<;7xeLaxSX(FY
ztP)ou(m_R<TxkN_no;y(lg*h4zk>}3ukw(ltQV{d9Nc3pE~V>${;pH~FS)A9Hx^~T
z5pzw_{*)n2fY6d*G%J*n8<@j45Y8OkF2S#oze91n|4p&j4M7HC=3N$;$0sP-U!)5;
z@{HNL-5}bgU9XMdhIm!<g=uOkf-wHg{!f<<T-yI3>m8#bjoLQtpkv#%ZKH#ZZQC|G
z>Daby+qPX%cWk?Y$@4z*t@&onTKo51e`?=Vdtc{qoQeD3cnVhD><W7nud}w}MSQk%
zJ+*j>%*nMNeUc%7WeMhHh3{4$cu)*@znhp}T2P;MS+ZTsMWO}quDsIC6OLbr+<ob2
zeJ<ea|8Pk>f2Vu8$KzF$no4y9^6QJE@_^xz4)zx<rUMF%co)_SiqBOH0|pIFA`iDf
zQIphnV{ay(y$7FA`{K~!Gh7APGO1*m=1d)r0>utuVM8NXv3(C&p5836>;3V5fIkCy
z(0zC1g79uW1XSVJ`->r2KwoYQ;2eCve#-=Ro9`fk3kgS%FfR1jet15|ZWBEe#1U9a
zicK^^lF_1WerviK8Zk_94Gxx&d-!qBQp2=FxZM(Sk8N{oA^2l}<#*XXS^o?x8FM5K
zR73F-0uvq%#iinkPo*I1scx!o6J)D?gLg2#u&}SNCKBTOYU6Gwq1hJ`BPvs_z4xLw
z;D>jS$B>L2ap=q@tiZ;`$wXm1mk?9OwyU!-zwLk)Z_R$}M(t3JrfpyPYm-59)n|tu
z>!>;(n7}VAJN*aJWUx+xwgN9f6XRpx3&b)!ncHNt9EzwE!5`de=;Pl@Q~1{{T4GTA
zCL9I%>|AI-3io}DrA|ZgkQm7)gYSt7?p-<R)bSFT)f<HA%fq?+3(&uzJ5DdCZD4X1
z;h0qx1oB>H(zW~Y4^DJp&Nq4nh%~4#zp92KAdt7<HOUcR3DYY6GIbdmDlggxd6BiJ
z2R*H)#oNF2w2-vuc2OZk+0CYjjXtPkxx>SSWfc2|Sv~ZkI-??qm>fRF<>9epU^zI<
z?PJ_9J6kZ5)wm(|pfO(F%VHB_M<0E3k7&O+)CQd8D<>n2DS8rbu3{sUyzU+f>_$3K
zfOkW5E8(&OurIT1<(!_LN_%)9JQ{QNWaPT~wzKW@n6-1+P|Pz%@B694P<T5k0LKHM
zgJm51-{yC83oNw`W5qlHu?O=+sCu`vvkcpMx1Mu+5i@!(54c`gYMw}cA}zlL7N`Ny
zGy~W{DETj`y|bPM8cf!4&!ahB8(h2(8gwd`Vjn(jSG!LnXItwWfy1D1Z0lEt*-+0F
zF|KMx_}V3rW(0r}M*3s9YIb>zw=d%5l_m%jY+UHP{?4HA975=4jNn@#>V=i<5Ws6$
z@HrBf6^zAVX9&cI5Vp33LK$?-rk0+mh?*j#?PQ?z_10Dw1b$2nE6Sia;I^lLt`{bm
z7SnI(ALN2*hG54E_7)Gy%%HrHkqze^kO%)L47YBISCmV!wT*7a%*`Qk#JkhmyOT91
zXONK$&kEwhwN2j#%^N<ek#C1!q^pA9^YZEetgLtJ*EOyEvy7%3%`LZJ!WfA{MB+vX
z>V%mHwg!R5UTG`u4`1ESpX86%?Mf+XO6bEiCB`h65b2f5vEo(|tXy!MGhJlern3V(
zUbP9?3yFwxBqfr3g*=(ki&8F>YE-km5fP3m@*=C4ZXh2WpmMeJb-V<Ls%z32IErgC
zoV9axB~I{%yt)Uhd{(!GhD`z!YkpB~+n%HMZih((gk~}EZ8Q2LJCg!nzqOEO3L;7M
z8zpK~3)R+=L4=#cRmCdR^^U;?h+SHT)RV&t03{YvT=FaO3d{Dnr4pa^>h8b^rxsWQ
zsf{uM<x4<O$VNoS1`#sCp`^(Q<MehBOICT+#NLv_<?Gb<k@TwH--zM}vU{e)YBPPW
z5yNdX{^NaPF~{XQv4d$*yI8P@(kV}FUE}Y;;j(ePo+5E_3Zk9fXW!Rptsr0?JhvZ3
z(q64XPA&SVK1Ld@>gzk1%hSO8>;ZhD0S;aQmVd$)yC1SlU-(mCB4&kTp+&i<7~8iX
z+YC6Q{Ho{fDw$8uhm>{7gIL2b$@PlbeYblB%)!mot@w9#KfFQa5&)+!0c%v&3bC68
z)J{q*GD58eJAZh8RRvj7E@o$FW&<1D&nabmM)1FF2*{!W=GN)8vC&6l04K>3mY1Cr
zZrVvMvhA>g>0=R#+<5WsTr`C=G5!ZHD%INa<|TGK&(+h;N}*F%3bAY#XHuRS8md44
zkX$BNgwiqaUHUPeMturjAQv}RaO7;&Geo&0BZ+LFk)*<O%gBT566K%k-qu@0)kswn
zY2#kizVADkPRwfE5{a>~50x8?f4&e|O&%6Z21qVYu~!FoMUw2)?)Uu2lV;@iE!~re
zWqlsh99iwo^949lSh=;(7D(HseKlw?xHLP&Fi*F%mvSEg{E{?R562Uk`%rb)zyWwJ
z_|GX1?NtbGQk(k#LpeMh6v{>%6|C0Z$YP4xu(;99yqQgeXI8}#!OC~N__*L~CYU@C
zV&n*;^9vQllM@Is<}AE5*gU4*orOK+elor)%~5kc8f!E|vGR$*P2+n6Nlf;uZVhaZ
zUXDIh%kAlKvP;44-Xe#_(o0xf{gTDt^VVh^nx^8@-2*a=twlktNB1XR=#mTeQ<L`n
z+vG|=^m5>91tuw0Oy+ymm4t((7!~JsB058nQ|WUi46m{$*HAtkY$!}H8ZDo9ghouS
zvcV`b)~I2fF_;##ah)5?rbd*Z>OE7_-cfOtC~3cM4_<;<n2ht_T(Yv4lc#i(=v1N7
zBU<<gZrWjgmW6Q2;SjZXRV1e^vY$RmFu<vA+Dr2sXO<gIV3D{G9O%>jg^4|Z8(`aN
z4E_^aAzIlv2>MVVlINaGO&B;tUm{H@!G7{&HoxI-$SV+Z99u2ZjVYxXhv~BatpW)h
z;8_nsPQe)SZTFsc_Nyo1l}bJ-V4i{%Ix4Slg1;O|q01-XHU0P6AB8$J(7mb2C}XVR
z+;e{pdTT>LY7A&>5;qC`H94b_`OBfKKe4o%Uqz5c2uIj)jk}KZc~Nmj0C(g)OTX$s
zQY9-|g3p$DH!KFP1k#t6I0w{NqnJqSH;*D?Y^v_n79QDdg`=Y!B$PcC0tEcu8+2`&
zZsjT~q@d?nMH;9KZxYFL^g{8jHL|_wZi<3@gbyaSM)-q%Yy@>l<d+VQpdRz@oM+~Z
zv*nh_pyC0@&SS7{$bg`PNV|um-_x!m90__bqZC#kFcb+^XIKCI&(>l2(9x{!z%Mo%
z{8|Ur^E1PZ0dI7K!^M>1#^8T75l2LoRa#zS>;>Q^4xQjX#h`rWtKI8A%}<TsvzU!F
zarG%}lz&Z!*gQNDZ*~SLkh*`qWOqfgt1i7r1WQ9p{2sFNmDRWZ=KJOl(OFCwE%G&_
zg{+Uk5p7QsQLWg}*JJ|?eQ5suF=m8FkXBj@-}uC&{%hXB34$YUQ907cgsnaN1Nun)
zL=m-##5ATA`|Vn&&eiX_h<HfBSpcmEHmRnNoyH%bTxZPqw`7yy--f%+jqzBM)TB1#
zDhuA7q7;woul=`Vc20^>Klr#oDfqVc3deFqeqCWAtY>=<NMY3BzropA+Hw+Dag<XN
zyW)ejI3d`cL)rhLnrdC+kfmlX(~6U~f$YpLH+~4B$m9_C@W$Q8DoO}6VyIy}PB<JK
zThT7RDR#YO8fDB<^Wyp3goY<IB(NW+ZqGhS>-d?O^Mzb=H1VNX#TfE$3qDw@CEYt}
z;&p20dhIvPw=}(8f>oiF20he`9Zq&M!um-(i@701aA7U0`BqnoI&6(E3Pqgc7M7vn
z1s#~fjokDt7Yt(w54b*4MS66tkj<hndve*XiP_7W<4rpVm;0Z7P<$IaAbuFVT-gC8
zZ*GF5q<%79sIuUC6Yn1uyHl|jlY5q*UvYIstq|neMlF_M3aGP7DdFKg5R{XT9NFWC
zHb>~Myt%6Dj(BsZXWF+F#xbzMiT85uMWXg}AmN|=as~AkaR!OZB<Vrgys4A|eBiA`
z3l@v(Zt)JVIw3I8PZE7NaP6qWQTz(DXgB*3iNbOD-x`kNfo7ZO2xhBUP0MLo4MPL4
zo^5<FzEJ5&AbUacr?WKCDHuP4xVnHggL`rfxEWmD4--tcb;$bR%(wYb7RhexpA<2g
z3o?rmrb&b;#fumDgz*xKs_s$Y2xZcRpmD!h_EW`ulrybS4?LT}SWYv%eE#F!IN}~A
zS&nQ?{q^c3?q$)O)*x`o@BC`nq711_x!>V@k~o$NcP4X!c&f2@UT8kf9}^L|OpM?s
z!*b)PkU%MWSY+Q<2nIZAdd1z4UWOY$Fn+Ltv#&3S<J=7L2^8AzlLT!6{!)jWel;>e
z$<4@I!Zpvq3m4BQrsyaA$f6yuBBpiaS@ybBLjPci{FI$VsL{3*gRt}T*bT!oEU_8c
ztT`#_HT=m~p97Y1IGG7|8lCqr74hPSP$p)TYhuRhnQ{ii(K(g)l=MIFJZ%EK1*)=f
ze_Avc6ND`2R2pk}EP98-ESg7IDp|@wHyBfAf4sIuviVzhX~L<ebD=o!`u>k?<3hGO
zEuE_xanwICFPRsDOqoDMC~+-|FqNQVklBpj)&Y4r4`t;Sd{1A>A8<%N#x3p|*WX~@
zf}_D>>Io;VAcI8gZXDn3I4i5V<yS+13{V#Kz5w#uuc68srch{BDy7iAx&HjV;2wHs
zkneH(dqs9<qO0y^Xh@@h^k^%P;B~o->_2ji<l?{Iat(N)mS|9t;WJQTPaFNrn*Yc(
zkSB}>j`8cxVJX;Fn-9`vV#>_bVHm}U(PW4B%7=<m-8>DBXC(=KqSYp4`v;5E!WQOI
z-j+R#qKu$w{`BNrso5*n#oIy-#SDcIovkT8l4TnQGRsNRn(ip3&gT|~_Tx;V4gP`A
zQ0{$hQ4U6GSNeMhBQ+$1XoQGer?=>4I=?z7wm+gMNGhEUD|UY+Dm^JP8V=VgoOneg
zCLOO|I+ct~WyD}N+*R+*to~Z8i{aK}M+waij>GIZ!7VZFLyri%L3^WKgL$S~ERtM)
z$|bEm-nfi|L+aa$&Vuvv;^h{s4XQ>-J_qZcKg@~@sgnq9LX?Ff1#A|4j+3|&DU2c<
zmu1<ZXyqC&9N^$0?r)WGEjhMM!o3UD&Li+wvd&W(5ep&fw#=`0qNYRD6rr&d|AYb9
ztIt{y%w@rFmQY8b{=+zO7Y0mx=?sWt8y+9*t0NTahm}%IU(kL6wfIDJDFx2%E}vZ=
zJjDaOo7(8YZNiN8V@F}4&gqSC4&~v4-&g);D6G*;vOHDsIB~Cr0UWlyTQ#=LFU~SA
z>pn1MzBN7BiM_4*{4p$=-Xy15Vth;;r!B2_$^fL++r22@3+8VNyDuQ$=Pu~<-((il
z%2y|S&L7ji@t$G81cWnTx<UEdES*T$jp-;splxU<RV(M@Wo_$F=vYc_C2WU%1FQQ*
zWdl2vfy|(q+~>>%zvAt%C2`bEaYpOhcIp3-zG@0oF}Kuo+760dZgaFiq2cQJ*f1Bo
zwzNT@Wg5`QFqbK%n5$H|aE%!jcw*;~;e?C)q@TN`i&X|!*hhJ=mF1%pVq>E|urVFI
z+?%H~e}}Ex?U&ur_eR6&SdNU=F*WQhr2%}LcPom%bre}nm&-yGf+n#N%3=qzUgERc
z%Tor!3DwBuX=5Z%R#E89%)@~b+P(M0Q=0yxB5nrVcq4aAGF&gYVP7$Y2RVZueSuMY
zBV5&-rrB+;NCz)a5@Tue_cJ$^Z>g6smVYHiV=2ovCYDei#7z(5A%Hf)AFbo*u}be|
z-7Pjszs`k~CDAzMURz|ZW^C;SHDN)NB#|3=$*s;QNMADE{(j5|Z_IGEH9}T64+wuq
zKufr=ZCE&RJ+8bb7W4ZG8^Kbc-6O&nyrqFm6N|@zQOsyN1Nz{6$gW_O!=Hm1(HfXU
z>%armZ|P)${X%WTt<pXl>f((QUxveY>v-1Q^KHv{a9peX*5krzxd=`WRq{M>DZJs{
z3hY~W_Ef+-8!`=56LF>Rx}n09)Kju^n8QzC<O_U}DE+w<P2SXjaJ<epo?hw<0u{Wn
zhLH+Z6VJ;|Njyqj08ffT$c<p{2U<y{3NNC_EryH)#sY2>NsY=ut%A^c-tW?2bKWad
zC=dYXyX_$Ww%UEX2tb7VqvwBd>l9^!Cl@LR$-SvgK?vQdeagj{{3LCLrsy9P)dSA7
zR{veCvefv;$ZrjV6XM40r&T~?$7+MTr1R`9U&R1y)?2C;FcbomEu%7IshhJz*rP#e
ztd}`x38=~^6}OW?B?qUoVkn0bGUX^=oZYZx$5+H)oOAvR*Wd5#!e&kK5KxEViUl-;
z>wzPat}TP^DlBG*-9%QCiB1xGY;~pkjYQjIscZzWIPIzrDKFzun)zsU^|w|<M6K*V
zWLR0o>KkV#AUb4=$VwsdaS4dhJW$_rtvN>4&T9@~td+E0?e_USe-ik(m$hDQbvo=m
z`SnBal>D`7ScIx}mdUgJK|;TWDN4+{<>Z1fj8~un$0M79e{xI!&A+^fwjB*X%h|rQ
zB&~%cG{#wQ2b)4i{})@5Gd^AlC%xg>Uqd3PjrxdnLp7p2o3*fBM{FuA!yxu7zkTl9
zR&l|G^r}V1D}Do7WaR^>_s=IbyB3qYLX7mhB7)(|=RAjx+S$(8;BWR)A;rX-)ZCce
zk<0Kd@(8m*s9%iMAF{r>h<wU-zrzYBIfWmU5m)<(c6&9gR+($a^lK#OKWwjL$Cln;
zWqVoEk-(J`eC%<0ycNNN+clpiY0lp%-bc9fi@p1u0a3*cYGV|p9D8aEN2+YzYx~X5
zG~LI)@`)Ud77al*4#Bbsn{aB=!04s#1}wWUq`P{UEo}dO*HX>!lH2mpZ~~1H&_1`#
z&)H{>?@pMVAL2j0E+M*AWQdq?2Z@;PCPK1$qq&dxu0o~kfR4acp{wmCljqN6ppT$@
zPn!i^?FJYQ8RuZ0xUCA*jwFE&pNYVAt7ps8g@3>plrzFOiwHxRw&`SJQ+V2pK7Xt7
z3JMXQ(B+5cTx}0fg|F@U)9@>-;)BS)+iy3_$GsQo!o;F3Iu|m@tR2U%vuWC!l)`zK
zhS|bcS~Q!keW^6ws-6%8#Xt7@fTovqL;q+S>Hg%%a)dwJlYs3azmG7MuHucv(sf+7
zaa>EEn_Id44y?OBwj94K8Kc5v5~w%OZOy|E^BguB9AlW(Z6VTsKOS~!nE9sm7^XiB
z=eD%@zdVheeqI6&7AG{KJU)AwpD(`2A0Tkw(gtHlOouNZM@UYLm#Ldu4R{#wvF8Ke
zLM(;`2)-`em1F=)j-N817U$7enmq&TU#q%6&L6aNK|)zh^v9r32uMQ($_ZxwZ)DFp
zjc{d_0Qk44hWZXvcd5p9OmY~oF=C20nzIh^u4u|Qz1ym=GZgrpq$<Gj>`nL>;qguo
ze)^__Cg578RHBZ*TbptjzdYqcNW@9M{tNC!|G(hgot@L^;jYcnuCXfF(ztaQRjPyh
z-`e#is*ABS=4ilcAHz;4^woby*&5y;w`~^JEB13tM8FOI$6yrJ{%*%^_t)FS=dtgL
z1~Fm}{(<1Zk5Y(qV5&$7f?3O(rhnwaxd-}X9V47j1;CrWeb=A9U7}h@O|dV5+X-c*
zw~k?6R*~h*XWwZ@IJG0yX!{+YgD@q~_mpW-DcRniFO3%MfKN)msF}1gDM7z9{8yF)
zoUSH}>n$df3)SoS4XPD0l+n6Q16?xgU|Y-gwv*GT&;6+J<ArU^%Yfmn3&A8<OK&=T
zjZvt@z`x7$Xk&0o0JwF3n6wnRi^IGq{tKS}A_<0oKIo6&dqbXd$gZ~oh&e%T&@-w3
zV=m<p>;z^#N?hq&&HBXJ1^cRQTzS(dikp-%v9W_@mpA6F52n66Q*X9bc3%(^8?J86
z*sfMwADJXh{eR^+3awF0HS`4=r^hWBY){*Bm`u5Qm~kkJF`JI3M@vtQo4rkP)N)mN
zK7UT}$lnXmEO&bxgn9#Y^gO-(LyEqyNCO9v*f&#LCkp@_(mqGLx=RZD^=0xnYeeSL
z5e9~%k>OKZlOA!URB4&e`n#^pZ-}C6jD4*Jz6QXzoxL2KEiGQspl@|JEB%gdpe7?}
zv+t#r-O;hZah*&Jxl%xOsX#=~4M!v_K?G>Duu(8ntLXjPD@TwJo#8Ftax7}!tir$3
zN;$vpFZ(lKWwRxZ5q$h>tAU7NT2PnwE6VR5ERD6o^cQac04t%1=^IxiRl7{Of0}mL
zA`QU{H10G`)ei_si;8Rk3|-&+c-afA^@TyGw(A53x!Acq6WOZ%IC!~h%iEwTv*hFl
zzBYxtZcd>YydYg#U7j_S0CzXa{44+CPGSpLu*FxqJo;e^A&eakVbf+0z#?SG@x=Ml
zmsa+@dw(Z@z^3(eS}F)6DYgqM?=$p#`WpG?_NN_X=78LSF3V5&jSO!6YVm}2aXl(M
zOsovS2*2By1Rdg;&_v{{OybksDZv}A#M2zTZ|~rmo>Av2H(kUwUJx_sx9=C9%I$>@
z((TWymr5nv9%3?5`O$)6T&Y^UI3qzd=Rwm?X0L`o?eZ9i-QoH}BpIg2>VMO2JVwG#
z6Be#rs|&X&wX=Z*?7KTEE&@P}2)wn(P7_1kcJ8l|j~ngF(Ir62*vvn)Asnwk3nQL>
zn|~up0xz8iB7T^%Vw8oT#zLQFo=4I3s&@$^kn^DzQO7OO0*ypMps*62D1Ddfk)D$h
z7FnB2j;u<I3S%oMx9B4vG*w0?)Vt>7CN=||9Kd`-*DNt;>fht_R`htn0H&|{xtwOE
z&-R=5YCmt>qxh1J;q`CcP2^AZisa6Ry<?|eH%;hU8dceC_ueD3*JLl>L<E|930;_L
z%1zOl`yz%Y4_u{05ih%<SB`IJhOo70m+SGs-Zs`n{`=Ed$j0{-DH=8z@my7oPz?dg
zYcso(z0s4boqMYFw|pZp7wHN`9IE}rbhg(GlMCNshN1h@`}R&_KB?<DE!Q9XjQcIO
zxt5bVJ9T+KF(Xa*7&rVtJ?jnhe&|2oo9TW;k<YhF7_62p9Ts_phGV%Tt0oTn?KkxC
ze*pKbBr-cCqw_?tt{&8kg~nUhmrwP|PW|-0UiQ~uzoXwD?{1e-Vf`7|x!?S&<i9<E
z>EDRp8Q8e0cOB6s=?_oOKT+PtVpjyWwK<DBEg#XlUAGrbuf?uR(I(7ss3q_PVUO@g
z>9>a^+wH&V$d#K|@8K{ESZ<7ib=Rq`|E1b2u{CoElS+FI44d;aXv`x6!Ni@O_<wS#
zq3jbhv%jh~Erw5WPw@{xDB*fl=w<m*Xr`opg?oZ&uZB@md3n5<MP7$t?Y_fJmw<)f
zF^cH<+y#BSxbPG6H;x2!2dVJ~Y~|*JY(Kqg*<H;pHs!tCp8k8IoT9Jv;|xLKq^N>N
z8srivCc@r!UXPZxy?m<vcpC<3s$*7Eh$I0&Jr{nFh<V0s*}T!EF096NNCeHLlwihf
zvNojo4-*#v-NrQosoSF&P!RMU5__c)pef`%?fb*xdArP$PSTw!XzwQ_#vKZF&fMIH
z;R@K3Rkea>=#xpeT&l#&=1=+U|1xhFA+8*&|7G5eu@xwTG1Cq6^rIaQX4*OkB_qK8
zc-=t=u(?4%$b}dEU6<U2VG@Mjf|KJc;o?ABrOYKNIRH)}0>5duv#0-;cFP$4-?W>U
z#Y5}Opa?qWW*GVjgW8YYzMl@q$A1*adL=2->{091ET0T+XZ#u#6{Bk4cg+vEKdO}C
zmiiuP#&vRVv=Sn-KlL&avj;52<>bFs%c#o&hAJjccT;Gmban$fvp?-LjjrAiN?6MR
z@KiIG`VcB+PIobAXLI^ylyYp88UUHL2(W9v(rR*oTGq_@0*20-a7Mq!yWbx$P8%Y|
zkgTr5zYF#Ns7$t);n=jsTGel>ITfpS-y~K~3DC7O65=z=q`cBNox_EF&*>8({YtSd
zRzn54u=TPI1aLEm3KV|z%Ou`=sL)3nrhXz&HMvsJHh=dI*t(2(d@ioWi*g-b6ZPjg
ze$#Bd|A%JF(Yxsx{ifNfHE!f`Cu_&ml3)>D%STA1XxY^3%846%+%UVjJ#lundJawM
zE|Zp2!PK)pM|*+dPA7)rNHpV19yUKwDV{^?Ni$-&Tz)I#8KGq!a_Tn-PsC`jQ|AWi
z3U;T5Ub}$@;nsQY7vjdwL8GraUf%BEn$Vbax$a#$-=drxMTD(==R{9fOH&z28AOE+
zVIgF1h@<h%Z4+UB@CIk}v2>7wj+siBHo3Zyo-MS#JS2QO1-SjxB8OHI9#F*bRI*23
zTEmaGyqXF4w^7{Yb<+Kyw|t^SBH{^>@hgTkUF@K@j?E3$<8qn$RQ+x}+2Qu*G2rq=
z{+VibGHAA`PQBuZ)?%#dH;JoshMu9xJ0ZW5|GGlKBv;u(yxNUajri27(2e-esY9c;
z+H|442&}U?1Ci0TEd|;@A!#1J|MK@ts;wii1+87aP3iXvwy!JxSaI`SVU#4zN;u?j
z=xKS`TeutMx{B;)8n<5ImH=M{40SztM6<JF8y_Z;2#0mB8O4W<hjQIp-yXsna?4la
z`)l{UDe{V*@`M2)%q!v)Hn@o9OYHObTiMy*Byy?b0cHOB=>k(ZzQrLp@bNi0`Nka(
zaOVFGy+#dT;b}HF7}+K{W1MrrRHw@rd%VpQ1oXQ-t*wqm1}|`|Eh2$Be11f0g_@5y
z)C=%;ZWvqfdZKo&qEEll_2NfZt1<#ZbBQ~hZ>rTXGaZdAe$xMOt-+*QIn-omSqs!@
z0eFJTuMh|d<9bwJcD0IzS)*uDdssL80-s%$l9+?ltI*@VNiWV2CFvwnpNrA9oZf&Z
z(0;Fpg=<+KwdVWYT{W%12fpk4SUJ52%eB%U*4L+}5l`-JRRact4Y@bVWf;y?D^)1U
z>K~CUBKXDLThtxak334DC+z@fECZX8+Qqr^%Bo7)b7O>MRf4l&$g1+}#<K%nw||&_
zf30I2gf3fmV2Hx(L3HAj&nQcITLO^g09FmnGj#=!GP}KTProYWOMylZ{u!*<T!$$%
zb!|tuPjr^Hk&>m5)=M8O2zJ#yT&0L1Hh-fk@B-KbSXzdxYl%wTZ)-xZN*8bN9tKq3
zP!}5{WLUD!0<0_xB6iXMn1gTnv?*R5KgD)6t~_wIzG-!dT@m(eXfbJlF4~&$=<H2p
zv}pldO&hbbf(|mA8UC%=kUjd43j0>Cu=-S7j!Q+XgJ09U52(<eW}2zB8#~#`!8Yep
z9i#9DE^%3Ew|nrV9fECqGc#<}P*fTaH7Q@E>qlok4zSdI`#b@i9}I0<xsaN6w^4lS
z#xfh1kDeKm%@rMjrKk_3Ug@xW71EH_$_5&L^=Kbz3E8-2HnvRXlog1OXYpg!h^n`$
zRP}9VZfP*`InR`=|1u2%lVCw`A1POmfur<a;;dWwR2A^WTK>g^n!br2E_<W|J3>@~
z&v<!N)@jKXceys}(@N_t=4g@y={wy@)W)24=<2e;j1@-0C*Bx?w8{V!H!bm1gOzMe
zI7LOiFT>ltitn_uD~wd9{HIm_lXCgBi3bgFu|1ZSiJ2HB{SG4vZa?r3ms`;G-g?2f
zIra7R<(ccHaH@Xaa53s1t-63>bHclNKlwb6eTg>Hp3oRv-eZg9aSWS&wY945jWPsu
zV|qHyOm=ouL*|N%bPV8KWH{9!)^Tt3c4x(9#VYfXPPL#!#@_bv42~(Jhl%2DoJbkj
zLVm6k>|uYgiT*YEz#=SLsq1{k3O8|+0W`|qR$Rv#Nlsh`E<zWf3p1`O{}IXZq5HHN
z{Hg2o3bZ57(J#&m$7#dr-(5~&fb*N#(acGa+ELvFZ-SVP%14}JM~rh0nuXYU90>}W
zb3<Fv?aEEtbfvykb-V(-C@<X*j72ZCK|G`7f2#B;RFZ%4NJg_<sX1K@YhqZ=Tqsvn
z7pe`dq}e6A{;-dYPCovE#BIvVq`0(aJ4;n>H&g%F*X(?3G{dXi5i9M6zgo!KZ<kJo
zN9>PTMp+Efd5SoMy6k;WuKmD1q@@Y6F01kF(I%m2>SU}D)@&E0GO}u&zGHJMvLd%s
zE&##NwmdehaMdKyEQ%daK7<rkVa-D(&W3e_Nu6ZBLaqkU=I}5qDWT*AryF1C0CNfP
zt&qW%)(V#IC;Bi0Ud`(Hv;xh+%dC2HZ2v}z=_ztL_FN3rP*{Ltg%VPC==fQfny@`e
ztp?#IjF>DK>^){Vil*KlUXh7|26%EAOyjd2PjPs^)ldgeE1H}`GwP<`cdg5awOBVK
z^A^{{4+adN&MfOU@COAjq73vSLy34^tE&HQE(9Nrn*$;IvV=k8u1wNXU9z&n6oF|D
zqwt6%hUy>*weDl<>2kk{;m$a738_>gY-r@!l37eE-mg?|R-TNrBc>FmnLkPW$BLam
zRV|11%t{SK`kncpn<6G=LNG43hfU!iKwL7XT}G!6J#e%2qw;yFTcrD84m-v?BBbUc
zpya~~<=@$40AMqe>HPll`g*SOBwZg<$h3lR1Z{r$2^-&u%9H=Th%Dm*&E8N8hiyK<
zJYdwO)JAYtaf&9c_=5&rmSr$ojTo6kUNXpG1ep&jHPxn~I_xUbDI*Ps-%OT*1!p{|
z`Vny;EF$h?0GIcNVIZ&1pD(|(Jmk!9QvOPELb82UaXCSAPSF7>s*IG1P8$sEAB1=(
z%HkU7D53yg$w7<zfqm#?aqZ(YPdVH#2|&)|NjLxx5KL_p`mc%Px{|PF5$~_p@K5qP
zL4IaGw4n}B-n{P}r2E;X#;C6Rx>dlt26hzNKdqHP^opSy{&ziQd0}%w6yjRyypILF
z@sq`BYHzb@>5!eGZ|6yI4W~53Gi5|q7)8bRtiYKiXQr4=O-`rOet7HGi{GCMGl$C-
z9Nfg;czE_dj80`axXh)yOq=6}w&N_RDsxMMkpzzaOb|+|!Kj6ZDcUYuY<jjh&7mtc
z=&s-{+=?l?(Z<Wdh6VIN$tjrv<6sFT1&uW)Gkf127HgOkns0iD0PF|W$oW5(et2Yr
zGY=sNeVc?$+}FvV-adBgh_t{FSY12U5dDqGj2!^wz!pp_fF40FmbY@b7arN*wJLAb
z6ku9Nc;_(O_)g4_;;|gQT$?5~l}%l_Uw=L#U$oL>Eb6_$QwH6?>Ux+taD@JU*ip$0
zegqe5>yT;`1ljwVmW6d?qgmzcP4~V}4%n5hyD$@@^K{QpBDo@^gplOd%9Ndw9%9}L
z2#%JzkGm*8r^p?Ie))^@D(c3nb^-*@W=?Bgg$pOap+3gx9!QVrXXZY&hH5cI#Od~c
zZuKtypA7pKV`Hhkd=<kTO&0RZ3J{AYcR3bMN&A-ZdIFFr+#;W@IMa0wLUnR;igYHx
z5eT2N%3~(rN>m9N(`8oNWrfO9$lPVc*qvvuZtq(l1XA#gG>!KN{wQ_H4n{CFn0*}6
z?B-J1X~yq<fa{qtUV|>?!0k##Tb53qD>Sd9U+$*nSBeoGHxfbT$*yewZ>n1N&(FQ7
zIjm!QTwE;z7O0oK!NoC`j_;#hUnuq;q8z`wiNPo$H;n)f*Duk%1P|De<=@V{bl32W
zm!{VF8G@Z;E6?BMW|_3Cb$GR@!=V)^!?Z>vEU*MY$9ZU2X;%Slp1}1*rnn1@yzif(
z`$MME3(2{|V8#q$`IEL-206hLaAN1b>1o#^b7@pee_30Y%CPiziXRek*vs<PKK-E<
z*ji%Rn>tL>QGFS7?RH;Xc&f=KDyl@$G@;tdH4KR!&fB<jfvl(2!#7P`!z<@#T_FhJ
z)9L}WOl!H=mPw4i>z@z5tH9WFZ6M3($#;%i`lh{>hlg^_18CC_3A8}nw(m;nQV9jL
zJq1j5b3WZZd51Y=i6+z%ol3{S$A}l{I0Bg<M9s^PU}Q0q7+xnYi%$nFr}0Teg-NM{
z98hMWd8uL9_%WYu9$dOpD_ZUb?`>WJ$NUIzti+Q4HCW6VtqusfbUi&r0Fw5fu#Qy~
zg<W(R@jJuLr=02k)naf&Gm5w2Se$a;sMI+kB*=_@R4+|MAs_9}#yw3Wfs&*3#6T}R
zkfgP^l;IrzuDUK0k1sD|@#x}y#(88fa=76wc4g=HQ(euG{!y`^Z=TRj3iC``qpMmm
zXq&6cDx9gQP^QiZxlHWvbFU90{8OFyX$jmmv)$pau=gnP^}+5X>iI0;>~(goQsy2$
zX#x%9N)v{x2`+JsgStwmBU}Dnn}_?c?6C6mR||S}gp_7*Zr~5L737xeNzroRgDR*P
zmRW>z)0wH#ink3)OH~b(tWNFBWw+gm@5Ob=_sHiKuI*v0Z9-MLo1~)*N3^vQ@uWwi
zm9gcBn5WeLhb=D;6QQ$Yfs@O6tFLfurjtpor<eWmEC|VrbA>&L2Fegyuu|l{0z@f9
zFn|k2>Re*SvE`2{gk(zaH*IHQ@ulANou~?kfwgFs3VEdZ@ywi`{MP{e^oC{2HI(t4
zx!k6(&~d)0hISu2rNO4N*M)@f{4$Yj*~UE2_YGXVGOk|$N8w<oiygSQI6XAdfyTas
zG_#=6<<{i>!JWJREo8y<nwt4i<#Ta(UDP0-{A&dyH&tY{g%o}Az^dYtV4h<S>JXuu
zI{EUa%9o*uIZje4%GQ|X!UED_+m`=g7HaQx#}(-h0mhlY8e0oS5C5`}9$?_^(F!$|
z8+==_4CY-z_#Wq5wi%%#zGrRNe7ly$bTh-fq3--p!bk-<HU}@nfTMQ$1F|K{$?u}j
z=^YM^d1;ih8*xWzlzDrp6)G4A*I>1=<mv-=D;b9lbv2vv)&JjOR_r6tBMwM@nNnJ1
zfCZz6&8!(TkP`rQ>ic0lEnZ78d}Jw#)m7!JOnxcT9+_7rt;xG>*>W`WaeC^V>Vz#m
zfA8Rdx2u01FR@Q^5p!1a_PGge4o0e1V0Y<DXf0V~IXatSy1nj*dpG-Zn+_6nvc|6q
zQHW8Xs9L%0h&)8&)a>_NwGsLlc{>4Q1MCsC)E6BUW2*hp!tut5tYPsMb#R5d*~H#0
z2Seo9MR>Sbi-yGg;*g;g7mMe0sZ?`HFgOZvBK>mV>?NjfzOg?*`Tc)rZDJi?WS|kx
z><<gYC)^Ip_-ZS1Q4H5ZNBBqyH*8KNVX3YQk2h9ez_qhr-5MNm<on@1%fL7Jk$iUr
z(EwlBIt{F3iK96YqaI2?x8a&RNU#tFaJiF}bd7(A@fP~$K0hPIfod2vc$%!r!2V^L
zIEteZ3ANA7VBX(FM+*{JaFiOZ#168p6CS+^8GWgN-t76~I`sF-;2*wlK-i+kxVV#x
zv6FqI#i4&rpL_&Nzv)IHv0(<Aqc`F2GDMiJF=wS*-iW@!@TsUaFO-?=-!t1Lx`jiq
ztMb{+3Lo9fKB-u1MdJLQeJCmGL)=Wc;2Z=n{Na4!cot&^^wM^be5_7Hc{%y4-1pvk
z-F=KZ23Xf$h{M2+UohjZeF=(-2Ny%jY4`*ss_pKNPm9xVe?hS^^u7HMdRas-ZVcIn
zuDnXQa=;~Jd?53Xkf@_1`Iq$?`NOG!Wj&5!#S?1=Oai>2Fbr}^P&6m{20sANb4h$m
z{m4gAE-iF(m#=Wm2!?`;xnv9ab4+r~h)Jl<sD$v|ffM)IeCQC?irKV)kGB_eG$a0x
z)!aUKGRjl&3#zosW1)l}Yz7U{sk0f-$3;n+3H5fWmiZinR-xtT{pDiV<HVMOme8f%
z_L=k~3i+d@_aOEa=!31R;N=Z^_Yf69<^wu3YyUzWFU8nLupSNXS<Pm?YTQp#XkuA(
zlccH*G3y>5*rmsz9Q(Ue>#ssQC(>O=Zu;T>6$s|iMl5M}eMI?P@Nx(n>-=6W5}fqp
zn4g{ar)%V**BAbr{2#eYG3#+_L#q-In`xe6m&7t_@!mG@k&FLq8TKpmUo#$(a!({m
zsTF7@<ILwFQIrdoWsqm*MXJ66*drCk1MydkiIQYOlS59ialAKzJw7=X^X5(8JNObT
z@Csv`_}5nbV}BN2sdg+wOm;2fv>cQFQ#9;s3rdxmr+2LDS*XDC;DV4@95H6i(Jjp<
zrFxh9rvE!m@yeja5FcB8vL!M^rieJi`w@D2Y5C|Qxh0(1W6X#h=24yd8^(Beh2Y^P
zQHo==O0<9xJxnhJlYsXxoHW3Y&(r(xT}cF>#^h-qP9JyQji8<x?1NhMVN<1yc<g3N
zLniL3Ba*2T?O{bDUmbCY3)3yPxsl%q@NKg@qI-0&o0>Zl;ilaTO<6$%NfB6asxcy-
zNEB7lskwC2yN_V`p^=Bhl}*fI%>HB7?Xd0{(}Dhk0nHiLHQT*xvuAhUFhsBK^A;A(
zF5KAth_m#c=fGl!A?c2(9jDAtZBCLK1kho-N_)W{3fgFOW&8Z>YIJ$~;!Td+z>ytg
z1cwTg&2rLO<y&I+K|ueez|uzm+=A<lWA}zlOQL#J@vo;cSY-r<(M|8^V*jmmoTe(&
z$m|o**alii{d3@QoJpfv{<FsHpqwnKsyfyDm*vmrYRs=R4TvoC-g9j&m%J|ly?k8?
z`s9+#>5cnb;-!U&$dntNnK4yU+rRVU;Yrv&S38SU8q4wHy*_#Ffs|qzynz!9eo5-)
z<muyLR@&V0F!A3}OE|2{Q)zgmNz22><l@%Ch+XTs($v1rs%hqNg<G(0msl-G<(NSx
zwx@TeHMc1hetz#i0nRh(_!hPIa7|x<<~($mX5xA*lDIPOLbu)X1;I+A2Q=_V`x&}v
z*8nP_e)3-4zFIDOS%>aO?|-{(JI}jnb-?SRRt(ky2BKBWC0E(`-#e{j5x${)u$WUF
z;0O7xs;Id3TVG%s^^vaIr_pqM*76qG10Qls1qXu{8-P>b3-zBQyxDUNjoZ_U8Y*Y~
zw{2_XAqsf&m7x*Tkj`a>Omoa(<<f_QQlIo?zuBLN_QiXpmgK3Ik>R_2c}03JAg#3#
zu=^9x$B<hDR5kB284-?cMQ%_+_l$2n)E=3o5h$QFd@tXthv#F+yLe@{yFWl}QR8D(
z74jQ8UJ3?SVBM^Qz7ew+39xKy@Pa&X+LG!@3y?bu!q=Kyo^eMii}X#2So6r@E}jn%
zObatk&oXqsqr#iPybqi2{dDwqx+?Iq&ScFU2)3$qAz)e1&V4a<x}vji&3}k@5`EJp
zh1SaCrm|;#Nw`y><jmkK)*V5ak`M~WnK-=eDMl7o4R^dr(1CTrT=~hn+Z<M4FWU#h
zH55<a1Ku0@m8qS1RB7nH28)gu?IYp$kapR{=W)4_3+N2^dfEnhTx{e3T3yb2{WhQf
zE%;96nKEEvv$T2MIcP*sV{^75ldU0Y#Eq*VtHs&Z6z$As7SDtJ-m0+z?xC-gr9bdF
zs%G;_Pm5eL?&OtPCyj(@U^4Bhyl;F@w~RsysEF1TC@<jFQmL5o2XcOPM?I%Tk(Hbg
zSLXCNC<yE~bCHH45=ZJntmiobo|O8Kt?2~=W(0X4wzK*IeHazq?hcCcrH<);hm#UK
zund=h=T*bGyqI=hcyBLcJ>Ny7bx-9S!S?byWp9Kff)CF~LnBg;>2Fe%Rg{|Fo)(P&
z<J#W_OepS-1WB2Q-!cLlehJ>imgc$e9^kVO2P^Ysm)b=qDN-V%2!W86>h(TVbKc)C
zz8YL;XjBxZx`%rR4|AJh18;Fp62nQY^z6QeJK7p}ujMhCuv}(7VxH+5RF`Z4MWH;_
zXRU7!a+!E;*l5}D8W(D7E!F4hJF3D$t*H2V;YXop5{`eHNAgeja4^%g#D+Z7mFI2*
zpvj6W#e5n`)yB{4r`<UE0ac)@BcB&qSiriJ^}f@=kI|8r<UFN1bGdzsbUi29oB=6F
zkuhF~m7((855FA9uc?K<0W1zJkWx31y3^|7;aPtV9`RJmT1kOjV=Xqv6;H6!d3{1h
zKeRv#NW|+%8%4+H-rolAmyjP~lfcgMgGBCHB+*CrrvC`|;n1S!VB;!6h$?P*TJPr6
z`uXAeza8%E1vkY5_Xf#Ox6gPlAw08svq95Fs65KmrODdnu)-F-^a19n!<MAEjVHPX
zx<$Wb+3%OpTkfwVW-hjd{qZ`@7x4^`ns#e5+cz@RG7Dwm>DTNP0CjbZvU;W%cpQnk
z97G{`5vM90{6waufs({B-aoBXA;Ku<*yDn2*MG*ib+9m0NZ&n|#;f;z@?fvfMLVCz
z?ILp{Ui^y;-pTB0uZ`1RlZ1-4rFc7e^5*DMOk3(5&{%7!1>JVQeq`LmRC`S1_L{>q
z*uqj9i~ZCnZveYw`svW**0<a%1%o_}>(bo6jBjJ~q<{Iue<a5;1WQlvRPLfS)Xo8%
zCO`_c(N#PyMg@WvL8@f~F{snMTp6eJ4Z3|1w<wPd+8fNUjs=IRLYDTIa0x7#;6S7>
z#{J~^5pTe@EwsY=r_aCSw~A%Ti`+lqqH_^_tetNk4!&|;037uBIl-I;U**KT;9^Ay
z5Fwn*n)8k_pJM!~z=QX@91YvQO~-n3k`)kEbIWyg?rmL4I+rQsVj2wUur(E_fJ#XP
zhIFgJCD;c{B6aO~xn=#RKycT!@PJiSm&;b#un9C^A11&i4^tz*3-i)-cY^S<>IQ5B
z%U@G4bmD-9Vh<DBmhoGl=HOc2au7_|PG0S`2<zcDxD@Tv4&r#o<pi3|lao5;F6Gt2
z>*|QxvqUGHY0ZBH)nj$EukHJ?dmd8QnZV`Ugd}X?*TxQ-ewC9O@~EQnG8Zs38P=cM
zxu~W_u-<Mj(zQ$A7W^Me6qqY|r@86;sQI_SX~;V6@1b%YtM~@_W4V}YiWseU@>%I~
zOw})RMFi^V8D?BHtuVs&KGx0cV^HnO`I(x+UOrc%K`J5*-92>88%4Jm0~mvpPJ<Ch
z-BWb#g6GFb3PCpzJKe=1@*E8l4ji+>1V5iPhXABY!2Qk>=Yfd~h_I>*80#8u*EqUq
zg%$=J7*y0_x&*HqF0?3$A?Czfl`(n+{q_o1#6cEbd<@bnQE&~39y2Jl)jaCr?^aYk
zDXe?CwjftwjycH1-|O&j<MUl{5!(|_LplqzYQHeY?W*=)S6EJ*BsPR$qel^(w-D6m
zHD39;hSV)dOw?aMR22v2{n4Ca>G<Y@u_rNDM;1geFj0HXC@~6t(zEe`wc)u{bS2R@
ziM%=4ZjG}{9l7DI`c78RXe*$Hh0$U5htn;QDMOoOPon;{1P_r@JaqKp_h%UpB>=>p
z81BK9A<|zpw!)c4kAUe97ELCS>Cv0^DA^<|)LPA1)Q?)Ne9hO{G|io-q^35Nu8YEG
z^3Yig?1qnf4Er0N8v$BO9>E#!%ytT}7h+)ADg4U7OpMm7N)Mu{A!kzl{H)dMnm~@?
znqP*2#j!&o)oHT>E{*B5edL)t$JiMARpyL$h9Bec9O1U`x}EijWT}+F(II`R)b&(^
z%KvHNTv4?p4--pTWMDV6sI1xvPpLqZZ(=2_?hUD6#&VeLT|NzpI9Mg|TmAN%b-i*y
zE=|#@<nc)4GnbGQMN?*N^7_GWs=EDJwJ1GXwzjq{rc5vIbeS*zO^alRF|C5|H_JBr
ztQplXla?N^!YaG6NV!9wa$Y^FnS5<(sV-+cBbx*$za&Gtm&=Icy@XQ_qHPQkFR%%(
zAO<Ru3y{!@RlKyEazBpP7_Rx6smNVdx%N6gbkk9qN|dS+TqH|<{!u*e!?+MJQ?<@J
z(h26>)u0j>&T9PO*1I<4++HQYj#cdLc$o8tPb~&QTBTrS?ORaL(f{)Nn@4%*$TnLm
z!NOp)2VPX(W`1GutOX)~cv}W^x-!}+^G%CNY9iT2HzyQZ2D}y~mCx7gP5W`V<kO!z
zA<tJ^Qv0CxERf^Lzlwu*L>Y0;Ynb6@%wM?KJ)DU&<q=W;?46;T6gs_e(wf&iYyI$@
z|H~kDx}YlGAX}VnofeL+5qfPO?y)065HY*6b7H={gQM^u?t|h_VL$U6b!d0G@Qb(a
zyVlODt<4t2CV(!At1>0-zv8Z>Mmi624MmS^zDpY41?&M;J9Lkjb#Xiy(SZe2o*m*c
zYkg@LTB-=ADy+i;>Ha0!Yget*7xYons`*;xciIsIgeeb@%O5sV-hPQYVJE4vTZu4N
zfaVB!$Lk|Yi#hAr`uPxY6~M?aTvQ{ro9~bLH}K}4l8UdGmT5Oa4g|dLd8(J;rkC(Y
z#qL6nDR2`qs}@3nW_i96Do+da1LO)g9QBY`GcteYU@Fp>e9O}9$i&g&lq_3p;;4n1
zJWsQ}Ni=Jsx#%GUg?*pP|9S1MdAMu_x}3hdjo<tN`>+Ab!%Oz4NG|NeJzoI2gj!sH
zS4d9*>42|)W1G)@?H0#ArBj@8aamJ?jm6>71of2(n9qJuSB%FigK!D-qWNNqI#cC$
zF~THspDLGeKXhk_82cF#Si0!DHIgziD1BXmVFmD01AFX5ixbd}KTv-B!pzOU5<&Jc
zL(9d^J7i|O+KKmoFo@f=`Y*bgeZ*{!m|!|K^kN@7myb<So1!V0M<~6-)c?Lz$M=s(
zlF0Za@;4GzyQ89?Z$mckUuDP)oqL9A<e!Z*5S=S%RIYRj8#sC#jVqDv@I^zIyy=8Z
zij+A#mX8J9YTJq;>>ls=UE`2SkHnN_P>6*h%8Y8X>dqiu%#oEpNt1*fTlbAIaT&g=
z;~Qu(qMd0!HSl))zvV_!Z$08fyx%3q!iFcXp7RTwH3=kiIuXuzrsM*#q4eyDMI$#f
zs<T?X@q9qV5^iG>q-)FJN<w;L!wu5G=OI}{6?AM`5PIh>l;MWJblZmdf=F0Q9B9=^
zwFQwXTVwwa2Fl@%)hh=hA@1S)!B9rkc;?coIm2OsQkg;|u(hS_jQpsoGM`6GEPlzu
zv`M|86kU^i+q&^=IXZ8ufS2(0f(gHHX)@-Z^wbD~CX|M1_W-*M*=G%f!9PJCAkKK*
zGvC0k>n3h}0a5oDZO2yNGi^GCV{e)AcgJ(e-o>EQE9Q)Ci+Br#6w&r>W5++E<%QoU
zA&Yns+0&J_(PR4&v9q4XrQ)PJIsZT6cL)UQoA_nSZW}iZ0^d2hvVTHb@Xh}Z@jJ8e
zuRu;rr!(6|ToMY9-ZILGDRJ**@$QNKZ!?r9MsmMg07cIA$#oCLt-l+IT`1vn!nEmB
zPMtGpW6K2Lk<!~|--<5h>Vj}*F-zXVA(o1l?e>THA6QGGQnLuctxdr@qO)PU`}zL^
z{HELEPZS7|ckbon=yZ1N-R63rW_2+;@ZPnm><joRKNfrxsiV8?&+vyDqEq*lFeRL9
zCJSHQ?zG$BH#xYT)Tx&O!rq?kxBD)KK*?h@1&O?9NTK`SUKb@kT{oqqA1yh;;@TIv
z+t|OL3TE|}uhim3jPXvj+F7QQ1?*kp#ExU&_5&0Pec-Iu*1mqI6Kv`7gIv);{tHla
zVA}Cr9EvH8qA!M>#Q!1S%o*AGU^|Ll6I2&e??rlPr?k8oTmd3jGaiLFQZA=?WK_?F
zUYZP^!+0Xgn|0Z|x80!G?b6P*DL~HP(^xOoa>elW^u`MY*T=PhBj;M={?g{sBsu-`
zyPUWQ;q2$QL5~^+WkQXmkYHOBMgGYMD<E*qwxBMKmYHp2kz2QklbZ@v#c*#Z*Ts<v
z*OmL-<8nLf*bKf2iABsLOgMXO7whMXP(lVa$S6KhhHAKZx!{pbPIfCQ$i{%>+|_b!
zT-(*Bp^x=-te_o<rxaI)C^V7mt1DzWk7S9Ae@vl#2a@3$vQ?DxkR@9y^U>y@rZC*Q
z)T15rd)uNn9qx#S)m#TI<-A$}$+U?PWc-GRqCK+0cV_>!)c2OT0}LZlDp^_yb5e_q
z1f|OxS`UYGn}B9-y4knbxT-Ob-L85NHBlP{Gocj9rX?B-r8_g+j#AWW#{dRIIx*36
zZXo#^;~l)9$eb6S-e&|YG77rLupy4?_)mW$fz(F@PTa@a!CejyDD$|nITH_sE|eN_
z@^^dVDH+dPxkM_SR1srf!oX%Wt5QdzRFUi?GI|mtLnAinja2vwcBP!&Z-uO`fSf!)
z2vk#CbxX;0+)6I~m)oc7_>RgNqPt$CRH~H{dH4F(_Ua4q#NPZWXY`wp$Ax6ny+$hG
zcMy|Luj`@Q{R+D>b&QYR8RE}H(6=XlcDD%SfAMt>+?95Xnzm!xwrwXB+qRvGom6Za
z728I|wq3EERBZRE=XrPc-u-=}f5cj2%z4l2JWwq+LW*-|)==-T2%7=<1w0PExyLb2
z?;~!1R-<C{o23hh8BG1G;}?j;T#KV7pWD8@hnbCN@r)tZpU9>1LTD|39G<}6ySXac
z+O0-|GTpDSD@IrocLi}KD3_LK#dK-YzS&Du3RV**E&1hNVH;BmFTQ;1v)NqC==}y8
zuBej<p?kgXe~i-GQJ&g8VY`D|O)lBj*U_&U+Brjx_0O%UHzyZ<gF<j^PK)xb!3IWo
zupvKT=~422c<+bYOENTie-S>J%>y0p8?);XY$JfA(0$@K!bo3ncee^@8MQ&3Gq^Hk
z6S@iYRqSp_#7h2gmywwm5vynp-G+XWMXxeER|zmd(nrphj7|zi{dFt}Qd>{@@w^q%
z0?N}L?hfD^@OXJ32OF~Wwc5zeP$T%!e-$yQ4YsIfTt|cZ5r;|@^XYfNI&Ux8bv{~U
zAJ9T?8jj|36`x~GCWzjL@0*IX69`ExYR6ikqZEHd^KkjPC&ZGr9w0l6CSG(WBi5;Q
zcyWr|C@pO*t)pICsxF_^?76A5QXY5COTeJT<}eYDtu$A&Xwx8<LbBPZnN(B^9Je!F
zu4K0!aUhRv1OQ%O>Tt@!36VvAQ$od=abTMJE#A`!E)Pr10!_ns6}>(O9F<Ds#UC-h
z>d2(0_Fa=EK^J4te#x;}i({R_Eh$Nm=6eWd#^<2FrEYN@41kx6ThsJbLe8WxePv^@
z6K_&(M-H~Rh&U7O1nF}JnM_0C(YJCv+XoF5J@3If(_S`yid6p=vtQ&2zGBI~FXWN$
z1=ptSF7la83MW!PnDsn=gLFGwbRU>PmQ{I%?IMb=<R4drN<8Vn>c}sM=MFa(!^lb0
zRsz+0>QPh?OB&{<wmhMBHCxc5>lNx-(-Vq74nwf+8TS1i<Ge*Vjvl_dB4<fMBMkR+
zd|iN66C+D0wTYl&$muB0SIi;h8lw7yQuRUbr~*4Xo*b(N)=7s>1!u_b#)~#ag3WG(
z<r?iA`(?zD;c={Uc!~0yPg>VP3l0QUqlm}1cwbw!KPckoc?de?nX2)Zm|Q5Ee191z
zIRsptf4JvPqWrR~+yE&rxQy6+CW*xv5JSAv+!?p2(Z*v3{<%rlZNRRVwK~)SogjP}
z#UD}NWIO0^bQ99T7gbRi?t4-!D1WI($*M$PBPH&TnV`Aol*qwClAPxOn!k#q`8p~{
z&o;)P7q%Fhq7$h8Dy|Fvbv{l8tH>n$5|XQfhRRA~f3+f_o&znrFuLq6Bu*hnWVJ%!
zAkd9MS5cmbJs-Lx#7++6FEswLu8vCDwZ^tj8+{*MhO!L6Y;xV3=VTh@z9|1$Woa{#
z2<=!#JmIRO>gZNobStU9<U4C^blxUZ9Non1omce~@60;Cw%SYtgKDM46%2Z&Uq$84
z`PLj?Fa&2KP+<Z*ebxm`Q=V<Ls=tK5c8ahNWoM1>hec=p>W<2mDl^#9O?i7@(iWI2
z{ft*WqR_qAh1m{$Y<0`<qdTj-1-9<MF)O6yVP<;*{bv5=tgGEIQfAw9^+t5e#tw&_
z2P!U2O|<9ifZyF8(B;}4`0kQ-VN2<PVs*ANfxD{KI}m{xc{(|PkLA*D9)C&=R+AX5
zITcj$Mq@N<sH+D<m8{Ub))#JD8<8CMVW<v&wz#T_l)R2c4#KI|11T~|XRQ!t^^Z!b
zF8Q<8TO;@5ivRYH=DF7;RH~L=lQBOfbl;$^2*daqF0Uen(-b+t=qgUD#R<2CLJJ{l
z7q}yyHOX~y>>)fa+3myMt70t@hDY^zs5$kuG<$lkUj6fXj<oFu1K?jst4ft*GL|?M
zxU)o}QjCl7UpexNSQDjp;PfJ5TZm)}+!yj6IkNJ66xL%K_BeZNT@d~o5CMOPE_>Q^
z(E2Y_wczK<6u@TotRgnuJvI$^uu=iTy<D>#<=7#{fBQkcSiUN7J2l*wi^Ysn-FR6j
zRv~7-KovryJoU3DUK-Ty*H46N`CxzOmT3mRx7EFfu*dgcpRez9!_@V9L~5`FVOkPU
z<yRWM6e_~rAvBOTW6LN@X{&xr5-w_1V4ghU5=eI`&S!xo__DYGb|PPRUNM|{9IygN
z@89}j4x&OpVg6L9jp0)%ZWmf}yYKj`iCZ}Y(?s)L^ttotrEDnXypr!Va(&Ba<~v5k
zq4`dyH(d-kD?p@D|8*(jOasU{HL&K>c7=9wM@|V(G}(5%qgZ&<MnuAa^^=iK{`jj6
z(er#)m5HoIX_1skE`wYe8!C-~4D&QOOEDPq8d<IAL`Pi7*t3$*xtZR)>YWeV@uo|T
z1$KykR6i|=hI*a84^N><Rq>bXdL6$5J0AIiXDRHS{5m*Fz-VWP)|pZx0qk0`w=16K
zlB^_(fl-tA9lPA}5t%Q{*`v=pOmf^Tc53_&8ZQwkpx?P37wE}E-eIhqA^!L=*|xSg
z<R_1J9IXKcbF^~m$PjF}rItVyyRI`*pG-5t>=T*5L)~YUFn2LEXvQ$t)}~_PrB1l%
zmeaEByPlbY6(a*E#>J-5I!o{YRLb-bl5+xRAoQWly7`dgmgG@}Fe<3@LDpzzhEalx
zmsy;+0+&_c{UN2DKANApj!n0OmOAZ%`f3X=B7(HyRbS7w{ju<*iw^_Tm|`O7-m3vD
ztxYN@%aPP}>y{1<@w}if)qf~N%tj)7)ubR1u579pOLI5{Sj_aAg2R=ib>@qhuVX5A
zRAFpkaSr4T?KQC#*H7V_Auq<^;Gf)vt2OD?|DBx`*m=*!UjYBkn0IVo_ZIp}^yebT
z0ZdAQnV^XulX!2)n*r}~Ajun|LY=Xb6)+K^OAs|rfbhcDyi3(6K8KiWp4krb4xLHc
zXn2JrnA^wszRxT`GrG<YWxp8Mm2OoBLYL|U;7gD`w>oSqEiBxiFmDz1T#+$f;^Xd9
zt$ouT-=F#_4LtT__(^C7;k48o!P9~go^IrnFovg$f4!H8$!Ab~OQ8JQ%!rFs3Y_jn
z-q~OJU?%%VDaLW?l(nS5kTfqduy41=4$#cNyz)IBL9t<{*uE-CA-R=O+Q|i35y4lI
zWLZrg;ni9TE~C0ZH36}B@NwX*@-y4$VT1|Lr>Q{O4-h45`3?0!a;JTRMv;pTP0yml
zK&;?~X3XN`13RRpCshmG;%5KvOBBuG6azKpZd0Uf2X(ci3XqAUp%Jda<W7R}J?T4k
zV1)h3lHJNTe-X;aXlC39A#V*%N0F;F;Q_?z)G4lVy!qLT)Q{@*=pOEg%jujNOJmfW
zrY(;q6;w7D6k!MyNa>P|QR`RXhhf9Q=Hx$#WZKd9@~in?nn=-z{b`@*H$N#4r^Vpm
z&l8oT;SKF<+50S_!eB!o*mf2}L2)Jju_rJ7w>>EZuqW%lzQTtaVtVpgMIs#FB?h(n
z?*q+loS0#2Rey>6{G@0h!^gAsCWEX7f5KpHD85StUR?awo^(QP(KD6{2Yz8E0eND^
zNQImP<s3)%p^Qu7Z(rFH8{vatcU@U|!GUmTU5jtLYrZ36y`8YfoZ)L^)D>YHB#*Fz
zMzF~kZ1T9|dD!7vq+TJ1E(=x~nn&j{A(hZAh62ceBLP&k;5RrAbv4r#(?8`PspxoX
z;2TGmxth<McCbE%H+sT>h0kWwVZ-qbrgain6$>LJl!!xM7yPsPkcY{CI%zy%^q<>r
zp==i9i0fo~mtFllfU$a=Rjf~UC+kou&O31*YvUi+gCLY_y@@0mJsCX|D@S?=82UFL
zv18(C-~ENC_Q@N$LI6G9ny>n4x=0h}{hdt9K6iGgWadtY(VV}Ry~*8n&KYs{D9TTJ
z#~d~zJhs6#4=(V7IUe64lIX2KxKjjoWOVXS&?D+5ZTM+7;hy*hHUWK52GSv4HSl#@
zj$pc*Jo-5lK9OxvIVYX#<}y@ZmIPt<?ri$L9xTO&@ikred>T%5QP{0~`c}9`R`#0p
ziiQ#vPqb5tF7OQG(JCp)Gu74gAJX-M_R$U+UOtP`zWTmp*YcMEHzA8ihx~E$tQ0K@
zfm$N;nCqJPOX7rk|H|Iy;^{C&f1<EFEotVHbf$QWzr}@?f>B};qFpv9m(|4x4`{fQ
z(QvdFm~iiBKzgCJIPTY4Ag}+oB)g{YML2y}#@POdhM5G+UE5nnN5-V1lOex%K~FML
zGdl?xt-3T8wfIs_n&|RaLlB$xYu8svFDMikmsx2Gsq|$K0aTUyb;4`RTd8syI+iP{
zL*^sxxK<Hqd@4){8Jr+1D-J#noT@irRnP${+XROxeFi)dLP?Fj2I&n9vL&!OERU`T
z8Q^UR09DnMFbZyxqA;%JM%|x|B9cTqh*p?ppDT~@Y9KX<+OKtt5Kpwp|JK@7`&3aJ
zJM0G$t@veZOYIS=vq3t!B&dOaA#tY#wXKOK9{lmn_T4>JSxuy7es6)~RTckE$p_q(
zgnV(XrZjv_HMUQasy|xw=lwD?w%>}dAAYrJ_A`f?Z3lJ%zI6SdoG~w53HW>Iroq|R
z-nQC>Ji;pQ_9J+c%!Kaq+BD~}1cA6ryitG&7gI`zEG^%}BL2247{g|^PJtf(X3VW@
zQP5>`!rRdy*&g5sgR$c*9N{lDL}CWoX7B?lT&T5znln+LKf$=7Ww3T5X;H;f_;lLC
zz4j-yU9~$hH>PO4JmHxiXC%rq-3!>4pQK(z6Aqf3tNBt2W?aEDP;x(JkS_#i!ZrK1
zMY}KMju&^POK(Rz$4ly-WS4<_KbcdyhEjq;Ujz*$0>=`9y+|$E^FN4rQ&_Mml7nES
zQ&YU6VZ)cLg4X+ea`#W%OIbv+9#;{yD4-5ukzDDZtvo^yd~959(iu}^+j%7=IqVN%
zqqNLwZ`As-BKy~%jmdxJ-x=(1ZsrwOkQ{6^dRJ(SPpG+7lKRK2sUoCw5_!$wi$}|I
z(N;hTc|c9doR~Ou<zP7VrmBL<fS9Pz*%paL_yo|C=5H*i<Q1jzY%Hdt5u(DCLK7o2
z@m!faOLW+E7*Zp%wiGJ##)|GvSq>ARfLI%iL4?>j(?^D;$aJ<gyRP_S^3ev2RwHgQ
z|H!&LxjebqUS2YzGiXXw?8+R&fS+~#?8B31v7>=FnU%|*pV{3&-X=Mfbw&FPUqX`G
z+e~Y^Dq<oO@`69&^BaBHa;}U(fmwgnpsuLUK$5OR)o_o?Za@MYNa@tBNZ3h&2;2&S
zW!L*LNq+Q*5@s2Qa`K)1Z6jr9LFKvEalewDUjX=KN&oMZc*Lw9iMol)dCAu!LB>ij
zqlg>z&rhAGV?xHs?lQehU7i=~o9S)Lqnr<(Z)Sdec(&EX4wg2@wRfdy`rDfA&UU*)
z<!g)zni%!Y8Xn;wmTw0N6KUqwX*uf_1mXMggR^@dZHKj&AY}^{H&;yILQiL`;U_@M
zCqfs75D7kK2e%QBk1zH_J)jXHPkj<PN(Z2=C|awrasDSHm~pmP&bP)U31>(4A_5V2
z+bd5q!j#v#s6K>UGwXLPfVY;53<~r)xmBAU?eF{S*nZi}rheWN0?xRIL%u&M+`Q*=
z_|)D!&2OHa-+pWTjMDOYd0M+~K`NbQt64jZILt(iKR131{&+?K!{qyF8CoB?#?2@e
zz7*7eqX+1fn>hvokBuK^kfeq{p1U3ww<eTWyWHs;Lh!}7$(iwKRwmTB2TU%5@yh%M
zF;aQ0Bm&mb_3>$ml&7v@25eR0DpiL${`&YHheihRM3VFkDx<q;F9g>JR02ezp8h4I
z^KnLbisgX|)b{>A(u4!pZ@%mx;lb}`uZhDD*Mij9FMzMNPS2_6f;7}##wK3`kEk5`
zw-OyhG;KT)9^CZ`n}ic=;ajd@Np8#GzASNZ0y;k8LmJxZ@z|RPa3r>(>v=At3T-Qz
zN%<gb3!;V<glYS`er=;Hb1i7Vs#6$~hMif9>fgI^a{l@?g$_UULPVE5zRhHkBZ|*?
znh_>NOxbKB^e6aNxY0U*{Olr_SH!t%p1)L=!GV~lY;V?7BnGGEO@V0;B<_@-1MfRW
z!2u<z%fo{r!Q)(3!8Y##S*k`VvT}vxsIej^+KMh|M)0Y)yoIGZ|8MkXeq6yxoYK8T
zvFT=pMoUSa+K!~LBEOVhuM($<+|)G_Zj{#L+F|LUvWd}Uj?d-G?U<FQ9SZ>|{C6@C
z24rBrS5sloNW(ltKd{sFZmQSo>L-YBD0sikY@*t&1*!JyXU@I)`?d$3m&ojrNR<AU
zTDsB-I-bMe*8(<w70<zl5>nuCgs+Ib;Q6E9Mf-()Zp9HtJvG|v8J_)^2nFA(^!TG{
zSz!Usc@3uNrVaYs(a=>~&x;4jRn7Pd2G8GK4f;@Xpo1)0!Ik3`mzSiQ`7+7_t@`_#
zGQmO*QLq}v4Rhm$+fDQ`P2zjuI@Nj;%#shl3pX@r#6-c%(LP0W8<Zsya_F6Nnwtt{
z-Fs&uh0Bd#(rZH<M%)#DEDt@+{ZC#L2bBotG&>G_#_n?ZC{GklSy11%o#_oUllSe?
z=j{eulI{97ncvywr84O;gfN}E4v*t=qln)b|8j$pU`3@=_MD;UGo$sJG<`5k?^@*j
zW>Bo2uWiqB_v1s?5HD+Mr3)Vz{O5@YRp$#(eV&8m%Ua{w<V?@Bfkq;@KLnXjI7qz@
zg9{l1(_^q4{vwns_MHu=pL*V&d%VcgUtLt~|I$UNp)mfNE_w=nuLXvM@vkmwJwFKA
zDg?s;)Z<eBR~M~CL$?rkVGJf`Bo1JgOA`>><kja7*J%c{4s^i*vuks=(I#d;Z_hp0
z?EwC2J)d7LU%YRN3h@^E(`du>j<U^}sCAH%ED(UsL9-6$1#DcZd?%dX79cQ?meb1x
z<TqiU*HoG#^{kVg-Mjw%^x+ux?hdSzZAPCb&P{rY7Q<Xd)Ihx0tS!&Fyg=$l?@#<b
zh#}F8=ezL^=O9g25!@K*eCv_cDLb4*KTX@n0e@Sda=LK+1R@1ZLck-m-LyJfkPIkh
zIU4CxibB-m7MLAm_cDmY`d>Zy*cZM??EO^QJnsG7H_6|n67>!aSlgurp1eXTovPuF
zR4*&BQ=PS^J=N%sU35*|;XZC=EFC{(-gcYUVoIB7<1+gIIm%9X*e*L<o;z);(N+dQ
zkC7husUYOvQUZCJPwVkMH=U{W5|8_oWM2pl_yQeu=r>K#)2uB@J9q_?S+Sm^ffPKx
z%IX6xN{^G2_4C%wmU04^SlLH)z}C1d=AuBL!@e(JK9@y4HzOg(78eWgOW`*y7g(+a
zdz_2*Jq%8l-kb}6w&a)>d-%9xj?oxH##A0RrQ}8HWb79L2)m3E84$)r6ss}SJOd)n
z4p=A8ke9uh{xh^z{Xau%LZY94ht{}^RxVb18~uNNuKl4?djRextH#qs3m=X(zw5sk
z*0PD;JqK*86aL*;ud%-7u+=niW+5=i?54_fi96M$`1aym_5bnYnev!EnJ!Gk`I>;)
zdKDd#GO3xR{~bos>4SDTlJKjn&PXa?Q$J$Ta@D27+mCjMk#W+%AmhDH;I7yAv;A>M
zWJge5fez5iG`tx;BnktPm5nZYC>Udj&(8lm3Q)G!<EPWJciDQ2=baE0mI~Ay+%3eC
z35P7syn$M_nq!l06tu!=`%8-!N8r`~XwhjPB{GKYh$*u24Wu{W+k@=V){+ceIS2lR
zL9k;JAv%g@f;*eU+CuYO6aJmM8SXBYbjy7;&M%O=e8GiK8O5cXPQ!1XcySPjHaCQ|
zh7Z$(Efy9=QT%@DnH`Pye<*z&wCj9HzAZ`MJTdIBVL}^vK>31j`%z_t)L|zru3c!o
zJwyPTe;5*_q49fZF9%%nV_z{@tM0BQN@&q<>4W%O&#9xL_0brDj(T$fT^aVY2ET3>
zBmC&#r#=%1i~ER|LSM372Fv7eKLVvYTiJMBTWD*<G?ask8#{o%)IAV=fto()P`J5_
z`6GX#5orNgiUhX+gwyCBWFhuwe;wT5OG@wPrst@jozfh-Q227j;PgwN)nvv1s&b)F
zrq1n;&f6s$f1O7LR$YZhhgIhXm+{fMc3j~r5DDK)U`|YiQ{ZR+oXnrXt6)q*2ztN#
zLY^CMAM@={^}wnw*H)y9-&k{+>DlJ$7PXA4oOVmTUW#vj0)39eFV0cZA$8x6>?f9e
ze?N)YhmGWNnO6!VScQbssACsUp-aYN8J_6e|90z@j#00obDYLf#Uavu?ZYzHRQzSB
zR5Q^CY8vwV>?0k9(~NU}x0LEjk}0EP#EQD%mAq1eeJQ=TGU@L^9{GJ_tJPPHqgpGC
zC9|zEOvz%QfL7JAr5iYf<HBP=+IEDJI9fZC)x1`x{vxAgxq9tLb_-X-j(6aDq?b-q
znA&eno9h;F91Hr-W&PtnlZKMjtOFW)UdQUkOfa;zXhE`Jv;0v?K_AUC##$b~wZEz!
z4X4;=GZ?_m@`iXy9F{BTi!C6f8xQVjrnE$)x*DHAOdn`hzEzUe9-DiMp?Euru?qnh
z5+<&MT8Jn$Esswo776#rerf!!91b;*;{d&Yok^j2LZY+dew%A1Exu1Y^4EBN*G!(2
z9ondmS2E4iol0(Wp#)sAls95sCD?>W@1-b8s7?KBa6f2$I8Jx_mld101GW^CrIMcd
zD}e+(N)Pu2qy{eAT6Gfay^U*}v~G_^3^G4tw;Uyq{S=tQJ`ZvcZUV0%5m?=4f@Mah
z-3C228kgPl2%>t}WgJEiW^`pYNy;r^Ou~S(EG`B5FOL`oJ!^a874-Y0lgM;IqNCgw
zTvQqqOWDVPer1v~n*Z#upH=>AhrOP-S^r!TF0tm>#1AiYSib#tZD2TQ0w)>l$*HkH
zln%8q|6eO?-3<>o*uvRl7-?HnahBqBXV;06ER3LHm4GRBss19HQUPpK$UX^{gB&&B
z2I5PX&uYoAU?ThT+kDrT*(+Uvd*0NSswk^1|MVkw3U@1PmvhAS?qoWsZ(wV5srU)v
z6YX62T6@8++(HWnU~&;679}Jx5(Rf{wF^;I-jt^zi>d+om>albd+*!UY5Le#r;k&R
z_au`HM5Uz>*~mZ*gt;hI-=M&bE|vTrW7ac}un<&#-25M2v;t`YPrLp<yeNWj2j#Xx
z@8j7&yeKcw22CsN<UhP91kbCwPi#37^JDa0d>f;Ufgfp}y`IwnygB%M`ciVyEC?OZ
z7~_{_R)^rG&<TpJ7eBPSp|XkIkuZVQ5kMG~&-q6fl|2Cnqx7WW!_g4$0AW<%uP~~e
zSahLH>KIKYEF^FLA7Ru=?yoRvTB6S__E8X~=(>Ce)rr?fMc^PDRr4zu=>Pft6gcVt
zBdByJaG|1DA%=;(<~N4$0MACf1N-2vcm0B1s&sB*QAZ_jJ9X@uH)>{KCPz6i%Tq#i
z(6`P^Msoyk3(CIA>hES<O*22Nli*-^ii*%A8}b;150jNA<0VC^z>&W~xCpGvh#ELu
zUbqL%@=T@rqo+a$Q{&{;j}?~m6&7=*+g1yi;dL^&S^(I?q`ZHXnk2QH7fyVMR0`;e
z#GO0n;QXYYJQv!JSHWRY%1%0)z+tA_1p3z5_dDFSF%SM$Ysp9TU{5(ojA%}qnn*}-
zR_XX>YUL*w`5J5VNFY$h7e+V|2zD|?K_HS{rMxG-mnNX>qf19;w@g;9pMt1?)@Bu#
zI4;7R8d1d}$a=d*&(yAxV>p8&s3jI&^?ZpSOb19DgBqz2?Fok*IQUBIB^THY(vB(~
zSbwm0y0fR_!D8&psmL4*1x8c1ddEtn>JC}`$aWp1_2cyuUFjJ}w%lgvx$HK17A}$=
zCPMx!OsF@N8s-N*r}x1JR3eR`j;01>T4SOdr_vM2T2bb*?U7bG?Nr3CJTZA_BWS52
z#nS4WN;7<uucn#Z4dU+1#zND57DSHbbD&0wMY~HM33UllR|lSkA~P9d4!|=nFJ172
z(#-5H3?HTCmhM+eAnmL0_)*emZxa8AbX5#>$hp^@zNST&iZn);H9Q~_SM2N=-_@!a
zIN5~dO(rSe_}pJh90KE+Ngca$DB=9M`&HEx1?qZ;HZiRkaZv6*ZWxNH<W;+^P({OT
znvnR*+YqwJHDU$m@e%o#I_wY#-7-{|HvyD6HBC89cx-WXSqK7kyAinJk7VXjZCB;i
zdU)ZD1E`V{Y17JJMz9lRK}rXS39@?DYHUE`O|h?_QjrN02YX7xIUWm-hFKti!vtvB
zi;FE5f_#k$3iT`&dSOdAwZ?L<6|e8)k7Rs8A0<3DML40j;|Mv#?Qqh*+_#?VI;DAA
z#$4wQQ4zsX$6m9%plN>WCfT-UGj!y&uqIIy=~t<JrS_aWUeHh4pn~UtJtNp(%S6Oz
z<t`kVARBv97161Z2ou)z(NFLb+p-)m)G#MCXv|C7{+vgaV#WS@IM`%exTX%*>`i|T
znbk!8*qpW=*-l=t>l#X;##_+`N;PAYmXHovphjdBcyu89vN~b|`!j_l@G>b+)(>vc
zn9%f*d(N`uo$;f)+OPW%x7?{Lq7f+jQzgsP(<fI)<)iHYFS!-Uo&<EncD>4E-vyh}
z>D9e{vgNkjgKqOdsKg}!*{M)f;`$8WSpdnO9}t6EuB8F<nbG9(s&4c2E;~ZW<jcms
z!<fc7urGs6u)Dbdm{FZM05e+2*NtcaU`9u*#uq(!Mf3n%<sLc)fEk6`vj2w}eFxMS
zRT%$eMmg$RF4<P`b`y+n*KNV{tR}}}wdNtP=?@&PNY|_?EH<I=%p$YB)0dEN*{`r-
z@#QMwjoT{Njj?Doc<Cf67CAq6B#+@IG#AvIDL2n<>0klQsFbsU9#P+5uO107QOq|e
zzVsK}(`Ce}00=X?zDZ<t%|KOphQOyrdq$kyk?7tX;EI|tpWbg^%F1z)Ul}QTmfR-f
z;EPMH2)!yJ^7io}a9k7_0}Ng<U^g+=cD2`ymo#ww%po|ZSm;?ch7w|zm9eB=A#j&x
zaRMG@OEDN~`iv1r+oXL~<Gj!xvrtD79z89PES-;!E)wk6B0)r0%H9Knip&08QCefW
zWK9(0Ds2hMTR44avEdF7x_bT;@Wp}MzuPm^T*!Q5%02anGdeAP4T{3Z3%&!O(I+eE
zMp`yHB~f5tM9lqdZiT<lXdmeIe?g;7`A(7Lw9E#ofg)TrY(&#HC0ksRH>p!PF3z1T
z-!B+Toxy24V8LvVPdkc6q|hT<Uw+iUj4O9y80G?`(cdyr>1*`=NTVE8|45^C|45^g
zTo7|4XX7;V%6c?8Wp$RyE<jQ!yF=M!>fCc$v>J#U$_yfvGU!VOr)-YkymCKT%m>RB
z0n+H}jQN;zGC&$doSqT;AJVAlEH#sCOo^*F+Yi>gHZV=V-yNMj>5|(ejGdx<9I7s7
zq(B{v`p-W{dTpd_ft~Rc+_D+iiDT_VwA2c=XvX(p7`bRqYhV<6{a12ihP#wJ``L5y
zav<mJibRQAcH1S_28LRS%8ljOgy>X6m9F?S_zW0s-|BY?M}Yd5|G;_JU?7}$E77P|
zfWc=>1Q0(MRC;HA+nH8~JT%_@9w@1CKV<BJ8VR<J*SN~+h6oFj1ln$}gf+AX`sjeO
z*j+xCXmVGLY0yg(P*6xNTUF*NkyC{s*HoyCEgC`p;YWN3d4h7fqP8IqlDEOr@qS?u
zMM<!Bhx+)@Sg%<nHpD4VcB8h;Bg{bhu}0b*@(AISnpznePe=bI)ED$6L#k*S7h9*B
zH(DMw2Q~@O<}Cb$2i*^4j(v;K|8=nG7q~u(*$1o%lHJU-t(7PIVa*N6flj5@*V5PD
zS{dE9eREbQC1Y590Q2+RGW7|TB5kDJg<HZrg{CdsO@d|6zk{`Y=&mAlekmwr1*^!P
z#4j&x&a1nER-!8|K?g$7iq8PDiY9_p#!5;ik{-835;mowvN4u3337zf?!}>vOZfgF
z<VVEXp8q9`Gc**VUwoH6el^AuZ{6fA@^6!T7dZbQE9I2L8Yn*kWXc<3Zg5F<-ESLu
zLvgEZ8L*p5X9Fz3Y6?xxml#Z<UXbvPAz`=y^Ga^f%*evzt1QZM-<dqZQQIhcLF^6$
zxwB)Qi_+!m_qlI>>ok3onJAJuEJ@)oOuNz~+%<9S2&2FlrvpZx`}8#H(!D^oAa_pI
zQ-_CI2`5RsFmAI8X!L@k#znDc7pZt&{El8H4{j2fZbw}okLz1SLnnt}PKC(gi}Xx*
zBksgIx8cwNM-rL^qLHhzThf@yTOftUK;~vQ&?E6v(8v7R4}ZN;AtWL!5dHZS*%~hG
z7sMudJ)+u*Q$A)PtoqOW9vEJZ=y=jedI=T2t6n_iKKcloA?@T)2uR37KCACRpn|76
zbBvr1E+|~8#br)hFQy>s(K~{H<(&&!6VM|(&`DD&kj1U#W?IZByO??;1XmQ5H65(T
z7i(FTzuSY3%Wl=1?~K=j3==M?6}<9{g3Ig)bo!|4Z1ZGNVrmHNyCyAvsv{hiOFZ!k
zaDQaY_P5rBSjo1lKHLyHHxt;2JF_5^bxNJ+;CyBg8Q;8G$UODvboxx@{Lt!n=aMdr
zy&y{uxo%;*A(hgER{!Gk(|mm#0VMYP96!hB3uj!F4B4k_=?R&%1DsfeL92z6eG@Z4
zo}xp3^Gg(Q!8S623B{5zq{u~#Rn1A}viVN+pYt}G#Ptfi^cNmRER=jovP$rZ@fdIa
z5cDnnoI|79ZWCQxB<(401_TPdoFLmDt}}ez18~i;Ft3P)i23e<GcJbw&*R3nSPIse
z_xD3X*2f!9f0KGurAr}V%SAoVR;3KdJEXyQI3r!b%Nc#P2AllK$!4g^2Z*EF=Y$IO
z0CALtrY0C*E2>ImoEF{dX~>CIsP5p~Cw-wC;NAQ;addQHw~ku;y?&x4iVP87$*Q%1
z-=&zI&vo}WOBsE%$osUfKk9Dcn|{!J&GW~-**v(kH=CfPKPV=eoG)xbj1=3_VQo*p
zcVLzJTa+`J^~L>1F|YKN)+RwdGrC`P&vfaN*->Zw@_y+=JZ+db?$3_8doL8$pD%oE
z&+}ad^l|Q*5-jTFI5^bzx^A{;R2nDy_LubPljY#GHe1BkEsagPHy$qiS~$}6ykZ<_
z<rFcoyEJ~a)I}?xii3A0Wco&P4ui-R+zTitycg=Fgk{*_;_W6So=qfNmI2dUGlrM_
zTzS$sqKpsMyq7RABZ0G9Md?H`1|`b$snU79lgZBMLzBg>A!w@!81o*)L0a}ShUUaz
z_w=dsKWG_j5_4cThn>}LsULppB<~gx<iSXw=XBF6$gh`nJC%ScMFI;U*b5*z@i&3y
ziZL2P?FlOARq6Q*gS5mqNe7pluCOL@8o#{JHur2RDL2bXMW9-`fxKP1PR?=jjahNp
z!cN6^-A|CdEm;hw5(U4<=sZp^S~AMf&pkg4&P3S<;=(c|m6Y7Kk~yVnMx$~J4~m_P
z>QG6vFKhkebw<a(Sn^l-T?y()_-2IbO-OKH$)4uUI}COl%VDNK{H;5T8;kp!E#>fm
zS!r;^>VdYCsXW;rnoWU!+`ch%RZBULSj(K&*j&Oqm*g8vF_UN+`cijhnU6DKVwCQC
zjvQr6aI=~v18s9%=BMT91N@}RKtV=Z_w{?z$646WUEIm1rGv<hIHRw<f{-=T=cqne
z!0^c@Y`0<nB((&P|3WlX1|5D%sc_M5>gUt1p0`p&h>aLT*?<%^E^@FVSf3M>XG#)h
zC4H<>s*QW@`(2#jdunuMzi}l$15OGsXxJ$iHUA$L3u{(2CJ21*Z6|N2&~ecG6MG`5
zl5v04QAvw>+0ei0sATFYhlwFz@VIN*$eEiwy?>*ANr`F?68D3JVgN5?x-dS^v`l2t
zGkJgw`^>{t2QzjR@XP2~|FhvL5G;tq8x&&iyY!A|b__kF)CPPrG*nS6?u8R{iu$r7
z;Ob=b>2>jVw{{=Rtd;L`1l%1+r;t^X{gEj=An3Fg&+|k5cS+4PB=cuV_E`ha(8V}_
zR_L0%9o(3&oFRsUUbTqgG{1;CwsfKx0Z!O#vPsJk*%ND{6j+>N-yBEDj;y+n@gy%R
z+xDnLKFp#aD_m3#A#=VGHxw3P<wHU5#YCbv3f;MV7Kg%lqs$orBt}(h;+;&dv^7H;
zaLVFKqpYx9{X27A(+XY4&CvV$h0P-qsr0boxUli`8iKX+S{voH*DTWs(voP(n&c!3
z8pVz$7avwW*e4lA!ZIP@bMwfw(9770Nfn9&c^>CHu9r}m+s$S#riffhjRSNLaL58K
z3M>^*LNOnyVj$T0`;j!Cl*lAHfj^8H5L<rG2!2~LhnvGo^zB!gICbTezO=+eUY8pj
zl55%7+{7S$5hN{v2f8Ew{w)=G5lJmCOJLH9ML4KL?OljhTpI)&b~{*Os^_O{n`U{H
z8|k#BDGp9Ixzdi2<zy!yY1u-#LW7nLYugs&8g&vg!EH0?<f|0K72gdNC9eSd9!i4c
zZe%U7Zp@-o9ScTKn$9G~XEVROhh_v)oC@p}bZy<#bjt4K>pa|2J()Rkgt){GoHxr`
zU!I2$N=CSY7F=QGz`}^YI$}Pu_DeXq|CG<1tZ`7EMuXe(l%b(>54b3Ytv!39(57?I
zlK;9%K7}qC46@bO?f{*Do-0IIAF@0SdRSO&*4J{wfy4bAf!kLaFTyxd8$KRaR{DU9
zbgN6G3<qX)39nlu^(K*Ffa%S(Ku*F=s2H8O?$O^wJ6=z5VsRI6_>NpD(hOmSC;P1k
zgF{ZdmMs1Maz|NK%q+G0&dFzzH>nL(T%1kUkk1Yzif!UL(i8t|Q;kTF`*>22=K8mw
z#^LlBhQn=x1)aiV3s@JOd5hsg-0fOt1ncjf4emH&+0%To#M$GKuLMPlMrpg(Gu5WM
ztFu&2Z<R3Nrr&aL`_e~ownOkr!Oit*#&~KNdg;?+-Wl=y8H9Up63ZW?_n~nWI1_37
z#&4a-`OHC;xG6wC#eb#{eV7W0Y?1qJ^)e3^Bk@Gr-Qmlr^!8H4|G=e5WE;Q#D(zy3
zDD|{g6prfjDY;85HwtD+p@omg5M4DLaLgb{Pxhn{w432MAkj)2TC$zNHzI(m8zjD3
zwyd!C7aj%k68yz|3GLd<V6K9ov^mhQcvbhH83nyokT5@Q-)?qBZ&^sO3R=g-8Zfc0
z;%=isz~i-bt~%9kN~MvzB=2XX@a@ypcU+CX@%x%aF1>+IMcT5%IcWTjZbA7k9}$@3
zN2E8ipsVMd{L);Nt+0SyLW#~uUOxZNt0_k4z)12R;{KD#;~Ehbw805N3A?)3+d@Ka
za-a)>{>VRW#~ai>Z%zgKc$CfdfXASFfO!#oBRn4Q@(&upK5A^kLfsr*Z!k>cp;|x_
zeRWr<g@8=;+G16o{VgqNw3RL%59TLL-qQS9hfzK7RR&<+_FjhC!dy<nv<LT;tGst{
z<fwxn9nTvNBi8?YR(^tNMFL?=b||41{<7zHRN<W!qOkN`2!Hvxge#xGln@;h%>B86
z8R%*SR@1B&mh4d+#dSVui;8dYSD)@TyzLaECGDTFX{4F4NUwMYpZ{NZbWCpr!RnCm
z0IiplW2+z3K1-Tc`+3r>Rm;{gnC(H8Gr~X}7`KE1M+oZcx#?<^u{@KmAPYpvw=LtD
z?Ii9iLLr4Bu>f>|&C8q>ABJ6{OioYh^EEZ^^`_9NfLndAt%*wjvXeWJdpP2!92#RT
z+eV6$!~-s`u7?viC_MWX$gjgO;j(xlgGEv*e)UPS2up~j8Y?hvxLwp?{P)Rgq&sqv
zb`kxmGG;@VB_qBN{MdX64vq3uXOY6Xr$FAl$P6<vIzLP`9y>Pdqb3;Ng}nt}nqbbR
z>7tDXVhaQG?}C;~RB1?INr^mUJJ&*!z-iSA;?8E(sxdKhM4S&Px^Pj6uY&Ua1;zXk
zB;(I9k8{Q%lj7WY#PZZc?8B0Zt;52D)aHnHYVH`_M+eS7*g+-hK>IWZ$G>Hi7>>^7
zTjEAx7%1XPkPM<sBf`U>$3#rNNO1Gy`%QBl5*B8WX@9V9a6MBl@<rWvf`$&NB7Ks0
z94BCk#+$caLPG(IXlgvT5LvpO{KQGudkvi?2VAMMT9A3)^C%2Vq{ju;q2U_mswD%J
z%e!!<h*C5ER*lyum#`A-xA+e|ic=JzZgusS9-Tzmr{rTT;>chiEHi-$uykLdh&6Ep
z=gXD2doGiTOL#79V>t!SBq|tUC~nkuQMi-Eo<iu4O9y1j^#c|d)|xs)NT3(<>9r}X
z%lwz?b$h99lQ~9rv36z6)v*mpgpk0AzsXAO9wagS{OQ6zUweoTWiJ*OiU3A0pGP*S
zN-H~z<dJpDoDfrsD55=uDW)CyCHET#?C#iCIFTvT1dJ9&NbYrE{aW2pfQz{dtGV`z
z@cxUMH^_p_b)zu%7WS76G+Y2l`;qY}3^QtMQ**tdtFR_c3{&>uy7VIJXR~rFx=dd?
zb3H{VZ%nZad1ZWzLobg<vyL7s!lFO@I@7DXtm?!mXr5xwQM%nymU_^6bDRK*3y7|i
zA}HCr;>>=5DxKIK#h{fpEU|FNZw?xKCVvx(C0s1cNN1F(K>umA*V$6*Bnk95PV5}8
zuw7W6aaSn_<MbRJt8y6$Bkqu<w4Ej7yhhRt$SpKKz{3&10b=_^NeEFrsG_ei1R8Dc
zl^V%1I&Huzb(o!3J9K}ma95i=i;$6&-A!$K<&yu+Dkwo|Nn5wHVz#h!`t35ZMtBlz
zbLN=zgkv}H#DU|q2!<OcG5QXcYSP%kgY!5Krb0~>V2@&iJB0kTM+1}@`vLZ7o|8no
z14YW46{&jxz#i@LDnLS@5h7A+_zAE_fAfd*<Vl_zIEKA$(to1UH~8f?@3t_kWH?Z=
z!SRZ*ImI?3FSSD$;L3j1=Hr<KGN7DFlS4u(rs$iyh;ao8>4Bb-ekY?lDpM7&G?|3A
zkv;qfY;&aS=145CN>s-<o(hZEk&EkPAjm!Ua9SwzMDZPWj}^(iHsZZ{*a{BKcLCQF
zr4cSojmhVyC%U!3sxyoZwl^F1y0+i23=Y-8feAI0lrE4fk;xjQ9?pWQz|BKmX6ui#
z8ETB%Fk~km7{@*!CQY11WkN<CCP)!q!hm0h5}%>ds|E2^oyo4#oW>_24t$`8u6?iN
z_-=6HEgkSv>X^0AvqAhV`nMwjpJ+|Hp)U16-xIk>Ph4o=d5Z><OHb|q`%Z^}18jFs
z7~lr{esPBaiX2#GZ|X|ydLce-dff;IN;xxJ=7CY?+g6v!6Qs|HN!aV0e|ObTc;qMt
zn2=i4ckrL$STAg%oN=L@M8+k^(*;Gx9J-9_z%wwN&Yr`h*y35#jwEA9S<rk0?uqvn
zs*<HeT2c^>jMr5kj&kP#j9m;w>`niLj|%lrTZk4a0q{{m-@o{1k^;JV?3mMEd^8;k
zKjWS8e7IB{%emHGAkhms^83>{x3N<(R)PcNz;>SMM$9@D(JoN_4Yy}_dM7mfaInEt
zunH$dJ;Q6zSNSZKm7{oy5~8Rq?R)5aDDNP%1M-&$HJcdm6TY<O#`Y8;n-sA2+%IG5
zurSeg_vg_@Hgi?-)R|I=`FmI@xRd4V>!%N6Pr?o{`yjI%+-b9PS$Ok|<WrCJh+`}<
zUU$3AX7nDzxu<La7nfE*-o*8BP`4)W<NSl+UG{72C^Sq-Vg&e_Y*Q1&+C+_Gc>kv6
z?;!C=l%%^YkZ~a7dkGgyV&$+TG#lD4GTTB!{NWMjoY#gtQNHfBj?7PCXY1ir)F|84
zPEVC%jHr*b0Pp|pkZ#>Ct-k&(;CisduA$CN=S(3rr382pn$sm<U#=!zY;I&@ZzW?*
z+imfg!apJCoKET8T;@!c>{cypF0gi%!(Q|+G|bTCiy~gy6TTq24(~%FXcDev(xTB<
zZa6~C+$Z3@iWJapt+o|+&=OpEQr5S@5(bmnH^N*67@}SPa2BylW_;{{ULMYTQTOeV
z%kCP2MS`msUe@OSxl0sTY_7KfIt`_tq|1x*0qJRoap5-pq5ZE!S|j<7MCv{ckVqrK
zPR)Av&;39qkP9ljfHpMX`oDI#FQte|QSTiE9i7ws+t!02=jf{#KKt@`ym^$u-nnMm
zNBb-C;8w<f?|RA`F*_Ro7U>(nA`K~G-CBZPBZ%rgbC3k}p{yF;*@~dT09^2aHziPl
z?A8e^dg%Q|Zcp&~k3d?ld3>EZm^z(rWYm%X-yG=Mu2a_q%^zyp77^S$<j31VTtY9$
zq#$?Sdy>FKKAVPo{O(RU^TOF|J9GKu{v+PYZ)T+Y>G3=0<sSd<6%HqcCQ0@+D`_;S
zyvqi#T!Iu!^A6O^gk*oxu_<6=$KuEXo8)?%bXcHl`lvb0{P*g_((b)cs9?*^T&5N+
zU}Zny@Z}$tlZxRhN+)Yb{doHZeFD-|c!c5ezY5`gWJW^_QwiF}152f=2*Wgjsxd6b
zog<zIHlkTFkZxZXYl}7{9*SjvzF0U)9~cZ_C;Bass!&k`Oa4Yk9fhS4Hin^$Y<R-)
z47S8VUf!PrNgp&_SZZn--*^yi4~I;Fe9F=#7F+0`BoB`N9xs0@;bSaphwJ7}S+J>B
z*5#cPKQO#1-VwRZej1^RLSLkgGA~GcYLHlThR<2S_l3a(l+a$TnW+`Y%6^(aHFbYd
zG+^>`A~=nXC}5qJ`{GyQT=DlLiM4~<wm-;;+0wR1aIG{$PV5DXfX-!nba3&H-=P15
z7L+6_a3`W;d(nm->XfT64{~F0W2BU_##xXOwYehcIzP1#>|mEEymG?lwa_h0BLeur
zWqx8oqQsM`Q{Jx?_nyqSwk}Ff!%YcE^XScUK*yO%)JnlKlp0Q2tq=9Ir<zH0vV~*i
ze%SfFYFfww8J_8S|04YWk?i3SW9Tsy^)H<Km5=cQm~<%IJmLu#f_VpWqwBR@6v(e)
zm<F2&*DTR2o)KH@rYu~TQOlu#UVVs8gDmh{qwMyF$L;!Bf$s6TENv1pY&UZol!Ada
z{6T>l85S)$T5L#IFnwa}x-3JM_Cq3EsXnP`AzS1Zi!&2HGIHLsjHCw;EfEO<l*bu4
zSs7?VQ7|M%Xx8=N?py*|w=7lzNs4`RQ&~(XDPx8gu~c+7g<`7eevuexGR${8Yvo@J
zh1j&{H^s#1yN&Z|Kf|xtz&hK3{w~4}aGUg)R8Zx)H)VhIGr15B2O4zlo`PvqLQjJ4
z-u(I=8K{}xQOcPixj1s@2-D8#oki5F0Sq~wK|n;D7EF>5JR{9Hwr*Z)&|<~V-gp-|
zWXZ)ZhLYO`%vd{d$fJ01e+MvN9^Y-By)R+pzKjg`T9|n2ncjj#6p5+EP9xT)77AJd
zAk9zHasi#P-!)rq!dbds-;Ew+ypMiryX-zJH#?7D{8Lx9n)&<XyVn+w;)G|eQa4rb
zvEBX#8@o8j>GRJ^D)u(bm5VA;k>)#t6-I^~0S11aUq6n)e;j#_q}WEQS^oYA3{s~w
z=`~71zyLwt_}M2SzEL8<mecle()O{J=<|t-1DE3b$2D&;0HOt<^I;y|8+w{uaFn}*
ztiu@S0PK{QIWPBlq4qk^DXq03u&v*?#tUhvtnu~Nxi*U9vcX_iFYTe$-bGnx)EOxm
znsek3h+u>Vf?O6S3}<N^LpW|)OsU&Ty{={pMzhL5wKyvb;2x){I6OzQIekX00vSI6
z@Fl2Wg4#WY`G}*u(yYv6PY9ai%Ta>DL)q5D)WXq@1Oy|Q@(ZRpYi|oiQbU;xj!Fgs
zGssq@8OAqB1gVL|JT0IsY3a}!U4lK!lW3$8Xs5fgn3pIft2WwQvu>JaCRDREFa%m4
zl0jF?%VlQOGFh>oTgtM+=Y}uS?QJ9KuyTC&sOHCo%hOZa2aYA_?Y9v<Nk~XUq-*KC
zW*dwiNEt9=eG1|i;kz0nE^+7j)){-(yYzrA>h^xw@@S08clkMdPM46rGaLtv!590L
zqJ;q!UafJ2TQ|gcewX|NUF*l88Rj(eoGyRbWXOaPn}S8Ci?z1p(!vb6^J>bEBQ@5(
z1@W1`r6OZbCojiO`(~NiCSH=HX%oKXu9-19%M=tm4Ev1+1s_0mzJ!mIi62n`P+se2
z<;Qsp6%l*JCtge?rjmv^j==S^G9{!E<U2T%#-g0DP%5{``8*Qwv-wRP11j|untD&o
zj=KxE4&E!Z&v!h^`Lvb2E%hgcwFIC4M&Anro%aiY&pc=14VMU-ZK*s{=+=raUhcLM
zmku4SY_rvL7V#QK8ZyN$m}?6sX>-vhH_e)3SH3n#^Xm6<kTndjf9XwB285=M&bH?}
zXE*1AQ44=%{k+^vbsJ9$pGM;)ynR^*iBDl$1SdHKN7b2P(+<1?sgYD%#4va`;TON8
zI(|dac9r0C7}heWwI=y3uC%&3<SNJLgmB``HmoB8c{_|kF)o?`dywkeOx%_25>X#>
zkhNsw{x<6rzB=ht`O`tX+(wq4&|y3I7XF3Mg$~D)IA`VS@9NmT9QpIJ_jA4~z@vw{
z(T0>n#_zzR<|PfCBQIMtu6sW37kTyyR3Lm^INtTJBxSCGkAPD3AvlRV60WYlr3AR`
z)bZRGb!@?AYRaa6hgRUCaLpG|jT}j+gQbj%n;l5wj;Mt_<8W?Z>y7{?w)b>_gLnM@
zPn0YGnt9tgDn-CBMdx*KH5O(l)QAqB;j<NHkE)Q*x1G~A)HP*g5j&2e-8JLADkY9<
z`!Tip<IttX&}__m$v4U1YpZ0J&oLyix{Oz5%FP*F5Esm+oPZ8f0ue7b-I*$vpk-Ws
z2dEECu1V2$^eoLS&&q#M-wNEQDKFl%%-ClS%jr$^riJx!Jfx26k;W2yruxbh@k{L~
zD3CTtuEVZk>O?6LoLHhfPYhmbGqas{>~cHnB+||L<mheamSg@VyQ9DMM!ji3PKK6@
zQ6>$rr##}F6h`r09IcjTJwu~q<5U%0r9XHYIK7p$M(fXn)*|{*8m`L6HvGPEjp9J!
z*m1S-PIJHx{p0GrL~n`VNtUy~4Y|#%a}`|rQU%Lb-prYF4JE6?T3E+_VnZ~~bGv!S
z%==;Y2eY2s_tYrsneVLy5t0SVYzU+WtoV#hBjg9n`>ztbg|{npI`}!|2D{*6MKA9N
zbUrYeuPh$)Vx7~c520btI!2-^HXEAH`n>zr&X+VUSUfu>mCzevNvLgC@!g07-C2Pq
z(n4;L9Ylpc*k8(y+^MRdSF#Gpua!)Y8TyDLs^C*JxdN0vh0oxS#v2h;ej?OGRq(f(
zZ2#Y1IVT@D;4cs{4#bYH4A2!bT|-}k;mxYPTU?dM(wW1Xd3$;BR(tvM_cudeeHX8d
zs0Y;h7+VWPJG{>yVP*as;^w<W3D;@K`PXhkl{FptJ*_#@L9b6g6g7F%vE-fL`iE}s
z^J?hbuAC;}J+_PRYPH$j&-(arlkm?MQ}fqAT+7v)!_4%P1M}np&k4$T$F8N@<?4qJ
z4I0q<+5U%sSO<?8$XM<(d|P*MFQe9U)aaE;g5Jq~P12i`v~|E1FY@4qBEH8uH2Aeo
z>SWWdg|OQ*$0%$*RB)it&?q&@J5?-cUL%1X8ZU|R_p#5}oR1vq*uJ93y@cCoI;-ge
znX!K9itTYeP~y>p3C4?s1t^m$zF13f=kB6?A*x+fuwh*&W;W#JAJofJiaNr<8gHjU
z7mctk!GmJ60?yt>k8tPF08S0!V)?~+;-3s2ex-5~lXp(Yp$I?`EB|nkJ^{I^kW?V)
zlS;tnRkC*r+Q3M0wRh1~mB!J-)Y0-neS7C@uD$95b7qyHCmb3&0zq&#f!?q>vs3Y)
zwxTC1zLNGd8cyoxET(h6rtN84NObu|ZFUf?_xfE=-s;PrO0hsL3+LB)co)-Gl#%aE
zw8OV+^p$o+KUeN(2v9+`CXg(SkaX`DI$|!<4wSArQA+EpwcxLd)8HVC9pxq3qg+yB
z7>E?U0$~_Tii0|M*UoA*#RKMDCGqFL^ihhz(SSc=sglufx6@GxyRzA&$4*;s5m9Sq
z$Qs_LqPon%I%Dw1mdd$vks&lMc+g#bSSIBjE1)Z3{-}BTn7lww3m6!=sMf;VO*1jV
z^20rGrcYvT+<3nRqd`&qW)GAL6I5U0N8d!tRl}q&i2oO1_Y@vm+ogdzHdk!hwr$(i
zif!A;itS{@wr$(C?Va!cx_kG&`e4^J&+DXW&Z>8e=ebjm4~$QS=i#ux1$RbTh!=(-
z=MEZXiYfLJiDW1bp8!_)kLxv24K3SAG;&2`j(754h_%<DYlb4?2$l7IW$z~A5XYLc
zh>`6p33=rg80+#Y!uK(dbDX%C&p={<#8--m-mrkHqGp>Utke!}xIya*skUlUt7V2w
z;afV+{T8qpA%{E;eCGT}Ett(QGPm85RKuT|BIx)M9nZZ7q!?DGg&xI!iFsv_O+blX
z;chU2$&TVQEiKF(np5U<3I-D}?)7X*39-DgUMMe&H%MJGc2yON>wETJ=6LJ%+t#7T
zs-l{U&FjJbC+1e=HT+-dIK#>YEhZFtPf{caRd9+1%ca2076#=Z<ps5%p`=7vD%($8
zLBd&!I?|Zj(ICjF_u#0vRG%T%o8Flh)`$Q{R=?KawBONX&7zkDV7Hg3D3!`cRl;QS
zAti@+01xw3xa>b7`OUu~Iro2y<avnyisVP;T06RbOm~pPd!E#U?=CJTP%4-sZAKm`
zDCtI<i8b&s$BL*7+jz>7gq#}*uPfiyv9$J|B0`?L?24a48w5KUfgsFU!_w~TSK3-6
zj>EoEJ2tL<oAuW%O4$!Gg1yB(6wrdnBTO?<#3RZm^!xcU7=&h+vI}0AUppf-|6Pg;
zMxiw$T6SK+hK1pkvkyc7IfMrf9Qe2|D<vHe5*jqzLKL;I)Gzka@ePz5%a8W}7rg?0
z@7)$JLwp1r`ZuaMLspGZGJQ13r2J~D78KWqgPIK0NKCIr`U{QDnM#aj5IP}5w5K8h
z1wOc3yevW8d7|1<zgeJ9j#kr6wi4o-h^1ynfjFV2RGe!qpBzynOP~N{{od!!LWXcS
zZ#^lFq-o-NrJ;$%2v3TklI#kQP1Ce!jw4sKcO$apv2oW=r0$HE8`5*|r<%D7?B`?n
z`C=&$+@ECW&a4#d+27lMWeC43B{L*w^imADyj^>xzI=qf#t=1h3(FooJ_kd8^d1ji
z4lD;vK0H^jXehSIDG%m-kt$*m{O}Y-*BQzh!{M4mPHAz)&LOu7jYe{PCtCz0z=I!M
zcq;M>ktQO%GeE4KcvaiLC2j-bc2F_%j2v%C#ns&<8*B-MTM*k>uk=#lr;20~>S%`n
zZ`{}yvD!^p(<eR8ETYEK0W|ctJ9lIanhuqJA-E23G?}tOd>_K3cZC12XBM*oC7$y;
z3QElH9GcCMFJntDvf8Z@nwH*V-nHM*SAbvAZ?lBi)i;`M_Q=Mm|A5?teSHC6EiJBG
z(OMO)Rq+&uA(V*17Csi)-C<<`+=B`R@|u^Iwg#W&_?l_NiSp26Q9?t`^f0;GUw%=9
zzy^8lKnl2KL*-sKkcWT^_*-a&82cR@SX>}tb1L+wXw5;V+C#%y$j*%dlI+^o`W1H)
zt(3Y|pUn}*X2_=HH*W~FoW?W8-={|#{13#)5PIv*cix-n7&Im~l83S|9j_r$*_4Z6
z_TF8Zvtu*Jaq*?3?VRg|i}@qz;bWC?)=aJ83oaC8DILNBU)jKr<TN{8OpTJZID5a4
zwD!=Pck=B4=2k)NVNGU?asTl7Y@n>dNKD}`hYOGeUd07Rq%Hz|?ap()IU?&FG1xeu
z_*zVrjJ3$@;o&`$CGJ{V_0BBzutYw-;u>T_zZEi$WiAjp4YM;qZqvwwYR+=K_KO8j
z=upKgxs?P|Um&GiMY%q|UA}mPt5LmENx{EuleBC-Lw<e7g$U2e2{SCGsYvK@N9+O1
zL+NF{V;~sI>FK6^=DED5tW9<Ya*aniMmov`ilcvONt(s;Jfc53u0-6GGPxi+2y1DC
zGWL)4g_Q~ulJZw!pkrYbXx60c6Re$FJDw>3!83#xsf7a~FbQLC0P0@?1swBtu@{yK
zLZTfsm6luXq<$LQt?GT677G4x8n~feEuzA|!noE4I6y+Oud+_ykrSFbA#?B|4rba`
zM#ig@hykQWAv)luu%j%_H-R=P`%k_T(^z%uB5(|LhCG>h5uX?}al24BROWCqZbIU6
z<8^0%60jCilGM<C!C*!ey<Dgz$SLj41|%RYgRw7n^qKxU8cVY2Kd?59sc<sQk;I?>
zmFi4BnhOr4$fiOzpJ;{Ql})ra;^2=krDXBOnZmiHOPo>`Gk`Sc0|KlPLgQsdRZrgz
z+%U6thqd=F2j)dmPO~D{RTd-ffP!r+X=$q3mBA5r%;o{GpLTOJg)4kb!sKuY6CxDS
z#<zz^v5{FvEBWy(1=o8qLRSh9t1%Cv{}DOez8H8l(T=-%8%qT;ypO=tUwynEbC8B{
z25xt54WxaWZF%_vr4<;$EY-4?ecFEKaJG8w3H&?6_rCN0H16xNvv$7&!aRWC1@Glp
z%udzw@HRS~;&`73TpOtUv1l*Qwxw=Sx%77-lk+7ORzuC(P7m>_X}<PRstREui*v#1
z+yBF4gtzYe_#c$kv0xq#V8fb(V)(#1?K~E8*+2!X5nQn5XSN+LlR{9KMy{t{u93~L
zG<u$@8Lt=tnmRsFq2fC6R!GmSPcJShy1pXTz_-E%UTdz^`VapVMy<_k;1WQ+J^!!>
zscfHbtd0f#!u`*_(DY6p6~vXfoXk>jcarGfYbB!MqE{rED;n0NPRB1(s}$EB+;S>F
z8=Z_1zUvO9>n2m>F>*}Qlf>9d=!BfcdM}0i-s-%~$T?;(fv~l;hH8DDl`~Ot+F>&y
z=)Yu$GQ@js61(u78Ccp%fRaMZ+WnM5Yu}4%=q@8X$y<)R-8)F}VR?t)=c%Z)lN%bD
ztr2AjJkC%e^f)2f(!?dqKV{Cdc;8W__(=BQ;hAI|R|@u8m1_4VsugnAC#vOY=V1rd
z?V_B+aF^_ATKg*o{PSn$SV^FiyY3{Zhi$lIj5{D<{)b?o!fe2eyXi^vo7fv6R+_7d
z1Sp6$!SZKBBq|Zr4r&~@G6Awh_}o?{D~qh%A(lGc@4S*aMFFfylAlmCpf)Qi#Km^I
z!b&~}8HHmySSL`=H4crn&{mayhQ{#Ew(qUV=yj7^oI6f5&+{KYw9H^`(a!%9GlF~$
zkW1r?F>tg}@+a^;O_mZ@yK7D)H-LDC`JCCFR2b;&+;3+-V*IJ`a;-j%$<WTI^1Xp6
z`3#=*TD5zFCFT)c(j7f!Ojh6;XKm|_Pz@<RcMr0Xj=MbfW2U7{Y2PEFOo_x4K6GHf
zQ)ty@7SE+gc?Ma70`vC35ly|k!-+Xc5e4s8@hSjXraDSCcnNP!dSYL#Y*kC>z<kOf
zCIc?p{0I(i^~6W=bdMixQl=9)R**$?+oHGHsmE|5sY)KLlxwuIAr~(zUs#apBY{ab
zIcKbLnkz?5n@YZka3W?@rsFGZ;E#F-H-d_uLZuPYJ`6j;DG}oQMYk7NhkMq<HK%8W
z)xgQPsfg9YkuZPPy1hKJH)5eEKX(Tmklo;+z!EVtuacyU6}w!V>U#^2VFNHE6Si%m
zH7KF5Bt*_DWay*z8;BGfy?h!Vs(J%GV*!SW^(QmiKO>(DK*c(btLu%`WJklc93b<%
zUf`W22T~JwynCjSZS_gcqM6|W!(2q^_csU>f+l{`n2T@v(?`Wu(4OW2+q}BsRmQBg
zB5g}V*nMO8pLcz2RoGQwOiQ-kh@;O93`8<aK`JN80CaeM`+jR1@r6uR##HOuHKN(x
z_rZNM`Au6jqooNwoQ1vQYS~|~CbX1`mQ}T-rsNN{=1K}>N0jIh>00V(X{8`+9r5I9
z3RI!+*7$1%*`QhO<j;=%zv#*mP2E$J<>@XHpwdiq6a@kmN==I!N%KNfu<-d3B}19P
zRc_>C8A_E~0Ws1*dc_1uiKeGvlqt_`LH;@kSOHNc&0+cd_5L%-cPMek@WqU#BExqW
ze#7>0AW1bqc=G$M-P^LMORHgZb_2&VJEWkM3QWMxhy677)*gs50!z;7__{XEm5z2@
zE+#R~wRZvHUnTni=Dvry%ym5Zq&Qo5Av^5(cmC`x-`n`pW>)*t?`QZbTK6}3;e?w>
zie6EYh%xkfx7(vgaEHZ+AOW;QMzFFXimw}+u2hb~Uj5K0fv14<%q5>!P-$6-B}7ZY
z2tovA7l_<O1L`%h97OwjvzO)I)@A+7u|qxZ@|mC8;v%iw;uN3XpS9Rn8gJLRer&7p
zYz~7QE(^Vnc-A(rV~lLrK?Ia$(Mov@Tyetp05I{3ku^2CpA2h*EpJna>?#`X@NhSG
zHeE_Juf7$4+=Kp@l#nbi8ysf<%Y-XCh8nH+k6E)4VyUDS2N6>}^+(IbTkTThKCHR+
zi>hDM4tFhGDVs+SRr)C13*-3rZNh-vZ1z9C8D(*JXah>27ptV_;2p$QtFUxa2s=cd
z=d07f<nryn;?%aD*X!%vhKvvq_zf;%-%_bUOMZIQdB7rs%zP1i@{lVodis;y=~Y!$
z+^{GOoLo0k;tzOck`*P`BT&w5XyWLG%uHE`v>QRqcsk6=vC+g`X$@?-zBao`mP6g~
zqA+no6AkB>Z^DMcidICX{2V-%?jK~qgkoulw3ddvOO;mhVojz5Jozw-Q`Q9!LsbT9
zV$y-UPhiyx3BXOlrLrh#h_a2IjIACF*Ly!59;sZ$p9J@v`I4EwMJ1q)#D8G*FVJ*(
z!(pdvFX8yS_kVPzP|#6pqAL(ujA9TyXti}$-Wl2qY)p#j-{pVJgSpLh+i((g0UeYL
z<UijoFAJ9$KQ3tPNuqBMlaoF8nPo1g9}m}%fUOO)Z4!Frb%nHjTx(|mDXXHJEx^EI
zGh!nZk>EYkKg%{UosB`A)W%b~G(8^7cORp?UEh{p&!6^{$i0cxB!&=u)Y>1s44yzQ
zQA+B6`NgfCA{CG(GKJ+R$-o!=M{BC9yWo}U5@BXK%O#(<NfjGCJ$9p^^SN(6wBn=V
z?hb)>+gOf|XBEc4^*q8h#kpW=w$jTf<>9dLwU-42Wfnsy-hD%~NE{Z0mNje)io=S-
z(+RU-nwqm>Jo48;Y&G7P=Ircf`PoN*%EG%$Bm)*F^}s^RKNX_2I+kUc0#@6W3&(dx
zP3LT62j7x>x;J}()eZd<6H*d8*0x{AWC^?9+SX&*LUF+S(7;!fCrI3%!5yk>L(vfc
zji%j1NPyZ97n&VSbXkO#=dFyQoC*o}h}>On_@C4Vl&Ch9pklo{wyl;HPnMzsy<oK{
zG{=UjaAXhyy~tS3-vpDgVz~usWw#b4qOt&k&=^}4OZ`mdEG^B+M%-K@Tr+jJ^e+sL
zqopJ?|9=oRrp$0!kZ@5yhCpGz%~P~pP<XZ<h)pZ_uNf&p_QjjP0-zo{bjK>mJqjY0
zZGk;kqY&zoAkM#Z6hZQdmyZZ6wD8ng>8mRcQJGkrT%)9YLuPFPkS;$WO#+32FP-3Q
zhbo-xr^RUKOCh*k4?}bQd`P%B_Arkce?Xe$ggS!;1WJ(CV=lvxhHTXgh^%JmAH%n<
z3ZO}uxYO}t$UXW{?p_^+eA^)a!HxIp_9HkTw=tZzR4$Od7)rCO0zR5#xKVCA-!JxJ
zR4~%$i)4eW+(A3s5u>dU^nHck`-doQep`T?*O6YI3R8Ru*p)=n8KBu+lDPoo{UcyX
zo1YEaBPiH1;g~U~6f)W6sjo^GM<7T2KWg)YD8h#U7~z0LAm4<cDF@OgEJ2d!V9tDs
z2k>ty6;%bLQ9AP$Q&#V17YY6QY(G<#&t4_BT|{-3{ctb0ln`e1boUl^*MQ<Yz^CW`
zWbLh%WA{09Q?qotIBFk08PxvjaXG?o^AU{BOjx`h08dz9u>#j3uj*7>-OM#BCdMa!
z#@x0Txj>iukKdhPOjx=UlK1?YrTjq6RxM*>;A<&+gUC}?!w+o*CK|-%os*snHPBbF
z4(Vacb>ZpTntLvv1!J*In>T4uqWd7>YIqZ!&HJWZWcx|sD^y}8J@Icu{--G94yrkL
zHS{M&+0l`vt4Z-X$dx)o^`mN-@ulmfdqrp!<H{_g0ixAR^XTz%bb2|PzOGxH|NY~v
zGS)_EqLb9e|LGze_@S_ehi#<Oi;$~t-X!eHGI;gV7oxHv)lEe4Xe&tBNgJ$6^(`fl
znL(kW%O&AZUo`a9x9f~yhr_tzE2|A+^6|sT<6Pz-<<M=VX}Eo|OpmlR(^I+vx=;LP
zyfsyMNkFT|<@(z{NcdwiEKN(Il*wV%K!Jgf`33_g*wkkkbpAZxGzqd;i{R%^@^LY)
z5+D<hK!n5?wqSNAZJ3W^z{z3)d~<#ky1d@M#Vh2nU_jEw`l(}rFsq}E%7R1lMhJFD
zgfbKuMeUq7FNnt0dYM0HpC5B9(l<ZzCKBDPg+OI8Vn1OWT{if6`;Tj%_<!*!b^|Bx
zyDRziX08x$UK2iF(dYAWy30{CzyDXDLNoQvRR?}|h>(ifS!)z4%{l(;>^JIBQK#NY
zZJgeKBl+J3RnX{it?gy|yt0c`_lL{v1e2k$(`c8y6)2-ib9Xm}foC~V3>57CdXh?+
z2j2%=(RFInPN3=k2B`|2GJ~L9!DOe2=r>3d2FNpvbN&mY62dF`|L3SS&#Tese1rcL
zsV<&>jz9&(NU!$mk(9yAbgLtumj~a^_dd7%-T2ff06D(deo3bP_Dy?BC@A;?wOqBO
zo9+(C4N%Z0RbAc7CK`r=3j;HTW;)H(d5Zm_v%>V^<;JhUy;pOTa3~<W%6A!zRUg9~
z4@lb3m}7<-%U7-A)A8^|IAiLSz2OcRLa&!uuH$Yqt(R*JaBqK$Ee>}y^<OSkjiTy3
z4#;i+N5W50V}z1OyB=PSAz@U>_S$`l!GK<ldUj@0&QFQYchGRetudbmoL99YY2WJz
zX?WfL7pFQAd%i5gdfI>%Lr>!gNG03fSNf>c{ugO7J@%7pVFE6X;9Q~Z$kg@XxX(y1
zJr~vB`h==lR0esxjBQ*-YAkUy4xbyJMb1!QjZCdo_jvd-<$Srz<LP7gIwyR&#PEI_
ztamI|9XWZ<w1qkw+sZbvPBZ1*_;8X<MOAoSbeix&?_EIrr|Dmv3A7sYeUQX9x}CU3
zFLSHLYa1~4%6wB49$2DFI~>TzI`lkUZiFpRE)tCmn~iHW#~Ac4Q2fY#dlw)FY<hP4
z{3UKU|MhD4W2{<ggFoI&+Is5TE~pe*K=?p><0^Ggkc;e+^RI5_b|(>onkoYIWrT`|
z5Wf(tycAIO^stKfc4qXT$bY*^;6&p}Qp@VSuRx8j6-2o3ZCdTIY+3Oy>-km^=?JVb
zjv`I3X=O48qG(VkXeM;?rS$=xt4=^NskX;RZoq2Ukpigdu6>0R8YFf)Xi-?7U$^H$
z)=_|jG$<vqc&<ajT}}E7NxV*O32@QnSeP062(f7<Hnlt+BPba!+n+7KlgYYz5t88e
z9U?Ls7z@lw!Vg?=6U)r(U`YtDEzxM#<`WUpW3zapjvqYU9uSY#!^rn^FIN}t)LfG=
zz?d950IoBQk|CAzZ^xr}d^cEdo<Z<pad&gH8`F%WeS8d_@Mfq>M3h8>BoH^SY`+CR
z<!=5VniS3NkV8UmrLVGG38+Z32h@(250|1xE7cTeKi`wNzT3x=zBea^ghY3C3oi82
zV<0(a)?J9>r*EGB8gjV$VgCsVSFvF_sQk9BFhIO)o8?mTS&Z6htTXhI<r@C!=J>xz
z6K`VFQ$$<<`A0ckX9&bYkqu4fho|+MR@^PZ7IMy0<7wI;nd4JsJzxrE*?VR<ulDtT
z%&fS177AjwFQ3r&&8?F*3dVnvO>gg$Wo7B*ESj$LZvE*A#bqqQ{O53BKbNg+YB%nS
ziH=8?!>Kz}{;BxwV2~y~Q!5L@hY}|MB0dJDP9BSJz|cjR5!HT6ntWRG__xKY5GKOG
zkJI!;^RLr1@f+gDX)+9S`!A=dF8Uv*$-;+9Gqt;u`*v_|Lv8AW|BG%v=%G~@p^n*J
zb_4f@=+H-6n$}vpRh;O3Ve}9Q{s%W>DtqfpK@JFmLWJX`mm-i&XoJ|!c#@lmK4sf2
zM5r^e?NQp*QyQMr1|7TB{QP;7HYgBb_2%NsNF`-pgSjvO_n&aZ?cqe<#pdAqyz*?#
z%gxx{p=Si>+g=gb(<W7FrvJj{Xpr~r+T8)@Pi15^u5jc1&o|vHvr7nMmVm5d$P-Am
zh1YUaA_g6yE-N$t)!Ba9M3_!a7Rp0=3L*H!ULa7Qg8tT`i9S@WV7w}eyR)1&$88jA
z)QYxp9BDWW5szTaMiG!6gXd|DmqW%GD^d=RX3KYA5gwN+e&#72i*?o<BueGZ(h+$w
zGXubk6OtfqeHX;asHB<E9|A;Pxu(_YQp5V_xahunrMgOSmGlzfn$gdGg#KxtWu^;Z
zal9?xL}#|H%KTQLp$7e8><KLN>R)8Zg~0OS@r{`|oy!~@Ixa3{PGwoP!is2183OkQ
zS-SK;Qbj!ad-*@eQc3R*vSj`bvb0ntv|U4Z*!vH%)cd*yJ+%ya_=7C{sZPr6INiG$
z0^R-}WJ$y7KgiN<-#^Hbd5?kh53*GL53*!d_7Aeeo%tVR$>;$~ng4%~r6CBIkRN2J
zMI>+rVNV7@g8=DkQd`f>2eWU%PLUr+nSQtPDXdRDGp(kCPKd;iM^Go1K%ghO<Z1JC
zZ8;8L>FYnr5<=zMkFpfI_oFPabNwhwv8_MK5~&yw6<JxTyG4e^uYz!ep*@Q@$LKa(
ztD`cCh_&r#!=={nPu7O|Lbfi-we2d)rqwm+bY-FcF_tzke~hI@zaL|XBqRKE-VBDB
zhF}MABtbX+OtCj_lwoWiAL}X3P{&4^$s*ySlcI33`W{yF5o&+#b)O9+2H3M@kKXx+
zw(8t(*|`UhNGlw!MR$_UVrIT8zPgG@6S!@g{!C|S>7NC{f#QK;^JCP7Y;ocyVb5O>
z-M@`N_F{DN0MxU?L<hR3@F8ndRib*g&Ih|G!mL6^3swI9p;m8My#4qq&a6(i14);(
zL&-v&w4=Xs<wnxlUW)2Mt-}j|;^b$39S*jn0(j;Ra|o5TXPyV;0h$b|HOn=`SU}X%
zGsr^Iszke;^Jui%62H%(V(vVi(9*VNQpe7%T?-(o&7_hKB!N+FD$&y)BDjlG$YBVC
zJO9}VxR7@lg-dBl>EvfW47-VMM1y+Mhr38EG&d`_gw>5obD7TS4Q`!j`U(HpruyFD
z1N*`{lq}n$si}eW#fZAC@D5xhhER=D&F2=cajRU0xY^<-H|r`zHXe^zp_IHISvO?N
zsTle|#erc-kzp)&`N2_=DSdE>#hapiRO)Z;KsBw>S%)+VT0eFizb;!ZOXS8aJL7%W
zR-T|wCR@`P<{@N#5A(D!hEwYD#&HB-p=|;aszI;EE%qwW6w{lW9x)w+V2qtwr01jQ
z!K2ackulijP%u;`Vj&A`=`;jxz#Oa6{V7!D?Dbscir`t!bihC@JZm{eK-~kw8xXu_
z9JL2X?P5<dTy0ZR?(}k&Zh19ZZj}cKUJdtqK1Syfq+%<Niuw5{Ry&XD5wC?&M`N{U
zBv`&XU-eXi*0XRb1DV?y$M<t)$PX~6!&$<&;W!qC-_^zP9p&tTxiQ~7fylDmw!lrR
zLY}%u0zpDeadB30%djb<mkIwQP34nO6li5tWMDlda?mmzhC5WOMDd*$?FwL)xq+dx
zo-xN*4vn8w6tD>kz~P2ioHI<fbgj^(?Ono%A1`UK7ZJldrr8JW#ZM2dL&j#Iuzewd
z^?)m1k~C|$fc#E!pHRZ6V$q`C@Cb|}w^Qe24G}VX9B^N55&cRC<-jvDr(9mG0*?jd
zXYI$H;rtXtAS$Wf%ZnIs@pkzWC)~e@m53kS3a||{uQ&?-8GJqq^pp}EMLNE6z3KCG
zzjFNxey#Di^U`|cx2f<>P#bV%csOEG_!Dt{_|GZQiS&aoSGyV^5(&9=g3BD}$vDe>
zn?#A9fYFF6<Abt;C4G(nYX!i*K^K9yL3=0?qW!3`HV4_q<hfT3nSlAPFXEq&ZeT~7
zOeAx3F-x88_0#2bCq@)#saDZkFFWa9Rl;Ts*o(gXz}=p!*RiDZExlLIdNcZqjm~%t
zM=ez38LyPHP55$I-#}^jxQ7Z3*+Hv%$m;HB9sKi*{_3H|^}=7(!<7?Nsd2rb=^Fyf
zW#2jN$DcL@;&*zdiG29OMk3wR<38+>Y>*N*NMr+DoAA7RCe+cFy$PbT;xSG=o&<4v
z#2{C?eRVwoK*N2Pbp#urti)hnDV^f7ytj1PYR}SgKTVeAh3p2~XZ}P#X+&2wS2Yzi
zHI;d)Z9=bW^n}glfZVY)>8TNE|GMNHM#f+=g(c=TyWe~O5@M)W4t^DWM+$r3<rbnB
zHCn)kzM3l%6Ma64SJA8xY~+B#$AUzGZt7QgA>zL|Lz9RtAY(wCdvktm=`RwnKe&RU
z)eXnsQBTs*;ApFJI(X$Ze4l*m?ez|tR3jD|hjXl^(yW(N*YGPYd0Y5fB*n-RvmKde
zgj@DEv--Q1Tn!O32=gN$AzDY3p!0ZZN+^A<sz{(-N^{yu@3U~Ruvi-xAF@QIgs8~8
zb;~l-?D?9gj5t2hO073CI2Fdw*VoHi^$SK9*PE!-7;_JTT_8W6H|zZ2(a&W<)oeaP
zJ?S4Au(MvSm;=I6ofjPa@W1`z^ujNh)Nuo#zmj^aUgTAD&}rZt&sfCdVg=kyX{%lo
zvcGmCSjQU0LsD^r2uLy?zy}RF%ZK(<rYO`nan!bWXAq=^3328r{beb+GwBiTPM(r4
z8PS=8<ZG!9)LqNb4#z91iww!qRj7unf9K$n=Wuj*BV_;K2Up1R`vnX(oxXEXh!i=T
zLmx;wYBvBhLuT4|Bp@q$3`@klv+IQUIZ+mer&y~Ij8ixT<*J{d-DvsH1ZFu>=S8UI
zW95>plMBt`xVykLm%HtAe+s^NRD*od4oCoenLsel_3M6%D+jcU23^-95qNOP_uJ}k
z`Gxj(q3`KtC_L`NR$H*b$K{1s=eRB3*Jp|^*Lx22E5x`h-0rV+yq+Wt;??qeI9mUZ
zZ*e#?`kC;=;n*y7N*taRJM&XJi_@FYz{t>8JRXE>P9Z<cJ|{vZiYOV{cuF+_u7G5-
zACy(Chq1#QOqNp7&5#)B!d+02w8joLm!1xk3z_j81!SgvPdlA~5hGO^w4&|M#gi4*
z65Z`eaQ}~Dp+MNA3m0JFFdXAgr090qu92|LyA5%{Miay&*my<Z5(gOtho{S~lsT7m
zG>&>HSq1Z>T+w8jDC+TG4rwV_HCp802R<Vk7P$R87L?=^y>wkZ&HEGb^bh}?%j0<x
zfPf0X*U6-$+)QCQuP@>w+3}0lM-x83L&d5HcD$nS<VjZiiU5ECQBoEer#=b9h<R(j
zf<wX*$}cgF-VEV{ZY*DCUS@~s$tixkQZ2hYsu%z?E??QYr^|c(EV*gIa-A>}S#9%H
zmuyjtE6}+hR~h@0oSY{!2D!)H<JYleH7%a|SFv*PLA<*|mIU9LVy%Kx$*fqLVRdER
z=%{mh@I>bTm~E@){HYTd^zel;Y2s8$mgb~Q(Fj)qXG>J<%3n?WC$0K&)V~5}W73>D
zBgdV~1A)LOlvd?R<xH*0l$YKgBD@?JB|Kj<1`ZdM>RO)XmzGJ}8yIO9PN*`4(afek
zyk}?(-y&&uPOXXzB^8pb+Nn*dk!gku)|$m>lhu>RGE92nW_5>7U{<cq4dyHeTUf>O
z^#^=^1}}GOC-ae9Hft-64vKa3u2)W8rD?M!rw$xXA0D971Zjh@I;TU~{WM;vH7&Kk
z#*A*egLBQud6+UrCu9vwh;?8_YF2c;S1FrKXYayai#|iu;>1C_)%hEH2%(Q79xaV*
z%Pp2RSxPkO)30<nRn}?R&x$d1Y@RwI<eS2Pq{)*cFM|!r+dQeZa6}*2(62toV1~=d
z6LSQREMMmFHG3k^@%3Z5VXE^DYyC(C@v7K37~{2-w*bsqBm7N^q18zoDoHEI**F+%
za`%`C>tnc6|K#TemEneRmfK6duABzZW+X{2Ngm$@O;w>#y)Mo)<P;)#LUhnKhgw2N
ze;ATl8Q+-$eN8%qjEK}k|9J-1PWVva!Z2*&2uWt8K49j5;F5X3&aQu4)i?`NalzdD
zGjGFld|s1Xpxih<oQD{SR^y1xRU0;JGES?N(AZc08>ymtydJFYJAA?kw83X$u(VPM
zxC;eR7F{1gK%IxTOP<rm!Dn}-cP^Z$eoXLY^#NOuU#fS_ZIL=qXTw#7$QJewY%`#u
zuD;K>vyrw|zp62;7O>@x=ie%!8Ka`LVfppOmX8#AzX>kDR#sHO5vkZ+)myv1n`UhA
zZ@a0uPnAu9d_*`7EAmt(2FUjZh&SUPV=BeMG@Cc_hh&``{s_+OK4hpxLTN$~=<G#i
zO!1u(iIq)a^3ECBAio3nY3v({#rLoF(C4ao?d?7590FtJ^lsO8Z+gZ~*IM6mCiXvh
zHYuGsQ+!>I{QNLr#h%Nbzb57&$bRWur>xcjU+>)b@GYEk5?8DY<4j0;SeB+Z8o(%$
zha{-u64=R!3lc1JKj3;%9r$i_Il;J^#!obg`FeGJPjXxZyZ}~~WGMWS(-9k$pCbf@
zgok>)o#m1s4gIA#f9}g$7v$2k?fLN7&DZhl^8L2_SzcYBTF+(SJf>Ozx_sp2*6e%f
zn7&i!db&4t7#Z8uImgu=eLDMnLo1UK0N&>6)9(82$zM`1hQ8Ok+qC+=?AGaDSk>*_
z>iRXF1;1rH`h}8+R`rp`Htu_KG_@(`xz6_I5tiy5bZ3{OJ7ML*-^}h79jte13?g{f
z4!KF$&GE>#>;#d*71^T$5SF2Q+^r#x!(#6$)VIc+Nu4OV$lq$ApS_0QvXMT})yTb~
zQ#GR1X=+Sva9KCBj7G2o2Mo5+z6iF%*58-$lz;ZIT8C$*GJ~bPqvj!Funy8qmeVa~
zgGXJfkrL&D=@boy=lEfdp<&x%C3m{7E8g~Z1S_jbWjYJfpr)iRD^99w5o~?ikbG~&
zZZ;Vg!6R3^c&3xNK12uE8WYbqn5{J006sTi@(!}u5!p-@Otw*$Nyl~HIG{67td3p%
zbXAy1y}+uS1Tz)mJHvzEqjX77E8$k-Tswcqxk>oBc|nBPbw7B{Bi{!_zDt5Cw@WZX
z`(6reK2uiI?Cm^YTrrt~Sjb(LUc6iGwLLhWnVmyjpp*-~Ge5a2LPA%M@+3+L4E0}9
zhd~RXoTTG3hT~jHJU+VW1xRm@StK&Yux4w=^=L5rWig%d!6I0LmWTD=D%*3DE`iZF
zH7NT(WUC4EOw0(cM;JjIuJ`tDM(_hSe@5^}+B&vv`C?m7S_-(u<FVF7kf5e<*(}t0
zNSDvX7Cj0omDmgvP8K3^*~ZgQW))-<QQ8~{agr$PVdB#{4Mmmb7Lhb5d7yetJP&Gu
zQs{-@{CtoE-?7rO&X{=_M&IB?+iJ~fJw)_MW-S<>Ko-#F=LOvG$&yK81vIVZ4qUmR
z!U4_qg!qPtP7)WOs)=W{56SmqHVOa<`oEHp(z7HGyje>uzv?=Fjj2)c&wmB+Yk(*d
zn<F%r#7sn3(%=Oa5rzv7hszJ3TIYit%fR9)SDua68!4va$T8BmnF-kQHe{N*3-J)S
z<%rSNF(D&R91`vHG*Cb~2_)^8ITS>H@4YUakfKOH^o>aAjt9w4he0c{yD?6~{*t)b
z%+@#H_jsXU(Uy>Wi3{VVqh*lEq!X9tipTkz>dqX_NQJ50GXUh|=rix6(ocp>gh!dL
zi`-NN*uozU-vA@tF@ec?-be`)ieRy-fm#CxfTWtwn5+n&`nR#Xs4&5?)U$>Wfb0?H
z&WN~{+dQRGk<mKL=@%hZhH8JGrKjl3b8toltT5svPpH|G<6l%^JJ{d}glaqVaITtF
zWHnWKKZ6$_V9a2ulZ5!FAy86cW+F!50pAc4&_v7Yzy7}gp4?xl?d2zq9fq6?UaYt5
zsB92YyI(}LJSi`}V4@(@j-)Wj9Te*9_G1tXCmMaU`Ix{eMPGufF)Vd#e?K3*3ip(U
z#Ge4I&|9eiY<QQ^7T|e^+@p(XPXNUzpf=#Ill%@$ZUh0ysqspIf=#%g^xHM*C|>LR
zsZ>A@JQw!k^)FmD@J_DNHSfp&B5Zcpp`FUGEvUTSjGHoWA{sY?n@yxUQ%HU&bP#}V
z%`~DuB1pNjp*}!NxgCjA2YxXg3&i}UtLC-4#O&zFDeFSWNF;An`CjvVxa!^yj@)R2
z-+ncA{qFeyfBw9S?TmcZnB%7)!`I`~{Y&OD)-OshYEu$7<{5t;5+E~~fPrxkjcLYF
z@H4WZQHqWyJ3Fma$jK>3f`J?L93}spuC&t~R_}PZ&K$R_+lwBs`9xM?NVA6dHE7(1
zxKD7iALejcnj&<6(+j)T!6X<gYWjAko9KOOtC&-*e6)_!ZkaK8r7-NAn2jg&7siQY
zod*0s<ZRHT6XpvqfDF*9#nE%GAY0-%3dp#V?g&&*XIr-i{Pbno$zUY0P6~Ped!1&z
zLQuIg=Q$jtp9OP~c9I;xUpdvb-NjocfGn|nw}s|cdo}${B9PgHY2|`V&|Z$<0eCzl
z=sIkIVgS1|LN^>CFG?tCG%33zNwTk)pDKg|s{Jz=!ND&^ZZEIH+Y;NWj@y*2r`0Vq
zHARXmgbV@;4ZGURMS=<+4G)=+peCqREcH7Q80WMi*o;h-Gzi*2o8>q(m!tkZ39c4a
zpzV;S*AGm(Hc3!J{1;y^HAUyM^IumHhim6<3k>B#mBpQ1HZ1X{`RN+hZ;!pU+}c&0
znneP!>Zoe#C{z%9c?1Vqrt_pzNjqNQ8ds>IpkLysZVryYyUB<fkRNSQ-~9bifZa%n
z*9T0veK0qLzvoUcwF|06NhG(Z4Mcu{ap3*|N0Z6`h($Xq<NYAq;`&lN>~!l{Muz7~
zAEt609WD~?ZB{4G0y@yYw4N17L!RQAOjmJ8C&4d#dx<dBfP7v>GZ%CucH1+KC1WFp
z>XlYY^!wSG=iVaz!^$0VB9hVW7@l6PE_UN@M<yHgC)J^XZ-Ukxh_TfSzH%GjS%1t$
z=s7#K=E_#@QqSZ0dn$WBQw;>}=>2qdM`PUe+Cd0Av~el6r7(0qm9GdQifmk@OJ@Y3
zD~rZBsmiSX*802NcDqa8O_8~6lFQ?%pN{}+h5xE}le`tS{P%mBtxiJzG(Up84jrgc
zc!h)Z!T|MF!Ha_?t1TIvmJV>54776=>xsV2s>|9{^^8xwEN<h(8R<<#`eV6oC7yeg
zs1V<W1DNLE+Bl-}P|;@sIwi?qet}YntUY6#-q^(yqg-aGnTq(1UGGS`LtRIB4TWcO
zJ3W{<D}<zHi?X7tA%4MuHm+ChN>^@Repx;a6*&KUW!_hA>)n|qLq?NgnPiu?I*qhR
z$P;+9#`XYz>`ZFd$Q<=tsaZjf$ftn_L_NYDpF6Ux;(65|(4P@X&HC|D%cbS@qqg8B
z0qi-crV7)^p*dtxSSckqt;MQU8o7WbhSqfMEBrY~xyK%YP3St_Fml^n9d-;qvKUKs
z?HaR%$A)$MDbI&bg$_vO)Yw}8WPFTT|F5DLU#X6`7Cclxox@_7go6<M0VpJ*V%PWh
zSl!G!;kirjr;t-)mRM*CGcpb~2}qi7+57mA6+A32Jj<-$^ml~39mlflKYrb371j5g
z)HEW8b{YAQq|xxir1gpkv`Q1dRAawK$*+ny-GAEy$}Np3@$^N7O@ZchsAI0E{6hEZ
zJ|$5bc&va5=L%^tHw{Kxh%7Umr0JzeHn61|qZr-lk#AN$)Gm7P!&chj33rOoX<u#d
zJN#vEZH6jei2etkF!4?(TkzAn5GHh+cG^y@Uy%LJR?Tjvgy}GBVds4!d&BV^{c>F^
zM*8NF#cLGOPHN0j!^L*{^b6hbYF1Wp_;=l4L%hntD!a?Wfc_ExDzcPU2`w^{h%N+_
z)x3Jtt`5}%A-t*wX{9_1?MM~`+D=!L{SGhA;KoKF7F!oiNU>oB$yzbi-1{I@P16(e
ztXd|MvkWMdlfxh|75#YS9A?ea>!;Kz8d~z8K%cGJ9^*$$Uw!m;ZcP<&YeeE6xq%O$
zJEi+rACbmoT!?Rq6%-^~YZpU8GSU-`A*JkhS*SeVadz!;xwQ(RcOT5idSMuMp9cGe
z0dDJ36*`Ao=4HG>Bpc2>v2u@__b~zUfqn^rTGRV7Z|F`39`C06dHWXEHIWAyMQaQ)
z&=QRM=YH+@-&y;@#8T)B2aOYzF(-i`Qmza6pmYA0!-(FT0FWdw<&@$1C$V`ksQH+3
zA=|q}xJqsub`jr(8@fF|`LzwhurU1RoQU+^3MWWE|1%Qp)FUMvFCNX2;>_`B1f6br
zqlu9@^R4Yr(J^Av!ZeZc7^O#=_7G+HsP=<C+<uu5bn%tj2y*f3ks}cxdO|z<>xCXF
zDk{Y6=eN)AzwfqScS|F=R!77;^Va37unXdh+p_bc#a&EMHWS4HR;eSCrH!F(`-8r}
zYU+a%!$yl6w?O*8P*-Bc;a0jz5p5l>Q875MqMB2E;{G(4+U!l*wL=aEjljvm^83QT
z^flPmX^5Evl=T<R)-XHi89?0eojt&2X!$tNv^+$<Mu~G4G+9-tem01Y=C+VE9#(Yh
z($lOrp1X0^Ol2d_VtMsA23aRs5#s&5IM39O(HAm4nY(h$$fM<|$*Vj6n)scNs#X(0
z`YLZhN_+%#z{u$H_nrco-oc?x`L4!jh~IK1iA}v8=70&Y#?BI^ZEGgnNg)IQQn^Sv
zqIkTY#*4)NTxw!Y6R8F8h08)z@WAv&-R%Zb{Ex`CZcxgo#Bq?>q(a8`OU#|H=-f&Z
z7RpV^9AHrBGQu0PFr&X2=dH?dyfVZuI1mdU9Ycm5##&)to><>F+h3*hqd*E{N>T2y
zHMB`f8M5tpcBydDqJDA$kK$1dhUjIq2|1iCozW}c6R2*g;p0HM2q?hQZ1$8ppXPAq
z%bwvn^oQ9tO&v0b#edvVItHW}++W{L@$-Ds`#A|lIF|$jok-tX8mnXe7Wof#gGS7}
z7@APLRFI@X7?I6fh0Ir}&in-hgUa{m`qj))JzD^_E+f3~<0-s<q+{~qjv$=ANc(Y@
z_>XG4>I-XTKQ2qnf~U3X*|gK}b<0R;RHy#Ys2Co%0Q{y{PNCMw&-kl7%7BHS(arxf
z!;I!5d$TA~8|ngF6}W>8H(|1j$-xIr4O=UY7S>CWlf~dLA7NFycPd@z|9gGpQ_|g)
zo7ml&{x@`3oIIf+(1|=toF`H*Kl6}^2*=1NdX1rnO*kHd1Sa;6dGwo8Z-Hvis9%=Q
zH;}w<BL7eii#*U|)Z*j2m|LY&IWLZnNh{t=ca8rPg*=!IY#bL(cy9Tl%=f<fGqwxp
zxa(cb6?fyXB;haDgKd9B?p6_(d0IhIFOuxCAdGq~<<}d)R#e#zX(A{0ewT0VGK+02
zbOkF<OT?iDUR05%Se{rK%8C$)_ZXmV5&<wTv$ZQZ4~+t+dN7oDqa|i^Jm0{d&k{E#
z*w)V=?Ea}k<m*8ZBAd6-K~(KcT6QfG78(*pXW6)l^n}#LNqa?Ar3mp)4gs1OTr>Tj
zA1F_nR<bLnl%hD&p!~h^0<a*6_v(~W!2<=w>;xpeRoE;%9Mp!Q8d>9obLMmPW{fkE
z^A_wIt$GlcRldv_%3_2N4`UH<pggU;oU~P@w72KMgTPQ?6tN*P<F1|~rg!tNhE;Jn
zODI)ZS*jm%E}gQ0Y~$+vX;SAAfE>y~<c*G#SHh?a<CUxvSQ8?${Y_M+tCN*K;1SBO
zw;GCCX8x=r_o%x-XD?K^&+eHRWDnD@J@gg=0(xF&d{p=&{+}7_72(1QG99?~9C1xx
zarEyhPii}LpGGm)Ex!-TXWNG^j?nP|14Z-Wr)y7Ncea)ie^m*(fRoKHx8t>}(C4e?
z?e3S8k65neE{~U;_MB$A$1gXy$!2;^=A2LW_q&1|UN5Kfv;KFqH{Z{p_V~*)Pi|2_
z=j)Ge-Y%bZ-_Q3gExm}(slu5lG<e+a-EM%zL#h`Zr#<BLw@kr?K=eln6EoSot>F9E
z?Uy>&OQe>DfP*})hdFLZ+yI(*l{7ENR9ZtP?BPGqa;$Vpn@=(=#QSSxope?FL^tY*
z_q=U6*Vfx!Ubk6a(NbK?v`>5;?=REWp{FS_d0QoT&s%Nb?Q`vpdA|3C*L%TV&CRY;
z-rMgllIoG}5B+MN&3wV!&TT%AbUEH1vl=Zu`&(F@T+0NlJB{C4t*@Lug4J6PZ1^dH
z@Ymj3-L0Q5kG7tCUlNh%@?DNqzWXsbQ8$z@R}`QnTqsq9+%jVA@l=^1G3<(Cg1Vk!
zY9sS%j?xsUhKu_S7wv848%Q#U5fVKBV7Q<QXw(pSm^>5*=i^?`KM)uMay}3LY~0K^
zyLzcgudVWoK$k@A%-MtQ62A-}%g>QkP3~rmrl}vkTX9IdatMI3N71rZ4ZyEOuy3dM
z*^ttuT?XZA=f)+d(Wb`a9Wvp&B>J!AAqCi{r=Jtb6)jfh2Hnei{Q++Dr=3Ro)pbPP
z^m$>?q4`?)zQDHUFvkt9YSa$t0ZG=-zr@_9p;b2LtE$K5e<ShLD~Nn2!$47;wMiV|
zp^Gw=r$B2F#|YC+xX>C;7E@JaDwac3o3$0^HNHx<BtS9dbNgi>@qf#Q`4*$U#5tNl
z77!pP%fo>(#GLOM2WxoyjRAmVHPi(Z<TU(3?w*i1M>sC~J4shs@$7q+^O^s>6up(j
zHY)}87VYmtm4y9vkR$+A0);Jw4T8qWx6_gVr#Q<~F=Uw3?z>O>xZw7(bHBBCjhEw#
z6`v)YD;|g$&MzgLo8#lC>K`GmkLnA0zMk(7cg;~FC~f|7EEvMgnOo<y`;&e3PBXF7
zz@;w-hC$5ymMC~9o~JNdK&u4Yh!=bNUiVzOB8?&GK31EEpRPK@r$XIDnNqp>MJv%V
zzQ5J8>7~zcrynhVr7;&nj5}@+3rUAWAIFamD2Pws9~Vggi3WrIq36O=_etSv(B9T&
z)>eYYR0sQfMD6&CD7vk@z5Jio-}2I;0)Qd7cVS3L3rDNkOC$fwAyUncx4yNd2_z0X
zc_X9aISPl&G<R0s_uqytXM`LDrKx9Yv{_Syn9W2(#@^?jRBm(71~)Kf_@I!wI@V1e
zPx8>(NKY*WztEGV^sA2peY@&iMZP9g+|BmsJ1c?EG}i12W(6!Nl=J@vD299GtJO({
zVt1+!<)oa7<X!tsM*cUmL}n)Mutah^yP6owg_Hxr7CV*AMT-y(i6H8MX4G4s$fbup
zGJs!wcnoh!dR@~BZL<Y+l@lxsJRJj$w0zXfjhflnzu~`curHSbK}y@7_C9i{f6)?t
zrMOmo`Z%l(T?SP1#Qt_AFwC0+m+iIvde***g;ZG1jvE2S^@a{rz2%}V&u;b=8JZQ|
z{mJR9A%O~j8V6(oweZ}!q_-kbLq>B}D0(xNS_yln;V-$xF`n61tz=~Um(wU`iu=>*
zc<ynEE(cg#T!5a*-SujsVCO<a^G`9_j2WW7dZriS{Q>*Wyb@eI5P}u3Tb$Z)UhkKr
zg1zaeFG0K)3}l4`vA5q_32-G(;o?q7EP+~YErT=(d>U(U)pV?nDb7mZR8gp7a~EvP
zH;_B4O%x(F*6>0rCzO>RsS!licZ=htrQOrkM!Xs1h^LF!Q=c3?T+jAL%5gw#D8t93
zbWDN?E8FMk!~!xXUdAzb)-a>-wM=Xn$(@d9(V?5bqLN7B3%?y#0H|atbLTXcURI~W
z;udd|WS-`5(BO}=#8bh#M{j_^PYG?QRydM)Vsy3TdO%VU5=5#fofx60ZOm@ZO6e!2
zn};brwB{nlbm8-hjgUN@%bU36&W~0iUFTp$NW1XVvs3)9*D9Rj--kxzY)8$}*I<vS
z)4~xH_hY^9$+Rp^yp7L%$3ubRz0aaJ(3FYJA7{x5kCUmAdZd+Bz@4m1h9*#9mZD3b
z_Uf6Vx0Ok=brbUHTloIkZ^1w#m?vgr+%7vf80Z}V%8_$7nljz)qoG37M(B1TLH$%j
z%3P|0j-HoBA9#Y*X#x%u2Ho?)(e6#%?rJz~xFjI)$5|R}9)~!aN@R1h?WS_E;mNR2
zy&mhRH~Jc2Py6ZVA&UWLlY^iPnDoejyE}nR6vrkP%3Xw((bdoAe2NS6c(Ckba@~Kr
zm?ZnH_Hi_vHfkGCMg)SFn|7YbQHDil?PT-tFmXVZH7ddU<187fa<G%jN8*^?6JEmA
zEaCSrAL|K<*=(hI-ms`o-|$##TiCA1*hwLd6l;-js7tzr4RA5JgO#LBU{rB0EUds&
z8J{X4kqv40k~A%iMtmv6B{$QN(Wq<cODP+cUS`O;URMdd-%TtWwj#L($k~9$mT={4
zf<hh~oJ`DLBT#F!039Ivl|>?Y(4}(&ui{#|!bd%_HuMvA?sil=_h}SkN@5Ne0`QL2
z1_{fzVN641CF>M0gG5*y9A)Q_6-r=UMak)#`BBVKbp@+1mu=8K1s{!a1-6JvRgt#f
zk}#a!)fS>AUg1VEQ!FXePuJpxy&k(&!_ck;ofT&za3H&_*#Z9rBbx$y<|8Q}=uiKA
z<&eK#816kFrmh`^=aRFX;Q0{2&P1({IV{<VC+(D4nKS`Q#SM@lFbmBm2vP|ua;cI7
zn-_Coyaoqu7k)T#*Xb6T8Z{2iQxZ9l$~I~g_Jus|<f~>9i0=UM5ZGRat`oXB2&mu3
z>F8-Dnaj8Nc|5*LmzUDEliZX+oALG*#7tH$=Fnz#g4lApNz>Ze9p!V4<ru_vz4}#b
zT_gu&D-!at2OFe<)zHJUyP+O_gUL}|v#llUEVCe3l#gxItXDO$&UZdOMeCpa*37_7
z&_BLm>z%aUs1eRi<yXnyX4dclVybUyO3&c+gUoRY@GL0KH16JU3AcWc^|<x<^(uw4
zShn5n_|DppaBx}KAAHd~lsYtVz<$q}(S@Mhy7YSJa(uG7lSq|S9fN>j_jHbT)cumt
zMd!iCZ9mN5`8X;_C9Hhw6_&4DAYjO!JAKl5$^XZat&_1clcAKEzw#l_ov?=euq{+k
z?EriPD*X2S4;;;c)r^Z8*AtSCb_8Tb>yC@j0Nkb>n$gftx-pW$*wJ_YN@i8Pp2~=|
zy()VsGg<`AV&>|VE486^ADY$jo(s}qVl9Blb2;R4rnA#(zG>&K%Wj_)?<gU)xzWw7
z_sJCYOE+OqBl0`N*71lbb?$89W-)2AHPlwrqA}Vb?i9JnjKi{K^Nl(s_}FAUXA!r<
z3A&*eI-J$KHHKp2F>-0H9@eDn;em46Jv`l`>*<%%W28DCov0q&#8W==6Qc-?{}iu@
zzTICv8P)Q-n-2@8m|f(eMNUy39Yt7e5EsM<rkKKQfraTbixvaM_gkZqnkC4gs{Nv0
z4#tqhvHoU|9O(s65!v_(*T)aTRc#JorB^EO*||=hK2TRuVU{f=KOJ+~a52#ZxaCY=
z&UW`DFN=`3oElI%nN4buvoMg$H>g$K;ifba--MOi1z@C<Nri#olErOq=qPfPc8H!E
zons;+F17MSk@(WYWjaAi!h?}zadm>pi6{<NxT#a8hh?vL-TZSRzDWIld*tp^1_FRB
zzryiS72^es+VD`5=X%Qx>*23JbF@R>l_%fqtOho(^|fx#)eV&%MAtCP0-*&CZ@1Pb
z4s;ME?qS4}H*IV}1n6;El38DgTb|v%(i{e525Ok0WXcK`SS^^9+E|twZ6caeE3zbi
z+k`Vqx4E><Ge9Cxr{~#KOU}STw%_wbxQIjDJxTIC{9IVe@Sh>hywwZLzThmAw$j%L
z7%=DXA+8h}2tE)E8qckgME8<eHOgl}Y90Y3^GMVZ*qlsg8g9g#<L)@{b-QWCm4Q`I
zZrTuX#Pv#iPPE)?+4F(r&A0dyaQI}yWjTldP*-><?tJ7R**>8FfmF*Z1oZ664mT&-
zzR1{J*qNQ+Dy3#{su!7fIqoh=nF2MQxA%}Qy8lmIcM(-LxU~TocZcHc?%Lw+?p7QQ
zaBy#NhoZ$fP~6?!-MzTG6^d)`Y408UlRsJO%U&zVJIf?H*^iMho?;d%yr_QY84`AJ
zuzB;Wn9Nq0Wu`o)`E&`Iw@=s2;!mt+V?b<PFS9t&4J8kQZU8K`_wCL3e7zS5jGU9h
zGYlvY5To-E?|HzTRUo{2Kk}%4nzko2wgdJj=c&bnS5N*A#1{kyV;D!djHWZ1QuAF#
zobU&a!;{DJ+3=8`Zpvg!wpMM68gA7-vRV(xq+$zH_>0hj4ad<-Sp5F)PW>qdnavLL
zm1|wT{V5kj%?=EeTMZ6tO{i<4l1S0L->*%Y$11d_6h{{_as?|yh@({{Nd0g=3E`C9
zGRSFnw1OK6J$g+=HH^(Jg@oj;L`EO?`#&u{&&Z6f$RRX784Jk2Fc<L>n2CiaB|%By
z-b$4jsz(uE0Y)0hce`<hB~U*|{tROk30H)ZPobt<26b}zdkONpc5n%;dS7u*kScjC
zuOFX0Raap;L%CI|%L7*DL%5a7NJmH~MI<4u-uHG?M=qfW;=H`{mSBe1>yB~7ZSG&C
zO*uCast6N1hbii4sgY@;8^P_@{%=^^e)1Jxc(Tj&Ln(WmLSFIUJ_Hjc;*mpqkKmta
z&-K#}if#SU2iiD9QKp?q%Jq?`jED0+KGg2KR@2t;!uMnJEwDhv+C$vSHhz$ihX`}%
z3ws=^4eaW5A!?LaXsQ*G&}J}2dI%JO1~qR`WaMy6pd83zrNOS$^3~J>u<Z2?<7kOl
z2N5~t>#7E&c=f?Q<6$4Cs5H}@=^(7kjgwm2O}gI41H-ssbV6;iMWxY9%Zy|}XJ2~K
z46{{MF{#Rtw)NweJ;~M^)(>(iL7MmLR)szgos1ygFN_uf9P|ZOmmm4P<JEJBOLZc@
zp`O5X)<J<dWN8zz;RGhsRob->bPXu;M}`;#WcsycS$gfHU1&cyd&_>2T~fzAh=F7(
z;$1vL4L?Np7`<BWbSOk>OT|!#Pfsz4gBvZz>l(3>%@yB5Ta-a~GMy6n{o^Q%UzMTw
z@}^dOQe*PE7nQF3*0?)cSIf;Jk_H&pbGfioXbY{o_Cip(bjm~ivm)RNCITiyMj>uV
zths_3GKsnyG%UIBSIW{RFC@Mu(PJKgD24(_X&N&M(KyJDp=<X5GwX_UPvKioMz|(@
zD{@Rjc_k7xT)4cV@_-dej~==wi4&Z8hbaavH!=23XCtRNcV`QT=J=d#=C88!sVIWX
z1MV>ye}qV-neXkjuU|THT)1ocA_o@=G?bMKk=5{PP;6KrRaRTC)K^vkM`9Fh)eG$e
zv8{eynIK~K;p|0F*El#yEot4Q6{dye`+_~64cfT&2HjcMfnyPJC@!oU-&x4nLt^NP
zLm9DZQG(cNyBJ(JvBVpFTq&X1=kZ}4#~ah?9SDic07W~!zOqJs9w0-C6tzOs?-S|g
zNoq@K2udtz%Lm!cKCCb0K>!ri$>1W$#fz$vFYOaZ-s${O3X~G^7=<o0O(A1Oh)Gz2
z+pDHYh(cFaX4UVi!#|ROED^M~eiY)d0=H9@dlvSn#a5HXPNlm|BDo-ojmF}!xG%<G
z((5!Uzb=rF#P9Kk6spqFy(78BIdT6}mYR(05(|&t@TM76ruq-YRF}>1LNjbEOSq+5
z=yGemBGB8lyfuwiw_C<P@F=DGB%m`TJm|GWP_@llxQ5W7tu0@Q&m3T>c~s*WkYDJB
z-U^pz6*(MBC(<j}Z&Brit(SNYBJA3zg(g)ubYS-DhOPo=)C;?=;j6Ov;%ke_50%3f
z_v)E_w;+`=z7RsmS;0|6QU`FqlaHDl`dWtjnIPM+v$CYo^wBEBmkmn&_G!8|qxhYM
zi7(i*$kU2G5H8MmiV3J*sc3$qsHV@YD))uRX4QU%q8SJQngIp|OA@GC)g~{J{gxzx
z#Xjb$=95Saw!(QRr*yR~=ixDIcRkM1X=wknQM<e2SbQb3aU%4`J|H6;6x8}vt%pir
z{IkM1Vcr5oDab_j0@c6A5c-|fiV9xEMoA%skt(K)Pn@4vYv&vV2${j1g!JswW~*JB
zuuQ0N(<aXSGy;q)#qkn-wate)0_?9Sp(rG2iS|$pJRZHAtW7~^NA$3{BKb~<T=(Gv
zN7y)AX%eCdYY@@tQAb=mTqYqVt_{jJ=O(Xze06O9rEP2M%|*XdSNn_OlX9sx-J}i^
z3wsKdet>%&RepZzH_&S3!fVMKgq?1^=KVd^kg=`i2F1rc7)j}-WFD^Cwvud{IvaKV
z0LqvvEM-j<!CAdHRdif+ON8=a#<J0Q4rkakrF0If0h<NeWPv}{^*BaJHwv0?ct>V;
z2Ia2hS(c%F89e)+EjgW#P&D!ZSsbW#(jD=T!|Lk`Z+8u=MU$hbL-TvoKPLr#dGkyc
zmwUTf`WNET6=;XnScO^0<0U1Q9p}dnlX0f}Q<gY)su^$su8_y!Q;t&Xuyj2`q0$Et
z&zdCPgGP37c%n1@l!!ta0ljh)6>4jqOX0~JpwR@Z(0hTCs}8<wgm+fad19{o>;>;<
zh$=Vj{Z9<FhRi`K7#C4px}<oy6E%|@nN-#f4LJy(dhW`G=!b}EUJTh^QdUtfZz^>@
zLnMpNdIYmG-CmZc+s;aUvVF{vpE!SvPuYs3p6hPy3WO7F^{{=<%&(WZgwx3Ht<-q>
zDxANYX}_>va&}Q1o<Oq7A{^Wq2rBd&$4I29CqWV17x@V}@9`z%0LtCVD2Nt=_c$H<
zd6)-*O46dl2koht@csL`bR!yGfO*b1X~HMFk_E@74_Bp5>$C!>OK2fO_`kDM$!J+1
z^-CAR$}Z}>*wRQzAKLsvrx*!aOWf%WBOtA<Powd)fGw3mlds0IkWzm{*<3fiYB9(N
z#m#EKIRQ97w|C;ka7A}eYnWMJ<BsN}7-hNFrBU)PwH);O`G%~X*)767?*?2%$s(dr
z_|J4!+clrA2{70Dh1}^fX2EZ7XDo;>C4YFJ@{#MfOob>7+mmT28{oLfZ(SfM6pp7*
z6>6LcRp`MuAz>a+!gRouKU(G;(nOdvvnnXnGr5{k*iV2$E=btuGMLg=S4>^L2l>9a
z+lvg7L{XQvMeOYMDbQRCvJe}}ON)`rzkA;9ePe7=XYa}E=_KxK%@btec*lti0^X+O
z!Q$&>FGWhV7iMmrLayH~RgAE#WIx=mIm&!b8SY1y&(|AT5ON}>v!;muU4kr}@K;zG
zI9FYCwJVJ1aUvtnV_w<8uY*1KI#9m(fVOXg0fP`GD%&QYS=yZ0?YY#>^IUeIU5eZn
z@$lkb4Lm2wHPF>X(-3r|#!_ctma{$Jl9vb$4!#BhOM%cm%R~8eeD0o!)L>w#PC$<n
zl1bSqr&n|Hq~o9fm=!jP+Zqgh23*$J6F)>&v9s3%sc%Fbgsx|nBPmFEUJ<KrmiZam
zd#f_<W<)i>gQxlbI^;lN4RnM~WF#`;YrYFx71mbv)Ik6FX+;2-Ix}ZzsxYs6uG*-L
zl3w1qW93GN9WaE4*fn@{go9b18aK7^>~a6RvprBhb@bqh?{1Gh$Q;r=I1tBeO5^qR
zxs&T*GR}X!#n<C>cfC{PAFxz;<4M#h8cNVBT8>fpqo{~n$UN6wMT2E(Vjo*B0b3(=
z$SCrp!n8o;+h;B=-+b3c(nq=<oqQrba|=I~CrpcNWP-|@PpZ8{2m_<CnD!Ng>Xd*^
zzpmF+_;L(9yAg!J4NL;s!508{ma|}))fbNKXXv)3Z^wrYU;Aog4YYVSd?bHWqPMnx
zs~Yq{Iy0@s7FQ2kBfh;4E2?QR>f>FTluxxO!S}1%dZwE<KoLog8cR}5>ds&sv}(hD
zsVb+X;?dszoG0~8w!ba=&ZgDbY23Kn-Z(?_1C6x^|9Tj0iz1QIZQrav=0$?<WzH4P
zc`Bh?ly(ET**Y7P{^uH5_Lux-oWF8=K!*%83^Z{_4*L^}ByZ8I=27=^Vg;gnWv_KH
zrtQ&wWh>ENV2Q9dEvj4D2zUO4)6;n5?0Wnto(VobMo{PzpKlOC`XZyy<L;{C6Z~`M
zn}l(GG9g0?TevV1{RjQaOql_B>PCw;TsD~Zy?9>1gm20;3K9$+Mf5sh`civ?PFh_*
zanxTSF|PBN<TQ0Xh<8qzg;^p!i~CO>AvZ+1S4p_0`=gx1knJz+P%l{(p={f7d`_e{
z5gF*S5P~`xIHXd#6zjDdxqAWx_H`m6^qj%I(gxU93S7m&>{RQW5Jv!e#X>HO_dIgb
zg$~~Z>~!Tj-#;ROyr4A`O6G2wpRo6Oui2j5_(wR%M|Kw1)T_cR$*p(9CIb0q%#S{%
zlGgRGa93g-06t!SJB&&D^KN!j&6lLfVqQCdn3hG};+-z*W=JbQPz3BnS#gcKgq@Bf
zGzh}@ty2X%E!=OMq2KrTaGRaF9Kw!^L93*6;w*Q^{M3<SKQS&ja>qiS@E)mgSR}78
z{Do&5$L9c}xt46H{^3Nd$spNwnWKpRXep_(@iegVJP7j-0NS1Q!5eY0j-g?|U77o@
zt|Z;G=L6l*tQMGOa9m;WrN9}Y!M9PGkal6ut&8O1>l+C*p1^}prg9)yS7J|1*5_CB
z=jQVbMc5G4#7?y){ehCbl`iCp{&N@#4+e0%{?LnG5Qfowu19Et^7HfWKqQBFBoqkh
z$qKX4IcIN8N>po)SJVDa{@#w(EmvanS7T8H`vEU~`bXri-(wWi)bK}$LAS8C34e7Z
zB?132iWeikA{aTauA~XpmG)Kvmu6^`uc5~;$f#gliRrsiifvq$eI;w{RGUh;Q;!{D
z0`s`<P!Gl{OmF>4rl<V7WS!RID0OBJ_7*`jph^Cnp1CB=7W*KsE*)0`SZ#VX$7_((
zz1lu`Z}d(98?;Qy>|3u@?>P1<N23-Bd2MO7=t3nFNAVi@PDm`m%jMqc`sQr?wJ)uL
z{c)dHAZQ{k`yBKvwqu$juu~Cxo}5)1y(_KgXW8y!7>MO@aLA&_#wGIXcKK8s9~@Fu
zzrysrw{<=Z6!}@yg6?7}bU5CNV-p+M6UC@`Y@rCN`93)htK^g~+txy;7-_VmQIle@
z(36V2G>3pR2FV0Agk}OmoL~z<Z0`^C*yF9LM%jYA;X_rZC4G4yl6PBLvKl=7vz{lv
zOD0V<8Y!-GZpm>=BeY(2FNcMKGF+7F$I)BUv~1PHveFKv;k<G8F6&bt;`a=z<nNO#
zLDC+o85Ky18MQ1bhoT>^XIrk_u{l0+;on<fw9Vgc>&J4*La!Zw#7c?OmZ^RcEZjzj
znrwL1eefZg!FBcKcU2AYE1Aa66QKDMQmI{Y)`i&~dmq-XjorChMD^&@XbBLWsQ^Is
ze3cuIvxMQ1{BX758$?6JV^d)UKbh4)jh50CS(4G2PSL$--*qhgsJko3?W<7N*FTe%
z{xxg`sB)!MQp?ofCvt|{Nc8f-v3l4aY%9%l$x0R<K{VycP8o%)nHnh4l{F^x^he8S
zP)E^z;_N`spttncMWifCIGH!NwRfd$0@F&C*Ttt9Fsi3n>zeBD8g=L`IQtD>TP%#p
ztne~hn}Od6LpK;PsmB*Sm(8v3ouJj_?LanU<8QDg&GIxa)R3+>u<{57!JrvB^F<*w
z+jXcioRV*HQ0UsoFQHHJfV7z0GC}+_hWWbU)sQ|x45Nt_ydOPfwKOQ&wJ5A*f5bPS
z_pj$_09`s^ZJB;=UqG%k6ts&@FYJ`Hw@dn}cdQ!!m6Z?)3n{>|(#mFx!RE$QZy3uN
zUs99fdPnYUZ`hrI$fJbg6vrc2R-(97w;Z~}ieP9H;H*S&|Lt09t|F@lHsF=x^P{xW
zN9pO}52*8oa8VL=GR<ayySKUD%kCmRqFnZ~q=p%TDW)CQj5W<&(`*nwVGSHXbV}u=
zi|H);G()<UwWES03bRxqNOt^pq(fj<L9ESo#gKyzhTrDzvqf%_Lk?(kD;#BzUb)MW
z1a7wPRlMAm2~gFl-L6saAwG*>*g#p*(CIw&2^kwd?o!9a<ISVtrdgwR`OPA<K`hR0
z@wkUDDl0mgQ~CHE8yOuSPAkfFaw_QraC0X%mW|h!-KmBf=D7))Mv_5M>gMr|WkEI!
z9%|93R@&QLtYd^(Cx3ao3#rmit!F4MAvG0P`qcn-m0;~>M*!EEv5Hhdy!we$5cSyk
z1z*`c>Rqo{n{SMzxsc~`4SkjcyoK?*|C}x8+W1(lG_VPQx^G3Rt1*wE<9pC&ZGgX6
z0%u9hx^MM|;n6SsIy$5RFv+TnX-GCtUZ)ptaK!H5R;Wa+qn@>V>Ob_2@UQGdsZ8bM
zv=8-gW0-`TcQ|j9J?ef`y}55r^C9;@^oR5#27kT#PF#9hkMox67#`+{aDV-cgHCce
zFwNIck3kM%x5<U1ye2?*M)Q_G>x}M@_7XnM4=sxhfh2!V%IR|Ae7;@wximtTcCZTZ
z=dpacV1%8_kxVuGa8Xd?u!+imB+7!^*pODq%PtvHWhLlpb|PiLd2$1X^pwI+W)Omc
z*l(RcwBsMDWXF9kJ*yY{sZ-6n1Q=}ew(Y5-u6bt4Ya9r4p}7T&yj}0hU(Gj89alch
zj`!~GgWo{cwKtH=ovIwOib~rHhSFv8CwnwprT@B8Z_HJjbJN%;@biXTX6LEJ2#JU*
zd>iqXGqTXL#2bm+w@sYS5xFNJi=pL^fZqYMSaT5}wK2I3u}=BBD{;Opw}8%LlMlji
z{f<Q4WH`a|52GW0?_l>EF<5J^E${SbloEKAT;tSq$s^{<1%KU%A@_9bR4*zcsA+3M
z+h-3^f$_ZJ>X+CY3v<itFjU*?Nh<5NKzeD-mua1|e@Zq2>V6p|LDZUbmZ<Q;1)EB4
z1h^GQO@LZAh~%rVrSz2cPy6<Ef#?X@lZc4UFf0t#kT3Tbh|3S}^+wl?`aHbS!J7+Z
zt@tTCcaez@d6s&!ql|le&obkv9ApO}VKDik6&yoBb37@O${b&ojog(VGqU2g78Ol;
ziGvnN8HN8#DjAZ2NhJ-oSc_6R5Gt5dLe@x+j;nJQ|MSSdCvpg*)bMQA<{Y6?=AE*4
zLyHobPJYhl^*(?|Im)?l2U8G0F{{N#BAbWSsR~^m?dZfx(AAb9K(*GtCm5E>;Rp*5
z;2jXQV}w-q28&8k|3oD<8VBy29WJ}OgeZf_IXI3P#``Kg=Gy~Q4!=2M3ETXt?)GR3
z>O3Lt%&X`@rIx4ls>I5n4TZfp?XRoIykZMc8&2bD|3IY;w7EFl*Za23fO^GyfcM5I
zF31l!%c>TPWPI5ws)|DZBxdF4%bw`q?2!thL=yK)i>ExMGvy3l*Y*h4_Q<LGfJ)fL
z70Dm?on5Drn?PcF=isIee~TonA%#s+U+$H~fk!rF8YrL09czxmge1YT`ULc0<3?6S
zp_RTk^Y?f+&?CqrFa1sm9AeXDMRkH1JIxqxS30EyU7VTco)*+_6Iy;8Z`L}tdSx5u
z*LI{{8`{qZy5RoBOK4mP`VepzZPi?5Mce{OwZ?s~+^6a5$%ns?BVc8~Q@n`b1FV$^
zQWB*{v<e79y`y9$yp#6m3E>AiQb?98tP;t?qim-sxU)e=^B2!D(;v&bR=TPJf5*G*
zv7oWRh3||D(7Q394X_It{ho`DgAypbO~shRG^x)|fuH^0e>i$A;;`<}p%zF~-n)SD
z)JM(e41|OEc3n4Z2{0T7k~O_Vgxke9@#@hoEz|g#8#Fl#!_@;94GgXSa!TVjmwB<`
z6R;Z|M^q==n03c)!Y2(#8-!7czZL#+N|JO(1xRH&nrgnuxDjq4^tM~Hf#3<pQg>Q;
z&U7Bz%#26#p`gg9)#i=WNOeP1IjGXQ$-HOA#tViwNE+!sW9}te#Mf+5BkzA^!eUtQ
zl|ryHq(OPO0++xL?VtPR!6)x_Wzn2+@QBST6+LhZef{66BDUE}?0Pn)rTffchsJDx
z-;zyZ$~#!nGHdqis2DP60nW~w%w{3SJ}`dQ7N6<9pEZ?pzb3iIYW%tad?Nd+DH%Cn
zOG;|y=3ASeX5wNzaiQuuNmCYchLWDg5M|9o$OfC(gcan~&$&fJ%SCyUiZ$X9A{tTU
zwP@!((xYf=a8_nh2wQ{~-M}bX6xJS@<)4lho(zgVfHfuXIQPl)noAvzd5qrYlmeam
zd{hIS-3)aTwARI3kL0;y{d`a2@S~Z0R(_qKo%ox?+O2T)-p0xgP#gEIZEzK5p$#T)
zLPps5^BTdK)JfFw16%$qeL^w(CmmTufz<Vpie+R~FDBzdEWaw<g}2wyp`Yt;>lrCu
zh!M(mlQUxhAuGWQZQk?;kO>GeHFzKL?fN&ONUoT7J*w!vaOO+6%oAlfLM@<FsIzb`
zf{-pW?X{!i=8y&1{M(0ooKlVP;t5yBzO_)bucM$p$0TA(=d&jFeR+Jc^(9jjCah=~
zuRCj}Qu>|~Jo;zM^sB8Dd)%|t?83O}u3z`8lwvN9Io6KFGKu`bVhbsx91+6M5e@~`
zJ#AxJsvi%R@<!Tz+tjb~_P#z{WU|HhHVQ0bQzCh?lX&tT%Zp5P+gS1GPb0d0|Fx8S
zKzDc2p(v4?TbNYSW)me=UHa}ahOba?(64QX?_CBD&9xf0KIAuJPL>fT3noQ?^!H|O
z!C_E6V7_hgYy=tC_s2u7$m!JY*WZgzs%0q1S6hWYu76y~T=PAr{-Jz@@LxVVW_cpp
z3avKoXTAPa-P?Td#J4q8?N)b2=!E&$DPP`Q<07?XSN%q&ssIT^3XKHucVlvnGWb{g
zZ$W^-|6hdxaUu4%bTV}_x3P3()6#^6fKEgc0JwW0LqNjZLqb5n;DFEnH|=kQ@Gosd
zsj02T5gh{8LJI=>e`(-rAqoP;|4*6~z`@j>^<Vb?9}Eizz5@aZycdIyB2MVw^*;{r
B55E8a

literal 0
HcmV?d00001